
Lack of vision or lack of money - A TrueNAS RAIDZ expansion story.

As I sit here and watch rsync create another 10TB copy of my data for the fourth time in the last week, I'm thinking about how this could have gone differently, and as I write this it's unclear whether the problem was me. This is my experience converting the 4-drive mirror vdev on my home TrueNAS server into a 6-drive raidz2 while working around the typical problem of any hobby: limited resources. As a home user with a healthy paranoia about data loss, I need to use the drives I have more efficiently, and losing 50% of raw capacity to a mirror was proving too much.

I'm going to start at the beginning so that when I read this back in a few years, I remember the pain. A year ago I set up TrueNAS SCALE for the first time and gave my pools clever names like "Isla Nublar" (a Jurassic Park reference) and "Top Secret". Those names have spaces and capitalization, and for over a year they drove me crazy in the CLI. Also at the time, and this is the larger of the two problems, I created a mirrored vdev. A mirror is great, but I'm a home user and my paranoia about data loss got the best of me, so my goal was to move to raidz2 with more disks.

Start the process. Back up my data to my DAS overnight. First copy of my data.

Tell my friends what I'm about to do so they can admonish me for my lack of foresight. 

Insert my two spare drives, then create a temporary pool with a mirror vdev on those two drives, plus temporary datasets.

rsync -avh --progress "/mnt/Isla Nublar/data/" "/mnt/mirror/data"

The data is now copied to the temporary pool (the second copy of my data). The big moment arrives, and I delete my original mirrored pool.

I'll mention that I had tested how easy it was to expand my existing mirror pool, and it was simple, but I didn't connect the dots that a raidz2 vdev would handle expansion differently.

The new raidz2 array is complete with 4 drives. I copy the data from the temporary datasets to the new dataset on the new pool. This is the third copy.

I now have two copies of my data on the server again, and I realize that TrueNAS 24.xx Electric Eel didn't support expanding raidz vdevs because it uses an older OpenZFS. Luckily I'm late to the party for the TrueNAS 25.xx Fangtooth upgrade, so I knock that out, as it includes the OpenZFS "RAIDZ expansion" feature described here: https://github.com/openzfs/zfs/pull/15022.

The expansion option is now available in the GUI. Expansion has to be done one disk at a time: going from 4 to 5 drives went well, and 5 to 6 was successful as well. Each step meant a long wait while ZFS reflowed the existing data across the wider vdev, followed by a scrub.
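For anyone who prefers the shell: OpenZFS exposes RAIDZ expansion on the command line as a `zpool attach` of a new disk to an existing raidz vdev, and only one expansion can run on a vdev at a time. The sketch below is a minimal illustration, not what I ran; the pool name `data`, the vdev name `raidz2-0` (read the real one from `zpool status`) and the disk paths are hypothetical placeholders. TrueNAS normally partitions disks itself and recommends the GUI, so the script only prints the commands unless `RUN=1` is set.

```sh
#!/bin/sh
# Sketch: widen a raidz vdev one disk at a time with `zpool attach`.
# Pool, vdev and disk names below are hypothetical placeholders.
# Commands are only printed unless RUN=1 is set in the environment.
set -eu

run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# expand_one_at_a_time POOL VDEV DISK...
expand_one_at_a_time() {
  pool=$1
  vdev=$2
  shift 2
  for disk in "$@"; do
    # -w blocks until this expansion finishes, so the next disk is only
    # attached after the previous reflow is done.
    run zpool attach -w "$pool" "$vdev" "$disk"
  done
  # Show the final layout; the scan line reports the follow-up scrub.
  run zpool status -v "$pool"
}

expand_one_at_a_time data raidz2-0 \
  /dev/disk/by-id/HYPOTHETICAL-DISK-5 /dev/disk/by-id/HYPOTHETICAL-DISK-6
```

The loop mirrors the 4 to 5 to 6 sequence above: each attach has to finish before the next one starts.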

I submitted feedback to TrueNAS because the GUI notification for the expansion jumps to 25% and says "Waiting for expansion to start". Trust me, it's doing something; it just isn't telling anyone about it in the GUI.


To see what was actually happening, I used zpool status -v to check progress.


Relevant forum post:
https://forums.truenas.com/t/raid-expansion-didnt-go-well-with-electriceel/8149/12

I also found a much more detailed, and fascinating, explanation of the expansion mechanics here:

https://louwrentius.com/zfs-raidz-expansion-is-awesome-but-has-a-small-caveat.html

Louwrentius's post explains why expansion does not restripe existing data: the blocks are redistributed across all the drives, but they keep their original data-to-parity ratio, so only newly written data benefits from the wider stripe. Restriping requires rewriting the data, which is where this post began: creating the fourth copy in another temporary dataset on the raidz2 vdev I had just expanded.
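To put rough numbers on that caveat, here is a back-of-the-envelope sketch. It reuses the 10TB figure from above and treats each layout as a simple data-to-parity ratio (a 2-way mirror modeled as one data disk plus one copy); it ignores RAIDZ padding, metadata, compression and how TrueNAS reports free space, so real numbers will differ somewhat.

```sh
#!/bin/sh
# Rough raw-space arithmetic for mirror vs. raidz2 layouts.
# Simplified model: ignores padding, metadata and compression.
set -eu

# efficiency WIDTH PARITY -> percent of raw space that holds data
efficiency() {
  awk -v w="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * (w - p) / w }'
}

# raw_needed DATA_TB WIDTH PARITY -> raw TB consumed on disk
raw_needed() {
  awk -v d="$1" -v w="$2" -v p="$3" 'BEGIN { printf "%.1f\n", d * w / (w - p) }'
}

DATA_TB=10
echo "2-way mirror:  $(efficiency 2 1)% usable, $(raw_needed "$DATA_TB" 2 1) TB raw"
echo "4-wide raidz2: $(efficiency 4 2)% usable, $(raw_needed "$DATA_TB" 4 2) TB raw"
echo "6-wide raidz2: $(efficiency 6 2)% usable, $(raw_needed "$DATA_TB" 6 2) TB raw"
```

Under this model, data written while the vdev was 4 wide keeps the 2 data + 2 parity layout after expansion and still costs about 20 TB of raw space for 10 TB, while rewriting it at 6 wide brings that down to about 15 TB. That difference is the payoff for the fourth copy.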

With all the downtime spent watching rsync --progress on the CLI, I dove into snapshots and took the time to truly understand them. The reason: copying the data onto the same vdev puts it in a different dataset from the one referenced by my SMB shares and apps. Rather than edit every app and SMB share, once the original dataset is out of the way (ZFS won't create a clone under a name that is already in use), I will create a snapshot of the temporary dataset on the raidz2 vdev, clone it to a new dataset with the old name, and then promote the clone so it no longer depends on the temporary dataset. Then I delete the temporary dataset, and all my data is logically back where it should be, but physically restriped and rebalanced across all six drives.
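The snapshot, clone and promote sequence described above can be sketched as follows. The dataset names `data/media` (what shares and apps point at) and `data/media_tmp` (the fresh rsync copy) are hypothetical, and renaming the original to `data/media_old` and destroying it only at the end is my own cautious choice for freeing the name; destroying it up front would also work. The script only prints the commands unless `RUN=1` is set.

```sh
#!/bin/sh
# Sketch: swap a rebalanced copy into place via snapshot -> clone -> promote.
# Dataset names are hypothetical placeholders.
# Commands are only printed unless RUN=1 is set in the environment.
set -eu

run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# swap_in_rebalanced ORIG TMP
swap_in_rebalanced() {
  orig=$1                # dataset that shares and apps already reference
  tmp=$2                 # dataset holding the rebalanced copy
  old="${orig}_old"      # hypothetical holding name for the original
  snap="$tmp@rebalance"  # hypothetical snapshot name

  # 1. Move the original aside so its name is free for the clone.
  run zfs rename "$orig" "$old"
  # 2. Freeze the rebalanced copy.
  run zfs snapshot "$snap"
  # 3. Clone it under the original name; blocks are shared, not copied.
  run zfs clone "$snap" "$orig"
  # 4. Promote the clone: the snapshot moves to $orig, and $tmp becomes
  #    a clone of $orig@rebalance.
  run zfs promote "$orig"
  # 5. The temporary dataset now has no dependents and can be destroyed.
  run zfs destroy "$tmp"
  # 6. Drop the old, unbalanced copy and any snapshots it still has.
  run zfs destroy -r "$old"
  # 7. Optionally remove the snapshot, which nothing depends on anymore.
  run zfs destroy "$orig@rebalance"
}

swap_in_rebalanced data/media data/media_tmp
```

A plain `zfs rename` of the temporary dataset onto the freed name would reach the same end state in fewer steps; the sketch follows the snapshot, clone and promote route described above.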

Was this the most efficient way to accomplish my goals? My conclusion is yes. I was forced down this path by one important factor: the two drives I used to temporarily hold my data in the mirror pair were destined for my raidz2 array.

I had to transfer the data off those drives before the expansion, which means all of my data copies were necessary. The alternative would have meant starting this process with 4 additional drives instead of 2; the data could then have stayed on a 2-drive mirror while I upgraded the primary vdev to 6 drives, before moving the data back to the new array.

This would have saved me from performing 2 of the 4 copies. As usual, it takes money to save time.

I neglected to mention it earlier, but my pools were renamed without spaces or capitalization this time. "data" will work just fine going forward.
