There is a newer version of this guide for Ubuntu 20.04.
With Ubuntu 19.10 there is finally an (experimental) ZFS setup option. And frankly, you should use it instead of the manual installation procedure. However, manual installation does offer its advantages - especially when it comes to pool layout and naming. If manual installation is needed, there is a great Root on ZFS installation guide that’s part of the ZFS-on-Linux project, but its final ZFS layout is a bit too complicated for my taste. Here is my somewhat simplified version of the same, intended for single-disk installations.
After booting into Ubuntu desktop installation we want to get a root prompt. All further commands are going to need root credentials anyhow.
sudo -i
The very first step should be setting up a few variables - disk, pool, host name, and user name. This way we can use them going forward and avoid accidental mistakes. Just make sure to replace these values with ones appropriate for your system.
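A minimal sketch of such a variable block is shown here. Every value is a made-up placeholder, and the variable names themselves are just one possible choice, not the ones from the original setup.

```shell
# Hypothetical placeholder values; replace each one before running anything else.
DISK=/dev/disk/by-id/ata_disk   # whole-disk path; a by-id path survives device reordering
POOL=rpool                      # name of the ZFS pool to create
NEWHOST=desktop                 # host name for the new installation
NEWUSER=user                    # first user account to create

# Later steps can then refer to the variables instead of retyped literals.
echo "Installing to $DISK as pool $POOL for $NEWUSER@$NEWHOST"
```

Defining the values once means a typo can only happen in one place, which is the whole point of the step.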
To start the fun we need the debootstrap package. With 19.10, ZFS is available in the main repository so we don’t need to add universe as in previous Ubuntu versions.
apt install --yes debootstrap
The general idea of my disk setup is to maximize the amount of space available for the pool with a minimum of supporting partitions. If you are planning to have multiple kernels, increasing the boot partition size might be a good idea. The major change compared to my previous guide is partition numbering. While having the partition layout differ from the partition order had its advantages, a lot of partition editing tools would simply “correct” the partition order to match the layout and thus cause issues down the road.
Assuming UEFI boot, two additional partitions are needed: one for EFI and one for booting. Unlike what you get with the official guide, here I don’t have a ZFS pool for the boot partition but plain old ext4. I find potential fixup works better that way and there is better boot compatibility. If you are thinking about mirroring, making it bigger and ZFS might be a good idea. For a single disk, ext4 will do.
Since we’re dealing with encrypted data, we should auto mount it via crypttab. If there are multiple encrypted drives or partitions, keyscript really comes in handy to open them all with the same password. As it doesn’t have negative consequences, I just add it even for a single disk setup.
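As a rough sketch of what such a crypttab entry can look like, the helper below prints one line in the usual four-field layout. The function name, mapping name, and UUID are hypothetical, and the option list (luks, discard, initramfs) is an assumption about a typical Ubuntu setup rather than the exact line from this guide.

```shell
# Print one crypttab line: <name> <source> <key field> <options>.
# With keyscript=decrypt_keyctl the third field serves as the cache identifier,
# so entries that share it can be opened with a single passphrase prompt.
crypttab_line() {
    name=$1
    uuid=$2
    printf '%s UUID=%s none luks,discard,initramfs,keyscript=decrypt_keyctl\n' "$name" "$uuid"
}

# Hypothetical mapping name and UUID. On a real system the UUID would come from
# `blkid -s UUID -o value <partition>` and the output would be appended to /etc/crypttab.
crypttab_line system 01234567-89ab-cdef-0123-456789abcdef
```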
Now we get grub started and update our boot environment. Due to Ubuntu 19.10 having some kernel version kerfuffle, we need to manually create the initramfs image. As before, cryptsetup discovery errors during mkinitramfs and update-initramfs are OK to ignore.
Running VirtualBox on a ZFS pool intended for general use is not exactly the smoothest experience. Due to its disk access pattern, what works for all your data will not work for virtual machine disk access. Yes, you can play with record size and add a SLOG device, but you can also go a slightly different route: add a disk specifically for VirtualBox.
My testing has found that simple SSD with the following settings does wonders:
First of all, you don’t want compression. Not because the data is not compressible but because compression can lead you to believe you have more space than you actually do. Even when you use a fixed-size disk, you can run out of disk space just because some incompressible data got written within the VM. Due to the copy-on-write architecture you can still get into trouble, but exposure is greatly limited.
Ideally record size should match your expected load. In case of VirtualBox that’s 512 bytes. However, tracking 512 byte records takes so much metadata that 4K records are actually both more space efficient and perform better. Depending on your exact hardware you might find that going to 8K or even higher might hit the sweet spot. Testing is the only way to know for sure but 4K is a reasonable starting point.
All other options are just plumbing - of course you want UTF-8 and no access time tracking.
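The settings described above could be expressed roughly as follows. This is only a sketch: the function name, pool name, and disk path are hypothetical, and the command is printed rather than executed so it can be reviewed first.

```shell
# Print (not run) a zpool create command for a dedicated VirtualBox pool.
# $1 = pool name, $2 = disk path; both are caller-supplied placeholders.
# Properties: no compression, 4K records, UTF-8 only names, no access times.
vbox_pool_cmd() {
    pool=$1
    disk=$2
    echo "zpool create" \
        "-O compression=off" \
        "-O recordsize=4K" \
        "-O utf8only=on" \
        "-O atime=off" \
        "$pool $disk"
}

vbox_pool_cmd vbox /dev/disk/by-id/ata_ssd
```

If testing shows 8K or larger works better on your hardware, only the recordsize value needs to change.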
Now you can run VirtualBox without complicating your main data pool.
PPS: I usually just spin up temporary virtual machines for testing and thus I don’t care much about them long term. If you plan to kick something up long-term, do consider mirrored ZFS.
While I already wrote about expanding DropBox’s Ext4 volume on ZFS, I never actually wrote how to create one in the first place. I guess it’s time to fix that injustice.
First you need to create a volume of sufficient size. While you can just make it as big as your Dropbox allowance, I would advise going with at least double that. Not only does this help if you are doing ZFS snapshots (remember it’s copy-on-write), but it also helps if you are moving files around, as Dropbox fully releases space only once the new files are created.
Whatever you decide, you need to create a volume and format it:
Do note the _netdev part as it ensures the Dropbox volume is mounted only after ZFS has finished mounting its own datasets. Without it you might have a race condition, and the volume mount might prevent child datasets from being mounted under the same path.
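A sketch of the volume creation and the matching fstab entry might look like this. The dataset name, size, and mount point are illustrative assumptions, and the commands are only printed, not run.

```shell
# Illustrative names: dataset rpool/data/dropbox mounted at /home/user/Dropbox.
VOLUME=rpool/data/dropbox
MOUNTPOINT=/home/user/Dropbox

# Commands to create and format the volume, printed for review rather than run.
echo "zfs create -V 16G $VOLUME"
echo "mkfs.ext4 /dev/zvol/$VOLUME"

# fstab line: device, mount point, type, options, dump flag, fsck pass.
# _netdev defers the mount until late in boot.
fstab_line() {
    printf '/dev/zvol/%s %s ext4 defaults,_netdev 0 0\n' "$1" "$2"
}

fstab_line "$VOLUME" "$MOUNTPOINT"
```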
Finally you can install Dropbox as you usually would. While it will complain about directory already being present, you can simply cancel directory selection and it will start syncing regardless.
I have already explained how I deal with ZFS mirror setup on Ubuntu 18.10. But what about laptops that generally come with a single drive?
Well, as before, basic instructions are available from the ZFS-on-Linux project. However, they do have a certain way of doing things I don’t necessarily subscribe to. Here is my way of setting this up. As always, it’s best to set up remote access so you can copy/paste, as the steps are numerous.
Unlike in the last guide, this time I want to have a bit of separation. Dataset system will contain the whole system, while data will contain only the home directories. Again, if you want to split it all, follow the original guide:
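The split could be sketched as two dataset creation commands. The helper name, the pool name, and the mount points (/ for system, /home for data) are assumptions made for illustration; the commands are printed rather than executed.

```shell
# Print (not run) the dataset creation commands for a given pool name.
# "system" holds the whole installation, "data" holds only home directories.
dataset_cmds() {
    pool=$1
    echo "zfs create -o mountpoint=/ $pool/system"
    echo "zfs create -o mountpoint=/home $pool/data"
}

dataset_cmds rpool
```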
Due to Dropbox’s idiotic decision to limit file system support drastically for no reason other than to piss people off, I have a small ext4 volume hosted on my ZFS pool.
Originally I made it a bit small (only 8 GB) and got Dropbox complaining. Had I created it as a partition, enlarging it would have been an annoying task at best. However, with it exposed as a ZFS block volume, the resize was trivial.
First I simply increased volsize property and then told ext4 to simply use that additional space (resize2fs command):
sudo zfs set volsize=16G rpool/data/dropbox
sudo resize2fs /dev/zvol/rpool/data/dropbox
resize2fs 1.44.4 (18-Aug-2018)
Filesystem at /dev/zvol/rpool/data/dropbox is mounted on /home/user/Dropbox; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 2
The filesystem on /dev/zvol/rpool/data/dropbox is now 4194304 (4k) blocks long.
As I was setting up my new Linux machine with two disks, I decided to forgo my favorite Linux Mint and give Ubuntu another try. Main reason? ZFS of course.
Ubuntu already has quite a decent guide for ZFS setup but it’s slightly lacking in the mirroring department. So, here I will list steps that follow their approach closely but with slight adjustments, as I want not only an encrypted setup but also a proper ZFS mirror setup. If you need a single disk ZFS setup, stick with the original guide.
After booting into installation, we can go for Try Ubuntu and open a terminal. My strong suggestion would be to install openssh-server package first and connect to it remotely because that allows for copy/paste:
passwd
Changing password for ubuntu.
(current) UNIX password: (empty)
Enter new UNIX password: password
Retype new UNIX password: password
passwd: password updated successfully
sudo apt install --yes openssh-server
Regardless if you continue directly or you connect via SSH (username is ubuntu), the first task is to get onto root prompt and never leave it again. :)
sudo -i
To get the ZFS on, we need Internet connection and extra repository:
sudo apt-add-repository universe
apt update
Now we can finally install ZFS, partitioning utility, and an installation tool:
apt install --yes debootstrap gdisk zfs-initramfs
First we clean the partition table on disks followed by a few partition definitions (do change ID to match your disks):
There is an advantage to creating fine-grained datasets as the official guide instructs, but I personally don’t do it. Having one big free-for-all pile is OK for me - anything of any significance I keep on my network drive anyhow, where I have a properly set up ZFS with rights, quotas, and all other goodies.
Since we are using LUKS encryption, we do need to mount 4th partition too. We’ll do it for both disks and deal with syncing them later:
mkdir /mnt/rpool/boot
mke2fs -t ext2 /dev/disk/by-id/ata_disk1-part4
mount /dev/disk/by-id/ata_disk1-part4 /mnt/rpool/boot
mkdir /mnt/rpool/boot2
mke2fs -t ext2 /dev/disk/by-id/ata_disk2-part4
mount /dev/disk/by-id/ata_disk2-part4 /mnt/rpool/boot2
Now we can finally start copying our Linux (do check for current release codename using lsb_release -a). This will take a while:
debootstrap cosmic /mnt/rpool/
Once done, turn off devices flag on pool and check if data has been written or we messed the paths up:
zfs set devices=off rpool
zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 218M 29.6G 217M /mnt/rpool
Since our system is bare, we do need to prepare a few configuration files:
As this probably updated grub, we need to both correct config (only if we have bare dataset) and copy files to the other boot partition (this has to be repeated on every grub update):
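The copy step can be sketched as a small helper. The function name is hypothetical, and the /boot and /boot2 paths in the usage comment assume the two boot partitions are mounted there inside the installed system.

```shell
# Copy the contents of one boot directory into another, preserving modes and times.
# Stale files already present in the destination are left alone.
sync_boot() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    cp -Rp "$src"/. "$dst"/
}

# Inside the installed system this would be run as: sync_boot /boot /boot2
```

Keeping it as a function makes the repeat after every grub update a one-liner.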
Initial encryption of ZFS pool does require a bit of work - especially when it comes to initial disk randomization. Yes, you could skip it but then encrypted bits are going to stick out. It’s best to randomize it all before even doing anything ZFS related.
The first problem I had with the old setup was the need to start randomizing each disk separately. Since operation takes a while (days!), this usually resulted in me starting all dd commands concurrently thus starving it of resources (mostly CPU for random number generation).
As my CPU can generate enough random data to saturate two disks, it made sense to parallelize with xargs, using the serial number (diskid) of each disk as input. While using /dev/sd* would work, I tend to explicitly specify each disk’s serial number as it’s not destructive if run on the wrong machine. I consider it a protection against myself. :)
The final command still takes ages but it requires only one window and it will take care to keep only two disks busy at a time:
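The shape of such a command can be sketched as a dry run. The serial numbers are made up, /dev/urandom stands in for whatever random source is actually used, and the /dev/diskid/ prefix is an assumption about where the disks show up.

```shell
# Hypothetical disk serial numbers, one per line; a real run would list actual IDs.
DISK_IDS='SERIAL-AAA
SERIAL-BBB
SERIAL-CCC'

# -P 2 keeps at most two jobs running; -I {} substitutes one serial per job.
# The leading `echo` makes this a dry run that only prints each dd command;
# removing it would overwrite the named disks.
randomize_dry_run() {
    printf '%s\n' "$DISK_IDS" | xargs -P 2 -I '{}' \
        echo dd if=/dev/urandom of=/dev/diskid/'{}' bs=1M
}

randomize_dry_run
```

Because the input is a list of serial numbers, running this on a machine without those disks simply fails to open the targets.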
There used to be times when I encrypted each disk with a separate password and that’s still a bit more secure than having a single one. However, with multiple passwords comes a great annoyance. These days I only have a single password for all the disks in the same pool. It makes my life MUCH easier.
In theory, somebody cracking one disk will immediately get access to all my data but in practice it makes no big difference. If somebody decrypted one disk, they either: found a gaping hole in Geli and/or underlying encryption and thus the other disks will suffer the same fate and there’s nothing I can do; or they intercepted one of my keys. As I always use all the keys together, chances are that intercepting one is the same effort as intercepting them all. So I trade a bit of security for a major simplification.
When it came to setting up my remote backup machine, only three things were important: use of 4K disks, two disk redundancy (raidz2), and a reasonably efficient storage of variously sized files. Reading around the internet led me to believe volblocksize tweaking was what I needed.
However, unless you create zvol, that knob is actually not available. The only available property impacting file storage capacity is recordsize. Therefore I decided to try out a couple record sizes and see how storage capacity compares.
For the purpose of test I decided to create virtual machine with six extra 20 GB disks. Yes, using virtual machine was not ideal but I was interested in relative results and not the absolute numbers so this would do. And mind you, I wasn’t interested in speed but just in data usage so again virtual machine seemed like a perfect environment.
Instead of properly testing with real files, I created 100000 files that were about 0.5K, 33000 files about 5K, 11000 files about 50K, 3700 files about 500K, 1200 files about 5M, and finally about 400 files around 50M. Essentially, there were six file sizes with each set being one decade bigger but happening only a third as often. The exact size for each file was actually randomly chosen to ensure some variety.
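A generator along these lines could be sketched as below. The counts and base sizes follow the description above; the function names, the plus or minus 50% size jitter, the fixed seed, and the scale parameter are assumptions of this sketch and not the original script.

```shell
# Print one "name size_in_bytes" line per test file.
# $1 scales the counts down (1 = full set) so a trial run stays small.
file_plan() {
    awk -v scale="${1:-1}" 'BEGIN {
        srand(42)   # fixed seed so every run produces the same sizes
        n = split("100000 33000 11000 3700 1200 400", count, " ")
        base = 512  # about 0.5K; each class is ten times larger than the last
        for (c = 1; c <= n; c++) {
            files = int(count[c] / scale)
            for (i = 1; i <= files; i++) {
                size = int(base * (0.5 + rand()))
                printf "class%d-%06d.bin %d\n", c, i, size
            }
            base *= 10
        }
    }'
}

# Create the planned files under a directory, filled with random data.
# $1 = target directory, $2 = scale passed on to file_plan
make_files() {
    dir=$1
    mkdir -p "$dir"
    file_plan "$2" | while read -r name size; do
        head -c "$size" /dev/urandom > "$dir/$name"
    done
}
```

For example, `make_files /tank/test 1` would write the full set, while a larger scale gives a quick smoke test first.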
After repeating test three times with each size and for both 4 and 6 disk setup, I get the following result:
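| Record size | 4 disks: Used | 4 disks: Available | 6 disks: Used | 6 disks: Available |
|---|---|---|---|---|
| 4K | - | 0 | 61,557 | 17,064 |
| 8K | - | 0 | 61,171 | 17,450 |
| 16K | 34,008 | 4,191 | 31,223 | 47,398 |
| 32K | 34,025 | 4,173 | 31,213 | 47,408 |
| 64K | 31,300 | 6,899 | 31,268 | 47,353 |
| 128K | 31,276 | 6,923 | 31,180 | 47,441 |
| 256K | 30,719 | 7,481 | 31,432 | 47,189 |
| 512K | 31,069 | 7,130 | 31,814 | 46,807 |
| 1024K | 30,920 | 7,279 | 31,714 | 46,907 |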
Two things of interest to note here. The first one is that a small record size doesn’t really help at all. The quantity of metadata needed goes well over the available disk space in the 4-disk case and causes extremely inefficient storage for 6 disks. Although the test data set is 30.2 GB, with overhead the occupancy goes into 60+ GB territory. Quite inefficient.
The default 128K value is actually quite well selected. While my (artificial) data set has shown a bit better result with the larger record sizes, essentially everything at 64K and over doesn’t fare too badly.
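Part of the small-record penalty may come from RAID-Z allocation itself rather than from metadata alone. The sketch below follows my reading of how OpenZFS sizes a RAID-Z allocation (data sectors, plus parity for each stripe row, rounded up to a multiple of parity + 1). It assumes 4K sectors (ashift=12) and raidz2; the function name is made up for this example.

```shell
# Sectors allocated by RAID-Z for one block.
# $1 = block size in bytes, $2 = disks in the vdev, $3 = parity level, $4 = ashift
raidz_sectors() {
    sector=$((1 << $4))
    data=$(( ($1 + sector - 1) / sector ))          # data sectors, rounded up
    rows=$(( (data + $2 - $3 - 1) / ($2 - $3) ))    # stripe rows needing parity
    total=$(( data + $3 * rows ))                   # data plus parity sectors
    echo $(( (total + $3) / ($3 + 1) * ($3 + 1) ))  # pad to a multiple of parity+1
}

for disks in 4 6; do
    for kb in 4 8 16 32 64 128; do
        echo "$disks disks, ${kb}K record: $(raidz_sectors $((kb * 1024)) "$disks" 2 12) sectors"
    done
done
```

Under these assumptions a 6-disk vdev spends 3 sectors on a 4K record and 6 sectors on both an 8K and a 16K record, so 4K and 8K records cost twice as much raw space per byte as 16K and larger ones. That lines up with the roughly doubled usage in the 6-disk columns.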
PS: Excel with raw data and script example is available for download.
PPS: Yes, the script generates the same random numbers every time - this was done intentionally so that the same amount of logical space is used with every test. Do note that doesn’t translate to the same physical space usage as (mostly due to transaction group timing) a slightly different amount of metadata will be written.