All things ZFS

Installing UEFI ZFS Root on Ubuntu 19.10

There is a newer version of this guide for Ubuntu 20.04.


With Ubuntu 19.10 there is finally an (experimental) ZFS setup option. And frankly, you should use it instead of the manual installation procedure. However, manual installation does offer its advantages - especially when it comes to pool layout and naming. If manual installation is needed, there is a great Root on ZFS installation guide that’s part of the ZFS-on-Linux project, but its final ZFS layout is a bit too complicated for my taste. Here is my somewhat simplified version of the same, intended for single-disk installations.

After booting into Ubuntu desktop installation we want to get a root prompt. All further commands are going to need root credentials anyhow.

sudo -i

The very first step should be setting up a few variables - disk, pool, host name, and user name. This way we can use them going forward and avoid accidental mistakes. Just make sure to replace these values with ones appropriate for your system.

DISK=/dev/disk/by-id/^^ata_disk^^
POOL=^^ubuntu^^
HOST=^^desktop^^
USER=^^user^^
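
Since the next steps wipe whatever $DISK points to, a quick pre-flight check can save some grief. The following is only a minimal sketch; the check_disk helper is my own illustration, not something Ubuntu ships, and it assumes readlink from coreutils is available.

```shell
# Hypothetical pre-flight helper; not part of the original guide.
# Succeeds only if the given path resolves to a block device.
check_disk() {
    if [ -b "$1" ]; then
        echo "OK: $1 -> $(readlink -f "$1")"
    else
        echo "ERROR: $1 is not a block device" >&2
        return 1
    fi
}

# Usage (as root, after setting DISK):
#   check_disk "$DISK" || echo "fix DISK before continuing"
```

If the check fails, the most likely culprit is a typo in the by-id name.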

To start the fun we need the debootstrap package. With 19.10, ZFS is available in the main repository so we don’t need to add universe as in previous Ubuntu versions.

apt install --yes debootstrap

The general idea of my disk setup is to maximize the amount of space available for the pool with a minimum of supporting partitions. If you are planning to have multiple kernels, increasing the boot partition size might be a good idea. The major change compared to my previous guide is partition numbering. While having a partition layout different than the partition order had its advantages, a lot of partition editing tools would simply “correct” the partition order to match the layout and thus cause issues down the road.

sgdisk --zap-all                        $DISK

sgdisk -n1:1M:+127M -t1:EF00 -c1:EFI    $DISK
sgdisk -n2:0:+512M  -t2:8300 -c2:Boot   $DISK
sgdisk -n3:0:0      -t3:8309 -c3:Ubuntu $DISK

sgdisk --print                          $DISK
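
To sanity-check what sgdisk --print reports, the expected sector ranges can be worked out by hand. This is rough arithmetic only, assuming 512-byte logical sectors; on a 4K-native disk the numbers will differ, and sgdisk may adjust boundaries for alignment.

```shell
# Sketch: where each partition should land, assuming 512-byte logical sectors.
SECTORS_PER_MIB=$(( 1024 * 1024 / 512 ))

EFI_START=$(( 1 * SECTORS_PER_MIB ))                    # -n1:1M
EFI_END=$(( EFI_START + 127 * SECTORS_PER_MIB - 1 ))    # +127M
BOOT_START=$(( EFI_END + 1 ))                           # -n2:0 (next free sector)
BOOT_END=$(( BOOT_START + 512 * SECTORS_PER_MIB - 1 ))  # +512M
POOL_START=$(( BOOT_END + 1 ))                          # -n3:0:0 (rest of the disk)

echo "EFI:  $EFI_START-$EFI_END"
echo "Boot: $BOOT_START-$BOOT_END"
echo "Pool: starts at sector $POOL_START ($(( POOL_START / SECTORS_PER_MIB )) MiB)"
```

In other words, the supporting partitions cost 640 MiB and everything after that goes to the pool.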

Unless there is a major reason otherwise, I like to use disk encryption.

cryptsetup luksFormat -q --cipher aes-xts-plain64 --key-size 512 \
    --pbkdf pbkdf2 --hash sha256 $DISK-part3

Of course, you should also then open the device. I like to use the disk name as the name of the mapped device, but really anything goes.

LUKSNAME=`basename $DISK`
cryptsetup luksOpen $DISK-part3 $LUKSNAME

Finally we’re ready to create system ZFS pool.

zpool create -o ashift=12 -O compression=lz4 -O normalization=formD \
    -O acltype=posixacl -O xattr=sa -O dnodesize=auto -O atime=off \
    -O canmount=off -O mountpoint=none -R /mnt/install $POOL /dev/mapper/$LUKSNAME
zfs create -o canmount=noauto -o mountpoint=/ $POOL/root
zfs mount $POOL/root

Assuming UEFI boot, two additional partitions are needed. One for EFI and one for booting. Unlike what you get with the official guide, here I don’t have a ZFS pool for the boot partition but plain old ext4. I find potential fixup works better that way and boot compatibility is better too. If you are thinking about mirroring, making it bigger and ZFS might be a good idea. For a single disk, ext4 will do.

yes | mkfs.ext4 $DISK-part2
mkdir /mnt/install/boot
mount $DISK-part2 /mnt/install/boot/

mkfs.msdos -F 32 -n EFI $DISK-part1
mkdir /mnt/install/boot/efi
mount $DISK-part1 /mnt/install/boot/efi

Bootstrapping Ubuntu on the newly created pool is next. This will take a while.

debootstrap eoan /mnt/install/

zfs set devices=off $POOL

Our newly copied system is lacking a few files and we should make sure they exist before proceeding.

echo $HOST > /mnt/install/etc/hostname
sed "s/ubuntu/$HOST/" /etc/hosts > /mnt/install/etc/hosts
sed '/cdrom/d' /etc/apt/sources.list > /mnt/install/etc/apt/sources.list
cp /etc/netplan/*.yaml /mnt/install/etc/netplan/

If you are installing via WiFi, you might as well copy your wireless credentials:

mkdir -p /mnt/install/etc/NetworkManager/system-connections/
cp /etc/NetworkManager/system-connections/* /mnt/install/etc/NetworkManager/system-connections/

Finally we’re ready to “chroot” into our new system.

mount --rbind /dev  /mnt/install/dev
mount --rbind /proc /mnt/install/proc
mount --rbind /sys  /mnt/install/sys
chroot /mnt/install \
    /usr/bin/env DISK=$DISK POOL=$POOL USER=$USER LUKSNAME=$LUKSNAME \
    bash --login

Let’s not forget to setup locale and time zone.

locale-gen --purge "en_US.UTF-8"
update-locale LANG=en_US.UTF-8 LANGUAGE=en_US
dpkg-reconfigure --frontend noninteractive locales

dpkg-reconfigure tzdata

Now we’re ready to onboard the latest Linux image.

apt update
apt install --yes --no-install-recommends linux-image-generic linux-headers-generic

Followed by boot environment packages.

apt install --yes zfs-initramfs cryptsetup keyutils grub-efi-amd64-signed shim-signed

Since we’re dealing with encrypted data, we should auto mount it via crypttab. If there are multiple encrypted drives or partitions, keyscript really comes in handy to open them all with the same password. As it doesn’t have negative consequences, I just add it even for a single disk setup.

echo "$LUKSNAME UUID=$(blkid -s UUID -o value $DISK-part3) none \
    luks,discard,initramfs,keyscript=decrypt_keyctl" >> /etc/crypttab
cat /etc/crypttab
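
For reference, a crypttab entry has four fields: mapped name, source device, key file, and options. The helper below is just an illustration of how the line above is put together; the function name and the sample UUID are made up.

```shell
# Illustrative helper (not from the guide): print one crypttab entry.
# Fields: <mapped name> <source device> <key file> <options>
crypttab_line() {
    printf '%s UUID=%s none luks,discard,initramfs,keyscript=decrypt_keyctl\n' "$1" "$2"
}

# Example with a made-up name and UUID:
crypttab_line ata_disk 01234567-89ab-cdef-0123-456789abcdef
```

The output of cat /etc/crypttab should have the same shape, just with your own name and the UUID that blkid reported.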

To mount EFI and boot partitions, we need to do some fstab setup too:

echo "PARTUUID=$(blkid -s PARTUUID -o value $DISK-part2) \
    /boot ext4 noatime,nofail,x-systemd.device-timeout=5s 0 1" >> /etc/fstab
echo "PARTUUID=$(blkid -s PARTUUID -o value $DISK-part1) \
    /boot/efi vfat noatime,nofail,x-systemd.device-timeout=5s 0 1" >> /etc/fstab
cat /etc/fstab
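
Because these echo commands span two lines, it is easy to end up with a mangled entry. A small check like the one below can help; it is only a sketch with a made-up function name, and it flags any entry that does not have all six fields the way this guide writes them (fstab itself tolerates omitting the last two).

```shell
# Hypothetical sanity check: list fstab entries that do not have six fields.
# Blank lines and comment lines are skipped.
fstab_bad_lines() {
    awk 'NF == 0 || $1 ~ /^#/ { next } NF != 6 { print NR ": " $0 }' "$1"
}

# Usage inside the chroot:
#   fstab_bad_lines /etc/fstab
```

No output means every entry has the expected shape.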

Now we get grub started and update our boot environment. Due to Ubuntu 19.10 having some kernel version kerfuffle, we need to manually create the initramfs image. As before, cryptsetup discovery errors during mkinitramfs and update-initramfs are OK.

KERNEL=`ls /usr/lib/modules/ | cut -d/ -f1 | sed 's/linux-image-//'`
update-initramfs -u -k $KERNEL
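
The command above assumes there is exactly one kernel directory. If more than one ends up under /usr/lib/modules, something along these lines could pick the newest. Treat it as a sketch: the function name is mine and it relies on the version sort (-V) from GNU coreutils.

```shell
# Sketch: pick the highest version directory under a modules path.
# Relies on version sort (sort -V) from GNU coreutils.
newest_kernel() {
    ls "$1" | sort -V | tail -n 1
}

# Usage inside the chroot:
#   KERNEL=$(newest_kernel /usr/lib/modules)
#   update-initramfs -u -k "$KERNEL"
```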

Grub update is what makes EFI tick.

update-grub
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=Ubuntu \
    --recheck --no-floppy

Finally we install our GUI environment. It’ll take ages.

apt-get install --yes ubuntu-desktop samba

Short package upgrade will not hurt.

apt dist-upgrade --yes

We can omit creation of the swap dataset but I personally find a small one handy.

zfs create -V 4G -b $(getconf PAGESIZE) -o compression=off -o logbias=throughput \
    -o sync=always -o primarycache=metadata -o secondarycache=none $POOL/swap
mkswap -f /dev/zvol/$POOL/swap
echo "/dev/zvol/$POOL/swap none swap defaults 0 0" >> /etc/fstab
echo RESUME=none > /etc/initramfs-tools/conf.d/resume

If one is so inclined, /home directory can get a separate dataset too.

rmdir /home
zfs create -o mountpoint=/home $POOL/home

And now we create the user.

adduser $USER

The only remaining task before restart is to assign extra groups to user and make sure its home has correct owner.

usermod -a -G adm,cdrom,dip,lpadmin,plugdev,sambashare,sudo $USER
chown -R $USER:$USER /home/$USER

As install is ready, we can exit our chroot environment.

exit

And clean up our mount points.

umount /mnt/install/boot/efi
umount /mnt/install/boot
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}
zpool export -a

After the reboot you should be able to enjoy your installation.

reboot

PS: There are versions of this guide using the native ZFS encryption for other Ubuntu versions: 21.10 and 20.04

PPS: For LUKS-based ZFS setup, check the following posts: 20.04, 19.04, and 18.10.

ZFS Pool for Virtual Machines

Running VirtualBox on a ZFS pool intended for general use is not exactly the smoothest experience. Due to its disk access pattern, what works for all your data will not work for virtual machine disk access. Yes, you can play with record size and add a SLOG device, but you can also go a slightly different route: add a disk specifically for VirtualBox.

My testing has found that a simple SSD with the following settings does wonders:

zpool create -o autoexpand=on -m /VirtualBox \
    -O compression=off -O recordsize=4K -O atime=off \
    -O utf8only=on -O normalization=formD -O casesensitivity=sensitive \
    VirtualBox /dev/diskid/^^DISK.eli^^

First of all, you don’t want compression. Not because data is not compressible but because compression can lead you to believe you have more space than you actually do. Even when you use a fixed-size disk, you can run out of disk space just because some incompressible data got written within the VM. Due to the copy-on-write architecture, you can still get into trouble, but exposure is greatly limited.

Ideally record size should match your expected load. In case of VirtualBox that’s 512 bytes. However, tracking 512 byte records takes so much metadata that 4K records are actually both more space efficient and perform better. Depending on your exact hardware you might find that going to 8K or even higher might hit the sweet spot. Testing is the only way to know for sure but 4K is a reasonable starting point.
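
To get a feel for the bookkeeping involved, here is some back-of-the-envelope arithmetic. It only counts how many records a fully written virtual disk needs at a given record size; the 100 GiB figure is an arbitrary example, and real overhead depends on much more than the raw count.

```shell
# Rough arithmetic: records needed to hold a virtual disk of a given size.
# Arguments: disk size in GiB, record size in bytes.
record_count() {
    echo $(( $1 * 1024 * 1024 * 1024 / $2 ))
}

for rs in 512 4096 8192; do
    echo "recordsize=$rs: $(record_count 100 "$rs") records for a 100 GiB disk"
done
```

Going from 512 bytes to 4K cuts the number of records ZFS has to track by a factor of eight.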

All other options are just plumbing - of course you want UTF-8 and no access time tracking.

Now you can run VirtualBox without complicating your main data pool.


PS: This assumes that you have disks enumerated by diskid and fully encrypted.

PPS: I usually just spin up temporary virtual machines for testing and thus I don’t care much about them long term. If you plan to kick something up long-term, do consider mirrored ZFS.

Installing DropBox on ZFS

While I already wrote about expanding DropBox’s Ext4 volume on ZFS, I never actually wrote how to create one in the first place. I guess it’s time to fix that injustice.

First you need to create a volume of sufficient size. While you can just make it as big as your Dropbox allowance is, I would advise going with at least double of that. Not only this helps if you are doing ZFS snapshots (remember it’s copy-on-write) but it also helps if you are moving files around as Dropbox fully releases space only once the new files are created.

Whatever you decide, you need to create a volume and format it:

sudo zfs create -V 12G ^^pool^^/dropbox
sudo mkfs.ext4 /dev/zvol/^^pool^^/dropbox

Once volume is created, mounting the newly created volume within our user directory is in order:

mkdir /home/^^user^^/Dropbox
sudo mount /dev/zvol/^^pool^^/dropbox /home/^^user^^/Dropbox
sudo chown -R ^^user^^:^^user^^ /home/^^user^^/Dropbox

Of course, to retain it between reboots one should add it to fstab:

echo "/dev/zvol/^^pool^^/dropbox /home/^^user^^/Dropbox ext4 defaults,_netdev 0 0" | sudo tee -a /etc/fstab

Do note the _netdev part as it ensures the dropbox volume is mounted well after ZFS has already mounted its own datasets. Without it you might have a race condition, and mounting the volume might prevent datasets from being mounted under the same path.
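
If you want to double-check that the option actually made it into fstab, a small test like this would do. It is only a sketch; has_netdev is a name I made up for illustration.

```shell
# Hypothetical check: does the fstab entry for a mount point carry _netdev?
# Arguments: fstab path, mount point. Exit status 0 means yes.
has_netdev() {
    awk -v mp="$2" '
        $2 == mp && $4 ~ /(^|,)_netdev(,|$)/ { found = 1 }
        END { exit !found }
    ' "$1"
}

# Usage (replace "user" with your own user name):
#   has_netdev /etc/fstab /home/user/Dropbox && echo "_netdev is set"
```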

Finally you can install Dropbox as you usually would. While it will complain about directory already being present, you can simply cancel directory selection and it will start syncing regardless.

Congratulations, your Dropbox is now on ZFS.

Installing UEFI ZFS Root on Ubuntu 19.04

There is a newer version of this guide for Ubuntu 19.10.


As rumors of Ubuntu 19.04 including ZFS installer proved to be a bit premature, I guess it’s time for a slight adjustment to my previous ZFS instructions.

Again, all this is just a derivation of the ZFS-on-Linux project’s instructions for an older version.

As before, we first need to get into root prompt:

sudo -i

Followed by getting a few basic packages ready:

apt-add-repository universe
apt update
apt install --yes debootstrap gdisk zfs-initramfs

Disk setup is quite simple with only three partitions:

sgdisk --zap-all                      /dev/disk/by-id/^^ata_disk^^

sgdisk -n3:1M:+511M -t3:8300 -c3:Boot /dev/disk/by-id/^^ata_disk^^
sgdisk -n2:0:+128M  -t2:EF00 -c2:EFI  /dev/disk/by-id/^^ata_disk^^
sgdisk -n1:0:0      -t1:8300 -c1:Data /dev/disk/by-id/^^ata_disk^^

sgdisk --print                        /dev/disk/by-id/^^ata_disk^^

I believe full disk encryption should be a no-brainer so of course we set up LUKS:

cryptsetup luksFormat -q --cipher aes-xts-plain64 --key-size 512 \
    --pbkdf pbkdf2 --hash sha256 /dev/disk/by-id/^^ata_disk^^-part1
cryptsetup luksOpen /dev/disk/by-id/^^ata_disk^^-part1 system

Creating ZFS stays the same as before:

zpool create -o ashift=12 -O atime=off -O canmount=off -O compression=lz4 \
      -O normalization=formD -O xattr=sa -O mountpoint=none system /dev/mapper/system
zfs create -o canmount=noauto -o mountpoint=/mnt/system/ system/root
zfs mount system/root

Getting basic installation on our disks follows next:

debootstrap disco /mnt/system/
zfs set devices=off system
zfs list

And then we setup EFI boot partition:

yes | mkfs.ext4 /dev/disk/by-id/^^ata_disk^^-part3
mount /dev/disk/by-id/^^ata_disk^^-part3 /mnt/system/boot/

mkdir /mnt/system/boot/efi
mkfs.msdos -F 32 -n EFI /dev/disk/by-id/^^ata_disk^^-part2
mount /dev/disk/by-id/^^ata_disk^^-part2 /mnt/system/boot/efi

We need to ensure boot partition auto-mounts:

echo PARTUUID=$(blkid -s PARTUUID -o value /dev/disk/by-id/^^ata_disk^^-part3) \
    /boot ext4 noatime,nofail,x-systemd.device-timeout=5s 0 1 >> /mnt/system/etc/fstab
echo PARTUUID=$(blkid -s PARTUUID -o value /dev/disk/by-id/^^ata_disk^^-part2) \
    /boot/efi vfat noatime,nofail,x-systemd.device-timeout=5s 0 1 >> /mnt/system/etc/fstab
cat /mnt/system/etc/fstab

Before we start using anything, we should prepare a few necessary files:

echo "^^hostname^^" > /mnt/system/etc/hostname
sed 's/ubuntu/^^hostname^^/' /etc/hosts > /mnt/system/etc/hosts
sed '/cdrom/d' /etc/apt/sources.list > /mnt/system/etc/apt/sources.list
cp /etc/netplan/*.yaml /mnt/system/etc/netplan/

If you are installing via WiFi, you might as well copy your credentials:

mkdir -p /mnt/system/etc/NetworkManager/system-connections/
cp /etc/NetworkManager/system-connections/* /mnt/system/etc/NetworkManager/system-connections/

With chroot we can get the first taste of our new system:

mount --rbind --make-rslave /dev  /mnt/system/dev
mount --rbind --make-rslave /proc /mnt/system/proc
mount --rbind --make-rslave /sys  /mnt/system/sys
chroot /mnt/system/ /bin/bash --login

Now we can update our software:

apt update

Immediately followed with locale and time zone setup:

locale-gen --purge "en_US.UTF-8"
update-locale LANG=en_US.UTF-8 LANGUAGE=en_US
dpkg-reconfigure --frontend noninteractive locales

dpkg-reconfigure tzdata

Now we install Linux image and basic ZFS boot packages:

apt install --yes --no-install-recommends linux-image-generic
apt install --yes zfs-initramfs

Since we’re dealing with encrypted data, our cryptsetup should be also auto mounted:

apt install --yes cryptsetup keyutils

echo "system UUID=$(blkid -s UUID -o value /dev/disk/by-id/^^ata_disk^^-part1) \
    none luks,discard,initramfs,keyscript=decrypt_keyctl" >> /etc/crypttab

cat /etc/crypttab

Now we get grub started:

apt install --yes grub-efi-amd64

And update our boot environment again (seeing errors is nothing unusual):

update-initramfs -u -k all

And then we finalize our grub setup:

update-grub
grub-install --target=x86_64-efi --efi-directory=/boot/efi \
    --bootloader-id=Ubuntu --recheck --no-floppy

Finally we get the rest of desktop system:

apt-get install --yes ubuntu-desktop samba linux-headers-generic
apt dist-upgrade --yes

We can omit creation of the swap dataset but I always find it handy:

zfs create -V 4G -b $(getconf PAGESIZE) -o compression=off -o logbias=throughput \
    -o sync=always -o primarycache=metadata -o secondarycache=none system/swap
mkswap -f /dev/zvol/system/swap
echo "/dev/zvol/system/swap none swap defaults 0 0" >> /etc/fstab
echo RESUME=none > /etc/initramfs-tools/conf.d/resume

If one is so inclined, /home directory can get a separate dataset too:

rmdir /home
zfs create -o mountpoint=/home system/home

Only remaining thing before restart is to create user:

adduser ^^user^^
usermod -a -G adm,cdrom,dip,lpadmin,plugdev,sambashare,sudo ^^user^^
chown -R ^^user^^:^^user^^ /home/^^user^^

As install is ready, we can exit our chroot environment and unmount our new environment. If unmount fails, just repeat it until it doesn’t. :)

exit
umount -R /mnt/system
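
The "repeat it until it doesn’t" part can also be left to the shell. This is a minimal sketch of a bounded retry; the helper name, attempt count, and delay are arbitrary choices of mine.

```shell
# Sketch of a bounded retry; the helper name and limits are illustrative.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$(( n + 1 ))
        sleep "$delay"
    done
}

# Usage (as root, outside the chroot):
#   retry 5 2 umount -R /mnt/system
```

Bounding the number of attempts avoids looping forever if something still has the mount busy.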

Finally we can correct root’s mount point and reboot:

zfs set mountpoint=/ system/root
reboot

Assuming nothing went wrong, your UEFI system is now ready.


[2019-10-27: Added --make-rslave]


PS: There are versions of this guide using the native ZFS encryption for other Ubuntu versions: 21.10 and 20.04

PPS: For LUKS-based ZFS setup, check the following posts: 20.04, 19.10, and 18.10.

UEFI Install for Root ZFS Ubuntu 18.10

There is a newer version of this guide for Ubuntu 19.04.


Booting ZFS Ubuntu off MBR is a story I already told. But what if we want an encrypted UEFI ZFS setup?

Well, it’s quite similar to the previous steps and again just a derivation of the ZFS-on-Linux project’s instructions.

As before, we first need to get into root prompt:

sudo -i

Followed by getting a few basic packages ready:

apt-add-repository universe
apt update
apt install --yes debootstrap gdisk zfs-initramfs

Disk setup is quite simple with only two partitions:

sgdisk --zap-all             /dev/disk/by-id/^^ata_disk^^

sgdisk -n2:1M:+511M -t2:EF00 /dev/disk/by-id/^^ata_disk^^
sgdisk -n1:0:0      -t1:8300 /dev/disk/by-id/^^ata_disk^^

sgdisk --print               /dev/disk/by-id/^^ata_disk^^
 Number  Start (sector)    End (sector)  Size       Code  Name
    1         1050624        67108830   31.5 GiB    8300
    2            2048         1050623   512.0 MiB   8300

I believe full disk encryption should be a no-brainer so of course we set up LUKS:

cryptsetup luksFormat -qc aes-xts-plain64 -s 512 -h sha256 /dev/disk/by-id/^^ata_disk^^-part1
cryptsetup luksOpen /dev/disk/by-id/^^ata_disk^^-part1 luks1

Creating ZFS stays the same as before:

zpool create -o ashift=12 -O atime=off -O canmount=off -O compression=lz4 -O normalization=formD \
    -O xattr=sa -O mountpoint=none rpool /dev/mapper/luks1
zfs create -o canmount=noauto -o mountpoint=/mnt/rpool/ rpool/system
zfs mount rpool/system

Getting basic installation on our disks follows next:

debootstrap cosmic /mnt/rpool/
zfs set devices=off rpool
zfs list

And then we setup EFI boot partition:

mkdosfs -F 32 -n EFI /dev/disk/by-id/^^ata_disk^^-part2
mount /dev/disk/by-id/^^ata_disk^^-part2 /mnt/rpool/boot/

We need to ensure boot partition auto-mounts:

echo PARTUUID=$(blkid -s PARTUUID -o value /dev/disk/by-id/^^ata_disk^^-part2) /boot vfat noatime,nofail,x-systemd.device-timeout=5s 0 1 >> /mnt/rpool/etc/fstab
cat /mnt/rpool/etc/fstab

Before we start using anything, we should prepare a few necessary files:

cp /etc/hostname /mnt/rpool/etc/hostname
cp /etc/hosts /mnt/rpool/etc/hosts
cp /etc/netplan/*.yaml /mnt/rpool/etc/netplan/
sed '/cdrom/d' /etc/apt/sources.list > /mnt/rpool/etc/apt/sources.list

If you are dual-booting system with Windows, do consider turning off UTC BIOS time:

echo UTC=no >> /mnt/rpool/etc/default/rcS

With chroot we can get the first taste of our new system:

mount --rbind /dev  /mnt/rpool/dev
mount --rbind /proc /mnt/rpool/proc
mount --rbind /sys  /mnt/rpool/sys
chroot /mnt/rpool/ /bin/bash --login

Now we can update our software:

apt update

Immediately followed with locale and time zone setup:

locale-gen --purge "en_US.UTF-8"
update-locale LANG=en_US.UTF-8 LANGUAGE=en_US
dpkg-reconfigure --frontend noninteractive locales

dpkg-reconfigure tzdata

Now we install Linux image and basic ZFS boot packages:

apt install --yes --no-install-recommends linux-image-generic
apt install --yes zfs-initramfs

Since we’re dealing with encrypted data, our cryptsetup should be also auto mounted:

apt install --yes cryptsetup

echo "luks1 UUID=$(blkid -s UUID -o value /dev/disk/by-id/^^ata_disk^^-part1) none luks,discard,initramfs" >> /etc/crypttab
cat /etc/crypttab

Now we get grub started:

apt install --yes grub-efi-amd64

And update our boot environment again (seeing errors is nothing unusual):

update-initramfs -u -k all

And then we finalize our grub setup:

update-grub
grub-install --target=x86_64-efi --efi-directory=/boot --bootloader-id=ubuntu --recheck --no-floppy

Finally we get the rest of desktop system:

apt-get install --yes ubuntu-desktop samba linux-headers-generic
apt dist-upgrade --yes

We can omit creation of the swap dataset but I always find it handy:

zfs create -V 4G -b $(getconf PAGESIZE) -o compression=off -o logbias=throughput -o sync=always \
    -o primarycache=metadata -o secondarycache=none rpool/swap
mkswap -f /dev/zvol/rpool/swap
echo "/dev/zvol/rpool/swap none swap defaults 0 0" >> /etc/fstab
echo RESUME=none > /etc/initramfs-tools/conf.d/resume

If one is so inclined, /home directory can get a separate dataset too:

rmdir /home
zfs create -o mountpoint=/home rpool/data

Only remaining thing before restart is to create user:

adduser ^^user^^
usermod -a -G adm,cdrom,dip,lpadmin,plugdev,sambashare,sudo ^^user^^
chown -R ^^user^^:^^user^^ /home/^^user^^

As install is ready, we can exit our chroot environment and reboot:

exit
reboot

You will get stuck after the password prompt as our mountpoint for system dataset is wrong. That’s easy to correct:

zfs set mountpoint=/ rpool/system
exit
reboot

Assuming nothing went wrong, your UEFI system is now ready.


PS: There are versions of this guide using the native ZFS encryption for other Ubuntu versions: 21.10 and 20.04

PPS: For LUKS-based ZFS setup, check the following posts: 20.04, 19.10, and 19.04.

Setting up Encrypted Ubuntu 18.10 ZFS Desktop

I have already explained how I deal with ZFS mirror setup on Ubuntu 18.10. But what about laptops that generally come with a single drive?

Well, as before, basic instructions are available from the ZFS-on-Linux project. However, they do have a certain way of doing things I don’t necessarily subscribe to. Here is my way of setting this up. As always, it’s best to set up remote access so you can copy/paste, as the steps are numerous.

As before, we first need to get into root prompt:

sudo -i

Followed by getting a few basic packages ready:

apt-add-repository universe
apt update
apt install --yes debootstrap gdisk zfs-initramfs

We setup disks essentially the same way as in previous guide:

sgdisk --zap-all                 /dev/disk/by-id/^^ata_disk^^

sgdisk -a1 -n3:34:2047  -t3:EF02 /dev/disk/by-id/^^ata_disk^^
sgdisk     -n2:1M:+511M -t2:8300 /dev/disk/by-id/^^ata_disk^^
sgdisk     -n1:0:0      -t1:8300 /dev/disk/by-id/^^ata_disk^^

sgdisk --print                   /dev/disk/by-id/^^ata_disk^^
 …
 Number  Start (sector)    End (sector)  Size       Code  Name
    1         1050624        67108830   31.5 GiB    8300
    2            2048         1050623   512.0 MiB   8300
    3              34            2047   1007.0 KiB  EF02

Because we want encryption, we need to setup LUKS:

cryptsetup luksFormat -qc aes-xts-plain64 -s 512 -h sha256 /dev/disk/by-id/^^ata_disk^^-part1
cryptsetup luksOpen /dev/disk/by-id/^^ata_disk^^-part1 luks1

Unlike in the last guide, this time I want to have a bit of separation. Dataset system will contain the whole system, while data will contain only the home directories. Again, if you want to split it all, follow the original guide:

zpool create -o ashift=12 -O atime=off -O canmount=off -O compression=lz4 -O normalization=formD \
    -O xattr=sa -O mountpoint=none rpool /dev/mapper/luks1
zfs create -o canmount=noauto -o mountpoint=/mnt/rpool/ rpool/system
zfs mount rpool/system

We should also setup the boot partition:

mke2fs -Ft ext2 /dev/disk/by-id/^^ata_disk^^-part2
mkdir /mnt/rpool/boot/
mount /dev/disk/by-id/^^ata_disk^^-part2 /mnt/rpool/boot/

Now we can get basic installation onto our disks:

debootstrap cosmic /mnt/rpool/
zfs set devices=off rpool
zfs list

Before we start using it, we prepare few necessary files:

cp /etc/hostname /mnt/rpool/etc/hostname
cp /etc/hosts /mnt/rpool/etc/hosts
cp /etc/netplan/*.yaml /mnt/rpool/etc/netplan/
sed '/cdrom/d' /etc/apt/sources.list > /mnt/rpool/etc/apt/sources.list

With chroot we can get the first taste of our new system:

mount --rbind /dev  /mnt/rpool/dev
mount --rbind /proc /mnt/rpool/proc
mount --rbind /sys  /mnt/rpool/sys
chroot /mnt/rpool/ /bin/bash --login

Now we can update our software and perform locale and time zone setup:

apt update

locale-gen --purge "en_US.UTF-8"
update-locale LANG=en_US.UTF-8 LANGUAGE=en_US
dpkg-reconfigure --frontend noninteractive locales

dpkg-reconfigure tzdata

Now we install Linux image and basic ZFS boot packages:

apt install --yes --no-install-recommends linux-image-generic
apt install --yes zfs-initramfs

Since we’re dealing with encrypted data, our cryptsetup should be also auto mounted:

apt install --yes cryptsetup

echo "luks1 UUID=$(blkid -s UUID -o value /dev/disk/by-id/^^ata_disk^^-part1) none luks,discard,initramfs" >> /etc/crypttab
cat /etc/crypttab

And of course, we need to auto-mount our boot partition too:

echo "UUID=$(blkid -s UUID -o value /dev/disk/by-id/^^ata_disk^^-part2) /boot ext2 noatime 0 2" >> /etc/fstab
cat /etc/fstab

Now we get grub started (do select the WHOLE disk):

apt install --yes grub-pc

And update our boot environment again (seeing errors is nothing unusual):

update-initramfs -u -k all

And then we finalize our grub setup:

update-grub
grub-install /dev/disk/by-id/^^ata_disk^^

Finally we get the rest of desktop system:

apt-get install --yes ubuntu-desktop samba linux-headers-generic
apt dist-upgrade --yes

We can omit creation of the swap dataset but I always find it handy:

zfs create -V 4G -b $(getconf PAGESIZE) -o compression=off -o logbias=throughput -o sync=always \
    -o primarycache=metadata -o secondarycache=none rpool/swap
mkswap -f /dev/zvol/rpool/swap
echo "/dev/zvol/rpool/swap none swap defaults 0 0" >> /etc/fstab
echo RESUME=none > /etc/initramfs-tools/conf.d/resume

And now is good time to swap our /home directory too:

rmdir /home
zfs create -o mountpoint=/home rpool/data

Now we are ready to create the user:

adduser -u 1002 ^^user^^
usermod -a -G adm,cdrom,dip,lpadmin,plugdev,sambashare,sudo ^^user^^
chown -R ^^user^^:^^user^^ /home/^^user^^

Lastly we exit our chroot environment and reboot:

exit
reboot

You will get stuck after the password prompt as our mountpoint for system dataset is wrong. That’s easy to correct:

zfs set mountpoint=/ rpool/system
exit
reboot

Assuming nothing went wrong, your system is now ready.

Expanding Ext4 Volume on ZFS

Due to Dropbox’s idiotic decision to limit file system support drastically for no reason other than to piss people off, I have a small ext4 volume hosted on my ZFS pool.

Originally I made it a bit small (only 8 GB) and got Dropbox complaining. Had I created it as a partition, enlarging it would be an annoying task at best. However, with it exposed as a ZFS block volume, the resize was trivial.

First I simply increased volsize property and then told ext4 to simply use that additional space (resize2fs command):

sudo zfs set volsize=^^16G^^ ^^rpool/data/dropbox^^

sudo resize2fs ^^/dev/zvol/rpool/data/dropbox^^
 resize2fs 1.44.4 (18-Aug-2018)
 Filesystem at /dev/zvol/rpool/data/dropbox is mounted on /home/user/Dropbox; on-line resizing required
 old_desc_blocks = 1, new_desc_blocks = 2
 The filesystem on /dev/zvol/rpool/data/dropbox is now 4194304 (4k) blocks long.

Doesn’t get much easier.
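
As a quick cross-check of the resize2fs output, the reported block count is simply the new volume size divided by the ext4 block size:

```shell
# Check the block count reported by resize2fs: volume size / block size.
VOLSIZE_GIB=16
BLOCK_SIZE=4096
BLOCKS=$(( VOLSIZE_GIB * 1024 * 1024 * 1024 / BLOCK_SIZE ))
echo "$BLOCKS blocks of $BLOCK_SIZE bytes"
```

That gives 4194304 blocks, which matches what resize2fs printed.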

Booting Encrypted ZFS Mirror on Ubuntu 18.10

As I was setting up my new Linux machine with two disks, I decided to forgo my favorite Linux Mint and give Ubuntu another try. Main reason? ZFS of course.

Ubuntu already has a quite decent guide for ZFS setup but it’s slightly lacking in the mirroring department. So, here I will list steps that follow their approach closely but with slight adjustments as not only I want encrypted setup but also a proper ZFS mirror setup. If you need a single disk ZFS setup, stick with the original guide.

After booting into installation, we can go for Try Ubuntu and open a terminal. My strong suggestion would be to install openssh-server package first and connect to it remotely because that allows for copy/paste:

passwd
 Changing password for ubuntu.
 (current) UNIX password: ^^(empty)^^
 Enter new UNIX password: ^^password^^
 Retype new UNIX password: ^^password^^
 passwd: password updated successfully

sudo apt install --yes openssh-server

Regardless if you continue directly or you connect via SSH (username is ubuntu), the first task is to get onto root prompt and never leave it again. :)

sudo -i

To get the ZFS on, we need Internet connection and extra repository:

sudo apt-add-repository universe
apt update

Now we can finally install ZFS, partitioning utility, and an installation tool:

apt install --yes debootstrap gdisk zfs-initramfs

First we clean the partition table on disks followed by a few partition definitions (do change ID to match your disks):

sgdisk --zap-all /dev/disk/by-id/^^ata_disk1^^
sgdisk --zap-all /dev/disk/by-id/^^ata_disk2^^

sgdisk -a1 -n2:34:2047 -t2:EF02 /dev/disk/by-id/^^ata_disk1^^
sgdisk -a1 -n2:34:2047 -t2:EF02 /dev/disk/by-id/^^ata_disk2^^

sgdisk     -n3:1M:+512M -t3:EF00 /dev/disk/by-id/^^ata_disk1^^
sgdisk     -n3:1M:+512M -t3:EF00 /dev/disk/by-id/^^ata_disk2^^

sgdisk     -n4:0:+512M  -t4:8300 /dev/disk/by-id/^^ata_disk1^^
sgdisk     -n4:0:+512M  -t4:8300 /dev/disk/by-id/^^ata_disk2^^

sgdisk     -n1:0:0      -t1:8300 /dev/disk/by-id/^^ata_disk1^^
sgdisk     -n1:0:0      -t1:8300 /dev/disk/by-id/^^ata_disk2^^

After all these we should end up with both disks showing 4 distinct partitions:

sgdisk --print /dev/disk/by-id/^^ata_disk1^^
 …
 Number  Start (sector)    End (sector)  Size       Code  Name
    1         2099200        67108830   31.0 GiB    8300
    2              34            2047   1007.0 KiB  EF02
    3            2048         1050623   512.0 MiB   EF00
    4         1050624         2099199   512.0 MiB   8300

With partitioning done, it’s time to encrypt our disks and mount them (note that we only encrypt the first partition, not the whole disk):

cryptsetup luksFormat -c aes-xts-plain64 -s 512 -h sha256 /dev/disk/by-id/^^ata_disk1^^-part1
cryptsetup luksFormat -c aes-xts-plain64 -s 512 -h sha256 /dev/disk/by-id/^^ata_disk2^^-part1

cryptsetup luksOpen /dev/disk/by-id/^^ata_disk1^^-part1 luks1
cryptsetup luksOpen /dev/disk/by-id/^^ata_disk2^^-part1 luks2

Finally we can create our pool (rpool is a “standard” name) consisting of both encrypted devices:

zpool create -o ashift=12 -O atime=off -O compression=lz4 \
    -O normalization=formD -O xattr=sa -O mountpoint=/ -R /mnt/rpool \
    rpool mirror /dev/mapper/luks1 /dev/mapper/luks2

There is an advantage to creating fine-grained datasets as the official guide instructs, but I personally don’t do it. Having one big free-for-all pile is OK for me - anything of any significance I keep on my network drive anyhow, where I have a properly set up ZFS with rights, quotas, and all other goodies.

Since we are using LUKS encryption, we do need to mount 4th partition too. We’ll do it for both disks and deal with syncing them later:

mkdir /mnt/rpool/boot
mke2fs -t ext2 /dev/disk/by-id/^^ata_disk1^^-part4
mount /dev/disk/by-id/^^ata_disk1^^-part4 /mnt/rpool/boot

mkdir /mnt/rpool/boot2
mke2fs -t ext2 /dev/disk/by-id/^^ata_disk2^^-part4
mount /dev/disk/by-id/^^ata_disk2^^-part4 /mnt/rpool/boot2

Now we can finally start copying our Linux (do check for current release codename using lsb_release -a). This will take a while:

debootstrap ^^cosmic^^ /mnt/rpool/

Once done, turn off devices flag on pool and check if data has been written or we messed the paths up:

zfs set devices=off rpool

zfs list
 NAME    USED  AVAIL  REFER  MOUNTPOINT
 rpool   218M  29.6G   217M  /mnt/rpool

Since our system is bare, we do need to prepare a few configuration files:

cp /etc/hostname /mnt/rpool/etc/hostname
cp /etc/hosts /mnt/rpool/etc/hosts
cp /etc/netplan/*.yaml /mnt/rpool/etc/netplan/
sed '/cdrom/d' /etc/apt/sources.list > /mnt/rpool/etc/apt/sources.list

Finally we get to try out our new system:

mount --rbind /dev  /mnt/rpool/dev
mount --rbind /proc /mnt/rpool/proc
mount --rbind /sys  /mnt/rpool/sys
chroot /mnt/rpool/ /bin/bash --login

Once in our new OS, a few further updates are in order:

apt update

locale-gen --purge "^^en_US.UTF-8^^"
update-locale LANG=^^en_US.UTF-8^^ LANGUAGE=^^en_US^^
dpkg-reconfigure --frontend noninteractive locales

dpkg-reconfigure tzdata

Now we need to install linux image and headers:

apt install --yes --no-install-recommends linux-image-generic linux-headers-generic

Then we configure booting ZFS:

apt install --yes zfs-initramfs
echo UUID=$(blkid -s UUID -o value /dev/disk/by-id/^^ata_disk1^^-part4) /boot  ext2 noatime 0 2 >> /etc/fstab
echo UUID=$(blkid -s UUID -o value /dev/disk/by-id/^^ata_disk2^^-part4) /boot2 ext2 noatime 0 2 >> /etc/fstab

And disk decryption:

apt install --yes cryptsetup
echo "luks1 UUID=$(blkid -s UUID -o value /dev/disk/by-id/^^ata_disk1^^-part1) none luks,discard,initramfs" >> /etc/crypttab
echo "luks2 UUID=$(blkid -s UUID -o value /dev/disk/by-id/^^ata_disk2^^-part1) none luks,discard,initramfs" >> /etc/crypttab
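
Because root lives on ZFS, the cryptsetup hook cannot trace the root device back to the LUKS volumes on its own, so the initramfs option in each crypttab line is what gets them unlocked at boot. The following is a minimal sketch for double-checking the file before rebuilding the initramfs. The helper name check_crypttab is made up for this illustration and the file path is passed in as an argument.

```shell
# List crypttab entries whose options field lacks the "initramfs" flag.
# check_crypttab is a hypothetical helper; pass the file to inspect.
check_crypttab() {
    # Skip blank lines and comments; field 4 holds the comma-separated options.
    awk 'NF > 0 && $1 !~ /^#/ && $4 !~ /(^|,)initramfs(,|$)/ { print $1 }' "$1"
}

# Example: check_crypttab /etc/crypttab
# No output means every entry carries the flag.
```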

And install grub bootloader (select both disks - not partitions!):

apt install --yes grub-pc

Followed by update of boot environment (some errors are ok):

update-initramfs -u -k all
 update-initramfs: Generating /boot/initrd.img-4.18.0-12-generic
 cryptsetup: ERROR: Couldn't resolve device rpool
 cryptsetup: WARNING: Couldn't determine root device

Now we update the grub and fix its config (only needed if you are not using sub-datasets):

update-grub
sed -i "s^root=ZFS=rpool/^root=ZFS=rpool^g" /boot/grub/grub.cfg
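
The substitution above removes the slash wherever root=ZFS=rpool/ appears, which is why it must not be used with sub-datasets. As a sketch of a more cautious variant (my own assumption, not something the official guide prescribes), the pattern below only strips the slash when nothing but whitespace or the end of line follows it. The function name fix_root_dataset is hypothetical and it works as a stdin-to-stdout filter, so it can be tried on sample text first.

```shell
# Strip the trailing slash from root=ZFS=rpool/ only when nothing follows it,
# so a sub-dataset path such as rpool/ROOT/ubuntu is left alone.
# fix_root_dataset is a hypothetical name; it filters stdin to stdout.
fix_root_dataset() {
    sed -E 's#root=ZFS=rpool/([[:space:]]|$)#root=ZFS=rpool\1#g'
}

# Example on a made-up kernel line:
#   echo 'linux /vmlinuz root=ZFS=rpool/ ro quiet' | fix_root_dataset
```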

Now we get to copy all boot files to second disk:

cp -rp /boot/* /boot2/

With grub install we’re getting close to the end of story:

grub-install /dev/disk/by-id/^^ata_disk1^^
 Installing for i386-pc platform.
 Installation finished. No error reported.

grub-install /dev/disk/by-id/^^ata_disk2^^
 Installing for i386-pc platform.
 Installation finished. No error reported.

Now we install full GUI and upgrade whatever needs it (takes a while):

apt install --yes ubuntu-desktop samba
apt dist-upgrade --yes

As this probably updated grub, we need to both correct config (only if we have bare dataset) and copy files to the other boot partition (this has to be repeated on every grub update):

sed -i "s^root=ZFS=rpool/^root=ZFS=rpool^g" /boot/grub/grub.cfg
cp -rp /boot/* /boot2/
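
Since the copy has to be repeated after every grub update, it may be worth wrapping it in a small helper that also verifies the result. This is only a sketch: sync_boot is a made-up name, the two arguments default to the paths used in this guide, and diff will also complain about stale files that exist only on the second partition.

```shell
# Copy the contents of one boot directory into another and report differences.
# sync_boot is a hypothetical helper; defaults match the paths used above.
sync_boot() {
    src=${1:-/boot}
    dst=${2:-/boot2}
    # "src/." copies the directory contents (dotfiles included) into dst.
    cp -rp "$src"/. "$dst"/ && diff -r "$src" "$dst"
}

# Example: sync_boot            (uses /boot and /boot2)
# Example: sync_boot /boot /mnt/otherboot
```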

Having some swap is always a good idea:

zfs create -V 4G -b $(getconf PAGESIZE) -o compression=off -o logbias=throughput -o sync=always \
    -o primarycache=metadata -o secondarycache=none rpool/swap

mkswap -f /dev/zvol/rpool/swap
echo /dev/zvol/rpool/swap none swap defaults 0 0 >> /etc/fstab
echo RESUME=none > /etc/initramfs-tools/conf.d/resume

Almost there, it’s time to set root password:

passwd

And to create our user for desktop environment:

adduser ^^user^^
usermod -a -G adm,cdrom,dip,lpadmin,plugdev,sambashare,sudo ^^user^^

Finally, we can reboot (don’t forget to remove CD) and enjoy our system:

exit
reboot

Encrypted ZFS (A Slightly Parallel Edition)

Initial encryption of ZFS pool does require a bit of work - especially when it comes to initial disk randomization. Yes, you could skip it but then encrypted bits are going to stick out. It’s best to randomize it all before even doing anything ZFS related.

The first problem I had with the old setup was the need to start randomizing each disk separately. Since the operation takes a while (days!), this usually resulted in me starting all dd commands concurrently, thus starving them of resources (mostly CPU for random number generation).

As my CPU can generate enough random data to saturate two disks, it made sense to parallelize with xargs, using the serial number (diskid) of each disk as input. While using /dev/sd* would work, I tend to explicitly specify each disk’s serial number as it’s not destructive if run on the wrong machine. I consider it a protection against myself. :)

The final command still takes ages but it requires only one window and it will take care to keep only two disks busy at a time:

echo "^^DISK-ID-123^^ ^^DISK-ID-456^^ ^^DISK-ID-789^^ ^^DISK-ID-012^^" | \
  tr ' ' '\n' | xargs -I '{}' -P 2 \
  dd if=/dev/urandom of=/dev/diskid/{} bs=1M
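
Given how destructive this is, a dry run can be reassuring. The sketch below prints the dd commands that xargs would launch instead of running them. The function name plan_randomize is invented for this example, and because two jobs run at a time the lines may come out in a different order than the input.

```shell
# Print, rather than run, the dd command for each disk ID given as an argument.
# plan_randomize is a hypothetical name; nothing is written to any disk.
plan_randomize() {
    printf '%s\n' "$@" | xargs -I '{}' -P 2 \
        echo dd if=/dev/urandom of=/dev/diskid/{} bs=1M
}

# Example: plan_randomize DISK-ID-123 DISK-ID-456 DISK-ID-789 DISK-ID-012
```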

After drives are “cleaned”, I do the encryption (one-by-one this time):

echo "^^DISK-ID-123^^ ^^DISK-ID-456^^ ^^DISK-ID-789^^ ^^DISK-ID-012^^" | \
  tr ' ' '\n' | xargs -I '{}' \
  geli init -e AES-XTS -l 128 -s 4096 '/dev/diskid/{}'

There used to be times when I encrypted each disk with a separate password and that’s still a bit more secure than having a single one. However, with multiple passwords comes a great annoyance. These days I only have a single password for all the disks in the same pool. It makes my life MUCH easier.

In theory, somebody cracking one disk will immediately get access to all my data but in practice it makes no big difference. If somebody decrypted one disk, they either: found a gaping hole in Geli and/or underlying encryption and thus the other disks will suffer the same fate and there’s nothing I can do; or they intercepted one of my keys. As I always use all the keys together, chances are that intercepting one is the same effort as intercepting them all. So I trade a bit of security for a major simplification.

Now we get to attach all encrypted drives:

echo "^^DISK-ID-123^^ ^^DISK-ID-456^^ ^^DISK-ID-789^^ ^^DISK-ID-012^^" | \
  tr ' ' '\n' | xargs -I '{}' \
  geli attach '/dev/diskid/{}'

And the final step is creating ZFS pool, using RAIDZ2 and allowing for loss of two disks before data is compromised:

zpool create \
  -o autoexpand=on -m none -O compression=gzip-7 -O atime=off \
  -O utf8only=on -O normalization=formD -O casesensitivity=sensitive \
  ^^Data^^ raidz2 \
  /dev/diskid/^^DISK-ID-123^^.eli  /dev/diskid/^^DISK-ID-456^^.eli /dev/diskid/^^DISK-ID-789^^.eli /dev/diskid/^^DISK-ID-012^^.eli

And that’s it - pool is ready for all the data you can throw at it.
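
For a rough idea of how much space such a pool offers, the nominal raidz2 capacity is simply the disk count minus the two disks’ worth of parity. The helper below is a back-of-the-envelope sketch with a made-up name. Real pools report somewhat less once padding and metadata are accounted for.

```shell
# Nominal raidz2 capacity: two disks' worth of space goes to parity.
# raidz2_usable is a hypothetical helper: raidz2_usable DISKS SIZE_PER_DISK
raidz2_usable() {
    echo $(( ($1 - 2) * $2 ))
}

# raidz2_usable 4 20   # prints 40
# raidz2_usable 6 20   # prints 80
```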


PS: Yes, I am still using Geli - native ZFS encryption hasn’t found its way to FreeBSD yet.

PPS: If machine goes down, it is enough to re-attach Geli disks followed by restart of the ZFS daemon:

echo "^^DISK-ID-123^^ ^^DISK-ID-456^^ ^^DISK-ID-789^^ ^^DISK-ID-012^^" | \
  tr ' ' '\n' | xargs -I '{}' \
  geli attach '/dev/diskid/{}'

/etc/rc.d/zfs onestart

ZFS Record Size For Backup Machine

When it came to setting up my remote backup machine, only three things were important: use of 4K disks, two-disk redundancy (raidz2), and reasonably efficient storage of variously sized files. Reading around the internet led me to believe volblocksize tweaking was what I needed.

However, unless you create a zvol, that knob is actually not available. The only available property impacting file storage capacity is recordsize. Therefore I decided to try out a couple of record sizes and see how storage capacity compares.

For the purpose of the test I decided to create a virtual machine with six extra 20 GB disks. Yes, using a virtual machine was not ideal, but I was interested in relative results and not absolute numbers, so this would do. And mind you, I wasn’t interested in speed but only in space usage, so again a virtual machine seemed like a perfect environment.

Instead of properly testing with real files, I created 100000 files that were about 0.5K, 33000 files about 5K, 11000 files about 50K, 3700 files about 500K, 1200 files about 5M, and finally about 400 files around 50M. Essentially, there were six file sizes with each set being one decade bigger but happening only a third as often. The exact size for each file was actually randomly chosen to ensure some variety.
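
To give an idea of how such a data set can be produced, here is a minimal sketch of a generator. It is not the script used for the test: the function name make_files, the 50% to 150% size spread, and the seed value are all assumptions made for illustration. The seed is fixed so that repeated runs with the same awk produce the same file sizes.

```shell
# Create COUNT files of roughly SIZE bytes (50% to 150% of SIZE) in DIR.
# make_files and its arguments are hypothetical; awk's srand() gets a fixed
# seed so every run yields the same sequence of sizes.
make_files() {
    dir=$1 count=$2 size=$3 seed=${4:-42}
    mkdir -p "$dir"
    awk -v n="$count" -v s="$size" -v seed="$seed" \
        'BEGIN { srand(seed); for (i = 0; i < n; i++) printf "%d %d\n", i, s / 2 + rand() * s }' |
    while read -r i bytes; do
        # Fill each file with random data so compression cannot shrink it.
        head -c "$bytes" /dev/urandom > "$dir/file-$i.bin"
    done
}

# Example: make_files ./test/tiny 100000 512
# Example: make_files ./test/small 33000 5120
```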

After repeating the test three times with each record size, for both the 4-disk and 6-disk setups, I got the following results:

              4 disks              6 disks
Record size   Used     Available   Used     Available
4K            -        0           61,557   17,064
8K            -        0           61,171   17,450
16K           34,008   4,191       31,223   47,398
32K           34,025   4,173       31,213   47,408
64K           31,300   6,899       31,268   47,353
128K          31,276   6,923       31,180   47,441
256K          30,719   7,481       31,432   47,189
512K          31,069   7,130       31,814   46,807
1024K         30,920   7,279       31,714   46,907

Two things of interest to note here. The first one is that a small record size doesn’t really help at all. The quantity of metadata needed goes well over the available disk space in the 4-disk case and causes extremely inefficient storage for 6 disks. Although the test data set is 30.2 GB, with overhead the occupancy goes into 60+ GB territory. Quite inefficient.
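
The overhead can be put into numbers with a one-line calculation. The sketch below assumes the table values are in MB and takes the 30.2 GB data set as 30,200 MB. Both are my reading of the figures rather than something stated outright, and overhead_pct is a made-up helper name.

```shell
# Percent of extra space used on top of the logical data size.
# overhead_pct is a hypothetical helper: overhead_pct USED LOGICAL (same units).
overhead_pct() {
    awk -v used="$1" -v logical="$2" \
        'BEGIN { printf "%.1f\n", (used - logical) / logical * 100 }'
}

# overhead_pct 61557 30200   # 4K records, 6 disks: prints 103.8
# overhead_pct 31180 30200   # 128K records, 6 disks: prints 3.2
```

Under those assumptions the 4K record size roughly doubles the space used, while the default 128K adds only a few percent.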

The default 128K value is actually quite well selected. While my (artificial) data set has shown a bit better result with the larger record sizes, essentially everything at 64K and over doesn’t fare too badly.

PS: An Excel file with the raw data and an example script is available for download.

PPS: Yes, the script generates the same random numbers every time - this was done intentionally so that the same amount of logical space is used with every test. Do note that this doesn’t translate to the same physical space usage, as a slightly different amount of metadata will be written each time (mostly due to transaction group timing).