Post-Quantum Cryptography - The Last Round

See also round 1 and round 2.


Well, after a long time, NIST announced the first four quantum-resistant cryptographic algorithms.

Both of my StarTrek inspired favorites are actually in. CRYSTALS-Kyber is the one selected for a key exchange purposes and I fully expect to see it in some future OpenSSH version. Since dealing with ED25519 would require a quantum computer much bigger than currently available, eliptic curves are still probably the best default choice. However, you don’t want to wait the last moment to switch. Considering there are still some system that only support RSA (yes Mikrotik, I’m talking about you), switch will take a while.

CRYSTALS-Dilithium, a part of the same family, got selected as one of three suggested digital signature algorithms. From practicality side, it will rarely, if ever, be used alone as its signature output is literaly larger than a KB. That said, there are a few suggested modes (e.g., Dilithium3-AES) keeping the reasonable key size AES provides while retaining quantum assurances of a lattice algorthm.

FALCON was also selected despite difficulty of a correct implementation and a huge potential for side-channel attacks. I guess a small memory footprint and impressive performance in embedded applications were enough to ensure its place among finalists.

Lastly, a SPHINCS+ came out of blue (at least for me) to take its place as the last of the finalists. Since it is slower and larger than either of the other two finalists, it’s hardly a first choice. Regardless, using a different math approach compared to other two finalists was valuable enough to get it in.

NewHope, one of the round two finalists and already used by Chrome ended up like the recent Star Wars sequels. An early succes but ultimately not good enough to pursue.

CRC-8 Nibble Lookup Table

A few weeks ago I left you with the following CRC-8 code for microcontrollers:

uint8_t crc8(uint8_t* data, uint8_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;
        for (uint8_t i = 0; i < 8; i++) {
            if (crc & 0x80) {
                crc = (uint8_t)(crc << 1) ^ 0x2F;
            } else {
                crc <<= 1;
            }
        }
    }
    return crc;
}

However, this code is a bit lacking when it comes to the performance. And yes, adding a lookup table would speed things up by a lot since it would allow use of precalculated values instead of shifting stuff around.

uint8_t crc8(uint8_t* data, uint8_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;
        crc = Crc8Lut[crc];
    }
    return crc;
}

However, 256 bytes such lookup table requires is too much when dealing with a microcontroller that often has only that amount available for all its functionality. Some, like my favorite small micro PIC12F1501) might have as little as 64 bytes of RAM.

What we need is a smaller lookup table.

Fortunately, this is actually quite possible if we calculate lookup table independently for the high and low nibble.

for (uint8_t i = 0; i < 16; i++) {
    uint8_t crc = (uint8_t)i;
    for (uint8_t j = 1; j <= 8; j++) {
        if (crc & 0x80) {
            crc = (uint8_t)((crc << 1) ^ poly);
        } else {
            crc <<= 1;
        }
    }
    Crc8LutL[i] = crc;
}
for (uint8_t i = 0; i < 16; i++) {
    uint8_t crc = (uint8_t)i << 4;
    for (uint8_t j = 1; j <= 8; j++) {
        if (crc & 0x80) {
            crc = (uint8_t)((crc << 1) ^ poly);
        } else {
            crc <<= 1;
        }
    }
    Crc8LutH[i] = crc;
}

This shrinks our lookup table to 32 bytes and allows us to use shift-less CRC code. And yes, there is one shift operation in there but XC8 compiler should actually optimize that one away in most cases.

// Polynomial 0x2F
const uint8_t Crc8LutL[] = { 0x00, 0x2F, 0x5E, 0x71, 0xBC, 0x93, 0xE2, 0xCD, 0x57, 0x78, 0x09, 0x26, 0xEB, 0xC4, 0xB5, 0x9A, };
const uint8_t Crc8LutH[] = { 0x00, 0xAE, 0x73, 0xDD, 0xE6, 0x48, 0x95, 0x3B, 0xE3, 0x4D, 0x90, 0x3E, 0x05, 0xAB, 0x76, 0xD8, };

uint8_t crc8(uint8_t* data, uint8_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;
        crc = Crc8LutL[crc & 0x0F] ^ Crc8LutH[crc >> 4];
    }
    return crc;
}

Implemented on a microcontroller using XC8 compiler, this code fits in 80 words of program memory and uses just 4 extra bytes of RAM (const means your table goes to program memory).

You might opt to go without const and in that case code needs 72 words of program memory and whooping 35 bytes of RAM. As I’m usually running out of RAM before I run out of program memory, I don’t find that particular compromise worth it.

And again, one could squeeze a few more bytes by implementing this in assembly, but not too many.

As always, the example code is available for download.


PS: For curious ones, a full CRC-8 lookup table implementation uses whooping 287 words of program memory but only 2 bytes of RAM.

OBS Studio Tray Icon Change

When customizing one’s desktop, there are always few applications that resist the change. This time I was fighting OBS Studio under Ubuntu and its dreadful tray icon. It just didn’t fit with my desktop theme.

I searched around and found essentially nothing. Yes, there were pages upon pages of people having the same problem and then other people giving them generalized advice but without anything actionable.

To make long story short, it took understanding Qt theming to find a solution for me. It’s not an ideal solution since it does require modification to the theme that could be lost in the future update but the trick was in getting icons added to the /usr/share/icons/hicolor/ directory.

I simply took my chosen 16x16 icons and placed them at the following locations:

  • /usr/share/icons/hicolor/16x16/status/obs-tray.png
  • /usr/share/icons/hicolor/16x16/status/obs-tray-active.png
  • /usr/share/icons/hicolor/16x16/status/obs-tray-paused.png

For a good measure I also changed their 24x24 and 32x32 variants and that was it. OBS now picked up my icons without an error.


PS: And no, adding them to .icons will not work since it’ll break the rest of the system’s theme. And yes, one could go around that by copying the whole theme to the directory but I found it easier just to copy a few files to the system directory instead.

Selecting 8-bit CRC

Data errors are a reality when communicating between electronic devices. While one can use various hashing algorithms in order to verify data, this is not a luxury easily afforded when working with microcontrollers. For such cases, one often uses just a simple parity, sum, or ideally a CRC algorithm.

I won’t go much into a history and general use of these algorithm as Wikipedia covers it nicely. I will just present you with a problem I had. Which of many CRC algorithms do I actually use for my Microchip PIC microcontroller project?

The first part of rough selection was easy. Due to the limited nature of my PIC microcontroller, the only realistic options were CRCs up to 16 bits in length. After discounting all CRCs that ended on non-byte boundaries, that left me with only CRC-16 and CRC-8.

While CRC-16 is superior in all manners, it’s not only more computationally intensive but it also requires more memory to implement. Both of which are notoriously absent from microcontrollers. While both of those might be a worthy compromise if I had long messages that needed checksumming, my messages were couple of bytes in length. So, CRC-8 it is.

However, CRC-8 has quite a few variants out there. Heck, my own C# library supports over 20 of them. What I needed was a logical way to select the best one. And that came in the form of the excellent “Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks” article.

However, what this article has shown me was that I was wrong in thinking there is a single optimal solution. For example, from charts its easy to see that polynomial 0xA6 is vastly superior when it comes to long messages. It will detect two bit errors (yes, simplifying here a bit) no matter what the length. However, if we want CRC to detect four bit errors, polynomial 0x97 will actually do better - albeit only up to 120 bits (15 bytes). Whether one is better than the other, is now function of message length.

And that sealed it for me. Since my messages were to be short, polynomial 0x97 was actually the right choice for me. Since CRC is defined by more than its polynomial, I went to the repository of CRC knowledge to see if someone implemented the same. And I couldn’t find it. Heck, I couldn’t find any of the polynomials discussed.

It took me a hot minute to see that whitepaper and CRC RevEng page were actually describing the same polynomials in a different way. CRC RevEng used their normal form while whitepaper decided to call them out in reversed reciprocal. My 0x97 was actually 0x2F.

After understanding that, I actually found two existing CRC-8 algorithms using my chosen polynomial: AutoSar and OpenSAFETY. Since AuroSar had an extra step of XORing the output with 0xFF, I went with a slightly shorter OpenSAFETY CRC-8 variant. Even better, I had that algorithm already implemented in C#. Converting it to C for my microcontroller would be a breeze.

However, since my C# algorithm used lookup table to speed things up, that made it really unsuitable for microcontroller implementation. How could I give 256 bytes to the lookup table when I had only 1024 to begin with? No, for this I needed a variant that uses as little memory as possible.

After a while, this is what I ended with:

uint8_t crc8(uint8_t* data, uint8_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;
        for (uint8_t i = 0; i &lt; 8; i++) {
            if (crc & 0x80) {
                crc = (uint8_t)(crc &lt;&lt; 1) ^ 0x2F;
            } else {
                crc &lt;&lt;= 1;
            }
        }
    }
    return crc;
}

Without any optimizations, this code needs only 3 bytes in RAM and 46 words of program memory (37 words with optimizations). And yes, speed does suck a bit but that’s the tradeoff. If speed was of essence, one could always go with lookup tables, no matter the RAM cost.

In any case, my problem was solved and algorithm selected.


PS: Yes, code could be optimized further if we used overflow from STATUS register as that would allow us to have only one crc << 1 but I opted not to do that as to keep code easily transportable between platforms.

PPS: And yes, if one is willing to go into the assembly, you can optimize it even more.

Small(er) Ubuntu Server 22.04 Install on Vultr

For my new Ubuntu Server deployment I decided to go with Intel High Frequency Vultr instance, mostly due to its larger disk allotment. However, going with default Vultr’s deployment image, I ended up with 18 GB of disk occupied. And yes, I could have removed extra stuff I didn’t need (e.g. /usr/local/cuda/ was the prime candidate). However, I decided to go a different route - manual installation.

Illustration

Getting the Ubuntu ISO is easy enough as a wide selection is already available behind ISO Library tab on Server Image selection page. Combine that with noVNC console and you can just click your way through. However, one can also find Shell option within the Help menu giving you access to the bash prompt and allowing for more control.

While noVNC is decent, the lack of copy/paste makes it unwieldy when more then a few commands need to be entered. So, my first task was to SSH into the installation and continue from there. To do this, we have to allow for password login and set the password.

sed -i 's/^#PermitRootLogin.*$/PermitRootLogin yes/' /etc/ssh/sshd_config
systemctl restart sshd
passwd

Now we can connect using any SSH client and do the rest of steps from there.

I like to start by setting a few variables. Here I needed only DISK and HOST.

DISK=/dev/vda
HOST=^^ubuntu^^

Assuming this is a fresh VM, the disk should already be empty but I like to clean it again (just in case) and create a single partition. On servers I quite often skip swap partition and that’s the case here too.

blkdiscard -f $DISK
echo -e "o\nn\np\n1\n\n\nw" | fdisk $DISK
fdisk -l $DISK

Now we can format our partition and mount it into /mnt/install/.

# mkfs.ext4 ${DISK}1
# mkdir /mnt/install
# mount ${DISK}1 /mnt/install/

We will need to install debootstrap to get our installation going.

# apt update ; apt install --yes debootstrap

And now we can finally move installation files to our disk.

debootstrap $(basename `ls -d /cdrom/dists/*/ | grep -v stable | head -1`) /mnt/install/

Before proceeding, we might as well update a few settings.

echo $HOST > /mnt/install/etc/hostname
sed "s/ubuntu/$HOST/" /etc/hosts > /mnt/install/etc/hosts
sed '/cdrom/d' /etc/apt/sources.list > /mnt/install/etc/apt/sources.list
cp /etc/netplan/*.yaml /mnt/install/etc/netplan/

Finally we get to chroot into our newly installed system.

mount --rbind /dev  /mnt/install/dev
mount --rbind /proc /mnt/install/proc
mount --rbind /sys  /mnt/install/sys
chroot /mnt/install \
    /usr/bin/env DISK=$DISK HOST=$HOST \
    bash --login

While optional, I like to update locale settings and set the time zone.

locale-gen --purge "en_US.UTF-8"
update-locale LANG=en_US.UTF-8 LANGUAGE=en_US
dpkg-reconfigure --frontend noninteractive locales
dpkg-reconfigure tzdata

Installing kernel is next.

apt update
apt install --yes --no-install-recommends linux-image-generic linux-headers-generic

Followed by our boot environment packages.

apt install --yes initramfs-tools grub-pc

Setting FSTab will ensure disk is mounted once done.

echo "PARTUUID=$(blkid -s PARTUUID -o value ${DISK}1) \
    / ext4 noatime,nofail,x-systemd.device-timeout=5s 0 1" > /etc/fstab
cat /etc/fstab

Of course, boot wouldn’t be possible without getting images ready.

KERNEL=`ls /usr/lib/modules/ | cut -d/ -f1 | sed 's/linux-image-//'`
update-initramfs -u -k $KERNEL

And finally, boot needs Grub installed to MBR.

update-grub
grub-install ${DISK}

While we could boot our system already, it’s a bit too bare for my taste so I’ll add a basic set of packages.

apt install --yes ubuntu-server-minimal man

Once done, a small update will not hurt.

apt update ; apt dist-upgrade --yes

Of course, we need to ensure we can boot into our system. While one could use passwd to set the root password, I like to use keys for access.

mkdir /root/.ssh
echo '^^ssh-ed25519 AAAA...^^' > /root/.ssh/authorized_keys
chmod 600 -R /root/.ssh

With all set, we can exit our chroot environment.

exit

We might as well unmount our custom install.

mount | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}

And finally we get to reboot. Please note that you should also go to VM Settings and unmount custom ISO to avoid getting into installation again.

reboot

If all steps worked, you will face a pristine Ubuntu installation measuring something around 5 GB. To make it even smaller, one can turn off the swap.

swapoff -a
vi /etc/fstab
rm /swapfile

Regardless, your system is now ready for abuse. :)