Encrypting SD Card Using BitLocker

As I wanted to use SD card to move files from my laptop to Surface Go, it made sense to encryt the data. Easy, just right-click on the drive and turn on BitLocker, I thought. However, my SD card didn’t give me such option. For some reason Microsoft decided to not show this option in context menu.

However, one can still encrypt it using command line. Syntax it a bit annyoying but workable. In my example, I had my SD card mounted as drive S:.

manage-bde.exe -on S: -password -s

To check how the process goes, you can use -status parameter.

manage-bde.exe -status

To make things even better, once encrypted, these disks are usable on Ubuntu 22.04 too. You might just want to install exFAT support.

sudo add-apt-repository universe
sudo apt update
sudo apt install exfat-fuse exfatprogs

Post-Quantum Cryptography - The Last Round

See also round 1 and round 2.


Well, after a long time, NIST announced the first four quantum-resistant cryptographic algorithms.

Both of my StarTrek inspired favorites are actually in. CRYSTALS-Kyber is the one selected for a key exchange purposes and I fully expect to see it in some future OpenSSH version. Since dealing with ED25519 would require a quantum computer much bigger than currently available, eliptic curves are still probably the best default choice. However, you don’t want to wait the last moment to switch. Considering there are still some system that only support RSA (yes Mikrotik, I’m talking about you), switch will take a while.

CRYSTALS-Dilithium, a part of the same family, got selected as one of three suggested digital signature algorithms. From practicality side, it will rarely, if ever, be used alone as its signature output is literaly larger than a KB. That said, there are a few suggested modes (e.g., Dilithium3-AES) keeping the reasonable key size AES provides while retaining quantum assurances of a lattice algorthm.

FALCON was also selected despite difficulty of a correct implementation and a huge potential for side-channel attacks. I guess a small memory footprint and impressive performance in embedded applications were enough to ensure its place among finalists.

Lastly, a SPHINCS+ came out of blue (at least for me) to take its place as the last of the finalists. Since it is slower and larger than either of the other two finalists, it’s hardly a first choice. Regardless, using a different math approach compared to other two finalists was valuable enough to get it in.

NewHope, one of the round two finalists and already used by Chrome ended up like the recent Star Wars sequels. An early succes but ultimately not good enough to pursue.

CRC-8 Nibble Lookup Table

A few weeks ago I left you with the following CRC-8 code for microcontrollers:

uint8_t crc8(uint8_t* data, uint8_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;
        for (uint8_t i = 0; i < 8; i++) {
            if (crc & 0x80) {
                crc = (uint8_t)(crc << 1) ^ 0x2F;
            } else {
                crc <<= 1;
            }
        }
    }
    return crc;
}

However, this code is a bit lacking when it comes to the performance. And yes, adding a lookup table would speed things up by a lot since it would allow use of precalculated values instead of shifting stuff around.

uint8_t crc8(uint8_t* data, uint8_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;
        crc = Crc8Lut[crc];
    }
    return crc;
}

However, 256 bytes such lookup table requires is too much when dealing with a microcontroller that often has only that amount available for all its functionality. Some, like my favorite small micro PIC12F1501) might have as little as 64 bytes of RAM.

What we need is a smaller lookup table.

Fortunately, this is actually quite possible if we calculate lookup table independently for the high and low nibble.

for (uint8_t i = 0; i < 16; i++) {
    uint8_t crc = (uint8_t)i;
    for (uint8_t j = 1; j <= 8; j++) {
        if (crc & 0x80) {
            crc = (uint8_t)((crc << 1) ^ poly);
        } else {
            crc <<= 1;
        }
    }
    Crc8LutL[i] = crc;
}
for (uint8_t i = 0; i < 16; i++) {
    uint8_t crc = (uint8_t)i << 4;
    for (uint8_t j = 1; j <= 8; j++) {
        if (crc & 0x80) {
            crc = (uint8_t)((crc << 1) ^ poly);
        } else {
            crc <<= 1;
        }
    }
    Crc8LutH[i] = crc;
}

This shrinks our lookup table to 32 bytes and allows us to use shift-less CRC code. And yes, there is one shift operation in there but XC8 compiler should actually optimize that one away in most cases.

// Polynomial 0x2F
const uint8_t Crc8LutL[] = { 0x00, 0x2F, 0x5E, 0x71, 0xBC, 0x93, 0xE2, 0xCD, 0x57, 0x78, 0x09, 0x26, 0xEB, 0xC4, 0xB5, 0x9A, };
const uint8_t Crc8LutH[] = { 0x00, 0xAE, 0x73, 0xDD, 0xE6, 0x48, 0x95, 0x3B, 0xE3, 0x4D, 0x90, 0x3E, 0x05, 0xAB, 0x76, 0xD8, };

uint8_t crc8(uint8_t* data, uint8_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;
        crc = Crc8LutL[crc & 0x0F] ^ Crc8LutH[crc >> 4];
    }
    return crc;
}

Implemented on a microcontroller using XC8 compiler, this code fits in 80 words of program memory and uses just 4 extra bytes of RAM (const means your table goes to program memory).

You might opt to go without const and in that case code needs 72 words of program memory and whooping 35 bytes of RAM. As I’m usually running out of RAM before I run out of program memory, I don’t find that particular compromise worth it.

And again, one could squeeze a few more bytes by implementing this in assembly, but not too many.

As always, the example code is available for download.


PS: For curious ones, a full CRC-8 lookup table implementation uses whooping 287 words of program memory but only 2 bytes of RAM.

OBS Studio Tray Icon Change

When customizing one’s desktop, there are always few applications that resist the change. This time I was fighting OBS Studio under Ubuntu and its dreadful tray icon. It just didn’t fit with my desktop theme.

I searched around and found essentially nothing. Yes, there were pages upon pages of people having the same problem and then other people giving them generalized advice but without anything actionable.

To make long story short, it took understanding Qt theming to find a solution for me. It’s not an ideal solution since it does require modification to the theme that could be lost in the future update but the trick was in getting icons added to the /usr/share/icons/hicolor/ directory.

I simply took my chosen 16x16 icons and placed them at the following locations:

  • /usr/share/icons/hicolor/16x16/status/obs-tray.png
  • /usr/share/icons/hicolor/16x16/status/obs-tray-active.png
  • /usr/share/icons/hicolor/16x16/status/obs-tray-paused.png

For a good measure I also changed their 24x24 and 32x32 variants and that was it. OBS now picked up my icons without an error.


PS: And no, adding them to .icons will not work since it’ll break the rest of the system’s theme. And yes, one could go around that by copying the whole theme to the directory but I found it easier just to copy a few files to the system directory instead.

Selecting 8-bit CRC

Data errors are a reality when communicating between electronic devices. While one can use various hashing algorithms in order to verify data, this is not a luxury easily afforded when working with microcontrollers. For such cases, one often uses just a simple parity, sum, or ideally a CRC algorithm.

I won’t go much into a history and general use of these algorithm as Wikipedia covers it nicely. I will just present you with a problem I had. Which of many CRC algorithms do I actually use for my Microchip PIC microcontroller project?

The first part of rough selection was easy. Due to the limited nature of my PIC microcontroller, the only realistic options were CRCs up to 16 bits in length. After discounting all CRCs that ended on non-byte boundaries, that left me with only CRC-16 and CRC-8.

While CRC-16 is superior in all manners, it’s not only more computationally intensive but it also requires more memory to implement. Both of which are notoriously absent from microcontrollers. While both of those might be a worthy compromise if I had long messages that needed checksumming, my messages were couple of bytes in length. So, CRC-8 it is.

However, CRC-8 has quite a few variants out there. Heck, my own C# library supports over 20 of them. What I needed was a logical way to select the best one. And that came in the form of the excellent “Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks” article.

However, what this article has shown me was that I was wrong in thinking there is a single optimal solution. For example, from charts its easy to see that polynomial 0xA6 is vastly superior when it comes to long messages. It will detect two bit errors (yes, simplifying here a bit) no matter what the length. However, if we want CRC to detect four bit errors, polynomial 0x97 will actually do better - albeit only up to 120 bits (15 bytes). Whether one is better than the other, is now function of message length.

And that sealed it for me. Since my messages were to be short, polynomial 0x97 was actually the right choice for me. Since CRC is defined by more than its polynomial, I went to the repository of CRC knowledge to see if someone implemented the same. And I couldn’t find it. Heck, I couldn’t find any of the polynomials discussed.

It took me a hot minute to see that whitepaper and CRC RevEng page were actually describing the same polynomials in a different way. CRC RevEng used their normal form while whitepaper decided to call them out in reversed reciprocal. My 0x97 was actually 0x2F.

After understanding that, I actually found two existing CRC-8 algorithms using my chosen polynomial: AutoSar and OpenSAFETY. Since AuroSar had an extra step of XORing the output with 0xFF, I went with a slightly shorter OpenSAFETY CRC-8 variant. Even better, I had that algorithm already implemented in C#. Converting it to C for my microcontroller would be a breeze.

However, since my C# algorithm used lookup table to speed things up, that made it really unsuitable for microcontroller implementation. How could I give 256 bytes to the lookup table when I had only 1024 to begin with? No, for this I needed a variant that uses as little memory as possible.

After a while, this is what I ended with:

uint8_t crc8(uint8_t* data, uint8_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;
        for (uint8_t i = 0; i &lt; 8; i++) {
            if (crc & 0x80) {
                crc = (uint8_t)(crc &lt;&lt; 1) ^ 0x2F;
            } else {
                crc &lt;&lt;= 1;
            }
        }
    }
    return crc;
}

Without any optimizations, this code needs only 3 bytes in RAM and 46 words of program memory (37 words with optimizations). And yes, speed does suck a bit but that’s the tradeoff. If speed was of essence, one could always go with lookup tables, no matter the RAM cost.

In any case, my problem was solved and algorithm selected.


PS: Yes, code could be optimized further if we used overflow from STATUS register as that would allow us to have only one crc << 1 but I opted not to do that as to keep code easily transportable between platforms.

PPS: And yes, if one is willing to go into the assembly, you can optimize it even more.