Selecting 8-bit CRC

Data errors are a reality when communicating between electronic devices. While one can use various hashing algorithms in order to verify data, this is not a luxury easily afforded when working with microcontrollers. For such cases, one often uses just a simple parity, sum, or ideally a CRC algorithm.

I won’t go much into the history and general use of these algorithms, as Wikipedia covers that nicely. I will just present you with a problem I had: which of the many CRC algorithms do I actually use for my Microchip PIC microcontroller project?

The first, rough part of the selection was easy. Due to the limited nature of my PIC microcontroller, the only realistic options were CRCs up to 16 bits in length. After discounting all CRCs that ended on non-byte boundaries, that left me with only CRC-16 and CRC-8.

While CRC-16 is the better error detector, it’s more computationally intensive and requires more memory to implement, and both processing power and memory are notoriously scarce on microcontrollers. Either cost might be a worthy compromise if I had long messages that needed checksumming, but my messages were only a couple of bytes in length. So, CRC-8 it is.

However, CRC-8 has quite a few variants out there. Heck, my own C# library supports over 20 of them. What I needed was a logical way to select the best one. And that came in the form of the excellent “Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks” article.

However, what this article showed me was that I was wrong in thinking there is a single optimal solution. For example, from the charts it’s easy to see that polynomial 0xA6 is vastly superior when it comes to long messages. It will detect two-bit errors (yes, simplifying here a bit) no matter what the length. However, if we want the CRC to detect four-bit errors, polynomial 0x97 will actually do better - albeit only up to 120 bits (15 bytes). Whether one is better than the other is now a function of message length.
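To make the short-message case concrete, here is a rough brute-force sketch in C. It assumes polynomial 0x2F in normal form (the same polynomial the paper writes as 0x97), a zero initial value, and a hypothetical 4-byte payload; the names `crc8_poly2f` and `count_undetected` are made up for this illustration. Taking a Hamming distance of 4 to mean that every error of one, two, or three flipped bits is caught, it flips every such combination in the 40-bit frame (payload plus CRC byte) and counts how many corrupted frames would still be accepted.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-8, polynomial 0x2F (normal form), initial value 0. */
static uint8_t crc8_poly2f(const uint8_t *data, size_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;
        for (uint8_t i = 0; i < 8; i++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x2F)
                               : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

#define FRAME_BYTES 5u                  /* hypothetical: 4 payload bytes + 1 CRC byte */
#define FRAME_BITS  (FRAME_BYTES * 8u)

/* Flip one bit of the frame (bit 0 is the MSB of the first byte). */
static void flip_bit(uint8_t *frame, unsigned bit) {
    frame[bit / 8u] ^= (uint8_t)(0x80u >> (bit % 8u));
}

/* With a zero initial value, the CRC over payload plus CRC byte is 0 for a good frame. */
static int frame_passes(const uint8_t *frame) {
    return crc8_poly2f(frame, FRAME_BYTES) == 0;
}

/* Count corrupted frames with 1, 2, or 3 flipped bits that would still be accepted. */
static unsigned count_undetected(void) {
    uint8_t frame[FRAME_BYTES] = { 0x12, 0x34, 0x56, 0x78, 0x00 };
    frame[4] = crc8_poly2f(frame, 4);   /* append the CRC */

    unsigned undetected = 0;
    for (unsigned a = 0; a < FRAME_BITS; a++) {
        flip_bit(frame, a);
        undetected += (unsigned)frame_passes(frame);          /* 1-bit error */
        for (unsigned b = a + 1; b < FRAME_BITS; b++) {
            flip_bit(frame, b);
            undetected += (unsigned)frame_passes(frame);      /* 2-bit error */
            for (unsigned c = b + 1; c < FRAME_BITS; c++) {
                flip_bit(frame, c);
                undetected += (unsigned)frame_passes(frame);  /* 3-bit error */
                flip_bit(frame, c);                           /* undo */
            }
            flip_bit(frame, b);                               /* undo */
        }
        flip_bit(frame, a);                                   /* undo */
    }
    return undetected;
}
```

For a frame this short, `count_undetected()` should come back as 0. The same loop is a cheap way to compare candidate polynomials at the message length you actually use.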

And that sealed it for me. Since my messages were to be short, polynomial 0x97 was actually the right choice for me. Since a CRC is defined by more than its polynomial, I went to the repository of CRC knowledge to see if someone had already implemented it. And I couldn’t find it. Heck, I couldn’t find any of the polynomials discussed.

It took me a hot minute to see that the whitepaper and the CRC RevEng page were actually describing the same polynomials in different ways. CRC RevEng uses the normal form, while the whitepaper calls them out in reversed reciprocal form. My 0x97 was actually 0x2F.
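The conversion between the two is a one-bit shift. The whitepaper's notation keeps the x^8 term and drops the always-present +1 term, while the normal form drops the x^8 term and keeps the +1. A minimal sketch in C for 8-bit polynomials, with helper names invented for this example:

```c
#include <stdint.h>

/* Whitepaper notation (implicit +1 term) -> normal form (implicit x^8 term). */
static uint8_t koopman_to_normal8(uint8_t koopman) {
    return (uint8_t)((koopman << 1) | 1u);
}

/* Normal form -> whitepaper notation. Assumes the +1 term is present,
   which is the case for any polynomial used as a CRC. */
static uint8_t normal_to_koopman8(uint8_t normal) {
    return (uint8_t)((normal >> 1) | 0x80u);
}
```

With these, `koopman_to_normal8(0x97)` gives 0x2F and `normal_to_koopman8(0x2F)` gives 0x97 back.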

After understanding that, I actually found two existing CRC-8 algorithms using my chosen polynomial: AUTOSAR and OpenSAFETY. Since AUTOSAR had an extra step of XORing the output with 0xFF, I went with the slightly shorter OpenSAFETY CRC-8 variant. Even better, I had that algorithm already implemented in C#. Converting it to C for my microcontroller would be a breeze.
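To show how little separates the two, here is a sketch in C that parameterizes the initial value and the final XOR. The parameter sets are an assumption taken from the CRC RevEng catalogue entries as best recalled (OpenSAFETY: initial value 0x00, final XOR 0x00; AUTOSAR: initial value 0xFF, final XOR 0xFF; neither reflected), so check them against the catalogue before relying on them. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-8 with polynomial 0x2F; init and xorout select the variant. */
static uint8_t crc8_poly2f_variant(const uint8_t *data, size_t length,
                                   uint8_t init, uint8_t xorout) {
    uint8_t crc = init;
    while (length--) {
        crc ^= *data++;
        for (uint8_t i = 0; i < 8; i++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x2F)
                               : (uint8_t)(crc << 1);
        }
    }
    return (uint8_t)(crc ^ xorout);
}

/* Assumed parameters: init 0x00, xorout 0x00. */
static uint8_t crc8_opensafety(const uint8_t *data, size_t length) {
    return crc8_poly2f_variant(data, length, 0x00, 0x00);
}

/* Assumed parameters: init 0xFF, xorout 0xFF. */
static uint8_t crc8_autosar(const uint8_t *data, size_t length) {
    return crc8_poly2f_variant(data, length, 0xFF, 0xFF);
}
```

Over the usual nine-character test string "123456789", these parameters produce 0x3E for the OpenSAFETY variant and 0xDF for the AUTOSAR one, which is a handy way to check an implementation against the catalogue.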

However, since my C# algorithm used a lookup table to speed things up, it was really unsuitable for a microcontroller implementation. How could I give 256 bytes to the lookup table when I had only 1024 to begin with? No, for this I needed a variant that uses as little memory as possible.
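For contrast, this is roughly what the table-driven approach looks like in C. It is only a sketch with hypothetical names, not a port of the C# library. It builds the table at run time, so the 256 bytes sit in RAM; a precomputed `const` table would take program memory instead.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t crc8_table[256];   /* the 256 bytes in question */

/* Fill the table with the CRC of every possible byte value (polynomial 0x2F). */
static void crc8_table_init(void) {
    for (unsigned value = 0; value < 256; value++) {
        uint8_t crc = (uint8_t)value;
        for (uint8_t bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x2F)
                               : (uint8_t)(crc << 1);
        }
        crc8_table[value] = crc;
    }
}

/* One table lookup per byte instead of eight shift-and-test steps. */
static uint8_t crc8_lookup(const uint8_t *data, size_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc = crc8_table[crc ^ *data++];
    }
    return crc;
}
```

The inner loop disappears from the per-byte path, which is where the speed comes from, and the table is the price paid for it.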

After a while, this is what I ended up with:

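#include <stdint.h>

// CRC-8 with polynomial 0x2F, initial value 0, no final XOR
uint8_t crc8(uint8_t* data, uint8_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;                            // fold in the next byte
        for (uint8_t i = 0; i < 8; i++) {          // then process it bit by bit, MSB first
            if (crc & 0x80) {
                crc = (uint8_t)(crc << 1) ^ 0x2F;  // top bit set: shift and apply the polynomial
            } else {
                crc <<= 1;                         // top bit clear: just shift
            }
        }
    }
    return crc;
}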

Without any optimizations, this code needs only 3 bytes of RAM and 46 words of program memory (37 words with optimizations). And yes, speed does suck a bit, but that’s the tradeoff. If speed were of the essence, one could always go with lookup tables, no matter the RAM cost.
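As a usage sketch: because this variant starts from zero and has no final XOR, running the CRC over the payload plus its appended CRC byte gives zero for an intact frame. The helpers `frame_seal`, `frame_is_valid`, and `frame_demo`, along with the 3-byte payload, are hypothetical and only illustrate that property.

```c
#include <stdint.h>

/* Same routine as above, repeated so this sketch compiles on its own. */
uint8_t crc8(uint8_t* data, uint8_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;
        for (uint8_t i = 0; i < 8; i++) {
            if (crc & 0x80) {
                crc = (uint8_t)(crc << 1) ^ 0x2F;
            } else {
                crc <<= 1;
            }
        }
    }
    return crc;
}

/* Sender: store the CRC right after the payload (buffer needs one spare byte). */
static void frame_seal(uint8_t *frame, uint8_t payload_length) {
    frame[payload_length] = crc8(frame, payload_length);
}

/* Receiver: a zero CRC over payload plus CRC byte means the frame checks out. */
static int frame_is_valid(uint8_t *frame, uint8_t frame_length) {
    return crc8(frame, frame_length) == 0;
}

/* Seal a made-up 3-byte payload, then flip one bit in transit.
   Returns 1 if the intact frame passes and the corrupted one is rejected. */
static int frame_demo(void) {
    uint8_t frame[4] = { 0x01, 0x02, 0x03, 0x00 };
    frame_seal(frame, 3);
    int intact_ok = frame_is_valid(frame, 4);
    frame[1] ^= 0x10;                       /* simulate a single-bit error */
    int corrupted_rejected = !frame_is_valid(frame, 4);
    return intact_ok && corrupted_rejected;
}
```

Checking for zero on the receiving side avoids a separate compare against the received CRC byte, which saves a few instructions.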

In any case, my problem was solved and the algorithm selected.


PS: Yes, the code could be optimized further if we used the overflow from the STATUS register, as that would allow us to have only one crc << 1, but I opted not to do that so as to keep the code easily portable between platforms.
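A portable middle ground, as a sketch only, is to remember the top bit before shifting so that the shift appears once in the source. This is not the STATUS-register trick itself, and whether a given compiler turns it into fewer instructions is not something the sketch establishes. The name `crc8_single_shift` is hypothetical.

```c
#include <stdint.h>

uint8_t crc8_single_shift(const uint8_t *data, uint8_t length) {
    uint8_t crc = 0;
    while (length--) {
        crc ^= *data++;
        for (uint8_t i = 0; i < 8; i++) {
            uint8_t carry = crc & 0x80;  /* the bit the shift is about to push out */
            crc <<= 1;                   /* the only shift in the loop */
            if (carry) {
                crc ^= 0x2F;             /* apply the polynomial when a 1 fell off */
            }
        }
    }
    return crc;
}
```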

PPS: And yes, if one is willing to drop into assembly, one can optimize it even more.