Programming in C#, Java, and god knows what not

Hashing It Out

While .NET finally includes CRC-32 and CRC-64 algorithms, it stops at the bare minimum and offers only a single standard polynomial for each. That is perfectly sufficient if one wants to create something from scratch but woefully inadequate when it comes to integrating with other software.

You see, CRC is just the method of computation and, on its own, it’s not sufficient to fully describe the result. What you also need is a polynomial and there’s a bunch of them. At any useful bit length you will find many “standard” polynomials. While .NET’s solution gives you probably the most common 32-bit and 64-bit variants, it doesn’t cover shorter bit lengths nor does it allow for a custom polynomial.
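To make the point about polynomials concrete, here is a minimal bitwise sketch (not the library's code; the Crc32Reflected helper is a name made up for this example). It runs the same CRC method over the same input with two well-known 32-bit polynomials and gets two unrelated results.

```csharp
using System;
using System.Text;

var data = Encoding.ASCII.GetBytes("123456789");  // conventional CRC check input

// Same method, same input, two different polynomials (both in reflected form).
var ieee = Crc32Reflected(data, 0xEDB88320);        // IEEE 802.3 polynomial
var castagnoli = Crc32Reflected(data, 0x82F63B78);  // Castagnoli polynomial

Console.WriteLine($"{ieee:X8}");        // CBF43926
Console.WriteLine($"{castagnoli:X8}");  // E3069283

// Hypothetical helper: reflected CRC-32 with all-ones initial value and final inversion.
static uint Crc32Reflected(ReadOnlySpan<byte> data, uint reflectedPolynomial) {
    uint crc = 0xFFFFFFFF;
    foreach (var b in data) {
        crc ^= b;
        for (var i = 0; i < 8; i++) {
            crc = (crc & 1) != 0 ? (crc >> 1) ^ reflectedPolynomial : crc >> 1;
        }
    }
    return ~crc;
}
```

A table-driven version would be faster, but the bit-by-bit loop makes it easier to see that the polynomial is the only thing that changed.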

Well, for that purpose I created a library following the same inheritance-from-NonCryptographicHashAlgorithm-class pattern. Not only does it allow for 8, 16, 32, and 64 bit widths, but it also offers a bunch of well-known polynomials in addition to custom polynomial support.

Below is the list of currently supported variants and, as always, code is available on GitHub.

CRC-8       CRC-16             CRC-32      CRC-64
ATM         ACORN              AAL5        ECMA-182
AUTOSAR     ARC                ADCCP       GO-ECMA
BLUETOOTH   AUG-CCITT          AIXM        GO-ISO
C2          AUTOSAR            AUTOSAR     MS
CCITT       BUYPASS            BASE91-C    REDIS
CDMA2000    CCITT              BASE91-D    WE
DARC        CCITT-FALSE        BZIP2       XZ
DVB-S2      CCITT-TRUE         CASTAGNOLI
GSM-A       CDMA2000           CD-ROM-EDC
GSM-B       CMS                CKSUM
HITAG       DARC               DECT-B
I-432-1     DDS-110            IEEE-802.3
I-CODE      DECT-R             INTERLAKEN
ITU         DECT-X             ISCSI
LTE         DNP                ISO-HDLC
MAXIM       EN-13757           JAMCRC
MAXIM-DOW   EPC                MPEG-2
MIFARE      EPC-C1G2           PKZIP
MIFARE-MAD  GENIBUS            POSIX
NRSC-5      GSM                V-42
OPENSAFETY  I-CODE             XFER
ROHC        IBM-3740           XZ
SAE-J1850   IBM-SDLC
SMBUS       IEC-61158-2
TECH-3250   IEEE 802.3
WCDMA2000   ISO-HDLD
            ISO-IEC-14443-3-A
            ISO-IEC-14443-3-B
            KERMIT
            LHA
            LJ1200
            LTE
            MAXIM
            MAXIM-DOW
            MCRF4XX
            MODBUS
            NRSC-5
            OPENSAFETY-A
            OPENSAFETY-B
            PROFIBUS
            RIELLO
            SPI-FUJITSU
            T10-DIF
            TELEDISK
            TMS37157
            UMTS
            USB
            V-41-LSB
            V-41-MSB
            VERIFONE
            X-25
            XMODEM
            ZMODEM

UUID Version 7 Implementation and Conundrums

During an otherwise uninteresting summer, without too much noise, we got new UUID version(s). While 6 and 8 are nice numbers, version 7 got me intrigued. It’s essentially just a combination of a Unix timestamp with some random bits mixed in. Exactly what the doctor ordered if you want to use such a UUID as a primary key in a database whose index you don’t want to fragment to hell.

Format is easy enough as its ASCII description would suggest:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           unix_ts_ms                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          unix_ts_ms           |  ver  |       rand_a          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                        rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It’s essentially 48 bits of (Unix) timestamp, followed by 74 bits of randomness, with 6 remaining bits being version and variant information. While timestamp ensures that IDs generated close in time get sorted close to each other, randomness is there to ensure uniqueness for stuff that happens at the same millisecond. And, of course, I am simplifying things a bit, especially for rand_a field but you get the gist of it.

That all behind us, let me guide you through my implementation of the same.

Generating the first 48 bits is straightforward with the only real issue being endianness. Since BinaryPrimitives doesn’t really deal with 48-bit integers, a temporary array was needed.

var msBytes = new byte[8];
BinaryPrimitives.WriteInt64BigEndian(msBytes, ms);
Buffer.BlockCopy(msBytes, 2, Bytes, 0, 6);

Generating randomness has two paths. Every time a new millisecond is detected, it will generate 10 bytes of randomness. The lowest 10 bits of the first 2 bytes will be used to initialize a random starting value (as per the Monotonic Random method in section 6.2). After accounting for the 4-bit version field that shares the same space, we have 2 bits remaining. Those are simply set to 0 (as per Counter Rollover Guards in the same section). I decided not to implement “bit stealing” from the rand_b field nor to implement any fancy rollover handling. If we’re still in the same millisecond, the 10-bit counter is just increased and its lower bits are written.

if (LastMillisecond != ms) {
    LastMillisecond = ms;
    RandomNumberGenerator.Fill(Bytes.AsSpan(6));
    RandomA = (ushort)(((Bytes[6] & 0x03) << 8) | Bytes[7]);
} else {
    RandomA++;
    Bytes[7] = (byte)(RandomA & 0xFF);
    RandomNumberGenerator.Fill(Bytes.AsSpan(8));
}

Despite the code looking a bit smelly when it comes to multithreading, it’s actually thread safe. Why? Well, it comes down to both the RandomA and LastMillisecond fields having a special ThreadStatic attribute.

[ThreadStatic] private static long LastMillisecond;
[ThreadStatic] private static ushort RandomA;

This actually makes each thread have a separate copy of these variables and thus no collisions will happen. Of course, since counters are determined for each thread separately, you don’t get a sequential output between threads. Not ideal, but a conscious choice to avoid the performance hit that proper locking would introduce.
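If the per-thread behavior sounds abstract, a tiny sketch shows it. The Counter class and its Bump method are made up for this illustration and are not part of the UUID code; the only thing taken from above is the ThreadStatic attribute.

```csharp
using System;
using System.Threading;

var results = new int[2];
var first = new Thread(() => results[0] = Counter.Bump(3));
var second = new Thread(() => results[1] = Counter.Bump(5));
first.Start();
second.Start();
first.Join();
second.Join();

// Each thread counted from its own zero; a shared field would have mixed the two.
Console.WriteLine($"{results[0]}, {results[1]}");  // 3, 5

// Hypothetical helper class used only for this demonstration.
static class Counter {
    [ThreadStatic] private static int count;  // one independent copy per thread

    public static int Bump(int times) {
        for (var i = 0; i < times; i++) { count++; }
        return count;
    }
}
```

One thing to keep in mind with ThreadStatic fields is that an inline initializer runs only for the first thread that touches the class, so relying on the default value (as here) is the safe route.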

The last part is to fix up the bytes in order to add version bits (always a binary 0111) and variant bits (always a binary 10).

Bytes[6] = (byte)(0x70 | ((RandomA >> 8) & 0x0F));
Bytes[8] = (byte)(0x80 | (Bytes[8] & 0x3F));
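A quick way to sanity-check the layout is to go in the opposite direction and pull the fields back out of the 16 bytes. The sketch below is not part of my class; it decodes the example version 7 UUID given in the specification's test vectors (017F22E2-79B0-7CC3-98C4-DC0C0C07398F) using only standard .NET calls.

```csharp
using System;
using System.Buffers.Binary;

// Example UUIDv7 from the specification, written without dashes.
var bytes = Convert.FromHexString("017F22E279B07CC398C4DC0C0C07398F");

// The first 6 bytes are the big-endian millisecond timestamp;
// read 8 bytes and drop the lowest 16 bits that belong to ver/rand_a.
var ms = (long)(BinaryPrimitives.ReadUInt64BigEndian(bytes) >> 16);
var version = bytes[6] >> 4;  // upper 4 bits of byte 6
var variant = bytes[8] >> 6;  // upper 2 bits of byte 8

Console.WriteLine(ms);       // 1645557742000
Console.WriteLine(version);  // 7
Console.WriteLine(variant);  // 2 (binary 10)
Console.WriteLine(DateTimeOffset.FromUnixTimeMilliseconds(ms).UtcDateTime.ToString("u"));
```

If version and variant don't come out as 7 and 2, the fix-up step went wrong somewhere.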

Add a couple of overrides and that’s it. You can even convert it to Guid. However…

Microsoft’s implementation of UUID, known to all as System.Guid, is slightly broken. Or isn’t. I guess it depends on where you look at it from. If you look at how RFC 4122 specifies components, you can see them as data types. And that’s how the original developer thought of it. Not as a binary blob but as a structure containing (little-endian on x86) numbers even though the specification clearly says all numbers are big-endian.

Had it stopped at just internal storage, it would be fine. But Microsoft went as far as to convert endianness when converting a 128-bit value to a string. And that is fine if you work only with Microsoft’s implementation but it causes issues when you try to deal with almost any other variant.

This also causes one peculiar problem when it comes to converting my version 7 UUID to Microsoft’s Guid. While their binary representations are the same, converting them to string format yields a different value. You can have either binary or string compatibility between those two. But never both. In my case I decided that binary compatibility is more important since you should really be using UUID in its binary form and not space-wasting hexadecimal format.
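The mismatch is easy to reproduce with nothing but System.Guid. The following is a small sketch using an arbitrary, easy-to-read byte pattern (not a real UUID): the binary form round-trips untouched, while the text form has the first three groups byte-swapped compared to a plain left-to-right hex dump.

```csharp
using System;
using System.Linq;

var bytes = new byte[] { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
                         0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF };

var guid = new Guid(bytes);

// Binary compatibility: the bytes come back exactly as they went in.
Console.WriteLine(guid.ToByteArray().SequenceEqual(bytes));  // True

// String form as Microsoft's Guid prints it.
var microsoftText = guid.ToString();
Console.WriteLine(microsoftText);  // 33221100-5544-7766-8899-aabbccddeeff

// String form when bytes are simply written out in order (big-endian fields).
var hex = Convert.ToHexString(bytes).ToLowerInvariant();
var rfcText = $"{hex[..8]}-{hex[8..12]}-{hex[12..16]}-{hex[16..20]}-{hex[20..]}";
Console.WriteLine(rfcText);        // 00112233-4455-6677-8899-aabbccddeeff
```

Only the first three groups differ; the last 8 bytes are treated as a plain byte array in both worlds.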

As always, the full code is available on GitHub.

[2023-01-12: Code has been adjusted a bit in order to follow the current RFC draft. Main changes are the introduction of a longer monotonic counter and alternate text conversion methods (Base35 and Base58).]

Single Instance Application for .NET 6 or 7

A while ago I wrote C# code to handle a single instance application. And that code has served me well on Windows. However, due to its dependency on the Windows API, you really couldn’t use it in multiplatform .NET code. It was time for an update.

My original code was using a combination of a global mutex in order to detect another instance running, followed by a named pipe communication to transfer arguments to the first-running instance. Fortunately, .NET 6 also contains those primitives. Even better, I could replace my named pipe API calls with the multiplatform NamedPipeServerStream and NamedPipeClientStream classes.

Unlike my Windows-specific code, I had to use the Global\\ prefix in order for the code to work properly on Linux. While unfortunate, it actually wasn’t too bad as my mutex name already included the user name. Combine that with the assembly location, hash it a bit, and you have a globally unique identifier. While the exact code was changed slightly, the logic remained the same and the new code worked without much effort.
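The detection part can be sketched roughly like this. It is a simplified illustration and not the exact code from my class: the way the name is composed (user name plus application directory, hashed with SHA-256 and shortened) is an assumption made for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Threading;

// Assumed identity: current user plus the directory the application runs from.
var identity = Environment.UserName + "|" + AppContext.BaseDirectory;
var hash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(identity)));
var mutexName = @"Global\" + hash[..32];  // shortened to keep the name compact

// createdNew is true only for the instance that actually created the mutex.
using var mutex = new Mutex(false, mutexName, out var createdNew);

Console.WriteLine(createdNew
    ? "First instance, carry on."
    : "Another instance is already running.");
```

The mutex has to stay alive for as long as the application runs; if it gets disposed early, the next instance will happily believe it is the first one.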

Code to transfer arguments had a few more issues. First of all, I had to swap my binary serializer for JSON. Afterward, I had to write new pipe handling code, albeit using the portable .NET implementation as a base this time. Mind you, back when I wrote it for Windows, neither was supported. Regardless, a bit of time later, both tasks were successfully done and the freshly updated code was tested on Linux. Success!

But success was short-lived as the same code didn’t work on Windows. Well, technically it did work but the old instance never saw the data that was sent. It took a bit of troubleshooting to figure out that the basic named pipe constructor limited communication to a single process and that an overload setting PipeOptions.CurrentUserOnly for both client and server was needed. Thankfully, that didn’t present any issues on Linux so the same code was good for both.
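For illustration, here is roughly what the argument forwarding boils down to. This is a condensed sketch rather than the code from the repository: the pipe name is made up, and both ends run inside one process just to keep the example self-contained.

```csharp
using System;
using System.IO;
using System.IO.Pipes;
using System.Text.Json;
using System.Threading.Tasks;

const string pipeName = "example-single-instance-pipe";  // hypothetical name

// "First instance": waits for a connection and reads forwarded arguments.
var serverTask = Task.Run(() => {
    using var server = new NamedPipeServerStream(pipeName, PipeDirection.In, 1,
        PipeTransmissionMode.Byte, PipeOptions.CurrentUserOnly);
    server.WaitForConnection();
    using var reader = new StreamReader(server);
    return JsonSerializer.Deserialize<string[]>(reader.ReadToEnd());
});

// "Second instance": connects and sends its arguments as JSON.
using (var client = new NamedPipeClientStream(".", pipeName, PipeDirection.Out,
           PipeOptions.CurrentUserOnly)) {
    client.Connect(5000);  // keeps retrying for up to 5 seconds
    using var writer = new StreamWriter(client);
    writer.Write(JsonSerializer.Serialize(new[] { "--open", "file.txt" }));
}

var received = await serverTask ?? Array.Empty<string>();
Console.WriteLine(string.Join(" ", received));  // --open file.txt
```

Note that PipeOptions.CurrentUserOnly is set on both ends, matching what ended up working on Windows and Linux alike.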

And that was it. Now I had .NET 6 (or 7) code for a single instance application working on both Windows and Linux (probably macOS too), allowing not only for detection but also argument forwarding. Just what I needed. :)

You can see both this class and example of its usage in my Medo repository.

Hashing It Out

For a while now I have had a selection of CRC algorithms in my library. It offered support for many CRC-8, CRC-16, and CRC-32 variants, all inheriting from the somewhat clunky HashAlgorithm base class. It wasn’t an ideal choice as hashes and checksums are different beasts but it did the job.

With .NET 7 out, we finally got the NonCryptographicHashAlgorithm base class to inherit from and that one is much better at dealing with CRC nuances. Adjustment of algorithms was easy enough but testing has shown one issue. Output of Microsoft’s CRC-32 class and the one I had created was exactly reversed. For example, if an input would result in 0x1A2B3C4D output from my class, Microsoft’s class would return 0x4D3C2B1A. Yep, we selected a different endianness.

I originally created my CRC classes with microcontroller communication in mind. Since most microcontrollers are from the big-endian world, my classes were designed to output bytes in big-endian fashion. On the other hand, Microsoft designed their CRC-32 class for communication on the x86 platform. And that one is little-endian.

After giving it a lot of thought, I decided to go the Microsoft route and output the result in native endianness when using the GetCurrentHash method from the NonCryptographicHashAlgorithm base class. The main reason was to have the same overall behavior whether someone uses my, Microsoft’s, or any other class. That said, my classes always had an “escape hatch” method that outputs a number (HashAsUInt32) that can then be converted to any endianness.
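Once you have the hash as a plain number, picking the byte order is a one-liner with BinaryPrimitives. A small sketch, where the hard-coded value simply stands in for whatever a method like HashAsUInt32 would return:

```csharp
using System;
using System.Buffers.Binary;

uint hash = 0x1A2B3C4D;  // stand-in for a computed CRC-32 value

var bigEndian = new byte[4];
var littleEndian = new byte[4];
BinaryPrimitives.WriteUInt32BigEndian(bigEndian, hash);
BinaryPrimitives.WriteUInt32LittleEndian(littleEndian, hash);

Console.WriteLine(Convert.ToHexString(bigEndian));     // 1A2B3C4D
Console.WriteLine(Convert.ToHexString(littleEndian));  // 4D3C2B1A
```

Same number, two byte orders, and no guessing about what the platform's native order happens to be.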

In any case, if you need any CRC-8, CRC-16, or CRC-32 calculations with custom polynomials, do check out Medo.IO.Hashing library.

Text for OpenGL

I occasionally like to learn something I literally have no use for. It keeps me happy and entertained. This month I decided to deal with OpenGL. And no, I don’t consider learning OpenGL a useless skill. OpenGL is very much useful and, despite newer technologies, still here to stay. It’s just that, since I do no game development, I have no real use for it. But I wanted to dip my toes into that world regardless.

After getting a few triangles on the screen, it came time to output text and I was stunned to learn OpenGL has no real text support. And no, OpenGL is not unique here as neither Vulkan nor Metal provides much support. Rendering text is simply not an integral part of the rendering pipeline. And, once one gives it a thought, it’s clear it doesn’t belong there.

That’s not to say there are no ways to render text. The most common one is treating text as a texture. The less common way is rasterizing font into triangles. Since I really love bitmap fonts and square is easily constructed from two right triangles, I decided to go the rustic route.

The first issue was which font to select. I wanted something old, rather complete, and free. Due to quirks in the copyright law, bitmap fonts are generally not considered copyrightable under US law. Mind you, that holds true only for their final bitmap form. Fonts that come to you as TTF or OTF are definitely copyrightable.

The other issue with selection was completeness. While selecting an old ROM font supporting code page 437 (aka US) is easy, the support for various European languages is limited, to say the least. Fortunately, here I came upon the Bedstead font family which covered every language I could think of with some extra. While an era-faithful setup would include upscaling and even rudimentary anti-aliasing, I decided to go with a raw 5x9 pixel grid.

For conversion I wrote BedsteadToVertices utility that simply takes all character bitmaps and extracts them into a Vector2 array of triangles. The resulting file is essentially a C# class returning buffer that can be directly drawn. Something like this:

var triangles = BedsteadVerticesFont.GetVertices(text,
                                                 offsetX: -0.95f,
                                                 offsetY: 0.95f,
                                                 scaleX: 0.1f,
                                                 scaleY: 0.2f);
gl.BindBuffer(BufferTargetARB.ArrayBuffer, BufferId);
gl.BufferData<float>(BufferTargetARB.ArrayBuffer, triangles, BufferUsageARB.DynamicDraw);
gl.DrawArrays(GLEnum.Triangles, 0, (uint)triangles.Length / 2);

The very first naïve version of this file ended up generating a 3.8 MB source file. Not a deal breaker but quite a bit larger than I was comfortable with. So I went for the low-hanging fruit first. Using float arrays instead of Vector2 instantly dropped the file size to 2.3 MB. Dropping all floats to 4 decimal places dropped it further to 2.0 MB.

And no, I didn’t think about reducing the whitespace. Code generated files don’t need to be ugly and reducing space count to a minimum would do just that. Especially because removing spaces will result in exactly the same compiled code at the expense of readability. Not worth it.

However, merging consecutive pixels into one big rectangle was yet another optimization that’s both cheap in implementation and reduces file size significantly. In my case, the end result was 1 MB for 1,500 characters. And yes, this is still a big file but if you exclude all the beautiful Unicode non-ASCII characters, that can bring file size down to 61 KB. Had I wanted to go the binary route, that would be even smaller but I was happy enough with this not to bother.
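The merging idea is simple enough to sketch in a few lines. This is not the actual BedsteadToVertices code, just an illustration on a made-up glyph fragment: horizontal runs of set pixels are found row by row and each run becomes one rectangle, that is, two triangles.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical glyph fragment; '#' marks a set pixel.
string[] rows = { "###..",
                  "#....",
                  "#.##." };

var vertices = new List<float>();
for (var y = 0; y < rows.Length; y++) {
    var x = 0;
    while (x < rows[y].Length) {
        if (rows[y][x] != '#') { x++; continue; }
        var start = x;
        while (x < rows[y].Length && rows[y][x] == '#') { x++; }  // extend the run
        AddRectangle(vertices, start, y, x, y + 1);
    }
}

// 7 pixels would need 14 triangles one by one; merged runs need only 8.
Console.WriteLine($"{vertices.Count / 6} triangles, {vertices.Count} floats");  // 8 triangles, 48 floats

// Appends two triangles (x, y pairs) covering the given rectangle in grid units.
static void AddRectangle(List<float> v, float left, float top, float right, float bottom) {
    v.AddRange(new[] { left, top, right, top, left, bottom });      // first triangle
    v.AddRange(new[] { right, top, right, bottom, left, bottom });  // second triangle
}
```

Coordinates here are plain grid units with Y growing downward; scaling, offset, and flipping into OpenGL's coordinate space would happen later. Merging vertically as well would shave off a bit more.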

While the original Bedstead font is monospaced, I decided to throw a wrench into this and remove all extra spacing where I could do so easily. That means that the font still feels monospaced but you won’t have excessive spaces really visible. And yes, kerning certain letter pairs (e.g., rl) was way out of scope.

On the OpenGL side, one could also argue that this style of bitmap drawing would be an excellent territory for the use of indices to reduce triangle count. One would be right on a technicality but I opted not to complicate my life for dubious gains as modern GPUs (even the integrated ones) are quite capable of handling a few hundred extra triangles.

In any case, I solved my problem and, as always, the source code is available for download.


[2022-06-07: With a bit of optimization, ASCII-only file is at 50 KB.]

AddSeconds for TimeOnly

One really annoying fact of life you get when dealing with TimeOnly is that the type has no AddSeconds method (nor AddMilliseconds or AddTicks). But adding that missing functionality is not that hard - enter extension method.

While TimeOnly has no dedicated methods to add anything below a minute resolution (the closest is the generic Add method taking a TimeSpan), it does allow for creating a new instance using 100 ns ticks. So we might as well use it.

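using System;

public static class TimeOnlyExtensions
{
    public static TimeOnly AddSeconds(this TimeOnly time, double seconds)
    {
        // Round to the nearest 100 ns tick
        var ticks = (long)(seconds * TimeSpan.TicksPerSecond + (seconds >= 0 ? 0.5 : -0.5));
        return AddTicks(time, ticks);
    }

    public static TimeOnly AddMilliseconds(this TimeOnly time, int milliseconds)
    {
        // Multiply as long so large values cannot overflow Int32
        return AddTicks(time, milliseconds * TimeSpan.TicksPerMillisecond);
    }

    public static TimeOnly AddTicks(this TimeOnly time, long ticks)
    {
        // Wrap around midnight, as the built-in AddHours and AddMinutes do,
        // instead of throwing when the sum falls outside of a single day
        var total = (time.Ticks + ticks % TimeSpan.TicksPerDay) % TimeSpan.TicksPerDay;
        if (total < 0) { total += TimeSpan.TicksPerDay; }
        return new TimeOnly(total);
    }
}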

With one simple using static TimeOnlyExtensions; we get to correct this oversight. Hopefully this won’t be needed in the future. While easy enough, it’s annoying that such obvious methods are missing.
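For comparison, and in case you would rather not carry an extension class around, the built-in Add method that takes a TimeSpan can also do sub-minute arithmetic. A minimal sketch of that alternative (it is not what my extension methods use internally):

```csharp
using System;
using System.Globalization;

var time = new TimeOnly(23, 59, 59);

// TimeOnly.Add accepts any TimeSpan and wraps around midnight.
var later = time.Add(TimeSpan.FromSeconds(2.5));

Console.WriteLine(later.ToString("HH:mm:ss.fff", CultureInfo.InvariantCulture));  // 00:00:01.500
```

It is a bit more verbose than a dedicated AddSeconds, which is the whole reason the extension methods above exist.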

Endianness Fun in C#

Those doing parsing of network protocols will be familiar with BitConverter. For example, to read an integer that’s written in big-endian (aka network) order, one could write something like this:

Array.Reverse(buffer, offset, 4);
Console.WriteLine(BitConverter.ToInt32(buffer, offset));

If one is dealing with multiplatform code, they could even go with a slightly smarter code:

if (BitConverter.IsLittleEndian) { Array.Reverse(buffer, offset, 4); }
Console.WriteLine(BitConverter.ToInt32(buffer, offset));

However, that’s the old-style way of doing things. For a while now, .NET has also offered the BinaryPrimitives class in the System.Buffers.Binary namespace. While this class was originally designed to be used with protocol buffers (which explains the namespace), there is no reason why you couldn’t use it anywhere else. Actually, considering how versatile the class is, I am stunned they didn’t just add it to the System namespace.

In any case, reading our 32-bit number is now as easy as:

Console.WriteLine(BinaryPrimitives.ReadInt32BigEndian(buffer));

And yes, this doesn’t accept offset argument. Boo-hoo! Just use Span and slice it.

var span = new ReadOnlySpan<byte>(buffer);
Console.WriteLine(BinaryPrimitives.ReadInt32BigEndian(span.Slice(offset)));

Much easier than remembering if you need to reverse array or not. :)
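To round it off, here is a small self-contained sketch with a made-up 5-byte buffer. It reads the same four bytes in both byte orders and then writes a value back, all without ever touching Array.Reverse (which, as a side note, modifies the buffer in place).

```csharp
using System;
using System.Buffers.Binary;

byte[] buffer = { 0xFF, 0x00, 0x00, 0x01, 0x00 };  // hypothetical packet: 1 header byte + 4 data bytes
const int offset = 1;

var big = BinaryPrimitives.ReadInt32BigEndian(buffer.AsSpan(offset));
var little = BinaryPrimitives.ReadInt32LittleEndian(buffer.AsSpan(offset));
Console.WriteLine(big);     // 256
Console.WriteLine(little);  // 65536

// Writing works the same way, straight into the slice.
BinaryPrimitives.WriteInt32BigEndian(buffer.AsSpan(offset), 0x01020304);
Console.WriteLine(Convert.ToHexString(buffer));  // FF01020304
```

The result is the same regardless of the endianness of the machine running the code, which is the whole point.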

IDE0180: Use Tuple to Swap Values

As I upgraded my Visual Studio 2022 Preview, I noticed a new code suggestion: IDE0180. It popped next to a reasonably standard variable swap:

var tmp = a;
a = b;
b = tmp;

The new syntax it recommended was much nicer, in my opinion:

(a, b) = (b, a);

I love the new syntax!

Except it’s not new - it has been available since C# 7 (we’re at 10 now, just for reference). I just somehow missed it. The only reason I noticed it now was due to a suggestion. I guess better late than never. :)

Native PNG in C# .NET

It all started with a simple barcode. I was refactoring my old Code128 barcode class to work in .NET 5 and faced an interesting issue - there was no option to output the barcode as an image. Yes, on Windows you could rely on System.Drawing but .NET is supposedly a multiplatform environment these days. And no, System.Drawing is not supported on Linux - you only get error CS0234: The type or namespace name 'Drawing' does not exist in the namespace 'System' (are you missing an assembly reference?).

If you’re thinking to yourself “Wait, I remember Microsoft’s System.Drawing.Common working on Linux”, you don’t have a bad memory. However, Microsoft has since made a small update and, with a stroke of a pen, decided to abandon it. Yes, you can use runtime options as a workaround but official support is no longer there.

Alternatives do exist. Most notably both ImageSharp and SkiaSharp would satisfy any graphic need I might have. But both also present quite a big dependency to pull for what is essentially a trivial task. I mean, how difficult can it be to implement a PNG writer?

It turns out, not difficult at all. If you go over the specification you’ll notice there are only three chunks you need to support: IHDR, IDAT, and IEND. If you know how to write those three, you can output a PNG image that’s readable by all. Literally the only two things that are not straightforward were deflate compression that required an extra header and dealing with CRC.

Most of my debugging time was actually spent dealing with output not showing properly in IrfanView. I could read my output image in Paint.NET, Paint, Gimp, and a multitude of other programs I’ve tried. It took me a while before I figured out that IrfanView 4.54 is actually the one with the issue. Update to the latest version sorted that one out.
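To show how little is needed, here is a minimal sketch of a writer for a tiny 2x2 grayscale image. It is not the code from my barcode class: it leans on ZLibStream (available since .NET 6) to produce the zlib-wrapped deflate data instead of adding the header by hand, and the WriteChunk and Crc32 helpers are names made up for this example.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;
using System.Text;

const int width = 2, height = 2;
// 8-bit grayscale pixels; every scanline starts with filter type 0 (None).
byte[] raw = { 0, 0x00, 0xFF,
               0, 0xFF, 0x00 };

using var png = new MemoryStream();
png.Write(new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A });  // PNG signature

var header = new byte[13];
BinaryPrimitives.WriteInt32BigEndian(header.AsSpan(0), width);
BinaryPrimitives.WriteInt32BigEndian(header.AsSpan(4), height);
header[8] = 8;  // bit depth
header[9] = 0;  // color type 0 = grayscale; compression, filter, interlace stay 0
WriteChunk(png, "IHDR", header);

using (var compressed = new MemoryStream()) {
    using (var zlib = new ZLibStream(compressed, CompressionLevel.Optimal, leaveOpen: true)) {
        zlib.Write(raw);
    }
    WriteChunk(png, "IDAT", compressed.ToArray());
}
WriteChunk(png, "IEND", Array.Empty<byte>());

var pngBytes = png.ToArray();
Console.WriteLine($"PNG is {pngBytes.Length} bytes long");

// Chunk layout: big-endian length, 4-letter type, data, CRC-32 over type and data.
static void WriteChunk(Stream output, string type, byte[] data) {
    var typeBytes = Encoding.ASCII.GetBytes(type);
    Span<byte> number = stackalloc byte[4];
    BinaryPrimitives.WriteInt32BigEndian(number, data.Length);
    output.Write(number);
    output.Write(typeBytes);
    output.Write(data);
    var crc = Crc32(data, Crc32(typeBytes, 0xFFFFFFFF)) ^ 0xFFFFFFFF;
    BinaryPrimitives.WriteUInt32BigEndian(number, crc);
    output.Write(number);
}

// Bitwise CRC-32 update step (reflected IEEE polynomial); caller handles init and final XOR.
static uint Crc32(ReadOnlySpan<byte> data, uint crc) {
    foreach (var b in data) {
        crc ^= b;
        for (var i = 0; i < 8; i++) {
            crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320 : crc >> 1;
        }
    }
    return crc;
}
```

Passing pngBytes to File.WriteAllBytes should give you a file any image viewer can open. A real writer would also want other color types and multiple IDAT chunks for large images, but the skeleton stays the same.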

In any case, I successfully added PNG support to my barcode class.

And then I started thinking… Why not make a simple PNG image reader/writer? Well, answers are numerous. First of all, alternatives exist and a lone developer can never hope to have such a level of completeness. Secondly, I don’t deal with graphics on a daily basis and thus features would be few and far between. And lastly, while PNG is a relatively simple specification to start with, it has a lot of optional complexity. And some of that complexity would require me to learn much more than I ever want to know (yes, ICC profiles - I’m looking at you).

However, logic be damned. What’s the better use of an afternoon than writing a PNG parser?

In any case, the final product is here and, while as simple as it gets, it’s relatively feature complete when it comes to loading plain old PNG images. The only mandatory feature I failed to implement is support for interlaced images.

Expectedly, the end result is usable for changing a pixel or two but not for much more as all infrastructure is missing. I will add some of the missing functionality in the future, resize and interlacing support coming first, but I consider this class reasonably feature complete for what I both need and can do without making a full blown library. And I haven’t had as much fun programming in a while.

Overescaping By Default

Writing JSON has become trivial in C# and there’s no class I like better for that purpose than Utf8JsonWriter. Just look at a simple example:

var jsonUtf8 = new Utf8JsonWriter(Console.OpenStandardOutput(),
                                  new JsonWriterOptions() { Indented = true });
jsonUtf8.WriteStartObject();
jsonUtf8.WriteString("Test", "2+2");
jsonUtf8.WriteEndObject();
jsonUtf8.Flush();

This simple code will produce perfectly valid JSON:

{
  "Test": "2\u002B2"
}

While valid, you’ll notice this is slightly different from what any other programming language would do. A single plus character became the escape sequence \u002B.

In their eternal wisdom, .NET architects decided that, by default, JSON should be over-escaped and they “explained” their reasoning in the ticket. Essentially they did it out of an abundance of caution to avoid any issues if someone puts JSON where it might not be expected.

Mind you, in 99% of cases JSON is used in HTTP body and thus doesn’t need this but I guess one odd case justifies this non-standard but valid output in their minds. And no, other JSON encoders don’t behave this way either. Only .NET as far as I can tell.

Fortunately, some time later, they also implemented what I (alongside probably 90% of developers) consider the proper JSON encoder which escapes just mandatory characters and leaves the rest of text alone. It just requires a small extra parameter.

var jsonUtf8 = new Utf8JsonWriter(Console.OpenStandardOutput(),
                                  new JsonWriterOptions() { Indented = true,
                                    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping });
jsonUtf8.WriteStartObject();
jsonUtf8.WriteString("Test", "2+2");
jsonUtf8.WriteEndObject();
jsonUtf8.Flush();

Using UnsafeRelaxedJsonEscaping is not unsafe despite its name; darn it, it’s not even relaxed as compared to the specification. It’s just a properly implemented JSON encoder without any extra nonsense thrown in.
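The same encoder setting also exists for those who use JsonSerializer instead of writing tokens by hand. A short sketch of both behaviors side by side, using an anonymous object as a stand-in for real data:

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

var value = new { Test = "2+2" };

var relaxed = new JsonSerializerOptions {
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
};

var defaultJson = JsonSerializer.Serialize(value);
var relaxedJson = JsonSerializer.Serialize(value, relaxed);

Console.WriteLine(defaultJson);  // {"Test":"2\u002B2"}
Console.WriteLine(relaxedJson);  // {"Test":"2+2"}
```

Both outputs parse back to the very same string, so the choice only matters for readability and for whoever has to compare the output with that of other tools.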