Never Gonna BOM You Up

.NET supported Unicode from its very beginning. Pretty much anything you might need for Unicode manipulation is there. Yes, as early adopters, they made a bet on UTF-16 that didn’t pay off since the rest of the world has moved toward UTF-8 as an (almost) exclusive encoding. However, if we ignore the slightly higher memory footprint, C# strings made Unicode as easy as it gets.

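And, while UTF-8 is not a native encoding for its strings, C# is no slouch and has a convenient Encoding.UTF8 static property allowing for easy conversion. However, if you hand that Encoding.UTF8 to something that writes a file or a stream (StreamWriter, for example), you will get a bit extra. Encoding.UTF8.GetBytes() on its own returns just the encoded bytes; the extra comes from the encoding’s preamble, which writers emit at the start of their output.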

That something extra is the byte order mark (BOM). Its intention is noble - to help detect endianness. However, its usage for UTF-8 is of dubious help since an 8-bit encoding doesn’t really have issues with endianness to start with. The Unicode specification itself does allow for one but doesn’t recommend it. It merely acknowledges it might happen as a side-effect of data conversion from other Unicode encodings that do have endianness.

So, in theory, UTF-8 with BOM should be perfectly acceptable. In practice, only Microsoft really embraced UTF-8 BOM. Pretty much everybody else decided to have UTF-8 without BOM as that allowed for full compatibility with 7-bit ASCII.

With time, .NET/C# stopped being Windows-only and, by today, became a really good multiplatform solution. And now, a helper that ought to simplify things ends up producing output that will annoy many command-line tools that don’t expect it. If you read the documentation, a solution exists - just create your own UTF-8 encoding instance.

private static readonly Encoding Utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

Now you can use Utf8 wherever you would otherwise pass Encoding.UTF8 and you will get the expected result on all platforms, including Windows - no BOM, no problems.
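To make the difference visible, here is a minimal sketch. BomDemo, WriteWith, and StartsWithBom are names made up for this illustration, and a MemoryStream stands in for a real file. It writes the same text through a StreamWriter with a given encoding and returns the raw bytes, so you can check whether the three BOM bytes (EF BB BF) ended up at the start.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; the names are not from any library.
public static class BomDemo {

    // Writes text through a StreamWriter using the given encoding
    // and returns everything that ended up in the stream.
    public static byte[] WriteWith(Encoding encoding, string text) {
        using var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, encoding, bufferSize: 1024, leaveOpen: true)) {
            writer.Write(text);
        }  // disposing the writer flushes it into the stream
        return stream.ToArray();
    }

    // Returns true if the buffer starts with the UTF-8 BOM (EF BB BF).
    public static bool StartsWithBom(byte[] bytes) {
        return bytes.Length >= 3
            && bytes[0] == 0xEF
            && bytes[1] == 0xBB
            && bytes[2] == 0xBF;
    }

}
```

Writing "A" with Encoding.UTF8 should give four bytes (BOM plus the letter), while new UTF8Encoding(encoderShouldEmitUTF8Identifier: false) should give just one. Calling GetBytes("A") directly gives a single byte with either encoding, since the preamble is only added by the writer.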

So, one could argue that the Encoding.UTF8 default should be changed to a more appropriate value. I mean, .NET is multiplatform and the current default doesn’t work everywhere. One could argue but this default is not changing, ever.

When any project starts, decisions must be made. And you won’t know for a while if those decisions were good. On the other hand, people will start depending on whatever behavior you selected.

In the case of BOM, it might be that a developer got so used to having those three extra bytes that, instead of checking the file content, they simply use <=3 as a signal that the file is empty. Or they have a script that takes the output of some C# application and just strips the first three bytes blindly before moving it to a non-BOM-friendly input. Or any other decision somebody made in a project years ago. It doesn’t really matter how bad someone’s code is. What matters is that the code is currently working and a new .NET release shouldn’t silently break it.

So, I am reasonably sure that Microsoft won’t ever change this default. And, begrudgingly, I agree with that. Some bad choices are simply meant to stay around.


PS: And don’t let me start talking about GUIDs and their binary format…

CoreCompile into the Ages

For one project of mine I started having a curious issue. After adding a few (admittedly a bit complicated) classes, my compile times under Linux shot to eternity. But that was only when running with the dotnet command-line tools. In Visual Studio under Windows, all worked just fine.

Under dotnet I would just see the CoreCompile step counting seconds, and then minutes. I tried increasing the log level - nothing. I tried not cleaning stuff, i.e. using cached files - nothing. So, I tried cleaning up my .csproj file - hm… things improved, albeit just a bit.

A bit of triage later and I was reasonably sure that the .NET code analyzers were the culprit. The reason changes to .csproj reduced the time was that I had AnalysisMode set quite high. The default AnalysisMode simply checks less.

While disabling .NET analyzers altogether was out of the question, I was quite OK with not running them all the time. So, until .NET under Linux gets a bit more performant, I simply included EnableNETAnalyzers=false in my build scripts.

  -p:EnableNETAnalyzers=false

Another problem solved.

Custom StringBuilder Pool

In my last post I grumbled about ObjectPool being a separate package. That was essentially the single downside of using it. So, how hard is it to implement our own StringBuilder pool?

Well, not that hard. The whole thing can be something like this:

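using System.Collections.Concurrent;
using System.Runtime.CompilerServices;
using System.Text;

internal static class StringBuilderPool {

    // Unbounded, thread-safe queue of idle builders.
    private static readonly ConcurrentQueue<StringBuilder> Pool = new();

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static StringBuilder Get() {
        // Reuse an idle builder if there is one; otherwise allocate a new one.
        return Pool.TryDequeue(out StringBuilder? sb) ? sb : new StringBuilder(4096);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool Return(StringBuilder sb) {
        sb.Length = 0;  // clear the content so the next caller starts empty
        Pool.Enqueue(sb);
        return true;
    }

}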

In our Get method we check if we have any stored StringBuilder. If yes, we just return it. If no, we create a new instance.

In the Return method we clear the returned instance and add it to the queue.

Now, this is not exactly an ObjectPool equivalent. For example, it doesn’t limit the pool size. And it will keep large objects around forever. However, for my case it was good enough and unlikely to cause any problems.
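If those two gaps ever start to matter, a bounded variant is not much longer. What follows is only a sketch under my own assumptions, not what ObjectPool does internally: the class name and the MaxPooled and MaxCapacity limits are made-up values for illustration. It refuses to keep builders that have grown too big and stops accepting new ones once the pool is full.

```csharp
using System.Collections.Concurrent;
using System.Text;
using System.Threading;

// Illustrative sketch; the name and limits are assumptions, not taken from any library.
internal static class BoundedStringBuilderPool {

    private const int InitialCapacity = 4096;
    private const int MaxPooled = 16;           // assumed cap on idle builders
    private const int MaxCapacity = 64 * 1024;  // assumed cap on retained buffer size

    private static readonly ConcurrentQueue<StringBuilder> Pool = new();
    private static int _count;  // number of builders counted as pooled

    public static int Count => Volatile.Read(ref _count);

    public static StringBuilder Get() {
        if (Pool.TryDequeue(out StringBuilder? sb)) {
            Interlocked.Decrement(ref _count);
            return sb;
        }
        return new StringBuilder(InitialCapacity);
    }

    public static bool Return(StringBuilder sb) {
        // Oversized builders are dropped and left to the garbage collector.
        if (sb.Capacity > MaxCapacity) { return false; }

        // Reserve a slot first; give it back if the pool is already full.
        if (Interlocked.Increment(ref _count) > MaxPooled) {
            Interlocked.Decrement(ref _count);
            return false;
        }

        sb.Clear();
        Pool.Enqueue(sb);
        return true;
    }

}
```

The counter is kept separately because ConcurrentQueue.Count is comparatively expensive to call on a hot path. The price is that the limit is only approximate while several threads return builders at the same moment, which seems acceptable for a cache.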

And performance… Well, performance is promising, to say the least:

Test                    Mean          Error       StdDev      Gen0     Gen1    Allocated
StringBuilder (small)   15.762 ns     0.3650 ns   0.4057 ns   0.0181   -       152 B
ObjectPool (small)      17.257 ns     0.0616 ns   0.0576 ns   0.0057   -       48 B
Custom pool (small)     16.864 ns     0.0192 ns   0.0150 ns   0.0057   -       48 B
Concatenation (small)   9.716 ns      0.1634 ns   0.1528 ns   0.0105   -       88 B
StringBuilder (medium)  58.125 ns     0.6429 ns   0.6013 ns   0.0526   -       440 B
ObjectPool (medium)     23.226 ns     0.0517 ns   0.0484 ns   0.0115   -       96 B
Custom pool (medium)    23.660 ns     0.2515 ns   0.1963 ns   0.0115   -       96 B
Concatenation (medium)  66.353 ns     1.3307 ns   1.2447 ns   0.0793   -       664 B
StringBuilder (large)   190.293 ns    0.7781 ns   0.6498 ns   0.2496   0.0010  2088 B
ObjectPool (large)      92.556 ns     0.9281 ns   0.8228 ns   0.0755   -       632 B
Custom pool (large)     91.470 ns     0.5478 ns   0.5124 ns   0.0755   -       632 B
Concatenation (large)   1,430.599 ns  11.5971 ns  10.8479 ns  4.0169   0.0057  33600 B

It’s pretty much on par with the ObjectPool implementation. Honestly, the results are close enough to be equivalent for all practical purposes.

So, if you don’t want to pull the whole Microsoft.Extensions.ObjectPool just for caching a few StringBuilder instances, consider rolling your own.

To Pool or Not to Pool

For a project of mine I “had” to do a lot of string concatenations. The easy solution was just to have a string builder and go wild. But I wondered, does it make sense to use ObjectPool (found in the Microsoft.Extensions.ObjectPool package)? Thus, I decided to do a few benchmarks.

For my use case, “small” was just appending 3 items to a StringBuilder. “Medium” does a total of 21 appends. And finally, “large” does 201 appends. And no, there is no real reason why I used those exact numbers other than that the loop ended up being nice. :)

After all this, benchmark results (courtesy of BenchmarkDotNet):

Test                         Mean         Error      StdDev     Gen0    Gen1    Allocated
StringBuilder (small)        16.295 ns    0.1240 ns  0.1160 ns  0.0181  -       152 B
StringBuilder Pool (small)   17.958 ns    0.3125 ns  0.2609 ns  0.0057  -       48 B
StringBuilder (medium)       87.052 ns    1.5177 ns  1.4197 ns  0.0832  0.0001  696 B
StringBuilder Pool (medium)  31.245 ns    0.1815 ns  0.1417 ns  0.0181  -       152 B
StringBuilder (large)        304.724 ns   1.6736 ns  1.3975 ns  0.4520  0.0029  3784 B
StringBuilder Pool (large)   172.615 ns   1.5325 ns  1.4335 ns  0.1471  -       1232 B

As you can see, if you are doing just a few appends, it’s probably not worth messing with ObjectPool. Not that you should use StringBuilder either. If you are adding 4 or fewer strings, you might as well concatenate them - it’s actually more performant.

However, if you are adding 5 or more strings together, the pool is no worse than instantiating a new StringBuilder. So, for pretty much any scenario where you would use StringBuilder, it pays off to pool it.

Is there a situation where you would avoid the pool? Well, performance-wise, I would say probably no. I ran multiple tests and, on my computer, there was no situation where StringBuilder alone was better than either pool or concat. Yes, StringBuilder is performant at a low number of appends, but string concatenation is better. As soon as you go over a few appends, ObjectPool actually makes sense.

However, the elephant in the room is ObjectPool’s dependency on an external package. Call me old-fashioned but there is value in not depending on extra packages.

The final decision is, of course, up to you. But, if performance is important, I see no reason not to use ObjectPool. I only wish it wasn’t an extra package.


For the curious, the code was as follows:

[Benchmark]
public string Large_WithoutPool() {
    var sb = new StringBuilder();
    sb.Append("Hello");
    for (var i = 0; i < 100; i++) {
        sb.Append(' ');
        sb.Append("World");
    }
    return sb.ToString();
}

[Benchmark]
public string Large_WithPool() {
    var sb = StringBuilderPool.Get();
    try {
        sb.Append("Hello");
        for (var i = 0; i < 100; i++) {
            sb.Append(' ');
            sb.Append("World");
        }
        return sb.ToString();
    } finally {
        sb.Length = 0;
        StringBuilderPool.Return(sb);
    }
}

And yes, I also tested just a simple string concatenation (which is quite optimized for a small number of concatenations):

Test                    Mean          Error       StdDev      Gen0     Gen1    Allocated
Concatenation (small)   9.820 ns      0.2365 ns   0.2429 ns   0.0105   -       88 B
Concatenation (medium)  146.901 ns    1.6561 ns   1.2930 ns   0.2294   -       1920 B
Concatenation (large)   4,710.573 ns  43.5370 ns  96.4750 ns  15.2054  0.0458  127200 B

LocoNS

If one checks all the freeware stuff I made over the years, they might notice a theme. The programs usually solve a problem that seemingly only I have. And yes, this program is one of those too.

As many people do, I have most of my internal DNS resolution handled by mDNS. I used to have it done by my router, but over time I moved to encrypted DNS and spinning one up internally seemed like overkill. So, I just rely on all devices running their own mDNS and everything getting auto-magically resolved. For devices that are not capable of resolving mDNS themselves, I used to run Avahi on my main server. Avahi uses my hosts file and thus I avoid having to distribute config to each machine. Except that Avahi doesn’t really understand my hosts file.

Part of the issue is having two different names for the same server. For example, I have a main server and its backup, each with a unique name (vilya and nenya). But I don’t use those names directly. I usually access the active one using a common name (ring) that is switched between them when I need to do some work. Usually ring is the same IP as my main server (vilya). But, if I know I am going to do some work, I will redirect it to the backup server (nenya) in order to keep (read-only) access to all the family stuff. Once done, ring just moves back.

And this simple scenario is something Avahi specifically will not do. Avahi allows only one DNS name per IP, no exceptions. And that’s probably how it should be. But that’s not how I want it. So, I built LocoNS.

LocoNS is as dumb as mDNS servers get. By default it will get onto all available interfaces and use the hosts file as its source of truth. If there are multiple names for an IP address (as is explicitly allowed in the hosts file), it will learn all of them. In addition, it will listen to other mDNS traffic and remember where things are. If there is any query, LocoNS will respond immediately.
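To illustrate the "multiple names per IP" part, here is a rough sketch of how a hosts file could be turned into a name lookup. This is not LocoNS's actual code; HostsParser and its Parse method are hypothetical names, and the addresses in the usage note are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper for illustration; not taken from LocoNS.
public static class HostsParser {

    // Maps every host name to the addresses listed for it.
    // Each hosts line has the form: <ip> <name> [<alias>...] [# comment]
    public static Dictionary<string, List<IPAddress>> Parse(IEnumerable<string> lines) {
        var result = new Dictionary<string, List<IPAddress>>(StringComparer.OrdinalIgnoreCase);

        foreach (var rawLine in lines) {
            // Drop the comment part, if any.
            var hashIndex = rawLine.IndexOf('#');
            var line = hashIndex >= 0 ? rawLine.Substring(0, hashIndex) : rawLine;

            var parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length < 2) { continue; }  // blank line or address without names
            if (!IPAddress.TryParse(parts[0], out var address)) { continue; }

            // Every name after the address is kept, not just the first one.
            for (var i = 1; i < parts.Length; i++) {
                if (!result.TryGetValue(parts[i], out var addresses)) {
                    addresses = new List<IPAddress>();
                    result[parts[i]] = addresses;
                }
                addresses.Add(address);
            }
        }

        return result;
    }

}
```

With lines such as "192.168.1.10 vilya ring" and "192.168.1.11 nenya", both vilya and ring end up pointing to the first address. Moving ring to the backup is then just a matter of moving it to the other line. The AddressFamily of each parsed address tells an IPv4 entry (A record) from an IPv6 one (AAAA record).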

The whole application is set up so it works with an unmodified hosts file and no special configuration should be necessary for it to work. Of course, you can still change functionality. For example, you can define which interfaces you want to use, whether you want to “learn” from other mDNS servers at all, or even whether you want to use the hosts file to begin with. But the configuration is kept simple intentionally.

And no, LocoNS is not a full mDNS solution. To start with, it only supports A and AAAA records. Its intention is to be only a supporting element that will solve one issue mDNS doesn’t usually solve for me.

If this piqued your curiosity, the download is available on its page. You can download an AppImage, a Debian package, or a Docker image. And yes, I know there is no Windows download. While LocoNS will work under Windows, I am just too lazy to make it into a service. I guess I might, if enough people scream at me. Chances are, that probably won’t happen.

If this all sounds like a problem you also need solved, do check it out.