Programming in C#, Java, and god knows what not

Better Pseudorandom Numbers

Browsing the Internet out of boredom usually brings a lot of nonsense. However, it occasionally also brings a gem. This time I accidentally stumbled upon a family of random number algorithms called xoshiro/xoroshiro.

Pseudo-random generators fell out of favor lately as proper cryptographically-secure algorithms became ubiquitous on modern computers (and are often backed by the processor's RNG). For cases where pseudo-random generators are a better fit, most programming languages already include the Mersenne Twister, allowing generation of reasonable randomness.

But that doesn’t mean research into better (pseudo)randomness has stopped. From that research comes a whitepaper named Scrambled linear pseudorandom number generators. The paper alone goes over the algorithms in detail but the authors were also kind enough to provide a PRNG shootout page giving practical advice.

After spending quite a few hours with these, I decided that the only thing missing was a C# variant of the same. So I created it.

Links to source and NuGet package are here.

Uniform Distribution from Integer in C#

When dealing with random numbers, one often needs to get a random floating point number between 0 and 1 (not inclusive). Unfortunately, most random generators only deal with integers and/or bytes. Since bytes can be easily converted to integers, the question becomes: “How can I convert an integer to a double in the [0..1) range?”

Well, assuming you start with a 64-bit unsigned integer, you can use something like this:

ulong value = 1234567890; //some random value
byte[] buffer = BitConverter.GetBytes((ulong)0x3FF << 52 | value >> 12);
return BitConverter.ToDouble(buffer, 0) - 1.0;

With that in mind, you can see that the 8-byte (64-bit) buffer is filled with a double whose exponent field is 0x3FF (all 1's except the top bit, i.e. a multiplier of 2^0) and whose fraction field holds the top 52 bits of the random number. If we take that raw buffer and convert it into a double, we’ll get a number in the [1..2) range. Simply subtracting 1.0 will place it in our desired [0..1) range. All 52 bits of the fraction are used this way.

This is as good as uniform distribution can get using 64-bit double.
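The same bit trick is not specific to C#. Here is a rough C++ sketch of it, assuming the platform uses IEEE 754 doubles; the function name toUnitDouble is made up for this example.

```cpp
#include <cstdint>
#include <cstring>

// Maps a 64-bit random integer to a double in [0..1).
// Assumes IEEE 754 binary64 doubles; toUnitDouble is a hypothetical name.
double toUnitDouble(std::uint64_t value) {
    // Exponent field 0x3FF (2^0) combined with the top 52 random bits as the fraction.
    std::uint64_t bits = (UINT64_C(0x3FF) << 52) | (value >> 12);

    double result;
    std::memcpy(&result, &bits, sizeof result);  // reinterpret the bits; value is in [1..2)
    return result - 1.0;                         // shift down into [0..1)
}
```

An input of 0 gives exactly 0.0, an input with only the top bit set gives 0.5, and an input of all 1's stays just below 1.0.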


PS: If we apply the same principle to the float, equivalent code will be something like this (assuming a 32-bit random uint as input):

uint value = 1234567890; //some random value
byte[] buffer = BitConverter.GetBytes((uint)0x7F << 23 | value >> 9);
return BitConverter.ToSingle(buffer, 0) - 1.0f;

PPS: C#'s Random class uses code that’s probably a bit easier to understand:

return value * (1.0 / Int32.MaxValue);

Unfortunately, this will use only 31 bits for distribution (instead of 52 available in double). This will cause statistical anomalies if used later to scale into a large integer range.
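To get a feel for how coarse that is, here is a small C++ sketch of the same scaling. The function names are invented for illustration and are not taken from .NET; the point is only to show what happens when a 31-bit value is stretched over a range much larger than 2^31.

```cpp
#include <cstdint>

// Mimics the scaling quoted above: a non-negative 31-bit integer mapped to [0..1].
// Hypothetical helper, not the actual .NET implementation.
double unitFrom31Bits(std::int32_t value) {
    return value * (1.0 / INT32_MAX);
}

// Naively stretches that unit value over [0..range).
std::uint64_t scaleToRange(std::int32_t value, std::uint64_t range) {
    return static_cast<std::uint64_t>(unitFrom31Bits(value) * static_cast<double>(range));
}
```

With a range of 2^40, two neighbouring inputs (say 1000 and 1001) land about 512 apart, so most integers in that range can never be produced.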

Splitting the Baby

For one of my hardware projects, I decided to try doing things a bit differently. Instead of using a single repository, I decided to split it into two - one containing Firmware and the other containing Hardware.

Since the repository already had those as subdirectories, I thought using --subdirectory-filter as recommended on GitHub would solve it all. Unfortunately, that left way too many commits not touching either of those directories. So I decided to tweak the procedure a bit.

I first removed all the files I didn’t need using --index-filter. On that cleaned-up state I applied --subdirectory-filter just to bring directory to the root. Unfortunately, while preserving tags was possible, it proved to be annoying enough to actually remove them all and manually retag all once done.

As discussed above, on the COPY of original repository we first remove all files/directories that are NOT Hardware and then we essentially move Hardware directory to the root level of newly reorganized repository.

git remote rm origin
git filter-branch --index-filter \
    'git rm -rf --cached --ignore-unmatch .gitignore LICENSE.md PROTOCOL.md README.md Firmware/' \
    --prune-empty --tag-name-filter cat -- --all
rm -Rf .git/logs .git/refs/original .git/refs/remotes .git/refs/tags
git filter-branch --prune-empty --subdirectory-filter Hardware main
rm -Rf .git/logs .git/refs/original
git gc --prune=all --aggressive
git log --pretty --graph

With Hardware repository sorted, I did exactly the same process for Firmware with the new COPY of original repository, only changing the directory names.

git remote rm origin
git filter-branch --index-filter \
    'git rm -rf --cached --ignore-unmatch .gitignore LICENSE.md PROTOCOL.md README.md Hardware/' \
    --prune-empty --tag-name-filter cat -- --all
rm -Rf .git/logs .git/refs/original .git/refs/remotes .git/refs/tags
git filter-branch --prune-empty --subdirectory-filter Firmware main
rm -Rf .git/logs .git/refs/original
git gc --prune=all --aggressive
git log --pretty --graph

Once I got two repositories, it was easy enough to combine them. I personally love using subtrees but submodules have their audience too.

Using Null in the Face of CS0121

Sometimes you might want to use null as a parameter but you get the annoying CS0121.

The call is ambiguous between the following methods or properties…

One could just adjust the constructor but that would be the easy way out.

The proper way would be to use the default operator. For example, if you want to pass a null string, you can use something like default(string):

var dummy = new Dummy(default(string));

If we’re using nullable reference types, just add question mark:

var dummy = new Dummy(default(string?));

And this simple trick selects the correct constructor every time.

Parsing GZip Stream Without Looking Back

Some files can exist in two equivalent forms - compressed and uncompressed. One excellent example is .pcap. You can get it as standard .pcap we all know and love but it also comes compressed as .pcap.gz. To open a compressed file in C#, you could pass it to GZipStream - it works flawlessly. However, before doing that you might want to check if you’re dealing with compressed or uncompressed form.

The check itself is easy. Just read the first 2 bytes and, if they’re 0x1F 0x8B, you’re dealing with a compressed stream. However, you just consumed 2 bytes and simply handing the file stream over to GZipStream will no longer work. If you are dealing with a file on a disk, just seek backward and you’re good. But what if you are dealing with streaming data and seeking is not possible?

For .pcap and many other transparently compressed formats, you can simply skip straight to the bread-and-butter of the compression - the deflate algorithm. You see, GZip is just a thin wrapper over a deflate stream. And quite often it has only the fixed-size header. If you move just an additional 8 bytes (thus skipping a total of 10), you can use DeflateStream and forget about “rewinding.”

Wanna see example? Check constructor of PcapReader class.
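The 10-byte shortcut only holds when no optional header fields are present. For the cases when they are, the header can still be walked without seeking. Below is a rough C++ sketch based on the header layout from RFC 1952, not on PcapReader itself; the function name deflateOffset is made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the offset where raw deflate data starts inside a gzip member,
// or -1 if the buffer is not gzip or the header is cut short.
// Hypothetical helper; field layout follows RFC 1952.
long deflateOffset(const std::vector<std::uint8_t>& data) {
    if (data.size() < 10) return -1;
    if (data[0] != 0x1F || data[1] != 0x8B) return -1;  // magic bytes
    if (data[2] != 8) return -1;                        // CM: 8 means deflate

    const std::uint8_t flags = data[3];
    std::size_t pos = 10;  // fixed part: magic(2), CM, FLG, MTIME(4), XFL, OS

    if (flags & 0x04) {  // FEXTRA: 2-byte little-endian length, then payload
        if (pos + 2 > data.size()) return -1;
        std::size_t xlen = data[pos] | (static_cast<std::size_t>(data[pos + 1]) << 8);
        pos += 2 + xlen;
    }

    // FNAME and FCOMMENT are zero-terminated strings.
    auto skipString = [&]() {
        while (pos < data.size() && data[pos] != 0) ++pos;
        ++pos;  // step over the terminating zero
    };
    if (flags & 0x08) skipString();  // FNAME
    if (flags & 0x10) skipString();  // FCOMMENT
    if (flags & 0x02) pos += 2;      // FHCRC: 2-byte header checksum

    if (pos > data.size()) return -1;  // header runs past the buffer
    return static_cast<long>(pos);
}
```

For a plain header with no flags set this returns 10, which matches the fixed-size case described above. On a real stream you would read these fields incrementally instead of buffering them, but the order of the fields stays the same.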

SignTool and Error -2146869243/0x80096005

As I was trying out my new certificate, I got the following error:

SignTool Error: An unexpected internal error has occurred.
Error information: "Error: SignerSign() failed." (-2146869243/0x80096005)

Last time I had this error, I simply gave up and used another timestamp server. This time I had a bit more time and wanted to understand where the error was coming from. After a bit of checking, I think I got it now. It’s the digest algorithm.

SignTool still uses SHA-1 as default. Some servers (e.g. timestamp.digicert.com) are ok with that. However, some servers (e.g. timestamp.comodoca.com and timestamp.sectigo.com) are not that generous. They simply refuse to use weak SHA-1 for their signature.

The solution is simple - just add /td sha256 to the list of SignTool arguments.

Merging Two Git Repositories

As I went about rewriting QText, I did so in a completely new repository. It just made more sense that way. In time, this new code became what the next QText version will be. And now there’s a question - should I still keep it in a separate repository?

After some thinking, I decided to bring the new repository (QTextEx) as a branch in the old repository (QText). That way I have a common history while still being able to load the old C# version if needed.

All operations below are to be executed in the destination repository (QText).

The first step is to create a fresh new branch without any common history. This will ensure Git doesn’t try to do some “smart” stuff when we already know these repositories are unrelated.

git switch --discard-changes --orphan new

This will leave quite a few files behind. You probably want to clean those before proceeding.

The next step is to add one repository into the other. This can be done by adding remote into destination, pointing toward the source. Remote can be anywhere but I find it easiest if I use it directly from my file system. After fetching the repository, a simple merge is all it takes to get the commits. Once that’s done, you can remove the remote.

git remote add QTextEx ../QTextEx
git fetch --tags QTextEx
git merge --ff --allow-unrelated-histories QTextEx/main
git remote remove QTextEx

After a push, you’ll see that all commits from the source repository are now present in destination too.


PS: Do make backups and triple verify all before pushing it upstream.

PPS: This is a permanent merge of histories. Consider using subtree if you want to keep them separate.

PPPS: Things get a bit more complicated if you want to transfer multiple branches from the other repository.

Background Worker in Qt

Coming from C#, Qt and its C++ base might not look the friendliest. One example is the ease of BackgroundWorker and GUI updates. The “proper” way of creating threads in Qt is simply a bit more involved.

However, with some lambda help, one might come upon a solution that’s not all that different.

#include <QFutureWatcher>
#include <QtConcurrent/QtConcurrent>
…
//assumes this code runs inside a QObject-derived class (e.g. a window)
QFutureWatcher<bool>* watcher = new QFutureWatcher<bool>(this);
connect(watcher, &QFutureWatcher<bool>::finished, this, [watcher]() {
    //do something once done (runs in the main thread)
    bool result = watcher->future().result();
    watcher->deleteLater();
});
QFuture<bool> future = QtConcurrent::run([]() {
    //do something in background
    return true;
});
watcher->setFuture(future);

QFuture is doing the heavy lifting but you cannot update the UI from its thread. For that you need a QFutureWatcher that’ll notify you within the main thread that processing is done. That is where you get to check the result and do the updating.

Not as elegant as BackgroundWorker but not too annoying either.

Stripping Diacritics in Qt

As someone dealing with languages other than English, I quite often need to deal with diacritics. You know, characters such as č, ć, đ, š, ž, and similar. Even in English texts I can sometimes see them, e.g. voilà. And yes, quite often you can omit them and still have understandable text. But often you simply cannot because it changes the meaning of the word.

One place where this often bites me is search. It’s really practical for search to be accent insensitive since that allows me to use an English keyboard even though I am searching for content in another language. Search that would ignore diacritics would be awesome.

And over time I implemented something like that in all my apps. As I am building a new QText, it came time to implement it in C++ with a Qt flavor. And, unlike C#, C++ was not really built with a lot of internationalization in mind.

The solution comes from none other than the now late Michael S. Kaplan. While his blog was deleted by Microsoft (great loss!), there are archives of his work still around - courtesy of people who loved his work. His solution was in C# (that’s how I actually remembered it - I already needed that once) and it was beautifully simple. Decompose the Unicode string, remove non-spacing mark characters, and finally combine what’s left back into a Unicode string.

In Qt’s C++, that would be something like this:

QString stripDiacritics(QString text) {
    QString formD = text.normalized(QString::NormalizationForm_D);

    QString filtered;
    for (int i = 0; i < formD.length(); i++) {
        if (formD.at(i).category() != QChar::Mark_NonSpacing) {
            filtered.append(formD.at(i));
        }
    }

    return filtered.normalized(QString::NormalizationForm_C);
}

Renaming Master Branch to Main

GitHub is making a major change to the default branch name. As of October 1st, the default branch will be called main instead of master. While this is done just for the new repositories and the official recommendation is to wait until the end of year for the existing ones, I was never the one to follow the rules.

To locally change the name of the branch, you just need to move it.

git branch -m master main

Next step is telling GitHub you have a new branch:

git push -u origin main

If you go to GitHub now, you’ll see both main and master present with master still being the default. To change this you’ll need to go into the repository settings and switch the default branch there.

Only once that step is done can you delete the master branch forever.

git push origin --delete master

Now the existing repository has main as the default branch name.

As you can see, currently this process is a bit involved and I am sure that GitHub will automate it reasonably soon. You might want to wait with local renames until they do. My plan is to update branch names as I push updates to my projects. Active repositories will get the update sooner while some old repositories might stay with master forever.

That’s all fine and dandy but what about the new repositories? Well, there’s a setting for that too. Just adjust the init.defaultBranch Git property.

git config --global init.defaultBranch main

And now you’re ready to roll.


PS: I will not get into politics whether this change was necessary or not. As a Croat, I will never fully understand the slavery and the emotional impact having the master branch might have. For me this change is more of pragmatism. The exact name doesn’t matter much to me and main is a better choice anyhow.