Programming in C#, Java, and god knows what not

Own Pwned

For a while now, ';--have i been pwned? has been providing two services. The better-known one is informing people of data breaches. The slightly less known one is its API. My personal favorite is its password search interface. So, I was really sad to see Troy start charging for it.

While I understand Troy’s reasons, I used this API in a freeware application. And yes, I could have “swallowed” the $3.50 this service costs, but I wasn’t willing to. My freeware hobby is already costing me enough. :)

Fortunately, Troy allows downloading the password hashes, so one can easily set up the API on their own server. So, over a weekend, I did. My OwnPwned GitHub repository contains everything you might need to create your own verification service. But there are some differences.

First of all, this is not a substitute for the ';--have i been pwned? API: since it depends on data from it, it will ALWAYS be one step behind. Also, I haven’t implemented the full API as I only needed the password verification portion. Even for that portion, I trimmed all the extra data (e.g. the password breach count) and focused only on the passwords themselves.

To make use of the project, you first need to download the latest password dump (ordered by hash). Once you unpack that file, you use PwnedRepack to convert it to a binary file. I found this step necessary both for speed (as you can then use binary search) and for size (as it brought the 25 GB file down to a slightly more manageable but still huge 12 GB).
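
I won’t document the exact binary format here, but the rough idea is simple enough: with one fixed-size record per hash, sorted, you can binary-search the file directly. A C# sketch of such a lookup (my illustration here, not the actual PwnedRepack/PwnedServe code):

using System;
using System.IO;

static bool ContainsHash(string filePath, byte[] sha1) {
    const int RecordSize = 20;  // one raw SHA-1 hash per record
    using var stream = File.OpenRead(filePath);
    var record = new byte[RecordSize];
    long lo = 0;
    long hi = stream.Length / RecordSize - 1;
    while (lo <= hi) {  // classic binary search over fixed-size records
        long mid = lo + (hi - lo) / 2;
        stream.Position = mid * RecordSize;
        stream.ReadExactly(record);  // .NET 7+; loop over Read() on older runtimes
        int cmp = record.AsSpan().SequenceCompareTo(sha1);
        if (cmp == 0) { return true; }
        if (cmp < 0) { lo = mid + 1; } else { hi = mid - 1; }
    }
    return false;
}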

With the file in hand, there are two ways to search the data. The first one is the PwnedServe application that simply exposes the interface on localhost. The second is serving PwnedPhp on an Apache server. Either way, you can do a k-anonymity search over a range using the first 5 hexadecimal characters of the password’s SHA-1 hash.

Something like this: /range/12345/.
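
On the client side, the k-anonymity dance stays the same as with the original API: hash the password locally, send only the first 5 hex characters, and compare the remaining 35 at home. A minimal C# sketch (.NET 5 or later), assuming the service runs on localhost and returns one suffix per line (remember, my version trims the breach counts):

using System;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;

static bool IsPwned(HttpClient client, string password) {
    var hash = Convert.ToHexString(SHA1.HashData(Encoding.UTF8.GetBytes(password)));
    var prefix = hash.Substring(0, 5);  // the only part that leaves the machine
    var suffix = hash.Substring(5);     // compared locally
    var body = client.GetStringAsync("http://localhost:8080/range/" + prefix).Result;
    foreach (var line in body.Split('\n')) {
        if (line.Trim().Equals(suffix, StringComparison.OrdinalIgnoreCase)) { return true; }
    }
    return false;
}

The port is, of course, a placeholder; adjust it to wherever PwnedServe actually listens.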

And yes, the code is not optimized and probably never will be due to the lack of free time on my side. But it does solve my issue. Your mileage may vary.


PS: Please note that Troy Hunt has open-sourced some elements of HIBP, with more to come. If you need a fully-featured interface, that’s probably what you should keep an eye on.

Small C# InfluxDB client

Well, after doing an InfluxDB client in bash and Go, the time has come to do the same in C#.

I will not go too much into detail as you can see the source code yourself. Suffice it to say, it supports both the v1 and v2 line protocol. And the usage is as simple as it gets:

var measurement = new InfluxMeasurement("Tick")
  .AddTag("t1", "Tag1")
  .AddTag("t2", "Tag2")
  .AddField("f1", 42)
  .AddField("f2", true);
client.Queue(measurement);
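
For the curious, the measurement above gets serialized into a line protocol entry looking roughly like this (timestamp omitted):

Tick,t1=Tag1,t2=Tag2 f1=42i,f2=true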

The source code is, of course, on GitHub, and the project is also available as a NuGet package.

The Minimal InfluxDB Client in Go

While you can get a proper InfluxDB client library for Go, sometimes there’s no substitution for rolling one yourself - especially since InfluxDB’s Line Protocol is really easy.

It’s just a matter of constructing the correct URL and setting two headers. Something like this:

import (
    "fmt"
    "net/http"
    "strings"
)

func sendInfluxData(line string, baseUrl string, org string, bucket string, token string) {
    var url string
    if len(org) > 0 {
        url = fmt.Sprintf("%s/api/v2/write?org=%s&bucket=%s", baseUrl, org, bucket)
    } else {
        url = fmt.Sprintf("%s/api/v2/write?bucket=%s", baseUrl, bucket)
    }

    request, _ := http.NewRequest("POST", url, strings.NewReader(line))
    request.Header.Set("Content-Type", "text/plain")
    if len(token) > 0 {
        request.Header.Set("Authorization", "Token "+token)
    }

    http.DefaultClient.Do(request)
}

And yes, the code doesn’t do any error checking, nor does it have any comments. Deal with it. ;)

On Air

Working from home requires a bit of synchronization between occupants, especially if one member of the family spends a lot of time on calls. Quite early into the work-from-home adventure, my wife found a solution: she bought a lighted “On Air” sign.

The idea was good. Whenever I was in a conference call, I would just light up the sign and everybody would know to keep quiet as our words were not private anymore. In reality, most of the time I would either leave the sign on longer than needed or forget to turn it off when I was done speaking.

And pretty much all the issues could be traced to the position of the sign. While it was visible to everybody else in the room, it wasn’t directly visible to me. And to make it more annoying, turning it off and on required me to get off the chair. Excellent for physical activity, but annoying when I need to do it multiple times in a call.

So I decided to automate this a bit.

I first repurposed one of the Wyze Plug devices I had around and went looking for an API. Unfortunately, Wyze doesn’t offer a public API at this time, but other people have already reverse-engineered it. Alas, all those ports were outdated - until I found a gem in the comments. With those changes, it was easy enough to make my own mini application.

While this would be enough for turning the light on and off, I was after something a bit more fine-grained. In quite a few conference calls I might not speak a lot. For those I usually just hit the Mute Mic button on my Lenovo P70 and unmute only when I need to actually speak. So it seemed like a good compromise to light up the sign only when I am unmuted. If I’m muted, other family members can have their conversations without impacting my call.

And the following script was the last piece of the puzzle:

#!/bin/bash

LAST_MIC_STATUS=
while (true); do
  # 0 if the capture device reports [off] (muted), 1 otherwise
  CURR_MIC_STATUS=`/usr/bin/amixer get Capture | grep -q '\[off\]' && echo 0 || echo 1`

  if [[ "$LAST_MIC_STATUS" != "$CURR_MIC_STATUS" ]]; then  # act only on state change
    LAST_MIC_STATUS=$CURR_MIC_STATUS
    if [[ "$CURR_MIC_STATUS" -ne 0 ]]; then NEW_STATE=true; else NEW_STATE=false; fi

    WYZE_EMAIL="^^unknown@example.com^^" \
    WYZE_PASSWORD="^^changeme^^" \
    WyzePlugControl ^^2CAA8E6616D2^^ $NEW_STATE
  fi

  sleep 1
done

It essentially just checks the status of my mute button and adjusts the Wyze Plug accordingly. At least until Wyze changes the API again.

Always Pushing Tags

Tagging is nice, but I always forget to push the darn things. And yes, I am one of those guys who push all local tags - so what? We’re all kinky in our own ways.

There are many ways to ensure you push tags with your regular push, but my favorite is just editing .gitconfig. In the [remote "origin"] section, I just add the following two lines:

push = +refs/heads/*:refs/heads/*
push = +refs/tags/*:refs/tags/*
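
For context, the whole section then ends up looking something like this (the URL is a placeholder, and the fetch line is git’s default):

[remote "origin"]
    url = https://example.com/my-repository.git
    fetch = +refs/heads/*:refs/remotes/origin/*
    push = +refs/heads/*:refs/heads/*
    push = +refs/tags/*:refs/tags/*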

Now each push will come with a bit of extra.

Couldn't Find a Valid ICU Package

As I ran my .NET 5 application on Linux, I was greeted with the following error:

Process terminated. Couldn't find a valid ICU package installed on the system. Set the configuration flag System.Globalization.Invariant to true if you want to run with no globalization support.
   at System.Environment.FailFast(System.String)
   at System.Globalization.GlobalizationMode.GetGlobalizationInvariantMode()
   at System.Globalization.GlobalizationMode..cctor()
   at System.Globalization.CultureData.CreateCultureWithInvariantData()
   at System.Globalization.CultureData.get_Invariant()
   at System.Globalization.CultureInfo..cctor()
   at System.Globalization.CultureInfo.get_CurrentCulture()
   at System.Globalization.NumberFormatInfo.get_CurrentInfo()
...
Aborted (core dumped)

The underlying cause was my ancient Red Hat installation missing localization support, and the easy way to deal with it was to simply set the DOTNET_SYSTEM_GLOBALIZATION_INVARIANT environment variable. On the command line that would look something like this:

DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1 ./myprogram

However, if you really don’t need globalization support, setting it directly in the .csproj might be better:

  <PropertyGroup>
    <InvariantGlobalization>true</InvariantGlobalization>
  </PropertyGroup>

Welford's Algorithm

Most of the time, general statistics calculations are easy. Take all the data points you have, find the average and the standard deviation, and you have 90% of what you need to present a result in a reasonable and familiar manner. However, what if you have streaming data?

Well, then you have Welford’s method. This magical algorithm enables you to calculate both the average and the standard deviation as data arrives, without wasting a lot of memory accumulating it all.
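
At its core it’s a short recurrence: keep a count, a running mean, and a running sum of squared differences. A minimal sketch of a class matching the usage below (my actual library code differs in details):

using System;

public sealed class WelfordVariance {
    private long count;
    private double mean;
    private double m2;  // running sum of squared differences from the mean

    public void Add(double value) {
        count += 1;
        var delta = value - mean;      // difference from the old mean
        mean += delta / count;         // update the running mean
        m2 += delta * (value - mean);  // uses both old and new differences
    }

    public double Mean => mean;
    public double Variance => (count > 1) ? m2 / (count - 1) : 0.0;  // sample variance
    public double StandardDeviation => Math.Sqrt(Variance);
}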

So, of course, I wrote C# code for it. To use it, just add something like this:

var stats = new WelfordVariance();
while(true) {
    stats.Add(something.getValue());
    output.SetMean(stats.Mean);
    output.SetStandardDeviation(stats.StandardDeviation);
}

The algorithm is not perfect and will sometimes differ slightly from the classically calculated standard deviation, but it’s generally within spitting distance and uses a minimum of memory. It’s hard to find a better bang for the buck when it comes to large datasets.

Better Pseudorandom Numbers

Browsing the Internet out of boredom usually brings a lot of nonsense. However, it occasionally also brings a gem. This time I accidentally stumbled upon a family of random number generators called xoshiro/xoroshiro.

Pseudo-random generators have fallen out of favor lately as proper cryptographically-secure algorithms became ubiquitous on modern computers (often supported by the processor itself). For cases where pseudo-random generators are a better fit, most programming languages already include a Mersenne twister, allowing the generation of reasonable randomness.

But that doesn’t mean research into better (pseudo)randomness has stopped. From that research comes a whitepaper named Scrambled Linear Pseudorandom Number Generators. The paper goes over the algorithms in detail, but the authors were also kind enough to provide a PRNG shootout page giving practical advice.
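
To show just how compact these generators are, here’s a rough C# port of the reference xoshiro256** code (seeding omitted; this is my illustration, not the code from the library mentioned below):

public sealed class Xoshiro256SS {
    private ulong s0, s1, s2, s3;  // 256 bits of state; must not be all zeros

    public Xoshiro256SS(ulong seed0, ulong seed1, ulong seed2, ulong seed3) {
        (s0, s1, s2, s3) = (seed0, seed1, seed2, seed3);
    }

    private static ulong Rotl(ulong x, int k) {
        return (x << k) | (x >> (64 - k));
    }

    public ulong Next() {
        var result = Rotl(s1 * 5, 7) * 9;  // the "**" scrambler
        var t = s1 << 17;
        s2 ^= s0; s3 ^= s1; s1 ^= s2; s0 ^= s3;
        s2 ^= t;
        s3 = Rotl(s3, 45);
        return result;
    }
}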

After spending quite a few hours with these, I decided that the only thing missing was a C# variant of the same. So I created it.

Links to the source and the NuGet package are here.

Uniform Distribution from Integer in C#

When dealing with random numbers, one often needs to get a random floating point number between 0 and 1 (not inclusive). Unfortunately, most random generators only deal with integers and/or bytes. Since bytes can be easily converted to integers, the question becomes: “How can I convert an integer to a double in the [0..1) range?”

Well, assuming you start with a 64-bit unsigned integer, you can use something like this:

ulong value = 1234567890; //some random value
byte[] buffer = BitConverter.GetBytes((ulong)0x3FF << 52 | value >> 12);
return BitConverter.ToDouble(buffer, 0) - 1.0;

With that in mind, you can see that the 8-byte (64-bit) buffer is filled with a double consisting of (almost) all 1’s in the exponent and the random number in the fraction portion. If we take that raw buffer and convert it into a double, we get a number in the [1..2) range. Simply subtracting 1.0 places it in our desired [0..1) range. It’s as good as the distribution of a double can be (i.e. the maximum number of bits - 52 - is used).

This is as good as a uniform distribution can get using a 64-bit double.


PS: If we apply the same principle to a float, the equivalent code will be something like this (assuming a 32-bit random uint as input):

uint value = 1234567890; //some random value
byte[] buffer = BitConverter.GetBytes((uint)0x7F << 23 | value >> 9);
return BitConverter.ToSingle(buffer, 0) - 1.0f;

PPS: C#'s Random class uses code that’s probably a bit easier to understand:

return value * (1.0 / Int32.MaxValue);

Unfortunately, this uses only 31 bits for the distribution (instead of the 52 available in a double). This can cause statistical anomalies if the result is later used to scale into a large integer range.

Splitting the Baby

For one of my hardware projects, I decided to try doing things a bit differently. Instead of using a single repository, I decided to split it into two - one containing Firmware and the other containing Hardware.

Since the repository already had those as subdirectories, I thought using --subdirectory-filter as recommended on GitHub would solve it all. Unfortunately, that left way too many commits not touching either of those directories. So I decided to tweak the procedure a bit.

I first removed all the files I didn’t need using --index-filter. On that cleaned-up state I applied --subdirectory-filter just to bring the directory to the root. Unfortunately, while preserving tags was possible, it proved annoying enough that I actually removed them all and manually retagged everything once done.

As discussed above, on a COPY of the original repository we first remove all files/directories that are NOT Hardware, and then we essentially move the Hardware directory to the root level of the newly reorganized repository.

git remote rm origin
git filter-branch --index-filter \
    'git rm -rf --cached --ignore-unmatch .gitignore LICENSE.md PROTOCOL.md README.md Firmware/' \
    --prune-empty --tag-name-filter cat -- --all
rm -Rf .git/logs .git/refs/original .git/refs/remotes .git/refs/tags
git filter-branch --prune-empty --subdirectory-filter Hardware main
rm -Rf .git/logs .git/refs/original
git gc --prune=all --aggressive
git log --pretty --graph

With the Hardware repository sorted, I did exactly the same for Firmware on a new COPY of the original repository, only changing the directory names.

git remote rm origin
git filter-branch --index-filter \
    'git rm -rf --cached --ignore-unmatch .gitignore LICENSE.md PROTOCOL.md README.md Hardware/' \
    --prune-empty --tag-name-filter cat -- --all
rm -Rf .git/logs .git/refs/original .git/refs/remotes .git/refs/tags
git filter-branch --prune-empty --subdirectory-filter Firmware main
rm -Rf .git/logs .git/refs/original
git gc --prune=all --aggressive
git log --pretty --graph

Once I had the two repositories, it was easy enough to combine them. I personally love using subtrees, but submodules have their audience too.
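
For example, pulling the split-off Hardware repository into a parent project as a subtree is a single command (the URL is a placeholder):

git subtree add --prefix Hardware https://example.com/hardware.git main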