Obtaining Hash as a Part of 11ty Build

Last week I stated in a post that I’m not signing my software anymore. So someone might be wondering: how do I check that the executable really came from you? Well, together with that post, I also added a SHA-256 hash to every file that can be downloaded from my site. If you don’t know how to compute one, I even created Summae to bring this information into the context menu. And, to make the whole task of generating SHA-256 hashes easier on myself, I added it as part of the build process.

First, I had to add the crypto-js package to my 11ty project:

npm install crypto-js

Then, in eleventy.config.mjs, I added the imports (fs and path are Node built-ins the shortcode needs too):

import cryptojs from "crypto-js";
import fs from "node:fs";
import path from "node:path";

Lastly, into the eleventy.config.mjs default export function, I added the sha256 shortcode:

eleventyConfig.addShortcode("sha256", async function (file) {
  const filePath = path.join(eleventyConfig.directories.output, file);
  if (fs.existsSync(filePath)) {
    // read the raw bytes and hash them using crypto-js
    const fileContent = fs.readFileSync(filePath, "latin1");
    const wordArray = cryptojs.enc.Latin1.parse(fileContent);
    return cryptojs.SHA256(wordArray).toString(cryptojs.enc.Hex);
  } else {
    console.error(`File not found: ${filePath}`);
    process.exit(1);  // fail the build rather than publish a page without a hash
  }
});

Whenever I want to use the shortcode, I just add a call to it using the file name as an argument.

{% sha256 "/download/file.zip" %}

Now, in reality, there is a bit more code - especially in the templating area. But that code just reads data from variables, styles the output, and so on.

So, how does this help?

Well, it allows the end user to check file validity. If the hash matches, you know that the download was successful and that you got the file I intended to provide. And I can offer this at no cost to myself.
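
For example, under Linux the whole check boils down to comparing the published value with the sha256sum output (file.zip is just a placeholder for whatever you downloaded):

# compute the hash of the downloaded file and compare it with the one on the page
sha256sum file.zip

# or verify it in a single step; note the two spaces between hash and file name
echo "<published hash>  file.zip" | sha256sum -c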

Not Signing Software Anymore

I used to sign my software. And not with my personal code signing certificate - I used a “proper”, third-party one. Since my software offered here is freeware, this wasn’t really cheap - about $50 a year or so. But I could justify it to myself. Barely. All for a blue checkmark.

However, since some time in 2023, pretty much all code signing certificate providers trusted by Windows have started requiring hardware tokens. That is indeed more secure. But it also raises the certificate price to about $200 a year. For a software company, that’s a trivial expense. For somebody who offers their code (mostly) for free, it’s quite an increase.

So, what does the certificate give you? A blue checkmark when you install your software under Windows. Nothing more, nothing less.

For me, that checkmark is not worth $200. It wasn’t really worth $50 either but, at that price, I liked having it. And remember, this applies to Windows only - my Linux software gets no benefit whatsoever.

So, when my last code signing certificate expired, I never bothered to get a new one. And I haven’t gotten a single complaint about it in the 2 years since.


PS: Please note that code signing doesn’t protect you from malware. A signature just means somebody paid money - malware authors can sign their software too.

Speed Boost for Repeated SSH

If you lead a Linux life, you probably have a bunch of scripts automating it. And, assuming you have access to more than one computer, it’s really easy to use SSH to execute stuff remotely. In my network, for example, I’ve automated daily reports. They connect to the various servers I have around, collect a bunch of data, and twice a day I get an e-mail detailing any unexpected findings.

Now, this script has grown over the years. At the very beginning it was just checking ZFS status and health, then it evolved to checking SMART data, and now it collects everything disk-related down to the serial number level. And that’s not the only check. I also verify connectivity, temperatures, backup states, server health, Docker states, and a bunch of other stuff. So, my daily mail that used to come at 7:00 and 19:00 over time started taking more than 30 minutes to put together. While this is not a critical issue, it started bugging me - why the heck does it take that long?

A short analysis later, the culprit was traced to the number of SSH commands those scripts execute. Just checking my disks remotely meant executing commands over SSH more than 50 times. Per server. And that wasn’t the only such check.

Now, the solution was a simple one - just optimize the darn scripts. And there were a lot of places to optimize since I rarely cached command output. However, those optimizations would inevitably make my Bash scripts uglier. And we cannot have that.

Thus, I turned toward another approach - speeding up SSH itself. Back in the days when I first played with Ansible, I noticed that it keeps its connections open. At the time, I mostly noticed it due to the issues it caused. But now I was thinking - if Ansible can reuse connections, can I?

And indeed I can. The secret lies in adding the following configuration to the ~/.ssh/config file:

ControlMaster  auto
ControlPersist 1m
ControlPath    ~/.ssh/.persist.%r@%h:%p

What this does is leave the old SSH connection open and then reuse it instead of going through SSH authentication each time. Since SSH authentication is not the fastest thing out there, this actually saves a lot of CPU time, speeding things up considerably. And, since the connection remains encrypted, you don’t lose anything.

Setting ControlMaster to auto allows your SSH client to reuse an existing connection if there is one and fall back to the “standard” behavior if one cannot be found. The location of the cached sockets is controlled using the ControlPath setting, and one should use a directory that is specific to the user. I like using .ssh itself rather than creating a separate directory, but any valid path will do as long as you parameterize it using %r, %h, and %p at a minimum. And lastly, the duration is specified using the ControlPersist value. Here I like using 1 minute as it gives me meaningful caching for script use while not keeping connections open so long that I need to kill them manually.
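
A quick way to see the effect (myserver is a placeholder for any host you connect to):

# the first call establishes the master connection and pays the full authentication cost
time ssh myserver true

# within the ControlPersist window, this reuses the socket and returns almost instantly
time ssh myserver true

# the master connection can also be checked or closed explicitly
ssh -O check myserver
ssh -O exit myserver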

With this, the execution time for my scripts went from more than 30 minutes to less than 5. Not bad for a trivial change.

Console Log Format

Pretty much every application I’ve created has some logging included. Mostly it’s just humble Debug and Trace statements, but I’ve used more complex solutions for bigger apps. One thing was lacking, though - a standardized format.

I’ve noticed that all my apps use slightly different log formats. My bash scripts, e.g. for Docker apps, usually have a completely different format than my C# applications. And the C# applications are also all different from one another, depending on which framework I use. For example, applications using Serilog look completely different than apps where I just used Debug.WriteLine. And no, it’s not really a huge problem since each log output is similar enough for me to parse and use without much brain power. But combining those apps (e.g. in Docker) is what makes it annoying. You can clearly see where the entry-point script ends and the other application begins. It just looks ugly.

So, I decided to reduce the number of different log formats by figuring out what is important to me when dealing with console logs. I landed on the following list:

  • instantly recognizable - there should be no thought required to figure out what each field does
  • minimum effort - the same overall log format must be producible from a simple bash script or a complex application
  • grepable - any output must be easily filtered using grep
  • easily parsable - it must be possible to extract each field using basic Linux tools (e.g. cut and awk)
  • single line - one line per log entry; if more is needed, output in parallel to a different format, e.g. JSON
  • using stdout/stderr properly - anything more serious than info should go to stderr
  • colored - different log levels should result in different colors

In the end, I settled on something like this:

DATE       TIME         LEVEL CATEGORY+ID+TEXT
1969-07-20 20:17:40.000 INFO  lander: Sequence completed
1969-07-20 23:39:33.000 INFO  person:1: Hatch open

Date and time fields were an easy choice to start the message with. In a lot of my applications I used the proper ISO format (e.g. 1969-07-20T20:17:40.000) but here I opted to “standardize” on a space. The reason is legibility. While the date field is needed, I rarely care about it when troubleshooting - for that I most often just care about the time. Separating the time with a space makes it much easier to pick out. As for the time zone, console output will always use the local one.

I am a huge fan of UTC and I believe one should use it - most of the time. But it is hard to justify its usage on home servers where, instead of helping, it actually hinders troubleshooting. The compromise is to just use the local time zone. If the server is set to UTC, the output will be UTC. And, as much as I love UTC, I hate its designator - Z. If the whole log is in UTC, adding Z as a suffix just makes things less readable.

I also spent way too much time deciding whether to include milliseconds. Since I have found them valuable plenty of times, I decided they’re worth the extra 4 characters. Interestingly, getting ahold of them in bash is not always straightforward. While under most Linux distributions you can get the time using date +'%Y-%m-%d %H:%M:%S.%3N', this doesn’t work on Alpine Linux. Its busybox date doesn’t offer %N as an option. I found that date +"%F $(nmeter -d0 '%3t' | head -n1)" is a simple workaround.
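
A sketch of how one could pick the right command once, at script start; this assumes an unsupported %N comes back as something non-numeric, which is exactly what the test below checks:

# define a timestamp function based on what the local date supports
case "$(date +%N)" in
  *[!0-9]*|"")  # %N unsupported, e.g. busybox date on Alpine
    timestamp() { date +"%F $(nmeter -d0 '%3t' | head -n1)"; } ;;
  *)            # GNU date with nanosecond support
    timestamp() { date +'%Y-%m-%d %H:%M:%S.%3N'; } ;;
esac

timestamp    # e.g. 1969-07-20 20:17:40.000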

The next space-separated field is the log level. Up to now I often used a single-letter log level, e.g. E: for error. But I found that this is not always user friendly. Thus, I decided to expand the name a bit:

Text   Color          .NET         Serilog      Syslog           Stream
TRACE  Dark blue      Trace        Verbose      -                1>
DEBUG  Bright blue    Debug        Debug        Debug (7)        1>
INFO   Bright cyan    Information  Information  Information (6)  1>
WARN   Bright yellow  Warning      Warning      Warning (4)      2>
ERROR  Bright red     Error        Error        Error (3)        2>
FATAL  Bright red     Critical     Fatal        Critical (2)     2>

Each log level field is now exactly 5 characters long (shorter names, e.g. INFO, get padded with a space). This makes parsing easier while still maintaining readability. I was really tempted to enclose the levels in square brackets, e.g. [INFO], since I find that format really readable. However, brackets would require escaping in grep and that is something I would hate to do.

Making the log level field always 5 characters long also helps align the text that follows. Just cut the first 30 characters and you get rid of the date, time, and log level.
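
For example, with basic tools only (app.log is a placeholder):

# keep only warnings and worse, then drop the date, time, and level
grep -E ' (WARN|ERROR|FATAL) ' app.log | cut -c31-

# or pull out just the time and the message using awk
awk '{ print $2, substr($0, 31) }' app.log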

The next field contains the category name, which usually matches the application name. This field is not fixed in size, but it should be easily parsable regardless since it ends in a colon (:) followed by a space. If a log entry has an ID, it is embedded between two colon characters. If the ID is 0, it can be omitted. For practical reasons, I try sticking to ID numbers 1000-9999, but the field has no official width, so anything that fits within u64 should be expected.

I don’t use IDs for my events in every application, but they are so valuable when it comes to a large code base that I simply couldn’t omit them. However, they are just annoying in small applications, so I didn’t want to make them a separate field. In the end, I decided to keep them between two colons as that impacted my parsing the least.

And, finally, the last component is the actual log text. This is a free-form field with only one rule - no control characters. Any control character (e.g. LF) should be escaped or stripped.

Of course, sometimes you will need additional details, exceptions, or execution output. In those cases, I just drop all that text verbatim with two spaces in front. This makes it not only visually distinct but also really easy to remove using grep.
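
Putting it all together, here is a minimal bash sketch of the “minimum effort” requirement (the category name myapp and the exact escape codes are my picks, not part of the format itself):

log() {  # usage: log LEVEL TEXT...
  local LEVEL=$1 COLOR FD=1; shift
  case $LEVEL in
    TRACE)       COLOR='\033[34m' ;;        # dark blue
    DEBUG)       COLOR='\033[94m' ;;        # bright blue
    INFO)        COLOR='\033[96m' ;;        # bright cyan
    WARN)        COLOR='\033[93m'; FD=2 ;;  # bright yellow, goes to stderr
    ERROR|FATAL) COLOR='\033[91m'; FD=2 ;;  # bright red, goes to stderr
  esac
  # GNU date assumed; see the Alpine workaround above for busybox systems
  printf '%s %b%-5s myapp: %s%b\n' \
    "$(date +'%Y-%m-%d %H:%M:%S.%3N')" \
    "$COLOR" "$LEVEL" "$*" '\033[0m' >&$FD
}

log INFO "Sequence completed"
log ERROR "Hatch stuck"    # bright red, on stderr

And the verbatim detail lines with two leading spaces are then trivially dropped using grep -v '^  '.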

With this in mind, I will now update my apps.

Will I change all of them? Not really. Most of my old applications will never get a log format update. Not only is it a lot of work to update them, but it might also break tools and scripts that already parse their output.

This is just something I’ll try to keep in mind going forward.

Just Screw It

Sometimes something that ought to be simple leads you on a wild goose chase. This time, I was searching for humble 2.5" SSD screws.

And yes, this was the first time in my life I had to search for them. Back in Croatia I have a bin full of leftover PC screws. It would have been easy to just grab them.

However, I moved to the US years ago and never bothered to bring assorted screws with me. Not that I missed them - pretty much all cases accepting 2.5" drives came with screws. It wasn’t until I made a 3D-printed cage for 2.5" drives that I realized I had none laying around.

So, simple enough, I needed some 10 screws to fully screw 2 disks in and attach them to the chassis. I mean, I could have gotten away with using just 3. But, since I was buying screws anyhow, I might as well fill all the holes.

It was easy to find that the screw is a flat-top M3 with a fine thread (0.5mm pitch). But for the length I saw multiple values - everything from 2 to 5 millimeters.

So I went on to measure the screws I had in my computers, only to find three different lengths: 3, 3.5, and 4 mm. And that was based on a total of 4 sets of screws (1/1/2 distribution, for the curious). I discounted M3x3.5 almost immediately since it was hard to find at a reasonable price. That left me with M3x3mm and M3x4mm as seemingly equally good candidates.

But then I struck gold - WD’s specification. There, in black and white, it clearly states that a 2.5" drive will accommodate up to 3mm of screw length in the side mounting holes and up to 2.5mm in the holes on the bottom. The minimum thread requirements were 1.5mm for the side holes and 1mm for the bottom ones. If I wanted a universal screw for any set of holes, I had to aim for a thread length between 1.5 and 2.5 mm.

If we account for sheet metal holding the drive, that means M3x3mm is a clear universal winner. At least in theory.

But how come 2 of my screw sets were 4mm? Wouldn’t that present a problem? Well, all 2.5" drives I had (2 spinning rust, 4 SATA SSDs) accepted the full 4mm in the side holes without any issue. All SSDs with bottom holes were happy to accept the same. So, based on my (limited) sample, M3x4mm will work just fine - even on WD’s own drives.

In the end I ordered the M3x3mm. Just in case.


PS: For 3.5" drives, check WD’s mounting specification.