BOM Away, in Git Style

Some time ago I made a Mercurial hook (killbom) that removes the BOM from UTF-8 encoded files. As I switched to Git, I didn’t want to part with it, so it was time for a rewrite. Unlike Mercurial, Git has no global hook mechanism. You will need to add the hook to each repository you want it in.

The start is easy enough. Just create a pre-commit file in the .git/hooks directory. Looking from the base of the repository, the file name would thus be .git/hooks/pre-commit. The content of that file is as follows:

#!/bin/sh

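# Process every staged (added, copied, modified, renamed) file, one per sh call.
git diff --cached --diff-filter=ACMR --name-only -z | xargs -0 -n 1 sh -c '
    for FILE; do
        # Only touch files that the file utility reports as text.
        if file --brief "$FILE" | grep -q text; then
            # Save the working copy, then restore the staged version.
            cp "$FILE" "$TEMP/KillBom.tmp"
            git checkout -- "$FILE"

            # Strip the BOM from the staged version and re-stage if that changed it.
            sed -b -i -e "1s/^\xEF\xBB\xBF//" "$FILE"
            NEEDSADD=$(git diff --diff-filter=ACMR --name-only | wc -l)
            if [ $NEEDSADD -ne 0 ]; then
                sed -b -i -e "1s/^\xEF\xBB\xBF//" "$TEMP/KillBom.tmp"
                echo "Removed UTF-8 BOM from $FILE"
                git add "$FILE"
            fi

            # Put the saved working copy back.
            cp "$TEMP/KillBom.tmp" "$FILE"
            rm "$TEMP/KillBom.tmp"
        else
            echo "BINARY $FILE"
        fi
    done
' sh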

ANYCHANGES=`git diff --cached --name-only | wc -l`
if [ $ANYCHANGES -eq 0 ]; then
    git commit --no-verify
    exit 1
fi

The script first gets a list of all staged files, separated by the null character so that we can deal with spaces in file names.

git diff --cached --diff-filter=ACMR --name-only -z

For each of these files we then remove the first three bytes if they are 0xEF, 0xBB, 0xBF:

sed -b -i -e "1s/^\xEF\xBB\xBF//" "$FILE"

What follows is a bit of a mess. Since it is really hard to find out whether a file has been changed without using temporary files, I am abusing Git to check whether the file has changed since it was staged. If that is the case, the assumption is that the change came from the sed just before it. If that assumption is not correct, your commit will have one extra file. As people rarely have the same file changed in both the staged and unstaged areas, I believe the risk is reasonably low.
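One possible alternative to inferring the change from git diff is to look at the first three bytes directly. The sketch below is not part of the hook. The helper names has_bom and strip_bom are made up for illustration, and it assumes head -c, tail -c and od are available on your system.

```shell
#!/bin/sh
# has_bom FILE: succeed if FILE starts with the UTF-8 BOM (EF BB BF).
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# strip_bom FILE: write FILE to stdout without a leading BOM.
strip_bom() {
    if has_bom "$1"; then
        # Skip the first three bytes, output the rest unchanged.
        tail -c +4 "$1"
    else
        cat "$1"
    fi
}
```

With something like this, a hook could call has_bom before touching a file and would know up front whether a re-stage is needed. Whether it behaves the same under Windows shells is untested.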

After all files are processed, a final check is made whether anything is left to commit. If there are no files in the staging area, the current commit is terminated and a new commit is started with the --no-verify option. The only reason for this is so that the standard commit message can be written in cases when removal of the UTF-8 BOM results in no actual files to commit. Replacing it with a message “No files to commit” would work equally well.
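The “No files to commit” variant mentioned above could look roughly like this. It is a sketch rather than what the hook currently does. The function names are hypothetical, and it relies on git diff --cached --quiet returning a non-zero exit code when something is staged.

```shell
#!/bin/sh
# nothing_staged: succeed when the index holds no changes relative to HEAD.
nothing_staged() {
    git diff --cached --quiet
}

# abort_if_nothing_staged: meant for the end of a pre-commit hook.
# Returns 1 (which would abort the commit) when nothing is left to commit.
abort_if_nothing_staged() {
    if nothing_staged; then
        echo "No files to commit" >&2
        return 1
    fi
    return 0
}
```

In a hook, the last line would then be something like `abort_if_nothing_staged || exit 1`.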

While my goal of getting the BOM removed via a hook has been reasonably successful, the Git hook model is really much worse than the one Mercurial has. Not only are global (local) hooks missing, but having multiple hooks run one after another is not really possible. Yes, you can merge scripts together in one file, but that means you’ll need to handle all exit scenarios for each hook you need. And let’s not even get into how portable these hooks are between Windows and Linux.

If you are wondering what all that $TEMP juggling is for, it is needed for interactive commits. Committing just part of a file is useful but didn’t play well with this hook. Saving a copy on the side sorted that problem.

The current version of the pre-commit hook can be downloaded from GitHub.

PS: Instead of editing pre-commit file directly, you can also create it somewhere else and create a symbolic link at proper location.
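Since the hook has to be added to every repository separately, the symbolic link approach can be wrapped in a small helper. This is only a sketch. The name install_hook and the paths in the usage line are hypothetical, and it assumes ln -s creates real symbolic links on your platform.

```shell
#!/bin/sh
# install_hook HOOK_FILE REPO...: link HOOK_FILE as the pre-commit hook of each REPO.
# HOOK_FILE should be an absolute path so the link resolves from any directory.
install_hook() {
    hook=$1
    shift
    chmod +x "$hook" || return 1
    for repo in "$@"; do
        if [ -d "$repo/.git/hooks" ]; then
            ln -sf "$hook" "$repo/.git/hooks/pre-commit"
            echo "Installed in $repo"
        else
            echo "Skipped $repo (no .git/hooks directory)" >&2
        fi
    done
}
```

Usage would be along the lines of `install_hook "$HOME/hooks/killbom" ~/src/project1 ~/src/project2`.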

PPS: I have developed and tested this hook under Windows. It should work under Linux too, but your mileage might vary depending on exact distribution.

[2015-07-12: Added support for interactive commits.] [2015-11-17: Added detection for text/binary.]

Offline to Online Switch on a Minecraft Server


It all started with my kids learning about Minecraft skins and their dad not being able to get their new look working in the game. No matter what, they would stay Steve and Alex. A quick search told me skins are not supported in offline mode, and my home server was set up as such. No worries, I thought - I’ll just switch the online-mode setting in [server.properties](http://minecraft.gamepedia.com/Server.properties) from false to true and that will be it.

However, after I restarted the server, my whole family got to start from scratch. We were in skinned bodies but we were also in new locations. It was as if we had logged onto the world for the first time. To make it worse, nobody had access to commands anymore. Our ops status had been effectively revoked.

As I added myself to ops again through the Minecraft server GUI, I noticed that ops.json got two entries for my user name, each with a different UUID. And I could find both UUIDs in my world’s save directory world\playerdata. That got me wondering. What would happen if I deleted the file with the new UUID and renamed the old UUID file to it? That is, if my old UUID was 76116624-b235-36a2-a614-ed79be1855ed and my new UUID was d8b2b4e0-1807-4177-a3ca-46afbd1d7538, would renaming 76116624-b235-36a2-a614-ed79be1855ed to d8b2b4e0-1807-4177-a3ca-46afbd1d7538 enable me to get back into my offline body?

Fortunately yes. Transplantation of player data succeeded without any issues. So I went through all save directories and changed player data from the old to the new UUID. But that wasn’t all. As we were all ops with different ops levels for various worlds, I had to visit every ops.json and adjust for that. Simple search/replace was all it took.

And guess what, if you ever decide to make your server offline again, the same annoyance is guaranteed since Minecraft has different UUIDs for online and offline mode. There simply seems to be no way around it. Later I found that people have even built tools to help them with the rename.

As Minecraft requires you to verify your credentials over the Internet at least once when you buy it, I cannot believe that there is a technical reason behind this. Even more so because this change was seemingly introduced only with version 1.7.6. My best guess is that it was added as some sort of anti-piracy measure. And as all such measures do, it ended up annoying more paying players than pirates.

In any case my, now online, server recovered from its temporary amnesia and digging could start again.

PS: The paranoid among us might want to check for UUIDs in whitelist.json too.

Moving From Mercurial to Git (Part 2)

With the decision made to move away from Mercurial, the next big step was to transfer existing repositories. While there is semi-reasonable Git support on Windows, any major dealing with Git is made much easier if you have Linux lying around. In my case the decision was to go with CentOS.

First step was to install all stuff we’ll need - Git, Mercurial and git-remote-hg script:

yum -y install git
yum -y install mercurial
yum -y install wget
mkdir ~/bin
wget https://raw.github.com/felipec/git-remote-hg/master/git-remote-hg -O ~/bin/git-remote-hg
chmod +x ~/bin/git-remote-hg

With that it was time to clone the first repository:

git clone hg::https://bitbucket.org/jmedved/vhdattach

In an ideal world this would be all, and we are close. The next step is optional; skipping it just means the Mercurial branching structure is not reproduced in Git. But, while I was at it, I wanted to recreate the branching structure I’m used to. Since the conversion process leaves all branches inside the remotes/origin/branches path, I wanted to move things around a bit:

cd vhdattach
git branch -a | grep 'remotes/origin/branches' | grep -v default | xargs -n 1 -I _ echo _ | cut -d/ -f 4 | xargs -n 1 -I _ git branch _ remotes/origin/branches/_
git branch
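To see what the name-extracting part of that pipeline does, it can be pulled into a small function and fed sample text. The function name branch_names is hypothetical. The grep and cut stages are the ones from the command above, with the pass-through xargs echo left out.

```shell
#!/bin/sh
# branch_names: read `git branch -a` output on stdin and print the short name
# of every converted branch except default.
branch_names() {
    grep 'remotes/origin/branches' | grep -v default | cut -d/ -f4
}
```

Running `git branch -a | branch_names` lists the names that would be turned into local branches. Two things to be aware of: grep -v default also drops any branch whose name merely contains "default", and cut -f4 keeps only the first path segment, so a branch name containing a slash would be cut short (using -f4- instead should keep the full name).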

Next (optional) step was to fix my old commits:

git filter-branch --env-filter '

OLD_EMAIL="unknown"
CORRECT_NAME="Josip Medved"
CORRECT_EMAIL="jmedved@jmedved.com"

if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]
then
    export GIT_COMMITTER_NAME="$CORRECT_NAME"
    export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]
then
    export GIT_AUTHOR_NAME="$CORRECT_NAME"
    export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
fi
' --tag-name-filter cat -- --branches --tags
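A quick way to check which identities are in the history, before and after the rewrite, is to list them with git log. This is just a sketch and list_identities is a made-up name. It looks at branches and tags only, matching the refs passed to filter-branch above.

```shell
#!/bin/sh
# list_identities [REPO_DIR]: print each distinct "Name <email>" used as
# author or committer on any branch or tag, with a count.
list_identities() {
    git -C "${1:-.}" log --branches --tags --format='%an <%ae>%n%cn <%ce>' | sort | uniq -c
}
```

If the rewrite worked, the old "unknown" e-mail should no longer show up in the output.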

Since my move to Git was intended to be final I also wanted to change origin. My new home was to be GitHub:

git remote rm origin
git remote add origin git@github.com:medo64/vhdattach.git
git push --all

With that my move was complete.


Part 1

[2020-05-01: There is a bit newer version of this guide.]

Moving From Mercurial to Git (Part 1)

I used to be a huge fan of Visual Basic. When Visual Basic .NET arrived, it seemed most logical to use it instead of the newfangled C#. However, it became clear quite quickly that Visual Basic is a second-class citizen in that universe. With few exceptions, C# would get everything first. Whether it was a new feature or support for a new platform, Visual Basic would lag behind. So I switched and I haven’t looked back.

The reason I tell this story is that I am making a similar switch again. I love Mercurial. It is a beautiful distributed version control system that hides all complexity from the user, whether on the command line or in a great GUI client. It is a modern, smart, and well-designed system. Any fault with it is just nitpicking and more than outweighed by its features. All that said, Mercurial is also a second-class citizen.

The other distributed source control system almost everybody has heard of is Git. It is powerful and it has a reasonable command line interface (although that wasn’t always the case). What it doesn’t have is proper GUI support, nor is it as well designed as Mercurial - especially when it comes to multiplatform support. Git is a bunch of different tools and it shows (you can make a hobby out of finding different commands that do exactly the same thing). However, the fact is that you can get used to its shortcomings.

The community gathered around Git is much bigger than anything Mercurial can offer. A small part of that is due to its undeniable power. But I believe a big part is due to it being Linus’ baby. That gave it an early boost and, once you get used to its peculiarities, there is simply no reason to go to another versioning system. More people bring even more people in, and get more people working on development. So peculiarities become bugs and get fixed. And a livelier platform brings even more people and makes it better still. Positive reinforcement at its best.

Simple search for Git vs Mercurial hosting or Git vs Mercurial push best illustrates the difference in usage. Git has simply won. Mind you, that doesn’t mean that Mercurial is going to die - I surely hope not. But it is going to be always an alternative choice.

All that said, I will be moving my open source projects to GitHub. Private projects I don’t intend to share will stay on BitBucket, but I will be switching them to Git as I update them. The only things staying on Mercurial will be projects I am sharing with others and those I don’t update at all.

And the time with Mercurial wasn’t really wasted - far from it. I would go as far as to say that it is the best distributed version control system for taking the first steps. Pretty much everything I’ve learned transfers directly to Git.

So long and thanks for all the fish.


Part 2

IPv4 Multicast Messing Things Up

I have a reasonably sized home IPv4/IPv6 network with 10ish devices accessing it most of the time. All worked in perfect harmony until one morning my wife told me she could not print on our wireless printer (a cheap Brother MFC-J435W). While the printer seemed fine, all computers seemed to think of it as offline. Without resolving the problem I went to work, only to be greeted with the information that my wife’s tablet (Asus Prime) wasn’t working either.

After some debugging I pronounced the network on both devices dead and continued working on a small utility that uses broadcast to share some basic data over the network with whoever might be listening. As I replaced the version from the night before with new code, the printer suddenly started to print again. For a minute or so, and then it stopped again. The same happened a few more times that evening. Whenever I restarted the media server, the printer would work for a while.

Well, I believe you get the pattern.

What exactly causes this particular issue is hard to say. My best guess is that something is wrong with the IPv4 multicast packet parser (if it even exists) in both devices. Something in my packets is causing the device’s network stack to go haywire. I know it isn’t the address itself because I tried a few other multicast addresses and got similar issues. I know it isn’t a bandwidth issue since the total amount of messages was less than 2 KB/s. In any case, I didn’t have too much time to troubleshoot.

The end solution was really simple - I stopped sending my packets using IPv4 addresses and relied on IPv6 multicast only. Not only is IPv6 multicast a first-class citizen, but many network stacks ignore it altogether. Regardless of the exact reason, that worked perfectly.