Sorting "Dot" Files

After I got CentOS 7.4 running, something a bit strange happened. When I ran my usual ll (an alias for ls -lA), I got a slightly unexpected result:

ll
 drwxrwx--- 4 apache apache  4096 Dec 24 06:50 download
 -rw-rw---- 1 apache apache  5430 Dec 23 08:06 favicon.ico
 -rw-rw---- 1 apache apache 12300 Dec 26 02:25 .htaccess
 -rw-rw---- 1 apache apache   460 Dec 23 08:06 index.php
 -rw-rw---- 1 apache apache   117 Dec 23 20:39 robots.txt
 drwxrwx--- 2 apache apache  4096 Dec 26 01:44 .well-known
 drwxrwx--- 5 apache apache  4096 Dec 23 17:32 wordpress

Can you spot the issue?

Yep, CentOS got a bit (too) smart: sorting ignores the leading dot, so dot files end up in alphabetical order along with everything else. For those used to seeing dot files at the top - tough luck.

Since ls sorts according to the locale’s collation rules (LC_COLLATE), it’s possible to “correct” this behavior by forcing the byte-wise C collation with a slightly different alias in .bashrc:

alias ll='LC_COLLATE=C ls -lA'

This gives a (properly) sorted output:

ll
 -rw-rw---- 1 apache apache 12300 Dec 26 02:25 .htaccess
 drwxrwx--- 2 apache apache  4096 Dec 26 01:44 .well-known
 drwxrwx--- 4 apache apache  4096 Dec 24 06:50 download
 -rw-rw---- 1 apache apache  5430 Dec 23 08:06 favicon.ico
 -rw-rw---- 1 apache apache   460 Dec 23 08:06 index.php
 -rw-rw---- 1 apache apache   117 Dec 23 20:39 robots.txt
 drwxrwx--- 5 apache apache  4096 Dec 23 17:32 wordpress
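To see the difference outside of ls, here is a small sketch that sorts the same names with sort under both collations. It assumes the en_US.UTF-8 locale is installed. If it isn’t, the second listing falls back to byte order and both will look the same. The helper names sort_c and sort_locale are just for illustration.

```shell
#!/bin/bash
# Sketch: sort the file names from the listing above under two collations.
# Assumption: the en_US.UTF-8 locale is installed on the machine.
names='download
favicon.ico
.htaccess
index.php
robots.txt
.well-known
wordpress'

# Byte-wise (C) collation: "." (0x2E) sorts before all letters and digits
sort_c() {
  printf '%s\n' "$names" | LC_ALL=C sort
}

# Locale-aware collation: in glibc's en_US.UTF-8, punctuation such as the
# leading dot is ignored at the first comparison level, so dot files mix in
sort_locale() {
  printf '%s\n' "$names" | LC_ALL=en_US.UTF-8 sort
}

echo "C collation:"
sort_c
echo "en_US.UTF-8 collation:"
sort_locale
```

LC_ALL, if set in your environment, takes precedence over LC_COLLATE. In that case an alias that only sets LC_COLLATE=C will have no visible effect.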

Requiring Authentication For All But One File

As I planned the move of my site to Linode, I first needed a place to test. It was easy enough to create a test domain and fill it with migrated data, but I didn’t want Google (or any other bot) to index it. The easiest way to prevent that was to require authentication. In the Apache configuration, that can be done using the Directory directive:

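<Directory "/var/www/html">
    AuthType Basic
    # Basic authentication needs a realm name; Apache refuses the request without one
    AuthName "Restricted"
    AuthUserFile "/var/www/.htpasswd"
    Require valid-user
</Directory>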

However, this also meant that my robots.txt, with its Disallow statements, was forbidden too. What I really wanted was to allow access to robots.txt alone while forbidding everything else.

A bit of modification later, this is what I came up with:

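<Directory "/var/www/html">
    AuthType Basic
    AuthName "Restricted"
    AuthUserFile "/var/www/.htpasswd"
    Require valid-user
    <Files "robots.txt">
        Allow from all
        Satisfy Any
    </Files>
</Directory>

Allow and Satisfy are Apache 2.2-style directives. On Apache 2.4 (the version CentOS 7 ships), they only work through mod_access_compat, which CentOS loads by default. The native 2.4 equivalent is a single Require all granted inside the Files section.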
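To confirm that the exception behaves as intended, here is a quick sketch using curl. Without credentials, robots.txt should come back as 200 while everything else should get 401. SITE is a hypothetical placeholder for your own test domain. The two checks at the bottom stay commented out until it points somewhere real.

```shell
#!/bin/bash
# Sketch: check which URLs are reachable without credentials.
# SITE is a hypothetical placeholder; replace it with your test domain.
SITE="http://test.example.com"

# Print the HTTP status code curl receives for a URL ("000" if no connection)
status_of() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# Succeed only when the URL answers with the expected status code
expect_status() {
  local url="$1" expected="$2" actual
  actual="$(status_of "$url")"
  if [ "$actual" = "$expected" ]; then
    echo "OK   $url -> $actual"
  else
    echo "FAIL $url -> $actual (expected $expected)"
    return 1
  fi
}

# Without credentials: robots.txt served, everything else challenged
# expect_status "$SITE/robots.txt" 200
# expect_status "$SITE/" 401
```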

Meltdown and Spectre

It has been a very scary start to the year. We’re only a few days in and the world is already falling apart. If you aren’t scared yet, watching a demonstration of the Meltdown and Spectre exploits is enough to make you feel very uncomfortable.

I won’t go into the details, as this dreadful exploit family already has a web page with all the information one could possibly want. If that’s not enough, probably every major news outlet has an article or two about it.

Unfortunately, in the midst of all this ruckus and panic, there is nothing most of us can do. Due to the nature of these flaws, the fix has to be done either in hardware (albeit with some mitigation via microcode updates) or in the OS kernel of your choice. There is simply nothing an application developer can realistically do but wait. Once the “big boys” have done their work, there will be a flurry of activity if you need to do some performance testing, and that’s it. Explicit regression testing will not be needed, as you have it automated to run overnight anyhow (wink-wink), and the risk of user code breakage is quite low.

If you are dealing with OS maintenance, you will have a bit more work to do. While some patches are already out, more are still expected, and I trust Murphy will ensure that at least some patches will receive patches of their own. If you are dealing with a cloud environment, your work will be multiplied, but that comes with the saving grace of being easy to automate across many machines. It will be busy, but surmountable.

As for those of us who also deal with hardware, I pity you. Updating firmware is annoying even when there is no pressure. Generally, the machine has to go down before you can even think about it. Then you will try to automate it, only to find out that 50% of your blades simply didn’t “take” the update, and the vendor coolly advises that “it sometimes happens” and that you should proceed with a manual installation.

And, of course, these servers haven’t had their firmware updated in a while, so the microcode you want will come bundled with a bunch of other firmware fixes and changes you don’t want to deal with right now. Tough luck - the microcode will not be “backported” to your current version. Just hope that it doesn’t change some obscure default, causing issues when the machine is finally booted up, and that you won’t need to update your pristine 1.0 to some intermediate version before you can even think about getting the latest.

And don’t think you’re safe once you go home, because in the next few days you’ll see a BIOS with a microcode update ready for your home computer too. For example, my Dell has had one for a couple of days now. So you will go updating all your personal computers, only to discover that your wife’s laptop doesn’t boot anymore…

May you live in interesting times, indeed.

Tailing Two Files

Once I got my web server running, it occurred to me that I should track the Apache logs for potential issues. My idea was to have a basic script that would show both the access and error logs on a single screen, colored green/yellow/red depending on HTTP status and error severity. And I didn’t want to see the whole log. I wanted to keep information to a minimum, just enough to determine whether things were going well or badly. If I see something suspicious, I can always check the full logs.

The error log is easy enough, but parsing the access log in the combined log format (the NCSA common format extended with referer and user agent) is annoyingly difficult due to its “interesting” choice of delimiters.

Just look at this example line:

108.162.245.230 - - [26/Dec/2017:01:16:45 +0000] "GET /download/bimil231.exe HTTP/1.1" 200 1024176 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"

The first three entries are space-separated - easy enough. Then comes the date, in probably the craziest format one could find, enclosed in square brackets. Then we have the request line in quotes, followed by a few more space-separated values. And we finish with a few quoted values again. Command-line parsing was definitely not on the mind of whoever “designed” this.

With Apache you can, of course, customize the log format - but guess what? While you can make something that works better with command-line tools, you will lose the plethora of tools that already work with the NCSA format, most notably Webalizer. It might be a bad choice for the command line, but it’s the standard regardless.

On the other hand, the extreme flexibility of Linux tools means you can use some trickery to parse fields even in something as mangled as NCSA.

After a bit of trial and error, my final product was a script looking a bit like this:

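#!/bin/bash

LOG_DIRECTORY="/var/www"

# On exit (including Ctrl+C), kill the background tail pipelines started below
trap 'kill $(jobs -p)' EXIT

# Access log: print status code and request line, colored by status code
tail -Fn0 "$LOG_DIRECTORY/apache_access.log" | gawk '
  # A field is a run of non-spaces, a "quoted string", or a [bracketed value]
  BEGIN { FPAT="([^ ]+)|(\"[^\"]+\")|(\\[[^\\]]+\\])" }
  {
    code=$6     # HTTP status code
    request=$5  # quoted request line

    ansi="0"    # default color
    if (code==200 || code==206 || code==303 || code==304) {
      ansi="32;1"   # bold green
    } else if (code==301 || code==302 || code==307) {
      ansi="33;1"   # bold yellow
    } else if (code==400 || code==401 || code==403 || code==404 || code==500) {
      ansi="31;1"   # bold red
    }
    printf "%c[%sm%s%c[0m\n", 27, ansi, code " " request, 27
  }
' &

# Error log: print level and up to 12 fields of the message, colored by level
tail -Fn0 "$LOG_DIRECTORY/apache_error.log" | gawk '
  BEGIN { FPAT="([^ ]+)|(\"[^\"]+\")|(\\[[^\\]]+\\])" }
  {
    level=$2    # bracketed [module:level] tag
    text=$5 " " $6 " " $7 " " $8 " " $9 " " $10 " " $11 " " $12 " " $13 " " $14 " " $15 " " $16

    ansi="0"    # default color
    if (level~/info/) {
      ansi="32"   # green
    } else if (level~/warn/ || level~/notice/) {
      ansi="33"   # yellow
    } else if (level~/emerg/ || level~/alert/ || level~/crit/ || level~/error/) {
      ansi="31"   # red
    }
    printf "%c[%sm%s%c[0m\n", 27, ansi, level " " text, 27
  }
' &

# Block until interrupted; the EXIT trap then cleans up both pipelines
wait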

The script tails both the error and access logs until you press Ctrl+C. On exit, the EXIT trap kills the spawned background jobs.

For the access log, the gawk script checks the status code and colors entries accordingly. Green is for 200 OK, 206 Partial Content, 303 See Other, and 304 Not Modified. Yellow is for 301 Moved Permanently, 302 Found, and 307 Temporary Redirect. Red is for 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, and 500 Internal Server Error. All other codes remain default/gray. Only the status code and the request line are printed.

For the error log, the gawk script checks only the error level. Green is used for info; yellow is for warn and notice; red is for emerg, alert, crit, and error. Everything else (essentially debug and trace) remains default/gray. The printout consists of just the error level and the first 12 words.

This script not only shortens quite long error and access log lines to their most essential parts, but its coloring also lets one see the most important issues at a glance, even when lines are flying by. Additionally, having both logs interleaved lends itself nicely to a single-screen monitoring station.

[2018-02-09: If you are running this via SSH on a remote server, don’t forget to use ssh -t so that the script gets cleaned up properly when the SSH connection drops.]

In the Year 2017

The first day of 2018 is the perfect time to look back at the previous year.

After changing my domain last year, I decided to move my hosting too. While I left my domains with DreamHost and Plus, I moved my hosting to Linode. Performance-wise, even its smallest package is equivalent to shared hosting, and it offers much more flexibility.

In any case, my decision was the topic of quite a few posts and will probably be the topic of a few more in 2018. Speaking of posts, 2017 saw 83 of them. That’s an average of one post every 4-5 days, right in the ballpark of last year’s resolution.

The majority of posts were Linux-related, whether about Linode or Mikrotik. I essentially went over everything I needed to set up my home network, my home NAS, my NTP server, and my web site.

The second most-used category was programming, followed by general updates, and lastly just a single post about electronics. I guess, after electronics was the top category in 2016, I grew a bit tired of it. But no worries, I have a few electronics projects planned for this year.

Traffic-wise, there was slight growth compared to last year. For the Nth year in a row, it was driven mostly by VHD Attach, followed closely by the OpenVPN and SSTP setup for Mikrotik. My password manager Bimil also saw quite an uptick in downloads.

As for readers’ browser choice, Chrome is still firmly in first place with just below 60%, Firefox is a distant second at 20%, and Internet Explorer has dipped below 10% but is still in third place. In fourth place, Safari dropped slightly to 5%, with Edge still at 3% and going nowhere from fifth.

Traffic coming from small browsers has increased, with almost 5% belonging either to ancient browsers (yes, Opera is here again) or to browsers I’ve never heard of (e.g., YaBrowser). I wouldn’t be surprised to see one of them kick Edge off the top-5 list next year.

When it comes to traffic sources, the USA is still firmly first with 20% of visitors. Germany is still second, albeit at a slightly lower 7%. Third place is shared by Great Britain and Russia, both at 4%. France, India, Italy, and newcomer Poland follow at 3%. My home country of Croatia is 19th at 1.24%.

This year also saw a record 213 countries in the list of visitors. Of all the single-visit countries, my points go to St. Barthélemy. I am a sucker for names outside of ASCII.

That’s all for 2017. All the best in 2018!