Console Log Format

Pretty much every application I’ve created has some logging included. Mostly it’s just humble Debug and Trace statements but I’ve used more complex solutions for bigger apps. But one thing was lacking - standardized format.

I’ve noticed all my apps use slightly different log format. My bas scripts, e.g. for docker apps, usually have completely different format than my C# applications. And C# applications are also all different from one another, depending which framework I use. For example, applications using Serilog look completely different than apps I made using just Debug.WriteLine. And no, it’s not really a huge problem since each log output is similar enough for me to parse and use without using much brain power. But, combining those apps (e.g. in Docker) is what makes it annoying. You can clearly see where entry-point script ends, and other application begins. It just looks ugly.

So, I decided to reduce number of different log outputs by figuring out what is important to me when dealing with console logs. So I landed on the following list:

  • instantly recognizable - there should be no though required to figure out what each of fields does
  • minimum effort - same overall log format must be outputtable from simple bash script or complex application
  • grepable - any output must be easily filtered using grep
  • easily parsable - it must be possible to extract each field using basic linux tools (e.g. cut and awk)
  • single line - single line per log entry; if more is needed, in parallel output to different format, e.g. json
  • using std/err output - anything more serious than info should go to stderr output
  • colored - different log levels should result in different colors

In the end, I settled on something like this:

DATE       TIME         LEVEL CATEGORY+ID+TEXT
1969-07-20 20:17:40.000 INFO  lander: Sequence completed
1969-07-20 23:39:33.000 INFO  person:1: Hatch open

Date and time fields were easy choice to start message with. In a lot of my applications I used proper ISO format (e.g. 1969-07-20T20:17:40.000) but I opted to “standardize” on space. Reason is legibility. While date field is needed, I rarely care about it when troubleshooting - for that I most often just care about time. Separating time by a space allows for much greater legibility. As for time-zone, console output will always use the local time-zone.

I am a huge fan of UTC and I believe one should use it - most of the time. But it is hard to justify its usage on home servers where instead of helping it actually hinders troubleshooting. Compromise is just to use local time zone. If server is UTC, output will be UTC. And, as much as I love UTC I hate its denominator - Z. If the whole log is in UTC, adding Z as a suffix just makes things less readable.

I also spent way too much time thinking if I should include milliseconds or not. Since I found them valuable plenty of times, I decided they’re worth of extra 4 characters. Interestingly, I found that getting ahold of them in bash is not always straightforward. While under most Linux distributions, you can get time using date +'%Y-%m-%d %H:%M:%S.%3N', this doesn’t work on Alpine Linux. It’s busybox date doesn’t offer %N as an option. I found that date +"%F $(nmeter -d0 '%3t' | head -n1)" is a simple workaround.

Next space-separated fields is log level. Up to now I often used a single letter log level, e.g. E: for error. But I found that this is not always user friendly. Thus, I decided to expand name a bit:

TextColor.NETSerilogSyslogStream
TRACEDark blueTraceVerbose-1>
DEBUGBright blueDebugDebugDebug (7)1>
INFOBright cyanInformationInformationInformation (6)1>
WARNBright yellowWarningWarningWarning (4)2>
ERRORBright redErrorErrorError (3)2>
FATALBright redCriticalFatalCritical (2)2>

Each log level is now 5 characters long. This makes parsing easier while still maintaining readability. I was really tempted to enclose them in square brackets, e.g. [INFO] since I find this format really readable. However, this would require escaping in grep and that is something I would hate to do.

Making log level field always 5 characters in length also helps to align text that follows. Just cut the first 30 characters and you get rid of date, time, and log level.

Next field contains category name which usually matches the application. This field is not fixed size but it should be easily parsable regardless due to it ending in colon (:) character followed by a space. If log entry has ID, the same is embedded within two colon characters. If ID is 0, it can be omitted. For practical reason, I try sticking to ID numbers 1000-9999 but field has no official width so anything within u64 should be expected

I don’t really use ID for my events in every application but they are so valueable when it comes to a large code base that I simply couldn’t omit them. However, they are just annoying when it comes to small application so I didn’t want to make this a separate field. In the end, I decided to keep them between two colons as that impacted my parsing the least.

And, finally, the last component is the actual log text. This is a free form field with only one rule - no control characters. Any control character (e.g. LF) should be escaped or stripped.

Of course, sometime you will need additional details, exceptions, or execution output. In those cases, I will just drop all that text verbatim with two spaces at front. This will make it not only visually distinct but also really easy to remove using grep.

With this in mind, I will now update my apps.

Will I change all of them? Not really. Most of my old applications will never get a log format update. Not only it’s a lot of work to update them but it might also mess with their tools and scripts.

This is just something I’ll try to keep in my mind going forward.