Exporting JSON via Awk

I wanted to process a Unicode data file (semicolon-separated, CSV-like) and extract the first two columns into JSON. With awk it seemed easy enough:

awk '
    BEGIN {
        FS=";"
        print "["
    }
    {
        print "  { \"code\": \"" $1 "\", \"description\": \"" $2 "\" },"
    }
    END {
        print "]"
    }
    ' UnicodeData.txt | less

This gives you ALMOST parsable output. The one thing that spoils it is the last “hanging” comma, which makes the whole JSON invalid (although some lenient parsers will still load it). And no, there is no direct way to tell awk to treat the last line specially: lines are processed one by one, so at any given moment there is no telling whether the current line is the last.

What we can do is tell awk to process lines with a single-line delay:

awk '
    BEGIN {
        FS=";"
        print "["
    }
    NR>1 {
        print "    { \"code\": \"" code "\", \"description\": \"" description "\" },"
    }
    {
        code = $1
        description = $2
    }
    END {
        # Print the held-back last record without a comma; skip it if the input was empty.
        if (NR > 0) {
            print "    { \"code\": \"" code "\", \"description\": \"" description "\" }"
        }
        print "]"
    }
    ' UnicodeData.txt | less

This prints records starting from the second line (NR>1), and all the main block does is store the fields in variables that will be read on the next iteration. Essentially, what we have is a single-line delay mechanism: each record is printed while the following line is being processed. To catch up with the last line, we print it without a trailing comma in the END block of the program.
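
To watch the delay at work without the full file, here is a minimal sketch that wraps the delayed-print technique in a shell function and feeds it three sample records. The function name to_json is hypothetical, and the sample lines are abbreviated stand-ins for real UnicodeData.txt records (only the first two fields matter here).

```shell
# Hypothetical helper: reads semicolon-separated records on stdin, writes a JSON array.
to_json() {
    awk '
        BEGIN {
            FS=";"
            print "["
        }
        NR>1 {
            # Print the record stored on the previous iteration, with a comma.
            print "    { \"code\": \"" code "\", \"description\": \"" description "\" },"
        }
        {
            # Hold the current record back until the next line arrives.
            code = $1
            description = $2
        }
        END {
            # NR > 0 guard: with empty input there is no held-back record to print.
            if (NR > 0) {
                print "    { \"code\": \"" code "\", \"description\": \"" description "\" }"
            }
            print "]"
        }
    '
}

# Three abbreviated sample records piped in instead of the real file.
printf '%s\n' \
    '0041;LATIN CAPITAL LETTER A;Lu' \
    '0042;LATIN CAPITAL LETTER B;Lu' \
    '0043;LATIN CAPITAL LETTER C;Lu' | to_json
```

The first two records come out with a trailing comma and the third without one, because the third is only ever printed from the END block.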

Valid JSON at last.
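
For comparison, here is a sketch of a different technique that gets the same result: instead of delaying output by a line, it writes the comma before every record except the first. Awk cannot tell which line is last, but it always knows which is first. The function name to_json_sep_first is hypothetical. It relies on awk's printf, which, unlike print, does not append a newline on its own.

```shell
# Hypothetical helper: same input and output as the delayed version,
# but the separator is emitted before each record rather than after it.
to_json_sep_first() {
    awk '
        BEGIN {
            FS=";"
            print "["
        }
        {
            # Finish the previous record line with a comma before starting a new one.
            if (NR > 1) printf ",\n"
            # Fields are passed as printf arguments, so a "%" in the data is harmless.
            printf "    { \"code\": \"%s\", \"description\": \"%s\" }", $1, $2
        }
        END {
            # Terminate the last record line, if there was one.
            if (NR > 0) printf "\n"
            print "]"
        }
    '
}

printf '%s\n' '0041;LATIN CAPITAL LETTER A;Lu' '0042;LATIN CAPITAL LETTER B;Lu' | to_json_sep_first
```

Neither variant escapes quotes or backslashes inside the fields; that is assumed to be unnecessary for this particular file and would need handling for arbitrary input.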