Dirty Docs

I usually like Google Docs. It is not a full-blown editing tool and it will not replace Microsoft Office for me (at least not soon), but it offers quite a lot when it comes to editing documents from various locations. Your document goes where your browser is.

For reasons I will not get into, I had to do some automatic processing on one of my Google Docs text documents. I created it with care: all headings were properly defined and most of the text was in the “Normal” style. It was made with a clean export to HTML in mind.

Unfortunately, the export result was quite far from clean code. The first thing is that everything is on one line. Yes, this saves a few bytes, but it is a pain in the ass if you need to do any manual editing of the document. Fortunately, PSPad knows how to expand such code.

The biggest issue I have here is that everything is set via style sheets. While I usually agree with that, Google overdid it this time. They added a bunch of span tags all over the place. Even when you have just “Normal” text, you can be sure that it will not stand without a <span> around it. Even bold and italic text will not get just <b> and <i>; instead it gets a full-blown CSS definition.

I agree that this is not an issue if you just want to view the document. However, it makes any automatic processing of the text a real pain in the ass. It definitely brings back memories of Microsoft FrontPage and its HTML mess.
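As a rough first pass before any automatic processing, the span wrappers can be stripped with a regular expression. This is only a sketch: the function name strip_spans and the sample markup are made up for illustration, a regex is not a real HTML parser, and since bold and italic live in CSS classes on those spans, that formatting is lost as well.

```shell
#!/bin/bash
# Hypothetical helper: removes <span ...> and </span> tags from HTML on stdin.
# Assumption: the markup is simple enough that a regex is good enough.
strip_spans() {
    sed -E 's#</?span[^>]*>##g'
}

# Made-up sample resembling the exported markup.
sample='<p><span class="c1">Normal text, </span><span class="c2">bold text</span></p>'
cleaned=$(printf '%s\n' "$sample" | strip_spans)
printf '%s\n' "$cleaned"
```

The text between the tags is left untouched, so headings and paragraphs survive while the span noise disappears.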

P.S. No, ODT export is not a solution; it is even bigger and dirtier.

Not All Files Can Be Embedded

Quite a lot of the helper files an average application needs are best stored as resources. This way the separation of data and code is kept on a logical level, but everything gets stored in one executable so nothing can be lost.

To make a file a resource, the only thing needed is selecting “Properties” from its context menu and setting “Build Action” to “Embedded Resource”. Reading it from code is equally easy (assuming the program is called MyProgram and the resource file is called Test.txt):

// The resource name is the project's default namespace followed by the file name.
Stream myStream = Assembly.GetExecutingAssembly().GetManifestResourceStream("MyProgram.Test.txt");

It is as easy as it can be, so I was very surprised when I could not get it to work. I added a file named “File.hr.xml” to the project and embedded it correctly, but the GetManifestResourceStream method returned null for it. The documentation says that this can only happen when the resource is not there, and that seemed impossible in my case.

In order to confirm this I looped through all resources I had embedded:

foreach (string name in Assembly.GetExecutingAssembly().GetManifestResourceNames()) {
    Debug.WriteLine(name);
}

To my surprise, the documentation was correct: my file simply wasn’t there. However, in my bin folder there was a subfolder named “hr” with a “MyProgram.resources” file inside.

Then it hit me. The problem was the multi-dotted extension with a valid language code as its first part. Visual Studio treats such files as language-specific, and reading them from another culture is not possible (OK, it is possible, but not easy).

Since I really wanted this resource to be available regardless of localization, the solution was simple. I just renamed the file to “File-HR.xml” and it magically appeared in the resource list.

Cleaning Up

What to do if your script needs to kill all the processes that it started? Just kill everything whose parent is your current shell ($$):

#!/bin/bash
# Kill every process whose parent PID (third column of ps -ef) is this shell.
kill `ps -ef | awk '$3 == '$$' {print $2}'`
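Here is a slightly more defensive sketch of the same idea. The helper name kill_children and the two sleep jobs are made up for illustration. It asks ps for just the PID and PPID columns, and it ignores errors from kill because the list can include short-lived helper processes (such as the ps pipeline itself) that are already gone by the time kill runs.

```shell
#!/bin/bash
# Hypothetical helper: send SIGTERM to every process whose parent is PID $1.
kill_children() {
    local pids
    # "pid=" and "ppid=" suppress the header line; awk keeps rows whose PPID matches.
    pids=$(ps -e -o pid= -o ppid= | awk -v parent="$1" '$2 == parent {print $1}')
    if [ -n "$pids" ]; then
        # Some listed PIDs may no longer exist, so failures are ignored.
        kill $pids 2>/dev/null || true
    fi
}

# Two stand-in background jobs started by this script.
sleep 30 &
pid1=$!
sleep 30 &
pid2=$!

kill_children "$$"
wait   # reap the terminated jobs
```

Only direct children are signalled; anything those children started themselves is not touched.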

Chain of Fools

Microsoft has always taken pride in maintaining compatibility. One guy decided to test just how upgradeable Windows really is by upgrading all the way from Windows 1.01 to Windows 7.

P.S. Do notice that there is no Windows ME. Repressed memory perhaps?