GetHashCode

GetHashCode is an approximate method for checking equality that can end up being used by other classes (the most popular one being Hashtable).

The .NET documentation gives us these rules:

  • Equal objects should return the same hash code.
  • The same object should return the same hash code every time until the object is changed.
  • The distribution of hash codes should be random.

Equal objects should return the same hash code

Sometimes checking for equality can be expensive, and it may be a few times faster to check whether the hash codes are equal. This will not give you a definitive answer, but it will allow you to remove quite a few objects from consideration. Hashtable does this through its bucket mechanism. If you insert an object that doesn't follow this rule, Hashtable may be unable to return a correct result (e.g. for the ContainsKey method).

Note that this doesn't mean that different objects must return different hash codes.
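
To see what breaking this rule looks like, here is a minimal sketch with a hypothetical Person class (Hashtable lives in System.Collections) that overrides Equals but keeps the default GetHashCode:

public class Person {
    public string Key;

    public override bool Equals(object obj) {
        var other = obj as Person;
        return (other != null) && (other.Key == this.Key);
    }

    // GetHashCode is intentionally left at its default - rule one is broken.
}

var table = new Hashtable();
table.Add(new Person() { Key = "A" }, "first");

// Equals considers these two instances equal, but their default hash codes
// differ, so Hashtable searches the wrong bucket and reports False.
Console.WriteLine(table.ContainsKey(new Person() { Key = "A" }));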

The same object should return the same hash code every time until the object is changed

This one seems logical since it follows the same reasoning as the first rule. Whenever a property that can make your Equals or Compare methods return a different result changes value, you should recalculate your hash code. Small changes to the object (something that doesn't affect these methods) should not generate a new hash code (they can, but there is no reason to).

The distribution of hash codes should be random

This one is introduced to help classes that use buckets for divide-and-conquer (Hashtable is our example again). It ensures that every bucket is filled to approximately the same level, so a search doesn't need to check every object.

The worst-case scenario is every object returning the same hash code (e.g. return 0). While this follows rules one and two, performance-wise it is awful since every check will need to take all objects into consideration.

This is an important rule, but you should not go to too much effort over it. Since the GetHashCode method can get called a lot during a search through a collection, you should make it as fast as possible. A fast GetHashCode with a less-than-optimal distribution will often outperform elaborate and slow code.

What happens if I don’t override GetHashCode?

You will get a default hash code that is based on your object's address in memory (in future implementations this may change). While this works well enough, it may fail to recognize two objects as being the same if they were created differently (e.g. returning the same data row two times). In most cases it will work, but it is generally a bad idea to rely on it with collections (most of them use buckets), and it can lead to bugs that are difficult to find.

How to generate it then?

If you intend to use a class with a collection, you have probably already overridden the Equals method (or implemented one of the comparison interfaces, e.g. IComparer). Whatever you use there to check for equality, use it in GetHashCode also. E.g. if your application uses a property named Key for Equals, write:

public override int GetHashCode() {
    return this.Key.GetHashCode();
}

This makes it both simple and fast (if the type of Key is one of the .NET types) while following all the rules.

A slightly more complicated situation is when you check against more than one property. One path you could take is to base the hash code on just one of them (e.g. the element that changes more frequently). This will cause a few hash code collisions (different objects will have the same hash code), but it will not cause bugs. Depending on how many properties you have, it may not even be a big hit on performance.
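
A minimal sketch of that path, assuming a hypothetical class named MyItem with two properties, Key and Key2, where Equals checks both but GetHashCode uses only the first:

public override bool Equals(object obj) {
    var other = obj as MyItem;
    return (other != null) && (other.Key == this.Key) && (other.Key2 == this.Key2);
}

public override int GetHashCode() {
    // Objects that differ only in Key2 will collide, which is allowed;
    // equal objects still get equal hash codes, so rule one holds.
    return this.Key.GetHashCode();
}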

The other approach is combining the two hash codes into one, e.g.:

public override int GetHashCode() {
    return this.Key.GetHashCode() ^ this.Key2.GetHashCode();
}

If you go that way, always measure the speed. In more than one case you will find that a GetHashCode method that takes all elements into consideration is slower than one that allows collisions. It all depends on the objects you use. From my experience, I would recommend avoiding calculating the hash code over more than two properties.
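
A rough measurement sketch (the loop count is arbitrary and MyItem is the hypothetical class from above) using Stopwatch:

var item = new MyItem() { Key = "A", Key2 = "B" };
var watch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 10000000; i++) {
    item.GetHashCode();
}
watch.Stop();
Console.WriteLine("{0} ms", watch.ElapsedMilliseconds);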

Caching?

While caching may sound like a good idea, there is no need for it if you use GetHashCode of the .NET Framework's classes (as we did in the examples above). Those classes either already have caching in place or use an operation that is fast enough that caching is not needed.

Only if you have your own hashing mechanism should you consider caching the results. Do not forget to update the hash code whenever the object is changed.
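
A minimal sketch of such caching, assuming a hypothetical Document class with its own (deliberately simple) hashing mechanism:

public class Document {
    private string _key;
    private int _hashCode;
    private bool _hashIsValid;

    public string Key {
        get { return _key; }
        set { _key = value; _hashIsValid = false; }  // invalidate cache on change
    }

    public override bool Equals(object obj) {
        var other = obj as Document;
        return (other != null) && (other.Key == this.Key);
    }

    public override int GetHashCode() {
        if (!_hashIsValid) {  // recalculate only after a change
            _hashCode = CalculateHash(_key);
            _hashIsValid = true;
        }
        return _hashCode;
    }

    private static int CalculateHash(string text) {
        // stand-in for a custom (possibly expensive) hashing mechanism
        int hash = 17;
        foreach (var c in text ?? string.Empty) {
            hash = unchecked(hash * 31 + c);
        }
        return hash;
    }
}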

Is it worth it?

If you are using something from the Collections namespace, the answer is yes. Almost anything there either already uses GetHashCode or may use it in the future. Even the simplest of all hash codes will help performance.

Single Parent

Quite a few applications have both a menu (MenuStrip control) and a toolbar (ToolStrip control). Both of these controls have their menu containers: in the case of MenuStrip this is ToolStripMenuItem, and in the case of ToolStrip we can use ToolStripSplitButton to get the same effect. Both of those containers share the DropDownItems property, so you might try to create one ToolStripMenuItem and add it to both:

var newItem = new ToolStripMenuItem("Test");
newItem.Click += new EventHandler(newItem_Click);
toolStripMenuItem1.DropDownItems.Add(newItem);
toolStripSplitButton1.DropDownItems.Add(newItem);

This code looks nice, but it does not work. In this case, Test will be added to toolStripSplitButton1 only.

The culprit is the SetOwner method (as seen with Reflector):

private void SetOwner(ToolStripItem item) {
    if (this.itemsCollection && (item != null)) {
        if (item.Owner != null) {
            item.Owner.Items.Remove(item);
        }
        item.SetOwner(this.owner);
        if (item.Renderer != null) {
            item.Renderer.InitializeItem(item);
        }
    }
}

As you can see, if the item already has an owner, it is removed from that owner first, and only then is the new owner set.

The only solution is to create two new items and assign each to its own parent control:

var newItem1 = new ToolStripMenuItem("Test");
newItem1.Click += new EventHandler(newItem_Click);
toolStripMenuItem1.DropDownItems.Add(newItem1);

var newItem2 = new ToolStripMenuItem("Test");
newItem2.Click += new EventHandler(newItem_Click);
toolStripSplitButton1.DropDownItems.Add(newItem2);

While this means that you have two pieces of the same code, you can find consolation in the fact that the event handler methods can be reused.
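
If the duplication bothers you, a small factory method (a hypothetical helper, not part of the framework) keeps each container on its own instance while still sharing the event handler:

private ToolStripMenuItem CreateTestItem() {
    var item = new ToolStripMenuItem("Test");
    item.Click += new EventHandler(newItem_Click);
    return item;
}

// each container gets its own instance, so SetOwner never steals it away
toolStripMenuItem1.DropDownItems.Add(CreateTestItem());
toolStripSplitButton1.DropDownItems.Add(CreateTestItem());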

Windows 7 on August 6th


Finally it is known. The earliest date you can get Windows 7 (and Windows Server 2008 R2) will be August 6th. This date is for those fortunate enough to be MSDN or TechNet subscribers.

Mere mortals will need to wait until October 22nd.

[2009-07-22: Windows 7 has reached RTM status.]

Hyper-V Drivers in Linux


Microsoft has shared its Hyper-V integration components with the Linux community. That contribution to the Linux Driver Project will make it possible for any Linux distribution to have the integration with Hyper-V that was previously reserved for SUSE and Red Hat.

In the worst-case scenario, this will enable other distributions to work faster under Hyper-V.

You can see more details on the Windows Virtualization Blog.

Installing Windows XP Media Center Edition


I do like Windows 7, but I still have a few computers that for various reasons (mostly driver availability) must run under Windows XP. Just for fun, I decided to install Windows XP Media Center Edition this time.

The MCE image is a little bit bigger and occupies two CDs. There is no DVD image, so you are forced to insert the second disk at the appropriate moment. What came as a surprise to me is that it requires Windows XP Professional Service Pack 2 to be inserted as a third disk.

My first attempt was to just cancel that dialog, and that left me with an unusable operating system since half of the programs went missing.

For the second attempt, I downloaded Service Pack 2 from the Internet and pointed the installation to it. It worked until the next file was required and could not be found. That left me puzzled since that file was on CD1. I gave it CD1 and the OS was installed. When I tried to apply SP3, everything just froze.

On the third attempt, when asked for the SP2 CD, I just put the first disk back in. I was surprised that the installation didn’t mind, and everything was installed perfectly. Even the subsequent installation of Service Pack 3 went without problems.

Why did this happen? My guess is that slip-streaming is to blame. On the original disks everything was arranged in the proper order, but when the service pack was slip-streamed (MCE comes with SP2), all updated files ended up on the first disk. Nobody bothered to rearrange them, and that caused the need to reuse CD1. The installation itself knew only that it needed SP2 files, and nobody bothered to change the prompt to something more useful like “Please insert the first disk again”.