Wednesday, July 6, 2016

Windows Universal String.GetHashCode

Since late 2021 you can use the XxHash classes in the System.IO.hashing namespace for generating 32-bit and 64-bit hashes. There is no need to use the custom hashing code below. The namespace also includes classes for the CRC-32 and CRC-64 algorithms.

I thought I was going barking mad because a user settings value I persisted with a hashed key kept coming back empty every time I restarted my App. After wasting half and hour on traces I could see that the same string produced different GetHashCode values every time I ran the App. Impossible?!

Now I understand perfectly that the implementation of String.GetHashCode might change on different platforms or different .NET Framework versions, but across different runs of the same App ... it's inconceivable. It's true. Read the documentation or run a web search and be bewildered.

I'm utterly gobsmacked by this change of behaviour. The vast change of scope of the predictability of the hash codes in Windows 10 is a devious trap for the unprepared.

Technically it's true, you shouldn't persist string hash codes, but I always thought it was okay to stretch the rules a little bit and use them in something with a relatively short lifetime like user settings which might live for days or weeks. Now the rules are locked down tight ... don't persist hash codes outside the lifetime of an AppDomain.

So what do you use instead of the really convenient String.GetHashCode? I immediately needed some short, simple and fast code to hash my string keys. Luckily I have hobby researched this subject before and published the results here: Hashing Short Strings.

Following my own advice, I picked the following code which combines the FNV-1a hash with a "quick and dirty" pseudo-random sequence.


static int StableHash(string value)
{
  uint hash = 2166136261u;
  for (int i = 0; i < value.Length; i++)
  {
    hash ^= value[i];
    hash = hash * 16777619;
    hash = (1664525 * hash) + 1013904223;
  }
  return (int)hash;
}

Put this method in a handy utility library. The string hash codes it produces are actually more collision resistant and more evenly distributed than String GetHashCode.

For a 64-bit version of stable hashing use this.

static long StableHash64(string value)
{
  ulong hash = 14695981039346656037UL;
  for (int i = 0; i < value.Length; i++)
  {
    hash ^= value[i];
    hash = hash * 1099511628211UL;
    hash = (2770643476691UL * hash) + 4354685564936844689UL;
  }
  return (long)hash;
}


No comments:

Post a Comment