Wednesday, May 22, 2019

Guid constant bits

People often casually say that the .NET Framework Guid contains 128 pseudo-random bits. This is not technically correct however, as we should know that at least one of the 4-bit nibbles is always the value 4, reducing the random bit count to 124. I suspected other bits were fixed values as well, but didn't know which ones. So I used LINQPad to generate hundreds of Guids and dump the bit counts and see which ones really have fixed values.

Here are the pictorial results of the bit values from bit offset 0 to 127 horizontally (I've truncated the lines so they don't scroll off the window).
           111111 11112222 22222233 33333333 44444444 44555555 55556666 66666677 77777777...
01234567 89012345 67890123 45678901 23456789 01234567 89012345 67890123 45678901 23456789...
********-********-********-********-********-********-********-0100****-10******-********...
You can see that bits 56-59 are always 0100 as previously stated. Also notice that bits 64-65 are always 10. So there really are 122 random bits in this type of Guid.

This is all actually explained in the Wikipedia article Universally unique identifier, but it was fun to dump the bits and confirm it's true.

Cryptographic?

A few weeks later I was wondering if Windows GUIDs are generated by some sort of pseudo-random number algorithm or if they were cryptographically strong. If they are pseudo-random then it's quite likely that someone with the appropriate talents can inspect a certain number of sequential GUIDs and determine the algorithm that generates them, and thereafter they can be predicted forever. If GUIDs are generated by a correctly implemented crypto algorithm then it is computationally infeasible to predict them.

I've been wondering about this for many years, and so have a lot of other people it seems, as web searches reveal lots of heated arguments on the subject.

Some people point out that according to RFC 4122, GUIDs are only guaranteed to be unique and there is no requirement that they be crypto strong. However, others point out that the algorithm that generates GUIDs is not specified and may be implemented in different ways by different vendors on different platforms. This leaves a loophole where someone could implement a crypto strong GUID algorithm if they want ... they don't have to, but they could.

So the web arguments continue asking if modern Windows GUIDs are crypto strong.

This archived PDF has an interesting and concise discussion of GUIDs in section 2.5.5 and mentions that it is "common practice" to "use cryptographic-quality random bits for generating [GUIDs]". Aha! That's a great hint, but does Windows do this?

In Feb 2014 someone (in Answer 1) displays a stack trace from 64-bit Windows 10 which shows UuidCreate calling down into "CryptRng" functions. This provides great evidence that the GUIDs (in this case) are crypto strong. Unfortunately there is no evidence that it works this way in all editions of Windows.

So in summary, I personally won't use GUIDs for anything related to cryptography, despite hints that they might be suitable. I do however love using the .NET Guid.NewGuid().GetHashCode() method for generating ad-hoc random numbers. I trust that this "one-liner" generates quality 32-bit random numbers. It's really convenient in cases where creating a Random class instance seems to be overkill or lack of thread safety with Random is a nuisance.

FEBRUARY 2024 NOTE -- I found the following expanded documentation in the .NET 8 Remarks for the Guid.NewGuid method.

On non-Windows platforms, starting with .NET 6, this function calls the OS's underlying cryptographically secure pseudo-random number generator (CSPRNG) to generate 122 bits of strong entropy. In previous versions of .NET, the entropy is not guaranteed to be generated by a CSPRNG.

It is recommended that applications not use the NewGuid method for cryptographic purposes. First, since a Version 4 UUID has a partially predictable bit pattern, the NewGuid function cannot serve as a proper cryptographic pseudo-random function (PRF). If the output of NewGuid is given to a cryptographic component which requires its input to be generated by a proper PRF, the cryptographic component may not be able to maintain its security properties. Second, NewGuid utilizes at most 122 bits of entropy, regardless of platform. Some cryptographic components set a minimum entropy level on their inputs as a matter of policy. Such policies often set the minimum entropy level at 128 bits or higher. Passing the output of NewGuid to such a routine may violate its policy.


Guid GetHashCode

Just after writing the previous section I remembered that the source code for the Guid class is available, so I could look at how GetHashCode actually works on the 16 bytes inside a Guid. It turns out that the hash is generated by mixing the following bytes:

aaaabbcc..f....k

The one-liner that generates the hash code is this:

return _a ^ (((int)_b << 16) | (int)(ushort)_c) ^ (((int)_f << 24) | _k);

So it's interesting that it uses only 10 of the 16 bytes to make the hash, and one of the 'c' bytes has four fixed bits (see first section). This means that only 74 bits out of the 128 bits in a Guid are used to make the hash code. This is so strange that I suspect there is some obscure implementation detail involved that's only known to Microsoft staff. The use of the f and k bytes is particularly interesting.

Perhaps while fine-tuning performance of Guid GetHashCode it was decided that the 74 bits of entropy provided by the 10 bytes was sufficient for generating good 32-bit hash codes, and using any more bytes would be pointless overkill. I'm only guessing.

If anyone thinks they have a technical explanation of why the hash code one-liner is coded the way it is, then I'm keen to hear from you and I'll append your comments to this article.