Software Development: 2019

Saturday, September 28, 2019

Memorable Computer Generated Keys

Computer systems often publish information for humans to read: invoices, sales dockets, memberships, subscriptions, travel bookings, etc. Items like these are expected to have some sort of unique 'key' associated with them, so for example, if you phone an airline to change a flight they will ask "what is your booking number?", then your correct reply allows immediate verification and you can proceed.

So what sort of 'key' should be used for documents that will be consumed by humans? Here are some choices I can think of:

Database Identity Key

Many relational databases (and others) can generate incrementing unique numbers like 0003145 as primary keys. This is good because numbers (of reasonable size) are simple for people to understand. It's bad for security though, because these numbers are predictable and Mallory can guess lots of numbers and try to hack or forge his way into private information.

'Jump' Numbers

Instead of predictably incrementing keys, how about ones that 'jump' with a random gap to the next one? This will produce numbers that are easy to read, and Mallory would have to try many keys to possibly find a valid one to attack. The size of the random jump has to be chosen as a tradeoff between the growth of the key and security through unpredictability.
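A minimal sketch of the 'jump' idea follows, assuming a hypothetical maximum gap of 1000 and a cryptographic random source so the gaps are hard to predict. The class, method and constant names are illustrative only. In a real system the last issued key would be read and updated inside a database transaction.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: issues keys that increase by a random gap.
public static class JumpKeys
{
    // Assumed tradeoff: a larger MaxJump makes keys harder to guess,
    // but makes the numbers grow faster.
    public const int MaxJump = 1000;

    public static long Next(long lastKey)
    {
        var bytes = new byte[4];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        // Gap between 1 and MaxJump inclusive. The modulo bias is
        // negligible here because 2^32 is far larger than MaxJump.
        uint raw = BitConverter.ToUInt32(bytes, 0);
        long jump = 1 + (raw % MaxJump);
        return lastKey + jump;
    }
}
```

With gaps spread evenly between 1 and 1000, roughly one number in 500 is a valid key, so a blind guess has about that chance of hitting one.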

Guids

Many databases generate a GUID (aka UUID) for each record. This is great for programmers who can be sure these unique keys will never clash with another one in the world. They are create-and-forget ids that are secure because they are sparsely distributed in 122-bit space and it's infeasible to guess them. The bad news is that a value like 8f50806b-1f88-4cd7-9112-4512045df69b on an airline reservation will boggle the mortal mind.

You could publish just a part of it as a key, like the 8f50806b prefix for example. This is not a really friendly string for a person to read, but it may be acceptable (making it uppercase might help the eye). You just have to be sure that there's enough entropy in the substring to guarantee uniqueness and unpredictability, and with GUIDs this is probably the case for normal use. See Birthday Attack for more information.
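To put a rough number on 'enough entropy', the standard birthday approximation p ≈ 1 − exp(−n² / 2N) estimates the chance of at least one duplicate among n random keys drawn from N possible values. This is a small sketch that assumes the published bits are uniformly random; the class name is illustrative.

```csharp
using System;

public static class Birthday
{
    // Approximate probability of at least one collision among n keys
    // drawn uniformly from 2^bits possible values.
    public static double CollisionProbability(double n, int bits)
    {
        double space = Math.Pow(2, bits);
        return 1.0 - Math.Exp(-(n * n) / (2.0 * space));
    }
}
```

An 8-character hex prefix carries 32 bits. Under this approximation, 10,000 keys give about a 1.2% chance of a duplicate and about 77,000 keys give a 50% chance, so a short prefix still needs a uniqueness check in the database.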

Artificial Keys

Another technique I quite like is to create an artificial random key that looks friendly to the eye. A key like KLB977 looks like a Victorian car number plate, or 125-982-763 is easy to recognise and dictate. Just pick a consistent format that seems friendly, perhaps something that's familiar to your local culture or language.

You have to be sure that the probability of generating a duplicate artificial key is small enough. If the key is short then you would probably generate the key and do a quick database lookup check that it's not a duplicate before using it. In the previous examples there would be about 17.6 million 'number plate' keys (26^3 × 10^3 = 17,576,000) and 1 billion 9-digit keys.

Fake keys composed of numbers and letters need some extra care to ensure they are human-friendly. Some numbers and letters have similar shapes and should be avoided. When I make fake keys I compose them out of the following character pool (notice some are missing):

123456789ABCDEFGHJKLMNPQRSTUVWXYZ

If you're extra conservative you may consider the characters JUVW a bit troublesome as well and remove them. Different fonts may affect your choice of 'bad' characters. With the 33-character pool above (9 digits and 24 letters), the number of 'number plate' keys reduces to about 10 million and the 9-digit ones reduce to about 0.39 billion, but this might be quite acceptable for moderate usage scenarios.
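Here is a sketch of a generator that draws from the character pool above. It is an illustration under assumptions, not the exact code I use: the class name is made up, and it mixes letters and digits in every position rather than following the letters-then-digits plate format.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class FriendlyKeys
{
    // The pool from the article: no 0, I or O.
    public const string Pool = "123456789ABCDEFGHJKLMNPQRSTUVWXYZ";

    public static string Generate(int length)
    {
        var sb = new StringBuilder(length);
        var one = new byte[1];
        // Largest multiple of the pool size that fits in a byte
        // (231 for 33 characters). Bytes at or above it are rejected
        // so every character is equally likely.
        int limit = 256 - (256 % Pool.Length);
        using (var rng = RandomNumberGenerator.Create())
        {
            while (sb.Length < length)
            {
                rng.GetBytes(one);
                if (one[0] >= limit) continue;
                sb.Append(Pool[one[0] % Pool.Length]);
            }
        }
        return sb.ToString();
    }
}
```

Calling FriendlyKeys.Generate(6) picks from 33^6 (about 1.29 billion) possible keys. The duplicate lookup described earlier is still needed before a key is issued.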

Addendum: Check Digits

In cases where generated keys are under your control, as in the cases of the 'Jump' Numbers and Artificial keys discussed above, you may consider using a check digit scheme to allow basic error detection for your keys.

The Wikipedia article lists many schemes, and there are many other specialised ones like ISO 6346 which is used to identify shipping containers. You might even like to make up your own, perhaps checking that each numeric key is within ±3 of a prime number (I made that up). The Verhoeff check digit algorithm is technically very interesting and effective.
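As a concrete illustration, here is a sketch of the Luhn scheme, one of the simpler ones for purely numeric keys. I chose it only because it is short; it is not the Verhoeff algorithm, and the class and method names are illustrative.

```csharp
using System;

public static class Luhn
{
    // Computes the Luhn check digit for a string of decimal digits.
    public static int CheckDigit(string digits)
    {
        int sum = 0;
        bool doubleIt = true; // the rightmost payload digit is doubled
        for (int i = digits.Length - 1; i >= 0; i--)
        {
            int d = digits[i] - '0';
            if (d < 0 || d > 9) throw new ArgumentException("Digits only", nameof(digits));
            if (doubleIt)
            {
                d *= 2;
                if (d > 9) d -= 9; // same as adding the two digits of the product
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return (10 - (sum % 10)) % 10;
    }

    // True when the last character is the correct check digit for the rest.
    public static bool IsValid(string digitsWithCheck)
    {
        if (string.IsNullOrEmpty(digitsWithCheck) || digitsWithCheck.Length < 2) return false;
        string payload = digitsWithCheck.Substring(0, digitsWithCheck.Length - 1);
        int check = digitsWithCheck[digitsWithCheck.Length - 1] - '0';
        return check == CheckDigit(payload);
    }
}
```

For the well-known test value 7992739871 the check digit is 3, so 79927398713 validates and any single mistyped digit is rejected.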

Orthogonal Programming

Friday, September 20, 2019

Installer Customer Information Dialog

When creating an MSI installer using a Visual Studio Setup Project you may want the user to enter their company name and a serial number as some sort of anti-piracy measure. Installers for many commercial products use this technique, and it would be nice if a similar step could be inserted in the wizard sequence in a Setup Project with minimal effort.

In the User Interface Editor you can add a Customer Information dialog which prompts for a Name, Organization and Serial number. It looks like this sample:

Note that the second field 'Organization' is optional and the default is to hide it. Toggle the visibility using the ShowOrganization property in the dialog's properties.

For many years I assumed that it was impossible to retrieve and validate the values from this dialog without writing a C/C++ custom action and manually registering it in the MSI tables. There are web articles that discuss that technique, but it was too much bother for me and I just assumed the dialog was too hard to use and I ignored it.

However, this week I found an article by accident that hinted that the Serial number could be retrieved using the PIDKEY property name. This led me to a Property Reference page where some interesting User Information properties are listed at the bottom of the page. After some experiments I found that the property names USERNAME, COMPANYNAME and PIDKEY correspond to the three fields in the dialog.

Now the challenge was to retrieve the Customer Information dialog values and validate them. I also wanted to save the values after successful install in some well-known location so they can be used by the product later at runtime (this was a personal requirement).

Some Bad News

Custom action (CA) code created by Visual Studio is not of a type that runs in the UI sequence, so it's not possible to interactively validate the serial number. This is where C/C++ code would be required with special registration.

There is still a slightly clumsy way to validate the serial number once the UI sequence ends and the install sequence starts and the managed code custom actions run. If the managed CA detects a bad serial number then it can throw an Exception to cause a message box to display something helpful, then the install rolls back and is cancelled. It's a bit of a nuisance that you can unwittingly enter a bad serial, click through a few more wizard steps, start the install and then discover the Serial is wrong, but it's probably an acceptable inconvenience.

The Good News

In summary, here is how to retrieve and validate the Customer Information dialog values.

• Write a CA class derived from the Installer class (contents discussed later).

• In the Custom Actions editor, add a CA to each of the four install steps which points to the project output containing the CA class (probably your main application project).

• Set the CustomActionData for each of the CA nodes to this (wrapped for easy reading):

/target="[TARGETDIR]\"
/pidkey="[PIDKEY]"
/companyname="[COMPANYNAME]"
/username="[USERNAME]"

This is the 'trick' that lets the dialog values be passed down into the CA when it runs. If you have other custom dialogs in the UI sequence, add their properties to the list.

• In the CA's Install override method do something like this skeleton:

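public override void Install(IDictionary stateSaver)
{
  base.Install(stateSaver);
  // Keys match the names used in CustomActionData (lookup is case-insensitive).
  string serial = Context.Parameters["pidkey"];
  // Placeholder check: replace with your own validation logic.
  if (serial != "314159") throw new Exception("Bad serial number");
  // You could save the parameters now
}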

Enhance this raw code to be crash-resistant, then use whatever serial validation check suits your needs. The thrown message will appear in a popup and cause the install to cancel and roll back.

If the validation passes, then you could loop through the Context.Parameters collection and write the key-value pairs to a file in a well-known location. The product could read the values at runtime and, for example, it could re-validate the serial as an extra anti-piracy measure.
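The loop over the parameters can be sketched as below. The helper takes the StringDictionary that Context.Parameters exposes, so it can be tried outside an installer. The class name and the one-pair-per-line file format are my assumptions for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Specialized;
using System.IO;

public static class InstallParameterFile
{
    // Writes each key=value pair on its own line. The caller chooses the path.
    public static void Save(StringDictionary parameters, string path)
    {
        using (var writer = new StreamWriter(path, false))
        {
            // StringDictionary stores its keys in lower case.
            foreach (DictionaryEntry entry in parameters)
            {
                writer.WriteLine("{0}={1}", entry.Key, entry.Value);
            }
        }
    }
}
```

Inside the Install override this might be called as InstallParameterFile.Save(Context.Parameters, Path.Combine(Context.Parameters["target"], "install-info.txt")), where the file name is hypothetical. The collection may also contain entries added by the installer itself, not just the ones listed in CustomActionData.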

Serial Validation

The logic that validates the serial number can be as simple or complex as you want. A simple scenario would be to check that the serial number matches the hash of some local environmental information such as the NetBIOS machine name or the Windows Product ID (displayed in Control Panel System). This would restrict installation to specific computers or specific copies of Windows.

Remember though that if the hash algorithm code is inside the installer then someone can easily extract it and reverse engineer it.
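The simple scenario might look like the following sketch, which derives the expected serial from the machine name. Everything here is an assumption for illustration: the salt value, the choice of SHA-256, and the 8-character serial length. As noted above, anyone who extracts the salt and algorithm can generate serials.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SerialCheck
{
    // Hypothetical vendor secret embedded in the installer.
    private const string Salt = "example-salt";

    // Derives an 8-character hex serial from the machine name.
    public static string ExpectedSerial(string machineName)
    {
        using (var sha = SHA256.Create())
        {
            byte[] input = Encoding.UTF8.GetBytes(Salt + machineName.ToUpperInvariant());
            byte[] hash = sha.ComputeHash(input);
            // First 4 bytes of the hash as uppercase hex without dashes.
            return BitConverter.ToString(hash, 0, 4).Replace("-", "");
        }
    }

    public static bool IsValid(string serial, string machineName)
    {
        return string.Equals(serial, ExpectedSerial(machineName),
            StringComparison.OrdinalIgnoreCase);
    }
}
```

In the custom action the check could be SerialCheck.IsValid(serial, Environment.MachineName), since Environment.MachineName returns the NetBIOS name of the local computer.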

A more complex scenario may involve "phoning home" to a web service where sophisticated licensing rules could be implemented.

Summary

So after a decade of ignoring the Customer Information dialog because I thought it was too tricky, a tiny hint led me to discover that there is not really any trick at all to using it. The 'trick' was discovering the special property names USERNAME, COMPANYNAME and PIDKEY.

Note that the first two text fields in the Customer Information dialog seem to be prefilled with the name of the current Windows user account and the Windows registered customer name respectively. If you blank the fields out and continue, the same default values seem to be used anyway.


Wednesday, May 22, 2019

Guid constant bits

People often casually say that the .NET Framework Guid contains 128 pseudo-random bits. This is not technically correct however, as we should know that at least one of the 4-bit nibbles is always the value 4, reducing the random bit count to 124. I suspected other bits were fixed values as well, but didn't know which ones. So I used LINQPad to generate hundreds of Guids and dump the bit counts and see which ones really have fixed values.

Here are the pictorial results of the bit values from bit offset 0 to 127 horizontally (I've truncated the lines so they don't scroll off the window).

           111111 11112222 22222233 33333333 44444444 44555555 55556666 66666677 77777777...
01234567 89012345 67890123 45678901 23456789 01234567 89012345 67890123 45678901 23456789...
********-********-********-********-********-********-********-0100****-10******-********...

You can see that bits 56-59 are always 0100 as previously stated. Also notice that bits 64-65 are always 10. So there really are 122 random bits in this type of Guid.

This is all actually explained in the Wikipedia article Universally unique identifier, but it was fun to dump the bits and confirm it's true.
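The experiment can be reproduced with a sketch like this one. It is not the original LINQPad script: the class name and the output format are mine, and the bits are numbered in ToByteArray() order with the most significant bit of each byte first, which is the numbering used in the picture above.

```csharp
using System;
using System.Text;

public static class GuidBits
{
    // Returns a 128-character map in byte-array order:
    // '0' or '1' where the bit never changed across the samples, '*' otherwise.
    public static string ConstantBitMap(int samples)
    {
        var alwaysOne = new byte[16]; // bits that were 1 in every sample
        var everOne = new byte[16];   // bits that were 1 in at least one sample
        for (int i = 0; i < 16; i++) alwaysOne[i] = 0xFF;

        for (int n = 0; n < samples; n++)
        {
            byte[] bytes = Guid.NewGuid().ToByteArray();
            for (int i = 0; i < 16; i++)
            {
                alwaysOne[i] &= bytes[i];
                everOne[i] |= bytes[i];
            }
        }

        var sb = new StringBuilder(128);
        for (int bit = 0; bit < 128; bit++)
        {
            int mask = 0x80 >> (bit % 8); // most significant bit first
            bool one = (alwaysOne[bit / 8] & mask) != 0;
            bool zero = (everOne[bit / 8] & mask) == 0;
            sb.Append(one ? '1' : zero ? '0' : '*');
        }
        return sb.ToString();
    }
}
```

With a few hundred samples the chance of a random bit looking constant by accident is vanishingly small, so the map should show exactly six fixed positions.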

Cryptographic?

A few weeks later I was wondering if Windows GUIDs are generated by some sort of ordinary (non-cryptographic) pseudo-random number algorithm or if they are cryptographically strong. If they come from an ordinary pseudo-random generator then it's quite likely that someone with the appropriate talents can inspect a certain number of sequential GUIDs and determine the algorithm that generates them, and thereafter they can be predicted forever. If GUIDs are generated by a correctly implemented crypto algorithm then it is computationally infeasible to predict them.

I've been wondering about this for many years, and so have a lot of other people it seems, as web searches reveal lots of heated arguments on the subject.

Some people point out that according to RFC 4122, GUIDs are only guaranteed to be unique and there is no requirement that they be crypto strong. However, others point out that the algorithm that generates GUIDs is not specified and may be implemented in different ways by different vendors on different platforms. This leaves a loophole where someone could implement a crypto strong GUID algorithm if they want ... they don't have to, but they could.

So the web arguments continue asking if modern Windows GUIDs are crypto strong.

This archived PDF has an interesting and concise discussion of GUIDs in section 2.5.5 and mentions that it is "common practice" to "use cryptographic-quality random bits for generating [GUIDs]". Aha! That's a great hint, but does Windows do this?

In Feb 2014 someone (in Answer 1) displays a stack trace from 64-bit Windows 10 which shows UuidCreate calling down into "CryptRng" functions. This provides great evidence that the GUIDs (in this case) are crypto strong. Unfortunately there is no evidence that it works this way in all editions of Windows.

So in summary, I personally won't use GUIDs for anything related to cryptography, despite hints that they might be suitable. I do however love using the .NET Guid.NewGuid().GetHashCode() method for generating ad-hoc random numbers. I trust that this "one-liner" generates quality 32-bit random numbers. It's really convenient in cases where creating a Random class instance seems to be overkill or lack of thread safety with Random is a nuisance.

FEBRUARY 2024 NOTE -- I found the following expanded documentation in the .NET 8 Remarks for the Guid.NewGuid method.

On non-Windows platforms, starting with .NET 6, this function calls the OS's underlying cryptographically secure pseudo-random number generator (CSPRNG) to generate 122 bits of strong entropy. In previous versions of .NET, the entropy is not guaranteed to be generated by a CSPRNG.

It is recommended that applications not use the NewGuid method for cryptographic purposes. First, since a Version 4 UUID has a partially predictable bit pattern, the NewGuid function cannot serve as a proper cryptographic pseudo-random function (PRF). If the output of NewGuid is given to a cryptographic component which requires its input to be generated by a proper PRF, the cryptographic component may not be able to maintain its security properties. Second, NewGuid utilizes at most 122 bits of entropy, regardless of platform. Some cryptographic components set a minimum entropy level on their inputs as a matter of policy. Such policies often set the minimum entropy level at 128 bits or higher. Passing the output of NewGuid to such a routine may violate its policy.

Guid GetHashCode

Just after writing the previous section I remembered that the source code for the Guid class is available, so I could look at how GetHashCode actually works on the 16 bytes inside a Guid. It turns out that the hash is generated by mixing the following bytes:

aaaabbcc..f....k

The one-liner that generates the hash code is this:

return _a ^ (((int)_b << 16) | (int)(ushort)_c) ^ (((int)_f << 24) | _k);

So it's interesting that it uses only 10 of the 16 bytes to make the hash, and one of the 'c' bytes has four fixed bits (see first section). The two fixed variant bits sit in byte 8, which is not one of the bytes used, so only 76 bits out of the 128 bits in a Guid are used to make the hash code. This is so strange that I suspect there is some obscure implementation detail involved that's only known to Microsoft staff. The use of the f and k bytes is particularly interesting.

Perhaps while fine-tuning performance of Guid GetHashCode it was decided that the 76 bits of entropy provided by the 10 bytes was sufficient for generating good 32-bit hash codes, and using any more bytes would be pointless overkill. I'm only guessing.
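To see which bytes matter, the one-liner can be recomputed from the public byte array, as in this sketch. The class and method names are mine, and the private fields are rebuilt from ToByteArray(), which stores a, b and c in little-endian order. Newer .NET runtimes may compute GetHashCode differently, so the sketch deliberately does not compare its result with the built-in method.

```csharp
using System;

public static class GuidHashSketch
{
    // Recomputes the quoted one-liner from the 16 bytes of ToByteArray().
    public static int FrameworkStyleHash(Guid g)
    {
        byte[] x = g.ToByteArray();
        int a = x[0] | (x[1] << 8) | (x[2] << 16) | (x[3] << 24); // bytes 0-3
        short b = (short)(x[4] | (x[5] << 8));                     // bytes 4-5
        short c = (short)(x[6] | (x[7] << 8));                     // bytes 6-7
        byte f = x[10];
        byte k = x[15];
        // Bytes 8, 9 and 11 to 14 (d, e, g, h, i, j) are never read.
        return a ^ (((int)b << 16) | (int)(ushort)c) ^ (((int)f << 24) | k);
    }
}
```

Two Guids that differ only in the six unread bytes produce the same value from this function, which is a quick way to confirm that those bytes play no part in the formula.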

If anyone thinks they have a technical explanation of why the hash code one-liner is coded the way it is, then I'm keen to hear from you and I'll append your comments to this article.


Tuesday, April 2, 2019

Windows update error 0x80248007

With increasing frequency over the last year I stumble over the problem where a Windows 10 machine will fail to install updates with error 0x80248007. Once this happens you're "stuck" and updates will fail forever more. This seems to only happen when major Windows 10 operating system updates are involved. Extensive web searches produce conflicting and dangerous advice about how to overcome this problem, and some of the less frightening suggestions just don't work. Here is a typical problem screen:

The only way to overcome this problem is to go to the Microsoft Update Catalog web site and search for the specific updates that have failed, download them, then install them from oldest to newest. Here's a typical search result with a download link for a large update:

Updates then install correctly (see next picture) and the Windows Update status and history return to a sensible state.

Perhaps I'm lucky. Your mileage may vary.


Monday, April 1, 2019

VMware Workstation and Device/Credential Guard are not compatible

Each year or so, or after a big Windows update, I can't launch VMware Workstation due to the error in this post's title. And each time this happens I can't remember what I did last time to fix it, so I run some searches and find dozens of variations of ridiculously complicated answers that I remember DO NOT work. Changing Device Guard policy seemed sensible but it didn't work for me. Just run this command as admin:

bcdedit /set hypervisorlaunchtype off

Then reboot. This seems to have no ill effects. I don't use Hyper-V on my work computer, and it doesn't interfere with development of UWP apps (which used to need Hyper-V installed).


Saturday, January 5, 2019

UDP Broadcasting Sample

Overview

Every year or so I have to use the UdpClient class to perform simple broadcasting and receiving, and each time I have forgotten how to do it. There are lots of combinations of parameters that don't work or produce unhelpful crashes if you get them wrong.

The scenario I'm describing is one of the simplest usages of UDP possible, with complete decoupling of the broadcasters and receivers, only sharing knowledge of a common address and port to use. One app broadcasts messages with no idea who is receiving. Other app(s) within range receive the messages. This is ideal for non-critical communication. For example, an app could broadcast a string of XML to say "customer X has been updated", then receivers could refresh their UIs.

For the sanity of myself and others I have composed a small reference DOS command that implements the usage scenario I have just described. It can be launched to either broadcast or receive UDP message strings. Broadcasting and receiving can run on the same machine or on different machines in a network. The UdpClient class is used in the simplest way I think is possible with minimum code.

Run the command as udpdemo broadcast and it enters a broadcast prompt loop.

Run the command as udpdemo receive and it enters a receive loop.

Extra instructions are displayed on how to tell each loop to gracefully end.

For simplicity, the command uses hard-coded multicast address 227.7.7.7 and port 54777. For more general information run a web search for "multicast address range".

The complete C# source code for the DOS command can be found HERE (as a txt file). The following image shows the demo command running as a broadcaster and a receiver and sending two messages, the last one "end" tells the receiver to close.
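For quick reference, the core of the scenario can be sketched as follows. This is a cut-down illustration, not the linked source file: the class and method names are mine, the prompt loops and the "end" handling are left out, and only the address and port come from the text above.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class UdpSketch
{
    static readonly IPAddress Group = IPAddress.Parse("227.7.7.7");
    const int Port = 54777;

    // Sends one UTF-8 message to the multicast group.
    public static void Broadcast(string message)
    {
        using (var client = new UdpClient())
        {
            byte[] data = Encoding.UTF8.GetBytes(message);
            client.Send(data, data.Length, new IPEndPoint(Group, Port));
        }
    }

    // Blocks until one message arrives, then returns it.
    public static string ReceiveOne()
    {
        using (var client = new UdpClient())
        {
            // Allow several receivers on the same machine to share the port.
            client.Client.SetSocketOption(SocketOptionLevel.Socket,
                SocketOptionName.ReuseAddress, true);
            client.Client.Bind(new IPEndPoint(IPAddress.Any, Port));
            client.JoinMulticastGroup(Group);
            var sender = new IPEndPoint(IPAddress.Any, 0);
            byte[] data = client.Receive(ref sender);
            return Encoding.UTF8.GetString(data);
        }
    }
}
```

The receiver has to bind the port before joining the group, and the ReuseAddress option has to be set before the bind. Getting that order wrong is one of the combinations that fails unhelpfully.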

Notes

Although using UDP in the way I have in the sample is simple to code and understand, remember that it is connectionless and has no built-in reliability. If messages are blocked by some network rules or receivers are not running, then messages may be silently lost, if that matters in your usage scenario.

A modern alternative to UDP broadcasting may be to use queues. For .NET developers, using Azure Queue storage requires little code, it's really fast, the capacity is enormous and it's dirt cheap. For more sophisticated scenarios there are products like Event Hubs, Service Bus and more. Queues are better than UDP if you don't want messages to be lost.
