Saturday, September 28, 2019

Memorable Computer Generated Keys

Computer systems often publish information for humans to read: invoices, sales dockets, memberships, subscriptions, travel bookings, etc. Items like these are expected to have some sort of unique 'key' associated with them, so for example, if you phone an airline to change a flight they will ask "what is your booking number?", then your correct reply allows immediate verification and you can proceed.

So what sort of 'key' should be used for documents that will be consumed by humans? Here are some choices I can think of:

Database Identity Key

Many relational databases (and others) can generate incrementing unique numbers like 0003145 as primary keys. This is good because numbers (of reasonable size) are simple for people to understand. It's bad for security though, because these numbers are predictable and Mallory can guess lots of numbers and try to hack or forge his way into private information.

'Jump' Numbers

Instead of predictably incrementing keys, how about ones that 'jump' with a random gap to the next one? This will produce numbers that are easy to read, and Mallory would have to try many keys to possibly find a valid one to attack. The size of the random jump has to be chosen as a tradeoff between the growth of the key and security through unpredictability.

Guids

Many databases generate a GUID (aka UUID) for each record. This is great for programmers who can be sure these unique keys will never clash with another one in the world. They are create-and-forget ids that are secure because they are sparsely distributed in 122-bit space and it's infeasible to guess them. The bad news is that a value like 8f50806b-1f88-4cd7-9112-4512045df69b on an airline reservation will boggle the mortal mind.

You could publish just a part of it as a key, like the 8f50806b prefix for example. This is not a really friendly string for a person to read, but it may be acceptable (making it uppercase might help the eye). You just have to be sure that there's enough entropy in the substring to guarantee uniqueness and unpredictability, and with GUIDs this is probably the case for normal use. See Birthday Attack for more information.

Artificial Keys

Another technique I quite like is to create an artificial random key that looks friendly to the eye. A key like KLB977 looks like a Victorian car number plate, or 125-982-763 is easy to recognise and dictate. Just pick a consistent format that seems friendly, perhaps something that's familiar to your local culture or language.

You have to be sure that the probability of generating a duplicate artificial key is small enough. If the key is short then you would probably generate the key and do a quick database lookup check that it's not a duplicate before using it. In the previous examples there would be 17.5 million 'number plate' keys and 1 billion 9-digit keys.

Fake keys composed of numbers and letters need some extra care to ensure they are human-friendly. Some numbers and letters have similar shapes and should be avoided. When I make fake keys I compose them out of the following character pool (notice some are missing):

123456789ABCDEFGHJKLMNPQRSTUVWXYZ

If you're extra conservative you may consider the characters JUVW a bit troublesome as well and remove them. Different fonts may affect your choice of 'bad' characters. With the reduced character set, the number of 'number plate' keys reduces to 10 million and the 9-digit ones reduce to about 0.4 billion, but this might be quite acceptable for moderate usage scenarios.

Addendum: Check Digits

In cases where generated keys are under your control, as in the cases of the 'Jump' Numbers and Artificial keys discussed above, you may consider using a check digit scheme to allow basic error detection for your keys.

The Wikipedia article lists many schemes, and there many other specialised ones like ISO 6346 which is used to identify shipping containers. You might even like to make up your own, perhaps checking that each numeric key is within ±3 of a prime number (I made that up). The Verhoeff check digit algorithm is technically very interesting and effective.

No comments:

Post a Comment