After an hour of searching and fiddling around, I came to the conclusion that none of the System.IO classes correctly report a file's encoding, which means you can't simply and automatically round-trip a file's encoding when it's processed as a text file. Other comments on the web seem to support this, but if anyone knows otherwise, please let me know.
I reluctantly wrote the following code to detect a file's text encoding.
private static Encoding CalcEncoding(string filename)
{
    // Collect every encoding that declares a preamble (BOM), longest
    // preamble first so a longer BOM (e.g. UTF-32 LE: FF FE 00 00) isn't
    // shadowed by an encoding whose BOM is a prefix of it (UTF-16 LE: FF FE).
    var prencs = Encoding.GetEncodings()
        .Select(e => e.GetEncoding())
        .Select(e => new { Enc = e, Pre = e.GetPreamble() })
        .Where(e => e.Pre.Length > 0)
        .OrderByDescending(e => e.Pre.Length)
        .ToArray();
    using (var reader = File.OpenRead(filename))
    {
        var lead = new byte[prencs.Max(p => p.Pre.Length)];
        // Read may return fewer bytes than requested for a short file,
        // so remember how many we actually got.
        int got = reader.Read(lead, 0, lead.Length);
        var match = prencs.FirstOrDefault(p =>
            got >= p.Pre.Length &&
            p.Pre.SequenceEqual(lead.Take(p.Pre.Length)));
        return match == null ? null : match.Enc;
    }
}
This method 'sniffs' the file and returns the first encoding whose preamble bytes match the start of the file. It's clumsy to have to do this. If you get null back you have to choose a suitable default encoding, and new UTF8Encoding(false) is a good choice on Windows, where UTF-8 without a BOM is the default for most text file processing.
Once you have the original encoding (or a suitable default), pass it into the StreamWriter's constructor and you can be sure that the original encoding and BOM will be preserved.
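To make the round trip concrete, here is a minimal, self-contained sketch. The file path and sample text are invented for the demo, and CalcEncoding is repeated inside the snippet (made public, with the longest-preamble-first ordering) purely so it compiles on its own:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class RoundTripDemo
{
    // Same sniffing approach as above, repeated so this snippet is standalone.
    public static Encoding CalcEncoding(string filename)
    {
        var prencs = Encoding.GetEncodings()
            .Select(e => e.GetEncoding())
            .Select(e => new { Enc = e, Pre = e.GetPreamble() })
            .Where(e => e.Pre.Length > 0)
            .OrderByDescending(e => e.Pre.Length) // longest BOM wins
            .ToArray();
        using (var reader = File.OpenRead(filename))
        {
            var lead = new byte[prencs.Max(p => p.Pre.Length)];
            int got = reader.Read(lead, 0, lead.Length);
            var match = prencs.FirstOrDefault(p =>
                got >= p.Pre.Length &&
                p.Pre.SequenceEqual(lead.Take(p.Pre.Length)));
            return match == null ? null : match.Enc;
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();

        // Simulate an existing file: UTF-16 LE with a BOM (FF FE).
        File.WriteAllText(path, "hello", Encoding.Unicode);

        // Detect the encoding, falling back to BOM-less UTF-8 when no
        // preamble is found.
        var enc = CalcEncoding(path) ?? new UTF8Encoding(false);
        Console.WriteLine(enc.WebName); // utf-16 for this file

        // Rewrite with the detected encoding: StreamWriter emits the same
        // BOM, so the original encoding is preserved.
        using (var writer = new StreamWriter(path, false, enc))
            writer.Write("hello again");

        var lead = File.ReadAllBytes(path);
        Console.WriteLine(lead[0] == 0xFF && lead[1] == 0xFE); // True

        File.Delete(path);
    }
}
```

Note that BOM sniffing can only ever identify encodings that write a preamble; a UTF-8 file saved without a BOM is indistinguishable from ANSI text by this test, which is exactly why the fallback matters.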