Tuesday, January 28, 2020

Parsing a VDPROJ file

While writing a small utility to scan and summarise Visual Studio project files of type vcproj and vdproj, I was reminded that a vdproj file is not XML; it uses a custom format that looks a little bit like JSON. A quick search revealed no standard library for parsing vdproj files, only lots of ugly workarounds. It turns out you can convert a vdproj file into an XElement with about 50 lines of quite straightforward code, which I have pasted into a static method below.

There is a bit of a hack to deal with "node" names, which might be either simple strings or an unpredictable pair of values separated by a colon. A simple name is used directly as the element name, but without knowing the rules for the other kind, those names are simply turned into nodes that look like this sample:
<node data="BootstrapperCfg:{63ACBE69-63AA-4F98-B2B6-99F9E24495F2}">
Ignoring that glitch though, you finish up with a neat XML representation of the whole vdproj file and you can use LINQ-to-XML to query it.

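// Namespaces the method relies on (place these at the top of the file)
using System.Collections.Generic;      // Stack<T>
using System.IO;                       // StreamReader
using System.Linq;                     // LINQ extension methods
using System.Text.RegularExpressions;  // Regex, Match
using System.Xml.Linq;                 // XElement, XAttribute

public static XElement ParseVDProj(string vdprojFilename)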
{
  using (var reader = new StreamReader(vdprojFilename))
  {
    var stack = new Stack<XElement>();
    var root = new XElement("node", new XAttribute("name", "root"));
    stack.Push(root);
    XElement head = root;
    string line = reader.ReadLine();
    while (line != null)
    {
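      if (string.IsNullOrWhiteSpace(line))
      {
        // Skip blank lines so they are not mistaken for node names
      }
      else if (Regex.IsMatch(line, @"^\s*}"))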
      {
        // A close brace pops the stack back a level
        stack.Pop();
        head = stack.Peek();
      }
      else
      {
        Match m = Regex.Match(line, @"\x22(\w+)\x22 = \x22(.+)\x22");
        if (m.Success)
        {
          // A key = value is added to the current stack head node
          string name = m.Groups[1].Value;
          string val = m.Groups[2].Value;
          var elem = new XElement(name, val);
          head.Add(elem);
        }
        else
        {
          // Otherwise we must be pushing a new head node onto the stack.
          // HACK: If the name is a simple alphanum string then it's used
          // as the node name, otherwise use a fake <node> with the strange
          // name as a data attribute.
          XElement elem = null;
          string rawname = Regex.Match(line, @"^\s*\x22(.+)\x22\s*$").Groups[1].Value;
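          // The first character must not be a digit, because an XML
          // element name cannot start with one.
          if (Regex.IsMatch(rawname, @"^[^\W\d]\w*$"))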
          {
            elem = new XElement(rawname);
          }
          else
          {
            elem = new XElement("node", new XAttribute("data", rawname));
          }
          head.Add(elem);
          stack.Push(elem);
          head = elem;
          reader.ReadLine();  // Eat the opening brace
        }
      }
      line = reader.ReadLine();
    }
    return root;
  }
}
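
To give an idea of the LINQ-to-XML querying mentioned above, here is a minimal sketch. It does not read a real file: BuildSampleTree is a hypothetical stand-in that builds a small tree of the shape ParseVDProj returns, and the element names and values in it (DeployProject, Configurations, OutputFilename and so on) are invented for illustration. The class and helper names are assumptions too. The second query shows one way to live with the fallback nodes, by splitting the data attribute at its first colon.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class VdprojQueryDemo
{
    // Hypothetical stand-in for the tree ParseVDProj returns.
    // All element names and values here are invented for illustration.
    public static XElement BuildSampleTree()
    {
        return new XElement("node", new XAttribute("name", "root"),
            new XElement("DeployProject",
                new XElement("ProjectName", "Setup1"),
                new XElement("Configurations",
                    new XElement("Debug",
                        new XElement("OutputFilename", @"Debug\Setup1.msi")),
                    new XElement("Release",
                        new XElement("OutputFilename", @"Release\Setup1.msi"))),
                new XElement("node",
                    new XAttribute("data", "BootstrapperCfg:{63ACBE69-63AA-4F98-B2B6-99F9E24495F2}"),
                    new XElement("Enabled", "TRUE"))));
    }

    // Every OutputFilename value, labelled with the name of its parent element.
    public static List<string> OutputFilenames(XElement root)
    {
        return root.Descendants("OutputFilename")
                   .Select(e => e.Parent.Name.LocalName + " -> " + e.Value)
                   .ToList();
    }

    // The text before the first colon of each fallback <node data="..."> element.
    // The root element has no data attribute and is not its own descendant.
    public static List<string> FallbackPrefixes(XElement root)
    {
        return root.Descendants("node")
                   .Select(n => (string)n.Attribute("data"))
                   .Where(d => d != null)
                   .Select(d => d.Split(new[] { ':' }, 2)[0])
                   .ToList();
    }

    public static void Main()
    {
        XElement root = BuildSampleTree();
        foreach (string s in OutputFilenames(root)) Console.WriteLine(s);
        foreach (string p in FallbackPrefixes(root)) Console.WriteLine(p);
    }
}
```

With a real file you would pass the result of ParseVDProj to the two query methods instead of the sample tree.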