Friday, April 24, 2015

XmlReader/Writer and XElement

When reading or writing large amounts of XML you can use the XmlReader and XmlWriter classes to stream the data and avoid holding it all in memory. These classes are unfortunately quite low-level and the code that uses them can get quite cluttered. The XmlReader can be particularly tricky to use, as you must inspect the names and types of the nodes as they are sequentially consumed by the reader. You may need to remember your depth and position in the incoming XML, which can result in verbose and fragile code.
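To illustrate the bookkeeping described above, here is a minimal sketch of reading with XmlReader alone. The inline XML, the class name and the field names (id, name, hired) are assumptions chosen for illustration; a real file would be opened with XmlReader.Create on a path or stream.

```csharp
using System;
using System.IO;
using System.Xml;

class RawReaderSketch
{
    static void Main()
    {
        // Tiny stand-in for a large file (illustrative sample data).
        const string xml =
            "<export><rows>" +
            "<row id=\"1\"><name>Fred Smith</name><hired>1998</hired></row>" +
            "<row id=\"2\"><name>Jane Jones</name></row>" +
            "</rows></export>";

        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            string current = null;              // name of the element we are inside
            string id = null, name = null, hired = null;

            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        current = reader.Name;
                        if (current == "row")
                        {
                            // New row: capture the attribute and reset per-row state.
                            id = reader.GetAttribute("id");
                            name = null;
                            hired = null;
                        }
                        break;

                    case XmlNodeType.Text:
                        // Text only makes sense relative to the remembered element.
                        if (current == "name") name = reader.Value;
                        else if (current == "hired") hired = reader.Value;
                        break;

                    case XmlNodeType.EndElement:
                        // Note: a self-closing <row/> raises no EndElement and would be missed here.
                        if (reader.Name == "row")
                            Console.WriteLine("id {0} name {1} hired {2}", id, name, hired);
                        current = null;
                        break;
                }
            }
        }
    }
}
```

Even for two child elements the loop has to track which element it is inside and reset state for every row, which is the kind of clutter that grows quickly with deeper nesting.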

A great way to produce shorter and more readable code for XML stream reading is to combine the XmlReader class with the XElement class that was introduced in .NET Framework 3.5. Use XmlReader to process the "outer" elements, then use XElement to read "inner" chunks of XML so they can be conveniently processed using LINQ to XML. Say you had incoming XML like this:

<export>
  <rows>
    <row id="1">
      <name>Fred Smith</name>
      <hired>1998</hired>
    </row>
    :
    : huge numbers of row elements
    :
  </rows>
</export>

Use an XmlReader to sequentially consume the XML, then when you hit the <rows> element, loop over the child <row> elements and pull them into an XElement for processing.

using System;
using System.Xml;
using System.Xml.Linq;

var settings = new XmlReaderSettings() { IgnoreWhitespace = true };
using (var reader = XmlReader.Create("export.xml", settings))
{
  reader.ReadToFollowing("rows"); // skip ahead to the <rows> start tag
  reader.Read(); // forward to the first row
  while (reader.NodeType == XmlNodeType.Element && reader.Name == "row")
  {
    // ReadFrom consumes the whole <row> element and leaves the reader on the node after it
    var elem = (XElement)XNode.ReadFrom(reader);
    int id = (int)elem.Attribute("id");
    string name = (string)elem.Element("name");
    int? hired = (int?)elem.Element("hired"); // null when <hired> is absent
    Console.WriteLine("We would now import id {0} name {1} hired {2}", id, name, hired);
  }
  reader.ReadEndElement(); // eat the </rows>
}

A real example may contain many more nested elements, but the principle is the same: smaller inner chunks of XML are read into an XElement for convenient processing.
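The same idea can plausibly be applied in the writing direction: let XmlWriter stream the "outer" elements and build each "inner" chunk as an XElement, then hand it to the writer with XElement.WriteTo. This is a minimal sketch, not taken from the original example; the class name, the generated row data and the StringBuilder target are assumptions for illustration, and a real export would normally pass a file path or stream to XmlWriter.Create.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

class StreamingWriteSketch
{
    static void Main()
    {
        var settings = new XmlWriterSettings { Indent = true };
        var output = new StringBuilder(); // stand-in for a file or stream

        using (var writer = XmlWriter.Create(output, settings))
        {
            // The outer structure is streamed directly by the writer.
            writer.WriteStartElement("export");
            writer.WriteStartElement("rows");

            for (int id = 1; id <= 3; id++)
            {
                // Each inner chunk is built with LINQ to XML (made-up data)...
                var row = new XElement("row",
                    new XAttribute("id", id),
                    new XElement("name", "Person " + id),
                    new XElement("hired", 1997 + id));

                // ...and written out immediately, so only one row is held in memory.
                row.WriteTo(writer);
            }

            writer.WriteEndElement(); // </rows>
            writer.WriteEndElement(); // </export>
        }

        Console.WriteLine(output);
    }
}
```

Only one row element exists as an object at a time, so memory use stays flat however many rows are written.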
