Wednesday, February 24, 2021

Generating Excel documents in C#

From a .NET project I needed to generate an Excel document containing a single sheet containing titles, numbers and dates with simple but pleasant formatting. I found there are two ways of doing this without using 3rd party products.

Open XML

This is the "low level" option. Add NuGet package DocumentFormat.OpenXml to your project and following the instructions in the Open XML SDK documentation pages.

The advantage of this option is that you can create and manipulate modern Office documents without any external dependencies.

The disadvantage of this option is the staggering complexity of manipulating document elements at the lowest level. It's like writing Office assembly language.

I tried to create an Excel document with simple formatting and I almost died of exhaustion from the effort. I was going reasonably well until I had to create styles, and all of the examples I could find on how to build sets of styles were so verbose, fragile and complicated that I threw in the towel.

Using Open XML is an option for specialists who need to manipulate Office documents without dependencies and have the staff or time to research how to do it properly. Web searches hint that there are so many quirks and "gotchas" that there is danger of madness.

COM Interop


April 2023 Note -- In the following section I describe how to create a new sheet and format and fill cells. It turned out in my particular case that it was much simpler to use Excel to manually format a skeleton document in exactly the way I desired, setting cell formats, columns widths, fonts, etc, then load the document and simply put values into the cells. This technique is highly recommended if you have sheets with a regular and structured layout, such as a report with columns. For other more complex cases though, you'll have to fall back to verbose code for cell formatting.

After abandoning the Open XML option I attempted to use classic COM Interop to manipulate an Excel document. I used COM to read Excel sheets about 10 years ago and I remembered that it was only moderately tricky.

Add a project reference to the Microsoft.Office.Interop.Excel PIA to get access to the COM interfaces and classes for working with Excel documents.

This option is certainly much easier than using Open XML, but there's an initially steep learning curve before you "get the feel of it" and become productive. It took me 6 hours of solid work and intense web searching to create a simple but nicely formatted document. Some of the COM interfaces are strongly-typed and you get helpful intellisense in Visual Studio, but many properties are of type object and there is no way of figuring out what to do without running many web searches in the hope that you will find a working sample. Seemingly simple tasks like setting cell borders, background colours or columns widths requires obscure bits of code that would be nearly impossible to discover without the help of samples from other poor people who have cracked the secrets.

📌 Important Note — COM references must be released or copies of Excel will remain running in your user Process. In the following code you will see careful use of ReleaseComObject to release every COM object. Take special note of the symmetry of the startup and shutdown code. For more information see Release Excel COM Objects.

Following are snippets of useful COM code that took me hours to discover.

Create a new customised Excel document

_Application app = new Application();
Workbooks books = app.Workbooks;
_Workbook book = books.Add();
Sheets sheets = book.Sheets;
_Worksheet sheet = book.ActiveSheet as _Worksheet;
sheet.Name = "Product Versions";
sheet.PageSetup.Orientation = XlPageOrientation.xlLandscape;
app.StandardFontSize = 10;

Create a style

Style headstyle = book.Styles.Add("HeadStyle");
headstyle.HorizontalAlignment = XlHAlign.xlHAlignCenter;
headstyle.Font.Bold = true;
headstyle.Interior.Color = System.Drawing.ColorTranslator.ToOle(System.Drawing.Color.PowderBlue);
// --repeat--

Create a row of 4 heading text cells

Range range = sheet.Range["A1", "D1"];
range.Style = headstyle;
range = sheet.Range["A1"];
range.Cells.NumberFormatLocal = "@";   // This means Text
range.ColumnWidth = 24;    // First column is wider
Marshal.ReleaseComObject(range);
sheet.Cells[1,1] = "Heading A1";
sheet.Cells[1,2] = "B1";
// --etc--

Set some cells as dates

range = sheet.Range["B3", "H3"];
range.Style = footstyle;
Marshal.ReleaseComObject(range);
range.Cells.NumberFormat = "dd-MMM-yyyy";
sheet.Cells[3, 2] = UpdatedTime.ToOADate();

Setting cell borders

Setting cell background colours causes the borders to be overwritten. If you want cell borders around background coloured cell then you have to set the borders manually.

range = sheet.Range["C1", "D12"];
range.Cells.Borders.Color = System.Drawing.ColorTranslator.ToOle(System.Drawing.Color.Gainsboro);
range.Cells.Borders.LineStyle = XlLineStyle.xlContinuous;
Marshal.ReleaseComObject(range);

Graceful shutdown

Finally, the workbook is saved to a file and everything is closed and released (even sheets which wasn't used). I prefer not to set app.Visible = true, but simply use Process.Start to "launch" the file afterwards. Start Task Manager or procexp.exe and ensure that an instance of Excel can be seen starting and then ending when your Interop code runs.

string savename = @"C:\temp\Sample.xlsx";
if (File.Exists(savename)) File.Delete(savename);
book.SaveAs(savename);
book.Close();
app.Quit();
Marshal.ReleaseComObject(sheet);
Marshal.ReleaseComObject(sheets);
Marshal.ReleaseComObject(book);
Marshal.ReleaseComObject(books);
Marshal.ReleaseComObject(app);
Process.Start(savename);

By piecing together bits of the above code it should be possible to create an Excel document that has custom formatting for font size, cell background colours, cell content category types, cell borders.

Handling of cells and ranges is really clumsy and I'm wondering if there's a better way than referencing single cells with the Cells[row,col] syntax and ranges with Range["A1","B2"] format. Mixing the formats is quite irritating.

There are many other appearances and techniques I haven't tried, like text colour, merging cells, font styles (italic, bold, etc). It's pretty likely that just about anything that's possible in Excel can be done via COM Interop as well, but finding out how will surely require meticulous reading of the documentation or hours of web searching for working samples.

Be careful of the order of code. I found that some code would be frustratingly ineffective until I carefully reordered a couple of lines. I was often just shuffling code around until it worked.

I haven't figured out exactly how to mimic the formatting of cells as you do in Excel with the Ctrl+1 dialog. The number formats make little sense at the moment.

Good luck. It's the least worst way of making Excel documents without 3rd party libraries.

Sunday, February 21, 2021

SQLite rehabilitated

 About 10 years ago I attempted to use SQLite in some projects, but getting it integrated and working nicely in a .NET project was a fragile and frustrating process. I remember a cavalcade of compile and runtime errors caused by 32/64 bit confusion and making bad choices due to scatter-brained documentation.

As an experiment this weekend I tried to process one of my old SQLite databases in a modern .NET Core project via Entity Framework, and I'm very pleased to report that it was a really smooth process. I now consider SQLite rehabilitated and is a good candidate for anyone who needs an in-process relational database that needs little to no configuration and has no external dependencies.

The SQLite documentation points out that you don't have to use it for relational data, it can be used as a mini file-system for storing arbitrary key-value data. Their simplest example is a table with a string primary key and another BLOB column for anything you like. A decade ago I would have considered using an ESENT database for that sort of thing, but now I consider SQLite to be a far superior choice.

What follows is a reminder to myself about how to process a SQLite database in a .NET Core project.

Add the Nuget package Microsoft.EntityFrameworkCore.Sqlite.

Create the classes for the tables. Here are skeletons of two tables that had a one-many relationship.

public sealed class Media
{
  public long Id { get; set; }
  public string Name { get; set; }
  :
  public List<Volume> Volumes { set; set; }
}
public sealed class Volume
{
  public long Id { get; set; }
  public string Title { get; set; }
  public long MedId { get; set; }
  :
  public Media Media { get; set; }
}

The relationship is one Media to many Volumes. Note also that the foreign key column is named MedId, which caused a confusing runtime error. When I ran a query like this:

var source = context.Volume.Include(v => v.Media);

I got the runtime error column MediaId is not defined, which is quite true, that name is not defined anywhere in my code or the database. It turns out that Entity Framework assumes that foreign keys are named tableId. So had I chosen the name MediaId instead of MedId I wouldn't have had this problem. To work around this I had to override OnModelCreating in the following code that defines the context.

public class OpusContext : DbContext
{
  public DbSet<Media> Media { get; set; }
  public DbSet<Volume> Volume { get; set; }
  protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
  {
    optionsBuilder.UseSqlite(@"Data Source=C:\temp\opus.db3");
  }
  protected override void OnModelCreating(ModelBuilder modelBuilder)
  {
    modelBuilder.Entity<Volume>()
      .HasOne(v => v.Media)
      .WithMany(m => m.Volumes)
      .HasForeignKey(nameof(EFLiteTest.Volume.MedId));
  }
}

This manually defines the relationship and which column is the foreign key.

In summary: by adding the package and defining the row and context classes in the traditional way I had a complete way of processing the SQLite database with all of the strongly-typed convenience provided by Entity Framework.

Note - Types

Remember that SQLite has a lot of quirks compared to other big name RDBs. The weirdest thing is that it's "weakly typed" (in my words) or "flexibly typed" (in their words). You can create a table with columns that all have strictly defined types and lengths as you would in SQL Server, but SQLite ignores the column definitions and will let you put anything of any length into any column at runtime. This is all described on their quirks documentation page. If you're using Entity Framework and not using raw INSERT statements, then this quirk is irrelevant because all data manipulation will be strongly-typed.