Monday, November 29, 2021

C# Compiler Source Generators

T4 History

I've been a fan of the T4 Text Template facility since I stumbled upon it about 15 years ago. A .tt file in a project with its custom tool set to TextTemplatingFileGenerator blends smoothly into the project, the editing experience and the build process. There are plenty of good articles around on how to use T4 Text Templates.

Writing a .tt file can be a delicate process at first, but once you get the hang of coding the <# #> (control block) and <#+ #> (class feature block) pairs, you will soon see that vast amounts of repetitive code can be generated reliably and with ease. I have used T4 successfully in many large projects.

There are no serious disadvantages to using T4 as far as I'm concerned. You may want to find a Visual Studio extension for .tt file syntax colouring; otherwise the editor shows the contents as plain text. One .tt file generates one output file by default, but if you want to generate multiple split files (like the old Entity Framework templates), look for one of the utility classes people have published to allow that.

Roslyn Source Generators


In mid-2021 I was very interested to hear that source generation had become a Roslyn compiler feature. In a nutshell, you write a class marked with the [Generator] attribute that acts like a special compiler analyzer, reference it from a project as an analyzer, and the compiler invokes it after the syntax trees have been built. The generator can inspect the whole compilation and use it to generate new source code, which is added to the compilation. For tutorials see:
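For orientation, the basic shape of such a class can be sketched as follows. This is a minimal sketch, not taken from any particular tutorial, assuming the ISourceGenerator interface that was current in 2021 and a generator project that targets netstandard2.0 and references the Microsoft.CodeAnalysis.CSharp NuGet package. The names HelloGenerator, Generated.BuildInfo and BuildInfo.g.cs are hypothetical.

```csharp
using Microsoft.CodeAnalysis;

// Hypothetical minimal generator. Assumes the generator project targets
// netstandard2.0 and references the Microsoft.CodeAnalysis.CSharp package.
[Generator]
public class HelloGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context)
    {
        // Nothing to register: this generator does not inspect syntax nodes.
    }

    public void Execute(GeneratorExecutionContext context)
    {
        // The generated code is composed as an ordinary string.
        // In a $@"..." string, {{ }} produce literal braces and "" a literal quote.
        string source = $@"namespace Generated
{{
    public static class BuildInfo
    {{
        public const string AssemblyName = ""{context.Compilation.AssemblyName}"";
    }}
}}";

        // The hint name must be unique among the files this generator adds.
        context.AddSource("BuildInfo.g.cs", source);
    }
}
```

Code in the consuming project can then use Generated.BuildInfo.AssemblyName as if it had been written by hand. The doubled braces and quotes give a small taste of the tedious string manipulation involved.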


At the time of writing I've only had time to sanity-check that source generation works as advertised. It does, of course, but I have some serious worries.
  • Creating a source generator is a moderately complicated process compared to T4 templates. You have to create a class that follows strict conventions (it implements ISourceGenerator and carries the [Generator] attribute) and composes the source code as a string, resulting in lots of tedious string manipulation. The consuming project needs a special analyzer reference to use the generator.
  • I can't find any way of debugging a source generator. I've seen samples of rather cryptic code that create a mock compile environment, but it looks like hours of research.
  • The whole process is silent: there is no clue about what is happening inside the source generator. Days later I found that you have to set <EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles> in the consuming project to preserve the generated files, which are written deep under the obj folder. A mistake in the generator will produce typical build errors, but double-clicking an error does not jump to the generated source file.
  • Examining the compiler-generated syntax tree is really tricky (see next section).

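One way to take some of the mystery out of the debugging and silence problems above is to run a generator in-process against an in-memory compilation, which is essentially what those cryptic mock-environment samples do. The sketch below assumes a console app (or unit test project) that references the Microsoft.CodeAnalysis.CSharp NuGet package normally, not as an analyzer; EchoGenerator is a hypothetical stand-in for your own generator. Because everything runs in your own process, breakpoints inside Execute are hit under the normal debugger, and every generated file is printed instead of being hidden under obj.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical stand-in for the generator you want to debug.
[Generator]
public class EchoGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context) { }

    public void Execute(GeneratorExecutionContext context)
    {
        // A breakpoint here is hit when the harness below runs under the debugger.
        int treeCount = context.Compilation.SyntaxTrees.Count();
        context.AddSource("Echo.g.cs",
            $"public static class Echo {{ public const int InputTrees = {treeCount}; }}");
    }
}

public static class Program
{
    public static void Main()
    {
        // An in-memory compilation built from sample source text.
        CSharpCompilation compilation = CSharpCompilation.Create(
            "HarnessAssembly",
            new[] { CSharpSyntaxTree.ParseText("public class Sample { }") },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        // Run the generator in this process, as the compiler would during a build.
        GeneratorDriver driver = CSharpGeneratorDriver.Create(new EchoGenerator());
        driver = driver.RunGeneratorsAndUpdateCompilation(
            compilation, out Compilation updated, out var generatorDiagnostics);

        // Diagnostics from the generator run (an exception thrown by the
        // generator shows up here as a warning).
        foreach (Diagnostic diagnostic in generatorDiagnostics)
            Console.WriteLine(diagnostic);

        // Print every generated file instead of leaving it hidden under obj.
        foreach (SyntaxTree tree in driver.GetRunResult().GeneratedTrees)
        {
            Console.WriteLine("// " + tree.FilePath);
            Console.WriteLine(tree.GetText());
        }

        // Errors and warnings for user code plus generated code combined.
        foreach (Diagnostic diagnostic in updated.GetDiagnostics())
            Console.WriteLine(diagnostic);
    }
}
```

Swapping EchoGenerator for a real generator and the sample string for representative input gives a quick edit-run-debug loop without rebuilding the consuming project.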
OnVisitSyntaxNode

A generator can register a syntax receiver whose OnVisitSyntaxNode method is called for every node in the syntax trees built by the compiler, typically to discover what code generation is required. The problem is ... figuring out what is passed to this method, how to cast it and how to use it is harder than abstract algebra. Dozens of samples perform weird casts and call mysterious methods to get information about classes and members. There are no clues about what information is available to this method or how to retrieve it.

I'm sure the Roslyn syntax tree and its classes are all documented somewhere, but where? I'm actually quite angry about how bewildering it is to try to do something useful in this method. Luckily I found enough bits of sample code to get me going.
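A minimal sketch of the pattern those samples follow, with the casts explained in comments, is below. It assumes the same 2021-era ISourceGenerator API and Microsoft.CodeAnalysis.CSharp package reference as a typical generator project; PartialClassGenerator, PartialClassReceiver and the generated GeneratedFor property are hypothetical. The receiver only collects candidate nodes by pattern-matching each SyntaxNode to a concrete syntax type; the richer information (namespace, containing type, generic parameters) comes from the semantic model in Execute, which is where most of the mysterious method calls live.

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical generator that adds a member to every top-level partial class.
[Generator]
public class PartialClassGenerator : ISourceGenerator
{
    // Collects candidate nodes while the compiler walks every syntax tree.
    private sealed class PartialClassReceiver : ISyntaxReceiver
    {
        public List<ClassDeclarationSyntax> Candidates { get; } = new List<ClassDeclarationSyntax>();

        public void OnVisitSyntaxNode(SyntaxNode syntaxNode)
        {
            // Every node arrives typed as the base SyntaxNode; pattern-match it
            // to the concrete syntax class of interest and check its modifiers.
            if (syntaxNode is ClassDeclarationSyntax classSyntax &&
                classSyntax.Modifiers.Any(SyntaxKind.PartialKeyword))
            {
                Candidates.Add(classSyntax);
            }
        }
    }

    public void Initialize(GeneratorInitializationContext context)
    {
        // The factory creates a fresh receiver for each generation pass.
        context.RegisterForSyntaxNotifications(() => new PartialClassReceiver());
    }

    public void Execute(GeneratorExecutionContext context)
    {
        if (!(context.SyntaxReceiver is PartialClassReceiver receiver))
            return;

        // A partial class split across files is visited once per declaration.
        var seen = new HashSet<ISymbol>(SymbolEqualityComparer.Default);

        foreach (ClassDeclarationSyntax classSyntax in receiver.Candidates)
        {
            // Syntax only describes text; the semantic model maps it to a symbol.
            SemanticModel model = context.Compilation.GetSemanticModel(classSyntax.SyntaxTree);
            if (!(model.GetDeclaredSymbol(classSyntax) is INamedTypeSymbol symbol))
                continue;

            // Keep the sketch simple: skip nested and generic classes, and duplicates.
            if (symbol.ContainingType != null || symbol.IsGenericType || !seen.Add(symbol))
                continue;

            string body = $"partial class {symbol.Name} {{ public static string GeneratedFor => \"{symbol.Name}\"; }}";
            string source = symbol.ContainingNamespace.IsGlobalNamespace
                ? body
                : $"namespace {symbol.ContainingNamespace.ToDisplayString()} {{ {body} }}";

            // ToDisplayString() gives e.g. "N.Foo", keeping hint names unique.
            context.AddSource($"{symbol.ToDisplayString()}.g.cs", source);
        }
    }
}
```

Nested or generic classes are skipped only to keep the sketch short; handling them would mean reproducing the containing types and type parameters in the generated partial declaration.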

This article suggests you install the .NET Compiler Platform SDK, which adds a Syntax Visualizer window to Visual Studio (I just did, and it works). I also remembered that LINQPad has a similar visualiser feature. These tools may help you understand the syntax tree and write better generators.

Summary


The source generation feature is cleverly implemented at just the right level in the build process, but it's just so delicate to code, deploy and consume correctly. One little mistake and it all goes to hell.

If you are generating "code from your own code", then source generation is the most appropriate technique. The classic example is generating INotifyPropertyChanged members, as discussed in one of the links above.

If you are generating code (or anything else) from arbitrary rules you have invented, then T4 Text Templates are the best choice. For example, I often create an XML file containing a mixture of rules about what classes, interfaces, stubs, etc. are required in the projects; T4 templates then read that XML file and generate whatever is needed in different parts of the app. You aren't limited to generating C# code either; I've previously used T4 to generate test data and XAML files for a WPF desktop app.

Monday, November 22, 2021

Count Lines of Code

If you have written any utility which counts lines of code, then throw it away because I accidentally discovered someone has done the job ... properly! I've deleted my old scripts because they were really tricky to write and the numbers they produced could only be regarded as approximations. Parsing arbitrary file contents accurately is inherently difficult and I'm glad someone with more patience, expertise and time has done the job for me.

See ▸ Count Lines of Code on GitHub

Just download cloc.exe and put it in a folder with your other favourite utilities.

There are dozens of command line switches to control the behaviour. I originally excluded files with 'Generated' in their names, and other patterns for files that are not human-authored, but that turned out to be unnecessary because cloc can somehow tell different types of generated files apart. For example, it conveniently produces separate result lines for 'C# Generated' and 'C# Designer'. I'm not sure if it's looking at the file names or at special contents of the files, but I'll assume it knows what it's doing.

It's simple to use and it produces neat, concise and probably accurate statistics.