The code for this post is on GitHub: getting-started-roslyn
Under the hood
Making a programming language actually useful is not simply about designing it well. It is also about providing supporting tools around the language: compilers, obviously, but also editors, build systems, etc.
Few languages give you tools to play under the hood. The Language Server Protocol is one example: it lets you reuse parts of a compiler to get errors or find the position of a definition. Roslyn is another. Microsoft describes the idea behind it as “compiler as a service” or, more recently, a “platform”. Ok, what the hell does it mean?
Introduction to Roslyn
Using Roslyn you can access the inner workings of the compiler and use all its knowledge to create tools to boost your productivity or simplify your life. For instance, you could finally force everybody to respect the coding style of your project or extend the functionality of the IDE. A common example is to check the correctness of your Regex, while you are writing it, eliminating the need to run the program to check it.
It is available on Windows, Linux and Mac, and it works on .NET Core.
What we are going to do
In this post we are going to make sure that every int variable is initialized, and if it is already initialized, we make sure it is initialized to the value 42. It’s a simple example, but it will touch the three main areas of interest:
- syntax analysis
- semantic analysis
- syntax transformation
Believe it or not, it will even be easy to understand!
Setup
We will create this example on Linux, using Visual Studio Code as an editor, but of course you could use whatever editor you want. Just make sure you install a recent version of .NET Core. Once you have done this, create a new project and open the file project.json. We have two things to do: add the dependencies needed for Roslyn and apply a workaround for a bug; the fix is simply to add the value “portable-net45+win8+wp8+wpa81” to imports. After our edits we can restore the packages to check that everything works (i.e., that the bug is fixed).
{
  "version": "1.0.0-*",
  "buildOptions": {
    "debugType": "portable",
    "emitEntryPoint": true
  },
  "dependencies": {
    "Microsoft.Net.Compilers": "1.3.2",
    "Microsoft.CodeAnalysis": "1.3.2"
  },
  "frameworks": {
    "netcoreapp1.0": {
      "dependencies": {
        "Microsoft.NETCore.App": {
          "type": "platform",
          "version": "1.0.1"
        }
      },
      "imports": ["dnxcore50", "portable-net45+win8+wp8+wpa81"]
    }
  }
}
The Main method
Let’s take a look at our Program.cs. We skip CreateTestCompilation for now. The only thing to notice is that if you just wanted to look at the SyntaxTree you wouldn’t need to compile anything: you could build it with something as simple as CSharpSyntaxTree.ParseText("Text to parse").
using System;
using System.Collections.Generic;
using System.IO; // needed for Directory, File and Path
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

namespace ConsoleApplication
{
    public class Program
    {
        public static void Main(string[] args)
        {
            Compilation test = CreateTestCompilation();

            foreach (SyntaxTree sourceTree in test.SyntaxTrees)
            {
                // creation of the semantic model
                SemanticModel model = test.GetSemanticModel(sourceTree);

                // initialization of our rewriter class
                InitializerRewriter rewriter = new InitializerRewriter(model);

                // analysis of the tree
                SyntaxNode newSource = rewriter.Visit(sourceTree.GetRoot());

                if (!Directory.Exists(@"../new_src"))
                    Directory.CreateDirectory(@"../new_src");

                // if we changed the tree we save a new file
                if (newSource != sourceTree.GetRoot())
                {
                    File.WriteAllText(
                        Path.Combine(@"../new_src", Path.GetFileName(sourceTree.FilePath)),
                        newSource.ToFullString());
                }
            }
        }

        private static Compilation CreateTestCompilation()
        { /* [...] see later */ }
    }
}
We loop through the source trees, that is, the source files, and get the semantic model for each of them. This is needed to check the meaning of the code we are seeing.
In our example we have to be sure to initialize only integer variables and not, say, a string. Next, we are giving the semantic model to our InitializerRewriter and then we visit every node of the tree. InitializerRewriter is a kind of walker of the tree that can be used to modify the tree. More precisely, you can’t modify the original tree, but you can create a new one that is identical save for the nodes you have changed. In the end, we check if we have modified the original source and if that’s true we create a new source file. In real life you would rewrite the original one, but to ease tinkering we are creating a new one.
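As noted earlier, a purely syntactic look at the code does not require a compilation. The following is a minimal sketch of that idea, not part of the post's project: the class ParseOnlyExample and the method DeclaredVariableNames are hypothetical names introduced for illustration. It parses a string and lists the variables declared in it.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper, introduced only for this sketch.
public static class ParseOnlyExample
{
    // Returns the names of all variables declared in the given source text,
    // in the order in which they appear.
    public static string[] DeclaredVariableNames(string source)
    {
        // parsing only: no references, no compilation, no semantic model
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);

        return tree.GetRoot()
            .DescendantNodes()
            .OfType<VariableDeclaratorSyntax>()
            .Select(v => v.Identifier.ValueText)
            .ToArray();
    }
}
```

Calling DeclaredVariableNames on "class C { void M() { int one, two; string three; } }" should give back one, two and three. Note that at this level we know the names and the text of the types, but not what they mean: for that we need the semantic model.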
Programmatic compilation
I.e., where we show how you can give orders to your compiler.
private static Compilation CreateTestCompilation()
{
    // creation of the syntax tree for every file
    String programPath = @"Program.cs";
    String programText = File.ReadAllText(programPath);
    SyntaxTree programTree = CSharpSyntaxTree.ParseText(programText)
                                             .WithFilePath(programPath);

    String rewriterPath = @"InitializerRewriter.cs";
    String rewriterText = File.ReadAllText(rewriterPath);
    SyntaxTree rewriterTree = CSharpSyntaxTree.ParseText(rewriterText)
                                              .WithFilePath(rewriterPath);

    SyntaxTree[] sourceTrees = { programTree, rewriterTree };

    // gathering the assemblies
    MetadataReference mscorlib =
        MetadataReference.CreateFromFile(typeof(object).GetTypeInfo().Assembly.Location);
    MetadataReference codeAnalysis =
        MetadataReference.CreateFromFile(typeof(SyntaxTree).GetTypeInfo().Assembly.Location);
    MetadataReference csharpCodeAnalysis =
        MetadataReference.CreateFromFile(typeof(CSharpSyntaxTree).GetTypeInfo().Assembly.Location);

    MetadataReference[] references = { mscorlib, codeAnalysis, csharpCodeAnalysis };

    // compilation
    return CSharpCompilation.Create("ConsoleApplication",
                                    sourceTrees,
                                    references,
                                    new CSharpCompilationOptions(OutputKind.ConsoleApplication));
}
CreateTestCompilation is fairly easy to understand: we need to compile the source files programmatically, and so we have to parse the text, gather the references to the assemblies needed for our program, and then give the order to compile.
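Once you hold a Compilation object you can also ask it what it thinks of the code. The sketch below is an illustration under assumptions, not code from the post's project: DiagnosticsExample, CompileText and ErrorMessages are hypothetical names, and the compilation references only the assembly that contains object, which is enough for this tiny example but not for a real program.

```csharp
using System.Linq;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper, introduced only for this sketch.
public static class DiagnosticsExample
{
    // Builds an in-memory compilation from a single source string.
    public static Compilation CompileText(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);

        // assumption: the core library alone is enough for the code we pass in
        MetadataReference mscorlib =
            MetadataReference.CreateFromFile(typeof(object).GetTypeInfo().Assembly.Location);

        return CSharpCompilation.Create(
            "Sketch",
            new[] { tree },
            new[] { mscorlib },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
    }

    // Returns the text of every diagnostic with error severity.
    public static string[] ErrorMessages(Compilation compilation)
    {
        return compilation.GetDiagnostics()
            .Where(d => d.Severity == DiagnosticSeverity.Error)
            .Select(d => d.ToString())
            .ToArray();
    }
}
```

For instance, ErrorMessages(CompileText("class C { int x = \"text\"; }")) should contain at least one entry, because a string cannot be assigned to an int. A check like this is a cheap way to confirm that the references you gathered are sufficient before relying on the semantic model.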
Let’s initialize everything to 42
Because you know, why not?
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

namespace ConsoleApplication
{
    public class InitializerRewriter : CSharpSyntaxRewriter
    {
        private readonly SemanticModel SemanticModel;

        public InitializerRewriter(SemanticModel semanticModel)
        {
            this.SemanticModel = semanticModel;
        }

        public override SyntaxNode VisitVariableDeclaration(VariableDeclarationSyntax node)
        {
            // determination of the type of the variable(s)
            var typeSymbol = (ITypeSymbol)this.SemanticModel.GetSymbolInfo(node.Type).Symbol;

            bool changed = false;

            // you could declare more than one variable with one expression
            SeparatedSyntaxList<VariableDeclaratorSyntax> vs = node.Variables;

            // we create a space to improve readability
            SyntaxTrivia space = SyntaxFactory.SyntaxTrivia(SyntaxKind.WhitespaceTrivia, " ");

            for (var i = 0; i < node.Variables.Count; i++)
            {
                // there is not an initialization
                if (this.SemanticModel.GetSymbolInfo(node.Type).Symbol.ToString() == "int"
                    && node.Variables[i].Initializer == null)
                {
                    // we create a new expression "42"
                    // preceded by the space we created earlier
                    ExpressionSyntax es = SyntaxFactory.ParseExpression("42")
                                                       .WithLeadingTrivia(space);

                    // basically we create an assignment to the expression we just created
                    EqualsValueClauseSyntax evc = SyntaxFactory.EqualsValueClause(es)
                                                               .WithLeadingTrivia(space);

                    // we replace the null initializer with ours
                    vs = vs.Replace(vs.ElementAt(i), vs.ElementAt(i).WithInitializer(evc));

                    changed = true;
                }

                // there is an initialization but it's not to 42
                if (this.SemanticModel.GetSymbolInfo(node.Type).Symbol.ToString() == "int"
                    && node.Variables[i].Initializer != null
                    && !node.Variables[i].Initializer.Value.IsEquivalentTo(SyntaxFactory.ParseExpression("42")))
                {
                    ExpressionSyntax es = SyntaxFactory.ParseExpression("42")
                                                       .WithLeadingTrivia(space);

                    EqualsValueClauseSyntax evc = SyntaxFactory.EqualsValueClause(es);

                    vs = vs.Replace(vs.ElementAt(i), vs.ElementAt(i).WithInitializer(evc));

                    changed = true;
                }
            }

            if (changed == true)
            {
                // nodes are immutable: we return a new declaration with the new variables
                return node.WithVariables(vs);
            }

            return base.VisitVariableDeclaration(node);
        }
    }
}
InitializerRewriter is an implementation of the abstract class CSharpSyntaxRewriter, which is used when you want to modify the tree, while CSharpSyntaxWalker is chosen when you just want to walk through it. VisitVariableDeclaration is one of many methods that you can override, specifically the one that is invoked whenever the walker hits a VariableDeclarationSyntax node. Of course you can also override the generic Visit to get access to all nodes. SyntaxTrivia covers all the things that are useful to humans and not to the compiler, such as whitespace or comments.
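For comparison, here is what the read-only sibling might look like. This is a minimal sketch and not part of the post's project: DeclarationCounter is a hypothetical class that derives from CSharpSyntaxWalker and simply counts the declared variables without touching the tree.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical walker, introduced only for this sketch.
public class DeclarationCounter : CSharpSyntaxWalker
{
    // Number of variables seen so far (a declaration can contain several).
    public int Count { get; private set; }

    public override void VisitVariableDeclaration(VariableDeclarationSyntax node)
    {
        Count += node.Variables.Count;

        // keep walking, so that nested declarations (e.g. inside lambdas) are found too
        base.VisitVariableDeclaration(node);
    }
}
```

You would use it the same way as the rewriter: create an instance and call Visit on the root of a tree, for example counter.Visit(sourceTree.GetRoot()), then read Count. The difference is that the visit methods of a walker return nothing, while those of a rewriter return the (possibly new) node.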
The first thing to notice in InitializerRewriter is the first condition of the first if: it checks whether the type of the node that we are visiting is an int. Since we are looking at the Symbol of the model, the condition will be true even if the declaration is in the form “var a = 0”; that is to say, we are not merely checking the syntax, but the semantic value. If the second condition is true, that is to say there isn’t an initializer, we create one and we set the value to 42. The second if checks whether there is an int variable that is initialized, but not to 42. In that case we change the initialization to 42; again, technically we create a new one.
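The difference between what is written and what is meant can be seen in isolation. The sketch below is an illustration under assumptions, not code from the post's project: VarTypeExample and WrittenAndResolvedType are hypothetical names, and the compilation references only the assembly that contains object. It returns the type as it appears in the source next to the type the semantic model resolves for it.

```csharp
using System.Linq;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper, introduced only for this sketch.
public static class VarTypeExample
{
    // For the first variable declaration in the source, returns
    // { type as written in the code, type as resolved by the compiler }.
    public static string[] WrittenAndResolvedType(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);

        // assumption: the core library alone is enough to resolve the types used
        MetadataReference mscorlib =
            MetadataReference.CreateFromFile(typeof(object).GetTypeInfo().Assembly.Location);
        Compilation compilation = CSharpCompilation.Create(
            "Sketch", new[] { tree }, new[] { mscorlib });

        SemanticModel model = compilation.GetSemanticModel(tree);

        VariableDeclarationSyntax declaration = tree.GetRoot()
            .DescendantNodes()
            .OfType<VariableDeclarationSyntax>()
            .First();

        // same call used by InitializerRewriter
        ISymbol symbol = model.GetSymbolInfo(declaration.Type).Symbol;

        return new[]
        {
            declaration.Type.ToString(),          // syntax: the text in the file
            symbol?.ToString() ?? "(unresolved)"  // semantics: what the text means
        };
    }
}
```

For "class C { void M() { var i = 0; } }" the first element should be var and the second int, which is exactly why the rewriter also touches declarations that never mention int. The null check on the symbol is a precaution the rewriter in this post does not take: if a type cannot be resolved, Symbol is null.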
Conclusion
The practical steps to create an initializer are three:
- create a new value, in our case a “42” with a leading space
- create a new assignment with that value
- use the assignment to replace the original initializer
We can’t create the expression directly; we have to use the factory. These steps are intuitive if you have experience with compilers: first you create a value, then an expression. But if you don’t, they may seem superfluous: why can’t you just assign 42 to the initializer?
If you want to access the power of the compiler you have to understand how it thinks, how it has to manage every line of code you write. For a compiler there are always many possibilities to consider and you have to help it narrow them down. For instance, you may want to assign not a simple value, but another variable. If you understand this, three lines aren’t too much to ask to access such power.
You also have to remember that you can’t modify anything in the original tree. We create a new VariableDeclarationSyntax node with new variables, with the help of the WithVariables method.
You can now go back to Program.cs and add a simple variable declaration such as int one, two; or string three and see the new source files in the new_src folder. If you run the program, you will notice that it also changes var i = 0 into var i = 42, proving that it checks the results of the compilation and not merely the syntax, and that compilation may not always do what you expect it to do.
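The three steps and the immutability of the tree can be tried out on a single statement, without the rest of the program. This is a minimal sketch, not part of the post's project: ImmutableTreeExample and AddInitializer are hypothetical names, and the method assumes it receives a local declaration such as "int one;" whose first variable has no initializer.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper, introduced only for this sketch.
public static class ImmutableTreeExample
{
    // Returns { text of the original declaration, text of the new declaration }.
    public static string[] AddInitializer(string statementText)
    {
        // assumption: the text is a local variable declaration
        var statement = (LocalDeclarationStatementSyntax)SyntaxFactory.ParseStatement(statementText);
        VariableDeclarationSyntax original = statement.Declaration;

        // step 1: a new value, "42" with a leading space
        ExpressionSyntax value = SyntaxFactory.ParseExpression("42")
                                              .WithLeadingTrivia(SyntaxFactory.Space);

        // step 2: a new assignment ("= 42") with that value
        EqualsValueClauseSyntax initializer = SyntaxFactory.EqualsValueClause(value)
                                                           .WithLeadingTrivia(SyntaxFactory.Space);

        // step 3: a new declarator that uses the assignment as its initializer
        VariableDeclaratorSyntax declarator = original.Variables[0];
        VariableDeclarationSyntax updated = original.WithVariables(
            original.Variables.Replace(declarator, declarator.WithInitializer(initializer)));

        // "original" still holds the old text: With... methods return new nodes
        return new[] { original.ToFullString(), updated.ToFullString() };
    }
}
```

With "int one;" as input, the first element should still read int one while the second reads int one = 42. Nothing was modified in place: every With... call produced a new node and left the old one as it was.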
Enjoy playing with Roslyn!
After many posts from Federico Tomassetti, this one is brought to you by Gabriele Tomassetti. Because programming is a family business.