Getting started with Roslyn: transforming C# code

The code for this post is on GitHub: getting-started-roslyn

Under the hood

Making a programming language actually useful is not just about designing it well; it is also about providing supporting tools around the language: compilers, obviously, but also editors, build systems, etc.

Few languages give you tools to play under the hood. I am thinking about the Language Server Protocol, for example: it lets you reuse parts of a compiler to get errors or the position of a definition. Roslyn is another example. Microsoft describes the idea behind it as “compiler as a service” or, more recently, a “platform”. Ok, what the hell does it mean?

Introduction to Roslyn

Using Roslyn you can access the inner workings of the compiler and use all its knowledge to create tools to boost your productivity or simplify your life. For instance, you could finally force everybody to respect the coding style of your project or extend the functionality of the IDE. A common example is to check the correctness of your Regex, while you are writing it, eliminating the need to run the program to check it.

It is available on Windows, Linux and Mac, and it works on .NET Core.

What we are going to do

In this post we are going to make sure that every int variable is initialized, and if it is already initialized, we make sure it is initialized to the value 42. It’s a simple example, but it will touch the three main areas of interest:

  1. syntax analysis
  2. semantic analysis
  3. syntax transformation

Believe it or not, it will even be easy to understand!

Setup

We will create this example on Linux, using Visual Studio Code as an editor, but of course you could use whatever editor you want. Just make sure you install a recent version of .NET Core. Once you have done this, create a new project and open the file project.json. We have two things to do: add the dependencies needed for Roslyn and apply a workaround for a bug. The workaround is simply to add the value “portable-net45+win8+wp8+wpa81” to imports. After our edits we can restore the packages to check that everything works (i.e., the bug is fixed).

The Main method

Let’s take a look at our Program.cs. We skip CreateTestCompilation for now. The only thing to notice is that if you just wanted to look at the SyntaxTree you wouldn’t need to compile anything: you could build it with something as simple as CSharpSyntaxTree.ParseText("Text to parse").
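
To make the parse-only route concrete, here is a minimal sketch that builds a SyntaxTree from a string and lists the declared variable names, without compiling anything. The class name ParseOnlyExample and the sample source are made up for illustration, and the sketch assumes the Microsoft.CodeAnalysis.CSharp package is referenced.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper, not part of the post's repository.
public static class ParseOnlyExample
{
    // Parses the text and returns the names of all declared variables.
    // Only syntax is involved: no compilation, no semantic model.
    public static string[] DeclaredVariableNames(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        SyntaxNode root = tree.GetRoot();
        return root.DescendantNodes()
                   .OfType<VariableDeclaratorSyntax>()
                   .Select(declarator => declarator.Identifier.ValueText)
                   .ToArray();
    }

    public static void Main()
    {
        string source = "class C { void M() { int one, two; var i = 0; } }";
        foreach (string name in DeclaredVariableNames(source))
        {
            Console.WriteLine(name);
        }
    }
}
```

At this level the tree can tell you that i is declared with var, but not that it is an int. That is what the semantic model is for.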

We are looping through the source trees (that is, the source files) and getting the semantic model for every one of them. This is needed to check the meaning of the code we are seeing.

In our example we have to be sure to initialize only integer variables and not, say, a string. Next, we are giving the semantic model to our InitializerRewriter and then we visit every node of the tree. InitializerRewriter is a kind of walker of the tree that can be used to modify the tree. More precisely, you can’t modify the original tree, but you can create a new one that is identical save for the nodes you have changed. In the end, we check if we have modified the original source and if that’s true we create a new source file. In real life you would rewrite the original one, but to ease tinkering we are creating a new one.
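
The loop described above could look roughly like the following sketch. It is not the code from the repository: StandInRewriter is a hypothetical placeholder that simply turns every numeric literal into 42 and never consults the semantic model it receives, and RewriteAll, the file name Sample.cs and the assembly name Demo are likewise assumptions made for the example.

```csharp
using System;
using System.IO;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical stand-in for InitializerRewriter: it receives the semantic
// model like the real one would, but never consults it.
public sealed class StandInRewriter : CSharpSyntaxRewriter
{
    public SemanticModel Model { get; }

    public StandInRewriter(SemanticModel model) { Model = model; }

    public override SyntaxNode VisitLiteralExpression(LiteralExpressionSyntax node)
    {
        if (!node.IsKind(SyntaxKind.NumericLiteralExpression) || node.Token.ValueText == "42")
        {
            return node; // returning the same node means "no change here"
        }
        return SyntaxFactory.ParseExpression("42").WithTriviaFrom(node);
    }
}

public static class RewriteLoop
{
    // Visits every tree of the compilation and writes the changed ones
    // to outputDir. Returns the number of files written.
    public static int RewriteAll(Compilation compilation,
                                 Func<SemanticModel, CSharpSyntaxRewriter> makeRewriter,
                                 string outputDir)
    {
        int written = 0;
        foreach (SyntaxTree tree in compilation.SyntaxTrees)
        {
            SemanticModel model = compilation.GetSemanticModel(tree);
            SyntaxNode oldRoot = tree.GetRoot();
            SyntaxNode newRoot = makeRewriter(model).Visit(oldRoot);

            // An untouched tree comes back as the very same object,
            // so a reference comparison tells us whether anything changed.
            if (newRoot != oldRoot)
            {
                Directory.CreateDirectory(outputDir);
                string target = Path.Combine(outputDir, Path.GetFileName(tree.FilePath));
                File.WriteAllText(target, newRoot.ToFullString());
                written++;
            }
        }
        return written;
    }

    public static void Main()
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(
            "class C { int M() { return 0; } }", path: "Sample.cs");
        Compilation compilation = CSharpCompilation.Create("Demo", new[] { tree });
        int written = RewriteAll(compilation, model => new StandInRewriter(model), "new_src");
        Console.WriteLine(written + " file(s) written to new_src");
    }
}
```

Writing to a separate folder keeps the original sources intact, which is handy while you are still experimenting with the rewriter.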

Programmatic compilation

I.e., where we show how you can give orders to your compiler.

CreateTestCompilation is fairly easy to understand: we need to compile the source files programmatically, and so we have to parse the text, gather the references to the assemblies needed for our program, and then give the order to compile.
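
As a rough sketch of those three steps (parse, gather references, compile), something like the following should work on a current .NET SDK. The assembly name, the generated file names and the single reference to the core library are assumptions made for the example; the version in the repository may gather its sources and references differently.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class CompilationExample
{
    public static CSharpCompilation CreateTestCompilation(params string[] sources)
    {
        // 1. Parse every source text into its own syntax tree.
        //    The SourceN.cs names are invented for this example.
        SyntaxTree[] trees = sources
            .Select((text, index) => CSharpSyntaxTree.ParseText(text, path: "Source" + index + ".cs"))
            .ToArray();

        // 2. Gather the assemblies the code depends on. Here it is only the
        //    core library, enough for types such as int and string.
        MetadataReference[] references =
        {
            MetadataReference.CreateFromFile(typeof(object).Assembly.Location)
        };

        // 3. Give the order to compile. Nothing is written to disk:
        //    the compilation object is what we query afterwards.
        return CSharpCompilation.Create(
            "TestCompilation",
            trees,
            references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
    }

    public static void Main()
    {
        CSharpCompilation compilation = CreateTestCompilation(
            "class C { int M() { int x = 1; return x; } }");

        int errors = compilation.GetDiagnostics()
                                .Count(d => d.Severity == DiagnosticSeverity.Error);
        Console.WriteLine("Syntax trees: " + compilation.SyntaxTrees.Length);
        Console.WriteLine("Errors: " + errors);
    }
}
```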

Let’s initialize everything to 42

Because you know, why not?

InitializerRewriter is a subclass of the abstract class CSharpSyntaxRewriter, which is used when you want to modify the tree, while CSharpSyntaxWalker is chosen when you just want to walk through it. VisitVariableDeclaration is one of many methods that you can override, specifically the one that is invoked whenever the rewriter hits a VariableDeclarationSyntax node. Of course you can also override the generic Visit to get access to all nodes. SyntaxTrivia covers all the things that are useful to humans and not to the compiler, such as whitespace or comments.
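
The difference between the two base classes shows up in the signatures of the methods you override. The sketch below uses two made-up classes, DeclarationCounter and IdentityRewriter, to put them side by side: the walker's method returns nothing, while the rewriter's method returns the node that should take the place of the one it visited.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Read-only traversal: the Visit* methods return void.
public sealed class DeclarationCounter : CSharpSyntaxWalker
{
    public int Count { get; private set; }

    public override void VisitVariableDeclaration(VariableDeclarationSyntax node)
    {
        Count += node.Variables.Count;       // "int one, two;" declares two variables
        base.VisitVariableDeclaration(node); // keep descending into the children
    }
}

// Rewriting traversal: the Visit* methods return a node.
// This one changes nothing, so it hands back what the base class gives it.
public sealed class IdentityRewriter : CSharpSyntaxRewriter
{
    public override SyntaxNode VisitVariableDeclaration(VariableDeclarationSyntax node)
    {
        return base.VisitVariableDeclaration(node);
    }
}

public static class WalkerVersusRewriter
{
    public static int CountVariables(string source)
    {
        var counter = new DeclarationCounter();
        counter.Visit(CSharpSyntaxTree.ParseText(source).GetRoot());
        return counter.Count;
    }

    // True when the rewriter hands back the very same root object.
    public static bool IdentityKeepsTree(string source)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();
        return ReferenceEquals(root, new IdentityRewriter().Visit(root));
    }

    public static void Main()
    {
        string source = "class C { void M() { int one, two; var i = 0; } }";
        Console.WriteLine(CountVariables(source));
        Console.WriteLine(IdentityKeepsTree(source));
    }
}
```

A rewriter that changes nothing should give you back the same root object, which is what makes it cheap to detect whether a file was modified at all.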

The first thing to notice is the first condition of the first if: it checks whether the type of the node that we are visiting is an int. Since we are looking at the Symbol from the semantic model, the condition will be true even if the declaration is in the form “var a = 0”; that is to say, we are not merely checking the syntax, but the semantic value. If the second condition is also true, that is to say there isn’t an initializer, we create one and set its value to 42. The second if checks whether there is an int variable that is initialized, but not to 42. In that case we change the initialization to 42; again, technically we create a new one.
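
Here is one way the two cases could be written, as a sketch rather than a copy of the repository code. It is a variation on what the text describes: instead of comparing the symbol's name with "int", it asks GetTypeInfo for the type and checks its SpecialType, and it tests the type once per declaration instead of once per if. The InitializerDemo helper and the sample source are assumptions made for the example, and the sketch assumes a current .NET SDK.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public sealed class InitializerRewriter : CSharpSyntaxRewriter
{
    private readonly SemanticModel _model;

    public InitializerRewriter(SemanticModel model) { _model = model; }

    public override SyntaxNode VisitVariableDeclaration(VariableDeclarationSyntax node)
    {
        // Semantic check: for "var a = 0" the model reports the inferred type.
        ITypeSymbol type = _model.GetTypeInfo(node.Type).Type;
        if (type == null || type.SpecialType != SpecialType.System_Int32)
        {
            return base.VisitVariableDeclaration(node);
        }

        ExpressionSyntax fortyTwo = SyntaxFactory.ParseExpression("42");
        SyntaxTrivia space = SyntaxFactory.Space;

        // One declaration can hold several variables, so the list lives
        // outside the loop and is updated one variable at a time.
        SeparatedSyntaxList<VariableDeclaratorSyntax> variables = node.Variables;
        bool changed = false;

        for (int i = 0; i < variables.Count; i++)
        {
            VariableDeclaratorSyntax declarator = variables[i];
            EqualsValueClauseSyntax initializer = declarator.Initializer;

            if (initializer == null)
            {
                // Case 1: no initializer, so we build " = 42" from scratch.
                EqualsValueClauseSyntax clause = SyntaxFactory
                    .EqualsValueClause(fortyTwo.WithLeadingTrivia(space))
                    .WithLeadingTrivia(space);
                variables = variables.Replace(declarator, declarator.WithInitializer(clause));
                changed = true;
            }
            else if (!initializer.Value.IsEquivalentTo(fortyTwo))
            {
                // Case 2: initialized to something else, so we swap the value.
                EqualsValueClauseSyntax clause =
                    initializer.WithValue(fortyTwo.WithTriviaFrom(initializer.Value));
                variables = variables.Replace(declarator, declarator.WithInitializer(clause));
                changed = true;
            }
        }

        return changed ? node.WithVariables(variables) : base.VisitVariableDeclaration(node);
    }
}

public static class InitializerDemo
{
    public static string Rewrite(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        CSharpCompilation compilation = CSharpCompilation.Create(
            "Demo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        SemanticModel model = compilation.GetSemanticModel(tree);
        return new InitializerRewriter(model).Visit(tree.GetRoot()).ToFullString();
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite(
            "class C { void M() { int one, two; var i = 0; string three; } }"));
    }
}
```

With the sample above, the int declarations and the var declaration should all end up initialized to 42, while string three is left alone. Note that when a declaration is changed, this sketch does not descend into its initializer expressions, so a declaration nested inside a lambda in an initializer would be missed.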

Conclusion

The practical steps to create an initializer are three:

  1. create a new value, in our case a “42” with a leading space
  2. create a new assignment with that value
  3. use the assignment to replace the original initializer
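
The three steps above can be sketched in isolation, without any compilation. The helper name WithFortyTwo and the variable name answer are invented for the example; the factory calls are the ones discussed in the text.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class InitializerSteps
{
    public static VariableDeclaratorSyntax WithFortyTwo(VariableDeclaratorSyntax declarator)
    {
        SyntaxTrivia space = SyntaxFactory.Space;

        // 1. The value: the expression "42", preceded by a space.
        ExpressionSyntax value = SyntaxFactory.ParseExpression("42")
                                              .WithLeadingTrivia(space);

        // 2. The assignment: "= 42", again preceded by a space.
        EqualsValueClauseSyntax clause = SyntaxFactory.EqualsValueClause(value)
                                                      .WithLeadingTrivia(space);

        // 3. The replacement: a copy of the declarator with our initializer.
        return declarator.WithInitializer(clause);
    }

    public static void Main()
    {
        VariableDeclaratorSyntax declarator = SyntaxFactory.VariableDeclarator("answer");
        Console.WriteLine(WithFortyTwo(declarator).ToFullString()); // prints "answer = 42"
    }
}
```

The two leading spaces are pure trivia: without them the result would still be valid code, just printed as answer=42.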

We can’t create the expression directly; we have to use the factory. These steps are intuitive if you have experience with compilers: first you create a value, then an expression. But if you don’t, they may seem superfluous: why can’t you just assign 42 to the initializer?

If you want to access the power of the compiler you have to understand how it thinks, how it has to manage every line of code you write. For a compiler there are always many possibilities to consider and you have to help it narrow them down. For instance, you may want to assign not a simple value, but another variable. If you understand this, three lines aren’t too much to ask to access such power.

You also have to remember that you can’t modify anything in the original tree. Instead, we create a new VariableDeclarationSyntax node with the new variables, with the help of the WithVariables method.
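
A small sketch can show what “can’t modify” means in practice: With* methods return a new node and leave the original exactly as it was. The RenameFirst helper and the names one, two and uno are made up for the illustration.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class ImmutableTreeExample
{
    // Returns a copy of the declaration whose first variable is renamed.
    public static VariableDeclarationSyntax RenameFirst(
        VariableDeclarationSyntax declaration, string newName)
    {
        VariableDeclaratorSyntax first = declaration.Variables[0];
        VariableDeclaratorSyntax renamed =
            first.WithIdentifier(SyntaxFactory.Identifier(newName));

        // Replace builds a new list, WithVariables builds a new declaration.
        return declaration.WithVariables(declaration.Variables.Replace(first, renamed));
    }

    public static void Main()
    {
        var statement =
            (LocalDeclarationStatementSyntax)SyntaxFactory.ParseStatement("int one, two;");
        VariableDeclarationSyntax original = statement.Declaration;
        VariableDeclarationSyntax updated = RenameFirst(original, "uno");

        Console.WriteLine(original.ToFullString()); // prints "int one, two"
        Console.WriteLine(updated.ToFullString());  // prints "int uno, two"
    }
}
```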

You can now go back to Program.cs and add a simple variable declaration such as int one, two; or string three and see the new source files in the new_src folder. If you run the program, you will notice that it also changes var i = 0 into var i = 42, proving that it checks the results of the compilation and not merely the syntax, and that compilation may not always do what you expect it to do.

Enjoy playing with Roslyn!

After many posts from Federico Tomassetti, this one is brought to you by Gabriele Tomassetti. Because programming is a family business.

9 replies
  1. Gabriele
    Gabriele says:

    It’s declared at line 28. We have to declare outside the loop because there could be more than one variable so we have to check and eventually change them one by one.

  2. JnRouvignac
    JnRouvignac says:

    Thanks!
    Mobile phone effect indeed.
    Far too much scrolling to the right and poor search capabilities.

    Thanks for this introduction.

    Roslyn’s rewrite API works in the same way as Python: https://docs.python.org/3.4/library/ast.html#ast.NodeTransformer

    Node names are similar enough to Eclipse JDT API: http://help.eclipse.org/neon/index.jsp?topic=%2Forg.eclipse.jdt.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fjdt%2Fcore%2Fdom%2FVariableDeclarationStatement.html

  3. Gabriele
    Gabriele says:

    The information about Python and JDT is interesting, I guess that’s the default way smart people decide to build such systems?

    Also, thanks to your comment I noticed that the github version didn’t have the comments in the code, so now I have corrected it. So the mobile phone effect was a blessing in disguise. Thanks for your comment.
