Translate Javascript to C#

Problems and Strategies to Port from Javascript to C#

Let’s say you need to automatically port some code from one language to another: how are you going to do it? Is it even possible? Maybe you have already seen a conversion between similar languages, such as Java to C#. That sounds much simpler in comparison.

In this article we are going to discuss some strategies to translate Javascript to a very different language, such as C#. We will discuss the issues involved and plan some possible solutions. We will not get as far as writing a converter: that would be far too complicated for an introduction to the topic. Let’s avoid putting together something terribly hacky just for the sake of typing some code.

Having said that, we are going to see all the problems you may find in converting one real Javascript project: fuzzysearch, a tiny but very successful library to calculate the difference between two strings, in the context of spelling correction.

When it’s worth the effort

First of all, you should ask yourself whether the conversion is worth the effort. Even if you manage to obtain some runnable C#, you have to consider that the style and the architecture will probably be “unnatural”. As a consequence, the project could be harder to maintain than if you had written it from scratch in C#.

This is a common problem even in carefully planned conversions, such as projects that started as a port from Java to C#. Furthermore, you will not be able to reuse the converter without manual work for every specific project, because even the standard libraries are different. Look at the example: while you could simply capitalize the length in haystack.length, you cannot just capitalize charCodeAt; you have to map each function in the source language to a different one in the destination language.

On the other hand, every language has areas of specialization that may interest you, such as Natural Language Processing in Python. If you accept that you will have to do some manual work, and you are very interested in one project, creating an automatic conversion will give you a huge head start. If you are interested in a generic tool instead, you may want to concentrate on small libraries, such as the Javascript one in the example.

Parse with ANTLR

The first step is parsing, and for that you should just use ANTLR. There are already many grammars available; they may not be up to date, but they are much better than starting from scratch and they will give you an idea of the scale of the project. You should use visitors instead of listeners, because they let you control the flow more easily. You should parse the different elements into custom classes that can manage the small problems that arise. Once you have done this, generating C# should be easier.

The small differences

There are things that you could just skip, such as the first and last lines: they most probably don’t apply to your C# project. But you must pay attention to the small differences: the var keyword has a different meaning in Javascript and C#. By coincidence it would work most of the time, and it would be quite useful to work around the lack of static typing in Javascript. But it’s not magic; you are just hoping that the compiler will figure it out. And sometimes it’s not a one-to-one conversion. For instance, you can’t use var in C# the way it’s used in the initialization of the for loop, because a C# var declaration cannot declare more than one variable.

The continue before outer should be transformed into a goto, but when continue appears alone it works just as in C#. A difference that could be fixed quite brutally is the strict equality comparison “===/!==”, which could be replaced with “==/!=” in most cases, since it exists to avoid problems due to the dynamic typing of Javascript. In general, you can do a pre-parse check and transform the original source code to avoid some problems, or even comment out things that cannot be easily managed.
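Without trying to build the converter itself, a hand-written sketch can make these small differences concrete. It assumes the Javascript original is a single function with a labeled outer loop, a for initialization that declares two variables, and strict equality checks; the class name FuzzySearcher is our own invention, not something a tool would produce.

```csharp
public static class FuzzySearcher
{
    // Returns true when every character of needle appears in haystack, in order.
    public static bool FuzzySearch(string needle, string haystack)
    {
        var hlen = haystack.Length;   // Javascript: haystack.length
        var nlen = needle.Length;
        if (nlen > hlen) return false;
        if (nlen == hlen) return needle == haystack;   // Javascript: ===

        // Javascript: for (var i = 0, j = 0; ...). A C# 'var' cannot declare
        // two variables at once, so we spell out the type.
        int i = 0, j = 0;
        for (; i < nlen; i++)
        {
            int nch = needle[i];      // Javascript: needle.charCodeAt(i)
            while (j < hlen)
            {
                // Javascript: continue outer;
                if (haystack[j++] == nch) goto outer;
            }
            return false;
        outer: ;                      // labeled empty statement, end of the loop body
        }
        return true;
    }
}
```

With this sketch, FuzzySearcher.FuzzySearch("cwhl", "cartwheel") matches, while FuzzySearcher.FuzzySearch("lw", "cartwheel") does not, because the characters are in the wrong order.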

I present you thy enemy: dynamic typing

The real problem is that Javascript uses dynamic typing while C# uses static typing. In Javascript anything could be anything, which leads to certain issues, such as the need for the aforementioned strict equality operator, but it’s very easy to use. In C# you need to know the type of your variables, because there are checks to be made at compile time. And this information is simply not available in the Javascript source code. You might think that you could just use the var keyword, but you can’t. The compiler must be able to determine the real type at compile time, something that will not always be possible. For example, you cannot use var to declare function parameters.

You can use the dynamic keyword, which makes the type be determined at execution time. Still, this doesn’t fix every problem, such as initialization. You may check the source code for literal initializations or, in theory, even execute the original Javascript from C# and find a way to determine the correct type. But that would be quite convoluted. You might get lucky, and in a small project such as our example you will, but not always.
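A tiny sketch shows what dynamic buys you and what it costs. The names DynamicDemo and Twice are made up for the illustration.

```csharp
public static class DynamicDemo
{
    // 'var' is not allowed here: a parameter needs an explicit type.
    // 'dynamic' postpones the type check until the call actually runs.
    public static dynamic Twice(dynamic value)
    {
        return value + value;
    }
}
```

Calling DynamicDemo.Twice(21) adds two integers and DynamicDemo.Twice("ab") concatenates two strings, just as the Javascript would. The price is that a call with an argument that does not support +, such as a plain object, still compiles and fails only at run time.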

There are also problems that are easier to manage than you might imagine. For instance, assigning a function to a variable is not something that you usually do as explicitly in C# as you do in Javascript. But it’s easy using delegate types and constructs such as Func. Of course you still have to determine the correct types of the arguments, if any are present, but it doesn’t add any other difficulty per se.
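As a minimal sketch of the idea, here is a Javascript function value translated by hand with Func. The int types are an assumption we had to make, since the Javascript source would not tell us; the names are illustrative.

```csharp
using System;

public static class FunctionValues
{
    // Javascript: var square = function (x) { return x * x; };
    public static readonly Func<int, int> Square = x => x * x;

    // A function stored in a variable can be passed around like any other value.
    public static int ApplyTwice(Func<int, int> f, int value)
    {
        return f(f(value));
    }
}
```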

Not everything is an object and other issues

In Javascript “string” is a string, but not an object, while in C# everything is an object, with no exceptions. This is a relevant issue, but it’s less problematic than dynamic typing. For instance, to convert our example we just have to wrap the function in a custom class, which is not really hard.

One obvious problem is that different languages have different libraries. Some will not be available in the destination language. On the other hand, some parts of the project might not be needed in the destination language, because there are already better alternatives. Of course you still have to change all the related code, or wrap the real library of the destination language in a custom class that mimics the original one.


There are indeed major difficulties, even for a small project, in transforming code from one language to another, especially when the two are as different as Javascript and C#. But let’s imagine that you are interested in something very specific, such as a very successful library and its plugins. You want to port the main library and give the developers of the plugins a simpler way to port their work. There are probably many similarities in the code, so you can do most of the work to manage the typical problems and provide guidance for the remaining ones.

Converting code between languages so different in nature is not easy, that is for sure, but you can apply a mixed automatic/manual approach: convert a large amount of code automatically and fix the corner cases manually. If you can also translate the tests, you can later refactor the code, once it is in C#, and improve its quality over time.

Code Generation with Roslyn: a Skeleton Class from UML

Get the source code for this article on GitHub

We have already seen some examples of transformation and analysis of C# code with Roslyn. Now we are going to see how to create a more complex example of code generation with Roslyn and parsing with Sprache. We are going to create a skeleton class from a PlantUML file. In short, we are doing the inverse of what we have done. The first step is to parse, of course.

As you can see, there are four entities in this example: PlantUML start and end tags and class, variable and function declarations.

Parsing all the things

We are going to parse the file line by line instead of doing it in one fell swoop. This is in part because of the limitations of Sprache, but also because it’s easier to correctly parse one thing at a time than to get everything right in one go.

With CharExcept we parse all characters except the one(s) indicated, which is a handy but imprecise way to collect all the text of an identifier. The roughness of this process is obvious, because we are forced to exclude all the characters that can come after an identifier. If you look at the .plantuml file at the beginning of the article, you see that there is a space after the field names, a ‘}’ after the modifier static, a ‘:’ after the argument, to divide the identifier from its type, and finally the closing parenthesis after the type. You might say that we should simply have checked for “Letters”, which would work in this specific case, but would exclude legal C# identifier names.
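The trade-off can be sketched in a few lines. This is not the parser from the repository: the class name and the exact set of excluded characters are assumptions based on the description above.

```csharp
using Sprache;

public static class IdentifierSketch
{
    // Collect everything up to the first character that can follow an identifier
    // in our PlantUML subset: space, comma, closing parenthesis, colon, brace.
    public static readonly Parser<string> Identifier =
        Parse.CharExcept(" ,):}").AtLeastOnce().Text().Token();

    // The stricter alternative: letters only.
    public static readonly Parser<string> LettersOnly =
        Parse.Letter.AtLeastOnce().Text().Token();
}
```

A name like _count2 is a legal C# identifier: Identifier accepts it, while LettersOnly fails on the leading underscore.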

The Modifier parser is quite uninteresting, except for lines 6 and 11, where we see the same problem just mentioned of identifying the correct name. The last case refers to something that doesn’t happen in this example, but could happen in other UML diagrams: override modifiers. The real deal is on lines 18 and 22, where we see the Ref parser, which is used, as the documentation says, to: “Refer to another parser indirectly. This allows circular compile-time dependency between parsers”. DelimitedBy is used to select many of the same items delimited by the specified rule, and finally Optional refers to a rule that isn’t necessary to parse correctly, but might appear. Since the rule is optional, the value could be undefined and it must be accessed using the method shown on line 22. The Method rule is slightly more complicated, but it uses the same methods. In case you are wondering, methods without a return type are constructors.
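To see Ref, DelimitedBy and Optional working together, here is a reduced sketch that parses only a parenthesized argument list such as (a : int, b : string). The names and the use of KeyValuePair are our assumptions; the parsers in the repository are richer.

```csharp
using System.Collections.Generic;
using System.Linq;
using Sprache;

public static class ArgumentSketch
{
    static readonly Parser<string> Identifier =
        Parse.CharExcept(" ,:()").AtLeastOnce().Text().Token();

    // "name : type"
    static readonly Parser<KeyValuePair<string, string>> Argument =
        from name in Identifier
        from colon in Parse.Char(':').Token()
        from type in Identifier
        select new KeyValuePair<string, string>(name, type);

    // "(" then zero or more comma-separated arguments, then ")".
    public static readonly Parser<IEnumerable<KeyValuePair<string, string>>> Arguments =
        from open in Parse.Char('(')
        from args in Parse.Ref(() => Argument)          // refer to Argument indirectly
            .DelimitedBy(Parse.Char(',').Token())       // one or more, separated by ','
            .Optional()                                 // the whole list may be missing
        from close in Parse.Char(')')
        // An optional value may be undefined, so we supply a fallback.
        select args.GetOrElse(Enumerable.Empty<KeyValuePair<string, string>>());
}
```

Here GetOrElse plays the role of the accessor for the optional value: an empty pair of parentheses yields an empty sequence instead of an undefined result.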

Parsing line by line

We can see our parser at work in the main method, where we try to parse every line with every parser and, if successful, we add the value to a custom type that we are going to see later. We need a custom type because code generation requires all the elements to be in place: we can’t do it line by line, at least not if we want to use the formatter of Roslyn. We could just take the information and print it ourselves, which is good enough for a small project, but complicated for a larger one. Also, we would miss all the nice automatic options for formatting. On line 13 we skip to the next iteration if we found a method, because a method could also be parsed, improperly, as a field, so to avoid the risk we jump over.

Code Generation

If you remember the first lessons about Roslyn, it’s quite verbose, because it’s very powerful. You also have to remember that we can’t modify nodes, even the ones we create ourselves rather than, say, parse from a file. Once you get used to using SyntaxFactory for everything, it’s all quite obvious: you just have to find the correct methods. The using directives are simply the ones usually inserted by default by Visual Studio.

Generation of methods

Let’s start by saying that Declarations and DeclarationType are fields of our custom class, which is not shown, but you can look at it in the source code. Then we proceed to generate the methods of our skeleton C# class. MethodDeclaration allows us to choose the name and the return type of the method itself; mods refers to the modifiers, which obviously could be more than one, so they are in a list. Then we create the parameters, which in our case need only a name and a type.

We choose to throw an exception, since we obviously cannot determine the body of the methods just with the UML diagram. So we create a throw statement and a new object of the type NotImplementedException. This also allows us to add a meaningful body to the method. You should add a body in any case, if you use the formatter, because otherwise it will not create a correct method: there won’t be a body or the curly braces.
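The steps just described can be sketched as follows. This is a reduced illustration, not the Generate method of the repository: the class name, the fixed public modifier and the single int parameter called value are placeholders, and NormalizeWhitespace is used as a simpler stand-in for the Formatter so that no workspace is needed.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class MethodSketch
{
    public static string GenerateMethod(string name, string returnType)
    {
        // throw new NotImplementedException();
        var throwStatement = SyntaxFactory.ThrowStatement(
            SyntaxFactory.ObjectCreationExpression(
                    SyntaxFactory.ParseTypeName("NotImplementedException"))
                .WithArgumentList(SyntaxFactory.ArgumentList()));

        // Nodes are immutable: every With.../Add... call returns a new node.
        MethodDeclarationSyntax method = SyntaxFactory
            .MethodDeclaration(SyntaxFactory.ParseTypeName(returnType), name)
            .AddModifiers(SyntaxFactory.Token(SyntaxKind.PublicKeyword))
            .AddParameterListParameters(
                SyntaxFactory.Parameter(SyntaxFactory.Identifier("value"))
                    .WithType(SyntaxFactory.ParseTypeName("int")))
            .WithBody(SyntaxFactory.Block(throwStatement));

        return method.NormalizeWhitespace().ToFullString();
    }
}
```

Calling MethodSketch.GenerateMethod("Compute", "int") produces the text of a public method named Compute whose body only throws NotImplementedException.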

Generation of fields

The “field” case is easier than the “method” one, and the only really new thing is on line 12, where we use a method to parse the type from a string filled in by our parser.
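A minimal sketch of the field case, under the same assumptions as before (illustrative names, a fixed private modifier, NormalizeWhitespace instead of the Formatter). SyntaxFactory.ParseTypeName is what turns the string coming from the parser into a type node.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class FieldSketch
{
    public static string GenerateField(string type, string name)
    {
        FieldDeclarationSyntax field = SyntaxFactory
            .FieldDeclaration(
                // The type arrives as plain text, e.g. "List<string>".
                SyntaxFactory.VariableDeclaration(SyntaxFactory.ParseTypeName(type))
                    .AddVariables(SyntaxFactory.VariableDeclarator(name)))
            .AddModifiers(SyntaxFactory.Token(SyntaxKind.PrivateKeyword));

        return field.NormalizeWhitespace().ToFullString();
    }
}
```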

The end of the Generate method is where we add the class created by the for cycle, and use Formatter. Notice that cu is the CompilationUnitSyntax that we created at the beginning of this method.

Limitations of this example

The unit tests are not shown because they don’t contain anything worth noting, although I have to say that Sprache is really easy to test, which is a great thing. If you run the program you will find that the generated code is correct, but it’s still missing something. It lacks some of the necessary using directives, because we can’t detect them starting from just the UML diagram. In a real-life scenario, with many files and classes and without the original source code, you might identify the assemblies beforehand and then use reflection to find their namespace(s). Also, we obviously don’t implement many things that PlantUML has, such as the relationships between classes, so keep that in mind.


Code generation with Roslyn is not hard, but it requires you to know exactly what you are doing. It’s better to have an idea of the code you are generating beforehand, or you will have to take into account every possible case, which would make every little step hard to accomplish. I think it works best for specific scenarios and short pieces of code, for which it can become very useful. In such cases, you can create tools that are useful and productive for your project, or yourself, in a very short period of time and benefit from them, as long as you don’t change tools or work habits. For instance, if you are a professor, you could create an automatic code generator to translate your pseudo-code for a short algorithm into real C#. If you think about it, this complexity is a good thing: if anybody could generate whole programs from scratch, we programmers would lose our jobs.

You might think that using Sprache for such a project was a bad idea, but it’s actually a good tool for parsing single lines. And while there are limitations, this approach makes it much easier to get something working in little time, instead of waiting to create a complete grammar for a “real” parser. For the cases in which code generation is most useful, specific scenarios and such, this is actually the best approach, in my opinion, since it allows you to easily pick and choose which parts to use and just skip the rest.

Create a simple parser in C# with Sprache

You can find the code for this article on github

Everybody loves ANTLR, but sometimes it may be overkill. On the other hand, a regular expression may just not cut it, or it may be too complicated to maintain. What can a developer do in such cases? Use Sprache. As its creators say:

Sprache is a simple, lightweight library for constructing parsers directly in C# code.

It doesn’t compete with “industrial strength” language workbenches – it fits somewhere in between regular expressions and a full-featured toolset like ANTLR.

It is a simple but effective tool, whose main limitation is being character-based. In other words, it works on characters and not on tokens. The advantage is that you can work directly with code and you don’t have to use external tools to generate the parser.

The guessing game

You can visit the project website if you want to see specific real uses. Let’s just say that it’s even credited by ReSharper and it was created more than six years ago, so it’s stable and quite good. It’s ideal for managing things like error messages created by other tools that you have to deal with, reading logs, parsing queries like the ones you would use for a simple search library, or reading simple formats like JSON. In this article we will create a parser for a simple guessing game. We will use .NET Core and xUnit for the unit tests, so it will also work on Linux and Mac.

The objective of the game is to guess a number, and to do that you can ask if the number is greater than a number, less than a number or between two numbers. When you are ready to guess you simply ask if it’s equal to a certain number.

Setup the project

We will use VSCode instead of Visual Studio, but in the github project you will find two projects, one for each. This is because there are still some compatibility quirks related to project.json and the different versions of the .NET Core tools used by Visual Studio and the standalone command line. To clarify, the project.json generated by the standalone .NET Core command line will also work with Visual Studio, but not vice versa (this might have changed by the time you read this). Also, with two projects you can easily see how Visual Studio integrates xUnit tests. The C# code itself is the same.

Create the file global.json in the directory of your project, in our case SpracheGame, then create another SpracheGame folder inside src and a SpracheGame.Tests folder inside test. Inside the nested SpracheGame folder you can create a new .NET core program with the usual:

While you are inside the SpracheGame.Tests folder you can create an xUnit test project with:

You can see the final structure here.

SpracheGame folder structure

Change both project.json files, adding sprache as a dependency of the main project:

…and add the main project as a dependency for the xUnit test project.

If you are using Visual Studio you may need to add a runtimes section to both of your project.json:

See the .NET documentation for .NET Core Runtime IDentifier (RID) catalog if you need to know other platform IDs.

Create GameParser

Let’s start by creating a class called GameParser and by recognizing numbers and commands.

On line 3 there is the code to parse a number: we start with Sprache.Parse followed by Digit, of which there must be at least one, then we convert from IEnumerable<char> to string with Text(), and finally we discard whitespace with Token(). So first we choose the type of character we need, in this case Digit, then we set a quantity modifier and transform the result into something more manageable. Notice that we return a Parser<string> and not an int.
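Spelled out, the chain described above looks roughly like this. The containing class name is a placeholder; only the chain of calls is taken from the description.

```csharp
using Sprache;

public static class NumberSketch
{
    // Digit: the kind of character; AtLeastOnce: the quantity;
    // Text: IEnumerable<char> to string; Token: discard surrounding whitespace.
    public static readonly Parser<string> Number =
        Parse.Digit.AtLeastOnce().Text().Token();
}
```

Parsing " 123 " with NumberSketch.Number returns the string "123"; converting it to an int is left to the caller.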

On lines 5-6 we tell the parser to find a ‘<’ character followed by a ‘>’, using Then(). We return an enum instead of a simple string. We can easily check for the presence of different options with Or(), but it’s important to remember that, just as with ANTLR, the order matters. We have to put the more specific case first, otherwise the parser would match the generic one instead of reaching the correct case.
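A sketch of the ordering issue follows. Command.Greater appears in the tests discussed later; the other enum members and the name Symbol are our guesses.

```csharp
using Sprache;

public enum Command { Between, Less, Greater, Equal }

public static class CommandSketch
{
    public static readonly Parser<Command> Symbol =
        // The more specific "<>" must come before the generic "<".
        Parse.Char('<').Then(_ => Parse.Char('>')).Return(Command.Between)
            .Or(Parse.Char('<').Return(Command.Less))
            .Or(Parse.Char('>').Return(Command.Greater))
            .Or(Parse.Char('=').Return(Command.Equal))
            .Token();
}
```

If the first two alternatives were swapped, the input "<>" would be recognized as Command.Less, leaving the ‘>’ unconsumed.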

Now we have to combine these two simple parsers into one Play, and thanks to the LINQ-like syntax the task is very simple. Most commands require only a number, but there is one that requires two, because we have to check whether the number to guess is between two given numbers. It also has a different structure: first there is a number, then the symbol, and finally the second number. This is a more natural syntax for the user than a ‘<>’ symbol followed by two numbers. As you can see, the code is quite simple: we gather the elements with from .. in .. and then we create a new object with select.
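The combination can be sketched like this. The shape of the Play class, the split into two helper parsers and every name except Command.Greater are assumptions made for the illustration; the real GameParser may be organized differently.

```csharp
using Sprache;

public enum Command { Between, Less, Greater, Equal }

public class Play
{
    public Command Command { get; }
    public int First { get; }
    public int? Second { get; }

    public Play(Command command, int first, int? second = null)
    {
        Command = command;
        First = first;
        Second = second;
    }
}

public static class GameSketch
{
    static readonly Parser<string> Number =
        Parse.Digit.AtLeastOnce().Text().Token();

    static readonly Parser<Command> Between =
        Parse.String("<>").Return(Command.Between).Token();

    static readonly Parser<Command> Single =
        Parse.Char('<').Return(Command.Less)
            .Or(Parse.Char('>').Return(Command.Greater))
            .Or(Parse.Char('=').Return(Command.Equal))
            .Token();

    // "10 <> 20": number, symbol, number.
    static readonly Parser<Play> BetweenPlay =
        from first in Number
        from symbol in Between
        from second in Number
        select new Play(symbol, int.Parse(first), int.Parse(second));

    // "> 7": symbol, number.
    static readonly Parser<Play> SinglePlay =
        from symbol in Single
        from number in Number
        select new Play(symbol, int.Parse(number));

    // End() rejects trailing garbage after a valid play.
    public static readonly Parser<Play> AnyPlay = BetweenPlay.Or(SinglePlay).End();
}
```

With this sketch, "10 <> 20" becomes a Between play with both numbers set, while "> 7" becomes a Greater play with only the first one.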

It’s time for Play

The only interesting things in the Play class are on the lines 27-51, the Evaluate function, where the “magic” happens, and I use the term magic extremely loosely. The number to guess is provided to the function, then it’s properly checked with the command and the numbers of the specific play that we are evaluating.

Unit Tests are easy

There are basically no disadvantages in using xUnit for our unit tests: it’s compatible with many platforms, it’s integrated with the Visual Studio Test Explorer and it also has a special feature: theories. A theory is a special kind of test that allows you to supply multiple inputs to one test. Lines 3-6 show exactly how you can do it. In our case we are testing that our parser can parse numbers with many digits.

The following test is a typical one: we are checking that the symbol ‘>’ is correctly parsed as Command.Greater. On line 27 we make sure that an exception is raised if we encounter an incorrect Play. Sprache also allows you to use TryParse, instead of Parse, if you don’t want to throw an exception. As you can see, the simplicity of the tool makes it very easy to test.

Let’s put everything together

The main function doesn’t contain anything shocking. On lines 27-28 we parse the input and execute the proper command; then, on line 31, we check whether we guessed the correct number and, if so, we prepare to exit the loop. Notice that we provide a way to exit the game even without guessing the number correctly, but we check for ‘q’ before trying to parse, because it would be an illegal command for GameParser.


This blog talks a lot about Language Engineering, which is a fascinating topic, but it is not always part of the everyday life of the average developer. Sprache, instead, is a tool that any developer could find a use for. When a RegEx wasn’t good enough, you probably just redesigned your application, making your life more complicated. Now you don’t need to: when you meet the mortal enemy of regular expressions, that is to say nested expressions, you can just use Sprache, right in your code.

Getting started with ANTLR in C#

The code for this article is available on github.

Readers of this website will know that ANTLR is a great tool to quickly create parsers and help you in working with a known language or create your DSL. While the tool itself is written in Java it can also be used to generate parsers in several other languages, for instance Python, C# or Javascript (with more languages supported by the newly released 4.6 version).

If you want to use C# you can integrate ANTLR in your favorite IDE, as long as that IDE is a recent edition of Visual Studio. The runtime itself also works on Mono and can be used standalone, and you can look at the issues for the official C# target for ANTLR 4 to see if you can make it work with other setups. The easiest way, though, is to use Visual Studio and the provided extension to integrate the generation of the parser from the grammar into your C# project.


The first step is to install the ANTLR Language Support extension for Visual Studio: you just have to search for it in Visual Studio by going to Tools > Extensions and Updates. This will allow you to easily integrate ANTLR into your workflow by automatically generating the parser and, optionally, the listener and visitor, starting from your grammar. Now you can add a new ANTLR 4 Combined Grammar or an ANTLR 4 Lexer/Parser in the same way you add any other new item. Then, for each one of your projects, you must add the Nuget package for Antlr4. If you want to manage options and, for instance, disable the visitor/listener generation, you can see the official github project.

Create the Grammar

For our simple project we are going to create a grammar that parses two lines of text representing a chat between two people. This could be the basis for a chat program, or for a game in which whoever says the shortest word gets beaten up with a thesaurus. This is not relevant for the grammar itself, because it handles only the recognition of the various elements of the program. What you choose to do with these elements is managed through normal code. Add a new ANTLR 4 Combined Grammar with the name Speak. You will see that there is already some text in the new file; delete it all and replace it with the following text.

While you may create separate lexer and parser grammars, for a simple project you will want to use a combined grammar and put the parser before the lexer. That’s because as soon as ANTLR recognizes a token in the lexer part, it stops searching. So it’s also important to put the more specific tokens first and the generic ones, like WORD or ID, later. In this example, if we had inverted SAYS and WORD, SAYS would have been hidden by WORD. Another thing to notice is that you can’t use fragments outside of lexer rules.

Having said that, the lexer part is pretty straightforward: we identify a SAYS, which can be written in uppercase or lowercase, a WORD, which can be composed of any uppercase or lowercase letters, and a NEWLINE. Any text that is WHITESPACE, meaning spaces and tabs, is simply ignored. While this is clearly a simple case, lexer rules will hardly be more complicated than this. Usually the worst thing that could happen is having to use semantic predicates. These are essentially statements that evaluate to true or false, and when they are false they disable the following rule. For instance, you may want to treat a ‘/’ as the beginning of a comment only if it is the first character of a line; otherwise it should be considered an arithmetic operator.

The parser is usually where things get more complicated, although that’s not the case this time. Every document given to the Speak grammar must contain a chat, which in turn is equal to two line rules followed by an End Of File marker. A line must contain a name, the SAYS keyword and a word. Name and word are identical rules, but they have different names because they correspond to different concepts, and they could easily change in a real program.

Visiting the tree

Just like we have seen for Roslyn, ANTLR will automatically create a tree and base visitor (and/or listener). We can create our own visitor class and change what we need. Let’s see an example.

The first line shows how to create a class that inherits from the SpeakBaseVisitor class, which is automatically generated by ANTLR. If you need to, you can restrict the type; for instance, for a calculator grammar you could use something like int or double. SpeakLine (not shown) is a custom class that contains two properties: Person and Text. Line 5 shows how to override the function that visits the specific type of node that you want: you just need to use the appropriate type for the context, which contains the information provided by the parser generated by ANTLR. On line 13 we return the SpeakLine object that we just created; this is unusual, and it’s useful for the unit tests that we will create later. Usually you would want to return base.VisitLine(context), so that the visitor can continue its journey across the tree.

This code simply populates a list of SpeakLine objects that hold the name of the person and the word they have spoken. The Lines property will be used by the main program.

Putting it all together

As you can see, there is nothing particularly complicated. Lines 15-18 show how to create the lexer and then create the tree. The subsequent lines show how to launch the visitor that you have created: you have to get the context for whichever starting rule you use, in our case chat, and then order the visitor to visit the tree from that node.

The program itself simply outputs the information contained in the tree. It would be trivial to modify the grammar to allow any number of lines to be added; neither the Visitor nor the main Program would need to be changed.

Unit testing

Testing is useful in all cases, but it is absolutely crucial when you are creating a grammar, to check that everything is working correctly. If you are creating a grammar for an existing language you probably want to check many working source files, but in any case you want to start by unit testing the single rules. Luckily, since the creation of the Community edition of Visual Studio, there is a free version of Visual Studio that includes a unit testing framework. All you have to do is create a new Test Project, add all the necessary nuget packages and add a reference to the project assembly you need to test.

There is nothing unexpected in these tests. One observation is that we can create a test to check the single line visitor, or we can test the matching of the rule itself. You obviously should do both. You may wonder how the last test works, since we are trying to match input that doesn’t match the rule, but we still get the correct type of context as a return value and some correct matching values. This happens because ANTLR is quite robust and we are checking only one rule. There are no alternatives, and since the input starts the correct way it is considered a match, although a partial one.


Integrating an ANTLR grammar into a C# project is quite easy with the provided Visual Studio extension and nuget packages, making it the best way to quickly create a parser for your DSL. No more piles of fragile RegEx(s), but don’t forget the tests.

Generate diagrams from C# source code using Roslyn

The code for this post is on Github

Beyond the source code

Last week we saw how to use Roslyn to rewrite source code to your liking. That’s all well and good, but it’s not the only thing you can do when you have a compiler open and ready to do your bidding. Another possibility is to leverage the knowledge that the compiler has to support other tools that you use as a programmer, or that co-workers need to simplify their jobs.

There are two great advantages in using the source code to support everything else:

  1. the source code becomes the truth, from which everything follows
  2. you can integrate the support for these tools into the processes of continuous integration that you already use

You may say that point number 1 is already true in any case. But, even for open source software, how many people are going to wade through hundreds of files to understand how to use the damn thing? The reality is that if there is no documentation, the software doesn’t exist for most people. Time is too valuable to waste on other people’s code. And this doesn’t even count people who don’t understand code, but need to know the features of the software.

Roslyn doesn’t help just programmers

No, it’s true, Roslyn will not write documentation on its own, but it can be used to make it easier and even to manage other structured information. In particular, today we are talking about UML diagrams. The traditional ways are to create them by hand, which tends to make them obsolete, or to use programs that reverse engineer the code itself, which is costly and not easily adaptable. Roslyn, instead, allows you to easily create diagrams, or at least some kinds of diagrams, such as class diagrams. Another advantage is that by understanding the source code programmatically you can hide information that is not needed by the reader. For instance, you can hide private properties and methods that the user of the library doesn’t need to know about.

The plan

In short, the idea is to create text files that are compatible with PlantUML for every class of our source code, and then to use PlantUML to create the actual diagrams. In real life it would be trivial to then create the diagrams programmatically, thanks to the command line, and upload the images wherever you want. Generating class diagrams by leveraging the compiler is so easy because the compiler needs to understand the source code, so all the information is readily available to us. In fact, I didn’t even need to write much code, since there is already a small library that does it [1]. Hey, we are programmers, we are lazy, we are smart enough to leverage existing resources.

We just need to understand how it works. It’s less than 300 lines of code, including comments, so we can delve right in.

Generating the diagram

See, I wasn’t kidding, it’s easy. All the information is readily available from the parser of Roslyn; we just need to take it. GetMembersModifierText (not shown) is simply a switch that associates every modifier keyword with its respective plantuml symbol, so that SyntaxKind.PublicKeyword becomes “+”. Of course you need to learn the terminology, such as SyntaxKind or the names of the several *Syntax(s), but that isn’t really hard. The only thing slightly harder than a simple “copy value and write a string” relates to properties, which are what the developers of .NET call “syntactic sugar”, that is to say a shortcut for programmers that the compiler transforms into real functions. Since they are not a standard feature of many languages, you have to translate them for UML.

The main method

I don’t show the whole main method because it’s your typical console app: very simple. Since ClassDiagramGenerator is nothing more than a CSharpSyntaxWalker, we just need to gather the text, parse it, and give the order to visit the tree with our walker. The only things to notice are the opening and closing PlantUML notation lines that we add to our generated files. Now you can use PlantUML to create the diagrams.


[Figure: class diagram of ClassDiagramGenerator, generated by PlantUML]

Using the source code as a source of intelligence about the code itself is not exactly a free lunch, but it’s close. You can write code and then automatically have it translated into a form that co-workers can understand, be they other programmers or somebody else. And you can integrate this information into the practices and tools that you already use: it’s a win-win. It’s true that in real life there is probably more setup, but the advantages are clear. The information is already there and Roslyn now makes it easily accessible, so why not use it?

[1] I just added a few lines to include the relation between base and derived classes

Getting started with Roslyn: transforming C# code

The code for this post is on GitHub: getting-started-roslyn

Under the hood

Making a programming language actually useful is not simply about designing it well but it is also about providing supporting tools around the language: compilers, obviously, but also editors, build systems, etc.

There are few languages that give you tools to play under the hood. I am thinking about the Language Server Protocol, for example. It lets you reuse parts of a compiler to get errors or to find the position of a definition. Roslyn is another example. Microsoft defined the idea behind it as “compiler as a service”, or more recently, a “platform”. Ok, what the hell does it mean?

Introduction to Roslyn

Using Roslyn you can access the inner workings of the compiler and use all its knowledge to create tools to boost your productivity or simplify your life. For instance, you could finally force everybody to respect the coding style of your project or extend the functionality of the IDE. A common example is to check the correctness of your Regex, while you are writing it, eliminating the need to run the program to check it.

It is available on Windows, Linux and Mac, and it works on .NET Core.

What we are going to do

In this post we are going to make sure that every int variable is initialized, and if it is already initialized, we make sure it is initialized to the value 42. It’s a simple example, but it will touch the three main areas of interest:

  1. syntax analysis
  2. semantic analysis
  3. syntax transformation

Believe it or not, it will even be easy to understand!


We will create this example on Linux, using Visual Studio Code as an editor, but of course you could use whatever editor you want. Just make sure you install a recent version of .NET Core. Once you have done this, create a new project and open the file project.json. We have two things to do: add the dependencies needed for Roslyn and use a workaround to correct a bug. The fix is simply to add the value “portable-net45+win8+wp8+wpa81” to imports. After our edits we can restore the packages to check that everything works (i.e., the bug is fixed).

The Main method

Let’s take a look at our Program.cs. We skip CreateTestCompilation for now. The only thing to notice is that if you just wanted to look at the SyntaxTree you wouldn’t need to compile anything: you could build it with something as simple as CSharpSyntaxTree.ParseText("Text to parse").
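As a quick illustration of that last point, the following sketch parses a snippet without compiling it and lists the variables it declares. The class ParseOnlyExample and the method DeclaredVariableNames are hypothetical names used only here. It assumes the Microsoft.CodeAnalysis.CSharp package is referenced.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class ParseOnlyExample
{
    // Parse the text and collect the name of every declared variable.
    // No compilation is involved, so only syntax is available, not types.
    public static string[] DeclaredVariableNames(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        return tree.GetRoot()
            .DescendantNodes()
            .OfType<VariableDeclaratorSyntax>()
            .Select(v => v.Identifier.Text)
            .ToArray();
    }

    public static void Main()
    {
        var names = DeclaredVariableNames(
            "class C { void M() { int one, two; var i = 0; } }");
        Console.WriteLine(string.Join(", ", names)); // prints "one, two, i"
    }
}
```

With syntax alone we can see that i is declared with var, but we cannot know that it is an int. For that we need the semantic model, and therefore a compilation.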

We loop through the source trees, that is to say the source files, and get the Semantic Model for every one of them. This is needed to check the meaning of the code we are seeing.

In our example we have to be sure to initialize only integer variables and not, say, a string. Next, we are giving the semantic model to our InitializerRewriter and then we visit every node of the tree. InitializerRewriter is a kind of walker of the tree that can be used to modify the tree. More precisely, you can’t modify the original tree, but you can create a new one that is identical save for the nodes you have changed. In the end, we check if we have modified the original source and if that’s true we create a new source file. In real life you would rewrite the original one, but to ease tinkering we are creating a new one.

Programmatic compilation

I.e., where we show how you can give orders to your compiler.

CreateTestCompilation is fairly easy to understand: we need to compile the source files programmatically, and so we have to parse the text, gather the references to the assemblies needed for our program, and then give the order to compile.
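The three steps (parse, gather references, compile) can be sketched as follows. This is not the CreateTestCompilation of the repository: the names CompilationExample and CreateCompilation are hypothetical, and the sketch assumes that a single reference to the assembly that defines System.Object is enough, which holds only for very simple sources.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class CompilationExample
{
    // Hypothetical helper: build an in-memory compilation from source texts.
    public static CSharpCompilation CreateCompilation(IEnumerable<string> sources)
    {
        // 1. parse the text of every source file
        var trees = sources.Select(text => CSharpSyntaxTree.ParseText(text));

        // 2. gather the references; here only the core library (an assumption)
        var references = new[]
        {
            MetadataReference.CreateFromFile(typeof(object).Assembly.Location)
        };

        // 3. give the order to compile (nothing is written to disk)
        return CSharpCompilation.Create(
            "TestCompilation",
            syntaxTrees: trees,
            references: references,
            options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
    }

    // True when the compilation reports no errors (warnings are ignored).
    public static bool Succeeds(CSharpCompilation compilation)
    {
        return !compilation.GetDiagnostics()
            .Any(d => d.Severity == DiagnosticSeverity.Error);
    }
}
```

A real project would add a reference for every assembly it uses. The resulting compilation is what gives us a semantic model for each tree, through compilation.GetSemanticModel(tree).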

Let’s initialize everything to 42

Because you know, why not?

InitializerRewriter is a subclass of the abstract class CSharpSyntaxRewriter, which is used when you want to modify the tree, while CSharpSyntaxWalker is chosen when you just want to walk through it. VisitVariableDeclaration is one of many methods that you can override, specifically the one that is invoked whenever the rewriter hits a VariableDeclarationSyntax node. Of course you can also override the generic Visit to get access to all nodes. SyntaxTrivia is everything that is useful to humans but not to the compiler, such as whitespace or comments.

The first thing to notice is the first condition of the first if: it checks whether the type of the node that we are visiting is an int. Since we are looking at the Symbol of the model, the condition will be true even if the declaration is in the form “var a = 0”. That is to say, we are not merely checking the syntax, but the semantics. If the second condition is true, that is to say there isn’t an initializer, we create one and we set the value to 42. The second if checks whether there is an int variable that is initialized, but not to 42. In that case we change the initialization to 42. Again, technically we create a new one.


The practical steps to create an initializer are three:

  1. you create a new value, in our case a “42” with a leading space
  2. create a new assignment with that value
  3. use the assignment to replace the original initializer

We can’t create the expression directly, we have to use the factory. These steps are intuitive if you have experience with compilers: first you create a value, then an expression. But if you don’t have experience with compilers it may seem superfluous: why can’t you just set the initializer to 42?

If you want to access the power of the compiler you have to understand how it thinks, how it has to manage every line of code you write. For a compiler there are always many possibilities to consider and you have to help it narrow them down. For instance, you may want to assign not a simple value, but another variable. If you understand this, three lines aren’t too much to ask to access such power.

You also have to remember that you can’t modify anything in the original tree. We create a new VariableDeclarationSyntax node with new variables, with the help of the WithVariables method.
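Putting the pieces together, a rewriter along these lines could look like the following sketch. It is a simplified re-creation under stated assumptions, not the InitializerRewriter of the repository: the names FortyTwoRewriter, WithFortyTwo and Rewrite are hypothetical, the type is read with GetTypeInfo, and the compilation references only the core library.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical sketch: set every int variable to 42.
public class FortyTwoRewriter : CSharpSyntaxRewriter
{
    private readonly SemanticModel _model;

    public FortyTwoRewriter(SemanticModel model) { _model = model; }

    public override SyntaxNode VisitVariableDeclaration(VariableDeclarationSyntax node)
    {
        // Ask the semantic model for the real type, so "var i = 0" counts as int.
        var type = _model.GetTypeInfo(node.Type).Type;
        if (type == null || type.SpecialType != SpecialType.System_Int32)
            return base.VisitVariableDeclaration(node);

        // Build new declarators and keep the original commas (and their trivia).
        var variables = SyntaxFactory.SeparatedList(
            node.Variables.Select(WithFortyTwo),
            node.Variables.GetSeparators());

        // The original tree is immutable: WithVariables returns a new node.
        return node.WithVariables(variables);
    }

    private static VariableDeclaratorSyntax WithFortyTwo(VariableDeclaratorSyntax variable)
    {
        // 1. create the value through the factory
        var value = SyntaxFactory.LiteralExpression(
            SyntaxKind.NumericLiteralExpression, SyntaxFactory.Literal(42));

        if (variable.Initializer != null)
        {
            // Already initialized: swap only the value, keeping its whitespace.
            return variable.WithInitializer(variable.Initializer.WithValue(
                value.WithTriviaFrom(variable.Initializer.Value)));
        }

        // 2. create the " = 42" clause, with a space before "=" and before "42"
        var initializer = SyntaxFactory
            .EqualsValueClause(value.WithLeadingTrivia(SyntaxFactory.Space))
            .WithLeadingTrivia(SyntaxFactory.Space);

        // 3. attach it to (a copy of) the declarator
        return variable.WithInitializer(initializer);
    }

    // Convenience driver: compile one source text and return the rewritten text.
    public static string Rewrite(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        var compilation = CSharpCompilation.Create(
            "Demo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        var model = compilation.GetSemanticModel(tree);
        return new FortyTwoRewriter(model).Visit(tree.GetRoot()).ToFullString();
    }
}
```

Unlike the version described above, this sketch does not check whether a variable is already set to 42, so it always produces a new node for an int declaration. It also does not descend into the initializer expressions it replaces.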

You can now go back to Program.cs, add simple variable declarations such as int one, two; or string three; and see the new source files in the new_src folder. If you run the program, you will notice that it also changes var i = 0 into var i = 42. This proves that it checks the results of the compilation and not merely the syntax, and that compilation may not always do what you expect it to do.

Enjoy playing with Roslyn!

After many posts from Federico Tomassetti, this one is brought to you by Gabriele Tomassetti. Because programming is a family business.