The code for this post in on GitHub: generate-diagrams-from-source-code

Beyond the source code

Last week we have seen how to use Roslyn to rewrite source code to your liking. That’s all well and good, but it’s not the only thing you can do when you have a compiler open and ready to do your bidding. Another possibility is to leverage the knowledge that the compiler has, to support other tools that you use as a programmer, or that are needed by co-workers to simplify their job.

There is two great advantages to use the source code to support everything else:

  1. the source code become the truth, from which everything follow
  2. you can integrate the support for these tools into the processes of continuous integration that you already use

You may say that the point number 1 is already true in any case. But, even for open source software, how many are going to wade through hundreds of files to understand how to use the damn thing ? The reality is that if there is no documentation, it doesn’t exist for most people. Time is too much valuable to lose it behind other people’s code. And this doesn’t even count people that don’t understand code, but they need to know the feature of the software.

Roslyn doesn’t help just programmers

No, it’s true, Roslyn would not write documentation on its own, but it can be used to make it easier and even manage other structured information. In particular today we are talking about UML diagrams. The traditional way is to create them is by hand, which is prone to make them obsolete, or to use programs that reverse engineer the code itself, which is costly and not easily adaptable. Roslyn, instead, allows you to easily create diagrams, at least some kind of diagrams such as class diagrams. Another advantage is that by understanding the source code programmatically you can hide or shows information that are not needed by the reader. For instance, you can hide private properties and methods that the user of the library doesn’t need to know.

The plan

In short the idea is to create text files that are compatible with PlantUML for every class of our source code and then to use PlantUML to create the actual diagrams. In real life it would be trivial to then  create the diagrams programmatically, thanks to the command line and upload the images wherever you want. To generate class diagrams by leveraging the compiler is so easy because the compiler need to understand the source code and so every information is readily available to us. In fact, I didn’t even need to write much code since there is already a small library that does it: https://github.com/pierre3/PlantUmlClassDiagramGenerator1. Ehi, we are programmers, we are lazy, we are smart enough to leverage existing resources.

We just need to understand how it works. It’s less than 300 lines of code, including comments, so we can delve right in.

Generating the diagram

public class ClassDiagramGenerator : CSharpSyntaxWalker
{
    private TextWriter writer;
    private string indent;
    private int nestingDepth = 0;
  
    [...]
    // the most complicated method of this class
    public override void VisitPropertyDeclaration(PropertyDeclarationSyntax node)
    {
        var modifiers = GetMemberModifiersText(node.Modifiers);
        var name = node.Identifier.ToString();
        var typeName = node.Type.ToString();
        var accessor = node.AccessorList.Accessors
           .Where(x => !x.Modifiers.Select(y => y.Kind()).Contains(SyntaxKind.PrivateKeyword))
           .Select(x => $"<<{(x.Modifiers.ToString() == "" ? "" : (x.Modifiers.ToString() + " "))}{x.Keyword}>>");

        var useLiteralInit = node.Initializer?.Value?.Kind().ToString().EndsWith("LiteralExpression") ?? false;
        var initValue = useLiteralInit ? (" = " + node.Initializer.Value.ToString()) : "";

        WriteLine($"{modifiers}{name} : {typeName} {string.Join(" ", accessor)}{initValue}");
    }
 
    [...]

See, I wasn’t kidding, it’s easy. All the information is readily available from the parser of Roslyn, we just need to take it. GetMembersModifierText (not shown) is simply a switch to associate every modifier keyword to its respesctive plantuml symbol, like SyntaxKind.PublicKeyword equals “+”.  Of course you need to learn the terminology, such as SyntaxKind or the names of the several *Syntax(s), but that isn’t really hard. The only thing slightly harder than a simple “copy value and write a string” is relative to properties, which are what the developers of .NET call “syntactic sugar”, that is to say a shortcut for programmers, that the compiler transform in real functions. Since they are not a standard feature of many languages you have to translate them for UML.

The main method

        [...]
        
        foreach (var file in files)
        {
           Console.WriteLine($"Generation PlantUML text for {file}...");
           string outputFile = Path.Combine(outputDir, Path.GetFileNameWithoutExtension(file));
          
           using (var stream = new FileStream(file, FileMode.Open, FileAccess.Read))
           {
                var tree = CSharpSyntaxTree.ParseText(SourceText.From(stream));
                var root = tree.GetRoot();

                using (var writer = new StreamWriter(new FileStream(outputFile + ".ClassDiagram.plantuml", FileMode.OpenOrCreate, FileAccess.Write)))
                {
                    writer.WriteLine("@startuml");

                    var gen = new ClassDiagramGenerator(writer, "    ");
                    gen.Visit(root);

                    writer.Write("@enduml");
                }                 
            }
        }

        [...]

I don’t show the whole main method because it’s you typical console app: very simple. Since ClassDiagramGenerator is nothing more than a CSharpSyntaxWalker, we just need to gather the text, parse it, and give the order to visit the tree with our walker. The only things to notice are the starting and closing plantuml notation lines that we add to our generated files. Now you can use plantuml to create the diagrams.

Conclusion

Class diagram of ClassDiagramGenerator
Class Diagram generated by PlantUML

Using the source code as a source of intelligence about the code itself is not exactly a free lunch, but it’s quite there. You can write code and then automatically have it translated in a form that co-workers can understand, be them other programmers or something else. And you can integrate this information into the practices and tools that you already use, it’s a win-win. It’s true that in real life there is probably more setup, but the advantages are clear. The information is already there, now Roslyn make it easy accessible, why not use it ?


[1] I just added a few lines to include the relation between base and derived classes [^]

Learn ANTLR by example: 4 complex grammars explained

Get these lessons in your inbox along with additional tips on parsing and language engineering

Powered by ConvertKit