An approach to UML Diagrams and ER Models Bearable for a Software Engineer


As part of my current job at Groupon I have to create diagrams: those nice pictures which make project managers happy. I write basic UML diagrams (State diagrams and Activity diagrams) together with Entity-Relationship diagrams (yes, the ones for the DB).



Yes, people want these pictures and I have to create them

What Is Wrong with the Previous Process

I am a Software Engineer and I understand the importance of communication, so I understand how useful diagrams can be. However, I have to confess that I am always a bit suspicious of people who are too fond of them: I am afraid of dealing with people who like to spend endless time discussing things, pretending they could build something and generally just wasting time. I am an Engineer: I like building things, not just talking about building things.

On the other hand, systems evolve and diagrams can easily end up outdated. What makes this problem harder to fix is that you normally need a specific tool to create a diagram, so when you need to update one you have to install the right tool, start it, generate your image and update the document.

I do not like this process and I would really love to improve it.

My Current Process

I prefer text formats to nice WYSIWYG editors because:

  • text is portable, while each WYSIWYG editor tends to have its own format
  • you can easily compare and merge text files
  • you do not waste endless time trying to convince the editor to do what you want or exploring all the menu items

So if I need to write diagrams I describe them in text formats and then generate the actual pictures from those text files. The process feels more controllable, repeatable and versionable, and the other people still get their pretty pictures.

Currently I am using PlantUML for UML diagrams and erd for ER diagrams. Erd wins extra points because it is written in Haskell. There is also a nice website that offers a web editor for PlantUML: it is named PlantText.

Now, this solution has problems:

  • you still need to install software, at least for the ER diagrams (you can generate the UML diagrams using the PlantText website)
  • there are no nice editors supporting the DSLs used to describe these diagrams
  • there is no integration between the web editor and my GitHub repository
  • you need to update the images in the documents after having generated them

The Ideal Process

To fix the current process I would love to have a web application to edit the diagrams, able to talk with my GitHub repository and do the versioning for me. I would also like this web application to generate the images on the fly, and my documents to support links to images exposed on the web. That would be great for two reasons:

  1. I would not have to update all the documents containing a diagram when the diagram changes. The catch is that many documents embed a copy of the image rather than a reference to it, and that the server hosting the diagrams would need to be always up.
  2. I would know where to find the source of a diagram. I imagine for example that we could have an image available at, let’s say, and the web application to edit it at

It would be fantastic to have a process to commit changes and a git hook to generate the images, maybe even updating the existing documents.
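The hook part could start small. This is a minimal sketch, not an existing tool: a hypothetical `renderCommands` function takes the paths changed by a commit (for example the output of `git diff --name-only`) and builds the commands that would regenerate the affected images. It assumes diagram sources use the `.puml` and `.er` extensions, that `plantuml.jar` lives at a made-up path in the repository, and that erd picks the output format from the extension of the `-o` file.

```javascript
// Hypothetical location of the PlantUML jar inside the repository.
const PLANTUML_JAR = "tools/plantuml.jar";

// Given the paths changed by a commit, build (but do not run) the commands
// that would regenerate the affected diagram images.
function renderCommands(changedPaths) {
  const commands = [];
  for (const file of changedPaths) {
    if (file.endsWith(".puml")) {
      // PlantUML writes the PNG next to the source file by default.
      commands.push(["java", "-jar", PLANTUML_JAR, file]);
    } else if (file.endsWith(".er")) {
      // erd reads the source given with -i and writes the file given with -o.
      const output = file.slice(0, -".er".length) + ".png";
      commands.push(["erd", "-i", file, "-o", output]);
    }
    // Any other changed file is not a diagram source and is skipped.
  }
  return commands;
}

const changed = ["docs/order-states.puml", "docs/schema.er", "src/Main.hs"];
for (const cmd of renderCommands(changed)) {
  console.log(cmd.join(" "));
}
```

Keeping the planning step separate from actually spawning the processes makes the hook easy to test without PlantUML or erd installed.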

What I Started Doing: Syntax Highlighting for PlantUML

Now, I am still far from having the ideal process in place and probably I will never get there: the effort of implementing it and the changes required to the current process would not justify it. However, I am starting to take some steps in that direction. In particular, I am focusing on improving the web editor for UML by implementing syntax highlighting.

I have implemented Syntax Highlighting for a large part of PlantUML for the CodeMirror web editor. The code is available on GitHub and I have sent a pull request to plantuml-server.

Writing a syntax highlighting mode could resemble writing a grammar; in fact, my first thought was to write the grammar for ANTLR and then implement an automatic conversion from EBNF to a CodeMirror mode. However, a grammar and a syntax highlighting system have different goals. The former is intended to parse correct files and stop when it finds errors (only very good grammars have strong error handling and can recover from a few errors), while a syntax highlighting system works on a document that is wrong all the time: as you type, the document is incorrect; only when you complete your statement is it correct, until you type the next character and it is wrong again. Syntax highlighting systems need to be very robust and tolerate a lot of errors.

This is a random piece of the mode I have defined.

    } else if (… === "stereotype") {
        if (stream.match(/\(/)) {
            … = "stereotype style";
            return null;
        }
        if (stream.match(/[A-Za-z][A-Za-z_0-9]*/)) {
            return "variable";
        }
        if (stream.match(/>>/)) {
            … = state.old_state;
            return null;
        }
        if (stream.match(/,/)) { return null; }
    } else if (… === "stereotype style") {
        if (stream.match(/\)/)) {
            … = "stereotype";
            return null;
        }
        if (stream.match(/[A-Z]/)) {
            return "string";
        }
        if (matchColors(stream, state)) {
            return "atom";
        }
        if (stream.match(/[,]+/)) {
            return null;
        }
    } else if (… === "class def") {
        if (stream.match(/[\t ]+/)) {
            return null;
        }
        if (stream.match(/\}/)) {
            … = "base";
            return "bracket";
        }
        if (stream.match(/\.\./)) {
            … = "class def section";
            return "operator";
        }
        if (stream.match(/==/)) {
            … = "class def section";
            return "operator";
        }
        if (stream.match(/--/)) {
            … = "class def section";
            return "operator";
        }
        if (stream.match(/__/)) {
            … = "class def section";
            return "operator";
        }
        if (stream.match(/\+|-|#|~/)) {
            if (isMethod(stream)) {
                … = "class def method";
            } else if (hasTypeAfter(stream)) {
                … = "class def attribute (type after)";
            } else {
                … = "class def attribute";
            }
            return "attribute";
        }
    }


Now, the basic idea is that you have a state machine: it begins in the start state and moves through states such as class def or stereotype style, and depending on the current state you interpret tokens differently. The point is that you should keep the number of states very limited. Remember, you want your syntax highlighting system to be robust and to provide some reasonable output as the user types, so your parser will not be as refined as the one you would write for a compiler. You will instead end up with a few states, so few that it makes sense to define them by hand rather than with a parser generator as you would for a compiler, and each of them should have a meaning a human can understand.
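The state-machine idea can be sketched in plain JavaScript. This is a minimal sketch, not my actual PlantUML mode: it assumes a CodeMirror-5-style mode object (a `startState()` function plus a `token(stream, state)` function that consumes some input and returns a style name or `null`), and it replaces CodeMirror's `StringStream` with a small hypothetical `MiniStream` so it runs on its own. The three states and the `state.context` field are made up for illustration.

```javascript
// Hypothetical stand-in for CodeMirror's StringStream so the sketch runs on its
// own. It implements only what the mode below needs; like CodeMirror's version,
// match() succeeds only when the regex matches at the current position.
class MiniStream {
  constructor(line) {
    this.string = line;
    this.pos = 0;   // current read position
    this.start = 0; // where the current token began
  }
  eol() {
    return this.pos >= this.string.length;
  }
  next() {
    return this.eol() ? undefined : this.string.charAt(this.pos++);
  }
  eatSpace() {
    const from = this.pos;
    while (/[\t ]/.test(this.string.charAt(this.pos))) this.pos++;
    return this.pos > from;
  }
  match(regex) {
    const m = this.string.slice(this.pos).match(regex);
    if (!m || m.index > 0) return null;
    this.pos += m[0].length;
    return m;
  }
  current() {
    return this.string.slice(this.start, this.pos);
  }
}

// A tiny mode with three hand-written states (names are made up):
// "start", "class name" (right after the class keyword) and "class body".
const miniMode = {
  startState() {
    return { context: "start" };
  },
  token(stream, state) {
    if (stream.eatSpace()) return null;
    if (state.context === "start") {
      if (stream.match(/class\b/)) {
        state.context = "class name";
        return "keyword";
      }
    } else if (state.context === "class name") {
      if (stream.match(/[A-Za-z_][A-Za-z_0-9]*/)) return "def";
      if (stream.match(/\{/)) {
        state.context = "class body";
        return "bracket";
      }
    } else if (state.context === "class body") {
      if (stream.match(/\}/)) {
        state.context = "start";
        return "bracket";
      }
      if (stream.match(/[A-Za-z_][A-Za-z_0-9]*/)) return "variable";
    }
    // Anything unexpected, e.g. a half-typed word: consume one character,
    // leave it unstyled and stay in the same state instead of failing.
    stream.next();
    return null;
  },
};

// Calls token() repeatedly until the end of the line, as an editor would,
// collecting [text, style] pairs. The state object carries over between lines.
function highlightLine(mode, line, state) {
  const stream = new MiniStream(line);
  const tokens = [];
  while (!stream.eol()) {
    stream.start = stream.pos;
    const style = mode.token(stream, state);
    tokens.push([stream.current(), style]);
  }
  return tokens;
}

const state = miniMode.startState();
console.log(highlightLine(miniMode, "class Car {", state));
console.log(highlightLine(miniMode, "  Engine }", state));
console.log(highlightLine(miniMode, "cla", miniMode.startState())); // half-typed
```

The fallback at the end of `token` is what keeps the highlighter robust: a half-typed line never throws, it just comes out partly unstyled until the user finishes typing.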

Note that CodeMirror also provides a library to test your mode, and I really appreciate that. These are a few of my tests:

  MT("static class methods",
    "[keyword class] [def Car] [bracket {]",
    "  [operator ..][string Method Examples ][operator ..]",
    "  [attribute +][keyword {static}] [def Name][operator ():] [variable Type] [bracket {] [variable arg1], [variable arg2], [variable argn] [bracket }]");

  MT("abstract class methods",
    "[keyword class] [def Car] [bracket {]",
    "  [operator ..][string Method Examples ][operator ..]",
    "  [attribute +][keyword {abstract}] [def Name][operator ():] [variable Type] [bracket {] [variable arg1], [variable arg2], [variable argn] [bracket }]");

  MT("interfaces examples",
    "[keyword class] [def Car]",
    "[variable ICar] [operator ()-] [variable Car]",
    "[variable ICar2] [operator ()--] [variable Car]",
    "[variable Car] [operator -()] [variable ICar3]");

  MT("node package",
    "[keyword package] [def Node] <<[builtin Node]>> [bracket {]",
    "    [keyword class] [def Worker1]",
    "[bracket }]");

Consider the first test: it says that the first word (class) should be recognized as a keyword and the second (Car) as a definition (def).
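To make the notation concrete, here is a hypothetical helper, `parseAnnotatedLine`, that splits an annotated test line into the source text fed to the mode and the expected `[text, style]` pairs. It reflects my reading of the notation (bracketed `[style text]` pieces carry a style, everything else is expected unstyled) and is a sketch, not CodeMirror's own test code.

```javascript
// Hypothetical helper: split an annotated test line such as
// "[keyword class] [def Car] [bracket {]" into the plain source text and the
// expected [text, style] pairs. Text outside brackets is expected unstyled.
function parseAnnotatedLine(annotated) {
  const re = /\[(\S+) ([^\]]*)\]/g; // matches one "[style text]" piece
  const expected = [];
  let plain = "";
  let last = 0;
  const addPlain = (text) => {
    if (text.length > 0) {
      expected.push([text, null]);
      plain += text;
    }
  };
  let m;
  while ((m = re.exec(annotated)) !== null) {
    addPlain(annotated.slice(last, m.index)); // unstyled text before the bracket
    expected.push([m[2], m[1]]);
    plain += m[2];
    last = re.lastIndex;
  }
  addPlain(annotated.slice(last)); // unstyled text after the last bracket
  return { plain, expected };
}

const parsed = parseAnnotatedLine("[keyword class] [def Car] [bracket {]");
console.log(parsed.plain);
console.log(parsed.expected);
```

Running `plain` through the mode and comparing the resulting tokens with `expected` is, in essence, what each of the tests above checks.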

The only problem with writing this code is that the PlantUML grammar is… suboptimal. It is used for many different types of diagrams and it is not always clear to me. I would definitely not suggest it as an example of a well-designed DSL.

What I Want to Do in the Future

Once I am done with the syntax highlighting I want to implement auto-completion. This would make it much easier for me to write UML diagrams: currently I always have to look up examples to figure out how to do things, so some support from the editor would help greatly. It would also be fantastic to have error reporting as you type, but that could be a bit more complicated to build.
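Keyword completion is probably the easiest place to start. A minimal sketch, assuming a hand-picked and deliberately incomplete keyword list: the hypothetical `completeKeyword` function looks at the word to the left of the cursor and returns the matching keywords together with the column range to replace. The `{list, from, to}` shape mirrors what CodeMirror 5's show-hint addon expects from a hint function, except that there `from` and `to` are `CodeMirror.Pos` positions rather than plain column numbers.

```javascript
// Hypothetical keyword list: a few PlantUML keywords, far from complete.
const KEYWORDS = ["abstract", "class", "enum", "interface", "package", "state"];

// Find the word being typed at column `ch` of `line` and return the keywords
// that start with it, plus the column range a chosen completion would replace.
function completeKeyword(line, ch, keywords = KEYWORDS) {
  let from = ch;
  while (from > 0 && /[A-Za-z_]/.test(line.charAt(from - 1))) from--;
  const prefix = line.slice(from, ch);
  // An empty prefix (cursor after a space) offers nothing.
  const list = prefix === "" ? [] : keywords.filter((k) => k.startsWith(prefix));
  return { list, from, to: ch };
}

console.log(completeKeyword("  cla", 5));
console.log(completeKeyword("in", 2));
```

A real implementation would use the current highlighting state to decide which words make sense at the cursor, rather than offering every keyword everywhere.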

The next step is to write a web application around the erd program. I have started the project (erd-web-server); let’s see when I can find the time to play a little more with Haskell…

Once I have done that I would work on the GitHub integration. I would like to access diagrams in my projects and to generate the images as part of a git web hook.

So there is plenty of room for improvement, and even an engineer can have fun with diagrams, especially by building the tool chain around them.

11 replies
  1. Nicholas Smith says:

    Common problem.

    As an engineer I’m going to look at the code if I can; diagrams can give a high-level view, but I tend not to trust them as they easily miss information and can often be out of date.

    Then I’d ask the question: does the PM, or whoever, actually want UML? Or do they just want to tell a story, and they’re just used to being given that kind of diagram by techies?

    Just this week I was asked by a PM to tidy up what was a pretty awful diagram. Looking for a better way, I actually found SmartArt in MS Word, of all places.

    Fill up your diagram with information, then you can select from nearly 50 different ways to present or slice the information. Not all will work, but perhaps a few will.

    He loved it; all he actually wants is something pretty to put on a slide that he can talk over to the stakeholders.

    Makes me wonder if there is a mainstream role for UML anymore.

  2. Federico says:

    @Nicholas I agree that diagrams are often out of date, so people do not trust them and do not maintain them, and they become even more out of date. I think that in some cases it would help if the process for updating them were easier. Ideally I would like to be able to click on a diagram and be redirected to an editor where I can change it on the spot.

    Of course it works only for useful diagrams, not for diagrams written just to make a PM happy. In my case ER diagrams and sequence diagrams are pretty useful and we use them to support discussions with other engineers. In other cases I had to build totally useless and trivial state diagrams only because they were “required”: they had no real value and no one is going to maintain them.

    @Angelo thank you for pointing me to this plugin. I have not used NetBeans in a while, but I could take a look at it for “inspiration” and maybe improve my web-browser based syntax highlighter.

  3. Samuel says:

    I think wikis are an ideal platform for this kind of documentation. At least Confluence and Dokuwiki have PlantUML plugins, which means that you don’t have to generate an image, and the PlantUML markup is right in the document and versioned with the text. It works pretty well for us. I know Dokuwiki doesn’t have highlighting or autocompletion because its text editor is very basic, but for me the image generation and the requirement to install a tool were the biggest pain points in your old process. If creating the diagram takes a little more time because there is no syntax highlighting, so be it.

  4. Federico Tomassetti says:

    Hi Samuel, I did not know about that. Indeed it seems a great approach. The only advantage I can see in having standalone scripts is if you want to reuse them in other places or to generate code from them (I would generate code from DSLs, not from UML diagrams, but that is just my personal taste).

    About syntax highlighting, yes there are things definitely more important. However I find the PlantUML grammar not so intuitive so it helps me having some help from the editor.

    Thank you for your comment, I learned something!

  5. says:

    I use puml-mode for emacs at work. It works great and displays the rendered diagram every time you save the file. I’m also planning to parse the diagrams from in source documentation to allow pull request reviewers to enforce keeping the documentation up to date.

  6. Andre says:

    “The source code is the documentation” but UML is good on architecture/higher level planning/drafting. Many documentation generators (autodoc, doxygen, …) can create UML(-like) diagrams on class level and on the fly today (without much use to me). For (quality) oversight there are other tools too, e.g., the dependency structure matrix (lattix, pfff, intellij, …).

    Handcrafted UML is useful before any code, to gain a good understanding of a problem domain as requisite to a good solution (“problem before solution”), that is to say, UML can work as a learning tool – like “sketchnoting” for the non-UML-folk, when working through books or lectures, talking with experts etc.

    A (semi)formal visual language is more scannable and reveals knowledge gaps better than masses of linear text; you have to be concise, etc. (Abusing) role names in association relationships can improve expressivity. In this process (giving structure to your knowledge) you learn and remember things better.

    Codewise: 10 years ago or so I actually drew UML diagrams for my classes, later I only documented “patterns” or the principles. IIRC even Grady Booch said that when a project is finished you have to throw all your UML away. Today, I only document high level decisions in a (meta) architecture diagram, the architecture language.

    UML is not a silver bullet. There are other notations like ISTAR (i*), and projects need more things discussed/documented, e.g., priorities and risks (I use matrices). But UML (or visual drafting and documentation at the right level) should feel helpful; you don’t have to force it upon people.

Trackbacks & Pingbacks

  1. […] your page. I have used myself both CodeMirror and ACE in the past. For example I wrote a plugin for CodeMirror to support PlantUML. However there is an issue with these editors: they are difficult to extend and difficult to […]

  2. […] Last week I started discussing how I am working on improving my approach to diagrams generation, so that it could become acceptable for a Software Engineer. […]
