An approach to UML diagrams and ER models bearable for a Software Engineer

As part of my current job at Groupon I have to create diagrams: those nice pictures which make project managers happy. I write basic UML diagrams (State diagrams and Activity diagrams) together with Entity-Relationship diagrams (yes, the ones for the DB).

[Images: an example UML diagram and an example ER diagram]

Yes, people want these pictures and I have to create them

What Is Wrong with the Previous Process

I am a Software Engineer and I understand the importance of communication, so I understand how useful diagrams can be. However, I have to confess that I am always a bit suspicious when I interact with people who are too fond of them: I am afraid of dealing with people who like to spend endless time discussing things, pretending to be able to build something and generally just wasting time. I am an Engineer: I like building things, not just talking about building things.

On the other hand, systems evolve and diagrams can easily end up outdated. One thing that makes this problem harder to fix is that you normally need a specific tool to create a diagram, so when you need to update a diagram you have to install the right tool, start it, generate your image and update the document.

I do not like this particular process and I would really love to improve it.

My Current Process

I prefer text formats to nice WYSIWYG editors, because:

  • text is portable, while each WYSIWYG editor tends to have its own format
  • you can easily compare and merge text files
  • you do not waste endless time trying to convince the editor to do what you want or exploring all the menu items

So if I need to write diagrams I use text formats to describe them and then I generate the actual pictures from those text files. I feel the process is more controllable, repeatable and versionable, and other people still get their pretty pictures.

Currently I am using plantuml for UML diagrams and erd for ER diagrams. Erd wins extra points because it is written in Haskell. There is also a nice website that offers a web editor for PlantUML: it is named PlantText.

Now, this solution has problems:

  • you still need to install software, at least for the ER diagrams (you can generate the UML diagrams using the PlantText website)
  • there are no nice editors supporting the DSLs used to describe these diagrams
  • there is no integration between the web editor and my GitHub repository
  • you need to update the images in the documents after having generated them

The Ideal Process

To fix the problems of the current process I would love to have a web application to edit the diagrams, and have this web application talk to my GitHub repository, doing the versioning for me. I would also like this web application to generate the images on the fly, and my documents to support links to images exposed on the web. That would be great for two reasons:

  1. I would not have to update all the documents containing a diagram when the diagram changes. The problem is that many documents take a copy of the image, not a reference. The other problem is that the server hosting the diagrams needs to be always up.
  2. I would know where to find the source of a diagram. I imagine for example that we could have an image available at, let’s say, http://diagrams.foo.com/diagram1.png and the web application to edit it at http://diagrams.foo.com/diagram1.png/edit.
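The URL convention in point 2 can be sketched in a few lines. This is only an illustration of the idea: the function names are made up, and the "/edit" suffix is simply the convention imagined above.

```javascript
// Hypothetical convention from the text: the editor lives at "<image URL>/edit".
function editUrlFor(imageUrl) {
  const url = new URL(imageUrl);
  // Strip a trailing slash so we never produce "//edit".
  url.pathname = url.pathname.replace(/\/+$/, "") + "/edit";
  return url.toString();
}

// The reverse mapping: from the editor URL back to the image it renders.
function imageUrlFor(editUrl) {
  const url = new URL(editUrl);
  url.pathname = url.pathname.replace(/\/edit\/?$/, "");
  return url.toString();
}
```

With such a mapping, anyone looking at an image embedded in a document can find its source just by appending "/edit" to the address.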

It would be fantastic to have a process to commit changes and a git hook to generate the images, maybe even updating the existing documents.
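A rough sketch of what such a hook could compute, assuming (and these are assumptions, not part of my setup) that PlantUML sources end in ".puml", erd sources end in ".er", and plantuml.jar sits in the repository root. The functions only build the commands; they do not run them.

```javascript
// Assumed conventions: ".puml" for PlantUML sources, ".er" for erd sources,
// and plantuml.jar available in the working directory.
function renderCommand(sourcePath) {
  if (sourcePath.endsWith(".puml")) {
    // By default PlantUML writes the image next to the source file.
    return ["java", "-jar", "plantuml.jar", "-tpng", sourcePath];
  }
  if (sourcePath.endsWith(".er")) {
    const output = sourcePath.replace(/\.er$/, ".png");
    return ["erd", "-i", sourcePath, "-o", output];
  }
  return null; // not a diagram source: nothing to regenerate
}

// Keep only the commands for files that are diagram sources.
function renderCommands(changedFiles) {
  return changedFiles.map(renderCommand).filter((cmd) => cmd !== null);
}
```

A hook script could feed the list of changed files to renderCommands and hand each resulting array to something like Node's child_process.execFileSync(cmd[0], cmd.slice(1)).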

What I Started Doing: Syntax Highlighting for PlantUML

Now, I am still far from having the ideal process in place and probably I will never get there: the effort for implementing it and the changes required to the current process would not justify it. However, I am starting to take some steps in that direction. In particular I am focusing on improving the web editor for UML by implementing syntax highlighting.

I have implemented Syntax Highlighting for a large part of PlantUML for the CodeMirror web editor. The code is available on GitHub and I have sent a pull request to plantuml-server.

Writing the Syntax Highlighting mode could resemble writing a grammar; in fact my first thought was to write the grammar for ANTLR and then implement an automatic conversion from EBNF to a CodeMirror mode. However, the goals of a grammar and of a Syntax Highlighting system are different. The former is intended to parse correct files and stop when it finds errors (only very good grammars have strong error handling and are able to recover from a few errors), while a system for syntax highlighting works on a document that is wrong all the time: as you type the document is incorrect, and only when you complete your statement is the document correct, until you type the next character and the document is wrong again. Syntax highlighting systems need to be very robust and tolerate a lot of errors.
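To make the difference concrete, here is a minimal sketch of a token function that never gives up. It is not the mode I wrote: LineStream is a tiny stand-in for the stream object CodeMirror passes to a mode, and the rules are reduced to a handful for illustration.

```javascript
// Minimal stand-in for the stream CodeMirror hands to a mode
// (only the methods used here; the real one has many more).
class LineStream {
  constructor(text) { this.text = text; this.pos = 0; }
  eol() { return this.pos >= this.text.length; }
  next() { return this.text.charAt(this.pos++); }
  // Match a regex at the current position and consume it on success.
  match(regex) {
    const m = this.text.slice(this.pos).match(regex);
    if (!m || m.index !== 0 || m[0].length === 0) return null;
    this.pos += m[0].length;
    return m;
  }
}

// Hypothetical, heavily reduced token function: it never throws and always
// advances, even when it has no idea what it is looking at.
function token(stream) {
  if (stream.match(/^\s+/)) return null;
  if (stream.match(/^(class|interface|package)\b/)) return "keyword";
  if (stream.match(/^[A-Za-z_][A-Za-z_0-9]*/)) return "variable";
  if (stream.match(/^[{}]/)) return "bracket";
  stream.next(); // unknown character: swallow it and carry on
  return null;
}

// Tokenize one line into [style, text] pairs.
function highlight(line) {
  const stream = new LineStream(line);
  const tokens = [];
  while (!stream.eol()) {
    const start = stream.pos;
    const style = token(stream);
    tokens.push([style, line.slice(start, stream.pos)]);
  }
  return tokens;
}
```

Calling highlight on every prefix of a line such as "class Car <<Ser" always returns some tokens: "cla" is just a variable until the final letters turn it into a keyword, and the dangling "<<" is left unstyled instead of being reported as an error.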

This is a random piece of the mode I have defined.

} else if (state.name === "stereotype"){
    if (stream.match(/\(/)) {
        state.name = "stereotype style";
        return null;
    }
    if (stream.match(/[A-Za-z][A-Za-z_0-9]*/)) {
        return "variable";
    }
    if (stream.match(/>>/)) {
        state.name = state.old_state;
        return null;
    }
    if (stream.match(/,/)) { return null; }
} else if (state.name === "stereotype style"){
    if (stream.match(/\)/)) {
        state.name = "stereotype";
        return null;
    }
    if (stream.match(/[A-Z]/)) {
        return "string";
    }
    if (matchColors(stream, state)){
        return "atom";
    }
    if (stream.match(/[,]+/)) {
        return null;
    }
} else if (state.name === "class def"){
    if (stream.match(/[\t ]+/)) {
        return null;
    }
    if (stream.match(/\}/)) {
        state.name = "base";
        return "bracket";
    }
    if (stream.match(/\.\./)) {
        state.name = "class def section";
        return "operator";
    }
    if (stream.match(/==/)) {
        state.name = "class def section";
        return "operator";
    }
    if (stream.match(/--/)) {
        state.name = "class def section";
        return "operator";
    }
    if (stream.match(/__/)) {
        state.name = "class def section";
        return "operator";
    }
    if (stream.match(/\+|-|#|~/)) {
        if (isMethod(stream)) {
            state.name = "class def method";
        } else {
            if (hasTypeAfter(stream)) {
                state.name = "class def attribute (type after)";
            } else {
                state.name = "class def attribute";
            }
        }
        return "attribute";
    }

 

Now, the basic idea is that you have a state machine whose states start from start and go through things like class def or stereotype style. Depending on the state you interpret tokens differently. The point is that you should keep the number of states very limited. Remember, you want your Syntax Highlighting system to be robust and to provide some reasonable output as the user types, so your parser will not be as refined as the parser you would write for a compiler. You will end up with a few states instead: so few that it makes sense to define them manually, with no need for the parser generators we would use for a compiler, and each of them should have a human-comprehensible meaning.

Note that CodeMirror also provides a library to test your mode, and I really appreciate that. These are a few of my tests:
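The shape of such a hand-written state machine can be sketched with a small rule table. This is a reduced illustration, not the actual mode: the three states and their rules are chosen only for the example, and the whole text is tokenized in one go instead of line by line as CodeMirror does.

```javascript
// Hypothetical, reduced rule table: each state lists [pattern, style, nextState].
// A rule without nextState stays in the current state.
const RULES = {
  "base": [
    [/^class\b/, "keyword", "class name"],
    [/^\{/, "bracket", "class def"],
    [/^[A-Za-z_][A-Za-z_0-9]*/, "variable"],
  ],
  "class name": [
    [/^[A-Za-z_][A-Za-z_0-9]*/, "def", "base"],
  ],
  "class def": [
    [/^\}/, "bracket", "base"],
    [/^[+\-#~]/, "attribute"],
    [/^[A-Za-z_][A-Za-z_0-9]*/, "variable"],
  ],
};

// Find the token starting at pos, given the current state.
function nextToken(text, pos, state) {
  const rest = text.slice(pos);
  const space = rest.match(/^\s+/);
  if (space) return { style: null, length: space[0].length, state };
  for (const [pattern, style, nextState] of RULES[state]) {
    const m = rest.match(pattern);
    if (m) return { style, length: m[0].length, state: nextState || state };
  }
  // No rule applies: skip one character and stay in the same state.
  return { style: null, length: 1, state };
}

// Tokenize the whole text and report the state reached at the end.
function tokenize(text, state = "base") {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    const t = nextToken(text, pos, state);
    tokens.push({ style: t.style, text: text.slice(pos, pos + t.length) });
    pos += t.length;
    state = t.state;
  }
  return { tokens, state };
}
```

The same identifier is styled as a def right after the class keyword and as a variable inside the braces, which is all "depending on the state" means here. With only three states the table can be read and debugged by hand, and a half-typed class body simply leaves the machine in "class def".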

  MT("static class methods",
    "[keyword class] [def Car] [bracket {]",
    "  [operator ..][string Method Examples ][operator ..]",
    "  [attribute +][keyword {static}] [def Name][operator ():] [variable Type] [bracket {] [variable arg1], [variable arg2], [variable argn] [bracket }]"
  );

  MT("abstract class methods",
    "[keyword class] [def Car] [bracket {]",
    "  [operator ..][string Method Examples ][operator ..]",
    "  [attribute +][keyword {abstract}] [def Name][operator ():] [variable Type] [bracket {] [variable arg1], [variable arg2], [variable argn] [bracket }]"
  );

  MT("interfaces examples",
    "[keyword class] [def Car]",
    "[variable ICar] [operator ()-] [variable Car]",
    "[variable ICar2] [operator ()--] [variable Car]",
    "[variable Car] [operator -()] [variable ICar3]"
  );

  MT("node package",
    "[keyword package] [def Node] <<[builtin Node]>> [bracket {]",
    "    [keyword class] [def Worker1]",
    "[bracket }]"
  );

Consider the first test: it says that the first word (class) should be recognized as a keyword while the second (Car) should be recognized as a definition (or def).

The only problem with writing this code is that the plantuml grammar is… suboptimal. It is used for a lot of different types of diagrams and it is not so clear to me. I would definitely not suggest it as an example of a well-designed DSL.
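To show how the notation reads, here is a simplified parser for one expectation line. It is an illustration of the format only, not the code of CodeMirror's test library: "[style text]" is read as a token with that style, and anything outside brackets as unstyled text.

```javascript
// Simplified reading of the test notation: "[style text]" marks a styled token,
// everything outside the brackets is expected to have no style.
function parseExpectation(line) {
  const expected = [];
  const pattern = /\[(\w+) ([^\]]*)\]|([^\[]+)/g;
  let m;
  while ((m = pattern.exec(line)) !== null) {
    if (m[1] !== undefined) {
      expected.push({ style: m[1], text: m[2] });
    } else {
      expected.push({ style: null, text: m[3] });
    }
  }
  return expected;
}

// Joining the token texts gives back the source line the mode has to highlight.
function sourceOf(line) {
  return parseExpectation(line).map((t) => t.text).join("");
}
```

For "[keyword class] [def Car] [bracket {]" the source line is "class Car {", and the expectation is that class is a keyword, Car a def and the brace a bracket, with unstyled spaces in between.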

What I Want to Do in the Future

Once I am finished with the syntax highlighting I want to implement auto-completion. This would make it much easier for me to write UML diagrams: currently I always have to look up examples to figure out how to do things. Some support from the editor would help greatly. It would also be fantastic to have error reporting as you type, but that could be a bit more complicated to build.
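The simplest form of auto-completion is prefix matching on keywords. A minimal sketch, assuming a hand-picked and far from complete keyword list; wiring it into the editor (for instance through CodeMirror's show-hint addon) is left out.

```javascript
// Hypothetical keyword list: a handful of PlantUML keywords, far from complete.
const KEYWORDS = ["abstract", "class", "enum", "interface", "package"];

// Return the keywords that could complete the word ending at the cursor.
function completions(line, cursor) {
  const before = line.slice(0, cursor);
  const word = before.match(/[A-Za-z]*$/)[0];
  if (word === "") return [];
  return KEYWORDS.filter((k) => k.startsWith(word) && k !== word);
}
```

A real implementation would look at the current state of the mode as well, so that it suggests visibility markers inside a class body and keywords at the top level.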

The next step is to write a web application around the erd program. I started creating the project (erd-web-server), let’s see when I can find the time for playing a little more with Haskell…

Once I have done that I will work on the GitHub integration. I would like to access the diagrams in my projects and to generate the images as part of a git web hook.

So there is plenty of room for improvement, and even an engineer can have fun with diagrams, especially building the tool chain around them.
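The first thing such a web hook would need is the list of diagram sources touched by a push. A small sketch, assuming the payload of a GitHub push event (a commits array whose entries list added and modified paths) and assuming, again, the ".puml" and ".er" extensions.

```javascript
// Assumed source extensions; the payload shape (commits[].added / .modified)
// is the one GitHub sends for push events.
const DIAGRAM_SOURCE = /\.(puml|er)$/;

// Collect the diagram sources added or modified by a push, without duplicates.
function changedDiagramSources(pushPayload) {
  const changed = new Set();
  for (const commit of pushPayload.commits || []) {
    for (const path of [...(commit.added || []), ...(commit.modified || [])]) {
      if (DIAGRAM_SOURCE.test(path)) changed.add(path);
    }
  }
  return [...changed].sort();
}
```

The server receiving the hook would then regenerate only the images for those files.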