The complete guide to (external) Domain Specific Languages

This guide will show you:

  • the what: after a definition we will look into 19 examples of DSLs
  • the why: what are the concrete benefits you can achieve using DSLs
  • the how: we will discuss the different ways to build a DSL and what the success factors are

After that you will get a list of resources to learn even more: books, websites, papers, screencasts.
This is the most complete resource on Domain Specific Languages out there. I hope you will enjoy it!

Just one thing: here we try to be practical and understandable. If you are looking for formal definitions and theoretical discussions this is not the right place.

What are Domain Specific Languages?

Domain Specific Languages are languages created to support a particular set of tasks, as they are performed in a specific domain.

You may be familiar with typical programming languages (a.k.a. General Purpose Languages or GPLs). They are tools good enough to create all sorts of programs, but not really specific to anything. They are like hammers: good enough for many tasks, if you have the patience and ability to adapt them, but in most cases you would be better off using a more specific tool. You can open a beer with a hammer, but it is way more difficult and risky, and it leads to poorer results than using a specific tool like a bottle opener.

Languages are tools to solve problems and Domain Specific Languages are specific tools, good to solve a limited set of problems.

Ok, but what does it mean in practice? What do DSLs look like? Let’s see plenty of examples.

19 Examples of Domain Specific Languages

Domain Specific Languages can serve all sorts of purposes. They can be used in different contexts and by different kinds of users. Some DSLs are intended to be used by programmers, and therefore are more technical, while others are intended to be used by someone who is not a programmer and therefore use less geeky concepts and syntax.

Domain Specific Languages can be extremely specific and be created only for use within a single company. I have built several DSLs of this kind myself, but I am not allowed to share them. I can instead list several examples of public DSLs which are used by millions of people.

1. DOT – A DSL to define graphs

DOT is a language that can describe graphs, either directed or undirected.
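
To give a rough idea of what a DOT description looks like, here is a minimal sketch that emits one from a list of edges. It is not the original example of this guide: the graph, the node names and the `to_dot` helper are all invented for illustration.

```python
def to_dot(name, edges, directed=True):
    """Return DOT source for a graph given as (source, target) pairs."""
    keyword = "digraph" if directed else "graph"
    arrow = "->" if directed else "--"
    lines = [f"{keyword} {name} {{"]
    for source, target in edges:
        lines.append(f"    {source} {arrow} {target};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example graph: a tiny processing pipeline.
dot_source = to_dot("pipeline", [("parse", "validate"), ("validate", "generate")])
print(dot_source)
```

If you save the printed text in a file such as `pipeline.dot` and have Graphviz installed, a command like `dot -Tpng pipeline.dot -o pipeline.png` should turn it into an image.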

From this description, images representing these graphs can be generated. To do that you use a program named Graphviz, which works with the DOT language. From the previous example you would get this:

An image generated using the dot DSL

The language also lets you define the shape of the nodes, their colors and many other characteristics. But the basics are pretty simple and almost everyone can learn how to use it in a matter of minutes.

2. PlantUML – A DSL to draw UML diagrams

PlantUML can be used to define UML diagrams of different kinds. For example we can define a sequence diagram.

From this definition we can get a picture.

Image generated with the PlantUML DSL

With a similar syntax different kinds of diagrams can be defined, like class diagrams or use case diagrams. Using a textual DSL to define UML diagrams has several advantages: it is easier to version, and it can be modified by everyone without special tools. A DSL like this one could be used during a meeting to support a discussion: as the participants argue, different diagrams can be quickly defined and the corresponding images generated with a click.

3. Sed – A DSL to define text transformation

On UNIX-like operating systems (Linux, Mac, BSD, etc.) there is a set of command line tools, each one accepting instructions in its own format. This format can be considered a DSL that lets you specify the tasks to be executed. For example, sed executes the text transformations indicated using its own DSL.

Do you want to replace the word “Jack” with the word “John”?

Or do you want to delete all the lines of a file from line 10 until the word “stophere” is found?
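
Both tasks are one-liners in sed: the scripts `s/Jack/John/g` and `10,/stophere/d` are standard sed syntax (they are my own illustration, not the listings originally shown here). To appreciate how much the DSL compresses, here is a rough sketch of the same two operations in a general purpose language; the function names are invented.

```python
import re


def replace_jack(text):
    """Roughly what `sed 's/Jack/John/g'` does: replace every occurrence."""
    return re.sub(r"Jack", "John", text)


def delete_from_line_10_until_stophere(lines):
    """Roughly what `sed '10,/stophere/d'` does.

    Deletes line 10 and every following line, up to and including the first
    later line that contains "stophere" (or up to the end if none does).
    """
    kept = []
    deleting = False
    for number, line in enumerate(lines, start=1):
        if number == 10:
            deleting = True       # the range opens at line 10
            continue
        if deleting:
            if "stophere" in line:
                deleting = False  # the range closes after this line
            continue
        kept.append(line)
    return kept


# Hypothetical input: twelve numbered lines, the marker, and a trailing line.
sample = [f"line {n}" for n in range(1, 13)] + ["stophere", "tail"]
print(delete_from_line_10_until_stophere(sample))
```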

The technical level needed to become proficient with this kind of DSL is high. However, many advanced computer users could learn the basics to perform common operations on files: small things that they need to do every day and that they currently do manually. For example, you could have e-mail templates containing placeholders like “{FIRST_NAME}” and replace them with the proper text using one of these commands.

4. Gawk – A DSL to print and process text

Like sed, gawk is another UNIX utility accepting commands in its own language. For example you could print all the lines of a given file which are longer than 80 characters:

Or count the lines in a file:
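
In awk terms these two tasks can be written as the programs `length($0) > 80` and `END { print NR }` (again my own illustration of standard awk syntax, not the original listings). A sketch of the equivalent logic in a general purpose language, with invented function names, shows what the DSL is doing for you:

```python
def long_lines(lines, limit=80):
    """Roughly `gawk 'length($0) > 80' file`: keep lines longer than limit."""
    return [line for line in lines if len(line.rstrip("\n")) > limit]


def count_lines(lines):
    """Roughly `gawk 'END { print NR }' file`: number of records read."""
    return sum(1 for _ in lines)
```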

The UNIX philosophy is to use several of these little utilities and combine them to perform the most amazing and complex tasks. For example, you could take an input file, transform it with sed and then print selected parts using gawk.

5. Gherkin – A DSL to define functional tests

Gherkin is a DSL for defining functional tests. It has a very flexible syntax that makes it look almost like free text. Basically developers, analysts and clients can sit around a table and define some scenarios. These scenarios will then be executable as tests, to verify whether the application meets the expectations.
Here is how we could define the expectations for withdrawing from an ATM:

I really like this DSL because the bar for using it is very low. This DSL, however, requires a developer to write some code using a GPL. In practice, a developer defines specific commands like “{name} has {amount}$ on his account” and writes the code that executes each command in the GPL chosen for the project (Ruby, Java, and others are supported). Once the developers have created these commands, specific to the application of interest, all users can use them while defining their functional tests. It is also possible to work the other way around: first you write your scenarios as you want, trying to capture the requirements, and only later developers map each command to a corresponding function in a GPL.
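
The mapping between sentences and code can be pictured as a registry of patterns. The sketch below is a toy version of that idea and not the API of Cucumber or any other Gherkin tool: the `step` decorator, the `run_scenario` function, the scenario text and the account data are all hypothetical.

```python
import re

STEPS = []  # (compiled pattern, function) pairs registered by developers


def step(pattern):
    """Register a function as the implementation of a sentence pattern."""
    def register(func):
        STEPS.append((re.compile(pattern + r"$"), func))
        return func
    return register


@step(r"(\w+) has (\d+)\$ on his account")
def set_balance(ctx, name, amount):
    ctx["balances"][name] = int(amount)


@step(r"(\w+) withdraws (\d+)\$")
def withdraw(ctx, name, amount):
    ctx["balances"][name] -= int(amount)


@step(r"(\w+) should have (\d+)\$ left")
def check_balance(ctx, name, amount):
    assert ctx["balances"][name] == int(amount), f"{name} has the wrong balance"


def run_scenario(text):
    """Execute each line, ignoring the leading Given/When/Then keyword."""
    ctx = {"balances": {}}
    for line in text.strip().splitlines():
        sentence = re.sub(r"^\s*(Given|When|Then|And)\s+", "", line)
        for pattern, func in STEPS:
            match = pattern.match(sentence)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise ValueError(f"No step defined for: {sentence!r}")
    return ctx


scenario = """
Given John has 100$ on his account
When John withdraws 30$
Then John should have 70$ left
"""
result = run_scenario(scenario)
```

The people writing scenarios only ever see the three sentences at the bottom; the functions above them are the part that a developer provides once.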

In other words, this DSL is great for hiding the real code behind a surface that everyone can understand and everyone can contribute to. It is much better to sit at a table and discuss with a bank representative using the example we have displayed than showing him the hundreds of lines of Java which correspond to those commands, right?

6. Website-spec – A DSL for functional web testing

Gherkin is not the only DSL used to define tests. Website-spec can be used to define functional tests specific for web applications.
Here we define how to navigate on a certain website and what we expect to find.

In this case there is no need for a developer to define the translation of commands to a GPL because this language uses domain specific commands, like “Click”, which the interpreter knows how to execute. With minimal training anyone can describe a specific interaction with a website and the expected results. Pretty neat, eh?

7. SQL – databases

You have probably heard of SQL. It is a language used to define how to insert, modify or extract data from a relational database. Let’s get some stats from the STATS table:

For sure you do not expect the average Joe to be able to write complex queries: SQL is not a trivial language and it requires some time to be mastered. However, you do not need to be trained as a developer to learn SQL. Indeed, many DBAs are not developers. Maybe Joe should not be trusted with write access to the database, but he could get read access and write simple queries to answer his own questions instead of having to ask someone and wait for an answer. Suppose he needs to know the maximum temperature in August in Atlanta:
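
A query of that kind could look like the one in the sketch below, which runs it through Python's built-in sqlite3 module. The table layout (`city`, `month`, `temp_f`) and the sample rows are assumptions made up for this example, not the original STATS table.

```python
import sqlite3

# Hypothetical schema and data, made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (city TEXT, month INTEGER, temp_f REAL)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [
        ("Atlanta", 7, 91.5),
        ("Atlanta", 8, 93.0),
        ("Atlanta", 8, 88.2),
        ("Denver", 8, 95.1),
    ],
)

# The question Joe wants answered, expressed as a single query.
query = "SELECT MAX(temp_f) FROM stats WHERE city = ? AND month = ?"
(max_temp,) = conn.execute(query, ("Atlanta", 8)).fetchone()
print(max_temp)
```

The only part Joe has to write is the one-line `SELECT`; everything else is setup needed to make the sketch self-contained.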

Maybe Joe will never reach the level of a DBA, but he can learn a few basic queries and adapt them to his needs, making him more independent and letting his colleagues focus on their job instead of helping him out.

8. HTML – web layout

I really hope you have heard of this quite successful language to define documents. It is amazing to think that we could have defined HTML pages 20 years ago, when most people had desktop computers attached to monitors with a resolution of 640×480 pixels, and now those same pages can be rendered in the browsers running on our smartphones. I guess it is a good example of what can be achieved with DSLs.

Note that HTML is really about defining documents: their structure and the information they contain. The same document is then rendered differently on a desktop computer, a tablet or a smartphone. The same document is consumed differently by people with disabilities. Specific browsers for people with impaired sight help them consume a document defined with HTML by reading the content aloud and supporting navigation to the different sections of the document.

9. CSS – style

The Cascading Style Sheet language defines the style to use to visualize a document. We can use it to define how an HTML document will appear on the screen or how it will appear when printed.

CSS is not trivial to master, but many people with basic or no knowledge of programming can use it to change the appearance of a web page. This DSL has played an important role in democratizing web design.

10. XML – data encoding

Some years ago XML seemed to be the solution to all problems in IT. Now the hype is long gone, but XML is here to stay. It is a solid DSL to represent data, and quite a flexible one.

While it is not the most readable or impressive language everyone is able to modify the data contained in an XML file.

11. UML – visual modeling

Not all DSLs have to be textual! Languages can also be graphical. For example, the Unified Modeling Language is a language with well defined rules. It can be used to define diagrams that are typically used to support discussions. Some people also use them to generate code or even to define an entire application (look for Model Driven Architecture, if you are interested in this kind of stuff).

UML: an example of DSL

UML is a vast language (did someone say bloated?). There are many different kinds of diagrams gathered under the UML umbrella. All of them share some commonalities.
Not everyone would agree that UML is a DSL. While it is definitely a language, some would say it is not domain specific but generic, with domain specificities added by means of UML profiles. I consider it instead a language specific to modeling. This is one case that demonstrates that there is no hard and fast rule to define what is a DSL and what is not, mostly because domain is a difficult term to define.

12. VHDL – hardware design

VHDL is a DSL used to define circuits. Once upon a time electronic engineers used to design complex systems by directly deciding which gates to use and how to wire them together. VHDL changed all of this, providing higher level concepts that those engineers can use to define their systems.

Example taken from Wikipedia.

There are tools able to process these definitions to derive actual circuit layouts, ready to be printed. Verilog is another DSL similar to VHDL.

13. ANTLR – lexer and parser definitions

ANTLR comes with its own DSL to define lexer and parser grammars. Those are instructions for recognizing the structure of a piece of text.
For example this is a small snippet of a lexer grammar:

JavaCC, Lex, Yacc and Bison are similar tools, and each comes with its own slightly different DSL, inspired by the Backus-Naur form.

14. Make – build system

Make is a language to describe how to build something and the dependencies between different steps. For example, you can define how to generate an executable and specify that to do so you first need 3 object files. Then you can define, for each of those object files, how to obtain it from a corresponding source file.

In this example we specify that to create the program myExecutable we will need the object files, and once we have them, we will use gcc to link them together.
We can also define some constants at the top of the file, so it is easy to change the Makefile later, if we need to.

15. LaTeX – document layout

LaTeX is used a lot, in academia and in the publishing industry, to produce gorgeous looking papers and books.

Once you have described a document in this format you typically generate a PDF.

It is quite nice because it can handle references to figures or tables and number them automatically, and it lets you control the layout of tables in very complex ways. If you are proficient with LaTeX you can get pretty nice results.

16. OCL – model constraints

OCL stands for Object Constraint Language and it can be used to define additional constraints on objects. Typically it is used together with UML.
For example, if you have a class Appointment with two properties start and end you may want to specify that the end of the appointment follows its start:
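
In OCL a rule like this is typically written as an invariant, along the lines of `context Appointment inv: self.end > self.start`. To show what such a constraint means operationally, here is a small sketch that enforces the same rule in a general purpose language; the dataclass and the error message are my own illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Appointment:
    start: datetime
    end: datetime

    def __post_init__(self):
        # Invariant: the end of the appointment must follow its start.
        if not self.end > self.start:
            raise ValueError("Appointment must end after it starts")
```

The OCL version states only the rule, while here the rule is mixed with the mechanics of when and how to check it.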

You can also define preconditions and postconditions for your operations or invariants that apply to classes.
If you are interested in this sort of stuff you may also want to look into QVT, a set of languages to define model transformations.

17. XPath – XML nodes selection

XPath can be used to select nodes in XML documents. For example, suppose you have a document representing a list of restaurants and you want to get the last restaurant:
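
A minimal sketch of that selection, using the subset of XPath supported by Python's standard `xml.etree.ElementTree` module. The document content is invented for the example; in full XPath the expression would be `/restaurants/restaurant[last()]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, made up for this sketch.
document = """
<restaurants>
  <restaurant><name>Da Mario</name></restaurant>
  <restaurant><name>Blue Door</name></restaurant>
  <restaurant><name>Last Stop</name></restaurant>
</restaurants>
""".strip()

root = ET.fromstring(document)
# The path is relative to the root element <restaurants>.
last = root.find("restaurant[last()]")
print(last.find("name").text)
```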

XPath expressions are used in XSLT to define which elements to transform, or they can be used in combination with many libraries for all sorts of languages to define which elements to extract from a document. If you wonder what XSLT is, it is a language to define transformations of XML documents.

18. BPEL – Business processes

BPEL is a language to define the collaboration between web services to implement business processes. It used to be more popular when the world was going through its Service Oriented Architecture (SOA) phase.

The goal of this language is to allow software architects, or even analysts, to combine different web services, or other components, to obtain complex systems.

Example from Eclipse BPEL (https://eclipse.org/bpel/)

There are different implementations of the language, each one with its own extensions and mostly incompatible with the others.

19. Actulus Modeling Language – A DSL to calculate life insurance and pensions

This example is taken from the paper “An Actuarial Programming Language for Life Insurance and Pensions” by David Christiansen et al.
This DSL is listed in the Financial Domain-Specific Language Listing where you can find many more similar examples.


So what can we use DSLs for?

After looking at these examples we can derive that DSLs can be used for a variety of goals:

  • define commands to be executed: like sed or gawk
  • describe documents or some of their specific aspects: like HTML, LaTeX or CSS
  • define rules or processes: like BPEL or Actulus

These are just some typical usages, but DSLs can be used for so many other reasons.

It is also important to notice that DSLs focus on one specific aspect of a system and often it makes sense to combine several of them to describe the different facets of a product. For example, HTML and CSS are used together to describe the content and the style of a document, while XML and XPath are used to define data and how to traverse that data. OCL is used to define constraints on UML models.

You could think of designing not one DSL but a family of interrelated DSLs.

Ok, at this point you should have some understanding of what a DSL can look like and what it can be used for. Now let’s see why you should use one and then how to build one.

DSLs vs GPLs: 5 Advantages of using Domain Specific Languages

You could be asking yourself the following question:

Why use a specific, limited language instead of a generic, powerful one?

The short answer is that Domain Specific Languages are limited in the things they can do, but because of their specialization they can do much more in their own limited domain.
The 5 Advantages of a DSL over a GPL: Easier to analyze, Safer, More meaningful errors, Easier to port and Easier to learn
Let’s be concrete and see five separate advantages:

  1. We can analyze them much better: while it is impossible in practice to guarantee that a program written in C or in Java will respect certain characteristics (like not ending in an infinite loop), we can perform all sorts of analyses when we use DSLs. Precisely because they are limited in what they can do, they are easier to analyze.
  2. They are safer. Fewer things can possibly go wrong when using a DSL. When is the last time you had a Null Pointer Exception when working with HTML or SQL? Exactly, never. This is very important if we are doing something critical like dealing with someone’s health or money.
  3. When there are errors, they are specific to the domain and therefore easier to understand: they are not about null pointers, they are about things that a domain expert can understand.
  4. Being limited also means that the interpretation is easier, so bringing them to a new platform is easy. The same applies to simulators. The same HTML documents we could open on a PDA in 2000 can now be opened on an iPad Pro.
  5. We can teach them more easily: they are limited in scope, so less time and less training are needed to master them, simply because there is less stuff to study.

Why Adopt Domain Specific Languages?

Ok, we have seen what Domain Specific Languages are and how they differ from GPLs. Now we should understand why we should consider adopting them. What are the real benefits?
Domain Specific Languages are great because:

  1. They let you communicate with domain experts. Do you write medical applications? Doctors do not understand when you talk about arrays or for-loops, but if you use a DSL that is about patients, temperature measures and blood pressure they could understand it better than you do.
  2. They let you focus on the important concepts. They hide the implementation or the technical details and expose just the information that really matters.

They are great tools to support reasoning on specific domains and all the other advantages derive from that. Let’s look at them in detail.

Communication with domain experts

In many contexts you need to build software together with domain experts who are not themselves developers.
For example:

  • You could build medical applications and need to communicate with doctors to understand the treatment a companion software should suggest
  • You could build marketing automation software. You would need the marketing people to explain to you how to identify clients matching a certain profile, to offer them a particular deal
  • You could build software for the automotive industry. You would need to communicate with the engineers to understand how to control the brakes
  • You could build software for accountants. You need to represent all the specific tax rules to apply in a given context and you would need an accountant to explain them to you

Now, the problem is that these domain experts do not have a background in software development, and developers and domain experts can communicate in very different ways, because they speak different languages.

Developers talk about software, while domain experts talk about their domain.

By building a DSL we build a language to communicate between developers and domain experts. This is not too dissimilar to what is described in Domain Driven Design. The difference here is that we want to create a language understood by developers, domain experts and also by the software that will be able to execute the instructions specified in the DSL.

Now, the holy grail would be to create a language, give it to domain experts and have them go away and write their queries or logic alone. In practice, DSLs usually do not achieve that but prove very useful anyway: a typical interaction consists of a domain expert describing what they want to a developer, who can immediately write down that description using a DSL. The domain expert can at this point read it and criticize it.

Typically non-developers do not have the analytical skills to formalize a problem, but they can still read it and understand it, if it is written in a DSL using a lingo familiar to them. Also, these tools could use simulators or run queries on the fly so that the domain expert can look not only at the code itself, but also at the result.

These kinds of interactions can have a very short turnaround in practice: code can be written during a meeting or within days. When using a GPL, the turnaround is typically measured in weeks at the very least, if not months or years.

By using DSLs you can very frequently:

  • have domain experts read or write high level tests, like requirements which are executable
  • get feedback at a very fast pace when doing co-development with domain experts

So the answer to the question:

Can domain experts (not programmers) write DSLs alone?

Is, of course, “it depends”. But DSLs definitely make it possible to involve domain experts in the development process, dramatically shortening feedback cycles and greatly reducing the risk of misalignment. You know the situation: developers talk a couple of times with the domain experts, then walk away and come back to show their solution. And those experts stare at the solution and declare it to be absolutely, completely, irremediably wrong. You can avoid that by using a DSL.
So it is important to understand these advantages and at the same time be realistic. You see, some decades ago there were enthusiasts suggesting that all sorts of people could write queries in SQL autonomously:

Forty years later it appears pretty clear that housewives are not going to write SQL.
The point is that many DSLs require formalizing processes in a way that demands significant analytical skills. Those skills are easily found in developers, but are not so common in people with a non-scientific background.

Focus and productivity

The fact that the DSLs abstract some technical details, to focus on the knowledge to capture, has important consequences.

On one hand, it makes the investment in code written using DSLs something that maintains its value over time. As technology changes you can change the interpreter processing the DSL code, but the DSL code can stay the same. An HTML page written 20 years ago can still be opened using devices that no one was able to imagine 20 years ago, even though the browsers have been completely rewritten multiple times in the meantime. In the same way, the logic captured in a DSL can be ported to new technologies.

I want to share a story about a company I have worked with. This company has created its own DSL to define logic for accounting and tax calculations. They started building this DSL 30 years ago and at that time they used to generate console applications. Yes, applications that run in consoles of 80×25 cells. I worked with them re-engineering the compiler and the same code of their DSL is now used to generate reactive web applications. How did this happen? Because the DSL captured only the logic, the really valuable part of the programs, an asset extremely important for the company. The technical details were abstracted in the compiler. So we just had to change the DSL compiler to preserve the value of the logic and make it usable in a more modern context.

This teaches us that:

Domain logic is what has value and should be preserved, while technology changes over time

By using a DSL we can decouple domain logic and technology and make them evolve separately.
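
As a minimal sketch of that decoupling: the domain logic below is captured once as plain data, and two interchangeable backends present it, one for a console and one for the web. The tax brackets and all the function names are invented for illustration and have nothing to do with the DSL of the company mentioned above.

```python
# Domain logic captured once, independent of any output technology.
# Hypothetical tax brackets: (upper limit of the bracket or None, rate).
TAX_RULES = [(10_000, 0.0), (40_000, 0.2), (None, 0.4)]


def tax_due(income, rules=TAX_RULES):
    """Interpret the rules: tax each slice of income at its bracket rate."""
    due, lower = 0.0, 0
    for upper, rate in rules:
        top = income if upper is None else min(income, upper)
        if top > lower:
            due += (top - lower) * rate
        if upper is None or income <= upper:
            break
        lower = upper
    return due


def render_console(income):
    """Backend 1: plain text, as an old console application would show it."""
    return f"INCOME: {income:>10.2f}\nTAX:    {tax_due(income):>10.2f}"


def render_html(income):
    """Backend 2: the same logic, presented as an HTML fragment."""
    return (
        f"<dl><dt>Income</dt><dd>{income:.2f}</dd>"
        f"<dt>Tax</dt><dd>{tax_due(income):.2f}</dd></dl>"
    )


print(render_console(50_000))
print(render_html(50_000))
```

Replacing `render_console` with `render_html` does not touch `TAX_RULES`: that is the part that keeps its value when the technology changes.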

Another advantage of hiding technical details is productivity. Think about the time spent thinking about deallocating memory or choosing which implementation of a list would perform best for the case at hand. That time has a poor ROI. With a DSL instead you just focus on the relevant parts of the problem and get it solved.

The typical (wrong) reasons against DSLs

There are a lot of developers out there thinking:

If my language is Turing complete I can do everything with it

Yes and no. You can tackle any problem, but you cannot necessarily:

  • write the most concise and clear solution to a problem
  • write the solution quickly
  • write a solution that is understandable
  • provide errors which are understandable
  • show the solution to domain experts and discuss it with them
  • provide tool support which is meaningful for the problem at hand
  • reuse the solution: if you write it in C you cannot later use that solution in other contexts, while with a DSL you can write the solution once and then build several code generators or interpreters
  • get the best performance: a DSL can in reality be faster because the generator can be specialized for a certain architecture

But people will have to learn another language

First of all, if a language is tailored for a specific domain, people who know that domain should find it much easier to learn the language, because it is about the concepts they are familiar with. It is true that learning has a cost, and this cost can be reduced with good tool support: editors that provide auto-completion, proper error messages and quick fixes can reduce the learning time. The possibility to easily obtain feedback, for example by simulating the results of the code just written, also helps. There is also the possibility of creating interactive tutorials.

But I would be locked into the DSL!

I have some shocking news: you are already locked into whatever programming language you are using to express the logic of your systems. If those languages stop evolving you will need to find your way out. How many companies are trapped by their Cobol, Visual Basic or Scheme codebases? The difference is that, if you are locked into a language you build and control, you can decide whether the language keeps evolving and whether the compiler can target a new environment (“let’s generate a web application instead of a console application!”). If you are locked into someone else’s language there is not much you can do about it.

A DSL will not be flexible enough

Designing a DSL requires clearly defining its scope. There will be things left out, things that you could occasionally want to do but that the DSL will not support. Why is that? Because a DSL has to be specific and limited: to support a set of tasks really well, others have to be left out. Striking the right balance is one of the greatest challenges you will face when designing a DSL, but if you get it right you should be able to cover all reasonable usages with your DSL. It is often possible to design a DSL to be extensible, supporting functionalities written in a GPL in case they are really, really needed. But this should be seen more as a way to reassure users, until they realize they would very rarely need it.

Building a DSL takes a huge effort

This is just not true, if you use the right approach. In the following section we are going to see different ways to build languages and all the necessary supporting tools with a reasonable effort.
You can find a list of other perceived obstacles to DSLs adoption here: Stumbling Blocks For Domain Specific Languages or you may want to read this paper I coauthored about benefits and problems with adopting modeling in general (it mostly applies to DSLs as well).

How to create Domain Specific Languages

Wonderful, you have seen why Domain Specific Languages are so cool and what benefits they can bring you.
Now there is only one question:

How do we build DSLs?

Let’s see how by looking at:

  • what tools you can use to build DSLs
  • what are the most important success factors
  • which skills do you need

What tools can we use to build Domain Specific Languages?

There are different ways to build a DSL. The goal here is to build a language, with tool support, while keeping the effort reasonable. We are not building the next Java or C# so we are not going to pour tens of man-years into building an extra complex compiler or an IDE with tons of features. We are going to build a useful language, with good tool support, with an investment that can be sustained by a small company.

So I am going to show you the menu and you can pick your own choice. If you need help you can look at the comparison I prepared at the end of this list.

The approaches are divided into three groups, depending on the kind of DSL you want to build: textual languages, graphical languages and projectional editors.

You probably know what textual and graphical languages are but you may not have encountered projectional editors before. They are pretty interesting things so you should probably take a look.

Textual languages

These are the most classical languages. Most practitioners will not even conceive of other kinds of languages. Admittedly, we are all used to working with textual languages. They are easier to support and can be used in all sorts of contexts. However, to use them productively I think that a specific editor is mandatory. Let’s see how to build textual languages and supporting tools.

A pragmatic do-it-yourself approach

Roll your own solution: you could reuse a parser generator like ANTLR (or Lex & Yacc if you are old-school) and write the rest of the processing yourself. This is doable but it requires knowing what you are doing. If you don’t know where to start you can take a look at my book on building languages.

I am sure you want to read it so I do not want to spoil it too much, but the path is more or less this:

  1. You define the lexer and parser grammar using ANTLR. You do not know ANTLR? No problem, here is a nice tutorial on ANTLR. In this blog there are many other articles about ANTLR.
  2. You transform the parse tree produced by ANTLR into a format that is easier to work with. So you get the model of your code.
  3. You resolve references, build validation, and implement a type system as a set of operations on the model of your code. It sounds complex, but if you know what you are doing you can get it done in a few hundred lines of code.
  4. Finally you either interpret or compile the model of your code.
  5. You build a simple editor for your language. For how to do that, look for tutorials on this blog or in the book. I tend to use a hackable editor I built myself. I named it Kanvas.
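
To make steps 1 to 4 a bit more tangible, here is a deliberately tiny sketch of the same pipeline. Everything in it is an assumption made for illustration: the toy language (assignments and `print` over sums of integers), the `Statement` model and the function names are invented, and a couple of regular expressions stand in for the lexer and parser that ANTLR would generate from a grammar.

```python
import re
from dataclasses import dataclass


@dataclass
class Statement:
    kind: str      # "assign" or "print"
    target: str    # variable name for "assign", empty for "print"
    terms: list    # operands of a sum: ints or variable names


def parse(source):
    """Steps 1-2: turn text into a model, one Statement per line."""
    model = []
    for line in source.strip().splitlines():
        match = re.fullmatch(r"\s*(?:([A-Za-z_]\w*)\s*=|print\b)\s*(.+)", line)
        if not match:
            raise SyntaxError(f"cannot parse: {line!r}")
        target, expression = match.groups()
        terms = []
        for text in expression.split("+"):
            text = text.strip()
            if text.isdigit():
                terms.append(int(text))
            elif re.fullmatch(r"[A-Za-z_]\w*", text):
                terms.append(text)
            else:
                raise SyntaxError(f"bad operand {text!r} in: {line!r}")
        model.append(Statement("assign" if target else "print", target or "", terms))
    return model


def validate(model):
    """Step 3: resolve references; report names used before definition."""
    defined, errors = set(), []
    for statement in model:
        for term in statement.terms:
            if isinstance(term, str) and term not in defined:
                errors.append(f"'{term}' is used before it is defined")
        if statement.kind == "assign":
            defined.add(statement.target)
    return errors


def interpret(model):
    """Step 4: execute the model and return the printed values."""
    env, output = {}, []
    for statement in model:
        value = sum(env[t] if isinstance(t, str) else t for t in statement.terms)
        if statement.kind == "assign":
            env[statement.target] = value
        else:
            output.append(value)
    return output


program = """
price = 40 + 2
print price + 8
"""
model = parse(program)
errors = validate(model)
print(errors or interpret(model))
```

Note how validation and interpretation never look at the text: they work only on the model, which is what makes it possible to swap the parser or add a code generator later.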

What I like about this approach is that you are in control of what is happening. You can change the whole system and evolve it, and it is simple enough that you can really understand it. Sure, with this approach you are not going to get a super complex IDE with tens of refactoring operations. Or at least you are not going to get those things for free.

Xtext

Xtext is a solid solution to build textual languages. In practice you define your grammar in a way similar to what you would do with ANTLR but instead of getting just a parser you get a nice editor. This editor is by default an Eclipse plugin. It means that you will be able to edit the files written in your DSL inside Eclipse. If you know how the Eclipse platform works you can create an RCP application: i.e., a stripped down version of Eclipse basically supporting only your language and removing a bunch of stuff that would not be useful to your users.

So Xtext gives you an editor and a parser. This parser produces for you a model of your code using the Eclipse Modeling Framework (EMF). So it basically means that you have to study this technology. I remember the long days reading the EMF book as one of the most boring, mind-numbing experiences I have ever had. I also remember asking questions on the Eclipse forums and not getting any answer. I have opened bug reports and received answers three years later (I am not joking). So it was disheartening at first, but over time the community seems to have improved a lot. Right now the material available on the Xtext website is incomparably better than it used to be and the book from Lorenzo Bettini helped a lot (more on it later).

The editors generated by Xtext can be deeply customized, if you know what you are doing. You can get away with minor changes with a reasonable effort, but if you want to do advanced stuff you need to learn the Eclipse internals and this is not a walk in the park.

Recently Xtext escaped the “Eclipse trap” by adding the possibility of generating editors also for IntelliJ IDEA and… the web! When I first found this out I was extremely excited. Like many other developers, I switched to IntelliJ some years ago and I was missing a way to easily build editors for IntelliJ IDEA. So this was great news, even if Xtext was created to work with Eclipse, so the support for IntelliJ IDEA is not as mature and battle-tested as the one for Eclipse. I have not tried the web editor yet, but from what I understand it generates a server-side application, which is basically a headless Eclipse, and on the client side it generates three different editors based on three technologies (each one with a different level of completeness). The one fully supported is Orion, an Eclipse project, while the other two are the well-known CodeMirror and ACE.

You may want to check out this list of projects implemented with Xtext to get an idea of what is possible to achieve using Xtext.

Textual languages: other tools

If I had to build a textual language, in most cases I would go for one of the two approaches described earlier: either my do-it-yourself approach or Xtext. That said, there are alternatives that I think are worth keeping an eye on.

textX is a Python framework inspired by Xtext. You can define the grammar of your language with a syntax very, very close to the one used by Xtext. textX does not use EMF or generate code: instead it uses the metaprogramming power of Python to define classes in memory. While it seems nice and easy to use, textX does not generate editor support like Xtext does, so that is a major difference.
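
What does "defining classes in memory" mean in practice? The following is a minimal sketch of the general technique using only the Python standard library. It is not how textX is implemented: the `METAMODEL` dictionary and the `make_class` helper are hypothetical names invented for this example.

```python
# Hypothetical metamodel: class name -> list of attribute names.
METAMODEL = {
    "Entity": ["name", "fields"],
    "Field": ["name", "type"],
}

def make_class(class_name, attributes):
    """Create a class at runtime; no source file is generated."""
    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(attributes)
        if unknown:
            raise TypeError(f"{class_name} has no attribute(s) {sorted(unknown)}")
        for attr in attributes:
            setattr(self, attr, kwargs.get(attr))

    def __repr__(self):
        values = ", ".join(f"{a}={getattr(self, a)!r}" for a in attributes)
        return f"{class_name}({values})"

    # type(name, bases, namespace) builds a new class object on the fly.
    return type(class_name, (object,), {"__init__": __init__, "__repr__": __repr__})

classes = {name: make_class(name, attrs) for name, attrs in METAMODEL.items()}

Field = classes["Field"]
Entity = classes["Entity"]
person = Entity(name="Person", fields=[Field(name="age", type="int")])
print(person)  # Entity(name='Person', fields=[Field(name='age', type='int')])
```

The trade-off is the one described above: nothing needs to be generated or compiled, but there is also no generated artifact an editor could build on.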

If you want to get a better feeling of how textX works take a look at this video.


There are other tools like Spoofax. I have not used it myself so I cannot vouch for it. It is more of an academic project than an industrial-grade language workbench, so I would suggest a bit of caution.
Spoofax can be used inside Eclipse. It is based on a set of DSLs that you use to create other DSLs.

If you want to look into Spoofax you may want to look at this free short book from Eelco Visser named Declare Your Language.

Graphical languages

Graphical languages seem approachable, and frequently domain experts feel more at ease with them than with textual languages and their geeky syntaxes. Graphical languages require building specific editors to be used and they are less flexible than textual languages. Also, they are less frequently used than textual languages, and the tools to build graphical languages tend to be less mature and more clunky. Here I present a list of a few options. If you want to read a more complete list you can look into this survey on graphical modeling editors.

GMF, the painful solution

There is one way to build graphical editors for your language that over time acquired quite a (not-exactly-positive) reputation. Its name is GMF: Graphical Modeling Framework.
Yes, you can use it to build editors which can be used inside Eclipse. Similarly to Xtext, it is based on EMF. Basically you use EMF to define the structure of your data and then GMF lets you specify how the different elements are represented, how their connections are displayed and that sort of stuff. Typically you then edit the details of each element in a separate panel, a sort of form.
You can see an example of a GMF editor in the picture below.

Image from https://esalagea.wordpress.com/2011/04/13/lets-solve-once-for-all-the-gmf-copy-paste-problem-and-then-forget-about-it/

Now, the documentation is basically nonexistent, and making it work is a challenge which requires a great amount of patience and determination.

This framework has potential and it is powerful and flexible, but working with it is far from being an enjoyable experience.

Sirius, hiding GMF

There are tools built on top of GMF to make the experience less terrible for the language designer. A simple tool is Eclipse Eugenia, while a more complex one is Eclipse Sirius. Sirius reuses some pieces of GMF (GMF Notation, GMF Runtime) but it moves away from its code generation approach and instead uses model introspection. Let me stress that this is what I have read about Sirius: I have not used it myself.

I have used Eclipse Eugenia and it helped jump-start my editor, but it is a limited tool and if you want to customize your editor you are back to GMF.

I have not used Eclipse Sirius myself, but it seems to be decently supported by Obeo and to be used at Thales, so I would expect it to have at least reasonable maturity and usability.

MetaEdit+, the commercial solution

MetaEdit+ is a language workbench for defining graphical languages. Contrary to all the other tools we discussed, it is a commercial tool. Now, generally I prefer to base my languages on open-source solutions because I find them more reliable. I know that in the worst case I can always jump in and maintain the platform myself, if I really need to. With a commercial solution instead we have to consider what happens if the provider goes out of business. You can probably keep using the tool you bought for a while, until it is no longer compatible with the operating system you are using. If you are about to make a large investment you could also consider setting up a source code escrow, to get access to the code in the unfortunate circumstance that the provider shuts down. That said, MetaCase (the company behind MetaEdit+) is a solid company which has been in business for quite a few years.

I have attended two presentations by Juha-Pekka Tolvanen and I was positively impressed both times. They have a mature solution and they use it to build a bunch of interesting DSLs for their clients. I very much like to check their regular tweets on the DSL of the week. Here are a few samples:

So if you ever need a graphical language I would advise you to consider this solution. The alternative, in my opinion, is to use a full-blown projectional editor, which lets you create graphical languages too, but not only those. Curious? Keep reading.

Projectional editors

Projectional editors are extremely powerful and exciting but they are unfamiliar to many users. I can give you the theory bit, and throw at you a definition, but if you really want to understand them watch the video below.

A projectional editor is an editor that shows a projection of the content stored on file. The user interacts with such a projection and the editor translates those interactions into changes to the persisted model. When you use a text editor you see characters, you add or delete characters, and characters are actually saved on disk. In a projectional editor you could edit tables, diagrams and even what looks like text, but those changes would be persisted in some format different from what you see on screen. Maybe in some XML format, maybe in a binary format. The point is that you can work with those files only inside their special editor. If you think about it, this is also the case for all the graphical languages: you see nice pictures, you drag them around, connect lines, and in the end the editor saves some obscure format, not the nice pictures you see on the screen. The point with projectional editors is that they are much more flexible than your typical graphical language. You can combine different notations and support all the sorts of representations you need for your case.
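
If a bit of code helps, here is a toy sketch of the idea in Python. It is in no way how MPS or any other projectional editor is built: the VAT rules model and all the function names are assumptions invented for this example. The point is only that there is one persisted model, several projections of it, and edits that change the model rather than the text on screen.

```python
import json

# The persisted model: structured data, not the text the user sees.
# The shape of this model is a hypothetical example.
model = {
    "rules": [
        {"product": "book", "country": "IT", "vat": 4},
        {"product": "laptop", "country": "IT", "vat": 22},
    ]
}

def project_as_text(model):
    """One projection: something that looks like text."""
    return "\n".join(
        f"vat for {r['product']} in {r['country']} is {r['vat']}%"
        for r in model["rules"]
    )

def project_as_table(model):
    """Another projection of the same model: a table."""
    rows = [("product", "country", "vat")]
    rows += [(r["product"], r["country"], str(r["vat"])) for r in model["rules"]]
    return "\n".join(
        " | ".join(cell.ljust(8) for cell in row).rstrip() for row in rows
    )

def set_vat(model, row, new_vat):
    """An edit made in any projection becomes a change to the model."""
    model["rules"][row]["vat"] = new_vat

def save(model):
    """What ends up on disk is the model, here serialized as JSON."""
    return json.dumps(model, sort_keys=True)

set_vat(model, 1, 21)          # the user edits a cell in the table view
print(project_as_text(model))  # the text view reflects the same change
print(project_as_table(model))
print(save(model))
```

Notice that neither the sentence-like view nor the table is ever parsed: both are generated from the model, which is why notations can be mixed freely.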

Confused? That is expected. Watch the video below, watch many more videos, and things will become clear at some point.

You could also take a look at this explanation of projectional editing written by Martin Fowler.

Jetbrains MPS


Jetbrains MPS is a tool that I have been using for some years, working with it daily for most of the last 12 months. It is an extremely powerful tool and it is the most mature projectional editor available out there. It is no accident: Jetbrains has invested significantly in developing it over more than a decade.

Do you want to see what it looks like? Watch the video.

I find MPS very useful to build families of interoperable languages with advanced tooling. Imagine several DSLs to describe the logic of your problems, to define tests, to define documentation. Imagine all sorts of simulators, debuggers, tools to analyze code coverage. Everything built on one platform.

Now, it means you need to be ready to embrace Jetbrains MPS and to invest a significant amount of time to properly learn it (or hire someone like me). However, if you are ready to make the investment it can simply revolutionize your processes.

Intentional Platform

This one is the mysterious and intriguing one.

Charles Simonyi is the man who oversaw the creation of Word and Excel at Microsoft, one of the first space tourists and one of the richest men on earth. In 1995 he wrote the revolutionary paper The Death of Computer Languages, The Birth of Intentional Programming. In this paper he presented a new, more intuitive way of programming, fundamentally based on projectional editors. He started working on these ideas at Microsoft, but in 2002 he left Microsoft to co-found Intentional Software. Since then the publicly available information on their tool, the Intentional Platform, has been extremely scarce. They have published a few papers and given a few presentations.

I have heard from people who have used it that it has a lot of potential but it is not there yet. For sure I would love to get my hands on it. The closest you can get is to read this somewhat old review of the Intentional Platform from Martin Fowler. Maybe one day even we mortals will have the possibility to know more about this legendary tool.

As far as I know they work with selected companies but for now their tool is not publicly available.

Whole Platform

This is a good Language Workbench that is too frequently overlooked. While it has been used in production for many years in a large company in Italy, it is lacking a little bit on the documentation and marketing side. Yes, I know it is sad that pure engineering awesomeness is not enough to gain popularity. Anyway, if you want to know more about it you can read my post on getting started with the Whole Platform.

There are a few concepts that I find quite interesting. I am not an expert on the Whole Platform: I have just played with it and talked with its authors.

One aspect that I find really interesting and different with respect to the other Language Workbenches is that the Whole Platform supports quite well working with existing formats and evolving from existing processes to more advanced Language-Engineering approaches. For example, it is quite easy to define grammars for existing formats in order to parse them.

The following is an example of a grammar to parse JSON but the same approach has been used to parse very complex formats used in the banking domain.

Whole Platform – Example of grammar

Another idea I love about the Whole Platform is the Pattern Language: the possibility to take a model (in this sense a piece of Java code is a model) and define variability points that could be filled with values from another model.

Whole Platform – Pattern Language

Also, the Whole Platform is quite rich and it supports also graphical languages.

Whole Platform – Graphical Language

Riccardo Solmi and Enrico Persiani are the minds behind the Whole Platform and you should probably talk to them if you are interested in using this Language Workbench.
The images of the Whole Platform I have used are released under the CC Attribution 2.0 license (https://creativecommons.org/licenses/by/2.0/)

Comparing the different approaches

| Approach | Type | When to use it |
| --- | --- | --- |
| Do-it-yourself | Textual | You need to be in absolute control of the platforms supported and you do not want any vendor lock-in |
| Xtext | Textual | You want a textual language with good editors and you want it fast |
| textX | Textual | You love the flexibility of dynamic languages while editor support is not important to you |
| Spoofax | Textual | You like to work with a sound theoretical approach and you are not afraid of a few bumps in the road |
| GMF | Graphical | You need extreme flexibility to build your very own graphical editor |
| Eclipse Sirius | Graphical | You are ready to trade some flexibility to get things done quicker and save some of your sanity |
| MetaEdit+ | Graphical | You are fine using commercial software for a strategic component and you want results quickly |
| Jetbrains MPS | Projectional | You want to build a family of languages with powerful and complex tooling like simulators, debuggers, testing support and more |
| Intentional Workbench | Projectional | You have some connection that gives you access to the most mysterious and hyped Language Workbench |
| Whole Platform | Projectional | You want to support existing formats and transition smoothly to a new approach |

One thing I would like to stress is that projectional editing is a superset of graphical editing. So you can define graphical languages using Jetbrains MPS. Given that there is no clear, great alternative for building only graphical DSLs, I would use Jetbrains MPS in that case. Alternatively I would consider building the tooling myself, targeting the web. Another interesting option could be looking into something like FXDiagram.

Still not enough?

If you want to keep an eye on the new Language Workbenches that come up I suggest you follow the “Language Workbench Challenge”, a workshop that is typically colocated with the Software Language Engineering conference. I co-authored a paper and attended the last edition and it was awesome.

What do I need to make my DSL succeed?

There are just two things that will seem obvious but are not:

  1. You need your users to use it
  2. You need your users to get benefits from using it

To achieve this you will need to build the right tool support and adopt the right skills. We are going to discuss all of this in this section.

Get users to use it

The first point means that you need to win the support of users. During my PhD I conducted a survey on the reasons why DSLs are not adopted. There are several causes, but one important factor is resistance from users, especially when they are developers.

If you target a DSL at developers, they could resist because they feel they are not in control in the way they are when they use a General Purpose Language (GPL). They could also fear that a DSL lowers the bar, being simpler to use than, let’s say, Java. In addition to that, like all innovations, a new DSL is threatening for seasoned developers because it reduces the importance of some of their skills, like the vast experience in dealing with the quirks of the GPL you are currently using at your company.

If your DSL is intended for non-developers, generally it is easier to win their support. Why? Because you are giving them a superpower: the ability to do something on their own. They could use a DSL to automate a procedure that previously was done manually. Maybe before the DSL was available the only possibility for them to get something done was bothering some developer to write custom code. Now they get a DSL, which means more power and independence. Still, they could resist adopting it if they perceive it as too difficult or if they feel it does not match their way of thinking.

To me the key as a DSL designer in this case is to be humble and listen. Listen to the developers, work on capturing their experience and embedding it in the design of the DSL or the tooling around it. Involve them in the design of the DSL. When talking with your users, technical or not, communicate that the DSL will be a tool for them, designed to support them and derived from their understanding of the domain at hand. When designing DSLs the cowboy approach does not work: you need to succeed as a team or not succeed at all.

Give benefits to users


If you get the support of users and people start using it you win only if they get a significant advantage from using the DSL.

We have discussed the importance of a DSL as communication tool, as a medium to support co-design. This is vital.

In addition to this, you can significantly increase the productivity of your users by building first-class tool support.

A few examples:

  • a great editor with syntax highlighting and auto completion: so that learning the language and using it feel like a breeze
  • great error messages: a DSL is a high-level language and errors can be very significant for users
  • simulators: nothing helps users as the possibility to interact with a simulator and see the results of what they are writing
  • static analysis: in some contexts the possibility to analyze the code and guard against possible mistakes is a big win

These are a few ideas but more can be adopted, depending on the specific case. Specific tool support for specific languages.
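
To make the point about error messages concrete, here is a minimal sketch in Python of a check that reports the line and suggests a fix. The tiny command language and the list of commands are assumptions invented for this example; the only library call used is `difflib.get_close_matches` from the standard library.

```python
import difflib

# Hypothetical DSL: each line is "<command> <argument>".
KNOWN_COMMANDS = ["approve", "reject", "escalate", "notify"]

def check(source):
    """Return user-facing messages with a line number and, if possible, a suggestion."""
    messages = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # blank lines are allowed
        command = words[0]
        if command not in KNOWN_COMMANDS:
            message = f"line {line_no}: unknown command '{command}'"
            # Suggest the closest known command, if there is one.
            close = difflib.get_close_matches(command, KNOWN_COMMANDS, n=1)
            if close:
                message += f", did you mean '{close[0]}'?"
            messages.append(message)
        elif len(words) < 2:
            messages.append(
                f"line {line_no}: '{command}' needs an argument, e.g. '{command} invoice'"
            )
    return messages

for message in check("approve invoice\nescalte claim\nnotify"):
    print(message)
```

A message like "did you mean 'escalate'?" costs a handful of lines to produce, and for a user who is not a programmer it can be the difference between fixing the problem alone and asking for help.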

Tool support: why we do not care about internal Domain Specific Languages

There is one factor that is frequently overlooked and this is tool support.

Many practitioners think that the only relevant thing is the syntax of your language or what it lets you express, with everything else being a detail.

This is just fundamentally wrong, because tool support can multiply productivity when using any language, especially a DSL. This is a crucial aspect to consider from the start, because the language should be designed with tool support in mind.

When building Domain Specific Languages, if you want to get serious, you have to build good tool support. Tool support is essential to delivering value.

I recorded this short video on this very subject.

Building a language: tool support from Federico Tomassetti on Vimeo.

Tool support is the reason why internal domain specific languages (i.e., fake DSLs) are irrelevant: they do not have any significant tool support.

When using some host languages you can bend them enough to get some sort of feeling of having your own little language, but that is it. There are some languages that are flexible enough to give a sort of decent … like Ruby. With other languages you get very poor results. I feel pity for the people trying to build “internal DSLs” for languages like Scala or Java. The worst of all are Lispers. I understand their philosophy: “if you want to solve a problem in LISP, first you create your LISP dialect and then solve it using it”. I understand it and I think it is a great technique. Just let’s not pretend this is a real DSL. This is not something you can share and work on closely with domain experts. It can be your trick to be more productive, but that is it.

You see those ridiculously long chains of method calls and you hear someone presenting them as a Domain Specific Language. That makes me feel a mix of two emotions:

  • pity for him and his users
  • rage for the confusion it creates. Real DSLs are very different and they can bring real benefits. Stop mixing them with this… thing

Just build a real DSL, so an external DSL!
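
To show the difference in a few lines, here is a minimal sketch in Python of the same discount rule written both ways. Everything in it (the `Rule` class, the method names, the rule syntax) is made up for illustration, and a regular expression stands in for a real parser only to keep the example short.

```python
import re

class Rule:
    """The model both styles produce: a discount rule."""
    def __init__(self):
        self.customer_type = None
        self.min_total = 0
        self.percent = 0

    # "Internal DSL" style: a chain of method calls, still plain Python.
    def for_customer(self, customer_type):
        self.customer_type = customer_type
        return self

    def when_total_over(self, amount):
        self.min_total = amount
        return self

    def discount(self, percent):
        self.percent = percent
        return self

internal = Rule().for_customer("gold").when_total_over(100).discount(15)

# External DSL style: the rule is text in its own (hypothetical) syntax,
# and a parser we control turns it into the same model.
RULE_PATTERN = re.compile(
    r"^\s*(\w+) customers get (\d+)% off orders over (\d+)\s*$"
)

def parse_rule(text):
    match = RULE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"cannot parse rule: {text!r}")
    rule = Rule()
    rule.customer_type = match.group(1)
    rule.percent = int(match.group(2))
    rule.min_total = int(match.group(3))
    return rule

external = parse_rule("gold customers get 15% off orders over 100")
print(vars(internal) == vars(external))  # True
```

Both produce the same model, but only the second one is something a domain expert can read and write without knowing Python, and only there do we own the parser, so we can attach our own error messages and editor support to it.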

What skills are needed to write a DSL?

Typically you need strong abstraction skills, the same ones you need for metaprogramming. If writing a library is 3 times harder than writing a program, writing a framework or a DSL is typically 3 times harder than writing a library.

You need to be humble: you may be a developer, but typically you are creating this DSL for other professionals who are going to use it for their job. Listen to what they do, understand how they work. What could seem wrong or naive to you could have reasons you do not yet understand. Acting as the all-mighty expert is the single best way to create a failed DSL, not useful in practice and therefore not used.

Aside from this, practice and learn. Keep doing it for a few years and that should do the trick.

Resources

Now that you have seen what DSLs can bring you and you have an idea how to build them you should be happy.

But you are not. You want more, you want to understand DSLs better, you want to learn everything about them.

Well, I do not know about everything, but definitely I can give you some pointers. I can tell you where to find the good stuff.

Let’s see what we can find:

Books

DSL Engineering

I would start by suggesting DSL Engineering by Markus Völter. The PDF version of the book is donation-ware, so just read it and donate. Alternatively you can find the printed version on Amazon.

The book starts with an introductory part: it is very useful to set your terminology straight. After that comes the DSL design part, focusing on different aspects separately. If you do not have direct access to an expert to teach you how to design DSLs, reading this part of the book is the best alternative I can recommend (together with as much practice as you can get, of course).

Then comes the part about implementation: remember that Markus has a PhD, but he is first of all someone who gets things done, so this part is very well written, with examples based on Xtext, Spoofax and MPS.

Part IV is about scenarios in which DSLs are useful. Given this is based on his large experience in this field there are a lot of interesting comments.

I had the occasion to work with Markus. I used to admire him a lot before meeting him and now I admire him even more. He is simply the best one in this field, so if you can learn something from him do it. Read his books, watch his presentations, follow his projects. It will be a good way to invest your time. He is lately working on Jetbrains MPS stuff, so you should follow what is going on with mbeddr and IETS3. Mbeddr is both a set of plugins for MPS and an implementation of the C language in MPS with special domain-specific extensions to support development of embedded software. IETS3 is instead an expression language built in MPS.

Martin Fowler is a very famous thought leader and best-selling author. I really admire his clarity. He is the author of Domain Specific Languages, a book about both internal and external DSLs.

I find the mental models presented in the book quite useful and elegant. In practice however I find internal DSLs irrelevant, so I am interested in only some portions of this book.

There are 15 chapters dedicated specifically to external domain specific languages. While those chapters are organized around implementation techniques there are comments and remarks from which you can learn some design principles.

I think the sections on the alternative computational models and code generation are very valuable. You will have a hard time finding an exploration of these topics at this level of detail anywhere else.

The book is 7 years old and the techniques may have evolved since the book was written, but the vast majority of the considerations presented in the book are still valid. And of course they are thoughtful and well explained, as you would expect from Martin Fowler.

If you are interested in textual languages, and in particular in ANTLR, you should definitely look into Language Implementation Patterns by Terence Parr. I like the author, I like the publisher (the Pragmatic Bookshelf) and unsurprisingly I love the book.

The book starts by discussing different parsing algorithms. If you like to learn how stuff works you should take a look at these chapters. Then there are chapters about working with the Abstract Syntax Tree, extracting information and transforming it. This is the kind of stuff you need to learn if you want to become a Language Engineer. Also, there are chapters on resolving references, building symbol tables and implementing a typesystem. These are the foundations to learn how to process the information expressed in your DSL.

Finally Terence explains how to use the information you have processed by building an interpreter or a code generator. At this point you end your journey, having seen how to build a useful language from start to finish. This book will give you a solid basis to learn how to implement DSLs. The only thing missing is a discussion on how to design DSLs, but this is not the goal of this book.

 


The MPS Language Workbench - Volume I  The MPS Language Workbench - Volume II
If you are looking into MPS, there are not many resources around. It could make sense to buy the two books from Fabien Campagne on the MPS Language Workbench. They explain in detail all the many features of MPS (admittedly some are a bit obscure). If I had to find one thing missing, it would be more advice on language design. These books are very good references to learn how MPS works, but there is not much guidance on how to combine these features to get your results. One reason for that is that MPS is an extremely powerful tool which can be used in very different ways, so it is not easy to give general directions.

Volume I explains separately the different aspects of a language: how to define the structure (the metamodel), how to define the editors, the behavior, the constraints, the typesystem rules and so on. Most of the chapters are in reference-manual style (e.g. the chapter The Structure AspectStructure In Practice). Everything you need to learn to get started and build real languages with MPS is explained in Volume I.

Volume II is mostly about the advanced stuff that you can safely ignore at the beginning. I suggest looking into this book only when you feel comfortable with all the topics explained in Volume I. If you have never used MPS before it will take some time. Volume II explains how to use the build framework to define complex build configurations, and it gives you an overview of all the different kinds of testing you may want to use for your languages. It also shows you how to define custom aspects for your language or custom persistence.

I have bought the Google Play version but they are available also in print form.
Implementing Domain Specific Languages with Xtext and Xtend - Second-Edition

On Xtext there is a quite practical and enjoyable book from Lorenzo Bettini: Implementing Domain-Specific Languages with Xtext and Xtend. I wrote a review of the second edition of this book: if you want to read my long-form opinion of that book you can visit the link.

If you want to learn how to write textual languages with good tool support you could start by following a couple of tutorials on Xtext and then jump to this book. This book will explain everything you need to know to use Xtext to build rather complex editors for your language. The only caveat is that Xtext is part of a complex ecosystem, so if you really want to become an Xtext expert you need to learn EMF and Xtend. The book does a good job of teaching you what you need to know to get started on these subjects, but you may have to complete your education with other resources too when you want to progress.

What I like about this book is that it is not a reference manual, but it contains indications and opinions on topics like scoping or building typesystem rules (the author has significant experience on this specific topic). Also, the author is interested in best practices, so you will read his take on testing and continuous integration. The kind of stuff you should not ignore if you are serious about language engineering.

If you are interested in DSLs in general you can take a look at DSLs in Action. The book is interesting but I have two comments:

  1. It focuses way too much on internal DSLs, which are, as we all know, not the real thing
  2. They have misspelled my name. -1 point for that

Specifically on external DSLs there is not much: the author briefly discusses Xtext and then spends a chapter on using Scala parser combinators to build external DSLs. That would not have been my first choice. So if you are interested in learning how to implement an external DSL, do not pick this book. However, if you want a gentle introduction to the topic of DSLs, if you are a degenerate who prefers internal DSLs to external DSLs, or if you want to read every available resource on DSLs, this book would be a good choice.

Domain Driven Design

Domain Driven Design is a relevant and important book to read, because you need skills to understand the domain in order to be able to represent it in your language, and to design your language so that it can capture it. Now, I should probably just praise this book and stress how much I have enjoyed it. Unfortunately I tend to err on the side of honesty, so I warn you: this is one of the most boring books I have ever read. It is important, it is useful, it is great, but it is just so plain and long. It stayed on my night stand for months.

What you should get out of this book is the importance of capturing the Domain in all of your software artifacts. The book stresses the importance of building a common language to be shared among the stakeholders. This is completely and absolutely relevant if you want to build Domain Specific Languages. What is not part of this book is how to map this domain model to a language. For that part you should refer to the other books, specific to DSL design. This book is a good complement to any of them.

Websites and papers

Companies

I design and implement Domain Specific Languages for a living, but instead of telling you how great I am, I will list other companies that you could work with.

The first, obvious name is Itemis. They are a German company with small offices in France and Switzerland. They employ some of the best in the field: I have met and worked with Markus Völter and Bernd Kolb and I am seriously impressed by the level at which they work. They have worked with so many companies and have done so many projects that the list would be scarily long. I would just say that in recent years they have done amazing work using Jetbrains MPS. They have created the mbeddr project and from it they have derived a set of utilities, named the mbeddr platform, which contributed enormously to the growth of MPS. They have contributed early and significantly to the Language Workbench community, so if you need some help on a DSL you should seriously consider working with them.

TypeFox is another German company. I interviewed one of the founders some time ago. Several Xtext core committers, including the project lead, are involved in the company, so as you can imagine they have some serious competencies on Xtext and the EMF platform in general. If you need Xtext training or consulting I would consider them the top choice. They are also a Solution Member of Eclipse and I would expect them to be able to build complex Eclipse-based solutions. If you want to hear more you can read this interview with Jan Köhnlein. Jan is one of the founders of the company and we talked a few months after the company was created.

Jetbrains is obviously the company behind Jetbrains MPS. Recently they started offering training on MPS. I asked about this during my interview with Vaclav Pech. They offer training, either basic or advanced, on their premises in Prague or on-site.

At the moment I do not think they offer MPS consulting (for that you can talk to me or to Itemis).

Conclusions

There are many reasons why you should really consider Domain Specific Languages. I have seen companies benefit enormously from DSLs. Most of the people I have worked with used DSLs as a key differentiator that helped them increase productivity by 10-20 times, reduce time-to-market and feedback cycles, increase the longevity of their business logic and much more.

Aside from the practical benefits I find the topic extremely fascinating. Most of all I feel that by building DSLs we build powerful tools that help other people do their job. As language designers we act as enablers, our languages can be used by skilled professionals to achieve great things and this is an amazing feeling for me.

Could you help me, please?

If you found this guide useful please share it, spread the word and link to it. I spent several years working on this subject and a few weeks working on this guide. I would be very happy if you could help me reach others who could find it useful.

Thank you so much!

A few ideas:

  • Share it on Twitter
  • Share it on Facebook
  • Share it on LinkedIn
  • Share it on Google+
  • Write about it in your blog
  • Send an e-mail to your colleagues

Building a language: tool support

When building a language there is one aspect which is absolutely crucial: tool support.
Tool support will determine whether your language is usable at all, and it will influence how users receive it.
In this video I explain why and how you can build great tool support with limited effort.

Tool support is what makes a real language. This is also one of the reasons why I am against the whole concept of internal DSLs: they are nice but they are not really languages. They cannot be compared in any way to external DSLs (a.k.a. real DSLs).

And it is nice that experts agree on this:

What are good books on best practices of the design of domain-specific languages (DSL)?

Recently I answered this question on Quora. I thought it could be useful to report the answer here, in case someone else is looking for books about Domain Specific Languages.

What are good books on best practices of the design of domain-specific languages (DSL)?

On one side you have internal DSLs: they are sort of fluent interfaces written typically in flexible languages like Ruby. You create functions, classes and metaprogramming tricks to let the programmer specify the logic in something that resembles a specific language but is actually just regular code (Ruby for example).
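To make the idea concrete, here is a minimal sketch of an internal DSL built as a fluent interface. It is written in Python rather than Ruby, and the Order class with its methods is a hypothetical example invented for illustration, not something taken from one of the books below.

```python
class Order:
    """Hypothetical fluent builder: every method returns self so calls can be chained."""

    def __init__(self):
        self.customer = None
        self.lines = []

    def for_customer(self, name):
        self.customer = name
        return self

    def add(self, quantity, product):
        self.lines.append((quantity, product))
        return self

    def total_items(self):
        return sum(quantity for quantity, _ in self.lines)


# Reads a bit like a sentence, but it is still regular Python code:
# no dedicated parser, editor or validation comes with it.
order = Order().for_customer("ACME").add(2, "bolt").add(5, "nut")
print(order.customer, order.total_items())
```

The chained calls resemble a small language, yet every error message, every auto-completion proposal and every syntax rule is still the one of the host language.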

Then you have external DSLs: they are regular languages with their tooling. You can have parsers, editors, code generators or interpreters and other stuff.

External DSLs are the real stuff, internal DSLs are nice but really nothing game changing.

  1. On external DSLs I strongly suggest you read the book from Markus Völter, DSL Engineering. Markus is a very well known consultant doing real stuff. This is a book about external DSLs, with practical pieces of advice, comparing several language workbenches. He talks about very different approaches to DSLs and he has unique hands-on experience. Simply put, he is probably the world's leading expert on DSLs.
  2. If you want to get started with one specific Language Workbench you can read the book from Lorenzo Bettini: Implementing Domain-Specific Languages with Xtext and Xtend. It will teach you how to write DSLs using Xtext. Xtext is a mature Language Workbench for writing textual DSLs. It is a very reasonable choice and when you have finished this book you will have some real competence in writing DSLs.
  3. If you are interested in something more high-level and more focused on internal DSLs you can take a look at Domain Specific Languages from Martin Fowler. He is a well-known opinion leader and a great speaker, so it could make sense to listen to what he has to say.
  4. As an alternative, you can look at DSLs in Action (I was one of the technical reviewers for this book). It is nice to read but this one too is focused mostly on internal DSLs.
  5. If you are not interested in building great editors for your language or you want to have a fully customized solution for your textual language you can consider looking into ANTLR. It is “just” a parser generator, so it does not provide full tool support like Xtext or MPS, but it is a well-known tool and the book Language Implementation Patterns describes how to use it in practice. The author introduces patterns which are useful in general.

 

You could be interested in The complete guide to Domain Specific Languages.

There we discuss all sort of examples, resources, tools to build DSLs and more



Interview with Jan Köhnlein on TypeFox, DSLs and Xtext

Jan Köhnlein is a leading expert in DSL design and an Xtext committer. After obtaining his PhD from the Technical University of Hamburg he worked for companies very well known in the modeling field: Gentleware and Itemis. Recently he founded TypeFox together with Sven Efftinge and Moritz Eysholdt.

I thought it was a good moment to ask him a few questions about his new venture. Of course it is also a great opportunity to ask him about his views on DSLs and the future of Xtext.

About TypeFox


Can you introduce TypeFox and tell us more about what services you offer?

TypeFox is a software company focussed on development tools for software engineers and domain experts. As the leading company behind the open-source frameworks Xtext and Xtend, we have great expertise in language engineering, domain-specific languages (DSLs) and tool construction. TypeFox offers all kinds of services around these topics, such as professional support, projects, consulting and training. Our office is based in Kiel (Germany), on the coast of the Baltic Sea.

How were the very first months at the new company?

TypeFox was founded in January 2016 by Sven Efftinge, Moritz Eysholdt and me. It has been and still is an exciting experience. None of us had led an enterprise before, so there were a lot of new things to learn in addition to getting the company going financially. But we managed to gather a team of very skilled programmers and thanks to the good visibility of the Xtext project we did not have problems finding customers. All in all we had a very successful start. So successful that we plan to grow further.

On DSLs

Are your typical customers already aware of the potential of DSLs or do you have to do a lot of evangelization?

Most of our customers choose us because they need Xtext experts, so they are already convinced of the DSL approach. But we do evangelize at conferences and community gatherings and by writing articles for magazines or online portals.

In your experience what kind of customers are more willing to embrace DSLs? Does it depend on the size of the company or the domain?

Like all abstraction techniques, the DSL approach needs a certain problem size to pay off. Even though Xtext is pretty lightweight for a language engineering framework, you would not create an entity DSL for just a handful of entities. But for a few hundred you would. So yes, the problem size matters.

Company size does not matter that much. Bigger companies are more likely to have many domain experts, who are not necessarily software engineers, like mathematicians in an insurance company. Those experts can be more directly involved in the software development when they can capture their knowledge in DSLs.

DSLs seem to me more widespread in Europe than in the US: is that true for you? In which countries do you see them more used?

From my former experience, the development in the USA has been more focussed on getting things shipped, while in Europe and especially in Germany we have spent a lot of time on the architecture. Both approaches have pros and cons, but the latter mentality is for sure more susceptible to the DSL approach. Currently we have about as many customers in Europe as we have in the US, so they seem to be converging.

Have you seen any change in the number of DSLs adopters in the last years? Do you think they are going to become more common in the near future?

The basic idea of DSLs is pretty old, and there have always been phases of hype and decline. I see that DSLs have become mainstream in certain areas like enterprise application development, and some industries, like automotive in Germany, cannot even do without them anymore.

What are in your opinion the main obstacles to adopting DSLs?

Technically, DSLs raise the tool stack. It is not always clear whether this overhead is worthwhile, and many developers are skeptical about that for good reasons. And of course not every programmer is a good language designer.

The other issue is that there is scaffolding: Many developers ask why they should write a code generator for getters and setters, XML mappings or Maven poms if their IDE or web framework generates them on demand. It is not always easy to convince them that it is more sustainable to capture the model in a DSL and write a code generator that can be run repeatedly and can be fine-tuned to the specific needs of the actual project.

Do you work more on DSLs intended for developers or for domain experts? Are they very different?

We currently develop somewhat more technical DSLs than business DSLs. This may also be due to the fact that Xtext had been available for Eclipse only, whose look and feel often scares non-developers off. Since 2.9 you can also generate a web editor for your DSL, and this is where many business DSLs live.

From a language point of view, the main difference is syntax: developers accept a program-like textual notation, while many business experts prefer forms, tables, diagrams and prose.

On Xtext


I think that most newcomers to Xtext find the learning curve very steep, probably because of EMF. The quality of the documentation on the official website is very high, especially compared to other Eclipse projects but is there anything else you think the community could do to help people to adopt Xtext?

Thanks for the compliment, but documentation is always something that can be improved. Since Xtext is hosted on Github, it should be very easy for community members to submit pull requests for corrections. And luckily there is the book by Lorenzo Bettini, which is an excellent addition to the official docs. If you have an interesting project based on Xtext, an entry in the community section on the Xtext homepage is also just a PR away. Other than that, we have a very lively forum where you get answers to your questions very quickly, which is also driven by the work of the community.

You and your team are contributors to Xtext and you have an amazing experience with it. Can you tell us a bit about what is going on with Xtext and what interesting things are coming in the future?

We definitely plan to invest more in web support. One part of that is to integrate Xtext with Eclipse Che. Sven is going to talk about that at EclipseCon France 2016. In addition we are going to polish the new features added in 2.9, such as IDEA, Gradle and Maven support, as well as the new generator architecture.

One big feature Xtext got recently is the possibility to generate editors for IntelliJ and for the web. How mature is this feature? Is it helping to attract new Xtext users?

IDEA support is ready to be adopted by the users. Unfortunately, we did not get too much feedback so far. We hope that we are going to get more in the future.

Web support is working fine as well. But as there are a plethora of web application architectures, it will be interesting to use it in more projects and find out where we have to make it more generic or specific.

Xtext is about textual notations but models can support other visualizations (using GEF or other tools). What are your thoughts on combining graphical notations with Xtext? I remember reading a first post from you in 2009 and you kept writing on the subject (a more modern approach is presented here). Is it something you do often? Is it effective?

As Xtext is based on EMF like most graphical frameworks from the Eclipse world, the integration seems to be quite straightforward. But the problems emerge from a different understanding of transactions and object identity. The result will be strange glitches and bugs in the integrated tool.

This is why I advocate graphical views on top of Xtext, that only read the model. With FXDiagram you can do that pretty easily, and get a very modern diagram editor on top.
My colleagues Miro Spönemann and Christian Schneider have both worked on the KIELER project before, so you will probably see more contributions from TypeFox to the graphics community in the future.

By the way, we have a customer project where we integrate a textual DSL editor into a table, which was not easy to implement for exactly the reasons mentioned above, but works great now.

Final thoughts

Xtext is probably the most mature and complete system to create textual DSLs, however there are other “Language Workbenches” available. I am thinking of Jetbrains MPS, Rascal, Acceleo and others. What do you think of these alternatives? Any competitor I am missing? Any feature you think Xtext should get or cases in which an alternative tool is the right solution?

Ranting on competitors is not my style, but it is hard to find an alternative to Xtext that is as powerful, extensible and mature, does not lock you into a closed world of specific development tools, is open-source and backed by companies that provide professional support.

Is there anything else we forgot to discuss?

I just want to say thank you to our team, to all committers and supporters of the Xtext projects and to the community. You made Xtext what it is today.

Finally, what is the best way for customers interested in Xtext consulting to get in contact with TypeFox?

Write us an email to contact@typefox.io, or bump into us at conferences, like EclipseCon, JAX and others.

Apart from that, we are hiring. So if you know any good software developers who want to join our team, spread the word!

If you are interested in joining TypeFox visit http://www.typefox.io/career

Recognizing hand-written shapes programmatically: find the key points of a rectangle

A far-fetched goal I have is using sketching on a whiteboard as a way to define programs. I mean formal programs that you can execute. Of course through your sketches you would define programs in a high level domain specific language, for example describing a state machine or an Entity-Relationship diagram.

To do so I would like to start by recognizing rectangles. Then I will move on to recognizing other shapes and connecting lines, and to recognizing the text present in the diagram. For now let’s focus on recognizing rectangles.

My general approach would be the following:

  1. recognize the meaningful lines
  2. recognize key points among those lines
  3. classify those key points using AI
  4. find shapes by combining the classified key points

Ok. This is not going to be something I complete over a weekend.

The input images

We will use 3 images: two of them were drawn on a whiteboard by me, under different light conditions. The third one was found on the Internet. Its peculiarity is that the sketch was done on graph paper (i.e., there is a grid on the paper).

Let’s see how we can process these images. We will use Java and the BoofCV image processing library.

Gray scale

As a first step we convert the image to gray scale. Here we get a problem with the image taken under artificial light:

Screenshot from 2016-04-02 14-51-23

We want to remove that giant gray blob in the bottom right corner. To do so we will use derivatives.

Derivatives

We blur the image, to reduce the effect of noise, and calculate the derivatives. This is a way to capture the sharp variations of color which happen vertically or horizontally.
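As a rough illustration of this step, here is a minimal sketch in plain Python (not the Java and BoofCV code used for this post). It assumes the image is a list of rows of gray values, swaps in a simple 3x3 mean blur where a Gaussian blur would normally be used, and computes the derivatives with central differences. The function names are made up for the example.

```python
def box_blur(img):
    """3x3 mean blur; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def derivatives(img):
    """Central differences along X and Y (left at zero on the border)."""
    h, w = len(img), len(img[0])
    dx = [[0.0] * w for _ in range(h)]
    dy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2
            dy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2
    return dx, dy


# A 5x5 image with a vertical edge: dark on the left, bright on the right.
image = [[0, 0, 255, 255, 255] for _ in range(5)]
dx, dy = derivatives(box_blur(image))
```

On this toy image the horizontal derivative is large around the edge while the vertical derivative stays at zero, which is exactly the kind of signal used in the next steps.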

We get something like this for the image taken under natural light:

Screenshot from 2016-04-02 14-44-43

However for the image taken under artificial light we see the noise:

Screenshot from 2016-04-02 14-53-42

At this point we take each point of the image and check whether it is surrounded by a high number of points with a high derivative (either horizontal or vertical). We keep the points satisfying the condition and we set all the other points to white. We do that a couple of times.
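A minimal sketch of this filter, again in plain Python and under explicit assumptions: the derivative threshold, the size of the neighbourhood (radius) and the minimum number of neighbours are hypothetical parameters, since the values actually used are not given here.

```python
def keep_dense_points(strong, radius=1, min_neighbours=3):
    """One pass: keep a strong point only if enough strong points surround it.

    `strong` is a grid of booleans (True = high horizontal or vertical derivative).
    Points failing the test become False, i.e. "white".
    """
    h, w = len(strong), len(strong[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not strong[y][x]:
                continue
            count = sum(
                1
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
                if (i, j) != (x, y) and strong[j][i]
            )
            out[y][x] = count >= min_neighbours
    return out


def filter_noise(dx, dy, threshold, passes=2):
    """Threshold the derivatives, then apply the density filter a couple of times."""
    strong = [[abs(a) > threshold or abs(b) > threshold for a, b in zip(rx, ry)]
              for rx, ry in zip(dx, dy)]
    for _ in range(passes):
        strong = keep_dense_points(strong)
    return strong


# A small block of strong points survives, an isolated one is treated as noise.
dx = [[0, 0, 0, 0, 0],
      [0, 9, 9, 0, 0],
      [0, 9, 9, 0, 0],
      [0, 0, 0, 0, 9],
      [0, 0, 0, 0, 0]]
dy = [[0] * 5 for _ in range(5)]
mask = filter_noise(dx, dy, threshold=5)
```

Repeating the pass matters because removing isolated points can leave their former neighbours isolated in turn.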

This is the result:

Screenshot from 2016-04-02 14-56-12

Contours

We do some additional filtering and then we invoke a function to find the contours inside the image. We draw the external contours in red and the internal ones in blue.

Screenshot from 2016-04-02 14-58-19

We then remove the short contours:

Screenshot from 2016-04-02 14-59-01

Key points

The contours we get are drawn as a list of segments which are very short. Let’s draw the extremes of the segments in blue.

Screenshot from 2016-04-02 15-00-19

Yes, they are very short: you just see a continuous set of extremes, very close to each other. We want far fewer segments, and much longer ones.

To do that we use basically two strategies:

  1. we simply merge consecutive extremes which are very close
  2. we take sequences of three consecutive points: A, B, C. If B is very close to the line between A and C we just remove B

We apply both these strategies twice and get much simpler contours. This is the final result.
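The two strategies can be sketched as follows. This is a simplified Python version that treats a contour as an open list of (x, y) points; the two distance thresholds are hypothetical values chosen for the example, not the ones used to produce the images in this post.

```python
import math


def merge_close(points, min_dist):
    """Strategy 1: drop a point that is too close to the previously kept one."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        if math.dist(p, kept[-1]) >= min_dist:
            kept.append(p)
    return kept


def distance_to_line(a, b, c):
    """Distance of point b from the line passing through a and c."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    length = math.hypot(cx - ax, cy - ay)
    if length == 0:
        return math.dist(a, b)
    return abs((cx - ax) * (ay - by) - (ax - bx) * (cy - ay)) / length


def drop_collinear(points, max_dist):
    """Strategy 2: for consecutive A, B, C remove B if it is very close to the line A-C."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for i in range(1, len(points) - 1):
        if distance_to_line(kept[-1], points[i], points[i + 1]) > max_dist:
            kept.append(points[i])
    kept.append(points[-1])
    return kept


def simplify(points, min_dist=2.0, max_dist=1.0, rounds=2):
    """Apply both strategies, twice by default."""
    for _ in range(rounds):
        points = drop_collinear(merge_close(points, min_dist), max_dist)
    return points


# An L-shaped contour described by many short segments.
contour = [(0, 0), (1, 0), (2, 0.1), (5, 0), (10, 0), (10, 1), (10, 5), (10, 10)]
print(simplify(contour))  # [(0, 0), (10, 0), (10, 10)]
```

Only the two ends and the corner survive, which is the kind of key point we want to classify later.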

What next

Now we have a reasonable number of relevant points. Next I want to classify them through machine learning techniques. For example I want to recognize whether a single point is the top left corner of a rectangle or part of an arrow. Then I will proceed to combine those recognized points to obtain entire shapes (my rectangles!).

Right now I am already generating the images to classify and I am thinking about which features to use for machine learning. I have some ideas, but we will see them in one of the next posts.

Training images look like this:

point_1_109

What if sketching on a whiteboard was a form of programming?

There are different possible notations for languages: textual and graphical are the two larger families.

We can define graphical notations for formal languages. For example we can define a graphical notation to define state machines.

A Domain Specific Language for state machines can be defined using JetBrains MPS. However the user would have to edit the state machine inside MPS.

I wonder if we could instead specify a state machine just by sketching something on a whiteboard. Consider this example:

sm1

We have a start state (the white circle on top) which leads us to the state A without needing any event. Then from state A we can go to state B when an event of type b occurs. Or we can go to state C if an event of type a occurs. From C we need an event of type c to move to D.

Note that I did not represent the direction of transitions: we could assume that they always lead from the higher state to the lower.
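To make clear what "running" this state machine would mean, here is a minimal Python sketch of the program we would like to obtain from the picture. The transition table mirrors the sketch described above; the name "start", the use of None for the transition that needs no event, and the choice to ignore events that do not apply are assumptions of this example.

```python
# Transition table for the sketched machine: (state, event) -> next state.
# The event None stands for the transition that needs no event.
TRANSITIONS = {
    ("start", None): "A",
    ("A", "b"): "B",
    ("A", "a"): "C",
    ("C", "c"): "D",
}


def run(events, state="start"):
    """Follow the transitions, ignoring events that do not apply to the current state."""
    state = TRANSITIONS.get((state, None), state)  # take the event-less transition
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


print(run(["a", "c"]))  # start -> A -> C -> D
```

Everything in the table is information that is visible in the drawing: the boxes, the connecting lines and the labels next to them.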

I think that at this point we should just take a picture of this sketch and obtain a program which would let us run this state machine.

To do that we should recognize the different shapes and the text present, and assign everything to a corresponding concept in our DSL.

The first step would be to recognize rectangles. I guess there are already libraries to do that. I tried OpenCV but I really do not like the way C++ dependencies are handled. I started playing with BoofCV instead, which is written in Java. I first used functions to recognize segments and I got something like this:

Screenshot from 2016-03-28 19-12-53

Then I wrote some functions to merge segments whose endpoints are very close and whose slopes are similar, to reduce the number of segments. That did not work particularly well.
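For reference, the merging idea can be sketched like this in Python. It is a simplified stand-in, not the functions I actually wrote: it assumes the segments arrive in order, so that the end of one is compared with the start of the next, and both thresholds are hypothetical.

```python
import math


def angle(seg):
    """Direction of a segment in [0, pi), ignoring which endpoint comes first."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def can_merge(s1, s2, max_gap=3.0, max_angle=math.radians(10)):
    """True if s2 starts near the end of s1 and has a similar slope."""
    diff = abs(angle(s1) - angle(s2))
    diff = min(diff, math.pi - diff)
    return math.dist(s1[1], s2[0]) <= max_gap and diff <= max_angle


def merge_segments(segments):
    """Greedily join each segment to the previous one when the two can be merged."""
    if not segments:
        return []
    merged = [segments[0]]
    for seg in segments[1:]:
        if can_merge(merged[-1], seg):
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)
    return merged


segments = [((0, 0), (5, 0)), ((6, 0), (10, 0.5)), ((10, 1), (10, 8))]
print(merge_segments(segments))
```

A greedy approach like this is sensitive to the order of the segments and to the thresholds, which may be part of the reason why results on real photos are disappointing.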

Screenshot from 2016-03-28 19-13-03

I realized I should probably use some pre-processing: first a Gaussian blur and then calculate the derivatives over the X and Y axes. I got something like this. The colors depend on the values present in the X and Y derivative images. I should use those values to recognize horizontal and vertical lines with common endpoints and so get rectangles.

Screenshot from 2016-03-29 18-49-46

This is not as easy as I hoped but I would love to spend more time on it in the future.

Conclusions

Still, people will need some ability to formalize their thoughts but we can definitely remove some of the incidental barriers due to languages, notations and tools.

Getting started with Jetbrains MPS: how to define a simple language (screencast)

I am blessed (or cursed) with having many different interests in software development. However my deepest love is with language engineering. There are different tools and techniques but Jetbrains MPS is particularly fascinating.

To help people get started, or just get a feeling for what it is like to develop with Jetbrains MPS, I recorded this short screencast.

 

 

If you are not familiar with this tool, here are the basic ideas.

Jetbrains MPS is a projectional editor: that means that it internally stores the information contained in the different nodes, not their textual representation.

If you create a model, then change how certain nodes should look, and you reopen the model, it will automatically use the new “look & feel” that you just defined. This is simply because what is stored is the pure information, while the appearance is recreated each time you open a model. In other words when you open a model it is projected in a form that you defined and that should be easy to understand for a human being.

This is very different from what happens with a textual language: suppose you create your language and then decide to replace braces with brackets. You now have to manually go over all your existing source files and modify them, because in that case what is saved is not the pure information, the abstract syntax tree, but its representation: the textual form.
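The difference can be illustrated with a small Python sketch. It is only an analogy, not how Jetbrains MPS stores or renders models: the dictionary plays the role of the stored model, and the two functions are hypothetical projections of the same information.

```python
# The stored model: pure information, no braces, brackets or keywords.
model = {
    "concept": "TodoList",
    "name": "Home",
    "tasks": [
        {"concept": "Task", "title": "Buy milk"},
        {"concept": "Task", "title": "Call Bob"},
    ],
}


def project_with_braces(todo):
    """First version of the appearance of a todo list."""
    body = "".join(f"  task {t['title']!r}\n" for t in todo["tasks"])
    return f"list {todo['name']} {{\n{body}}}"


def project_with_brackets(todo):
    """Changed appearance: the stored model does not need to be touched."""
    body = "".join(f"  task {t['title']!r}\n" for t in todo["tasks"])
    return f"list {todo['name']} [\n{body}]"


print(project_with_braces(model))
print(project_with_brackets(model))
```

Switching from braces to brackets means replacing one projection function. With a textual language the braces would be part of every saved file, so every file would have to be rewritten.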

In Jetbrains MPS you work on the different concerns of your language separately: you first define the individual concepts and the information they contain, then you work on the other aspects. There are several, like typesystem, generation, dataflow, etc. In this short screencast we just look at the concept definition and the editor aspect.

One of the advantages of a projectional editor is that you can change the appearance of your nodes without invalidating your existing models. That is great to incrementally improve the appearance of the single nodes and the way they can be edited. In the screencast we start with the default representation of the concept we defined, as provided by Jetbrains MPS. Then we refine the editors in a couple of steps until we have a representation which is good enough.

At the end of the screencast we have a language we can use to define very simple todo lists. The possibilities for improving it are endless:

  • we could add more concepts: like priority, deadlines and tags for each single task
  • we could add the possibility to mark a task as done and add a nice shortcut to do so
  • we could integrate the language with some other tool: what if we could synchronize our models with Trello or with workflowy?
  • we could support subtasks

Question: have you ever thought of creating your own language, either a general purpose language or a domain specific language? If so did you actually get started?

 

Develop DSLs for Eclipse and IntelliJ using Xtext

In this post we are going to see how to develop a simple language. We will aim to get:

  • a parser for the language
  • an editor for IntelliJ. The editor should have syntax highlighting, validation and auto-completion

We would also get for free an editor for Eclipse and a web editor, but please contain your excitement, we are not going to look into that in this post.

In the last year I have focused on learning new stuff (mostly web and ops stuff) but one of the things I still like the most is to develop DSLs (Domain Specific Languages). The first related technology I played with was Xtext: Xtext is a fantastic tool that lets you define the grammar of your language and generate amazing editors for such a language. Until now it has been developed only for the Eclipse platform: it means that new languages could be developed using Eclipse and the resulting editors could then be installed in Eclipse.

Lately I have been using Eclipse far less and so my interest in Xtext faded until now, when finally the new release of Xtext (still in beta) is targeting IntelliJ. So while we will develop our language using Eclipse, we will then generate plugins to use our language in IntelliJ.

The techniques we are going to see can be used to develop any sort of language, but we are going to apply them to a specific case: AST transformations. This post is intended for Xtext newbies and I am not going into much detail for now, I am just sharing my first impressions of the IntelliJ target. Consider that this functionality is currently a beta, so we could expect some rough edges.

The problem we are trying to solve: adapt ANTLR parsers to get awesome ASTs

I like playing with parsers and ANTLR is a great parser generator. There are beautiful grammars out there for full-blown languages like Java. Now, the problem is that the grammars of languages like Java are quite complex and the generated parsers produce ASTs that are not easy to use. The main problem is due to how precedence rules are handled. Consider the grammar for Java 8 produced by Terence Parr and Sam Harwell. Let’s look at how some expressions are defined:

This is just a fragment of the large portion of code used to define expressions. Now consider you have a simple preIncrementExpression (something like: ++a). In the AST we will have a node of type preIncrementExpression that will be contained in a unaryExpression. The unaryExpression will be contained in a multiplicativeExpression, which will be contained in an additiveExpression and so on and so forth. This organization is necessary to handle operator precedence between the different kinds of operations, so that 1 + 2 * 3 is parsed as a sum of 1 and 2 * 3 instead of a multiplication of 1 + 2 and 3. The problem is that from the logical point of view multiplications and additions are expressions at the same level: it does not make sense to have Matryoshka AST nodes.

Consider this code:

The AST produced by this grammar is:

While we would like something like:

Ideally we want to keep specifying grammars that produce the Matryoshka-style ASTs, but use flatter ASTs when doing analysis on the code. So we are going to build adapters from the ASTs as produced by Antlr to the “logical” ASTs.

How do we plan to do that? We will start by developing a language defining the shape of nodes as we want them to appear in the logical ASTs and we will also define how to map the Antlr nodes (the Matryoshka-style nodes) into these logical nodes.
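To give an idea of what such an adapter has to do, here is a hand-written Python sketch that collapses the Matryoshka nodes. It is not the DSL-based solution we are going to build and it does not use the Antlr API: the Node class is a hypothetical stand-in for the parse tree, and only the rule names mentioned above are treated as wrappers.

```python
class Node:
    """Hypothetical stand-in for a parse tree node."""

    def __init__(self, kind, children=(), text=None):
        self.kind = kind
        self.children = list(children)
        self.text = text


# Rules that exist only to encode operator precedence.
WRAPPERS = {"additiveExpression", "multiplicativeExpression", "unaryExpression"}


def flatten(node):
    """Replace every precedence wrapper that has a single child with that child."""
    children = [flatten(c) for c in node.children]
    if node.kind in WRAPPERS and len(children) == 1:
        return children[0]
    return Node(node.kind, children, node.text)


def literal(value):
    """A literal as the grammar would wrap it: unaryExpression -> literal."""
    return Node("unaryExpression", [Node("literal", text=value)])


# ++a, wrapped in every precedence level.
pre_increment = Node("additiveExpression", [
    Node("multiplicativeExpression", [
        Node("unaryExpression", [Node("preIncrementExpression", text="++a")]),
    ]),
])

# 1 + 2 * 3
addition = Node("additiveExpression", [
    Node("multiplicativeExpression", [literal("1")]),
    Node("multiplicativeExpression", [literal("2"), literal("3")]),
])

flat = flatten(addition)
print(flatten(pre_increment).kind, flat.kind, [c.kind for c in flat.children])
```

After flattening, ++a is just a preIncrementExpression, and 1 + 2 * 3 is an addition whose operands are a literal and a multiplication, with no empty wrappers in between.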

This is just the problem we are trying to solve: Xtext can be used to develop any sort of language, it is just that, being a parser maniac, I like to use DSLs to solve parser related problems. Which is very meta.

Getting started: installing Eclipse Luna DSL and creating the project

We are going to download a version of Eclipse containing the beta of Xtext 2.9.

In your brand new Eclipse you can create a new type of project: Xtext Projects.

Screenshot from 2015-06-01 09:44:03

We just have to define the name of the project and pick an extension to be associated with our new language

Screenshot from 2015-06-01 09:45:14

And then we select the platforms that we are interested in (yes, there is also the web platform… we will look into that in the future)

Screenshot from 2015-06-01 09:47:27

The project created contains a sample grammar. We could use it as is: we would just have to generate a few files by running the MWE2 file.

mwe

After running this command we could just use our new plugin in IntelliJ or in Eclipse. But we are going instead to first change the grammar, to transform the given example in our glorious DSL.

An example of our DSL

Our language will look like this in IntelliJ IDEA (cool, eh?).

Screenshot from 2015-06-02 19:42:14

Of course this is just a start, but we are starting to define some basic node types for a Java parser:

  • an enum representing the possible modifiers (warning: this is not a complete list)
  • the CompilationUnit which contains an optional PackageDeclaration and possibly many TypeDeclarations
  • TypeDeclaration is an abstract node and there are three concrete types extending it: EnumDeclaration, ClassDeclaration and InterfaceDeclaration (we are missing the annotation declaration)

We will need to add tens of expressions and statements but you should get an idea of the language we are trying to build.

Note also that we have a reference to an Antlr grammar (in the first line) but we are not yet specifying how our defined node types map to the Antlr node types.

Now the question is: how do we build it?

Define the grammar

We can define the grammar of our language with a simple EBNF notation (with a few extensions). Look for a file with the xtext extension in your project and change it like this:

The first rule we define corresponds to the root of the AST (Model in our case). Our Model starts with a reference to an Antlr file, followed by a list of Declarations. The idea is to specify declarations of our “logical” node types and how the “antlr” node types should be mapped to them. So we will define transformations that will have references to elements defined in the Antlr grammar that we will specify in the AntlrGrammarRef rule.

We could define either Enum or NodeType. The NodeType has a name, can be abstract and can extend another NodeType. Note that the supertype is a reference to a NodeType. It means that the resulting editor will automatically be able to give us auto-completion (listing all the NodeTypes defined in the file) and validation, verifying we are referring to an existing NodeType.

In our NodeTypes we can define as many fields as we want (NodeTypeField). Each field starts with a name, followed by an operator:

  • *= means we can have 0..n values in this field
  • ?= means that the field is optional (0..1 values)
  • = means that exactly one value is always present

A NodeTypeField also has a value type, which can be an enum defined inline (UnnamedEnumDeclaration), a relation (it means this node contains other nodes) or an attribute (it means this node has some basic attributes like a string or a boolean).
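To see what information these rules capture, here is a Python sketch of the model behind such a file, together with the supertype check that the generated editor performs for us. This is not code produced by Xtext: the classes are hypothetical, the field names in the example are made up, and "=" as the operator for exactly one value is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

# Operator -> (minimum, maximum) number of values; None means unbounded.
# "=" for the exactly-one case is an assumption of this sketch.
MULTIPLICITY = {"*=": (0, None), "?=": (0, 1), "=": (1, 1)}


@dataclass
class NodeTypeField:
    name: str
    operator: str
    value_type: str


@dataclass
class NodeType:
    name: str
    abstract: bool = False
    supertype: Optional[str] = None
    fields: list = field(default_factory=list)


def unresolved_supertypes(node_types):
    """Names of the node types whose supertype is not defined in the same file."""
    known = {t.name for t in node_types}
    return [t.name for t in node_types if t.supertype and t.supertype not in known]


types = [
    NodeType("TypeDeclaration", abstract=True),
    NodeType("ClassDeclaration", supertype="TypeDeclaration"),
    NodeType("CompilationUnit", fields=[
        NodeTypeField("packageDeclaration", "?=", "PackageDeclaration"),
        NodeTypeField("types", "*=", "TypeDeclaration"),
    ]),
    NodeType("EnumDeclaration", supertype="TypeDeclaraton"),  # typo made on purpose
]
print(unresolved_supertypes(types))  # ['EnumDeclaration']
```

With Xtext we do not write this check by hand: declaring the supertype as a reference in the grammar is enough to get both the validation and the auto-completion.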

Pretty simple, eh?

So we basically re-run the MWE2 files and we are ready to go.

See the plugin in action

To see our plugin installed in IntelliJ IDEA we just have to run gradle runIdea from the directory containing the idea plugin (me.tomassetti.asttransf.idea in our case). Just note that you need a recent version of gradle and you need to define JAVA_HOME. This command will download IntelliJ IDEA, install the plugin we developed and start it. In the opened IDE you can create a new project and define a new file. Just use the extension we specified when we created the project (.anttr in our case) and IDEA should use our newly defined editor.

Currently validation is working but the editor seems to react quite slowly. Auto-completion is instead broken for me. Consider that this is just a beta, so I expect these issues to disappear before Xtext 2.9 is released.

Next steps

We are just getting started but it is amazing how we can have a DSL with its editor for IDEA working in a matter of minutes.

I plan to work in a few different directions:

  • We need to see how to package and distribute the plugin: we can try it using gradle runIdea but we want to just produce a binary for people to install it without having to process the sources of the editor
  • Use arbitrary dependencies from Maven: this is going to be rather complicated because Maven and the Eclipse plugins (OSGi bundles) define their dependencies in their own way, so jars typically have to be packaged into bundles to be used in Eclipse plugins. However there are alternatives like Tycho and the p2-maven-plugin. Spoiler: I do not expect this one to be fast and easy…
  • We are not yet able to refer to elements defined in the Antlr grammar. It means that we should be able to parse the Antlr grammar and programmatically create EMF models, so that we can refer to them in our DSL. It requires knowing EMF (and it takes some time…). I am going to play with that in the future and this will probably require a loooong tutorial.

Conclusions

While I do not like Eclipse anymore (now I am used to IDEA and it seems to me so much better: faster and lighter) the Eclipse Modeling Framework keeps being a very interesting piece of software and being able to use it with IDEA is great.

It had been a while since I last played with EMF and Xtext and I have to say that I have seen some improvements. I had the feeling that Eclipse was not very command-line friendly and that it was in general difficult to integrate it with CI systems. I am seeing an effort being made to fix these problems (see Tycho or the gradle job we have used to start IDEA with the editor we developed) and it seems very positive to me.

Mixing technologies, combining the best aspects of different worlds in a pragmatic way is my philosophy, so I hope to find the time to play more with this stuff.

DSLs in Action

I just received the final version of DSLs in Action, a book I reviewed for Manning.

The author included in this version an example regarding Xtext. Probably I was not the only one to suggest it.

Now I am writing a short review to post on Amazon.

In my opinion the part on internal DSLs is great: it is very pragmatic and there are examples using a lot of languages. I recommend taking a look at this book.

P.s. And they pick a sentence of mine to put on the cover, wow, I am happy about that 😀