On the need of a generic library around ANTLR: using reflection to build a metamodel

I am a Language Engineer: I use several tools to define and process languages.

Among other tools I use ANTLR: it is simple, it is flexible, I can build things around it.

However I find myself rebuilding similar tools around ANTLR for different projects. I see two problems with that:

  • ANTLR is a very good building block but with ANTLR alone not much can be done: the value lies in the processing we can do on the AST and I do not see an ecosystem of libraries around ANTLR
  • ANTLR does not produce a metamodel of the grammar: without it becomes very difficult to build generic tools around ANTLR

Let me explain that:

  • For people with experience with EMF: we basically need an Ecore-equivalent for each grammar.
  • For the others: read next paragraph

Why we need a metamodel

Suppose I want to build a generic library to produce an XML file or a JSON document from an AST produced by ANTLR. How could I do that?

Well, given a ParseRuleContext I can take the rule index and find the name. I have generated the parser for the Python grammar to have some examples, so let’s see how to do that with an actual class:

Let’s look at the class Single_inputContext:

I should obtain something like this:

Good. It is very easy for me to look at the class and recognize these elements, however how can I do that automatically?

Reflection, obviously, you will think.

Yes. That would work. However what if when we have multiple elements? Take this class:

To define metamodels I would not try to come up anything fancy. I would use the classical schema which is at the base of EMF and it is similar to what it is available in MPS.

I would add a sort of container named Package or Metamodel. The Package would list several Entities. We could also mark one of those entity as the root Entity.

Each Entity would have:

  • a name
  • an optional parent Entity (from which it inherits properties and relations)
  • a list of properties
  • a list of relations

Each Property would have:

  • a name
  • a type chosen among the primitive type. In practice I expect to use just String and Integers. Possibly enums in the future
  • a multiplicity (1 or many)

Each Relation would have:

  • a name
  • the kind: containment or reference. Now, the AST knows only about containments, however later we could implement symbol resolution and model transformations and at that stage we will need references
  • a target type: another Entity
  • a multiplicity (1 or many)

Next steps

I would start building a metamodel and later building generic tools taking advantage of the metamodel.

There are other things that typically need:

  • transformations: the AST which I generally get from ANTLR is determined by how I am force to express the grammar to obtain something parsable. Sometimes I have also to do some refactoring to improve performance. I want to transform the AST after parsing to obtain closer to the logical structure of the language.
  • unmarshalling: from the AST I want to produce the test back
  • symbol resolution: this could be absolutely not trivial, as I have found out building a symbol solver for Java

Yes, I know that some of you are thinking: just use Xtext. While I like EMF (Xtext is built on top of it), it has a steep learning curve and I have seen many people confused by it. I also do not like how OSGi plays with the non-OSGi world. Finally Xtext is coming with a lot of dependencies.

Do not get my wrong: I think Xtext is an amazing solution in a lot of contexts. However there are clients who prefer a leaner approach. For the cases in which it makes sense we need an alternative. I think it can be built on top of ANTLR, but there is work to do.

By the way years ago I built something similar for .NET and I called it NetModelingFramework.

Download the guide with 68 resources on Creating Programming Languages

68resources

Receive the guide to your inbox to read it on all your devices when you have time

Powered by ConvertKit
0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply