On the need of a generic library around ANTLR: using reflection to build a metamodel

I am a Language Engineer: I use several tools to define and process languages.

Among other tools I use ANTLR: it is simple, it is flexible, I can build things around it.

However I find myself rebuilding similar tools around ANTLR for different projects. I see two problems with that:

  • ANTLR is a very good building block, but with ANTLR alone not much can be done: the value lies in the processing we can do on the AST, and I do not see an ecosystem of libraries around ANTLR
  • ANTLR does not produce a metamodel of the grammar: without one it becomes very difficult to build generic tools around ANTLR

Let me explain that:

  • For people with experience with EMF: we basically need an Ecore-equivalent for each grammar.
  • For the others: read the next section

Why we need a metamodel

Suppose I want to build a generic library to produce an XML file or a JSON document from an AST produced by ANTLR. How could I do that?

Well, given a ParserRuleContext I can take the rule index and use it to look up the rule name. I have generated the parser for the Python grammar to have some examples, so let’s see how to do that with an actual class:

Python3Parser.Single_inputContext astRoot = pythonParse(...my code...);
String ruleName = Python3Parser.ruleNames[astRoot.getRuleIndex()];

Let’s look at the class Single_inputContext:

public static class Single_inputContext extends ParserRuleContext {
    public TerminalNode NEWLINE() { return getToken(Python3Parser.NEWLINE, 0); }
    public Simple_stmtContext simple_stmt() {
        return getRuleContext(Simple_stmtContext.class,0);
    }
    public Compound_stmtContext compound_stmt() {
        return getRuleContext(Compound_stmtContext.class,0);
    }
    public Single_inputContext(ParserRuleContext parent, int invokingState) {
        super(parent, invokingState);
    }
    @Override public int getRuleIndex() { return RULE_single_input; }
    public void enterRule(ParseTreeListener listener) {
        if ( listener instanceof Python3Listener ) ((Python3Listener)listener).enterSingle_input(this);
    }
    public void exitRule(ParseTreeListener listener) {
        if ( listener instanceof Python3Listener ) ((Python3Listener)listener).exitSingle_input(this);
    }
}

In this case I would like to:

I should obtain something like this:

<Single_input NEWLINES="...">

Good. It is very easy for me to look at the class and recognize these elements, but how can I do that automatically?

Reflection, obviously, you will think.

Yes. That would work. However, what happens when we have multiple elements? Take this class:

public static class File_inputContext extends ParserRuleContext {
    public TerminalNode EOF() { return getToken(Python3Parser.EOF, 0); }
    public List<TerminalNode> NEWLINE() { return getTokens(Python3Parser.NEWLINE); }
    public TerminalNode NEWLINE(int i) {
        return getToken(Python3Parser.NEWLINE, i);
    }
    public List<StmtContext> stmt() {
        return getRuleContexts(StmtContext.class);
    }
    public StmtContext stmt(int i) {
        return getRuleContext(StmtContext.class,i);
    }
    public File_inputContext(ParserRuleContext parent, int invokingState) {
        super(parent, invokingState);
    }
    @Override public int getRuleIndex() { return RULE_file_input; }
    public void enterRule(ParseTreeListener listener) {
        if ( listener instanceof Python3Listener ) ((Python3Listener)listener).enterFile_input(this);
    }
    public void exitRule(ParseTreeListener listener) {
        if ( listener instanceof Python3Listener ) ((Python3Listener)listener).exitFile_input(this);
    }
}
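
In this case we can look at the generic return type of the method to find the type of the elements in the list:

import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

Class<?> clazz = Python3Parser.File_inputContext.class;
// getMethod throws NoSuchMethodException if there is no such accessor
Method method = clazz.getMethod("stmt");
Type listType = method.getGenericReturnType();
if (listType instanceof ParameterizedType) {
    Type elementType = ((ParameterizedType) listType).getActualTypeArguments()[0];
    System.out.println("ELEMENT TYPE "+elementType);
}

This prints:

ELEMENT TYPE class me.tomassetti.antlrplus.python.Python3Parser$StmtContext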
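
The same idea can be applied to every accessor of a context class at once. What follows is only a minimal sketch, not a finished library: it scans the public no-argument methods of a class and classifies each one as a token or a child rule, with multiplicity one or many. To keep the sketch compilable on its own, TerminalNode, ParserRuleContext, StmtContext and File_inputContext are hypothetical stand-ins that only mimic the shape of the ANTLR types (in a real project they would come from the ANTLR runtime and from the generated parser), and FeatureScanner is a name I made up for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for the ANTLR runtime types, so the sketch compiles alone.
class TerminalNode {}
class ParserRuleContext {
    public int getRuleIndex() { return -1; }
}
class StmtContext extends ParserRuleContext {}

// Same accessor shape as the generated File_inputContext (bodies are dummies).
class File_inputContext extends ParserRuleContext {
    public TerminalNode EOF() { return null; }
    public List<TerminalNode> NEWLINE() { return null; }
    public TerminalNode NEWLINE(int i) { return null; }
    public List<StmtContext> stmt() { return null; }
    public StmtContext stmt(int i) { return null; }
    @Override public int getRuleIndex() { return 0; }
}

// Hypothetical helper: describes each feature as "name:kind:multiplicity".
class FeatureScanner {
    static List<String> scan(Class<?> contextClass) {
        List<String> features = new ArrayList<>();
        for (Method m : contextClass.getDeclaredMethods()) {
            // Keep only public accessors without parameters: stmt(), not stmt(int).
            if (!Modifier.isPublic(m.getModifiers()) || m.getParameterCount() != 0 || m.isSynthetic()) {
                continue;
            }
            Type returned = m.getGenericReturnType();
            Type element = returned;
            boolean many = false;
            if (returned instanceof ParameterizedType
                    && ((ParameterizedType) returned).getRawType() == List.class) {
                many = true;
                element = ((ParameterizedType) returned).getActualTypeArguments()[0];
            }
            if (!(element instanceof Class)) {
                continue;
            }
            Class<?> elementClass = (Class<?>) element;
            String kind;
            if (TerminalNode.class.isAssignableFrom(elementClass)) {
                kind = "token";
            } else if (ParserRuleContext.class.isAssignableFrom(elementClass)) {
                kind = "child";
            } else {
                continue; // e.g. getRuleIndex(), which returns an int
            }
            features.add(m.getName() + ":" + kind + ":" + (many ? "many" : "one"));
        }
        // getDeclaredMethods() gives no ordering guarantee, so sort for stable output.
        Collections.sort(features);
        return features;
    }

    public static void main(String[] args) {
        for (String feature : scan(File_inputContext.class)) {
            System.out.println(feature);
        }
    }
}
```

Tokens would naturally become properties and child rules would become containment relations, which is exactly the information a metamodel has to record.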

To define metamodels I would not try to come up with anything fancy. I would use the classical schema which is at the base of EMF and which is similar to what is available in MPS.

I would add a sort of container named Package or Metamodel. The Package would list several Entities. We could also mark one of those entities as the root Entity.

Each Entity would have:

  • a name
  • an optional parent Entity (from which it inherits properties and relations)
  • a list of properties
  • a list of relations

Each Property would have:

  • a name
  • a type chosen among the primitive types. In practice I expect to use just String and Integer, and possibly enums in the future
  • a multiplicity (1 or many)

Each Relation would have:

  • a name
  • the kind: containment or reference. For now the AST knows only about containments; however, later we could implement symbol resolution and model transformations, and at that stage we will need references
  • a target type: another Entity
  • a multiplicity (1 or many)
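
The schema above could be written down roughly as follows. This is a minimal sketch under my own assumptions, not an existing API: all the class names are illustrative (I use Metamodel for the container because Package would clash with java.lang.Package), and the example entities, including the base Node entity with its line property, are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

enum Multiplicity { ONE, MANY }
enum PrimitiveType { STRING, INTEGER }
enum RelationKind { CONTAINMENT, REFERENCE }

final class Property {
    final String name;
    final PrimitiveType type;
    final Multiplicity multiplicity;
    Property(String name, PrimitiveType type, Multiplicity multiplicity) {
        this.name = name; this.type = type; this.multiplicity = multiplicity;
    }
}

final class Relation {
    final String name;
    final RelationKind kind;
    final Entity target;
    final Multiplicity multiplicity;
    Relation(String name, RelationKind kind, Entity target, Multiplicity multiplicity) {
        this.name = name; this.kind = kind; this.target = target; this.multiplicity = multiplicity;
    }
}

final class Entity {
    final String name;
    final Entity parent; // null when the entity has no parent
    final List<Property> properties = new ArrayList<>();
    final List<Relation> relations = new ArrayList<>();
    Entity(String name, Entity parent) { this.name = name; this.parent = parent; }

    // Own properties plus the inherited ones, inherited first.
    List<Property> allProperties() {
        List<Property> all = (parent == null) ? new ArrayList<Property>() : parent.allProperties();
        all.addAll(properties);
        return all;
    }

    // Own relations plus the inherited ones, inherited first.
    List<Relation> allRelations() {
        List<Relation> all = (parent == null) ? new ArrayList<Relation>() : parent.allRelations();
        all.addAll(relations);
        return all;
    }
}

final class Metamodel {
    final String name;
    final Map<String, Entity> entities = new LinkedHashMap<>();
    Entity root; // optional: the entity expected at the top of the AST
    Metamodel(String name) { this.name = name; }

    Entity addEntity(String entityName, Entity parent) {
        Entity entity = new Entity(entityName, parent);
        entities.put(entityName, entity);
        return entity;
    }
}

final class MetamodelExample {
    // Hypothetical fragment of a metamodel for the Python grammar used above.
    static Metamodel python() {
        Metamodel mm = new Metamodel("Python3");
        Entity node = mm.addEntity("Node", null);
        node.properties.add(new Property("line", PrimitiveType.INTEGER, Multiplicity.ONE));
        Entity stmt = mm.addEntity("Stmt", node);
        Entity fileInput = mm.addEntity("File_input", node);
        fileInput.properties.add(new Property("NEWLINE", PrimitiveType.STRING, Multiplicity.MANY));
        fileInput.relations.add(new Relation("stmt", RelationKind.CONTAINMENT, stmt, Multiplicity.MANY));
        mm.root = fileInput;
        return mm;
    }

    public static void main(String[] args) {
        Entity root = python().root;
        System.out.println(root.name + ": " + root.allProperties().size()
                + " properties, " + root.allRelations().size() + " relation(s)");
    }
}
```

Keeping Entity, Property and Relation as plain data objects means a generic tool only needs to walk these lists, without knowing anything about the specific grammar.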

Next steps

I would start by building a metamodel and later build generic tools that take advantage of it.
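
To give an idea of what such a generic tool could look like, here is a minimal sketch of the XML producer mentioned at the beginning. It assumes the AST has already been converted into a uniform structure: the Node class below is hypothetical and simply stores an entity name, property values and contained children grouped by relation name. XmlWriter is likewise an illustrative name, and the exact XML layout is just one possible choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generic node: an AST node described only in metamodel terms.
final class Node {
    final String entity;
    final Map<String, String> properties = new LinkedHashMap<>();
    final Map<String, List<Node>> children = new LinkedHashMap<>();

    Node(String entity) { this.entity = entity; }

    Node set(String property, String value) {
        properties.put(property, value);
        return this;
    }

    Node add(String relation, Node child) {
        children.computeIfAbsent(relation, key -> new ArrayList<>()).add(child);
        return this;
    }
}

final class XmlWriter {
    // Escapes the characters that are not allowed inside a double-quoted attribute.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    static String toXml(Node node) {
        StringBuilder sb = new StringBuilder();
        write(node, sb);
        return sb.toString();
    }

    // Properties become attributes; each relation becomes a wrapper element.
    private static void write(Node node, StringBuilder sb) {
        sb.append('<').append(node.entity);
        for (Map.Entry<String, String> property : node.properties.entrySet()) {
            sb.append(' ').append(property.getKey())
              .append("=\"").append(escape(property.getValue())).append('"');
        }
        if (node.children.isEmpty()) {
            sb.append("/>");
            return;
        }
        sb.append('>');
        for (Map.Entry<String, List<Node>> relation : node.children.entrySet()) {
            sb.append('<').append(relation.getKey()).append('>');
            for (Node child : relation.getValue()) {
                write(child, sb);
            }
            sb.append("</").append(relation.getKey()).append('>');
        }
        sb.append("</").append(node.entity).append('>');
    }

    public static void main(String[] args) {
        Node root = new Node("File_input")
                .add("stmt", new Node("Stmt").set("text", "a < b"));
        // prints <File_input><stmt><Stmt text="a &lt; b"/></stmt></File_input>
        System.out.println(toXml(root));
    }
}
```

The writer never mentions Python: the same code would work for any grammar whose AST can be exposed through the metamodel.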

There are other things that I typically need:

  • transformations: the AST which I generally get from ANTLR is determined by how I am forced to express the grammar to obtain something parsable. Sometimes I also have to do some refactoring to improve performance. I want to transform the AST after parsing to obtain something closer to the logical structure of the language.
  • unmarshalling: from the AST I want to produce the text back
  • symbol resolution: this can be far from trivial, as I found out while building a symbol solver for Java

Yes, I know that some of you are thinking: just use Xtext. While I like EMF (Xtext is built on top of it), it has a steep learning curve and I have seen many people confused by it. I also do not like how OSGi plays with the non-OSGi world. Finally, Xtext comes with a lot of dependencies.

Do not get me wrong: I think Xtext is an amazing solution in a lot of contexts. However, there are clients who prefer a leaner approach. For the cases in which it makes sense we need an alternative. I think it can be built on top of ANTLR, but there is work to do.

By the way, years ago I built something similar for .NET and I called it NetModelingFramework.
