A Shiny, New ANTLR Release: 4.10

Every day we get closer to death, but on the other hand we also get closer to a new ANTLR release. All in all a fair trade, maybe. My point is ~~we are all going to die soon~~ that we have a new major release of ANTLR: 4.10. The previous one was in the most recent annus horribilis.

Contribution Process Changes

This is a major release for a few reasons, some notable for the project itself, some notable for all users. The project is adopting an improved contribution process. This is also relevant for all users, because it includes a requirement for contributors to sign off every commit, indicating that they agree with the Developer Certificate of Origin. This is helpful for companies that need to keep track of all contributors to the code that they use, for legal or security reasons.

A New ATN Format, New Parsers

ANTLR internally uses a state machine called an augmented transition network (ATN). The format of this ATN has changed with this release. This has one major consequence: parsers generated with this new release cannot be used by old runtimes. So you need to regenerate all of your parsers when you update the runtime for your project. In practice, this is not a big issue, but it is something to take into account. We encountered this issue when using the ANTLR PHP runtime in our tutorial on ANTLR for PHP.
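To picture why a generated parser and a runtime have to agree on the ATN format, here is a minimal sketch of a version check. It assumes, purely for illustration, that the serialized ATN is an array of integers whose first element is a format version; the class name, the method name and the version numbers 3 and 4 used below are hypothetical and are not the ANTLR runtime's actual API.

```java
// Hypothetical sketch: not the ANTLR runtime's real deserializer.
final class SerializedAtnCheck {

    /**
     * Rejects serialized data whose format version differs from the one
     * this (imaginary) runtime understands.
     * Assumption: the version is stored as the first integer.
     */
    static void requireVersion(int[] serializedAtn, int supportedVersion) {
        int actualVersion = serializedAtn[0];
        if (actualVersion != supportedVersion) {
            throw new IllegalStateException(
                "Serialized ATN has version " + actualVersion
                + " but this runtime expects " + supportedVersion
                + "; regenerate the parser with a matching tool version.");
        }
    }
}
```

With a check like this, data written in a newer format (say, `{4, ...}`) passes on a runtime that supports version 4 and is refused by one that only supports version 3, which is why regenerating the parser together with the runtime upgrade resolves the problem.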

New Minimum Java Version

ANTLR now uses Java 11 for the source code and the compiled .class files of the ANTLR tool. This is relevant only if you are compiling the ANTLR tool itself. However, the Java runtime target has also been updated to require Java 8 (up from Java 7). This means that your Java code must be compiled with Java 8 or later.

Now You Can Have Case Insensitive Lexers

This might be a small change, but a very welcome one. By default an ANTLR parser relies on a case sensitive lexer, so an input string like CODE is considered different from code. This is the best approach for the general case, but there are notable exceptions. For example, SQL is a common language in which case does not matter.

Until now, you could bypass the issue by redesigning the grammar or by using a base class that automatically changes the input stream to adjust the case. However, the first option is problematic if you are relying on a public grammar, because it would force you to fork it. The second one is a problem because you effectively add a target-language dependent requirement to your grammar. This means that you need to use or create a case changing class for every runtime you use. With the new caseInsensitive lexer option, neither workaround is needed.
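The second workaround can be pictured roughly as follows. This is a simplified sketch, not the real thing: `SimpleCharStream`, `StringStream` and `UpperCasingStream` are hypothetical names invented here, and the interface is a stripped-down stand-in rather than the ANTLR runtime's `CharStream`. The idea is that the lexer sees upper-cased characters when it looks ahead, while the token text is still taken from the original input.

```java
// Hypothetical, simplified stream interface for illustration only;
// it is NOT the CharStream interface of the ANTLR runtime.
interface SimpleCharStream {
    int EOF = -1;

    /** Returns the character `offset` positions ahead (1 = next), or EOF. */
    int lookAhead(int offset);

    /** Moves past the current character. */
    void consume();

    /** Returns the original text between two indexes, both inclusive. */
    String getText(int start, int stop);
}

/** Plain stream over a string. */
final class StringStream implements SimpleCharStream {
    private final String text;
    private int position = 0;

    StringStream(String text) { this.text = text; }

    @Override
    public int lookAhead(int offset) {
        int index = position + offset - 1;
        return (index >= 0 && index < text.length()) ? text.charAt(index) : EOF;
    }

    @Override
    public void consume() {
        if (position < text.length()) position++;
    }

    @Override
    public String getText(int start, int stop) {
        return text.substring(start, stop + 1);
    }
}

/** Wrapper that upper-cases what the lexer sees, but keeps the original text. */
final class UpperCasingStream implements SimpleCharStream {
    private final SimpleCharStream inner;

    UpperCasingStream(SimpleCharStream inner) { this.inner = inner; }

    @Override
    public int lookAhead(int offset) {
        int c = inner.lookAhead(offset);
        return c == EOF ? EOF : Character.toUpperCase(c);
    }

    @Override
    public void consume() { inner.consume(); }

    @Override
    public String getText(int start, int stop) { return inner.getText(start, stop); }
}
```

A grammar written entirely with upper-case keywords would then match `select` as well as `SELECT`. The drawback described above is visible here: this wrapper is Java code, so an equivalent class has to exist for each target language you generate parsers for.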

You should read the discussion about the change to understand the nuances of this feature. This is especially worthwhile if you are interested in languages that do not use a Latin alphabet, have non-obvious ways to change case, or have issues with transliteration. Examples of such languages are German and Russian.

Better Handling Of Reserved Words

ANTLR could sometimes generate parsers using names that conflicted with the ones used in your grammar, leaving you with unexpected runtime errors. This could be especially puzzling when using a grammar with multiple target languages, because a grammar rule could be fine for one language but create issues in another.

ANTLR will now escape reserved words for each target language in order to fix this and similar issues.
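The general idea of per-target escaping can be sketched like this. Everything here is an assumption made for illustration: the class name, the tiny reserved-word lists, and the choice of a trailing underscore as the escaped form. The scheme ANTLR actually applies for each target may differ.

```java
import java.util.Set;

// Hypothetical sketch of per-target reserved-word escaping.
final class ReservedWordEscaper {

    // Illustrative subsets only, not the complete keyword lists.
    static final Set<String> JAVA_RESERVED = Set.of("class", "int", "return", "while");
    static final Set<String> PYTHON_RESERVED = Set.of("def", "lambda", "pass", "while");

    /**
     * Returns a name that is safe to emit for the given target:
     * reserved names get a trailing underscore (an assumed convention),
     * all other names are returned unchanged.
     */
    static String escape(String ruleName, Set<String> reservedForTarget) {
        return reservedForTarget.contains(ruleName) ? ruleName + "_" : ruleName;
    }
}
```

Under these assumptions, a rule named `class` would be renamed for the Java target but left alone for Python, while a rule named `def` would be treated the other way around. This mirrors the situation described above, where the same rule name is harmless in one target and a problem in another.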

Improved JavaScript Runtime

The JavaScript runtime has been reworked, improving both its conformance to idiomatic JavaScript style and its overall quality.

We ran a few tests, like the one mentioned in our article about improving performance of ANTLR parsers. We found a noticeable improvement when parsing our Kotlin example file with the new JavaScript runtime.

Language	Startup	Time elapsed (seconds)
JavaScript with 4.9.3	Cold	4
JavaScript with 4.9.3	Warm-up	0.11
JavaScript with 4.10	Cold	3.7
JavaScript with 4.10	Warm-up	0.1

This is not a complete, scientific performance test, so your mileage may vary. However, you might also see an improvement approaching 10%, which is a nice change.

We did not find a noticeable improvement when parsing a JSON file, but it was already under 0.1 seconds in cold startup, so there was little to measure to begin with.
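The relative improvement implied by the table is simple arithmetic. The helper below is only a convenience for reproducing it; the class and method names are made up for this example.

```java
// Small helper to compute the relative improvement between two timings.
final class Speedup {

    /** Percentage of time saved when going from `before` to `after` seconds. */
    static double improvementPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }
}
```

For the figures in the table, `Speedup.improvementPercent(4.0, 3.7)` gives about 7.5% for the cold start and `Speedup.improvementPercent(0.11, 0.1)` gives about 9.1% after warm-up.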

Summary

In addition to the mentioned changes, there are a number of bugfixes and small things that you can read in the release notes.

ANTLR is the best tool to build parsers out there: it is well-tested, reliable enough to be used in an enterprise setting and usable enough that you actually want to work with it. This release confirms this, by offering a few quality-of-life improvements, fixes and maintenance changes.

To support you in polishing your parser you can use libraries like Kolasu:

Kolasu supplies the infrastructure to build a custom, possibly mutable, Abstract Syntax Tree (AST) using Kotlin. In particular it can be integrated easily with ANTLR, but it can also be used without.

You know it is good, because we made it.

If you still do not know how to use this wonderful tool, you can read our complete mega tutorial on ANTLR.

Resources

If you are looking to learn more about ANTLR, you may find our video course “Using ANTLR like a Professional” interesting.
