This post is part of a series. The goal of the series is to describe how to create a useful language and all the supporting tools.
- Building a lexer
- Building a parser
- Creating an editor with syntax highlighting
- Build an editor with autocompletion
- Mapping the parse tree to the abstract syntax tree
- Model to model transformations
- Validation
- Generating bytecode
After writing this series of posts I refined my method, expanded it, and clarified it in a book titled How to create pragmatic, lightweight languages.
In this post, we will start working on a very simple expression language. We will build it in our language sandbox and therefore we will call the language Sandy.
I think that tool support is vital for a language: for this reason we will start with an extremely simple language but we will build rich tool support for it. To benefit from a language we need a parser, interpreters and compilers, editors and more. It seems to me that there is a lot of material on building simple parsers but very little material on building the rest of the infrastructure needed to make using a language practical and effective.
I would like to focus on exactly these aspects, making a language small but fully useful. Then you will be able to grow your language organically.
The code is available on GitHub: https://github.com/ftomassetti/LangSandbox. The code presented in this article corresponds to the tag 01_lexer.
The language
The language will let us define variables and expressions. We will support:
- integer and decimal literals
- variable definition and assignment
- the basic mathematical operations (addition, subtraction, multiplication, division)
- the use of parentheses
An example of a valid file:
    var a = 10 / 3
    var b = (5 + 3) * 2
    var c = a / b
The tools we will use
We will use:
- ANTLR to generate the lexer and the parser
- Gradle as our build system
- Kotlin to write the code. It will be very basic Kotlin, given that I have just started learning it.
Set up the project
Our build.gradle file will look like this:
    buildscript {
        ext.kotlin_version = '1.3.70'
        repositories {
            mavenCentral()
            maven {
                name 'JFrog OSS snapshot repo'
                url 'https://oss.jfrog.org/oss-snapshot-local/'
            }
            jcenter()
        }
        dependencies {
            classpath "org.jetbrains.kotlin:kotlin-gradle-plugin:$kotlin_version"
        }
    }

    apply plugin: 'kotlin'
    apply plugin: 'java'
    apply plugin: 'idea'
    apply plugin: 'antlr'

    repositories {
        mavenLocal()
        mavenCentral()
        jcenter()
    }

    dependencies {
        antlr "org.antlr:antlr4:4.8"
        compile "org.antlr:antlr4-runtime:4.8"
        compile "org.jetbrains.kotlin:kotlin-stdlib:$kotlin_version"
        compile "org.jetbrains.kotlin:kotlin-reflect:$kotlin_version"
        testCompile "org.jetbrains.kotlin:kotlin-test:$kotlin_version"
        testCompile "org.jetbrains.kotlin:kotlin-test-junit:$kotlin_version"
        testCompile 'junit:junit:4.13'
    }

    generateGrammarSource {
        maxHeapSize = "64m"
        arguments += ['-package', 'me.tomassetti.langsandbox']
        outputDirectory = new File("generated-src/antlr/main/me/tomassetti/langsandbox".toString())
    }
    compileJava.dependsOn generateGrammarSource

    sourceSets {
        generated {
            java.srcDir 'generated-src/antlr/main/'
        }
    }
    compileJava.source sourceSets.generated.java, sourceSets.main.java

    clean {
        delete "generated-src"
    }

    idea {
        module {
            sourceDirs += file("generated-src/antlr/main")
        }
    }
We can run:
- ./gradlew idea to generate the IDEA project files
- ./gradlew generateGrammarSource to generate the ANTLR lexer and parser
Implementing the lexer
We will build the lexer and the parser in two separate files. This is the lexer:
    lexer grammar SandyLexer;

    // Whitespace
    NEWLINE            : '\r\n' | '\r' | '\n' ;
    WS                 : [\t ]+ ;

    // Keywords
    VAR                : 'var' ;

    // Literals
    INTLIT             : '0'|[1-9][0-9]* ;
    // The parentheses make the '.' and the fraction apply to both integer forms
    DECLIT             : ('0'|[1-9][0-9]*) '.' [0-9]+ ;

    // Operators
    PLUS               : '+' ;
    MINUS              : '-' ;
    ASTERISK           : '*' ;
    DIVISION           : '/' ;
    ASSIGN             : '=' ;
    LPAREN             : '(' ;
    RPAREN             : ')' ;

    // Identifiers
    ID                 : [_]*[a-z][A-Za-z0-9_]* ;
Now we can simply run ./gradlew generateGrammarSource and the lexer will be generated for us from the previous definition.
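To get a feeling for what the generated lexer does with these rules, here is a small hand-written sketch in plain Kotlin. It is not the code ANTLR generates and it is not part of the repository: the names rules and tokenize are made up for this illustration, and the decimal pattern assumes the integer part is grouped before the dot. It only mimics two behaviours of ANTLR lexers: the rule matching the longest piece of input wins, and when two rules match the same length the one defined first wins.

```kotlin
// Illustration only: the real SandyLexer is generated by ANTLR.
// Rule order mirrors the grammar, because earlier rules win ties.
val rules: List<Pair<String, Regex>> = listOf(
    "NEWLINE" to Regex("\r\n|\r|\n"),
    "WS" to Regex("[\t ]+"),
    "VAR" to Regex("var"),
    "INTLIT" to Regex("0|[1-9][0-9]*"),
    "DECLIT" to Regex("(0|[1-9][0-9]*)\\.[0-9]+"), // assumption: grouped integer part
    "PLUS" to Regex("\\+"),
    "MINUS" to Regex("-"),
    "ASTERISK" to Regex("\\*"),
    "DIVISION" to Regex("/"),
    "ASSIGN" to Regex("="),
    "LPAREN" to Regex("\\("),
    "RPAREN" to Regex("\\)"),
    "ID" to Regex("_*[a-z][A-Za-z0-9_]*")
)

// Returns the token names for the input, skipping WS and ending with "EOF".
fun tokenize(input: String): List<String> {
    val names = mutableListOf<String>()
    var pos = 0
    while (pos < input.length) {
        var bestName: String? = null
        var bestLength = 0
        for ((name, regex) in rules) {
            // Only accept a match that starts exactly at the current position.
            val match = regex.find(input, pos)?.takeIf { it.range.first == pos } ?: continue
            // Strictly longer only, so the first rule keeps a tie.
            if (match.value.length > bestLength) {
                bestName = name
                bestLength = match.value.length
            }
        }
        val name = bestName
            ?: throw IllegalArgumentException("Unexpected character '${input[pos]}' at offset $pos")
        if (name != "WS") names.add(name)
        pos += bestLength
    }
    names.add("EOF")
    return names
}
```

With this sketch, tokenize("var a = 1") gives VAR, ID, ASSIGN, INTLIT, EOF, while tokenize("variable") gives ID, EOF: both VAR and ID match the first three characters, but ID matches a longer piece of input. This is why the keyword rule can safely be placed before the identifier rule in the grammar.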
Testing the lexer
Testing is always important, but while building languages it is absolutely critical: if the tools supporting your language are not correct, this could affect every program you will build with them. So let’s start testing the lexer: we will just verify that the sequence of tokens the lexer produces is the one we expect.
    package me.tomassetti.sandy

    import me.tomassetti.langsandbox.SandyLexer
    import org.antlr.v4.runtime.CharStreams
    import java.util.*
    import kotlin.test.assertEquals
    import org.junit.Test as test

    class SandyLexerTest {

        fun lexerForCode(code: String) = SandyLexer(CharStreams.fromString(code))

        // Loads <resourceName>.sandy from the test classpath
        fun lexerForResource(resourceName: String) =
                SandyLexer(CharStreams.fromStream(this.javaClass.getResourceAsStream("/${resourceName}.sandy")))

        // Collects the names of the tokens produced by the lexer, skipping whitespace
        fun tokens(lexer: SandyLexer): List<String> {
            val tokens = LinkedList<String>()
            do {
                val t = lexer.nextToken()
                when (t.type) {
                    // -1 is the token type ANTLR uses for end of file
                    -1 -> tokens.add("EOF")
                    // Token types start at 1, ruleNames is indexed from 0
                    else -> if (t.type != SandyLexer.WS) tokens.add(lexer.ruleNames[t.type - 1])
                }
            } while (t.type != -1)
            return tokens
        }

        @test fun parseVarDeclarationAssignedAnIntegerLiteral() {
            assertEquals(listOf("VAR", "ID", "ASSIGN", "INTLIT", "EOF"),
                    tokens(lexerForCode("var a = 1")))
        }

        @test fun parseVarDeclarationAssignedADecimalLiteral() {
            assertEquals(listOf("VAR", "ID", "ASSIGN", "DECLIT", "EOF"),
                    tokens(lexerForCode("var a = 1.23")))
        }

        @test fun parseVarDeclarationAssignedASum() {
            assertEquals(listOf("VAR", "ID", "ASSIGN", "INTLIT", "PLUS", "INTLIT", "EOF"),
                    tokens(lexerForCode("var a = 1 + 2")))
        }

        @test fun parseMathematicalExpression() {
            assertEquals(listOf("INTLIT", "PLUS", "ID", "ASTERISK", "INTLIT", "DIVISION", "INTLIT", "MINUS", "INTLIT", "EOF"),
                    tokens(lexerForCode("1 + a * 3 / 4 - 5")))
        }

        @test fun parseMathematicalExpressionWithParenthesis() {
            assertEquals(listOf("INTLIT", "PLUS", "LPAREN", "ID", "ASTERISK", "INTLIT", "RPAREN", "MINUS", "DECLIT", "EOF"),
                    tokens(lexerForCode("1 + (a * 3) - 5.12")))
        }
    }
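The lookup lexer.ruleNames[t.type - 1] in the tokens function may look odd, so here is a tiny stand-alone sketch of the mapping it relies on. It assumes that ANTLR numbers the token types of this lexer grammar from 1, in the order the rules are written, while the array of rule names is indexed from 0. That should hold for a grammar like ours with no fragment rules, but treat it as an assumption rather than a general guarantee. The names sandyRuleNames, EOF_TYPE and tokenName are invented for this example.

```kotlin
// Rule names in the order they appear in the SandyLexer grammar (index 0 = first rule).
val sandyRuleNames = listOf(
    "NEWLINE", "WS", "VAR", "INTLIT", "DECLIT",
    "PLUS", "MINUS", "ASTERISK", "DIVISION", "ASSIGN",
    "LPAREN", "RPAREN", "ID"
)

// Same value as org.antlr.v4.runtime.Token.EOF.
const val EOF_TYPE = -1

// Maps a token type to a readable name: types start at 1, list indexes at 0.
fun tokenName(type: Int): String =
    if (type == EOF_TYPE) "EOF" else sandyRuleNames[type - 1]
```

Under these assumptions tokenName(3) is VAR and tokenName(13) is ID. If the grammar later gains fragment rules, the two numberings can drift apart, and asking the lexer vocabulary for the symbolic name of a token type would be the more robust choice.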
Conclusions and next steps
We started with the first small step: we set up the project and built the lexer.
We still have a long way to go before the language is usable in practice, but we have started. Next we will work on the parser with the same approach: building something simple that we can test and compile from the command line.