Building a compiler for your own language: validation


So you have parsed your code and built a clean AST for it. Now it is time to check if what the user has expressed make sense at all. We should perform validation, identifying semantical errors, to communicate together with lexical and syntactical errors (provided by the parser).

This post is the part of a series. The goal of the series is to describe how to create a useful language and all the supporting tools.

  1. Building a lexer
  2. Building a parser
  3. Creating an editor with syntax highlighting
  4. Build an editor with autocompletion
  5. Mapping the parse tree to the abstract syntax tree
  6. Model to model transformations
  7. Validation
  8. Generating bytecode

After writing this series of posts I refined my method, expanded it, and clarified into this book titled
How to create pragmatic, lightweight languages



Code is available on GitHub under the tag 07_validation

Implement semantic checks

In the previous post we have seen how to implement the function process to execute an action on all the nodes of our AST. A typical case is that we want to execute certain operation only on certain nodes. We want still to use process to navigate the tree. We can do that by creating this function named specificProcess.

Let’s see how to use specificProcess to:

  • find all the VariableDeclarations and check they are not re-declaring a variable already declared
  • find all the VarReferences and verify they are not referring to a variable that has been not declared or has been declared after the VarReference
  • perform the same check done on VarReferences also for Assignments

Ok, so invoking validate on the root of the AST will return all possible semantic errors.

Getting all errors: lexical, syntactic, and semantic

We first need to invoke the ANTLR parser and get:

  • the parse tree
  • the list of lexical and syntactic error

Then we map the parse tree to the AST and perform the semantic validation. Finally we return the AST and all the errors combined.

In the rest of the system we will simply call the SandyParserFacade without the need to invoke the ANTLR parser directly.

Test validation

Will it fly? Let’s verify that.


This is all nice and well: with a simple call we can get a list of all errors we have. For each of them we have a description and the position. This is enough for our compiler but we would now need to show these errors in the editor. We will do that in out of the future posts.

Download the guide with 68 resources on Creating Programming Languages


Receive the guide to your inbox to read it on all your devices when you have time

Powered by ConvertKit
2 replies

Comments are closed.