So you have parsed your code and built a clean AST for it. Now it is time to check if what the user has expressed make sense at all. We should perform validation, identifying semantical errors, to communicate together with lexical and syntactical errors (provided by the parser).

    This post is the part of a series. The goal of the series is to describe how to create a useful language and all the supporting tools.

    1. Building a lexer
    2. Building a parser
    3. Creating an editor with syntax highlighting
    4. Build an editor with autocompletion
    5. Mapping the parse tree to the abstract syntax tree
    6. Model to model transformations
    7. Validation
    8. Generating bytecode

    After writing this series of posts I refined my method, expanded it, and clarified into this book titled How to create pragmatic, lightweight languages

    Code

    Code is available on GitHub under the tag 07_validation

    Implement semantic checks

    In the previous post we have seen how to implement the function process to execute an action on all the nodes of our AST. A typical case is that we want to execute certain operation only on certain nodes. We want still to use process to navigate the tree. We can do that by creating this function named specificProcess.

    fun <T: Node> Node.specificProcess(klass: Class<T>, operation: (T) -> Unit) {
        process { if (klass.isInstance(it)) { operation(it as T) } }
    }

    Let’s see how to use specificProcess to:

    • find all the VariableDeclarations and check they are not re-declaring a variable already declared
    • find all the VarReferences and verify they are not referring to a variable that has been not declared or has been declared after the VarReference
    • perform the same check done on VarReferences also for Assignments
    data class Error(val message: String, val position: Point)
    
    fun SandyFile.validate() : List<Error> {
        val errors = LinkedList<Error>()
    
        // check a variable is not duplicated
        val varsByName = HashMap<String, VarDeclaration>()
        this.specificProcess(VarDeclaration::class.java) {
            if (varsByName.containsKey(it.varName)) {
                errors.add(Error("A variable named '${it.varName}' has been already declared at ${varsByName[it.varName]!!.position!!.start}",
                        it.position!!.start))
            } else {
                varsByName[it.varName] = it
            }
        }
    
        // check a variable is not referred before being declared
        this.specificProcess(VarReference::class.java) {
            if (!varsByName.containsKey(it.varName)) {
                errors.add(Error("There is no variable named '${it.varName}'", it.position!!.start))
            } else if (it.isBefore(varsByName[it.varName]!!)) {
                errors.add(Error("You cannot refer to variable '${it.varName}' before its declaration", it.position!!.start))
            }
        }
        this.specificProcess(Assignment::class.java) {
            if (!varsByName.containsKey(it.varName)) {
                errors.add(Error("There is no variable named '${it.varName}'", it.position!!.start))
            } else if (it.isBefore(varsByName[it.varName]!!)) {
                errors.add(Error("You cannot refer to variable '${it.varName}' before its declaration", it.position!!.start))
            }
        }
    
        return errors
    }

    Ok, so invoking validate on the root of the AST will return all possible semantic errors.

    Getting all errors: lexical, syntactic, and semantic

    We first need to invoke the ANTLR parser and get:

    • the parse tree
    • the list of lexical and syntactic error
    data class AntlrParsingResult(val root : SandyFileContext?, val errors: List<Error>) {
        fun isCorrect() = errors.isEmpty() && root != null
    }
    
    fun String.toStream(charset: Charset = Charsets.UTF_8) = ByteArrayInputStream(toByteArray(charset))
    
    object SandyAntlrParserFacade {
    
        fun parse(code: String) : AntlrParsingResult = parse(code.toStream())
    
        fun parse(file: File) : AntlrParsingResult = parse(FileInputStream(file))
    
        fun parse(inputStream: InputStream) : AntlrParsingResult {
            val lexicalAndSyntaticErrors = LinkedList<Error>()
            val errorListener = object : ANTLRErrorListener {
                override fun reportAmbiguity(p0: Parser?, p1: DFA?, p2: Int, p3: Int, p4: Boolean, p5: BitSet?, p6: ATNConfigSet?) {
                    // Ignored for now
                }
    
                override fun reportAttemptingFullContext(p0: Parser?, p1: DFA?, p2: Int, p3: Int, p4: BitSet?, p5: ATNConfigSet?) {
                    // Ignored for now
                }
    
                override fun syntaxError(recognizer: Recognizer<*, *>?, offendingSymbol: Any?, line: Int, charPositionInline: Int, msg: String, ex: RecognitionException?) {
                    lexicalAndSyntaticErrors.add(Error(msg, Point(line, charPositionInline)))
                }
    
                override fun reportContextSensitivity(p0: Parser?, p1: DFA?, p2: Int, p3: Int, p4: Int, p5: ATNConfigSet?) {
                    // Ignored for now
                }
            }
    
            val lexer = SandyLexer(ANTLRInputStream(inputStream))
            lexer.removeErrorListeners()
            lexer.addErrorListener(errorListener)
            val parser = SandyParser(CommonTokenStream(lexer))
            parser.removeErrorListeners()
            parser.addErrorListener(errorListener)
            val antlrRoot = parser.sandyFile()
            return AntlrParsingResult(antlrRoot, lexicalAndSyntaticErrors)
        }
    
    }
    
    

    Then we map the parse tree to the AST and perform the semantic validation. Finally we return the AST and all the errors combined.

    data class ParsingResult(val root : SandyFile?, val errors: List<Error>) {
        fun isCorrect() = errors.isEmpty() && root != null
    }
    
    object SandyParserFacade {
    
        fun parse(code: String) : ParsingResult = parse(code.toStream())
    
        fun parse(file: File) : ParsingResult = parse(FileInputStream(file))
    
        fun parse(inputStream: InputStream) : ParsingResult {
            val antlrParsingResult = SandyAntlrParserFacade.parse(inputStream)
            val lexicalAnsSyntaticErrors = antlrParsingResult.errors
            val antlrRoot = antlrParsingResult.root
            val astRoot = antlrRoot?.toAst(considerPosition = true)
            val semanticErrors = astRoot?.validate() ?: emptyList()
            return ParsingResult(astRoot, lexicalAnsSyntaticErrors + semanticErrors)
        }
    
    }

    In the rest of the system we will simply call the SandyParserFacade without the need to invoke the ANTLR parser directly.

    Test validation

    Will it fly? Let’s verify that.

    class ValidationTest {
    
        @test fun duplicateVar() {
            val errors = SandyParserFacade.parse("""var a = 1
                                                   |var a =2""".trimMargin("|")).errors
            assertEquals(listOf(Error("A variable named 'a' has been already declared at Line 1, Column 0", Point(2,0))), errors)
        }
    
        @test fun unexistingVarReference() {
            val errors = SandyParserFacade.parse("var a = b + 2").errors
            assertEquals(listOf(Error("There is no variable named 'b'", Point(1,8))), errors)
        }
    
        @test fun varReferenceBeforeDeclaration() {
            val errors = SandyParserFacade.parse("""var a = b + 2
                                                   |var b = 2""".trimMargin("|")).errors
            assertEquals(listOf(Error("You cannot refer to variable 'b' before its declaration", Point(1,8))), errors)
        }
    
        @test fun unexistingVarAssignment() {
            val errors = SandyParserFacade.parse("a = 3").errors
            assertEquals(listOf(Error("There is no variable named 'a'", Point(1,0))), errors)
        }
    
        @test fun varAssignmentBeforeDeclaration() {
            val errors = SandyParserFacade.parse("""a = 1
                                                   |var a =2""".trimMargin("|")).errors
            assertEquals(listOf(Error("You cannot refer to variable 'a' before its declaration", Point(1,0))), errors)
    
        }
    

    Conclusions

    This is all nice and well: with a simple call we can get a list of all errors we have. For each of them we have a description and the position. This is enough for our compiler but we would now need to show these errors in the editor. We will do that in out of the future posts.

    Download the guide with 68 resources on Creating Programming Languages

    Receive the guide to your inbox to read it on all your devices when you have time

    Powered by ConvertKit
     
    Creating a Programming Language

    Learn to Create Programming Languages

    Subscribe to our newsletter to get the FREE email course that teaches you how to create a programming language