Building a compiler for your own language: model-to-model transformations

Most of the work done in tools supporting a language consists in manipulating the AST. In this post we are going to see how to perform transformation and processing on the Abstract Syntax Tree through model-to-model transformations. These techniques will be useful to perform operation like:

validation: finding errors in the AST
remove syntax sugar: by transforming the AST into equivalent but more explicit forms
fill symbol tables and resolve symbols

All these techniques will be based on generic ways to navigate and change the AST. Let’s see how.

This post is the part of a series. The goal of the series is to describe how to create a useful language and all the supporting tools.

After writing this series of posts I refined my method, expanded it, and clarified into this book titled How to create pragmatic, lightweight languages

Code

Code is available on GitHub under the tag 06_transformations

The operations we need

We will need to:

process the AST: we want to do some operation to extract information from the AST
transform the AST: we want to transform the single nodes of the AST

Implement the transformations manually

We could implement the transformation manually on each single class of the AST. For example, this is how we could implement these methods on a SumExpression

fun SumExpression.process(operation: (Node) -> Unit) {
    operation.invoke(this)
    this.left.process(operation)
    this.right.process(operation)
}

fun SumExpression.transform(operation: (Node) -> Node) : Node {
    val newLeft = this.left.transform(operation) as Expression
    val newRight = this.right.transform(operation) as Expression
    return operation(SumExpression(newLeft, newRight))
}

We would have just to repeat it for each single class of our metamodel. Quite boring, eh?

Transformations using reflection

The alternative is to use reflection. In this way we can specify these operations for Node and they will work for every class of every metamodel (so basically, for every language we are going to work on).

fun Node.process(operation: (Node) -> Unit) {
    operation(this)
    this.javaClass.kotlin.memberProperties.forEach { p ->
        val v = p.get(this)
        when (v) {
            is Node -> v.process(operation)
            is Collection<*> -> v.forEach { if (it is Node) it.process(operation) }
        }
    }
}

fun Node.transform(operation: (Node) -> Node) : Node {
    operation(this)
    val changes = HashMap<String, Any>()
    this.javaClass.kotlin.memberProperties.forEach { p ->
        val v = p.get(this)
        when (v) {
            is Node -> {
                val newValue = v.transform(operation)
                if (newValue != v) changes[p.name] = newValue
            }
            is Collection<*> -> {
                val newValue = v.map { if (it is Node) it.transform(operation) else it }
                if (newValue != v) changes[p.name] = newValue
            }
        }
    }
    var instanceToTransform = this
    if (!changes.isEmpty()) {
        val constructor = this.javaClass.kotlin.primaryConstructor!!
        val params = HashMap<KParameter, Any?>()
        constructor.parameters.forEach { param ->
            if (changes.containsKey(param.name)) {
                params[param] = changes[param.name]
            } else {
                params[param] = this.javaClass.kotlin.memberProperties.find { param.name == it.name }!!.get(this)
            }
        }
        instanceToTransform = constructor.callBy(params)
    }
    return operation(instanceToTransform)
}

Testing the transformations

It all looks nice and well but the question is: does it actually work?
Let’s try.

In this little test we will transform references to a variable A into references to a variable B.

class ModelTest {

    @test fun transformVarName() {
        val startTree = SandyFile(listOf(
                VarDeclaration("A", IntLit("10")),
                Assignment("A", IntLit("11")),
                Print(VarReference("A"))))
        val expectedTransformedTree = SandyFile(listOf(
                VarDeclaration("B", IntLit("10")),
                Assignment("B", IntLit("11")),
                Print(VarReference("B"))))
        assertEquals(expectedTransformedTree, startTree.transform {
            when (it) {
                is VarDeclaration -> VarDeclaration("B", it.value)
                is VarReference -> VarReference("B")
                is Assignment -> Assignment("B", it.value)
                else -> it
            }
        })
    }

}

Conclusions

Another building block for the future. In the next steps we will see how to use this, for example to implement validation.

Create Programming Languages

Learn the basics of creating a programming language with this email course.

Building a compiler for your own language: model-to-model transformations

Code

The operations we need

Implement the transformations manually

Transformations using reflection

Testing the transformations

Conclusions

Create Programming Languages

Categories

More on Language Engineering

Transforming RPG Code into Clear Sequence Diagrams

Build an Editor With Kolasu and Language Servers