How to write a transpiler

Introduction

A transpiler is a program that can process code in a certain language and generate the corresponding code in another language.

You may also find other terms in use, such as source-to-source translator.

When might you want to use a transpiler?

Simply put, when you have a project that delivers value to you but you are not happy with the programming language in which it is written.

In this article we are going to see:

  • What a transpiler is and what it is not
  • What problems we can solve with a transpiler
  • The architecture of a well-designed transpiler
  • A complete example to transform RPG code into Java code
  • Some extra resources for further investigation of the subject

The code presented in this tutorial is available on GitHub: https://github.com/Strumenta/rpg-to-java-transpiler

Why not a rewrite?

“But I could just rewrite the project from scratch in Java (or C#, or Python, or whatever you prefer)”

This is something many naive and enthusiastic developers would immediately suggest. And this may work for small systems. For large systems, we agree with Joel Spolsky when he points out that rewriting the code of a significant project from scratch is the classic top mistake a company can make (see Things You Should Never Do).

Why is this the case?

Because a codebase that has been around for some years has captured a lot of extremely valuable knowledge and that knowledge is not readily available in any other form than in the codebase.

So, yes, you can take an army of developers and make them read the existing codebase, understand it (which will not be trivial), and rewrite it in the new language. They will eventually get it right, apart from a few mistakes to correct here and there. So after a few years and a significant expense, you will get a system, written in the new language, which works almost exactly as the old system did. And that is assuming you did not need to evolve the old system in the meantime.

This is typically an expensive plan, and along the way developers realize how much they underestimated the complexity of the original system. At that point, the organization either gives up or has to significantly increase the budget for the conversion.

The fact is, almost all successful complex systems are evolutions of simpler systems. Recreating complex systems from scratch is hard.

It is also not easy to champion these projects internally: in the end, the new system will have the same features as the original one, so if the cost is very high, it can take a lot of pushing to get the project over the line.

Now, there are good reasons to move away from certain languages or platforms. The problem is real; it is just that a rewrite is not the right solution. The solution is to use a transpiler, which will automatically translate the code for you.

The many names of a transpiler

A transpiler is sometimes known by other names:

  • source-to-source compiler: this name makes sense. In a transpiler, we do read code in the original language and produce something out of it, as a compiler does. However, what we produce is not immediately executable: it is not a binary executable file or bytecode for a virtual machine, but code in the target language. That code will in turn have to be compiled or interpreted to be executed. For example, if we transpile Java to C++, the resulting C++ code will have to be compiled before it can be executed, while if we transpile Java to Python, the resulting code will have to be run by the Python interpreter.
  • transcompiler: this is very similar to the term transpiler. It is just another way to merge the words translator and compiler
  • source-to-source translator: this is maybe the best term. I like it because it clearly expresses what a transpiler does: it translates code from one language to another. I think terms such as transpiler or transcompiler are more intuitive when we translate from a language at a higher level of abstraction down to a language at a lower level of abstraction, because each construct is expanded into a set of more basic instructions, similarly to what a compiler does.

OK, we have seen the different names used to indicate a transpiler. I suggest always using the term transpiler, as it seems to be the most widely accepted one.

Transpiler vs Interpreter vs Compiler

Some readers could be confused about the differences between transpilers, interpreters, and compilers, so let’s try to summarize them in the following table:

Tool          Input         Output                              What it does
Interpreter   Source code   -                                   Processes source code and executes it without modifying it
Compiler      Source code   Native executable or bytecode       Processes source code and generates something executable
Transpiler    Source code   Source code (in another language)   Processes source code and generates source code in another language

Generally speaking, a transpiler is more similar to a compiler than to an interpreter.

The difference is that a compiler translates to something at a lower level of abstraction, while a transpiler produces output at a level of abstraction similar to that of the input.

There are borderline cases: if we had a program that processed high-level code, for example in C++, and produced the corresponding assembly code, we would still consider it a transpiler. Some would call it a compiler, though, and we can see their reasons, as the level of abstraction of the target is much lower.

So these definitions are not clear-cut, and sometimes there is confusion. This is especially true because, some years ago, the creators of some languages did not build proper compilers. Instead, they built transpilers from their language to C and then used a C compiler. They sometimes called their transpiler “a compiler”, so as not to draw attention to the fact that they did not have a proper compiler for their language. For example, here you can find a list of open-source “compilers” which are actually transpilers targeting C: https://github.com/dbohdan/compilers-targeting-c

What problems does a Transpiler solve

Yes, we want to see how to design a great transpiler, but before diving into that, let’s examine the business reasons for building one.

In other words, which problems are solved by a transpiler?

We can identify four:

  1. Migration: if you have legacy code you want to migrate to some platform with better capabilities or for which it is easier to find hardware or support. For example, your code is written in COBOL and runs on some mainframe. The cost of your MIPS is killing you. Or your code is written in RPG, and you would like to move to the JVM to take advantage of many more libraries.
  2. Compatibility: you need your code to be compatible with certain other systems. For example, browsers support exclusively JavaScript. So if you want to run your code in the browser and it is not written in JavaScript, you can transpile it to JavaScript. Another case is when you want compatibility with different versions of a compiler or interpreter. You may want to transpile your Python 2 code to Python 3, for example. Or your modern JavaScript code to a previous version of JavaScript, so that it is more widely supported.
  3. Coding skills: sometimes the problem is that your code is written in a language that no one knows anymore. In that case, you may want to translate it to a more common language so that it is easier to find the competencies to maintain and evolve it. For example, if your code is written in a long-forgotten 4GL language you may want to translate it to Java or C#.
  4. Performance: another reason to perform a transpilation is to target a faster language. For example, for a client of ours we wrote a transpiler from VBA to C++, so that the resulting code was significantly faster.

The JavaScript transpilers

Let’s examine a well-known case: the JavaScript transpilers.

Some languages can be compiled to JavaScript:

  • CoffeeScript
  • TypeScript
  • Kotlin
  • Elm
  • ClojureScript

The reasons why you may prefer to program in these languages instead of using JavaScript directly are beyond the scope of this article. The fact is that the browser cannot execute code written in these languages directly, so if we want to use it in web applications we need to first transpile it to JavaScript.

Sometimes we may want to transpile even across different versions of JavaScript. This is the case when we want to take advantage of the features of a recent version of JavaScript but, at the same time, we want our code to work on older browsers too. Those older browsers will not support the more recent version of JavaScript that we want to use. What to do then? Simple: we use a recent version of JavaScript to program and then transpile the code to a more established version of JavaScript, one we can expect to be widely supported in browsers. This way, we get the best of both worlds.

To do this we can use tools such as Traceur and Babel. Interestingly, they present themselves as “JavaScript compilers”, but in reality they are transpilers, as they produce code which is then interpreted in the browser.

When it makes sense to write a transpiler and when it does not

You may want to build a transpiler when you have a system written in some language which is not ideal, for one of the four reasons we saw in the previous section.

A transpiler typically makes sense if:

  • The system you want to transform is valuable and will be valuable for the foreseeable future
  • The system is significant. Typically 100K lines of code or more

A transpiler does not make sense if:

  • You plan to significantly change the logic of the system soon
  • The system is not that valuable to you
  • The system is small: 10K lines of code or less

In those cases, it could make sense to retire or rewrite the original system rather than build a transpiler. You may also consider using a transpiler if one is available off the shelf as a software product.

The Architecture of a Transpiler

Let’s see what the right architecture for a transpiler is.

We start from code in the original language and arrive at code in the target language.

To perform the transformation, however, we will not work directly on code. Instead, we will work on a representation of the code that makes this operation easier and more maintainable. This representation is the abstract syntax tree (AST).

So our transpiler will have these stages:

  1. Parsing stage: we will adopt a parser to obtain an AST from the code of the original language
  2. Transformation stage: we will transform in one or more steps the AST of the original language into the corresponding AST of the target language
  3. Generation stage: once we have the AST of the target language we generate the corresponding code out of it

 

Let’s make a few considerations.

For the parsing stage, we can reuse a parser for the original language. So a parser is a component of a transpiler. For example, our Cobol parser or our PL/SQL parser could be used as a component of a transpiler from Cobol to Python or from PL/SQL to Java.

The transformation stage is where we implement the core of the transpiler. When transpiling between two similar languages we may avoid the intermediate model, while when translating between two very different languages we may want to use more than one intermediate step in the transformation. In general, having one intermediate step works well in common cases.

Finally, we come to the generation stage. For this, we need a code generator. It can be based on templates, and this part is typically the easiest one to build. It can be boring and take some time, but typically there are no major challenges to overcome if the AST has been designed well.
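
To make the three stages concrete, here is a minimal sketch in Kotlin. It is a toy, not the transpiler we build later in this article: the one-statement source language (lines such as EVAL X = A + B) and all the names (SrcEval, TgtAssign, parseToy, transformToy, generateToy, transpileToy) are assumptions made up for illustration.

```kotlin
// Toy source AST: an assignment of the sum of two variables, e.g. "EVAL X = A + B".
data class SrcEval(val target: String, val left: String, val right: String)

// Toy target AST: a Java-like assignment statement.
data class TgtAssign(val field: String, val expression: String)

// Stage 1 (parsing): turn source text into the source AST.
// Lines that do not match the toy grammar are silently skipped.
fun parseToy(code: String): List<SrcEval> {
    val pattern = Regex("""EVAL\s+(\w+)\s*=\s*(\w+)\s*\+\s*(\w+)""")
    return code.lines()
        .mapNotNull { pattern.find(it) }
        .map { m -> SrcEval(m.groupValues[1], m.groupValues[2], m.groupValues[3]) }
}

// Stage 2 (transformation): map each source node to a target node.
fun transformToy(source: List<SrcEval>): List<TgtAssign> =
    source.map { TgtAssign("this.${it.target}", "this.${it.left} + this.${it.right}") }

// Stage 3 (generation): print the target AST as text.
fun generateToy(target: List<TgtAssign>): String =
    target.joinToString("\n") { "${it.field} = ${it.expression};" }

// The whole pipeline is just the composition of the three stages.
fun transpileToy(code: String): String = generateToy(transformToy(parseToy(code)))
```

Calling transpileToy("EVAL RESULT = A + B") goes through all three stages in sequence. Keeping the stages as separate functions means each one can be tested on its own.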

How does the architecture of a transpiler compare to the architecture of an interpreter and a compiler?

  • A transpiler, an interpreter and a compiler all share the first stage: the parsing stage. The same parser can be used to build all these systems
  • The difference is in what those systems do once they get the AST of the original language. A transpiler transforms it into another AST and eventually generates code out of it. A compiler also transforms it and eventually generates binary code or bytecode out of it. An interpreter instead directly executes the statements specified in the AST.
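
The comparison above can be illustrated with a toy expression AST that is processed in two ways. This is a sketch under assumptions: the classes Expr, Num, Ref, and Plus and the two functions are hypothetical, and for brevity the transpiler side prints Java-like text straight from the toy AST instead of building a target AST first.

```kotlin
// A toy expression AST shared by the interpreter and the transpiler.
sealed class Expr
data class Num(val value: Long) : Expr()
data class Ref(val name: String) : Expr()
data class Plus(val left: Expr, val right: Expr) : Expr()

// Interpreter: walks the AST and executes it, producing a value.
fun interpret(e: Expr, variables: Map<String, Long>): Long = when (e) {
    is Num -> e.value
    is Ref -> variables.getValue(e.name)
    is Plus -> interpret(e.left, variables) + interpret(e.right, variables)
}

// Transpiler side: walks the same AST and produces source code in the target language.
fun toJavaSource(e: Expr): String = when (e) {
    is Num -> "${e.value}L"
    is Ref -> "this.${e.name}"
    is Plus -> "${toJavaSource(e.left)} + ${toJavaSource(e.right)}"
}
```

Both functions receive the same tree, which could come from the same parser. The only difference is what they do with it: one computes a result, the other emits text to be compiled later.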

Considerations for the implementation of a transpiler

Now, there is one mistake, bigger than any other, that we see in people who try to implement a transpiler without prior experience: they try to generate the code of the target language directly from the AST of the original language. In some cases, they do even worse: they do not even build a proper AST for the original language, but use a lower-level tree instead, called a parse tree or concrete syntax tree. When they do so, they get something working quickly for the simplest cases, but they soon run into a wall and are unable to complete the transpiler. At this stage, they look for help and frequently ask for our services.

The intermediate model is useful because it lets us break a complex transformation into steps, which can be tested independently. It is also useful when building multiple targets. For example, suppose you want to transpile RPG into both Java and C#. You could transform the AST for RPG into an intermediate model and then build one transformation from the intermediate model to the Java AST and one from the intermediate model to the C# AST. In this way, the two transpilers would share most of the work.
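
The multi-target idea can be sketched as follows. Everything here is hypothetical (IType, IVariable, toJava, toCSharp), and to stay short the two back ends emit text directly, where a real transpiler would build a Java AST and a C# AST.

```kotlin
// Hypothetical intermediate model: a typed variable declaration
// with an optional initial value (kept as plain text for simplicity).
enum class IType { INTEGER, STRING }
data class IVariable(val name: String, val type: IType, val initialValue: String? = null)

// Back end 1: intermediate model -> Java field declaration.
fun toJava(v: IVariable): String {
    val type = when (v.type) {
        IType.INTEGER -> "long"
        IType.STRING -> "java.lang.String"
    }
    val init = v.initialValue?.let { " = $it" } ?: ""
    return "private $type ${v.name}$init;"
}

// Back end 2: the same intermediate model -> C# field declaration.
fun toCSharp(v: IVariable): String {
    val type = when (v.type) {
        IType.INTEGER -> "long"
        IType.STRING -> "string"
    }
    val init = v.initialValue?.let { " = $it" } ?: ""
    return "private $type ${v.name}$init;"
}
```

The front end that produces IVariable values from the original language is written once. Each additional target only needs its own, comparatively small, back end.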

One important aspect to consider during transpilation is the ability to trace an element back to the corresponding line in the original code. This is useful for producing error messages during the transpilation. It is also useful later, when the target code is executed and we want to trace a problem back to the original code for debugging. For example, if we write code in ClojureScript and transpile it to JavaScript, when an error arises in our web application we want to correct it in the original ClojureScript code, not in the transpiled JavaScript code, as that is generated and overwritten each time. To do that, a mechanism like source maps can be used: they basically specify a correspondence between a line in the generated code and a line in the original code.
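
As a rough illustration of this kind of traceability, the sketch below records, for each generated line, the position of the original element it came from. It is a simplified, line-level stand-in for real source maps, and the names Position, GeneratedLine, and buildLineMap are assumptions, not part of any library.

```kotlin
// Hypothetical position of an element in the original source file.
data class Position(val line: Int, val column: Int)

// A generated line of target code, remembering which source position produced it.
// The origin is null for lines with no counterpart in the original code.
data class GeneratedLine(val text: String, val origin: Position?)

// Builds a line-level map: generated line number (1-based) -> original position.
// Lines without an origin are left out of the map.
fun buildLineMap(lines: List<GeneratedLine>): Map<Int, Position> =
    lines.mapIndexedNotNull { index, line -> line.origin?.let { (index + 1) to it } }.toMap()
```

For this to work, the AST nodes need to carry their position from the parsing stage onwards, and each transformation has to copy that position to the nodes it creates.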

Another aspect to consider is the runtime libraries. The original code was written using the standard library of the original language. When we translate the code into the target language, we may reuse the standard library of the target language, but we may need to complement it with additional libraries. For example, many languages have some implicit function or standard library function to print on the screen. This is typically not a problem, because the target language will have similar functionality. For example, if we are translating Java code to C, we may translate calls to System.out.println into calls to printf. However, the Java standard library also supports things like XML processing and network operations. So we either need to find C libraries that implement that functionality, or we need to create them, so that we can translate calls to the Java methods related to these operations into equivalent C operations. Implementing the runtime libraries can be a very significant effort, and it is frequently overlooked.
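
One possible way to organize this, sketched under assumptions, is a table that maps built-in functions of the original language to expressions of the target language. The two mappings shown are illustrative choices, not necessarily the ones used by the transpiler in this article, and builtinTranslations and translateBuiltinCall are hypothetical names.

```kotlin
// Hypothetical mapping from source built-ins to target-language expressions.
// Each entry receives the already-translated arguments and returns target code.
val builtinTranslations: Map<String, (List<String>) -> String> = mapOf(
    "%CHAR" to { args: List<String> -> "java.lang.String.valueOf(${args[0]})" },
    "%DEC" to { args: List<String> -> "java.lang.Long.parseLong(${args[0]})" }
)

// Looks up a built-in by name, ignoring case, and fails loudly when the
// runtime support is missing, instead of silently producing broken code.
fun translateBuiltinCall(name: String, args: List<String>): String {
    val translation = builtinTranslations[name.uppercase()]
        ?: throw UnsupportedOperationException("No runtime support for built-in $name")
    return translation(args)
}
```

Built-ins with no direct equivalent in the target standard library would map to calls into a runtime library written for the purpose. A table like this also makes it easy to list what is still unsupported.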

Tutorial on writing a transpiler: transpile RPG code into Java

We are going to see how to build a transpiler from RPG to Java. The full code discussed here is on GitHub: https://github.com/Strumenta/rpg-to-java-transpiler

When we have finished, our transpiler will be able to transform code like this:

      * Calculates number of Fibonacci in an iterative way
     D ppdat           S              8
     D NBR             S              8  0
     D RESULT          S              8  0 INZ(0)
     D COUNT           S              8  0
     D A               S              8  0 INZ(0)
     D B               S              8  0 INZ(1)
     C     *entry        plist
     C                   parm                    ppdat                          I
      *
     C                   Eval      NBR    = %Dec(ppdat : 8 : 0)
     C                   EXSR      FIB
     C                   clear                   dsp              50
     C                   eval      dsp= 'FIBONACCI OF: ' +  ppdat +
     C                                 ' IS: ' + %CHAR(RESULT)
     C                   dsply                   dsp
     C                   eval      ppdat = %CHAR(RESULT)
     C                   seton                                        lr
      *--------------------------------------------------------------*
     C     FIB           BEGSR
     C                   SELECT
     C                   WHEN      NBR = 0
     C                   EVAL      RESULT = 0
     C                   WHEN      NBR = 1
     C                   EVAL      RESULT = 1
     C                   OTHER
     C                   FOR       COUNT = 2 TO NBR
     C                   EVAL      RESULT = A + B
     C                   EVAL      A = B
     C                   EVAL      B = RESULT
     C                   ENDFOR
     C                   ENDSL
     C                   ENDSR
      *--------------------------------------------------------------*

Into this Java code:

public class CALCFIB {

    private java.lang.String ppdat;

    private long NBR;

    private long RESULT = 0;

    private long COUNT;

    private long A = 0;

    private long B = 1;

    private java.lang.String dsp;

    void executeProgram(java.lang.String ppdat) {
        this.ppdat = ppdat;
        this.NBR = java.lang.Integer.valueOf(this.ppdat);
        FIB();
        this.dsp = "";
        this.dsp = "FIBONACCI OF: " + this.ppdat + " IS: " + "" + this.RESULT;
        java.lang.System.out.println(this.dsp);
        this.ppdat = "" + this.RESULT;
    }

    void FIB() {
        if (this.NBR == 1) {
            this.RESULT = 1;
        } else {
            for (this.COUNT = 2; this.COUNT <= this.NBR; this.COUNT++) {
                this.RESULT = this.A + this.B;
                this.A = this.B;
                this.B = this.RESULT;
            }
        }
    }
}

The transpiler we present in this article has the structure we would suggest for a real, industrial-quality transpiler. However, to keep the article manageable, we do not dive into building a proper runtime library. After all, it is already over 6,000 words long :)

Understanding the RPG code

RPG is an old positional language, typically used on the AS/400, now renamed IBM i. It has been widely used and it is still present in the codebases of large organizations. You do not need to know anything about RPG to understand this tutorial as we are going to explain everything you need.

First of all, RPG is a positional language, so the column in which you write a certain character matters. For example, you can look at the character at column 6 to understand what a line specifies: if you find the character D, then the line contains a data declaration. If you find the character C, then the line contains a statement. The other lines in our example are comments and you can disregard them.
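
The rule above can be sketched in a few lines of Kotlin. This is only an illustration, as the actual parsing is done by the Jariko parser: LineKind and classifyLine are hypothetical names, and the check for an asterisk in column 7 to recognize comments is an assumption based on the example listing.

```kotlin
enum class LineKind { DATA_DECLARATION, STATEMENT, COMMENT, OTHER }

// Classifies a fixed-format RPG line by position.
// Columns are 1-based, so column 6 is index 5 and column 7 is index 6.
fun classifyLine(line: String): LineKind = when {
    // Assumption: an asterisk in column 7 marks a comment line
    line.getOrNull(6) == '*' -> LineKind.COMMENT
    else -> when (line.getOrNull(5)) {
        'D', 'd' -> LineKind.DATA_DECLARATION
        'C', 'c' -> LineKind.STATEMENT
        else -> LineKind.OTHER
    }
}
```

Note that the position matters, not the character alone: a D anywhere else on the line does not make it a data declaration.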

We have seen that a D at column 6 indicates a data declaration. So lines 2 to 7 contain data declarations. In particular, they define global variables.

     D ppdat           S              8
     D NBR             S              8  0
     D RESULT          S              8  0 INZ(0)
     D COUNT           S              8  0
     D A               S              8  0 INZ(0)
     D B               S              8  0 INZ(1)

  • At line 2 we have the declaration of a string of 8 characters named ppdat
  • At line 3 we have the declaration of an integer of 8 digits named NBR (the 0 indicates the number of decimal digits)
  • At line 4 we have the declaration of an integer of 8 digits named RESULT. It is initialized to 0
  • At line 5 we have the declaration of an integer of 8 digits named COUNT
  • At line 6 we have the declaration of an integer of 8 digits named A. It is initialized to 0
  • At line 7 we have the declaration of an integer of 8 digits named B. It is initialized to 1

     C     *entry        plist
     C                   parm                    ppdat                          I

Lines 8 and 9 define the plist, which means “parameter list”. In practice, this is the list of values we should specify when invoking the program. In this example, the plist tells us that when we invoke this program we should specify a value for ppdat.

The code of the body of the program runs from line 11 to line 18.

From line 20 to line 33 we have a subroutine named FIB. Think of a method taking no parameters and returning no values. It communicates with the rest of the program by reading and writing global variables.

     C                   Eval      NBR    = %Dec(ppdat : 8 : 0)
     C                   EXSR      FIB
     C                   clear                   dsp              50
     C                   eval      dsp= 'FIBONACCI OF: ' +  ppdat +
     C                                 ' IS: ' + %CHAR(RESULT)
     C                   dsply                   dsp
     C                   eval      ppdat = %CHAR(RESULT)
     C                   seton                                        lr

The body of the program does the following:

  • At line 11 it parses the value of ppdat as a number and assigns it to NBR
  • At line 12 it invokes the subroutine FIB
  • At line 13 it sets the variable dsp to an empty string. It also implicitly declares the variable dsp
  • At lines 14 and 15 it assigns a new value to dsp. The value is obtained by combining string literals, ppdat, and RESULT (after converting it from a number into a string)
  • At line 16 it prints the value of dsp on the screen
  • At line 17 it assigns the string representation of RESULT to ppdat
  • At line 18 it sets a flag indicating that the next execution of this program should not retain the values from this execution. Feel free to ignore it, as it is not important for our goals

     C     FIB           BEGSR
     C                   SELECT
     C                   WHEN      NBR = 0
     C                   EVAL      RESULT = 0
     C                   WHEN      NBR = 1
     C                   EVAL      RESULT = 1
     C                   OTHER
     C                   FOR       COUNT = 2 TO NBR
     C                   EVAL      RESULT = A + B
     C                   EVAL      A = B
     C                   EVAL      B = RESULT
     C                   ENDFOR
     C                   ENDSL
     C                   ENDSR

The subroutine FIB does the following:

  • The entire body of the subroutine is composed of a select statement, which is basically equivalent to a series of if-else. It is similar to a Java switch statement, without an expression to switch on.
  • At line 22 we have the first case of the select statement. When NBR is equal to zero we assign 0 to RESULT
  • At line 24 we have the second case of the select statement. When NBR is equal to one we assign 1 to RESULT
  • At line 26 we have the other clause: when none of the previous clauses has matched, we execute the body of the “other” clause
  • At line 27 we have the body of the “other” clause: it is composed of a single for-statement. In the for-statement, COUNT is initialized to 2 and incremented by one at each iteration, up to and including the value of NBR. For example, if NBR is equal to 5 we will execute the body of the for-statement with COUNT equal to 2, then 3, then 4, and finally 5.
  • At lines 28, 29, and 30 we have the body of the for-statement. It assigns the sum of A and B to RESULT, then B to A, and finally RESULT to B
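
To check our reading of the subroutine, here is a hand-written Kotlin equivalent of its logic. It is not output of the transpiler: the function name fibonacci is made up, and the global variables of the RPG program are replaced by a parameter and local variables to keep the sketch self-contained.

```kotlin
// Hand-written equivalent of the FIB subroutine, for illustration only.
fun fibonacci(nbr: Long): Long {
    var result = 0L   // RESULT, initialized to 0
    var a = 0L        // A, initialized to 0
    var b = 1L        // B, initialized to 1
    when (nbr) {      // SELECT
        0L -> result = 0L                  // WHEN NBR = 0
        1L -> result = 1L                  // WHEN NBR = 1
        else -> for (count in 2L..nbr) {   // OTHER, FOR COUNT = 2 TO NBR
            result = a + b
            a = b
            b = result
        }
    }
    return result
}
```

With NBR equal to 5, the loop body runs four times and RESULT takes the values 1, 2, 3, and 5 in turn.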

Setup of the project

We will write our transpiler in Kotlin and use Gradle as our build tool.

Let’s take a look at the dependencies:

  • We will use the RPG parser from the Jariko project. You can read more about Jariko in an article we wrote: JaRIKo, an RPG Interpreter in Kotlin.
  • We will then use JavaParser to define the Java AST and generate code out of it.
  • Finally, we add JUnit for our tests.

plugins {
    id 'org.jetbrains.kotlin.jvm' version '1.3.72'
}

repositories {
   maven { url 'https://jitpack.io' }
   mavenCentral()
}

dependencies {
    implementation 'com.github.smeup.jariko:rpgJavaInterpreter-core:v0.1.5'
    implementation 'com.github.javaparser:javaparser-symbol-solver-core:3.16.1'
    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.3.1'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.3.1'
}

tasks.test {
    useJUnitPlatform()
    testLogging {
        events("passed", "skipped", "failed")
    }
}

The general structure of the transpiler

We will define a class containing the main method to make it possible to invoke our transpiler from the command line.

The main method will look like this:

fun main(args: Array<String>) {
    if (args.size != 1) {
        System.err.println("Exactly one argument expected")
        exitProcess(1)
    }
    val inputFile = File(args[0])
    // isFile is false both when the path does not exist and when it is not a regular file
    if (!inputFile.isFile) {
        System.err.println("Path specified does not exist or it is not a file: $inputFile")
        exitProcess(1)
    }
    // use {} closes the stream once the transpilation is done
    val transpilationRes = FileInputStream(inputFile).use {
        transpileRpgToJava(it, inputFile.nameWithoutExtension)
    }
    println(transpilationRes)
}

Basically, we expect the user to specify the path to the RPG file to convert. We then invoke the transpileRpgToJava method. It takes an InputStream providing access to the input to convert and the name of the program to generate. We obtain the name of the program from the file name.

For example, if we transpile foo.rpgle into Java code, we will generate a program named foo.

Now, let’s look at the transpileRpgToJava code:

fun transpileRpgToJava(source: InputStream, name: String) : String {
    val rpgAst = parseRpgCode(source)
    val javaAst = transform(rpgAst, name)
    return generate(javaAst)
}

In this method, we invoke the RPG parser we got from Jariko. Then we invoke transform to obtain the equivalent Java AST. Finally, we call generate to produce Java code from the Java AST.

Let’s see the transform method:

fun transform(rpgAst: CompilationUnit, name: String) : com.github.javaparser.ast.CompilationUnit {
    val intermediateAst = transformFromRPGtoIntermediate(rpgAst, name)
    val javaAst = transformFromIntermediateToJava(intermediateAst)
    return javaAst
}

We first transform the RPG AST into the intermediate AST, then we transform the intermediate AST into the Java AST and we return it.

The generate method is very simple, as JavaParser makes it easy to generate code from any piece of the Java AST:

fun generate(javaAst: com.github.javaparser.ast.CompilationUnit) : String {
    return javaAst.toString(PrettyPrinterConfiguration())
}

In the end, we obtain this code for our Transpiler.kt file:

package com.strumenta.rpgtojava

import com.github.javaparser.printer.PrettyPrinterConfiguration
import com.smeup.rpgparser.interpreter.*
import com.smeup.rpgparser.parsing.ast.*
import com.smeup.rpgparser.parsing.facade.RpgParserFacade
import com.smeup.rpgparser.parsing.parsetreetoast.resolveAndValidate
import com.strumenta.rpgtojava.intermediateast.GProgram
import com.strumenta.rpgtojava.transformations.transformFromIntermediateToJava
import com.strumenta.rpgtojava.transformations.transformFromRPGtoIntermediate
import java.io.File
import java.io.FileInputStream
import java.io.InputStream
import kotlin.system.exitProcess

fun transform(rpgAst: CompilationUnit, name: String) : com.github.javaparser.ast.CompilationUnit {
    val intermediateAst = transformFromRPGtoIntermediate(rpgAst, name)
    val javaAst = transformFromIntermediateToJava(intermediateAst)
    return javaAst
}

fun generate(javaAst: com.github.javaparser.ast.CompilationUnit) : String {
    return javaAst.toString(PrettyPrinterConfiguration())
}

fun parseRpgCode(source: InputStream) : CompilationUnit {
    val facade = RpgParserFacade()
    facade.muteSupport = false
    val rpgAst = facade.parseAndProduceAst(source)
    rpgAst.resolveAndValidate(DummyDBInterface)
    return rpgAst
}

fun transformRpgToIntermediate(source: InputStream, name: String) : GProgram {
    val rpgAst = parseRpgCode(source)
    return transformFromRPGtoIntermediate(rpgAst, name)
}

fun transpileRpgToJava(source: InputStream, name: String) : String {
    val rpgAst = parseRpgCode(source)
    val javaAst = transform(rpgAst, name)
    return generate(javaAst)
}

fun main(args: Array<String>) {
    if (args.size != 1) {
        System.err.println("Exactly one argument expected")
        exitProcess(1)
    }
    val inputFile = File(args[0])
    // isFile is false both when the path does not exist and when it is not a regular file
    if (!inputFile.isFile) {
        System.err.println("Path specified does not exist or it is not a file: $inputFile")
        exitProcess(1)
    }
    // use {} closes the stream once the transpilation is done
    val transpilationRes = FileInputStream(inputFile).use {
        transpileRpgToJava(it, inputFile.nameWithoutExtension)
    }
    println(transpilationRes)
}

Defining the intermediate AST

We have seen that the RPG AST will come from Jariko, and the Java AST will come from JavaParser. We will instead need to define the intermediate AST ourselves.

While examining the right architecture for transpilers, we discussed the need for an intermediate AST. It acts as a buffer between the source and the target language, and it lets us split a complex transformation into two smaller ones: from the source language to the intermediate AST, and from the intermediate AST to the target language.

When we define the intermediate AST, we should make it as generic and simple as possible. One way to think about it is as the AST of a generic language, imagining that you will have to translate this AST to very diverse languages.

In practice, we do not start by writing the whole intermediate AST, but we typically grow it incrementally as we implement the transpiler.

In this case, we can start writing the element that represents an entire program. To make code more readable we will use a prefix for all the classes of the intermediate AST. That prefix is G which stands for Generic.

interface GNamed {
    val name: String
}

data class GProgram(
        override val name: String,
        val globalVariables: MutableList<GGlobalVariable> = mutableListOf(),
        val mainFunction: GFunction = GFunction("executeProgram"),
        val otherFunctions: MutableList<GFunction> = mutableListOf()
) : Node(), GNamed

Note that every class of our intermediate AST will extend, directly or indirectly, Node. Node is a class from the Kolasu library. It is a dependency of Jariko, so by adding Jariko as a dependency we indirectly include it. Kolasu is a library that simplifies writing ASTs in Kotlin.

In practice, we say that our program will contain:

  • A set of global variables
  • A function representing the entry point of the program
  • A set of secondary functions

In the repository, you can see that we organized the intermediate AST under a single package (com.strumenta.rpgtojava.intermediateast) and split the code into separate files.

For the details look at the code on GitHub: com.strumenta.rpgtojava.intermediateast
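
To give an idea of the classes GProgram refers to, here is a simplified, standalone guess at their shape, based only on how they are used in this article. It is an assumption, not a copy of the repository: the real classes extend Kolasu's Node and are richer, and GIntegerType, GStringType, and GIntegerLiteral are hypothetical names.

```kotlin
// Types of the intermediate AST (hypothetical, simplified).
sealed class GType
object GIntegerType : GType()
object GStringType : GType()

// Expressions and statements; only what this sketch needs.
sealed class GExpression
data class GIntegerLiteral(val value: Long) : GExpression()
sealed class GStatement

// A global variable with an optional initial value, set after construction.
data class GGlobalVariable(
    val name: String,
    val type: GType,
    var initialValue: GExpression? = null
)

// A function with mutable lists, so parameters and body can be filled in later.
data class GFunction(
    val name: String,
    val parameters: MutableList<Parameter> = mutableListOf(),
    val body: MutableList<GStatement> = mutableListOf()
) {
    data class Parameter(val name: String, val type: GType)
}
```

The mutable lists and the mutable initialValue are a deliberate choice in this sketch: they let us create an element first and complete it in a later step of the transformation.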

From RPG to the intermediate AST

To make this code run we need to define transformFromRPGtoIntermediate:

fun transformFromRPGtoIntermediate(rpgCu: CompilationUnit, name: String) : GProgram {
    val ctx = RpgToIntermediateContext()
    val program = GProgram(name)
    // Translate data declarations
    rpgCu.allDataDefinitions.forEach {
        val gv = GGlobalVariable(it.name, it.type.toGType())
        if (it is DataDefinition) {
            if (it.initializationValue != null) {
                gv.initialValue = (it.initializationValue as Expression).toGExpression(ctx)
            }
        }
        program.globalVariables.add(gv)
        ctx.registerCorrespondence(it, gv)
    }
    // Create the mapping for the functions, as they could be called
    rpgCu.subroutines.forEach {
        val gf = GFunction(it.name)
        program.otherFunctions.add(gf)
        ctx.registerCorrespondence(it, gf)
    }
    // Translate main function
    rpgCu.entryPlist?.params?.forEach {
        program.mainFunction.parameters.add(GFunction.Parameter(it.param.referred!!.name, it.param.referred!!.type.toGType()))
    }
    program.mainFunction.body.addAll(rpgCu.main.stmts.toGStatements(ctx))
    // Translate subroutines
    rpgCu.subroutines.forEach {
        val gf = ctx.toGFunction(it)
        gf.body.addAll(it.stmts.toGStatements(ctx))
    }
    return program
}

Let’s examine this code:

  • in the ctx variable, we keep track of certain correspondences between elements in the RPG AST and elements we create in the intermediate AST. This is a typical pattern we adopt in interpreters
  • we create the root of the intermediate AST, which is an element of type GProgram
  • we navigate all the data definitions present in the RPG code and, for each of them, we create a global variable in the intermediate AST. Look at it.type.toGType(): here we are transforming an RPG type into a corresponding type in the intermediate AST. We do a similar thing for expressions and for statements. For each data definition, we check if it has an initialization value. If this is the case we convert the expression defining such value
  • we then iterate the first time over the RPG subroutines. For now, we want just to create the corresponding functions into the intermediate AST and save them. We need to do that because when processing the code of the main function we could have calls to the subroutines and we need to know to which functions in the intermediate AST they correspond, as we will use this information while converting the calls
  • we then look at the entry plist. Basically it tells us which parameters we should pass when calling the main function. For this reason for each entry in the plist we add a parameter to the main function
  • then we translate all the RPG statements into statements for the intermediate AST. We do this first for the main function
  • then we translate all RPG subroutines into functions in the intermediate AST. We retrieve the functions we previously created from ctx. Then we convert the statements
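
The context class itself is not shown in this article. A minimal sketch of how it could work is below. RpgSubroutine and RpgDataDefinition are hypothetical stand-ins for the Jariko node types, the G classes are reduced to the bare minimum, and using identity-based maps is my assumption, not necessarily what the repository does.

```kotlin
import java.util.IdentityHashMap

// Hypothetical stand-ins for the Jariko (RPG) and intermediate AST node types.
class RpgSubroutine(val name: String)
class RpgDataDefinition(val name: String)
data class GFunction(val name: String, val body: MutableList<String> = mutableListOf())
data class GGlobalVariable(val name: String)

// A sketch of the context: it remembers which intermediate node was created
// for which RPG node, so that later passes can resolve references.
class RpgToIntermediateContext {
    // Identity-based maps: two distinct RPG nodes never share an entry,
    // even if they happen to compare equal.
    private val functions = IdentityHashMap<RpgSubroutine, GFunction>()
    private val variables = IdentityHashMap<RpgDataDefinition, GGlobalVariable>()

    fun registerCorrespondence(source: RpgSubroutine, target: GFunction) {
        functions[source] = target
    }

    fun registerCorrespondence(source: RpgDataDefinition, target: GGlobalVariable) {
        variables[source] = target
    }

    fun toGFunction(source: RpgSubroutine): GFunction =
        functions[source] ?: throw IllegalStateException("No function registered for ${source.name}")

    fun toGGlobalVariable(source: RpgDataDefinition): GGlobalVariable =
        variables[source] ?: throw IllegalStateException("No variable registered for ${source.name}")
}
```

The first pass registers an empty function for each subroutine. The second pass retrieves the same instance and fills its body, so any call converted in between already points to the right function.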

Transforming RPG statements into intermediate AST statements

When implementing a transpiler I would typically start by defining the conversion methods like this:

private fun Statement.toGStatements(ctx: RpgToIntermediateContext) : List<GStatement> {
    return when (this) {
        else -> TODO("Not yet implemented $this")
    }
}

Then I would work on an example and, whenever this code throws an exception, I would expand the method to handle the statement that caused it, adding a new element to the intermediate AST if necessary.
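
To make this loop concrete, here is a small self-contained sketch. Stmt, Eval and Display are hypothetical miniature classes standing in for the RPG AST, and firstUnsupported is a helper I made up for illustration. It relies on the fact that Kotlin's TODO() throws a NotImplementedError carrying our message.

```kotlin
// Hypothetical miniature statement hierarchy, standing in for the RPG AST.
sealed class Stmt
data class Eval(val target: String, val value: Long) : Stmt()
data class Display(val text: String) : Stmt()

// Only Eval is handled so far; everything else hits the TODO branch.
fun Stmt.toTarget(): List<String> = when (this) {
    is Eval -> listOf("${this.target} = ${this.value};")
    else -> TODO("Not yet implemented $this")
}

// Converts the statements and returns the message describing the first
// construct that still needs a branch, or null when everything is supported.
fun firstUnsupported(stmts: List<Stmt>): String? =
    try {
        stmts.forEach { it.toTarget() }
        null
    } catch (e: NotImplementedError) {
        e.message
    }
```

Running the example program through such a helper tells us which branch to write next. We add it, run again, and repeat until the whole example converts.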

In this case, this is the code I obtained when I was able to convert the example we have seen at the beginning of the tutorial:

private fun List<Statement>.toGStatements(ctx: RpgToIntermediateContext) = this.flatMap { it.toGStatements(ctx) }

private fun Statement.toGStatements(ctx: RpgToIntermediateContext) : List<GStatement> {
    return when (this) {
        is PlistStmt -> emptyList() // we can ignore this
        is EvalStmt -> listOf(GAssignment(this.target.toGTarget(ctx), this.expression.toGExpression(ctx)))
        is ExecuteSubroutine -> listOf(GExecuteFunction(ctx.toGFunction(this.subroutine.referred ?: throw RuntimeException("Unresolved subroutine"))))
        is ClearStmt -> listOf(GResetStringStmt(this.value.toGTarget(ctx)))
        is DisplayStmt -> listOf(GPrintStmt(this.response!!.toGExpression(ctx)))
        is SetStmt -> emptyList() // we can ignore this
        is SelectStmt -> {
            var elseCase : GSwitchStmt.Else? = null
            val cases = this.cases.map {
                GSwitchStmt.Case(it.condition.toGExpression(ctx), it.body.toGStatements(ctx))
            }

            if (this.other != null) {
                elseCase = GSwitchStmt.Else(this.other!!.body.toGStatements(ctx))
            }

            listOf(GSwitchStmt(cases, elseCase))
        }
        is ForStmt -> {
            val target = (this.init as AssignmentExpr).target.toGTarget(ctx)
            val initValue = (this.init as AssignmentExpr).value.toGExpression(ctx)
            listOf(GForStmt(target, initValue, this.endValue.toGExpression(ctx), this.body.toGStatements(ctx)))
        }
        else -> TODO("Not yet implemented $this")
    }
}

The entries for the different kinds of RPG statements appear in the order in which I encountered them while building the transpiler, to show how it grew incrementally. In practice, we may want to refactor this: the entries could be grouped by type, or they could be ordered by prevalence, since for performance reasons it is best to have the most frequent statements (such as EvalStmt) at the top.

Similarly, we convert Expressions:

private fun Expression.toGExpression(ctx: RpgToIntermediateContext): GExpression {
    return when (this) {
        is IntLiteral -> GIntegerLiteral(this.value)
        is RealLiteral -> GDecimalLiteral(this.value.toDouble())
        is StringLiteral -> GStringLiteral(this.value)
        is DecExpr -> {
            // FIXME We assume that decimal digits are 0
            //       in reality we should use the interpreter to evaluate that
            GStringToIntExpr(this.value.toGExpression(ctx))
        }
        is DataRefExpr -> {
            GGlobalVariableRef(ctx.toGGlobalVariable(this.variable.referred ?: throw RuntimeException("Unresolved variable")))
        }
        is PlusExpr -> {
            if (this.left.wrappedType() is StringType) {
                GStringConcatExpr(this.left.toGExpression(ctx), this.right.toGExpression(ctx))
            } else {
                GSumExpr(this.left.toGExpression(ctx), this.right.toGExpression(ctx))
            }
        }
        is CharExpr -> {
            GToStringExpr(this.value.toGExpression(ctx))
        }
        is EqualityExpr -> {
            GEqualityExpr(this.left.toGExpression(ctx), this.right.toGExpression(ctx))
        }
        else -> TODO("Not yet implemented $this")
    }
}

And Types:

private fun Type.toGType(): GType {
    return when (this) {
        is StringType -> GStringType
        is NumberType -> {
            if (this.integer) {
                GIntegerType
            } else {
                GDecimalType
            }
        }
        else -> TODO("Not yet implemented $this")
    }
}

From the intermediate AST to Java

The way we convert the intermediate AST to Java code is very similar to the way we have converted the RPG AST into the intermediate AST.

The entire code for this part of the system is reported below. We have extension methods to convert types, expressions, and statements from the intermediate AST to the Java AST. That part should be straightforward.

The transformation of the high-level part of the AST is handled in the first method below: transformFromIntermediateToJava.

Let’s look at it:

  • We create a Java Compilation Unit: it represents a file
  • In this Compilation Unit, we create a class, named after the program in the intermediate AST
  • For each global variable in the intermediate AST we create a Java field. We use the same name as the global variable and we convert the type. We also consider initialization values, if they are present
  • We create a method named “executeProgram”. This will represent our entry point. We add the parameters, as they are specified in the mainFunction from the intermediate AST, and for each parameter we assign its value to the field with the same name. We then convert all statements
  • We convert the additional functions into Java methods. They have no parameters, so we just need to assign the name and convert all the statements

fun transformFromIntermediateToJava(intermediateAst: GProgram) : com.github.javaparser.ast.CompilationUnit {
    val javaCu = com.github.javaparser.ast.CompilationUnit()
    val javaClass = javaCu.addClass(intermediateAst.name)
    intermediateAst.globalVariables.forEach {
        val javaField = javaClass.addField(it.type.toJavaType(), it.name, Modifier.Keyword.PRIVATE)
        if (it.initialValue != null) {
            javaField.getVariable(0).setInitializer(it.initialValue!!.toJavaExpression())
        }
    }
    val mainMethod = javaClass.addMethod("executeProgram");
    intermediateAst.mainFunction.parameters.forEach {
        mainMethod.addParameter(it.type.toJavaType(), it.name)
        mainMethod.body.get().addStatement(JavaParser().parseStatement("this.${it.name} = ${it.name};").result.get())
    }
    intermediateAst.mainFunction.body.forEach {
        mainMethod.body.get().addStatement(it.toJavaStatement())
    }

    intermediateAst.otherFunctions.forEach {
        val javaMethod = javaClass.addMethod(it.name)
        it.body.forEach {
            javaMethod.body.get().addStatement(it.toJavaStatement())
        }
    }

    return javaCu;
}

private fun GStatement.toJavaStatement(): com.github.javaparser.ast.stmt.Statement {
    return when (this) {
        is GAssignment -> ExpressionStmt(
                AssignExpr(
                        this.target.toJavaExpression(),
                        this.value.toJavaExpression(),
                        AssignExpr.Operator.ASSIGN
                )
        )
        is GExecuteFunction -> ExpressionStmt(
                MethodCallExpr(this.function.name)
        )
        is GResetStringStmt -> {
            ExpressionStmt(AssignExpr(
                    this.target.toJavaExpression(),
                    StringLiteralExpr(""),
                    AssignExpr.Operator.ASSIGN
            ))
        }
        is GPrintStmt -> {
            val expr = JavaParser().parseExpression<MethodCallExpr>("java.lang.System.out.println()").result.get();
            expr.addArgument(this.value.toJavaExpression())
            ExpressionStmt(expr)
        }
        is GSwitchStmt -> {
            var top : com.github.javaparser.ast.stmt.Statement? = null
            var current : IfStmt? = null
            this.cases.forEach {
                val newIf = IfStmt()
                newIf!!.condition = it.condition.toJavaExpression()
                val block = BlockStmt()
                it.body.map { it.toJavaStatement() }.forEach { block.addStatement(it) }
                newIf!!.thenStmt = block
                if (current == null) {
                    current = newIf
                    top = newIf
                } else {
                    current!!.setElseStmt(newIf)
                    current = newIf
                }
            }
            if (this.elseCase != null) {
                val block = BlockStmt()
                this.elseCase.body.map { it.toJavaStatement() }.forEach { block.addStatement(it) }
                if (current == null) {
                    top = block
                } else {
                    current!!.setElseStmt(block)
                }
            }
            // Return the head of the if / else-if chain: current points to its last link
            top!!
        }
        is GForStmt -> {
            val forStmt = com.github.javaparser.ast.stmt.ForStmt()
            forStmt.initialization = NodeList(AssignExpr(this.variable.toJavaExpression(), this.minValue.toJavaExpression(), AssignExpr.Operator.ASSIGN))
            forStmt.update = NodeList(UnaryExpr(this.variable.toJavaExpression(), UnaryExpr.Operator.POSTFIX_INCREMENT))
            forStmt.setCompare(BinaryExpr(this.variable.toJavaExpression(), this.maxValue.toJavaExpression(), BinaryExpr.Operator.LESS_EQUALS))
            val block = BlockStmt()
            this.body.map { it.toJavaStatement() }.forEach { block.addStatement(it) }
            forStmt.body = block
            forStmt
        }
        else -> TODO("Not yet implemented $this")
    }
}

private fun GTarget.toJavaExpression(): com.github.javaparser.ast.expr.Expression {
    return when (this) {
        is GGlobalVariableTarget -> JavaParser().parseExpression<com.github.javaparser.ast.expr.Expression>("this.${this.globalVariable.name}").result.get()
        else -> TODO("Not yet implemented $this")
    }
}

private fun GExpression.toJavaExpression(): com.github.javaparser.ast.expr.Expression {
    return when (this) {
        is GIntegerLiteral -> IntegerLiteralExpr(this.value.toString())
        is GDecimalLiteral -> DoubleLiteralExpr(this.value.toString())
        is GStringLiteral -> StringLiteralExpr(this.value)
        is GStringToIntExpr -> {
            val call = JavaParser().parseExpression<com.github.javaparser.ast.expr.MethodCallExpr>("java.lang.Integer.valueOf()").result.get()
            call.addArgument(this.string.toJavaExpression())
            call
        }
        is GGlobalVariableRef -> {
            JavaParser().parseExpression<com.github.javaparser.ast.expr.Expression>("this.${this.globalVariable.name}").result.get()
        }
        is GStringConcatExpr -> {
            BinaryExpr(this.left.toJavaExpression(), this.right.toJavaExpression(), BinaryExpr.Operator.PLUS)
        }
        is GToStringExpr -> {
            BinaryExpr(StringLiteralExpr(""), this.value.toJavaExpression(), BinaryExpr.Operator.PLUS)
        }
        is GEqualityExpr -> {
            BinaryExpr(this.left.toJavaExpression(), this.right.toJavaExpression(), BinaryExpr.Operator.EQUALS)
        }
        is GSumExpr -> {
            BinaryExpr(this.left.toJavaExpression(), this.right.toJavaExpression(), BinaryExpr.Operator.PLUS)
        }
        else -> TODO("Not yet implemented $this")
    }
}

private fun GType.toJavaType(): com.github.javaparser.ast.type.Type {
    return when (this) {
        is GStringType -> JavaParser().parseClassOrInterfaceType("java.lang.String").result.get()
        is GIntegerType -> PrimitiveType.longType()
        is GDecimalType -> PrimitiveType.doubleType()
        else -> TODO("Not yet implemented $this")
    }
}
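
The conversion of GSwitchStmt is probably the least obvious part of this code: a SELECT with several WHEN branches becomes a chain of if / else if, and the statement handed back to the caller should be the first if of the chain. The sketch below isolates that logic. JStmt, JBlock, JIf and Case are hypothetical, minimal stand-ins for the JavaParser and intermediate AST classes, and statements are reduced to plain strings.

```kotlin
// Hypothetical, minimal stand-ins for the JavaParser statement classes.
sealed class JStmt
data class JBlock(val lines: List<String>) : JStmt()
class JIf(val condition: String, val thenBlock: JBlock, var elseStmt: JStmt? = null) : JStmt()

data class Case(val condition: String, val body: List<String>)

// Builds `if (c1) {..} else if (c2) {..} else {..}` and returns the FIRST if,
// so that every branch stays reachable from the returned statement.
fun toIfChain(cases: List<Case>, elseBody: List<String>?): JStmt? {
    var top: JStmt? = null      // head of the chain, returned to the caller
    var current: JIf? = null    // last link, where the next branch is attached
    for (case in cases) {
        val newIf = JIf(case.condition, JBlock(case.body))
        if (current == null) top = newIf else current.elseStmt = newIf
        current = newIf
    }
    if (elseBody != null) {
        val block = JBlock(elseBody)
        if (current == null) top = block else current.elseStmt = block
    }
    return top
}

// Small printer, used only to inspect the result.
fun render(stmt: JStmt?): String = when (stmt) {
    null -> ""
    is JBlock -> "{ ${stmt.lines.joinToString(" ")} }"
    is JIf -> "if (${stmt.condition}) ${render(stmt.thenBlock)}" +
            (stmt.elseStmt?.let { " else ${render(it)}" } ?: "")
}
```

Keeping two references, one to the head and one to the last link, lets us append branches in a single pass and still return the whole chain. A SELECT with only an OTHER branch simply becomes a block.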

Testing the transpiler

When writing a transpiler according to this architecture we may want to test mainly four aspects:

  1. How we parse the source code and get the source AST
  2. How we convert the source AST into the intermediate AST
  3. How we convert the intermediate AST into the target AST
  4. How we convert the target AST into the target code

We may want to add more unit tests for the single functions and we may want to add end-to-end tests to verify the whole system, but the core of our testing efforts should be targeted at ensuring that these four layers work well individually.

In this case, we do not need to test extensively points 1 and 4, because we are reusing the parser from Jariko and the code generator from JavaParser. We can expect these components to have been tested.

We should instead focus on points 2 and 3, which contain our transformations.

Testing the RPG AST to intermediate AST transformation

Now, in our first test, we want to verify if we can translate very simple RPG code into the correct intermediate AST.

We’ll write three tests:

  • We verify an RPG data definition is converted into an intermediate AST global variable
  • We verify an EVAL RPG statement is converted into an assignment in the intermediate AST
  • We verify that a subroutine is converted into a function in the intermediate AST

import com.strumenta.kolasu.parsing.toStream
import com.strumenta.rpgtojava.intermediateast.*
import com.strumenta.rpgtojava.transformRpgToIntermediate
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.Assertions.*

class RpgToIntermediateTest {

    @Test
    fun simpleGlobalVariable() {
        val actualGProgram = transformRpgToIntermediate("""     D NBR             S              8  0""".toStream(), "Test")
        assertEquals(GProgram("Test", globalVariables = mutableListOf(GGlobalVariable("NBR", GIntegerType))), actualGProgram)
    }

    @Test
    fun simpleStatement() {
        val actualGProgram = transformRpgToIntermediate("""     D NBR             S              8  0
     C                   EVAL      NBR = 123""".toStream(), "Test")
        val expectedGProgram = GProgram("Test",
                globalVariables = mutableListOf(GGlobalVariable("NBR", GIntegerType)))
        expectedGProgram.mainFunction.body.add(GAssignment(GGlobalVariableTarget(expectedGProgram.globalVariables[0]), GIntegerLiteral(123L)))
        assertEquals(expectedGProgram, actualGProgram)
    }

    @Test
    fun emptySupportFunction() {
        val actualGProgram = transformRpgToIntermediate("""     C     FIB           BEGSR
     C                   ENDSR""".toStream(), "Test")
        assertEquals(GProgram("Test", otherFunctions = mutableListOf(GFunction("FIB"))), actualGProgram)
    }

}

For these tests we create a helper method named transformRpgToIntermediate:

fun transformRpgToIntermediate(source: InputStream, name: String) : GProgram {
    val rpgAst = parseRpgCode(source)
    return transformFromRPGtoIntermediate(rpgAst, name)
}

We just parse RPG code and invoke the first transformation: from RPG to intermediate. By using this function we do not have to create the RPG AST manually. We could do that but it is a little tricky, so we take this shortcut to keep the tutorial simple.
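
Note that these tests compare entire trees with a single assertEquals. This works because the nodes compare structurally: GProgram is declared as a data class, and I am assuming here that the other intermediate AST classes follow the same style. The sketch below uses two hypothetical classes, Variable and Program, to show what Kotlin gives us and one caveat to keep in mind.

```kotlin
// Hypothetical node: `name` and `type` take part in equals(), `note` does not,
// because data classes compare only primary-constructor properties.
data class Variable(val name: String, val type: String) {
    var note: String? = null
}

// Lists are compared element by element, so whole trees compare structurally.
data class Program(
    val name: String,
    val variables: MutableList<Variable> = mutableListOf()
)
```

Two programs built independently are equal as long as their names and their variables match. A property declared in the class body, like note here, is ignored by equals, so anything a test must check should be part of the primary constructor.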

Testing the intermediate AST to Java AST transformation

These tests are equivalent to the tests we have seen for the transformation from RPG to intermediate AST.

They are slightly longer because we are building the Java AST manually.

import com.github.javaparser.ast.expr.AssignExpr
import com.github.javaparser.ast.expr.FieldAccessExpr
import com.github.javaparser.ast.expr.IntegerLiteralExpr
import com.github.javaparser.ast.expr.ThisExpr
import com.github.javaparser.ast.type.PrimitiveType
import com.strumenta.rpgtojava.intermediateast.*
import com.strumenta.rpgtojava.transformations.transformFromIntermediateToJava
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test
import com.github.javaparser.ast.CompilationUnit as JavaCU

class IntermediateToJavaTest {

    @Test
    fun simpleGlobalVariable() {
        val intermediateProgram = GProgram("Test", globalVariables = mutableListOf(GGlobalVariable("NBR", GIntegerType)))
        val actualJavaCu = transformFromIntermediateToJava(intermediateProgram)
        val expectedJavaCu = JavaCU()
        val expectedJavaClass = expectedJavaCu.addClass("Test")
        expectedJavaClass.addField(PrimitiveType.longType(), "NBR").setPrivate(true)
        expectedJavaClass.addMethod("executeProgram")
        assertEquals(expectedJavaCu, actualJavaCu)
    }

    @Test
    fun simpleStatement() {
        val intermediateProgram = GProgram("Test",
                globalVariables = mutableListOf(GGlobalVariable("NBR", GIntegerType)))
        intermediateProgram.mainFunction.body.add(GAssignment(GGlobalVariableTarget(intermediateProgram.globalVariables[0]), GIntegerLiteral(123L)))
        val actualJavaCu = transformFromIntermediateToJava(intermediateProgram)
        val expectedJavaCu = JavaCU()
        val expectedJavaClass = expectedJavaCu.addClass("Test")
        expectedJavaClass.addField(PrimitiveType.longType(), "NBR").setPrivate(true)
        val executeProgram = expectedJavaClass.addMethod("executeProgram")
        executeProgram.body.get().addStatement(AssignExpr(FieldAccessExpr(ThisExpr(), "NBR"), IntegerLiteralExpr("123"), AssignExpr.Operator.ASSIGN))
        assertEquals(expectedJavaCu, actualJavaCu)
    }

    @Test
    fun emptySupportFunction() {
        val intermediateProgram = GProgram("Test", otherFunctions = mutableListOf(GFunction("FIB")))
        val actualJavaCu = transformFromIntermediateToJava(intermediateProgram)
        val expectedJavaCu = JavaCU()
        val expectedJavaClass = expectedJavaCu.addClass("Test")
        expectedJavaClass.addMethod("executeProgram")
        expectedJavaClass.addMethod("FIB")
        assertEquals(expectedJavaCu, actualJavaCu)
    }

}

While the tests were quite simple they should show you a concrete approach to test a transpiler.

Wrapping up our tutorial

In this tutorial, we have seen a very concrete implementation of a transpiler, designed according to the architecture we have presented at the beginning of the article.

These conversions are obviously partial: we are not handling all the statements and expressions of RPG and we are making some assumptions, which do not take into consideration some advanced usages of some statements and expressions. Nevertheless, this should show you the approach you can use when designing a transpiler. And despite all these limitations, this transpiler actually works on a few simple examples.

This is an architecture that leads to transpilers that are testable and maintainable in the long run. We have seen time and time again organizations and professionals use a quick-and-dirty approach that seems to work well at the beginning but brings development to a halt later. Those approaches were trying to do too much in one go: the worst cases were trying to generate the target code directly while traversing the parse tree of the source code. Other cases built an AST for the source code but then tried to directly generate code for the target language, without building an AST for the target language. At the very minimum you should have an AST for the source language and from that generate an AST for the target language. Then you should separately generate code for the target language from the AST for the target language. In most cases anyway, you will benefit from having an intermediate AST, even if it requires some extra work.

The approach presented shows how to test each layer individually. Take advantage of that :)
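
As a recap, the four layers can be sketched as four small functions composed into one. Everything below is a toy: the one-statement syntax ("EVAL NAME = 123") and the SrcEval, GAssign and JAssign classes are invented for illustration and have nothing to do with Jariko or JavaParser.

```kotlin
// Toy stand-ins for the ASTs of the three levels; none of them come from the article.
data class SrcEval(val target: String, val literal: String)   // source AST
data class GAssign(val variable: String, val value: Long)     // intermediate AST
data class JAssign(val field: String, val literal: String)    // target AST

// 1. source code -> source AST (toy syntax: "EVAL NAME = 123", one per line)
fun parse(code: String): List<SrcEval> =
    code.lines().map { it.trim() }.filter { it.isNotEmpty() }.map { line ->
        val (target, literal) = line.removePrefix("EVAL").split("=").map { it.trim() }
        SrcEval(target, literal)
    }

// 2. source AST -> intermediate AST
fun toIntermediate(src: List<SrcEval>): List<GAssign> =
    src.map { GAssign(it.target, it.literal.toLong()) }

// 3. intermediate AST -> target AST
fun toTarget(ir: List<GAssign>): List<JAssign> =
    ir.map { JAssign("this.${it.variable}", "${it.value}L") }

// 4. target AST -> target code
fun generate(target: List<JAssign>): String =
    target.joinToString("\n") { "${it.field} = ${it.literal};" }

// The whole transpiler is just the composition of the four layers.
fun transpile(code: String): String =
    generate(toTarget(toIntermediate(parse(code))))
```

Each function takes and returns plain data, so each one can be tested in isolation by building its input by hand and comparing its output with the expected tree.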

Summary

Transpilers can be very important resources. Many organizations and developers are not really aware that it is possible to write transpilers. Most of those who do know that this is possible have no idea how to get started. In this article, we tried to explain what they are, when they should be used, and how they should be designed.

Given that it is way too easy to give vague indications that make you seem smart, we backed up our recommendations with a complete tutorial showing how to build a transpiler to convert between two very different languages.

This has all been based on our experience writing commercial transpilers for organizations of different sizes. In case you ever need commercial support to build a transpiler, well, you may want to contact us.

Before we go there are a few resources we want to share:

Enjoy! I hope this has been useful.

References

There are a couple of websites which were useful while preparing this article:
