Introduction

In this article, we’ll review JaRIKo, an open-source interpreter, written in Kotlin and running on the Java Virtual Machine, for a subset of the RPG programming language. People interested in legacy modernization techniques should find this interesting: as we’ll see in the course of the article, the driving force behind JaRIKo is the eventual migration of a product with thousands of customers, an endeavor in which challenges abound.

JaRIKo is written in Kotlin, and it’s the result of an ongoing collaboration between Strumenta and Sme.UP, who initially contracted us for our language engineering expertise. Strumenta designed and implemented the first version of the interpreter, which is now being actively maintained and evolved by a team of Sme.UP and Strumenta architects and developers working tightly together.

About RPG

RPG (originally Report Program Generator) is a proprietary high-level programming language for business applications, developed by IBM. It was originally created in 1959, to bring punch card processing to the IBM 1400 series computers. However, its more modern descendants are still in use today, despite only being available on IBM i systems (formerly known as AS/400).

Until the release of RPG IV in 1994, RPG was exclusively a fixed-format language. That is, each textual element of a program must be placed at a specific position in a source line in order to have a specific meaning. Most modern languages, instead, are “free-form”: they do impose some structure on the code, but they don’t mandate a precise positioning of program elements.

This is an excerpt of a fixed-format RPG program:

      *---------------------------------------------------------------
     D* M A I N
      *---------------------------------------------------------------
      *
      * Initial settings
     C                   EXSR      IMP0
      * Function / Method
1    C                   SELECT
      * Init
1x   C                   WHEN      U$FUNZ='INZ'
     C                   EXSR      FINZ
      * Invoke URL
1x   C                   WHEN      U$FUNZ='ESE'
     C                   EXSR      FEXE
      * Detach (empty subroutine in this case)
1x   C                   WHEN      U$FUNZ='CLO'
     C                   EXSR      FCLO
1e   C                   ENDSL
      * Final settings
     C                   EXSR      FIN0
      * End
     C                   SETON                                        RT

Nowadays, RPG is a niche technology, heavily tied to IBM hardware and operating systems that are not very widespread. Still, there are businesses that thrive on RPG and continue to invest significant amounts of money in RPG and related technologies.

It’s worth noting that typical RPG users are not programmers as we understand the term today; instead, they’re often technically skilled business analysts and consultants. Let’s keep that in mind over the course of this article, as people who have a background in computer science might find some choices odd. This also means that we can see RPG as a DSL (Domain-Specific Language) for business applications and reporting. We’ll return to this later on.

Why JaRIKo?

The need for something like JaRIKo arose at Sme.UP for several reasons that we’ll now examine. Over the years, Sme.UP has built, among other things, an ERP product in RPG, which they successfully deploy and tailor to a variety of customers.

One concern that is common among companies using RPG is vendor lock-in to IBM. In the future, IBM may decide to discontinue support for RPG, the IBM i OS, or the specialized hardware, or to significantly increase their cost. Having the option to run RPG programs outside of IBM’s hardware and software as well mitigates that risk. This does not mean in any way that we want, or suggest, to abandon IBM; we merely want to add a possibility.

Another key objective in designing and implementing JaRIKo has been the interoperability with the JVM ecosystem; in particular, the ability to tap into the huge selection of Java libraries. RPG is tailored to writing business rules, but when we have to parse XML or generate JSON, for example, we have very limited tools at our disposal. Running on the JVM, and thus on commodity hardware and software, also means reducing the cost of operation for customers running the Sme.UP ERP on-premises.

Being able to run the code on cheaper systems could also open new market opportunities, as smaller players, who cannot afford to pay the cost of a full ERP on IBM hardware, could nevertheless benefit from more limited versions of the same products. Also, it opens the road for cloud computing offerings with a pay-per-use model, which allow better control of operating costs and efficient scaling.

Finally, JaRIKo is open-source, a deliberate choice made by Sme.UP from the project’s inception. This means that it could receive contributions from other parties interested in the technology, expanding it beyond the immediate needs of Sme.UP. The decision to develop such a technology in the open is a courageous and unusual move in the field of commercial ERPs.

Side Benefits

In addition to the goals stated in the previous paragraph, we can identify some side benefits in a tool like JaRIKo.

A more modern ecosystem doesn’t only mean more libraries; it also brings tools such as rich editors and code versioning systems. Furthermore, it paves the way for modern development practices such as unit testing. We’ll cover testing in greater depth in the following sections.

Also, multiple implementations of a programming language mean more possibilities for the evolution of the language. Until now, only IBM has had the power to add new features to RPG, with its own release cycle. This matters particularly to Sme.UP and other similar shops, because many RPG developers are actually domain experts and consultants. The core product development team can add new statements and expressions to the language that help developers in the field tackle common problems and patterns. Of course, adding new features to the language might break backward compatibility; currently, the same programs can run both on IBM i and JaRIKo, and in some cases, it’s worth keeping it that way.

Finally, an open-source interpreter is a fantastic opportunity for study and training, not just production software. Indeed, getting access to an IBM i system is not affordable for every individual who might want to learn RPG, or for smaller companies wishing to train their personnel. This is made even simpler by the fact that an instance of JaRIKo is available on the web as a service: everyone can try RPG in their browser!

Why an Interpreter?

An interpreter is a program that reads and executes other programs on the spot.

The solution to the goals we stated earlier is not necessarily an interpreter. We could have built a transpiler, that is, a source-to-source converter (for example to Java). Or, Sme.UP could have manually translated their software to Kotlin, or to some other language. One might think that a valid goal would be to get rid of RPG altogether and fully migrate to modern technologies.

Well, there are solid reasons for preferring an interpreter (there are also reasons to prefer other solutions, of course).

An example interactive session with JaRIKo in the browser. Try it! Image courtesy of Sme.UP.

Benefits of an Interpreter

Companies like Sme.UP have made huge investments in RPG; they have been tuning their products over the years to fit their customers’ needs. This is especially true as most of their RPG code implements business rules and requirements, and not infrastructure or low-level code.

In fact, as is typical with ERPs, the core development team works on the product itself and deals with all the plumbing, employing a variety of technologies including RPG. In addition, Sme.UP employs hundreds of consultants, usually working on their customers’ premises, who adapt the product to fit business requirements. They’re usually trained in and proficient with RPG, and it would be a massive effort to retrain all those people to use new technologies that are significantly different from what they know.

Also, RPG has a lean development cycle without any lengthy compile, deploy and restart steps. The developer edits a source file, feeds it into the system and immediately sees the result. With an interpreter, we can maintain the same fast cycle easily. That’s not impossible to achieve with a just-in-time compiler, and that may well be in the future of JaRIKo, but for the time being, an interpreter is much simpler.

Finally, rewriting everything in a new language is costly and highly risky. Companies have failed or nearly failed in trying to do so, and customers often are not willing to pay for technological upgrades that bring no immediately visible benefits to their operations. Instead, it’s much better to incrementally rewrite some parts of the system while keeping the whole thing running and evolving. An interpreter like JaRIKo makes this possible, as we’ll see in the following sections.

How JaRIKo Works

RPG applications are composed of many different “programs” that call one another; we can think of “programs” as “functions” in other programming languages. Programs can be user-defined, or they can be built-in or system APIs.

In JaRIKo, the unit of interpretation is the program. Programs need not be written in RPG; indeed, some programs are in Kotlin or other JVM languages, for several reasons:

  • They’re intrinsics of the interpreter, built-in primitive building blocks for user-defined programs;
  • They perform a series of calls to some Java/JVM library or legacy application, wrapping them up in a nice API for RPG;
  • Or, they’re called repeatedly in some inner loop that needs maximum performance.

Conversely, programs in a JVM language can call back into the interpreter to invoke RPG programs.
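
To make this arrangement more concrete, here is a minimal sketch of how programs written in different languages could sit behind a common abstraction. The names Program, ProgramRegistry, call and buildRegistry are illustrative assumptions for this article, not JaRIKo's actual API, and the "RPG" program is only a stand-in lambda.

```kotlin
// Hypothetical sketch: these types are NOT JaRIKo's real API.
// A "program" is anything that can be invoked by name with a list of parameters.
fun interface Program {
    fun execute(params: List<String>, registry: ProgramRegistry): String
}

// The registry stands in for the interpreter's program lookup.
class ProgramRegistry {
    private val programs = mutableMapOf<String, Program>()

    fun register(name: String, program: Program) {
        programs[name] = program
    }

    fun call(name: String, params: List<String>): String {
        val program = programs[name]
            ?: throw IllegalArgumentException("Program not found: $name")
        return program.execute(params, this)
    }
}

fun buildRegistry(): ProgramRegistry {
    val registry = ProgramRegistry()
    // A program implemented directly on the JVM, e.g. wrapping a library call.
    registry.register("UPPER", Program { params, _ -> params.first().uppercase() })
    // Stand-in for an interpreted RPG program that calls another program by name;
    // a real interpreter would walk the program's AST here.
    registry.register("GREET", Program { params, reg -> "HELLO, " + reg.call("UPPER", params) })
    return registry
}

fun main() {
    println(buildRegistry().call("GREET", listOf("world")))
}
```

Because callers only know a program by its name, swapping an RPG implementation for a Kotlin one (or vice versa) would not require changing the calling code.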

We can write programs in RPG and later replace them with Java or Kotlin. Image courtesy of Sme.UP.

In the next few sections, we’ll look into the details of how the interpreter builds, represents, and executes programs.

Parsing RPG

The first step in interpreting a programming language is parsing. To parse a language means to recognize the structure of the source code, in order to represent it in a format suitable for further elaboration by machine. A good parser also reports precise, meaningful error messages to the developer, and it’s able to recover from errors in the source code in some circumstances.

For JaRIKo, we’ve used the ANTLR4 parser generator. This is a tool that, from an input grammar, generates a parser in Java or in one of the other supported languages. ANTLR4 parsers exhibit all of the above characteristics.

The RPG grammar we used to create JaRIKo derives from a grammar by Ryan Eberly. Ryan’s work was directed at building tools that could understand RPG, for example, an editor that knows how to highlight RPG syntax elements. So, we adapted the grammar towards use in an interpreter, dealing with problems such as operator precedence that are not very important in an editor. At the time of writing, the grammar covers almost the entire RPG language.

The Abstract Syntax Tree

A parser generated by ANTLR acts in two phases: first, a lexer component tokenizes the input, breaking a long string of text into smaller “tokens”; then, the actual parser processes the stream of tokens according to the rules of the grammar. The result is a parse tree: a tree data structure representing the hierarchical decomposition of the input into the elements of the language. This is part of the parse tree of an RPG program:

R
  Statement
    Dspec
      T:DS_FIXED[D]
      Ds_name
        T:NAME[Msg§]
      T:EXTERNAL_DESCRIPTION[ ]
      T:DATA_STRUCTURE_TYPE[ ]
      T:DEF_TYPE_S[S ]
      T:FROM_POSITION[ ]
      T:TO_POSITION[ 12]
      T:DATA_TYPE[ ]
      T:DECIMAL_POSITIONS[ ]
      T:RESERVED[ ]
      T:EOL[
]

With ANTLR, a parse tree is made of objects that are part of the ANTLR runtime for the target language – in our case, Java. An ANTLR Kotlin runtime exists, but it’s still experimental and not yet adequate for our use cases. Anyway, Kotlin is fully interoperable with Java.

Along with the lexer and the parser, ANTLR also generates classes to traverse and transform the parse tree. In fact, our interpreter further processes the parse tree in order to produce another tree structure. This is the AST or abstract syntax tree. The parse tree closely mirrors the textual structure of the code; instead, the AST models the logical organization of a piece of code. This is the AST corresponding to the previous parse tree:

DataDefinition {
  fields = [ ]
  initializationValue = null
  name = Msg§
  type = StringType(length=12)
  muteAnnotations = [ ]
  parseTreeNode = null
} // DataDefinition

We perform the transformation from a parse tree to an AST by using our own open-source library, Kolasu.

In contrast to ANTLR’s parse tree of Java objects, our AST is made of Kotlin data classes. For those unfamiliar with Kotlin, these are similar to JavaBeans but don’t require the programmer to write all the boilerplate: getters and setters, equals and hashCode methods, and so on.

We use another key Kotlin feature to drive the transformation of a parse tree into an AST: extension methods. We enrich each of the generated ANTLR Java classes with a toAST method, without touching the generated source code. Neat!
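
The following sketch shows the general shape of this technique. DspecContext is a simplified, hypothetical stand-in for an ANTLR-generated class, and the AST classes are reduced versions loosely modeled on the dump shown earlier; the real JaRIKo classes carry more information, and treating the "to position" column as the field length is a simplification for this example.

```kotlin
// Stand-in for a class generated by ANTLR from the grammar (hypothetical, simplified).
// In reality this would be generated Java code that we don't edit by hand.
class DspecContext(val name: String, val toPosition: String)

// AST node types as Kotlin data classes: equals, hashCode and toString come for free.
sealed class Type
data class StringType(val length: Int) : Type()
data class DataDefinition(val name: String, val type: Type)

// Extension function: adds toAST() to DspecContext without modifying its source.
fun DspecContext.toAST(): DataDefinition =
    DataDefinition(name = name.trim(), type = StringType(toPosition.trim().toInt()))

fun main() {
    val node = DspecContext(name = "Msg§", toPosition = " 12")
    println(node.toAST())
}
```

From the caller's point of view, toAST() looks like any other method of the generated class, even though it lives in a separate Kotlin file.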

Interpreting the AST

Once we have an AST, we can proceed with its actual interpretation. This phase is also known as evaluation, i.e., giving a value to an expression represented by a tree.

In turn, in the scope of JaRIKo, we can subdivide evaluation into two distinct phases: symbol resolution and evaluation proper.

Resolving Symbols

Symbol resolution is the process by which we:

  • Determine whether multiple occurrences of the same identifier indeed refer to the same program element (e.g., the variable X) or not;
  • Determine the kind and scope of an identifier (e.g., a local variable, a subroutine, a system API call, etc.)

In JaRIKo, at the moment, we have just one global symbol table. That’s because the interpreter only supports subroutines, not functions; therefore, all bindings are global.
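
A single global symbol table can be sketched in a few lines. The class and function names below are assumptions made for illustration, not JaRIKo's actual implementation; the only RPG-specific detail is that identifiers are case-insensitive, so the sketch normalizes names before storing them.

```kotlin
// Hypothetical sketch of a single, global symbol table.
sealed class Symbol { abstract val name: String }
data class Variable(override val name: String) : Symbol()
data class Subroutine(override val name: String) : Symbol()

class GlobalSymbolTable {
    private val symbols = mutableMapOf<String, Symbol>()

    fun declare(symbol: Symbol) {
        // RPG identifiers are case-insensitive, so we normalize the key.
        val key = symbol.name.uppercase()
        require(key !in symbols) { "Duplicate symbol: ${symbol.name}" }
        symbols[key] = symbol
    }

    // With only one scope, every occurrence of a name resolves to the same declaration.
    fun resolve(name: String): Symbol =
        symbols[name.uppercase()] ?: throw NoSuchElementException("Unresolved symbol: $name")
}

fun main() {
    val table = GlobalSymbolTable()
    table.declare(Variable("COUNT"))
    table.declare(Subroutine("FINZ"))
    println(table.resolve("finz"))
}
```

Supporting functions with local variables would require a chain of nested tables instead, with lookups falling back from the innermost scope to the global one.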

Actual Interpretation

The actual interpretation consists in producing a value and side-effects for each expression or statement in a program.

In JaRIKo, the interpreter proper is just a couple of big switch-style statements: one for expressions, the other for statements. At the time of writing, these core interpreter switches cover roughly 60-70% of the RPG language. In practice, they implement the overwhelming majority of the expressions and statements used across the entire Sme.UP codebase, which is the result of decades of coding work by dozens of developers. While we cannot say that the Sme.UP ERP is particularly representative of general-purpose RPG code, we can safely assume that JaRIKo covers the most generally useful part of RPG.

Here’s an excerpt of the expression evaluation code:

private fun interpretConcrete(expression: Expression): Value {
    return when (expression) {
        // Literals evaluate directly to the corresponding runtime value
        is StringLiteral -> StringValue(expression.value)
        is IntLiteral -> IntValue(expression.value)
        is RealLiteral -> DecimalValue(expression.value)
        // Evaluate the operand first, then check that it's an array
        is NumberOfElementsExpr -> {
            when (val value = interpret(expression.value)) {
                is ArrayValue -> value.arrayLength().asValue()
                else -> throw IllegalStateException("Cannot ask number of elements of $value")
            }
        }
        // Etc.: the remaining expression types are elided in this excerpt
    }
}
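
For readers who want to experiment with this pattern, here is a self-contained miniature of the same idea. The expression and value classes are simplified inventions for this example (Expr, IntLit, StrLit, Plus, Val, IntVal, StrVal), not JaRIKo's own types.

```kotlin
// Simplified, hypothetical AST and value types; JaRIKo's real classes are richer.
sealed class Expr
data class IntLit(val value: Long) : Expr()
data class StrLit(val value: String) : Expr()
data class Plus(val left: Expr, val right: Expr) : Expr()

sealed class Val
data class IntVal(val value: Long) : Val()
data class StrVal(val value: String) : Val()

// One `when` over the sealed hierarchy; the compiler checks that all cases are covered.
fun eval(expr: Expr): Val = when (expr) {
    is IntLit -> IntVal(expr.value)
    is StrLit -> StrVal(expr.value)
    is Plus -> {
        // Evaluate sub-expressions recursively, then combine the results.
        val left = eval(expr.left)
        val right = eval(expr.right)
        when {
            left is IntVal && right is IntVal -> IntVal(left.value + right.value)
            left is StrVal && right is StrVal -> StrVal(left.value + right.value)
            else -> throw IllegalStateException("Cannot add $left and $right")
        }
    }
}

fun main() {
    println(eval(Plus(IntLit(2), IntLit(3))))
}
```

Because Expr is a sealed class, adding a new kind of expression makes the `when` fail to compile until the new case is handled, which helps keep a large interpreter switch complete.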

Challenges

We’ll now review some of the challenges that we encountered in writing JaRIKo.

Fixed-format Language

As we’ve already said, RPG is a fixed-format language. Thus, it’s the position of the tokens in the source line, in addition to their constituent characters, that drives the lexing phase. However, EBNF-style grammars such as those employed by ANTLR lack the capability to express such a structure; they’re exclusively driven by the sequence of characters in the input stream.

So, how did we manage to parse RPG with ANTLR?

Well, ANTLR grammars admit semantic predicates and semantic actions. That is, we can attach preconditions and side-effects to grammar productions, a technique which is particularly important in syntax-directed translation. As it turns out, it’s also very useful in our case. In fact, by checking the character position in the input stream at each grammar production where it matters, we’re extending ANTLR’s capabilities and taking the position into account. To do so, we use the getCharPositionInLine method which is built into ANTLR.
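
To give a feel for what "position-driven" means, the sketch below splits a calculation line purely by column. This is an illustration only: JaRIKo handles positions inside the ANTLR grammar, not with code like this. The CalcSpec class and the helper names are assumptions for this example; the column ranges (specification type in column 6, factor 1 in 12-25, operation code in 26-35, factor 2 in 36-49, result in 50-63) follow the fixed-format calculation specification layout.

```kotlin
// Illustration only: not how JaRIKo's ANTLR-based lexer is implemented.
data class CalcSpec(val factor1: String, val opCode: String, val factor2: String, val result: String)

// Extracts 1-based, inclusive columns, tolerating lines shorter than the requested range.
fun String.columns(first: Int, last: Int): String =
    if (length < first) "" else substring(first - 1, minOf(last, length)).trim()

fun parseCalcSpec(line: String): CalcSpec? {
    // Column 6 holds the specification type; an asterisk in column 7 marks a comment.
    if (line.columns(6, 6).uppercase() != "C" || line.columns(7, 7) == "*") return null
    return CalcSpec(
        factor1 = line.columns(12, 25),
        opCode = line.columns(26, 35),
        factor2 = line.columns(36, 49),
        result = line.columns(50, 63)
    )
}

fun main() {
    // A line from the excerpt at the beginning of the article.
    println(parseCalcSpec("     C                   EXSR      IMP0"))
    // Comment lines are not calculation specifications.
    println(parseCalcSpec("      * Initial settings"))
}
```

The same characters shifted a few columns to the left or right would be read as a different field, which is exactly the information that a purely character-driven grammar cannot see without predicates.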

The downside? Semantic predicates and actions are in the target language, in our case, Java. Therefore, we cannot use the grammar specification as it is to generate parsers in other languages (e.g., Python). We’d need to translate the semantic predicates and actions from Java to Python.

Data Representation

Differences in data representation are another class of problems.

AS/400 systems, for example, use the EBCDIC character encoding scheme instead of ASCII or Unicode. Thus, on these IBM systems, the number 0 and the character ‘0’ are actually the same object; in other words, the same sequence of bits encodes both values, and the interpretation as a number or a character depends on the context. On other architectures, including the Java Virtual Machine, this is not the case. So, we must perform some clever conversions, since programmers might have based some of their code on the equivalence of 0 and ‘0’.

Endianness is also a potential problem. IBM systems are big-endian, and as such, they encode multi-byte numbers differently from the little-endian computers that we’re accustomed to. While there’s code in the wild making byte-level operations on numbers that would be affected by endianness, so far this hasn’t been a real problem for JaRIKo specifically. That’s because the Sme.UP codebase does not make use of such instructions.
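
As a quick illustration of what endianness means in practice, this small sketch encodes the same 32-bit integer with both byte orders using the standard java.nio.ByteBuffer class. The helper name encodeInt is our own, and the example is unrelated to JaRIKo's code.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Encodes a 32-bit integer with the requested byte order and renders it as hex.
fun encodeInt(value: Int, order: ByteOrder): String =
    ByteBuffer.allocate(4).order(order).putInt(value).array()
        .joinToString(" ") { "%02X".format(it) }

fun main() {
    // 258 is 0x00000102
    println(encodeInt(258, ByteOrder.BIG_ENDIAN))    // 00 00 01 02
    println(encodeInt(258, ByteOrder.LITTLE_ENDIAN)) // 02 01 00 00
}
```

Code that inspects or overlays the individual bytes of a number would see them in a different order on the two kinds of machines, which is why such operations need special care when moving off the original platform.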

Another issue is packed numbers, a way of representing numbers by encoding each decimal digit with four bits. This is not only a problem for code relying on bitwise operations; since RPG supports union types and a packed number might overlap (or “overlay” in RPG parlance) with other data such as a string or an array, care must be taken to correctly emulate this behaviour.
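
The sketch below shows one way to pack and unpack whole numbers in this format. It is a simplified example, not JaRIKo's implementation: it ignores decimal positions and field lengths, and it assumes a common convention in which the last nibble holds the sign (0xF for non-negative values, 0xD for negative ones).

```kotlin
import kotlin.math.abs

// Packs a whole number into packed-decimal bytes: one digit per 4-bit nibble,
// with the last nibble holding the sign (assumed here: 0xF non-negative, 0xD negative).
fun pack(value: Long): ByteArray {
    var digits = abs(value).toString()
    // Digits plus the sign nibble must fill whole bytes, so pad to an odd digit count.
    if (digits.length % 2 == 0) digits = "0$digits"
    val nibbles = digits.map { it - '0' } + (if (value < 0) 0xD else 0xF)
    return nibbles.chunked(2) { (hi, lo) -> ((hi shl 4) or lo).toByte() }.toByteArray()
}

// Reverses pack(): splits each byte into two nibbles and rebuilds the number.
fun unpack(bytes: ByteArray): Long {
    val nibbles = bytes.flatMap { b -> listOf((b.toInt() shr 4) and 0xF, b.toInt() and 0xF) }
    val magnitude = nibbles.dropLast(1).fold(0L) { acc, digit -> acc * 10 + digit }
    return if (nibbles.last() == 0xD) -magnitude else magnitude
}

fun main() {
    println(pack(123).joinToString(" ") { "%02X".format(it) })  // 12 3F
    println(unpack(pack(-45)))                                   // -45
}
```

When a packed field overlays a string or an array, the interpreter has to keep this byte-level view in sync with the logical value, since a write through one view must be visible through the other.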

Finally, RPG union types also allow different encoding schemes (e.g., different compression modes). The JVM does not directly support such a feature and it would need to be emulated, but again, JaRIKo simply does not implement it as Sme.UP does not use it.

Access to the Host Environment

Every program, through as many abstraction layers as we fancy to add, ultimately ends up talking to a host environment: the operating system and its services. On IBM i, those include the file system and the DB2 database.

For example, many RPG programs map in-memory entities to persistent data records. The OS organizes these records according to a method called RLA, which stands for Record-Level Access. The system stores records (files) sequentially, and it keeps indices for traversing them efficiently in some user-defined order. It’s essentially a NoSQL system that predates relational databases.

When we execute RPG code outside of IBM i or AS/400, we don’t typically have DB2 or RLA available. So, in designing JaRIKo, we’ve had to decide where to draw the line: what we want to emulate and what we don’t want to support. Otherwise, we would have had to rewrite a heap of OS services! Luckily, the Sme.UP ERP already abstracts the data access layer when running on IBM machines, so that the software can use different data sources transparently while maintaining the same data access API. This clever design decision has made our life easier when porting programs outside of IBM i.

Interfaces

JaRIKo provides a few interfaces that programs can connect to. In particular, there’s a JDBC data access layer that we use to run tests on HSQLDB, but that could easily be adapted to other database systems such as PostgreSQL. There’s also a system interface for common host services, such as display to print a string to the console (or send it to the web browser!), findProgram to locate a program to invoke, and so on.
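
A system interface of this kind might look roughly like the following sketch. Only the names display and findProgram come from the description above; the signatures, and the HostProgram, HostSystem and RecordingSystem types, are assumptions made for this example and may differ from JaRIKo's actual interface.

```kotlin
// Hypothetical shapes: only the names display and findProgram come from the article;
// the signatures are assumptions made for this sketch.
fun interface HostProgram {
    fun execute(args: List<String>): List<String>
}

interface HostSystem {
    fun display(value: String)
    fun findProgram(name: String): HostProgram?
}

// An implementation suitable for tests: it records output instead of printing it.
class RecordingSystem(private val programs: Map<String, HostProgram>) : HostSystem {
    val displayed = mutableListOf<String>()

    override fun display(value: String) {
        displayed += value
    }

    override fun findProgram(name: String): HostProgram? = programs[name]
}

fun main() {
    val system = RecordingSystem(mapOf("ECHO" to HostProgram { args -> args }))
    val output = system.findProgram("ECHO")?.execute(listOf("Hello from RPG")) ?: emptyList()
    output.forEach(system::display)
    println(system.displayed)
}
```

Keeping host services behind an interface is what lets the same interpreter print to a console in one deployment and send output to a web browser in another.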

In the end, as we’ve done with the subset of RPG that we support, the driving force in choosing what to implement has been actual usage in the Sme.UP codebase, which may or may not be typical RPG code. That’s why we chose not to implement any of the OS/400 system APIs. Anyway, JaRIKo being open-source, other parties can add the bits that they miss, while building upon what the interpreter provides out of the box.

Unit Testing RPG Code

We’ve mentioned earlier how having an interpreter allows experimentation with new language features. One feature specific to JaRIKo that we’re particularly proud of is MUTEs. MUTEs are a technique for embedding assertions in RPG comments, which we use for unit tests. Here’s an example:

MU* VAL1($TIMMS) VAL2(5000) COMP(LT)
 C                   EVAL      $TIMMS=$TIMMS/1000

In the code above, the MU* line is a comment in RPG, and in the normal execution flow of the program, it’s ignored. However, when testing, we interpret the comment as an assertion about the subsequent line (which is a line of computation code, as indicated by the letter C). The assertion is in reverse Polish notation, also known as postfix; that is, the comparison operator (here, Less Than) follows the arguments.
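
As a rough sketch of how such an annotation could be processed, the code below parses a comparison MUTE with a regular expression and checks it against a map of variable values. This is not JaRIKo's implementation: ComparisonMute, parseMute and holds are hypothetical names, we assume the assertion is checked after the annotated statement has run, and only the LT operator appears in the example above (GT and EQ are assumed by analogy).

```kotlin
// Hypothetical sketch of parsing and checking a comparison MUTE; not JaRIKo's actual code.
data class ComparisonMute(val val1: String, val val2: String, val comparison: String)

val mutePattern = Regex("""VAL1\((.+?)\)\s+VAL2\((.+?)\)\s+COMP\((\w+)\)""")

fun parseMute(line: String): ComparisonMute? {
    val match = mutePattern.find(line) ?: return null
    val (val1, val2, comparison) = match.destructured
    return ComparisonMute(val1, val2, comparison)
}

// Operands are either numeric literals or variable names looked up in the given map.
fun holds(mute: ComparisonMute, variables: Map<String, Long>): Boolean {
    fun operand(text: String): Long = text.toLongOrNull() ?: variables.getValue(text)
    val left = operand(mute.val1)
    val right = operand(mute.val2)
    return when (mute.comparison) {
        "LT" -> left < right
        "GT" -> left > right   // assumed by analogy with LT
        "EQ" -> left == right  // assumed by analogy with LT
        else -> throw IllegalArgumentException("Unsupported comparison: ${mute.comparison}")
    }
}

fun main() {
    val mute = parseMute("MU* VAL1(\$TIMMS) VAL2(5000) COMP(LT)")!!
    // Pretend the annotated EVAL statement has just run and left 3 in the variable.
    println(holds(mute, mapOf("\$TIMMS" to 3L)))  // true
}
```

Since the annotation lives in a comment, the same source file still compiles and runs unchanged on IBM i, where the assertion is simply ignored.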

MUTEs can also express non-functional assertions. One example is timeouts:

MU* TIMEOUT(9000)
 C                   SETON                                        LR

Note that we only assert the timeout after the instruction has finished execution; we don’t stop the interpretation when the timeout expires, even though with an interpreter it’s possible and relatively simple to do so, as long as the code isn’t calling into foreign JVM code such as a third-party Java library.
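
The "check after completion" behaviour can be sketched as follows. The names assertTimeout and TimeoutResult are hypothetical, and we assume that the timeout is expressed in milliseconds; the clock is passed in as a parameter only to make the sketch easy to test.

```kotlin
// Hypothetical sketch: run a statement to completion, then compare elapsed time to the limit.
data class TimeoutResult(val elapsedMillis: Long, val passed: Boolean)

fun assertTimeout(
    limitMillis: Long,
    clock: () -> Long = System::currentTimeMillis,
    statement: () -> Unit
): TimeoutResult {
    val start = clock()
    statement() // never interrupted, even if it overruns the limit
    val elapsed = clock() - start
    return TimeoutResult(elapsed, elapsed <= limitMillis)
}

fun main() {
    val result = assertTimeout(9000) {
        // Stand-in for interpreting the annotated RPG statement.
        Thread.sleep(10)
    }
    println("Passed: ${result.passed}")
}
```

Stopping the statement as soon as the limit expires would instead require the interpreter to check the clock between statements, which is feasible for interpreted code but not for calls into arbitrary JVM libraries.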

These are just a couple of examples of the way we’re introducing modern engineering practices to the RPG world.

What’s Next?

JaRIKo is an evolving effort, and what we’ve covered here is only part of the story. And some of it has yet to be written.

Possible future developments include:

  • Integration with Truffle/GraalVM for efficient Just-In-Time compilation, and other performance improvements
  • Debugging facilities
  • Better integration with JVM languages other than Kotlin
  • An editor in the browser, based on Monaco
  • A REPL (of which there’s already a prototype)
  • A FaaS cloud offering – that is, serverless RPG!
JaRIKo as a component in a web application. Image courtesy of Sme.UP.

Each of these bullet points could potentially be an article of its own, and we cannot expand on them all here. But if you’re interested in some of them, please leave a comment!

Summary

To wrap it up, we’ve seen from a high-level point of view how we brought an established technology into a modern development ecosystem using an interpreter, and how that differs from a compiler or transpiler. Through that, we’ve had an example of how programming language implementation techniques are very relevant in today’s polyglot environments. We’ve also touched on the importance of domain-specific languages in the hands of domain experts and consultants.

The home of the JaRIKo project is on GitHub. An interactive online interpreter is available.

Acknowledgments

JaRIKo is the result of the ongoing collective work of several dedicated people, including:

Federico Tomassetti and Maurizio Taverna of Strumenta, who did most of the design and implementation of the original prototype of the interpreter.

Mauro Sanfilippo and Franco Lombardo of Sme.UP, who originally envisioned the project and have been part of it since the beginning, with a crucial role in leading the vision (Mauro) and contributing to the implementation (Franco).

Marco Benetti of Sme.UP, who’s been fundamental in defining the requirements and investigating the peculiarities of the IBM i platform.