Introduction

This is the second of two articles in which we’ll go through the design and implementation of a domain-specific language (DSL) for modding Minecraft. We’ll use the most advanced techniques that we have learned at Strumenta, and we’ll only use open-source tools and libraries. 

In the first part, we’ve given an introduction to DSL design and started planning for an implementation, by looking at the existing practices and tools for writing Minecraft mods. In this second part, we’ll produce a working prototype that we’ll use to build our first mod.

We’ll be using Kotlin, because that’s the language we prefer at Strumenta, and for which we have the best libraries and tools. However, we could have used Java too, or Python, TypeScript, C# – all languages covered by the various incarnations of our StarLasu framework for languages. In particular, the implementation we’ll be using with Kotlin is called Kolasu.

Note that this is completely independent of the fact that we’re generating Java code. One thing is the technology used to implement our language; another thing is the code that we generate, which in this case is Java because Minecraft requires that.

All the code developed for this tutorial is available in a GitHub repository. In the course of the article, we’ll go over several iterations of the code. We have tagged each iteration in the repository so that it’s easier to follow along.

First Concept: the Mod

In the first part of this tutorial, we’ve seen the properties we have to set to configure a new mod. In other words, we’ve started from the bottom, the code that we have to generate. Now that we’ve got an idea of what we’re going to generate, we’ll define the high-level concepts that will constitute the elements of our language.

We’ll return to code generation later, to connect all the pieces. It’s not always possible or advisable to proceed this way – bottom-up and top-down – but in this particular case it works well. So, let’s start with our first concept: the mod.

To model our mod, we’ll write a Kotlin class. As we’ve seen, a mod has various properties, including an ID, a name, a version, and a license. Let’s model those properties in a Kolasu node:

data class Mod(
   val id: String,
   val name: String,
   val version: String,
   val license: String? = null
) : Node()

We’ve omitted all the project setup because it’s described in other articles; even though they may be outdated at the time of writing, the principles remain more or less the same. Also, in the GitHub repository associated with this tutorial, we can find all the code, including build scripts.

Now we have an abstract representation of a mod. We can see that this is just a plain Kotlin class. In other words, so far the process isn’t particularly different from what we’d have done if we were building a software library or application to deal with Minecraft mods.

One notable exception is that our Mod class extends Kolasu’s Node class, and that makes it into an AST node with all the features that Kolasu provides, such as traversal methods, transformations, tracking of source positions, etc. – more on that later.

This first version of our mod is tagged “first-concept” in the Git repo.

Adding Syntax

Since we’re building a DSL, we need something beyond abstract constructs. Specifically, we’ll have to add syntax and semantics in order to turn our AST into an actual language.

Let’s focus on the first part, syntax. We’ll write an ANTLR4 grammar. This is the technology we’ve chosen as a company and built most of our expertise on. Besides, Kolasu comes with an optional built-in integration with ANTLR, so it will require less glue code than other parser generators or a completely different solution such as a projectional editor (for example, MPS or Freon).

This is the relevant portion of the grammar with which we establish a syntax for the Mod concept:

mod: 'mod' name=(NAME | STRING) '(' id=NAME '@' version=STRING ')' '{'
   ('license' ':' license=STRING)?
'}' EOF;

We’ve omitted most of the details, but the GitHub repo contains all the glue code necessary to generate the parser with ANTLR at build time. In particular, the “adding-syntax” tag refers to the code up to this point in our tutorial.
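To get a feel for the syntax, here is what a source file accepted by the rule above might look like, held in a Kotlin string. The regular expression is only a rough, hypothetical stand-in for the rule so that the sample can be checked without ANTLR; it is not how the actual parser works. It assumes that NAME is an identifier and STRING is a double-quoted literal without escapes, since the lexer rules aren’t shown here, and the mod name, ID and license are made up.

```kotlin
// Sample input for the `mod` rule. All the concrete values are invented.
val sampleSource = """
    mod ExampleMod (examplemod @ "1.0.0") {
        license: "MIT"
    }
""".trimIndent()

// Rough approximation of the grammar rule, for illustration only.
// Assumption: NAME is an identifier, STRING is a double-quoted literal without escapes.
val modRule = Regex(
    """mod\s+(\w+)\s*\(\s*(\w+)\s*@\s*"([^"]*)"\s*\)\s*\{\s*(?:license\s*:\s*"([^"]*)"\s*)?\}\s*"""
)

// Returns name, id, version and license (null when omitted),
// or null if the text doesn't have the expected shape.
fun matchMod(source: String): List<String?>? =
    modRule.matchEntire(source)?.groups?.drop(1)?.map { it?.value }
```

Calling `matchMod(sampleSource)` yields the four captured values; the license entry is null when the optional clause is left out.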

Connecting the Pieces

There’s another step we have to take before moving forward with our language. So far, we’ve defined a minimal AST with a single concept, and the syntax for it, but we haven’t connected the two pieces.

To do so, we have to set up a ParseTreeToASTTransformer that maps ANTLR parse trees to our Kolasu AST. When building parsers for our customers, we use proprietary automated tools that generate an initial version of that for us, following the grammar and the AST. Here, however, we have to define it manually:

class MinecraftParseTreeToASTTransformer(
   issues: MutableList<Issue>,
   source: Source?
) : ParseTreeToASTTransformer(issues, false, source) {
   init {
       registerNodeFactory<ModContext, Mod> {
           if (exception == null) {
               val name = if (name.type == MinecraftLexer.NAME) name.text else stringContents(name)!!
               Mod(id.text, name, stringContents(version)!!, stringContents(license) ?: "")
           } else null
       }
   }


   private fun stringContents(token: Token?): String? = token?.text?.substring(1, token.text.length - 1)
}

Here we can see that we transform a ModContext parse tree node into a Mod AST node. In this particular example we use a closure, because we may need to extract the text from string literals, so as not to include the opening and closing quotes in the name of the mod. However, in many other cases, the declarative API is sufficient and we don’t have to write any custom code – just specify the source and target node type, and how to map child nodes.
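The quote handling in that closure can be tried in isolation. The following is a minimal sketch that mirrors the logic of stringContents but works on the raw token text rather than on an ANTLR Token, so that it runs without any dependency; the names `stripQuotes` and `modName` are ours and don’t appear in the repository.

```kotlin
// Same idea as stringContents in the transformer: drop the first and last character,
// i.e. the opening and closing quotes of a STRING token. Null stays null.
fun stripQuotes(tokenText: String?): String? =
    tokenText?.let { it.substring(1, it.length - 1) }

// Mirrors the name handling in the node factory: a NAME is used verbatim,
// while a STRING loses its surrounding quotes.
fun modName(tokenText: String, isName: Boolean): String =
    if (isName) tokenText else stripQuotes(tokenText)!!
```

Like the original, this sketch assumes that a STRING token always includes both quotes, and it doesn’t process escape sequences inside the literal.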

To put everything together, now we can define a KolasuParser subclass:

class MinecraftModParser : KolasuParser<Mod, MinecraftParser, ModContext, KolasuANTLRToken>(ANTLRTokenFactory()) {
   override fun createANTLRLexer(charStream: CharStream): Lexer = MinecraftLexer(charStream)


   override fun createANTLRParser(tokenStream: TokenStream): MinecraftParser = MinecraftParser(tokenStream)


   override fun parseTreeToAst(
       parseTreeRoot: ModContext,
       considerPosition: Boolean,
       issues: MutableList<Issue>,
       source: Source?
   ): Mod? {
       return MinecraftParseTreeToASTTransformer(issues, source).transform(parseTreeRoot) as Mod?
   }
}

This will be the entry point for parsing our language. Most of the features of our parser are provided by Kolasu and ANTLR; here we’re just gluing everything together. We can feed this parser a variety of input sources – such as a string or a file – and it will return a Result object with the AST (the “root” property) and a list of Issues encountered during parsing.

We’ve described all these concepts at length in other articles (you can start with the Chisel Method, an open-source method for parsers that we’ve designed), so please refer to them if you want more information. Also, we can find the code above in the “adding-syntax” tag that we introduced earlier.

Code Generation

Once we have the information about the Mod in our AST, we’ll want to make it executable. This means generating a proper Minecraft mod jar out of it, which we’ll run in the game as discussed earlier.

There are multiple ways in which we can achieve that result. The typical choice we make, absent other considerations, is to generate code – in this case, Java code and configuration files – and then have existing tools produce the final artifact. In other words, we’ll generate a Java mod project as if we had coded it manually, and then we’ll use Gradle and the Java compiler to build the jar.

We could have generated the jar directly, but that adds a lot of low-level complexity, particularly when generating Java bytecode (or other forms of machine code). There’s no compelling reason to deal with such low-level details in most cases. Sure, it’s more efficient to generate Java bytecode than to print text and launch the Java compiler that will parse that text and compile it to produce more or less the same bytecode. However, we’re talking about a tiny number of files, so any difference in performance is going to be negligible.

Furthermore, this language differs from the typical project we develop at Strumenta because the code generator won’t actually generate all the code. Most of it we’ll keep as-is from the Mod Development Kit we’ve downloaded in Part 1 of this tutorial. So our strategy is to overwrite the parts that we want to change, rather than writing the code from scratch. As we’ll see, this will result in some design choices down the line.

Now, in typical DSLs or transpilers, the generated code isn’t self-sufficient either: it usually depends on some runtime library. Also, the generated code is often not a complete project, and has to be integrated into a larger codebase at some point. Our case is different, however, as the code is inserted into an existing project during generation itself.

Once we have generated the code, we’ll want to obtain a jar file to deploy our mod; we’ll just use Gradle (`./gradlew build`) as we’ve already shown in Part 1. This is a manual step but we can easily automate it by launching Gradle as a subprocess.
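As a minimal sketch of that automation, the following launches the Gradle wrapper from Kotlin using the standard ProcessBuilder class. The function names are ours, and the sketch assumes that the wrapper scripts (`gradlew` and `gradlew.bat`) sit at the root of the mod project.

```kotlin
import java.io.File

// Builds the command line for the Gradle wrapper found in projectDir.
// The OS name is a parameter only to make the function easy to try out.
fun gradleCommand(projectDir: File, osName: String = System.getProperty("os.name")): List<String> {
    val windows = osName.startsWith("Windows")
    val wrapper = File(projectDir, if (windows) "gradlew.bat" else "gradlew").absolutePath
    return if (windows) listOf("cmd", "/c", wrapper, "build") else listOf(wrapper, "build")
}

// Runs the build inside projectDir and returns the exit code (0 means success).
fun buildModJar(projectDir: File): Int =
    ProcessBuilder(gradleCommand(projectDir))
        .directory(projectDir)
        .inheritIO() // forward Gradle's output to our own console
        .start()
        .waitFor()
```

A non-zero result from `buildModJar` means that the build failed, which is the point where the second error-reporting scenario discussed below comes into play.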

Error Reporting and Position Information

Another consideration is error reporting. We can identify two scenarios.

First, the code that the user typed could be incorrect. In that case, we should report the error(s) and abort the process without generating any code. That’s what the typical compiler or transpiler does, for example when you introduce a type error or refer to a variable that wasn’t declared. In case the error is in the parsing stage, the parser itself (ANTLR in our case) typically can report the position of the error in the source code. However, if the error happens when verifying the correctness of the AST – for example, checking types – then we have to be able to map an AST node back to the source code it was parsed from, to show the user where the incorrect code is.

The other scenario is the following. Suppose our code generator has a bug and generates invalid code. In that case, the Java compiler may not be able to compile it and will report one or more errors, referencing positions in the generated code that are meaningless for the user, unless we make the effort to trace them back to sections of the code that the user wrote.

Of course, our little transpiler will be completely bug-free and will always generate perfect code ;) but in general, proper issue reporting is an important part of a satisfying user experience with a language, so the correct propagation of errors and warnings across the various layers of the transpiler, with a proper translation of line and column information, should not be an afterthought.

Kolasu helps us in both scenarios because it keeps position information, so each node in the AST can be traced back to the source code it came from. Not only that: when we generate code using Kolasu’s own facilities, it also keeps track of where each piece of generated code came from. However, that doesn’t apply to our transpiler: we’re not using those Kolasu APIs, because we want to modify existing Java files instead of overwriting them.
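To make the first scenario concrete, here is a minimal sketch of how an issue attached to a position can be shown to the user together with the offending line. The types below are hypothetical, simplified stand-ins that we define ourselves; Kolasu has its own, richer equivalents. We assume 1-based lines and 0-based columns.

```kotlin
// Hypothetical, simplified types (not Kolasu's API).
data class Point(val line: Int, val column: Int) // line is 1-based, column is 0-based
data class Position(val start: Point, val end: Point)
data class SemanticIssue(val message: String, val position: Position?)

// Renders an issue followed by the source line it refers to
// and a caret under the start column.
fun render(issue: SemanticIssue, source: String): String {
    val pos = issue.position ?: return "error: ${issue.message}"
    val lineText = source.lines().getOrElse(pos.start.line - 1) { "" }
    val caret = " ".repeat(pos.start.column) + "^"
    return "${pos.start.line}:${pos.start.column}: error: ${issue.message}\n$lineText\n$caret"
}
```

An issue without a position still produces a message, just without the source excerpt, which is why keeping positions on every AST node pays off.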

Alternatively, it is sometimes possible to enrich machine code with line and column number information so that runtime errors can refer to the correct section of the source code. This is generally not available when generating text – with the notable exception of JavaScript source maps – and it’s often hard to map runtime stack traces back to the original source code (as opposed to the generated code), so this is one reason that could make the generation of binary code preferable. Given we’re targeting Minecraft and the generated code ought to be simple, this shouldn’t be a concern for us.

So let’s look at how we may generate a minimal Minecraft mod from our Mod instance.

Generating Mod Properties

As we’ve seen in Part 1 of this tutorial, the properties that we’ve defined in our Mod class (and more) are set in the gradle.properties file at the root of the mod project. So the first step of our code generation process is to output an updated version of that file.

Since our code generator is overwriting an existing file, we want a solution that will preserve the contents of that file as much as possible, including the order of the properties and the comments. This rules out the native Java Properties class: while it allows one to load a properties file, modify it, and save it back, it loses all formatting, comments etc. in the process. Instead, we’ll use the Apache Commons Configuration library, which luckily comes with faithful preservation of formatting.
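To illustrate what “preserving the contents” means in practice, here is a deliberately naive, standard-library-only sketch that rewrites selected values line by line and leaves everything else alone. This is a swapped-in technique for illustration, not the Apache Commons Configuration approach used in the repository, and it ignores parts of the properties format such as `:` separators, escapes and multi-line values. The property keys used in the usage note are assumptions about the project’s gradle.properties.

```kotlin
// Replaces the values of the given keys, leaving comments, blank lines,
// ordering and unrelated properties untouched.
// Keys that don't occur in the text are ignored in this sketch,
// and line endings are normalized to "\n".
fun updateProperties(text: String, updates: Map<String, String>): String =
    text.lines().joinToString("\n") { line ->
        val isProperty = '=' in line && !line.trimStart().startsWith("#")
        val key = line.substringBefore('=').trim()
        val newValue = if (isProperty) updates[key] else null
        if (newValue != null) "$key=$newValue" else line
    }
```

For example, `updateProperties(text, mapOf("mod_id" to "mymod"))` changes only the line that sets `mod_id`, assuming that is the key in use.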

We won’t include the code for generating the properties file here because it’s not particularly interesting, but we can find it in the Git repo. Now let’s move on.

Setting the Mod ID in Java Code

In Part 1 of this tutorial, we also learned that the mod ID that we configure in gradle.properties must match the MODID constant in the mod’s entry point Java class, which is initially called ExampleMod.

We could adopt the same approach that we used for the properties file: to parse the Java code, modify it in memory, and then dump it again as text preserving all formatting, comments etc. However, this is significantly harder to do because Java is a much more complex language and parsers offering syntactic preservation – the ability to print the code back into a string keeping the formatting as close to the original as possible – are not common.

JavaParser offers such a capability, with some limitations, but we don’t want to introduce such complexity in a tutorial. It could be a nice exercise for the reader, though.

So, we’ll cheat and play dirty, and just apply regular expression substitution. It’s a technique we usually avoid because it scales very poorly, but in these limited circumstances it’s apt for the job. We have the following field declaration:

public static final String MODID = "examplemod";

And we want to replace “examplemod” with our own mod ID:

fun patchModClass(sdkDir: File, mod: Mod) {
   val modClassFile = File(sdkDir, "src/main/java/com/example/examplemod/ExampleMod.java")
   val newText = modClassFile.readText()
       .replace(Regex("(public static final String MODID = \").*?\";"), "$1${mod.id}\";")
   modClassFile.writeText(newText)
}

Here we’ve used the fact that we can use groups in regular expressions to capture a portion of the matched string (in parentheses), then refer to it in the substitution ($1), to avoid repeating the “public static final…” part.
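The substitution is easier to experiment with on an in-memory string. The sketch below uses the same pattern and additionally passes the ID through `Regex.escapeReplacement`, a small defensive addition of ours so that a `$` or a backslash in the ID can’t be misread as a group reference; the function name `replaceModId` is hypothetical.

```kotlin
// Same substitution as in patchModClass, applied to a string instead of a file.
fun replaceModId(javaSource: String, modId: String): String {
    val pattern = Regex("(public static final String MODID = \").*?\";")
    // "\$1" in the replacement stands for the text captured by the first group.
    // escapeReplacement makes the ID safe to use literally in a replacement string.
    return pattern.replace(javaSource, "\$1" + Regex.escapeReplacement(modId) + "\";")
}
```

Text that doesn’t contain the MODID declaration is returned unchanged, so a silent no-op is possible if the SDK ever changes the shape of that line.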

Normally, code generation doesn’t work like this. The additional constraint of preserving formatting is specific to this project. When no such constraint is present, we typically generate the target code using our own proprietary language modules if available – by building a Kolasu AST programmatically, then generating code text from it – or possibly an open-source library if one exists. These typically produce code that is formatted (for example, it’s not all on a single line, it’s indented to reflect logical nesting, and so on) but with a fixed format or at most a few controllable parameters (for example, tab size).

We’ve tagged the code up to this point as “code-generation” in the Git repo.

Next Iteration: Evolving the Language

So far, we’ve built the minimal kernel of a domain-specific language: from syntax (the parser), to semantics (the AST), to code generation. Extending the language is “simply” a matter of expanding each of those elements.

For example, the Minecraft mod SDK comes with a pre-built “block” that we can add to the game, defined in the field “EXAMPLE_BLOCK”. To actually allow the player to place that block into the world, at least in the game’s “creative mode”, we also have to include an “item” that will go into the player’s inventory and will create the associated block when used (defined in the field “EXAMPLE_BLOCK_ITEM”).

So, we have learned that the language has a concept of a “block” with some properties (aesthetic and behavioral), it has a concept of an “item”, and in particular there’s a category of items that are “block items” that the player can use to place blocks in the world. We can therefore reflect that in the syntax, semantics, and generation parts of the language.

Typically, at Strumenta we start from the syntax because we have proprietary tools to automatically generate a first sketch of the AST from an annotated ANTLR grammar, and to later replace machine-generated AST classes with hand-crafted versions if needed. But it’s also perfectly fine to start from the AST, like we did in this tutorial. So we could add a first version of the block concept:

data class Block(val name: String) : Node()

And in the grammar:

block: 'block' name=NAME '{' /* reserved for future extensions */ '}';

For now, a block is just a name with no properties. We’ve covered in other articles how to add more complex properties, such as references or enumerations.

We’ll then modify our AST and grammar to include Block nodes.

AST:

data class Mod(
   /* Other properties omitted */
   var blocks: List<Block> = listOf()
) : Node()

Grammar:

mod: 'mod' name=(NAME | STRING) '(' id=NAME '@' version=STRING ')' '{'
   ('license' ':' license=STRING)?
   block*
'}' EOF;

AST Transformer:

registerNodeFactory<ModContext, Mod> {
   if (exception == null) {
       val name = if (name.type == MinecraftLexer.NAME) name.text else stringContents(name)!!
       Mod(id.text, name, stringContents(version)!!, stringContents(license))
   } else {
       null
   }
}.withChild(Mod::blocks, ModContext::block)

registerNodeFactory<BlockContext, Block> {
   Block(name.text)
}

We’ll leave code generation as an exercise to the reader. Have fun with regular expressions!

The code so far is available on GitHub with the tag “evolving-the-language”.

A Language Is More Than Just the Language

So far we’ve focused on the parser and code generator, but a modern language is more than that. We expect to have accompanying tools like editors or IDE plugins, linters, etc. Of course, we cannot include all that stuff in our tutorial or it would become a book!

Fortunately, we have other tutorials that explore some of those topics in more detail: editors, code completion, etc.

Importantly, we’ve also skipped the treatment of more advanced semantics topics:

  • symbol resolution, i.e. connecting nodes in the AST when one refers to another by name;
  • type computation, i.e. assigning each expression node a type that tells us which uses of the node are guaranteed to be valid (such as using it with a certain function or method).

We have other articles about those aspects too. Again, for the sake of brevity and simplicity, we couldn’t include them here. But these concerns will apply to our language too when it grows, because some entities (e.g. a block item) refer to other entities (e.g. a block type). 
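To give an idea of what symbol resolution involves, here is a minimal sketch in which block items refer to blocks by name. The classes are hypothetical, plain Kotlin stand-ins that are not part of the tutorial’s AST, and a real implementation would use Kolasu’s own support for references instead.

```kotlin
// Hypothetical, simplified model: a block item refers to a block by name.
data class BlockDecl(val name: String)
data class BlockItemDecl(val name: String, val blockName: String) {
    var block: BlockDecl? = null // filled in by resolve()
}

// Links every item to the block it names and returns
// one message for each name that can't be resolved.
fun resolve(blocks: List<BlockDecl>, items: List<BlockItemDecl>): List<String> {
    val byName = blocks.associateBy { it.name }
    return items.mapNotNull { item ->
        item.block = byName[item.blockName]
        if (item.block == null) {
            "Item '${item.name}' refers to unknown block '${item.blockName}'"
        } else {
            null
        }
    }
}
```

The returned messages are exactly the kind of semantic errors that need position information to be useful to the user.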

Wrapping Up

Our language is nowhere near complete. However, we’ve shown all the basic ingredients and hinted at possible improvements and extensions, including:

  • Code generation using a proper language module such as JavaParser;
  • Semantic enrichment such as symbol resolution and type computation;
  • Editor support including syntax highlighting, error reporting, code completion.

All the code is on GitHub and we have tagged several commits to match the chapters in this tutorial, so you can better follow the evolution of this mini-language.

We hope that you’ve learned something and that you’ll want to build upon it to grow your own DSL, for Minecraft or other applications. And if that ever becomes a commercial project, and you need consulting to overcome some obstacle or validate your work, we’re here to help.