Generating bytecode

In this post we are going to see how to generate bytecode for our language. So far we have seen how to build a language to express what we want, how to validate it, and how to build an editor for it, but we still cannot actually run the code. Time to fix that. By compiling for the JVM, our code will be able to run on all sorts of platforms. That sounds pretty great to me!


This post is part of a series. The goal of the series is to describe how to create a useful language and all the supporting tools.

  1. Building a lexer
  2. Building a parser
  3. Creating an editor with syntax highlighting
  4. Build an editor with autocompletion
  5. Mapping the parse tree to the abstract syntax tree
  6. Model to model transformations
  7. Validation
  8. Generating bytecode

After writing this series of posts I refined my method, expanded it, and clarified it into a book titled How to create pragmatic, lightweight languages.



Code is available on GitHub under the tag 08_bytecode

Adding a print statement

Before jumping into bytecode generation, let’s just add a print statement to our language. It is fairly easy: we just need to change a few lines in the lexer and parser definitions and we are good to go.

The general structure of our compiler

Let’s start from the entry point of our compiler. We take the code either from the standard input or from a file (specified as the first parameter). Once we have the code we try to build an AST and check for lexical and syntactical errors. If there are none, we validate the AST and check for semantic errors. If we still have no errors, we go on with the bytecode generation.

Note that in this example we are always producing a class file named MyClass. Probably later we would like to find a way to specify a name for the class file, but for now this is good enough.
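The flow described above can be sketched roughly as follows. This is only an illustration and not the code from the repository: CompilerPipeline and its Parser, Validator and Generator interfaces are hypothetical names that stand in for the real parser, validator and JvmCompiler.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical driver: the three stage interfaces are placeholders for the
// real parser, validator and bytecode generator, which are not shown here.
final class CompilerPipeline<A> {
    interface Parser<A> { A parse(String code, List<String> errors); }
    interface Validator<A> { List<String> validate(A ast); }
    interface Generator<A> { byte[] compile(A ast, String className); }

    /** Either a list of errors or the bytes of the generated class. */
    static final class Result {
        final List<String> errors;
        final byte[] bytes; // null when errors is non-empty

        Result(List<String> errors, byte[] bytes) {
            this.errors = errors;
            this.bytes = bytes;
        }
    }

    private final Parser<A> parser;
    private final Validator<A> validator;
    private final Generator<A> generator;

    CompilerPipeline(Parser<A> parser, Validator<A> validator, Generator<A> generator) {
        this.parser = parser;
        this.validator = validator;
        this.generator = generator;
    }

    Result run(String code) {
        // Stage 1: build the AST, collecting lexical and syntactical errors.
        List<String> errors = new ArrayList<>();
        A ast = parser.parse(code, errors);
        if (!errors.isEmpty()) {
            return new Result(errors, null);
        }
        // Stage 2: semantic validation, only attempted on a well-formed AST.
        errors.addAll(validator.validate(ast));
        if (!errors.isEmpty()) {
            return new Result(errors, null);
        }
        // Stage 3: bytecode generation; the class name is fixed for now.
        return new Result(errors, generator.compile(ast, "MyClass"));
    }

    /** Reads the source from the file named by the first argument, else from stdin. */
    static String readCode(String[] args) throws IOException {
        if (args.length > 0) {
            return new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        }
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        return in.lines().collect(Collectors.joining("\n"));
    }
}
```

A caller would print the errors when the result contains any, and otherwise write the returned bytes to MyClass.class. Each stage runs only if the previous one reported nothing, so we never try to generate bytecode from a broken AST.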

Using ASM to generate bytecode

Now, let’s dive into the fun part. The compile method of JvmCompiler is where we produce the bytes that we will later save into a class file. How do we produce those bytes? With some help from ASM, which is a library for producing bytecode. We could generate the byte array ourselves, but that would involve some boring tasks like generating the constant pool structures. ASM does that for us. We still need some understanding of how the JVM is structured, but we can survive without being experts on the nitty-gritty details.
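To get a feel for the bookkeeping a library like ASM spares us, here is a hand-rolled sketch that writes the bytes of an empty class using only the JDK. It is not how the compiler in this post works, and RawClassFile is a made-up name; it just shows that even a class without any method needs a header and four constant pool entries, with indexes we have to keep track of ourselves.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: emits an empty public class by hand, to show the
// low-level structures that a bytecode library normally manages for us.
final class RawClassFile {

    /** Returns the class file bytes of an empty class in the unnamed package. */
    static byte[] emptyClass(String className) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);   // magic number
            out.writeShort(0);          // minor version
            out.writeShort(52);         // major version (52 = Java 8)

            out.writeShort(5);          // constant pool count = entries + 1
            out.writeByte(1);           // #1 CONSTANT_Utf8: our class name
            out.writeUTF(className);
            out.writeByte(7);           // #2 CONSTANT_Class pointing to #1
            out.writeShort(1);
            out.writeByte(1);           // #3 CONSTANT_Utf8: the superclass name
            out.writeUTF("java/lang/Object");
            out.writeByte(7);           // #4 CONSTANT_Class pointing to #3
            out.writeShort(3);

            out.writeShort(0x0021);     // access flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);          // this_class = #2
            out.writeShort(4);          // super_class = #4
            out.writeShort(0);          // interfaces count
            out.writeShort(0);          // fields count
            out.writeShort(0);          // methods count
            out.writeShort(0);          // attributes count
        } catch (IOException e) {
            // Writing to an in-memory stream should not fail.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Defines the class in a throwaway class loader to check the JVM accepts it. */
    static Class<?> load(final byte[] classBytes) {
        return new ClassLoader() {
            Class<?> define() {
                return defineClass(null, classBytes, 0, classBytes.length);
            }
        }.define();
    }
}
```

Calling RawClassFile.load(RawClassFile.emptyClass("MyClass")) gives back a class named MyClass. Add a single method that prints something and the constant pool grows with entries for the method name, its descriptor, the field System.out and so on, which is exactly the tedious part we would rather delegate.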

About types

Ok, we have seen that our code uses types. This is needed because, depending on the type, we need to use different instructions. For example, to put a value in an integer variable we use ISTORE, while to put a value in a double variable we use DSTORE. When we call System.out.println on an integer we need to specify the method descriptor (I)V, while when we call it to print a double we specify (D)V.

To be able to do so we need to know the type of each expression. In our super, super simple language we use just int and double for now. In a real language we may want to use more types, but this is enough to show you the principles.
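As a small illustration, the type-dependent pieces mentioned above can be collected in one place. JvmType is a hypothetical helper, not a class from the repository, and the rule in ofBinary (a binary operation is double as soon as one operand is double) is an assumption about how the language promotes types. The opcode numbers are the ones defined by the JVM specification.

```java
// Hypothetical helper: everything that depends on the type of a value.
enum JvmType {
    INT("I", 21, 54, 1),      // ILOAD = 21, ISTORE = 54, one local variable slot
    DOUBLE("D", 24, 57, 2);   // DLOAD = 24, DSTORE = 57, two local variable slots

    final String descriptor;
    final int loadOpcode;
    final int storeOpcode;
    final int slots;

    JvmType(String descriptor, int loadOpcode, int storeOpcode, int slots) {
        this.descriptor = descriptor;
        this.loadOpcode = loadOpcode;
        this.storeOpcode = storeOpcode;
        this.slots = slots;
    }

    /** Descriptor of the println overload taking this type, e.g. (I)V. */
    String printlnDescriptor() {
        return "(" + descriptor + ")V";
    }

    /** Assumed promotion rule: the result is double if either operand is double. */
    static JvmType ofBinary(JvmType left, JvmType right) {
        return (left == DOUBLE || right == DOUBLE) ? DOUBLE : INT;
    }
}
```

The slots field matters when we assign indexes to local variables: a double occupies two consecutive slots, so the variable declared after a double at index 0 lives at index 2.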


As we have seen, the JVM is a stack-based machine. So every time we want to use a value, we push it onto the stack and then perform some operation on it. Let’s see how we can push values onto the stack.
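To make the stack discipline concrete, here is a toy sketch limited to int expressions. StackExpr is a hypothetical class, and instead of calling ASM it just collects instruction mnemonics as strings; with ASM each of them would become a call on a MethodVisitor, such as visitLdcInsn, visitVarInsn or visitInsn.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, int-only expression tree that lists instructions in the
// order a stack machine needs them: operands first, then the operator.
abstract class StackExpr {

    /** Appends the instructions that leave this expression's value on top of the stack. */
    abstract void push(List<String> out);

    /** An integer literal: load the constant. */
    static StackExpr intLit(final int value) {
        return new StackExpr() {
            @Override void push(List<String> out) {
                out.add("LDC " + value);
            }
        };
    }

    /** A reference to an int local variable stored at the given index. */
    static StackExpr intVar(final int index) {
        return new StackExpr() {
            @Override void push(List<String> out) {
                out.add("ILOAD " + index);
            }
        };
    }

    /** A binary operation: push both operands, then the instruction that combines them. */
    static StackExpr binary(final StackExpr left, final String opcode, final StackExpr right) {
        return new StackExpr() {
            @Override void push(List<String> out) {
                left.push(out);
                right.push(out);
                out.add(opcode); // pops two values, pushes the result
            }
        };
    }

    List<String> instructions() {
        List<String> out = new ArrayList<>();
        push(out);
        return out;
    }
}
```

For 1 + (a * 2), with a stored in local variable 0, the list is LDC 1, ILOAD 0, LDC 2, IMUL, IADD. Each operand is pushed before the instruction that consumes it, so a nested expression simply leaves its result on the stack for the enclosing one.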


We can also create a Gradle task to compile source files.


We did not go into much detail and we sort of rushed through the code. My goal here is just to give you an overview of the general strategy for generating bytecode. Of course, if you want to build a serious language you will need to do some studying and understand the internals of the JVM; there is no escape from that. I just hope that this brief introduction was enough to show you that this is not as scary or complicated as most people think.
