mps-bytecode: creating, loading, modifying, saving and executing JVM class files using JetBrains MPS

I recently created this project to edit JVM class files inside JetBrains MPS.

Why?

  • It was fun
  • It is a great tool to learn about the JVM internals. Do you know what a frame is? How invokedynamic works?
  • Instead of generating Java code for my applications I could translate to mps-bytecode. More about this later.

Code is on GitHub, as always: mps-bytecode

Opening an existing class file

Consider this simple program:

package me.tomassetti;

public class HelloWorld {

    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}

You can compile it simply by running:

javac HelloWorld.java

Now you have a class file, which you can import into MPS. To do that, just create a ClassFileLoader, type the filename of the class file and click on load.

ex1.mkv

I will add support to import whole JARs. It should be pretty simple.

Taking a look at the class file

One key element of the class file is the constant pool. It is basically an indexed list of elements which define constants. Each element starts with a tag identifying its type, followed by data whose layout depends on that type.

Look at this example:

  • We have several UTF8 elements (e.g., 7, 8, 10, 12, 13, 14).
  • We also have Strings, which just contain a pointer to a UTF8 element.

For example, element 3 is composed of one byte indicating the type of the element (Constant Pool String) and two bytes indicating the index of the UTF8 string (in this case 18, corresponding to the string “Hello world!”).
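To make the layout concrete, here is a minimal sketch in plain Java that walks the constant pool of a class file. It is not how mps-bytecode does it: the class name ConstantPoolSketch and its methods are made up for this illustration, and only the entry types that show up in a class as small as HelloWorld are handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of mps-bytecode.
public class ConstantPoolSketch {

    /** Lists the constant pool entries of a class file as readable strings. */
    public static List<String> readConstantPool(byte[] classBytes) {
        List<String> entries = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("Not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort(); // valid indexes are 1..count-1
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte(); // one byte for the element type
                String prefix = "#" + i + " ";
                switch (tag) {
                    case 1: // Utf8: two-byte length followed by the bytes
                        entries.add(prefix + "Utf8 \"" + in.readUTF() + "\"");
                        break;
                    case 7: // Class: two-byte index of a Utf8 element
                        entries.add(prefix + "Class -> #" + in.readUnsignedShort());
                        break;
                    case 8: // String: two-byte index of a Utf8 element
                        entries.add(prefix + "String -> #" + in.readUnsignedShort());
                        break;
                    case 9: case 10: case 12: { // two two-byte indexes each
                        String kind = tag == 9 ? "Fieldref" : tag == 10 ? "Methodref" : "NameAndType";
                        int first = in.readUnsignedShort();
                        int second = in.readUnsignedShort();
                        entries.add(prefix + kind + " -> #" + first + ", #" + second);
                        break;
                    }
                    default: // Integer, Long, MethodHandle, etc. are left out
                        throw new IllegalArgumentException("Tag " + tag + " is not handled in this sketch");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    /** Builds the first bytes of a made-up class file with a two-element pool. */
    public static byte[] sample() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);  // minor version
            out.writeShort(52); // major version (Java 8)
            out.writeShort(3);  // constant_pool_count = number of elements + 1
            out.writeByte(8);   // #1 is a String...
            out.writeShort(2);  // ...pointing at #2
            out.writeByte(1);   // #2 is a Utf8
            out.writeUTF("Hello world!");
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = args.length > 0 ? Files.readAllBytes(Paths.get(args[0])) : sample();
        readConstantPool(data).forEach(System.out::println);
    }
}
```

Run it with the path to HelloWorld.class as the only argument to list the real pool, or with no arguments to use the tiny built-in sample.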

Screenshot from 2016-04-24 11-29-17

Note that when we loaded the file we created proper references between constant pool elements. That means that if you move the referenced element (e.g., 18), the element holding the reference (e.g., 3) will be updated automatically:

ex2.mkv

Then we have fields and methods. This class contains two methods, even though we defined just one. The additional method is the default constructor:

Screenshot from 2016-04-24 11-40-24

It contains the access flags, the name and descriptor indexes (the actual strings “<init>” and “()V” are contained in the Constant Pool). After that there are the attributes. One very important attribute is the “Code” attribute which contains the instructions which are executed by the JVM when this method is invoked.

aload_0
invokespecial 1
return

In addition to that, the Code attribute has an attribute of its own: it basically tells us the line numbers corresponding to the different instructions.
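In class files produced by javac this nested attribute is named LineNumberTable. Its body is a two-byte count followed by pairs of two-byte values: the offset of an instruction and the source line it came from. A small sketch of decoding it, where the class name and the sample bytes are made up for illustration:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of mps-bytecode.
public class LineNumberTableSketch {

    /**
     * Decodes the body of a LineNumberTable attribute, i.e. the bytes
     * after the attribute name index and the attribute length.
     * Each returned row is {start_pc, line_number}.
     */
    public static List<int[]> decode(byte[] body) {
        ByteBuffer buf = ByteBuffer.wrap(body); // big-endian by default, like class files
        int count = buf.getShort() & 0xFFFF;
        List<int[]> rows = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int startPc = buf.getShort() & 0xFFFF;
            int line = buf.getShort() & 0xFFFF;
            rows.add(new int[] {startPc, line});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative table with one row: instruction at offset 0 maps to line 3
        byte[] body = {0, 1, 0, 0, 0, 3};
        for (int[] row : decode(body)) {
            System.out.println("pc " + row[0] + " -> line " + row[1]);
        }
    }
}
```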

This is instead our main method:

Screenshot from 2016-04-24 11-40-37

Not much more code here:

getstatic 2 0
ldc 3
invokevirtual 4
return

It does the following:

  • it takes the static field specified at index 2 (the index is 2 0 where 0 is the high byte and 2 the low byte -> 2 in total). It is the out field of System
  • it pushes on the stack the String constant at index 3 (Hello world!)
  • it invokes the method specified at index 4 (println). The method will take the parameters from the stack
  • it returns
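The steps above can be reproduced outside MPS with a tiny disassembler. This is only a sketch: the class name is made up, and it knows just the six opcodes that appear in this post (values taken from the JVM specification). It also shows how the two index bytes are combined into one constant pool index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of mps-bytecode.
public class MainBytecodeSketch {

    /** Combines a high byte and a low byte into one constant pool index. */
    public static int index(byte[] code, int at) {
        return ((code[at] & 0xFF) << 8) | (code[at + 1] & 0xFF);
    }

    /** Turns raw bytecode into one readable line per instruction. */
    public static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x2A: out.add("aload_0"); pc += 1; break;
                case 0x12: out.add("ldc #" + (code[pc + 1] & 0xFF)); pc += 2; break; // one-byte index
                case 0xB1: out.add("return"); pc += 1; break;
                case 0xB2: out.add("getstatic #" + index(code, pc + 1)); pc += 3; break;
                case 0xB6: out.add("invokevirtual #" + index(code, pc + 1)); pc += 3; break;
                case 0xB7: out.add("invokespecial #" + index(code, pc + 1)); pc += 3; break;
                default:
                    throw new IllegalArgumentException("Opcode not handled in this sketch: " + opcode);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The body of main: getstatic #2, ldc #3, invokevirtual #4, return
        byte[] mainCode = {(byte) 0xB2, 0, 2, 0x12, 3, (byte) 0xB6, 0, 4, (byte) 0xB1};
        disassemble(mainCode).forEach(System.out::println);
    }
}
```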

 

Running a class file from inside MPS

You can also run classes inside MPS. To do that you define an Executor like this one:

Screenshot from 2016-04-24 11-54-09

The executor defines the class to invoke, the classpath to use and the arguments passed to the main function.

In this case we just specify a link to a class file, no additional classpath entries and no arguments.
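For comparison, the same three ingredients (class to invoke, classpath, arguments) are all you need to run a class from plain Java. The sketch below uses URLClassLoader and reflection. It is my assumption of a reasonable way to do it, not a description of how the MPS Executor is implemented, and ExecutorSketch and Echo are names invented for the example.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of mps-bytecode.
public class ExecutorSketch {

    /** Loads mainClass from the given classpath entries and calls its main method. */
    public static void run(List<Path> classpath, String mainClass, String... args) {
        try {
            List<URL> urls = new ArrayList<>();
            for (Path entry : classpath) {
                urls.add(entry.toUri().toURL());
            }
            try (URLClassLoader loader = new URLClassLoader(
                    urls.toArray(new URL[0]), ExecutorSketch.class.getClassLoader())) {
                Class<?> target = Class.forName(mainClass, true, loader);
                Method main = target.getMethod("main", String[].class);
                main.invoke(null, (Object) args); // static method, so no receiver
            }
        } catch (ReflectiveOperationException | IOException e) {
            throw new IllegalStateException("Could not run " + mainClass, e);
        }
    }

    /** Stand-in target so the sketch runs without a class file on disk. */
    public static class Echo {
        public static String last;

        public static void main(String[] args) {
            last = String.join(" ", args);
            System.out.println("Echo received: " + last);
        }
    }

    public static void main(String[] args) {
        if (args.length >= 2) {
            // args[0]: classpath directory, args[1]: class name, rest: program arguments
            run(Collections.singletonList(Paths.get(args[0])), args[1],
                    Arrays.copyOfRange(args, 2, args.length));
        } else {
            run(Collections.<Path>emptyList(), Echo.class.getName(), "no", "arguments");
        }
    }
}
```

To run the HelloWorld class from earlier, pass the directory that contains the me/tomassetti folders as the first argument and me.tomassetti.HelloWorld as the second.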

ex3

 

Modifying and saving a class file

We have seen how to import a class file, but you can also create one from scratch. You can then edit class files as you want and save them simply by generating the project containing them. The editor knows about the semantics of class files, so it helps you write consistent ones. For example, if an instruction requires an index to a Constant Pool String element, this will be verified for you. Writing class files is not trivial, so a little help is welcome.
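To give an idea of what such a check looks like, here is a sketch of one rule: the operand of ldc must point at a constant that can be pushed on the stack (a String, for example), not at a field or method reference. The MPS editor expresses its checks differently. This class and its method are invented for illustration, with tag values taken from the JVM specification.

```java
// Hypothetical helper, not part of mps-bytecode.
public class LdcCheckSketch {

    /**
     * poolTags[i] holds the tag of constant pool element i.
     * Slot 0 is unused because constant pool indexes start at 1.
     */
    public static boolean isValidLdcOperand(int[] poolTags, int index) {
        if (index <= 0 || index >= poolTags.length) {
            return false; // points outside the constant pool
        }
        switch (poolTags[index]) {
            case 3:  // Integer
            case 4:  // Float
            case 7:  // Class
            case 8:  // String
            case 15: // MethodHandle
            case 16: // MethodType
            case 17: // Dynamic
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // First elements of the HelloWorld pool as used in this post:
        // #1 Methodref, #2 Fieldref, #3 String, #4 Methodref
        int[] tags = {0, 10, 9, 8, 10};
        System.out.println("ldc 3 valid? " + isValidLdcOperand(tags, 3));
        System.out.println("ldc 2 valid? " + isValidLdcOperand(tags, 2));
    }
}
```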

What can this be used for?

Aside from having fun and learning more about the JVM, my goal is to define new languages to compile for the JVM, without having to generate Java source code.

Now, in many cases it can be simpler to just generate Java and let the Java compiler handle the bytecode generation for you. However, there is valid “JVM code” which cannot be obtained by compiling Java. This could involve invokedynamic, for example. In addition to that there is a gut-feeling factor: if I generate Java code and compile that, my language does not quite feel like a real language. I know it is not rational, and there are plenty of proper languages which generate C code instead of producing machine code, but this is a sensation I cannot fight for now. So I will play a bit more with these concepts and see what I learn from this.

mps-bytecode could also be used to examine JVM classes produced by compilers for different languages.

What do you think of this? Any other usage you can envision for it?