How and Why to Analyze, Generate and Transform Java Code Using Spoon

This article is paired with a companion repository containing all the code: ftomassetti/spoon-examples

Spoon is a tool to analyze, generate, and transform Java code.

In this article we will see what can be achieved by processing code programmatically. I think these techniques are not well known or widely used, which is a pity, because they can be incredibly useful. Who knows, some of these ideas could be useful for your current projects even if you do not want to use Spoon, or do not even work with Java but with C#, Python, Kotlin, or some other language. Let’s learn how to program in a smarter way.

Spoon has some features overlapping with JavaParser, a framework I contribute to. For some tasks Spoon could be a better choice, while for others JavaParser has a clear advantage. We will dive into the differences among these tools later on.

What can be achieved using code processing techniques?

Spoon, and code processing tools in general, can be used for:

  • Code analysis
    • Computing source code metrics, for example finding out how many classes have more than a certain number of methods
    • Enforcing architectural rules, like requiring all test classes to have a name ending in Test, or allowing database access only from a certain package
    • Implementing static analysis techniques to identify bugs, bad code smells and anti-patterns, similarly to what is done with FindBugs or SonarJava
    • Using it as an annotation processor (which is basically a compiler plugin) to extract information from code
  • Code generation
    • Generate repetitive code programmatically. For example, generate a visitor from a hierarchy of classes (you can read more in our article about code generation)
    • Generate code from some model. For example, generate serialization classes from an XML schema
  • Code transformation
    • Automated refactoring, like transforming a parameter used in several methods into a field initialized in the constructor
    • Instrumenting code, for example for logging or code coverage purposes 
    • Semantic patching, for example migrating a project to use a new version of a library
    • Transpiling to another language, for example from Java to C++ (you can read more in our article about creating transpilers)

These three big families are roughly distinguished by the way we interact with the code:

  • In code analysis code is an input that we use to produce an output that is not code
  • In code generation we use some input that is typically not code, or at least not code in the same language we output. The output is code
  • In code transformation the same codebase is the input and output

Setting Up Spoon

To set up Spoon you need to provide:

  • the code to analyze
  • all the dependencies (and the dependencies of the dependencies, of course)
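
Assuming the standard `spoon-core` Launcher API, providing both could look roughly like this (the paths and the class name `SpoonSetup` are made up for illustration):

```java
import spoon.Launcher;
import spoon.reflect.CtModel;

public class SpoonSetup {
    public static void main(String[] args) {
        Launcher launcher = new Launcher();
        // the code to analyze
        launcher.addInputResource("src/main/java");
        // the dependencies (and their dependencies), as JARs or class directories
        launcher.getEnvironment().setSourceClasspath(
                new String[]{"libs/some-dependency.jar"});
        // build the model of the code
        CtModel model = launcher.buildModel();
        System.out.println("Types in model: " + model.getAllTypes().size());
    }
}
```

Spoon also provides a MavenLauncher that reads the source folders and the classpath directly from a pom.xml.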

With this information Spoon builds a model of your code. On that model you can perform fairly advanced analyses. This is different from how JavaParser works: with JavaParser, if you want, you can build just a lightweight model of your code, without considering dependencies. This can be useful when you do not have the dependencies available or when you need to perform simple and fast operations. You can also do more advanced analysis by enabling symbol resolution, but that is optional and works even when only some of the dependencies are available.

One thing I liked about Spoon is the support for taking the configuration from Maven. This is a very useful feature in my opinion. I would just love to have support for Gradle, however.

In our example we do not use the Maven configuration; we just specify a directory containing our code. In our case we are examining the core module of JavaParser, which has zero dependencies, so we do not need to specify any JARs to build our code model.

Now that we have a model let’s see how we can use it.

By the way, examples are in Kotlin because it is such a concise and nice language that it works quite well for tutorials, in my opinion. Do you agree?

Performing Code Analysis Using Spoon

Let’s start with printing a list of classes with more than 20 methods:
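
The full snippet is in the companion repository, written in Kotlin. As a rough, hedged sketch of the same filter in Java (the input path here is made up for illustration):

```java
import spoon.Launcher;
import spoon.reflect.CtModel;
import spoon.reflect.declaration.CtClass;
import spoon.reflect.visitor.filter.TypeFilter;

public class ManyMethods {
    public static void main(String[] args) {
        Launcher launcher = new Launcher();
        launcher.addInputResource("path/to/javaparser-core/src/main/java");
        CtModel model = launcher.buildModel();
        // query the model for all classes, then keep the ones with > 20 methods
        for (CtClass<?> c : model.getElements(new TypeFilter<>(CtClass.class))) {
            if (c.getMethods().size() > 20) {
                System.out.println(c.getQualifiedName()
                        + ": " + c.getMethods().size() + " methods");
            }
        }
    }
}
```

Querying the model with getElements and a TypeFilter is the standard way to collect all nodes of a given type in Spoon.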

In this example we set up the model in the main function, then in examineClassesWithManyMethods we filter the classes by number of methods and use a couple of utility functions (printTitle, printList) to print the list of those classes.

Running this code we obtain this output:

Let’s try something else now: finding all test classes and ensuring their names end with “Test”. A test class is a class with at least one method annotated with org.junit.Test.

Building the model is almost the same as before; we just added more source directories and JARs, since the testing module has a dependency on JUnit.

In verifyTestClassesHaveProperName we:

  • filter all classes which are test classes (i.e., they have at least one method annotated with org.junit.Test)
  • find all test classes whose names end with Test, and all the ones which do not
  • print the list of the classes to be fixed and some statistics about them
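
The repository version does this filtering on Spoon's CtClass elements by inspecting their annotations. The core partition step of the check, shown here on plain class names so it runs standalone, is just:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TestNameCheck {
    // Split test class names into well-named ones (ending in "Test") and the rest.
    public static Map<Boolean, List<String>> partitionByProperName(List<String> testClassNames) {
        return testClassNames.stream()
                .collect(Collectors.partitioningBy(name -> name.endsWith("Test")));
    }

    public static void main(String[] args) {
        Map<Boolean, List<String>> result = partitionByProperName(
                Arrays.asList("ParserTest", "LexicalChecks", "SymbolTest"));
        System.out.println("To fix: " + result.get(false)); // [LexicalChecks]
    }
}
```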

Let’s run this code and we get this result:

Of course these were rather simple examples, but hopefully they are enough to show the potential of Spoon and code analysis. It is reasonably easy to process the model representing your code, extract interesting information, and verify that certain semantic rules are respected.

For more advanced usages you can also take a look at this article about Using Spoon for Architecture Enforcement.

Performing Code Generation Using Spoon

Let’s see an example of code generation for a common task: serialization and unserialization of data to and from JSON. We will start from a JSON schema and generate classes to represent the entities described by that schema.

This is a rather advanced example and it took me a while to familiarize myself with Spoon enough to be able to write it. I also had to ask the team a few questions to solve a couple of issues. It is true that this code is far from trivial to write; however, this is a significantly complex feature, so that sounds fair to me.

Ok, now let’s jump into the code.

This is a JSON schema:
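
A schema along these lines (adapted from the well-known arrays example on json-schema.org, which matches the entities described below):

```json
{
  "$schema": "http://json-schema.org/draft-06/schema#",
  "definitions": {
    "veggie": {
      "type": "object",
      "required": [ "veggieName", "veggieLike" ],
      "properties": {
        "veggieName": { "type": "string", "description": "The name of the vegetable." },
        "veggieLike": { "type": "boolean", "description": "Do I like this vegetable?" }
      }
    }
  },
  "type": "object",
  "properties": {
    "fruits": { "type": "array", "items": { "type": "string" } },
    "vegetables": { "type": "array", "items": { "$ref": "#/definitions/veggie" } }
  }
}
```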

At the top level we can see the entity represented by the whole schema. We know it will be represented as an object and have two properties:

  • fruits: an array of strings
  • vegetables: an array of veggies, where a veggie is another object described below, in the definitions section

In the definitions section we can see that a veggie is an object with two properties:

  • veggieName: a string
  • veggieLike: a boolean

What We Should Get

What we want to get is two Java classes: one to represent the whole schema and one to represent a single veggie. These two classes should permit reading and writing the single fields, serializing an instance to JSON, and unserializing an instance from JSON.

Our code should generate two classes:


This is an example of how we could use these two classes:

In the example we build an instance of FruitThing and a couple of Veggies. We then serialize them and unserialize them back, so that we can prove that both serialization and unserialization work.
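
The actual generated classes use a JSONObject-based API. To keep this sketch dependency-free and runnable on its own, here is a simplified version of the two classes with a hand-rolled serialize method, together with that usage (unserialization omitted):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class GeneratedClassesSketch {

    // Simplified sketch of the generated Veggie class.
    static class Veggie {
        private final String veggieName;
        private final boolean veggieLike;

        Veggie(String veggieName, boolean veggieLike) {
            this.veggieName = veggieName;
            this.veggieLike = veggieLike;
        }

        String serialize() {
            return "{\"veggieName\":\"" + veggieName + "\",\"veggieLike\":" + veggieLike + "}";
        }
    }

    // Simplified sketch of the generated class for the whole schema.
    static class FruitThing {
        private final List<String> fruits = new ArrayList<>();
        private final List<Veggie> vegetables = new ArrayList<>();

        void addFruit(String fruit) { fruits.add(fruit); }
        void addVegetable(Veggie veggie) { vegetables.add(veggie); }

        String serialize() {
            String fruitsJson = fruits.stream()
                    .map(f -> "\"" + f + "\"")
                    .collect(Collectors.joining(","));
            String veggiesJson = vegetables.stream()
                    .map(Veggie::serialize)
                    .collect(Collectors.joining(","));
            return "{\"fruits\":[" + fruitsJson + "],\"vegetables\":[" + veggiesJson + "]}";
        }
    }

    public static void main(String[] args) {
        FruitThing ft = new FruitThing();
        ft.addFruit("apple");
        ft.addVegetable(new Veggie("carrot", true));
        // {"fruits":["apple"],"vegetables":[{"veggieName":"carrot","veggieLike":true}]}
        System.out.println(ft.serialize());
    }
}
```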

The Generation Process: General Organization

The generation process will produce a set of instances of GeneratedJavaFile, each with its own filename and code. We could later write them to files or compile them in memory.

In the main function of our program we will read the JSON schema and pass it to the function generateJsonSchema. We will pass it along with two parameters: first the name of the package in which to generate our classes, then the name of the class representing the whole schema.

Once we get the generated classes we just print them to the screen to take a quick look.

Ok, so the magic is happening in generateJsonSchema, right?

In generateJsonSchema we parse the InputStream providing the schema and we call generateClasses, which will return us a bunch of CompilationUnits. Basically, every CompilationUnit is the Abstract Syntax Tree of a single Java file.

Once we get those compilation units we print them as Java code. We also calculate the appropriate filename and instantiate instances of GeneratedJavaFile.

So, it seems we have now to take a look at generateClasses.

In generateClasses we first create the package (the CtPackageImpl class), which we will use for all generated classes. We keep it in the ClassProvider class, which is used to generate and track the classes we produce. Then we call an extension method we added to schema, called generateClassRecursively.

Finally we will get the classes out of classProvider and put them in CompilationUnits.

What happens in generateClassRecursively? Basically we look for schemas defining objects and for each of them we generate a class. We also crawl the schema looking into properties, to see if they indirectly define or use other object schemas for which we may want to generate classes.

A single class is generated by the extension method generateClass for ObjectSchema. When it produces a class we pass it to the classProvider so that it is recorded.

So far we have set up the logic to crawl the schema and decide what to generate, but we have not seen much of the Spoon-specific API. This changes in generateClass.

Here we start by instantiating CtClassImpl, then we:

  • set the proper package (obtained from the classProvider)
  • set the class as public
  • specify the name of the class: we could have received it as a parameter (in the case of the class representing the whole schema), otherwise we derive it from the schema itself
  • look at the single properties and handle them in addProperty
  • call addSerializeMethod to add a serialization method that we will use to generate JSON from an instance of this class

So, what do we do to add a property?

We simply add a field (CtField). We set the right name, type, and visibility and add it to the class. For the moment we do not generate getters or setters.

The Generation Process: Serialization

In this section we will see how we generate the serialize method of our classes. For our two classes they look like this:

This is the entry point for the generation of such method:

We instantiate CtMethodImpl and then:

  • we set the visibility of the method
  • we set the return type to JSONObject
  • we set the name to serialize
  • we create the res variable of type JSONObject
  • for each property we will generate serialization statements that will add the value of the property into res
  • finally we add a return statement and set this block as the body of the method

Here we have used a bunch of utility methods to simplify our code because the Spoon API is quite verbose.

For example, we are using createLocalVar and objectInstance, which look like this:

Now we can take a look at how we generate the statements of the serialization method for the specific properties.

Basically we delegate to SerializationUtils.serialize. That method will be included in the runtime library to be used with our generated code.

This is how it looks:

The way we serialize a certain property depends on its type. It is easy for simple values (strings and booleans) while it gets trickier for arrays. For anything that is JsonSerializable we call the corresponding serialize method. Why do we want to do this? So that we can reuse the serialize methods we generate for our classes (FruitThing and Veggie).
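
The runtime helper in the repository works on org.json types; here is a dependency-free sketch of the same type-dispatch idea, runnable on its own (the class name SerializationSketch is illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SerializationSketch {

    // Anything that knows how to turn itself into JSON, like our generated classes.
    interface JsonSerializable {
        String serialize();
    }

    // Dispatch on the runtime type of the value.
    public static String serialize(Object value) {
        if (value == null) {
            return "null";
        } else if (value instanceof String) {
            return "\"" + value + "\"";
        } else if (value instanceof Boolean || value instanceof Number) {
            return value.toString();
        } else if (value instanceof List) {
            // arrays: serialize each element recursively
            return ((List<?>) value).stream()
                    .map(SerializationSketch::serialize)
                    .collect(Collectors.joining(",", "[", "]"));
        } else if (value instanceof JsonSerializable) {
            // generated classes serialize themselves
            return ((JsonSerializable) value).serialize();
        }
        throw new UnsupportedOperationException("Cannot serialize " + value.getClass());
    }

    public static void main(String[] args) {
        System.out.println(serialize(Arrays.asList("a", true))); // ["a",true]
    }
}
```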

The Generation Process: Unserialization

Let’s see the unserialize methods we should be able to generate:

Which piece of code is responsible for generating such methods? Unsurprisingly, it is called addUnserializeMethod:

The structure is very similar to what we have seen before. Of course here what is relevant is the call to addUnserializeStmts.

Now, here things get complicated. We basically have to call the setter for each property. To the setter we pass the result of unserialize, with the appropriate cast to match the type of the property. To call unserialize we need a TypeToken, which guides the unserialization process: we may want to unserialize the same value differently depending on whether we aim to obtain, say, an integer or a string, and the type token tells us what we are aiming for.
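
A minimal, runnable sketch of the type-token idea, using a hypothetical TypeToken enum rather than the repository's actual implementation:

```java
public class TypeTokenSketch {

    // A token describing the type we aim to obtain, guiding unserialization.
    enum TypeToken { STRING, BOOLEAN, INTEGER }

    // Convert a raw JSON value to the type named by the token.
    public static Object unserialize(Object rawValue, TypeToken token) {
        switch (token) {
            case STRING:  return String.valueOf(rawValue);
            case BOOLEAN: return Boolean.parseBoolean(String.valueOf(rawValue));
            case INTEGER: return Integer.parseInt(String.valueOf(rawValue));
            default: throw new IllegalArgumentException("Unknown token " + token);
        }
    }

    public static void main(String[] args) {
        // the same raw value produces different results depending on the token
        System.out.println(unserialize("42", TypeToken.STRING));  // the String "42"
        System.out.println(unserialize("42", TypeToken.INTEGER)); // the Integer 42
    }
}
```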

The Generation Process: Comments

To build this example we had to write a lot of utility methods. There are some parts of the whole example we did not show here in the article; however, you can find all of that code in the companion repository.

Note also that we could save the code to files and use the compiler API to compile it programmatically. We could even compile it in memory if we wanted. In a real case I would suggest doing this instead of copy-pasting code manually into a file, as I did while working on this tutorial.
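
Compiling generated sources programmatically needs nothing beyond the JDK. A small sketch using javax.tools (the generated class here is a made-up one-liner):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompileGenerated {
    public static void main(String[] args) throws Exception {
        // write a (hypothetical) generated class to a temporary directory
        Path dir = Files.createTempDirectory("generated");
        Path source = dir.resolve("Veggie.java");
        Files.writeString(source, "public class Veggie { private String veggieName; }");

        // invoke the JDK compiler programmatically; an exit code of 0 means success
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int result = compiler.run(null, null, null, source.toString());
        System.out.println(result == 0 ? "compiled" : "failed");
    }
}
```

Fully in-memory compilation is also possible by supplying a custom JavaFileManager instead of writing to disk.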

Performing Code Transformation Using Spoon

Code transformations can be very useful when working with large codebases or to prevent human errors on repetitive tasks.

For example, imagine you decided to change how a specific pattern has to be implemented. Suppose you are using the singleton pattern tens of times in your codebase and you want to ensure that the instance is always created lazily (i.e., only when it is demanded for the first time). You could perform this transformation automatically.

Or suppose that you are updating a library you use and a certain method you were relying on was renamed, or the order of its parameters changed. Again, you could solve this with a code transformation.

For our example we will take something simple: refactoring a single class. In this class we have several methods receiving, among others, a specific parameter. Given that this parameter is required for basically every operation, we decided to move it to the constructor and save it as an instance field. We then want to transform all methods that were taking that parameter, so that they no longer require it and instead access the corresponding field.

Let’s see what the transformation looks like:

In this example we are transforming just the class defining the methods; in a real case we may want to transform the invocations of those methods as well.

How did we implement this code transformation

Let’s start by taking a look at the main method of our code transformation example, so that we can see the general structure:

As you can see we:

  • parse the code
  • apply the refactoring, defined in our class ParamToFieldRefactoring
  • print the resulting code

The interesting bits are of course in ParamToFieldRefactoring.

First of all we add the new field to the class:

Then we add a parameter to all constructors, so that we can receive the value and assign it to the field:

Note that in a real application we may also want to consider the case in which the class used to have just the default constructor, and add a brand new constructor taking the single value to be assigned to a field. To keep things simple we ignored that in our example.

Finally, we want to modify all methods: if a method takes a parameter with the given name, we remove that parameter. We also look for all references to that parameter and replace them with references to the new field:
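
A hedged sketch of that last step against Spoon's reflection API (the class name ParamToFieldSketch is illustrative; the full refactoring in the repository also adds the field and the constructor parameter, as described above):

```java
import spoon.reflect.code.CtVariableRead;
import spoon.reflect.declaration.CtClass;
import spoon.reflect.declaration.CtMethod;
import spoon.reflect.declaration.CtParameter;
import spoon.reflect.visitor.filter.TypeFilter;

public class ParamToFieldSketch {

    // Remove the parameter named `paramName` from every method of `ctClass`
    // and redirect reads of it to the field with the same name.
    static void refactorMethods(CtClass<?> ctClass, String paramName) {
        for (CtMethod<?> method : ctClass.getMethods()) {
            CtParameter<?> target = method.getParameters().stream()
                    .filter(p -> p.getSimpleName().equals(paramName))
                    .findFirst().orElse(null);
            if (target == null) {
                continue;
            }
            // replace each read of the parameter with a read of the field
            method.getElements(new TypeFilter<>(CtVariableRead.class)).stream()
                    .filter(read -> read.getVariable().getSimpleName().equals(paramName))
                    .forEach(read -> read.setVariable(
                            ctClass.getField(paramName).getReference()));
            method.removeParameter(target);
        }
    }
}
```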

And that is it! We should now just print the code and we are done.

How do we print the code? Through a little extension method named toCode:

More on code transformation

If you want to read more about code transformations with Spoon it could be interesting to take a look at:

  • CocoSpoon, a tool to instrument Java code for calculating code coverage
  • Trebuchet, a Proof-of-Concept to show how Java code can be translated to C++ using Spoon.

How this post was born

Spoon is a tool to process Java code. In a way it can be seen as a competitor of JavaParser. I had been wanting to investigate it for a long time, but I have a huge pile of things I would like to look into and Spoon never made it to the top of the list. Then some users of JavaParser pointed us to a discussion on the Spoon project about the differences between JavaParser and Spoon. In my opinion there were some misconceptions and the Spoon contributors were selling JavaParser a bit short… after all, thousands of developers and reputable companies are using JavaParser and are quite happy with it. Also, JavaParser is probably the most well-known parser for Java out there. So I started a discussion with the contributors of Spoon and this led to the idea of writing this post.

While this post was written with the help of Spoon’s contributors I am the author of this post, and I am a contributor to JavaParser so this is my “bias alert”!

Comparing Spoon and JavaParser

Spoon is the academic-born alternative to JavaParser. While JavaParser implements symbol resolution itself (which is the hardest part), Spoon acts as a wrapper around the Eclipse Java Compiler and builds some high-level APIs on top of it. So, what are the consequences of this choice?

  • The Eclipse Java Compiler is mature, and while it is not bug-free, it is reasonably solid
  • The Eclipse Java Compiler is a large beast which comes with its own dependencies and complex configuration
  • The Eclipse Java Compiler is… a compiler; it is not a library for symbol resolution, so it is less flexible than the home-grown solution we have at JavaParser

Personally I am very biased, being a contributor to JavaParser. I am used to JavaParser and certain behaviors of Spoon seemed unnatural to me. For example, type casts on snippet expressions did not seem to work, and class access (e.g., “String.class”) is not represented by a specific expression but as a field access. However, some features are really useful and we should get them in JavaParser too.

All in all they are different tools, with different sets of features and I think also different philosophies, as we discuss below.

Regarding documentation, it seems to be a bit better for JavaParser: we have a book, available for free and downloaded thousands of times, and we have a set of tutorials.

Different philosophies

Now, Spoon was created in an academic environment, and in France. In my experience French engineers are very talented but they tend to re-invent things in a “Frenchy way”. Take for example the license adopted for the project: is it the Apache License? GPL? LGPL? The Eclipse license? No, it is the CeCILL-C FREE SOFTWARE LICENSE AGREEMENT, a license I had never heard of, created specifically to comply with some French regulations. Now, this could be the greatest license ever written, but a company wanting to adopt the project would need to look into it, figure out what it means, what its implications are, and whether it is compatible with the other licenses they are using. In my opinion things could be much, much simpler if they had just picked an existing license, because out there in the real world companies do not want to have to study a new license just to use Spoon. This is very different from the approach we have in JavaParser, where we are very pragmatic: we discussed with companies and figured out which licenses they needed, then we worked hard to offer a double license (Apache License or LGPL) to our users. Why? Because those were options they were familiar with.

In general I had this feeling of different philosophies while talking with the guys from Spoon. They clearly perceive their product as much better and frankly seem a bit disappointed that JavaParser is so much more popular. We discussed the possibility of some collaboration, but it seemed to me that they were starting from the perspective of “we are right”. In JavaParser we do not assume we are right: we simply listen to users, discuss among ourselves, and then try to move a bit forward, making the lives of our users easier. A big strength is that we receive a lot of feedback, so users help us correct the direction when we are wrong.

Regarding dependencies, at JavaParser we have so far strived to keep the core module free of any dependency. We may relax this constraint in the future, but in general we consider dependency management an important aspect. With Spoon, instead, you need to add a Maven repository to use a library that is not even on Maven Central or any of the well-known Maven repositories. Why? Why make life for users that little bit harder?


Conclusions

I think that code processing is pretty powerful: it permits us to use our skills as developers to automate part of our work, reducing the workload and the errors. It is a nice tool to have in your toolbox if you work with large codebases. At the very least, I think more developers should be aware of the possibilities it offers.

When performing code processing on Java code, Spoon is a valid solution. I invite you to familiarize yourself with it and consider using it: I think you would be doing yourself a favor.

Update: following this article, the guys at Spoon started looking into changing the license and fixing the issue with the dependency being hosted outside Maven Central.
