Getting started with JavaParser: analyzing Java Code programmatically

One of the things I like most is parsing code and performing automatic operations on it. For this reason I started contributing to JavaParser and created a couple of related projects: java-symbol-solver and effectivejava.

As a contributor to JavaParser I keep reading very similar questions about extracting information from Java source code. For this reason I thought I could help by providing some simple examples, just to get started with parsing Java code.

All the source code is available on GitHub: analyze-java-code-examples

Common code

When using JavaParser there is a bunch of operations we typically want to do every time. Often we want to operate on a whole project, so given a directory we explore all the Java files in it. This class should help with that:
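A minimal sketch of such a directory walker, using only the JDK. The names DirExplorer, Filter and FileHandler are illustrative choices for this sketch, not part of JavaParser: the filter decides which files are interesting (for example, those ending in .java) and the handler does the actual work, such as parsing them.

```java
import java.io.File;

/** Recursively walks a directory tree (illustrative helper, not a JavaParser class). */
public class DirExplorer {

    /** Decides whether a file is interesting, e.g. because its name ends with ".java". */
    public interface Filter {
        boolean interested(int level, String path, File file);
    }

    /** Does something with an interesting file, e.g. parses it. */
    public interface FileHandler {
        void handle(int level, String path, File file);
    }

    private final Filter filter;
    private final FileHandler fileHandler;

    public DirExplorer(Filter filter, FileHandler fileHandler) {
        this.filter = filter;
        this.fileHandler = fileHandler;
    }

    /** Starts the walk; paths passed to the callbacks are relative to root, like "/sub/A.java". */
    public void explore(File root) {
        explore(0, "", root);
    }

    private void explore(int level, String path, File file) {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) {
                return; // unreadable directory: skip it
            }
            for (File child : children) {
                explore(level + 1, path + "/" + child.getName(), child);
            }
        } else if (filter.interested(level, path, file)) {
            fileHandler.handle(level, path, file);
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        // Print the relative path of every .java file under root
        new DirExplorer(
                (level, path, file) -> path.endsWith(".java"),
                (level, path, file) -> System.out.println(path)
        ).explore(root);
    }
}
```

Keeping the filter and the handler separate means the same walker can be reused for every example: only the handler changes.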

For each Java file we first want to build an Abstract Syntax Tree (AST) and then navigate it. There are two main strategies to do so:

  1. use a visitor: this is the right strategy when you want to operate on specific types of AST nodes
  2. use a recursive iterator: this lets you process all sorts of nodes
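Both strategies start from the same step: building the AST for a file. The sketch below assumes a JavaParser 3.x release (the javaparser-core artifact) on the classpath, where StaticJavaParser.parse(File) returns a CompilationUnit, the root node of the AST. ParseOneFile and parse are hypothetical names for this sketch.

```java
import com.github.javaparser.ParseProblemException;
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Optional;

public class ParseOneFile {

    /** Builds the AST for one file; returns empty if the file cannot be read or parsed. */
    public static Optional<CompilationUnit> parse(File file) {
        try {
            return Optional.of(StaticJavaParser.parse(file));
        } catch (FileNotFoundException | ParseProblemException e) {
            System.err.println("Skipping " + file + ": " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // getTypes() holds the top-level type declarations of the file
        parse(new File(args[0])).ifPresent(cu ->
                cu.getTypes().forEach(t -> System.out.println(t.getNameAsString())));
    }
}
```

Returning an Optional instead of failing lets a whole-project run skip the occasional file that does not parse.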

Visitors can be written by extending classes included in JavaParser, while this is a simple node iterator:
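A minimal sketch of such an iterator, assuming JavaParser 3.x, where Node.getChildNodes() returns the direct children of a node. The handler returns true to descend into the children of the current node and false to skip them; that return value is what makes it possible to stop at a given depth. NodeIterator and NodeHandler are names chosen for this sketch.

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.Node;

/** Depth-first walk over any AST node (illustrative helper, not a JavaParser class). */
public class NodeIterator {

    public interface NodeHandler {
        /** Returns true to visit the children of node, false to skip them. */
        boolean handle(Node node);
    }

    private final NodeHandler nodeHandler;

    public NodeIterator(NodeHandler nodeHandler) {
        this.nodeHandler = nodeHandler;
    }

    public void explore(Node node) {
        if (nodeHandler.handle(node)) {
            for (Node child : node.getChildNodes()) {
                explore(child);
            }
        }
    }

    public static void main(String[] args) {
        Node cu = StaticJavaParser.parse("class A { int x; void m() { x++; } }");
        // Print the class of every node, always descending into the whole tree
        new NodeIterator(node -> {
            System.out.println(node.getClass().getSimpleName());
            return true;
        }).explore(cu);
    }
}
```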

Now let’s see how to use this code to solve some questions found on Stack Overflow.

How to extract the name of all classes in a normal String from java class?

Asked on Stack Overflow

This problem can be solved by looking for ClassOrInterfaceDeclaration nodes. Since we want a specific kind of node, we can use a visitor. Note that VoidVisitorAdapter lets us pass an arbitrary argument. In this case we do not need it, so we specify the type Object and simply ignore it in our visit method.
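A possible visitor for this question, sketched under the assumption of JavaParser 3.x. ListClasses and classNames are hypothetical names, and the sketch collects the names into a list instead of printing them so the result is easy to check. Calling super.visit keeps the traversal going, so nested and inner classes are found too. Interfaces are also ClassOrInterfaceDeclaration nodes and are included; enums and annotation types use different node classes and are not.

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.ClassOrInterfaceDeclaration;
import com.github.javaparser.ast.visitor.VoidVisitorAdapter;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ListClasses {

    /** Returns the names of all classes and interfaces in the AST, in visiting order. */
    public static List<String> classNames(CompilationUnit cu) {
        List<String> names = new ArrayList<>();
        // The Object argument of the visitor is not needed, so it is ignored
        cu.accept(new VoidVisitorAdapter<Object>() {
            @Override
            public void visit(ClassOrInterfaceDeclaration n, Object arg) {
                names.add(n.getNameAsString());
                super.visit(n, arg); // descend into members so nested classes are found too
            }
        }, null);
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Parse the file given on the command line and print one class name per line
        CompilationUnit cu = StaticJavaParser.parse(new File(args[0]));
        classNames(cu).forEach(System.out::println);
    }
}
```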

We ran the example on the source code of JUnit and got this output:

Is there any parser for Java code that could return the line numbers that compose a statement?

Asked on Stack Overflow

In this case I need to find all sorts of statements. There are several classes extending the Statement base class, so I could use a visitor, but I would need to write the same code in several visit methods, one for each subclass of Statement. In addition, I only want the top-level statements, not the statements nested inside them: for example, a for statement can contain several other statements. With our custom NodeIterator we can easily implement this logic.
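One way to express that logic, sketched under the assumption of JavaParser 3.x. The recursion below is the same walk a NodeIterator performs, inlined so the example compiles on its own; StatementLines, collect and lineRanges are hypothetical names. One detail matters: a method body is itself a BlockStmt, which is a subclass of Statement, so stopping at the first Statement found would report each whole method body as a single statement. This sketch treats BlockStmt as a container and looks inside it instead.

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.Node;
import com.github.javaparser.ast.stmt.BlockStmt;
import com.github.javaparser.ast.stmt.Statement;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class StatementLines {

    /** Collects top-level statements: the walk stops at the first non-block Statement on each path. */
    static void collect(Node node, List<Statement> out) {
        if (node instanceof Statement && !(node instanceof BlockStmt)) {
            out.add((Statement) node);
            return; // do not descend: nested statements are part of this one
        }
        for (Node child : node.getChildNodes()) {
            collect(child, out);
        }
    }

    /** Returns "begin-end" line ranges of the top-level statements in the source. */
    public static List<String> lineRanges(String source) {
        List<Statement> statements = new ArrayList<>();
        collect(StaticJavaParser.parse(source), statements);
        List<String> result = new ArrayList<>();
        for (Statement s : statements) {
            s.getRange().ifPresent(r -> result.add(r.begin.line + "-" + r.end.line));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<Statement> statements = new ArrayList<>();
        collect(StaticJavaParser.parse(new File(args[0])), statements);
        for (Statement s : statements) {
            // toString() prints a pretty-printed version of the statement
            s.getRange().ifPresent(r ->
                    System.out.println("[Lines " + r.begin.line + " - " + r.end.line + "] " + s));
        }
    }
}
```

Listing every line a statement spans, rather than only the first and the last, is then a loop from r.begin.line to r.end.line.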

And this is a portion of the output obtained by running the program on the source code of JUnit.

You may notice that the reported statement spans 5 lines, not 6 as the range suggests (12..17 covers 6 lines). This is because we print a cleaned-up version of the statement: blank lines and comments are removed and the code is reformatted.

Extract methods calls from Java code

Asked on Stack Overflow

To extract method calls we can again use a visitor, so this is pretty straightforward and fairly similar to the first example we have seen.
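A sketch of such a visitor, assuming JavaParser 3.x, where each method invocation is a MethodCallExpr node. It records the line where each call begins together with the method name; ListMethodCalls and methodCalls are hypothetical names. Calling super.visit matters here as well, because calls can be nested in the arguments or the scope of another call, as in System.out.println(String.valueOf(1)).

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.expr.MethodCallExpr;
import com.github.javaparser.ast.visitor.VoidVisitorAdapter;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ListMethodCalls {

    /** Returns "line:name" for every method call in the AST, in visiting order. */
    public static List<String> methodCalls(CompilationUnit cu) {
        List<String> calls = new ArrayList<>();
        cu.accept(new VoidVisitorAdapter<Object>() {
            @Override
            public void visit(MethodCallExpr n, Object arg) {
                int line = n.getBegin().map(p -> p.line).orElse(-1); // -1 if no position is known
                calls.add(line + ":" + n.getNameAsString());
                super.visit(n, arg); // also visit calls nested in arguments or scope
            }
        }, null);
        return calls;
    }

    public static void main(String[] args) throws Exception {
        CompilationUnit cu = StaticJavaParser.parse(new File(args[0]));
        methodCalls(cu).forEach(System.out::println);
    }
}
```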

As you can see the solution is very similar to the one for listing classes.

Next steps

You can answer a lot of questions with the approaches presented here: you navigate the AST, find the nodes you are interested in and get whatever information you are looking for. There are, however, a couple of other things we should look at. First of all, how to transform the code: while extracting information is great, refactoring is even more useful. Then, for more advanced questions, we need to resolve symbols using java-symbol-solver. For example:

  • looking at the AST we can find the name of a class, but not the list of interfaces it implements indirectly
  • when looking at a method invocation we cannot easily find the declaration of that method. In which class or interface was it declared? Which of the overloaded variants are we invoking?
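To make the first point concrete: the AST only records the supertypes written in the declaration itself. The sketch below (assuming JavaParser 3.x; DirectSupertypes is a hypothetical name) lists them for a class. For a class declared as extends ArrayList<String> implements Runnable it reports ArrayList and Runnable, but not List, Collection or Iterable, which come from the declaration of ArrayList; finding those requires resolving ArrayList, which is the job of a symbol solver such as java-symbol-solver.

```java
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.ClassOrInterfaceDeclaration;
import com.github.javaparser.ast.type.ClassOrInterfaceType;

import java.util.ArrayList;
import java.util.List;

public class DirectSupertypes {

    /** Returns the simple names of the types listed in the extends and implements clauses. */
    public static List<String> directSupertypes(String source, String className) {
        CompilationUnit cu = StaticJavaParser.parse(source);
        ClassOrInterfaceDeclaration decl = cu.getClassByName(className)
                .orElseThrow(() -> new IllegalArgumentException("No class " + className));
        List<String> result = new ArrayList<>();
        for (ClassOrInterfaceType t : decl.getExtendedTypes()) {
            result.add(t.getNameAsString()); // type arguments such as <String> are not included
        }
        for (ClassOrInterfaceType t : decl.getImplementedTypes()) {
            result.add(t.getNameAsString());
        }
        return result;
    }

    public static void main(String[] args) {
        String src = "class MyList extends java.util.ArrayList<String> implements Runnable { public void run() {} }";
        // Only what is written in the declaration is available without symbol resolution
        System.out.println(directSupertypes(src, "MyList"));
    }
}
```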

We will look into that in the future. Hopefully these examples will help you get started!

5 Comments

  1. Pingback: Getting started with JavaParser: analyzing Java Code programmatically | Ranfind Web Programming

  2. Hi Federico,

    First of all, many thanks for this post. It’s indeed very useful.

    My name is Jose, and I’m the author of the post “Is there any parser for Java code that could return the line numbers that compose a statement?”. Although useful, the code that you pasted to answer that question does not actually answer it… in fact, it just returns the begin/end line number of each method (please correct me if I’m wrong).

    Let’s assume I have this class ( http://pastebin.com/bW6eprWB ), which has several statements spread over several lines. My idea was to build a parser that returns (for the above example) something like
    statement [line numbers that compose a statement]

    statement 1 [1, 2]
    statement 11 [11, 12, 13, 14, 15]
    statement 19 [19, 20]
    statement 23 [23, 24, 25]
    statement 30 [30, 31]
    statement 32 [32, 33]
    statement 41 [41, 42]

    do you have any idea on how I should update your snippet to handle my Example class?


    Thanks

  3. Hi Jose, thank you for your comment. Yes, I wanted to solve a slightly more general problem. Doing what you want should be very easy starting from the example I presented. I print the start and end lines, while you want the list of all the lines the statement spans, right? You can do that with a simple for loop. Am I missing something?

  4. Hi Federico,

    First of all, thanks a lot for this post, it has been very useful to me.
    My name is Arpeet, I am a graduate student at San Jose State University. I am working on a research project where I am using JavaParser. I had a few doubts about parsing specific types of members and variables from source code. Is there any way I can specifically parse or classify standard Java functions and the functions used in my source code? Also, is there any way I can separately parse global variables and local variables from the source code?

    Best Regards,
    Arpeet

  5. JavaParser produces an AST. The AST contains all the information that is directly present in the code, but it does not do the elaborations necessary to resolve symbols. Now, if you see a call to a method X, you need to check whether the class has a method X, whether one of its ancestors has a method named X, whether it statically imports a method named X, and so on. Only when you do that can you find which method X is being called, and so understand whether it comes from the Java standard library or not. The same applies to resolving the types of members and variables. This is rather involved, but you may want to look at java-symbol-solver, a project I wrote to do exactly that. It is on GitHub: https://github.com/ftomassetti/java-symbol-solver
