Getting started with JavaParser: analyzing Java Code programmatically

One of the things I like the most is to parse code and to perform automatic operations on it. For this reason I started contributing to JavaParser and created a couple of related projects: java-symbol-solver and effectivejava.

As a contributor to JavaParser I keep reading very similar questions about extracting information from Java source code. For this reason I thought I could help by providing some simple examples, just to get started with parsing Java code.

All the source code is available on GitHub: analyze-java-code-examples


Common code

When using JavaParser there are a few operations we typically want to perform every time. Often we want to operate on a whole project, so given a directory we explore all the Java files it contains. The supporting class DirExplorer, available in the repository, helps with this.
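A minimal sketch of such a directory walk, using only the JDK. JavaFileWalker and its method names are hypothetical, chosen for illustration; this is not necessarily how the helper in the repository is written.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: finds every .java file below a root directory. */
class JavaFileWalker {

    /** Returns all regular files under root whose name ends with ".java", sorted by path. */
    static List<Path> findJavaFiles(Path root) throws IOException {
        // Files.walk visits the directory tree recursively; the stream must be closed.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Calls the handler once for each Java file found under root. */
    static void explore(Path root, Consumer<Path> handler) throws IOException {
        for (Path file : findJavaFiles(root)) {
            handler.accept(file);
        }
    }
}
```

The handler is where each file would be handed to the parser, for example JavaFileWalker.explore(projectDir, file -> System.out.println(file)).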

For each Java file we first want to build an Abstract Syntax Tree (AST) and then navigate it. There are two main strategies to do so:

  1. use a visitor: this is the right strategy when you want to operate on specific types of AST nodes
  2. use a recursive iterator: this lets you process all sorts of nodes

Visitors can be written by extending classes included in JavaParser, while the node iterator is a small custom class, NodeIterator.
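The recursive strategy can be sketched independently of JavaParser. In the version below TreeWalker is a hypothetical generic class: it takes a function that returns the children of a node and a handler that returns true when the walk should descend into that node's children. With JavaParser one would presumably pass the Node method that lists child nodes (getChildNodes in the 3.x releases), but that wiring is an assumption and is not shown here.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical generic walker: pre-order traversal where the handler decides whether to descend. */
class TreeWalker<T> {
    private final Function<T, List<T>> childrenOf;
    private final Predicate<T> handler;

    TreeWalker(Function<T, List<T>> childrenOf, Predicate<T> handler) {
        this.childrenOf = childrenOf;
        this.handler = handler;
    }

    /** Handles the node first; visits its children only if the handler returned true. */
    void explore(T node) {
        if (handler.test(node)) {
            for (T child : childrenOf.apply(node)) {
                explore(child);
            }
        }
    }
}
```

Returning false from the handler is what makes it possible to stop at a node and skip everything nested inside it.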

Now let’s see how to use this code to solve some questions found on Stack Overflow.

How to extract the name of all classes in a normal String from java class?

Asked on Stack Overflow

This problem can be solved by looking for the ClassOrInterfaceDeclaration nodes. Since we want a specific kind of node we can use a Visitor. Note that VoidVisitorAdapter lets you pass an arbitrary argument. In this case we do not need it, so we specify the type Object and simply ignore it in our visit method.

We ran the example on the source code of JUnit and got this output:


Is there any parser for Java code that could return the line numbers that compose a statement?

Asked on Stack Overflow

In this case I need to find all sorts of statements. There are several classes extending the Statement base class, so I could use a visitor, but I would need to write the same code in several visit methods, one for each subclass of Statement. In addition I want to get only the top-level statements, not the statements nested inside them. For example, a for statement could contain several other statements. With our custom NodeIterator we can easily implement this logic.

And this is a portion of the output obtained running the program on the source code of JUnit.

You may notice that the printed statement spans 5 lines, not the 6 reported (lines 12..17 make 6 lines). This is because we print a cleaned-up version of the statement, with blank lines and comments removed and the code reformatted.

Extract methods calls from Java code

Asked on Stack Overflow

To extract method calls we can again use a Visitor, so this is pretty straightforward and fairly similar to the first example we have seen.

As you can see the solution is very similar to the one for listing classes.

Next steps

You can answer a lot of questions with the approaches presented here: you navigate the AST, find the nodes you are interested in and get whatever information you are looking for. There are however a couple of other things we should look at. First of all, how to transform the code: while extracting information is great, refactoring is even more useful. Then, for more advanced questions we need to resolve symbols using java-symbol-solver. For example:

  • looking at the AST we can find the name of a class, but not the list of interfaces it implements indirectly
  • when looking at a method invocation we cannot easily find the declaration of that method. In which class or interface was it declared? Which of the different overloaded variants are we invoking?

We will look into that in the future. Hopefully these examples will help you get started!

13 replies
  1. Jose says:

    Hi Federico,

    first of all, many thanks for this post. it’s indeed very useful.

    my name is Jose, and I’m the author of the post “Is there any parser for Java code that could return the line numbers that compose a statement?”. although useful, the code that you pasted to answer that question does not actually answer it… in fact, it just returns the begin/end line number of each method. (please correct me if I’m wrong).

    let’s assume I’ve this class ( ), which has several statements spread by several lines. my idea was to build a parser that returns (for the above example), something like
    statement [line numbers that compose a statement]

    statement 1 [1, 2]
    statement 11 [11, 12, 13, 14, 15]
    statement 19 [19, 20]
    statement 23 [23, 24, 25]
    statement 30 [30, 31]
    statement 32 [32, 33]
    statement 41 [41, 42]

    do you have any idea on how I should update your snippet to handle my Example class?


  2. Federico Tomassetti says:

    Hi Jose, thank you for your comment. Yes, I wanted to solve a slightly more general problem. Doing what you want should be very easy starting from the example I presented. I print the start and end lines, while you want the list of all lines that the statement spans, right? You can do that using a simple for loop. Am I missing something?
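    A minimal sketch of that loop, assuming the begin and end line numbers of the statement are already known. LineSpan and linesBetween are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands a begin/end line pair into the full list of lines. */
class LineSpan {

    /** Returns every line number from beginLine to endLine, both inclusive. */
    static List<Integer> linesBetween(int beginLine, int endLine) {
        if (endLine < beginLine) {
            throw new IllegalArgumentException("endLine must not be smaller than beginLine");
        }
        List<Integer> lines = new ArrayList<>();
        for (int line = beginLine; line <= endLine; line++) {
            lines.add(line);
        }
        return lines;
    }
}
```

    For a statement that begins on line 11 and ends on line 15, LineSpan.linesBetween(11, 15) gives [11, 12, 13, 14, 15].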

  3. Arpeet Desai says:

    Hi Federico,

    First of all thanks a lot for this post, it has been very useful to me.
    My name is Arpeet, I am a graduate student at San Jose State University. I am working on a research project where I am using Java Parser. I had few doubts about parsing specific types of members and variables from a source code. Is there any way by which I can specifically parse or classify standard java functions and functions used in my source code. Also, is there any way by which I can separately parse global variables and local variables from the source codes.

    Best Regards,

  4. Federico Tomassetti says:

    JavaParser produces an AST. The AST contains all the information that is directly present in the code but it does not do all the elaborations necessary to resolve symbols. Now, if you see a method call to a method X you need to look if the class has a method X or if it has an ancestor named X or if it is statically importing a static method named X and so on. Only when you do that you can find which method X is being called and so understand if it comes from the Java standard library or not. The same applies to resolving types of members and variables. Now, this is rather involved but you may want to look at the java-symbol-solver, a project I wrote to do exactly that. It is on GitHub:

  5. Colin Maxfield says:

    Hi Federico,

    Thanks for your guide it was very helpful. I was wondering though if there was a way to remove a node from the AST using the node iterator method of looking at the AST. If you use the visitor approach you can just return “null” from the visit method and that will remove it from the tree but how would you accomplish a similar thing from the node iterator method?


  6. Federico Tomassetti says:

    Hi Colin, yours is a very interesting question. It is indeed currently difficult to do so in JavaParser because a node knows its parent but does not know which collection it is part of, so there is no generic way to remove a Node.

    I think there are two approaches to this:
    1) We build this into JavaParser: I think it is a useful feature and I am going to open a ticket to discuss this
    2) As a temporary alternative you can do some reflection trick

    Now about the reflection trick: we use the getter/setter convention in JavaParser and you can take advantage of that. When you want to remove a node you take the parent node and then through reflection you find all the methods which return nodes or lists of nodes. You then invoke those methods to find the field containing the node to remove. Suppose you want to delete a node which is the value of an AssignExpr. First you get the parent (the AssignExpr) and you find it has two methods returning nodes: getTarget and getValue. You invoke them and find that the node you want to delete is returned by getValue. At this point you can invoke setValue(null) and remove the node from the tree. This approach can be generalized so that it can be applied to all classes. It would be slow and require some familiarity with how reflection works but it should not be too difficult to implement.
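    The single-node case of this trick can be sketched with plain JDK reflection. Everything below is an illustration under stated assumptions: Expr and Assignment are hypothetical stand-ins for JavaParser node classes, ReflectiveRemover is a made-up name, getters are assumed to have no side effects, and getters returning lists of nodes are not handled.

```java
import java.lang.reflect.Method;

/** Hypothetical stand-in for a node type. */
class Expr {
    final String text;
    Expr(String text) { this.text = text; }
}

/** Hypothetical stand-in for a parent node following the getter/setter convention. */
class Assignment {
    private Expr target;
    private Expr value;
    Assignment(Expr target, Expr value) { this.target = target; this.value = value; }
    public Expr getTarget() { return target; }
    public void setTarget(Expr target) { this.target = target; }
    public Expr getValue() { return value; }
    public void setValue(Expr value) { this.value = value; }
}

class ReflectiveRemover {

    /**
     * Looks for a public zero-argument getter on parent that returns exactly
     * this child (compared by identity) and calls the matching setter with null.
     * Returns true if the child was detached, false if no getter returned it.
     */
    static boolean detach(Object parent, Object child) throws ReflectiveOperationException {
        for (Method getter : parent.getClass().getMethods()) {
            if (!getter.getName().startsWith("get") || getter.getParameterCount() != 0) {
                continue;
            }
            if (getter.getReturnType().isPrimitive() || getter.invoke(parent) != child) {
                continue;
            }
            // Convention: getValue pairs with setValue taking the getter's return type.
            String setterName = "set" + getter.getName().substring(3);
            try {
                Method setter = parent.getClass().getMethod(setterName, getter.getReturnType());
                setter.invoke(parent, (Object) null);
                return true;
            } catch (NoSuchMethodException e) {
                // This getter has no matching setter; keep looking.
            }
        }
        return false;
    }
}
```

    Calling ReflectiveRemover.detach(assignment, value) plays the role of setValue(null) in the description above, without the caller having to know which property holds the node.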

  7. Colin Maxfield says:

    That reflection trick looks like it will do the trick. I was able to get a form of that working and it wasn’t too bad. I agree that it would be a useful feature but thanks for this work around!

  8. Bianca Del Carretto says:

    Hi Federico,

    Sorry if I am bothering you with a silly question, but can I run Junit test in command line? I am not able to do so with gradle test because there is no test task, maybe shall I add it to the build file?

    I would like to completely understand how to obtain an ast from source Java code.

    Thanks a lot!

  9. Federico Tomassetti says:

    Ciao Bianca,
    there are no silly questions, only not exhaustive posts 🙂
    There are no tests, so there is nothing to run yet, but if you write tests under src/test/ you can then run them with ./gradlew test
    Does it help?

  10. Bianca Del Carretto says:

    It helps a lot, thank you very much.
    I have just another question for you: do you think I can use javaparser to obtain a graph grammar from java source code?
    I should make this transformation for my master thesis.

    Thanks again!

  11. Federico Tomassetti says:

    What do you mean by graph grammar?
    JavaParser produces an Abstract Syntax Tree. If you meant you need an AST then sure.
    If you meant instead a Graph because you need to have links from usages of symbols and their definitions then you need to use JavaSymbolSolver together with JavaParser.
