How to Build a Symbol Solver for Java, in Clojure

How to build a symbol solver for Java, in Clojure

Last week we have seen how to build models of Java code from source and JAR files. Now we want to be able to solve symbols: in practice we want to link the symbol references in our source code to symbol declarations either in Java source files or jar files. This process is called symbol resolution.

Symbol References and Symbol Declarations

As first thing we have to consider that the same symbol could point to different declarations in different contexts. Consider this absurd piece of code:

It contains several declaration of the symbol a:

  • at line 5 it is declared as field of type int
  • at line 7 it is declared as a parameter of type long
  • at line 11 it is declared as a field of anonymous class, and it has type char
  • at line 14 it is declared as a local variable of type long

We have also two references to the symbol a:

  • at line 8, when we increment a by 1
  • at line 15, when we increment a by 2

Now to solve symbols we can figure out which symbol references refer to which symbol declarations. So that we can do things like understanding the type of a symbol, and so which operations can be performed on a symbol.

Scoping Rules

The principle to solve symbol is easy (but the implementation could be… tricky): given a symbol reference we look for the closest corresponding declaration. Until we find one we keep moving further away from the reference, going up in the AST.

In our example we would match the reference on line 15 with the declaration on line 14. We would also mach the reference on line 8 with the declaration on line 7. Simple, eh?

Consider this other example:

Now we have different definitions of A, in some cases as a Class (lines 3 and 13), in others as a variables (line 20). The reference on line 21 (new A().foo();) matches the declaration on line 13, not the declaration on line 20, because we can use only type declarations to match symbol references inside a new object instantiation statement. Things starts to become not so easy…

There are other things to consider:

  • import statements: importing com.foo.A makes references to A be resolved with the declarations of the class A in package com.foo
  • classes in the same package which can be referred by the simple name, while for the others we have to use the fully qualified name
  • classes in the default package cannot be referred outside the default package
  • fields and methods inherited should be considered (for methods we have to consider overloading and overriding without confusing them)
  • when matching methods we have to consider the compatibility of the parameters (not easy at all)
  • etc. etc. etc.

While the principle is simple, there are tons of rules and exceptions and things to consider to properly resolve the symbols. The nice thing is: we can start easily and then improve the solution as we go on.

Designing Our Symbol Solver

One note: in this post I will explain the approach I am currently using to build my symbol solver for effectivejava. I am not saying this is a perfect solution and I am sure I am going to improve this solution over time. However this approach seems to cover a large number of cases and I think it is not an awful solution.

A simple reference to a name (whatever it indicates a variable, a parameter, a field, etc.) is represented by a NameExpr in the AST produced by JavaParser. We start by creating a function that takes a NameExpr node and return the corresponding declaration, if any could be found. The function solveNameExpr basically just invoke the function solveSymbol passing three parameters:

  1. the AST node itself: it represents the scope in which to solve the symbol
  2. nil: this represent extra context information, in this case it is not needed
  3. the name to be resolved: well, it should be self-explanatory

We start by declaring a protocol (which is somehow similar to a Java interface) for the concept of scope:

The basic idea is that in each scope we try to look for declarations corresponding to the symbol to be solved, if we do not find them we delegate to the parent scope. We specify a default implementation for the AST node (com.github.javaparser.ast.Node) which just delegate to the parent. For some node types we provide a specific implementation.

For BlockStmt we look among the instructions preceding the one examined for variable declarations. The current examine statement is passed as the context. If you are interested for the functions used by this one just look at the code of effectivejava: it is on GitHub 🙂

For a MethodDeclaration we look among the parameters

For ClassOrInterfaceDeclaration we look among the fields of the class or interface (classes could have static fields).

For CompilationUnit we look for other classes in the same package (both using their simple or qualified names), we consider the import statements and we look for the types declared in the file.

If we do not manage to solve a symbol in the scope of the Compilation Unit there is no parent to delegate, so we use the nil scope which represents the absence of scope. In that case only absolute names can be solved, like the canonical names of classes. We do that using the typeSolver function. The typeSolver need to know which classpath to use and it basically look up for classes among source files directories and JAR. Also in this case feel free to dig into the code of effectivejava.

Conclusions

I think Clojure is great to build solutions incrementally: we can implement the different methods of the protocols as we move forward. Build simple tests and improve them one piece at the time.

We have built this simple solution iteratively and until now it is working fine for our goals. We will keep write more test cases and we could have to refactor a thing or two but I think we are heading in the right general direction.

In the future could be useful to extract this symbol resolver in a separate library, to be used with JavaParser to perform static analysis and refactoring.

Download the guide with 68 resources on Creating Programming Languages

68resources

Receive the guide to your inbox to read it on all your devices when you have time

Powered by ConvertKit
5 replies
  1. Muntazir Fadhel says:

    Hi Federico, since Eclipse for example is based on the JDT AST parser which from what I understand is similar to JavaParser, do you know how is it able to seemingly solve for symbols? Is it using algorithms similar to the one used in this article?

  2. Federico Tomassetti says:

    I do not know how the JDT is built internally. However I think they build symbol tables and then resolve symbols by looking at those symbol tables. The approach I am using here is ok if you want to resolve one specific symbol. Precompiling symbol tables is instead more efficient if you want to solve all symbols in a file.

Trackbacks & Pingbacks

  1. […] i javaparser contributor , did in clojure, on top of javaparser. explain how implement in 1 post how build symbol solver java, in clojure […]

  2. […] it is not trivial but doable. If you are interested in the subject you can read a post I just wrote How to build a symbol solver for Java, in Clojure. The post contains also a link to the code, freely available on […]

  3. […] I am a JavaParser contributor and I just did that in Clojure, on top of JavaParser. I explain how implement that in one post How to build a symbol solver for Java, in Clojure […]

Comments are closed.