Effective Java: a tool, written in Clojure, to explore and measure your Java code

When I was working at TripAdvisor we had an internal book club. We started by reading Effective Java, the very famous book by Joshua Bloch. It is interesting and all, but I think the book was so successful that most of its advice is now part of the Java culture: most Java developers know it and sort of apply it. The point is that this advice, while very reasonable, is not consistently applied in large codebases. For this reason I decided to create a tool to verify that our codebase (which was… big) was actually adopting the suggestions from the book, so I started writing a tool named Effective Java, and I started writing it in Clojure.

Running queries from the command line

The basic idea is that you can run several queries against your code. The queries implemented are loosely based on the advice in the book. For example, one obvious piece of advice is that you should not have tons of constructors and should use factory methods instead. So let’s suppose we want to find which classes have 5 constructors or more; you run this:

And you get back something like this:

At this point you can look at this code and decide which parts you should refactor to improve the quality of your code. I think that the only approach that works when dealing with a large codebase is to consistently improve it with infinite patience and love. I find it very useful to have some way to narrow your focus to something actionable (e.g., one single class or one single method), because if you stop and stare at the whole codebase you will just leave in despair and become a monk somewhere far, far away from classes 10K lines long or constructors taking 20 parameters. When faced with such a daunting task you should not think; you should instead find one single problem and fix it. One way to focus is to have someone find issues for you.

[Image: blinkers]

Let a tool be your blinkers.
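To make the constructor advice mentioned earlier concrete, here is a minimal sketch of what such a refactoring could look like. The `Temperature` class is purely hypothetical (it comes neither from the book nor from the tool): a single private constructor is exposed through named static factory methods.

```java
// Hypothetical example class, used only to illustrate the advice.
final class Temperature {
    private final double kelvin;

    // One private constructor instead of a pile of overloaded public ones.
    private Temperature(double kelvin) {
        if (kelvin < 0) {
            throw new IllegalArgumentException("below absolute zero: " + kelvin);
        }
        this.kelvin = kelvin;
    }

    // Named factory methods say what the argument means. Three public
    // constructors that each take a single double could not even coexist.
    static Temperature ofKelvin(double k) {
        return new Temperature(k);
    }

    static Temperature ofCelsius(double c) {
        return new Temperature(c + 273.15);
    }

    static Temperature ofFahrenheit(double f) {
        return new Temperature((f - 32.0) * 5.0 / 9.0 + 273.15);
    }

    double kelvin() {
        return kelvin;
    }

    double celsius() {
        return kelvin - 273.15;
    }
}
```

A caller would write `Temperature.ofCelsius(21.5)` rather than guessing which overloaded constructor expects which unit.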

Queries implemented

The number of queries currently implemented is very limited:

  • number of non-private constructors
  • number of arguments for the non-private constructors
  • type of singleton implemented by a class (public field, static factory, enum)
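For readers who do not have the book at hand, the three singleton styles named in the last item look roughly like this in Java. The class names are hypothetical and chosen only for illustration.

```java
// 1. Public field: the single instance is exposed as a public static final field.
final class FieldSingleton {
    public static final FieldSingleton INSTANCE = new FieldSingleton();

    private FieldSingleton() {
    }
}

// 2. Static factory: the field is private and a static method hands it out.
final class FactorySingleton {
    private static final FactorySingleton INSTANCE = new FactorySingleton();

    private FactorySingleton() {
    }

    public static FactorySingleton getInstance() {
        return INSTANCE;
    }
}

// 3. Enum: a single-element enum, where the JVM guarantees one instance.
enum EnumSingleton {
    INSTANCE
}
```

A query over these classes would be expected to report "public field", "static factory", and "enum" respectively.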

The number of queries is limited because I focused more on building several ways of using the tool (listed below) and because I am working on far too many things 🙂

How the model is built

Another reason for the limited number of queries is that I am currently using JavaParser to build a model of the parsed code. While it is a great tool (not saying this because I am a contributor to the project…:D), it is not able to resolve symbols. On one hand this means less configuration for the user, but on the other it limits the kind of analysis that is possible. Things could change in the future because JavaParser is evolving to support this kind of analysis. I could also switch to other tools like JaMoPP or MoDisco. Such tools are definitely more complex and less lightweight, but it could be worth taking a closer look at them. Ideally we want a tool which both parses source code and is able to build a model of compiled code (to take our dependencies into account). Such a tool should be able to integrate the different types of models and perform analysis on the resulting megamodel, so that the model obtained by parsing a Java file could have references to the model obtained by analyzing a Jar. This is the kind of thing that requires some work, and building a parser is just a fraction of the effort.

The interactive mode

I think that in some cases you do not know exactly what you are looking for. You just want to explore the code and figure things out as you go. So you could parse some source code, run some queries, then change the thresholds to limit your results and run some more queries. Then you fix something and run your query again. Something like this:

[Screenshot: interactive mode session, 2015-04-05]

So I spent some time implementing the interactive mode instead of writing more queries. Stupid me, but I have to say it was fun to write that piece of Clojure code 😀

Using Effective Java as a linter for Java

This feature is still in its very early stages. It currently runs a couple of queries: one for the number of constructors and one for the number of parameters of each constructor.

Please note that EffectiveJava does not do syntax highlighting: it is the widget I am using to show the code (what a dirty trick, eh?).

How it is implemented

The various queries are implemented as Operation:

An operation takes the actual query to execute on the model of the code, the params (such as the thresholds to be used), and the headers of the table to be produced.
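The tool itself is written in Clojure; as a rough illustration of the same shape in Java, an operation could bundle a query, the names of its parameters, and the table headers. Everything in this sketch is an assumption made for illustration: the names `Operation`, `Queries`, and `MANY_CONSTRUCTORS` are hypothetical, and the "model" here is just a list of loaded classes inspected with reflection, not the JavaParser-based model the tool actually builds.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical Java analogue of an operation: query + parameter names + headers.
final class Operation {
    final BiFunction<List<Class<?>>, Map<String, Integer>, List<List<String>>> query;
    final List<String> params;   // names of the parameters, e.g. "threshold"
    final List<String> headers;  // column headers of the resulting table

    Operation(BiFunction<List<Class<?>>, Map<String, Integer>, List<List<String>>> query,
              List<String> params, List<String> headers) {
        this.query = query;
        this.params = params;
        this.headers = headers;
    }

    // Runs the query on the model with concrete parameter values.
    List<List<String>> run(List<Class<?>> model, Map<String, Integer> paramValues) {
        return query.apply(model, paramValues);
    }
}

final class Queries {
    // Counts the constructors of a class that are not declared private.
    static int nonPrivateConstructors(Class<?> c) {
        int n = 0;
        for (Constructor<?> k : c.getDeclaredConstructors()) {
            if (!Modifier.isPrivate(k.getModifiers())) {
                n++;
            }
        }
        return n;
    }

    // Classes with at least "threshold" non-private constructors.
    static final Operation MANY_CONSTRUCTORS = new Operation(
        (classes, params) -> {
            int threshold = params.get("threshold");
            List<List<String>> rows = new ArrayList<>();
            for (Class<?> c : classes) {
                int n = nonPrivateConstructors(c);
                if (n >= threshold) {
                    rows.add(Arrays.asList(c.getName(), String.valueOf(n)));
                }
            }
            return rows;
        },
        Arrays.asList("threshold"),
        Arrays.asList("Class", "Non-private constructors"));
}
```

Calling `Queries.MANY_CONSTRUCTORS.run(model, Collections.singletonMap("threshold", 5))` returns one row per class that meets the threshold, ready to be printed under the operation's headers.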

So the operation can be used with printOperation:

printTable contains all the logic to produce a (sort of) nice looking map. Note that different kinds of objects can be contained in the result of a query: classes, methods, constructors, fields, etc. I need a way to transform all of them into strings. For that I am using a Clojure multi-method:

For example, I print the qualified name (getQName) for different elements. It is calculated using a Clojure protocol. This is a part of the implementation:

For the interactive mode we use Instaparse to parse the commands:

We then just have a loop that carries the list of loaded classes along as its state.
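The shape of such a loop can be sketched in Java as follows. This is not the tool's code: the command names (`load`, `list`, `quit`) are hypothetical, commands are split on whitespace instead of being parsed with a grammar, and the state is a mutable list where the Clojure version would pass a new value on each iteration.

```java
import java.io.PrintWriter;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

final class InteractiveLoop {
    // Reads one command per line until "quit" or end of input.
    // The list of loaded class names is the state carried between iterations.
    static List<String> run(Reader input, PrintWriter out) {
        Scanner in = new Scanner(input);
        List<String> loaded = new ArrayList<>();
        while (in.hasNextLine()) {
            String[] words = in.nextLine().trim().split("\\s+");
            String command = words[0];
            if (command.equals("quit")) {
                break;
            } else if (command.equals("load") && words.length > 1) {
                // Hypothetical command: "load a.Foo b.Bar" registers class names.
                List<String> names = Arrays.asList(words).subList(1, words.length);
                loaded.addAll(names);
                out.println("Loaded " + names.size() + " class(es).");
            } else if (command.equals("list")) {
                for (String name : loaded) {
                    out.println(name);
                }
            } else if (!command.isEmpty()) {
                out.println("Unknown or incomplete command: " + command);
            }
        }
        out.flush();
        return loaded;
    }
}
```

For an interactive session this could be started with `InteractiveLoop.run(new InputStreamReader(System.in), new PrintWriter(System.out, true))`.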

To parse the options obtained from the command line we just use parse-opts:

There is also some code to interface the tool with JavaParser, but you are lucky: I am not going to bore you with that. If you are interested, feel free to write to me: I like helping out and I like JavaParser, so I am very happy to answer all of your questions about it.

Conclusions

The idea of building tools to explore codebases is something that has fascinated me for a while. During my PhD I built some tools in that general area, for example CodeModels, which wraps several parsers for several languages and permits cross-language analysis.

What do you think? Is it worth spending some more time on this tool? Would you like to use something like this? What is your strategy for static analysis?

4 replies
  1. Jeshan says:

    Hi Federico,
    I’m currently reading the book too and have been thinking about implementing these ideas too!

    While this would be an exciting challenge to solve with writing code, I was wondering if some ideas have not already been implemented by static analysis tools.

    It turns out yes (although I’m not sure how many of them yet), e.g.,
    PMD has implemented rules like UseNotifyAllInsteadOfNotify and NonThreadSafeSingleton.

    You’re asking if it’s ok for you to spend more time on this.
    It depends on what your goals are; this would be a nice technical challenge.
    Otherwise, let’s all profit from tools that others have already built!

    Jeshan

  2. Federico Tomassetti says:

    Hi Jeshan, thank you for your comment! I guess that there are other linters for Java out there but with effectivejava I would like to build a tool to interactively explore your code and play with different thresholds. In addition to that not all the checks described in effectivejava are implemented in other static analysis tools.

    But yes, you are right, there is some overlap with other projects.

  3. PuZZleDucK says:

    Looks cool, I really like the idea of a code-explorer for navigating a large unknown codebase!
    For the purpose of “finding where to start” or “finding the worst parts” how about adding a scoring system… like too many constructors is -5 and too many parameters is worth -3. Then we could sort files by the weights of problems or only look at files above a threshold.
