Java comments parsing

Recently I have done some work on JavaParser, focusing on parsing comments and attributing them to the element being commented.

I enjoy working on code that manipulates source code. I also like this particular problem because it has no obvious solution: it can only be solved by relying on heuristics and conventions.

Here are some notes on comment parsing as it is currently implemented; more documentation will follow soon.


Three different kinds of comments are parsed:

  • Line comments (from // to the end of the line)
  • Block comments (from /* to */)
  • Javadoc comments (from /** to */)

Comments are parsed like all the other elements of the grammar, so we provide their position in the source code and their content.
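As a rough illustration of the three kinds of comments and of the position and content recorded for each, here is a minimal sketch of a comment scanner. It is not JavaParser's lexer: the class `CommentScanner`, its `Kind` and `Comment` types, and the 1-based line numbering are assumptions made for this example. It skips string and char literals so that `//` inside them is ignored, but it does not handle text blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of comment extraction; not JavaParser's actual lexer.
public class CommentScanner {

    enum Kind { LINE, BLOCK, JAVADOC }

    // Kind, 1-based begin/end line, and raw text (delimiters included).
    record Comment(Kind kind, int beginLine, int endLine, String text) {}

    static List<Comment> scan(String src) {
        List<Comment> out = new ArrayList<>();
        int line = 1;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\n') {
                line++;
                i++;
            } else if (c == '"' || c == '\'') {
                // Skip string/char literals so that "//" inside them is not taken as a comment.
                char quote = c;
                i++;
                while (i < src.length() && src.charAt(i) != quote) {
                    if (src.charAt(i) == '\\') i++; // skip the escaped character too
                    i++;
                }
                i++; // step over the closing quote
            } else if (src.startsWith("//", i)) {
                // Line comment: runs to the end of the line.
                int end = src.indexOf('\n', i);
                if (end < 0) end = src.length();
                out.add(new Comment(Kind.LINE, line, line, src.substring(i, end)));
                i = end;
            } else if (src.startsWith("/*", i)) {
                // Block or Javadoc comment: runs to the first closing delimiter (or end of input).
                int close = src.indexOf("*/", i + 2);
                int end = (close < 0) ? src.length() : close + 2;
                String text = src.substring(i, end);
                // "/**/" is an empty block comment, not Javadoc.
                Kind kind = text.startsWith("/**") && !text.equals("/**/") ? Kind.JAVADOC : Kind.BLOCK;
                int endLine = line + (int) text.chars().filter(ch -> ch == '\n').count();
                out.add(new Comment(kind, line, endLine, text));
                line = endLine;
                i = end;
            } else {
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String src = String.join("\n",
                "/** Javadoc of A */",
                "class A {",
                "    String s = \"// not a comment\"; // a line comment",
                "    /* a block",
                "       comment */",
                "}");
        for (Comment c : scan(src)) {
            System.out.println(c.kind() + " lines " + c.beginLine() + "-" + c.endLine());
        }
        // prints three lines: JAVADOC lines 1-1, LINE lines 3-3, BLOCK lines 4-5
    }
}
```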

We also try to understand which element each comment refers to, and attribute the comment to the node we suppose is its target. Note that we do this with some simple heuristics: they normally work quite well, but they have limitations, and no algorithm could determine with absolute accuracy which element a comment targets.

Principles used to attribute comments

  • Each element can have only one comment associated with it.
  • A line comment that follows an element on the same line is attributed to the last element on that line which both starts and ends on the line. If no element both starts and ends on the line, the comment is associated with the last node ending on that line. This kind of association takes precedence over the others.
  • Comments that stand alone on one or more lines are associated with the first element following them.
  • A comment cannot be associated with another comment (i.e., comments do not comment other comments).
  • Comments that are not on the same line as other nodes and are followed by empty lines are considered orphans (this behavior can be changed using JavaParser.setDoNotAssignCommentsPreceedingEmptyLines(boolean)).
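The same-line rule for line comments can be sketched as follows. This is a hypothetical model, not JavaParser's code: nodes are reduced to begin/end line and column (the `Node` record), and "last element" is interpreted as the node that ends furthest to the right before the comment, with ties going to the outermost node. Since this association takes precedence over the others, a full implementation would presumably apply it before the remaining rules.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the same-line rule; not JavaParser's implementation.
public class SameLineRule {

    // A node's source range: 1-based lines and columns, end column inclusive.
    record Node(String name, int beginLine, int beginColumn, int endLine, int endColumn) {}

    // Target for a line comment starting at (line, column).
    static Optional<Node> target(List<Node> nodes, int line, int column) {
        // First choice: nodes that both start and end on the comment's line.
        Optional<Node> onLine = lastEnding(nodes, line, column, true);
        // Fallback: any node that ends on that line, e.g. a multi-line block closed on it.
        return onLine.isPresent() ? onLine : lastEnding(nodes, line, column, false);
    }

    static Optional<Node> lastEnding(List<Node> nodes, int line, int column, boolean mustStartOnLine) {
        Node best = null;
        for (Node n : nodes) {
            boolean endsBeforeComment = n.endLine() == line && n.endColumn() < column;
            boolean startsOk = !mustStartOnLine || n.beginLine() == line;
            if (endsBeforeComment && startsOk && (best == null || isLater(n, best))) {
                best = n;
            }
        }
        return Optional.ofNullable(best);
    }

    // "Last" = ends furthest right; on a tie, prefer the outermost node (the one that starts first).
    static boolean isLater(Node a, Node b) {
        if (a.endColumn() != b.endColumn()) return a.endColumn() > b.endColumn();
        if (a.beginLine() != b.beginLine()) return a.beginLine() < b.beginLine();
        return a.beginColumn() < b.beginColumn();
    }

    public static void main(String[] args) {
        // Line 1: "int a = 0; // comment associated to the field" (comment at column 12)
        List<Node> oneLine = List.of(
                new Node("field", 1, 1, 1, 10),
                new Node("type int", 1, 1, 1, 3),
                new Node("declarator a = 0", 1, 5, 1, 9),
                new Node("literal 0", 1, 9, 1, 9));
        System.out.println(target(oneLine, 1, 12).map(Node::name).orElse("none")); // field

        // Line 1: "int a", line 2: "= 0; // comment associated to zero" (comment at column 6)
        List<Node> twoLines = List.of(
                new Node("field", 1, 1, 2, 4),
                new Node("type int", 1, 1, 1, 3),
                new Node("declarator a = 0", 1, 5, 2, 3),
                new Node("literal 0", 2, 3, 2, 3));
        System.out.println(target(twoLines, 2, 6).map(Node::name).orElse("none")); // literal 0
    }
}
```

The two calls in `main` mirror the `int a = 0;` field examples in this post: on a single line the whole field wins, while when the declaration is split across lines only the literal `0` starts and ends on the comment's line.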

Not all comments can be associated with an element; the remaining ones are considered orphan comments. They are inserted in the list of orphan comments of the first node that contains them.
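The rules for comments on their own lines, including the empty-line behaviour and orphans, can be sketched for the direct children of a single container. This is an illustration under assumptions, not JavaParser's algorithm: `OwnLineRule`, `Node`, `Comment` and `Result` are hypothetical names, same-line comments are assumed to have been handled already, and a gap of more than one line between consecutive children is taken to mean an empty line. The boolean parameter mirrors the flag set through JavaParser.setDoNotAssignCommentsPreceedingEmptyLines; `true` reproduces the behaviour described in the list above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the own-line rules for the direct children of one container.
public class OwnLineRule {

    record Node(String name, int beginLine, int endLine) {}
    record Comment(String text, int beginLine, int endLine) {}

    // Comments attributed to nodes, plus the orphans that go to the container.
    record Result(Map<Node, Comment> attributed, List<Comment> orphans) {}

    // nodes and comments must each be sorted by begin line and must not share lines.
    static Result attribute(List<Node> nodes, List<Comment> comments,
                            boolean doNotAssignCommentsPrecedingEmptyLines) {
        Map<Node, Comment> attributed = new LinkedHashMap<>();
        List<Comment> orphans = new ArrayList<>();
        Comment pending = null; // last comment seen, waiting for the element that follows it
        int ni = 0;
        int ci = 0;
        while (ni < nodes.size() || ci < comments.size()) {
            boolean nextIsComment = ci < comments.size()
                    && (ni >= nodes.size() || comments.get(ci).beginLine() < nodes.get(ni).beginLine());
            if (nextIsComment) {
                // The element after 'pending' is a comment, and comments never comment comments.
                if (pending != null) orphans.add(pending);
                pending = comments.get(ci++);
            } else {
                Node node = nodes.get(ni++);
                if (pending != null) {
                    // Consecutive children more than one line apart: assume an empty line between them.
                    boolean emptyLineBetween = node.beginLine() - pending.endLine() > 1;
                    if (emptyLineBetween && doNotAssignCommentsPrecedingEmptyLines) {
                        orphans.add(pending);
                    } else {
                        attributed.put(node, pending); // at most one comment per node
                    }
                    pending = null;
                }
            }
        }
        // A trailing comment has no following element inside the container, so it is an orphan.
        if (pending != null) orphans.add(pending);
        return new Result(attributed, orphans);
    }

    public static void main(String[] args) {
        // 1: /* Orphan comment */   2: (empty)   3: /* Comment of the class */   4: class A { }
        Node classA = new Node("class A", 4, 4);
        Comment orphan = new Comment("/* Orphan comment */", 1, 1);
        Comment classDoc = new Comment("/* Comment of the class */", 3, 3);
        Result r = attribute(List.of(classA), List.of(orphan, classDoc), true);
        System.out.println(r.attributed().get(classA).text()); // /* Comment of the class */
        System.out.println(r.orphans().size());                // 1

        // 1: // lonely comment   2: (empty)   3: int a;
        Node a = new Node("int a;", 3, 3);
        Comment lonely = new Comment("// lonely comment", 1, 1);
        System.out.println(attribute(List.of(a), List.of(lonely), true).orphans().size()); // 1
        System.out.println(attribute(List.of(a), List.of(lonely), false).attributed().get(a).text());
    }
}
```

Keeping a single `pending` comment means only the comment closest to a node can be attached to it, which also enforces the one-comment-per-element rule.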

Typical Use Examples

class A {
    // orphan comment
}

In this case there is no element immediately following the orphan comment, therefore it is listed as an orphan comment of the element containing it (class A).

/* Orphan comment */

/* Comment of the class */
class A { }

In this case the second comment is attributed to the class declaration because it immediately precedes it. The first comment remains an orphan: an empty line separates it from what follows, and what follows is another comment, which it cannot be associated with. Invoking JavaParser.setDoNotAssignCommentsPreceedingEmptyLines(false) before parsing disables the empty-line rule, so a comment followed by empty lines is no longer made an orphan for that reason alone.

int a = 0; // comment associated to the field

This comment is associated with the whole field, because the field declaration, which starts and ends on that line, is the last node before the comment.

int a
= 0; // comment associated to zero

In this case the field declaration starts on the previous line, so the only element that both starts and ends on the comment's line is the literal 0, and the comment is attributed to it rather than to the whole field.

Atypical Use Examples

Due to the liberal nature of what is considered valid comment syntax, the parser has had to make a number of sensible assumptions.

/* A block comment that
// Contains a line comment */
public static void main(String args[]) {
}

In this case a single comment is created for the block comment, and its content is “A block comment that // Contains a line comment”.
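This follows from how Java block comments end: comments do not nest, and a block comment runs to the first `*/`, so any `//` before it is simply part of the content. A minimal sketch, where `blockContent` is a hypothetical helper rather than part of JavaParser:

```java
// Hypothetical helper illustrating where a block comment ends; not JavaParser code.
public class BlockCommentContent {

    // Text between the first "/*" at or after 'from' and the next "*/"; null if there is none.
    static String blockContent(String src, int from) {
        int open = src.indexOf("/*", from);
        if (open < 0) return null;
        int close = src.indexOf("*/", open + 2);
        if (close < 0) return null; // unterminated comment
        // Everything up to the first "*/" is content, "//" included: Java comments do not nest.
        return src.substring(open + 2, close);
    }

    public static void main(String[] args) {
        String src = "/* A block comment that\n// Contains a line comment */\n"
                + "public static void main(String args[]) {\n}";
        String content = blockContent(src, 0);
        System.out.println(content.strip().replace('\n', ' '));
        // prints: A block comment that // Contains a line comment
    }
}
```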

// Returns number of vowels in a name
public int countVowels(String name) {

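If the method is annotated and this comment is placed between the annotations and the method signature, the line comment is attributed to the return type of the method rather than to the method itself. This is because the start line of a method is determined by its first annotation, so the comment falls inside the method's range; therefore method comments need to precede any annotations.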
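A minimal sketch of why this happens, assuming a hypothetical `Node` record with line ranges only and using `@Override` purely as an example annotation. Modifiers are not modelled as nodes here, which is also an assumption; the point is that a comment inside a node's range cannot precede that node, so it falls to the first child that follows it.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; node ranges are simplified to begin/end lines.
public class AnnotatedMethod {

    record Node(String name, int beginLine, int endLine) {}

    // An own-line comment precedes a node only if it ends before the node's first line.
    static boolean precedes(int commentEndLine, Node node) {
        return commentEndLine < node.beginLine();
    }

    // A comment inside a node's range goes to the first child (sorted by position) after it.
    static Optional<Node> firstChildAfter(List<Node> children, int commentEndLine) {
        return children.stream().filter(c -> c.beginLine() > commentEndLine).findFirst();
    }

    public static void main(String[] args) {
        // 1: @Override   <- the method's range begins at its first annotation
        // 2: // Returns number of vowels in a name
        // 3: public int countVowels(String name) {
        // 4: }
        Node method = new Node("method countVowels", 1, 4);
        List<Node> children = List.of(
                new Node("annotation @Override", 1, 1),
                new Node("return type int", 3, 3),
                new Node("parameter String name", 3, 3));
        int commentLine = 2;
        String target = precedes(commentLine, method)
                ? method.name()
                : firstChildAfter(children, commentLine).map(Node::name).orElse("orphan");
        System.out.println(target); // return type int
    }
}
```

Moving the comment to line 0, above the annotation, makes `precedes` true and the comment goes to the method itself.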

The up-to-date documentation is available at
