Recently I have done some work on JavaParser, focusing on parsing comments and attributing them to the element being commented.
I like working on tools that manipulate source code. I also like this problem because it has no obvious solution: it can only be solved by relying on heuristics and conventions.
Here are some notes on comment parsing as it is currently implemented; more documentation will follow soon.
Three different kinds of comments are parsed:
- Line comments (from // to the end of the line)
- Block comments (from /* to */)
- Javadoc comments (from /** to */)
Comments are parsed like all the other elements of the grammar, so we record both their position in the source code and their content.
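To make the three kinds and the recorded positions concrete, here is a minimal, hypothetical scanner (CommentScanner, Kind and Comment are illustrative names, not JavaParser's lexer). It assumes no text blocks and skips string and char literals only so that a "//" inside them is not mistaken for a comment. Note that a "//" inside a block comment is simply part of its content.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not JavaParser's lexer: finds the three comment kinds and
// records where each begins and ends. Simplification: text blocks ("""...""")
// are not supported; literals are skipped only so "//" inside them is ignored.
final class CommentScanner {
    enum Kind { LINE, BLOCK, JAVADOC }

    record Comment(Kind kind, int beginLine, int endLine, String content) {}

    static List<Comment> scan(String src) {
        List<Comment> out = new ArrayList<>();
        int line = 1;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\n') {
                line++;
                i++;
            } else if (c == '"' || c == '\'') {
                // Skip a string or char literal, honouring backslash escapes.
                int j = i + 1;
                while (j < src.length() && src.charAt(j) != c) {
                    j += src.charAt(j) == '\\' ? 2 : 1;
                }
                i = j + 1;
            } else if (src.startsWith("//", i)) {
                // Line comment: runs to the end of the line (newline not included).
                int end = src.indexOf('\n', i);
                if (end < 0) end = src.length();
                out.add(new Comment(Kind.LINE, line, line, src.substring(i + 2, end)));
                i = end;
            } else if (src.startsWith("/*", i)) {
                int close = src.indexOf("*/", i + 2);
                if (close < 0) throw new IllegalArgumentException("unterminated comment at line " + line);
                // "/**" opens a Javadoc comment, except for the empty block comment "/**/".
                boolean javadoc = src.startsWith("/**", i) && close != i + 2;
                String body = src.substring(i + (javadoc ? 3 : 2), close);
                int begin = line;
                line += (int) body.chars().filter(ch -> ch == '\n').count();
                // A "//" inside the body is just text: no separate line comment is produced.
                out.add(new Comment(javadoc ? Kind.JAVADOC : Kind.BLOCK, begin, line, body));
                i = close + 2;
            } else {
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String src = String.join("\n",
                "/** Javadoc comment */",
                "class A { // line comment",
                "    /* A block comment that // Contains a line comment */",
                "}");
        scan(src).forEach(System.out::println);
    }
}
```

A real parser gets this from its grammar; the point here is only that each comment carries a kind, a line range and its text.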
We also try to understand which element each comment refers to, and attribute the comment to the node we assume is its target. To do that we use some simple heuristics; while they normally work quite well, they have limitations. It is simply not possible to devise an algorithm that understands, with absolute accuracy, which element a comment targets, because the language itself provides no formal way to indicate it, so every developer can choose their own style.
Principles Used to Attribute Comments to Elements
- Each element can have at most one associated comment.
- Line comments that follow an element on the same line are attributed to the last element on that line that both starts and ends on it. If no element starts and ends on the line, the comment is associated with the last node ending on that line. This kind of association takes precedence over the others.
- Comments that stand alone on one or more lines are associated with the first element following them.
- A comment cannot be associated with another comment (i.e., no comments commenting on other comments).
- Comments that are not on the same line as other nodes and that precede empty lines are considered orphans. This behavior can be changed using
JavaParser.setDoNotAssignCommentsPreceedingEmptyLines(boolean)
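The rules for comments that stand alone on their lines can be sketched as a single pass over sibling nodes. This is a simplified, hypothetical model (PrecedingCommentSketch, Node, Comment and attribute are illustrative names, not JavaParser's classes): positions are line numbers only, the input comments are assumed to sit outside every node, same-line comments are assumed to have been attributed already (those nodes are passed in alreadyCommented), and the boolean parameter merely mirrors the setting above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "comment precedes an element" rules, not JavaParser code.
// Simplified model: flat lists of sibling nodes and of comments alone on their lines.
final class PrecedingCommentSketch {
    record Node(String name, int beginLine, int endLine) {}
    record Comment(String text, int beginLine, int endLine) {}

    /**
     * Returns node name -> attributed comment text; leftover comments go to orphans.
     * alreadyCommented holds nodes that already received a (stronger) same-line comment.
     */
    static Map<String, String> attribute(List<Node> nodes, List<Comment> comments,
                                         Set<String> alreadyCommented,
                                         boolean doNotAssignCommentsPrecedingEmptyLines,
                                         List<Comment> orphans) {
        List<Node> sortedNodes = new ArrayList<>(nodes);
        sortedNodes.sort(Comparator.comparingInt(Node::beginLine));
        List<Comment> sortedComments = new ArrayList<>(comments);
        sortedComments.sort(Comparator.comparingInt(Comment::beginLine));

        Map<String, String> assigned = new LinkedHashMap<>();
        int next = 0; // first comment not yet considered
        for (Node node : sortedNodes) {
            // Of the comments ending before this node, only the closest one is a
            // candidate; earlier ones become orphans.
            Comment closest = null;
            while (next < sortedComments.size() && sortedComments.get(next).endLine() < node.beginLine()) {
                if (closest != null) orphans.add(closest);
                closest = sortedComments.get(next++);
            }
            if (closest == null) continue;
            boolean emptyLinesBetween = node.beginLine() - closest.endLine() > 1;
            if (alreadyCommented.contains(node.name())            // one comment per node
                    || (doNotAssignCommentsPrecedingEmptyLines && emptyLinesBetween)) {
                orphans.add(closest);
            } else {
                assigned.put(node.name(), closest.text());
            }
        }
        // Comments after the last node have no following element.
        while (next < sortedComments.size()) orphans.add(sortedComments.get(next++));
        return assigned;
    }

    public static void main(String[] args) {
        // "First comment" on line 1, aVar on line 2, "Second comment" on line 4,
        // an empty line 5, then anotherVar on line 6.
        List<Node> nodes = List.of(new Node("aVar", 2, 2), new Node("anotherVar", 6, 6));
        List<Comment> comments = List.of(new Comment("First comment", 1, 1),
                                         new Comment("Second comment", 4, 4));
        List<Comment> orphans = new ArrayList<>();
        System.out.println(attribute(nodes, comments, Set.of(), true, orphans)); // {aVar=First comment}
        System.out.println(orphans);
    }
}
```

Passing false instead of true in main would also attach "Second comment" to anotherVar.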
Not all comments can be associated with an element: the remaining ones are considered orphan comments, and they are inserted in the list of orphan comments of the first node that contains them.
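A minimal sketch of where an orphan ends up, assuming "the first node which contains them" means the innermost node whose range encloses the comment. Pos, Node and placeOrphan are hypothetical names; positions are (line, column) pairs so that containment also works for nodes sharing a line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not JavaParser code: stores an orphan comment on the
// innermost node whose source range encloses it.
final class OrphanPlacementSketch {
    /** Line/column position, compared by line first, then by column. */
    record Pos(int line, int col) implements Comparable<Pos> {
        public int compareTo(Pos o) {
            return line != o.line ? Integer.compare(line, o.line) : Integer.compare(col, o.col);
        }
    }

    /** Hypothetical AST node with a source range, children and its own orphan list. */
    static final class Node {
        final String name;
        final Pos begin, end;
        final List<Node> children = new ArrayList<>();
        final List<String> orphanComments = new ArrayList<>();

        Node(String name, Pos begin, Pos end, Node... kids) {
            this.name = name;
            this.begin = begin;
            this.end = end;
            children.addAll(List.of(kids));
        }

        boolean encloses(Pos b, Pos e) {
            return begin.compareTo(b) <= 0 && e.compareTo(end) <= 0;
        }
    }

    /** Walks down while a child still encloses the comment; the deepest such node keeps it. */
    static Node placeOrphan(Node root, String text, Pos b, Pos e) {
        Node current = root;
        Node next;
        do {
            next = null;
            for (Node child : current.children) {
                if (child.encloses(b, e)) {
                    next = child;
                    break;
                }
            }
            if (next != null) current = next;
        } while (next != null);
        current.orphanComments.add(text);
        return current;
    }

    public static void main(String[] args) {
        // class A {
        //     // orphan comment
        // }
        Node classA = new Node("A", new Pos(1, 1), new Pos(3, 1));
        Node cu = new Node("CompilationUnit", new Pos(1, 1), new Pos(3, 1), classA);
        Node owner = placeOrphan(cu, "orphan comment", new Pos(2, 5), new Pos(2, 21));
        System.out.println(owner.name + " " + owner.orphanComments);
    }
}
```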
Typical Use Examples
class A {
    // orphan comment
}
In this case there is no element immediately following the orphan comment, therefore it is listed as an orphan comment of the element containing it (class A).
/* Orphan comment */
/* Comment of the class */
class A { }
In this case the comment closest to the class declaration is assigned to it, while the one preceding it is marked as an orphan.
// First comment
int aVar;

// Second comment

int anotherVar;
In this case the first comment is attributed to the declaration of the variable aVar because it precedes it, while the second remains an orphan comment because empty lines separate it from the node that follows it. If JavaParser.setDoNotAssignCommentsPreceedingEmptyLines(false) was invoked before parsing, the second comment would also have been associated with the following declaration.
// lost comment
int a = 0; // comment associated to the field
The trailing comment is associated with the whole field, because the field is the last (and only) node before the comment. Since each element can have only one comment, “lost comment” becomes an orphan.
int a = 0; // comment associated to zero
In this case the comment is associated with 0: both 0 and the field (which also contains the node 0) start and end on this line, but 0 is the last of them, so it is preferred.
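The same-line rule can be sketched as a choice among candidate nodes. This hypothetical sketch (SameLineRuleSketch, Pos, Node and target are illustrative names) interprets “last element” as the node that begins furthest to the right on the line, which reproduces the int a = 0 example; if no node both starts and ends on the line, it falls back to the node ending furthest to the right. The candidates are given as a flat list rather than a tree.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the same-line rule, not JavaParser code.
final class SameLineRuleSketch {
    record Pos(int line, int col) {}
    record Node(String name, Pos begin, Pos end) {}

    /** Chooses the node a trailing line comment starting at commentStart attaches to. */
    static Optional<Node> target(List<Node> nodes, Pos commentStart) {
        int line = commentStart.line();
        Node best = null;     // last node that starts and ends on the comment's line
        Node fallback = null; // last node that only ends on the comment's line
        for (Node n : nodes) {
            boolean endsBeforeComment = n.end().line() == line && n.end().col() < commentStart.col();
            if (!endsBeforeComment) continue;
            if (n.begin().line() == line) {
                // "Last" interpreted as the node beginning furthest to the right.
                if (best == null || n.begin().col() > best.begin().col()) best = n;
            } else if (fallback == null || n.end().col() > fallback.end().col()) {
                fallback = n;
            }
        }
        return Optional.ofNullable(best != null ? best : fallback);
    }

    public static void main(String[] args) {
        // int a = 0; // comment associated to zero
        // (columns: "int" 1-3, "a = 0" 5-9, "0" 9, ";" 10, comment starts at 12)
        List<Node> nodes = List.of(
                new Node("field", new Pos(1, 1), new Pos(1, 10)),
                new Node("int", new Pos(1, 1), new Pos(1, 3)),
                new Node("a = 0", new Pos(1, 5), new Pos(1, 9)),
                new Node("0", new Pos(1, 9), new Pos(1, 9)));
        System.out.println(target(nodes, new Pos(1, 12)).map(Node::name).orElse("orphan"));
    }
}
```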
// a comment
int b; // another comment
In this case “another comment” is associated with the variable declaration, and because only one comment can be associated with a single node, “a comment” remains an orphan comment.
Atypical Use Examples
Because the comment syntax is so liberal about what is considered valid, the parser has had to make a number of sensible assumptions.
/* A block comment that // Contains a line comment */
public static void main(String args[]) { }
In this case a single block comment is created, whose content is “A block comment that // Contains a line comment”.
@Override
// Returns number of vowels in a name
public int countVowels(String name) { }
In this case the line comment is attributed to the return type of the method rather than to the method itself. This is because the start line of a method is determined by its first annotation; therefore, method comments need to precede the annotations.
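A simplified, hypothetical model of why this happens (AnnotationRangeSketch, Node and attribute are illustrative names, not JavaParser classes; positions are line numbers and siblings are assumed sorted by begin line). Because the method's range starts on the @Override line, a comment on the next line lies inside the method, so it is matched against the method's children and goes to the first child that follows it.

```java
import java.util.List;

// Hypothetical sketch, not JavaParser code: a comment inside a node's range is
// attributed among that node's children ("first element following" rule).
final class AnnotationRangeSketch {
    record Node(String name, int beginLine, int endLine, List<Node> children) {
        Node(String name, int beginLine, int endLine) {
            this(name, beginLine, endLine, List.of());
        }
    }

    /** Returns the name of the node the comment on commentLine is attributed to, or null. */
    static String attribute(List<Node> siblings, int commentLine) {
        for (Node n : siblings) {
            boolean inside = n.beginLine() <= commentLine && commentLine <= n.endLine();
            if (inside && !n.children().isEmpty()) {
                return attribute(n.children(), commentLine); // look among n's children
            }
        }
        for (Node n : siblings) {
            if (n.beginLine() > commentLine) return n.name(); // first element following
        }
        return null; // nothing follows: the comment would be an orphan
    }

    public static void main(String[] args) {
        // 1: @Override
        // 2: // Returns number of vowels in a name
        // 3: public int countVowels(String name) { }
        Node method = new Node("countVowels", 1, 3, List.of(
                new Node("@Override", 1, 1),
                new Node("int", 3, 3),
                new Node("countVowels(String name)", 3, 3)));
        System.out.println(attribute(List.of(method), 2));
    }
}
```

Moving the comment above @Override puts it before the method's range, so the same logic attaches it to the method.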
Up-to-date documentation is available on the project wiki: Comments.