Parse error recovery
Since JavaParser 3.2.7, JavaParser is able to recover from parse errors. What does this mean?
It’s not about semantic errors
This is not about semantic errors. The following code parses correctly, but `ParseResult` will tell you that parsing was not successful, because the class extends more than one superclass. This is a semantic error. The parser is quite flexible in what it accepts, so it will accept more than one superclass. Afterwards, a validation inspects the resulting `CompilationUnit`, checks the number of superclasses, and adds a `Problem` to the `ParseResult` if there is more than one.
```java
class X extends Y,Z {
    //
}
```
How do we, the JavaParser developers, decide whether a certain error should be a parse error or a semantic error? There is no hard rule, but we prefer to let the grammar construct an AST and nothing else, since semantic validations are much easier to write and maintain.
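To make the split between grammar and validation concrete, here is a minimal sketch in plain Java. It is a toy model, not JavaParser's real classes: `ToyClassDecl`, `ToyProblem` and `validateSingleSuperclass` are hypothetical names. The "parser" side simply stores every extended type it sees, and a separate check run afterwards turns the "more than one superclass" case into a problem.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model: ToyClassDecl, ToyProblem and validateSingleSuperclass
// are illustrative names, not JavaParser's API.
public class SemanticValidationSketch {

    // What a permissive grammar produces: any number of extended types is stored.
    static final class ToyClassDecl {
        final String name;
        final List<String> extendedTypes;

        ToyClassDecl(String name, List<String> extendedTypes) {
            this.name = name;
            this.extendedTypes = extendedTypes;
        }
    }

    // Stand-in for a Problem collected into the parse result.
    static final class ToyProblem {
        final String message;

        ToyProblem(String message) {
            this.message = message;
        }
    }

    // Runs after parsing: inspects the finished AST and reports problems
    // instead of making the grammar reject the input.
    static List<ToyProblem> validateSingleSuperclass(ToyClassDecl decl) {
        List<ToyProblem> problems = new ArrayList<>();
        if (decl.extendedTypes.size() > 1) {
            problems.add(new ToyProblem("A class cannot extend more than one class: "
                    + decl.name + " extends " + String.join(", ", decl.extendedTypes)));
        }
        return problems;
    }

    public static void main(String[] args) {
        // Models the AST for: class X extends Y,Z { }
        ToyClassDecl x = new ToyClassDecl("X", List.of("Y", "Z"));
        for (ToyProblem p : validateSingleSuperclass(x)) {
            System.out.println(p.message);
        }
    }
}
```

Because the check is an ordinary function over an already-built tree, it can be tested in isolation and changed without touching the grammar, which is the maintenance advantage described above.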
Parse errors: CompilationUnit level
Here is a parse error:
```java
class X {
```
Uh oh, where is the `}`? Oops! Up until version 3.2.6, this would result in no AST at all. The parser would throw an exception, and we would add a problem for that exception, but we had no AST in our hands. We have started to implement JavaCC’s Deep Error Recovery. Currently we use it in two places, the first being `CompilationUnit`. Now we handle exceptions inside the parser. `CompilationUnit` is the highest-level `Node`, so we should be handling all parse errors now. That means you will never get an empty AST! When we get a parse error, we handle it as follows:
- add the error as a `Problem`.
- skip source code until we find a place where we hope the current node ends.
- create a node with `parsed` set to `UNPARSABLE`. When this is encountered, its other properties (except `range` and `tokenRange`) should be considered invalid.
- continue parsing as if nothing happened.
So to recover from a parse error at the `CompilationUnit` level, we can’t do much except skip to `<EOF>`. That means that, without more work, every parse error results in an AST consisting of a single node: a `CompilationUnit` with no fields set except `range` and `tokenRange`. This is already a step forward: we still have all the tokens, so a user can still see what the source file looked like.
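The recovery steps at this level can be sketched in plain Java. This is a toy model under stated assumptions, not JavaParser's generated parser: the input is already a list of token strings, the toy grammar only accepts `class <Name> { }`, and `ToyCompilationUnit`, `Parsedness`, `parseClass` and `parseCompilationUnit` are hypothetical names (the enum merely mirrors the idea of the `parsed` property). It shows that the recovered node keeps all its tokens while its other fields stay unset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of CompilationUnit-level recovery; none of these
// names are JavaParser's real API.
public class CompilationUnitRecovery {

    enum Parsedness { PARSED, UNPARSABLE }

    static final class ToyCompilationUnit {
        final Parsedness parsed;
        final List<String> tokens; // like tokenRange: valid even when UNPARSABLE
        final String className;    // only meaningful when parsed == PARSED

        ToyCompilationUnit(Parsedness parsed, List<String> tokens, String className) {
            this.parsed = parsed;
            this.tokens = tokens;
            this.className = className;
        }
    }

    static final class ParseException extends Exception {
        ParseException(String message) {
            super(message);
        }
    }

    // Toy grammar: accepts exactly "class <Name> { }".
    static String parseClass(List<String> tokens) throws ParseException {
        if (tokens.size() != 4 || !tokens.get(0).equals("class")
                || !tokens.get(2).equals("{") || !tokens.get(3).equals("}")) {
            throw new ParseException("Parse error: expected 'class <Name> { }'");
        }
        return tokens.get(1);
    }

    static ToyCompilationUnit parseCompilationUnit(List<String> tokens, List<String> problems) {
        try {
            return new ToyCompilationUnit(Parsedness.PARSED, tokens, parseClass(tokens));
        } catch (ParseException e) {
            // 1. Record the error as a problem.
            problems.add(e.getMessage());
            // 2. At this level the only safe place to resume is <EOF>,
            //    so all tokens end up inside the recovered node.
            // 3. Build an UNPARSABLE node: only its tokens are trustworthy.
            // 4. Nothing is left to parse after <EOF>, so we simply return it.
            return new ToyCompilationUnit(Parsedness.UNPARSABLE, tokens, null);
        }
    }

    public static void main(String[] args) {
        List<String> problems = new ArrayList<>();
        // "class X {" with the closing brace missing
        ToyCompilationUnit cu = parseCompilationUnit(List.of("class", "X", "{"), problems);
        System.out.println(cu.parsed + " " + String.join(" ", cu.tokens) + " " + problems);
    }
}
```

Running `main` on `class X {` yields an `UNPARSABLE` node that still holds the tokens `class`, `X` and `{`, plus one recorded problem.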
Parse errors: Statement level
In 3.2.7, one more level of recovery was added: the `Statement` level.
```java
class X{
    int x(){
        X X X;
    }
}
```
Okay, `X X X` makes no sense to Java. To recover, this time we skip to the next `;`, because a simple statement like this one ends with a `;`. To stay consistent with `CompilationUnit` we should add a `Statement` node now, but here we have a problem: `Statement` is abstract and should stay that way. We could add a specific concrete statement like `WhileStmt`, but that makes no sense, so we created a new AST node: `UnparsableStmt`. When asked whether it is parsed, it always returns `UNPARSABLE`. Here we can see the advantage of handling parse errors: you get a more or less complete AST:
- CompilationUnit
  - ClassOrInterfaceDeclaration X
    - MethodDeclaration x
      - BlockStmt
        - UnparsableStmt "X X X;"
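A minimal sketch of statement-level recovery, again as a hypothetical toy model rather than JavaParser's generated parser: the input is the token list of a block body, the only valid statement form is `<Type> <name> ;`, and `ToyStmt`, `ToyVarDeclStmt`, `ToyUnparsableStmt` and `parseStatements` are illustrative names loosely modelled on `UnparsableStmt`. On a parse error it records a problem, skips to and including the next `;`, wraps the skipped text in an unparsable statement node, and keeps going, so later statements in the same block still parse.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of Statement-level recovery; not JavaParser's API.
public class StatementRecovery {

    interface ToyStmt {}

    // The only valid statement in this toy grammar: "<Type> <name> ;"
    static final class ToyVarDeclStmt implements ToyStmt {
        final String type, name;

        ToyVarDeclStmt(String type, String name) {
            this.type = type;
            this.name = name;
        }

        @Override
        public String toString() {
            return "VarDecl " + type + " " + name;
        }
    }

    // Concrete node for statements we could not parse; it only keeps the raw text.
    static final class ToyUnparsableStmt implements ToyStmt {
        final String text;

        ToyUnparsableStmt(String text) {
            this.text = text;
        }

        @Override
        public String toString() {
            return "UnparsableStmt \"" + text + "\"";
        }
    }

    static boolean isIdentifier(String t) {
        return !t.isEmpty() && Character.isJavaIdentifierStart(t.charAt(0));
    }

    // Parses the statements of a block body (the tokens between '{' and '}').
    static List<ToyStmt> parseStatements(List<String> tokens, List<String> problems) {
        List<ToyStmt> stmts = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            int start = pos;
            if (pos + 2 < tokens.size() && isIdentifier(tokens.get(pos))
                    && isIdentifier(tokens.get(pos + 1)) && tokens.get(pos + 2).equals(";")) {
                stmts.add(new ToyVarDeclStmt(tokens.get(pos), tokens.get(pos + 1)));
                pos += 3;
                continue;
            }
            // Parse error: record it, then skip up to and including the next ';'.
            problems.add("Parse error at token " + start + ": '" + tokens.get(start) + "'");
            while (pos < tokens.size() && !tokens.get(pos).equals(";")) {
                pos++;
            }
            if (pos < tokens.size()) {
                pos++; // consume the ';'
            }
            // Rebuild the skipped text, with no space before ';'.
            StringBuilder text = new StringBuilder();
            for (String t : tokens.subList(start, pos)) {
                if (text.length() > 0 && !t.equals(";")) {
                    text.append(' ');
                }
                text.append(t);
            }
            // Keep the text in an unparsable node and continue as if nothing happened.
            stmts.add(new ToyUnparsableStmt(text.toString()));
        }
        return stmts;
    }

    public static void main(String[] args) {
        List<String> problems = new ArrayList<>();
        List<ToyStmt> stmts = parseStatements(List.of("X", "X", "X", ";", "int", "y", ";"), problems);
        System.out.println(stmts);
        System.out.println(problems);
    }
}
```

With the body `X X X; int y;`, the result is an unparsable statement holding `X X X;` followed by a normally parsed `int y` declaration, which is the "more or less complete AST" effect in miniature.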
Conclusion
Although the implementation in 3.2.7 is very, very basic, it defines the pattern to follow when implementing other parse error recovery points. If you’re interested in more thoughts, check the PR