Error recovery in language parsers?

Hi all,

When writing parsers for cursom languages in IntelliJ, is there a general strategy used for error recovery or correction? I've looked at a couple of existing parsers, but it's hard to figure out from the code if there's a particular strategy. In recursive descent parsers, normally when you detect an error either you just skip a token and try again, or skip input up to one of a set of tokens (either static or calculated from the grammar). In IntelliJ the second, at least, is harder due to the fact that you have to build a tree using all tokens that the lexer passes you.

From looking at some of the existing parsers, it seems like an AST node for a particular production is generally created even if the production didn't parse correctly. This seems like a good strategy for simplifying the parser, but you're really just pushing a lot of the complication further down - it means you then need code at all levels of your language plugin to deal with incorrectly formed AST nodes. Would a better strategy be to create a single node for erroneous input (an ASTError node or some such), and only build the PSI tree from correctly-formed elements?

I'd be really interested to hear about experiences with this for larger or more complex grammars.


Please sign in to leave a comment.