GrammarKit: error recovery for list of one thing followed by list of a different thing

I'm working on a parser for Elm, and I'm running into trouble with parse error recovery for top-level declarations. Elm is a Haskell-like language where each file has a list of zero or more imports at the top, followed by zero or more function/value/type declarations. The declarations are delimited using an offside rule, which I have implemented by layering a second lexer on top of the JFlex lexer; it synthesizes the appropriate tokens whenever a significant whitespace transition occurs. For the purpose of this post, we will pretend that there is a ';' token which delimits a declaration.

The problem that I'm having is that when the user starts typing part of the word `import` somewhere in the import section, the parser thinks that the user has begun writing a function declaration, which causes all subsequent import declarations to be marked as errors.

Example Elm input, where the user has typed part of the word `import` on line 3:

import A
;
impo
;
import C
;
import D
;
foo = 42
;
type Foo = String

I would like line 3 to be marked as an error (e.g. "expected 'import', got 'impo'") and the rest of the file to parse correctly. Instead, the lines `import C` and `import D` are also marked as errors, because the parser thinks that we have left the import section and entered the value declaration section.

Here is my grammar:

{
  tokens=[
    space='regexp:\s+'
    number='regexp:\d+(\.\d*)?'
    id='regexp:\p{Alpha}\w*'
  ]
}

Root ::= Module

Module ::= ImportList? ';'? TopDeclList?

private generic_recover ::= !';'

private ImportList ::= Import (';' Import)*
private Import ::= ImportClause {recoverWhile=generic_recover }
ImportClause ::= 'import' id {pin=2}

private TopDeclList ::= TopDecl (';' TopDecl)*
private TopDecl ::= !<<eof>> Declaration {pin=1 recoverWhile=generic_recover }
private Declaration ::= ValueDecl | TypeDecl

ValueDecl ::= id '=' Expression {pin=2}
TypeDecl ::= 'type' id '=' id {pin=2}

Expression ::= id | number

I'm 99% sure that the pinned `!<<eof>>` lookahead on the TopDecl rule is contributing to the problem, since it lets the rule match partial input such as the incomplete word `import`. But if I remove the lookahead, parse error recovery breaks for the function and value declarations.

Any suggestions for how to structure this? I looked at the HaskForce plugin's grammar and tried following their pattern, but still no luck.

Thanks.

2 comments

The way I read this is that *everything* becomes a TopDecl unless it matches ValueDecl or TypeDecl first.  Since you pin TopDecl to not being at EOF, everything matches (excepting the semi-colon because it's part of the recovery rule, so it will be left alone).  Above that line, only the TopDeclList will be matched because it uses the TopDecl that was already matched.  None of the rules in the Import section will match.  (Rules defined later in the file have precedence.)

I don't think you need the `!<<EOF>>` at all.  Just move the recoverWhile to the ValueDecl.  (What you mean by "function declaration" isn't clear, since you don't have any function rules in this sample.)

In the end, the best way to figure out what is going on is to either use the PsiViewer (`-Didea.is.internal=true` on the command line to launch your plugin) or to step your way through the parser in the debugger.  It will take you a couple of hours to figure it all out, but once you do, watching what is actually matched is very informative.

I'm actually seeing similar behavior but without the !<<EOF>> instruction.

For other people who come across this error: I had to do a couple of things that weren't solved by moving the recoverWhile down (especially since that's the precise example given here: https://github.com/JetBrains/Grammar-Kit/blob/master/HOWTO.md#22-using-recoverwhile-attribute).

Firstly, I needed proper pin and recoverWhile attributes on my include declarations. For that, I had to make sure recovery runs to the next include OR DECLARATION.

Secondly, tokens that appeared ahead of my first valid include broke everything, because the parser treated everything after them as an attempted declaration. To handle that, I added an error rule that has nothing but a recoverWhile.

In my case, both of these were needed to prevent the issue in this thread about falling through to the next statement. Take a look at my schema below for an example:

schema ::= pre_include_error? incl* declaration*

private pre_include_error ::= {recoverWhile=incl_recover}

incl ::= INCLUDE string_constant SEMICOLON
  {pin=INCLUDE
   recoverWhile=incl_recover
  }
private incl_recover ::= !(incl_start | decl_start)
private incl_start ::= INCLUDE

declaration ::= namespace_decl
  | type_decl
  | enum_decl
  | union_decl
  | root_decl
  | file_extension_decl
  | file_identifier_decl
  | attribute_decl
  | rpc_decl
  {recoverWhile=decl_recover}
private decl_recover ::= !(decl_start)
private decl_start ::= (NAMESPACE |
  TABLE |
  STRUCT |
  ENUM |
  UNION |
  ROOT_TYPE |
  FILE_EXTENSION |
  FILE_IDENTIFIER |
  ATTRIBUTE |
  RPC_SERVICE)
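
For the simplified Elm grammar in the original question, the same two ideas might be combined roughly as follows. This is an untested sketch, not a confirmed fix. It assumes `';'` is treated as a terminator of each item rather than a separator (so that `recoverWhile` can run from the end of one item to the start of the next), it drops the `!<<eof>>` pin, and it pins `ImportClause` on its first token the way `pin=INCLUDE` does above. The rule names `pre_import_error`, `import_recover`, `decl_recover`, and `decl_start` are made up for illustration. The `id '='` lookahead only identifies a declaration start in this cut-down grammar; real Elm declarations would need a richer predicate.

```bnf
// Untested sketch; the token block is the same as in the question.
Module ::= pre_import_error? Import* TopDecl*

// Empty rule whose only job is to skip stray tokens before the first import.
private pre_import_error ::= {recoverWhile=import_recover}

// Assumption: ';' (or end of file) terminates each item.
private Import ::= ImportClause (';' | <<eof>>) {recoverWhile=import_recover}
ImportClause ::= 'import' id {pin=1}
// Skip tokens until the next import or something that looks like a declaration.
private import_recover ::= !('import' | decl_start)

private TopDecl ::= Declaration (';' | <<eof>>) {recoverWhile=decl_recover}
private Declaration ::= ValueDecl | TypeDecl
private decl_recover ::= !decl_start
// An id that is not followed by '=' (such as a half-typed 'impo')
// is not treated as the start of a declaration.
private decl_start ::= 'type' | id '='

ValueDecl ::= id '=' Expression {pin=2}
TypeDecl ::= 'type' id '=' id {pin=2}

Expression ::= id | number
```

The intent is that a stray `impo ;` gets skipped by the recovery of the preceding import, because it is neither `'import'` nor the start of a declaration, so `import C` and `import D` are still parsed as imports. A declaration that fails before reaching its pin could still end the `TopDecl*` loop, so the pins may need tuning for a real grammar.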
 