Grammar Kit: Synchronize state between lexer and parser in error recovery


I'm currently trying to build a lexer and parser for the Nix Expression Language. The language has a feature known as antiquotation.

"Hello, ${ "world" }!"

As you can see in the example above, `${...}` can be used to put an arbitrary expression into a string. However, curly braces are also used at other locations. As a result, the lexer needs to track curly braces to decide if characters after `}` are part of a string.
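To illustrate the kind of brace tracking described above, here is a minimal sketch in Java. It is not the actual Nix lexer: the class name `AntiquotationLexerSketch`, the `Mode` enum, and the method `stringContent` are hypothetical, and the sketch ignores escape sequences and the other Nix string syntaxes. It keeps a stack of modes so that a `}` only resumes the string when it closes a `${`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the real Nix lexer: tracks nesting with a mode stack.
final class AntiquotationLexerSketch {
    private enum Mode { STRING, ANTIQUOTATION, BRACE }

    /** Returns the characters that are classified as string content. */
    static String stringContent(String input) {
        Deque<Mode> modes = new ArrayDeque<>();
        StringBuilder content = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (modes.peek() == Mode.STRING) {
                if (c == '"') {
                    modes.pop();                      // end of string
                } else if (input.startsWith("${", i)) {
                    modes.push(Mode.ANTIQUOTATION);   // switch to expression mode
                    i += 2;
                    continue;
                } else {
                    content.append(c);                // ordinary string character
                }
            } else if (c == '"') {
                modes.push(Mode.STRING);              // string inside an expression
            } else if (c == '{') {
                modes.push(Mode.BRACE);               // plain brace, e.g. an attribute set
            } else if (c == '}' && !modes.isEmpty()) {
                // Closes either a plain brace or an antiquotation; the mode
                // underneath decides whether the string resumes afterwards.
                modes.pop();
            }
            i++;
        }
        return content.toString();
    }

    public static void main(String[] args) {
        System.out.println(stringContent("\"Hello, ${ \"world\" }!\""));
    }
}
```

The point of the stack is that the lexer cannot decide what a `}` means by looking at the character alone; it has to remember which kind of opening brace it belongs to.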

The parser also tracks curly braces as defined by its grammar. However, I'm wondering how to keep the state synchronized in case of error recovery. As a naive implementation, I could try the following.

antiquotation ::= '{' expr '}' { pin=1 }
expr ::= ... { recoverWhile=antiquotation_recover }
private antiquotation_recover ::= !('}')

Unfortunately, the recover loop of `expr` might consume opening curly braces without adjusting the state of the parser. As a result, the state of the parser goes out of sync with the lexer. I'm not sure how to fix that. Note that the HOWTO.md mentions that the argument for recoverWhile "should be a predicate rule, i.e. leave input intact".

Here is an example of a possible consequence.

"Hello ${
\\ // <- Causes parsing error
{ // <- Will be consumed by error recovery
} // <- Parser thinks the string should continue on this line.
} End" // <- Lexer continues the string on this line.

Are there strategies to avoid such scenarios?

3 comments

While looking at the implementation of GeneratedParserUtilBase.exit_section_impl_, I noticed that Grammar-Kit seems to implicitly try to respect braces during error recovery.

int parenCount = 0;
while ((eatMoreFlag || parenCount > 0) && builder.rawTokenIndex() < lastErrorPos) {
  IElementType tokenType = builder.getTokenType();
  if (state.braces != null) {
    if (tokenType == state.braces[0].getLeftBraceType()) parenCount ++;
    else if (tokenType == state.braces[0].getRightBraceType()) parenCount --;
  }
  if (!(builder.rawTokenIndex() < lastErrorPos)) break;
  state.tokenAdvancer.parse(builder, frame.level + 1);
  eatMoreFlag = eatMore.parse(builder, frame.level + 1);
}

This is strange because it is not documented, and it seems to only respect the first type of braces returned by PairedBraceMatcher.getPairs(), even if BracePair.isStructural() returns false for the first pair.

Unfortunately, this doesn't help in my case because I can only use it to match either `{...}` or `${...}` but not both at the same time.
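The effect of that counter can be shown with a small standalone simulation. This is only a sketch under simplifying assumptions: it does not use any Grammar-Kit classes, tokens are plain strings, the `recoverWhile` predicate is hard-coded as `!('}')`, and the `lastErrorPos` bound is replaced by the end of the token list. The names `RecoverySimulation` and `recover` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of the recovery loop; not Grammar-Kit code.
final class RecoverySimulation {
    /**
     * Skips tokens starting at 'start' and returns the index where recovery stops.
     * With countBraces set, the loop keeps going while a '{' is still unmatched.
     */
    static int recover(List<String> tokens, int start, boolean countBraces) {
        int pos = start;
        int braceDepth = 0;
        // Stand-in for the recoverWhile predicate !('}').
        boolean eatMore = pos < tokens.size() && !tokens.get(pos).equals("}");
        while ((eatMore || braceDepth > 0) && pos < tokens.size()) {
            String token = tokens.get(pos);
            if (countBraces) {
                if (token.equals("{")) braceDepth++;
                else if (token.equals("}")) braceDepth--;
            }
            pos++; // stand-in for the token advancer
            eatMore = pos < tokens.size() && !tokens.get(pos).equals("}");
        }
        return pos;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("\\\\", "{", "}", "}", "End");
        System.out.println(recover(tokens, 0, false)); // stops at the inner '}'
        System.out.println(recover(tokens, 0, true));  // stops at the outer '}'
    }
}
```

Without the counter, recovery stops in front of the inner `}`, which the parser then treats as the end of the antiquotation. With the counter, the inner pair is skipped as a unit and recovery stops in front of the outer `}`, which matches what the lexer expects.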


Johannes, sorry for the late reply – the thread got lost in a queue. Did you manage to handle your case?


Jakub Chrzanowski, thanks for your reply. Yes, it seems to work now.

  1. I split `${` into two separate token types:
     https://github.com/NixOS/nix-idea/blob/9a5c99b4a0ff543f199e143c1477e6347c1505e2/src/main/lang/Nix.flex#L160-L164
  2. I ensured that curly braces (`{` and `}`) are the first pair of braces provided by my brace matcher:
     https://github.com/NixOS/nix-idea/blob/9a5c99b4a0ff543f199e143c1477e6347c1505e2/src/main/java/org/nixos/idea/lang/NixBraceMatcher.java#L13-L24

Since this relies on undocumented (and also kind of inconsistent) behavior of Grammar-Kit, I guess it might break in future versions. However, I also got used to the idea that it is probably not that important.
