Building robust parsers

I'm facing the problem that in my custom language I have many definitions in a row. If parsing one definition fails, the complete AST becomes erroneous.
The situation in detail:
This is the grammar:

ontologyDescription::= definition*
definition ::=  prolog definitionPart?
private definitionPart ::= OPENING_BRACKET definitionBody CLOSING_BRACKET
definitionBody ::= ENTRY*

The language and the psi look like this:
You can see that I have two definitions, SOMETHING and SOMETHING_ELSE, and the AST looks fine. However, when I add a syntax error in SOMETHING, SOMETHING_ELSE breaks as well:
Leaving out the PARENT_REFERENCE causes the error in SOMETHING. Now only one definition is parsed because of that error.
Is there a way to use the recoverWhile or recoverUntil attributes in the grammar so that the correct definitions are still parsed?
Thanks in advance,


I was able to solve the problem by setting the pin attribute correctly.

ontologyDescription::= definition*
definition ::=  prolog definitionPart
prolog::= IDENTIFIER extendsPart?
private extendsPart::=EXTENDS PARENT_REFERENCE {pin=1}
private definitionPart ::= OPENING_BRACKET definitionBody CLOSING_BRACKET
definitionBody ::= ENTRY*

I guess the meaning is the following:
If the pinned token is consumed, the complete rule (extendsPart) is treated as parsed, even if the rest of the rule fails to match. I still have trouble understanding the recoverWhile, recoverUntil and pin attributes.
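
To make the pin idea concrete, here is a minimal hand-written sketch in Java. It is not Grammar-Kit's generated code: the class PinSketch, its fields and the 'pinned' flag are hypothetical names made up for illustration, and tokens are modelled as plain strings. It only shows the difference between failing with a rollback and committing to the rule once the pinned item has been consumed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the pin attribute. This is NOT Grammar-Kit's generated code.
class PinSketch {
    final List<String> tokens;                        // token type names (hypothetical input)
    final List<String> errors = new ArrayList<>();    // collected error messages
    int pos = 0;                                      // index of the next token

    PinSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Consume the next token if it has the expected type.
    boolean consume(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
            pos++;
            return true;
        }
        return false;
    }

    // Models: extendsPart ::= EXTENDS PARENT_REFERENCE {pin=1}
    // The 'pinned' flag switches the pin on or off for comparison.
    boolean extendsPart(boolean pinned) {
        int start = pos;
        if (!consume("EXTENDS")) return false;        // pinned item not reached: plain failure
        if (consume("PARENT_REFERENCE")) return true; // full match
        if (pinned) {
            // Past the pin: report the error but keep what was consumed.
            errors.add("PARENT_REFERENCE expected at token " + pos);
            return true;
        }
        pos = start;                                  // no pin: roll back, the rule fails
        return false;
    }

    public static void main(String[] args) {
        List<String> input = List.of("EXTENDS", "OPENING_BRACKET");
        PinSketch withPin = new PinSketch(input);
        System.out.println(withPin.extendsPart(true) + " " + withPin.errors);
        PinSketch withoutPin = new PinSketch(input);
        System.out.println(withoutPin.extendsPart(false) + " " + withoutPin.errors);
    }
}
```

With the pin, the call records one error and still returns true, so the enclosing definition can go on to its bracket part. Without the pin, the rule rolls back and fails as a whole.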


Grammar-Kit 1.1.5

  • Historical typo fixed: recoverUntil attribute renamed to recoverWhile (indeed it always meant while)

Use 'recoverWhile' on rules that are parsed inside a generated while(true){..} loop, i.e. wherever the grammar contains a repetition such as 'element*'.
Your 'ontologyDescription::= definition*' is such a place.

// This code ends up in your generated parser
  // definition_body_element*
  private static boolean definitionPart_1(PsiBuilder builder_, int level_) {
    if (!recursion_guard_(builder_, level_, "definitionPart_1")) return false;
    int pos_ = current_position_(builder_);
    while (true) {
      if (!definition_body_element(builder_, level_ + 1)) break;
      if (!empty_element_parsed_guard_(builder_, "definitionPart_1", pos_)) break;
      pos_ = current_position_(builder_);
    }
    return true;
  }

Your BNF should look like this:

{
  tokens = [
    EXTENDS = "extends"
    // PARENT_REFERENCE: I don't know what it stands for, so let it be just 'parent'
    PARENT_REFERENCE = "parent"

    Identifier = "regexp:[a-zA-Z$_][a-zA-Z0-9$_]*"
    // ENTRY: I don't know what it stands for, so let it be just a number
    ENTRY = "regexp:-?[0-9]+"
  ]
}

ontology_description::= definition_element*
private definition_element ::= !<<eof>> definition { pin=1 recoverWhile = definition_element_recover }
// pin=1 means: if the lexer is not at the end of the file, we are definitely inside 'definition_element'
private definition_element_recover ::= !( prolog | '[' )
definition ::= prolog definitionPart {pin=1}
prolog::= Identifier extendsPart?
private extendsPart ::= "extends" parent_reference {pin=1}
parent_reference ::= "parent"
private definitionPart ::= "[" definition_body_element* "]" { pin=1 }
private definition_body_element ::= entry { recoverWhile=definition_body_element_recover }
private definition_body_element_recover ::= !("]" | definition_element)
entry ::=ENTRY
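
As a rough illustration of what recovery buys you in a 'definition*' loop, here is a small hand-written Java sketch. It is a simplified model, not Grammar-Kit's generated code or its exact algorithm: the class RecoverSketch, the 'recover' flag and the string tokens are hypothetical, and the recover predicate is reduced to "skip while the next token is not IDENTIFIER", which is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of recoverWhile on a 'definition*' loop. NOT Grammar-Kit's generated code.
class RecoverSketch {
    final List<String> tokens;                        // token type names (hypothetical input)
    final List<String> errors = new ArrayList<>();    // collected error messages
    int pos = 0;                                      // index of the next token

    RecoverSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    boolean at(String type) {
        return pos < tokens.size() && tokens.get(pos).equals(type);
    }

    boolean consume(String type) {
        if (!at(type)) return false;
        pos++;
        return true;
    }

    boolean fail(String message) {
        errors.add(message + " at token " + pos);
        return false;
    }

    // definition ::= IDENTIFIER (EXTENDS PARENT_REFERENCE)? OPENING_BRACKET ENTRY* CLOSING_BRACKET
    // No rollback on failure: the parser stays where the error happened.
    boolean definition() {
        if (!consume("IDENTIFIER")) return fail("IDENTIFIER expected");
        if (consume("EXTENDS") && !consume("PARENT_REFERENCE")) return fail("PARENT_REFERENCE expected");
        if (!consume("OPENING_BRACKET")) return fail("OPENING_BRACKET expected");
        while (consume("ENTRY")) { /* ENTRY* */ }
        if (!consume("CLOSING_BRACKET")) return fail("CLOSING_BRACKET expected");
        return true;
    }

    // Returns the number of definitions that parsed without errors.
    int parseAll(boolean recover) {
        int complete = 0;
        while (pos < tokens.size()) {                 // plays the role of !<<eof>>
            boolean ok = definition();
            if (ok) complete++;
            if (!recover) {
                if (!ok) break;                       // no recovery: the loop ends at the first error
                continue;
            }
            // Simplified recover predicate !(IDENTIFIER): skip until a definition can start.
            while (pos < tokens.size() && !at("IDENTIFIER")) pos++;
        }
        return complete;
    }

    public static void main(String[] args) {
        // SOMETHING lacks its PARENT_REFERENCE, SOMETHING_ELSE is correct.
        List<String> input = List.of(
            "IDENTIFIER", "EXTENDS", "OPENING_BRACKET", "ENTRY", "CLOSING_BRACKET",
            "IDENTIFIER", "OPENING_BRACKET", "ENTRY", "CLOSING_BRACKET");
        System.out.println(new RecoverSketch(input).parseAll(true));
        System.out.println(new RecoverSketch(input).parseAll(false));
    }
}
```

With recovery switched on, the broken first definition produces one error and the second definition is still parsed. With recovery switched off, the loop stops at the first error and the second definition is never reached.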

