Building robust parsers
Hello,
I'm facing the problem that in my custom language I have many definitions in a row. If parsing one definition fails, the complete AST becomes erroneous.
The situation in detail:
This is the grammar:
ontologyDescription::= definition*
definition ::= prolog definitionPart?
prolog::= IDENTIFIER (EXTENDS PARENT_REFERENCE)?
private definitionPart ::= OPENING_BRACKET definitionBody CLOSING_BRACKET
definitionBody ::= ENTRY*
The language and the psi look like this:
You can see that I have two definitions, SOMETHING and SOMETHING_ELSE, and the AST looks fine. However, when I add a syntax error in SOMETHING, SOMETHING_ELSE breaks as well:
Leaving out the PARENT_REFERENCE causes the error in SOMETHING. Now only one definition is parsed because of that error.
Is there a way to use recoverWhile or recoverUntil attributes in the grammar to parse the correct definitions?
Thanks in advance,
Sebastian
I was able to solve the problem by setting the pin attribute correctly.
ontologyDescription::= definition*
definition ::= prolog definitionPart
prolog::= IDENTIFIER extendsPart?
private extendsPart::=EXTENDS PARENT_REFERENCE {pin=1}
private definitionPart ::= OPENING_BRACKET definitionBody CLOSING_BRACKET
definitionBody ::= ENTRY*
I guess the meaning is the following:
once the pinned token is consumed, the whole rule (extendsPart) is treated as parsed, even if the rest of it is missing. I still have a hard time understanding the recoverWhile, recoverUntil and pin attributes.
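To make that guess about pin concrete, here is a minimal hand-written sketch in Java. It is not the code Grammar-Kit generates; the class PinSketch, its fields and the string tokens are all hypothetical and only illustrate the idea for extendsPart ::= EXTENDS PARENT_REFERENCE, parsed once with pin=1 and once without.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of pin semantics; not Grammar-Kit's generated parser.
final class PinSketch {
    final List<String> tokens;                      // plain strings stand in for lexer tokens
    final List<String> errors = new ArrayList<>();
    int pos = 0;

    PinSketch(List<String> tokens) { this.tokens = tokens; }

    boolean consume(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) { pos++; return true; }
        return false;
    }

    // extendsPart ::= EXTENDS PARENT_REFERENCE, with or without pin=1
    boolean extendsPart(boolean pinned) {
        int start = pos;
        if (!consume("EXTENDS")) return false;      // pinned item not reached: plain failure
        if (consume("PARENT_REFERENCE")) return true;
        if (pinned) {
            // Pin reached: keep the consumed EXTENDS, record an error, count the rule as matched.
            errors.add("PARENT_REFERENCE expected at token " + pos);
            return true;
        }
        pos = start;                                // no pin: roll back and fail
        return false;
    }

    public static void main(String[] args) {
        List<String> input = List.of("EXTENDS", "OPENING_BRACKET");
        PinSketch withPin = new PinSketch(input);
        System.out.println(withPin.extendsPart(true) + " " + withPin.pos + " " + withPin.errors);
        PinSketch withoutPin = new PinSketch(input);
        System.out.println(withoutPin.extendsPart(false) + " " + withoutPin.pos + " " + withoutPin.errors);
    }
}
```

With the pin, the incomplete "extends" still yields a matched rule plus an error, so the enclosing definition can go on to its opening bracket. Without it, the rule rolls back and the enclosing rule has to deal with the leftover EXTENDS token.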
Grammar-Kit 1.1.5
Use 'recoverWhile' in places where a while (true) {..} loop is generated in the parser,
e.g. wherever you describe an 'element*', as you did here with 'ontologyDescription::= definition*'.
// This code is placed in your generated parser
// definition_body_element*
private static boolean definitionPart_1(PsiBuilder builder_, int level_) {
  if (!recursion_guard_(builder_, level_, "definitionPart_1")) return false;
  int pos_ = current_position_(builder_);
  while (true) {
    if (!definition_body_element(builder_, level_ + 1)) break;
    if (!empty_element_parsed_guard_(builder_, "definitionPart_1", pos_)) break;
    pos_ = current_position_(builder_);
  }
  return true;
}
Your BNF should look like this:
{
  tokens = [
    EXTENDS = "extends"
    OPENING_BRACKET = "["
    CLOSING_BRACKET = "]"
    // PARENT_REFERENCE: I don't know what it stands for, so let it be just 'parent'
    PARENT_REFERENCE = "parent"
    Identifier = "regexp:[a-zA-Z$_][a-zA-Z0-9$_]*"
    // ENTRY: I don't know what it stands for, so let it be just a number
    // ('+' instead of '*' so that the token can never match an empty string)
    ENTRY = "regexp:-?[0-9]+"
  ]
}
ontology_description ::= definition_element*
private definition_element ::= !<<eof>> definition { pin=1 recoverWhile=definition_element_recover }
// pin=1 means: if the lexer is not at the end of the file, we are definitely inside 'definition_element'
private definition_element_recover ::= !( prolog | '[' )
definition ::= prolog definitionPart { pin=1 }
prolog ::= Identifier extendsPart?
private extendsPart ::= "extends" parent_reference { pin=1 }
parent_reference ::= "parent"
private definitionPart ::= "[" definition_body_element* "]" { pin=1 }
private definition_body_element ::= entry { recoverWhile=definition_body_element_recover }
private definition_body_element_recover ::= !("]" | definition_element)
entry ::= ENTRY
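To show what the recovery in this grammar buys you, here is a rough hand-written sketch in Java. It is not Grammar-Kit output; the class RecoverSketch and its helpers are hypothetical, tokens are plain strings, and PARENT_REFERENCE and ENTRY are assumed to be 'parent' and a number as in the tokens above. Only the definition-level recovery is modelled: after each definition attempt, tokens are skipped until something that can start a definition (an identifier or "[") shows up, which is roughly what definition_element_recover expresses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of recoverWhile-style skipping; not Grammar-Kit's generated parser.
final class RecoverSketch {
    final List<String> tokens;
    final List<String> definitions = new ArrayList<>(); // names of definitions that were recognised
    final List<String> errors = new ArrayList<>();
    int pos = 0;

    RecoverSketch(List<String> tokens) { this.tokens = tokens; }

    boolean eof() { return pos >= tokens.size(); }
    boolean at(String text) { return !eof() && tokens.get(pos).equals(text); }
    boolean consume(String text) { if (!at(text)) return false; pos++; return true; }
    boolean atNumber() { return !eof() && tokens.get(pos).matches("-?[0-9]+"); }

    boolean atIdentifier() {
        if (eof()) return false;
        String t = tokens.get(pos);
        return t.matches("[a-zA-Z$_][a-zA-Z0-9$_]*") && !t.equals("extends") && !t.equals("parent");
    }

    // ontology_description ::= definition_element*
    void ontologyDescription() {
        while (!eof()) {
            int start = pos;
            definition();
            // Recovery step: whatever the result was, skip tokens that cannot start a definition.
            while (!eof() && !atIdentifier() && !at("[")) pos++;
            // No progress at all: stop instead of looping forever
            // (the role empty_element_parsed_guard_ plays in the generated code).
            if (pos == start) break;
        }
    }

    // definition ::= Identifier ("extends" "parent")? "[" number* "]", pinned on the identifier
    boolean definition() {
        if (!atIdentifier()) return false;
        String name = tokens.get(pos++);
        definitions.add(name);                      // pinned: the definition exists even if incomplete
        if (consume("extends") && !consume("parent")) {
            errors.add(name + ": 'parent' expected after 'extends'");
        }
        if (!consume("[")) { errors.add(name + ": '[' expected"); return true; }
        while (atNumber()) pos++;
        if (!consume("]")) errors.add(name + ": ']' expected");
        return true;
    }

    public static void main(String[] args) {
        RecoverSketch parser = new RecoverSketch(List.of(
            "SOMETHING", "extends", "[", "1", "]",
            "SOMETHING_ELSE", "extends", "parent", "[", "2", "]"));
        parser.ontologyDescription();
        System.out.println(parser.definitions);
        System.out.println(parser.errors);
    }
}
```

In this sketch the broken SOMETHING is reported as an error, yet SOMETHING_ELSE is still recognised as its own definition, which is the behaviour the question asks for.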
https://github.com/JetBrains/Grammar-Kit/blob/master/TUTORIAL.md explains it nicely IMHO