Expression handling in grammar-kit - issue with deep trees
I am trying to write a parser for the D programming language using grammar-kit and have made good progress, but I am having an issue with their specification for expressions.
Their language spec specifies that expressions are like this:
Expression ::= CommaExpression
CommaExpression ::= AssignExpression ( "," CommaExpression)?
AssignExpression ::= ConditionalExpxression | (ConditionalExpression "=" AssignExpression) | // ...
ConditionalExpression ::= OrOrExpression | (OrOrExpression "?" Expression ":" ConditionalExpression)
... and like twenty other downward expressions until it reaches...
PrimaryExpression ::= INTEGER_LITERAL | IDENTIFIER | // ...
This means that, for something as simple as this:
int x = 0;
, the content to the right of the assignment operator and before the semicolon, (literally just the zero), has about twenty nested levels deep of PSI elements. Is there some way I can have a single root psi element (Expression) that has a string identifier on it corresponding to the type of expression it contains? Or does somebody know a better solution to this?
请先登录再写评论。
Please verify you followed https://github.com/JetBrains/Grammar-Kit/blob/master/HOWTO.md 3.1ff to make PSI hierarchy flatter.
Yup, I have been over section 2.4 many times over the last few weeks and nothing I did seemed to "activate" the special expression handling, and I figured it was because my expressions are of the form:
instead of:
which fundamentally changes the syntax. I omitted the extends attribute on each of the expressions earlier and probably shouldn't have as that is relevant info. Even after I did that, I was not able to utilize the left recursion like they said, and the tree depth was still huge, so I assume the fundamental structure of the language didn't allow me to use grammar-kit's built in handling for expressions.
Could you link/provide the full grammar file?
Sure, this is the full BNF. I had to eliminate certain instances of left recursion, but beyond that, the grammar is mostly the same as the spec.
(Don't worry, I will go through and replace the tokens like DOT_OP with "." later. Didn't realize I could at the time. There is also plenty of optimization to be done, but before that I wanted to make sure this was even possible with this language.)
For expression parsing one can use Pratt parsing approach (the simplest bottom-up parsing case when it becomes just a loop).
This is exactly what you need in that case. Expression parsing is described here:
https://github.com/JetBrains/Grammar-Kit/blob/master/HOWTO.md#24-compact-expression-parsing-with-priorities
Okay, I'm still not quite getting what you are saying. Yes, I need to use Pratt parsing. I thought this was a feature in grammar-kit for handling expression rules explicitly by following the rules laid out in section 2.4. The rules to "enable" this logic are:
1. All "expression" rules should extend "the root expression rule". ( they do)
2. priority increases from top to bottom. (not really a rule, but it is laid out like that in my bnf)
3. use left recursion for binary and postfix expression. (tried it both ways with left recursion and left recursion eliminated with no success)
4. private rules to define a group of operators with the same priority. (does this need to be done for Pratt parsing to be applied?)
5. Use rightAssociative attribute when the default left associativity is not appropriate (same here, is this a requirement to activate grammar kits special handling?)
Are there explicit rules somewhere to follow for grammar kit to apply the logic laid out in that section?
I use this QuickDoc popup to check what rules and tokens violate expression parsing contract.
It shall say "Rule contains no public rules and no tokens" at the bottom and display priority table.
If everything is laid out as described and you still get "rule employs left recursion" please check the root expression rule QuickDoc.
Also feel free to compare your code with GK test grammar (just copy/paste it to scratch file) :
https://github.com/JetBrains/Grammar-Kit/blob/master/testData/generator/ExprParser.bnf#L26
Sorry about the length of time between responses, I don't get much time these days to code for fun.
So I have never gotten or seen that QuickDoc warning "Rule contains no public rules and no tokens". However, when I specify an expression rule with left recursion, it gives me the warning that left recursion is not supported. When I switch all my expressions that originally used left recursion back to using left recursion and generate the parser, it actually succeeds. However, when I run the plugin and open a project with an open D file, the IDE completely hangs on the Opening Editors... step using 100% CPU, (and my computer actually overheated and died once).
HOWEVER, I found that when I switch only the first rule that originally used left recursion, and keep the rest of the rules with left recursion eliminated, the project loads properly, and that expression does not show in the PSI tree.
The only change to the BNF above was:
And I deleted OrOrExpressionPrime. But then the moment I convert the next rule down, AndAndExpression to use left-recursion, then it hangs. So clearly, it's trying to use Pratt parsing because the OrOrExpression was ommitted from the Psi tree, but something about that AndAndExpression rule is killing the editor. Any idea what's up?
When I change the AndAndExpression rule to use left recursion, it looks like this:
Which causes the hang. The full BNF which causes the hang on run is here and my expressions start at line 89.
Here's the snippet of how Pratt-parsing grammar shall look like in your case
(and the QuickDoc shows the Priority Table part now on the screenshot):
Wouldn't changing the language like that change the set that it accepts? I worked a couple of examples by hand but could not produce an expression that would not be accepted by both grammar subsets. I'm not quite sure how to prove that they are equivalent though.
I converted the grammar to be exactly like the examples specified. All the warnings dissappeared as you said, the parser generation time vastly decreased, and the comments you showed in the generated parser file appeared above the root expression rule, so I assume it somewhat worked.
However, I'm still running into the issue from before. When I run the plugin, (opens another instance of intellij where my plugin is active), and try to load a dlang file, it shows the file and begins loading and then freezes with 100% cpu usage until I kill it. It occasionally generates a freeze log, but the stack trace does not explicitly enter my project. Is this something with my language, or an issue with grammar kit, or an issue with intellij? I'm not quite sure where to look from here.
Updated BNF for now: https://gist.github.com/FatalCatharsis/aad0b72c3ffed144c3d61ee73dcb283f
Stack trace in a freeze log: https://gist.github.com/FatalCatharsis/d48809dbadae465c764261fe1ec8eb04
Updating for posterity. The cause of the freeze, (infinite loop), was placing the conditional value in the comma expression at the root of the expression tree:
Removing the conditional at the end and making it,
technically changes the language and stops productions like:
but no longer caused the freeze. I have yet to run into another issue of productions not being accepted by pratt parsing of expressions. Thanks for your help.