Speeding up negative look-ahead with GrammarKit-based parsers


For IntellIJ Elixir, I have a GrammarKIt based grammar, https://github.com/KronicDeth/intellij-elixir/blob/825344da93f6c446b3fcdf18295a7dca7337a91d/src/org/elixir_lang/Elixir.bnf.  It works well and I even found some bugs in the native parser from testing it, but the issue is that Elixir contains anonymous functions and other constructs that can have multiple clauses, but the start of each looks the same as an expression until you hit the `->`, like so:

anonymous = fn

  [] ->
    IO.puts "line 1 of empty"
    IO.puts "line 2 of empty"
  [h | t] ->
    IO.puts "line 1 of non-empty"
    IO.puts "line 2 of non-empty"

My look-ahead is that each body expression (the IO.puts, :empty, and :nonempty) lines is

private stabBodyExpression ::= !stabOperationPrefix expression

While `stabOperationPrefix` is


private stabOperationPrefix ::= stabParenthesesSignature stabInfixOperator |
stabNoParenthesesSignature stabInfixOperator |


So, a body expression is anything that isn't the start of a new line.  The problem is that `stabNoParenthesesSignature` looks a lot like a normal expression line because if it contains no comma (that is the clause takes a single argument) then it is exactly a normal expression and only becomes the arguments to the clause of the `stabInfixOperator` after it.

Are there any techniques I can use to either eliminate the negative look-head or to make it faster?  Users have noticed performance problems at only 7 nesting levels as reported here https://github.com/KronicDeth/intellij-elixir/issues/580.  I don't think anyone should nesting anonymous functions that much, but the editor also becomes unresponsive to Enter when they do that.

Comment actions Permalink

I use these approaches:

1. use the simplest lexer/token based lookaheads

2. minimize rollbacks by keeping already parsed parts via left rules, or manual frame.marker remapping

3. making parsing  less strict yet more simple by moving complex logic to Annotator or some inspection


The "stab" branches look very similar and contain complex expression logic, so:

the 1st is not very helpful, the 2nd is may be, the 3rd way looks pretty much applicable.


The 3rd approach is often overlooked but is very viable in IDE-grade parsers, i.e. consider editing the first expression in a long list so the way it is parsed quickly changes from one bnf branch to the other. In IDE it is better to add one error or inspection warning and keep the rest expressions on the list parsed and colored.

Comment actions Permalink

For the 2nd approach, I'm not really sure how to do the left rules since the start of stab matches matchExpression, which has to be the way it is to trigger the Pratt Parser detection.  Could you explain the "manual frame.marker remapping" part more?

Comment actions Permalink

Manual element type remapping:


1. PsiBuilder API way: 

rule ::= some <<remap>> // remap after "some" is successfully parsed

public static boolean remap(builder, level) {

  m = builder.getLatestDoneMarker()





2. GPUB ErrorState frame way (frames can be traversed up the stack)

some ::= .... <<remap>> // remap current frame element type

public static boolean remap(builder, level) {

ErrorState.get(builder).currentFrame.elementType = CUSTOM



3. GPUB Section Hooks way

some ::= .... { hooks=[remap] }

public static final Hook<WhitespacesAndCommentsBinder[]> REMAP = (builder, marker, param) -> {
  if (marker == null) return null

  m = marker.precede()



  return m


Notes: remap() method and REMAP hook constant shall be statically imported in a parser


Please sign in to leave a comment.