GrammarKit - struggling with statement termination and inline functions



I'm struggling to add support for lambdas (inline functions) to my BNF grammar.

I'll simplify as much as possible.

The language is indentation-based with optional semicolons, so the lexer generates endStmt tokens when it encounters new lines (under certain conditions). That works fine; for example:

var a = 1

results in [var, id, eq, number, endStmt]
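For reference, the token stream above can be reproduced with a small stand-alone sketch. This is plain Python, not the plugin's real lexer (which is not shown in this post). The token names follow the ones used here, the regular expressions are assumptions, and the sketch emits endStmt for every semicolon or newline instead of applying the real conditions.

```python
import re

# Hypothetical token definitions, for illustration only.
# Order matters: the keyword "var" must be tried before the generic "id".
TOKEN_SPEC = [
    ("var", r"var\b"),
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("eq", r"="),
    ("endStmt", r";|\n"),   # simplification: every ';' or newline ends a statement
    ("skip", r"[ \t]+"),    # spaces and tabs produce no token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return the list of token names found in text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "skip":
            tokens.append(match.lastgroup)
        pos = match.end()
    return tokens


print(tokenize("var a = 1\n"))
```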

The problem is that once lambdas are added, those endStmt tokens are consumed by the statements inside the lambda, and then the variable declaration fails because it is not closed properly, e.g.:

var a = func():
    return 1

with a grammar something like:

varDecl ::= VAR id EQ expr
lambda ::= FUNC "(" ")" statement+

statement ::= expr endStmt | varDecl | ...
expr ::= lambda | ...

The statement inside the lambda consumes that line end, so its tokens are [return, number, endStmt], but then the variable declaration is stuck with [var, id, eq, expr, <missing endStmt>]

I'm not sure how to approach this problem, since the lexer does not care (and should not care, I believe) that when an inline function is the last part of a statement, two endStmt tokens are suddenly required.

Is there some way to, for example, tell the PsiParser to give back that one last token when a lambda has been successfully parsed, or to add it on the fly? Or is there any other approach?
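One possible direction, sketched here as a hand-written recursive-descent parser in Python rather than in Grammar-Kit: the statement terminator succeeds without consuming anything when the token consumed just before it was already an endStmt. That way the last statement of the lambda and the enclosing declaration share one terminator. The class name `Parser`, the method `end_of_statement`, and the token names `lparen`, `rparen`, `colon` and `return` are hypothetical. The lambda body here greedily takes every following statement because this simplified token stream carries no indentation information. Grammar-Kit does allow external rules implemented as static methods in a parser util class, so a similar check might be expressible there, but I have not verified that against this grammar.

```python
# Tokens that can begin a statement in this simplified language (assumption).
STATEMENT_START = {"var", "return", "func", "number", "id"}


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind} at token {self.pos}, got {self.peek()}")
        self.pos += 1

    def end_of_statement(self):
        if self.peek() == "endStmt":
            self.pos += 1          # normal case: consume the terminator
        elif self.pos > 0 and self.tokens[self.pos - 1] == "endStmt":
            return                 # "virtual" terminator: a nested statement already consumed it
        else:
            raise SyntaxError(f"missing endStmt at token {self.pos}")

    def statement(self):
        if self.peek() == "var":
            for kind in ("var", "id", "eq"):
                self.expect(kind)
            node = ("varDecl", self.expr())
        elif self.peek() == "return":
            self.expect("return")
            node = ("return", self.expr())
        else:
            node = ("exprStmt", self.expr())
        self.end_of_statement()
        return node

    def expr(self):
        kind = self.peek()
        if kind == "func":
            return self.lambda_()
        if kind in ("number", "id"):
            self.pos += 1
            return (kind,)
        raise SyntaxError(f"unexpected token {kind} at {self.pos}")

    def lambda_(self):
        for kind in ("func", "lparen", "rparen", "colon"):
            self.expect(kind)
        body = [self.statement()]                 # statement+
        while self.peek() in STATEMENT_START:
            body.append(self.statement())
        return ("lambda", body)


# var a = func():
#     return 1
tokens = ["var", "id", "eq", "func", "lparen", "rparen", "colon",
          "return", "number", "endStmt"]
parser = Parser(tokens)
tree = parser.statement()
print(tree, parser.pos == len(tokens))
```

A plain `var a = 1` with no endStmt at all is still rejected, because the token before the terminator position is then a number, not an endStmt.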

I cannot just remove that endStmt from the lambda, since a lambda can have any number of lines and any nesting, so the terminator is still required, e.g.:

var a = func():
    var b = 2; var c = 3
    return 1

In this example the lexer still creates 3 endStmt tokens for the lambda, which has 3 statements, but the last statement consumes the endStmt of the last line, and then I'm missing it for the "a" variable, which again sees: [var, id, eq, expr, <missing endStmt>]

Only this time, that expr spans from "func" up to "return 1\n".


Correct me if I'm wrong, but after digging into it: the BNF grammar used with Grammar-Kit is only context-free, while the grammar I'm trying to create is not, since statements inside an inline function are a little different, e.g.:

call_fn(func(): return 1)

"return 1" is a statement that should always be terminated by a semicolon or a new_line, but once it is used inside an inline function like this, it is terminated by the ")" that belongs to the call statement, so its termination depends on context.

Would that mean I cannot rely on the BNF grammar and have to write a custom parser?
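It may not have to come to that. A terminator that either consumes an endStmt or merely looks ahead at a ")" without consuming it would cover the call_fn case, and lookahead of that kind does not require a context-sensitive parser. Grammar-Kit's BNF has and-predicates (`&`), so something along the lines of `endStmt | &")"` might express it, though that is untested against the real grammar. Below is a minimal stand-alone sketch of the idea in Python; the function names and the token names `lparen`, `rparen`, `colon` and `return` are hypothetical.

```python
STATEMENT_START = {"return", "func", "id", "number"}


def peek(tokens, pos):
    return tokens[pos] if pos < len(tokens) else None


def expect(tokens, pos, kind):
    if peek(tokens, pos) != kind:
        raise SyntaxError(f"expected {kind} at token {pos}, got {peek(tokens, pos)}")
    return pos + 1


def parse_stmt_end(tokens, pos):
    """stmt_end ::= endStmt | &rparen"""
    if peek(tokens, pos) == "endStmt":
        return pos + 1     # consume the real terminator
    if peek(tokens, pos) == "rparen":
        return pos         # lookahead only: the ')' stays for the enclosing call
    raise SyntaxError(f"statement not terminated at token {pos}")


def parse_statement(tokens, pos):
    """statement ::= ("return" expr | expr) stmt_end"""
    if peek(tokens, pos) == "return":
        pos += 1
    pos = parse_expr(tokens, pos)
    return parse_stmt_end(tokens, pos)


def parse_expr(tokens, pos):
    kind = peek(tokens, pos)
    if kind == "func":
        # lambda ::= func "(" ")" ":" statement+
        for expected in ("func", "lparen", "rparen", "colon"):
            pos = expect(tokens, pos, expected)
        pos = parse_statement(tokens, pos)
        while peek(tokens, pos) in STATEMENT_START:
            pos = parse_statement(tokens, pos)
        return pos
    if kind == "id" and peek(tokens, pos + 1) == "lparen":
        # call ::= id "(" expr ")"
        pos = parse_expr(tokens, pos + 2)
        return expect(tokens, pos, "rparen")
    if kind in ("id", "number"):
        return pos + 1
    raise SyntaxError(f"unexpected token {kind} at {pos}")


# call_fn(func(): return 1)
tokens = ["id", "lparen", "func", "lparen", "rparen", "colon",
          "return", "number", "rparen", "endStmt"]
end = parse_statement(tokens, 0)
print(end == len(tokens))
```

The sketch returns the position after the parsed statement, so a caller still has to check that all tokens were consumed; a stray ")" at top level would otherwise go unnoticed.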


Hello, just a reminder so it won't be forgotten.

I'd be glad for any tips.


The first thing that comes to mind is to move endStmt out of the statement rule. It is not part of the statement; it is part of the markup.

So it is like:

program ::= statement (semicolon ? endStmt statement)*

That would create a new issue of splitting statements somehow, as not all of them end with a semicolon, for example an if condition:

if true:
    asd()

Here endStmt closes the asd() call, but the whole if condition is not closed by yet another endStmt token. So if I moved endStmt out of the statement rule, I would have to create 2 different groups of statements, where one ends with it and the other does not.


Also, I don't think this would solve the issue with lambdas in any way.

I'm also thinking about LSP, since IntelliJ 2023.2 should support it officially and this scripting language offers a language server as well. However, there is still no documentation for it, and I'm not sure how it actually interacts with the rest of the plugin. For example, if it is used instead of the parser, do I still have to create my own PsiElements and map whatever tokens the LSP returns, or will the LSP implementation create its own PsiElements that I can work with? And there are many more questions like that.


Oh, OK. It seems that it's mostly separate from all the other features, not integrated like I thought.

