JavaCC plugin: embed Java code with file view provider

Answered

Hi, I've recently been developing a plugin for the JavaCC parser generator and would like to ask for some advice about file view providers.

JavaCC files heavily mix Java code into JavaCC's DSL. Java code can be interspersed in many different places, and JavaCC constructs all more or less map to some Java construct, e.g. each nonterminal declaration corresponds to one Java method in the parser file. For now I use a very complicated language injection scheme to inject Java into the blocks while respecting the structure of the BNF file. It technically works quite well, but it's *atrociously* slow because it injects into the whole file. Using file view providers is a solution I hadn't thought about at the time, but this seems like the right use case for them.

You probably know what a JavaCC grammar looks like, but just for reference:


```java

PARSER_BEGIN(MyParser)

// This is Java code

public class MyParser {
// some declarations are injected here by JavaCC,
// like some parsing methods and a field for the lexer
// as well as a method for each nonterminal production
// the JJTree preprocessor also inserts `implements` clauses!

void foo() { // user can define some additional declarations

}

}

PARSER_END(MyParser)

// The rest of the file is a list of productions
// Each production is translated to a java method in the parser file

void FooProduction(): // this is a BNF production
{
// this block is copied at the beginning of the generated method
// it can contain e.g. local variable declarations
int barResult = 0;
}
{ // this block contains a BNF-like expansion
"a"
| barResult = BarProduction() // the result of calling the generated BarProduction method is stored into the variable barResult.
| "foo" { foo(); } // the braced block is java code, it's copied into the method at the point after the expansion "foo" is matched. It's called a parser actions block.

}


JAVACODE
int BarProduction() {
// this is a Javacode production
// it defines a nonterminal that is visible to the rest of the productions of the grammar
// the body is arbitrary Java code
// the method is simply copied into the parser file
return 0; // placeholder so the generated method compiles
}

```

So a JavaCC file should have several PSI trees:
* A main JavaCC tree
* A Java tree for the parser file
* A Java tree for the token manager file (the problem is exactly the same as for the parser file, so we don't need to discuss it here; in some contexts the Java blocks are injected into the lexer file rather than the parser file).

One thing I'm concerned about is that with file view providers, all the trees should have the same text. This is mentioned here. The problem is that, for example, the nonterminal declarations are obviously part of the JavaCC tree, but the Java tree should know that they're methods of the class for references to work properly...

So if I use a file view provider to host the several trees separately:

* Will I get better performance compared to using injection? For now I use a multi-host injector which injects all the blocks of the file as if they were in a single Java file. Constructing the prefixes and suffixes for each host is very slow because it processes the whole grammar. Each time a block (e.g. a parser actions unit) is added, the whole file has to be reinjected, so for real-world-sized grammars the new block is not injected until well over a minute later. This is obviously not acceptable, and the only real added value this injection scheme provides is when editing hosts that are already present (which is already somewhat useful). I imagine file view providers would perform better, but if not then it probably makes no sense to refactor.
* Will it be possible to "inject" declarations into the Java tree that don't have corresponding Java text in the file? I don't see how that can be done if all the JavaCC code is merged into a single outer language element in the Java tree...
  * For example, in the PARSER_BEGIN block above, the code in methods should know about the declarations of all production methods and also the implicit declarations inserted by JavaCC. The call `foo();` in the parser actions block in `FooProduction` should refer to the method `foo()` defined in the PARSER_BEGIN section.
* Is it possible to influence the parsing of the Java fragments based on the outer JavaCC context? E.g. in the following:
```java

int A(boolean cond): {} {...}

void B():
{Point a = new Point(0,0);}
{
a.x = A(Foo.someBooleanMethod())
}
```

The call to `A(...)` should be part of the JavaCC tree (it's a reference to the production `A`), but the argument `Foo.someBooleanMethod()` should be part of the Java tree and behave like Java code when the caret is within the call parentheses. Similarly, the `a.x` on the left-hand side of the assignment is Java code, but the whole expansion `a.x = A(...)` is a JavaCC construct. Also, the declaration block at the beginning of `B` is parsed like a block, so any local declarations are allowed, but the left-hand side of the assignment should only accept valid Java assignment targets, and the method arguments should only accept Java expressions.

Thanks for any help!

1 comment
Official comment

1. Yes, you'll likely get better performance with multi-file view providers. That said, if your injection is slow because of platform code, we'd appreciate a performance ticket in the tracker so we can try to fix it. If you inject into the whole BNF file, which is already parsed, why would calculating the prefixes/suffixes take long? I'd expect it to be a relatively simple function of the BNF tree, but I might be wrong.

2. Yes, you can add synthetic Java declarations. There are different ways of doing that. One is to add their text "under" outer BNF elements using the TemplateDataElementType.RangeCollector API. Another is to construct light methods/fields and return them at resolve time using the PsiAugmentProvider API.

3. "Is it possible to influence the parsing of the Java fragments based on the outer JavaCC context?" If I understand correctly and "a.x = A(Foo.someBooleanMethod())" is Java, then BNF "A(", then Java again, then BNF ")" again, then that looks hard, because you have to separate the different languages at the lexer level. I'm not sure how feasible that is in JavaCC. You could try to pretend it's all Java, just with a special declaration "A" that plays nicely with what's inside.
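
To make the first option in point 2 more concrete, here is a rough, platform-independent sketch of the bookkeeping it implies: the text handed to the Java parser is assembled from real Java fragments of the grammar plus synthetic text that has no counterpart in the file, and every synthetic span is remembered so that offsets can be mapped back to the grammar. This does not call TemplateDataElementType.RangeCollector or any other IntelliJ API; `JavaViewBuilder`, `appendReal`, `appendSynthetic` and `toSourceOffset` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: assembles the text of a Java view and tracks synthetic spans. */
public class JavaViewBuilder {

    /** One contiguous piece of the view; sourceStart is -1 for synthetic text. */
    private static final class Piece {
        final int viewStart;
        final int length;
        final int sourceStart;

        Piece(int viewStart, int length, int sourceStart) {
            this.viewStart = viewStart;
            this.length = length;
            this.sourceStart = sourceStart;
        }
    }

    private final StringBuilder view = new StringBuilder();
    private final List<Piece> pieces = new ArrayList<>();

    /** Copies source[start, end) into the view; this text really exists in the grammar file. */
    public JavaViewBuilder appendReal(String source, int start, int end) {
        pieces.add(new Piece(view.length(), end - start, start));
        view.append(source, start, end);
        return this;
    }

    /** Adds text that exists only in the view, e.g. a method header for a production. */
    public JavaViewBuilder appendSynthetic(String text) {
        pieces.add(new Piece(view.length(), text.length(), -1));
        view.append(text);
        return this;
    }

    /** The full text that would be handed to the Java parser. */
    public String text() {
        return view.toString();
    }

    /** Maps a view offset back to a grammar offset, or -1 if it falls inside synthetic text. */
    public int toSourceOffset(int viewOffset) {
        for (Piece p : pieces) {
            if (viewOffset >= p.viewStart && viewOffset < p.viewStart + p.length) {
                return p.sourceStart < 0 ? -1 : p.sourceStart + (viewOffset - p.viewStart);
            }
        }
        return -1; // offset is outside the view
    }
}
```

For the parser actions block `{ foo(); }` from the question, one could append a synthetic header such as `void Foo() {`, then the real `foo();` fragment, then a synthetic `}`. Offsets inside `foo();` map back to the grammar file, while offsets inside the synthetic header or closing brace map to -1, which is roughly the information needed to drop the synthetic spans again after parsing so that the tree text matches the file text.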
