Strategy to support two dialects of the same programming language in my plugin

Answered

Created June 14, 2021 14:31

Hi dear IntelliJ IDEA specialists.

I’m developing a programming language support plugin for IntelliJ IDEA.

I want to support two programming languages: A (already implemented) and A+ (to be added), where A+ is a superset of A.
(Specifically, A = POSIX AWK and A+ = Gawk.)

The most obvious approach I can think of is to extend the BNF grammar I already have for A (and possibly the lexer as well) to support A+. This should not break A, since it is a subset of A+.

However, what I dislike is that when I want to use only the A dialect for my program (for maximum portability), nothing will prevent me from accidentally using A+ features.

That is why I was thinking of adding a project-level config option to set the desired language dialect. (I guess IDEA itself uses the same approach for JavaScript support, where you can choose the desired ECMAScript edition. I would also be interested to know how this is achieved.)

With this config option in place, I don’t quite see how to enforce the A-only restriction for the developer. Should it be done via hints/annotations? That is, I would parse A+ correctly but have a set of additional checks that mark the unsupported features as errors in the editor?

Initially I was thinking of having two separate BNF grammars to keep the two dialects explicitly distinct. But I realized that this would greatly increase the effort and cause code duplication.

Then I thought that maybe it is possible to come up with a “conditional” BNF grammar that, depending on a flag, enables some additional clauses/variants. This would obviously allow the most code reuse. My guess is that this might work using higher-order external rules << ... >>. Do you think this is a viable approach? Is there a recommended way to build a “conditional” grammar/parser? Is this even possible?

I admit, though, that this approach is somewhat limiting for the developer, because it strictly prevents parsing A+ in A mode instead of, say, parsing it, highlighting the unsupported constructs, and offering to switch to A+.

I hope I have described it clearly enough. I would be glad to hear any recommendations and, even better, to see some links to existing code that solves a similar problem.

3 comments

Reece Dunn

Created June 15, 2021 11:25

There are two strategies I've used for this in my plugin.

For different versions of the same language and for language extensions, I have a project configuration that specifies the supported version, plus an inspection (which could also be done as an annotator). The inspection looks at PSI elements that belong to the higher versions and language extensions and checks whether they are available in the configured version; if not, it issues a warning on the element that the PSI element designates for the message.

For the subset language (XPath is a subset of XQuery), I have separate language infrastructure for XPath and XQuery, but the XQuery lexer and parser inherit from the XPath lexer and parser, overriding the methods whose grammar differs.
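A minimal sketch of that inheritance idea, written in plain Java rather than against the IntelliJ Platform classes. `BaseParser`, `ExtendedParser` and `classify` are hypothetical names, not code from the plugin; the AWK/Gawk special patterns are borrowed from the question only to make the example concrete.

```java
// Hypothetical stand-in for the parser of the base language (A).
class BaseParser {
    // The one rule that differs between the dialects.
    protected boolean isSpecialPattern(String token) {
        return token.equals("BEGIN") || token.equals("END");
    }

    // Shared logic: delegates only the dialect-specific decision.
    public final String classify(String token) {
        return isSpecialPattern(token) ? "SPECIAL_PATTERN" : "IDENTIFIER";
    }
}

// The extended language (A+) reuses everything and overrides only what differs.
class ExtendedParser extends BaseParser {
    @Override
    protected boolean isSpecialPattern(String token) {
        return super.isSpecialPattern(token)
                || token.equals("BEGINFILE") || token.equals("ENDFILE");
    }
}

class InheritanceDemo {
    public static void main(String[] args) {
        System.out.println(new BaseParser().classify("BEGINFILE"));
        System.out.println(new ExtendedParser().classify("BEGINFILE"));
    }
}
```

The base class stays unaware of the extension, so the A parser cannot accidentally accept A+ constructs.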

IIUC, the JavaScript, SQL, and RegEx implementations create separate languages for each variant, construct an options object, and pass it to the lexer/parser in the ParserDefinition implementation. You can also override doParseContents of the IFileElementType to obtain a lexer and parser for the target language/sublanguage/dialect, for example by setting the appropriate lexer/parser flags.
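The options-object idea could look roughly like the following plain-Java sketch. `DialectOptions` and `ConfigurableLexer` are hypothetical stand-ins and do not mirror the actual JavaScript, SQL, or RegEx implementations; the keyword sets are illustrative and not a complete list for either dialect.

```java
import java.util.Set;

// Hypothetical options object describing which dialect features are enabled.
final class DialectOptions {
    static final DialectOptions POSIX = new DialectOptions(false);
    static final DialectOptions GAWK = new DialectOptions(true);

    final boolean gawkKeywords;

    DialectOptions(boolean gawkKeywords) {
        this.gawkKeywords = gawkKeywords;
    }
}

// Hypothetical lexer stand-in: one implementation, configured per dialect.
final class ConfigurableLexer {
    private static final Set<String> BASE_KEYWORDS = Set.of("function", "if", "while", "return");
    private static final Set<String> GAWK_KEYWORDS = Set.of("switch", "case", "default");

    private final DialectOptions options;

    ConfigurableLexer(DialectOptions options) {
        this.options = options;
    }

    // Extension keywords are recognised only when the options enable them.
    String tokenType(String word) {
        if (BASE_KEYWORDS.contains(word)) return "KEYWORD";
        if (options.gawkKeywords && GAWK_KEYWORDS.contains(word)) return "KEYWORD";
        return "IDENTIFIER";
    }
}

class LexerOptionsDemo {
    public static void main(String[] args) {
        System.out.println(new ConfigurableLexer(DialectOptions.POSIX).tokenType("switch"));
        System.out.println(new ConfigurableLexer(DialectOptions.GAWK).tokenType("switch"));
    }
}
```

In a real plugin the options would presumably be chosen where the lexer is created for a given language or dialect; that wiring is left out here.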

Piotr Tomiak

Created June 21, 2021 15:06

Hi!

The JavaScript language support in WebStorm/IntelliJ is a good example of how you can support multiple languages with similar roots. As far as JavaScript itself is concerned, we now have a single parser, plus inspections that report which flavor a particular feature requires. However, the more specialized Vue.js/Angular expression parsers are reimplemented on top of the JavaScript one (with different lexers) and reuse PSI elements from the base JavaScript language. Having a different lexer helps to disable some of the language features out of the box.

As far as configuration is concerned: if A+ contains all features of A, I would go for a single parser (if the file type is the same) that has all features enabled, plus a dedicated `Annotator` that reports A+ features as unsupported when only A is enabled. It should also provide a quick fix to change the support level. This is actually how our `Java` language support works: each version adds new language features, and the `Java` parser is enhanced with them.
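To make the single parser plus `Annotator` idea concrete, here is a rough plain-Java sketch of just the check. It does not use the real `Annotator` API: `AwkDialect`, `SyntaxNode`, `DialectProblem` and `DialectChecker` are hypothetical names, and the assumption is that the parser has already tagged each element with the lowest dialect that supports it.

```java
import java.util.ArrayList;
import java.util.List;

// Declaration order matters: a later constant is a superset of an earlier one.
enum AwkDialect { A, A_PLUS }

// Hypothetical stand-in for a parsed element and the lowest dialect that supports it.
final class SyntaxNode {
    final String text;
    final AwkDialect requires;

    SyntaxNode(String text, AwkDialect requires) {
        this.text = text;
        this.requires = requires;
    }
}

// Hypothetical stand-in for an editor problem with the label of a suggested quick fix.
final class DialectProblem {
    final String message;
    final String quickFix;

    DialectProblem(String message, String quickFix) {
        this.message = message;
        this.quickFix = quickFix;
    }
}

final class DialectChecker {
    // Reports every node that needs a higher dialect than the configured one.
    static List<DialectProblem> check(List<SyntaxNode> nodes, AwkDialect configured) {
        List<DialectProblem> problems = new ArrayList<>();
        for (SyntaxNode node : nodes) {
            if (node.requires.compareTo(configured) > 0) {
                problems.add(new DialectProblem(
                        "'" + node.text + "' requires dialect " + node.requires,
                        "Switch project dialect to " + node.requires));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<SyntaxNode> nodes = List.of(
                new SyntaxNode("print", AwkDialect.A),
                new SyntaxNode("@include", AwkDialect.A_PLUS));
        for (DialectProblem p : check(nodes, AwkDialect.A)) {
            System.out.println(p.message + " (fix: " + p.quickFix + ")");
        }
    }
}
```

Because everything is parsed in every mode, the user still gets a full tree for A+ code in A mode and only sees targeted errors on the unsupported constructs.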

It is possible to create custom rules, like `<<a_plus_level>>`, in the BNF grammar, which will enable/disable whole blocks of the AST, but I would go for the single parser plus Annotator approach.
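For comparison, the effect of such a gating rule can be imitated with a small hand-written Java sketch. This is not generated parser code: `GatedRedirectParser` and `aPlusEnabled` are hypothetical names, and the grammar line in the comment only approximates how a gated alternative might be written.

```java
// Hypothetical hand-written stand-in for a rule that is roughly:
//   redirect ::= ">" | ">>" | "|" | <<aPlusEnabled>> "|&"
final class GatedRedirectParser {
    private final boolean aPlus; // assumed to come from the project-level setting

    GatedRedirectParser(boolean aPlus) {
        this.aPlus = aPlus;
    }

    // Plays the role of the custom rule: consumes no input, only allows or blocks a branch.
    private boolean aPlusEnabled() {
        return aPlus;
    }

    // Returns true if the token is a valid redirect in the current mode.
    boolean parseRedirect(String token) {
        if (token.equals(">") || token.equals(">>") || token.equals("|")) {
            return true;
        }
        return aPlusEnabled() && token.equals("|&");
    }
}

class GatedRedirectDemo {
    public static void main(String[] args) {
        System.out.println(new GatedRedirectParser(false).parseRedirect("|&"));
        System.out.println(new GatedRedirectParser(true).parseRedirect("|&"));
    }
}
```

In A mode the `|&` branch simply fails to match, so the user gets a generic syntax error instead of a message that names the required dialect, which is the trade-off against the Annotator approach.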

Volodymyr Gubarkov

Created June 24, 2021 21:04

Thank you, Reece and Piotr, for your advice. I am now also more in favor of a single parser. It looks like the most straightforward approach.
