Language composition (with Grammar-Kit)

Imagine there are three languages, A, B and C where

  • A and B are independent, but they can both embed expressions from C.
  • there can be more than 2 languages embedding C in the future.
  • we need to know the context when parsing C, i.e. get a reference to its host
  • we need stand-alone parsers in addition to the plugin
  • all the languages are developed using Grammar-Kit

What's the best way to pull this off?

I can see the following options:

  1. Have 2 separate grammars for A+C and B+C. Super straightforward, but lots of duplicated code, esp. when we get to implementing references, refactorings, formatters etc. for C. The duplication gets much worse if more than 2 languages embed C.
  2. Use chameleons as described in the doc. Can't make it work so far, plus there's a recent post by Gregory stating that "Language embedding via ILazyParseableElementType is *almost* not possible". Quite discouraging.
  3. Use language injections (PsiLanguageInjectionHost). Not sure it's a good match, since we only need to inject one specific language, not any language. We also need the parsers to work in stand-alone mode, which will be more complicated in this case.
  4. Have one mega-grammar for A+B+C, with different parsers derived from it. The rough idea is to write different subclasses of the Grammar-Kit-generated parser, each overriding `parse_root_`. Somewhat hacky and not modular, but at least there is no code duplication.
  5. Scratch everything and switch to MPS =)
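To make option 4 more concrete, here is a toy model of the idea: one shared set of rule methods, with each derived parser choosing its own root. None of these classes are Grammar-Kit or IntelliJ APIs; `MegaParser`, `ruleA`, `ruleB`, `ruleC` and `hostSeenByC` are hypothetical names, and `parseRoot()` only stands in for the role `parse_root_` would play. It is a sketch under those assumptions, not how the generated code actually looks.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy model of the "mega-grammar": all rules live in one base class.
abstract class MegaParser {
    private final Deque<String> tokens;
    // Records which host language invoked the C rule (null when C is parsed stand-alone).
    String hostSeenByC;

    MegaParser(List<String> input) { tokens = new ArrayDeque<>(input); }

    // Each derived parser picks its own entry rule.
    protected abstract boolean parseRoot();

    // Succeeds only if the root rule matches and all input is consumed.
    final boolean parse() { return parseRoot() && tokens.isEmpty(); }

    // Consumes the next token if it equals t.
    protected boolean eat(String t) {
        if (t.equals(tokens.peekFirst())) { tokens.removeFirst(); return true; }
        return false;
    }

    // Shared rules: A and B both embed C and tell it who the host is.
    protected boolean ruleA() { return eat("a") && ruleC("A"); }
    protected boolean ruleB() { return eat("b") && ruleC("B"); }
    protected boolean ruleC(String host) { hostSeenByC = host; return eat("c"); }
}

final class AParser extends MegaParser {
    AParser(List<String> in) { super(in); }
    @Override protected boolean parseRoot() { return ruleA(); }
}

final class BParser extends MegaParser {
    BParser(List<String> in) { super(in); }
    @Override protected boolean parseRoot() { return ruleB(); }
}

final class StandaloneCParser extends MegaParser {
    StandaloneCParser(List<String> in) { super(in); }
    @Override protected boolean parseRoot() { return ruleC(null); }
}

final class MegaGrammarDemo {
    public static void main(String[] args) {
        AParser a = new AParser(List.of("a", "c"));
        System.out.println(a.parse() + " host=" + a.hostSeenByC);
        System.out.println(new BParser(List.of("a", "c")).parse());
        System.out.println(new StandaloneCParser(List.of("c")).parse());
    }
}
```

The rules for C are written once, and adding a fourth host language means adding one more rule plus one more small subclass.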

Any advice, please?


Depends on the exact nature of A, B and C.

For example:

- A & B are like different dialects and C is some expression language for both. Then I'd go with (4).

- If A, B and C are of the same nature, then it is possible to go with (2).

Regarding 2: it is *possible* when it is OK for the host and the embedded elements to have the same "getContainingFile()".

Regarding 3: a hard-coded injector will ensure that only C is injected. But I've never tried standalone parsing + injectors.


GK allows splitting one grammar into several parsers.

Also, 'public static' parsing methods allow you to manually compose different grammars into one parser. And I actually use that.



I've got a working prototype using (2) after some tinkering, but it looks like (4) would be a more viable solution in our case.

Could you give more details on how GK allows splitting one grammar into multiple parsers, and on composing multiple grammars using 'public static' parsing methods, please?




root ::= xxx
yyy ::= B

xxx ::= A <<parseYYY>>


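import com.intellij.lang.PsiBuilder;

public class ParserUtil {
  // Must be static: <<parseYYY>> is called as a static method of the parser util class.
  public static boolean parseYYY(PsiBuilder b, int l) {
    return abc.Parser1.yyy(b, l);
  }
}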

Here the composition is done via the external parseYYY() call. With a single grammar it is unnecessary, but the same approach can be used to tie several grammars together.
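To show the call chain end to end without the IntelliJ SDK, here is a self-contained toy version of the same wiring. `ToyBuilder` is a hypothetical stand-in for PsiBuilder (just a token list with a cursor), and `Parser2` is an assumed name for the parser that owns `root` and `xxx`; real generated parsers also use markers and roll back on failure, which this sketch leaves out.

```java
import java.util.List;

// Hypothetical stand-in for PsiBuilder: a token list with a cursor.
final class ToyBuilder {
    private final List<String> tokens;
    private int pos = 0;

    ToyBuilder(List<String> tokens) { this.tokens = tokens; }

    // Advances past the next token if it matches the expected one.
    boolean consume(String expected) {
        if (pos < tokens.size() && tokens.get(pos).equals(expected)) { pos++; return true; }
        return false;
    }

    boolean eof() { return pos == tokens.size(); }
}

// Plays the role of the parser that owns `yyy ::= B`.
final class Parser1 {
    public static boolean yyy(ToyBuilder b, int l) { return b.consume("B"); }
}

// Plays the role of the parser util class: the target of <<parseYYY>>.
final class ParserUtil {
    public static boolean parseYYY(ToyBuilder b, int l) { return Parser1.yyy(b, l + 1); }
}

// Plays the role of the parser that owns `root ::= xxx` and `xxx ::= A <<parseYYY>>`.
final class Parser2 {
    public static boolean root(ToyBuilder b, int l) { return xxx(b, l + 1) && b.eof(); }

    static boolean xxx(ToyBuilder b, int l) {
        // The external call is the only link between the two parsers.
        return b.consume("A") && ParserUtil.parseYYY(b, l + 1);
    }
}

final class CompositionDemo {
    public static void main(String[] args) {
        System.out.println(Parser2.root(new ToyBuilder(List.of("A", "B")), 0)); // true
        System.out.println(Parser2.root(new ToyBuilder(List.of("A", "A")), 0)); // false
    }
}
```

Because `Parser2` only knows about `ParserUtil.parseYYY`, the parser behind it can be swapped or shared by several host grammars without touching the host's rules.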


Nice! You probably want to document the ability to have multiple top-level {} blocks.

