language composition (with Grammar-Kit)

Konstantin Sobolev

Created June 24, 2016 22:52

Imagine there are three languages, A, B and C, where:

- A and B are independent, but both can embed expressions from C.
- More than two languages may embed C in the future.
- We need to know the context when parsing C, i.e. get a reference to its host.
- We need stand-alone parsers in addition to the plugin.
- All the languages are developed using Grammar-Kit.

What's the best way to pull this off?

I can see the following options:

1. Have two separate grammars for A+C and B+C. Super straightforward, but lots of duplicated code, especially when we get to implementing references, refactorings, formatters etc. for C. This can blow up really badly if more than two languages embed C.
2. Use chameleons as described in the doc. I can't make it work so far, plus there's a recent post by Gregory stating that "Language embedding via ILazyParseableElementType is *almost* not possible". Quite discouraging.
3. Use language injections (PsiLanguageInjectionHost). Not sure it's a good match, since we need to inject only one specific language, not any language. We also need the parsers to work in stand-alone mode, which will be more complicated in this case.
4. Have one mega-grammar for A+B+C, with different parsers derived from it. The rough idea is to write different subclasses of the Grammar-Kit-generated parser, each overriding `parse_root_`. Somewhat hacky and not modular, but at least there's no code duplication.
5. Scratch everything and switch to MPS =)
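To make option 4 a bit more concrete, here is a minimal, self-contained sketch of the idea: all rules for A, B and C live in one class, and each language gets a tiny subclass that only chooses the root rule. This is a toy model, not Grammar-Kit output. `Cursor`, `MegaParser`, `parseRoot`, `aFile`, `bFile` and `cExpr` are hypothetical names made up for illustration; in a real plugin the role of `parseRoot` would be played by the generated `parse_root_` method and the role of `Cursor` by `PsiBuilder`.

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-in for a single generated "mega" parser. No IntelliJ API is used here.
final class MegaGrammarSketch {

    /** Minimal token cursor playing the role of PsiBuilder (hypothetical helper). */
    static final class Cursor {
        private final List<String> tokens;
        private int pos;

        Cursor(List<String> tokens) { this.tokens = tokens; }

        /** Consumes the next token if it equals the expected one. */
        boolean eat(String expected) {
            if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
                pos++;
                return true;
            }
            return false;
        }

        boolean atEnd() { return pos == tokens.size(); }
    }

    /** Holds every rule of A, B and C, as a parser generated from one A+B+C grammar would. */
    abstract static class MegaParser {
        /** Each language-specific subclass picks its own root (mimics overriding parse_root_). */
        protected abstract boolean parseRoot(Cursor c);

        final boolean parse(String... tokens) {
            Cursor c = new Cursor(Arrays.asList(tokens));
            return parseRoot(c) && c.atEnd();
        }

        // aFile ::= 'a' cExpr
        static boolean aFile(Cursor c) { return c.eat("a") && cExpr(c); }

        // bFile ::= 'b' cExpr
        static boolean bFile(Cursor c) { return c.eat("b") && cExpr(c); }

        // cExpr ::= 'c' ('+' 'c')*   (the shared embedded language)
        static boolean cExpr(Cursor c) {
            if (!c.eat("c")) return false;
            while (c.eat("+")) {
                if (!c.eat("c")) return false;
            }
            return true;
        }
    }

    static final class AParser extends MegaParser {
        @Override protected boolean parseRoot(Cursor c) { return aFile(c); }
    }

    static final class BParser extends MegaParser {
        @Override protected boolean parseRoot(Cursor c) { return bFile(c); }
    }

    /** Stand-alone parser for C alone. */
    static final class CParser extends MegaParser {
        @Override protected boolean parseRoot(Cursor c) { return cExpr(c); }
    }

    public static void main(String[] args) {
        System.out.println(new AParser().parse("a", "c", "+", "c"));
        System.out.println(new BParser().parse("a", "c"));
    }
}
```

The C rules exist exactly once, and adding a fourth host language would only need one more rule and one more small subclass. The cost is the one named above: every language has to live in the same class, so the result is not modular.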

Any advice, please?

4 comments

Gregory Shrago

Created June 25, 2016 10:38

Depends on the exact nature of the A, B and C.

For example:

- A & B are like different dialects and C is some expression language for both. Then I'd go with (4).

- If A, B and C are of the same nature, then it is possible to go with (2).

Regarding 2: it is *possible* when it is OK for the host and the embedded elements to have the same "getContainingFile()".

Regarding 3: a hard-coded injector will ensure that only C is injected. But I've never tried standalone parsing + injectors.

GK allows splitting one grammar into several parsers.

Also, 'public static' parsing methods allow manually composing different grammars into one parser. And I actually use that.

Konstantin Sobolev

Created June 27, 2016 23:15

I've got a working prototype using (2) after some tinkering, but it looks like (4) would be a more viable solution in our case.

Could you give more details on how GK allows splitting one grammar into multiple parsers, and on composing multiple grammars using 'public static' parsing methods, please?

Thanks!

Gregory Shrago

Created June 28, 2016 20:35

Splitting:

{
  parserClass="abc.Parser1"
  parserUtilClass="abc.ParserUtil"
}
root ::= xxx
yyy ::= B

;{
  parserClass="abc.Parser2"
}
xxx ::= A <<parseYYY>>

Composing:

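import com.intellij.lang.PsiBuilder;
import com.intellij.lang.parser.GeneratedParserUtilBase;

public class ParserUtil extends GeneratedParserUtilBase {
  // External rules such as <<parseYYY>> are called statically by the generated parser.
  public static boolean parseYYY(PsiBuilder b, int l) {
    return abc.Parser1.yyy(b, l);
  }
}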

Here the composition is made via an external parseYYY() call. In the case of a single grammar this is unnecessary, since xxx could reference yyy directly, but the same approach can be used to tie several grammars together.
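The call chain above can be tried out without the IntelliJ SDK. The following is a rough, self-contained sketch that only mirrors the shape of the composition: `Parser1`, `Parser2`, `ParserUtil`, `xxx`, `yyy` and `parseYYY` reuse the names from the example, while `Cursor` is a hypothetical stand-in for `PsiBuilder` and the whole thing is hand-written rather than generated by Grammar-Kit.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of composing two parsers through a static bridge method. No IntelliJ API is used.
final class CompositionSketch {

    /** Minimal token cursor playing the role of PsiBuilder (hypothetical helper). */
    static final class Cursor {
        private final List<String> tokens;
        private int pos;

        Cursor(List<String> tokens) { this.tokens = tokens; }

        /** Consumes the next token if it equals the expected one. */
        boolean eat(String expected) {
            if (pos < tokens.size() && tokens.get(pos).equals(expected)) {
                pos++;
                return true;
            }
            return false;
        }

        boolean atEnd() { return pos == tokens.size(); }
    }

    /** Plays the role of the generated abc.Parser1, which owns the rule yyy. */
    static final class Parser1 {
        // yyy ::= B
        public static boolean yyy(Cursor b, int l) { return b.eat("B"); }
    }

    /** Plays the role of ParserUtil: the bridge that the other parser reaches as <<parseYYY>>. */
    static final class ParserUtil {
        public static boolean parseYYY(Cursor b, int l) { return Parser1.yyy(b, l); }
    }

    /** Plays the role of the generated abc.Parser2, which knows nothing about Parser1. */
    static final class Parser2 {
        // xxx ::= A <<parseYYY>>
        public static boolean xxx(Cursor b, int l) {
            return b.eat("A") && ParserUtil.parseYYY(b, l + 1);
        }
    }

    /** Parses a whole token sequence starting from xxx. */
    static boolean parse(String... tokens) {
        Cursor b = new Cursor(Arrays.asList(tokens));
        return Parser2.xxx(b, 0) && b.atEnd();
    }

    public static void main(String[] args) {
        System.out.println(parse("A", "B"));
        System.out.println(parse("A"));
    }
}
```

The only coupling between the two parsers is the static method in the utility class, which is why the same trick should work across separately generated grammars as long as the target rule method is accessible.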

Konstantin Sobolev

Created June 28, 2016 21:13

Nice! You probably want to document the ability to have multiple top-level {} blocks.
