Processing part of a YAML file with specific lexer and parser

Answered

Created August 10, 2021 11:41

I have a DSL that is embedded in YAML: YAML provides the general structure, but scalars should be processed with the DSL, and only when they are located at a specific position in the YAML file.

myProp: !ACertainType
   title: my title
   foo: |
       a = b;

In the example above, I would like the existing YAML parser to parse the document and have the scalar at foo: parsed again.

I know how to determine which location in the file needs to be parsed again, but I wonder what the best approach is:

- create my own YAML + DSL single lexer / parser (current situation)
- create a new IFileElementType that overrides the doParseContents, basically replacing the YAML Scalar AST node with a more tokenized node with the DSL
- another way?

I need the DSL part to be aware of DSL parts in other parts of the document (and other documents).

The reason I want to move away from the current situation (a single lexer/parser) is that I end up with either:
- making the lexer aware of where it is in the document and determining the next state from there. That works, but I don't feel a lexer should have that responsibility
- parsing every YAML scalar as DSL, which generates tokens where they are not supposed to be (the title: value in the example above should be simple plain text)

Is there any example in the community repo that I'm missing or another example that I can use?

Kr, Tim

7 comments

Reece Dunn

Created August 15, 2021 07:38

The best way to do this is with language injection, inserting the ACertainType language into the YAML file. You can implement the MultiHostInjector interface to do this programmatically, specifying the YAMLScalarText PSI element in the elementsToInjectIn function. In getLanguagesToInject, you can then do something like:

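if (!isFooInACertainType(context)) return;

PsiLanguageInjectionHost host = (PsiLanguageInjectionHost) context;
// addPlace expects a range relative to the host element, not the host's range in the file.
TextRange rangeInsideHost = TextRange.from(0, host.getTextLength());
registrar.startInjecting(ACertainTypeLanguage.INSTANCE)
    .addPlace(null, null, host, rangeInsideHost)
    .doneInjecting();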

You can then define the lexer, parser, etc. for the language you are supporting in that YAML structure.
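
For readers wondering what a check like isFooInACertainType might look at, here is a minimal sketch in plain Java. It is only a model: the Entry class is a hypothetical stand-in for a YAML key/value entry and is not part of the YAML plugin's PSI. In a real injector the same question would be answered by walking the PSI tree upwards from the scalar.

```java
/** Plain-Java model of the position check; no IntelliJ classes are used. */
final class InjectionPositionSketch {

    /** Hypothetical stand-in for a YAML key/value entry. */
    static final class Entry {
        final String key;    // the entry's key, e.g. "foo"
        final String tag;    // tag on the entry's value, e.g. "!ACertainType", or null
        final Entry parent;  // enclosing entry, or null at the document root

        Entry(String key, String tag, Entry parent) {
            this.key = key;
            this.tag = tag;
            this.parent = parent;
        }
    }

    /** True only for the entry keyed "foo" whose enclosing entry is tagged !ACertainType. */
    static boolean isFooInACertainType(Entry scalarOwner) {
        if (scalarOwner == null || scalarOwner.parent == null) return false;
        return "foo".equals(scalarOwner.key)
                && "!ACertainType".equals(scalarOwner.parent.tag);
    }

    public static void main(String[] args) {
        // Mirrors the example document from the question.
        Entry myProp = new Entry("myProp", "!ACertainType", null);
        Entry title = new Entry("title", null, myProp);
        Entry foo = new Entry("foo", null, myProp);

        System.out.println(isFooInACertainType(foo));   // true
        System.out.println(isFooInACertainType(title)); // false
    }
}
```

Keeping the decision in a small predicate like this means the lexer never has to know where it is in the document; only the injector does.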

Tim

Created August 15, 2021 17:50

Hi Reece,

Thanks for the example. It appears this doesn't work for my case. The built-in YAML language has a variety of YAML Scalar values that can be parsed. For example, when prefixed with a multiline pipe | it will be a YAML Scalar List:

someProperty: |
   This entire block
   is processed as a multiline
   yaml scalar block and needs to be parsed by my custom language lexer / grammar

All variants inherit from YAMLScalar, so I provided that interface to elementsToInjectIn. When that didn't work, I tried it with PsiElement.class and noticed that, even when injecting into any PsiElement, only certain types of PsiElement are actually passed to getLanguagesToInject. For example, a YAMLKeyValue (the key + value, in this case the entire block I posted) and its tree parents all the way up to the document itself.

Is there some way to determine which elements are processed in getLanguagesToInject?

I also tried the alternative with languageInjectionPerformer / languageInjectionContributor, but that gave the same result.

Kr, Tim

Tim

Created August 15, 2021 19:02

I was able to inject the DSL into the YAMLScalar; however, the result is not really what I was hoping for. It appears the host file (pure YAML) is not aware of the injected DSL; there is now a separate DSL file for the injected code. The question is: how can this injected file interact with other parts of the YAML file or other parts of the project?

Looking at the MultiHostInjector, this appears to be the designed behavior: these kinds of code injections are handled in isolation, which is not what I'm looking for.

Reece Dunn

Created August 16, 2021 19:18

You can use PsiTreeUtil.getContextOfType to get access to the YAMLScalar element from your DSL. You can then use the YAML API to traverse the elements.

If you want to adapt the YAML file itself, you could create PsiElement wrappers like XsltElementFactoryImpl does for XSLT-based XML files. That XSLT support (in the xpath plugin under xpath-lang) is also an example of adding different functionality to XML files (elements and attributes) that may be transferable to YAML.

Tim

Created August 17, 2021 04:36

Hi Reece,

Thanks for the XSLT pointer, that looks really promising, especially since I want to give special meaning to key:value pairs in the YAML file based on their position. I'll also have another look at the multi-host injection with your suggestion.

Thanks for helping,

Tim

Yann Cebron

Created August 18, 2021 15:34

Please share if you found a fully working solution.

Tim

Created October 15, 2021 20:27

It took a bit of time but I found a way to create a working solution:

https://github.com/timmisset/omt-odt-plugin

For anyone needing some background info: there are two languages at play here, OMT and ODT. OMT is a YAML extension that gives meaning to specific structures, while ODT is a completely separate DSL with its own lexer and grammar.

It uses the (mostly undocumented) MetaType API available in the YAML plugin of IntelliJ to implement the required structure. The OMTMetaTypeProvider is used to determine how specific YAML positions (like a mapping at a certain position) can be translated to something that makes sense in the OMT language.

In the OMT language it is possible to declare variables that can be used in the ODT language.

https://github.com/timmisset/omt-odt-plugin/blob/master/src/main/java/com/misset/opp/odt/psi/references/ODTVariableReference.java shows how the ODTVariableReference can be resolved to the declaring element in the OMT file.

The multi-host injection is part of the ODT package and also uses the meta-type provider to determine which YAML scalars should be injected with the ODT DSL, simply by 'tagging' the meta-type of those scalars with an ODTInjectable interface.
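
The 'tagging' idea can be illustrated with a marker interface. The sketch below is plain Java and does not reproduce the plugin's code: MetaType, Injectable, PlainTextType and ScriptType are hypothetical names chosen for illustration, with Injectable playing the role that ODTInjectable plays in the description above.

```java
/** Plain-Java model of tagging meta-types with a marker interface. */
final class InjectableTaggingSketch {

    /** Hypothetical description of what a YAML position means in the host language. */
    interface MetaType {
        String name();
    }

    /** Marker interface: it has no methods, its presence alone is the tag. */
    interface Injectable {
    }

    /** A scalar that stays plain text, such as a title. */
    static final class PlainTextType implements MetaType {
        public String name() { return "plainText"; }
    }

    /** A scalar whose content should be handed to the DSL. */
    static final class ScriptType implements MetaType, Injectable {
        public String name() { return "script"; }
    }

    /** The injector only needs to ask whether the resolved meta-type carries the tag. */
    static boolean shouldInject(MetaType type) {
        return type instanceof Injectable;
    }

    public static void main(String[] args) {
        System.out.println(shouldInject(new ScriptType()));    // true
        System.out.println(shouldInject(new PlainTextType())); // false
    }
}
```

With this shape, adding a new injectable position means tagging one more meta-type; the injector itself does not change.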

Hope this helps other people on the same journey ;)
