Example of a custom language plugin for a templating language

Permanently deleted user

Created February 19, 2011 15:40

Hi,

I am looking into developing a plugin for the Play! framework (http://playframework.org) and have started writing a custom language plugin for its templating engine. Play! template files are plain HTML files (they also use the .html extension) containing snippets of custom markup such as #{tag param1:value, param2:value /}.

My question is whether there is an existing example of a templating language that mixes an existing language (HTML) with custom markup. I read the custom language plugin development page at http://confluence.jetbrains.net/display/IDEADEV/Developing+Custom+Language+Plugins+for+IntelliJ+IDEA and know about ILazyParseableElementType; however, the details of how to implement this mechanism aren't very clear to me, so any existing plugin that does this kind of templating would be very helpful. I was thinking of something like FreeMarker or Velocity, but I take it that since these are Ultimate features, their source code isn't available.

By the way, Max, thanks for fixing the JFlex plugin!

Manuel

17 comments

Permanently deleted user

Created February 19, 2011 19:32

I think you are looking for the FileViewProvider support described here:

http://confluence.jetbrains.net/display/IDEADEV/IntelliJ+IDEA+Architectural+Overview#IntelliJIDEAArchitecturalOverview-FileViewProviders

Permanently deleted user

Created February 28, 2011 17:52

Yes, that probably is what I need, thanks!

One more question while I'm at it: I tried to check whether the stub I have coded so far does anything by writing a test that extends com.intellij.testFramework.ParsingTestCase.
The test runs (a PSI text file gets created), but my parser doesn't seem to be taken into account at all, even though I believe I have registered all the necessary extension points in plugin.xml (a fileTypeFactory and a lang.parserDefinition; I don't have more for now). When debugging, however, it looks as though my ParserDefinition is never used.
Is there anything else I need to do to get this TestCase to work? I'm running it from the IDEA instance in which I'm developing the plugin, not from an instance that runs the plugin, since the latter seems odd to me.

Thanks again,

Manuel

Permanently deleted user

Created February 28, 2011 22:17

Hi Manuel, did you get it working? I'm trying to do the same thing for the Markdown plugin; Markdown is a markup language in which you can embed HTML blocks. I don't understand the FileViewProvider part: why would we need it if everything is done at the lexer level, as the wiki says?

Permanently deleted user

Created March 01, 2011 09:14

Hi Julien,

So far I haven't had any luck running the test case, and I haven't yet tried deploying the plugin to see whether anything works in a running IDEA instance.
I will dig further by looking at how language plugins whose source is available work, so that at least the lexer and parser get invoked.

Dmitry Jemerov

Created March 01, 2011 11:02

Hello Manuel,

What are the VM options specified in the unit test run configuration?

--
Dmitry Jemerov
Development Lead
JetBrains, Inc.
http://www.jetbrains.com/
"Develop with Pleasure!"

Permanently deleted user

Created March 01, 2011 15:38

Hi Dmitry,

That probably is my problem: I have no VM parameters defined in the test run configuration. Is there any documentation on which parameters I should provide, or could you tell me where to look?

Thanks,

Manuel

Permanently deleted user

Created April 28, 2011 19:00

Hi again!

I haven't had much time to work on the plugin lately, but now that I've looked into it again, I realize I'm still stuck here: I don't know which VM parameters to pass to the test runner to get the environment running.
I also haven't had any luck finding documentation or an example online.

Dmitry (or anyone with knowledge and a bit of time to reply), any idea how a ParsingTestCase needs to be started?

Thanks!

Manuel

Yann Cebron

Created April 29, 2011 07:06

Just use the VM parameters of IDEA itself, at least the memory settings. Mac: http://devnet.jetbrains.net/docs/DOC-197, Windows: idea.properties in $IDEA_HOME$ (not sure).

Permanently deleted user

Created April 29, 2011 12:28

Thanks, I figured out what the problem was: I had forgotten to implement one method of the ParserDefinition, so the test just stopped. Now the tests are running and I'm writing the parser, and I'll try my luck with getting the FileViewProvider to work (the Play! templates have an HTML extension, so I'm not entirely sure how to plug the additional expression definitions in there).

Dmitry Jemerov

Created April 29, 2011 12:50

Hello Manuel,

You'll need to use the LanguageSubstitutor API to replace the language for
HTML files which are Play templates with your own language.
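For illustration only: LanguageSubstitutor (com.intellij.psi.LanguageSubstitutor) is consulted per file, so most of the work is deciding whether a given .html file is a Play template at all. The standalone sketch below uses no IntelliJ classes; the class name PlayTemplateDetector, the assumption that templates live under app/views/, and the marker list are hypothetical choices, not part of any API.

```java
import java.util.List;

/** Hypothetical heuristic: should an .html file be treated as a Play template? */
final class PlayTemplateDetector {
    // Template markers mentioned in this thread: tags, expressions, actions, scripts.
    private static final List<String> MARKERS = List.of("#{", "${", "@{", "%{");

    private PlayTemplateDetector() {}

    static boolean looksLikePlayTemplate(String path, CharSequence text) {
        // Assumption: Play templates live under an app/views/ directory.
        String normalized = path.replace('\\', '/');
        if (!normalized.endsWith(".html") || !normalized.contains("/app/views/")) {
            return false;
        }
        String content = text.toString();
        for (String marker : MARKERS) {
            if (content.contains(marker)) {
                return true;
            }
        }
        return false;
    }
}
```

In an actual substitutor, the result of a check like this would presumably decide between returning the Play language and applying no substitution.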


Permanently deleted user

Created April 29, 2011 14:46

Hi Dmitry,

> You'll need to use the LanguageSubstitutor API to replace the language for
> HTML files which are Play templates with your own language.

Thanks, I found the API. However, I have the feeling that it would replace the entire language of the HTML file, whereas what I want is just to add parsing for some parts of the document. In the same way you can have JavaScript in an HTML file, here the plugin would parse only specific expressions. For example:

<form action="@{PlayController.submitAction()}"> ... </form>

In this example, everything is HTML except for the part in "@{ }".

Is this somehow possible?

Dmitry Jemerov

Created April 29, 2011 14:57

Hello Manuel,

If the Play framework specific stuff only lives in HTML attribute values,
you don't need to replace the language at all. Instead, you can use language
injection to inject the Play specific language into the attribute value.
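With injection, the plugin essentially has to report which character ranges of an attribute value belong to the injected language. A minimal standalone sketch of that computation, assuming @{...} bodies may contain balanced nested braces; the helper name is hypothetical, and no IntelliJ injection API is used here:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: offsets of the bodies of @{...} expressions inside an attribute value. */
final class PlayAttributeRanges {
    private PlayAttributeRanges() {}

    /** Returns [start, end) pairs covering the text between "@{" and its matching "}". */
    static List<int[]> findActionRanges(String value) {
        List<int[]> ranges = new ArrayList<>();
        int i = value.indexOf("@{");
        while (i >= 0) {
            int bodyStart = i + 2;
            int depth = 1;
            int j = bodyStart;
            // Count nested braces so that "@{a({b})}" ends at the right "}".
            while (j < value.length() && depth > 0) {
                char c = value.charAt(j);
                if (c == '{') depth++;
                else if (c == '}') depth--;
                j++;
            }
            if (depth != 0) {
                break; // unterminated expression: nothing to inject
            }
            ranges.add(new int[] {bodyStart, j - 1});
            i = value.indexOf("@{", j);
        }
        return ranges;
    }
}
```

In a plugin, each range would presumably be registered with IntelliJ's injection machinery (for example a MultiHostInjector) as one injected place.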

---
Original message URL:
http://devnet.jetbrains.net/message/5301809#5301809


Permanently deleted user

Created April 29, 2011 15:35

Hi Dmitry,

indeed, I didn't think of this. Is there a way in the configuration of a plugin to enable this language injection by default?

Depending on their type, some expressions may be used just about anywhere in the page. One of them is very similar to a JSTL expression in JSP pages and also looks the same:

<h1>Hello ${user.name}</h1>

Would this situation (content of an HTML tag) also be covered by language injection? If not, is there maybe a way to create a configuration that enables HTML, JavaScript and Play expressions to be used together?

While I'm at it: the content of the ${ } expressions is a Groovy expression, so I suppose that to make the plugin complete I'll need to plug in Groovy inside ${ } (there's also a syntax for including Groovy code blocks, %{ // groovy code here }%, very much like JSP scriptlets).

Thanks very much for your help,

Manuel

Peter Gromov

Created May 03, 2011 19:27

Hi,

From the examples you give, Play! templates really seem quite similar to GSP, so neither language injection nor LanguageSubstitutor will save you. You need to create your own FileViewProvider extending MultiplePsiFilesPerDocumentFileViewProvider, which would manage several PSI trees, at least two: one for HTML and one for Groovy. I should say this is not an easy task by itself, made even harder by the fact that we don't have a single example of such multi-tree files in open source. But we are here to help :)

Both trees should have the same text. The foreign (Groovy) fragments should be represented as special OuterLanguageElement leaves in the HTML tree. HTML fragments in the Groovy tree are more legitimate, they just have a special element type which your PSI should know about (e.g. TEMPLATE_TEXT).

The lexing is done for the Groovy tree: you should be able to split your token stream into template text and Groovy code in the appropriate places. This can't be done with regular expressions alone, so I suggest writing the lexer in two parts. One is very simple and autogenerated (by JFlex, for example): it tokenizes the text into the smallest possible tokens, and no lexer state management is needed there. Then the other part of the lexer comes into play: it reads the stream of those minimal tokens and translates it into the stream of actual tokens the parser will use. For example, it could take all the tokens at the beginning of the file up to the first ${ and merge them into one big TEMPLATE_TEXT token. Then it could return tokens from the base lexer as they are until the closing } is encountered. Some brace counting is necessary here, and this is precisely what JFlex is bad at. I usually do this sort of 'lexing parser' by extending the LookAheadLexer class.
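The two-stage idea above can be sketched without any IntelliJ classes. Stage 1 (minimalTokens) is stateless, as a JFlex-generated lexer would be; stage 2 (templateTokens) merges template text and counts braces, which in a plugin would presumably live in a LookAheadLexer subclass. Only ${ } is handled, string literals are ignored, and the token names are made up for this sketch:

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone sketch of a two-stage template lexer; no IntelliJ classes involved. */
final class TwoStageLexer {
    record Token(String type, int start, int end) {}

    private TwoStageLexer() {}

    /** Stage 1: stateless split into "${", "{", "}" and runs of any other characters. */
    static List<Token> minimalTokens(String text) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int start = i;
            if (text.startsWith("${", i)) {
                i += 2;
                out.add(new Token("DOLLAR_LBRACE", start, i));
            } else if (text.charAt(i) == '{' || text.charAt(i) == '}') {
                i++;
                out.add(new Token(text.charAt(start) == '{' ? "LBRACE" : "RBRACE", start, i));
            } else {
                while (i < text.length() && !text.startsWith("${", i)
                        && text.charAt(i) != '{' && text.charAt(i) != '}') {
                    i++;
                }
                out.add(new Token("TEXT", start, i));
            }
        }
        return out;
    }

    /** Stage 2: merge template text and count braces to find the "}" that closes each "${". */
    static List<Token> templateTokens(String text) {
        List<Token> out = new ArrayList<>();
        int depth = 0;      // 0 = in template text, > 0 = inside ${ ... }
        int textStart = -1; // start of the pending TEMPLATE_TEXT run, or -1 if none
        for (Token t : minimalTokens(text)) {
            if (depth == 0) {
                if (t.type().equals("DOLLAR_LBRACE")) {
                    if (textStart >= 0) {
                        out.add(new Token("TEMPLATE_TEXT", textStart, t.start()));
                        textStart = -1;
                    }
                    out.add(new Token("EXPR_START", t.start(), t.end()));
                    depth = 1;
                } else if (textStart < 0) {
                    textStart = t.start(); // stray braces outside ${ } are just template text
                }
            } else if (t.type().equals("RBRACE") && depth == 1) {
                out.add(new Token("EXPR_END", t.start(), t.end()));
                depth = 0;
            } else {
                // Inside an expression, minimal tokens pass through as Groovy code,
                // and nested braces (closures, GString ${}) adjust the depth.
                if (t.type().equals("RBRACE")) {
                    depth--;
                } else if (!t.type().equals("TEXT")) {
                    depth++;
                }
                out.add(new Token("GROOVY", t.start(), t.end()));
            }
        }
        if (textStart >= 0) {
            out.add(new Token("TEMPLATE_TEXT", textStart, text.length()));
        }
        return out;
    }
}
```

For `<h1>Hello ${users.collect { it.name }}</h1>`, the closure's braces come out as GROOVY tokens and only the last } becomes EXPR_END. A real second stage would more likely hand each Groovy range to the Groovy lexer than relabel minimal tokens.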

After you have such a lexer, you can have syntax highlighting in the editor. You'll probably need to extend LayeredLexerEditorHighlighter to get syntax highlighting for HTML, Groovy and the injection syntax in between them.

You can also parse the token stream created by your lexer. At this point you need to look at the text as code to be executed by the Play! framework. Template text is just printing; the Groovy injections are custom code, perhaps residing inside some special nodes.
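To make "template text is just printing" concrete, here is a hypothetical shape for the Play tree, built from an already-lexed token list. The token types (TEMPLATE_TEXT, GROOVY_EXPR, TAG_OPEN, TAG_CLOSE) and node classes are assumptions for this sketch, not IntelliJ PSI classes; tag calls #{name}...#{/} are modelled as the "special nodes" that own their body:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shape of a Play tree: what the template does when Play executes it. */
final class PlayTreeSketch {
    record Tok(String type, String text) {}

    interface Node {}
    record Print(String text) implements Node {}                // template text is just printed
    record Eval(String groovy) implements Node {}               // ${...}: a Groovy expression
    record Tag(String name, List<Node> body) implements Node {} // #{name}...#{/}: owns its body

    private PlayTreeSketch() {}

    static List<Node> parse(List<Tok> tokens) {
        int[] pos = {0};
        List<Node> nodes = parseSequence(tokens, pos);
        if (pos[0] != tokens.size()) {
            throw new IllegalStateException("#{/} without an open tag at token " + pos[0]);
        }
        return nodes;
    }

    /** Parses siblings until the end of input or a TAG_CLOSE, which is left for the caller. */
    private static List<Node> parseSequence(List<Tok> tokens, int[] pos) {
        List<Node> nodes = new ArrayList<>();
        while (pos[0] < tokens.size()) {
            Tok t = tokens.get(pos[0]);
            switch (t.type()) {
                case "TEMPLATE_TEXT" -> { nodes.add(new Print(t.text())); pos[0]++; }
                case "GROOVY_EXPR" -> { nodes.add(new Eval(t.text())); pos[0]++; }
                case "TAG_OPEN" -> {
                    pos[0]++;
                    List<Node> body = parseSequence(tokens, pos);
                    if (pos[0] >= tokens.size()) {
                        throw new IllegalStateException("missing #{/} for tag " + t.text());
                    }
                    pos[0]++; // consume the TAG_CLOSE that ended the body
                    nodes.add(new Tag(t.text(), body));
                }
                case "TAG_CLOSE" -> { return nodes; }
                default -> throw new IllegalArgumentException("unknown token type " + t.type());
            }
        }
        return nodes;
    }
}
```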

You also have to create the HTML tree based on the same lexer output. Here we usually only parse the template text parts (HTML) without any Groovy injections. Then the text of those injections gets inserted into the resulting HTML tree as OuterLanguageElements. If your template data file element type extends TemplateDataElementType, you'll get that for almost free.

I think that's enough for today :) If questions arise, they're always welcome.

Permanently deleted user

Created May 04, 2011 13:24

Hi Peter,

thanks for the long reply, it's very helpful! I think I understand most of what needs to be done, I just have some questions for clarification (and I'll most likely have a couple more questions while implementing this):

The goal is to get a document that has multiple PSI trees, one for each language. So far I can think of HTML, Groovy, Play and JavaScript. Regarding JavaScript, is that covered by HTML? I suppose it's an easier case, since JavaScript can only live inside <script> tags and should therefore be easy to identify, but would I need to do something additional there, or is that taken care of by the HTML parsing? (As far as I know, JavaScript support is not available in the Community Edition, but I suppose there is a way to check for that.)

I should say that the Play "language" will be very limited: as far as I can tell, there is only one case where there really is a need for it, and that is for calling tags, where the syntax is like:

#{some.tag key1:value1, key2:value2 }
 <!-- some HTML markup -->
#{/}

So the only Play-specific part here would be some.tag; key1:value1, key2:value2 is a Groovy collection as far as I know. For the rest of the constructs of Play's templating language, the expressions should all be valid Groovy expressions or code.
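Under this reading of the syntax (a dotted tag name followed by Groovy arguments, an optional trailing / for self-closing calls, and #{/} or #{/name} to close), splitting a tag call into its Play part and its Groovy part is simple. A hypothetical standalone sketch; the record and method names are made up:

```java
/** Hypothetical splitter for a Play tag call into its Play part (the name) and its Groovy part. */
final class PlayTagHeader {
    record Header(String name, String groovyArgs, boolean selfClosing, boolean closing) {}

    private PlayTagHeader() {}

    static Header parse(String tag) {
        if (!tag.startsWith("#{") || !tag.endsWith("}")) {
            throw new IllegalArgumentException("not a tag call: " + tag);
        }
        String inner = tag.substring(2, tag.length() - 1).trim();
        if (inner.startsWith("/")) {
            // "#{/}" or "#{/some.tag}" closes the innermost open tag.
            return new Header(inner.substring(1).trim(), "", false, true);
        }
        boolean selfClosing = inner.endsWith("/");
        if (selfClosing) {
            inner = inner.substring(0, inner.length() - 1).trim();
        }
        // The tag name runs up to the first whitespace; the rest is left for the Groovy parser.
        int nameEnd = 0;
        while (nameEnd < inner.length() && !Character.isWhitespace(inner.charAt(nameEnd))) {
            nameEnd++;
        }
        return new Header(inner.substring(0, nameEnd), inner.substring(nameEnd).trim(), selfClosing, false);
    }
}
```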

> The lexing is done for the Groovy tree: you should be able to split your token stream into template text and the Groovy code in the appropriate places. [...]

All right, I think I understand how this is supposed to work. I suppose I would need a "composite" lexer that calls the first lexer and works on the token stream it produces (using LookAheadLexer).
Also, by "elementary particles", I suppose you mean string literals, whitespace, { [ / ( @ ) \ } ] and friends?

I do have experience with this kind of "stuff between braces" situation: I wrote an ANTLR-based parser for a DSL describing database models, which mixed in a subset of HQL and could contain { expressions } of different kinds (in one place the content would be HQL, in another it would still be part of the DSL). The approach I used there was to treat everything inside braces as one special literal and pass its content to different parsers that turned the expression text into an AST. I then shifted the positions of the resulting tree nodes (line and column numbers) before inserting them back into the main tree. This was possible because the expressions inside braces could not contain other braces, so no brace counting was necessary. I'm bringing all this up because, as far as I understand, Play has the following main constructs with more elaborate Groovy expressions inside them:

${user.name} // simple Groovy expressions, without braces
%{ if (true) { /* do something */ } }% // Groovy code, with complex statements involving braces

My idea would be to have a JFlex lexer that recognizes those expressions and passes their content to the Groovy lexer and parser. So the lexer would be pretty dumb; for example, this code

<h2>${user.name}</h2>
%{
  // some groovy code
}%

would produce the following token type stream:

HTML_LITERAL
GROOVY_EXPRESSION
HTML_LITERAL
GROOVY_CODE

and then I walk the stream and pass the different tokens to the different parsers and somehow glue it all back together.

But I suppose it's not quite that simple :-) I mean, I'd need to be able to tell the different lexers and parsers "this token starts at line x, column y and ends at line x', column y'".
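As far as I know, IntelliJ lexers avoid line/column bookkeeping: Lexer.start(buffer, startOffset, endOffset, initialState) lexes a range of the whole file text, and token positions are plain offsets into that buffer. The standalone sketch below imitates that contract with a toy tokenizer (not the Groovy lexer; the class name is hypothetical), and computes lines and columns only when they are needed for display:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: a sub-lexer that works on a slice of the whole file, so its offsets need no shifting. */
final class RangeLexer {
    record Token(String type, int start, int end) {}

    private RangeLexer() {}

    /** Tokenizes buffer[startOffset, endOffset) into identifiers, whitespace and single symbols. */
    static List<Token> lex(CharSequence buffer, int startOffset, int endOffset) {
        List<Token> out = new ArrayList<>();
        int i = startOffset;
        while (i < endOffset) {
            char c = buffer.charAt(i);
            int start = i;
            if (Character.isJavaIdentifierStart(c)) {
                while (i < endOffset && Character.isJavaIdentifierPart(buffer.charAt(i))) i++;
                out.add(new Token("IDENT", start, i));
            } else if (Character.isWhitespace(c)) {
                while (i < endOffset && Character.isWhitespace(buffer.charAt(i))) i++;
                out.add(new Token("WHITE_SPACE", start, i));
            } else {
                i++;
                out.add(new Token("SYMBOL", start, i));
            }
        }
        return out;
    }

    /** Line and column (both 0-based) of an offset; only needed for messages, not for lexing. */
    static int[] lineAndColumn(CharSequence buffer, int offset) {
        int line = 0;
        int lineStart = 0;
        for (int i = 0; i < offset; i++) {
            if (buffer.charAt(i) == '\n') {
                line++;
                lineStart = i + 1;
            }
        }
        return new int[] {line, offset - lineStart};
    }
}
```

Lexing only the body of `<h2>${user.name}</h2>` (offsets 6 to 15) yields tokens whose offsets already point into the full text, so nothing has to be shifted before gluing the results back together.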

> You can also parse the token stream created by your lexer. At this point you need to look at this text as code to be executed by the Play! framework. [...]

This would be for the purpose of creating a Play "language" PSI tree, right? I guess the only interesting elements in this tree would be the tag calls, so that references can be attached to the tag name (tags can either be in their own template-like .html file or be defined in a Java class). In that case, how do I treat the "alien" tokens: do I also use OuterLanguageElement, or is that only for HTML, meaning I would make my own SomeThingElseElementType?

> You also have to create the HTML tree based on the same lexer output. [...] If your template data file element type extends TemplateDataElementType, you'll get that for almost free.

So here I would pass the TEMPLATE_TEXT tokens, but not the rest, and get back an HTML PSI tree? In that case, how do I know where to put the Groovy text back as OuterElementType leaves? Or do I pass a placeholder that has a special ElementType with some line and column information?

Thanks very much for your help!

Peter Gromov

Created May 04, 2011 14:19

You certainly don't need a dedicated JavaScript tree; it gets embedded into the HTML tree automatically when the JS plugin is present. And yes, having separate Play and Groovy trees would be better (it's possible to merge them into one, but I'm not sure the Groovy code will be ready for that).

By elementary particles, I mean precisely identifiers, keywords, whitespace, some special symbols and their combinations (only where the symbols don't mean anything when used separately). With string literals it's not so clear, since they only occur in Groovy, and I suppose reusing the Groovy lexer would be the correct solution, since it already deals with them. When string literals may contain injections, it gets very messy. And you certainly need some brace counting even in the "simple" Groovy expressions, since they may(?) contain closures.
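The string-literal and closure issues show up even in a small sketch that finds the } closing a ${ while skipping braces inside single- or double-quoted Groovy strings. It is a simplification (no triple-quoted or slashy strings, and no quotes nested inside GString interpolations), and the class name is hypothetical:

```java
/** Sketch: find the "}" closing a "${" while skipping braces inside Groovy string literals. */
final class ExpressionEnd {
    private ExpressionEnd() {}

    /**
     * @param text the whole template
     * @param open offset of the "$" in "${"
     * @return offset of the matching "}", or -1 if the expression is not terminated
     */
    static int findClosingBrace(String text, int open) {
        int depth = 0;
        char quote = 0; // 0 when not inside a string, otherwise the opening quote character
        for (int i = open + 1; i < text.length(); i++) {
            char c = text.charAt(i);
            if (quote != 0) {
                if (c == '\\') i++;             // skip the escaped character
                else if (c == quote) quote = 0; // end of the string literal
            } else if (c == '\'' || c == '"') {
                quote = c;
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i;
            }
        }
        return -1;
    }
}
```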

Yes, the Play tree will be the main tree. It will also know about both other trees, and I think the corresponding elements deserve special element types. OuterLanguageElements are only needed when you insert something foreign into a tree, something that tree knows nothing about. So both the HTML and the Groovy trees will contain those elements.

When working with TemplateDataElementType, you pass it the complete text of the file and a lexer. You also tell it that what you need to parse is all the tokens of a specific type (TEMPLATE_TEXT) from that lexer. The text of those tokens is concatenated (you can also customize this a bit by inserting some delimiters) and parsed. After that, the text tokenized as other element types is inserted in the appropriate places as OuterLanguageElements.
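The bookkeeping described above can be modelled in a few lines. This is not TemplateDataElementType's actual implementation, just a standalone sketch of the idea with hypothetical names: concatenate the template-text pieces, record the offset in the concatenated text where each foreign fragment was cut out, and use those offsets to put the fragments back (where the real class would create OuterLanguageElement leaves). Delimiter insertion is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone model of the template-data idea; not TemplateDataElementType's real code. */
final class TemplateDataSketch {
    record Segment(boolean templateText, String text) {}
    record Outer(int offsetInData, String text) {}
    record Split(String dataText, List<Outer> outers) {}

    private TemplateDataSketch() {}

    /** Concatenates the TEMPLATE_TEXT segments and remembers where each foreign fragment was cut out. */
    static Split split(List<Segment> segments) {
        StringBuilder data = new StringBuilder();
        List<Outer> outers = new ArrayList<>();
        for (Segment s : segments) {
            if (s.templateText()) {
                data.append(s.text());
            } else {
                outers.add(new Outer(data.length(), s.text()));
            }
        }
        return new Split(data.toString(), outers);
    }

    /** Puts the foreign fragments back; in IntelliJ they would become OuterLanguageElement leaves. */
    static String rejoin(Split split) {
        StringBuilder full = new StringBuilder();
        int last = 0;
        for (Outer o : split.outers()) {
            full.append(split.dataText(), last, o.offsetInData()).append(o.text());
            last = o.offsetInData();
        }
        return full.append(split.dataText().substring(last)).toString();
    }
}
```

For `<h2>${user.name}</h2>`, the HTML parser would see only `<h2></h2>`, and the recorded offset 4 says where the expression goes back, so no line/column placeholders are needed.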

Permanently deleted user

Created May 05, 2011 08:25

Hi,

I'm also currently working with Play!; it's a really great framework, and I would love to see better support for it in IDEA in general.
Since I've decided to use the Japid template module instead of the default Groovy-based template engine, I was already thinking of writing an IDEA plugin for Japid myself when I came across this thread :-)
I have to admit I have no experience writing plugins for IDEA, and it certainly doesn't look like an easy task. So I wanted to ask whether you will share your plugin code or whether it is going to be closed source.

Anyway, since I would really love to see official support for Play! by JetBrains, I have also created a new issue in YouTrack:

http://youtrack.jetbrains.net/issue/IDEA-69224

Feel free to vote for it ;-)

Christian
