Perl5 plugin for IntelliJ IDEA

Hi everyone.

Recently I decided to try writing a Perl5 plugin for IntelliJ IDEA. I've seen a few attempts to start one, but they stalled after creating four base classes :)

The problem with Perl is its overly free syntax, which requires a very custom lexer and parser. I tried to port Perl's original lexer, but it's too big and carries a lot of legacy stuff, so I dropped the idea.

Currently I'm writing the lexer with JFlex. It works, but needs tuning for various Perlish situations. Anyway, some syntax is already highlighted:
46fdd3eb65.jpg
The second problem is that the documentation for language plugin development looks outdated (of course, it doesn't have many readers), but examples help in such cases. Sometimes...

Are there any language plugin gurus here who could help with tricky questions, or just save me some time on useless attempts?

Also, if you would like to participate in Perl5 plugin development, you are welcome: https://github.com/hurricup/Perl5-IDEA

236 comments

Please feel free to post specific questions in this forum. It's monitored by IntelliJ IDEA developers, and it's difficult to be more guru than that when it concerns IntelliJ plugin development. :)


How do I parse a nested block with a different syntax? A doc block, for example, has its own syntax and I want to implement it separately, but it can be embedded in Perl source.
I found the block and made a token for it. What should I do next? I couldn't find docs about this. Also, where can I find the built-in syntaxes, like HTML and SQL, to get an idea?
Thanks.


To implement such separate parsing, you can use ILazyParseableElementType. You can find an example here: https://github.com/JetBrains/kotlin/blob/master/compiler/frontend/src/org/jetbrains/kotlin/kdoc/lexer/KDocTokens.java#L34

The HTML parser is part of the Community Edition source code; you can find it here: https://github.com/JetBrains/intellij-community/blob/master/xml/xml-psi-impl/src/com/intellij/lang/html/HtmlParsing.java

The SQL parser is only included in IntelliJ IDEA Ultimate; it's not open-source.


Am I allowed to implement my own lexer and parser for SQL? With blackjack and youknowwhat?


Sure. There is already an open-source plugin that supports several SQL dialects: https://plugins.jetbrains.com/plugin/1800?pr=idea
However, note that getting our SQL parsers to a reasonably complete state took us several years of effort, so you may prefer to focus your efforts on other aspects of your project.


Okay. I've created a small lexer for Perl POD (documentation) and a file type for .pod files. Works like a charm:
http://dl2.joxi.net/drive/0004/3351/294167/150421/391eed874c.jpg
I modified the perlpod element to be a block of this language, using a chameleon element. That works too:

http://dl1.joxi.net/drive/0004/3351/294167/150421/31169d184d.jpg
But there is no coloring in the chameleon block:

http://dl2.joxi.net/drive/0004/3351/294167/150421/4d8751bdb2.jpg
Seems I'm missing something.


Works like a charm, thanks

POD in Perl:

http://dl2.joxi.net/drive/0004/3351/294167/150423/9f55fb5427.jpg

And Perl in POD:

http://dl1.joxi.net/drive/0004/3351/294167/150423/40e1374dd6.jpg

Btw, a question: why is Perl inside POD inside Perl not colored? Both highlighters are layered and work with 2 layers.


Not sure if I have a lexing mistake or am missing something again.

Perl has multiline strings and I've implemented lexing for them.

Looks like this:

http://dl1.joxi.net/drive/0004/3351/294167/150423/01114c932d.jpg

But if I type anything inside the string or after the end marker, my highlighting stops working (until a document reload or a full copy/paste):

http://dl2.joxi.net/drive/0004/3351/294167/150423/bc0063e9ea.jpg

The tokens look the same in the PSI viewer. Nothing breaks if I type before such a construct.

Another question: can I assume that a document is always lexed in one pass, not in pieces?

Thanks.


I've added debugging to the lexer's advance method, and this is what I see happening when a character is added:

  1. Re-scan from some previous position. I couldn't figure out which one. It's not YYINITIAL and not the last non-newline. It looks like it starts from the token before the modified one.
  2. Full re-scan
  3. Full re-scan


Questions:
From which position is the partial re-scan done? If it's from the previous token, I believe the problem is that my PSI tree is not really a tree yet, and once the multiline string becomes a leaf of an assignment expression (in my case), that should fix the problem, right?
Why are there two full re-scans?


Here is my progress :)
http://dl2.joxi.net/drive/0004/3351/294167/150425/66136f06ba.png

And the next batch of questions:

  1. Is it possible to make multi-line annotations, and if so, how?
  2. Is it possible to do something about the bug where re-generating the lexer or parser from flex/bnf files doesn't update some classes if those classes' Java files are currently open in IDEA?
  3. Is it possible to automatically clean up the gen folder when I re-generate the parser, and add the generated files back to version control?
  4. How should I handle incorrect syntax and annotations? At the moment I create one element for the proper syntax and one element for the incorrect one (for this particular keyword), but I'm not sure that's the right way. For example:
    • Correct syntax:
      package_use ::= 'use' package_use_arguments ';'
      package_no ::= 'no' package_use_arguments ';'
      
      package_use_arguments ::=
          perl_package PERL_VERSION perl_call_params ?
          | perl_package perl_call_params ?
          | PERL_VERSION;
      
      
    • Incorrect syntax:
       
      package_use_invalid ::= 'use' code_line_invalid_element*';' 
      
    Then I catch the PsiElement in the annotator and display a message with the proper syntax. Still, I'm not sure this is the proper way to handle it.
  5. And one more: before I implemented the invalid_syntax elements, the parser started to build some DUMMY blocks. What are they?

Another question.
I've implemented a CompletionContributor and ran into a problem.

When searching for results, some internal class builds the search prefix (the symbols I've entered), and that works for function names and scalar variables (like $var).

But we also have arrays (@array), hashes (%hash) and globs (*glob), and that class cuts off the %, * and @ symbols, so the search doesn't work.

What is the fastest workaround here?


Just wanna say - your platform is AWESOME! I'm so excited, can't even explain with my poor English :)


Please see com.intellij.codeInsight.completion.CompletionResultSet#withPrefixMatcher(com.intellij.codeInsight.completion.PrefixMatcher)


Is it possible to specify my own PsiElement class for leaf elements?
Currently they are all PsiElements and only non-private composite elements are generated.


I'd like to make built-in methods and vars decorated (bold).
At the moment I'm generating different tokens in the lexer, but that's not really convenient for later work.
Is there another way to do that?


And the information about prefixMatcher didn't help. I dug around in the sources a bit, but couldn't figure out what to do and how to stop the Java internals from cutting off the @.


Very nice article, thanks.
I have an annotator but thought that it's only for warnings/errors/infos.
Thanks again.


When calling withPrefixMatcher, just pass a string there that contains the "@" prefix. If you have a PsiReference in those elements, you can also make sure that the reference ranges cover those special symbols; then the prefix should also include them. Or you can treat those symbols as prefixes and not as parts of the identifiers; then there's no need to include them in the matching prefix, but you should modify the search instead to take this type information into account.
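
As a rough illustration of the first option, here is a minimal plain-Java sketch of computing a prefix that keeps the sigil. It does not use the IntelliJ API: `SigilPrefix` and `prefixAt` are hypothetical names, and the identifier rule (letters, digits, underscore) is a simplifying assumption about Perl names. The resulting string is the kind of value one could pass to withPrefixMatcher.

```java
/** Toy prefix extraction that keeps a Perl sigil; hypothetical, not IntelliJ API. */
final class SigilPrefix {
    private static final String SIGILS = "$@%*";

    /** Returns the identifier text before the caret, including a leading sigil if present. */
    static String prefixAt(String text, int caret) {
        int start = caret;
        // Walk back over identifier characters.
        while (start > 0 && isIdentifierChar(text.charAt(start - 1))) {
            start--;
        }
        // Extend the range over a single sigil, if there is one.
        if (start > 0 && SIGILS.indexOf(text.charAt(start - 1)) >= 0) {
            start--;
        }
        return text.substring(start, caret);
    }

    // Simplified assumption: identifiers are letters, digits and underscores.
    private static boolean isIdentifierChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    public static void main(String[] args) {
        System.out.println(prefixAt("push @arr", 9)); // @arr
        System.out.println(prefixAt("keys %ha", 8));  // %ha
        System.out.println(prefixAt("print fo", 8));  // fo
    }
}
```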


Seems I don't understand something.
I'm not calling withPrefixMatcher.
I'm using a CompletionContributor, and my CompletionProvider's addCompletions is invoked with a pre-created CompletionResultSet, and I can't modify that.


You can create another result set + prefix matcher by calling CompletionResultSet#withPrefixMatcher and passing all the lookup elements to the resulting CompletionResultSet.


It is possible (via com.intellij.lang.ASTFactory#createLeaf, see IDEA source for examples). But there's rarely a need for it. Usually it's the composite elements that contain logic and represent the program's constructs. Leaf elements are just tokens.


Well, I'm pretty new to Java, all right, but...

I'm getting the resultSet as an argument, and addCompletions has no return value.
If I create a new result set in addCompletions, how can I pass it back?


Btw, is there any chance to get a live consultation and guidance on how not to shoot yourself in the foot in language plugin development? As far as I know, there is a JetBrains office in SPb :)


resultSet = resultSet.withPrefixMatcher("@" + resultSet.getPrefixMatcher().getPrefix())
...
resultSet.add...

You don't have to pass it back, it should work this way.
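
One way to picture why nothing has to be passed back: a derived result set can forward its elements to the same underlying consumer as the original. The sketch below is a plain-Java toy model of that idea, not the IntelliJ implementation; `ToyResultSet`, `ToyProvider` and `withPrefix` are hypothetical stand-ins, and simple startsWith matching is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a completion result set; not the IntelliJ class. */
final class ToyResultSet {
    private final String prefix;
    private final Consumer<String> sink; // shared by every derived set

    ToyResultSet(String prefix, Consumer<String> sink) {
        this.prefix = prefix;
        this.sink = sink;
    }

    String getPrefix() {
        return prefix;
    }

    /** New set with a different prefix that still feeds the same sink. */
    ToyResultSet withPrefix(String newPrefix) {
        return new ToyResultSet(newPrefix, sink);
    }

    /** Forwards the candidate only if it starts with this set's prefix. */
    void add(String candidate) {
        if (candidate.startsWith(prefix)) {
            sink.accept(candidate);
        }
    }
}

final class ToyProvider {
    /** Same shape as a void callback: nothing is returned to the caller. */
    static void addCompletions(ToyResultSet resultSet) {
        // Reassigning the parameter only changes the local variable,
        // but the derived set still writes into the caller's sink.
        resultSet = resultSet.withPrefix("@" + resultSet.getPrefix());
        resultSet.add("@array");
        resultSet.add("@args");
        resultSet.add("$scalar");
    }

    public static void main(String[] args) {
        List<String> shown = new ArrayList<>();
        addCompletions(new ToyResultSet("ar", shown::add));
        System.out.println(shown); // [@array, @args]
    }
}
```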

As for live consultation, that's not very easy, since all of us who write in this thread are actually located in Germany :) There are other people in the SPb office, of course. But since we've already started here, you may, for example, contact me by email (peter@jetbrains.com) and we can discuss things in email, IM or Skype. But that has the disadvantage that the results won't be available to other plugin developers who read this thread :)


Oh, thanks, didn't know it works that way :)
I thought it would reassign resultSet in my method and keep the original in the caller.


The lexer rescans from a position where its getState returned the same value as at the beginning (usually 0; see the Lexer#start documentation). So if you see some strange relexing behavior, please ensure you return 0 where the lexer can safely restart and something else where it can't (e.g. inside string literals and comments).
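
To make the restart rule concrete, here is a minimal plain-Java toy model, not the platform's actual relexing code. It assumes one token per line, a single heredoc marker `EOM`, and two made-up states (`INITIAL` = 0 and `IN_HEREDOC`); `RestartSketch`, `lex` and `restartOffset` are hypothetical names. It shows that an edit inside the multiline string pushes the restart point back to the last token that began in state 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of state-based relexing; hypothetical, not IntelliJ API. */
final class RestartSketch {
    static final int INITIAL = 0;    // safe to restart lexing here
    static final int IN_HEREDOC = 1; // inside a multiline string body

    /** A token start offset together with the lexer state at that offset. */
    static final class Token {
        final int start;
        final int stateAtStart;

        Token(int start, int stateAtStart) {
            this.start = start;
            this.stateAtStart = stateAtStart;
        }
    }

    /** Treats every line as one token and tracks the heredoc state. */
    static List<Token> lex(String text) {
        List<Token> tokens = new ArrayList<>();
        int state = INITIAL;
        int pos = 0;
        for (String line : text.split("\n", -1)) {
            tokens.add(new Token(pos, state));
            if (state == INITIAL && line.contains("<<EOM")) {
                state = IN_HEREDOC; // the string body starts on the next line
            } else if (state == IN_HEREDOC && line.equals("EOM")) {
                state = INITIAL;    // the end marker closes the string
            }
            pos += line.length() + 1;
        }
        return tokens;
    }

    /** Latest token start at or before the edit whose state is INITIAL. */
    static int restartOffset(List<Token> tokens, int editOffset) {
        int restart = 0;
        for (Token t : tokens) {
            if (t.start <= editOffset && t.stateAtStart == INITIAL) {
                restart = t.start;
            }
        }
        return restart;
    }

    public static void main(String[] args) {
        String src = "my $a = 1;\nmy $s = <<EOM;\nline one\nline two\nEOM\nprint $s;\n";
        List<Token> tokens = lex(src);
        // Editing inside the heredoc body restarts at the line that opens it.
        System.out.println(restartOffset(tokens, src.indexOf("line two"))); // 11
        // Editing after the end marker restarts at that line itself.
        System.out.println(restartOffset(tokens, src.indexOf("print")));    // 48
    }
}
```

If the lexer reported 0 inside the string body as well, this model would restart at "line two" and lex the rest of the body as ordinary code, which matches the broken highlighting described earlier in the thread.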


Oh, so I should just make a lexical state for the interval from the opening marker to the closing marker, and that should solve the problem. Thanks.
