Perl5 plugin for IntelliJ IDEA

Hi everyone.

Recently I decided to try writing a Perl5 plugin for IntelliJ IDEA. I've seen a few earlier attempts, but they stalled after creating four base classes :)

The problem with Perl is its overly free syntax, which requires a very custom lexer and parser. I tried to port Perl's original lexer, but it's too big and carries a lot of legacy stuff, so I dropped the idea.

Currently I'm writing a lexer with JFlex and it works, but it needs tuning for various Perl-specific situations. Anyway, some syntax is already highlighted:
46fdd3eb65.jpg

The second problem is that the documentation for language plugin development looks outdated (of course, it doesn't have that many readers), but examples help in such cases. Sometimes...

Are there any gurus in language plugins who could help with tricky questions, or just save me some time otherwise spent on useless attempts?

Also, if you would like to participate in perl5 plugin development, you are welcome: https://github.com/hurricup/Perl5-IDEA

1

I don't know much about Perl, so I'll answer on a very abstract level.

If you need to implement lots of analysis in the parser, that's fine. It probably means that you'll reflect the results of the analysis in the PSI tree structure: in element types, how nodes are nested etc. This is trivially reusable in the annotator. AFAIK there's no good way of reusing more information, e.g. some additional structures you've built in the process of parsing. OTOH complicated parsing likely means that it'll be slow and not very incremental, which is obviously a problem.

You could also think about having a very simple PSI structure that's easy to be produced in the parser, and defer all the complicated analysis until it's needed in annotators, completion etc. Then you can cache and share it between all these clients quite easily. If it's possible to do this, I'd go this way.
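
A minimal sketch of the "defer and cache" idea, written as plain Java so it stands alone. DeferredAnalysis is a hypothetical helper, not an IntelliJ API, and the modification stamp is an assumed stand-in for whatever change counter the file exposes. The platform has its own caching facilities (CachedValuesManager, as far as I know), which would normally be preferred over a hand-rolled cache like this.

```java
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not an IntelliJ API): runs an expensive analysis
 * lazily and recomputes it only after the modification stamp changes.
 */
final class DeferredAnalysis<S, R> {
    private final S source;
    private final LongSupplier modificationStamp; // assumed change counter
    private final Function<S, R> analyzer;
    private R cached;
    private long cachedStamp;
    private boolean valid;

    DeferredAnalysis(S source, LongSupplier modificationStamp, Function<S, R> analyzer) {
        this.source = source;
        this.modificationStamp = modificationStamp;
        this.analyzer = analyzer;
    }

    /** Returns the cached result, recomputing it if the source has changed. */
    synchronized R get() {
        long stamp = modificationStamp.getAsLong();
        if (!valid || stamp != cachedStamp) {
            cached = analyzer.apply(source);
            cachedStamp = stamp;
            valid = true;
        }
        return cached;
    }
}
```

An annotator and a completion contributor could then share one DeferredAnalysis instance per file, so the complicated analysis runs at most once per change.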

Another way: parse the file twice. The first time you extract some important information (if you do this via FileBasedIndexExtension, IDEA will cache it on disk for you), and the second time, in your actual PSI parser, you are guided by this information and make decisions based on it. That might work, but there are so many things that could go wrong here that I'd only advise doing this as a last resort.

1

That's quite a lot to do, actually. You need to have several PSI trees in one file. You should have a custom view provider (extends MultiplePsiFilesPerDocumentFileViewProvider implements TemplateLanguageFileViewProvider) for that, have PhpStylePerl as a base language and HTML as the template data language. Both files should build PSI on the same text, but skip the parts that belong to another language by dedicating special token types to them, skipping them in the parser and creating OuterLanguageElement PSI for these tokens.

The best start would be to check out the CFML plugin (https://github.com/JetBrains/intellij-plugins/blob/master/CFML/src/com/intellij/coldFusion/model/files/CfmlFileViewProviderFactory.java) and get inspiration from its source.

1

Your base language (be it Perl or not, I don't know if you want to have Perl without HTML) should understand Perl and have a notion of HTML fragments. It should lex these fragments as tokens of a special type (as in CfmlElementTypes#TEMPLATE_TEXT). But it shouldn't be able to fully lex HTML, it can just understand where a fragment starts and where it ends. There are probably some markers of it in the text.

The template data file will have a special content element type (as in CfmlElementTypes#TEMPLATE_DATA) that'll do the lexing for you. You'll still need to build a highlighting lexer yourself, but that's quite easy (see CfmlHighlighter).
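
To illustrate the idea of a base-language lexer that only knows where a foreign fragment starts and ends, here is a rough standalone sketch. The "<%" and "%>" markers, the class name, and the token types are all assumptions made for the example; a real lexer would split the code part into proper Perl tokens instead of one CODE token.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits text into template-data and code fragments. */
final class TemplateFragmentLexer {
    enum Type { TEMPLATE_TEXT, CODE }

    static final class Token {
        final Type type;
        final int start; // inclusive offset
        final int end;   // exclusive offset

        Token(Type type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Assumed fragment markers; the real ones depend on the template syntax.
    private static final String OPEN = "<%";
    private static final String CLOSE = "%>";

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int open = text.indexOf(OPEN, pos);
            if (open < 0) {
                // No more code: the rest is one opaque template-data token.
                tokens.add(new Token(Type.TEMPLATE_TEXT, pos, text.length()));
                break;
            }
            if (open > pos) {
                tokens.add(new Token(Type.TEMPLATE_TEXT, pos, open));
            }
            int close = text.indexOf(CLOSE, open + OPEN.length());
            // An unclosed code block runs to the end of the text.
            int end = close < 0 ? text.length() : close + CLOSE.length();
            tokens.add(new Token(Type.CODE, open, end));
            pos = end;
        }
        return tokens;
    }
}
```

The template-data side never needs to understand the code, and the code side never needs to understand HTML: each only has to agree on the fragment boundaries.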

1

Yes, you can have other language support with the multiple-PsiFile approach. But if you only need self-contained fragments that don't depend on other fragments in the same file (e.g. for resolve), you can also use language injection (PsiLanguageInjectionHost) or just embed them into your main tree with ILazyParseableElementType. And you can define an optional plugin dependency on the SQL support plugin, as described at https://confluence.jetbrains.com/display/IDEADEV/IntelliJ+IDEA+Plugin+Structure.

Second: please see Settings | File and Code Templates. There are many examples there, some relatively "smart". Templates should be packaged in "fileTemplates" directory of your plugin jar, e.g. https://github.com/JetBrains/intellij-community/tree/3f7e93e20b7e79ba389adf593b3b59e46a3e01d1/plugins/groovy/groovy-psi/resources/fileTemplates

1

It depends on your code, but mostly it's used for templating languages, where data language (e.g. HTML) fragments are interspersed with templating directives (e.g. PHP). But nevertheless these data language fragments are parts of a whole, so they should be treated as such and be parts of a single PSI tree. Language injection or lazy-parseable elements can have independent contents, e.g. a Java file may contain several SQL statements in its string literals, and they are completely different and not related at all. That's one of the reasons we use injection for them, and not multi-root view provider.

Choosing between injection and lazy-parseable elements is simple: if you can determine the areas of another language when lexing/parsing, choose the latter. Otherwise - injection.

1

It looks like your element needs to implement PsiNameIdentifierOwner and have StringContent as its name identifier.
References from declaration to itself are possible, but strange. It's not a reference, after all, it's just a name.
For in-place rename, please override com.intellij.lang.refactoring.RefactoringSupportProvider#isInplaceRenameAvailable. You might also need to extend VariableInplaceRenameHandler, as some plugins do.
For element presentations in UI, please look at ElementDescriptionProvider implementations and add your own.

Alexandr Evstigneev

First, the PSI element whose content should be treated as another language must implement the PsiLanguageInjectionHost interface (PsiComment implements it by default).
Second, you should write a class that implements the LanguageInjector interface and register it in the languageInjector extension point. This class contains the logic that decides which language to inject into a particular element. The available languages can be obtained from Language.getRegisteredLanguages().

1

I asked someone at JetBrains a while ago about this, since I'll probably need to do something similar in my plugin for Clojure.

Firstly, one comment was that this is considered a very bad code smell, since it implies that the index doesn't depend just on the file content, and this is a very basic invariant in the IntelliJ indexing system. However for very flexible languages like Clojure (and Perl, I guess) this is a sad reality - sometimes you need some sort of config to interpret the code.

What he recommended was to go with FileBasedIndex.requestReindex(VirtualFile) - this will rebuild all indices for that file. I only asked about file based indexes since that's all I use, but I believe that should also refresh stub indices since they use file based indexes under the hood. If you need to reparse those files as well, use PushedFilePropertiesUpdater#filePropertiesChanged instead (you might need to implement FilePropertyPusher for this, I'm not sure).

If you have a lot of files to update he recommended entering dumb mode, using code similar to PushedFilePropertiesUpdaterImpl#scheduleDumbModeReindexingIfNeeded. I think this is only in recent versions and will need to be copied into your code if you're supporting older versions.

    DumbModeTask task = FileBasedIndexProjectHandler.createChangedFilesIndexingTask(myProject);
    if (task != null) {
      DumbService.getInstance(myProject).queueTask(task);
    }



I also asked about just restarting the daemon analyzer, in case my change only affects local symbol resolution, not indexing. Here's the magic incantation for that:


      // Bump the PSI modification counter, drop resolve caches, then restart highlighting
      ((PsiModificationTrackerImpl)PsiManager.getInstance(project).getModificationTracker()).incCounter();
      PsiManager.getInstance(project).dropResolveCaches();
      DaemonCodeAnalyzer.getInstance(project).restart();

I haven't actually implemented any of this yet, but that seems to be the recommended approach.
1

Please feel free to post specific questions in this forum. It's monitored by IntelliJ IDEA developers, and it's difficult to be more guru than that when it concerns IntelliJ plugin development. :)

Alexandr Evstigneev

How do I parse a nested block with a different syntax? A doc block, for example, has its own syntax, and I want to implement it separately, but it can be embedded in the Perl source.
I've found the block and made a token for it. What should I do next? I couldn't find docs about it. And where can I find the built-in syntaxes, like HTML and SQL, to get an idea?
Thanks.

0

To implement such separate parsing, you can use ILazyParseableElementType. You can find an example here: https://github.com/JetBrains/kotlin/blob/master/compiler/frontend/src/org/jetbrains/kotlin/kdoc/lexer/KDocTokens.java#L34

The HTML parser is part of the Community Edition source code; you can find it here: https://github.com/JetBrains/intellij-community/blob/master/xml/xml-psi-impl/src/com/intellij/lang/html/HtmlParsing.java

The SQL parser is only included in IntelliJ IDEA Ultimate; it's not open-source.

Alexandr Evstigneev

Am I allowed to implement my own lexer and parser for SQL? With blackjack and youknowwhat?

0

Sure. There is already an open-source plugin that supports several SQL dialects: https://plugins.jetbrains.com/plugin/1800?pr=idea
However, note that getting our SQL parsers to a reasonably complete state took us several years of effort, so you may prefer to focus your efforts on other aspects of your project.

Alexandr Evstigneev

Okay. I've created a small lexer for Perl POD (documentation) and a file type for .pod files. Works like a charm:
http://dl2.joxi.net/drive/0004/3351/294167/150421/391eed874c.jpg
I modified the perlpod element to be a block of this language, using a chameleon element. That works too:

http://dl1.joxi.net/drive/0004/3351/294167/150421/31169d184d.jpg
But there's no coloring in the chameleon block:

http://dl2.joxi.net/drive/0004/3351/294167/150421/4d8751bdb2.jpg
It seems I'm missing something.

0

You also need to register a new layer for the lexer in your syntax highlighter, as it is done here: https://github.com/JetBrains/kotlin/blob/master/idea/idea-analysis/src/org/jetbrains/kotlin/idea/highlighter/JetHighlightingLexer.java


Alexandr Evstigneev

Works like a charm, thanks

POD in Perl:

http://dl2.joxi.net/drive/0004/3351/294167/150423/9f55fb5427.jpg

And Perl in POD

http://dl1.joxi.net/drive/0004/3351/294167/150423/40e1374dd6.jpg

Btw, I've got a question: why is Perl inside POD inside Perl not colored? Both highlighters are layered and work with two layers.

Alexandr Evstigneev

Not sure if I have a lexing mistake or am missing something again.

Perl has multiline strings, and I've implemented lexing of those.

Looks like this:

http://dl1.joxi.net/drive/0004/3351/294167/150423/01114c932d.jpg

But if I type anything inside the string or after the end marker, my highlighting stops working (until the document is reloaded or fully copied and pasted):

http://dl2.joxi.net/drive/0004/3351/294167/150423/bc0063e9ea.jpg

The tokens look the same in the PSI viewer. Nothing happens if I type before such a construct.

Also a question: can I assume that the document is always lexed in one pass, not in pieces?

Thanks.

Alexandr Evstigneev

I added debug output to the lexer's advance method, and this is what I see happening when a character is added:

  1. A re-scan from some previous position. I couldn't figure out which one. It's not YYINITIAL and not the last non-newline. It looks like it starts from the token before the modified one.
  2. A full re-scan
  3. A full re-scan


Questions:
From which position is the partial re-scan done? If it starts from the previous token, I believe the problem is that my PSI tree is not really a tree yet. Once the multiline string becomes a leaf of an assignment expression (in my case), that should fix the problem, right?
Why are there two full re-scans?
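
A hedged sketch of one way the partial re-scan position could be chosen. It assumes (this is not confirmed anywhere in this thread) that the editor highlighter remembers the lexer state at the start of each token and restarts from the nearest earlier token that began in the initial state. Token, INITIAL_STATE and restartIndex are hypothetical names, not platform API.

```java
import java.util.List;

/** Hypothetical model of picking a restart point for incremental re-lexing. */
final class IncrementalRelexSketch {
    static final int INITIAL_STATE = 0; // e.g. YYINITIAL in a JFlex lexer

    static final class Token {
        final int start;             // inclusive offset
        final int end;               // exclusive offset
        final int lexerStateAtStart; // lexer state when this token began

        Token(int start, int end, int lexerStateAtStart) {
            this.start = start;
            this.end = end;
            this.lexerStateAtStart = lexerStateAtStart;
        }
    }

    /** Returns the index of the token from which lexing would restart. */
    static int restartIndex(List<Token> tokens, int changeOffset) {
        // Find the last token that starts at or before the change.
        int index = 0;
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).start <= changeOffset) {
                index = i;
            } else {
                break;
            }
        }
        // Step back one token, since an edit may merge with its predecessor.
        index = Math.max(0, index - 1);
        // Keep walking back until a token that began in the initial state.
        while (index > 0 && tokens.get(index).lexerStateAtStart != INITIAL_STATE) {
            index--;
        }
        return index;
    }
}
```

Under that assumption, anything the lexer keeps outside its state number, such as a remembered end marker of a multiline string, would be lost when lexing restarts in the middle of the file, which could explain highlighting that only recovers after a full reload.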

Alexandr Evstigneev

Btw, http://youtrack.jetbrains.com is not working

Alexandr Evstigneev

Here is my progress :)
http://dl2.joxi.net/drive/0004/3351/294167/150425/66136f06ba.png

And the next batch of questions:

  1. Is it possible to make multi-line annotations, and if so, how?
  2. Can anything be done about the bug where re-generating the lexer or parser from the flex/bnf files doesn't update some classes if their Java files are currently open in IDEA?
  3. Is it possible to clean up the gen folder automatically when I re-generate the parser, and to add the generated files back to version control?
  4. How should I handle incorrect syntax and annotations? At the moment I create one element for the proper syntax and one element for the incorrect one (for this particular keyword), but I'm not sure that's the right way. For example:
    • Correct syntax:
      package_use ::= 'use' package_use_arguments ';'
      package_no ::= 'no' package_use_arguments ';'
      
      package_use_arguments ::=
          perl_package PERL_VERSION perl_call_params ?
          | perl_package perl_call_params ?
          | PERL_VERSION;
      
    • Incorrect syntax:
      package_use_invalid ::= 'use' code_line_invalid_element* ';'
      
    Then I catch the PsiElement in the annotator and display a message with the proper syntax. Still, I'm not sure this is the proper way to handle it.
  5. And one more: before I implemented the invalid_syntax elements, the parser started to build some DUMMY blocks. What are they?
Alexandr Evstigneev

Another question.
I implemented a CompletionContributor and ran into a problem.

When searching for results, some internal class builds the search prefix (the symbols I've typed), and it works for function names and scalar variables (like $var).

But we also have arrays (@array), hashes (%hash) and globs (*glob), and that class cuts off the %, * and @ symbols, so the search doesn't work.

What is the fastest workaround here?

Alexandr Evstigneev

Just wanna say - your platform is AWESOME! I'm so excited, can't even explain with my poor English :)

0

Please see com.intellij.codeInsight.completion.CompletionResultSet#withPrefixMatcher(com.intellij.codeInsight.completion.PrefixMatcher)

Alexandr Evstigneev

Is it possible to specify my own PsiElement class for leaf elements?
Currently they are all PsiElements and only non-private composite elements are generated.

Alexandr Evstigneev

I'd like to make built-in methods and variables decorated (bold).
At the moment I'm generating different tokens in the lexer, but that's not really comfortable for later work.
Is there another way to do that?

Alexandr Evstigneev

And the information about the prefix matcher didn't help. I dug a bit through the sources, but couldn't figure out what to do or how to keep the Java internals from cutting off the @.

Alexandr Evstigneev

Very nice article, thanks.
I have an annotator but thought that it's only for warnings/errors/infos.
Thanks again.

0

When calling withPrefixMatcher, just pass a string there that contains the "@" prefix. If you have a PsiReference in those elements, you can also make sure that reference ranges cover those special symbols; then prefix should also include them. Or you can treat those symbols as prefixes and not parts of the identifiers, then there's no need to include them into matching prefix, but you should modify the search instead to take this type information into account.
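
A minimal sketch of the first option above: work out a prefix that keeps the sigil and pass it on yourself. SigilPrefix and prefixAt are hypothetical names, and the set of sigils and name characters is an assumption made for the example. As far as I know, withPrefixMatcher does not modify the result set it is called on but returns a new one, so the completion elements would be added to the returned set.

```java
/** Hypothetical helper: computes a completion prefix that keeps Perl sigils. */
final class SigilPrefix {
    // Assumed set of sigils that may precede a variable or glob name.
    private static final String SIGILS = "$@%*";

    /** Returns the text before the caret that should act as the prefix. */
    static String prefixAt(String text, int caretOffset) {
        int start = caretOffset;
        // Walk back over the identifier part.
        while (start > 0 && isNamePart(text.charAt(start - 1))) {
            start--;
        }
        // Include one leading sigil, if present.
        if (start > 0 && SIGILS.indexOf(text.charAt(start - 1)) >= 0) {
            start--;
        }
        return text.substring(start, caretOffset);
    }

    private static boolean isNamePart(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

Inside a completion provider, the returned string would be the argument to withPrefixMatcher, with the position's text and offset taken from the completion parameters.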

Alexandr Evstigneev

It seems I don't understand something.
I'm not calling withPrefixMatcher.
I'm using a CompletionContributor, and my CompletionProvider's addCompletions is invoked with a pre-created CompletionResultSet, which I can't modify.

0
