ANTLR parser support for JetBrains plug-in development

Howdy. I finally managed to put together a separate library that adapts ANTLR to IntelliJ. Note that I copied over a lot of ANTLR's tree support, such as XPath. For example, here is how you get the name nodes for all functions, assuming a tree built from the sample grammar:

 
Collection<? extends PsiElement> allfuncs =
    XPath.findAll(SampleLanguage.INSTANCE, tree, "/script/function/ID");
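
To make the path string concrete, here is a minimal self-contained Java sketch of what an absolute path such as /script/function/ID selects. It is not the library's implementation: ToyPath and ToyNode are made-up names for this illustration, the tree is a plain in-memory structure rather than PSI, and only exact child steps are handled (no wildcards or // steps).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of antlr/jetbrains.
final class ToyPath {
    /** Returns every node reached by following an absolute path like "/script/function/ID". */
    static List<ToyNode> findAll(ToyNode root, String path) {
        String[] steps = path.substring(1).split("/"); // drop the leading '/'
        List<ToyNode> current = new ArrayList<>();
        if (root.name.equals(steps[0])) current.add(root); // first step must name the root
        for (int i = 1; i < steps.length; i++) {
            List<ToyNode> next = new ArrayList<>();
            for (ToyNode n : current) {
                for (ToyNode c : n.children) {
                    if (c.name.equals(steps[i])) next.add(c); // keep only matching children
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        ToyNode tree = new ToyNode("script",
                new ToyNode("function", new ToyNode("ID"), new ToyNode("block")),
                new ToyNode("statement", new ToyNode("ID")),
                new ToyNode("function", new ToyNode("ID")));
        System.out.println(findAll(tree, "/script/function/ID").size()); // prints 2
    }
}

// Toy stand-in for a tree node: a rule or token name plus ordered children.
final class ToyNode {
    final String name;
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(String name, ToyNode... kids) {
        this.name = name;
        for (ToyNode k : kids) children.add(k);
    }
}
```

The ID under statement is not selected, because the path only descends through function nodes.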

Here are the library and a sample plug-in (screenshot attached):

https://github.com/antlr/jetbrains

https://github.com/antlr/jetbrains-plugin-sample


I gave up trying to force-fit ANTLR parse trees into IntelliJ. Once I embraced PSI, I made rapid progress by simply adapting my favorite tree stuff to PSI types.

@jetbrains guys:  any interest in formally accepting the ANTLR support lib as part of the standard plug-in SDK? It's quite small and would provide an excellent service to plug-in developers.

Terence



Attachment(s):
screenshot.png

Wow! This is really cool. I'm really glad you decided to go through with getting ANTLR working with the IntelliJ stuff! A couple of quick remarks/questions for you.

for context:

I'm in a research group writing a "translator-like-thing" for a domain-specific language, and back in (2013? .. somewhere around there) I transitioned to ANTLR 4 from 2.x and it just blew me away. Anyway, having used IntelliJ to write the tools for this language (translator, etc.), I became interested in figuring out how to write a useful IntelliJ plugin for it. Long story short, you can imagine my disappointment when I realized, after looking into plugin development, that I'd have to rewrite my new, clean (i.e., actionless) ANTLR 4 grammar in the .bnf format supported by JetBrains. The prospect of having to maintain two separate grammars (keeping them in lockstep) is a nightmare in its own right, but doing so in the context of a (somewhat?) turbulent research environment made it even worse.

Having seen the ANTLR plugin you wrote and the adaptor you and Sam Harwell wrote back in 2014, I was very encouraged that ANTLR 4 support would eventually come. As a little experiment in the meantime (over the last couple of months), I wrote a subset of my language in .bnf, expressed the same subset in ANTLR 4 using your initial adaptor, and attempted to write a fairly complete little plugin that includes the works (ref completion, keyword completion, structure view) around each of these two approaches.

Here are a couple things I noticed in going through with this dumb(?) exercise (if you're interested :)):

1. pinning: Not sure how closely you've looked at the bnf stuff (probably much more than me) but one useful thing they allow is the ability to pin certain parts of a rule,
for instance if I say (in bnf):

OpDecl ::= 'Operation' identifier Signature ';' {pin=2}

the rule OpDecl won't be 'recognized' until the user starts to type 'identifier' (at least one char). Then, when you go to automatically gen the PSI nodes for this guy, the generator will annotate getIdentifier() (inside the node for OpDecl) as @NotNull. As innocuous as this seems, it is actually pretty nice when doing things like reference completion, as it spares you from having to do all of these ugly nullity checks yourself in your getReference(..) methods. Maybe I'm wrong or just incompetent (probably), but I ended up with if-not-null checks all over the place when working with the PSI built by the ANTLR adaptor, to avoid these annoying cases where I would erase the name of something in the middle of completion and get all kinds of errors in the background. Granted, while this is an inconvenience, I'm not sure it's reason enough to rewrite the entire grammar in bnf :) . I noticed at some point you mentioned this in a git TODO issue, so maybe you have plans for this.

2. making psi rule nodes invisible (private): Another thing I ran into; say I have a rule like this:

 
mathDefinitionSig
    :   mathPrefixDefinitionSig
    |   mathInfixDefinitionSig
    |   mathOutfixDefinitionSig
    ;


Ideally, PSI-wise, I'd like to have a (hand-coded) abstract class called something like MathAbstractDefinitionSig (meaning no concrete node is generated for mathDefinitionSig), and then tell the generator that the concrete nodes for prefix, outfix, and infix should extend the MathAbstractDefinitionSig class I wrote. In bnf, I'd just say something like this:

 
extends("Math.*(Signature)")="edu.mylang.plugin.psi.impl.ResAbstractMathSignatureImpl"
implements("Math.*Signature")="edu.mylang.plugin.psi.ResMathDefinitionSignature"

private MathDefinitionSignature ::=
           MathPrefixDefinitionSignature
        |  MathOutfixDefinitionSignature
        |  MathInfixDefinitionSignature


Then, if you check out the tree in PsiViewer, sure enough MathDefinitionSignature (rightly) has no concrete presence. Right now I'm not sure how I'd accomplish this with the ASTFactory approach currently (advocated?) for the ANTLR adaptor. If I didn't bother to manually create a node for MathDefinitionSignature, wouldn't it just automatically become a composite node (er, ASTWrapperPsiElement or whatever it's called)? Either way it's still there.

3. parser error reporting:  Lastly, I have noticed some things in terms of error reporting. For instance if I type this into the editor:

Module X

but the module declaration requires, say, a semicolon after the identifier X, I don't get any syntax error reports or underlines. It'll only detect this if there is something declared after the missing semicolon:

Module X

Def. foo : B;

then it'll tell me there's an error after the X. Why is this? Is it an adaptor thing? It's odd because if I gen the parser with the bnf grammar, I get an immediate error underline after the X even when nothing follows it in the file.
(I should mention, too, that this is still the case with your most recent -- December 2015 -- version of the adaptor.)

Sorry about the long post. I'll just restate: I'm really glad you're working on this, and thanks for all your work! Let me know if I've stated anything incorrectly (probably :)) or if you have any follow-up questions/remarks. I'm kind of bouncing between drinking the JetBrains bnf Kool-Aid right now or just buckling down and going the ANTLR route (though I do realize it might be kind of early to decide) as you just "officially" posted this today!

Thanks again.


Hi Daniel! Heh, thanks for the detailed comments. I'd be very happy if you would become a contributor to the library, particularly since you have some very good suggestions. I am also interested in getting different PSI nodes for different alternatives. For example, ANTLR builds a different tree node, called AssignContext, for the assignments but we get a generic ANTLRPsiNodeAdaptor object so far in my library:

 
statement
   :  'if' '(' expr ')' statement ('else' statement)?   # If
   |  'while' '(' expr ')' statement                    # While
   |  ID '=' expr                                       # Assign
   ;


Annoying, as I want to use XPath to go find assign nodes ;) and I can't do that unless I have a rule reference.

Concerning your last question about parse errors: put EOF at the end of your start rule so that the parser is forced to look for the end of file. Otherwise it assumes that it should match only as much of the input as it can.
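
A minimal sketch of that difference, assuming a toy hand-written recognizer rather than ANTLR-generated code (EofDemo, parseModule, and the one-rule grammar module : 'Module' ID ';' are made up for this illustration). Without the EOF check, the rule succeeds as soon as its own tokens are matched and silently ignores whatever follows; with it, leftover input becomes an error.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy recognizer for illustration; not generated by ANTLR.
final class EofDemo {
    /**
     * Recognizes:  module : 'Module' ID ';' ;
     * When requireEof is true it behaves like:  module : 'Module' ID ';' EOF ;
     * Returns null on success, or a description of the first error.
     */
    static String parseModule(List<String> tokens, boolean requireEof) {
        int p = 0;
        if (p >= tokens.size() || !tokens.get(p).equals("Module")) return "expected 'Module' at token " + p;
        p++;
        if (p >= tokens.size() || !isId(tokens.get(p))) return "expected ID at token " + p;
        p++;
        if (p >= tokens.size() || !tokens.get(p).equals(";")) return "expected ';' at token " + p;
        p++;
        // Only an explicit EOF forces the rule to account for the rest of the input.
        if (requireEof && p < tokens.size()) return "expected EOF at token " + p;
        return null;
    }

    static boolean isId(String t) {
        return t.matches("[A-Za-z_][A-Za-z_0-9]*");
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Module", "X", ";", "junk");
        System.out.println(parseModule(input, false)); // prints null
        System.out.println(parseModule(input, true));  // prints expected EOF at token 3
    }
}
```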

I'm currently working on the sample plug-in, adding comments and so on.

Ter


Thanks Terence, I'll definitely be keeping my eye on this and would be happy to help in any way that I can!

Also, I tried out the EOF suggestion mentioned, but still haven't managed to get the error highlights that I would like in my syntax checker. See the attached picture; surely this should indicate in the editor (with a red line or otherwise) that 'end' is expected but hasn't been found.

I took a bunch of alternatives out temporarily to simplify the picture a bit. And I am sure that I re-gen'd the parser after adding EOF. I even tweaked a keyword so I could be sure I was indeed running the most recent version of the grammar.
 
Perhaps this has to do with the version of ANTLR I'm running? I just checked and I'm running 4.5-complete.jar.



Attachment(s):
eof.png

What is your start rule? If it's module, you should DEFINITELY get an error, as EOF is there. Do you have a listener on the ANTLR parser? The parser adaptor I have should add error nodes for missing or bad stuff. Does that help?
Ter


Yep, my start rule is module (I specify that as well in RESOLVEParserDefinition#createParser() -- like sample does.)

Hmm, guess I'm missing something. Right now I have all the parser definition, language, and file type boilerplate in place and added to META-INF. I do see a syntaxErrorListener in the adaptor git submodule, though I haven't touched it -- I thought the adaptor handled that.

I don't need to implement the SyntaxHighlighter or anything to get those messages right?


Syntax errors should be handled automatically in the adaptor. For example, I see error nodes in the PSI tree for syntax errors. You shouldn't need a syntax highlighter, but I doubt the parser would be called unless you have one.

Please set a breakpoint in ANTLRParserAdaptor#parse() after it executes the following:

ParseTree parseTree = null;
PsiBuilder.Marker rollbackMarker = builder.mark();
try {
   parseTree = parse(parser, root);
}
finally {
   rollbackMarker.rollbackTo();
}


Then look at the listener. That should help track things down.

The other thing that I always check in this situation, when code needs to be generated and incorporated into a project: make sure that the code isn't being generated in different places, perhaps once with a Maven build and once manually. The two runs might generate into different directories, and you could be picking up an outdated version of the grammar, one that does not have EOF.

Ter


Ok, I set the breakpoint and sure enough, looks like the syntax error is in there, but for some reason it's not getting reported.

Maybe I'll try hooking up the syntax highlighter and see if that makes a difference, though the breakpoint was triggered, so it seems like it shouldn't matter.

Oh, and I'm not using Maven at the moment (I don't have the pom.xml in). For now I'm just right-clicking on the .g4 file and, as expected, all the generated files go to gen/ (I've checked by making sure gen was empty and then redoing it, etc.), so hopefully that's not the problem. That'd be pretty anticlimactic.

Dan



Attachment(s):
breakpoint.png

Sorry for the delay. I had to rework how errors are handled by my adaptor to properly highlight errors in the document. I first had to understand what the impedance mismatch was and then alter the error strategy and handler. Please pull the antlr/jetbrains lib down again and try it. :)
Ter



Attachment(s):
Screen Shot 2015-12-17 at 11.31.58 AM.png

Very nice! Much more responsive to the partially formed things I'm typing in.

One thing I noticed: I don't think error notifications kick in for the first keyword recognized for a given rule. For example, here are two rules:

moduleDecl
     :      precisModuleDecl EOF
     ;

precisModuleDecl
     :     'Precis' name=ID ';'
           'end' closename=ID ';'
     ;

If I just type Precis into the editor, clearly ID is supposed to follow, but I don't get an underline notification. However, if I continue typing and go ahead with the ID, everything starts to work as expected -- I get notifications for everything erroneous/missing. It's just after completing that first token that I'm not seeing the message. Attached is a pic of this. You can actually see in PSI viewer that the error node in that case is missing (though it does say nextchild is that error...).

Oh, and I think you have a stray printout that needs to be hushed up on line 19 of ErrorStrategyAdaptor (I can squash it if you need).



Attachment(s):
odd.png

Crap. I could swear I tried all scenarios. Yep, works for me. See snapshot. My grammar is:

 
script
   :  function EOF
   ;

function
   :  'func' ID '!'
   ;


Maybe check your lexer. Do you have stuff like this?

 
WS : [ \t\n\r]+ -> channel(HIDDEN) ;

/** "catch all" rule for any char not matched in a token rule of your
 *  grammar. Lexers in IntelliJ must return all tokens good and bad.
 *  There must be a token to cover all characters, which makes sense, for
 *  an IDE. The parser however should not see these bad tokens because
 *  it just confuses the issue. Hence, the hidden channel.
 */
ERRCHAR
   :  .  -> channel(HIDDEN)
   ;
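
The idea behind the catch-all rule can be sketched in plain Java. This is a toy hand-written lexer, not the ANTLR-generated one, and CoveringLexer with its token names is made up for this illustration: every character ends up in some token, but whitespace and unknown characters are marked hidden so a parser would never see them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer for illustration; not generated by ANTLR.
final class CoveringLexer {
    static final class Token {
        final String type;
        final String text;
        final boolean hidden; // true = "hidden channel", skipped by the parser

        Token(String type, String text, boolean hidden) {
            this.type = type;
            this.text = text;
            this.hidden = hidden;
        }
    }

    /** Splits the input into tokens so that every character belongs to exactly one token. */
    static List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                out.add(new Token("WS", input.substring(start, i), true));
            } else if (Character.isLetter(c)) {
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                out.add(new Token("ID", input.substring(start, i), false));
            } else if (c == '!') {
                i++;
                out.add(new Token("BANG", "!", false));
            } else {
                i++; // catch-all: one token per unknown character, kept but hidden
                out.add(new Token("ERRCHAR", String.valueOf(c), true));
            }
        }
        return out;
    }

    /** Token types a parser would see: everything not on the hidden channel. */
    static List<String> visibleTypes(List<Token> tokens) {
        List<String> types = new ArrayList<>();
        for (Token t : tokens) if (!t.hidden) types.add(t.type);
        return types;
    }

    /** Concatenated text of all tokens, hidden or not. */
    static String coveredText(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) sb.append(t.text);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize("func f # !");
        System.out.println(visibleTypes(tokens));                      // prints [ID, ID, BANG]
        System.out.println(coveredText(tokens).equals("func f # !")); // prints true
    }
}
```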


Attachment(s):
Screen Shot 2015-12-17 at 2.36.31 PM.png

So I checked to make sure that I had everything you suggested at the end of the lexer portion of my grammar. In fact, my grammar (apart from the top-level rules you're looking at) matches the sample.g4.

So before, I tried simplifying the issue (that'll show me!). Here's the actual snippet (no simplification -- except for the comments within precis):

 
moduleDecl
    :   ( precisModuleDecl
        | precisExtensionModuleDecl
        ) EOF
    ;

precisModuleDecl
    :   'Precis' name=ID ';'
        //    (usesList)?
        //    precisBlock
        'end' closename=ID ';'
    ;

precisExtensionModuleDecl
    :   'Precis' 'Extension' name=ID 'for' precis=ID ';'
        // ...
        'end' closename=ID ';'
    ;


So it's a top level rule (moduleDecl) with two alts terminated by an EOF.

Doing this and then typing in Precis, no error node is created. I thought maybe this had to do with the fact that each alternative begins with the same token... So I changed the precisExtensionModuleDecl rule to:

precisExtensionModuleDecl
    :   'Prexis' 'Extension' name=ID 'for' precis=ID ';'
        // ...
        'end' closename=ID ';'
    ;


and voila, it all works as expected -- meaning just "Precis" isn't valid input. But when I change it back to the way it was, "Precis" becomes valid.

Shouldn't the error node that's not being created in this case be something like: "eof found, ID or Extension expected.."?
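
A minimal sketch of the message asked about here, assuming each alternative is flattened into a plain list of token names. ExpectedTokens and expectedAfter are hypothetical names, and this is not how ANTLR or the adaptor computes expected tokens; it only shows that after the shared prefix Precis, the expected set is the union of what each still-viable alternative wants next.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not ANTLR's prediction algorithm.
final class ExpectedTokens {
    /** Collects the next token of every alternative that starts with the tokens seen so far. */
    static Set<String> expectedAfter(List<List<String>> alternatives, List<String> seen) {
        Set<String> expected = new TreeSet<>(); // sorted, so the output order is stable
        for (List<String> alt : alternatives) {
            boolean viable = alt.size() > seen.size()
                    && alt.subList(0, seen.size()).equals(seen);
            if (viable) expected.add(alt.get(seen.size()));
        }
        return expected;
    }

    public static void main(String[] args) {
        // The two alternatives of moduleDecl, with ';' written as SEMI.
        List<String> precis = Arrays.asList("Precis", "ID", "SEMI", "end", "ID", "SEMI");
        List<String> extension = Arrays.asList(
                "Precis", "Extension", "ID", "for", "ID", "SEMI", "end", "ID", "SEMI");
        List<List<String>> alts = Arrays.asList(precis, extension);
        System.out.println(expectedAfter(alts, Arrays.asList("Precis"))); // prints [Extension, ID]
    }
}
```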


Ah! That is the difference. I know the issue. Thinking of a solution.
Ter


Works! Thanks! I'm going to try getting some ref completion stuff going and will post any questions I have as I run up on them.


I'll push my sample plugin changes for refs/defs shortly. The basics seem to work.


Sorry to keep bothering you with this syntax err highlighting stuff, but any idea what the problem with lists might be?

For instance, I have a uses/import list rule that looks like:

 
usesList
    :   'uses' ID (',' ID)* ';'
    ;


So I type

Precis X;
     uses y

end X;

and it doesn't catch the missing semicolon after y. Another one:

Precis X;

     Definition x : ;

end X;

The rule for this construct starts at "mathStandardDefinitionDecl"; it's probably just easier to link to the .g4 and look at it on git. Speaking of git, should I just start moving this discussion to an issue thread instead?

the g4 file: https://github.com/Welchd1/jetbrains-plugin-resolve/blob/antlr4-way/src/edu/clemson/resolve/jetbrains/parser/Resolve.g4


Hi. Sure, can you start an issue at the antlr/jetbrains repo?
Thanks!
Ter


Issue discussion moved here: https://github.com/antlr/jetbrains/issues/2.

Apologies if I hijacked the thread!
