Code completion and syntax error recovery

I'm having a problem that I cannot find an answer to. I've been working on a grammar, and it can parse the script files when the code in the file is well-formed, but it fails pretty hard when something as simple as a semi-colon is missing. I've been playing with {pin} and {recoverWhile}, but I think I'm missing something, as they seem to make the well-formed code come up as invalid. I didn't think much of this problem at first, as the editor shows roughly where the code goes wrong, but now that I'm trying to implement code completion, it only works if the syntax is already valid.

The custom language plugin I'm writing is for a language similar to Objective-C, so I wanted code completion to work with something like [target selector:value]. The problem is that the closing "]" already has to be there, and if there is code after the statement, the semi-colon must be in place as well, or code completion fails to see that the current element is part of the larger "method call" element. Instead, the parent of the current element is the entire file, which has been reduced to token literals and ID elements; no compound elements are found in the document.

Is there a way to have the parser "pretend" that those elements are already there, then run the check? I've been reading other people's grammars and code for the last two weeks while working on this, and I feel like I'm not making much progress. I don't understand how to use {pin} and {recoverWhile} without them causing valid code to be filled with what the parser thinks are syntax errors.

I've read the Grammar-Kit README and HOWTO on GitHub, I've pored over every bnf file I could find in various repositories, and I've been reading as many seemingly related topics on StackOverflow as I could, and I still don't get how to do this. I've been programming for a few years now, and have always been able to find the answers I've been looking for online without needing to ask a question of my own, but I'm currently at a loss.

5 comments

Don't beat yourself up too much.  Grammar-kit is really under-documented unless you already know how it all works -- then it's just minimalist.

 

{pin=n} does one useful thing: once the parser has matched the n'th item in the rule, you've got a match, even if the rest of the rule fails. The parser commits to this rule at the pin position and will not back out of it to try another alternative. *Everything* after the pin is collected for as long as the rest of the rule components match, for as long as the {recoverWhile} predicate holds true, or until the end of the file is reached. The collected tokens will show up as either the expected elements or an error block.

The corollary use for {pin} is that the rule *will not match at all* until you get to the pin position.  This can be very useful to disambiguate rules requiring similar lexemes.  Consider the following rules:

unadornedMacro ::= '@macro'

adornedMacro ::= '@macro' '(' parameterList? ')' { pin=2 }

parameterList ::= expression ( ',' expression)* { recoverWhile=recovery }

private recovery ::= ! ')'

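If you pinned adornedMacro on its first token ({pin=1}), then parsing the text `@macro` would match adornedMacro, and the missing '(' would be reported as an error. If you use {pin=2} on adornedMacro, the rule only commits once the '(' has been seen, so a bare `@macro` can still be matched by unadornedMacro. (As far as I know, Grammar-Kit does not pin at all unless you ask for it, so a rule without a pin matches only when all of its required parts match.)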
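
To make the two outcomes concrete, here is a small sketch of how the rules might hang together. The parent rule name `macro` and the order of its alternatives are assumptions made up for this illustration; the point is only what the pin value commits the parser to.

```bnf
// Hypothetical parent rule: the pinned alternative is tried first.
macro ::= adornedMacro | unadornedMacro

// pin=2: the rule is committed only once '(' has been consumed.
//   "@macro"      -> '(' is missing before the pin is reached, so adornedMacro
//                    fails cleanly and unadornedMacro gets its chance.
//   "@macro(a, b" -> the pin was reached, so this is still an adornedMacro
//                    node; the missing ')' is reported as an error.
adornedMacro ::= '@macro' '(' parameterList? ')' { pin=2 }

unadornedMacro ::= '@macro'
```

With { pin=1 } on adornedMacro instead, a bare `@macro` would be claimed by adornedMacro and flagged for the missing '('.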

 

Now let's talk about {recoverWhile}.  Its purpose is to allow gathering of lexemes until the next match of (or the beginning of) another rule can reasonably be expected.  So, while it's grabbing everything, you want to stop at some logical point that the parser can then continue with an enclosing rule or start anew.

It seems obvious to add {recoverWhile=!')'} to adornedMacro above.  That would be incorrect, though, because that leaves the ')' as the next lexeme to be parsed, which, if it doesn't match any other rule, will cause a parsing error.  Instead, it should be placed on the parameterList rule, so that when the parameterList is finished, the closing ')' is available for the adornedMacro rule to match and complete.  It is often advantageous to make new rules just to control recoverWhile; here is an example from the Haxe plugin (https://github.com/HaxeFoundation/intellij-haxe):

private emptyObjectLiteral ::= '{' '}'
objectLiteral ::= ('{' objectLiteralElementList+ '}') | emptyObjectLiteral

private objectLiteralElementList ::= objectLiteralElement (',' objectLiteralElement)* ','? {recoverWhile="object_literal_list_recover"}
private object_literal_list_recover ::= !('}')

objectLiteralElement ::= (identifier | stringLiteralExpression) ':' expression {recoverWhile="object_literal_part_recover"}
private object_literal_part_recover ::= !(',' | '}')

In this, you can see that the private objectLiteralElementList could easily have been part of the objectLiteral rule, but it isn't, for precisely the reason stated above: it creates a useful recovery pattern.

 

Thank you for such a quick and helpful response. It finally explained why when I used `recoverWhile` it would tell me that it expected a symbol, even though the statement was indeed followed by the same symbol it was expecting. Your explanation has helped me get going in the right direction, and I have been trying to use it to help me the last few days.

There is still a problem I cannot get past.  The main thing I am trying to do at this point is to create code completion contributions for Objective-C style method calls: `[object selector:value]`. Before you explained recoverWhile, completion required a completely valid method call before it would try to complete. Now my plugin will try to complete so long as there is the opening bracket, the object reference, and a trailing semi-colon. Unfortunately I still need the trailing semi-colon. When I try to add a recoverWhile to the method call, it really messes things up: most of the code in the file, even code unrelated to method calls, comes back as invalid, and the PSI tree is mostly a single level of the most primitive element types. Even code that has no method calls above it still comes back as invalid, as anything listed in the recoverWhile statement is marked up in red. This happens despite it not coming after an invalid (or really any) method call.

I was wondering if there is another way to help autocomplete along without recoverWhile. Is there some API to pretend there is a semi-colon there? Is there a way to try inserting a semi-colon after invalid statements virtually (not actually in the editor), just to have the parser re-check the code and see if it helps, that is, a way to use code to try to recover the code statements? Is there an easy way, or at least a well-documented more difficult way, to do something like this? I was thinking I could try walking backward through the sibling elements one by one, trying to figure out whether they could possibly be a method call, and go from there. That seems less than ideal, but if it is the only way to do this, I will do it. I am just hoping there is another way, a smarter way.

If you could help guide me once more in the right direction, I would really appreciate it.

To figure out why auto-complete isn't picking up your methods, you should debug through the completion code.  It basically takes a list of CompletionContributors that it loads from the installed plugins and then asks each one for some completions.  To get the results you want, it is best/easiest to create your own contributor.  Failing that, take a look at other contributor implementations (the java one is in the IntelliJ Community sources: https://github.com/JetBrains/intellij-community) and see how they are filtering the completion list (which I think is just loaded from the word index).

But that probably isn't the crux of your problem.  It sounds to me like there is some rule in the BNF that is taking precedence when the semi-colon is missing.  (You don't pin on the semi-colon, do you?)  You can try and see what happens when you make the semi-colon optional (e.g. ';'?).  But then you won't get a syntax error marked when it is missing.  You can also debug through the parser that is generated from your BNF.  It really isn't that hard and can show you where things go wrong.
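
As a rough illustration of a BNF layout that tends to survive a missing semi-colon, here is a sketch. Every rule and token name in it is invented for the example (it is not based on your grammar), and it is a fragment rather than a complete grammar. The idea is to pin early, so that the methodCall node exists as soon as the '[' is seen, and to keep recoverWhile on the statement level with a predicate that negates every token that can legally follow a statement.

```bnf
// Sketch only: all rule and token names here are hypothetical.
file ::= statement*

// Recovery sits on the rule inside the loop. The predicate is the negation of
// the tokens that may follow a statement, so skipping stops at the next one.
private statement ::= (methodCall | assignment) ';' {pin=1 recoverWhile="statement_recover"}
private statement_recover ::= !('[' | '}' | ID)

// Pinning on '[' keeps the methodCall node in the tree even when the
// selector, the closing ']' or the ';' has not been typed yet.
methodCall ::= '[' expression selector? ']' {pin=1}
selector ::= ID (':' expression)?

assignment ::= ID '=' expression {pin=2}
expression ::= methodCall | ID
```

Keep in mind that the recoverWhile predicate is tried after the rule finishes with any result, including success. If a token that can start the next statement is missing from the predicate, the recovery rule will eat it even after a perfectly valid statement, which would explain valid code ending up marked in red.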

You *can*, of course try to do as you suggest in your last paragraph (insert a semi-colon and re-parse).  You would clone the PSI (using PsiElement.copy()), add your new semi-colon element (created via the PsiElementFactory for your language) at the appropriate place, and reparse the file to see if things look better.  However, detecting "better" is difficult, at best. 

I suggest looking further at your BNF, and failing that, look into a custom contributor. 

BTW, you can see the resulting PSI structure using the PSI Viewer tool:  Add "-Didea.is.internal=true" to the java command line in your debug/run configuration.  Two new menu items will appear in the Tools menu: "View PSI Structure" and "View PSI Structure of current file".  They bring up a dialog which shows the PSI in all of its glory.  You can even edit code and, after pressing the "Rebuild PSI" button, see the results.

There are also preview/debugging modes available when editing the BNF itself, though I've had little luck using them with a complex grammar.

Hello, thank you for taking the time to reply. I had already written my completion contributor for method selector completion, which is how I became aware of the problem with the PSI tree breaking. It would not contribute in the case of a missing semi-colon, because the element was no longer recognized as being part of a method call. You gave me an idea, however, when you mentioned that I should check whether it works when I do not require the semi-colon. As it does work, I decided to check for the semi-colon in the annotator, and add markup if it's missing. One thing I'm not sure how to do now: when the parser finds a missing semi-colon, it puts a small red indicator next to where it expects one. When I try to do the same (i.e. by setting the range to the end of the statement, plus one), it never shows up. I think it's because it is marking the newline character, so the mark ends up invisible. Is there a special way to mark the space next to a statement, even if no character is visible there?

I think that the parser does it by (eventually) creating a PsiErrorElement and inserting that into the tree, rather than highlighting an existing element.

Something like:

PsiElement next = ... // Get the element at the position where the semi-colon should be.

PsiElement errorMarker = new PsiErrorElementImpl("Missing semi-colon");

next.getParent().addBefore(errorMarker, next);

 
