The dreaded "IntellijIdeaRulezzz" string

We added a bug report for this as the behavior we see simply cannot be correct.

http://youtrack.jetbrains.com/issue/IDEA-123914

The problem is that the IntellijIdeaRulezzz string keeps getting added to my input. It seems to occur exactly during autocomplete as this comment says:

/**
 * A default string that is inserted to the file before completion to guarantee that there'll always be some non-empty element there
 */
public static @NonNls final String DUMMY_IDENTIFIER = "IntellijIdeaRulezzz ";

Sometimes the lexer is only told to scan up until the “I”, but sometimes the end of the input to scan is at the last z. This seems like a hack, right?  My experience so far is that the source code for intellij is really good (of course completely without comments where you need them but we all know that). This is the first bit of code that looks truly wrong/bad.

I see it again *as a literal* and in lowercase elsewhere:

final String fakeInitializer = "intellijidearulezzz";

in class IntroduceVariableBase.

Does anybody have any idea how we can get around this? Sam Harwell, my co-author on ANTLR, said that he added a CompletionContributor that set the dummy identifier to "", which would prevent it from adding that string. Unfortunately that seems to completely break the framework as it relies on that being a unique string in the document. [I should point out that Sam is extremely experienced at constructing plugins/IDEs.]

Context: I'm working on the ANTLR plug-in and this has cost me days of time.

thanks for any help or hints!

Terence

20 comments

PS: You can stick whatever weird text in that you want, but don't tell my lexer and parser to scan it, please. Yes, I've read all the other posts.

0

That string is inserted during completion, to ensure that there's always a valid identifier at the caret when running completion. You can customise it to whatever would be a valid identifier in your language, but generally you need something (i.e. you can't customise it to insert an empty string, as Sam tried). The way this works is during completion the PSI is copied and that symbol is inserted into the copy. You then do your autocomplete logic but that symbol is never inserted into your original document. The only way it sneaks into the real document is generally bugs in the autocomplete code, i.e. creating an autocomplete item from the dummy symbol.
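To make the copy-then-insert step concrete, here is a minimal, self-contained sketch. It is not IntelliJ API: the class `DummyInsertionSketch` and the method `completionCopy` are hypothetical names made up for illustration, and the real platform works on PSI files rather than plain strings. It only shows that the dummy identifier ends up in a separate copy at the caret offset while the original text is left alone.

```java
/** Hypothetical illustration only; the real platform copies a PSI file, not a String. */
final class DummyInsertionSketch {
    /** Same value as the platform's default dummy identifier (note the trailing space). */
    static final String DUMMY_IDENTIFIER = "IntellijIdeaRulezzz ";

    /**
     * Returns the text a lexer would be asked to scan for the completion copy:
     * the original text with the dummy spliced in at the caret offset.
     * The original string itself is never modified.
     */
    static String completionCopy(String originalText, int caretOffset, String dummy) {
        if (caretOffset < 0 || caretOffset > originalText.length()) {
            throw new IllegalArgumentException("caret outside document: " + caretOffset);
        }
        return originalText.substring(0, caretOffset)
                + dummy
                + originalText.substring(caretOffset);
    }
}
```

With the caret right after a typed `x`, `completionCopy("x", 1, DUMMY_IDENTIFIER)` yields `xIntellijIdeaRulezzz ` for the copy, while the original document text is still just `x`.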

0

Hi Colin! Thanks for the reply. The problem is that no matter what I use as the identifier, the lexer should not be trying to scan it, right? We don't see it in the real document, only in the text that the lexer is asked to scan. Unfortunately this extra string then appears in a visual representation of the input, as I show shortly. I can understand that there might be something in autocomplete that needs an extra bit of text, although I'm not sure why as I am not a GUI expert, but why would the lexer need to scan some random invalid identifier for autocompletion? That doesn't seem correct to me.

The ANTLR plug-in has a grammar file in an editor window and, just like in grammarkit plug-in, we have a (2nd) input window where the user can type sample input for that grammar. We show the ANTLR parse tree as the user types input dynamically. What we see is that damn string inside our parse tree.  I am attaching a screen snapshot.  After typing the letter x, we see that window snapshot and in the output we see the lexer is complaining about that extraneous input:

line 1:1 token recognition error at: 'I'
line 1:2 token recognition error at: 'n'
line 1:3 token recognition error at: 't'
line 1:4 token recognition error at: 'e'
line 1:5 token recognition error at: 'l'
line 1:6 token recognition error at: 'l'
line 1:7 token recognition error at: 'i'
line 1:8 token recognition error at: 'j'
line 1:9 token recognition error at: 'I'

...

I just don't see how that can be anything but a bug. The lexer should never see that string. Instead of parse tree

(a (e xIntellijIdeaRulezzz ) <missing ';'>)

we should see

(a (e x ) <missing ';'>)

After typing the ';', it gets complete input and shows the proper tree. With erroneous input I also see that extra identifier.

I suppose I could unhack the hack by ignoring (in the lexer input string) anything that starts with that string. Of course in the future this will break when they use a different string or somebody else changes the string on me.
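A rough sketch of what that "unhack" could look like, assuming the lexer input is available as a plain string before it is handed to the lexer. The class `DummyStripper` and the method `stripDummy` are hypothetical names, not part of ANTLR or IntelliJ, and the dummy value is hard-coded here, which is exactly the fragility mentioned above.

```java
/** Hypothetical helper: removes the platform's dummy identifier from lexer input. */
final class DummyStripper {
    /** Hard-coded copy of the default dummy (with trailing space); an assumption that may go stale. */
    static final String DUMMY_IDENTIFIER = "IntellijIdeaRulezzz ";
    static final String DUMMY_IDENTIFIER_TRIMMED = DUMMY_IDENTIFIER.trim();

    /** Returns the text with the first occurrence of the dummy removed, or the text unchanged. */
    static String stripDummy(String text) {
        int start = text.indexOf(DUMMY_IDENTIFIER_TRIMMED);
        if (start < 0) {
            return text; // no dummy present, nothing to do
        }
        int end = start + DUMMY_IDENTIFIER_TRIMMED.length();
        // Also swallow the single trailing space if the untrimmed form was inserted.
        if (text.startsWith(DUMMY_IDENTIFIER, start)) {
            end = start + DUMMY_IDENTIFIER.length();
        }
        return text.substring(0, start) + text.substring(end);
    }
}
```

Two caveats with this approach: token offsets computed on the stripped text no longer line up with the text the platform handed over, and a user who really types that string in the preview window would have it silently removed.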

Hopefully somebody at jetbrains can figure out why this is happening. I suspect this is a bug because there are other cases where it tells me to stop at the first letter of that same bogus string during lexing.

Thanks!
Ter



Attachment(s):
damnstring.png
0

thanks. wow. actually doc! :)  Yep, Sam tried making a CompletionContributor but it won't take "". must be an id but that extra text gets scanned by lexer, which must be a bug I guess.

Ter

0

I'm not actually sure when the copy happens if the text of the doc is copied and then it's lexed and parsed, or if the PSI structure is copied and then the identifier is inserted directly in the PSI. I suspect that the text is copied and then lexed and parsed, which is consistent with what you're seeing.

The token is inserted to make autocomplete logic more consistent - often autocomplete needs to take into account the surrounding context and without an identifier there that context is often invalid. Imagine that I'm completing a static method call in Java, say Integer.parseInt(). If I have the Integer.| (with | representing the cursor) that's going to be an invalid chunk of code which would complicate the completion - having Integer.IntelliJRulezz at least makes the immediate context for completion a valid identifier, even though it's a little nonsensical.

If you're seeing that identifier in the tree, then it must be building that tree based on the file copy, not the original file - are you triggering the build of the tree inside your completion somehow? If so, you can get the original tree from CompletionParameters#getOriginalFile().

If you can prevent your lexer from lexing anything containing that string that might be the best solution (although a little hacky) - if you're worried about future breakage then you can always supply your own dummy identifier in the CompletionInitializationContext or use the value of CompletionInitializationContext.DUMMY_IDENTIFIER.

0

I guess in your case, you don't know what a valid identifier looks like, right? Since you're scanning text using a lexer that the user has created. That might make things a little trickier - have you looked at what GrammarKit does in that case?

0

Of course it's not a bug. The whole point of inserting this string is to always have some element in the PSI (in the copy of a file) that the completion can anchor to - in other words, something on which getVariants() can be called. In order for the element to appear in the PSI, it needs to be scanned with your lexer and parsed with your parser.

0

As my coauthor puts it: "Basically the string is a workaround for not having a parser that can handle incomplete input. ANTLR 4 is not one of those parsers, and the code completion logic (especially in AW2 and GoWorks) depends on being provided the *actual* input. The only valid "dummy identifier" for this parser is the empty string."

You'll forgive me for my ignorance in IDE design, but few in the world have my experience with lexers and parsers. There is no excuse for making my parser parse invalid input and then asking me somehow to ignore it. NetBeans doesn't seem to need this but I haven't checked eclipse. Instead of having some random identifier "to anchor to", your PSI could have a node that represents the set of input symbols that could come next.  From there, your code completion could even include non-identifiers for more sophisticated completion. Surely your parser needs to know the set of possible input symbols for error recovery already.

Given "a.", every parser should be able to tell you what can come next, whether it is punctuation or an identifier. The grammar itself tells you what is possible. You don't need to insert gibberish to fool your parser.  This is a weakness in the parsing strategy.  If the next symbol is an identifier then you simply return all possible identifiers from that context in getVariants(). Given "a.b", we see an identifier without having to guess what comes next and can once again find all possible identifiers visible in that context, this time filtering for those that start with b or whatever.
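As a toy illustration of the filtering step only (not of asking a parser for its expected-token set), the sketch below takes the text typed so far plus a list of identifiers assumed to be visible in that context, and keeps those that start with whatever follows the last dot. The names `PrefixCompletionSketch`, `prefixAfterLastDot`, and `variants` are hypothetical; nothing here is ANTLR or IntelliJ API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy: prefix filtering of completion candidates, no parser involved. */
final class PrefixCompletionSketch {
    /** Text after the last '.', i.e. what the user has typed of the identifier so far. */
    static String prefixAfterLastDot(String input) {
        // lastIndexOf returns -1 when there is no dot, so the whole input is the prefix.
        return input.substring(input.lastIndexOf('.') + 1);
    }

    /** Keeps the visible identifiers that start with the typed prefix, in their original order. */
    static List<String> variants(String input, List<String> visibleIdentifiers) {
        String prefix = prefixAfterLastDot(input);
        List<String> result = new ArrayList<>();
        for (String id : visibleIdentifiers) {
            if (id.startsWith(prefix)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

With visible identifiers `bar`, `baz`, `foo`, the input `a.` has an empty prefix and keeps all three, while `a.b` keeps only `bar` and `baz`.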

I love Intellij and refuse to use any other IDE, but there is a reason why there are essentially no plug-ins compared to other IDEs. You know the usual complaints: very little documentation, very few comments in the code, and the jetbrains humans in this forum always respond to your questions with: "why do you want to know that" or "why should you ever want to do that". I've actually gotten the hilarious response from one of you guys that it's an important thing for new people in the company not to have any comments in the code.

So, it is not a bug in the sense that you specifically designed it to do this, I guess. Just know that you will never convince this parsing expert that what you're doing is necessary or reasonable. BTW, I do not specify anything that would make auto completion turn on for this preview input editor so it shouldn't even be inserting that in the first place.

Finally, if the lexer is supposed to scan that crazy string, then why do you sometimes pass an ending index that makes my lexer stop at the start of that crazy string?  You are saying that is not a bug to scan that string. Ok, well, then you have a bug because sometimes it tells me not to scan that string.  It cannot be correct both ways. Logic dictates that one of those two conditions must be a bug if one strategy is correct, right?

Terence

0

FWIW, I've written several language plugins for IntelliJ now and this has never been a problem for me (documentation yes, crazy symbols in my completion logic no). You say that they could insert a node that represents the set of input symbols that could come next, but that's more or less what that symbol is - although it doesn't give you a list of what could come next, it's up to the language logic to decide that. And that's the crux, I think - for better or for worse, the way IntelliJ works doesn't require a tight coupling of the parsing logic with autocompletion, and I'd consider that a feature, not a bug.

Anyway, the way it works is never going to change since it's a pretty integral part of a fundamental part of the platform, and it doesn't stop IntelliJ from providing the best completion around for non-trivial languages.

0

Hi Colin, yep, we're stuck with it. But why does it sometimes tell me to stop at the start of the string and sometimes to scan it?  That inconsistency is surely a bug.

Also, if I have given it an identifier already, it shouldn't add that string.

I think this only came up because this little test language does not allow normal identifiers. Most languages will not run into this problem, but one can imagine building a hexadecimal data editor or something in intellij as we have XML editors.

I think the solution for me is to have my input stream simply ignore this string; I just hope that it doesn't break some assumptions. Thanks for your time.

Ter

0

Solved by changing string to unusual char:

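import com.intellij.codeInsight.completion.CompletionContributor;
import com.intellij.codeInsight.completion.CompletionInitializationContext;
import org.jetbrains.annotations.NotNull;

/** Avoid dummy ID that intellij always puts in; messes up ANTLR parse tree view. */
public class PreviewCompletionContributor extends CompletionContributor {
    /** appears as space but I don't flip to a dot like real space char in parse tree */
    public static final char DUMMY_IDENTIFIER = '\u001F';

    @Override
    public void beforeCompletion(@NotNull CompletionInitializationContext context) {
        // Replace the default "IntellijIdeaRulezzz " with a single nonprinting char.
        context.setDummyIdentifier(String.valueOf(DUMMY_IDENTIFIER));
    }
}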


Parse tree node has a nonprinting char to the right but works well enough. No other shenanigans worked. Tried everything.
0

The token is inserted into a copy of the file, so your lexer gets to lex both your original and the copy at the same time: since you have probably typed something in the original document, the original gets reprocessed by the lexer along with the copy that has been made. So during completion there are 2 copies of the file, one with the string and one without.

0

hi Jon! nice to hear from you. That's very interesting information. Maybe my solution is to simply ignore errors from a lexer working on the copy?  That would explain why I saw more lexing requests than I thought were needed. Excellent. I think this thread will serve people well in the future. I will stick with my weird character solution for now ;)

Ter

0

Thanks Terence, saved my bacon. I would agree that the IntellijIdeaRulezzz design decision is somewhere at the top of bad design decisions, especially when it inserts a space after the string. However, your solution makes the workaround painless.

In markdown this was causing the element type to become TEXT instead of LINK_REF because [Some Text](link-ref) is parsed correctly but [Some Text](linIntellijIdeaRulezzz k-ref) is not recognized as an explicit link reference and is spit out by markdown as text because of it. So completion would not recognize the element type to be completed.
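The effect can be reproduced with a small sketch. The regular expression below is a deliberately simplified, hypothetical stand-in for "an explicit link whose target contains no whitespace"; it is not the grammar used by any actual markdown plugin, and `LinkRefDummySketch`, `withDummy`, and `isExplicitLink` are made-up names.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of how a dummy containing a space breaks link recognition. */
final class LinkRefDummySketch {
    /** Simplified assumption: [text](target) where target has no whitespace and no ')'. */
    static final Pattern EXPLICIT_LINK = Pattern.compile("\\[[^\\]]*\\]\\([^\\s)]+\\)");

    /** Splices the dummy into the text at the caret offset, as happens in the completion copy. */
    static String withDummy(String text, int caretOffset, String dummy) {
        return text.substring(0, caretOffset) + dummy + text.substring(caretOffset);
    }

    /** True if the whole text is recognized as an explicit link by the simplified pattern. */
    static boolean isExplicitLink(String text) {
        return EXPLICIT_LINK.matcher(text).matches();
    }
}
```

Under this simplified pattern, inserting `"IntellijIdeaRulezzz "` after `lin` stops the text from matching because of the trailing space, whereas a dummy without whitespace (for example the `'\u001F'` character from Terence's workaround) leaves it recognizable as a link.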

Didn't hit this for wiki links because they allow spaces in the link.

The solution to use a non-common character for the DUMMY_IDENTIFIER solves the problem since I only need it to parse correctly so that the element under cursor corresponds to what is in the original file.

Whew!

0

Yes, I have and thank you very much for putting those notes together. It was there that I first learned of the IntellijIdeaRulezzz. When I encountered this problem myself I was not surprised to see the dreaded string.

0

If the trailing whitespace bothers you, I just noticed that CompletionInitializationContext contains a second, trimmed constant named DUMMY_IDENTIFIER_TRIMMED that should avoid your problems; in fact it is used by JavaCompletionContributor.

0

Thanks. I will try this setting.

Yes, it is the whitespace that was causing the problem. Not everywhere in the markdown grammar can you add whitespace and expect the parsing to remain the same.

0
