Custom Language: How to re-lex user input when it finally matches a token?

I have a JFlex lexer for a simple grammar that differs from most in that it has no 'identifier' token. There are some keywords, string literals, and brace pairs, but no identifiers.

When I use this grammar as the highlighting lexer for an IDEA custom language and the user starts typing a keyword, the characters are lexed as bad characters, since they don't match anything in the grammar yet. However, once the user has finished typing the keyword, it remains highlighted as bad characters. If I then go back to the beginning of the keyword and insert a space, the word is recognized as a keyword and highlighted correctly.

I assume this is because the highlighter is doing incremental lexing of each character typed but when the space is inserted earlier in the line, it re-lexes the whole line or to the end of the file?

What can I do to fix this?



I wouldn't make use of the "invalid character" highlighting as this should IMO be reserved for single unexpected characters (i.e. typical non-identifier characters). I'd instead extend the grammar with an "invalid-keyword" token that matches something like \w+ to catch keywords that aren't completely typed yet or are misspelled in some way. Then, in your highlighter, define this token to get the color for unknown symbols and you should be set.
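
To make the idea concrete, here is a minimal hand-written sketch in plain Java rather than a real JFlex spec. The keyword set ("begin", "end"), the token names, and the class name are all made up for illustration. It shows the effect of a catch-all word token: a partially typed or misspelled word comes out as one INVALID_KEYWORD token instead of a run of bad characters. In a JFlex spec you would get the same result by listing the keyword rules before a catch-all word rule, since JFlex takes the longest match and, on a tie, the rule that appears first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy lexer: keywords, braces, string literals, plus a fallback word token.
class FallbackLexerSketch {
    // Hypothetical keywords; substitute the ones from the real grammar.
    static final Set<String> KEYWORDS = Set.of("begin", "end");

    enum Type { KEYWORD, INVALID_KEYWORD, LBRACE, RBRACE, STRING, WHITESPACE, BAD_CHARACTER }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    // Assumed definition of a "word" character, roughly \w for ASCII.
    static boolean isWordChar(char c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i);
            Type type;
            if (Character.isWhitespace(c)) {
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                type = Type.WHITESPACE;
            } else if (isWordChar(c)) {
                // Consume the whole word, then decide: exact keyword or fallback token.
                while (i < input.length() && isWordChar(input.charAt(i))) i++;
                String word = input.substring(start, i);
                type = KEYWORDS.contains(word) ? Type.KEYWORD : Type.INVALID_KEYWORD;
            } else if (c == '{') {
                i++;
                type = Type.LBRACE;
            } else if (c == '}') {
                i++;
                type = Type.RBRACE;
            } else if (c == '"') {
                i++;
                while (i < input.length() && input.charAt(i) != '"') i++;
                if (i < input.length()) i++; // consume the closing quote if present
                type = Type.STRING;
            } else {
                // Only genuinely unexpected single characters end up here.
                i++;
                type = Type.BAD_CHARACTER;
            }
            tokens.add(new Token(type, input.substring(start, i)));
        }
        return tokens;
    }
}
```

With this sketch, `FallbackLexerSketch.lex("beg")` returns a single INVALID_KEYWORD token and `FallbackLexerSketch.lex("begin")` returns a single KEYWORD token, so the highlighter has one token to recolor once the word is complete.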



That's interesting. I ended up adding an identifier token to my grammar just to work around the problem but I didn't go so far as making it highlight as invalid. I'll have a look at how that works.



Maxim Mossienko also had this to say: "Highlighter does lexing from the point where lexer has initial state. Probably, your lexer could switch the state on encountering invalid input text to noninitial state. If so, highlighter will relex from valid position before the invalid input tokens."

I'm still using the identifier token workaround but I'll try the above at some point as well.
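
For anyone trying to picture what that quote describes, here is a toy model in plain Java. It is my reading of the explanation, not IDEA's actual code: the class, the method, and the AFTER_BAD state are all hypothetical. It assumes the highlighter remembers the lexer state at the start of each token and, after an edit, restarts at the last token start at or before the edit whose recorded state is the initial one.

```java
// Toy model of picking a relex restart point from recorded lexer states.
class RelexRestartSketch {
    static final int INITIAL = 0;   // the lexer's default state
    static final int AFTER_BAD = 1; // hypothetical state entered after invalid input

    // startOffsets[i] is where token i begins; states[i] is the lexer state
    // recorded at that offset. Returns the offset relexing would start from.
    static int restartOffset(int[] startOffsets, int[] states, int editOffset) {
        int restart = 0;
        for (int i = 0; i < startOffsets.length; i++) {
            if (startOffsets[i] > editOffset) {
                break; // tokens after the edit cannot serve as a restart point
            }
            if (states[i] == INITIAL) {
                restart = startOffsets[i]; // latest safe point seen so far
            }
        }
        return restart;
    }
}
```

Take the text `{ begi` lexed as `{`, a space, and four single bad characters starting at offsets 2 to 5, with the user typing `n` at offset 6. If every token was produced in the initial state, `restartOffset(new int[]{0, 1, 2, 3, 4, 5}, new int[]{0, 0, 0, 0, 0, 0}, 6)` gives 5, so only `in` is relexed and `begin` is never seen as a whole. If the lexer moved to AFTER_BAD once it hit the first bad character, the states are `{0, 0, 0, 1, 1, 1}` and the result is 2, so relexing starts at the `b` and covers the complete keyword.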

