Syntax Highlighting gets stuck

I'm trying to develop a custom language plugin, inspired by the recent blog post describing how to do so.

I have a subset of my ANTLR grammar implemented in Grammar-Kit and JFlex, and have basically gotten
through the syntax highlighting step.

The problem I'm experiencing is that unless certain tokens are entered into the editor atomically
via copy & paste, they show up with BAD_CHARACTER highlighting.

Even after I correct the syntax of the text in the editor, and the PsiViewer shows the proper tokens
in the tree, the highlighting stays wrong, as if the text had been poisoned by being temporarily invalid.

Is there something I need to do to force a refresh of the syntax highlighting?


Is it possible for you to create a screencast to demonstrate the problem?

BAD_CHARACTER highlighting means that the JFlex lexer is returning the BAD_CHARACTER token type for those tokens.

Possible answer:

Using Grammar-Kit it is possible to write a text-driven parser rather than a token-type-driven one, so the PSI will look good even if the tokens are not.
In the generated code you will see that the consumeToken(builder, text) variant is used instead of consumeToken(builder, tokenType).
Grammar-Kit will show text-matched tokens in a different color in the editor.

Text-driven parsing is generally slower.

Another one:

If highlighting is OK when the correct code is pasted, then there's something wrong with the lexer.
The lexer output should be investigated in the debugger to know for sure.
You can find the complete token sequence in the PsiBuilder instance while debugging.


I've made some progress in this area while waiting for help.

It seems that the style of parser needed for the editor and syntax highlighting has to be significantly more forgiving than
one for code generation.

One token that showed my issue was one generated by a STRING_LITERAL rule.

But my rule required a complete, valid string, and didn't use the multi-state parsing I've seen in some examples.
So as I typed past the opening quote, the lexer started returning BAD_CHARACTER tokens. When the final quote was entered, the STRING_LITERAL
token went into the PSI tree, but the highlighting stayed with the 'bad' marking.

I've changed my string rule to look like:

STRING_LITERAL = \"([^\\\"\r\n]|{ESCAPE_SEQUENCE})*(\"|\\)?

and things seem better.

Is this an appropriate approach to take?
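
To see why the relaxed rule helps while typing, here is a minimal sketch in plain Java that approximates the JFlex rule above with java.util.regex and runs it against every prefix of a string as it would be typed. It is only an illustration, not the plugin's lexer: the class and method names are made up, and ESCAPE_SEQUENCE is assumed to be a backslash followed by any character other than a line break, since its definition is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LenientStringDemo {
    // Assumption: ESCAPE_SEQUENCE is a backslash followed by any character
    // other than a line break. Its real definition is not shown in the thread.
    private static final String ESCAPE_SEQUENCE = "\\\\[^\\r\\n]";

    // Java-regex approximation of the JFlex rule:
    //   STRING_LITERAL = \"([^\\\"\r\n]|{ESCAPE_SEQUENCE})*(\"|\\)?
    // The closing quote is optional, and a lone trailing backslash is accepted,
    // so a half-typed string is still a single STRING_LITERAL match.
    private static final Pattern STRING_LITERAL = Pattern.compile(
            "\"(?:[^\\\\\"\\r\\n]|" + ESCAPE_SEQUENCE + ")*(?:\"|\\\\)?");

    /** Length of the STRING_LITERAL match at the start of input, or -1 if none. */
    static int matchLength(CharSequence input) {
        Matcher m = STRING_LITERAL.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        // The six characters  "a\"b"  typed one at a time.
        String typed = "\"a\\\"b\"";
        for (int i = 1; i <= typed.length(); i++) {
            String prefix = typed.substring(0, i);
            System.out.println(prefix + " -> matched " + matchLength(prefix)
                    + " of " + prefix.length() + " characters");
        }
    }
}
```

With this pattern every prefix is matched in full, so no intermediate state of the text would fall through to a BAD_CHARACTER rule. The trade-off is that an unterminated string and a complete one get the same token type, so reporting the missing quote has to happen somewhere else.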




I use separate rules for valid and unclosed strings and other constructs. Something like this:

STRING_SINGLELINE=\' (\'\'|[^\'\r\n]) * \'
STRING_SINGLELINE_BAD=\' (\'\'|[^\'\r\n]) *

Seems to make everyone happy so far
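
As a rough illustration of how the two rules above interact, the sketch below mimics JFlex's choice in plain Java: try both patterns at the start of the input, take the longer match, and on a tie prefer the rule listed first. The class and method names are hypothetical, java.util.regex only approximates JFlex matching, and the single-character BAD_CHARACTER fallback is an assumption about the rest of the lexer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TwoRuleStringDemo {
    // Shared part of both rules: opening quote, then doubled quotes or any
    // character that is not a quote or a line break.
    private static final String BODY = "'(?:''|[^'\\r\\n])*";

    // STRING_SINGLELINE     = \' (\'\'|[^\'\r\n]) * \'
    private static final Pattern STRING_SINGLELINE = Pattern.compile(BODY + "'");
    // STRING_SINGLELINE_BAD = \' (\'\'|[^\'\r\n]) *
    private static final Pattern STRING_SINGLELINE_BAD = Pattern.compile(BODY);

    private static int matchLength(Pattern pattern, CharSequence input) {
        Matcher m = pattern.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    /** Returns "TOKEN_NAME:length" for the token at the start of non-empty input. */
    static String classify(CharSequence input) {
        int good = matchLength(STRING_SINGLELINE, input);
        int bad = matchLength(STRING_SINGLELINE_BAD, input);
        if (good < 0 && bad < 0) {
            // Assumed fallback: a catch-all rule that consumes one character.
            return "BAD_CHARACTER:1";
        }
        // Longest match wins; on a tie the rule listed first wins.
        return good >= bad
                ? "STRING_SINGLELINE:" + good
                : "STRING_SINGLELINE_BAD:" + bad;
    }

    public static void main(String[] args) {
        System.out.println(classify("'it''s'")); // closed string
        System.out.println(classify("'abc"));    // still being typed
    }
}
```

Keeping the unclosed case as its own token type means the highlighter can still color it as a string while an annotator or the parser flags the missing quote.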
