Implementing a custom syntax highlighter based on PSI changes (can't use the lexer)


Hello, I am working on a language plugin for the IntelliJ Platform. The problem is that this language has a "position-sensitive" lexer: the same sequence of characters can be tokenized differently depending on the context. For instance, there is a syntax for Java-like generics, and you can also write XML-like tags. I can't determine the token types in the lexer, so I even reworked PsiBuilder. Everything was fine until I tried to implement syntax highlighting. I obviously can't use SyntaxHighlighter, so I tried to implement an EditorHighlighter driven by a PsiTreeChangeListener. It does not work well: it lags, and the PsiTreeChangeListener leaks. There does not seem to be a good way to dispose of a PsiTreeChangeListener from an EditorHighlighter. I would be very grateful if someone could give me a tip or point me in the right direction.

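```kotlin
// EditorHighlighterImpl.kt

import com.intellij.openapi.application.ApplicationManager
import com.intellij.openapi.application.runInEdt
import com.intellij.openapi.application.runReadAction
import com.intellij.openapi.editor.Editor
import com.intellij.openapi.editor.highlighter.HighlighterClient
import com.intellij.openapi.editor.highlighter.HighlighterIterator
import com.intellij.openapi.fileEditor.FileDocumentManager
import com.intellij.psi.PsiTreeChangeAdapter
import com.intellij.psi.PsiTreeChangeEvent
import kotlin.math.min

override fun setEditor(editor: HighlighterClient) {
    this.editor = editor
    val editingVsFile = FileDocumentManager.getInstance().getFile(editor.document)
    val psiFile = psiDocumentManager.getPsiFile(editor.document)
    tokens = extractTokensFromPsiFile(psiFile)

    // FIXME: Leaks. The listener is removed only when highlighterDisposable is
    // disposed, and nothing here ties that disposable to the editor's lifetime.
    val psiTreeChangeListener = object : PsiTreeChangeAdapter() {
        private var lastModificationStamp: Long = -1

        override fun childrenChanged(event: PsiTreeChangeEvent) {
            if (editor is Editor && editor.isDisposed) {
                // A crude attempt at disposing of the listener: it only runs
                // if another PSI change arrives after the editor was closed.
                psiManager.removePsiTreeChangeListener(this)
                return
            }

            val psiFile = psiFile ?: return

            // The listener sees changes in every file; ignore the others.
            if (psiFile.virtualFile != editingVsFile)
                return

            // Skip repeated events for the same modification.
            val newModificationStamp = psiFile.modificationStamp
            if (newModificationStamp == lastModificationStamp) {
                return
            }

            lastModificationStamp = newModificationStamp

            // Re-tokenize the whole file off the EDT, then repaint everything.
            // Nothing orders these tasks: an older one can finish last and
            // overwrite newer tokens. `tokens` is also written here on a pool
            // thread while createIterator reads it, so it should be @Volatile.
            ApplicationManager.getApplication().executeOnPooledThread {
                runReadAction {
                    tokens = extractTokensFromPsiFile(psiFile) // This is SegmentArrayWithData
                }
                runInEdt {
                    editor.repaint(0, psiFile.textLength)
                }
            }
        }
    }

    psiManager.addPsiTreeChangeListener(psiTreeChangeListener, highlighterDisposable)
}

override fun createIterator(startOffset: Int): HighlighterIterator {
    // Start iterating at the token that contains startOffset.
    return PsiHighlighterIterator(
        tokens,
        currentTokenIndex = tokens.findSegmentIndex(min(startOffset, tokens.lastValidOffset)),
        document = editor?.document
    )
}
```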
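A likely contributor to the lag in the code above is that every PSI event starts another full re-tokenization, and nothing prevents an older task from finishing last and overwriting newer tokens. Below is a minimal sketch, in plain Kotlin without platform classes, of one guard against that: requests are numbered, a request that is already superseded when it starts is skipped, and a result is published only if nothing newer has been published yet. `LatestOnlyRecomputer` is a hypothetical name, not a platform API. Inside the IDE, the platform's `ReadAction.nonBlocking(...)` builder (with `coalesceBy` and `expireWith`) is intended for this kind of job and is worth checking against the SDK version you target; the sketch only illustrates the idea.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Future
import java.util.concurrent.atomic.AtomicLong
import java.util.concurrent.atomic.AtomicReference

/**
 * Hypothetical helper, not a platform API: runs recomputations on [executor]
 * and guarantees that an older result never replaces a newer one.
 */
class LatestOnlyRecomputer<T>(
    private val executor: ExecutorService,
    initial: T
) {
    // Number of the most recent request.
    private val requested = AtomicLong(0)

    // (request number, value) of the newest result published so far.
    private val published = AtomicReference(0L to initial)

    /** Newest published value; safe to read from any thread. */
    val current: T
        get() = published.get().second

    /**
     * Schedules [compute]. [onPublished] runs on the pool thread, and only
     * if this particular result was actually published.
     */
    fun request(compute: () -> T, onPublished: (T) -> Unit = {}): Future<*> {
        val myNumber = requested.incrementAndGet()
        return executor.submit(Runnable {
            // Already superseded before starting: skip the work entirely.
            if (requested.get() != myNumber) return@Runnable
            val result = compute()
            // Publish only if no newer result has been published meanwhile.
            while (true) {
                val prev = published.get()
                if (prev.first >= myNumber) return@Runnable
                if (published.compareAndSet(prev, myNumber to result)) break
            }
            onPublished(result)
        })
    }
}
```

In the listener above, `compute` would wrap the read action around `extractTokensFromPsiFile`, and `onPublished` would schedule the repaint on the EDT, because in this sketch it runs on a pool thread. This only removes wasted and out-of-order work; each request still re-tokenizes the whole file.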


```
// Language example

class A<tag> // here <tag> is lexed as LT, ID, GT
{
}

<tag>   <!-- here <tag> is lexed as OPEN_TAG -->
some plain text
class is not a keyword, keywords are treated as text
// this is not a comment, but simple raw text
<!-- this one is a comment -->
</tag>
```



> I can't determine it in the lexer, I even reworked PsiBuilder.

So it's not possible for you to use lexical states directly in the Lexer definition, at all?


Hello, thank you for your response.

That's correct. I can't determine all the lexemes in advance: a lexeme's type can differ depending on the parsing context. In the standard PsiBuilder implementation, all the lexemes are calculated before parsing starts, which is why I could not use it. I created something like a LexAsYouGoPsiBuilder, in which I can change the lexer's current state and thereby change the tokens it produces. I absolutely need a fully parsed PsiFile to show highlighting correctly. What is the best way to implement this? Maybe there is something in the SDK I can use. My current implementation is a quick hack (see the original post). Not only does it leak, it also starts to lag once the file grows beyond a couple of thousand lines. If someone who knows the platform could guide me on how to approach this, I would be very thankful.

Here is an example of the tokens produced depending on the context (there are even more variations). Note that "a-b" and "some-attribute" are treated differently, and "// eol comment" can be either a comment or just plain text.

```
function test() {
    createList<int>(a-b); // eol comment
    /*
    ID(createList)
    LT
    ID(int)
    GT
    LEFT_PARENTHESIS
    ID(a)
    MINUS
    ID(b)
    RIGHT_PARENTHESIS
    SEMICOLON
    WHITESPACE
    EOL_COMMENT(// eol comment)
    */

    <tag some-attribute>(); // eol comment </tag>;
    /*
    LT
    ID(tag)
    ID(some-attribute)
    GT
    WHITESPACE((); // eol comment )
    LT
    SLASH
    ID(tag)
    GT
    SEMICOLON
    */
}
```
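For reference, the LexAsYouGoPsiBuilder idea described above can be sketched in plain Kotlin without platform classes: the parser pulls one token at a time and sets the lexer mode before each pull, so `a-b` and `some-attribute` come out differently. `LexAsYouGoStream`, `Mode`, `parseElement` and the `TEXT`/`CHAR` token names are hypothetical, and the parser fragment assumes well-formed input.

```kotlin
/** Hypothetical lexer modes that the parser switches between. */
enum class Mode { CODE, TAG, BODY }

data class Token(val type: String, val text: String)

/**
 * Hypothetical "lex as you go" stream: nothing is tokenized up front.
 * Each next() call lexes exactly one token at the current offset in the
 * current [mode], so the parser decides how the same characters are split.
 */
class LexAsYouGoStream(private val text: String) {
    var mode = Mode.CODE
    private var offset = 0

    fun peekChar(): Char? = text.getOrNull(offset)

    // Inside a tag '-' belongs to the name; in code it is an operator.
    private fun isNameChar(ch: Char) = ch.isLetterOrDigit() || (mode == Mode.TAG && ch == '-')

    fun next(): Token? {
        if (offset >= text.length) return null
        val start = offset
        val c = text[offset]
        val type = when {
            // Element body: raw text up to the next '<' (no comments, no keywords).
            mode == Mode.BODY && c != '<' -> {
                while (offset < text.length && text[offset] != '<') offset++
                "TEXT"
            }
            c.isWhitespace() -> {
                while (offset < text.length && text[offset].isWhitespace()) offset++
                "WHITESPACE"
            }
            c.isLetter() -> {
                while (offset < text.length && isNameChar(text[offset])) offset++
                "ID"
            }
            else -> {
                offset++
                when (c) {
                    '<' -> "LT"
                    '>' -> "GT"
                    '/' -> "SLASH"
                    '-' -> "MINUS"
                    else -> "CHAR"
                }
            }
        }
        return Token(type, text.substring(start, offset))
    }
}

/** Hypothetical parser fragment for `<name attrs>body</name>`; it drives the lexer mode. */
fun parseElement(stream: LexAsYouGoStream): List<Token> {
    val out = mutableListOf<Token>()
    fun take(): Token = checkNotNull(stream.next()) { "unexpected end of input" }.also { out += it }

    stream.mode = Mode.TAG
    do { val t = take() } while (t.type != "GT")   // LT, name, attributes, GT
    stream.mode = Mode.BODY
    if (stream.peekChar() != '<') take()           // body text, if any
    stream.mode = Mode.TAG
    do { val t = take() } while (t.type != "GT")   // LT, SLASH, name, GT
    stream.mode = Mode.CODE                         // back to code after the element
    return out
}
```

The trade-off is the one raised in the question: the token stream only exists as a by-product of parsing, so a lexer-based SyntaxHighlighter cannot reproduce it on its own.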

At least in this sample, it looks feasible to switch between the "XML" and "Java" code contexts purely in the Lexer (with the first `<` token as the switch).
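To make that suggestion concrete, here is a minimal sketch, in plain Kotlin without platform classes, of a single lexer that switches lexical states on `<`. The rule it uses is an assumption that happens to fit the samples above: a `<` directly after an identifier is a generic-type `LT`, and any other `<` followed by a letter opens a tag. It does not distinguish keywords, so something like `return <tag>` would be misread. `lexWithStates` and the `TEXT`/`CHAR` token names are hypothetical. A real `com.intellij.lexer.Lexer` would also need the tag flag, the nesting depth and the previous-token check to be recoverable from its integer `getState()`, because the editor restarts lexing in the middle of the file; that constraint is worth checking before going this way.

```kotlin
data class Token(val type: String, val text: String)

/**
 * Hypothetical single-pass lexer with lexical states; no parser involved.
 * Assumed rule: '<' right after an identifier is a generic LT,
 * any other '<' followed by a letter opens a tag.
 */
fun lexWithStates(src: String): List<Token> {
    val out = mutableListOf<Token>()
    var i = 0
    var inTag = false    // between '<' and '>' of a tag
    var closing = false  // the current tag is a closing tag '</...>'
    var depth = 0        // open elements; > 0 means "inside an element body"

    fun emit(type: String, start: Int) { out += Token(type, src.substring(start, i)) }
    fun lastSignificant() = out.lastOrNull { it.type != "WHITESPACE" && it.type != "EOL_COMMENT" }?.type

    while (i < src.length) {
        val start = i
        val c = src[i]
        val next = src.getOrNull(i + 1)
        // The state switch: does this '<' start an opening or closing tag?
        val opensTag = c == '<' && !inTag &&
            ((next == '/' && depth > 0) || (next?.isLetter() == true && lastSignificant() != "ID"))
        when {
            // Element body: raw text up to the next '<'.
            !inTag && depth > 0 && c != '<' -> {
                while (i < src.length && src[i] != '<') i++
                emit("TEXT", start)
            }
            c.isWhitespace() -> {
                while (i < src.length && src[i].isWhitespace()) i++
                emit("WHITESPACE", start)
            }
            !inTag && c == '/' && next == '/' -> {
                while (i < src.length && src[i] != '\n') i++
                emit("EOL_COMMENT", start)
            }
            opensTag -> {
                i++
                emit("LT", start)
                inTag = true
                closing = next == '/'
                if (!closing) depth++
            }
            c.isLetter() -> {
                // Inside a tag '-' belongs to the name.
                while (i < src.length && (src[i].isLetterOrDigit() || (inTag && src[i] == '-'))) i++
                emit("ID", start)
            }
            else -> {
                i++
                val type = when (c) {
                    '<' -> "LT"
                    '>' -> "GT"
                    '/' -> "SLASH"
                    '-' -> "MINUS"
                    '(' -> "LEFT_PARENTHESIS"
                    ')' -> "RIGHT_PARENTHESIS"
                    ';' -> "SEMICOLON"
                    else -> "CHAR"
                }
                emit(type, start)
                if (c == '>' && inTag) {
                    inTag = false
                    if (closing) depth--
                }
            }
        }
    }
    return out
}
```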

There are also com.intellij.lang.SyntaxTreeBuilder#remapCurrentToken and com.intellij.lang.ITokenTypeRemapper, which might help here.
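To show what remapping can and cannot do, here is a standalone model in plain Kotlin. `TokenTypeRemapper` below only imitates the shape of `ITokenTypeRemapper.filter(source, start, end, text)`; it is not the platform interface, and token types are plain strings instead of `IElementType`s. The assumption worth checking against the real API: a remapper receives a token the lexer has already cut, so, as far as that signature suggests, it can change the token's type (for example `class` from KEYWORD to TEXT inside a tag body) but not its boundaries. It could not merge `a`, `-`, `b` into one `some-attribute` token, or split a comment token that swallowed `</tag>`.

```kotlin
data class Token(val type: String, val start: Int, val end: Int)

/** Standalone imitation of the shape of ITokenTypeRemapper.filter; not the real interface. */
fun interface TokenTypeRemapper {
    fun filter(source: String, start: Int, end: Int, text: CharSequence): String
}

/** Runs every token through [remapper]: types may change, boundaries never do. */
fun remapAll(tokens: List<Token>, text: CharSequence, remapper: TokenTypeRemapper): List<Token> =
    tokens.map { it.copy(type = remapper.filter(it.type, it.start, it.end, text)) }

/** Hypothetical remapper: inside [body], keywords and comments become plain text. */
fun bodyAsText(body: IntRange) = TokenTypeRemapper { source, start, end, _ ->
    val insideBody = start in body && end - 1 in body
    if (insideBody && (source == "KEYWORD" || source == "EOL_COMMENT")) "TEXT" else source
}
```

For `<t>class x</t>`, a context-free lexer would produce `KEYWORD` for `class` at offsets 3 to 8; `remapAll(tokens, text, bodyAsText(3 until 10))` turns it into `TEXT` at the same offsets and leaves every other token as it was.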

Alternatively, you could implement custom highlighting by providing a com.intellij.codeHighlighting.TextEditorHighlightingPass.
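A rough sketch of how such a pass might divide the work, in plain Kotlin with stand-in types. As far as I know, a `TextEditorHighlightingPass` (created by a `TextEditorHighlightingPassFactory`) implements `doCollectInformation(ProgressIndicator)`, which can run in the background over the already-parsed file, and `doApplyInformationToEditor()`, which applies the collected result to the editor. Because the daemon reruns passes after edits, this would take the place of the hand-rolled PsiTreeChangeListener, though that is worth confirming in the SDK documentation. `Node`, `Highlight`, `collectHighlights` and the key names are hypothetical, and `checkCanceled` stands in for `ProgressIndicator.checkCanceled()`.

```kotlin
/** Stand-in for a node of the fully parsed PSI tree (not com.intellij.psi.PsiElement). */
data class Node(val type: String, val start: Int, val end: Int, val children: List<Node> = emptyList())

/** One collected highlight: a text range plus the name of a hypothetical attributes key. */
data class Highlight(val start: Int, val end: Int, val key: String)

/**
 * The "collect" half of a pass: walks the finished tree in pre-order and records
 * a highlight for every node type that [keyFor] maps to a key. It never touches
 * the editor; applying the list is a separate step.
 */
fun collectHighlights(
    root: Node,
    keyFor: (String) -> String?,
    checkCanceled: () -> Unit = {}
): List<Highlight> {
    val result = mutableListOf<Highlight>()
    fun visit(node: Node) {
        checkCanceled() // lets an outdated pass stop early once the user types again
        keyFor(node.type)?.let { result += Highlight(node.start, node.end, it) }
        for (child in node.children) visit(child)
    }
    visit(root)
    return result
}
```

Keeping collection free of editor access is what allows it to run off the EDT and be abandoned cheaply; only the short apply step has to touch the editor.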


Thanks for the com.intellij.codeHighlighting.TextEditorHighlightingPass tip. I will have a look; I guess it is what I was looking for.
