Implementing a custom syntax highlighter based on PSI changes, can't use lexer
Hello, I am working on a language plugin for the IntelliJ Platform. My problem is that the language needs a "position-sensitive" lexer: the same sequence of characters can be tokenized differently depending on the context. For instance, there is a Java-like syntax for generics, but you can also write XML-like tags. I can't resolve this in the lexer; I even reworked PsiBuilder. All was good until I tried to implement syntax highlighting. I obviously can't use SyntaxHighlighter, so I tried to implement EditorHighlighter with a PsiTreeChangeListener. It does not work well: it lags, and the PsiTreeChangeListener leaks. There does not seem to be a good way to dispose of a PsiTreeChangeListener in EditorHighlighter. I would be very grateful if someone could give me a tip or point me in the right direction.
// EditorHighlighterImpl.kt
override fun setEditor(editor: HighlighterClient) {
    this.editor = editor
    val editingVsFile = FileDocumentManager.getInstance().getFile(editor.document)
    val psiFile = psiDocumentManager.getPsiFile(editor.document)
    tokens = extractTokensFromPsiFile(psiFile)
    // FIXME: Leaks
    val psiTreeChangeListener = object : PsiTreeChangeAdapter() {
        private var lastModificationStamp: Long = -1

        override fun childrenChanged(event: PsiTreeChangeEvent) {
            if (editor is Editor && editor.isDisposed) {
                // A crude way of trying to dispose of the listener
                psiManger.removePsiTreeChangeListener(this)
                return
            }
            val psiFile = psiFile ?: return
            if (psiFile.virtualFile != editingVsFile)
                return
            // Skip events that do not change the file content
            val newModificationStamp = psiFile.modificationStamp
            if (newModificationStamp == lastModificationStamp) {
                return
            }
            lastModificationStamp = newModificationStamp
            // Re-extract all tokens in the background, then repaint the whole file
            ApplicationManager.getApplication().executeOnPooledThread {
                runReadAction {
                    tokens = extractTokensFromPsiFile(psiFile) // This is SegmentArrayWithData
                }
                runInEdt {
                    editor.repaint(0, psiFile.textLength)
                }
            }
        }
    }
    psiManger.addPsiTreeChangeListener(psiTreeChangeListener, highlighterDisposable)
}

override fun createIterator(startOffset: Int): HighlighterIterator {
    return PsiHighlighterIterator(
        tokens,
        currentTokenIndex = tokens.findSegmentIndex(min(startOffset, tokens.lastValidOffset)),
        document = editor?.document
    )
}
// Language example
class A<tag> // here <tag> is lexed as LT, ID, GT
{
}
<tag> <!-- here <tag> is lexed as OPEN_TAG -->
some plain text
class is not a keyword, keywords are treated as text
// this is not a comment, but simple raw text
<!-- this one is a comment -->
</tag>
> I can't determine it in the lexer, I even reworked PsiBuilder.
So it's not possible for you to use lexical states in the lexer definition directly, at all?
Hello, thank you for your response.
That is correct. I can't determine all the lexemes in advance: a lexeme's type can differ depending on the parsing context. In the standard PsiBuilder implementation all the lexemes are calculated before parsing starts, which is why I could not use it. I created something like a LexAsYouGoPsiBuilder, where I can change the current state of the lexer to change the tokens it produces. I absolutely need a completely parsed PsiFile in order to show highlighting correctly. What is the best way to implement this? Maybe there is something in the SDK that I can use. My current implementation is a quick hack (see the original post). First of all it leaks, but more than that, it starts to lag once the file grows beyond a couple of thousand lines of code. If someone with knowledge of the platform could guide me on how to approach this, I would be very thankful.
Here is an example of what tokens are produced depending on the context (there are even more variations). Note that "a-b" and "some-attribute" are treated differently, just as "// eol comment" can be either a comment or plain text.
At least in this sample, it looks feasible to switch between the "XML" and "Java" code contexts purely in the lexer (using a leading `<` as the switch).
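To make the "switch in the lexer" idea concrete, here is a minimal, self-contained sketch of a two-state lexer in plain Kotlin (no IntelliJ classes). It rests on one assumption that may not hold for the real language: a `<name>` that is the first non-whitespace text on a line opens markup, and everything up to the matching close tag is plain text or an XML-style comment. All names (`Mode`, `Type`, `Token`, `lex`) are hypothetical and only illustrate the state switch.

```kotlin
enum class Mode { CODE, MARKUP }
enum class Type { LT, GT, ID, OTHER, OPEN_TAG, CLOSE_TAG, COMMENT, TEXT }
data class Token(val type: Type, val text: String)

private val OPEN_TAG_RE = Regex("<[A-Za-z][A-Za-z0-9-]*>")
private val CLOSE_TAG_RE = Regex("</[A-Za-z][A-Za-z0-9-]*>")
private val COMMENT_RE = Regex("<!--[\\s\\S]*?-->")
private val IDENT_RE = Regex("[A-Za-z_][A-Za-z0-9_]*")

// Returns a match that starts exactly at offset i, or null.
private fun Regex.at(s: String, i: Int): MatchResult? =
    find(s, i)?.takeIf { it.range.first == i }

fun lex(input: String): List<Token> {
    val out = mutableListOf<Token>()
    var mode = Mode.CODE
    var depth = 0         // nesting depth of open tags while in MARKUP
    var lineStart = true  // true while only whitespace was seen on the current line
    var i = 0
    fun emit(type: Type, text: String) {
        out += Token(type, text)
        i += text.length
    }
    while (i < input.length) {
        val c = input[i]
        if (mode == Mode.CODE) {
            if (c.isWhitespace()) {          // whitespace is skipped in CODE mode
                if (c == '\n') lineStart = true
                i++
                continue
            }
            // Assumption: only a tag at the start of a line switches to MARKUP.
            val tag = if (lineStart && c == '<') OPEN_TAG_RE.at(input, i) else null
            val id = if (c.isLetter() || c == '_') IDENT_RE.at(input, i) else null
            when {
                tag != null -> { emit(Type.OPEN_TAG, tag.value); mode = Mode.MARKUP; depth = 1 }
                c == '<' -> emit(Type.LT, "<")
                c == '>' -> emit(Type.GT, ">")
                id != null -> emit(Type.ID, id.value)
                else -> emit(Type.OTHER, c.toString())
            }
            lineStart = false
        } else {
            val comment = if (c == '<') COMMENT_RE.at(input, i) else null
            val close = if (c == '<') CLOSE_TAG_RE.at(input, i) else null
            val open = if (c == '<') OPEN_TAG_RE.at(input, i) else null
            when {
                comment != null -> emit(Type.COMMENT, comment.value)
                close != null -> {
                    emit(Type.CLOSE_TAG, close.value)
                    depth--
                    if (depth == 0) mode = Mode.CODE  // back to code after the outermost tag
                }
                open != null -> { emit(Type.OPEN_TAG, open.value); depth++ }
                else -> {
                    // Plain text runs up to the next '<' (keywords and "//" are just text here).
                    val next = input.indexOf('<', i + 1)
                    emit(Type.TEXT, input.substring(i, if (next < 0) input.length else next))
                }
            }
        }
    }
    return out
}
```

With this rule, `class A<tag>` yields ID, ID, LT, ID, GT, while a `<tag>` at the start of a line yields a single OPEN_TAG. If the real language cannot decide the context from the characters alone, this approach does not apply and a PSI-based option is needed instead.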
There's also com.intellij.lang.SyntaxTreeBuilder#remapCurrentToken and com.intellij.lang.ITokenTypeRemapper, which might help here.
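A rough sketch of how those two hooks are typically used from a hand-written parser follows. It needs the IntelliJ Platform SDK on the classpath. The token types in `SketchTokens`, the `parseTagStart` function, and the `inMarkup` flag are hypothetical names invented for this example. Note that, as far as I know, remapping only changes the type of tokens the lexer has already produced (not their boundaries), and it affects what the parser sees rather than what a lexer-based SyntaxHighlighter paints.

```kotlin
import com.intellij.lang.ITokenTypeRemapper
import com.intellij.lang.Language
import com.intellij.lang.PsiBuilder
import com.intellij.psi.tree.IElementType

// Hypothetical token types; a real plugin would pass its own Language instead of Language.ANY.
object SketchTokens {
    val LT = IElementType("LT", Language.ANY)
    val TAG_START = IElementType("TAG_START", Language.ANY)
    val LINE_COMMENT = IElementType("LINE_COMMENT", Language.ANY)
    val PLAIN_TEXT = IElementType("PLAIN_TEXT", Language.ANY)
}

// Option 1: retype only the token the parser is currently looking at.
fun parseTagStart(builder: PsiBuilder) {
    if (builder.tokenType == SketchTokens.LT) {
        builder.remapCurrentToken(SketchTokens.TAG_START)
    }
    builder.advanceLexer()
}

// Option 2: a remapper that the builder consults for token types.
// The parser is expected to flip `inMarkup` when it enters and leaves a tag body.
class MarkupRemapper : ITokenTypeRemapper {
    var inMarkup = false

    override fun filter(source: IElementType, start: Int, end: Int, text: CharSequence): IElementType =
        if (inMarkup && source == SketchTokens.LINE_COMMENT) SketchTokens.PLAIN_TEXT else source
}

// Installs the remapper on a builder and returns it so the parser can toggle the flag.
fun installRemapper(builder: PsiBuilder): MarkupRemapper =
    MarkupRemapper().also { builder.setTokenTypeRemapper(it) }
```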
Alternatively, you might implement custom highlighting by providing a com.intellij.codeHighlighting.TextEditorHighlightingPass.
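For orientation, a highlighting pass along these lines might look like the sketch below. It is untested against any particular platform version and needs the IntelliJ Platform SDK. The class names `PsiColoringPass` and `PsiColoringPassFactory` are hypothetical, and the "color every PsiComment" rule is only a placeholder for whatever the PSI of the real language exposes. The factory is assumed to be registered in plugin.xml through the `com.intellij.highlightingPassFactory` extension point.

```kotlin
import com.intellij.codeHighlighting.TextEditorHighlightingPass
import com.intellij.codeHighlighting.TextEditorHighlightingPassFactory
import com.intellij.codeHighlighting.TextEditorHighlightingPassFactoryRegistrar
import com.intellij.codeHighlighting.TextEditorHighlightingPassRegistrar
import com.intellij.codeInsight.daemon.impl.HighlightInfo
import com.intellij.codeInsight.daemon.impl.HighlightInfoType
import com.intellij.codeInsight.daemon.impl.UpdateHighlightersUtil
import com.intellij.openapi.editor.DefaultLanguageHighlighterColors
import com.intellij.openapi.editor.Editor
import com.intellij.openapi.progress.ProgressIndicator
import com.intellij.openapi.project.Project
import com.intellij.psi.PsiComment
import com.intellij.psi.PsiElement
import com.intellij.psi.PsiFile
import com.intellij.psi.PsiRecursiveElementWalkingVisitor

class PsiColoringPass(private val file: PsiFile, editor: Editor) :
    TextEditorHighlightingPass(file.project, editor.document, false) {

    private val infos = mutableListOf<HighlightInfo>()

    // Called by the daemon in the background; it is cancelled and restarted when the file changes.
    override fun doCollectInformation(progress: ProgressIndicator) {
        infos.clear()
        file.accept(object : PsiRecursiveElementWalkingVisitor() {
            override fun visitElement(element: PsiElement) {
                progress.checkCanceled()
                if (element is PsiComment) { // placeholder rule, replace with language-specific checks
                    HighlightInfo.newHighlightInfo(HighlightInfoType.INFORMATION)
                        .range(element)
                        .textAttributes(DefaultLanguageHighlighterColors.BLOCK_COMMENT)
                        .create()
                        ?.let(infos::add)
                }
                super.visitElement(element)
            }
        })
    }

    // Called on the EDT after collection has finished; publishes the collected highlights.
    override fun doApplyInformationToEditor() {
        UpdateHighlightersUtil.setHighlightersToEditor(
            myProject, myDocument, 0, file.textLength, infos, colorsScheme, id
        )
    }
}

class PsiColoringPassFactory : TextEditorHighlightingPassFactory, TextEditorHighlightingPassFactoryRegistrar {
    override fun registerHighlightingPassFactory(registrar: TextEditorHighlightingPassRegistrar, project: Project) {
        registrar.registerTextEditorHighlightingPass(this, null, null, false, -1)
    }

    // A real plugin would return null here for files of other languages.
    override fun createHighlightingPass(file: PsiFile, editor: Editor): TextEditorHighlightingPass? =
        PsiColoringPass(file, editor)
}
```

Compared with the PsiTreeChangeListener approach in the original post, the platform schedules and cancels the pass itself, so there is no listener to dispose of.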
Thanks for the com.intellij.codeHighlighting.TextEditorHighlightingPass tip. I will have a look. I think it is what I was looking for.