I've been working on a language plugin, and once I started actually using it myself I noticed what seems to be a big problem somewhere between the lexer and the parser. Specifically, I'm working on a Haskell plugin. Haskell has an interesting property: braces are required for a program to be correct, but programmers don't usually write them. Instead, source files are effectively preprocessed according to an "offside" or "layout" rule that inserts braces in the appropriate spots, based on newlines and indentation after certain keywords that open a block. I implemented this in my plugin as what is effectively a facade that sits between the lexer and its callers, so that when the class using the lexer asks for the next token, it might actually get a "virtual" brace if that's what the layout rule requires at that point in the source file.
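To make the idea concrete, here is a minimal sketch of such a facade in plain Java. It is a toy under stated assumptions, not my plugin's code: every name in it (`LayoutSketch`, `Tok`, `lex`, `withLayout`, `render`) is made up for illustration, it uses no IntelliJ classes, the "lexer" just splits on whitespace, and the layout rule is heavily simplified (no implicit top-level block, no proper handling of explicit braces, no parse-error rule).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy layout facade; every name here is hypothetical and nothing is an IntelliJ API.
final class LayoutSketch {
    /** A token plus its 1-based position; `virtual` marks tokens the facade invented. */
    static final class Tok {
        final String text; final int line; final int col; final boolean virtual;
        Tok(String text, int line, int col, boolean virtual) {
            this.text = text; this.line = line; this.col = col; this.virtual = virtual;
        }
    }

    // Keywords that open an implicit block when not followed by an explicit "{".
    private static final Set<String> BLOCK_KEYWORDS =
            new HashSet<>(Arrays.asList("where", "let", "do", "of"));

    /** Toy lexer: splits on whitespace and records where each token starts. */
    static List<Tok> lex(String src) {
        List<Tok> out = new ArrayList<>();
        String[] lines = src.split("\n", -1);
        for (int ln = 0; ln < lines.length; ln++) {
            String line = lines[ln];
            int i = 0;
            while (i < line.length()) {
                if (Character.isWhitespace(line.charAt(i))) { i++; continue; }
                int start = i;
                while (i < line.length() && !Character.isWhitespace(line.charAt(i))) i++;
                out.add(new Tok(line.substring(start, i), ln + 1, start + 1, false));
            }
        }
        return out;
    }

    /** Inserts virtual "{", ";" and "}" tokens based on indentation (simplified rule). */
    static List<Tok> withLayout(List<Tok> raw) {
        List<Tok> out = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // columns of the open implicit blocks
        boolean expectBlock = false;
        int prevLine = 0;
        for (Tok t : raw) {
            if (expectBlock) {
                if (!t.text.equals("{")) {
                    // The first token after the keyword fixes the block's indentation.
                    out.add(new Tok("{", t.line, t.col, true));
                    open.push(t.col);
                }
            } else if (t.line != prevLine) {
                // First token on a new line: compare its column with the open blocks.
                while (!open.isEmpty() && t.col < open.peek()) {
                    out.add(new Tok("}", t.line, t.col, true));
                    open.pop();
                }
                if (!open.isEmpty() && t.col == open.peek()) {
                    out.add(new Tok(";", t.line, t.col, true));
                }
            }
            out.add(t);
            expectBlock = BLOCK_KEYWORDS.contains(t.text);
            prevLine = t.line;
        }
        // End of input closes whatever is still open.
        while (!open.isEmpty()) {
            Tok last = raw.get(raw.size() - 1);
            out.add(new Tok("}", last.line, last.col + last.text.length(), true));
            open.pop();
        }
        return out;
    }

    /** Joins token texts with single spaces so the result is easy to eyeball. */
    static String render(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t.text);
        }
        return sb.toString();
    }
}
```

Feeding it the four lines `main = do`, `  foo`, `  bar`, `baz = 1` renders as `main = do { foo ; bar } baz = 1`. The point to notice is that the output depends on the `open` stack, which is built up from everything earlier in the file.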
This all works great for unit tests and every crazy source file I've thrown at it, at least for the initial parse. The trouble starts when I edit an existing file: I quickly see my PSI tree fall apart in the viewer after the edit point, along with an error before the edit point saying, for example, "} expected". That means the parser (which knows nothing about the layout rule) needed a closing brace but didn't get one from the lexer. While I can't rule out some insidious bug in my own code, I suspect this actually has something to do with the "incremental re-lexing" that the IntelliJ framework appears to perform. Given the nature of the layout rule (or at least my implementation thereof), re-lexing a small portion of the file without the surrounding context seems doomed to fail.
So, first question: are there any general guidelines around virtual tokens like this? Even if they don't address my specific situation, such guidelines might lead me in the right direction. Beyond that, are there any suggestions for how I might go about fixing this? I can think of a few things to try:
1. Prevent incremental re-lexing, either by hackery or by configuration. I might be willing to take this path temporarily if it lets me keep putting the plugin through its paces and designing features, but the performance implications are likely too dire to consider it seriously.
2. The lexer interface defines a method for getting the current lexer state and a method for returning to that state. Presumably IntelliJ keeps track of this state number and its association with tokens/text ranges so that it can restore the state when restarting the lexer in the middle of the file. If that is indeed how it works, then in theory some sort of lexer state manager could keep track of the layout stack at every point in the file and then look up the right stack for a given state number. This seems pretty nightmarish and error-prone, and probably a massive overload of the meaning of the lexer state which, from the few examples I've been able to find, seems more geared towards "I'm inside a string right now".
3. Some other nifty lexer feature that solves all my problems which I have been too blind to see yet.
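To make option 2 a little more concrete, the "lexer state manager" could be as small as a table that interns layout stacks, so each distinct stack of indentation columns gets its own integer. The sketch below is plain Java with hypothetical names (`LayoutStateRegistry`, `stateFor`, `stackFor`). It assumes the platform only needs a state to be an int that it can hand back later. I have not verified how the framework stores states or how large they may be, and a real lexer would also have to fold in the underlying lexer's own state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: gives every distinct layout stack a small integer id. */
final class LayoutStateRegistry {
    private final Map<List<Integer>, Integer> idsByStack = new HashMap<>();
    private final List<List<Integer>> stacksById = new ArrayList<>();

    LayoutStateRegistry() {
        // Reserve id 0 for the empty stack, i.e. "no implicit block is open".
        stateFor(Collections.<Integer>emptyList());
    }

    /** Returns the id for this stack, allocating a new one the first time it is seen. */
    int stateFor(List<Integer> stack) {
        Integer existing = idsByStack.get(stack);
        if (existing != null) {
            return existing;
        }
        // Store a private immutable copy so later mutation by the caller cannot corrupt the table.
        List<Integer> copy = Collections.unmodifiableList(new ArrayList<>(stack));
        int id = stacksById.size();
        stacksById.add(copy);
        idsByStack.put(copy, id);
        return id;
    }

    /** Recovers the stack for an id previously returned by stateFor. */
    List<Integer> stackFor(int state) {
        return stacksById.get(state);
    }
}
```

The facade would report `stateFor(currentStack)` as its state after each token and call `stackFor(initialState)` when it is restarted mid-file. Note that the table in this sketch only ever grows, which is one of the reasons the approach worries me.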
The other possibility I can see is learning more about how the lexer and parser interact. Specifically, when and how is the PSI tree refreshed? If the PSI tree were always fully regenerated after the editor input was re-lexed, it would seem quite feasible to put the layout-processing shim in front of the parser as a pre-processing step.
Thanks for any suggestions and for taking the time to read this long-winded post.