Best way of handling comments
Hi,
I am developing a plugin for a custom language. Comments can appear anywhere in it (after the dot of a function invocation, right after an operator, etc.), so reflecting them in the BNF would be complicated: there are many places where they would have to be allowed. Of course, I could emit whitespace tokens for them from the lexer so that the parser ignores them, but I would still like proper highlighting, so I definitely need to be able to distinguish them somehow.
My best idea at the moment is to pass a parameter to the lexer and emit proper comment tokens in highlighting mode and whitespace tokens otherwise.
Like:
<YYINITIAL> {COMMENT_START} {
    yybegin(COMMENT);
    stepInComment();
    if (highlighting) {
        return MMTalkTypes.COMMENT_START;
    } else {
        return TokenType.WHITE_SPACE;
    }
}
Or is there a better way to do this?
Hi Konstantin!
"An important feature of PsiBuilder is its handling of whitespace and comments. The types of tokens which are treated as whitespace or comments are defined by the methods
getWhitespaceTokens() and getCommentTokens() in the ParserDefinition class. PsiBuilder automatically omits whitespace and comment tokens from the stream of tokens it passes to PsiParser, and adjusts the token ranges of AST nodes so that leading and trailing whitespace tokens are not included in the node."
From https://github.com/JetBrains/intellij-sdk-docs/blob/master/reference_guide/custom_language_support/implementing_parser_and_psi.md
Thanks! That solves it. I had just overlooked that I have 3 comment tokens (including the beginning and end of a comment).
Replacing:
With
That solved the problem. The COMMENT token was also defined (as a composite):
I had to do this because the language allows nested comments (which does not make sense, but it is legacy, so it stays as it is).
Nested comments should be tracked by the lexer via custom JFlex logic (a stack of states).
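The depth tracking suggested here can be sketched in plain Java, outside of JFlex. This is only an illustration under assumptions: the delimiters /* and */ and the names NestedCommentScanner and stripComments are hypothetical and do not come from the thread, and a simple depth counter stands in for the stack of states.

```java
/** Minimal sketch: removes (possibly nested) comments by tracking nesting depth. */
class NestedCommentScanner {
    // Hypothetical delimiters; the real language's comment markers are not given in the thread.
    static final String OPEN = "/*";
    static final String CLOSE = "*/";

    /** Returns the input with every comment, including nested ones, removed. */
    public static String stripComments(String text) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // stands in for the lexer's custom nesting state
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith(OPEN, i)) {
                depth++;                 // entering a comment, or a nested one
                i += OPEN.length();
            } else if (depth > 0 && text.startsWith(CLOSE, i)) {
                depth--;                 // the comment ends only when depth returns to 0
                i += CLOSE.length();
            } else {
                if (depth == 0) {
                    out.append(text.charAt(i)); // ordinary code character
                }
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A comment right after the dot of an invocation, with one nested comment inside.
        System.out.println(stripComments("f./* outer /* inner */ still outer */g()"));
    }
}
```

In a real JFlex lexer the same counter would be incremented in the rule for the comment start and decremented in the rule for the comment end, with yybegin(YYINITIAL) called only once the depth is back to zero. The counter should also be reset whenever the lexer is restarted.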
The rule below will never match because comment tokens are skipped by PsiBuilder:
Actually it does not matter anymore. But having those terminals defined in the BNF provides string constants for the tokens, which the lexer needs. And it does pay to reset the custom stack depth. So I ended up with:
And States: