Best way of handling comments

Hi, 

 

I am developing a plugin for a custom language. The language has comments, and they can appear anywhere (for example after the dot of a function invocation, or directly after an operator). Reflecting this in the BNF would be complicated, because there are so many places where a comment would have to be allowed. Of course, I could have the lexer emit whitespace tokens for comments so that the parser ignores them, but I would still like proper highlighting, so I definitely need to be able to distinguish them somehow.

 

My best idea at the moment is to pass a parameter to the lexer and emit proper comment tokens in highlighting mode and whitespace tokens otherwise.

 

Like this:

 

<YYINITIAL> {COMMENT_START} {
    yybegin(COMMENT);
    stepInComment();
    if (highlighting) {
        return MMTalkTypes.COMMENT_START;
    } else {
        return TokenType.WHITE_SPACE;
    }
}

Or is there a better way to do this?

 


Hi Konstantin!

"An important feature of PsiBuilder is its handling of whitespace and comments. The types of tokens which are treated as whitespace or comments are defined by the methods getWhitespaceTokens() and getCommentTokens() in the ParserDefinition class. PsiBuilder automatically omits whitespace and comment tokens from the stream of tokens it passes to PsiParser, and adjusts the token ranges of AST nodes so that leading and trailing whitespace tokens are not included in the node."

From https://github.com/JetBrains/intellij-sdk-docs/blob/master/reference_guide/custom_language_support/implementing_parser_and_psi.md
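
To illustrate the behavior described in the quote, here is a small toy model in plain Java. It is not PsiBuilder's implementation and uses no IntelliJ classes: the `TokenFilterSketch` class, its `Token` type, and the token type names are all hypothetical. It only shows the idea that a highlighter can work with every token the lexer produces, while the parser receives the same stream with whitespace and comment tokens filtered out.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model only: these types are hypothetical and not part of the IntelliJ Platform.
class TokenFilterSketch {

    // Minimal token: a type name plus the matched text.
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Stand-ins for the sets a ParserDefinition would return from
    // getWhitespaceTokens() and getCommentTokens().
    static final Set<String> WHITE_SPACES = Set.of("WHITE_SPACE");
    static final Set<String> COMMENTS =
            Set.of("COMMENT_START", "COMMENT_CONTENT", "COMMENT_END");

    // What the parser would see: every token except whitespace and comments.
    static List<Token> visibleToParser(List<Token> all) {
        return all.stream()
                .filter(t -> !WHITE_SPACES.contains(t.type) && !COMMENTS.contains(t.type))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The comment delimiters "(*" and "*)" are assumed for illustration.
        List<Token> lexed = List.of(
                new Token("IDENTIFIER", "foo"),
                new Token("DOT", "."),
                new Token("COMMENT_START", "(*"),
                new Token("COMMENT_CONTENT", " note "),
                new Token("COMMENT_END", "*)"),
                new Token("WHITE_SPACE", " "),
                new Token("IDENTIFIER", "bar"));

        // The highlighter can still color all seven tokens;
        // the parser only has to deal with the three that remain.
        for (Token t : visibleToParser(lexed)) {
            System.out.println(t.type + " '" + t.text + "'");
        }
    }
}
```

In this model, every token type that makes up a comment has to be listed in the comment set. Any type left out would reach the parser as an ordinary token.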


Thanks! That solves it. I had just overlooked that I have three comment tokens (including the start and end of a comment).

Replacing:

public static final TokenSet COMMENTS = TokenSet.create(MMTalkTypes.COMMENT);


with

public static final TokenSet COMMENTS = TokenSet.create(MMTalkTypes.COMMENT_CONTENT,  MMTalkTypes.COMMENT_START, MMTalkTypes.COMMENT_END);


solved the problem. A COMMENT rule was also defined (as a composite):

COMMENT ::= COMMENT_START COMMENT_CONTENT* COMMENT_END

I had to do this because the language allows nested comments (which does not make much sense, but it is legacy, so it stays as it is).


Nested comments should be tracked by the lexer via custom JFlex logic (a stack of states).

The rule below will never match because comment tokens are skipped by PsiBuilder:

COMMENT ::= COMMENT_START COMMENT_CONTENT* COMMENT_END
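
As a rough illustration of tracking nesting in the lexer rather than in the grammar, here is a hand-written scanner in plain Java. It is a sketch, not JFlex output: the delimiters `(*` and `*)`, the class name `NestedCommentSketch`, and the `CODE` token name are assumptions made for the example. With a single comment state, a depth counter is enough to play the role of the state stack.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the delimiters and token names are assumptions for illustration.
class NestedCommentSketch {
    static final String OPEN = "(*";
    static final String CLOSE = "*)";

    // Returns "TYPE:text" strings so the result is easy to print and compare.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder pending = new StringBuilder(); // text since the last emitted token
        int depth = 0;                               // current comment nesting level
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith(OPEN, i)) {
                if (depth == 0) {
                    // Outermost opener: close the code run and start a comment.
                    flush(tokens, "CODE", pending);
                    tokens.add("COMMENT_START:" + OPEN);
                } else {
                    // A nested opener is just more comment content.
                    pending.append(OPEN);
                }
                depth++;
                i += OPEN.length();
            } else if (depth > 0 && input.startsWith(CLOSE, i)) {
                depth--;
                if (depth == 0) {
                    // Only the closer that brings the depth back to zero ends the comment.
                    flush(tokens, "COMMENT_CONTENT", pending);
                    tokens.add("COMMENT_END:" + CLOSE);
                } else {
                    pending.append(CLOSE);
                }
                i += CLOSE.length();
            } else {
                pending.append(input.charAt(i));
                i++;
            }
        }
        // Whatever is left is trailing code or the body of an unterminated comment.
        flush(tokens, depth > 0 ? "COMMENT_CONTENT" : "CODE", pending);
        return tokens;
    }

    // Emits the collected text as one token of the given type, if there is any.
    private static void flush(List<String> tokens, String type, StringBuilder pending) {
        if (pending.length() > 0) {
            tokens.add(type + ":" + pending);
            pending.setLength(0);
        }
    }

    public static void main(String[] args) {
        tokenize("a (* x (* y *) z *) b").forEach(System.out::println);
    }
}
```

Because the inner `(*` and `*)` are folded into the comment content, the token stream always contains one start token, optional content, and one end token per top-level comment, regardless of how deeply the comment is nested.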

 


Actually, it does not matter anymore. But having those terminals defined in the BNF provides the token constants that the lexer needs. And it does pay to reset the custom depth counter. So I ended up with:

 

%{

    int commentDepth = 0;

    void stepInComment() {
        commentDepth++;
    }

    void stepOutComment() {
        commentDepth--;
    }

%}

And the states:

 


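<YYINITIAL> {COMMENT_START} {
    // Outermost comment start: reset the depth counter and enter the COMMENT state.
    commentDepth = 0;
    yybegin(COMMENT);
    stepInComment();
    return MMTalkTypes.COMMENT_START;
}

<COMMENT> {COMMENT_START} {
    // Nested comment start: only increase the depth and treat it as content.
    stepInComment();
    return MMTalkTypes.COMMENT_CONTENT;
}

<COMMENT> {COMMENT_END} {
    System.err.println("comment end in comment"); // debug trace
    stepOutComment();
    if (0 == commentDepth) {
        // The outermost comment is closed: return to normal lexing.
        yybegin(YYINITIAL);
        return MMTalkTypes.COMMENT_END;
    }
    // A nested comment is closed: still inside the outer comment.
    return MMTalkTypes.COMMENT_CONTENT;
}

<COMMENT> {COMMENT_CONTENT} {
    return MMTalkTypes.COMMENT_CONTENT;
}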