Best way of handling comments

Created May 22, 2018 06:53

Hi,

I am developing a plugin for a custom language. It has comments, and they can appear anywhere (after the dot of a function invocation, right after an operator, etc.). Reflecting this in the BNF would be complicated, because there are so many places where they would have to be allowed. Of course, I could emit whitespace tokens for them from the lexer so that the parser ignores them, but I would still like proper highlighting, so I definitely need to be able to distinguish them somehow.

My best idea at the moment is to pass a parameter to the lexer and emit proper comment tokens in highlighting mode and whitespace tokens otherwise.

like:

<YYINITIAL> {COMMENT_START} {
    yybegin(COMMENT);
    stepInComment();
    if (highlighting) {
        return MMTalkTypes.COMMENT_START;
    } else {
        return TokenType.WHITE_SPACE;
    }
}

Or is there a better way to do this?

4 comments

Gregory Shrago

Created May 22, 2018 12:09

Hi Konstantin!

"An important feature of PsiBuilder is its handling of whitespace and comments. The types of tokens which are treated as whitespace or comments are defined by the methods getWhitespaceTokens() and getCommentTokens() in the ParserDefinition class. PsiBuilder automatically omits whitespace and comment tokens from the stream of tokens it passes to PsiParser, and adjusts the token ranges of AST nodes so that leading and trailing whitespace tokens are not included in the node."

From https://github.com/JetBrains/intellij-sdk-docs/blob/master/reference_guide/custom_language_support/implementing_parser_and_psi.md
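
To make the quoted behaviour concrete without the IntelliJ SDK on the classpath, here is a toy model in plain Java. It is only a sketch of the idea, not how PsiBuilder is implemented: the `Token` class, the `SKIPPED` set and the `visibleToParser` method are hypothetical names, and the token types are borrowed from this thread. The set plays the role of what getWhitespaceTokens() and getCommentTokens() return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class CommentSkippingDemo {
    // Hypothetical token types, named after the ones used in this thread.
    enum Type { IDENTIFIER, DOT, WHITE_SPACE, COMMENT_START, COMMENT_CONTENT, COMMENT_END }

    static final class Token {
        final Type type;
        final String text;

        Token(Type type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Stands in for the union of the whitespace and comment token sets.
    static final Set<Type> SKIPPED = EnumSet.of(
            Type.WHITE_SPACE, Type.COMMENT_START, Type.COMMENT_CONTENT, Type.COMMENT_END);

    // Returns the text of the tokens a parser would see once skipped types are dropped.
    static List<String> visibleToParser(List<Token> lexed) {
        List<String> visible = new ArrayList<>();
        for (Token token : lexed) {
            if (!SKIPPED.contains(token.type)) {
                visible.add(token.text);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Token> lexed = Arrays.asList(
                new Token(Type.IDENTIFIER, "foo"),
                new Token(Type.DOT, "."),
                new Token(Type.COMMENT_START, "/*"),
                new Token(Type.COMMENT_CONTENT, "note"),
                new Token(Type.COMMENT_END, "*/"),
                new Token(Type.IDENTIFIER, "bar"));
        System.out.println(visibleToParser(lexed));
    }
}
```

The lexer still produces every comment token, so a highlighter reading the raw token stream can colour them, while the grammar never has to mention comments.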

Konstantin Pribluda

Created May 22, 2018 13:23

Thanks! That solves it. I had just overlooked that I have 3 comment tokens (there are also tokens for the beginning and end of a comment).

Replacing:

public static final TokenSet COMMENTS = TokenSet.create(MMTalkTypes.COMMENT);

With

public static final TokenSet COMMENTS = TokenSet.create(MMTalkTypes.COMMENT_CONTENT, MMTalkTypes.COMMENT_START, MMTalkTypes.COMMENT_END);

Solved the problem. The COMMENT token was also defined (as a composite):

COMMENT ::= COMMENT_START COMMENT_CONTENT* COMMENT_END

I had to do this because the language allows nested comments (it does not make sense, but it is legacy, so it stays as it is).

Gregory Shrago

Created May 22, 2018 13:28

Nested comments should be tracked by the lexer via custom JFlex logic (a state stack).

The rule below will never match because comment tokens are skipped by PsiBuilder:

COMMENT ::= COMMENT_START COMMENT_CONTENT* COMMENT_END

Konstantin Pribluda

Created May 22, 2018 13:38

Actually, it does not matter anymore. But having those terminals defined in the BNF provides string constants for the tokens, which the lexer needs. And it does pay to reset the custom stack depth. So I ended up with:

%{

    int commentDepth = 0;

    void stepInComment() {
        commentDepth++;
    }
    void stepOutComment() {
        commentDepth--;
    }

%}

And the states:

<YYINITIAL> {COMMENT_START} {
    // Outermost comment opener: reset the depth, then count this level.
    commentDepth = 0;
    yybegin(COMMENT);
    stepInComment();
    return MMTalkTypes.COMMENT_START;
}

<COMMENT> {COMMENT_START} {
    // Nested opener: go one level deeper and treat it as comment content.
    stepInComment();
    return MMTalkTypes.COMMENT_CONTENT;
}

<COMMENT> {COMMENT_END} {
    stepOutComment();
    if (0 == commentDepth) {
        // Closed the outermost comment: back to normal lexing.
        yybegin(YYINITIAL);
        return MMTalkTypes.COMMENT_END;
    }
    // Closed a nested comment: still inside the outer one.
    return MMTalkTypes.COMMENT_CONTENT;
}

<COMMENT> {COMMENT_CONTENT} {
    return MMTalkTypes.COMMENT_CONTENT;
}
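
For anyone who wants to check the depth-counting logic without generating a lexer, the rules above can be mirrored in plain Java. This is a rough sketch under assumptions, not the generated JFlex code: the thread never shows the COMMENT_START and COMMENT_END macros, so `/*` and `*/` are stand-ins, the class name `NestedCommentTokens` and the `CODE` type are made up, and every character that is not a delimiter becomes its own token to keep things short.

```java
import java.util.ArrayList;
import java.util.List;

final class NestedCommentTokens {
    // Assumed delimiters; the real MMTalk macros are not shown in the thread.
    static final String START = "/*";
    static final String END = "*/";

    // Returns the token type names emitted for the input, in order.
    static List<String> tokenize(String input) {
        List<String> types = new ArrayList<>();
        int commentDepth = 0;
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith(START, i)) {
                // Only the outermost opener is COMMENT_START; nested ones are content.
                types.add(commentDepth == 0 ? "COMMENT_START" : "COMMENT_CONTENT");
                commentDepth++;
                i += START.length();
            } else if (commentDepth > 0 && input.startsWith(END, i)) {
                commentDepth--;
                // Only the closer that brings the depth back to zero ends the comment.
                types.add(commentDepth == 0 ? "COMMENT_END" : "COMMENT_CONTENT");
                i += END.length();
            } else {
                // One token per character: content inside a comment, CODE outside.
                types.add(commentDepth > 0 ? "COMMENT_CONTENT" : "CODE");
                i++;
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x/*a/*b*/c*/y"));
    }
}
```

A plain counter is enough here because there is only one kind of nesting. A stack of lexer states, as suggested earlier in the thread, becomes useful once different nested constructs need different rules.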
