Custom Language: Unclosed comment does not raise an error

In the ParserDefinition for a custom language you usually provide a set of comment tokens for your language. There seems to be an issue with that, because those tokens are never parsed, and if you have a slightly more complex pattern for comments (such as nesting), it is easily possible that your lexer is still processing an unclosed comment when it reaches the EOF. So suppose you have a state in your JFlex file for being inside a comment, like this:

{CommentStart}               { yypushstate(IN_COMMENT); return MathematicaElementTypes.COMMENT_START; }
[^\(\*\):]*                  { return MathematicaElementTypes.COMMENT_CONTENT; }
"::"[A-Z][A-Za-z]*"::"       { return MathematicaElementTypes.COMMENT_SECTION; }
":"[A-Z][A-Za-z ]*":"        { return MathematicaElementTypes.COMMENT_ANNOTATION; }
{CommentEnd}                 { yypopstate(); return MathematicaElementTypes.COMMENT_END; }
[\*\)\(:]                    { return MathematicaElementTypes.COMMENT_CONTENT; }
.                            { return MathematicaElementTypes.BAD_CHARACTER; }

Then, when the lexer reaches the EOF, it simply finishes, and the parser does not care that you are still inside an unclosed comment.
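(Incidentally, `yypushstate` and `yypopstate` are not built into JFlex; they are typically hand-rolled helpers kept in the `%{ ... %}` user-code section of the .flex file. A common sketch, assuming a simple state stack:)

```
%{
  // JFlex has no built-in state stack; this is the usual hand-rolled helper.
  private final java.util.ArrayDeque<Integer> stateStack = new java.util.ArrayDeque<>();

  private void yypushstate(int newState) {
    stateStack.push(yystate()); // remember the state we came from
    yybegin(newState);
  }

  private void yypopstate() {
    yybegin(stateStack.pop()); // return to the enclosing state
  }
%}
```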

Since testing for EOF in the lexer (and consuming input there) apparently should not be done, so as not to confuse IDEA, is there a way to make my plugin show an error?

By the way, the Grammar-Kit plugin shows the same incorrect behavior when you leave a /* comment unclosed at the end of a file.



That's an interesting observation!

I've just checked the Java PSI implementation and found that the "Unclosed comment" gutter mark is generated by the Java Annotator.
See com.intellij.codeInsight.daemon.impl.analysis.HighlightUtil#checkUnclosedComment on GitHub.

It seems this error check is better done at a higher level than the lexer or the parser.
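The same approach should work for a custom language: an Annotator can inspect each comment element's text and report an error when the comment opens but never closes. The check itself is just a scan of the text; a minimal sketch, assuming Mathematica-style `(* ... *)` delimiters with nesting (class and method names are made up for illustration):

```java
// Sketch of the check an Annotator could run on a comment's text.
// Assumes Mathematica-style "(*" / "*)" delimiters with nesting.
public class UnclosedCommentCheck {

    /** Returns true if the text opens a comment that is never closed. */
    public static boolean isUnclosed(String text) {
        int depth = 0;
        for (int i = 0; i < text.length() - 1; i++) {
            if (text.charAt(i) == '(' && text.charAt(i + 1) == '*') {
                depth++;
                i++; // skip the '*' so "(*)" is not also counted as a close
            } else if (text.charAt(i) == '*' && text.charAt(i + 1) == ')') {
                depth--;
                i++;
            }
        }
        return depth > 0;
    }

    public static void main(String[] args) {
        System.out.println(isUnclosed("(* closed *)"));         // false
        System.out.println(isUnclosed("(* outer (* inner *)")); // true
    }
}
```

In a real plugin this logic would run inside `Annotator.annotate(...)` on each comment element, creating an error annotation over the element's range; `HighlightUtil#checkUnclosedComment` in the IntelliJ sources is the reference implementation for Java.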

However, unclosed string tokens in Grammar-Kit are treated as BAD_CHARACTER by the lexer.


Hi Gregory,

I assume that in most cases comments are scanned by one single rule, and the lexer either returns a comment token or it doesn't.
If it cannot scan a comment because it is not closed, then no other rule will match and the lexer will return an error. I guess this is the reason
you can tell the ParserDefinition which comment tokens you have without having to specify open/close tokens.
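For a non-nesting comment, that single-rule approach might look like this (a sketch using JFlex's `~` "upto" operator; the token names are assumptions):

```
// Matches "/*" through the first "*/". An unclosed comment simply never
// matches this rule, so the fallback rule returns BAD_CHARACTER instead.
"/*" ~"*/"    { return MyTypes.COMMENT; }
[^]           { return TokenType.BAD_CHARACTER; }
```

Nested comments cannot be expressed as a single regular-expression rule, which is why the stateful approach above is needed there.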

Anyway, the issue is not critical. I just wanted to point it out and make sure I hadn't overlooked a simple mistake of my own.


