Highlighter does not work for the first token
I wrote a small custom language with my own lexer and highlighter.
I have a URL token that is highlighted as bold, light-blue text. It is highlighted correctly everywhere except on the first line.
When I write a URL on the first line of the editor, it is not highlighted correctly. The URL becomes highlighted correctly when I add an empty line before it, and it stays bold and light blue even after I undo the inserted line.
Can you help me understand what is wrong with my lexer, highlighter, or something else?
This video shows the highlighter's behavior: http://www.youtube.com/watch?v=HvpnmaK8UkE
Hello Denis,
Did you check the element structure with the PSI Viewer? Is it as expected?
Yes, I checked the element structure with the PSI Viewer, and it didn't change. You can see it in my video:
http://www.youtube.com/watch?v=VRe9IUTvnVA
In this video I cut and paste the text. The element structure didn't change after the paste, but the URL was then highlighted correctly.
I did not notice the PSI Viewer in the first video. :-P
The lexer definition may expect a CRLF before the URL. The beginning of a file and a CRLF are not the same thing.
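To illustrate the point, here is a minimal Java sketch (java.util.regex, not JFlex) of a hypothetical rule that only recognizes a URL after a line break. Such a rule misses a URL at offset 0 but finds the same URL once an empty line is inserted before it, which resembles the behavior in the first video. The class and pattern names are made up for this example, and note that the URL macro in the lexer rules posted in this thread does not itself reference CRLF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineStartDemo {
    // Hypothetical rule: a URL is only recognized when a line break precedes it.
    static final Pattern URL_AFTER_CRLF =
            Pattern.compile("(?:\\r\\n|\\r|\\n)(https?://[^\\r\\n]+)");
    // Same rule, but the start of the input (^) also counts as a line start.
    static final Pattern URL_AT_LINE_START =
            Pattern.compile("(?:^|\\r\\n|\\r|\\n)(https?://[^\\r\\n]+)");

    /** Returns the first URL the pattern finds, or null if there is none. */
    static String firstUrl(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstUrl(URL_AFTER_CRLF, "https://example.com"));    // null
        System.out.println(firstUrl(URL_AFTER_CRLF, "\nhttps://example.com"));  // https://example.com
        System.out.println(firstUrl(URL_AT_LINE_START, "https://example.com")); // https://example.com
    }
}
```

A rule written this way needs an explicit "start of input" alternative; otherwise the first line is never treated like the lines after it.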
How did you define the lexer? With Grammar-Kit?
Yes, I defined my lexer with Grammar-Kit.
The lexer works well and splits the text correctly; the PSI Viewer shows the correct element tree.
If I select all the text (Cmd+A), cut it to the clipboard (Cmd+X) and paste it back (Cmd+V), the URL is highlighted correctly.
Here are my lexer rules:
```
HTTP_METHOD = ("GET" | "POST" | "PUT" | "PATCH" | "DELETE")
OPTION = "--" {LINE}?
HEADER = "@" {LINE}?
PARAM = "&" {LINE}?
COMMENT = "#" {LINE}?
URL = "http" "s"? "://" {LINE}?
CRLF = \n|\r|\r\n
WHITE_SPACE = [\ \t\f]
SEPARATOR = {WHITE_SPACE}* "%%%" {WHITE_SPACE}*
LINE = [^\r\n]+
FULL_LINE = [^\ \t\f\r\n]+ [^\r\n]*
EMPTY = [^]

%state S_METHOD S_OPTION S_URL S_PARAM S_REQ_HEADER S_RESP_HEADER
%state S_RESP_BODY S_REQ_BODY

%%

<YYINITIAL, S_METHOD, S_OPTION, S_URL, S_PARAM, S_REQ_HEADER, S_RESP_HEADER> {
  {COMMENT} { return RestTypes.COMMENT; }

  <S_RESP_BODY, S_REQ_BODY> {
    {CRLF} { return RestTypes.CRLF; }
    {WHITE_SPACE}+ { return RestTypes.WHITE_SPACE; }
  }
}

<YYINITIAL> {
  {OPTION} { yybegin(S_OPTION); return RestTypes.OPTION; }
  {HTTP_METHOD} { yybegin(S_METHOD); return RestTypes.METHOD; }
  {URL} { yybegin(S_URL); return RestTypes.URL; }
}

<S_OPTION> {
  {OPTION} { return RestTypes.OPTION; }
  {HTTP_METHOD} { yybegin(S_METHOD); return RestTypes.METHOD; }
  {URL} { yybegin(S_URL); return RestTypes.URL; }
}

<S_METHOD> {
  {URL} { yybegin(S_URL); return RestTypes.URL; }
}

<S_URL> {
  {PARAM} { yybegin(S_PARAM); return RestTypes.PARAM; }
  {HEADER} { yybegin(S_REQ_HEADER); return RestTypes.HEADER; }
  {SEPARATOR} { yybegin(S_RESP_HEADER); return RestTypes.SEPARATOR; }
  {FULL_LINE} { yybegin(S_REQ_BODY); return RestTypes.REQUEST_BODY_LINE; }
}

<S_PARAM> {
  {PARAM} { return RestTypes.PARAM; }
  {HEADER} { yybegin(S_REQ_HEADER); return RestTypes.HEADER; }
  {SEPARATOR} { yybegin(S_RESP_HEADER); return RestTypes.SEPARATOR; }
  {FULL_LINE} { yybegin(S_REQ_BODY); return RestTypes.REQUEST_BODY_LINE; }
}

<S_REQ_HEADER> {
  {HEADER} { return RestTypes.HEADER; }
  {SEPARATOR} { yybegin(S_RESP_HEADER); return RestTypes.SEPARATOR; }
  {FULL_LINE} { yybegin(S_REQ_BODY); return RestTypes.REQUEST_BODY_LINE; }
}

<S_REQ_BODY> {
  {SEPARATOR} { yybegin(S_RESP_HEADER); return RestTypes.SEPARATOR; }
  {FULL_LINE} { yybegin(S_REQ_BODY); return RestTypes.REQUEST_BODY_LINE; }
}

<S_RESP_HEADER> {
  {HEADER} { return RestTypes.HEADER; }
  {FULL_LINE} { yybegin(S_RESP_BODY); return RestTypes.RESPONSE_BODY_LINE; }
}

<S_RESP_BODY> {
  {FULL_LINE} { return RestTypes.RESPONSE_BODY_LINE; }
}

. { return RestTypes.BAD_CHARACTER; }
```
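One way to double-check that the lexer rules themselves produce a URL token on the first line is a small hand-written Java model of the request-line rules above (COMMENT, CRLF, WHITE_SPACE, OPTION, HTTP_METHOD, URL). This is only a sketch: it is not the Grammar-Kit/JFlex generated lexer, the class and method names are hypothetical, states from S_URL onward are not modeled, and it assumes that the nested <S_RESP_BODY, S_REQ_BODY> block makes CRLF and WHITE_SPACE active in every state. It mimics JFlex's matching policy with java.util.regex: the longest match wins, and on a tie the rule listed first wins.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of the request-line rules; NOT the generated lexer.
class RestLexerModel {
    enum State { YYINITIAL, S_OPTION, S_METHOD, S_URL }
    enum Type { COMMENT, CRLF, WHITE_SPACE, OPTION, METHOD, URL, BAD_CHARACTER }

    static final class Rule {
        final Set<State> states; final Pattern pattern; final Type type; final State next;
        Rule(Set<State> states, String regex, Type type, State next) {
            this.states = states; this.pattern = Pattern.compile(regex); this.type = type; this.next = next;
        }
    }

    static final class Token {
        final Type type; final String text; final State startState;
        Token(Type type, String text, State startState) {
            this.type = type; this.text = text; this.startState = startState;
        }
        @Override public String toString() {
            return type + " " + text.replace("\n", "\\n") + " (from " + startState + ")";
        }
    }

    static final String LINE_OPT = "[^\\r\\n]*";                     // {LINE}? with LINE = [^\r\n]+
    static final String METHOD_RE = "(?:GET|POST|PUT|PATCH|DELETE)"; // HTTP_METHOD
    static final String URL_RE = "https?://" + LINE_OPT;              // URL
    static final Set<State> ALL = EnumSet.allOf(State.class);
    static final Set<State> START = EnumSet.of(State.YYINITIAL, State.S_OPTION);

    // Same order as the spec; next == null means "no yybegin", i.e. stay in the current state.
    static final List<Rule> RULES = List.of(
        new Rule(ALL, "#" + LINE_OPT, Type.COMMENT, null),
        // Assumption: the nested block makes CRLF and WHITE_SPACE active in every state.
        new Rule(ALL, "\\r\\n|\\r|\\n", Type.CRLF, null),
        new Rule(ALL, "[ \\t\\f]+", Type.WHITE_SPACE, null),
        new Rule(START, "--" + LINE_OPT, Type.OPTION, State.S_OPTION),
        new Rule(START, METHOD_RE, Type.METHOD, State.S_METHOD),
        new Rule(EnumSet.of(State.YYINITIAL, State.S_OPTION, State.S_METHOD), URL_RE, Type.URL, State.S_URL));

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        State state = State.YYINITIAL;
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            for (Rule rule : RULES) {
                if (!rule.states.contains(state)) continue;
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best, so ties keep the earlier rule.
                if (m.lookingAt() && m.end() > bestEnd) { best = rule; bestEnd = m.end(); }
            }
            if (best == null) { // stands in for the catch-all ". { return BAD_CHARACTER; }"
                tokens.add(new Token(Type.BAD_CHARACTER, input.substring(pos, pos + 1), state));
                pos++;
                continue;
            }
            tokens.add(new Token(best.type, input.substring(pos, bestEnd), state));
            if (best.next != null) state = best.next;
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("https://example.com\n").forEach(System.out::println);
        tokenize("GET https://example.com").forEach(System.out::println);
    }
}
```

Under these assumptions, a URL on the very first line is lexed from YYINITIAL directly into a URL token, which agrees with the observation that the PSI tree is correct and points the investigation toward the highlighting side rather than the rules.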
I'd make these changes:
```
CRLF = [\n\r]+
WHITE_SPACE = [\s\t\f]
```
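A quick way to see what the suggested CRLF change does is to count line-break tokens with java.util.regex. This is a sketch, not JFlex: JFlex always takes the longest match, while java.util.regex takes the first alternative that matches, so the original macro is written here with \r\n first to get the same result. The class name and sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrlfMacroDemo {
    // Original macro CRLF = \n|\r|\r\n, reordered so \r\n wins as it would under longest match.
    static final Pattern ORIGINAL = Pattern.compile("\\r\\n|\\r|\\n");
    // Suggested macro CRLF = [\n\r]+ : one token for a whole run of line breaks.
    static final Pattern SUGGESTED = Pattern.compile("[\\n\\r]+");

    /** Counts how many non-overlapping CRLF tokens the pattern produces in the text. */
    static int countTokens(Pattern crlf, String text) {
        Matcher m = crlf.matcher(text);
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    public static void main(String[] args) {
        String text = "GET https://example.com\r\n\r\n&a=1";
        System.out.println(countTokens(ORIGINAL, text));  // 2
        System.out.println(countTokens(SUGGESTED, text)); // 1
    }
}
```

With the original macro every line break is its own CRLF token; with [\n\r]+ a run of blank lines becomes a single CRLF token.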
What is the purpose of these macros?
```
LINE = [^\r\n]+
FULL_LINE = [^\ \t\f\r\n]+ [^\r\n]*
EMPTY = [^]
```
Are they necessary?
EMPTY = [^] can be removed; no rule uses it.
LINE matches the remaining characters on a line, from a special prefix to the end of the line. For example, the comment prefix "#" followed by LINE matches #sdfdsfdsfs.
FULL_LINE matches all characters on a line from the first non-whitespace character to the end of the line.
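The two macros can be checked in isolation with java.util.regex. This minimal sketch (hypothetical class and helper names) matches each pattern at a fixed offset, the way a lexer tries its rules at the current position. FULL_LINE does not match at a position that holds whitespace, so for an ordinary body line the leading blanks would presumably be returned as WHITE_SPACE first, and FULL_LINE would then match from the first non-whitespace character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineMacrosDemo {
    // LINE = [^\r\n]+ : the rest of the line after a prefix.
    static final String LINE = "[^\\r\\n]+";
    // COMMENT = "#" {LINE}? : "#" plus the optional rest of the line.
    static final Pattern COMMENT = Pattern.compile("#(?:" + LINE + ")?");
    // FULL_LINE = [^\ \t\f\r\n]+ [^\r\n]* : from the first non-whitespace character to the end of the line.
    static final Pattern FULL_LINE = Pattern.compile("[^ \\t\\f\\r\\n]+[^\\r\\n]*");

    /** Returns the text the pattern matches starting exactly at offset, or null. */
    static String matchAt(Pattern p, String text, int offset) {
        Matcher m = p.matcher(text).region(offset, text.length());
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matchAt(COMMENT, "#sdfdsfdsfs\nnext", 0)); // #sdfdsfdsfs
        String body = "  {\"id\": 1}\n";
        System.out.println(matchAt(FULL_LINE, body, 0)); // null: the line starts with whitespace
        System.out.println(matchAt(FULL_LINE, body, 2)); // {"id": 1}
    }
}
```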
Does it make any difference after you remove EMPTY?
I removed EMPTY and nothing changed - http://www.youtube.com/watch?v=0OLu51T_BCw
I don't know enough about lexer notation to help. Sorry.
I found a solution to my problem: https://intellij-support.jetbrains.com/hc/en-us/community/posts/207236875-Token-s-highlighter-does-not-work-with-long-strings?page=1#community_comment_207557209