Highlighter does not work for the first token

Created March 01, 2016 18:22

I wrote a small custom language with my own lexer and highlighter.

I have a URL token that is highlighted as bold light-blue text. It is highlighted correctly everywhere except on the first line.

When I type a URL on the first line of the editor, it is not highlighted correctly. The URL becomes highlighted correctly once I add an empty line before it. Nothing changes when I undo the inserted line: the URL stays bold and light blue.

Can you help me understand what is wrong with my lexer, highlighter, etc.?

Here you can see my highlighter behavior - http://www.youtube.com/watch?v=HvpnmaK8UkE

10 comments

Imants Cekusins

Created March 01, 2016 20:41

Hello Denis,

Did you check the element structure with PSI Viewer? Is it as expected?

Denis Chernyshov

Created March 02, 2016 04:31

Yes, I checked the element structure with PSI Viewer. The element structure didn't change. You can see it in my video.

http://www.youtube.com/watch?v=VRe9IUTvnVA

In this video I cut and pasted the text. The element structure didn't change after the paste, but my URL was highlighted correctly.

Imants Cekusins

Created March 02, 2016 08:27

I did not notice that the PSI Viewer was in the first video. :-P

The lexer definition possibly expects a CRLF. The beginning of the file and a CRLF are not the same thing.

How did you define the lexer? With Grammar-Kit?

Denis Chernyshov

Created March 02, 2016 10:54

Yes, I defined my lexer with Grammar-Kit.

The lexer works well and splits the text correctly: PSI Viewer shows the correct element tree.

If I select all the text (Cmd+A), cut it to the clipboard (Cmd+X), and paste it back (Cmd+V), my URL is highlighted correctly.

Here are my lexer rules:

HTTP_METHOD = ("GET" | "POST" | "PUT" | "PATCH" | "DELETE")
OPTION = "--" {LINE}?
HEADER = "@" {LINE}?
PARAM = "&" {LINE}?
COMMENT = "#" {LINE}?
URL = "http" "s"? "://" {LINE}?
CRLF = \n|\r|\r\n
WHITE_SPACE = [\ \t\f]
SEPARATOR = {WHITE_SPACE}* "%%%" {WHITE_SPACE}*
LINE = [^\r\n]+
FULL_LINE = [^\ \t\f\r\n]+ [^\r\n]*
EMPTY = [^]

%state S_METHOD S_OPTION S_URL S_PARAM S_REQ_HEADER S_RESP_HEADER
%state S_RESP_BODY S_REQ_BODY

<YYINITIAL, S_METHOD, S_OPTION, S_URL, S_PARAM, S_REQ_HEADER, S_RESP_HEADER> {
    {COMMENT} { return RestTypes.COMMENT; }
    <S_RESP_BODY, S_REQ_BODY> {
        {CRLF} { return RestTypes.CRLF; }
        {WHITE_SPACE}+ { return RestTypes.WHITE_SPACE; }
    }
}

<YYINITIAL> {
    {OPTION} { yybegin(S_OPTION); return RestTypes.OPTION; }
    {HTTP_METHOD} { yybegin(S_METHOD); return RestTypes.METHOD; }
    {URL} { yybegin(S_URL); return RestTypes.URL; }
}

<S_OPTION> {
    {OPTION} { return RestTypes.OPTION; }
    {HTTP_METHOD} { yybegin(S_METHOD); return RestTypes.METHOD; }
    {URL} { yybegin(S_URL); return RestTypes.URL; }
}

<S_METHOD> {
    {URL} { yybegin(S_URL); return RestTypes.URL; }
}

<S_URL> {
    {PARAM} { yybegin(S_PARAM); return RestTypes.PARAM; }
    {HEADER} { yybegin(S_REQ_HEADER); return RestTypes.HEADER; }
    {SEPARATOR} { yybegin(S_RESP_HEADER); return RestTypes.SEPARATOR; }
    {FULL_LINE} { yybegin(S_REQ_BODY); return RestTypes.REQUEST_BODY_LINE; }
}

<S_PARAM> {
    {PARAM} { return RestTypes.PARAM; }
    {HEADER} { yybegin(S_REQ_HEADER); return RestTypes.HEADER; }
    {SEPARATOR} { yybegin(S_RESP_HEADER); return RestTypes.SEPARATOR; }
    {FULL_LINE} { yybegin(S_REQ_BODY); return RestTypes.REQUEST_BODY_LINE; }
}

<S_REQ_HEADER> {
    {HEADER} { return RestTypes.HEADER; }
    {SEPARATOR} { yybegin(S_RESP_HEADER); return RestTypes.SEPARATOR; }
    {FULL_LINE} { yybegin(S_REQ_BODY); return RestTypes.REQUEST_BODY_LINE; }
}

<S_REQ_BODY> {
    {SEPARATOR} { yybegin(S_RESP_HEADER); return RestTypes.SEPARATOR; }
    {FULL_LINE} { yybegin(S_REQ_BODY); return RestTypes.REQUEST_BODY_LINE; }
}

<S_RESP_HEADER> {
    {HEADER} { return RestTypes.HEADER; }
    {FULL_LINE} { yybegin(S_RESP_BODY); return RestTypes.RESPONSE_BODY_LINE; }
}

<S_RESP_BODY> {
    {FULL_LINE} { return RestTypes.RESPONSE_BODY_LINE; }
}

. { return RestTypes.BAD_CHARACTER; }

Imants Cekusins

Created March 02, 2016 11:04

I'd make these changes:

CRLF = [\n\r]+

WHITE_SPACE = [\s\t\f]

What is the purpose of these:
LINE = [^\r\n]+
FULL_LINE = [^\ \t\f\r\n]+ [^\r\n]*
EMPTY = [^]

Are they necessary?
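
For readers comparing the two CRLF definitions, here is a rough sketch in Python of how they would split a run of line breaks. It only approximates the JFlex macros with Python's `re` module. JFlex picks the longest match among alternatives, while Python takes the first alternative that matches, so the sketch lists `\r\n` first to imitate that. The names `ORIGINAL_CRLF`, `SUGGESTED_CRLF`, and `tokens` are made up for this illustration.

```python
import re

# Approximation of the original macro: CRLF = \n|\r|\r\n
# "\r\n" is listed first because Python's alternation is ordered,
# whereas JFlex would choose the longest alternative on its own.
ORIGINAL_CRLF = re.compile(r"\r\n|\n|\r")

# Approximation of the suggested macro: CRLF = [\n\r]+
SUGGESTED_CRLF = re.compile(r"[\n\r]+")


def tokens(pattern, text):
    """Return every non-overlapping match of pattern in text, in order."""
    return pattern.findall(text)


blank_lines = "\n\n\n"

# One token per line break with the original definition.
original_tokens = tokens(ORIGINAL_CRLF, blank_lines)

# A single token covering the whole run with the suggested definition.
suggested_tokens = tokens(SUGGESTED_CRLF, blank_lines)

# A Windows line ending stays a single token with the original definition.
windows_tokens = tokens(ORIGINAL_CRLF, "\r\n")
```

Under these assumptions, the suggested definition would merge consecutive blank lines into one CRLF token, while the original emits one token per line break. Neither variant by itself explains the first-line highlighting problem.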

Denis Chernyshov

Created March 02, 2016 11:19

EMPTY = [^] - can be removed

LINE - matches the remaining characters on the line, from any special prefix to the end of the line. Example: (COMMENT_PREFIX + LINE) = (#sdfdsfdsfs)

FULL_LINE - matches all characters on the line, from the first non-whitespace character to the end of the line
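
To make the two descriptions above concrete, here is a small sketch that approximates the LINE, FULL_LINE, and COMMENT macros with Python's `re` module. It is not the JFlex lexer itself. The helper `match_at_start` and the sample strings are made up for this illustration, and `re.match` only imitates how a lexer rule is tried at the current position.

```python
import re

# Python approximations of the JFlex macros from the lexer rules.
LINE = r"[^\r\n]+"                     # LINE = [^\r\n]+
FULL_LINE = r"[^ \t\f\r\n]+[^\r\n]*"   # FULL_LINE = [^\ \t\f\r\n]+ [^\r\n]*
COMMENT = "#(?:" + LINE + ")?"         # COMMENT = "#" {LINE}?


def match_at_start(pattern, text):
    """Return the text matched at position 0, or None if nothing matches there."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


# A prefix plus LINE consumes everything up to, but not including, the line break.
comment_match = match_at_start(COMMENT, "#sdfdsfdsfs\nnext line")

# The prefix alone is still a valid COMMENT because LINE is optional.
empty_comment_match = match_at_start(COMMENT, "#\n")

# FULL_LINE runs from a non-whitespace character to the end of the line,
# including any trailing spaces.
full_line_match = match_at_start(FULL_LINE, "body text  \nnext")

# FULL_LINE cannot start on whitespace, so leading spaces have to be
# consumed by another rule (WHITE_SPACE in the lexer) first.
indented_match = match_at_start(FULL_LINE, "  body")
```

In this sketch `indented_match` is `None`, which mirrors the description: FULL_LINE only begins at the first non-whitespace character.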

Imants Cekusins

Created March 02, 2016 11:25

Does it make any difference after you remove EMPTY?

Denis Chernyshov

Created March 02, 2016 11:32

I removed EMPTY and nothing changed - http://www.youtube.com/watch?v=0OLu51T_BCw

Imants Cekusins

Created March 02, 2016 12:28

I don't know enough about lexer notation to help. Sorry.

Denis Chernyshov

Created March 10, 2016 19:19

I've got a solution to my problem - https://intellij-support.jetbrains.com/hc/en-us/community/posts/207236875-Token-s-highlighter-does-not-work-with-long-strings?page=1#community_comment_207557209
