joy template language highlighting plugin

Hi, I'm trying to create a plugin for this template language: https://github.com/asdfgh11111/joy.

It turned out to be much harder than I expected, probably because I have always been outside the Java world... Currently I have HTML highlighted in joy files and I'm kind of stuck creating a lexer/parser for joy itself. So I need some help/advice.

1. The overall workflow of implementing a lexer is not clear to me. The official tutorial is like the "draw the owl" meme, but in the real world I start with something simple and then add/edit functionality, which doesn't seem handy in a plugin project. For now I've ended up with the idea that it would be much handier to create a separate project for the lexer with the original JFlex, test it there, and then port it to the plugin. Or are there any tools for this?

2. I'm not sure JFlex is the best solution for joy. Here is a somewhat bigger example of the language: https://github.com/asdfgh11111/joy/blob/master/example/tpl/layout.joy.
Previously I tried to implement the lexer and parser with ANTLR4 and had problems with the lack of lookahead functionality in the lexer. As far as I understand, JFlex has similar limitations...
Is there an easier way to reuse the original PEG grammar for highlighting?
I have a second idea: lex the input just into words, symbols, and spaces, and then handle everything in the parser. Is that a good idea for highlighting? :)

3. I'm not sure I understand which cases the incremental lexing feature is for.
>For example, a Java lexer could have separate states for top level context, comment context and string literal context.
Usually we use a single token for a comment or a string literal, so it's not clear why we need states here. On the other hand, if we call something a "state", there is usually a stack of states, and then I don't understand how it would work in general.

 

0
5 comments

1. You just create a lexer, write tests, and then add more features to it. Basically, JFlex itself has little to do with Java; it's just a finite-automaton description: regexps and states. Also, IIRC JFlex is just a Java implementation of a flex-style generator. I don't see the sense in building the lexer separately and integrating it afterwards. I believe it's possible to use any tool; the question is the amount of work required. And believe me, JFlex is good enough for this.

2. JFlex has lookaheads; which exact limitations are you talking about? Basically, lexing is splitting text into tokens of different types: identifiers, numbers, at signs, and so on. The syntax tree is built in the parser. The lexer should be fast and, therefore, let's say - dumb :)

3. The lexer is used for two things:

- building the PSI tree. In this case the IDE lexes the whole file and builds a new tree. There are some internal optimizations, but this is a good enough explanation for now.

- highlighting. There are two levels of it: lexer-based (the fast one) and annotator-based (the slower one). If the token type can be detected in the lexer, the token is highlighted really fast, immediately after lexing, based on its type only (not its text, context, or anything else).

If highlighting requires some additional information, like the position in the syntax tree or more context, an annotator may be used. It is started after the syntax tree is built.

Of course, you can implement your parsing logic twice (or almost twice) and make your lexer distinguish different identifiers in different contexts, but this may be too heavy. There's no golden rule here, just experience and feeling. You should probably start with a simple lexer and then make it more complex where that is possible and necessary.

Incremental lexing may be necessary for responsive highlighting. For example, you have a very long file and you are typing something in the middle of it. The IDE could lex the whole file and re-highlight it, but that may be slow and wasteful, because the highlighting will be almost the same. Instead, the IDE finds the last position before the typing offset where the lexer was in its initial state and starts lexing from there. This allows it to lex only a small range of the file and re-highlight it.

Concerning the states in comments or literals: these are implementation details. You may do as you like.
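
The restart rule described in point 3 can be sketched roughly as follows. This is only an illustration, not the IDE's actual implementation: the class `RestartPoint`, the method `restartOffset`, and the convention that state 0 is the initial state are all assumptions made for the example.

```java
/** Hypothetical sketch: where to restart lexing after an edit. */
class RestartPoint {
    // Assumption for this sketch: 0 is the lexer's initial state.
    static final int INITIAL = 0;

    /**
     * tokenStarts[i] is the start offset of the i-th previously lexed token
     * (ascending); statesAtStart[i] is the lexer state when that token began.
     * Returns the start of the last token strictly before editOffset that
     * began in the initial state, or 0 if there is none.
     */
    static int restartOffset(int[] tokenStarts, int[] statesAtStart, int editOffset) {
        int restart = 0;
        for (int i = 0; i < tokenStarts.length && tokenStarts[i] < editOffset; i++) {
            if (statesAtStart[i] == INITIAL) {
                restart = tokenStarts[i];
            }
        }
        return restart;
    }

    public static void main(String[] args) {
        int[] starts = {0, 5, 9, 20, 31};
        int[] states = {0, 0, 1, 1, 0};
        System.out.println(restartOffset(starts, states, 25));
    }
}
```

With tokens starting at 0, 5, 9, 20 and 31, where the ones at 9 and 20 begin in a non-initial state, an edit at offset 25 restarts from offset 5, so only the text from there on has to be relexed.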

0
Permanently deleted user

This is 'foo' text and then @foo variable and then @bar('function with argument') {and here is inner text with 'inner @baz variable'}.

Expected tokens:
"This is 'foo' text and then " - content
"@foo" - id
" variable and then " - content
"@bar" - id
"(" - arg open
"'function with argument'" - string literal
")" - arg close
" {" - block open
"and here is inner text with 'inner " - content
"@baz" - id
" variable'" - content
"}" - block close
"." - content

1. Unfortunately I'm not good enough yet to just write tests from scratch. As soon as I try to do so, I fall into a lot of different Java things, starting from Gradle/Maven project setup and ending with the differences between lexers generated by JFlex and by the Grammar-Kit plugin.
It would be much handier to have a tool/plugin that takes input and just lists the tokens automatically, the way I did above (like for kids :D). It's probably not a top-requested feature, so as I understand it there is no workaround for me here :(.

2. > Lexer should be fast and, therefore, lets say - dumb :)
That's a problem because of the joy syntax... I've tried to illustrate it in the example above. As I understand from your answer, the "second idea" will not be easy or fast either, because in that case all highlighting would be done at the annotator level.
The limitation I hit with the ANTLR lexer was negative lookahead for these cases:
'@hello() world!' '@hello() {world}!' '@hello()     {world}!'
There was no syntax to check whether a block follows the function call or not (although I think it could be done with Java helpers).
I haven't tried this with JFlex yet; the two just look similar to me.

3. So technically, with incremental lexing (IL), relexing occurs within a whole state, right? And if I end up with a fat lexer (with a stack of states etc.), does that mean I can't use IL?
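
For the three inputs in point 2, the "does a block follow the call?" check can be expressed as a positive lookahead. The sketch below uses `java.util.regex` only to illustrate the idea; it is not ANTLR or JFlex syntax (JFlex has a comparable trailing-context operator, `r1 / r2`). The class `BlockLookahead`, the method `callHasBlock`, and the simplified call pattern (no nested parentheses in the arguments) are assumptions made for the example.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: detect "@name(...)" followed by an optional gap and "{". */
class BlockLookahead {
    // "(?=\\s*\\{)" peeks at the text after ")" without consuming it,
    // so the whitespace and "{" can still become separate tokens.
    private static final Pattern CALL_WITH_BLOCK =
            Pattern.compile("@\\w+\\([^)]*\\)(?=\\s*\\{)");

    static boolean callHasBlock(String input) {
        return CALL_WITH_BLOCK.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(callHasBlock("@hello() world!"));
        System.out.println(callHasBlock("@hello() {world}!"));
        System.out.println(callHasBlock("@hello()     {world}!"));
    }
}
```

Here the first input is reported as having no block and the other two as having one, regardless of how much whitespace separates ")" from "{".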

0

1. I started working on a plugin before I worked at JetBrains, so I can share my experience:

- use Gradle with the intellij and grammar-kit plugins

- write tests. This will save you a lot of time catching and fixing lexer and parser regressions. Lexer tests are pretty simple: just extend org.jetbrains.plugins.ruby.ruby.testCases.LexerTestCase (see usages in OSS plugins)

- see how things are implemented in OSS plugins

2. Your example looks pretty straightforward to me. Why do you need a lookahead in the lexer if @some is an id anyway?

3. Try to avoid complex things with stacks and so on. Use just states and patterns. It's not always possible, but try. Still, even with a stack you may benefit from incremental lexing. For example, your lexer should report the initial state not based on the DFA state alone, but on the DFA state plus an empty state stack.
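
The last suggestion in point 3 could look roughly like this: the state the lexer reports is zero only when the automaton is in its initial state and the state stack is empty. This is a standalone sketch, not code from any plugin; the class `StackAwareState`, its methods, and the packing scheme (which assumes fewer than 256 automaton states) are all made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: fold the state stack into the reported lexer state. */
class StackAwareState {
    // Assumption for this sketch: 0 is the automaton's initial state.
    static final int INITIAL = 0;

    private int currentState = INITIAL;
    private final Deque<Integer> stack = new ArrayDeque<>();

    /** Enter a nested state, remembering the one we came from. */
    void push(int newState) {
        stack.push(currentState);
        currentState = newState;
    }

    /** Return to the enclosing state (or INITIAL if nothing is stacked). */
    void pop() {
        currentState = stack.isEmpty() ? INITIAL : stack.pop();
    }

    /**
     * Packs the automaton state into the low 8 bits and the stack depth
     * above them. The result is 0 only for INITIAL with an empty stack.
     */
    int reportedState() {
        return currentState | (stack.size() << 8);
    }

    public static void main(String[] args) {
        StackAwareState s = new StackAwareState();
        s.push(2);
        System.out.println(s.reportedState());
        s.pop();
        System.out.println(s.reportedState());
    }
}
```

Packing only the depth does not tell apart two different stacks of the same depth; for this sketch the only goal is that a reported 0 means it is safe to restart lexing at that position.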

 

 

0
Permanently deleted user

1. Thanks, I've made big progress with it.

2. I've been using states. So for example:
<div>@foo(bar() {
    <span>@baz()</span>
})</div>
After ")" the lexer must decide whether to pop out of the function state (in this example the enclosing state can be either the content or the arguments context) or to continue with a block.
I have no idea how to handle this without states.
Regarding JFlex: for now I think it covers what I need. I have two things left to implement: function calls and if with else. We'll see.

0

A function is a complex element, a subtree. The lexer should work at a lower level: separate identifiers, numbers, dots, etc. Perhaps try to categorize identifiers.

Again, this is not always possible.
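
A "low-level" lexer in this sense could look something like the sketch below, which splits input into flat tokens and leaves the question of what is a function call or a block to the parser. It is written with `java.util.regex` for brevity rather than JFlex, and the class `FlatLexer` and the token type names are invented for the example; they are not taken from joy or from any plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: a flat tokenizer with no notion of calls or blocks. */
class FlatLexer {
    // Alternatives are tried in order; OTHER catches any remaining character.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<ATID>@\\w+)|(?<ID>\\w+)|(?<LPAREN>\\()|(?<RPAREN>\\))"
                    + "|(?<LBRACE>\\{)|(?<RBRACE>\\})|(?<WS>\\s+)|(?<OTHER>.)",
            Pattern.DOTALL);

    private static final String[] TYPES =
            {"ATID", "ID", "LPAREN", "RPAREN", "LBRACE", "RBRACE", "WS", "OTHER"};

    /** Returns tokens as "TYPE:text" strings, covering the input without gaps. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    tokens.add(type + ":" + m.group());
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("<div>@foo(bar() {")) {
            System.out.println(token);
        }
    }
}
```

Every token type here depends only on the characters of the token itself, so no lookahead or state stack is needed; whether "{" opens a block after a call is left to the parser or an annotator.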

0
