Documentation on how a lexer is called when used for highlighting

Hi,

I'm trying to build a custom language plugin using ANTLR for lexing/parsing, but I'm having some trouble. Is there any documentation on how the lexer object is called and what each method is supposed to do? Basically, I'm trying to find out how the IDE calls the lexer and how the lexer is supposed to progress through its input stream. One of the issues I've been having is that the IDE constantly calls start(). Thanks for the help.

4 comments

http://www.jetbrains.com/idea/documentation/idea_5.0.html


--
Best regards,
Maxim Mossienko
IntelliJ Labs / JetBrains Inc.
http://www.intellij.com
"Develop with pleasure!"


Thanks for the reply. I've been looking through that already, but it doesn't say when certain methods are called. For instance, it doesn't mention the advance() method. Is there something with more depth?


Another example of the type of information I'm looking for: is every character of the char stream supposed to be accounted for with a token? So if your valid tokens are 'int' and whitespace, what is 'int iint' supposed to produce? An INT and a WHITESPACE token, or an INT, a WHITESPACE, and then a BAD_CHARACTER token?


Hello Bryan,

> Another example of the type of information I'm looking for: is every
> character of the char stream supposed to be accounted for with a token?

Yes.

> So if your valid tokens are 'int' and whitespace, what is 'int iint'
> supposed to produce? An INT and a WHITESPACE token, or an INT, a
> WHITESPACE, and then a BAD_CHARACTER token?

The latter is correct.
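
To illustrate the two answers above, here is a minimal standalone sketch in Java. It is not the IDE's actual lexer interface: the names SimpleLexer, TokenType and tokenize are hypothetical, and only start() and advance() come from this thread. The getTokenType(), getTokenStart() and getTokenEnd() accessors, and the choice to report a whole unrecognized word as a single BAD_CHARACTER token, are assumptions made for the example. The driver loop also checks that the tokens cover the input with no gaps.

```java
import java.util.ArrayList;
import java.util.List;

public class LexerSketch {
    // Hypothetical token types; a real plugin would define its own.
    enum TokenType { INT, WHITESPACE, BAD_CHARACTER }

    // Hypothetical lexer, loosely shaped like a start()/advance() style API.
    static final class SimpleLexer {
        private CharSequence buffer = "";
        private int endOffset;
        private int tokenStart;
        private int tokenEnd;
        private TokenType tokenType; // null once the input is exhausted

        // May be called many times, possibly with a different range each time.
        void start(CharSequence buffer, int startOffset, int endOffset) {
            this.buffer = buffer;
            this.endOffset = endOffset;
            this.tokenEnd = startOffset;
            advance(); // position on the first token
        }

        TokenType getTokenType() { return tokenType; }
        int getTokenStart() { return tokenStart; }
        int getTokenEnd() { return tokenEnd; }

        // Moves to the next token; the new token begins where the last ended.
        void advance() {
            tokenStart = tokenEnd;
            if (tokenStart >= endOffset) {
                tokenType = null;
                return;
            }
            boolean ws = Character.isWhitespace(buffer.charAt(tokenStart));
            int i = tokenStart;
            while (i < endOffset && Character.isWhitespace(buffer.charAt(i)) == ws) {
                i++;
            }
            tokenEnd = i;
            if (ws) {
                tokenType = TokenType.WHITESPACE;
            } else if ("int".contentEquals(buffer.subSequence(tokenStart, tokenEnd))) {
                tokenType = TokenType.INT;
            } else {
                // Assumption: an unrecognized word becomes one BAD_CHARACTER token.
                tokenType = TokenType.BAD_CHARACTER;
            }
        }
    }

    // Drives the lexer the way a caller might, verifying full coverage.
    static List<String> tokenize(String text) {
        SimpleLexer lexer = new SimpleLexer();
        lexer.start(text, 0, text.length());
        List<String> out = new ArrayList<>();
        int expectedStart = 0;
        while (lexer.getTokenType() != null) {
            if (lexer.getTokenStart() != expectedStart) {
                throw new IllegalStateException("gap before offset " + lexer.getTokenStart());
            }
            out.add(lexer.getTokenType() + "[" + lexer.getTokenStart() + "," + lexer.getTokenEnd() + ")");
            expectedStart = lexer.getTokenEnd();
            lexer.advance();
        }
        if (expectedStart != text.length()) {
            throw new IllegalStateException("input not fully covered");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int iint"));
    }
}
```

Running main prints [INT[0,3), WHITESPACE[3,4), BAD_CHARACTER[4,8)], so every character of 'int iint' belongs to exactly one token. Because start() resets all position state, calling it repeatedly is harmless in this sketch.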

--
Dmitry Jemerov
Development Lead
JetBrains, Inc.
http://www.jetbrains.com/
"Develop with Pleasure!"

