Custom Language: Check syntax with lexer
Hello everyone,
I realized that I probably don't fully understand the purpose of states in a JFlex lexer.
I used Grammar-Kit's .bnf file to generate a parser, so I described the program's syntax in the .bnf file.
I understand the general use of lexer states, but to what extent do I need to implement or check the program's syntax again in the lexer?
Is there any argument against having a single state (if the syntax allows it) that returns a token for each keyword, number, ...?
How complex should the lexer be compared to the parser?
Hi,
Quick intro:
The main requirement for a lexer is to split the file text into language tokens as they appear in the file, without any order or syntax validation. For example, a fragment of Java text could be tokenized by the lexer to identifier, plus_operator, increment_operator, return_keyword, void_keyword, boolean_keyword, string_literal, and that is correct lexer output, even though this sequence is not valid Java.
States are not used to validate the syntax in the lexer. It is the parser's responsibility to validate that the tokens appear in the correct order and make sense as program code.
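To make the point above concrete, here is a minimal sketch in plain Java. It is not a JFlex specification and does not use the IntelliJ lexer API. The class name SingleStateLexerSketch, the sample input, and the white_space and bad_character token names are assumptions made for illustration. The sketch tries its rules in a fixed order and takes the first match, which is a simplification: JFlex itself prefers the longest match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-state lexer sketch: it only splits text into tokens
// and never checks whether the token order makes sense.
class SingleStateLexerSketch {

    // Rules are tried in insertion order (first match wins), so keywords
    // come before identifiers and "++" comes before "+".
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("return_keyword", Pattern.compile("return\\b"));
        RULES.put("void_keyword", Pattern.compile("void\\b"));
        RULES.put("boolean_keyword", Pattern.compile("boolean\\b"));
        RULES.put("identifier", Pattern.compile("[A-Za-z_][A-Za-z0-9_]*"));
        RULES.put("increment_operator", Pattern.compile("\\+\\+"));
        RULES.put("plus_operator", Pattern.compile("\\+"));
        RULES.put("string_literal", Pattern.compile("\"([^\"\\\\]|\\\\.)*\""));
        RULES.put("white_space", Pattern.compile("\\s+"));
    }

    // Returns the token type names in the order they appear in the text.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            // Input that no rule matches is consumed one character at a time.
            String type = "bad_character";
            int end = pos + 1;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(text).region(pos, text.length());
                if (m.lookingAt()) {
                    type = rule.getKey();
                    end = m.end();
                    break;
                }
            }
            // White space is skipped here only to keep the output short.
            if (!type.equals("white_space")) {
                tokens.add(type);
            }
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Not a valid Java program, yet every piece is a valid token.
        System.out.println(tokenize("foo + ++ return void boolean \"text\""));
    }
}
```

The sample input is not valid Java, but the sketch still returns the seven token types listed above without complaint. Rejecting that sequence is left to the parser generated from the .bnf file.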
States should be used in cases when:
If your language doesn’t have any corner cases and is simple, a single-state lexer with simple rules that match tokens is enough.