Case-insensitive Grammar-Kit tokens?
I've configured my lexer for case-insensitivity, but in order for me to use the Grammar-Kit plug-in's Live Preview, the tokens also need to be defined there. Is there a way for me to make those tokens case-insensitive as well, short of using a regexp with character classes that include both the upper- and lower-cased characters? I can do that, but it's needlessly cumbersome if there's a simple switch. Also, is there documentation on the token syntax somewhere, particularly the special classes such as {Alpha}, {Digit}, etc.?
Regarding the second part of my question (and perhaps the first as well), it looks like these special character classes are just the ones supported by Java's Pattern regexp syntax. I still don't see a way to tell the pattern to be case-insensitive without using explicit [UPPERlower] character classes, though. Thoughts?
Figured it out. I can use "regexp:(?i)<pattern>" to make the pattern case-insensitive. Oddly, when I don't do this (exact match), the tokens show up with syntax highlighting, but when I do (case-insensitive regexp match), there's no highlighting. I don't know how Live Preview decides which tokens to highlight, though, so I'm not too worried about it. I'm most interested in being able to use Live Preview to rapidly evolve the grammar and the pinning/recovery rules.
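For anyone who finds this later, here is a rough sketch of the difference in plain java.util.regex terms, on the assumption (from my earlier post) that these token patterns follow Java's Pattern syntax. The keyword "select", the class name, and the matchesToken helper are all made up for illustration; none of this is Grammar-Kit API.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveTokenDemo {
    // Hypothetical helper: true if the whole text matches the token regex.
    static boolean matchesToken(String tokenRegex, String text) {
        return Pattern.matches(tokenRegex, text);
    }

    public static void main(String[] args) {
        // The cumbersome form: one explicit character class per letter.
        String explicit = "[Ss][Ee][Ll][Ee][Cc][Tt]";
        // The inline flag (?i) turns on case-insensitive matching for the rest of the pattern.
        String inline = "(?i)select";
        // POSIX-style classes from java.util.regex.Pattern, e.g. for an identifier.
        String identifier = "\\p{Alpha}\\w*";

        for (String text : new String[] {"select", "SELECT", "SeLeCt"}) {
            System.out.println(text
                    + " explicit=" + matchesToken(explicit, text)
                    + " inline=" + matchesToken(inline, text));
        }
        System.out.println("abc1 is identifier: " + matchesToken(identifier, "abc1"));
        System.out.println("1abc is identifier: " + matchesToken(identifier, "1abc"));
    }
}
```

In the grammar itself the inline form would be written as the token value "regexp:(?i)select".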
I'm kind of resurrecting this thread because I have more questions about the same topic. Stupid me, I finally saw that I can stop maintaining my own lexer and generate the lexer from the Grammar-Kit tokens! This is absolutely great, but I'm having to do some manual post-processing of the generated file and wanted to know if I can avoid it.
First off, as noted earlier in this thread, the way that I've made the GK tokens case-insensitive is by using "regexp:(?i)<token>". In JFlex I would just use the "%ignorecase" directive. Right now GK is generating invalid tokens in my lexer of the form:
and of course it's not adding a "%ignorecase" directive. Is there some better way for me to tell GK that the tokens are case-insensitive so that I don't have to include "regexp:(?i)" and so that the resulting .flex file has "%ignorecase"?
Additionally because of the differences in regexp syntax (at least based on my understanding) between JFlex and GK, I had the following in my hand-coded lexer:
The TRADITIONAL_COMMENT expression is invalid because of the "?", DOCUMENTATION_COMMENT doesn't seem to match properly, and I'm not sure how to reproduce in GK the expression that I'd hand-coded, because of the "~". I guess if I could figure out how to express "~'*/'" in a GK regexp, I could make these 100% consistent. Any guidance there?
Thanks much for any assistance on these two questions!
Here's my workflow:
1. I use "regexp:" tokens mostly for quick language prototyping in "Live Preview" mode, which doesn't involve any project setup or code generation.
I work out the main language structure first and the error recovery attributes (pins & recoverWhile) later.
2. Then I generate a *.flex file into a non-generated source root, just to have something to begin with.
Lexers are programs with their own lifecycle. Grammar-Kit generates the simplest one-state JFlex lexer, which in real life will definitely require manual tweaks, states and options.
3. Now it is time to actually generate the lexer & parser, write parsing tests and build the first version of the new language plugin.
Java RE vs JFlex RE:
Not only does Java regexp syntax differ from the JFlex one, but its feature set is different as well.
A proper general conversion is not possible. I just tried to support the syntax I employed in my own projects.
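One way to sidestep the mismatch for block comments is to write the pattern without reluctant quantifiers or the JFlex "~" operator at all. The sketch below assumes the problematic "?" was a reluctant quantifier such as ".*?" (the original expressions are not shown in this thread), and uses only character classes, "*", "+" and alternation. The class name, constant names and findComments helper are made up for illustration, and whether the Grammar-Kit converter accepts this exact form is not something confirmed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentPatternDemo {
    // "/*", then any run of non-stars or stars not followed by "/", then stars and "/".
    static final Pattern TRADITIONAL =
            Pattern.compile("/\\*([^*]|\\*+[^*/])*\\*+/");

    // Same body, but the opener is "/**" and a separate closing "*/" is required.
    static final Pattern DOCUMENTATION =
            Pattern.compile("/\\*\\*([^*]|\\*+[^*/])*\\*+/");

    // Hypothetical helper: collects every block comment found in the source text.
    static List<String> findComments(String source) {
        List<String> found = new ArrayList<>();
        Matcher matcher = TRADITIONAL.matcher(source);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findComments("a /* one */ b /** two */ c"));
        System.out.println(DOCUMENTATION.matcher("/** two */").matches());
        // "/**/" is an empty traditional comment, not a documentation comment.
        System.out.println(DOCUMENTATION.matcher("/**/").matches());
    }
}
```

Every documentation comment also matches the traditional pattern, so in a lexer the two rules still have to be ordered or otherwise disambiguated by hand.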
Thanks for the reply. Yeah, I've taken a very similar approach where I let GK generate the JFlex lexer and then have a set of documented manual steps to get it into ship-shape. Takes less than five minutes each time I have to do it, and my grammar has stabilized enough now that I do it very rarely. Still seems like it would be a nice thing to have...basically true one-stop shopping for the parser and lexer...but considering the complexity of the problem being solved, it's remarkably close to that as it is!