No, that's not possible. Did you take CPU snapshots which prove that parsing is really the culprit? Or do you have additional functionality (annotator, resolving ...) which might rather cause this?
I tried to profile with VisualVM, which was quite hard. However, I detached all other extensions, such as the annotator and reference contributors, and the problem remains. A file of my language looks like this:
1000BASE_SX {
ex [1000base sx]
en [1000base sx;1000base-sx]
}
1000BASE_T {
ex [10/100/1000;1000base t;ethernet 10/100/1000 base tx;gb lan;gblan;gigabit ethernet;gigabit lan;gigabit-lan;ieee 802.3z]
en [10/100/1000;1000base t;1000base-t;ethernet 10/100/1000 base tx;gb lan;gblan;gigabit ethernet;gigabit lan;gigabit-lan]
}
1000BASE_X {
ex [1000base x]
en [1000base x]
}
100BASE_FX {
ex [100base fx]
en [100base fx;100base-fx]
}
100BASE_T {
de [netzwerk]
ex [10/100 ethernet;100 mbps;100base t;ethernet;lan;netzwerk;rj 45]
en [10/100 ethernet;10/100base-t;100 mbps;100base t;ethernet;lan]
}
The whole file has about 15000 lines. The editor lags when I want to add another value between the square brackets. The lex file I use looks like this:
And the dangerous part seems to be the ALTERNATIVE_NAME_STATE, where any sequence of characters can be read. When I remove the square bracket on the right, I can type fluently. The complete source is available here: git clone https://bitbucket.org/ssprenger/simple-ontology-language.git
Thanks for any help, Sebastian
You seem to put too much syntax logic into your lexer.
The IntelliJ IDEA platform assumes lexers are fast and uses them very often.
Solution: decouple lexer and parser logic, as is usually done:
This approach has a lot to offer:
You can write the parser manually or generate it from a BNF grammar using the Grammar-Kit plugin.
Please feel free to seek more information about the lexer/parser relationship in our guide for language plugin developers and anywhere else.
http://confluence.jetbrains.com/display/IDEADEV/Developing+Custom+Language+Plugins+for+IntelliJ+IDEA
Thank you for the information. I will try to optimize my lexer and leave the work to the parser.
Could it also be that the files I use are just too large (about 15000 lines)?
I took the example language from http://confluence.jetbrains.com/display/IntelliJIDEA/Custom+Language+Support
I enlarged the example property file:
# You are reading the ".properties" entry.
! The exclamation mark can also mark text as comments.
website = http://en.wikipedia.org/
language = English
# The backslash below tells the application to continue reading
# the value onto the next line.
message = Welcome to \
Wikipedia!
# Add spaces to the key
key\ with\ spaces = This is the value that could be looked up with the key "key with spaces".
# Unicode
tab : \u0009
to 5000 lines and the editor freezes. In this case the lexer and grammar definitions are very simple. So is a custom language plugin suitable for very simple but large files?
Thanks again for the hint to move the syntax logic to the parser. That made the lexer much simpler.
I simplified the lexer of my custom language to the following:
The grammar for the parser looks like this:
PSIViewer shows the elements I expect. However, with more than 2500 lines performance is bad, and with more than 10000 lines the file is hardly editable. Please help me find the error, because I'm really stuck.
You're doing well :)
However, there are 2 problems:
The main one: the "flat structure" of the file (the "solFile" rule). IntelliJ often tries to get the current (deepest) PsiElement at some file position (e.g. via the PsiFile.findElementAt() method). A flat structure means linear performance, while a tree-like structure can turn this into logarithmic. You can use my trick from Grammar-Kit itself, because grammar files are just a number of "rules". See https://github.com/JetBrains/Grammar-Kit/blob/master/grammars/Grammar.bnf?source=c#L53
PS COMMENT tokens can be auto-skipped by PsiBuilder (and usually should not be present in a grammar), just like WHITESPACE tokens. See ParserDefinition.getCommentTokens().
Thank you very much. I don't really get it yet, but I am playing around with the different rule attributes and trying to create a more hierarchical structure. That will take some time - it's hard stuff :-)
Hello,
I stripped down my language to understand the generated code better. In the end a file should look like this:
AN_IDENTIFIER { en an english name de a german name }
ANOTHER_IDENTIFIER { en another english name de another german name }
As I already mentioned, definitions like that occur about 3000 times, so the file can reach a length of about 15000 lines. Because of the missing hierarchy, I tried to generate parent DUMMY_BLOCKS around a set of definitions, as in the given example of the Grammar-Kit grammar. The grammar definition looks like this:
I am facing two problems now:
1. In the generated parser code the method
protected boolean parse_root_(final IElementType root_, final PsiBuilder builder_, final int level_) {
  parseGrammar(builder_, level_ + 1, definition_parser_);
}
can't compile, so I just tried to return true. Which I guess led to my second problem:
2. I can't parse the code: the first opening brace is marked as unexpected, so the PSI viewer doesn't show the desired result.
I tried to set pins and recoverWhile at every possible place. However, I don't understand this concept completely. Are pins the right way to make the input file valid?
Thanks for any help, Sebastian
I found a proper implementation of parseGrammar:
Now everything is working fine and it's fast as hell! Thank you very much for your help.
Glad you got it working.
For the record:
Void methods are the result of the missing/incorrect psiImplUtilClass discussed here https://github.com/JetBrains/Grammar-Kit/issues/24
Pins & recoverWhile are the manual error handling mechanisms. They are needed when the input is incorrect.
When they are applied:
pin - required tokens are missing (use often)
recoverWhile - extra tokens are present (use with care)
See the quick doc popup on the attribute in the IDE (Ctrl-Q) and the HOWTO.md page on GitHub.
Grammar-Kit 1.1.7 can autogenerate simple predicates with recoverWhile="#auto" (one token lookahead).
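For readers who want to see the pin and recoverWhile attributes discussed in this thread side by side, here is a minimal sketch of how they might be written in a Grammar-Kit grammar for the definition format shown earlier. It is only an illustration under assumptions: the rule and token names (definition, translation, definition_recover, IDENTIFIER, LANGUAGE_CODE, NAME_TEXT) are made up and not taken from the actual plugin, and the header attributes (parserClass and so on) are left out.

```bnf
// Hypothetical grammar fragment, not the plugin's real grammar.
solFile ::= definition *

// pin=2: once IDENTIFIER and '{' have matched, the parser commits to this rule.
// A missing '}' or a broken body is then reported as an error inside the
// definition instead of making the whole rule fail.
// recoverWhile: after the rule finishes (successfully or not), tokens are
// skipped for as long as the predicate below matches.
definition ::= IDENTIFIER '{' translation * '}' {
  pin = 2
  recoverWhile = definition_recover
}

// Predicate rule: "the next token is not IDENTIFIER", so recovery stops at the
// point where the next definition can start.
private definition_recover ::= !IDENTIFIER

// pin=1: a language code alone is enough to commit to a translation entry.
translation ::= LANGUAGE_CODE NAME_TEXT {pin = 1}
```

The pin handles the "required token is missing" case and the recover predicate handles the "extra tokens are present" case, which is why the predicate is kept as narrow as possible.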
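The "flat structure" advice from earlier in the thread can be sketched in the same way. The fragment below is modelled on the external root rule in Grammar-Kit's own Grammar.bnf and on the generated call parseGrammar(builder_, level_ + 1, definition_parser_) quoted above. It assumes that the class named in parserUtilClass (a hypothetical com.example.sol.SolParserUtil here) provides a static parseGrammar method that takes the builder, the level and the generated rule parser. The token names are again invented for illustration.

```bnf
{
  // Assumption: a hand-written utility class that supplies the external
  // parseGrammar method; the class name is hypothetical.
  parserUtilClass = "com.example.sol.SolParserUtil"
}

// Instead of a flat 'solFile ::= definition *', the top-level loop is delegated
// to the external method, which can group runs of definitions under dummy
// parent blocks so that the PSI tree is no longer one long list of siblings.
external solFile ::= parseGrammar definition

definition ::= IDENTIFIER '{' translation * '}'
translation ::= LANGUAGE_CODE NAME_TEXT
```

With this shape the grouping lives in the utility method rather than in the grammar rules, so the PSI elements for definitions stay the same.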