Problems with performance in custom language plugin

Hello,
is it possible to trigger the parsing process only from an action? I get very bad performance because the whole file is parsed after every keystroke.

No, that's not possible. Did you take CPU snapshots which prove that parsing is really the culprit? Or do you have additional functionality (annotator, resolving ...) which might rather cause this?

Permanently deleted user

I tried to profile with VisualVM, which was quite hard. However, I detached all other extensions, such as the annotator and the reference contributors, and the problem remains.
A file of my language looks like this:

1000BASE_SX {
     ex [1000base sx]
     en [1000base sx;1000base-sx]
}
1000BASE_T {
     ex [10/100/1000;1000base t;ethernet 10/100/1000 base tx;gb lan;gblan;gigabit ethernet;gigabit lan;gigabit-lan;ieee 802.3z]
     en [10/100/1000;1000base t;1000base-t;ethernet 10/100/1000 base tx;gb lan;gblan;gigabit ethernet;gigabit lan;gigabit-lan]
}
1000BASE_X {
     ex [1000base x]
     en [1000base x]
}
100BASE_FX {
     ex [100base fx]
     en [100base fx;100base-fx]
}
100BASE_T {
     de [netzwerk]
     ex [10/100 ethernet;100 mbps;100base t;ethernet;lan;netzwerk;rj 45]
     en [10/100 ethernet;10/100base-t;100 mbps;100base t;ethernet;lan]
}


The whole file has about 15000 lines. The editor lags when I try to add another value between the square brackets.
The lex file I use looks like this:

%%

%class SOLLexer
%implements FlexLexer
%unicode
%function advance
%type IElementType
%eof{  return;
%eof}

%{
  private int globalStatus;
  private int stateOfLanguageDefinition;
%}

CRLF = \n|\r|\r\n
WHITE_SPACE = [\ \t\f]
COMPARISON_TYPE = ("MORE_IS_BETTER"|"LESS_IS_BETTER"|"NOT_COMPARABLE"|"FEATURE_VALUE"|"AVAILABILITY"|"RICHNESS")
PROPERTY_TYPE = ("BOOLEAN"|"INTEGER"|"DOUBLE"|"STRING"|"STRINGENUM"|"DATE")
FEATURE_VALUE_TEXT = "Feature Value Definition"
CATEGORY_TEXT = "Category Definition"
PROPERTY_TEXT = "Property Definition"
MAPPING_TEXT = "Mapping Definition"
LOCALE_CHARACTER = ("ex"|"en"|"de")
END_OF_LINE_COMMENT = "%%"[^\r\n]*
ALTERNATIVE_NAME = [^\{\}\r\n\t;\]\[]
ALTERNATIVE_NAME_START = [^\{\}\r\n\t\ ;\]\[]
DEFINITION_CHARACTER = [A-Z0-9_]+
COMMA = ";"
EXTENDS_CHARACTER = "extends"
CLOSING_BRACKET_CHARACTER=\}
CLOSING_RECT=\]
OPENING_BRACKET_CHARACTER=\{
OPENING_RECT=\[
INPUT_CHARACTER = "##".*
EQUALS_CHARACTER = "="

%state WAITING_FEATURE_VALUE_DEFINITION
%state WAITING_PROPERTY_DEFINITION
%state WAITING_CATEGORY_DEFINITION
%state WAITING_PROPERTY_TYPE
%state WAITING_COMPARISON_TYPE
%state WAITING_PROPERTY_NAME
%state WAITING_OPEN_DEFINITION
%state PROPERTY_DEFINITION_STATE
%state FEATURE_VALUE_DEFINITION_STATE
%state CATEGORY_DEFINITION_STATE
%state ALTERNATIVE_NAME_STATE
%state WAITING_EXTENDS_DECLARATION
%state WAITING_CATEGORY_REFERENCE
%state WAITING_MAPPING_DEFINITION
%state WAITING_END_INPUT
%state WAITING_PROPERTY_MAPPINGS
%state WAITING_EQUALS
%state WAITING_VALUE
%state WAITING_OPENING_RECT
%state WAITING_CLOSING_RECT

%%

<YYINITIAL>{FEATURE_VALUE_TEXT} { yybegin(WAITING_FEATURE_VALUE_DEFINITION); return SOLTypes.FEATURE_VALUE_SIGNIFIER; }
<YYINITIAL>{CATEGORY_TEXT} { yybegin(WAITING_CATEGORY_DEFINITION); return SOLTypes.CATEGORY_SIGNIFIER; }
<YYINITIAL>{PROPERTY_TEXT} { yybegin(WAITING_PROPERTY_DEFINITION); return SOLTypes.PROPERTY_SIGNIFIER; }
<YYINITIAL>{MAPPING_TEXT} { yybegin(WAITING_MAPPING_DEFINITION); return SOLTypes.MAPPING_SIGNIFIER; }

<WAITING_FEATURE_VALUE_DEFINITION>{DEFINITION_CHARACTER} { globalStatus = yystate(); yybegin(WAITING_OPEN_DEFINITION); return SOLTypes.FEATURE_VALUE_NAME; }
<WAITING_CATEGORY_DEFINITION>{DEFINITION_CHARACTER} { globalStatus = yystate(); yybegin(WAITING_EXTENDS_DECLARATION); return SOLTypes.CATEGORY_NAME; }
<WAITING_EXTENDS_DECLARATION>{EXTENDS_CHARACTER} { yybegin(WAITING_CATEGORY_REFERENCE); return SOLTypes.EXTENDS; }
<WAITING_EXTENDS_DECLARATION>{OPENING_BRACKET_CHARACTER} { yybegin(LexerUtil.determineStatusByGlobalStatus(globalStatus)); return SOLTypes.OPENING_BRACKET; }
<WAITING_CATEGORY_REFERENCE>{DEFINITION_CHARACTER} { yybegin(WAITING_OPEN_DEFINITION); return SOLTypes.CATEGORY_REFERENCE; }
<WAITING_PROPERTY_DEFINITION>{PROPERTY_TYPE} { globalStatus = yystate(); yybegin(WAITING_COMPARISON_TYPE); return SOLTypes.PROPERTY_TYPE; }
<WAITING_COMPARISON_TYPE>{COMPARISON_TYPE} { yybegin(WAITING_PROPERTY_NAME); return SOLTypes.COMPARISON_TYPE; }
<WAITING_PROPERTY_NAME>{DEFINITION_CHARACTER} { yybegin(WAITING_OPEN_DEFINITION); return SOLTypes.PROPERTY_NAME; }

// from here for mapping
<WAITING_MAPPING_DEFINITION>{INPUT_CHARACTER} { yybegin(WAITING_END_INPUT); return SOLTypes.INPUT; }
<WAITING_END_INPUT>{CRLF} { yybegin(WAITING_PROPERTY_MAPPINGS); return TokenType.WHITE_SPACE; }
<WAITING_PROPERTY_MAPPINGS>{INPUT_CHARACTER} { yybegin(WAITING_END_INPUT); return SOLTypes.INPUT; }
<WAITING_PROPERTY_MAPPINGS>{DEFINITION_CHARACTER} { yybegin(WAITING_EQUALS); return SOLTypes.PROPERTY_REFERENCE; }
<WAITING_EQUALS>{EQUALS_CHARACTER} { yybegin(WAITING_VALUE); return SOLTypes.EQUALS; }
<WAITING_VALUE>{ALTERNATIVE_NAME_START}{ALTERNATIVE_NAME}* { yybegin(WAITING_VALUE); return SOLTypes.FEATURE_VALUE_REFERENCE; }
<WAITING_VALUE>{CRLF} { yybegin(WAITING_PROPERTY_MAPPINGS); return TokenType.WHITE_SPACE; }

<WAITING_OPEN_DEFINITION>{OPENING_BRACKET_CHARACTER} { yybegin(LexerUtil.determineStatusByGlobalStatus(globalStatus)); return SOLTypes.OPENING_BRACKET; }
<PROPERTY_DEFINITION_STATE, CATEGORY_DEFINITION_STATE>{DEFINITION_CHARACTER} { yybegin(yystate()); return LexerUtil.determineReferenceTypeByState(yystate()); }
<FEATURE_VALUE_DEFINITION_STATE, PROPERTY_DEFINITION_STATE, CATEGORY_DEFINITION_STATE>{LOCALE_CHARACTER} { stateOfLanguageDefinition = yystate(); yybegin(WAITING_OPENING_RECT); return SOLTypes.LOCALE; }

<WAITING_OPENING_RECT> {OPENING_RECT} { yybegin(ALTERNATIVE_NAME_STATE); return SOLTypes.OPENING_RECT; }
<ALTERNATIVE_NAME_STATE> {ALTERNATIVE_NAME_START}{ALTERNATIVE_NAME}* { yybegin(WAITING_CLOSING_RECT); return SOLTypes.ALTERNATIVE_NAME; }
<WAITING_CLOSING_RECT> {CLOSING_RECT} { yybegin(stateOfLanguageDefinition); return SOLTypes.CLOSING_RECT; }
<WAITING_CLOSING_RECT> {COMMA} { yybegin(ALTERNATIVE_NAME_STATE); return SOLTypes.COMMA; }

// In ALTERNATIVE_PROPERTY_DEFINITION_STATE a CLOSING_BRACKET leads to PROPERTY_DEFINITION_STATE
<PROPERTY_DEFINITION_STATE, CATEGORY_DEFINITION_STATE, FEATURE_VALUE_DEFINITION_STATE> {CLOSING_BRACKET_CHARACTER} { yybegin(globalStatus); return SOLTypes.CLOSING_BRACKET; }

{END_OF_LINE_COMMENT} { yystate(); return SOLTypes.COMMENT; }
{CRLF} { yystate(); return TokenType.WHITE_SPACE; }
{WHITE_SPACE}+ { yystate(); return TokenType.WHITE_SPACE; }
. { return TokenType.BAD_CHARACTER; }


The problematic part seems to be ALTERNATIVE_NAME_STATE, where any sequence of characters can be read.
When I remove the closing square bracket, I can type without lag. The complete source is available here:
git clone https://bitbucket.org/ssprenger/simple-ontology-language.git

Thanks for any help, Sebastian

You seem to put too much syntax logic into your lexer.

The IntelliJ IDEA platform assumes that lexers are fast and runs them very often.

Solution: decouple lexer & parser logic as it is usually done:

  • Make the lexer recognize basic tokens only, e.g. ID, BRACE, NUMBER, SEMICOLON, etc.
  • Properties, features, definitions, etc. should be recognized (i.e. parsed) by a parser.
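
To make the split concrete, here is a minimal, self-contained Java sketch of a stateless tokenizer. It is not the plugin's JFlex lexer and uses no IntelliJ API: the class name BasicTokenizerSketch, the token names and the "TYPE:text" string encoding are all made up for illustration. It only classifies characters by longest match (whitespace is dropped for brevity) and leaves it to a parser to decide that "ex" is a locale or that several words between brackets form one alternative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stateless tokenizer: it classifies characters and never tracks syntax. */
class BasicTokenizerSketch {
    // Token name and regex; on equal match length the earlier rule wins.
    private static final String[][] RULES = {
        {"COMMENT",     "%%[^\\r\\n]*"},
        {"WHITE_SPACE", "\\s+"},
        {"IDENTIFIER",  "[A-Z0-9_]+"},
        {"LBRACE",      "\\{"},
        {"RBRACE",      "\\}"},
        {"LBRACKET",    "\\["},
        {"RBRACKET",    "\\]"},
        {"SEMICOLON",   ";"},
        {"TEXT",        "[^\\s{}\\[\\];]+"},
    };
    private static final Pattern[] PATTERNS = new Pattern[RULES.length];
    static {
        for (int i = 0; i < RULES.length; i++) {
            PATTERNS[i] = Pattern.compile(RULES[i][1]);
        }
    }

    /** Returns the non-whitespace tokens of the input as "TYPE:text" strings. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestRule = -1;
            int bestEnd = pos;
            // Longest match: try every rule at the current position.
            for (int i = 0; i < PATTERNS.length; i++) {
                Matcher m = PATTERNS[i].matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = i;
                    bestEnd = m.end();
                }
            }
            if (bestRule < 0) {
                // Nothing matched: emit a single bad character and keep going.
                tokens.add("BAD_CHARACTER:" + input.charAt(pos));
                pos++;
                continue;
            }
            String type = RULES[bestRule][0];
            if (!type.equals("WHITE_SPACE")) {
                tokens.add(type + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }
}
```

For the input "1000BASE_SX { ex [1000base sx] }" this yields the token types IDENTIFIER, LBRACE, TEXT, LBRACKET, TEXT, TEXT, RBRACKET, RBRACE. There are no lexer states at all, so the token found at a given position does not depend on what came before it.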


This approach has a lot to offer:

  • Parsing is often done in the background, so the UI does not freeze
  • Proper error reporting can be implemented (not just BAD_CHARACTER highlights)
  • The parser output is an AST / PSI tree that can later be used for inspections and refactorings (not just a raw token array)


You can write the parser manually or generate it from a BNF grammar using the Grammar-Kit plugin.


Please feel free to seek more information about lexer/parser relationship in our guide for language plugin developers and anywhere else.

http://confluence.jetbrains.com/display/IDEADEV/Developing+Custom+Language+Plugins+for+IntelliJ+IDEA

Permanently deleted user

Thank you for the information. I will try to simplify my lexer and leave that work to the parser.

Could it also be that the files I use are just too large (about 15000 lines)?

I took the example language from http://confluence.jetbrains.com/display/IntelliJIDEA/Custom+Language+Support
and enlarged the example properties file:

# You are reading the ".properties" entry.
! The exclamation mark can also mark text as comments.
website = http://en.wikipedia.org/
language = English
# The backslash below tells the application to continue reading
# the value onto the next line.
message = Welcome to \
          Wikipedia!
# Add spaces to the key
key\ with\ spaces = This is the value that could be looked up with the key "key with spaces".
# Unicode
tab : \u0009

to 5000 lines and the editor freezes. In this case the lexer and grammar definitions are very simple. So is a custom language plugin suitable for very simple but large files?

Permanently deleted user

Thanks again for the hint to move the syntax logic to the parser. That made the lexer much simpler.
I simplified the lexer of my custom language to the following:

package com.sol;

import com.intellij.lexer.FlexLexer;
import com.intellij.psi.tree.IElementType;
import com.sol.psi.SOLTypes;
import com.intellij.psi.TokenType;

%%

%class SOLLexer
%implements FlexLexer
%unicode
%function advance
%type IElementType
%eof{  return;
%eof}

CRLF = \n|\r|\r\n
WHITE_SPACE = [\ \t\f]
COMPARISON_TYPE = ("MORE_IS_BETTER"|"LESS_IS_BETTER"|"NOT_COMPARABLE"|"FEATURE_VALUE"|"AVAILABILITY"|"RICHNESS")
PROPERTY_TYPE = ("BOOLEAN"|"INTEGER"|"DOUBLE"|"STRING"|"STRINGENUM"|"DATE")
FEATURE_VALUE_TEXT = "Feature Value Definition"
CATEGORY_TEXT = "Category Definition"
PROPERTY_TEXT = "Property Definition"
END_OF_LINE_COMMENT = "%%"[^\r\n]*
ALTERNATIVE_NAMES= ("en "|"ex "|"de ")[A-Za-z0-9\.,; ]+(\n|\r)
//[A-Za-z0-9\.,]\n|\r
IDENTIFIER = [A-Z0-9_]+
EXTENDS = "extends"
CLOSING_BRACKET=\}
OPENING_BRACKET=\{

%state INBODY

%%

<YYINITIAL>{FEATURE_VALUE_TEXT} { yybegin(YYINITIAL); return SOLTypes.FEATURE_VALUE_SIGNIFIER; }
<YYINITIAL>{CATEGORY_TEXT} { yybegin(YYINITIAL); return SOLTypes.CATEGORY_SIGNIFIER; }
<YYINITIAL>{PROPERTY_TEXT} { yybegin(YYINITIAL); return SOLTypes.PROPERTY_SIGNIFIER; }
<YYINITIAL>{PROPERTY_TYPE} { yybegin(YYINITIAL); return SOLTypes.PROPERTY_TYPE; }
<YYINITIAL>{COMPARISON_TYPE} { yybegin(YYINITIAL); return SOLTypes.COMPARISON; }
<YYINITIAL>{EXTENDS} { yybegin(YYINITIAL); return SOLTypes.EXTENDS; }
<YYINITIAL>{IDENTIFIER} { yybegin(YYINITIAL); return SOLTypes.IDENTIFIER; }
<YYINITIAL>{OPENING_BRACKET} { yybegin(INBODY); return SOLTypes.OPENING_BRACKET; }

<INBODY>{IDENTIFIER} { yybegin(INBODY); return SOLTypes.REFERENCE; }
<INBODY>{ALTERNATIVE_NAMES} { yybegin(INBODY); return SOLTypes.ALTERNATIVE_NAMES; }
<INBODY>{CLOSING_BRACKET} { yybegin(YYINITIAL); return SOLTypes.CLOSING_BRACKET; }

{END_OF_LINE_COMMENT} { yystate(); return SOLTypes.COMMENT; }

<INBODY>{CRLF} { yybegin(INBODY); return TokenType.WHITE_SPACE; }
<INBODY>{WHITE_SPACE}+ { yybegin(INBODY); return TokenType.WHITE_SPACE; }
<YYINITIAL>{CRLF} { yybegin(YYINITIAL); return TokenType.WHITE_SPACE; }
<YYINITIAL>{WHITE_SPACE}+ { yybegin(YYINITIAL); return TokenType.WHITE_SPACE; }

. { return TokenType.BAD_CHARACTER; }


The grammar for the parser looks like this:

solFile ::= COMMENT* definition COMMENT*
private definition ::= featureValueDefinitions|propertyDefinitions|categoryDefinitions
private propertyDefinitions ::= PROPERTY_SIGNIFIER propertyDefinition*
private categoryDefinitions ::= CATEGORY_SIGNIFIER categoryDefinition*
private featureValueDefinitions ::= FEATURE_VALUE_SIGNIFIER featureValueDefinition*
propertyDefinition ::= PROPERTY_TYPE COMPARISON IDENTIFIER OPENING_BRACKET (COMMENT|ALTERNATIVE_NAMES|REFERENCE)* CLOSING_BRACKET
featureValueDefinition ::= IDENTIFIER OPENING_BRACKET (COMMENT|ALTERNATIVE_NAMES|COMMENT)* CLOSING_BRACKET
categoryDefinition ::= IDENTIFIER (EXTENDS IDENTIFIER)? OPENING_BRACKET (COMMENT|ALTERNATIVE_NAMES|REFERENCE)* CLOSING_BRACKET


PSIViewer shows the elements I expect. However, with more than 2500 lines performance is bad, and with more than 10000 lines the file is hardly editable. Please help me find the error, because I'm really stuck.

You're doing well :)

However, there are two problems:



P.S. COMMENT tokens can be skipped automatically by PsiBuilder, just like whitespace tokens, and usually should not be present in a grammar. See ParserDefinition.getCommentTokens().
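
As a rough illustration of what skipping these tokens means for the grammar, here is a small self-contained Java sketch. It is only an analogy and does not use PsiBuilder or any IntelliJ API: the class SkippingTokenViewSketch and the token type names are hypothetical. The parser asks the view for the current token and never sees COMMENT or WHITE_SPACE, so no rule has to mention them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical token view that hides whitespace and comments from the parser. */
class SkippingTokenViewSketch {
    private static final Set<String> SKIPPED =
            new HashSet<>(Arrays.asList("WHITE_SPACE", "COMMENT"));

    private final List<String> types;
    private int pos = 0;

    SkippingTokenViewSketch(List<String> types) {
        this.types = types;
        skipInsignificant();
    }

    private void skipInsignificant() {
        while (pos < types.size() && SKIPPED.contains(types.get(pos))) {
            pos++;
        }
    }

    /** Type of the current significant token, or null at end of input. */
    String current() {
        return pos < types.size() ? types.get(pos) : null;
    }

    /** Moves to the next significant token. */
    void advance() {
        if (pos < types.size()) {
            pos++;
            skipInsignificant();
        }
    }

    /** Convenience helper: everything a parser reading this view would see. */
    static List<String> significant(List<String> types) {
        List<String> seen = new ArrayList<>();
        SkippingTokenViewSketch view = new SkippingTokenViewSketch(types);
        for (; view.current() != null; view.advance()) {
            seen.add(view.current());
        }
        return seen;
    }
}
```

With such a view, a rule like "IDENTIFIER OPENING_BRACKET ... CLOSING_BRACKET" matches regardless of how many comments sit between the tokens, which is why the COMMENT alternatives can be dropped from the grammar.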

Permanently deleted user

Thank you very much. I don't really get it yet, but I am playing around with the different rule attributes and trying to create a more hierarchical structure. That will take some time - it's hard stuff :-)

Permanently deleted user

Hello,
I stripped down my language to understand the generated code better. In the end a file should look like this:

AN_IDENTIFIER {
     en an english name
     de a german name
}

ANOTHER_IDENTIFIER {
     en another english name
     de another german name
}


As I already mentioned, definitions like this occur about 3000 times, so the file can reach a length of about 15000 lines. Because of the missing hierarchy, I tried to generate parent DUMMY_BLOCKs around sets of definitions, as in the example grammar that comes with Grammar-Kit.
The grammar definition looks like this:

external solFile ::= parseGrammar definition
// live preview view
//solFile ::= definition
private definition ::= !<<eof>> entry
private entry_recover ::= !(entry_start)
entry ::= entry_start languageSection CLOSING_BRACKET
entry_start ::= IDENTIFIER OPENING_BRACKET {pin=1}
entry_start ::=
languageSection ::= ALTERNATIVE_NAME+


The lexer looks like this:

CRLF = \n|\r|\r\n
WHITE_SPACE = [\ \t\f]
END_OF_LINE_COMMENT = "%%"[^\r\n]*
ALTERNATIVE_NAME= .
LOCALE = ("en "|"ex "|"de ")
//[A-Za-z0-9\.,]\n|\r
IDENTIFIER = [A-Z0-9_]+
CLOSING_BRACKET=\}
OPENING_BRACKET=\{

%state INBODY

%%

<YYINITIAL>{IDENTIFIER} { yybegin(YYINITIAL); return SOLTypes.IDENTIFIER; }
<YYINITIAL>{OPENING_BRACKET} { yybegin(INBODY); return SOLTypes.OPENING_BRACKET; }
<INBODY>{LOCALE}{ALTERNATIVE_NAME}* { yybegin(INBODY); return SOLTypes.ALTERNATIVE_NAME; }
<INBODY>{CLOSING_BRACKET} { yybegin(YYINITIAL); return SOLTypes.CLOSING_BRACKET; }

{CRLF} { yystate(); return TokenType.WHITE_SPACE; }
{WHITE_SPACE}+ { yystate(); return TokenType.WHITE_SPACE; }
{END_OF_LINE_COMMENT} { yystate(); return SOLTypes.COMMENT; }
. { return TokenType.BAD_CHARACTER; }


I am facing two problems now:
1. In the generated parser code the method

protected boolean parse_root_(final IElementType root_, final PsiBuilder builder_, final int level_) {
    parseGrammar(builder_, level_ + 1, definition_parser_);
}

does not compile, so I just made it return true, which I guess led to my second problem:
2. I can't parse the code: the first opening brace is marked as unexpected, so the PSI Viewer doesn't show the expected result.

I tried to set pin and recoverWhile in every possible place. However, I don't fully understand the concept. Are pins the right way to make the input file valid?
Thanks for any help,
Sebastian

Permanently deleted user

I found a proper implementation of parseGrammar:

public static boolean parseGrammar(PsiBuilder builder_, int level, Parser parser) {
    ErrorState state = ErrorState.get(builder_);
    return parseAsTree(state, builder_, level, DUMMY_BLOCK, true, parser, TRUE_CONDITION);
}


Now everything is working fine and it's fast as hell! Thank you very much for your help.

Glad you got it working.

For the record:

Void methods are the result of a missing or incorrect psiImplUtilClass, as discussed here: https://github.com/JetBrains/Grammar-Kit/issues/24

Pins & recoverWhile are the manual error handling mechanisms. They are needed when the input is incorrect.
When they are applied:

  • pin - required tokens are missing (use often)
  • recoverWhile - extra tokens are present (use with care)
See the quick documentation popup on the attribute in the IDE (Ctrl-Q) and the HOWTO.md page on GitHub.
Grammar-Kit 1.1.7 can autogenerate simple predicates with recoverWhile="#auto" (one token lookahead).
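
The two mechanisms can be imitated in a small hand-written parser. The Java sketch below is only an analogy under stated assumptions, not what Grammar-Kit generates: the class PinRecoverSketch, the token names and the error messages are hypothetical. It parses entry ::= IDENTIFIER LBRACE NAME+ RBRACE with the pin on IDENTIFIER, and after each entry it skips tokens while the next one is not an IDENTIFIER, which plays the role of a recoverWhile predicate.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for: entry ::= IDENTIFIER LBRACE NAME+ RBRACE (pinned on IDENTIFIER). */
class PinRecoverSketch {
    final List<String> errors = new ArrayList<>();
    int entries = 0;                 // entries produced, including incomplete ones
    private final List<String> tokens;
    private int pos = 0;

    PinRecoverSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private boolean at(String type) {
        return pos < tokens.size() && tokens.get(pos).equals(type);
    }

    /** Consumes a matching token; otherwise records an error and consumes nothing. */
    private boolean expect(String type) {
        if (at(type)) {
            pos++;
            return true;
        }
        errors.add("expected " + type + " at token " + pos);
        return false;
    }

    private void parseEntry() {
        if (!at("IDENTIFIER")) {
            return;                  // pin not reached: the rule simply does not apply
        }
        pos++;                       // pin reached: from here on we are committed to an entry
        if (expect("LBRACE") && expect("NAME")) {
            while (at("NAME")) {
                pos++;
            }
            expect("RBRACE");
        }
        entries++;                   // a pinned entry is kept even if required tokens were missing
    }

    void parseFile() {
        while (pos < tokens.size()) {
            parseEntry();
            int start = pos;
            // Recovery in the spirit of recoverWhile = !IDENTIFIER: skip extra tokens.
            while (pos < tokens.size() && !at("IDENTIFIER")) {
                pos++;
            }
            if (pos > start) {
                errors.add("skipped " + (pos - start) + " token(s) from " + start);
            }
        }
    }
}
```

For the token sequence IDENTIFIER LBRACE NAME RBRACE, IDENTIFIER NAME RBRACE, IDENTIFIER LBRACE NAME NAME RBRACE, the second entry is missing its opening brace. The pin still yields three entries, with one error for the missing token and one for the tokens skipped during recovery, so a single mistake does not invalidate the rest of the file.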