Problems with performance in custom language plugin

Hello,
Is it possible to trigger the parsing process from an action? I get very bad performance because the whole file is parsed after every keystroke.


No, that's not possible. Did you take CPU snapshots that prove parsing is really the culprit? Or do you have additional functionality (annotator, resolving ...) that might be causing this instead?


I tried to profile with VisualVM, which was quite hard. However, I detached all other extensions, such as the annotator and the reference contributors, and the problem remains.
A file of my language looks like this:

1000BASE_SX {
     ex [1000base sx]
     en [1000base sx;1000base-sx]
}
1000BASE_T {
     ex [10/100/1000;1000base t;ethernet 10/100/1000 base tx;gb lan;gblan;gigabit ethernet;gigabit lan;gigabit-lan;ieee 802.3z]
     en [10/100/1000;1000base t;1000base-t;ethernet 10/100/1000 base tx;gb lan;gblan;gigabit ethernet;gigabit lan;gigabit-lan]
}
1000BASE_X {
     ex [1000base x]
     en [1000base x]
}
100BASE_FX {
     ex [100base fx]
     en [100base fx;100base-fx]
}
100BASE_T {
     de [netzwerk]
     ex [10/100 ethernet;100 mbps;100base t;ethernet;lan;netzwerk;rj 45]
     en [10/100 ethernet;10/100base-t;100 mbps;100base t;ethernet;lan]
}


The whole file has about 15000 lines. The editor lags when I want to add another value between the square brackets.
The lexer file I use looks like this:

%%

%class SOLLexer
%implements FlexLexer
%unicode
%function advance
%type IElementType
%eof{
  return;
%eof}

%{
  private int globalStatus;
  private int stateOfLanguageDefinition;
%}

CRLF = \n|\r|\r\n
WHITE_SPACE = [\ \t\f]
COMPARISON_TYPE = ("MORE_IS_BETTER"|"LESS_IS_BETTER"|"NOT_COMPARABLE"|"FEATURE_VALUE"|"AVAILABILITY"|"RICHNESS")
PROPERTY_TYPE = ("BOOLEAN"|"INTEGER"|"DOUBLE"|"STRING"|"STRINGENUM"|"DATE")
FEATURE_VALUE_TEXT = "Feature Value Definition"
CATEGORY_TEXT = "Category Definition"
PROPERTY_TEXT = "Property Definition"
MAPPING_TEXT = "Mapping Definition"
LOCALE_CHARACTER = ("ex"|"en"|"de")
END_OF_LINE_COMMENT = "%%"[^\r\n]*
ALTERNATIVE_NAME = [^\{\}\r\n\t;\]\[]
ALTERNATIVE_NAME_START = [^\{\}\r\n\t\ ;\]\[]
DEFINITION_CHARACTER = [A-Z0-9_]+
COMMA = ";"
EXTENDS_CHARACTER = "extends"
CLOSING_BRACKET_CHARACTER=\}
CLOSING_RECT=\]
OPENING_BRACKET_CHARACTER=\{
OPENING_RECT=\[
INPUT_CHARACTER = "##".*
EQUALS_CHARACTER = "="

%state WAITING_FEATURE_VALUE_DEFINITION
%state WAITING_PROPERTY_DEFINITION
%state WAITING_CATEGORY_DEFINITION
%state WAITING_PROPERTY_TYPE
%state WAITING_COMPARISON_TYPE
%state WAITING_PROPERTY_NAME
%state WAITING_OPEN_DEFINITION
%state PROPERTY_DEFINITION_STATE
%state FEATURE_VALUE_DEFINITION_STATE
%state CATEGORY_DEFINITION_STATE
%state ALTERNATIVE_NAME_STATE
%state WAITING_EXTENDS_DECLARATION
%state WAITING_CATEGORY_REFERENCE
%state WAITING_MAPPING_DEFINITION
%state WAITING_END_INPUT
%state WAITING_PROPERTY_MAPPINGS
%state WAITING_EQUALS
%state WAITING_VALUE
%state WAITING_OPENING_RECT
%state WAITING_CLOSING_RECT

%%

<YYINITIAL>{FEATURE_VALUE_TEXT} { yybegin(WAITING_FEATURE_VALUE_DEFINITION); return SOLTypes.FEATURE_VALUE_SIGNIFIER; }
<YYINITIAL>{CATEGORY_TEXT} { yybegin(WAITING_CATEGORY_DEFINITION); return SOLTypes.CATEGORY_SIGNIFIER; }
<YYINITIAL>{PROPERTY_TEXT} { yybegin(WAITING_PROPERTY_DEFINITION); return SOLTypes.PROPERTY_SIGNIFIER; }
<YYINITIAL>{MAPPING_TEXT} { yybegin(WAITING_MAPPING_DEFINITION); return SOLTypes.MAPPING_SIGNIFIER; }

<WAITING_FEATURE_VALUE_DEFINITION>{DEFINITION_CHARACTER} { globalStatus = yystate(); yybegin(WAITING_OPEN_DEFINITION); return SOLTypes.FEATURE_VALUE_NAME; }
<WAITING_CATEGORY_DEFINITION>{DEFINITION_CHARACTER} { globalStatus = yystate(); yybegin(WAITING_EXTENDS_DECLARATION); return SOLTypes.CATEGORY_NAME; }
<WAITING_EXTENDS_DECLARATION>{EXTENDS_CHARACTER} { yybegin(WAITING_CATEGORY_REFERENCE); return SOLTypes.EXTENDS; }
<WAITING_EXTENDS_DECLARATION>{OPENING_BRACKET_CHARACTER} { yybegin(LexerUtil.determineStatusByGlobalStatus(globalStatus)); return SOLTypes.OPENING_BRACKET; }
<WAITING_CATEGORY_REFERENCE>{DEFINITION_CHARACTER} { yybegin(WAITING_OPEN_DEFINITION); return SOLTypes.CATEGORY_REFERENCE; }
<WAITING_PROPERTY_DEFINITION>{PROPERTY_TYPE} { globalStatus = yystate(); yybegin(WAITING_COMPARISON_TYPE); return SOLTypes.PROPERTY_TYPE; }
<WAITING_COMPARISON_TYPE>{COMPARISON_TYPE} { yybegin(WAITING_PROPERTY_NAME); return SOLTypes.COMPARISON_TYPE; }
<WAITING_PROPERTY_NAME>{DEFINITION_CHARACTER} { yybegin(WAITING_OPEN_DEFINITION); return SOLTypes.PROPERTY_NAME; }

// from here for mapping
<WAITING_MAPPING_DEFINITION>{INPUT_CHARACTER} { yybegin(WAITING_END_INPUT); return SOLTypes.INPUT; }
<WAITING_END_INPUT>{CRLF} { yybegin(WAITING_PROPERTY_MAPPINGS); return TokenType.WHITE_SPACE; }
<WAITING_PROPERTY_MAPPINGS>{INPUT_CHARACTER} { yybegin(WAITING_END_INPUT); return SOLTypes.INPUT; }
<WAITING_PROPERTY_MAPPINGS>{DEFINITION_CHARACTER} { yybegin(WAITING_EQUALS); return SOLTypes.PROPERTY_REFERENCE; }
<WAITING_EQUALS>{EQUALS_CHARACTER} { yybegin(WAITING_VALUE); return SOLTypes.EQUALS; }
<WAITING_VALUE>{ALTERNATIVE_NAME_START}{ALTERNATIVE_NAME}* { yybegin(WAITING_VALUE); return SOLTypes.FEATURE_VALUE_REFERENCE; }
<WAITING_VALUE>{CRLF} { yybegin(WAITING_PROPERTY_MAPPINGS); return TokenType.WHITE_SPACE; }

<WAITING_OPEN_DEFINITION>{OPENING_BRACKET_CHARACTER} { yybegin(LexerUtil.determineStatusByGlobalStatus(globalStatus)); return SOLTypes.OPENING_BRACKET; }
<PROPERTY_DEFINITION_STATE, CATEGORY_DEFINITION_STATE>{DEFINITION_CHARACTER} { yybegin(yystate()); return LexerUtil.determineReferenceTypeByState(yystate()); }
<FEATURE_VALUE_DEFINITION_STATE, PROPERTY_DEFINITION_STATE, CATEGORY_DEFINITION_STATE>{LOCALE_CHARACTER} { stateOfLanguageDefinition = yystate(); yybegin(WAITING_OPENING_RECT); return SOLTypes.LOCALE; }
<WAITING_OPENING_RECT> {OPENING_RECT} { yybegin(ALTERNATIVE_NAME_STATE); return SOLTypes.OPENING_RECT; }
<ALTERNATIVE_NAME_STATE> {ALTERNATIVE_NAME_START}{ALTERNATIVE_NAME}* { yybegin(WAITING_CLOSING_RECT); return SOLTypes.ALTERNATIVE_NAME; }
<WAITING_CLOSING_RECT> {CLOSING_RECT} { yybegin(stateOfLanguageDefinition); return SOLTypes.CLOSING_RECT; }
<WAITING_CLOSING_RECT> {COMMA} { yybegin(ALTERNATIVE_NAME_STATE); return SOLTypes.COMMA; }

// In ALTERNATIVE_PROPERTY_DEFINITION_STATE a CLOSING_BRACKET leads to PROPERTY_DEFINITION_STATE
<PROPERTY_DEFINITION_STATE, CATEGORY_DEFINITION_STATE, FEATURE_VALUE_DEFINITION_STATE> {CLOSING_BRACKET_CHARACTER} { yybegin(globalStatus); return SOLTypes.CLOSING_BRACKET; }

{END_OF_LINE_COMMENT} { yystate(); return SOLTypes.COMMENT; }
{CRLF} { yystate(); return TokenType.WHITE_SPACE; }
{WHITE_SPACE}+ { yystate(); return TokenType.WHITE_SPACE; }
. { return TokenType.BAD_CHARACTER; }


The dangerous part seems to be ALTERNATIVE_NAME_STATE, where any sequence of characters can be read.
When I remove the square bracket on the right, I can type smoothly. The complete source is available here:
git clone https://bitbucket.org/ssprenger/simple-ontology-language.git

Thanks for any help, Sebastian


You seem to put too much syntax logic into your lexer.

The IntelliJ IDEA platform assumes lexers are fast and invokes them very often.

Solution: decouple lexer & parser logic as it is usually done:

  • Make the lexer recognize basic tokens only, e.g. ID, BRACE, NUMBER, SEMICOLON, etc.
  • Properties, features, definitions, etc. should be recognized (i.e. parsed) by a parser.
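
To make the split concrete, here is a minimal sketch of what "basic tokens only" could look like. It is plain Java rather than a JFlex specification, and the class name FlatLexerSketch, the token type names and the WORD category are assumptions made for this illustration, not part of the plugin. The only detail taken from the thread is the identifier pattern [A-Z0-9_]+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated lexer: it classifies characters into
// basic tokens and keeps no syntax state, so every call is cheap.
final class FlatLexerSketch {
    enum Type { IDENTIFIER, WORD, LBRACE, RBRACE, LBRACKET, RBRACKET, SEMICOLON, WHITE_SPACE }

    static final class Token {
        final Type type;
        final String text;

        Token(Type type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i);
            Type type = punctuation(c);
            if (type != null) {
                i++;
            } else if (Character.isWhitespace(c)) {
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                type = Type.WHITE_SPACE;
            } else {
                // Take the longest run of ordinary characters, then classify it.
                while (i < input.length() && isOrdinary(input.charAt(i))) i++;
                type = isIdentifier(input, start, i) ? Type.IDENTIFIER : Type.WORD;
            }
            tokens.add(new Token(type, input.substring(start, i)));
        }
        return tokens;
    }

    private static Type punctuation(char c) {
        switch (c) {
            case '{': return Type.LBRACE;
            case '}': return Type.RBRACE;
            case '[': return Type.LBRACKET;
            case ']': return Type.RBRACKET;
            case ';': return Type.SEMICOLON;
            default: return null;
        }
    }

    private static boolean isOrdinary(char c) {
        return punctuation(c) == null && !Character.isWhitespace(c);
    }

    // Mirrors the identifier pattern used in the thread: [A-Z0-9_]+
    private static boolean isIdentifier(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            boolean ok = (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_';
            if (!ok) return false;
        }
        return true;
    }
}
```

Whether a given IDENTIFIER is a feature value name, a category reference or something else is then decided by the parser from its position in the grammar, not by lexer states.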


This approach has a lot to offer:

  • Parsing is often done in the background, so the UI does not freeze
  • Proper error reporting can be implemented (not just BAD_CHARACTER highlights)
  • Parser output is AST / PSI trees that can be later used for inspections/refactorings (not just raw token array)


You can write the parser manually or generate it from a BNF grammar using the Grammar-Kit plugin.


Please feel free to seek more information about the lexer/parser relationship in our guide for language plugin developers and anywhere else.

http://confluence.jetbrains.com/display/IDEADEV/Developing+Custom+Language+Plugins+for+IntelliJ+IDEA


Thank you for the information. I will try to simplify my lexer and leave the work to the parser.

Could it also be that the files I use are just too large (about 15000 lines)?

I took the example language from http://confluence.jetbrains.com/display/IntelliJIDEA/Custom+Language+Support
I enlarged the example property file:

# You are reading the ".properties" entry.
! The exclamation mark can also mark text as comments.
website = http://en.wikipedia.org/
language = English
# The backslash below tells the application to continue reading
# the value onto the next line.
message = Welcome to \
          Wikipedia!
# Add spaces to the key
key\ with\ spaces = This is the value that could be looked up with the key "key with spaces".
# Unicode
tab : \u0009

to 5000 lines and the editor freezes. In this case the lexer and grammar definitions are very simple. So is a custom language plugin suitable for very simple but large files?


Thanks again for the hint to move the syntax logic to the parser. That made the lexer much simpler.
I reduced the lexer of my custom language to the following:

package com.sol;

import com.intellij.lexer.FlexLexer;
import com.intellij.psi.tree.IElementType;
import com.sol.psi.SOLTypes;
import com.intellij.psi.TokenType;

%%

%class SOLLexer
%implements FlexLexer
%unicode
%function advance
%type IElementType
%eof{
  return;
%eof}

CRLF = \n|\r|\r\n
WHITE_SPACE = [\ \t\f]
COMPARISON_TYPE = ("MORE_IS_BETTER"|"LESS_IS_BETTER"|"NOT_COMPARABLE"|"FEATURE_VALUE"|"AVAILABILITY"|"RICHNESS")
PROPERTY_TYPE = ("BOOLEAN"|"INTEGER"|"DOUBLE"|"STRING"|"STRINGENUM"|"DATE")
FEATURE_VALUE_TEXT = "Feature Value Definition"
CATEGORY_TEXT = "Category Definition"
PROPERTY_TEXT = "Property Definition"
END_OF_LINE_COMMENT = "%%"[^\r\n]*
ALTERNATIVE_NAMES= ("en "|"ex "|"de ")[A-Za-z0-9\.,; ]+(\n|\r)
//[A-Za-z0-9\.,]\n|\r
IDENTIFIER = [A-Z0-9_]+
EXTENDS = "extends"
CLOSING_BRACKET=\}
OPENING_BRACKET=\{

%state INBODY

%%

<YYINITIAL>{FEATURE_VALUE_TEXT} { yybegin(YYINITIAL); return SOLTypes.FEATURE_VALUE_SIGNIFIER; }
<YYINITIAL>{CATEGORY_TEXT} { yybegin(YYINITIAL); return SOLTypes.CATEGORY_SIGNIFIER; }
<YYINITIAL>{PROPERTY_TEXT} { yybegin(YYINITIAL); return SOLTypes.PROPERTY_SIGNIFIER; }
<YYINITIAL>{PROPERTY_TYPE} { yybegin(YYINITIAL); return SOLTypes.PROPERTY_TYPE; }
<YYINITIAL>{COMPARISON_TYPE} { yybegin(YYINITIAL); return SOLTypes.COMPARISON; }
<YYINITIAL>{EXTENDS} { yybegin(YYINITIAL); return SOLTypes.EXTENDS; }
<YYINITIAL>{IDENTIFIER} { yybegin(YYINITIAL); return SOLTypes.IDENTIFIER; }
<YYINITIAL>{OPENING_BRACKET} { yybegin(INBODY); return SOLTypes.OPENING_BRACKET; }
<INBODY>{IDENTIFIER} { yybegin(INBODY); return SOLTypes.REFERENCE; }
<INBODY>{ALTERNATIVE_NAMES} { yybegin(INBODY); return SOLTypes.ALTERNATIVE_NAMES; }
<INBODY>{CLOSING_BRACKET} { yybegin(YYINITIAL); return SOLTypes.CLOSING_BRACKET; }
{END_OF_LINE_COMMENT} { yystate(); return SOLTypes.COMMENT; }
<INBODY>{CRLF} { yybegin(INBODY); return TokenType.WHITE_SPACE; }
<INBODY>{WHITE_SPACE}+ { yybegin(INBODY); return TokenType.WHITE_SPACE; }
<YYINITIAL>{CRLF} { yybegin(YYINITIAL); return TokenType.WHITE_SPACE; }
<YYINITIAL>{WHITE_SPACE}+ { yybegin(YYINITIAL); return TokenType.WHITE_SPACE; }
. { return TokenType.BAD_CHARACTER; }


The grammar for the parser looks like this:

solFile ::= COMMENT* definition COMMENT*
private definition ::= featureValueDefinitions|propertyDefinitions|categoryDefinitions
private propertyDefinitions::=PROPERTY_SIGNIFIER propertyDefinition*
private categoryDefinitions::=CATEGORY_SIGNIFIER categoryDefinition*
private featureValueDefinitions::=FEATURE_VALUE_SIGNIFIER featureValueDefinition*
propertyDefinition::= PROPERTY_TYPE COMPARISON IDENTIFIER OPENING_BRACKET (COMMENT|ALTERNATIVE_NAMES|REFERENCE)* CLOSING_BRACKET
featureValueDefinition::= IDENTIFIER OPENING_BRACKET (COMMENT|ALTERNATIVE_NAMES|COMMENT)* CLOSING_BRACKET
categoryDefinition::= IDENTIFIER (EXTENDS IDENTIFIER)? OPENING_BRACKET (COMMENT|ALTERNATIVE_NAMES|REFERENCE)* CLOSING_BRACKET


PSIViewer shows the elements I expect. However, with more than 2500 lines performance is bad, and with more than 10000 lines the file is hardly editable. Please help me find the error, because I'm really stuck.


You're doing good :)

However, there are 2 problems:



P.S. COMMENT tokens can be auto-skipped by PsiBuilder (and usually should not be present in a grammar), just like WHITESPACE tokens. See ParserDefinition.getCommentTokens().
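
As a rough illustration of why COMMENT does not need to appear in the grammar, here is a self-contained sketch of a token cursor that hides skipped token types from the rule that reads it. This is not PsiBuilder code: SkippingCursorSketch, its Type enum and parseEntry are hypothetical names for this example, and the SKIPPED set only plays the role that the token sets returned by the ParserDefinition play in the real platform.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Simplified model of a token cursor that never shows comments or whitespace
// to the parsing code.
final class SkippingCursorSketch {
    enum Type { IDENTIFIER, OPENING_BRACKET, CLOSING_BRACKET, COMMENT, WHITE_SPACE }

    // Assumption for this sketch: these two types are the ones declared as skippable.
    private static final Set<Type> SKIPPED = EnumSet.of(Type.COMMENT, Type.WHITE_SPACE);

    private final List<Type> tokens;
    private int position;

    SkippingCursorSketch(List<Type> tokens) {
        this.tokens = tokens;
        skipIgnored();
    }

    // Current significant token, or null at end of input.
    Type current() {
        return position < tokens.size() ? tokens.get(position) : null;
    }

    void advance() {
        position++;
        skipIgnored();
    }

    private void skipIgnored() {
        while (position < tokens.size() && SKIPPED.contains(tokens.get(position))) {
            position++;
        }
    }

    private boolean consume(Type expected) {
        if (current() != expected) return false;
        advance();
        return true;
    }

    // entry ::= IDENTIFIER OPENING_BRACKET CLOSING_BRACKET, with no COMMENT in the rule.
    static boolean parseEntry(List<Type> tokens) {
        SkippingCursorSketch cursor = new SkippingCursorSketch(tokens);
        return cursor.consume(Type.IDENTIFIER)
                && cursor.consume(Type.OPENING_BRACKET)
                && cursor.consume(Type.CLOSING_BRACKET)
                && cursor.current() == null;
    }
}
```

With this model a comment between the brackets does not affect the rule at all, which is why the grammar rules can drop their COMMENT alternatives once the comment token type is registered as skippable.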


Thank you very much. I don't really get it yet, but I am playing around with the different rule attributes and trying to create a more hierarchical structure. That will take some time - it's hard stuff :-)


Hello,
I stripped down my language to understand the generated code better. In the end, a file should look like this:

AN_IDENTIFIER {
     en an english name
     de a german name
}

ANOTHER_IDENTIFIER {
     en another english name
     de another german name
}


As I already mentioned, definitions like that occur about 3000 times, so the file can reach a length of about 15000 lines. Because of the missing hierarchy, I tried to generate parent DUMMY_BLOCKs around a set of definitions, as in the example grammar of Grammar-Kit itself.
The grammar definition looks like this:

external solFile ::= parseGrammar definition
// live preview view
//solFile ::= definition
private definition ::= !<<eof>> entry
private entry_recover ::= !(entry_start)
entry ::= entry_start languageSection CLOSING_BRACKET
entry_start ::= IDENTIFIER OPENING_BRACKET {pin=1}
entry_start ::=
languageSection::= ALTERNATIVE_NAME+


The lexer looks like this:

CRLF = \n|\r|\r\n
WHITE_SPACE = [\ \t\f]
END_OF_LINE_COMMENT = "%%"[^\r\n]*
ALTERNATIVE_NAME= .
LOCALE = ("en "|"ex "|"de ")
//[A-Za-z0-9\.,]\n|\r
IDENTIFIER = [A-Z0-9_]+
CLOSING_BRACKET=\}
OPENING_BRACKET=\{

%state INBODY

%%

<YYINITIAL>{IDENTIFIER} { yybegin(YYINITIAL); return SOLTypes.IDENTIFIER; }
<YYINITIAL>{OPENING_BRACKET} { yybegin(INBODY); return SOLTypes.OPENING_BRACKET; }
<INBODY>{LOCALE}{ALTERNATIVE_NAME}* { yybegin(INBODY); return SOLTypes.ALTERNATIVE_NAME; }
<INBODY>{CLOSING_BRACKET} { yybegin(YYINITIAL); return SOLTypes.CLOSING_BRACKET; }
{CRLF} { yystate(); return TokenType.WHITE_SPACE; }
{WHITE_SPACE}+ { yystate(); return TokenType.WHITE_SPACE; }
{END_OF_LINE_COMMENT} { yystate(); return SOLTypes.COMMENT; }
. { return TokenType.BAD_CHARACTER; }


I am facing two problems now:
1. In the generated parser code the method

protected boolean parse_root_(final IElementType root_, final PsiBuilder builder_, final int level_) {
    parseGrammar(builder_, level_ + 1, definition_parser_);
}

does not compile, so I just made it return true, which I guess led to my second problem:
2. I can't parse the code: the first opening brace is marked as unexpected, so the PSI viewer doesn't show the expected result.

I tried to set pins and recoverWhile at every possible place. However, I don't completely understand this concept. Are pins the right way to make the input file valid?
Thanks for any help,
Sebastian


I found a proper implementation of parseGrammar:

  public static boolean parseGrammar(PsiBuilder builder_, int level, Parser parser) {
    ErrorState state = ErrorState.get(builder_);
    return parseAsTree(state, builder_, level, DUMMY_BLOCK, true, parser, TRUE_CONDITION);
  }


Now everything is working fine and it's fast as hell! Thank you very much for your help.


Glad you got it working.

For the record:

Void methods are the result of a missing or incorrect psiImplUtilClass, as discussed here: https://github.com/JetBrains/Grammar-Kit/issues/24

Pins & recoverWhile are the manual error handling mechanisms. They are needed when the input is incorrect.
When to apply them:

  • pin - required tokens are missing (use often)
  • recoverWhile - extra tokens are present (use with care)
See the quick doc popup on the attribute in the IDE (Ctrl-Q) and the HOWTO.md page on GitHub.
Grammar-Kit 1.1.7 can autogenerate simple predicates with recoverWhile="#auto" (one token lookahead).
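
To illustrate the two mechanisms, here is a simplified, hand-written model in plain Java. It is not the code Grammar-Kit generates, and the real behaviour differs in details; PinRecoverSketch, its token enum and the rule entry ::= IDENTIFIER LBRACE NAME+ RBRACE are assumptions made for this sketch. The pin is modelled after the first token, and the recovery predicate is "the next token is not an IDENTIFIER".

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of pin and recoverWhile in a hand-written parser.
final class PinRecoverSketch {
    enum T { IDENTIFIER, LBRACE, NAME, RBRACE, JUNK }

    final List<String> errors = new ArrayList<>();
    private final List<T> tokens;
    private int pos;

    PinRecoverSketch(List<T> tokens) {
        this.tokens = tokens;
    }

    private boolean consume(T expected) {
        if (pos < tokens.size() && tokens.get(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(T expected) {
        if (!consume(expected)) errors.add(expected + " expected at token " + pos);
    }

    // entry ::= IDENTIFIER LBRACE NAME+ RBRACE, modelled with pin=1.
    private boolean parseEntry() {
        if (!consume(T.IDENTIFIER)) return false; // before the pin: ordinary failure
        // Past the pin: missing tokens are reported, but the entry still counts.
        expect(T.LBRACE);
        expect(T.NAME);
        while (consume(T.NAME)) { /* remaining names */ }
        expect(T.RBRACE);
        return true;
    }

    // Modelled recoverWhile: skip tokens while they cannot start a new entry.
    private void recover() {
        int start = pos;
        while (pos < tokens.size() && tokens.get(pos) != T.IDENTIFIER) pos++;
        if (pos > start) errors.add((pos - start) + " unexpected token(s) skipped");
    }

    // Returns the number of entries recognized, including incomplete pinned ones.
    int parseFile() {
        int entries = 0;
        while (pos < tokens.size()) {
            if (parseEntry()) entries++;
            recover();
        }
        return entries;
    }
}
```

In this model an entry with a missing closing bracket is still recognized as an entry because of the pin, and the stray token after it is skipped by the recovery step, so the following entries parse normally instead of the whole rest of the file turning into one error.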