Grammar-Kit and Identifier

Permanently deleted user

Created February 14, 2014 12:19

I'm trying to create a grammar for a Java-like language.  I'm struggling with Identifiers.  I've gotten this to work, but I don't think it's ideal.  Could I get some feedback on the best way to accomplish this?

Here's what I have so far that works:

{
    parserClass="au.com.borner.salesforce.parser.apex.ApexParser"

    extends="com.intellij.extapi.psi.ASTWrapperPsiElement"

    psiClassPrefix="Apex"
    psiImplClassSuffix="Impl"
    psiPackage="au.com.borner.salesforce.psi.apex"
    psiImplPackage="au.com.borner.salesforce.psi.apex.impl"

    elementTypeHolderClass="au.com.borner.salesforce.psi.apex.ApexTypes"
    elementTypeClass="au.com.borner.salesforce.psi.apex.ApexElementType"
    tokenTypeClass="au.com.borner.salesforce.psi.apex.ApexTokenType"

    psiImplUtilClass="au.com.borner.salesforce.psi.apex.impl.ApexPsiImplUtil"

    tokens = [

        IDENTIFIER = 'regexp:[a-zA-Z][a-zA-Z0-9]*'
        CLASS_KEYWORD = 'class'
        LINE_COMMENT = 'regexp://.*'
        COMMENT = 'regexp:/\*(.|\n)*\*/'
        LEFT_BRACE = '{'
        RIGHT_BRACE = '}'
        LEFT_PAREN = '('
        RIGHT_PAREN = ')'
    ]
}

apexFile
    ::= classDeclaration |
        interfaceDeclaration |
        triggerDeclaration

classDeclaration
    ::= public CLASS_KEYWORD className ( extends extendsClassName )? (implements interfaceName)? LEFT_BRACE RIGHT_BRACE

interfaceDeclaration
    ::= public interface interfaceName ( extends extendsInterfaceName)? LEFT_BRACE RIGHT_BRACE

triggerDeclaration
    ::= trigger triggerName on sObjectName LEFT_PAREN triggerParameters RIGHT_PAREN LEFT_BRACE RIGHT_BRACE

triggerParameters
    ::= before insert |
        after insert

className
    ::= IDENTIFIER { mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
                     implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
                     methods = [ getIdentifier getName setName getNameIdentifier ] }

extendsClassName
    ::= IDENTIFIER

interfaceName
    ::= IDENTIFIER { mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
                     implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
                     methods = [ getIdentifier getName setName getNameIdentifier ] }

extendsInterfaceName
    ::= IDENTIFIER

triggerName
    ::= IDENTIFIER { mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
                     implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
                     methods = [ getIdentifier getName setName getNameIdentifier ] }

sObjectName
    ::= IDENTIFIER

(on a side note - how do I make the code look pretty on this post?)

The thing i don't like about this is that each NamedElement will have a child of type IDENTIFIER.  What I would prefer is for the NamedElement to just have text (no child).  Like such:

className
    ::= [a-zA-Z][a-zA-Z0-9]* { mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
                               implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
                               methods = [ getIdentifier getName setName getNameIdentifier ] }

or

className
    ::= [:jletter:][:jletternumber:]* { mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
                                        implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
                                        methods = [ getIdentifier getName setName getNameIdentifier ] }

But Grammar-kit doesn't like either of those.....

BTW: I'm using the Grammar-kit plugin to do a right-click on the Apex.bnf file to generate the _Apex.flex file.  My preference is to not have to manually edit the _Apex.flex file after its been generated.

So what's the best way to deal with Java-like identifiers?

14 comments

Gregory Shrago

Created February 18, 2014 17:01

There's a lot of things in your grammar that should be improved not only from implementation point of view but ideologically as well.

The most important one is that this grammar implies that named-element actually IS-A name which is not true (named-element HAS-A name).

If we start to fix this we soon come to far more pleasant BNF and hence PSI.

Something like this:

{
   ...

   extends(".+Declaration")=Declaration

// basic error recovery
   pin(".+Clause")=1
   pin(".+Declaration")=2 // class/interface/trigger
   recoverWhile(".+Declaration")="#auto"
}

apexFile
    ::= classDeclaration |
        interfaceDeclaration |
        triggerDeclaration

id ::= IDENTIFIER // I prefer identifier to be not leaf element, but this rule can be inlined

// fake PSI hierarchy rule that are extended by all declarations

fake Declaration ::= id {
                     mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
                     implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
                     methods = [ getIdentifier="id" getNameIdentifier="id"
                                         getName setName ]
}

// all *Name rules below are replaced with "id" (or "IDENTIFIER" in case you want name element to be just leaf (token) element

// extends & implements clauses are extracted
private extendsClause ::= extends id
private implementsClause ::= implements id
private body ::= "{" "}"

classDeclaration
    ::= public class id extendsClause? implementsClause? body

interfaceDeclaration
    ::= public interface id extendsClause? body

triggerDeclaration
    ::= trigger id on id "(" triggerParameters ")" body

triggerParameters ::= before insert | after insert

UPDATE: pin/recover attributes fixed

Permanently deleted user

Created February 19, 2014 09:47

Thank you for your comprehensive reply!  It was really helpful.  And I see how it cleans up the generated PSI classes.

I have a couple of questions:

1) I noticed you are using '{' in the grammar, whereas I was using the LEFT_BRACE token.  Is there a preferred one to use? If I use '{', I would still have to define LEFT_BRACE token somewhere in the grammar in order to do syntax highlighting, correct?

2) The generated ApexClassDeclaration class has a method: List<ApexId> getIdList();  This is because the class declaraction can have several identifiers.  For example "public class mark extends fred implements foo { }" would have three identifiers 1) mark 2) fred 3) foo.
ApexDeclaraction:getId() will return the first one, correct (in this case: mark).

Would this be different if IDENTIFIER was a leaf or not?  I'm not sure how I would change your grammar so IDENTIFIER is not a leaf.  Could you show me?

3) The grammar-kit plugin is highlighting "pin(".+(Declaration|Clause)")=2 // class/interface/trigger" as Pattern doesn't match any rule.  Was your intetion something like this?

    pin(".+(Clause)")=1
    pin(".+(Declaration)")=2 // class/interface/trigger

4)  The grammar-kit plugin is highlighting recover(".+Declaration")="#auto" as an Unused attribute  I have to admit that I don't fully understand the recover process yet.  So any tips on what this does is appreciated.

Gregory Shrago

Created February 19, 2014 16:02

Sorry for typos, I didn't check the grammar in the IDE. I've updated it so #3 and #4 are covered.

#1 I personally prefer to use "{" instead of LEFT_BRACE because BNF looks more readable to me.
Both "{" and LEFT_BRACE can be used in the grammar but you're right that LEFT_BRACE have to be defined so that IElementType constant have a name.
Names are autogenerated for non-private rules and keyword tokens but not for syntax tokens.
If the constant is not defined the generated parser will match token by text which is generally slower.

#2 By "leaf" I mean "token" element (com.intellij.psi.impl.source.tree.LeafElement).
ApexId is already not a LeafElement which is good for future functionality.

List<ApexId> method doesn't look good to me so I would make clauses public:

extendsClause ::= extends id
implementsClause ::= implements id

Permanently deleted user

Created February 20, 2014 10:38

#1 - I agree that '{' is more readable.  I'll switch over to use that and keeping the keywords/symbols defined as tokens
#2 - I"m still confused by what is a leaf and what is not a leaf.  You say ApexId isn't a leaf, but when I look at the element in the PsiViewer, is says it is PsiLeafElement.  See the screenshot.  Cursor was on "Mark" in the preview window.
The browser says it's a PsiElement(IDENTIFIER_PATTERN) and the class is LeafPsiElement.  Can you help clarify?

#4 - I changed recover(".+Declaration")="#auto" to recoverWhile(".+Declaration")="#auto", but the IDE still tells me it's an Unused attribute.  Can you tell me what I'm doing wrong?

Here's the start of my grammar:

{
    parserClass="au.com.borner.salesforce.parser.apex.ApexParser"

    extends="com.intellij.extapi.psi.ASTWrapperPsiElement"

    psiClassPrefix="Apex"
    psiImplClassSuffix="Impl"
    psiPackage="au.com.borner.salesforce.psi.apex"
    psiImplPackage="au.com.borner.salesforce.psi.apex.impl"

    elementTypeHolderClass="au.com.borner.salesforce.psi.apex.ApexTypes"
    elementTypeClass="au.com.borner.salesforce.psi.apex.ApexElementType"
    tokenTypeClass="au.com.borner.salesforce.psi.apex.ApexTokenType"

    psiImplUtilClass="au.com.borner.salesforce.psi.apex.impl.ApexPsiImplUtil"

    tokens = [

        CLASS_KEYWORD   = 'class'
        TRUE_KEYWORD    = 'true'
        FALSE_KEYWORD   = 'false'

        IDENTIFIER_PATTERN = 'regexp:[a-zA-Z$_][a-zA-Z0-9$_]*'

        LINE_COMMENT = 'regexp://.*'
        COMMENT = 'regexp:/\*(.|\n)*\*/'
        STRING_LITERAL = "regexp:\'(\\b|\\t|\\n|\\f|\\r|\\\"|\\'|\\|.)*\'"
    ]

    extends(".+Declaration")=Declaration
    pin(".+(Clause)")=1
    recoverWhile(".+Declaration")="#auto"

}

apexFile
    ::= classDeclaration |
        interfaceDeclaration |
        triggerDeclaration

identifier ::= IDENTIFIER_PATTERN

fake Declaration ::=  identifier {
                     mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
                     implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
                     methods = [ getName setName NameIdentifier="identifier" getPresentation]
}

classDeclaration
    ::= annotationDeclaration* classModifier virtualOrAbstractModifier? sharingModifier? CLASS_KEYWORD identifier extendsClause? implementsClause? classBody { pin=5 }

Attachment(s):
Screen Shot 2014-02-20 at 8.32.37 pm.png

Permanently deleted user

Created February 20, 2014 11:50

I've been refactoring this evening using the new Grammar and I like it.  Except for one thing.

The ApexDeclaration interface has the following method declaration:

  @NotNull
  ApexIdentifier getIdentifier();

And the ApexDeclarationImpl has the matching method implementation:

  @Override
  @NotNull
  public ApexIdentifier getIdentifier() {
    return findNotNullChildByClass(ApexIdentifier.class);
  }

Both have the @NotNull annotation on them.

However, all the ApexDeclaration extension classes - like ApexClassDeclaraction - override the getIdentifier() method with:

  @Override
  @Nullable
  public ApexIdentifier getIdentifier() {
    return findChildByClass(ApexIdentifier.class);
  }

Which is annotated as @Nullable.

So the IDE highlights all these methods with warnings about @NotNull overriding @Null methods.

Funny thing is: if I look at the findNotNullChildByClass() method that  ApexDeclarationImpl calls, it can return null! :-)

Is there anyway for me to control the annotations that the Generator generates?

Alexander Doroshko

Created February 20, 2014 11:56

Changing pin to 6 in classDeclaration will guarantee that id is not null. Currently with pin on 'class' keyword classDeclaration indeed may have no name identifier.

Gregory Shrago

Created February 20, 2014 16:56

#4 Please update Grammar-Kit plugin if you develop with IntelliJ 13. recoverWhile was named recoverUntil in the early days. recoverUntil still can be used for backward compatibility.

Permanently deleted user

Created February 21, 2014 06:02

Thanks. That fixed it up.

Permanently deleted user

Created February 21, 2014 06:05

I'm using IntelliJ 12 with Grammar-Kit Plugin version 1.1.2. When I change it to recoverUtil, then it highlights #auto with the warning Unused rule reference.

Permanently deleted user

Created February 21, 2014 07:59

In order to get references working, I renamed extendsClause to extendsDeclaration so I could get the named element support. That all works good.

How would you suggest I deal with the implementsClause?

My original grammar only had one identifier on this element. But in Apex (like in Java) you can have many. So the definition now is:

implementsClause ::=  implements identifier (',' identifier)*

So implements clause has-a LIST of names. I'm not sure how to model that - and have references work. I can't change implementsClause to implementsDeclaraction because the getIdentifier() method doesn't make sense - there is not one identifier, there are many,

Any help is appreciated.

Gregory Shrago

Created February 21, 2014 16:04

#auto recover predicates as well as token PSI accessors are supported in the latest Grammar-Kit version only.

Unfortunately (or luckily) most fixes and new features are always available in the latest versions as we live on the edge here.
You really need to consider moving to IDEA 13.0.3 and 13.1. Community edition is free and Grammar-Kit works there.

Gregory Shrago

Created February 21, 2014 16:11

Clauses ideologically should not be declarations and named-elements. This is just plain wrong (e.g. in completion and find usages).

You shall read more guides/forums on IntelliJ language plugin developing process as I cannot retype here all of that or write code for you.
References are covered here http://confluence.jetbrains.com/display/IntelliJIDEA/Reference+Contributor.

You may also consider Grammar-Kit code on github as an example.

Permanently deleted user

Created February 22, 2014 01:53

I have gone through that example; however, it shows how to inject a reference into an existing language. I'm trying to use references within my custom language.
I have read through http://confluence.jetbrains.com/display/IDEADEV/Developing+Custom+Language+Plugins+for+IntelliJ+IDEA#DevelopingCustomLanguagePluginsforIntelliJIDEA-ReferencesandResolve.
I'm trying to figure out how to implement these concepts into my custom language. I appreciate all your help!

I will check out the Grammar-Kit plugin source to see what I can gleen from it.

I've starting using Intellij 13.0.2 and installed Grammar-Kit 1.1.7. Now the generated Parser code won't compile. I went back to the original grammar you suggested, and it doesn't generate compilable code either. Can you tell me if this is a bug or if there is something wrong in the grammer?

Here's what I used as input:


{
    parserClass="au.com.borner.salesforce.parser.apex.ApexParser"
    extends="com.intellij.extapi.psi.ASTWrapperPsiElement"
    psiClassPrefix="Apex"
    psiImplClassSuffix="Impl"
    psiPackage="au.com.borner.salesforce.psi.apex"
    psiImplPackage="au.com.borner.salesforce.psi.apex.impl"
    elementTypeHolderClass="au.com.borner.salesforce.psi.apex.ApexTypes"
    elementTypeClass="au.com.borner.salesforce.psi.apex.ApexElementType"
    tokenTypeClass="au.com.borner.salesforce.psi.apex.ApexTokenType"
    psiImplUtilClass="au.com.borner.salesforce.psi.apex.impl.ApexPsiImplUtil"

    tokens = [
        IDENTIFIER_PATTERN = 'regexp:[a-zA-Z$_][a-zA-Z0-9$_]*'
    ]

    extends(".+Declaration")=Declaration

    // basic error recovery
    pin(".+Clause")=1
    pin(".+Declaration")=2 // class/interface/trigger
    recoverWhile(".+Declaration")="#auto"

}

apexFile
    ::= classDeclaration |
        interfaceDeclaration |
        triggerDeclaration

id ::= IDENTIFIER // I prefer identifier to be not leaf element, but this rule can be inlined

// fake PSI hierarchy rule that are extended by all declarations
fake Declaration ::= id {

                     mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
                     implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
                     methods = [ getIdentifier="id" getNameIdentifier="id" getName setName ]
}

// all *Name rules below are replaced with "id" (or "IDENTIFIER" in case you want name element to be just leaf (token) element

// extends & implements clauses are extracted
private extendsClause ::= extends id
private implementsClause ::= implements id
private body ::= "{" "}"

classDeclaration
    ::= public class id extendsClause? implementsClause? body

interfaceDeclaration
    ::= public interface id extendsClause? body

triggerDeclaration
    ::= trigger id on id "(" triggerParameters ")" body

triggerParameters ::= before insert | after insert

The line in error is in ApexParser (a parameter is missing after builder_,:

  final static Parser classDeclaration_auto_recover_ = new Parser() {
    public boolean parse(PsiBuilder builder_, int level_) {
      return !nextTokenIsFast(builder_, );
    }
  };

Thanks in advance.

Corrected version number as 1.1.7

Gregory Shrago

Created February 26, 2014 15:55

Just comment the recoverWhile(".+Declaration")="#auto" line.

I thought you would put several declarations in a file, with just one declaration there's no need in recovering.

Please sign in to leave a comment.