Grammar-Kit and Identifier Follow
I'm trying to create a grammar for a Java-like language. I'm struggling with Identifiers. I've gotten this to work, but I don't think it's ideal. Could I get some feedback on the best way to accomplish this?
Here's what I have so far that works:
{
parserClass="au.com.borner.salesforce.parser.apex.ApexParser"
extends="com.intellij.extapi.psi.ASTWrapperPsiElement"
psiClassPrefix="Apex"
psiImplClassSuffix="Impl"
psiPackage="au.com.borner.salesforce.psi.apex"
psiImplPackage="au.com.borner.salesforce.psi.apex.impl"
elementTypeHolderClass="au.com.borner.salesforce.psi.apex.ApexTypes"
elementTypeClass="au.com.borner.salesforce.psi.apex.ApexElementType"
tokenTypeClass="au.com.borner.salesforce.psi.apex.ApexTokenType"
psiImplUtilClass="au.com.borner.salesforce.psi.apex.impl.ApexPsiImplUtil"
tokens = [
IDENTIFIER = 'regexp:[a-zA-Z][a-zA-Z0-9]*'
CLASS_KEYWORD = 'class'
LINE_COMMENT = 'regexp://.*'
COMMENT = 'regexp:/\*(.|\n)*\*/'
LEFT_BRACE = '{'
RIGHT_BRACE = '}'
LEFT_PAREN = '('
RIGHT_PAREN = ')'
]
}
apexFile
::= classDeclaration |
interfaceDeclaration |
triggerDeclaration
classDeclaration
::= public CLASS_KEYWORD className ( extends extendsClassName )? (implements interfaceName)? LEFT_BRACE RIGHT_BRACE
interfaceDeclaration
::= public interface interfaceName ( extends extendsInterfaceName)? LEFT_BRACE RIGHT_BRACE
triggerDeclaration
::= trigger triggerName on sObjectName LEFT_PAREN triggerParameters RIGHT_PAREN LEFT_BRACE RIGHT_BRACE
triggerParameters
::= before insert |
after insert
className
::= IDENTIFIER { mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
methods = [ getIdentifier getName setName getNameIdentifier ] }
extendsClassName
::= IDENTIFIER
interfaceName
::= IDENTIFIER { mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
methods = [ getIdentifier getName setName getNameIdentifier ] }
extendsInterfaceName
::= IDENTIFIER
triggerName
::= IDENTIFIER { mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
methods = [ getIdentifier getName setName getNameIdentifier ] }
sObjectName
::= IDENTIFIER
(on a side note - how do I make the code look pretty on this post?)
The thing i don't like about this is that each NamedElement will have a child of type IDENTIFIER. What I would prefer is for the NamedElement to just have text (no child). Like such:
className
::= [a-zA-Z][a-zA-Z0-9]* { mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
methods = [ getIdentifier getName setName getNameIdentifier ] }
or
className
::= [:jletter:][:jletternumber:]* { mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
methods = [ getIdentifier getName setName getNameIdentifier ] }
But Grammar-kit doesn't like either of those.....
BTW: I'm using the Grammar-kit plugin to do a right-click on the Apex.bnf file to generate the _Apex.flex file. My preference is to not have to manually edit the _Apex.flex file after its been generated.
So what's the best way to deal with Java-like identifiers?
Please sign in to leave a comment.
There's a lot of things in your grammar that should be improved not only from implementation point of view but ideologically as well.
The most important one is that this grammar implies that named-element actually IS-A name which is not true (named-element HAS-A name).
If we start to fix this we soon come to far more pleasant BNF and hence PSI.
Something like this:
UPDATE: pin/recover attributes fixed
Thank you for your comprehensive reply! It was really helpful. And I see how it cleans up the generated PSI classes.
I have a couple of questions:
1) I noticed you are using '{' in the grammar, whereas I was using the LEFT_BRACE token. Is there a preferred one to use? If I use '{', I would still have to define LEFT_BRACE token somewhere in the grammar in order to do syntax highlighting, correct?
2) The generated ApexClassDeclaration class has a method: List<ApexId> getIdList(); This is because the class declaraction can have several identifiers. For example "public class mark extends fred implements foo { }" would have three identifiers 1) mark 2) fred 3) foo.
ApexDeclaraction:getId() will return the first one, correct (in this case: mark).
Would this be different if IDENTIFIER was a leaf or not? I'm not sure how I would change your grammar so IDENTIFIER is not a leaf. Could you show me?
3) The grammar-kit plugin is highlighting "pin(".+(Declaration|Clause)")=2 // class/interface/trigger" as Pattern doesn't match any rule. Was your intetion something like this?
pin(".+(Clause)")=1
pin(".+(Declaration)")=2 // class/interface/trigger
4) The grammar-kit plugin is highlighting recover(".+Declaration")="#auto" as an Unused attribute I have to admit that I don't fully understand the recover process yet. So any tips on what this does is appreciated.
Sorry for typos, I didn't check the grammar in the IDE. I've updated it so #3 and #4 are covered.
#1 I personally prefer to use "{" instead of LEFT_BRACE because BNF looks more readable to me.
Both "{" and LEFT_BRACE can be used in the grammar but you're right that LEFT_BRACE have to be defined so that IElementType constant have a name.
Names are autogenerated for non-private rules and keyword tokens but not for syntax tokens.
If the constant is not defined the generated parser will match token by text which is generally slower.
#2 By "leaf" I mean "token" element (com.intellij.psi.impl.source.tree.LeafElement).
ApexId is already not a LeafElement which is good for future functionality.
List<ApexId> method doesn't look good to me so I would make clauses public:
#1 - I agree that '{' is more readable. I'll switch over to use that and keeping the keywords/symbols defined as tokens
#2 - I"m still confused by what is a leaf and what is not a leaf. You say ApexId isn't a leaf, but when I look at the element in the PsiViewer, is says it is PsiLeafElement. See the screenshot. Cursor was on "Mark" in the preview window.
The browser says it's a PsiElement(IDENTIFIER_PATTERN) and the class is LeafPsiElement. Can you help clarify?
#4 - I changed recover(".+Declaration")="#auto" to recoverWhile(".+Declaration")="#auto", but the IDE still tells me it's an Unused attribute. Can you tell me what I'm doing wrong?
Here's the start of my grammar:
{
parserClass="au.com.borner.salesforce.parser.apex.ApexParser"
extends="com.intellij.extapi.psi.ASTWrapperPsiElement"
psiClassPrefix="Apex"
psiImplClassSuffix="Impl"
psiPackage="au.com.borner.salesforce.psi.apex"
psiImplPackage="au.com.borner.salesforce.psi.apex.impl"
elementTypeHolderClass="au.com.borner.salesforce.psi.apex.ApexTypes"
elementTypeClass="au.com.borner.salesforce.psi.apex.ApexElementType"
tokenTypeClass="au.com.borner.salesforce.psi.apex.ApexTokenType"
psiImplUtilClass="au.com.borner.salesforce.psi.apex.impl.ApexPsiImplUtil"
tokens = [
CLASS_KEYWORD = 'class'
TRUE_KEYWORD = 'true'
FALSE_KEYWORD = 'false'
IDENTIFIER_PATTERN = 'regexp:[a-zA-Z$_][a-zA-Z0-9$_]*'
LINE_COMMENT = 'regexp://.*'
COMMENT = 'regexp:/\*(.|\n)*\*/'
STRING_LITERAL = "regexp:\'(\\b|\\t|\\n|\\f|\\r|\\\"|\\'|\\|.)*\'"
]
extends(".+Declaration")=Declaration
pin(".+(Clause)")=1
recoverWhile(".+Declaration")="#auto"
}
apexFile
::= classDeclaration |
interfaceDeclaration |
triggerDeclaration
identifier ::= IDENTIFIER_PATTERN
fake Declaration ::= identifier {
mixin="au.com.borner.salesforce.psi.apex.impl.ApexNamedElementImpl"
implements="au.com.borner.salesforce.psi.apex.ApexNamedElement"
methods = [ getName setName NameIdentifier="identifier" getPresentation]
}
classDeclaration
::= annotationDeclaration* classModifier virtualOrAbstractModifier? sharingModifier? CLASS_KEYWORD identifier extendsClause? implementsClause? classBody { pin=5 }
Attachment(s):
Screen Shot 2014-02-20 at 8.32.37 pm.png
I've been refactoring this evening using the new Grammar and I like it. Except for one thing.
The ApexDeclaration interface has the following method declaration:
@NotNull
ApexIdentifier getIdentifier();
And the ApexDeclarationImpl has the matching method implementation:
@Override
@NotNull
public ApexIdentifier getIdentifier() {
return findNotNullChildByClass(ApexIdentifier.class);
}
Both have the @NotNull annotation on them.
However, all the ApexDeclaration extension classes - like ApexClassDeclaraction - override the getIdentifier() method with:
@Override
@Nullable
public ApexIdentifier getIdentifier() {
return findChildByClass(ApexIdentifier.class);
}
Which is annotated as @Nullable.
So the IDE highlights all these methods with warnings about @NotNull overriding @Null methods.
Funny thing is: if I look at the findNotNullChildByClass() method that ApexDeclarationImpl calls, it can return null! :-)
Is there anyway for me to control the annotations that the Generator generates?
Changing pin to 6 in classDeclaration will guarantee that id is not null. Currently with pin on 'class' keyword classDeclaration indeed may have no name identifier.
#4 Please update Grammar-Kit plugin if you develop with IntelliJ 13. recoverWhile was named recoverUntil in the early days. recoverUntil still can be used for backward compatibility.
Thanks. That fixed it up.
I'm using IntelliJ 12 with Grammar-Kit Plugin version 1.1.2. When I change it to recoverUtil, then it highlights #auto with the warning Unused rule reference.
In order to get references working, I renamed extendsClause to extendsDeclaration so I could get the named element support. That all works good.
How would you suggest I deal with the implementsClause?
My original grammar only had one identifier on this element. But in Apex (like in Java) you can have many. So the definition now is:
So implements clause has-a LIST of names. I'm not sure how to model that - and have references work. I can't change implementsClause to implementsDeclaraction because the getIdentifier() method doesn't make sense - there is not one identifier, there are many,
Any help is appreciated.
#auto recover predicates as well as token PSI accessors are supported in the latest Grammar-Kit version only.
Unfortunately (or luckily) most fixes and new features are always available in the latest versions as we live on the edge here.
You really need to consider moving to IDEA 13.0.3 and 13.1. Community edition is free and Grammar-Kit works there.
Clauses ideologically should not be declarations and named-elements. This is just plain wrong (e.g. in completion and find usages).
You shall read more guides/forums on IntelliJ language plugin developing process as I cannot retype here all of that or write code for you.
References are covered here http://confluence.jetbrains.com/display/IntelliJIDEA/Reference+Contributor.
You may also consider Grammar-Kit code on github as an example.
I have gone through that example; however, it shows how to inject a reference into an existing language. I'm trying to use references within my custom language.
I have read through http://confluence.jetbrains.com/display/IDEADEV/Developing+Custom+Language+Plugins+for+IntelliJ+IDEA#DevelopingCustomLanguagePluginsforIntelliJIDEA-ReferencesandResolve.
I'm trying to figure out how to implement these concepts into my custom language. I appreciate all your help!
I will check out the Grammar-Kit plugin source to see what I can gleen from it.
I've starting using Intellij 13.0.2 and installed Grammar-Kit 1.1.7. Now the generated Parser code won't compile. I went back to the original grammar you suggested, and it doesn't generate compilable code either. Can you tell me if this is a bug or if there is something wrong in the grammer?
Here's what I used as input:
The line in error is in ApexParser (a parameter is missing after builder_,:
Thanks in advance.
Corrected version number as 1.1.7
Just comment the recoverWhile(".+Declaration")="#auto" line.
I thought you would put several declarations in a file, with just one declaration there's no need in recovering.