Starting point for a Java-like grammar for Grammar-Kit?

I've started working on a custom language plugin for a very Java-like language (Force.com's Apex) by following the excellent Custom Language Support tutorial.  Because this language is so much like Java, I used a reference Java BNF to guide my initial implementation.  Very quickly I ran into Grammar-Kit's inability to generate left recursive parsers because of the heavy usage of left recursion in every Java BNF I've found.  Some of these are immediate and can generally be solved easily by changing, for example:

fieldDeclaration ::= fieldModifiers? type ";"
fieldModifiers ::= fieldModifier | fieldModifiers fieldModifier
fieldModifier ::= public | protected | private | static | final

to:

fieldDeclaration ::= fieldModifier* type ";"
fieldModifier ::= public | protected | private | static | final



However, when I get into mutually-recursive constructs such as expressions, things get much more complicated!  Before I dive into the tedious process of removing left recursion from all of these rules (potentially changing the grammar's associativity, etc.), I thought I'd check here to see if anyone knew of an easier path to my goal!

I'm not married to Grammar-Kit's parser generator.  It's obviously the one used in the tutorial, and it provides a convenient binding to IDEA's Psi mechanism.  However, if another parser (Antlr?) generator can pretty much consume a modified Java BNF directly that can be easily adapted for IDEA's needs, please let me know about it.  Or, if someone has already created a Grammar-Kit-compliant BNF for a Java-like language that they're comfortable sharing freely, that's great as well.  Or maybe I'm overcomplicating this and Grammar-Kit already supports this through some other mechanism that I haven't discovered yet.

Thanks in advance for any assistance you can provide!
Scott
6 comments
Comment actions Permalink

Grammar-Kit supports left recursion in one special case: Pratt-like expression parsing.

Here's an example: https://github.com/JetBrains/Grammar-Kit/blob/master/testData/generator/ExprParser.bnf
Another one is expression parsing in Erlang plugin grammar: https://github.com/ignatov/intellij-erlang/blob/master/grammars/erlang.bnf

I'd also like to point out that writing initial grammar is just the first step of making it really work, perform IDE-grade error recovering and produce nice PSI tree.
This means one will have to spend time tuning and debugging the grammar so it should be as short and as simple as possible.
It is not big deal to spend a couple of hours converting the original grammar in a better form learning its inner structure along the way.
Live Preview mode can even make this task a fun thing to do, here's a small tutorial: https://github.com/JetBrains/Grammar-Kit/blob/master/TUTORIAL.md

0
Comment actions Permalink

Thanks for the reply!  I actually found several Java grammars in the antlr distribution that had already worked through the removal of left recursion, so I grabbed one of those as a starting point.  I've been massaging the resulting Grammar-Kit grammar for the differences between Java and Apex (initially focusing on the intersection, then I'll add the Apex features that don't exist in Java) and making the Psi tree work better for my purposes.

You're right that getting an acceptable grammar is only a small part of the work!  I'm spending quite a bit of time refactoring the rules and making some (most!) private.  I haven't had to play with extends or fake rules yet because so far I'm able to produce a very nice, shallow, well-defined Psi tree, but I imagine I'll need those before it's over.

I already have syntax highlighting and a solid structure view working off of this Psi tree, so I'm well on my way!

Oh, and live preview actually isn't working well for me at all.  I don't know if it's my grammar or something else, but just typing "public class ClassName" and then typing a left brace into the live preview window locks up IDEA completely and I have to kill the process and restart it.  Any ideas what might be going on there?  When I run/debug the project that doesn't happen and I can watch the resulting Psi tree in the PsiViewer screen as I type, so I don't think my grammar is completely broken!

Thanks much!
Scott

0
Comment actions Permalink

I'm interested in fixing any problems with Live Preview so you can send me anything to reproduce the problem (email or github). It will be fixed.

0
Comment actions Permalink

Sorry, I just saw this.  Let me see if I can distill it down to something small that I can share with you because I'd love to have that level of immediate feedback when possible.

0
Comment actions Permalink

Hi,

I know this is a long shot, but is that Java BNF file available or posted anywhere? I am also working on a Java-like language and am hitting many of the same problems you did (ex: left recursion issues with expressions).

 

Thanks!

1
Comment actions Permalink

Trent, I looked back on my version history to see if I had a link to the grammar that I used as the basis for my custom language plugin's grammar back in early 2013, but I couldn't find anything specific. Reviewing this thread helped refresh my memory that the antlr Java grammars had already worked through these issues, though, so you might find these useful:

https://github.com/antlr/grammars-v4/tree/master/java

The one I used at the time was for Java 5 and these are for Java 7 and 8, but hopefully the core rules are similar enough for your purposes.

I also found this Java 5 grammar from Eclipse Che that might be useful since the language was much simpler in those days than now, especially if you just care about a smaller subset of Java's language features for your Java-like language:

https://github.com/eclipse/che-lib/blob/master/antlr-java5-grammar/src/main/resources/org/eclipse/che/plugin/jdb/server/expression/Java.g

I hope that helps!

2

Please sign in to leave a comment.