Reusing an existing parser

I'm writing a plugin for a statically typed programming language.  The language parser is implemented in Java.  I read on

  http://www.jetbrains.com/idea/plugins/developing_custom_language_plugins.html

that "IDEA currently does not provide a ready way to reuse existing language grammars (for example, from ANTLR) for creating custom language parsers. The parsers need to be coded manually, as a recursive descent implementation."  I hope I'm misreading that, because I'd really like to reuse the existing recursive descent parser.   I'm considering writing an adapter between PsiBuilder and the internal interface used for a tokenizer/lexer in the parser, so that the PsiBuilder's input is consumed properly, and then doing the parse into the languages standard AST objects, then transforming them to IntelliJ-based AST nodes.

Is this a crazy approach?  I really don't want to maintain two versions of the parser for this language.

Thanks,
Carson

3 comments

Hello Carson,

Yes, this is doable, and some people have had success with this approach.
However, in our experience, implementing a recursive descent parser based
on PsiBuilder is a relatively straightforward effort, and if the syntax of
your language isn't changing, the maintenance of the IDEA parser won't be
a problem.
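
To give a feel for the shape of such a parser, here is a minimal sketch; the element type and the single grammar rule are invented for illustration rather than taken from any real language:

    import com.intellij.lang.ASTNode;
    import com.intellij.lang.PsiBuilder;
    import com.intellij.lang.PsiParser;
    import com.intellij.psi.tree.IElementType;

    public class MyLanguageParser implements PsiParser {
        public ASTNode parse(IElementType root, PsiBuilder builder) {
            PsiBuilder.Marker rootMarker = builder.mark();
            while (!builder.eof()) {
                parseStatement(builder);
            }
            rootMarker.done(root);
            return builder.getTreeBuilt();
        }

        // One hand-written method per grammar rule; STATEMENT is a made-up element type.
        private void parseStatement(PsiBuilder builder) {
            PsiBuilder.Marker marker = builder.mark();
            builder.advanceLexer();  // consume whatever tokens the rule actually requires
            marker.done(MyElementTypes.STATEMENT);
        }
    }

Each rule opens a marker, consumes tokens, and closes the marker with an element type; PsiBuilder assembles the AST from those markers.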

--
Dmitry Jemerov
Development Lead
JetBrains, Inc.
http://www.jetbrains.com/
"Develop with Pleasure!"



Let me further clarify:

The parser already has a notion of ASTs, called "ParseTrees", which map exactly to ASTNode (minus getElementType(), which would be trivial to implement), and of ParsedElements, which map fairly cleanly to PsiElements.  Looking at the code, and at PsiBuilder in particular, here is what I think I would have to do:

  1. Parse the code using the regular language parser (how can I get to the original text?  Do I need to shim it through to PsiParser#parse() somehow?)
  2. With the language's ParseTrees, use PsiBuilder to build up an IntelliJ-compatible ASTNode tree (ugh.  Or can I just visit all the ParseTrees and new up ASTNodes that are properly assembled?  That would be much easier.  See the sketch after this list.)
  3. The PsiElement for the ASTNode would be an adapter to the parser's internal ParsedElement, and would probably be created immediately when we create the ASTNode, since the information is already there from the original parse.
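
For step 2, the PsiBuilder route would be something along these lines.  isToken(), getChildren() and the node-kind mapping are guesses at what the ParseTree API exposes, so treat this purely as a sketch:

    import com.intellij.lang.PsiBuilder;
    import com.intellij.psi.tree.IElementType;

    // Sketch only: the ParseTree accessors and typeFor() are assumptions about
    // what the existing parser makes available.
    public class ParseTreeReplayer {

        public void replay(ParseTree tree, PsiBuilder builder) {
            if (tree.isToken()) {
                builder.advanceLexer();  // leaf tokens come straight from the lexer; just step past them
                return;
            }
            PsiBuilder.Marker marker = builder.mark();
            for (ParseTree child : tree.getChildren()) {
                replay(child, builder);
            }
            marker.done(typeFor(tree));  // the composite node gets the mapped IElementType
        }

        private IElementType typeFor(ParseTree tree) {
            // map the ParseTree's node kind onto the plugin's element types
            throw new UnsupportedOperationException("depends on the language's node kinds");
        }
    }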


Does this sound bonkers?  Again, the goal is to avoid maintaining two parsers for the language.

Cheers,
Carson


Dmitry,

Thanks for the feedback.  The reason I'd rather not implement another parser is that the existing one has quite a bit of functionality built into it: type inference, method scoring, and error reporting.  Am I correct in assuming that all of these features would have to be reimplemented in the IDEA parser?  If not, is there a better way to tie the two together?

Thanks again for the rapid feedback.

Cheers,
Carson
