Adding Extra Information During A Parse

Permanently deleted user

Created April 20, 2011 11:41

Hi, Jon,

Yes, the parser only marks up the code with tree elements, but you can create PsiElements over those tree elements, and from those PsiElements you have access to the whole text of the file. You can also find out a node's position, neighbors, siblings, children, and so on. You can find more information here: http://confluence.jetbrains.net/display/IDEADEV/Developing+Custom+Language+Plugins+for+IntelliJ+IDEA. Basically, PSI nodes are created in the createElement method of your ParserDefinition implementation. After checking the type of the node passed in as the parameter, you decide which class to instantiate. In the most common case, that class should extend ASTWrapperPsiElement. Inside your subclass you'll have access to the context.
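As a rough illustration of the dispatch described above, here is a minimal, self-contained sketch. It does not use the IntelliJ SDK: ElementType, AstNode, and WrapperElement are simplified stand-ins for IElementType, ASTNode, and ASTWrapperPsiElement, and the CALL and ASSIGNMENT types are hypothetical names invented for the example.

```java
// Simplified stand-ins for the IntelliJ types named above (ASTNode,
// IElementType, ASTWrapperPsiElement); the real classes have far more API.
final class ElementType {
    final String name;
    ElementType(String name) { this.name = name; }
}

final class AstNode {
    final ElementType type;
    final String text;
    final AstNode parent; // null for the root
    AstNode(ElementType type, String text, AstNode parent) {
        this.type = type;
        this.text = text;
        this.parent = parent;
    }
}

// Plays the role of ASTWrapperPsiElement: a PSI element wrapping one node.
class WrapperElement {
    private final AstNode node;
    WrapperElement(AstNode node) { this.node = node; }
    AstNode getNode() { return node; }
    // Context is reachable through the wrapped node, e.g. the parent's text.
    String parentText() { return node.parent == null ? null : node.parent.text; }
}

final class CallElement extends WrapperElement {
    CallElement(AstNode node) { super(node); }
}

final class AssignmentElement extends WrapperElement {
    AssignmentElement(AstNode node) { super(node); }
}

final class DemoParserDefinition {
    // Hypothetical element types, invented for this example.
    static final ElementType CALL = new ElementType("CALL");
    static final ElementType ASSIGNMENT = new ElementType("ASSIGNMENT");

    // Mirrors the shape of createElement: inspect the node type, pick a class.
    static WrapperElement createElement(AstNode node) {
        if (node.type == CALL) return new CallElement(node);
        if (node.type == ASSIGNMENT) return new AssignmentElement(node);
        return new WrapperElement(node); // generic fallback
    }
}
```

In a real plugin the same decision would live in your ParserDefinition's createElement, and the element classes would extend ASTWrapperPsiElement rather than the stand-in used here.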

If this note does not help, please give more details on your particular case, so that we can better advise you.

Thank you.
Valeria

Jon Akhtar

Created April 20, 2011 16:37

To put it another way: as AST building works now, the AST is not actually built until after the parse is complete, or it may already be built, in which case the parse just modifies it.

There is some semantic information available during parsing that I would like to preserve in the AST, but I don't see a direct connection between the point at which you mark the boundaries of an AST node using PsiBuilder and the building of the actual node, so there is no place to store the data.

I have been adding special nodes that serve only to mark some bit of context data. I was wondering whether there is another way to get data from the parse into the AST.

Dmitry Jemerov

Created April 20, 2011 16:41

Hello Jon,

Unfortunately not. The only information passed from the parser when building
the PSI is the element type.

--
Dmitry Jemerov
Development Lead
JetBrains, Inc.
http://www.jetbrains.com/
"Develop with Pleasure!"

Permanently deleted user

Created November 05, 2013 14:41

Hi guys,

I see this thread is 2.5 years old; has there been any change in the Open API since then that would change the answer?

I'm working on a parser for Idea for the Ceylon language (www.ceylon-lang.org). The Ceylon language already has a parser based on ANTLR and a lot of infrastructure based on it and on the AST and language model it produces (specifically a Type Checker, and a lot of stuff that already works nicely in an Eclipse IDE). That's a lot of code that we'd definitely like to reuse for the Intellij Ceylon support.

We already have a PsiParser prototype that seems to work sufficiently well. (The basic idea is that the Antlr parser reads from a TokenStream (lexer) that uses the PsiBuilder in the background, and the PsiParser calls Idea's mark/done methods where the Antlr parser starts and ends nodes.)

So we get a (sufficiently) correct Idea AST, and a corresponding PSI tree.

The main unsolved issue now is associating the created PSI nodes with Ceylon's native AST nodes. The native nodes are already created by the parser, but (as the answers above in this thread conclude) it is impossible to bind them to Idea's AST (and subsequently PSI) nodes at parse time.

The structure of Ceylon's native tree is by design different from Idea's AST: it doesn't correspond directly to the parsed source, since it doesn't necessarily store the order of a node's children, etc. In other words, it is not a parse tree but a more abstract syntax tree. Put differently, for a given native tree, several different sources (or Idea ASTs) are possible (even ignoring whitespace etc.), e.g. with some elements ordered differently.

The way we are currently trying to bind them (with limited success) is to create our own parse tree (https://github.com/ceylon/ceylon-ide-intellij/blob/master/src/org/intellij/plugins/ceylon/parser/MyTree.java) that has exactly the same structure as Idea's parse tree (AST) and at the same time holds references to the created native nodes. The tree could then be traversed in parallel with Idea's AST (or perhaps PSI), since they have the same structure, and the associations made. This is what the bindSubtree() method in MyTree.MyNode does. A MyTree instance is associated with CeylonFile (which extends PsiFileBase) using the UserDataHolder methods (is that even correct? -- couldn't find any docs) in the hope of having it available when necessary.

This works when a file is completely parsed, e.g. parsed for the first time, but of course fails when PsiBuilder.getTreeBuilt() throws a ReparsedSuccessfullyException, which happens basically every time a file is edited. The current solution is thus quite useless, since the native node associations are wrong after a file is edited (even though the PSI is correct). When a ReparsedSuccessfullyException is thrown, I couldn't find a way to get hold of the new AST to associate it with the native tree.
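To make the parallel walk concrete, here is a minimal sketch of the idea. It is not the code from MyTree.java: IdeNode, MirrorNode, and TreeBinder are hypothetical stand-ins, with IdeNode playing the role of an Idea AST or PSI node. The walk gives up and returns null as soon as the two shapes diverge, which is the situation after an incremental reparse.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a node of the IDE-side tree (AST or PSI).
final class IdeNode {
    final String type;
    final List<IdeNode> children = new ArrayList<>();
    IdeNode(String type) { this.type = type; }
    IdeNode add(IdeNode child) { children.add(child); return this; }
}

// Hypothetical mirror node: same shape as the IDE tree, plus a native reference.
final class MirrorNode {
    final String type;
    final Object nativeNode; // null where no native node corresponds
    final List<MirrorNode> children = new ArrayList<>();
    MirrorNode(String type, Object nativeNode) {
        this.type = type;
        this.nativeNode = nativeNode;
    }
    MirrorNode add(MirrorNode child) { children.add(child); return this; }
}

final class TreeBinder {
    // Returns IDE node -> native node, or null if the two shapes diverge.
    static Map<IdeNode, Object> bind(IdeNode ide, MirrorNode mirror) {
        Map<IdeNode, Object> result = new IdentityHashMap<>();
        return walk(ide, mirror, result) ? result : null;
    }

    private static boolean walk(IdeNode ide, MirrorNode mirror, Map<IdeNode, Object> out) {
        if (!ide.type.equals(mirror.type)) return false;
        if (ide.children.size() != mirror.children.size()) return false;
        if (mirror.nativeNode != null) out.put(ide, mirror.nativeNode);
        for (int i = 0; i < ide.children.size(); i++) {
            if (!walk(ide.children.get(i), mirror.children.get(i), out)) return false;
        }
        return true;
    }
}
```

Returning null instead of a partial map keeps stale associations from leaking out when only part of the tree still lines up.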

It's perhaps worth noting that Ceylon's native AST is not the AST that Antlr is capable of producing out of the box, but is custom built.

We first tried taking a completely different approach, creating a bnf grammar from the original Ceylon Antlr grammar, using Grammar-Kit, and working from there. This relatively quickly produced some useful results, but it seems quite impossible to then reuse any of the existing code that is based on the native AST, and that is a lot of code.

I've also noticed that Terence Parr, the author of Antlr, has been actively trying to integrate Antlr and Idea, which is great, but his work so far doesn't seem to address our problems.

So the original question remains: is there a way to associate some information that is available at parse time with the AST/PSI tree that is eventually created?

Or is it possible to get hold of the AST when a ReparsedSuccessfullyException is thrown in PsiBuilder.getTreeBuilt()?

Thanks for any helpful ideas!

Matija

Permanently deleted user

Created November 09, 2013 15:17

Hi, Dmitry and all,

I just wanted to let you guys know that a few months ago Red Hat made the decision to start work on IntelliJ support for Ceylon, and for the last month or so, Matija and Bastien have been hard at work trying to integrate the Ceylon typechecker with IntelliJ. They've overcome a number of challenges with this, but are now rather stuck on how to integrate with incremental builds.

It would be great if you guys could help with some direction here, or if not, let us know that there's simply no reasonable way to go about integrating a pre-existing typechecker with IntelliJ (I can't really believe that this could possibly be true).

Thanks!
Gavin King

Permanently deleted user

Created November 09, 2013 15:20

FTR, the (rather long) relevant thread on ceylon-dev is here:

https://groups.google.com/forum/#!topic/ceylon-dev/IUJ2cHyXyBQ

Jon Akhtar

Created November 09, 2013 16:06

As with all software, there are always ways. I was just asking Dmitry if there was a "correct" way. There wasn't really, so you have to engineer one.

One simple way is to maintain your own parse tree, then use an annotator later to grab the extra info you stored for each node. (There is a lot of bookkeeping to make sure your parse is the parse that produced the PSI tree being annotated.) You then put UserData entries on the PSI elements. These are maintained and survive PSI tree merges.

I have done it - it works. I am not sure that code is in any of my published sources though.
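For readers trying to picture the bookkeeping, here is one possible shape for it. This is not the poster's code; it is a self-contained sketch in which ParseSideTable, Element, and InfoAttacher are hypothetical names, the Element class stands in for a PSI element, and plain string keys stand in for the typed keys the real UserDataHolder uses. The assumption is that the parser records its extra info by text range and element type, and that a later pass copies it onto elements only if the text it is visiting is the text that parse consumed.

```java
import java.util.HashMap;
import java.util.Map;

// What the parser knew, keyed by text range and element type.
final class ParseSideTable {
    private final String parsedText; // the exact text this parse consumed
    private final Map<String, Object> infoByRange = new HashMap<>();

    ParseSideTable(String parsedText) { this.parsedText = parsedText; }

    void record(int start, int end, String type, Object info) {
        infoByRange.put(key(start, end, type), info);
    }

    // Bookkeeping check: was the tree being visited built from this parse's text?
    boolean matches(CharSequence currentText) {
        return parsedText.contentEquals(currentText);
    }

    Object lookup(int start, int end, String type) {
        return infoByRange.get(key(start, end, type));
    }

    private static String key(int start, int end, String type) {
        return start + ":" + end + ":" + type;
    }
}

// Stand-in for a PSI element with user data (string keys for simplicity).
final class Element {
    final int start;
    final int end;
    final String type;
    private final Map<String, Object> userData = new HashMap<>();

    Element(int start, int end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }

    void putUserData(String key, Object value) { userData.put(key, value); }
    Object getUserData(String key) { return userData.get(key); }
}

final class InfoAttacher {
    static final String PARSE_INFO = "parse.info"; // hypothetical key name

    // Would run from a later pass over the tree (an annotator in the post above).
    static boolean attach(ParseSideTable table, CharSequence currentText, Element element) {
        if (!table.matches(currentText)) return false; // stale parse: attach nothing
        Object info = table.lookup(element.start, element.end, element.type);
        if (info == null) return false;
        element.putUserData(PARSE_INFO, info);
        return true;
    }
}
```

Comparing the full text is the simplest staleness check; a real plugin would more likely compare a document modification stamp, but that detail is left out here.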

Much of this depends on how much effort you want to put into it.

There was another language plugin developed using the language's own external compiler: Gosu. I talked with the developer on these forums a while back. I don't know whether it has any application to what you are doing, but have a look at least and see if there is anything there that might help you.

I'll also be happy to help answer questions if I can. I don't know everything about this topic, but I have been working on my language plugin for 3 years, so I have learned a few things here and there.

Permanently deleted user

Created November 09, 2013 17:48

Jon, thanks for the advice! I'll dig into what you suggest, and I'll be happy to ask some questions later :) .

Matija

Dmitry Jemerov

Created November 10, 2013 12:28

The only JetBrains-developed plugin that reuses the typechecker from the compiler is the Kotlin plugin. In the case of Kotlin, the compiler uses IntelliJ IDEA's parsing infrastructure (PsiBuilder and the PSI tree), so reusing the information wasn't particularly difficult.

All other plugins use typecheckers developed by JetBrains from scratch based on top of our PSI implementation, without reusing any code from the original compiler for the language.

Because of that, we haven't really investigated what the best way to solve this would be, and don't have any ready-made guidance.

Permanently deleted user

Created November 22, 2016 11:29

Matija,

Have you solved the problem already? I faced the same issue when developing a custom language plugin for IDEA with an external ANTLR-based compiler.

I solved it with a trick; I don't know whether you have found anything better.

The trick is to use a special element type when creating markers with PsiBuilder. Instead of a normal element type instance, use an element type instance that holds the native parse tree and/or abstract syntax tree node. Then register a customized ASTFactory at the lang.ast.factory extension point that replaces the 'special' element type with a normal one when creating the Idea AST node, and attaches the native tree node reference it carried to the ASTNode with the UserDataHolder.putUserData() method. After that, the PSI tree is created as usual, and it now holds references to the native tree, node by node.

A small issue is that the number of custom element types is limited, but we can work around this by using the constructor with the parameter `register=false`.
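The mechanics of the trick can be modelled without the SDK. In the sketch below, ElementType, Node, and SwappingFactory are simplified stand-ins for IElementType, ASTNode, and a custom ASTFactory, and CarrierElementType is a hypothetical name for the 'special' type. It only shows the swap itself; whether the platform tolerates per-marker element type instances is a separate question, and a later reply in this thread reports that the platform source warns against it.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for IElementType.
class ElementType {
    final String name;
    ElementType(String name) { this.name = name; }
}

// The 'special' type: created per marker, carrying the native node along.
final class CarrierElementType extends ElementType {
    final ElementType realType;
    final Object nativeNode;
    CarrierElementType(ElementType realType, Object nativeNode) {
        super("CARRIER:" + realType.name);
        this.realType = realType;
        this.nativeNode = nativeNode;
    }
}

// Stand-in for an ASTNode with user data (string keys for simplicity).
final class Node {
    final ElementType type;
    private final Map<String, Object> userData = new HashMap<>();
    Node(ElementType type) { this.type = type; }
    void putUserData(String key, Object value) { userData.put(key, value); }
    Object getUserData(String key) { return userData.get(key); }
}

// Stand-in for the customized factory: unwraps carriers when building nodes.
final class SwappingFactory {
    static final String NATIVE_NODE = "native.node"; // hypothetical key name

    static Node createComposite(ElementType type) {
        if (type instanceof CarrierElementType) {
            CarrierElementType carrier = (CarrierElementType) type;
            Node node = new Node(carrier.realType);             // normal type from here on
            node.putUserData(NATIVE_NODE, carrier.nativeNode);  // keep the reference
            return node;
        }
        return new Node(type);
    }
}
```

Because the node ends up with the normal element type, everything downstream of the factory sees an ordinary tree; only the user data entry reveals that a native node was attached.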

James Crawford

Created April 13, 2024 00:08

So over 10 years later, I am asking the same question for the same reasons. Is there any way to attach user data when creating Markers that would then be available when building PSI elements from the resulting ASTNodes?

I also have my own tokeniser/parser and want to be able to access my own AST from the PsiElements.

The suggestion above about creating new custom IElementType instances each time is explicitly called out, in comments in the code, as a mechanism that won't work, so I can't use that approach.

From browsing the code in PsiBuilderImpl, it looks like it would be simple enough to provide a way to attach user data at the time of Marker.done() that could then be stored in the corresponding ASTNode when it is constructed. That way developers could extract it in ParserDefinition.createElement(ASTNode) and attach it to the relevant PsiElement as needed.

It is frustrating that there is no mechanism for doing this.