support for #include "file.ext" directive
I need to implement support for the #include “file.ext” directive in my custom language.
The idea behind this directive is the same as in other languages such as C or C++.
I need access to the included file at the lexer and parser level, but the path of the included file is relative to the current document. The Lexer interface takes a CharSequence as input and knows nothing about files, so I cannot resolve a relative file name to an absolute path.
Do you know any solution to this problem?
Unfortunately, I'm not aware of any sample code for your setup that we can share. You might try browsing existing language plugins to find a similar use case: https://plugins.jetbrains.com/search?products=idea&tags=Languages
For managing file include information at later stages, com.intellij.psi.impl.include.FileIncludeProvider could be used.
Good news, today I can share some instructions. HTH.
Hi Yann,
Many thanks for your very detailed explanation! It helped me to confirm that my approach is correct as it is almost the same now. I hope I will not get stuck somewhere dramatically.
Thanks also for linking this answer to my previous question. Both questions are related to preprocessing but this is more specific and assumes that the general solution is already worked out, so let me follow up on it here. I will ask some additional questions in the previous thread.
--------------------
My general problem raised in this thread is still not clear to me.
Let me quote relevant fragments from your answer:
After returning all the <#include> and <string> back, the lexer queries all symbols from header.h and especially all #DEFINES.
...
ParserDefinition.createLexer(Project) is sometimes called with "null". But "real" files are lexed/parsed via IFileElementType or IStubFileElementType API
What does “lexed via IFileElementType” mean? The Lexer interface accepts only a CharSequence, and there is no information about the containing file. Also, the Lexer instance is created using ParserDefinition.createLexer(Project), which gives us the context of the project, not the file.
In the presented example, header.h is (or could be) relative to source.c. How can the Lexer find header.h if it does not know about source.c?
This is great stuff Yann, thanks for such a detailed answer. I've filed it away in case I ever need to do this.
I have one question: as far as I am aware, it's a basic restriction that index information for a file should be based only on that file's contents, not on any other file's content. This is so that index invalidation works correctly when files are modified. This implementation clearly violates that restriction. How should that be handled? Can the index information for source.c be invalidated somehow when header.h is modified? Or is some other approach recommended?
Colin Fleming The suggestion would be to pass VirtualFile in the constructor.
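To illustrate the idea of handing the containing file to the lexer through its constructor, here is a minimal sketch in plain Java. It is not IntelliJ Platform code: `IncludeResolver` is a hypothetical helper, and `java.nio.file.Path` stands in for VirtualFile so that the example is self-contained. It only shows how a relative include target could be resolved once the including file is known.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not an IntelliJ Platform class. It stands in for a lexer
// (or an object owned by the lexer) that receives its containing file in the constructor.
final class IncludeResolver {
    private final Path containingFile;

    IncludeResolver(Path containingFile) {
        this.containingFile = containingFile;
    }

    // Resolves the text between the quotes of #include "..." against the
    // directory of the including file and removes "." and ".." segments.
    Path resolve(String includeTarget) {
        Path parent = containingFile.getParent();
        // A bare file name has no parent; fall back to the empty (current) path.
        Path base = (parent != null) ? parent : Paths.get("");
        return base.resolve(includeTarget).normalize();
    }
}
```

With source.c as the containing file, `resolve("header.h")` yields a path next to source.c, which is the information the Lexer alone (given only a CharSequence) does not have.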
Thanks for sharing this information. Replacing the macro by whitespace followed by its content as zero-length tokens is really smart. Are you using a different JFlex skeleton for this in CLion (JFlex has a nested skeleton with yypushStream/yypopStream for that purpose)? For the language plugin I'm writing, I've currently decided not to change the skeleton, but to wrap the “bare lexer” in one that expands macro tokens. This way I can still easily decide not to care about macros, e.g. for my syntax highlighter, but I'm not sure yet if that's the way to go.
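A rough sketch of that wrapping idea, in plain Java rather than against the IntelliJ Lexer interface. Everything here is hypothetical: `bareTokens` is a whitespace splitter standing in for the generated JFlex lexer, and the macro table is a simple name-to-body map. Expansion is single-level only, and the zero-length-token trick mentioned above is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a "bare" tokenizer plus a wrapper that expands macros.
final class MacroExpandingLexer {

    // Stand-in for the bare lexer: splits the input on whitespace only.
    static List<String> bareTokens(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toString().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Wrapper: replaces every token that names a macro with the tokens of its body.
    // Macro names inside a body are not expanded again (single-level expansion).
    static List<String> expandedTokens(CharSequence text, Map<String, String> macros) {
        List<String> out = new ArrayList<>();
        for (String token : bareTokens(text)) {
            String body = macros.get(token);
            if (body == null) {
                out.add(token);
            } else {
                out.addAll(bareTokens(body));
            }
        }
        return out;
    }
}
```

A syntax highlighter could keep calling `bareTokens` and ignore macros entirely, while the parser consumes `expandedTokens`.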
The problem Colin Fleming has is also going through my mind a bit and the lazy-cache you've mentioned is still very problematic. Of course the Lexer knows the VirtualFile or PsiFile object now after overriding doParseContents, but that doesn't give you the ability to get the outer scope, yet. I mean if you have the following constellation:
Then header.h has to know that X is defined (outer scope). The symbol cache depends on the include order and on the contents of other files outside its inclusion. Even worse, a single file (here header.h, for example) can be included twice in different places if there is no guard block, so it has two different “scopes”. The lazy cache cannot be so "lazy" then.
That leads to another question: is the lazy cache built during the indexing step, and is the PSI tree of all open editors rebuilt after indexing has finished? Or, in case we want our own separate indexing process, can we tell the IDE to forget about the previously built PSI trees (with and without the stubs in the index) after a background task has finished?
And Colin Fleming, maybe with a file-based index with PSI stubs where only the preprocessor statements are defined, you could trace the path of inclusion back to get the scope at the beginning of the file you are looking at. So when starting to lex header.h, you would get the different scopes and macros it sits inside by doing a ReferenceSearch on the PsiFile, to the
#include “header.h”
statements, and recursively gather the scope from there until there is no further inclusion. This way you can still stay strictly file-based, and it might still be quick, depending on the language and the depth of the includes of course (humans should also be able to understand their code, so a depth of 100000 is unlikely, isn't it? Endless recursion should be prevented, though).
I'd also like to point out a funny but interesting limitation of the method shown here: it still requires that each file be syntactically “complete” in itself, although such a constellation would be valid and compile:
Of course that's an absurd one and you'd probably never write this. However, the outer scope is known only to the extent that your lazy-built symbol cache can establish it.
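The backward trace through the #include statements suggested above could be sketched roughly as follows, in plain Java. The `includedBy` map is a hypothetical stand-in for whatever the index or reference search would return (file name to the files that include it). The sketch only collects the including files and guards against endless recursion with a visited set. It does not gather macros, nor does it model include order or one file being included in two different scopes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of walking the include graph backwards.
final class IncludeTrace {

    // Returns every file that directly or transitively includes `start`,
    // in discovery order. `includedBy` maps a file name to the files that
    // contain an #include of it (an assumed reverse index).
    static Set<String> includers(String start, Map<String, List<String>> includedBy) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String file = pending.pop();
            for (String parent : includedBy.getOrDefault(file, List.of())) {
                // add() returns false for files already seen, which stops include cycles.
                if (visited.add(parent)) {
                    pending.push(parent);
                }
            }
        }
        return visited;
    }
}
```

Because each file is visited at most once, the walk terminates even when two headers include each other.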
Update:
When editing the file the lexer starts in the middle of the text. The reset method gets called with the integer state, but how do you handle recursive states when traversing included files? You only have an integer.