Implementing C #define directive

Answered

Created March 24, 2023 14:28

I'm writing a plugin for a language whose #define directive works exactly like C's, and I'm finding it pretty difficult to support. Basically, I want to replace every identifier defined by a #define with its value, and only then do the rest of the processing (lexing, parsing, highlighting, etc.).

I tried updating the content of the file in ParserDefinition#createFile(), but it's not very happy with that. I also tried manipulating the lexer position so that it always "goes back" and reads from the directive definition, but that didn't help either.

I'm not sure that's the best approach. If you have a better idea of how to implement C directives (specifically #define), that would be awesome.

Repo: https://github.com/walt-grace/glsl-plugin-idea

9 comments

Yann Cebron

Created March 27, 2023 16:22

Related question with some pointers

https://intellij-support.jetbrains.com/hc/en-us/community/posts/360007641819-Support-for-preprocessing-in-custom-language

more TBD

Yann Cebron

Created March 28, 2023 14:53

Rules:

Every character must be covered by a token. Tokens must be in the same order as in the text. The total length of the tokens must equal the size of the file.

Whenever the preprocessor expands a macro:
First, the preprocessor produces a token (e.g. MacroCall) that covers the original macro text. The parser will ignore this token.
After this MacroCall token, we produce the tokens from the macro expansion, and they have zero width so as not to break offsets in the PSI.
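
To make the rules concrete, here is a minimal, self-contained sketch of a checker for them. The Token class, the token type names, and the violations() helper are all hypothetical and are not IntelliJ Platform API; the sketch only assumes that each token records its start and end offsets in the original text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical token model for illustration; not IntelliJ Platform API.
class TokenRulesSketch {
    static class Token {
        final String type; // e.g. "IF", "MACRO_CALL", "FROM_EXPANSION"
        final int start;   // inclusive offset in the original file text
        final int end;     // exclusive offset; start == end means zero width

        Token(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns rule violations; an empty list means the stream is consistent. */
    static List<String> violations(List<Token> tokens, int fileLength) {
        List<String> problems = new ArrayList<>();
        int expectedStart = 0;
        for (Token t : tokens) {
            if (t.end < t.start) {
                problems.add(t.type + " has negative length");
            }
            // A gap (start > expected) leaves characters uncovered;
            // an overlap (start < expected) means tokens are out of order.
            if (t.start != expectedStart) {
                problems.add(t.type + " starts at " + t.start
                        + ", expected " + expectedStart);
            }
            expectedStart = Math.max(expectedStart, t.end);
        }
        if (expectedStart != fileLength) {
            problems.add("tokens end at " + expectedStart
                    + ", file length is " + fileLength);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Source text "if (M)" with "#define M true" assumed to be in effect.
        List<Token> tokens = Arrays.asList(
                new Token("IF", 0, 2),
                new Token("WHITE_SPACE", 2, 3),
                new Token("LPAREN", 3, 4),
                new Token("MACRO_CALL", 4, 5),     // covers the original "M"
                new Token("FROM_EXPANSION", 5, 5), // "true", zero width
                new Token("RPAREN", 5, 6));
        System.out.println(violations(tokens, 6));
    }
}
```

The zero-width FROM_EXPANSION token sits exactly at the end offset of the MACRO_CALL token, so the offsets of everything after it are unchanged.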

Unfortunately, I’m not aware of any OSS plugins to use as reference. Please feel free to follow up with questions.

Walt Grace

Created March 28, 2023 15:38

Hi Yann

Thanks for replying. I have a few questions, and it would be awesome if you could help me out. In general, I think I get the theory but not the practice: I don't get how I'm supposed to expand the text without changing its size.

So here are a few things I don't get.

1. So the preprocessor (the lexer?) produces a token for all the text that comes after, let's say, #define. But what do you mean by produce? Returns?

2. And then what does it mean the parser will ignore it? How?

3. And I think this one is the hardest: "we produce tokens from macro expansion and they have zero-width". I don't understand that. By producing, do you mean returning? The zero-width part is not clear either.

Yann Cebron

Created March 28, 2023 16:29

(1) The output of the lexer can be thought of as a sequence of tokens which the lexer returns one by one. The output of the preprocessor can also be thought of as a sequence of tokens. “Produce” means the tokens appear in this sequence. Internally, it is done by changing the state of the preprocessor in such a way that subsequent calls to token type/advance return these tokens.

(2) The input of the parser is a sequence of tokens, and the parser consumes them one by one. For example, it sees the tokens `if`, then `(`, then `true`, then `)`, and so on. MacroCall tokens, if present, can appear anywhere in this sequence. For example:

if (true) // tokens: “if”, “(”, “true”, “)”

#define M (

if M true) // tokens: “if”, MacroCall(“M”), FromExpansion(“(”), “true”, “)”

#define M true

if (M) // tokens: “if”, “(”, MacroCall(“M”), FromExpansion(“true”), “)”

The parser sees the sequence of tokens, checks for syntax errors, and builds the PSI. For the purpose of syntax errors, FromExpansion tokens should be treated as normal tokens and MacroCall tokens should be ignored. For the purpose of PSI creation, however, MacroCall tokens should still be added to the PSI, while FromExpansion tokens should be created as empty tokens.
“Ignoring” means they should be skipped in syntax checking, but still added to the PSI.

Another way of saying it: the parser checks whether the current token is `if`, `while`, or `switch`, and `MacroCall` tokens should be skipped for this check.

(3) “Zero-width” means the token's length is zero and it has no text.
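
As a rough illustration of (2) and (3), the following self-contained sketch parses the `if M true)` example. Everything in it (the Token class, the type names, the flat `tree` list standing in for PSI children) is hypothetical and not IntelliJ Platform or Grammar-Kit API. It only shows the idea: MACRO_CALL tokens are skipped when matching syntax but still recorded, and FROM_EXPANSION tokens are matched like normal tokens while contributing zero length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser sketch; not IntelliJ Platform or Grammar-Kit API.
class MacroAwareParserSketch {
    static class Token {
        final String type;     // "PLAIN", "WHITE_SPACE", "MACRO_CALL", "FROM_EXPANSION"
        final String spelling; // what the parser matches against
        final int length;      // characters occupied in the file; 0 for FROM_EXPANSION

        Token(String type, String spelling, int length) {
            this.type = type;
            this.spelling = spelling;
            this.length = length;
        }
    }

    private final List<Token> tokens;
    private int pos = 0;
    final List<Token> tree = new ArrayList<>(); // flat stand-in for PSI children

    MacroAwareParserSketch(List<Token> tokens) {
        this.tokens = tokens;
    }

    /** Moves past tokens that syntax checking ignores, still adding them to the tree. */
    private void skipIgnored() {
        while (pos < tokens.size()) {
            String type = tokens.get(pos).type;
            if (!type.equals("MACRO_CALL") && !type.equals("WHITE_SPACE")) {
                break;
            }
            tree.add(tokens.get(pos++));
        }
    }

    /** Consumes the next significant token if it has the given spelling. */
    private boolean expect(String spelling) {
        skipIgnored();
        if (pos < tokens.size() && tokens.get(pos).spelling.equals(spelling)) {
            tree.add(tokens.get(pos++));
            return true;
        }
        return false;
    }

    /** Accepts the significant token sequence: if ( true ) */
    boolean parseIfHeader() {
        return expect("if") && expect("(") && expect("true") && expect(")");
    }

    public static void main(String[] args) {
        // Source text "if M true)" with "#define M (" assumed to be in effect.
        MacroAwareParserSketch parser = new MacroAwareParserSketch(Arrays.asList(
                new Token("PLAIN", "if", 2),
                new Token("WHITE_SPACE", " ", 1),
                new Token("MACRO_CALL", "M", 1),
                new Token("FROM_EXPANSION", "(", 0),
                new Token("WHITE_SPACE", " ", 1),
                new Token("PLAIN", "true", 4),
                new Token("PLAIN", ")", 1)));
        System.out.println(parser.parseIfHeader());
    }
}
```

The lengths of the tokens recorded in `tree` add up to the length of the original text, because the expansion token contributes nothing.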

Walt Grace

Created March 28, 2023 17:59

OK, I think I'm slowly getting it. Could you now give me some direction on where to do each step? I know it should involve PsiBuilder and GeneratedParserUtilBase, but something a bit more specific would be great.

Walt Grace

Created March 28, 2023 18:19

And also this line: "Internally it is done by changing the state of preprocessor in such a way that subsequent calls to token type/advance would return these tokens."

I have no clue how to do it. Or where. I guess in the lexer?

Yann Cebron

Created March 30, 2023 09:25

You'll need to employ custom parsing code https://github.com/JetBrains/Grammar-Kit#generated-parser-structure

The preprocessor can be implemented at either the lexer or the parser level.
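
For the lexer-level variant, "changing the state of the preprocessor so that subsequent calls to token type/advance return these tokens" could look roughly like the sketch below. It is a standalone illustration, not an implementation of the platform's Lexer interface: the class, the Token type, the type names, and the macro table are all assumptions, and the macro table is assumed to have been filled in when the #define lines were lexed.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper around an already-lexed token list; not platform API.
class PreprocessingLexerSketch {
    static class Token {
        final String type;
        final String text; // for FROM_EXPANSION: the expansion's spelling, not file text
        final int start;
        final int end;

        Token(String type, String text, int start, int end) {
            this.type = type;
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    private final List<Token> base;
    private final Map<String, List<String>> macros; // macro name -> expansion pieces
    private final Deque<Token> pending = new ArrayDeque<>();
    private int index = 0;
    private Token current;

    PreprocessingLexerSketch(List<Token> base, Map<String, List<String>> macros) {
        this.base = base;
        this.macros = macros;
        this.current = fetch();
    }

    /** Type of the current token, or null at the end of input. */
    String getTokenType() { return current == null ? null : current.type; }
    String getTokenText() { return current == null ? null : current.text; }
    int getTokenStart() { return current.start; }
    int getTokenEnd() { return current.end; }

    void advance() { current = fetch(); }

    private Token fetch() {
        // Queued expansion tokens are returned before any further base token.
        if (!pending.isEmpty()) {
            return pending.poll();
        }
        if (index >= base.size()) {
            return null;
        }
        Token t = base.get(index++);
        List<String> expansion =
                t.type.equals("IDENTIFIER") ? macros.get(t.text) : null;
        if (expansion == null) {
            return t;
        }
        // State change: queue zero-width tokens at the end of the macro call.
        for (String piece : expansion) {
            pending.add(new Token("FROM_EXPANSION", piece, t.end, t.end));
        }
        return new Token("MACRO_CALL", t.text, t.start, t.end);
    }

    public static void main(String[] args) {
        // Source text "if (M)" with "#define M true" assumed to be in effect.
        List<Token> base = Arrays.asList(
                new Token("IF", "if", 0, 2),
                new Token("WHITE_SPACE", " ", 2, 3),
                new Token("LPAREN", "(", 3, 4),
                new Token("IDENTIFIER", "M", 4, 5),
                new Token("RPAREN", ")", 5, 6));
        PreprocessingLexerSketch lexer = new PreprocessingLexerSketch(
                base, Collections.singletonMap("M", Arrays.asList("true")));
        while (lexer.getTokenType() != null) {
            System.out.println(lexer.getTokenType() + " [" + lexer.getTokenStart()
                    + ", " + lexer.getTokenEnd() + ")");
            lexer.advance();
        }
    }
}
```

A real lexer would also have to handle nested macros, function-like macros, and restarting from an arbitrary offset, none of which this sketch attempts.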

Walt Grace

Created March 31, 2023 00:03

Hi Yann

I made pretty good progress and it's kind of working now, following your general rules. It was very hard to trick ValidatingLexerWrapper into continuing to read the tokens, but in the end it worked. Now I have two questions:

1. Do you think there is work that must be done on PsiBuilder (assuming I care only about lexing and parsing for now, and less about the PSI elements)? I think I kind of made it work without touching anything in PsiBuilder, which made me think I may have missed something.

2. More importantly, the formatter complains about non-empty text that is not covered, which is obviously very severe because it deletes the text from the editor. Maybe this is just a result of my first question. How do I approach this problem?

Yann Cebron

Created April 20, 2023 16:41

Sorry for the delay.
About (2), this means that some text in the editor, from which the tokens were obtained, did not fall within a builder.mark() … mark.done(...) interval.
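
One way to hunt for such text, sketched here purely as an illustration, is to collect the offset ranges that the completed markers cover and list the non-blank pieces of the file outside all of them. The helper below is hypothetical and does not use PsiBuilder; it assumes the [start, end) ranges have already been gathered, for example in a test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical diagnostic helper; not IntelliJ Platform API.
class CoverageGapSketch {
    /** Returns the non-blank pieces of text that lie outside every [start, end) range. */
    static List<String> uncovered(String text, List<int[]> ranges) {
        List<int[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt(r -> r[0]));
        List<String> gaps = new ArrayList<>();
        int covered = 0; // everything before this offset has been accounted for
        for (int[] r : sorted) {
            if (r[0] > covered) {
                addIfNotBlank(gaps, text.substring(covered, r[0]));
            }
            covered = Math.max(covered, r[1]);
        }
        if (covered < text.length()) {
            addIfNotBlank(gaps, text.substring(covered));
        }
        return gaps;
    }

    private static void addIfNotBlank(List<String> gaps, String piece) {
        if (!piece.trim().isEmpty()) {
            gaps.add(piece);
        }
    }

    public static void main(String[] args) {
        String text = "#define M true\nif (M)";
        // Assumed scenario: only the "if (M)" statement ended up inside a marker.
        List<int[]> ranges = Arrays.asList(new int[] {15, 21});
        System.out.println(uncovered(text, ranges));
    }
}
```

In this assumed scenario the #define line is reported as uncovered, which is the kind of text the formatter would complain about.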
