How is text associated with IElementType and how can it be overridden?

Answered

Created November 12, 2018 21:32

Hello,

I'm developing a plugin for a language that uses macros. I expand the macros in a `LookAheadLexer` for my language and insert the resulting tokens into the token stream of the actual file. This expansion needs to be done at the lexer level because the tokens must be added before the language rules are parsed (i.e., this is a pre-processor and not part of the language).

I do this by calculating the expanded text, running it through another builder/lexer to get the actual tokens the macro generates, and then inserting those tokens directly after the macro.

This expansion works for the BNF checker, which sees the tokens and parses them correctly.
However, I'm having problems with subsequent tasks such as resolution and getting names for the inserted `IDENTIFIER` tokens.

## Pseudo code example:

```
`define get_type_name_func(T) \
 const static string type_name = `"T`";

class example {
 get_type_name_func(foo_)
}
```

The result is that the class `example` now has a static string named `type_name` with the value `"foo_"`.

Currently my `LookAheadLexer` inserts the tokens `kw_const`, `kw_static`, `kw_string`, `IDENTIFIER`, `EQ`, `STRING`, which is correct, and the BNF parses these tokens correctly.

The problem comes with resolve. The `example` class has a static string variable, but when the plugin tries to get the name for the inserted `IDENTIFIER`, it gets `null`.

## Here are my questions:

What associates `IDENTIFIER` (and other) `IElementType` tokens with text?
How can I modify this association?
I have all the information for what text each token should have in the `LookAheadLexer`. How can I send this information forward?

4 comments

Edaphic Studio

Created November 12, 2018 22:05

Update: here is a picture showing the issue in the editor. The main issue is that the inserted `IDENTIFIER` is dropped somewhere, resulting in a null pointer in the expected rule (which violates my assumptions!). I believe this is because the associated tokens do not have a text, textLength, textOffset, etc. I have all the needed info in the lexer, but I don't understand how to pass it forward.

Alexandr Evstigneev

Created November 13, 2018 09:43

A lexer can't do what you want.

It just groups characters into tokens of some type. You can't alter the text.

In your case, you've just added some empty tokens (tokens without text), and the parser was able to parse them.

A lexer provides only a range in the text (possibly empty) and a type, nothing more. You can't collect information and pass it to the parser through these empty tokens.
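To make the point above concrete, here is a minimal, self-contained sketch in plain Java. It does not use the IntelliJ API; `Token`, `TokenRangeDemo`, and `textIn` are hypothetical names invented for illustration. It assumes only what is described above: a token is a type plus a range into the buffer, and its text is whatever characters the buffer holds in that range.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a lexer token: a type plus a [start, end) range into the buffer. */
final class Token {
    final String type;
    final int start;
    final int end;

    Token(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }

    /** The token stores no text of its own; its text is the buffer content in its range. */
    String textIn(CharSequence buffer) {
        return buffer.subSequence(start, end).toString();
    }
}

final class TokenRangeDemo {
    /** A deliberately tiny lexer: splits the buffer on spaces into IDENTIFIER tokens. */
    static List<Token> lex(CharSequence buffer) {
        List<Token> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= buffer.length(); i++) {
            if (i == buffer.length() || buffer.charAt(i) == ' ') {
                if (i > start) {
                    tokens.add(new Token("IDENTIFIER", start, i));
                }
                start = i + 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String buffer = "get_type_name_func foo_";
        List<Token> tokens = lex(buffer);

        // Simulate a token "inserted" by macro expansion. The file contains no
        // characters for it, so the only range it can cover is an empty one.
        tokens.add(new Token("IDENTIFIER", buffer.length(), buffer.length()));

        for (Token t : tokens) {
            System.out.println(t.type + " [" + t.start + "," + t.end + ") '" + t.textIn(buffer) + "'");
        }
    }
}
```

In this sketch the inserted token has a valid type, so a grammar that only looks at types would accept it, but asking for its text yields an empty string because there is nothing in the buffer behind it.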

You could lex and parse the text as it is and build more sophisticated logic on top of your PSI tree: make your plugin understand that this is a macro with this value.
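One way to picture that suggestion is the sketch below, again in plain Java with no IntelliJ API. `MacroDefinition`, `MacroExpander`, and `expandCall` are hypothetical names, and the rule that `` `" `` turns into a plain quote is an assumption taken from the example in the question. The idea is that the macro definition and the macro call stay in the tree exactly as written, and the expansion is computed on demand and cached when something like resolution asks for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of a parsed macro definition: name, one parameter, raw body text. */
final class MacroDefinition {
    final String name;
    final String parameter;
    final String body;

    MacroDefinition(String name, String parameter, String body) {
        this.name = name;
        this.parameter = parameter;
        this.body = body;
    }

    /** Replaces whole-word occurrences of the parameter with the argument. */
    String expand(String argument) {
        String substituted = body.replaceAll(
                "\\b" + Pattern.quote(parameter) + "\\b",
                Matcher.quoteReplacement(argument));
        // Assumption from the question's example: `" becomes an ordinary quote.
        return substituted.replace("`\"", "\"");
    }
}

/** Expands macro calls lazily and remembers the results, so each call is computed once. */
final class MacroExpander {
    private final Map<String, MacroDefinition> definitions = new HashMap<>();
    private final Map<String, String> cache = new HashMap<>();
    int expansionsComputed = 0;

    void define(MacroDefinition definition) {
        definitions.put(definition.name, definition);
        cache.clear(); // a changed definition invalidates earlier expansions
    }

    /** Returns the expanded text for a call, or null if the macro is unknown. */
    String expandCall(String macroName, String argument) {
        MacroDefinition definition = definitions.get(macroName);
        if (definition == null) {
            return null;
        }
        String key = macroName + "(" + argument + ")";
        String expanded = cache.get(key);
        if (expanded == null) {
            expanded = definition.expand(argument);
            cache.put(key, expanded);
            expansionsComputed++;
        }
        return expanded;
    }

    public static void main(String[] args) {
        MacroExpander expander = new MacroExpander();
        expander.define(new MacroDefinition(
                "get_type_name_func", "T",
                "const static string type_name = `\"T`\";"));
        System.out.println(expander.expandCall("get_type_name_func", "foo_"));
    }
}
```

In a real plugin the equivalent of `expandCall` would presumably be invoked from the resolve or name-lookup code for the macro call element rather than from the lexer, and the cache would need to be invalidated whenever the file changes.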

Edaphic Studio

Created November 13, 2018 13:59

OK. Thank you for the feedback. It's frustrating, as I have all the information (which takes CPU time to calculate) during the lexer pass, but there is no convenient way of passing it upwards. I think I understand how to rework my lexer logic into resolution logic that does the same work.

Alexandr Evstigneev

Created November 14, 2018 06:21

Please note that the lexer is used not only for the PSI builder, but for indexing too. Lexer and parser performance affects many things, including editor responsiveness.

This means it's better to keep them as fast and simple as possible and to implement complex logic at higher levels.
