Grammar-Kit and rules

Created February 05, 2022 12:54

Hi, I'm developing a language plugin and ran into a problem. In this language there are many ways of referring to variables: local ones as #identifier, global ones as "identifier", and CPU memory addresses such as:

STATUSBYTE := DB101.DB10; 
STATUS_3 := DB30.D1.1; 
Measval := DB25.DW20; 
STATUSBYTE := Statusdata.DB10; 
Measval := I1;

Below is part of my grammar as an example. The problem is that, because of the token identifier = "regexp:([a-zA-Z]+|_([a-zA-Z]+|[0-9]+))(_?([a-zA-Z]+|[0-9]+))*", the tokens from memory_prefix and size_prefix are never detected, so the expression DB1.DX11.1 is not recognized.

Which is the more appropriate way to describe these different addressing forms: regular expressions (tokens) or grammar rules?

absolute_address ::= indexed_memory_access
                    | absolute_db_access
                    | access_local_instance

indexed_memory_access ::= address_identifier indexed_access
absolute_db_access ::= address_identifier address //DB1.DX11.1 <- here
structured_db_access ::= db_identifier '.' identifier
address ::= number [ '.' number]
db_identifier ::= 'DB' number

address_identifier ::= db_identifier '.' 'D' size_prefix

memory_prefix ::= 'I'  // input
                | 'Q'  // output
                | 'M'  // bit memory
                | 'PI' // peripheral input
                | 'PQ' // peripheral output

size_prefix ::= 'X' // bit
              | 'B' // byte
              | 'W' // word
              | 'D' // double word
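
The tokenization problem described in the question can be reproduced outside the IDE. The following is a minimal sketch in plain Java, assuming a longest-match lexer such as the JFlex lexer that Grammar-Kit generates. The class and method names are hypothetical and only serve the illustration. It applies the identifier pattern from the question to the start of an input and shows how much text that pattern takes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Grammar-Kit or the IntelliJ Platform.
final class IdentifierLexingDemo {
    // The identifier pattern from the question, without the "regexp:" prefix.
    static final Pattern IDENTIFIER = Pattern.compile(
            "([a-zA-Z]+|_([a-zA-Z]+|[0-9]+))(_?([a-zA-Z]+|[0-9]+))*");

    /** Returns the prefix of the input matched by the identifier pattern, or "" if none. */
    static String identifierPrefix(String input) {
        Matcher m = IDENTIFIER.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        // The identifier pattern takes letters and digits together, up to the dot.
        System.out.println(identifierPrefix("DB1.DX11.1")); // prints "DB1"
        System.out.println(identifierPrefix("DX11.1"));     // prints "DX11"
    }
}
```

Because the identifier match ("DX11", four characters) is longer than the keyword tokens 'D' and 'X' (one character each), a longest-match lexer emits a single identifier token, and the rules that expect 'D' followed by size_prefix never see their tokens.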

1 comment

Karol Lewandowski

Created February 07, 2022 13:28

Hi Nikita,

It's hard to fully understand your language from just a part of the grammar, but choosing the correct approach depends on the language semantics and the planned support. Grammar-Kit's responsibility is to generate the PSI of your language, and the grammar should be designed in a way that makes it possible and comfortable to navigate the PSI tree and extract information in your inspections, references, completions, and other code.

I would say that the rule is to have grammar rules only when you need to access the corresponding PSI elements in your plugin features. That doesn't mean that if you need to know the memory prefix, the only way is to have a grammar rule and an element for it. You can always add a method to the higher-level element that returns the prefix value.
Also, the shallower and simpler the PSI tree is, the better, so it is advisable to limit the number of rules where that is possible and makes sense.

Please consider the following: do you need all of these rules to be separate PSI elements? E.g., does memory_prefix make sense in the tree, and will there be a use case where it is accessed as a separate element, or will it always be part of a higher-level element? If the latter, it's better to make it part of a token (the "regular expression" approach, as you called it).
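
To illustrate the token approach combined with a helper method on the higher-level element, here is a minimal sketch in plain Java. It assumes the whole absolute address is lexed as one token and that the element exposes the prefix by parsing the token text. The class name, method name, and pattern are hypothetical. The pattern only mirrors the rules posted in the question ('DB' number '.' 'D' size_prefix number, with an optional '.' number) and does not cover every addressing form of the language.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Grammar-Kit or the IntelliJ Platform API.
final class DbAddressText {
    // Assumed token shape, derived from the posted rules:
    // 'DB' number '.' 'D' size_prefix number [ '.' number ]
    private static final Pattern DB_ADDRESS =
            Pattern.compile("DB(\\d+)\\.D([XBWD])(\\d+)(?:\\.(\\d+))?");

    /** Returns the size prefix (X, B, W or D) if the text is a complete absolute DB address. */
    static Optional<String> sizePrefix(String tokenText) {
        Matcher m = DB_ADDRESS.matcher(tokenText);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(sizePrefix("DB1.DX11.1"));      // prints "Optional[X]"
        System.out.println(sizePrefix("Statusdata.DB10")); // prints "Optional.empty"
    }
}
```

In a plugin, a method on the PSI element could delegate to such a helper with the element's text (for example, the result of getText()), so the prefix stays accessible without a separate grammar rule and PSI node for it.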
