Help Handling a Language With No Command Delimiters

Answered

Created January 3, 2020 19:54

I am trying to write a plugin for an uncommon language. I would like to add some code completion, but mostly I want inlay hints, links to grammar notes, and syntax checking. The language is peculiar in that it has no command delimiters. It has a fixed set of commands, each of which takes zero or more arguments, and the number of arguments each command requires is known in advance. The language looks like this:

scrp 0 1 2 3 base 4 anim [012] over doif obv1 eq 10 setv obv2 var1 endi stim writ 13 drea endm

If this language were more like Java, the code would look like:

void Scrp0123() {
   base(4);
   anim("012");
   over();
   doif (obv1 eq 10) {
       setv(obv2, var1);
   }endif 
   stimWrit(13);
   drea();
} endm

So I know every command and how many arguments it takes, but I am not sure how to express this as a grammar. In a test version I tried hard-wiring every command together with its argument count, but there are over 200 commands in the first version of the language and over 650 in the latest. The real problem, though, is that the resulting parser is incredibly brittle: if one command is wrong, the rest of the file turns into nonsense tokens.

If I hardwire things in, I do not know how to make the code robust, and if I do not hardwire, I do not know how to break up the code into meaningful chunks.

I also tried to keep things generic, but that produces one long chain of nesting, with each command nested inside the previous one, and I do not know how to close the nesting when a command ends.

I was trying to alter the ParserUtil class, but I couldn't figure out how to tell if a new command was started in the code.

I also thought about having one parser to break the language into tokens, and then somehow manually construct the tree in another, but I cannot imagine that being efficient or easy.
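For reference, the two-pass idea can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class `ArityGrouper`, its methods, and the arity table are hypothetical, and the argument counts are read off the sample line above rather than taken from the language's real specification. The grouping pass visits each token once, so its cost is linear in the number of tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class ArityGrouper {
    // Hypothetical arity table (command -> argument count), guessed from the sample line.
    static final Map<String, Integer> ARITY =
            Map.of("base", 1, "anim", 1, "over", 0, "setv", 2, "drea", 0);

    // Pass 1: split on whitespace. Assumes literals such as [012] contain no spaces.
    static List<String> tokenize(String source) {
        String trimmed = source.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Pass 2: each command consumes exactly as many tokens as its arity says.
    // Every inner list holds the command name followed by its arguments.
    static List<List<String>> group(List<String> tokens) {
        List<List<String>> commands = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String name = tokens.get(i);
            Integer arity = ARITY.get(name);
            if (arity == null) {
                throw new IllegalArgumentException("unknown command: " + name);
            }
            int end = i + 1 + arity;
            if (end > tokens.size()) {
                throw new IllegalArgumentException("too few arguments for: " + name);
            }
            commands.add(new ArrayList<>(tokens.subList(i, end)));
            i = end;
        }
        return commands;
    }

    public static void main(String[] args) {
        System.out.println(group(tokenize("base 4 anim [012] over setv obv2 var1 drea")));
    }
}
```

Note that this version stops at the first unknown or incomplete command, so it shares the brittleness described above; it only shows that building the command list by hand from tokens is neither slow nor complicated.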

I have made language plugins before and have written several BNF grammars, but those languages were more straightforward: they had parentheses or brackets around command arguments, or only one argument per command. I do not know how to solve this problem.

If anyone has any ideas, I am open to them. I have been working on this for a few weeks, and just cannot wrap my head around how to do this.

1 comment

Yann Cebron

Created January 28, 2020 10:40

One suggestion:

If these commands are keywords, it is quite easy to parse each one as r ::= <<kw>> <<non_kw>>*, with the keywords defined elsewhere.

That is, "use an external rule and a programmatic parser".
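To illustrate why the keyword-then-non-keywords shape helps with robustness, here is a minimal plain-Java sketch of the idea, independent of any parser generator. It assumes that arguments are never themselves keywords, which may not hold for every command in the real language. The class `KeywordChunker`, its methods, and the keyword table are hypothetical names invented for this example, with argument counts guessed from the sample line in the question. Each keyword starts a new chunk, and each chunk is checked on its own, so a malformed command produces one local error instead of shifting every token after it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class KeywordChunker {
    // Hypothetical keyword table (keyword -> expected argument count).
    static final Map<String, Integer> ARITY =
            Map.of("base", 1, "anim", 1, "over", 0, "setv", 2, "drea", 0);

    // Split the source into chunks shaped like `kw non_kw*`.
    // A keyword always opens a new chunk; stray leading tokens get a chunk of their own.
    static List<List<String>> chunk(String source) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = null;
        for (String token : source.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields a single empty token
            }
            if (ARITY.containsKey(token) || current == null) {
                current = new ArrayList<>();
                chunks.add(current);
            }
            current.add(token);
        }
        return chunks;
    }

    // Validate each chunk independently so that one bad command stays local.
    static List<String> check(List<List<String>> chunks) {
        List<String> problems = new ArrayList<>();
        for (List<String> c : chunks) {
            String head = c.get(0);
            Integer expected = ARITY.get(head);
            int actual = c.size() - 1;
            if (expected == null) {
                problems.add("unknown command '" + head + "'");
            } else if (expected != actual) {
                problems.add("'" + head + "' expects " + expected
                        + " argument(s), got " + actual);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // `anim` is missing its argument, yet `over` and `setv` are still recognized.
        List<List<String>> chunks = chunk("base 4 anim over setv obv2 var1");
        System.out.println(chunks);
        System.out.println(check(chunks));
    }
}
```

In a real plugin the same boundary test (is the next token a keyword?) would live in the external rule or recovery predicate of the generated parser rather than in a standalone class like this one.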
