How to highlight escaped symbols in a language injected into an XML attribute value

I am contributing to the IntelliJ plugin for the AEM platform (current branch: https://github.com/aemtools/aemtools/tree/209_jcr_language_highlighting_issue).

The goal is to highlight valid and invalid escaped symbols such as &quot;, &amp;, \", etc.

In my lexer I expect the following tokens:

VALUE_TOKEN=.
ARRAY_VALUE_TOKEN=.
XML_ENTITY_ESCAPE=&(quot|amp|apos|lt|gt);
XML_DECIMAL_ESCAPE=&#[0-9]+;
XML_HEX_ESCAPE=&#x[0-9a-fA-F]+;
XML_ATTRIBUTE_VALUE_SPECIAL_CHARACTERS_ESCAPE=\\[,\[\]\{\"]
VALID_ESCAPE=(\\t)|(\\b)|(\\n)|(\\r)|(\\f)|(\\\\)
UNICODE_ESCAPE=(\\u[\da-f]{4})
INVALID_UNICODE_ESCAPE=(\\u[0-9a-f]{3}?)
INVALID_CHARACTER_ESCAPE=(\\\w)

However, the escapes are recognized as plain VALUE_TOKEN or ARRAY_VALUE_TOKEN. For example, given the input:

["Test &amp; Test",'Test2 &amp; Test2']

The parser and lexer tests pass, but when I test on real project files, I see that the lexer receives the input with the entities already decoded:

["Test & Test",'Test2 & Test2']

If I change a token pattern to match the literal characters instead (e.g. XML_ENTITY_ESCAPE=(\"|&|'|<|>)), then highlighting works as expected.
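
To make the mismatch concrete, here is a minimal standalone Java sketch (not code from the plugin). It assumes the attribute value in the file contains &amp; and that the lexer is handed the decoded form; the class and method names are made up for illustration. The regex is the XML_ENTITY_ESCAPE rule from above, run through java.util.regex rather than the real lexer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityPatternDemo {
    // Same expression as the XML_ENTITY_ESCAPE rule in the question.
    static final Pattern XML_ENTITY_ESCAPE = Pattern.compile("&(quot|amp|apos|lt|gt);");

    // Counts how many named entities the rule would find in the given text.
    static int countEntities(String text) {
        Matcher matcher = XML_ENTITY_ESCAPE.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed raw attribute value as written in the XML file.
        String raw = "[\"Test &amp; Test\",'Test2 &amp; Test2']";
        // Assumed text after the host's escaper has decoded the entities.
        String decoded = "[\"Test & Test\",'Test2 & Test2']";

        System.out.println(countEntities(raw));     // prints 2
        System.out.println(countEntities(decoded)); // prints 0
    }
}
```

With the decoded text there is nothing left for the entity rule to match, which is consistent with everything falling through to VALUE_TOKEN or ARRAY_VALUE_TOKEN.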

I assume this is due to XmlAttributeLiteralEscaper, but I don't know what I should configure so that the raw XML attribute value text, before the escaper processes it, is passed to the parser.
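
For context on what an escaper does: in the IntelliJ Platform, a LiteralTextEscaper decodes the host text before the injected language sees it and maps offsets in the decoded text back to the host. The sketch below is a simplified standalone model of that idea in plain Java, not the platform's implementation. EscaperModel and its methods are hypothetical names, and only the five named entities are handled.

```java
import java.util.Arrays;

final class EscaperModel {
    // Named entities and the characters they stand for.
    private static final String[][] ENTITIES = {
        {"&quot;", "\""}, {"&amp;", "&"}, {"&apos;", "'"}, {"&lt;", "<"}, {"&gt;", ">"}
    };

    private final StringBuilder decoded = new StringBuilder();
    // hostOffsets[i] is the host offset where decoded character i starts;
    // the last entry is the end of the host text.
    private final int[] hostOffsets;

    EscaperModel(String host) {
        int[] offsets = new int[host.length() + 1];
        int i = 0;
        while (i < host.length()) {
            offsets[decoded.length()] = i;
            String[] entity = entityAt(host, i);
            if (entity != null) {
                decoded.append(entity[1]);   // one decoded char for the whole entity
                i += entity[0].length();
            } else {
                decoded.append(host.charAt(i));
                i++;
            }
        }
        offsets[decoded.length()] = host.length();
        hostOffsets = Arrays.copyOf(offsets, decoded.length() + 1);
    }

    // Returns the {entity, replacement} pair starting at index, or null if none.
    private static String[] entityAt(String host, int index) {
        for (String[] entity : ENTITIES) {
            if (host.startsWith(entity[0], index)) {
                return entity;
            }
        }
        return null;
    }

    // Text that the injected language's lexer would see in this model.
    String decodedText() {
        return decoded.toString();
    }

    // Maps an offset in the decoded text back to an offset in the host text.
    int offsetInHost(int offsetInDecoded) {
        return hostOffsets[offsetInDecoded];
    }

    public static void main(String[] args) {
        EscaperModel model = new EscaperModel("Test &amp; Test");
        System.out.println(model.decodedText());   // prints Test & Test
        System.out.println(model.offsetInHost(6)); // prints 10
    }
}
```

In this model the lexer only ever sees "Test & Test", so a rule written against &amp; cannot fire. An escaper that passed the host text through unchanged would hand the lexer the raw form instead.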

Has anyone faced such an issue? Could you please help me solve it?

4 comments

Hi Karol Lewandowski, I am sorry for tagging you directly. You helped me last time with syntax highlighting for my custom injected language (https://intellij-support.jetbrains.com/hc/en-us/community/posts/5750380999570-Syntax-highlighting-for-injected-language-into-XML-attribute-value-is-not-working). Have you ever encountered such a problem?

Hi! Looks like we don't have a direct way to control the escaper for XmlAttributeValueImpl.

I think there are two options here:

  • Implement a new language and override the escaper for the attribute values.
  • Get rid of the injection and implement it as an HTML extension (as is done for HTML+JS).

Hi Andrey Starovoyt, thank you for your response. I have some questions regarding your suggested options: 

  1. Regarding option 1: Do you mean to create some template language to match only XML attribute values and other pieces as OUTER_LANGUAGE?
  2. Regarding option 2: AFAIK, HTML+JS (script tag) is implemented using injection as well (com.intellij.psi.impl.source.html.HtmlScriptLanguageInjector).

Hi! 

1. No, I meant creating an extension of the HTML language. But I think the template language idea is also valid.

2. No, see com.intellij.html.embedding.HtmlEmbedment
