Enhance Spellchecker
Hi!
I would like to change the default behavior of the spellchecker.
The spellchecker is too dumb in my opinion: it always checks the whole word as is. It would be smarter if it checked whether an unknown word is made up of two or more known words.
Examples:
phpinfo => Typo (BUT php, info or phpInfo is OK)
emailserver => Typo (BUT email, server is OK)
emaildnsserver => Typo (BUT email, dns, server is OK)
How can I override this?
Where can I find the logic that splits words on camelCase boundaries?
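To make the requested behavior concrete, here is a minimal plain-Java sketch of such a check. It is not IntelliJ's implementation and uses no plugin API: the class CompoundWordCheck, the method isCompoundOfKnownWords and the knownWords set are hypothetical stand-ins for a real dictionary lookup.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the IntelliJ spellchecker.
class CompoundWordCheck {

    /** Returns true if the word can be cut into two or more parts that are all known words. */
    static boolean isCompoundOfKnownWords(String word, Set<String> knownWords) {
        String lower = word.toLowerCase(Locale.ROOT);
        int n = lower.length();
        // reachable[i] is true when lower[0..i) can be covered exactly by known words.
        boolean[] reachable = new boolean[n + 1];
        reachable[0] = true;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end && !reachable[end]; start++) {
                // Skip the whole word as a single part: we want at least two parts.
                boolean wholeWord = start == 0 && end == n;
                if (reachable[start] && !wholeWord
                        && knownWords.contains(lower.substring(start, end))) {
                    reachable[end] = true;
                }
            }
        }
        return n > 0 && reachable[n];
    }
}
```

With a dictionary containing php, info, email, dns and server, this accepts phpinfo, emailserver and emaildnsserver, and still rejects something like emailsrver.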
You can provide a custom com.intellij.spellchecker.tokenizer.SpellcheckingStrategy and/or a custom bundled dictionary via com.intellij.spellchecker.BundledDictionaryProvider.
I've created a class:
And registered it:
But I don't see any debug output. *scratches head*
I have also registered an "ApplicationStart" component, which is reached, so the plugin is working in general.
That's because there is already an existing strategy for HTML: com.intellij.spellchecker.xml.HtmlSpellcheckingStrategy.
Try adding order="first" to the EP declaration in your plugin.xml.
I ended up adding a new custom inspection (extends SpellCheckingInspection).
To solve my problem I needed some recursion and smart algorithms. Not the easiest task :-D
First I try to find as many valid words as I can and put them in a child-parent word tree (...). But a lot of words turn out to be valid that way. Example:
Word: "emailserver"
validWordsTree: {em=null, ai=em, ls=ai, er=erv, erv=emails, ail=em, server=email, ails=em, ema=null, il=ema, ilse=ema, rv=ilse, email=null, emails=null}
As you can see, many "words" with only two letters are valid. So I decided to use a minimum length of 3 characters in my plugin to reduce false positives:
validWordsTree: {ema=null, ilse=ema, email=null, server=email, emails=null, erv=emails}
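The effect of the minimum length can be reproduced with a small recursive sketch in plain Java. This is not the plugin's actual code (which builds a child-parent tree); the class MinLengthSplitter, the method splits and the knownWords set are hypothetical, and the set below simply reuses the "valid" fragments listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; enumerates every way to cut a word into known parts.
class MinLengthSplitter {

    /** Returns all splits of word into two or more known parts of at least minPartLength chars. */
    static List<List<String>> splits(String word, Set<String> knownWords, int minPartLength) {
        List<List<String>> results = new ArrayList<>();
        collect(word, 0, knownWords, Math.max(1, minPartLength), new ArrayList<>(), results);
        return results;
    }

    private static void collect(String word, int start, Set<String> knownWords, int minLen,
                                List<String> current, List<List<String>> results) {
        if (start == word.length()) {
            // The whole word is covered; keep only real compounds (two or more parts).
            if (current.size() >= 2) {
                results.add(new ArrayList<>(current));
            }
            return;
        }
        // Try every candidate part that starts here and is long enough.
        for (int end = start + minLen; end <= word.length(); end++) {
            String part = word.substring(start, end);
            if (knownWords.contains(part)) {
                current.add(part);
                collect(word, end, knownWords, minLen, current, results);
                current.remove(current.size() - 1); // backtrack
            }
        }
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("em", "ai", "ls", "er", "erv", "ail", "server",
                "ails", "ema", "il", "ilse", "rv", "email", "emails");
        System.out.println(splits("emailserver", known, 2)); // several splits, most of them noise
        System.out.println(splits("emailserver", known, 3)); // only email + server remains
    }
}
```

Enumerating every split is exponential in the worst case, so a real inspection would probably stop at the first split found or memoize positions that cannot be completed.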
This works well so far. But I found that some words are reported as invalid when they should be valid. Example "dnsserver":
isValidWord: dnsserver: false
isValidWord: dns: false (<= Should be true)
isValidWord: dnss: false
isValidWord: dnsse: false
isValidWord: dnsser: false
isValidWord: dnsserv: false
isValidWord: dnsserve: false
isValidWord: dnsserver: false
I check these with:
So the question is: why is "dns" or "php" not a valid word according to the check "!myManager.hasProblem(word);"?
These words are valid on their own.
TL;DR. The question was:
Why is "dns" or "php" not a valid word according to the check "!myManager.hasProblem(word);"?
These words are valid on their own.
So basically you now have the default spellchecking inspection working and yours in addition? Or do you suppress the default's "false positives" in your plugin?
And what is "myManager"? Please always share full code.
Yes, mine in addition. To eliminate the false positives, I add the words that my plugin found to be correct to the dictionary (if you have a better idea for overriding the original inspection instead, please tell me).
myManager is com.intellij.spellchecker.SpellCheckerManager. As I said, I extend everything from SpellCheckingInspection (that is where myManager in MyTokenConsumer comes from; I don't know why it has the "my" prefix when it comes from you :-) ).
Full code for now:
(most of it is copy-paste; see the end of the file for the relevant changes)
As for the question about the spellchecker and words: the spellchecker ignores words of length <= 3, so it never actually calls `hasProblem` for `dns`. The dictionary may also be missing a lot of valid words of length <= 3.
As for the whole problem: we are planning to migrate to Lucene's pure-Java Hunspell implementation, which would solve this problem out of the box :)
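Given that, one possible workaround on the plugin side is to validate short parts against a separate list instead of the regular check. The sketch below is plain Java and only an assumption about how this could be wired up: ShortWordAwareCheck, validPart, MIN_CHECKED_LENGTH and the allowlist are hypothetical names, and spellcheckerAccepts stands in for a call such as the `!myManager.hasProblem(word)` check from the question.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration; none of these names exist in the IntelliJ API.
class ShortWordAwareCheck {

    // Assumption: parts shorter than this are not reliably handled by the dictionary check.
    static final int MIN_CHECKED_LENGTH = 4;

    /**
     * Builds a validity test that looks up short parts in an explicit allowlist
     * and delegates longer parts to the regular spellchecker check.
     */
    static Predicate<String> validPart(Predicate<String> spellcheckerAccepts,
                                       Set<String> shortWordAllowlist) {
        return part -> part.length() < MIN_CHECKED_LENGTH
                ? shortWordAllowlist.contains(part)
                : spellcheckerAccepts.test(part);
    }
}
```

With an allowlist containing dns and php, a part such as dns would be accepted even though the regular check never confirms it, while longer parts such as server still go through the normal dictionary lookup.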
If you're planning something, then I can wait. Thanks!