Enhance Spellchecker


Hi!

I would like to change the default behavior of the spellchecker.

The spellchecker is too simplistic in my opinion: it always checks the whole word as it is. It would be smarter if it also checked whether one (unknown) word is made up of two or more (known) words.

Examples:

phpinfo => Typo (BUT php, info or phpInfo is OK)

emailserver => Typo (BUT email, server is OK)

emaildnsserver => Typo (BUT email, dns, server is OK)
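To make the requested behavior concrete, here is a minimal sketch of such a compound check in plain Java. It is not IntelliJ code: the class name CompoundWordCheck, the method isCompound and the hand-made dictionary set are all assumptions for illustration. It uses a standard "word break" dynamic-programming pass over the word.

```java
import java.util.Set;

final class CompoundWordCheck {

    /**
     * Returns true if the word can be written as a concatenation of dictionary words.
     * Note: the empty string is trivially accepted.
     * The dictionary is a hypothetical stand-in for whatever the spellchecker knows.
     */
    static boolean isCompound(String word, Set<String> dictionary) {
        int n = word.length();
        // reachable[i] is true when word.substring(0, i) splits cleanly into known words
        boolean[] reachable = new boolean[n + 1];
        reachable[0] = true;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (reachable[start] && dictionary.contains(word.substring(start, end))) {
                    reachable[end] = true;
                    break;
                }
            }
        }
        return reachable[n];
    }
}
```

With a dictionary containing "php", "info", "email", "dns" and "server", all three example words above would be accepted, while a real typo such as "emailservr" would still be reported.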

 

How can I override this?

Where can I find the logic that splits identifiers into words by camelCase?
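For readers unfamiliar with the idea, the following is a rough sketch of what camelCase splitting means. It is explicitly not the platform's implementation (the code posted later in this thread mentions Splitter and IdentifierSplitter, which is probably where to look); the class CamelCaseSplit and its regular expression are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class CamelCaseSplit {

    /** Splits an identifier at camelCase boundaries and underscores. */
    static List<String> split(String identifier) {
        List<String> parts = new ArrayList<>();
        // Boundary 1: lowercase letter or digit followed by an uppercase letter ("phpInfo")
        // Boundary 2: end of an acronym run before a capitalized word ("HTMLParser")
        // Boundary 3: one or more underscores ("email_server")
        String boundary = "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|_+";
        for (String part : identifier.split(boundary)) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }
}
```

This shows why "phpInfo" passes (it becomes "php" and "Info") while "phpinfo" has no boundary to split on and is checked as a single word.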

0
9 comments

You can provide a custom com.intellij.spellchecker.tokenizer.SpellcheckingStrategy and/or a custom bundled dictionary via com.intellij.spellchecker.BundledDictionaryProvider.

0

I've created a class:

package application;

import com.intellij.psi.PsiElement;
import com.intellij.spellchecker.tokenizer.SpellcheckingStrategy;
import com.intellij.spellchecker.tokenizer.Tokenizer;
import org.jetbrains.annotations.NotNull;

public class spellchecker extends SpellcheckingStrategy {

    @NotNull
    @Override
    public Tokenizer getTokenizer(PsiElement element) {
        // Debug output to verify that the strategy is invoked at all
        System.out.println("+++");
        System.out.println(element);
        return EMPTY_TOKENIZER;
    }

}

And registered it:

<extensions defaultExtensionNs="com.intellij">
    <spellchecker.support language="HTML" implementationClass="application.spellchecker"/>
</extensions>

But I don't see any debug output. *scratches head*

I have also registered an "ApplicationStart" component, which is reached, so the plugin is working in general.

 

0

That's because there is already an existing strategy for HTML: com.intellij.spellchecker.xml.HtmlSpellcheckingStrategy.

Try adding order="first" to the EP declaration in your plugin.xml.
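Applied to the registration posted above, that suggestion would presumably look like the fragment below. This is only a sketch that combines the poster's declaration with the suggested attribute; it has not been verified against a running IDE.

```xml
<extensions defaultExtensionNs="com.intellij">
    <!-- order="first" asks the platform to consult this strategy before the built-in HTML one -->
    <spellchecker.support language="HTML"
                          implementationClass="application.spellchecker"
                          order="first"/>
</extensions>
```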

0

I ended up adding a new custom inspection (extends SpellCheckingInspection).

To solve my problem I needed some recursion and smart algorithms. Not the easiest task :-D

I first try to find as many valid words as I can and put them in a child-parent word tree (...). But that makes a lot of fragments count as valid words. Example:

Word: "emailserver"

validWordsTree: {em=null, ai=em, ls=ai, er=erv, erv=emails, ail=em, server=email, ails=em, ema=null, il=ema, ilse=ema, rv=ilse, email=null, emails=null}

As you can see, many "words" with only two letters are valid. So I decided that my plugin should use a minimum length of 3 characters to reduce false positives:

validWordsTree: {ema=null, ilse=ema, email=null, server=email, emails=null, erv=emails}
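The effect of the minimum length can be reproduced outside the IDE with a small backtracking sketch. The class Segmenter, its method names and the idea of passing the fragments as a plain Set are assumptions for illustration; this is not the poster's tree-based code, just a simplified way to list every possible split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class Segmenter {

    /** Lists every way to cut the word into dictionary entries of at least minPartLength chars. */
    static List<List<String>> segmentations(String word, Set<String> dictionary, int minPartLength) {
        List<List<String>> results = new ArrayList<>();
        collect(word, 0, dictionary, minPartLength, new ArrayList<>(), results);
        return results;
    }

    private static void collect(String word, int start, Set<String> dictionary, int minPartLength,
                                List<String> current, List<List<String>> results) {
        if (start == word.length()) {
            if (!current.isEmpty()) {
                results.add(new ArrayList<>(current)); // copy, because current is reused
            }
            return;
        }
        // Only consider parts that are long enough; shorter fragments are skipped entirely
        for (int end = start + minPartLength; end <= word.length(); end++) {
            String part = word.substring(start, end);
            if (dictionary.contains(part)) {
                current.add(part);
                collect(word, end, dictionary, minPartLength, current, results);
                current.remove(current.size() - 1); // backtrack
            }
        }
    }
}
```

Using the fragments from the first listing as the dictionary, a minimum length of 2 yields several spurious splits of "emailserver" (for example em + ail + server), whereas a minimum length of 3 leaves only email + server.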

 

This works well so far. But I found that some words are reported as invalid when they should be valid. Example "dnsserver":

isValidWord: dnsserver: false
isValidWord: dns: false (<= Should be true)
isValidWord: dnss: false
isValidWord: dnsse: false
isValidWord: dnsser: false
isValidWord: dnsserv: false
isValidWord: dnsserve: false
isValidWord: dnsserver: false

I check these with:

private boolean isValidWord(String word) {
    return !myManager.hasProblem(word);
}

So the question is: why is "dns" or "php" not a valid word according to the check "!myManager.hasProblem(word);"?

These words are accepted as valid when they stand alone.

0

TL;DR: the question was:

Why is "dns" or "php" not a valid word according to the check "!myManager.hasProblem(word);"?

These words are accepted as valid when they stand alone.

0

So basically you now have the default spellchecking inspection working and yours in addition? Or do you suppress the default inspection's "false positives" in your plugin?

And what is "myManager"? Please always share full code.

0

Yes, mine runs in addition. To eliminate the false positives, I add the words that my plugin found to be correct to the dictionary (if you have a better idea for overriding the original inspection instead, please tell me).

myManager is com.intellij.spellchecker.SpellCheckerManager. As I said, I extend everything from SpellCheckingInspection; that is where myManager in MyTokenConsumer comes from (I don't know why it has the "my" prefix when it comes from you :-) ).

Full code for now:

(Most of it is copy-paste; see the end of the file for the relevant changes.)

package application;

import com.intellij.codeInspection.ProblemDescriptor;
import com.intellij.codeInspection.ProblemDescriptorBase;
import com.intellij.codeInspection.ProblemHighlightType;
import com.intellij.codeInspection.ProblemsHolder;
import com.intellij.lang.*;
import com.intellij.lang.refactoring.NamesValidator;
import com.intellij.openapi.util.TextRange;
import com.intellij.psi.PsiElement;
import com.intellij.psi.PsiElementVisitor;
import com.intellij.psi.tree.IElementType;
import com.intellij.spellchecker.SpellCheckerManager;
import com.intellij.spellchecker.inspections.SpellCheckingInspection;
import com.intellij.spellchecker.inspections.Splitter;
import com.intellij.spellchecker.quickfixes.SpellCheckerQuickFix;
import com.intellij.spellchecker.tokenizer.LanguageSpellchecking;
import com.intellij.spellchecker.tokenizer.SpellcheckingStrategy;
import com.intellij.spellchecker.tokenizer.TokenConsumer;
import com.intellij.spellchecker.tokenizer.Tokenizer;
import com.intellij.spellchecker.util.SpellCheckerBundle;
import com.intellij.util.Consumer;
import gnu.trove.THashSet;
import org.jetbrains.annotations.Nls;
import org.jetbrains.annotations.NonNls;
import org.jetbrains.annotations.NotNull;
import org.jetbrains.annotations.Nullable;

import java.util.*;

public class MySpellCheckingInspection extends SpellCheckingInspection {

public static final String SPELL_CHECKING_INSPECTION_TOOL_NAME = "MySpellCheckingInspection";
public static final int MIN_WORD_LENGTH = 3;

@Override
@NonNls
@NotNull
public String getShortName() {
return SPELL_CHECKING_INSPECTION_TOOL_NAME;
}

@Nls(capitalization = Nls.Capitalization.Sentence)
@NotNull
@Override
public String getDisplayName() {
return SPELL_CHECKING_INSPECTION_TOOL_NAME;
}

@Nls(capitalization = Nls.Capitalization.Sentence)
@NotNull
@Override
public String getGroupDisplayName() {
return SPELL_CHECKING_INSPECTION_TOOL_NAME;
}

private static SpellcheckingStrategy getSpellcheckingStrategy(@NotNull PsiElement element, @NotNull Language language) {
for (SpellcheckingStrategy strategy : LanguageSpellchecking.INSTANCE.allForLanguage(language)) {
if (strategy.isMyContext(element)) {
return strategy;
}
}
return null;
}

private static ProblemDescriptor createProblemDescriptor(PsiElement element, int offset, TextRange textRange,
SpellCheckerQuickFix[] fixes,
boolean onTheFly) {
SpellcheckingStrategy strategy = getSpellcheckingStrategy(element, element.getLanguage());
final Tokenizer tokenizer = strategy != null ? strategy.getTokenizer(element) : null;
if (tokenizer != null) {
textRange = tokenizer.getHighlightingRange(element, offset, textRange);
}
assert textRange.getStartOffset() >= 0;

final String description = SpellCheckerBundle.message("typo.in.word.ref");
return new ProblemDescriptorBase(element, element, description, fixes, ProblemHighlightType.GENERIC_ERROR_OR_WARNING, false, textRange, onTheFly, onTheFly);
}

private static void addBatchDescriptor(PsiElement element,
int offset,
@NotNull TextRange textRange,
@NotNull ProblemsHolder holder) {
System.out.println("addBatchDescriptor");
SpellCheckerQuickFix[] fixes = SpellcheckingStrategy.getDefaultBatchFixes();
ProblemDescriptor problemDescriptor = createProblemDescriptor(element, offset, textRange, fixes, false);
holder.registerProblem(problemDescriptor);
}

private static void addRegularDescriptor(PsiElement element, int offset, @NotNull TextRange textRange, @NotNull ProblemsHolder holder,
boolean useRename, String wordWithTypo) {
System.out.println("addRegularDescriptor");
SpellcheckingStrategy strategy = getSpellcheckingStrategy(element, element.getLanguage());

SpellCheckerQuickFix[] fixes = strategy != null
? strategy.getRegularFixes(element, offset, textRange, useRename, wordWithTypo)
: SpellcheckingStrategy.getDefaultRegularFixes(useRename, wordWithTypo, element);

final ProblemDescriptor problemDescriptor = createProblemDescriptor(element, offset, textRange, fixes, true);
holder.registerProblem(problemDescriptor);
}

@Override
@NotNull
public PsiElementVisitor buildVisitor(@NotNull final ProblemsHolder holder, final boolean isOnTheFly) {
final SpellCheckerManager manager = SpellCheckerManager.getInstance(holder.getProject());

return new PsiElementVisitor() {
@Override
public void visitElement(final PsiElement element) {
if (holder.getResultCount()>1000) return;

final ASTNode node = element.getNode();
if (node == null) {
return;
}

// Extract parser definition from element
final Language language = element.getLanguage();
final IElementType elementType = node.getElementType();
final ParserDefinition parserDefinition = LanguageParserDefinitions.INSTANCE.forLanguage(language);

// Handle selected options
if (parserDefinition != null) {
if (parserDefinition.getStringLiteralElements().contains(elementType)) {
if (!processLiterals) {
return;
}
}
else if (parserDefinition.getCommentTokens().contains(elementType)) {
if (!processComments) {
return;
}
}
else if (!processCode) {
return;
}
}

tokenize(element, language, new MySpellCheckingInspection.MyTokenConsumer(manager, holder, LanguageNamesValidation.INSTANCE.forLanguage(language)));
}
};
}

private static class MyTokenConsumer extends TokenConsumer implements Consumer<TextRange> {
private final Set<String> myAlreadyChecked = new THashSet<>();
// HashMap<String, String> validWords = new HashMap<>();
Map<String, String> validWords = new LinkedHashMap<>();

private final SpellCheckerManager myManager;
private final ProblemsHolder myHolder;
private final NamesValidator myNamesValidator;
private PsiElement myElement;
private String myText;
private boolean myUseRename;
private int myOffset;

MyTokenConsumer(SpellCheckerManager manager, ProblemsHolder holder, NamesValidator namesValidator) {
myManager = manager;
myHolder = holder;
myNamesValidator = namesValidator;
}

@Override
public void consumeToken(final PsiElement element,
final String text,
final boolean useRename,
final int offset,
TextRange rangeToCheck,
Splitter splitter) {
myElement = element;
myText = text;
myUseRename = useRename;
myOffset = offset;
splitter.split(text, rangeToCheck, this);
}

@Override
public void consume(TextRange textRange) {
String word = textRange.substring(myText);
if (!myHolder.isOnTheFly() && myAlreadyChecked.contains(word)) {
return;
}

boolean keyword = myNamesValidator.isKeyword(word, myElement.getProject());
if (keyword) {
return;
}

System.out.println(word);

boolean hasProblems = !isValidWord(word);

if (hasProblems) {
hasProblems = !multiWordCheck(word);
// if (!hasProblems) {
// myManager.acceptWordAsCorrect(word, myManager.getProject());
// }
}
if (hasProblems) {
int aposIndex = word.indexOf('\'');
if (aposIndex != -1) {
word = word.substring(0, aposIndex); // IdentifierSplitter.WORD leaves &apos;
}
hasProblems = myManager.hasProblem(word);
}
if (hasProblems) {
if (myHolder.isOnTheFly()) {
addRegularDescriptor(myElement, myOffset, textRange, myHolder, myUseRename, word);
}
else {
myAlreadyChecked.add(word);
addBatchDescriptor(myElement, myOffset, textRange, myHolder);
}
}
}

private boolean multiWordCheck(String originWord) {
System.out.println("==== multiWordCheck : " + originWord);

validWords.clear();

createValidWordTree(originWord, null);

System.out.println("validWords: " + validWords);

return canResolveWord(originWord);

}

private boolean canResolveWord(String originWord) {
// Treat every stored fragment as a possible last piece and rebuild the chain of parents before it
for (String childWord : validWords.keySet()) {
if (originWord.equals(resolveWordFromTree(childWord))) {
return true;
}
}
return false;
}

private String resolveWordFromTree(String word) {
String parentWord = getParentWordFromTree(word);
if (parentWord != null) {
return resolveWordFromTree(parentWord) + word;
// return resolveWordFromTree(parentWord) + "|" + word;
}
return word;
}

private String getParentWordFromTree(String matchWord) {
// A direct map lookup; returns null for root fragments and for unknown words
return validWords.get(matchWord);
}


private void createValidWordTree(String word, @Nullable String parentWord) {
// System.out.println("=>" + word + ": " + isValidWord(word));
if (isValidWord(word) && word.length() >= MIN_WORD_LENGTH) {
validWords.put(word, parentWord);
return;
}

ArrayList<String> words = splitWords(word);

for (String subWord : words) {
if (isValidWord(subWord)) {
validWords.put(subWord, parentWord);
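// subWord is always a prefix of word, so strip only that prefix;
// word.replace(subWord, "") would also delete any later occurrence of subWord
String leftWordPart = word.substring(subWord.length());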
createValidWordTree(leftWordPart, subWord);
}
}
}

private ArrayList<String> splitWords(String word) {
int strLen = word.length();
ArrayList<String> words = new ArrayList<>();
for (int i=MIN_WORD_LENGTH; i <= strLen; i++) {
String subWord = word.substring(0, i);
words.add(subWord);
}
return words;
}

private boolean isValidWord(String word) {
return !myManager.hasProblem(word);
// boolean isValid = !myManager.hasProblem(word);
// System.out.println("isValidWord: " + word + ": " + isValid);
// return isValid;
}
}

}
0

As for the question about the spellchecker and short words: the spellchecker ignores words with length <= 3, so on its own it never calls `hasProblem` for `dns`. As a consequence, the dictionary may be missing a lot of valid words of length <= 3.
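Given that explanation, one possible workaround is to route short fragments through a separate allowlist instead of the dictionary. The sketch below is an assumption-laden illustration, not platform code: ShortWordAwareCheck, the shortWords set and the Predicate standing in for `!myManager.hasProblem(word)` are all hypothetical names, and the limit of 3 is taken from the reply above.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

final class ShortWordAwareCheck {
    private final Predicate<String> dictionaryCheck; // stand-in for !myManager.hasProblem(word)
    private final Set<String> shortWords;            // hypothetical allowlist, e.g. "dns", "php"
    private final int shortLimit;                    // words up to this length bypass the dictionary

    ShortWordAwareCheck(Predicate<String> dictionaryCheck, Set<String> shortWords, int shortLimit) {
        this.dictionaryCheck = dictionaryCheck;
        this.shortWords = shortWords;
        this.shortLimit = shortLimit;
    }

    boolean isValidWord(String word) {
        if (word.length() <= shortLimit) {
            // Short fragments are unlikely to be in the dictionary, so consult the allowlist
            return shortWords.contains(word.toLowerCase(Locale.ROOT));
        }
        return dictionaryCheck.test(word);
    }
}
```

With an allowlist containing "dns" and "php", the fragment "dns" in "dnsserver" would be accepted, and "server" would still be validated by the regular dictionary check.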

As for the whole problem: we are planning to migrate to Lucene's pure-Java Hunspell implementation, which would solve this problem out of the box :)

0

If you're planning something, then I can wait. Thanks!

0
