Format of the *.dic dictionary files of the 9.0 spell checker
Hi,
in the new IntelliJ IDEA 9.0 you can add your own dictionary files for the built-in spell checker. Two are bundled: an English one and one with special terms called jetbrains.
I would like to add a German *.dic file. So I downloaded the igerman98 sources [1] and tried the generated *.dic files for ispell/hunspell, but none of them seems to work. Then I downloaded the old German dictionary [2] from the old spell checker plugin. In the JAR, I found a german.0 file. I renamed it to german.dic and added it to IntelliJ 9.0. It works. But then I noticed that words with German umlauts and ß are not recognized.
The difference is that the german.0 file inside the old German plugin dictionary is simply a list of newline-separated words, whereas the ispell/hunspell *.dic files also contain control information for conjugation and case. I suppose that the spell checker in IntelliJ 9.0 doesn't support this.
So is there a way to convert ispell/hunspell *.dic files into the simpler format supported by IntelliJ 9.0?
[1]: http://www.j3e.de/ispell/igerman98/
[2]: http://plugins.intellij.net/plugin/?idea&id=1658
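To illustrate the format difference described above, here is a minimal Python sketch that reduces the lines of a hunspell-style *.dic file to a bare newline-separated word list. It assumes the usual hunspell layout (an optional word count on the first line, then `word/FLAGS` entries); the function name and the sample flags are made up for illustration. It only drops the control information and does not generate any inflected forms, so the result contains base forms only.

```python
def strip_hunspell_flags(lines):
    """Return the bare words from the lines of a hunspell-style .dic file.

    Assumption: entries look like "word/FLAGS", optionally followed by
    tab-separated morphological fields. Escaped slashes are not handled.
    """
    words = []
    for index, raw in enumerate(lines):
        line = raw.strip()
        if not line:
            continue
        # The first line of a hunspell .dic file is usually a word count.
        if index == 0 and line.isdigit():
            continue
        # Drop everything after the first "/" (flags) or tab (extra fields).
        word = line.split("/", 1)[0].split("\t", 1)[0].strip()
        if word:
            words.append(word)
    return words


if __name__ == "__main__":
    # Hypothetical sample entries; the flags "EPT" and "N" are placeholders.
    sample = ["3", "Haus/EPT", "Wörter/N", "und"]
    print("\n".join(strip_hunspell_flags(sample)))
```

The output can be saved as a plain text file with a .dic extension, one word per line.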
@Pokriefke @Richard The Hunspell plugin is available now. Please install it and you will be able to add hunspell dictionaries to the spellchecker.
Hi Alexander,
As regards the German dictionary: we are going to bundle it in the next
available IDEA public build (IDEA 9.0.1 will be released before the New
Year). Any custom dictionary encoded in UTF-8 will then be fully
supported. (Currently the default system encoding is used when loading
dictionaries.)
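For anyone who wants to prepare a word list for the UTF-8 loading mentioned above, a minimal Python sketch for re-encoding a dictionary file follows. The source encoding is an assumption (ISO-8859-1 is common for older German word lists, but the thread does not say which encoding german.0 uses), and the function name and file names are hypothetical.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Rewrite a word list as UTF-8.

    Assumption: the input really is in source_encoding. A wrong guess does
    not raise an error for ISO-8859-1; it just produces wrong characters.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical file names.
    reencode_to_utf8("german.0", "german.dic")
```

After conversion, check that words such as "Straße" or "Übung" look correct in a UTF-8 editor before adding the file to the IDE.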
As regards dictionary format and converters: in the near future we
are not going to create a public converter for ispell/hunspell
dictionaries. A simple newline-separated word list seems to be
the simplest way to let users extend the spell checker's knowledge
base. The special .dic extension was chosen just to simplify
scanning of the user folder.
--
Ekaterina Shliakhovetskaja
Senior Software Developer
JetBrains, Inc.
http://www.jetbrains.com/
"Develop with Pleasure!"
On 10.12.2009 18:03, Alexander Kiel wrote:
Hi Ekaterina,
thanks for your reply.
The issue I see with the simple word list format is that you have to add every conjugated form of a word, in both lower and upper case.
When I asked for a converter, I had this issue in mind. The converter could expand an ispell/myspell/hunspell dictionary into a word list. I'm not sure whether one is already available. A quick Google search did not reveal anything.
I think hunspell is widely used [1] and more advanced than a simple word list. Unfortunately there is no pure Java implementation available [1, 2]. I think a JNI wrapper is not a good solution for IDEA.
[1]: http://en.wikipedia.org/wiki/Hunspell
[2]: http://www.mail-archive.com/dev@lingucomponent.openoffice.org/msg01519.html
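The kind of expansion such a converter would perform can be sketched in Python as follows. This is a heavily simplified illustration, not a hunspell implementation: it assumes single-character flags, handles suffix rules only (no prefixes, cross-products or compounding), and takes the rules as an already-parsed dictionary instead of reading a real .aff file. The function name, rule layout and sample entries are all hypothetical.

```python
import re


def expand_entry(entry, suffix_rules):
    """Expand one "word/FLAGS" entry into its surface forms.

    suffix_rules maps a flag character to a list of
    (strip, add, condition) tuples, loosely modelled on hunspell SFX rules:
    remove `strip` from the end of the word, append `add`, but only if the
    end of the word matches the regular expression `condition`.
    """
    word, _, flags = entry.partition("/")
    forms = {word}
    for flag in flags:
        for strip, add, condition in suffix_rules.get(flag, []):
            if not word.endswith(strip):
                continue
            if not re.search(condition + "$", word):
                continue
            stem = word[: len(word) - len(strip)] if strip else word
            forms.add(stem + add)
    return sorted(forms)


if __name__ == "__main__":
    # Hypothetical rules: flag "S" appends "s", flag "N" appends "n" after "e".
    rules = {"S": [("", "s", ".")], "N": [("", "n", "e")]}
    for entry in ["Auto/S", "Katze/N", "und"]:
        print("\n".join(expand_entry(entry, rules)))
```

Running every entry of a dictionary through such an expansion and writing the results one per line would give the plain word list format, at the cost of a much larger file.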
Hi Alexander,
Thank you for your investigations. The choice of dictionary format is
based on the language-independent behavior of the spellchecker. We don't
use stemming at all. This forces the dictionary to grow, but we now have a
solution for compressing the dictionary and preventing huge memory usage.
--
Ekaterina Shliakhovetskaja
Senior Software Developer
JetBrains, Inc.
http://www.jetbrains.com/
"Develop with Pleasure!"
On 10.12.2009 22:24, Alexander Kiel wrote:
Hi Ekaterina,
okay, thanks. I'm looking forward to the German dictionary in version 9.0.1.
Hi Ekaterina,
I have installed IDEA 9.0.1 via patch update on Linux, but I don't see a German dictionary. Is it scheduled for a later version?
Thanks
Alexander
Hello Ekaterina,
I would be very interested in a German dictionary, too.
So far, IDEA 9.0.3 bundles only the English one (which is OK for me).
But any pointers on how to create a custom dictionary without starting from scratch would be welcome.
Thanks!
- Ben
"As regards dictionary format and converters - in the nearest future we
are not going to create a public converter for ispell/hunspell
dictionaries."
As you probably already know, some hunspell dictionaries do NOT contain all words; they contain only lemmas, from which all forms of a word can be generated. A good example is dicollecte.
How can we get all words from a hunspell dic file into IntelliJ?
Sorry for bringing this old topic back to the top, but I really would love to see simple support for hunspell dictionaries in IntelliJ, too. I tried to compile my own German dictionary and ended up with a list of 1.5M words (unmunched from a German hunspell dic). Unfortunately this list is still incomplete. Even the simple word "Wörterbuch" (dictionary) is not listed, because it is a combination of "Wörter" and "Buch". Getting every possible word combination into one file is nearly impossible. I'm not sure whether this problem is unique to German or how many other languages use word combinations...
It is just a spell check, but it would be nice if we could activate it in our company again.
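The compound problem described above can be illustrated with a small Python sketch that checks whether a word can be split entirely into words from a known list. This is only an illustration of why a flat word list falls short, not how IntelliJ or hunspell handle compounds: it ignores German linking elements (such as the "s" in "Arbeitszimmer") and will also accept nonsensical combinations. The function name and the `min_part` parameter are hypothetical.

```python
def splits_into_known_words(word, known, min_part=3):
    """Return True if `word` is a concatenation of one or more known words.

    Comparison is case-insensitive; parts shorter than `min_part`
    characters are not considered.
    """
    lower = word.lower()
    vocab = {w.lower() for w in known}
    # reachable[i] is True if lower[:i] can be built from known words.
    reachable = [True] + [False] * len(lower)
    for end in range(1, len(lower) + 1):
        for start in range(0, end - min_part + 1):
            if reachable[start] and lower[start:end] in vocab:
                reachable[end] = True
                break
    return reachable[len(lower)]


if __name__ == "__main__":
    known_words = {"Wörter", "Buch"}
    print(splits_into_known_words("Wörterbuch", known_words))
    print(splits_into_known_words("Wörterbuchx", known_words))
```

A check like this has to run at lookup time, which is why expanding a dictionary into a static list cannot cover compounds in general.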