Idea / Suggestion: Word-Stemming Search-and-Replace
Hi,
I was just contemplating the choice of terminology I'd used in a new interface definition and wanted to see all occurrences of the term in question. I'd used this word in verb and participle / gerund forms, in various tenses, so I had to compose a regular expression that would find them all. Not a big deal, really, but it got me thinking that it would be really nice to have this process automated.
I think it would be handy to have a search (and possibly replace) tool that would stem a word and search for its variants. Naturally, this would require a dictionary and would thus be dependent on the (human) language in which the code was written. Additionally, a full treatment of the concept would probably require the ability to add new words, since we deal in neologisms and technical terminology, some of which might not appear in most on-line dictionaries.
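To make the idea a bit more concrete, here is a rough sketch in Python. It is only an illustration under some loud assumptions: the stemmer is a deliberately naive English suffix-stripper rather than a dictionary-based one, and the names `stem`, `find_variants`, and `custom_stems` are hypothetical, not part of any existing tool or plug-in API. The `custom_stems` mapping stands in for the "add new words" ability mentioned above.

```python
import re

# Naive, English-only suffix list (an assumption for illustration only; a real
# tool would use a dictionary or an established stemming algorithm).
SUFFIXES = ("ing", "ed", "es", "s", "e")


def stem(word, custom_stems=None):
    """Return a crude stem: a user-supplied override if one exists,
    otherwise the word with at most one known suffix stripped."""
    word = word.lower()
    if custom_stems and word in custom_stems:
        return custom_stems[word]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words like "the" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def find_variants(term, text, custom_stems=None):
    """Return every word in `text` whose stem equals the stem of `term`."""
    target = stem(term, custom_stems)
    return [
        match.group(0)
        for match in re.finditer(r"[A-Za-z]+", text)
        if stem(match.group(0), custom_stems) == target
    ]


if __name__ == "__main__":
    sample = "Resolve the name; it resolves lazily and resolving again is cheap."
    print(find_variants("resolve", sample))
```

A replace step would need more than this, since the replacement word has to be re-inflected to match each variant it substitutes, which is where a real dictionary becomes hard to avoid.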
So... Is there anybody out there with the time and the necessary plug-in programming skills (or the newly sparked motivation to acquire those skills) who could do this? I have neither.
Randall Schulz
Lucene has support for stemming of words; perhaps someone could write a plugin?
It doesn't really strike me as something particularly suitable for core functionality.