[Trennmuster] FOSDEM and spell checking
Werner LEMBERG
wl at gnu.org
Tue Jan 26 10:41:39 CET 2016
> Which of you will be at FOSDEM this weekend?
Too far away, sorry.
> While gathering some data on spell checking, I processed the files
> from this project, just to see how they perform with spell checking
> on the latest stable Ubuntu. Below is a list of what Hunspell and
> Aspell make of it with their respective de_DE dictionaries.
It's not exactly clear to me what you have done. Did you use the
spell checker programs to test the validity of our wordlist?
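To make the question concrete, a check of that kind could look roughly
like the sketch below. It is only a guess at the procedure, not the
actual tool: it assumes a plain one-word-per-line list, assumes that the
`hunspell` and `aspell` command-line programs are installed with a de_DE
dictionary, and uses made-up helper names (`rejected_words`,
`acceptance_rate`) purely for illustration.

```python
import subprocess

# Assumed invocations: both programs read text on stdin and print the
# words they consider misspelled.
HUNSPELL = ["hunspell", "-l", "-i", "utf-8", "-d", "de_DE"]
ASPELL = ["aspell", "--lang=de_DE", "--encoding=utf-8", "list"]


def rejected_words(command, words):
    """Feed the words (one per line) to a spell checker running in
    list mode and return the set of words it reports as misspelled."""
    result = subprocess.run(
        command,
        input="\n".join(words) + "\n",
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return set(result.stdout.split())


def acceptance_rate(words, rejected):
    """Return the fraction of words that the checker did not flag."""
    if not words:
        return 0.0
    accepted = sum(1 for word in words if word not in rejected)
    return accepted / len(words)
```

With the word list loaded into `words`, something like
`acceptance_rate(words, rejected_words(HUNSPELL, words))` would give the
share of entries a checker accepts. Compound or hyphenated entries may be
split by the checkers, so the figure is only approximate.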
> This is in no way criticism; the 84% on 460,000 words that were not
> used to build the dictionary is very high.
Mhmm. Given that a very high percentage of the entries in our wordlist
are based on frequency, it is a shame that the tested spell checkers
give such bad results.
> I am curious about how we can use this project to improve spell
> checking. Apparently, the source of the German dictionaries is
> https://www.j3e.de/ispell/igerman98/ Is he involved in this project
> too?
No. From time to time I send him reports on gross mistakes in the
list when I encounter them. Alas, I haven't had enough time recently to
work on our wordlist...
> In what way can we help him improve dicts?
You might send him your results so that he can investigate why
spell checking fails so frequently.
> The software I made to process word lists in different spell
> checkers with different dicts is somewhere on GitHub but far from
> presentable. Would you guys be interested in my including the word
> lists from this trennmuster project in its reporting by default?
Please explain the benefits :-)
> Are there any other word lists you would recommend validating
> against?
I don't know, sorry. The question is where to get *corrected* German
word lists...
Werner