[Trennmuster] FOSDEM and spell checking

Werner LEMBERG wl at gnu.org
Tue Jan 26 10:41:39 CET 2016


> Which of you will be at FOSDEM this weekend?

Too far away, sorry.

> While gathering some data on spell checking, I processed the files
> from this project, just to see how they perform with spell checking
> on latest stable Ubuntu.  Below is a list of what Hunspell and
> Aspell make of it with their respective de_DE dictionaries.

It's not exactly clear to me what you have done.  Did you use the
spell checker programs to test the validity of our wordlist?

> This is in no way criticism; 84% on 460,000 words that were not
> used to build the dictionary is very high.

Mhmm.  Given that a very high percentage of the entries in our
wordlist were selected by frequency, it is a shame that the tested
spell checkers give such bad results.

> I am curious about how we can use this project to improve spell
> checking.  Apparently, the German dictionaries come from igerman98
> (https://www.j3e.de/ispell/igerman98/).  Is its author involved in
> this project too?

No.  From time to time I send him reports on gross mistakes in his
list when I encounter them.  Alas, I haven't had enough time recently
to work on our wordlist...

> How can we help him improve the dictionaries?

You might send him your results so that he can investigate why spell
checking fails so frequently.

> The software I wrote to process word lists with different spell
> checkers and different dictionaries is somewhere on GitHub, but far
> from presentable.  Would you guys be interested in my including the
> word lists from this trennmuster project in its reporting by default?

Please explain the benefits :-)

> Are there any other word lists you would recommend validating
> against?

I don't know, sorry.  The question is where to get *corrected* German
word lists...


    Werner
