[Trennmuster] Hyphenation patterns

Werner LEMBERG wl at gnu.org
Wed Mar 14 17:51:09 CET 2012


>> BTW, does your grammar support multiple hyphenation marks at one
>> place, this is, things like `==.'?
> 
> No.  Should it?  when do you use it?  Shouldn't it be ... in that
> case?

The idea is that it makes sense to add more information to the word
list.  In particular, it requires only a small amount of additional
tagging so that the list can be used to decide whether `ſ' or `s' must
be used if the text is typeset in Fraktur.

Example: At the border between components in a compound word, no `ſ'
is used.  To make this work reliably, such borders are tagged with,
say, `='.  However, it might still happen that this hyphenation point
is not desired, so we need `=.' to mark this.  In a previous mail I've
already mentioned that `.' is actually a shorthand for `-.' (using the
current tagging characters of the German word list).

So `=' and `.' belong to different classes (grammatical
vs. aesthetic), carrying different information.
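
To illustrate the two classes, here is a minimal Python sketch of how
a run of marks at one place could be split into a grammatical flag and
an aesthetic flag.  The function name `parse_tagged', the tuple layout,
and the restriction to the three marks `=', `-' and `.' are assumptions
made for this example only; the real word list format has more marks
and fields.

```python
# Hypothetical sketch: only the marks '=', '-' and '.' are handled.
MARKS = set("=-.")

def parse_tagged(word):
    """Return (plain_word, points) for a tagged word.

    Each point is (offset, is_component_border, is_discouraged):
      offset               number of letters before the break point
      is_component_border  grammatical class: the run contains '='
      is_discouraged       aesthetic class: the run contains '.'
    """
    letters = []
    points = []
    i = 0
    while i < len(word):
        if word[i] in MARKS:
            # Collect the whole run of marks at this position.
            j = i
            while j < len(word) and word[j] in MARKS:
                j += 1
            run = word[i:j]
            if run == ".":
                # A lone '.' is shorthand for '-.'.
                run = "-."
            points.append((len(letters), "=" in run, "." in run))
            i = j
        else:
            letters.append(word[i])
            i += 1
    return "".join(letters), points

plain, points = parse_tagged("foo=.bar-baz.qux")
```

For the made-up input `foo=.bar-baz.qux' this yields a component border
that is discouraged as a break, an ordinary break, and an ordinary
break that is discouraged.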

>>   Bra-bant   % Noord-Brabant
> 
> I have seen the nine comments in wortlist. These are only for the
> editors, correct?

No, these comments are for everyone, denoting special cases,
exceptions, etc.

> The thing is that most of these words in Dutch are not fully
> qualified words of their own. :( We have some very ugly combinations
> and a lot of variations.

This is of no importance to patgen.

> Sometimes they are loanwords or foreign expressions that do not
> exist outside a compound.  They need a hyphenation pattern in the
> compound.  Especially foreign words that would be hyphenated
> incorrectly by Dutch rules, rules extracted from Dutch words or they
> could be by accident written identically to another Dutch word with
> of course different hyphenation.  Therefore we need to be able to
> explicitly provide pattern.

Hmm.  Let's assume the word

 foobar-baz

exists in Dutch, and `foobar' is not a stand-alone word.  Why is it
problematic to have two entries `foobar' and `baz' in the word list
(with some comments to explain the origin)?  Even if there might exist
valid Dutch words `fooba' and `foobarr' with a different hyphenation,
everything can coexist in the word list.

Or are there cases where, for example, you hyphenate `foo-bar#baz',
but a standalone `foobar' would be hyphenated as `foob-ar'?  I really
doubt that.

> At the moment we have about 6000 words that contain a hyphen in the
> normal form and some are used very often.  These can be broken down
> into several categories.  I will give you some examples.

Thanks.  Could you please analyze whether there are conflicts in the
hyphenation of the components, that is, whether there are cases
`foo-bar' vs. `foob-ar' as outlined above?
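
Such a check could be sketched in Python roughly as follows.  This
assumes, for illustration only, that entries use `#' for the hyphen of
the normal form and `-' for ordinary break points, as in the `foobar'
example above; other marks such as `==' are not handled, and the name
`find_conflicts' is hypothetical.

```python
from collections import defaultdict

def find_conflicts(entries):
    """Map each component to its hyphenations if it has more than one.

    Entries are hyphenated words; '#' separates the components of a
    compound, '-' marks a break point inside a component.
    """
    seen = defaultdict(set)
    for entry in entries:
        for component in entry.split("#"):
            # The component without break points is the lookup key.
            key = component.replace("-", "").lower()
            seen[key].add(component.lower())
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}

conflicts = find_conflicts(["foo-bar#baz", "foob-ar", "baz"])
```

An empty result would mean that the components can simply be added to
the word list as separate entries.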

> Another one is
>
>   zwart-wittelevisie;zwart#wit==te-le-vi-sie
>
> which means black-and-white television, also 'wittelevisie' is not a
> Dutch word.  Here == has higher prio as #

Hmm.  Looking this up with Google, I can only find

  zwart-witte televisie

The word `zwart-wittelevisie' has *not a single* hit!  Is this
really a correct entry?

> I think ~ is a good one.

I like it too, but it is probably a bit too similar to `-'.


    Werner

