[Trennmuster] Hyphenation patterns
Werner LEMBERG
wl at gnu.org
Wed Mar 14 17:51:09 CET 2012
>> BTW, does your grammar support multiple hyphenation marks at one
>> place, that is, things like `==.'?
>
> No. Should it? When do you use it? Shouldn't it be ... in that
> case?
The idea is that it makes sense to add more information to the word
list. In particular, it requires only a small amount of additional
tagging so that the list can be used to decide whether `ſ' or `s' must
be used if the text is typeset in Fraktur.
Example: At the border of components in a compound word, no `ſ' is
used. To make this work reliably, such components are tagged with,
say, `='. However, it might still happen that this hyphenation point
is not desired, so we need `=.' to mark this. In a previous mail I've
already mentioned that `.' is actually a shorthand for `-.' (using the
current tagging characters of the German word list).
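To illustrate how the component tag helps with the `ſ' decision, here
is a minimal Python sketch. It assumes a deliberately simplified rule
(an `s' directly before `=' or at the end of the word stays round,
every other `s' becomes `ſ'); real Fraktur typesetting has more
exceptions. The function name `fraktur_s' and the sample words are
only for illustration and are not part of the word list tooling.

```python
# Tagging characters assumed here: '=' (component border),
# '-' (ordinary hyphenation point), '.' (undesired break marker).
MARKS = "=-."


def fraktur_s(tagged):
    """Return the untagged word with long/round s chosen by a
    simplified rule: round 's' at the end of a component, long
    's' everywhere else."""
    out = []
    for i, ch in enumerate(tagged):
        if ch in MARKS:
            continue  # tags never appear in the typeset word
        if ch == "s":
            # The component ends if '=' follows or the word ends.
            at_component_end = i + 1 >= len(tagged) or tagged[i + 1] == "="
            out.append("s" if at_component_end else "ſ")
        else:
            out.append(ch)
    return "".join(out)


print(fraktur_s("Wachs=tu-be"))   # component `Wachs' ends in round s
print(fraktur_s("Wach=stu-be"))   # component `stu-be' starts with long s
print(fraktur_s("Haus=.tür"))     # `=.' still marks a component border
```

Note that `=.' works here because the `=' is kept even when the break
itself is not wanted.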
So `=' and `.' belong to different classes (grammatical
vs. aesthetic), carrying different information.
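A small Python sketch of how a reader of the list could keep the two
classes apart: each break is stored with its grammatical mark and a
separate flag for the aesthetic `.'. The function name `parse_breaks'
and the tuple layout are assumptions made for this example; runs such
as `==' are not handled.

```python
def parse_breaks(tagged):
    """Split a tagged word into its letters and a list of breaks.

    Each break is (position, mark, discouraged):
      position    - index into the untagged word where the break sits
      mark        - '=' (component border) or '-' (plain hyphenation)
      discouraged - True if the break carries the aesthetic '.' flag
    A lone '.' is treated as shorthand for '-.'.
    """
    letters = []
    breaks = []
    i = 0
    while i < len(tagged):
        ch = tagged[i]
        if ch in "=-.":
            mark = "-" if ch == "." else ch
            discouraged = ch == "."
            # A grammatical mark may be followed by the '.' flag.
            if ch != "." and i + 1 < len(tagged) and tagged[i + 1] == ".":
                discouraged = True
                i += 1
            breaks.append((len(letters), mark, discouraged))
        else:
            letters.append(ch)
        i += 1
    return "".join(letters), breaks


word, breaks = parse_breaks("ab=.cd-ef.gh")
print(word, breaks)
```

Keeping the flag separate from the mark means a hyphenation run can
drop the discouraged breaks while the `ſ' logic still sees every `='.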
>> Bra-bant % Noord-Brabant
>
> I have seen the nine comments in wortlist. These are only for the
> editors, correct?
No, these comments are for everyone, denoting special cases,
exceptions, etc.
> The thing is that most of these words in Dutch are not fully
> qualified words of their own. :( We have some very ugly combinations
> and a lot of variations.
This is of no importance to patgen.
> Sometimes they are loanwords or foreign expressions that do not
> exist outside a compound. They need a hyphenation pattern in the
> compound. This applies especially to foreign words that would be
> hyphenated incorrectly by Dutch rules (rules extracted from Dutch
> words), or that happen to be written identically to another Dutch
> word with a different hyphenation. Therefore we need to be able to
> explicitly provide patterns.
Hmm. Let's assume the word
foobar-baz
exists in Dutch, and `foobar' is not a stand-alone word. Why is it
problematic to have two entries `foobar' and `baz' in the word list
(with some comments to explain the origin)? Even if there might exist
valid Dutch words `fooba' and `foobarr' with a different hyphenation,
everything can coexist in the word list.
Or does the case exist that, for example, you hyphenate `foo-bar#baz',
but a standalone `foobar' would be hyphenated as `foob-ar'? I really
doubt that.
> At the moment we have about 6000 words that contain a hyphen in the
> normal form and some are used very often. These can be broken down
> into several categories. I will give you some examples.
Thanks. Could you please analyze whether there are conflicts in the
hyphenation of the components, that is, whether there are cases
`foo-bar' vs. `foob-ar' as outlined above?
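Such a check could be sketched in Python as follows. The sketch
assumes that every entry is available as a tagged string, that runs of
`#' and `=' separate components, and that `-' is the only mark inside
a component; the function name `find_conflicts' is made up for this
example.

```python
import re
from collections import defaultdict

# Assumed component separators: one or more '#' or '=' characters.
BOUNDARY = re.compile(r"[#=]+")


def find_conflicts(tagged_words):
    """Map each component (without marks) to its hyphenations and
    return only those components seen with more than one."""
    seen = defaultdict(set)
    for word in tagged_words:
        for component in BOUNDARY.split(word):
            if component:
                seen[component.replace("-", "")].add(component)
    return {
        plain: sorted(variants)
        for plain, variants in seen.items()
        if len(variants) > 1
    }


# The hypothetical case from above: compound vs. stand-alone word.
print(find_conflicts(["foo-bar#baz", "foob-ar"]))
```

An empty result would mean that all components can simply be listed as
separate entries.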
> Another one is
>
> zwart-wittelevisie;zwart#wit==te-le-vi-sie
>
> which means black-and-white television; also, 'wittelevisie' is not
> a Dutch word. Here == has higher priority than #.
Hmm. Looking this up with Google, I can only find
zwart-witte televisie
The word `zwart-wittelevisie' has *not a single* hit! Is this
really a correct entry?
> I think ~ is a good one.
I like it too, but it is probably a bit too similar to `-'.
Werner