[Trennmuster] Hyphenation patterns
Pander
pander at users.sourceforge.net
Wed Mar 14 18:17:38 CET 2012
On 2012-03-14 17:51, Werner LEMBERG wrote:
>
>>> BTW, does your grammar support multiple hyphenation marks at one
>>> place, that is, things like `==.'?
>>
>> No. Should it? When do you use it? Shouldn't it be ... in that
>> case?
>
> The idea is that it makes sense to add more information to the word
> list. In particular, it requires only a small amount of additional
> tagging so that the list can be used to decide whether `ſ' or `s' must
> be used if the text is typeset in Fraktur.
>
> Example: At the border of components in a compound word, no `ſ' is
> used. To make this work reliably, such components are tagged with,
> say, `='. However, it might still happen that this hyphenation point
> is not desired, so we need `=.' to mark this. In a previous mail I've
> already mentioned that `.' is actually a shorthand for `-.' (using the
> current tagging characters of the German word list).
>
> So `=' and `.' belong to different classes (grammatical
> vs. aesthetic), transporting different information.
OK, no problem with that. I would like to avoid defining codes that
are aliases, if possible.
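
To make the two classes concrete, here is a minimal Python sketch of how a tagged word could be read. It is only an illustration, not part of any existing tooling: the function name `parse_tagged` and the entries `foo=.bar-baz` and `foo.bar` are hypothetical, and the marker set (`=`, `-`, `.`) is assumed from the examples in this thread. A bare `.` is expanded to `-.`, as described above.

```python
import re

# Assumed marker alphabet, taken from the examples in this thread:
#   '='  grammatical boundary between compound components
#   '-'  ordinary hyphenation point
#   '.'  aesthetic flag: the break at this position is undesired
MARKS = "=-."


def parse_tagged(tagged):
    """Return (plain_word, points) for a tagged word.

    Each point is (offset, kind, undesired), where offset is the index
    in the plain word before which the mark sits.
    """
    plain = []
    points = []
    for match in re.finditer(r"[=\-.]+|[^=\-.]+", tagged):
        token = match.group()
        if token[0] not in MARKS:
            plain.append(token)  # a run of ordinary letters
            continue
        undesired = "." in token
        # A bare '.' is shorthand for '-.', so fall back to '-'.
        kind = token.replace(".", "") or "-"
        offset = sum(len(part) for part in plain)
        points.append((offset, kind, undesired))
    return "".join(plain), points


print(parse_tagged("foo=.bar-baz"))
print(parse_tagged("foo.bar"))
```

Keeping the grammatical kind and the aesthetic flag in separate fields means that a consumer interested only in component borders (the Fraktur case) and one interested only in permitted break points can read the same entry.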
>>> Bra-bant % Noord-Brabant
>>
>> I have seen the nine comments in wortlist. These are only for the
>> editors, correct?
>
> No, these comments are for everyone, denoting special cases,
> exceptions, etc.
>
>> The thing is that most of these words in Dutch are not fully
>> qualified words of their own. :( We have some very ugly combinations
>> and a lot of variations.
>
> This is of no importance to patgen.
>
>> Sometimes they are loanwords or foreign expressions that do not
>> exist outside a compound. They need a hyphenation pattern in the
>> compound, especially foreign words that would be hyphenated
>> incorrectly by Dutch rules or by rules extracted from Dutch words,
>> or that happen to be written identically to another Dutch word
>> with, of course, a different hyphenation. Therefore we need to be
>> able to provide a pattern explicitly.
>
> Hmm. Let's assume the word
>
> foobar-baz
>
> exists in Dutch, and `foobar' is not a stand-alone word. Why is it
> problematic to have two entries `foobar' and `baz' in the word list
> (with some comments to explain the origin)? Even if there might exist
> valid Dutch words `fooba' and `foobarr' with a different hyphenation,
> everything can coexist in the word list.
One objection is that we store one hyphenation pattern per word. Another
is that for "déjà-vugevoel" we would get "vugevoel", which makes no
sense, since it is "gevoel" that is the existing word.
> Or does the case exist that, for example, you hyphenate `foo-bar#baz',
> but a standalone `foobar' would be hyphenated as `foob-ar'? I really
> doubt that.
>
>> At the moment we have about 6000 words that contain a hyphen in the
>> normal form and some are used very often. These can be broken down
>> into several categories. I will give you some examples.
>
> Thanks. Could you please analyze whether there are conflicts in the
> hyphenation of the components, that is, whether there are cases
> `foo-bar' vs. `foob-ar' as outlined above?
I know that for compounds with loanwords (which we use a lot, especially
in the computer industry) it will cause problems, because they mix the
spelling of different languages.
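
A check of the kind requested could look roughly like the following Python sketch. It assumes entries of the form `word;pattern`, with `#` standing for the hyphen of the normal form, as in the examples in this thread. The function name `component_conflicts` and the `foobar` entries are hypothetical. It splits each pattern at `#` and reports every component that occurs with more than one hyphenation.

```python
from collections import defaultdict


def component_conflicts(entries):
    """Map each component to its hyphenations if it has more than one.

    entries: iterable of 'word;pattern' strings, where '#' in the
    pattern marks the hyphen of the normal form (an assumption based
    on the example format in this thread).
    """
    seen = defaultdict(set)
    for entry in entries:
        _, pattern = entry.split(";", 1)
        for part in pattern.split("#"):
            # Strip the hyphenation marks to get the bare component.
            plain = part.replace("=", "").replace("-", "")
            seen[plain].add(part)
    return {word: sorted(hyph) for word, hyph in seen.items() if len(hyph) > 1}


entries = [
    "foobar-baz;foo-bar#baz",  # hypothetical compound
    "foobar;foob-ar",          # hypothetical stand-alone word
]
print(component_conflicts(entries))
```

An empty result would mean that the components could be listed separately without contradicting each other. It says nothing about whether a component such as "vugevoel" makes sense as an entry.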
>> Another one is
>>
>> zwart-wittelevisie;zwart#wit==te-le-vi-sie
>>
>> which means black-and-white television; also, 'wittelevisie' is not
>> a Dutch word. Here == has higher priority than #
>
> Hmm. Looking this up with google, I can only find
>
> zwart-witte televisie
>
> The word `zwart-wittetelevisie' has *not a single* hit! Is this
> really a correct entry?
Indeed, that is a not-so-good example from our list. However,
signaal-ruisverhouding (signal-to-noise ratio)
and the other examples I used do exist.
>> I think ~ is a good one.
>
> I like it too, but it is probably a bit too similar to `-'.
We have exhausted the subject, and ourselves, pretty thoroughly, I think.
For now, let's keep it at ~. Perhaps a better way to solve this will
present itself later on.
> Werner