[Trennmuster] Please review hyphenation file format

Guenter Milde milde at users.sf.net
Thu Apr 19 18:40:44 CEST 2012


On 19.04.12, Pander wrote:
> Hi all,

> I have been documenting much of what has been discussed on this list to
> define an international file format for hyphenation patterns.

> The document is far from finished but could you please have a look at it
> and give me feedback for the data which is in there at the moment,
> especially on the section called priority.

Some more comments:

> 2.1. Basic layout
> 
- Each file is in Unicode encoding ...

+ The files use the Unicode character set in UTF-8 encoding.

While the German wortliste currently uses Latin-1, UTF-8 seems the best
choice for a common encoding for all languages that need hyphenation patterns.
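
Converting an existing Latin-1 list would be cheap. A minimal sketch,
assuming the data is valid Latin-1 (the function name is only an
illustration, not part of any existing tool):

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8 (hypothetical helper)."""
    # Every byte value is a valid Latin-1 character, so decoding cannot fail.
    return data.decode("latin-1").encode("utf-8")


# "Abdrücken" as it would be stored in a Latin-1 file: ü is the single byte 0xFC.
latin1_entry = b"Abdr\xfccken"
utf8_entry = latin1_to_utf8(latin1_entry)
print(utf8_entry.decode("utf-8"))
```

Non-ASCII letters such as ü take two bytes in UTF-8 instead of one, so
entries containing them grow slightly.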

- 2.2. Alternated spelling induced hyphenations 

+ 2.2. Alternated spelling induced by hyphenation

(or some easier-to-read alternative).


> German noun:
>   Abdrücken (Ab-drücken/Abdrük-ken) prints [plural noun]
  
„abdrücken“ is a German verb (to pull the trigger, to squeeze off) that
(like any German verb) can be turned into a noun (das Abdrücken: the act of
pulling the trigger), which cannot be used in the plural.

- In German it is also possible to have doubling of consonants in
- digraphs when hyphenating.
- ...

In traditional German orthography (de-1901), compound words like «Vollast»
(voll + Last) [lit: full load] drop one of three identical consonants.
However, when such a word is hyphenated at the word joint, the
omitted consonant reappears:

  Vollast -> Voll-last

The hyphenation pattern for this case is

  Vo{ll/ll=l}ast
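
To illustrate how a program could read this notation, here is a rough
sketch. It assumes the braces hold {unbroken/broken} spellings and that
"=" marks the break inside the broken form; the function names are made
up for this example:

```python
import re

# Assumed notation: {unbroken/broken}, with "=" marking the break point
# inside the broken alternative.
SPELLING_CHANGE = re.compile(r"\{([^/{}]*)/([^/{}]*)\}")


def unbroken(pattern: str) -> str:
    """Spelling used when the word is not broken at the special point."""
    return SPELLING_CHANGE.sub(lambda m: m.group(1), pattern)


def broken(pattern: str, hyphen: str = "-") -> str:
    """Spelling used when the word is broken at the special point."""
    return SPELLING_CHANGE.sub(lambda m: m.group(2), pattern).replace("=", hyphen)


print(unbroken("Vo{ll/ll=l}ast"))  # Vollast
print(broken("Vo{ll/ll=l}ast"))    # Voll-last
```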


(For more suggestions on formatting this section see also my previous post.)


- 2.3. Multiple hyphenation
+ 2.3. Ambiguous hyphenation

Suggestion:

In some cases a word can be hyphenated in different ways depending on the
meaning of the word. Square brackets mark alternatives inside a
hyphenation pattern, e.g. 

   val[k=/=k]uil
   
for the Dutch heteronyms   

  valkuil (valk-uil) ninox [lit: falcon owl]
  valkuil (val-kuil) trapping pit [lit: trap pit]

Automatic hyphenation should not use ambiguous hyphenation points;
interactive hyphenation programs may suggest the alternatives.

German examples:

  Wachstube;Wach[=s/s=]tu-be
  
with the meanings

  guardroom (Wach + Stube)
  wax tube
  
and

  Wales;Wa[-/]les
  
with the meanings
  
  Wales (part of the UK) 
  whale's (genitive of «Wal»)

Only the latter can be hyphenated.
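
A sketch of how the square-bracket notation could be processed, assuming
each bracket holds exactly two alternatives separated by "/" and that both
alternatives contain the same letters (function names are hypothetical):

```python
import re

# Assumed notation: [first/second], where the alternatives differ only in
# the position (or presence) of the break marks "=" and "-".
AMBIGUOUS = re.compile(r"\[([^/\[\]]*)/([^/\[\]]*)\]")


def readings(pattern: str) -> list[str]:
    """All hyphenations an interactive program could offer."""
    m = AMBIGUOUS.search(pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    return [head + alt + rest for alt in m.groups() for rest in readings(tail)]


def automatic(pattern: str) -> str:
    """Keep only unambiguous break points, for automatic hyphenation."""
    # Drop the break marks inside the brackets but keep the letters.
    return AMBIGUOUS.sub(lambda m: re.sub(r"[=-]", "", m.group(1)), pattern)


for entry in ("val[k=/=k]uil", "Wach[=s/s=]tu-be", "Wa[-/]les"):
    print(entry, readings(entry), automatic(entry))
```

For «Wachstube» the automatic form keeps only the unambiguous break
before "be", while the interactive form lists both readings.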


- 2.4. Emergency hyphenation 

Please drop.


4. Reserved characters 

We are considering one more character to mark the breaking of ligatures.
Stay tuned.

Günter



