[Trennmuster] long-s hyphenation patterns

Guenter Milde milde at users.sf.net
Fri Jun 13 23:38:47 CEST 2014


Dear hyphenators,

On 11.06.14, Mojca Miklavec wrote:
> On Wed, Jun 11, 2014 at 10:21 PM, Werner LEMBERG wrote:

> > As mentioned in another mail, it's straightforward to have unified
> > patterns, provided a script generates the proper entries.

Patterns with ſ and without them do not interfere. This is similar to
spelling variants like Abbe/Abbé. However, as about 1/3 of German words
contain a "Binnen-S" (190000 out of 470000 words in our list), this would
mean "unified" UTF-8 pattern sets would be about 1/3 larger.
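
To illustrate what "a script generates the proper entries" could look like,
here is a minimal Python sketch. It is not the project's actual tooling. It
assumes entries use "=" for compound boundaries and "-" for other hyphenation
points (as in the two examples further down), and it applies a deliberately
simplified rule: "s" stays round at the end of a word or directly before "=",
and becomes "ſ" everywhere else. Real long-s orthography has more cases
(suffixes, foreign words) that this ignores. The function name is made up.

```python
def long_s_variant(entry: str) -> str:
    """Return a blackletter (long-s) spelling of a hyphenated entry.

    Simplified rule (an assumption, not the full orthography):
    's' stays round at the end of the word or directly before a '='
    compound boundary; every other lowercase 's' becomes 'ſ'.
    Uppercase 'S' is left unchanged.
    """
    out = []
    for i, ch in enumerate(entry):
        if ch == "s":
            at_end = i == len(entry) - 1
            before_boundary = not at_end and entry[i + 1] == "="
            out.append("s" if at_end or before_boundary else "ſ")
        else:
            out.append(ch)
    return "".join(out)


for entry in ["Aus=steu-er", "Wachs=tu-be", "Wach=stu-be"]:
    print(entry, "->", long_s_variant(entry))
```

Because the mapping is one character to one character, the hyphenation marks
stay exactly where they were. The generated entry differs only in spelling.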

OTOH, separate pattern sets would add one more dimension of variants:

* language variants 
  'de-1901':         # "traditional" (orthography of the 1901 reform)
  'de-1996':         # reformed orthography (1996)
  'de-x-GROSS':      # without ß (Swiss or ALL CAPS), general
  'de-1901-x-GROSS': # without ß (Swiss or ALL CAPS), "traditional"
  'de-CH-1996':      # = 'de-1996-x-GROSS', without ß (Swiss or ALL CAPS), "reformed"
  'de-CH-1901':      # without ß (Swiss), "traditional" ("süssauer")
  
* hyphenation variants (planned)
  
  permissive   ((almost) all valid hyphenation points)
  conservative (no confusing hyphenations)
  major        (only hyphenate at major morpheme boundaries)
  ... ?

* S-spelling (de-Latn, de-Latf)

This would result in about 6 x 3 x 2 = 36 different pattern sets just for
German.
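
The count can be checked by enumerating the combinations. This is only a
sketch: the language tags and hyphenation variants are the ones listed above,
while the two S-spelling labels are placeholders I made up for illustration.

```python
from itertools import product

# Language variants as listed above.
language_variants = [
    "de-1901", "de-1996", "de-x-GROSS",
    "de-1901-x-GROSS", "de-CH-1996", "de-CH-1901",
]
# Planned hyphenation variants as listed above.
hyphenation_variants = ["permissive", "conservative", "major"]
# Placeholder labels (assumption): one set without and one with long s.
s_spellings = ["round-s-only", "with-long-s"]

combinations = list(product(language_variants, hyphenation_variants, s_spellings))
print(len(combinations))  # 36
```

Dropping the S-spelling dimension halves this to 18 sets.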


However, I am not convinced that we need patterns with long-S at all:

* Similar to upper/lower case words, there are very few instances where
  distinguishing between ſ and s would allow for disambiguating the make-up
  of a word (and hence its spelling). These are the two rarely used words

    Wachstube;Wach[=s/s=]tu-be # Wachstube/Wachſtube
    Gastrasse;Gas[=t/t=]ras-se # Gastraſſe/Gaſtraſſe
    
  In all other 190 000 cases where the spelling for typesetting in
  blackletter fonts is different, this does not change the hyphenation.
  
  I.e. the patterns would become 1/3 larger without any additional
  information.

* In 8-bit TeX, all input conventions use "s" (round s) for both "ſ" and "s"
  and rely on the ligature mechanism of TeX fonts to map s at the end of a
  word to "s" and s in the middle of a word to "ſ".
  
  Pseudo-Ligatures for round-s in the middle of a compound (Ausſteuer, ...)
  vary between the various encodings of 8-bit blackletter TeX fonts.

  The most advanced font encoding (LF2) implemented by the "frakturx"
  package provides for the input convention «s"|», i.e. s followed by an
  explicit hyphenation point as defined by Babel's german.ldf
  («Ausſteuer» would be written «Aus"|steuer»). This convention is
  compatible with the existing hyphenation patterns, so there is no need for
  additional ſ-patterns with 8-bit TeX.
  
* In Unicode-aware TeX, pattern sets including long-ſ variants would allow
  for hyphenation of texts with correct S-spelling for typesetting in
  blackletter fonts (by now there are some Unicode blackletter fonts with
  the various s characters at the correct positions).
  
  However, LuaTeX provides the hooks for a modification of the hyphenation
  algorithm in a package so that any long ſ is mapped to round s prior to
  the lookup of hyphenation points (similar to downcasing capital letters).¹
  
¹ This mapping could possibly be achieved by defining the lc/uc values,
  but I am not sure about negative side-effects on \MakeUppercase and
  \MakeLowercase.
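
The idea of mapping ſ to s before the lookup can be sketched outside TeX.
The following Python snippet is not LuaTeX code and does not use any real
hyphenation library: TOY_BREAKS is a hypothetical stand-in for a pattern-based
hyphenator, and all names are made up. It only shows that, since ſ -> s is a
one-to-one character mapping, break positions found for the normalized word
can be applied unchanged to the original spelling.

```python
# Map long s to round s; this is a 1:1 character mapping.
LONG_S_TO_ROUND = str.maketrans({"ſ": "s"})

# Hypothetical stand-in for a pattern lookup: normalized word -> break offsets.
TOY_BREAKS = {
    "aussteuer": [3, 7],  # aus-steu-er
    "wasser": [3],        # was-ser
}


def normalize(word: str) -> str:
    """Fold long s to round s and downcase, as done before pattern lookup."""
    return word.translate(LONG_S_TO_ROUND).lower()


def hyphenate(word: str, sep: str = "-") -> str:
    """Insert sep at the break offsets found for the normalized word."""
    breaks = TOY_BREAKS.get(normalize(word), [])
    pieces = []
    start = 0
    for pos in breaks:
        pieces.append(word[start:pos])
        start = pos
    pieces.append(word[start:])
    return sep.join(pieces)


print(hyphenate("Ausſteuer"))  # Aus-ſteu-er
print(hyphenate("Aussteuer"))  # Aus-steu-er
```

Both spellings share one entry in the lookup table, which is the point: no
ſ-specific patterns are needed once the input is normalized.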
  

I am in favour of keeping the German hyphenation patterns free of ſ-spelling
variants unless someone convinces me that a 1/3 increase of the size of 
the pattern set for a rarely used orthography variant does not impair the
"normal" use of (Lua)TeX.


Günter
  
  
  


