[Trennmuster] Hyphenation patterns
    Werner LEMBERG 
    wl at gnu.org
                  
    Wed Mar 14 08:20:42 CET 2012
    
    
> I agree that · is lighter to read, used in dictionaries and UTF-8 is
> supported in many places.  Some people however have problems
> entering a ·, that was the reason for me to stick to ASCII only for
> delimiters.
It's just an editing mark, and a search-and-replace operation changes
this easily.  Another reason to make the set of hyphenation marks
configurable.
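
A minimal sketch of such a search-and-replace with a configurable mark
set, in Python.  The mapping and the function name are assumptions
made for illustration; they are not taken from any existing tool:

```python
# Hypothetical mapping from an ASCII editing mark to a display mark.
ASCII_TO_DISPLAY = {".": "·"}
# The reverse direction, for people who cannot easily type `·'.
DISPLAY_TO_ASCII = {v: k for k, v in ASCII_TO_DISPLAY.items()}


def convert_marks(entry: str, mapping: dict) -> str:
    """Replace every single-character mark in `entry` according to `mapping`."""
    return entry.translate(str.maketrans(mapping))


print(convert_marks("abc.abc abc=abc", ASCII_TO_DISPLAY))
```

Since the mapping is plain data, a different set of marks only needs a
different dictionary.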
> Nevertheless, for balance in the symbols I would like to suggest colon
> because it is clearer that something is going on than when using a dot, see
>     abc·abc   abc:abc   abc=abc   abc#abc
> compared to
>     abc·abc   abc.abc   abc=abc   abc#abc
I see it differently.  My idea is to increase grayness for larger
weights while still retaining something which resembles a hyphen:
  abc·abc abc-abc abc=abc abc==abc abc===abc===
Your `#' character actually fits into this scheme, but I consider it
too gray due to its large height, making it hard to quickly read the
parts before and after the hyphenation mark:
  abc·abc abc-abc abc=abc abc#abc abc##abc
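
To illustrate the scheme, here is a small Python sketch that reads the
marks as weights.  The numeric values are an assumption (only the
ordering `·' < `-' < `=' < `==' < `===' comes from the example above),
and the names are hypothetical:

```python
import re

# Assumed numeric weights; only their relative order matters here.
MARK_WEIGHTS = {"·": 1, "-": 2, "=": 3, "==": 4, "===": 5}

# Try longer marks first so that `===' is not read as three `='.
_MARK_RE = re.compile(
    "|".join(re.escape(m) for m in sorted(MARK_WEIGHTS, key=len, reverse=True))
)


def break_points(entry):
    """Return the unmarked word and a list of (offset, weight) pairs."""
    pieces = []
    points = []
    pos = 0
    for match in _MARK_RE.finditer(entry):
        pieces.append(entry[pos:match.start()])
        offset = sum(len(p) for p in pieces)  # position in the unmarked word
        points.append((offset, MARK_WEIGHTS[match.group()]))
        pos = match.end()
    pieces.append(entry[pos:])
    return "".join(pieces), points


print(break_points("abc==abc"))
```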
BTW, does your grammar support multiple hyphenation marks at one
place, that is, things like `==.'?
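
For what it's worth, a tokenizer that accepts several marks at one
place can be sketched in a few lines of Python.  The mark set below is
an assumption and would be configurable in practice:

```python
import re

MARKS = "·.-=#"  # assumed set of mark characters


def split_marks(entry, marks=MARKS):
    """Split an entry into letter runs and runs of consecutive marks."""
    pattern = re.compile("([" + re.escape(marks) + "]+)")
    # The capture group makes re.split keep the mark runs in the result.
    return [part for part in pattern.split(entry) if part]


print(split_marks("abc==.def"))
```

A run such as `==.' comes out as a single token, so a later step can
decide how to interpret the combination.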
And I've just remembered that we use another character (`_') to
indicate emergency hyphenation points (`Nottrennungen'):
  Tel-tow_er
The difference from `.' is conceptual: `.' belongs to the aesthetic
category (different people might have different opinions on whether
a hyphenation should be suppressed at this very place), while `_'
is related more to grammar and pronunciation.  The `w' character in
`Teltow' isn't spoken (it's a `Dehnungs-w') and belongs to the `o',
but the hyphenation after and not before the `w' looks strange:
   Die freiwillige Feuerwehr der Teltow-
   er Bürger ist sehr effizient.
>>> The idea is to use the hyphen as a normal letter of the language
>>> and it can be used to hyphenate with the highest priority.
I think I've now understood what you mean by `normal letter of the
language': It seems that you want to insert various hyphenation marks
into (longer) plain text.  In that case I fully agree that `-' must be
avoided since it already has a function in normal text.  However, this
restriction doesn't hold for word lists.
> So how would you make a hyphenation pattern for
>
>   Noord-Brabant
>
> we would do now
>
>   Noord-Brabant;Noord-Bra=bant
I wouldn't make a hyphenation pattern for `Noord-Brabant' at all.  A
solution within a word list is to add the full word as a comment, if
really necessary (and in most cases it isn't, since the parts before
and after the hyphen are already complete words in their own right):
  Bra-bant   % Noord-Brabant
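
Reading such a line is straightforward.  A minimal Python sketch,
assuming that `%' always starts a comment and never occurs inside an
entry (the function name is hypothetical):

```python
def parse_entry(line):
    """Split a word-list line into the hyphenated entry and its comment."""
    entry, _, comment = line.partition("%")
    # Return None as the comment if the line has none.
    return entry.strip(), comment.strip() or None


print(parse_entry("Bra-bant   % Noord-Brabant"))
```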
> We need to be able to preserve the '-' between Noord and Brabant and
> we are allowed to break on it (even preferred).  That is why we
> needed to introduce a third one and ended up with -=#
Your target is plain text and not word lists...
> What would also be possible is your scheme and we use # for hyphen
> in compounds that always needs to be shown and is preferred to break
> on:
>
>   Brabant;Bra-bant
>   Noord-Brabant;Noord#Bra-bant
>
> and for the other example we could use:
>
>   treinwagon;trein=wa-gon
>   goederentrein;goe-de-ren=trein
>   goederentreinwagon;goe-de-ren=trein=wa-gon
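
The `word;marked' entries quoted above can be checked mechanically.
The following Python sketch assumes the semantics described in the
quote (`#' stands for a hyphen that stays visible, while `-' and `='
are pure hyphenation marks); the function name is hypothetical:

```python
def entry_is_consistent(line):
    """Check that removing the marks from the second field gives the first."""
    word, _, marked = line.partition(";")
    # Drop the pure hyphenation marks first, then restore the visible hyphen.
    rebuilt = marked.replace("-", "").replace("=", "").replace("#", "-")
    return rebuilt == word


print(entry_is_consistent("Noord-Brabant;Noord#Bra-bant"))
print(entry_is_consistent("goederentreinwagon;goe-de-ren=trein=wa-gon"))
```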
Hmm.  I don't want to `win' this discussion.  I still think that using
two different sets for two different targets is the best solution.
    Werner
    
    