[Trennmuster] Hyphenation patterns

Werner LEMBERG wl at gnu.org
Wed Mar 14 08:20:42 CET 2012


> I agree that · is lighter to read, used in dictionaries and UTF-8 is
> supported in many places.  Some people however have problems
> entering a ·, that was the reason for me to stick to ASCII only for
> delimiters.

It's just an editing mark, and a search-and-replace operation can
change it easily.  This is another reason to make the set of
hyphenation marks configurable.
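A minimal sketch of such a configurable search-and-replace, assuming Python.  The mapping `TO_ASCII' and the function `convert_marks' are hypothetical names, and using `.' as the ASCII stand-in for `·' is only an illustration:

```python
import re

# Hypothetical mapping from one notation's marks to another's.
# Here the non-ASCII middle dot becomes an ASCII full stop.
TO_ASCII = {"·": "."}


def convert_marks(word: str, mapping: dict[str, str]) -> str:
    """Replace every hyphenation mark in `word` according to `mapping`.

    Longer marks are tried first so that a multi-character mark
    such as '==' is not split into two '=' marks.
    """
    if not mapping:
        return word
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], word)


print(convert_marks("abc·abc", TO_ASCII))  # abc.abc
```

Because the mapping is plain data, switching between an ASCII-only set and one with `·' is a matter of swapping dictionaries.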

> Nevertheless, for balance in the symbols I would like to suggest colon
> because it is clearer that something is going on than when using a dot, see
>     abc·abc   abc:abc   abc=abc   abc#abc
> compared to
>     abc·abc   abc.abc   abc=abc   abc#abc

I see it differently.  My idea is to increase grayness for larger
weights while still retaining something which resembles a hyphen:

  abc·abc abc-abc abc=abc abc==abc abc===abc===

Your `#' character actually fits into this scheme, but I consider it
too gray due to its large height, making it hard to quickly read the
parts before and after the hyphenation mark:

  abc·abc abc-abc abc=abc abc#abc abc##abc
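The grayness ordering can be turned into numeric weights mechanically.  A small sketch, assuming Python: it strips the marks from a word and reports each break point with a weight taken from the mark's position in the ordered list `·', `-', `=', `==', `==='.  The names `MARKS_BY_WEIGHT' and `breaks_with_weights' and the numbers 1 to 5 are hypothetical:

```python
import re

# Marks ordered from lightest (lowest weight) to grayest (highest weight).
MARKS_BY_WEIGHT = ["·", "-", "=", "==", "==="]
WEIGHT = {mark: rank for rank, mark in enumerate(MARKS_BY_WEIGHT, start=1)}

# Try longer marks first so that '===' is not read as '==' followed by '='.
MARK_RE = re.compile(
    "|".join(re.escape(m) for m in sorted(MARKS_BY_WEIGHT, key=len, reverse=True))
)


def breaks_with_weights(marked: str) -> tuple[str, list[tuple[int, int]]]:
    """Return the unmarked word and its (offset, weight) break points.

    `offset` is the number of letters in front of the break.
    """
    parts: list[str] = []
    breaks: list[tuple[int, int]] = []
    pos = 0
    letters = 0
    for m in MARK_RE.finditer(marked):
        chunk = marked[pos:m.start()]
        parts.append(chunk)
        letters += len(chunk)
        breaks.append((letters, WEIGHT[m.group(0)]))
        pos = m.end()
    parts.append(marked[pos:])
    return "".join(parts), breaks


print(breaks_with_weights("goe-de-ren=trein"))
# ('goederentrein', [(3, 2), (5, 2), (8, 3)])
```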

BTW, does your grammar support multiple hyphenation marks at one
place, that is, things like `==.'?
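One way a grammar could allow this is to treat any maximal run of mark characters as a single place and then split the run into individual marks.  A minimal sketch, assuming Python; the mark inventory, the longest-match splitting rule, and the name `mark_clusters' are hypothetical choices, not part of either notation discussed here:

```python
import re

# Hypothetical inventory of mark characters; a maximal run of them
# between letters counts as one place carrying several marks.
MARK_CHARS = "·-=._"
KNOWN_MARKS = ["===", "==", "=", "-", "·", ".", "_"]  # longest first

CLUSTER_RE = re.compile("[" + re.escape(MARK_CHARS) + "]+")
SPLIT_RE = re.compile("|".join(re.escape(m) for m in KNOWN_MARKS))


def mark_clusters(marked: str) -> list[tuple[int, list[str]]]:
    """Return (offset, marks) for every place that carries marks.

    `offset` is the number of letters before the place; `marks` lists
    the individual marks found there, e.g. ['==', '.'] for '==.'.
    """
    result = []
    letters = 0
    pos = 0
    for m in CLUSTER_RE.finditer(marked):
        letters += m.start() - pos
        result.append((letters, SPLIT_RE.findall(m.group(0))))
        pos = m.end()
    return result


print(mark_clusters("abc==.abc"))  # [(3, ['==', '.'])]
```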

And I've just remembered that we use another character (`_') to
indicate emergency hyphenation points (`Nottrennungen'):

  Tel-tow_er

The difference from `.' is conceptual: using `.' belongs to the
aesthetic category (different people might have different opinions
on whether a hyphenation should be suppressed at this very place),
while `_' is related more to grammar and pronunciation.  The `w'
character in `Teltow' isn't spoken (it's a `Dehnungs-w', literally a
`lengthening w') and belongs to the `o', but hyphenating after rather
than before the `w' looks strange:

   Die freiwillige Feuerwehr der Teltow-
   er Bürger ist sehr effizient.

   (`The volunteer fire brigade of the Teltow citizens is very
   efficient.')
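A minimal sketch, assuming Python, of how a line breaker might treat the two classes: regular break points (`-') are tried first, and emergency points (`_') are used only when no regular point fits.  The function `choose_break' and its `width' parameter are hypothetical, and a real paragraph builder would weigh such choices with penalties rather than a hard rule:

```python
def choose_break(marked: str, width: int):
    """Split `marked` so that the first part plus a hyphen fits `width`.

    '-' marks a regular break point and '_' an emergency break point
    (Nottrennung).  Regular points are always preferred; emergency
    points are considered only if no regular point fits.  Within one
    class the rightmost fitting point wins.  Returns (first, rest),
    or None if no break point fits.
    """
    regular: list[int] = []
    emergency: list[int] = []
    letters: list[str] = []
    for ch in marked:
        if ch == "-":
            regular.append(len(letters))
        elif ch == "_":
            emergency.append(len(letters))
        else:
            letters.append(ch)
    plain = "".join(letters)
    for candidates in (regular, emergency):
        fitting = [p for p in candidates if p + 1 <= width]  # +1 for the hyphen
        if fitting:
            p = max(fitting)
            return plain[:p] + "-", plain[p:]
    return None
```

With `Tel-tow_er' and a width of 7 this returns `Tel-' / `tower' even though `Teltow-' would fit too, because the regular point wins; a hypothetical entry `Teltow_er' without a regular point would give `Teltow-' / `er'.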

>>> The idea is to use the hyphen as a normal letter of the language
>>> and it can be used to hyphenate with the highest priority.

I think I've now understood what you mean by `normal letter of the
language': it seems that you want to insert various hyphenation marks
into (longer) plain text.  In that case I fully agree that `-' must be
avoided, since it already has a function in normal text.  However, this
restriction doesn't hold for word lists.
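The same marked string reads differently depending on the target.  A sketch, assuming Python and two hypothetical mark sets: in a word list `-' is a hyphenation mark and is removed, while in plain text it is an ordinary character and must survive.  `WORD_LIST_MARKS', `PLAIN_TEXT_MARKS' and `strip_marks' are illustrative names only:

```python
# Hypothetical mark sets, one per target.
# In a word list, '-' and '.' can safely serve as marks.
WORD_LIST_MARKS = {"-", "=", ".", "·", "_"}
# In plain text '-' and '.' already have a function, so they are
# left out and only characters without another role act as marks.
PLAIN_TEXT_MARKS = {"=", "·", "_"}


def strip_marks(marked: str, marks: set[str]) -> str:
    """Remove every single-character hyphenation mark listed in `marks`."""
    return "".join(ch for ch in marked if ch not in marks)


print(strip_marks("Bra-bant", WORD_LIST_MARKS))         # Brabant
print(strip_marks("Noord-Bra=bant", PLAIN_TEXT_MARKS))  # Noord-Brabant
```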

> So how would you make a hyphenation pattern for
>
>   Noord-Brabant
>
> we would do now
>
>   Noord-Brabant;Noord-Bra=bant

I wouldn't make a hyphenation pattern for `Noord-Brabant' at all.  A
solution within a word list is to add the full word as a comment, if
really necessary (and in most cases it isn't, since the parts before
and after the hyphen are already fully qualified words in their own
right):

  Bra-bant   % Noord-Brabant
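A minimal reader for such a word list, assuming Python and assuming, as in the example, that `%' starts a comment running to the end of the line; `read_word_list' is a hypothetical name:

```python
def read_word_list(lines):
    """Yield (entry, comment) pairs from a word list.

    Everything after the first '%' is treated as a comment; blank
    lines and comment-only lines are skipped.
    """
    for line in lines:
        entry, sep, comment = line.partition("%")
        entry = entry.strip()
        if not entry:
            continue
        yield entry, (comment.strip() if sep else "")


for entry, comment in read_word_list(["  Bra-bant   % Noord-Brabant"]):
    print(entry, "|", comment)  # Bra-bant | Noord-Brabant
```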

> We need to be able to preserve the '-' between Noord and Brabant and
> we are allowed to break on it (even preferred).  That is why we
> needed to introduce a third one and ended up with -=#

Your target is plain text and not word lists...

> What would also be possible is your scheme and we use # for hyphen
> in compounds that always needs to be shown and is preferred to break
> on:
>
>   Brabant;Bra-bant
>   Noord-Brabant;Noord#Bra-bant
>
> and for the other example we could use:
>
>   treinwagon;trein=wa-gon
>   goederentrein;goe-de-ren=trein
>   goederentreinwagon;goe-de-ren=trein=wa-gon
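Lines in this `word;hyphenated' format can be checked mechanically: removing the marks from the second field should reproduce the first.  A sketch, assuming Python and the semantics described in the quote (`-' and `=' are invisible break marks, `#' is a hyphen that is always shown and is also a preferred break); `unmark' and `check_entry' are hypothetical names:

```python
def unmark(hyphenated: str) -> str:
    """Turn a marked form back into the plain word.

    '#' becomes a visible hyphen; '-' and '=' are dropped.
    """
    out = []
    for ch in hyphenated:
        if ch == "#":
            out.append("-")
        elif ch in "-=":
            continue
        else:
            out.append(ch)
    return "".join(out)


def check_entry(line: str) -> bool:
    """Return True if the two ';'-separated fields agree."""
    word, sep, hyphenated = line.partition(";")
    if not sep:
        return False
    return unmark(hyphenated) == word


print(check_entry("Noord-Brabant;Noord#Bra-bant"))  # True
print(check_entry("Noord-Brabant;Noord-Bra-bant"))  # False
```

The second case shows why, in this scheme, the hyphen that must stay visible needs its own symbol.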

Hmm.  I don't want to `win' this discussion.  I still think that using
two different sets of marks for two different targets (word lists and
plain text) is the best solution.


    Werner


