[Trennmuster] Hyphenation patterns
Werner LEMBERG
wl at gnu.org
Wed Mar 14 08:20:42 CET 2012
> I agree that · is lighter to read, used in dictionaries and UTF-8 is
> supported in many places. Some people however have problems
> entering a ·, that was the reason for me to stick to ASCII only for
> delimiters.
It's just an editing mark, and a search-and-replace operation changes
it easily. This is another reason to make the set of hyphenation
marks configurable.
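To illustrate how small such a search-and-replace step is, here is a
minimal sketch in Python. The names `ASCII_TO_DISPLAY` and
`convert_marks` and the concrete `.' to `·' pairing are assumptions
made for this example only, not part of any existing tool.

```python
# Hypothetical mapping from an ASCII editing mark to a display mark.
# The pairing is an assumption; any single-character marks would work.
ASCII_TO_DISPLAY = {".": "·"}


def convert_marks(entry, mapping):
    """Replace every single-character mark in `entry` according to `mapping`."""
    return entry.translate(str.maketrans(mapping))


print(convert_marks("abc.abc", ASCII_TO_DISPLAY))  # abc·abc
```

Keeping the mapping in a table rather than in the code is what makes
the set of marks configurable.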
> Nevertheless, for balance in the symbols I would like to suggest colon
> because it is clearer that something is going on than when using a dot, see
> abc·abc abc:abc abc=abc abc#abc
> compared to
> abc·abc abc.abc abc=abc abc#abc
I see it differently. My idea is to increase grayness for larger
weights while still retaining something which resembles a hyphen:
abc·abc abc-abc abc=abc abc==abc abc===abc===
Your `#' character actually fits into this scheme, but I consider it
too gray due to its large height, making it hard to quickly read the
parts before and after the hyphenation mark:
abc·abc abc-abc abc=abc abc#abc abc##abc
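A rough sketch of how a tool might read such graded marks, in
Python. The numeric weights and the names `MARK_WEIGHTS` and
`split_marked` are hypothetical; only the ordering from light to
heavy follows the scheme above.

```python
import re

# Assumed ordering, lightest to heaviest; the numbers are hypothetical.
MARK_WEIGHTS = {"·": 0, "-": 1, "=": 2, "==": 3, "===": 4}

# Try longer marks first so that "===" is not read as "=" three times.
_MARK_RE = re.compile(
    "|".join(re.escape(m) for m in sorted(MARK_WEIGHTS, key=len, reverse=True))
)


def split_marked(entry):
    """Return (parts, weights) for a word containing hyphenation marks."""
    parts = _MARK_RE.split(entry)
    weights = [MARK_WEIGHTS[m] for m in _MARK_RE.findall(entry)]
    return parts, weights


print(split_marked("abc==abc"))  # (['abc', 'abc'], [3])
```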
BTW, does your grammar support multiple hyphenation marks at one
place, that is, things like `==.'?
And I've just remembered that we use another character (`_') to
indicate emergency hyphenation points (`Nottrennungen'):
Tel-tow_er
The difference from `.' is conceptual: `.' belongs to the aesthetic
category (different people might have different opinions on whether
a hyphenation should be suppressed at this very place), while `_' is
related more to grammar and pronunciation. The `w' character in
`Teltow' isn't spoken (it's a `Dehnungs-w', a lengthening w) and
belongs to the `o', but hyphenating after rather than before the `w'
looks strange:
Die freiwillige Feuerwehr der Teltow-
er Bürger ist sehr effizient.
>>> The idea is to use the hyphen as a normal letter of the language
>>> and it can be used to hyphenate with the highest priority.
I think I've now understood what you mean by `normal letter of the
language': It seems that you want to insert various hyphenation marks
into (longer) plain text. In that case I fully agree that `-' must be
avoided since it already has a function in normal text. However, this
restriction doesn't hold for word lists.
> So how would you make a hyphenation pattern for
>
> Noord-Brabant
>
> we would do now
>
> Noord-Brabant;Noord-Bra=bant
I wouldn't make a hyphenation pattern for `Noord-Brabant' at all. A
solution within a word list is to add the full word as a comment, if
really necessary (and in most cases it isn't, since the parts before
and after the hyphen are already complete words in their own right):
Bra-bant % Noord-Brabant
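Reading such a line is straightforward. A minimal sketch in Python,
assuming `%' starts a comment that runs to the end of the line; the
function name `parse_wordlist_line` is made up for this example.

```python
def parse_wordlist_line(line):
    """Split a word-list line into (entry, comment).

    Assumes everything after the first '%' is a comment.
    """
    entry, _, comment = line.partition("%")
    return entry.strip(), comment.strip()


print(parse_wordlist_line("Bra-bant % Noord-Brabant"))
# ('Bra-bant', 'Noord-Brabant')
```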
> We need to be able to preserve the '-' between Noord and Brabant and
> we are allowed to break on it (even preferred). That is why we
> needed to introduce a third one and ended up with -=#
Your target is plain text and not word lists...
> What would also be possible is your scheme and we use # for hyphen
> in compounds that always needs to be shown and is preferred to break
> on:
>
> Brabant;Bra-bant
> Noord-Brabant;Noord#Bra-bant
>
> and for the other example we could use:
>
> treinwagon;trein=wa-gon
> goederentrein;goe-de-ren=trein
> goederentreinwagon;goe-de-ren=trein=wa-gon
Hmm. I don't want to `win' this discussion. I still think that using
two different sets for two different targets is the best solution.
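For the quoted `word;marked' entries, a consistency check is easy to
write. The following Python sketch assumes the reading given in the
quote: `-' and `=' are break points that do not appear in the
spelling, while `#' stands for a hyphen that is always shown. The
function names are hypothetical.

```python
def restore_spelling(marked):
    """Rebuild the plain spelling from a marked-up form.

    Assumption: '-' and '=' are invisible break points, '#' is a real hyphen.
    """
    # Drop the invisible marks first, then turn '#' into the literal hyphen.
    plain = marked.replace("-", "").replace("=", "")
    return plain.replace("#", "-")


def check_entry(line):
    """Return True if the marked form after ';' spells the word before it."""
    word, _, marked = line.partition(";")
    return restore_spelling(marked) == word


print(check_entry("Noord-Brabant;Noord#Bra-bant"))  # True
print(check_entry("goederentreinwagon;goe-de-ren=trein=wa-gon"))  # True
```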
Werner