digitalmars.D - VLERange: a range in between BidirectionalRange and RandomAccessRange
- spir <denis.spir gmail.com> Jan 14 2011
On 01/14/2011 07:26 AM, Nick Sabalausky wrote:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message news:igoj6s$17r6$1 digitalmars.com...
I'm not so sure about that. What do you base this assessment on?
wrote a library that according to him does grapheme-related stuff
else does. So apparently graphemes is not what people care about
it might be what they should care about).
It's what they want, they just don't know it. Graphemes are what many people *think* code points are.
This might be a good time to see whether we need to address graphemes systematically. Could you please post a few links that would
and others in the mysteries of combining characters?
Maybe someone else has a link to an explanation (I don't), but it's basically just this:
(You will certainly not find it in Unicode literature, for instance.) Nick's explanation below is good and concise. (Just 2 notes added.)

Three levels of abstraction, from lowest to highest:
- Code Unit (ie, encoding)
- Code Point (ie, what Unicode assigns distinct numbers to)
- Grapheme (ie, what we think of as a "character")

A code-point can be made up of one or more code-units. Likewise, a grapheme can be made up of one or more code-points.

There are (at least) two types of code points:
- Regular ones, such as letters, digits, and punctuation.
- "Combining Characters", such as accent marks (or if you're
Japanese, the little things in the upper-right corner that change
a "z" or an "h" to a "p". Or like German's umlaut - the two dots
vowel). Ie, things that are not characters in their own right, but
modify other characters. These can often (always?) be thought of
composite "ü" or "ṵ̈̈".

_________________
vita es estrany
spir.wikidot.com
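Nick's three levels (code unit, code point, grapheme) and his description of combining characters can be made concrete with a small sketch. It is written in Python only as a language-neutral illustration, assuming UTF-8 as the encoding (which is what D's `string` uses). The helper names `code_units`, `code_points` and `simple_graphemes` are hypothetical, and `simple_graphemes` is a deliberate simplification: it just attaches every code point with a nonzero canonical combining class to the preceding base character, exactly as described above. Real grapheme segmentation (Unicode UAX #29, e.g. `std.uni.byGrapheme` in Phobos) handles many more cases.

```python
import unicodedata
from typing import List


def code_units(s: str) -> int:
    """Number of UTF-8 code units (bytes) needed to encode s."""
    return len(s.encode("utf-8"))


def code_points(s: str) -> int:
    """Number of Unicode code points; Python's len() on str counts these."""
    return len(s)


def simple_graphemes(s: str) -> List[str]:
    """Hypothetical, simplified grapheme grouping.

    Each code point with a nonzero canonical combining class is attached
    to the cluster before it; anything else starts a new cluster.
    UAX #29 also covers Hangul, emoji/ZWJ sequences, CR LF, spacing marks, etc.
    """
    clusters: List[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch   # combining mark: modifies the previous character
        else:
            clusters.append(ch)  # regular code point: begins a new grapheme
    return clusters


if __name__ == "__main__":
    precomposed = "\u00fc"  # 'ü' as one code point
    decomposed = "u\u0308"  # 'u' followed by U+0308 COMBINING DIAERESIS
    for label, s in (("precomposed", precomposed), ("decomposed", decomposed)):
        # columns: UTF-8 code units, code points, (simplified) graphemes
        print(label, code_units(s), code_points(s), len(simple_graphemes(s)))
    # Same user-perceived character, different code point sequences:
    print(precomposed == decomposed)
    # NFC normalization composes u + U+0308 back into U+00FC:
    print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Running it prints `precomposed 2 1 1`, `decomposed 3 2 1`, `False`, `True`: both strings are one grapheme, but they differ at the code point and code unit levels, and only compare equal after normalization.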