
digitalmars.D - VLERange: a range in between BidirectionalRange and RandomAccessRange

On 01/14/2011 07:26 AM, Nick Sabalausky wrote:
 "Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org>  wrote in message
 news:igoj6s$17r6$1 digitalmars.com...
 I'm not so sure about that. What do you base this assessment on? Denis wrote a library that according to him does grapheme-related stuff nobody else does. So apparently graphemes is not what people care about (although it might be what they should care about).
It's what they want, they just don't know it. Graphemes are what many people *think* code points are.
 This might be a good time to see whether we need to address graphemes systematically. Could you please post a few links that would educate me and others in the mysteries of combining characters?
Maybe someone else has a link to an explanation (I don't), but it's basically just this:
If anyone finds a pointer to such an explanation, bravo, and thank you. (You will certainly not find it in Unicode literature, for instance.) Nick's explanation below is good and concise. (Just 2 notes added.)
 Three levels of abstraction from lowest to highest:
 - Code Unit (ie, encoding)
 - Code Point (ie, what Unicode assigns distinct numbers to)
 - Grapheme (ie, what we think of as a "character")

 A code-point can be made up of one or more code-units. Likewise, a grapheme can be made up of one or more code-points.
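
To make the first two levels concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself has no code, and `count_units` is a hypothetical helper name). It counts UTF-8 code units, UTF-16 code units, and code points for the same text, assuming Python 3 strings, which are indexed by code point.

```python
def count_units(text):
    """Return (UTF-8 code units, UTF-16 code units, code points) for text."""
    utf8_units = len(text.encode("utf-8"))            # one code unit = 1 byte
    utf16_units = len(text.encode("utf-16-le")) // 2  # one code unit = 2 bytes
    code_points = len(text)                           # Python 3 str counts code points
    return utf8_units, utf16_units, code_points


precomposed = "\u00fc"   # "ü" as a single code point
decomposed = "u\u0308"   # "u" followed by COMBINING DIAERESIS

print(count_units(precomposed))  # (2, 1, 1)
print(count_units(decomposed))   # (3, 2, 2)
```

Both strings display as the same single "character", yet none of the three counts agree between them. That gap is the grapheme level.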

 There are (at least) two types of code points:

 - Regular ones, such as letters, digits, and punctuation.

 - "Combining Characters", such as accent marks (or if you're familiar with Japanese, the little things in the upper-right corner that change an "s" to a "z" or an "h" to a "p". Or like German's umlaut - the two dots above a vowel). Ie, things that are not characters in their own right, but merely modify other characters. These can often (always?) be thought of as being like overlays.
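
The "overlay" idea can be sketched roughly as follows, again in Python for illustration. This is a deliberate simplification and not the full Unicode text segmentation algorithm: it assumes that any code point whose general category starts with "M" (a mark) attaches to the preceding base character. The names `is_combining` and `naive_graphemes` are hypothetical helpers; only the standard `unicodedata` module is used.

```python
import unicodedata


def is_combining(ch):
    """True if ch is a mark (general category Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


def naive_graphemes(text):
    """Group each base code point with the marks that follow it.

    Simplified sketch: ignores Hangul jamo, emoji sequences and the
    other cases a complete grapheme segmenter has to handle.
    """
    clusters = []
    for ch in text:
        if clusters and is_combining(ch):
            clusters[-1] += ch   # overlay the mark on the previous base
        else:
            clusters.append(ch)  # start a new grapheme
    return clusters


umlaut = "u\u0308"        # "u" + COMBINING DIAERESIS
dakuten = "\u3055\u3099"  # hiragana "sa" + combining voiced sound mark

print(len(umlaut), len(naive_graphemes(umlaut)))    # 2 1
print(len(dakuten), len(naive_graphemes(dakuten)))  # 2 1

# Canonical composition (NFC) folds these pairs into one code point
# when a precomposed form exists.
print(unicodedata.normalize("NFC", umlaut) == "\u00fc")   # True
print(unicodedata.normalize("NFC", dakuten) == "\u3056")  # True
```

Normalization only helps where a precomposed code point exists; a base with an arbitrary stack of marks stays multi-code-point, so the grouping step is still needed to count "characters".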
You can also say there are 2 kinds of characters: simple like "u" & composite "ü" or "ṵ̈̈".

_________________
vita es estrany
spir.wikidot.com
Jan 14 2011