digitalmars.D - VLERange: a range in between BidirectionalRange and RandomAccessRange

On 01/14/2011 07:26 AM, Nick Sabalausky wrote:
 "Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org>  wrote in message
 news:igoj6s$17r6$1 digitalmars.com...
 I'm not so sure about that. What do you base this assessment on? [...]
 wrote a library that according to him does grapheme-related stuff [...]
 else does. So apparently graphemes is not what people care about [...]
 it might be what they should care about).


It's what they want, they just don't know it. Graphemes are what many people *think* code points are.
 This might be a good time to see whether we need to address graphemes
 systematically. Could you please post a few links that would [...]
 and others in the mysteries of combining characters?


Maybe someone else has a link to an explanation (I don't), but it's basically just this:


(You will certainly not find it in Unicode literature, for instance.) Nick's explanation below is good and concise. (Just 2 notes added.)
 Three levels of abstraction from lowest to highest:
 - Code Unit (ie, encoding)
 - Code Point (ie, what Unicode assigns distinct numbers to)
 - Grapheme (ie, what we think of as a "character")

 A code-point can be made up of one or more code-units. Likewise, a grapheme
 can be made up of one or more code-points.
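
To make the three levels concrete, here is a minimal Python sketch (Python rather than D, and not part of Nick's post). It counts code units, code points, and graphemes for the same string. The helper names are made up for illustration, and the grapheme grouping is only an approximation: it attaches combining marks to the preceding code point and does not implement the full Unicode segmentation rules.

```python
import unicodedata


def utf8_code_units(s: str) -> int:
    """Number of UTF-8 code units (bytes) needed to encode s."""
    return len(s.encode("utf-8"))


def utf16_code_units(s: str) -> int:
    """Number of UTF-16 code units (2 bytes each) needed to encode s."""
    return len(s.encode("utf-16-le")) // 2


def code_points(s: str) -> int:
    """Number of code points; a Python str is a sequence of code points."""
    return len(s)


def approx_graphemes(s: str) -> list[str]:
    """Rough grapheme split: glue each combining mark onto the previous cluster.

    Assumption: this is a simplification, not full Unicode text segmentation.
    """
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # combining mark: extend the current cluster
        else:
            clusters.append(ch)  # base code point: start a new cluster
    return clusters


if __name__ == "__main__":
    text = "u\u0308"  # 'u' followed by COMBINING DIAERESIS
    print(utf8_code_units(text), utf16_code_units(text),
          code_points(text), len(approx_graphemes(text)))
```

For the decomposed "u" plus diaeresis, the three levels give three different lengths: 3 UTF-8 code units, 2 code points, and 1 grapheme.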

 There are (at least) two types of code points:

 - Regular ones, such as letters, digits, and punctuation.

 - "Combining Characters", such as accent marks (or if you're [...]
 Japanese, the little things in the upper-right corner that change [...]
 a "z" or an "h" to a "p". Or like German's umlaut - the two dots [...]
 vowel). Ie, things that are not characters in their own right, but [...]
 modify other characters. These can often (always?) be thought of [...]
 like overlays.
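
As an illustration of the "overlay" idea, here is a small Python sketch (not from the original post) using the standard `unicodedata` module. It shows that the same visible character can be stored either as one precomposed code point or as a base code point plus a combining mark, and that the two forms only compare equal after normalization. The helper name `same_text` is hypothetical.

```python
import unicodedata

PRECOMPOSED_U_UMLAUT = "\u00fc"   # single code point: LATIN SMALL LETTER U WITH DIAERESIS
DECOMPOSED_U_UMLAUT = "u\u0308"   # 'u' + COMBINING DIAERESIS (the "overlay")

HIRAGANA_HA = "\u306f"            # "ha"
HANDAKUTEN = "\u309a"             # combining mark that turns "h" into "p"
HIRAGANA_PA = "\u3071"            # precomposed "pa"


def is_combining(ch: str) -> bool:
    """True if ch has a non-zero canonical combining class."""
    return unicodedata.combining(ch) != 0


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # A naive code-point comparison sees two different strings.
    print(PRECOMPOSED_U_UMLAUT == DECOMPOSED_U_UMLAUT)
    # After normalization they are the same text.
    print(same_text(PRECOMPOSED_U_UMLAUT, DECOMPOSED_U_UMLAUT))
    # The Japanese case works the same way: base + mark composes to "pa".
    print(unicodedata.normalize("NFC", HIRAGANA_HA + HANDAKUTEN) == HIRAGANA_PA)
```

Normalization only helps when a precomposed form exists. A base with an arbitrary stack of marks stays several code points long in every normalization form, which is why grapheme-level handling is a separate problem.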


composite "ü" or "ṵ̈̈".
_________________
vita es estrany
spir.wikidot.com
Jan 14 2011