digitalmars.D - VLERange: a range in between BidirectionalRange and RandomAccessRange

spir (19/58) Jan 14 2011

On 01/14/2011 07:26 AM, Nick Sabalausky wrote:
 "Andrei Alexandrescu" <SeeWebsiteForEmail@erdani.org> wrote in message
 news:igoj6s$17r6$1@digitalmars.com...
 I'm not so sure about that. What do you base this assessment on? Denis
 wrote a library that according to him does grapheme-related stuff nobody
 else does. So apparently graphemes is not what people care about (although
 it might be what they should care about).


 It's what they want, they just don't know it.

 Graphemes are what many people *think* code points are.

 This might be a good time to see whether we need to address graphemes
 systematically. Could you please post a few links that would educate me
 and others in the mysteries of combining characters?


 Maybe someone else has a link to an explanation (I don't), but it's
 basically just this:


If anyone finds a pointer to such an explanation, bravo, and thank you.
(You will certainly not find it in Unicode literature, for instance.)
Nick's explanation below is good and concise. (Just 2 notes added.)

 Three levels of abstraction from lowest to highest:
 - Code Unit (ie, encoding)
 - Code Point (ie, what Unicode assigns distinct numbers to)
 - Grapheme (ie, what we think of as a "character")

 A code-point can be made up of one or more code-units. Likewise, a grapheme
 can be made up of one or more code-points.
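To make the three levels concrete, here is a small sketch in Python (used only for illustration; the thread is about D, and the helper name `code_units` is made up here). A Python str is a sequence of code points, so `len()` counts code points, while encoding to bytes exposes the code units of each encoding form.

```python
def code_units(s: str) -> dict:
    """Count the code units of s in the three Unicode encoding forms."""
    return {
        "utf-8": len(s.encode("utf-8")),             # 8-bit code units
        "utf-16": len(s.encode("utf-16-le")) // 2,   # 16-bit units, no BOM
        "utf-32": len(s.encode("utf-32-le")) // 4,   # 32-bit units, no BOM
    }

# One grapheme ("u" with umlaut) spelled as two code points:
# 'u' followed by U+0308 COMBINING DIAERESIS.
decomposed = "u\u0308"
# One code point outside the Basic Multilingual Plane:
# U+1D11E MUSICAL SYMBOL G CLEF.
clef = "\U0001D11E"

for label, s in [("u + U+0308", decomposed), ("U+1D11E", clef)]:
    # len() on a Python str counts code points, not code units or graphemes.
    points = [f"U+{ord(c):04X}" for c in s]
    print(label, "code points:", points, "code units:", code_units(s))
```

For the decomposed "ü" this gives 2 code points but 3 UTF-8 code units; the G clef is a single code point but 2 UTF-16 code units (a surrogate pair) and 4 UTF-8 code units.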

 There are (at least) two types of code points:

 - Regular ones, such as letters, digits, and punctuation.

 - "Combining Characters", such as accent marks (or if you're familiar with
 Japanese, the little things in the upper-right corner that change an "s" to
 a "z" or an "h" to a "p". Or like German's umlaut - the two dots above a
 vowel). Ie, things that are not characters in their own right, but merely
 modify other characters. These can often (always?) be thought of as being
 like overlays.
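A quick way to see the "overlay" behaviour, sketched with Python's standard `unicodedata` module (an illustration only, not anything from D's library): the same "ü" can be stored precomposed or as a base letter plus a combining mark, and only normalization makes the two compare equal. The kana lines mirror Nick's Japanese example using the combining voicing marks U+3099 and U+309A.

```python
import unicodedata

precomposed = "\u00fc"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS (one code point)
decomposed = "u\u0308"   # 'u' + U+0308 COMBINING DIAERESIS (two code points)

# Same visible character, different code point sequences.
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 1 2

# A nonzero canonical combining class marks a combining character.
for c in decomposed:
    print(f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))

# Canonical normalization (NFC composes, NFD decomposes) reconciles the two.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True

# The Japanese voicing marks as combining characters:
# sa + U+3099 (voiced sound mark) -> za, ha + U+309A (semi-voiced) -> pa.
for base_and_mark in ("\u3055\u3099", "\u306f\u309a"):
    composed = unicodedata.normalize("NFC", base_and_mark)
    print(f"U+{ord(composed):04X}")          # U+3056, then U+3071
```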


You can also say there are 2 kinds of characters: simple, like "u", and
composite, like "ü" or "ṵ̈̈".
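The simple/composite distinction can be illustrated with a deliberately simplified grapheme splitter. This is only a sketch under stated assumptions: it treats a grapheme as one base code point followed by any combining marks (general categories Mn, Mc, Me), which is far less than the full segmentation rules of UAX #29 (Hangul syllables, emoji sequences, CR LF and so on are not handled). The names `graphemes` and `kind` are hypothetical, and "composite" is decided after NFD decomposition so that a precomposed "ü" counts as composite, as in the note above.

```python
import unicodedata

def graphemes(s: str) -> list[str]:
    """Hypothetical, simplified splitter: each cluster is one base code point
    plus any following combining marks (categories Mn, Mc, Me). This is not
    the full UAX #29 algorithm (no Hangul, emoji ZWJ sequences, CR LF, ...)."""
    clusters: list[str] = []
    for c in s:
        if clusters and unicodedata.category(c) in ("Mn", "Mc", "Me"):
            clusters[-1] += c        # attach the mark to the preceding base
        else:
            clusters.append(c)       # start a new cluster
    return clusters

def kind(g: str) -> str:
    """'simple' if the cluster is one code point even after full canonical
    decomposition (NFD), otherwise 'composite'."""
    return "simple" if len(unicodedata.normalize("NFD", g)) == 1 else "composite"

# 'u', precomposed U+00FC, then 'u' + tilde below (U+0330) + two diaereses (U+0308).
text = "u\u00fcu\u0330\u0308\u0308"
for g in graphemes(text):
    print([f"U+{ord(c):04X}" for c in g], kind(g))
```

This splits the six code points into three clusters and reports them as simple, composite, composite.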

_________________
vita es estrany
spir.wikidot.com
