digitalmars.D - State of the Unicode in D

Walter Bright <newshound2 digitalmars.com> writes:
http://training.perl.com/OSCON2011/index.html

This is a good starting point for seeing where we are with Unicode support and 
where we need to go.
Jul 29 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
That \N syntax sugar could easily be replaced by a Phobos function
called via CTFE.
Jul 29 2011
KennyTM~ <kennytm gmail.com> writes:
On Jul 30, 11 07:37, Andrej Mitrovic wrote:
 That \N syntax sugar could easily be replaced by a Phobos function
 called via CTFE.

Possible, but don't do it :). The table would have something like 0x18000 entries (just a guess). If each character name is 20 letters long, Phobos would need to supply a 2 MB file for this rarely used feature. Besides, D already has '\&afr;'.

There are more important features, such as Unicode properties, normalization (á <-> a´), locale-specific casing (dotless i), collation, etc., that should be supported before having \N. (I'd prefer these be done via a wrapper around ICU, as most are database-based.)
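
To make the size estimate and the normalization and casing examples above concrete, here is a minimal sketch. It is written in Python only because its standard library ships a `unicodedata` module; it says nothing about how Phobos would or should implement any of this. The figures 0x18000 and 20 are the guesses from the post, and reading "a´" as "a" followed by U+0301 COMBINING ACUTE ACCENT is an assumption.

```python
import unicodedata

# Back-of-envelope size of a character-name table, using the post's guesses.
entries = 0x18000                       # guessed number of named characters
avg_name_len = 20                       # guessed average name length, in bytes
table_bytes = entries * avg_name_len    # 98304 * 20 = 1966080, just under 2 MB

# Normalization: the same visible text as one code point or as two.
precomposed = "\u00e1"                  # LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"                  # 'a' + COMBINING ACUTE ACCENT (assumed reading of "a´")
nfc = unicodedata.normalize("NFC", decomposed)    # composes to U+00E1
nfd = unicodedata.normalize("NFD", precomposed)   # decomposes to 'a' + U+0301

# Casing without a locale: the default mappings ignore Turkish rules.
dotless_upper = "\u0131".upper()        # dotless i uppercases to plain 'I'
plain_lower = "I".lower()               # gives 'i'; Turkish text would expect dotless i
```

The last two lines show why "locale-specific casing" needs more than a per-character table: the default mapping round-trips dotless i to a dotted i, so a Turkish-aware routine has to be told the locale explicitly.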
Jul 29 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 7/29/2011 4:24 PM, Walter Bright wrote:
 http://training.perl.com/OSCON2011/index.html

 This is a good starting point for seeing where we are with Unicode support and
 where we need to go.

One problem: http://d.puremagic.com/issues/show_bug.cgi?id=6403
Jul 29 2011
Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 30.07.2011 5:21, Walter Bright wrote:
 On 7/29/2011 4:24 PM, Walter Bright wrote:
 http://training.perl.com/OSCON2011/index.html

 This is a good starting point for seeing where we are with Unicode support and
 where we need to go.

 One problem: http://d.puremagic.com/issues/show_bug.cgi?id=6403

Let me expand a bit on my reply on Bugzilla.
There are other things I'd like to note besides conforming to the Unicode regex standard, which is (going to be) fully supported in the upcoming next-gen std.regex.

Things I'd love to see in an upgrade of std.uni:
- normalization (at least NFC)
- Unicode version 5.0 ---> 6.0
- grapheme support, via a special range on top of string, or at least a plain "stride" function that tells the length of a cluster, à la the one that does UTF-8 decoding

I had to (re)implement a lot of stuff, with the end result that the Unicode support in regex is self-contained right now.
Of course, I'd be willing to make arrangements to gradually shift some of this stuff back where it belongs once I'm finished with regexes.

--
Dmitry Olshansky
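
The "stride" idea in the list above can be sketched roughly as follows. This is an illustration in Python, not the std.uni or std.regex design: the names `grapheme_stride` and `graphemes` are hypothetical, and the cluster rule used here (one code point plus any following combining marks) is a deliberate simplification of the full Unicode segmentation rules, which also cover Hangul syllables, joiners, and more.

```python
import unicodedata


def grapheme_stride(text: str, index: int) -> int:
    """Return how many code points the cluster starting at text[index] spans.

    Simplified rule (an assumption for this sketch): a cluster is one code
    point followed by every code point whose canonical combining class is
    non-zero. Returns 0 when index is out of range.
    """
    if not 0 <= index < len(text):
        return 0
    end = index + 1
    while end < len(text) and unicodedata.combining(text[end]) != 0:
        end += 1
    return end - index


def graphemes(text: str):
    """Yield successive clusters of text, built on top of grapheme_stride."""
    i = 0
    while i < len(text):
        n = grapheme_stride(text, i)
        yield text[i:i + n]
        i += n


# 'a' + COMBINING ACUTE ACCENT + 'b' is three code points but two clusters.
sample = "a\u0301b"
clusters = list(graphemes(sample))
```

The stride function mirrors what a UTF-8 stride does one level down: it reports the length of the next unit without decoding it into a new object, and the range-like `graphemes` generator is then a thin loop over it.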
Jul 30 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 7/30/2011 12:09 PM, Dmitry Olshansky wrote:
 Let me expand a bit on my reply on Bugzilla.
 There are other things I'd like to note besides conforming to the Unicode regex
 standard, which is (going to be) fully supported in the upcoming next-gen std.regex.
 Things I'd love to see in an upgrade of std.uni:
 - normalization (at least NFC)
 - Unicode version 5.0 ---> 6.0
 - grapheme support, via a special range on top of string, or at least a plain
 "stride" function that tells the length of a cluster, à la the one that does
 UTF-8 decoding
 I had to (re)implement a lot of stuff, with the end result that the Unicode
 support in regex is self-contained right now.
 Of course, I'd be willing to make arrangements to gradually shift some of this
 stuff back where it belongs once I'm finished with regexes.

Sounds great!
Jul 30 2011