digitalmars.D - Unicode, graphemes and D

bearophile <bearophileHUGS lycos.com> writes:

For people interested in better Unicode handling in D: I have seen that Perl
has some support for graphemes; /\X/ matches an extended grapheme cluster:

http://perldoc.perl.org/perl5120delta.html#Unicode-overhaul

http://perldoc.perl.org/perluniprops.html

Perl seems to be one of the best languages for handling Unicode (D and Go are good too):
http://rosettacode.org/wiki/String_length#Grapheme_Length_2
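
To make the distinction between code units, code points and graphemes concrete, here is a minimal sketch in D. It assumes a Phobos release whose std.uni provides byGrapheme, which as far as I know was not yet available at the time of this thread; graphemeCount is a hypothetical helper name used only for illustration.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

/// Counts user-perceived characters (extended grapheme clusters).
/// Hypothetical helper; assumes std.uni.byGrapheme is available.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string s = "e\u0301";

    writeln(s.length);          // number of UTF-8 code units
    writeln(s.walkLength);      // number of code points
    writeln(graphemeCount(s));  // number of grapheme clusters
}
```

For this input the three counts differ (3 code units, 2 code points, 1 grapheme), which is the same distinction Perl's /\X/ draws.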

Bye,
bearophile

Apr 05 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 05.04.2012 15:53, bearophile wrote:
> For people interested in better Unicode handling in D: I have seen that Perl
> has some support for graphemes; /\X/ matches an extended grapheme cluster:

> http://perldoc.perl.org/perl5120delta.html#Unicode-overhaul

> http://perldoc.perl.org/perluniprops.html

> Perl seems to be one of the best languages for handling Unicode (D and Go are good too):
> http://rosettacode.org/wiki/String_length#Grapheme_Length_2

> Bye,
> bearophile

FYI


-- 
Dmitry Olshansky

Apr 05 2012

"stephan" <stephanfmueller+dlang gmail.com> writes:

> FYI


Maybe helpful for your GSOC project: as part of a larger code 
base, we have implemented many standard Unicode algorithms 
(normalization; casefolding; graphemes; info like general 
category, Bidi class, joining type, etc.; ...).

The doc and source can be found at http://stephan.bitbucket.org/. 
As this was just a helper, it is not fully polished (but it works 
and is reasonably fast).

Apr 05 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 05.04.2012 18:56, stephan wrote:
>> FYI


> Maybe helpful for your GSOC project: as part of a larger code base, we
> have implemented many standard Unicode algorithms (normalization;
> casefolding; graphemes; info like general category, Bidi class, joining
> type, etc.; ...).

> The doc and source can be found at http://stephan.bitbucket.org/. As
> this was just a helper, it is not fully polished (but it works and is
> reasonably fast).

Nice.
I'll add a link to my proposal. Though I can use it iff the license is 
Boost compatible.

-- 
Dmitry Olshansky

Apr 05 2012

"stephan" <stephanfmueller+dlang gmail.com> writes:

On Thursday, 5 April 2012 at 16:17:46 UTC, Dmitry Olshansky wrote:
> Though I can use it iff the license is Boost compatible.

Ah, the licensing question. I am not a lawyer and I don't know 
much about copyright law. So you have to do your own research. 
But here is my view regarding the unicodedata.d license situation.

Our code is Boost licensed. It is, however, not a clean-room 
implementation. Although almost all algorithms and data structures 
are different and there is minimal (and clearly marked) direct 
copying, we have looked quite a bit at the ICU implementation 
(and its predecessors) for inspiration. The ICU license is very 
permissive, hence you should be OK here.

Furthermore, data files from the Unicode Consortium are part of 
the distribution. They are used in the "script mode" (version 
SCRIPT_DATA) to generate the relevant Unicode data in an 
appropriate format. They are also used in the extensive 
unit tests (version ALL_UNIT_TESTS) for testing correctness 
against various test files and derived property files. Again, the 
data files have a very permissive license.
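
For readers unfamiliar with D version identifiers, the following is only a sketch of how such build modes are typically selected. It is not taken from unicodedata.d, and buildMode is a hypothetical name used for illustration.

```d
import std.stdio : writeln;

// Hypothetical helper, not part of unicodedata.d: reports which mode was compiled in.
string buildMode()
{
    version (SCRIPT_DATA)
        return "generate data tables";   // compiled only with -version=SCRIPT_DATA
    else
        return "normal library build";
}

version (ALL_UNIT_TESTS)
{
    // Compiled only with -version=ALL_UNIT_TESTS; executed only when
    // the program is also built with -unittest.
    unittest
    {
        assert(buildMode().length > 0);
    }
}

void main()
{
    writeln(buildMode());
}
```

With dmd, the identifiers are set on the command line, e.g. `dmd -version=SCRIPT_DATA app.d` or `dmd -unittest -version=ALL_UNIT_TESTS app.d` (app.d being a placeholder file name).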

Let me know if I can be of any help.

Apr 05 2012
