digitalmars.D.bugs - [Issue 7085] New: std.algorithm.reverse() problem with Unicode dchar[]
http://d.puremagic.com/issues/show_bug.cgi?id=7085

           Summary: std.algorithm.reverse() problem with Unicode dchar[]
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc


--- Comment #0 from bearophile_hugs eml.cc 2011-12-09 01:32:52 PST ---
This code compiles and runs without raising an assert error, so reverse() is giving a wrong result on a dchar[]:

import std.algorithm: reverse;
void main() {
    dchar[] txt = "\U00000041\U00000308\U00000042"d.dup;
    txt.reverse();
    assert(txt == "\U00000042\U00000308\U00000041"d);
}

txt contains LATIN CAPITAL LETTER A, COMBINING DIAERESIS, LATIN CAPITAL LETTER B. See bug 7084 for more details.

A more correct output for reversing txt is (LATIN CAPITAL LETTER B, LATIN CAPITAL LETTER A, COMBINING DIAERESIS):
"\U00000042\U00000041\U00000308"d

or even (LATIN CAPITAL LETTER B, LATIN CAPITAL LETTER A WITH DIAERESIS), although this changes the array size and is not necessary:
"\U00000042\U000000C4"d

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Dec 09 2011
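The second output suggested in the report (B followed by the precomposed LATIN CAPITAL LETTER A WITH DIAERESIS) can be reached by normalizing to NFC before reversing. This is a minimal sketch that assumes a Phobos release recent enough to include std.uni.normalize; composeThenReverse is a hypothetical helper name, not a Phobos function. It only helps when a precomposed code point exists for the combining sequence, which is not true of combining sequences in general.

```d
import std.algorithm : reverse;
import std.uni : normalize;

// Hypothetical helper (not part of Phobos): compose first, then reverse code points.
dchar[] composeThenReverse(const(dchar)[] s)
{
    // normalize() defaults to NFC, which turns A + U+0308 into the single code point U+00C4
    dchar[] composed = normalize(s).dup;
    // Still a plain code point reversal; it is only safe because no combining mark remains
    reverse(composed);
    return composed;
}

void main()
{
    // LATIN CAPITAL LETTER A, COMBINING DIAERESIS, LATIN CAPITAL LETTER B
    dstring txt = "\U00000041\U00000308\U00000042"d;

    assert(normalize(txt) == "\U000000C4\U00000042"d);
    assert(composeThenReverse(txt) == "\U00000042\U000000C4"d);
}
```

The helper still reverses code points, exactly as reverse() always does; the result only looks right because NFC collapsed the two-code-point sequence into one code point beforehand.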
http://d.puremagic.com/issues/show_bug.cgi?id=7085


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |jmdavisProg gmx.com
         Resolution|                            |INVALID


--- Comment #1 from Jonathan M Davis <jmdavisProg gmx.com> 2011-12-09 02:36:00 PST ---
No, this behavior is as designed. You're misunderstanding dchars. A dchar is a UTF-32 code unit, and is therefore guaranteed to be a code point. When you reverse a range of dchar, be it a dchar[] or some other data structure, the code points are reversed. It doesn't take graphemes into account _at all_. If you want to reverse a string based on graphemes, you need a range of graphemes, not a range of dchar.

Phobos does not currently have support for a range of graphemes, which makes that quite a bit harder to do. Until then, all ranges of characters are ranges of dchar, and any function that operates on a range is going to treat them as ranges of dchar, not graphemes, so reverse is going to reverse code points, even if that's not what the programmer really wanted.
Dec 09 2011
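The distinction drawn in the resolution can be made concrete by counting and reversing graphemes instead of code points. Later Phobos releases added grapheme support to std.uni (byGrapheme and byCodePoint); this sketch assumes such a release, and reverseGraphemes is a hypothetical helper name, not a Phobos function. byGrapheme yields a forward range rather than a bidirectional one, so the graphemes are collected into an array before retro walks them backwards.

```d
import std.array : array;
import std.range : retro, walkLength;
import std.uni : byCodePoint, byGrapheme;

// Hypothetical helper (not part of Phobos): reverse by grapheme cluster so that
// combining marks stay attached to their base character.
dchar[] reverseGraphemes(const(dchar)[] s)
{
    return s.byGrapheme // group code points into grapheme clusters
        .array          // byGrapheme is not bidirectional, so materialize it first
        .retro          // walk the graphemes back to front
        .byCodePoint    // flatten each grapheme back into its code points
        .array;         // collect the result into a new dchar[]
}

void main()
{
    // LATIN CAPITAL LETTER A, COMBINING DIAERESIS, LATIN CAPITAL LETTER B
    dchar[] txt = "\U00000041\U00000308\U00000042"d.dup;

    assert(txt.length == 3);                // three code points...
    assert(txt.byGrapheme.walkLength == 2); // ...but only two graphemes

    // The diaeresis stays with A instead of migrating onto B
    assert(reverseGraphemes(txt) == "\U00000042\U00000041\U00000308"d);
}
```

This produces the first output proposed in the report and keeps the array length at three code points, whereas std.algorithm.reverse on the same dchar[] reverses the three code points individually.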