digitalmars.D.bugs - [Issue 391] New: .sort and .reverse break utf8 encoding
- d-bugmail puremagic.com (23/23) Oct 02 2006 http://d.puremagic.com/issues/show_bug.cgi?id=391
- Stewart Gordon (14/24) Oct 03 2006 AIUI sort and reverse are defined to sort/reverse the individual
- Derek Parnell (7/23) Oct 03 2006 Yes, I realize that but it makes Walter's statements that char[] is all ...
- Walter Bright (4/15) Oct 03 2006 .sort and .reverse should reverse the unicode characters. If you want to...
- Sean Kelly (13/29) Oct 04 2006 Changing the behavior of .reverse kind of makes sense, but I don't
- Walter Bright (5/16) Oct 04 2006 A use for it is collecting character usage frequency statistics is one
- Lionello Lunesu (6/33) Oct 04 2006 What if you want to use a quick binary search look-up to see if a text
- Thomas Kuehne (16/27) Oct 03 2006 -----BEGIN PGP SIGNED MESSAGE-----
- d-bugmail puremagic.com (9/9) Oct 10 2006 http://d.puremagic.com/issues/show_bug.cgi?id=391
- d-bugmail puremagic.com (15/15) Dec 23 2006 http://d.puremagic.com/issues/show_bug.cgi?id=391
- d-bugmail puremagic.com (11/11) Jan 24 2007 http://d.puremagic.com/issues/show_bug.cgi?id=391
- d-bugmail puremagic.com (16/16) Apr 21 2009 http://d.puremagic.com/issues/show_bug.cgi?id=391
- d-bugmail puremagic.com (12/12) Nov 26 2010 http://d.puremagic.com/issues/show_bug.cgi?id=391
- d-bugmail puremagic.com (8/8) Nov 20 2012 http://d.puremagic.com/issues/show_bug.cgi?id=391
- d-bugmail puremagic.com (8/8) Nov 20 2012 http://d.puremagic.com/issues/show_bug.cgi?id=391
- d-bugmail puremagic.com (9/9) Nov 20 2012 http://d.puremagic.com/issues/show_bug.cgi?id=391
- d-bugmail puremagic.com (11/11) Dec 27 2012 http://d.puremagic.com/issues/show_bug.cgi?id=391
http://d.puremagic.com/issues/show_bug.cgi?id=391

           Summary: .sort and .reverse break utf8 encoding
           Product: D
           Version: unspecified
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: ddparnell bigpond.com

import std.utf;
import std.stdio;
void main()
{
    char[] a;
    a = "\u3026\u2021\u3061\n";
    writefln("plain");
    validate(a);
    writefln("sorted");
    validate(a.sort); // fails
    writefln("reversed");
    validate(a.reverse); // fails
}

--
Oct 02 2006
d-bugmail puremagic.com wrote:
<snip>
> import std.utf;
> import std.stdio;
> void main()
> {
>     char[] a;
>     a = "\u3026\u2021\u3061\n";
>     writefln("plain");
>     validate(a);
>     writefln("sorted");
>     validate(a.sort); // fails
>     writefln("reversed");
>     validate(a.reverse); // fails
> }

AIUI sort and reverse are defined to sort/reverse the individual elements of the array, rather than the Unicode characters that make up a string. But hmm....

Stewart.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++ a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Oct 03 2006
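The distinction Stewart draws between array elements and Unicode characters can be illustrated with a minimal sketch. It assumes a current D2 compiler and Phobos rather than the 2006 toolchain discussed in this thread; the helper names `isValidUtf8`, `reverseCodeUnits` and `reverseCodePoints` are made up for illustration.

```d
import std.algorithm.mutation : reverse;
import std.array : array;
import std.conv : to;
import std.range : retro;
import std.string : representation;
import std.utf : UTFException, validate;

/// Returns true if `s` is well-formed UTF-8 (hypothetical helper).
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Element-wise reverse: flips the raw UTF-8 code units (bytes),
/// ignoring character boundaries.
char[] reverseCodeUnits(string s)
{
    ubyte[] bytes = s.representation.dup;
    reverse(bytes);
    return cast(char[]) bytes;
}

/// Character-wise reverse: decode to code points, reverse, re-encode.
string reverseCodePoints(string s)
{
    return s.retro.array.to!string;
}

void main()
{
    string a = "\u3026\u2021\u3061";
    assert(isValidUtf8(a));
    // Multi-byte sequences end up back to front, so validation fails.
    assert(!isValidUtf8(reverseCodeUnits(a)));
    // Reversing whole code points keeps the encoding intact.
    assert(reverseCodePoints(a) == "\u3061\u2021\u3026");
}
```

Each of the three characters in the report occupies three bytes in UTF-8, which is why an element-wise reverse of the `char[]` leaves a continuation byte at the front of the array.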
On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:
> d-bugmail puremagic.com wrote:
> <snip>
>> import std.utf;
>> import std.stdio;
>> void main()
>> {
>>     char[] a;
>>     a = "\u3026\u2021\u3061\n";
>>     writefln("plain");
>>     validate(a);
>>     writefln("sorted");
>>     validate(a.sort); // fails
>>     writefln("reversed");
>>     validate(a.reverse); // fails
>> }
> AIUI sort and reverse are defined to sort/reverse the individual elements of the array, rather than the Unicode characters that make up a string. But hmm....

Yes, I realize that but it makes Walter's statements that char[] is all we need and we do not need a 'string' a bit weaker.

--
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
Oct 03 2006
Derek Parnell wrote:
> On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:
>> d-bugmail puremagic.com wrote:
>>> writefln("sorted");
>>> validate(a.sort); // fails
>>> writefln("reversed");
>>> validate(a.reverse); // fails
>> AIUI sort and reverse are defined to sort/reverse the individual elements of the array, rather than the Unicode characters that make up a string. But hmm....
> Yes, I realize that but it makes Walter's statements that char[] is all we need and we do not need a 'string' a bit weaker.

.sort and .reverse should reverse the unicode characters. If you want to reverse/sort the individual bytes, you should cast it to a ubyte[] first.

Both behaviors will be fixed in the next update.
Oct 03 2006
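Walter's two behaviors, sorting Unicode characters versus sorting the underlying bytes through a `ubyte[]`, might look roughly like this in present-day D2. This is a sketch under assumptions: the built-in `.sort` property from the thread no longer exists in current compilers, so `std.algorithm.sorting.sort` stands in for it, and `sortCodePoints` and `sortBytes` are hypothetical names.

```d
import std.algorithm.sorting : sort;
import std.array : array;
import std.conv : to;
import std.exception : assertThrown;
import std.string : representation;
import std.utf : UTFException, validate;

/// Character-level sort: decode to code points, sort, re-encode as UTF-8.
string sortCodePoints(string s)
{
    dchar[] cps = s.array; // narrow strings decode to dchar[]
    sort(cps);
    return cps.to!string;
}

/// Byte-level sort: operate on the raw code units as ubyte[].
ubyte[] sortBytes(string s)
{
    ubyte[] bytes = s.representation.dup;
    sort(bytes);
    return bytes;
}

void main()
{
    string a = "\u3026\u2021\u3061";

    // Sorted by code point value; the result is still valid UTF-8.
    assert(sortCodePoints(a) == "\u2021\u3026\u3061");
    validate(sortCodePoints(a));

    // Sorted bytes are just bytes; reinterpreting them as text fails validation.
    assertThrown!UTFException(validate(cast(char[]) sortBytes(a)));
}
```

Keeping the byte-level result typed as `ubyte[]` makes it explicit that it is no longer meant to be read as text.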
Walter Bright wrote:
> Derek Parnell wrote:
>> On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:
>>> d-bugmail puremagic.com wrote:
>>>> writefln("sorted");
>>>> validate(a.sort); // fails
>>>> writefln("reversed");
>>>> validate(a.reverse); // fails
>>> AIUI sort and reverse are defined to sort/reverse the individual elements of the array, rather than the Unicode characters that make up a string. But hmm....
>> Yes, I realize that but it makes Walter's statements that char[] is all we need and we do not need a 'string' a bit weaker.
> .sort and .reverse should reverse the unicode characters. If you want to reverse/sort the individual bytes, you should cast it to a ubyte[] first.

Changing the behavior of .reverse kind of makes sense, but I don't understand the reason for changing .sort aside from consistency. Personally, I've never had a reason to sort a char array in the first place unless the chars were intended to represent something other than their lexical meaning. And that aside, sorting chars in a string without a comparison predicate will do so using the char's binary value, which has no lexical significance beyond the 26 letters of the English alphabet (as represented in ASCII).

I'm starting to feel like people are harping on Unicode issues just for the sake of doing so rather than because these are actual problems. Can someone please explain what I'm missing?

Sean
Oct 04 2006
Sean Kelly wrote:
> Changing the behavior of .reverse kind of makes sense, but I don't understand the reason for changing .sort aside from consistency. Personally, I've never had a reason to sort a char array in the first place unless the chars were intended to represent something other than their lexical meaning. And that aside, sorting chars in a string without a comparison predicate will do so using the char's binary value, which has no lexical significance beyond the 26 letters of the English alphabet (as represented in ASCII).
>
> I'm starting to feel like people are harping on Unicode issues just for the sake of doing so rather than because these are actual problems. Can someone please explain what I'm missing?

Collecting character usage frequency statistics is one such use. Read a text file into a buffer, sort the buffer, and dump the result!

I don't mind the harping on it. Getting the details right is important, even if the details themselves aren't. Besides, it's an easy fix.
Oct 04 2006
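The frequency-statistics use Walter mentions can be sketched as follows, again assuming current D2 and Phobos. The function name `charFrequencies` is hypothetical, and a string literal stands in for the text file he describes.

```d
import std.algorithm.iteration : group;
import std.algorithm.sorting : sort;
import std.array : array;
import std.stdio : writefln;

/// Returns (code point, count) tuples in ascending code point order.
/// Sorting brings equal characters together so `group` can count each run.
auto charFrequencies(string text)
{
    dchar[] cps = text.array; // decode UTF-8 into code points
    sort(cps);
    return cps.group.array;
}

void main()
{
    // Each tuple holds the code point at index 0 and its count at index 1.
    foreach (entry; charFrequencies("abca\u3061\u3061"))
        writefln("U+%04X occurs %s time(s)", cast(uint) entry[0], entry[1]);
}
```

This only gives meaningful counts if the sort works on whole characters; sorting the bytes of multi-byte text would count UTF-8 code units instead.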
Sean Kelly wrote:
> Walter Bright wrote:
>> Derek Parnell wrote:
>>> On Tue, 03 Oct 2006 21:43:46 +0100, Stewart Gordon wrote:
>>>> d-bugmail puremagic.com wrote:
>>>>> writefln("sorted");
>>>>> validate(a.sort); // fails
>>>>> writefln("reversed");
>>>>> validate(a.reverse); // fails
>>>> AIUI sort and reverse are defined to sort/reverse the individual elements of the array, rather than the Unicode characters that make up a string. But hmm....
>>> Yes, I realize that but it makes Walter's statements that char[] is all we need and we do not need a 'string' a bit weaker.
>> .sort and .reverse should reverse the unicode characters. If you want to reverse/sort the individual bytes, you should cast it to a ubyte[] first.
> Changing the behavior of .reverse kind of makes sense, but I don't understand the reason for changing .sort aside from consistency. Personally, I've never had a reason to sort a char array in the first place unless the chars were intended to represent something other than their lexical meaning. And that aside, sorting chars in a string without a comparison predicate will do so using the char's binary value, which has no lexical significance beyond the 26 letters of the English alphabet (as represented in ASCII).

What if you want to use a quick binary search look-up to see if a text contains a given character? ;)

Not that I've ever needed it, but it makes sense to just fix it. How often do you .reverse a string, for that matter?

L.
Oct 04 2006
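Lionello's binary-search look-up could be sketched like this in current D2. It assumes Phobos's `sort`, which returns a `SortedRange` whose `contains` method performs a binary search; `sortedCodePoints` is a hypothetical helper name.

```d
import std.algorithm.sorting : sort;
import std.array : array;

/// Decodes `text` to code points and returns them as a sorted range.
/// The returned SortedRange supports binary search via `contains`.
auto sortedCodePoints(string text)
{
    dchar[] cps = text.array;
    return sort(cps);
}

void main()
{
    auto index = sortedCodePoints("\u3026\u2021\u3061 plain text");

    dchar present = '\u2021';
    dchar absent = 'q';
    assert(index.contains(present));
    assert(!index.contains(absent));
}
```

The sort costs more than a single linear scan, so this arrangement only pays off when the same text is queried many times.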
d-bugmail puremagic.com wrote on 2006-10-02:
> http://d.puremagic.com/issues/show_bug.cgi?id=391
>
> import std.utf;
> import std.stdio;
> void main()
> {
>     char[] a;
>     a = "\u3026\u2021\u3061\n";
>     writefln("plain");
>     validate(a);
>     writefln("sorted");
>     validate(a.sort); // fails
>     writefln("reversed");
>     validate(a.reverse); // fails
> }

Added to DStress as
http://dstress.kuehne.cn/run/r/reverse_08_A.d
http://dstress.kuehne.cn/run/r/reverse_08_B.d
http://dstress.kuehne.cn/run/r/reverse_08_C.d
http://dstress.kuehne.cn/run/s/sort_16_A.d
http://dstress.kuehne.cn/run/s/sort_16_B.d
http://dstress.kuehne.cn/run/s/sort_16_C.d

Thomas
Oct 03 2006
http://d.puremagic.com/issues/show_bug.cgi?id=391

bugzilla digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

Fixed DMD 0.169

--
Oct 10 2006
http://d.puremagic.com/issues/show_bug.cgi?id=391

thomas-dloop kuehne.cn changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

Process terminating with default action of signal 11 (SIGSEGV)
 Bad permissions for mapped region at address 0x805A0EC
   at 0x80544A3: _D3std8typeinfo8ti_dchar10TypeInfo_w4swapMFPvPvZv (in run/s/sort_16_A.d.exe)
   by 0x8050ACD: _adSort (in run/s/sort_16_A.d.exe)
   by 0x804A0F4: _Dmain (in run/s/sort_16_A.d:17)
   by 0x804BBE6: main (in run/s/sort_16_A.d.exe)

--
Dec 23 2006
http://d.puremagic.com/issues/show_bug.cgi?id=391

thomas-dloop kuehne.cn changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

Fixed indeed in DMD 0.169.

The test cases failed because they were missing dups and were therefore trying to sort a constant string in place.

--
Jan 24 2007
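The missing-dup problem Thomas describes can be illustrated with a sketch in current D2. In D2 string literals are typed `immutable`, so an in-place sort or reverse of a literal is rejected at compile time instead of crashing at run time as in the trace above. The sketch relies on the overload of `std.algorithm.mutation.reverse` for mutable narrow strings, which preserves multi-unit UTF sequences; `reversedCopy` is a hypothetical name.

```d
import std.algorithm.mutation : reverse;
import std.utf : validate;

/// Reverses a private copy of `s`, leaving the original untouched.
char[] reversedCopy(string s)
{
    char[] copy = s.dup; // mutable copy; the literal itself must not be written to
    reverse(copy);       // narrow-string overload: reverses by code point
    return copy;
}

void main()
{
    string literal = "a\u3061b";
    char[] r = reversedCopy(literal);

    validate(r);                  // still well-formed UTF-8
    assert(r == "b\u3061a");
    assert(literal == "a\u3061b"); // original data was not modified
}
```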
http://d.puremagic.com/issues/show_bug.cgi?id=391

clugdbug yahoo.com.au changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

This case (cut down from reverse_08_C) is still failing.

int main(){
    wchar[] a = "a\U00000081b\U00002000c\U00010000";
    wchar[] b = a.dup;
    b.reverse; // OK
    b.reverse; // fails
    return 0;
}

--
Apr 21 2009
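The behavior this reduced case expects, that reversing twice restores the original UTF-16 text even when it contains a surrogate pair, can be expressed as a check in current D2. This is a sketch under assumptions: it uses the narrow-string overload of `std.algorithm.mutation.reverse` in place of the built-in `.reverse` property, and it works on copies rather than in place; `reverseUtf16` is a hypothetical name.

```d
import std.algorithm.mutation : reverse;
import std.range : walkLength;
import std.utf : validate;

/// Returns a code-point-wise reversed copy of a UTF-16 string.
wchar[] reverseUtf16(const(wchar)[] s)
{
    wchar[] copy = s.dup;
    reverse(copy); // keeps each surrogate pair in high, low order
    return copy;
}

void main()
{
    wstring a = "a\U00000081b\U00002000c\U00010000"w;

    // U+10000 needs a surrogate pair: 7 code units, 6 code points.
    assert(a.length == 7);
    assert(a.walkLength == 6);

    wchar[] once = reverseUtf16(a);
    validate(once);
    assert(once == "\U00010000c\U00002000b\U00000081a"w);

    // Reversing again must give back the original text.
    wchar[] twice = reverseUtf16(once);
    assert(twice == a);
}
```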
http://d.puremagic.com/issues/show_bug.cgi?id=391

Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |ASSIGNED
                 CC|                            |andrei metalanguage.com
            Version|1.00                        |D1 & D2

11:30:22 PST ---
Don's latest fails both on 1.065 and 2.050. Marking as a D1 & D2 issue.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 26 2010
http://d.puremagic.com/issues/show_bug.cgi?id=391

Commit pushed to phobos-1.x at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/8b4f262f9ed898a82e55e269ee68a865d97cc122
fix Issue 391 - .sort and .reverse break utf8 encoding

--
Nov 20 2012
http://d.puremagic.com/issues/show_bug.cgi?id=391

Commit pushed to master at https://github.com/D-Programming-Language/druntime

https://github.com/D-Programming-Language/druntime/commit/b30134123b200f0daa616015ac8d6bdcfb350c50
fix Issue 391 - .sort and .reverse break utf8 encoding

--
Nov 20 2012
http://d.puremagic.com/issues/show_bug.cgi?id=391

Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--
Nov 20 2012
http://d.puremagic.com/issues/show_bug.cgi?id=391

Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|DMD                         |druntime
         AssignedTo|bugzilla digitalmars.com    |nobody puremagic.com

18:40:32 PST ---
This is also fixed in Phobos1.

--
Dec 27 2012