digitalmars.D.bugs - [Issue 6791] New: std.algorithm.splitter random indexes utf strings
- d-bugmail puremagic.com (22/22) Oct 07 2011 http://d.puremagic.com/issues/show_bug.cgi?id=6791
- d-bugmail puremagic.com (18/18) Aug 18 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6791
- d-bugmail puremagic.com (18/28) Aug 18 2013 http://d.puremagic.com/issues/show_bug.cgi?id=6791
http://d.puremagic.com/issues/show_bug.cgi?id=6791

           Summary: std.algorithm.splitter random indexes utf strings
           Product: D
           Version: D2
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: dawg dawgfoto.de

--- Comment #0 from dawg dawgfoto.de 2011-10-07 22:51:09 PDT ---
Throws a UTFException.

    string s = `là dove terminava quella valle`;
    foreach(word; std.array.splitter(s))
        writeln(word);

The second UTF-8 code unit of 'à' is 0xA0, for which isWhite is true.

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 07 2011
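A minimal sketch of the observation in comment #0, assuming a current Phobos where the whitespace test is std.uni.isWhite (the 2011 name may have differed). It prints each UTF-8 code unit of "là" next to the result of isWhite, then does the same for the decoded code points. The point is only that the trailing byte 0xA0 of 'à', read in isolation, is indistinguishable from U+00A0 (no-break space).

```d
import std.stdio : writefln;
import std.uni : isWhite;

void main()
{
    string s = "là";

    // Code units (char): the raw UTF-8 bytes. 'à' (U+00E0) is stored as 0xC3 0xA0.
    // A lone char converts implicitly to dchar, so 0xA0 is tested as U+00A0.
    foreach (char c; s)
        writefln("code unit  0x%02X  isWhite: %s", cast(ubyte) c, isWhite(c));

    // Code points (dchar): foreach decodes the multibyte sequence first.
    foreach (dchar d; s)
        writefln("code point U+%04X  isWhite: %s", cast(uint) d, isWhite(d));
}
```

Splitting at a code unit that only looks like whitespace cuts the two-byte sequence in half, which would explain the UTFException when the resulting slice is later decoded.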
http://d.puremagic.com/issues/show_bug.cgi?id=6791

hsteoh quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh quickfur.ath.cx

--- Comment #1 from hsteoh quickfur.ath.cx 2013-08-18 22:22:41 PDT ---
This is caused by struct SplitterResult in std.algorithm using array slicing
and array indexing to pass char (not dchar!) to the lambda.

SplitterResult appears to have multiple issues: it uses array slicing without
a proper signature constraint on hasSlicing, and it doesn't work properly for
narrow strings because it uses indexing, which for narrow strings yields
individual code units and so doesn't handle multibyte UTF-8 sequences
properly. It appears to need a rewrite that uses only forward range
primitives, or at least an overload for narrow strings that properly takes
multibyte characters into account.
Aug 18 2013
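A rough sketch of the second option mentioned in comment #1 (handling narrow strings by decoding whole characters before testing them). This is not the Phobos implementation or the eventual fix; splitWords is a hypothetical helper written for illustration, and it assumes std.utf.decode and std.uni.isWhite from a current Phobos.

```d
import std.stdio : writeln;
import std.uni : isWhite;
import std.utf : decode;

// Hypothetical helper, not Phobos code: returns the whitespace-separated
// words of s, testing decoded code points instead of single code units.
string[] splitWords(string s)
{
    string[] words;
    size_t i = 0;       // index of the next code unit to decode
    size_t start = 0;   // code-unit index where the current word begins
    bool inWord = false;

    while (i < s.length)
    {
        immutable size_t at = i;            // start of this code point
        immutable dchar c = decode(s, i);   // advances i past the whole sequence

        if (isWhite(c))
        {
            if (inWord)
            {
                words ~= s[start .. at];    // slice ends on a code-point boundary
                inWord = false;
            }
        }
        else if (!inWord)
        {
            start = at;
            inWord = true;
        }
    }
    if (inWord)
        words ~= s[start .. $];
    return words;
}

void main()
{
    foreach (word; splitWords("là dove terminava quella valle"))
        writeln(word);
}
```

Because every slice boundary comes from an index returned by decode, the slices never cut through a multibyte sequence; slicing itself stays cheap, only the predicate sees dchar instead of char.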
http://d.puremagic.com/issues/show_bug.cgi?id=6791

monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |monarchdodra gmail.com
         AssignedTo|nobody puremagic.com        |monarchdodra gmail.com

--- Comment #2 from monarchdodra gmail.com 2013-08-18 23:25:05 PDT ---
(In reply to comment #1)
> This is caused by struct SplitterResult in std.algorithm using array slicing
> and array indexing to pass char (not dchar!) to the lambda.
>
> SplitterResult appears to have multiple issues: it uses array slicing without
> a proper signature constraint on hasSlicing, and doesn't work properly for
> narrow strings because it uses indexing which for narrow strings doesn't
> handle multibyte UTF-8 sequences properly. It appears to be wanting a rewrite
> that uses only forward range primitives, or at least, an overload for narrow
> strings that properly take multibyte characters into account.

I had submitted a correction for this about 1 year ago, but it ended up being
too big in scope (*all* splitter flavors have bugs). It also ended up being
messy due to (trying to avoid) code duplication.

It might be better to just fix things little by little though, rather than not
at all.

I'll fix *just* "splitter!pred": It's the easiest to fix. We'll see where we go
from there.
Aug 18 2013