digitalmars.D.bugs - [Issue 6791] New: std.algorithm.splitter random indexes utf strings

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6791

           Summary: std.algorithm.splitter random indexes utf strings
           Product: D
           Version: D2
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: dawg dawgfoto.de


--- Comment #0 from dawg dawgfoto.de 2011-10-07 22:51:09 PDT ---
The following code throws a UTFException.

string s = `là dove terminava quella valle`;
foreach(word; std.array.splitter(s))
  writeln(word);

---

The second UTF-8 code unit of 'à' (encoded as 0xC3 0xA0) is 0xA0, for which
isWhite is true.
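
As a rough illustration of the observation above, the following sketch inspects the code units directly. It assumes the character in question is 'à' (U+00E0) and that the isWhite involved behaves like std.uni.isWhite in current Phobos. The helper name secondUnitLooksWhite is hypothetical and exists only for this example.

```d
import std.stdio : writeln;
import std.uni : isWhite;

// Hypothetical helper for illustration: reports whether the second UTF-8
// code unit of a string, widened to dchar, is classified as whitespace.
bool secondUnitLooksWhite(string s)
{
    assert(s.length >= 2);
    // Indexing a string yields a char (a code unit), not a decoded code point.
    dchar widened = s[1];
    return isWhite(widened);
}

void main()
{
    string s = "à";
    // U+00E0 is encoded in UTF-8 as the two code units 0xC3 0xA0.
    writeln(s.length);                  // number of code units
    writeln(cast(ubyte) s[0] == 0xC3);
    writeln(cast(ubyte) s[1] == 0xA0);
    // 0xA0 taken on its own is U+00A0 (NO-BREAK SPACE), which is whitespace,
    // even though the decoded character 'à' is not.
    writeln(secondUnitLooksWhite(s));
    writeln(isWhite('à'));
}
```

A splitter that tests individual code units this way would cut the string between 0xC3 and 0xA0, leaving an incomplete UTF-8 sequence in the resulting slice.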

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 07 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6791


hsteoh quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh quickfur.ath.cx


--- Comment #1 from hsteoh quickfur.ath.cx 2013-08-18 22:22:41 PDT ---
This is caused by struct SplitterResult in std.algorithm using array slicing
and array indexing to pass char (not dchar!) to the lambda. SplitterResult
appears to have multiple issues: it uses array slicing without a proper
signature constraint on hasSlicing, and doesn't work properly for narrow
strings because it uses indexing which for narrow strings doesn't handle
multibyte UTF-8 sequences properly.

It appears to be wanting a rewrite that uses only forward range primitives, or
at least, an overload for narrow strings that properly takes multibyte
characters into account.
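
A minimal sketch of the second idea, for illustration only: this is not the SplitterResult code and not a proposed patch. The helper splitOnWhite is hypothetical, it is eager rather than lazy, and it assumes the current Phobos module layout (std.range.primitives), where front and popFront on a narrow string work on whole decoded code points.

```d
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;
import std.uni : isWhite;

// Hypothetical helper: splits a string on whitespace by examining decoded
// code points (dchar) instead of indexing individual code units.
string[] splitOnWhite(string s)
{
    string[] words;
    while (true)
    {
        // Skip separators; front decodes a full code point.
        while (!s.empty && isWhite(s.front))
            s.popFront();
        if (s.empty)
            break;

        // Remember where the word starts, then advance one code point at a time.
        string start = s;
        while (!s.empty && !isWhite(s.front))
            s.popFront();

        // Both ends lie on code point boundaries, so the slice is valid UTF-8.
        words ~= start[0 .. start.length - s.length];
    }
    return words;
}

void main()
{
    foreach (word; splitOnWhite("là dove terminava quella valle"))
        writeln(word);
}
```

Slicing is still used here, but only at positions reached through popFront, so a multibyte sequence is never cut in the middle.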

Aug 18 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6791


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |monarchdodra gmail.com
         AssignedTo|nobody puremagic.com        |monarchdodra gmail.com


--- Comment #2 from monarchdodra gmail.com 2013-08-18 23:25:05 PDT ---
(In reply to comment #1)
> This is caused by struct SplitterResult in std.algorithm using array slicing
> and array indexing to pass char (not dchar!) to the lambda. SplitterResult
> appears to have multiple issues: it uses array slicing without a proper
> signature constraint on hasSlicing, and doesn't work properly for narrow
> strings because it uses indexing which for narrow strings doesn't handle
> multibyte UTF-8 sequences properly.
>
> It appears to be wanting a rewrite that uses only forward range primitives, or
> at least, an overload for narrow strings that properly takes multibyte
> characters into account.

I had submitted a correction for this about a year ago, but it ended up being
too big in scope (*all* splitter flavors have bugs). It also ended up being
messy due to (trying to avoid) code duplication.

It might be better to just fix things little by little, though, rather than
not at all. I'll fix *just* "splitter!pred": it's the easiest to fix. We'll
see where we go from there.

Aug 18 2013