digitalmars.D.bugs - [Issue 7689] New: splitter() on invalid UTF-8 sequences
- d-bugmail puremagic.com (34/34) Mar 11 2012 http://d.puremagic.com/issues/show_bug.cgi?id=7689
- d-bugmail puremagic.com (16/39) Oct 22 2012 http://d.puremagic.com/issues/show_bug.cgi?id=7689
http://d.puremagic.com/issues/show_bug.cgi?id=7689
Summary: splitter() on invalid UTF-8 sequences
Product: D
Version: D2
Platform: x86
OS/Version: Windows
Status: NEW
Severity: normal
Priority: P2
Component: Phobos
AssignedTo: nobody puremagic.com
ReportedBy: bearophile_hugs eml.cc
Is this inconsistency between split() and splitter() intended and desirable?

import std.string, std.array, std.algorithm, std.range;

void main() {
    char[] s = cast(char[])[131, 64, 32, 251, 22];
    assert(std.string.split(s).length == 2);            // no error
    assert(walkLength(std.array.splitter(s)) == 2);     // Invalid UTF-8 sequence
    assert(walkLength(std.algorithm.splitter(s)) == 2); // Invalid UTF-8 sequence
}
Output, DMD 2.059head:
std.utf.UTFException std\utf.d(645): Invalid UTF-8 sequence (at index 1)
----------------
...\dmd2\src\phobos\std\array.d(469): dchar std.array.front!(char[]).front(char[])
...\dmd2\src\phobos\std\algorithm.d(2110): D3std9algorithm47__T8splitterS28...
...\dmd2\src\phobos\std\range.d(971): D3std5range97__...
----------------
--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 11 2012
http://d.puremagic.com/issues/show_bug.cgi?id=7689
monarchdodra gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
CC| |monarchdodra gmail.com
AssignedTo|nobody puremagic.com |monarchdodra gmail.com
This is a bug in string.split, which is actually a public import of
array.split.
Currently array.split recognizes only ASCII whitespace and ignores multi-byte
UTF whitespace, although it does accept Unicode strings.
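To illustrate why an ASCII-only split can succeed where a decoding splitter throws, here is a minimal sketch. It assumes that split inspects the string one code unit at a time and compares each unit only against ASCII whitespace, without ever decoding UTF-8. The helper asciiSplit is hypothetical and is not the Phobos implementation. Because it iterates with `char` rather than `dchar`, the invalid bytes 131 and 251 are never decoded, so no UTFException can occur.

    import std.ascii : isWhite;

    // Hypothetical helper (not Phobos code): splits on ASCII whitespace
    // one code unit at a time and never decodes UTF-8, so invalid
    // sequences are passed through unchanged.
    char[][] asciiSplit(char[] s)
    {
        char[][] result;
        size_t start = 0;
        bool inWord = false;
        foreach (i, char c; s) // iterating as char: no decoding happens
        {
            if (isWhite(c))
            {
                if (inWord) { result ~= s[start .. i]; inWord = false; }
            }
            else if (!inWord)
            {
                start = i;
                inWord = true;
            }
        }
        if (inWord) result ~= s[start .. $];
        return result;
    }

    void main()
    {
        char[] s = cast(char[])[131, 64, 32, 251, 22];
        // Only byte 32 (space) is whitespace, so there are two pieces.
        assert(asciiSplit(s).length == 2);
    }

A splitter that uses front() on a char[] decodes each element to dchar. That decoding step is what raises the exception in the trace above.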
Oct 22 2012