
digitalmars.D.bugs - [Issue 11017] New: std.string/uni.toLower is very slow

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=11017

           Summary: std.string/uni.toLower is very slow
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: peter.alexander.au gmail.com


--- Comment #0 from Peter Alexander <peter.alexander.au gmail.com> 2013-09-12
10:52:33 PDT ---
char[] s = new char[10_000_000];
s[] = 'A';
auto s2 = s.toLower;

This takes 4.3 seconds on my machine.


char[] s = new char[10_000_000];
s[] = 'A';
auto s2 = s.map!toLower.to!string;

This only takes 1.1 seconds.

Looking at the code for std.uni.toLower, it appears the string is constructed
using repeated ~=. It should use an Appender of some sort.
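
A minimal sketch of the Appender idea, assuming a per-code-point loop. The function name lowerViaAppender is hypothetical and this is not the actual Phobos code; like the map!toLower version, it only applies the simple one-to-one mapping of toLower(dchar).

```d
import std.array : appender;
import std.uni : toLower;

// Hypothetical helper for illustration; not the std.uni implementation.
string lowerViaAppender(const(char)[] s)
{
    auto result = appender!string();
    foreach (dchar c; s)            // decode the UTF-8 input one code point at a time
        result.put(toLower(c));     // put() re-encodes the dchar as UTF-8
    return result.data;
}

void main()
{
    assert(lowerViaAppender("Hello, WORLD") == "hello, world");
}
```

Appender tracks the buffer capacity itself, so each put avoids the runtime capacity query that a plain ~= performs on every append.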

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 12 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=11017


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh gmail.com


--- Comment #1 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-09-12
11:59:08 PDT ---
(In reply to comment #0)
> char[] s = new char[10_000_000];
> s[] = 'A';
> auto s2 = s.toLower;
>
> This takes 4.3 seconds on my machine.
>
>
> char[] s = new char[10_000_000];
> s[] = 'A';
> auto s2 = s.map!toLower.to!string;
>
> This only takes 1.1 seconds.

There are 2 things to consider here. First, the 2nd one is not correct in
general (1 code point can map to many, e.g. German sharp S).

> Looking at the code for std.uni.toLower, it appears the string is constructed
> using repeated ~=. It should use an Appender of some sort.

This could indeed be fixed. I suspect that putting an optimistic
reserve(original.length) there would work even better.

See also issue 10864:
http://d.puremagic.com/issues/show_bug.cgi?id=10864
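
The reserve suggestion could look roughly like the following sketch. The name lowerWithReserve is hypothetical, the loop is an assumption about the shape of the code rather than the real std.uni source, and it again uses only the simple one-to-one toLower(dchar) mapping.

```d
import std.uni : toLower;
import std.utf : encode;

// Hypothetical helper for illustration; not the std.uni implementation.
char[] lowerWithReserve(const(char)[] s)
{
    char[] result;
    // Optimistic guess: the output is usually about as long as the input.
    result.reserve(s.length);
    foreach (dchar c; s)
    {
        char[4] buf;
        immutable n = encode(buf, toLower(c)); // UTF-8 encode one code point
        result ~= buf[0 .. n];                 // append within the reserved capacity
    }
    return result;
}

void main()
{
    assert(lowerWithReserve("ABC def") == "abc def");
}
```

The reservation is only a hint: if a mapping produces more bytes than the input had, ~= still grows the array as usual.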
Sep 12 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=11017



--- Comment #2 from Peter Alexander <peter.alexander.au gmail.com> 2013-09-12
12:45:45 PDT ---
(In reply to comment #1)
> There are 2 things to consider here. First, the 2nd one is not correct in
> general (1 code point can map to many, e.g. German sharp S).

Good point, although std.uni.toUpper doesn't handle it either :-)

assert("".toUpper == ""); // passes
Sep 12 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=11017



--- Comment #3 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-09-12
12:50:37 PDT ---
(In reply to comment #2)
> (In reply to comment #1)
> > There are 2 things to consider here. First, the 2nd one is not correct in
> > general (1 code point can map to many, e.g. German sharp S).
>
> Good point, although std.uni.toUpper doesn't handle it either :-)
>
> assert("".toUpper == ""); // passes

toLower will do. Sharp S is capital ;)
Sep 12 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=11017



--- Comment #4 from Peter Alexander <peter.alexander.au gmail.com> 2013-09-12
12:52:31 PDT ---
(In reply to comment #3)
> toLower will do. Sharp S is capital ;)

assert("".toLower == "");
assert("".toUpper == "");

Both pass.
Sep 12 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=11017



--- Comment #5 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-09-12
14:01:05 PDT ---
(In reply to comment #4)
> (In reply to comment #3)
> > toLower will do. Sharp S is capital ;)
>
> assert("".toLower == "");
> assert("".toUpper == "");
>
> Both pass.

Something wicked has happened. I see that I messed up toUpper in the table
generator while introducing toTitleCase (which isn't even exposed yet!).
toLower is fine; toUpper is apparently broken in half of the cases.

How I missed that, I have no idea... I've got to expand the test coverage
around toLower/toUpper.
Sep 12 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=11017



--- Comment #6 from Dmitry Olshansky <dmitry.olsh gmail.com> 2013-09-12
14:07:17 PDT ---
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > toLower will do. Sharp S is capital ;)
> >
> > assert("".toLower == "");
> > assert("".toUpper == "");
> >
> > Both pass.
>
> Something wicked has happened. I see that I messed up toUpper in the table
> generator while introducing toTitleCase (which isn't even exposed yet!).
> toLower is fine; toUpper is apparently broken in half of the cases.
>
> How I missed that, I have no idea... I've got to expand the test coverage
> around toLower/toUpper.

P.S. And there are both kinds of sharp s: \u1E9E and \u00DF.
Sep 12 2013