www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 5077] New: std.algorithm.schwartzSort is slow

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5077

           Summary: std.algorithm.schwartzSort is slow
           Product: D
           Version: D2
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Keywords: performance
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc


--- Comment #0 from bearophile_hugs eml.cc 2010-10-18 19:08:43 PDT ---
Here are few benchmarks that show that schwartzSort() is much slower than
sort().

(While in Python the usage of the 'key' argumement, analogous to a Schwartz
sort, usually leads to faster code. But Python and D have very different
compilation structure, so comparisons are hazardous at best.)


Timings 2.1 GHz CPU, , DATA_LEN=1_000_000, best of 4, seconds:
  none:      0.3
  sort:      4.1
  sort:      3.4 (alternative)
  schwartz: 17.9


I have no idea why the "alternative" sort is faster than the normal sort, I was
expecting the opposite.

-----------------------------------------

import std.stdio: writeln;
import std.algorithm: schwartzSort, sort;
import std.random: Random, uniform;
import std.typecons: Tuple;

enum SortType { none, sort, schwartz }

enum DATA_LEN = 1_000_000;
enum sort_type = SortType.sort;


alias Tuple!(double, "x", double, "y") P;

void main() {
    auto data = new P[DATA_LEN];
    auto rnd = Random(1);
    foreach (ref el; data) {
        double x = uniform(0.0, 1.0, rnd);
        double y = uniform(0.0, 1.0, rnd);
        el = P(x, y);
    }

    if (data.length < 50) writeln(data);

    static if (sort_type == SortType.schwartz) {
        schwartzSort!((p) { return p.y; })(data);
        schwartzSort!((p) { return p.x; })(data);
        schwartzSort!((p) { return p.y; })(data);
    }

    static if (sort_type == SortType.sort) {
        sort!q{ a.y < b.y }(data);
        sort!q{ a.x < b.x }(data);
        sort!q{ a.y < b.y }(data);
        /*
        // alternative
        sort!((P a, P b) { return a.y < b.y; })(data);
        sort!((P a, P b) { return a.x < b.x; })(data);
        sort!((P a, P b) { return a.y < b.y; })(data);
        */
    }

    if (data.length < 50) writeln(data);
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 18 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5077


Peter Alexander <peter.alexander.au gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter.alexander.au gmail.co
                   |                            |m


--- Comment #1 from Peter Alexander <peter.alexander.au gmail.com> 2010-10-19
00:05:18 PDT ---
I get quite similar timings:

sort: 3.8
alt. sort: 3.4
schwartz: 25.8

This is with -inline -O -release.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 19 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5077


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei metalanguage.com


--- Comment #2 from Andrei Alexandrescu <andrei metalanguage.com> 2010-10-19
06:37:48 PDT ---
This is a misunderstanding of Schwartz sort is supposed to do. D's equivalent
of Python's key argument is not Schwartz sort, but instead:

        sort!q{p.x < p.y}(data);

i.e. the keys are not copied but instead a projection is used for the
comparison. That's your "alt" sort. Schwartz sort is meant for usage when the
key computation (in this case a simple member access) is too expensive to be
done more than once. To avoid that, schwartzSort creates an array of the
computed keys and then sorts the keys and the original arrays in lockstep. It
is expected that schwartzSort is considerably slower than sort for cheap keys.
It is also expected that "alt" sort is the best of the breed because it has the
fastest key computation.

I'm leaving this open in case you have new experiments that do reveal a
problem. Otherwise, feel free to close it.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 19 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5077



--- Comment #3 from bearophile_hugs eml.cc 2010-10-22 04:52:28 PDT ---
(In reply to comment #2)

Thank you for your answers.

 This is a misunderstanding of Schwartz sort is supposed to do. D's equivalent
 of Python's key argument is not Schwartz sort, but instead:
 
         sort!q{p.x < p.y}(data);
 
 i.e. the keys are not copied but instead a projection is used for the
 comparison. That's your "alt" sort.
In that D benchmark there are 3 kinds of sort, two use the normal sort() the other uses schwartzSort. The "alternative" version is still a normal sort. The only difference is that it uses a delegate, as you see from the code: (P a, P b) { return a.y < b.y; } instead of a "template lambda": q{ a.y < b.y } Regarding Python, years ago Python2 used to have just a sort like this, with the "cmp" argument: s.sort(cmp) From the docs: http://docs.python.org/library/stdtypes.html#mutable-sequence-types cmp specifies a custom comparison function of two arguments (list items) which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument: cmp=lambda x,y: cmp(x.lower(), y.lower()). The default value is None. Recent Python2 versions use have a more complex sort signature: s.sort([cmp[, key[, reverse]]]) Where beside the "cmp" that's still present, there is the "key" function: key specifies a function of one argument that is used to extract a comparison key from each list element: key=str.lower. The default value is None. Python3 has removed the cmp argument. In the bug report I was referring to "key" that's a function that takes a single argument and return a single transformed item. So in Python if you use the "key" you are performing a Schwartz sort. C source code of CPython is available online, if you use "key" CPython builds a temporary array of the transformed data, that is used to sort the true data.
 I'm leaving this open in case you have new experiments that do reveal a
 problem. Otherwise, feel free to close it.
The performance of schwartzSort is too much low, so in my opinion the bug report needs to be open still. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 22 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5077



--- Comment #4 from bearophile_hugs eml.cc 2010-10-22 04:55:33 PDT ---
Sorry, I have forgotten a Python version of the code:


from random import random, seed
from operator import itemgetter

SortType_none = 0
SortType_sort = 1
SortType_schwartz = 2

DATA_LEN = 1000 # **********
sort_type = SortType_schwartz

def main():
    seed(1)
    data = [(random(), random()) for _ in xrange(DATA_LEN)]

    if len(data) < 50: print data

    if sort_type == SortType_schwartz:
        data.sort(key=itemgetter(1))
        data.sort(key=itemgetter(0))
        data.sort(key=itemgetter(1))

    if sort_type == SortType_sort:
        data.sort(cmp=lambda a, b: cmp(a[1], b[1]))
        data.sort(cmp=lambda a, b: cmp(a[0], b[0]))
        data.sort(cmp=lambda a, b: cmp(a[1], b[1]))

    if len(data) < 50: print data

main()

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 22 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5077


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|nobody puremagic.com        |andrei metalanguage.com


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jan 09 2011
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5077



--- Comment #5 from bearophile_hugs eml.cc 2011-06-21 15:53:29 PDT ---
See bug 6192 for more

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 21 2011