www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 5076] New: std.algorithm.sorted / schwartzSorted

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076

           Summary: std.algorithm.sorted / schwartzSorted
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: patch
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc


--- Comment #0 from bearophile_hugs eml.cc 2010-10-18 19:05:59 PDT ---
I propose to add two new little functions to Phobos, named
sorted()/schwartzSorted(). Their purpose is to copy the input items, sort them
and return them.

The -ed versions are more functional, they don't modify the input data, so they
may work with a immutable input sequence too. They may be used as expressions
instead of as statements, so in theory they allow code like (currently this
doesn't work, see bug 5074 ):


auto foo(immutable(int)[] data) {
    return map!q{ -a }(sorted(data));
}


Instead of:

auto foo(immutable(int)[] data) {
    int[] arr = data.dup;
    sort(arr);
    return map!q{ -a }(arr);
}


sorted()/schwartzSorted() may be seen as less efficient than
sort()/schwartzSort() because they copy the input data, but there are many
situations (like script-like D code) where the programmer wants short and quick
code, and knows the number of input items will be low enough to not cause
memory shortage. Python too has both list.sort() and sorted() built-ins.


A possible simple implementation, for DMD 2.049 (Code not tested much):


import std.algorithm: SwapStrategy, schwartzSort, sort;
import std.range: isRandomAccessRange, hasLength;
import std.array: array;

auto sorted(alias less = "a < b", SwapStrategy ss = SwapStrategy.unstable,
        Range)(Range r)
{
    auto auxr = array(r);
    sort!(less, ss)(auxr);
    return auxr;
}

auto schwartzSorted(alias transform, alias less = "a < b",
        SwapStrategy ss = SwapStrategy.unstable, Range)(Range r)
    if (isRandomAccessRange!(Range) && hasLength!(Range))
{
    auto auxr = array(r);
    schwartzSort!(transform, less, ss)(auxr);
    return auxr;
}

import std.typecons: Tuple;
import std.stdio: writeln;

alias Tuple!(int, "x", int, "y") P;

void main() {
    P[] data = [P(1,4), P(2,3), P(3,1), P(4,0)];
    writeln(data);
    writeln(schwartzSorted!((x) { return x.y; })(data));
    writeln(data);
    writeln(sorted!q{ a.y < b.y }(data));
    writeln(data);
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 18 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076


Peter Alexander <peter.alexander.au gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter.alexander.au gmail.co
                   |                            |m


--- Comment #1 from Peter Alexander <peter.alexander.au gmail.com> 2010-10-19
00:20:20 PDT ---
This actually seems to be a common pattern.

By "this", I mean:

auto foo(T value)
{
  T copy = value.dup;
  modify(copy);
  return copy;
}

"modify" here could be sort, reverse, schwartzSort, partition etc. which would
give you sorted, reversed, schwartzSorted and partitioned. It's also the same
as defining op+ in terms of op+=.

I have no idea what you would call foo though :-(

arr2 = transformed!(sort)(arr1);
arr2 = mutated!(sort)(arr1);
arr2 = modified!(sort)(arr1);
arr2 = copyModify!(sort)(arr1);

???

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 19 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei metalanguage.com


--- Comment #2 from Andrei Alexandrescu <andrei metalanguage.com> 2010-10-19
06:22:08 PDT ---
One issue is that there's no standardized "create a copy" function for ranges.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 19 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076



--- Comment #3 from Peter Alexander <peter.alexander.au gmail.com> 2010-10-19
09:15:12 PDT ---
(In reply to comment #2)
 One issue is that there's no standardized "create a copy" function for ranges.
auto copy = array(input); ? Ideally you'd be able to specify what container you want to copy into, but that should do as a default. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 19 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076



--- Comment #4 from Andrei Alexandrescu <andrei metalanguage.com> 2010-10-19
09:42:37 PDT ---
(In reply to comment #3)
 (In reply to comment #2)
 One issue is that there's no standardized "create a copy" function for ranges.
auto copy = array(input); ? Ideally you'd be able to specify what container you want to copy into, but that should do as a default.
array creates an array from anything. We should have a way to say "duplicate and preserve type". Probably the best idea is to define an optional property ".dup" for ranges. Then arrays implement it automatically, and other ranges may define it as they find fit. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 19 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076



--- Comment #5 from Peter Alexander <peter.alexander.au gmail.com> 2010-10-19
11:15:30 PDT ---
(In reply to comment #4)
 array creates an array from anything. We should have a way to say "duplicate
 and preserve type".
 
 Probably the best idea is to define an optional property ".dup" for ranges.
 Then arrays implement it automatically, and other ranges may define it as they
 find fit.
What if you don't want to preserve type? e.g. you have a list, but you want to get a sorted array of the elements in that list? Why not take the best of both worlds, allowing the user to specify the new container type, but have it default to the original type, something like: auto transformed(alias xform, OutputRange = InputRange, InputRange)(InputRange range) { ... } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 19 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076



--- Comment #6 from bearophile_hugs eml.cc 2010-10-19 13:32:04 PDT ---
(In reply to comment #4)

 array creates an array from anything. We should have a way to say "duplicate
 and preserve type".
After thinking about your words for some time I have understood your point. So after your change, for the semantics I am looking for, I'll need to write a bit longer code: sorted(array(some_linked_list)) What if the input collection is not sortable? Like: sorted(some_hash_set) In that case I presume the compilation will fail, and I'll have to use something like: sorted(array(some_hash_set)) Or even: sorted(toList(some_hash_set)) (Where toList() is similar to array() but produces some kind of list out of an iterable). A problem in using this: sorted(array(some_linked_list)) is that array() is supposed to create an array duplicate of the collection, and then sorted() is supposed to create a second useless copy of it. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 19 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076



--- Comment #7 from bearophile_hugs eml.cc 2010-10-19 13:33:57 PDT ---
(In reply to comment #6)

Another problem is that after your change this code will fail to compile even
if some_linked_list is immutable:
sorted(some_linked_list)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 19 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076



--- Comment #8 from bearophile_hugs eml.cc 2010-10-20 04:04:21 PDT ---
See another case:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=22381

This is supposed to not work:
sort(map(...))

This is supposed to work:
sorted(map(...))

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 20 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076



--- Comment #9 from Peter Alexander <peter.alexander.au gmail.com> 2010-10-20
08:49:43 PDT ---
(In reply to comment #8)
 See another case:
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=22381
 This is supposed to not work:
 sort(map(...))
 This is supposed to work:
 sorted(map(...))
Just implement sorted et al. something like this: auto sorted(Output = ElementType!InputRange[], InputRange)(InputRange range) { Output output = Output(range); // copy range into new container sort(output); return output; } I don't know if you can construct built-in arrays like that, but you should, and you can always specialise for it if necessary. This allows the input range to be whatever it likes (including Map), and also gives you the choice of the output range. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 20 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|nobody puremagic.com        |andrei metalanguage.com


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jan 09 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076



--- Comment #10 from bearophile_hugs eml.cc 2011-01-24 03:04:20 PST ---
In Python sort() is in-place. To help programmers remember this, sort() returns
None (like void in D):

 a = [10, 30, 5]
 sorted(a)
[5, 10, 30]
 a
[10, 30, 5]
 a.sort()
 a
[5, 10, 30] But in DMD 2.051 std.algorithm.sort() returns the sequence sorted in-place. Once a sorted/schwartzSorted are present, I suggest to let std.algorithm.sort() return void, as in Python. sorted/schwartzSorted may be tagged with nodiscard from bug 5464 to further help programmers remember their semantics. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jan 24 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076



--- Comment #11 from bearophile_hugs eml.cc 2011-05-18 18:21:42 PDT ---
See also issue 6035

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 18 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076



--- Comment #12 from bearophile_hugs eml.cc 2011-10-28 17:33:16 PDT ---
An use case for sorted(). I have to create a function foo() with a int[]
argument. Unless foo() is performance-critical the usual API requirements ask
for its arguments to be constant (in), to make the program less bug-prone.
Inside foo() I need to sort the a copy of items, and then I don't need to
modify this array, so I'd like this array copy too to be const. This is an
implementation that currently works:


import std.algorithm, std.exception;
void foo(in int[] unsortedData) {
    int[] tmpData_ = unsortedData.dup;
    tmpData_.sort();
    const(int[]) data = assumeUnique(tmpData_);
    // Use array 'data' here.
}
void main() {}


assumeUnique is not safe, and the tmpData_ name is present in the scope still
(despite assumeUnique has turned its length to zero, this improves the
situation a little).

With a pure sorted(), the code becomes more clean and safe:


import std.algorithm;
void foo(in int[] unsortedData) pure {
    const(int[]) data = sorted(unsortedData);
    // Use array 'data' here.
}
void main() {}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 28 2011
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=5076


Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich gmail.com


--- Comment #13 from Andrej Mitrovic <andrej.mitrovich gmail.com> 2012-02-07
12:20:02 PST ---
I run into this issue all the time, in particular when doing script-based
programming. E.g.:

string[string] classes;
foreach (name; classes.keys.sorted) { } // ng

Having to do this is a chore:
auto keys = classes.keys;
sort(keys);
foreach (name; keys) { } 

This works ok for my purposes (yada yada about constraints):
T sorted(T)(T t)
{
    T result = t;
    sort(result);
    return result;
}

Why do we have replace and replaceInPlace, whereas we have sort which sorts in
place implicitly? "findSkip" is another function that I sometimes use but I
hate how it hides the fact that it modifies its arguments. "find" returns a
range, but "findSkip" returns a bool and *modifies* your argument. It's not at
all obvious from the call site.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 07 2012