www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 4705] New: Redesign of std.algorithm.max()/min() + mins()/maxs()

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705

           Summary: Redesign of std.algorithm.max()/min() + mins()/maxs()
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc


--- Comment #0 from bearophile_hugs eml.cc 2010-08-21 11:14:01 PDT ---
Here I propose a redesign of the std.algorithm.max()/min() functions, and the
creation of two related functions, that may be named mins()/maxs().

Usually the contents of std.algorithm.max() are not necessary to program in D2,
their main purpose is to allow to shorten programs (and avoid some bugs),
especially in the large number of situations where maximum performance is not
necessary.

So the contents of std.algorithm must be flexible and handy to use. So the
contents of this module need to be chosen according to the most common pattern
found in normal programs. To show few examples I use Python code (because
Python design shows a very good choice of the most common patterns you find in
your code).


The task is to collect all items that have the max length:

 maxlen = max(len(item) for item in data)
 [item for item in data if len(item) == maxlen]
['hello', 'roger'] This task is so common that I use an utility module (as you see in Python max() accepts an optional key mapping function):
 from util import maxs
 maxs(data, key=len)
['hello', 'roger'] A subtask is to find the length of the longer string of 'data', here I show two possible solutions:
 data = ["red", "hello", "yes", "no", "roger", "bud"]
 len(max(data, key=len))
5
 max(len(item) for item in data)
5 This code shows a possible D2 solution of the subtask and task: import std.stdio: writeln; import std.algorithm: reduce, map, filter, max; import std.range: walkLength; void main() { string[] data = ["red", "hello", "yes", "no", "roger", "bud"]; int lmax = reduce!max(map!walkLength(data)); writeln(lmax); auto maxs = filter!((a){ return a.length == lmax; })(data); writeln(maxs); } But "reduce!max(map!walkLength(data))" is too much complex for such simple and common task (I have found that code only after many tries), so I think the current max() design is not good enough. My experience shows that the most common&handy usages of max()/min() are four (for each): max(item1, item2) max(item1, item2, item3) max(collection) max!(callable)(collection) min(item1, item2) min(item1, item2, item3) min(collection) min!(callable)(collection) They are: to find the max among two items, among tree items, among a collection of items, and among a collection of items using a mapping callable. The callable is similar to the 'transform' function of schwartzSort(). So I suggest a redesign of min() and max() according to those four most common usages. I also suggest to create two new functions, that I have just called maxs() and mins() similar to max and min, that find all the max or min items of a given collection. They don't need the case with two or three items: maxs(collection) maxs!(callable)(collection) mins(collection) mins!(callable)(collection) (If this proposal gets accepted I may be able to create a patch with this changes to max/min and the mins/maxs. I have written the same four functions for D1, that don't use ranges.) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 21 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|nobody puremagic.com        |andrei metalanguage.com


--- Comment #1 from bearophile_hugs eml.cc 2010-10-11 13:45:17 PDT ---
I have assigned this to Andrei, but I don't know if that's the right thing to
do. I am not asking Andrei to write the D2 implementation of the
max/min/maxs/mins I discuss here, that's something I may do :-) But in Bugzilla
I have some Phobos2 enhancement requests like this one where I'd like to
receive a comment.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 11 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705


Denis Derman <denis.spir gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |denis.spir gmail.com


--- Comment #2 from Denis Derman <denis.spir gmail.com> 2010-12-24 11:12:14 PST
---
(In reply to comment #0)

 I also suggest to create two new functions, that I have just called maxs() and
 mins() similar to max and min, that find all the max or min items of a given
 collection. They don't need the case with two or three items:
 
 maxs(collection)
 maxs!(callable)(collection)
 
 mins(collection)
 mins!(callable)(collection)
Why I understand the utility of maxs/mins, I wonder about them in std lib. Aren't they "niche domain" tools? Also, I wonder about maxs(collection) without any trnasform func: if you don't map, then there is logically a single max value (even if can occur several times in collection). maxS/minS seems useful only in presence of a transform func on which results comparison is done: the compared value is (in your case string len) unique, but there may be several original values in coll that map to it. To say it deifferently, in your case mins(collection) would return possibly multiple specimens of the same string. And mins(lenghts) would return [5,5]. Unless I misunderstand your point. Denis -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 24 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #3 from Denis Derman <denis.spir gmail.com> 2010-12-24 11:17:33 PST
---
Aside the posted comment, I rather support this proposal.

Have you ever called an external func for min(a,b) or sum(a,b)? (lol ;-)
So, according to me, what we need is a simple and efficient implementation of
min/max (and sum, etc...) for what is the common need: over a collection.
An variant with transform func would also be nice, indeed.

Denis

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Dec 24 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #4 from bearophile_hugs eml.cc 2010-12-24 22:53:27 PST ---
(In reply to comment #2)

 Aren't they "niche domain" tools?
The opposite is true. Functions and HOFs like max, min, sum, maxs, map, filter, reduce, and so on are basic building blocks that are useful to build most other programs. So their place is in the standard library, or sometimes even in the language itself.
 Also, I wonder about maxs(collection) without any trnasform func: if you don't
 map, then there is logically a single max value (even if can occur several
 times in collection). maxS/minS seems useful only in presence of a transform
 func on which results comparison is done: the compared value is (in your case
 string len) unique, but there may be several original values in coll that map
 to it.
 To say it deifferently, in your case mins(collection) would return possibly
 multiple specimens of the same string. And mins(lenghts) would return [5,5].
 Unless I misunderstand your point.
maxs with no transform function is useful even with integers, to count how many equal max values are present: maxs([5, 1, 5, 3]).length == 2 And if your collection contains objects, they may compare as equal despite being distinct. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 24 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #5 from bearophile_hugs eml.cc 2011-01-01 06:33:02 PST ---
Another example to show why a better max() is useful. The task is ot show how
the number less than 1000 which has the longest hailstone sequence. For the
hailstone see:
http://en.wikipedia.org/wiki/Collatz_conjecture

The code, DMD 2.051:

import std.stdio, std.algorithm, std.range, std.typecons;

auto hailstone(int n) {
    auto result = [n];
    while (n != 1) {
        n = n & 1 ? n*3 + 1 : n/2;
        result ~= n;
    }
    return result;
}

void main() {
    auto r = reduce!max(map!((i){return tuple(hailstone(i).length,
i);})(iota(1, 1000)))[1];
    writeln(r);
}


With a max that supports a key function and iterables the line of code in the
main() becomes just:

auto r = max!hailstone(iota(1, 1000));

With the enhancement request 5395 it becomes even more readable:

auto r = max!hailstone(1 .. 1000);

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jan 01 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #6 from bearophile_hugs eml.cc 2011-02-13 05:51:55 PST ---
Two more usage examples of the improved max. This program finds the longest
common subsequence with a recursive algorithm:


import std.stdio;
T[] lcs(T)(T[] a, T[] b) {
    T[] longest(T)(T[] s, T[] t) {
        return s.length > t.length ? s : t;
    }
    if (!a.length || !b.length)
        return null;
    if (a[0] == b[0])
        return a[0] ~ lcs(a[1..$], b[1..$]);
    return longest(lcs(a, b[1..$]), lcs(a[1..$], b));
}
void main() {
    writeln(lcs("thisisatest", "testing123testing"));
}


With a key-mapping max() you are able to remove the inner function and simplify
the code (untested):

import std.stdio;
T[] lcs(T)(T[] a, T[] b) {
    if (!a.length || !b.length)
        return null;
    if (a[0] == b[0])
        return a[0] ~ lcs(a[1..$], b[1..$]);
    return max!q{a.length}([lcs(a, b[1..$]), lcs(a[1..$], b)]);
}
void main() {
    writeln(lcs("thisisatest", "testing123testing"));
}


--------------

This program shows the most common anagrams from a dictionary file of different
words:


import std.stdio, std.algorithm;
void main() {
    string[][string] anags;
    foreach (w; File("unixdict.txt").byLine())
        anags[w.sort.idup] ~= w.idup;
    int m = reduce!max(map!q{a.length}(anags.values));
    writeln(filter!((wl){ return wl.length == m; })(anags.values));
}


A key-mapping max() makes the code shorter and more readable (untested):


import std.stdio, std.algorithm;
void main() {
    string[][string] anags;
    foreach (w; File("unixdict.txt").byLine())
        anags[w.sort.idup] ~= w.idup;
    int m = max!q{a.length}(anags.values);
    writeln(filter!((wl){ return wl.length == m; })(anags.values));
}


maxs() simplifies the code even more (untested):

import std.stdio, std.algorithm;
void main() {
    string[][string] anags;
    foreach (w; File("unixdict.txt").byLine())
        anags[w.sort.idup] ~= w.idup;
    writeln(maxs!q{a.length}(anags.values));
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 13 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #7 from bearophile_hugs eml.cc 2011-03-24 15:07:40 PDT ---
The examples hopefully show how much useful are the new max/min.
This is part of the "pivoting" part of a LU decomposition algorithm:

T imax = mat[j][j];
int nrow = j;
foreach (i; j .. N) {
    if (mat[i][j] > imax) {
        imax = mat[i][j];
        nrow = i;
    }
}


With the improved max() it becomes:

int nrow = max!((int i){ return mat[i][j]; })(iota(j, N));


That is similar to this Python code:

nrow = max(xrange(j, N), key=lambda i: mat[i][j])

Python designers have recognized this is a common pattern in code.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 24 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #8 from bearophile_hugs eml.cc 2011-03-25 10:35:06 PDT ---
A max/min with a comparing function is present in the Haskel standard library
too, named "maximumBy":


import Data.List
import Data.Ord

longestWord words = maximumBy (comparing length) words
listx = ["x", "yyy", "zz"]
main = print $ longestWord listx


Prints:
"yyy"

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 25 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #9 from bearophile_hugs eml.cc 2011-05-23 05:12:44 PDT ---
Another example, compared to using minPos():


import std.stdio, std.algorithm, std.array;

void main() {
    string[] data = ["red", "hello", "yes", "no", "roger", "bud"];

    // works in DMD 2.053
    string r1 = minPos!q{ walkLength(a) > walkLength(b) }(data).front();
    writeln(r1);

    // proposed
    string r2 = max!q{ walkLength(a) }(items);
    writeln(r2);
}


The second version is shorter and more readable, and just like schwartzSort()
the modified max() is more efficient than using minPos when the mapping
function becomes costly to compute.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 23 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #10 from bearophile_hugs eml.cc 2011-05-23 16:12:53 PDT ---
The code that uses minPos() also leads to a possible bug (a real bug I have
found in my code), shown here:


import std.stdio, std.algorithm, std.math, std.range, std.random;
int gen(int x) { return uniform(-100, 100); }
void main() {
    auto data = map!gen(iota(10));
    writeln(data);
    writeln(data);
    int result = minPos!((a, b){ return abs(a) < abs(b); })(data).front();
    writeln(result);
}


The output shows that gen is recomputed every time data is used, so
abs(a)<abs(b) gives bogus results:
[-87, -1, 86, -93, -89, 16, 17, -91, 55, 88]
[-36, 91, 38, 6, 23, 85, 60, -25, -48, -100]
-97

(Maybe I'd like an annotation to tell the compiler that "data" is an an Input
Range, unlike iota() that map is iterating on.)


To avoid that bug you have to turn data into an array:

import std.stdio, std.algorithm, std.math, std.range, std.random;
int gen(int x) { return uniform(-100, 100); }
void main() {
    auto data = array(map!gen(iota(10)));
    writeln(data);
    writeln(data);
    int result = minPos!((a, b){ return abs(a) < abs(b); })(data).front();
    writeln(result);
}


Now abs(a)<abs(b) gets computed on something that's not changing under the
carpet, and there is no bug:
[-41, -36, -15, -35, 91, 31, -5, -67, -91, -65]
[-41, -36, -15, -35, 91, 31, -5, -67, -91, -65]
-5



In code like this there is no need to turn data into an array, the result is
correct even keeping "data" lazy because items of data are accessed only once
to compute abs(a):


import std.stdio, std.algorithm, std.math, std.range, std.random;
int gen(int x) { return uniform(-100, 100); }
void main() {
    auto data = map!gen(iota(10));
    int result = min!q{ abs(a) }(data);
    writeln(result);
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 23 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #11 from bearophile_hugs eml.cc 2011-05-23 16:16:25 PDT ---
(In reply to comment #10)

 (Maybe I'd like an annotation to tell the compiler that "data" is an an Input
 Range, unlike iota() that map is iterating on.)
This bug doesn't happen if the mapping function of map() is pure. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
May 23 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #12 from bearophile_hugs eml.cc 2011-09-14 15:51:52 PDT ---
Another use case. Given this struct:

struct Foo {
    double x;
    int[100] a;
}


This D code finds the struct with the smallest x and assigns it to good[index]:

size_t idmin = 0;
foreach (size_t i; 1 .. N)
    if (foos[i].x < foos[idmin].x)
        idmin = i;
good[index] = foos[idmin];


With the improvement I have proposed you are allowed to replace it with a
higher level code, that expresses the  idea clearly, is less bug-prone, and
reqires one 1 instead of 5:


good[index] = min!q{ a.x }(foos);

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 14 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #13 from bearophile_hugs eml.cc 2011-10-12 15:55:39 PDT ---
Part of A comment by Andrei Alexandrescu:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=144562

 Second, you propose
 
 mins(collection)
 mins!(callable)(collection)
 maxs(collection)
 maxs!(callable)(collection)
 
 that return all elements. I'm not sure how you plan to return - create a
 new array, or iterate a la filter? The latter is interesting, but for
 either variant is quite difficult to find use examples that are frequent
 enough to make min followed by filter too verbose.
It's not a problem of verbosity. If you have to find all the min or max items of a lazy iterable, and you want to use min followed by filter, then you have to scan the sequence two times. This is a problem because: - Two scans are a waste of time if the sequence is a large array, because of CPU cache issues. - It becomes worse if the sequence is a lazy range, because you have to compute every item two times. This is sometimes not acceptable. I have found situations like this, where the range was coming from a map with a costly mapping function. So maxs/mins is a common enough pattern (I have just hit another use case), it's not trivial to implement manually (because you have to efficiently manage the cache of the max/min items found so far), and I think it can't be trivially and efficiently implemented using existing std.range/std.algorithm tools. This makes it a worth candidate for a specilized higher order function. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 12 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #14 from bearophile_hugs eml.cc 2012-01-29 12:53:57 PST ---
Another example of the usefulness of maxs/mins:

http://rosettacode.org/wiki/Ordered_words#D
 Define an ordered word as a word in which the letters of the word appear
 in alphabetic order. Examples include 'abbey' and 'dirt'. The task is to 
 find and display all the ordered words in this dictionary that have the
 longest word length.
A D2 solution: import std.stdio, std.algorithm, std.range, std.file; void main() { auto ws = filter!isSorted(readText("unixdict.txt").split()); immutable maxl = reduce!max(map!walkLength(ws)); writeln(filter!(w => w.length == maxl)(ws)); } With a maxs the code becomes shorter, simpler, more readable and it needs to scan the items only once, instead of two as the reduce-max + filter. When the items are many scanning them only once speeds up the code, and it's usable with InputRanges too that allow only a single scan of the items: import std.stdio, std.algorithm, std.range, std.file; void main() { auto ws = filter!sorted?(readText("unixdict.txt").split()); writeln(maxs!walkLength(ws)); } Another hint of the usefulness of the maxs (and mins) function comes from looking at the Haskell solution, that contains the keepLongest function that essentially is a maxs!length (but it returns the items in reverse order, for efficiency because it uses a list): http://rosettacode.org/wiki/Ordered_words#Haskell isOrdered wws (_ : ws) = and $ zipWith (<=) wws ws keepLongest _ acc [] = acc keepLongest max acc (w : ws) = let len = length w in case compare len max of LT -> keepLongest max acc ws EQ -> keepLongest max (w : acc) ws GT -> keepLongest len [w] ws longestOrderedWords = reverse . keepLongest 0 [] . filter isOrdered main = do str <- getContents let ws = longestOrderedWords $ words str mapM_ putStrLn ws -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jan 29 2012
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #15 from bearophile_hugs eml.cc 2013-03-20 06:16:14 PDT ---
In Haskell the reduce!min and reduce!max are named "minimum" and "maximum".

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 20 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #16 from bearophile_hugs eml.cc 2013-04-14 14:23:35 PDT ---
I still think mins()/maxs() are useful. But years after the original proposal
an API change in max/min is now problematic (unless you want to introduce
maximum/minimum functions). So I propose a small change in my original request.

Now I suggest to keep the max/min functions diadic as they are now, and add the
optional 'key' function. This is a backwards-compatible change in max/min.

This is the updated example code presented here:
http://d.puremagic.com/issues/show_bug.cgi?id=4705#c5


import std.stdio, std.algorithm, std.range, std.typecons;

auto hailstone(int n) pure nothrow {
    auto result = [n];
    while (n != 1) {
        n = (n & 1) ? (n * 3 + 1) : (n / 2);
        result ~= n;
    }
    return result;
}

void main() {
    iota(1, 1000)
    .map!(i => tuple(i.hailstone.length, i))
    .reduce!max[1]
    .writeln; // 871
}


With the original proposal the main() becomes (with the maximum the code is
very similar):

void main() {
    iota(1, 1000)
    max!(i => i.hailstone.length)
    .writeln;
}



With the reduced proposal the main() becomes:

void main() {
    iota(1, 1000)
    reduce!(max!(i => i.hailstone.length))
    .writeln;
}

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Apr 14 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4705



--- Comment #17 from bearophile_hugs eml.cc 2013-09-03 09:45:18 PDT ---
In dmd 2.064alpha you can't use minPos on a byKey range:

import std.algorithm: minPos;
void main() {
    int[int] aa = [1: 2];
    int m1 = aa.byKey.minPos!((a, b) => a > b)[0]; // errors
    int m2 = aa.keys.minPos!((a, b) => a > b)[0];
}



test.d(4): Error: template std.algorithm.minPos does not match any function
template declaration. Candidates are:
...\dmd2\src\phobos\std\algorithm.d(6436):        std.algorithm.minPos(alias
pred = "a < b", Range)(Range range) if (isForwardRange!Range &&
!isInfinite!Range && is(typeof(binaryFun!pred(range.front, range.front))))
test.d(4): Error: template std.algorithm.minPos(alias pred = "a < b",
Range)(Range range) if (isForwardRange!Range && !isInfinite!Range &&
is(typeof(binaryFun!pred(range.front, range.front)))) cannot deduce template
function from argument types !(__lambda3)(Result)
test.d(4): Error: template instance minPos!(__lambda3) errors instantiating
template

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 03 2013