digitalmars.D.bugs - [Issue 4705] New: Redesign of std.algorithm.max()/min() + mins()/maxs()
- d-bugmail puremagic.com (75/82) Aug 21 2010 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (14/14) Oct 11 2010 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (22/31) Dec 24 2010 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (11/11) Dec 24 2010 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (15/25) Dec 24 2010 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (29/29) Jan 01 2011 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (62/62) Feb 13 2011 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (20/20) Mar 24 2011 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (14/14) Mar 25 2011 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (19/19) May 23 2011 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (48/48) May 23 2011 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (7/9) May 23 2011 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (20/20) Sep 14 2011 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (21/32) Oct 12 2011 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (41/45) Jan 29 2012 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (6/6) Mar 20 2013 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (40/40) Apr 14 2013 http://d.puremagic.com/issues/show_bug.cgi?id=4705
- d-bugmail puremagic.com (23/23) Sep 03 2013 http://d.puremagic.com/issues/show_bug.cgi?id=4705
http://d.puremagic.com/issues/show_bug.cgi?id=4705 Summary: Redesign of std.algorithm.max()/min() + mins()/maxs() Product: D Version: D2 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Phobos AssignedTo: nobody puremagic.com ReportedBy: bearophile_hugs eml.cc Here I propose a redesign of the std.algorithm.max()/min() functions, and the creation of two related functions, that may be named mins()/maxs(). Usually the contents of std.algorithm.max() are not necessary to program in D2, their main purpose is to allow to shorten programs (and avoid some bugs), especially in the large number of situations where maximum performance is not necessary. So the contents of std.algorithm must be flexible and handy to use. So the contents of this module need to be chosen according to the most common pattern found in normal programs. To show few examples I use Python code (because Python design shows a very good choice of the most common patterns you find in your code). The task is to collect all items that have the max length:['hello', 'roger'] This task is so common that I use an utility module (as you see in Python max() accepts an optional key mapping function):maxlen = max(len(item) for item in data) [item for item in data if len(item) == maxlen]['hello', 'roger'] A subtask is to find the length of the longer string of 'data', here I show two possible solutions:from util import maxs maxs(data, key=len)5data = ["red", "hello", "yes", "no", "roger", "bud"] len(max(data, key=len))5 This code shows a possible D2 solution of the subtask and task: import std.stdio: writeln; import std.algorithm: reduce, map, filter, max; import std.range: walkLength; void main() { string[] data = ["red", "hello", "yes", "no", "roger", "bud"]; int lmax = reduce!max(map!walkLength(data)); writeln(lmax); auto maxs = filter!((a){ return a.length == lmax; })(data); writeln(maxs); } But "reduce!max(map!walkLength(data))" is too much complex for such simple and common task (I have found that code only after many tries), so I think the current max() design is not good enough. My experience shows that the most common&handy usages of max()/min() are four (for each): max(item1, item2) max(item1, item2, item3) max(collection) max!(callable)(collection) min(item1, item2) min(item1, item2, item3) min(collection) min!(callable)(collection) They are: to find the max among two items, among tree items, among a collection of items, and among a collection of items using a mapping callable. The callable is similar to the 'transform' function of schwartzSort(). So I suggest a redesign of min() and max() according to those four most common usages. I also suggest to create two new functions, that I have just called maxs() and mins() similar to max and min, that find all the max or min items of a given collection. They don't need the case with two or three items: maxs(collection) maxs!(callable)(collection) mins(collection) mins!(callable)(collection) (If this proposal gets accepted I may be able to create a patch with this changes to max/min and the mins/maxs. I have written the same four functions for D1, that don't use ranges.) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------max(len(item) for item in data)
Aug 21 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4705 bearophile_hugs eml.cc changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|nobody puremagic.com |andrei metalanguage.com I have assigned this to Andrei, but I don't know if that's the right thing to do. I am not asking Andrei to write the D2 implementation of the max/min/maxs/mins I discuss here, that's something I may do :-) But in Bugzilla I have some Phobos2 enhancement requests like this one where I'd like to receive a comment. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 11 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4705 Denis Derman <denis.spir gmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |denis.spir gmail.com ---I also suggest to create two new functions, that I have just called maxs() and mins() similar to max and min, that find all the max or min items of a given collection. They don't need the case with two or three items: maxs(collection) maxs!(callable)(collection) mins(collection) mins!(callable)(collection)Why I understand the utility of maxs/mins, I wonder about them in std lib. Aren't they "niche domain" tools? Also, I wonder about maxs(collection) without any trnasform func: if you don't map, then there is logically a single max value (even if can occur several times in collection). maxS/minS seems useful only in presence of a transform func on which results comparison is done: the compared value is (in your case string len) unique, but there may be several original values in coll that map to it. To say it deifferently, in your case mins(collection) would return possibly multiple specimens of the same string. And mins(lenghts) would return [5,5]. Unless I misunderstand your point. Denis -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 24 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4705 --- Aside the posted comment, I rather support this proposal. Have you ever called an external func for min(a,b) or sum(a,b)? (lol ;-) So, according to me, what we need is a simple and efficient implementation of min/max (and sum, etc...) for what is the common need: over a collection. An variant with transform func would also be nice, indeed. Denis -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 24 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4705Aren't they "niche domain" tools?The opposite is true. Functions and HOFs like max, min, sum, maxs, map, filter, reduce, and so on are basic building blocks that are useful to build most other programs. So their place is in the standard library, or sometimes even in the language itself.Also, I wonder about maxs(collection) without any trnasform func: if you don't map, then there is logically a single max value (even if can occur several times in collection). maxS/minS seems useful only in presence of a transform func on which results comparison is done: the compared value is (in your case string len) unique, but there may be several original values in coll that map to it. To say it deifferently, in your case mins(collection) would return possibly multiple specimens of the same string. And mins(lenghts) would return [5,5]. Unless I misunderstand your point.maxs with no transform function is useful even with integers, to count how many equal max values are present: maxs([5, 1, 5, 3]).length == 2 And if your collection contains objects, they may compare as equal despite being distinct. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Dec 24 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4705 Another example to show why a better max() is useful. The task is ot show how the number less than 1000 which has the longest hailstone sequence. For the hailstone see: http://en.wikipedia.org/wiki/Collatz_conjecture The code, DMD 2.051: import std.stdio, std.algorithm, std.range, std.typecons; auto hailstone(int n) { auto result = [n]; while (n != 1) { n = n & 1 ? n*3 + 1 : n/2; result ~= n; } return result; } void main() { auto r = reduce!max(map!((i){return tuple(hailstone(i).length, i);})(iota(1, 1000)))[1]; writeln(r); } With a max that supports a key function and iterables the line of code in the main() becomes just: auto r = max!hailstone(iota(1, 1000)); With the enhancement request 5395 it becomes even more readable: auto r = max!hailstone(1 .. 1000); -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jan 01 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705 Two more usage examples of the improved max. This program finds the longest common subsequence with a recursive algorithm: import std.stdio; T[] lcs(T)(T[] a, T[] b) { T[] longest(T)(T[] s, T[] t) { return s.length > t.length ? s : t; } if (!a.length || !b.length) return null; if (a[0] == b[0]) return a[0] ~ lcs(a[1..$], b[1..$]); return longest(lcs(a, b[1..$]), lcs(a[1..$], b)); } void main() { writeln(lcs("thisisatest", "testing123testing")); } With a key-mapping max() you are able to remove the inner function and simplify the code (untested): import std.stdio; T[] lcs(T)(T[] a, T[] b) { if (!a.length || !b.length) return null; if (a[0] == b[0]) return a[0] ~ lcs(a[1..$], b[1..$]); return max!q{a.length}([lcs(a, b[1..$]), lcs(a[1..$], b)]); } void main() { writeln(lcs("thisisatest", "testing123testing")); } -------------- This program shows the most common anagrams from a dictionary file of different words: import std.stdio, std.algorithm; void main() { string[][string] anags; foreach (w; File("unixdict.txt").byLine()) anags[w.sort.idup] ~= w.idup; int m = reduce!max(map!q{a.length}(anags.values)); writeln(filter!((wl){ return wl.length == m; })(anags.values)); } A key-mapping max() makes the code shorter and more readable (untested): import std.stdio, std.algorithm; void main() { string[][string] anags; foreach (w; File("unixdict.txt").byLine()) anags[w.sort.idup] ~= w.idup; int m = max!q{a.length}(anags.values); writeln(filter!((wl){ return wl.length == m; })(anags.values)); } maxs() simplifies the code even more (untested): import std.stdio, std.algorithm; void main() { string[][string] anags; foreach (w; File("unixdict.txt").byLine()) anags[w.sort.idup] ~= w.idup; writeln(maxs!q{a.length}(anags.values)); } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 13 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705 The examples hopefully show how much useful are the new max/min. This is part of the "pivoting" part of a LU decomposition algorithm: T imax = mat[j][j]; int nrow = j; foreach (i; j .. N) { if (mat[i][j] > imax) { imax = mat[i][j]; nrow = i; } } With the improved max() it becomes: int nrow = max!((int i){ return mat[i][j]; })(iota(j, N)); That is similar to this Python code: nrow = max(xrange(j, N), key=lambda i: mat[i][j]) Python designers have recognized this is a common pattern in code. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 24 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705 A max/min with a comparing function is present in the Haskel standard library too, named "maximumBy": import Data.List import Data.Ord longestWord words = maximumBy (comparing length) words listx = ["x", "yyy", "zz"] main = print $ longestWord listx Prints: "yyy" -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 25 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705 Another example, compared to using minPos(): import std.stdio, std.algorithm, std.array; void main() { string[] data = ["red", "hello", "yes", "no", "roger", "bud"]; // works in DMD 2.053 string r1 = minPos!q{ walkLength(a) > walkLength(b) }(data).front(); writeln(r1); // proposed string r2 = max!q{ walkLength(a) }(items); writeln(r2); } The second version is shorter and more readable, and just like schwartzSort() the modified max() is more efficient than using minPos when the mapping function becomes costly to compute. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
May 23 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705 The code that uses minPos() also leads to a possible bug (a real bug I have found in my code), shown here: import std.stdio, std.algorithm, std.math, std.range, std.random; int gen(int x) { return uniform(-100, 100); } void main() { auto data = map!gen(iota(10)); writeln(data); writeln(data); int result = minPos!((a, b){ return abs(a) < abs(b); })(data).front(); writeln(result); } The output shows that gen is recomputed every time data is used, so abs(a)<abs(b) gives bogus results: [-87, -1, 86, -93, -89, 16, 17, -91, 55, 88] [-36, 91, 38, 6, 23, 85, 60, -25, -48, -100] -97 (Maybe I'd like an annotation to tell the compiler that "data" is an an Input Range, unlike iota() that map is iterating on.) To avoid that bug you have to turn data into an array: import std.stdio, std.algorithm, std.math, std.range, std.random; int gen(int x) { return uniform(-100, 100); } void main() { auto data = array(map!gen(iota(10))); writeln(data); writeln(data); int result = minPos!((a, b){ return abs(a) < abs(b); })(data).front(); writeln(result); } Now abs(a)<abs(b) gets computed on something that's not changing under the carpet, and there is no bug: [-41, -36, -15, -35, 91, 31, -5, -67, -91, -65] [-41, -36, -15, -35, 91, 31, -5, -67, -91, -65] -5 In code like this there is no need to turn data into an array, the result is correct even keeping "data" lazy because items of data are accessed only once to compute abs(a): import std.stdio, std.algorithm, std.math, std.range, std.random; int gen(int x) { return uniform(-100, 100); } void main() { auto data = map!gen(iota(10)); int result = min!q{ abs(a) }(data); writeln(result); } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
May 23 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705(Maybe I'd like an annotation to tell the compiler that "data" is an an Input Range, unlike iota() that map is iterating on.)This bug doesn't happen if the mapping function of map() is pure. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
May 23 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705 Another use case. Given this struct: struct Foo { double x; int[100] a; } This D code finds the struct with the smallest x and assigns it to good[index]: size_t idmin = 0; foreach (size_t i; 1 .. N) if (foos[i].x < foos[idmin].x) idmin = i; good[index] = foos[idmin]; With the improvement I have proposed you are allowed to replace it with a higher level code, that expresses the idea clearly, is less bug-prone, and reqires one 1 instead of 5: good[index] = min!q{ a.x }(foos); -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 14 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705 Part of A comment by Andrei Alexandrescu: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=144562Second, you propose mins(collection) mins!(callable)(collection) maxs(collection) maxs!(callable)(collection) that return all elements. I'm not sure how you plan to return - create a new array, or iterate a la filter? The latter is interesting, but for either variant is quite difficult to find use examples that are frequent enough to make min followed by filter too verbose.It's not a problem of verbosity. If you have to find all the min or max items of a lazy iterable, and you want to use min followed by filter, then you have to scan the sequence two times. This is a problem because: - Two scans are a waste of time if the sequence is a large array, because of CPU cache issues. - It becomes worse if the sequence is a lazy range, because you have to compute every item two times. This is sometimes not acceptable. I have found situations like this, where the range was coming from a map with a costly mapping function. So maxs/mins is a common enough pattern (I have just hit another use case), it's not trivial to implement manually (because you have to efficiently manage the cache of the max/min items found so far), and I think it can't be trivially and efficiently implemented using existing std.range/std.algorithm tools. This makes it a worth candidate for a specilized higher order function. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 12 2011
http://d.puremagic.com/issues/show_bug.cgi?id=4705 Another example of the usefulness of maxs/mins: http://rosettacode.org/wiki/Ordered_words#DDefine an ordered word as a word in which the letters of the word appear in alphabetic order. Examples include 'abbey' and 'dirt'. The task is to find and display all the ordered words in this dictionary that have the longest word length.A D2 solution: import std.stdio, std.algorithm, std.range, std.file; void main() { auto ws = filter!isSorted(readText("unixdict.txt").split()); immutable maxl = reduce!max(map!walkLength(ws)); writeln(filter!(w => w.length == maxl)(ws)); } With a maxs the code becomes shorter, simpler, more readable and it needs to scan the items only once, instead of two as the reduce-max + filter. When the items are many scanning them only once speeds up the code, and it's usable with InputRanges too that allow only a single scan of the items: import std.stdio, std.algorithm, std.range, std.file; void main() { auto ws = filter!sorted?(readText("unixdict.txt").split()); writeln(maxs!walkLength(ws)); } Another hint of the usefulness of the maxs (and mins) function comes from looking at the Haskell solution, that contains the keepLongest function that essentially is a maxs!length (but it returns the items in reverse order, for efficiency because it uses a list): http://rosettacode.org/wiki/Ordered_words#Haskell isOrdered wws (_ : ws) = and $ zipWith (<=) wws ws keepLongest _ acc [] = acc keepLongest max acc (w : ws) = let len = length w in case compare len max of LT -> keepLongest max acc ws EQ -> keepLongest max (w : acc) ws GT -> keepLongest len [w] ws longestOrderedWords = reverse . keepLongest 0 [] . filter isOrdered main = do str <- getContents let ws = longestOrderedWords $ words str mapM_ putStrLn ws -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jan 29 2012
http://d.puremagic.com/issues/show_bug.cgi?id=4705 In Haskell the reduce!min and reduce!max are named "minimum" and "maximum". -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Mar 20 2013
http://d.puremagic.com/issues/show_bug.cgi?id=4705 I still think mins()/maxs() are useful. But years after the original proposal an API change in max/min is now problematic (unless you want to introduce maximum/minimum functions). So I propose a small change in my original request. Now I suggest to keep the max/min functions diadic as they are now, and add the optional 'key' function. This is a backwards-compatible change in max/min. This is the updated example code presented here: http://d.puremagic.com/issues/show_bug.cgi?id=4705#c5 import std.stdio, std.algorithm, std.range, std.typecons; auto hailstone(int n) pure nothrow { auto result = [n]; while (n != 1) { n = (n & 1) ? (n * 3 + 1) : (n / 2); result ~= n; } return result; } void main() { iota(1, 1000) .map!(i => tuple(i.hailstone.length, i)) .reduce!max[1] .writeln; // 871 } With the original proposal the main() becomes (with the maximum the code is very similar): void main() { iota(1, 1000) max!(i => i.hailstone.length) .writeln; } With the reduced proposal the main() becomes: void main() { iota(1, 1000) reduce!(max!(i => i.hailstone.length)) .writeln; } -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Apr 14 2013
http://d.puremagic.com/issues/show_bug.cgi?id=4705 In dmd 2.064alpha you can't use minPos on a byKey range: import std.algorithm: minPos; void main() { int[int] aa = [1: 2]; int m1 = aa.byKey.minPos!((a, b) => a > b)[0]; // errors int m2 = aa.keys.minPos!((a, b) => a > b)[0]; } test.d(4): Error: template std.algorithm.minPos does not match any function template declaration. Candidates are: ...\dmd2\src\phobos\std\algorithm.d(6436): std.algorithm.minPos(alias pred = "a < b", Range)(Range range) if (isForwardRange!Range && !isInfinite!Range && is(typeof(binaryFun!pred(range.front, range.front)))) test.d(4): Error: template std.algorithm.minPos(alias pred = "a < b", Range)(Range range) if (isForwardRange!Range && !isInfinite!Range && is(typeof(binaryFun!pred(range.front, range.front)))) cannot deduce template function from argument types !(__lambda3)(Result) test.d(4): Error: template instance minPos!(__lambda3) errors instantiating template -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 03 2013