www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Improving std.range.Zip

reply =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:
I have noticed an emerging idiom in my code lately: bring together n  
ranges, transform them to one range using a n-ary function. Currently it's  
achieved with:

map!((a) {return myNaryFun(a._0, a._1, ...); })(zip(range1, range2, ...));

It's a bit of a nuisanse -- rarely do my transforming functions take  
tuples, and there's the necessity of composing 2 higher-order ranges to  
render fairly common functionality. I think Zip could be further  
parametrized with a zipper function, so that the above code would boil  
down to:

zip!myNaryFun(range1, range2, ...);

Having looked at Zip's source, such change shouldn't be a big deal. Oh,  
and the default zipper function would be std.typecons.tuple, not to  
trouble those not keen on generalization.

Good?

-- 
Tomek
Oct 24 2010
next sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
2010/10/24 Tomek Sowi=C5=84ski <just ask.me>:
 I have noticed an emerging idiom in my code lately: bring together n rang=

 transform them to one range using a n-ary function. Currently it's achiev=

 with:

 map!((a) {return myNaryFun(a._0, a._1, ...); })(zip(range1, range2, ...))=

 It's a bit of a nuisanse -- rarely do my transforming functions take tupl=

 and there's the necessity of composing 2 higher-order ranges to render
 fairly common functionality. I think Zip could be further parametrized wi=

 a zipper function, so that the above code would boil down to:

 zip!myNaryFun(range1, range2, ...);

 Having looked at Zip's source, such change shouldn't be a big deal. Oh, a=

 the default zipper function would be std.typecons.tuple, not to trouble
 those not keen on generalization.

That's what Haskell calls ZipWith. I called it tmap (as in tuple-map) when I needed it in D. IMHO, it should be a generalization of std.algorithm.map: let it accept n ranges and a n-ary function. It can even do a partial check at compile-time. Extending filter() and reduce() to work with n ranges in parallel and a n-args function is also useful. you may be interested in looking there: http://www.dsource.org/projects/dranges/ (more specifically, in the algorithm module) Philippe
Oct 24 2010
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Tomek S.:

 map!((a) {return myNaryFun(a._0, a._1, ...); })(zip(range1, range2, ...));

Currently the docs of std.algorithm.map say:
Multiple functions can be passed to map. In that case, the element type of map 
is a tuple containing one element for each function.<

But lot of time ago I have said that in my opinion that's not the best design. Python has a different design, its map does what you want, you may write your code in Python as: map(myNaryFun, range1, range2, ...) An example (Python 2.6):
 a = ["a", "b", "c", "d"]
 b = [1, 2, 3, 4]
 map(lambda c,n: c * n, a, b)



Is is possible to change the std.algorithm.map to a semantics similar to the Python one, that I think is more useful? Bye, bearophile
Oct 24 2010
next sibling parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
On Sun, 24 Oct 2010 21:39:24 +0200, bearophile <bearophileHUGS lycos.com>  
wrote:

 Tomek S.:

 map!((a) {return myNaryFun(a._0, a._1, ...); })(zip(range1, range2,  
 ...));

Currently the docs of std.algorithm.map say:
 Multiple functions can be passed to map. In that case, the element type  
 of map  is a tuple containing one element for each function.<

But lot of time ago I have said that in my opinion that's not the best design. Python has a different design, its map does what you want, you may write your code in Python as: map(myNaryFun, range1, range2, ...) An example (Python 2.6):
 a = ["a", "b", "c", "d"]
 b = [1, 2, 3, 4]
 map(lambda c,n: c * n, a, b)



Is is possible to change the std.algorithm.map to a semantics similar to the Python one, that I think is more useful?

From what I can see, map currently simply doesn't support passing it multiple ranges. It would be a trivial change to let it support multiple ranges in addition to multiple functions. -- Simen
Oct 24 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Simen kjaeraas:

  From what I can see, map currently simply doesn't support passing it
 multiple ranges. It would be a trivial change to let it support multiple
 ranges in addition to multiple functions.

If you may have multiple of both then the situation becomes complex. So I think that single function - multiple ranges is simpler & better than that :-) Bye, bearophile
Oct 24 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/24/10 14:55 CDT, Simen kjaeraas wrote:
 On Sun, 24 Oct 2010 21:39:24 +0200, bearophile
 <bearophileHUGS lycos.com> wrote:

 Tomek S.:

 map!((a) {return myNaryFun(a._0, a._1, ...); })(zip(range1, range2,
 ...));

Currently the docs of std.algorithm.map say:
 Multiple functions can be passed to map. In that case, the element
 type of map is a tuple containing one element for each function.<

But lot of time ago I have said that in my opinion that's not the best design. Python has a different design, its map does what you want, you may write your code in Python as: map(myNaryFun, range1, range2, ...) An example (Python 2.6):
 a = ["a", "b", "c", "d"]
 b = [1, 2, 3, 4]
 map(lambda c,n: c * n, a, b)



Is is possible to change the std.algorithm.map to a semantics similar to the Python one, that I think is more useful?

From what I can see, map currently simply doesn't support passing it multiple ranges. It would be a trivial change to let it support multiple ranges in addition to multiple functions.

This is coming full circle. At a point, map _did_ support multiple ranges. Some people found that non-modular - if you want multiple ranges, you should use map with zip... Andrei
Oct 24 2010
parent reply KennyTM~ <kennytm gmail.com> writes:
On Oct 25, 10 14:04, Andrei Alexandrescu wrote:
 This is coming full circle. At a point, map _did_ support multiple
 ranges. Some people found that non-modular - if you want multiple
 ranges, you should use map with zip...

Except that at "that point"[1], map's multi-range support is equivalent to map!(func)(r1, r2, r3...) == map!(func)(chain(r1, r2, r3...)) which is hardly useful. OTOH zipWithN is a very useful higher-order function. [1]: http://web.archive.org/web/20080417051734/www.digitalmars.com/d/2.0/phobos/std_algorithm.html#map
Oct 24 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/25/10 1:42 CDT, KennyTM~ wrote:
 On Oct 25, 10 14:04, Andrei Alexandrescu wrote:
 This is coming full circle. At a point, map _did_ support multiple
 ranges. Some people found that non-modular - if you want multiple
 ranges, you should use map with zip...

Except that at "that point"[1], map's multi-range support is equivalent to map!(func)(r1, r2, r3...) == map!(func)(chain(r1, r2, r3...)) which is hardly useful. OTOH zipWithN is a very useful higher-order function. [1]: http://web.archive.org/web/20080417051734/www.digitalmars.com/d/2.0/phobos/std_algorithm.html#map

Oh, you're right. Sorry! Andrei
Oct 24 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Tomek S.:

 You're right. Still, modularity is a valid point.

This is a general and interesting design topic. The short answer is that a real language is designed taking into account different trade-offs. Modularity is important, and Andrei surely likes it. Modularity means the ability to combine small parts to create many different complex systems. To do this the subsystems must interact clearly with each other, they must lack too many corner cases, and of course they need a common flexible interface. As I have recently read, in a modular system "the surprises will be pleasant ones". On the other hand modularity has some disadvantages too. To be clean those interfaces among subsystems must often be hi-level, this means they may be not fully efficient (unless your compiler is very smart, as the most optimizing compiler, the Stalin Scheme compiler). Another disadvantage is that you have to build common higher order systems over and over again, and sometimes you need some brain and time to build them well. To avoid this last problem a well designed system usually offers you pre-built higher order systems, made with the elementary components, that you may just use. In some situations very common usage patterns deserve to be encapsulated into handy forms. So, if you have one of the most used high order functions (map), if its usage often enough deals with mapping functions that work on more than one range of items for its arguments, and if the current way to perform this operation from elementary components produces a not so readable syntax, then practicality asks for for a less modular but more handy design of map :-) Few more examples of this trade-off below. ---------------- I have even (weakly) asked for amap()/afilter() functions, that are just array(map(...)) and array(filter(...)), because those are very common patterns in my code. I am still not sure adding amap/afilter to Phobos is a good idea, it just reduces the number of parenthesis a little, so it decreases noise in the code. ---------------- Recently in Bugzilla I have asked for sorted()/schwartzSorted(), in this case my desire is stronger, because they save a bit more than just few chars, they are expressions: http://d.puremagic.com/issues/show_bug.cgi?id=5076 ---------------- A similar thread has recently come up in the D.learn newsgroup: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=22413 This person has asked for a way to remove the first item in an array given not the item index (found with indexOf), but the item itself. Python2 too has those two different ways to remove an item from a list (array), by inded and by item:
 arr = ["a", "b", "c", "d"]
 arr.remove("x")



File "<stdin>", line 1, in <module> ValueError: list.remove(x): x not in list
 arr.index("c")



 arr.remove("c")
 arr



 del arr[1]
 arr



This composed operation (list.remove in Python) is useful in D both because it's a common need, and because it saves the testing code in the middle (between the index and the del, to be sure the item is present). Well, the test is done anyway, but it's done after the single operation, this is a little better. In Python the list.remove raises an exception if the item is missing. In D this function may throw an exception or return a false boolean (exceptions are safer, but less efficient, this is another trade-off. In some situations you may even add both functions, one that throws exceptions and one that returns true/false). In D a combined function like the Python list.remove() one also avoids possible troubles caused by the fact that indexOf() returns a signed integer, and while the delete function accepts signed integers too, then I think inside it compares them to a unsigned length and this may cause troubles :-( You may compile this wrong program in release mode, to see it: import std.algorithm: remove, indexOf; import std.stdio: writeln; void main() { int[] a1 = [3, 5, 7, 8]; auto pos = a1.indexOf(11); writeln(pos); // -1 int[] a2 = a1.remove(pos); writeln(a2); } Partially related problem, this is part of the docs of std.algorithm:
The original array has remained of the same length because all functions in
std.algorithm only change content, not topology. The value 8 is repeated
because std.algorithm.move  was invoked to move elements around and on integers
move simply copies the source to the destination. To replace a with the effect
of the removal, simply assign a = remove(a, 1). The slice will be rebound to
the shorter array and the operation completes with maximal efficiency.<

This is bug-prone, but maybe there is no way to avoid this behaviour, because in D arrays aren't true reference types. Bye, bearophile
Oct 25 2010
prev sibling next sibling parent =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:
Dnia 25-10-2010 o 08:42:31 KennyTM~ <kennytm gmail.com> napisa=B3(a):

 On Oct 25, 10 14:04, Andrei Alexandrescu wrote:
 This is coming full circle. At a point, map _did_ support multiple
 ranges. Some people found that non-modular - if you want multiple
 ranges, you should use map with zip...

Except that at "that point"[1], map's multi-range support is equivalen=

 to

      map!(func)(r1, r2, r3...) =3D=3D  map!(func)(chain(r1, r2, r3...)=

 which is hardly useful. OTOH zipWithN is a very useful higher-order  =

 function.

 [1]:  =

 http://web.archive.org/web/20080417051734/www.digitalmars.com/d/2.0/ph=

You're right. Still, modularity is a valid point. Thinking about it, thi= s = can be a good exercise in static inheritance -- make Zip the base taking= = care of multi-range traversal (stopping policy, etc) and the "derived" = ranges (Map, Filter, Reduce) would only implement element accessors, = transformation, and so on. -- = Tomek
Oct 25 2010
prev sibling parent =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:
Dnia 25-10-2010 o 21:20:24 Tomek Sowi=F1ski <just ask.me> napisa=B3(a):

 (Map, Filter, Reduce)

reduce is of course not a range ;) -- = Tomek
Oct 25 2010
prev sibling parent =?iso-8859-2?B?VG9tZWsgU293afFza2k=?= <just ask.me> writes:
Dnia 24-10-2010 o 21:34:54 Philippe Sigaud <philippe.sigaud gmail.com>  =

napisa=B3(a):

 That's what Haskell calls ZipWith. I called it tmap (as in tuple-map)
 when I needed it in D.
 IMHO, it should be a generalization of std.algorithm.map: let it
 accept n ranges and a n-ary function. It can even do a partial check
 at compile-time.

It could be. As long as the job gets done it doesn't matter what will be= = generalized, Map or Zip. One will cover other's functionality entirely.
 Extending filter() and reduce() to work with n ranges in parallel and
 a n-args function is also useful.
 you may be interested in looking there:
 http://www.dsource.org/projects/dranges/
 (more specifically, in the algorithm module)

Nice lib. Any plans to incorporate some of it to Phobos? -- = Tomek
Oct 24 2010