www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Replacement for Zip/Lockstep

reply "monarch_dodra" <monarchdodra gmail.com> writes:
I have an issue with these 2 functions: They have a run-time 
stopping condition.

This is bad on several counts:
1. Piss poor performance. Also, the "safe" default of "shortest" 
doen't help.
2. Traits support. In particular, things like length, empty or 
infinite. Mixing inifinite and non-inifinite ranges creates 
something that may or may not be infinite...

These issues are *not* solvable "in-place". The only solution is 
a new range, but that takes its argument statically. I think it 
would have big net benefits. And it would break nothing, since we 
could keep around zip/lockstep (but not recommend them).

I'd want to start working on this, but I wanted to touch on a few 
things here first:
1. Name? I'd need a name for this. Any thoughts?
2. Default value? AFAIK, *all* the times I've seen Zip/Lockstep 
in action, it was on ranges that had the same length, yet used 
the default "shortest" option, which is bad. I think it would be 
better to default on `requireSameLength`, and have 
shortest/longuest as explicit requests.

Thoughts? Suggestions for names?
Mar 16 2014
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
monarch_dodra:

 Thoughts?

I have various enhancement requests or suggestions on zip/lockstep in Bugzilla, I suggest you to take a look at them. I prefer the "shortest", because it's used in various languages, and it works well when one given range is finite and the other is infinite: zip([10, 20, 7, 3), [1, -1].cycle) I think the zip() name is the most common in languages. Adding a third is not easy. And I like the zip name. I have suggested to deprecate lockstep and keep zip only: https://d.puremagic.com/issues/show_bug.cgi?id=8155 zip(A, B) is currently not nothrow, and this is bad. I suggest to give it the kind of iteration (shortest/longest/etc) as template argument. So the shortest (that I think should be the default, as now), can be specialized to be nothrow: https://d.puremagic.com/issues/show_bug.cgi?id=11913 https://d.puremagic.com/issues/show_bug.cgi?id=6034 https://d.puremagic.com/issues/show_bug.cgi?id=8715 In the ER 8715 I have suggested to add to zip _optionally_ a mapping function, so it becomes like zipWith of Haskell: static struct Vec { int x, y; } auto r2 = zip!Vec([1,2,3], [10,20,30]); This is very handy and it's more efficient and shorter than a zip+map. Bye, bearophile
Mar 16 2014
prev sibling next sibling parent "jerro" <jkrempus gmail.com> writes:
 I prefer the "shortest", because it's used in various 
 languages, and it works well when one given range is finite and 
 the other is infinite:

 zip([10, 20, 7, 3), [1, -1].cycle)

The policy requireSameLenght could be changed so that it would ignore the empty property of infinite ranges. Then your example would still work with it. Or, alternatively, requireSameLengthOrInfinite could be added. I don't think that allowing multiple finite ranges with different numbers of elements is a very good default. Code that intentionally passes finite ranges with different lengths to zip() is probably pretty rare (code that does it unintentionally may be more common). And even when such code is correct, being explicit about it would make it clearer.
Mar 16 2014
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
jerro:

 Code that intentionally
 passes finite ranges with different lengths to zip() is probably
 pretty rare (code that does it unintentionally may be more 
 common).

In my code it's sufficiently common. I have just "grepped" only two usages of StoppingPolicy.requireSameLength in my code. And one of them is an artificial need. One use case for zipping unequal length sequences is to take adjacent pairs of a sequence, you zip the sequence with itself minus the first (Python2.6 code):
 a = [10, 20, 30, 40]
 zip(a, a[1:])



 from itertools import izip
 list(izip(a, a[1:]))



 zip(a, a[1:], a[2:])



This is useful for local averages, or to compute simple local operations on a sequence.
 And even when such code is correct, being explicit about it 
 would make it clearer.

One problem is that "StoppingPolicy.requireSameLength" is very long, so putting it in UFCS chains kills too much horizontal space. Bye, bearophile
Mar 16 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 17 March 2014 at 02:19:58 UTC, bearophile wrote:
 One problem is that "StoppingPolicy.requireSameLength" is very 
 long, so putting it in UFCS chains kills too much horizontal 
 space.

Thanks for the bug reports. I'll have to read them all in depth to really appreciate them in depth. I think you've convinced me to keep "shortest" as default, as consistency is king. It *would* help to have short aliases to promote simpler usage of each mode. Something like "zipShortest, zipLongest, zipSameLength". For example. I like 8715, I'll see if I can make it happen. I'm just worried about having functions with 2 different default parameters. That said, proper template magic could allow you to write whatever you want and still work ^^. Thinking about it harder, I'm wondering if it isn't possible to actually have a Zip!condition coexist with the run-time version. In which case, all I'd have to do is add the new version. I'll see what I can do. If not, the other idea is to alias `lockstep(condition)` to return a `Zip`, and introduce a new and improved `lockstep`. This works because (AFAIK), there is no publicly available `Lockstep`. But I'd rather keep the name `Zip` if at all possible. Time to start working!
Mar 17 2014
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
monarch_dodra:

 Thanks for the bug reports. I'll have to read them all in depth 
 to really appreciate them in depth.

If you do a quick search in Bugzilla perhaps you can find few more of them.
 I think you've convinced me to keep "shortest" as default, as 
 consistency is king.

In Haskell zip does the same as in Python and D: Prelude> zip [1,2] [10,20,30] [(1,10),(2,20)]
 It *would* help to have short aliases to promote simpler usage 
 of each mode. Something like "zipShortest, zipLongest, 
 zipSameLength". For example.

Right. Another improvement could come from a little change in D: enum E { A, B } void foo(E x) {} void main() { foo(A); // "E." is implicit if there are name clashes. }
 I like 8715, I'll see if I can make it happen.

One advantage of 8715 is that you don't need to create tuples first with zip and then something different with a map.
 I'm just worried about having functions with 2 different 
 default parameters.

Right, this should be designed well.
 Thinking about it harder, I'm wondering if it isn't possible to 
 actually have a Zip!condition coexist with the run-time 
 version. In which case, all I'd have to do is add the new 
 version. I'll see what I can do.

I'd like the run-time version to be deprecated and later removed, to keep the API clean.
 If not, the other idea is to alias `lockstep(condition)` to 
 return a `Zip`, and introduce a new and improved `lockstep`. 
 This works because (AFAIK), there is no publicly available 
 `Lockstep`.

If possible I'd like to keep only zip() and deprecate lockstep().
 Time to start working!

Good :-) Zip is not at the top of improvements I'd like for std.range/std.algorithm (near the top of my list there are issues 4705 and the already implemented but not merged issue 5550), but making zip nothrow is going to be a nice improvement. Bye, bearophile
Mar 17 2014
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 16/03/14 23:37, monarch_dodra wrote:
 Thoughts? Suggestions for names?

sync ... ? (or synch)
Mar 17 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 17 March 2014 at 21:08:12 UTC, Joseph Rushton Wakeling 
wrote:
 On 16/03/14 23:37, monarch_dodra wrote:
 Thoughts? Suggestions for names?

sync ... ? (or synch)

"sync" actually went through my head. In the the context of programming, it could really mean anything, so I don't think its so good.
Mar 17 2014
prev sibling next sibling parent "Meta" <jared771 gmail.com> writes:
On Monday, 17 March 2014 at 21:26:42 UTC, monarch_dodra wrote:
 On Monday, 17 March 2014 at 21:08:12 UTC, Joseph Rushton 
 Wakeling wrote:
 On 16/03/14 23:37, monarch_dodra wrote:
 Thoughts? Suggestions for names?

sync ... ? (or synch)

"sync" actually went through my head. In the the context of programming, it could really mean anything, so I don't think its so good.

Perhaps it is time to introduce std.range.dubstep.
Mar 17 2014
prev sibling next sibling parent "Meta" <jared771 gmail.com> writes:
On Monday, 17 March 2014 at 22:29:26 UTC, Meta wrote:
 On Monday, 17 March 2014 at 21:26:42 UTC, monarch_dodra wrote:
 On Monday, 17 March 2014 at 21:08:12 UTC, Joseph Rushton 
 Wakeling wrote:
 On 16/03/14 23:37, monarch_dodra wrote:
 Thoughts? Suggestions for names?

sync ... ? (or synch)

"sync" actually went through my head. In the the context of programming, it could really mean anything, so I don't think its so good.

Perhaps it is time to introduce std.range.dubstep.

On a more serious note, if the default is to require statically that both ranges are the same length, it could be named something like balancedZip or zipEven.
Mar 17 2014
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 17/03/14 22:26, monarch_dodra wrote:
 "sync" actually went through my head. In the the context of programming, it
 could really mean anything, so I don't think its so good.

inStep, then?
Mar 17 2014
prev sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Sunday, 16 March 2014 at 23:29:36 UTC, bearophile wrote:
 monarch_dodra:
 zip(A, B) is currently not nothrow, and this is bad. I suggest 
 to give it the kind of iteration (shortest/longest/etc) as 
 template argument. So the shortest (that I think should be the 
 default, as now), can be specialized to be nothrow:
 https://d.puremagic.com/issues/show_bug.cgi?id=11913

I looked into this, and the only reason it throws is because `requireSameLength` same length does an `enforce` if you pop and empty range. If think this is wrong, and should simply assert. `Zip` has an `empty` member, so nothing really justifies accepting a user calling pop on an empty Zip. I'll try to solve this now.
Mar 18 2014