www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 10670] New: std.algorithm.reduce: no-seed initialization wrong design

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10670

           Summary: std.algorithm.reduce: no-seed initialization wrong
                    design
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: monarchdodra gmail.com
        ReportedBy: monarchdodra gmail.com



reduce has a "no-seed" implementation provided: "The one-argument version $(D
reduce!fun(range)) works similarly, but it uses the first element of the range
as the seed (the range must be non-empty)."

I question this design as invalid: The first element is never passed through
the transform function. There is no reason it be given any "special treatment".
Here is a trivial example of why the design is (IMO) wrong:

//----
import std.stdio, std.algorithm, std.range;

void main(string[] args)
{
    auto arr = [1, 1];
    auto sumDoubles1 = reduce!"a + 2*b"(0, arr);
    auto sumDoubles2 = reduce!"a + 2*b"(arr);
    writeln(sumDoubles1); //4
    writeln(sumDoubles2); //3 (!)
}
//----

I really can't see any universe where you could justify that return value...

The correct seed value should be "Seed seed = fun(Seed.init, r.front)": This
means the first element is correctly "processed". Of course, doing this really
just boils down to seed being Seed.init.

An added "bonus" is that it makes simple "isIterable" types much more
efficient, as they don't have to check "isInitialized" on every iteration.

--------
Unsure if this is "ER", "wrong-code" or what. Not sure if changing the behavior
is acceptable (IMO, it should be).

I was re-writing reduce already, so please provide your feedback on the issue
so that I can take it into account :)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 19 2013
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10670


Peter Neubauer <peterneubauer2 gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peterneubauer2 gmail.com



07:34:43 PDT ---
float.init is nan, so this change would require all (float[]).reduce!"a+b"
calls to be rewritten with an explicit seed. I imagine that this is a
relatively common use case, while I've never had to reduce!"a+2b" anything.
Users of the latter case should get the burden of the explicit seed.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 19 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10670


hsteoh quickfur.ath.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsteoh quickfur.ath.cx



I've run into this issue before. I agree that if no seed value is provided, it
should use ElementType!range.init, NOT range.front.

In the case of floats, well... I'd argue that using the seedless variety of
reduce on a float range is already a bug, so it should be fixed anyway.
Alternatively, since float.init is always NaN, it makes no sense to use the
seedless variety of reduce, so we could just static assert(false) if
isFloatingPointType!(ElementType!range). This will break existing code but at
least it will point out why the code is breaking, instead of silently changing
whatever the current behaviour is.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 19 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10670




I've done the dev. Here is my preliminary experience with it:
For starters, there *is* some convenience in using "front as seed" for calls
such as:

//----
double[] a = [ 3, 4, 7, 11, 3, 2, 5 ];
// Compute minimum and maximum in one pass
auto r = reduce!(min, max)(a);
//----

Here, even a straight up "0.0" is a bad seed. The code needs to be fixed as:
//----
double[] a = [ 3, 4, 7, 11, 3, 2, 5 ];
// Compute minimum and maximum in one pass
auto r = reduce!(min, max)(tuple(double.max, double.min), a);
//----

Not very convenient. But still, nothing surmountable.

But at the same time, if you look at the existing unittests, you can see some
strange things such as:
//----
    // two funs
    auto r2 = reduce!("a + b", "a - b")(tuple(0.0, 0.0), a);
    writeln(r2);
    assert(r2[0] == 7 && r2[1] == -7);
    auto r3 = reduce!("a + b", "a - b")(a);
    assert(r3[0] == 7 && r3[1] == -1);
//----
I don't know what that was trying to prove, but it shows that no-seed can do
some really strange things.

The "assert not float (or char for that matter)" was very efficient in catching
float seeds, and catched almost all of the "wrong usage" cases. The only one
that it did not catch was the above case. But such code doesn't make sense to
me. I think "silent breakage" would really be minimal.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 22 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10670


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc





 In the case of floats, well... I'd argue that using the seedless variety of
 reduce on a float range is already a bug,
I don't think it's a bug, this is Python:
 a = [2.0, 3.0, 4.0]
 reduce(lambda x, y: x * y, a)
24.0 I have a ton of D code that relies on such behavour of D reduce. I suggest to just swap the seed and sequence arguments of reduce, to support UFCS chains, and leave the rest of reduce as it is. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 22 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10670






 
 In the case of floats, well... I'd argue that using the seedless variety of
 reduce on a float range is already a bug,
I don't think it's a bug, this is Python:
 a = [2.0, 3.0, 4.0]
 reduce(lambda x, y: x * y, a)
24.0 I have a ton of D code that relies on such behavour of D reduce.
You could be right, and I've seen use cases where it makes a lot of sense, eg: reduce!`a ~ " " ~ b`(["hello", "world"]);
 I suggest to just swap the seed and sequence arguments of reduce, to support
 UFCS chains, and leave the rest of reduce as it is.
I've tried in http://d.puremagic.com/issues/show_bug.cgi?id=8755, but the conclusion is that it is not possible without breaking code. And when I say "breaking code", I mean "still compiles but silently generates different results". That, and no migration plan was found. Short of finding a new name for a new function, I have not been able to achieve this goad. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 22 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10670






 I've tried in http://d.puremagic.com/issues/show_bug.cgi?id=8755, but the
 conclusion is that it is not possible without breaking code. And when I say
 "breaking code", I mean "still compiles but silently generates different
 results". That, and no migration plan was found. Short of finding a new name
 for a new function, I have not been able to achieve this goad.
I understand. I suggest to introduce a new function named like your "fold" and slowly deprecate "reduce". -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 22 2013