
digitalmars.D.learn - Functional vs simple code

"ixid" <nuaccount gmail.com> writes:
Without optimization, the range-and-algorithm method takes about 10 
times as long as the simple code below it; with optimization and array 
bounds checking disabled, it takes six times as long. Why is the 
difference so huge? I'd expect a moderate overhead, but that's a big 
difference.

module main;
import std.stdio, std.algorithm, std.range, std.datetime;

enum MAX = 10_000_000UL;

void main() {
	StopWatch sw;
	sw.start;

	auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";

	sw.peek.msecs.writeln("msecs");
	sum1.writeln;
	sw.start;

	ulong sum2 = 0;
	foreach(i;0..MAX)
		sum2 += i * i;

	sw.peek.msecs.writeln("msecs");

	sum2.writeln;
}
Oct 02 2012
"ixid" <nuaccount gmail.com> writes:
On Tuesday, 2 October 2012 at 20:48:31 UTC, ixid wrote:
 Without optimization, the range-and-algorithm method takes about 10 
 times as long as the simple code below it; with optimization and array 
 bounds checking disabled, it takes six times as long. Why is the 
 difference so huge? I'd expect a moderate overhead, but that's a big 
 difference.

 module main;
 import std.stdio, std.algorithm, std.range, std.datetime;

 enum MAX = 10_000_000UL;

 void main() {
 	StopWatch sw;
 	sw.start;

 	auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";

 	sw.peek.msecs.writeln("msecs");
 	sum1.writeln;
 	sw.start;

 	ulong sum2 = 0;
 	foreach(i;0..MAX)
 		sum2 += i * i;

 	sw.peek.msecs.writeln("msecs");

 	sum2.writeln;
 }

I realised I was making the functional version more complicated than 
necessary. This version is faster but still four times slower than the 
simple version:

MAX.iota.reduce!"a + b * b".writeln;
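A quick sketch checking the one-liner against the plain loop at a small size. It only agrees because seedless reduce takes the first element of iota, which is 0, as its seed, and skipping the squaring of 0 is harmless:

```d
// Sketch: verify the seedless one-liner against the plain loop.
// Seedless reduce uses the first element (here 0) as the seed and never
// applies "b * b" to it; that is only safe because 0 * 0 == 0.
import std.algorithm, std.range;

void main() {
    enum N = 1000UL;
    auto functional = N.iota.reduce!"a + b * b";
    ulong loop = 0;
    foreach (i; 0 .. N)
        loop += i * i;
    assert(functional == loop);   // both 332_833_500 for N = 1000
}
```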
Oct 02 2012
Timon Gehr <timon.gehr gmx.ch> writes:
On 10/02/2012 10:48 PM, ixid wrote:
 Without optimization, the range-and-algorithm method takes about 10 
 times as long as the simple code below it; with optimization and array 
 bounds checking disabled, it takes six times as long. Why is the 
 difference so huge? I'd expect a moderate overhead, but that's a big 
 difference.

 module main;
 import std.stdio, std.algorithm, std.range, std.datetime;

 enum MAX = 10_000_000UL;

 void main() {
      StopWatch sw;
      sw.start;

      auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";

      sw.peek.msecs.writeln("msecs");
      sum1.writeln;
      sw.start;

      ulong sum2 = 0;
      foreach(i;0..MAX)
          sum2 += i * i;

      sw.peek.msecs.writeln("msecs");

      sum2.writeln;
 }

$ cat ixidbench.d
module main;
import std.stdio, std.algorithm, std.range, std.datetime;

enum MAX = 10_000_000_000UL;

void main() {
     StopWatch sw;
     sw.start;

     auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";

     sw.stop;
     sw.peek.msecs.writeln("msecs");
     sum1.writeln;
     sw.reset;
     sw.start;

     ulong sum2 = 0;
     foreach(i;0..MAX)
         sum2 += i * i;

     sw.stop;
     sw.peek.msecs.writeln("msecs");

     sum2.writeln;
}
$ ldmd2 -O -release -inline ixidbench.d -ofixidbench
$ ./ixidbench
6528msecs
7032546979563742720
7518msecs
7032546979563742720
Oct 02 2012
Timon Gehr <timon.gehr gmx.ch> writes:
On 10/03/2012 12:11 AM, Timon Gehr wrote:
 ...

 $ cat ixidbench.d
 module main;
 import std.stdio, std.algorithm, std.range, std.datetime;

 enum MAX = 10_000_000_000UL;

 void main() {
      StopWatch sw;
      sw.start;

      auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";

      sw.stop;
      sw.peek.msecs.writeln("msecs");
      sum1.writeln;
      sw.reset;
      sw.start;

      ulong sum2 = 0;
      foreach(i;0..MAX)
          sum2 += i * i;

      sw.stop;
      sw.peek.msecs.writeln("msecs");

      sum2.writeln;
 }
 $ ldmd2 -O -release -inline ixidbench.d -ofixidbench
 $ ./ixidbench
 6528msecs
 7032546979563742720
 7518msecs
 7032546979563742720

$ gdmd -O -release -inline ixidbench.d -ofixidbench
$ ./ixidbench
11250msecs
7032546979563742720
11233msecs
7032546979563742720
Oct 02 2012
"ixid" <nuaccount gmail.com> writes:
On Tuesday, 2 October 2012 at 22:13:10 UTC, Timon Gehr wrote:
 On 10/03/2012 12:11 AM, Timon Gehr wrote:
 ...

 $ cat ixidbench.d
 module main;
 import std.stdio, std.algorithm, std.range, std.datetime;

 enum MAX = 10_000_000_000UL;

 void main() {
     StopWatch sw;
     sw.start;

     auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";

     sw.stop;
     sw.peek.msecs.writeln("msecs");
     sum1.writeln;
     sw.reset;
     sw.start;

     ulong sum2 = 0;
     foreach(i;0..MAX)
         sum2 += i * i;

     sw.stop;
     sw.peek.msecs.writeln("msecs");

     sum2.writeln;
 }
 $ ldmd2 -O -release -inline ixidbench.d -ofixidbench
 $ ./ixidbench
 6528msecs
 7032546979563742720
 7518msecs
 7032546979563742720

$ gdmd -O -release -inline ixidbench.d -ofixidbench
$ ./ixidbench
11250msecs
7032546979563742720
11233msecs
7032546979563742720

Yes, I think it was just the compiler and compiler options rather than anything inherent in the method; DMD produced much slower code than GDC for the functional version. It's interesting that with LDC the functional style is significantly faster.
Oct 02 2012
"ixid" <nuaccount gmail.com> writes:
On Tuesday, 2 October 2012 at 22:13:10 UTC, Timon Gehr wrote:
 On 10/03/2012 12:11 AM, Timon Gehr wrote:
 ...

 $ cat ixidbench.d
 module main;
 import std.stdio, std.algorithm, std.range, std.datetime;

 enum MAX = 10_000_000_000UL;

 void main() {
     StopWatch sw;
     sw.start;

     auto sum1 = MAX.iota.map!(x => x * x).reduce!"a + b";

     sw.stop;
     sw.peek.msecs.writeln("msecs");
     sum1.writeln;
     sw.reset;
     sw.start;

     ulong sum2 = 0;
     foreach(i;0..MAX)
         sum2 += i * i;

     sw.stop;
     sw.peek.msecs.writeln("msecs");

     sum2.writeln;
 }
 $ ldmd2 -O -release -inline ixidbench.d -ofixidbench
 $ ./ixidbench
 6528msecs
 7032546979563742720
 7518msecs
 7032546979563742720

$ gdmd -O -release -inline ixidbench.d -ofixidbench
$ ./ixidbench
11250msecs
7032546979563742720
11233msecs
7032546979563742720

While fiddling with this I came across something that seems odd in the behaviour of reduce and wondered if it's intended. It rather limits the usefulness of reduce with UFCS to "a + b" and "a - b".

Reduce works correctly when provided with a seed argument:

reduce!"a + b + 2"(0, [1,1,1]).writeln; // == 9 as expected

With UFCS I see no elegant way to reduce a chain with a more sophisticated reduce argument than the two simple ones listed above, because the operation is not carried out on the first array element, which is used as the seed. This means:

[1,1,1].reduce!"a + b + 2".writeln; // == 7

which is useless and could easily trip people up, because the intuitive expectation is that it will act like the seed example. If I am wrong in that expectation, what is a use case for the seedless behaviour? Is there a way of writing the operation to give the same answer as the seeded version? The element in position 0 should surely have +2 added to itself; it's only being the seed out of convenience, isn't it? Instead we get 1 + (1 + 2) + (1 + 2).

The order of reduce's arguments, (seed, range), prevents it being used with UFCS properly. If it were (range, seed) then there would be no problem:

[1,1,1].reduce!"a + b + 2"(0).writeln; // == 9
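The difference described above can be made concrete with a pair of assertions (a minimal sketch using std.algorithm):

```d
// Seeded vs seedless reduce with a fold function that adds a constant.
import std.algorithm;

void main() {
    // Seeded: the "+ 2" fires once per element: ((0+1+2)+1+2)+1+2 == 9.
    assert(reduce!"a + b + 2"(0, [1, 1, 1]) == 9);
    // Seedless: element 0 becomes the seed, so "+ 2" fires one time
    // fewer: (1+1+2)+1+2 == 7.
    assert([1, 1, 1].reduce!"a + b + 2" == 7);
}
```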
Oct 02 2012
"Tommi" <tommitissari hotmail.com> writes:
On Wednesday, 3 October 2012 at 01:21:38 UTC, ixid wrote:
 If it were (range, seed) then there would be no problem:

 [1,1,1].reduce!"a + b + 2"(0).writeln; // == 9

My thoughts exactly.
Oct 02 2012
"jerro" <a a.com> writes:
 While fiddling with this I came across something that seems odd 
 in the behaviour of reduce and wondered if it's intended. It 
 rather limits the usefulness of reduce with UFCS to "a + b" and 
 "a - b".

 Reduce works correctly when provided with a seed argument:

 reduce!"a + b + 2"(0, [1,1,1]).writeln; // == 9 as expected

 With UFCS I see no elegant way to reduce a chain with a more 
 sophisticated reduce argument than the two simple ones listed 
 above because the operation is not carried out on the first 
 array element, used as the seed. This means:

 [1,1,1].reduce!"a + b + 2".writeln; // == 7

 which is useless and could easily trip people up because the 
 intuitive expectation is that it will act like the seed 
 example. If I am wrong in that expectation what is a use case 
 for the seedless behaviour?

The fact that it does something different than the seeded version does not make the seedless version useless; in fact, that's what makes it useful. A typical use case is finding the maximum of a range (there is an example of this in the documentation at http://dlang.org/phobos/std_algorithm.html#reduce). If you don't know the highest possible value of the elements, you can't come up with an appropriate seed for this case.

I agree that the difference between the two versions of reduce can be a bit confusing. Maybe it would be better if they were named differently.
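A minimal sketch of that use case; no seed is needed because the first element itself starts as the running maximum:

```d
import std.algorithm;

void main() {
    // The first element seeds the accumulator, so no "smallest possible
    // value" has to be invented for the element type.
    assert([3, 1, 4, 1, 5].reduce!max == 5);
}
```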
 Is there a way of writing the operation to give the same answer 
 as the seed answer?

In general, no (apart from the silly reduce!f(chain([seed], range))). If you want the behavior of the seeded version, use the seeded version.
 The element in position 0 should surely have +2 added to 
 itself, it's only being the seed out of convenience isn't it?

No, reduce!f(r) is not merely syntactic sugar for reduce!f(0, r) (you could say it's syntactic sugar for reduce!f(r.front, r.drop(1)), though). Making reduce!f(r) the same as reduce!f(0, r) doesn't really make any sense; would you expect [1, 2, 3].reduce!"a * b" to return 0? I wouldn't.
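That equivalence can be checked directly (a sketch; drop comes from std.range):

```d
import std.algorithm, std.range;

void main() {
    auto r = [1, 2, 3];
    // Seedless reduce == seeding with the first element over the rest.
    assert(r.reduce!"a * b" == reduce!"a * b"(r.front, r.drop(1)));
    assert(r.reduce!"a * b" == 6);    // a zero seed would have given 0
}
```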
 Instead we get 1 + (1 + 2) + (1 + 2). The order of reduce's 
 arguments, (seed, range) prevent it being used with UFCS 
 properly. If it were (range, seed) then there would be no 
 problem:

 [1,1,1].reduce!"a + b + 2"(0).writeln; // == 9

This would be better, yes. Reduce was written before there was UFCS, so they couldn't take this into account. I don't know if it is possible for this to be changed now, as it would definitely break some code.
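As it happens, later Phobos releases added fold (in std.algorithm.iteration), which takes the seed after the range, exactly the UFCS-friendly order suggested here, while leaving reduce unchanged for compatibility. A sketch against a modern compiler:

```d
// fold takes optional seeds as runtime arguments AFTER the range,
// so it composes naturally with UFCS chains.
import std.algorithm;

void main() {
    assert([1, 1, 1].fold!"a + b + 2"(0) == 9);
}
```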
Oct 02 2012
"ixid" <nuaccount gmail.com> writes:
A typical use case is to find the maximum of a range (there is 
an example of this in the documentation at 
http://dlang.org/phobos/std_algorithm.html#reduce). If you don't 
know the highest possible value of elements, then you couldn't 
come up with an appropriate seed for this case.

Fixing unseeded reduce would not remove that use case though; it just happens to be a case that's not broken by the way unseeded reduce works. Why couldn't unseeded reduce compare a (the first datum) to itself? That would discover that a isn't greater than a, leave a in position 0, and compare it with all of the other elements. It would also avoid breaking uses like "a + b + 2" and "a + b * 2", as well as a raft of more complex uses that reduce should be able to do but which currently require map followed by reduce.

Walter Bright has a subtle bug in his code (the combined sum and square reduce) in his latest Dr. Dobb's article due to the behaviour of unseeded reduce. I'd call that more broken than unintuitive.

int[] arr = [1,2,3,4,5];
auto r = arr.reduce!((a,b) => a + b, (a,b) => a + b * b);
writefln("sum = %s, sum of squares = %s", r[0], r[1]);

If you change the value of the first number to 2, the square sum is an incorrect 56 when it should be 58.
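The pitfall is easy to reproduce (a sketch; the wrong value arises because the first element seeds both accumulators without being squared):

```d
// Seedless multi-function reduce seeds BOTH accumulators with the first
// element, so that element is never squared. It only looks correct when
// the first element is 0 or 1.
import std.algorithm;

void main() {
    int[] arr = [2, 2, 3, 4, 5];
    auto r = arr.reduce!((a, b) => a + b, (a, b) => a + b * b);
    assert(r[0] == 16);   // the sum is fine: 2+2+3+4+5
    assert(r[1] == 56);   // but the true sum of squares is 58
}
```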
Oct 02 2012
"ixid" <nuaccount gmail.com> writes:
 Stuff


Actually, now I realise what the conflict is between the two: "a + b * b" would give the wrong answer when applied to the whole array in the manner I was thinking, by doing a + a * a for the first value.
Oct 02 2012