
digitalmars.D.learn - sliced().array compatibility with parallel?

reply Jay Norwood <jayn prismnet.com> writes:
I'm playing around with win32, v2.069.2 dmd and "dip80-ndslice": 
"~>0.8.8".  If I convert the 2D slice with .array(), should that 
first dimension then be compatible with parallel foreach?

I find that without using parallel, all the means get computed, 
but with parallel, only about half of them are computed in this 
example. The others remain NaN, as seen in the Visual D debugger.

import std.range : iota;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import std.experimental.ndslice;

enum testCount = 1;
double[1000] means;
double[] data;

void f1() {
  import std.parallelism;
  auto sl = data.sliced(1000, 100_000);
  auto sla = sl.array();
  foreach(i, vec; parallel(sla)) {
    double v = vec.sum(0.0);
    means[i] = v / 100_000;
  }
}

void main() {
  data = new double[100_000_000];
  for(int i = 0; i < 100_000_000; i++) { data[i] = i / 100_000_000.0; }
  auto r = benchmark!(f1)(testCount);
  auto f0Result = to!Duration(r[0] / testCount);
  f0Result.writeln;
  writeln(means[0]);
}
Jan 09
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
 I'm playing around with win32, v2.069.2 dmd and 
 "dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
 .array(), should that first dimension then be compatible with 
 parallel foreach?

 [...]
It is a bug (in Slice or in Parallel?). Please file an issue. Slice should work with parallel, and an array of slices should work with parallel.
Jan 09
parent Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 00:41:35 UTC, Ilya Yaroshenko wrote:
 It is a bug (in Slice or in Parallel?). Please file an issue.
 Slice should work with parallel, and an array of slices should 
 work with parallel.
Ok, thanks, I'll submit it.
Jan 09
prev sibling next sibling parent Jay Norwood <jayn prismnet.com> writes:
For example, means[63] through means[251] are consistently NaN 
when using parallel in this test, but are all computed double 
values when parallel is not used.
Jan 09
prev sibling next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
 I'm playing around with win32, v2.069.2 dmd and 
 "dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
 .array(), should that first dimension then be compatible with 
 parallel foreach?

 I find that without using parallel, all the means get computed, 
 but with parallel, only about  half of them are computed in 
 this example.  The others remain NaN, examined in the debugger 
 in Visual D.

 [...]
This is a bug in std.parallelism :-) Proof:

import std.range : iota;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import mir.ndslice;
import std.parallelism;

enum testCount = 1;
double[1000] means;
double[] data;

void f1() {
  //auto sl = data.sliced(1000, 100_000);
  //auto sla = sl.array();
  auto sla = new double[][1000];
  foreach(i, ref e; sla) {
    e = data[i * 100_000 .. (i+1) * 100_000];
  }
  foreach(i, vec; parallel(sla)) {
    double v = vec.sum;
    means[i] = v / vec.length;
  }
}

void main() {
  data = new double[100_000_000];
  foreach(i, ref e; data) { e = i / 100_000_000.0; }
  auto r = benchmark!(f1)(testCount);
  auto f0Result = to!Duration(r[0] / testCount);
  f0Result.writeln;
  writeln(means);
}

Prints:

[0.000499995, 0.0015, 0.0025, 0.0035, 0.00449999, 0.00549999, 0.00649999, 0.00749999, ..., 0.0615, 0.0625, nan, nan, nan, nan, nan, ...
Jan 09
parent reply Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 00:47:29 UTC, Ilya Yaroshenko wrote:
 This is a bug in std.parallelism :-)
Ok, thanks. I'm using your code and have reduced it a bit. It looks like there is some interaction with executing vec.sum: if I substitute a simple assignment of a double value, then all the values are updated in the parallel version as well.

import std.algorithm;

double[1000] dvp;
double[1000] dv2;
double[] data;

void f1() {
  import std.parallelism;
  auto sla = new double[][1000];
  foreach(i, ref e; sla) {
    e = data[i * 100_000 .. (i+1) * 100_000];
  }

  // calculate sums in parallel
  foreach(i, vec; parallel(sla)) {
    dvp[i] = vec.sum;
  }

  // calculate the same values, non-parallel
  foreach(i, vec; sla) {
    dv2[i] = vec.sum;
  }
}

int main() {
  data = new double[100_000_000];
  for(int i = 0; i < 100_000_000; i++) { data[i] = i / 100_000_000.0; }
  f1();

  // processed non-parallel: works ok
  foreach(dv; dv2) {
    if(dv != dv) { // test for NaN
      return 1;
    }
  }

  // calculated in parallel: leaves out processing of many values
  foreach(dv; dvp) {
    if(dv != dv) { // test for NaN
      return 1;
    }
  }
  return 0;
}
Jan 09
parent reply Russel Winder via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Sun, 2016-01-10 at 01:46 +0000, Jay Norwood via Digitalmars-d-learn wrote:
 [...]
      // processed non-parallel works ok
      foreach( dv; dv2){
          if(dv != dv){ // test for NaN
              return 1;
          }
      }

      // calculated parallel leaves out processing of many values
      foreach( dv; dvp){
          if(dv != dv){ // test for NaN
              return 1;
          }
      }
      return(0);
 }

I am not convinced these "Tests for NaN" actually test for NaN. I believe you have to use isNaN(dv).

--
Russel.
Dr Russel Winder  t: +44 20 7585 2200  voip: sip:russel.winder@ekiga.net
41 Buckmaster Road  m: +44 7770 465 077  xmpp: russel@winder.org.uk
London SW11 1EN, UK  w: www.russel.org.uk  skype: russel_winder
Jan 10
parent Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 12:11:39 UTC, Russel Winder wrote:
      foreach( dv; dvp){
          if(dv != dv){ // test for NaN
              return 1;
          }
      }
      return(0);
 }
I am not convinced these "Tests for NaN" actually test for NaN. I believe you have to use isNaN(dv).
I saw it mentioned in another post, and tried it. Works.
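A side note on the NaN test itself: under IEEE 754, NaN is the only floating-point value that compares unequal to itself, so the `dv != dv` check in the code above is in fact a valid NaN test; a dedicated predicate such as D's `std.math.isNaN` simply states the intent more clearly. A quick sketch in Python, used here only because the behavior is language-independent:

```python
import math

nan = float("nan")

# NaN is the only value for which x != x holds.
print(nan != nan)        # True: self-comparison detects NaN
print(math.isnan(nan))   # True: the explicit predicate agrees

x = 1.5
print(x != x)            # False for any ordinary number
print(math.isnan(x))     # False
```

Both checks always agree; the explicit predicate is just easier to read.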
Jan 10
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
 I'm playing around with win32, v2.069.2 dmd and 
 "dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
 .array(), should that first dimension then be compatible with 
 parallel foreach?

 [...]
Oh... there is no bug. means must be shared =) :

----
shared double[1000] means;
----
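The `shared` fix matters because module-level variables in D are thread-local by default: each worker thread in the task pool gets its own copy of `means`, so the workers' writes never reach the main thread's copy, which stays NaN. A rough analogue of that thread-local behavior, sketched in Python with `threading.local` (the names below are illustrative, not from the original post):

```python
import threading

tls = threading.local()   # each thread sees its own attributes
tls.value = "main"        # set in the main thread

def worker():
    # This write goes to the worker's own copy, not to main's.
    tls.value = "worker"

t = threading.Thread(target=worker)
t.start()
t.join()

# The main thread's copy is untouched by the worker's write,
# just as the main thread's thread-local means stays NaN in D.
print(tls.value)  # prints "main"
```

Marking the D array `shared` gives all threads one storage location instead of per-thread copies.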
Jan 09
next sibling parent reply Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 01:16:43 UTC, Ilya Yaroshenko wrote:
 On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
 I'm playing around with win32, v2.069.2 dmd and 
 "dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
 .array(), should that first dimension then be compatible with 
 parallel foreach?

 [...]
Oh... there is no bug. means must be shared =) :

----
shared double[1000] means;
----
ok, thanks. That works. I'll go back to trying ndslice now.
Jan 09
parent reply Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 01:54:18 UTC, Jay Norwood wrote:
 ok, thanks.  That works. I'll go back to trying ndslice now.
The parallel time for this case is about a 2x speed-up on my Core i5 laptop, debug build in 32-bit Windows, dmd.

D:\ec_mars_ddt\workspace\nd8>nd8.exe
parallel time msec:2495
non_parallel msec:5093

===========
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import std.experimental.ndslice;

shared double[1000] means;
double[] data;

void f1() {
  import std.parallelism;
  auto sl = data.sliced(1000, 100_000);
  foreach(i, vec; parallel(sl)) {
    means[i] = vec.sum / 100_000;
  }
}

void f2() {
  auto sl = data.sliced(1000, 100_000);
  foreach(i, vec; sl.array) {
    means[i] = vec.sum / 100_000;
  }
}

void main() {
  data = new double[100_000_000];
  for(int i = 0; i < 100_000_000; i++) { data[i] = i / 100_000_000.0; }
  StopWatch sw1, sw2;
  sw1.start();
  f1();
  auto r1 = sw1.peek().msecs;
  sw2.start();
  f2();
  auto r2 = sw2.peek().msecs;
  writeln("parallel time msec:", r1);
  writeln("non_parallel msec:", r2);
}
Jan 09
parent reply Ilya <ilyayaroshenko gmail.com> writes:
On Sunday, 10 January 2016 at 02:43:05 UTC, Jay Norwood wrote:
 On Sunday, 10 January 2016 at 01:54:18 UTC, Jay Norwood wrote:
 [...]
The parallel time for this case is about a 2x speed-up on my corei5 laptop, debug build in windows32, dmd. [...]
I will add significantly faster pairwise summation based on SIMD instructions into the future std.las. --Ilya
Jan 09
parent Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 03:23:14 UTC, Ilya wrote:
 I will add significantly faster pairwise summation based on 
 SIMD instructions into the future std.las. --Ilya
Wow! A lot of overhead in the debug build. I checked that the computed values are the same. This is on my laptop Core i5.

dub -b release-nobounds --force
parallel time msec:448
non_parallel msec:767

dub -b debug --force
parallel time msec:2465
non_parallel msec:4962

On my Core i7 desktop, release-nobounds:
parallel time msec:161
non_parallel msec:571
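For context on the summation Ilya mentions: pairwise summation recursively splits the range in half and adds the partial sums, which keeps rounding-error growth near O(log n) instead of the O(n) of a naive left-to-right loop, and its independent partial sums map well onto SIMD lanes. A minimal sketch in Python (no SIMD, just the recursion; the block-size cutoff is an illustrative choice):

```python
def pairwise_sum(xs, block=64):
    """Recursively halve the range; sum small blocks directly."""
    n = len(xs)
    if n <= block:
        total = 0.0
        for x in xs:
            total += x
        return total
    mid = n // 2
    return pairwise_sum(xs[:mid], block) + pairwise_sum(xs[mid:], block)

# Same shape of data as the benchmark above, scaled down.
data = [i / 1_000_000.0 for i in range(1_000_000)]
print(pairwise_sum(data))  # close to 499999.5
```

Real implementations avoid the slicing copies and unroll the base case into vector registers; the recursion tree is the essential idea.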
Jan 10
prev sibling parent reply Marc =?UTF-8?B?U2Now7x0eg==?= <schuetzm gmx.net> writes:
On Sunday, 10 January 2016 at 01:16:43 UTC, Ilya Yaroshenko wrote:
 On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
 I'm playing around with win32, v2.069.2 dmd and 
 "dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
 .array(), should that first dimension then be compatible with 
 parallel foreach?

 [...]
Oh... there is no bug. means must be shared =) :

----
shared double[1000] means;
----
I'd say, if `shared` is required, but it compiles without, then it's still a bug.
Jan 10
parent Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 11:21:53 UTC, Marc Schütz wrote:
 I'd say, if `shared` is required, but it compiles without, then 
 it's still a bug.
Yeah, probably so. Interestingly, without 'shared', using a simple assignment from a constant (means[i] = 1.0;) instead of assignment from the sum() evaluation results in all the values being initialized, so not marking it shared doesn't actually protect it from being written by another thread. Anyway, the shared declaration doesn't seem to slow the execution, and it does make sense to me that it should be marked shared.
Jan 10