
digitalmars.D.learn - sliced().array compatibility with parallel?

reply Jay Norwood <jayn prismnet.com> writes:
I'm playing around with win32, v2.069.2 dmd and "dip80-ndslice": 
"~>0.8.8".  If I convert the 2D slice with .array(), should that 
first dimension then be compatible with parallel foreach?

I find that without using parallel, all the means get computed, 
but with parallel, only about half of them are computed in this 
example. The others remain NaN, as seen in the Visual D debugger.

import std.range : iota;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import std.experimental.ndslice;

enum testCount = 1;
double[1000] means;
double[] data;

void f1() {
  import std.parallelism;
  auto sl = data.sliced(1000, 100_000);
  auto sla = sl.array();
  foreach(i, vec; parallel(sla)) {
    double v = vec.sum(0.0);
    means[i] = v / 100_000;
  }
}

void main() {
  data = new double[100_000_000];
  for(int i = 0; i < 100_000_000; i++) { data[i] = i / 100_000_000.0; }
  auto r = benchmark!(f1)(testCount);
  auto f0Result = to!Duration(r[0] / testCount);
  f0Result.writeln;
  writeln(means[0]);
}
Jan 09
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
 I'm playing around with win32, v2.069.2 dmd and 
 "dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
 .array(), should that first dimension then be compatible with 
 parallel foreach?

 [...]
It is a bug (in Slice or in Parallel?). Please file an issue. Slice should work with parallel, and an array of slices should work with parallel.
Jan 09
parent Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 00:41:35 UTC, Ilya Yaroshenko wrote:
 It is a bug (in Slice or in Parallel?). Please file an issue.
 Slice should work with parallel, and an array of slices should 
 work with parallel.
Ok, thanks, I'll submit it.
Jan 09
prev sibling next sibling parent Jay Norwood <jayn prismnet.com> writes:
For example, means[63] through means[251] are consistently NaN 
when using parallel in this test, but are all computed double 
values when parallel is not used.
Jan 09
prev sibling next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
 I'm playing around with win32, v2.069.2 dmd and 
 "dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
 .array(), should that first dimension then be compatible with 
 parallel foreach?

 I find that without using parallel, all the means get computed, 
 but with parallel, only about  half of them are computed in 
 this example.  The others remain NaN, examined in the debugger 
 in Visual D.

 [...]
This is a bug in std.parallelism :-) Proof:

import std.range : iota;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import mir.ndslice;
import std.parallelism;

enum testCount = 1;
double[1000] means;
double[] data;

void f1() {
  //auto sl = data.sliced(1000, 100_000);
  //auto sla = sl.array();
  auto sla = new double[][1000];
  foreach(i, ref e; sla) {
    e = data[i * 100_000 .. (i+1) * 100_000];
  }
  foreach(i, vec; parallel(sla)) {
    double v = vec.sum;
    means[i] = v / vec.length;
  }
}

void main() {
  data = new double[100_000_000];
  foreach(i, ref e; data) { e = i / 100_000_000.0; }
  auto r = benchmark!(f1)(testCount);
  auto f0Result = to!Duration(r[0] / testCount);
  f0Result.writeln;
  writeln(means);
}

Prints:

[0.000499995, 0.0015, 0.0025, 0.0035, 0.00449999, 0.00549999, 0.00649999, 0.00749999, ..., 0.0615, 0.0625, nan, nan, nan, nan, nan, ...
Jan 09
parent reply Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 00:47:29 UTC, Ilya Yaroshenko wrote:
 This is a bug in std.parallelism :-)
Ok, thanks. I'm using your code and have reduced it a bit. It looks like there is some interaction with executing vec.sum: if I substitute a simple assignment of a double value, then all the values are updated in the parallel version as well.

import std.algorithm;

double[1000] dvp;
double[1000] dv2;
double[] data;

void f1() {
  import std.parallelism;
  auto sla = new double[][1000];
  foreach(i, ref e; sla) {
    e = data[i * 100_000 .. (i+1) * 100_000];
  }

  // calculate sums in parallel
  foreach(i, vec; parallel(sla)) {
    dvp[i] = vec.sum;
  }

  // calculate the same values, non-parallel
  foreach(i, vec; sla) {
    dv2[i] = vec.sum;
  }
}

int main() {
  data = new double[100_000_000];
  for(int i = 0; i < 100_000_000; i++) { data[i] = i / 100_000_000.0; }
  f1();

  // processed non-parallel: works ok
  foreach(dv; dv2) {
    if(dv != dv) { // test for NaN
      return 1;
    }
  }

  // calculated in parallel: leaves out processing of many values
  foreach(dv; dvp) {
    if(dv != dv) { // test for NaN
      return 1;
    }
  }
  return 0;
}
Jan 09
parent reply Russel Winder via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Sun, 2016-01-10 at 01:46 +0000, Jay Norwood via Digitalmars-d-learn wrote:
 [...]
      // processed non-parallel works ok
      foreach( dv; dv2){
          if(dv != dv){ // test for NaN
              return 1;
          }
      }

      // calculated parallel leaves out processing of many values
      foreach( dv; dvp){
          if(dv != dv){ // test for NaN
              return 1;
          }
      }
      return(0);
 }

I am not convinced these "Tests for NaN" actually test for NaN. I believe you have to use isNaN(dv).

--
Russel.
Dr Russel Winder  t: +44 20 7585 2200  voip: sip:russel.winder@ekiga.net
41 Buckmaster Road  m: +44 7770 465 077  xmpp: russel@winder.org.uk
London SW11 1EN, UK  w: www.russel.org.uk  skype: russel_winder
Jan 10
parent Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 12:11:39 UTC, Russel Winder wrote:
      foreach( dv; dvp){
          if(dv != dv){ // test for NaN
              return 1;
          }
      }
      return(0);
 }
I am not convinced these "Tests for NaN" actually test for NaN. I believe you have to use isNaN(dv).
I saw it mentioned in another post, and tried it. Works.
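A side note on the NaN test itself: under IEEE 754, NaN is the only floating-point value that compares unequal to itself, so the `dv != dv` check in the code above is in fact a valid NaN test; a dedicated predicate such as D's `std.math.isNaN` simply states the intent more clearly. A quick sketch in Python, used here only because the behavior is language-independent:

```python
import math

nan = float("nan")

# NaN is the only value for which x != x holds.
print(nan != nan)        # True: self-comparison detects NaN
print(math.isnan(nan))   # True: the explicit predicate agrees

x = 1.5
print(x != x)            # False for any ordinary number
print(math.isnan(x))     # False
```

Both checks always agree; the explicit predicate is just easier to read.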
Jan 10
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
 I'm playing around with win32, v2.069.2 dmd and 
 "dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
 .array(), should that first dimension then be compatible with 
 parallel foreach?

 [...]
Oh... there is no bug. means must be shared =) :

----
shared double[1000] means;
----
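The `shared` fix matters because module-level variables in D are thread-local by default: each worker thread in the task pool gets its own copy of `means`, so the workers' writes never reach the main thread's copy, which stays NaN. A rough analogue of that thread-local behavior, sketched in Python with `threading.local` (the names below are illustrative, not from the original post):

```python
import threading

tls = threading.local()   # each thread sees its own attributes
tls.value = "main"        # set in the main thread

def worker():
    # This write goes to the worker's own copy, not to main's.
    tls.value = "worker"

t = threading.Thread(target=worker)
t.start()
t.join()

# The main thread's copy is untouched by the worker's write,
# just as the main thread's thread-local means stays NaN in D.
print(tls.value)  # prints "main"
```

Marking the D array `shared` gives all threads one storage location instead of per-thread copies.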
Jan 09
next sibling parent reply Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 01:16:43 UTC, Ilya Yaroshenko wrote:
 On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
 I'm playing around with win32, v2.069.2 dmd and 
 "dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
 .array(), should that first dimension then be compatible with 
 parallel foreach?

 [...]
Oh... there is no bug. means must be shared =) :

----
shared double[1000] means;
----
ok, thanks. That works. I'll go back to trying ndslice now.
Jan 09
parent reply Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 01:54:18 UTC, Jay Norwood wrote:
 ok, thanks.  That works. I'll go back to trying ndslice now.
The parallel time for this case is about a 2x speed-up on my Core i5 laptop, debug build in 32-bit Windows, dmd.

D:\ec_mars_ddt\workspace\nd8>nd8.exe
parallel time msec:2495
non_parallel msec:5093

===========
import std.array : array;
import std.algorithm;
import std.datetime;
import std.conv : to;
import std.stdio;
import std.experimental.ndslice;

shared double[1000] means;
double[] data;

void f1() {
  import std.parallelism;
  auto sl = data.sliced(1000, 100_000);
  foreach(i, vec; parallel(sl)) {
    means[i] = vec.sum / 100_000;
  }
}

void f2() {
  auto sl = data.sliced(1000, 100_000);
  foreach(i, vec; sl.array) {
    means[i] = vec.sum / 100_000;
  }
}

void main() {
  data = new double[100_000_000];
  for(int i = 0; i < 100_000_000; i++) { data[i] = i / 100_000_000.0; }
  StopWatch sw1, sw2;
  sw1.start();
  f1();
  auto r1 = sw1.peek().msecs;
  sw2.start();
  f2();
  auto r2 = sw2.peek().msecs;
  writeln("parallel time msec:", r1);
  writeln("non_parallel msec:", r2);
}
Jan 09
parent reply Ilya <ilyayaroshenko gmail.com> writes:
On Sunday, 10 January 2016 at 02:43:05 UTC, Jay Norwood wrote:
 On Sunday, 10 January 2016 at 01:54:18 UTC, Jay Norwood wrote:
 [...]
The parallel time for this case is about a 2x speed-up on my corei5 laptop, debug build in windows32, dmd. [...]
I will add significantly faster pairwise summation based on SIMD instructions into the future std.las. --Ilya
Jan 09
parent Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 03:23:14 UTC, Ilya wrote:
 I will add significantly faster pairwise summation based on 
 SIMD instructions into the future std.las. --Ilya
Wow! A lot of overhead in the debug build. I checked that the computed values are the same. This is on my laptop Core i5.

dub -b release-nobounds --force
parallel time msec:448
non_parallel msec:767

dub -b debug --force
parallel time msec:2465
non_parallel msec:4962

On my Core i7 desktop, release-nobounds:
parallel time msec:161
non_parallel msec:571
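For context on the summation Ilya mentions: pairwise summation recursively splits the range in half and adds the partial sums, which keeps rounding-error growth near O(log n) instead of the O(n) of a naive left-to-right loop, and its independent partial sums map well onto SIMD lanes. A minimal sketch in Python (no SIMD, just the recursion; the block-size cutoff is an illustrative choice):

```python
def pairwise_sum(xs, block=64):
    """Recursively halve the range; sum small blocks directly."""
    n = len(xs)
    if n <= block:
        total = 0.0
        for x in xs:
            total += x
        return total
    mid = n // 2
    return pairwise_sum(xs[:mid], block) + pairwise_sum(xs[mid:], block)

# Same shape of data as the benchmark above, scaled down.
data = [i / 1_000_000.0 for i in range(1_000_000)]
print(pairwise_sum(data))  # close to 499999.5
```

Real implementations avoid the slicing copies and unroll the base case into vector registers; the recursion tree is the essential idea.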
Jan 10
prev sibling parent reply Marc =?UTF-8?B?U2Now7x0eg==?= <schuetzm gmx.net> writes:
On Sunday, 10 January 2016 at 01:16:43 UTC, Ilya Yaroshenko wrote:
 On Saturday, 9 January 2016 at 23:20:00 UTC, Jay Norwood wrote:
 I'm playing around with win32, v2.069.2 dmd and 
 "dip80-ndslice": "~>0.8.8".  If I convert the 2D slice with 
 .array(), should that first dimension then be compatible with 
 parallel foreach?

 [...]
Oh... there is no bug. means must be shared =) :

----
shared double[1000] means;
----
I'd say, if `shared` is required, but it compiles without, then it's still a bug.
Jan 10
parent Jay Norwood <jayn prismnet.com> writes:
On Sunday, 10 January 2016 at 11:21:53 UTC, Marc Schütz wrote:
 I'd say, if `shared` is required, but it compiles without, then 
 it's still a bug.
Yeah, probably so. Interestingly, without 'shared', using a simple assignment from a constant (means[i] = 1.0;) instead of assignment from the sum() evaluation results in all the values being initialized, so not marking it shared doesn't actually protect it from being written by another thread. Anyway, the shared declaration doesn't seem to slow the execution, and it does make sense to me that it should be marked shared.
Jan 10