## digitalmars.D.learn - How to use D parallel functions/library

Bishop120 <thomas.tc.coolidge gmail.com> writes:
```Hey everyone.  A new D learner here.  So far I love D and how
much better it's working than C++.  One thing I like doing is
parallel functions, which in C++ I do using OpenMP.  Right now I'm
trying to figure out how to do Conway's Game of Life in D in parallel.
Serially D is much faster than C++, so I feel fairly confident
that it should be faster using D's parallelism library.

In C++ with OpenMP it's pretty easy to do a parallel for with a
private and a reduction variable, but I am having problems
understanding how to do this in D.  Here's the meat of my parallel
code for the Game of Life.  Can y'all help me understand how to
convert this to D?

// Iterate through the 2D matrix, ignoring the border cells
// (starting at 1 and going to the matrix size).
#pragma omp for private (x) reduction (+:alive) schedule (dynamic)
for (int i = 1; i <= sizeX; i++)
{
    for (int j = 1; j <= sizeY; j++)
    {
        // Reset x, then sum all 8 neighbors (border cells included).
        x = 0;
        x += matrixA[i - 1][j] + matrixA[i + 1][j]
           + matrixA[i][j - 1] + matrixA[i][j + 1]
           + matrixA[i - 1][j - 1] + matrixA[i - 1][j + 1]
           + matrixA[i + 1][j - 1] + matrixA[i + 1][j + 1];

        // If the cell is alive
        if (matrixA[i][j] == true)
        {
            // The cell dies if it does not have 2 or 3 neighbors.
            if (x < 2 || x > 3)
            {
                matrixB[i][j] = false;
            }
            // Otherwise mark the cell as alive in matrix B.
            else
            {
                matrixB[i][j] = true;
                alive++;
            }
        }

        // If the cell is not alive
        else
        {
            // The cell becomes alive if it has exactly 3 neighbors.
            if (x == 3)
            {
                // Mark the cell alive in matrix B.
                matrixB[i][j] = true;
                alive++;
            }
        }
    }
}

The matrices are bools since a cell is only alive or dead.  I keep
track of the number of alive cells so that I can see at a glance
if things are working correctly, since the same seed run for the same
number of iterations will always give the same outcome.  For
simplicity's sake imagine that the matrices are 2002 x 2002.  The
reason there are extra rows and columns is so that I can do
wraparound, but that's not relevant here.

I figured this would be a simple parallel foreach function with
an iota range of sizeX, just declaring int x inside the
function so that I didn't have to worry about a shared variable, but
I can't get around the alive++ reduction and I don't understand
enough about D's reduction/parallel library.

Any ideas?  Thanks in advance for y'all's patience and assistance!

Thomas
```
Nov 24 2015
anonymous <anonymous example.com> writes:
```On 24.11.2015 19:49, Bishop120 wrote:
> I figured this would be a simple parallel foreach function with an iota
> range of sizeX, just declaring int x inside the function so
> that I didn't have to worry about a shared variable, but I can't get around
> the alive++ reduction and I don't understand enough about D's
> reduction/parallel library.
>
> Any ideas?  Thanks in advance for y'all's patience and assistance!

I'm not sure what you're asking. Are you maybe looking for
core.atomic.atomicOp?

Example:
----
import core.atomic: atomicOp;
import std.parallelism: parallel;
import std.range: iota;
import std.stdio: writeln;

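void main()
{
    int x = 0;        // plain variable: ++x below is a data race
    shared int y = 0; // shared variable, only ever updated atomically
    foreach (i; parallel(iota(100_000)))
    {
        ++x;                // unsynchronized read-modify-write, updates get lost
        y.atomicOp!"+="(1); // atomic increment, no update is lost
    }
    writeln(x); /* usually less than 100_000 */
    writeln(y); /* 100_000 */
}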
----
```
Nov 24 2015
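A minimal sketch of how the atomicOp suggestion above might carry over to the Game of Life loop. The helper name `stepAtomic`, the 7 x 7 test grid, and the per-row subtotal (one atomic add per row instead of one per cell) are assumptions made for illustration and do not come from the thread.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper (not from the thread): computes one generation from
// `a` into `b` and returns the live-cell count. Both grids are assumed to
// carry a one-cell border, as described in the original post.
int stepAtomic(const bool[][] a, bool[][] b)
{
    size_t rows = a.length - 2;    // interior rows are 1 .. rows
    size_t cols = a[0].length - 2; // interior columns are 1 .. cols
    shared int alive = 0;

    foreach (i; parallel(iota(size_t(1), rows + 1)))
    {
        int aliveHere = 0; // local to this iteration, so nothing is shared
        foreach (size_t j; 1 .. cols + 1)
        {
            // Sum the 8 neighbours; each bool promotes to 0 or 1.
            int x = a[i - 1][j - 1] + a[i - 1][j] + a[i - 1][j + 1]
                  + a[i][j - 1] + a[i][j + 1]
                  + a[i + 1][j - 1] + a[i + 1][j] + a[i + 1][j + 1];
            bool live = a[i][j] ? (x == 2 || x == 3) : (x == 3);
            b[i][j] = live; // unlike the C++ loop, dead cells are written too
            if (live) ++aliveHere;
        }
        alive.atomicOp!"+="(aliveHere); // one atomic add per row
    }
    return atomicLoad(alive);
}

void main()
{
    auto a = new bool[][](7, 7); // 5 x 5 interior plus the border
    auto b = new bool[][](7, 7);
    a[3][2] = a[3][3] = a[3][4] = true; // horizontal blinker
    writeln(stepAtomic(a, b)); // prints 3 (the blinker turns vertical)
}
```

Accumulating into `aliveHere` first keeps the contended atomic operation out of the inner loop, which is the part that would otherwise be hit once per live cell.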
thedeemon <dlang thedeemon.com> writes:
```On Tuesday, 24 November 2015 at 18:49:25 UTC, Bishop120 wrote:
> I figured this would be a simple parallel foreach function with
> an iota range of sizeX, just declaring int x inside
> the function so that I didn't have to worry about a shared
> variable, but I can't get around the alive++ reduction and I don't
> understand enough about D's reduction/parallel library.
>
> Any ideas?  Thanks in advance for y'all's patience and assistance!

Frequently incrementing the same variable from different parallel
threads is a very bad idea in terms of performance. I would
suggest counting the number of alive cells for each row independently
(in a local non-shared variable) and storing it in an array (one
value per row), then summing the array after the loop.

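// Needs: std.algorithm (sum), std.parallelism (parallel), std.range (iota)
auto aliveCellsPerRow = new int[N]; // one slot per row

foreach (i; iota(N).parallel) {
    int aliveHere;                   // local count for this row only
    //...process a row...
    aliveCellsPerRow[i] = aliveHere; // each iteration writes its own slot
}

alive = aliveCellsPerRow.sum;        // serial sum once the loop has finished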

Then everything will be truly parallel, correct and fast.
```
Nov 24 2015
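The per-row counting idea above can be sketched as a complete program. This is only an illustration under stated assumptions: the helper name `stepPerRow`, the 7 x 7 test grid, and the choice to write every cell of `b` (dead ones included) are not from the thread.

```d
import std.algorithm : sum;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper (not from the thread): computes one generation from
// `a` into `b` and returns the live-cell count. Both grids are assumed to
// carry a one-cell border, as described in the original post.
int stepPerRow(const bool[][] a, bool[][] b)
{
    size_t rows = a.length - 2;    // number of interior rows
    size_t cols = a[0].length - 2; // number of interior columns
    auto aliveCellsPerRow = new int[rows]; // one slot per interior row

    foreach (r; parallel(iota(rows)))
    {
        size_t i = r + 1;  // skip the border row
        int aliveHere = 0; // local, non-shared count for this row
        foreach (size_t j; 1 .. cols + 1)
        {
            // Sum the 8 neighbours; each bool promotes to 0 or 1.
            int x = a[i - 1][j - 1] + a[i - 1][j] + a[i - 1][j + 1]
                  + a[i][j - 1] + a[i][j + 1]
                  + a[i + 1][j - 1] + a[i + 1][j] + a[i + 1][j + 1];
            bool live = a[i][j] ? (x == 2 || x == 3) : (x == 3);
            b[i][j] = live;
            if (live) ++aliveHere;
        }
        aliveCellsPerRow[r] = aliveHere; // no other iteration touches slot r
    }
    return aliveCellsPerRow.sum; // serial reduction after the parallel loop
}

void main()
{
    auto a = new bool[][](7, 7); // 5 x 5 interior plus the border
    auto b = new bool[][](7, 7);
    a[3][2] = a[3][3] = a[3][4] = true; // horizontal blinker
    writeln(stepPerRow(a, b)); // prints 3 (the blinker turns vertical)
}
```

Because each iteration writes only its own array element, the loop needs no atomics or locks, and the final sum over one int per row is cheap next to the per-cell work.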