
digitalmars.D - std.parallel_algorithm

reply dsimcha <dsimcha yahoo.com> writes:
Here's some very early work in progress on std.parallel_algorithm:

https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d

I haven't compiled the docs yet because it's too much of a WIP and
doesn't even work without a few modifications I made to std.parallelism.
So far I've got parallel merge, sort, and dot product.

One issue is that most of these algorithms require pretty fine grained 
parallelism and the break even point depends on a lot of things, like 
the details of the hardware.  How should we go about documenting/solving 
this?  Is it reasonable to always just punt the issue of finding the 
break even point and whether one is above or below it to the user of the 
library?  If not, what is a reasonable solution?
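To make the punting option concrete, here is a rough sketch of the shape such an API could take: fall back to the serial std.numeric.dotProduct below a caller-supplied cutoff, and chunk the work across the task pool above it. The wrapper name and the 50,000-element default are made up for illustration; they are not from parallel_algorithm.d.

```d
import std.algorithm.comparison : max, min;
import std.algorithm.iteration : map;
import std.numeric : dotProduct;
import std.parallelism : taskPool;
import std.range : iota;

/// Hypothetical wrapper: the break-even decision is punted to the caller
/// via `cutoff`; below it we just call the serial dotProduct.
double maybeParallelDot(const(double)[] a, const(double)[] b,
                        size_t cutoff = 50_000)
{
    assert(a.length == b.length);
    if (a.length < cutoff)
        return dotProduct(a, b);   // below break-even: serial wins

    // Roughly one chunk per worker thread; reduce the partial dot
    // products of the chunks in parallel.
    immutable chunk = max(size_t(1),
                          (a.length + taskPool.size) / (taskPool.size + 1));
    auto partials = iota(size_t(0), a.length, chunk).map!(
        i => dotProduct(a[i .. min(i + chunk, a.length)],
                        b[i .. min(i + chunk, a.length)]));
    return taskPool.reduce!"a + b"(0.0, partials);
}
```

A documented default like this at least makes the hardware dependence explicit: the user who cares can benchmark and override it, and everyone else gets a conservative serial fallback.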
May 22 2011
next sibling parent reply Russel Winder <russel russel.org.uk> writes:
David,

On Mon, 2011-05-23 at 00:00 -0400, dsimcha wrote:
 Here's some very early work in progress on std.parallel_algorithm:

 https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d

 I haven't compiled the docs yet because it's too much of a WIP and
 doesn't even work without a few modifications I made to std.parallelism.
 So far I've got parallel merge, sort and dot product.

 One issue is that most of these algorithms require pretty fine grained
 parallelism and the break even point depends on a lot of things, like
 the details of the hardware.  How should we go about documenting/solving
 this?  Is it reasonable to always just punt the issue of finding the
 break even point and whether one is above or below it to the user of the
 library?  If not, what is a reasonable solution?
Does this require a version of std.parallelism that is not in the
standard Phobos?

+ dmd -m32 -lib std/parallel_algorithm.d
std/parallel_algorithm.d(379): Error: class std.parallelism.TaskPool member
defaultWorkUnitSize is not accessible
+ dmd -m64 -lib std/parallel_algorithm.d
std/parallel_algorithm.d(379): Error: class std.parallelism.TaskPool member
defaultWorkUnitSize is not accessible

--
Russel.
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel russel.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
May 22 2011
parent dsimcha <dsimcha yahoo.com> writes:
On 5/23/2011 1:23 AM, Russel Winder wrote:
 David,

 On Mon, 2011-05-23 at 00:00 -0400, dsimcha wrote:
 Here's some very early work in progress on std.parallel_algorithm:

 https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d

 I haven't compiled the docs yet because it's too much of a WIP and
 doesn't even work without a few modifications I made to std.parallelism.
    So far I've got parallel merge, sort and dot product.

 One issue is that most of these algorithms require pretty fine grained
 parallelism and the break even point depends on a lot of things, like
 the details of the hardware.  How should we go about documenting/solving
 this?  Is it reasonable to always just punt the issue of finding the
 break even point and whether one is above or below it to the user of the
 library?  If not, what is a reasonable solution?
 Does this require a version of std.parallelism that is not in the
 standard Phobos?

 + dmd -m32 -lib std/parallel_algorithm.d
 std/parallel_algorithm.d(379): Error: class std.parallelism.TaskPool member
 defaultWorkUnitSize is not accessible
 + dmd -m64 -lib std/parallel_algorithm.d
 std/parallel_algorithm.d(379): Error: class std.parallelism.TaskPool member
 defaultWorkUnitSize is not accessible
Yeah, I meant to mention that when I was talking about how much of a work
in progress it is.  I didn't expect anyone to actually want to test it
this soon.  You need the slightly modified one at:

https://github.com/dsimcha/phobos/blob/master/std/parallelism.d
May 23 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-05-22 21:00, dsimcha wrote:
 Here's some very early work in progress on std.parallel_algorithm:
 
 https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d
 
 I haven't compiled the docs yet because it's too much of a WIP and
 doesn't even work without a few modifications I made to std.parallelism.
   So far I've got parallel merge, sort and dot product.
 
 One issue is that most of these algorithms require pretty fine grained
 parallelism and the break even point depends on a lot of things, like
 the details of the hardware.  How should we go about documenting/solving
 this?  Is it reasonable to always just punt the issue of finding the
 break even point and whether one is above or below it to the user of the
 library?  If not, what is a reasonable solution?
Well, unless you intend to try to have functions that choose whether
they're parallel or not at runtime based on what they're given, I'd fully
expect the choice between the parallel and non-parallel versions to be up
to the programmer using them, and in that case it's completely up to them
to determine which is more efficient given their particular use case.

At best, you could give some sort of information on what you expect the
break-even point to be, and if it's highly system dependent (as it sounds
like it is), then I don't see how you could do even that.  So, really,
I'd expect it to be completely up to the programmer.

- Jonathan M Davis
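Measuring the break-even point on a given machine can be as simple as timing a serial and a parallel version of the same reduction side by side and reading off where the parallel one starts to win. A rough sketch (the function name, workload, and sizes are illustrative, not from the library):

```d
import std.algorithm.iteration : map;
import std.datetime.stopwatch : benchmark;
import std.math : sqrt;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writefln;
import std.typecons : tuple;

/// Time a serial and a parallel sum-of-square-roots over the first n
/// integers; returns (serial result, parallel result, serial time,
/// parallel time) so the caller can compare them.
auto compareAt(int n)
{
    double serialSum = 0, parallelSum = 0;
    auto times = benchmark!(
        () { serialSum = 0;
             foreach (i; 0 .. n) serialSum += sqrt(double(i)); },
        () { parallelSum = taskPool.reduce!"a + b"(
                 0.0, map!(i => sqrt(double(i)))(iota(n))); }
    )(5);
    return tuple(serialSum, parallelSum, times[0], times[1]);
}

void main()
{
    // On a small n the serial version should win; on a large n the
    // parallel one should.  Where they cross is this machine's
    // break-even point for this particular workload.
    foreach (n; [1_000, 100_000, 10_000_000])
    {
        auto r = compareAt(n);
        writefln("n=%10s  serial: %s  parallel: %s", n, r[2], r[3]);
    }
}
```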
May 22 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
David, do you have some automated way of adding new modules to the
makefile? It's a bit of a drag to manually do this. ;)

Also, make sucks. I made a typo at line 429 and it reported an error
at line 542. Gah!
May 22 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Russell, get parallelism.d from here:
https://github.com/dsimcha/phobos/tree/master/std
May 22 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Anyway here's some results on a quad-core AMD:

https://gist.github.com/986322

Was too lazy to calculate an average out of N runs. :)
May 22 2011
prev sibling next sibling parent Russel Winder <russel russel.org.uk> writes:
On Mon, 2011-05-23 at 08:16 +0200, Andrej Mitrovic wrote:
 Russell, get parallelism.d from here:
 https://github.com/dsimcha/phobos/tree/master/std
Too many ls in there but I'll assume it's me ;-)

So the implication is that std.parallelism has changed since the 2.053
release and that std.parallel_algorithm relies on this change?

--
Russel.
May 23 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-05-23 02:07, Russel Winder wrote:
 On Mon, 2011-05-23 at 08:16 +0200, Andrej Mitrovic wrote:
 Russell, get parallelism.d from here:
 https://github.com/dsimcha/phobos/tree/master/std
 Too many ls in there but I'll assume it's me ;-)

 So the implication is that std.parallelism has changed since the 2.053
 release and that std.parallel_algorithm relies on this change?
That's what David said in his initial post. He had to make additional
changes to std.parallelism for std.parallel_algorithm to work.

- Jonathan M Davis
May 23 2011
prev sibling parent reply Russel Winder <russel russel.org.uk> writes:
Jonathan,

On Mon, 2011-05-23 at 02:15 -0700, Jonathan M Davis wrote:
 On 2011-05-23 02:07, Russel Winder wrote:
 On Mon, 2011-05-23 at 08:16 +0200, Andrej Mitrovic wrote:
 Russell, get parallelism.d from here:
 https://github.com/dsimcha/phobos/tree/master/std
 Too many ls in there but I'll assume it's me ;-)

 So the implication is that std.parallelism has changed since the 2.053
 release and that std.parallel_algorithm relies on this change?
 That's what David said in his initial post. He had to make additional
 changes to std.parallelism for std.parallel_algorithm to work.
Ah, thanks for the reminder.  That'll teach me to skim read, get the
URL, act, and fail to actually note important information.

--
Russel.
May 23 2011
parent dsimcha <dsimcha yahoo.com> writes:
On 5/23/2011 6:10 AM, Russel Winder wrote:
 Jonathan,

 On Mon, 2011-05-23 at 02:15 -0700, Jonathan M Davis wrote:
 On 2011-05-23 02:07, Russel Winder wrote:
 On Mon, 2011-05-23 at 08:16 +0200, Andrej Mitrovic wrote:
 Russell, get parallelism.d from here:
 https://github.com/dsimcha/phobos/tree/master/std
 Too many ls in there but I'll assume it's me ;-)

 So the implication is that std.parallelism has changed since the 2.053
 release and that std.parallel_algorithm relies on this change?
 That's what David said in his initial post. He had to make additional
 changes to std.parallelism for std.parallel_algorithm to work.
 Ah, thanks for the reminder.  That'll teach me to skim read, get the
 URL, act, and fail to actually note important information.
Interesting.  Thanks.  Unfortunately it doesn't look like merge and
dotProduct scale to a quad core very well, because I tuned these to be
very close to the break-even point on a dual core.  This is part of my
concern:  In the worst case, stuff that's very close to break-even on a
dual or quad core can get **slower** when run on a 6 or 8 core.  This is
just a fact of life.

Eventually, I'd like to push the break-even point for std.parallelism a
little lower, and I know roughly how to do it (mainly by implementing
work stealing instead of a simple FIFO queue).  However, there are tons
of nagging little implementation issues, and there would still always be
a break-even point, even if it's lower.

Right now the break-even point seems to be on the order of a few
microseconds per task, which on modern hardware equates to ~10,000 clock
cycles.  This is good enough for a lot of practical purposes, but
obviously not nano-parallelism.  I've made improving this a fairly low
priority.  Given the implementation difficulties, the fact that it's
fairly easy to take full advantage of multicore hardware without getting
down to this granularity (usually, if you need more performance, you're
executing some expensive algorithm in a loop, and you can just
parallelize the outer loop and be done with it), and the fact that
there's plenty of other useful hacking jobs in the D ecosystem, the
benefits-to-effort ratio doesn't seem that good.
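The parallelize-the-outer-loop pattern looks something like this (a sketch; the function name and the sqrt workload are made up as a stand-in for real per-row work):

```d
import std.math : sqrt;
import std.parallelism : taskPool;

/// Each outer iteration does `cols` units of work, so with a large
/// enough inner loop every task sits far above the ~10,000-cycle
/// break-even point and scales cleanly with core count.
double[] expensiveRows(size_t rows, size_t cols)
{
    auto result = new double[](rows);
    // Parallel foreach over the outer loop; the inner loop stays serial.
    foreach (i, ref r; taskPool.parallel(result))
    {
        double sum = 0;
        foreach (j; 0 .. cols)   // stand-in for expensive per-row work
            sum += sqrt(double(i * cols + j));
        r = sum;
    }
    return result;
}
```

Because each element is written by exactly one task, no synchronization is needed beyond the implicit join at the end of the parallel foreach.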
May 23 2011