
digitalmars.D - Parallel programming

bearophile <bearophileHUGS lycos.com> writes:
How much time do we have to wait to see some parallel processing features in D?
People are getting more and more rabid because they have few ways to use their
2-4 core CPUs.
Classic multithreading is useful, but sometimes it's not easy to use correctly.

There are other ways to write parallel code that D may adopt (more than one way
is probably better; no silver bullet exists in this field). Their point is to
let programmers use the 2-4+ core CPUs of today (and maybe the 80-1000+ cores
of the future) in non-speed-critical parts of the code where the programmer
wants to use the other cores anyway, without too much programming effort.

I think Walter wants the D language to be multi-paradigm; one of the best ways
to allow multiprocessing in a simple and safer way is Stream Processing
(http://en.wikipedia.org/wiki/Stream_Processing ). D syntax may grow a few
constructs to support that kind of programming in a simple way (C++ has some
such libs, I think).

Another easy way to perform multiprocessing is to vectorize: the compiler can
automatically use all the cores to evaluate expressions like
array1+array2+array3.
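No D compiler does this today, but the transformation such cross-core
vectorization implies is easy to sketch by hand. A minimal illustration
(written in Python for brevity; the function name, chunking scheme, and fixed
worker count are all invented for the example):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_elementwise_sum(arrays, workers=4):
    """Compute array1+array2+...+arrayN elementwise, one chunk per task."""
    n = len(arrays[0])
    out = [0] * n
    chunk = (n + workers - 1) // workers  # ceiling division

    def add_chunk(start):
        for i in range(start, min(start + chunk, n)):
            out[i] = sum(a[i] for a in arrays)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # each worker fills a disjoint slice of the output
        list(pool.map(add_chunk, range(0, n, chunk)))
    return out
```

For example, parallel_elementwise_sum([[1, 2], [10, 20], [100, 200]]) yields
[111, 222]. The point is that the chunks are independent, so the compiler
could emit this split without changing the program's meaning.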

Another way to perform multiprocessing is to add to the D syntax the
parallel_for (and a few related things to merge results back, etc.) that was
present in the "Parallel Pascal" language. Such constructs are much simpler to
use correctly than threads. Sun's new "Fortress" language shows similar things,
but they are more refined than the Parallel Pascal ones (and they look more
complex to understand and use, so they may be overkill for D, I don't know;
some of those parallel features of Fortress look quite difficult to implement
to me).
Some time ago I saw a form of parallel_for and the like in a small and easy
language from MIT, which I think is simple enough.
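The semantics of such a parallel_for are small enough to sketch as a library
function (Python here for illustration; the names and the worker count are
invented, and a real language feature would also have to verify that
iterations are independent):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(start, stop, body, workers=4):
    """Run body(i) for every i in [start, stop).

    Iterations may run concurrently and in any order, so the body must
    not depend on other iterations - the contract a language-level
    parallel_for would impose."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(body, range(start, stop)))  # waits for all iterations

# Each iteration writes a disjoint slot; the "merge things back" step
# then combines the per-iteration results sequentially.
squares = [0] * 8
parallel_for(0, 8, lambda i: squares.__setitem__(i, i * i))
total = sum(squares)  # merge step
```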

Other ways to use parallel code are now being pushed by Intel, by OpenMP, and
by the hairy but usable CUDA from Nvidia (I am not sure I want to learn CUDA:
it's a C variant, but it seems to require a large human memory and a large
human brain to be used, while I think D may have simpler built-in things. Much
more "serious" D programmers may use external libs that allow them any fine
control they want). To me they look too much in flux right now to be copied
much by D.

Bye,
bearophile
Jul 15 2008
Sean Kelly <sean invisibleduck.org> writes:
bearophile wrote:
 How much time do we have to wait to see some parallel processing features in
D? People are getting more and more rabid because they have few ways to use
their 2-4 core CPUs.
 Classic multithreading is useful, but sometimes it's not easy to use correctly.
 
 There are other ways to write parallel code, that D may adopt (more than one
way is probably better, no silver bullet exists in this field). Their point is
to allow to use the 2-4+ core CPUs of today (and maybe the 80-1000+ cores of
the future) in non-speed-critical parts of the code where the programmer wants
to use the other cores anyway, without too much programming efforts.
 
 I think Walter wants D language to be multi-paradigm; one of the best ways to
allow multi-processing in a simple and safer way is the Stream Processing
(http://en.wikipedia.org/wiki/Stream_Processing ), D syntax may grow few
constructs to use such kind of programming in a simple way (C++ has some such
libs, I think).
 
 Another easy way to perform multi processing is to vectorize. It means the
compiler can automatically use all the cores to perform operators like
array1+array2+array3.
 
 Another way to perform multi processing is so add to the D syntax the
parallel_for (and few related things to merge things back, etc) syntax that was
present in the "Parallel Pascal" language. Such things are quite simpler to use
correctly than threads. The new "Fortress" language by Sun shows similar
things, but they are more refined compared to the Parallel Pascal ones (and
they look more complex to understand and use, so they may be overkill for D, I
don't know. Some of those parallel things of Fortress look quite difficult to
implement to me).
 Time ago I have seen a form of parallel_for and the like in a small and easy
language from MIT, that I think are simple enough.

I asked for parallelization support for foreach... well, ages ago. At the time
Walter said no because DMD was years away from being able to do anything like
that, but perhaps with the new focus on multiprogramming one can argue more
strongly that it's important to get something like this into the spec even if
DMD itself doesn't support it. My request was pretty minimal, and partially a
reaction to foreach_reverse. It was:

foreach( ... )       // defaults to "fwd"
foreach(fwd)( ... )
foreach(rev)( ... )
foreach(any)( ... )

Thus foreach(any) is eligible for parallelization, while fwd and rev are what
we have now. This would be easy enough with templates and another keyword:

apply!(fwd)( ... )   // etc.

But passing a delegate literal as an argument isn't nearly as nice as the
built-in foreach, and Tom's (IIRC) proposal to clean up the syntax for this
doesn't look like it will ever be accepted.
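The fwd/rev/any distinction can be mimicked today with ordinary higher-order
functions, which also shows why any is the parallelizable case. A sketch (in
Python rather than D; apply_fwd, apply_rev, and apply_any are invented names):

```python
from concurrent.futures import ThreadPoolExecutor

def apply_fwd(items, body):
    for x in items:            # today's foreach: strict forward order
        body(x)

def apply_rev(items, body):
    for x in reversed(items):  # today's foreach_reverse
        body(x)

def apply_any(items, body, workers=4):
    # Order deliberately unspecified - exactly the property that makes
    # the loop eligible for parallel execution.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(body, items))
```

A body passed to apply_any must tolerate any ordering; the other two keep
their familiar guarantees.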
 Other ways to use parallel code are now being pushed by Intel, OpenMP, and the
hairy but usable CUDA by Nvidia (I am not sure I want to learn CUDA, it's a C
variant, but seems to require a large human memory and a large human brain to
be used, while I think D may have simpler built-in things. Much more "serious"
D programmers may use external libs that allow them any fine control they
want). To me they look too much in flux now to be copied much by D.

D already has coroutines, DCSP, and futures available from various programmers
(Mikola Lysenko for the first two), so I think the state of multiprogramming
in D is actually pretty good even without additional language support.

Sean
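For readers who haven't met them, the future pattern Sean refers to is tiny in
practice. A sketch of the shape of the idea in Python (the D libraries he
mentions have their own APIs; this only shows the pattern):

```python
from concurrent.futures import ThreadPoolExecutor

def expensive():
    return 6 * 7                   # stands in for a long computation

with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(expensive)   # computation starts in the background
    # ... the caller keeps doing other work here ...
    value = fut.result()           # rendezvous: block only when needed
```

The caller pays for synchronization only at the single point where the value
is actually consumed, which is what makes futures so easy to retrofit.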
Jul 15 2008
downs <default_357-line yahoo.de> writes:
Sean Kelly wrote:
 D already has coroutines, DCSP, and futures available from various
 programmers (Mikola Lysenko for the first two), so I think the state of
 multiprogramming in D is actually pretty good even without additional
 language support.
 

 Sean

For what it's worth, coroutines and futures are also in tools
(http://dsource.org/projects/scrapple/browser/trunk/tools/tools).

Also, I agree with your sentiment.

--downs
Jul 15 2008
downs <default_357-line yahoo.de> writes:
bearophile wrote:
 How much time do we have to wait to see some parallel processing features in
D? People are getting more and more rabid because they have few ways to use
their 2-4 core CPUs.
 Classic multithreading is useful, but sometimes it's not easy to use correctly.
 

Grow a pair and use threads. It's not _that_ hard.
 There are other ways to write parallel code, that D may adopt (more than one
way is probably better, no silver bullet exists in this field). Their point is
to allow to use the 2-4+ core CPUs of today (and maybe the 80-1000+ cores of
the future) in non-speed-critical parts of the code where the programmer wants
to use the other cores anyway, without too much programming efforts.
 
 I think Walter wants D language to be multi-paradigm; one of the best ways to
allow multi-processing in a simple and safer way is the Stream Processing
(http://en.wikipedia.org/wiki/Stream_Processing ), D syntax may grow few
constructs to use such kind of programming in a simple way (C++ has some such
libs, I think).
 
 Another easy way to perform multi processing is to vectorize. It means the
compiler can automatically use all the cores to perform operators like
array1+array2+array3.
 

Patched GDC supports autovectorization with -ftree-vectorize, although that's
single-core.

One of the good things IMHO about D is that its operations are mostly easy to
understand, i.e. there's little magic going on. PLEASE don't change that.
 Another way to perform multi processing is so add to the D syntax the
parallel_for (and few related things to merge things back, etc) syntax that was
present in the "Parallel Pascal" language. Such things are quite simpler to use
correctly than threads. The new "Fortress" language by Sun shows similar
things, but they are more refined compared to the Parallel Pascal ones (and
they look more complex to understand and use, so they may be overkill for D, I
don't know. Some of those parallel things of Fortress look quite difficult to
implement to me).

 Time ago I have seen a form of parallel_for and the like in a small and easy
language from MIT, that I think are simple enough.

auto tp = new Threadpool(4);
tp.mt_foreach(Range[4], (int e) { });
 
 Other ways to use parallel code are now being pushed by Intel, OpenMP, and the
hairy but usable CUDA by Nvidia (I am not sure I want to learn CUDA, it's a C
variant, but seems to require a large human memory and a large human brain to
be used, while I think D may have simpler built-in things. Much more "serious"
D programmers may use external libs that allow them any fine control they
want). To me they look too much in flux now to be copied much by D.

Please, no hardware-specific features. D is x86-dependent enough as it is; it
would be a bad idea to add dependencies on _graphics cards_.
 Bye,
 bearophile

IMHO what's really needed is good tools to discover interaction between
threads. I'd like a standardized way to grab debug info, like the current
backtrace of a std.thread.Thread. This could be used to implement fairly
sophisticated logging.

Also, something I have requested before: single-statement function bodies
should be able to omit their {}s, to bring them in line with normal loop
statements. This sounds like a hack, but which is better?

void test() { synchronized(this) { ... } }

or

void test() synchronized(this) { ... }

--downs
Jul 15 2008
Markus Koskimies <markus reaaliaika.net> writes:
On Tue, 15 Jul 2008 19:34:24 -0400, bearophile wrote:

 How much time do we have to wait to see some parallel processing
 features in D? People are getting more and more rabid because they have
 few ways to use their 2-4 core CPUs. Classic multithreading is useful,
 but sometimes it's not easy to use correctly.

A very short answer: for true parallel processing, 2-4 processors is nothing.

The success of CFLs (Control-Flow Languages) like C, C++, D, Pascal, Perl,
Python, BASICs, Cobol, Comal, PL/I, whitespace, malbolge, etc. etc. is that
they follow the underlying paradigm of the computer. There have been many
efforts to define languages that are implicitly parallel. The most used
approach is to use DFL (Data-Flow Language) paradigms, and the most well-known
of these is definitely VHDL. Others are e.g. NESL and ID. Then there are
several languages that are in-between, like functional programming languages
(Haskell, Erlang) or reductive languages (like make and Prolog).

Short references:

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=714561
http://portal.acm.org/citation.cfm?id=359579&dl=GUIDE
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=630241

Especially Hartenstein's articles are good to read if you are trying to
understand why we are still using CFLs & RASP, and why parallel architectures
have failed.

No, the future will not show us any more parallelism at the source level.
Instead, (a) compilers will start to understand source better, to parallelize
the inner kernels of loops automatically, and (b) there will be even more
layers between the source we write and the instructions/configurations
processors execute; thus the main purpose of a source language is no longer to
follow the underlying paradigm, but productivity - how easy it is for humans
to express things; and CFL languages are far from their counterparts in this
area. Comparing CFL/DFL at the compiler level, see e.g.:

http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/fccm/1995/7086/00/7086toc.xml&DOI=10.1109/FPGA.1995.477423

If I were asked to say what the way of writing future programs is, I would say
it is MPS (Message Passing Systems); refer to e.g. Hewitt's Actor Model (1973).
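In its minimal form, the actor model referenced above is just a thread that
owns private state plus a mailbox; everything else (addressing, supervision,
distribution) is layered on top. A small sketch in Python (class and method
names invented for the example):

```python
import queue
import threading

class CounterActor:
    """Minimal Hewitt-style actor: private state, a mailbox, one thread.

    No locks are needed because only the actor's own thread ever touches
    self.total - all interaction happens via messages."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.total = 0
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:        # poison pill: shut down
                return
            self.total += msg

    def send(self, msg):
        self.mailbox.put(msg)      # asynchronous; never blocks the sender

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()
```

Senders never share memory with the actor, only messages, which is what makes
the model scale to the neighborhood-connected machines discussed above.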
Furthermore, I would predict that processors will start to do low-level
reconfiguration, e.g. the RSIP (Reconfigurable Instruction Set Processor)
paradigm. Google for GARP and the awesome performance increases it can offer
for certain tasks.
Jul 15 2008
Sean Kelly <sean invisibleduck.org> writes:
== Quote from Markus Koskimies (markus reaaliaika.net)'s article
 If I would asked to say what is the way of writing future programs, I
 would say it is MPS (Message Passing Systems), refer to e.g. Hewitt's
 Actor Model (1973).

I agree completely. MP is easy to comprehend (it's how people naturally
operate) and the tech behind it is extremely well established.

I remain skeptical that we'll see a tremendous amount of automatic
parallelization of ostensibly procedural code by the interpreter (i.e.
compiler or VM). For one thing, it complicates debugging tremendously, not to
mention the error conditions that such translation can introduce.

As a potentially relevant anecdote, after Herb Sutter's presentation on Concur
a year or two ago at SDWest, I asked him what should happen if two threads of
an automatically parallelized loop both throw an exception, given that the C++
spec dictates that having more than one in-flight exception per thread should
call terminate(). He dodged my question and turned to talk to someone else,
who, interestingly enough, did make an attempt to ensure that Herb understood
what I was asking, but to no avail. Implications about Herb aside, I do think
this suggests that there are known problems with implicit parallelization that
everyone is hoping will just magically disappear. How can one verify the
correctness of code that may fail if implicitly parallelized but work if not?
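One defensive policy a runtime could adopt for exactly this situation -
collect every failure instead of letting a second in-flight exception
terminate the program - is easy to sketch. This is hypothetical (neither
Concur nor any real compiler is claimed to do this), in Python for brevity:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(bodies, workers=4):
    """Run all loop bodies concurrently; return the list of exceptions
    raised, instead of letting a second in-flight exception abort the
    program as the C++ per-thread rules would."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(b) for b in bodies]
        # leaving the with-block waits for every body to finish
    failures = []
    for f in futures:
        try:
            f.result()         # re-raises the body's stored exception
        except Exception as exc:
            failures.append(exc)
    return failures
```

The open question Sean raises remains: whatever aggregation policy the
language picks, the programmer must be told it, or correctness can't be
verified.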
 Furthermore, I would predict processors to start to
 do low-level reconfigurations, e.g. RSIP (Reconfigurable Instruction Set
 Processor) -paradigm. Look google for GARP and the awesome performance
 increasements it can offer for certain tasks.

Interestingly, parallel programming is the topic covered by ACM Communications
magazine this month, and I believe there is a bit about this sort of hardware
parallelism in addition to transactional memory, etc. The articles I've read
so far have all been well-reasoned and pretty honest about the benefits and
problems of each idea.

Sean
Jul 15 2008
Markus Koskimies <markus reaaliaika.net> writes:
On Wed, 16 Jul 2008 00:53:28 +0000, Sean Kelly wrote:

 == Quote from Markus Koskimies (markus reaaliaika.net)'s article
 If I would asked to say what is the way of writing future programs, I
 would say it is MPS (Message Passing Systems), refer to e.g. Hewitt's
 Actor Model (1973).

 I agree completely. MP is easy to comprehend (it's how people naturally
 operate) and the tech behind it is extremely well established.

I couldn't agree more. MP is a very natural way for us humans to organize
parallel things. But there is even more behind it; the fundamental reason that
keeps computers from becoming PRAM machines is the world around us. It
restricts all physical machines, including computers, to a maximum of three
spatial dimensions and to inherently neighborhood-connected models; and those
are very, very far from ideal PRAM things...
 I remain
 skeptical that we'll see a tremendous amount of automatic
 parallelization of ostensibly procedural code by the interpreter (ie.
 compiler or VM).  For one thing, it complicates debugging tremendously,
 not to mention the error conditions that such translation can introduce.

Another thing I completely agree with. It is not about what would ideally be
best; it is the reality that matters. Debugging a highly parallel thing, e.g.
FPGA hardware, is a very, very time-consuming task.
 As an potentially relevant anecdote, after Herb Sutter's presentation on
 Concur [...]

Many highly skillful people are very bound to the great ideas they have in their mind. I'm not an exception :)
 Furthermore, I would predict processors to start to do low-level
 reconfigurations, e.g. RSIP (Reconfigurable Instruction Set Processor)
 -paradigm. Look google for GARP and the awesome performance
 increasements it can offer for certain tasks.

 Interestingly, parallel programming is the topic covered by ACM
 Communications magazine this month, and I believe there is a bit about this
 sort of hardware parallelism in addition to transactional memory, etc. The
 articles I've read so far have all been well-reasoned and pretty honest about
 the benefits and problems with each idea.

If reconfigurable computers - and more or less distributed computing - do not
come as the next major processor architectures, I will go to some distant
place and feel shame. They are not ideal nor optimal computers, far from it -
programming one is very laborious and it is very hard for compilers. But they
just work.
Jul 15 2008
Markus Koskimies <markus reaaliaika.net> writes:
On Wed, 16 Jul 2008 01:15:23 +0000, Markus Koskimies wrote:

 Many highly skillful people are very bound to the great ideas they have
 in their mind. I'm not an exception :)

Many *even* highly etc. etc. :oops:
Jul 15 2008
JAnderson <ask me.com> writes:
bearophile wrote:
 How much time do we have to wait to see some parallel processing features in
D? People are getting more and more rabid because they have few ways to use
their 2-4 core CPUs.
 [...]

I'm hoping that the new "pure" stuff Walter is working on will enable the
compiler to automatically parallelize things like foreach. It won't be as fast
as something that's hand-tuned, but it will be a hell of a lot easier to
write.

-Joel
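The link between purity and parallelization is the classic one: if the loop
body is pure, no iteration can observe another, so the compiler is free to
reorder or parallelize. A hand-written sketch of what the compiler might
effectively emit (Python; parallel_map is an invented stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # "Pure": the result depends only on the argument; no side effects.
    return x * x

def parallel_map(f, items, workers=4):
    # Legal to parallelize precisely because f is pure; for an impure f
    # this transformation could change the program's meaning.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, items))
```

This is why a compiler-checked pure qualifier matters: it turns "safe to
parallelize" from a programmer's promise into a verified property.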
Jul 15 2008
Jascha Wetzel <ask a-search-engine.de> writes:
Agreed, we absolutely need an OpenMP (http://www.openmp.org)
implementation for D.

bearophile wrote:
 How much time do we have to wait to see some parallel processing features in
D? [...]

Jul 16 2008