digitalmars.D - D and Heterogeneous Computing

reply "Josh Klontz" <josh.klontz gmail.com> writes:
Greetings! As someone with a research interest in software 
abstractions for image processing, the D programming language 
appears to offer unsurpassed language features for constructing 
beautiful and efficient programs. With that said, what would 
really get me to abandon C++ is if D supported a heterogeneous 
programming model.

My personal inclination would be something closer to OpenACC than 
anything else I've seen available, though only in the sense that 
I like the idea of writing code once and being able to 
compile/run/debug it with or without automatic 
vectorization/kernelization. Presumably we could achieve more 
elegant syntax with tighter integration into the language. Has 
anyone been working on anything like this? Is this something the 
community would be interested in seeing? What should the solution 
look like?

One path forward could be a patch to the compiler to generate and 
execute OpenCL kernels for appropriately marked-up D code. While 
I'm new to the D language, I'd be happy to work on a proof of 
concept of this if it is something the community thinks would be 
valuable and I could get specific feedback about the right way to 
approach it.
Apr 07 2012
parent reply "Robert Jacques" <sandford jhu.edu> writes:
I've been using D with CUDA via a high-level wrapper around the driver API. It works very nicely, but it doesn't address the language integration issues. Might I recommend looking into hooking up LDC to the PTX LLVM back-end? That would seem much faster than writing your own back-end.
Apr 07 2012
parent reply "Josh Klontz" <josh.klontz gmail.com> writes:
Yes, I certainly don't want to be in the business of writing back-ends. Another idea that came to mind recently was implementing a keyword similar in spirit to "asm":

 opencl {
     // Valid OpenCL code here
 }

and having the compiler automatically handle memory copying of D variables referenced in the kernel code. It would be entirely back-end independent and perhaps pleasant to implement?
Apr 08 2012
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
Take a look at C++ AMP; it's almost exactly this thing, added to Visual C++ (but of course for now it's built on DirectCompute): http://msdn.microsoft.com/en-us/library/hh265136(v=vs.110).aspx

-- 
Dmitry Olshansky
Apr 08 2012
prev sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
IIRC, doesn't OpenCL support jit-ing ASCII source files? Then, there wouldn't be a need for any language changes.
Apr 09 2012
parent reply "Josh Klontz" <josh.klontz gmail.com> writes:
 IIRC, doesn't OpenCL support jit-ing ASCII source files? Then, 
 there wouldn't be a need for any language changes.
Correct, and that's the underlying power I'm proposing to leverage. IMO, writing OpenCL code involves (at least) the following nuisances:

1) The kernel code needs to be written as a text string within the native code base.
2) Various function calls to the OpenCL library need to be made to manage the runtime, compile kernels, connect arguments to kernels, execute the kernels, and retrieve the results.
3) If you want to build an application both with and without OpenCL as the backend, then you have to maintain two versions of every algorithm: one as an OpenCL string and the other in the native language of your program.

To me there seems to be a huge opportunity to obviate the above issues and entice new developers to D via some careful engineering at either the compiler or the standard library level to support heterogeneous computing. Certainly technologies like C++ AMP are a step in the right direction, but to my knowledge nothing currently exists that combines the following desirable properties:

1) Write the algorithm once, and compile it for either serial execution on the CPU or massively parallel execution on an OpenCL-enabled device.
2) FOSS.
3) Runs everywhere the underlying language runs.
4) The underlying language has a robust compiler, an active and growing community, a solid standard library, elegant language features, etc.

Perhaps I was wrong to suggest that this has to be solved at the compiler level. The EPGPU library seems to tackle some of the problems of mixing OpenCL kernels within C++, though the syntax is far from ideal. Thoughts?
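To make the first and third nuisances concrete, here is a rough sketch of the same element-wise addition written twice: once as an OpenCL C kernel held in a D string, and once as plain D. The names `addKernelSource` and `addNative` are made up for this illustration, and the host-side OpenCL calls (the second nuisance) are only named in comments, since they depend on whichever binding is used.

```d
// Nuisance 1: the kernel lives in the host program as an opaque string.
// The D compiler never checks it; mistakes surface only at run time, when
// the OpenCL runtime builds it (clCreateProgramWithSource, clBuildProgram).
enum string addKernelSource = `
__kernel void add(__global float* a, __global const float* b)
{
    size_t i = get_global_id(0);
    a[i] += b[i];
}
`;

// Nuisance 2 (not shown): creating buffers, binding arguments, launching
// the kernel and reading results back, e.g. via clCreateBuffer,
// clSetKernelArg, clEnqueueNDRangeKernel and clEnqueueReadBuffer.

// Nuisance 3: the same algorithm again, in the host language, for builds
// that have no OpenCL device available.
void addNative(float[] a, const(float)[] b)
{
    assert(a.length == b.length);
    foreach (i; 0 .. a.length)
        a[i] += b[i];
}

unittest
{
    float[] a = [1f, 2f, 3f];
    addNative(a, [10f, 20f, 30f]);
    assert(a == [11f, 22f, 33f]);
}
```

Nothing ties the two versions together: a fix applied to `addNative` has to be repeated by hand inside the string.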
Apr 10 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
From the looks of it, this kind of stuff should be easy with tokenized strings ( q{ code } ) + mixins + some "auto-magic" helpers being run for OpenCL behind the covers. The problematic part is checking that the fragment uses the correct subset of both languages.

Ideally the API should work along the lines of this:

 float[] arr1, arr2;
 //init arr1 & arr2
 assert(arr1.length == arr2.length);
 auto length = arr1.length;
 compute!q{
 	for (int i = 0; i < length; i++)
 		arr1[i] += arr2[i];
 }(arr1, arr2);

where compute works both on a plain CPU, even without OpenCL (by simply mixing the code in), and with OpenCL, given a bit of extra binding magic inside the compute template.

(compute is an eponymous template that is aliased to a static function inside it, which in turn is generated by a mixin; for a concrete example, take a look at how the ctRegex template in std.regex does it.)

Of course, there are some painful details when you go for deeper things and error messages, but it should be perfectly doable in normal D, even without, say, a CTFE parser.

-- 
Dmitry Olshansky
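A minimal sketch of the CPU half of this idea, under some loud assumptions: the parameter names `arr1` and `arr2` are fixed by the template so that the kernel text can refer to them, `UseOpenCL` is a hypothetical version identifier, and the OpenCL branch is deliberately left unimplemented. It is not how any existing library does it, only one way the eponymous-template-plus-mixin approach could look.

```d
// Eponymous template: instantiating compute!q{...} yields the function
// of the same name declared inside.
template compute(string kernel)
{
    void compute(float[] arr1, const(float)[] arr2)
    {
        assert(arr1.length == arr2.length);

        version (UseOpenCL) // hypothetical switch, not defined anywhere
        {
            // A device path would translate `kernel` to OpenCL C, copy
            // arr1/arr2 to the device, run it and copy arr1 back.
            static assert(0, "OpenCL path is not part of this sketch");
        }
        else
        {
            // CPU path: the kernel text is compiled as ordinary D code.
            mixin(kernel);
        }
    }
}

unittest
{
    float[] arr1 = [1f, 2f, 3f];
    float[] arr2 = [10f, 20f, 30f];

    compute!q{
        foreach (i; 0 .. arr1.length)
            arr1[i] += arr2[i];
    }(arr1, arr2);

    assert(arr1 == [11f, 22f, 33f]);
}
```

Because a `q{ }` token string must consist of valid D tokens, the compiler lexes the kernel even before it is mixed in. Checking that the fragment also stays within a subset OpenCL can accept is the harder part noted above.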
Apr 10 2012
parent reply "Josh Klontz" <josh.klontz gmail.com> writes:
Awesome, thanks! Will chew on this for a while :)
Apr 10 2012
parent "proxy" <pr xy.com> writes:
 Awesome, thanks! Will chew on this for a while :)
Looking forward to it!! :)
Apr 10 2012