digitalmars.D - D and Heterogeneous Computing

"Josh Klontz" <josh.klontz gmail.com> writes:

Greetings! As someone with a research interest in software 
abstractions for image processing, the D programming language 
appears to offer unsurpassed language features for constructing 
beautiful and efficient programs. With that said, what would 
really get me to abandon C++ is if D supported a heterogeneous 
programming model.

My personal inclination would be something closer to OpenACC than 
anything else I've seen available. Though only in the sense that 
I like the idea of writing code once and being able to 
compile/run/debug it with or without automatic 
vectorization/kernelization. Presumably we could achieve more 
elegant syntax with tighter integration into the language. Has 
anyone been working on anything like this? Is this something the 
community would be interested in seeing? What should the solution 
look like?

One path forward could be a patch to the compiler to generate and 
execute OpenCL kernels for appropriately marked-up D code. While 
I'm new to the D language, I'd be happy to work on a proof of 
concept of this if it is something the community thinks would be 
valuable and I could get specific feedback about the right way to 
approach it.

Apr 07 2012

"Robert Jacques" <sandford jhu.edu> writes:

I've been using D with CUDA via a high-level wrapper around the driver API. It
works very nicely, but it doesn't address the language integration issues.
Might I recommend looking into hooking up LDC to the PTX LLVM back-end? That
would likely be much faster than writing your own back-end.

Apr 07 2012

"Josh Klontz" <josh.klontz gmail.com> writes:

Yes, I certainly don't want to be in the business of writing 
back-ends. Another idea that came to mind recently was 
implementing a keyword similar in spirit to "asm":

opencl {
  // Valid opencl code here
}

The compiler would then automatically handle memory copying of the D 
variables referenced in the kernel code. This would be entirely 
back-end independent, and perhaps pleasant to implement?

Apr 08 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

Take a look at C++ AMP; it's almost exactly this thing, added to Visual 
C++ (though of course for now it only targets DirectCompute):
http://msdn.microsoft.com/en-us/library/hh265136(v=vs.110).aspx

-- 
Dmitry Olshansky

Apr 08 2012

"Robert Jacques" <sandford jhu.edu> writes:

IIRC, doesn't OpenCL support jit-ing ASCII source files? Then, there wouldn't
be a need for any language changes.

Apr 09 2012

"Josh Klontz" <josh.klontz gmail.com> writes:

 IIRC, doesn't OpenCL support jit-ing ASCII source files? Then, 
 there wouldn't be a need for any language changes.

Correct, and that's the underlying power I'm proposing to
leverage.

IMO, writing OpenCL code involves (at least) the following
nuisances:
1) The kernel code needs to be written as a text string within
the native code base.
2) Various function calls to the OpenCL library need to be made
to manage the runtime, compile kernels, connect arguments to
kernels, execute the kernels, and retrieve the results.
3) If you want to build an application both with and without
OpenCL as the backend then you have to maintain two versions of
every algorithm, one as an OpenCL string and the other in the
native language of your program.
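
A minimal sketch of nuisances 1 and 3, assuming a simple vector-add as the algorithm. The names vecAddKernel and vecAddCpu are hypothetical and only illustrative. The host-side calls of nuisance 2 (in the OpenCL C API, functions such as clCreateProgramWithSource, clBuildProgram, clSetKernelArg and clEnqueueNDRangeKernel) are deliberately left out, so nothing here touches an OpenCL runtime:

```d
import std.stdio : writeln;

// Nuisance 1: the kernel lives in the host program as a plain text string.
// (Hypothetical kernel; it is never compiled or executed in this sketch.)
enum string vecAddKernel = `
__kernel void vec_add(__global float* a, __global const float* b)
{
    size_t i = get_global_id(0);
    a[i] += b[i];
}
`;

// Nuisance 3: the same algorithm has to be written again in the native
// language if the program should also build without OpenCL.
void vecAddCpu(float[] a, const float[] b)
{
    assert(a.length == b.length);
    foreach (i; 0 .. a.length)
        a[i] += b[i];
}

void main()
{
    float[] a = [1.0f, 2.0f, 3.0f];
    const float[] b = [10.0f, 20.0f, 30.0f];

    vecAddCpu(a, b);   // native path
    writeln(a);

    // The OpenCL path would hand this string to the runtime instead.
    writeln(vecAddKernel.length, " characters of OpenCL C source");
}
```

The two versions of the algorithm have to be kept in sync by hand, which is the maintenance cost item 3 describes.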

To me there seems to be a huge opportunity to obviate the above
issues and entice new developers to D via some careful
engineering at either the compiler or the standard library level
to support heterogeneous computing. Certainly technologies like
C++ AMP are a step in the right direction, but to my knowledge
there currently doesn't exist anything with the following
desirable principles:
1) Write the algorithm once, compile for either serial execution on
the CPU or massively parallel execution on an OpenCL-enabled
device.
2) FOSS
3) Runs everywhere the underlying language runs.
4) The underlying language has a robust compiler, active and
growing community, solid standard library, elegant language
features, etc...

Perhaps I was wrong to suggest that this has to be solved at the
compiler level. The EPGPU library seems to tackle some of the
problems of mixing OpenCL kernels within C++, though the syntax
is far from ideal.

Thoughts?

Apr 10 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

From the looks of it, this kind of thing should be easy with token 
strings (q{ code }) + mixins + some "auto-magic" helpers that drive 
OpenCL behind the scenes. The problematic part is checking that the 
fragment uses only the subset that is valid in both languages.

Ideally the API should work along these lines:

float[] arr1, arr2;
// init arr1 & arr2
assert(arr1.length == arr2.length);
auto length = arr1.length;
// compute is the proposed (not yet existing) template
compute!q{
	for (int i = 0; i < length; i++)
		arr1[i] += arr2[i];
}(arr1, arr2);

where compute also works on a plain CPU without OpenCL (by simply 
mixing the code in), and targets OpenCL with a bit of extra binding magic 
inside the compute template.

(compute is an eponymous template aliased to a static function inside it, 
which in turn is generated by a mixin. For a concrete example, take a look 
at how the ctRegex template in std.regex does it.)

Of course, there are some painful details once you go for deeper things, 
and with error messages, but it should be perfectly doable in normal D even 
without, say, a CTFE parser.
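
A minimal sketch of the CPU-only half of this idea, under stated assumptions: the template name compute comes from the example above, but its signature here is made up for illustration. In particular, the parameter names arr1 and arr2 and the local length are fixed inside the template so that the mixed-in fragment can refer to them. The OpenCL binding side is not attempted at all.

```d
import std.stdio : writeln;

// Hypothetical CPU fallback: an eponymous template whose single member is a
// function built by mixing the kernel string in as ordinary D statements.
template compute(string kernel)
{
    void compute(float[] arr1, float[] arr2)
    {
        assert(arr1.length == arr2.length);
        // Assumption: the fragment may use the names arr1, arr2 and length.
        immutable size_t length = arr1.length;
        mixin(kernel);
    }
}

void main()
{
    float[] a = [1.0f, 2.0f, 3.0f];
    float[] b = [10.0f, 20.0f, 30.0f];

    // The token string must lex as D; here it is also valid C-style code.
    compute!q{
        for (size_t i = 0; i < length; i++)
            arr1[i] += arr2[i];
    }(a, b);

    writeln(a);
}
```

Because the fragment is compiled as plain D on this path, any syntax or type error in it surfaces as a normal compile error at the mixin site. An OpenCL path would presumably reuse the same string, but it would need its own argument binding and its own check that the fragment is valid OpenCL C.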


-- 
Dmitry Olshansky

Apr 10 2012

"Josh Klontz" <josh.klontz gmail.com> writes:

Awesome, thanks! Will chew on this for a while :)

Apr 10 2012

"proxy" <pr xy.com> writes:

 Awesome, thanks! Will chew on this for a while :)

Looking forward to it!! :)

Apr 10 2012
