digitalmars.D.announce - DCompute - Native heterogeneous computing for D - is here!

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

DCompute is an extension to LDC capable of generating code (with 
no language changes*) for NVIDIA's NVPTX for use with CUDA, SPIRV 
for use with the OpenCL runtime, and of course the host, all at 
the same time! It is also possible to share implementation of 
algorithms across the host and device.
This will enable writing kernels in D utilising all of D's meta 
programming goodness across the device divide and will allow 
launching those kernels with a level of ease on par with CUDA's 
<<<...>>> syntax. I hope to be giving a talk at DConf2017 about 
this ;), what it enables us to do, what still needs to be done 
and future plans.

DCompute supports all of OpenCL except Images and Pipes (support 
is planned though).
I haven't done any testing for CUDA, so I'm not sure about the 
extent of support for it: all of the math stuff works, but I'm 
not so sure about images/textures.

Many thanks to the ldc team (especially Johan) for their guidance 
and patience, Ilya for reminding me that I should upstream my 
work and John Colvin for his DConf2016 talk for making me think 
'surely compiler support can't be too hard'. 10 months later: 
here it is!

The DCompute compiler is available on the dcompute branch of ldc 
[0]. You will need my fork of llvm [1] and the SPIRV submodule 
that comes with it [2] as the llvm to link against. There is also 
a tool for interconversion [3] (I've mucked up the submodules a 
bit, sorry; just clone it into 'tools/llvm-spirv', it's not 
necessary anyway). The device standard library and drivers (both 
WIP) are available here [4].

Please send bug reports to their respective components, although 
I'm sure I'll see them anyway regardless of where they go.

[0]: https://github.com/ldc-developers/ldc/tree/dcompute
[1]: https://github.com/thewilsonator/llvm/tree/compute
[2]: https://github.com/thewilsonator/llvm-target-spirv
[3]: https://github.com/thewilsonator/llvm-tool-spirv
[4]: https://github.com/libmir/dcompute

* modulo one hack related to resolving intrinsics because there 
is no static context (i.e. static if) for the device(s). 
Basically a 'codegen time if'.

Feb 26 2017

Rory McGuire via Digitalmars-d-announce writes:

On Sun, Feb 26, 2017 at 10:37 AM, Nicholas Wilson via
Digitalmars-d-announce <digitalmars-d-announce puremagic.com> wrote:
 DCompute is an extension to LDC capable of generating code (with no language
 changes*) for NVIDIA's NVPTX for use with CUDA, SPIRV for use with the
 OpenCL runtime, and of course the host, all at the same time! It is also
 possible to share implementation of algorithms across the host and device.
 This will enable writing kernels in D utilising all of D's meta programming
 goodness across the device divide and will allow launching those kernels
 with a level of ease on par with CUDA's <<<...>>> syntax. I hope to be
 giving a talk at DConf2017 about this ;), what it enables us to do, what
 still needs to be done and future plans.

Awesome! Been wanting this feature since ldc started catching up to dmd.

Feb 26 2017

jmh530 <john.michael.hall gmail.com> writes:

On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson 
wrote:
 DCompute is an extension to LDC capable of generating code 
 (with no language changes*) for NVIDIA's NVPTX for use with 
 CUDA, SPIRV for use with the OpenCL runtime, and of course the 
 host, all at the same time! It is also possible to share 
 implementation of algorithms across the host and device.
 This will enable writing kernels in D utilising all of D's meta 
 programming goodness across the device divide and will allow 
 launching those kernels with a level of ease on par with CUDA's 
 <<<...>>> syntax. I hope to be giving a talk at DConf2017 about 
 this ;), what it enables us to do, what still needs to be done 
 and future plans.

Great work.

Feb 26 2017

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson 
wrote:
 DCompute is an extension to LDC capable of generating code 
 (with no language changes*) for NVIDIA's NVPTX for use with

Hmm, I appear to have really mucked up the git submodules. 
Unfortunately I have a cold at the moment and fighting git is 
beyond me at the best of times but I'm completely stumped here, 
PRs appreciated. Once this is sorted I'll do a tag and release.

Thanks for the appreciation, please let me know about your 
experiences/bug reports.

Feb 27 2017

Mike Parker <aldacron gmail.com> writes:

On Monday, 27 February 2017 at 08:37:56 UTC, Nicholas Wilson 
wrote:
 On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson 
 wrote:
 DCompute is an extension to LDC capable of generating code 
 (with no language changes*) for NVIDIA's NVPTX for use with

 Hmm, I appear to have really mucked up the git submodules. 
 Unfortunately I have a cold at the moment and fighting git is 
 beyond me at the best of times but I'm completely stumped here, 
 PRs appreciated. Once this is sorted I'll do a tag and release.

 Thanks for the appreciation, please let me know about your 
 experiences/bug reports.

Give the thumbs up on this and I'll put it on reddit in the next 
window.

Feb 27 2017

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Monday, 27 February 2017 at 09:13:22 UTC, Mike Parker wrote:
 On Monday, 27 February 2017 at 08:37:56 UTC, Nicholas Wilson 
 wrote:
 On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson 
 wrote:
 DCompute is an extension to LDC capable of generating code 
 (with no language changes*) for NVIDIA's NVPTX for use with

 Hmm, I appear to have really mucked up the git submodules. 
 Unfortunately I have a cold at the moment and fighting git is 
 beyond me at the best of times but I'm completely stumped 
 here, PRs appreciated. Once this is sorted I'll do a tag and 
 release.

 Thanks for the appreciation, please let me know about your 
 experiences/bug reports.

 Give the thumbs up on this and I'll put it on reddit in the 
 next window.

Once I get the submodule stuff fixed and do a release of llvm 
I'll let you know.
Hopefully some time tomorrow morning (UTC+8), but maybe in the 
afternoon.

Feb 27 2017

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Monday, 27 February 2017 at 09:13:22 UTC, Mike Parker wrote:
 On Monday, 27 February 2017 at 08:37:56 UTC, Nicholas Wilson 
 wrote:
 On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson 
 wrote:
 DCompute is an extension to LDC capable of generating code 
 (with no language changes*) for NVIDIA's NVPTX for use with

 Hmm, I appear to have really mucked up the git submodules. 
 Unfortunately I have a cold at the moment and fighting git is 
 beyond me at the best of times but I'm completely stumped 
 here, PRs appreciated. Once this is sorted I'll do a tag and 
 release.

 Thanks for the appreciation, please let me know about your 
 experiences/bug reports.

 Give the thumbs up on this and I'll put it on reddit in the 
 next window.

Actually I've got the submodules working so feel free to go 
ahead, the release is only for OSX for ldc's CI. If you could let 
me know when that window is I could post an AMA if I'm awake then.

Feb 27 2017

Mike Parker <aldacron gmail.com> writes:

On Monday, 27 February 2017 at 13:19:00 UTC, Nicholas Wilson 
wrote:

 Actually I've got the submodules working so feel free to go 
 ahead, the release is only for OSX for ldc's CI. If you could 
 let me know when that window is I could post an AMA if I'm 
 awake then.

Now is a great time.

Feb 27 2017

Mike Parker <aldacron gmail.com> writes:

On Monday, 27 February 2017 at 13:19:00 UTC, Nicholas Wilson 
wrote:

 Actually I've got the submodules working so feel free to go 
 ahead, the release is only for OSX for ldc's CI. If you could 
 let me know when that window is I could post an AMA if I'm 
 awake then.

Direct your AMA here:

https://www.reddit.com/r/programming/comments/5wgqmb/dcompute_native_heterogeneous_computing_for_d_is/

Feb 27 2017

Guillaume Piolat <first.last gmail.com> writes:

On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson 
wrote:
 This will enable writing kernels in D utilising all of D's meta 
 programming goodness across the device divide and will allow 
 launching those kernels with a level of ease on par with CUDA's 
 <<<...>>> syntax.

Interesting to write kernels in D, since a limitation of CUDA is 
that you need to multiply the entry points to instantiate a 
template differently, and a limitation of OpenCL C is that you 
need templates and includes in the first place.

How does this work?
Does the host code need something like DerelictCL to work?

Feb 27 2017

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Monday, 27 February 2017 at 13:55:23 UTC, Guillaume Piolat 
wrote:
 On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson 
 wrote:
 This will enable writing kernels in D utilising all of D's 
 meta programming goodness across the device divide and will 
 allow launching those kernels with a level of ease on par with 
 CUDA's <<<...>>> syntax.

 Interesting to write kernels in D, since a limitation of CUDA 
 is that you need to multiply the entry points to instantiate a 
 template differently, and a limitation of OpenCL C is that you 
 need templates and includes in the first place.

Wait, you mean you have to explicitly instantiate every instance 
of a templated kernel? Ouch. In D all you need do is have a 
reference to it somewhere; taking its .mangleof suffices and is 
(part of) how the example below will achieve its elegance.
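
As a minimal host-only sketch of that point (the function `scale` and the two enum names are hypothetical and not part of DCompute), naming a template instance and taking its `.mangleof` is enough to make the compiler instantiate it, with no per-type declaration:

```d
import std.stdio : writeln;

// Hypothetical stand-in for a templated kernel. This is plain host D with
// no DCompute attributes, so it should build with any D compiler.
void scale(T)(T[] data, T factor)
{
    foreach (ref x; data)
        x *= factor;
}

// Referring to an instance instantiates it. Each instance gets its own
// mangled name, which a driver could use to look the kernel up by name.
enum floatKernel = scale!float.mangleof;
enum intKernel = scale!int.mangleof;

void main()
{
    writeln(floatKernel);
    writeln(intKernel);
}
```

The exact mangled strings depend on the compiler, so none are shown here. Whether a referenced instance ends up in the device binary is a property of the DCompute compiler that this sketch does not demonstrate.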

I should first emphasise the future tense of the second half of 
the sentence you quoted.

 How does this work?

DCompute (the compiler infrastructure) is currently capable of 
building .ptx and .spv as part of the compilation process. They 
can be used directly in any process pipeline you may have already.

 Does the host code need something like DerelictCL/CUDA to work?

If you want to call the kernel, yes. The eventual goal of 
DCompute (the D infrastructure) is to fully wrap, unify and 
abstract the OpenCL/CUDA runtime libraries (most likely provided 
by Derelict), and have something like:

```
Queue q = ...;
Buffer b = ...;
q.enqueue!(myTemplatedKernel!(Foo, bar, baz => myTransform(baz)))(b, other, args);
```
That said, there is no need to wait until DCompute reaches that 
point to use it; you would just have to do the (rather painful) 
API bashing yourself.
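
To illustrate the call shape only, here is a host-only mock. `MockQueue`, `addInPlace` and the `launched` field are hypothetical names invented for this sketch; it is not DCompute's API and touches no OpenCL or CUDA runtime. It shows how a kernel can be passed as a compile-time alias parameter while its runtime arguments are deduced from the call:

```d
import std.stdio : writeln;

// Hypothetical host-only stand-in for a device queue.
struct MockQueue
{
    string[] launched; // mangled names of the kernels "enqueued" so far

    // The kernel is a compile-time alias parameter; the types of its
    // runtime arguments are deduced from the call site.
    void enqueue(alias kernel, Args...)(Args args)
    {
        // A real driver might look the kernel up by name in the .ptx or
        // .spv output. This mock records the name and runs it on the host.
        launched ~= kernel.mangleof;
        kernel(args);
    }
}

// Hypothetical templated "kernel", written as ordinary host code.
void addInPlace(T)(T[] a, T[] b)
{
    foreach (i; 0 .. a.length)
        a[i] += b[i];
}

void main()
{
    MockQueue q;
    auto a = [1.0f, 2.0f];
    auto b = [10.0f, 20.0f];
    q.enqueue!(addInPlace!float)(a, b);
    writeln(a);
    writeln(q.launched.length);
}
```

The template instance is named once, at the launch site, which is the ergonomic point of the snippet above: there is no separate entry-point declaration per type.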

Feb 27 2017

Guillaume Piolat <first.last gmail.com> writes:

On Monday, 27 February 2017 at 23:02:43 UTC, Nicholas Wilson 
wrote:
 Interesting to write kernels in D, since a limitation of CUDA 
 is that you need to multiply the entry points to instantiate a 
 template differently, and a limitation of OpenCL C is that you 
 need templates and includes in the first place.

 Wait you mean you have to explicitly instantiate every instance 
 of a templated kernel? Ouch.

IIRC, that entry point explosion happens in CUDA when you 
separate strictly host and device code. Not sure for mixed mode 
as I've never used that.


 I should first emphasise the future tense of the second half of 
 the sentence you quoted.

 How does this work?

 DCompute (the compiler infrastructure) is currently capable of 
 building .ptx and .spv as part of the compilation process. They 
 can be used directly in any process pipeline you may have 
 already.

.ptx, got it.

 Does the host code need something like DerelictCL/CUDA to work?

 If you want to call the kernel, yes. The eventual goal of 
 DCompute (the D infrastructure) is to fully wrap and unify and 
 abstract the OpenCL/CUDA runtime libraries (most likely provided 
 by Derelict), and have something like:

Interesting.
Let me know if you need more things in OpenCL bindings.

Feb 27 2017

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson 
wrote:
 DCompute is an extension to LDC capable of generating code 
 (with no language changes*) for NVIDIA's NVPTX for use with 
 CUDA, SPIRV for use with the OpenCL runtime, and of course the 
 host, all at the same time! It is also possible to share 
 implementation of algorithms across the host and device.
 This will enable writing kernels in D utilising all of D's meta 
 programming goodness across the device divide and will allow 
 launching those kernels with a level of ease on par with CUDA's 
 <<<...>>> syntax. I hope to be giving a talk at DConf2017 about 
 this ;), what it enables us to do, what still needs to be done 
 and future plans.

 DCompute supports all of OpenCL except Images and Pipes 
 (support is planned though).
 I haven't done any test for CUDA so I'm not sure about the 
 extent of support for it, all of the math stuff works, 
 images/textures not so sure.

 Many thanks to the ldc team (especially Johan) for their 
 guidance and patience, Ilya for reminding me that I should 
 upstream my work and John Colvin for his DConf2016 talk for 
 making me think 'surely compiler support can't be too hard'. 10 
 months later: here it is!

 The DCompute compiler is available at the dcompute branch of 
 ldc [0], you will need my fork of llvm here[1] and the SPIRV 
 submodule that comes with it [2] as the llvm to link against. 
 There is also a tool for interconversion [3] (I've mucked up 
 the submodules a bit, sorry, just clone it into 
 'tools/llvm-spirv', it's not necessary anyway). The device 
 standard library and drivers (both WIP) are available here[4].

 Please send bug reports to their respective components, 
 although I'm sure I'll see them anyway regardless of where they 
 go.

 [0]: https://github.com/ldc-developers/ldc/tree/dcompute
 [1]: https://github.com/thewilsonator/llvm/tree/compute
 [2]: https://github.com/thewilsonator/llvm-target-spirv
 [3]: https://github.com/thewilsonator/llvm-tool-spirv
 [4]: https://github.com/libmir/dcompute

 * modulo one hack related to resolving intrinsics because there 
 is no static context (i.e. static if) for the device(s). 
 Basically a 'codegen time if'.

A simple example, because I forgot to include one.

```
@compute(CompileFor.deviceOnly) module example;
import ldc.attributes;
import ldc.dcomputetypes;
import dcompute.std.index;

// Element-wise add: each invocation handles the element at its global index.
@kernel void test(GlobalPointer!float a, GlobalPointer!float b)
{
    auto idx = GlobalIndex.x;
    a[idx] = a[idx] + b[idx];
}
```

Then compile with 
`ldc -mdcompute-targets=ocl-220,cuda-500 example.d -I/path/to/dcompute`. 
It will produce two files: kernels_ocl220_64.spv and 
kernels_cuda500_64.ptx when built in 64-bit mode, or 
kernels_ocl220_32.spv and kernels_cuda500_32.ptx in 32-bit mode.
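
The naming pattern can be sketched as a small helper. The function `kernelFileName` is hypothetical, and the rule it encodes is inferred only from the four file names listed above (drop the hyphen from the target, append the pointer width, use .ptx for cuda targets and .spv otherwise); the compiler may apply rules not visible in those examples.

```d
import std.algorithm.searching : startsWith;
import std.array : replace;
import std.format : format;

// Hypothetical helper: guesses the output file name for one entry of
// -mdcompute-targets, based solely on the examples given in the post.
string kernelFileName(string target, int bits)
{
    immutable ext = target.startsWith("cuda") ? "ptx" : "spv";
    return format("kernels_%s_%d.%s", target.replace("-", ""), bits, ext);
}

void main()
{
    import std.stdio : writeln;

    foreach (target; ["ocl-220", "cuda-500"])
        writeln(kernelFileName(target, 64));
}
```

A build script could use such a helper to locate the generated files before handing them to the OpenCL or CUDA runtime.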

Feb 27 2017
