
digitalmars.D.announce - DCompute: First kernels run successfully

reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
I'm pleased to announce that I have run the first dcompute kernel 
and it was a success!

The driver still needs a fair bit of polish to make the API sane and more 
complete, not to mention more similar to the (untested) OpenCL driver 
API. But it works!
(Contributions are of course greatly welcomed)

The kernel:
```
@compute(CompileFor.deviceOnly)
module dcompute.tests.dummykernels;

import ldc.dcompute;
import dcompute.std.index;

@kernel void saxpy(GlobalPointer!(float) res,
                   float alpha, GlobalPointer!(float) x,
                   GlobalPointer!(float) y,
                   size_t N)
{
     auto i = GlobalIndex.x;
     if (i >= N) return;
     res[i] = alpha*x[i] + y[i];
}
```

The host code:
```
import dcompute.driver.cuda;
import dcompute.tests.dummykernels : saxpy;

Platform.initialise();

auto devs   = Platform.getDevices(theAllocator);
auto ctx    = Context(devs[0]); scope(exit) ctx.detach();

// Change the file to match your GPU.
Program.globalProgram = 
Program.fromFile("./.dub/obj/kernels_cuda210_64.ptx");
auto q = Queue(false);

enum size_t N = 128;
float alpha = 5.0;
float[N] res, x,y;
foreach (i; 0 .. N)
{
     x[i] = N - i;
     y[i] = i * i;
}
Buffer!(float) b_res, b_x, b_y;
b_res      =  Buffer!(float)(res[]); scope(exit) b_res.release();
b_x        =  Buffer!(float)(x[]);   scope(exit) b_x.release();
b_y        =  Buffer!(float)(y[]);   scope(exit) b_y.release();

b_x.copy!(Copy.hostToDevice); // not quite sold on this interface yet.
b_y.copy!(Copy.hostToDevice);

q.enqueue!(saxpy)  // <-- the main magic happens here
     ([N,1,1],[1,1,1])   // the grid
     (b_res,alpha,b_x,b_y, N); // the kernel arguments

b_res.copy!(Copy.deviceToHost);
foreach(i; 0 .. N)
     enforce(res[i] == alpha * x[i] + y[i]);
writeln(res[]); // [640, 636, ... 16134]
```

Simple as that!

Dcompute, as always, is at https://github.com/libmir/dcompute and 
on dub.

To successfully run the dcompute CUDA test you will need a very 
recent LDC (less than two days old) with the NVPTX backend* enabled, 
along with a CUDA environment and an Nvidia GPU.

*Or wait for LDC 1.4 release real soon(™).

Thanks to the LDC folks for putting up with me ;)

Have fun GPU programming,
Nic
Sep 11 2017
next sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Monday, 11 September 2017 at 12:23:16 UTC, Nicholas Wilson 
wrote:
 I'm pleased to announce that I have run the first dcompute 
 kernel and it was a success!
Keep up the good work.
Sep 11 2017
prev sibling next sibling parent reply kerdemdemir <kerdemdemir hotmail.com> writes:
Hi Wilson,

Since I believe GPU-CPU hybrid programming is the future, I think 
you are doing a great job for your and D lang's future.

 To successfully run the dcompute CUDA test you will need a very 
 recent LDC (less than two days) with the NVPTX backend* enabled 
 along with a CUDA environment and an Nvidia GPU.

 *Or wait for LDC 1.4 release real soon(™).
Can you please describe a bit, for starters like me, how to build a recent LDC? Is this "NVPTX backend" a cmake option? And what should I do to make my "CUDA environment" ready? Which packages should I install?

Sorry if my questions are so dumb; I hope I will be able to add an example.

Regards
Erdem
Sep 11 2017
parent Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 11 September 2017 at 20:45:43 UTC, kerdemdemir wrote:
 Hi Wilson,

 Since I believe GPU-CPU hybrid programming is the future I 
 believe you are doing a great job for your and D lang's future.

 To successfully run the dcompute CUDA test you will need a 
 very recent LDC (less than two days) with the NVPTX backend* 
 enabled along with a CUDA environment and an Nvidia GPU.

 *Or wait for LDC 1.4 release real soon(™).
 Can you please describe a bit, for starters like me, how to build 
 a recent LDC? Is this "NVPTX backend" a cmake option? And what 
 should I do to make my "CUDA environment" ready? Which packages 
 should I install?

 Sorry if my questions are so dumb; I hope I will be able to add an 
 example.

 Regards
 Erdem
Hi Erdem,

Sorry, I've been a bit busy with uni.

To build LDC, just clone ldc, run `git submodule update --init`, and run cmake, setting LLVM_CONFIG to /path/to/llvm/build/bin/llvm-config and LLVM_INTRINSIC_TD_PATH to /path/to/llvm/source/include/llvm/IR.

The NVPTX backend is enabled by setting LLVM's cmake variable LLVM_TARGETS_TO_BUILD to either "all" or "X86;NVPTX" (without the quotes), along with any other archs you want to enable, and then building LLVM with cmake. This will get picked up by LDC automatically.

I just installed the CUDA SDK in its entirety, but I'm sure you don't need everything from it.
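The steps above can be sketched as a shell session. This is a hedged sketch, not an official recipe: the clone URLs and the /path/to/... placeholders from the post are kept illustrative, so adjust them to your own layout.

```shell
# Build LLVM with the NVPTX backend enabled (paths are placeholders).
cmake -S /path/to/llvm/source -B /path/to/llvm/build \
      -DLLVM_TARGETS_TO_BUILD="X86;NVPTX"
cmake --build /path/to/llvm/build

# Clone LDC with its submodules and build it against that LLVM.
git clone https://github.com/ldc-developers/ldc.git
cd ldc
git submodule update --init
cmake -S . -B build \
      -DLLVM_CONFIG=/path/to/llvm/build/bin/llvm-config \
      -DLLVM_INTRINSIC_TD_PATH=/path/to/llvm/source/include/llvm/IR
cmake --build build
```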
Sep 11 2017
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 9/11/2017 5:23 AM, Nicholas Wilson wrote:
 I'm pleased to announce that I have run the first dcompute kernel and it was a 
 success!
Excellent! https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAgvAAAAJDY4OTI4MmE0LTVlZDgtNGQzYy1iN2U1LWU5Nzk1NjlhNzIwNg.jpg
Sep 11 2017
parent Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 11 September 2017 at 22:40:02 UTC, Walter Bright wrote:
 On 9/11/2017 5:23 AM, Nicholas Wilson wrote:
 I'm pleased to announce that I have run the first dcompute 
 kernel and it was a success!
Excellent! https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAgvAAAAJDY4OTI4MmE0LTVlZDgtNGQzYy1iN2U1LWU5Nzk1NjlhNzIwNg.jpg
Indeed, let the world domination begin! I just need to get some OpenCL 2.0 capable hardware to test that and we'll be well on the way. Also, LDC 1.4 was just released, yay!
Sep 11 2017