
digitalmars.D - CUDA with D working after all

Trass3r <mrmocool gmx.de> writes:
While browsing team0xf's repository I found a small, inconspicuous 
project named dcuda that was never officially mentioned: 
http://team0xf.com:1024/dcuda/

This guy managed to compile at least a few CUDA examples with D.
Just tried out MatrixMultiply and the test passed :)
Jul 19 2009
"Robert Jacques" <sandford jhu.edu> writes:
On Sun, 19 Jul 2009 21:52:36 -0400, Trass3r <mrmocool gmx.de> wrote:

 While browsing team0xf's repository I found a small, inconspicuous  
 project named dcuda that was never officially mentioned:  
 http://team0xf.com:1024/dcuda/

 This guy managed to compile at least a few CUDA examples with D.
 Just tried out MatrixMultiply and the test passed :)

Cool. Although the link doesn't seem to be working. I've also been using CUDA and have my own bindings, although I haven't gotten around to hosting them anywhere yet.
Jul 19 2009
Trass3r <mrmocool gmx.de> writes:
Robert Jacques wrote:
 Cool. Although the link doesn't seem to be working.

Port 1024 works for me.
 At first glance, this looks like it only runs on Linux, except according 
 to the source it's Windows only... hmm... cool, CUDA 2.0 fixed the 
 nvcuda.dll loading bug. Sweet. I didn't notice it in the change log. I 
 have some wrappers which greatly simplify CUDA use. Would the members of 
 team0xf be interested?

I tried it on WinXP Pro SP3, so it works on Windows, although I could only run the debug version; the release build crashed with an access violation.
Jul 20 2009
Michal M <gruby team0xf.com> writes:
Trass3r wrote:
 I tried it on WinXP Pro SP3, so it works on Windows. Although I could 
 only run the debug version, the release one crashed with access violation.

The newest 'dcuda' works with CUDA 2.3 and contains CUFFT bindings and a simpleCUFFT example.
Jul 24 2009
Trass3r <mrmocool gmx.de> writes:
Robert Jacques wrote:
 have some wrappers which greatly simplify CUDA use. Would the members of 
 team0xf be interested?

Me too.
Jul 20 2009
Sam Hu <samhudotsamhu gmail.com> writes:
Robert Jacques wrote:

 On Mon, 20 Jul 2009 08:48:04 -0400, Trass3r <mrmocool gmx.de> wrote:
 
 Robert Jacques wrote:
 have some wrappers which greatly simplify CUDA use. Would the members  
 of team0xf be interested?

Me too.

Here: https://jshare.johnshopkins.edu/xythoswfs/webui/_xy-3638242_1-t_5kUrZSWG It requires D2 and the new phobos and CUDA 2.2. It's only been tested on XP and Vista and D2.031. If you need a D1 version, let me know and I'll hunt down an old revision. The ddocs for cuda.api and cuda.tests have some example usage. The D2 version has some pretty nifty features, like struct support. Let me know if you need any help.

It seems the package does not contain the cuda.cufft module?
Jul 20 2009
Trass3r <mrmocool gmx.de> writes:
Robert Jacques wrote:
 Here: 
 https://jshare.johnshopkins.edu/xythoswfs/webui/_xy-3638242_1-t_5kUrZSWG
 It requires D2 and the new phobos and CUDA 2.2. It's only been tested on 
 XP and Vista and D2.031. If you need a D1 version, let me know and I'll 
 hunt down an old revision. The ddocs for cuda.api and cuda.tests have
 some example usage. The D2 version has some pretty nifty features, like
 struct support. Let me know if you need any help.

What about that __thread storage class?

    class Device {
    protected:
        static __thread CUdevice  dev;             // This thread's GPU
        static __thread int       gpu_id = 0;      // This thread's GPU id
        static __thread int       rev_major;       // The major revision number
        static __thread int       rev_minor;       // The minor revision number
        static __thread size_t    totalGM;         // Total global memory
        static __thread string    dev_name;        // Device name
        static __thread CUdevprop prop;            // This device's properties
        static __thread CUcontext ctx;             // This device's context
        static __thread           isValid = false; // Is the device alive

Does this combination of static and __thread work?

By the way, are static class members treated as "classic global data"?

D 2.030: classic global storage now defaults to TLS (Thread Local Storage).
D 2.013: Added __thread storage class for thread local storage. This is for testing purposes only to check out the machinery in the back end. The front end design of this will change.
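To make the static-member question concrete, here is a minimal D2 sketch. The `Counter` class and `observeFromNewThread` function are hypothetical, written only for illustration. It assumes a current D2 compiler, where module-level variables and static class members are thread-local by default and `__gshared` opts a variable back into a single process-wide copy; it deliberately avoids `__thread`, which the changelog entry above describes as experimental.

```d
import core.thread : Thread;
import std.stdio : writeln;

// Hypothetical example class, not part of cuda.api.
class Counter
{
    static int perThread;        // D2 default: one copy per thread (TLS)
    __gshared static int global; // "classic" global: one copy for the whole process
}

// Sets both statics in the calling thread, then reports what a freshly
// started thread sees for each of them: [perThread, global].
int[2] observeFromNewThread(int value)
{
    Counter.perThread = value;
    Counter.global = value;

    int[2] seen;
    auto t = new Thread({
        seen[0] = Counter.perThread; // the new thread's own copy, still int.init (0)
        seen[1] = Counter.global;    // the shared copy, which holds `value`
    });
    t.start();
    t.join(); // wait for the thread; its writes to `seen` are visible after join
    return seen;
}

void main()
{
    auto seen = observeFromNewThread(7);
    writeln(seen, " ", Counter.perThread); // prints "[0, 7] 7"
}
```

If static class members were still classic globals, the new thread would see 7 in both slots; under the thread-local default only the `__gshared` member is shared. Note that `__gshared` performs no synchronization, so concurrent writers would still need a lock.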
Jul 21 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sun, Jul 19, 2009 at 11:49 PM, Robert Jacques <sandford jhu.edu> wrote:
 On Sun, 19 Jul 2009 21:52:36 -0400, Trass3r <mrmocool gmx.de> wrote:

 While browsing team0xf's repository I found a small, inconspicuous project
 named dcuda that was never officially mentioned:
 http://team0xf.com:1024/dcuda/

 This guy managed to compile at least a few CUDA examples with D.
 Just tried out MatrixMultiply and the test passed :)

Cool. Although the link doesn't seem to be working. I've also been using CUDA and have my own bindings, although I haven't gotten around to hosting them anywhere yet.

Try http://team0xf.com:8080/dcuda/
Jul 19 2009
"Robert Jacques" <sandford jhu.edu> writes:
On Mon, 20 Jul 2009 00:41:19 -0400, Jarrett Billingsley  
<jarrett.billingsley gmail.com> wrote:

 On Sun, Jul 19, 2009 at 11:49 PM, Robert Jacques <sandford jhu.edu> wrote:
 On Sun, 19 Jul 2009 21:52:36 -0400, Trass3r <mrmocool gmx.de> wrote:

 While browsing team0xf's repository I found a small, inconspicuous  
 project
 named dcuda that was never officially mentioned:
 http://team0xf.com:1024/dcuda/

 This guy managed to compile at least a few CUDA examples with D.
 Just tried out MatrixMultiply and the test passed :)

Cool. Although the link doesn't seem to be working. I've also been using CUDA and have my own bindings, although I haven't gotten around to hosting them anywhere yet.

Try http://team0xf.com:8080/dcuda/

Thanks. At first glance, this looks like it only runs on Linux, except according to the source it's Windows only... hmm... cool, CUDA 2.0 fixed the nvcuda.dll loading bug. Sweet. I didn't notice it in the change log. I have some wrappers which greatly simplify CUDA use. Would the members of team0xf be interested?
Jul 19 2009
"Robert Jacques" <sandford jhu.edu> writes:
On Mon, 20 Jul 2009 08:48:04 -0400, Trass3r <mrmocool gmx.de> wrote:

 Robert Jacques wrote:
 have some wrappers which greatly simplify CUDA use. Would the members  
 of team0xf be interested?

Me too.

Here: https://jshare.johnshopkins.edu/xythoswfs/webui/_xy-3638242_1-t_5kUrZSWG It requires D2 and the new phobos and CUDA 2.2. It's only been tested on XP and Vista and D2.031. If you need a D1 version, let me know and I'll hunt down an old revision. The ddocs for cuda.api and cuda.tests have some example usage. The D2 version has some pretty nifty features, like struct support. Let me know if you need any help.
Jul 20 2009
"Robert Jacques" <sandford jhu.edu> writes:
On Tue, 21 Jul 2009 02:09:42 -0400, Sam Hu <samhudotsamhu gmail.com> wrote:

 Robert Jacques wrote:

 On Mon, 20 Jul 2009 08:48:04 -0400, Trass3r <mrmocool gmx.de> wrote:

 Robert Jacques wrote:
 have some wrappers which greatly simplify CUDA use. Would the members
 of team0xf be interested?

Me too.

Here: https://jshare.johnshopkins.edu/xythoswfs/webui/_xy-3638242_1-t_5kUrZSWG It requires D2 and the new phobos and CUDA 2.2. It's only been tested on XP and Vista and D2.031. If you need a D1 version, let me know and I'll hunt down an old revision. The ddocs for cuda.api and cuda.tests have some example usage. The D2 version has some pretty nifty features, like struct support. Let me know if you need any help.

It seems the package does not contain the cuda.cufft module?

Oops. Just remove the import statement; the file is currently all comments. I stopped maintaining the FFT and BLAS DLL bindings shortly after CUDA 1.0 since I wasn't using them. I've updated the ZIP. Also, as before, if you'd like a copy of the FFT bindings, let me know.
Jul 21 2009
"Robert Jacques" <sandford jhu.edu> writes:
On Tue, 21 Jul 2009 21:48:45 -0400, Trass3r <mrmocool gmx.de> wrote:
 Robert Jacques wrote:
 Here:  
 https://jshare.johnshopkins.edu/xythoswfs/webui/_xy-3638242_1-t_5kUrZSWG
 It requires D2 and the new phobos and CUDA 2.2. It's only been tested  
 on XP and Vista and D2.031. If you need a D1 version, let me know and  
 I'll hunt down an old revision. The ddocs for cuda.api and cuda.tests
 have some example usage. The D2 version has some pretty nifty features,
 like struct support. Let me know if you need any help.

What about that __thread storage class?

    class Device {
    protected:
        static __thread CUdevice  dev;             // This thread's GPU
        static __thread int       gpu_id = 0;      // This thread's GPU id
        static __thread int       rev_major;       // The major revision number
        static __thread int       rev_minor;       // The minor revision number
        static __thread size_t    totalGM;         // Total global memory
        static __thread string    dev_name;        // Device name
        static __thread CUdevprop prop;            // This device's properties
        static __thread CUcontext ctx;             // This device's context
        static __thread           isValid = false; // Is the device alive

Does this combination of static and __thread work?

By the way, are static class members treated as "classic global data"?

D 2.030: classic global storage now defaults to TLS (Thread Local Storage).
D 2.013: Added __thread storage class for thread local storage. This is for testing purposes only to check out the machinery in the back end. The front end design of this will change.

Well, I could probably eliminate __thread now, but it doesn't seem to be doing any harm. Device is in flux right now, as it's basically transitioning from a class with data to a struct containing a bunch of static functions. I don't have a multi-GPU setup (sadly), so I haven't played around with any of the multi-GPU/context stuff in CUDA.
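For illustration only, a hypothetical sketch of the direction described above: a `Device` struct made of static functions over per-thread state. The names (`select`, `current`, `isSelected`, `gpuId`) are invented here and are not taken from the actual cuda.api code. The sketch relies only on D2's thread-local default for static members, so no `__thread` is needed, and it makes no CUDA calls.

```d
import core.thread : Thread;
import std.stdio : writeln;

// Hypothetical sketch, not the real cuda.api Device: static functions
// operating on per-thread state instead of a class instance holding data.
struct Device
{
static:
    private int gpuId = -1; // thread-local (D2 default); -1 means "no GPU selected"

    // Record which GPU the calling thread uses. A real binding would also
    // create or bind a CUDA context here; that part is omitted.
    void select(int id)
    {
        assert(id >= 0, "GPU id must be non-negative");
        gpuId = id;
    }

    // The calling thread's GPU id, or -1 if select() was never called.
    int current() { return gpuId; }

    bool isSelected() { return gpuId >= 0; }
}

void main()
{
    Device.select(0);
    int other;
    auto t = new Thread({ other = Device.current(); }); // this thread never calls select()
    t.start();
    t.join();
    writeln(Device.current(), " ", other); // prints "0 -1"
}
```

Because each host thread gets its own `gpuId`, the same static functions can serve several threads driving different GPUs without passing a device object around; whether that fits CUDA's per-thread context rules is left open here, since the thread notes multi-GPU use was not tested.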
Jul 21 2009