
digitalmars.D - Has anyone used D with Nvidia's Cuda?

reply Walter Bright <newshound2 digitalmars.com> writes:
http://www.nvidia.com/object/cuda_home_new.html
Apr 03 2015
next sibling parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
On 4/04/2015 3:49 p.m., Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
Honestly, I don't think anyone has even tried to create bindings, let alone use it. Although I think there are OpenCL bindings floating around which serve a similar purpose.
Apr 03 2015
parent reply "weaselcat" <weaselcat gmail.com> writes:
On Saturday, 4 April 2015 at 02:59:46 UTC, Rikki Cattermole wrote:
 On 4/04/2015 3:49 p.m., Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
Honestly, I don't think anyone has even tried to create bindings. Let alone use it. Although I think there are OpenCL bindings floating around which has a similar purpose.
Derelict offers cuda bindings.
Apr 03 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/3/2015 11:12 PM, weaselcat wrote:
 On Saturday, 4 April 2015 at 02:59:46 UTC, Rikki Cattermole wrote:
 On 4/04/2015 3:49 p.m., Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
Honestly, I don't think anyone has even tried to create bindings. Let alone use it. Although I think there are OpenCL bindings floating around which has a similar purpose.
Derelict offers cuda bindings.
Ahh, I see: https://github.com/DerelictOrg/DerelictCUDA I don't see it here: http://svn.dsource.org/projects/derelict/branches/Derelict2/doc/index.html If the latter is obsolete, it should perhaps be updated to point to the newer one. The svn one is the first google hit for Derelict.
Apr 04 2015
parent reply "weaselcat" <weaselcat gmail.com> writes:
On Saturday, 4 April 2015 at 09:24:07 UTC, Walter Bright wrote:
 If the latter is obsolete, it should perhaps be updated to 
 point to the newer one. The svn one is the first google hit for 
 Derelict.
Top 3 results for me for `dlang derelict` are all his github page/projects, did you just google `derelict` or?
Apr 04 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/4/2015 2:34 AM, weaselcat wrote:
 On Saturday, 4 April 2015 at 09:24:07 UTC, Walter Bright wrote:
 If the latter is obsolete, it should perhaps be updated to point to the newer
 one. The svn one is the first google hit for Derelict.
Top 3 results for me for `dlang derelict` are all his github page/projects, did you just google `derelict` or?
`D programming language derelict`

In any case, the dsource.org page should be fixed or removed.

The github page also has problems:

* the "Using Derelict" link is dead
* "DerelictUtil for Users" has zero information about using D with CUDA, and seems completely irrelevant
* no link for "DerelictUtil Wiki"
* the example shown is useless
* there are no examples of actually running code on a GPU

It looks like nothing more than a couple of header files (which is a great start, but that's all). In contrast, there's a package to use CUDA with Go: https://archive.fosdem.org/2014/schedule/event/hpc_devroom_go/ which is still pretty thin, but much further along.
Apr 04 2015
next sibling parent reply "weaselcat" <weaselcat gmail.com> writes:
On Saturday, 4 April 2015 at 09:50:09 UTC, Walter Bright wrote:
 On 4/4/2015 2:34 AM, weaselcat wrote:
 On Saturday, 4 April 2015 at 09:24:07 UTC, Walter Bright wrote:
 If the latter is obsolete, it should perhaps be updated to 
 point to the newer
 one. The svn one is the first google hit for Derelict.
Top 3 results for me for `dlang derelict` are all his github page/projects, did you just google `derelict` or?
`D programming language derelict`

In any case, the dsource.org page should be fixed or removed.

The github page also has problems:

* the "Using Derelict" link is dead
* "DerelictUtil for Users" has zero information about using D with CUDA, and seems completely irrelevant
* no link for "DerelictUtil Wiki"
* the example shown is useless
* there are no examples of actually running code on a GPU
PR?
 It looks like nothing more than a couple header files (which is 
 a great start, but that's all).

 In contrast, there's a package to use CUDA with Go:

   https://archive.fosdem.org/2014/schedule/event/hpc_devroom_go/

 which is still pretty thin, but much further along.
AFAIK almost all derelict repos are maintained almost solely by aldacron, and he maintains a _lot_ of them. https://github.com/DerelictOrg p.s., googling "golang cuda" comes up with almost nothing useful at the top - 4-5 links to the FOSDEM video and some pdfs. I'm not being biased, I seriously can't figure out anything beyond the fosdem video for cuda with go. First result for "dlang cuda" for me is the dub repo for derelict cuda.
Apr 04 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/4/2015 3:04 AM, weaselcat wrote:
 PR?
Exactly! The idea is that GPUs can greatly accelerate code (2x to 1000x), and if D wants to appeal to high performance computing programmers, we need to have a workable way to program the GPU. At this point, it doesn't have to be slick or great, but it has to be doable. Nvidia appears to have put a lot of effort into CUDA, and it shouldn't be hard to work with CUDA given the Derelict D headers, and will give us an answer to D users who want to leverage the GPU. It would also be dazz if someone were to look at std.algorithm and see what could be accelerated with GPU code.
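To make the std.algorithm idea concrete, here is a minimal sketch of what a GPU-aware map could look like from the caller's side. Everything here is hypothetical: the gpuMap name is invented, and the body is a plain CPU loop standing in for the kernel launch and buffer management a real implementation would need.

```d
import std.stdio;

// Hypothetical sketch of a GPU-aware map. The real work (compiling `fun`
// to a kernel, copying buffers to and from the device) is stubbed out
// with a plain CPU loop; only the interface shape is the point here.
T[] gpuMap(alias fun, T)(T[] input)
{
    auto output = new T[input.length];
    // In a real implementation this loop would be a kernel launch.
    foreach (i, x; input)
        output[i] = fun(x);
    return output;
}

void main()
{
    auto squares = gpuMap!(x => x * x)([1, 2, 3, 4]);
    writeln(squares); // [1, 4, 9, 16]
}
```

The appeal is that callers keep the familiar std.algorithm style while the library decides whether offloading is worthwhile.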
Apr 04 2015
next sibling parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 4/04/2015 11:26 p.m., Walter Bright wrote:
 On 4/4/2015 3:04 AM, weaselcat wrote:
 PR?
Exactly! The idea is that GPUs can greatly accelerate code (2x to 1000x), and if D wants to appeal to high performance computing programmers, we need to have a workable way to program the GPU. At this point, it doesn't have to be slick or great, but it has to be doable. Nvidia appears to have put a lot of effort into CUDA, and it shouldn't be hard to work with CUDA given the Derelict D headers, and will give us an answer to D users who want to leverage the GPU. It would also be dazz if someone were to look at std.algorithm and see what could be accelerated with GPU code.
On that idea, just a thought: DMD-FE uses the visitor pattern quite a lot to allow the backend to hook into it easily. What if we exposed a set block of code to CTFE that acted like a backend, but only transformed code for the real backend? In other words, allow CTFE to extend the compiler a little the way the backend does, to add language features such as transforming code into OpenCL code and having it wrapped nicely into D code. Theoretically, if this were done, we could move the inline assembler into a library. Because of CTFE, surely this wouldn't add much code to the front end?
Apr 04 2015
prev sibling next sibling parent reply "weaselcat" <weaselcat gmail.com> writes:
On Saturday, 4 April 2015 at 10:26:27 UTC, Walter Bright wrote:
 On 4/4/2015 3:04 AM, weaselcat wrote:
 PR?
Exactly! The idea is that GPUs can greatly accelerate code (2x to 1000x), and if D wants to appeal to high performance computing programmers, we need to have a workable way to program the GPU. At this point, it doesn't have to be slick or great, but it has to be doable. Nvidia appears to have put a lot of effort into CUDA, and it shouldn't be hard to work with CUDA given the Derelict D headers, and will give us an answer to D users who want to leverage the GPU. It would also be dazz if someone were to look at std.algorithm and see what could be accelerated with GPU code.
I really think you're barking up the wrong tree here: CUDA is a closed, proprietary solution implemented by only one vendor, effectively cutting off anyone who doesn't work with Nvidia hardware. Also, the std.algorithm thing sounds a lot like the C++ library Bolt/Thrust: https://github.com/HSA-Libraries/Bolt
Apr 04 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/4/2015 3:45 AM, weaselcat wrote:
 I really think you're barking up the wrong tree here - cuda is a closed
 proprietary solution only implemented by one vendor effectively cutting off
 anyone that doesn't work with nvidia hardware.
That's right. On the other hand:

1. Nvidia hardware is pervasive, and CUDA has been around for many years. I doubt it is going away anytime soon.
2. It is little effort on our part to support it.
3. We'd have some co-marketing opportunities with Nvidia if we support it.
4. Supporting CUDA doesn't impede supporting OpenCL.
 also, the std.algorithm thing sounds a lot like the C++ library Bolt/Thrust
 https://github.com/HSA-Libraries/Bolt
Yup.
Apr 04 2015
parent "ponce" <contact gam3sfrommars.fr> writes:
On Saturday, 4 April 2015 at 17:21:45 UTC, Walter Bright wrote:
 On 4/4/2015 3:45 AM, weaselcat wrote:
 I really think you're barking up the wrong tree here - cuda is 
 a closed
 proprietary solution only implemented by one vendor 
 effectively cutting off
 anyone that doesn't work with nvidia hardware.
That's right. On the other hand:

1. Nvidia hardware is pervasive, and CUDA has been around for many years. I doubt it is going away anytime soon.
2. It is little effort on our part to support it.
3. We'd have some co-marketing opportunities with Nvidia if we support it.
If NVIDIA wants full support for NPP, cuBLAS, and the myriad of libraries depending on the Runtime API then it's more effort.
Apr 04 2015
prev sibling parent "ponce" <contact gam3sfrommars.fr> writes:
On Saturday, 4 April 2015 at 10:26:27 UTC, Walter Bright wrote:
 On 4/4/2015 3:04 AM, weaselcat wrote:
 PR?
Exactly! The idea is that GPUs can greatly accelerate code (2x to 1000x), and if D wants to appeal to high performance computing programmers, we need to have a workable way to program the GPU. At this point, it doesn't have to be slick or great, but it has to be doable. Nvidia appears to have put a lot of effort into CUDA, and it shouldn't be hard to work with CUDA given the Derelict D headers, and will give us an answer to D users who want to leverage the GPU. It would also be dazz if someone were to look at std.algorithm and see what could be accelerated with GPU code.
A good OpenCL wrapper library like cl4d would do wonders.
Apr 04 2015
prev sibling next sibling parent "Dmitri Makarov" <dmakarv gmail.com> writes:
On Saturday, 4 April 2015 at 09:50:09 UTC, Walter Bright wrote:
 * there are no examples of actually running code on a GPU
I can contribute at least three examples running code on a GPU (the domains are neural networks, bioinformatics, and grid traversal; these are my ports to D/OpenCL of the Rodinia benchmarks, http://www.cs.virginia.edu/~skadron/wiki/rodinia/index.php/Rodinia:Accelerating_Compute-Intensive_Applications_with_Accelerators), but these examples use OpenCL, not CUDA.
Apr 04 2015
prev sibling parent reply "ponce" <contact gam3sfrommars.fr> writes:
On Saturday, 4 April 2015 at 09:50:09 UTC, Walter Bright wrote:
 * the example shown is useless
The problem with examples is that someone has to maintain them. For DerelictBgfx we removed all translated examples. So the Derelict policy is to remove examples to avoid them becoming out of date. For the record, Aldacron maintains approx. 22 Derelict bindings and I maintain 7 of them, in our free time. Keeping up with every library change is impossible if everyone expects everything to be up-to-date and with examples.
 * there are no examples of actually running code on a GPU
Because it's similar to using the Driver/Runtime API in C++, you have to read CUDA documentation.
 It looks like nothing more than a couple header files (which is 
 a great start, but that's all).
Maybe we can delete them so that it's not too embarrassing? Serious proposal. In my opinion, the couple header files provide all you need to use CUDA, if you know what you are doing. If you don't, don't do GPGPU.
Apr 04 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/4/2015 4:29 AM, ponce wrote:
 On Saturday, 4 April 2015 at 09:50:09 UTC, Walter Bright wrote:
 * the example shown is useless
The problem with example is that someone have to maintain them. For DerelictBgfx we removed all translated examples. So the Derelict policy is to remove example to avoid them becoming out of date. For the record Aldacron maintains approx. 22 Derelict bindings and I maintain 7 of them, in our free time. Keeping up with all library change is impossible if everyone excpect everything to be up-to-date and with examples.
Oh, I understand that keeping things up to date is always a problem with every third-party tool. On the plus side, however, Nvidia seems very good with backwards compatibility, meaning that when the D bindings get out of date, they will still work. They just won't work with new features.
 * there are no examples of actually running code on a GPU
Because it's similar to using the Driver/Runtime API in C++, you have to read CUDA documentation.
Of course. But having a couple examples to show it really does work will go a long way. I am not suggesting making any attempt to duplicate Nvidia's documentation in D.
 It looks like nothing more than a couple header files (which is a great start,
 but that's all).
Maybe we can delete them so that it's not too embarrassing? Serious proposal.
If that's the state of things, I'd be happy to take them over and put them in Deimos.
 In my opinion, the couple header files provide all you need to use CUDA, if you
 know what you are doing. If you don't, don't do GPGPU.
That does work for someone who really wants to use CUDA, but not much for someone who is evaluating using D and wants to use the GPU with CUDA.
Apr 04 2015
parent reply "ponce" <contact gam3sfrommars.fr> writes:
On Saturday, 4 April 2015 at 17:16:19 UTC, Walter Bright wrote:
 On 4/4/2015 4:29 AM, ponce wrote:
 On Saturday, 4 April 2015 at 09:50:09 UTC, Walter Bright wrote:
 * the example shown is useless
The problem with example is that someone have to maintain them. For DerelictBgfx we removed all translated examples. So the Derelict policy is to remove example to avoid them becoming out of date. For the record Aldacron maintains approx. 22 Derelict bindings and I maintain 7 of them, in our free time. Keeping up with all library change is impossible if everyone excpect everything to be up-to-date and with examples.
Oh, I understand that keeping things up to date is always a problem with every third-party tool. On the plus side, however, Nvidia seems very good with backwards compatibility, meaning that when the D bindings get out of date, they will still work. They just won't work with new features.
They don't seem to have deprecated any functions, indeed. That could make examples practical.
 * there are no examples of actually running code on a GPU
Because it's similar to using the Driver/Runtime API in C++, you have to read CUDA documentation.
Of course. But having a couple examples to show it really does work will go a long way. I am not suggesting making any attempt to duplicate Nvidia's documentation in D.
 It looks like nothing more than a couple header files (which 
 is a great start,
 but that's all).
Maybe we can delete them so that it's not too embarrassing? Serious proposal.
If that's the state of things, I'd be happy to take them over and put them in Deimos.
Sure, the licensing of Derelict probably allows it, and Deimos and Derelict are complementary anyway.
Apr 04 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/4/2015 2:59 PM, ponce wrote:
 Sure, the licensing of Derelict probably allows it, and Deimos and Derelict are
 complementary anyway.
Thanks. I think I'll give it a try and see what it takes to get a simple example working.
Apr 04 2015
prev sibling next sibling parent reply "Dmitri Makarov" <dmakarv gmail.com> writes:
On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
No, but I'm building an embedded DSL that will allow generating OpenCL kernels and the supporting boilerplate OpenCL API calls at compile time. It's called CLOP (openCL OPtimizer). It uses the derelict.opencl bindings.
Apr 03 2015
parent reply "Vlad Levenfeld" <vlevenfeld gmail.com> writes:
On Saturday, 4 April 2015 at 06:36:49 UTC, Dmitri Makarov wrote:
 On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
No, but I'm building an embedded dsl that will allow to generate opencl kernels and supporting boilerplate opencl api calls at compile-time. it's called clop (openCL OPtimizer). It uses derelict.opencl bindings.
How would it be used? At the client level, I mean.
Apr 04 2015
parent Dmitri Makarov via Digitalmars-d <digitalmars-d puremagic.com> writes:
The programmer describes the computations to be done on a device,
invokes the clop compiler via mixin expression passing the string
describing the computations in an OpenCL-like syntax. The compiler
returns D code that includes the generated OpenCL kernel and all the
boiler plate code. The computations can refer to variables declared in
the host application, CLOP will generate the necessary CL buffers and
kernel arguments. Here's an example:

// use CLOP DSL to generate OpenCL kernel and API calls.
mixin( compile(
q{
    int max3( int a, int b, int c )
    {
        int k = a > b ? a : b;
        return k > c ? k : c;
    }
    Antidiagonal NDRange( c : 1 .. cols, r : 1 .. rows )
    {
        F[c, r] = max3( F[c - 1, r - 1] + S[c + cols * r],
                        F[c - 1, r] - penalty,
                        F[c, r - 1] - penalty );
    } apply( rectangular_blocking( 8, 8 ) )
} ) );

This implements the Needleman-Wunsch algorithm in CLOP. It says that the
computation is to be done over the 2D index space 1..cols by 1..rows. It
requires an anti-diagonal synchronization pattern, meaning that the
elements on every anti-diagonal of the index space can be processed in
parallel, but there is a global synchronization point between the
diagonals. Also, the user requests to optimize this using rectangular
blocking. The variables cols, rows, S, F, and penalty are normal D
variables declared and defined in the application that contains the
above mixin statement.

 You can look at my github repository for more examples,
https://github.com/dmakarov/clop, but the project is at a very early
stage and not yet usable.

Regards,

Dmitri


Apr 04 2015
prev sibling next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
http://code.dlang.org/packages/derelict-cuda
Apr 04 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/4/2015 2:16 AM, John Colvin wrote:
 On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
http://code.dlang.org/packages/derelict-cuda
I know you have interest in CUDA, have you gotten any D code to work with it?
Apr 04 2015
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Saturday, 4 April 2015 at 10:07:16 UTC, Walter Bright wrote:
 On 4/4/2015 2:16 AM, John Colvin wrote:
 On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
http://code.dlang.org/packages/derelict-cuda
I know you have interest in CUDA, have you gotten any D code to work with it?
I use OpenCL as I don't want to be locked to one vendor's hardware. It's hard enough to write portable, efficient GPGPU code without swapping frameworks as well.
Apr 04 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/4/2015 3:58 AM, John Colvin wrote:
 On Saturday, 4 April 2015 at 10:07:16 UTC, Walter Bright wrote:
 On 4/4/2015 2:16 AM, John Colvin wrote:
 On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
http://code.dlang.org/packages/derelict-cuda
I know you have interest in CUDA, have you gotten any D code to work with it?
I use OpenCL as I don't want to be locked to one vendor's hardware. It's hard enough to write portable, efficient GPGPU code without swapping frameworks as well.
A reasonable viewpoint.
Apr 04 2015
prev sibling parent reply "ponce" <contact gam3sfrommars.fr> writes:
On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
I wrote the Driver and Runtime API bindings for https://github.com/DerelictOrg/DerelictCUDA And the one thing I've done with them is load the functions, create a context, and destroy it. So yeah, I think using CUDA with D is possible. OpenCL 2.x is much more interesting though.
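Concretely, that smoke test amounts to something like the following. This is a sketch from memory of the DerelictCUDA driver API: the module path and loader name are my assumptions, it needs the derelict-cuda dub package plus an Nvidia driver installed at runtime, and error codes are ignored for brevity.

```d
import std.stdio;
import derelict.cuda.driverapi; // assumed module path in derelict-cuda

void main()
{
    // Load the CUDA driver shared library (nvcuda.dll / libcuda.so)
    // and bind the driver API functions.
    DerelictCUDADriver.load();

    CUdevice device;
    CUcontext context;

    // Initialize the driver API and grab the first GPU.
    cuInit(0);
    cuDeviceGet(&device, 0);

    // Create a context on that device, then tear it down again.
    cuCtxCreate(&context, 0, device);
    writeln("CUDA context created.");
    cuCtxDestroy(context);
}
```

A real program would check every returned CUresult against CUDA_SUCCESS; this sketch just demonstrates that the bindings load and the driver round-trip works.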
Apr 04 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/4/2015 2:35 AM, ponce wrote:
 On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
I wrote the Driver and Runtime API bindings for https://github.com/DerelictOrg/DerelictCUDA And the one thing I've done with them is loading the functions, create a context and destroy it. So yeah I think using CUDA with D is possible.
Thank you. How far are you interested in going with it?
 OpenCL 2.x is much more interesting though.
It's slower:
 Furthermore, in studies of straightforward translation of CUDA programs to
 OpenCL C programs, CUDA has been found to outperform OpenCL;[83][86] but the
 performance differences can mostly be attributed to differences in the
 programming model (especially the memory model) and in the optimizations that
 OpenCL C compilers performed as compared to those in the CUDA compiler.
-- http://en.wikipedia.org/wiki/OpenCL#Portability.2C_performance_and_alternatives No reason not to support both, however.
Apr 04 2015
next sibling parent "Dmitri Makarov" <dmakarv gmail.com> writes:
On Saturday, 4 April 2015 at 10:03:56 UTC, Walter Bright wrote:
 It's slower:
However, it's an open standard, will improve, and will be available on devices of any vendor interested in implementing the compiler and the runtime API, which is essentially every vendor of compute devices (CPU, GPU, FPGA, or other accelerators). CUDA will be for Nvidia hardware only. (Not that I am against providing CUDA support for D programmers).
Apr 04 2015
prev sibling parent reply "ponce" <contact gam3sfrommars.fr> writes:
On Saturday, 4 April 2015 at 10:03:56 UTC, Walter Bright wrote:
 On 4/4/2015 2:35 AM, ponce wrote:
 On Saturday, 4 April 2015 at 02:49:16 UTC, Walter Bright wrote:
 http://www.nvidia.com/object/cuda_home_new.html
I wrote the Driver and Runtime API bindings for https://github.com/DerelictOrg/DerelictCUDA And the one thing I've done with them is loading the functions, create a context and destroy it. So yeah I think using CUDA with D is possible.
Thank you. How far are you interested in going with it?
Not far. I'm currently trying to bootstrap a business solo (hopefully with the help of D) and available time has become significantly scarcer. I'd much prefer to spend time on Derelict OpenCL bindings (brought to you by MeinMein) if time were an option.
 OpenCL 2.x is much more interesting though.
It's slower:
 Furthermore, in studies of straightforward translation of CUDA 
 programs to
 OpenCL C programs, CUDA has been found to outperform 
 OpenCL;[83][86] but the
 performance differences can mostly be attributed to 
 differences in the
 programming model (especially the memory model) and in the 
 optimizations that
 OpenCL C compilers performed as compared to those in the CUDA 
 compiler.
-- http://en.wikipedia.org/wiki/OpenCL#Portability.2C_performance_and_alternatives
It used to be that CUDA had warps and pinned memory and OpenCL didn't. Now OpenCL 2.0 has several driver providers, and also has warps ("sub-groups") and associated warp operations, which are super useful for performance. To the extent that I wouldn't recommend building anything new in CUDA. I don't really see what could make OpenCL slower. But I see really well what is dangerous in making new projects in CUDA nowadays; I was certainly burned by it to some extent. The newest CUDA features don't improve performance (Unified Memory Addressing, peer copy, and friends). OpenCL compiles to FPGAs, CPUs, and GPUs, and has no missing features anymore. We must now forget what was once true about it. With the Intel OpenCL SDK, even the tooling is on par with NVIDIA's.
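For illustration, the sub-group operations mentioned above look roughly like this in OpenCL 2.x kernel code, held here as a D string the way clCreateProgramWithSource expects it. The kernel is a sketch: sub_group_reduce_add and its companions come from the cl_khr_subgroups extension (core in OpenCL 2.1), and the buffer layout is invented for the example.

```d
// OpenCL C kernel source as a D raw string, ready to hand to
// clCreateProgramWithSource. Each sub-group ("warp") produces one
// partial sum, reduced in registers instead of local memory.
enum partialSumKernel = `
    #pragma OPENCL EXTENSION cl_khr_subgroups : enable

    __kernel void partialSums(__global const float* input,
                              __global float* output)
    {
        float sum = sub_group_reduce_add(input[get_global_id(0)]);

        // The first lane of each sub-group writes that sub-group's sum.
        if (get_sub_group_local_id() == 0)
            output[get_group_id(0) * get_num_sub_groups()
                   + get_sub_group_id()] = sum;
    }
`;
```

This is the kind of operation that previously required CUDA warp shuffles; having it in the open standard is what makes OpenCL 2.x attractive here.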
 No reason not to support both, however.
Yep.
Apr 04 2015
parent reply "ponce" <contact gam3sfrommars.fr> writes:
Also consider costs: NVIDIA will artificially limit the speed of
pinned memory transfers to push you to buy expensive $3000
discrete GPUs. They have segmented the market to make the most of
performance-starved people. It goes to the point that you are
left with $3000 GPUs that are slower than $300 ones, just to get
the right driver. Hopefully the market will correct them after so
much milking.
Apr 04 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/4/2015 4:16 AM, ponce wrote:
 Also consider costs: NVIDIA will artificially limit the speed of pinned memory
 transferts to push you to buy expensive $3000 discrete GPUs. They have
segmented
 the market to make the most of people performance-starved. It goes to the point
 that you are left with $3000 GPUs that are slower than $300 ones, just to get
 the right driver. Hopefully the market will correct them after so much milking.
The only thing I can add to that is that the people who really want performance will be more than willing to buy the GPU to do it, and the $3000 means nothing to them. I.e. people to whom microseconds mean money, such as trading software. I don't want to leave any tern unstoned. Also, it seems that we are 95% there in supporting CUDA already, thanks to your header work. We just need to write some examples to make sure it works, and write a few pages of "how to do it". Once that is done, we can approach Nvidia and get them to mention on their site that D supports CUDA. Nvidia is really pushing CUDA, and it will be of mutual benefit for them to promote D and us to support CUDA.
Apr 04 2015