digitalmars.D - Misc questions:- licensing, VC++ IDE compatible, GPGPU, LTCG, QT, SDL

reply Dan W <twinbee42 skytopia.com> writes:
Hi all, I'm toying around with the idea of porting my raytracer codebase to D.
But before committing, I have a few rookie questions:

1: What kind of license is the D compiler under? I'm thinking of shipping a
commercial, closed-source (for now) program with the D compiler (so that users
can compile within the GUI). Is this possible to do, or can I at least pay for the
privilege?

2: Is it possible to use D with the Visual C++ IDE? Preferably, I would like
the appropriate compiler and D options listed in the options (in place of the
usual C/C++ options).

3: I need my program to be as fast as possible. The Visual C++ compiler has
features such as "link-time code generation" and "Profile guided optimization".
Does D have equivalents?

4: Does D play nicely with QT, SDL, Lua?

5: How about compatibility with GPGPU stuff like CUDA and OpenCL? Can I just as
easily write GPGPU programs which run as fast as I can with C/C++?
May 16 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Dan W.:

 3: I need my program to be as fast as possible. The Visual C++ compiler has
 features such as "link-time code generation" and "Profile guided optimization".
 Does D have equivalents?

You can't ask a new open source language to have the features of a ten+ years old commercial compiler.

If you compile D1-Tango code on 32-bit Linux with LDC, using all the correct compile switches, you can get performance comparable to C code compiled with GCC. LDC does not have vectorization (which GCC has), but it has link-time optimization, which GCC 4.5 has only in part. This is the very best performance you can hope for with D.

General note: a trap D has put itself into: a significant group of people seem interested in D only as a high performance language. But history shows that nearly no new language starts its life being very fast. High performance, especially if you mean it as compared to quite mature C++ implementations, is something that can only come some years after a language has already reached some form of success, when people start to spend weeks, months or years just tuning the GC, creating whole new kinds of GC, inventing and implementing other D-specific optimizations, implementing a good escape analysis, implementing a good devirtualization+inlining of virtual functions, implementing various different kinds of efficient vectorizations, implementing a good pointer alias analysis, and so on.

Today some kinds of Java programs running on HotSpot have performance comparable to C++ programs. JavaScript running on V8 is often less than ten times slower than well compiled C. But for years both Java and JavaScript were dog-slow.

Most things in D are designed to require a simple enough compiler; it doesn't need an advanced JIT just to be efficient. So even naively compiled D programs aren't 50 times slower than equivalent C++ programs. Yet the performance is not the same as that of commercial C++ compilers, and it will not be unless groups of serious people set as their main/only purpose the creation of an efficient D2 compiler. "Performance" is not something that just happens, you need a lot of focused work to gain it.

Bye, bearophile
May 16 2010
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 You can't ask a new open source language to have the features of a ten+ years
 old commercial compiler.

Right, but dmd is using an optimizer and code generator that has been around for 25 years now. Its optimization is competent and reasonably advanced - the usual data flow optimizations are there, and the expected back end optimizations like register allocation using live range analysis and instruction scheduling are all there. The back end can be improved for floating point, but for integer/pointer work it is excellent.

It does not do link time code generation nor profile guided optimization, although in my experiments such features pay off only in a small minority of cases.

In my experiments on vector array operations, the improvement from the CPU's vector instructions is disappointing. It always seems to get hopelessly bottlenecked by memory bandwidth.

dmd does have a built-in profiling tool, which is extremely effective in pinpointing trouble spots in the source code. For example, in the recent issue where the spell checker was slow, the profiler pointed the damning finger at exactly where the problem was. (It was an algorithmic problem, not an optimization problem.)

Just to brag about how good it can be, DMC++ remains by far the fastest C++ compiler available, and DMD is incredibly fast at compiling. Both are built with the same optimizer and code generator that DMD uses.
May 16 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

Thank you for your answers and explanations.

 The back end can be improved for floating point, but for integer/pointer work
 it is excellent.

Only practical experiments can tell if you are right. (A raytracer uses a lot of floating point ops. My small raytracers compiled with dmd are some times slower than the same compiled with ldc.)
 It does not do link time code generation nor profile guided optimization, 
 although in my experiments such features pay off only in a small minority of 
 cases.

I agree that profile guided optimization on GCC usually pays little, so I usually don't use it with GCC. My theory is that it is not using the profiling information well enough yet. Reading the asm output of the Java HotSpot (and this is not easy to do) has shown me that HotSpot performs some things that GCC isn't doing yet, and that in numerical programs these give a good performance increase. Here I have shown one of the simplest and most effective optimizations done by HotSpot thanks to the profile information it continuously collects:
http://llvm.org/bugs/show_bug.cgi?id=5540

Link time optimization, as done by LDC, has given a good speedup in several of my programs, I like it enough. It allows all the other compiler optimizations to be applied more effectively. It's able to decrease the program size too.
 In my experiments on vector array operations, the improvement from the 
 CPU's vector instructions is disappointing. It always seems to get hopelessly 
 bottlenecked by memory bandwidth.

dmd array operations are currently not so useful. But there are several ways to vectorize code that can give very large (up to 10-16 times) speedups on numerical code. This is one of the kinds of vectorization:
http://gcc.gnu.org/wiki/Graphite
http://wiki.llvm.org/Polyhedral_optimization_framework

Another kind of vectorization is performing up to three levels of tiling (when the implemented algorithm is not cache oblivious).

Another kind of vectorization is the usage of all the fields of an SSE (and future AVX) register in parallel. Doing this well seems very hard for compilers, I don't know why (llvm is not able to do it, gcc does it a bit in some situations, and I don't know what the Intel compiler does here; I think the Intel compiler performs it only if the given C code is written in a specific way that you often have to find by time-consuming trial and error). So this optimization is often done manually, writing asm by hand... if you look at the asm written in video decoders you can see that it's many times faster than the asm produced from C by the best compilers.

Then there are true parallel optimizations, that means using more than one CPU core to perform operations. Examples of this are Cilk, parallel fors done in a simple way or in a more refined way as in the Chapel language, and then there are the various message passing implementations, etc.

If you have heavy numerical code and you combine all those things you can often get code 40 times faster or more. To perform all such optimizations you need smart compilers and/or a language that gives a lot of semantics to the back-end (as Cilk, Chapel, Fortress).

Bye, bearophile
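To make the tiling idea mentioned above concrete, here is a minimal C++ sketch of one level of tiling, applied to a square matrix transpose. It is only an illustration: the function names, the row-major layout and the tile size of 32 are assumptions made for this example, not anything taken from the post, and no speedup figure is claimed for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Naive transpose of an n x n row-major matrix: src is read contiguously,
// but dst is written with a stride of n elements, which tends to miss the
// cache once n is large.
void transpose_naive(const std::vector<double>& src,
                     std::vector<double>& dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            dst[j * n + i] = src[i * n + j];
}

// One level of tiling: the same work is done in tile x tile blocks, so the
// part of src and dst touched by the two inner loops stays small.
// The tile size is a hypothetical tuning parameter.
void transpose_tiled(const std::vector<double>& src,
                     std::vector<double>& dst, std::size_t n,
                     std::size_t tile = 32) {
    assert(tile > 0);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        for (std::size_t jj = 0; jj < n; jj += tile) {
            // Clamp the block so n does not have to be a multiple of tile.
            const std::size_t i_end = std::min(ii + tile, n);
            const std::size_t j_end = std::min(jj + tile, n);
            for (std::size_t i = ii; i < i_end; ++i)
                for (std::size_t j = jj; j < j_end; ++j)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce the same result; only the order in which memory is visited changes. Whether the tiled version is actually faster depends on the matrix size, the cache sizes and the compiler, so it has to be measured.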
May 16 2010
prev sibling next sibling parent reply %u <twinbee42 skytopia.com> writes:
Hi all, due to the slow speed of my browser and multiple posts, I'll be
posting just one email which covers everything. Please let me know if
replying to each individually is really preferred. Many thanks for all
and any help.


 May I ask you why are you planning to port an existing codebase to D?
 What kind of benefits specifically(except comparable to C performance)
 you expect from D?

 Thank you.

Sure. There's a couple of reasons really. First is that a lot of 'fluff' in C is rectified in D so that declarations and header files are a thing of the past. Hence less repetition and housekeeping. Second reason is (and I know this might sound idealistic), it'd be nice to promote D more, and get more people using it, since it is a step up from C in many regards. My code is still fairly small (certainly less than 1 million lines :) ), so it won't be too much hassle.

Walter said:
 It does not do link time code generation nor profile guided optimization,
 although in my experiments such features pay off only in a small minority of
 cases.

In VC++, PGO is a great speed help because of inlining, but from what you said later, this doesn't seem to be so much of an issue with D as (like you said), it has access to all the code anyway. I'm a little concerned though about the floating point performance, as raytracing does quite a bit of this of course.

The DMC++ compiler you mentioned sounds interesting too. I'd like to compare performance with that, the VC++ one, and the Intel compiler.

Thanks to Robert, for recommending VisualD and the bindings. I might try all three D compilers to see which gets the best speed, but perhaps LDC seems most promising from what you've said. I suppose in the future when many-core becomes prevalent that compiler optimization won't be so much of an issue because of the relative simplicity compared to the tricks of the present day CPU.

One issue I have with the Visual C++ compiler is that it doesn't seem to support loop unswitching (i.e. doubling up code with boolean If statements). I wonder if one of the D compilers supports it. I started a thread over at cprogramming about it here: http://cboard.cprogramming.com/c-programming/126756-lack-compiler-loop-optimization-loop-unswitching.html
 I have some decent CUDA bindings with a nice high level API that I'd be
 willing to share/open source. But you still have to write the actual GPU
 kernels in C/C++.

Thanks, I'll bear those in mind. Cheers, Dan
May 18 2010
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
%u wrote:
 The DMC++ compiler you mentioned sounds interesting too. I'd like to compare
 performance with that, the VC++ one, and the Intel compiler.

When comparing D performance with C++, it is best to compare compilers with the same back end, i.e.:

dmd with dmc
gcc with gdc
lcc with ldc

This is because back ends can vary greatly in the code generated.
May 18 2010
next sibling parent reply Robert Clipsham <robert octarineparrot.com> writes:
On 18/05/10 20:19, retard wrote:
 What if I'm using a clean room implementation of D with a custom backend
 and no accompanying C compiler, am I not allowed to compare the
 performance with anything?

 When people compare C compilers, they usually use the latest Visual
 Studio, gcc, icc, and llvm versions -- i.e. C compilers from various
 vendors. Using the same logic one is not allowed to compare dmc against
 those since it would always lose.

I don't believe Walter is arguing against this methodology. What he is arguing against is comparing dmd with gcc for example. Comparing ldc with gdc and dmd is fine, comparing dmd with dmc is fine, but when it comes to comparing D and C, he believes you should compare compilers using the same backend, that is dmd and dmc rather than dmd and gcc. Or that's what I took from it. This said, I don't agree with that methodology, unless it's only a small test. If you're comparing lots of C compilers and D you should include dmc for example if you're using dmd as the D reference, or clang if you're using ldc as a reference. If you're comparing C and D, you should stick to compilers with the same backend, otherwise the one with the superior backend will always win, and it's not a fair interlanguage comparison.
May 18 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Robert Clipsham:
 otherwise the one with the 
 superior backend will always win, and it's not a fair interlanguage 
 comparison.

Life isn't fair. Too bad for the one with an inferior back-end. Bye, bearophile
May 18 2010
parent Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 Life isn't fair. Too bad for the one with an inferior back-end.

Of course it isn't fair. But if you want to draw useful conclusions from a benchmark, you have to do what is known as "isolate the variables". If there are two independent variables feeding into performance, you CANNOT draw a conclusion about one of them from the performance. In other words, if:

    g = f(x,y)

then knowing g, x and y tells you nothing at all about x's contribution to g.
May 18 2010
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
retard wrote:
 Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:
 
 %u wrote:
 The DMC++ compiler you mentioned sounds interesting too. I'd like to
 compare performance with that, the VC++ one, and the Intel compiler.

 When comparing D performance with C++, it is best to compare compilers with the same back end, i.e.:

 dmd with dmc
 gcc with gdc
 lcc with ldc

 This is because back ends can vary greatly in the code generated.

What if I'm using a clean room implementation of D with a custom backend and no accompanying C compiler, am I not allowed to compare the performance with anything?

You're allowed to do whatever you want. I'm pointing out that the difference in code generator ability should not be misconstrued as a difference in the languages.
 When people compare C compilers, they usually use the latest Visual 
 Studio, gcc, icc, and llvm versions -- i.e. C compilers from various 
 vendors. Using the same logic one is not allowed to compare dmc against 
 those since it would always lose.

It's perfectly reasonable to compare dmc and gcc for code generation quality.
May 18 2010
parent Walter Bright <newshound1 digitalmars.com> writes:
retard wrote:
 It's a rookie mistake to believe that languages have some kind of 
 differences performance wise.

Well, they do. It's also true that these performance differences can be swamped by the quality of the implementation, and the ability of the programmer. But that doesn't mean there are not inherent performance differences due to the semantics the language requires.

It's like car racing. The performance is a combination of 3 factors:

1. the 'formula' for the particular class you're racing in
2. the quality of the construction of the car to that formula
3. the ability of the driver

It's simply wrong to measure the performance and then naively attribute it to one of those three, pretending the other two are constant.
May 20 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
%u:
 One issue I have with the Visual C++ compiler is that it doesn't seem to
 support loop unswitching (i.e. doubling up code with boolean If statements).
 I wonder if one of the D compilers supports it. I started a thread over at
 cprogramming about it here:
 http://cboard.cprogramming.com/c-programming/126756-lack-compiler-loop-optimization-loop-unswitching.html

In LDC (LLVM) this optimization is named -loop-unswitch and it's enabled by default at -O3 and higher.

--------------------------

Your C++ code cleaned up a bit:

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    double test(bool b) {
        double d = 0.0;
        double u = 0.0;
        for (int n = 0; n < 1000000000; n++) {
            d += u;
            if (b)
                u = sin((double)n);
        }
        return d;
    }

    int main() {
        bool b = (bool)atoi("1");
        printf("%f\n", test(b));
    }

The asm generated for just the test() function:

g++ -O3 -S

    __Z4testb:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        subl    $36, %esp
        cmpb    $0, 8(%ebp)
        jne     L2
        fldz
        movl    $1000000000, %eax
        fld     %st(0)
        .p2align 4,,7
    L3:
        subl    $1, %eax
        fadd    %st(1), %st
        jne     L3
        fstp    %st(1)
        addl    $36, %esp
        popl    %ebx
        popl    %ebp
        ret
        .p2align 4,,7
    L2:
        fldz
        xorl    %ebx, %ebx
        fld     %st(0)
        jmp     L5
        .p2align 4,,7
    L9:
        fxch    %st(1)
    L5:
        faddp   %st, %st(1)
        movl    %ebx, -12(%ebp)
        addl    $1, %ebx
        fildl   -12(%ebp)
        fstpl   (%esp)
        fstpl   -24(%ebp)
        call    _sin
        cmpl    $1000000000, %ebx
        fldl    -24(%ebp)
        jne     L9
        fstp    %st(1)
        addl    $36, %esp
        popl    %ebx
        popl    %ebp
        ret

-------------------

More aggressive compilation:

g++ -O3 -s -fomit-frame-pointer -msse3 -march=native -ffast-math -S

    __Z4testb:
        subl    $4, %esp
        cmpb    $0, 8(%esp)
        jne     L2
        movl    $1000000000, %eax
        .p2align 4,,10
    L3:
        decl    %eax
        jne     L3
        fldz
        addl    $4, %esp
        ret
        .p2align 4,,10
    L2:
        fldz
        xorl    %eax, %eax
        fld     %st(0)
        .p2align 4,,10
    L5:
        movl    %eax, (%esp)
        faddp   %st, %st(1)
        incl    %eax
        fildl   (%esp)
        cmpl    $1000000000, %eax
        fsin
        jne     L5
        fstp    %st(0)
        addl    $4, %esp
        ret

--------------------------

This is a D1 translation:

    import tango.math.Math: sin;
    import tango.stdc.stdio: printf;
    import tango.stdc.stdlib: atoi;

    double test(bool b) {
        double d = 0.0;
        double u = 0.0;
        for (int n; n < 1_000_000_000; n++) {
            d += u;
            if (b)
                u = sin(cast(double)n);
        }
        return d;
    }

    void main() {
        bool b = cast(bool)atoi("1");
        printf("%f\n", test(b));
    }

Compiled with:

ldc -O3 -release -inline test.d

Asm produced, note the je .LBB1_4 near the top:

    _D5test54testFbZd:
        pushl   %esi
        subl    $64, %esp
        testb   $1, %al
        je      .LBB1_4
        pxor    %xmm0, %xmm0
        movsd   %xmm0, 32(%esp)
        movl    $1000000000, %esi
        movsd   %xmm0, 24(%esp)
        movsd   %xmm0, 16(%esp)
        .align 16
    .LBB1_2:
        movsd   32(%esp), %xmm0
        movsd   %xmm0, 56(%esp)
        fldl    56(%esp)
        fstpt   (%esp)
        call    sinl
        fstpl   48(%esp)
        movsd   24(%esp), %xmm1
        addsd   16(%esp), %xmm1
        movsd   %xmm1, 24(%esp)
        decl    %esi
        movsd   32(%esp), %xmm0
        addsd   .LCPI1_0, %xmm0
        movsd   %xmm0, 32(%esp)
        movsd   48(%esp), %xmm0
        movsd   %xmm0, 16(%esp)
        ##FP_REG_KILL
        jne     .LBB1_2
    .LBB1_3:
        movsd   24(%esp), %xmm0
        movsd   %xmm0, 40(%esp)
        fldl    40(%esp)
        addl    $64, %esp
        popl    %esi
        ret
    .LBB1_4:
        movl    $1000000000, %eax
        .align 16
    .LBB1_5:
        decl    %eax
        jne     .LBB1_5
        pxor    %xmm0, %xmm0
        movsd   %xmm0, 24(%esp)
        jmp     .LBB1_3

This runs in about 86 seconds.

--------------------------

Aggressive compilation with LDC:

ldc -O3 -release -inline -enable-unsafe-fp-math -unroll-allow-partial test.d

    _D5test54testFbZd:
        subl    $92, %esp
        testb   $1, %al
        je      .LBB1_4
        pxor    %xmm0, %xmm0
        xorl    %eax, %eax
        movapd  %xmm0, %xmm1
        movapd  %xmm0, %xmm2
        .align 16
    .LBB1_2:
        leal    1(%eax), %ecx
        cvtsi2sd %ecx, %xmm3
        movsd   %xmm3, 40(%esp)
        leal    2(%eax), %ecx
        cvtsi2sd %ecx, %xmm3
        movsd   %xmm3, 48(%esp)
        leal    3(%eax), %ecx
        cvtsi2sd %ecx, %xmm3
        movsd   %xmm3, 56(%esp)
        leal    4(%eax), %ecx
        cvtsi2sd %ecx, %xmm3
        movsd   %xmm3, 64(%esp)
        movsd   %xmm0, 80(%esp)
        fldl    80(%esp)
        fsin
        fstpl   72(%esp)
        fldl    40(%esp)
        fsin
        fstpl   8(%esp)
        fldl    48(%esp)
        fsin
        fstpl   16(%esp)
        fldl    56(%esp)
        fsin
        fstpl   24(%esp)
        fldl    64(%esp)
        fsin
        fstpl   32(%esp)
        addsd   %xmm1, %xmm2
        addsd   72(%esp), %xmm2
        addsd   8(%esp), %xmm2
        addsd   16(%esp), %xmm2
        movapd  %xmm2, %xmm1
        addsd   24(%esp), %xmm1
        addl    $5, %eax
        cmpl    $1000000000, %eax
        addsd   .LCPI1_0, %xmm0
        movsd   32(%esp), %xmm2
        ##FP_REG_KILL
        jne     .LBB1_2
    .LBB1_3:
        movsd   %xmm1, (%esp)
        fldl    (%esp)
        addl    $92, %esp
        ret
    .LBB1_4:
        xorl    %eax, %eax
        .align 16
    .LBB1_5:
        addl    $10, %eax
        cmpl    $1000000000, %eax
        jne     .LBB1_5
        pxor    %xmm1, %xmm1
        jmp     .LBB1_3

This runs in about 58 seconds. Note also it's partially unrolled 4 times.

Here both G++ and LDC are performing loop unswitching.

Bye, bearophile
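For readers who prefer source code to asm, this is roughly what loop unswitching does to the benchmark above, written out by hand in C++. It is a sketch of the transformation, not the output of any compiler: the names test_switched and test_unswitched and the count parameter (in place of the fixed 1000000000) are introduced here only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Original shape: the loop-invariant condition `b` is tested on every
// iteration.
double test_switched(bool b, int count) {
    assert(count >= 0);
    double d = 0.0;
    double u = 0.0;
    for (int n = 0; n < count; n++) {
        d += u;
        if (b)
            u = std::sin((double)n);
    }
    return d;
}

// Unswitched by hand: the test on `b` is hoisted out of the loop and the
// loop body is duplicated, one copy per branch.
double test_unswitched(bool b, int count) {
    assert(count >= 0);
    double d = 0.0;
    double u = 0.0;
    if (b) {
        for (int n = 0; n < count; n++) {
            d += u;
            u = std::sin((double)n);
        }
    } else {
        for (int n = 0; n < count; n++)
            d += u;  // u never changes here, it stays 0.0
    }
    return d;
}
```

The two copies of the loop correspond to the two separate loop bodies visible in each asm listing above: one that computes the sine and one that only counts.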
May 18 2010
prev sibling next sibling parent retard <re tard.com.invalid> writes:
Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:

 %u wrote:
 The DMC++ compiler you mentioned sounds interesting too. I'd like to
 compare performance with that, the VC++ one, and the Intel compiler.

 When comparing D performance with C++, it is best to compare compilers with the same back end, i.e.:

 dmd with dmc
 gcc with gdc
 lcc with ldc

 This is because back ends can vary greatly in the code generated.

What if I'm using a clean room implementation of D with a custom backend and no accompanying C compiler, am I not allowed to compare the performance with anything? When people compare C compilers, they usually use the latest Visual Studio, gcc, icc, and llvm versions -- i.e. C compilers from various vendors. Using the same logic one is not allowed to compare dmc against those since it would always lose.
May 18 2010
prev sibling next sibling parent retard <re tard.com.invalid> writes:
Tue, 18 May 2010 15:03:43 -0700, Walter Bright wrote:

 retard wrote:
 Tue, 18 May 2010 12:13:02 -0700, Walter Bright wrote:
 
 %u wrote:
 The DMC++ compiler you mentioned sounds interesting too. I'd like to
 compare performance with that, the VC++ one, and the Intel compiler.

 When comparing D performance with C++, it is best to compare compilers with the same back end, i.e.:

 dmd with dmc
 gcc with gdc
 lcc with ldc

 This is because back ends can vary greatly in the code generated.

What if I'm using a clean room implementation of D with a custom backend and no accompanying C compiler, am I not allowed to compare the performance with anything?

You're allowed to do whatever you want. I'm pointing out that the difference in code generator ability should not be misconstrued as a difference in the languages.

It's a rookie mistake to believe that languages have some kind of differences performance wise. That kind of comparison was likely useful in the 80s when languages and instruction sets had a greater resemblance (they were all low level languages). But as you can see from bearophile's link ( http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html ), there is a larger performance gap between a naive and a highly tuned implementation of the same language than between decent implementations of different modern languages.

Why developers want to compare dmd with g++ is just because they're not interested in D or D's code generator per se. They have a task to solve and they want the fastest production ready (stable enough to compile their solution) toolchain for the problem - NOW. There is no loyalty left. Most mainstream languages contain the same imperative / object oriented hybrid core with small functional extensions (closures/lambdas). You only need to choose the best for this particular task. Usually there's only a limited amount of time left so you may need to guess. You just have to evaluate partial information snippets, for instance that dmd sucks at inlining closures and Java doesn't do tail call optimization.

Ideally a casual developer studies the language grammar for a few hours and then starts writing code. If the language turns out to be bad, he just moves on and forgets it unless the toolchain improves later and there will be a reddit post about it. That's how I met Perl. With years of Pascal/C/C++/Java experience under my belt, I learned that Perl might be a perfect tool for extending apache with our plugin. A few hours of studying (the language) + quite a bit more (the API docs) and there I was writing Perl - probably really buggy code, but code nonetheless.

There are even languages that consist of visual graphs (the "editor" is just a CAD-like GUI) or sentences written in normal English - they don't have any kind of link between the target machine and the solution other than the abstract computational model. If you encounter a statement such as:

find_longest_common_substring(string1, string2);

you cannot know how fast it is. This kind of code is getting more popular and it's called declarative - it doesn't tell how it solves its problem, it just tells what it does. It's also the abstraction level that most developers are (should be) using. You may ask whether that statement is faster in C than in Python. The Python coder could just use the one written in C and invoke it via a foreign function interface. The FFI might add a few cycles worth of overhead, but overall the algorithm is the same.
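As an illustration of how much can hide behind such a one-line call, here is one possible C++ implementation of find_longest_common_substring, using the textbook dynamic programming approach. This is only a sketch: the post names the function but says nothing about how it is implemented, so the algorithm, the signature and the tie-breaking rule are all assumptions made for this example. This version takes time proportional to the product of the two string lengths; a suffix-tree based version behind the same call would scale very differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the longest string that occurs contiguously in both a and b.
// On ties, the match found first while scanning a is returned.
std::string find_longest_common_substring(const std::string& a,
                                          const std::string& b) {
    // prev[j] is the length of the longest common suffix of the first
    // i-1 characters of a and the first j characters of b; cur[j] is the
    // same for the first i characters of a.
    std::vector<std::size_t> prev(b.size() + 1, 0);
    std::vector<std::size_t> cur(b.size() + 1, 0);
    std::size_t best_len = 0;  // length of the best match so far
    std::size_t best_end = 0;  // one past its last character in a

    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1]) {
                cur[j] = prev[j - 1] + 1;
                if (cur[j] > best_len) {
                    best_len = cur[j];
                    best_end = i;
                }
            } else {
                cur[j] = 0;
            }
        }
        std::swap(prev, cur);  // the finished row becomes the previous row
    }
    assert(best_end >= best_len);
    return a.substr(best_end - best_len, best_len);
}
```

The caller sees the same statement whichever algorithm is used, which is the point being made above: the call site says what is computed, not what it costs.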
May 18 2010
prev sibling parent retard <re tard.com.invalid> writes:
Thu, 20 May 2010 10:06:17 -0700, Walter Bright wrote:

 retard wrote:
 It's a rookie mistake to believe that languages have some kind of
 differences performance wise.

 Well, they do. It's also true that these performance differences can be swamped by the quality of the implementation, and the ability of the programmer. But that doesn't mean there are not inherent performance differences due to the semantics the language requires.

 It's like car racing. The performance is a combination of 3 factors:

 1. the 'formula' for the particular class you're racing in
 2. the quality of the construction of the car to that formula
 3. the ability of the driver

 It's simply wrong to measure the performance and then naively attribute it to one of those three, pretending the other two are constant.

Of course. The language/implementation comparisons are all faulty. You also need to model the performance of the programmer by building some kind of developer skill profiles and measure how the languages & implementations compete against each other in all these skill classes. For example the language shootout site favors experienced programmers; bad programmers generate code with 2-3 orders of magnitude worse performance.
May 21 2010
prev sibling next sibling parent reply Robert Clipsham <robert octarineparrot.com> writes:
On 16/05/10 15:27, Dan W wrote:
 Hi all, I'm toying around with the idea of porting my raytracer codebase to D.
 But before committing, I have a few rookie questions:

 1: What kind of license is the D compiler under? I'm thinking of shipping a
 commercial, closed-source (for now) program with the D compiler (so that users
 can compile within the GUI). Is this possible to do, or can I at least pay for the
 privilege?

dmd is under 2 (3) licenses, one for the front end and one for the backend. I won't go into details, you can find the details in the archives though. Long story short, if you want to redistribute dmd you have to ask Walter for the privilege. LDC and GDC have no such restrictions, you can include them as long as you don't modify the source, and if you do then you distribute the source as well as the binaries.
 2: Is it possible to use D with the Visual C++ IDE? Preferably, I would like
 the appropriate compiler and D options listed in the options (in place of the
 usual C/C++ options).

Try VisualD, which was released about a month ago. I haven't tried it yet, I believe it still has some way to go... This said its current feature list looks impressive. http://dsource.org/projects/visuald/
 3: I need my program to be as fast as possible. The Visual C++ compiler has
 features such as "link-time code generation" and "Profile guided optimization".
 Does D have equivalents?

If you want LTO you'll need to use LDC with some fancy compilation steps (I believe bearophile, our resident benchmarker, should be able to provide you with these). The downside to LDC is that it does not support exceptions on Windows (it will support them as soon as LLVM does).
 4: Does D play nicely with QT, SDL, Lua?

See:
http://dsource.org/projects/qtd/ - Qt bindings
http://dsource.org/projects/luad/ - Lua bindings
http://dsource.org/projects/derelict/ - Various bindings for multimedia/game apps including SDL, OpenGL, OpenAL etc
 5: How about compatibility with GPGPU stuff like CUDA and OpenCL? Can I just as
 easily write GPGPU programs which run as fast as I can with C/C++?

I don't know what the status of this is, I think a couple of people have written some initial bindings for either CUDA or OpenCL, perhaps someone else can enlighten you as to their status. As for their speed, it will be just as fast as the equivalent code in C/C++.

I hope things go well for you. There are a lot of initial hurdles for getting into D, but once you find your way around them you'll learn to love this great language! There are lots of people that have written ray tracers in D, so should you need assistance there are people who can help.
May 16 2010
parent reply "Jérôme M. Berger" <jeberger free.fr> writes:
Robert Clipsham wrote:
 LDC and GDC have no such
 restrictions, you can include them as long as you don't modify the
 source, and if you do then you distribute the source as well as the
 binaries.

the source even if you didn't modify it. Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
May 16 2010
parent "Jérôme M. Berger" <jeberger free.fr> writes:
Leandro Lucarella wrote:
 "Jérôme M. Berger", on 16 May at 22:50 you wrote to me:
 Robert Clipsham wrote:
 LDC and GDC have no such
 restrictions, you can include them as long as you don't modify the
 source, and if you do then you distribute the source as well as the
 binaries.

the source even if you didn't modify it.

 The source must be available. You usually don't distribute the source if
 you didn't modify the program because anyone can find it in the original
 place. But when you do modify it, you must provide a way to access the
 source.

==============================8<------------------------------
3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)
------------------------------>8==============================

Since the OP was talking about a commercial distribution, point (c) does not apply and therefore, source must be redistributed. Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
May 17 2010
prev sibling next sibling parent reply Alex Makhotin <alex bitprox.com> writes:
Dan W wrote:
 Hi all, I'm toying around with the idea of porting my raytracer codebase to D.

Hi, May I ask you why are you planning to port an existing codebase to D? What kind of benefits specifically(except comparable to C performance) you expect from D? Thank you. -- Alex Makhotin, the founder of BITPROX, http://bitprox.com
May 16 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Alex Makhotin:

 May I ask you why are you planning to port an existing codebase to D?
 What kind of benefits specifically(except comparable to C performance) 
 you expect from D?

At the moment performance (if compared to C++ code compiled with GCC or ICC) is not a selling point of D. But D can be advertised for its other qualities: compared to C or C++ it's very nice to write D code, it's more handy, and a little safer. This can be enough to justify a switch from C++ to D :-) A problem with such an advertising strategy is that a lot of people I know don't seem to look for a better C++, it seems they want to keep themselves away from anything that smells a bit of C++ :-( Bye, bearophile
May 16 2010
prev sibling next sibling parent BCS <none anon.com> writes:
Hello Dan,

 Hi all, I'm toying around with the idea of porting my raytracer
 codebase to D. But before committing, I have a few rookie questions:
 
 1: What kind of license is the D compiler under? I'm thinking of
 shipping a commercial, close sourced (for now) program with the D
 compiler (so that users can compile within the GUI). Is this possible
 to do, or can I least pay for the priviledge?

The front end is under an Open Source (R) license. The back end is open source, but only in the sense that you can see the source. Several projects combine the front end with a FOSS back end that can be shipped, but you can't ship copies of the official exe without Walter's OK; he's been known to give it at no cost if you ask really nicely.
 2: Is it possible to use D with the Visual C++ IDE? Preferably, I
 would like the apprepriate compiler and D options listed in the
 options (in place of the usual c/c++ options).

There is a D plugin that recently got posted that allows that. I've never got it working but I think that's me FUBARing VS.
 3: I need my program to be as fast as possible. The Visual C++
 compiler has features such as "link-time code generation" and "Profile
 guided optimization". Does D have equivalents?

For link time code generation: you might get the same effect via templates (they are way easier under D than C++). As for the other, I think DMD can do some of that but I don't remember the details.
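To illustrate the template suggestion above: a D template's body has to be visible wherever it is instantiated, so the compiler can specialize it, and possibly inline it, without help from the linker. A minimal sketch, where `dot` is a hypothetical helper (not from any library) and whether a given call is actually inlined depends on the compiler and its flags:

```d
/// Dot product of two fixed-length vectors (hypothetical helper).
/// T and N are deduced at the call site, so each instantiation is
/// compiled with the full function body in view.
/// Static arrays are value types, so the arguments are copied;
/// for small vectors that is cheap.
T dot(T, size_t N)(T[N] a, T[N] b)
{
    T sum = 0;
    foreach (i; 0 .. N)
        sum += a[i] * b[i];
    return sum;
}

unittest
{
    float[3] a = [1, 2, 3];
    float[3] b = [4, 5, 6];
    assert(dot(a, b) == 32); // 1*4 + 2*5 + 3*6
}
```

Building with `dmd -unittest -main` runs the check. This does not replace profile guided optimization; it only shows why generic code in D does not need the linker to see across module boundaries.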
 5: How about compatibility with GPGPU stuff like CUDA and OpenCL?

I remember seeing some work in that direction about 2-3 years ago. If you can get a C API to that stuff, you can do it in D. There might be a wrapper somewhere that gives an API that's cleaner to use from D.
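As a rough sketch of what "get a C API and use it from D" looks like: a C function is declared with `extern (C)` and then called like any D function. The example binds to the C standard library's `strlen` rather than to CUDA or OpenCL, since no specific binding is named in this thread; `cLength` is a hypothetical wrapper name.

```d
import std.string : toStringz;

// Hand-written declaration of a C function; the linker resolves it
// against the C runtime that D programs already link with.
extern (C) size_t strlen(const(char)* s);

/// Length in bytes of a D string, as measured by C's strlen
/// (hypothetical wrapper, for illustration only).
size_t cLength(string s)
{
    // D strings are not guaranteed to be zero-terminated,
    // so convert before handing the pointer to C.
    return strlen(s.toStringz);
}

unittest
{
    assert(cLength("raytracer") == 9);
}
```

A binding to a GPU API would follow the same pattern: declare the C entry points, then optionally wrap them in functions that take D arrays and strings.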
 Can I just as easily write GPGPU programs which run as fast as I can with 

Assuming a reasonable API, you should be able to whip out D code to interact with CUDA/OpenCL at least as fast as you can write the same in C/C++. -- ... <IXOYE><
May 16 2010
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Dan W.:

 3: I need my program to be as fast as possible. The Visual C++ compiler has
 features such as "link-time code generation"

This page explains this topic: http://msdn.microsoft.com/en-us/magazine/cc301698.aspx Bye, bearophile
May 16 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 3: I need my program to be as fast as possible. The Visual C++ compiler has
 features such as "link-time code generation"

This page explains this topic: http://msdn.microsoft.com/en-us/magazine/cc301698.aspx

What's actually happening is interprocedural analysis, and inlining across source modules. In C++ this needs to happen at link time because the C++ compilation model is that each source file is compiled completely independently of the other source files. This is not true of D. In D, the compiler can (depending on how it is invoked and how the programmer sets up the source modules) look at all the source to the program. Hence, a lot of inlining can (and does) happen across modules without needing any support from the linker.
May 16 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

This is not true of D. In D, the compiler can<

Thank you for your answers. At the moment D compilers aren't doing this, I think (LDC performs an optimization at link time). But it's nice that D leaves this optimization opportunity to future D compilers.

----------------------

As you may have noticed, that HTML page lists three optimizations. The first one is the one you have explained. The second optimization it talks about is custom calling conventions:
Normally, all functions are either cdecl, stdcall, or fastcall. With custom
calling conventions, the back end has enough knowledge that it can pass more
values in registers, and less on the stack. This usually cuts code size and
improves performance.<

I have translated his demo code to D:

    int foo(int i, int* j, int* k, int l) {
        *j = *k;
        *k = i + l;
        return i + *j + *k + l;
    }

    int main(char[][] args) {
        int i, j, k, l;
        l = i = args.length;
        int x = foo(i, &j, &k, l);
        return x * args.length;
    }

This is how dmd compiles foo() (-O -release):

    _D7stdcall3fooFiPiPiiZi comdat
        push EAX
        mov  ECX,8[ESP]
        mov  EDX,[ECX]
        push EBX
        mov  EBX,010h[ESP]
        push ESI
        mov  ESI,018h[ESP]
        push EDI
        lea  EDI,[EAX][ESI]
        mov  [EBX],EDX
        mov  [ECX],EDI
        mov  EAX,[EBX]
        add  EAX,ESI
        add  EAX,EDI
        add  EAX,0Ch[ESP]
        pop  EDI
        pop  ESI
        pop  EBX
        pop  ECX
        ret  0Ch

This is how LDC compiles foo() with -O3 -release:

    _D4test3fooFiPiPiiZi:
        pushl %esi
        movl  8(%esp), %ecx
        movl  (%ecx), %edx
        movl  12(%esp), %esi
        movl  %edx, (%esi)
        addl  16(%esp), %eax
        movl  %eax, (%ecx)
        addl  %eax, %eax
        addl  (%esi), %eax
        popl  %esi
        ret   $12

This is the asm of foo() shown in that article:

    _foo:
        mov ecx,dword ptr [eax]
        mov dword ptr [esi],ecx
        lea ecx,[edi+edx]
        mov dword ptr [eax],ecx
        mov eax,dword ptr [esi]   // *j
        add eax,ecx               // *k sub-expression (from
        add eax,edi               // l
        add eax,edx               // i
        ret

It seems LDC isn't performing this optimization.

----------------------

The third optimization it talks about is 'Small TLS Encoding':
When you use __declspec(thread) variables, the code generator stores the
variables at a fixed offset in each per-thread data area. Without LTCG, the
code generator has no idea of how many __declspec(thread) variables there will
be. As such, it must generate code that assumes the worst, and uses a four-byte
offset to access the variable. With LTCG, the code generator has the
opportunity to examine all __declspec(thread) variables, and note how often
they're used. The code generator can put the smaller, more frequently used
variables at the beginning of the per-thread data area and use a one-byte
offset to access them.<

This is the C++ example code he uses:

    __declspec(thread) int i = 1;

    int main() {
        i = 4;
        return i;
    }

The asm he shows without this optimization:

    _main:
        mov  eax,dword ptr [__tls_index]
        mov  ecx,dword ptr fs:[2Ch]
        mov  ecx,dword ptr [ecx+eax*4]
        push 4
        pop  eax
        mov  dword ptr [ecx+4],eax
        ret

The asm he shows with this optimization:

    _main:
        mov eax,dword ptr fs:[0000002Ch]
        mov ecx,dword ptr [eax]
        mov eax,4
        mov dword ptr [ecx+8],eax
        ret

I have translated that last C++ example into this D code:

    int i = 1;

    int main() {
        i = 4;
        return i;
    }

I think I can't test this with LDC because it doesn't have TLS/__gshared. dmd compiles it to:

    __Dmain
        mov ECX,FS:__tls_array
        mov EDX,[ECX]
        mov EAX,4
        mov _D4test1ii[EDX],EAX
        ret

On this little example dmd seems to produce similar asm.

Bye, bearophile
May 16 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 Walter Bright:
 
 This is not true of D. In D, the compiler can<

Thank you for your answers. At the moment D compilers aren't doing this,

Yes, they are. dmd definitely inlines across source modules.
 The second optmizations it talks about is custom calling conventions:
 
 Normally, all functions are either cdecl, stdcall, or fastcall. With custom
 calling conventions, the back end has enough knowledge that it can pass
 more values in registers, and less on the stack. This usually cuts code
 size and improves performance.<


Right, dmd doesn't do custom calling conventions. But, it is not necessary for D to have the linker do them. As I explained, the compiler has as much source available to it as the user wishes to supply.
 The third optimizations it talks about is 'Small TLS Encoding':
 
 When you use __declspec(thread) variables, the code generator stores the
 variables at a fixed offset in each per-thread data area. Without LTCG, the
 code generator has no idea of how many __declspec(thread) variables there
 will be. As such, it must generate code that assumes the worst, and uses a
 four-byte offset to access the variable. With LTCG, the code generator has
 the opportunity to examine all __declspec(thread) variables, and note how
 often they're used. The code generator can put the smaller, more frequently
 used variables at the beginning of the per-thread data area and use a
 one-byte offset to access them.<


Yes, but you won't find this to be a speed improvement. The various addressing modes all run at the same speed. Furthermore, the use of global variables (and that includes TLS) should be minimized. Use of TLS (or any globals) in a tight loop should be avoided on general principles in favor of caching the value in a local. I don't believe this optimization is worth the effort. Many compilers spend a lot of time trying to optimize access to statics and globals. This ain't low hanging fruit for any but badly written programs.
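The advice about caching a thread-local value in a local can be sketched as follows. This assumes D2, where module-level variables are thread-local by default; `scale`, `sumScaledDirect` and `sumScaledCached` are hypothetical names, and an optimizing compiler may well hoist the read on its own.

```d
// Module-level variable: thread-local by default in D2.
int scale = 3;

/// Reads the thread-local `scale` on every iteration.
long sumScaledDirect(const(int)[] values)
{
    long total = 0;
    foreach (v; values)
        total += cast(long) v * scale;
    return total;
}

/// Copies `scale` into a local once, so the loop body
/// only touches locals.
long sumScaledCached(const(int)[] values)
{
    immutable localScale = scale; // single TLS access, outside the loop
    long total = 0;
    foreach (v; values)
        total += cast(long) v * localScale;
    return total;
}

unittest
{
    int[] data = [1, 2, 3, 4];
    assert(sumScaledDirect(data) == 30);
    assert(sumScaledCached(data) == 30);
}
```

Both functions return the same result; the second merely makes the hoisting explicit instead of relying on the optimizer.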
May 16 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 Right, dmd doesn't do custom calling conventions. But, it is not necessary for
D 
 to have the linker do them. As I explained, the compiler has as much source 
 available to it as the user wishes to supply.

I'll talk about this a bit with LLVM devs. Thank you for all your explanations, you often teach me things. Bye, bearophile
May 16 2010
prev sibling next sibling parent "Nick Sabalausky" <a a.a> writes:
"Dan W" <twinbee42 skytopia.com> wrote in message 
news:hsovdd$1s1j$1 digitalmars.com...
 2: Is it possible to use D with the Visual C++ IDE? Preferably, I would 
 like
 the apprepriate compiler and D options listed in the options (in place of 
 the
 usual c/c++ options).

Other people mentioned the recent D plugin for Visual Studio. If that isn't mature enough for you, there's a very mature plugin for Eclipse called Descent: http://www.dsource.org/projects/descent
 3: I need my program to be as fast as possible.

Optimization often seems to be a mixed bag across any two modern languages. On one hand, there are some cases where D can be a little slower than average. For instance, I've heard that the GC isn't great at handling lots of small objects. Bearophile can probably tell you a lot about any slow spots of D, he's done a lot of testing in that area. On the other hand, there's plenty that D is fast with. Other people have mentioned a lot about this already. But I'll also add that the design of D has a few things that can allow certain things to be done in a more efficient way than can easily be done in C/C++. Array slicing (combined with GC), for example, has been shown to go a long way in helping to make a ridiculously fast (and memory-efficient) XML parser with less effort than it would take in C/C++: http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-parsequerymutateserialize/ http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/ http://dotnot.org/blog/archives/2008/03/12/why-is-dtango-so-fast-at-parsing-xml/
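To make the array slicing point concrete, here is a minimal sketch of a slice used as a zero-copy view into a larger buffer. It is not taken from the Tango XML parser mentioned above; `firstTagName` is a hypothetical function written only to show the idea.

```d
/// Returns the text between the first '<' and the following '>'
/// as a slice of `input`. No characters are copied.
const(char)[] firstTagName(const(char)[] input)
{
    size_t start = 0;
    while (start < input.length && input[start] != '<')
        ++start;
    if (start == input.length)
        return null; // no tag found

    size_t end = start + 1;
    while (end < input.length && input[end] != '>')
        ++end;
    return input[start + 1 .. end];
}

unittest
{
    string doc = "<scene><sphere/></scene>";
    auto name = firstTagName(doc);
    assert(name == "scene");
    // The slice aliases the original buffer rather than owning a copy.
    assert(name.ptr == doc.ptr + 1);
}
```

Because the GC keeps the underlying buffer alive for as long as any slice refers to it, a parser can hand out such slices freely without tracking ownership.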
May 16 2010
prev sibling next sibling parent Leandro Lucarella <llucax gmail.com> writes:
"Jérôme M. Berger", el 16 de mayo a las 22:50 me escribiste:
 Robert Clipsham wrote:
 LDC and GDC have no such
 restrictions, you can include them as long as you don't modify the
 source, and if you do then you distribute the source as well as the
 binaries.
 

the source even if you didn't modify it.

The source must be available. You usually don't distribute the source if you didn't modify the program because anyone can find it in the original place. But when you do modify it, you must provide a way to access the source.

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
More than 50% of the people in the world have never made
Or received a telephone call
May 16 2010
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 16 May 2010 10:27:57 -0400, Dan W <twinbee42 skytopia.com> wrote:
 5: How about compatibility with GPGPU stuff like CUDA and OpenCL? Can I  
 just as
 easily write GPGPU programs which run as fast as I can with C/C++?

I have some decent CUDA bindings with a nice high level API that I'd be willing to share/open source. But you still have to write the actual GPU kernels in C/C++.
May 16 2010