digitalmars.D - Increasing speed of D applications to Intel C compiled applications' standards
- Teoman Soygul <tsoygul tralsoft.com> Nov 10 2006
- Walter Bright <newshound digitalmars.com> Nov 10 2006
- Bill Baxter <wbaxter gmail.com> Nov 10 2006
- %u <tsoygul tralsoft.com> Nov 10 2006
- Benji Smith <dlanguage benjismith.net> Nov 10 2006
- Sean Kelly <sean f4.ca> Nov 10 2006
- J Duncan <jtd514 nospam.ameritech.net> Nov 10 2006
- Dave <Dave_member pathlink.com> Nov 10 2006
- Sean Kelly <sean f4.ca> Nov 10 2006
- Dave <Dave_member pathlink.com> Nov 10 2006
- Dave <Dave_member pathlink.com> Nov 11 2006
- Bill Baxter <wbaxter gmail.com> Nov 10 2006
- Teoman Soygul <tsoygul tralsoft.com> Nov 10 2006
- Sean Kelly <sean f4.ca> Nov 10 2006
My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what? Any information or ideas would be appreciated. Before starting a big project, speed issues with D seem to be the biggest obstacle to overcome. Teoman Soygul, Alsoft.
Nov 10 2006
Teoman Soygul wrote:My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what? Any information or ideas would be appreciated. Before starting a big project, speed issues with D seem to be the biggest obstacle to overcome.
If it's identical code, then it's a compiler optimization issue, as identical C code compiled with D should produce identical results. You can try using the D profiler to see if the slowdown is in any specific place. 5% to 20% are generally not big obstacles to overcome, as they are often in one or two spots that can be hand optimized (or written in assembler). It's also my experience that until it reaches 2:1, few people even notice it. Even the same program will vary 5 to 10% in execution time from run to run.
Nov 10 2006
Teoman Soygul wrote:My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what? Any information or ideas would be appreciated. Before starting a big project, speed issues with D seem to be the biggest obstacle to overcome. Teoman Soygul, Alsoft.
I think even Microsoft's compiler is 5-20% slower than Intel's compiler, depending on the particular code in question (particularly SSE-optimizable things). So I don't think this speed difference is something to be so worried about. Still, there's a big difference between 5% and 20%. Do you have any observations about what sorts of things put you in the 20% category? --bb
Nov 10 2006
"Do you have any observations about what sorts of things put you in the 20% category?" - Luckily, for code requiring sheer processing power, like math functions (trig, logs...), b-tree creation, and compression, D code runs only 5-6% slower (on average) than code from the Intel C compiler. - Unfortunately, operations requiring sheer memory access, like memory copy, memory allocation and deallocation, and stream copy, are 15% to 20% slower in D (note that the code is totally identical). I'll post these results on my blog once I put them into a good graphed format so we can discuss them further. With my limited knowledge of compilers, what I've seen is that the Intel C compiler has many ingenious optimisations. Maybe there is a way to put the same ideas into the D compiler (I hope).
Nov 10 2006
%u wrote:- Unfortunately, operations requiring sheer memory access, like memory copy, memory allocation and deallocation, and stream copy, are 15% to 20% slower in D (note that the code is totally identical).
Keep in mind that D's allocator includes a garbage collector. To really do an apples-to-apples comparison, your benchmarking code should create lots of temporary objects. In D, the GC will handle all allocations and de-allocations. In C++, your objects will have to be manually created and destroyed. I'd be more interested in those kinds of benchmarks than in the kinds of micro-benchmarks that just compare the speed of D allocation with the speed of C++ allocation. --benji
Nov 10 2006
%u wrote:"Do you have any observations about what sorts of things put you in the 20% category?" - Luckily, for code requiring sheer processing power, like math functions (trig, logs...), b-tree creation, and compression, D code runs only 5-6% slower (on average) than code from the Intel C compiler. - Unfortunately, operations requiring sheer memory access, like memory copy, memory allocation and deallocation, and stream copy, are 15% to 20% slower in D (note that the code is totally identical).
Some of this may be related to DMC's C library implementation vs. Intel's. Could you try the app using DMC and compare it to DMD? Sean
Nov 10 2006
%u wrote:"Do you have any observations about what sorts of things put you in the 20% category?" - Luckily, for code requiring sheer processing power, like math functions (trig, logs...), b-tree creation, and compression, D code runs only 5-6% slower (on average) than code from the Intel C compiler. - Unfortunately, operations requiring sheer memory access, like memory copy, memory allocation and deallocation, and stream copy, are 15% to 20% slower in D (note that the code is totally identical). I'll post these results on my blog once I put them into a good graphed format so we can discuss them further. With my limited knowledge of compilers, what I've seen is that the Intel C compiler has many ingenious optimisations. Maybe there is a way to put the same ideas into the D compiler (I hope).
So are we talking about a GC issue? I think it would be interesting to use D as a front end to C. Then basically D code could be run through the Intel optimizer.
Nov 10 2006
J Duncan wrote:%u wrote:"Do you have any observations about what sorts of things put you in the 20% category?" - Luckily, for code requiring sheer processing power, like math functions (trig, logs...), b-tree creation, and compression, D code runs only 5-6% slower (on average) than code from the Intel C compiler.
IMO, that's not real discouraging considering that Intel C/++/Fortran seems to be considered 'the best' for Intel platforms <g> FWIW - DMD (the compiler, not the language) often lags by a good margin in two major areas that I wish would be improved: floating point calculations and recursion. The D FP spec. doesn't have the same maximum precision prohibitions as the C/++ spec. so it can be more heavily optimized and still follow the spec. DMD doesn't currently take advantage of that though.
%u wrote:- Unfortunately, operations requiring sheer memory access, like memory copy, memory allocation and deallocation, and stream copy, are 15% to 20% slower in D (note that the code is totally identical).
D uses the DMC lib. for most of that stuff so the difference could be there. Could it be that Intel is doing whole program optimization to inline things like memcpy and memset, during linkage (Are you using WPO? -- it may be the default, I can't remember)? I've found that for time critical code I can code my own (e.g.: memset) in D so the compiler can inline it and it will be faster. Perhaps those should be in Phobos instead of the C lib.?
%u wrote:I'll post these results on my blog once I put them into a good graphed format so we can discuss them further.
I didn't see a url for your blog?
%u wrote:With my limited knowledge of compilers, what I've seen is that the Intel C compiler has many ingenious optimisations. Maybe there is a way to put the same ideas into the D compiler (I hope).
J Duncan wrote:So are we talking about a GC issue? I think it would be interesting to use D as a front end to C. Then basically D code could be run through the Intel optimizer.
Nov 10 2006
Dave wrote:Could it be that Intel is doing whole program optimization to inline things like memcpy and memset, during linkage (Are you using WPO? -- it may be the default, I can't remember)? I've found that for time critical code I can code my own (e.g.: memset) in D so the compiler can inline it and it will be faster. Perhaps those should be in Phobos instead of the C lib.?
I've been thinking about this as well. These functions are possibly a bit much for intrinsics, but it would be fairly trivial to write them in native D or even assembler--would have to inspect the resulting compiled code to see which was better. The only problem offhand is that DMD does not inline functions containing loops, nor does it inline functions containing ASM blocks, so we'd probably be stuck with a function call even with native D code. Sean
Nov 10 2006
Sean Kelly wrote:Dave wrote:Could it be that Intel is doing whole program optimization to inline things like memcpy and memset, during linkage (Are you using WPO? -- it may be the default, I can't remember)? I've found that for time critical code I can code my own (e.g.: memset) in D so the compiler can inline it and it will be faster. Perhaps those should be in Phobos instead of the C lib.?
I've been thinking about this as well. These functions are possibly a bit much for intrinsics, but it would be fairly trivial to write them in native D or even assembler--would have to inspect the resulting compiled code to see which was better. The only problem offhand is that DMD does not inline functions containing loops, nor does it inline functions containing ASM blocks, so we'd probably be stuck with a function call even with native D code. Sean
Good points - I'd forgotten about not inlining loops. The way I've "inlined" things like memset() is to just write a foreach if needed.
Nov 10 2006
Dave wrote:Sean Kelly wrote:Dave wrote:Could it be that Intel is doing whole program optimization to inline things like memcpy and memset, during linkage (Are you using WPO? -- it may be the default, I can't remember)? I've found that for time critical code I can code my own (e.g.: memset) in D so the compiler can inline it and it will be faster. Perhaps those should be in Phobos instead of the C lib.?
I've been thinking about this as well. These functions are possibly a bit much for intrinsics, but it would be fairly trivial to write them in native D or even assembler--would have to inspect the resulting compiled code to see which was better. The only problem offhand is that DMD does not inline functions containing loops, nor does it inline functions containing ASM blocks, so we'd probably be stuck with a function call even with native D code. Sean
Good points - I'd forgotten about not inlining loops. The way I've "inlined" things like memset() is to just write a foreach if needed.
BTW - Since performance has come up a lot lately: the reason one has to write the loop to get the most out of a simple memset-type operation is that things like arr[100..200] = 0; are replaced by a call to memset anyhow. This is something that the compiler really should treat like an intrinsic IMO.
Nov 11 2006
Teoman Soygul wrote:My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what? Any information or ideas would be appreciated. Before starting a big project, speed issues with D seem to be the biggest obstacle to overcome. Teoman Soygul, Alsoft.
One thing you might want to do is to also compare with C code compiled using the Digital Mars C compiler since dmd and dmc use the same back end. This should help clarify what's due to inherent differences in the languages and what's just differences in compiler optimization. --bb
Nov 10 2006
== Quote from Bill Baxter (wbaxter gmail.com)'s article
Teoman Soygul wrote:My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what? Any information or ideas would be appreciated. Before starting a big project, speed issues with D seem to be the biggest obstacle to overcome. Teoman Soygul, Alsoft.
One thing you might want to do is to also compare with C code compiled using the Digital Mars C compiler since dmd and dmc use the same back end. This should help clarify what's due to inherent differences in the languages and what's just differences in compiler optimization. --bb
Comparing Digital Mars C to Intel C really is a good idea, to see whether the performance difference comes from the languages (D vs. C) or from the compilers (the D compiler vs. the C compiler). I'll do that comparison as well and post it so we can evaluate.
Nov 10 2006
Teoman Soygul wrote:My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what?
If you can, try to profile the code and see where the difference lies. How long does the application run, is the code identical, etc? 5% for a very short-running app could be attributed to the garbage collector initialization and termination, for example, but this wouldn't be a factor in longer running apps. Sean
Nov 10 2006








