digitalmars.D - Increasing speed of D applications to Intel C compiled applications' standards
- Teoman Soygul (11/11) Nov 10 2006 My personal benchmarks prove that applications written in D are 5%
- Walter Bright (10/20) Nov 10 2006 If it's identical code, then it's a compiler optimization issue, as
- Bill Baxter (8/21) Nov 10 2006 I think even Microsoft's compiler is 5-20% slower than the Intel's
- %u (14/14) Nov 10 2006 "Do you have any observations about what sorts of things put you
- Benji Smith (10/13) Nov 10 2006 Keep in mind that D's allocator includes a garbage collector. To really
- Sean Kelly (4/14) Nov 10 2006 Some of this may be related to DMC's C library implementation vs.
- J Duncan (4/20) Nov 10 2006 so are we talking about a GC issue? I think it would be interesting to
- Dave (13/35) Nov 10 2006 IMO, that's not real discouraging considering that Intel C/++/Fortran se...
- Sean Kelly (9/16) Nov 10 2006 I've been thinking about this as well. These functions are possibly a
- Bill Baxter (6/19) Nov 10 2006 One thing you might want to do is to also compare with C code compiled
- Teoman Soygul (13/32) Nov 10 2006 with
- Sean Kelly (7/13) Nov 10 2006 If you can, try to profile the code and see where the difference lies.
My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what? Any information or ideas would be appreciated. Before starting a big project, speed issues with D seem to be the biggest obstacle to overcome.
Teoman Soygul, Alsoft.
Nov 10 2006
Teoman Soygul wrote:
> My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what? Any information or ideas would be appreciated. Before starting a big project, speed issues with D seem to be the biggest obstacle to overcome.
If it's identical code, then it's a compiler optimization issue, as identical C code compiled with D should produce identical results. You can try using the D profiler to see if the slowdown is in any specific place. Differences of 5% to 20% are generally not big obstacles to overcome, as they are often in one or two spots that can be hand optimized (or written in assembler). It's also my experience that until it reaches 2:1, few people even notice it. Even the same program will vary 5 to 10% in execution time from run to run.
Nov 10 2006
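On the run-to-run variation mentioned above: a minimal C sketch of one way to keep a 5 to 10% jitter from masquerading as a compiler difference, by timing the same workload several times and comparing medians. The helper names (median, time_once, median_time) are made up for illustration and are not from any benchmark in this thread; clock() measures CPU time and is only one of several possible timers.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Comparison callback for qsort: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts v in place and returns its median (hypothetical helper). */
double median(double *v, size_t n)
{
    assert(n > 0);
    qsort(v, n, sizeof v[0], cmp_double);
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

/* CPU seconds consumed by a single call of fn, measured with clock(). */
double time_once(void (*fn)(void))
{
    clock_t start = clock();
    fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Times fn `runs` times and returns the median of the samples. */
double median_time(void (*fn)(void), size_t runs)
{
    double *samples = malloc(runs * sizeof *samples);
    double result;
    size_t i;

    assert(runs > 0 && samples != NULL);
    for (i = 0; i < runs; i++)
        samples[i] = time_once(fn);
    result = median(samples, runs);
    free(samples);
    return result;
}
```

Calling median_time(workload, 11) on each build and comparing the two medians is less sensitive to a single slow run than comparing one timing from each.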
Teoman Soygul wrote:
> My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what? Any information or ideas would be appreciated. Before starting a big project, speed issues with D seem to be the biggest obstacle to overcome. Teoman Soygul, Alsoft.
I think even Microsoft's compiler is 5-20% slower than Intel's compiler, depending on the particular code in question (particularly SSE-optimizable things). So I don't think this speed difference is something to be so worried about. Still, there's a big difference between 5% and 20%. Do you have any observations about what sorts of things put you in the 20% category?
--bb
Nov 10 2006
"Do you have any observations about what shorts of things put you in the 20% category?" - Luckly the code requiring sheer processing power like math functions(trigs, logs...), b-tree creation, compression, D code runs only 5-6% slower (an averaged mean) compared to Intel C compiler. - Unfortunately, processes requiring sheer memory access like memcopy, mem alloc, de-alloc, stream copy is nearly almost 15->20% slower at D. (note that, code is totally identical). I'll post these results on my blog once I put them into a good graphed format so we can discuss it even further. with my limited knowledge on compilers, what i've seen is that intel c compiler has many ingenious optimisations. maybe there can be a way to put the same ideas into D compiler. (i hope)
Nov 10 2006
%u wrote:
> - Unfortunately, processes requiring sheer memory access, like memcopy, mem alloc, de-alloc, and stream copy, are nearly 15-20% slower in D. (Note that the code is totally identical.)
Keep in mind that D's allocator includes a garbage collector. To really do an apples-to-apples comparison, your benchmarking code should create lots of temporary objects. In D, the GC will handle all allocations and de-allocations. In C++, your objects will have to be manually created and destroyed. I'd be more interested in those kinds of benchmarks than in the kinds of micro-benchmarks that just compare the speed of D allocation with the speed of C++ allocation.
--benji
Nov 10 2006
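For the "lots of temporary objects" style of benchmark suggested above, here is a rough C sketch of what the manual-allocation side might look like. The function name churn and the checksum are assumptions made for illustration; the D counterpart would allocate with new and leave collection to the GC. An optimizing compiler may still simplify a loop like this, so the generated code is worth inspecting before trusting any timings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocates, writes to, and frees `count` short-lived blocks of `size` bytes.
   Returns a checksum read back from each block so the result depends on
   the memory that was actually allocated and written. */
unsigned long churn(size_t count, size_t size)
{
    unsigned long sum = 0;
    size_t i;

    assert(size > 0);
    for (i = 0; i < count; i++) {
        unsigned char *tmp = malloc(size);
        if (tmp == NULL)
            abort();                        /* out of memory: give up */
        memset(tmp, (int)(i & 0xFF), size); /* touch every byte */
        sum += tmp[size - 1];               /* read one byte back */
        free(tmp);                          /* manual de-allocation */
    }
    return sum;
}
```

Timing churn with a few different count and size values gives a manual-allocation baseline to set against the equivalent GC-managed loop.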
%u wrote:
> "Do you have any observations about what sorts of things put you in the 20% category?"
> - Luckily, for code requiring sheer processing power, like math functions (trig, logs...), b-tree creation, and compression, D code runs only 5-6% slower (on average) compared to the Intel C compiler.
> - Unfortunately, processes requiring sheer memory access, like memcopy, mem alloc, de-alloc, and stream copy, are nearly 15-20% slower in D. (Note that the code is totally identical.)
Some of this may be related to DMC's C library implementation vs. Intel's. Could you try the app using DMC and compare it to DMD?
Sean
Nov 10 2006
%u wrote:
> "Do you have any observations about what sorts of things put you in the 20% category?"
> - Luckily, for code requiring sheer processing power, like math functions (trig, logs...), b-tree creation, and compression, D code runs only 5-6% slower (on average) compared to the Intel C compiler.
> - Unfortunately, processes requiring sheer memory access, like memcopy, mem alloc, de-alloc, and stream copy, are nearly 15-20% slower in D. (Note that the code is totally identical.)
> I'll post these results on my blog once I put them into a good graphed format so we can discuss it even further. With my limited knowledge of compilers, what I've seen is that the Intel C compiler has many ingenious optimisations. Maybe there can be a way to put the same ideas into the D compiler. (I hope.)
So are we talking about a GC issue? I think it would be interesting to use D as a front end to C. Then basically D code could be run through the Intel optimizer.
Nov 10 2006
J Duncan wrote:
> %u wrote:
>> "Do you have any observations about what sorts of things put you in the 20% category?"
>> - Luckily, for code requiring sheer processing power, like math functions (trig, logs...), b-tree creation, and compression, D code runs only 5-6% slower (on average) compared to the Intel C compiler.
IMO, that's not real discouraging considering that Intel C/++/Fortran seems to be considered 'the best' for Intel platforms <g>
FWIW - DMD (the compiler, not the language) often lags by a good margin in two major areas that I wish would be improved: floating point calculations and recursion. The D FP spec. doesn't have the same maximum precision prohibitions as the C/++ spec., so it can be more heavily optimized and still follow the spec. DMD doesn't currently take advantage of that though.
>> - Unfortunately, processes requiring sheer memory access, like memcopy, mem alloc, de-alloc, and stream copy, are nearly 15-20% slower in D. (Note that the code is totally identical.)
D uses the DMC lib. for most of that stuff, so the difference could be there. Could it be that Intel is doing whole program optimization to inline things like memcpy and memset during linkage? (Are you using WPO? It may be the default, I can't remember.) I've found that for time-critical code I can code my own (e.g. memset) in D so the compiler can inline it, and it will be faster. Perhaps those should be in Phobos instead of the C lib.?
>> I'll post these results on my blog once I put them into a good graphed format so we can discuss it even further.
I didn't see a URL for your blog?
>> With my limited knowledge of compilers, what I've seen is that the Intel C compiler has many ingenious optimisations. Maybe there can be a way to put the same ideas into the D compiler. (I hope.)
> So are we talking about a GC issue? I think it would be interesting to use D as a front end to C. Then basically D code could be run through the Intel optimizer.
Nov 10 2006
Dave wrote:
> Could it be that Intel is doing whole program optimization to inline things like memcpy and memset during linkage? (Are you using WPO? It may be the default, I can't remember.) I've found that for time-critical code I can code my own (e.g. memset) in D so the compiler can inline it, and it will be faster. Perhaps those should be in Phobos instead of the C lib.?
I've been thinking about this as well. These functions are possibly a bit much for intrinsics, but it would be fairly trivial to write them in native D or even assembler--would have to inspect the resulting compiled code to see which was better. The only problem offhand is that DMD does not inline functions containing loops, nor does it inline functions containing ASM blocks, so we'd probably be stuck with a function call even with native D code.
Sean
Nov 10 2006
Sean Kelly wrote:
> Dave wrote:
>> Could it be that Intel is doing whole program optimization to inline things like memcpy and memset during linkage? (Are you using WPO? It may be the default, I can't remember.) I've found that for time-critical code I can code my own (e.g. memset) in D so the compiler can inline it, and it will be faster. Perhaps those should be in Phobos instead of the C lib.?
> I've been thinking about this as well. These functions are possibly a bit much for intrinsics, but it would be fairly trivial to write them in native D or even assembler--would have to inspect the resulting compiled code to see which was better. The only problem offhand is that DMD does not inline functions containing loops, nor does it inline functions containing ASM blocks, so we'd probably be stuck with a function call even with native D code.
> Sean
Good points - I'd forgotten about not inlining loops. The way I've "inlined" things like memset() is to just write a foreach where needed.
Nov 10 2006
Dave wrote:
> Sean Kelly wrote:
>> Dave wrote:
>>> Could it be that Intel is doing whole program optimization to inline things like memcpy and memset during linkage? (Are you using WPO? It may be the default, I can't remember.) I've found that for time-critical code I can code my own (e.g. memset) in D so the compiler can inline it, and it will be faster. Perhaps those should be in Phobos instead of the C lib.?
>> I've been thinking about this as well. These functions are possibly a bit much for intrinsics, but it would be fairly trivial to write them in native D or even assembler--would have to inspect the resulting compiled code to see which was better. The only problem offhand is that DMD does not inline functions containing loops, nor does it inline functions containing ASM blocks, so we'd probably be stuck with a function call even with native D code.
>> Sean
> Good points - I'd forgotten about not inlining loops. The way I've "inlined" things like memset() is to just write a foreach where needed.
BTW - Since performance has come up a lot lately: the reason one has to write the loop to get the most out of a simple memset-type operation is that things like arr[100..200] = 0; are replaced by a call to memset anyhow. This is something that the compiler really should treat like an intrinsic IMO.
Nov 11 2006
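To make the memset discussion above concrete, here is a small C sketch of the two shapes being compared: a range zeroed through the library memset call, and the same range zeroed with an open-coded loop that the compiler can see at the call site. It is written in C, the baseline language of the benchmarks in this thread; the function names are invented for illustration, and whether the loop actually beats the call depends on the compiler and library, so it is something to measure rather than assume.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-written fill: a plain loop whose body is visible to the compiler. */
static void fill_ints(int *dst, size_t n, int value)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = value;
}

/* Zeroes the half-open range [lo, hi) through the C library call,
   roughly what a slice assignment lowered to memset would do. */
void zero_range_memset(int *arr, size_t lo, size_t hi)
{
    assert(lo <= hi);
    memset(arr + lo, 0, (hi - lo) * sizeof arr[0]);
}

/* Zeroes the same range with the open-coded loop instead. */
void zero_range_loop(int *arr, size_t lo, size_t hi)
{
    assert(lo <= hi);
    fill_ints(arr + lo, hi - lo, 0);
}
```

Both functions leave the array in the same state, so they can be swapped in a timing loop to see which one a given compiler handles better.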
Teoman Soygul wrote:
> My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what? Any information or ideas would be appreciated. Before starting a big project, speed issues with D seem to be the biggest obstacle to overcome. Teoman Soygul, Alsoft.
One thing you might want to do is to also compare with C code compiled using the Digital Mars C compiler, since dmd and dmc use the same back end. This should help clarify what's due to inherent differences in the languages and what's just differences in compiler optimization.
--bb
Nov 10 2006
== Quote from Bill Baxter (wbaxter gmail.com)'s article
> Teoman Soygul wrote:
>> My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what? Any information or ideas would be appreciated. Before starting a big project, speed issues with D seem to be the biggest obstacle to overcome. Teoman Soygul, Alsoft.
> One thing you might want to do is to also compare with C code compiled using the Digital Mars C compiler, since dmd and dmc use the same back end. This should help clarify what's due to inherent differences in the languages and what's just differences in compiler optimization.
> --bb
Comparing Digital Mars C to Intel C really is a good idea, to see whether the performance difference is between D and C or between the D compiler and the C compiler. I'll do that comparison as well and post it so we can evaluate.
Nov 10 2006
Teoman Soygul wrote:
> My personal benchmarks prove that applications written in D are 5% to 20% slower than identical code written in C and compiled with Intel C compiler 9. Does anyone know what to do about this problem? Is it a compiler-specific problem (i.e., the D compiler lacking all the optimisation parameters), or the standard D library, or what?
If you can, try to profile the code and see where the difference lies. How long does the application run, is the code identical, etc.? 5% for a very short-running app could be attributed to garbage collector initialization and termination, for example, but this wouldn't be a factor in longer-running apps.
Sean
Nov 10 2006