digitalmars.D - Increasing speed of D applications to Intel C compiled applications' standards

Teoman Soygul <tsoygul tralsoft.com> writes:
My personal benchmarks show that applications written in D are 5%
to 20% slower than identical code written in C and compiled with
Intel C compiler 9. Does anyone know what to do about this
problem? Is it a compiler-specific problem (i.e. the D compiler
lacking some optimisation options), the standard D library, or
something else?

Any information or ideas would be appreciated. Before starting a big
project, speed issues with D seem to be the biggest obstacle to
overcome.

Teoman Soygul,
Alsoft.
Nov 10 2006
Walter Bright <newshound digitalmars.com> writes:
Teoman Soygul wrote:
 My personal benchmarks show that applications written in D are 5%
 to 20% slower than identical code written in C and compiled with
 Intel C compiler 9. Does anyone know what to do about this
 problem? Is it a compiler-specific problem (i.e. the D compiler
 lacking some optimisation options), the standard D library, or
 something else?
 
 Any information or ideas would be appreciated. Before starting a big
 project, speed issues with D seem to be the biggest obstacle to
 overcome.

If it's identical code, then it's a compiler optimization issue, as identical C code compiled with D should produce identical results. You can try using the D profiler to see if the slowdown is in any specific place. 5% to 20% are generally not big obstacles to overcome, as they are often in one or two spots that can be hand optimized (or written in assembler). It's also my experience that until it reaches 2:1, few people even notice it. Even the same program will vary 5 to 10% in execution time from run to run.
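
Because run-to-run noise of 5 to 10% is the same size as the gap being reported, a benchmark needs several trials per build before a difference means anything. Here is a minimal C sketch of that idea, assuming CPU time from the standard `clock()` is precise enough for the workload. The names `workload`, `time_once` and `time_runs` are hypothetical and not taken from the original benchmarks.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical workload, used only for illustration. */
static volatile unsigned long sink;

static void workload(void)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < 1000000UL; ++i)
        acc += i * i;
    sink = acc; /* volatile store keeps the loop from being discarded */
}

/* CPU time in seconds consumed by one call of fn. */
static double time_once(void (*fn)(void))
{
    clock_t start = clock();
    fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Run fn `runs` times; report the fastest and slowest trial. */
static void time_runs(void (*fn)(void), int runs,
                      double *best, double *worst)
{
    assert(runs > 0);
    *best = *worst = time_once(fn);
    for (int i = 1; i < runs; ++i) {
        double t = time_once(fn);
        if (t < *best)
            *best = t;
        if (t > *worst)
            *worst = t;
    }
}
```

Comparing the best times of two builds, and checking whether their difference is larger than the best-to-worst spread of either one, gives a rough idea of whether a 5% gap is real or just noise.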
Nov 10 2006
Bill Baxter <wbaxter gmail.com> writes:
Teoman Soygul wrote:
 My personal benchmarks show that applications written in D are 5%
 to 20% slower than identical code written in C and compiled with
 Intel C compiler 9. Does anyone know what to do about this
 problem? Is it a compiler-specific problem (i.e. the D compiler
 lacking some optimisation options), the standard D library, or
 something else?
 
 Any information or ideas would be appreciated. Before starting a big
 project, speed issues with D seem to be the biggest obstacle to
 overcome.
 
 Teoman Soygul,
 Alsoft.

I think even Microsoft's compiler is 5-20% slower than Intel's compiler, depending on the particular code in question (particularly SSE-optimizable things). So I don't think this speed difference is something to be so worried about. Still, there's a big difference between 5% and 20%. Do you have any observations about what sorts of things put you in the 20% category? --bb
Nov 10 2006
%u <tsoygul tralsoft.com> writes:
"Do you have any observations about what sorts of things put you
in the 20% category?"

- Luckily, code requiring sheer processing power, like math
functions (trig, logs...), b-tree creation and compression, runs
only 5-6% slower in D (an averaged mean) than with the Intel C
compiler.
- Unfortunately, operations dominated by memory access, like
memory copy, allocation, deallocation and stream copy, are 15-20%
slower in D. (Note that the code is totally identical.)

I'll post these results on my blog once I put them into a good
graphed format so we can discuss it even further. With my limited
knowledge of compilers, what I've seen is that the Intel C compiler
has many ingenious optimisations. Maybe there is a way to put
the same ideas into the D compiler. (I hope.)
Nov 10 2006
Benji Smith <dlanguage benjismith.net> writes:
%u wrote:
 - Unfortunately, operations dominated by memory access, like
 memory copy, allocation, deallocation and stream copy, are 15-20%
 slower in D. (Note that the code is totally identical.)

Keep in mind that D's allocator includes a garbage collector. To really do an apples-to-apples comparison, your benchmarking code should create lots of temporary objects. In D, the GC will handle all allocations and de-allocations. In C++, your objects will have to be manually created and destroyed. I'd be more interested in those kinds of benchmarks than in the kinds of micro-benchmarks that just compare the speed of D allocation with the speed of C++ allocation. --benji
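
The manual side of such a temporary-object benchmark could be sketched in C roughly as follows, using `malloc`/`free` in place of C++ `new`/`delete`. The `struct temp` layout, the `churn` function and its parameters are hypothetical, chosen only for illustration. The D counterpart would allocate the same objects with `new` and leave reclamation to the GC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical temporary object; the size is an arbitrary choice. */
struct temp {
    int payload[8];
};

/*
 * Create and destroy `count` short-lived objects, keeping at most
 * `live` of them alive at any time. Returns a checksum of the values
 * stored (so the work cannot be optimized away), or -1 if an
 * allocation fails.
 */
static long churn(size_t count, size_t live)
{
    assert(live > 0);

    struct temp **slots = calloc(live, sizeof *slots);
    if (slots == NULL)
        return -1;

    long checksum = 0;
    int failed = 0;

    for (size_t i = 0; i < count && !failed; ++i) {
        size_t s = i % live;
        if (slots[s] != NULL) {
            checksum += slots[s]->payload[0];
            free(slots[s]); /* manual destroy; a GC would do this later */
        }
        slots[s] = malloc(sizeof **slots);
        if (slots[s] == NULL)
            failed = 1;
        else
            slots[s]->payload[0] = (int)(i & 0xFF);
    }

    /* Release whatever is still alive. */
    for (size_t s = 0; s < live; ++s) {
        if (slots[s] != NULL) {
            checksum += slots[s]->payload[0];
            free(slots[s]);
        }
    }
    free(slots);

    return failed ? -1 : checksum;
}
```

Timing `churn` with a large `count` and a small `live` value exercises the allocate-then-discard pattern that a garbage collector is designed for, which is a different measurement from timing a single allocation call.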
Nov 10 2006
Sean Kelly <sean f4.ca> writes:
%u wrote:
 "Do you have any observations about what sorts of things put you
 in the 20% category?"
 
 - Luckily, code requiring sheer processing power, like math
 functions (trig, logs...), b-tree creation and compression, runs
 only 5-6% slower in D (an averaged mean) than with the Intel C
 compiler.
 - Unfortunately, operations dominated by memory access, like
 memory copy, allocation, deallocation and stream copy, are 15-20%
 slower in D. (Note that the code is totally identical.)

Some of this may be related to DMC's C library implementation vs. Intel's. Could you try the app using DMC and compare it to DMD? Sean
Nov 10 2006
J Duncan <jtd514 nospam.ameritech.net> writes:
%u wrote:
 "Do you have any observations about what sorts of things put you
 in the 20% category?"
 
 - Luckily, code requiring sheer processing power, like math
 functions (trig, logs...), b-tree creation and compression, runs
 only 5-6% slower in D (an averaged mean) than with the Intel C
 compiler.
 - Unfortunately, operations dominated by memory access, like
 memory copy, allocation, deallocation and stream copy, are 15-20%
 slower in D. (Note that the code is totally identical.)
 
 I'll post these results on my blog once I put them into a good
 graphed format so we can discuss it even further. With my limited
 knowledge of compilers, what I've seen is that the Intel C compiler
 has many ingenious optimisations. Maybe there is a way to put
 the same ideas into the D compiler. (I hope.)

So are we talking about a GC issue? I think it would be interesting to use D as a front end to C. Then D code could basically be run through the Intel optimizer.
Nov 10 2006
Dave <Dave_member pathlink.com> writes:
J Duncan wrote:
 %u wrote:
 "Do you have any observations about what sorts of things put you
 in the 20% category?"

 - Luckily, code requiring sheer processing power, like math
 functions (trig, logs...), b-tree creation and compression, runs
 only 5-6% slower in D (an averaged mean) than with the Intel C
 compiler.

IMO, that's not really discouraging, considering that Intel C/++/Fortran seems to be considered 'the best' for Intel platforms <g>

FWIW - DMD (the compiler, not the language) often lags by a good margin in two major areas that I wish would be improved: floating point calculations and recursion. The D FP spec doesn't have the same maximum precision prohibitions as the C/++ spec, so it can be more heavily optimized and still follow the spec. DMD doesn't currently take advantage of that, though.

 - Unfortunately, operations dominated by memory access, like
 memory copy, allocation, deallocation and stream copy, are 15-20%
 slower in D. (Note that the code is totally identical.)

D uses the DMC lib for most of that stuff, so the difference could be there. Could it be that Intel is doing whole program optimization to inline things like memcpy and memset during linkage? (Are you using WPO? It may be the default, I can't remember.) I've found that for time-critical code I can code my own (e.g. memset) in D so the compiler can inline it, and it will be faster. Perhaps those should be in Phobos instead of the C lib?

 I'll post these results on my blog once I put them into a good
 graphed format so we can discuss it even further.

I didn't see a URL for your blog?

 With my limited knowledge of compilers, what I've seen is that the
 Intel C compiler has many ingenious optimisations. Maybe there is a
 way to put the same ideas into the D compiler. (I hope.)

 so are we talking about a GC issue? I think it would be interesting to use D as a front end to C. Then D code could basically be run through the Intel optimizer.

Nov 10 2006
Sean Kelly <sean f4.ca> writes:
Dave wrote:
 
 Could it be that Intel is doing whole program optimization to inline 
 things like memcpy and memset, during linkage (Are you using WPO? -- it 
 may be the default, I can't remember)? I've found that for time critical 
 code I can code my own (e.g.: memset) in D so the compiler can inline it 
 and it will be faster. Perhaps those should be in Phobos instead of the 
 C lib.?

I've been thinking about this as well. These functions are possibly a bit much for intrinsics, but it would be fairly trivial to write them in native D or even assembler--would have to inspect the resulting compiled code to see which was better. The only problem offhand is that DMD does not inline functions containing loops, nor does it inline functions containing ASM blocks, so we'd probably be stuck with a function call even with native D code. Sean
Nov 10 2006
Dave <Dave_member pathlink.com> writes:
Sean Kelly wrote:
 Dave wrote:
 Could it be that Intel is doing whole program optimization to inline 
 things like memcpy and memset, during linkage (Are you using WPO? -- 
 it may be the default, I can't remember)? I've found that for time 
 critical code I can code my own (e.g.: memset) in D so the compiler 
 can inline it and it will be faster. Perhaps those should be in Phobos 
 instead of the C lib.?

 I've been thinking about this as well. These functions are possibly a bit much for intrinsics, but it would be fairly trivial to write them in native D or even assembler--would have to inspect the resulting compiled code to see which was better. The only problem offhand is that DMD does not inline functions containing loops, nor does it inline functions containing ASM blocks, so we'd probably be stuck with a function call even with native D code.

Good points - I'd forgotten about not inlining loops. The way I've "inlined" things like memset() is to just write a foreach where it's needed.
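
For comparison with the C side of the benchmarks, the same hand-written approach can be sketched in C: a fill loop whose body is visible at the call site, next to the library `memset` call it replaces. The function names `fill_ints` and `zero_ints_libc` are hypothetical. Whether the loop is actually inlined, or is turned back into a `memset` call by the optimizer, depends on the compiler and its flags, so the generated code still has to be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hand-written fill. Because the definition is visible where it is
 * called, the compiler is free to inline and unroll it.
 */
static inline void fill_ints(int *dst, size_t n, int value)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        dst[i] = value;
}

/* The same effect for the all-zero case, going through the C library. */
static void zero_ints_libc(int *dst, size_t n)
{
    assert(dst != NULL || n == 0);
    if (n > 0)
        memset(dst, 0, n * sizeof *dst);
}
```

Calling `fill_ints(arr + 100, 100, 0)` clears elements 100 through 199 of an `int` array, the same range as `zero_ints_libc(arr + 100, 100)`. Timing both variants on the target compiler shows whether the library call overhead matters for the sizes in question.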
Nov 10 2006
Dave <Dave_member pathlink.com> writes:
Dave wrote:
 Sean Kelly wrote:
 Dave wrote:
 Could it be that Intel is doing whole program optimization to inline 
 things like memcpy and memset, during linkage (Are you using WPO? -- 
 it may be the default, I can't remember)? I've found that for time 
 critical code I can code my own (e.g.: memset) in D so the compiler 
 can inline it and it will be faster. Perhaps those should be in 
 Phobos instead of the C lib.?

 I've been thinking about this as well. These functions are possibly a bit much for intrinsics, but it would be fairly trivial to write them in native D or even assembler--would have to inspect the resulting compiled code to see which was better. The only problem offhand is that DMD does not inline functions containing loops, nor does it inline functions containing ASM blocks, so we'd probably be stuck with a function call even with native D code.

 Good points - I'd forgotten about not inlining loops. The way I've "inlined" things like memset() is to just write a foreach where it's needed.

BTW - Since performance has come up a lot lately: the reason one has to write the loop to get the most out of a simple memset-type operation is that things like arr[100..200] = 0; are replaced by a call to memset anyhow. This is something that the compiler really should treat like an intrinsic, IMO.
Nov 11 2006
Bill Baxter <wbaxter gmail.com> writes:
Teoman Soygul wrote:
 My personal benchmarks show that applications written in D are 5%
 to 20% slower than identical code written in C and compiled with
 Intel C compiler 9. Does anyone know what to do about this
 problem? Is it a compiler-specific problem (i.e. the D compiler
 lacking some optimisation options), the standard D library, or
 something else?
 
 Any information or ideas would be appreciated. Before starting a big
 project, speed issues with D seem to be the biggest obstacle to
 overcome.
 
 Teoman Soygul,
 Alsoft.

One thing you might want to do is to also compare with C code compiled using the Digital Mars C compiler since dmd and dmc use the same back end. This should help clarify what's due to inherent differences in the languages and what's just differences in compiler optimization. --bb
Nov 10 2006
Teoman Soygul <tsoygul tralsoft.com> writes:
== Quote from Bill Baxter (wbaxter gmail.com)'s article
 Teoman Soygul wrote:
 My personal benchmarks show that applications written in D are 5%
 to 20% slower than identical code written in C and compiled with
 Intel C compiler 9. Does anyone know what to do about this
 problem? Is it a compiler-specific problem (i.e. the D compiler
 lacking some optimisation options), the standard D library, or
 something else?

 Any information or ideas would be appreciated. Before starting a big
 project, speed issues with D seem to be the biggest obstacle to
 overcome.

 Teoman Soygul,
 Alsoft.

 One thing you might want to do is to also compare with C code compiled
 using the Digital Mars C compiler since dmd and dmc use the same back
 end. This should help clarify what's due to inherent differences in the
 languages and what's just differences in compiler optimization.
 --bb

Comparing Digital Mars C to Intel C really is a good idea, to see whether the performance difference comes from the languages (D vs. C) or from the compilers. I'll run that comparison as well and post it so we can evaluate.
Nov 10 2006
Sean Kelly <sean f4.ca> writes:
Teoman Soygul wrote:
 My personal benchmarks show that applications written in D are 5%
 to 20% slower than identical code written in C and compiled with
 Intel C compiler 9. Does anyone know what to do about this
 problem? Is it a compiler-specific problem (i.e. the D compiler
 lacking some optimisation options), the standard D library, or
 something else?

If you can, try to profile the code and see where the difference lies. How long does the application run, is the code identical, etc? 5% for a very short-running app could be attributed to the garbage collector initialization and termination, for example, but this wouldn't be a factor in longer running apps. Sean
Nov 10 2006