digitalmars.D.learn - Why is this D code slower than C++?

reply Bradley Smith <digitalmars-com baysmith.com> writes:

Jacco Bikker wrote several raytracing articles on DevMaster.net. I took 
his third article and ported it to D. I was surprised to find that the D 
code is approx. 4 times slower than C++.

The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders 
in approx. 5 sec. I am using the DMD and DMC compilers on Windows.

How can the D code be made to run faster?

Thanks,
   Bradley
Jan 16 2007
next sibling parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
Bradley Smith wrote:
 Jacco Bikker wrote several raytracing articles on DevMaster.net. I took 
 his third article and ported it to D. I was surprised to find that the D 
 code is approx. 4 times slower than C++.
 
 The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders 
 in approx. 5 sec. I am using the DMD and DMC compilers on Windows.
 
 How can the D code be made to run faster?
 
 Thanks,
   Bradley
 

Is your build_d.bat missing the -release flag? I don't know how much it will gain, though.

L.
Jan 16 2007
next sibling parent Lionello Lunesu <lio lunesu.remove.com> writes:
dmd -O -inline -release: 23.2 secs
dmc -o+speed: 7.6 secs

Averaged over 3 runs.

This is without Bill's "inout" optimization, but with RegisterClass fixed.

L.
Jan 17 2007
prev sibling parent Lionello Lunesu <lio lunesu.remove.com> writes:
OK, ignore my previous post (it was with a debug build of Phobos).

dmd -O -inline -release: 17.7 secs
dmc -o+speed: 7.6 secs

Averaged over 3 runs.

This is without Bill's "inout" optimization, but with RegisterClass 
fixed.

I've also included a std.gc.disable() and I've replaced a "long" with 
"int", but these changes did not have any effect.

L.
Jan 17 2007
prev sibling next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Bradley Smith wrote:
 Jacco Bikker wrote several raytracing articles on DevMaster.net. I took 
 his third article and ported it to D. I was surprised to find that the D 
 code is approx. 4 times slower than C++.
 
 The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders 
 in approx. 5 sec. I am using the DMD and DMC compilers on Windows.
 
 How can the D code be made to run faster?
 
 Thanks,
   Bradley
 

That is pretty weird.

I noticed that it doesn't work properly with -release added to the compiler flags. If I do add it, I just get a lot of flashing of my desktop icons when I run it, rather than a window popping up with a raytracer inside. Any idea why?

Anyway, after some tweaking of the D version I got it down to 15 sec, vs 10 sec for the C++ version on my machine. Mainly what I did was make more things inout parameters so they don't get passed by value. Also, it looks like your template math functions like DOT and LENGTH may not be getting inlined. Replacing those with the inline code in hotspots like the sphere intersect function sped things up.

Here's the version of Sphere.Intersect I ended up with:

    int Intersect( inout Ray a_Ray, inout float a_Dist )
    {
        vector3 v = a_Ray.origin;
        v -= m_Centre;
        //float b = -DOT!(float, vector3) ( v, a_Ray.direction );
        vector3 dir = a_Ray.direction;
        float b = -(v.x * dir.x + v.y * dir.y + v.z * dir.z);
        float det = (b * b) - (v.x*v.x+v.y*v.y+v.z*v.z) + m_SqRadius;
        int retval = MISS;
        if (det > 0)
        {
            det = sqrt( det );
            float i2 = b + det;
            if (i2 > 0)
            {
                float i1 = b - det;
                if (i1 < 0)
                {
                    if (i2 < a_Dist)
                    {
                        a_Dist = i2;
                        return INPRIM;
                    }
                }
                else
                {
                    if (i1 < a_Dist)
                    {
                        a_Dist = i1;
                        return HIT;
                    }
                }
            }
        }
        return retval;
    }

The inout on the Ray parameter and the other changes to this function alone change my D runtime from 22 sec to 15 sec. I also tried making similar changes to the C++ version, but they didn't seem to affect the runtime at all.

--bb
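For comparison, the same routine can be sketched in C++, where the by-reference passing is spelled out as const Ray& and float&. This is only a sketch under assumptions: the vector3 and Ray layouts and the numeric values of MISS, HIT and INPRIM are made up for illustration and are not taken from the actual raytracer source.

```cpp
#include <cmath>

// Hypothetical minimal types; the names follow the post, the layout is assumed.
struct vector3 { float x, y, z; };
struct Ray { vector3 origin, direction; };

// Result codes named as in the post; the numeric values are assumptions.
enum { MISS = 0, HIT = 1, INPRIM = -1 };

struct Sphere {
    vector3 m_Centre;
    float   m_SqRadius;  // radius squared

    // The ray is taken by const reference, so no copy is made per call.
    // a_Dist is in/out: the nearest hit distance found so far.
    int Intersect(const Ray& a_Ray, float& a_Dist) const {
        // v = origin - centre, written out per component (no temporaries).
        const float vx = a_Ray.origin.x - m_Centre.x;
        const float vy = a_Ray.origin.y - m_Centre.y;
        const float vz = a_Ray.origin.z - m_Centre.z;
        const vector3& dir = a_Ray.direction;

        // Manually inlined dot products, as in the D version.
        const float b = -(vx * dir.x + vy * dir.y + vz * dir.z);
        float det = b * b - (vx * vx + vy * vy + vz * vz) + m_SqRadius;
        if (det <= 0) return MISS;

        det = std::sqrt(det);
        const float i2 = b + det;          // far root
        if (i2 <= 0) return MISS;          // sphere is behind the ray
        const float i1 = b - det;          // near root
        if (i1 < 0) {                      // ray starts inside the sphere
            if (i2 < a_Dist) { a_Dist = i2; return INPRIM; }
        } else {
            if (i1 < a_Dist) { a_Dist = i1; return HIT; }
        }
        return MISS;
    }
};
```

With a sphere of radius 1 centred at (0, 0, 5) and a ray from the origin along +z, b is 5 and det is 1, so the nearer root i1 = 4 is reported as a HIT.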
Jan 16 2007
parent reply %u <u infearof.spm> writes:
== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 I noticed that it doesn't work properly with -release add to the
 compiler flags.

an assertion. On my machine the -release flag brings another 25%.
 The inout on the Ray parameter and the other changes to this
 function alone change my D runtime from 22 sec to 15 sec.

The compiler should be smart enough to detect that the Ray parameter is not used as an lvalue and thus can be replaced by a reference.
Jan 17 2007
next sibling parent Dave <Dave_member pathlink.com> writes:
%u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 I noticed that it doesn't work properly with -release add to the
 compiler flags.

an assertion. On my machine the -release flag brings another 25%.
 The inout on the Ray parameter and the other changes to this
 function alone change my D runtime from 22 sec to 15 sec.

The compiler should be smart enough to detect, that the Ray parameter is not used as an lvalue and thus can be replaced by a reference.

In that respect I'd like to see 'byref' be a synonym for 'inout' as well, so we can tweak those things w/o relying on the compiler, or by using a keyword (inout) that doesn't really fit the situation in which it's being used.
Jan 17 2007
prev sibling parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
%u wrote:
 == Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article
 I noticed that it doesn't work properly with -release add to the
 compiler flags.

an assertion. On my machine the -release flag brings another 25%.
 The inout on the Ray parameter and the other changes to this
 function alone change my D runtime from 22 sec to 15 sec.

The compiler should be smart enough to detect, that the Ray parameter is not used as an lvalue and thus can be replaced by a reference.

No, it can't. Passing a struct by ref will result in unexpected behavior if it changes in some other thread. As always, the default should be safe no matter what, and that means copying the struct's contents.

I guess a new modifier like "byref" is the only option.

L.
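The post talks about another thread changing the struct, but a related hazard shows up even in a single thread when two parameters alias the same object. The C++ sketch below is an illustration of that aliasing case only, with hypothetical function names; it is not code from the raytracer.

```cpp
struct vector3 { float x, y, z; };

// By value: v is a private copy, so writing through out cannot change it.
float first_x_by_value(vector3 v, vector3& out) {
    out.x = 0.0f;
    return v.x;   // still the caller's original x
}

// By reference: if v and out name the same object, the write to out
// is visible through v, so the result differs from the by-value version.
float first_x_by_ref(const vector3& v, vector3& out) {
    out.x = 0.0f;
    return v.x;   // reads the value just written when v aliases out
}
```

Calling both functions with the same object for both arguments gives 1 for the by-value version and 0 for the by-reference version (starting from x = 1), which is why a compiler cannot silently swap one calling convention for the other without proving that nothing else can touch the argument.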
Jan 18 2007
parent reply %u <u infearof.spm> writes:
Lionello Lunesu Wrote:
 No, it can't.. Passing a struct by ref will result in unexpected 
 behavior if it changes in some other thread. As always, the default 
 should be safe no matter what, and that means copying the struct's contents.

 I guess a new modifier like "byref" is the only option..

Jan 18 2007
parent Dave <Dave_member pathlink.com> writes:
%u wrote:
 Lionello Lunesu Wrote:
 No, it can't.. Passing a struct by ref will result in unexpected 
 behavior if it changes in some other thread. As always, the default 
 should be safe no matter what, and that means copying the struct's contents.

 I guess a new modifier like "byref" is the only option..


That's Ok as long as all D compilers will most likely rightly determine whether or not to pass the const byref as an optimization. Since this is probably not realistic, I think something like 'byref' is called for. There's been a great debate as to whether or not 'const' is actually enforceable, and unless it is, it would not really be of any value as an optimizer hint (like const can't be counted on as an optimizer hint for C++).
Jan 18 2007
prev sibling next sibling parent reply Bradley Smith <digitalmars-com baysmith.com> writes:

Thanks for all the suggestions. It helps, but not enough to make the D 
code faster than the C++. It is now 2.6 times slower. The render times 
are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.

Here are the changes I've made. Attached is the new code.

   Call RegisterClass outside of assert. (Broken if -release used)
   Apply -release option. (Increases speed in an unknown way)
   Converted templates to regular functions. (Templates not being inlined)
   Manually inlined DOT function. (Function not being inlined)


Any other suggestions?

Thanks,
   Bradley

Bradley Smith wrote:
 Jacco Bikker wrote several raytracing articles on DevMaster.net. I took 
 his third article and ported it to D. I was surprised to find that the D 
 code is approx. 4 times slower than C++.
 
 The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders 
 in approx. 5 sec. I am using the DMD and DMC compilers on Windows.
 
 How can the D code be made to run faster?
 
 Thanks,
   Bradley
 

Jan 17 2007
next sibling parent reply "nobody_" <spam spam.spam> writes:
I really hope you'll get it faster than the C++ variant.

Might -profile shed some light?
Or maybe I lurk here in learn for a reason :D


 Thanks for all the suggestions. It helps, but not enough to make the D
 code faster than the C++. It is now 2.6 times slower. The render times
 are now approx. 13 sec for raytracer_d and approx. 5 sec for 
 raytracer_cpp.

 Here are the changes I've made. Attached is the new code.

   Call RegisterClass outside of assert. (Broken if -release used)
   Apply -release option. (Increases speed in an unknown way)
   Converted templates to regular functions. (Templates not being inlined)
   Manually inlined DOT function. (Function not being inlined)


 Any other suggestions?

Jan 17 2007
parent reply BCS <BCS pathlink.com> writes:
nobody_ wrote:
 I really hope you'll get it faster than the C++ variant.
 
 Might -profile shed some light?
 Or maybe I lurk here in learn for a reason :D
 
 
 
Thanks for all the suggestions. It helps, but not enough to make the D
code faster than the C++. It is now 2.6 times slower. The render times
are now approx. 13 sec for raytracer_d and approx. 5 sec for 
raytracer_cpp.

Here are the changes I've made. Attached is the new code.

  Call RegisterClass outside of assert. (Broken if -release used)
  Apply -release option. (Increases speed in an unknown way)
  Converted templates to regular functions. (Templates not being inlined)
  Manually inlined DOT function. (Function not being inlined)


Any other suggestions?


I ran it with -profile and it takes about 25 min. Here's the log:

http://www.webpages.uidaho.edu/~shro8822/trace.log
Jan 17 2007
next sibling parent reply %u <u infearof.spm> writes:
BCS Wrote:
 here's the log
 
 http://www.webpages.uidaho.edu/~shro8822/trace.log

That looks like the use of foreach makes the performance go down. Maybe it's due to the numerous calls of delegates.
Jan 17 2007
parent Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
%u wrote:
 BCS Wrote:
 here's the log

 http://www.webpages.uidaho.edu/~shro8822/trace.log

That looks like the use of foreach lets the performance go down. Maybe its due to the numerous calls of delegates.

No, it shows foreach there because a lot of stuff got inlined and it's only seen by the profiler as the foreach's body. In my experience, more meaningful results can be obtained if -profile is used without -inline.

-- Tomasz Stachowiak
Jan 17 2007
prev sibling parent "nobody_" <spam spam.spam> writes:
 I ran it with -profile and it takes about 25 min.

Talk about overhead :) cpp took about 7 minutes (log attached)
 here's the log

 http://www.webpages.uidaho.edu/~shro8822/trace.log 

Jan 17 2007
prev sibling next sibling parent reply Steve Horne <stephenwantshornenospam100 aol.com> writes:
On Wed, 17 Jan 2007 11:18:10 -0800, Bradley Smith
<digitalmars-com baysmith.com> wrote:

Thanks for all the suggestions. It helps, but not enough to make the D 
code faster than the C++. It is now 2.6 times slower. The render times 
are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.

...
Any other suggestions?

I haven't actually looked at the code, but I'll take a guess anyway.

Raytracing is heavy on the floating point math. As Walter Bright acknowledges, the DMD compiler does not handle the optimisation of float arithmetic as well as some C++ compilers.

You could try the GNU D compiler - GDC. Since it is using the standard GNU compiler suite backend code generator, it will probably handle the optimisation better.

A second option is to split out some key inner-loop calculations and handle them in C, using D for the less performance-sensitive code. Calling C code from D is easy enough, though calling C++ is more of a hassle. This hack could be considered temporary, as the D float performance will no doubt be improved in time.

Alternatively, if you don't mind losing portability, you could try using inline assembler for those key inner-loop calculations. If you're a real speed freak, you might even try using SIMD instructions to get 4 float calculations per instruction (and IIRC most SIMD instructions complete in a single clock cycle these days). The down side to that would be lower floating point precision, but for raytracing I wouldn't expect that to be a big deal.

-- Remove 'wants' and 'nospam' from e-mail.
Jan 17 2007
parent Steve Horne <stephenwantshornenospam100 aol.com> writes:
On Wed, 17 Jan 2007 22:34:31 +0000, Steve Horne
<stephenwantshornenospam100 aol.com> wrote:

On Wed, 17 Jan 2007 11:18:10 -0800, Bradley Smith
<digitalmars-com baysmith.com> wrote:

Thanks for all the suggestions. It helps, but not enough to make the D 
code faster than the C++. It is now 2.6 times slower. The render times 
are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.

...
Any other suggestions?

I haven't actually looked at the code, but I'll take a guess anyway. Raytracing is heavy on the floating point math. As Walter Bright acknowledges, the DMD compiler does not handle the optimisation of float arithmetic as well as some C++ compilers.

On second thoughts, if you're comparing with the DMC compiler for C++, floating point math performance seems a less likely issue.

It seems odd that there's such a difference between the DMD and DMC compilers. You'd think the DMD compiler would use much the same back-end code generation that DMC does.

-- Remove 'wants' and 'nospam' from e-mail.
Jan 17 2007
prev sibling next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Bradley Smith wrote:
 Thanks for all the suggestions. It helps, but not enough to make the D 
 code faster than the C++. It is now 2.6 times slower. The render times 
 are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
 
 Here are the changes I've made. Attached is the new code.
 
   Call RegisterClass outside of assert. (Broken if -release used)
   Apply -release option. (Increases speed in an unknown way)
   Converted templates to regular functions. (Templates not being inlined)
   Manually inlined DOT function. (Function not being inlined)

You left out changing Intersect's Ray argument to be inout, and more generally changing all Ray (and possibly vector3) parameters to be inout to avoid the cost of copying them on the stack.

Also, converting vector expressions like
      vector3 v = a_Ray.origin - m_Centre;
to
      vector3 v = a_Ray.origin;
      v -= m_Centre;
makes a difference. Changing that one line in the Sphere.Intersect routine changes my runtime from 12.2 to 14.3 sec.

Interestingly, the same sort of transformation to the C++ code didn't seem to make much difference. It could be related in part to the C++ vector parameters on the operators all taking const vector& (references) vs the D ones being just plain vector3. Changing all the operators in the D version to inout may help speed too.

With those changes, on my Intel Xeon 3.6GHz CPU the run times are about 10.1 sec vs 12.2 sec. D is still not as fast as the C++, but close.

--bb
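The two forms of the subtraction can be put side by side in C++, the language where the transformation reportedly made little difference. This is a sketch only: the vector3 type and the two function names are hypothetical stand-ins, and it says nothing about how DMD or DMC actually compile either form.

```cpp
// Hypothetical stand-in for the raytracer's vector type.
struct vector3 {
    float x, y, z;

    // Compound assignment works in place and creates no new object.
    vector3& operator-=(const vector3& r) {
        x -= r.x; y -= r.y; z -= r.z;
        return *this;
    }
};

// Binary minus takes its operands by const reference (as the post says the
// C++ operators do) and returns a freshly built vector3.
inline vector3 operator-(const vector3& a, const vector3& b) {
    vector3 t = a;
    t -= b;
    return t;
}

// Form 1: expression style; the result of operator- initialises v.
vector3 diff_expression(const vector3& origin, const vector3& centre) {
    vector3 v = origin - centre;
    return v;
}

// Form 2: copy once, then subtract in place.
vector3 diff_in_place(const vector3& origin, const vector3& centre) {
    vector3 v = origin;
    v -= centre;
    return v;
}
```

Both functions compute the same vector; the only difference is whether a separate result object has to be produced by operator- before v is initialised, which a C++ compiler is usually able to optimise away.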
Jan 17 2007
next sibling parent Dave <Dave_member pathlink.com> writes:
Bill Baxter wrote:
 Bradley Smith wrote:
 Thanks for all the suggestions. It helps, but not enough to make the D 
 code faster than the C++. It is now 2.6 times slower. The render times 
 are now approx. 13 sec for raytracer_d and approx. 5 sec for 
 raytracer_cpp.

 Here are the changes I've made. Attached is the new code.

   Call RegisterClass outside of assert. (Broken if -release used)
   Apply -release option. (Increases speed in an unknown way)
   Converted templates to regular functions. (Templates not being inlined)
   Manually inlined DOT function. (Function not being inlined)

You left out changing Intersect's Ray argument to be inout. And generally all Ray (and possibly vector3 parameters) to be inout to avoid the cost of copying them on the stack. Also converting vector expressions like vector3 v = a_Ray.origin - m_Centre; to vector3 v = a_Ray.origin; v -= m_Centre; makes a difference. Changing that one line in the Sphere.Intersect routine changes my runtime from 12.2 to 14.3 sec. Interestingly the same sort of transformation to the C++ code didn't seem to make much difference. It could be related in part to the C++ vector parameters on the operators all taking const vector& (references) vs the D ones being just plain vector3. Chaging all the operators in the D version to inout may help speed too. With those changes on my Intel Xeon 3.6GHz CPU the run times are about 10.1 sec vs 12.2 sec. D still not as fast as the C++, but close. --bb

One more thing to try (now that auto classes are allocated on the stack) is to convert the structs to classes and pass those around. Of course you can't return those from things like opSub(), so you'd have to always use opXxxAssign(), etc. I haven't gone over the code in detail, so maybe this is not really feasible, but maybe worth a shot?

IIRC, one of the problems with using 'inout' for function parameters is that functions with such parameters are excluded from consideration for inlining by the current D compiler front-end.
Jan 17 2007
prev sibling next sibling parent reply %u <u infearof.spm> writes:
Bill Baxter Wrote:
 D still not as fast as the C++, but close.

I refuse to analyze this any further.

On comparing the implementations of Primary, I noticed that the OP has introduced a constructor which executes "new Material". There is no "new" in the cpp version of Primary, but there is a "SetMaterial" function.

On deleting the new expression in the D version, an exception was raised on executing the newly compiled binary.

Astonishingly, grepping over the .cpp and .h files with Agent Ransack delivered no calls of "SetMaterial", but "GetMaterial" is called, which uses the unset "Material" pointer. :-(

Conclusion: at least one of the following is true
1) I have near to no ability to understand c++
2) the c++-version is lucky to run at all

In case of 2) the OP has silently changed the algorithm on porting to D.
Jan 18 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
%u wrote:
 Bill Baxter Wrote:
 D still not as fast as the C++, but close.

I refuse to analyze this any further. On comparing the implementations of Primary, I noticed, that the OP has introduced a constructor which executes "new Material". There is no "new" in the cpp-version of Primary but a "SetMaterial" function. On deleting the new expression in the D-version an exception was raised on executing the newly compiled binary. Astonishingly grepping over the .cpp and .h -files with agent ransack no calls of "SetMaterial" were delivered---but "GetMaterial" is called---which uses the unset "Material" pointer. :-( Conclusion: at least one of the following is true 1) I have near to no ability to understand c++ 2) the c++-version is lucky to run at all In case of 2) the OP has silently changed the algorithm on porting to D.

It's case 1) I'm afraid. :-)

Material is a by-value member of Primitive in the C++ version. This means it acts more like a D struct than a D class. GetMaterial calls return a pointer to the Material that's part of the class, and it will have been initialized implicitly by the Primitive constructor using whatever Material's default constructor does.

So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.

--bb
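The by-value member pattern can be sketched in a few lines of C++. The class and method names follow the thread, but the diffuse field and its default value are hypothetical, chosen only to make the behaviour visible.

```cpp
// Hypothetical Material with a single field and a default constructor.
struct Material {
    float diffuse;
    Material() : diffuse(0.25f) {}  // assumed default, for illustration
};

class Primitive {
public:
    // Returns a pointer to the embedded member; no allocation, never null.
    Material* GetMaterial() { return &m_Material; }

private:
    // Stored by value: constructed implicitly whenever a Primitive is
    // constructed, so no "new Material" and no SetMaterial call is needed.
    Material m_Material;
};
```

A freshly constructed Primitive therefore already has a usable Material, and a change made through the returned pointer is visible on the next GetMaterial() call because both refer to the same embedded object.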
Jan 18 2007
next sibling parent reply %u <u infearof.spm> writes:
Bill Baxter Wrote:
 So the C++ code is ok.  But it's not clear why Material became a
 class in the D version rather than a struct.

This shows, however, that programmers still are not following engineering principles: no technical documentation of the port is given and no one complains. Instead, several people are eagerly searching for flaws in the reference implementation of D, for which there is also no technical documentation :-(
Jan 18 2007
next sibling parent reply Dave <Dave_member pathlink.com> writes:
%u wrote:
 Bill Baxter Wrote:
 So the C++ code is ok.  But it's not clear why Material became a
 class in the D version rather than a struct.

This shows however, that programmers still are not following engeering principles: no technical documentation of the port is given and no one complains. Instead several people are eager searching flaws in the reference implementation of D for which there is also no technical documentation :-(

Let's assume that the OP was earnestly trying to make the C++ and D code comparable... If so, then this exercise did point out some areas where D needs attention.

In the final analysis, it's "good faith" ports like these that are going to settle whether or not D "is as fast or faster" than C++, and in many cases, whether or not people will make the switch. If it requires a lot of code modifications over and above a simple port to make D comparable in performance, people will shy away from D.

C++ is still being used for new development in large part because of great performance, and the language constructs ("expressibility") that make that possible. One area where this keeps popping up in D is being able to pass structs 'byref' w/o necessarily using 'inout'.
Jan 18 2007
parent reply Bradley Smith <digitalmars-com baysmith.com> writes:
Dave wrote:
 %u wrote:
 Bill Baxter Wrote:
 So the C++ code is ok.  But it's not clear why Material became a
 class in the D version rather than a struct.

This shows however, that programmers still are not following engeering principles: no technical documentation of the port is given and no one complains. Instead several people are eager searching flaws in the reference implementation of D for which there is also no technical documentation :-(

Let's assume that the OP was earnestly trying to make the C++ and D code comparable... If so, then this exercise did point out some areas where D needs attention. In the final analysis, it's "good faith" ports like these that are going to satisfy whether or not D "is as fast or faster" than C++, and in many cases, whether or not people will make the switch. If it requires a lot of code modifications over and above a simple port to make D comparable in performance, people will shy away from D.

Thanks for defending me, Dave. You are correct in assuming that I am trying to make the C++ and D code comparable. I'm not trying to sabotage the D effort. In fact, I would very much like to see the D code perform significantly better than C++. I'm just trying to learn how to write high-performance D code.

Thanks,
  Bradley
Jan 18 2007
next sibling parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Bradley Smith wrote:
 
 
 Dave wrote:
 %u wrote:
 Bill Baxter Wrote:
 So the C++ code is ok.  But it's not clear why Material became a
 class in the D version rather than a struct.

This shows however, that programmers still are not following engeering principles: no technical documentation of the port is given and no one complains. Instead several people are eager searching flaws in the reference implementation of D for which there is also no technical documentation :-(

Let's assume that the OP was earnestly trying to make the C++ and D code comparable... If so, then this exercise did point out some areas where D needs attention. In the final analysis, it's "good faith" ports like these that are going to satisfy whether or not D "is as fast or faster" than C++, and in many cases, whether or not people will make the switch. If it requires a lot of code modifications over and above a simple port to make D comparable in performance, people will shy away from D.

Thanks for defending me, Dave. You are correct in assuming that I am trying to make the C++ and D code comparable. I'm not trying to sabotage the D effort. In fact, I would very much like to see the D code perform significantly better than C++. I'm just trying to learn how to write high-performance D code. Thanks, Bradley

I think this was a great little benchmark you posted. I hope Walter takes some interest in this too, because he's consistently responded to performance questions with "I bet it'll be the same if you compile with DMC and DMD". But now at last we have a real-world kind of benchmark with which to test that assertion.

The answer appears to be negative at the moment, but just as with bugs, you can't fix it if you can't reproduce the problem. And you've given us a very nice repro case.

--bb
Jan 18 2007
prev sibling parent %u <u infearof.spm> writes:
Bradley Smith Wrote:
 Thanks for defending me, Dave.

victim to a known source of errors.
Jan 18 2007
prev sibling parent reply Bradley Smith <digitalmars-com baysmith.com> writes:
%u wrote:
 Bill Baxter Wrote:
 So the C++ code is ok.  But it's not clear why Material became a
 class in the D version rather than a struct.

This shows however, that programmers still are not following engeering principles: no technical documentation of the port is given and no one complains. Instead several people are eager searching flaws in the reference implementation of D for which there is also no technical documentation :-(

What technical documentation would be proper? What would it contain?
Jan 18 2007
parent reply %u <u infearof.spm> writes:
Bradley Smith Wrote:
 What technical documentation would be proper? What would it
 contain?

As always, that depends on the requirements of the presumed readers. If you are able to change your position from the view of the porter to the view of a verifier or a freshly introduced maintainer of the port, then you will have an impression of what you would want to look at first. It is a pity, as it stands, that the question of what the technical documentation should contain arises at all.

For example, the answer you gave to Bill Baxter:

| Because in the C++, GetMaterial returns a pointer. Since other
| objects can use the pointer to change the value of the Material
| contained within a Primitive, the same behavior was used in the D
| code by using a class. If a struct had been used, a copy of Material
| would be returned, and changing the Material would have no effect on
| the Primitive.
| Also, because GetMaterial is called very often, I assume that making
| lots of copies of it would decrease performance. Presumably, that
| is why the C++ code returns a pointer.

would belong in such documentation, as would any other decision that was made during the port.

For example, I found a ".dup" in the D version where there was no copying in the cpp version. The question immediately arises whether this was done with intent or by accident. Without the redundancy provided by technical documentation, a careful analysis of the necessity of these four characters has to be undertaken.
Jan 18 2007
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
%u wrote:
 Bradley Smith Wrote:
 What technical documentation would be proper? What would it
 contain?

As always such depends on the requirements of the presumed readers. If you are able to change your position from the view of the porter to the view of a verifier or freshly introduced maintainer of the port, then you will have an impression of what you would want to look at first. It is a pity as it stands, that the question for the content of the technical documentation raises at all.

Dude, it's a toy raytracer ported from some free code someone posted to a website somewhere. Why should it come with gobs of documentation?

But anyway, the original code was part of a series of tutorials. I think the version Bradley posted was probably from this installment:

http://www.devmaster.net/articles/raytracing_series/part3.php

As the series goes on, the author adds more and more fancy features to the raytracer. Anyway, the tutorials are already far more documentation than you'll find for most free code out in the wild.

--bb
Jan 18 2007
prev sibling parent reply Bradley Smith <digitalmars-com baysmith.com> writes:
Bill Baxter wrote:
 %u wrote:
 Bill Baxter Wrote:
 D still not as fast as the C++, but close.

I refuse to analyze this any further. On comparing the implementations of Primary, I noticed, that the OP has introduced a constructor which executes "new Material". There is no "new" in the cpp-version of Primary but a "SetMaterial" function. On deleting the new expression in the D-version an exception was raised on executing the newly compiled binary. Astonishingly grepping over the .cpp and .h -files with agent ransack no calls of "SetMaterial" were delivered---but "GetMaterial" is called---which uses the unset "Material" pointer. :-( Conclusion: at least one of the following is true 1) I have near to no ability to understand c++ 2) the c++-version is lucky to run at all In case of 2) the OP has silently changed the algorithm on porting to D.

It's case 1) I'm afraid. :-) Material is a by-value member of Primitive in the C++ version. This means it acts more like a D struct than a D class. GetMaterial calls return a pointer to the Material that's part of the class, and it will have been initialized implicitly by the Primitive constructor using whatever Material's default constructor does. So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.

Because in the C++, GetMaterial returns a pointer. Since other objects can use the pointer to change the value of the Material contained within a Primitive, the same behavior was used in the D code by using a class. If a struct had been used, a copy of Material would be returned, and changing the Material would have no effect on the Primitive.

Also, because GetMaterial is called very often, I assume that making lots of copies of it would decrease performance. Presumably, that is why the C++ code returns a pointer.

Thanks,
  Bradley
Jan 18 2007
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Bradley Smith wrote:
 
 
 Bill Baxter wrote:
 %u wrote:
 Bill Baxter Wrote:



 Material is a by-value member of Primitive in the C++ version.  This 
 means it acts more like a D struct than a D class.  GetMaterial calls 
 return a pointer to the Material that's part of the class, and it will 
 have been initialized implicitly by the Primitive constructor using 
 whatever Material's default constructor does.

 So the C++ code is ok.  But it's not clear why Material became a class 
 in the D version rather than a struct.

Because in the C++, GetMaterial returns a pointer. Since other objects can use the pointer to change the value of the Material contained within a Primitive, the same behavior was used in the D code by using a class. If a struct had been used, a copy of Material would be returned, and changing the Material would have no effect on the Primitive. Also, because GetMaterial is called very often, I assume that making lots of copies of it would decrease performance. Presumably, that is why the C++ code returns a pointer.

You can return pointers in D too. But anyway, I don't think the change from a by-value class in C++ to a by-reference class in D made any difference in the runtime.

I wasn't saying that it was wrong that you changed Material to a D class or anything. It's a valid approach and certainly more D-ish than returning a pointer to a struct.

--bb
Jan 18 2007
prev sibling parent Bradley Smith <digitalmars-com baysmith.com> writes:
Bill Baxter wrote:
 You left out changing Intersect's Ray argument to be inout.  And 
 generally all Ray (and possibly vector3 parameters) to be inout to avoid 
  the cost of copying them on the stack.

Sorry Bill, that was unintentional. I changed Raytrace's Ray argument, but forgot Intersect's Ray argument.
 
 Also converting vector expressions like
       vector3 v = a_Ray.origin - m_Centre;
 to
       vector3 v = a_Ray.origin;
       v -= m_Centre;
 
 makes a difference.  Changing that one line in the Sphere.Intersect 
 routine changes my runtime from 12.2 to 14.3 sec.

That helps too. The time is now down to approx. 10 sec. (2 times slower than C++).
 
 Interestingly the same sort of transformation to the C++ code didn't 
 seem to make much difference.  It could be related in part to the C++ 
 vector parameters on the operators all taking const vector& (references) 
 vs the D ones being just plain vector3.  Changing all the operators in 
 the D version to inout may help speed too.

I've tried this "temporary value elimination" optimization in other areas of the code, but the effect is minimal. Compared with my experience with Java, I think C++ is very good at using return value optimization to eliminate temporary objects.

Thanks,
   Bradley
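For reference, the two forms of the vector subtraction compared above look roughly like this in C++, with operators taking `const vector3&` as Bill describes. This is only a sketch under assumptions: the `vector3` layout and the helper names `OffsetExpression` and `OffsetInPlace` are invented for the example and are not taken from the raytracer source.

```cpp
// Hypothetical minimal vector type; the real raytracer's vector3 has more members.
struct vector3 {
    float x, y, z;

    // Compound subtraction modifies this vector in place.
    vector3& operator-=(const vector3& rhs) {
        x -= rhs.x; y -= rhs.y; z -= rhs.z;
        return *this;
    }
};

// Binary subtraction takes both operands by const reference, so the
// arguments themselves are not copied; only the result is a new object.
inline vector3 operator-(const vector3& a, const vector3& b) {
    vector3 r = a;
    r -= b;
    return r;
}

// Form 1: a single expression, as in "vector3 v = a_Ray.origin - m_Centre;".
inline vector3 OffsetExpression(const vector3& origin, const vector3& centre) {
    vector3 v = origin - centre;
    return v;
}

// Form 2: copy first, then subtract in place.
inline vector3 OffsetInPlace(const vector3& origin, const vector3& centre) {
    vector3 v = origin;
    v -= centre;
    return v;
}
```

Both functions compute the same value. Whether one is faster depends on how well the compiler elides the temporary in the first form, which is the difference between DMC and DMD being discussed.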
Jan 18 2007
prev sibling next sibling parent reply Dave <Dave_member pathlink.com> writes:
Bradley Smith wrote:
 Thanks for all the suggestions. It helps, but not enough to make the D 
 code faster than the C++. It is now 2.6 times slower. The render times 
 are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
 
 Here are the changes I've made. Attached is the new code.
 
   Call RegisterClass outside of assert. (Broken if -release used)
   Apply -release option. (Increases speed in an unknown way)
   Converted templates to regular functions. (Templates not being inlined)

Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
   Manually inlined DOT function. (Function not being inlined)
 
 
 Any other suggestions?
 
 Thanks,
   Bradley
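The RegisterClass item above is a general pitfall: a call with a required side effect placed inside an assert disappears when asserts are compiled out (-release in DMD; NDEBUG in C++). A C++ analogue might look like the sketch below. `RegisterWindowClass`, `g_registrations`, `InitFragile`, and `InitRobust` are hypothetical names for illustration and do not come from the raytracer.

```cpp
#include <cassert>

// Hypothetical stand-in for a registration call whose side effect is required.
int g_registrations = 0;

inline bool RegisterWindowClass() {
    ++g_registrations;
    return true;
}

// Fragile: if asserts are compiled out, the call vanishes along with the check.
inline void InitFragile() {
    assert(RegisterWindowClass());
}

// Robust: the call always runs; only the check is optional.
inline void InitRobust() {
    bool ok = RegisterWindowClass();
    assert(ok);
    (void)ok;  // avoids an unused-variable warning when asserts are disabled
}
```

Calling `InitRobust()` increments the counter regardless of build flags, whereas `InitFragile()` only does so when asserts are enabled.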
 
 Bradley Smith wrote:
 Jacco Bikker wrote several raytracing articles on DevMaster.net. I 
 took his third article and ported it to D. I was surprised to find 
 that the D code is approx. 4 times slower than C++.

 The raytracer_d renders in approx. 21 sec and the raytracer_cpp 
 renders in approx. 5 sec. I am using the DMD and DMC compilers on 
 Windows.

 How can the D code be made to run faster?

 Thanks,
   Bradley


Jan 17 2007
next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Dave wrote:
 Bradley Smith wrote:
 Thanks for all the suggestions. It helps, but not enough to make the D 
 code faster than the C++. It is now 2.6 times slower. The render times 
 are now approx. 13 sec for raytracer_d and approx. 5 sec for 
 raytracer_cpp.

 Here are the changes I've made. Attached is the new code.

   Call RegisterClass outside of assert. (Broken if -release used)
   Apply -release option. (Increases speed in an unknown way)
   Converted templates to regular functions. (Templates not being inlined)

Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.

I changed a bunch of parameters to inout after discovering that it made a difference for the Intersect method. It could be that I had the template parameters as inout at the time when getting rid of the templates seemed to make a difference. That's evil that inout disables inlining. Seems like inout params would be easier to inline than regular parameters, but I guess not.

--bb
Jan 17 2007
parent Dave <Dave_member pathlink.com> writes:
Bill Baxter wrote:
 Dave wrote:
 Bradley Smith wrote:
 Thanks for all the suggestions. It helps, but not enough to make the 
 D code faster than the C++. It is now 2.6 times slower. The render 
 times are now approx. 13 sec for raytracer_d and approx. 5 sec for 
 raytracer_cpp.

 Here are the changes I've made. Attached is the new code.

   Call RegisterClass outside of assert. (Broken if -release used)
   Apply -release option. (Increases speed in an unknown way)
   Converted templates to regular functions. (Templates not being 
 inlined)

Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.

I changed a bunch of parameters to inout after discovering that it made a difference for the Intersect method. It could be that I had the template parameters as inout at the time when getting rid of the templates seemed to make a difference. That's evil that inout disables inlining. Seems like inout params would be easier to inline than regular parameters, but I guess not.

I agree and have been wondering about that for some time. My guess is that it caused some type of bug early on and Walter didn't have the time to loop back and fix it.
 
 --bb

Jan 19 2007
prev sibling next sibling parent Bradley Smith <digitalmars-com baysmith.com> writes:
Dave wrote:
 Bradley Smith wrote:
   Converted templates to regular functions. (Templates not being inlined)

Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.

No, I'm not sure. I'm assuming based on the performance increase when they are manually inlined. It could very well be that template functions are inlined as much as regular functions, since the regular functions weren't being inlined either.

Thanks,
   Bradley
Jan 18 2007
prev sibling parent Bradley Smith <digitalmars-com baysmith.com> writes:
Dave wrote:
 Bradley Smith wrote:
 Thanks for all the suggestions. It helps, but not enough to make the D 
 code faster than the C++. It is now 2.6 times slower. The render times 
 are now approx. 13 sec for raytracer_d and approx. 5 sec for 
 raytracer_cpp.

 Here are the changes I've made. Attached is the new code.

   Call RegisterClass outside of assert. (Broken if -release used)
   Apply -release option. (Increases speed in an unknown way)
   Converted templates to regular functions. (Templates not being inlined)

Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.

You are correct. I have confirmed that the templates and regular functions are inlined. However, the way they are inlined appears to move much more data around than manual inlining does. Perhaps the extra data movement is the cause of the performance degradation when using the function or template. I can also confirm that using inout on a function's parameters will cause it to not be inlined.

Thanks,
   Bradley
Jan 19 2007
prev sibling parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
When comparing the generated assembly from the dmc exe with the one from 
dmd, I noticed that the D one had many "movsd; movsd; movsd;" sequences 
(obviously copying of one vector3 to another). I could not find these in 
the C version.

Maybe the DMC is better at register aliasing (or what's it called) than 
DMD? I mean, DMD's actually moving data around, where DMC simply changes 
the names of the data?

Only W. knows.

L.
Jan 18 2007
parent Bill Baxter <dnewsgroup billbaxter.com> writes:
Lionello Lunesu wrote:
 When comparing the generated assembly from the dmc exe with the one from 
 dmd, I noticed that the D one had many "movsd; movsd; movsd;" sequences 
 (obviously copying of one vector3 to another). I could not find these in 
 the C version.
 
 Maybe the DMC is better at register aliasing (or what's it called) than 
 DMD? I mean, DMD's actually moving data around, where DMC simply changes 
 the names of the data?
 
 Only W. knows.
 
 L.

Hmm. I hope he knows...and is paying attention to this thread.

--bb
Jan 18 2007
prev sibling next sibling parent reply "nobody_" <spam spam.spam> writes:
I think this thread is worth posting as a (D-performance) tutorial or 
something.
A lot of interesting performance issues have come up, of which most were 
unknown to me :)

What do you think? 
Jan 18 2007
parent Dave <Dave_member pathlink.com> writes:
nobody_ wrote:
 I think this thread is worth posting as a (D-performance) tutorial or 
 something.
 A lot of interesting performance issues have come up, of which most were 
 unknown to me :)
 

Hopefully the need for a tutorial on performance will soon be deprecated by better optimizations and a faster GC <g>
 What do you think? 
 
 

Jan 19 2007
prev sibling next sibling parent reply Bradley Smith <digitalmars-com baysmith.com> writes:

As Bill Baxter pointed out, I missed an optimization on version 2. The 
pass by reference optimization using the inout on the Intersect's Ray 
argument. I had applied inout only to the Raytrace's Ray argument.

The further optimization brings the following approx. timings:

         time     factor
   dmc    5 sec    1.0
   dmd    9 sec    1.8
   gdc    13 sec   2.6
   msvc   5 sec    1.0
   g++    doesn't compile

Version 3 is attached and has the following changes:
   Fixed compiling with gdc
   Use inout for Intersect's Ray argument

Thanks,
   Bradley

Bradley Smith wrote:
 Jacco Bikker wrote several raytracing articles on DevMaster.net. I took 
 his third article and ported it to D. I was surprised to find that the D 
 code is approx. 4 times slower than C++.
 
 The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders 
 in approx. 5 sec. I am using the DMD and DMC compilers on Windows.
 
 How can the D code be made to run faster?
 
 Thanks,
   Bradley
 

Jan 18 2007
parent reply Bradley Smith <digitalmars-com baysmith.com> writes:
Bradley Smith wrote:
 As Bill Baxter pointed out, I missed an optimization on version 2. The 
 pass by reference optimization using the inout on the Intersect's Ray 
 argument. I had applied inout only to the Raytrace's Ray argument.
 
 The further optimization brings the following approx. timings:
 
         time     factor
   dmc    5 sec    1.0
   dmd    9 sec    1.8
   gdc    13 sec   2.6

   msvc   5 sec    1.0
   g++    doesn't compile
 

Here is a correction to the gdc results. The wrong optimization flag was used. The build_d_gdc.bat should have "-O3" rather than "-O".
Jan 18 2007
parent reply Daniel Giddings <dgiddings bigworldtech.com> writes:

Try this version. In MSVC C++, float -> double, funcf -> func for the 
floating funcs (sqrtf, expf). It improves the time from 8.6 to 5.7 
seconds on my computer. The same process makes the D version slower.
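The float-to-double change described above amounts to swapping both the data type and the math routine (sqrtf to sqrt, expf to exp). A minimal sketch of what that looks like for a vector length follows; `LengthFloat` and `LengthDouble` are hypothetical helper names, not functions from the raytracer.

```cpp
#include <math.h>

// Single-precision version: float data with the float math routine.
inline float LengthFloat(float x, float y, float z) {
    return sqrtf(x * x + y * y + z * z);
}

// Double-precision version: double data with the double math routine.
inline double LengthDouble(double x, double y, double z) {
    return sqrt(x * x + y * y + z * z);
}
```

Which version is faster depends on the compiler and its floating-point code generation, which is consistent with the timings in this thread moving in different directions for different compilers.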

Bradley Smith wrote:
 
 Bradley Smith wrote:
 As Bill Baxter pointed out, I missed an optimization on version 2. The 
 pass by reference optimization using the inout on the Intersect's Ray 
 argument. I had applied inout only to the Raytrace's Ray argument.

 The further optimization brings the following approx. timings:

         time     factor
   dmc    5 sec    1.0
   dmd    9 sec    1.8
   gdc    13 sec   2.6

   msvc   5 sec    1.0
   g++    doesn't compile

Here is a correction to the gdc results. The wrong optimization flag was used. The build_d_gdc.bat should have "-O3" rather than "-O".

Jan 18 2007
next sibling parent Bradley Smith <digitalmars-com baysmith.com> writes:
Yes, I see that behavior too. Using doubles, here is what I get.

    dmc    6 sec
    dmd    19 sec
    gdc    17 sec
    msvc   4 sec

It is also interesting that the msvc gets better where the dmc gets 
worse. I wouldn't stake too much on it though, since these are 
approximate timings.

Thanks,
   Bradley

Daniel Giddings wrote:
 Try this version. In MSVC C++, float -> double, funcf -> func for the 
 floating funcs (sqrtf, expf). It improves the time from 8.6 to 5.7 
 seconds on my computer. The same process makes the D version slower.
 
 Bradley Smith wrote:
 Bradley Smith wrote:
 As Bill Baxter pointed out, I missed an optimization on version 2. 
 The pass by reference optimization using the inout on the Intersect's 
 Ray argument. I had applied inout only to the Raytrace's Ray argument.

 The further optimization brings the following approx. timings:

         time     factor
   dmc    5 sec    1.0
   dmd    9 sec    1.8
   gdc    13 sec   2.6

   msvc   5 sec    1.0
   g++    doesn't compile

Here is a correction to the gdc results. The wrong optimization flag was used. The build_d_gdc.bat should have "-O3" rather than "-O".


Jan 18 2007
prev sibling parent reply Lionello Lunesu <lio lunesu.remove.com> writes:
You must have made a mistake somewhere, because the rendered images from 
D and C++ are not the same!

The image from the D exe has a lone white pixel (also present in the 
'float' versions, both D and cpp), but that white pixel is gone in the 
cpp version (both dmc and msvc).

L.
Jan 19 2007
parent Lionello Lunesu <lio lunesu.remove.com> writes:
Lionello Lunesu wrote:
 You must have made a mistake somewhere, because the rendered images from 
 D and C++ are not the same!
 
 The image from the D exe has a lone white pixel (also present in the 
 'float' versions, both D and cpp), but that white pixel is gone in the 
 cpp version (both dmc and msvc).
 
 L.

Sorry, I thought the .d files were also using 'double', but they're not. This explains the different outcome.

L.
Jan 19 2007
prev sibling parent Bradley Smith <digitalmars-com baysmith.com> writes:

The Java implementation is also faster than the D one when run with the -server VM.

         time     factor   memory
   dmc    5 sec    1.0      5 MB
   java   8 sec    1.6     72 MB  (Java 1.6.0 -server)
   dmd    9 sec    1.8      5 MB
   java  19 sec    3.8     19 MB  (Java 1.6.0 -client)

However, Java uses much more memory.

All three implementations are in the attached zip.

Thanks,
   Bradley
Jan 21 2007