digitalmars.D.learn - Why is this D code slower than C++?
- Bradley Smith <digitalmars-com baysmith.com> Jan 16 2007
- Lionello Lunesu <lio lunesu.remove.com> Jan 16 2007
- Lionello Lunesu <lio lunesu.remove.com> Jan 17 2007
- Lionello Lunesu <lio lunesu.remove.com> Jan 17 2007
- Bill Baxter <dnewsgroup billbaxter.com> Jan 16 2007
- %u <u infearof.spm> Jan 17 2007
- Dave <Dave_member pathlink.com> Jan 17 2007
- Lionello Lunesu <lio lunesu.remove.com> Jan 18 2007
- %u <u infearof.spm> Jan 18 2007
- Dave <Dave_member pathlink.com> Jan 18 2007
- Bradley Smith <digitalmars-com baysmith.com> Jan 17 2007
- "nobody_" <spam spam.spam> Jan 17 2007
- BCS <BCS pathlink.com> Jan 17 2007
- %u <u infearof.spm> Jan 17 2007
- Tom S <h3r3tic remove.mat.uni.torun.pl> Jan 17 2007
- "nobody_" <spam spam.spam> Jan 17 2007
- Steve Horne <stephenwantshornenospam100 aol.com> Jan 17 2007
- Steve Horne <stephenwantshornenospam100 aol.com> Jan 17 2007
- Bill Baxter <dnewsgroup billbaxter.com> Jan 17 2007
- Dave <Dave_member pathlink.com> Jan 17 2007
- %u <u infearof.spm> Jan 18 2007
- Bill Baxter <dnewsgroup billbaxter.com> Jan 18 2007
- %u <u infearof.spm> Jan 18 2007
- Dave <Dave_member pathlink.com> Jan 18 2007
- Bradley Smith <digitalmars-com baysmith.com> Jan 18 2007
- Bill Baxter <dnewsgroup billbaxter.com> Jan 18 2007
- %u <u infearof.spm> Jan 18 2007
- Bradley Smith <digitalmars-com baysmith.com> Jan 18 2007
- %u <u infearof.spm> Jan 18 2007
- Bill Baxter <dnewsgroup billbaxter.com> Jan 18 2007
- Bradley Smith <digitalmars-com baysmith.com> Jan 18 2007
- Bill Baxter <dnewsgroup billbaxter.com> Jan 18 2007
- Bradley Smith <digitalmars-com baysmith.com> Jan 18 2007
- Dave <Dave_member pathlink.com> Jan 17 2007
- Bill Baxter <dnewsgroup billbaxter.com> Jan 17 2007
- Dave <Dave_member pathlink.com> Jan 19 2007
- Bradley Smith <digitalmars-com baysmith.com> Jan 18 2007
- Bradley Smith <digitalmars-com baysmith.com> Jan 19 2007
- Lionello Lunesu <lio lunesu.remove.com> Jan 18 2007
- Bill Baxter <dnewsgroup billbaxter.com> Jan 18 2007
- "nobody_" <spam spam.spam> Jan 18 2007
- Dave <Dave_member pathlink.com> Jan 19 2007
- Bradley Smith <digitalmars-com baysmith.com> Jan 18 2007
- Bradley Smith <digitalmars-com baysmith.com> Jan 18 2007
- Daniel Giddings <dgiddings bigworldtech.com> Jan 18 2007
- Bradley Smith <digitalmars-com baysmith.com> Jan 18 2007
- Lionello Lunesu <lio lunesu.remove.com> Jan 19 2007
- Lionello Lunesu <lio lunesu.remove.com> Jan 19 2007
- Bradley Smith <digitalmars-com baysmith.com> Jan 21 2007
Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++. The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows. How can the D code be made to run faster? Thanks, Bradley
Jan 16 2007
Bradley Smith wrote:Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++. The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows. How can the D code be made to run faster? Thanks, Bradley
Your build_d.bat is missing the -release flag? Don't know how much it will gain though. L.
Jan 16 2007
dmd -O -inline -release: 23.2 secs
dmc -o+speed: 7.6 secs
Averaged over 3 runs. This is without Bill's "inout" optimization, but with RegisterClass fixed. L.
Jan 17 2007
OK, ignore my previous post (it was with a debug build of Phobos).
dmd -O -inline -release: 17.7 secs
dmc -o+speed: 7.6 secs
Averaged over 3 runs. This is without Bill's "inout" optimization, but with RegisterClass fixed. I've also added a call to std.gc.disable() and replaced a "long" with an "int", but these changes did not have any effect. L.
Jan 17 2007
Bradley Smith wrote:Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++. The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows. How can the D code be made to run faster? Thanks, Bradley
That is pretty weird. I noticed that it doesn't work properly with -release added to the compiler flags. If I do add it, I just get a lot of flashing of my desktop icons when I run it, rather than a window popping up with a raytracer inside. Any idea why?
Anyway, after some tweaking of the D version I got it down to 15 sec, vs 10 sec for the C++ version on my machine. Mainly, the kind of thing I did was to make more things inout parameters so they don't get passed by value. Also, it looks like maybe your template math functions like DOT and LENGTH aren't getting inlined. Replacing those with the inline code in hotspots like the sphere intersect function sped things up.
Here's the version of Sphere.Intersect I ended up with:

    int Intersect( inout Ray a_Ray, inout float a_Dist )
    {
        vector3 v = a_Ray.origin;
        v -= m_Centre;
        //float b = -DOT!(float, vector3) ( v, a_Ray.direction );
        vector3 dir = a_Ray.direction;
        float b = -(v.x * dir.x + v.y * dir.y + v.z * dir.z);
        float det = (b * b) - (v.x*v.x+v.y*v.y+v.z*v.z) + m_SqRadius;
        int retval = MISS;
        if (det > 0)
        {
            det = sqrt( det );
            float i2 = b + det;
            if (i2 > 0)
            {
                float i1 = b - det;
                if (i1 < 0)
                {
                    if (i2 < a_Dist) { a_Dist = i2; return INPRIM; }
                }
                else
                {
                    if (i1 < a_Dist) { a_Dist = i1; return HIT; }
                }
            }
        }
        return retval;
    }

The inout on the Ray parameter and the other changes to this function alone change my D runtime from 22 sec to 15 sec. I also tried making similar changes to the C++ version, but they didn't seem to affect the runtime at all. --bb
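For comparison, here is a minimal C++ sketch of the same intersection test with the ray passed by const reference, which is roughly the C++ counterpart of the inout change described above. The struct layouts and the numeric values of HIT, MISS and INPRIM are assumptions made for illustration; they are not taken from the tutorial sources.

```cpp
#include <cassert>
#include <cmath>

// Assumed result codes; the values used in the actual raytracer may differ.
enum { INPRIM = -1, MISS = 0, HIT = 1 };

struct vector3 { float x, y, z; };
struct Ray { vector3 origin, direction; };

struct Sphere {
    vector3 m_Centre;
    float m_SqRadius;

    // a_Ray is taken by const reference, so no Ray is copied per call.
    // a_Dist is updated only when a closer intersection is found.
    int Intersect(const Ray& a_Ray, float& a_Dist) const {
        const vector3 v = { a_Ray.origin.x - m_Centre.x,
                            a_Ray.origin.y - m_Centre.y,
                            a_Ray.origin.z - m_Centre.z };
        const vector3& dir = a_Ray.direction;
        // Dot products written out by hand, as in the D version above.
        const float b = -(v.x * dir.x + v.y * dir.y + v.z * dir.z);
        float det = b * b - (v.x * v.x + v.y * v.y + v.z * v.z) + m_SqRadius;
        if (det <= 0) return MISS;
        det = std::sqrt(det);
        const float i2 = b + det;   // far intersection
        if (i2 <= 0) return MISS;   // sphere is entirely behind the ray
        const float i1 = b - det;   // near intersection
        if (i1 < 0) {               // ray starts inside the sphere
            if (i2 < a_Dist) { a_Dist = i2; return INPRIM; }
        } else if (i1 < a_Dist) {
            a_Dist = i1;
            return HIT;
        }
        return MISS;
    }
};
```

With a unit sphere centred at (0, 0, 5), a ray from the origin along +z hits at distance 4, and a ray along -z misses and leaves the distance untouched.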
Jan 16 2007
== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s article: I noticed that it doesn't work properly with -release added to the compiler flags.
[...] an assertion. On my machine the -release flag brings another 25%.
Bill Baxter wrote: The inout on the Ray parameter and the other changes to this function alone change my D runtime from 22 sec to 15 sec.
The compiler should be smart enough to detect that the Ray parameter is not used as an lvalue and can therefore be passed by reference.
Jan 17 2007
%u wrote:== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s articleI noticed that it doesn't work properly with -release add to the compiler flags.
an assertion. On my machine the -release flag brings another 25%.The inout on the Ray parameter and the other changes to this function alone change my D runtime from 22 sec to 15 sec.
The compiler should be smart enough to detect, that the Ray parameter is not used as an lvalue and thus can be replaced by a reference.
In that respect I'd like to see 'byref' be a synonym for 'inout' as well, so we can tweak those things w/o relying on the compiler, or by using a keyword (inout) that doesn't really fit the situation in which it's being used.
Jan 17 2007
%u wrote:== Quote from Bill Baxter (dnewsgroup billbaxter.com)'s articleI noticed that it doesn't work properly with -release add to the compiler flags.
an assertion. On my machine the -release flag brings another 25%.The inout on the Ray parameter and the other changes to this function alone change my D runtime from 22 sec to 15 sec.
The compiler should be smart enough to detect, that the Ray parameter is not used as an lvalue and thus can be replaced by a reference.
No, it can't.. Passing a struct by ref will result in unexpected behavior if it changes in some other thread. As always, the default should be safe no matter what, and that means copying the struct's contents. I guess a new modifier like "byref" is the only option.. L.
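Lionello's objection concerns another thread modifying the struct, but the same hazard can be shown in a single thread through aliasing. The following is a minimal C++ sketch of that idea, not code from the raytracer: the function names are hypothetical, and the aliasing scenario stands in for the threading scenario he describes.

```cpp
#include <cassert>

struct vector3 { float x, y, z; };

// By value: 'in' is a private copy, so writing through 'out'
// cannot change what this function reads from 'in'.
float clear_then_read_by_value(vector3 in, vector3& out) {
    out.x = 0.0f;
    return in.x;
}

// By reference: if 'in' and 'out' refer to the same object,
// the write through 'out' is visible when 'in' is read.
float clear_then_read_by_ref(const vector3& in, vector3& out) {
    out.x = 0.0f;
    return in.x;
}
```

Called as `clear_then_read_by_value(a, a)` the first function returns the original `a.x`, while `clear_then_read_by_ref(a, a)` returns 0. A compiler that silently switched from copying to passing by reference would change the result, which is why the choice is left to the programmer.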
Jan 18 2007
Lionello Lunesu Wrote:No, it can't.. Passing a struct by ref will result in unexpected behavior if it changes in some other thread. As always, the default should be safe no matter what, and that means copying the struct's contents.
I guess a new modifier like "byref" is the only option..
Jan 18 2007
%u wrote:Lionello Lunesu Wrote:No, it can't.. Passing a struct by ref will result in unexpected behavior if it changes in some other thread. As always, the default should be safe no matter what, and that means copying the struct's contents.
I guess a new modifier like "byref" is the only option..
That's Ok as long as all D compilers will most likely rightly determine whether or not to pass the const byref as an optimization. Since this is probably not realistic, I think something like 'byref' is called for. There's been a great debate as to whether or not 'const' is actually enforceable, and unless it is, it would not really be of any value as an optimizer hint (like const can't be counted on as an optimizer hint for C++).
Jan 18 2007
Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
Here are the changes I've made. Attached is the new code.
- Call RegisterClass outside of assert. (Broken if -release used)
- Apply -release option. (Increases speed in an unknown way)
- Converted templates to regular functions. (Templates not being inlined)
- Manually inlined DOT function. (Function not being inlined)
Any other suggestions? Thanks, Bradley
Bradley Smith wrote: Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++. The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows. How can the D code be made to run faster? Thanks, Bradley
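The RegisterClass problem in the list of changes is the classic "side effect inside an assert" bug: when asserts are compiled out (DMD's -release, or NDEBUG in C and C++), the asserted expression is never evaluated. Below is a minimal C++ sketch of the pattern. `register_window_class` is a hypothetical stand-in for the Win32 call, and `RELEASE_ASSERT` is an illustrative macro that mimics what an assert turns into in a release build; neither name comes from the posted code.

```cpp
#include <cassert>

// Hypothetical stand-in for the Win32 RegisterClass call.
// It counts how often it actually runs.
static int g_register_calls = 0;
bool register_window_class() {
    ++g_register_calls;
    return true;
}

// Mimics an assert in a release build: the expression is discarded
// without being evaluated.
#define RELEASE_ASSERT(expr) ((void)0)

// Broken: the registration only happens in debug builds.
void init_broken() {
    RELEASE_ASSERT(register_window_class());
}

// Fixed: do the work unconditionally, assert only on the result.
bool init_fixed() {
    bool ok = register_window_class();
    RELEASE_ASSERT(ok);
    return ok;
}
```

After `init_broken()` the call counter is still 0, which matches the symptom reported earlier in the thread of no window appearing with -release. After `init_fixed()` it is 1.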
Jan 17 2007
I really hope you'll get it faster than the C++ variant. Might -profile shed some light? Or maybe I lurk here in learn for a reason :D
Bradley Smith wrote: Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp. Here are the changes I've made. Attached is the new code. Call RegisterClass outside of assert. (Broken if -release used) Apply -release option. (Increases speed in an unknown way) Converted templates to regular functions. (Templates not being inlined) Manually inlined DOT function. (Function not being inlined) Any other suggestions?
Jan 17 2007
nobody_ wrote:I really hope you'll get it faster than the C++ variant. Might -profile shed some light? Or maybe I lurk here in learn for a reason :DThanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp. Here are the changes I've made. Attached is the new code. Call RegisterClass outside of assert. (Broken if -release used) Apply -release option. (Increases speed in an unknown way) Converted templates to regular functions. (Templates not being inlined) Manually inlined DOT function. (Function not being inlined) Any other suggestions?
I ran it with -profile and it takes about 25 min. Here's the log: http://www.webpages.uidaho.edu/~shro8822/trace.log
Jan 17 2007
BCS Wrote:here's the log http://www.webpages.uidaho.edu/~shro8822/trace.log
It looks like the use of foreach is what drags the performance down. Maybe it's due to the numerous delegate calls.
Jan 17 2007
%u wrote:BCS Wrote:here's the log http://www.webpages.uidaho.edu/~shro8822/trace.log
That looks like the use of foreach lets the performance go down. Maybe its due to the numerous calls of delegates.
No, it shows foreach there because a lot of stuff got inlined and it's only seen by the profiler as the foreach's body. In my experience, more meaningful results can be obtained if -profile is used without -inline. -- Tomasz Stachowiak
Jan 17 2007
BCS wrote: I ran it with -profile and it takes about 25 min.
Talk about overhead :) cpp took about 7 minutes (log attached)
BCS wrote: here's the log http://www.webpages.uidaho.edu/~shro8822/trace.log
Jan 17 2007
On Wed, 17 Jan 2007 11:18:10 -0800, Bradley Smith <digitalmars-com baysmith.com> wrote:Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
...Any other suggestions?
I haven't actually looked at the code, but I'll take a guess anyway. Raytracing is heavy on the floating point math. As Walter Bright acknowledges, the DMD compiler does not handle the optimisation of float arithmetic as well as some C++ compilers.
You could try the GNU D compiler - GDC. Since it is using the standard GNU compiler suite backend code generator, it will probably handle the optimisation better.
A second option is to split out some key inner-loop calculations and handle them in C, using D for the less performance-sensitive code. Calling C code from D is easy enough, though calling C++ is more of a hassle. This hack could be considered temporary, as the D float performance will no doubt be improved in time.
Alternatively, if you don't mind losing portability, you could try using inline assembler for those key inner-loop calculations. If you're a real speed freak, you might even try using SIMD instructions to get 4 float calculations per instruction (and IIRC most SIMD instructions complete in a single clock cycle these days). The down side to that would be lower floating point precision, but for raytracing I wouldn't expect that to be a big deal.
-- Remove 'wants' and 'nospam' from e-mail.
Jan 17 2007
On Wed, 17 Jan 2007 22:34:31 +0000, Steve Horne <stephenwantshornenospam100 aol.com> wrote:On Wed, 17 Jan 2007 11:18:10 -0800, Bradley Smith <digitalmars-com baysmith.com> wrote:Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp.
...Any other suggestions?
I haven't actually looked at the code, but I'll take a guess anyway. Raytracing is heavy on the floating point math. As Walter Bright acknowledges, the DMD compiler does not handle the optimisation of float arithmetic as well as some C++ compilers.
On second thoughts, if you're comparing with the DMC compiler for C++, floating point math performance seems a less likely issue. It seems odd that there's such a difference between the DMD and DMC compilers. You'd think the DMD compiler would use much the same back-end code generation that DMC does. -- Remove 'wants' and 'nospam' from e-mail.
Jan 17 2007
Bradley Smith wrote:Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp. Here are the changes I've made. Attached is the new code. Call RegisterClass outside of assert. (Broken if -release used) Apply -release option. (Increases speed in an unknown way) Converted templates to regular functions. (Templates not being inlined) Manually inlined DOT function. (Function not being inlined)
You left out changing Intersect's Ray argument to be inout. And generally all Ray (and possibly vector3) parameters should be inout to avoid the cost of copying them on the stack.
Also, converting vector expressions like
    vector3 v = a_Ray.origin - m_Centre;
to
    vector3 v = a_Ray.origin; v -= m_Centre;
makes a difference. That one-line change in the Sphere.Intersect routine alone is the difference between 14.3 and 12.2 sec on my machine. Interestingly, the same sort of transformation to the C++ code didn't seem to make much difference. It could be related in part to the C++ vector parameters on the operators all taking const vector& (references) vs the D ones being just plain vector3. Changing all the operators in the D version to inout may help speed too.
With those changes, on my Intel Xeon 3.6GHz CPU the run times are about 10.1 sec (C++) vs 12.2 sec (D). D is still not as fast as the C++, but close. --bb
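The two forms Bill contrasts compute the same value; they differ only in whether a temporary is created and returned from a binary operator. A minimal C++ sketch of both forms follows, with operators taking const references as he describes for the C++ version. The `vector3` definition and the two helper function names are assumptions for illustration, not the tutorial's code.

```cpp
#include <cassert>

struct vector3 {
    float x, y, z;

    // In-place subtraction: modifies this object, creates no temporary.
    vector3& operator-=(const vector3& rhs) {
        x -= rhs.x; y -= rhs.y; z -= rhs.z;
        return *this;
    }
};

// Binary subtraction: both operands by const reference, result returned
// by value as a new object.
inline vector3 operator-(const vector3& a, const vector3& b) {
    return vector3{ a.x - b.x, a.y - b.y, a.z - b.z };
}

// Form 1: one expression, result comes back from operator-.
vector3 diff_with_temporary(const vector3& origin, const vector3& centre) {
    vector3 v = origin - centre;
    return v;
}

// Form 2: copy first, then subtract in place.
vector3 diff_in_place(const vector3& origin, const vector3& centre) {
    vector3 v = origin;
    v -= centre;
    return v;
}
```

A C++ compiler is generally able to construct the returned value directly in the caller's storage, which may be part of why the rewrite made little difference on the C++ side. Whether DMD did the same for struct returns at the time is not established in this thread.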
Jan 17 2007
Bill Baxter wrote:Bradley Smith wrote:Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp. Here are the changes I've made. Attached is the new code. Call RegisterClass outside of assert. (Broken if -release used) Apply -release option. (Increases speed in an unknown way) Converted templates to regular functions. (Templates not being inlined) Manually inlined DOT function. (Function not being inlined)
You left out changing Intersect's Ray argument to be inout. And generally all Ray (and possibly vector3 parameters) to be inout to avoid the cost of copying them on the stack. Also converting vector expressions like vector3 v = a_Ray.origin - m_Centre; to vector3 v = a_Ray.origin; v -= m_Centre; makes a difference. Changing that one line in the Sphere.Intersect routine changes my runtime from 12.2 to 14.3 sec. Interestingly the same sort of transformation to the C++ code didn't seem to make much difference. It could be related in part to the C++ vector parameters on the operators all taking const vector& (references) vs the D ones being just plain vector3. Chaging all the operators in the D version to inout may help speed too. With those changes on my Intel Xeon 3.6GHz CPU the run times are about 10.1 sec vs 12.2 sec. D still not as fast as the C++, but close. --bb
One more thing to try (now that auto classes are allocated on the stack) is to convert the structs to classes and pass those around. Of course you can't return those from things like opSub(), so you'd have to always use opXxxAssign(), etc. I haven't gone over the code in detail, so maybe this is not really feasible but maybe worth a shot? IIRC, one of the problems with using 'inout' as function params. is that those are excluded from consideration for in-lining with the current D compiler front-end.
Jan 17 2007
Bill Baxter Wrote:D still not as fast as the C++, but close.
I refuse to analyze this any further. On comparing the implementations of Primary, I noticed that the OP has introduced a constructor which executes "new Material". There is no "new" in the cpp-version of Primary but a "SetMaterial" function. On deleting the new expression in the D-version, an exception was raised on executing the newly compiled binary. Astonishingly, grepping over the .cpp and .h files with Agent Ransack delivered no calls of "SetMaterial", but "GetMaterial" is called, which uses the unset "Material" pointer. :-(
Conclusion: at least one of the following is true
1) I have near to no ability to understand c++
2) the c++-version is lucky to run at all
In case of 2) the OP has silently changed the algorithm on porting to D.
Jan 18 2007
%u wrote:Bill Baxter Wrote:D still not as fast as the C++, but close.
I refuse to analyze this any further. On comparing the implementations of Primary, I noticed, that the OP has introduced a constructor which executes "new Material". There is no "new" in the cpp-version of Primary but a "SetMaterial" function. On deleting the new expression in the D-version an exception was raised on executing the newly compiled binary. Astonishingly grepping over the .cpp and .h -files with agent ransack no calls of "SetMaterial" were delivered---but "GetMaterial" is called---which uses the unset "Material" pointer. :-( Conclusion: at least one of the following is true 1) I have near to no ability to understand c++ 2) the c++-version is lucky to run at all In case of 2) the OP has silently changed the algorithm on porting to D.
It's case 1) I'm afraid. :-) Material is a by-value member of Primitive in the C++ version. This means it acts more like a D struct than a D class. GetMaterial calls return a pointer to the Material that's part of the class, and it will have been initialized implicitly by the Primitive constructor using whatever Material's default constructor does. So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct. --bb
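A minimal C++ sketch of the layout Bill describes: the Material lives inside the Primitive by value, is default-constructed together with it, and GetMaterial hands out a pointer to that embedded object. The member names and the default value here are assumptions for illustration, not the tutorial's actual fields.

```cpp
#include <cassert>

class Material {
public:
    // Assumed default; the real constructor may set other fields and values.
    Material() : m_Diff(0.25f) {}
    void SetDiffuse(float a_Diff) { m_Diff = a_Diff; }
    float GetDiffuse() const { return m_Diff; }
private:
    float m_Diff;
};

class Primitive {
public:
    // No 'new' and no SetMaterial call needed: m_Material is constructed
    // implicitly when the Primitive is constructed.
    Material* GetMaterial() { return &m_Material; }
private:
    Material m_Material;  // by-value member, not a pointer
};
```

Changes made through the returned pointer affect the Primitive's own Material, so the pointer is never "unset" even though nothing calls SetMaterial.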
Jan 18 2007
Bill Baxter Wrote:So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
This shows, however, that programmers still are not following engineering principles: no technical documentation of the port is given and no one complains. Instead, several people are eagerly searching for flaws in the reference implementation of D, for which there is also no technical documentation :-(
Jan 18 2007
%u wrote:Bill Baxter Wrote:So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
This shows however, that programmers still are not following engeering principles: no technical documentation of the port is given and no one complains. Instead several people are eager searching flaws in the reference implementation of D for which there is also no technical documentation :-(
Let's assume that the OP was earnestly trying to make the C++ and D code comparable... If so, then this exercise did point out some areas where D needs attention. In the final analysis, it's "good faith" ports like these that are going to satisfy whether or not D "is as fast or faster" than C++, and in many cases, whether or not people will make the switch. If it requires a lot of code modifications over and above a simple port to make D comparable in performance, people will shy away from D. C++ is still being used for new development in large part because of great performance, and the language constructs ("expressibility") that make that possible. One area where this keeps popping up in D is being able to pass structs 'byref' w/o necessarily using 'inout'.
Jan 18 2007
Dave wrote:%u wrote:Bill Baxter Wrote:So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
This shows however, that programmers still are not following engeering principles: no technical documentation of the port is given and no one complains. Instead several people are eager searching flaws in the reference implementation of D for which there is also no technical documentation :-(
Let's assume that the OP was earnestly trying to make the C++ and D code comparable... If so, then this exercise did point out some areas where D needs attention. In the final analysis, it's "good faith" ports like these that are going to satisfy whether or not D "is as fast or faster" than C++, and in many cases, whether or not people will make the switch. If it requires a lot of code modifications over and above a simple port to make D comparable in performance, people will shy away from D.
Thanks for defending me, Dave. You are correct in assuming that I am trying to make the C++ and D code comparable. I'm not trying to sabotage the D effort. In fact, I would very much like to see the D code perform significantly better than C++. I'm just trying to learn how to write high-performance D code. Thanks, Bradley
Jan 18 2007
Bradley Smith wrote:Dave wrote:%u wrote:Bill Baxter Wrote:So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
This shows however, that programmers still are not following engeering principles: no technical documentation of the port is given and no one complains. Instead several people are eager searching flaws in the reference implementation of D for which there is also no technical documentation :-(
Let's assume that the OP was earnestly trying to make the C++ and D code comparable... If so, then this exercise did point out some areas where D needs attention. In the final analysis, it's "good faith" ports like these that are going to satisfy whether or not D "is as fast or faster" than C++, and in many cases, whether or not people will make the switch. If it requires a lot of code modifications over and above a simple port to make D comparable in performance, people will shy away from D.
Thanks for defending me, Dave. You are correct in assuming that I am trying to make the C++ and D code comparable. I'm not trying to sabotage the D effort. In fact, I would very much like to see the D code perform significantly better than C++. I'm just trying to learn how to write high-performance D code. Thanks, Bradley
I think this was a great little benchmark you posted. I hope Walter takes some interest in this too, because he's consistently responded to performance questions with "I bet it'll be the same if you compile with DMC and DMD". But now at last we have a real-world kind of benchmark with which to test that assertion. The answer appears to be negative at the moment, but just as with bugs, you can't fix it if you can't reproduce the problem. And you've given us a very nice repro case. --bb
Jan 18 2007
Bradley Smith Wrote:Thanks for defending me, Dave.
victim to a known source of errors.
Jan 18 2007
%u wrote:Bill Baxter Wrote:So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
This shows however, that programmers still are not following engeering principles: no technical documentation of the port is given and no one complains. Instead several people are eager searching flaws in the reference implementation of D for which there is also no technical documentation :-(
What technical documentation would be proper? What would it contain?
Jan 18 2007
Bradley Smith Wrote:What technical documentation would be proper? What would it contain?
As always, that depends on the requirements of the presumed readers. If you are able to change your position from the view of the porter to the view of a verifier or freshly introduced maintainer of the port, then you will have an impression of what you would want to look at first. It is a pity, as it stands, that the question of what the technical documentation should contain arises at all.
For example the answer you gave to Bill Baxter:
| Because in the C++, GetMaterial returns a pointer. Since other
| objects can use the pointer to change the value of the Material
| contained within a Primitive, the same behavior was used in the D
| code by using a class. If a struct had been used, a copy of Material
| would be returned, and changing the Material would have no effect on
| the Primitive.
| Also, because GetMaterial is called very often, I assume that making
| lots of copies of it would decrease performance. Presumably, that
| is why the C++ code returns a pointer.
would belong in such documentation, as would any other decision that was made during the port. For example, I found a ".dup" in the D-version where there was no copying in the cpp-version. The question arises immediately whether this was done with intent or by accident. Without the redundancy provided by technical documentation, a careful analysis of the necessity of these four characters has to be undertaken.
Jan 18 2007
%u wrote:Bradley Smith Wrote:What technical documentation would be proper? What would it contain?
As always such depends on the requirements of the presumed readers. If you are able to change your position from the view of the porter to the view of a verifier or freshly introduced maintainer of the port, then you will have an impression of what you would want to look at first. It is a pity as it stands, that the question for the content of the technical documentation raises at all.
Dude, it's a toy raytracer ported from some free code someone posted to a website somewhere. Why should it come with gobs of documentation? But anyway, the original code was part of a series of tutorials. I think the version Bradley posted was probably from this installment: http://www.devmaster.net/articles/raytracing_series/part3.php As the series goes on, the author adds more and more fancy features to the raytracer. Anyway, the tutorials are already far more documentation than you'll find for most free code out in the wild. --bb
Jan 18 2007
Bill Baxter wrote:%u wrote:Bill Baxter Wrote:D still not as fast as the C++, but close.
I refuse to analyze this any further. On comparing the implementations of Primary, I noticed, that the OP has introduced a constructor which executes "new Material". There is no "new" in the cpp-version of Primary but a "SetMaterial" function. On deleting the new expression in the D-version an exception was raised on executing the newly compiled binary. Astonishingly grepping over the .cpp and .h -files with agent ransack no calls of "SetMaterial" were delivered---but "GetMaterial" is called---which uses the unset "Material" pointer. :-( Conclusion: at least one of the following is true 1) I have near to no ability to understand c++ 2) the c++-version is lucky to run at all In case of 2) the OP has silently changed the algorithm on porting to D.
It's case 1) I'm afraid. :-) Material is a by-value member of Primitive in the C++ version. This means it acts more like a D struct than a D class. GetMaterial calls return a pointer to the Material that's part of the class, and it will have been initialized implicitly by the Primitive constructor using whatever Material's default constructor does. So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
Because in the C++, GetMaterial returns a pointer. Since other objects can use the pointer to change the value of the Material contained within a Primitive, the same behavior was used in the D code by using a class. If a struct had been used, a copy of Material would be returned, and changing the Material would have no effect on the Primitive. Also, because GetMaterial is called very often, I assume that making lots of copies of it would decrease performance. Presumably, that is why the C++ code returns a pointer. Thanks, Bradley
Jan 18 2007
Bradley Smith wrote:Bill Baxter wrote:%u wrote:Bill Baxter Wrote:
Material is a by-value member of Primitive in the C++ version. This means it acts more like a D struct than a D class. GetMaterial calls return a pointer to the Material that's part of the class, and it will have been initialized implicitly by the Primitive constructor using whatever Material's default constructor does. So the C++ code is ok. But it's not clear why Material became a class in the D version rather than a struct.
Because in the C++, GetMaterial returns a pointer. Since other objects can use the pointer to change the value of the Material contained within a Primitive, the same behavior was used in the D code by using a class. If a struct had been used, a copy of Material would be returned, and changing the Material would have no effect on the Primitive. Also, because GetMaterial is called very often, I assume that making lots of copies of it would decrease performance. Presumably, that is why the C++ code returns a pointer.
You can return pointers in D too. But anyway, I don't think the change from by-value class in C++ to a by-reference class in D made any difference in the runtime. I wasn't saying that it was wrong that you changed Material to a D class or anything. It's a valid approach and certainly more D-ish than returning a pointer to a struct. --bb
Jan 18 2007
Bill Baxter wrote:You left out changing Intersect's Ray argument to be inout. And generally all Ray (and possibly vector3 parameters) to be inout to avoid the cost of copying them on the stack.
Sorry Bill, that was unintentional. I changed the Raytrace's Ray argument, but forgot the Intersect's Ray argument.
Bill Baxter wrote: Also converting vector expressions like vector3 v = a_Ray.origin - m_Centre; to vector3 v = a_Ray.origin; v -= m_Centre; makes a difference. Changing that one line in the Sphere.Intersect routine changes my runtime from 12.2 to 14.3 sec.
That helps too. The time is now down to approx. 10 sec. (2 times slower than C++).
Bill Baxter wrote: Interestingly the same sort of transformation to the C++ code didn't seem to make much difference. It could be related in part to the C++ vector parameters on the operators all taking const vector& (references) vs the D ones being just plain vector3. Changing all the operators in the D version to inout may help speed too.
I've tried this "temporary value elimination" optimization in other areas of the code, but the effect is minimal. Based on my experience with Java, I think C++ is very good at using return value optimization to eliminate temporary objects. Thanks, Bradley
Jan 18 2007
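The "temporary value elimination" rewrite discussed above can be sketched in C++ roughly as follows. This is only an illustration under assumptions: the `vector3` here is a cut-down stand-in, and `DiffWithTemporary` and `DiffInPlace` are hypothetical helper names, not functions from the raytracer.

```cpp
// Cut-down stand-in for the raytracer's vector type.
struct vector3 {
    float x, y, z;

    // Compound subtraction modifies the left operand in place,
    // so no temporary vector3 is created.
    vector3& operator-=(const vector3& v) {
        x -= v.x; y -= v.y; z -= v.z;
        return *this;
    }
};

// Binary subtraction takes both operands by const reference
// and returns a newly built vector3.
inline vector3 operator-(const vector3& a, const vector3& b) {
    return vector3{a.x - b.x, a.y - b.y, a.z - b.z};
}

// Form 1: vector3 v = a_Ray.origin - m_Centre;
inline vector3 DiffWithTemporary(const vector3& origin, const vector3& centre) {
    vector3 v = origin - centre;
    return v;
}

// Form 2: vector3 v = a_Ray.origin; v -= m_Centre;
inline vector3 DiffInPlace(const vector3& origin, const vector3& centre) {
    vector3 v = origin;
    v -= centre;
    return v;
}
```

Both forms compute the same vector. Whether one is faster depends on how well the compiler removes the temporary, which is what the timings in this thread are probing.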
Bradley Smith wrote:Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp. Here are the changes I've made. Attached is the new code. Call RegisterClass outside of assert. (Broken if -release used) Apply -release option. (Increases speed in an unknown way) Converted templates to regular functions. (Templates not being inlined)
Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
Bradley Smith wrote:Manually inlined DOT function. (Function not being inlined) Any other suggestions? Thanks, Bradley
Bradley Smith wrote:Jacco Bikker wrote several raytracing articles on DevMaster.net. I took his third article and ported it to D. I was surprised to find that the D code is approx. 4 times slower than C++. The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders in approx. 5 sec. I am using the DMD and DMC compilers on Windows. How can the D code be made to run faster? Thanks, Bradley
Jan 17 2007
Dave wrote:Bradley Smith wrote:Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp. Here are the changes I've made. Attached is the new code. Call RegisterClass outside of assert. (Broken if -release used) Apply -release option. (Increases speed in an unknown way) Converted templates to regular functions. (Templates not being inlined)
Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
I changed a bunch of parameters to inout after discovering that it made a difference for the Intersect method. It could be that I had the template parameters as inout at the time when getting rid of the templates seemed to make a difference. That's evil that inout disables inlining. Seems like inout params would be easier to inline than regular parameters, but I guess not. --bb
Jan 17 2007
Bill Baxter wrote:Dave wrote:Bradley Smith wrote:Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp. Here are the changes I've made. Attached is the new code. Call RegisterClass outside of assert. (Broken if -release used) Apply -release option. (Increases speed in an unknown way) Converted templates to regular functions. (Templates not being inlined)
Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
I changed a bunch of parameters to inout after discovering that it made a difference for the Intersect method. It could be that I had the template parameters as inout at the time when getting rid of the templates seemed to make a difference. That's evil that inout disables inlining. Seems like inout params would be easier to inline than regular parameters, but I guess not. --bb
I agree and have been wondering about that for some time - my guess is that it caused some type of bug early on and Walter didn't have the time to loop back and fix it.
Jan 19 2007
Dave wrote:Bradley Smith wrote:Converted templates to regular functions. (Templates not being inlined)
Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
No, I'm not sure. I'm assuming based on the performance increase when they are manually inlined. It could very well be that template functions are inlined as much as regular functions, since the regular functions weren't being inlined either. Thanks, Bradley
Jan 18 2007
Dave wrote:Bradley Smith wrote:Thanks for all the suggestions. It helps, but not enough to make the D code faster than the C++. It is now 2.6 times slower. The render times are now approx. 13 sec for raytracer_d and approx. 5 sec for raytracer_cpp. Here are the changes I've made. Attached is the new code. Call RegisterClass outside of assert. (Broken if -release used) Apply -release option. (Increases speed in an unknown way) Converted templates to regular functions. (Templates not being inlined)
Are you sure? I know templates can be/are inlined, and I guess I haven't noticed anywhere they aren't where I'd expect a regularly defined function to be inlined.
You are correct. I have confirmed that the templates and regular functions are inlined. However, the way they are inlined appears to move much more data around than manual inlining does. Perhaps the extra data movement is the cause of the performance degradation when using the function or template. I can also confirm that using inout on the function parameters will cause it to not be inlined. Thanks, Bradley
Jan 19 2007
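For readers unsure what "manually inlining DOT" means in practice, here is a small C++ sketch. It assumes DOT is a plain three-component dot product; `Dot`, `LengthSquaredViaCall` and `LengthSquaredManual` are hypothetical names for this illustration and do not come from the attached code.

```cpp
// Cut-down stand-in for the raytracer's vector type.
struct vector3 {
    float x, y, z;
};

// Dot product written as a small function the compiler may or may not inline.
inline float Dot(const vector3& a, const vector3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Version 1: relies on the compiler to inline the call to Dot.
inline float LengthSquaredViaCall(const vector3& v) {
    return Dot(v, v);
}

// Version 2: the body of Dot pasted in by hand ("manual inlining").
inline float LengthSquaredManual(const vector3& v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}
```

The two versions return the same value. Any speed difference comes only from the code the compiler generates for the call, for example extra copies of the arguments.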
When comparing the generated assembly from the dmc exe with the one from dmd, I noticed that the D one had many "movsd; movsd; movsd;" sequences (obviously copying of one vector3 to another). I could not find these in the C version. Maybe the DMC is better at register aliasing (or what's it called) than DMD? I mean, DMD's actually moving data around, where DMC simply changes the names of the data? Only W. knows. L.
Jan 18 2007
Lionello Lunesu wrote:When comparing the generated assembly from the dmc exe with the one from dmd, I noticed that the D one had many "movsd; movsd; movsd;" sequences (obviously copying of one vector3 to another). I could not find these in the C version. Maybe the DMC is better at register aliasing (or what's it called) than DMD? I mean, DMD's actually moving data around, where DMC simply changes the names of the data? Only W. knows. L.
Hmm. I hope he knows...and is paying attention to this thread. --bb
Jan 18 2007
I think this thread is worth posting as a (D-performance) tutorial or something. A lot of interesting performance issues have come up, of which most were unknown to me :) What do you think?
Jan 18 2007
nobody_ wrote:I think this thread is worth posting as a (D-performance) tutorial or something. A lot of interesting performance issues have come up, of which most were unknown to me :)
Hopefully the need for a tutorial on performance will soon be deprecated by better optimizations and a faster GC <g>
nobody_ wrote:What do you think?
Jan 19 2007
As Bill Baxter pointed out, I missed an optimization in version 2: the
pass-by-reference optimization using inout on Intersect's Ray argument.
I had applied inout only to Raytrace's Ray argument.
The further optimization brings the following approx. timings:
compiler  time             factor
dmc       5 sec            1.0
dmd       9 sec            1.8
gdc       13 sec           2.6
msvc      5 sec            1.0
g++       doesn't compile
Version 3 is attached and has the following changes:
Fixed compiling with gdc
Use inout for Intersect's Ray argument
Thanks,
Bradley
Bradley Smith wrote:
Jacco Bikker wrote several raytracing articles on DevMaster.net. I took
his third article and ported it to D. I was surprised to find that the D
code is approx. 4 times slower than C++.
The raytracer_d renders in approx. 21 sec and the raytracer_cpp renders
in approx. 5 sec. I am using the DMD and DMC compilers on Windows.
How can the D code be made to run faster?
Thanks,
Bradley
Jan 18 2007
Bradley Smith wrote:As Bill Baxter pointed out, I missed an optimization on version 2. The pass by reference optimization using the inout on the Intersect's Ray argument. I had applied inout only to the Raytrace's Ray argument. The further optimization brings the following approx. timings: time factor dmc 5 sec 1.0 dmd 9 sec 1.8 gdc 13 sec 2.6
msvc 5 sec 1.0 g++ doesn't compile
Here is a correction to the gdc results. The wrong optimization flag was used. The build_d_gdc.bat should have "-O3" rather than "-O".
Jan 18 2007
Try this version. In MSVC C++, float -> double, funcf -> func for the floating funcs (sqrtf, expf). It improves the time from 8.6 to 5.7 seconds on my computer. The same process makes the D version slower.
Bradley Smith wrote:Bradley Smith wrote:As Bill Baxter pointed out, I missed an optimization on version 2. The pass by reference optimization using the inout on the Intersect's Ray argument. I had applied inout only to the Raytrace's Ray argument. The further optimization brings the following approx. timings: time factor dmc 5 sec 1.0 dmd 9 sec 1.8 gdc 13 sec 2.6
msvc 5 sec 1.0 g++ doesn't compile
Here is a correction to the gdc results. The wrong optimization flag was used. The build_d_gdc.bat should have "-O3" rather than "-O".
Jan 18 2007
Yes, I see that behavior too. Using doubles, here is what I get.
compiler  time
dmc       6 sec
dmd       19 sec
gdc       17 sec
msvc      4 sec
It is also interesting that msvc gets better where dmc gets
worse. I wouldn't stake too much on it though, since these are
approximate timings.
Thanks,
Bradley
Daniel Giddings wrote:
Try this version. In MSVC C++, float -> double, funcf -> func for the
floating funcs (sqrtf, expf). It improves the time from 8.6 to 5.7
seconds on my computer. The same process makes the D version slower.
Bradley Smith wrote:
Bradley Smith wrote:
As Bill Baxter pointed out, I missed an optimization on version 2.
The pass by reference optimization using the inout on the Intersect's
Ray argument. I had applied inout only to the Raytrace's Ray argument.
The further optimization brings the following approx. timings:
compiler  time             factor
dmc       5 sec            1.0
dmd       9 sec            1.8
gdc       13 sec           2.6
msvc      5 sec            1.0
g++       doesn't compile
Here is a correction to the gdc results. The wrong optimization flag
was used. The build_d_gdc.bat should have "-O3" rather than "-O".
Jan 18 2007
You must have made a mistake somewhere, because the rendered images from D and C++ are not the same! The image from the D exe has a lone white pixel (also present in the 'float' versions, both D and cpp), but that white pixel is gone in the cpp version (both dmc and msvc). L.
Jan 19 2007
Lionello Lunesu wrote:You must have made a mistake somewhere, because the rendered images from D and C++ are not the same! The image from the D exe has a lone white pixel (also present in the 'float' versions, both D and cpp), but that white pixel is gone in the cpp version (both dmc and msvc). L.
Sorry, I thought the .d files were also using 'double', but they're not. This explains the different outcome. L.
Jan 19 2007
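A small C++ sketch of why a float build and a double build can disagree, as with the white pixel mentioned above. It only shows that the same square root rounds differently in single and double precision; `RootSingle`, `RootDouble` and `ResultsDiffer` are hypothetical names, and the raytracer's actual pixel difference may come from a longer chain of such roundings.

```cpp
#include <cmath>

// Single-precision square root (the float overload of std::sqrt,
// the same precision that sqrtf works in).
inline float RootSingle(float x) {
    return std::sqrt(x);
}

// Double-precision square root.
inline double RootDouble(double x) {
    return std::sqrt(x);
}

// True when the float and double computations give different values
// after the float result is widened to double for comparison.
inline bool ResultsDiffer(double x) {
    return static_cast<double>(RootSingle(static_cast<float>(x))) != RootDouble(x);
}
```

For an input like 2.0, whose root is not exactly representable, the two results differ in the low digits. For 4.0 both give exactly 2. A comparison near a threshold (hit or miss, lit or shadowed) can therefore flip for a single pixel.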
The Java implementation is also faster than the D one, at least with -server.
compiler  time    factor  memory  notes
dmc       5 sec   1.0     5 MB
java      8 sec   1.6     72 MB   Java 1.6.0 -server
dmd       9 sec   1.8     5 MB
java      19 sec  3.8     19 MB   Java 1.6.0 -client
However, Java uses much more memory.
All three implementations are in the attached zip.
Thanks,
Bradley
Jan 21 2007









Lionello Lunesu <lio lunesu.remove.com> 