
digitalmars.D.learn - optimize vector code

reply Sascha Katzner <sorry.no spam.invalid> writes:

Hi,

I'm currently trying to optimize my vector/matrix code.

the relevant section:
 struct Vector3(T) {
 	T x, y, z;
 	void opAddAssign(Vector3 v) {
 		x += v.x;
 		y += v.y;
 		z += v.z;
 	}
 	Vector3 opMul(T s) {
 		return Vector3(x * s, y * s, z * s);
 	}
 }

If you compare the resulting code from these two examples: first:
 	v1 += v2 * 3.0f;

second:
 	v1.x += v2.x * 3.0f;
 	v1.y += v2.y * 3.0f;
 	v1.z += v2.z * 3.0f;

...it is rather obvious that the first example is not very well optimized, because opMul() creates a temporary struct. Is there any way to guide the compiler to generate more efficient code for the first example?

LLAP,
Sascha

P.S. Attached is a complete example of this source.
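One way around the temporary (not proposed in the thread itself) is a fused scale-and-add method, so that the multiply and the add happen componentwise in one call and no intermediate Vector3 is ever built. The name scaledAdd is made up for illustration; a minimal sketch:

```d
import std.stdio : writeln;

struct Vector3(T) {
    T x, y, z;

    // Fused scale-and-add: computes this += v * s componentwise,
    // so no temporary Vector3 is constructed.
    void scaledAdd(ref const Vector3 v, T s) {
        x += v.x * s;
        y += v.y * s;
        z += v.z * s;
    }
}

void main() {
    auto v1 = Vector3!float(1, 2, 3);
    auto v2 = Vector3!float(4, 5, 6);
    v1.scaledAdd(v2, 3.0f);              // same result as v1 += v2 * 3.0f
    writeln(v1.x, " ", v1.y, " ", v1.z); // 13 17 21
}
```

The trade-off is that the operator syntax is lost; call sites must spell out the fused operation.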
Nov 30 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Sascha Katzner wrote:
 Hi,
 
 I'm currently trying to optimize my vector/matrix code.
 
 the relevant section:
 struct Vector3(T) {
     T x, y, z;
     void opAddAssign(Vector3 v) {
         x += v.x;
         y += v.y;
         z += v.z;
     }
     Vector3 opMul(T s) {
         return Vector3(x * s, y * s, z * s);
     }
 }

 If you compare the resulting code from these two examples: first:
     v1 += v2 * 3.0f;
 
 second:
     v1.x += v2.x * 3.0f;
     v1.y += v2.y * 3.0f;
     v1.z += v2.z * 3.0f;
 
 ...it is rather obvious that the first example is not very well optimized, because opMul() creates a temporary struct. Is there any way to guide the compiler to generate more efficient code for the first example?
 
 LLAP,
 Sascha
 
 P.S. Attached is a complete example of this source.

Pass big structs by reference.

	void opAddAssign(ref Vector3 v) { ...

--bb
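Applied to the struct from the original post, the suggestion might look like this sketch. It is spelled with D2's opOpAssign and ref const, the present-day equivalents of the thread's opAddAssign and plain ref:

```d
import std.stdio : writeln;

struct Vector3(T) {
    T x, y, z;

    // Pass-by-reference +=: the argument struct is not copied
    // onto the stack at each call.
    void opOpAssign(string op : "+")(ref const Vector3 v) {
        x += v.x;
        y += v.y;
        z += v.z;
    }
}

void main() {
    auto v1 = Vector3!float(1, 2, 3);
    auto v2 = Vector3!float(4, 5, 6);
    v1 += v2;                            // calls opOpAssign!"+" with a reference
    writeln(v1.x, " ", v1.y, " ", v1.z); // 5 7 9
}
```

Note that a ref parameter only accepts lvalues here, so a temporary such as the result of v2 * 3.0f would not bind to it without further changes.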
Nov 30 2007
parent reply Sascha Katzner <sorry.no spam.invalid> writes:
Bill Baxter wrote:
 Pass big structs by reference.
      void opAddAssign(ref Vector3 v) {...

In this example that is not a good idea, because it prevents the compiler from inlining opAddAssign(). I don't know why, but it does.

yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)

LLAP,
Sascha
Nov 30 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Sascha Katzner wrote:
 Bill Baxter wrote:
 Pass big structs by reference.
      void opAddAssign(ref Vector3 v) {...

 In this example that is not a good idea, because it prevents the compiler from inlining opAddAssign(). I don't know why, but it does.
 
 yourSuggestion.sizeof = 0x38 + 0x23 = 0x5b bytes ;-)
 
 LLAP,
 Sascha

All I know is that actual benchmarking has been done on raytracers, and changing all the pass-by-value calls to pass-by-ref improved speed.

I have no idea what your sizeof is benchmarking there. But if you're interested in actual execution speed, I suggest measuring time rather than bytes. I'd be very interested to know if pass-by-ref is no longer faster than pass-by-value for big structs.

--bb
Nov 30 2007
parent reply Sascha Katzner <sorry.no spam.invalid> writes:

Bill Baxter wrote:
 All I know is that actual benchmarking has been done on raytracers and 
 changing all the pass-by-values to pass-by-ref improved speed.
 
 I have no idea what your sizeof is benchmarking there.  But if you're 
 interested in actual execution speed I suggest measuring time rather 
 than bytes.  I'd be very interested to know if pass-by-ref is no longer 
 faster than pass-by-value for big structs.

You're right, comparing the size of the generated code was a VERY rough estimate... and wrong in this case. I've benchmarked the three cases and got:

	9.5s without ref
	6.7s with ref (<- your suggestion)
	4.1s manually inlined

So, it is indeed a lot faster, but still not as fast as inlining the functions manually. :(

LLAP,
Sascha
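The harness behind these numbers is not shown in the thread. A rough sketch of how the operator and manually inlined variants could be timed today with std.datetime.stopwatch (a module that postdates this thread); the opOpAssign/opBinary overloads are D2's spelling of the thread's opAddAssign/opMul, with the += parameter taken by value so the temporary returned by * can bind to it:

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

struct Vector3(T) {
    T x, y, z;

    // D2 spelling of the thread's opAddAssign; by-value parameter
    // so the rvalue produced by opBinary!"*" can be passed.
    void opOpAssign(string op : "+")(const Vector3 v) {
        x += v.x;
        y += v.y;
        z += v.z;
    }

    // D2 spelling of the thread's opMul: builds a temporary.
    Vector3 opBinary(string op : "*")(T s) const {
        return Vector3(x * s, y * s, z * s);
    }
}

void main() {
    enum N = 1_000_000;
    auto v2 = Vector3!float(1, 2, 3);

    auto v1 = Vector3!float(0, 0, 0);
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. N)
        v1 += v2 * 1.0f;                 // operator version: temporary per iteration
    writefln("operator: %s usecs", sw.peek.total!"usecs");

    v1 = Vector3!float(0, 0, 0);
    sw.reset();
    foreach (i; 0 .. N) {
        v1.x += v2.x * 1.0f;             // manually inlined version
        v1.y += v2.y * 1.0f;
        v1.z += v2.z * 1.0f;
    }
    writefln("manual:   %s usecs", sw.peek.total!"usecs");
}
```

The absolute times depend on compiler and flags; with optimization enabled, either loop may be partially folded away, so real benchmarks should make the results observable.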
Nov 30 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Sascha Katzner wrote:
 Bill Baxter wrote:
 All I know is that actual benchmarking has been done on raytracers and 
 changing all the pass-by-values to pass-by-ref improved speed.

 I have no idea what your sizeof is benchmarking there.  But if you're 
 interested in actual execution speed I suggest measuring time rather 
 than bytes.  I'd be very interested to know if pass-by-ref is no 
 longer faster than pass-by-value for big structs.

 You're right, comparing the size of the generated code was a VERY rough estimate... and wrong in this case. I've benchmarked the three cases and got:
 
     9.5s without ref
     6.7s with ref (<- your suggestion)
     4.1s manually inlined
 
 So, it is indeed a lot faster, but still not as fast as inlining the functions manually. :(

It's been mentioned before that DMD is particularly poor at floating-point optimizations. If it matters a lot to you, you might be able to get better results from GDC, which uses GCC's backend. If you do try it, I'd love to hear the benchmark results.

--bb
Nov 30 2007
parent Saaa <no reply.com> writes:
 So, it is a lot faster indeed, but yet not as fast as inling the 
 functions manually. :(

 It's been mentioned before that DMD is particularly poor at floating-point optimizations. If it matters a lot to you, you might be able to get better results from GDC, which uses GCC's backend. If you do try it, I'd love to hear the benchmark results.
 
 --bb

If the inlining was done correctly, how could floating-point optimizations account for the difference in speed? Or am I missing something? (probably :)
Nov 30 2007