www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Strange performance behavior

reply Marius Muja <mariusm cs.ubc.ca> writes:
I have noticed the following strange (at least for me) performance 
behavior with one of my programs. It is a program that does some 
scientific computations and while trying to optimize it I noticed that 
the code from case_B below executes faster (almost twice as fast) as the 
code in case_A. This is a bit counterintuitive for me, since in case _B 
there is also the cost of the function call (or should be the same if 
the function is inlined).
Can anybody shed some light on why it's behaving this way?

case_A:
-------------------------------
foreach (i,index; indices) {
    foreach (k, inout value; centers[belongs_to[i]])
	value += vecs[index][k];
}
----------------------------------

case_B:
-------------------------------
void addTo(T,U)(T[] a, U[] b) {
    foreach(index, inout value; a) {
       value += b[index];
    }
}
....
foreach (i,index; indices) {
    addTo(centers[belongs_to[i]],vecs[index]);
}
_______________________________


Marius
Sep 19 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Marius Muja wrote:
 I have noticed the following strange (at least for me) performance 
 behavior with one of my programs. It is a program that does some 
 scientific computations and while trying to optimize it I noticed that 
 the code from case_B below executes faster (almost twice as fast) as the 
 code in case_A. This is a bit counterintuitive for me, since in case _B 
 there is also the cost of the function call (or should be the same if 
 the function is inlined).
 Can anybody shed some light on why it's behaving this way?
 
 case_A:
 -------------------------------
 foreach (i,index; indices) {
    foreach (k, inout value; centers[belongs_to[i]])
     value += vecs[index][k];
 }
 ----------------------------------
 
 case_B:
 -------------------------------
 void addTo(T,U)(T[] a, U[] b) {
    foreach(index, inout value; a) {
       value += b[index];
    }
 }
 ....
 foreach (i,index; indices) {
    addTo(centers[belongs_to[i]],vecs[index]);
 }
 _______________________________

My guess would be the compiler's failure to optimize[*] away the [index] indexing in A. So you do 2 lookups per iteration rather than just one. If so then this should be just as fast as case_B: foreach (i,index; indices) { auto vecs_i = vecs[index]; foreach (k, inout value; centers[belongs_to[i]]) value += vecs_i[k]; } --bb
Sep 19 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Bill Baxter wrote:
 Marius Muja wrote:
 I have noticed the following strange (at least for me) performance 
 behavior with one of my programs. It is a program that does some 
 scientific computations and while trying to optimize it I noticed that 
 the code from case_B below executes faster (almost twice as fast) as 
 the code in case_A. This is a bit counterintuitive for me, since in 
 case _B there is also the cost of the function call (or should be the 
 same if the function is inlined).
 Can anybody shed some light on why it's behaving this way?

 case_A:
 -------------------------------
 foreach (i,index; indices) {
    foreach (k, inout value; centers[belongs_to[i]])
     value += vecs[index][k];
 }
 ----------------------------------

 case_B:
 -------------------------------
 void addTo(T,U)(T[] a, U[] b) {
    foreach(index, inout value; a) {
       value += b[index];
    }
 }
 ....
 foreach (i,index; indices) {
    addTo(centers[belongs_to[i]],vecs[index]);
 }
 _______________________________

My guess would be the compiler's failure to optimize[*] away the [index] indexing in A. So you do 2 lookups per iteration rather than just one. If so then this should be just as fast as case_B: foreach (i,index; indices) { auto vecs_i = vecs[index]; foreach (k, inout value; centers[belongs_to[i]]) value += vecs_i[k]; }

[*] It may not really be a failure of the optimizer -- maybe it's unreasonably difficult to determine absolutely that vecs won't changed by the operation '+= vecs[index][k]'. --bb
Sep 19 2007
next sibling parent reply Bruce Adams <tortoise_74 yeah.who.co.uk> writes:
Bill Baxter Wrote:

 Bill Baxter wrote:
 Marius Muja wrote:
 I have noticed the following strange (at least for me) performance 
 behavior with one of my programs. It is a program that does some 
 scientific computations and while trying to optimize it I noticed that 
 the code from case_B below executes faster (almost twice as fast) as 
 the code in case_A. This is a bit counterintuitive for me, since in 
 case _B there is also the cost of the function call (or should be the 
 same if the function is inlined).
 Can anybody shed some light on why it's behaving this way?

 case_A:
 -------------------------------
 foreach (i,index; indices) {
    foreach (k, inout value; centers[belongs_to[i]])
     value += vecs[index][k];
 }
 ----------------------------------

 case_B:
 -------------------------------
 void addTo(T,U)(T[] a, U[] b) {
    foreach(index, inout value; a) {
       value += b[index];
    }
 }
 ....
 foreach (i,index; indices) {
    addTo(centers[belongs_to[i]],vecs[index]);
 }
 _______________________________

My guess would be the compiler's failure to optimize[*] away the [index] indexing in A. So you do 2 lookups per iteration rather than just one. If so then this should be just as fast as case_B: foreach (i,index; indices) { auto vecs_i = vecs[index]; foreach (k, inout value; centers[belongs_to[i]]) value += vecs_i[k]; }

[*] It may not really be a failure of the optimizer -- maybe it's unreasonably difficult to determine absolutely that vecs won't changed by the operation '+= vecs[index][k]'. --bb

<shields up> This is exactly the sort of thing where const would help :) </shields down>
Sep 19 2007
parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Bruce Adams wrote:
 Bill Baxter Wrote:
 
 Bill Baxter wrote:
 Marius Muja wrote:
 I have noticed the following strange (at least for me) performance 
 behavior with one of my programs. It is a program that does some 
 scientific computations and while trying to optimize it I noticed that 
 the code from case_B below executes faster (almost twice as fast) as 
 the code in case_A. This is a bit counterintuitive for me, since in 
 case _B there is also the cost of the function call (or should be the 
 same if the function is inlined).
 Can anybody shed some light on why it's behaving this way?

 case_A:
 -------------------------------
 foreach (i,index; indices) {
    foreach (k, inout value; centers[belongs_to[i]])
     value += vecs[index][k];
 }
 ----------------------------------

 case_B:
 -------------------------------
 void addTo(T,U)(T[] a, U[] b) {
    foreach(index, inout value; a) {
       value += b[index];
    }
 }
 ....
 foreach (i,index; indices) {
    addTo(centers[belongs_to[i]],vecs[index]);
 }
 _______________________________

indexing in A. So you do 2 lookups per iteration rather than just one. If so then this should be just as fast as case_B: foreach (i,index; indices) { auto vecs_i = vecs[index]; foreach (k, inout value; centers[belongs_to[i]]) value += vecs_i[k]; }

[*] It may not really be a failure of the optimizer -- maybe it's unreasonably difficult to determine absolutely that vecs won't changed by the operation '+= vecs[index][k]'. --bb

<shields up> This is exactly the sort of thing where const would help :) </shields down>

Uh oh... you should have left the shields up. Now you're unprotected... --bb
Sep 19 2007
parent reply Bruce Adams <tortoise_74 yeah.who.co.uk> writes:
Bill Baxter Wrote:

 Bruce Adams wrote:
 Bill Baxter Wrote:
 
 Bill Baxter wrote:
 Marius Muja wrote:
 I have noticed the following strange (at least for me) performance 
 behavior with one of my programs. It is a program that does some 
 scientific computations and while trying to optimize it I noticed that 
 the code from case_B below executes faster (almost twice as fast) as 
 the code in case_A. This is a bit counterintuitive for me, since in 
 case _B there is also the cost of the function call (or should be the 
 same if the function is inlined).
 Can anybody shed some light on why it's behaving this way?

 case_A:
 -------------------------------
 foreach (i,index; indices) {
    foreach (k, inout value; centers[belongs_to[i]])
     value += vecs[index][k];
 }
 ----------------------------------

 case_B:
 -------------------------------
 void addTo(T,U)(T[] a, U[] b) {
    foreach(index, inout value; a) {
       value += b[index];
    }
 }
 ....
 foreach (i,index; indices) {
    addTo(centers[belongs_to[i]],vecs[index]);
 }
 _______________________________

indexing in A. So you do 2 lookups per iteration rather than just one. If so then this should be just as fast as case_B: foreach (i,index; indices) { auto vecs_i = vecs[index]; foreach (k, inout value; centers[belongs_to[i]]) value += vecs_i[k]; }

[*] It may not really be a failure of the optimizer -- maybe it's unreasonably difficult to determine absolutely that vecs won't changed by the operation '+= vecs[index][k]'. --bb

<shields up> This is exactly the sort of thing where const would help :) </shields down>

Uh oh... you should have left the shields up. Now you're unprotected... --bb

I was invariant but now I'm just const but you still can't modify me through this interface :p
Sep 20 2007
next sibling parent reply "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Bruce Adams" <tortoise_74 yeah.who.co.uk> wrote in message 
news:fct9gb$uhd$1 digitalmars.com...
 I was invariant but now I'm just const but you still can't modify me 
 through this interface :p

*(cast(void*)&BruceAdams) = 0; Helloooo, void*.
Sep 20 2007
parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message 
news:fcton6$1rda$1 digitalmars.com...

 *(cast(void*)&BruceAdams) = 0;

Shit. *(cast(int*)cast(void*)&BruceAdams) = 0;
Sep 20 2007
prev sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Bruce Adams wrote:
 Bill Baxter Wrote:
 
 Bruce Adams wrote:
 Bill Baxter Wrote:

 Bill Baxter wrote:
 Marius Muja wrote:
 I have noticed the following strange (at least for me) performance 
 behavior with one of my programs. It is a program that does some 
 scientific computations and while trying to optimize it I noticed that 
 the code from case_B below executes faster (almost twice as fast) as 
 the code in case_A. This is a bit counterintuitive for me, since in 
 case _B there is also the cost of the function call (or should be the 
 same if the function is inlined).
 Can anybody shed some light on why it's behaving this way?

 case_A:
 -------------------------------
 foreach (i,index; indices) {
    foreach (k, inout value; centers[belongs_to[i]])
     value += vecs[index][k];
 }
 ----------------------------------

 case_B:
 -------------------------------
 void addTo(T,U)(T[] a, U[] b) {
    foreach(index, inout value; a) {
       value += b[index];
    }
 }
 ....
 foreach (i,index; indices) {
    addTo(centers[belongs_to[i]],vecs[index]);
 }
 _______________________________

indexing in A. So you do 2 lookups per iteration rather than just one. If so then this should be just as fast as case_B: foreach (i,index; indices) { auto vecs_i = vecs[index]; foreach (k, inout value; centers[belongs_to[i]]) value += vecs_i[k]; }

[*] It may not really be a failure of the optimizer -- maybe it's unreasonably difficult to determine absolutely that vecs won't changed by the operation '+= vecs[index][k]'. --bb

This is exactly the sort of thing where const would help :) </shields down>

--bb

I was invariant but now I'm just const but you still can't modify me through this interface :p

Ooh, you better watch out. Somebody's going to put you in a cast for making such volatile statements. But still, if it's a pure literal expression of how you feel, then I won't take exception. ugh. :-P --bb
Sep 20 2007
parent reply Sean Kelly <sean f4.ca> writes:
Bill Baxter wrote:
 
 Ooh, you better watch out.  Somebody's going to put you in a cast for 
 making such volatile statements.  But still, if it's a pure literal 
 expression of how you feel, then I won't take exception.

Baaaaad :-) Sean
Sep 20 2007
parent reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Sean Kelly wrote:
 Bill Baxter wrote:
 Ooh, you better watch out.  Somebody's going to put you in a cast for 
 making such volatile statements.  But still, if it's a pure literal 
 expression of how you feel, then I won't take exception.

Baaaaad :-) Sean

(mhi:104424) Exception caught: AbstractConceptOverflowException: too many puns from mhi/arch/cogito/video/trans.d(782): Lingua!("us-en/sl").translate from mhi/arch/cogito/video/inp.d(342): Aleph!(NatAleph.LATIN).input in mhi/arch/cogito/main.d(501): Video!(170, 60).scan You raised an exception in my Mass Hallucination Interface... how dare you. Probably have a memory leak now... -- Chris Nicholson-Sauls
Sep 23 2007
parent BCS <ao pathlink.com> writes:
Reply to Chris Nicholson-Sauls,

 Sean Kelly wrote:
 
 Bill Baxter wrote:
 
 Ooh, you better watch out.  Somebody's going to put you in a cast
 for making such volatile statements.  But still, if it's a pure
 literal expression of how you feel, then I won't take exception.
 

Sean

many puns from mhi/arch/cogito/video/trans.d(782): Lingua!("us-en/sl").translate from mhi/arch/cogito/video/inp.d(342): Aleph!(NatAleph.LATIN).input in mhi/arch/cogito/main.d(501): Video!(170, 60).scan You raised an exception in my Mass Hallucination Interface... how dare you. Probably have a memory leak now...

Just thought I'd correct the subject line :b
Sep 23 2007
prev sibling parent Marius Muja <mariusm cs.ubc.ca> writes:
Bill Baxter wrote:
 Bill Baxter wrote:
 Marius Muja wrote:
 I have noticed the following strange (at least for me) performance 
 behavior with one of my programs. It is a program that does some 
 scientific computations and while trying to optimize it I noticed 
 that the code from case_B below executes faster (almost twice as 
 fast) as the code in case_A. This is a bit counterintuitive for me, 
 since in case _B there is also the cost of the function call (or 
 should be the same if the function is inlined).
 Can anybody shed some light on why it's behaving this way?

 case_A:
 -------------------------------
 foreach (i,index; indices) {
    foreach (k, inout value; centers[belongs_to[i]])
     value += vecs[index][k];
 }
 ----------------------------------

 case_B:
 -------------------------------
 void addTo(T,U)(T[] a, U[] b) {
    foreach(index, inout value; a) {
       value += b[index];
    }
 }
 ....
 foreach (i,index; indices) {
    addTo(centers[belongs_to[i]],vecs[index]);
 }
 _______________________________

My guess would be the compiler's failure to optimize[*] away the [index] indexing in A. So you do 2 lookups per iteration rather than just one. If so then this should be just as fast as case_B: foreach (i,index; indices) { auto vecs_i = vecs[index]; foreach (k, inout value; centers[belongs_to[i]]) value += vecs_i[k]; }

[*] It may not really be a failure of the optimizer -- maybe it's unreasonably difficult to determine absolutely that vecs won't changed by the operation '+= vecs[index][k]'. --bb

Yes, that was it! Thanks for clearing this up for me.
Sep 19 2007