www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Method Call Overhead Stats for GDC (function vs virtual vs final)

reply eris <jvburnes gmail.com> writes:
Hello D programmers.

I'm implementing a simulation in D that will likely make sustained and intense
use of method calls.  Naturally I wanted to determine the overhead for various
types of call invocations.  Throwing all advice against "premature
optimization" aside I wrote a short program to test this.

One of the primary questions I wanted answered was "Would marking methods with
the 'final' attribute help the compiler optimize those calls into something
very close to a simple function call?"

Here are the class definitions to test the idea:

// ***********************
static void fnmeth() {
	j++;
}

class A
{
	void Ameth() {
		j++;
	}
}

class B : A
{
	final void Bmeth() {
		j++;
	}
}

final class C : A
{
	void Cmeth() {
		j++;
	}
}

// *****************

The first definition is a simple static function call.  I'll use this as a
baseline.
The second one is a simple virtual method.
The third is a method explicitly marked as 'final'.
The fourth is a class marked as 'final' which would imply that all methods are
final.

 The j++ bump is simply something to populate the method and possibly keep it
from being optimized into non-existence early on.

I ran each of these calls 4 million times to look at their respective
performance.  My program uses the tango StopWatch object to handle the
performance timing.  Unfortunately I'm running under VMWare and as I found out
later the timers are screwed up when running under a VM.

The upside of all this is that I was forced to disassemble the executable to
determined what sort of optimizations gdc performed.  This is actually a good
thing since it tells you more about the code generator than you would otherwise
be able to see.

Here are the results:

draco% gdc --version 
gdc (GCC) 4.2.3 20080225 (prerelease gdc 0.25 20071215, using dmd 1.022)
(Ubuntu 0.25-4.2.3-2ubuntu2)

(no optimization)

Call Type                   Call Overhead  (x86 instructions)        Internal
Method Overhead (x86 ins)
------------                    ---------------------------------------------  
      ------------------------------------------------
static function call            1 (direct call)                                
        4 (push,mov,pop,ret)
virtual method call           7 (indirect call)                                
     8 (push,mov,sub,mov,mov,call,leave,ret)
final method call               3 (setup & direct call)                        
8 (same as above)
final class                           7 (indirect call)                        
            8 (same as above)

(optimization level 1)

Call Type                   Call Overhead  (x86 instructions)        Internal
Method Overhead (x86 ins)
------------                    ---------------------------------------------  
      ------------------------------------------------
static function call            1 (direct call)                                
        4 (push,mov,pop,ret)
virtual method call           3 (indirect call)                                
     8 (push,mov,sub,mov,mov,call,leave,ret)
final method call               3 (setup & direct call)                        
8 (same as above)
final class                           3 (indirect call)                        
            8 (same as above)

(optimization level 2 yielded no improvements in call setup or overhead, but
inlined the function call)

(optimization level 3 yielded no significant improvements over that)

(Different instructions will of course have different number of cycles,
dispatching etc so the function call count is only a general guide).

Interesting conclusion

(1) At -O1, both final and fully virtual methods have the same call overhead
and the same internal method overhead.  The only difference is the final method
is a direct call.

(2) With no optimization, the final method call setup is 3 instructions while
the final class call setup is 7 instructions.  At -O1 the compiler figures out
that they're the same.  

(2) Internal method overhead for any class method invocation looks to be around
2.5x the call setup overhead at -O1.  This is something I hadn't considered.

Lesson learned:

If you're looking towards efficiency for frequently invoked methods, consider
other ways of getting to the data.  In other words avoid over using get()/set()
type calls and don't force everything into extreme levels of encapsulation just
to be "politically correct".  You will pay the price.

BTW: Does anyone know why gdc is doing this " mov    %esi,(%esp)".  Thats two
instructions before a direct call.  Why would you want to store the contents of
the string index register into the top of the stack?  I'm not doing any string
or byte functions.

Hmm.

Eris
Jul 24 2008
parent reply Jason House <jason.james.house gmail.com> writes:
A few days ago, I added "final" to many of my speed critical function calls and
got a 20% speed gain to my simulation-based application.  Maybe it was from
extra inlining?

eris Wrote:

 Hello D programmers.
 
 I'm implementing a simulation in D that will likely make sustained and intense
use of method calls.  Naturally I wanted to determine the overhead for various
types of call invocations.  Throwing all advice against "premature
optimization" aside I wrote a short program to test this.
 
 One of the primary questions I wanted answered was "Would marking methods with
the 'final' attribute help the compiler optimize those calls into something
very close to a simple function call?"

Jul 25 2008
parent superdan <super dan.org> writes:
Jason House Wrote:

 A few days ago, I added "final" to many of my speed critical function calls
and got a 20% speed gain to my simulation-based application.  Maybe it was from
extra inlining?

it all connects together. final disables virtual. that in turn allows inlining. after inlining about a brazillion optimizations apply. after those yet another brazillion optimizations apply. some compilers even do some fixed point iteration shit to figure out the best order and sequence of applying optimizations. this is all an entangled mess making predictions unreliable like shit. measuring on any concrete case (like the op has done) does little in the way of predicting performance improvements for other code.
Jul 25 2008