digitalmars.D - cost of calling class function

reply Dušan Pavkov <dusan e-dule.com> writes:
Hello,

I have tried to measure how much would some simple task be faster 
[...]
eliminating causes one by one I have an example which shows where the problem 
is. If the function is outside of class code runs much faster. 
I'm obviously doing something wrong and would appreciate any help 
with this.

Here is the code:

import std.stdio;
import std.conv;
import std.datetime;

public float getTotal(string s, int add)
{
	float result = add;
	for (int j = 0; j < s.length; j++)
	{
		result += s[j];
	}
	return result;
}

class A
{
	public float getTotal(string s, int add)
	{
		float result = add;
		for (int j = 0; j < s.length; j++)
		{
			result += s[j];
		}
		return result;
	}
}

void main(string[] args)
{
	StopWatch sw;
	sw.start();

	int n = args.length == 2 ? to!int(args[1]) : 100000;

	string inputA = "qwertyuiopasdfghjklzxcvbnm0123456789";
	double total = 0;
	for (int i = 0; i < n; i++)
	{
		for (int ii = 0; ii < inputA.length; ii++)
		{
			total += getTotal(inputA, i);
		}
	}
	sw.stop();
	writeln("direct call: ");
	writeln("total: ", total);
	writeln("elapsed: ", sw.peek().msecs, " [ms]");
	writeln();

	total = 0;
	auto a = new A();
	sw.reset();
	sw.start();
	for (int i = 0; i < n; i++)
	{
		for (int ii = 0; ii < inputA.length; ii++)
		{
			total += a.getTotal(inputA, i);
		}
	}
	sw.stop();
	writeln("func in class call: ", total);
	writeln("total: ", total);
	writeln("elapsed: ", sw.peek().msecs, " [ms]");
}


here are the build configuration and execution times:

C:\projects\D\benchmarks\reduced problem>dub run --config=application --arch=x86_64 --build=release-nobounds --compiler=ldc2
Performing "release-nobounds" build using ldc2 for x86_64.
benchmark1 ~master: target for configuration "application" is up to date.
To force a rebuild of up-to-date targets, run again with --force.
Running .\benchmark1.exe
direct call:
total: 1.92137e+11
elapsed: 4 [ms]

func in class call: 1.92137e+11
total: 1.92137e+11
elapsed: 138 [ms]

Thanks in advance.
Feb 22 2017
next sibling parent reply Seb <seb wilzba.ch> writes:
On Wednesday, 22 February 2017 at 23:49:43 UTC, Dušan Pavkov 
wrote:
 Hello,

 I have tried to measure how much would some simple task be faster 
 [...]
 eliminating causes one by one I have an example which shows 
 where the problem is. If the function is outside of class code 
 runs much faster. I'm obviously doing something wrong and would 
 appreciate any help with this.
I think I can provide a couple of pointers. One reason is that the function isn't final, and virtual calls are inefficient:
https://dlang.org/spec/function.html#virtual-functions
http://forum.dlang.org/post/mailman.840.1332033836.4860.digitalmars-d@puremagic.com
https://issues.dlang.org/show_bug.cgi?id=11616
http://wiki.dlang.org/DIP51
AFAICT though it was approved, the switch to final by default has never happened.
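
To illustrate what "virtual" means in this context, here is a minimal sketch. The classes `Base` and `Derived` are hypothetical and do not come from the original post. In D, public non-final class methods are virtual by default, so a call through a base-class reference is normally resolved at run time, while a `final` method can be called directly.

```d
import std.stdio;

// Hypothetical classes, for illustration only.
class Base
{
    // Public, non-final, non-template member functions are virtual by default.
    string name() { return "Base"; }

    // `final` opts out: this method cannot be overridden, so the
    // compiler is free to call it directly and to inline it.
    final string fixedName() { return "Base (final)"; }
}

class Derived : Base
{
    override string name() { return "Derived"; }
}

void main()
{
    Base b = new Derived();
    writeln(b.name());      // dispatched at run time: prints "Derived"
    writeln(b.fixedName()); // direct call: prints "Base (final)"
}
```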
Feb 22 2017
parent reply Jeremy DeHaan <dehaan.jeremiah gmail.com> writes:
On Thursday, 23 February 2017 at 01:48:40 UTC, Seb wrote:
 AFAICT though it was approved, the switch to final by default 
 has never happened.
I believe Andrei made an executive decision to shut down final by default.
Feb 22 2017
parent reply Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thursday, February 23, 2017 02:17:02 Jeremy DeHaan via Digitalmars-d 
wrote:
 On Thursday, 23 February 2017 at 01:48:40 UTC, Seb wrote:
 AFAICT though it was approved, the switch to final by default
 has never happened.
 I believe Andrei made an executive decision to shut down final by default.
Yeah, the change that introduced virtual to start the change to making class member functions non-virtual by default was actually committed, and then Andrei found out about it and insisted that it be reverted. So, it was reverted, and we're never going to get non-virtual by default.
- Jonathan M Davis
Feb 22 2017
parent Chris Wright <dhasenan gmail.com> writes:
On Wed, 22 Feb 2017 18:31:37 -0800, Jonathan M Davis via Digitalmars-d
wrote:

 On Thursday, February 23, 2017 02:17:02 Jeremy DeHaan via Digitalmars-d
 wrote:
 On Thursday, 23 February 2017 at 01:48:40 UTC, Seb wrote:
 AFAICT though it was approved, the switch to final by default has
 never happened.
 I believe Andrei made an executive decision to shut down final by default.
 Yeah, the change that introduced virtual to start the change to making class member functions non-virtual by default was actually committed, and then Andrei found out about it and insisted that it be reverted. So, it was reverted, and we're never going to get non-virtual by default.
 - Jonathan M Davis
It's an interesting debate, but there's not a ton of reason to prefer one over the other design-wise. It can be considered for D3, but for D2, the ship has sailed.
Feb 23 2017
prev sibling parent reply Johan Engelen <j j.nl> writes:
On Wednesday, 22 February 2017 at 23:49:43 UTC, Dušan Pavkov 
wrote:
 
 If the function is outside of class code runs much faster. I'm 
 obviously doing something wrong and would appreciate any help 
 with this.
Interesting test case, thanks :-)
Adding "final" to the class method nullifies the speed difference.
Somehow, LDC does not devirtualize the call in your test case. Without the for-loops the call is nicely devirtualized, so no performance difference.
-Johan
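
As a sketch of the change described above, assuming the class from the original post is otherwise left untouched, only the `final` keyword is added to the method. The small `main` is a hypothetical check, not the original benchmark.

```d
import std.stdio;

class A
{
    // Same body as in the original post; `final` is the only change.
    // A final method cannot be overridden, so the call no longer needs
    // to go through the vtable.
    final float getTotal(string s, int add)
    {
        float result = add;
        for (int j = 0; j < s.length; j++)
        {
            result += s[j];
        }
        return result;
    }
}

void main()
{
    auto a = new A();
    // 1 + 'a' (97) + 'b' (98) = 196
    writeln(a.getTotal("ab", 1));
}
```

Marking the whole class as `final class A` is another option when no subclassing is intended.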
Feb 23 2017
parent reply Johan Engelen <j j.nl> writes:
On Thursday, 23 February 2017 at 16:25:34 UTC, Johan Engelen 
wrote:
 On Wednesday, 22 February 2017 at 23:49:43 UTC, Dušan Pavkov 
 wrote:
 
 If the function is outside of class code runs much faster. I'm 
 obviously doing something wrong and would appreciate any help 
 with this.
 Interesting test case, thanks :-) Adding "final" to the class method nullifies the speed difference. Somehow, LDC does not devirtualize the call in your testcase. Without the for-loops the call is nicely devirtualized, so no performance difference.
We're in good company: both clang and gcc also do not devirtualize the call when the loop count is too large (when the loop count is 4, the indirect calls are gone; when it is 160, they are back).
Btw, with PGO, the performance is 4 ms (direct call) vs 6 ms (virtual call). Pathological, but still.
I am submitting a DConf talk on optimization and the cost of D idioms. This gave me some new ideas to present, thanks :)
-Johan
Feb 23 2017
parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Thursday, 23 February 2017 at 17:02:55 UTC, Johan Engelen 
wrote:
 On Thursday, 23 February 2017 at 16:25:34 UTC, Johan Engelen 
 wrote:
 [...]
 We're in good company: both clang and gcc also do not devirtualize the call when the loop count is too large (when the loop count is 4, the indirect calls are gone; when it is 160, they are back). Btw, with PGO, the performance is 4 ms (direct call) vs 6 ms (virtual call). Pathological, but still. I am submitting a DConf talk on optimization and the cost of D idioms. This gave me some new ideas to present, thanks :) -Johan
Does marking the method as pure change anything?
Feb 23 2017
parent Johan Engelen <j j.nl> writes:
On Thursday, 23 February 2017 at 19:57:18 UTC, Patrick Schluter 
wrote:
 Does marking the method as pure change anything?
Here is the link to play with it yourself :-) https://godbolt.org/g/se4dCZ
Feb 23 2017