digitalmars.D.learn - D slower ~1s from C ?!

Minas (213/213) Apr 05 2012 Many of you should know the website "projecteuler.net", where

q66 (3/217) Apr 05 2012 My guess would be still worse optimization capabilities of dmd.
Andrej Mitrovic (8/9) Apr 05 2012 C via GCC (gcc -m32 test.c -o testgcc.exe -std=c99 -lm -O5)
Kapps (23/37) Apr 05 2012 First, you should compile with -O -release -inline and, in this

Minas (3/17) Apr 06 2012 Thank you, that made it run 300ms faster than the C version!

Jonathan M Davis (8/10) Apr 06 2012 A bug? I don't think so. Something that the optimizer could do better?

Marco Leise (148/153) Apr 08 2012 The first rule or PE: You don't talk about PE in the public. You acciden...

"Minas" <minas_mina1990 hotmail.co.uk> writes:

Many of you should know the website "projecteuler.net", where 
there are mathematical problems to solve with computers.

I am doing those in D, and after I finished one today, I decided 
to compile it in C as well to compare the results.

The problem was:

"The number 3797 has an interesting property. Being prime itself, 
it is possible to continuously remove digits from left to right, 
and remain prime at each stage: 3797, 797, 97, and 7. Similarly 
we can work from right to left: 3797, 379, 37, and 3.

Find the sum of the only eleven primes that are both truncatable 
from left to right and right to left.

NOTE: 2, 3, 5, and 7 are not considered to be truncatable primes."

My solution in D:

[source]
import std.stdio;
import std.math;
import std.conv;

void main()
{
	int count, i = 10, sum;
	
	while( count < 11 )
	{
		if( isPrime(i) && isPrimeRightToLeft(i) && 
isPrimeLeftToRight(i) )
		{
			//writeln(i);
			++count;
			sum += i;
		}
		
		++i;
	}
	
	writeln(sum);
}

/// returns true if n is a prime number
bool isPrime(ulong n)
{
	n = abs(n);
	
	// 0 and 1 aren't primes
	if( n < 2 ) return false;

	if( n % 2 == 0 && n != 2)
		return false; // an even number can't be a prime (except 2)

	// check only if it's odd
	for(ulong i = 2; i <= cast (ulong)sqrt(cast(double)n+1); ++i)
		if( n % i == 0 )
			return false;

	return true;
}

/**
  * returns true if n is a prime* truncatable from right to left.
  * Note: n must have been tested to be prime with a separate 
function
  */
bool isPrimeRightToLeft(ulong n)
{
	if( n < 10 )
		return false;
	
	n /= 10; // assuming that n has already been tested to be prime, 
we can skip checking it
	
	while( n > 0 )
	{
		if( !isPrime(n) )
			return false;
		
		n /= 10;
	}
	
	return true;
}

/**
  * returns true if n is a prime* truncatable from left to right.
  * Note: n must have been tested to be prime with a separate 
function
  */
bool isPrimeLeftToRight(ulong n)
{
	if( n < 10 )
		return false;
	
	ulong power = cast(ulong)pow(10, cast(ulong)log10(n));
	ulong firstDigit = n / power;
	n -= firstDigit * power;
	
	while( n > 0 )
	{
		if( !isPrime(n) )
			return false;
		
		power = cast(ulong)pow(10, cast(ulong)log10(n));
		firstDigit = n / power;
		
		n -= firstDigit * power;
	}
	
	return true;
}
[/source]


In C:
[source]
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

typedef unsigned long ulong;

int isPrime(ulong n);
int isPrimeRightToLeft(ulong n);
int isPrimeLeftToRight(ulong n);

int main()
{
	int count = 0, i = 10, sum = 0;
	
	while( count < 11 )
	{
		if( isPrime(i) && isPrimeRightToLeft(i) && 
isPrimeLeftToRight(i) )
		{
			//writeln(i);
			++count;
			sum += i;
		}
		
		++i;
	}
	
	printf("%d\n", sum);
	
	return 0;
}

/// returns true if n is a prime number
int isPrime(ulong n)
{
	n = abs(n);
	
	// 0 and 1 aren't primes
	if( n < 2 ) return 0;

	if( n % 2 == 0 && n != 2)
		return 0; // an even number can't be a prime (except 2)

	// check only if it's odd
	for(ulong i = 2; i <= (ulong)sqrt((double)n+1); ++i)
		if( n % i == 0 )
			return 0;

	return 1;
}

/**
  * returns 1 if n is a prime* truncatable from right to left.
  * Note: n must have been tested to be prime with a separate 
function
  */
int isPrimeRightToLeft(ulong n)
{
	if( n < 10 )
		return 0;
	
	n /= 10; // assuming that n has already been tested to be prime, 
we can skip checking it
	
	while( n > 0 )
	{
		if( !isPrime(n) )
			return 0;
		
		n /= 10;
	}
	
	return 1;
}

/**
  * returns 1 if n is a prime* truncatable from left to right.
  * Note: n must have been tested to be prime with a separate 
function
  */
int isPrimeLeftToRight(ulong n)
{
	if( n < 10 )
		return 0;
	
	ulong power = (ulong)pow(10, (ulong)log10(n));
	ulong firstDigit = n / power;
	n -= firstDigit * power;
	
	while( n > 0 )
	{
		if( !isPrime(n) )
			return 0;
		
		power = (ulong)pow(10, (ulong)log10(n));
		firstDigit = n / power;
		
		n -= firstDigit * power;
	}
	
	return 1;
}
[/source]

And this is the time execution of the programs:
In D:
real	0m1.648s
user	0m1.644s
sys	0m0.000s

In C:
real	0m0.766s
user	0m0.760s
sys	0m0.000s

There's quite a big difference here. Is that normal or is there 
something wrong? (Maybe a bug?)

*  C source file was compiled as "gcc euler37.c -o euler37 -lm 
-std=c99 -O5
    D source file was compiled as "dmd euler37.d -O -release"

** In gcc as well in dmd sizeof(ulong) == 8

Apr 05 2012

"q66" <quaker66 gmail.com> writes:

On Thursday, 5 April 2012 at 17:22:38 UTC, Minas wrote:
 Many of you should know the website "projecteuler.net", where 
 there are mathematical problems to solve with computers.

 I am doing those in D, and after I finished one today, I 
 decided to compile it in C as well to compare the results.

 The problem was:

 "The number 3797 has an interesting property. Being prime 
 itself, it is possible to continuously remove digits from left 
 to right, and remain prime at each stage: 3797, 797, 97, and 7. 
 Similarly we can work from right to left: 3797, 379, 37, and 3.

 Find the sum of the only eleven primes that are both 
 truncatable from left to right and right to left.

 NOTE: 2, 3, 5, and 7 are not considered to be truncatable 
 primes."

 My solution in D:

 [source]
 import std.stdio;
 import std.math;
 import std.conv;

 void main()
 {
 	int count, i = 10, sum;
 	
 	while( count < 11 )
 	{
 		if( isPrime(i) && isPrimeRightToLeft(i) && 
 isPrimeLeftToRight(i) )
 		{
 			//writeln(i);
 			++count;
 			sum += i;
 		}
 		
 		++i;
 	}
 	
 	writeln(sum);
 }

 /// returns true if n is a prime number
 bool isPrime(ulong n)
 {
 	n = abs(n);
 	
 	// 0 and 1 aren't primes
 	if( n < 2 ) return false;

 	if( n % 2 == 0 && n != 2)
 		return false; // an even number can't be a prime (except 2)

 	// check only if it's odd
 	for(ulong i = 2; i <= cast (ulong)sqrt(cast(double)n+1); ++i)
 		if( n % i == 0 )
 			return false;

 	return true;
 }

 /**
  * returns true if n is a prime* truncatable from right to left.
  * Note: n must have been tested to be prime with a separate 
 function
  */
 bool isPrimeRightToLeft(ulong n)
 {
 	if( n < 10 )
 		return false;
 	
 	n /= 10; // assuming that n has already been tested to be 
 prime, we can skip checking it
 	
 	while( n > 0 )
 	{
 		if( !isPrime(n) )
 			return false;
 		
 		n /= 10;
 	}
 	
 	return true;
 }

 /**
  * returns true if n is a prime* truncatable from left to right.
  * Note: n must have been tested to be prime with a separate 
 function
  */
 bool isPrimeLeftToRight(ulong n)
 {
 	if( n < 10 )
 		return false;
 	
 	ulong power = cast(ulong)pow(10, cast(ulong)log10(n));
 	ulong firstDigit = n / power;
 	n -= firstDigit * power;
 	
 	while( n > 0 )
 	{
 		if( !isPrime(n) )
 			return false;
 		
 		power = cast(ulong)pow(10, cast(ulong)log10(n));
 		firstDigit = n / power;
 		
 		n -= firstDigit * power;
 	}
 	
 	return true;
 }
 [/source]


 In C:
 [source]
 #include <stdio.h>
 #include <stdlib.h>
 #include <math.h>

 typedef unsigned long ulong;

 int isPrime(ulong n);
 int isPrimeRightToLeft(ulong n);
 int isPrimeLeftToRight(ulong n);

 int main()
 {
 	int count = 0, i = 10, sum = 0;
 	
 	while( count < 11 )
 	{
 		if( isPrime(i) && isPrimeRightToLeft(i) && 
 isPrimeLeftToRight(i) )
 		{
 			//writeln(i);
 			++count;
 			sum += i;
 		}
 		
 		++i;
 	}
 	
 	printf("%d\n", sum);
 	
 	return 0;
 }

 /// returns true if n is a prime number
 int isPrime(ulong n)
 {
 	n = abs(n);
 	
 	// 0 and 1 aren't primes
 	if( n < 2 ) return 0;

 	if( n % 2 == 0 && n != 2)
 		return 0; // an even number can't be a prime (except 2)

 	// check only if it's odd
 	for(ulong i = 2; i <= (ulong)sqrt((double)n+1); ++i)
 		if( n % i == 0 )
 			return 0;

 	return 1;
 }

 /**
  * returns 1 if n is a prime* truncatable from right to left.
  * Note: n must have been tested to be prime with a separate 
 function
  */
 int isPrimeRightToLeft(ulong n)
 {
 	if( n < 10 )
 		return 0;
 	
 	n /= 10; // assuming that n has already been tested to be 
 prime, we can skip checking it
 	
 	while( n > 0 )
 	{
 		if( !isPrime(n) )
 			return 0;
 		
 		n /= 10;
 	}
 	
 	return 1;
 }

 /**
  * returns 1 if n is a prime* truncatable from left to right.
  * Note: n must have been tested to be prime with a separate 
 function
  */
 int isPrimeLeftToRight(ulong n)
 {
 	if( n < 10 )
 		return 0;
 	
 	ulong power = (ulong)pow(10, (ulong)log10(n));
 	ulong firstDigit = n / power;
 	n -= firstDigit * power;
 	
 	while( n > 0 )
 	{
 		if( !isPrime(n) )
 			return 0;
 		
 		power = (ulong)pow(10, (ulong)log10(n));
 		firstDigit = n / power;
 		
 		n -= firstDigit * power;
 	}
 	
 	return 1;
 }
 [/source]

 And this is the time execution of the programs:
 In D:
 real	0m1.648s
 user	0m1.644s
 sys	0m0.000s

 In C:
 real	0m0.766s
 user	0m0.760s
 sys	0m0.000s

 There's quite a big difference here. Is that normal or is there 
 something wrong? (Maybe a bug?)

 *  C source file was compiled as "gcc euler37.c -o euler37 -lm 
 -std=c99 -O5
    D source file was compiled as "dmd euler37.d -O -release"

 ** In gcc as well in dmd sizeof(ulong) == 8

My guess would be still worse optimization capabilities of dmd. 
Try with gdc and see.

Apr 05 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 4/5/12, Minas <minas_mina1990 hotmail.co.uk> wrote:
 And this is the time execution of the programs

C via GCC (gcc -m32 test.c -o testgcc.exe -std=c99 -lm -O5)
Elapsed Time:     0:00:02.015

D via DMD (dmd test.d -oftestdmd.exe -release -inline -O -noboundscheck)
Elapsed Time:     0:00:08.312

D via GDC (gdmd -m32 -release -inline -O -noboundscheck test.d -oftestgdc.exe)
Elapsed Time:     0:00:01.015

These results are fairly consistent on my machine.

Apr 05 2012

"Kapps" <opantm2+spam gmail.com> writes:

On Thursday, 5 April 2012 at 17:22:38 UTC, Minas wrote:
 Many of you should know the website "projecteuler.net", where 
 there are mathematical problems to solve with computers.

 I am doing those in D, and after I finished one today, I 
 decided to compile it in C as well to compare the results.

 The problem was:

 "The number 3797 has an interesting property. Being prime 
 itself, it is possible to continuously remove digits from left 
 to right, and remain prime at each stage: 3797, 797, 97, and 7. 
 Similarly we can work from right to left: 3797, 379, 37, and 3.

 Find the sum of the only eleven primes that are both 
 truncatable from left to right and right to left.

 NOTE: 2, 3, 5, and 7 are not considered to be truncatable 
 primes."

 My solution in D:

First, you should compile with -O -release -inline and, in this 
case, -noboundscheck.

The main issue here seems to be the for loop.
Changing:
for(ulong i = 2; i <= cast (ulong)sqrt(cast(double)n+1); ++i)
		if( n % i == 0 )
			return false;
To:
ulong End = cast (ulong)sqrt(cast(double)n+1);
for(ulong i = 2; i <= End; ++i)
	if( n % i == 0 )
		return false;

Results in a 26 times performance increase for me, based off of 
using a StopWatch at start of main and stopping it at end of 
main. It's possible that the C compiler can recognize that this 
is a constant expression (sqrt might be an intrinsic). D should 
be able to do this even better; sqrt is strongly pure and takes 
in arguments that do not change, thus it should be able to 
automatically make the change I did above. It (at least DMD) does 
not seem to however.

I did not try the C version, and the D version was compiled with 
DMD on Windows.

Apr 05 2012

"Minas" <minas_mina1990 hotmail.co.uk> writes:

On Thursday, 5 April 2012 at 23:23:54 UTC, Kapps wrote:
 On Thursday, 5 April 2012 at 17:22:38 UTC, Minas wrote:
 First, you should compile with -O -release -inline and, in this 
 case, -noboundscheck.
 		return false;

 Results in a 26 times performance increase for me, based off of 
 using a StopWatch at start of main and stopping it at end of 
 main. It's possible that the C compiler can recognize that this 
 is a constant expression (sqrt might be an intrinsic). D should 
 be able to do this even better; sqrt is strongly pure and takes 
 in arguments that do not change, thus it should be able to 
 automatically make the change I did above. It (at least DMD) 
 does not seem to however.

 I did not try the C version, and the D version was compiled 
 with DMD on Windows.

Thank you, that made it run 300ms faster than the C version!
Could this be a bug in dmd that I should report?

Apr 06 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, April 06, 2012 09:34:10 Minas wrote:
 Thank you, that made it run 300ms faster than the C version!
 Could this be a bug in dmd that I should report?

A bug? I don't think so. Something that the optimizer could do better? 
Absolutely. The code is perfectly correct. It's just that the optimizer 
doesn't do as good a job here as we'd like.

It certainly wouldn't hurt to submit an enhancement request with the two 
versions of the code and an explanation of why you expect the optimizer to 
make the first one as fast as the second.

- Jonathan M Davis

Apr 06 2012

Marco Leise <Marco.Leise gmx.de> writes:

Am Thu, 05 Apr 2012 19:22:37 +0200
schrieb "Minas" <minas_mina1990 hotmail.co.uk>:

 Many of you should know the website "projecteuler.net", where=20
 there are mathematical problems to solve with computers.
=20
 I am doing those in D, and after I finished one today, I decided=20
 to compile it in C as well to compare the results.

The first rule or PE: You don't talk about PE in the public. You accidental=
ly posted a solution to a problem in the public. Now people can solve the t=
ask, using a search engine. And after solving the problem, get access to ot=
her language implementations. Some employers even use problems from there t=
o assess new programmers. And believe me, you wouldn't want a co-worker tha=
t cheated on PE to get the job ;).

Anyway like others suggested, I use GDC to get the most out of D and I actu=
ally like the challenge of staying under 50ms for every problem I solve. (M=
y Friend Key is 58749952297599_fa95cc8e61e8df1206f1144436ac050f)
Admittedly my first solutions aren't always fast. Often I just solve the pr=
oblem and then look into the forum for how to make it _fast_. After a while=
 I think, I should have a nice library of very fast helper functions.

My personal favorite is a small struct that can be used to iterate along ro=
ws or diagonals of Pascal's triangle:

---------------------

alias size_t =E2=84=95;

/**
 * The ultimate tool to move around in Pascal's triangle. Pascal(0, 0) is t=
he
 * top of the triangle with the value of 1.
 */
struct Pascal
{
	this(=E2=84=95 n, =E2=84=95 k) {
		// internally we offset these by +1, to simplify the math
		++n; ++k;
		_n =3D n;
		if (k > n / 2) {
			_k =3D _n;
			left(_n - k);
		} else {
			right(k - 1);
		}
	}
=09
	=E2=84=95 left(=E2=84=95 amount) {
		while (amount--) {
			_value *=3D --_k;
			_value /=3D _n - _k;
		}
		return _value;
	}
=09
	=E2=84=95 right(=E2=84=95 amount) {
		while (amount--) {
			_value *=3D _n - _k;
			_value /=3D _k++;
		}
		return _value;
	}
=09
	=E2=84=95 rightDown(=E2=84=95 amount) {
		while (amount--) {
			_value *=3D _n++;
			_value /=3D _k++;
		}
		return _value;
	}

	=E2=84=95 leftDown(=E2=84=95 amount) {
		while (amount--) {
			_value *=3D _n++;
			_value /=3D _n - _k;
		}
		return _value;
	}

	=E2=84=95 leftUp(=E2=84=95 amount) {
		while (amount--) {
			_value *=3D --_k;
			_value /=3D --_n;
		}
		return _value;
	}

	=E2=84=95 rightUp(=E2=84=95 amount) {
		while (amount--) {
			_value *=3D _n - _k;
			_value /=3D --_n;
		}
		return _value;
	}
=09
	 property
	=E2=84=95 value() { return _value; }
=09
	alias value this;
=09
	private:
=09
	=E2=84=95 _n =3D 1, _k =3D 1;
	=E2=84=95 _value =3D 1;
}

-------------------------

I use it like this for combinatorics calculations (n over k), where a naive=
 approach would recalculate that Pascal's triangle value at the required po=
sition over and over again:

	=E2=84=95[] permut_per_variation =3D new =E2=84=95[variations];
	Pascal pascal =3D Pascal(_2 + _3, _3);
	foreach (v; 0..variations)
	{
		permut +=3D permut_per_variation[v] =3D pascal;
		pascal.leftDown(1);
		pascal.left(2);
	}

Still some functional programmers keep mocking me for my lines of code coun=
t. And I tend to agree now and then (some FP solutions are _really_ short).=
 With these fun tasks it is more a matter of personality though. I prefer f=
ast over concise code. Some participants even use ASM... maybe they like it=
 really close to the metal. In any case it is good to know how C/D code map=
s to machine code, e.g. what is efficient and what not, so you don't misuse=
 real instead of int.

Looking at your code I noticed the following:

*)	writeln() is ok, but doesn't flush (e.g. missing output when a program i=
s halted in a debugger or crashes)
	You can use stdout.writeln() when you need the output on the screen as soo=
n as the line is executed

*)	n =3D abs(n); // <- don't need this, you already declared n as unsigned

*)	if( n % 2 ...)
	Just in case someone wonders, GCC/GDC replaces the modulo with faster=09
	code. There is no need to write n & 1, which is a bit more cryptic.

*)	for(ulong i =3D 2; i <=3D cast (ulong)sqrt(cast(double)n+1); ++i)
	can be written as:
	foreach (i; 2 .. cast(ulong)sqrt(n)+1)

	Foreach only computes start and end once and 'i' will be ulong thanks to t=
he cast!

*)	ulong power =3D cast(ulong)pow(10, cast(ulong)log10(n));
	can alternatively be written as:
	ulong power =3D 10 ^^ cast(ulong)log10(n);

*)	In the following while loop, you can replace
	power =3D cast(ulong)pow(10, cast(ulong)log10(n));
	with
	power /=3D 10;

But really, only the isPrime() function is taking up time in this code. Thi=
s is a short and fast version with GDC 64bit, but a different algorithm wou=
ld help more:

bool isPrime(ulong n)
{
	// 0 and 1 aren't primes
	if( n < 2 ) return false;

	for( ulong i =3D 2; n / i >=3D i; ++i )
		if( n % i =3D=3D 0 )
			return false;

	return true;
}

When / and % are used close to each other, they are handled in one CPU inst=
ruction. So you can avoid floating point math altogether.

--=20
Marco

Apr 08 2012

D Programming

C/C++ Programming

Other

digitalmars.D.learn - D slower ~1s from C ?!