## digitalmars.D.learn - D slower ~1s from C ?!

- Minas (213/213) Apr 05 2012 Many of you should know the website "projecteuler.net", where
- q66 (3/217) Apr 05 2012 My guess would be still worse optimization capabilities of dmd.
- Andrej Mitrovic (8/9) Apr 05 2012 C via GCC (gcc -m32 test.c -o testgcc.exe -std=c99 -lm -O5)
- Kapps (23/37) Apr 05 2012 First, you should compile with -O -release -inline and, in this
- Minas (3/17) Apr 06 2012 Thank you, that made it run 300ms faster than the C version!
- Jonathan M Davis (8/10) Apr 06 2012 A bug? I don't think so. Something that the optimizer could do better?
- Marco Leise (148/153) Apr 08 2012 The first rule or PE: You don't talk about PE in the public. You acciden...

Many of you should know the website "projecteuler.net", where there are mathematical problems to solve with computers. I am doing those in D, and after I finished one today, I decided to compile it in C as well to compare the results. The problem was: "The number 3797 has an interesting property. Being prime itself, it is possible to continuously remove digits from left to right, and remain prime at each stage: 3797, 797, 97, and 7. Similarly we can work from right to left: 3797, 379, 37, and 3. Find the sum of the only eleven primes that are both truncatable from left to right and right to left. NOTE: 2, 3, 5, and 7 are not considered to be truncatable primes." My solution in D: [source] import std.stdio; import std.math; import std.conv; void main() { int count, i = 10, sum; while( count < 11 ) { if( isPrime(i) && isPrimeRightToLeft(i) && isPrimeLeftToRight(i) ) { //writeln(i); ++count; sum += i; } ++i; } writeln(sum); } /// returns true if n is a prime number bool isPrime(ulong n) { n = abs(n); // 0 and 1 aren't primes if( n < 2 ) return false; if( n % 2 == 0 && n != 2) return false; // an even number can't be a prime (except 2) // check only if it's odd for(ulong i = 2; i <= cast (ulong)sqrt(cast(double)n+1); ++i) if( n % i == 0 ) return false; return true; } /** * returns true if n is a prime* truncatable from right to left. * Note: n must have been tested to be prime with a separate function */ bool isPrimeRightToLeft(ulong n) { if( n < 10 ) return false; n /= 10; // assuming that n has already been tested to be prime, we can skip checking it while( n > 0 ) { if( !isPrime(n) ) return false; n /= 10; } return true; } /** * returns true if n is a prime* truncatable from left to right. * Note: n must have been tested to be prime with a separate function */ bool isPrimeLeftToRight(ulong n) { if( n < 10 ) return false; ulong power = cast(ulong)pow(10, cast(ulong)log10(n)); ulong firstDigit = n / power; n -= firstDigit * power; while( n > 0 ) { if( !isPrime(n) ) return false; power = cast(ulong)pow(10, cast(ulong)log10(n)); firstDigit = n / power; n -= firstDigit * power; } return true; } [/source] In C: [source] #include <stdio.h> #include <stdlib.h> #include <math.h> typedef unsigned long ulong; int isPrime(ulong n); int isPrimeRightToLeft(ulong n); int isPrimeLeftToRight(ulong n); int main() { int count = 0, i = 10, sum = 0; while( count < 11 ) { if( isPrime(i) && isPrimeRightToLeft(i) && isPrimeLeftToRight(i) ) { //writeln(i); ++count; sum += i; } ++i; } printf("%d\n", sum); return 0; } /// returns true if n is a prime number int isPrime(ulong n) { n = abs(n); // 0 and 1 aren't primes if( n < 2 ) return 0; if( n % 2 == 0 && n != 2) return 0; // an even number can't be a prime (except 2) // check only if it's odd for(ulong i = 2; i <= (ulong)sqrt((double)n+1); ++i) if( n % i == 0 ) return 0; return 1; } /** * returns 1 if n is a prime* truncatable from right to left. * Note: n must have been tested to be prime with a separate function */ int isPrimeRightToLeft(ulong n) { if( n < 10 ) return 0; n /= 10; // assuming that n has already been tested to be prime, we can skip checking it while( n > 0 ) { if( !isPrime(n) ) return 0; n /= 10; } return 1; } /** * returns 1 if n is a prime* truncatable from left to right. * Note: n must have been tested to be prime with a separate function */ int isPrimeLeftToRight(ulong n) { if( n < 10 ) return 0; ulong power = (ulong)pow(10, (ulong)log10(n)); ulong firstDigit = n / power; n -= firstDigit * power; while( n > 0 ) { if( !isPrime(n) ) return 0; power = (ulong)pow(10, (ulong)log10(n)); firstDigit = n / power; n -= firstDigit * power; } return 1; } [/source] And this is the time execution of the programs: In D: real 0m1.648s user 0m1.644s sys 0m0.000s In C: real 0m0.766s user 0m0.760s sys 0m0.000s There's quite a big difference here. Is that normal or is there something wrong? (Maybe a bug?) * C source file was compiled as "gcc euler37.c -o euler37 -lm -std=c99 -O5 D source file was compiled as "dmd euler37.d -O -release" ** In gcc as well in dmd sizeof(ulong) == 8

Apr 05 2012

On Thursday, 5 April 2012 at 17:22:38 UTC, Minas wrote:Many of you should know the website "projecteuler.net", where there are mathematical problems to solve with computers. I am doing those in D, and after I finished one today, I decided to compile it in C as well to compare the results. The problem was: "The number 3797 has an interesting property. Being prime itself, it is possible to continuously remove digits from left to right, and remain prime at each stage: 3797, 797, 97, and 7. Similarly we can work from right to left: 3797, 379, 37, and 3. Find the sum of the only eleven primes that are both truncatable from left to right and right to left. NOTE: 2, 3, 5, and 7 are not considered to be truncatable primes." My solution in D: [source] import std.stdio; import std.math; import std.conv; void main() { int count, i = 10, sum; while( count < 11 ) { if( isPrime(i) && isPrimeRightToLeft(i) && isPrimeLeftToRight(i) ) { //writeln(i); ++count; sum += i; } ++i; } writeln(sum); } /// returns true if n is a prime number bool isPrime(ulong n) { n = abs(n); // 0 and 1 aren't primes if( n < 2 ) return false; if( n % 2 == 0 && n != 2) return false; // an even number can't be a prime (except 2) // check only if it's odd for(ulong i = 2; i <= cast (ulong)sqrt(cast(double)n+1); ++i) if( n % i == 0 ) return false; return true; } /** * returns true if n is a prime* truncatable from right to left. * Note: n must have been tested to be prime with a separate function */ bool isPrimeRightToLeft(ulong n) { if( n < 10 ) return false; n /= 10; // assuming that n has already been tested to be prime, we can skip checking it while( n > 0 ) { if( !isPrime(n) ) return false; n /= 10; } return true; } /** * returns true if n is a prime* truncatable from left to right. * Note: n must have been tested to be prime with a separate function */ bool isPrimeLeftToRight(ulong n) { if( n < 10 ) return false; ulong power = cast(ulong)pow(10, cast(ulong)log10(n)); ulong firstDigit = n / power; n -= firstDigit * power; while( n > 0 ) { if( !isPrime(n) ) return false; power = cast(ulong)pow(10, cast(ulong)log10(n)); firstDigit = n / power; n -= firstDigit * power; } return true; } [/source] In C: [source] #include <stdio.h> #include <stdlib.h> #include <math.h> typedef unsigned long ulong; int isPrime(ulong n); int isPrimeRightToLeft(ulong n); int isPrimeLeftToRight(ulong n); int main() { int count = 0, i = 10, sum = 0; while( count < 11 ) { if( isPrime(i) && isPrimeRightToLeft(i) && isPrimeLeftToRight(i) ) { //writeln(i); ++count; sum += i; } ++i; } printf("%d\n", sum); return 0; } /// returns true if n is a prime number int isPrime(ulong n) { n = abs(n); // 0 and 1 aren't primes if( n < 2 ) return 0; if( n % 2 == 0 && n != 2) return 0; // an even number can't be a prime (except 2) // check only if it's odd for(ulong i = 2; i <= (ulong)sqrt((double)n+1); ++i) if( n % i == 0 ) return 0; return 1; } /** * returns 1 if n is a prime* truncatable from right to left. * Note: n must have been tested to be prime with a separate function */ int isPrimeRightToLeft(ulong n) { if( n < 10 ) return 0; n /= 10; // assuming that n has already been tested to be prime, we can skip checking it while( n > 0 ) { if( !isPrime(n) ) return 0; n /= 10; } return 1; } /** * returns 1 if n is a prime* truncatable from left to right. * Note: n must have been tested to be prime with a separate function */ int isPrimeLeftToRight(ulong n) { if( n < 10 ) return 0; ulong power = (ulong)pow(10, (ulong)log10(n)); ulong firstDigit = n / power; n -= firstDigit * power; while( n > 0 ) { if( !isPrime(n) ) return 0; power = (ulong)pow(10, (ulong)log10(n)); firstDigit = n / power; n -= firstDigit * power; } return 1; } [/source] And this is the time execution of the programs: In D: real 0m1.648s user 0m1.644s sys 0m0.000s In C: real 0m0.766s user 0m0.760s sys 0m0.000s There's quite a big difference here. Is that normal or is there something wrong? (Maybe a bug?) * C source file was compiled as "gcc euler37.c -o euler37 -lm -std=c99 -O5 D source file was compiled as "dmd euler37.d -O -release" ** In gcc as well in dmd sizeof(ulong) == 8My guess would be still worse optimization capabilities of dmd. Try with gdc and see.

Apr 05 2012

On 4/5/12, Minas <minas_mina1990 hotmail.co.uk> wrote:And this is the time execution of the programsC via GCC (gcc -m32 test.c -o testgcc.exe -std=c99 -lm -O5) Elapsed Time: 0:00:02.015 D via DMD (dmd test.d -oftestdmd.exe -release -inline -O -noboundscheck) Elapsed Time: 0:00:08.312 D via GDC (gdmd -m32 -release -inline -O -noboundscheck test.d -oftestgdc.exe) Elapsed Time: 0:00:01.015 These results are fairly consistent on my machine.

Apr 05 2012

On Thursday, 5 April 2012 at 17:22:38 UTC, Minas wrote:Many of you should know the website "projecteuler.net", where there are mathematical problems to solve with computers. I am doing those in D, and after I finished one today, I decided to compile it in C as well to compare the results. The problem was: "The number 3797 has an interesting property. Being prime itself, it is possible to continuously remove digits from left to right, and remain prime at each stage: 3797, 797, 97, and 7. Similarly we can work from right to left: 3797, 379, 37, and 3. Find the sum of the only eleven primes that are both truncatable from left to right and right to left. NOTE: 2, 3, 5, and 7 are not considered to be truncatable primes." My solution in D:First, you should compile with -O -release -inline and, in this case, -noboundscheck. The main issue here seems to be the for loop. Changing: for(ulong i = 2; i <= cast (ulong)sqrt(cast(double)n+1); ++i) if( n % i == 0 ) return false; To: ulong End = cast (ulong)sqrt(cast(double)n+1); for(ulong i = 2; i <= End; ++i) if( n % i == 0 ) return false; Results in a 26 times performance increase for me, based off of using a StopWatch at start of main and stopping it at end of main. It's possible that the C compiler can recognize that this is a constant expression (sqrt might be an intrinsic). D should be able to do this even better; sqrt is strongly pure and takes in arguments that do not change, thus it should be able to automatically make the change I did above. It (at least DMD) does not seem to however. I did not try the C version, and the D version was compiled with DMD on Windows.

Apr 05 2012

On Thursday, 5 April 2012 at 23:23:54 UTC, Kapps wrote:On Thursday, 5 April 2012 at 17:22:38 UTC, Minas wrote: First, you should compile with -O -release -inline and, in this case, -noboundscheck. return false; Results in a 26 times performance increase for me, based off of using a StopWatch at start of main and stopping it at end of main. It's possible that the C compiler can recognize that this is a constant expression (sqrt might be an intrinsic). D should be able to do this even better; sqrt is strongly pure and takes in arguments that do not change, thus it should be able to automatically make the change I did above. It (at least DMD) does not seem to however. I did not try the C version, and the D version was compiled with DMD on Windows.Thank you, that made it run 300ms faster than the C version! Could this be a bug in dmd that I should report?

Apr 06 2012

On Friday, April 06, 2012 09:34:10 Minas wrote:Thank you, that made it run 300ms faster than the C version! Could this be a bug in dmd that I should report?A bug? I don't think so. Something that the optimizer could do better? Absolutely. The code is perfectly correct. It's just that the optimizer doesn't do as good a job here as we'd like. It certainly wouldn't hurt to submit an enhancement request with the two versions of the code and an explanation of why you expect the optimizer to make the first one as fast as the second. - Jonathan M Davis

Apr 06 2012

Am Thu, 05 Apr 2012 19:22:37 +0200 schrieb "Minas" <minas_mina1990 hotmail.co.uk>:Many of you should know the website "projecteuler.net", where=20 there are mathematical problems to solve with computers. =20 I am doing those in D, and after I finished one today, I decided=20 to compile it in C as well to compare the results.The first rule or PE: You don't talk about PE in the public. You accidental= ly posted a solution to a problem in the public. Now people can solve the t= ask, using a search engine. And after solving the problem, get access to ot= her language implementations. Some employers even use problems from there t= o assess new programmers. And believe me, you wouldn't want a co-worker tha= t cheated on PE to get the job ;). Anyway like others suggested, I use GDC to get the most out of D and I actu= ally like the challenge of staying under 50ms for every problem I solve. (M= y Friend Key is 58749952297599_fa95cc8e61e8df1206f1144436ac050f) Admittedly my first solutions aren't always fast. Often I just solve the pr= oblem and then look into the forum for how to make it _fast_. After a while= I think, I should have a nice library of very fast helper functions. My personal favorite is a small struct that can be used to iterate along ro= ws or diagonals of Pascal's triangle: --------------------- alias size_t =E2=84=95; /** * The ultimate tool to move around in Pascal's triangle. Pascal(0, 0) is t= he * top of the triangle with the value of 1. */ struct Pascal { this(=E2=84=95 n, =E2=84=95 k) { // internally we offset these by +1, to simplify the math ++n; ++k; _n =3D n; if (k > n / 2) { _k =3D _n; left(_n - k); } else { right(k - 1); } } =09 =E2=84=95 left(=E2=84=95 amount) { while (amount--) { _value *=3D --_k; _value /=3D _n - _k; } return _value; } =09 =E2=84=95 right(=E2=84=95 amount) { while (amount--) { _value *=3D _n - _k; _value /=3D _k++; } return _value; } =09 =E2=84=95 rightDown(=E2=84=95 amount) { while (amount--) { _value *=3D _n++; _value /=3D _k++; } return _value; } =E2=84=95 leftDown(=E2=84=95 amount) { while (amount--) { _value *=3D _n++; _value /=3D _n - _k; } return _value; } =E2=84=95 leftUp(=E2=84=95 amount) { while (amount--) { _value *=3D --_k; _value /=3D --_n; } return _value; } =E2=84=95 rightUp(=E2=84=95 amount) { while (amount--) { _value *=3D _n - _k; _value /=3D --_n; } return _value; } =09 property =E2=84=95 value() { return _value; } =09 alias value this; =09 private: =09 =E2=84=95 _n =3D 1, _k =3D 1; =E2=84=95 _value =3D 1; } ------------------------- I use it like this for combinatorics calculations (n over k), where a naive= approach would recalculate that Pascal's triangle value at the required po= sition over and over again: =E2=84=95[] permut_per_variation =3D new =E2=84=95[variations]; Pascal pascal =3D Pascal(_2 + _3, _3); foreach (v; 0..variations) { permut +=3D permut_per_variation[v] =3D pascal; pascal.leftDown(1); pascal.left(2); } Still some functional programmers keep mocking me for my lines of code coun= t. And I tend to agree now and then (some FP solutions are _really_ short).= With these fun tasks it is more a matter of personality though. I prefer f= ast over concise code. Some participants even use ASM... maybe they like it= really close to the metal. In any case it is good to know how C/D code map= s to machine code, e.g. what is efficient and what not, so you don't misuse= real instead of int. Looking at your code I noticed the following: *) writeln() is ok, but doesn't flush (e.g. missing output when a program i= s halted in a debugger or crashes) You can use stdout.writeln() when you need the output on the screen as soo= n as the line is executed *) n =3D abs(n); // <- don't need this, you already declared n as unsigned *) if( n % 2 ...) Just in case someone wonders, GCC/GDC replaces the modulo with faster=09 code. There is no need to write n & 1, which is a bit more cryptic. *) for(ulong i =3D 2; i <=3D cast (ulong)sqrt(cast(double)n+1); ++i) can be written as: foreach (i; 2 .. cast(ulong)sqrt(n)+1) Foreach only computes start and end once and 'i' will be ulong thanks to t= he cast! *) ulong power =3D cast(ulong)pow(10, cast(ulong)log10(n)); can alternatively be written as: ulong power =3D 10 ^^ cast(ulong)log10(n); *) In the following while loop, you can replace power =3D cast(ulong)pow(10, cast(ulong)log10(n)); with power /=3D 10; But really, only the isPrime() function is taking up time in this code. Thi= s is a short and fast version with GDC 64bit, but a different algorithm wou= ld help more: bool isPrime(ulong n) { // 0 and 1 aren't primes if( n < 2 ) return false; for( ulong i =3D 2; n / i >=3D i; ++i ) if( n % i =3D=3D 0 ) return false; return true; } When / and % are used close to each other, they are handled in one CPU inst= ruction. So you can avoid floating point math altogether. --=20 Marco

Apr 08 2012