digitalmars.D.learn - Small part of a program : d and c versions performances diff.

"Larry" <deco33 hotmail.fr> writes:

Hello,

I extracted a part of my code written in C.
It is deliberately useless here, but I would like to understand the 
different techniques for optimizing this kind of code with the gdc 
compiler.

It currently runs in under a microsecond.

Constraint: the way the code is expressed cannot be changed much. 
We need that double loop because there are other operations 
involved in the first loop's scope.

main.c :
[code]
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "jol.h"
#include <time.h>
#include <sys/time.h>
int main(void)
{

     struct timeval s,e;
     gettimeofday(&s,NULL);

     int pol = 5;
     tes(&pol);


     int arr[] = {9,16,458,2,68,5452,98,32,4,565,78,985,3215};
     int len = 13-1;
     int g = 0;

     for (int x = 36; x >= 0 ; --x ){
         // some code here erased for the test
         for(int y = len ; y >= 0; --y){
             //some other code here
             ++g;
             arr[y] +=1;

         }

     }
     gettimeofday(&e,NULL);

     printf("so ? %d %lu %d %d %d",g,e.tv_usec - s.tv_usec, 
arr[4],arr[9],pol);
     return 0;
}
[/code]

jol.c
[code]
void tes(int * restrict a){

     *a = 9;

}
[/code]

and jol.h

#ifndef JOL_H
#define JOL_H
void tes(int * restrict a);
#endif // JOL_H


Now, the D counterpart:

module main;

import std.stdio;
import std.datetime;
import jol;
int main(string[] args)
{


     auto currentTime = Clock.currTime();

     int pol = 5;
     tes(pol);
     pol = 8;

     int arr[] = [9,16,458,2,68,5452,98,32,4,565,78,985,3215];
     int len = 13-1;
     int g = 0;

     for (int x = 31; x >= 0 ; --x ){

         for(int y = len ; y >= 0; --y){

             ++g;
             arr[y] +=1;

         }

     }
     auto currentTime2 = Clock.currTime();
     writefln("Hello World %d %s %d %d\n",g, (currentTime2 - 
currentTime),arr[4],arr[9]);

     return 0;
}

and

module jol;
final void tes(ref int a){

     a = 9;

}


OK, the compilation options:
gdc hello.d jol.d -O3 -frelease -ftree-loop-optimize

gcc -march=native -std=c11 -O2 main.c jol.c

Now the performance:
D : 12 µs
C : < 1 µs

Where does the difference come from? Is there a way to optimize the D 
version?

Again, I am absolutely new to D and these are my very first lines 
of code with it.

Thanks

Jul 09 2014

"NCrashed" <NCrashed gmail.com> writes:

On Wednesday, 9 July 2014 at 10:57:33 UTC, Larry wrote:
 [...]

 Now the performance :
 D : 12 µs
 C : < 1µs

 Where does the diff comes from ? Is there a way to optimize the 
 d version ?

 Again, I am absolutely new to D and those are my very first 
 line of code with it.

 Thanks

Clock isn't an accurate benchmark instrument. Try 
std.datetime.benchmark:
```
module main;

import std.stdio;
import std.datetime;

void tes(ref int a)
{
     a = 9;
}

int[] arr = [9,16,458,2,68,5452,98,32,4,565,78,985,3215];

void foo()
{
     int pol = 5;
     tes(pol);
     pol = 8;
     int g = 0;

     foreach_reverse(x; 0..31)
     {
         foreach_reverse(ref a; arr)
         {
             ++g;
             a += 1;
         }
     }
}

void main()
{
     auto res = benchmark!foo(1000); // res[0] is the total time for 1000 calls of foo
     // The total in msecs over 1000 calls equals the mean per call in usecs.
     writeln(res[0].msecs, " ", arr[4], " ", arr[9]);
}
```

Dmd time: 1 us
Gcc time: <= 1 us

Jul 09 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Larry:

 Now the performance :
 D : 12 µs
 C : < 1µs

 Where does the diff comes from ? Is there a way to optimize the 
 d version ?

 Again, I am absolutely new to D and those are my very first 
 line of code with it.

Your C code is not equivalent to the D code, there are small 
differences, even the output is different. So I've cleaned up 
your C and D code:

------------------------

// C code.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include "jol.h"

int main() {
     struct timeval s, e;
     gettimeofday(&s, NULL);

     int pol = 5;
     tes(&pol);

     int arr[] = {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 
985, 3215};
     int len = 13 - 1;
     int g = 0;

     for (int x = 36; x >= 0; --x) {
         for (int y = len; y >= 0; --y) {
             ++g;
             arr[y]++;
         }
     }

     gettimeofday(&e, NULL);
     printf("C: %d %lu %d %d %d\n",
            g, e.tv_usec - s.tv_usec, arr[4], arr[9], pol);

     return 0;
}

------------------------

D code (marking free functions "final" has little meaning, but the D 
compiler is very sloppy and doesn't complain):


module jol;

void tes(ref int a) {
     a = 9;
}


---------

module maind;

void main() {
     import std.stdio;
     import std.datetime;
     import jol;

     StopWatch sw;
     sw.start;

     int pol = 5;
     tes(pol);

     int[] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 
985, 3215];
     int len = 13 - 1;
     int g = 0;

     for (int x = 36; x >= 0; --x) {
         // Some code here erased for the test.
         for (int y = len; y >= 0; --y) {
             // Some other code here.
             ++g;
             arr[y]++;
         }
     }

     sw.stop;
     writefln("D: %d %d %d %d %d",
              g, sw.peek.nsecs, arr[4], arr[9], pol);
}

----------------

That D code is not fully idiomatic; this is closer to idiomatic D 
code:


module jol2;

void test(ref int x) pure nothrow @safe {
     x = 9;
}



module maind;

void main() {
     import std.stdio, std.datetime;
     import jol2;

     StopWatch sw;
     sw.start;

     int pol = 5;
     test(pol);

     int[13] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 
985, 3215];
     uint count = 0;

     foreach_reverse (immutable _; 0 .. 37) {
         foreach_reverse (ref ai; arr) {
             count++;
             ai++;
         }
     }

     sw.stop;
     writefln("D: %d %d %d %d %d",
              count, sw.peek.nsecs, arr[4], arr[9], pol);
}

----------------

In my benchmarks I have not used the more idiomatic D code; I 
have used the C-like code. But the run-time is essentially the 
same.

I compile the C and D code with (on a 32 bit Windows):

gcc -march=native -std=c11 -O2 main.c jol.c -o main

ldmd2 -wi -O -release -inline -noboundscheck maind.d jol.d
strip maind.exe

For the D code I've used the latest ldc2 compiler (V. 0.13.0, 
based on DMD v2.064 and LLVM 3.4.2), GCC is V.4.8.0 
(rubenvb-4.8.0).

----------------

The C code gives as output:

C: 481 0 105 602 9


The D code gives as output:

D: 481 6076 105 602 9

----------------------
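A note on units when reading the two outputs above: the C program prints `e.tv_usec - s.tv_usec`, which is in microseconds, while the D program prints `sw.peek.nsecs`, which is in nanoseconds. A minimal sketch of putting both readings on the same scale (the helper name `nsec_to_usec` is made up for illustration and is not part of either program):

```c
/* Convert a nanosecond reading (the unit printed by the D program)
   to microseconds (the unit printed by the C program). */
static double nsec_to_usec(long long nsec)
{
    return (double)nsec / 1000.0;
}
```

Under that conversion the D reading of 6076 corresponds to roughly 6.1 µs, against a C reading that rounds to 0 µs.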

If I slow down the CPU to half speed, the C code runs in about 
0.05 seconds and the D code in about 0.07 seconds.

Such run times are far too small for a sufficiently 
meaningful comparison. You need a run-time of about 2 seconds to 
get meaningful timings.

The difference between 0.05 and 0.07 is caused by initializing 
the D runtime (like the D GC). It takes about 0.015 seconds on my 
system at full CPU speed to initialize the D runtime, and it's a 
constant time.

Bye,
bearophile

Jul 09 2014

"Larry" <deco33 hotmail.fr> writes:

On Wednesday, 9 July 2014 at 12:25:40 UTC, bearophile wrote:
 [...]

 If I slow down the CPU at half speed the C code runs in about 
 0.05 seconds, the D code runs in about 0.07 seconds.

 Such run times are too much small to perform a sufficiently 
 meaningful comparison. You need a run-time of about 2 seconds 
 to get meaningful timings.

 The difference between 0.05 and 0.07 is caused by initializing 
 the D rutime (like the D GC), it takes about 0.015 seconds on 
 my systems at full speed CPU to initialize the D runtime, and 
 it's a constant time.

 Bye,
 bearophile

You are definitely right, I did mess up while translating!

I ran the corrected code (the version I meant to provide :S) 
and on a slow MacBook I end up with:
C : 2
D : 15994

Of course, when run on very high-end machines, this difference is almost 
nonexistent, but we want to run on very low-powered hardware.

OK, even with longer code, there will always be a launch 
penalty for D. So I cannot use it for very high-performance loops.

Shame for us..
:)

Thanks and bye

Jul 09 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Larry:

 Of course when run on very high end machines, this diff is 
 almost non existent but we want to run on very low powered 
 hardware.

 Ok, even with a longer code, there will always be a launch 
 penalty for d. So I cannot use it for very high performance 
 loops.

If you run it on very low powered hardware then you may not need 
the GC. So if you disable the run-time (stubbing out the GC) the 
start-up time of the D code will be smaller.

I think people like you here are really too quick to dismiss D 
:-)

Bye,
bearophile

Jul 09 2014

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Wednesday, 9 July 2014 at 13:18:00 UTC, Larry wrote:
 [...]

 You are definitely right, I did mess up while translating !

 I run the corrected codes (the ones I was meant to provide :S) 
 and on a slow macbook I end up with :
 C : 2
 D : 15994

 Of course when run on very high end machines, this diff is 
 almost non existent but we want to run on very low powered 
 hardware.

 Ok, even with a longer code, there will always be a launch 
 penalty for d. So I cannot use it for very high performance 
 loops.

 Shame for us..
 :)

 Thanks and bye

Could you provide the exact code you are using for that 
benchmark? Once the program has started up you should be able to 
obtain performance parity between C and D. Situations where this 
isn't true are problems we would like to know about.

For the amount of work you are doing in the test program (almost 
nothing), the total runtime is probably dominated by the program 
load time etc. even when using C.

Jul 09 2014

"Larry" <deco33 hotmail.fr> writes:

Yes, you are perfectly right, but our need is to run the fastest 
code on the lowest-powered machines. Not servers but embedded 
systems.

That is why I am only testing the overall structure.

The rest of the code is numerical, so it will not do much to change 
the fact that D cannot make up for its huge launch time. At the 
microsecond level (even nanosecond) it counts because of electrical 
consumption, size of hardware, heat and so on.

It is definitely not something most people care about, and I cannot 
disclose the full code for license reasons (yeah, I know I suck 
and am making a fuss over nothing, but.. I just execute).

But D may be of use to us for non-critical code, to replace some 
Python here and there. It is definitely a good piece of 
engineering. And it will help save money.

Jul 09 2014

"Larry" <deco33 hotmail.fr> writes:

On Wednesday, 9 July 2014 at 13:46:59 UTC, Larry wrote:
 Yes you are perfectly right but our need is to run the fastest 
 code on the lowest powered machines. Not servers but embedded 
 systems.

 That is why I just test the overall structures.

 The rest of the code is numerical so it will not change by much 
 the fact that d cannot get back the huge launching time. At the 
 microsecond level(even nano) it counts because of electrical 
 consumption, size of hardware, heat and so on.

 It is definitely not something most care about and i cannot 
 disclose the full code for license reasons (yeah I know I suck 
 and generate some fuss for nothing but.. I just execute.)

 But D may be of our use for non critical code to replace some 
 Python there and there. It is definitely a good piece of 
 engineering. And it will help save money.

@John Colvin:
Hmm, did you mean the sample code or the real code? If the former, 
it is the one corrected by bearophile.
My apologies.

Jul 09 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Larry:

 The rest of the code is numerical so it will not change by much 
 the fact that d cannot get back the huge launching time. At the 
 microsecond level(even nano) it counts because of electrical 
 consumption, size of hardware, heat and so on.

Have you benchmarked the D code without starting the current 
d-runtime (without GC)?

Is a start-up time of around 0.015 seconds on an old PC a huge 
one? I think no one has worked much on decreasing this tiny 
time. If you care about it, D being open source, you can 
take a look at the runtime start-up code.

Bye,
bearophile

Jul 09 2014

"Larry" <deco33 hotmail.fr> writes:

@Bearophile: just tried. No dramatic change.

import core.memory;

void main() {
    GC.disable();
    ...
}

Jul 09 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Larry:

  Bearophile: just tried. No dramatic change.

 import core.memory;

 void main() {
 GC.disable;
 ...
 }

That just disables the GC, so the start-up time is the same.
What you want is to not start the GC/runtime at all, stubbing it out... 
(assuming you don't need the GC in your program).

I think you can stub out the runtime functions by defining a few empty 
extern(C) functions, but I've never had to do it (saving 0.015 
seconds is not important for my needs), so if you don't know how 
to do it, you will have to ask others.

Bye,
bearophile

Jul 09 2014

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Wednesday, 9 July 2014 at 13:46:59 UTC, Larry wrote:
 The rest of the code is numerical so it will not change by much 
 the fact that d cannot get back the huge launching time. At the 
 microsecond level(even nano) it counts because of electrical 
 consumption, size of hardware, heat and so on.

You say you are worried about microseconds and power consumption, 
but you are suggesting launching a new process - a lot of 
overhead - to do a small amount of numerical work.

Surely no matter what programming language you use you would not 
want to work like this?

Jul 09 2014

"Ola Fosheim Grøstad" writes:

On Wednesday, 9 July 2014 at 14:30:41 UTC, John Colvin wrote:
 You say you are worried about microseconds and power 
 consumption, but you are suggesting launching a new process - a 
 lot of overhead - to do a small amount of numerical work.

Not much overhead if you don't use an MMU and use static linking.

Jul 09 2014

"Larry" <deco33 hotmail.fr> writes:

On Wednesday, 9 July 2014 at 14:30:41 UTC, John Colvin wrote:
 On Wednesday, 9 July 2014 at 13:46:59 UTC, Larry wrote:
 The rest of the code is numerical so it will not change by 
 much the fact that d cannot get back the huge launching time. 
 At the microsecond level(even nano) it counts because of 
 electrical consumption, size of hardware, heat and so on.

 You say you are worried about microseconds and power 
 consumption, but you are suggesting launching a new process - a 
 lot of overhead - to do a small amount of numerical work.

 Surely no matter what programming language you use you would 
 not want to work like this?

@John: A new process? Where?
Or maybe I got you wrong on this one, John.

I am writing libraries, and before going further I wondered if
there were alternatives I could get hold of. The idea is
to have homogeneous software, so we were ready to switch to D
for the whole set of tasks/assets.

No new process involved.

I was looking for a Python-like programming language that
offers C-like performance without as much writing as in C. Cython
is out: debugging it is a real pain. And the executable size is..
well..

I am becoming lazy and seeking the Holy Grail. Java not welcome.
D seemed like a very good choice and maybe it is, or more
certainly will be.

Jul 09 2014

"Chris" <wendlec tcd.ie> writes:

On Wednesday, 9 July 2014 at 15:09:09 UTC, Larry wrote:
 On Wednesday, 9 July 2014 at 14:30:41 UTC, John Colvin wrote:
 On Wednesday, 9 July 2014 at 13:46:59 UTC, Larry wrote:
 The rest of the code is numerical so it will not change by 
 much the fact that d cannot get back the huge launching time. 
 At the microsecond level(even nano) it counts because of 
 electrical consumption, size of hardware, heat and so on.

 You say you are worried about microseconds and power 
 consumption, but you are suggesting launching a new process - 
 a lot of overhead - to do a small amount of numerical work.

 Surely no matter what programming language you use you would 
 not want to work like this?

  John : A new process ? Where ?
 Or maybe I got you wrong on this one John

 I am writing libraries and before going further I wondered if
 there were alternatives that I could have a grab on. The idea is
 to have an homogeneous software so we were ready to switch to d
 for the whole tasks/asset.

 No new process involved.

 I was seaking for maybe a python like programming language that
 offers c-like perfs, without so much writing as in c. Exit
 Cython. Debugging it is a real pain. And executable size is..
 well..

 I am becoming lazy and seek for the Holy Grail. Java not 
 welcome.
 D seemed like a very good choice and maybe it is, or more
 certainly will.

I wouldn't give up on D (as you've already signalled). It's 
getting better with each iteration. BTW, have you measured the 
power consumption yet? Does it make a big difference if you use D 
or C?

Jul 09 2014

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Wednesday, 9 July 2014 at 15:09:09 UTC, Larry wrote:
 On Wednesday, 9 July 2014 at 14:30:41 UTC, John Colvin wrote:
 On Wednesday, 9 July 2014 at 13:46:59 UTC, Larry wrote:
 The rest of the code is numerical so it will not change by 
 much the fact that d cannot get back the huge launching time. 
 At the microsecond level(even nano) it counts because of 
 electrical consumption, size of hardware, heat and so on.

 You say you are worried about microseconds and power 
 consumption, but you are suggesting launching a new process - 
 a lot of overhead - to do a small amount of numerical work.

 Surely no matter what programming language you use you would 
 not want to work like this?

  John : A new process ? Where ?
 Or maybe I got you wrong on this one John

process == program in this case. Launching a new process == 
running the program

The startup cost of the D runtime is only paid when you start the 
program. If the amount of work done per execution of the program 
is more than a trivial amount then the startup cost will only be 
a small part of the total running time and power consumption etc.

 I am writing libraries and before going further I wondered if
 there were alternatives that I could have a grab on. The idea is
 to have an homogeneous software so we were ready to switch to d
 for the whole tasks/asset.

 No new process involved.

 I was seaking for maybe a python like programming language that
 offers c-like perfs, without so much writing as in c. Exit
 Cython. Debugging it is a real pain. And executable size is..
 well..

 I am becoming lazy and seek for the Holy Grail. Java not 
 welcome.
 D seemed like a very good choice and maybe it is, or more
 certainly will.

I think D could be a good choice for you.
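The amortisation argument in this post can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name `startup_fraction` and its parameters are hypothetical, and it assumes the start-up cost is a constant paid once per program run, as described above.

```c
/* Fraction of total run time spent in one-off start-up.
   startup_s: constant start-up cost, in seconds
   work_s:    time spent doing useful work, in seconds */
static double startup_fraction(double startup_s, double work_s)
{
    double total = startup_s + work_s;
    if (total <= 0.0)
        return 0.0; /* nothing measured; avoid dividing by zero */
    return startup_s / total;
}
```

Taking the roughly 0.015 s start-up figure bearophile reported earlier in the thread, a run doing 2 s of work spends under 1% of its time starting up, whereas a run doing 0.001 s of work spends over 90% of its time there.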

Jul 09 2014

"Larry" <deco33 hotmail.fr> writes:

I may definitely help with the D project.

I noticed that gdc doesn't have profile-guided optimization either.
So yeah, I cannot use D right now, I mean for this project.

OK, I will do my best to find some spare time for Dlang. I haven't 
really looked at the code yet, and I have coded for years in C, 
which is my first-class coding language. I hope that will not be any 
kind of barrier (C++ is my.. third best coding buddy anyway 
(after Python, excellent for managing systems)).

Many thanks to the whole community. I will stick with you and see 
what I can bring (or cannot).

:)

Bye

Jul 09 2014

"Larry" <deco33 hotmail.fr> writes:

@Chris:
Actually yes. If we consider a device that runs 20h a day, then by 
shaving a few microseconds here and there off billions of 
operations a day across a whole fleet of machines, you can 
shut some of them down for maintenance more easily, 
or pause some of them so that their batteries last a bit longer. 
The savings have proven to be on the order of thousands of $$, 
thanks to a redefined coding strategy.

Not to mention hardware wear, which is related to heat, and the 
savings you can expect over the long run.

By replacing some hardware a few months after its theoretical 
obsolescence, you can save a bit more.

And the accountant is very happy because he can optimize the 
finances further (staggered repayment).

It enabled us to hire more engineers and buy more hardware.

Of course, the saving is not only on this loop but on the whole 
chain. And it definitely adds up $$$.

And there are a lot more things that benefit from it (latency 
and so on).

Yep. :)

Jul 09 2014

"Kapps" <opantm2+spam gmail.com> writes:

On Wednesday, 9 July 2014 at 13:18:00 UTC, Larry wrote:
 You are definitely right, I did mess up while translating !

 I run the corrected codes (the ones I was meant to provide :S) 
 and on a slow macbook I end up with :
 C : 2
 D : 15994

 Of course when run on very high end machines, this diff is 
 almost non existent but we want to run on very low powered 
 hardware.

 Ok, even with a longer code, there will always be a launch 
 penalty for d. So I cannot use it for very high performance 
 loops.

 Shame for us..
 :)

 Thanks and bye

This to me pretty much confirms that almost the entirety of your 
C code is being optimized out and thus not actually executing.
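To illustrate what "optimized out" can mean here: every input of the loop nest is a compile-time constant, so a compiler is allowed to replace it with its result. The sketch below shows the loop as written in the thread's C code next to a hand-folded equivalent. The names `kernel_loop` and `kernel_folded` are invented for this illustration, and the folded form is only one transformation a compiler might apply, not a claim about what gcc actually emitted.

```c
enum { ARR_LEN = 13, OUTER = 37 };

/* The loop nest as written in the thread's C code. */
static int kernel_loop(int arr[ARR_LEN])
{
    int g = 0;
    for (int x = OUTER - 1; x >= 0; --x) {
        for (int y = ARR_LEN - 1; y >= 0; --y) {
            ++g;
            arr[y]++;
        }
    }
    return g;
}

/* An equivalent form with no nested loop: a constant count
   and a single addition per array element. */
static int kernel_folded(int arr[ARR_LEN])
{
    for (int y = 0; y < ARR_LEN; ++y)
        arr[y] += OUTER;
    return OUTER * ARR_LEN;
}
```

Both versions return 481 and leave `arr[4]` at 105 and `arr[9]` at 602 for the array used in the thread, which matches the program output reported earlier.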

Jul 09 2014

"Larry" <deco33 hotmail.fr> writes:

The actual code is not that much slower, considering the numerous 
other operations we do. And it is certainly faster than the D version doing 
almost nothing.

Well, it is about massive bit shifts, array accesses and 
calculations.
With all the optimizations we are on par with Fortran numerical 
code (thanks -std=c11).

There may be an optimization hidden somewhere, or gdc may just need 
to mature.

Dunno. But don't get me wrong, D is a fantastic language.

Jul 09 2014

"Ali Çehreli" <acehreli yahoo.com> writes:

On 07/09/2014 03:57 AM, Larry wrote:

      struct timeval s,e;

[...]
      gettimeofday(&e,NULL);

      printf("so ? %d %lu %d %d %d",g,e.tv_usec - s.tv_usec,
 arr[4],arr[9],pol);

Changing the topic a little, the calculation above ignores the tv_sec 
members of s and e.

Ali

Jul 09 2014

"Larry" <deco33 hotmail.fr> writes:

On Wednesday, 9 July 2014 at 18:18:43 UTC, Ali Çehreli wrote:
 On 07/09/2014 03:57 AM, Larry wrote:

      struct timeval s,e;

 [...]
      gettimeofday(&e,NULL);

      printf("so ? %d %lu %d %d %d",g,e.tv_usec - s.tv_usec,
 arr[4],arr[9],pol);

 Changing the topic a little, the calculation above ignores the 
 tv_sec members of s and e.

 Ali

Absolutely, Ali, because I know it is under the one-second range. I ran 
some tests before submitting it :)

But you are absolutely right, Ali: the mileage will vary in a 
completely different scenario.

Jul 09 2014

"Ali Çehreli" <acehreli yahoo.com> writes:

On 07/09/2014 12:47 PM, Larry wrote:

 On Wednesday, 9 July 2014 at 18:18:43 UTC, Ali Çehreli wrote:
 On 07/09/2014 03:57 AM, Larry wrote:

      struct timeval s,e;

 [...]
      gettimeofday(&e,NULL);

      printf("so ? %d %lu %d %d %d",g,e.tv_usec - s.tv_usec,
 arr[4],arr[9],pol);

 Changing the topic a little, the calculation above ignores the tv_sec
 members of s and e.

 Ali

 Absolutely Ali because I know it is under the sec range. I made some
 test before submitting it :)

I know it did work and will work every time you test it. :)

However, even if the difference is just one millisecond, if s and e 
happen to be on different sides of a second boundary, you will get a 
huge result.

Ali
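The second-boundary problem described above goes away once both fields of `struct timeval` are used. A minimal sketch, assuming the two samples come from `gettimeofday` as in the original program (the helper name `elapsed_usec` is made up for illustration):

```c
#include <stdint.h>
#include <sys/time.h>

/* Elapsed microseconds between two gettimeofday() samples.
   Using tv_sec as well as tv_usec keeps the result correct
   when the two samples fall in different seconds. */
static int64_t elapsed_usec(const struct timeval *start,
                            const struct timeval *end)
{
    int64_t sec  = (int64_t)end->tv_sec  - (int64_t)start->tv_sec;
    int64_t usec = (int64_t)end->tv_usec - (int64_t)start->tv_usec;
    return sec * 1000000 + usec;
}
```

For example, a start at 10 s + 999500 µs and an end at 11 s + 500 µs gives 1000 µs, where subtracting only the `tv_usec` fields would give -999000.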

Jul 09 2014

"Larry" <deco33 hotmail.fr> writes:

Right

Jul 09 2014

"Kapps" <opantm2+spam gmail.com> writes:

Measure a larger number of loops. I understand you're concerned 
about microseconds, but your benchmark shows nothing because your 
timer is simply not accurate enough for this. The benchmark that 
bearophile showed where C took ~2 nanoseconds vs the ~7000 D took 
heavily implies to me that the C implementation is simply being 
optimized out and nothing is actually running. All inputs are 
known at compile-time, the output is known at compile-time, the 
compiler is perfectly free to simply remove all your code and 
replace it with the result. I'm somewhat surprised that the D 
version doesn't do this actually, perhaps because of the dynamic 
memory allocation. I realize that you can't post your actual 
code, but this benchmark honestly just has too many flaws to 
determine anything from.

As for startup cost, D will indeed have a higher startup cost 
than C because of static constructors. Once it's running, it 
should be very close. If you're looking to start a process that 
will run for only a few milliseconds, you'd probably want to not 
use D (or avoid most static constructors, including those in the 
runtime / standard library).
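The two measurement points above (repeat the work many times, and make sure the compiler cannot remove it) can be sketched in C as follows. This is an illustration under stated assumptions, not the code used in the thread: the names `kernel`, `now_ns` and `mean_ns_per_call` are invented, the array is declared `volatile` purely to force every load and store to happen (which also makes the loop slower than real code would be), and C11 `timespec_get` is used for portability even though it reads the wall clock; on POSIX systems a monotonic clock would be a better choice.

```c
#include <stdint.h>
#include <time.h>

enum { ARR_LEN = 13, OUTER = 37 };

/* volatile: each access must really be performed, so the loop
   cannot be folded into a precomputed result. */
static volatile int arr[ARR_LEN] =
    {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215};

/* Same loop nest as in the thread; returns the iteration count. */
static int kernel(void)
{
    int g = 0;
    for (int x = OUTER - 1; x >= 0; --x) {
        for (int y = ARR_LEN - 1; y >= 0; --y) {
            ++g;
            arr[y] += 1;
        }
    }
    return g;
}

/* Current wall-clock time in nanoseconds (C11 timespec_get). */
static int64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Mean nanoseconds per kernel call, measured over `reps` calls. */
static double mean_ns_per_call(int reps)
{
    int64_t t0 = now_ns();
    for (int i = 0; i < reps; ++i)
        (void)kernel();
    int64_t t1 = now_ns();
    return (double)(t1 - t0) / (double)reps;
}
```

Calling `mean_ns_per_call` with a repetition count large enough to make the total run last a second or more averages out timer granularity. Each call adds 37 to every element, so very large counts would eventually overflow the `int` elements.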

Jul 09 2014
