digitalmars.D.learn - Small part of a program : d and c versions performances diff.

reply "Larry" <deco33 hotmail.fr> writes:
Hello,

I extracted a part of my code, written in C.
It is deliberately useless here, but I would like to understand the 
different techniques for optimizing this kind of code with the gdc 
compiler.

It currently runs in under a microsecond.

Constraint: the way the code is expressed cannot be changed much. 
We need that double loop because there are other operations 
involved in the first loop's scope.

main.c :
[code]
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "jol.h"
#include <time.h>
#include <sys/time.h>
int main(void)
{

     struct timeval s,e;
     gettimeofday(&s,NULL);

     int pol = 5;
     tes(&pol);


     int arr[] = {9,16,458,2,68,5452,98,32,4,565,78,985,3215};
     int len = 13-1;
     int g = 0;

     for (int x = 36; x >= 0 ; --x ){
         // some code here erased for the test
         for(int y = len ; y >= 0; --y){
             //some other code here
             ++g;
             arr[y] +=1;

         }

     }
     gettimeofday(&e,NULL);

     printf("so ? %d %lu %d %d %d",g,e.tv_usec - s.tv_usec, 
arr[4],arr[9],pol);
     return 0;
}
[/code]

jol.c
[code]
void tes(int * restrict a){

     *a = 9;

}
[/code]

and jol.h

#ifndef JOL_H
#define JOL_H
void tes(int * restrict a);
#endif // JOL_H


Now, the D counterpart:

module main;

import std.stdio;
import std.datetime;
import jol;
int main(string[] args)
{


     auto currentTime = Clock.currTime();

     int pol = 5;
     tes(pol);
     pol = 8;

     int arr[] = [9,16,458,2,68,5452,98,32,4,565,78,985,3215];
     int len = 13-1;
     int g = 0;

     for (int x = 31; x >= 0 ; --x ){

         for(int y = len ; y >= 0; --y){

             ++g;
             arr[y] +=1;

         }

     }
     auto currentTime2 = Clock.currTime();
     writefln("Hello World %d %s %d %d\n",g, (currentTime2 - 
currentTime),arr[4],arr[9]);

     return 0;
}

and

module jol;
final void tes(ref int a){

     a = 9;

}


Ok, the compilation options :
gdc hello.d jol.d -O3 -frelease -ftree-loop-optimize

gcc -march=native -std=c11 -O2 main.c jol.c

Now the performance :
D : 12 µs
C : < 1µs

Where does the difference come from? Is there a way to optimize the D 
version?

Again, I am absolutely new to D and these are my very first lines 
of code with it.

Thanks
Jul 09 2014
next sibling parent "NCrashed" <NCrashed gmail.com> writes:
Clock isn't an accurate benchmark instrument. Try std.datetime.benchmark:

```
module main;

import std.stdio;
import std.datetime;

void tes(ref int a)
{
    a = 9;
}

int[] arr = [9,16,458,2,68,5452,98,32,4,565,78,985,3215];

void foo()
{
    int pol = 5;
    tes(pol);
    pol = 8;
    int g = 0;
    foreach_reverse(x; 0..31)
    {
        foreach_reverse(ref a; arr)
        {
            ++g;
            a += 1;
        }
    }
}

void main()
{
    auto res = benchmark!foo(1000); // take mean of 1000 launches
    writeln(res[0].msecs, " ", arr[4], " ", arr[9]);
}
```

Dmd time: 1 us
Gcc time: <= 1 us
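The same "mean of many runs" idea can be sketched on the C side. This is only a rough sketch, not the code used for the timings above: `now_ns`, `mean_ns_per_run` and `kernel` are hypothetical names, and it assumes a POSIX system where `clock_gettime(CLOCK_MONOTONIC, ...)` is available (with `-std=c11` the feature-test macro on the first line is needed to expose it).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Global array, as in the D version above. */
int arr[13] = {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215};

/* The double loop from the thread: 37 outer passes over 13 elements. */
void kernel(void)
{
    for (int x = 36; x >= 0; --x)
        for (int y = 12; y >= 0; --y)
            arr[y] += 1;
}

/* Current time in nanoseconds from a monotonic clock. */
int64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* silence "unused" warning when NDEBUG is defined */
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Hypothetical helper: call fn `runs` times, return mean nanoseconds per call. */
double mean_ns_per_run(void (*fn)(void), int runs)
{
    assert(fn != NULL && runs > 0);
    int64_t t0 = now_ns();
    for (int i = 0; i < runs; ++i)
        fn();
    int64_t t1 = now_ns();
    return (double)(t1 - t0) / runs;
}
```

Calling `mean_ns_per_run(kernel, 1000)` from `main` and printing the result gives a per-run figure that does not depend on the resolution of a single timer reading. An optimizer may still simplify the loops, so the number says little about code it could rewrite.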
Jul 09 2014
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Larry:

 Now the performance :
 D : 12 µs
 C : < 1µs

 Where does the diff comes from ? Is there a way to optimize the 
 d version ?

 Again, I am absolutely new to D and those are my very first 
 line of code with it.
Your C code is not equivalent to the D code; there are small differences, and even the output is different. So I've cleaned up your C and D code:

------------------------

// C code.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include "jol.h"

int main() {
    struct timeval s, e;
    gettimeofday(&s, NULL);

    int pol = 5;
    tes(&pol);

    int arr[] = {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215};
    int len = 13 - 1;
    int g = 0;

    for (int x = 36; x >= 0; --x) {
        for (int y = len; y >= 0; --y) {
            ++g;
            arr[y]++;
        }
    }

    gettimeofday(&e, NULL);
    printf("C: %d %lu %d %d %d\n",
           g, e.tv_usec - s.tv_usec, arr[4], arr[9], pol);
    return 0;
}

------------------------

D code ("final" functions have not much meaning, but the D compiler is very sloppy and doesn't complain):

module jol;

void tes(ref int a) {
    a = 9;
}

---------

module maind;

void main() {
    import std.stdio;
    import std.datetime;
    import jol;

    StopWatch sw;
    sw.start;

    int pol = 5;
    tes(pol);

    int[] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215];
    int len = 13 - 1;
    int g = 0;

    for (int x = 36; x >= 0; --x) {
        // Some code here erased for the test.
        for (int y = len; y >= 0; --y) {
            // Some other code here.
            ++g;
            arr[y]++;
        }
    }

    sw.stop;
    writefln("D: %d %d %d %d %d",
             g, sw.peek.nsecs, arr[4], arr[9], pol);
}

----------------

That D code is not fully idiomatic; this is closer to idiomatic D code:

module jol2;

void test(ref int x) pure nothrow @safe {
    x = 9;
}

module maind;

void main() {
    import std.stdio, std.datetime;
    import jol2;

    StopWatch sw;
    sw.start;

    int pol = 5;
    test(pol);

    int[13] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215];
    uint count = 0;

    foreach_reverse (immutable _; 0 .. 37) {
        foreach_reverse (ref ai; arr) {
            count++;
            ai++;
        }
    }

    sw.stop;
    writefln("D: %d %d %d %d %d",
             count, sw.peek.nsecs, arr[4], arr[9], pol);
}

----------------

In my benchmarks I haven't used the more idiomatic D code; I have used the C-like code. But the run-time is essentially the same.

I compile the C and D code with (on a 32 bit Windows):

gcc -march=native -std=c11 -O2 main.c jol.c -o main
ldmd2 -wi -O -release -inline -noboundscheck maind.d jol.d
strip maind.exe

For the D code I've used the latest ldc2 compiler (V. 0.13.0, based on DMD v2.064 and LLVM 3.4.2), GCC is V.4.8.0 (rubenvb-4.8.0).

----------------

The C code gives as output:

C: 481 0 105 602 9

The D code gives as output:

D: 481 6076 105 602 9

----------------------

If I slow down the CPU to half speed, the C code runs in about 0.05 seconds and the D code runs in about 0.07 seconds. Such run times are too small for a sufficiently meaningful comparison. You need a run-time of about 2 seconds to get meaningful timings. The difference between 0.05 and 0.07 is caused by initializing the D runtime (like the D GC); it takes about 0.015 seconds on my system with the CPU at full speed to initialize the D runtime, and it's a constant time.

Bye,
bearophile
Jul 09 2014
parent reply "Larry" <deco33 hotmail.fr> writes:
You are definitely right, I did mess up while translating!

I ran the corrected code (the version I meant to provide :S) and on a slow MacBook I end up with:
C : 2
D : 15994

Of course, when run on very high-end machines this difference is almost nonexistent, but we want to run on very low-powered hardware.

OK, even with longer code there will always be a launch penalty for D. So I cannot use it for very high performance loops.

Shame for us.. :)

Thanks and bye
Jul 09 2014
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Larry:

 Of course when run on very high end machines, this diff is 
 almost non existent but we want to run on very low powered 
 hardware.

 Ok, even with a longer code, there will always be a launch 
 penalty for d. So I cannot use it for very high performance 
 loops.
If you run it on very low-powered hardware then you may not need the GC. So if you disable the runtime (stubbing out the GC), the start-up time of the D code will be smaller.

I think people here like you are really too quick at dismissing D :-)

Bye,
bearophile
Jul 09 2014
prev sibling next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
Could you provide the exact code you are using for that benchmark? Once the program has started up you should be able to obtain performance parity between C and D. Situations where this isn't true are problems we would like to know about. For the amount of work you are doing in the test program (almost nothing), the total runtime is probably dominated by the program load time etc. even when using C.
Jul 09 2014
parent reply "Larry" <deco33 hotmail.fr> writes:
Yes, you are perfectly right, but our need is to run the fastest 
code on the lowest-powered machines. Not servers but embedded 
systems.

That is why I am only testing the overall structure.

The rest of the code is numerical, so it will not change by much 
the fact that D cannot make up for the large launch time. At the 
microsecond level (even nanosecond) it counts because of electrical 
consumption, size of hardware, heat and so on.

It is definitely not something most people care about, and I cannot 
disclose the full code for license reasons (yeah, I know I suck 
and generate some fuss for nothing but.. I just execute.)

But D may be of use to us for non-critical code, to replace some 
Python here and there. It is definitely a good piece of 
engineering. And it will help save money.
Jul 09 2014
next sibling parent "Larry" <deco33 hotmail.fr> writes:
John Colvin: hmm, did you mean the sample code or the real code? If the former, it is the one corrected by bearophile. My apologies.
Jul 09 2014
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Larry:

 The rest of the code is numerical so it will not change by much 
 the fact that d cannot get back the huge launching time. At the 
 microsecond level(even nano) it counts because of electrical 
 consumption, size of hardware, heat and so on.
Have you benchmarked the D code without starting the current D runtime (without the GC)? Is a start-up time of around 0.015 seconds on an old PC a huge one? I think no one has worked much on decreasing this tiny time. If you care about it, D being open source, you can take a look at the runtime start-up code.

Bye,
bearophile
Jul 09 2014
parent reply "Larry" <deco33 hotmail.fr> writes:
 Bearophile: just tried. No dramatic change.

import core.memory;

void main() {
GC.disable;
...
}
Jul 09 2014
parent "bearophile" <bearophileHUGS lycos.com> writes:
That just disables the GC, so the start-up time is the same. What you want is to not start the GC/runtime at all, stubbing it out (assuming you don't need the GC in your program). I think you can stub out the runtime functions by defining a few empty extern(C) functions, but I've never had to do it (saving 0.015 seconds is not important for my needs), so if you don't know how to do it, you will have to ask others.

Bye,
bearophile
Jul 09 2014
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 9 July 2014 at 13:46:59 UTC, Larry wrote:
 The rest of the code is numerical so it will not change by much 
 the fact that d cannot get back the huge launching time. At the 
 microsecond level(even nano) it counts because of electrical 
 consumption, size of hardware, heat and so on.
You say you are worried about microseconds and power consumption, but you are suggesting launching a new process - a lot of overhead - to do a small amount of numerical work. Surely no matter what programming language you use you would not want to work like this?
Jul 09 2014
next sibling parent "Ola Fosheim Grøstad" writes:
On Wednesday, 9 July 2014 at 14:30:41 UTC, John Colvin wrote:
 You say you are worried about microseconds and power 
 consumption, but you are suggesting launching a new process - a 
 lot of overhead - to do a small amount of numerical work.
Not much overhead if you don't use an MMU and use static linking.
Jul 09 2014
prev sibling parent reply "Larry" <deco33 hotmail.fr> writes:
John: A new process? Where? Or maybe I got you wrong on this one, John.

I am writing libraries, and before going further I wondered if there were alternatives that I could get a grip on. The idea is to have homogeneous software, so we were ready to switch to D for the whole task/asset.

No new process involved.

I was looking for maybe a Python-like programming language that offers C-like performance, without as much writing as in C. Exit Cython. Debugging it is a real pain. And executable size is.. well..

I am becoming lazy and seeking the Holy Grail. Java not welcome.
D seemed like a very good choice, and maybe it is, or more certainly will be.
Jul 09 2014
next sibling parent "Chris" <wendlec tcd.ie> writes:
I wouldn't give up on D (as you've already signalled). It's getting better with each iteration. BTW, have you measured the power consumption yet? Does it make a big difference if you use D or C?
Jul 09 2014
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 9 July 2014 at 15:09:09 UTC, Larry wrote:
John : A new process ? Where ? Or maybe I got you wrong on this one John
process == program in this case. Launching a new process == running the program.

The startup cost of the D runtime is only paid when you start the program. If the amount of work done per execution of the program is more than trivial, then the startup cost will only be a small part of the total running time, power consumption, etc.
 I am writing libraries and before going further I wondered if
 there were alternatives that I could have a grab on. The idea is
 to have an homogeneous software so we were ready to switch to d
 for the whole tasks/asset.

 No new process involved.

 I was seeking maybe a python like programming language that
 offers c-like perfs, without so much writing as in c. Exit
 Cython. Debugging it is a real pain. And executable size is..
 well..

 I am becoming lazy and seek for the Holy Grail. Java not 
 welcome.
 D seemed like a very good choice and maybe it is, or more
 certainly will.
I think D could be a good choice for you.
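The amortization argument can be put into numbers with a tiny C sketch. `startup_fraction` is a hypothetical helper, and the 0.015 s figure is simply the runtime start-up time bearophile reported earlier in the thread for his machine, not a general constant.

```c
#include <assert.h>

/* Hypothetical helper: share of total run time spent on a fixed start-up cost.
   startup_s: one-off start-up time in seconds (e.g. runtime initialization)
   work_s:    time spent doing actual work in seconds */
double startup_fraction(double startup_s, double work_s)
{
    assert(startup_s >= 0.0 && work_s >= 0.0 && startup_s + work_s > 0.0);
    return startup_s / (startup_s + work_s);
}
```

With a 0.015 s start-up, a run that does 2 s of work spends under 1% of its time starting up, while a run that does only 10 µs of work spends over 99% of it there. That is why the cost matters only if the program is launched once per tiny piece of work.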
Jul 09 2014
parent reply "Larry" <deco33 hotmail.fr> writes:
I may definitely help on the D project.

I also noticed that gdc doesn't have profile-guided optimization.
So yeah, I cannot use D right now, I mean for this project.

OK, I will do my best to find some spare time for Dlang. I haven't 
really looked at the code yet, and I have been coding for years in C, 
which is my first-class coding language. I hope it will not be any 
kind of barrier (C++ is my.. third best coding buddy anyway 
(after Python, excellent for managing systems)).

Many thanks to all the community. I will stick with you and see 
what I can bring (or cannot).

:)

Bye
Jul 09 2014
parent "Larry" <deco33 hotmail.fr> writes:
 Chris:
Actually yes. If we consider a device that runs 20h a day, by 
shaving a few microseconds here and there on billions of 
operations a day over a whole machine park, you can more easily 
shut some of them down for maintenance, or pause some of them so 
their batteries last a bit longer. The savings have proven to be 
in the order of thousands of $$ thanks to a redefined coding 
strategy.

Not to mention hardware usage, which is related to heat, and the 
savings you can expect over the long run.

By replacing some hardware a few months after its theoretical 
obsolescence, you can save a bit further.

And the accountant is very happy because he can optimize the 
finances further (staggered repayment).

It enabled us to hire more engineers and buy more hardware.

Of course, the saving is not only on this loop but on the whole 
chain. And it definitely adds up $$$.

And there are a lot more things involved that benefit from it 
(latency and so on).

Yep. :)
Jul 09 2014
prev sibling parent reply "Kapps" <opantm2+spam gmail.com> writes:
On Wednesday, 9 July 2014 at 13:18:00 UTC, Larry wrote:
 You are definitely right, I did mess up while translating !

 I run the corrected codes (the ones I was meant to provide :S) 
 and on a slow macbook I end up with :
 C : 2
 D : 15994

 Of course when run on very high end machines, this diff is 
 almost non existent but we want to run on very low powered 
 hardware.

 Ok, even with a longer code, there will always be a launch 
 penalty for d. So I cannot use it for very high performance 
 loops.

 Shame for us..
 :)

 Thanks and bye
This to me pretty much confirms that almost the entirety of your C code is being optimized out and thus not actually executing.
Jul 09 2014
parent "Larry" <deco33 hotmail.fr> writes:
The actual code is not that much slower according to the numerous 
other operations we do. And certainly faster than D version doing 
almost nothing.

Well it is about massive bitshifts and array accesses and 
calculations.
With all the optimizations we are on par with fortran numerical 
code (thanks -std=c11).

There may be an optimization hidden somewhere or just gdc having 
to mature.

Dunno. But don't get me wrong, D is a fantastic language.
Jul 09 2014
prev sibling next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 07/09/2014 03:57 AM, Larry wrote:

      struct timeval s,e;
[...]
      gettimeofday(&e,NULL);

      printf("so ? %d %lu %d %d %d",g,e.tv_usec - s.tv_usec,
 arr[4],arr[9],pol);
Changing the topic a little, the calculation above ignores the tv_sec members of s and e.

Ali
Jul 09 2014
parent reply "Larry" <deco33 hotmail.fr> writes:
Absolutely, Ali, because I know it is under the one-second range. I ran some tests before submitting it :)

But you are absolutely right, Ali: the mileage will vary in a completely different scenario.
Jul 09 2014
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 07/09/2014 12:47 PM, Larry wrote:

Absolutely Ali because I know it is under the sec range. I made some test before submitting it :)
I know it did work and will work every time you test it. :) However, even if the difference is just one millisecond, if s and e happen to be on different sides of a second boundary, you will get a huge result.

Ali
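A minimal sketch of a difference that survives a second boundary, assuming the two samples come from `gettimeofday` as in the original C program. `elapsed_usec` is a hypothetical helper name, not something from the posted code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Elapsed microseconds between two gettimeofday() samples.
   Both tv_sec and tv_usec are used, so a measurement that starts just
   before a second boundary and ends just after it is still correct. */
int64_t elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    int64_t sec  = (int64_t)end->tv_sec  - (int64_t)start->tv_sec;
    int64_t usec = (int64_t)end->tv_usec - (int64_t)start->tv_usec;
    return sec * 1000000 + usec;
}
```

For example, with a start of 10 s + 999500 µs and an end of 11 s + 500 µs the function returns 1000, whereas `e.tv_usec - s.tv_usec` alone is negative and prints as a huge number with `%lu`. The result is signed, so it would be printed with `%lld` after a cast to `long long`, or with `PRId64`.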
Jul 09 2014
parent "Larry" <deco33 hotmail.fr> writes:
Right
Jul 09 2014
prev sibling parent "Kapps" <opantm2+spam gmail.com> writes:
Measure a larger number of loops. I understand you're concerned 
about microseconds, but your benchmark shows nothing because your 
timer is simply not accurate enough for this. The benchmark that 
bearophile showed where C took ~2 nanoseconds vs the ~7000 D took 
heavily implies to me that the C implementation is simply being 
optimized out and nothing is actually running. All inputs are 
known at compile-time, the output is known at compile-time, the 
compiler is perfectly free to simply remove all your code and 
replace it with the result. I'm somewhat surprised that the D 
version doesn't do this actually, perhaps because of the dynamic 
memory allocation. I realize that you can't post your actual 
code, but this benchmark honestly just has too many flaws to 
determine anything from.

As for startup cost, D will indeed have a higher startup cost 
than C because of static constructors. Once it's running, it 
should be very close. If you're looking to start a process that 
will run for only a few milliseconds, you'd probably want to not 
use D (or avoid most static constructors, including those in the 
runtime / standard library).
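One common way to stop a C compiler from replacing such a benchmark with its precomputed result is to make the inputs opaque and the outputs observable. The sketch below does this with `volatile`; it is an illustration under that assumption, not the code from the thread, and `seed_values`, `sink` and `run_kernel` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* volatile reads must really happen, so the initial array contents
   cannot be treated as compile-time constants. */
volatile int seed_values[13] =
    {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215};

/* volatile write target, so the computed results cannot be discarded. */
volatile int sink;

/* Runs the double loop with `outer` outer iterations and returns the
   number of inner iterations executed. */
int run_kernel(int outer)
{
    assert(outer >= 0);

    int arr[13];
    for (size_t i = 0; i < 13; ++i)
        arr[i] = seed_values[i];      /* opaque input */

    int g = 0;
    for (int x = outer - 1; x >= 0; --x) {
        for (int y = 12; y >= 0; --y) {
            ++g;
            arr[y] += 1;
        }
    }

    sink = arr[4] + arr[9];           /* observable output */
    return g;
}
```

With `outer` set to 37 this reproduces the 481 iterations and the 105 and 602 array values reported earlier. The optimizer is still free to turn the loops into a single addition per element, so this only prevents folding the whole program into constants; timing real work needs the real loop body and a run long enough to measure.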
Jul 09 2014