digitalmars.D.learn - math.log() benchmark of first 1 billion int using std.parallelism

reply "Iov Gherman" <iovisx gmail.com> writes:
Hi everybody,

I am a Java developer and have used C/C++ only for some home 
projects, so I never mastered native programming.

I am currently learning D and I find it fascinating. I was 
reading the documentation about std.parallelism and I wanted to 
experiment a bit with the example "Find the logarithm of every 
number from 1 to 10_000_000 in parallel".

So, first, I changed the limit to 1 billion and ran it. I was 
blown away by the performance: the program ran in 4 secs, 670 ms, 
using a workUnitSize of 200. I have a 4th generation i7 
processor with 8 cores.
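
The D version is essentially the documentation example scaled up 
to 1 billion; roughly something like this (a sketch, not the 
exact code I ran):

    import std.parallelism, std.math, std.datetime, std.stdio;

    void main() {
        auto t1 = Clock.currTime();
        // 1 billion doubles, roughly 8 GB
        auto logs = new double[1_000_000_000];
        // workUnitSize of 200: each task handles 200 elements
        foreach (i, ref elem; taskPool.parallel(logs, 200)) {
            elem = log(i + 1.0);
        }
        writeln("time: ", Clock.currTime() - t1);
    }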

Then I was curious to try the same test in Java, just to see how 
much slower it would be (at least that was what I expected). I 
used Java's ExecutorService with a thread pool of size 8 and 
created 5_000_000 tasks, each task calculating log() for 200 
numbers. The whole program ran in 3 secs, 315 ms.

Now, can anyone explain why this program ran faster in Java? I 
ran both programs multiple times and the results were always 
close to these execution times.

Can the implementation of the log() function be the reason for 
the slower execution time in D?

I then decided to run the same program in a single thread, a 
simple foreach/for loop. I also tried it in C and Go. These are 
the results:
- D: 24 secs, 32 ms.
- Java: 20 secs, 881 ms.
- C: 21 secs
- Go: 37 secs

I run Arch Linux on my PC. I compiled the D programs using 
dmd-2.066 with no compile arguments (dmd prog.d).
I used Oracle's Java 8 (I also tried 7 and 6; it seems that with 
Java 6 the performance is a bit better than with 7 and 8).
To compile the C program I used gcc 4.9.2.
For the Go program I used go 1.4.

I really, really like the built-in support in D for parallel 
processing and how easy it is to schedule tasks taking advantage 
of workUnitSize.

Thanks,
Iov
Dec 22 2014
next sibling parent "bachmeier" <no spam.net> writes:
On Monday, 22 December 2014 at 10:12:52 UTC, Iov Gherman wrote:
 Now, can anyone explain why this program ran faster in Java? I 
 ran both programs multiple times and the results were always 
 close to this execution times.

 Can the implementation of log() function be the reason for a 
 slower execution time in D?

 I then decided to ran the same program in a single thread, a 
 simple foreach/for loop. I tried it in C and Go also. This are 
 the results:
 - D: 24 secs, 32 ms.
 - Java: 20 secs, 881 ms.
 - C: 21 secs
 - Go: 37 secs

 I run Arch Linux on my PC. I compiled D programs using 
 dmd-2.066 and used no compile arguments (dmd prog.d).
 I used Oracle's Java 8 (tried 7 and 6, seems like with Java 6 
 the performance is a bit better then 7 and 8).
 To compile the C program I used: gcc 4.9.2
 For Go program I used go 1.4

 I really really like the built in support in D for parallel 
 processing and how easy is to schedule tasks taking advantage 
 of workUnitSize.

 Thanks,
 Iov
DMD is generally going to produce the slowest code. LDC and GDC will normally do better.
Dec 22 2014
prev sibling next sibling parent reply Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
 I run Arch Linux on my PC. I compiled D programs using dmd-2.066 
 and used no compile arguments (dmd prog.d)
You should try using some arguments: -O -release -inline -noboundscheck. And maybe trying gdc or ldc should help with performance. Can you post your code in all languages somewhere? I would like to try it on my machine :)
Dec 22 2014
parent reply "Daniel Kozak" <kozzi11 gmail.com> writes:
On Monday, 22 December 2014 at 10:35:52 UTC, Daniel Kozak via 
Digitalmars-d-learn wrote:
 I run Arch Linux on my PC. I compiled D programs using 
 dmd-2.066 and used no compile arguments (dmd prog.d)
You should try use some arguments -O -release -inline -noboundscheck and maybe try use gdc or ldc should help with performance can you post your code in all languages somewhere? I like to try it on my machine :)
Btw. try using the C log function, maybe it would be faster: import core.stdc.math;
Dec 22 2014
parent reply "aldanor" <i.s.smirnov gmail.com> writes:
On Monday, 22 December 2014 at 10:40:45 UTC, Daniel Kozak wrote:
 On Monday, 22 December 2014 at 10:35:52 UTC, Daniel Kozak via 
 Digitalmars-d-learn wrote:
 I run Arch Linux on my PC. I compiled D programs using 
 dmd-2.066 and used no compile arguments (dmd prog.d)
You should try use some arguments -O -release -inline -noboundscheck and maybe try use gdc or ldc should help with performance can you post your code in all languages somewhere? I like to try it on my machine :)
Btw. try use C log function, maybe it would be faster: import core.stdc.math;
Just tried it out myself (E5 Xeon / Linux):

D version: 19.64 sec (avg 3 runs)

    import core.stdc.math;

    void main() {
        double s = 0;
        foreach (i; 1 .. 1_000_000_000)
            s += log(i);
    }

    // build flags: -O -release

C version: 19.80 sec (avg 3 runs)

    #include <math.h>

    int main() {
        double s = 0;
        long i;
        for (i = 1; i < 1000000000; i++)
            s += log(i);
        return 0;
    }

    // build flags: -O3 -lm
Dec 22 2014
parent reply "aldanor" <i.s.smirnov gmail.com> writes:
On Monday, 22 December 2014 at 11:11:07 UTC, aldanor wrote:
 Just tried it out myself (E5 Xeon / Linux):

 D version: 19.64 sec (avg 3 runs)

     import core.stdc.math;

     void main() {
         double s = 0;
         foreach (i; 1 .. 1_000_000_000)
             s += log(i);
     }

     // build flags: -O -release

 C version: 19.80 sec (avg 3 runs)

     #include <math.h>

     int main() {
         double s = 0;
         long i;
         for (i = 1; i < 1000000000; i++)
             s += log(i);
         return 0;
     }

     // build flags: -O3 -lm
Replacing "import core.stdc.math" with "import std.math" in the 
D example increases the avg runtime from 19.64 to 23.87 seconds 
(~20% slower), which is consistent with the OP's statement.
Dec 22 2014
parent "Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:
 Replacing "import core.stdc.math" with "import std.math" in the 
 D example increases the avg runtime from 19.64 to 23.87 seconds 
 (~20% slower) which is consistent with OP's statement.
+ GDC/LDC vs DMD
+ nobounds, release

Do you think we should start a topic on the D wiki front page for benchmarking/performance tips, to organize people's experience of what works? I took a quick look and couldn't see anything already. It seems to be a topic that comes up quite frequently (less on the forum than people doing their own benchmarks and it getting picked up on reddit etc.). I am not so experienced in this area, otherwise I would write a first draft myself.

Laeeth
Dec 22 2014
prev sibling next sibling parent Russel Winder via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Mon, 2014-12-22 at 10:12 +0000, Iov Gherman via Digitalmars-d-learn wrote:
 […]
 - D: 24 secs, 32 ms.
 - Java: 20 secs, 881 ms.
 - C: 21 secs
 - Go: 37 secs
 
Without the source codes and the commands used to create and run, it is impossible to offer constructive criticism of the results. However, a priori the above does not surprise me.

I'll wager ldc2 or gdc will beat dmd for CPU-bound code, so as others have said, for benchmarking use ldc2 or gdc with all optimization on (-O3). If you used gc for Go then switch to gccgo (again with -O3) and see a huge performance improvement on CPU-bound code.

Java beating C and C++ is fairly normal these days due to the tricks you can play with JIT over AOT optimization. Once Java has proper support for GPGPU, it will be hard for native code languages to get any new converts from JVM.

Put the source up and I and others will try things out.

-- 
Russel.
=============================================================================
Dr Russel Winder                  t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road                m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK               w: www.russel.org.uk   skype: russel_winder
Dec 22 2014
prev sibling parent reply "Iov Gherman" <iovisx gmail.com> writes:
Hi Guys,

First of all, thank you all for responding so quickly; it is so 
nice to see D having such an active community.

As I said in my first post, I used no other parameters to dmd 
when compiling because I don't know too much about dmd 
compilation flags. I can't wait to try the flags Daniel suggested 
with dmd (-O -release -inline -noboundscheck) and the other two 
compilers (ldc2 and gdc). Thank you guys for your suggestions.

Meanwhile, I created a git repository on GitHub and put all my 
code there. If you find any errors, please let me know. Because I 
am keeping the results in a big array, the programs take 
approximately 8 GB of RAM. If you don't have enough RAM, feel 
free to decrease the size of the array. For the Java code you 
will also need to change 'compile-run.bsh' and use the right 
memory parameters.


Thank you all for helping,
Iov
Dec 22 2014
parent reply "bachmeier" <no spam.com> writes:
On Monday, 22 December 2014 at 17:05:19 UTC, Iov Gherman wrote:
 Hi Guys,

 First of all, thank you all for responding so quick, it is so 
 nice to see D having such an active community.

 As I said in my first post, I used no other parameters to dmd 
 when compiling because I don't know too much about dmd 
 compilation flags. I can't wait to try the flags Daniel 
 suggested with dmd (-O -release -inline -noboundscheck) and the 
 other two compilers (ldc2 and gdc). Thank you guys for your 
 suggestions.

 Meanwhile, I created a git repository on github and I put there 
 all my code. If you find any errors please let me know. Because 
 I am keeping the results in a big array the programs take 
 approximately 8Gb of RAM. If you don't have enough RAM feel 
 free to decrease the size of the array. For java code you will 
 also need to change 'compile-run.bsh' and use the right memory 
 parameters.


 Thank you all for helping,
 Iov
Link to your repo?
Dec 22 2014
parent reply "Iov Gherman" <iovisx gmail.com> writes:
On Monday, 22 December 2014 at 17:16:05 UTC, bachmeier wrote:
 On Monday, 22 December 2014 at 17:05:19 UTC, Iov Gherman wrote:
 Hi Guys,

 First of all, thank you all for responding so quick, it is so 
 nice to see D having such an active community.

 As I said in my first post, I used no other parameters to dmd 
 when compiling because I don't know too much about dmd 
 compilation flags. I can't wait to try the flags Daniel 
 suggested with dmd (-O -release -inline -noboundscheck) and 
 the other two compilers (ldc2 and gdc). Thank you guys for 
 your suggestions.

 Meanwhile, I created a git repository on github and I put 
 there all my code. If you find any errors please let me know. 
 Because I am keeping the results in a big array the programs 
 take approximately 8Gb of RAM. If you don't have enough RAM 
 feel free to decrease the size of the array. For java code you 
 will also need to change 'compile-run.bsh' and use the right 
 memory parameters.


 Thank you all for helping,
 Iov
Link to your repo?
Sorry, forgot about it: https://github.com/ghermaniov/benchmarks
Dec 22 2014
next sibling parent reply "Iov Gherman" <iovisx gmail.com> writes:
So, I did some more testing with the one processing in parallel:

--- dmd:
4 secs, 977 ms

--- dmd with flags: -O -release -inline -noboundscheck:
4 secs, 635 ms

--- ldc:
6 secs, 271 ms

--- gdc:
10 secs, 439 ms

I also pushed the new bash scripts to the git repository.
Dec 22 2014
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:
 So, I did some more testing with the one processing in paralel:

 --- dmd:
 4 secs, 977 ms

 --- dmd with flags: -O -release -inline -noboundscheck:
 4 secs, 635 ms

 --- ldc:
 6 secs, 271 ms

 --- gdc:
 10 secs, 439 ms

 I also pushed the new bash scripts to the git repository.
Flag suggestions:

ldc2 -O3 -release -mcpu=native -singleobj
gdc -O3 -frelease -march=native
Dec 22 2014
parent reply "Iov Gherman" <iovisx gmail.com> writes:
On Monday, 22 December 2014 at 17:50:20 UTC, John Colvin wrote:
 On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:
 So, I did some more testing with the one processing in paralel:

 --- dmd:
 4 secs, 977 ms

 --- dmd with flags: -O -release -inline -noboundscheck:
 4 secs, 635 ms

 --- ldc:
 6 secs, 271 ms

 --- gdc:
 10 secs, 439 ms

 I also pushed the new bash scripts to the git repository.
Flag suggestions: ldc2 -O3 -release -mcpu=native -singleobj gdc -O3 -frelease -march=native
Tried it, here are the results:

--- ldc:
6 secs, 271 ms

--- ldc -O3 -release -mcpu=native -singleobj:
5 secs, 686 ms

--- gdc:
10 secs, 439 ms

--- gdc -O3 -frelease -march=native:
9 secs, 180 ms
Dec 22 2014
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 22 December 2014 at 18:27:48 UTC, Iov Gherman wrote:
 On Monday, 22 December 2014 at 17:50:20 UTC, John Colvin wrote:
 On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:
 So, I did some more testing with the one processing in 
 paralel:

 --- dmd:
 4 secs, 977 ms

 --- dmd with flags: -O -release -inline -noboundscheck:
 4 secs, 635 ms

 --- ldc:
 6 secs, 271 ms

 --- gdc:
 10 secs, 439 ms

 I also pushed the new bash scripts to the git repository.
Flag suggestions: ldc2 -O3 -release -mcpu=native -singleobj gdc -O3 -frelease -march=native
Tried it, here are the results: --- ldc: 6 secs, 271 ms --- ldc -O3 -release -mcpu=native -singleobj: 5 secs, 686 ms --- gdc: 10 secs, 439 ms --- gdc -O3 -frelease -march=native: 9 secs, 180 ms
That's very different from my results.

I see no important difference between ldc and dmd when using std.math, but when using core.stdc.math, ldc halves its time, whereas dmd only gets down to ~80% of its original time.
Dec 22 2014
next sibling parent reply "Daniel Kozak" <kozzi11 gmail.com> writes:
 That's very different to my results.

 I see no important difference between ldc and dmd when using 
 std.math, but when using core.stdc.math ldc halves its time 
 where dmd only manages to get to ~80%
What CPU do you have? On my Intel Core i3 I have a similar experience to Iov Gherman, but on my AMD FX4200 I get the same results as you. It seems std.math.log is not good for my AMD CPU :)
Dec 22 2014
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Tuesday, 23 December 2014 at 07:26:27 UTC, Daniel Kozak wrote:
 That's very different to my results.

 I see no important difference between ldc and dmd when using 
 std.math, but when using core.stdc.math ldc halves its time 
 where dmd only manages to get to ~80%
What CPU do you have? On my Intel Core i3 I have similar experience as Iov Gherman, but on my Amd FX4200 I have same results as you. Seems std.math.log is not good for my AMD CPU :)
Intel Core i5-4278U
Dec 23 2014
prev sibling parent reply "Iov Gherman" <iovisx gmail.com> writes:
 That's very different to my results.

 I see no important difference between ldc and dmd when using 
 std.math, but when using core.stdc.math ldc halves its time 
 where dmd only manages to get to ~80%
I checked again today and the results are interesting: on my PC I don't see any difference between std.math and core.stdc.math with ldc. Here are the results with all compilers.

- with std.math:
dmd: 4 secs, 878 ms
ldc: 5 secs, 650 ms
gdc: 9 secs, 161 ms

- with core.stdc.math:
dmd: 5 secs, 991 ms
ldc: 5 secs, 572 ms
gdc: 7 secs, 957 ms
Dec 23 2014
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Tuesday, 23 December 2014 at 10:20:04 UTC, Iov Gherman wrote:
 That's very different to my results.

 I see no important difference between ldc and dmd when using 
 std.math, but when using core.stdc.math ldc halves its time 
 where dmd only manages to get to ~80%
I checked again today and the results are interesting, on my pc I don't see any difference between std.math and core.stdc.math with ldc. Here are the results with all compilers. - with std.math: dmd: 4 secs, 878 ms ldc: 5 secs, 650 ms gdc: 9 secs, 161 ms - with core.stdc.math: dmd: 5 secs, 991 ms ldc: 5 secs, 572 ms gdc: 7 secs, 957 ms
These multi-threaded benchmarks can be very sensitive to their environment; you should try running them with nice -20 and do multiple passes to get a vague idea of the variability in the result. Also, it's important to minimise the number of other running processes.
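
For example, something along these lines (just a sketch; "workload" here is a placeholder for whatever loop is being measured):

    import std.datetime, std.stdio;

    void workload() {
        // placeholder: the benchmark body under test goes here
    }

    void main() {
        enum passes = 5;
        Duration best = Duration.max;
        foreach (p; 0 .. passes) {
            auto t1 = Clock.currTime();
            workload();
            auto elapsed = Clock.currTime() - t1;
            if (elapsed < best) best = elapsed;
            writeln("pass ", p + 1, ": ", elapsed);
        }
        writeln("best: ", best);
    }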
Dec 23 2014
parent reply "Iov Gherman" <iovisx gmail.com> writes:
 These multi-threaded benchmarks can be very sensitive to their 
 environment, you should try running it with nice -20 and do 
 multiple passes to get a vague idea of the variability in the 
 result. Also, it's important to minimise the number of other 
 running processes.
I did not use the nice parameter, but I always ran them multiple times and chose the average time. My system has very few running processes (minimalist Arch Linux with Xfce4), so I don't think the running processes are affecting my tests in any way.
Dec 23 2014
parent reply "Daniel Kozak" <kozzi11 gmail.com> writes:
On Tuesday, 23 December 2014 at 10:39:13 UTC, Iov Gherman wrote:
 These multi-threaded benchmarks can be very sensitive to their 
 environment, you should try running it with nice -20 and do 
 multiple passes to get a vague idea of the variability in the 
 result. Also, it's important to minimise the number of other 
 running processes.
I did not use the nice parameter but I always ran them multiple times and choose the average time. My system has very few running processes, minimalist ArchLinux with Xfce4 so I don't think the running processes are affecting in any way my tests.
And what about the single-threaded version?

Btw. one reason why DMD is faster is that it uses the fyl2x x87 instruction. Here is a version for the other compilers:

    import std.math, std.stdio, std.datetime;

    enum SIZE = 100_000_000;

    version(GNU) {
        real mylog(double x) pure nothrow {
            real result;
            double y = LN2;
            asm {
                "fldl %2\n"
                "fldl %1\n"
                "fyl2x"
                : "=t" (result)
                : "m" (x), "m" (y);
            }
            return result;
        }
    } else {
        real mylog(double x) pure nothrow {
            return yl2x(x, LN2);
        }
    }

    void main() {
        auto t1 = Clock.currTime();
        auto logs = new double[SIZE];
        foreach (i; 0 .. SIZE) {
            logs[i] = mylog(i + 1.0);
        }
        auto t2 = Clock.currTime();
        writeln("time: ", (t2 - t1));
    }

But it is faster only on Intel CPUs; on one of my AMD machines it is slower than the core.stdc.math log.
Dec 23 2014
parent reply "Iov Gherman" <iovisx gmail.com> writes:
 And what about single threaded version?
Just ran the single-thread examples after I moved the time start before the array allocation, thanks for that, good catch. Still better results in Java:

- java: 21 secs, 612 ms

- with std.math:
dmd: 23 secs, 994 ms
ldc: 31 secs, 668 ms
gdc: 52 secs, 576 ms

- with core.stdc.math:
dmd: 30 secs, 724 ms
ldc: 30 secs, 988 ms
gdc: 25 secs, 970 ms
Dec 23 2014
parent "Ola Fosheim Grøstad" writes:
On Tuesday, 23 December 2014 at 12:26:28 UTC, Iov Gherman wrote:
 And what about single threaded version?
Just ran the single thread examples after I moved time start before array allocation, thanks for that, good catch. Still better results in Java: - java: 21 secs, 612 ms - with std.math: dmd: 23 secs, 994 ms ldc: 31 secs, 668 ms gdc: 52 secs, 576 ms - with core.stdc.math: dmd: 30 secs, 724 ms ldc: 30 secs, 988 ms gdc: time: 25 secs, 970 ms
Note that log is done in software on x86 with different levels of precision and with different ability to handle corner cases. It is therefore a very bad benchmark tool.
Dec 23 2014
prev sibling parent reply "Daniel Kozak" <kozzi11 gmail.com> writes:
On Tuesday, 23 December 2014 at 10:20:04 UTC, Iov Gherman wrote:
 That's very different to my results.

 I see no important difference between ldc and dmd when using 
 std.math, but when using core.stdc.math ldc halves its time 
 where dmd only manages to get to ~80%
I checked again today and the results are interesting, on my pc I don't see any difference between std.math and core.stdc.math with ldc. Here are the results with all compilers. - with std.math: dmd: 4 secs, 878 ms ldc: 5 secs, 650 ms gdc: 9 secs, 161 ms - with core.stdc.math: dmd: 5 secs, 991 ms ldc: 5 secs, 572 ms gdc: 7 secs, 957 ms
Btw. I just noticed a small issue with D vs. Java: you start measuring in D before the allocation, but in the Java case after the allocation.
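
Roughly this difference (just an illustration, not the exact code from the repo):

    import std.datetime;

    void main() {
        auto t1 = Clock.currTime();              // the D code starts its clock here...
        auto logs = new double[1_000_000_000];   // ...so the ~8 GB allocation is included in the D time
        // ... compute the logs ...
        auto t2 = Clock.currTime();
        // The Java version only starts its timer after the array has been
        // allocated, so the two programs are not measuring the same work.
    }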
Dec 23 2014
parent reply "Iov Gherman" <iovisx gmail.com> writes:
 Btw. I just noticed small issue with D vs. java, you start 
 messure in D before allocation, but in case of Java after 
 allocation
Here is the Java result for parallel processing after moving the start time to the first line of main. Still the best result: 4 secs, 50 ms average
Dec 23 2014
next sibling parent "Iov Gherman" <iovisx gmail.com> writes:
Forgot to mention that I pushed my changes to github.
Dec 23 2014
prev sibling parent reply "Daniel Kozak" <kozzi11 gmail.com> writes:
On Tuesday, 23 December 2014 at 12:31:47 UTC, Iov Gherman wrote:
 Btw. I just noticed small issue with D vs. java, you start 
 messure in D before allocation, but in case of Java after 
 allocation
Here is the java result for parallel processing after moving the start time as the first line in main. Still best result: 4 secs, 50 ms average
Java:
Exec time: 6 secs, 421 ms

LDC (-O3 -release -mcpu=native -singleobj -inline -boundscheck=off):
time: 5 secs, 321 ms, 877 μs, and 2 hnsecs

GDC (-O3 -frelease -march=native -finline -fno-bounds-check):
time: 5 secs, 237 ms, 453 μs, and 7 hnsecs

DMD (-O -release -inline -noboundscheck):
time: 5 secs, 107 ms, 931 μs, and 3 hnsecs

So all D compilers beat Java in my case, but I have made some changes in the D version:

    import std.parallelism, std.math, std.stdio, std.datetime;
    import core.memory;

    enum XMS = 3*1024*1024*1024; //3GB

    version(GNU) {
        real mylog(double x) pure nothrow {
            double result;
            double y = LN2;
            asm {
                "fldl %2\n"
                "fldl %1\n"
                "fyl2x\n"
                : "=t" (result)
                : "m" (x), "m" (y);
            }
            return result;
        }
    } else {
        real mylog(double x) pure nothrow {
            return yl2x(x, LN2);
        }
    }

    void main() {
        GC.reserve(XMS);
        auto t1 = Clock.currTime();
        auto logs = new double[1_000_000_000];

        foreach (i, ref elem; taskPool.parallel(logs, 200)) {
            elem = mylog(i + 1.0);
        }
        auto t2 = Clock.currTime();
        writeln("time: ", (t2 - t1));
    }
Dec 23 2014
parent "Casper Færgemand" <shorttail hotmail.com> writes:
I'm getting faster execution on Java than dmd; gdc beats it 
though.

...although, what this topic really provides is a reason for me 
to get more RAM for my next laptop. How much do you people run 
with? I had to scale the Java version down to 300 million to 
avoid dying with 4G of memory.
Dec 23 2014
prev sibling parent reply "aldanor" <i.s.smirnov gmail.com> writes:
On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:
 So, I did some more testing with the one processing in paralel:

 --- dmd:
 4 secs, 977 ms

 --- dmd with flags: -O -release -inline -noboundscheck:
 4 secs, 635 ms

 --- ldc:
 6 secs, 271 ms

 --- gdc:
 10 secs, 439 ms

 I also pushed the new bash scripts to the git repository.
import std.math, std.stdio, std.datetime; --> try replacing "std.math" with "core.stdc.math".
Dec 22 2014
parent reply "Iov Gherman" <iovisx gmail.com> writes:
On Monday, 22 December 2014 at 18:00:18 UTC, aldanor wrote:
 On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:
 So, I did some more testing with the one processing in paralel:

 --- dmd:
 4 secs, 977 ms

 --- dmd with flags: -O -release -inline -noboundscheck:
 4 secs, 635 ms

 --- ldc:
 6 secs, 271 ms

 --- gdc:
 10 secs, 439 ms

 I also pushed the new bash scripts to the git repository.
import std.math, std.stdio, std.datetime; --> try replacing "std.math" with "core.stdc.math".
Tried it, it is worse: 6 secs, 78 ms, while the initial one was 4 secs, 977 ms and sometimes even better.
Dec 22 2014
parent "Marc Schütz" <schuetzm gmx.net> writes:
On Monday, 22 December 2014 at 18:23:29 UTC, Iov Gherman wrote:
 On Monday, 22 December 2014 at 18:00:18 UTC, aldanor wrote:
 On Monday, 22 December 2014 at 17:28:12 UTC, Iov Gherman wrote:
 So, I did some more testing with the one processing in 
 paralel:

 --- dmd:
 4 secs, 977 ms

 --- dmd with flags: -O -release -inline -noboundscheck:
 4 secs, 635 ms

 --- ldc:
 6 secs, 271 ms

 --- gdc:
 10 secs, 439 ms

 I also pushed the new bash scripts to the git repository.
import std.math, std.stdio, std.datetime; --> try replacing "std.math" with "core.stdc.math".
Tried it, it is worst: 6 secs, 78 ms while the initial one was 4 secs, 977 ms and sometimes even better.
Strange... for me, core.stdc.math.log is about twice as fast as std.math.log.
Dec 22 2014
prev sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 22 December 2014 at 17:16:49 UTC, Iov Gherman wrote:
 On Monday, 22 December 2014 at 17:16:05 UTC, bachmeier wrote:
 On Monday, 22 December 2014 at 17:05:19 UTC, Iov Gherman wrote:
 Hi Guys,

 First of all, thank you all for responding so quick, it is so 
 nice to see D having such an active community.

 As I said in my first post, I used no other parameters to dmd 
 when compiling because I don't know too much about dmd 
 compilation flags. I can't wait to try the flags Daniel 
 suggested with dmd (-O -release -inline -noboundscheck) and 
 the other two compilers (ldc2 and gdc). Thank you guys for 
 your suggestions.

 Meanwhile, I created a git repository on github and I put 
 there all my code. If you find any errors please let me know. 
 Because I am keeping the results in a big array the programs 
 take approximately 8Gb of RAM. If you don't have enough RAM 
 feel free to decrease the size of the array. For java code 
 you will also need to change 'compile-run.bsh' and use the 
 right memory parameters.


 Thank you all for helping,
 Iov
Link to your repo?
Sorry, forgot about it: https://github.com/ghermaniov/benchmarks
For posix-style threads, a per-thread workload of 200 calls to log seems rather small. It would be interesting to see a graph of execution time as a function of workgroup size.

Traditionally one would use a workgroup size of (nElements / nCores) or similar, in order to get all the cores working while also minimising pressure on the scheduler, inter-thread communication and so on.
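
For this particular benchmark that would look roughly like this (a sketch using totalCPUs from std.parallelism, not tested against the repo code):

    import std.parallelism, std.math;

    void main() {
        auto logs = new double[1_000_000_000];
        // one contiguous chunk per core instead of millions of 200-element tasks
        immutable workUnitSize = logs.length / totalCPUs;
        foreach (i, ref elem; taskPool.parallel(logs, workUnitSize)) {
            elem = log(i + 1.0);
        }
    }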
Dec 23 2014