digitalmars.D.learn - Speed of math function atan: comparison D and C++

J-S Caux <js gmail.com> writes:

I'm considering shifting a large existing C++ codebase into D 
(it's a scientific code making much use of functions like atan, 
log etc).

I've compared the raw speed of atan between C++ (Apple LLVM 
version 7.3.0 (clang-703.0.29)) and D (dmd v2.079.0, also ldc2 
1.7.0) by doing long loops of such functions.

I can't get the D to run faster than about half the speed of C++.

Are there benchmarks for such scientific functions published 
somewhere?

Mar 04 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 05/03/2018 6:35 PM, J-S Caux wrote:
 I'm considering shifting a large existing C++ codebase into D (it's a 
 scientific code making much use of functions like atan, log etc).
 
 I've compared the raw speed of atan between C++ (Apple LLVM version 
 7.3.0 (clang-703.0.29)) and D (dmd v2.079.0, also ldc2 1.7.0) by doing 
 long loops of such functions.
 
 I can't get the D to run faster than about half the speed of C++.
 
 Are there benchmarks for such scientific functions published somewhere

Gonna need to disassemble and compare them.

atan should work out to only be a few instructions (inline assembly) 
from what I've looked at in the source.

Also you should post the code you used for each.

Mar 04 2018

J-S Caux <js gmail.com> writes:

On Monday, 5 March 2018 at 05:40:09 UTC, rikki cattermole wrote:
 Gonna need to disassemble and compare them.

 atan should work out to only be a few instructions (inline 
 assembly) from what I've looked at in the source.

 Also you should post the code you used for each.

So the code is trivial, just a check of raw speed:

   double x = 0.0;
   for (int a = 0; a < 1000000000; ++a)
       x += atan(1.0/(1.0 + sqrt(1.0 + a)));

for C++ and

   double x = 0.0;
   for (int a = 0; a < 1_000_000_000; ++a)
       x += atan(1.0/(1.0 + sqrt(1.0 + a)));

for D. The C++ executable takes 40 seconds, the D executable takes 68 seconds.
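
For reference, a minimal sketch of how the D loop could be timed in-process rather than with an external timer. It assumes Phobos' `std.datetime.stopwatch` is available (it ships with dmd 2.079). The function name `sumAtan` and the reduced iteration count are illustrative only and are not taken from the original test files.

```d
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.math : atan, sqrt;
import std.stdio : writefln;

/// Sums atan(1/(1 + sqrt(1 + a))) for a in [0, n), mirroring the loop above.
/// (Hypothetical helper, named here for illustration.)
double sumAtan(int n)
{
    double x = 0.0;
    foreach (a; 0 .. n)
        x += atan(1.0 / (1.0 + sqrt(1.0 + a)));
    return x;
}

void main()
{
    enum n = 10_000_000; // assumption: far fewer iterations than in the post
    auto sw = StopWatch(AutoStart.yes);
    immutable x = sumAtan(n);
    sw.stop();
    // Printing x keeps the optimizer from discarding the loop entirely.
    writefln("x = %.6f, elapsed = %s ms", x, sw.peek.total!"msecs");
}
```

Timing only the loop leaves process start-up out of the measurement, and printing the accumulated sum gives a quick check that the C++ and D builds compute the same value.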

Mar 04 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 05/03/2018 7:01 PM, J-S Caux wrote:
 So the codes are trivial, simply some check of raw speed:
 
    double x = 0.0;
    for (int a = 0; a < 1000000000; ++a)
        x += atan(1.0/(1.0 + sqrt(1.0 + a)));
 
 for C++ and
 
    double x = 0.0;
    for (int a = 0; a < 1_000_000_000; ++a)
        x += atan(1.0/(1.0 + sqrt(1.0 + a)));
 
 for D. C++ exec takes 40 seconds, D exec takes 68 seconds.

Yes, but that doesn't show me how you benchmarked.

Mar 04 2018

Uknown <sireeshkodali1 gmail.com> writes:

On Monday, 5 March 2018 at 06:01:27 UTC, J-S Caux wrote:
 So the codes are trivial, simply some check of raw speed:

   double x = 0.0;
   for (int a = 0; a < 1000000000; ++a)
       x += atan(1.0/(1.0 + sqrt(1.0 + a)));

 for C++ and

   double x = 0.0;
   for (int a = 0; a < 1_000_000_000; ++a)
       x += atan(1.0/(1.0 + sqrt(1.0 + a)));

 for D. C++ exec takes 40 seconds, D exec takes 68 seconds.

Depending on your platform, the size of `double` could be 
different between C++ and D. Could you check that the size and 
precision are indeed the same?
Also, benchmark method is just as important as benchmark code. 
Did you use DMD or LDC as the D compiler? In this case it 
shouldn't matter, but try with LDC if you haven't. Also ensure 
that you've used the right flags:
`-release -inline -O`.

If the D version is still slower, you could try using the C 
version of the function.
Simply change `import std.math: atan;` to 
`import core.stdc.math: atan;` [0]

[0]: https://dlang.org/phobos/core_stdc_math.html#.atan
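
A minimal sketch of the import swap described above, assuming the rest of the benchmark stays unchanged. The iteration count is illustrative. The `static assert` lines only document that D defines `double` as a 64-bit type and that the C binding returns `double`.

```d
import core.stdc.math : atan;   // C runtime's double-precision atan
import std.math : sqrt;         // sqrt still comes from Phobos
import std.stdio : writeln;

// D specifies double as a 64-bit floating-point type.
static assert(double.sizeof == 8);
// The C binding takes and returns double, with no detour through real.
static assert(is(typeof(atan(1.0)) == double));

void main()
{
    double x = 0.0;
    foreach (a; 0 .. 1_000_000) // assumption: shortened loop for a quick run
        x += atan(1.0 / (1.0 + sqrt(1.0 + a)));
    writeln(x);
}
```

Because only the import line differs, the same source can be built against either implementation to compare timings.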

Mar 05 2018

J-S Caux <js gmail.com> writes:

On Monday, 5 March 2018 at 09:48:49 UTC, Uknown wrote:

 Depending on your platform, the size of `double` could be 
 different between C++ and D. Could you check that the size and 
 precision are indeed the same?
 Also, benchmark method is just as important as benchmark code. 
 Did you use DMD or LDC as the D compiler? In this case it 
 shouldn't matter, but try with LDC if you haven't. Also ensure 
 that you've used the right flags:
 `-release -inline -O`.

 If the D version is still slower, you could try using the C 
 version of the function.
 Simply change `import std.math: atan;` to 
 `import core.stdc.math: atan;` [0]

 [0]: https://dlang.org/phobos/core_stdc_math.html#.atan

Thanks all for the info.

I've tested these two very basic representative codes:
https://www.dropbox.com/s/b5o4i8h43qh1saf/test.cc?dl=0
https://www.dropbox.com/s/zsaikhdoyun3olk/test.d?dl=0

Results:

C++:
g++ (Apple LLVM version 7.3.0):  9.5 secs
g++ (GCC 7.1.0):  10.7 secs

D:
dmd :  35.5 secs
dmd -release -inline -O : 29.5 secs
ldc2 :  34.4 secs
ldc2 -release -O : 31.5 secs

But now: using the core.stdc.math atan as per Uknown's suggestion:
D:
dmd:  9 secs
dmd -release -inline -O :  6.8 secs
ldc2 : 10 secs
ldc2 -release -O :  6.5 secs   <- best

So indeed the difference is between the `std.math atan` versus 
the `core.stdc.math atan`. Thanks Uknown! Just knowing this trick 
could make the difference between me and other scientists 
switching over to D...

But now comes the question: can the D fundamental maths functions 
be propped up to be as fast as the C ones?

Mar 05 2018

bauss <jj_1337 live.dk> writes:

On Monday, 5 March 2018 at 18:39:21 UTC, J-S Caux wrote:
 But now comes the question: can the D fundamental maths 
 functions be propped up to be as fast as the C ones?

Probably, if someone takes the time to look at the bottlenecks.

Mar 05 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Mar 05, 2018 at 06:39:21PM +0000, J-S Caux via Digitalmars-d-learn
wrote:
[...]
 I've tested these two very basic representative codes:
 https://www.dropbox.com/s/b5o4i8h43qh1saf/test.cc?dl=0
 https://www.dropbox.com/s/zsaikhdoyun3olk/test.d?dl=0
 
 Results:
 
 C++:
 g++ (Apple LLVM version 7.3.0):  9.5 secs
 g++ (GCC 7.1.0):  10.7 secs
 
 D:
 dmd :  35.5 secs
 dmd -release -inline -O : 29.5 secs
 ldc2 :  34.4 secs
 ldc2 -release -O : 31.5 secs
 
 But now: using the core.stdc.math atan as per Uknown's suggestion:
 D:
 dmd:  9 secs
 dmd -release -inline -O :  6.8 secs
 ldc2 : 10 secs
 ldc2 -release -O :  6.5 secs   <- best
 
 So indeed the difference is between the `std.math atan` versus the
 `core.stdc.math atan`. Thanks Uknown! Just knowing this trick could
 make the difference between me and other scientists switching over to
 D...
 
 But now comes the question: can the D fundamental maths functions be
 propped up to be as fast as the C ones?

Walter has been adamant that we should always compute std.math.*
functions with the `real` type, which on x86 maps to the non-IEEE 80-bit
floats.  However, 80-bit floats have been deprecated for a while now,
and pretty much nobody cares to improve their performance on newer CPUs,
focusing instead on SSE/MMX performance with 64-bit doubles.  People
have been clamoring for using 64-bit doubles by default rather than
80-bit floats, but so far Walter has refused to budge.

But perhaps this time, we might have a strong case for pushing this into
D.  IMO, it has been long overdue.  I filed an issue for this:

	https://issues.dlang.org/show_bug.cgi?id=18559

If you have any additional relevant information, please post it there so
that we can build a strong case to convince Walter about this issue.


T

-- 
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be
algorithms.

Mar 05 2018

bachmeier <no spam.net> writes:

On Monday, 5 March 2018 at 20:11:06 UTC, H. S. Teoh wrote:

 Walter has been adamant that we should always compute 
 std.math.* functions with the `real` type, which on x86 maps to 
 the non-IEEE 80-bit floats.  However, 80-bit floats have been 
 deprecated for a while now, and pretty much nobody cares to 
 improve their performance on newer CPUs, focusing instead on 
 SSE/MMX performance with 64-bit doubles.  People have been 
 clamoring for using 64-bit doubles by default rather than 
 80-bit floats, but so far Walter has refused to budge.

I wonder if Ilya has worked on any of this for Mir.

Mar 05 2018

jmh530 <john.michael.hall gmail.com> writes:

On Monday, 5 March 2018 at 21:05:19 UTC, bachmeier wrote:
 I wonder if Ilya has worked on any of this for Mir.

Mir has sin and cos, but that's it. It looks like they use llvm 
intrinsics on LDC and then fall back to phobos' implementation.

Mar 05 2018

Robert M. Münch <robert.muench saphirion.com> writes:

On 2018-03-05 20:11:06 +0000, H. S. Teoh said:

 Walter has been adamant that we should always compute std.math.*
 functions with the `real` type, which on x86 maps to the non-IEEE 80-bit
 floats.  However, 80-bit floats have been deprecated for a while now,

Hi, do you have a reference for this? I can't believe this, as 
80-bit floats are pretty important for a lot of optimization algorithms. 
We use them all the time and they're absolutely necessary.

 and pretty much nobody cares to improve their performance on newer CPUs,

Really?

 focusing instead on SSE/MMX performance with 64-bit doubles.  People
 have been clamoring for using 64-bit doubles by default rather than
 80-bit floats, but so far Walter has refused to budge.

IMO this is all driven by the GPU/AI hype that just (seems) to be happy 
with rough precision.

-- 
Robert M. Münch
http://www.saphirion.com
smarter | better | faster

Mar 05 2018

J-S Caux <js gmail.com> writes:

On Tuesday, 6 March 2018 at 07:12:57 UTC, Robert M. Münch wrote:
 On 2018-03-05 20:11:06 +0000, H. S. Teoh said:

 Walter has been adamant that we should always compute 
 std.math.*
 functions with the `real` type, which on x86 maps to the 
 non-IEEE 80-bit
 floats.  However, 80-bit floats have been deprecated for a 
 while now,

 Hi, do you have a reference for this? I can't believe this, as 
 80-bit floats are pretty important for a lot of optimization 
 algorithms. We use them all the time and they're absolutely necessary.

 and pretty much nobody cares to improve their performance on 
 newer CPUs,

 Really?

 focusing instead on SSE/MMX performance with 64-bit doubles.  
 People
 have been clamoring for using 64-bit doubles by default rather 
 than
 80-bit floats, but so far Walter has refused to budge.

 IMO this is all driven by the GPU/AI hype that just (seems) to 
 be happy with rough precision.

Speaking for myself, the reason why I haven't made the switch 
from C++ to D many years ago for all my scientific work is that 
for many computations, 64 bit precision is certainly sufficient, 
and the performance I could get out of D (factor 4 to 6 slower in 
my tests) was simply insufficient.

Now, with Uknown's trick of using the C math functions, I can 
reconsider. It's a bit of a "patch" but at least it works.

In an ideal world, I'd like the language I use to:
- have double-precision arithmetic with equal performance to C/C++
- have all basic mathematical functions implemented, including 
for complex types
- *big bonus*: have the ability to do extended-precision 
arithmetic (integer, but most importantly (complex) 
floating-point) on-the-fly if I so wish, without having to rely 
on external libraries.

C++ was always fine, with external libraries for extended 
precision, but D is so much more pleasant to use. Many of my 
colleagues are switching to e.g. Julia despite the performance 
costs, because it is by design a very maths/science-friendly 
language. D is however much closer to a whole stack of existing 
codebases, so switching to it would involve much less extensive 
refactoring.

Mar 06 2018

Uknown <sireeshkodali1 gmail.com> writes:

On Tuesday, 6 March 2018 at 08:20:05 UTC, J-S Caux wrote:
 On Tuesday, 6 March 2018 at 07:12:57 UTC, Robert M. Münch wrote:
 On 2018-03-05 20:11:06 +0000, H. S. Teoh said:

 [snip]
 Now, with Uknown's trick of using the C math functions, I can 
 reconsider. It's a bit of a "patch" but at least it works.

I'm glad I could help!

 In an ideal world, I'd like the language I use to:
 - have double-precision arithmetic with equal performance to 
 C/C++
 - have all basic mathematical functions implemented, including 
 for complex types
 - *big bonus*: have the ability to do extended-precision 
 arithmetic (integer, but most importantly (complex) 
 floating-point) on-the-fly if I so wish, without having to rely 
 on external libraries.

D has std.complex and inbuilt complex types, just like C [0][1]. 
I modified the mandelbrot generator on Wikipedia, using D's 
std.complex and didn't have too much of an issue with 
performance.[2]
Also, std.bigint and mir might be of interest to you.[3]

 C++ was always fine, with external libraries for extended 
 precision, but D is so much more pleasant to use. Many of my 
 colleagues are switching to e.g. Julia despite the performance 
 costs, because it is by design a very maths/science-friendly 
 language. D is however much closer to a whole stack of existing 
 codebases, so switching to it would involve much less extensive 
 refactoring.

There's a good chance D can interface with those libraries you 
mentioned...

[0]: https://dlang.org/phobos/std_complex.html
[1]: https://dlang.org/phobos/core_stdc_complex.html
[2]: 
https://github.com/Sirsireesh/Khoj-2017/blob/master/Mandelbrot-set/mandelbrot.d
[3]: https://github.com/libmir

Mar 06 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Mar 06, 2018 at 08:12:57AM +0100, Robert M. Münch via
Digitalmars-d-learn wrote:
 On 2018-03-05 20:11:06 +0000, H. S. Teoh said:
 
 Walter has been adamant that we should always compute std.math.*
 functions with the `real` type, which on x86 maps to the non-IEEE
 80-bit floats.  However, 80-bit floats have been deprecated for a
 while now,

 
 Hi, do you have a reference for this? I can't believe this, as
 80-bit floats are pretty important for a lot of optimization algorithms.
 We use them all the time and they're absolutely necessary.

[...]

http://www.zdnet.com/article/nvidia-de-optimizes-physx-for-the-cpu/?tag=nl.e539

Quotation:

	Intel started discouraging the use of x87 with the introduction
	of the P4 in late 2000. AMD deprecated x87 since the K8 in 2003,
	as x86-64 is defined with SSE2 support; VIA’s C7 has supported
	SSE2 since 2005. In 64-bit versions of Windows, x87 is
	deprecated for user-mode, and prohibited entirely in
	kernel-mode. Pretty much everyone in the industry has
	recommended SSE over x87 since 2005 and there are no reasons to
	use x87, unless software has to run on an embedded Pentium or
	486. 

I'm not advocating for getting *rid* of 80-bit float support, but only
to make it *optional* rather than the default, as currently done in
std.math.


T

-- 
Once bitten, twice cry...

Mar 06 2018

jmh530 <john.michael.hall gmail.com> writes:

On Tuesday, 6 March 2018 at 17:51:54 UTC, H. S. Teoh wrote:
 [snip]

 I'm not advocating for getting *rid* of 80-bit float support, 
 but only to make it *optional* rather than the default, as 
 currently done in std.math.


 T

Aren't there two issues: 1) std.math functions that cast to real 
to perform calculations, 2) the compiler sometimes converts 
things to real in the background when people don't want it to.

Number 1 seems straightforward to fix. Introduce new versions of 
the std.math functions for float/double and the user can cast to 
real if the additional accuracy is necessary.

Number 2 would require a compiler switch, I imagine.

Mar 06 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Mar 06, 2018 at 06:05:59PM +0000, jmh530 via Digitalmars-d-learn wrote:
 On Tuesday, 6 March 2018 at 17:51:54 UTC, H. S. Teoh wrote:
 [snip]
 
 I'm not advocating for getting *rid* of 80-bit float support, but
 only to make it *optional* rather than the default, as currently
 done in std.math.


[...]
 Aren't there two issues: 1) std.math functions that cast to real to
 perform calculations, 2) the compiler sometimes converts things to
 real in the background when people don't want it to.
 
 Number 1 seems straightforward to fix. Introduce new versions of the
 std.math functions for float/double and the user can cast to real if
 the additional accuracy is necessary.

The fix itself may be straightforward, but how to do it without breaking
tons of existing code and provoking user backlash is the tricky part.


 Number 2 would require a compiler switch, I imagine.

It may not always be the compiler's fault. In the case of x87, it's the
hardware itself that internally promotes to 80-bit and truncates later.
IIRC, the original intent was that user code would only deal with
64-bit, and the 80-bit stuff would only happen inside the x87 (C, for
example, does not provide direct access to this type, except via vendor
extensions). However, due to the necessity to be able to save
intermediate computational states, there are instructions that can
load/extract 80-bit intermediate values to/from the x87, and eventually
people ended up just using these instructions for working with the
80-bit type directly.  You can stop the compiler from issuing these
instructions, but 64-bit doubles may still be internally converted by
the hardware to 80-bit intermediate values during computation.

But I suppose you could force the compiler to use SSE instructions for
double operations instead of x87, then it would bypass the 80-bit
intermediate values completely.
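
To see what `real` maps to on a given target, the built-in type properties can be printed. This is a small sketch: the values reported for `real` depend on the platform and compiler, so no particular output is assumed beyond the 53-bit mantissa of `double`.

```d
import std.stdio : writefln;

// double is IEEE binary64: 53 bits of mantissa (including the implicit bit).
static assert(double.mant_dig == 53);
// real is at least as precise as double on every target.
static assert(real.mant_dig >= double.mant_dig);

void main()
{
    writefln("double: %s bytes, %s mantissa bits", double.sizeof, double.mant_dig);
    writefln("real:   %s bytes, %s mantissa bits", real.sizeof, real.mant_dig);
}
```

If `real` reports more mantissa bits than `double`, any std.math function that computes in `real` is doing extended-precision work on that target.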


T

-- 
Being able to learn is a great learning; being able to unlearn is a greater
learning.

Mar 06 2018

jmh530 <john.michael.hall gmail.com> writes:

On Tuesday, 6 March 2018 at 18:41:15 UTC, H. S. Teoh wrote:
 The fix itself may be straightforward, but how to do it without 
 breaking tons of existing code and provoking user backlash is 
 the tricky part.
 [snip]

Ah, I see what you're saying. People may be depending on the 
extra accuracy for these functions.

It would just require something like

double sin(double x) @safe pure nothrow @nogc
{
     version (FP_Math) {
         // double-precision sin implementation goes here
     } else {
         return sin(cast(real) x);
     }
}
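
A compilable variant of this idea, as a sketch only. The wrapper name `fastSin` and the use of the C runtime's `sin` as the double-precision path are assumptions made for illustration, and `FP_Math` is the hypothetical version identifier from the snippet above, not an existing Phobos switch.

```d
import cmath = core.stdc.math; // C runtime math, double precision
import smath = std.math;       // Phobos math
import std.stdio : writeln;

/// Hypothetical wrapper, not part of Phobos.
double fastSin(double x) nothrow @nogc
{
    version (FP_Math)
        return cmath.sin(x);                          // stays in double
    else
        return cast(double) smath.sin(cast(real) x);  // computes via real
}

void main()
{
    writeln(fastSin(1.0));
}
```

The double-precision path would be selected with `-version=FP_Math` on dmd or `-d-version=FP_Math` on ldc2. Without the flag the wrapper keeps the existing behaviour, which is the backward-compatibility point raised above.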

Mar 06 2018

Andrea Fontana <nospam example.com> writes:

On Monday, 5 March 2018 at 20:11:06 UTC, H. S. Teoh wrote:
 Walter has been adamant that we should always compute 
 std.math.* functions with the `real` type
 T

I don't understand why atan(float) returns real and atan(double) 
returns real too. If I'm working with float, why does it return a 
real? Computing with real internally is OK, but shouldn't it be 
T atan(T) rather than real atan(T)?

I'm missing something.
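
The `T atan(T)` shape asked about here could be sketched in user code as follows. The helper name `atanAs` is hypothetical. It only narrows the result back to the argument type and does not avoid any extended-precision computation that `std.math.atan` may perform internally.

```d
import std.math : atan;
import std.stdio : writeln;
import std.traits : isFloatingPoint;

/// Hypothetical helper: calls std.math.atan and returns the argument's type.
T atanAs(T)(T x) if (isFloatingPoint!T)
{
    return cast(T) atan(x);
}

// The return type follows the argument type.
static assert(is(typeof(atanAs(1.0f)) == float));
static assert(is(typeof(atanAs(1.0)) == double));
static assert(is(typeof(atanAs(1.0L)) == real));

void main()
{
    writeln(atanAs(1.0f)); // float in, float out
    writeln(atanAs(1.0));  // double in, double out
}
```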

Andrea

Mar 06 2018

Johan Engelen <j j.nl> writes:

On Monday, 5 March 2018 at 06:01:27 UTC, J-S Caux wrote:
 On Monday, 5 March 2018 at 05:40:09 UTC, rikki cattermole wrote:
 On 05/03/2018 6:35 PM, J-S Caux wrote:
 I'm considering shifting a large existing C++ codebase into D 
 (it's a scientific code making much use of functions like 
 atan, log etc).
 
 I've compared the raw speed of atan between C++ (Apple LLVM 
 version 7.3.0 (clang-703.0.29)) and D (dmd v2.079.0, also 
 ldc2 1.7.0) by doing long loops of such functions.
 
 I can't get the D to run faster than about half the speed of 
 C++.


   double x = 0.0;
   for (int a = 0; a < 1000000000; ++a)
       x += atan(1.0/(1.0 + sqrt(1.0 + a)));

 for C++ and

   double x = 0.0;
   for (int a = 0; a < 1_000_000_000; ++a)
       x += atan(1.0/(1.0 + sqrt(1.0 + a)));

 for D. C++ exec takes 40 seconds, D exec takes 68 seconds.

The performance problem with this code is that LDC does not yet 
do cross-module inlining by default. GDC does. If you pass 
`-enable-cross-module-inlining` to LDC, things should be faster. 
In particular, std.math.sqrt is not inlined although it is profitable 
to do so (it becomes one machine instruction). Things become 
worse when using core.stdc.math.sqrt: no implementation source is 
available, so no inlining is possible.

Another problem is that std.math.atan(double) just calls 
std.math.atan(real). Calculations are more expensive on platforms 
where real==80bits (i.e. x86), and that's not solvable with a 
compile flag. What it takes is someone to write the double and 
float versions of atan (and other math functions), but it 
requires someone with the right knowledge to do it.

Your tests (and reporting about them) are much appreciated. 
Please do file bug reports for these things. Perhaps you can take 
a stab at implementing double-versions of the functions you need?

cheers,
   Johan

Mar 05 2018

psychoticRabbit <meagain meagain.com> writes:

On Monday, 5 March 2018 at 06:01:27 UTC, J-S Caux wrote:
 So the codes are trivial, simply some check of raw speed:

   double x = 0.0;
   for (int a = 0; a < 1000000000; ++a)
       x += atan(1.0/(1.0 + sqrt(1.0 + a)));

 for C++ and

   double x = 0.0;
   for (int a = 0; a < 1_000_000_000; ++a)
       x += atan(1.0/(1.0 + sqrt(1.0 + a)));

 for D. C++ exec takes 40 seconds, D exec takes 68 seconds.

Should `a` be an int?

Make it a double ;-)

Mar 05 2018

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Monday, 5 March 2018 at 05:40:09 UTC, rikki cattermole wrote:
 atan should work out to only be a few instructions (inline 
 assembly) from what I've looked at in the source.

 Also you should post the code you used for each.

  Should be 3-4 instructions. Load input to the FPU (Optional? 
Depends on if it already has the value loaded), Atan, Fwait 
(optional?), Retrieve value.

  Offhand, as far as I remember, FPU instructions run in their own 
separate space and should more or less take only a few cycles 
by themselves to run (and also run in parallel to the CPU code).

  At which point, if the code is running at half the speed of C++'s, 
that probably means bad optimization elsewhere, or even the 
control settings for the FPU.

  I really haven't looked at the FPU stuff in that much depth since 
about 2000...

Mar 04 2018

Marc <jckj33 gmail.com> writes:

On Monday, 5 March 2018 at 05:35:28 UTC, J-S Caux wrote:
 I'm considering shifting a large existing C++ codebase into D 
 (it's a scientific code making much use of functions like atan, 
 log etc).

 I've compared the raw speed of atan between C++ (Apple LLVM 
 version 7.3.0 (clang-703.0.29)) and D (dmd v2.079.0, also ldc2 
 1.7.0) by doing long loops of such functions.

 I can't get the D to run faster than about half the speed of 
 C++.

 Are there benchmarks for such scientific functions published 
 somewhere?

Which compiler flags did you use to compile both the C++ and D 
versions?

Mar 05 2018
