digitalmars.D - Differences in results when using the same function in CTFE and
- Carsten Schlote (41/41) Aug 08 Hi
- IchorDev (6/8) Aug 08 Which it probably does not. CTFE is meant to be less
- user1234 (13/20) Aug 08 During CTFE _there has to be_ much more round-trips between the
- Carsten Schlote (30/33) Aug 09 Replaced the 'doubles' with 'real' and now the the results of the
- H. S. Teoh (9/14) Aug 08 IIRC DMD either uses in CTFE or emits code that uses 8088 80-bit
- Nicholas Wilson (5/17) Aug 08 there are `toPrec`[1] intrinsics to solve exactly this issue of
- IchorDev (6/28) Aug 10 Can anyone remind me whether there’s a way to force calculations
- Quirin Schroll (11/45) Aug 12 D specifies that floating point calculations may be performed
- IchorDev (7/13) Aug 12 I don’t care. Only one family of CPU architectures supports
- Quirin Schroll (11/24) Aug 13 I honestly don’t know how JRE did implement `double` operations
- Abdulhaq (5/10) Aug 13 For those who haven't really run into numerical analysis before,
- matheus (4/9) Aug 13 Since I can't access that site, could you please share the URL
- Abdulhaq (2/12) Aug 13 sure, it's https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf
- IchorDev (19/29) Aug 13 Probably in software, but modern x86 CPUs can just use the
- Timon Gehr (8/21) Aug 12 The specification allows behavior like that even at runtime.
- Abdulhaq (14/19) Aug 13 It's worth mentioning that in the wider field, it is generally
- Timon Gehr (13/22) Aug 13 This is not true at the level of hardware, where results are actually
- Abdulhaq (13/20) Aug 13 It's an interesting discussion, but we are where we are, and I
- Timon Gehr (28/53) Aug 13 Well, I think one thing is where we are now, another question is where
- claptrap (14/21) Aug 13 If you have a pristine 24 bit audio sample, maybe 120db SnR due
- Timon Gehr (12/38) Aug 13 It is fully true. "A means B" holds when in any situation where A holds,...
- claptrap (6/29) Aug 14 I'd argue "does not mean" is vague, and most people would read
- Dom DiSc (6/8) Aug 15 But if you do cross compiling, that may simply be not possible
- Timon Gehr (9/18) Aug 15 It's still possible, but it's not even what I asked for. Also,
- Timon Gehr (13/14) Aug 15 Personally I mostly care about reproducibility, but also, as I said
- claptrap (5/13) Aug 15 So if have some CTFE function that doesn't a bunch of
- Carsten Schlote (10/15) Aug 15 I would consider it a bug.
- Timon Gehr (3/14) Aug 15 The D spec allows this to not be true too. I consider it a weakness of
- Abdulhaq (20/27) Aug 15 I'd say that whether it's a bug or not comes down to the
- Abdulhaq (20/21) Aug 15 To clarify a bit more, I'm not just talking about single isolated
- Timon Gehr (12/38) Aug 15 No.
- Timon Gehr (10/13) Aug 15 It's you who calls it "errors". Someone else may just call it "results".
- Abdulhaq (23/39) Aug 15 Here by "error" I mean where the result is different to a known
- Quirin Schroll (65/104) Aug 17 Experiences, little, as I'm not doing floating-point stuff
- Timon Gehr (48/66) Aug 18 The normal use case for floating-point isn't perfectly reproducible
- Quirin Schroll (29/115) Aug 19 I think you got me wrong here on a key aspect: I’m not trying to
Hi

I'm playing with CTFE in D. This feature allows for a lot of funny things, e.g. initialisation of immutable data at compile time with the result of some other function (template). As a result I get immutable result blobs compiled into the binary, but none of the generating code, because it was already executed by CTFE. This worked nicely for several other use cases as well.

So far the results of CTFE and RT were always the same. As expected. However, yesterday a unit test started to report that the results created by the same code with the same parameters differ when run in CTFE mode or at runtime.

```D
static immutable ubyte[] burningShipImageCTFE = generateBurningShipImage(twidth, theight, maxIter);
immutable ubyte[] burningShipImageRT = generateBurningShipImage(twidth, theight, maxIter);
assert(burningShipImageCTFE == burningShipImageRT, "Same results expected.");
```

I diffed the pictures and indeed some of the pixels in the more complex areas of the BurningShip fractal were clearly and noticeably different.

Ok, the fractal code uses 'double' floats, which are by their very nature limited in precision. But assuming that the math emulation of CTFE works identically to the CPU at runtime, the outcome should be identical. Or not, in some cases ;-) E.g. with a fractal equation where the smallest changes can result in big differences.

And it opens up some questions:

- Can CTFE be used under all circumstances when float numbers of any precision are involved?
- Or is this some kind of expected behaviour whenever floats are involved?
- Is the D CTFE documentation completely covering such possible issues?

I can imagine that bugs caused by such subtle differences might be very difficult to fix.

Any experiences or thoughts on this?

Greetz
Carsten
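For readers who haven't seen it: the Burning Ship iteration is the Mandelbrot iteration with absolute values taken before squaring, which is what makes it so sensitive to tiny rounding differences. A minimal sketch of a per-pixel escape-time loop (the actual `generateBurningShipImage` is not shown in the thread, so the names and the colour mapping here are made up):

```d
// Escape-time iteration for the Burning Ship fractal:
//   z_{n+1} = (|Re z_n| + i*|Im z_n|)^2 + c
// Usable both at runtime and in CTFE.
ubyte burningShipPixel(double cx, double cy, int maxIter)
{
    double zx = 0.0, zy = 0.0;
    int i = 0;
    while (i < maxIter && zx * zx + zy * zy <= 4.0)
    {
        double ax = zx < 0 ? -zx : zx; // abs of the components first
        double ay = zy < 0 ? -zy : zy;
        double tmp = ax * ax - ay * ay + cx;
        zy = 2.0 * ax * ay + cy;
        zx = tmp;
        ++i;
    }
    return cast(ubyte)(i * 255 / maxIter); // naive grayscale mapping
}

// An enum initializer forces CTFE evaluation of the same function:
enum ctfePixel = burningShipPixel(-1.75, -0.03, 100);
```

Calling the same function at runtime and comparing the two results, as the unittest above does, is exactly what exposed the mismatch.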
Aug 08
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:

> […] assuming that the math emulation of CTFE works identical as the CPU does at runtime, the outcome should be identical.

Which it probably does not. CTFE is meant to be less platform-dependent than runtime code, so it is interpreted rather than running directly on the CPU. This is why there are strict limitations on where CTFE can be invoked. I believe the exact behaviour of CTFE would be implementation-dependent, though.
Aug 08
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:

> Hi [...] I can imagine that bugs caused by such subtle differences might be very difficult to fix. Any experiences or thought on this? Greetz Carsten

During CTFE _there have to be_ many more round-trips between storage and the effective operations, whereas with native code several operations can happen without leaving the FPU... those are done with a different accuracy (80 bits). During the round trips implied by CTFE you have a floating point truncation after each op.

You see, during CTFE just `a + b * c` means two truncations, one after the multiply, the second after the addition (even if this example is not very good, as those ops are done using SSE).

That being said, it'd be interesting to verify if that's the problem. Maybe update your code to use `real` variables and let's see.
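The round-trip effect can be sketched by making the truncation explicit. In the first version below every intermediate result is forced through a `double`, as a CTFE interpreter does per operation; in the second the intermediate stays in `real`, as x87 code may keep it in a register. (Whether the two actually differ depends on the target: on platforms where `real` is just `double`, both give the same answer. Function names here are made up for illustration.)

```d
// Each operation rounded to double, like a CTFE round-trip per op:
double fmaTruncating(double a, double b, double c)
{
    double product = b * c;  // rounded to double here...
    return a + product;      // ...and rounded again here
}

// Intermediate kept at maximum precision, one final rounding:
double fmaWide(double a, double b, double c)
{
    real product = cast(real) b * c;             // extra bits retained
    return cast(double)(cast(real) a + product); // single final rounding
}
```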
Aug 08
On Thursday, 8 August 2024 at 13:31:43 UTC, user1234 wrote:

> That being said, that'd be interesting to verify if it's the problem. Maybe update your code to use `real` variables and let's see.

Replaced the 'doubles' with 'real' and now the results of the CTFE and RT executions are exactly the same. So, as you expected and described, the CTFE lost some precision by splitting the ops and storing the intermediate results back from FPU registers with internal precision to the limited 'double' type somewhere in memory.

However, the other fractals (mandelbrot, julia, newton, lyapunov, ...) work perfectly with the 'double's. I suspect that their default/starting views do not involve enough recursions and iterations in the calculation, so that the missing precision is not important, yet. I will pimp my unittests to catch such precision issues for the other fractals as well, and then fix it by using 'real' types.

So, what are the lessons learned here:

1. When using floats, use the 'real' type - it uses the maximum available FPU precision and no precision is lost when the FPU register is stored/read back to/from memory. So CTFE and RT execution have the best chance to produce the same results.
2. Use 'float' and 'double' only if you need to save some bits of storage and if the reduced precision is acceptable.
3. Unittests proved again to be great. Especially when support for unittesting is built into the language. Even the simplest assumptions can be wrong, and having assertions for such assumptions will be helpful.
4. Use assert()/enforce(), in/out/do to check everything to match your expectations and assumptions. At the end, `assert(burningShipImageCTFE == burningShipImageRT, "Same results expected.");` in my unittest did find it. And I got a chance to think about it and work on my assumptions ;-)
Aug 09
On Thu, Aug 08, 2024 at 10:31:32AM +0000, Carsten Schlote via Digitalmars-d wrote:
[...]

> Ok, the fractal code uses 'double' floats, which are by their very nature limited in precision. But assuming that the math emulation of CTFE works identical as the CPU does at runtime, the outcome should be identical.

IIRC DMD either uses in CTFE, or emits code that uses, 8087 80-bit extended precision in intermediate results. Which, obviously, will produce different results than the same computation performed using only doubles.

[...]

> - Or is this some kind of expected behaviour whenever floats are involved?

https://floating-point-gui.de/

--T
Aug 08
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:

> - Can CTFE be used under all circumstances when float numbers of any precision are involved?
> - Or is this some kind of expected behaviour whenever floats are involved?
> - Is the D CTFE documentation completely covering such possible issues?
>
> I can imagine that bugs caused by such subtle differences might be very difficult to fix. Any experiences or thought on this?

There are `toPrec`[1] intrinsics to solve exactly this issue of lack of truncation of precision.

[1]: https://github.com/dlang/dmd/blob/958ba9cbe3583830efe39505238939daa0dbd64c/druntime/src/core/math.d#L198-L214
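A sketch of how the intrinsic might be applied to the `a + b * c` example from earlier in the thread: `toPrec!T` rounds the value to the precision of `T` at that point, in CTFE and at runtime alike, so both agree even on targets that would otherwise keep intermediates at 80-bit precision. (The wrapper function here is made up for illustration.)

```d
import core.math : toPrec;

double fmaRounded(double a, double b, double c)
{
    // Force a genuine rounding to double after each operation,
    // regardless of the precision the target FPU works at.
    double p = toPrec!double(b * c);
    return toPrec!double(a + p);
}

enum ctfeResult = fmaRounded(1.0, 3.0, 4.0); // evaluated in CTFE
static assert(ctfeResult == 13.0);           // exact for small integers
```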
Aug 08
On Friday, 9 August 2024 at 00:46:12 UTC, Nicholas Wilson wrote:

> there are `toPrec`[1] intrinsics to solve exactly this issue of lack of truncation of precision.
>
> [1]: https://github.com/dlang/dmd/blob/958ba9cbe3583830efe39505238939daa0dbd64c/druntime/src/core/math.d#L198-L214

Can anyone remind me whether there’s a way to force calculations to be performed with a certain degree of precision (e.g. single or double) instead of rounding down from the largest floats available? Would be really useful for cross-platform consistency. :’|
Aug 10
On Saturday, 10 August 2024 at 11:15:19 UTC, IchorDev wrote:

> Can anyone remind me whether there’s a way to force calculations to be performed with a certain degree of precision (e.g. single or double) instead of rounding down from the largest floats available? Would be really useful for cross-platform consistency. :’|

D specifies that floating point calculations may be performed with higher precision than the type suggests:

> Execution of floating-point expressions may yield a result of greater precision than dictated by the source.

— [*Floating-Point Intermediate Values*](https://dlang.org/spec/float.html#fp_intermediate_values), D Language Specification

On almost all non-embedded CPUs, doing non-vector calculations in `float` is more costly than doing them in `double` or `real` because for single arguments, the floats are converted to `double` or `real`. I consider `float` to be a type used for storing values in arrays that don’t need the precision and save me half the RAM.
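The "float as a storage format" view can be sketched like this (hypothetical helper; in D the `float * double` expression is promoted to `double` before the narrowing cast, so computation happens at double precision while the arrays stay half-size):

```d
// Store in float to halve the memory; compute at double precision.
float[] scaleAll(const(float)[] data, double factor)
{
    auto result = new float[](data.length);
    foreach (i, x; data)
        result[i] = cast(float)(x * factor); // widened to double, narrowed once
    return result;
}
```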
Aug 12
On Monday, 12 August 2024 at 10:22:52 UTC, Quirin Schroll wrote:

> On almost all non-embedded CPUs, doing non-vector calculations in `float` is more costly than doing them in `double` or `real` because for single-arguments, the floats are converted to `double` or `real`. I consider `float` to be a type used for storing values in arrays that don’t need the precision and save me half the RAM.

I don’t care. Only one family of CPU architectures supports ‘extended precision’ floats (because it’s a waste of time), so I would like to know a way to always perform calculations with double precision for better cross-platform consistency. Imagine trying to implement a JRE without being able to do native double-precision maths.
Aug 12
On Monday, 12 August 2024 at 11:06:15 UTC, IchorDev wrote:

> Only one family of CPU architectures supports ‘extended precision’ floats (because it’s a waste of time), so I would like to know a way to always perform calculations with double precision for better cross-platform consistency. [...]

I honestly don’t know how the JRE did implement `double` operations on e.g. the Intel 80486, but if I try using `gcc -mfpmath=387 -O3` and add some `double` values, intermediate results are stored and loaded again. Very inefficient. If I use `long double`, that does not happen.

The assertion that only one CPU family supports extended floats is objectively wrong. You probably meant the x86 with its 80-bit format, which is still noteworthy, as x86 is very, very common. However, at least the POWER9 family supports 128-bit IEEE-754 quadruple-precision floats. IIUC, RISC-V also supports them.
Aug 13
On Tuesday, 13 August 2024 at 11:02:07 UTC, Quirin Schroll wrote:

> I honestly don’t know how JRE did implement `double` operations on e.g. the Intel 80486, but if I try using `gcc -mfpmath=387 -O3` and add some `double` values, intermediate results are stored and loaded again. Very inefficient. If I use `long double`, that does not happen.

For those who haven't really run into numerical analysis before, this paper is worth reading. I've given a link to the HN discussion around it: https://news.ycombinator.com/item?id=37028310
Aug 13
On Tuesday, 13 August 2024 at 11:14:35 UTC, Abdulhaq wrote:

> ... For those who haven't really run into numerical analysis before, this paper is worth reading. I've given a link to the HN discussion around it: https://news.ycombinator.com/item?id=37028310

Since I can't access that site, could you please share the URL directly to this paper?

Matheus.
Aug 13
On Tuesday, 13 August 2024 at 14:20:29 UTC, matheus wrote:

> Since I can't access that site, could you please share the URL directly to this paper?

Sure, it's https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf
Aug 13
On Tuesday, 13 August 2024 at 11:02:07 UTC, Quirin Schroll wrote:

> I honestly don’t know how JRE did implement `double` operations on e.g. the Intel 80486

Probably in software, but modern x86 CPUs can just use the hardware; so the difference isn’t so meaningful anymore.

> if I try using `gcc -mfpmath=387 -O3` and add some `double` values, intermediate results are stored and loaded again. Very inefficient. If I use `long double`, that does not happen.

Who cares? In a situation where we must reach the same result on every platform (cross-platform determinism) the performance can suffer a bit. You are just avoiding my question by making excuses. Do you make sure your program’s data is passed around exclusively via data registers? Do you only write branchless code? Does your entire program fit inside the CPU cache? No, because we make performance sacrifices to achieve the desired outcome. The idea that the only valid way of coding something is the way that compromises on the integrity of the output in favour of performance is a step away from programming nihilism.

> You probably meant the x86 with its 80-bit format, which is still noteworthy

Yes, because it’s referred to as ‘extended precision’ and doesn’t have a proper name, because it’s an unstandardised atrocity. https://en.wikipedia.org/wiki/Extended_precision#x86_extended_precision_format

> However, at least the POWER9 family supports 128-bit IEEE-754 quadruple-precision floats. IIUC, RISC-V also supports them.

binary128 is obviously not the same as x86’s ‘extended precision’. You will not get cross-platform deterministic results from using them interchangeably; you will get misery.
Aug 13
On 8/8/24 12:31, Carsten Schlote wrote:

> And it opens up some questions:
>
> - Can CTFE be used under all circumstances when float numbers of any precision are involved?
> - Or is this some kind of expected behaviour whenever floats are involved?
> - Is the D CTFE documentation completely covering such possible issues?
>
> I can imagine that bugs caused by such subtle differences might be very difficult to fix. Any experiences or thought on this?

The specification allows behavior like that even at runtime. https://dlang.org/spec/float.html

I have been vocally against this. Full portability is perhaps one thing, as hardware differences do exist, but completely thwarting expectations by deliberately doing a different operation than the one you requested, I think, makes no sense at all. A lot of modern hardware nowadays is compatible or almost compatible regarding floats.
Aug 12
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:

> Ok, the fractal code uses 'double' floats, which are by their very nature limited in precision. [...] Any experiences or thought on this?

It's worth mentioning that in the wider field, it is generally held that when working with floating point operations we don't expect to see the same duplicate results every time. It is a mistake to compare for equality; you should be checking for "almost equals". For your particular application it might feel like a "bug" to see different results, but in fact this is expected behaviour and generally not held to be a fault.

One little piece of anecdata: when working with python I saw differing results in the 17th d.p. when running the exact same calculation in the same python session. The reason turned out to be which registers were being used each time, which could vary. This is not considered to be a bug.
Aug 13
On 8/13/24 09:56, Abdulhaq wrote:

> It's worth mentioning that in the wider field, it is generally held that when working with floating point operations we don't expect to see the same duplicate results every time.

This is not true at the level of hardware, where results are actually reproducible on the same processor architecture (and often even across architectures) with the same sequence of instructions. Treating floats as "fuzzy" is perhaps common, but it is not correct. Rounding is a pure function like any other.

> It is a mistake to compare for equality, you should be checking for "almost equals".

That is a different point.

> For your particular application it might feel like a "bug" to see different results, but in fact this is expected behaviour and generally not held to be a fault. ...

It's annoying anyway, and we should hold programming languages to a higher standard than that.

> One little piece of anecdata, when working with python I saw differing results in the 17th d.p. when running the exact same calculation in the same python session. The reason turned out to be which registers were being used each time, which could vary. This is not considered to be a bug.

Well, I think the x87 is a bad design as a compilation target. If you are targeting that with a compiler whose developers by default do not care about reproducibility, expect weird behavior. These instructions are now deprecated, so the calculation should change a bit.
Aug 13
On Tuesday, 13 August 2024 at 08:33:51 UTC, Timon Gehr wrote:

> It's annoying anyway, and we should hold programming languages to a higher standard than that.
>
> Well, I think the x87 is a bad design as a compilation target. If you are targeting that with a compiler whose developers by default do not care about reproducibility, expect weird behavior. These instructions are now deprecated, so the calculation should change a bit.

It's an interesting discussion, but we are where we are, and I think you'd agree that the OP should understand that in the current world of "systems" programming languages, this behaviour would not be viewed as a "bug".

I'm only being somewhat tongue-in-cheek when I say that his application, viewing fractals, could be considered a kind of "chaos amplifier" that is especially prone to these sorts of issues.

I'm going to guess that the situation we're in is due to a preference for speed over reproducibility in the days when computers ran much more slowly than they do now. Would we choose a different priority these days? I don't know.
Aug 13
On 8/13/24 13:09, Abdulhaq wrote:

> It's an interesting discussion, but we are where we are, and I think you'd agree that the OP should understand that in the current world of "systems" programming languages, this behaviour would not be viewed as a "bug". ...

Well, I think one thing is where we are now; another question is where we should be in the future.

> I'm only being somewhat tongue-in-cheek when I say that his application, viewing fractals, could be considered a kind of "chaos amplifier" that is especially prone to these sorts of issues. ...

Not really. I think it is valid to expect reproducible results for addition, absolute value, and squaring, no matter how many times you repeat them. The reason why this issue happened is that D does not even try a little bit, and explicitly just goes ahead and uses the wrong data type, namely the 80-bit one only supported by the x87.

> I'm going to guess that the situation we're in is due to preference of speed over reproduceability in the days when computers ran much more slowly than they do now.

Ironically, the x87 design is mostly about getting accurate numerics. It is a rather inefficient design. One issue is that it was designed for assembly programmers, not C compilers; C language features do not map well to it. x87 results are reproducible in assembly, as it is clear where rounding happens. It is true that computers used to run more slowly, and the x87 is some part of the reason why that was true. There are many reasons why the design was ditched; efficiency is one of them.

> Would we choose a different priority these days? I don't know.

The x87 results in both slower code and worse reproducibility when used as a target for `float`/`double`/`real` computations. Ironically, IIRC Kahan was involved in the x87 design, yet algorithms attributed to him, like Kahan summation, can actually not be implemented correctly using D floating-point types, because of the insane rules introduced to support targeting the x87 without changing rounding precision or spilling data to the cache and reloading it.

At the same time, "better accuracy" is part of the reason why CTFE uses higher floating-point precision, but higher precision does not mean the result will be more accurate. Unless arbitrary precision is used, which is even slower, it leads to double-rounding issues which can cause the result to be less accurate.
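For reference, a textbook Kahan summation sketch. The correction step only works if `(t - sum) - y` is rounded exactly at `double` precision at each step, which is precisely the guarantee D's intermediate-precision rule withholds:

```d
// Compensated (Kahan) summation: recovers the low-order bits that
// plain left-to-right summation discards at each addition.
double kahanSum(const(double)[] xs)
{
    double sum = 0.0;
    double c = 0.0; // running compensation for lost low-order bits
    foreach (x; xs)
    {
        double y = x - c;
        double t = sum + y;
        c = (t - sum) - y; // algebraically zero; captures the rounding error
        sum = t;
    }
    return sum;
}
```

If the compiler silently evaluates `(t - sum) - y` at 80-bit precision, or constant-folds parts of it, `c` no longer captures the double-precision rounding error and the compensation is lost, which is the point being made above.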
Aug 13
On Tuesday, 13 August 2024 at 21:03:00 UTC, Timon Gehr wrote:

> At the same time, "better accuracy" is part of the reason why CTFE uses higher floating-point precision, but higher precision does not mean the result will be more accurate. Unless arbitrary precision is used, which is even slower, it leads to double-rounding issues which can cause the result to be less accurate.

If you have a pristine 24-bit audio sample, maybe 120 dB SnR due to DAC limitations, processing it in 16 bit will automatically lose you 4 bits of accuracy; it'll drop the SnR to 96 dB. If you process it at 32 bits you still have 120 dB SnR: greater precision but the same accuracy as the source.

The point is that the statement "higher precision does not mean the result will be more accurate" is only half true. If the precision you are doing the calculations at is already higher than the accuracy of your data, more precision won't get you much of anything. But if the converse is true, if you are processing the data at lower precision than the accuracy of your source data, then increasing precision will absolutely increase accuracy.
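The dB figures here follow from the usual quantization rule of thumb of roughly 6.02 dB per bit, which a one-liner can check (helper name made up for illustration):

```d
import std.math : log10;

// Theoretical best-case SNR of an n-bit quantizer: 20*log10(2^n) ≈ 6.02n dB.
double quantizationSnrDb(int bits)
{
    return 20.0 * log10(2.0 ^^ bits);
}
```

`quantizationSnrDb(16)` gives roughly 96.3 dB, matching the figure above; a 24-bit format allows about 144 dB, so a 120 dB source is limited by the converter, not by the format.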
Aug 13
On 8/14/24 00:35, claptrap wrote:

> The point is the statement "higher precision does not mean the result will be more accurate." is only half true. ...

It is fully true. "A means B" holds when in any situation where A holds, B also holds. "A does not mean B" holds when there is a situation where A is true but B is false. Clearly I am not saying that higher accuracy implies lower precision, just that there are cases where lower precision gives you higher accuracy.

> If the precision you are doing the calculations at is already higher than the accuracy of your data, more precision wont get you much of anything. ...

It may even break some algorithms, especially when it is inconsistently and implicitly applied, which is the case in D.

> but if the converse is true, if you are processing the data at lower precision than the accuracy of your source data, then increasing precision will absolutely increase accuracy.

Usually true, but I still want to decide on my own which computation will be rounded to what precision. Even an accidentally avoided accuracy issue resulting from non-guaranteed implicit higher precision is a ticking time bomb and very annoying to debug when it goes off.
Aug 13
On Wednesday, 14 August 2024 at 02:35:08 UTC, Timon Gehr wrote:

> It is fully true. "A means B" holds when in any situation where A holds, B also holds. "A does not mean B" holds when there is a situation where A is true but B is false.

I'd argue "does not mean" is vague, and most people would read what you wrote as "A != B in any situation." But otherwise I agree, based on what you meant.

> Usually true, but I still want to decide on my own which computation will be rounded to what precision. Even an accidentally avoided accuracy issue resulting from not guaranteed implicit higher precision is a ticking time bomb and very annoying to debug when it goes off.

I agree the compiler should actually use the float precision you explicitly ask for.
Aug 14
On Wednesday, 14 August 2024 at 07:54:15 UTC, claptrap wrote:

> I agree the compiler should actually use the float precision you explicitly ask for.

But if you do cross compiling, that may simply be not possible (because the target has a different hardware implementation than the hardware the compiler runs on). Also, relying on specific inaccuracy of FP calculations is very bad design.
Aug 15
On 8/15/24 12:25, Dom DiSc wrote:

> But if you do cross compiling, that may simply be not possible (because the target has different hardware implementation than the hardware the compiler runs on). ...

It's still possible, but it's not even what I asked for. Also, personally I am not targeting any machine whose floating-point unit is sufficiently incompatible to cause a difference on this specific code. There has been a convergence of how floats work. D differs gratuitously.

> Also relying on specific inaccuracy of FP calculations is very bad design.

The point is not "relying on specific inaccuracy", the point is reproducibility. The specifics of the inaccuracy usually do not matter at all; what matters a great deal sometimes is that you get the same result when you run the same computation.
Aug 15
On 8/15/24 12:25, Dom DiSc wrote:

> Also relying on specific inaccuracy of FP calculations is very bad design.

Personally I mostly care about reproducibility, but also, as I said earlier, accuracy-improving algorithms like Kahan summation or general double-double computations very much "rely on specific inaccuracy". It's not bad design; it may just be that you are not sufficiently informed about this kind of thing. Kahan summation is even in Phobos: https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L7558-L7572

This implementation is however technically incorrect, because D could choose a bad subset of the computations and run only those at "higher" precision, destroying the overall accuracy.

It gets more obvious if you drop the judgmental tone and just call "specific inaccuracy" by its proper name: "correct rounding".
Aug 15
On Thursday, 15 August 2024 at 10:25:45 UTC, Dom DiSc wrote:

> But if you do cross compiling, that may simply be not possible (because the target has different hardware implementation than the hardware the compiler runs on).

So if you have some CTFE function that does a bunch of calculations in double precision and the compiler is running on a platform that only supports float, what happens?

> Also relying on specific inaccuracy of FP calculations is very bad design.

It's not relying on them, it's accounting for them.
Aug 15
On Tuesday, 13 August 2024 at 07:56:34 UTC, Abdulhaq wrote:

> One little piece of anecdata, when working with python I saw differing results in the 17th d.p. when running the exact same calculation in the same python session. The reason turned out to be which registers were being used each time, which could vary. This is not considered to be a bug.

I would consider it a bug. Running the same piece of code multiple times with exactly the same parameters should result in exactly the same results.

But once I also expected that CTFE and real execution of some code should always yield the same results. This assumption is wrong, as I learned recently. But using 'real' as the type usually leads to the same result. All the other issues with floats of any size are a separate topic.
Aug 15
On 8/15/24 11:13, Carsten Schlote wrote:On Tuesday, 13 August 2024 at 07:56:34 UTC, Abdulhaq wrote:The D spec allows this to not be true too. I consider it a weakness of the spec.One little piece of anecdata, when working with python I saw differing results in the 17th d.p. when running the exact same calculation in the same python session. The reason turned out to be which registers were being used each time, which could vary. This is not considered to be a bug.I would consider it a bug. Running the same piece of code multiple times with exactly the same parameters should result into the exactly same results.
Aug 15
On Thursday, 15 August 2024 at 09:13:31 UTC, Carsten Schlote wrote:On Tuesday, 13 August 2024 at 07:56:34 UTC, Abdulhaq wrote:I'd say that whether it's a bug or not comes down to the definition of "bug", and I don't think a discussion of that is useful, and also it's subjective and reasonable people can differ on it. But I would say that, in my python example, both answers were wrong, in the sense that they were not exactly right (and can't be), but they were both wrong in a different way. In your example I'd ask you to consider this: all the images you created, except the trivial ones, were wrong, in the sense that some of the calculations were inexact and some of them had sufficient error to change the color of the pixel. It's just that the CTFE image was wrong in a different way to the non-CTFE image. Why do you want to reproduce the errors? In general floating point calculations are inexact, and we shouldn't (generalising here) expect to get the same error across different platforms. In this specific case you could argue that it's the "same platform", I get that.One little piece of anecdata, when working with python I saw differing results in the 17th d.p. when running the exact same calculation in the same python session. The reason turned out to be which registers were being used each time, which could vary. This is not considered to be a bug.I would consider it a bug.
Aug 15
On Thursday, 15 August 2024 at 16:21:35 UTC, Abdulhaq wrote:On Thursday, 15 August 2024 at 09:13:31 UTC, Carsten SchloteTo clarify a bit more, I'm not just talking about single isolated computations, I'm talking about e.g. matrix multiplication. Different compilers, even LDC vs DMD for example, could optimise the calculation in a different way, loop unrolling, step elimination etc. Even if the rounding algorithms etc. at the chip level are the same, the way the code is compiled and calculations sequenced will change the error in the final answer. Then, variations in pipelining and caching at the processor level could also affect the answer. And if you move on to different computing paradigms such as quantum computing and other as yet undiscovered techniques, again the way operations and rounding etc. is compounded will cause divergences in computations. Now, we could insist that we somehow legislate for the way compound calculations are conducted. But that would cripple the speed of calculations for some processor architecture / paradigms for a goal (reproducibility) which is worthy, but for 99% of usages not sufficiently beneficial to pay the big price in performance.
Aug 15
On 8/15/24 18:50, Abdulhaq wrote:On Thursday, 15 August 2024 at 16:21:35 UTC, Abdulhaq wrote:LDC disables -ffast-math by default.On Thursday, 15 August 2024 at 09:13:31 UTC, Carsten SchloteTo clarify a bit more, I'm not just talking about single isolated computations, I'm talking about e.g. matrix multiplication. Different compilers, even LDC vs DMD for example, could optimise the calculation in a different way, loop unrolling, step elimination etc. Even if the rounding algorithms etc. at the chip level are the same, the way the code is compiled and calculations sequenced will change the error in the final answer. ...Then, variations in pipelining and caching at the processor level could also affect the answer. ...No.And if you move on to different computing paradigms such as quantum computing and other as yet undiscovered techniques, again the way operations and rounding etc. is compounded will cause divergences in computations. ...Yes, if you move on to an analog computing paradigm with imperfect error correction, full reproducibility will go out the window. Floating point is not that though.Now, we could insist that we somehow legislate for the way compound calculations are conducted. But that would cripple the speed of calculations for some processor architecture / paradigms for a goal (reproducibility) which is worthy, but for 99% of usages not sufficiently beneficial to pay the big price in performance.It's really not that expensive, changing the result via optimizations is disabled by default in LDC, and actually, how do you know that the compiler does not pessimize your hand-optimized compound operations? I am not even against people being able to pass -ffast-math, it should just not destroy the correctness and reproducibility of everyone else's computations.
Aug 15
On 8/15/24 18:21, Abdulhaq wrote:Why do you want to reproduce the errors?It's you who calls it "errors". Someone else may just call it "results". Science. Determinism, e.g. blockchain or other deterministic lock-step networking. etc. It's not hard to come up with use cases. Sometimes what exact result you get is not very important, but it is important that you get the same one.In general floating point calculations are inexact, and we shouldn't (generalising here) expect to get the same error across different platforms.It's just not true that floating point calculations are somehow "inexact". You are confusing digital and analog computing. This is still digital computing. Rounding is a deterministic, exact function, like any other.
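The exactness Timon refers to is strong enough that the rounding error of an addition can itself be computed exactly: the classic twoSum error-free transformation, which underlies the double-double arithmetic mentioned earlier in the thread. Note that, per this thread's topic, an optimizer keeping intermediates at higher precision would break it:

```d
// twoSum: for correctly rounded IEEE-754 doubles, a + b == s + e holds
// *exactly*, where s is the rounded sum and e the rounding error.
void twoSum(double a, double b, out double s, out double e)
{
    s = a + b;
    immutable bv = s - a;
    e = (a - (s - bv)) + (b - bv);
}

void main()
{
    double s, e;
    twoSum(1.0, 1e-16, s, e);
    assert(s == 1.0);   // 1.0 + 1e-16 rounds back to 1.0...
    assert(e == 1e-16); // ...and e recovers the discarded part exactly
}
```

That `e` comes out bit-exact is a direct consequence of rounding being a deterministic, exact function.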
Aug 15
On Thursday, 15 August 2024 at 16:51:51 UTC, Timon Gehr wrote:On 8/15/24 18:21, Abdulhaq wrote:Here by "error" I mean where the result is different to a known correct answer. It's a simple matter to propose a method for producing a fractal plot, and to have a notional "correct" plot if calculated to infinite precision. Then, at sufficient depth of an e.g. mandelbrot plot, at a finite precision, some pixels will differ from the ideal plot produced by the machine of infinite precision. These divergences are, by my definition, errors. Yes, it's all results, and some results are errors.Why do you want to reproduce the errors?It's you who calls it "errors". Someone else may just call it "results".Science. Determinism, e.g. blockchain or other deterministic lock-step networking. etc. It's not hard to come up with use cases. Sometimes what exact result you get is not very important, but it is important that you get the same one.This is why I was careful to say that I was generalising. Of course we can come up with examples where reproducibility is an essential requirement. In this case we can often work out what precision is required to achieve said reproducibility.Fair point. I meant to refer to floating point numbers rather than calculations. There are an infinite number of real numbers that cannot be exactly represented by a floating point number of finite precision. The floating point representations of those numbers are inexact. If we calculate 1/3 using floating point, the result can be exactly correct in the sense there is a correct answer in the world of floating point, but the calculated result will be an inexact representation of the correct number, which is what I had in mind.In general floating point calculations are inexact, and we shouldn't (generalising here) expect to get the same error across different platforms.It's just not true that floating point calculations are somehow "inexact". You are confusing digital and analog computing. 
This is still digital computing. Rounding is a deterministic, exact function, like any other.
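The representation error described here is easy to demonstrate; only values that are exactly representable in binary (e.g. modest multiples of powers of two) escape it:

```d
void main()
{
    // 0.1, 0.2 and 0.3 are all inexact in binary, and their errors do
    // not cancel:
    assert(0.1 + 0.2 != 0.3);
    // Powers of two and their small multiples are exact:
    assert(0.25 + 0.5 == 0.75);
    // 1.0/3.0 is the correctly rounded double nearest 1/3, not 1/3
    // itself -- yet the product happens to round back to 1.0 exactly
    // (via round-to-even on an exact tie):
    assert(3.0 * (1.0 / 3.0) == 1.0);
}
```

Each individual result above is nonetheless fully deterministic, which is Timon's point: "inexact" representation does not mean nondeterministic computation.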
Aug 15
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:Hi I'm playing with CTFE in D. This feature allows for a lot of funny things, e.g. initialisation of immutable data at compile time with the result of some other function (template). As a result I get immutable result blobs compiled into the binary. But none of the generating code, because it was already executed by CTFE. This worked nicely for several other use cases as well.For now the results of CTFE and RT were always the same. As expected. However, yesterday a unit-test started to report that the results created by the same code with the same parameters differ when run in CTFE mode or at runtime.

```D
static immutable ubyte[] burningShipImageCTFE = generateBurningShipImage(twidth, theight, maxIter);
immutable ubyte[] burningShipImageRT = generateBurningShipImage(twidth, theight, maxIter);
assert(burningShipImageCTFE == burningShipImageRT, "Same results expected.");
```

I diffed the pictures and indeed some of the pixels in the more complex areas of the BurningShip fractal were clearly and notably different. Ok, the fractal code uses 'double' floats, which are by their very nature limited in precision. But assuming that the math emulation of CTFE works identically to what the CPU does at runtime, the outcome should be identical. Or not, in some cases ;-) E.g. with a fractal equation where the smallest changes can result in big differences. And it opens up some questions:
- Can CTFE be used under all circumstances when float numbers of any precision are involved?
- Or is this some kind of expected behaviour whenever floats are involved?
- Is the D CTFE documentation completely covering such possible issues?
I can imagine that bugs caused by such subtle differences might be very difficult to fix. 
Any experiences or thoughts on this?Experiences, little, as I'm not doing floating-point stuff professionally, but I know my stuff because in the past, for some years, I did the lecture assistance for a numerical programming course. The normal use case for floating-point isn't perfectly reproducible results between different optimization levels. However, differences between CTFE and RT are indeed unacceptable for core-language operations. Those are bugs. Of course, results in user code can differ between CTFE and RT due to using `__ctfe` incorrectly. It might be noteworthy that C++ (at least up to and including C++17) does not allow floating-point types in CTFE (i.e. in `constexpr` execution) and my suspicion is that this is the reason. Maybe the solution is the same: Remove floating-point operations from CTFE, or at least the ones that could differ from RT. It would be awkward, at least in some people's opinion, because that would mean that in CTFE, only `real` is available, despite it being implementation defined (it's not just platform dependent, it's also how expressions are interpreted), while `double` and `float` seem to be exactly defined. The reason is that `real` can't be optimized to higher precision, as it's by definition the highest-precision format supported by the platform, whereas for the smaller formats, RT results may differ between optimization levels. What the compiler could do, however, is replace `a + b * c` by a fused multiply-add. If it does that consistently across CTFE and RT, as I read the spec, it would be allowed to do that, even for `real`. The reason D doesn't specify floating-point operations that precisely is to allow for optimizations. Generally speaking, optimizations require some leeway in the spec. Optimizations are also required not to change observable behavior, but what counts as that is again up to the spec. 
In C++, the compiler is allowed to optimize away copies, even if those would invoke a copy constructor that has observable side-effects. As I see it (not my opinion, just what I see and conclude), D specifies that differences in floating-point results due to operations being carried out in higher-than-required precision are not an observable side-effect, i.e. not one that the optimizer must preserve, even if you can practically observe a difference. The reason for that is probably because Walter didn't like that other languages nailed down floating-point operations so that you'd get both less precise results *and* worse performance. That would for example be the case on an 80387 coprocessor, and (here's where my knowledge ends) probably also true for basically all hardware today if you consider `float` specifically. I know of no hardware that supports single precision but not double precision. Giving you double precision instead of single is at least basically free and possibly even a performance boost, while also giving you more precision. An algorithm like Kahan summation must be implemented in a way that takes those optimizations into account. This is exactly like in C++, where signed integer overflow is undefined, not because it's undefined on the hardware, but because it allows for optimizations. In Zig, all integer overflow is undefined for that reason, and for wrap-around or saturated arithmetic, there are separate operators. D could easily add specific functions to `core.math` that specify operations as specifically IEEE-754 conforming. Using those, Phobos could give you types that are specified to produce results as specified by IEEE-754, with no interference by the optimizer. You can't actually do the reverse, i.e. provide a type in Phobos that allows for optimizations of that sort but the core-language types are guaranteed to be unoptimized. Such a type would have to be compiler-recognized, i.e. it would end up being a built-in type.
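Part of what Quirin suggests exists already: druntime's `core.math` has `toPrec`, which forces an intermediate result to be rounded to a named precision. A sketch of how an IEEE-754-pinned operation could be built on it (`dotStrict` is a hypothetical name, not a Phobos API):

```d
import core.math : toPrec;

// Dot product with every intermediate result explicitly rounded to
// double precision, so an optimizer may not keep the accumulator in a
// wider register between steps.
double dotStrict(const double[] a, const double[] b)
{
    assert(a.length == b.length);
    double s = 0.0;
    foreach (i; 0 .. a.length)
        s = toPrec!double(s + toPrec!double(a[i] * b[i]));
    return s;
}

void main()
{
    // Small exactly representable values, so the result is exact:
    assert(dotStrict([1.0, 2.0], [3.0, 4.0]) == 11.0);
}
```

Wrapping such calls in a Phobos type, as suggested above, would then give reproducible results without touching the built-in `double` semantics.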
Aug 17
On 8/17/24 18:33, Quirin Schroll wrote:The normal use case for floating-point isn't perfectly reproducible results between different optimization levels. I would imagine the vast majority of FLOPs nowadays are used in HPC and AI workloads. Reproducibility is at least a plus, particularly in a research context.However, differences between CTFE and RT are indeed unacceptable for core-language operations. Those are bugs.No, they are not bugs, it's just the same kind of badly designed specification. According to the specification, you can get differences between RT and RT when running the exact same function. Of course you will get differences between CTFE and RT.The reason for that is probably because Walter didn't like that other languages nailed down floating-point operationsProbably. C famously nails down floating-point operations, just like it nails down all the other types. D is really well-known for all of its unportable built-in data types, because Walter really does not like nailing things down and this is not one of D's selling points. /s Anyway, at least LDC is sane on this at runtime by default. Otherwise I would have to switch language for use cases involving floating point, which would probably just make me abandon D in the long run.so that you'd get both less precise results *and* worse performance.Imagine just manually using the data type that is most suitable for your use case.That would for example be the case on an 80387 coprocessor, and (here's where my knowledge ends)Then your knowledge may be rather out of date. I get the x87 shenanigans, but that's just not very relevant anymore. I am not targeting 32-bit x86 with anything nowadays.probably also true for basically all hardware today if you consider `float` specifically. I know of no hardware that supports single precision but not double precision. 
Giving you double precision instead of single is at least basically free and possibly even a performance boost, while also giving you more precision.It's nonsense. If I want double, I ask for double. Also, it's definitely not true that going to double instead of single precision will boost your performance on a modern machine. If you are lucky it will not slow you down, but if the code can be auto-vectorized (or you are vectorizing manually), you are looking at least at a 2x slowdown.An algorithm like Kahan summation must be implemented in a way that takes those optimizations into account.I.e., do not try to implement this at all with the built-in floating-point types. It's impossible.This is exactly like in C++, where signed integer overflow is undefined, not because it's undefined on the hardware, but because it allows for optimizations.If you have to resort to invoking insane C++ precedent in order to defend a point, you have lost the debate. Anyway, it is not at all the same (triggered by overflow vs triggered by default, undefined behavior vs wrong result), and also, in D, signed overflow is actually defined behavior.D could easily add specific functions to `core.math` that specify operations as specifically IEEE-754 conforming. Using those, Phobos could give you types that are specified to produce results as specified by IEEE-754, with no interference by the optimizer.It does not do that. Anyway, I would expect that to go to std.numeric.You can't actually do the reverse, i.e. provide a type in Phobos that allows for optimizations of that sort but the core-language types are guaranteed to be unoptimized.You say "unoptimized", I hear "not broken". Anyway, clearly the default should be the variant with fewer pitfalls. If you really want to add some sort of flexible-precision data types, why not, but there should be a compiler flag to disable it.Such a type would have to be compiler-recognized, i.e. 
it would end up being a built-in type.I have no desire at all to suffer from irreproducible behavior because some dependency tried to max out on some benchmark that is irrelevant to me. I also have no desire at all to suffer from an unnecessary performance penalty just to recover reproducible behavior that is exposed directly by the hardware. Of course, then there's the issue that libc math functions are not fully precise and have differences between implementations, but at least there seems to be some movement on that front, and this is easy to work around given that the built-in operations are sane.
Aug 18
On Sunday, 18 August 2024 at 12:57:41 UTC, Timon Gehr wrote:On 8/17/24 18:33, Quirin Schroll wrote:I think you got me wrong here on a key aspect: I’m not trying to argue, but to explain what the D spec means and outline possible rationales for why it is as it is. For example, my x87 knowledge isn’t “outdated”: x87 co-processors didn’t change. They’re just not relevant anymore practically, but they could have influenced *why* Walter specified D as he did. I totally agree with you that in the context of modern hardware, D’s float spec makes little sense. Then I tried to outline a compromise between Walter’s numerously stated opinion and the desires of many D community members, including you. I’m somewhere between your side and neither side. I know enough about floating-point arithmetic to avoid it as much as I can, and it’s not due to how languages implement it, but how it works in practice. Personally, I wouldn’t even care much if D removed floating-point types entirely. I just see (saw?, years ago) great potential in D and want it to succeed, and I believe having reproducible results would be a big win. One aspect of optimization-immune types (calling them `float32` and `float64` for now) would be that if you have both a context in which you want good and fast algebraic results (where using fused multiply-add is welcome) and also a context in which reproducibility is required, you could use `double` in the first context and `float64` in the second, while passing `-ffast-math`. Maybe a `pragma` on one kind of those functions is better. I don’t know. They’re not mutually exclusive. LDC already has `@fastmath`, so maybe it can just become official and be part of the D spec?
Aug 19