
digitalmars.D - Differences in results when using the same function in CTFE and at runtime

Carsten Schlote <carsten.schlote gmx.net> writes:
Hi

I'm playing with CTFE in D. This feature allows for a lot of 
funny things, e.g. initialisation of immutable data at compile 
time with the result of some other function (template).

As a result I get immutable result blobs compiled into the binary, but none of the generating code, because it was already executed by CTFE.

This worked nicely for several other use cases as well. Until now the results of CTFE and RT were always the same. As expected.

However, yesterday a unit test started to report that the results created by the same code with the same parameters differ when run in CTFE mode or at runtime.

```D
static immutable ubyte[] burningShipImageCTFE =
    generateBurningShipImage(twidth, theight, maxIter);
immutable ubyte[] burningShipImageRT =
    generateBurningShipImage(twidth, theight, maxIter);
assert(burningShipImageCTFE == burningShipImageRT, "Same results expected.");
```
I diffed the pictures and indeed some of the pixels in the more complex areas of the Burning Ship fractal were clearly and noticeably different.

Ok, the fractal code uses 'double' floats, which are by their very nature limited in precision. But assuming that the math emulation in CTFE works identically to the CPU at runtime, the outcome should be identical.

Or not, in some cases ;-) E.g. with a fractal equation, where the smallest changes can result in big differences.

And it opens up some questions:

- Can CTFE be used under all circumstances when float numbers of 
any precision are involved?
- Or is this some kind of expected behaviour whenever floats are 
involved?
- Is the D CTFE documentation completely covering such possible 
issues?

I can imagine that bugs caused by such subtle differences might be very difficult to fix.


Any experiences or thoughts on this?

Greetz
Carsten
Aug 08
IchorDev <zxinsworld gmail.com> writes:
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:
 […] assuming that the math emulation of CTFE works identical as 
 the CPU does at runtime, the outcome should be identical.
Which it probably does not. CTFE is meant to be less platform-dependent than runtime code, so it is interpreted rather than running directly on the CPU. This is why there are strict limitations on where CTFE can be invoked. I believe the exact behaviour of CTFE would be implementation-dependent though.
Aug 08
user1234 <user1234 12.de> writes:
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:
 Hi

 [...]

 I can imagine that bugs causes by such subtil differences might 
 be very difficult to fix.


 Any experiences or thought on this?

 Greetz
 Carsten
During CTFE _there have to be_ many more round-trips between storage and the effective operations, whereas with native code several operations can happen without leaving the FPU... and those are done with a different accuracy (80 bits). During the round-trips implied by CTFE you get a floating-point truncation after each op. You see, during CTFE just `a + b * c` means two truncations: one after the multiply, the second after the addition (even if this example is not very good, as those ops are done using SSE).

That being said, it would be interesting to verify whether that is the problem. Maybe update your code to use `real` variables and let's see.
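A minimal sketch of the idea (values invented; whether the two printed results actually differ depends on the inputs and the target):

```D
void main()
{
    import std.stdio : writefln;

    double b = 0.1, c = 0.3;

    // Rounded to double after every op, as in CTFE:
    double t  = b * c;        // first truncation
    double r1 = t + 1.0;      // second truncation

    // Kept in an 80-bit real until a single final rounding,
    // as native x87 code may do:
    double r2 = cast(double)(cast(real) b * c + 1.0);

    writefln("%.20f", r1);
    writefln("%.20f", r2);
}
```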
Aug 08
Carsten Schlote <carsten.schlote gmx.net> writes:
On Thursday, 8 August 2024 at 13:31:43 UTC, user1234 wrote:

 That being said, that'd be intersting to verify if it's the 
 problem. Maybe update your code to use `real` variables and 
 let's see.
Replaced the 'double's with 'real' and now the results of the CTFE and RT executions are exactly the same. So, as you expected and described, the CTFE lost some precision by splitting the ops and storing the intermediate results back from FPU registers (with internal precision) to the limited 'double' type somewhere in memory.

However, the other fractals (Mandelbrot, Julia, Newton, Lyapunov, ...) work perfectly with the 'double's. I suspect that their default/starting views do not involve enough recursion and iteration in the calculation, so the missing precision is not important yet. I will pimp my unit tests to catch such precision issues for the other fractals as well, and then fix them by using 'real' types.

So, what are the lessons learned here:

1. When using floats, use the 'real' type - it uses the maximum available FPU precision, and no precision is lost when the FPU register is stored/read back to/from memory. So CTFE and RT execution have the best chance to produce the same results.
2. Use 'float' and 'double' only if you need to save some bits of storage and if the reduced precision is acceptable.
3. Unit tests proved again to be great, especially when support for unit testing is built into the language. Even the simplest assumptions can be wrong, and having assertions for such assumptions will be helpful.
4. Use assert()/enforce() and in/out/do contracts to check everything against your expectations and assumptions.

At the end, ```assert(burningShipImageCTFE == burningShipImageRT, "Same results expected.");``` in my unittest did find it. And I got a chance to think about it and work on my assumptions ;-)
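A minimal sketch of that pattern (function and values invented): `static immutable` forces CTFE, the plain `immutable` runs at runtime, and the assert catches any divergence:

```D
real iterate(real z, int n) pure
{
    // Repeated multiply-add, the kind of loop where intermediate
    // rounding differences accumulate.
    foreach (_; 0 .. n)
        z = z * z + 0.25;
    return z;
}

unittest
{
    static immutable real ctfeResult = iterate(0.1, 100); // forced CTFE
    immutable real rtResult = iterate(0.1, 100);          // runtime
    assert(ctfeResult == rtResult, "Same results expected.");
}
```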
Aug 09
"H. S. Teoh" <hsteoh qfbox.info> writes:
On Thu, Aug 08, 2024 at 10:31:32AM +0000, Carsten Schlote via Digitalmars-d
wrote:
[...]
 Ok, the fractal code uses 'double' floats, which are by their very
 nature limited in precision. But assuming that the math emulation of
 CTFE works identical as the CPU does at runtime, the outcome should be
 identical.
IIRC DMD either uses in CTFE, or emits code that uses, the x87's 80-bit extended precision in intermediate results. Which, obviously, will produce different results than the same computation performed using only doubles. [...]
 - Or is this some kind of expected behaviour whenever floats are involved?
https://floating-point-gui.de/ --T
Aug 08
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:
 Hi
 - Can CTFE be used under all circumstances when float numbers 
 of any precision are involved?
 - Or is this some kind of expected behaviour whenever floats 
 are involved?
 - Is the D CTFE documentation completely covering such possible 
 issues?

 I can imagine that bugs causes by such subtil differences might 
 be very difficult to fix.


 Any experiences or thought on this?
there are `toPrec`[1] intrinsics to solve exactly this issue of lack of truncation of precision. [1]: https://github.com/dlang/dmd/blob/958ba9cbe3583830efe39505238939daa0dbd64c/druntime/src/core/math.d#L198-L214
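A minimal sketch of how they might be used (function name invented): `toPrec` guarantees its argument is rounded to exactly the precision of the given type, even where the compiler would otherwise keep it in a wider register:

```D
import core.math : toPrec;

double mulAdd(double a, double b, double c)
{
    // Round the product to double before the addition, so CTFE and
    // RT perform the same two rounding steps.
    return toPrec!double(toPrec!double(b * c) + a);
}
```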
 Greetz
 Carsten
Aug 08
IchorDev <zxinsworld gmail.com> writes:
On Friday, 9 August 2024 at 00:46:12 UTC, Nicholas Wilson wrote:
 On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote 
 wrote:
 Hi
 - Can CTFE be used under all circumstances when float numbers 
 of any precision are involved?
 - Or is this some kind of expected behaviour whenever floats 
 are involved?
 - Is the D CTFE documentation completely covering such 
 possible issues?

 I can imagine that bugs causes by such subtil differences 
 might be very difficult to fix.


 Any experiences or thought on this?
there are `toPrec`[1] intrinsics to solve exactly this issue of lack of truncation of precision. [1]: https://github.com/dlang/dmd/blob/958ba9cbe3583830efe39505238939daa0dbd64c/druntime/src/core/math.d#L198-L214
 Greetz
 Carsten
Can anyone remind me whether there’s a way to force calculations to be performed with a certain degree of precision (e.g. single or double) instead of rounding down from the largest floats available? Would be really useful for cross-platform consistency. :’|
Aug 10
Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Saturday, 10 August 2024 at 11:15:19 UTC, IchorDev wrote:
 On Friday, 9 August 2024 at 00:46:12 UTC, Nicholas Wilson wrote:
 On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote 
 wrote:
 Hi
 - Can CTFE be used under all circumstances when float numbers 
 of any precision are involved?
 - Or is this some kind of expected behaviour whenever floats 
 are involved?
 - Is the D CTFE documentation completely covering such 
 possible issues?

 I can imagine that bugs causes by such subtil differences 
 might be very difficult to fix.


 Any experiences or thought on this?
there are `toPrec`[1] intrinsics to solve exactly this issue of lack of truncation of precision. [1]: https://github.com/dlang/dmd/blob/958ba9cbe3583830efe39505238939daa0dbd64c/druntime/src/core/math.d#L198-L214
 Greetz
 Carsten
Can anyone remind me whether there’s a way to force calculations to be performed with a certain degree of precision (e.g. single or double) instead of rounding down from the largest floats available? Would be really useful for cross-platform consistency. :’|
D specifies that floating point calculations may be performed with higher precision than the type suggests:
 Execution of floating-point expressions may yield a result of 
 greater precision than dictated by the source.
— [*Floating-Point Intermediate Values*](https://dlang.org/spec/float.html#fp_intermediate_values), D Language Specification

On almost all non-embedded CPUs, doing non-vector calculations in `float` is more costly than doing them in `double` or `real`, because single-precision arguments are first converted to `double` or `real`. I consider `float` to be a type for storing values in arrays that don’t need the precision, saving me half the RAM.
Aug 12
IchorDev <zxinsworld gmail.com> writes:
On Monday, 12 August 2024 at 10:22:52 UTC, Quirin Schroll wrote:
 On almost all non-embedded CPUs, doing non-vector calculations 
 in `float` is more costly than doing them in `double` or `real` 
 because for single-arguments, the floats are converted to 
 `double` or `real`. I consider `float` to be a type used for 
 storing values in arrays that don’t need the precision and save 
 me half the RAM.
I don’t care. Only one family of CPU architectures supports ‘extended precision’ floats (because it’s a waste of time), so I would like to know a way to always perform calculations with double precision for better cross-platform consistency. Imagine trying to implement JRE without being able to do native double precision maths.
Aug 12
Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Monday, 12 August 2024 at 11:06:15 UTC, IchorDev wrote:
 On Monday, 12 August 2024 at 10:22:52 UTC, Quirin Schroll wrote:
 On almost all non-embedded CPUs, doing non-vector calculations 
 in `float` is more costly than doing them in `double` or 
 `real` because for single-arguments, the floats are converted 
 to `double` or `real`. I consider `float` to be a type used 
 for storing values in arrays that don’t need the precision and 
 save me half the RAM.
I don’t care. Only one family of CPU architectures supports ‘extended precision’ floats (because it’s a waste of time), so I would like to know a way to always perform calculations with double precision for better cross-platform consistency. Imagine trying to implement JRE without being able to do native double precision maths.
I honestly don’t know how the JRE implemented `double` operations on e.g. the Intel 80486, but if I try using `gcc -mfpmath=387 -O3` and add some `double` values, intermediate results are stored and loaded again. Very inefficient. If I use `long double`, that does not happen.

The assertion that only one CPU family supports extended floats is objectively wrong. You probably meant x86 with its 80-bit format, which is still noteworthy, as x86 is very, very common. However, at least the POWER9 family supports 128-bit IEEE-754 quadruple-precision floats. IIUC, RISC-V also supports them.
Aug 13
Abdulhaq <alynch4048 gmail.com> writes:
On Tuesday, 13 August 2024 at 11:02:07 UTC, Quirin Schroll wrote:

 I honestly don’t know how JRE did implement `double` operations 
 on e.g. the Intel 80486, but if I try using `gcc -mfpmath=387 
 -O3` and add some `double` values, intermediate results are 
 stored and loaded again. Very inefficient. If I use `long 
 double`, that does not happen.
For those who haven't really run into numerical analysis before, this paper is worth reading. I've given a link to the HN discussion around it: https://news.ycombinator.com/item?id=37028310
Aug 13
matheus <matheus gmail.com> writes:
On Tuesday, 13 August 2024 at 11:14:35 UTC, Abdulhaq wrote:
 ...
 For those who haven't really run into numerical analysis 
 before, this paper is worth reading. I've given a link to the 
 HN discussion around it:

 https://news.ycombinator.com/item?id=37028310
Since I can't access that site, could you please share the URL directly to this paper?

Matheus.
Aug 13
Abdulhaq <alynch4048 gmail.com> writes:
On Tuesday, 13 August 2024 at 14:20:29 UTC, matheus wrote:
 On Tuesday, 13 August 2024 at 11:14:35 UTC, Abdulhaq wrote:
 ...
 For those who haven't really run into numerical analysis 
 before, this paper is worth reading. I've given a link to the 
 HN discussion around it:

 https://news.ycombinator.com/item?id=37028310
Since I can't access that site, could you please share the URL directly to this paper? Matheus.
sure, it's https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf
Aug 13
IchorDev <zxinsworld gmail.com> writes:
On Tuesday, 13 August 2024 at 11:02:07 UTC, Quirin Schroll wrote:
 I honestly don’t know how JRE did implement `double` operations 
 on e.g. the Intel 80486
Probably in software, but modern x86 CPUs can just use the hardware; so the difference isn’t so meaningful anymore.
 if I try using `gcc -mfpmath=387 -O3` and add some `double` 
 values, intermediate results are stored and loaded again. Very 
 inefficient. If I use `long double`, that does not happen.
Who cares? In a situation where we must reach the same result on every platform (cross-platform determinism) the performance can suffer a bit. You are just avoiding my question by making excuses.

Do you make sure your program’s data is passed around exclusively via data registers? Do you only write branchless code? Does your entire program fit inside the CPU cache? No, because we make performance sacrifices to achieve the desired outcome. The idea that the only valid way of coding something is the way that compromises on the integrity of the output in favour of performance is a step away from programming nihilism.
 You probably meant the x86 with its 80-bit format, which is 
 still noteworthy
Yes because it’s referred to as ‘extended precision’ and doesn’t have a proper name because it’s an unstandardised atrocity. https://en.wikipedia.org/wiki/Extended_precision#x86_extended_precision_format
 However, at least the POWER9 family supports 128-bit IEEE-754 
 quad quadruple-precision floats. IIUC, RISC-V also supports 
 them.
binary128 is obviously not the same as x86’s ‘extended precision’. You will not get cross-platform deterministic results from using them interchangeably; you will get misery.
Aug 13
prev sibling next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 8/8/24 12:31, Carsten Schlote wrote:
 
 And it opens up some questions:
 
 - Can CTFE be used under all circumstances when float numbers of any 
 precision are involved?
 - Or is this some kind of expected behaviour whenever floats are involved?
 - Is the D CTFE documentation completely covering such possible issues?
 
 I can imagine that bugs causes by such subtil differences might be very 
 difficult to fix.
 
 
 Any experiences or thought on this?
The specification allows behavior like that even at runtime: https://dlang.org/spec/float.html

I have been vocally against this. Full portability is perhaps one thing, as hardware differences do exist, but completely thwarting expectations by deliberately doing a different operation than the one you requested makes, I think, no sense at all. A lot of modern hardware nowadays is compatible or almost compatible regarding floats.
Aug 12
Abdulhaq <alynch4048 gmail.com> writes:
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:

 Ok, the fractal code uses 'double' floats, which are by their 
 very nature limited in precision. But assuming that the math 
 emulation of CTFE works identical as the CPU does at runtime, 
 the outcome should be identical.
 Any experiences or thought on this?
It's worth mentioning that in the wider field, it is generally held that when working with floating-point operations we don't expect to see the same duplicate results every time. It is a mistake to compare for equality; you should be checking for "almost equals". For your particular application it might feel like a "bug" to see different results, but in fact this is expected behaviour and generally not held to be a fault.

One little piece of anecdata: when working with Python I saw differing results in the 17th d.p. when running the exact same calculation in the same Python session. The reason turned out to be which registers were being used each time, which could vary. This is not considered to be a bug.
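For reference, a minimal sketch of "almost equals" in D (values invented): `std.math.isClose` compares with a tolerance instead of exact equality:

```D
unittest
{
    import std.math : isClose;

    double x = 0.1, y = 0.2;
    double sum = x + y;   // not exactly 0.3; the last bits depend on
                          // where intermediate rounding happened

    assert(isClose(sum, 0.3));   // tolerance-based comparison passes
}
```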
Aug 13
Timon Gehr <timon.gehr gmx.ch> writes:
On 8/13/24 09:56, Abdulhaq wrote:
 
 It's worth mentioning that in the wider field, it is generally held that 
 when working with floating point operations we don't expect to see the 
 same duplicate results every time.
This is not true at the level of hardware, where results are actually reproducible on the same processor architecture (and often even across architectures) with the same sequence of instructions. Treating floats as "fuzzy" is perhaps common, but it is not correct. Rounding is a pure function like any other.
 It is a mistake to compare for equality, you should be checking for "almost
equals".
That is a different point.
 
 For your particular application it might feel like a "bug" to see different
results, but in fact this is expected behaviour and generally not held to be a
fault.
 ...
It's annoying anyway, and we should hold programming languages to a higher standard than that.
 One little piece of anecdata, when working with python I saw differing results
in the 17th d.p. when running the exact same calculation in the same python
session. The reason turned out to be which registers were being used each time,
which could vary. This is not considered to be a bug. 
Well, I think the x87 is a bad design as a compilation target. If you are targeting that with a compiler whose developers by default do not care about reproducibility, expect weird behavior. These instructions are now deprecated, so the calculation should change a bit.
Aug 13
Abdulhaq <alynch4048 gmail.com> writes:
On Tuesday, 13 August 2024 at 08:33:51 UTC, Timon Gehr wrote:

 It's annoying anyway, and we should hold programming languages 
 to a higher standard than that.


 Well, I think the x87 is a bad design as a compilation target. 
 If you are targeting that with a compiler whose developers by 
 default do not care about reproducibility, expect weird 
 behavior. These instructions are now deprecated, so the 
 calculation should change a bit.
It's an interesting discussion, but we are where we are, and I think you'd agree that the OP should understand that in the current world of "systems" programming languages, this behaviour would not be viewed as a "bug".

I'm only being somewhat tongue-in-cheek when I say that his application, viewing fractals, could be considered a kind of "chaos amplifier" that is especially prone to these sorts of issues.

I'm going to guess that the situation we're in is due to a preference of speed over reproducibility in the days when computers ran much more slowly than they do now. Would we choose a different priority these days? I don't know.
Aug 13
Timon Gehr <timon.gehr gmx.ch> writes:
On 8/13/24 13:09, Abdulhaq wrote:
 On Tuesday, 13 August 2024 at 08:33:51 UTC, Timon Gehr wrote:
 
 It's annoying anyway, and we should hold programming languages to a 
 higher standard than that.


 Well, I think the x87 is a bad design as a compilation target. If you 
 are targeting that with a compiler whose developers by default do not 
 care about reproducibility, expect weird behavior. These instructions 
 are now deprecated, so the calculation should change a bit.
It's an interesting discussion, but we are where we are, and I think you'd agree that the OP should understand that in the current world of "systems"  programming languages, this behaviour would not be viewed as a "bug". ...
Well, I think one thing is where we are now, another question is where we should be in the future.
 I'm only being somewhat tongue-in-cheek when I say that his application, 
 viewing fractals, could be considered a kind of "chaos amplifier" that 
 is especially prone to these sorts of issues.
 ...
Not really. I think it is valid to expect reproducible results for addition, absolute value, and squaring, no matter how many times you repeat them. The reason why this issue happened is that D does not even try a little bit, and explicitly just goes ahead and uses the wrong data type, namely the 80-bit one only supported by the x87.
 I'm going to guess that the situation we're in is due to preference of 
 speed over reproduceability in the days when computers ran much more 
 slowly than they do now.
Ironically, the x87 design is mostly about getting accurate numerics. It is a rather inefficient design. One issue is that it was designed for assembly programmers, not C compilers. C language features do not map well to it. x87 results are reproducible in assembly as it is clear where rounding happens. It is true that computers used to run more slowly, and the x87 is some part of the reason why that was true. There are many reasons why the design was ditched, efficiency is one of them.
 Would we choose a different priority these days? I don't know.
The x87 results in both slower code and worse reproducibility when used as a target for `float`/`double`/`real` computations.

Ironically, IIRC Kahan was involved in the x87 design, yet algorithms attributed to him, like Kahan summation, cannot actually be implemented correctly using D floating-point types, because of the insane rules introduced to support targeting the x87 without changing rounding precision or spilling data to the cache and reloading it.

At the same time, "better accuracy" is part of the reason why CTFE uses higher floating-point precision, but higher precision does not mean the result will be more accurate. Unless arbitrary precision is used, which is even slower, it leads to double-rounding issues which can cause the result to be less accurate.
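For concreteness, a minimal sketch of Kahan summation (a standard formulation, not the Phobos one). Its correctness depends on `(t - sum) - y` being rounded at exactly the declared precision; implicit extra precision applied to an arbitrary subset of these operations can make the compensation term wrong:

```D
double kahanSum(const(double)[] values)
{
    double sum = 0.0;
    double c   = 0.0;            // running compensation for lost low bits
    foreach (v; values)
    {
        immutable y = v - c;     // apply the stored correction
        immutable t = sum + y;   // big + small: low bits of y get lost
        c = (t - sum) - y;       // algebraically 0; recovers the lost bits
        sum = t;
    }
    return sum;
}
```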
Aug 13
claptrap <clap trap.com> writes:
On Tuesday, 13 August 2024 at 21:03:00 UTC, Timon Gehr wrote:
 On 8/13/24 13:09, Abdulhaq wrote:

 At the same time, "better accuracy" is part of the reason why 
 CTFE uses higher floating-point precision, but higher precision 
 does not mean the result will be more accurate. Unless 
 arbitrary precision is used, which is even slower, it leads to 
 double-rounding issues which can cause the result to be less 
 accurate.
If you have a pristine 24-bit audio sample, maybe 120 dB SnR due to DAC limitations: processing it in 16 bits will automatically lose you 4 bits of accuracy, it'll drop the SnR to 96 dB. If you process it at 32 bits you still have 120 dB SnR: greater precision but the same accuracy as the source.

The point is that the statement "higher precision does not mean the result will be more accurate" is only half true. If the precision you are doing the calculations at is already higher than the accuracy of your data, more precision won't get you much of anything. But if the converse is true, if you are processing the data at lower precision than the accuracy of your source data, then increasing precision will absolutely increase accuracy.
Aug 13
Timon Gehr <timon.gehr gmx.ch> writes:
On 8/14/24 00:35, claptrap wrote:
 On Tuesday, 13 August 2024 at 21:03:00 UTC, Timon Gehr wrote:
 On 8/13/24 13:09, Abdulhaq wrote:

 At the same time, "better accuracy" is part of the reason why CTFE 
 uses higher floating-point precision, but higher precision does not 
 mean the result will be more accurate. Unless arbitrary precision is 
 used, which is even slower, it leads to double-rounding issues which 
 can cause the result to be less accurate.
If you have a pristine 24 bit audio sample, maybe 120db SnR due to DAC limitations. Processing it in 16 bit will automatically loose you 4 bits of accuracy, it'll drop the SnR to 96dbs. If you process it at 32 bits you still have 120db SnR, greater precision but same accuracy as the source. The point is the statement "higher precision does not mean the result will be more accurate." is only half true. ...
It is fully true. "A means B" holds when in any situation where A holds, B also holds. "A does not mean B" holds when there is a situation where A is true but B is false. Clearly I am not saying that higher accuracy implies lower precision, just that there are cases where lower precision gives you higher accuracy.
 If the precision you are doing the calculations at is already higher 
 than the accuracy of your data, more precision wont get you much of 
 anything.
 ...
It may even break some algorithms, especially when it is inconsistently and implicitly applied, which is the case in D.
 but if the converse is true, if you are processing the data at lower 
 precision than the accuracy of your source data, then increasing 
 precision will absolutely increase accuracy.
 
Usually true, but I still want to decide on my own which computation will be rounded to what precision. Even an accidentally avoided accuracy issue resulting from not guaranteed implicit higher precision is a ticking time bomb and very annoying to debug when it goes off.
Aug 13
claptrap <clap trap.com> writes:
On Wednesday, 14 August 2024 at 02:35:08 UTC, Timon Gehr wrote:
 On 8/14/24 00:35, claptrap wrote:
 If you have a pristine 24 bit audio sample, maybe 120db SnR 
 due to DAC limitations. Processing it in 16 bit will 
 automatically loose you 4 bits of accuracy, it'll drop the SnR 
 to 96dbs. If you process it at 32 bits you still have 120db 
 SnR, greater precision but same accuracy as the source.
 
 The point is the statement "higher precision does not mean the 
 result will be more accurate." is only half true.
 ...
It is fully true. "A means B" holds when in any situation where A holds, B also holds. "A does not mean B" holds when there is a situation where A is true but B is false.
I'd argue "does not mean" is vague, and most people would read what you wrote as " A != B in any situation." But otherwise I agree based on what you meant.
 but if the converse is true, if you are processing the data at 
 lower precision than the accuracy of your source data, then 
 increasing precision will absolutely increase accuracy.
 
Usually true, but I still want to decide on my own which computation will be rounded to what precision. Even an accidentally avoided accuracy issue resulting from not guaranteed implicit higher precision is a ticking time bomb and very annoying to debug when it goes off.
I agree the compiler should actually use the float precision you explicitly ask for.
Aug 14
Dom DiSc <dominikus scherkl.de> writes:
On Wednesday, 14 August 2024 at 07:54:15 UTC, claptrap wrote:
 I agree the compiler should actually use the float precision 
 you explicitly ask for.
But if you do cross-compiling, that may simply not be possible (because the target has a different hardware implementation than the hardware the compiler runs on). Also, relying on a specific inaccuracy of FP calculations is very bad design.
Aug 15
Timon Gehr <timon.gehr gmx.ch> writes:
On 8/15/24 12:25, Dom DiSc wrote:
 On Wednesday, 14 August 2024 at 07:54:15 UTC, claptrap wrote:
 I agree the compiler should actually use the float precision you 
 explicitly ask for.
But if you do cross compiling, that may simply be not possible (because the target has different hardware implementation than the hardware the compiler runs on). ...
It's still possible, but it's not even what I asked for. Also, personally I am not targeting any machine whose floating-point unit is sufficiently incompatible to cause a difference on this specific code. There has been a convergence of how floats work. D differs gratuitously.
 Also relying on specific inaccuracy of FP calculations is very bad design.
The point is not "relying on specific inaccuracy", the point is reproducibility. The specifics of the inaccuracy usually do not matter at all, what matters a great deal sometimes is that you get the same result when you run the same computation.
Aug 15
Timon Gehr <timon.gehr gmx.ch> writes:
On 8/15/24 12:25, Dom DiSc wrote:
 Also relying on specific inaccuracy of FP calculations is very bad design.
Personally I mostly care about reproducibility, but also, as I said earlier, accuracy-improving algorithms like Kahan summation or general double-double computations very much "rely on specific inaccuracy". It's not bad design; it may just be that you are not sufficiently informed about this kind of thing.

Kahan summation is even in Phobos: https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L7558-L7572

This implementation is however technically incorrect, because D could choose a bad subset of the computations and run only those at "higher" precision, destroying the overall accuracy.

It gets more obvious if you drop the judgmental tone and just call "specific inaccuracy" by its proper name: "correct rounding".
Aug 15
claptrap <clap trap.com> writes:
On Thursday, 15 August 2024 at 10:25:45 UTC, Dom DiSc wrote:
 On Wednesday, 14 August 2024 at 07:54:15 UTC, claptrap wrote:
 I agree the compiler should actually use the float precision 
 you explicitly ask for.
But if you do cross compiling, that may simply be not possible (because the target has different hardware implementation than the hardware the compiler runs on).
So if you have some CTFE function that does a bunch of calculations in double precision, and the compiler is running on a platform that only supports float, what happens?
 Also relying on specific inaccuracy of FP calculations is very 
 bad design.
It's not relying on them, it's accounting for them.
Aug 15
Carsten Schlote <carsten.schlote gmx.net> writes:
On Tuesday, 13 August 2024 at 07:56:34 UTC, Abdulhaq wrote:
 One little piece of anecdata, when working with python I saw 
 differing results in the 17th d.p. when running the exact same 
 calculation in the same python session. The reason turned out 
 to be which registers were being used each time, which could 
 vary. This is not considered to be a bug.
I would consider it a bug. Running the same piece of code multiple times with exactly the same parameters should result in exactly the same results.

But once I also expected that CTFE and runtime execution of some code would always yield the same results. This assumption is wrong, as I learned recently. But using 'real' as the type usually leads to the same result. All the other issues with floats of any size are a separate topic.
Aug 15
Timon Gehr <timon.gehr gmx.ch> writes:
On 8/15/24 11:13, Carsten Schlote wrote:
 On Tuesday, 13 August 2024 at 07:56:34 UTC, Abdulhaq wrote:
 One little piece of anecdata, when working with python I saw differing 
 results in the 17th d.p. when running the exact same calculation in 
 the same python session. The reason turned out to be which registers 
 were being used each time, which could vary. This is not considered to 
 be a bug.
I would consider it a bug. Running the same piece of code multiple times with exactly the same parameters should result into the exactly same results.
The D spec allows this to not be true too. I consider it a weakness of the spec.
Aug 15
Abdulhaq <alynch4048 gmail.com> writes:
On Thursday, 15 August 2024 at 09:13:31 UTC, Carsten Schlote 
wrote:
 On Tuesday, 13 August 2024 at 07:56:34 UTC, Abdulhaq wrote:
 One little piece of anecdata, when working with python I saw 
 differing results in the 17th d.p. when running the exact same 
 calculation in the same python session. The reason turned out 
 to be which registers were being used each time, which could 
 vary. This is not considered to be a bug.
I would consider it a bug.
I'd say that whether it's a bug or not comes down to the definition of "bug", and I don't think a discussion of that is useful; also, it's subjective, and reasonable people can differ on it. But I would say that, in my python example, both answers were wrong, in the sense that they were not exactly right (and can't be), but they were both wrong in a different way.

In your example I'd ask you to consider this: all the images you created, except the trivial ones, were wrong, in the sense that some of the calculations were inexact and some of them had sufficient error to change the color of the pixel. It's just that the CTFE image was wrong in a different way to the non-CTFE image. Why do you want to reproduce the errors?

In general floating point calculations are inexact, and we shouldn't (generalising here) expect to get the same error across different platforms. In this specific case you could argue that it's the "same platform", I get that.
Aug 15
Abdulhaq <alynch4048 gmail.com> writes:
On Thursday, 15 August 2024 at 16:21:35 UTC, Abdulhaq wrote:
 On Thursday, 15 August 2024 at 09:13:31 UTC, Carsten Schlote
To clarify a bit more, I'm not just talking about single isolated computations, I'm talking about e.g. matrix multiplication. Different compilers, even LDC vs DMD for example, could optimise the calculation in a different way: loop unrolling, step elimination etc. Even if the rounding algorithms etc. at the chip level are the same, the way the code is compiled and calculations sequenced will change the error in the final answer. Then, variations in pipelining and caching at the processor level could also affect the answer.

And if you move on to different computing paradigms such as quantum computing and other as yet undiscovered techniques, again the way operations and rounding etc. are compounded will cause divergences in computations.

Now, we could insist that we somehow legislate for the way compound calculations are conducted. But that would cripple the speed of calculations for some processor architectures / paradigms for a goal (reproducibility) which is worthy, but for 99% of usages not sufficiently beneficial to pay the big price in performance.
Aug 15
Timon Gehr <timon.gehr gmx.ch> writes:
On 8/15/24 18:50, Abdulhaq wrote:
 On Thursday, 15 August 2024 at 16:21:35 UTC, Abdulhaq wrote:
 On Thursday, 15 August 2024 at 09:13:31 UTC, Carsten Schlote
To clarify a bit more, I'm not just talking about single isolated computations, I'm talking about e.g. matrix multiplication. Different compilers, even LDC vs DMD for example, could optimise the calculation in a different way, loop unrolling, step elimination etc. even if the rounding algorithms etc. at the chip level are the same, the way the code is compiled and calculations sequenced will change the error in the final answer. ...
LDC disables -ffast-math by default.
 Then, variations in pipelining and caching at the processor level could 
 also affect the answer.
 ...
No.
 And if you move on to different computing paradigms such as quantum 
 computing and other as yet undiscovered techniques, again the way 
 operations and rounding etc is compounded will cause divergences in 
 computations.
 ...
Yes, if you move on to an analog computing paradigm with imperfect error correction, full reproducibility will go out the window. Floating point is not that though.
 Now, we could insist that we somehow legislate for the way compound 
 calculations are conducted. But that would cripple the speed of 
 calculations for some processor architecture / paradigms for a goal 
 (reproduceability) which is worthy, but for 99% of usages not 
 sufficiently beneficial to pay the big price in performance.
 
 
It's really not that expensive; changing the result via optimizations is disabled by default in LDC. And actually, how do you know that the compiler does not pessimize your hand-optimized compound operations? I am not even against people being able to pass -ffast-math, it should just not destroy the correctness and reproducibility of everyone else's computations.
Aug 15
Timon Gehr <timon.gehr gmx.ch> writes:
On 8/15/24 18:21, Abdulhaq wrote:
 Why do you want to reproduce the errors?
It's you who calls it "errors". Someone else may just call it "results". Science; determinism, e.g. blockchain or other deterministic lock-step networking; etc. It's not hard to come up with use cases. Sometimes what exact result you get is not very important, but it is important that you get the same one.
 
 In general floating point calculations are inexact, and we shouldn't
(generalising here) expect to get the same error across different platforms. 
It's just not true that floating point calculations are somehow "inexact". You are confusing digital and analog computing. This is still digital computing. Rounding is a deterministic, exact function, like any other.
Aug 15
Abdulhaq <alynch4048 gmail.com> writes:
On Thursday, 15 August 2024 at 16:51:51 UTC, Timon Gehr wrote:
 On 8/15/24 18:21, Abdulhaq wrote:
 Why do you want to reproduce the errors?
It's you who calls it "errors". Someone else may just call it "results".
Here by "error" I mean where the result is different to a known correct answer. It's a simple matter to propose a method for producing a fractal plot, and to have a notional "correct" plot if calculated to infinite precision. Then, at sufficient depth of an e.g. mandelbrot plot, at a finite precision, some pixels will differ from the ideal plot produced by the machine of infinite precision. These divergences are, by my definition, errors. Yes, it's all results, and some results are errors.
 Science. Determinism, e.g. blockchain or other deterministic 
 lock-step networking. etc. It's not hard to come up with use 
 cases. Sometimes what exact result you get is not very 
 important, but it is important that you get the same one.

 
This is why I was careful to say that I was generalising. Of course we can come up with examples where reproduceability is an essential requirement. In this case we can often work out what precision is required to achieve said reproduceability.
 In general floating point calculations are inexact, and we 
 shouldn't (generalising here) expect to get the same error 
 across different platforms.
It's just not true that floating point calculations are somehow "inexact". You are confusing digital and analog computing. This is still digital computing. Rounding is a deterministic, exact function, like any other.
Fair point. I meant to refer to floating point numbers rather than calculations. There are an infinite number of real numbers that cannot be exactly represented by a floating point number of finite precision. The floating point representations of those numbers are inexact. If we calculate 1/3 using floating point, the result can be exactly correct in the sense there is a correct answer in the world of floating point, but the calculated result will be an inexact representation of the correct number, which is what I had in mind.
Aug 15
Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:
 Hi

 I'm playing with CTFE in D. This feature allows for a lot of 
 funny things, e.g. initialisation of immutable data at compile 
 time with the result of some other function (template).

 As a result I get immutable result blobs compiled into the 
 binary. But none of the generating code, because it was already 
 executed by CTFE.

 This worked nicly for several other usecases as well.For now 
 the results of CTFE and RT were always the same. As expected.

 However, yesterday a unit-test started to report, that the 
 results created by the same code with same parameters differ 
 when run in CTFE mode or at runtime.

 ```D
     static immutable ubyte[] burningShipImageCTFE = 
 generateBurningShipImage(twidth, theight, maxIter);
     immutable ubyte[] burningShipImageRT = 
 generateBurningShipImage(twidth, theight, maxIter);
     assert(burningShipImageCTFE == burningShipImageRT, "Same 
 results expected.");

 ```
 I diffed the pictures and indeed some of the pixels in the more 
 complex areas of the BurningShip fractal were clearly and 
 noteably different.

 Ok, the fractal code uses 'double' floats, which are by their 
 very nature limited in precision. But assuming that the math 
 emulation of CTFE works identical as the CPU does at runtime, 
 the outcome should be identical.

 Or not, in some cases ;-) E.g. with a fractal equation where 
 smallest changes can result into big differences.

 And it opens up some questions:

 - Can CTFE be used under all circumstances when float numbers 
 of any precision are involved?
 - Or is this some kind of expected behaviour whenever floats 
 are involved?
 - Is the D CTFE documentation completely covering such possible 
 issues?

 I can imagine that bugs causes by such subtil differences might 
 be very difficult to fix.


 Any experiences or thought on this?
Experiences, little, as I'm not doing floating-point stuff professionally, but I know my stuff because in the past, for some years, I did the lecture assistance for a numerical programming course.

The normal use case for floating-point isn't perfectly reproducible results between different optimization levels. However, differences between CTFE and RT are indeed unacceptable for core-language operations. Those are bugs. Of course, results in user code can differ between CTFE and RT due to using `__ctfe` incorrectly.

It might be noteworthy that C++ (at least up to and including C++17) does not allow floating-point types in CTFE (i.e. in `constexpr` execution) and my suspicion is that this is the reason. Maybe the solution is the same: remove floating-point operations from CTFE, or at least ones that could differ from RT. It would be awkward, at least in some people's opinion, because that would mean that in CTFE, only `real` is available, despite it being implementation-defined (it's not just platform-dependent, it's also how expressions are interpreted), while `double` and `float` are seemingly exactly defined. The reason is that `real` can't be optimized to higher precision, as it's by definition the highest-precision format supported by the platform, whereas for the smaller formats, RT results may differ for different optimization levels. What the compiler could do, however, is replace `a + b * c` by a fused multiply-add. If it does that consistently across CTFE and RT, as I read the spec, it would be allowed to do that, even for `real`.

The reason D specifies floating-point operations not that precisely is to allow for optimizations. Generally speaking, optimizations require some leeway in the spec. Optimizations are also required not to change observable behavior, but what counts as that is again up to the spec. In C++, the compiler is allowed to optimize away copies, even if those would invoke a copy constructor that has observable side effects. As I see it (not my opinion, just what I see and conclude), D specifies differences in floating-point operations due to them being carried out in higher-than-required precision as not an observable side effect, i.e. not one that the optimizer must preserve, even if you can practically observe a difference.

The reason for that is probably because Walter didn't like that other languages nailed down floating-point operations so that you'd get both less precise results *and* worse performance. That would for example be the case on an 80387 coprocessor, and (here's where my knowledge ends) probably also true for basically all hardware today if you consider `float` specifically. I know of no hardware that supports single precision but not double precision. Giving you double precision instead of single is at least basically free and possibly even a performance boost, while also giving you more precision. An algorithm like Kahan summation must be implemented in a way that takes those optimizations into account.

This is exactly like in C++, where signed integer overflow is undefined not because it's undefined on the hardware, but because it allows for optimizations. In Zig, all integer overflow is undefined for that reason, and for wrap-around or saturated arithmetic, there are separate operators.

D could easily add specific functions to `core.math` that specify operations as specifically IEEE-754 conforming. Using those, Phobos could give you types that are specified to produce results as specified by IEEE-754, with no interference by the optimizer.
You can't actually do the reverse, i.e. provide a type in Phobos that allows for optimizations of that sort while the core-language types are guaranteed to be unoptimized. Such a type would have to be compiler-recognized, i.e. it would end up being a built-in type.
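A hypothetical sketch of the former direction (type and names invented, not an actual Phobos proposal): a wrapper that rounds after every operation via the existing `core.math.toPrec` intrinsic:

```D
import core.math : toPrec;

struct StrictDouble
{
    double value;

    StrictDouble opBinary(string op)(StrictDouble rhs) const
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        // Force rounding to IEEE-754 double immediately after each op.
        return StrictDouble(toPrec!double(mixin("value " ~ op ~ " rhs.value")));
    }
}
```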
Aug 17
Timon Gehr <timon.gehr gmx.ch> writes:
On 8/17/24 18:33, Quirin Schroll wrote:

 The normal use case for floating-point isn't perfectly 
 reproducible results between different optimization levels.

I would imagine the vast majority of FLOPs nowadays are used in HPC and AI workloads. Reproducibility is at least a plus, particularly in a research context.
 However, differences between CTFE and RT are indeed unacceptable for
core-language operations. Those are bugs.
No, they are not bugs, it's just the same kind of badly designed specification. According to the specification, you can get differences between RT and RT when running the exact same function. Of course you will get differences between CTFE and RT.
 The reason for that is probably because Walter didn't like that other 
 languages nailed down floating-point operations
Probably. C famously nails down floating-point operations, just like it nails down all the other types. D is really well-known for all of its unportable built-in data types, because Walter really does not like nailing things down and this is not one of D's selling points. /s

Anyway, at least LDC is sane on this at runtime by default. Otherwise I would have to switch language for use cases involving floating point, which would probably just make me abandon D in the long run.
 so that you'd get both less precise results *and* worse performance.
Imagine just manually using the data type that is most suitable for your use case.
 That would for example be 
 the case on an 80387 coprocessor, and (here's where my knowledge ends) 
Then your knowledge may be rather out of date. I get the x87 shenanigans, but that's just not very relevant anymore. I am not targeting 32-bit x86 with anything nowadays.
 probably also true for basically all hardware today if you consider 
 `float` specifically. I know of no hardware, that supports single 
 precision, but not double precision. Giving you double precision instead 
 of single is at least basically free and possibly even a performance 
 boost, while also giving you more precision.
It's nonsense. If I want double, I ask for double. Also, it's definitely not true that going to double instead of single precision will boost your performance on a modern machine. If you are lucky it will not slow you down, but if the code can be auto-vectorized (or you are vectorizing manually), you are looking at least at a 2x slowdown.
 
 An algorithm like Kahan summation must be implemented in a way that takes
those optimizations into account.
I.e., do not try to implement this at all with the built-in floating-point types. It's impossible.
 This is exactly like in C++, signed integer overflow is undefined, not because
it's undefined on the hardware, but because it allows for optimizations.
If you have to resort to invoking insane C++ precedent in order to defend a point, you have lost the debate. Anyway, it is not at all the same (triggered by overflow vs triggered by default, undefined behavior vs wrong result), and also, in D, signed overflow is actually defined behavior.
 D could easily add specific functions to `core.math` that specify operations
as specifically IEEE-754 confirming. Using those, Phobos could give you types
that are specified to produce results as specified by IEEE-754, with no
interference by the optimizer. 
It does not do that. Anyway, I would expect that to go to std.numeric.
 You can't actually do the reverse, i.e. provide a type in Phobos that allows
for optimizations of that sort but the core-language types are guaranteed to be
unoptimized.
You say "unoptimized", I hear "not broken". Anyway, clearly the default should be the variant with less pitfalls. If you really want to add some sort of flexible-precision data types, why not, but there should be a compiler flag to disable it.
 Such a type would have to be compiler-recognized, i.e. it would end up being a
built-in type. 
I have no desire at all to suffer from irreproducible behavior because some dependency tried to max out on some irrelevant to me benchmark. I also have no desire at all to suffer from an unnecessary performance penalty just to recover reproducible behavior that is exposed directly by the hardware. Of course, then there's the issue that libc math functions are not fully precise and have differences between implementations, but at least there seems to be some movement on that front, and this is easy to work around given that the built-in operations are sane.
Aug 18
Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Sunday, 18 August 2024 at 12:57:41 UTC, Timon Gehr wrote:
 On 8/17/24 18:33, Quirin Schroll wrote:
 
The normal use case for floating-point isn't perfectly reproducible results between different optimization levels. I would imagine the vast majority of FLOPs nowadays are used in HPC and AI workloads. Reproducibility is at least a plus, particularly in a research context.
 However, differences between CTFE and RT are indeed 
 unacceptable for core-language operations. Those are bugs.
No, they are not bugs, it's just the same kind of badly designed specification. According to the specification, you can get differences between RT and RT when running the exact same function. Of course you will get differences between CTFE and RT.
 The reason for that is probably because Walter didn't like 
 that other languages nailed down floating-point operations
Probably. C famously nails down floating-point operations, just like it nails down all the other types. D is really well-known for all of its unportable built-in data types, because Walter really does not like nailing things down and this is not one of D's selling points. /s Anyway, at least LDC is sane on this at runtime by default. Otherwise I would have to switch language for use cases involving floating point, which would probably just make me abandon D in the long run.
 so that you'd get both less precise results *and* worse 
 performance.
Imagine just manually using the data type that is most suitable for your use case.
 That would for example be the case on an 80387 coprocessor, 
 and (here's where my knowledge ends)
Then your knowledge may be rather out of date. I get the x87 shenanigans, but that's just not very relevant anymore. I am not targeting 32-bit x86 with anything nowadays.
 probably also true for basically all hardware today if you 
 consider `float` specifically. I know of no hardware, that 
 supports single precision, but not double precision. Giving 
 you double precision instead of single is at least basically 
 free and possibly even a performance boost, while also giving 
 you more precision.
It's nonsense. If I want double, I ask for double. Also, it's definitely not true that going to double instead of single precision will boost your performance on a modern machine. If you are lucky it will not slow you down, but if the code can be auto-vectorized (or you are vectorizing manually), you are looking at least at a 2x slowdown.
 
 An algorithm like Kahan summation must be implemented in a way 
 that takes those optimizations into account.
I.e., do not try to implement this at all with the built-in floating-point types. It's impossible.
 This is exactly like in C++, signed integer overflow is 
 undefined, not because it's undefined on the hardware, but 
 because it allows for optimizations.
If you have to resort to invoking insane C++ precedent in order to defend a point, you have lost the debate. Anyway, it is not at all the same (triggered by overflow vs triggered by default, undefined behavior vs wrong result), and also, in D, signed overflow is actually defined behavior.
 D could easily add specific functions to `core.math` that 
 specify operations as specifically IEEE-754 confirming. Using 
 those, Phobos could give you types that are specified to 
 produce results as specified by IEEE-754, with no interference 
 by the optimizer.
It does not do that. Anyway, I would expect that to go to std.numeric.
 You can't actually do the reverse, i.e. provide a type in 
 Phobos that allows for optimizations of that sort but the 
 core-language types are guaranteed to be unoptimized.
You say "unoptimized", I hear "not broken". Anyway, clearly the default should be the variant with less pitfalls. If you really want to add some sort of flexible-precision data types, why not, but there should be a compiler flag to disable it.
 Such a type would have to be compiler-recognized, i.e. it 
 would end up being a built-in type.
I have no desire at all to suffer from irreproducible behavior because some dependency tried to max out on some irrelevant to me benchmark. I also have no desire at all to suffer from an unnecessary performance penalty just to recover reproducible behavior that is exposed directly by the hardware. Of course, then there's the issue that libc math functions are not fully precise and have differences between implementations, but at least there seems to be some movement on that front, and this is easy to work around given that the built-in operations are sane.
I think you got me wrong here on a key aspect: I’m not trying to argue, but to explain what the D spec means and to outline possible rationales for why it is as it is. For example, my x87 knowledge isn’t “outdated”: x87 co-processors didn’t change. They’re just not practically relevant anymore, but they could have influenced *why* Walter specified D as he did. I totally agree with you that in the context of modern hardware, D’s float spec makes little sense.

Then I tried to outline a compromise between Walter’s numerously stated opinion and the desires of many D community members, including you. I’m somewhere between your side and neither side. I know enough about floating-point arithmetic to avoid it as much as I can, and that’s not due to how languages implement it, but due to how it works in practice. Personally, I wouldn’t even care much if D removed floating-point types entirely. I just see (saw?, years ago) great potential in D and want it to succeed, and I believe having reproducible results would be a big win.

One aspect of immune-to-optimizations types (calling them `float32` and `float64` for now) would be that if you have both a context in which you want good and fast algebraic results (where using fused multiply-add is welcomed) and a context in which reproducibility is required, you could use `double` in the first context and `float64` in the second, while passing `-ffast-math`. Maybe a `pragma` on one kind of those functions is better. I don’t know. They’re not mutually exclusive. LDC already has `@fastmath`, so maybe it can just become official and be part of the D spec?
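A minimal sketch of that last suggestion (function invented; `@fastmath` is LDC’s attribute from `ldc.attributes`): one function opts into aggressive floating-point optimization while the rest of the program keeps strict semantics:

```D
version (LDC)
{
    import ldc.attributes : fastmath;

    @fastmath double dot(const(double)[] a, const(double)[] b)
    {
        double sum = 0.0;
        foreach (i; 0 .. a.length)
            sum += a[i] * b[i];   // may be reassociated and vectorized
        return sum;
    }
}
```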
Aug 19