digitalmars.D - Differences in results when using the same function in CTFE and
- Carsten Schlote (41/41) Aug 08 Hi
- IchorDev (6/8) Aug 08 Which it probably does not. CTFE is meant to be less
- user1234 (13/20) Aug 08 During CTFE _there has to be_ much more round-trips between the
- Carsten Schlote (30/33) Aug 09 Replaced the 'doubles' with 'real' and now the the results of the
- H. S. Teoh (9/14) Aug 08 IIRC DMD either uses in CTFE or emits code that uses 8088 80-bit
- Nicholas Wilson (5/17) Aug 08 there are `toPrec`[1] intrinsics to solve exactly this issue of
- IchorDev (6/28) Aug 10 Can anyone remind me whether there’s a way to force calculations
- Quirin Schroll (11/45) Aug 12 D specifies that floating point calculations may be performed
- IchorDev (7/13) Aug 12 I don’t care. Only one family of CPU architectures supports
- Quirin Schroll (11/24) Aug 13 I honestly don’t know how JRE did implement `double` operations
- Abdulhaq (5/10) Aug 13 For those who haven't really run into numerical analysis before,
- matheus (4/9) Aug 13 Since I can't access that site, could you please share the URL
- Abdulhaq (2/12) Aug 13 sure, it's https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf
- IchorDev (19/29) Aug 13 Probably in software, but modern x86 CPUs can just use the
- Timon Gehr (8/21) Aug 12 The specification allows behavior like that even at runtime.
- Abdulhaq (14/19) Aug 13 It's worth mentioning that in the wider field, it is generally
- Timon Gehr (13/22) Aug 13 This is not true at the level of hardware, where results are actually
- Abdulhaq (13/20) Aug 13 It's an interesting discussion, but we are where we are, and I
- Timon Gehr (28/53) Aug 13 Well, I think one thing is where we are now, another question is where
- claptrap (14/21) Aug 13 If you have a pristine 24 bit audio sample, maybe 120db SnR due
- Timon Gehr (12/38) Aug 13 It is fully true. "A means B" holds when in any situation where A holds,...
- claptrap (6/29) Aug 14 I'd argue "does not mean" is vague, and most people would read
- Dom DiSc (6/8) Aug 15 But if you do cross compiling, that may simply be not possible
- Timon Gehr (9/18) Aug 15 It's still possible, but it's not even what I asked for. Also,
- Timon Gehr (13/14) Aug 15 Personally I mostly care about reproducibility, but also, as I said
- claptrap (5/13) Aug 15 So if have some CTFE function that doesn't a bunch of
- Carsten Schlote (10/15) Aug 15 I would consider it a bug.
- Timon Gehr (3/14) Aug 15 The D spec allows this to not be true too. I consider it a weakness of
- Abdulhaq (20/27) Aug 15 I'd say that whether it's a bug or not comes down to the
- Abdulhaq (20/21) Aug 15 To clarify a bit more, I'm not just talking about single isolated
- Timon Gehr (12/38) Aug 15 No.
- Timon Gehr (10/13) Aug 15 It's you who calls it "errors". Someone else may just call it "results".
- Abdulhaq (23/39) Aug 15 Here by "error" I mean where the result is different to a known
- Quirin Schroll (65/104) Aug 17 Experiences, little, as I'm not doing floating-point stuff
- Timon Gehr (48/66) Aug 18 The normal use case for floating-point isn't perfectly reproducible
- Quirin Schroll (29/115) Aug 19 I think you got me wrong here on a key aspect: I’m not trying to
Hi

I'm playing with CTFE in D. This feature allows for a lot of funny things, e.g. initialisation of immutable data at compile time with the result of some other function (template). As a result I get immutable result blobs compiled into the binary, but none of the generating code, because it was already executed by CTFE. This worked nicely for several other use cases as well.

So far the results of CTFE and RT were always the same. As expected. However, yesterday a unit test started to report that the results created by the same code with the same parameters differ when run in CTFE mode or at runtime.

```D
static immutable ubyte[] burningShipImageCTFE = generateBurningShipImage(twidth, theight, maxIter);
immutable ubyte[] burningShipImageRT = generateBurningShipImage(twidth, theight, maxIter);
assert(burningShipImageCTFE == burningShipImageRT, "Same results expected.");
```

I diffed the pictures and indeed some of the pixels in the more complex areas of the BurningShip fractal were clearly and noticeably different.

Ok, the fractal code uses 'double' floats, which are by their very nature limited in precision. But assuming that the math emulation of CTFE works identically to the CPU at runtime, the outcome should be identical. Or not, in some cases ;-) E.g. with a fractal equation where the smallest changes can result in big differences.

And it opens up some questions:

- Can CTFE be used under all circumstances when float numbers of any precision are involved?
- Or is this some kind of expected behaviour whenever floats are involved?
- Is the D CTFE documentation completely covering such possible issues?

I can imagine that bugs caused by such subtle differences might be very difficult to fix.

Any experiences or thoughts on this?

Greetz
Carsten
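For readers who haven't seen it: the Burning Ship iteration is the Mandelbrot iteration with absolute values taken before squaring, which is what makes it so sensitive to tiny rounding differences. A minimal sketch of a per-pixel escape-time loop (the actual `generateBurningShipImage` is not shown in the thread, so the names and the colour mapping here are made up):

```d
// Escape-time iteration for the Burning Ship fractal:
//   z_{n+1} = (|Re z_n| + i*|Im z_n|)^2 + c
// Usable both at runtime and in CTFE.
ubyte burningShipPixel(double cx, double cy, int maxIter)
{
    double zx = 0.0, zy = 0.0;
    int i = 0;
    while (i < maxIter && zx * zx + zy * zy <= 4.0)
    {
        double ax = zx < 0 ? -zx : zx; // abs of the components first
        double ay = zy < 0 ? -zy : zy;
        double tmp = ax * ax - ay * ay + cx;
        zy = 2.0 * ax * ay + cy;
        zx = tmp;
        ++i;
    }
    return cast(ubyte)(i * 255 / maxIter); // naive grayscale mapping
}

// An enum initializer forces CTFE evaluation of the same function:
enum ctfePixel = burningShipPixel(-1.75, -0.03, 100);
```

Calling the same function at runtime and comparing the two results, as the unittest above does, is exactly what exposed the mismatch.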
Aug 08
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:

> […] assuming that the math emulation of CTFE works identical as the CPU does at runtime, the outcome should be identical.

Which it probably does not. CTFE is meant to be less platform-dependent than runtime code, so it is interpreted rather than running directly on the CPU. This is why there are strict limitations on where CTFE can be invoked. I believe the exact behaviour of CTFE would be implementation-dependent, though.
Aug 08
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:

> Hi [...] I can imagine that bugs caused by such subtle differences might be very difficult to fix. Any experiences or thought on this? Greetz Carsten

During CTFE _there have to be_ many more round-trips between storage and the effective operations, whereas with native code several operations can happen without leaving the FPU... those are done with a different accuracy (80 bits). During the round trips implied by CTFE you have a floating point truncation after each op.

You see, during CTFE just `a + b * c` means two truncations, one after the multiply, the second after the addition (even if this example is not very good, as those ops are done using SSE).

That being said, it'd be interesting to verify if that's the problem. Maybe update your code to use `real` variables and let's see.
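The round-trip effect can be sketched by making the truncation explicit. In the first version below every intermediate result is forced through a `double`, as a CTFE interpreter does per operation; in the second the intermediate stays in `real`, as x87 code may keep it in a register. (Whether the two actually differ depends on the target: on platforms where `real` is just `double`, both give the same answer. Function names here are made up for illustration.)

```d
// Each operation rounded to double, like a CTFE round-trip per op:
double fmaTruncating(double a, double b, double c)
{
    double product = b * c;  // rounded to double here...
    return a + product;      // ...and rounded again here
}

// Intermediate kept at maximum precision, one final rounding:
double fmaWide(double a, double b, double c)
{
    real product = cast(real) b * c;             // extra bits retained
    return cast(double)(cast(real) a + product); // single final rounding
}
```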
Aug 08
On Thursday, 8 August 2024 at 13:31:43 UTC, user1234 wrote:

> That being said, that'd be interesting to verify if it's the problem. Maybe update your code to use `real` variables and let's see.

Replaced the 'doubles' with 'real' and now the results of the CTFE and RT executions are exactly the same. So, as you expected and described, the CTFE lost some precision by splitting the ops and storing the intermediate results back from FPU registers with internal precision to the limited 'double' type somewhere in memory.

However, the other fractals (mandelbrot, julia, newton, lyapunov, ...) work perfectly with the 'double's. I suspect that their default/starting views do not involve enough recursions and iterations in the calculation, so that the missing precision is not important, yet. I will pimp my unittests to catch such precision issues for the other fractals as well, and then fix it by using 'real' types.

So, what are the lessons learned here:

1. When using floats, use the 'real' type - it uses the maximum available FPU precision and no precision is lost when the FPU register is stored/read back to/from memory. So CTFE and RT execution have the best chance to produce the same results.
2. Use 'float' and 'double' only if you need to save some bits of storage and if the reduced precision is acceptable.
3. Unittests proved again to be great. Especially when support for unittesting is built into the language. Even the simplest assumptions can be wrong, and having assertions for such assumptions will be helpful.
4. Use assert()/enforce(), in/out/do to check everything to match your expectations and assumptions. At the end, `assert(burningShipImageCTFE == burningShipImageRT, "Same results expected.");` in my unittest did find it. And I got a chance to think about it and work on my assumptions ;-)
Aug 09
On Thu, Aug 08, 2024 at 10:31:32AM +0000, Carsten Schlote via Digitalmars-d wrote:
[...]

> Ok, the fractal code uses 'double' floats, which are by their very nature limited in precision. But assuming that the math emulation of CTFE works identical as the CPU does at runtime, the outcome should be identical.

IIRC DMD either uses in CTFE, or emits code that uses, 8087 80-bit extended precision in intermediate results. Which, obviously, will produce different results than the same computation performed using only doubles.

[...]

> - Or is this some kind of expected behaviour whenever floats are involved?

https://floating-point-gui.de/

--T
Aug 08
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:

> - Can CTFE be used under all circumstances when float numbers of any precision are involved?
> - Or is this some kind of expected behaviour whenever floats are involved?
> - Is the D CTFE documentation completely covering such possible issues?
>
> I can imagine that bugs caused by such subtle differences might be very difficult to fix. Any experiences or thought on this?

There are `toPrec`[1] intrinsics to solve exactly this issue of lack of truncation of precision.

[1]: https://github.com/dlang/dmd/blob/958ba9cbe3583830efe39505238939daa0dbd64c/druntime/src/core/math.d#L198-L214
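A sketch of how the intrinsic might be applied to the `a + b * c` example from earlier in the thread: `toPrec!T` rounds the value to the precision of `T` at that point, in CTFE and at runtime alike, so both agree even on targets that would otherwise keep intermediates at 80-bit precision. (The wrapper function here is made up for illustration.)

```d
import core.math : toPrec;

double fmaRounded(double a, double b, double c)
{
    // Force a genuine rounding to double after each operation,
    // regardless of the precision the target FPU works at.
    double p = toPrec!double(b * c);
    return toPrec!double(a + p);
}

enum ctfeResult = fmaRounded(1.0, 3.0, 4.0); // evaluated in CTFE
static assert(ctfeResult == 13.0);           // exact for small integers
```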
Aug 08
On Friday, 9 August 2024 at 00:46:12 UTC, Nicholas Wilson wrote:

> there are `toPrec`[1] intrinsics to solve exactly this issue of lack of truncation of precision.
>
> [1]: https://github.com/dlang/dmd/blob/958ba9cbe3583830efe39505238939daa0dbd64c/druntime/src/core/math.d#L198-L214

Can anyone remind me whether there’s a way to force calculations to be performed with a certain degree of precision (e.g. single or double) instead of rounding down from the largest floats available? Would be really useful for cross-platform consistency. :’|
Aug 10
On Saturday, 10 August 2024 at 11:15:19 UTC, IchorDev wrote:

> Can anyone remind me whether there’s a way to force calculations to be performed with a certain degree of precision (e.g. single or double) instead of rounding down from the largest floats available? Would be really useful for cross-platform consistency. :’|

D specifies that floating point calculations may be performed with higher precision than the type suggests:

> Execution of floating-point expressions may yield a result of greater precision than dictated by the source.

— [*Floating-Point Intermediate Values*](https://dlang.org/spec/float.html#fp_intermediate_values), D Language Specification

On almost all non-embedded CPUs, doing non-vector calculations in `float` is more costly than doing them in `double` or `real` because for single arguments, the floats are converted to `double` or `real`. I consider `float` to be a type used for storing values in arrays that don’t need the precision and save me half the RAM.
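The "float as a storage format" view can be sketched like this (hypothetical helper; in D the `float * double` expression is promoted to `double` before the narrowing cast, so computation happens at double precision while the arrays stay half-size):

```d
// Store in float to halve the memory; compute at double precision.
float[] scaleAll(const(float)[] data, double factor)
{
    auto result = new float[](data.length);
    foreach (i, x; data)
        result[i] = cast(float)(x * factor); // widened to double, narrowed once
    return result;
}
```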
Aug 12
On Monday, 12 August 2024 at 10:22:52 UTC, Quirin Schroll wrote:

> On almost all non-embedded CPUs, doing non-vector calculations in `float` is more costly than doing them in `double` or `real` because for single-arguments, the floats are converted to `double` or `real`. I consider `float` to be a type used for storing values in arrays that don’t need the precision and save me half the RAM.

I don’t care. Only one family of CPU architectures supports ‘extended precision’ floats (because it’s a waste of time), so I would like to know a way to always perform calculations with double precision for better cross-platform consistency. Imagine trying to implement a JRE without being able to do native double-precision maths.
Aug 12
On Monday, 12 August 2024 at 11:06:15 UTC, IchorDev wrote:

> Only one family of CPU architectures supports ‘extended precision’ floats (because it’s a waste of time), so I would like to know a way to always perform calculations with double precision for better cross-platform consistency. [...]

I honestly don’t know how the JRE did implement `double` operations on e.g. the Intel 80486, but if I try using `gcc -mfpmath=387 -O3` and add some `double` values, intermediate results are stored and loaded again. Very inefficient. If I use `long double`, that does not happen.

The assertion that only one CPU family supports extended floats is objectively wrong. You probably meant the x86 with its 80-bit format, which is still noteworthy, as x86 is very, very common. However, at least the POWER9 family supports 128-bit IEEE-754 quadruple-precision floats. IIUC, RISC-V also supports them.
Aug 13
On Tuesday, 13 August 2024 at 11:02:07 UTC, Quirin Schroll wrote:

> I honestly don’t know how JRE did implement `double` operations on e.g. the Intel 80486, but if I try using `gcc -mfpmath=387 -O3` and add some `double` values, intermediate results are stored and loaded again. Very inefficient. If I use `long double`, that does not happen.

For those who haven't really run into numerical analysis before, this paper is worth reading. I've given a link to the HN discussion around it: https://news.ycombinator.com/item?id=37028310
Aug 13
On Tuesday, 13 August 2024 at 11:14:35 UTC, Abdulhaq wrote:

> ... For those who haven't really run into numerical analysis before, this paper is worth reading. I've given a link to the HN discussion around it: https://news.ycombinator.com/item?id=37028310

Since I can't access that site, could you please share the URL directly to this paper?

Matheus.
Aug 13
On Tuesday, 13 August 2024 at 14:20:29 UTC, matheus wrote:

> Since I can't access that site, could you please share the URL directly to this paper?

Sure, it's https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf
Aug 13
On Tuesday, 13 August 2024 at 11:02:07 UTC, Quirin Schroll wrote:

> I honestly don’t know how JRE did implement `double` operations on e.g. the Intel 80486

Probably in software, but modern x86 CPUs can just use the hardware; so the difference isn’t so meaningful anymore.

> if I try using `gcc -mfpmath=387 -O3` and add some `double` values, intermediate results are stored and loaded again. Very inefficient. If I use `long double`, that does not happen.

Who cares? In a situation where we must reach the same result on every platform (cross-platform determinism) the performance can suffer a bit. You are just avoiding my question by making excuses. Do you make sure your program’s data is passed around exclusively via data registers? Do you only write branchless code? Does your entire program fit inside the CPU cache? No, because we make performance sacrifices to achieve the desired outcome. The idea that the only valid way of coding something is the way that compromises on the integrity of the output in favour of performance is a step away from programming nihilism.

> You probably meant the x86 with its 80-bit format, which is still noteworthy

Yes, because it’s referred to as ‘extended precision’ and doesn’t have a proper name, because it’s an unstandardised atrocity. https://en.wikipedia.org/wiki/Extended_precision#x86_extended_precision_format

> However, at least the POWER9 family supports 128-bit IEEE-754 quadruple-precision floats. IIUC, RISC-V also supports them.

binary128 is obviously not the same as x86’s ‘extended precision’. You will not get cross-platform deterministic results from using them interchangeably; you will get misery.
Aug 13
On 8/8/24 12:31, Carsten Schlote wrote:

> And it opens up some questions:
>
> - Can CTFE be used under all circumstances when float numbers of any precision are involved?
> - Or is this some kind of expected behaviour whenever floats are involved?
> - Is the D CTFE documentation completely covering such possible issues?
>
> I can imagine that bugs caused by such subtle differences might be very difficult to fix. Any experiences or thought on this?

The specification allows behavior like that even at runtime. https://dlang.org/spec/float.html

I have been vocally against this. Full portability is perhaps one thing, as hardware differences do exist, but completely thwarting expectations by deliberately doing a different operation than the one you requested, I think, makes no sense at all. A lot of modern hardware nowadays is compatible or almost compatible regarding floats.
Aug 12
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:

> Ok, the fractal code uses 'double' floats, which are by their very nature limited in precision. [...] Any experiences or thought on this?

It's worth mentioning that in the wider field, it is generally held that when working with floating point operations we don't expect to see the same duplicate results every time. It is a mistake to compare for equality; you should be checking for "almost equals". For your particular application it might feel like a "bug" to see different results, but in fact this is expected behaviour and generally not held to be a fault.

One little piece of anecdata: when working with python I saw differing results in the 17th d.p. when running the exact same calculation in the same python session. The reason turned out to be which registers were being used each time, which could vary. This is not considered to be a bug.
Aug 13
On 8/13/24 09:56, Abdulhaq wrote:

> It's worth mentioning that in the wider field, it is generally held that when working with floating point operations we don't expect to see the same duplicate results every time.

This is not true at the level of hardware, where results are actually reproducible on the same processor architecture (and often even across architectures) with the same sequence of instructions. Treating floats as "fuzzy" is perhaps common, but it is not correct. Rounding is a pure function like any other.

> It is a mistake to compare for equality, you should be checking for "almost equals".

That is a different point.

> For your particular application it might feel like a "bug" to see different results, but in fact this is expected behaviour and generally not held to be a fault. ...

It's annoying anyway, and we should hold programming languages to a higher standard than that.

> One little piece of anecdata, when working with python I saw differing results in the 17th d.p. when running the exact same calculation in the same python session. The reason turned out to be which registers were being used each time, which could vary. This is not considered to be a bug.

Well, I think the x87 is a bad design as a compilation target. If you are targeting that with a compiler whose developers by default do not care about reproducibility, expect weird behavior. These instructions are now deprecated, so the calculation should change a bit.
Aug 13
On Tuesday, 13 August 2024 at 08:33:51 UTC, Timon Gehr wrote:

> It's annoying anyway, and we should hold programming languages to a higher standard than that.
>
> Well, I think the x87 is a bad design as a compilation target. If you are targeting that with a compiler whose developers by default do not care about reproducibility, expect weird behavior. These instructions are now deprecated, so the calculation should change a bit.

It's an interesting discussion, but we are where we are, and I think you'd agree that the OP should understand that in the current world of "systems" programming languages, this behaviour would not be viewed as a "bug".

I'm only being somewhat tongue-in-cheek when I say that his application, viewing fractals, could be considered a kind of "chaos amplifier" that is especially prone to these sorts of issues.

I'm going to guess that the situation we're in is due to a preference for speed over reproducibility in the days when computers ran much more slowly than they do now. Would we choose a different priority these days? I don't know.
Aug 13
On 8/13/24 13:09, Abdulhaq wrote:

> It's an interesting discussion, but we are where we are, and I think you'd agree that the OP should understand that in the current world of "systems" programming languages, this behaviour would not be viewed as a "bug". ...

Well, I think one thing is where we are now; another question is where we should be in the future.

> I'm only being somewhat tongue-in-cheek when I say that his application, viewing fractals, could be considered a kind of "chaos amplifier" that is especially prone to these sorts of issues. ...

Not really. I think it is valid to expect reproducible results for addition, absolute value, and squaring, no matter how many times you repeat them. The reason why this issue happened is that D does not even try a little bit, and explicitly just goes ahead and uses the wrong data type, namely the 80-bit one only supported by the x87.

> I'm going to guess that the situation we're in is due to preference of speed over reproduceability in the days when computers ran much more slowly than they do now.

Ironically, the x87 design is mostly about getting accurate numerics. It is a rather inefficient design. One issue is that it was designed for assembly programmers, not C compilers; C language features do not map well to it. x87 results are reproducible in assembly, as it is clear where rounding happens. It is true that computers used to run more slowly, and the x87 is some part of the reason why that was true. There are many reasons why the design was ditched; efficiency is one of them.

> Would we choose a different priority these days? I don't know.

The x87 results in both slower code and worse reproducibility when used as a target for `float`/`double`/`real` computations. Ironically, IIRC Kahan was involved in the x87 design, yet algorithms attributed to him, like Kahan summation, can actually not be implemented correctly using D floating-point types, because of the insane rules introduced to support targeting the x87 without changing rounding precision or spilling data to the cache and reloading it.

At the same time, "better accuracy" is part of the reason why CTFE uses higher floating-point precision, but higher precision does not mean the result will be more accurate. Unless arbitrary precision is used, which is even slower, it leads to double-rounding issues which can cause the result to be less accurate.
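For reference, a textbook Kahan summation sketch. The correction step only works if `(t - sum) - y` is rounded exactly at `double` precision at each step, which is precisely the guarantee D's intermediate-precision rule withholds:

```d
// Compensated (Kahan) summation: recovers the low-order bits that
// plain left-to-right summation discards at each addition.
double kahanSum(const(double)[] xs)
{
    double sum = 0.0;
    double c = 0.0; // running compensation for lost low-order bits
    foreach (x; xs)
    {
        double y = x - c;
        double t = sum + y;
        c = (t - sum) - y; // algebraically zero; captures the rounding error
        sum = t;
    }
    return sum;
}
```

If the compiler silently evaluates `(t - sum) - y` at 80-bit precision, or constant-folds parts of it, `c` no longer captures the double-precision rounding error and the compensation is lost, which is the point being made above.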
Aug 13
On Tuesday, 13 August 2024 at 21:03:00 UTC, Timon Gehr wrote:

> At the same time, "better accuracy" is part of the reason why CTFE uses higher floating-point precision, but higher precision does not mean the result will be more accurate. Unless arbitrary precision is used, which is even slower, it leads to double-rounding issues which can cause the result to be less accurate.

If you have a pristine 24-bit audio sample, maybe 120 dB SnR due to DAC limitations, processing it in 16 bit will automatically lose you 4 bits of accuracy; it'll drop the SnR to 96 dB. If you process it at 32 bits you still have 120 dB SnR: greater precision but the same accuracy as the source.

The point is that the statement "higher precision does not mean the result will be more accurate" is only half true. If the precision you are doing the calculations at is already higher than the accuracy of your data, more precision won't get you much of anything. But if the converse is true, if you are processing the data at lower precision than the accuracy of your source data, then increasing precision will absolutely increase accuracy.
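The dB figures here follow from the usual quantization rule of thumb of roughly 6.02 dB per bit, which a one-liner can check (helper name made up for illustration):

```d
import std.math : log10;

// Theoretical best-case SNR of an n-bit quantizer: 20*log10(2^n) ≈ 6.02n dB.
double quantizationSnrDb(int bits)
{
    return 20.0 * log10(2.0 ^^ bits);
}
```

`quantizationSnrDb(16)` gives roughly 96.3 dB, matching the figure above; a 24-bit format allows about 144 dB, so a 120 dB source is limited by the converter, not by the format.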
Aug 13
On 8/14/24 00:35, claptrap wrote:

> The point is the statement "higher precision does not mean the result will be more accurate." is only half true. ...

It is fully true. "A means B" holds when in any situation where A holds, B also holds. "A does not mean B" holds when there is a situation where A is true but B is false. Clearly I am not saying that higher accuracy implies lower precision, just that there are cases where lower precision gives you higher accuracy.

> If the precision you are doing the calculations at is already higher than the accuracy of your data, more precision wont get you much of anything. ...

It may even break some algorithms, especially when it is inconsistently and implicitly applied, which is the case in D.

> but if the converse is true, if you are processing the data at lower precision than the accuracy of your source data, then increasing precision will absolutely increase accuracy.

Usually true, but I still want to decide on my own which computation will be rounded to what precision. Even an accidentally avoided accuracy issue resulting from non-guaranteed implicit higher precision is a ticking time bomb and very annoying to debug when it goes off.
Aug 13
On Wednesday, 14 August 2024 at 02:35:08 UTC, Timon Gehr wrote:

> It is fully true. "A means B" holds when in any situation where A holds, B also holds. "A does not mean B" holds when there is a situation where A is true but B is false.

I'd argue "does not mean" is vague, and most people would read what you wrote as "A != B in any situation." But otherwise I agree, based on what you meant.

> Usually true, but I still want to decide on my own which computation will be rounded to what precision. Even an accidentally avoided accuracy issue resulting from not guaranteed implicit higher precision is a ticking time bomb and very annoying to debug when it goes off.

I agree the compiler should actually use the float precision you explicitly ask for.
Aug 14
On Wednesday, 14 August 2024 at 07:54:15 UTC, claptrap wrote:

> I agree the compiler should actually use the float precision you explicitly ask for.

But if you do cross compiling, that may simply be not possible (because the target has a different hardware implementation than the hardware the compiler runs on). Also, relying on specific inaccuracy of FP calculations is very bad design.
Aug 15
On 8/15/24 12:25, Dom DiSc wrote:

> But if you do cross compiling, that may simply be not possible (because the target has different hardware implementation than the hardware the compiler runs on). ...

It's still possible, but it's not even what I asked for. Also, personally I am not targeting any machine whose floating-point unit is sufficiently incompatible to cause a difference on this specific code. There has been a convergence of how floats work. D differs gratuitously.

> Also relying on specific inaccuracy of FP calculations is very bad design.

The point is not "relying on specific inaccuracy", the point is reproducibility. The specifics of the inaccuracy usually do not matter at all; what matters a great deal sometimes is that you get the same result when you run the same computation.
Aug 15
On 8/15/24 12:25, Dom DiSc wrote:

> Also relying on specific inaccuracy of FP calculations is very bad design.

Personally I mostly care about reproducibility, but also, as I said earlier, accuracy-improving algorithms like Kahan summation or general double-double computations very much "rely on specific inaccuracy". It's not bad design; it may just be that you are not sufficiently informed about this kind of thing. Kahan summation is even in Phobos: https://github.com/dlang/phobos/blob/master/std/algorithm/iteration.d#L7558-L7572

This implementation is however technically incorrect, because D could choose a bad subset of the computations and run only those at "higher" precision, destroying the overall accuracy.

It gets more obvious if you drop the judgmental tone and just call "specific inaccuracy" by its proper name: "correct rounding".
Aug 15
On Thursday, 15 August 2024 at 10:25:45 UTC, Dom DiSc wrote:

> But if you do cross compiling, that may simply be not possible (because the target has different hardware implementation than the hardware the compiler runs on).

So if you have some CTFE function that does a bunch of calculations in double precision and the compiler is running on a platform that only supports float, what happens?

> Also relying on specific inaccuracy of FP calculations is very bad design.

It's not relying on them, it's accounting for them.
Aug 15
On Tuesday, 13 August 2024 at 07:56:34 UTC, Abdulhaq wrote:

> One little piece of anecdata, when working with python I saw differing results in the 17th d.p. when running the exact same calculation in the same python session. The reason turned out to be which registers were being used each time, which could vary. This is not considered to be a bug.

I would consider it a bug. Running the same piece of code multiple times with exactly the same parameters should result in exactly the same results.

But once I also expected that CTFE and real execution of some code should always yield the same results. This assumption is wrong, as I learned recently. But using 'real' as the type usually leads to the same result. All the other issues with floats of any size are a separate topic.
Aug 15
On 8/15/24 11:13, Carsten Schlote wrote:On Tuesday, 13 August 2024 at 07:56:34 UTC, Abdulhaq wrote:The D spec allows this to not be true too. I consider it a weakness of the spec.One little piece of anecdata, when working with python I saw differing results in the 17th d.p. when running the exact same calculation in the same python session. The reason turned out to be which registers were being used each time, which could vary. This is not considered to be a bug.I would consider it a bug. Running the same piece of code multiple times with exactly the same parameters should result into the exactly same results.
Aug 15
On Thursday, 15 August 2024 at 09:13:31 UTC, Carsten Schlote wrote:On Tuesday, 13 August 2024 at 07:56:34 UTC, Abdulhaq wrote:I'd say that whether it's a bug or not comes down to the definition of "bug", and I don't think a discussion of that is useful, and also it's subjective and reasonable people can differ on it. But I would say that, in my python example, both answers were wrong, in the sense that they were not exactly right (and can't be), but they were both wrong in a different way. In your example I'd ask you to consider this: all the images you created, except the trivial ones, were wrong, in the sense that some of the calculations were inexact and some of them had sufficient error to change the color of the pixel. It's just that the CTFE image was wrong in a different way to the non-CTFE image. Why do you want to reproduce the errors? In general floating point calculations are inexact, and we shouldn't (generalising here) expect to get the same error across different platforms. In this specific case you could argue that it's the "same platform", I get that.One little piece of anecdata, when working with python I saw differing results in the 17th d.p. when running the exact same calculation in the same python session. The reason turned out to be which registers were being used each time, which could vary. This is not considered to be a bug.I would consider it a bug.
Aug 15
On Thursday, 15 August 2024 at 16:21:35 UTC, Abdulhaq wrote:On Thursday, 15 August 2024 at 09:13:31 UTC, Carsten SchloteTo clarify a bit more, I'm not just talking about single isolated computations, I'm talking about e.g. matrix multiplication. Different compilers, even LDC vs DMD for example, could optimise the calculation in a different way, loop unrolling, step elimination etc. Even if the rounding algorithms etc. at the chip level are the same, the way the code is compiled and calculations sequenced will change the error in the final answer. Then, variations in pipelining and caching at the processor level could also affect the answer. And if you move on to different computing paradigms such as quantum computing and other as yet undiscovered techniques, again the way operations and rounding etc. is compounded will cause divergences in computations. Now, we could insist that we somehow legislate for the way compound calculations are conducted. But that would cripple the speed of calculations for some processor architecture / paradigms for a goal (reproducibility) which is worthy, but for 99% of usages not sufficiently beneficial to pay the big price in performance.
Aug 15
On 8/15/24 18:50, Abdulhaq wrote:On Thursday, 15 August 2024 at 16:21:35 UTC, Abdulhaq wrote:LDC disables -ffast-math by default.On Thursday, 15 August 2024 at 09:13:31 UTC, Carsten SchloteTo clarify a bit more, I'm not just talking about single isolated computations, I'm talking about e.g. matrix multiplication. Different compilers, even LDC vs DMD for example, could optimise the calculation in a different way, loop unrolling, step elimination etc. Even if the rounding algorithms etc. at the chip level are the same, the way the code is compiled and calculations sequenced will change the error in the final answer. ...Then, variations in pipelining and caching at the processor level could also affect the answer. ...No.And if you move on to different computing paradigms such as quantum computing and other as yet undiscovered techniques, again the way operations and rounding etc. is compounded will cause divergences in computations. ...Yes, if you move on to an analog computing paradigm with imperfect error correction, full reproducibility will go out the window. Floating point is not that though.Now, we could insist that we somehow legislate for the way compound calculations are conducted. But that would cripple the speed of calculations for some processor architecture / paradigms for a goal (reproducibility) which is worthy, but for 99% of usages not sufficiently beneficial to pay the big price in performance.It's really not that expensive, changing the result via optimizations is disabled by default in LDC, and actually, how do you know that the compiler does not pessimize your hand-optimized compound operations? I am not even against people being able to pass -ffast-math, it should just not destroy the correctness and reproducibility of everyone else's computations.
Aug 15
On 8/15/24 18:21, Abdulhaq wrote:Why do you want to reproduce the errors?It's you who calls it "errors". Someone else may just call it "results". Science. Determinism, e.g. blockchain or other deterministic lock-step networking. etc. It's not hard to come up with use cases. Sometimes what exact result you get is not very important, but it is important that you get the same one.In general floating point calculations are inexact, and we shouldn't (generalising here) expect to get the same error across different platforms.It's just not true that floating point calculations are somehow "inexact". You are confusing digital and analog computing. This is still digital computing. Rounding is a deterministic, exact function, like any other.
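The exactness Timon refers to is strong enough that the rounding error of an addition can itself be computed exactly: the classic twoSum error-free transformation, which underlies the double-double arithmetic mentioned earlier in the thread. Note that, per this thread's topic, an optimizer keeping intermediates at higher precision would break it:

```d
// twoSum: for correctly rounded IEEE-754 doubles, a + b == s + e holds
// *exactly*, where s is the rounded sum and e the rounding error.
void twoSum(double a, double b, out double s, out double e)
{
    s = a + b;
    immutable bv = s - a;
    e = (a - (s - bv)) + (b - bv);
}

void main()
{
    double s, e;
    twoSum(1.0, 1e-16, s, e);
    assert(s == 1.0);   // 1.0 + 1e-16 rounds back to 1.0...
    assert(e == 1e-16); // ...and e recovers the discarded part exactly
}
```

That `e` comes out bit-exact is a direct consequence of rounding being a deterministic, exact function.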
Aug 15
On Thursday, 15 August 2024 at 16:51:51 UTC, Timon Gehr wrote:On 8/15/24 18:21, Abdulhaq wrote:Here by "error" I mean where the result is different to a known correct answer. It's a simple matter to propose a method for producing a fractal plot, and to have a notional "correct" plot if calculated to infinite precision. Then, at sufficient depth of an e.g. mandelbrot plot, at a finite precision, some pixels will differ from the ideal plot produced by the machine of infinite precision. These divergences are, by my definition, errors. Yes, it's all results, and some results are errors.Why do you want to reproduce the errors?It's you who calls it "errors". Someone else may just call it "results".Science. Determinism, e.g. blockchain or other deterministic lock-step networking. etc. It's not hard to come up with use cases. Sometimes what exact result you get is not very important, but it is important that you get the same one.This is why I was careful to say that I was generalising. Of course we can come up with examples where reproducibility is an essential requirement. In this case we can often work out what precision is required to achieve said reproducibility.Fair point. I meant to refer to floating point numbers rather than calculations. There are an infinite number of real numbers that cannot be exactly represented by a floating point number of finite precision. The floating point representations of those numbers are inexact. If we calculate 1/3 using floating point, the result can be exactly correct in the sense there is a correct answer in the world of floating point, but the calculated result will be an inexact representation of the correct number, which is what I had in mind.In general floating point calculations are inexact, and we shouldn't (generalising here) expect to get the same error across different platforms.It's just not true that floating point calculations are somehow "inexact". You are confusing digital and analog computing. 
This is still digital computing. Rounding is a deterministic, exact function, like any other.
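The representation error described here is easy to demonstrate; only values that are exactly representable in binary (e.g. modest multiples of powers of two) escape it:

```d
void main()
{
    // 0.1, 0.2 and 0.3 are all inexact in binary, and their errors do
    // not cancel:
    assert(0.1 + 0.2 != 0.3);
    // Powers of two and their small multiples are exact:
    assert(0.25 + 0.5 == 0.75);
    // 1.0/3.0 is the correctly rounded double nearest 1/3, not 1/3
    // itself -- yet the product happens to round back to 1.0 exactly
    // (via round-to-even on an exact tie):
    assert(3.0 * (1.0 / 3.0) == 1.0);
}
```

Each individual result above is nonetheless fully deterministic, which is Timon's point: "inexact" representation does not mean nondeterministic computation.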
Aug 15
On Thursday, 8 August 2024 at 10:31:32 UTC, Carsten Schlote wrote:Hi I'm playing with CTFE in D. This feature allows for a lot of funny things, e.g. initialisation of immutable data at compile time with the result of some other function (template). As a result I get immutable result blobs compiled into the binary. But none of the generating code, because it was already executed by CTFE. This worked nicely for several other use cases as well.For now the results of CTFE and RT were always the same. As expected. However, yesterday a unit-test started to report that the results created by the same code with the same parameters differ when run in CTFE mode or at runtime.

```D
static immutable ubyte[] burningShipImageCTFE = generateBurningShipImage(twidth, theight, maxIter);
immutable ubyte[] burningShipImageRT = generateBurningShipImage(twidth, theight, maxIter);
assert(burningShipImageCTFE == burningShipImageRT, "Same results expected.");
```

I diffed the pictures and indeed some of the pixels in the more complex areas of the BurningShip fractal were clearly and notably different. Ok, the fractal code uses 'double' floats, which are by their very nature limited in precision. But assuming that the math emulation of CTFE works identically to what the CPU does at runtime, the outcome should be identical. Or not, in some cases ;-) E.g. with a fractal equation where the smallest changes can result in big differences. And it opens up some questions:
- Can CTFE be used under all circumstances when float numbers of any precision are involved?
- Or is this some kind of expected behaviour whenever floats are involved?
- Is the D CTFE documentation completely covering such possible issues?
I can imagine that bugs caused by such subtle differences might be very difficult to fix. 
Any experiences or thoughts on this?Experiences, little, as I'm not doing floating-point stuff professionally, but I know my stuff because in the past, for some years, I did the lecture assistance for a numerical programming course. The normal use case for floating-point isn't perfectly reproducible results between different optimization levels. However, differences between CTFE and RT are indeed unacceptable for core-language operations. Those are bugs. Of course, results in user code can differ between CTFE and RT due to using `__ctfe` incorrectly. It might be noteworthy that C++ (at least up to and including C++17) does not allow floating-point types in CTFE (i.e. in `constexpr` execution) and my suspicion is that this is the reason. Maybe the solution is the same: Remove floating-point operations from CTFE, or at least the ones that could differ from RT. It would be awkward, at least in some people's opinion, because that would mean that in CTFE, only `real` is available, despite it being implementation defined (it's not just platform dependent, it's also how expressions are interpreted), while `double` and `float` seem to be exactly defined. The reason is that `real` can't be optimized to higher precision, as it's by definition the highest-precision format supported by the platform, whereas for the smaller formats, RT results may differ between optimization levels. What the compiler could do, however, is replace `a + b * c` by a fused multiply-add. If it does that consistently across CTFE and RT, as I read the spec, it would be allowed to do that, even for `real`. The reason D doesn't specify floating-point operations that precisely is to allow for optimizations. Generally speaking, optimizations require some leeway in the spec. Optimizations are also required not to change observable behavior, but what counts as that is again up to the spec. 
In C++, the compiler is allowed to optimize away copies, even if those would invoke a copy constructor that has observable side-effects. As I see it (not my opinion, just what I see and conclude), D specifies that differences in floating-point results due to operations being carried out in higher-than-required precision are not an observable side-effect, i.e. not one that the optimizer must preserve, even if you can practically observe a difference. The reason for that is probably because Walter didn't like that other languages nailed down floating-point operations so that you'd get both less precise results *and* worse performance. That would for example be the case on an 80387 coprocessor, and (here's where my knowledge ends) probably also true for basically all hardware today if you consider `float` specifically. I know of no hardware that supports single precision but not double precision. Giving you double precision instead of single is at least basically free and possibly even a performance boost, while also giving you more precision. An algorithm like Kahan summation must be implemented in a way that takes those optimizations into account. This is exactly like in C++, where signed integer overflow is undefined, not because it's undefined on the hardware, but because it allows for optimizations. In Zig, all integer overflow is undefined for that reason, and for wrap-around or saturated arithmetic, there are separate operators. D could easily add specific functions to `core.math` that specify operations as specifically IEEE-754 conforming. Using those, Phobos could give you types that are specified to produce results as specified by IEEE-754, with no interference by the optimizer. You can't actually do the reverse, i.e. provide a type in Phobos that allows for optimizations of that sort but the core-language types are guaranteed to be unoptimized. Such a type would have to be compiler-recognized, i.e. it would end up being a built-in type.
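Part of what Quirin suggests exists already: druntime's `core.math` has `toPrec`, which forces an intermediate result to be rounded to a named precision. A sketch of how an IEEE-754-pinned operation could be built on it (`dotStrict` is a hypothetical name, not a Phobos API):

```d
import core.math : toPrec;

// Dot product with every intermediate result explicitly rounded to
// double precision, so an optimizer may not keep the accumulator in a
// wider register between steps.
double dotStrict(const double[] a, const double[] b)
{
    assert(a.length == b.length);
    double s = 0.0;
    foreach (i; 0 .. a.length)
        s = toPrec!double(s + toPrec!double(a[i] * b[i]));
    return s;
}

void main()
{
    // Small exactly representable values, so the result is exact:
    assert(dotStrict([1.0, 2.0], [3.0, 4.0]) == 11.0);
}
```

Wrapping such calls in a Phobos type, as suggested above, would then give reproducible results without touching the built-in `double` semantics.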
Aug 17
On 8/17/24 18:33, Quirin Schroll wrote:The normal use case for floating-point isn't perfectly reproducible results between different optimization levels. I would imagine the vast majority of FLOPs nowadays are used in HPC and AI workloads. Reproducibility is at least a plus, particularly in a research context.However, differences between CTFE and RT are indeed unacceptable for core-language operations. Those are bugs.No, they are not bugs, it's just the same kind of badly designed specification. According to the specification, you can get differences between RT and RT when running the exact same function. Of course you will get differences between CTFE and RT.The reason for that is probably because Walter didn't like that other languages nailed down floating-point operationsProbably. C famously nails down floating-point operations, just like it nails down all the other types. D is really well-known for all of its unportable built-in data types, because Walter really does not like nailing things down and this is not one of D's selling points. /s Anyway, at least LDC is sane on this at runtime by default. Otherwise I would have to switch language for use cases involving floating point, which would probably just make me abandon D in the long run.so that you'd get both less precise results *and* worse performance.Imagine just manually using the data type that is most suitable for your use case.That would for example be the case on an 80387 coprocessor, and (here's where my knowledge ends)Then your knowledge may be rather out of date. I get the x87 shenanigans, but that's just not very relevant anymore. I am not targeting 32-bit x86 with anything nowadays.probably also true for basically all hardware today if you consider `float` specifically. I know of no hardware that supports single precision but not double precision. 
Giving you double precision instead of single is at least basically free and possibly even a performance boost, while also giving you more precision.It's nonsense. If I want double, I ask for double. Also, it's definitely not true that going to double instead of single precision will boost your performance on a modern machine. If you are lucky it will not slow you down, but if the code can be auto-vectorized (or you are vectorizing manually), you are looking at least at a 2x slowdown.An algorithm like Kahan summation must be implemented in a way that takes those optimizations into account.I.e., do not try to implement this at all with the built-in floating-point types. It's impossible.This is exactly like in C++, where signed integer overflow is undefined, not because it's undefined on the hardware, but because it allows for optimizations.If you have to resort to invoking insane C++ precedent in order to defend a point, you have lost the debate. Anyway, it is not at all the same (triggered by overflow vs triggered by default, undefined behavior vs wrong result), and also, in D, signed overflow is actually defined behavior.D could easily add specific functions to `core.math` that specify operations as specifically IEEE-754 conforming. Using those, Phobos could give you types that are specified to produce results as specified by IEEE-754, with no interference by the optimizer.It does not do that. Anyway, I would expect that to go to std.numeric.You can't actually do the reverse, i.e. provide a type in Phobos that allows for optimizations of that sort but the core-language types are guaranteed to be unoptimized.You say "unoptimized", I hear "not broken". Anyway, clearly the default should be the variant with fewer pitfalls. If you really want to add some sort of flexible-precision data types, why not, but there should be a compiler flag to disable it.Such a type would have to be compiler-recognized, i.e. 
it would end up being a built-in type.I have no desire at all to suffer from irreproducible behavior because some dependency tried to max out on some benchmark that is irrelevant to me. I also have no desire at all to suffer from an unnecessary performance penalty just to recover reproducible behavior that is exposed directly by the hardware. Of course, then there's the issue that libc math functions are not fully precise and have differences between implementations, but at least there seems to be some movement on that front, and this is easy to work around given that the built-in operations are sane.
Aug 18
On Sunday, 18 August 2024 at 12:57:41 UTC, Timon Gehr wrote:On 8/17/24 18:33, Quirin Schroll wrote:I think you got me wrong here on a key aspect: I’m not trying to argue, but to explain what the D spec means and outline possible rationales for why it is as it is. For example, my x87 knowledge isn’t “outdated”: x87 co-processors didn’t change. They’re just not relevant anymore practically, but they could have influenced *why* Walter specified D as he did. I totally agree with you that in the context of modern hardware, D’s float spec makes little sense. Then I tried to outline a compromise between Walter’s numerously stated opinion and the desires of many D community members, including you. I’m somewhere between your side and neither side. I know enough about floating-point arithmetic to avoid it as much as I can, and it’s not due to how languages implement it, but how it works in practice. Personally, I wouldn’t even care much if D removed floating-point types entirely. I just see (saw?, years ago) great potential in D and want it to succeed, and I believe having reproducible results would be a big win. One aspect of optimization-immune types (calling them `float32` and `float64` for now) would be that if you have both a context in which you want good and fast algebraic results (where using fused multiply-add is welcome) and also a context in which reproducibility is required, you could use `double` in the first context and `float64` in the second, while passing `-ffast-math`. Maybe a `pragma` on one kind of those functions is better. I don’t know. They’re not mutually exclusive. LDC already has `@fastmath`, so maybe it can just become official and be part of the D spec?
Aug 19