
digitalmars.D.learn - No CTFE of function

reply Cecil Ward <d cecilward.com> writes:
I have a pure function whose inputs are all constants known at 
compile time, and which contains no funny stuff internally - I have 
looked at the generated code, and there are no RTL calls at all. But 
in a test call with constant literal values (arrays initialised from 
literals) passed to the pure routine, GDC refuses to CTFE the whole 
thing. Based on previous experience with D and GDC, I would expect 
it simply to generate a trivial function that returns a block of 
CTFE-evaluated constant data corresponding to the input.

Unfortunately it's a bit too long to post here. I've tried 
lots of variations. The function is marked @nogc @safe pure nothrow.

Any ideas as to why GDC might just refuse to do CTFE on 
compile-time-known inputs in a truly pure situation? I haven't 
tried DMD yet; I can try LDC. I am using d.godbolt.org to look at 
the result, as I don't have a machine here to run a D compiler on.

Other things I can think of: the function contains nested-function 
calls, which are all nicely inlined in the generated code - and it's 
not the first time I've done that with GDC either.

Switches: I am using -Os, -O2 or -O3 - I have tried them all - with 
tuning set to presume and enable the latest x86-64 instructions, a 
release build, and no bounds checks.
Aug 26 2017
next sibling parent reply ag0aep6g <anonymous example.com> writes:
On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
 Any ideas as to why GDC might just refuse to do CTFE on 
 compile-time-known inputs in a truly pure situation?
That's not how CTFE works. CTFE only kicks in when the *result* is required at compile time. For example, when you assign it to an enum. The inputs must be known at compile time, and the interpreter will refuse to go on when you try something impure. But those things don't trigger CTFE. The compiler may choose to precompute any constant expression, but that's an optimization (constant folding), not CTFE.
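A minimal sketch of that distinction, using a hypothetical pure function `bar` (not from the posts in this thread):

```d
// Hypothetical strongly pure, CTFE-able function.
int bar(int a, int b) pure nothrow @nogc @safe
{
    return a + b;
}

enum e = bar(2, 3); // enum initializer: the result is required at
                    // compile time, so CTFE is guaranteed to run

void user()
{
    static int s = bar(2, 3); // static initializer: also forces CTFE
    int x = bar(2, 3);        // ordinary runtime call: any folding here
                              // is merely a backend optimisation, not CTFE
}
```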
Aug 26 2017
parent reply Cecil Ward <d cecilward.com> writes:
On Saturday, 26 August 2017 at 18:16:07 UTC, ag0aep6g wrote:
 On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
 Any ideas as to why GDC might just refuse to do CTFE on 
 compile-time-known inputs in a truly pure situation?
That's not how CTFE works. CTFE only kicks in when the *result* is required at compile time. For example, when you assign it to an enum. The inputs must be known at compile time, and the interpreter will refuse to go on when you try something impure. But those things don't trigger CTFE. The compiler may choose to precompute any constant expression, but that's an optimization (constant folding), not CTFE.
I think I understand, but I'm not sure. I should have explained properly. I suspect what I should have said was that I was expecting an _optimisation_ and I didn't see it. I thought that a specific instance of a call to my pure function with all compile-time-known arguments would just produce generated code that returned an explicit constant worked out by CTFE-style calculation, replacing the actual code of the general function entirely. So for example

    auto foo() { return bar( 2, 3 ); }

(where bar is strongly pure and completely CTFE-able) should have been replaced by generated x64 code looking exactly literally like

    auto foo() { return 5; }

except that in my case the returned result would be a fixed-length literal array of 32-bit numbers (no dynamic arrays anywhere, as these I believe potentially involve RTL calls and the allocator internally).
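A sketch of the shape being described, with hypothetical names (the real routine was too long to post): a strongly pure function returning a fixed-length array of 32-bit values, which an optimising backend could in principle fold down to a block of constant data.

```d
alias result_t = uint[4];

// Hypothetical stand-in for the pure routine under discussion.
result_t bar(uint a, uint b) pure nothrow @nogc @safe
{
    result_t r;
    foreach (i, ref e; r)
        e = a * cast(uint) (i + 1) + b;
    return r;
}

// With both arguments known, the hoped-for optimisation would reduce
// this whole body to code that just stores four immediate values.
result_t foo() pure nothrow @nogc @safe
{
    return bar(2, 3);
}
```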
Aug 26 2017
parent reply Cecil Ward <d cecilward.com> writes:
On Saturday, 26 August 2017 at 23:49:30 UTC, Cecil Ward wrote:
 On Saturday, 26 August 2017 at 18:16:07 UTC, ag0aep6g wrote:
 On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
 Any ideas as to why GDC might just refuse to do CTFE on 
 compile-time-known inputs in a truly pure situation?
That's not how CTFE works. CTFE only kicks in when the *result* is required at compile time. For example, when you assign it to an enum. The inputs must be known at compile time, and the interpreter will refuse to go on when you try something impure. But those things don't trigger CTFE. The compiler may choose to precompute any constant expression, but that's an optimization (constant folding), not CTFE.
I think I understand, but I'm not sure. I should have explained properly. I suspect what I should have said was that I was expecting an _optimisation_ and I didn't see it. I thought that a specific instance of a call to my pure function with all compile-time-known arguments would just produce generated code that returned an explicit constant worked out by CTFE-style calculation, replacing the actual code of the general function entirely. So for example

    auto foo() { return bar( 2, 3 ); }

(where bar is strongly pure and completely CTFE-able) should have been replaced by generated x64 code looking exactly literally like

    auto foo() { return 5; }

except that in my case the returned result would be a fixed-length literal array of 32-bit numbers (no dynamic arrays anywhere, as these I believe potentially involve RTL calls and the allocator internally).
I was expecting this 'return literal constant only' optimisation because I have seen it before in other cases with GDC. Obviously, generating a call that runs the algorithm at runtime is a performance disaster when it could all have been thrown away in the case in point and replaced by a return of a precomputed value with zero runtime cost. So this is actually an issue with specific compilers, but I was wondering whether I have missed anything about general D rules that make CTFE evaluation practically impossible?
Aug 26 2017
next sibling parent Cecil Ward <d cecilward.com> writes:
On Saturday, 26 August 2017 at 23:53:36 UTC, Cecil Ward wrote:
 On Saturday, 26 August 2017 at 23:49:30 UTC, Cecil Ward wrote:
 [...]
I was expecting this 'return literal constant only' optimisation because I have seen it before in other cases with GDC. Obviously, generating a call that runs the algorithm at runtime is a performance disaster when it could all have been thrown away in the case in point and replaced by a return of a precomputed value with zero runtime cost. So this is actually an issue with specific compilers, but I was wondering whether I have missed anything about general D rules that make CTFE evaluation practically impossible?
I suspect I posted this in the wrong category completely; it should have been under GDC (and possibly applies to LDC too - I will test that).
Aug 26 2017
prev sibling next sibling parent reply Jonathan M Davis via Digitalmars-d-learn writes:
On Saturday, August 26, 2017 23:53:36 Cecil Ward via Digitalmars-d-learn 
wrote:
 On Saturday, 26 August 2017 at 23:49:30 UTC, Cecil Ward wrote:
 On Saturday, 26 August 2017 at 18:16:07 UTC, ag0aep6g wrote:
 On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
 Any ideas as to why GDC might just refuse to do CTFE on
 compile-time-known inputs in a truly pure situation?
That's not how CTFE works. CTFE only kicks in when the *result* is required at compile time. For example, when you assign it to an enum. The inputs must be known at compile time, and the interpreter will refuse to go on when you try something impure. But those things don't trigger CTFE. The compiler may choose to precompute any constant expression, but that's an optimization (constant folding), not CTFE.
I think I understand, but I'm not sure. I should have explained properly. I suspect what I should have said was that I was expecting an _optimisation_ and I didn't see it. I thought that a specific instance of a call to my pure function with all compile-time-known arguments would just produce generated code that returned an explicit constant worked out by CTFE-style calculation, replacing the actual code of the general function entirely. So for example

    auto foo() { return bar( 2, 3 ); }

(where bar is strongly pure and completely CTFE-able) should have been replaced by generated x64 code looking exactly literally like

    auto foo() { return 5; }

except that in my case the returned result would be a fixed-length literal array of 32-bit numbers (no dynamic arrays anywhere, as these I believe potentially involve RTL calls and the allocator internally).
I was expecting this 'return literal constant only' optimisation because I have seen it before in other cases with GDC. Obviously, generating a call that runs the algorithm at runtime is a performance disaster when it could all have been thrown away in the case in point and replaced by a return of a precomputed value with zero runtime cost. So this is actually an issue with specific compilers, but I was wondering whether I have missed anything about general D rules that make CTFE evaluation practically impossible?
I don't know what you've seen before, but CTFE _only_ happens when the result must be known at compile time - e.g. it's used to directly initialize an enum or static variable. You will _never_ see CTFE done simply because you called the function with literals.

It's quite possible that GDC's optimizer could inline the function and do constant folding and significantly reduce the code that you actually end up with (maybe even optimize it out entirely in some cases), but it would not be CTFE. It would simply be the compiler backend optimizing the code.

CTFE is done by the frontend, and it's the same across dmd, ldc, and gdc so long as they have the same version of the frontend (though the current version of gdc is quite old, so if anything, it's behind on what it can do). So, if you want CTFE to occur, then you _must_ assign the result to something that must have its value known at compile time, and that will be the same across the various compilers so long as the frontend version is the same. Any optimizations which might optimize out function calls would be highly dependent on the compiler backend and could easily differ across compiler versions.

My guess is that you previously saw your code optimized down such that you thought that the compiler used CTFE when it didn't, and that you're not seeing such an optimization now because your function is too large. If you want to guarantee that the call is made at compile time and not worry about whether the optimizer will do what you want, just assign the result to an enum and then use the enum rather than hoping that the optimizer will optimize the call out for you.

- Jonathan M Davis
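The enum trick described above can be sketched as follows; `bar` is a hypothetical CTFE-able pure function, not the poster's actual routine:

```d
uint[4] bar(uint a, uint b) pure nothrow @nogc @safe
{
    uint[4] r;
    foreach (i, ref e; r)
        e = a + b + cast(uint) i;
    return r;
}

uint[4] foo()
{
    // The enum initializer must be known at compile time, so the call is
    // guaranteed to run under CTFE on any compiler, at any -O level.
    enum precomputed = bar(2, 3);
    return precomputed; // the generated function just returns baked-in data
}
```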
Aug 26 2017
parent Cecil Ward <d cecilward.com> writes:
On Sunday, 27 August 2017 at 00:08:45 UTC, Jonathan M Davis wrote:
 [...]
Indeed. I used the term CTFE too loosely.
Aug 28 2017
prev sibling parent reply ag0aep6g <anonymous example.com> writes:
On 08/27/2017 01:53 AM, Cecil Ward wrote:
 On Saturday, 26 August 2017 at 23:49:30 UTC, Cecil Ward wrote:
[...]
 I think I understand, but I'm not sure. I should have explained 
 properly. I suspect what I should have said was that I was expecting 
 an _optimisation_ and I didn't see it. I thought that a specific 
 instance of a call to my pure function that has all compile-time-known 
 arguments would just produce generated code that returned an explicit 
 constant that is worked out by CTFE calculation, replacing the actual 
 code for the general function entirely. So for example

     auto foo() { return bar( 2, 3 ); }

 (where bar is strongly pure and completely CTFE-able) should have been 
 replaced by generated x64 code looking exactly literally like
     auto foo() { return 5; }
 except that the returned result would be a fixed-length literal array 
 of 32-bit numbers in my case (no dynamic arrays anywhere, these I 
 believe potentially involve RTL calls and the allocator internally).
I was expecting this optimisation to 'return literal constant only' because I have seen it before in other cases with GDC. Obviously generating a call that involves running the algorithm at runtime is a performance disaster when it certainly could have all been thrown away in the particular case in point and been replaced by a return of a precomputed value with zero runtime cost. So this is actually an issue with specific compilers, but I was wondering if I have missed anything about any D general rules that make CTFE evaluation practically impossible?
I don't know what might prevent the optimization.

You can force (actual) CTFE with an enum or static variable. Then you don't have to rely on the optimizer. And the compiler will reject the code if you try something that can't be done at compile time. Example:

----
auto foo()
{
    enum r = bar(2, 3);
    return r;
}
----

Please don't use the term "CTFE" for the optimization. The two are related, of course. The optimizer may literally evaluate functions at compile time. But I think we better reserve the acronym "CTFE" for the guaranteed/forced kind of precomputation, to avoid confusion.
Aug 26 2017
parent reply Cecil Ward <d cecilward.com> writes:
On Sunday, 27 August 2017 at 00:20:47 UTC, ag0aep6g wrote:
 On 08/27/2017 01:53 AM, Cecil Ward wrote:
 On Saturday, 26 August 2017 at 23:49:30 UTC, Cecil Ward wrote:
[...]
 I think I understand, but I'm not sure. I should have 
 explained properly. I suspect what I should have said was 
 that I was expecting an _optimisation_ and I didn't see it. I 
 thought that a specific instance of a call to my pure 
 function that has all compile-time-known arguments would just 
 produce generated code that returned an explicit constant 
 that is worked out by CTFE calculation, replacing the actual 
 code for the general function entirely. So for example

     auto foo() { return bar( 2, 3 ); }

 (where bar is strongly pure and completely CTFE-able) should 
 have been replaced by generated x64 code looking exactly 
 literally like
     auto foo() { return 5; }
 except that the returned result would be a fixed-length 
 literal array of 32-bit numbers in my case (no dynamic arrays 
 anywhere, these I believe potentially involve RTL calls and 
 the allocator internally).
I was expecting this 'return literal constant only' optimisation because I have seen it before in other cases with GDC. Obviously, generating a call that runs the algorithm at runtime is a performance disaster when it could all have been thrown away in the case in point and replaced by a return of a precomputed value with zero runtime cost. So this is actually an issue with specific compilers, but I was wondering whether I have missed anything about general D rules that make CTFE evaluation practically impossible?
I don't know what might prevent the optimization.

You can force (actual) CTFE with an enum or static variable. Then you don't have to rely on the optimizer. And the compiler will reject the code if you try something that can't be done at compile time. Example:

----
auto foo()
{
    enum r = bar(2, 3);
    return r;
}
----

Please don't use the term "CTFE" for the optimization. The two are related, of course. The optimizer may literally evaluate functions at compile time. But I think we better reserve the acronym "CTFE" for the guaranteed/forced kind of precomputation, to avoid confusion.
Static had already been tried - it failed. Thanks to your tip, I tried enum next. That failed as well; it wouldn't compile with GDC.

I tried LDC, which did the right thing in all cases. It optimised correctly in every use case, so as not to compute anything in the generated code but just return the literal compile-time-calculated result array, writing a load of immediate values straight to the destination. Hurrah for LDC.

Then I tried DMD via the web-based edit/compile feature at the dlang.org website. It refused to compile in the enum case and actually told me why, in a very, very cryptic way. I worked out that it has a problem internally (this is now an assignment into an enum, so I have permission to use the term CTFE now): it refuses to do CTFE if any variable is declared using an = void initialiser, which I use to stop the wasteful huge pre-fill with zeros - that could take half an hour on a large object with slow memory and, for all I know, play havoc with the cache. Simply deleting the = void fixed the problem with DMD.

So that's it: there are unknown, seemingly random internal factors that prevent CTFE or CTFE-type optimisation. I had wondered if pointers might present a problem. The function in question was originally specced something like

    pure nothrow @nogc @safe void pure_compute( result_t * p_result, in input_t x )

and, just as a test, I tried changing it to

    result_t pure_compute( in input_t x )

instead. I don't think it makes any difference though; I discovered the DMD = void thing at that point, so this was not checked out properly. Your enum tip was very helpful.

PS GDC errors: another thing that has wasted a load of time is that GDC signals errors on lines containing a function call that is perfectly fine, when the only problem is in the body of the function _being_ called; fixing that function makes the phantom error at the call site go away. This nasty behaviour has you looking for errors at and before the call site, or thinking you have the spec of the call arguments or their types wrong.
[Compiler-Explorer problem: I am perhaps blaming GDC unfairly, because I have only ever used it through the telescope that is d.godbolt.org, and I am assuming that it reports errors on the correct source lines. It doesn't show the error message text, though, which is a nightmare - but that's nothing to do with the compiler, obviously.]
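The = void pitfall reported above can be sketched like this, with hypothetical names; the behaviour is as described for the DMD version discussed in the thread, where the CTFE interpreter rejected void-initialised locals:

```d
alias result_t = uint[16];

result_t pure_compute(uint seed) pure nothrow @nogc @safe
{
    result_t r;        // was "result_t r = void;" - skipping the default
                       // zero-fill reportedly made DMD's CTFE refuse to run
    foreach (i, ref e; r)
        e = seed * cast(uint) (i + 1);
    return r;
}

// Forcing CTFE via an enum: reportedly failed under that DMD until the
// "= void" initialiser was removed from the function body.
enum table = pure_compute(3);
```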
Aug 27 2017
parent reply Cecil Ward <d cecilward.com> writes:
On Sunday, 27 August 2017 at 17:36:54 UTC, Cecil Ward wrote:
 On Sunday, 27 August 2017 at 00:20:47 UTC, ag0aep6g wrote:
 [...]
Static had already been tried. Failed. Thanks to your tip, I tried enum next. Failed as well, wouldn't compile with GDC. [...]
I wonder if there is anything written up anywhere about what kinds of things are blockers, either to CTFE or to successful constant-folding optimisation, in particular compilers or in general? It would be useful to know what to stay away from if you really need to make sure that horrendously slow code does not get run at runtime. Sometimes it is possible, or even relatively easy, to reorganise things and do without certain practices in order to win such a massive reward.
Aug 27 2017
parent reply Mike Parker <aldacron gmail.com> writes:
On Sunday, 27 August 2017 at 17:47:54 UTC, Cecil Ward wrote:
 I wonder if there is anything written up anywhere about what 
 kinds of things are blockers to either CTFE or to successful 
 constant-folding optimisation in particular compilers or in 
 general?

 Would be useful to know what to stay away from if you really 
 need to make sure that horrendously slow code does not get run 
 at runtime. Sometimes it is possible or even relatively easy to 
 reorganise things and do without certain practices in order to 
 win such a massive reward.
The rules for CTFE are outlined in the docs [1]. What is described there is all there is to it. If those criteria are not met, the function cannot be executed at compile time. More importantly, as mentioned earlier in the thread, CTFE will only occur if a function *must* be executed at compile time, i.e. it is in a context where the result of the function is required at compile time. An enum declaration is such a situation; a variable initialization is not.

There are also a couple of posts on the D Blog. Stefan has written about the new CTFE engine [2], and I posted something showing a compile-time sort [3]. These illustrate the points laid out in the documentation.

As for compiler optimizations, there are some basic optimizations that will be common across all compilers, and you can google for compiler optimizations to find such generalities. Many of these apply across languages, and those specific to the C-family languages will likely be found in D compilers. Beyond that, I'm unaware of any documentation that outlines optimizations in D compilers.

[1] https://dlang.org/spec/function.html#interpretation
[2] https://dlang.org/blog/2017/04/10/the-new-ctfe-engine/
[3] https://dlang.org/blog/2017/06/05/compile-time-sort-in-d/
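A small sketch of that last point - which contexts require the result at compile time and which do not - using a hypothetical table-building function:

```d
int[4] makeTable(int seed) pure nothrow @nogc @safe
{
    int[4] t;
    foreach (i, ref e; t)
        e = seed << i;
    return t;
}

enum e = makeTable(1);             // required at compile time: CTFE runs
static immutable s = makeTable(1); // static initializer: CTFE runs too

void f()
{
    auto v = makeTable(1); // ordinary local initialization: no CTFE;
                           // any folding is left to the optimizer
}
```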
Aug 27 2017
parent Cecil Ward <d cecilward.com> writes:
On Monday, 28 August 2017 at 03:16:24 UTC, Mike Parker wrote:
 On Sunday, 27 August 2017 at 17:47:54 UTC, Cecil Ward wrote:
 [...]
The rules for CTFE are outlined in the docs [1]. What is described there is all there is to it. If those criteria are not met, the function cannot be executed at compile time. More importantly, as mentioned earlier in the thread, CTFE will only occur if a function *must* be executed at compile time, i.e. it is in a context where the result of the function is required at compile-time. An enum declaration is such a situation, a variable initialization is not. [...]
Those links are extremely useful - many thanks. Because I am full of NHS pain drugs, I am pretty confused half the time, so finding documentation through the haze is difficult for me; much appreciated. RTFM of course applies, as always.
Aug 28 2017
prev sibling parent Cecil Ward <d cecilward.com> writes:
On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
 I have a pure function that has constant inputs, known at 
 compile-time, contains no funny stuff internally - looked at 
 the generated code, and no RTL calls at all. But in a test call 
 with constant literal values (arrays initialised to literal) 
 passed to the pure routine GDC refuses to CTFE the whole thing, 
 as I would expect it (based on previous experience with d and 
 GDC) to simply generate a trivial function that puts out a 
 block of CTFE-evaluated constant data corresponding to the 
 input.

 Unfortunately it's a bit too long to post in here. I've tried 
 lots of variations. Function is marked nogc safe pure nothrow

 Any ideas as to why GDC might just refuse to do CTFE on 
 compile-time-known inputs in a truly pure situation? Haven't 
 tried DMD yet. Can try LDC. Am using d.godbolt.org to look at 
 the result, as I don't have a machine here to run a d compiler 
 on.

 Other things I can think of: the function contains 
 nested-function calls, which are all nicely inlined in the 
 generated code - and it's not the first time I've done that with 
 GDC either.

 Switches: Am using -Os or -O2 or -O3 - tried all. Tuning to 
 presume + enable the latest x86-64 instructions. release build, 
 no bounds-checks.
I will henceforth use the enum trick advice at all times. I noticed that the problem with an = void initialiser is compiler-dependent. Using an enum for real CTFE, I don't get error messages from the LDC or GDC x64 compilers (i.e. the [old?] versions currently up on d.godbolt.org) even if I do use the = void optimisation. This saved a totally wasteful and pointless zero-fill of 64 bytes using two YMM instructions in the particular unit test case I had, but of course it could easily be dramatically bad news depending on the size of the array I am unnecessarily filling.
Aug 28 2017