digitalmars.D.learn - No CTFE of function

Cecil Ward <d cecilward.com> writes:

I have a pure function that has constant inputs, known at 
compile-time, contains no funny stuff internally - looked at the 
generated code, and no RTL calls at all. But in a test call with 
constant literal values (arrays initialised to literal) passed to 
the pure routine GDC refuses to CTFE the whole thing, as I would 
expect it (based on previous experience with d and GDC) to simply 
generate a trivial function that puts out a block of 
CTFE-evaluated constant data corresponding to the input.

Unfortunately it's a bit too long to post in here. I've tried 
lots of variations. The function is marked @nogc @safe pure nothrow.

Any ideas as to why GDC might just refuse to do CTFE on 
compile-time-known inputs in a truly pure situation? Haven't 
tried DMD yet. Can try LDC. Am using d.godbolt.org to look at the 
result, as I don't have a machine here to run a d compiler on.

Other things I can think of. Contains function-in-a-function 
calls, which are all inlined in the generated code nicely, 
and not the first time I've done that with GDC either.

Switches: Am using -Os or -O2 or -O3 - tried all. Tuning to 
presume + enable the latest x86-64 instructions. release build, 
no bounds-checks.

Aug 26 2017

ag0aep6g <anonymous example.com> writes:

On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
 Any ideas as to why GDC might just refuse to do CTFE on 
 compile-time-known inputs in a truly pure situation?

That's not how CTFE works. CTFE only kicks in when the *result* 
is required at compile time. For example, when you assign it to 
an enum. The inputs must be known at compile time, and the 
interpreter will refuse to go on when you try something impure. 
But those things don't trigger CTFE.

The compiler may choose to precompute any constant expression, 
but that's an optimization (constant folding), not CTFE.
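
A minimal sketch of that distinction, assuming a hypothetical `bar` that simply adds two integers (the function in the thread was never posted). D's built-in `__ctfe` variable is true only while code runs in the compile-time interpreter, so it can show which path a given call took:

```d
import std.stdio : writeln;

// Hypothetical stand-in for the pure function discussed in the thread.
int bar(int a, int b) pure nothrow @nogc @safe
{
    return a + b;
}

// Reports whether this call is being evaluated by the CTFE interpreter.
bool ranInCtfe() pure nothrow @nogc @safe
{
    return __ctfe;
}

void main()
{
    enum viaEnum = ranInCtfe();  // result required at compile time: CTFE
    auto viaCall = ranInCtfe();  // ordinary call: not CTFE, even if the optimizer folds it

    enum sum = bar(2, 3);        // forced CTFE
    static assert(sum == 5);

    writeln(viaEnum, " ", viaCall);  // prints "true false"
}
```

Whether the ordinary call is later folded to a constant is up to the backend optimizer; `__ctfe` is false there either way.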

Aug 26 2017

Cecil Ward <d cecilward.com> writes:

On Saturday, 26 August 2017 at 18:16:07 UTC, ag0aep6g wrote:
 On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
 Any ideas as to why GDC might just refuse to do CTFE on 
 compile-time-known inputs in a truly pure situation?

 That's not how CTFE works. CTFE only kicks in when the *result* 
 is required at compile time. For example, when you assign it to 
 an enum. The inputs must be known at compile time, and the 
 interpreter will refuse to go on when you try something impure. 
 But those things don't trigger CTFE.

 The compiler may choose to precompute any constant expression, 
 but that's an optimization (constant folding), not CTFE.

I think I understand, but I'm not sure. I should have explained 
properly. I suspect what I should have said was that I was 
expecting an _optimisation_ and I didn't see it. I thought that a 
specific instance of a call to my pure function that has all 
compile-time-known arguments would just produce generated code 
that returned an explicit constant that is worked out by CTFE 
calculation, replacing the actual code for the general function 
entirely. So for example

     auto foo() { return bar( 2, 3 ); }

(where bar is strongly pure and completely CTFE-able) should have 
been replaced by generated x64 code looking exactly literally like
     auto foo() { return 5; }
except that the returned result would be a fixed-length literal 
array of 32-bit numbers in my case (no dynamic arrays anywhere, 
these I believe potentially involve RTL calls and the allocator 
internally).

Aug 26 2017

Cecil Ward <d cecilward.com> writes:

On Saturday, 26 August 2017 at 23:49:30 UTC, Cecil Ward wrote:
 On Saturday, 26 August 2017 at 18:16:07 UTC, ag0aep6g wrote:
 On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
 Any ideas as to why GDC might just refuse to do CTFE on 
 compile-time-known inputs in a truly pure situation?

 That's not how CTFE works. CTFE only kicks in when the 
 *result* is required at compile time. For example, when you 
 assign it to an enum. The inputs must be known at compile 
 time, and the interpreter will refuse to go on when you try 
 something impure. But those things don't trigger CTFE.

 The compiler may choose to precompute any constant expression, 
 but that's an optimization (constant folding), not CTFE.

 I think I understand, but I'm not sure. I should have explained 
 properly. I suspect what I should have said was that I was 
 expecting an _optimisation_ and I didn't see it. I thought that 
 a specific instance of a call to my pure function that has all 
 compile-time-known arguments would just produce generated code 
 that returned an explicit constant that is worked out by CTFE 
 calculation, replacing the actual code for the general function 
 entirely. So for example

     auto foo() { return bar( 2, 3 ); }

 (where bar is strongly pure and completely CTFE-able) should 
 have been replaced by generated x64 code looking exactly 
 literally like
     auto foo() { return 5; }
 except that the returned result would be a fixed-length literal 
 array of 32-bit numbers in my case (no dynamic arrays anywhere, 
 these I believe potentially involve RTL calls and the allocator 
 internally).

I was expecting this optimisation to 'return literal constant 
only' because I have seen it before in other cases with GDC. 
Obviously generating a call that involves running the algorithm 
at runtime is a performance disaster when it certainly could have 
all been thrown away in the particular case in point and been 
replaced by a return of a precomputed value with zero runtime 
cost. So this is actually an issue with specific compilers, but I 
was wondering if I have missed anything about any D general rules 
that make CTFE evaluation practically impossible?

Aug 26 2017

Cecil Ward <d cecilward.com> writes:

On Saturday, 26 August 2017 at 23:53:36 UTC, Cecil Ward wrote:
 On Saturday, 26 August 2017 at 23:49:30 UTC, Cecil Ward wrote:
 [...]

 I was expecting this optimisation to 'return literal constant 
 only' because I have seen it before in other cases with GDC. 
 Obviously generating a call that involves running the algorithm 
 at runtime is a performance disaster when it certainly could 
 have all been thrown away in the particular case in point and 
 been replaced by a return of a precomputed value with zero 
 runtime cost. So this is actually an issue with specific 
 compilers, but I was wondering if I have missed anything about 
 any D general rules that make CTFE evaluation practically 
 impossible?

I suspect I posted this in the wrong category completely, should 
have been under GDC (poss applies to LDC too, will test that)

Aug 26 2017

Jonathan M Davis via Digitalmars-d-learn writes:

On Saturday, August 26, 2017 23:53:36 Cecil Ward via Digitalmars-d-learn 
wrote:
 On Saturday, 26 August 2017 at 23:49:30 UTC, Cecil Ward wrote:
 On Saturday, 26 August 2017 at 18:16:07 UTC, ag0aep6g wrote:
 On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
 Any ideas as to why GDC might just refuse to do CTFE on
 compile-time-known inputs in a truly pure situation?

 That's not how CTFE works. CTFE only kicks in when the
 *result* is required at compile time. For example, when you
 assign it to an enum. The inputs must be known at compile
 time, and the interpreter will refuse to go on when you try
 something impure. But those things don't trigger CTFE.

 The compiler may choose to precompute any constant expression,
 but that's an optimization (constant folding), not CTFE.

 I think I understand, but I'm not sure. I should have explained
 properly. I suspect what I should have said was that I was
 expecting an _optimisation_ and I didn't see it. I thought that
 a specific instance of a call to my pure function that has all
 compile-time-known arguments would just produce generated code
 that returned an explicit constant that is worked out by CTFE
 calculation, replacing the actual code for the general function
 entirely. So for example

     auto foo() { return bar( 2, 3 ); }

 (where bar is strongly pure and completely CTFE-able) should
 have been replaced by generated x64 code looking exactly
 literally like

     auto foo() { return 5; }

 except that the returned result would be a fixed-length literal
 array of 32-bit numbers in my case (no dynamic arrays anywhere,
 these I believe potentially involve RTL calls and the allocator
 internally).

 I was expecting this optimisation to 'return literal constant
 only' because I have seen it before in other cases with GDC.
 Obviously generating a call that involves running the algorithm
 at runtime is a performance disaster when it certainly could have
 all been thrown away in the particular case in point and been
 replaced by a return of a precomputed value with zero runtime
 cost. So this is actually an issue with specific compilers, but I
 was wondering if I have missed anything about any D general rules
 that make CTFE evaluation practically impossible?

I don't know what you've seen before, but CTFE _only_ happens when the
result must be known at compile time - e.g. it's used to directly initialize
an enum or static variable. You will _never_ see CTFE done simply because
you called the function with literals. It's quite possible that GDC's
optimizer could inline the function and do constant folding and
significantly reduce the code that you actually end up with (maybe even
optimize it out entirely in some cases), but it would not be CTFE. It would
simply be the compiler backend optimizing the code. CTFE is done by the
frontend, and it's the same across dmd, ldc, and gdc so long as they have
the same version of the frontend (though the current version of gdc is quite
old, so if anything, it's behind on what it can do). So, if you want CTFE to
occur, then you _must_ assign the result to something that must have its
value known at compile time, and that will be the same across the various
compilers so long as the frontend version is the same. Any optimizations
which might optimize out function calls would be highly dependent on the
compiler backend and could easily differ across compiler versions.

My guess is that you previously saw your code optimized down such that you
thought that the compiler used CTFE when it didn't and that you're not
seeing such an optimization now, because your function is too large. If you
want to guarantee that the call is made at compile time and not worry about
whether the optimizer will do what you want, just assign the result to an
enum and then use the enum rather than hoping that the optimizer will
optimize the call out for you.
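
As a rough illustration of the advice above, here is a sketch using a hypothetical `squares` table builder in place of the real function. Both an `enum` and a `static immutable` initializer must be known at compile time, so both go through CTFE regardless of optimizer settings:

```d
// Hypothetical table builder standing in for the function in the thread.
uint[8] squares() pure nothrow @nogc @safe
{
    uint[8] r;
    foreach (i, ref e; r)
        e = cast(uint)(i * i);
    return r;
}

// Both initializers must be compile-time constants, so both force CTFE.
enum uint[8] tableEnum = squares();
static immutable uint[8] tableStatic = squares();

uint lookup(size_t i) pure nothrow @nogc @safe
{
    // Reads from the precomputed immutable table; squares() is not
    // called at runtime here.
    return tableStatic[i];
}

void main()
{
    static assert(tableEnum[3] == 9);
    assert(lookup(7) == 49);
}
```

The `static immutable` form gives the table a single address in the binary, which may be preferable to an `enum` when the data is large.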

- Jonathan M Davis

Aug 26 2017

Cecil Ward <d cecilward.com> writes:

On Sunday, 27 August 2017 at 00:08:45 UTC, Jonathan M Davis wrote:
 [...]

Indeed. I used the term CTFE too loosely.

Aug 28 2017

ag0aep6g <anonymous example.com> writes:

On 08/27/2017 01:53 AM, Cecil Ward wrote:
 On Saturday, 26 August 2017 at 23:49:30 UTC, Cecil Ward wrote:

[...]
 I think I understand, but I'm not sure. I should have explained 
 properly. I suspect what I should have said was that I was expecting 
 an _optimisation_ and I didn't see it. I thought that a specific 
 instance of a call to my pure function that has all compile-time-known 
 arguments would just produce generated code that returned an explicit 
 constant that is worked out by CTFE calculation, replacing the actual 
 code for the general function entirely. So for example

     auto foo() { return bar( 2, 3 ); }

 (where bar is strongly pure and completely CTFE-able) should have been 
 replaced by generated x64 code looking exactly literally like
     auto foo() { return 5; }
 except that the returned result would be a fixed-length literal array 
 of 32-bit numbers in my case (no dynamic arrays anywhere, these I 
 believe potentially involve RTL calls and the allocator internally).

 
 I was expecting this optimisation to 'return literal constant only' 
 because I have seen it before in other cases with GDC. Obviously 
 generating a call that involves running the algorithm at runtime is a 
 performance disaster when it certainly could have all been thrown away 
 in the particular case in point and been replaced by a return of a 
 precomputed value with zero runtime cost. So this is actually an issue 
 with specific compilers, but I was wondering if I have missed anything 
 about any D general rules that make CTFE evaluation practically impossible?

I don't know what might prevent the optimization.

You can force (actual) CTFE with an enum or static variable.
Then you don't have to rely on the optimizer. And the compiler will 
reject the code if you try something that can't be done at compile time.

Example:
----
auto foo() { enum r = bar(2, 3); return r; }
----

Please don't use the term "CTFE" for the optimization. The two are 
related, of course. The optimizer may literally evaluate functions at 
compile time. But I think we better reserve the acronym "CTFE" for the 
guaranteed/forced kind of precomputation, to avoid confusion.
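
To make the "compiler will reject the code" point concrete, here is a small sketch. The names `bar` and `impureBar` are hypothetical; `impureBar` calls the C library's `rand`, which has no D source for the interpreter to run. `__traits(compiles, ...)` is used so the rejected case can be checked without breaking the build:

```d
import core.stdc.stdlib : rand;

// Hypothetical CTFE-able function.
int bar(int a, int b) pure nothrow @nogc @safe { return a + b; }

// Hypothetical function that cannot be interpreted at compile time,
// because it calls an external C function.
int impureBar() { return rand(); }

// The enum forces CTFE of bar(2, 3).
auto foo() { enum r = bar(2, 3); return r; }

static assert(foo() == 5);

// Forcing CTFE of impureBar is a compile error, so this does not compile.
static assert(!__traits(compiles, { enum bad = impureBar(); }));

void main()
{
    assert(foo() == 5);
}
```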

Aug 26 2017

Cecil Ward <d cecilward.com> writes:

On Sunday, 27 August 2017 at 00:20:47 UTC, ag0aep6g wrote:
 On 08/27/2017 01:53 AM, Cecil Ward wrote:
 On Saturday, 26 August 2017 at 23:49:30 UTC, Cecil Ward wrote:

 [...]
 I think I understand, but I'm not sure. I should have 
 explained properly. I suspect what I should have said was 
 that I was expecting an _optimisation_ and I didn't see it. I 
 thought that a specific instance of a call to my pure 
 function that has all compile-time-known arguments would just 
 produce generated code that returned an explicit constant 
 that is worked out by CTFE calculation, replacing the actual 
 code for the general function entirely. So for example

     auto foo() { return bar( 2, 3 ); }

 (where bar is strongly pure and completely CTFE-able) should 
 have been replaced by generated x64 code looking exactly 
 literally like
     auto foo() { return 5; }
 except that the returned result would be a fixed-length 
 literal array of 32-bit numbers in my case (no dynamic arrays 
 anywhere, these I believe potentially involve RTL calls and 
 the allocator internally).

 
 I was expecting this optimisation to 'return literal constant 
 only' because I have seen it before in other cases with GDC. 
 Obviously generating a call that involves running the 
 algorithm at runtime is a performance disaster when it 
 certainly could have all been thrown away in the particular 
 case in point and been replaced by a return of a precomputed 
 value with zero runtime cost. So this is actually an issue 
 with specific compilers, but I was wondering if I have missed 
 anything about any D general rules that make CTFE evaluation 
 practically impossible?

 I don't know what might prevent the optimization.

 You can force (actual) CTFE with an enum or static variable.
 Then you don't have to rely on the optimizer. And the compiler 
 will reject the code if you try something that can't be done at 
 compile time.

 Example:
 ----
 auto foo() { enum r = bar(2, 3); return r; }
 ----

 Please don't use the term "CTFE" for the optimization. The two 
 are related, of course. The optimizer may literally evaluate 
 functions at compile time. But I think we better reserve the 
 acronym "CTFE" for the guaranteed/forced kind of 
 precomputation, to avoid confusion.

Static had already been tried. Failed. Thanks to your tip, I 
tried enum next. Failed as well, wouldn't compile with GDC.

I tried LDC, which did the right thing in all cases. Optimised 
correctly in every use case to not compute in the generated code, 
just return the literal compile-time calculated result array by 
writing a load of immediate values straight to the destination. 
Hurrah for LDC.

Then tried DMD via web-based edit/compile feature at dlang.org 
website. Refused to compile in the enum case and actually told me 
why, in a very very cryptic way. I worked out that it has a 
problem internally (this is now an assignment into an enum, so 
I have permission to use the term CTFE now) in that it refuses to 
do CTFE if any variable is declared using an =void initialiser to 
stop the wasteful huge pre-fill with zeros which could take half 
an hour on a large object with slow memory and for all I know 
play havoc with the cache. So simply deleting the = void fixed 
the problem with DMD.

So that's it. There are unknown random internal factors that 
prevent CTFE or CTFE-type optimisation.

I had wondered if pointers might present a problem. The function 
in question originally was specced something like
     pure nothrow @nogc @safe
     void pure_compute( result_t * p_result, in input_t x )

and just as a test, I tried changing it to

     result_t  pure_compute( in input_t x )

instead. I don't think it makes any difference though. I 
discovered the DMD =void issue at that point, so this was not 
checked out properly.
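
For illustration only, a sketch of the return-by-value shape without the `= void` initialiser. `Result`, `compute`, and the fill logic are hypothetical stand-ins, since the real `result_t` and `pure_compute` were not posted:

```d
// Hypothetical stand-in for result_t: a fixed-length array of 32-bit values.
struct Result
{
    uint[16] data;
}

// Return-by-value variant; the local is default-initialised rather than
// declared with `= void`, which the DMD run described above rejected in CTFE.
Result compute(uint seed) pure nothrow @nogc @safe
{
    Result r;
    foreach (i, ref e; r.data)
        e = seed + cast(uint) i;
    return r;
}

// Forced CTFE: the whole table is computed by the compiler front end.
enum Result precomputed = compute(10);
static assert(precomputed.data[0] == 10 && precomputed.data[15] == 25);

void main()
{
    // The same function remains callable at runtime.
    assert(compute(1).data[3] == 4);
}
```

Whether `= void` is accepted during CTFE seems to depend on the compiler and front-end version, so this sketch simply avoids it.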

Your enum tip was very helpful.

Ps
GDC errors: Another thing that has wasted a load of time is that 
GDC signals errors on lines where there is a function call that 
is fine, yet the only problem is in the body of the function that 
is _being_ called itself, and fixing the function makes the 
phantom error at the call-site go away. This nasty behaviour has 
you looking for errors at and before the call-site, or thinking 
you have the spec of the call args wrong or incorrect types. 
[Compiler-Explorer problem : I am perhaps blaming GDC unfairly, 
because I have only ever used it through the telescope that is 
d.godbolt.org and I am assuming that it reports errors on the 
correct source lines. It doesn't show error message text tho, 
which is a nightmare, but nothing to do with the compiler 
obviously.]

Aug 27 2017

Cecil Ward <d cecilward.com> writes:

On Sunday, 27 August 2017 at 17:36:54 UTC, Cecil Ward wrote:
 On Sunday, 27 August 2017 at 00:20:47 UTC, ag0aep6g wrote:
 [...]

 Static had already been tried. Failed. Thanks to your tip, I 
 tried enum next. Failed as well, wouldn't compile with GDC.

 [...]

I wonder if there is anything written up anywhere about what 
kinds of things are blockers to either CTFE or to successful 
constant-folding optimisation in particular compilers or in 
general?

Would be useful to know what to stay away from if you really need 
to make sure that horrendously slow code does not get run at 
runtime. Sometimes it is possible or even relatively easy to 
reorganise things and do without certain practices in order to 
win such a massive reward.

Aug 27 2017

Mike Parker <aldacron gmail.com> writes:

On Sunday, 27 August 2017 at 17:47:54 UTC, Cecil Ward wrote:
 I wonder if there is anything written up anywhere about what 
 kinds of things are blockers to either CTFE or to successful 
 constant-folding optimisation in particular compilers or in 
 general?

 Would be useful to know what to stay away from if you really 
 need to make sure that horrendously slow code does not get run 
 at runtime. Sometimes it is possible or even relatively easy to 
 reorganise things and do without certain practices in order to 
 win such a massive reward.

The rules for CTFE are outlined in the docs [1]. What is 
described there is all there is to it. If those criteria are not 
met, the function cannot be executed at compile time. More 
importantly, as mentioned earlier in the thread, CTFE will only 
occur if a function *must* be executed at compile time, i.e. it 
is in a context where the result of the function is required at 
compile-time. An enum declaration is such a situation, a variable 
initialization is not.
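
A short sketch of some contexts that do require a compile-time value, using a hypothetical `triple` function and `Buffer` template, next to a plain local initialization that does not:

```d
// Hypothetical CTFE-able function.
int triple(int n) pure nothrow @nogc @safe { return 3 * n; }

// Hypothetical template taking a compile-time integer.
struct Buffer(int n) { ubyte[n] bytes; }

enum e = triple(2);                // enum initializer
static assert(triple(2) == 6);     // static assert condition
int[triple(2)] arr;                // static array length
alias B = Buffer!(triple(2));      // template value argument

void main()
{
    auto v = triple(2);            // plain local initialization: an ordinary runtime call
    assert(v == e);
    assert(arr.length == 6 && B.init.bytes.length == 6);
}
```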

There are also a couple of posts on the D Blog. Stefan has 
written about the new CTFE engine [2] and I posted something 
showing a compile-time sort [3]. These illustrate the points laid 
out in the documentation.

As for compiler optimizations, there are some basic optimizations 
that will be common across all compilers, and you can google for 
compiler optimizations to find such generalities. Many of these 
apply across languages, and those specific to the C-family 
languages will likely be found in D compilers. Beyond that, I'm 
unaware of any documentation that outlines optimizations in D 
compilers.

[1] https://dlang.org/spec/function.html#interpretation
[2] https://dlang.org/blog/2017/04/10/the-new-ctfe-engine/
[3] https://dlang.org/blog/2017/06/05/compile-time-sort-in-d/

Aug 27 2017

Cecil Ward <d cecilward.com> writes:

On Monday, 28 August 2017 at 03:16:24 UTC, Mike Parker wrote:
 On Sunday, 27 August 2017 at 17:47:54 UTC, Cecil Ward wrote:
 [...]

 The rules for CTFE are outlined in the docs [1]. What is 
 described there is all there is to it. If those criteria are 
 not met, the function cannot be executed at compile time. More 
 importantly, as mentioned earlier in the thread, CTFE will only 
 occur if a function *must* be executed at compile time, i.e. it 
 is in a context where the result of the function is required at 
 compile-time. An enum declaration is such a situation, a 
 variable initialization is not.

 [...]

Those links are extremely useful. Many thanks. Because I am full 
of NHS pain drugs, I am pretty confused half the time, and so 
finding documentation is difficult for me through the haze, so 
much appreciated. RTFM of course applies as always.

Aug 28 2017

Cecil Ward <d cecilward.com> writes:

On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
 I have a pure function that has constant inputs, known at 
 compile-time, contains no funny stuff internally - looked at 
 the generated code, and no RTL calls at all. But in a test call 
 with constant literal values (arrays initialised to literal) 
 passed to the pure routine GDC refuses to CTFE the whole thing, 
 as I would expect it (based on previous experience with d and 
 GDC) to simply generate a trivial function that puts out a 
 block of CTFE-evaluated constant data corresponding to the 
 input.

 Unfortunately it's a bit too long to post in here. I've tried 
 lots of variations. The function is marked @nogc @safe pure nothrow.

 Any ideas as to why GDC might just refuse to do CTFE on 
 compile-time-known inputs in a truly pure situation? Haven't 
 tried DMD yet. Can try LDC. Am using d.godbolt.org to look at 
 the result, as I don't have a machine here to run a d compiler 
 on.

 Other things I can think of. Contains function-in-a-function 
 calls, which are all inlined in the generated code nicely, 
 and not the first time I've done that with GDC either.

 Switches: Am using -Os or -O2 or -O3 - tried all. Tuning to 
 presume + enable the latest x86-64 instructions. release build, 
 no bounds-checks.

I will henceforth follow the enum trick advice at all times.

I noticed that the problem with init =void is compiler-dependent. 
Using an enum for real CTFE, I don't get error messages from LDC 
or GDC (i.e. [old?] versions currently up on d.godbolt.org) x64 
compilers even if I do use the =void optimisation. This saved a 
totally wasteful and pointless zero-fill of 64 bytes using 2 YMM 
instructions in the particular unit test case I had, but of 
course could easily be dramatically bad news depending on the 
array size I am unnecessarily filling.

Aug 28 2017
