digitalmars.D - Should pure functions be prevented from reading changeable immutable

Don <nospam nospam.com> writes:
Pure functions are allowed to read immutable global variables.
Currently, this even includes globals which are initialized from inside 
'static this()'.
Here's an example of how this can be a problem:

immutable int unstable;

pure int buggy() { return unstable; }

static this() {
     // fails even though buggy is pure
     assert( buggy() == ( ++unstable , buggy() ) );
}

I suspect that such functions should be forbidden from being 'pure'.
Note that they cannot be used in CTFE (conceptually, all other @safe 
pure functions could be used in CTFE, even though the current 
implementation doesn't always allow it).

The motivation for wanting to ban them is to prevent the optimiser from 
generating bad code. But if they were disallowed, there would also be 
benefits for breaking circular module dependencies:
* if module A imports only pure functions from module B,
we know that the static this() of A does not directly depend on the 
static this() of module B.
(Note though it might still depend on C, which depends on B).
* if the static this() of module A only calls pure functions, it does 
not depend on the static this() of any other module.

Probably, this would only avoid the most trivial circular module 
dependencies, so not a huge win, but still a nice bonus.
Nov 05 2010
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 05 November 2010 18:32:47 Don wrote:
 Pure functions are allowed to read immutable global variables.
 [...]

It's probably simplest to say that they can't be pure since they're accessing global state, but would it be possible to treat them as non-pure in any static constructors and then pure during the rest of the program? Perhaps just outright disabling purity optimizations in static constructors would solve the problem. Then you could treat such functions as pure quite easily without having the problems that it could cause within a static constructor. It wouldn't surprise me though if it were a bit of a pain to special-case static constructors like that. - Jonathan M Davis
Nov 05 2010
Lutger <lutger.blijdestijn gmail.com> writes:
Don wrote:

 Pure functions are allowed to read immutable global variables.
 [...]

Ideally the compiler could still allow such functions when called after static this() is finished. Is this realistic? Probably not. I would vote for a ban if not.
Nov 06 2010
Michel Fortin <michel.fortin michelf.com> writes:
On 2010-11-05 21:32:47 -0400, Don <nospam nospam.com> said:

 The motivation for wanting to ban them is to prevent the optimiser from 
 generating bad code.

It seems to me that disabling pure optimizations inside 'static this()' would be enough to prevent generating bad code. It's not like pure optimizations cross function boundaries. In fact, you could still allow the optimization of pure functions in the current module. I understand that by restricting the semantics we could use pure to help us break cyclic imports, but, as much as I'd like a solution to this cyclic import problem, I don't think it's a good idea to complicate the semantics of pure further. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 06 2010
Don <nospam nospam.com> writes:
Michel Fortin wrote:
 On 2010-11-05 21:32:47 -0400, Don <nospam nospam.com> said:
 
  The motivation for wanting to ban them is to prevent the optimiser 
  from generating bad code.

 It seems to me that disabling pure optimizations inside 'static this()'
 would be enough to prevent generating bad code. It's not like pure
 optimizations cross function boundaries.

That's probably doable, if we largely abandon the idea that the return value of a pure function can be cacheable. Which I think is a bit of a fanciful idea anyway.

 In fact, you could still allow the optimization of pure functions in
 the current module.

Yes.

 I understand that by restricting the semantics we could use pure to help 
 us break cyclic imports, but, as much as I'd like a solution to this 
 cyclic import problem, I don't think it's a good idea to complicate the 
 semantics of pure further.

Indeed, it can't be the primary motivation.
Nov 06 2010
bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 If they're not cacheable, what's the point of pure?

Pure functions allow you to be sure global variables are not modifying the results of the function, so there are fewer ways to shoot yourself in the foot. Controlling or removing the unwanted flow of information between subsystems of your system is one of the best ways to reduce its overall complexity.

Pure functions also allow the compiler to perform some optimizations, like replacing foo(x)+foo(x) with 2*foo(x), or pulling a function call out of a loop.

Eventually, a slightly improved D compiler that also allows this:
http://d.puremagic.com/issues/show_bug.cgi?id=5125
may perform higher-level transformations on the code, like replacing:
filter!pure1(map!pure2(range))
with a faster:
map!pure2(filter!pure1(range))

Functional languages manage not to be too slow because they can perform several such complex transformations, which C-derived compilers generally don't do because the compiler doesn't know enough about the semantics of the functions it is compiling. The more of the constraints already implicit in your code that are made known to the compiler, the better a smart compiler can digest your code.

Bye,
bearophile
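
The two simpler optimizations mentioned above (merging repeated calls and pulling a call out of a loop) can be written out by hand. This is only a minimal sketch: `foo`, `asWritten`, `asOptimised` and `sumHoisted` are hypothetical names invented for the illustration, and whether a given D compiler actually performs these rewrites is not something the example demonstrates.

```d
import std.stdio : writeln;

// `foo` is a hypothetical pure function, not taken from the thread.
int foo(int x) pure
{
    return x * x + 1;
}

// As the programmer writes it: two calls with the same argument.
int asWritten(int x) pure
{
    return foo(x) + foo(x);
}

// What an optimiser would be allowed to produce, because foo is pure.
int asOptimised(int x) pure
{
    return 2 * foo(x);
}

// The loop-invariant call is hoisted by hand to show the second rewrite.
int sumHoisted(const int[] data, int k) pure
{
    immutable bias = foo(k); // evaluated once instead of on every iteration
    int total = 0;
    foreach (v; data)
        total += v + bias;
    return total;
}

void main()
{
    // Both forms must agree, which is what makes the rewrite legal.
    assert(asWritten(7) == asOptimised(7));
    writeln(asWritten(7), " ", sumHoisted([1, 2, 3], 2));
}
```

Both rewrites are only valid if `foo` returns the same value for the same argument every time, which is exactly the property the original post shows can break inside 'static this()'.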
Nov 06 2010
Don <nospam nospam.com> writes:
Jonathan M Davis wrote:
 On Saturday 06 November 2010 07:42:52 Don wrote:
  That's probably doable, if we largely abandon the idea that the return
  value of a pure function can be cacheable. Which I think is a bit of a
  fanciful idea anyway.

 If they're not cacheable, what's the point of pure? I thought that that
 was the entire point.

I mean globally cacheable.

 You could avoid extra calls to the function by just re-using its
 value - at least within the current expression if not the current
 function. I quite understand avoiding caching a result for the entire
 run of the program (if nothing else, that could use up a lot of memory),

Yes. And working out if a function is worth caching is a very difficult problem. I think the overhead would be so high that it would almost never be a nett gain.

 but I thought that avoiding extra function calls was the whole point
 of pure.

The guarantee of independence is the most important feature. From a performance point of view, the big win 'pure' gives you comes from memory management. All memory allocation can be done using a thread-local memory pool.
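
To illustrate the difference between local reuse and a global cache: a global cache can be requested explicitly by the programmer instead of being chosen by the compiler. The sketch below assumes Phobos' `std.functional.memoize`; `collatzSteps` and `cachedSteps` are hypothetical names standing in for an expensive pure computation.

```d
import std.functional : memoize;

// Hypothetical stand-in for an expensive pure computation:
// the number of Collatz steps needed to reach 1 (n must be >= 1).
ulong collatzSteps(ulong n) pure
{
    ulong steps = 0;
    while (n != 1)
    {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        ++steps;
    }
    return steps;
}

// Explicit, opt-in cache keyed on the argument. The programmer, not the
// compiler, decides that this function is worth caching.
alias cachedSteps = memoize!collatzSteps;

void main()
{
    assert(cachedSteps(6) == 8); // first call computes the result
    assert(cachedSteps(6) == 8); // second call is served from the cache
}
```

The cache costs memory and a lookup on every call, which is the overhead the post above argues would rarely pay off if applied automatically.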
Nov 06 2010
Michel Fortin <michel.fortin michelf.com> writes:
On 2010-11-07 01:41:47 -0500, Don <nospam nospam.com> said:

 The guarantee of independence is the most important feature. From a 
 performance point of view, the big win 'pure' gives you comes from 
 memory management. All memory allocation can be done using a 
 thread-local memory pool.

Hum, are you sure about that? If a pure function allocates and returns a string, nothing prevents the non-pure calling function from sending that string to another thread. If that string is allocated from a thread-local pool, what happens? -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 07 2010
Don <nospam nospam.com> writes:
Michel Fortin wrote:
 On 2010-11-07 01:41:47 -0500, Don <nospam nospam.com> said:
 
 The guarantee of independence is the most important feature. From a 
 performance point of view, the big win 'pure' gives you comes from 
 memory management. All memory allocation can be done using a 
 thread-local memory pool.

 Hum, are you sure about that? If a pure function allocates and returns a string, nothing prevents the non-pure calling function from sending that string to another thread. If that string is allocated from a thread-local pool, what happens?

The return value must be (deep) copied from the memory pool as soon as a pure/impure boundary is crossed. Everything else in the memory pool should have its finalizer run, if any. Then the memory pool can be completely discarded. (No mark/sweep cycle required).
Nov 07 2010
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday 06 November 2010 07:42:52 Don wrote:
 Michel Fortin wrote:
  It seems to me that disabling pure optimizations inside 'static this()' would be enough to prevent generating bad code. It's not like pure optimizations cross function boundaries.

 That's probably doable, if we largely abandon the idea that the return value of a pure function can be cacheable. Which I think is a bit of a fanciful idea anyway.

If they're not cacheable, what's the point of pure? I thought that that was the entire point. You could avoid extra calls to the function by just re-using its value - at least within the current expression if not the current function. I quite understand avoiding caching a result for the entire run of the program (if nothing else, that could use up a lot of memory), but I thought that avoiding extra function calls was the whole point of pure. - Jonathan M Davis
Nov 06 2010
spir <denis.spir gmail.com> writes:
On Sat, 6 Nov 2010 17:41:22 -0700
Jonathan M Davis <jmdavisProg gmx.com> wrote:

  That's probably doable, if we largely abandon the idea that the return
  value of a pure function can be cacheable. Which I think is a bit of a
  fanciful idea anyway.

 If they're not cacheable, what's the point of pure? I thought that that was the
 entire point. You could avoid extra calls to the function by just re-using its
 value - at least within the current expression if not the current function. I
 quite understand avoiding caching a result for the entire run of the program (if
 nothing else, that could use up a lot of memory), but I thought that avoiding
 extra function calls was the whole point of pure.

AFAIK, the original point of "pure" has nothing to do with efficiency, but instead with human capacity of understanding. Pure functions, and more generally avoiding state, dramatically decrease the level of complexity. One can much more easily reason about a piece of code calling a pure func than a func that uses or changes state.

From this pov, ability to cache results is, say, a side-effect ;-)

There is a possible intermediate position in the context of OO: allowing a "pure method" to change the state of the very object it is called on (this). Thus, one can memoize results locally (on the object itself). I like this pov, but know of no language that applies it.

By the way, I would enjoy some information on the optimization the compiler does, or will do one day, based on function purity.

Denis
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com
Nov 07 2010
"Simen kjaeraas" <simen.kjaras gmail.com> writes:
Don <nospam nospam.com> wrote:

 Pure functions are allowed to read immutable global variables.
 [...]

This feels a bit wrong to me - the only time when that needs to be disallowed is in static constructors, so it seems like an edge case. As you say, there are benefits to this solution, but I'm not sure they outweigh the disadvantages. It all comes down to a case of 'I feel' vs. 'you feel', though. -- Simen
Nov 06 2010
Tomek Sowiński <just ask.me> writes:
Don wrote:

 Pure functions are allowed to read immutable global variables.
 Currently, this even includes globals which are initialized from inside
 'static this()'.
 Here's an example of how this can be a problem:
 
 immutable int unstable;
 
 pure int buggy() { return unstable; }
 
 static this() {
      // fails even though buggy is pure
      assert( buggy() == ( ++unstable , buggy() ) );
 }

Interesting. It looks like more of a problem with initialization of immutables, though. I think we're in need of rules governing access to immutable globals in static constructors, e.g.

module A;

immutable int immut;

static this()
{
    foo(immut);  // error, reading an uninitialized immutable
    immut = 5;   // ok, initializing immutable
    foo(immut);  // ok, reading an initialized immutable
    immut = 4;   // error, mutating an initialized immutable
}

In the real world outside static this() the approach is healthy -- mutating an immutable variable is allowed only in a very narrow time-slice just to allow publishing. The post-init use of the immutable is constrained by the compiler armed with a const-aware type system. The pre-init time is also protected because there is no symbol representing an uninitialized immutable variable others can refer to (it doesn't exist before initialization). But inside static constructors immutables can be mutated at will, and that's where the problems come from.

-- Tomek
Nov 06 2010
Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 06/11/2010 01:32, Don wrote:
 Pure functions are allowed to read immutable global variables.
 [...]

Hum, another nice catch Don! I'm not sure I agree with the resolution though (other than perhaps as a temporary limitation, if the alternative is too complex to implement in the compiler).

I mean, looking at this conceptually, there is nothing intrinsically wrong with the 'buggy' function being pure; the real problem here is that a variable is accessed before it is /properly initialized/. 'buggy' would be fine to use afterwards. This is not really specific to pure functions: the same problem can happen with regular functions that have an immutable changing underneath them. :S It is also not specific to static constructors: an identical issue can happen with class (non-static) constructors.

From this, it seems the ideal solution would be to have the compiler analyze the code, see if any such uninitialized access occurs, and issue a compiler error if it does. However, this may be too complex to implement, so maybe we need some simpler rules, with perhaps some language restrictions on what can be done in static and non-static constructors. Doesn't TDPL mention something related to this? Because otherwise this is a very big safety hazard for immutability.

-- Bruno Medeiros - Software Engineer
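
The class-constructor variant of the problem can be sketched as follows. This is a hedged illustration only: `Config`, `limit`, `seenBeforeInit` and `readLimit` are hypothetical names, and the sketch assumes the compiler accepts a member-function call before the immutable field has been assigned, which is the behaviour the post describes.

```d
class Config
{
    immutable int limit;  // assigned exactly once, inside the constructor
    int seenBeforeInit;   // records what a pure method saw too early

    this()
    {
        // limit still holds int.init (0) here: the pure method observes
        // an "immutable" value that is about to change.
        seenBeforeInit = readLimit();
        limit = 5;
    }

    // Pure and const: reads nothing but the immutable field.
    int readLimit() const pure
    {
        return limit;
    }
}

void main()
{
    auto c = new Config;
    assert(c.seenBeforeInit == 0); // value observed before initialization
    assert(c.readLimit() == 5);    // value observed afterwards
}
```

The same pure method returns two different results for the same object, with no 'static this()' involved, so the underlying issue is use before initialization rather than purity as such.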
Nov 29 2010
Don <nospam nospam.com> writes:
Bruno Medeiros wrote:
 [...]
 From this, it seems the ideal solution would be to have the compiler analyze the code, and see if any such uninitialized access occurs, and issue a compiler error if it does. However, this may be too complex to implement, so maybe we need some simpler rules, with perhaps some language restrictions on what can be done in static and non-static constructors.
 [...]

I think I can implement it. From inside a static this(), pure functions should never be optimised as if they were pure. When this is done, we're just left with standard cases of 'variable is used before being initialized', and nothing specific to pure.
Dec 02 2010
Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 02/12/2010 08:39, Don wrote:
 I think I can implement it. From inside a static this(), pure functions should never be optimised as if they were pure. When this is done, we're just left with standard cases of 'variable is used before being initialized', and nothing specific to pure.

Ok, but still, should the language specify the validity of those cases of 'variable is used before being initialized'? Should we strive for such a change? -- Bruno Medeiros - Software Engineer
Dec 03 2010
Steve Teale <steve.teale britseyeview.com> writes:
Bruno Medeiros Wrote:

 On 02/12/2010 08:39, Don wrote:
 Bruno Medeiros wrote:
 On 06/11/2010 01:32, Don wrote:
 Pure functions are allowed to read immutable global variables.




At the risk of becoming a troll. In some interpretations of Shiite sharia law you can be stoned to death for reading an immutable global variable! Steve
Dec 03 2010