www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Forcing weak-pure

reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
There has been an increased effort as of late to make as much as possible  
inside the runtime  safe, pure, and nothrow. The goal is to make it so  
it's not as nasty writing such code, because the library will allow it.  
Everything that we mark properly down below will allow higher-level  
constructs to be marked as well.

Of course naturally, with all things this low-level, we have to make  
decisions as to what is *provably*  safe/pure/nothrow, and what is  
*logically*  safe/pure/nothrow. The most obvious example is memory  
allocation -- It is technically not pure as it accesses global state (the  
heap), but we can rationalize making it pure since we have more logical  
knowledge of the situation than the compiler (i.e. locking the global  
heap, and realizing that if a block is unallocated, it is uniquely  
accessed with the allocation of it).

One issue, however, is modification of heap metadata for immutable data.  
The D compiler has the (correct) rationalization that for a pure function  
that has only immutable or implicitly convertible to immutable parameters,  
it is strong-pure. This means, it can optimize away calls to such  
functions. But in the case of modifying heap metadata, the call is  
actually weak-pure, since it is making modifications that you *don't* want  
to optimize away.

For this reason, functions like GC.setAttr and assumeSafeAppend cannot be  
marked pure. For example:

auto str = "hello".idup;
str = str[0..1];
str.assumeSafeAppend();
str ~= "iya";

The compiler could rationally elide the call to assumeSafeAppend if it is  
pure. We are not using the return value, and the only parameter is  
immutable. Since pure functions technically have no side effects, this  
call can be eliminated. A recent compiler change made such calls warnings  
(not using result of strong-pure function). But assumeSafeAppend really  
should be weak-pure, because it does have an effect. In essence, you are  
technically passing to assumeSafeAppend a pointer to the block that  
contains the slice, not the slice itself. And that block is mutable.

GC.setAttr has similar issues.

How can we force these to be weak-pure? One option suggested is to add a  
hidden void * parameter that defaults to null, to force the issue. But I  
think future compilers may be smart enough to realize that can also be a  
strong-pure call.

Another possibility is to pass the array/pointer by reference. This causes  
its own problems with rvalues, and does not help when the array/pointer  
itself is immutable.

Do we need a way (maybe a pragma) to make a function inside druntime  
"forced weak" pure? Is there an option any of you wonderful geniuses out  
there can figure out without changes to the compiler? :)

Some reference material:

https://github.com/D-Programming-Language/druntime/pull/553
https://github.com/D-Programming-Language/druntime/pull/747
https://github.com/D-Programming-Language/phobos/pull/2025
https://github.com/D-Programming-Language/phobos/pull/2046

-Steve
Mar 25 2014
next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 25 March 2014 at 13:30:04 UTC, Steven Schveighoffer 
wrote:
 How can we force these to be weak-pure? One option suggested is 
 to add a hidden void * parameter that defaults to null, to 
 force the issue. But I think future compilers may be smart 
 enough to realize that can also be a strong-pure call.

Maybe instead of using null, we could use global junk? //Global junk struct WeakPure{int a;} __gshared WeakPure dummyWeakPure; //Signature void weakPureFun(WeakPure* p = &dummyWeakPure) {} //Useage weakPureFun(); AFAIK, the compiler should NOT be able to infer strong purity here. I don't know about performance effects though.
Mar 25 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 25 Mar 2014 10:08:20 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Tuesday, 25 March 2014 at 13:30:04 UTC, Steven Schveighoffer wrote:
 How can we force these to be weak-pure? One option suggested is to add  
 a hidden void * parameter that defaults to null, to force the issue.  
 But I think future compilers may be smart enough to realize that can  
 also be a strong-pure call.

Maybe instead of using null, we could use global junk? //Global junk struct WeakPure{int a;} __gshared WeakPure dummyWeakPure; //Signature void weakPureFun(WeakPure* p = &dummyWeakPure) {} //Useage weakPureFun(); AFAIK, the compiler should NOT be able to infer strong purity here. I don't know about performance effects though.

I think again, a sufficiently intelligent compiler could see that dummyWeakPure would not be used in the template, we likely would have to pass it to the underlying opaque function. I'd really like to avoid this option, if it's not going to be optimized down to the minimal case. -Steve
Mar 25 2014
prev sibling next sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 03/25/14 14:30, Steven Schveighoffer wrote:
 [...] functions like GC.setAttr and assumeSafeAppend cannot be marked pure.
For example:
 
 auto str = "hello".idup;
 str = str[0..1];
 str.assumeSafeAppend();
 str ~= "iya";
 
 The compiler could rationally elide the call to assumeSafeAppend if it is
pure. We are not using the return value, and the only parameter is immutable.
Since pure functions technically have no side effects, this call can be
eliminated. A recent compiler change made such calls warnings (not using result
of strong-pure function). But assumeSafeAppend really should be weak-pure,
because it does have an effect. In essence, you are technically passing to
assumeSafeAppend a pointer to the block that contains the slice, not the slice
itself. And that block is mutable.
 
 GC.setAttr has similar issues.
 
 How can we force these to be weak-pure?

Functions returning 'void' and w/o mutable args cannot be logically pure, as long as they aren't no-ops, obviously. While this property could be used to render them "weak-pure" in d-speak, this (or any other approach to marking them as such) would not be enough... // assuming 'assumeSafeAppend()' is "weak-pure": string f(string s) pure { s.assumeSafeAppend(); s ~= "d"; return s; } string a = "abc".idup, b = f(a[0..2]); artur
Mar 25 2014
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 25/03/14 15:08, monarch_dodra wrote:
 Maybe instead of using null, we could use global junk?

That really seems like a nasty workaround which could break at any moment depending on how the compiler is tweaked. :-( Better surely to have an explicit indication that a function may not be assumed to be strongly pure?
Mar 25 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 25 Mar 2014 14:49:27 -0400, Artur Skawina <art.08.09 gmail.com>  
wrote:

 On 03/25/14 14:30, Steven Schveighoffer wrote:
 [...] functions like GC.setAttr and assumeSafeAppend cannot be marked  
 pure. For example:

 auto str = "hello".idup;
 str = str[0..1];
 str.assumeSafeAppend();
 str ~= "iya";

 The compiler could rationally elide the call to assumeSafeAppend if it  
 is pure. We are not using the return value, and the only parameter is  
 immutable. Since pure functions technically have no side effects, this  
 call can be eliminated. A recent compiler change made such calls  
 warnings (not using result of strong-pure function). But  
 assumeSafeAppend really should be weak-pure, because it does have an  
 effect. In essence, you are technically passing to assumeSafeAppend a  
 pointer to the block that contains the slice, not the slice itself. And  
 that block is mutable.

 GC.setAttr has similar issues.

 How can we force these to be weak-pure?

Functions returning 'void' and w/o mutable args cannot be logically pure, as long as they aren't no-ops, obviously. While this property could be used to render them "weak-pure" in d-speak, this (or any other approach to marking them as such) would not be enough... // assuming 'assumeSafeAppend()' is "weak-pure": string f(string s) pure { s.assumeSafeAppend(); s ~= "d"; return s; } string a = "abc".idup, b = f(a[0..2]);

This could not be elided, because you are using the return value. It would have to be called at least once. However, I am OK with it being elided for a second call with the same input. In other words, this strong pure function does not have the same issues. But I get what you are saying -- just wrap it again, and we're back to the same issue. It is a systematic problem, because the concept is that you are not passing mutable references, even though you are then using those references to mark global data. There is no really good answer I suppose. The huge downside of not marking assumeSafeAppend as weak-pure is that one cannot use assumeSafeAppend or set attributes inside pure functions that deal with arrays with immutable data. For example, the Appender type cannot be marked pure for immutable data. -Steve
Mar 25 2014
prev sibling next sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 03/25/14 21:51, Steven Schveighoffer wrote:
 On Tue, 25 Mar 2014 14:49:27 -0400, Artur Skawina <art.08.09 gmail.com> wrote:
 
 On 03/25/14 14:30, Steven Schveighoffer wrote:
 [...] functions like GC.setAttr and assumeSafeAppend cannot be marked pure.
For example:

 auto str = "hello".idup;
 str = str[0..1];
 str.assumeSafeAppend();
 str ~= "iya";

 The compiler could rationally elide the call to assumeSafeAppend if it is
pure. We are not using the return value, and the only parameter is immutable.
Since pure functions technically have no side effects, this call can be
eliminated. A recent compiler change made such calls warnings (not using result
of strong-pure function). But assumeSafeAppend really should be weak-pure,
because it does have an effect. In essence, you are technically passing to
assumeSafeAppend a pointer to the block that contains the slice, not the slice
itself. And that block is mutable.

 GC.setAttr has similar issues.

 How can we force these to be weak-pure?

Functions returning 'void' and w/o mutable args cannot be logically pure, as long as they aren't no-ops, obviously. While this property could be used to render them "weak-pure" in d-speak, this (or any other approach to marking them as such) would not be enough... // assuming 'assumeSafeAppend()' is "weak-pure": string f(string s) pure { s.assumeSafeAppend(); s ~= "d"; return s; } string a = "abc".idup, b = f(a[0..2]);

This could not be elided, because you are using the return value. It would have to be called at least once. However, I am OK with it being elided for a second call with the same input. In other words, this strong pure function does not have the same issues.

It's ok to treat allocator and factory functions as pure, because those really are logically pure, ie can't affect any /visible/ state and return results that are unique. 'assumeSafeAppend()' does have side effects - it can lead to corruption of other, not directly related, data. The only thing that prevents this is a declaration by the programmer -- "Trust me, I own this data and there are no other live aliases". The problem is that this declaration would now happen /implicitly when calling a (strongly) pure function/. Note that the 'a' string in the above example could no longer be "abc" after 'f()' runs. The "pure" concept (re-)definition in D is bad enough even without having to worry about supposedly pure functions stomping on unrelated data.
 But I get what you are saying -- just wrap it again, and we're back to the
same issue. It is a systematic problem, because the concept is that you are not
passing mutable references, even though you are then using those references to
mark global data.
 
 There is no really good answer I suppose. The huge downside of not marking
assumeSafeAppend as weak-pure is that one cannot use assumeSafeAppend or set
attributes inside pure functions that deal with arrays with immutable data. For
example, the Appender type cannot be marked pure for immutable data.

If there was a version of assumeSafeAppend() marked as pure, then that one could be used for such cases, where no aliases are allowed to escape before an ownership transfer happens. It's much easier to audit a contained implementation (eg Appender) than to check the whole program and every caller of every function which relies on a potentially unsafe assumption. If 'assumeSafeAppend()' isn't pure /by default/ then D's purity guarantees are at least a little bit stronger. Or maybe such a change (marking unsafe code as pure and hoping that the programmer knows exactly what (s)he's doing) wouldn't really make the situation significantly worse than it already is - I'm not sure... artur
Mar 25 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 25 Mar 2014 19:30:04 -0400, Artur Skawina <art.08.09 gmail.com>  
wrote:

 On 03/25/14 21:51, Steven Schveighoffer wrote:
 On Tue, 25 Mar 2014 14:49:27 -0400, Artur Skawina <art.08.09 gmail.com>  
 wrote:

 On 03/25/14 14:30, Steven Schveighoffer wrote:
 [...] functions like GC.setAttr and assumeSafeAppend cannot be marked  
 pure. For example:

 auto str = "hello".idup;
 str = str[0..1];
 str.assumeSafeAppend();
 str ~= "iya";

 The compiler could rationally elide the call to assumeSafeAppend if  
 it is pure. We are not using the return value, and the only parameter  
 is immutable. Since pure functions technically have no side effects,  
 this call can be eliminated. A recent compiler change made such calls  
 warnings (not using result of strong-pure function). But  
 assumeSafeAppend really should be weak-pure, because it does have an  
 effect. In essence, you are technically passing to assumeSafeAppend a  
 pointer to the block that contains the slice, not the slice itself.  
 And that block is mutable.

 GC.setAttr has similar issues.

 How can we force these to be weak-pure?

Functions returning 'void' and w/o mutable args cannot be logically pure, as long as they aren't no-ops, obviously. While this property could be used to render them "weak-pure" in d-speak, this (or any other approach to marking them as such) would not be enough... // assuming 'assumeSafeAppend()' is "weak-pure": string f(string s) pure { s.assumeSafeAppend(); s ~= "d"; return s; } string a = "abc".idup, b = f(a[0..2]);

This could not be elided, because you are using the return value. It would have to be called at least once. However, I am OK with it being elided for a second call with the same input. In other words, this strong pure function does not have the same issues.

It's ok to treat allocator and factory functions as pure, because those really are logically pure, ie can't affect any /visible/ state and return results that are unique. 'assumeSafeAppend()' does have side effects - it can lead to corruption of other, not directly related, data. The only thing that prevents this is a declaration by the programmer -- "Trust me, I own this data and there are no other live aliases". The problem is that this declaration would now happen /implicitly when calling a (strongly) pure function/. Note that the 'a' string in the above example could no longer be "abc" after 'f()' runs. The "pure" concept (re-)definition in D is bad enough even without having to worry about supposedly pure functions stomping on unrelated data.

You what I wasn't thinking of? parallelization of pure function calls! Right there, that kills this whole idea. In fact, I think at this point, any operation to immutable arrays that affects the metadata for that block, including setting length and appending, must be by definition impure. This is not a good turn of events... -Steve
Mar 25 2014
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Tuesday, 25 March 2014 at 23:30:28 UTC, Artur Skawina wrote:
 It's ok to treat allocator and factory functions as pure, 
 because those really
 are logically pure, ie can't affect any /visible/ state and 
 return results that
 are unique.

Allocators don't return unique results, GC being the primary example.
Mar 26 2014
prev sibling next sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 03/26/14 15:36, Kagamin wrote:
 On Tuesday, 25 March 2014 at 23:30:28 UTC, Artur Skawina wrote:
 It's ok to treat allocator and factory functions as pure, because those really
 are logically pure, ie can't affect any /visible/ state and return results that
 are unique.

Allocators don't return unique results, GC being the primary example.

I'm not really sure what your point is. I was saying that "allocator functions", ie functions that return newly allocated objects, can be treated as pure as long they do not have any other visible side effects and the result is "unique", ie does not have any aliases. This will be true in many cases. The /implementation/ won't likely be pure, but from the caller's POV this is irrelevant. A way to make those functions appear pure would allow for more pure /callers/; that's why Steven was asking for a way to "force" purity. Even C compilers are treating 'malloc' etc specially, exactly because they can assume that nothing aliases the returned object. artur
Mar 26 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 26 Mar 2014 10:36:45 -0400, Kagamin <spam here.lot> wrote:

 On Tuesday, 25 March 2014 at 23:30:28 UTC, Artur Skawina wrote:
 It's ok to treat allocator and factory functions as pure, because those  
 really
 are logically pure, ie can't affect any /visible/ state and return  
 results that
 are unique.

Allocators don't return unique results, GC being the primary example.

I think you misunderstand. The result is uniquely referenced, meaning no other pointers to the data are present (except the all-seeing heap). Multiple calls to the allocator may return different results, and each result is uniquely referenced. -Steve
Mar 26 2014
prev sibling next sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 03/25/14 14:30, Steven Schveighoffer wrote:
 Of course naturally, with all things this low-level, we have to make decisions
as to what is *provably*  safe/pure/nothrow, and what is *logically*
 safe/pure/nothrow. The most obvious example is memory allocation -- It is
technically not pure as it accesses global state (the heap), but we can
rationalize making it pure since we have more logical knowledge of the
situation than the compiler (i.e. locking the global heap, and realizing that
if a block is unallocated, it is uniquely accessed with the allocation of it).

On 03/26/14 00:30, Artur Skawina wrote:
 It's ok to treat allocator and factory functions as pure, because those really
 are logically pure, ie can't affect any /visible/ state and return results that
 are unique.

That was wrong; treating these funcs as pure won't work. While allocator functions have some traits of pure ones -- no side-effects, their result only depends on the args, and the call can be optimized out when the result in unused -- there is one difference, which isn't expressible in D: the uniqueness of the result. Letting them be pure would mean that important purity based optimization could not be allowed (so that "S* s1 = makeS(32), s2 = makeS(32);" does not break). But this case is easy to deal with -- it's enough to (1) mark the alloc funcs with a 'malloc' attribute, (2) allow calling malloc functions inside pure ones. artur
Apr 05 2014
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 05 Apr 2014 08:26:01 -0400, Artur Skawina <art.08.09 gmail.com>  
wrote:
 On 03/26/14 00:30, Artur Skawina wrote:
 It's ok to treat allocator and factory functions as pure, because those  
 really
 are logically pure, ie can't affect any /visible/ state and return  
 results that
 are unique.

That was wrong; treating these funcs as pure won't work. While allocator functions have some traits of pure ones -- no side-effects, their result only depends on the args, and the call can be optimized out when the result in unused -- there is one difference, which isn't expressible in D: the uniqueness of the result. Letting them be pure would mean that important purity based optimization could not be allowed (so that "S* s1 = makeS(32), s2 = makeS(32);" does not break).

I think this still works, even if makeS is pure, since it would be weak-pure. -Steve
Apr 07 2014