digitalmars.D - assumeSafeAppend and purity

reply Jonathan M Davis <jmdavisProg gmx.com> writes:
At present, assumeSafeAppend isn't pure - nor is capacity or reserve. AFAIK, 
none of them access any global variables aside from GC-related stuff (and new 
is already allowed in pure functions). All it would take to make them pure is 
to mark the declarations for the C functions that they call pure (and those 
functions aren't part of the public API) and then mark them as pure. Is there 
any reason why this would be a _bad_ idea?

Appender runs into similar difficulties. Would it make sense to just mark the 
various memory-related functions in core.memory as pure (or at least some 
subset of them)? The fact that they aren't in spite of the fact that they 
involve memory like new does really makes it hard to both use pure and 
optimize code in a number of cases - especially when dealing with arrays. We 
might also want to just mark malloc as pure for the same reason.

What are the downsides to doing this? It probably wouldn't be a good idea to 
mark functions like free pure, but the ones that involve allocating or 
reallocating memory seem like good candidates for it.

- Jonathan M Davis
Feb 06 2012
next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tuesday, 7 February 2012 at 01:47:12 UTC, Jonathan M Davis 
wrote:
 At present, assumeSafeAppend isn't pure - nor is capacity or 
 reserve. AFAIK, none of them access any global variables aside 
 from GC-related stuff (and new is already allowed in pure 
 functions). All it would take to make them pure is to mark the 
 declarations for the C functions that they call pure (and those 
 functions aren't part of the public API) and then mark them as 
 pure. Is there any reason why this would be a _bad_ idea?

pure void f(const(int)[] arr)
{
    debug /* bypass purity check to pretend assumeSafeAppend is pure */
    {
        assumeSafeAppend(arr);
    }
    arr ~= 42;
}

void main()
{
    int[] arr = [0, 1, 2, 3, 4];
    f(arr[1..$-1]);
    assert(arr[4] == 4, "f has a side effect");
}
Feb 06 2012
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, February 07, 2012 02:54:40 Vladimir Panteleev wrote:
 On Tuesday, 7 February 2012 at 01:47:12 UTC, Jonathan M Davis
 
 wrote:
 At present, assumeSafeAppend isn't pure - nor is capacity or
 reserve. AFAIK, none of them access any global variables aside
 from GC-related stuff (and new is already allowed in pure
 functions). All it would take to make them pure is to mark the
 declarations for the C functions that they call pure (and those
 functions aren't part of the public API) and then mark them as
 pure. Is there any reason why this would be a _bad_ idea?
 
 pure void f(const(int)[] arr)
 {
     debug /* bypass purity check to pretend assumeSafeAppend is pure */
     {
         assumeSafeAppend(arr);
     }
     arr ~= 42;
 }
 
 void main()
 {
     int[] arr = [0, 1, 2, 3, 4];
     f(arr[1..$-1]);
     assert(arr[4] == 4, "f has a side effect");
 }

Except that assumeSafeAppend was misused. It's dangerous to use when you don't
use it properly, regardless of purity. By its very nature, it can screw stuff
up.

The problem is what to do when you use it _correctly_ and want to use it in a
pure function. If used properly, aside from avoiding potential reallocations,
assumeSafeAppend has no effect. Should it be made pure, because as long as
you're using it properly it's not a problem (and it's always a problem if you
misuse it, regardless of purity)? Or should the caller be forced to cast it to
pure to use it in a pure function?

Given how ugly having to deal with the casting is and the fact that misusing
assumeSafeAppend results in very broken code anyway, I'd be inclined to just
mark it as pure.

- Jonathan M Davis
Feb 06 2012
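
As an illustration of the "used properly" case described above, here is a minimal sketch modelled on the example in the druntime documentation for assumeSafeAppend. The function name appendsInPlace is hypothetical and exists only for this illustration. It compares appending to a shortened slice with and without the call; the call is only legitimate here because nothing else needs the discarded element.

```d
// Returns true when appending to a shortened slice reused the original
// memory block instead of reallocating.
bool appendsInPlace(bool callAssumeSafeAppend)
{
    int[] a = [1, 2, 3, 4];
    int[] b = a[0 .. 3];          // a[3] now lies just past the end of b

    if (callAssumeSafeAppend)
        b.assumeSafeAppend();     // promise: nobody needs a[3] any more

    b ~= 5;                       // overwrites a[3] in place, or reallocates
    return a.ptr is b.ptr;
}
```

Calling appendsInPlace(false) and appendsInPlace(true) from main shows the only observable difference in correct code: whether the append had to reallocate.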
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tuesday, 7 February 2012 at 02:02:22 UTC, Jonathan M Davis 
wrote:
 On Tuesday, February 07, 2012 02:54:40 Vladimir Panteleev wrote:
 On Tuesday, 7 February 2012 at 01:47:12 UTC, Jonathan M Davis
 
 wrote:
 At present, assumeSafeAppend isn't pure - nor is capacity or
 reserve. AFAIK, none of them access any global variables aside
 from GC-related stuff (and new is already allowed in pure
 functions). All it would take to make them pure is to mark the
 declarations for the C functions that they call pure (and those
 functions aren't part of the public API) and then mark them as
 pure. Is there any reason why this would be a _bad_ idea?
 
 pure void f(const(int)[] arr)
 {
     debug /* bypass purity check to pretend assumeSafeAppend is pure */
     {
         assumeSafeAppend(arr);
     }
     arr ~= 42;
 }
 
 void main()
 {
     int[] arr = [0, 1, 2, 3, 4];
     f(arr[1..$-1]);
     assert(arr[4] == 4, "f has a side effect");
 }
 
 Except that assumeSafeAppend was misused. It's dangerous to use
 when you don't use it properly regardless of purity. By its very
 nature, it can screw stuff up.

When reviewing safe or pure code, there is inevitably a list of language
features that reviewers need to be aware of as bypassing the guarantees that
said language features provide, for example assumeUnique, calling trusted
functions, or faux-pure functions which may lead to side effects. It's a
question of how big we want to let this list grow.

The situation where assumeSafeAppend may be misused due to a bug, but the
source of the bug is "hidden out of sight" because it happens inside a pure
function, is imaginable. Personally, I never use assumeSafeAppend often enough
to justify a potential headache later on.
Feb 06 2012
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 06 Feb 2012 21:18:21 -0500, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 On Tuesday, 7 February 2012 at 02:02:22 UTC, Jonathan M Davis wrote:
 On Tuesday, February 07, 2012 02:54:40 Vladimir Panteleev wrote:
 On Tuesday, 7 February 2012 at 01:47:12 UTC, Jonathan M Davis
  wrote:
 At present, assumeSafeAppend isn't pure - nor is capacity or
 reserve. AFAIK, none of them access any global variables aside
 from GC-related stuff (and new is already allowed in pure
 functions). All it would take to make them pure is to mark the
 declarations for the C functions that they call pure (and those
 functions aren't part of the public API) and then mark them as
 pure. Is there any reason why this would be a _bad_ idea?
 
 pure void f(const(int)[] arr)
 {
     debug /* bypass purity check to pretend assumeSafeAppend is pure */
     {
         assumeSafeAppend(arr);
     }
     arr ~= 42;
 }
 
 void main()
 {
     int[] arr = [0, 1, 2, 3, 4];
     f(arr[1..$-1]);
     assert(arr[4] == 4, "f has a side effect");
 }
 
 Except that assumeSafeAppend was misused. It's dangerous to use
 when you don't use it properly regardless of purity. By its very
 nature, it can screw stuff up.
 
 When reviewing safe or pure code, there is inevitably a list of
 language features that reviewers need to be aware of as bypassing
 the guarantees that said language features provide, for example
 assumeUnique, calling trusted functions, or faux-pure functions
 which may lead to side effects. It's a question of how big do we
 want to let this list grow. The situation where assumeSafeAppend
 may be misused due to a bug, but the source of the bug is "hidden
 out of sight" because it happens inside a pure function, is
 imaginable. Personally, I never use assumeSafeAppend often enough
 to justify a potential headache later on.

By the definition of assumeSafeAppend, using it and then using data in the
now 'unallocated' space results in undefined behavior. It should definitely
not be marked @safe or @trusted, but pure should be ok.

You can also do this in a pure function without issue:

pure void crap(int *data) {*--data = 5;}

Which might or might not be valid, depending on the context. @safe != pure,
and at some point, even compiler guarantees cannot guarantee validity.

At the very least, however, reserve and capacity should be pure.

-Steve
Feb 06 2012
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 06 Feb 2012 21:28:49 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Mon, 06 Feb 2012 21:18:21 -0500, Vladimir Panteleev  
 <vladimir thecybershadow.net> wrote:

 On Tuesday, 7 February 2012 at 02:02:22 UTC, Jonathan M Davis wrote:
 On Tuesday, February 07, 2012 02:54:40 Vladimir Panteleev wrote:
 On Tuesday, 7 February 2012 at 01:47:12 UTC, Jonathan M Davis
  wrote:
 At present, assumeSafeAppend isn't pure - nor is capacity or
 reserve. AFAIK, none of them access any global variables aside
 from GC-related stuff (and new is already allowed in pure
 functions). All it would take to make them pure is to mark the
 declarations for the C functions that they call pure (and those
 functions aren't part of the public API) and then mark them as
 pure. Is there any reason why this would be a _bad_ idea?
 
 pure void f(const(int)[] arr)
 {
     debug /* bypass purity check to pretend assumeSafeAppend is pure */
     {
         assumeSafeAppend(arr);
     }
     arr ~= 42;
 }
 
 void main()
 {
     int[] arr = [0, 1, 2, 3, 4];
     f(arr[1..$-1]);
     assert(arr[4] == 4, "f has a side effect");
 }
 
 Except that assumeSafeAppend was misused. It's dangerous to use
 when you don't use it properly regardless of purity. By its very
 nature, it can screw stuff up.
 
 When reviewing safe or pure code, there is inevitably a list of
 language features that reviewers need to be aware of as bypassing
 the guarantees that said language features provide, for example
 assumeUnique, calling trusted functions, or faux-pure functions
 which may lead to side effects. It's a question of how big do we
 want to let this list grow. The situation where assumeSafeAppend
 may be misused due to a bug, but the source of the bug is "hidden
 out of sight" because it happens inside a pure function, is
 imaginable. Personally, I never use assumeSafeAppend often enough
 to justify a potential headache later on.
 
 by the definition of assumeSafeAppend, using it, and then using
 data in the now 'unallocated' space results in undefined behavior.
 It should definitely not be marked safe or trusted, but pure
 should be ok.

I thought of a better solution:

pure T[] pureSafeShrink(T)(ref T[] arr, size_t maxLength)
{
    if(maxLength < arr.length)
    {
        bool safeToShrink = (arr.capacity == arr.length);
        arr = arr[0..maxLength];
        if(safeToShrink) arr.assumeSafeAppend(); // must workaround purity here
    }
    return arr;
}

This guarantees that you only affect data you were passed.

-Steve
Feb 06 2012
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, February 06, 2012 21:40:55 Steven Schveighoffer wrote:
 I thought of a better solution:
 
 pure T[] pureSafeShrink(T)(ref T[] arr, size_t maxLength)
 {
     if(maxLength < arr.length)
     {
         bool safeToShrink = (arr.capacity == arr.length);
         arr = arr[0..maxLength];
         if(safeToShrink) arr.assumeSafeAppend(); // must workaround purity here
     }
     return arr;
 }
 
 This guarantees that you only affect data you were passed.

Does it really? What if I did this:

    auto arr = new int[](63);
    auto saved = arr;
    assert(arr.capacity == 63);
    assert(saved.capacity == 63);
    pureSafeShrink(arr, 0);

This happens to pass on my computer, though the exact value required for the
length will probably vary. So, a slice of the data which is now supposed to be
no longer part of any array still exists.

Also, allocating a new array and then immediately trying to shrink it with
pureSafeShrink will only use assumeSafeAppend if you just so happen to have
picked a length that lines up with the block size allocated, which makes it
pretty much useless IMHO. I'm only going to use assumeSafeAppend if I _know_
that it's safe. pureSafeShrink is therefore trying to protect me when I don't
need it and is ruining the guarantees that assumeSafeAppend gives me, since
it's only better than

    arr = arr[0 .. maxLength];

if the array just so happens to have the same length as its capacity. So, I
don't think that this function really buys us anything. I'm inclined to just
make assumeSafeAppend pure.

- Jonathan M Davis
Feb 06 2012
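
To see how rarely a freshly allocated array's capacity equals its length, the following small sketch prints both for a few sizes. The helper names freshCapacity and showCapacities are hypothetical, and the numbers printed depend on the GC's block sizes and bookkeeping, so no particular output is assumed here.

```d
import std.stdio : writefln;

// Capacity reported for a newly GC-allocated int array of n elements.
size_t freshCapacity(size_t n)
{
    auto arr = new int[](n);
    return arr.capacity;
}

// Prints length, capacity, and whether the two happen to coincide.
void showCapacities()
{
    foreach (size_t n; [1, 3, 7, 15, 16, 63, 64])
    {
        immutable cap = freshCapacity(n);
        writefln("length %3s  capacity %3s  equal: %s", n, cap, cap == n);
    }
}
```

Only the lengths that exactly fill a block would take the assumeSafeAppend path under the capacity == length test.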
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 07 Feb 2012 00:35:14 -0500, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Monday, February 06, 2012 21:40:55 Steven Schveighoffer wrote:
 I thought of a better solution:

 pure T[] pureSafeShrink(T)(ref T[] arr, size_t maxLength)
 {
     if(maxLength < arr.length)
     {
         bool safeToShrink = (arr.capacity == arr.length);
         arr = arr[0..maxLength];
         if(safeToShrink) arr.assumeSafeAppend(); // must workaround purity here
     }
     return arr;
 }

 This guarantees that you only affect data you were passed.
 
 Does it really? What if I did this:
 
     auto arr = new int[](63);
     auto saved = arr;
     assert(arr.capacity == 63);
     assert(saved.capacity == 63);
     pureSafeShrink(arr, 0);
 
 This happens to pass on my computer, though the exact value required
 for the length will probably vary. So, a slice of the data which is
 now supposed to be no longer part of any array still exists.

There is a difference between this and the example given by Vladimir. In
Vladimir's example, you are passed an array slice of elements 0-3, but the
assumeSafeAppend affects element 4. This violates the spirit of pure having no
side effects, even if it is technically sound. I still am undecided as to
whether assumeSafeAppend should be pure or not.

In this case, the function will only affect array elements that it is passed.
The fact that you changed something in data you were passed does not violate
pure rules.

However, I think my test is too strict; it actually should be
arr.capacity != 0. This means that the array ends at the end of valid data (no
valid data exists beyond the array).

-Steve
Feb 08 2012
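
A minimal sketch of the shrink helper with the relaxed test (arr.capacity != 0) might look like the following. It is an illustration rather than the code proposed in the thread: the name safeShrink is hypothetical, and the pure attribute is deliberately left off so that no purity workaround is needed.

```d
// Shrinks arr to at most maxLength elements. The freed tail is marked
// reusable only when the slice ended exactly where the block's valid data
// ended, which is what a nonzero capacity indicates.
T[] safeShrink(T)(ref T[] arr, size_t maxLength)
{
    if (maxLength < arr.length)
    {
        immutable endsAtValidData = arr.capacity != 0;
        arr = arr[0 .. maxLength];
        if (endsAtValidData)
            arr.assumeSafeAppend();
    }
    return arr;
}
```

With this test, a slice taken from the middle of an array (capacity 0) is simply shortened, so a later append reallocates and leaves the elements beyond it untouched. A slice that reaches the end of the valid data can be appended to in place after shrinking, which still overwrites elements visible through any other slice of the same data, as in the saved example quoted above.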
prev sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tuesday, 7 February 2012 at 01:47:12 UTC, Jonathan M Davis 
wrote:
 At present, assumeSafeAppend isn't pure - nor is capacity or 
 reserve. AFAIK, none of them access any global variables aside 
 from GC-related stuff (and new is already allowed in pure 
 functions). All it would take to make them pure is to mark the 
 declarations for the C functions that they call pure (and those 
 functions aren't part of the public API) and then mark them as 
 pure. Is there any reason why this would be a _bad_ idea?
If precedent means anything, assumeUnique is pure.
Feb 06 2012
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 06 Feb 2012 21:32:05 -0500, Vladimir Panteleev  
<vladimir thecybershadow.net> wrote:

 On Tuesday, 7 February 2012 at 01:47:12 UTC, Jonathan M Davis wrote:
 At present, assumeSafeAppend isn't pure - nor is capacity or reserve.  
 AFAIK, none of them access any global variables aside from GC-related  
 stuff (and new is already allowed in pure functions). All it would take  
 to make them pure is to mark the declarations for the C functions that  
 they call pure (and those functions aren't part of the public API) and  
 then mark them as pure. Is there any reason why this would be a _bad_  
 idea?
 
 If precedent means anything, assumeUnique is pure.

I think there is a difference -- assumeSafeAppend can make invalid data that
you did not pass to the pure function. I'm still not sure if it's pure's job
to protect data that you can get to via pointer arithmetic, but I think the two
cases are different.

-Steve
Feb 06 2012
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 02/07/2012 03:48 AM, Steven Schveighoffer wrote:
 On Mon, 06 Feb 2012 21:32:05 -0500, Vladimir Panteleev
 <vladimir thecybershadow.net> wrote:

 On Tuesday, 7 February 2012 at 01:47:12 UTC, Jonathan M Davis wrote:
 At present, assumeSafeAppend isn't pure - nor is capacity or reserve.
 AFAIK, none of them access any global variables aside from GC-related
 stuff (and new is already allowed in pure functions). All it would
 take to make them pure is to mark the declarations for the C
 functions that they call pure (and those functions aren't part of the
 public API) and then mark them as pure. Is there any reason why this
 would be a _bad_ idea?
 
 If precedent means anything, assumeUnique is pure.
 
 I think there is a difference -- assumeSafeAppend can make invalid data
 that you did not pass to the pure function. I'm still not sure if it's
 pure's job to protect data that you can get to via pointer arithmetic,
 but I think the two cases are different.
 
 -Steve

I think both cases are kinda equivalent, but you are right: pure does not
protect from undefined behaviour nor is it supposed to guarantee memory safety.
Feb 07 2012
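
For comparison, here is a short sketch of the assumeUnique precedent mentioned above, from std.exception. The function name makeGreeting is hypothetical. It shows a pure function relying on a promise the compiler cannot verify, namely that no other mutable reference to the buffer exists.

```d
import std.exception : assumeUnique;

// Builds a string in a mutable buffer, then hands it over as immutable.
string makeGreeting() pure
{
    char[] buf = "hello".dup;
    buf[0] = 'H';
    // assumeUnique sets buf to null and returns the same memory typed as
    // immutable. Nothing checks that no other mutable alias was kept.
    return assumeUnique(buf);
}
```

As with assumeSafeAppend, the result is only sound when the caller's assumption actually holds; keeping a second mutable slice of the buffer before the call would silently break the immutability guarantee.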