digitalmars.D - safe leak fix?

reply Walter Bright <newshound1 digitalmars.com> writes:
Consider the code:

    @safe:
     T[] foo(T[] a) { return a; }

     T[] bar()
     {
         T[10] x;
         return foo(x);
     }

Now we've got an escaping reference to bar's stack. This is not memory 
safe. But giving up slices is a heavy burden.

So it occurred to me that the same solution for closures can be used 
here. If the address is taken of a stack variable in a safe function, 
that variable is instead allocated on the heap. If a more advanced 
compiler could prove that the address does not escape, it could be put 
back on the stack.

The code will be a little slower, but it will be memory safe. This 
change wouldn't be done in trusted or unsafe functions.
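
For readers following along, here is a minimal sketch of what the proposed rewrite would amount to if done by hand. It assumes T is int; the name barOnHeap is hypothetical and only illustrates the idea, it is not what the compiler would emit.

```d
import std.stdio : writeln;

@safe int[] foo(int[] a) { return a; }

// Hand-written equivalent of the proposed rewrite (an assumption, not
// compiler output): because a slice of x escapes, x lives on the GC heap
// instead of in barOnHeap's stack frame.
@safe int[] barOnHeap()
{
    int[] x = new int[10]; // heap storage, still valid after return
    x[3] = 7;
    return foo(x);
}

void main()
{
    int[] s = barOnHeap();
    assert(s.length == 10);
    assert(s[3] == 7); // the data outlives the call
    writeln(s);
}
```

The cost is one GC allocation per call, which is the "little slower" part of the trade-off.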
Nov 11 2009
next sibling parent BCS <none anon.com> writes:
Hello Walter,

 Consider the code:
 
     @safe:
      T[] foo(T[] a) { return a; }
 
      T[] bar()
      {
          T[10] x;
          return foo(x);
      }
 
 Now we've got an escaping reference to bar's stack. This is not memory
 safe. But giving up slices is a heavy burden.
 
 So it occurred to me that the same solution for closures can be used
 here. If the address is taken of a stack variable in a safe function,
 that variable is instead allocated on the heap. If a more advanced
 compiler could prove that the address does not escape, it could be put
 back on the stack.
 
 The code will be a little slower, but it will be memory safe. This
 change wouldn't be done in trusted or unsafe functions.
 

Sounds good. If it happens, I'd vote for a push on the static analysis to do those proofs.
Nov 11 2009
prev sibling next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2009-11-11 16:47:10 -0500, Walter Bright <newshound1 digitalmars.com> said:

 Consider the code:
 
     @safe:
      T[] foo(T[] a) { return a; }
 
      T[] bar()
      {
          T[10] x;
          return foo(x);
      }
 
 Now we've got an escaping reference to bar's stack. This is not memory 
 safe. But giving up slices is a heavy burden.
 
 So it occurred to me that the same solution for closures can be used 
 here. If the address is taken of a stack variable in a safe function, 
 that variable is instead allocated on the heap. If a more advanced 
 compiler could prove that the address does not escape, it could be put 
 back on the stack.
 
 The code will be a little slower, but it will be memory safe. This 
 change wouldn't be done in trusted or unsafe functions.

Interesting. This is exactly what I proposed a few months ago, while we were endlessly discussing scope as a function argument modifier: automatic heap allocation of all escaping variables. Of course I'm all for it. :-)

But now you should consider whether or not it should do the same in unsafe D. If it doesn't, unsafe D will crash on things that safe D won't. If you do this in unsafe D, you need a way to force a variable not to be heap allocated, whatever happens. (Perhaps using 'scope' as a storage modifier for variables.)

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 11 2009
prev sibling next sibling parent reply grauzone <none example.net> writes:
Walter Bright wrote:
 Consider the code:
 
    @safe:
     T[] foo(T[] a) { return a; }
 
     T[] bar()
     {
         T[10] x;
         return foo(x);
     }
 
 Now we've got an escaping reference to bar's stack. This is not memory 
 safe. But giving up slices is a heavy burden.
 
 So it occurred to me that the same solution for closures can be used 
 here. If the address is taken of a stack variable in a safe function, 
 that variable is instead allocated on the heap. If a more advanced 
 compiler could prove that the address does not escape, it could be put 
 back on the stack.
 
 The code will be a little slower, but it will be memory safe. This 
 change wouldn't be done in trusted or unsafe functions.

That's just idiotic. One of the main uses of static arrays is to _avoid_ heap memory allocation in the first place. Do what you want within safe, but leave "unsafe" (oh god what a pejorative) alone.
Nov 11 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
grauzone wrote:
 Walter Bright wrote:
 The code will be a little slower, but it will be memory safe. This 
 change wouldn't be done in trusted or unsafe functions.

 That's just idiotic. One of the main uses of static arrays is to _avoid_ heap memory allocation in the first place. Do what you want within safe, but leave "unsafe" (oh god what a pejorative) alone.

Well, I did propose only doing this in safe functions! Also, I agree with "unsafe" being a pejorative. Got any better ideas? "unchecked"?
Nov 11 2009
next sibling parent reply grauzone <none example.net> writes:
Walter Bright wrote:
 grauzone wrote:
 Walter Bright wrote:
 The code will be a little slower, but it will be memory safe. This 
 change wouldn't be done in trusted or unsafe functions.

 That's just idiotic. One of the main uses of static arrays is to _avoid_ heap memory allocation in the first place. Do what you want within safe, but leave "unsafe" (oh god what a pejorative) alone.

 Well, I did propose only doing this in safe functions!

In this case, the semantic difference between safe and unsafe functions will cause trouble, and you'd eventually end up imposing the "safe" semantics upon unsafe functions.

I'd vote for disallowing slicing in safe functions. Safe code can just use dynamic arrays instead. One other important use of arrays will be small SSE-optimized vectors (as far as I understand it), but those should be fine in safe mode; usually you won't want to slice them.

 Also, I agree with "unsafe" being a pejorative. Got any better ideas? 
 "unchecked"?

Some brainstorming: highperf, fast mode, system mode, lowlevel, bare-metal, turbo mode (silly but fun), ...?
Nov 11 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
grauzone wrote:
 In this case, the semantic difference between safe and unsafe functions 
 will cause trouble, and you'd eventually end up imposing the "safe" 
 semantics upon unsafe functions.

May be.
 I'd vote for disallowing slicing in safe functions. Safe code can just 
 use dynamic arrays instead. One other important use of arrays will be 
 small SSE optimized vectors (as far as I understood that), but those 
 should be fine in safe mode; usually you won't want to slice them.

I thought of that, but I think it's too restrictive.
 
 Also, I agree with "unsafe" being a pejorative. Got any better ideas? 
 "unchecked"?

 Some brainstorming: highperf, fast mode, system mode, lowlevel, bare-metal, turbo mode (silly but fun), ...?

"system" sounds good.
Nov 11 2009
next sibling parent reply grauzone <none example.net> writes:
Walter Bright wrote:
 grauzone wrote:
 In this case, the semantic difference between safe and unsafe 
 functions will cause trouble, and you'd eventually end up imposing the 
 "safe" semantics upon unsafe functions.

 May be.

Returning a slice to a local static array would be fine in safe mode, but lead to silent corruption in "unsafe" mode. I think everyone would assume that, if something works in safe mode, it should also work in unsafe mode. So, it's really a "must", or is there some other way around it?
Nov 11 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
grauzone wrote:
 Walter Bright wrote:
 grauzone wrote:
 In this case, the semantic difference between safe and unsafe 
 functions will cause trouble, and you'd eventually end up imposing 
 the "safe" semantics upon unsafe functions.

 May be.

 Returning a slice to a local static array would be fine in safe mode, but lead to silent corruption in "unsafe" mode. I think everyone would assume that, if something works in safe mode, it should also work in unsafe mode.

It's a good point.
 So, it's really a "must", or is there some other way around it?

I don't know.
Nov 11 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 I thought of that, but I think it's too restrictive.

I agree. A possible solution to this problem (Ellery Newcomer may have said the same thing): in safe functions, require some kind of local annotation that turns the variable into a safe heap allocation (and at the same time makes that heap allocation visible). In unsafe functions the annotation is optional, while in safe code you must write it if you want to use the feature.

 "system" sounds good.

"unsafe" is still good for that purpose because it's like the __ in __gshared: it's designed on purpose to look less nice.

Bye,
bearophile
Nov 11 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 "system" sounds good.

 "unsafe" is still good for that purpose because it's like the __ in __gshared: it's designed on purpose to look less nice.

"Unsafe" is also a misnomer. It implies the code is broken. I don't like it.
Nov 11 2009
parent reply Don <nospam nospam.com> writes:
Walter Bright wrote:
 bearophile wrote:
 "system" sounds good.

 "unsafe" is still good for that purpose because it's like the __ in __gshared: it's designed on purpose to look less nice.

 "Unsafe" is also a misnomer. It implies the code is broken. I don't like it.

There are definitely functions which are dangerous if you pass them invalid parameters, i.e., "use at own risk": any function which uses them needs to add its own tests. I think something which implies "you should think before you use this function" is reasonable.

I don't care at all what the name is, however. system would be OK.
Nov 12 2009
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Don (nospam nospam.com)'s article
 There are definitely functions which are dangerous if you pass them
 invalid parameters. ie, "use at own risk" -- any function which uses
 them needs to add its own tests. I think something which implies "you
 should think before you use this function" is reasonable.
 I don't care at all what the name is, however.  system would be OK.

Yeah, and sometimes the functions that are unsafe when passed invalid parameters aren't obvious. Granted, this is an extreme corner case, but I recently debugged an access violation that was occurring in a well-tested sorting function that I would have definitely annotated trusted.

I was sorting on floating point keys, and it turned out there were NaNs in there, while the sort function assumed that there would be a proper total ordering. If the pivot element was a NaN, nothing in the array compared <= the pivot, so the scan loop never stopped until it read past the end of the array. This was a latent bug for a long time and only showed up when I ran the program with parameters that generated NaNs.

Of course the real solution here is to get rid of the #()&# lack of total ordering for floats.
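
A small sketch of the comparison behaviour behind that bug, for anyone who wants to see it directly. The helper hasNaN is hypothetical (it is not from the sort function discussed above); it only shows one way a caller could validate keys before handing them to code that assumes a total ordering.

```d
import std.math : isNaN;

// Hypothetical pre-check: true if any key would break a total ordering.
@safe bool hasNaN(const double[] keys)
{
    foreach (k; keys)
        if (isNaN(k))
            return true;
    return false;
}

void main()
{
    double pivot = double.nan;
    double[] keys = [1.0, 2.0, 3.0];

    // Every ordered comparison against NaN is false, so a scan that waits
    // for "key <= pivot" never finds a stopping element.
    foreach (k; keys)
    {
        assert(!(k <= pivot));
        assert(!(k >= pivot));
    }
    assert(pivot != pivot); // NaN is not even equal to itself

    assert(!hasNaN(keys));
    assert(hasNaN([1.0, double.nan]));
}
```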
Nov 12 2009
parent Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 == Quote from Don (nospam nospam.com)'s article
 There are definitely functions which are dangerous if you pass them
 invalid parameters. ie, "use at own risk" -- any function which uses
 them needs to add its own tests. I think something which implies "you
 should think before you use this function" is reasonable.
 I don't care at all what the name is, however.  system would be OK.

 Yeah, and sometimes the functions that are unsafe when passed invalid parameters aren't obvious.

Also, even safe functions cannot be guaranteed safe if they are passed garbage as arguments.
Nov 12 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 Also, I agree with "unsafe" being a pejorative. Got any better ideas? 
 "unchecked"?

Naming it "unsafe" is OK because it's already used in C#, and because unsafe code is indeed worse than safe code (a lot of people today want safety), so it's a fitting name.

Languages like C#, Java, etc. were designed for safety from day 0, and optimizations were then added on top to make them fast too (today Java is sometimes about as fast as C++, despite lacking things such as arrays of structs). D is now doing the opposite, but I think this attempt to create a SafeD may require a lot of work, and in the end holes in the safety net may still be possible... I hope the design of SafeD will go well.

Bye,
bearophile
Nov 11 2009
prev sibling next sibling parent Brad Roberts <braddr bellevue.puremagic.com> writes:
On Wed, 11 Nov 2009, Walter Bright wrote:

 So it occurred to me that the same solution for closures can be used here. If
 the address is taken of a stack variable in a safe function, that variable is
 instead allocated on the heap. If a more advanced compiler could prove that
 the address does not escape, it could be put back on the stack.
 
 The code will be a little slower, but it will be memory safe. This change
 wouldn't be done in trusted or unsafe functions.

I think safe vs unsafe causing a behavior change is a really bad idea. They're contracts / constraints, not modifiers. - Brad
Nov 11 2009
prev sibling next sibling parent Frank Benoit <keinfarbton googlemail.com> writes:
Walter Bright schrieb:
 Consider the code:
 
    @safe:
     T[] foo(T[] a) { return a; }
 
     T[] bar()
     {
         T[10] x;
         return foo(x);
     }
 

If D had something like a slice-info that could be returned instead of the slice itself, then foo would be safe. A slice-info would be something like a struct/Tuple storing the start and end index; applied to the original array, it gives the slice.

    SliceInfo foo( T[] a){
        // do something, resulting in e.g. a[2..6]
        return SliceInfo(2, 6);
    }

    T[] bar(){
        T[] x = new T[10];
        return x[foo(x)]; // @safe compile OK
    }

    T[] bar(){
        T[10] x;
        return x[foo(x)]; // @safe error, because x slice escapes
    }

This shifts responsibility for memory safety to the caller with little extra effort.
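
The idea can be approximated in D as it stands, without new syntax. This is only a sketch: SliceInfo is a hypothetical struct, T is assumed to be int, and the explicit x[s.start .. s.end] stands in for the proposed x[foo(x)] form, which does not exist.

```d
// Hypothetical index pair returned in place of a slice.
struct SliceInfo
{
    size_t start;
    size_t end;
}

// foo only reports which part of the array it selected; it never
// returns a reference to the caller's memory.
@safe SliceInfo foo(const int[] a)
{
    return SliceInfo(2, 6);
}

@safe int[] bar()
{
    int[] x = new int[10];            // heap array, safe to return a slice of
    foreach (i, ref e; x)
        e = cast(int) i;              // fill with 0..9
    SliceInfo s = foo(x);
    return x[s.start .. s.end];       // the caller applies the indices
}

void main()
{
    assert(bar() == [2, 3, 4, 5]);
}
```

Because the slicing happens in bar, the compiler only has to reason about bar's own variable when deciding whether the result may escape.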
Nov 11 2009
prev sibling next sibling parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
Walter Bright wrote:
 Consider the code:
 
    @safe:
     T[] foo(T[] a) { return a; }
 
     T[] bar()
     {
         T[10] x;
         return foo(x);
     }
 
 Now we've got an escaping reference to bar's stack. This is not memory
 safe. But giving up slices is a heavy burden.
 
 So it occurred to me that the same solution for closures can be used
 here. If the address is taken of a stack variable in a safe function,
 that variable is instead allocated on the heap. If a more advanced
 compiler could prove that the address does not escape, it could be put
 back on the stack.
 
 The code will be a little slower, but it will be memory safe. This
 change wouldn't be done in trusted or unsafe functions.

Enter random annotation which asserts the function is allocated on the stack?
Nov 11 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 11 Nov 2009 16:47:10 -0500, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Consider the code:

     @safe:
      T[] foo(T[] a) { return a; }

      T[] bar()
      {
          T[10] x;
          return foo(x);
      }

 Now we've got an escaping reference to bar's stack. This is not memory  
 safe. But giving up slices is a heavy burden.

 So it occurred to me that the same solution for closures can be used  
 here. If the address is taken of a stack variable in a safe function,  
 that variable is instead allocated on the heap. If a more advanced  
 compiler could prove that the address does not escape, it could be put  
 back on the stack.

 The code will be a little slower, but it will be memory safe. This  
 change wouldn't be done in trusted or unsafe functions.

This sounds acceptable to me. In response to others making claims about modifying behavior: you can get a safe function that can use unsafe behavior by using trusted, if you wish.

I'm assuming this behavior translates to local non-array variables?

Can we allow the scope variable hack that is afforded for delegates:

    @safe int sum(scope int[] a)
    {
        int retval = 0;
        foreach(i; a)
            retval += i;
        return retval;
    }

This would not result in a heap allocation when called with a static array.

-Steve
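
A runnable sketch of that usage, assuming the caller slices a stack-allocated static array explicitly. The const qualifier on the parameter is an addition for illustration and not part of the suggestion above; main is deliberately left unannotated so the example does not depend on how a given compiler version treats slicing locals in safe code.

```d
// The parameter is marked scope: sum promises not to let the slice escape.
@safe int sum(scope const int[] a)
{
    int retval = 0;
    foreach (i; a)
        retval += i;
    return retval;
}

void main()
{
    int[4] x = [1, 2, 3, 4]; // lives in main's stack frame
    assert(sum(x[]) == 10);  // slice is only borrowed for the call
}
```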
Nov 12 2009
prev sibling parent "Jérôme M. Berger" <jeberger free.fr> writes:

Walter Bright wrote:
 Consider the code:
 
    @safe:
     T[] foo(T[] a) { return a; }
 
     T[] bar()
     {
         T[10] x;
         return foo(x);
     }
 
 Now we've got an escaping reference to bar's stack. This is not memory
 safe. But giving up slices is a heavy burden.
 
 So it occurred to me that the same solution for closures can be used
 here. If the address is taken of a stack variable in a safe function,
 that variable is instead allocated on the heap. If a more advanced
 compiler could prove that the address does not escape, it could be put
 back on the stack.
 
 The code will be a little slower, but it will be memory safe. This
 change wouldn't be done in trusted or unsafe functions.

Cyclone has this neat notion that a pointer is associated with a memory "region" (by default, there are 3 regions: the data segment, the heap and the stack of the current function, but you can have user-defined regions). In this case, the function "foo" would have the type:

    region (`R) T[] foo ( region (`R) T[] a)

where `R is an abstract region name meaning that the return value is in the same region as the argument. When compiling "bar", the compiler would then be able to see that it is returning a pointer to bar's stack region and refuse.

Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Nov 12 2009