digitalmars.D - Re: safe leak fix?

Jason House <jason.james.house gmail.com> Nov 11 2009

Walter Bright <newshound1 digitalmars.com> Nov 11 2009

Jason House <jason.james.house gmail.com> Nov 12 2009

Jason House <jason.james.house gmail.com> Nov 12 2009

Nick B <"nick_NOSPAM_.barbalich" gmail.com> Nov 12 2009
grauzone <none example.net> Nov 13 2009
Jason House <jason.james.house gmail.com> Nov 13 2009

"Steven Schveighoffer" <schveiguy yahoo.com> Nov 12 2009
"Robert Jacques" <sandford jhu.edu> Nov 12 2009
"Denis Koroskin" <2korden gmail.com> Nov 13 2009
"Steven Schveighoffer" <schveiguy yahoo.com> Nov 13 2009
"Steven Schveighoffer" <schveiguy yahoo.com> Nov 13 2009
"Denis Koroskin" <2korden gmail.com> Nov 13 2009
"Steven Schveighoffer" <schveiguy yahoo.com> Nov 13 2009
"Denis Koroskin" <2korden gmail.com> Nov 13 2009
"Steven Schveighoffer" <schveiguy yahoo.com> Nov 13 2009
"Denis Koroskin" <2korden gmail.com> Nov 13 2009
"Steven Schveighoffer" <schveiguy yahoo.com> Nov 13 2009
"Steven Schveighoffer" <schveiguy yahoo.com> Nov 13 2009
"Robert Jacques" <sandford jhu.edu> Nov 13 2009

Jason House <jason.james.house gmail.com> writes:

Walter Bright Wrote:

 Consider the code:
 
     @safe:
      T[] foo(T[] a) { return a; }
 
      T[] bar()
      {
          T[10] x;
          return foo(x);
      }
 
 Now we've got an escaping reference to bar's stack. This is not memory 
 safe. But giving up slices is a heavy burden.
 
 So it occurred to me that the same solution for closures can be used 
 here. If the address is taken of a stack variable in a safe function, 
 that variable is instead allocated on the heap. If a more advanced 
 compiler could prove that the address does not escape, it could be put 
 back on the stack.
 
 The code will be a little slower, but it will be memory safe. This 
 change wouldn't be done in trusted or unsafe functions.


At a fundamental level, safety isn't about pointers or references to stack
variables, but rather preventing their escape beyond function scope. Scope
parameters could be very useful. Scope delegates were introduced for a similar
reason.

Nov 11 2009

Walter Bright <newshound1 digitalmars.com> writes:

Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


The problem is, they aren't so easy to prove correct.

Nov 11 2009

Jason House <jason.james.house gmail.com> writes:

Walter Bright Wrote:

 Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


 The problem is, they aren't so easy to prove correct.


I understand the general problem with escape analysis, but I've always thought
of scope input as meaning @noescape. That should lead to easy proofs. If my
@noescape input (or slice of an array on the stack) is passed to a function
without @noescape, it's a compile error. That reduces escape analysis to local
verification.
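
A minimal sketch of that local check, using D's existing `scope` parameter
storage class as a stand-in for the proposed @noescape. This is an assumption:
the language does not define @noescape, and how strictly `scope` is enforced
depends on the compiler. The names countChar and spacesInLocalBuffer are
hypothetical.

```d
// Hypothetical helper: `scope` documents that `s` is only inspected,
// never stored or returned, so a caller may pass a slice of a local.
size_t countChar(scope const(char)[] s, char needle)
{
    size_t n = 0;
    foreach (char c; s)
        if (c == needle)
            ++n;
    return n;
}

// Caller: the stack buffer is sliced, but the slice never outlives the call.
size_t spacesInLocalBuffer()
{
    char[8] buf = ' ';   // every element initialised to a space
    buf[0] = 'h';
    buf[1] = 'i';
    return countChar(buf[], ' ');
}

unittest
{
    assert(spacesInLocalBuffer() == 6);
}
```

Only the signature of countChar has to be inspected at the call site, which is
the "local verification" meant above.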

Nov 12 2009

Jason House <jason.james.house gmail.com> writes:

Steven Schveighoffer Wrote:

 On Thu, 12 Nov 2009 08:45:36 -0500, Jason House  
 <jason.james.house gmail.com> wrote:
 
 Walter Bright Wrote:

 Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


 The problem is, they aren't so easy to prove correct.


 I understand the general problem with escape analysis, but I've always
 thought of scope input as meaning @noescape. That should lead to easy
 proofs. If my @noescape input (or slice of an array on the stack) is
 passed to a function without @noescape, it's a compile error. That
 reduces escape analysis to local verification.


 The problem is cases like this:
 
 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi").dup;
 }
 
 This function is completely safe, but without full escape analysis the  
 compiler can't tell.  The problem is, you don't know how the outputs of a  
 function are connected to its inputs.  strstr cannot have its parameters  
 marked as scope because it returns them.
 
 Scope parameters draw a rather conservative line in the sand, and while I  
 think it's a good optimization we can get right now, it's not going to  
 help in every case.  I'm perfectly fine with @safe being conservative and
 @trusted not, at least the power is still there if you need it.
 
 -Steve


What's the signature of strstr? Your example really boils down to proving
strstr is safe. You're implying that the return of buf from strstr is unsafe.
Indeed, my intentionally short post didn't discuss returning from functions.
Ignoring that for a moment, surely you'd agree the following is safe:

char[] foo(){
    char[100] buf;
    copystringintobuf(buf, "hi");
    return buf[0..2].dup;
}

As far as return types, there are two subtle issues:
1. Returned input argument must preserve the scope requirements of the caller.
A similar problem as return variables and const annotation.
2. Unlike const annotations, there are more than three states for scope; it's
simply a measure of how deep/shallow variables can be in the stack.

Nov 12 2009

Nick B <"nick_NOSPAM_.barbalich" gmail.com> writes:

Overview:

The AMD Advanced Synchronization Facility (ASF) is an experimental 
instruction set extension for the AMD64 architecture that would provide 
new capabilities for efficient synchronization of access to shared data 
in highly multithreaded applications as well as operating system 
kernels. ASF provides a means for software to inspect and update 
multiple shared memory locations atomically without having to rely on 
locks for mutual exclusion. It is intended to facilitate lock-free 
programming for highly concurrent shared data structures, allowing more 
complex and higher performance manipulation of such structures than is 
practical with traditional techniques based on compare-swap instructions 
such as CMPXCHG16B. ASF code can also interoperate with lock-based code, 
or with _Software Transactional Memory_.

Some basic usage examples of ASF are provided in the specification. 
However, we expect the programming community could readily use the power 
and flexibility of ASF to implement very sophisticated, robust and 
innovative concurrent data structure algorithms, and we encourage such 
experimentation. AMD will be releasing a simulation framework in the 
near future to facilitate this.

AMD is releasing this proposal to encourage the parallel programming 
community to review and comment on it. Such input will help shape the 
ultimate direction of this feature, so that it may best serve the needs 
of advanced parallel application developers.

Discussion:
http://forums.amd.com/devblog/blogpost.cfm?catid=317&threadid=114715&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+AmdDeveloperBlogs+%28AMD+Developer+Blogs%29
  and here
http://forums.amd.com/devblog/blogpost.cfm?catid=317&threadid=118419&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+AmdDeveloperBlogs+%28AMD+Developer+Blogs%29


The spec can be found here:

http://developer.amd.com/cpu/ASF/Pages/default.aspx


regards
Nick B

Nov 12 2009

grauzone <none example.net> writes:

Denis Koroskin wrote:
 I don't like his proposal at all. It introduces one more hidden 
 allocation. Why not just write
 
 char[] buf = new char[100];
 
 and disallow taking a slice of static array? (Andrei already hinted this 
 will be disallowed in  safe, if I understood him right).


I think that would be the best. What uses of static arrays are there?
- allocating memory "inline" (eh, you better not use SafeD if you need 
this! new always works)
- as value types, e.g. small vectors (don't really need slices in this case)
- ...?

 Speaking about safety, I don't know how we can allow pointers in safe D:
 
 void foo()
 {
    int* p = new int;
    p[1000] = 0; // Will it crash or not? Is this a defined behavior, or 
 not?
    // If not, this must be disallowed in safe D
 }
 
 And, most importantly, *why* users would want to work with pointers in 
 safe D at all?


As far as I understood, pointers are supposed to be allowed in SafeD. 
You just aren't allowed to do the following things:
- pointer arithmetic
- turning arrays into slices
- taking address (messy one!)
- (unsafe) casts between pointers
- array.ptr
- probably more

Nov 13 2009

Jason House <jason.james.house gmail.com> writes:

Steven Schveighoffer Wrote:

 On Thu, 12 Nov 2009 18:34:48 -0500, Jason House  
 <jason.james.house gmail.com> wrote:
 
 Steven Schveighoffer Wrote:

 On Thu, 12 Nov 2009 08:45:36 -0500, Jason House
 <jason.james.house gmail.com> wrote:

 Walter Bright Wrote:

 Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


 The problem is, they aren't so easy to prove correct.


 I understand the general problem with escape analysis, but I've always
 thought of scope input as meaning @noescape. That should lead to easy
 proofs. If my @noescape input (or slice of an array on the stack) is
 passed to a function without @noescape, it's a compile error. That
 reduces escape analysis to local verification.


 The problem is cases like this:

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi").dup;
 }

 This function is completely safe, but without full escape analysis the
 compiler can't tell.  The problem is, you don't know how the outputs of a
 function are connected to its inputs.  strstr cannot have its parameters
 marked as scope because it returns them.

 Scope parameters draw a rather conservative line in the sand, and while I
 think it's a good optimization we can get right now, it's not going to
 help in every case.  I'm perfectly fine with @safe being conservative and
 @trusted not, at least the power is still there if you need it.

 -Steve


 what's the signature of strstr? Your example really boils down to  
 proving strstr is safe.


 The problem is, strstr isn't safe by itself, it's only safe in certain
 contexts.  You can't mark it as @trusted either because it has the
 potential to be unsafe.  I think if safe D heap-allocates when it passes a
 local address into an unprovable function such as strstr, that's fine with
 me.
 
 So the signature of strstr has to be unmarked (no @safe or @trusted).


I disagree. Borrowing the syntax from the return const proposal, let's define
strstr as follows:
inout(char[]) strstr(inout(char[]) buf, const(char[]) orig);

What I want that to tell the compiler is that buf, or some piece of buf, is
returned from strstr. (please don't assign any more meaning than that, i.e.
constness of buf). The compiler would then treat the return value with the same
protection as buf, and a return without .dup is a compile error.
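
A minimal sketch of what such a signature looks like in D as it exists. This is
an assumption: real `inout` only carries the const qualifier from `buf` to the
result, so the escape rule described above is not something the compiler
checks here. The naive search body is illustrative only, and `inout(char)[]`
is used in place of `inout(char[])`.

```d
// `inout` ties the result type to `buf`: whatever is returned is a piece
// of `buf` (or an empty slice of it).
inout(char)[] strstr(inout(char)[] buf, const(char)[] needle)
{
    if (needle.length == 0)
        return buf;
    foreach (i; 0 .. buf.length)
    {
        if (buf.length - i < needle.length)
            break;
        if (buf[i .. i + needle.length] == needle)
            return buf[i .. $];
    }
    return buf[$ .. $];   // empty slice: not found
}

char[] foo()
{
    char[100] buf = ' ';
    buf[10 .. 12] = "hi";
    // The result points into `buf`, so it is copied before leaving foo.
    return strstr(buf[], "hi").dup;
}

unittest
{
    auto r = foo();
    assert(r.length == 90);
    assert(r[0 .. 2] == "hi");
}
```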

I've been in drawn-out discussions with you before. If this post and my prior
post don't make you budge from your position then I'll simply give up trying to
convince you. It's not worth the aggravation.

Nov 13 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 08:45:36 -0500, Jason House  
<jason.james.house gmail.com> wrote:

 Walter Bright Wrote:

 Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


 The problem is, they aren't so easy to prove correct.


 I understand the general problem with escape analysis, but I've always
 thought of scope input as meaning @noescape. That should lead to easy
 proofs. If my @noescape input (or slice of an array on the stack) is
 passed to a function without @noescape, it's a compile error. That
 reduces escape analysis to local verification.


The problem is cases like this:

char[] foo()
{
   char[100] buf;
   // fill buf
   return strstr(buf, "hi").dup;
}

This function is completely safe, but without full escape analysis the  
compiler can't tell.  The problem is, you don't know how the outputs of a  
function are connected to its inputs.  strstr cannot have its parameters  
marked as scope because it returns them.

Scope parameters draw a rather conservative line in the sand, and while I  
think it's a good optimization we can get right now, it's not going to  
help in every case.  I'm perfectly fine with @safe being conservative and
@trusted not, at least the power is still there if you need it.

-Steve

Nov 12 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Thu, 12 Nov 2009 08:56:25 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:45:36 -0500, Jason House  
 <jason.james.house gmail.com> wrote:

 Walter Bright Wrote:

 Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


 The problem is, they aren't so easy to prove correct.


 I understand the general problem with escape analysis, but I've always
 thought of scope input as meaning @noescape. That should lead to easy
 proofs. If my @noescape input (or slice of an array on the stack) is
 passed to a function without @noescape, it's a compile error. That
 reduces escape analysis to local verification.


 The problem is cases like this:

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi").dup;
 }

 This function is completely safe, but without full escape analysis the  
 compiler can't tell.  The problem is, you don't know how the outputs of  
 a function are connected to its inputs.  strstr cannot have its  
 parameters marked as scope because it returns them.

 Scope parameters draw a rather conservative line in the sand, and while  
 I think it's a good optimization we can get right now, it's not going to  
 help in every case.  I'm perfectly fine with @safe being conservative
 and @trusted not, at least the power is still there if you need it.

 -Steve


Well something like this should work (note that I'm making the conversion  
 from T[N] to T[] explicit)

auto strstr(T,U)(T src, U substring)
     if(isRandomAccessRange!T &&
        isRandomAccessRange!U &&
        is(ElementType!U == ElementType!T))
{ /* Do strstr */ }

char[] foo() {                     // Returns type char[]
    char[100] buf;                  // Of type scope char[100]
    // fill buf                     // "hi" is type immutable(char)[]
    // strstr returns a lent char[], which is dup-ed into a char[],
    // which is okay to return
    return strstr(buf[], "hi").dup;
}

char[] foo2() {                    // Returns type char[]
    char[100] buf;                  // Of type scope char[100]
    // fill buf                     // "hi" is type immutable(char)[]
    // Error, strstr returns a lent char[], not char[].
    return strstr(buf[], "hi");
}

lent char[] foo3() {               // Returns type lent char[]
    char[100] buf;                  // Of type scope char[100]
    // fill buf                     // "hi" is type immutable(char)[]
    // Error, scope char[] cannot be implicitly converted to lent char[]
    // inside a lent char[] function: possible escape.
    return strstr(buf[], "hi");
}

char[] foo4() {                    // Returns type char[]
    char[100] buf;                  // Of type scope char[100]
    return buf;                     // Error, return type is char[] not char[100].
}

char[] foo5() {                    // Returns type char[]
    char[100] buf;                  // Of type scope char[100]
    return buf[];                   // Error, return type is char[] not scope char[].
}

Here's an (outdated and confusing) proposal I put together a while ago  
(It's pre-DIP): http://prowiki.org/wiki4d/wiki.cgi?OwnershipTypesInD In  
it, I used stack and scope instead of scope and lent.

Nov 12 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Fri, 13 Nov 2009 05:23:08 +0300, Nick B  
<nick_NOSPAM_.barbalich gmail.com> wrote:

 Overview:

 The AMD Advanced Synchronization Facility (ASF) is an experimental  
 instruction set extension for the AMD64 architecture that would provide  
 new capabilities for efficient synchronization of access to shared data  
 in highly multithreaded applications as well as operating system  
 kernels. ASF provides a means for software to inspect and update  
 multiple shared memory locations atomically without having to rely on  
 locks for mutual exclusion. It is intended to facilitate lock-free  
 programming for highly concurrent shared data structures, allowing more  
 complex and higher performance manipulation of such structures than is  
 practical with traditional techniques based on compare-swap instructions  
 such as CMPXCHG16B. ASF code can also interoperate with lock-based code,  
 or with _Software Transactional Memory_.

 Some basic usage examples of ASF are provided in the specification.  
 However, we expect the programming community could readily use the power  
 and flexibility of ASF to implement very sophisticated, robust and  
 innovative concurrent data structure algorithms, and we encourage such  
 experimentation. AMD will be releasing a simulation framework in the  
 near future to facilitate this.

 AMD is releasing this proposal to encourage the parallel programming  
 community to review and comment on it. Such input will help shape the  
 ultimate direction of this feature, so that it may best serve the needs  
 of advanced parallel application developers.

 Discussion:
 http://forums.amd.com/devblog/blogpost.cfm?catid=317&threadid=114715&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+AmdDeveloperBlogs+%28AMD+Developer+Blogs%29
   and here
 http://forums.amd.com/devblog/blogpost.cfm?catid=317&threadid=118419&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+AmdDeveloperBlogs+%28AMD+Developer+Blogs%29


 The spec can be found here:

 http://developer.amd.com/cpu/ASF/Pages/default.aspx


 regards
 Nick B


<offtopic>
Please start a new thread by clicking "Create New" button (or similar),  
not by replying to an existing thread. Thanks!
</offtopic>

Nov 13 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 19:29:28 -0500, Robert Jacques <sandford jhu.edu>  
wrote:

 On Thu, 12 Nov 2009 08:56:25 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:45:36 -0500, Jason House  
 <jason.james.house gmail.com> wrote:

 Walter Bright Wrote:

 Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


 The problem is, they aren't so easy to prove correct.


 I understand the general problem with escape analysis, but I've always
 thought of scope input as meaning @noescape. That should lead to easy
 proofs. If my @noescape input (or slice of an array on the stack) is
 passed to a function without @noescape, it's a compile error. That
 reduces escape analysis to local verification.


 The problem is cases like this:

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi").dup;
 }

 This function is completely safe, but without full escape analysis the  
 compiler can't tell.  The problem is, you don't know how the outputs of  
 a function are connected to its inputs.  strstr cannot have its  
 parameters marked as scope because it returns them.

 Scope parameters draw a rather conservative line in the sand, and while  
 I think it's a good optimization we can get right now, it's not going  
 to help in every case.  I'm perfectly fine with @safe being
 conservative and @trusted not, at least the power is still there if you
 need it.

 -Steve


 Well something like this should work (note that I'm making the  
 conversion  from T[N] to T[] explicit)

 auto strstr(T,U)(T src, U substring)
      if(isRandomAccessRange!T &&
         isRandomAccessRange!U &&
         is(ElementType!U == ElementType!T))
 { /* Do strstr */ }

 char[] foo() {                     // Returns type char[]
     char[100] buf;                  // Of type scope char[100]
     // fill buf                     // "hi" is type immutable(char)[]
     // strstr returns a lent char[], which is dup-ed into a char[],
     // which is okay to return
     return strstr(buf[], "hi").dup;
 }

 char[] foo2() {                    // Returns type char[]
     char[100] buf;                  // Of type scope char[100]
     // fill buf                     // "hi" is type immutable(char)[]
     // Error, strstr returns a lent char[], not char[].
     return strstr(buf[], "hi");
 }


Your proposal depends on scope being a type modifier, which it currently  
is not.  I think that's a separate issue to tackle.

-Steve

Nov 13 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Thu, 12 Nov 2009 18:34:48 -0500, Jason House  
<jason.james.house gmail.com> wrote:

 Steven Schveighoffer Wrote:

 On Thu, 12 Nov 2009 08:45:36 -0500, Jason House
 <jason.james.house gmail.com> wrote:

 Walter Bright Wrote:

 Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


 The problem is, they aren't so easy to prove correct.


 I understand the general problem with escape analysis, but I've always
 thought of scope input as meaning @noescape. That should lead to easy
 proofs. If my @noescape input (or slice of an array on the stack) is
 passed to a function without @noescape, it's a compile error. That
 reduces escape analysis to local verification.


 The problem is cases like this:

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi").dup;
 }

 This function is completely safe, but without full escape analysis the
 compiler can't tell.  The problem is, you don't know how the outputs of a
 function are connected to its inputs.  strstr cannot have its parameters
 marked as scope because it returns them.

 Scope parameters draw a rather conservative line in the sand, and while I
 think it's a good optimization we can get right now, it's not going to
 help in every case.  I'm perfectly fine with @safe being conservative and
 @trusted not, at least the power is still there if you need it.

 -Steve


 what's the signature of strstr? Your example really boils down to  
 proving strstr is safe.


The problem is, strstr isn't safe by itself, it's only safe in certain
contexts.  You can't mark it as @trusted either because it has the
potential to be unsafe.  I think if safe D heap-allocates when it passes a
local address into an unprovable function such as strstr, that's fine with
me.

So the signature of strstr has to be unmarked (no @safe or @trusted).

 You're implying that the return of buf from strstr is unsafe. Indeed, my  
 intentionally short post didn't discuss returning from functions.  
 Ignoring that for a moment, surely you'd agree the following is safe:

 char[] foo(){
     char[100] buf;
     copystringintobuf(buf, "hi");
     return buf[0..2].dup;
 }

 As far as return types, there are two subtle issues:
 1. Returned input argument must preserve the scope requirements of the
 caller. A similar problem as return variables and const annotation.
 2. Unlike const annotations, there are more than three states for scope;
 it's simply a measure of how deep/shallow variables can be in the stack.


Yes, but I think such an annotation system is unworkable.  I'd rather see  
the compiler annotate into an intermediate file.  Even with those, you  
would be hard pressed to be able to prove all cases when the scope depth  
depends on runtime values.  That would require runtime checks.  I think  
escape analysis is a worthy goal, but very hard to implement.  Just  
allocating when you can't prove anything is a decent solution.
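
The fallback can be written out by hand. A minimal sketch, assuming int
elements and a hypothetical barHeap: the buffer comes from `new`, so the slice
that foo hands back stays valid after barHeap returns, at the cost of one GC
allocation.

```d
// Same shape as the foo in Walter's example: returns its argument unchanged.
T[] foo(T)(T[] a)
{
    return a;
}

// Hypothetical heap-backed version of bar: `new` places the array on the
// GC heap instead of in bar's stack frame.
int[] barHeap()
{
    auto x = new int[10];   // ten zero-initialised ints on the heap
    x[3] = 42;
    return foo(x);          // the memory outlives the call
}

unittest
{
    auto r = barHeap();
    assert(r.length == 10);
    assert(r[3] == 42);
}
```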

-Steve

Nov 13 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Fri, 13 Nov 2009 14:50:58 +0300, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 18:34:48 -0500, Jason House  
 <jason.james.house gmail.com> wrote:

 Steven Schveighoffer Wrote:

 On Thu, 12 Nov 2009 08:45:36 -0500, Jason House
 <jason.james.house gmail.com> wrote:

 Walter Bright Wrote:

 Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


 The problem is, they aren't so easy to prove correct.


 I understand the general problem with escape analysis, but I've always
 thought of scope input as meaning @noescape. That should lead to easy
 proofs. If my @noescape input (or slice of an array on the stack) is
 passed to a function without @noescape, it's a compile error. That
 reduces escape analysis to local verification.


 The problem is cases like this:

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi").dup;
 }

 This function is completely safe, but without full escape analysis the
 compiler can't tell.  The problem is, you don't know how the outputs of a
 function are connected to its inputs.  strstr cannot have its parameters
 marked as scope because it returns them.

 Scope parameters draw a rather conservative line in the sand, and while I
 think it's a good optimization we can get right now, it's not going to
 help in every case.  I'm perfectly fine with @safe being conservative and
 @trusted not, at least the power is still there if you need it.

 -Steve


 what's the signature of strstr? Your example really boils down to  
 proving strstr is safe.


 The problem is, strstr isn't safe by itself, it's only safe in certain
 contexts.  You can't mark it as @trusted either because it has the
 potential to be unsafe.  I think if safe D heap-allocates when it passes
 a local address into an unprovable function such as strstr, that's fine
 with me.

 So the signature of strstr has to be unmarked (no @safe or @trusted).


Any example of how unsafe strstr may be? BTW, strstr is no different from  
std.algorithm.find:

import std.algorithm;

char[] foo()
{
     char[5] buf = ['h', 'e', 'l', 'l', 'o'];
     char[] result = find(buf[], 'e');

     return result.dup;
}

I don't see why a general-purpose searching algorithm is unsafe.

Nov 13 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 13 Nov 2009 07:01:25 -0500, Denis Koroskin <2korden gmail.com>  
wrote:

 On Fri, 13 Nov 2009 14:50:58 +0300, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 18:34:48 -0500, Jason House  
 <jason.james.house gmail.com> wrote:

 Steven Schveighoffer Wrote:

 On Thu, 12 Nov 2009 08:45:36 -0500, Jason House
 <jason.james.house gmail.com> wrote:

 Walter Bright Wrote:

 Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


 The problem is, they aren't so easy to prove correct.


 I understand the general problem with escape analysis, but I've always
 thought of scope input as meaning @noescape. That should lead to easy
 proofs. If my @noescape input (or slice of an array on the stack) is
 passed to a function without @noescape, it's a compile error. That
 reduces escape analysis to local verification.


 The problem is cases like this:

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi").dup;
 }

 This function is completely safe, but without full escape analysis the
 compiler can't tell.  The problem is, you don't know how the outputs of a
 function are connected to its inputs.  strstr cannot have its parameters
 marked as scope because it returns them.

 Scope parameters draw a rather conservative line in the sand, and while I
 think it's a good optimization we can get right now, it's not going to
 help in every case.  I'm perfectly fine with @safe being conservative and
 @trusted not, at least the power is still there if you need it.

 -Steve


 what's the signature of strstr? Your example really boils down to  
 proving strstr is safe.


 The problem is, strstr isn't safe by itself, it's only safe in certain
 contexts.  You can't mark it as @trusted either because it has the
 potential to be unsafe.  I think if safe D heap-allocates when it
 passes a local address into an unprovable function such as strstr,
 that's fine with me.

 So the signature of strstr has to be unmarked (no @safe or @trusted).


 Any example of how unsafe strstr may be?


Sure (with the current compiler):

char[] foo()
{
   char[100] buf;
   // fill buf
   return strstr(buf, "hi"); // no .dup, buf escapes
}

The whole meaning of safe is fuzzy, because we don't know the safe rules
with regard to passing references to local data.  But I think the goal is
to make it so strstr can be marked as safe.  In order to do that, foo must
be required to be unmarked or @trusted, or foo allocates buf on the heap.

The point I was trying to make to Jason is that escape analysis is more
complicated than just marking parameters as @noescape -- you leave out
some provably safe functions.

 BTW, strstr is no different from std.algorithm.find:

 import std.algorithm;

 char[] foo()
 {
      char[5] buf = ['h', 'e', 'l', 'l', 'o'];
      char[] result = find(buf[], 'e');

      return result.dup;
 }

 I don't see why a general-purpose searching algorithm is unsafe.


It isn't inherently unsafe.  It's just difficult for the compiler to see
just from a function signature where the data flows, and escape analysis
requires full data-flow disclosure.  I think Walter's proposal of
allocating when a @safe function passes the address of a local to another
@safe function is perfectly acceptable.  I'd also like to see cases
where you can mark the input parameter as scope, potentially optimizing
out the allocation (but then you cannot return the scope parameter or a
reference to any part of it).

-Steve

Nov 13 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Fri, 13 Nov 2009 15:29:20 +0300, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Fri, 13 Nov 2009 07:01:25 -0500, Denis Koroskin <2korden gmail.com>  
 wrote:

 On Fri, 13 Nov 2009 14:50:58 +0300, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 18:34:48 -0500, Jason House  
 <jason.james.house gmail.com> wrote:

 Steven Schveighoffer Wrote:

 On Thu, 12 Nov 2009 08:45:36 -0500, Jason House
 <jason.james.house gmail.com> wrote:

 Walter Bright Wrote:

 Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


 The problem is, they aren't so easy to prove correct.


 I understand the general problem with escape analysis, but I've always
 thought of scope input as meaning @noescape. That should lead to easy
 proofs. If my @noescape input (or slice of an array on the stack) is
 passed to a function without @noescape, it's a compile error. That
 reduces escape analysis to local verification.


 The problem is cases like this:

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi").dup;
 }

 This function is completely safe, but without full escape analysis the
 compiler can't tell.  The problem is, you don't know how the outputs of a
 function are connected to its inputs.  strstr cannot have its parameters
 marked as scope because it returns them.

 Scope parameters draw a rather conservative line in the sand, and while I
 think it's a good optimization we can get right now, it's not going to
 help in every case.  I'm perfectly fine with @safe being conservative and
 @trusted not, at least the power is still there if you need it.

 -Steve


 what's the signature of strstr? Your example really boils down to  
 proving strstr is safe.


 The problem is, strstr isn't safe by itself, it's only safe in certain
 contexts.  You can't mark it as @trusted either because it has the
 potential to be unsafe.  I think if safe D heap-allocates when it
 passes a local address into an unprovable function such as strstr,
 that's fine with me.

 So the signature of strstr has to be unmarked (no @safe or @trusted).


 Any example of how unsafe strstr may be?


 Sure (with the current compiler):

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi"); // no .dup, buf escapes
 }


No, no, no! It's foo which is unsafe in your example, not strstr!

 The whole meaning of safe is fuzzy, because we don't know the safe rules  
 with regards to passing references to local data.  But I think the goal  
 is to make it so strstr can be marked as safe.  In order to do that, foo  
 must be required to be unmarked or @trusted, or foo allocates buf on the
 heap.

 The point I was trying to make to Jason is that escape analysis is more
 complicated than just marking parameters as @noescape -- you leave out
 some provably safe functions.

 BTW, strstr is no different from std.algorithm.find:

 import std.algorithm;

 char[] foo()
 {
      char[5] buf = ['h', 'e', 'l', 'l', 'o'];
      char[] result = find(buf[], 'e');

      return result.dup;
 }

 I don't see why a general-purpose searching algorithm is unsafe.


 It isn't inherently unsafe.  It's just difficult for the compiler to see
 just from a function signature where the data flows, and escape analysis
 requires full data-flow disclosure.  I think Walter's proposal of
 allocating when a @safe function passes the address of a local to another
 @safe function is perfectly acceptable.  I'd also like to see
 cases where you can mark the input parameter as scope, potentially
 optimizing out the allocation (but then you cannot return the scope
 parameter or a reference to any part of it).

 -Steve


I don't like his proposal at all. It introduces one more hidden  
allocation. Why not just write

char[] buf = new char[100];

and disallow taking a slice of a static array? (Andrei already hinted this
will be disallowed in @safe, if I understood him right).

Speaking about safety, I don't know how we can allow pointers in safe D:

void foo()
{
    int* p = new int;
    p[1000] = 0; // Will it crash or not? Is this defined behavior, or not?
    // If not, this must be disallowed in safe D
}

And, most importantly, *why* would users want to work with pointers in
safe D at all?

Nov 13 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 13 Nov 2009 07:46:02 -0500, Denis Koroskin <2korden gmail.com>  
wrote:


 Sure (with the current compiler):

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi"); // no .dup, buf escapes
 }


 No, no, no! It's foo which is unsafe in your example, not strstr!


OK, tell me if foo is now safe or unsafe:

@safe char[] bar(char[] x);

char[] foo()
{
   char buf[100];
   return bar(buf);
}

This is how the compiler looks at the code.  It doesn't know what strstr
does.  For all it knows, bar (or strstr) could allocate heap data based on
x and be perfectly safe.
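
To make the signature-only view concrete, here is a minimal sketch (the names barCopy and barAlias are hypothetical, not from the thread). Both functions have the same parameter and return types; one returns a fresh heap copy and the other returns its argument. A caller that sees only the declaration cannot tell which of the two it is handing a stack slice to.

```d
// Sketch only: two hypothetical bodies behind the same shape of signature,
// char[] f(char[] x).
@safe char[] barCopy(char[] x)  { return x.dup; } // result is fresh heap memory
@safe char[] barAlias(char[] x) { return x; }     // result aliases the argument

void main()
{
    char[] data = "hello".dup;
    auto c = barCopy(data);
    auto a = barAlias(data);
    assert(c.ptr !is data.ptr); // the copy does not point into data
    assert(a.ptr is data.ptr);  // the alias points at the caller's memory
}
```

If data were a slice of a stack buffer, only the barAlias result would dangle once the caller returned.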

 I don't like his proposal at all. It introduces one more hidden  
 allocation. Why not just write

 char[] buf = new char[100];

 and disallow taking a slice of a static array? (Andrei already hinted this
 will be disallowed in @safe, if I understood him right).


A major performance gain in D is to use stack-allocated buffers for things  
as opposed to heap-allocated buffers.  The proposal allows lots of  
existing code to be marked as safe without having to add the explicit  
allocations.

I have mixed feelings on the whole thing.  I think disallowing a high  
performance technique such as stack buffer allocation is going to make  
safe code much less attractive, especially when it's very easy to write  
provably safe code that uses stack buffers.  It's going to confuse and  
frustrate developers that want to use such buffers.

The one good thing I see about the proposal is the heap allocations could  
be optimized out later if the compiler can get smarter, without having to  
go remove all those manual heap allocations you added.

The other side of the coin is that you just have to mark your functions as
@trusted instead of safe.  Then when the compiler gets smarter, you have
to go back and change those functions to safe.  That's also a possible
solution.

 Speaking about safety, I don't know how we can allow pointers in safe D:

 void foo()
 {
     int* p = new int;
     p[1000] = 0; // Will it crash or not? Is this defined behavior, or not?
     // If not, this must be disallowed in safe D
 }

 And, most importantly, *why* would users want to work with pointers in
 safe D at all?


I agree with you on this.  But slicing a stack array is not exactly the  
same as taking a pointer and using unbounded pointer arithmetic.  It has  
the potential to escape scope, but not the potential (at least in safe  
mode) of accessing data outside the array.

-Steve

Nov 13 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Fri, 13 Nov 2009 16:16:29 +0300, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Fri, 13 Nov 2009 07:46:02 -0500, Denis Koroskin <2korden gmail.com>  
 wrote:


 Sure (with the current compiler):

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi"); // no .dup, buf escapes
 }


 No, no, no! It's foo which is unsafe in your example, not strstr!


 OK, tell me if foo is now safe or unsafe:

 @safe char[] bar(char[] x);

 char[] foo()
 {
    char buf[100];
    return bar(buf);
 }


It is unsafe even if bar doesn't return anything (it could store a reference
to buf in some global variable, for example). Or is accessing globals
considered unsafe now?

It is foo's fault that a pointer to a stack-allocated buffer is passed and
returned outside of the scope. The dangerous line is buf[], which gets a
slice out of a static array, not return bar(...). You could as well write:

char[] foo()
{
     char buf[100];
     return buf[]; // no more bar, but code is still dangerous
}

Nov 13 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 13 Nov 2009 08:45:28 -0500, Denis Koroskin <2korden gmail.com>  
wrote:

 On Fri, 13 Nov 2009 16:16:29 +0300, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Fri, 13 Nov 2009 07:46:02 -0500, Denis Koroskin <2korden gmail.com>  
 wrote:


 Sure (with the current compiler):

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi"); // no .dup, buf escapes
 }


 No, no, no! It's foo which is unsafe in your example, not strstr!


 OK, tell me if foo is now safe or unsafe:

 @safe char[] bar(char[] x);

 char[] foo()
 {
    char buf[100];
    return bar(buf);
 }


 It is unsafe even if bar doesn't return anything (it could store a
 reference to buf in some global variable, for example). Or is accessing
 globals considered unsafe now?


No, it's *potentially* unsafe.  If bar is written like this:

@safe char[] bar(char[] x) { return x.dup; }

Then bar is completely safe in all contexts, and therefore foo is  
completely safe.  Merely taking the address of a stack variable does not  
make a function unsafe.

Is this unsafe?

char[] foo()
{
   char buf[100];
   return buf[0..50].dup;
}

What about this?

void foo(int a, int b)
{
   swap(a, b); // uses references to local variables; what if swap stores a
               // reference to one of its args in a global?
}

You might understand that if these kinds of things are not allowed to be
marked as safe, you might have non-stop complaints from new users and
critics of D about how D's "safety" features are a joke, just like Vista's
security popups are a joke.  And then everything gets marked as @trusted
or unmarked, and SafeD becomes a complete waste of time.  We need to
choose rules that are good for safety, but which allow intuitive code to
be written.

 It is foo's fault that a pointer to a stack-allocated buffer is passed and
 returned outside of the scope. The dangerous line is buf[], which gets a
 slice out of a static array, not return bar(...). You could as well
 write:

 char[] foo()
 {
      char buf[100];
      return buf[]; // no more bar, but code is still dangerous
 }


Most of the time, the line is fuzzy as to whose fault it is.  This is why
definitions of what is allowed and what is not are important.  Your
example looks obvious, but there is code that does not look so obvious.
Unless you know exactly how the data flows in the functions you call,
you can't prove whether it's safe or not.  I hope that someday the
compiler can prove safety even through function calls, but we are a long
way away from that.

-Steve

Nov 13 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 13 Nov 2009 08:31:07 -0500, Jason House  
<jason.james.house gmail.com> wrote:

 Steven Schveighoffer Wrote:

 So the signature of strstr has to be unmarked (no @safe or @trusted).


 I disagree. Borrowing the syntax from the return const proposal, let's  
 define strstr as follows:
 inout(char[]) strstr(inout(char[]) buf, const(char[]) orig);

 What I want that to tell the compiler is that buf, or some piece of buf,  
 is returned from strstr. (please don't assign any more meaning than  
 that, i.e. constness of buf). The compiler would then treat the return  
 value with the same protection as buf, and a return without .dup is a  
 compile error.

 I've been in drawn-out discussions with you before. If this post and my
 prior post don't make you budge from your position then I'll simply give
 up trying to convince you. It's not worth the aggravation.


Sure, we can stop discussing.  I'll just say I think the escape analysis
problem is more complicated than the scoped const problem, simply
because scoped parameters are not necessarily non-mutable, whereas scoped
const parameters are always treated as const.  Scoped const has one output
(the return value) and N inputs.  Escape analysis has N inputs and M
outputs.  Annotation is going to be very hard for functions like swap.
Simplifications are possible, but like I said, conservative line in the
sand.

-Steve

Nov 13 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Fri, 13 Nov 2009 06:42:24 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 19:29:28 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:

 On Thu, 12 Nov 2009 08:56:25 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Thu, 12 Nov 2009 08:45:36 -0500, Jason House  
 <jason.james.house gmail.com> wrote:

 Walter Bright Wrote:

 Jason House wrote:
 At a fundamental level, safety isn't about pointers or references to
 stack variables, but rather preventing their escape beyond function
 scope. Scope parameters could be very useful. Scope delegates were
 introduced for a similar reason.


 The problem is, they aren't so easy to prove correct.


 I understand the general problem with escape analysis, but I've
 always thought of scope input as meaning @noescape. That should lead
 to easy proofs. If my @noescape input (or slice of an array on the
 stack) is passed to a function without @noescape, it's a compile
 error. That reduces escape analysis to local verification.


 The problem is cases like this:

 char[] foo()
 {
    char buf[100];
    // fill buf
    return strstr(buf, "hi").dup;
 }

 This function is completely safe, but without full escape analysis the  
 compiler can't tell.  The problem is, you don't know how the outputs  
 of a function are connected to its inputs.  strstr cannot have its  
 parameters marked as scope because it returns them.

 Scope parameters draw a rather conservative line in the sand, and
 while I think it's a good optimization we can get right now, it's not
 going to help in every case.  I'm perfectly fine with @safe being
 conservative and @trusted not, at least the power is still there if
 you need it.

 -Steve


 Well something like this should work (note that I'm making the  
 conversion  from T[N] to T[] explicit)

 auto strstr(T,U)(T src, U substring)
      if(isRandomAccessRange!T &&
         isRandomAccessRange!U &&
         is(ElementType!U == ElementType!T))
 { /* Do strstr */ }

 char[] foo() {                     // Returns type char[]
     char buf[100];                  // Of type scope char[100]
     // fill buf                     // "hi" is type immutable(char)[]
     return strstr(buf[], "hi").dup; // returns a lent char[], which is
                                     // dup-ed into a char[], which is okay to return
 }

 char[] foo2() {                    // Returns type char[]
     char buf[100];                  // Of type scope char[100]
     // fill buf                     // "hi" is type immutable(char)[]
     return strstr(buf[], "hi");     // Error, strstr returns a lent
                                     // char[], not char[].
 }


 Your proposal depends on scope being a type modifier, which it currently  
 is not.  I think that's a separate issue to tackle.

 -Steve


Actually, scope is currently a somewhat-limited type modifier (i.e. scope
classes, scope class allocation). My use of it here was mainly to
illustrate the compiler's internal representation. Also, the use of the scope
keyword in the proposal was based on a blog post by Walter, where 'scope'
became a more universal type modifier.

The point was you can handle a large number of escape analysis cases
correctly using only the type system (more, of course, with type
system + local analysis).
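
As a rough sketch of what type-level tracking of this kind might look like, the following assumes a compiler that honors `scope` and `return scope` parameter annotations (later D compilers offer this behind the -preview=dip1000 switch; it is not the behavior of the 2009 compiler discussed in this thread). The name findSub is hypothetical and stands in for strstr.

```d
// Sketch only: assumes `scope` / `return scope` parameter semantics.
@safe:

// `return scope` says the result may alias `src`, so it shares src's lifetime.
// `scope` on `sub` says no reference to it is kept after the call.
inout(char)[] findSub(return scope inout(char)[] src, scope const(char)[] sub)
{
    foreach (i; 0 .. src.length)
        if (src.length - i >= sub.length && src[i .. i + sub.length] == sub)
            return src[i .. $]; // tail of src starting at the match
    return src[$ .. $];         // empty slice when there is no match
}

char[] foo()
{
    char[100] buf = ' ';             // stack buffer filled with spaces
    buf[3 .. 5] = "hi";              // place the needle at index 3
    return findSub(buf[], "hi").dup; // the heap copy is what gets returned
}

void main()
{
    auto r = foo();
    assert(r.length == 97);
    assert(r[0 .. 2] == "hi");
}
```

Under those assumed semantics, returning findSub(buf[], "hi") directly from foo, without the .dup, would be expected to fail to compile, because the result carries the lifetime of buf.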

Nov 13 2009
