
digitalmars.D - Escape Analysis on reddit

reply Walter Bright <newshound1 digitalmars.com> writes:
http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/
Oct 31 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/

The question is, in the example:

int* bar(int* p) { return p; }

int* foo()
{
   int x = 3;
   return bar(&x);
}

where is the problem?  Is the problem in bar, or is the problem in foo?  The real problem is in foo, because there's technically nothing wrong with bar.

So what ends up happening in your scheme is this:

int *bar(int *p) { return p;}

void foo(scope int *p)
{
   int *x = bar(p); // no escape, but compile error
}

So yes, you are protected, but at the cost of not being able to write code that is provably safe.  The end result is you end up removing scope where it really should be, and then, virally, it makes you remove scope decorations everywhere else.  I predict that if this is implemented, nobody will be able to use it.

In order for this to be useful, we need to have a way to express total relationships between parameters and the return value.  And that would probably be so convoluted that it's not worth the syntax.

I'll reiterate my challenge from another thread:

void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

struct S
{
   int *v;
}

int *f2(S* s) { return s.v;}

void f3(ref int *a, ref int *b, ref int *c)
{
   int *tmp = a;
   a = b; b = c; c = tmp;
}

void f4()
{
   int a=1, b=2, c=3;
   auto ap = &a;
   auto bp = &b;
   auto cp = &c;
   S s = S(cp);

   f(ap, bp, cp);
   b = *f2(&s);
   f3(ap, bp, cp);
}

Please tell me how I should mark f, S, f2, and f3 such that f4 compiles.  Note that this code all works in D1 and provably has no problems with escaping values, since f4 returns void, and no globals are accessed.

-Steve
Oct 31 2008
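For what it's worth, Steven's challenge does run cleanly as ordinary D with no annotations at all. Here is a runnable sketch of it; the only change is that `S s(cp);` (which is not valid D declaration syntax) is written as struct-literal initialization, and asserts are added to show the final state:

```d
// Steven's challenge, runnable: every pointer only ever refers to a
// local of f4, so nothing escapes even though pointers are shuffled
// through three helper functions.
struct S { int* v; }

void f(ref int* a, int* b, int* c) { if (*b < *c) a = b; else a = c; }

int* f2(S* s) { return s.v; }

void f3(ref int* a, ref int* b, ref int* c)
{
    int* tmp = a;
    a = b; b = c; c = tmp;
}

void f4()
{
    int a = 1, b = 2, c = 3;
    auto ap = &a;
    auto bp = &b;
    auto cp = &c;
    auto s = S(cp);

    f(ap, bp, cp);   // *bp < *cp, so ap is redirected to b
    b = *f2(&s);     // s.v is &c, so b becomes 3
    f3(ap, bp, cp);  // three-way rotate of the pointers

    // all targets are still live locals; nothing escaped
    assert(a == 1 && b == 3 && c == 3);
    assert(ap == &b && bp == &c && cp == &b);
}

void main() { f4(); }
```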
next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Fri, 31 Oct 2008 23:17:41 +0300, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 "Walter Bright" wrote
 http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/

 [...]

I will state the question slightly differently: f4 is now pure.  What changes should be made to the source code (if any) so that it compiles?

pure void f4()
{
   int a=1, b=2, c=3;
   auto ap = &a;
   auto bp = &b;
   auto cp = &c;
   S s = S(cp);

   f(ap, bp, cp);
   b = *f2(&s);
   f3(ap, bp, cp);
}
Oct 31 2008
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Walter Bright" wrote
 http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/

 The question is, in the example:

 int* bar(int* p) { return p; }

 int* foo()
 {
    int x = 3;
    return bar(&x);
 }

 where is the problem?  Is the problem in bar, or is the problem in foo?  The real problem is in foo, because there's technically nothing wrong with bar.

The problem is that foo does not know enough about bar to deem the call correct. Notice that if you consider local functions instead of ints, that gets you straight to the delegates case that is the crux of the matter.
 So what ends up happening in your scheme is this:
 
 int *bar(int *p) { return p;}
 
 void foo(scope int *p)
 {
    int *x = bar(p); // no escape, but compile error
 }

Well, I was talking to Walter about how, in compiling a function, there are two scopes to track: that of the parameters, and that of the locals inside the function.  The former includes the latter.  Bar should be defined as:

scope int* bar(scope int* p) { return p; }

which is easy to check as long as some conservativeness is acceptable.
 So yes, you are protected, but at the cost of not being able to write code 
 that is provably safe.  The end result is you end up removing scope where it 
 really should be, and then virally, it makes you remove scope decorations 
 everywhere else.  I predict that if this is implemented, nobody will be able 
 to use it.

There are many cases in which statically typed programs disallow valid constructs. Each case is a matter of nuance, but by and large types have turned out to be successful.
 In order for this to be useful, we need to have a way to express total 
 relationships between parameters and the return value.  And that would 
 probably be so convoluted that it's not worth the syntax.

I don't think there is a need to express the exact relationship.
 I'll reiterate my challenge from another thread:
 
 void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}
 
 struct S
 {
    int *v;
 }
 
 int *f2(S* s) { return s.v;}
 
 void f3(ref int *a, ref int *b, ref int *c)
 {
    int *tmp = a;
    a = b; b = c; c = tmp;
 }
 
 void f4()
 {
    int a=1, b=2, c=3;
    auto ap = &a;
    auto bp = &b;
    auto cp = &c;
    S s = S(cp);
 
    f(ap, bp, cp);
    b = *f2(&s);
    f3(ap, bp, cp);
 }
 
 Please tell me how I should mark f, S, f2, and f3 such that f4 compiles. 
 Note that this code all works in D1 and provably has no problems with 
 escaping values, since f4 returns void, and no globals are accessed.

First off, we need to clarify that "provably" is a bit strong, and that the hypothesis "f4 returns void, and no globals are accessed" is not all that's needed for the proof.  Full access to function bodies is necessary.  Correctness may be provable to a human or machine having access to the entire body of code, but not if e.g. f, f2, and f3 are invisible to f4.

Second, I contest the entire example.  Static typing is supposed to prove interesting properties about *most* correct programs, at the cost of disallowing a *few* correct programs that are not very frequent, are not very interesting, or can be easily replaced with typesafe alternatives.  One can write correct code based on void* that is virtually impossible to typecheck statically, but that doesn't show that static types are useless.

Andrei
Oct 31 2008
next sibling parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Andrei Alexandrescu wrote:
 Second, I contest the entire example. Static typing is supposed to prove 
 interesting properties about *most* correct programs, at the cost of 
 disallowing a *few* correct programs that are not very frequent, are not 
 very interesting, or can be easily replaced with typesafe alternatives. 
 One can write correct code based on void* that is virtually impossible 
 to typecheck statically, but that doesn't show that static types are 
 useless.

I think Steven's point was that these programs are not the few; they're the many. Most fairly large programs will have issues like this one.
Oct 31 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Robert Fraser wrote:
 I think Steven's point was that these programs are not the few; they're 
 the many. Most fairly large programs will have issues like this one.

I'm not sure that's the case, or if it is, that those programs will suffer significantly from using an alternative (the most obvious alternative is to allocate those particular cases on the heap).

Consider the other problem.  It's not easy for a human to look at Steven's case and verify that it is correct.  A small mistake in it, and it isn't correct anymore.  What's it worth to be rid of such problems?
Oct 31 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 Robert Fraser wrote:
 I think Steven's point was that these programs are not the few; they're 
 the many. Most fairly large programs will have issues like this one.

I'm not sure that's the case, or if it is, those programs will suffer significantly from using an alternative (the most obvious alternative is to allocate those particular cases on the heap).

Yes, my point in all this is, if we are going to 'solve' the scope escape issue which today causes unnecessary heap allocations, let's not make new syntax that makes things harder or impossible to do and *still* have the heap allocation problem.
 Consider the other problem. It's not easy for a human to look at Steven's 
 case and verify that it is correct. A small mistake in it, and it isn't 
 correct anymore. What's it worth to be rid of such problems?

It's worth a lot, I agree.  But not at the cost of invalidating tested, efficient designs such as Tango.

FWIW, I think with changes to the compiler/linker, this can all be detected automatically without any extra syntax.  Consider that the linker currently resolves dependencies of symbols.  It could also resolve incomplete escape analysis graphs.  If the graphs don't match up, it's a linker error.  This would be needed in the case of interfaces and virtual functions.

That would be the ultimate solution IMO, but definitely not in time for D2 ;)

-Steve
Oct 31 2008
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-10-31 22:47:59 -0400, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 Yes, my point in all this is, if we are going to 'solve' the scope escape
 issue which today causes unnecessary heap allocations, let's not make new
 syntax that makes things harder or impossible to do and *still* have the
 heap allocation problem.

But declaring arguments as scope solves the heap allocation problem, except where not heap-allocating would create a bug because someone keeps the reference for too long.
 It's worth a lot, I agree.  But not at the cost of invalidating tested
 efficient designs such as Tango.

If you can't annotate Tango's code to be scope-correct, then there are probably hidden bugs in there.  If D's scope syntax isn't expressive enough to annotate valid and efficient designs, then the syntax probably isn't good enough.
 FWIW, I think with changes to the compiler/linker, this can be all detected
 automatically without any extra syntax.  Consider that the linker currently
 resolves dependencies of symbols.  It could also resolve incomplete escape
 analysis graphs.  If the graphs don't match up, it's a linker error.  This
 would be needed in the case of interfaces and virtual functions.

Here you're expecting the linker to perform escape analysis on compiled code, a world where pointers, numbers and raw data are all the same thing: bytes and values in registers.  That's probably a million times harder than doing it on the semantic tree of a D program, if it's even possible.
 That would be the ultimate solution IMO, but definitely not in time for D2
 ;)

Indeed. :-)

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Oct 31 2008
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Michel Fortin" wrote
 On 2008-10-31 22:47:59 -0400, "Steven Schveighoffer" <schveiguy yahoo.com> 
 said:

 Yes, my point in all this is, if we are going to 'solve' the scope escape
 issue which today causes unnecessary heap allocations, let's not make new
 syntax that makes things harder or impossible to do and *still* have the
 heap allocation problem.

But declaring arguments as scope solve the heap allocation problem, except where not heap-allocating would create a bug because of someone keeping the reference for too long.

What I was talking about is possible cases where it can be proven (with all source code available) that a heap allocation is not required, but the compiler will not let it compile unless it is heap-allocated. Therefore, you have to do the heap allocation to get it to work. So we would still have the heap allocation problem, but now you have to resolve it manually.
 It's worth a lot, I agree.  But not at the cost of invalidating tested
 efficient designs such as Tango.

If you can't annotate Tango's code to be scope-correct, then there are probably hidden bugs in there.

It depends on what 'scope correct' means. If the syntax to annotate is expressive enough, then it should be possible in most cases. Tango is used quite a bit, and most code has bugs, but generally, scope escape bugs are rare, and special care is taken by the devs to make sure it doesn't happen.
 If D scope syntax isn't expressive enough to annotate valid and efficient 
 designs, then the syntax is probably not well enough.

Yes, this is what I'm concerned about.
 FWIW, I think with changes to the compiler/linker, this can be all 
 detected
 automatically without any extra syntax.  Consider that the linker 
 currently
 resolves dependencies of symbols.  It could also resolve incomplete 
 escape
 analysis graphs.  If the graphs don't match up, it's a linker error. 
 This
 would be needed in the case of interfaces and virtual functions.

Here you're expecting the linker to perform escape analysis on comiled code, a world where pointers, numbers and raw data are the same thing: bytes and values in registers. That's probably a million times harder to do than when working on the semantic tree of a D program, if that's even possible.

I mean that the linker would be given scope-graph metadata (in the object file), which contains simple dependency edges between the parameters of functions and their return values.  The graph is generated by the compiler, like you said, when working on the semantic tree.  The linker just resolves undefined vertices, connecting the incomplete graphs to see if the entire graph is valid.

It only makes sense for values that contain references, so code like this:

int f(scope int x)

should mean nothing to the compiler.  So the linker only has to worry about references, and doesn't really need to understand the types of the parameters, just where they can escape to.

-Steve
Nov 01 2008
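Steven's link-time idea is purely hypothetical, but the bookkeeping he describes is simple enough to sketch. The `EscapeGraph` type below, its merge step, and its resolution check are illustrative inventions, not anything DMD or any linker implements:

```d
// Toy model of the proposed link-time escape-graph resolution: each
// object file contributes edges saying "the return value of function F
// may alias parameter i of F"; the "linker" merges the partial graphs
// and flags any referenced function whose graph is still missing.
struct EscapeGraph
{
    // function name -> indices of parameters its return may alias
    int[][string] edges;

    // fold another object file's partial graph into this one
    void merge(EscapeGraph other)
    {
        foreach (fn, params; other.edges)
            edges[fn] = params;
    }

    // valid only if every referenced function has a known graph
    bool resolved(string[] referenced)
    {
        foreach (fn; referenced)
            if (fn !in edges)
                return false;
        return true;
    }
}

void main()
{
    EscapeGraph objA, objB;
    objA.edges["bar"] = [0];    // bar's return aliases parameter 0
    objB.edges["min"] = [0, 1]; // min's return aliases both parameters

    objA.merge(objB);
    assert(objA.resolved(["bar", "min"]));
    assert(!objA.resolved(["f2"])); // missing graph: a "linker error"
}
```

The real difficulty Walter and Robert raise remains: standard and dynamic linkers know nothing about such metadata, so this step would need a custom tool in the toolchain.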
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 FWIW, I think with changes to the compiler/linker, this can be all detected 
 automatically without any extra syntax.  Consider that the linker currently 
 resolves dependencies of symbols.  It could also resolve incomplete escape 
 analysis graphs.  If the graphs don't match up, it's a linker error.  This 
 would be needed in the case of interfaces and virtual functions.

Pushing things off to the linker is not practical because then you cannot rely on standard, off-the-shelf linkers.
Oct 31 2008
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Walter Bright wrote:
 Steven Schveighoffer wrote:
 FWIW, I think with changes to the compiler/linker, this can be all 
 detected automatically without any extra syntax.  Consider that the 
 linker currently resolves dependencies of symbols.  It could also 
 resolve incomplete escape analysis graphs.  If the graphs don't match 
 up, it's a linker error.  This would be needed in the case of 
 interfaces and virtual functions.

Pushing things off to the linker is not practical because then you cannot rely on standard, off-the-shelf linkers.

Or dynamic ones.
Nov 01 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Robert Fraser" wrote
 Walter Bright wrote:
 Steven Schveighoffer wrote:
 FWIW, I think with changes to the compiler/linker, this can be all 
 detected automatically without any extra syntax.  Consider that the 
 linker currently resolves dependencies of symbols.  It could also 
 resolve incomplete escape analysis graphs.  If the graphs don't match 
 up, it's a linker error.  This would be needed in the case of interfaces 
 and virtual functions.

Pushing things off to the linker is not practical because then you cannot rely on standard, off-the-shelf linkers.

Or dynamic ones.

Both good points.  Such is the requirement for a full analysis though, as you don't always have all the source available, and if you have cyclic dependencies, you have a chicken-and-egg problem.  But if this scope proposal just makes D nightmarish, either with convoluted syntax or with correct code not compiling, it's still an option to think about.

As far as dynamic linkers go, at least on Windows, the DLL can have a routine that resolves the graphs when loaded.  I'm not sure about shared objects though...

-Steve
Nov 01 2008
parent Sergey Gromov <snake.scaly gmail.com> writes:
Sat, 1 Nov 2008 11:28:15 -0400, Steven Schveighoffer wrote:

 "Robert Fraser" wrote
 Walter Bright wrote:
 Steven Schveighoffer wrote:
 FWIW, I think with changes to the compiler/linker, this can be all 
 detected automatically without any extra syntax.  Consider that the 
 linker currently resolves dependencies of symbols.  It could also 
 resolve incomplete escape analysis graphs.  If the graphs don't match 
 up, it's a linker error.  This would be needed in the case of interfaces 
 and virtual functions.

Pushing things off to the linker is not practical because then you cannot rely on standard, off-the-shelf linkers.

Or dynamic ones.

 [...]

Or there could be a separate tool, like verify in Java.

DLLs can be handled, too, because every DLL must have an import library, so it can include an import escape graph as well.  Though there will have to be an escape meta-language anyway, to be able to describe external APIs and dynamically loaded DLLs.  An image of core.sys.windows.windows annotated to support escape analysis boggles my mind.
Nov 07 2008
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/

 The question is, in the example:

 int* bar(int* p) { return p; }

 int* foo()
 {
    int x = 3;
    return bar(&x);
 }

 where is the problem?  Is the problem in bar, or is the problem in foo?  The real problem is in foo, because there's technically nothing wrong with bar.

The problem is that foo does not know enough about bar to deem the call correct. Notice that if you consider local functions instead of ints, that gets you straight to the delegates case that is the crux of the matter.

No, foo doesn't know anything. The compiler is the one who does not know. The user does, so it looks to the user like the compiler has a bug (which I would argue is correct).
 So what ends up happening in your scheme is this:

 int *bar(int *p) { return p;}

 void foo(scope int *p)
 {
    int *x = bar(p); // no escape, but compile error
 }

 Well, I was talking to Walter about how, in compiling a function, there are two scopes to track: that of the parameters, and that of the locals inside the function.  The former includes the latter.  Bar should be defined as:

 scope int* bar(scope int* p) { return p; }

 which is easy to check as long as some conservativeness is acceptable.

So how does foo know what scope return means?  Consider a min function:

scope T min(T)(scope T v1, scope T v2);

Now, let's try it out:

scope int *foo(scope int *p)
{
   int x;
   return min(&x, p);
}

Shouldn't this not compile?  If you say 'well, the compiler can take the most conservative approach, and assume scope means the most inner scope', OK:

scope T[] findFirst(scope T[] buf, scope T target);

scope T[] findFirst5(scope T[] buf)
{
   scope t5 = new T(5);
   return findFirst(buf, t5);
}

The problem is, unless you allow fully specifying dependencies, there are going to be ways people call functions that you didn't think of, which are legitimate designs and just can't be done under your new regime.  The ultimate result is that people either cast around the system, don't use the system, or end up allocating unnecessarily on the heap to work around it.
 So yes, you are protected, but at the cost of not being able to write 
 code that is provably safe.  The end result is you end up removing scope 
 where it really should be, and then virally, it makes you remove scope 
 decorations everywhere else.  I predict that if this is implemented, 
 nobody will be able to use it.

There are many cases in which statically typed programs disallow valid constructs. Each case is a matter of nuance, but by and large types have turned out to be successful.

I would bet that either A) the proposed system would not provide correct protection, or B) the system would require many design changes in a library like Tango, which uses lots of stack variables and inner functions to achieve higher performance.  It just doesn't work with actual, already successful software programs.  It's too limited.
 I'll reiterate my challenge from another thread:

 void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

 struct S
 {
    int *v;
 }

 int *f2(S* s) { return s.v;}

 void f3(ref int *a, ref int *b, ref int *c)
 {
    int *tmp = a;
    a = b; b = c; c = tmp;
 }

 void f4()
 {
    int a=1, b=2, c=3;
    auto ap = &a;
    auto bp = &b;
    auto cp = &c;
    S s = S(cp);

    f(ap, bp, cp);
    b = *f2(&s);
    f3(ap, bp, cp);
 }

 Please tell me how I should mark f, S, f2, and f3 such that f4 compiles. 
 Note that this code all works in D1 and provably has no problems with 
 escaping values, since f4 returns void, and no globals are accessed.

First off, we need to clarify that "provably" is a bit strong, and that the hypothesis "f4 returns void, and no globals are accessed" is not all that's needed for the proof. Full access to function bodies is necessary.

OK: if f4 returns void, takes no arguments, none of the code used by any of the functions accesses globals (I admit I may have said this ambiguously last time), and there's not even any heap activity, then you tell me how anything can escape.  Prove me wrong.  You have full access to the source, as you can see.
 Correctness may be provable to a human or machine having access to the 
 entire body of code, but not if e.g. f, f2, and f3 are invisible to f4.

That is the challenge. If I remove the function bodies, how do you mark the signatures so you can still prove that it's correct?
 Second, I contest the entire example. Static typing is supposed to prove 
 interesting properties about *most* correct programs, at the cost of 
 disallowing a *few* correct programs that are not very frequent, are not 
 very interesting, or can be easily replaced with typesafe alternatives. 
 One can write correct code based on void* that is virtually impossible to 
 typecheck statically, but that doesn't show that static types are useless.

My example is at least as interesting as Walter's in the article for showing the problem.  If you want to complain about simple examples not being useful, then stop using simple examples to make your point.

We are not talking about mucking with void*'s or drastic violations of the type system here; we are talking about reference parameters used as returns, an accessor for a member, or a 3-way swap function.  These are all common designs.  The fact that you cannot find a solution and so resort to bluntly calling my example 'uninteresting' speaks for itself.

-Steve
Oct 31 2008
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 So how does foo know what scope return means?  Consider a min function:
 
 scope T min(T)(scope T v1, scope T v2);
 
 Now, let's try it out:
 
 scope int *foo(scope int *p)
 {
    int x;
    return min(&x, p);
 }
 
 Shouldn't this not compile?  If you say 'well, the compiler can take the 
 most conservative approach, and assume scope means the most inner scope.'

Assume the compiler keeps track of, for every expression, a list of the references it is composed of.  For:

min(&x, p)

the compiler knows that min() returns some combination of v1 and v2, so the expression min(&x, p) is tagged with the list [&x, p].  Now it examines the return, and sees that [&x, p] is escaping the scope of the function.  Although p is a scope parameter, the scope on the foo() return value says that's ok, so that passes.  The &x, however, fails because it's a reference to a local going out of scope.  So, I think we've got this case covered properly.

But I don't like the syntax.  For one thing, if the return value might refer to parameter p1 but not p2, we need to express that.  So, something like:

int* foo(return scope int* p1, scope int* p2, return scope int* p3);

Now, the caller tags a call to foo() with [arg1, arg3] for its escape analysis, omitting arg2.

The fly in the ointment is that I'd like to combine this somehow with the const propagation issue of argument type to return type.
Oct 31 2008
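Walter's sketch closely anticipates the `return` parameter attribute that D later adopted (DIP25, and later DIP1000). A minimal runnable illustration, with the caveat that this is the modern realization of the idea, not anything a 2008 compiler accepted, and that full escape *checking* only kicks in under `@safe` with the DIP1000 preview:

```d
// A return-annotated parameter: the signature advertises that the
// return value may alias p, so the caller can tag the result with the
// argument it passed.
int* identity(return scope int* p)
{
    return p;
}

void main()
{
    int x = 42;
    int* q = identity(&x); // the caller tags q with [&x]
    assert(*q == 42);
    // returning q from a function where x is local would be the error
    // case Walter describes: a reference to a local going out of scope
}
```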
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 Steven Schveighoffer wrote:
 So how does foo know what scope return means?  Consider a min function:

 scope T min(T)(scope T v1, scope T v2);

 Now, let's try it out:

 scope int *foo(scope int *p)
 {
    int x;
    return min(&x, p);
 }

 Shouldn't this not compile?  If you say 'well, the compiler can take the 
 most conservative approach, and assume scope means the most inner scope.'

 [...]

OK, I assumed that was an easy one, but I was establishing how it's hard to solve this and the other example.
 But I don't like the syntax. For one thing, if the return value might 
 refer to parameter p1 but not p2, we need to express that.

This is part of it, you also can have return values through parameters which need to be tracked.
 So, something like:

 int* foo(return scope int* p1, scope int* p2, return scope int* p3);

 Now, the caller tags a call to foo() with [arg1,arg3] for its escape 
 analysis, omitting arg2.

This is better functionally, but notice how verbose we are getting. Now add in tagging parameters to depend on other parameters (like for instance in a swap function).
 The fly in this is I'd like to combine it somehow with the const 
 propagation issue of argument type to return type.

Note that the const propagation issue isn't as complex, because you can't change const parameters, so you only have to do return-value analysis.

-Steve
Nov 01 2008
parent Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 This is better functionally, but notice how verbose we are getting.  Now add 
 in tagging parameters to depend on other parameters (like for instance in a 
 swap function).

I hadn't thought about a swap function. Hrmmm...
Nov 01 2008
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/

 The question is, in the example:

 int* bar(int* p) { return p; }

 int* foo()
 {
    int x = 3;
    return bar(&x);
 }

 where is the problem?  Is the problem in bar, or is the problem in foo?  The real problem is in foo, because there's technically nothing wrong with bar.

 The problem is that foo does not know enough about bar to deem the call correct.  Notice that if you consider local functions instead of ints, that gets you straight to the delegates case that is the crux of the matter.

No, foo doesn't know anything. The compiler is the one who does not know. The user does, so it looks to the user like the compiler has a bug (which I would argue is correct).

Well obviously. When I said "foo doesn't know" I really meant "the compiler while compiling foo and enjoying its horizon". I'm at a continuous disadvantage because I don't have the time to give full detail in my posts so they come off as unclear.
 So what ends up happening in your scheme is this:

 int *bar(int *p) { return p;}

 void foo(scope int *p)
 {
    int *x = bar(p); // no escape, but compile error
 }

 Well, I was talking to Walter about how, in compiling a function, there are two scopes to track: that of the parameters, and that of the locals inside the function.  The former includes the latter.  Bar should be defined as:

 scope int* bar(scope int* p) { return p; }

 which is easy to check as long as some conservativeness is acceptable.

 So how does foo know what scope return means?  Consider a min function:

 scope T min(T)(scope T v1, scope T v2);

 Now, let's try it out:

 scope int *foo(scope int *p)
 {
    int x;
    return min(&x, p);
 }

That won't compile per my explanation: scope in a function parameter is different from scope in a local variable. You can't return a scoped local. So the code won't compile.
 Shouldn't this not compile?  If you say 'well, the compiler can take the 
 most conservative approach, and assume scope means the most inner scope.'
 
 OK

In this case it's not even conservative, it's doing the correct thing.  To illustrate the effects of a conservative approach, consider:

scope int *foo(scope int *p)
{
   int x;
   auto r = &x;
   r = p;
   return r;
}

This function won't compile if the compiler employs flow-insensitive escape analysis (see e.g. http://en.wikipedia.org/wiki/Data_flow_analysis) because r will be marked as escaping regardless of the order of statements.

Walter plans to implement path- and flow-insensitive escape analysis.  In fact he first wants to define scope to work on credit alone (trust the programmer) and only later implement the analysis.  I hope I'll be able to talk him out of that, because it's a precisely backwards approach: he'll first allow vale tudo, and then break code when he implements a limited analysis.  I think the right thing is to move from conservative to permissive.
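Andrei's flow-insensitivity point can be modeled as a toy calculation: a flow-insensitive pass unions every value ever assigned to r, while a flow-sensitive pass only considers what r holds at the return. The string labels below are purely illustrative stand-ins for the compiler's reference lists:

```d
// Why flow-insensitive analysis rejects the example above: it treats
// r as possibly holding anything ever assigned to it, ignoring order.
import std.algorithm : canFind;

void main()
{
    // assignments to r, in program order: r = &x; then r = p;
    string[] assignments = ["&x", "p"];

    // flow-insensitive: r may hold any value ever assigned to it
    string[] mayHold = assignments;
    assert(mayHold.canFind("&x")); // a local may escape -> rejected

    // flow-sensitive: at the return, r holds only the last assignment
    string last = assignments[$ - 1];
    assert(last == "p"); // no local escapes -> accepted
}
```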
 scope T[] findFirst(scope T[] buf, scope T target);
 
 scope T[] findFirst5(scope T[] buf)
 {
    scope t5 = new T(5);
    return findFirst(buf, t5);
 }

Same thing: it won't compile.  This is a clear example where conservative doesn't cut it.  There is a paper that I don't have the time to look for, that does a pointer analysis and associates with each function a little graph showing how the return value depends on the inputs.  (Does anyone know of it?)  I don't think we can afford to implement anything near that level of sophistication.
 The problem is, unless you allow fully specifying dependencies, there is 
 going to be ways people call functions that you didn't think of, which are 
 legitimate designs, and just can't be done with your new regime.  The 
 ultimate result is either people cast around the system, don't use the 
 system, or end up allocating unnecessarily on the heap to work around it.

It all depends on how often those cases are encountered.
 So yes, you are protected, but at the cost of not being able to write 
 code that is provably safe.  The end result is you end up removing scope 
 where it really should be, and then virally, it makes you remove scope 
 decorations everywhere else.  I predict that if this is implemented, 
 nobody will be able to use it.

constructs. Each case is a matter of nuance, but by and large types have turned out to be successful.

I would bet that either A) the proposed system would not provide correct protection, or B) the system would require many design changes in a library like Tango, which uses lots of stack variables and inner functions to achieve higher performance. It just doesn't work with actual, already-successful software. It's too limited.

It would be helpful to provide some examples from Tango or other libraries to substantiate this.
 I'll reiterate my challenge from another thread:

 void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

 struct S
 {
    int *v;
 }

 int *f2(S* s) { return s.v;}

 void f3(ref int *a, ref int *b, ref int *c)
 {
    int *tmp = a;
    a = b; b = c; c = tmp;
 }

 void f4()
 {
    int a=1, b=2, c=3;
    auto ap = &a;
    auto bp = &b;
    auto cp = &c;
    S s(cp);

    f(ap, bp, cp);
    b = *f2(&s);
    f3(ap, bp, cp);
 }

 Please tell me how I should mark f, S, f2, and f3 such that f4 compiles. 
 Note that this code all works in D1 and provably has no problems with 
 escaping values, since f4 returns void, and no globals are accessed.

the hypothesis "f4 returns void, and no globals are accessed" is not all that's needed for the proof. Full access to function bodies is necessary.

OK, if f4 returns void, and takes no arguments, and none of the code used by any of the functions accesses globals (I admit, I may have said this ambiguously last time), there's not even any heap activity, you tell me how it can escape. Prove me wrong. You have full access to the source as you can see.

That's exactly what I said. Without access to the bodies you can't tell anything. And I automatically discount access to bodies. Information should be inferred from function signatures.
 Correctness may be provable to a human or machine having access to the 
 entire body of code, but not if e.g. f, f2, and f3 are invisible to f4.

That is the challenge. If I remove the function bodies, how do you mark the signatures so you can still prove that it's correct?

Instead of looking at an artificial example, it would be better to work on a corpus of examples extracted from actual code.
 Second, I contest the entire example. Static typing is supposed to prove 
 interesting properties about *most* correct programs, at the cost of 
 disallowing a *few* correct programs that are not very frequent, are not 
 very interesting, or can be easily replaced with typesafe alternatives. 
 One can write correct code based on void* that is virtually impossible to 
 typecheck statically, but that doesn't show that static types are useless.

My example is at least as interesting as Walter's in the article to show the problem. If you want to complain about simple examples not being useful, then stop using simple examples to show your point. We are not talking about mucking with void *'s or drastic violations of types here; we are talking about reference parameters for returns, an accessor of a member, or a 3-way swap function. These are all common designs. The fact that you cannot find a solution and so resort to bluntly calling my example 'uninteresting' speaks for itself.

This is the second time my identity is conflated with that of Walter. I do not write his blog entries!

Moreover, I also think they are due for an increase in quality of treatment. About two months ago I told Walter that his writing is overly imprecise and glosses over a lot of detail. To me it conveys the feeling that he considers the reader not too smart and not too critical, and just willing to accept a lot of statements at face value. If I were tasked with writing on the same subjects, I'd make damn sure I could stand to lose a toe for every incorrect word in there. (Newsgroup standards aren't that high, hallelujah.)

He rightly said in his defense that a blog entry can't be an in-depth treatment for the time he can afford to invest in it. My opinion of the whole banana is that if you want to make an authoritative blog that's also short and sweet, you must end up pondering each word carefully and providing ample references for the interested.

Andrei
Oct 31 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Andrei Alexandrescu wrote:
[snip]
 There is a paper that I don't have the time to look for, that does a 
 pointer analysis and associate with each function a little graph showing 
 how the return value depends on the inputs. (Does anyone know of it?) I 
 don't think we can afford to implement anything near that level of 
 sophistication.

Lo and behold, I found the paper. I'm amazed because it's been many years since I read it and I never fully understood it :o).

http://suif.stanford.edu/papers/wilson95/paper.html

The pointer representations they use form little graphs that capture their aliasing relationships.

Andrei
Oct 31 2008
parent "Dave" <Dave_member pathlink.com> writes:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:geglbg$2f8s$1 digitalmars.com...
 Andrei Alexandrescu wrote:
 [snip]
 There is a paper that I don't have the time to look for, that does a 
 pointer analysis and associate with each function a little graph showing 
 how the return value depends on the inputs. (Does anyone know of it?) I 
 don't think we can afford to implement anything near that level of 
 sophistication.

Lo and behold, I found the paper. I'm amazed because it's been many years since I read it and I never fully understood it :o). http://suif.stanford.edu/papers/wilson95/paper.html The pointer representation they use form little graphs that represent their aliasing relationship. Andrei

Another interesting blog on the subject of pointer analysis. This one talks about using a relational query language over Binary Decision Diagrams to make PA scalable and relatively easy to build into static analysis tools:

http://blog.robjsoftware.org/2007/10/bddbddb-pointer-alias-analysis-comes-of.html

which refers to:

http://suif.stanford.edu/papers/pldi04.pdf
Nov 01 2008
prev sibling next sibling parent reply ore-sama <spam here.lot> writes:
Andrei Alexandrescu Wrote:

 In this case it's not even conservative, it's doing the correct thing. 
 To illustrate the effects of a conservative approach, consider:
 
 scope int *foo(scope int *p)
 {
     int x;
     auto r = &x;
     r = p;
     return r;
 }
 
 This function won't compile if the compiler employs flow-insensitive 
 escape analysis (see e.g. 
 http://en.wikipedia.org/wiki/Data_flow_analysis ) because r will be 
 marked as escaping regardless of order of statements.

If the compiler is able to mark data as escaping and compare the escape marks of variables, isn't it able to derive the parameters' marks rather than requesting them from programmers? It could thus build the function's precise signature; reading it from the obj file (another long-announced feature), it could decide whether that function is callable from the current context. The compiler should also check objs before linking to make sure that their metadata don't conflict.
Nov 01 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
ore-sama wrote:
 Andrei Alexandrescu Wrote:
 
 In this case it's not even conservative, it's doing the correct
 thing. To illustrate the effects of a conservative approach,
 consider:
 
 scope int *foo(scope int *p) { int x; auto r = &x; r = p; return r;
  }
 
 This function won't compile if the compiler employs
 flow-insensitive escape analysis (see e.g. 
 http://en.wikipedia.org/wiki/Data_flow_analysis ) because r will be
  marked as escaping regardless of order of statements.

If compiler is able to mark data as escaping and compare escape marks of variables, isn't it able to derive parameters' marks rather than requesting them from programmers? Thus it can build function's precise signature and reading it from obj file (another long-announced feature) it can decide whether that function is callable from current context, but compiler should also check objs before linking to make sure that their metadata don't conflict.

Excellent point. The compiler can do that, but it would take a long time and would need access to the entire source code (the dreaded "interprocedural analysis"). This makes modular development difficult.

A solution is to have the programmer put information about the function in the function signature. There's a tradeoff to be made so as not to burden the programmer with writing too long signatures.

Andrei
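The inference-then-publish idea under discussion can be sketched roughly as follows (hypothetical Python with invented names, not a compiler design): derive each parameter's mark from the body once, publish it as signature metadata, and check call sites against the metadata alone.

```python
# Toy model of inferring escape marks and checking calls against them.
# 'escaping_in_body' stands in for whatever the body analysis reports.

def infer_signature(params, escaping_in_body):
    """A parameter is 'scope' iff nothing in the body lets it escape."""
    return {p: ("escapes" if p in escaping_in_body else "scope")
            for p in params}

def check_call(sig, locals_passed):
    """Report parameters that would leak the address of a local:
    locals_passed maps parameter name -> whether a local's address is passed."""
    return [p for p, is_local in locals_passed.items()
            if is_local and sig[p] == "escapes"]

# int* bar(int* p) { return p; } -- inference marks p as escaping.
sig_bar = infer_signature(["p"], {"p"})
assert sig_bar == {"p": "escapes"}

# Passing &local to bar must be flagged; passing a global pointer is fine.
assert check_call(sig_bar, {"p": True}) == ["p"]
assert check_call(sig_bar, {"p": False}) == []
```

This also shows why the metadata must travel with the obj file: the caller's check uses only the signature table, never the body.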
Nov 01 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 Excellent point. The compiler can do that, but it would take a long time 
 and would need access to the entire source code (the dreaded 
 "interprocedural analysis"). This makes modular development difficult.
 
 A solution is to have the programmer put information about the function 
 in the function signature. There's a tradeoff to be made so as not to 
 burden the programmer with writing too long signatures.

I can see a better intermediate solution: let the compiler do interprocedural analysis on the visible/available modules, and let the programmer manually add the required information at the points where the known and unknown modules interface with each other. This minimizes the work of the programmer while giving the compiler all the information it needs.

Bye,
bearophile
Nov 01 2008
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-10-31 23:51:45 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 The problem is, unless you allow fully specifying dependencies, there 
 is going to be ways people call functions that you didn't think of, 
 which are legitimate designs, and just can't be done with your new 
 regime.  The ultimate result is either people cast around the system, 
 don't use the system, or end up allocating unecessarily on the heap to 
 work around it.

It all depends on how often those cases are encountered.

Tell me how often you encounter a function that swaps two variables then. Basically, it's implemented like this:

    void swap(scope ref int a, scope ref int b)
    {
        int tmp = a;
        a = b; b = tmp;
    }

All fine, no reference is escaping. Now change the parameters to pointers or object references. You could naively implement it like this for pointers:

    void swap(scope ref int* a, scope ref int* b)
    {
        scope int* tmp = a;
        a = b; b = tmp;
    }

But if swap compiles like this, it's dangerous because the scope for *a may be different from the scope for *b, and you may leak a pointer to a narrower scope into a broader one. For instance:

    // broader scope (global)
    int a = 1;
    int* pa = &a;

    void test()
    {
        // narrower scope (local)
        int b = 2;
        int* pb = &b;
        swap(pa, pb); // should be an error
    }

Here, swap(pa, pb), if allowed, would attempt to put a pointer to the local variable b into the global variable pa. After the function returns, pointer pa becomes invalid. To avoid this, you need the signature for swap to tell you a more complex scope constraint. Using Robert Jacques's suggested notation, I think it should be like:

    void swap(scope ref int* a, scope ref int* b)
        if ((*a).scope <= b.scope && (*b).scope <= a.scope)
    {
        scope int* tmp = a;
        a = b; b = tmp;
    }

Basically, if you're giving b a pointer to the value pointed to by a, the value pointed to by a needs to be alive as long as or longer than b if you want to avoid having b point to invalid memory. Same for a and the value pointed to by b.

Robert's notation is good for expressing that in the function signature, but I don't like it because it places constraints on the signature, not on the type, making constraint propagation a lot more complicated.

So I've been thinking of this notation:

    void swap(scopeof(b*) ref int* a, scopeof(*a) ref int* b)
    {
        scopeof(b) int* tmp = a;
        a = b; b = tmp;
    }

Note that tmp here needs to be scopeof(a): this guarantees that whatever you put in tmp, you can copy into b later (since scopeof(a) == scopeof(*b)).

Oh, and if you don't care about the type of tmp, including the scoping constraints, you should be allowed to substitute it all with "auto":

    void swap(scopeof(b*) ref int* a, scopeof(*a) ref int* b)
    {
        auto tmp = a;
        a = b; b = tmp;
    }

- - -

That's enough for the swap function. Now let's take a look at a plain setter function, which, I'm pretty sure, is a common pattern. Let's try this first:

    struct S
    {
        int *pi;
    }

    void setPI(scope S s, scope int* pi)
    {
        s.pi = pi;
    }

Simple enough, you think? But we have the same problem: the scope of the value pointed to by pi may be narrower than the one s is in:

    S s1; // global scope
    int i1;

    void test()
    {
        int i2;
        S s2;

        setPI(s1, &i1); // ok, i1 same scope as s1.pi.
        setPI(s2, &i1); // ok, i1 is of a broader scope than s2.pi.
        setPI(s1, &i2); // should be an error, i2 scope narrower than s1.pi.
        setPI(s2, &i2); // ok, i2 is same scope as s2.pi.
    }

To enforce the constraints correctly while still allowing the usage of s2 above, where the second argument given to the setter may be a local variable, here's what we need:

    struct S
    {
        scope int *pi;
    }

    void setPI(scope S s, scopeof(s.pi) int* pi)
    {
        s.pi = pi;
    }

"scope int *pi" in the struct means you can't take the value pointed to by pi and keep it beyond S's scope. "scopeof(s.pi) int* pi" in the function signature means that the scope of the value pointed to by the second argument is at least as broad as the given scope for s.pi.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
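The swap constraint above can be restated with scope depths (a hypothetical Python sketch, not a proposed D semantics): depth 0 is broadest (global), larger is narrower, and swap is legal only when each pointee outlives the slot it will land in.

```python
# Toy model of the swap constraint: swap stores each pointee into the
# other slot, so each pointee's scope depth must not exceed (i.e. must be
# at least as broad as) the depth of the slot receiving it.

def swap_allowed(slot_a, pointee_a, slot_b, pointee_b):
    """Depths: 0 = global/broadest; larger = narrower scope."""
    return pointee_b <= slot_a and pointee_a <= slot_b

GLOBAL, LOCAL = 0, 1

# int a; int* pa = &a;  void test() { int b; int* pb = &b; swap(pa, pb); }
# &b (local) would land in pa (global slot), so this must be rejected.
assert not swap_allowed(GLOBAL, GLOBAL, LOCAL, LOCAL)

# Two locals at the same depth can be swapped safely.
assert swap_allowed(LOCAL, LOCAL, LOCAL, LOCAL)
```

The point Michel makes falls out directly: a bare "scope" annotation records none of these depths, so the check cannot even be expressed in it.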
Nov 01 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-10-31 23:51:45 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 The problem is, unless you allow fully specifying dependencies, there 
 is going to be ways people call functions that you didn't think of, 
 which are legitimate designs, and just can't be done with your new 
 regime.  The ultimate result is either people cast around the system, 
 don't use the system, or end up allocating unecessarily on the heap 
 to work around it.

It all depends on how often those cases are encountered.

Tell me how often you encounter a function that swap two variables then. Basically, it's implemented like this: void swap(scope ref int a, scope ref int b) { int tmp = a; a = b; b = tmp; } All fine, no reference is escaping. Now change the parameters for pointers or object references. You could naively implement it like this for pointers: void swap(scope ref int* a, scope ref int* b) { scope int* tmp = a; a = b; b = tmp; }

"scope" is not needed inside the function (in both cases). The compiler will infer it.
 But if swap compiles like this, it's dangerous because the scope for *a 
 may be different from the scope for *b and you may leak a pointer to a 
 narrower scope in a broader one.

Aha! Good point. But notice that you are trying to extract more juice from "scope" than there is in it. What scope says in swap is only that swap itself does not escape any of its parameters. It doesn't tell you that the parameters actually have the same scope.
 For instance:
 
     // broader scope (global)
     int a = 1;
     int* pa = &a;
 
     void test()
     {
         // narrower scope (local)
         int b = 2;
         int* pb = &b;
         swap(pa, pb); // should be an error
     }
 
 Here, swap(pa, pb), if allowed, would attempt to put a pointer to the 
 local variable b into the global variable pa. After the function 
 returns, pointer pa becomes invalid. To avoid this, you need the 
 signature for swap to tell you a more complex scope constraint. Using 
 Robert Jacques's suggested notation, I think it should be like:
 
     void swap(scope ref int* a, scope ref int* b)
         if ((*a).scope <= b.scope && (*b).scope <= a.scope)
     {
         scope int* tmp = a;
         a = b; b = tmp;
     }

This would require each variable to carry its actual scope either at compilation or at runtime. Either would be very hard to implement if at all. But I'm glad there is a bit of exaggeration on the side of safety :o).
 Basically, if you're giving b a pointer to the value pointed by a, the 
 value pointed by a needs to be alive as long or longer than b if you 
 want to avoid having b point to invalid memory. Same for a and the value 
 pointed by b.
 
 Robert's notation is good for expressing that in the function signature, 
 but I don't like it because it places constrains on the signature, not 
 on the type, making constrains propagation a lot more complicated.
 
 So I've been thinking of this notation:
 
     void swap(scopeof(b*) ref int* a, scopeof(*a) ref int* b)
     {
         scopeof(b) int* tmp = a;
         a = b; b = tmp;
     }

That's nice, but it would reduce what swap can do. Oftentimes it is not known whether values manipulated by a function are in the same scope or not.
 Note that tmp here needs to be scopeof(a): this guarantees that whatever 
 you put in tmp, you can copy into b later (since scopeof(a) == scopeof(*b)).
 
 Oh, and if you don't care about the type of tmp, including the scoping 
 constraints, you should be allowed to substitute it all with "auto":
 
     void swap(scopeof(b*) ref int* a, scopeof(*a) ref int* b)
     {
         auto tmp = a;
         a = b; b = tmp;
     }
 
 - - -
 
 That's enough for the swap function. Now lets take a look at a plain 
 setter function, which, I'm pretty sure, is a common pattern.
 
 Let's try this first:
 
     struct S
     {
         int *pi;
     }
 
     void setPI(scope S s, scope int* pi)
     {
         s.pi = pi;
     }

 Simple enough you think? But we have the same problem: the scope of the 
 value pointed by pi may be narrower than the one s is in:

I am glad there is awareness of the limitations of scope. There are still things that are outside scope's charter, and the summary is - scope tells something about the function declaring it, not about a relationship between its parameters. Andrei
Nov 01 2008
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-01 12:02:47 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 Basically, it's implemented like this:
 
     void swap(scope ref int a, scope ref int b)
     {
         int tmp = a;
         a = b; b = tmp;
     }
 
 All fine, no reference is escaping. Now change the parameters for 
 pointers or object references. You could naively implement it like this 
 for pointers:
 
     void swap(scope ref int* a, scope ref int* b)
     {
         scope int* tmp = a;
         a = b; b = tmp;
     }

"scope" is not needed inside the function (in both cases). The compiler will infer it.

Indeed.
 But if swap compiles like this, it's dangerous because the scope for *a 
 may be different from the scope for *b and you may leak a pointer to a 
 narrower scope in a broader one.

Aha! Good point. But notice, however, that you are trying to extract more juice from "scope" than there is in it. What scope says in swap is only that swap itself does not escape any of its parameters. It doesn't tell that the parameters have actually the same scope.

But my point was that the "no escape" guarantee doesn't hold if the two arguments' scopes don't match. The parameter may escape via the other parameter.
 For instance:
 
     // broader scope (global)
     int a = 1;
     int* pa = &a;
 
     void test()
     {
         // narrower scope (local)
         int b = 2;
         int* pb = &b;
         swap(pa, pb); // should be an error
     }
 
 Here, swap(pa, pb), if allowed, would attempt to put a pointer to the 
 local variable b into the global variable pa. After the function 
 returns, pointer pa becomes invalid. To avoid this, you need the 
 signature for swap to tell you a more complext scope constrain. Using 
 Robert Jacques's suggested notation, I think it should be like:
 
     void swap(scope ref int* a, scope ref int* b)
         if ((*a).scope <= b.scope && (*b).scope <= a.scope)
     {
         scope int* tmp = a;
         a = b; b = tmp;
     }

This would require each variable to carry its actual scope either at compilation or at runtime. Either would be very hard to implement if at all. But I'm glad there is a bit of exaggeration on the side of safety :o).

I wouldn't call it exaggeration. Adding complexity to function signatures only to be half-safe isn't very appealing to me.
 Basically, if you're giving b a pointer to the value pointed by a, the 
 value pointed by a needs to be alive as long or longer than b if you 
 want to avoid having b point to invalid memory. Same for a and the 
 value pointed by b.
 
 Robert's notation is good for expressing that in the function 
 signature, but I don't like it because it places constrains on the 
 signature, not on the type, making constrains propagation a lot more 
 complicated.
 
 So I've been thinking of this notation:
 
     void swap(scopeof(b*) ref int* a, scopeof(*a) ref int* b)
     {
         scopeof(b) int* tmp = a;
         a = b; b = tmp;
     }

That's nice, but it would reduce what swap can do. Oftentimes it is not known whether values manipulated by a function are in the same scope or not.

The caller function should know, and from there it could be possible to check against the constraint. About reducing what swap can do, I'd like to see
 That's enough for the swap function. Now lets take a look at a plain 
 setter function, which, I'm pretty sure, is a common pattern.
 
 Let's try this first:
 
     struct S
     {
         int *pi;
     }
 
     void setPI(scope S s, scope int* pi)
     {
         s.pi = pi;
     }
 
 Simple enough you think? But we have the same problem: the scope of the 
 value pointed by pi may be narrower than the one s is in:

I am glad there is awareness of the limitations of scope. There are still things that are outside scope's charter, and the summary is - scope tells something about the function declaring it, not about a relationship between its parameters.

So basically, if you have two scope arguments and you assign some part of the first to the other, you risk bypassing the scope-checking system despite appearances to the contrary. I'm not sure it's a good idea to have a scope-checking system you can only half trust.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 01 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-01 12:02:47 -0400, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Michel Fortin wrote:
 Basically, it's implemented like this:

     void swap(scope ref int a, scope ref int b)
     {
         int tmp = a;
         a = b; b = tmp;
     }

 All fine, no reference is escaping. Now change the parameters for 
 pointers or object references. You could naively implement it like 
 this for pointers:

     void swap(scope ref int* a, scope ref int* b)
     {
         scope int* tmp = a;
         a = b; b = tmp;
     }

"scope" is not needed inside the function (in both cases). The compiler will infer it.

Indeed.
 But if swap compiles like this, it's dangerous because the scope for 
 *a may be different from the scope for *b and you may leak a pointer 
 to a narrower scope in a broader one.

Aha! Good point. But notice, however, that you are trying to extract more juice from "scope" than there is in it. What scope says in swap is only that swap itself does not escape any of its parameters. It doesn't tell that the parameters have actually the same scope.

But my point was that the "no escape" guarenty doesn't hold if the two arguments' scope doesn't match. The parameter may escape via the other parameter.
 For instance:

     // broader scope (global)
     int a = 1;
     int* pa = &a;

     void test()
     {
         // narrower scope (local)
         int b = 2;
         int* pb = &b;
         swap(pa, pb); // should be an error
     }

 Here, swap(pa, pb), if allowed, would attempt to put a pointer to the 
 local variable b into the global variable pa. After the function 
 returns, pointer pa becomes invalid. To avoid this, you need the 
 signature for swap to tell you a more complext scope constrain. Using 
 Robert Jacques's suggested notation, I think it should be like:

     void swap(scope ref int* a, scope ref int* b)
         if ((*a).scope <= b.scope && (*b).scope <= a.scope)
     {
         scope int* tmp = a;
         a = b; b = tmp;
     }

This would require each variable to carry its actual scope either at compilation or at runtime. Either would be very hard to implement if at all. But I'm glad there is a bit of exaggeration on the side of safety :o).

I wouldn't call it exageration. Adding complexity to function signatures only to be half-safe isn't very appealing to me.
 Basically, if you're giving b a pointer to the value pointed by a, 
 the value pointed by a needs to be alive as long or longer than b if 
 you want to avoid having b point to invalid memory. Same for a and 
 the value pointed by b.

 Robert's notation is good for expressing that in the function 
 signature, but I don't like it because it places constrains on the 
 signature, not on the type, making constrains propagation a lot more 
 complicated.

 So I've been thinking of this notation:

     void swap(scopeof(b*) ref int* a, scopeof(*a) ref int* b)
     {
         scopeof(b) int* tmp = a;
         a = b; b = tmp;
     }

That's nice, but it would reduce what swap can do. Oftentimes it is not known whether values manipulated by a function are in the same scope or not.

The caller function should know, and from there it could be possible to check against the constrain. About reducing what swap can do, I'd like to see
 That's enough for the swap function. Now lets take a look at a plain 
 setter function, which, I'm pretty sure, is a common pattern.

 Let's try this first:

     struct S
     {
         int *pi;
     }

     void setPI(scope S s, scope int* pi)
     {
         s.pi = pi;
     }

 Simple enough you think? But we have the same problem: the scope of 
 the value pointed by pi may be narrower than the one s is in:

I am glad there is awareness of the limitations of scope. There are still things that are outside scope's charter, and the summary is - scope tells something about the function declaring it, not about a relationship between its parameters.

So basically, if you have two scope arguments and you assign some part of the first to the other, you risk bypassing the scope-checking system despite the appearances to the contrary. I'm not sure it's a good idea to have a scope-checking system if you can only trust in half.

I think you're putting forth a compelling argument. Thank you! I will try to get a hold of Walter today and discuss the matter. Off the top of my head, there are the following ways to go:

1. Cop-out: do whatever is the syntactic minimum for allowing closure allocation when requested. Don't check anything, trust the coder.

2. Accept: implement scope with an understanding of its issues and limitations. Your argument weakens the motivation to pursue this approach.

3. Dig further: look into how scope can be made indeed good. My current understanding is that a partial ordering of scopes would be interesting to think about.

On a side note, I'm very glad to see Walter having taken this bull by the horns. This is entirely his initiative - and if it turns sour, it's all his fault! :o) - tackling a difficult problem that revealed an embarrassing corner of the language.

Andrei
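A partial ordering of scopes, as floated in option 3 above, might look like this in miniature (hypothetical Python sketch; scope names and the enclosure graph are invented): an assignment is accepted only when the source value's scope encloses the destination's.

```python
# Toy model of a scope partial order: each scope names its enclosing
# scope, and `dst = src` is safe only when src's value lives in a scope
# that outlives dst's.

ENCLOSING = {  # child scope -> its enclosing scope
    "test_body": "module",
    "if_block": "test_body",
}

def outlives(a, b):
    """True if scope a is b itself or one of b's enclosing scopes."""
    while b is not None:
        if a == b:
            return True
        b = ENCLOSING.get(b)
    return False

def assignment_ok(dst_scope, src_value_scope):
    return outlives(src_value_scope, dst_scope)

assert assignment_ok("test_body", "module")     # global value into local: fine
assert not assignment_ok("module", "if_block")  # local value into global: error
```

It is only a partial order: two sibling scopes are incomparable, which is exactly the case the earlier swap examples trip over.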
Nov 01 2008
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Walter plans to implement path- and flow-insensitive escape analysis.

This is good! It is what I was hoping for as a fix for trivial examples.
 In fact he first wants to define scope to work on credit alone (trust the 
 programmer) and later on implement the analysis. I hope I'll be able to 
 talk him out of that, because it's a precisely backwards approach: he'll 
 first allow vale tudo, and then break code when he implements a limited 
 analysis. I think the right thing is to move from conservative to 
 permissive.

I agree that implementing it on credit alone is going to open up problems later on.
 scope T[] findFirst(scope T[] buf, scope T target);

 scope T[] findFirst5(scope T[] buf)
 {
    scope t5 = new T(5);
    return findFirst(buf, t5);
 }

Same thing, won't compile. This is a clear example where conservative doesn't cut it.

OK, so then your proposed solution won't work for valid code like this. Will there be a way to force this to compile?
 There is a paper that I don't have the time to look for, that does a 
 pointer analysis and associate with each function a little graph showing 
 how the return value depends on the inputs. (Does anyone know of it?) I 
 don't think we can afford to implement anything near that level of 
 sophistication.

I think something like this isn't possible, especially using interfaces or function pointers, but might be possible if graph dependencies are resolved at link-time. And even then, it might reject some valid code.
 The problem is, unless you allow fully specifying dependencies, there is 
 going to be ways people call functions that you didn't think of, which 
 are legitimate designs, and just can't be done with your new regime.  The 
 ultimate result is either people cast around the system, don't use the 
 system, or end up allocating unecessarily on the heap to work around it.

It all depends on how often those cases are encountered.

It only takes one common function to ruin the whole thing. Like I said before, it is viral. If one lower level function can't be scope, then nothing above it can be scope. Unless there is a way to cast away the scope property, in which case you have compromised the system.
 So yes, you are protected, but at the cost of not being able to write 
 code that is provably safe.  The end result is you end up removing 
 scope where it really should be, and then virally, it makes you remove 
 scope decorations everywhere else.  I predict that if this is 
 implemented, nobody will be able to use it.

constructs. Each case is a matter of nuance, but by and large types have turned out to be successful.

I would bet that either A) the proposed system would not provide correct protection, or B) the system would require many design changes in a library like Tango, which uses lots of stack variables and inner functions to achieve higher performance. It just doesn't work with actual, already-successful software. It's too limited.

It would be helpful to provide some examples from Tango or other libraries to substantiate this.

I will find some examples. Just not today, I've got some fall cleanup to do ;)
 I'll reiterate my challenge from another thread:

 void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

 struct S
 {
    int *v;
 }

 int *f2(S* s) { return s.v;}

 void f3(ref int *a, ref int *b, ref int *c)
 {
    int *tmp = a;
    a = b; b = c; c = tmp;
 }

 void f4()
 {
    int a=1, b=2, c=3;
    auto ap = &a;
    auto bp = &b;
    auto cp = &c;
    S s(cp);

    f(ap, bp, cp);
    b = *f2(&s);
    f3(ap, bp, cp);
 }

 Please tell me how I should mark f, S, f2, and f3 such that f4 
 compiles. Note that this code all works in D1 and provably has no 
 problems with escaping values, since f4 returns void, and no globals 
 are accessed.

the hypothesis "f4 returns void, and no globals are accessed" is not all that's needed for the proof. Full access to function bodies is necessary.

OK: if f4 returns void, takes no arguments, none of the code used by any of the functions accesses globals (I admit I may have said this ambiguously last time), and there's not even any heap activity, then you tell me how anything can escape. Prove me wrong. You have full access to the source, as you can see.

That's exactly what I said. Without access to the bodies you can't tell anything. And I automatically discount access to bodies. Information should be inferred from function signatures.

Sure, which is why I posted the challenge. You may have misunderstood what I meant. I meant, look at this group of functions. Having access to the source code, you can prove that it is valid, and no escapes occur. Now remove the function bodies for everything but f4, and show me how your solution marks up the signatures such that f4 compiles. I think with being able to specify the scope return value, it's probably a lot easier than without that.
 Correctness may be provable to a human or machine having access to the 
 entire body of code, but not if e.g. f, f2, and f3 are invisible to f4.

That is the challenge. If I remove the function bodies, how do you mark the signatures so you can still prove that it's correct?
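For the record, here is one way the challenge's signatures might be marked under a purely hypothetical extension where `scope(x)` ties a result's or ref parameter's lifetime to parameter x. The notation is invented, not valid D; the sketch also shows how convoluted such signature algebra gets.

```d
// Hypothetical 'scope(x)' notation -- invented for illustration only.
void f(ref scope(b, c) int* a,  // a ends up pointing at b's or c's target
       scope int* b, scope int* c);

struct S { int* v; }
scope(s) int* f2(scope S* s);   // result aliases a field reached through s

void f3(ref scope(b) int* a,    // a receives b's old value
        ref scope(c) int* b,    // b receives c's old value
        ref scope(a) int* c);   // c receives a's old value
```

With annotations like these, f4 could in principle typecheck without the bodies, but every lifetime relationship has to be spelled out by hand.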

Instead of looking at an artificial example, it would be better to work on a corpus of examples extracted from actual code.

OK, I'll find some (per your request above, from Tango).
 Second, I contest the entire example. Static typing is supposed to prove 
 interesting properties about *most* correct programs, at the cost of 
 disallowing a *few* correct programs that are not very frequent, are not 
 very interesting, or can be easily replaced with typesafe alternatives. 
 One can write correct code based on void* that is virtually impossible 
 to typecheck statically, but that doesn't show that static types are 
 useless.

My example is at least as interesting as Walter's in the article for showing the problem. If you want to complain about simple examples not being useful, then stop using simple examples to make your point. We are not talking about mucking with void*'s or drastic violations of the type system here; we are talking about reference parameters used for returns, an accessor of a member, or a three-way swap function. These are all common designs. The fact that you cannot find a solution, and so resort to bluntly calling my example 'uninteresting', speaks for itself.

This is the second time my identity is conflated with that of Walter. I do not write his blog entries! Moreover, I also think they are due for an increase in quality of treatment. About two months ago I told Walter that his writing is overly imprecise and glossing over a lot of detail. To me it conveys the feeling that he considers the reader not too smart and not too critical and just willing to accept a lot of statements at face value. If I were tasked with writing on the same subjects, I'd make damn sure I could stand to lose a toe for every incorrect word in there. (Newsgroup standards aren't that high, hallelujah.) He rightly said in his defense that a blog entry can't be an in-depth treatment for the time he can afford to invest in it. My opinion of the whole banana is that if you want to make an authoritative blog that's also short and sweet you must end up pondering each word carefully and provide ample references for the interested.

Very true, but contrived simple examples are used almost exclusively to illustrate a point when discussing the merits and drawbacks of proposed compiler features. You can't just discount mine because I made it up; it is a valid piece of code, provably not dangerous, and it has elements that are common in program design. Your rebuttal rings hollow because you have sidestepped the main argument in favor of essentially calling my code names. I'll find some real examples like this in Tango, if nothing else to remove this argument about unimportance. -Steve
Nov 01 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 scope T[] findFirst(scope T[] buf, scope T target);

 scope T[] findFirst5(scope T[] buf)
 {
    scope t5 = new T(5);
    return findFirst(buf, t5);
 }

doesn't cut it.

OK, so then your proposed solution won't work for valid code like this. Will there be a way to force this to compile?

That's an excellent point. Casts won't work because scope is not really a type constructor. I think we need to explore more Walter's idea in a post of his: to be able to attach the scope of the returned value to a specific parameter. Andrei
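To make that idea concrete: under Walter's proposal, the findFirst example above might look something like the following. The `scope(buf)` return notation is invented here purely as a sketch.

```d
// Invented notation: 'scope(buf)' on the return type means the result's
// lifetime is bounded by the 'buf' argument, not by the callee's frame.
scope(buf) T[] findFirst(scope T[] buf, scope T target);

scope(buf) T[] findFirst5(scope T[] buf)
{
    scope t5 = new T(5);
    return findFirst(buf, t5);  // now OK: the result is tied to buf, not t5
}
```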
Nov 01 2008
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 In order for this to be useful, we need to have a way to express total 
 relationships between parameters and the return value.  And that would 
 probably be so convoluted that it's not worth the syntax.

That would certainly address that problem.
Oct 31 2008
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/

Oh, I encourage people who find the discussion even marginally interesting to upvote the article. The reddit militia is at it again - Walter's articles regularly get downvoted by a few people, even within so short a time that it would be impossible for the voters to have read them. Andrei
Oct 31 2008
parent reply Jason House <jason.james.house gmail.com> writes:
Not to be rude, but the articles usually aren't very good. This article gives a
basic example of the problem space, and mentions the scope keyword to fix it,
but that's about it. It doesn't show the code with the fix. It doesn't discuss
transitivity, multiple types of scope, or any other concerns discussed on this
list.

If you double the length of the article by discussing these deeper issues, the
article would be way better. Since D articles already have a reputation to be
like this one, I'd avoid useless mentions of D, at least at the beginning of
the article.

Andrei Alexandrescu Wrote:

 Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/

Oh, I encourage people who find the discussion even marginally interesting to upvote the article. The reddit militia is at it again - Walter's articles get regularly down voted by a few people, even within so short time it would be impossible for the voters to have read them. Andrei

Oct 31 2008
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Not to be rude, but the articles usually aren't very good. This
 article gives a basic example of the problem space, and mentions the
 scope keyword to fix it, but that's about it. It doesn't show the
 code with the fix. It doesn't discuss transitivity, multiple types of
 scope, or any other concerns discussed on this list.
 
 If you double the length of the article by discussing these deeper
 issues, the article would be way better. Since D articles already
 have a reputation to be like this one, I'd avoid useless mentions of
 D, at least at the beginning of the article.

I agree. Andrei
Oct 31 2008
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 Not to be rude, but the articles usually aren't very good. This
 article gives a basic example of the problem space, and mentions the
 scope keyword to fix it, but that's about it. It doesn't show the
 code with the fix. It doesn't discuss transitivity, multiple types of
 scope, or any other concerns discussed on this list.
 
 If you double the length of the article by discussing these deeper
 issues, the article would be way better. Since D articles already
 have a reputation to be like this one, I'd avoid useless mentions of
 D, at least at the beginning of the article.

Point taken, but on the other hand, I've found if the articles are long ones they don't get read. Blogs aren't intended to be an in-depth treatment.
Oct 31 2008
parent reply Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 Jason House wrote:
 Not to be rude, but the articles usually aren't very good. This
 article gives a basic example of the problem space, and mentions the
 scope keyword to fix it, but that's about it. It doesn't show the
 code with the fix. It doesn't discuss transitivity, multiple types of
 scope, or any other concerns discussed on this list.
 
 If you double the length of the article by discussing these deeper
 issues, the article would be way better. Since D articles already
 have a reputation to be like this one, I'd avoid useless mentions of
 D, at least at the beginning of the article.

Point taken, but on the other hand, I've found if the articles are long ones they don't get read. Blogs aren't intended to be an in-depth treatment.

I was responding to the frequent requests to vote for D articles. I definitely agree there's a sweet spot for article length. I wasn't trying to insult or change what is written. I only wanted to point out how to get my vote, and possibly the vote of others.
Oct 31 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 I was responding to the frequent requests to vote for D articles. I
 definitely agree there's a sweet spot for article length. I wasn't
 trying to insult or change what is written. I only wanted to point
 out how to get my vote, and possibly the vote of others.

That's fair, and I appreciate the constructive criticism. Would you mind if I passed the next one by you for comments before I post it? There is another hamster lurking in the closet, however. I try to do two of them a month, and doing two in-depth articles a month is quite a bit of work. I guess I don't know just where that sweet spot is.
Oct 31 2008
parent Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 Jason House wrote:
 I was responding to the frequent requests to vote for D articles. I
 definitely agree there's a sweet spot for article length. I wasn't
 trying to insult or change what is written. I only wanted to point
 out how to get my vote, and possibly the vote of others.

That's fair, and I appreciate the constructive criticism. Would you mind if I passed the next one by you for comments before I post it?

I don't mind. It could even be fun :)
 There is another hamster lurking in the closet, however. I try to do two 
 of them a month, and doing two in-depth articles a month is quite a bit 
 of work. I guess I don't know just where that sweet spot is.

I don't know where the sweet spot is either :) I won't be offended if you reject my tips about article additions. It's much easier to be a critic than an author.
Nov 01 2008
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/

I think for this complex stuff Lambda the Ultimate is a much better fit than Reddit :-) Bye, bearophile
Oct 31 2008
parent Sean Kelly <sean invisibleduck.org> writes:
bearophile wrote:
 Walter Bright:
 http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/

I think for this complex stuff Lambda the Ultimate is much fitter than Reddit :-)

It's a bit off-topic, but Jaron Lanier wrote an interesting diatribe recently on metafilters such as reddit. Definitely worth a read if you like that sort of thing: http://edge.org/3rd_culture/lanier06/lanier06_index.html Sean
Oct 31 2008
prev sibling parent reply ore-sama <spam here.lot> writes:
Walter Bright Wrote:

 http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/

how to mark and verify this?

void foo()
{
    void goo()
    {
        DoSomeJob();
    }
    JobHandle job = ThreadPool.QueueJob(&goo); // goo goes to the global state
    ...
    job.WaitForCompletion(); // but it's safe
}
Oct 31 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"ore-sama" wrote
 Walter Bright Wrote:

 http://www.reddit.com/r/programming/comments/7akzd/escape_analysis/

how to mark and verify this?

void foo()
{
    void goo()
    {
        DoSomeJob();
    }
    JobHandle job = ThreadPool.QueueJob(&goo); // goo goes to the global state
    ...
    job.WaitForCompletion(); // but it's safe
}

I think your example would be impossible for the compiler to prove safe, even with all the code available, and all dependency graphs available. This might be a case where you have to force the compiler to accept it. Walter, if you do implement this feature, will there at least be a way to force perceived incorrect code to compile? Similar to casting away const when you know what you are doing. -Steve
Nov 01 2008
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-01 11:23:17 -0400, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 "ore-sama" wrote
 how to mark and verify this?
 
 void foo()
 {
   void goo()
   {
     DoSomeJob();
   }
   JobHandle job=ThreadPool.QueueJob(&goo); //goo goes to the global state
   ...
   job.WaitForCompletion(); //but it's safe
 }

I think your example would be impossible for the compiler to prove safe, even with all the code available, and all dependency graphs available. This might be a case where you have to force the compiler to accept it.

Just create a scope ThreadPool object that accepts a scopeof(this) function pointer, and make sure the destructor waits for completion of the threads. Hmm, "goo" in this example does not use any local variable of foo, so there wouldn't be a need to allocate anything anyway. ;-) That said, I think casting scope away should be allowed, but undefined behaviour (just as casting away immutable).

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
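Michel's suggestion might look roughly like this. The ThreadPool API here (its constructor, queue method, and joining destructor) is hypothetical, sketched only to show where the lifetimes line up.

```d
// Hypothetical API: a pool whose destructor waits for its queued jobs,
// so a delegate over foo's frame can never outlive foo.
void foo()
{
    scope pool = new ThreadPool();  // destructor runs when foo returns
    void goo() { DoSomeJob(); }
    pool.queue(&goo);               // &goo is bounded by pool's lifetime
    // ...
}                                   // ~ThreadPool() joins all queued jobs
```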
Nov 01 2008
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 Walter, if you do implement this feature, will there at least be a way to 
 force perceived incorrect code to compile?  Similar to casting away const 
 when you know what you are doing.

Yes.
Nov 01 2008