
digitalmars.D - Escape analysis

Walter Bright <newshound1 digitalmars.com> writes:
The delegate closure issue is part of a wider issue - escape analysis. A 
reference is said to 'escape' a scope if it, well, leaves that scope. 
Here's a trivial example:

int* foo() { int i; return &i; }

The reference to i escapes the scope of i, thus courting disaster. 
Another form of escaping:

int* p;
void bar(int* x) { p = x; }

which is, on the surface, legitimate, but fails for:

void abc(int j)
{
     bar(&j);
}

This kind of problem is currently undetectable by the compiler.

The first step is, are function parameters considered to be escaping by 
default or not by default? I.e.:

void bar(noscope int* p);    // p escapes
void bar(scope int* p);      // p does not escape
void bar(int* p);            // what should be the default?

What should be the default? The functional programmer would probably 
choose scope as the default, and the OOP programmer noscope.

(The issue with delegates is we need the dynamic closure only if the 
delegate 'escapes'.)
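To make the last point concrete, here is a small sketch in present-day D2 syntax. The names makeCounter, apply and sumTwice are made up for illustration. The delegate returned by makeCounter escapes its frame, so the captured count has to live in a heap-allocated closure. The delegates passed to apply are only called and never stored, which is the situation a scope parameter is meant to describe. Whether a particular compiler actually skips the allocation in that second case is an implementation detail this sketch does not show.

```d
// The returned delegate escapes makeCounter, so `count` must outlive the
// stack frame; D2 keeps such captured locals in a GC-allocated closure.
int delegate() makeCounter()
{
    int count = 0;
    return () => ++count;
}

// `dg` is only called here, never stored, so it does not escape `apply`.
// Marking the parameter `scope` records that fact in the signature.
int apply(scope int delegate() dg)
{
    return dg();
}

// Both delegates passed to `apply` capture `total` but never outlive this call.
int sumTwice()
{
    int total = 0;
    apply(() => total += 1);
    apply(() => total += 1);
    return total;
}
```

Each call to the delegate returned by makeCounter sees the same surviving count, while sumTwice only ever needs total during its own lifetime.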
Oct 27 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 The delegate closure issue is part of a wider issue - escape analysis. A 
 reference is said to 'escape' a scope if it, well, leaves that scope. 
 Here's a trivial example:

 int* foo() { int i; return &i; }

 The reference to i escapes the scope of i, thus courting disaster. Another 
 form of escaping:

 int* p;
 void bar(int* x) { p = x; }

 which is, on the surface, legitimate, but fails for:

 void abc(int j)
 {
     bar(&j);
 }

 This kind of problem is currently undetectable by the compiler.

 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.

 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

I think the default should be no escape. This should cover 90% of cases, and does not have an 'allocate by default' policy.

But I think whether a variable escapes or not cannot really be determined by the function accepting the variable, since the function doesn't know where the variable comes from. An example:

void bar(int *x, ref int *y) { y = x; }

How do you know that y is not defined in the same scope or a sub-scope of the scope of x? If the compiler sees:

void bar(noscope int *x, scope ref int *y)

it's going to assume that x will always escape, and probably allocate a closure so it can call bar, which might not be the right decision.

I think that without a full graph analysis of what escapes to where, it is going to be impossible to make this correct for the compiler to use, and that might be too much for the compiler to deal with. I'd rather just have the compiler assume scope unless told otherwise (at the point of use, not in the function signature). For instance:

void bar(int *x, ref int *y) { y = x; }

void abc(int x, ref int *y)
{
     bar(noscope &x, y);
}

void abc2()
{
     int x;
     int *y;
     bar(&x, y);
}

This tells the compiler to allocate a closure for abc, because x might escape, but not to allocate one for abc2, because the developer has indicated no escapes.

-Steve
Oct 27 2008
Hxal <Hxal freenode.irc> writes:
Walter Bright Wrote:
 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.

While requiring parameters by default to not escape the function would be great, because it'd cause less spam (I think they don't escape in most cases) and could make programmers rethink and refactor their code, it'd also be quite a breaking change. Defaulting to no escape checking and providing a scope parameter class therefore seems the more obvious choice. It keeps existing code intact and allows correctness checking and optimization on demand.

My only fear is that the feature will cause much frustration when we can reason that a reference doesn't escape, but the compiler can't know that; for example, putting one scope parameter into another's field, or referencing a scope parameter from a complex return value.

Anyway, if escape analysis is implemented, I'd suggest using more high-level terminology, like temporary and permanent objects. It might make more sense to beginners.
Oct 27 2008
Walter Bright <newshound1 digitalmars.com> writes:
The reason the scope/noscope needs to be part of the function signature 
is because:

1. that makes it self-documenting
2. function bodies may be external, i.e. not present
3. virtual functions
4. notifies the user if a library function parameter scope-ness changes 
(you'll get a compile time error)
Oct 27 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function signature is 
 because:

 1. that makes it self-documenting

But the documentation is not enough. You cannot express the intricacies of which variables escape well enough for the compiler to make intelligent decisions. What this will result in is slightly fewer unnecessary closures, but not enough to make a difference. Or else you won't be able to declare things the way you want, so you will be forced to declare something that *could* result in an escape, but usually doesn't.
 2. function bodies may be external, i.e. not present
 3. virtual functions

Yes, so you are now implying a scope escape contract on all derived classes. But not a very expressive one.
 4. notifies the user if a library function parameter scope-ness changes 
 (you'll get a compile time error)

Oh really? I imagined that if the scope-ness changed it just results in a new heap allocation when I call the function. For example, Joe library developer has this function foo:

int foo(scope int *x) {return *x;}

And he now decides he wants to change it somehow:

int *lastFooCalledWith;
int foo(int *x) {lastFooCalledWith = x; return *x;}

I used foo like this:

int i;
auto j = foo(&i);

So does this now fail to compile? Or does it silently kill the performance of my code? If the latter, we are left with the same problem we have now. If the former, how does one call a function with a noscope parameter?

The more I think about this, the more I'd rather have D1 behavior and some sort of way to indicate my function should allocate a heap frame (except on easily provable scope escapes).

The most common case I think which will cause unnecessary allocations, is a very common case. A class setter:

class X
{
   private int *v_;
   int *v(int *newV) {return v_ = newV;}
   int *v() { return v_;}
}

Clearly, newV escapes into the class instance, but how do we know what the scope of the class instance is to know if newV truly escapes its own scope?

-Steve
Oct 28 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Robert Jacques" wrote
 On Tue, 28 Oct 2008 08:58:18 -0400, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 "Walter Bright" wrote
 4. notifies the user if a library function parameter scope-ness changes
 (you'll get a compile time error)


 Escape analysis also applies to shared/local/scope storage types and not just delegates. Consider having to write a function for every combination of shared/local/scope for every object or pointer in the function signature.

shared/unshared is not a storage class, it is a type modifier (like const). But in any case, shared is much easier to define. Only one line needs to be checked: is this accessible by another thread or not? Since it is a type modifier, it's carried around for every reference to shared data, and you can easily do escape analysis there.

Scope is much more difficult because there are many scopes to consider. It's not just global or not global; you have a scope for each function and a scope for each set of braces within a function, and there is no easy way to say which scope you are referring to when you say a variable is scope. If you can only refer to the current scope, then you have not solved the closure problem, and useful escape analysis is impossible beyond simply 'a pointer to a variable I declared in this scope is being returned.'

In order for escape analysis to be useful, I need to be able to specify, in a function such as:

void foo(int *x, int **y, int **z)

that x might escape to y's or z's scope. How do you allow that specification without making function signatures dreadfully complicated?

-Steve
Oct 28 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Robert Jacques" wrote
 On Tue, 28 Oct 2008 09:44:28 -0400, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 shared/unshared is not a storage class, it is a type modifier (like 
 const).

 No, because shared and local objects get created and garbage collected on different heaps.

That's an interesting point. Shared definitely has to be a type modifier; otherwise, it cannot do this:

shared int x = 0;
int *xp = &x; // error, xp now is unshared, and points to shared data.

But it probably also has to be a storage class. Not sure about that.
 In order for escape analysis to be useful, I need to be able to specify 
 in a
 function such as:

 void foo(int *x, int **y, int **z)

 That x might escape to y's or z's scope.  How do you allow that
 specification without making function signatures dreadfully complicated?

 Well, 'x escapes to y or z' is easy, since it's how D works today.

But what if y or z is not in x's scope? For instance:

void bar(ref int *y, ref int *z)
{
     int x = 5;
     foo(&x, &y, &z);
}

If y or z gets set to &x, then you have to allocate a closure for bar. The opposite example:

void bar(int *y, int *z)
{
     int x = 5;
     foo(&x, &y, &z);
}

No closure is necessary. So you need something to say that y or z can get set to &x, so the compiler would be smart enough to allocate a closure only if y or z exists outside x's scope. Otherwise, you have unnecessary closures, and we are in the same boat as today.

-Steve
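Steve's two bar examples differ only in how long y and z live compared with x. The sketch below is a hypothetical model, not how any D compiler works; Location, depth and escapesInto are invented names. It tags every variable with the nesting depth of the scope that owns it (0 for globals and the GC heap) and counts storing &x into a location as an escape exactly when that location lives in a shallower, longer-lived scope:

```d
// Hypothetical model: each storage location carries the nesting depth of
// the scope that owns it. Depth 0 stands for globals and the GC heap;
// larger depths are nested function or block scopes.
struct Location
{
    string name;
    int depth;
}

// Storing the address of `target` into `dest` escapes when `dest` lives in
// a shallower (longer-lived) scope: the pointer would outlive the variable
// it points to, so `target` would need a heap closure.
bool escapesInto(Location target, Location dest)
{
    return dest.depth < target.depth;
}
```

In the first bar, y and z refer to the caller's variables, which are shallower than x, so storing &x escapes; in the second, y and z are bar's own locals at the same depth, so it does not. Walter's earlier bar(&j) case is the same check with p at depth 0. The difficulty Steve raises is that foo's signature has no way to carry these depths.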
Oct 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 "Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function signature is 
 because:

 1. that makes it self-documenting

 But the documentation is not enough. You cannot express the intricacies of which variables escape well enough for the compiler to make intelligent decisions. What this will result in is slightly fewer unnecessary closures, but not enough to make a difference. Or else you won't be able to declare things the way you want, so you will be forced to declare something that *could* result in an escape, but usually doesn't.

I think it is conceptually straightforward whether a reference escapes or not, though it is difficult for the compiler to detect it reliably.
 2. function bodies may be external, i.e. not present
 3. virtual functions

 Yes, so you are now implying a scope escape contract on all derived classes. But not a very expressive one.
 4. notifies the user if a library function parameter scope-ness changes 
 (you'll get a compile time error)

 Oh really? I imagined that if the scope-ness changed it just results in a new heap allocation when I call the function.

First off, the mangled names will be different, so it won't link until you recompile. This is critical because the caller's code depends on the scope/noscope characteristic. Secondly, passing a scoped reference to a noscope parameter should be a compile time error.
 The more I think about this, the more I'd rather have D1 behavior and some 
 sort of way to indicate my function should allocate a heap frame (except on 
 easily provable scope escapes).

Having the caller specify it is not tenable, because the caller has no control over (and likely no knowledge of) what the callee does. Functions should be regarded as black boxes, where all you can know about them is in the function signature.
 The most common case I think which will cause unnecessary allocations, is a 
 very common case.  A class setter:
 
 class X
 {
    private int *v_;
    int *v(int *newV) {return v_ = newV;}
    int *v() { return v_;}
 }
 
 Clearly, newV escapes into the class instance,

Then it's noscope.
 but how do we know what the 
 scope of the class instance is to know if newV truly escapes its own scope?

We take the conservative approach, and regard "might escape" and "don't know if it escapes" as "treat as if it does escape".
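The conservative rule reduces to a one-line decision at each call site. This is a hypothetical illustration only; ParamEscape and treatAsEscaping are invented names. The caller sees nothing but the parameter's annotation, and only an explicit scope lets it keep the argument on the stack:

```d
// Hypothetical: all a caller knows about a parameter is what the
// callee's signature says.
enum ParamEscape { scopeParam, noscopeParam, unspecified }

// "Might escape" (noscope) and "don't know" (unspecified) are both
// treated as escaping; only an explicit scope is assumed not to escape.
bool treatAsEscaping(ParamEscape info)
{
    return info != ParamEscape.scopeParam;
}
```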
Oct 28 2008
Sean Kelly <sean invisibleduck.org> writes:
Walter Bright wrote:
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function 
 signature is because:

 1. that makes it self-documenting

 But the documentation is not enough. You cannot express the intricacies of which variables escape well enough for the compiler to make intelligent decisions. What this will result in is slightly fewer unnecessary closures, but not enough to make a difference. Or else you won't be able to declare things the way you want, so you will be forced to declare something that *could* result in an escape, but usually doesn't.

 I think it is conceptually straightforward whether a reference escapes or not, though it is difficult for the compiler to detect it reliably.
 2. function bodies may be external, i.e. not present
 3. virtual functions

 Yes, so you are now implying a scope escape contract on all derived classes. But not a very expressive one.


There's another weird issue that I'm not sure if anyone has touched on:

struct S
{
    int x;
    int getX() { return x; }
}

void main()
{
    auto s = new S;
    fnA( s );
}

void fnA( S* s )
{
    fnB( &s.getX );
}

void fnB( noscope int delegate() dg ) {}

How does the compiler handle this? It can't tell by inspecting the type whether the data for S is dynamic... in fact, the same could be said of a "scope" instance of a class. I guess it would have to assume that object variables without a "noscope" label must be scoped?

Sean
Oct 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Sean Kelly wrote:
 How does the compiler handle this?  It can't tell by inspecting the type 
 whether the data for S is dynamic... in fact, the same could be said of 
 a "scope" instance of a class.  I guess it would have to assume that 
 object variables without a "noscope" label must be scoped?

What you're talking about is the escaping of pointers to local variables. The compiler does not detect it, except in trivial cases. This is why, in safe mode, taking the address of a local variable will not be allowed.
Oct 28 2008
Don <nospam nospam.com.au> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 How does the compiler handle this?  It can't tell by inspecting the 
 type whether the data for S is dynamic... in fact, the same could be 
 said of a "scope" instance of a class.  I guess it would have to 
 assume that object variables without a "noscope" label must be scoped?

 What you're talking about is the escaping of pointers to local variables. The compiler does not detect it, except in trivial cases. This is why, in safe mode, taking the address of a local variable will not be allowed.

You could allow it inside a pure function whenever the return type does not contain pointers.
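A minimal sketch of the idea, assuming the pure function has no other reference parameter through which the pointer could be stored; doubledValue and useLocal are illustrative names. Pure code cannot write to mutable globals, and an int result cannot carry a pointer back out, so the address of local has nowhere to go:

```d
// `pure` forbids writing mutable global state, and the int return type
// cannot carry `p` back out; with no other reference parameters, `p`
// cannot outlive this call.
int doubledValue(const(int)* p) pure
{
    return *p + *p;
}

int useLocal()
{
    int local = 21;
    return doubledValue(&local); // address of a local passed to a pure function
}
```

With more than one reference parameter the argument gets weaker, since a pure function may still write through a pointer-to-pointer argument.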
Oct 29 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Sean Kelly wrote:
 Walter Bright wrote:
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function 
 signature is because:

 1. that makes it self-documenting

 But the documentation is not enough. You cannot express the intricacies of which variables escape well enough for the compiler to make intelligent decisions. What this will result in is slightly fewer unnecessary closures, but not enough to make a difference. Or else you won't be able to declare things the way you want, so you will be forced to declare something that *could* result in an escape, but usually doesn't.

 I think it is conceptually straightforward whether a reference escapes or not, though it is difficult for the compiler to detect it reliably.
 2. function bodies may be external, i.e. not present
 3. virtual functions

 Yes, so you are now implying a scope escape contract on all derived classes. But not a very expressive one.


 There's another weird issue that I'm not sure if anyone has touched on:

 struct S
 {
     int x;
     int getX() { return x; }
 }

 void main()
 {
     auto s = new S;
     fnA( s );
 }

 void fnA( S* s )
 {
     fnB( &s.getX );
 }

Part of s escapes, so the compiler should assume that the whole s escapes. If s is scope by default, it should be a compile-time error here.
 void fnB( noscope int delegate() dg ) {}
 
 How does the compiler handle this?  It can't tell by inspecting the type 
 whether the data for S is dynamic... in fact, the same could be said of 
 a "scope" instance of a class.  I guess it would have to assume that 
 object variables without a "noscope" label must be scoped?
 
 
 Sean
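Sergey's reasoning follows from how D builds member-function delegates. In the sketch below (memberDelegate is an illustrative name), the delegate produced by &s.getX stores s itself as its context pointer, so letting the delegate escape lets the whole struct escape with it:

```d
struct S
{
    int x;
    int getX() { return x; }
}

// `&s.getX` builds a delegate whose context pointer is `s` itself,
// so whoever holds the delegate also holds a reference to the struct.
int delegate() memberDelegate(S* s)
{
    return &s.getX;
}
```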

Oct 28 2008
Sean Kelly <sean invisibleduck.org> writes:
Sergey Gromov wrote:
 Sean Kelly wrote:
 Walter Bright wrote:
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function 
 signature is because:

 1. that makes it self-documenting

 But the documentation is not enough. You cannot express the intricacies of which variables escape well enough for the compiler to make intelligent decisions. What this will result in is slightly fewer unnecessary closures, but not enough to make a difference. Or else you won't be able to declare things the way you want, so you will be forced to declare something that *could* result in an escape, but usually doesn't.

 I think it is conceptually straightforward whether a reference escapes or not, though it is difficult for the compiler to detect it reliably.
 2. function bodies may be external, i.e. not present
 3. virtual functions

 Yes, so you are now implying a scope escape contract on all derived classes. But not a very expressive one.


 struct S
 {
     int x;
     int getX() { return x; }
 }

 void main()
 {
     auto s = new S;
     fnA( s );
 }

 void fnA( S* s )
 {
     fnB( &s.getX );
 }

 Part of s escapes, so the compiler should assume that the whole s escapes. If s is scope by default, it should be a compile-time error here.

So are you saying that I'd have to rewrite fnA as:

void fnA( noscope S* s ) {...}

I guess I can see the point, but that's horribly viral, particularly when classes come into the picture. With this in mind, from a syntax standpoint I'd be leaning towards what D does right now (i.e. having noscope as the default), but from a performance standpoint this is absolutely not an option; I may as well just switch to something like Ruby.

Sean
Oct 28 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function signature 
 is because:

 1. that makes it self-documenting

 But the documentation is not enough. You cannot express the intricacies of which variables escape well enough for the compiler to make intelligent decisions. What this will result in is slightly fewer unnecessary closures, but not enough to make a difference. Or else you won't be able to declare things the way you want, so you will be forced to declare something that *could* result in an escape, but usually doesn't.

 I think it is conceptually straightforward whether a reference escapes or not, though it is difficult for the compiler to detect it reliably.
 2. function bodies may be external, i.e. not present
 3. virtual functions

 Yes, so you are now implying a scope escape contract on all derived classes. But not a very expressive one.
 4. notifies the user if a library function parameter scope-ness changes 
 (you'll get a compile time error)

 Oh really? I imagined that if the scope-ness changed it just results in a new heap allocation when I call the function.

 First off, the mangled names will be different, so it won't link until you recompile. This is critical because the caller's code depends on the scope/noscope characteristic.

This 'feature' is basically useless ;) D has no shared libraries, so I don't think anyone generally keeps their stale object files around and tries to link with them instead of trying to recompile the sources. You are asking for trouble otherwise.
 Secondly, passing a scoped reference to a noscope parameter should be a 
 compile time error.

OK, so when does a closure happen? I thought the point of this was to specify when a closure was necessary... compiler sees

foo(noscope int *x)

I try to pass in an address to a local variable. Compiler says, hm... I need a closure to convert my scope variable into a noscope.
 The more I think about this, the more I'd rather have D1 behavior and 
 some sort of way to indicate my function should allocate a heap frame 
 (except on easily provable scope escapes).

 Having the caller specify it is not tenable, because the caller has no control over (and likely no knowledge of) what the callee does. Functions should be regarded as black boxes, where all you can know about them is in the function signature.

But the compiler's lack of knowledge/proof about the escape intricacies of a function will cause either a) unnecessary closure allocation, or b) impossible specifications. i.e. I want to specify that either a scope or noscope variable can be passed in, and the variable might escape depending on what you pass in for other arguments, how do I do that?
 The most common case I think which will cause unnecessary allocations, is 
 a very common case.  A class setter:

 class X
 {
    private int *v_;
    int *v(int *newV) {return v_ = newV;}
    int *v() { return v_;}
 }

 Clearly, newV escapes into the class instance,

 Then it's noscope.

So then to call X.v, the function must allocate a closure? How does this improve the current situation where closures are allocated by default?
 but how do we know what the scope of the class instance is to know if 
 newV truly escapes its own scope?

 We take the conservative approach, and regard "might escape" and "don't know if it escapes" as "treat as if it does escape".

Also untenable. We have the same situation today. You will have achieved nothing with this syntax except making people write scope or noscope everywhere to satisfy incomplete compiler rules. -Steve
Oct 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 "Walter Bright" wrote
 First off, the mangled names will be different, so it won't link until you 
 recompile. This is critical because the caller's code depends on the 
 scope/noscope characteristic.

 This 'feature' is basically useless ;) D has no shared libraries, so I don't think anyone generally keeps their stale object files around and tries to link with them instead of trying to recompile the sources. You are asking for trouble otherwise.

I disagree. The whole idea behind separate compilation and using makefiles is to recompile only what is necessary. Encoding the function specification into its identifier is a tried and true way of detecting mistakes in that.
 Secondly, passing a scoped reference to a noscope parameter should be a 
 compile time error.

 OK, so when does a closure happen? I thought the point of this was to specify when a closure was necessary... compiler sees foo(noscope int *x) I try to pass in an address to a local variable. Compiler says, hm... I need a closure to convert my scope variable into a noscope.

Either the compiler issues an error, or it allocates the scoped variable on the heap. I prefer the former behavior.
 But the compiler's lack of knowledge/proof about the escape intricacies of a 
 function will cause either a) unnecessary closure allocation, or b) 
 impossible specifications.  i.e. I want to specify that either a scope or 
 noscope variable can be passed in, and the variable might escape depending 
 on what you pass in for other arguments, how do I do that?

You make it noscope. Remember that scope is an optimization.
 The most common case I think which will cause unnecessary allocations, is 
 a very common case.  A class setter:

 class X
 {
    private int *v_;
    int *v(int *newV) {return v_ = newV;}
    int *v() { return v_;}
 }

 Clearly, newV escapes into the class instance,


 So then to call X.v, the function must allocate a closure? How does this improve the current situation where closures are allocated by default?

If it's escaping, you MUST allocate it in a way that doesn't disappear when the escape happens.
 We take the conservative approach, and regard "might escape" and "don't 
 know if it escapes" as "treat as if it does escape".

 Also untenable. We have the same situation today. You will have achieved nothing with this syntax except making people write scope or noscope everywhere to satisfy incomplete compiler rules.

The improvement with the 'scope' keyword is it allows the compiler to assume that the reference does not escape.
Oct 28 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 First off, the mangled names will be different, so it won't link until 
 you recompile. This is critical because the caller's code depends on the 
 scope/noscope characteristic.

 This 'feature' is basically useless ;) D has no shared libraries, so I don't think anyone generally keeps their stale object files around and tries to link with them instead of trying to recompile the sources. You are asking for trouble otherwise.

 I disagree. The whole idea behind separate compilation and using makefiles is to recompile only what is necessary. Encoding the function specification into its identifier is a tried and true way of detecting mistakes in that.

Any decent build tool (including make, assuming dependencies are created) will rebuild the source when it sees the dependency changed. In this case, if the new signature can be used, it will recompile silently. That was my point. However, I'm no longer sure what you are planning, because you have sufficiently confused me ;) So if the recompile causes a compile failure, then it would fail. But that is unrelated to the requirement that you have to recompile to get it to link. Even if the function signatures are the same, the build tool is going to recompile the file instead of linking the stale object.
 Secondly, passing a scoped reference to a noscope parameter should be a 
 compile time error.

 OK, so when does a closure happen? I thought the point of this was to specify when a closure was necessary... compiler sees foo(noscope int *x) I try to pass in an address to a local variable. Compiler says, hm... I need a closure to convert my scope variable into a noscope.

 Either the compiler issues an error, or it allocates the scoped variable on the heap. I prefer the former behavior.

Huh? So no automatic closures? If the compiler can't prove that a closure is or is not necessary, does code now just fail to compile?
 But the compiler's lack of knowledge/proof about the escape intricacies 
 of a function will cause either a) unnecessary closure allocation, or b) 
 impossible specifications.  i.e. I want to specify that either a scope or 
 noscope variable can be passed in, and the variable might escape 
 depending on what you pass in for other arguments, how do I do that?

 You make it noscope. Remember that scope is an optimization.
 The most common case I think which will cause unnecessary allocations, 
 is a very common case.  A class setter:

 class X
 {
    private int *v_;
    int *v(int *newV) {return v_ = newV;}
    int *v() { return v_;}
 }

 Clearly, newV escapes into the class instance,


 So then to call X.v, the function must allocate a closure? How does this improve the current situation where closures are allocated by default?

 If it's escaping, you MUST allocate it in a way that doesn't disappear when the escape happens.

The problem is, what if I know it's escaping in some cases, but not in others, but the compiler can't tell either way? (see example below)
 We take the conservative approach, and regard "might escape" and "don't 
 know if it escapes" as "treat as if it does escape".

 Also untenable. We have the same situation today. You will have achieved nothing with this syntax except making people write scope or noscope everywhere to satisfy incomplete compiler rules.

 The improvement with the 'scope' keyword is it allows the compiler to assume that the reference does not escape.

And is that property enforced while compiling the function, or does the compiler assume the author knows best? Like I said, I'm sufficiently confused... How do I mark up class X so that at least foo and foo2 compile without issues?

class X
{
    int *p;
    this(int *p_) {p = p_;}
}

// I expect this to compile and work.
void foo()
{
    int i;
    auto x = new X(&i);
}

// I expect this to compile and work.
X foo2()
{
    int[] arr = new int[1];
    return new X(&arr[0]);
}

// What happens here, a closure or a failure?
X foo3()
{
    int i;
    auto x = new X(&i);
    return x;
}

If you have some syntax such that all 3 compile (i.e. foo3 creates a closure), then how does the compiler know foo3 is ok?

-Steve
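For comparison, here is foo3 rewritten by hand the way the "allocate the scoped variable on the heap" option would behave, assuming the compiler moved i off the stack. foo3Heap is an illustrative name, not part of any proposal in this thread. Because i now lives in GC memory, the returned X can safely keep pointing at it:

```d
class X
{
    int* p;
    this(int* p_) { p = p_; }
}

// Hand-written equivalent of "allocate the escaping local on the heap":
// `i` is GC memory instead of a stack slot, so the returned X stays valid.
X foo3Heap()
{
    int* i = new int; // heap allocation instead of a stack local (initialized to 0)
    auto x = new X(i);
    return x;
}
```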
Oct 28 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Tue, 28 Oct 2008 08:58:18 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 "Walter Bright" wrote
 4. notifies the user if a library function parameter scope-ness changes
 (you'll get a compile time error)


Escape analysis also applies to shared/local/scope storage types and not just delegates. Consider having to write a function for every combination of shared/local/scope for every object or pointer in the function signature.
Oct 28 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Tue, 28 Oct 2008 09:44:28 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 shared/unshared is not a storage class, it is a type modifier (like  
 const).

No, because shared and local objects get created and garbage collected on different heaps.
 In order for escape analysis to be useful, I need to be able to specify  
 in a
 function such as:

 void foo(int *x, int **y, int **z)

 That x might escape to y's or z's scope.  How do you do allow that
 specification without making function signatures dreadfully complicated?

Well, x escapes to y or z is easy since it's how D works today. And if you have a no_assignment type, then the x won't escape to y or z is easy too. It's the mixed cases that things get complicated in. I'd recommend looking up pedigree types as one possible solution.
Oct 28 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Tue, 28 Oct 2008 14:46:34 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 "Robert Jacques" wrote
 On Tue, 28 Oct 2008 09:44:28 -0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 shared/unshared is not a storage class, it is a type modifier (like
 const).

No, because shared and local objects get created and garbage collected on different heaps.

That's an interesting point. Shared definitely has to be a type modifier, otherwise, it cannot do this:

shared int x = 0;
int *xp = &x; // error, xp now is unshared, and points to shared data.

But it probably also has to be a storage class also. Not sure about that.

This is a desirable error that was discussed back when shared was introduced. You can think of shared / local like immutable and mutable. The real problem is that a 'const' for shared/local/scope isn't clear yet.
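Steve's `shared` example can be checked against D2 as it eventually shipped, where `shared` is a type constructor. In the sketch below, `counter` and `counterAddress` are hypothetical names; the point is only that `&counter` has type `shared(int)*`, so the conversion to a plain `int*` that Steve marks as an error is rejected by the type system itself, with no escape analysis involved.

```d
// Hypothetical names; assumes a D2 compiler where `shared` is a type constructor.
shared int counter = 0;

shared(int)* counterAddress()
{
    // int* xp = &counter; // rejected: shared(int)* does not convert to int*
    return &counter;       // the shared qualifier travels with the pointer
}
```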
 In order for escape analysis to be useful, I need to be able to specify
 in a
 function such as:

 void foo(int *x, int **y, int **z)

 That x might escape to y's or z's scope.  How do you do allow that
 specification without making function signatures dreadfully  
 complicated?

Well, x escapes to y or z is easy since it's how D works today.

But what if y or z is not in x's scope?

Which is an issue with the user of foo, but not foo's signature.
 For instance:
 void bar(ref int *y, ref int *z)
 {
    int x = 5;
    foo(&x, &y, &z);
 }

 If y or z gets set to &x, then you have to allocate a closure for bar.

 The opposite example:

 void bar(int *y, int *z)
 {
    int x = 5;
    foo(&x, &y, &z);
 }

 No closure necessary.  So you need something to say that y or z can get  
 set
 to x, so the compiler would be smart enough to only allocate a closure  
 if y
 or z exists outside x's scope.  Otherwise, you have unnecessary closures,
 and we are in the same boat as today.

This example, although important, is essentially about whether optimizing the closure is valid or not and has nothing to do with the behaviour of foo. However, this does seem to illustrate a need for three types: global escape (variable may escape to anywhere), pure escape (variable may escape to other inputs), and no escape (variable is guaranteed not to escape). For example, if foo saved &x to a static variable (global escape), then in all cases x needs to be heap allocated. But if (as in your example) &x is saved to one of the function inputs (pure escape), then the caller can detect whether it can ensure no escape and therefore use the stack.
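Robert's three categories can be sketched as plain D functions. Only `scope` is real syntax here; D has no annotation for "global escape" or "pure escape", so those two are described in comments, and all names (`lastSeen`, `globalEscape`, `pureEscape`, `noEscape`, `callerWithLocalTarget`) are hypothetical. The caller at the end shows why pure escape is checkable at the call site: because `y` is itself a local, storing `&x` into it never outlives the frame.

```d
// Hypothetical names throughout; only `scope` is actual D syntax.
int* lastSeen; // module-level variable that outlives every caller

// "Global escape": x is stored somewhere that outlives any caller,
// so a caller passing &local would need that local on the heap.
void globalEscape(int* x)
{
    lastSeen = x;
}

// "Pure escape": x can only flow into memory the caller handed in.
// Whether that is safe depends on where *y lives, which the caller knows.
void pureEscape(int* x, int** y)
{
    *y = x;
}

// "No escape": x is only read, never stored.
int noEscape(scope int* x)
{
    return *x;
}

// y is a local of this frame, so &x never outlives it: stack is fine.
int callerWithLocalTarget()
{
    int x = 5;
    int* y;
    pureEscape(&x, &y);
    return *y + noEscape(&x);
}
```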
Oct 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Pure functions almost implicitly imply that its parameters are all 
scoped. The exception is the return value of the pure function. If the 
return value can contain any references that came from the parameters, 
then those parameters are not scoped.
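Walter's observation can be illustrated with a `pure` D function: it may read through its pointer parameters but cannot write to mutable global state, so the return value is the only route by which a parameter's reference can leave. `pickLarger` and `sum` are hypothetical names chosen for the sketch.

```d
// Hypothetical names. A pure function cannot store a parameter in a
// mutable global, so returning it is the only way the reference leaves.
pure int* pickLarger(int* a, int* b)
{
    return *a >= *b ? a : b; // a or b escapes via the return value
}

// Here nothing is returned by reference, so neither parameter escapes.
pure int sum(const(int)* a, const(int)* b)
{
    return *a + *b;
}
```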
Oct 27 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-10-27 17:33:36 -0400, Walter Bright <newshound1 digitalmars.com> said:

 Pure functions almost implicitly imply that its parameters are all 
 scoped. The exception is the return value of the pure function. If the 
 return value can contain any references that came from the parameters, 
 then those parameters are not scoped.

Not if you define "scope" in the function prototype as not escaping the caller's scope. That would mean that you can receive a "caller scope" pointer on input and return it back to the caller when the function ends. It never escapes the caller's scope, so all is fine. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Oct 27 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Pure functions almost implicitly imply that its parameters are all 
 scoped. The exception is the return value of the pure function. If the 
 return value can contain any references that came from the parameters, 
 then those parameters are not scoped.

I think even the return value can be considered scoped. Essentially it does not leave the scope of the caller. Andrei
Oct 27 2008
Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 Walter Bright wrote:
 Pure functions almost implicitly imply that its parameters are all 
 scoped. The exception is the return value of the pure function. If the 
 return value can contain any references that came from the parameters, 
 then those parameters are not scoped.

I think even the return value can be considered scoped. Essentially it does not leave the scope of the caller. Andrei

As far as I know, there's no way for functions to specially prepare objects to be called scope. Isn't that the caller's choice?
Oct 28 2008
Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 The delegate closure issue is part of a wider issue - escape analysis. A 
 reference is said to 'escape' a scope if it, well, leaves that scope. 
 Here's a trivial example:
 
 int* foo() { int i; return &i; }
 
 The reference to i escapes the scope of i, thus courting disaster. 
 Another form of escaping:
 
 int* p;
 void bar(int* x) { p = x; }
 
 which is, on the surface, legitimate, but fails for:
 
 void abc(int j)
 {
      bar(&j);
 }
 
 This kind of problem is currently undetectable by the compiler.
 
 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.
 
 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

I like the original definition of in as "const scope". I would also like in to be the default for function parameters. Does that make me a heretic OOP programmer? :)
Oct 27 2008
Robert Fraser <fraserofthenight gmail.com> writes:
Walter Bright Wrote:

 The delegate closure issue is part of a wider issue - escape analysis. A 
 reference is said to 'escape' a scope if it, well, leaves that scope. 
 Here's a trivial example:
 
 int* foo() { int i; return &i; }
 
 The reference to i escapes the scope of i, thus courting disaster. 
 Another form of escaping:
 
 int* p;
 void bar(int* x) { p = x; }
 
 which is, on the surface, legitimate, but fails for:
 
 void abc(int j)
 {
      bar(&j);
 }
 
 This kind of problem is currently undetectable by the compiler.
 
 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.
 
 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

I get the feeling that D's type system is going to become the joke of the programming world. Are we really going to have to worry about a scope unshared(invariant(int)*) ...? What other type modifiers can we put on that?
Oct 27 2008
Walter Bright <newshound1 digitalmars.com> writes:
Robert Fraser wrote:
 I get the feeling that D's type system is going to become the joke of
 the programming world. Are we really going to have to worry about a
 scope unshared(invariant(int)*) ...? What other type modifiers can we
 put on that?

That argues that "noscope" should be the default. Using "scope" would be an optional optimization. BTW, "unshared" is the default. "shared" would be the keyword.
Oct 27 2008
Robert Fraser <fraserofthenight gmail.com> writes:
Walter Bright wrote:
 Robert Fraser wrote:
 I get the feeling that D's type system is going to become the joke of
 the programming world. Are we really going to have to worry about a
 scope unshared(invariant(int)*) ...? What other type modifiers can we
 put on that?

That argues that "noscope" should be the default. Using "scope" would be an optional optimization. BTW, "unshared" is the default. "shared" would be the keyword.

My point wasn't the number of keywords... ("shared" is actually the first keyword introduced that's conflicted with an identifier I've used). My point was the type system is getting incredibly complex. The theory that static typing is the solution to everything is what led to the beast known as checked exceptions.
Oct 27 2008
Walter Bright <newshound1 digitalmars.com> writes:
Robert Fraser wrote:
 My point wasn't the number of keywords... ("shared" is actually the 
 first keyword introduced that's conflicted with an identifier I've 
 used). My point was the type system is getting incredibly complex. The 
 theory that static typing is the solution to everything is what lead to 
 the beast known as checked exceptions.

The complexity is an issue that concerns me. That's why I suspect that if one doesn't use them, the defaults should work. I wouldn't worry about checked exceptions. *Why* it's a disaster is well understood, and the reason isn't because it is complicated or does static checking.
Oct 27 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Fraser wrote:
 Walter Bright wrote:
 Robert Fraser wrote:
 I get the feeling that D's type system is going to become the joke of
 the programming world. Are we really going to have to worry about a
 scope unshared(invariant(int)*) ...? What other type modifiers can we
 put on that?

That argues that "noscope" should be the default. Using "scope" would be an optional optimization. BTW, "unshared" is the default. "shared" would be the keyword.

My point wasn't the number of keywords... ("shared" is actually the first keyword introduced that's conflicted with an identifier I've used). My point was the type system is getting incredibly complex. The theory that static typing is the solution to everything is what lead to the beast known as checked exceptions.

I don't think you have a case. Andrei
Oct 27 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-10-27 18:15:24 -0400, Walter Bright <newshound1 digitalmars.com> said:

 That argues that "noscope" should be the default. Using "scope" would 
 be an optional optimization.

I don't think you have much choice. Take these examples:

scope(int*)* a;   // noscope pointer to a scope pointer.
noscope(int*)* b; // scope pointer to a noscope pointer.

Only one of these two makes sense.

- - -

On the other side, you could make a different syntax for scope than for const and shared, and then the noscope could be the default:

int*scope* b; // scope pointer to a noscope pointer.

But that looks as attractive as const in C++.

- - -

Hum, and please find a better name than "noscope".

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Oct 27 2008
Walter Bright <newshound1 digitalmars.com> writes:
Michel Fortin wrote:
 On 2008-10-27 18:15:24 -0400, Walter Bright <newshound1 digitalmars.com> 
 said:
 
 That argues that "noscope" should be the default. Using "scope" would 
 be an optional optimization.

I don't think you have much choice. Take these examples:

scope(int*)* a;   // noscope pointer to a scope pointer.
noscope(int*)* b; // scope pointer to a noscope pointer.

Only one of these two makes sense.

scope is a storage class, not a type constructor.
Oct 27 2008
Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 Michel Fortin wrote:
 On 2008-10-27 18:15:24 -0400, Walter Bright <newshound1 digitalmars.com> 
 said:
 
 That argues that "noscope" should be the default. Using "scope" would 
 be an optional optimization.

I don't think you have much choice. Take these examples:

scope(int*)* a;   // noscope pointer to a scope pointer.
noscope(int*)* b; // scope pointer to a noscope pointer.

Only one of these two makes sense.

scope is a storage class, not a type constructor.

How do you treat members of objects passed in? If I pass in a struct with a delegate in it, is it treated as scope too? What if it's an array? A class?
Oct 27 2008
Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 scope is a storage class, not a type constructor.

How do you treat members of objects passed in? If I pass in a struct with a delegate in it, is it treated as scope too? What if it's an array? A class?

The scope applies to the bits of the object, not what they may refer to.
Oct 27 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-10-28 00:28:27 -0400, Walter Bright <newshound1 digitalmars.com> said:

 Jason House wrote:
 Walter Bright wrote:
 scope is a storage class, not a type constructor.

How do you treat members of objects passed in? If I pass in a struct with a delegate in it, is it treated as scope too? What if it's an array? A class?

The scope applies to the bits of the object, not what they may refer to.

So basically, we always have head-scope. Here's my question:

int** a;

void foo()
{
    scope int b;
    scope int* c = &b;
    scope int** d = &c;

    a = &c; // error, c is scope, can't copy address of scope to non-scope.
    a = d;  // error? d is scope, but we're only making a copy of its bits.
            // It's what d points to that is scope, but do we know about that?
}

In this case, it's obvious that the last assignment (a = d) is bogus. Is there any plan to have this fail to compile? If so, where does it fail?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Oct 28 2008
Jason House <jason.james.house gmail.com> writes:
Michel Fortin Wrote:

 On 2008-10-28 00:28:27 -0400, Walter Bright <newshound1 digitalmars.com> said:
 
 Jason House wrote:
 Walter Bright wrote:
 scope is a storage class, not a type constructor.

How do you treat members of objects passed in? If I pass in a struct with a delegate in it, is it treated as scope too? What if it's an array? A class?

The scope applies to the bits of the object, not what they may refer to.

So basically, we always have head-scope. Here's my question:

int** a;

void foo()
{
    scope int b;
    scope int* c = &b;
    scope int** d = &c;

    a = &c; // error, c is scope, can't copy address of scope to non-scope.
    a = d;  // error? d is scope, but we're only making a copy of its bits.
            // It's what d points to that is scope, but do we know about that?
}

Your assignment to c discards the scope protection. Taking the address of scope variables should be an error.
 
 In this case, it's obvious that the last assignment (a = d) is bogus. 
 Is there any plan in having this fail to compile? If so, where does it 
 fail?
 

Oct 28 2008
Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 Jason House wrote:
 scope is a storage class, not a type constructor.

How do you treat members of objects passed in? If I pass in a struct with a delegate in it, is it treated as scope too? What if it's an array? A class?

The scope applies to the bits of the object, not what they may refer to.

This seems rather limiting. I know this is aimed at addressing the dynamic closure problem. This solution would mean that I can't encapsulate delegates. Ideally, I should be able to declare my encapsulating struct as scope or noscope and manage the member delegate accordingly.
Oct 28 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Walter Bright Wrote:
 
 Jason House wrote:
 scope is a storage class, not a type constructor.

struct with a delegate in it, is it treated as scope too? What if it's an array? A class?

refer to.

This seems rather limiting. I know this is aimed at addressing the dynamic closure problem. This solution would mean that I can't encapsulate delegates. Ideally, I should be able to declare my encapsulating struct as scope or noscope and manage the member delegate accordingly.

I think it's clear that scope is transitive as much as const or immutable are. Noscope is also transitive. Escape analysis is a tricky business. My opinion is that we either take care of it properly or blissfully ignore the entire issue. That opinion may disagree a bit with Walter's, who'd prefer a quick patch for delegates so he returns to threading. I think if we opt for a quick patch now, it'll turn to gangrene later. Among other things, it will hurt the threading infrastructure it was supposed to give precedence to. Andrei
Oct 28 2008
Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 Jason House wrote:
 Walter Bright Wrote:
 
 Jason House wrote:
 scope is a storage class, not a type constructor.

struct with a delegate in it, is it treated as scope too? What if it's an array? A class?

refer to.

This seems rather limiting. I know this is aimed at addressing the dynamic closure problem. This solution would mean that I can't encapsulate delegates. Ideally, I should be able to declare my encapsulating struct as scope or noscope and manage the member delegate accordingly.

I think it's clear that scope is transitive as much as const or immutable are. Noscope is also transitive. Escape analysis is a tricky business. My opinion is that we either take care of it properly or blissfully ignore the entire issue. That opinion may disagree a bit with Walter's, who'd prefer a quick patch for delegates so he returns to threading. I think if we opt for a quick patch now, it'll turn to gangrene later. Among other things, it will hurt the threading infrastructure it was supposed to give precedence to. Andrei

Transitive scope means that scope can't be a storage class. It's a tricky subject and threading is way more important to me. I'm fine with a quick fix, I just don't want to pretend it's more than that.
Oct 28 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Jason House wrote:
 Walter Bright Wrote:

 Jason House wrote:
 scope is a storage class, not a type constructor.

struct with a delegate in it, is it treated as scope too? What if it's an array? A class?

refer to.

This seems rather limiting. I know this is aimed at addressing the dynamic closure problem. This solution would mean that I can't encapsulate delegates. Ideally, I should be able to declare my encapsulating struct as scope or noscope and manage the member delegate accordingly.

I think it's clear that scope is transitive as much as const or immutable are. Noscope is also transitive. Escape analysis is a tricky business. My opinion is that we either take care of it properly or blissfully ignore the entire issue. That opinion may disagree a bit with Walter's, who'd prefer a quick patch for delegates so he returns to threading. I think if we opt for a quick patch now, it'll turn to gangrene later. Among other things, it will hurt the threading infrastructure it was supposed to give precedence to.

A quick patch is not possible IMO.

What I'd prefer is: allocate a closure when you can prove it, allow specification when you can't. That is, allocate a closure automatically in simple cases like this:

int *f() {int x = 5; return &x;}

And in cases where you can't prove it, default to not allocating a closure, and allow the developer to specify that a closure is necessary:

int *f2(int *y){...}
int *f() <insert closure keyword here> {int x = 5; return f2(&x);}

Syntax to be debated ;)

I do *not* think the problem should be ignored (i.e. continue with the current D2 implementation).

-Steve
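For delegates (as opposed to raw pointers like `&x`), the provable case Steve describes is what D2 already does: when a nested function or lambda that captures a local is returned, the compiler moves the captured frame to the heap. A minimal sketch, with `makeCounter` as a hypothetical name:

```d
// Hypothetical name. The returned lambda outlives makeCounter's stack
// frame, so the compiler places `count` in a heap-allocated closure.
int delegate() makeCounter()
{
    int count = 0;
    return () => ++count;
}
```

Calling `makeCounter()` twice yields two delegates with independent counts, since each call allocates its own closure.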
Oct 28 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Bill Baxter" wrote
 On Tue, Oct 28, 2008 at 10:54 PM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 What I'd prefer is allocate closure when you can prove it, allow
 specification when you can't.  That is, allocate a closure automatically 
 in
 simple cases like this:

 int *f() {int x = 5; return &x;}

 And in cases where you can't prove it, default to not allocating a 
 closure,
 and allow the developer to specify that a closure is necessary:

So basically programmers have to memorize all the rules the compiler uses to prove when it's necessary to allocate a closure, and then run those rules in their heads to determine if the current line of code will trigger allocation or not?

First, the compiler does not have any sound rules for this. It currently allocates a closure as a knee-jerk reaction to taking the address of a stack variable. And it's either this or substitute into your statement "prove when it's *not* necessary to allocate a closure", which is about as hard and probably 10x more common.

Second, for 90% of functions that don't require you to allocate closures, you don't have to think about any rules. For the 9% of functions which return a pointer to local data, proven by the compiler, you don't have to think about rules. For the last 1% of functions, the documentation should clarify how your data can escape, and then you have to think about how that affects your usage of it. The docs could say 'best to allocate a closure unless you know what you are doing'.
 And when the compiler gets a little smarter, the programmers need to
 get smarter, too.  In lock step.

Not really. If the compiler can some day store the scope dependency information in the object file (and get rid of reading source to determine function signature), then this whole manual requirement goes away.
 That doesn't sound like a good solution to me.

Then let's go back to D1's solution -- no closures ;)

For example, NONE of Tango uses closures (as evidenced by the fact that it's D1), and it uses pointers to stack data very often (to improve performance). So if closure-by-default is the choice, then I'll have to mark all these usages as non-closure, which is going to make the whole code base look awful. With the way Walter is thinking of implementing it, it might be impossible to specify correctly.

-Steve
Oct 28 2008
Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 
 Escape analysis is a tricky business. My opinion is that we either take 
 care of it properly or blissfully ignore the entire issue. That opinion 
 may disagree a bit with Walter's, who'd prefer a quick patch for 
 delegates so he returns to threading. I think if we opt for a quick 
 patch now, it'll turn to gangrene later. Among other things, it will 
 hurt the threading infrastructure it was supposed to give precedence to.

Like const, I'd rather have no solution than a bad solution insofar as escape analysis is concerned. Sean
Oct 28 2008
Sean Kelly <sean invisibleduck.org> writes:
Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 1:04 AM, Sean Kelly <sean invisibleduck.org> wrote:
 Andrei Alexandrescu wrote:
 Escape analysis is a tricky business. My opinion is that we either take
 care of it properly or blissfully ignore the entire issue. That opinion may
 disagree a bit with Walter's, who'd prefer a quick patch for delegates so he
 returns to threading. I think if we opt for a quick patch now, it'll turn to
 gangrene later. Among other things, it will hurt the threading
 infrastructure it was supposed to give precedence to.

escape analysis is concerned.

The only serious problem people have right now is that closures are allocated automatically when they may not need to be. Making closure allocation manual for now seems like the most future-compatible way to fix things. In some nebulous future, the manual allocation could become unnecessary, or it could become compiler-checked, but it seems to me that for now just making it manual does the least harm and lets Walter get back to work on other things.

This would be the most backwards-compatible way also. The only real argument against it in my mind is that it makes the default behavior the unsafe behavior. I don't think this is a big deal given what I see as the target market for D, but then I don't see a point in SafeD either, for the same reason. The syntax seems like it should be pretty straightforward: use 'new' (Andrei will love that ;-)):

void fn( int delegate() dg );

void main()
{
    int x;
    int getX() { return x; }

    // static closure
    fn( &getX );

    // dynamic closure
    fn( new &getX );
}

That said, the fact that some function calls will always be opaque suggests to me that automatic escape analysis will never be possible in all situations. Therefore, we'll likely need something roughly similar to the proposed keyword eventually. So perhaps it really is worth considering adding some sort of 'noscope' storage class now:

// generates a dynamic closure
noscope int delegate() dg = &getX;

I do think, however, that 'scope' should be the default behavior, for two reasons. It's backwards-compatible, which is handy. But more importantly, I'd say that probably 95% of the current uses of delegates are scoped, and that isn't likely to shift all the way to 50% even if D moved to a much more functional style of programming. Algorithms, for example, all use scoped delegates, which I'd say is far and away their most common current use.

Sean
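`new &getX` is Sean's proposed syntax and is not valid D. As far as I know, D2 ended up taking a parameter-level route instead: a delegate parameter marked `scope` promises not to store the delegate, so the caller's frame can stay on the stack, while passing a nested function's address to an ordinary parameter makes the compiler allocate a dynamic closure. The names below (`callTwice`, `keep`, `saved`, `scopedUse`, `escapingUse`) are hypothetical.

```d
// Hypothetical names. `scope` promises the delegate is not stored,
// so callers do not need a heap closure for it.
int callTwice(scope int delegate() dg)
{
    return dg() + dg();
}

// An ordinary parameter may be stored; here it is kept in a global.
int delegate() saved;

void keep(int delegate() dg)
{
    saved = dg;
}

int scopedUse()
{
    int x = 21;
    int getX() { return x; }
    return callTwice(&getX); // no escape: the frame may stay on the stack
}

void escapingUse()
{
    int x = 7;
    int getX() { return x; }
    keep(&getX); // escapes: x must live in a heap-allocated closure
}
```

Calling `saved()` after `escapingUse` has returned still reads 7, which only works because that frame was moved to the heap.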
Oct 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for 
 two reasons.  It's backwards-compatible, which is handy.  But more 
 importantly, I'd say that probably 95% of the current uses of delegates 
 are scoped, and that isn't likely to shift all the way to 50% even if D 
 moved to a much more functional style of programming.  Algorithms, for 
 example, all use scoped delegates, which I'd say is far and away their 
 most common current use.

The counter to that is that when there is an inadvertent escape of a reference, the error is often undetectable even while it silently corrupts data and behaves erratically. In other words (as Andrei pointed out to me) the cost of those errors, even though rare, is very high. This makes it highly desirable to prevent them automatically, rather than relying on the skill and attention to detail of the programmer. Contrast that with, say, a null pointer bug which results in an unambiguous sudden halt to the program with a clear indication of what happened. The 'scope' storage class also has a future in that it is possible using data flow analysis to statically verify it.
Oct 28 2008
Sean Kelly <sean invisibleduck.org> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for 
 two reasons.  It's backwards-compatible, which is handy.  But more 
 importantly, I'd say that probably 95% of the current uses of 
 delegates are scoped, and that isn't likely to shift all the way to 
 50% even if D moved to a much more functional style of programming.  
 Algorithms, for example, all use scoped delegates, which I'd say is 
 far and away their most common current use.

The counter to that is that when there is an inadvertent escape of a reference, the error is often undetectable even while it silently corrupts data and behaves erratically. In other words (as Andrei pointed out to me) the cost of those errors, even though rare, is very high. This makes it highly desirable to prevent them automatically, rather than relying on the skill and attention to detail of the programmer.

I think the cost/benefit of this could probably be argued either way. I've never encountered a bug related to this, for example, so to me the benefit is entirely theoretical while the cost is immediate.
 The 'scope' storage class also has a future in that it is possible using 
 data flow analysis to statically verify it.

This is the real benefit in my mind. From a "features I want in a systems programming language" standpoint I absolutely do not want default dynamic closures (today at any rate). However, just like 'const' I very much appreciate that this approach allows for static verification. So as much as I hate to say so I think that default dynamic closures would be the best long-term option for D. The cost of DMA will continue to come down anyway, and once a codebase is converted it probably won't be too difficult to maintain going forward. Sean
Oct 28 2008
Jason House <jason.james.house gmail.com> writes:
Sean Kelly Wrote:

 Walter Bright wrote:
 In other words (as Andrei pointed out to me) the cost of those errors, 
 even though rare, is very high. This makes it highly desirable to 
 prevent them automatically, rather than relying on the skill and 
 attention to detail of the programmer.

I think the cost/benefit of this could probably be argued either way. I've never encountered a bug related to this, for example, so to me the benefit is entirely theoretical while the cost is immediate.

As the author of an open source multithreaded application in D1, I've had these errors pop up. It's easy to overlook this stuff and pass things the wrong way (it's easier to code). Tango doesn't even have a bind library to make it easier!
Oct 28 2008
Jason House <jason.james.house gmail.com> writes:
Jarrett Billingsley Wrote:

 On Tue, Oct 28, 2008 at 6:29 PM, Jason House
 <jason.james.house gmail.com> wrote:
 Sean Kelly Wrote:

 Walter Bright wrote:
 In other words (as Andrei pointed out to me) the cost of those errors,
 even though rare, is very high. This makes it highly desirable to
 prevent them automatically, rather than relying on the skill and
 attention to detail of the programmer.

I think the cost/benefit of this could probably be argued either way. I've never encountered a bug related to this, for example, so to me the benefit is entirely theoretical while the cost is immediate.

As the author of an open source multithreaded application in D1, I've had these errors pop up. It's easy to overlook this stuff and pass things the wrong way (it's easier to code). Tango doesn't even have a bind library to make it easier!

For what it's worth, std.bind I think depends on one Phobos-specific function. It would probably take a matter of a minute or two to port it to work with Tango.

I ported a bind implementation and maintain it in my code base. I didn't mention that because I still maintain hope that Tango will add it. The last time I asked the Tango folks why they didn't have it, the answer was something to the effect of "we don't recognize a need for it"
Oct 28 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Sean Kelly Wrote:
 
 Walter Bright wrote:
 In other words (as Andrei pointed out to me) the cost of those
 errors, even though rare, is very high. This makes it highly
 desirable to prevent them automatically, rather than relying on
 the skill and attention to detail of the programmer.

I think the cost/benefit of this could probably be argued either way. I've never encountered a bug related to this, for example, so to me the benefit is entirely theoretical while the cost is immediate.

As the author of an open source multithreaded application in D1, I've had these errors pop up. It's easy to overlook this stuff and pass things the wrong way (it's easier to code). Tango doesn't even have a bind library to make it easier!

I agree. Particularly in higher-order code this kind of problem is bound to show itself. And it's a really really nasty case of reality ripping straight through a carefully-conceived abstraction - something like a bullet carving through a precision microprocessor. Andrei
Oct 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Sean Kelly wrote:
 I think the cost/benefit of this could probably be argued either way. 
 I've never encountered a bug related to this, for example, so to me the 
 benefit is entirely theoretical while the cost is immediate.

I have. Not often in my own code because I am very careful to avoid it, but it frequently happens in 'bug' reports I get sent. This trap does happen to programmers who are less familiar with how the underlying stack machine actually works. The real problem is there is no way to verify that this isn't happening in some arbitrarily large code base. I strongly believe that it is good for D and for programming languages in general to work towards a design that can provably eliminate certain types of bugs.
Oct 28 2008
Sean Kelly <sean invisibleduck.org> writes:
Walter Bright wrote:
 Sean Kelly wrote:
 I think the cost/benefit of this could probably be argued either way. 
 I've never encountered a bug related to this, for example, so to me 
 the benefit is entirely theoretical while the cost is immediate.

I have. Not often in my own code because I am very careful to avoid it, but it frequently happens in 'bug' reports I get sent. This trap does happen to programmers who are less familiar with how the underlying stack machine actually works.

I tend to ask a question along these lines to entry-level interviewees and it's surprising how often they get it wrong. So I agree that this is a fair point. I mostly brought up this argument because C++ is unapologetically designed for experts and I'm occasionally inclined to view D the same way... even though its goal is really somewhat different.
 The real problem is there is no way to verify that this isn't happening 
 in some arbitrarily large code base. I strongly believe that it is good 
 for D and for programming languages in general to work towards a design 
 that can provably eliminate certain types of bugs.

I agree, which is why I'm actually in favor of this despite what I said above. Sean
Oct 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Sean Kelly wrote:
 I tend to ask a question along these lines to entry-level interviewees 
 and it's surprising how often they get it wrong.  So I agree that this 
 is a fair point.  I mostly brought up this argument because C++ is 
 unapologetically designed for experts and I'm occasionally inclined to 
 view D the same way... even though its goal is really somewhat different.

To me that is akin to building a car with no brakes and justifying it by saying it is "designed for experts." Sure, an expert who never makes any mistakes could effectively drive such a car. The trouble is, the road is full of non-expert drivers the expert ones are forced to interact with, and even the experts still make mistakes now and then. I don't believe that having brakes impairs the performance of my car one bit. I also would not say that C++ was deliberately designed without brakes, it just kinda worked out that way. We have the benefit of hindsight in designing D.
Oct 28 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Walter Bright" wrote
 Sean Kelly wrote:
 I think the cost/benefit of this could probably be argued either way. 
 I've never encountered a bug related to this, for example, so to me the 
 benefit is entirely theoretical while the cost is immediate.

I have. Not often in my own code because I am very careful to avoid it, but it frequently happens in 'bug' reports I get sent. This trap does happen to programmers who are less familiar with how the underlying stack machine actually works. The real problem is there is no way to verify that this isn't happening in some arbitrarily large code base. I strongly believe that it is good for D and for programming languages in general to work towards a design that can provably eliminate certain types of bugs.

I agree with this. It would be nice to be able to flag these kinds of things. Even if it was a warning and not a true error. Just not a solution which silently allocates data that shouldn't be allocated. This would be a great candidate for a lint tool. -Steve
Oct 28 2008
Robert Fraser <fraserofthenight gmail.com> writes:
Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 1:04 PM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 "Walter Bright" wrote
 Sean Kelly wrote:
 I think the cost/benefit of this could probably be argued either way.
 I've never encountered a bug related to this, for example, so to me the
 benefit is entirely theoretical while the cost is immediate.

I have. Not often in my own code because I am very careful to avoid it, but it frequently happens in 'bug' reports I get sent. This trap does happen to programmers who are less familiar with how the underlying stack machine actually works. The real problem is there is no way to verify that this isn't happening in some arbitrarily large code base. I strongly believe that it is good for D and for programming languages in general to work towards a design that can provably eliminate certain types of bugs.

I agree with this. It would be nice to be able to flag these kinds of things. Even if it was a warning and not a true error. Just not a solution which silently allocates data that shouldn't be allocated.

Ok, I think we're completely on the same page here. I'm for the compiler finding bugs. But I'm not for the compiler being conservative and allocating memory when it doesn't have to, as it does currently. --bb

How about adding a warning switch (I know, Walter, you're against them, but it might be justified here) that would flag all the closure allocations? I know that should be the job of a "lint" tool, but the compiler already has the context here.
Oct 29 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 7:23 AM, Sean Kelly <sean invisibleduck.org> wrote:
 Walter Bright wrote:
 Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for two
 reasons.  It's backwards-compatible, which is handy.  But more importantly,
 I'd say that probably 95% of the current uses of delegates are scoped, and
 that isn't likely to shift all the way to 50% even if D moved to a much more
 functional style of programming.  Algorithms, for example, all use scoped
 delegates, which I'd say is far and away their most common current use.

The counter to that is that when there is an inadvertent escape of a reference, the error is often undetectable even while it silently corrupts data and behaves erratically. In other words (as Andrei pointed out to me) the cost of those errors, even though rare, is very high. This makes it highly desirable to prevent them automatically, rather than relying on the skill and attention to detail of the programmer.

I think the cost/benefit of this could probably be argued either way. I've never encountered a bug related to this, for example, so to me the benefit is entirely theoretical while the cost is immediate.

I've had bugs caused by this but they were pretty easy to find. Some delegate I'm calling crashes and all the variables are nonsensical garbage... Hmm maybe I was using out-of-scope variables in that delegate that I wasn't supposed to? Maybe there are real cases where the bugs caused are harder to find. But I'll just add my 2c to Sean's. I haven't had many such bugs, and when I've had them they've been pretty easy to find.

I don't think we can afford program correctness to rest on anecdote and "it works for me". That age is long gone. Andrei
Oct 28 2008
Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 I don't think we can afford program correctness to rest on anecdote and 
 "it works for me". That age is long gone.

I agree. When you're managing a program with a million lines of code in it, there is great value in being able to *prove* that it does not suffer from as many kinds of bugs as practical, especially memory corruption bugs. Think of buffer overflow bugs, for example. Think of all the grief that would have been saved if the C/C++ compiler could prove that buffer overflows could not happen.
Oct 28 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 11:40 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 7:23 AM, Sean Kelly <sean invisibleduck.org>
 wrote:
 Walter Bright wrote:
 Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for two
 reasons.  It's backwards-compatible, which is handy.  But more importantly,
 I'd say that probably 95% of the current uses of delegates are scoped, and
 that isn't likely to shift all the way to 50% even if D moved to a much more
 functional style of programming.  Algorithms, for example, all use scoped
 delegates, which I'd say is far and away their most common current use.

The counter to that is that when there is an inadvertent escape of a reference, the error is often undetectable even while it silently corrupts data and behaves erratically. In other words (as Andrei pointed out to me) the cost of those errors, even though rare, is very high. This makes it highly desirable to prevent them automatically, rather than relying on the skill and attention to detail of the programmer.

I've never encountered a bug related to this, for example, so to me the benefit is entirely theoretical while the cost is immediate.

Some delegate I'm calling crashes and all the variables are nonsensical garbage... Hmm maybe I was using out-of-scope variables in that delegate that I wasn't supposed to? Maybe there are real cases where the bugs caused are harder to find. But I'll just add my 2c to Sean's. I haven't had many such bugs, and when I've had them they've been pretty easy to find.

I don't think we can afford program correctness to rest on anecdote and "it works for me". That age is long gone.

I haven't seen any real data about how serious a problem this is from you either. Chasing bogeymen is at least as bad as ignoring real problems.

Well to provide real data I'd have to spend time on user studies, which would be time-intensive. I also think it's not an interesting research problem because it is generally accepted in the community that memory un-safety is a source of problems. So I don't quite feel burdened with the need to provide a proof. Reframing the problem as chasing a bogeyman won't help with addressing it. Andrei
Oct 28 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 11:40 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 7:23 AM, Sean Kelly <sean invisibleduck.org>
 wrote:
 Walter Bright wrote:
 Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for two
 reasons.  It's backwards-compatible, which is handy.  But more importantly,
 I'd say that probably 95% of the current uses of delegates are scoped, and
 that isn't likely to shift all the way to 50% even if D moved to a much more
 functional style of programming.  Algorithms, for example, all use scoped
 delegates, which I'd say is far and away their most common current use.

The counter to that is that when there is an inadvertent escape of a reference, the error is often undetectable even while it silently corrupts data and behaves erratically. In other words (as Andrei pointed out to me) the cost of those errors, even though rare, is very high. This makes it highly desirable to prevent them automatically, rather than relying on the skill and attention to detail of the programmer.

I've never encountered a bug related to this, for example, so to me the benefit is entirely theoretical while the cost is immediate.

Some delegate I'm calling crashes and all the variables are nonsensical garbage... Hmm maybe I was using out-of-scope variables in that delegate that I wasn't supposed to? Maybe there are real cases where the bugs caused are harder to find. But I'll just add my 2c to Sean's. I haven't had many such bugs, and when I've had them they've been pretty easy to find.

I don't think we can afford program correctness to rest on anecdote and "it works for me". That age is long gone.

I haven't seen any real data about how serious a problem this is from you either. Chasing bogeymen is at least as bad as ignoring real problems.

Well to provide real data I'd have to spend time on user studies, which would be time-intensive. I also think it's not an interesting research problem because it is generally accepted in the community that memory un-safety is a source of problems. So I don't quite feel burdened with the need to provide a proof. Reframing the problem as chasing a bogeyman won't help with addressing it. Andrei

I just wanted to issue an apology to Bill for the above, which is brusque and demeaning. He was delicate enough to email me privately what he thought about my response, and in very levelheaded terms. After having answered privately as well, I thought I'd post a public apology; it would be quite unethical to apologize in private for a public remark! Hopefully this helps with undoing the damage and with keeping the recent streak of good discussions going. Andrei
Oct 29 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Bill Baxter" wrote
 Back to the technical topic, as I told Andrei, all I want is some
 solution that doesn't kill performance with lots of hidden memory
 allocations.
 I doubt that's something anyone really wants, so all this huffing and
 puffing about it probably isn't necessary.

I doubt anyone wants that. But here is my main concern (my defense for huffing): One of my main goals for D at the moment is to have Tango compile on D2. Right now, I'm slowly getting everything constified, and dealing with small design changes to make that happen (and filing bugs that I find). However, when dissecting solutions to unnecessary dynamic closures, I want to make sure that the solution does not force Tango to change its overall design. Right now, with Walter's proposal, I fear a large amount of scope decorations would be necessary (making the api very unattractive), and possibly some of the ways Tango uses stack variables might be made uncompilable. I would like to avoid that. It has happened in the past that things considered closed on D2 did not work with Tango because the main code used to test D2 (Phobos) does not have a similar design, and does not use the same features as Tango does. When I think a solution solves the problem, and will allow Tango to compile, I'll stop my whining ;) -Steve
Oct 30 2008
ore-sama <spam here.lot> writes:
Steven Schveighoffer Wrote:

 Right now, with Walter's proposal, I fear a large amount of scope 
 decorations would be necessary (making the api very unattractive)

Moreover, those semantics will in some cases force allocation when it's not needed.
Oct 30 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Oct 29, 2008 at 1:04 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Walter Bright" wrote
 Sean Kelly wrote:
 I think the cost/benefit of this could probably be argued either way.
 I've never encountered a bug related to this, for example, so to me the
 benefit is entirely theoretical while the cost is immediate.

I have. Not often in my own code because I am very careful to avoid it, but it frequently happens in 'bug' reports I get sent. This trap does happen to programmers who are less familiar with how the underlying stack machine actually works. The real problem is there is no way to verify that this isn't happening in some arbitrarily large code base. I strongly believe that it is good for D and for programming languages in general to work towards a design that can provably eliminate certain types of bugs.

I agree with this. It would be nice to be able to flag these kinds of things. Even if it was a warning and not a true error. Just not a solution which silently allocates data that shouldn't be allocated.

Ok, I think we're completely on the same page here. I'm for the compiler finding bugs. But I'm not for the compiler being conservative and allocating memory when it doesn't have to, as it does currently. --bb
Oct 28 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Oct 29, 2008 at 11:40 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 7:23 AM, Sean Kelly <sean invisibleduck.org>
 wrote:
 Walter Bright wrote:
 Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for two
 reasons.  It's backwards-compatible, which is handy.  But more importantly,
 I'd say that probably 95% of the current uses of delegates are scoped, and
 that isn't likely to shift all the way to 50% even if D moved to a much more
 functional style of programming.  Algorithms, for example, all use scoped
 delegates, which I'd say is far and away their most common current use.

The counter to that is that when there is an inadvertent escape of a reference, the error is often undetectable even while it silently corrupts data and behaves erratically. In other words (as Andrei pointed out to me) the cost of those errors, even though rare, is very high. This makes it highly desirable to prevent them automatically, rather than relying on the skill and attention to detail of the programmer.

I think the cost/benefit of this could probably be argued either way. I've never encountered a bug related to this, for example, so to me the benefit is entirely theoretical while the cost is immediate.

I've had bugs caused by this but they were pretty easy to find. Some delegate I'm calling crashes and all the variables are nonsensical garbage... Hmm maybe I was using out-of-scope variables in that delegate that I wasn't supposed to? Maybe there are real cases where the bugs caused are harder to find. But I'll just add my 2c to Sean's. I haven't had many such bugs, and when I've had them they've been pretty easy to find.

I don't think we can afford program correctness to rest on anecdote and "it works for me". That age is long gone.

I haven't seen any real data about how serious a problem this is from you either. Chasing bogeymen is at least as bad as ignoring real problems. --bb
Oct 28 2008
"Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Tue, Oct 28, 2008 at 6:29 PM, Jason House
<jason.james.house gmail.com> wrote:
 Sean Kelly Wrote:

 Walter Bright wrote:
 In other words (as Andrei pointed out to me) the cost of those errors,
 even though rare, is very high. This makes it highly desirable to
 prevent them automatically, rather than relying on the skill and
 attention to detail of the programmer.

I think the cost/benefit of this could probably be argued either way. I've never encountered a bug related to this, for example, so to me the benefit is entirely theoretical while the cost is immediate.

As the author of an open source multithreaded application in D1, I've had these errors pop up. It's easy to overlook this stuff and pass things the wrong way (it's easier to code). Tango doesn't even have a bind library to make it easier!

For what it's worth, std.bind I think depends on one Phobos-specific function. It would probably take a matter of a minute or two to port it to work with Tango.
Oct 28 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Oct 29, 2008 at 7:23 AM, Sean Kelly <sean invisibleduck.org> wrote:
 Walter Bright wrote:
 Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for two
 reasons.  It's backwards-compatible, which is handy.  But more importantly,
 I'd say that probably 95% of the current uses of delegates are scoped, and
 that isn't likely to shift all the way to 50% even if D moved to a much more
 functional style of programming.  Algorithms, for example, all use scoped
 delegates, which I'd say is far and away their most common current use.

The counter to that is that when there is an inadvertent escape of a reference, the error is often undetectable even while it silently corrupts data and behaves erratically. In other words (as Andrei pointed out to me) the cost of those errors, even though rare, is very high. This makes it highly desirable to prevent them automatically, rather than relying on the skill and attention to detail of the programmer.

I think the cost/benefit of this could probably be argued either way. I've never encountered a bug related to this, for example, so to me the benefit is entirely theoretical while the cost is immediate.

I've had bugs caused by this but they were pretty easy to find. Some delegate I'm calling crashes and all the variables are nonsensical garbage... Hmm maybe I was using out-of-scope variables in that delegate that I wasn't supposed to? Maybe there are real cases where the bugs caused are harder to find. But I'll just add my 2c to Sean's. I haven't had many such bugs, and when I've had them they've been pretty easy to find. --bb
Oct 28 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Oct 29, 2008 at 4:56 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 On Tue, Oct 28, 2008 at 10:54 PM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 What I'd prefer is allocate closure when you can prove it, allow
 specification when you can't.  That is, allocate a closure automatically
 in simple cases like this:

 int *f() {int x = 5; return &x;}

 And in cases where you can't prove it, default to not allocating a
 closure, and allow the developer to specify that a closure is necessary:

So basically programmers have to memorize all the rules the compiler uses to prove when it's necessary to allocate a closure, and then run those rules in their heads to determine if the current line of code will trigger allocation or not?

First, the compiler does not have any sound rules for this. It currently allocates a closure on a knee-jerk reaction from taking the address of a stack variable. And it's either this or substitute in your statement "prove when it's *not* necessary to allocate a closure", which is about as hard and probably 10x more common. Second, for 90% of functions that don't require you to allocate closures, you don't have to think about any rules.

I don't see why not. Because the compiler might be allocating a closure when I don't want it to, killing performance. So I'll either be surprised later, or I need to think about it when I'm writing that line of code.
 For the 9% of functions which return a pointer to local data, proven by the
 compiler, you don't have to think about rules.

Except didn't you just give us some examples where the function does things that escape in the local sense, but can be seen not to escape when examining the full context? So in those 9% of the cases I may also want to think about what the compiler will do to avoid unnecessary hidden allocations in my code. And if I am getting one of these unnecessary allocations, then I will have to think about how to rearrange my code so that the compiler doesn't get tricked. But it could be a library function that's causing it.
 For the last 1% of functions, the documentation should clarify how your data
 can escape, and then you have to think about how that affects your usage of
 it.  The docs could say 'best to allocate a closure unless you know what you
 are doing'.

 And when the compiler gets a little smarter, the programmers need to
 get smarter, too.  In lock step.

Not really. If the compiler can some day store the scope dependency information in the object file (and get rid of reading source to determine function signature), then this whole manual requirement goes away.

Until the compiler can do the right thing 100% of the time, I have to be on the lookout for spurious allocations.
 That doesn't sound like a good solution to me.

Then let's go back to D1's solution -- no closures ;)

Sure. But if you're going to do that, then at least give us an easy way to explicitly request a closure for those of us who know we need one and when we don't. :-) --bb
Oct 28 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Bill Baxter" wrote
 On Wed, Oct 29, 2008 at 4:56 AM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 On Tue, Oct 28, 2008 at 10:54 PM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 What I'd prefer is allocate closure when you can prove it, allow
 specification when you can't.  That is, allocate a closure automatically
 in simple cases like this:

 int *f() {int x = 5; return &x;}

 And in cases where you can't prove it, default to not allocating a
 closure, and allow the developer to specify that a closure is necessary:

So basically programmers have to memorize all the rules the compiler uses to prove when it's necessary to allocate a closure, and then run those rules in their heads to determine if the current line of code will trigger allocation or not?

First, the compiler does not have any sound rules for this. It currently allocates a closure on a knee-jerk reaction from taking the address of a stack variable. And it's either this or substitute in your statement "prove when it's *not* necessary to allocate a closure", which is about as hard and probably 10x more common. Second, for 90% of functions that don't require you to allocate closures, you don't have to think about any rules.

I don't see why not. Because the compiler might be allocating a closure when I don't want it to, killing performance. So I'll either be surprised later, or I need to think about it when I'm writing that line of code.

No, I'm proposing the compiler SHOULDN'T allocate closures unless it can prove without a shadow of a doubt that a closure is required. i.e. it defaults to D1 behavior, which should cover 90% of functions today.
 For the 9% of functions which return a pointer to local data, proven by the
 compiler, you don't have to think about rules.

Except didn't you just give us some examples where the function does things that escape in the local sense, but can be seen not to escape when examining the full context?

This is what I'm thinking as proven by the compiler:

int *f() { int x = 5; return &x; }

There is no doubt that this will cause an escape. A more common scenario (I just ran into this with a newb on irc):

char[] readData(InputStream s)
{
    char[64] buf;
    auto len = s.read(buf);
    return buf[0..len];
}
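A minimal sketch of the usual repair for `readData`, in present-day D: copy the filled part of the stack buffer to the GC heap with `.dup` before returning it. The `readInto` helper is hypothetical, standing in for the `InputStream` in the example above so the snippet is self-contained.

```d
// Hypothetical stand-in for InputStream.read: copies up to buf.length
// chars from src into buf and returns how many were copied.
size_t readInto(const(char)[] src, char[] buf)
{
    immutable n = src.length < buf.length ? src.length : buf.length;
    buf[0 .. n] = src[0 .. n];
    return n;
}

// Escape-free readData: buf lives in this function's stack frame, so the
// filled part is copied to the GC heap before the slice leaves the function.
char[] readData(const(char)[] src)
{
    char[64] buf;
    immutable len = readInto(src, buf[]);
    // Returning buf[0 .. len] directly would hand the caller a slice of a
    // stack frame that is dead as soon as readData returns.
    return buf[0 .. len].dup;
}
```

With the `.dup`, calling `readData("abc")` and then `readData("xyz")` yields two independent arrays; the cost is one small heap allocation per call, which is exactly the trade-off being argued about in this thread.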
 So in those 9% of the cases I may also want to think about what the
 compiler will do to avoid unnecessary hidden allocations in my code.
 And if I am getting one of these unnecessary allocations, then I will
 have to think about how to rearrange my code so that the compiler
 doesn't get tricked.  But it could be a library function that's
 causing it.

I'm starting to think that if you compile with warnings on, these 9% of functions shouldn't compile. Perhaps they shouldn't compile by default since it's very easy to do this kind of stuff explicitly without closures.
 For the last 1% of functions, the documentation should clarify how your data
 can escape, and then you have to think about how that affects your usage of
 it.  The docs could say 'best to allocate a closure unless you know what you
 are doing'.

 And when the compiler gets a little smarter, the programmers need to
 get smarter, too.  In lock step.

Not really. If the compiler can some day store the scope dependency information in the object file (and get rid of reading source to determine function signature), then this whole manual requirement goes away.

Until the compiler can do the right thing 100% of the time, I have to be on the lookout for spurious allocations.

I'm saying no automatic closures unless it's absolutely provable that an escape occurs.
 That doesn't sound like a good solution to me.

Then let's go back to D1's solution -- no closures ;)

Sure. But if you're going to do that, then at least give us an easy way to explicitly request a closure for those of us who know we need one and when we don't. :-)

Assuming the compiler does not ever allocate closures needlessly, I agree with a way to specify when closures should occur, but not when they should not (since there's no need). -Steve
Oct 28 2008
"Denis Koroskin" <2korden gmail.com> writes:
On Tue, 28 Oct 2008 01:15:24 +0300, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Robert Fraser wrote:
 I get the feeling that D's type system is going to become the joke of
 the programming world. Are we really going to have to worry about a
 scope unshared(invariant(int)*) ...? What other type modifiers can we
 put on that?

That argues that "noscope" should be the default. Using "scope" would be an optional optimization. BTW, "unshared" is the default. "shared" would be the keyword.

I hope that 'noscope' is considered to be default *not* because it would introduce one more keyword otherwise... OTOH, noscope *should* be a keyword in either case, due to some casts:

scope int* sp;
noscope int* nsp;
nsp = cast(noscope int*)sp;
sp = cast(scope int*)nsp;
Oct 27 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Fraser wrote:
 Walter Bright Wrote:
 
 The delegate closure issue is part of a wider issue - escape
 analysis. A reference is said to 'escape' a scope if it, well,
 leaves that scope. Here's a trivial example:
 
 int* foo() { int i; return &i; }
 
 The reference to i escapes the scope of i, thus courting disaster.
  Another form of escaping:
 
 int* p;
 void bar(int* x) { p = x; }
 
 which is, on the surface, legitimate, but fails for:
 
 void abc(int j) { bar(&j); }
 
 This kind of problem is currently undetectable by the compiler.
 
 The first step is, are function parameters considered to be
 escaping by default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would
 probably choose scope as the default, and the OOP programmer
 noscope.
 
 (The issue with delegates is we need the dynamic closure only if
 the delegate 'escapes'.)

I get the feeling that D's type system is going to become the joke of the programming world. Are we really going to have to worry about a scope unshared(invariant(int)*) ...? What other type modifiers can we put on that?

This is a misunderstanding. Scope is a storage class, not a type modifier, so it's not as pervasive as you may think. Andrei
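A sketch of the distinction in present-day D2 (not part of the 2008 proposal): `scope` is written on a parameter as a storage class rather than being part of the delegate's type, and it promises the delegate will not outlive the call. As far as I understand current DMD behavior, a delegate literal passed to such a parameter does not need a heap-allocated closure, while a delegate that is returned does. `makeCounter` and `sumWith` are hypothetical names.

```d
// Hypothetical example: the returned delegate outlives makeCounter, so the
// compiler has to move `count` into a GC-allocated closure.
int delegate() makeCounter()
{
    int count = 0;
    return () => ++count;
}

// Hypothetical example: `scope` on the parameter states that `f` is only
// called during sumWith and never stored, so a caller's delegate literal
// can keep its captured variables on the caller's stack.
int sumWith(scope int delegate(int) f, const(int)[] xs)
{
    int total = 0;
    foreach (x; xs)
        total += f(x);
    return total;
}
```

Note that `scope` appears only on the parameter declaration; the type of `f` is still plain `int delegate(int)`.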
Oct 27 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Mon, 27 Oct 2008 23:05:48 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:
 scope is a storage class, not a type constructor.

Okay, I'm confused. I had assumed that the escape scope was different from the storage scope, as the storage scope has a few known problems with regard to escape analysis as currently defined, e.g.

class Node { Node next; }

void append(scope Node a)
{
    scope b = new Node();
    a.next = b;     // b just escaped
}

scope const also has similar issues. So is the plan for the compiler to do a static escape analysis based on the function signature? Alternatively, a deep type which prevents assignment except at declaration guarantees (I think) no escape.
Oct 27 2008
Mosfet <mosfet anonymous.org> writes:
Robert Fraser wrote:
 Walter Bright Wrote:
 
 The delegate closure issue is part of a wider issue - escape analysis. A 
 reference is said to 'escape' a scope if it, well, leaves that scope. 
 Here's a trivial example:

 int* foo() { int i; return &i; }

 The reference to i escapes the scope of i, thus courting disaster. 
 Another form of escaping:

 int* p;
 void bar(int* x) { p = x; }

 which is, on the surface, legitimate, but fails for:

 void abc(int j)
 {
      bar(&j);
 }

 This kind of problem is currently undetectable by the compiler.

 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.

 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

I get the feeling that D's type system is going to become the joke of the programming world. Are we really going to have to worry about a scope unshared(invariant(int)*) ...? What other type modifiers can we put on that?

I agree. I think that D will be used only by people like you who understand all these shared/scope/mutable/lazy things. I thought C++ was complex and difficult to learn, but I think I was wrong.
Oct 28 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Mosfet wrote:
 Robert Fraser wrote:
 Walter Bright Wrote:

 The delegate closure issue is part of a wider issue - escape 
 analysis. A reference is said to 'escape' a scope if it, well, leaves 
 that scope. Here's a trivial example:

 int* foo() { int i; return &i; }

 The reference to i escapes the scope of i, thus courting disaster. 
 Another form of escaping:

 int* p;
 void bar(int* x) { p = x; }

 which is, on the surface, legitimate, but fails for:

 void abc(int j)
 {
      bar(&j);
 }

 This kind of problem is currently undetectable by the compiler.

 The first step is, are function parameters considered to be escaping 
 by default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.

 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

I get the feeling that D's type system is going to become the joke of the programming world. Are we really going to have to worry about a scope unshared(invariant(int)*) ...? What other type modifiers can we put on that?

I agree. I think that D will be used only by people like you who understand all these shared/scope/mutable/lazy things. I thought C++ was complex and difficult to learn, but I think I was wrong.

Well I think you were right. The question is how much you spend learning things that are actually useful, versus learning gratuitous complexity. I think D is much more rewarding per unit of effort invested than C++. Andrei
Oct 28 2008
dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s
 Well I think you were right. The question is how much you spend learning
 things that are actually useful, versus learning gratuitous complexity.
 I think D is much more rewarding per unit of effort invested than C++.
 Andrei

Seconded. Both C++ and D are very complex languages, but I don't see that as a problem. As Bjarne Stroustrup would say, "Complexity has to go somewhere." If you oversimplify the core language, you end up acting as a human compiler to make your code fit within the confines of the simple language. See Java and C. The real problem with C++ is not complexity per se, but cruft, the fact that it's a low-level language masquerading as a high-level language, and the complete ignorance of convenience as a design goal.

This can be exemplified just by examining how arrays "work" in C++. First, you have the cruft of C arrays that are very low-level and really aren't good for much, except making things more confusing. To get around this without doing anything to the core language, C++ adds vector to the STL. This is fine, except that you have no vector literals, no slice syntax, horrible error messages, inefficient copying semantics by default, vectors can't be used in metaprogramming, etc. It works, but it's not very convenient. Furthermore, the reason you have no nice slice syntax or default reference semantics is that you have no garbage collection.
Oct 28 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Oct 29, 2008 at 1:04 AM, Sean Kelly <sean invisibleduck.org> wrote:
 Andrei Alexandrescu wrote:
 Escape analysis is a tricky business. My opinion is that we either take
 care of it properly or blissfully ignore the entire issue. That opinion may
 disagree a bit with Walter's, who'd prefer a quick patch for delegates so he
 returns to threading. I think if we opt for a quick patch now, it'll turn to
 gangrene later. Among other things, it will hurt the threading
 infrastructure it was supposed to give precedence to.

Like const, I'd rather have no solution than a bad solution insofar as escape analysis is concerned.

The only serious problem people have right now is that closures are allocated automatically when they may not need to be. Making closure allocation manual for now seems like the most future-compatible way to fix things. In some nebulous future, the manual allocation could become unnecessary, or it could become compiler-checked, but it seems to me that for now just making it manual does the least harm and lets Walter get back to work on other things. --bb
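For reference, D2 compilers released after this thread treat a scope delegate parameter as the promise that lets the caller skip the heap closure. A minimal sketch assuming a recent DMD or LDC; sumWith and caller are hypothetical names. The @nogc attribute on caller acts as the check: it only compiles because no closure is allocated for offset.

```d
import std.stdio : writeln;

// scope promises that dg is not stored anywhere past this call,
// so callers do not need a heap-allocated closure.
int sumWith(scope int delegate(int) @nogc dg, int n) @nogc
{
    int total = 0;
    foreach (i; 0 .. n)
        total += dg(i);
    return total;
}

// @nogc is the check here: it would fail to compile if passing the
// lambda forced the compiler to allocate a closure for offset.
int caller() @nogc
{
    int offset = 10;
    return sumWith((int i) => i + offset, 3);
}

void main()
{
    writeln(caller());
}
```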
Oct 28 2008
"Denis Koroskin" <2korden gmail.com> writes:
On Tue, 28 Oct 2008 16:54:15 +0300, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
[snip]
 What I'd prefer is allocate closure when you can prove it, allow
 specification when you can't.  That is, allocate a closure automatically  
 in simple cases like this:

 int *f() {int x = 5; return &x;}

Hmm.. This is nice! You can implement 'new' in pure D in just a few lines:

template new(T)
{
    T* new(Args...)(Args args)
    {
        T t = T(args);
        return &t;
    }
}

Example:

struct Foo
{
    public this(int value)
    {
        this.value = value;
    }
    private int value;
}

Foo* foo = new!(Foo)(42);
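Since new is a keyword in D, the template above is illustrative rather than compilable, and current compilers reject returning the address of a local. A sketch of the same helper with the heap allocation spelled out; make is a hypothetical name, and the get accessor is added only so the value can be read back:

```d
import std.stdio : writeln;

// Hypothetical helper: allocates a T on the GC heap and returns a pointer.
T* make(T, Args...)(Args args)
{
    return new T(args);
}

struct Foo
{
    private int value;
    this(int value) { this.value = value; }
    int get() const { return value; }
}

void main()
{
    Foo* foo = make!Foo(42);
    writeln(foo.get());
}
```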
Oct 28 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Tue, Oct 28, 2008 at 10:54 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 What I'd prefer is allocate closure when you can prove it, allow
 specification when you can't.  That is, allocate a closure automatically in
 simple cases like this:

 int *f() {int x = 5; return &x;}

 And in cases where you can't prove it, default to not allocating a closure,
 and allow the developer to specify that a closure is necessary:

So basically programmers have to memorize all the rules the compiler uses to prove when it's necessary to allocate a closure, and then run those rules in their heads to determine if the current line of code will trigger allocation or not? And when the compiler gets a little smarter, the programmers need to get smarter, too. In lock step. That doesn't sound like a good solution to me. --bb
Oct 28 2008
Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 The delegate closure issue is part of a wider issue - escape analysis. A 
 reference is said to 'escape' a scope if it, well, leaves that scope. 
 Here's a trivial example:
 
 int* foo() { int i; return &i; }
 
 The reference to i escapes the scope of i, thus courting disaster. 
 Another form of escaping:
 
 int* p;
 void bar(int* x) { p = x; }
 
 which is, on the surface, legitimate, but fails for:
 
 void abc(int j)
 {
      bar(&j);
 }
 
 This kind of problem is currently undetectable by the compiler.
 
 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.
 
 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

In D1, local variables implicitly follow a mixed rule:

  Objects are noscope
  Primitive types are scope
  Structs are headscope

Headscope may be a bit of a misnomer because member scope is type dependent. When it comes to escaping, do we need transitive scope? I currently can't imagine that without allowing some exceptions. That insane path seems to lead to 3 scopes for member variables...
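One way to read the three categories in terms of copy semantics: copying a class reference shares the object, copying a primitive copies the value, and copying a struct copies its fields while any pointer fields still refer to the same data (the "head" is copied, the rest is shared). A minimal D sketch of that reading; Obj and Pair are hypothetical names:

```d
import std.stdio : writeln;

class Obj { int v; }            // class variables hold references
struct Pair { int n; int* p; }  // structs are copied field by field

void main()
{
    auto o1 = new Obj;
    auto o2 = o1;               // copies the reference: one shared object
    o2.v = 1;

    int x = 1;
    int y = x;                  // primitives copy the value
    y = 2;

    int target = 5;
    auto s1 = Pair(1, &target);
    auto s2 = s1;               // n is copied, but p still points at target
    s2.n = 2;
    *s2.p = 6;

    writeln(o1.v, " ", x, " ", s1.n, " ", *s1.p);
}
```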
Oct 27 2008
ore-sama <spam here.lot> writes:
Allocation is determined on delegate creation, not on passing it somewhere
else, isn't it? Closure allocation is the caller's task, so the caller is responsible
and should be able to control it. Documentation is generally not needed; it's
usually quite obvious what's going on. The default should be to allocate, and the
programmer should be able to track allocations with the compiler's help, if he
wants.
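A small D sketch of the case where the closure really is needed: the delegate outlives the frame that created it, so the captured local has to live in a GC-allocated closure rather than on the stack. counter is a hypothetical name:

```d
import std.stdio : writeln;

// count is captured by a delegate that escapes counter(), so the
// compiler keeps count in a heap-allocated closure, not on the stack.
int delegate() counter()
{
    int count = 0;
    return () => ++count;
}

void main()
{
    auto next = counter();
    next();
    writeln(next());
}
```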
Oct 28 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Walter Bright wrote:
 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.

I'm for safe defaults. Programs shouldn't crash for no reason.

Here are my thoughts on escape analysis. Sorry if they're obvious.

I think it is possible to detect whether a reference escapes or not in the absence of function calls by analyzing an expression graph. Assigning to a global state variable is an ultimate escape.

In the worst case, when only the current function can be analyzed and no meta-info is available about other functions, the compiler must assume a reference escapes if it is passed as an argument to another function. This is the current D2 behavior.

Pure functions provide some meta-info because any reference passed as an argument can only escape via a reference return value or other mutable reference arguments. This makes escape analysis possible even after an unknown pure function is called.

For any function in a tree of imported modules the compiler could keep some meta-data about which argument escapes where, if at all. This way even regular functions can participate in escape analysis without blowing it up.

An argument to a virtual function call always escapes by default. It may be possible to declare an argument as non-escaping (scope?), and the compiler should then enforce the non-escaping contract on any overriding functions.

An argument to a function declared as a prototype always escapes by default. It may be possible for the compiler to export some meta-info along with the prototype when a .di file is generated: whether an argument is guaranteed not to escape, or maybe even detailed info about which argument escapes where, to mimic the compile-time meta-info.

The expression graph analysis should be the first step towards safe stack closures.
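The expression-graph idea can be illustrated as plain reachability: record where each reference gets stored, and report an escape if a reference can reach a sink such as a global. This is a toy model under simplifying assumptions (string node names, no aliasing, no calls), not how DMD analyzes code; Flow and escapes are hypothetical names:

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Hypothetical toy model: an edge from -> to means
// "the reference held by from is stored into to".
struct Flow { string from; string to; }

// A reference escapes if it can flow into any sink (a global, a return, ...).
bool escapes(string start, const(Flow)[] flows, const(string)[] sinks)
{
    bool[string] seen;          // visited nodes, so cycles terminate
    string[] work = [start];
    while (work.length)
    {
        string node = work[$ - 1];
        work = work[0 .. $ - 1];
        if (node in seen)
            continue;
        seen[node] = true;
        if (sinks.canFind(node))
            return true;
        foreach (f; flows)
            if (f.from == node)
                work ~= f.to;
    }
    return false;
}

void main()
{
    // Walter's example: abc passes &j to bar, which stores x into global p.
    auto flows = [Flow("&j", "bar.x"), Flow("bar.x", "global p")];
    writeln(escapes("&j", flows, ["global p"]));
}
```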
Oct 28 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Sergey Gromov" wrote
 Walter Bright wrote:
 The first step is, are function parameters considered to be escaping by
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably
 choose scope as the default, and the OOP programmer noscope.

I'm for safe defaults. Programs shouldn't crash for no reason.

If safe defaults mean a 75% performance decrease, I'm for using unsafe defaults that are safe 99% of the time, with the ability to make them 100% safe if needed.
 Here are my thoughts on escape analysis.  Sorry if they're obvious.

 I think it is possible to detect whether a reference escapes or not in
 the absence of function calls by analyzing an expression graph.

Yes, but not in D, since import uses uncompiled files as input.
 Assigning to a global state variable is an ultimate escape.

Agree there.
 In the worst case, when only the current function can be analyzed and no
 meta-info is available about other functions, the compiler must assume a
 reference escapes if it is passed as an argument to another function.
 This is the current D2 behavior.

This leads to the current situation, where you have a huge performance decrease for little or no gain in reliability.
 Pure functions provide some meta-info because any reference passed as an
 argument can only escape via a reference return value or other mutable
 reference arguments.  This makes escape analysis possible even after an
 unknown pure function is called.

Good point. Easy analysis on pure functions.
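A small D illustration of why pure functions are easy to analyze: a pure function may not touch mutable globals, so in the sketch below a pointer argument can only get out through the return value. first and leak are hypothetical names:

```d
import std.stdio : writeln;

int* global;

// Neither pointee (an int) can hold a pointer, and pure code cannot
// reach mutable globals, so the return value is the only escape route.
int* first(int* a, int* b) pure
{
    return a;
}

// Rejected by the compiler if uncommented: a pure function may not
// access the mutable global variable global.
// void leak(int* a) pure { global = a; }

void main()
{
    int x = 1, y = 2;
    writeln(*first(&x, &y));
}
```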
 For any function in a tree of imported modules the compiler could keep
 some meta-data about which argument escapes where, if at all.  This way
 even regular functions can participate in escape analysis without
 blowing it up.

Where is the data kept? It must be in the object file, and D imports must then read the object file for the API instead of the source file. I don't think it's worth breaking the single-file model for imports and code. Requiring a .di file is a little iffy as it is today.
 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

This is tricky, because most class member functions are virtual, so you are forced to litter all your functions with escaping/non-escaping syntax. To be accurate you need to define the escape graph in the signature, which will be a PITA. What would be worse is to not have a way to express the complete graph. Another solution is that a derived function must have the same expression graph or a tighter one than the base class'. But without being able to store the graph with the compiled code (and having the compiler import the metadata instead of the source file), this is a moot point.
 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info about
 which argument escapes where, to mimic the compile-time meta-info.

No, the .di file might not be auto-generated. You're also now back to a separate import and source file, like C has. I think in order for this to work, the graph and object code must be stored in the same file that is imported.
 The expression graph analysis should be the first step towards safe
 stack closures.

I would agree with this. But I don't think it's happening in the near future. And I hope it's not done through .di files. In the meantime, to make D2 a systems language again, it should drop conservative closures. -Steve
Oct 28 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 Walter Bright wrote:
 The first step is, are function parameters considered to be escaping by
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably
 choose scope as the default, and the OOP programmer noscope.


If safe defaults mean a 75% performance decrease, I'm for using unsafe defaults that are safe 99% of the time, with the ability to make them 100% safe if needed.
 Here are my thoughts on escape analysis.  Sorry if they're obvious.

 I think it is possible to detect whether a reference escapes or not in
 the absence of function calls by analyzing an expression graph.

Yes, but not in D, since import uses uncompiled files as input.

Please note the "in the absence of function calls" part. I'm talking about code which is doing pure calculus, without calling anything external. It's pretty useless by itself, but it's the basics. Unfortunately I don't know how import is implemented. It should do some parsing though, to be able to inline functions from other modules, and to expand templates.
 Assigning to a global state variable is an ultimate escape.

Agree there.
 In the worst case, when only the current function can be analyzed and no
 meta-info is available about other functions, the compiler must assume a
 reference escapes if it is passed as an argument to another function.
 This is the current D2 behavior.

This leads to the current situation, where you have a huge performance decrease for little or no gain in reliability.
 Pure functions provide some meta-info because any reference passed as an
 argument can only escape via a reference return value or other mutable
 reference arguments.  This makes escape analysis possible even after an
 unknown pure function is called.

Good point. Easy analysis on pure functions.
 For any function in a tree of imported modules the compiler could keep
 some meta-data about which argument escapes where, if at all.  This way
 even regular functions can participate in escape analysis without
 blowing it up.

Where is the data kept? It must be in the object file, and D imports must then read the object file for the API instead of the source file. I don't think it's worth breaking the single-file model for imports and code. Requiring a .di file is a little iffy as it is today.

Here I'm talking about disposable compile-time data, module-local if you wish. This means that local optimization is better than inter-module optimization. Nothing new here I suppose. Of course it would be nice if this data is exported somehow and used when compiling other modules. But it'd make the compilation process asymmetric, when meta-data is available for already compiled modules and not available for others.
 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

This is tricky, because most class member functions are virtual, so you are forced to litter all your functions with escaping/non-escaping syntax. To be accurate you need to define the escape graph in the signature, which will be a PITA. What would be worse is to not have a way to express the complete graph.

Not every call to a virtual function is itself virtual, and not every virtual function cares whether its argument escapes. I'd say more: noscope should be the default for all reference types except delegates, because you usually don't care. I agree that making scope the default for delegates is probably the right thing to do, but only if the compiler can detect violations of this contract.
 Another solution is that a derived function must have the same expression 
 graph or a tighter one than the base class'.  But without being able to 
 store the graph with the compiled code (and having the compiler import the 
 metadata instead of the source file), this is a moot point.
 
 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info about
 which argument escapes where, to mimic the compile-time meta-info.

No, the .di file might not be auto-generated. You're also now back to a separate import and source file, like C has. I think in order for this to work, the graph and object code must be stored in the same file that is imported.

There are separate import files. Actually, the compiler can simply put scope/noscope on the arguments based upon the meta-data collected during compilation. If your .di is manually created, you either put them in manually as well, or you don't care.
 The expression graph analysis should be the first step towards safe
 stack closures.

I would agree with this. But I don't think it's happening in the near future. And I hope it's not done through .di files.

You can limit analysis to a single module for now. This will cover local function calls, including some local method calls, and I hope it'll also cover template function calls which means std.algorithm will work without memory allocation again.
 In the meantime, to make D2 a systems language again, it should drop 
 conservative closures.
 
 -Steve

Oct 28 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Sergey Gromov" wrote
 Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 Walter Bright wrote:
 The first step is, are function parameters considered to be escaping by
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably
 choose scope as the default, and the OOP programmer noscope.


If safe defaults mean a 75% performance decrease, I'm for using unsafe defaults that are safe 99% of the time, with the ability to make them 100% safe if needed.
 Here are my thoughts on escape analysis.  Sorry if they're obvious.

 I think it is possible to detect whether a reference escapes or not in
 the absence of function calls by analyzing an expression graph.

Yes, but not in D, since import uses uncompiled files as input.

Please note the "in the absence of function calls" part. I'm talking about code which is doing pure calculus, without calling anything external. It's pretty useless by itself, but it's the basics.

Ah, sorry. I read 'absence of function source'. My bad, in that case we agree on this one.
 Unfortunately I don't know how import is implemented.  It should do some
 parsing though, to be able to inline functions from other modules, and
 to expand templates.

Those are all problems to be solved. But if the file used by the linker and the file that contains the expression graphs aren't the same, or at least forced to be related, then you end up with very weird issues.
 Assigning to a global state variable is an ultimate escape.

Agree there.
 In the worst case, when only the current function can be analyzed and no
 meta-info is available about other functions, the compiler must assume a
 reference escapes if it is passed as an argument to another function.
 This is the current D2 behavior.

This leads to the current situation, where you have a huge performance decrease for little or no gain in reliability.
 Pure functions provide some meta-info because any reference passed as an
 argument can only escape via a reference return value or other mutable
 reference arguments.  This makes escape analysis possible even after an
 unknown pure function is called.

Good point. Easy analysis on pure functions.
 For any function in a tree of imported modules the compiler could keep
 some meta-data about which argument escapes where, if at all.  This way
 even regular functions can participate in escape analysis without
 blowing it up.

Where is the data kept? It must be in the object file, and D imports must then read the object file for the API instead of the source file. I don't think it's worth breaking the single-file model for imports and code. Requiring a .di file is a little iffy as it is today.

Here I'm talking about disposable compile-time data, module-local if you wish. This means that local optimization is better than inter-module optimization. Nothing new here I suppose.

Except the linker has to enforce it. Which means it needs to somehow be munged into the signature. If the signature is defined only in a .di file then it might not match. I just think the object file and .di file are too unrelated to force continuity. Weird issues can happen when these things are edited separately. If .di files were not editable and always generated with object files, I'd say they were a good place to put this info. But they aren't.
 Of course it would be nice if this data is exported somehow and used
 when compiling other modules.  But it'd make the compilation process
 asymmetric, when meta-data is available for already compiled modules and
 not available for others.

It would have to be available for all of them. That would be the point of including it in the object file.
 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

This is tricky, because most class member functions are virtual, so you are forced to litter all your functions with escaping/non-escaping syntax. To be accurate you need to define the escape graph in the signature, which will be a PITA. What would be worse is to not have a way to express the complete graph.

Not every call to a virtual function is itself virtual, and not every virtual function cares whether its argument escapes. I'd say more: noscope should be the default for all reference types except delegates, because you usually don't care. I agree that making scope the default for delegates is probably the right thing to do, but only if the compiler can detect violations of this contract.

A very very common technique in Tango to save using heap allocation is to declare a static array as a buffer, and then pass that buffer to be used as scratch space in a function (which is possibly virtual). This would be my golden use case that has to not allocate anything and has to work in order for any solution to be viable. Saying all reference types are noscope would prevent this, no?
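A rough D sketch of the scratch-buffer pattern described above: a stack buffer handed to a virtual (interface) call, with the result copied out before the buffer dies. Formatter, DecimalFormatter and render are hypothetical names, not Tango's API; std.format.sformat writes into the supplied buffer instead of allocating a new string:

```d
import std.format : sformat;
import std.stdio : writeln;

// Hypothetical interface: the callee formats into caller-supplied
// scratch space and returns the slice of buf it actually used.
interface Formatter
{
    char[] format(int value, char[] buf);
}

class DecimalFormatter : Formatter
{
    char[] format(int value, char[] buf)
    {
        return sformat(buf, "%d", value);  // writes into buf
    }
}

string render(Formatter f, int value)
{
    char[32] scratch;                       // stack buffer, no heap scratch
    char[] used = f.format(value, scratch[]);
    return used.idup;                       // copy out before scratch goes away
}

void main()
{
    writeln(render(new DecimalFormatter, 42));
}
```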
 Another solution is that a derived function must have the same expression
 graph or a tighter one than the base class'.  But without being able to
 store the graph with the compiled code (and having the compiler import the
 metadata instead of the source file), this is a moot point.

 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info about
 which argument escapes where, to mimic the compile-time meta-info.

No, the .di file might not be auto-generated. You're also now back to a separate import and source file, like C has. I think in order for this to work, the graph and object code must be stored in the same file that is imported.

There are separate import files. Actually, the compiler can simply put scope/noscope on the arguments based upon the meta-data collected during compilation. If your .di is manually created, you either put them in manually as well, or you don't care.

I think the graph has to be complete for this to be usable. Otherwise, it becomes an unused feature. Using .di files is optional. I generally don't use them.
 The expression graph analysis should be the first step towards safe
 stack closures.

I would agree with this. But I don't think it's happening in the near future. And I hope it's not done through .di files.

You can limit analysis to a single module for now. This will cover local function calls, including some local method calls, and I hope it'll also cover template function calls which means std.algorithm will work without memory allocation again.

Yes, but not class virtual methods or interface methods. These are used quite a bit in Tango. End result, not a lot of benefit. -Steve
Oct 28 2008
Sergey Gromov <snake.scaly gmail.com> writes:
Tue, 28 Oct 2008 23:33:53 -0400, Steven Schveighoffer wrote:

 "Sergey Gromov" wrote
 Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

This is tricky, because most class member functions are virtual, so you are forced to litter all your functions with escaping/non-escaping syntax. To be accurate you need to define the escape graph in the signature, which will be a PITA. What would be worse is to not have a way to express the complete graph.

Not every call to a virtual function is itself virtual, and not every virtual function cares whether its argument escapes. I'd say more: noscope should be the default for all reference types except delegates, because you usually don't care. I agree that making scope the default for delegates is probably the right thing to do, but only if the compiler can detect violations of this contract.

A very very common technique in Tango to save using heap allocation is to declare a static array as a buffer, and then pass that buffer to be used as scratch space in a function (which is possibly virtual). This would be my golden use case that has to not allocate anything and has to work in order for any solution to be viable. Saying all reference types are noscope would prevent this, no?

Allocation only happens when a stack variable reference escapes via a delegate. A static array is not a stack variable, therefore the compiler doesn't care if it escapes.
 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info about
 which argument escapes where, to mimic the compile-time meta-info.

No, the .di file might not be auto-generated. You're also now back to a separate import and source file, like C has. I think in order for this to work, the graph and object code must be stored in the same file that is imported.

There are separate import files. Actually, the compiler can simply put scope/noscope on the arguments based upon the meta-data collected during compilation. If your .di is manually created, you either put them in manually as well, or you don't care.

I think the graph has to be complete for this to be usable. Otherwise, it becomes an unused feature. Using .di files is optional. I generally don't use them.

For the incomplete graph to be usable, the compiler must assume the worst for nodes with absent meta-info. Therefore if you don't care to provide meta-info for your modules, it'll still work, though not as efficiently. On the other hand, if you supply .di files with your library and you do care enough, or you generate your .di files automatically, the meta-info will be present there saving some allocations for the user.
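The "assume the worst when meta-info is missing" rule can be sketched as a lookup with a conservative fallback. The per-parameter summary format below is a hypothetical assumption for illustration, not a proposed .di or object-file format; Summary and mayEscape are hypothetical names:

```d
import std.stdio : writeln;

// Hypothetical per-function escape summary: entry i is true if
// parameter i may escape.
alias Summary = bool[];

bool mayEscape(const Summary[string] summaries, string callee, size_t param)
{
    if (auto s = callee in summaries)
        return param < (*s).length ? (*s)[param] : true;
    return true; // no meta-info for this function: assume the argument escapes
}

void main()
{
    Summary[string] summaries = ["bar": [true], "sum": [false, false]];
    writeln(mayEscape(summaries, "sum", 1));
    writeln(mayEscape(summaries, "lib.unknown", 0));
}
```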
 The expression graph analysis should be the first step towards safe
 stack closures.

I would agree with this. But I don't think it's happening in the near future. And I hope it's not done through .di files.

You can limit analysis to a single module for now. This will cover local function calls, including some local method calls, and I hope it'll also cover template function calls which means std.algorithm will work without memory allocation again.

Yes, but not class virtual methods or interface methods. These are used quite a bit in Tango. End result, not a lot of benefit.

If those virtual and interface methods are often used with function-local delegates as parameters then yes, the benefit wouldn't be that significant. Are you sure this is the case with Tango?
Oct 29 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Sergey Gromov" wrote
 Tue, 28 Oct 2008 23:33:53 -0400, Steven Schveighoffer wrote:

 "Sergey Gromov" wrote
 Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

This is tricky, because most class member functions are virtual, so you are forced to litter all your functions with escaping/non-escaping syntax. To be accurate you need to define the escape graph in the signature, which will be a PITA. What would be worse is to not have a way to express the complete graph.

Not every call to a virtual function is itself virtual, and not every virtual function cares whether its argument escapes. I'd say more: the noscope should be default for all reference types except delegates because you usually don't care. I agree that having scope delegates the default is probably the right thing to do, but only if a compiler can detect violations of this contract.

A very very common technique in Tango to save using heap allocation is to declare a static array as a buffer, and then pass that buffer to be used as scratch space in a function (which is possibly virtual). This would be my golden use case that has to not allocate anything and has to work in order for any solution to be viable. Saying all reference types are noscope would prevent this, no?

Allocation only happens when a stack variable reference escapes via a delegate. A static array is not a stack variable, therefore the compiler doesn't care if it escapes.

A static array declared on the stack absolutely is a stack variable. An example (from Tango's integer to text converter):

    char[] toString (long i, char[] fmt = null)
    {
        char[66] tmp = void;
        return format (tmp, i, fmt).dup;
    }

Without the dup, toString returns a pointer to its own stack. With a full graph analysis, it can be proven that tmp doesn't escape, but without either that or some crazy scope scheme, it would either allocate a closure, or fail to compile. Neither of those options is acceptable.
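For comparison, the same scratch-buffer pattern can be written against present-day Phobos. This is a sketch, not Tango's code: it assumes std.format.sformat (which formats into a caller-supplied buffer and returns the slice it wrote) in place of Tango's format, and the name toText is made up.

```d
import std.format : sformat;

/// Converts `i` to text using a fixed-size stack buffer as scratch space,
/// then copies the result to the GC heap with .idup before returning.
string toText(long i)
{
    char[66] tmp = void;                // scratch space on the stack
    char[] s = sformat(tmp[], "%d", i); // s is a slice of tmp
    return s.idup;                      // copy out; returning s would dangle
}
```

The only heap allocation is the final copy; tmp itself never needs to leave the stack.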
 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info about
 which argument escapes where, to mimic the compile-time meta-info.

No, the di file might not be auto-generated. You are also now back to separate import and source files, like C has. I think in order for this to work, the graph and object code must be stored in the same file that is imported.

There are separate import files. Actually compiler can simply put scope/noscope for the arguments based upon the meta-data collected during compilation. If your .di is manually created, you either put them manually as well, or you don't care.

I think the graph has to be complete for this to be usable. Otherwise, it becomes an unused feature. Using .di files is optional. I generally don't use them.

For the incomplete graph to be usable, the compiler must assume the worst for nodes with absent meta-info. Therefore if you don't care to provide meta-info for your modules, it'll still work, though not as efficiently. On the other hand, if you supply .di files with your library and you do care enough, or you generate your .di files automatically, the meta-info will be present there saving some allocations for the user.

This doesn't cover virtual functions or runtime-determined delegates. I'd rather just have a separate meta file or have the meta data included in the object file. What is wrong with that? Why must it be in the .di file? If the compiler always generates these meta files, then the graph is always complete.
 The expression graph analysis should be the first step towards safe
 stack closures.

I would agree with this. But I don't think it's happening in the near future. And I hope it's not done through .di files.

You can limit analysis to a single module for now. This will cover local function calls, including some local method calls, and I hope it'll also cover template function calls which means std.algorithm will work without memory allocation again.

Yes, but not class virtual methods or interface methods. These are used quite a bit in Tango. End result, not a lot of benefit.

If those virtual and interface methods are often used with function-local delegates as parameters then yes, the benefit wouldn't be that significant. Are you sure this is the case with Tango?

Any time you use opApply (and opApply is virtual), you are doing this. I suppose opApply is a special case, and the compiler could reject an opApply that saves the delegate somewhere. But what about being able to pass the delegate to another virtual function while inside your opApply? Here is another example from Tango that isn't used via foreach:

    final bool putCache (char[] key, IMessage message)
    {
        void send (IConduit conduit)
        {
            buffer.setConduit (conduit);
            writer.put (ProtocolWriter.Command.Add, name_, key, message).flush;
        }

        // return false if the cache server said there's
        // already something newer
        if (cluster_.cache.request (&send, reader, key))
            return false;
        return true;
    }

cluster_.cache is a class.

-Steve
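In present-day D2 (which postdates this thread), marking a delegate parameter scope is how a callee promises not to keep the delegate, which as we understand it lets the compiler leave the caller's frame on the stack even when the call is virtual. A minimal sketch; Cache, Client, request and putCache below are hypothetical stand-ins, not Tango's actual classes or signatures.

```d
/// Hypothetical stand-in for the cluster cache (not Tango's real API).
class Cache
{
    // `scope` promises that `send` is only called during this call and is
    // never stored, so the caller's frame does not need to be heap-allocated.
    bool request(scope void delegate(int) send, int key)
    {
        send(key);
        return key % 2 == 0; // pretend even keys mean "something newer exists"
    }
}

class Client
{
    Cache cache;
    int sent;

    this(Cache c) { cache = c; }

    final bool putCache(int key)
    {
        int local = 10;                        // lives in putCache's frame
        void send(int k) { sent = k + local; } // nested function uses the frame
        if (cache.request(&send, key))         // virtual call, scope parameter
            return false;
        return true;
    }
}
```

The annotation sits on the base method, so any override of request would have to honor the same contract.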
Oct 29 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Wed, 29 Oct 2008 11:52:14 -0400, Steven Schveighoffer wrote:

 "Sergey Gromov" wrote
 Tue, 28 Oct 2008 23:33:53 -0400, Steven Schveighoffer wrote:

 "Sergey Gromov" wrote
 Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

This is tricky, because most class member functions are virtual, so you are forced to litter all your functions with escaping/non-escaping syntax. To be accurate you need to define the escape graph in the signature, which will be a PITA. What would be worse is to not have a way to express the complete graph.

Not every call to a virtual function is itself virtual, and not every virtual function cares whether its argument escapes. I'd say more: the noscope should be default for all reference types except delegates because you usually don't care. I agree that having scope delegates the default is probably the right thing to do, but only if a compiler can detect violations of this contract.

A very very common technique in Tango to save using heap allocation is to declare a static array as a buffer, and then pass that buffer to be used as scratch space in a function (which is possibly virtual). This would be my golden use case that has to not allocate anything and has to work in order for any solution to be viable. Saying all reference types are noscope would prevent this, no?

Allocation only happens when a stack variable reference escapes via a delegate. A static array is not a stack variable, therefore the compiler doesn't care if it escapes.

A static array declared on the stack absolutely is a stack variable. An example (from Tango's integer to text converter):

    char[] toString (long i, char[] fmt = null)
    {
        char[66] tmp = void;
        return format (tmp, i, fmt).dup;
    }

Without the dup, toString returns a pointer to its own stack. With a full graph analysis, it can be proven that tmp doesn't escape, but without either that or some crazy scope scheme, it would either allocate a closure, or fail to compile. Neither of those options is acceptable.

There is no delegate, therefore nothing to allocate a closure for. If tmp escapes, it is a compile-time error. If format() were pure it would be trivial to prove that tmp didn't escape. If format() is not pure, and escape graph for it is not known, then issuing an error here would be too much of a breaking change, I agree.
 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info about
 which argument escapes where, to mimic the compile-time meta-info.

No, the di file might not be auto-generated. You are also now back to separate import and source files, like C has. I think in order for this to work, the graph and object code must be stored in the same file that is imported.

There are separate import files. Actually compiler can simply put scope/noscope for the arguments based upon the meta-data collected during compilation. If your .di is manually created, you either put them manually as well, or you don't care.

I think the graph has to be complete for this to be usable. Otherwise, it becomes an unused feature. Using .di files is optional. I generally don't use them.

For the incomplete graph to be usable, the compiler must assume the worst for nodes with absent meta-info. Therefore if you don't care to provide meta-info for your modules, it'll still work, though not as efficiently. On the other hand, if you supply .di files with your library and you do care enough, or you generate your .di files automatically, the meta-info will be present there saving some allocations for the user.

This doesn't cover virtual functions or runtime-determined delegates. I'd rather just have a separate meta file or have the meta data included in the object file. What is wrong with that? Why must it be in the .di file? If the compiler always generates these meta files, then the graph is always complete.

If you compile two files for the first time, and the first file imports the second one, where do you get that meta-data for the second file? What if you compile only one file, and that file imports another which wasn't compiled yet? Either you construct meta-data on the fly, or require it included in the source, or assume it's not present (worst case).
 The expression graph analysis should be the first step towards safe
 stack closures.

I would agree with this. But I don't think it's happening in the near future. And I hope it's not done through .di files.

You can limit analysis to a single module for now. This will cover local function calls, including some local method calls, and I hope it'll also cover template function calls which means std.algorithm will work without memory allocation again.

Yes, but not class virtual methods or interface methods. These are used quite a bit in Tango. End result, not a lot of benefit.

If those virtual and interface methods are often used with function-local delegates as parameters then yes, the benefit wouldn't be that significant. Are you sure this is the case with Tango?

Any time you use opApply (and opApply is virtual), you are doing this.

Fair enough. opApply() is an important technique.
Oct 29 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Sergey Gromov" wrote
 Wed, 29 Oct 2008 11:52:14 -0400, Steven Schveighoffer wrote:

 "Sergey Gromov" wrote
 Tue, 28 Oct 2008 23:33:53 -0400, Steven Schveighoffer wrote:
 A very very common technique in Tango to save using heap allocation is to
 declare a static array as a buffer, and then pass that buffer to be used as
 scratch space in a function (which is possibly virtual).

 This would be my golden use case that has to not allocate anything and has
 to work in order for any solution to be viable.

 Saying all reference types are noscope would prevent this, no?

Allocation only happens when a stack variable reference escapes via a delegate. A static array is not a stack variable, therefore the compiler doesn't care if it escapes.

A static array declared on the stack absolutely is a stack variable. An example (from Tango's integer to text converter):

    char[] toString (long i, char[] fmt = null)
    {
        char[66] tmp = void;
        return format (tmp, i, fmt).dup;
    }

Without the dup, toString returns a pointer to its own stack. With a full graph analysis, it can be proven that tmp doesn't escape, but without either that or some crazy scope scheme, it would either allocate a closure, or fail to compile. Neither of those options is acceptable.

There is no delegate, therefore nothing to allocate a closure for. If tmp escapes, it is a compile-time error.

I was under the impression that closures are currently allocated if you return a reference to a stack variable, not just for delegates. Maybe I'm wrong...
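As far as D2's documented behavior goes, the heap closure discussed here is created for delegates: when a delegate that uses a function's locals outlives that function, those locals are moved to a GC-allocated frame. A minimal sketch (makeCounter is a hypothetical name):

```d
/// Returns a delegate that captures the local `count`.
/// Because the delegate outlives makeCounter's frame, `count` is placed
/// in a heap-allocated closure instead of staying on the stack.
int delegate() makeCounter()
{
    int count = 0;
    return () => ++count;
}
```

Returning a plain pointer or slice to a local gets no such treatment; it either dangles or, for a direct return of &i, is rejected by current compilers.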
 If format() were pure it would be trivial to prove that tmp didn't
 escape.  If format() is not pure, and escape graph for it is not known,
 then issuing an error here would be too much of a breaking change, I
 agree.

format cannot be pure because it accepts mutable reference data. It happens to be in the same file, so it probably would not be an issue because a graph is generated for the current file, but these are not the only cases that Tango has.
 I think the graph has to be complete for this to be usable.  Otherwise, it
 becomes an unused feature.  Using .di files is optional.  I generally don't
 use them.

For the incomplete graph to be usable, the compiler must assume the worst for nodes with absent meta-info. Therefore if you don't care to provide meta-info for your modules, it'll still work, though not as efficiently. On the other hand, if you supply .di files with your library and you do care enough, or you generate your .di files automatically, the meta-info will be present there saving some allocations for the user.

This doesn't cover virtual functions or runtime-determined delegates. I'd rather just have a separate meta file or have the meta data included in the object file. What is wrong with that? Why must it be in the .di file? If the compiler always generates these meta files, then the graph is always complete.

If you compile two files for the first time, and the first file imports the second one, where do you get that meta-data for the second file? What if you compile only one file, and that file imports another which wasn't compiled yet? Either you construct meta-data on the fly, or require it included in the source, or assume it's not present (worst case).

My vote would be for compiling it on the fly. The compiler already does parsing of the source file, so it can also generate this graph data. It shouldn't be too hard a task. Look, I agree that a graph analysis is the best possible solution. It requires no work from the user, no extra specification, and it will solve the problem accurately. But the current mode of compilation doesn't allow for that easily. That's all I was saying.

-Steve
Oct 29 2008
parent Sergey Gromov <snake.scaly gmail.com> writes:
Wed, 29 Oct 2008 15:23:12 -0400, Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 If you compile two files for the first time, and the first file imports
 the second one, where do you get that meta-data for the second file?
 What if you compile only one file, and that file imports another which
 wasn't compiled yet?  Either you construct meta-data on the fly, or
 require it included in the source, or assume it's not present (worst
 case).

My vote would be for compiling it on the fly. The compiler already does parsing of the source file, so it can also generate this graph data. It shouldn't be too hard a task. Look, I agree that a graph analysis is the best possible solution. It requires no work from the user, no extra specification, and it will solve the problem accurately. But the current mode of compilation doesn't allow for that easily. That's all I was saying.

I do understand that. I just wanted to discuss whether it is possible to approach this problem incrementally, so that relatively simple changes significantly improve the situation without breaking safety. And I thought that a dispute was a nice way for probing an idea for hidden flaws.
Oct 29 2008
prev sibling parent Chad J <gamerchad __spam.is.bad__gmail.com> writes:
Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 Walter Bright wrote:
 The first step is, are function parameters considered to be escaping by
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably
 choose scope as the default, and the OOP programmer noscope.


If safe defaults means 75% performance decrease, I'm for using unsafe defaults that are safe 99% of the time, with the ability to make them 100% safe if needed.

If safe defaults means 2% performance decrease, I'm for using unsafe defaults that are safe 10% of the time, with the inability to make them 100% safe if needed. I might also be insane.

...

I'm initially biased towards the safe default. I remember reading that part of D's design philosophy is to be safe by default, and I like that A LOT because it saves me from wasting many, many hours of my life on stupid bugs. I'm also not convinced that full closures really run that much slower.

That said, I'd be happy to ignore escape analysis for a while longer and just have D1 closures with the option to manually heap allocate them. I say that under the assumption that it's really easy to implement, mostly sort of solves the problem, and allows better (more general, safer) solutions to be put in place later.
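The D1-era workaround mentioned here, putting a delegate's state on the heap by hand, can be sketched by binding the delegate to a heap-allocated object instead of the stack frame. Counter and makeCounterManually are hypothetical names, and the code is written in D2 syntax.

```d
/// State that the delegate needs, placed on the heap explicitly.
class Counter
{
    int count;
    int next() { return ++count; }
}

/// Returns a delegate bound to a heap object rather than to this function's
/// stack frame, so it remains valid after this function returns.
int delegate() makeCounterManually()
{
    auto c = new Counter;
    return &c.next;
}
```

The allocation is visible in the source, which is the trade-off being weighed: the programmer pays for the heap only where they choose to.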
Oct 28 2008
prev sibling next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
I've run across some academic work on ownership types which seems relevant  
to this discussion on share/local/scope/noscope.

Paper: http://www.cs.jhu.edu/~scott/pll/papers/pedigree-types.pdf
Slides: http://www.cs.jhu.edu/~scott/pll/papers/iwaco.ppt
Site: http://www.cs.jhu.edu/~scott/pll/abinitio.html
Overview:
Pedigree Types are an intuitive ownership type system requiring minimal programmer annotations. Reusing the vocabulary of human genealogy, Pedigree Types programmers can qualify any object reference with a pedigree -- a child, sibling, parent, grandparent, etc -- to indicate what relationship the object being referred to has with the referent on the standard ownership tree, following the owners-as-dominators convention. Such a qualifier serves as a heap shape constraint that must hold at run time and is enforced statically. Pedigree child captures the intention of encapsulation, i.e. ownership: the modified object reference is ensured not to escape the boundary of its parent. Among existing ownership type systems, Pedigree Types are closest to Universe Types. The former can be viewed as extending the latter with a more general form of pedigree modifiers, so that the relationship between any pair of objects on the aforementioned ownership tree can be named and -- more importantly -- inferred. We use a constraint-based type system which is proved sound via subject reduction. Other technical originalities include a polymorphic treatment of pedigrees not explicitly specified by programmers, and use of linear diophantine equations in type constraints to enforce the hierarchy.
Oct 28 2008
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-10-28 23:52:04 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I've run across some academic work on ownership types which seems 
 relevant  to this discussion on share/local/scope/noscope.

I haven't read the paper yet, but the overview seems to go in the same direction as I was thinking. Basically, all the scope variables you can get are guaranteed to be in the current scope or in some ancestor scope. To allow a reference to a scope variable, or a scope function, to be put inside a member of a struct or class, you only need to prove that the struct or class lifetime is smaller than or equal to that of the scope variable being referenced. If you could tell the compiler the scope relationship of the various arguments, then you'd have pretty good scope analysis. For instance, with this syntax, we could define i to be available during the whole lifetime of o:

    void foo(scope MyObject o, scope(o) int* i)
    {
        o.i = i;
    }

So you could do:

    void bar()
    {
        scope int i;
        scope MyObject o = new MyObject;
        foo(o, &i);
    }

And the compiler would let it pass because foo guarantees not to keep references to i outside of o's scope, and o's scope is the same as i's. Or you could do:

    void test1()
    {
        int i;
        test2(&i);
    }

    void test2(scope int* i)
    {
        scope o = new MyObject;
        foo(o, i);
    }

Again, the compiler can statically check that test2 won't keep a reference to i outside of the caller's scope (test1) because o's scope is limited to test2. And if you try the reverse:

    void test1()
    {
        scope o = new MyObject;
        test2(o);
    }

    void test2(scope MyObject o)
    {
        int i;
        foo(o, &i);
    }

Then the compiler could determine automatically that i needs to escape test2's scope and allocate the variable on the heap to make its lifetime as long as the object's scope (as it currently does with nested functions) [see my reservations about this in the P.S.]. This could be avoided by explicitly binding i to the current scope, in which case the compiler could issue a scope error:

    void test2(scope MyObject o)
    {
        scope int i;
        foo(o, &i); // error, i's scope needs to match o's, but i is bound to the current scope.
    }

Interestingly, with this scheme, assuming your function arguments are properly scope-labeled, you never need to allocate variables on the heap explicitly anymore; the compiler can take care of it for you when the use of the variable inside the function body requires it.

    void test3(int* i); // unscoped parameter

    void test4()
    {
        int i; // allocated on heap because calling test3 requires an unscoped variable.
        test3(&i);
    }

The reverse is also true: objects declared as allocated on the heap could be automatically rescoped as local stack variables if their use inside the function is limited in scope:

    void test5()
    {
        auto o = new MyObject;
        test2(o);
    }

For instance, in test5 above where o isn't declared as scope, the compiler could still allocate o on the stack (as long as it knows the constructor doesn't leave unwanted references to the object in the global state), because it knows from the argument declaration of test2 that no references to o will leave the current scope. So basically, what to heap-allocate and what to stack-allocate could be left entirely to the compiler's discretion.

Note that for all this to work, the pointer "i" in MyObject must be defined as not escaping the scope of the class:

    class MyObject
    {
        scope int* i;
    }

or else someone could take the reference and put it into a global variable, or a variable of a greater scope than the object.

P.S.: I'm still somewhat skeptical about this automatic allocation thing because it would mean a lot of extra heap allocation (and thus loss of performance) for any function where the parameters are not properly scoped. Perhaps the default should be local scope and you explicitly make it greater by declaring variables as noscope, which would allow the compiler to allocate if needed, but it doesn't solve the issue of needing to allocate on the heap to safely call functions that don't use scope-labeled arguments.

P.P.S.: This syntax doesn't fit very well with the current scope(success/failure/exit) feature.
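The rule proposed here reduces to comparing lifetimes. The toy model below is hypothetical and not a D feature: it gives each scope a nesting depth (deeper means shorter-lived) and allows a reference to be stored in an object only if the object's scope is at least as deep as the referenced variable's. Ref and canStore are made-up names.

```d
/// A variable or object tagged with the nesting depth of its owning scope.
/// Hypothetical model only; larger depth means a shorter lifetime.
struct Ref
{
    string name;
    int depth;
}

/// A reference to `target` may be stored inside `container` only if the
/// container lives no longer than the target, i.e. its depth is the same
/// or greater.
bool canStore(Ref target, Ref container)
{
    return container.depth >= target.depth;
}
```

Applied to the examples above: bar passes (i and o at the same depth), the first test1/test2 pair passes (o is deeper than i), and the reversed pair fails, which is the case where the compiler would have to move i to the heap.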
-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Oct 29 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Michel Fortin" wrote
 On 2008-10-28 23:52:04 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I've run across some academic work on ownership types which seems 
 relevant  to this discussion on share/local/scope/noscope.

I haven't read the paper yet, but the overview seems to go in the same direction as I was thinking.

This is exactly the kind of thing I DON'T want to have. Here, you have to specify everything, even though the compiler is also doing the work, and making sure it matches. Tack on const modifiers, shared modifiers, and pure functions and there's going to be more decorations on function signatures than there are parameters. Note that especially this scope stuff will be required more often than the others. I'd much rather have either no checks, or have the compiler (or a lint tool) do all the work to tell me if anything escapes. -Steve
Oct 29 2008
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-10-29 11:01:35 -0400, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 This is exactly the kind of thing I DON'T want to have.  Here, you have to
 specify everything, even though the compiler is also doing the work, and
 making sure it matches.  Tack on const modifiers, shared modifiers, and pure
 functions and there's going to be more decorations on function signatures
 than there are parameters.

I agree that this is becoming a problem, even without scope. What we need is good defaults so that you don't have to decorate most of the time, and especially when you want to bypass it. I'd also like to point out that besides the possibility of better optimization and error catching by the compiler, specifying more properties in function interfaces can free us from handling other related things. With "immutable" values you don't need to worry about duplicating them everywhere to keep other references from changing them; with "shared", you'll have less to worry about thread synchronization; and with "scope" as I proposed, you no longer have to worry about providing variables with the correct scope as the compiler can dynamically allocate when it sees the variable is needed outside of the current scope. Basically, by documenting the interfaces better in a machine-readable way, we are freed of other burdens the compiler can now take care of. In addition, we have better defined interfaces and the compiler has a lot more room to optimize things.
 Note that especially this scope stuff will be required more often than the
 others.

Indeed.
 I'd much rather have either no checks, or have the compiler (or a lint tool)
 do all the work to tell me if anything escapes.

The problem is that as soon as you have a function declaration without the body, the lint tool won't be able to tell you if it escapes or not. So, without a way to specify the requested scope of the parameters, you'll very often have holes in your escape analysis that will propagate down the caller chain, preventing any useful conclusion. For instance:

    void foo()
    {
        char[5] x = ['1', '2', '3', '4', '\0'];
        bar(x.ptr);
    }

    void bar(char* x)
    {
        printf(x);
    }

    void printf(char* x);

Here you have no specification telling you that printf won't keep a reference to x beyond its scope, so we have to expect that it may do so. It turns out that because of that, a compiler or lint tool can't deduce whether bar may leak the reference beyond its scope, which basically means that calling bar(x.ptr) in foo may or may not be safe. With my proposal, it'd become this:

    void foo()
    {
        char[5] x = ['1', '2', '3', '4', '\0'];
        bar(x.ptr);
    }

    void bar(scope char* x)
    {
        printf(x);
    }

    void printf(scope char* x);

And here the compiler, or the lint tool, can see that x doesn't need to live outside of foo's scope and that all is fine. If bar decided to keep the pointer in a global variable for further use, then the function signature would have a noscope x or the assignment to a global wouldn't work, and once bar has a noscope argument then foo won't compile unless x is allocated on the heap. I don't think it's bad to force interfaces to be well documented, and documented in a format that the compiler can understand to find errors like this.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
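Present-day D did adopt a scope storage class for parameters, broadly along these lines; with escape checking enabled (DMD's -preview=dip1000 switch, as we understand it) the compiler verifies that a scope parameter is not stored anywhere. A minimal sketch with hypothetical names, which relies only on the annotation being accepted:

```d
/// `scope` documents (and, with escape checking enabled, lets the compiler
/// verify) that `s` is not stored anywhere beyond this call.
size_t countNonZero(scope const(char)[] s)
{
    size_t n = 0;
    foreach (c; s)
        if (c != '\0')
            ++n;
    return n;
}

size_t countInStackBuffer()
{
    char[5] x = ['1', '2', '3', '4', '\0']; // stack buffer, as in foo above
    return countNonZero(x[]);               // slice of a local, passed to scope
}
```

Because the callee's signature carries the promise, the caller can hand over a slice of its stack buffer without any heap allocation.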
Oct 30 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Michel Fortin" wrote
 Basically, by documenting better the interfaces in a machine-readable way, 
 we are freed of other burdens the compiler can now take care of. In 
 addition, we have better defined interfaces and the compiler has a lot 
 more room to optimize things.

But the burden you have left for the developer is a tough one. You have to analyze the inputs and function calls from a function and determine which variable depends on what. This is a perfect problem for a tool to solve.
 The problem is that as soon as you have a function declaration without the 
 body, the lint tool won't be able to tell you if it escapes or not.

This I agree is a problem. In fact, without specifications in the function things like interfaces would be very difficult to determine scope-ness at compile time. The only way I can see to solve this is to do it at link time. When you link, piece together the parts of the graph that were incomplete, and see if they all work. It would be a very radical change, and might not even work with the current linkers. Especially if you want to do shared libraries, where the linker is builtin to the OS. A related question: how do you handle C functions?
 So, without a way to specify the requested scope of the parameters, you'll 
 very often have holes in your escape analysis that will propagate down the 
 caller chain, preventing any useful conclusion.

Yes, and if a function has mis-specified some of its parameters, then you have code that doesn't compile. Or the function itself won't compile, and you need to do some more manual analysis. Imagine a function that calls 5 or 6 other functions with its parameters. And there are multiple different dependencies you have to resolve. That's a lot of analysis you have to do manually.
 I don't think it's bad to force interfaces to be well documented, and 
 documented in a format that the compiler can understand to find errors 
 like this.

I think this concept is going to be really hard for a person to decipher, and really hard to get right. We are talking about a graph dependency analysis, in which many edges can exist, and the vertices do not necessarily have to be parameters. This is not stuff for the meager developer looking to get work done to have to think about. I'd much rather have a tool that does it, if not the compiler, then something else. Or partial analysis. Or no analysis. I agree it's good to have bugs caught by the compiler, but this solution requires too much work from the developer to be used. Some fun puzzles for you to come up with a proper scope syntax to use:

    void f(ref int *a, int *b, int *c)
    {
        if(*b < *c)
            a = b;
        else
            a = c;
    }

    struct S { int *v; }

    int *f2(S* s) { return s.v; }

    void f3(ref int *a, ref int *b, ref int *c)
    {
        int *tmp = a;
        a = b;
        b = c;
        c = tmp;
    }

-Steve
Oct 31 2008
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-10-31 11:11:26 -0400, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 "Michel Fortin" wrote
 Basically, by documenting better the interfaces in a machine-readable way,
 we are freed of other burdens the compiler can now take care of. In
 addition, we have better defined interfaces and the compiler has a lot
 more room to optimize things.

But the burden you have left for the developer is a tough one. You have to analyze the inputs and function calls from a function and determine which variable depends on what. This is a perfect problem for a tool to solve.
 The problem is that as soon as you have a function declaration without the
 body, the lint tool won't be able to tell you if it escapes or not.

This I agree is a problem. In fact, without specifications in the function things like interfaces would be very difficult to determine scope-ness at compile time.

If you can't determine yourself that a function can work with scoped parameters, you'd better never call that function with reference to local variables and leave its prototype with noscope parameters, making the compiler aware of the situation. In any case, the one who design the function is the one who is most likely able to tell you whether or not it accepts scoped arguments. The current situation makes the caller of that function responsible of calling it correctly. I think that's backward.
 The only way I can see to solve this is to do it at link time.  When you
 link, piece together the parts of the graph that were incomplete, and see if
 they all work.  It would be a very radical change, and might not even work
 with the current linkers.  Especially if you want to do shared libraries,
 where the linker is builtin to the OS.

I think you're dreaming... not that it's a bad thing to have ambition, but that's probably not even possible.
 A related question: how do you handle C functions?

You read the documentation of the function to determine if the function will let the pointer escape somewhere, and if not declare the parameter scope. For instance: extern (C) void printf(scope char* format, scope...); By the way, extern (C) functions with noscope parameters need careful consideration since their pointers aren't tracked by the garbage collector.
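As a small sketch of this suggestion in present-day D: a C function is declared by hand with `scope` on its pointer parameter, based on what its documentation promises, and is then called with a pointer into a stack buffer. Assumptions: the declaration is written out only for illustration (druntime ships its own bindings in `core.stdc.string`), `stackStringLength` is a hypothetical helper, and `strlen` stands in for `printf` only because its result is easy to check.

```d
// Hand-written binding for illustration: `scope` records, from the C
// documentation, that strlen does not keep the pointer after it returns.
extern (C) size_t strlen(scope const char* s);

// Hypothetical helper: passes the address of stack memory to a scope parameter.
size_t stackStringLength()
{
    char[6] buf = "hello\0";   // lives in this function's stack frame
    return strlen(buf.ptr);    // fine: the pointer does not outlive the call
}

unittest
{
    assert(stackStringLength() == 5);
}
```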
 So, without a way to specify the requested scope of the parameters, you'll
 very often have holes in your escape analysis that will propagate down the
 caller chain, preventing any useful conclusion.

Yes, and if a function has mis-specified some of its parameters, then you have code that doesn't compile. Or the function itself won't compile, and you need to do some more manual analysis. Imagine a function that calls 5 or 6 other functions with its parameters. And there are multiple different dependencies you have to resolve. That's a lot of analysis you have to do manually.

You'll get an error at some call site, which can mean only two things: either your local variable shouldn't be bound to the local scope (because the function expects a reference it can keep beyond its scope) so you should allocate it on the heap, or the function you're calling has its prototype wrong. There's a chance that fixing the function prototype will create problems upward if it tries to put a reference to a scope variable in a global, or pass it to a function as a noscope argument.
 I don't think it's bad to force interfaces to be well documented, and
 documented in a format that the compiler can understand to find errors
 like this.

I think this concept is going to be really hard for a person to decipher, and really hard to get right.

It takes some thinking to get the prototype right at first. But it takes less caution calling the function later with local variables since the compiler will either issue an error or automatically fix the issue by allocating on the heap when an argument requires a greater scope.
 We are talking about a graph dependency
 analysis, in which many edges can exist, and the vertices do not necessarily
 have to be parameters.  This is not stuff for the meager developer looking
 to get work done to have to think about.  I'd much rather have a tool that
 does it, if not the compiler, then something else.  Or partial analysis.  Or
 no analysis.  I agree it's good to have bugs caught by the compiler, but
 this solution requires too much work from the developer to be used.
 
 Some fun puzzles for you to come up with a proper scope syntax to use:
 
 void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

void f(scope ref int *a, scopeof(a) int *b, scopeof(o) int *c) { if (*b < *c) a = b; else a = c; }
 struct S
 {
    int *v;
 }
 
 int *f2(S* s) { return s.v;}

Here you have two options depending on what you mean. Your example above is valid, but would allow v to point only to heap variables. If your intention is that S.v should be able to refer to scope variables too, then you'd need to write S as:

struct S
{
    scope int *v;
}

Then, no function can copy this pointer and keep it beyond the scope of S. Therefore, the function needs to be updated to propagate this property:

scopeof(s) int *f2(scope S* s) { return s.v; }
 void f3(ref int *a, ref int *b, ref int *c)
 {
    int *tmp = a;
    a = b; b = c; c = tmp;
 }

This one is special, because you have a circular reference between the parameters. Note that a simpler example of this would be swapping two values. I had to invent something here saying that all these variables share the same scope... but I'd agree the syntax isn't so good.

void f3(ref scope(1) int *a, ref scope(1) int *b, ref scope(1) int *c)
{
    scope int *tmp = a;
    a = b; b = c; c = tmp;
}

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Oct 31 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Michel Fortin" wrote
 If you can't determine yourself that a function can work with scoped 
 parameters, you'd better never call that function with reference to local 
 variables and leave its prototype with noscope parameters, making the 
 compiler aware of the situation.

 In any case, the one who designs the function is the one who is most likely 
 able to tell you whether or not it accepts scoped arguments. The current 
 situation makes the caller of that function responsible for calling it 
 correctly. I think that's backward.

But often times, the safety of the call depends on how it is being called, unless the function has fully documented the scope escapes of its parameters, which, as I have been saying, is going to be difficult, or impossible, for a person to figure out.
 The only way I can see to solve this is to do it at link time.  When you
 link, piece together the parts of the graph that were incomplete, and see 
 if
 they all work.  It would be a very radical change, and might not even 
 work
 with the current linkers.  Especially if you want to do shared libraries,
 where the linker is builtin to the OS.

I think you're dreaming... not that it's a bad thing to have ambition, but that's probably not even possible.

Sure it is ;) You have to write a special linker. I think everyone who thinks a scope decoration proposal is going to 1) solve all scope escape issues and 2) be easy to use is dreaming :P
 So, without a way to specify the requested scope of the parameters, 
 you'll
 very often have holes in your escape analysis that will propagate down 
 the
 caller chain, preventing any useful conclusion.

Yes, and if a function has mis-specified some of its parameters, then you have code that doesn't compile. Or the function itself won't compile, and you need to do some more manual analysis. Imagine a function that calls 5 or 6 other functions with its parameters. And there are multiple different dependencies you have to resolve. That's a lot of analysis you have to do manually.

You'll get an error at some call site, which can mean only two things: either your local variable shouldn't be bound to the local scope (because the function expects a reference it can keep beyond its scope) so you should allocate it on the heap, or the function you're calling has its prototype wrong.

Or, the prototype can't be written correctly, even though it is provable that no escapes occur.
 I don't think it's bad to force interfaces to be well documented, and
 documented in a format that the compiler can understand to find errors
 like this.

I think this concept is going to be really hard for a person to decipher, and really hard to get right.

It takes some thinking to get the prototype right at first. But it takes less caution calling the function later with local variables since the compiler will either issue an error or automatically fix the issue by allocating on the heap when an argument requires a greater scope.

I hope to avoid this last situation. Having the compiler make decisions for me, especially when heap allocation occurs, is bad.
 void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

void f(scope ref int *a, scopeof(a) int *b, scopeof(o) int *c) { if (*b < *c) a = b; else a = c; }

I assume you meant scopeof(a) instead of scopeof(o), but in any case, your design is incorrect. a depends on b and c's scope, not the other way around. Consider this valid usage:

void foo()
{
    int b = 1, c = 2;
    bar(&b, &c);
}

void bar(scope int *b, scope int *c)
{
    int *a;
    f(a, b, c); // should not fail, but would with your decorations
}

-Steve
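The direction of the dependency can be checked in present-day D, where none of the proposed annotations exist. In this sketch (`pickedFirst` is a hypothetical stand-in for `bar`), data flows from `b` and `c` into `a`, so a correct constraint has to bound `a` by the shorter-lived of `b` and `c`, rather than bind `b` and `c` to `a`'s scope.

```d
// f stores one of its pointer arguments into a: data flows from b and c into a.
void f(ref int* a, int* b, int* c)
{
    if (*b < *c)
        a = b;
    else
        a = c;
}

// Mirrors bar(): a is the most local of the three pointers, so it may
// safely end up pointing at either b or c.
bool pickedFirst(int* b, int* c)
{
    int* a;          // starts out null, declared in this frame
    f(a, b, c);
    return a is b;   // report which argument a now aliases
}

unittest
{
    int b = 1, c = 2;   // both live in the caller's frame, like foo() above
    assert(pickedFirst(&b, &c));
    b = 3;
    assert(!pickedFirst(&b, &c));
}
```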
Nov 01 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 I think everyone who thinks a scope decoration proposal is going to 1) solve 
 all scope escape issues and 2) be easy to use is dreaming :P

I think that's a fair assessment. One suggestion I made to Walter is to only allow and implement the scope storage class for delegates, which simply means the callee will not squirrel away a pointer to the delegate. That would allow us to solve the closure issue and for now sleep some more on the other issues. Andrei
Nov 01 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 I think everyone who thinks a scope decoration proposal is going to 1) 
 solve all scope escape issues and 2) be easy to use is dreaming :P

I think that's a fair assessment. One suggestion I made to Walter is to only allow and implement the scope storage class for delegates, which simply means the callee will not squirrel away a pointer to the delegate. That would allow us to solve the closure issue and for now sleep some more on the other issues.

If scope delegates means trust the coder knows what he is doing (in the beginning), I agree with that plan of attack. -Steve
Nov 02 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 I think everyone who thinks a scope decoration proposal is going to 1) 
 solve all scope escape issues and 2) be easy to use is dreaming :P

allow and implement the scope storage class for delegates, which simply means the callee will not squirrel away a pointer to delegate. That would allow us to solve the closure issue and for now sleep some more on the other issues.

If scope delegates means trust the coder knows what he is doing (in the beginning), I agree with that plan of attack.

It looks like things will move that way. Bartosz, Walter and I talked a lot yesterday about it - a lot of crazy things were on the table! The next step is to make this a reference, which is highly related to escape analysis. At the risk of anticipating a bit an unfinalized design, here's what's on the table:

* Continue an "anything goes" policy for *explicit* pointers, i.e. those written explicitly by user code with stars and stuff.

* Disallow pointers in SafeD.

* Make all ref parameters scoped by default. It will be impossible for a function to escape the address of a ref parameter without a cast. I haven't proved it to myself yet, but I believe that if pointers are not used and with the amendments below regarding arrays and delegates, this makes things entirely safe. In Walter's words, "it buttons things pretty tight".

* Make this a reference so that it obeys what references obey.

* If people want to implement e.g. linked lists, they should do it with classes. Implementing them with structs will require casts to obtain and escape &this. That also means they'd be using pointers, so anything goes - pointers are not restricted from escaping.

* There are two cases in which things escape without the user explicitly using pointers: delegates and dynamic arrays initialized from stack-allocated arrays.

* For delegates, require the scope keyword in the signature of the callee. A scoped delegate cannot be stored, only called or passed down to another function that in turn takes a scoped delegate. This makes scope delegates entirely safe. Non-scoped delegates use dynamic allocation.

* We don't have an idea for dynamic arrays initialized from stack-allocated arrays.

Thoughts? Ideas?

Andrei
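The scoped-delegate rule is the easiest item on that list to picture in code. A sketch in D, with illustrative names `keep`, `callTwice` and `countCalls`: a `scope` delegate parameter lets the caller's stack frame be used directly, while a non-scope parameter that may store the delegate forces the caller's frame onto the GC heap. Current D compilers do skip the closure allocation for `scope` delegate parameters; whether storing through a `scope` parameter is rejected depends on the compiler version and on `@safe` checking, so treat that part as an assumption.

```d
void delegate() saved;   // module-level slot: whatever lands here outlives its creator

// Non-scope parameter: dg may be stored, so a caller passing a nested
// function or lambda gets its frame allocated on the GC heap.
void keep(void delegate() dg) { saved = dg; }

// scope parameter: the callee only calls dg (it could also pass it on to
// another scope parameter), so no closure allocation is needed.
void callTwice(scope void delegate() dg)
{
    dg();
    dg();
}

int countCalls()
{
    int n = 0;                // stays in countCalls' stack frame
    callTwice(() { ++n; });   // lambda referencing n, passed as a scope delegate
    return n;
}

unittest
{
    assert(countCalls() == 2);
}
```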
Nov 02 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu Wrote:
 * If people want to implement e.g. linked lists, they should do it with 
 classes.

UHm... I see. But I am not sure I like that. Isn't that a waste of memory? All objects have a vtable. Bye, bearophile
Nov 02 2008
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu Wrote:
 * If people want to implement e.g. linked lists, they should do it
 with classes.

UHm... I see. But I am not sure I like that. Isn't that a waste of memory? All objects have a vtable.

Yah, we can't get rid of that. Possibilities discussed were (a) make final classes not have a vtable, and (b) define a new kind of struct that's only heap allocated. Walter thinks both add quite some complication for little benefit. Let's not forget that a cast will allow the trick for those interested in saving the extra word. Andrei
Nov 02 2008
prev sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Andrei Alexandrescu Wrote:
 * If people want to implement e.g. linked lists, they should do it with
 classes.


 Bye,
 bearophile

And a monitor. And RTTI. Then again, for code that absolutely must be as efficient as possible, doing some fairly hackish/unsafe things is generally considered more acceptable than in run-of-the-mill programming. In these cases, you could always do it with structs and just use the casts. In the other 97% of cases, when we should forget about small efficiencies, a class works fine.
Nov 02 2008
prev sibling next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-02 10:12:46 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 It looks like things will move that way. Bartosz, Walter and I talked a 
 lot yesterday about it - a lot of crazy things were on the table! The 
 next step is to make this a reference, which is highly related to 
 escape analysis. At the risk of anticipating a bit an unfinalized 
 design, here's what's on the table:
 
 * Continue an "anything goes" policy for *explicit* pointers, i.e. 
 those written explicitly by user code with stars and stuff.

That's a little disappointing. I was hoping for something to fix all holes. I know it isn't easy to design and implement, but once done I firmly believe it would have the potential to completely eliminate the need for explicit memory allocation. For the programmer, it's a good trade: less worrying about what needs to be dynamically allocated and better documented function signatures. Perhaps that would be too much of a departure from C and C++ though.
 * Disallow pointers in SafeD.

Again a consequence of not having a full scoping solution. Couldn't you allow pointers in SafeD, while disallowing taking the address of local variables? This would limit pointers to heap-allocated variables. And disallow pointer arithmetic too.
 * Make all ref parameters scoped by default. It will be impossible 
 for a function to escape the address of a ref parameter without a cast. 
 I haven't proved it to myself yet, but I believe that if pointers are 
 not used and with the amendments below regarding arrays and delegates, 
 this makes things entirely safe. In Walter's words, "it buttons things 
 pretty tight".

If this means you can't implement a swap function for this struct, then I think you're right that it's safe:

struct A { ref A a; }
void swap(ref A a0, ref A a1);

On the other side, if you can implement the swap function, then calling it is unsafe since you can rebind a reference to another without being able to check that their scopes are compatible. So basically, references must always be initialized at construction and should be non-rebindable, just like in C++. (Hum, and I should mention I don't like too much references in C++.)
 * Make this a reference so that it obeys what references obey.

Ah, so that's why Walter wanted to change that suddenly. This is a good thing by itself, even without correct scoping.
 * If people want to implement e.g. linked lists, they should do it with 
 classes. Implementing them with structs will require casts to obtain 
 and escape &this. That also means they'd be using pointers, so anything 
 goes - pointers are not restricted from escaping.
 
 * There are two cases in which things escape without the user 
 explicitly using pointers: delegates and dynamic arrays initialized 
 from stack-allocated arrays.
 
 * For delegates require the scope keyword in the signature of the 
 callee. A scoped delegate cannot be stored, only called or passed down 
 to another function that in turn takes a scoped delegate. This makes 
 scope delegates entirely safe. Non-scoped delegates use dynamic 
 allocation.

Again, I'd say that if you can implement a swap function with those scope delegates, it's unsafe. Case in point:

void f1(ref scope void delegate() arg)
{
    int i;
    scope void f2() { ++i; }
    scope void delegate() inner = &f2;
    swap(arg, inner); // this should be an error.
    arg = inner; // this too should be an error.
}

If you can't rebind the value of a scope delegate pointer, then all is fine.
 * We don't have an idea for dynamic arrays initialized from 
 stack-allocated arrays.

Either disallow it, or keep it as unsafe as pointers (bad for SafeD, I expect), or implement a complete scope-checking system (if you do it for arrays, you'll have done it for pointers too). You don't have much choice there, as arrays are pretty much the same thing as pointers.
 Thoughts? Ideas?

I'm under the impression that scope classes could be dangerous in this system: an object reference is not necessarily on the heap. Personally, I'd have liked to have a language where you can be completely scope safe, where you could document interfaces so they know the scope they're evolving in. This concept of something in between is a nice attempt at a compromise, but I find it somewhat limiting. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 02 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-02 10:12:46 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 It looks like things will move that way. Bartosz, Walter and I talked 
 a lot yesterday about it - a lot of crazy things were on the table! 
 The next step is to make this a reference, which is highly related to 
 escape analysis. At the risk of anticipating a bit an unfinalized 
 design, here's what's on the table:

 * Continue an "anything goes" policy for *explicit* pointers, i.e. 
 those written explicitly by user code with stars and stuff.

That's a little disappointing. I was hoping for something to fix all holes. I know it isn't easy to design and implement, but once done I firmly believe it would have the potential to completely eliminate the need for explicit memory allocation. For the programmer, it's a good trade: less worrying about what needs to be dynamically allocated and better documented function signatures. Perhaps that would be too much of a departure from C and C++ though.

That's only the half of it. If you want to take a look at a C-like language that is safe, you may want to look at Cyclone. The reality is that making things 100% safe is going to require more or less the moral equivalent of Cyclone's limitations and demands from its user. I think Dan Grossman has done an excellent job making things "as tight as possible but not tighter", so Cyclone is a great yardstick to measure D's tradeoffs against.
 * Disallow pointers in SafeD.

Again a consequence of not having a full scoping solution.

A "full scoping solution" would impose demands on you that you'd be the first to dislike.
 Couldn't you allow pointers in SafeD, while disallowing taking the 
 address of local variables? This would limit pointers to heap-allocated 
 variables. And disallow pointer arithmetic too.

I think pointers can be allowed in SafeD under certain restrictions starting with the ones you mention. We best start from the safe end.
 * Make all ref parameters scoped by default. It will be impossible 
 for a function to escape the address of a ref parameter without a 
 cast. I haven't proved it to myself yet, but I believe that if 
 pointers are not used and with the amendments below regarding arrays 
 and delegates, this makes things entirely safe. In Walter's words, "it 
 buttons things pretty tight".

If this means you can't implement a swap function for this struct, then I think you're right that it's safe:

struct A { ref A a; }
void swap(ref A a0, ref A a1);

On the other side, if you can implement the swap function, then calling it is unsafe since you can rebind a reference to another without being able to check that their scopes are compatible.

Swap will work fine because ref is not a type constructor. Struct A is in error. In fact ref not being a type constructor is much of the beauty of it all.
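The distinction is visible in ordinary D: `ref` qualifies a parameter, not a type, so a swap over two `ref` parameters compiles and works, while a `ref` field is rejected. A minimal sketch; `swapInts` is an illustrative name, and Phobos already provides a generic `swap` in `std.algorithm.mutation`.

```d
// ref is a parameter storage class, not part of the type: x and y are plain
// ints that the callee reads and writes in the caller's storage.
void swapInts(ref int x, ref int y)
{
    int tmp = x;
    x = y;
    y = tmp;
}

// By the same token, a field cannot be declared ref, so a declaration like
// struct A { ref A a; } does not compile.

unittest
{
    int p = 1, q = 2;
    swapInts(p, q);
    assert(p == 2 && q == 1);
}
```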
 So basically, references must always be initialized at construction and 
 should be non-rebindable, just like in C++. (Hum, and I should mention I 
 don't like too much references in C++.)

No, C++ references are "almost" type constructors. Also note that rvalues won't bind to any kind of references in D. (More on that later.)
 * Make this a reference so that it obeys what references obey.

Ah, so that's why Walter wanted to change that suddenly. This is a good thing by itself, even without correct scoping.

Yah, in fact it's pretty amazing it seems to work out so well. We gain a huge guarantee without changing much in the language.
 * If people want to implement e.g. linked lists, they should do it 
 with classes. Implementing them with structs will require casts to 
 obtain and escape &this. That also means they'd be using pointers, so 
 anything goes - pointers are not restricted from escaping.

 * There are two cases in which things escape without the user 
 explicitly using pointers: delegates and dynamic arrays initialized 
 from stack-allocated arrays.

 * For delegates require the scope keyword in the signature of the 
 callee. A scoped delegate cannot be stored, only called or passed down 
 to another function that in turn takes a scoped delegate. This makes 
 scope delegates entirely safe. Non-scoped delegates use dynamic 
 allocation.

Again, I'd say that if you can implement a swap function with those scope delegates, it's unsafe. Case in point:

void f1(ref scope void delegate() arg)
{
    int i;
    scope void f2() { ++i; }
    scope void delegate() inner = &f2;
    swap(arg, inner); // this should be an error.
    arg = inner; // this too should be an error.
}

If you can't rebind the value of a scope delegate pointer, then all is fine.

Indeed, rebinding would be disallowed.
 * We don't have an idea for dynamic arrays initialized from 
 stack-allocated arrays.

Either disallow it, or keep it as unsafe as pointers (bad for SafeD, I expect), or implement a complete scope-checking system (if you do it for arrays, you'll have done it for pointers too). You don't have much choice there, as arrays are pretty much the same thing as pointers.

Exactly. Essentially array are as "bad" as structs containing pointers.
 Thoughts? Ideas?

I'm under the impression that scope classes could be dangerous in this system: an object reference is not necessarily on the heap.

I think a fair move to do is deal away with scope classes. We can still allow them via systems-level tricks, but not with an innocuous construct that's in fact a weapon of mass destruction.
 Personally, I'd have liked to have a language where you can be 
 completely scope safe, where you could document interfaces so they know 
 the scope they're evolving in. This concept of something in between is a 
 nice attempt at a compromise, but I find it somewhat limiting.

I agree. Again, something like this was on the table:

void wyda(scope T* a, scope U* b) if (scope(a) <= scope(b))
{
    a.field = b;
}

I think it's not hard to appreciate the toll this kind of user-written function summary exacts on the user of the language.

Andrei
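To make the `scope(a) <= scope(b)` condition concrete, here is a toy model in D. It is an assumption for illustration, not the design under discussion: scopes are reduced to lexical nesting depths along a single call path (0 for the heap or globals, 1 for a function frame, 2 for a block inside it, and so on), and a deeper scope is taken to end no later than a shallower one. `mayStore` is a hypothetical name.

```d
// Toy model: a scope is a nesting depth on one call path.
// 0 = heap/globals, 1 = function frame, 2 = block inside it, ...
alias Depth = int;

// wyda's precondition scope(a) <= scope(b): whatever a points into must
// die no later than b, so a.field = b can never dangle. In depth terms the
// destination must be at least as deeply nested as the source.
bool mayStore(Depth dstScope, Depth srcScope)
{
    return dstScope >= srcScope;
}

unittest
{
    assert(mayStore(2, 1));    // inner-block object may point at a frame local
    assert(mayStore(1, 0));    // frame local may point at heap data
    assert(!mayStore(0, 1));   // heap/global object must not point at a frame local
}
```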
Nov 02 2008
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-02 19:04:37 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Personally, I'd have liked to have a language where you can be 
 completely scope safe, where you could document interfaces so they know 
 the scope they're evolving in. This concept of something in between is 
 a nice attempt at a compromise, but I find it somewhat limiting.

I agree. Again, something like this was on the table:

void wyda(scope T* a, scope U* b) if (scope(a) <= scope(b))
{
    a.field = b;
}

I think it's not hard to appreciate the toll this kind of user-written function summary exacts on the user of the language.

First, I think it's a pretty good idea to have this. Second, I think it's possible to improve the syntax; there should be a way to not have to worry about the scope rules when you don't want them to bother you. Here's something we could do about it...

Add a special keyword (let's call it "autoscope" for now) that you can put at the start of the function, making the compiler automatically derive the least restrictive scope constraints from the function body and apply them to the signature. The restriction is that the source must be available for the compiler to see and there must not be any override based solely on scope constraints. So basically, you could write:

autoscope void wyda(T* a, U* b)
{
    a.field = b;
}

and the compiler would make the signature like your example above. And it'd be a good idea if the compiler could generate correct scoping constraints (without using "autoscope") in a generated .di file, to make things faster and not reliant on the code itself.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 02 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-02 19:04:37 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Personally, I'd have liked to have a language where you can be 
 completely scope safe, where you could document interfaces so they 
 know the scope they're evolving in. This concept of something in 
 between is a nice attempt at a compromise, but I find it somewhat 
 limiting.

I agree. Again, something like this was on the table:

void wyda(scope T* a, scope U* b) if (scope(a) <= scope(b))
{
    a.field = b;
}

I think it's not hard to appreciate the toll this kind of user-written function summary exacts on the user of the language.

First, I think it's a pretty good idea to have this. Second, I think it's possible to improve the syntax; there should be a way to not have to worry about the scope rules when you don't want them to bother you. Here's something we could do about it...

But syntax is so little a part of it. I knew since age immemorial that escape analysis is a bitch. I mean, everybody knows. Every once in a while, I'd get lulled into the belief that things can get "a little pregnant" in a sweet spot where the implementation isn't too hard, limitations aren't too severe, and the language doesn't get too complex. A couple of weeks ago was the (n + 1)th time that that happened; I got encouraged that Walter was willing to tackle the task of writing even a context/flow insensitive escape analyzer, and I also got hope from "scope" being an easy way to express something about a function. Ironically, it was your example that disabused me of my mistaken belief. That leaves me in the position that if someone wants to show me there *is* such a sweet spot, they better come with a very airtight argument. Andrei
Nov 02 2008
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-03 00:39:34 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 But syntax is so little a part of it. I knew since age immemorial that 
 escape analysis is a bitch. I mean, everybody knows. Every once in a 
 while, I'd get lulled into the belief that things can get "a little 
 pregnant" in a sweet spot where the implementation isn't too hard, 
 limitations aren't too severe, and the language doesn't get too 
 complex. A couple of weeks ago was the (n + 1)th time that that 
 happened; I got encouraged that Walter was willing to tackle the task 
 of writing even a context/flow insensitive escape analyzer, and I also 
 got hope from "scope" being an easy way to express something about a 
 function. Ironically, it was your example that disabused me of my 
 mistaken belief.

Studying things more in depth often at first leaves you with the impression that things are more complicated than they are. But after some time, you start to see a few common patterns and you can start to simplify and unify the concepts. Who would have thought some centuries ago that you could use the same math formulas to understand how an apple falls from a tree and how the Moon is orbiting around the Earth? Perhaps it's a wise choice to forget about the idea and avoid wasting time on making things more complicated *if* they'll indeed make things more complicated. But right now I have the feeling that you're bailing out after a first try seeing things are more complicated than they first looked like, without even digging further to see if there are common patterns that would allow simplification and unification with other concepts further down the line.
 That leaves me in the position that if someone wants to show me there 
 *is* such a sweet spot, they better come with a very airtight argument.

I believe I have a complete solution by placing the scope annotations on the type as I will explain below, although I don't have a good syntax for it. My solution doesn't revolve around escape analysis but more about explicit scoping constraints (which could and should be made implicit through escape analysis, but that isn't strictly needed for the scoping system to work). And, as a bonus, it can provide a way for the compiler to completely free the programmer from having to explicitly dynamically allocate things in his program (because all scopes are known at compile-time, the compiler can tell what needs to be dynamically allocated and what doesn't need to). So, are you interested?

- - -

Personally, I'd implement scoping rules by reusing the framework that was built for const. I'd make scope like const (a type modifier, is it called like that?), but with the additional variation that each scope qualifier could be bound to another variable's scope that would become a child scope (needed for a swap function for instance). Basically, each pointer or reference in a type can get its own scope qualifier.

Scope restrictions work in the reverse direction however: the data you point to imposes scoping restrictions on pointers leading to it, not the other way around like with transitive const. You can have a scope pointer to no-scope data. You can't have a no-scope pointer pointing to scope data. So "scope(char)*" makes little sense, since "char" being scope, the pointer needs to be scope too. This makes more sense: "char scope(*)", a scope pointer to non-scope data. Basically, scope should be more and more restricted while reading a type from left to right, so you could have something like "char scope(* scopeof(x)(*))".

There's of course a need for a better syntax than the above. But, I think the ugly syntax above conceptualizes pretty well a good solution to the scoping problem that could extend to arrays, structs and classes. We sure should make it prettier, perhaps by imposing restrictions like forcing all pointers in the type to be of the most restrictive scope, which would avoid placing scope annotations everywhere in it. But in essence, I think this solution is workable.

From there, we can define scope comparisons, and scope restriction checks to apply when assigning variables to one another. Scope restriction checks could allow restriction propagation when doing escape analysis. If you want more details, I can provide them as I've thought of the matter a lot in the last few days. I just don't have the time to write about everything about it right now.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 03 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-03 00:39:34 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 If you want more details, I can provide them as I've thought of the 
 matter a lot in the last few days. I just don't have the time to write 
 about everything about it right now.

It may be wise to read some more before writing some more. As far as I understand it, your idea, if taken to completion, is very much like region analysis as defined in Cyclone. http://www.research.att.com/~trevor/papers/pldi2002.pdf Here are some slides: http://www.cs.washington.edu/homes/djg/slides/cyclone_pldi02.ppt My hope was that we can obtain an approximation of that idea by defining only two regions - "inside this function" and "outside this function". It looks like that's not much gain for a lot of pain. So the question is - should we introduce region analysis to D, or not? Andrei
Nov 03 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-03 11:21:08 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 On 2008-11-03 00:39:34 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 If you want more details, I can provide them as I've thought of the 
 matter a lot in the last few days. I just don't have the time to write 
 about everything about it right now.

It may be wise to read some more before writing some more. As far as I understand it, your idea, if taken to completion, is very much like region analysis as defined in Cyclone. http://www.research.att.com/~trevor/papers/pldi2002.pdf Here are some slides: http://www.cs.washington.edu/homes/djg/slides/cyclone_pldi02.ppt

Pretty interesting slides. Yeah, that looks pretty much like my idea, in concept, where I call regions scopes. But I'd have made things simpler by having only local function regions (on the stack) and the global region (the dynamically-allocated, garbage-collected heap), which means you don't need templates at all for dealing with them. I also believe we can completely avoid the use of named regions, such as:

    {
        int*`L p;
        L: { int x; p = &x; }
    }

The problem illustrated above, of having a pointer outside the inner braces take the address of a variable inside it, solves itself if you allow a variable's region to be "promoted" automatically to a broader one. For instance, you could write:

    {
        int* p;
        { int x; p = &x; }
    }

and p = &x would make the compiler automatically extend the life of x up to p's region (local scope), although x wouldn't be accessible outside of the inner braces other than by dereferencing p. If the pointer was copied outside of the function, then the only available broader region to promote x to would be the heap. I think this should be done automatically, although it could be decided to require dynamic allocation to be explicit too; this is of little importance to the escape analysis and scope restriction problem.
 My hope was that we can obtain an approximation of that idea by defining
 only two regions - "inside this function" and "outside this function".
 It looks like that's not much gain for a lot of pain.
 
 So the question is - should we introduce region analysis to D, or not?

I think we should at least try. I don't think we need everything Cyclone does however; we can and should keep things simpler. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 04 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-03 11:21:08 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Michel Fortin wrote:
 On 2008-11-03 00:39:34 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 If you want more details, I can provide them as I've thought of the 
 matter a lot in the last few days. I just don't have the time to 
 write about everything about it right now.

It may be wise to read some more before writing some more. As far as I understand it, your idea, if taken to completion, is very much like region analysis as defined in Cyclone. http://www.research.att.com/~trevor/papers/pldi2002.pdf Here are some slides: http://www.cs.washington.edu/homes/djg/slides/cyclone_pldi02.ppt

Pretty interesting slides. Yeah, that looks pretty much like my idea, in concept, where I call regions scopes. But I'd have made things simpler by having only local function regions (on the stack) and the global region (dynamically-allocated garbage-collected heap), which mean you don't need templates at all for dealing with them.

I don't understand that part.
 I also believe we can
 completely avoid the use of named regions, such as:
 
     {
         int*`L p;
         L: { int x; p = &x; }
     }
 
 The problem illustrated above, of having a pointer outside the inner 
 braces take the address of a variable inside it, solves itself if you 
 allow a variable's region to be "promoted" automatically to a broader 
 one. For instance, you could write:
 
     {
         int* p;
         { int x; p = &x; }
     }
 
 and p = &x would make the compiler automatically extend the life of x up
 to p's region (local scope), although x wouldn't be accessible outside
 of the inner braces other than by dereferencing p.

Cyclone has region subtyping which takes care of that.
 If the pointer was copied outside of the function, then the only 
 available broader region to promote x to would be the heap. I think this 
 should be done automatically, although it could be decided to require 
 dynamic allocation to be explicit too; this is of little importance to 
 the escape analysis and scope restriction problem.
 
 
 My hope was that we can obtain an approximation of that idea by defining
 only two regions - "inside this function" and "outside this function".
 It looks like that's not much gain for a lot of pain.

 So the question is - should we introduce region analysis to D, or not?

I think we should at least try. I don't think we need everything Cyclone does however; we can and should keep things simpler.

I'm not sure how to read this. From what I can tell, Cyclone's region analysis does not introduce undue complexity. It does the minimum necessary to prove that functions manipulating pointers are safe. So if you suggest a simpler scheme, then either it is more limiting, less safe, or both. What are the tradeoffs you are thinking about, and how do they compare to Cyclone? Andrei
Nov 04 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-04 12:36:15 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Yeah, that looks pretty much like my idea, in concept, where I call 
 regions scopes. But I'd have made things simpler by having only local 
 function regions (on the stack) and the global region 
 (dynamically-allocated garbage-collected heap), which mean you don't 
 need templates at all for dealing with them.

I don't understand that part.

Indeed, I mistakenly took the <> notation for templates (I've seen too much C++ lately), which somewhat confused my analysis of a few things later on. And perhaps I should have read a little more about Cyclone before attempting a comparison, as it seems I got a few things wrong from the slides.
 My hope was that we can obtain an approximation of that idea by defining
 only two regions - "inside this function" and "outside this function".
 It looks like that's not much gain for a lot of pain.
 
 So the question is - should we introduce region analysis to D, or not?

I think we should at least try. I don't think we need everything Cyclone does however; we can and should keep things simpler.

I'm not sure how to read this. From what I can tell, Cyclone's region analysis does not introduce undue complexity. It does the minimum necessary to prove that functions manipulating pointers are safe. So if you suggest a simpler scheme, then either it is more limiting, less safe, or both. What are the tradeoffs you are thinking about, and how do they compare to Cyclone?

I guess I'd have to familiarize myself with Cyclone a little more to be able to do a good comparison. Right now I've just been scratching the surface, but it looks more complicated than what I had in mind for D. I'd tend to believe Cyclone may cover some cases that wouldn't be covered by mine, but I'm not sure which ones, and I am currently under the impression that they are not that important (they could be handled in other ways). Don't forget that Cyclone is targeted at the C language, which has neither templates nor garbage collection (although Cyclone supports an optional garbage collector). Since D has both, it can leverage some of this to simplify things. For instance, because of the garbage collector I don't think we need what Cyclone calls dynamic regions: I'd simply put everything escaping a function on the heap. It then follows that we don't need to propagate region handles. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 04 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-04 12:36:15 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I also believe we can completely avoid the use of named regions, such as:
 
     {
         int*`L p;
         L: { int x; p = &x; }
     }
 
 The problem illustrated above, of having a pointer outside the inner 
 braces take the address of a variable inside it, solves itself if you 
 allow a variable's region to be "promoted" automatically to a broader 
 one. For instance, you could write:
 
     {
         int* p;
         { int x; p = &x; }
     }
 
 and p = &x would make the compiler automatically extend the life of x up
 to p's region (local scope), although x wouldn't be accessible outside
 of the inner braces other than by dereferencing p.

Cyclone has region subtyping which takes care of that.

Not the same way as I'm proposing. What Cyclone does is make p undereferenceable outside the scope of L. So if I add an assignment to p outside of L, it won't compile:

    {
        int*`L p;
        L: { int x; p = &x; }
        *p = 42; // error, dereferencing p outside of L.
    }

What I'm proposing is that such code extends the life of the storage of the local variable x to p's region:

    {
        int* p;
        { int x; p = &x; }
        *p = 42; // okay; per assignment to p, x lives up to p's scope.
        x;       // error, x is not accessible in this scope, except through p.
    }

It follows that if p is outside of the local function, x needs to be allocated dynamically (just as closures currently do for each variable they use):

    void f(ref int* p)
    {
        int x;
        p = &x;
    }

If you want to make sure x never escapes the memory region associated with its scope, then you can declare x as scope and get a compile-time error when assigning it to p.

So, in essence, the system I propose is a little simpler because pointer variables just cannot point to values coming from a region that doesn't exist in the scope where the pointer is declared. The guarantee I propose is that during the whole lifetime of a pointer, it points to either a valid memory region, or null. Cyclone's approach is to forbid you from dereferencing the pointer.

Combine this with my proposal to not have dynamic regions and we don't need named regions anymore. Perhaps the syntax could be made simpler with region names, but technically, we don't need them as we can always go the route of saying that a pointer value is "valid within the scope of variable_x". This is what I'm expressing with "scopeof(variable_x)" in my other examples, and I believe it is analogous to the "regions_of(variable_x)" in Cyclone, although Cyclone doesn't use it pervasively.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
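What the proposal asks of the compiler for `f` can be written out by hand in today's D. This is a hypothetical lowering, assuming the promotion rule described above: since the address of `x` escapes through `p`, `x` is moved from the stack frame to the GC heap. No current D compiler performs this rewrite for you.

```d
// Hypothetical hand-lowering of `f` under the proposal: the address of x
// escapes through p, so x is placed on the GC heap instead of in f's frame.
void f(ref int* p)
{
    int* x = new int; // was `int x;` (a new int starts at 0, like a local)
    p = x;            // was `p = &x;`
}
```

After the call, the caller's pointer refers to valid storage that outlives `f`'s stack frame, which is exactly the guarantee that every pointer is either valid or null.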
Nov 05 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-04 12:36:15 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 I also believe we can completely avoid the use of named regions, such as:

     {
         int*`L p;
         L: { int x; p = &x; }
     }

 The problem illustrated above, of having a pointer outside the inner 
 braces take the address of a variable inside it, solves itself if you 
 allow a variable's region to be "promoted" automatically to a broader 
 one. For instance, you could write:

     {
         int* p;
         { int x; p = &x; }
     }

 and p = &x would make the compiler automatically extend the life of x
 up to p's region (local scope), although x wouldn't be accessible
 outside of the inner braces other than by dereferencing p.

Cyclone has region subtyping which takes care of that.

Not the same way as I'm proposing. What Cyclone does is make p undereferenceable outside the scope of L. So if I add an assignment to p outside of L, it won't compile:

    {
        int*`L p;
        L: { int x; p = &x; }
        *p = 42; // error, dereferencing p outside of L.
    }

What I'm proposing is that such code extends the life of the storage of the local variable x to p's region:

    {
        int* p;
        { int x; p = &x; }
        *p = 42; // okay; per assignment to p, x lives up to p's scope.
        x;       // error, x is not accessible in this scope, except through p.
    }

Well how about this:

    int * p;
    float * q;
    if (condition) {
        int x; p = &x;
    } else {
        float y; q = &y;
    }

Houston, we have a problem. You can of course patch that little rule in a number of ways, but really at the end of the day what happens only inside a function is uninteresting. The main challenge is making the analysis scalable to multiple functions.
 It follows that if p is outside of the local function, x needs to be
 allocated dynamically (just as closures currently do for each variable 
 they use):
 
     void f(ref int* p)
     {
         int x;
         p = &x;
     }

Well this pretty much hamstrings pointers. You can take addresses of things inside a function but you can't pass them around. Moreover, people disliked the stealth dynamic allocation when delegates are being used; you are adding more of those.
 If you want to make sure x never escapes the memory region associated to 
 its scope, then you can declare x as scope and get a compile-time error 
 when assigning it to p.
 
 So, in essence, the system I propose is a little simpler because pointer 
 variables just cannot point to values coming from a region that doesn't 
 exist in the scope the pointer is declared. The guarantee I propose is
 that during the whole lifetime of a pointer, it points to either a valid 
 memory region, or null. Cyclone's approach is to forbid you from 
 dereferencing the pointer.
 
 Combine this with my proposal to not have dynamic regions and we don't 
 need named regions anymore. Perhaps the syntax could be made simpler 
 with region names, but technically, we don't need them as we can always 
 go the route of saying that a pointer value is "valid within the scope 
 of variable_x". This is what I'm expressing with "scopeof(variable_x)" 
 in my other examples, and I believe it is analogous to the 
 "regions_of(variable_x)" in Cyclone, although Cyclone doesn't use it 
 pervasively.

IMHO this may be made to work. I personally prefer the system in which ref is safe and pointers are permissive. The system you are referring to makes ref and pointer of the same power, so we could as well dispense with either. But I'd be curious what others think of it. Notice how the discussion participants got reduced to you and me, and from what I saw that's not a good sign. Andrei
Nov 06 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 IMHO this may be made to work. I personally prefer the system in which ref 
 is safe and pointers are permissive. The system you are referring to makes 
 ref and pointer of the same power, so we could as well dispense with 
 either. But I'd be curious what others think of it. Notice how the 
 discussion participants got reduced to you and me, and from what I saw 
 that's not a good sign.

FWIW, I still think the proposal you have put forth about references being the safe type and pointers being permissive is the best one so far. It's clean, doesn't add excessive syntax, and makes good practical sense. I think full scope analysis is an interesting problem to solve, but it may just be an academic exercise, as it would be impractical to develop with. Just MHO. -Steve
Nov 07 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-06 23:36:55 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Well how about this:
 
 int * p;
 float * q;
 if (condition) {
      int x; p = &x;
 } else {
      float y; q = &y;
 }
 
 Houston, we have a problem.

I don't see a problem at all. The compiler would expand the lifetime of x to the outer scope, and do the same for y. Basically, the compiler would make it this way in the compiled code:

    int * p;
    float * q;
    int x;
    float y;
    if (condition) {
        p = &x;
    } else {
        q = &y;
    }

A good optimising compiler could also place x and y in a union to save some space.
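The hoisting described here, including the union, can be sketched by hand in D. `Hoisted` and `readBack` are illustrative names, and the values written through the pointers are made up so the result can be checked; a compiler applying the proposal would perform this rewrite internally rather than requiring it in source.

```d
// Shared storage for the hoisted locals: a union suffices because only
// one branch of the if/else ever uses its variable.
union Hoisted
{
    int x;
    float y;
}

// Hand-written sketch of the lowering (hypothetical): x and y move to the
// enclosing scope, so p and q never point to storage that has gone away.
double readBack(bool condition)
{
    int* p;
    float* q;
    Hoisted storage; // lives as long as p and q
    if (condition)
    {
        p = &storage.x;
        *p = 1;
    }
    else
    {
        q = &storage.y;
        *q = 2.5f;
    }
    return condition ? *p : *q;
}
```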
 You can of course patch that little rule in a number of ways, but 
 really at the end of the day what happens only inside a function is 
 uninteresting. The main challenge is making the analysis scalable to 
 multiple functions.

Indeed. Personally, I take the case above as a simple optimisation to avoid unnecessary dynamic allocation of x and y when you need to extend variable lifetime to a broader scope part of the same function.
 It follows that if p is outside of the local function, x needs to be
 allocated dynamically (just as closures currently do for each variable 
 they use):
 
     void f(ref int* p)
     {
         int x;
         p = &x;
     }

Well this pretty much hamstrings pointers. You can take addresses of things inside a function but you can't pass them around. Moreover, people disliked the stealth dynamic allocation when delegates are being used; you are adding more of those.

I'd like to point out the two things people complained about the most regarding the automatic dynamic allocation for dynamic closures:

1.    There is no way to prevent it, to make sure there is no allocation.
2.    The compiler does allocate a lot more than necessary.

In my proposal, these two points are addressed:

1.    You can declare any variable as "scope", preventing it from being placed
    in a broader scope, preventing at the same time dynamic allocation.
2.    The compiler being aware of what arguments do and do not escape the
    scope of the called functions, it won't allocate unnecessarily.

So I think the situation would be much better. But all this is orthogonal to having or not an escape analysis system, as we could choose the reverse conventions: no variable can escape its scope unless explicitly authorized by some new syntactic construct.
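The second complaint concerns closures that are allocated even when the delegate never escapes. Present-day D already lets a function mark a delegate parameter `scope`, and current compilers generally use that to skip the heap allocation of the closure. The sketch below, with hypothetical names `apply`, `addLocally` and `makeAdder`, contrasts the two cases; treat the allocation behaviour as a compiler detail that the code itself does not verify.

```d
// `scope` on the delegate parameter promises apply will not keep dg,
// so the caller's captured variables can stay on the stack.
int apply(scope int delegate(int) dg, int v)
{
    return dg(v);
}

int addLocally(int offset)
{
    // The delegate never outlives this call: no heap closure is needed.
    return apply((int v) => v + offset, 10);
}

int delegate(int) makeAdder(int offset)
{
    // The delegate is returned, so `offset` must live in a GC-allocated closure.
    return (int v) => v + offset;
}
```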
 If you want to make sure x never escapes the memory region associated 
 to its scope, then you can declare x as scope and get a compile-time 
 error when assigning it to p.
 
 So, in essence, the system I propose is a little simpler because 
 pointer variables just cannot point to values coming from a region that 
 doesn't exist in the scope the pointer is declared. The guarantee I
 propose is that during the whole lifetime of a pointer, it points to 
 either a valid memory region, or null. Cyclone's approach is to forbid 
 you from dereferencing the pointer.
 
 Combine this with my proposal to not have dynamic regions and we don't 
 need named regions anymore. Perhaps the syntax could be made simpler 
 with region names, but technically, we don't need them as we can always 
 go the route of saying that a pointer value is "valid within the scope 
 of variable_x". This is what I'm expressing with "scopeof(variable_x)" 
 in my other examples, and I believe it is analogous to the 
 "regions_of(variable_x)" in Cyclone, although Cyclone doesn't use it 
 pervasively.

IMHO this may be made to work. I personally prefer the system in which ref is safe and pointers are permissive. The system you are referring to makes ref and pointer of the same power, so we could as well dispense with either.

I'm not too thrilled by references. I once got a question from someone coming from C: what is the difference between a pointer and a reference in C++? I had to answer: references are pointers with a different syntax, no rebindability, and no possibility of being null. It seems he and I both agree that references are mostly a cosmetic patch to solve a syntactic problem. References in D aren't much different. If we could have a unified syntax for pointers of all kinds, I think it'd be more convenient than having two kinds of pointers. A null-forbidding but rebindable pointer would be more useful in my opinion than the current reference concept.
 But I'd be curious what others think of it. Notice how the discussion 
 participants got reduced to you and me, and from what I saw that's not 
 a good sign.

Indeed. I'm interested in other opinions too. But I'm under the impression that many lost track of what was being discussed, especially since we started referring to Cyclone, which few are familiar with and whose paper few have probably read. One of the fears expressed at the start of the thread was about excessive need for annotation, but as the Cyclone paper says, with good defaults, you need to add scoping annotations only in a few specific places. (It took me some time to read the paper and start discussing things sanely after that, remember?) So perhaps we could get more people involved if we could propose a tangible syntax for it.

Or perhaps not; for advanced programmers who already understand well what can and cannot be done by passing pointers around, full escape analysis may not seem to be such an interesting gain, since they've already adopted the right conventions to avoid most bugs it would prevent. And most people here who can discuss this topic with some confidence are not newbies to programming and don't make too many mistakes of the sort anymore.

Which makes me think of beginners saying pointers are hard. You've certainly seen beginners struggle as they learn how to correctly use pointers in C or C++. Making sure their programs fail at compile time, with an explanatory error message as to why they mustn't do this or that, is certainly going to help their experience learning the language more than cryptic and frustrating segfaults and access violations at runtime, sometimes far from the source of the problem.

-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 09 2008
Christopher Wright <dhasenan gmail.com> writes:
Michel Fortin wrote:
 On 2008-11-06 23:36:55 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Well how about this:

 int * p;
 float * q;
 if (condition) {
      int x; p = &x;
 } else {
      float y; q = &y;
 }

 Houston, we have a problem.

I don't see a problem at all. The compiler would expand the lifetime of x to the outer scope, and do the same for y. Basically, the compiler would make it this way in the compiled code:

    int * p;
    float * q;
    int x;
    float y;
    if (condition) {
        p = &x;
    } else {
        q = &y;
    }

In point of fact, it's expensive to extend the stack, so any compiler would do that, even without escape analysis. On the other hand, what about nested functions? I don't think they'd cause any trouble, but I'm not certain.
Nov 09 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-09 08:59:18 -0500, Christopher Wright <dhasenan gmail.com> said:

 Michel Fortin wrote:
 I don't see a problem at all. The compiler would expand the lifetime of 
 x to the outer scope, and do the same for y. Basically, the compiler 
 would make it this way in the compiled code:
 
     int * p;
     float * q;
     int x;
     float y;
     if (condition) {
         p = &x;
     } else {
         q = &y;
     }

In point of fact, it's expensive to extend the stack, so any compiler would do that, even without escape analysis.

Indeed.
 On the other hand, what about nested functions? I don't think they'd 
 cause any trouble, but I'm not certain.

If you mean there could be a problem with functions referring to the pointer, I'd say that with properly propagated escape constraints, it's safe. But it's an interesting case nonetheless. Consider this:

    int * p;
    if (condition) {
        int x; p = &x;
    } else {
        int y; p = &y;
    }
    int f() { return *p; }
    return &f;

Now returning &f forces p to be dynamically allocated on the heap, which puts a constraint on p forcing it to point only to variables on the heap, which in turn forces x and y to be allocated on the heap. I haven't verified, but I'm pretty certain this doesn't work correctly with the current dynamic closures in D2 (because escape analysis doesn't see through pointers).

Also, if you made p point to a value it received as an argument, and the scope of that argument isn't the global scope, it'd be an error. For instance, this wouldn't work:

    int delegate() foo1(int* arg)
    {
        int f() { return *arg; }
        return &f; // error, returned closure may live longer than *arg; need constraint
    }

Constraining the lifetime of the returned value to be no longer than that of the argument would allow it to work safely (disregard the bizarre syntax for expressing the constraint on the delegate):

    int delegate(arg)() foo2(int* arg)
    {
        int f() { return *arg; }
        return &f; // ok, returned closure lifetime guaranteed to be
                   // at most as long as the lifetime of *arg.
    }

    int globalInt;
    int delegate() globalDelegate;

    void bar()
    {
        int localInt;
        int delegate() localDelegate;

        globalDelegate = foo2(&globalInt); // ok, same lifetime
        localDelegate = foo2(&globalInt);  // ok, delegate lifetime shorter
        localDelegate = foo2(&localInt);   // ok, same lifetime
        globalDelegate = foo2(&localInt);  // ok, but forces bar to allocate localInt on the heap since otherwise
                                           // localInt lifetime would be shorter than lifetime of the delegate
    }

Note that what I want to demonstrate is that the compiler can see pretty clearly what needs and what doesn't need to be allocated on the heap to guarantee safety.
Whether we decide it allocates automatically or generates an error is of lesser concern to me. (And I'll add that some other issues with templates may make this automatic allocation scheme unworkable.) -- Michel Fortin michel.fortin michelf.com http://michelf.com/
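Here is a hand-written version of what the proposal would make the compiler do for the first example. Because the returned delegate dereferences `p`, whatever `p` points to must outlive the stack frame, so `x` and `y` are promoted to the heap. `makeReader` and the stored values 1 and 2 are assumptions added so the sketch runs; today's compiler only moves `p` itself into the closure and would not promote `x` or `y` on its own.

```d
// Hypothetical lowering: the returned delegate reads *p, so p may only
// point into the heap, which forces x and y onto the heap as well.
int delegate() makeReader(bool condition)
{
    int* p;
    if (condition)
    {
        int* x = new int(1); // was `int x;`, promoted to the heap
        p = x;
    }
    else
    {
        int* y = new int(2); // was `int y;`, promoted to the heap
        p = y;
    }
    int f() { return *p; }
    return &f; // D heap-allocates the frame holding p because &f escapes
}
```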
Nov 14 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 I'd like to point out the two things people complained about the most
 regarding the automatic dynamic allocation for dynamic closures:
 
 1.    There is no way to prevent it, to make sure there is no allocation.
 2.    The compiler does allocate a lot more than necessary.
 
 In my proposal, these two points are addressed:
 
 1.    You can declare any variable as "scope", preventing it from being placed
     in a broader scope, preventing at the same time dynamic allocation.
 2.    The compiler being aware of what arguments do and do not escape the
     scope of the called functions, it won't allocate unnecessarily.
 
 So I think the situation would be much better.

I agree that an escape analyzer would improve things. I am not sure that one oblivious to regions is expressive enough.
 But all this is orthogonal to having or not an escape analysis system, 
 as we could choose the reverse conventions: no variable can escape its 
 scope unless explicitly authorized by some new syntactic construct.

It's not orthogonal. Whatever the default is, you must be able to enforce escaping rules, otherwise the system would be as good as a convention.
 If you want to make sure x never escapes the memory region associated 
 to its scope, then you can declare x as scope and get a compile-time 
 error when assigning it to p.

 So, in essence, the system I propose is a little simpler because 
 pointer variables just cannot point to values coming from a region 
 that doesn't exist in the scope the pointer is declared. The guarantee
 I propose is that during the whole lifetime of a pointer, it points 
 to either a valid memory region, or null. Cyclone's approach is to 
 forbid you from dereferencing the pointer.

 Combine this with my proposal to not have dynamic regions and we 
 don't need named regions anymore. Perhaps the syntax could be made 
 simpler with region names, but technically, we don't need them as we 
 can always go the route of saying that a pointer value is "valid 
 within the scope of variable_x". This is what I'm expressing with 
 "scopeof(variable_x)" in my other examples, and I believe it is 
 analogous to the "regions_of(variable_x)" in Cyclone, although 
 Cyclone doesn't use it pervasively.

IMHO this may be made to work. I personally prefer the system in which ref is safe and pointers are permissive. The system you are referring to makes ref and pointer of the same power, so we could as well dispense with either.

I'm not too thrilled by references. I once got a question from someone coming from C: what is the difference between a pointer and a reference in C++? I had to answer: references are pointers with a different syntax, no rebindability, and no possibility of being null. It seems he and I both agree that references are mostly a cosmetic patch to solve a syntactic problem. References in D aren't much different.

I disagree. References in D are very different. They are not type constructors. They are storage classes that can only be used in function signatures, which makes them impossible to dangle. I think C++ references would also have been much better off as storage classes instead of half-life types.
 If we could have a unified syntax for pointers of all kinds, I think 
 it'd be more convenient than having two kinds of pointers. A 
 null-forbidding but rebindable pointer would be more useful in my opinion
 than the current reference concept.

Well ref means "This function wants to modify its argument". That is a very different charter from what pointers mean. So I'm not sure how you say you'd much prefer this to that. They are not comparable.
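The contrast can be sketched in D with hypothetical function names: `ref` is attached to the parameter in the signature, so the callee simply operates on the caller's variable, while a pointer is an ordinary value that can be passed on, returned, or rebound.

```d
// `ref` is a storage class on the parameter, not part of its type:
// increment writes straight through to the caller's variable.
void increment(ref int x)
{
    ++x;
}

// A pointer is a first-class value: it can be returned, rebound, or null.
int* pick(bool first, int* a, int* b)
{
    return first ? a : b;
}
```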
 But I'd be curious what others think of it. Notice how the discussion 
 participants got reduced to you and me, and from what I saw that's not 
 a good sign.

Indeed. I'm interested in other opinions too. But I'm under the impression that many lost track of what was being discussed, especially since we started referring to Cyclone which few are familiar with and probably few have read the paper.

In my experience, when someone is interested in something, she'd make time for it. So I take that as lack of interest. And hey, since when was lack of expertise a real deterrent? :o)
 One of the fears expressed at the start of the thread was about 
 excessive need for annotation, but as the Cyclone paper says, with good
 defaults, you need to add scoping annotation only to a few specific 
 places. (It took me some time to read the paper and start discussing 
 things sanely after that, remember?) So perhaps we could get more people 
 involved if we could propose a tangible syntax for it.

To be very frank, I think we are very far from having an actual proposal, and syntax is of very low priority now if you want to put one together. Right now what we have is a few vague ideas and conjectures (e.g., there's no need for named regions because the need would be rare enough to require dynamic allocation for those cases). I'm not saying that to criticize, but merely to underline the difficulties.
 Or perhaps not; for advanced programmers who already understand well 
 what can and cannot be done by passing pointers around, full escape 
 analysis may not seem to be such an interesting gain since they've already
 adopted the right conventions to avoid most bugs it would prevent. And
 most people here who can discuss this topic with some confidence are not
 newbies to programming and don't make too many mistakes of the sort
 anymore.
 
 Which makes me think of beginners saying pointers are hard. You've 
 certainly seen beginners struggle as they learn how to correctly use 
 pointers in C or C++. Making sure their programs fail at compile time,
 with an explanatory error message as to why they mustn't do this or
 that, is certainly going to help their experience learning the language
 more than cryptic and frustrating segfaults and access violations at
 runtime, sometimes far from the source of the problem.

I totally agree that pointers are hard and good static checking for them would help. Currently, what we try to do is obviate the need for pointers in most cases, and to actually forbid them in safe modules. The question that remains is, how many unsafe modules are necessary, and what liability do they entail? If there are few and not too unwieldy, maybe we can declare victory without constructing an escape analyzer. I agree if you or anyone says they don't think so. At this point, I am not sure, but what I can say is that it's good to reduce the need for pointers regardless.

Andrei
Nov 09 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 I'd like to point out the two things people complained about the most 
 regarding the automatic dynamic allocation for dynamic closures:
 
 1.    There is no way to prevent it, to make sure there is no allocation.
 2.    The compiler does allocate a lot more than necessary.
 
 In my proposal, these two points are addressed:
 
 1.    You can declare any variable as "scope", preventing it from being placed
     in a broader scope, preventing at the same time dynamic allocation.
 2.    The compiler being aware of what arguments do and do not escape the
     scope of the called functions, it won't allocate unnecessarily.
 
 So I think the situation would be much better.

I agree that an escape analyzer would improve things. I am not sure that one oblivious to regions is expressive enough.

If you think I proposed a region-oblivious scheme, then you've got me wrong (and perhaps it's my fault for not explaining well enough). Let me explain again, and I'll try not to skip anything this time.

Cyclone has dynamic regions: regions that are allocated on the heap but deleted at the end of the scope that created them. Basically, those are scoped heaps offering a very useful system to automatically free memory. (It's somewhat similar in concept to Cocoa's NSAutoreleasePool, for instance.) Their downside is that you need to pass region handles around (so that called functions can allocate objects within them).

So my first point is that since we have a garbage collector in D, and moreover since we're likely to get one heap per thread in D2, we don't need dynamic regions. The remaining regions are: 1) the shared heap, 2) the thread-local heap, 3) all the stack frames; and you can't allocate in any stack frame other than the current one. Because none of these regions requires a handle to allocate into, we (A) don't need region handles.

We still have many regions. Besides the two heaps (shared, thread-local), each function's stack frame, and each block within them, creates a distinct memory region. But nowhere do we need to know exactly which region a function parameter comes from; what we need to know is which address outlives which pointer, and then we can forbid assigning addresses to pointers that outlive them. All we need is a relative ordering of the various regions, and for that we don't need to attach *names* to the regions so that you can refer explicitly to them in the syntax. Instead, you could say something like "region of (x)", or "region of (*y)", and that would be enough. So there is still a region for every pointer; only, regions don't need to be *named*, because you can always refer to them by referring to the variables. (And perhaps the syntax would be clearer with region names than without, in which case I don't mind if we use them. But they're not required for the concept to work.)
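To make the "relative ordering, no names" idea concrete, here is a minimal runtime model in D. It is only an illustration, not a compiler pass: the `Region` struct, the depth numbering and `mayStore` are hypothetical. A region is reduced to how deeply it is nested (0 for the garbage-collected heaps and static data, 1 for the outermost frame of a call chain, larger numbers for callee frames and nested blocks), and the single rule is the one stated above: an address may be stored into a location only if the addressed region lives at least as long as that location. The shared/thread-local distinction is ignored, since it concerns sharing rather than lifetime.

```d
import std.stdio;

/// Hypothetical model: a region is identified only by its nesting depth.
/// 0 = GC heaps and static data; 1 = outermost stack frame of the call
/// chain; larger values = callee frames and nested blocks.
struct Region
{
    uint depth;

    static Region heap() { return Region(0); }
    static Region frame(uint level) { return Region(level); }

    /// A region outlives another if it is nested no deeper in the model.
    bool outlives(Region other) const { return depth <= other.depth; }
}

/// `*location = address` is accepted only if the pointed-to data lives
/// at least as long as the location that would hold the address.
bool mayStore(Region location, Region address)
{
    return address.outlives(location);
}

void main()
{
    auto global = Region.heap();     // Walter's module-level `int* p;`
    auto abcFrame = Region.frame(1); // frame of abc(), where `j` lives

    // bar(&j) doing `p = x`: a stack address stored in a longer-lived slot.
    writeln(mayStore(global, abcFrame)); // false: would dangle
    // Storing a heap address into a local pointer is always fine.
    writeln(mayStore(abcFrame, global)); // true
}
```

Note that nothing in `mayStore` needs to know *which* frame a value comes from, only whether one region is nested inside the other.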
 I'm not too thrilled by references. I once got a question from someone 
 coming from C: what is the difference between a pointer and a reference 
 in C++? I had to answer: references are pointers with a different 
 syntax, no rebindability, and no possibility of being null. It seems he 
 and I both agree that references are mostly a cosmetic patch to solve a 
 syntactic problem. References in D aren't much different.

I disagree. References in D are very different. They are not type constructors. They are storage classes that can only be used in function signatures, which makes them impossible to dangle. I think C++ references would also have been much better off as storage classes instead of half-life types.
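A small D illustration of the difference between the two kinds of signature, using only ordinary D features; the function names are just for illustration. The pointer version of `swap` receives two address values that it could also copy into a global or a heap object. In the `ref` version, `ref` qualifies the parameters rather than their type, so the signature itself carries no pointer value to hand out, and the caller passes plain lvalues.

```d
import std.stdio;

// Pointer version: the callee receives two address values, which it could
// also copy into a global or a heap object and thereby let them escape.
void swapPtr(int* a, int* b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

// ref version: `ref` is a storage class on the parameters, not part of
// their type, so no pointer-typed value appears in the signature.
// swapRef(1, 2) is rejected: ref does not bind to rvalues.
void swapRef(ref int a, ref int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}

void main()
{
    int x = 1, y = 2;
    swapPtr(&x, &y);    // caller must take addresses explicitly
    writeln(x, " ", y); // 2 1
    swapRef(x, y);      // caller passes lvalues; no addresses in sight
    writeln(x, " ", y); // 1 2
}
```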

Which makes me think of this:

    struct A { int i; this(); }
    ref A foo(ref A a) { return a; }

    ref A bar()
    {
        foo(A()).i = 1;

        ref A a = foo(A()); // illegal, ref cannot be used outside function signature
        a.i = 1;

        return foo(A()); // illegal ?
    }

Also, I'd like to point out that ref (and out) being storage classes somewhat hinders me from using them where it makes sense in the D/Objective-C bridge, since there most functions are instantiated by templates where template arguments give the type of each function argument. Perhaps there should be a way to specify "ref" and "out" in template arguments...
 If we could have a unified syntax for pointers of all kinds, I think 
 it'd be more convenient than having two kinds of pointers. A 
 null-forbidding but rebindable pointer would be more useful in my 
 opinion than the current reference concept.

Well ref means "This function wants to modify its argument". That is a very different charter from what pointers mean. So I'm not sure how you say you'd much prefer this to that. They are not comparable.

I was under the impression that ref would be allowed as a storage class for local variables. I'll say it's perfectly acceptable for function arguments, but I'm less sure about function return types. Also, I'd still like to have a non-null pointer type, especially for clarifying function signatures. A template can do. If it were in the language, however, it would be used by more people, which would be better.
 But I'd be curious what others think of it. Notice how the discussion 
 participants got reduced to you and me, and from what I saw that's not 
 a good sign.

Indeed. I'm interested in other opinions too. But I'm under the impression that many lost track of what was being discussed, especially since we started referring to Cyclone which few are familiar with and probably few have read the paper.

In my experience, when someone is interested in something, she'd make time for it. So I take that as lack of interest. And hey, since when was lack of expertise a real deterrent? :o)

As I said below, I think many people in this group are already comfortable with using pointers, which may explain why they're not so interested. Having no one interested in something doesn't necessarily mean they won't appreciate it when it comes. It does, however, reduce the incentive for continuing forward. So I understand why you're backing off, even if it displeases me somewhat.
 One of the fears expressed at the start of the thread was about 
 excessive need for annotation, but as the Cyclone paper says, with good 
 defaults, you need to add scoping annotation only to a few specific 
 places. (It took me some time to read the paper and start discussing 
 things sanely after that, remember?) So perhaps we could get more 
 people involved if we could propose a tangible syntax for it.

To be very frank, I think we are very far from having an actual proposal, and syntax is of very low priority now if you want to put one together. Right now what we have is a few vague ideas and conjectures (e.g., there's no need for named regions because the need would be rare enough to require dynamic allocation for those cases). I'm not saying that to criticize, but merely to underline the difficulties.

I never said the need for dynamic regions would be rare: I said the garbage collector makes them obsolete. If we can justify the need for dynamic regions later, we can add them back (with all the added complexity it requires) but I'd try without them first.
 Or perhaps not; for advanced programmers who already understand well 
 what can and cannot be done by passing pointers around, full escape 
 analysis may not seem to be so interesting a gain since they've already 
 adopted the right conventions to avoid most bugs it would prevent. And 
 most people here who can discuss this topic with some confidence are 
 not newbies to programming and don't make too many mistakes of the sort 
 anymore.
 
 Which makes me think of beginners saying pointers are hard. You've 
 certainly seen beginners struggle as they learn how to correctly use 
 pointers in C or C++. Making sure their programs fail at compile-time, 
 with an explanatory error message as to why they mustn't do this or 
 that, is certainly going to help their experience learning the language 
 more than cryptic and frustrating segfaults and access violations at 
 runtime, sometimes far from the source of the problem.

I totally agree that pointers are hard and good static checking for them would help. Currently, what we try to do is obviate the need for pointers in most cases, and to actually forbid them in safe modules.

But dynamic arrays *are* pointers; how are you obviating the need for them? If you find a solution for dynamic arrays, you'll have a solution for pointers too. You could forbid dynamic arrays from referring to stack-allocated static ones, or automatically allocate those dynamically when they escape in a dynamic array. And if I were you, whatever you choose for arrays I'd allow for pointers too, to keep things consistent. Pointers to heap objects should be retained in my opinion.
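The claim that a dynamic array is a pointer in disguise can be seen directly in D: a slice is a pointer plus a length, and slicing a fixed-size (stack) array yields a slice pointing into that array's storage. This sketch uses only standard D array operations; `localCopy` is a hypothetical name, and its `.dup` is the "allocate dynamically when it escapes" option done by hand.

```d
import std.stdio;

// Returning `buf[]` here would hand the caller a slice pointing into this
// function's stack frame; recent D compilers typically reject that as an
// escaping reference. Copying to the garbage-collected heap is safe.
int[] localCopy()
{
    int[4] buf = [1, 2, 3, 4]; // fixed-size array on the stack
    return buf.dup;            // new heap array with the same contents
}

void main()
{
    int[4] stackArray = [10, 20, 30, 40];
    int[] view = stackArray[];            // slice = (pointer, length) into stackArray

    view[0] = 99;                         // writes through the pointer
    writeln(stackArray[0]);               // 99
    writeln(view.ptr == stackArray.ptr);  // true: same storage
    writeln(localCopy());                 // [1, 2, 3, 4]
}
```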
 The question that remains is, how many unsafe modules are necessary, 
 and what liability do they entail? If there are few and not too 
 unwieldy, maybe we can declare victory without constructing an escape 
 analyzer. I agree if you or anyone says they don't think so. At this 
 point, I am not sure, but what I can say is that it's good to reduce 
 the need for pointers regardless.

But are you reducing the need for pointers or hiding and restricting them? I'd say the latter. References are pointers with restrictions. Object references are no different from pointers except in syntax (they can even point to stack-allocated objects with scope classes). Dynamic arrays are pointers with a certain range. Closures have a pointer to a stack frame, which can be heap-allocated or not. The only way to have a safe system without escape analysis is to force everything they can point to to be on the heap, or to prevent them from escaping at all (as with ref). I wish there could be some consistency here.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 12 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 So my first point is that since we have a garbage collector in D, and 
 moreover since we're likely to get one heap per thread in D2, we don't 
 need dynamic regions. The remaining regions are: 1) the shared heap, 2) 
 the thread-local heap, 3) all the stack frames; and you can't allocate 
 in any stack frame other than the current one. Because none of these regions 
 requires a handle to allocate into, we (A) don't need region handles.
 
 We still have many regions. Besides the two heaps (shared, thread-local), 
 each function's stack frame, and each block within them, creates a 
 distinct memory region. But nowhere do we need to know exactly which region 
 a function parameter comes from; what we need to know is which address 
 outlives which pointer, and then we can forbid assigning addresses to 
 pointers that outlive them. All we need is a relative ordering of the 
 various regions, and for that we don't need to attach *names* to the 
 regions so that you can refer explicitly to them in the syntax. Instead, 
 you could say something like "region of (x)", or "region of (*y)" and 
 that would be enough.

But how do you type then the assignment example?

void assign(int** p, int* r) { *p = r; }

How do you reflect the requirement that r's region outlives *p's region?

But that's not even the point. Say you define some notation, such as:

void assign(int** p, int* r) if (region(r) <= region(p));

But the whole point of regions was to _simplify_ notations like the above into:

void assign(region R)(int*R* p, int*R r);

So although you think you simplified things by using region(symbol) instead of symbolic names, you complicated things. The compiler still needs to infer regions for each value, so it is as complicated as a named-regions compiler, and in addition you require the user to write bulkier expressions because you disallow use of symbols. So everybody is worse off. Note how in the example using a symbolic region the outlives relationship is enforced implicitly by using the same symbol name in two places.

I suspect there are things you can't even express without symbolic regions. Consider this example from Dan's slides:

struct ILst(region R1, region R2) {
     int *R1 hd;
     ILst!(R1, R2) *R2 tl;
}

This code reflects the fact that the list holds pointers to integers in one region, whereas the nodes themselves are in a different region. It would be a serious challenge to tackle that without symbolic regions, and simpler that won't be for anybody.

I'll insert a few more points below in this sprawling discussion.
 Which makes me think of this:
 
     struct A { int i; this(); }
     ref A foo(ref A a) { return a; }
 
     ref A bar()
     {
         foo(A()).i = 1;
 
         ref A a = foo(A()); // illegal, ref cannot be used outside function signature
         a.i = 1;
 
         return foo(A()); // illegal ?
     }

foo(A()) is illegal because ref does not bind to an rvalue.
 Also, I'd like to point out that ref (and out) being storage classes 
 somewhat hinders me from using them where it makes sense in the 
 D/Objective-C bridge, since there most functions are instantiated by 
 templates where template arguments give the type of each function 
 argument. Perhaps there should be a way to specify "ref" and "out" in 
 template arguments...

I agree. Something like that is on the list.
 If we could have a unified syntax for pointers of all kinds, I think 
 it'd be more convenient than having two kinds of pointers. A 
 null-forbidding but rebindable pointer would be more useful in my 
 opinion than the current reference concept.

Well ref means "This function wants to modify its argument". That is a very different charter from what pointers mean. So I'm not sure how you say you'd much prefer this to that. They are not comparable.

I was under the impression that ref would be allowed as a storage class for local variables. I'll say it's perfectly acceptable for function arguments, but I'm less sure about function return types.

As of now, ref is not planned for local variables.
 Also, I'd still like to have a non-null pointer type, especially for 
 clarifying function signatures. A template can do. If it were in the 
 language, however, it would be used by more people, which would be better.

I don't grok this notion "if it's in the language it would be used by more people". How does that come about? Does it mean templates are at such a high syntactic disadvantage? Maybe we should do something about that then, such as replacing !() with something else :o). If we put it in Phobos (which, after integration, will be usable alongside Tango), could it count as being in the language?
 But I'd be curious what others think of it. Notice how the 
 discussion participants got reduced to you and me, and from what I 
 saw that's not a good sign.

Indeed. I'm interested in other opinions too. But I'm under the impression that many lost track of what was being discussed, especially since we started referring to Cyclone which few are familiar with and probably few have read the paper.

In my experience, when someone is interested in something, she'd make time for it. So I take that as lack of interest. And hey, since when was lack of expertise a real deterrent? :o)

As I said below, I think many people in this group are already comfortable with using pointers, which may explain why they're not so interested. Having no one interested in something doesn't necessarily mean they won't appreciate it when it comes.

That I totally agree with. It's happened a couple of times with D features.
 It does, however, reduce the incentive for continuing forward. So I 
 understand why you're backing off, even if it displeases me somewhat.

I'm sorry about how you feel. Now we're in a conundrum of sorts. You seem to strongly believe you can make some nice simplified regions work, and make people like them. Taking that to a proof is hard. The conundrum is, you are facing the prospect of putting work into it and creating a system that, albeit correct, is not enticing.
 One of the fears expressed at the start of the thread was about 
 excessive need for annotation, but as the Cyclone paper says, with 
 good defaults, you need to add scoping annotation only to a few 
 specific places. (It took me some time to read the paper and start 
 discussing things sanely after that, remember?) So perhaps we could 
 get more people involved if we could propose a tangible syntax for it.

To be very frank, I think we are very far from having an actual proposal, and syntax is of very low priority now if you want to put one together. Right now what we have is a few vague ideas and conjectures (e.g., there's no need for named regions because the need would be rare enough to require dynamic allocation for those cases). I'm not saying that to criticize, but merely to underline the difficulties.

I never said the need for dynamic regions would be rare: I said the garbage collector makes them obsolete. If we can justify the need for dynamic regions later, we can add them back (with all the added complexity it requires) but I'd try without them first.

Let's not forget that symbolic regions (for typing purposes) should not be confused with dynamic regions (for efficiency purposes). I agree we can do away with the latter and put them in later if we care. I disagree that dropping symbolic regions simplifies things.
 Or perhaps not; for advanced programmers who already understand well 
 what can and cannot be done by passing pointers around, full escape 
 analysis may not seem to be so interesting a gain since they've 
 already adopted the right conventions to avoid most bugs it would 
 prevent. And most people here who can discuss this topic with some 
 confidence are not newbies to programming and don't make too many 
 mistakes of the sort anymore.

 Which makes me think of beginners saying pointers are hard. You've 
 certainly seen beginners struggle as they learn how to correctly use 
 pointers in C or C++. Making sure their programs fail at compile-time, 
 with an explanatory error message as to why they mustn't do this or 
 that, is certainly going to help their experience learning the 
 language more than cryptic and frustrating segfaults and access 
 violations at runtime, sometimes far from the source of the problem.

I totally agree that pointers are hard and good static checking for them would help. Currently, what we try to do is obviate the need for pointers in most cases, and to actually forbid them in safe modules.

But dynamic arrays *are* pointers; how are you obviating the need for them? If you find a solution for dynamic arrays, you'll have a solution for pointers too. You could forbid dynamic arrays from referring to stack-allocated static ones, or automatically allocate those dynamically when they escape in a dynamic array. And if I were you, whatever you choose for arrays I'd allow for pointers too, to keep things consistent. Pointers to heap objects should be retained in my opinion.

But a possible path is to make arrays safe and leave pointers for those cases in which efficiency is of utmost importance. With luck, those cases are rare.
 The question that remains is, how many unsafe modules are necessary, 
 and what liability do they entail? If there are few and not too 
 unwieldy, maybe we can declare victory without constructing an escape 
 analyzer. I agree if you or anyone says they don't think so. At this 
 point, I am not sure, but what I can say is that it's good to reduce 
 the need for pointers regardless.

But are you reducing the need for pointers or hiding and restricting them?

Of course - that's the whole point. In fact, I'll insert a small correction: we are reducing the need for pointers BY hiding and restricting them. And that's a good thing. If you can do most of your work with restricted pointers (e.g. ref), then that's a net win.

Andrei
Nov 12 2008
Hxal <hxal freenode.irc> writes:
Andrei Alexandrescu Wrote:
 But how do you type then the assignment example?
 
 void assign(int** p, int* r) { *p = r; }
 
 How do you reflect the requirement that r's region outlives *p's region?
 
 But that's not even the point. Say you define some notation, such as:
 
 void assign(int** p, int * r) if (region(r) <= region(p));
 
 But the whole point of regions was to _simplify_ notations like the 
 above into:
 
 void assign(region R)(int*R* p, int *R r);
 
 So although you think you simplified things by using region(symbol) 
 instead of symbolic names, you complicated things. The compiler still 
 needs to infer regions for each value, so it is as complicated as a 
 named-regions compiler, and in addition you require the user to write 
 bulkier expressions because you disallow use of symbols. So everybody is 
 worse off. Note how in the example using a symbolic region the outlives 
 relationship is enforced implicitly by using the same symbol name in two 
 places.

Examples such as this one are rare enough that requiring annotations for them is affordable. I was under the impression that D was supposed to promote the use of references over pointers. People working with low-level code will probably either appreciate the optimization and correctness checking, or can request a way to turn off compiler enforcement of scoping in low-level code fragments.
 I suspect there are things you can't even express without symbolic 
 regions. Consider this example from Dan's slides:
 
 struct ILst(region R1, region R2) {
      int *R1 hd;
      ILst!(R1, R2) *R2 tl;
 }
 
 This code reflects the fact that the list holds pointer to integers in 
 one region, whereas the nodes themselves are in a different region. It 
 would be a serious challenge to tackle that without symbolic regions, 
 and simpler that won't be for anybody.

Transitive scope ownership ensures that a member of a structure outlives the structure itself. In that case we can create a list in a local scope, and add to it objects allocated in that scope, in any parent scope, or on the heap. Referencing objects from child scopes would be incorrect, and I don't think it's unreasonable to expect the programmer to code around such a desire.

foo*R*Q x, if (R in Q), is illegal, because it could produce a dangling reference.

foo*R*Q x, if (Q in R), is equivalent to foo*Q*Q for the purpose of:

*x = y;

where y is one of foo*R, foo*Q or foo*global.

A problem arises for other operations, though: foo*R*Q might have different semantics than foo*Q*Q when it is on the right-hand side of an assignment:

y = *x;

is legal for foo*R y, but not for foo*Q y.

Therefore, while the lifetime must always stay constant or be reduced towards the right side of the type declaration, it's necessary to be able to explicitly relax restrictions towards the left. The problem is that the type syntax is suited for scope relaxation rules to be transitive, not scope restriction. I.e., global(foo*)* makes sense when * is scoped by default, but scope(foo*)* doesn't make sense when * is global by default.

So we could either implement it with regions, which I'm not a big fan of (better than nothing though!), or ditch "scope" (as a restriction) in favor of "global" and maybe "scopeof()" (as a relaxation).

Hopefully soon D2 and the book will be done and the development of D3 can start, and such a breaking change can be introduced.
 But a possible path is to make arrays safe and leave pointers for those 
 cases in which efficiency is of utmost importance. With luck, those 
 cases are rare.

Safe sure, but not by forbidding the usage of stack arrays. Let's try to keep D performance competitive with C++, not C#. :P
Nov 12 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-12 10:02:02 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 So my first point is that since we have a garbage collector in D, and 
 moreover since we're likely to get one heap per thread in D2, we don't 
 need dynamic regions. The remaining regions are: 1) the shared heap, 2) 
 the thread-local heap, 3) all the stack frames; and you can't allocate 
 in any stack frame other than the current one. Because none of these regions 
 requires a handle to allocate into, we (A) don't need region handles.
 
 We still have many regions. Besides the two heaps (shared, 
 thread-local), each function's stack frame, and each block within them, 
 creates a distinct memory region. But nowhere do we need to know exactly 
 which region a function parameter comes from; what we need to know is 
 which address outlives which pointer, and then we can forbid assigning 
 addresses to pointers that outlive them. All we need is a relative 
 ordering of the various regions, and for that we don't need to attach 
 *names* to the regions so that you can refer explicitly to them in the 
 syntax. Instead, you could say something like "region of (x)", or 
 "region of (*y)" and that would be enough.

But how do you type then the assignment example?

void assign(int** p, int* r) { *p = r; }

How do you reflect the requirement that r's region outlives *p's region?

But that's not even the point. Say you define some notation, such as:

void assign(int** p, int* r) if (region(r) <= region(p));

But the whole point of regions was to _simplify_ notations like the above into:

void assign(region R)(int*R* p, int*R r);

So although you think you simplified things by using region(symbol) instead of symbolic names, you complicated things. The compiler still needs to infer regions for each value, so it is as complicated as a named-regions compiler, and in addition you require the user to write bulkier expressions because you disallow use of symbols. So everybody is worse off. Note how in the example using a symbolic region the outlives relationship is enforced implicitly by using the same symbol name in two places.

Everywhere I said there was no need for named regions, I also said named regions could be kept to ease the syntax. That said, I'm not so sure named regions are that good at simplifying the syntax.

In your assign example above, the named-region version has an error: it forces the two pointers to be of the same region. That could be fine but, assuming you're assigning to *p, it'd be more precise to write it like this:

void assign(region R1, region R2)(int*R1* p, int*R2 r) if (R1 <= R2);

Once we get there, I think the syntax without named regions is better. That said, for the swap example, where both values need to share the same region, the named-region notation is simpler:

void swap(region R)(int*R a, int*R b);
void swap(int* a, int* b) if (region(a) == region(b));

But I'd argue that most of the time regions do not need to be equal, but are subsets or supersets of each other, so reusing variable names makes more sense in my opinion.

In any case, I prefer a notation where region constraints are attached directly to the type instead of being expressed somewhere else. Something like this (explained below):

void assign(int*(r)* p, int* r) { *p = r; }
void swap(ref int*(b) a, ref int*(a) b);

Here, a parenthesized suffix after a pointer indicates the region constraint of the pointer, based on the region of another pointer. In the first example, int*(r)* means that the integer pointer "*p" must not live beyond the value pointed to by "r" (because we're going to assign "r" to "*p"). In the second example, the value pointed to by "a" must not live longer than the one pointed to by "b", and the value pointed to by "b" must not live longer than the one pointed to by "a"; the net result is that they must have the same lifetime and need to be in the same region.

For something more complicated, you could give multiple comma-separated constraints:

void choose(ref int*(a,b) result, int* a, int* b)
{
    result = rand() > 0.5 ? a : b;
}
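The three constraint forms in the `assign`, `swap` and `choose` signatures reduce to comparisons on a lifetime ordering. Here is a hypothetical D model of just those checks; none of the names are proposed syntax, and a real checker would apply the rules at compile time on types rather than on runtime values. As an assumption of the model, each region is reduced to a nesting depth where a smaller depth means a longer lifetime, and "x must not outlive y" becomes "x is nested at least as deeply as y".

```d
import std.stdio;

/// Hypothetical model: a region is reduced to a nesting depth, where a
/// smaller depth means a longer lifetime (0 = heap or static data).
alias Depth = uint;

/// "x must not outlive y": x is nested at least as deeply as y.
bool notOutliving(Depth x, Depth y) { return x >= y; }

/// void assign(int*(r)* p, int* r) { *p = r; }
/// The slot `*p` must not outlive the int that `r` points to.
bool assignOk(Depth slotStarP, Depth pointeeOfR)
{
    return notOutliving(slotStarP, pointeeOfR);
}

/// void swap(ref int*(b) a, ref int*(a) b);
/// Each side must not outlive the other, which forces equal lifetimes.
bool swapOk(Depth pointeeOfA, Depth pointeeOfB)
{
    return notOutliving(pointeeOfA, pointeeOfB)
        && notOutliving(pointeeOfB, pointeeOfA);
}

/// void choose(ref int*(a,b) result, int* a, int* b);
/// `result` must not outlive either candidate, since it may receive either.
bool chooseOk(Depth slotResult, Depth pointeeOfA, Depth pointeeOfB)
{
    return notOutliving(slotResult, pointeeOfA)
        && notOutliving(slotResult, pointeeOfB);
}

void main()
{
    // A local pointer (depth 2) may receive the address of a global (depth 0)...
    writeln(assignOk(2, 0));    // true
    // ...but a global pointer may not receive the address of a local.
    writeln(assignOk(0, 2));    // false
    writeln(swapOk(1, 1));      // true
    writeln(swapOk(1, 2));      // false
    writeln(chooseOk(3, 1, 2)); // true
    writeln(chooseOk(1, 1, 2)); // false: b's target dies first
}
```

Each parenthesized constraint becomes one `notOutliving` call, and `swap`'s pair of mutual constraints collapses to equality, matching the "same region" conclusion above.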
 I suspect there are things you can't even express without symbolic 
 regions. Consider this example from Dan's slides:
 
 struct ILst(region R1, region R2) {
      int *R1 hd;
      ILst!(R1, R2) *R2 tl;
 }
 
 This code reflects the fact that the list holds pointer to integers in 
 one region, whereas the nodes themselves are in a different region. It 
 would be a serious challenge to tackle that without symbolic regions, 
 and simpler that won't be for anybody.

Today's templates are just fine for that. Just propagate variables through template arguments and apply region constraints to the members:

struct ILst(alias var1, alias var2)
{
    int*(var1) hd;
    ILst!(var1, var2)*(var2) tl;
}

int z;
int*(z) a, b;
ILst!(a, b) lst1;
ILst!(&z, &z) lst2;

We could even allow regions to propagate through type arguments too:

struct ILst2(T1, T2)
{
    int*(T1) hd;
    ILst2!(T1, T2)*(T2) tl;
}

ILst2!(typeof(&z), typeof(b)) lst3;

I think this example is a good case for attaching region constraints directly to types instead of expressing them as conditional expressions elsewhere, as in "if (region a <= region b)".
 I'll insert a few more points below in this sprawling discussion.
 
 Which makes me think of this:
 
     struct A { int i; this(); }
     ref A foo(ref A a) { return a; }
 
     ref A bar()
     {
         foo(A()).i = 1;
 
         ref A a = foo(A()); // illegal, ref cannot be used outside function signature
         a.i = 1;
 
         return foo(A()); // illegal ?
     }

foo(A()) is illegal because ref does not bind to an rvalue.

Ah, you're right.
 Also, I'd like to point out that ref (and out) being storage classes 
 somewhat hinders me from using them where it makes sense in the 
 D/Objective-C bridge, since there most functions are instantiated by 
 templates where template arguments give the type of each function 
 argument. Perhaps there should be a way to specify "ref" and "out" in 
 template arguments...

I agree. Something like that is on the list.

Great!
 Also, I'd still like to have a non-null pointer type, especially for 
 clarifying function signatures. A template can do. If it were in the 
 language, however, it would be used by more people, which would be better.

I don't grok this notion "if it's in the language it would be used by more people". How does that come about?

No, I really think it's true that if it is in the language, explained right alongside nullable pointers, more people would learn about it and use it. Isn't this exact notion what made Walter add Ddoc and unit tests directly into the language?
 Does it mean templates are at such a high syntactic disadvantage? Maybe 
 we should do something about that then, such as replacing !() with 
 something else :o). If we put it in phobos (which after integration 
 will be usable alongside Tango) could it count as being in the 
 language?

Pointers that shouldn't be null are pretty common, possibly even more common than can-be-null pointers, which is why I think they deserve a good, short, easy to read and remember syntax. I'd even suggest changing the standard pointer syntax "*" so it only allows non-null pointers, and having something else, "*?", for nullable ones. This would force people to give it more consideration before allowing nullable pointers, and the same syntax could apply to objects too. That said, having a non-nullable pointer in the standard library would certainly be better than nothing. And the standard library should make use of it everywhere it makes sense. But is a standard-library solution going to work with "extern (C)" functions? I think it'd be sad if it didn't, and it would look strange if it did (C functions with template arguments!).
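The "a template can do" alternative might look roughly like this. It is a minimal sketch, not a Phobos type: `NonNull` and its members are hypothetical names, the non-null guarantee is enforced by run-time `assert`s at construction and rebinding (so it disappears in `-release` builds), and `@disable this()` forbids a default-constructed, and therefore null, instance. The `get()` escape hatch hints at the `extern (C)` question: the raw pointer still has to be extracted by hand.

```d
import std.stdio;

/// Hypothetical library type: a rebindable pointer that is never null.
struct NonNull(T)
{
    private T* ptr;

    @disable this(); // no default construction, which would yield null

    this(T* p)
    {
        assert(p !is null, "NonNull created from a null pointer");
        ptr = p;
    }

    /// Rebinding is allowed, but only to another non-null pointer.
    void opAssign(T* p)
    {
        assert(p !is null, "NonNull rebound to a null pointer");
        ptr = p;
    }

    /// Dereference with no null check: the invariant was checked on entry.
    ref T opUnary(string op : "*")() { return *ptr; }

    /// Raw pointer, e.g. for passing to an extern (C) function taking T*.
    T* get() { return ptr; }
}

void main()
{
    int x = 1, y = 2;
    auto p = NonNull!int(&x);
    *p = 10;             // writes through to x
    p = &y;              // rebinds to y
    writeln(x, " ", *p); // 10 2
}
```

Because rebinding is checked, this models the "null-forbidding but rebindable pointer" discussed earlier rather than a `ref`.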
 As I said below, I think many people in this group are already 
 comfortable with using pointers, which may explain why they're not so 
 interested. Having no one interested in something doesn't necessarily 
 mean they won't appreciate it when it comes.

That I totally agree with. It's happened a couple of times with D features.
 It does, however, reduce the incentive to keep going. So I 
 understand why you're backing off, even if it displeases me somewhat.

I'm sorry about how you feel. Now we're in a conundrum of sorts. You seem to strongly believe you can make some nice simplified regions work, and make people like them. Taking that to a proof is hard. The conundrum is, you are facing the prospect of putting work into it and creating a system that, albeit correct, is not enticing.

Currently, I'm just trying to convince you (and any other potential silent listeners) that it can work. I hadn't given much thought to the syntax before today, as I wanted to clear up the concepts first. But now, in part because of your syntactic arguments above, I'm wondering if this was the right path to take. I don't mind much if it never gets into the language, although I'd like it very much. I'm doing it for myself too, to better understand how you can document and analyse the region/scope relationship of various variables in a program piece by piece.
 I never said the need for dynamic regions would be rare: I said the garbage 
 collector makes them obsolete. If we can justify the need for dynamic regions 
 later, we can add them back (with all the added complexity they require), 
 but I'd try without them first.

Let's not forget that symbolic regions (for typing purposes) should not be confused with dynamic regions (for efficiency purposes). I agree we can do away with the latter and put them in later if we care. I disagree that dropping symbolic regions simplifies things.

I was under the impression that Cyclone's requirement for named regions came with its use of dynamic regions, which I now believe was incorrect. If I take this example from the paper:

    char ?`p rstrdup(region_t<`p>, const char ? s);

you *need* a name for the region handle. Since region handles are there for supporting dynamic regions, it therefore follows that you need named regions to make things work at all... well, here's the catch: you need named *region handles* as variables, not necessarily named regions, as you could always arrange the syntax so that the returned pointer is in the region of the region handle... or something like that.
 But dynamic arrays *are* pointers; how are you obviating the need for 
 them? If you find a solution for dynamic arrays, you'll have a solution 
 for pointers too.
 
 You could forbid dynamic arrays from referring to stack-allocated static 
 ones, or automatically allocate those dynamically when they escape in a 
 dynamic array. And if I were you, whatever you choose for arrays I'd 
 allow for pointers too, to keep things consistent. Pointers to heap 
 objects should be retained, in my opinion.

But a possible path is to make arrays safe and leave pointers for those cases in which efficiency is of utmost importance. With luck, those cases are rare.

"make arrays safe"... by forcing dynamic ones to always be on the heap? Or by implementing a full region system that applies only to arrays? Obviously it's not the latter; the former is the only choice I can see. And I think you should at least allow pointers to work with heap variables in SafeD... otherwise people will work around that by creating one-item arrays. :-)
 But are you reducing the need for pointers or hiding and restricting them?

Of course - that's the whole point. In fact, I'll insert a small correction: we are reducing the need for pointers BY hiding and restricting them. And that's a good thing. If you can do most of your work with restricted pointers (e.g. ref), then that's a net win.
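To make the ref-versus-pointer distinction concrete, here is a minimal sketch in today's D; the function names are hypothetical. Both functions modify the caller's variable, but only the pointer version is free to keep the address around after it returns. Under the proposal discussed in this thread, the ref version could not do so without a cast.

```d
// Module-level (thread-local) variable that a callee can stash pointers in.
int* saved;

// Pointer parameter: the callee may keep `p` after returning.
void incrementPtr(int* p)
{
    *p += 1;
    saved = p; // the address escapes; dangerous if p points into a stack frame
}

// ref parameter: the callee reads and writes the caller's variable. Under
// the proposal discussed here, taking &x and storing it would need a cast.
void incrementRef(ref int x)
{
    x += 1;
}

void main()
{
    int a = 1;
    incrementRef(a);
    assert(a == 2);
    assert(saved is null); // incrementRef left nothing behind

    incrementPtr(&a);
    assert(a == 3);
    assert(saved is &a);   // incrementPtr kept a pointer to main's local
}
```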

Whether you can work effectively only with ref or not remains to be seen. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 12 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2008-11-12 10:02:02 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Michel Fortin wrote:
 On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 So my first point is that since we have a garbage collector in D, and 
 moreover since we're likely to get one heap per thread in D2, we 
 don't need dynamic regions. The remaining regions are: 1) the shared 
 heap, 2) the thread-local heap, 3) All the stack frames; and you 
 can't allocate other stack frames than the current one. Because none 
 of these regions require a handle to allocate into, we (A) don't need 
 region handles.

 We still have many regions. Beside the two heaps (shared, 
 thread-local), each function's stack frame, and each block within 
 them, creates a distinct memory region. But nowhere do we need to know 
 exactly which region a function parameter comes from; what we need to 
 know is which address outlives which pointer, and then we can forbid 
 assigning addresses to pointers that outlive them. All we need is a 
 relative ordering of the various regions, and for that we don't need 
 to attach *names* to the regions so that you can refer explicitly to 
 them in the syntax. Instead, you could say something like "region of 
 (x)", or "region of (*y)" and that would be enough.

How, then, do you type the assignment example?

    void assign(int** p, int* r) { *p = r; }

How do you reflect the requirement that r's region outlives *p's region? But that's not even the point. Say you define some notation, such as:

    void assign(int** p, int* r) if (region(r) <= region(p));

But the whole point of regions was to _simplify_ notations like the above into:

    void assign(region R)(int*R* p, int*R r);

So although you think you simplified things by using region(symbol) instead of symbolic names, you complicated things. The compiler still needs to infer regions for each value, so it is as complicated as a named-regions compiler, and in addition you require the user to write bulkier expressions because you disallow the use of symbols. So everybody is worse off. Note how, in the example using a symbolic region, the outlives relationship is enforced implicitly by using the same symbol name in two places.
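The "outlives" check both sides rely on can be illustrated with a toy model, sketched below under a loud assumption: regions are reduced to a nesting depth, which only orders scopes along a single chain of nested frames (sibling frames at the same depth are not really comparable). `Region`, `outlives` and `assignIsSafe` are hypothetical names; none of this is D syntax or compiler code. The rule it encodes is the one stated for `*p = r`: the region r points into must outlive the place where the pointer is stored.

```d
// Toy model (hypothetical, not D syntax or a compiler API): a region is
// identified by how deeply nested its scope is. Depth 0 stands for the heap
// and globals; each nested stack frame or block adds one.
struct Region
{
    int depth;
}

// True if `a` is alive at least as long as `b` (along one chain of scopes).
bool outlives(Region a, Region b)
{
    return a.depth <= b.depth;
}

// The check behind `*p = r`: the slot `*p` lives in `slot`, and `r` points
// into `target`. Storing r is safe only if the pointee outlives the slot.
bool assignIsSafe(Region slot, Region target)
{
    return outlives(target, slot);
}

void main()
{
    auto globals = Region(0);
    auto abcFrame = Region(1); // e.g. the frame of abc() that holds `j`

    // Walter's `p = x` called as `bar(&j)`: a global slot receiving a
    // pointer into a stack frame is rejected.
    assert(!assignIsSafe(globals, abcFrame));

    // Storing a pointer to a global into a local slot is fine.
    assert(assignIsSafe(abcFrame, globals));

    // So is storing a pointer within the same region.
    assert(assignIsSafe(abcFrame, abcFrame));
}
```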

Everywhere I said there was no need for named regions, I also said named regions could be kept to ease the syntax. That said, I'm not so sure named regions are that good at simplifying the syntax. In your assign example above, the named-region version has an error: it forces the two pointers to be of the same region. That could be fine, but, assuming you're assigning to *p, it'd be more precise to write it like this:

    void assign(region R1, region R2)(int*R1* p, int*R2 r) if (R1 <= R2);

No, the code is correct as written (without the if). You may want to reread the paper with an eye for region subtyping rules. This partly backs up my point: understanding region analysis may be quite a burden for the average programmer. Even you, who took pains to think through everything and absorb the paper, are having trouble. And me too to be honest :o).
 Once we get there, I think the no-named region syntax is better.

This is invalidated by the wrong assertion above.
 That 
 said, for the swap example, where both values need to share the same 
 region, the named region notation is simpler:
 
     void swap(region R)(int*R a, int*R b);
     void swap(int* a, int* b) if (region(a) == region(b));

No, for that swap there is no need to specify any region. You can swap ints in any two regions. Probably you meant to use int** throughout.
 But I'd argue that most of the time regions do not need to be equal, but 
 are subset or superset of each other, so reusing variable names makes 
 more sense in my opinion.

Don't forget that using a region name twice may actually work with two different regions, so far as they are in a subtyping relationship. Region subtyping is key to both simplifying code and to understanding code after simplification.
 In any case, I prefer a notation where region constraints are attached 
 directly to the type instead of being expressed somewhere else. 
 Something like this (explained below):
 
     void assign(int*(r)* p, int* r) { *p = r; }
     void swap(ref int*(b) a, ref int*(a) b);

Sure. I'm sure there's an understanding that this doesn't make anything any simpler or any easier to implement or understand. It's just a minor change in notation, and IMHO not for the better.
 Here, a parenthesis suffix after a pointer indicates the region 
 constraint of the pointer, based on the region of another pointer.

I thought it means pointer to function. Oops.
 In the 
 first example, int*(r)* means that the integer pointer "*p" must not 
 live beyond the value pointed to by "r" (because we're going to assign "r" 
 to "*p"). In the second example, the value pointed to by "a" must not live 
 longer than the one pointed to by "b", and the value pointed to by "b" must not 
 live longer than the one pointed to by "a"; the net result is that they must 
 have the same lifetime and need to be in the same region.
 
 For something more complicated, you could give multiple comma-separated 
 constraints:
 
     void choose(ref int*(a,b) result, int* a, int* b)
     {
         result = rand() > 0.5 ? a : b;
     }

This all is irrelevant. You essentially change the syntax. Syntax is, again, the least of the problems to be solved.
 I suspect there are things you can't even express without symbolic 
 regions. Consider this example from Dan's slides:

 struct ILst(region R1, region R2) {
      int *R1 hd;
      ILst!(R1, R2) *R2 tl;
 }

 This code reflects the fact that the list holds pointer to integers in 
 one region, whereas the nodes themselves are in a different region. It 
 would be a serious challenge to tackle that without symbolic regions, 
 and it won't be simpler for anybody.

Today's templates are just fine for that. Just propagate variables through template arguments and apply region constraints to the members:

    struct ILst(alias var1, alias var2) {
        int*(var1) hd;
        ILst!(var1, var2)*(var2) tl;
    }

    int z;
    int*(z) a, b;
    ILst!(a, b) lst1;
    ILst!(&z, &z) lst2;

I hope you agree that this is just written symbols without much meaning. This is not half-baked. It's not even rare. The cow is still moving. I can't eat that! :o) I can't even start replying to it because there are so many actual and potential issues, I'd need to get to work on them first.
 We could even allow regions to propagate through type arguments too:
 
     struct ILst2(T1, T2) {
         int*(T1) hd;
         ILst2!(T1, T2)*(T2) tl;
     }
     ILst2!(typeof(&z), typeof(b)) lst3;
 
 I think this example is a good case for attaching region constraints 
 directly to types instead of expressing them as conditional expressions 
 elsewhere, as in "if (region a <= region b)".

I am thoroughly lost here, sorry. I can't even answer "this is so wrong" or "this is pure genius". Probably it's somewhere in between :o). At any rate, I suggest you develop a solid understanding of Cyclone if you want to build something related to it. [In the interest of coherence I snipped away unrelated parts of the discussion.]
 I'm sorry about how you feel. Now we're in a conundrum of sorts. You 
 seem to strongly believe you can make some nice simplified regions 
 work, and make people like them. Taking that to a proof is hard. The 
 conundrum is, you are facing the prospect of putting work into it and 
 creating a system that, albeit correct, is not enticing.

Currently, I'm just trying to convince you (and any other potential silent listeners) that it can work.

I understand I've been blunt throughout this post, but please side with me for a minute. I'm doing so for the following reasons: (a) I'm essentially writing this post in negative time; (b) I believe you currently don't have an attack on the problem you're trying to solve; (c) I believe it's worthwhile for you to develop an attack on the problem; (d) I think "we" = "the D community" should seriously consider safety and consequently things like region analysis.

You can now stop siding with me and side again with yourself. At this point you can easily guess that all of the above was to prepare you for an even blunter comment. Here goes.

You say you want to convince people "it can work". But right now there is no "it". You have no "it". Much less an "it" that can work.

But there is of course good hope that an "it" could emerge, and I encourage you to continue working towards that goal. It's just a lot more work than it might appear.

Andrei
Nov 12 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-13 00:53:50 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 Everywhere I said there was no need for named regions, I also said 
 named regions could be kept to ease the syntax. That said, I'm not so 
 sure named regions are that good at simplifying the syntax. In your 
 assign example above, the named-region version has an error: it forces 
 the two pointers to be of the same region. That could be fine, but, 
 assuming you're assigning to *p, it'd be more precise to write it like 
 that:
 
     void assign(region R1, region R2)(int*R1* p, int*R2 r) if (R1 <= R2);

No, the code is correct as written (without the if). You may want to reread the paper with an eye for region subtyping rules. This partly backs up my point: understanding region analysis may be quite a burden for the average programmer. Even you, who took pains to think through everything and absorb the paper, are having trouble. And me too to be honest :o).

OK, I've reread that part, and it's true that with Cyclone's subtyping rules it'd work fine with only one region name, because Cyclone implicitly creates two regions from that, the first being a subset of the other, just as I wrote explicitly here. But what I missed was one of Cyclone's syntactic constructs, not a concept of regions... or perhaps we have a different notion of what is syntax and what is a concept?
 Once we get there, I think the no-named region syntax is better.

This is invalidated by the wrong assertion above.

Yes and no. It's true that Cyclone's region subtyping makes the syntax prettier. On the other side, the programmer has to be aware of how it works, and especially aware that changing the order of the arguments will implicitly change the region relationship between them.
 That said, for the swap example, where both values need to share the 
 same region, the named region notation is simpler:
 
     void swap(region R)(int*R a, int*R b);
     void swap(int* a, int* b) if (region(a) == region(b));

No, for that swap there is no need to specify any region. You can swap ints in any two regions. Probably you meant to use int** throughout.

Hum, you're right, I meant to make these "ref int*".
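For reference, the corrected swap in plain D, without any region annotation, is sketched below (a hypothetical helper, not std.algorithm's swap). After the call each variable holds the other's pointer, which is why each pointee has to stay valid for as long as either variable can be read: the two pointees effectively need the same lifetime.

```d
// Swap two pointer variables in place; no region annotations involved.
void swap(ref int* a, ref int* b)
{
    int* tmp = a;
    a = b;   // a now points where b pointed
    b = tmp; // b now points where a pointed
}

void main()
{
    int x = 1, y = 2;
    int* px = &x;
    int* py = &y;
    swap(px, py);
    assert(*px == 2 && *py == 1);
    assert(px is &y && py is &x);
}
```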
 But I'd argue that most of the time regions do not need to be equal, 
 but are subset or superset of each other, so reusing variable names 
 makes more sense in my opinion.

Don't forget that using a region name twice may actually work with two different regions, so far as they are in a subtyping relationship. Region subtyping is key to both simplifying code and to understanding code after simplification.

I'm not convinced that region subtyping is so simple to understand for neophytes, especially because you may assume the same region at first glance. Cyclone isn't C++, but this region subtyping rule makes me think of one of those many little-known corners of C++, such as Koenig name lookup. But I consider this just a syntactic issue about how to express regions, though. And I may be completely wrong about its unintuitiveness.
 In any case, I prefer a notation where region constraints are attached 
 directly to the type instead of being expressed somewhere else. 
 Something like this (explained below):
 
     void assign(int*(r)* p, int* r) { *p = r; }
     void swap(ref int*(b) a, ref int*(a) b);

Sure. I'm sure there's an understanding that this doesn't make anything any simpler or any easier to implement or understand. It's just a minor change in notation, and IMHO not for the better.

OK, then we disagree here. I think this notation is better because it makes you think about things in terms of pointer lifetime vs. the pointed data's lifetime, which I think is much less abstract than variables being part of different regions where some regions encompass other regions. It's a shift in perspective from the syntactic approach of Cyclone, although under the hood the compiler would do mostly the same work.
 Here, a parenthesis suffix after a pointer indicates the region 
 constraint of the pointer, based on the region of another pointer.

I thought it means pointer to function. Oops.

And I thought the syntax was the least of your concerns right now? :-) This probably can't be the final syntax, but I think it makes things clear enough to talk about the concepts... for now.
 In the first example, int*(r)* means that the integer pointer "*p" must 
 not live beyond the value pointed to by "r" (because we're going to assign 
 "r" to "*p"). In the second example, the value pointed to by "a" must not 
 live longer than the one pointed to by "b", and the value pointed to by "b" 
 must not live longer than the one pointed to by "a"; the net result is that 
 they must have the same lifetime and need to be in the same region.
 
 For something more complicated, you could give multiple 
 comma-separated constraints:
 
     void choose(ref int*(a,b) result, int* a, int* b)
     {
         result = rand() > 0.5 ? a : b;
     }

This all is irrelevant. You essentially change the syntax. Syntax is, again, the least of the problems to be solved.

Ok then. Let's go to the real problems.
 I suspect there are things you can't even express without symbolic 
 regions. Consider this example from Dan's slides:
 
 struct ILst(region R1, region R2) {
      int *R1 hd;
      ILst!(R1, R2) *R2 tl;
 }
 
 This code reflects the fact that the list holds pointer to integers in 
 one region, whereas the nodes themselves are in a different region. It 
 would be a serious challenge to tackle that without symbolic regions, 
 and it won't be simpler for anybody.

Today's templates are just fine for that. Just propagate variables through template arguments and apply region constraints to the members:

    struct ILst(alias var1, alias var2) {
        int*(var1) hd;
        ILst!(var1, var2)*(var2) tl;
    }

    int z;
    int*(z) a, b;
    ILst!(a, b) lst1;
    ILst!(&z, &z) lst2;

I hope you agree that this is just written symbols without much meaning. This is not half-baked. It's not even rare. The cow is still moving. I can't eat that! :o) I can't even start replying to it because there are so many actual and potential issues, I'd need to get to work on them first.

If you mean there isn't any explanation, then you're right that explanations were somewhat missing from my last post. Sorry. I guess I was too tired to notice the lack of instructions. Basically, you apply the same rules as for the signatures in the preceding function examples.

For instance, "int*(var1)" means the hd pointer points to an int that lives at least as long as the one pointed to by var1 (var1 must be an "int*" pointer). This means that you can assign the content of var1 to it, or anything else that will live at least as long as var1. It also means you can take its value and place it in var1, or in any pointer with a shorter life.

Then we have "ILst!(var1, var2)*(var2)". The rules are the same as for the first, except that we have a different type behind the pointer, which must be valid through var2's lifetime.

The last code snippet shows how to use that template:

    int z;
    int*(z) a, b;
    ILst!(a, b) lst1;
    ILst!(&z, &z) lst2;

Here, we're declaring "int*(z)", which is a pointer to an int whose lifetime is equal to or longer than that of the address of z. (OK, there's an error here: it should have been "int*(&z)".) Normally you wouldn't write that explicitly; "int*" would be enough, as the compiler should determine the default constraints automatically. Then, when you instantiate ILst!(a, b), the template takes the lifetime of a and b (which is the lifetime of the address of z) and applies it to the pointers inside the struct.
 We could even allow regions to propagate through type arguments too:
 
     struct ILst2(T1, T2) {
         int*(T1) hd;
         ILst2!(T1, T2)*(T2) tl;
     }
     ILst2!(typeof(&z), typeof(b)) lst3;


Again, some explanations were missing... Basically, region/scoping/lifetime constraints are attached to pointers, which means that propagating a type ought to be enough to propagate the lifetime constraints too. "ILst2!(typeof(&z), typeof(b))" is exactly the same as "ILst!(&z, b)": ILst takes its constraints from variables, while ILst2 takes its constraints from types.

But the two previous examples are a little stretched to make the concept more similar to Cyclone. With my proposal, you can do much better than this. I think in most cases where you want to propagate constraints, you'll want to propagate a type too. If what you want is a linked list, it'd be better expressed generically like this:

    struct ListRoot(T) {
        ListNode!(T)* first;
    }
    struct ListNode(T) {
        T hd;
        ListNode!(T)* tl;
    }

    int global;

    void foo() {
        int a;
        ListRoot!(int*) listRoot;
        ListNode!(int*) listNode;
        listRoot.first = &listNode;
        listNode.hd = &a;
        listNode.hd = &global;
    }

Notice how there is absolutely no special annotation here; it's already valid template code. Now, let the compiler apply some defaults according to these rules: types declared in local variables will be allowed to point to values of their own region, and struct members will be allowed to point to values of the same region the struct comes from.
Annotated explicitly, the default annotations would look like this:

    struct ListRoot(T) {
        ListNode!(T)*(this) first; // pointer to something in the same region as this
    }
    struct ListNode(T) {
        T value;                   // if T is a pointer, it holds its own region annotations
        ListNode!(T)*(this) next;  // pointer to something in the same region as this
    }

    int global;

    void foo() {
        int a;
        ListRoot!(int*(&listRoot)) listRoot;
        ListNode!(int*(&listNode)) listNode;
        listRoot.first = &listNode;
        listNode.value = &a;
        listNode.value = &global;
    }

With this scheme, the lifetime of each node in the linked list needs to be equal to or longer than that of the preceding node (normally, they will all be equal), and the lifetime of the value pointer is determined by the type you give as a template argument to ListRoot and ListNode. Therefore, it becomes possible to construct the linked list on the stack when the root is on the stack, with no need for explicit annotations.

There is still one problem, though. If you want to swap two nodes, you can't, because there is no guarantee that the lifetime of the "this" pointer of a ListNode is equal to the lifetime of the "next" pointer. (In fact, the next pointer's lifetime is longer than or equal to the struct's lifetime.) So if we're going to swap or reorder nodes, we'll need a way to constrain the "this" pointer against the "next" pointer to create a circular reference, thus forcing the two pointers to point to the same region... perhaps something like this:

    struct ListNode(T) {
        ListNode*(next) this;
        T value;
        ListNode!(T)*(this) next;
    }

Not a very good syntax, though.
 I think this example is a good case for attaching region constraints 
 directly to types instead of expressing them as conditional expressions 
 elsewhere, as in "if (region a <= region b)".

I am thoroughly lost here, sorry. I can't even answer "this is so wrong" or "this is pure genius". Probably it's somewhere in between :o). At any rate, I suggest you develop a solid understanding of Cyclone if you want to build something related to it.

I'll side with "pure genius", but I also consider myself biased. :-)
 I'm sorry about how you feel. Now we're in a conundrum of sorts. You 
 seem to strongly believe you can make some nice simplified regions 
 work, and make people like them. Taking that to a proof is hard. The 
 conundrum is, you are facing the prospect of putting work into it and 
 creating a system that, albeit correct, is not enticing.

Currently, I'm just trying to convince you (and any other potential silent listeners) that it can work.

I understand I've been blunt throughout this post, but please side with me for a minute. I'm doing so for the following reasons: (a) I'm essentially writing this post in negative time; (b) I believe you currently don't have an attack on the problem you're trying to solve; (c) I believe it's worthwhile for you to develop an attack on the problem, (d) I think "we" = "the D community" should seriously consider safety and consequently things like region analysis.

I don't mind about (a), and I agree about (d). I'll say that, because of my lack of expertise with Cyclone, I have some difficulty expressing my proposal as a list of differences from Cyclone (it's difficult enough without that). You're the one asking for such a comparison and increasing the difficulty. I do not dislike the challenge, but I don't think you can take this as proof that I don't understand the problem I'm trying to solve, when I may just be mixing up some things about the approach taken by Cyclone. Another thing not helping is that my original proposal has evolved a little since I started the "full scope analysis proposal" thread. I also revamped the syntax I use to talk about the problem (and apparently I should do it again to avoid a conflict with function pointers). Hunting through previous posts for the details I left out of the more recent ones doesn't help anyone understand what I'm talking about. I'm thinking that maybe I should put everything in one document, to have a coherent proposal that could evolve as a whole instead of one scattered across various posts between which the syntax I use and some concepts have evolved.
 You can now stop siding with me and side again with yourself. At this 
 point you can easily guess that all of the above was to prepare you for 
 an even blunter comment. Here goes.
 
 You say you want to convince people "it can work". But right now there 
 is no "it". You have no "it". Much less an "it" that can work.
 
 But there is of course good hope that an "it" could emerge, and I 
 encourage you to continue working towards that goal. It's just a lot 
 more work than it might appear.

I'm pretty sure I hold that "it" just now, or something very near it. It's just that it seems I haven't explained it well enough for you (and probably anyone) to understand correctly. I should probably write it all down in one coherent and more formal document rather than scattering all the details over many different posts as half-documented concept-name-changing written-too-fast examples. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Nov 14 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Just to fix a little misunderstanding:

Michel Fortin wrote:
 On 2008-11-13 00:53:50 -0500, Andrei Alexandrescu 
 with me for a minute. I'm doing so for the following reasons: (a) I'm 
 essentially writing this post in negative time;


By this I meant I don't have time (t < 0), not that I was writing while being at a time when I had a negative outlook. Andrei
Nov 14 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Jacques wrote:
 On Sun, 02 Nov 2008 10:12:46 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 * Make all ref parameters scoped by default. It will be impossible 
 for a function to escape the address of a ref parameter without a 
 cast. I haven't proved it to myself yet, but I believe that if 
 pointers are not used and with the amendments below regarding arrays 
 and delegates, this makes things entirely safe. In Walter's words, "it 
 buttons things pretty tight".

Does this mean the whole shared/local/scope issue for classes is being sidestepped for now?

What issue do you have in mind? Andrei
Nov 02 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 I think everyone who thinks a scope decoration proposal is going to 1) 
 solve all scope escape issues and 2) be easy to use is dreaming :P

only allow and implement the scope storage class for delegates, which simply means the callee will not squirrel away a pointer to delegate. That would allow us to solve the closure issue and for now sleep some more on the other issues.

If scope delegates means trust the coder knows what he is doing (in the beginning), I agree with that plan of attack.

It looks like things will move that way. Bartosz, Walter and I talked a lot yesterday about it - a lot of crazy things were on the table! The next step is to make this a reference, which is highly related to escape analysis. At the risk of anticipating a bit an unfinalized design, here's what's on the table: * Continue an "anything goes" policy for *explicit* pointers, i.e. those written explicitly by user code with stars and stuff. * Disallow pointers in SafeD.

Isn't this already the case? BTW, slightly OT, I read Bartosz's article on digitalmars about SafeD. This isn't an implemented language, right? Is the plan for D to become SafeD? Or is there going to be a compiler switch? Or something else, maybe? I've heard SafeD mentioned a lot on this NG without ever really knowing whether it exists concretely or only in theory.
 * Make all ref parameters scoped by default. It will be impossible for 
 a function to escape the address of a ref parameter without a cast. I 
 haven't proved it to myself yet, but I believe that if pointers are not 
 used and with the amendments below regarding arrays and delegates, this 
 makes things entirely safe. In Walter's words, "it buttons things pretty 
 tight".

I think this sounds reasonable. However, will there be a way to override this behavior? For example, some modifier to signify that a reference is not scope? The advantage of having the other be the default is that the scope keyword already exists. Having to cast every time I convert to a pointer will be unpleasant, but not horrific. I'd prefer to state once, "this is an unsafe reference", preferably in the signature, and be able to use it like before. The same semantics still apply as far as calling the function; it just tells the compiler "the author of this function knows what he is doing". You would also disallow this keyword in SafeD, which would be easy to filter. noscope would be a good keyword...
 * Make this a reference so that it obeys what references obey.

This is one place where I wholeheartedly think it should be done. One rarely needs the address of this; in fact, I generally end up returning *this quite a bit in struct operators, so this change will be most welcome.
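A minimal sketch of the struct-operator case, assuming `this` is a reference inside struct member functions as proposed: the operator returns `this` by ref, so chaining works without ever writing `*this` or creating a pointer to the struct. `Counter` is a hypothetical example type.

```d
struct Counter
{
    int value;

    // `this` is used as a reference: returned by ref, no dereference needed.
    ref Counter opOpAssign(string op)(int rhs) if (op == "+")
    {
        value += rhs;
        return this;
    }
}

void main()
{
    Counter c;
    (c += 2) += 3; // the second += applies to the same object via the returned ref
    assert(c.value == 5);
}
```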
 * If people want to implement e.g. linked lists, they should do it with 
 classes. Implementing them with structs will require casts to obtain and 
 escape &this. That also means they'd be using pointers, so anything goes - 
 pointers are not restricted from escaping.

I implemented dcollections' node-based containers (tree, hash, linked list) as structs, because I wanted to control their allocation. I agree with others that the de facto standard is going to be structs, since performance is paramount, and you have little need for OOP in the internal node structures. Also, if noscope (or an equivalent keyword) is implemented as above, you can easily decorate your pointer-using functions:

    struct LinkNode(T)
    {
        noscope
        {
            LinkNode *find(T value);
            LinkNode *findReverse(T value);
            ...
        }
    }
 * There are two cases in which things escape without the user explicitly 
 using pointers: delegates and dynamic arrays initialized from 
 stack-allocated arrays.

 * For delegates require the scope keyword in the signature of the callee. 
 A scoped delegate cannot be stored, only called or passed down to another 
 function that in turn takes a scoped delegate. This makes scope delegates 
 entirely safe. Non-scoped delegates use dynamic allocation.

If noscope (or an equivalent keyword) is used, can we make scope the default? I'd much rather have the default be the higher-performance, more commonly used option. Also, when you say stored, do you mean stored anywhere, or stored anywhere but the stack? There is no harm in storing a scope delegate in a local variable (as long as it is also scope).
 * We don't have an idea for dynamic arrays initialized from 
 stack-allocated arrays.

Hm... this is a tough one. At the very least, you can disallow returning such arrays, as long as the compiler can prove the array's origin. That should cover 90% of the issues. The other 10% are ones that are passed into functions. You might employ the same techniques as for delegates, but then we are stuck with the same problems that full escape analysis has. Plus, the need to return a slice of an array is much greater than the need to return a delegate.

You could also argue that an array contains a pointer, and morphing into a dynamic array is the same as taking the address of a stack local variable (which would require a cast). But that means SafeD cannot use dynamic arrays to reference static arrays. However, you can then argue that dynamic arrays allocated using new are OK for SafeD because you didn't take the address of a local stack variable. My understanding is that in SafeD, safety trumps performance.

Note that a static array could be used for a rebindable reference, since the resulting slice holds a rebindable pointer, so it is really an unsafe operation:

    int[2] a;
    int[] aref = a[0..1]; // reference to a[0]
    aref = a[1..2];       // rebind to a[1]

-Steve
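A small runnable sketch of the rebinding point, assuming nothing beyond ordinary D slicing: a slice of a fixed-size stack array is a pointer plus a length into that array, and assigning a new slice simply moves the pointer. The function name rebindDemo is made up for illustration.

```d
// Returns the element offset within `a` that the slice points at after
// it has been rebound.
size_t rebindDemo()
{
    int[2] a = [10, 20];       // fixed-size array on the stack
    int[] aref = a[0 .. 1];    // slice referring to a[0]
    aref = a[1 .. 2];          // rebound: now refers to a[1]
    aref[0] = 99;              // writes through to a[1]
    assert(a[1] == 99);
    return cast(size_t)(aref.ptr - a.ptr);
}
```

Nothing here escapes, but the same slice value could just as easily be returned or stored in a global, which is why the origin of a slice matters to any escape rule.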
Nov 03 2008
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 * Disallow pointers in SafeD.

Isn't this already the case?

At a point we wanted to allow pointers in restricted ways.
 BTW, slightly OT, I read Bartosz' article on digitalmars about SafeD.  This 
 isn't an implemented language right?  Is the plan for D to become SafeD?  Or 
 is there going to be a compiler switch?  Or something else maybe?  I've 
 heard SafeD mentioned a lot on this NG, without ever really knowing how it 
 exists (concrete or theory).

It's planned as a compiler switch and module option. Essentially SafeD is slated to be a safe, proper, well-defined subset of D. It was Bartosz's idea, and IMHO an important dimension of D's development.

Walter is implementing module safety options like this:

    module(safe) mymodule;

which means the module must always be compiled with safety on. Conversely,

    module(system) mymodule;

means the module is getting its hands greasy.
 * Make all ref parameters scoped by default. It will be impossible for 
 a function to escape the address of a ref parameter without a cast. I 
 haven't proved it to myself yet, but I believe that if pointers are not 
 used and with the amendments below regarding arrays and delegates, this 
 makes things entirely safe. In Walter's words, "it buttons things pretty 
 tight".

I think this sounds reasonable. However, will there be a way to override this behavior? For example, some modifier to signify that a reference is not scope? The advantage to having the other be the default is that the scope keyword already exists.

Good point. I think escaping the address of a ref should be allowed via a cast.
 Having to cast every time I convert to a pointer will be unpleasant, but 
 not horrific.  I'd prefer to state one time 'this is an unsafe reference', 
 preferably in the signature, and be able to use it like before.  The same 
 semantics still apply as far as calling the function; it just tells the 
 compiler "the author of this function knows what he is doing".

Currently Walter plans to do that at module granularity.
 You would also disallow this keyword usage in SafeD which would be easy to 
 filter.
 
 noscope would be a good keyword...
 
 * Make this a reference so that it obeys what references obey.

This is one place where I wholeheartedly think it should be done. One rarely needs the address of this; in fact, I generally end up returning *this quite a bit in struct operators, so this change will be most welcome.
 * If people want to implement e.g. linked lists, they should do it with 
 classes. Implementing them with structs will require casts to obtain and 
 escape &this. That also means they'd be using pointers, so anything goes - 
 pointers are not restricted from escaping.

I implemented dcollections' node-based containers (tree, hash, linked list) as structs, because I wanted to control their allocation. I agree with others that the de facto standard is going to be structs, since performance is paramount and you have little need for OOP in the internal node structures. Also, if noscope (or an equivalent keyword) is implemented as above, you can easily decorate your pointer-using functions:

    struct LinkNode(T)
    {
        noscope
        {
            LinkNode *find(T value);
            LinkNode *findReverse(T value);
            ...
        }
    }
 * There are two cases in which things escape without the user explicitly 
 using pointers: delegates and dynamic arrays initialized from 
 stack-allocated arrays.

 * For delegates require the scope keyword in the signature of the callee. 
 A scoped delegate cannot be stored, only called or passed down to another 
 function that in turn takes a scoped delegate. This makes scope delegates 
 entirely safe. Non-scoped delegates use dynamic allocation.

If noscope (or an equivalent keyword) is used, can we make scope the default? I'd much rather have the default be the higher-performance, more commonly used option.

I think safety should be the default. People who care about efficiency will be willing to write a little bit more. I agree that this is annoying if that's the more frequent situation.
 Also, when you say stored, do you mean stored anywhere, or stored anywhere 
 but the stack?  Because there is no harm in storing a scope delegate in a 
 local variable (as long as it is also scope).

That could be allowed, but probably it's not really needed.
 * We don't have an idea for dynamic arrays initialized from 
 stack-allocated arrays.

Hm... this is a tough one. At the very least, you can disallow returning such arrays, as long as the compiler can prove the array's origin. That should cover 90% of the issues. The other 10% are ones that are passed into functions. You might employ the same techniques as for delegates, but then we are stuck with the same problems that full escape analysis has. Plus, the need to return a slice of an array is much greater than the need to return a delegate.

You could also argue that an array contains a pointer, and morphing into a dynamic array is the same as taking the address of a stack local variable (which would require a cast). But that means SafeD cannot use dynamic arrays to reference static arrays. However, you can then argue that dynamic arrays allocated using new are OK for SafeD because you didn't take the address of a local stack variable. My understanding is that in SafeD, safety trumps performance.

Note that a static array could be used for a rebindable reference, since the resulting slice holds a rebindable pointer, so it is really an unsafe operation:

    int[2] a;
    int[] aref = a[0..1]; // reference to a[0]
    aref = a[1..2];       // rebind to a[1]

I agree with the above. The floor is open for more ideas.

Andrei
Nov 03 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 BTW, slightly OT, I read Bartosz' article on digitalmars about SafeD. 
 This isn't an implemented language right?  Is the plan for D to become 
 SafeD?  Or is there going to be a compiler switch?  Or something else 
 maybe?  I've heard SafeD mentioned a lot on this NG, without ever really 
 knowing how it exists (concrete or theory).

It's planned as a compiler switch and module option. Essentially SafeD is slated to be a safe, proper, well-defined subset of D. It was Bartosz's idea, and IMHO an important dimension of D's development.

I personally probably won't use it, as I feel I have enough experience to avoid the problems that SafeD will prevent. But it does sound like a very important version of the language.
 Walter is implementing module safety options like this:

 module(safe) mymodule;

 which means the module must always be compiled with safety on. 
 Conversely,

 module(system) mymodule;

 means the module is getting its hands greasy.

Hm... that's kinda too high level. I might have one function in a class that does things that are 'unsafe', but I don't want to have to mark my whole class as unsafe.
 * For delegates require the scope keyword in the signature of the 
 callee. A scoped delegate cannot be stored, only called or passed down 
 to another function that in turn takes a scoped delegate. This makes 
 scope delegates entirely safe. Non-scoped delegates use dynamic 
 allocation.

If noscope (or an equivalent keyword) is used, can we make scope the default? I'd much rather have the default be the higher-performance, more commonly used option.

I think safety should be the default. People who care about efficiency will be willing to write a little bit more. I agree that this is annoying if that's the more frequent situation.

What I meant was: make the default behavior as if scope were marked on the delegate. This doesn't make it unsafe (you said so yourself), but it does line up with most code today, which doesn't do anything with a delegate but call it. That means fewer decorations on current code that is already considered safe. The most obvious usage is opApply: every opApply will have to have its delegate marked scope unless it's the default. The only downside is that you then have to come up with a way to mark a delegate as noscope.
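A minimal sketch of the opApply case, written with today's D syntax for a scope delegate parameter. The IntBag type and the sumBag/firstAbove helpers are hypothetical; the relevant part is that opApply only calls dg and never stores it, which is the property a scope-by-default rule would let the compiler assume.

```d
// A container whose opApply only calls the loop-body delegate.
struct IntBag
{
    int[] items;

    int opApply(scope int delegate(ref int) dg)
    {
        foreach (ref x; items)
        {
            // A non-zero result means the foreach body hit break/return.
            if (auto r = dg(x))
                return r;
        }
        return 0;
    }
}

// The foreach body becomes a delegate that reads and writes `total`
// in this function's frame.
int sumBag(IntBag bag)
{
    int total = 0;
    foreach (ref x; bag)
        total += x;
    return total;
}

// Returning from inside the body makes dg return non-zero, which
// opApply propagates so the loop stops early.
int firstAbove(IntBag bag, int limit)
{
    foreach (ref x; bag)
        if (x > limit)
            return x;
    return -1;
}
```

Every opApply in a library would look like this, so whether scope has to be written out on each one is exactly the defaulting question above.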
 Also, when you say stored, do you mean stored anywhere, or stored 
 anywhere but the stack?  Because there is no harm in storing a scope 
 delegate in a local variable (as long as it is also scope).

That could be allowed, but probably it's not really needed.

I can think of certain cases that need it: for example, if you have two inner functions with the same signature and you want to decide at runtime which one to use, you might store the one to use in a local variable.

---------------------------

It seems to me that, the way you describe it, you will have either safety checks or no safety checks at the module level. I think that is a mistake. Most of my code should be safe, and I'd prefer it to be safety checked. The ideas that all of you have come up with in this post are very good, and should be easy to use for most code. I especially like the requirement to cast in order to take the address of a reference. But if all those checks go away when you mark your module as system, then I will either have to split my modules into safe and unsafe parts, or just not use safety checks where they could be used.

I'd prefer to be able to mark specific functions/parameters as unsafe or safe so I know exactly where I have disabled the safety checks. And I'd prefer safety by default, rather than having to mark code for safety, as long as the safety can be easily verified and allows most usages.

I really like how pointers are simply considered unsafe, so all safety checks are off. That draws a clear line where it's difficult to verify safety without hindering ability. The further check of compliance with SafeD can eliminate possible pointer usages that you miss.

-Steve
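Steve's runtime-selection case can be sketched as follows. The names scale, once and doubled are hypothetical; the chosen delegate is kept only in a local scope variable and called, never returned or stored anywhere that outlives the function.

```d
int scale(bool twice, int v)
{
    int factor = 3;                     // captured by both nested functions
    int once(int x) { return x * factor; }
    int doubled(int x) { return x * factor * 2; }

    // Picked at run time, held only in this local, and called once.
    scope int delegate(int) pick = twice ? &doubled : &once;
    return pick(v);
}
```

Since pick dies with the frame of scale, storing the delegate there cannot outlive the variables it refers to.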
Nov 04 2008
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Robert Jacques" wrote
 On Sat, 01 Nov 2008 12:00:10 -0400, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 "Michel Fortin" wrote
 The only way I can see to solve this is to do it at link time.  When you
 link, piece together the parts of the graph that were incomplete, and see
 if they all work.  It would be a very radical change, and might not even
 work with the current linkers, especially if you want to do shared
 libraries, where the linker is built into the OS.

I think you're dreaming... not that it's a bad thing to have ambition, but that's probably not even possible.

Sure it is ;) You have to write a special linker. I think everyone who thinks a scope decoration proposal is going to 1) solve all scope escape issues and 2) be easy to use is dreaming :P

Various research languages have shown both 1 and 2 are possible.

I think 1 can be possibly done. 2 is a matter of subjectivity, and so far, I haven't seen an example of it. But I also don't want D to become a purely academic language. I want it to keep the system-level performance and usability that drew me to it in the first place.
 I don't think it's bad to force interfaces to be well documented, and
 documented in a format that the compiler can understand to find errors
 like this.

I think this concept is going to be really hard for a person to decipher, and really hard to get right.

It takes some thinking to get the prototype right at first. But it takes less caution calling the function later with local variables since the compiler will either issue an error or automatically fix the issue by allocating on the heap when an argument requires a greater scope.

I hope to avoid this last situation. Having the compiler make decisions for me, especially when heap allocation occurs, is bad.

How so? Please explain why it's bad (an opinion by itself isn't an argument).

Allocating on the heap involves locking a global mutex (as long as the heap is global), searching for a free memory space, possibly running a garbage collection cycle, and finally possibly allocating more memory from the OS. All of these are very expensive compared to adjusting the stack pointer.

For instance, I wrote a 'chunk allocator' which uses D's allocator to allocate memory in chunks instead of going to the GC for each piece in dcollections' implementation. Doing this achieved at least a 2x speedup because I was calling on the GC less often. The author of Tango's new container implementation wrote a similar allocator that's even faster than that because it doesn't use the GC for any allocation (of course, you cannot use it to allocate items which have references, because the GC doesn't look at that memory).

In Tango, many operations rely on using stack allocation for buffers and temporary classes. If the compiler decides I don't know what I'm doing and helpfully allocates those on the heap for my protection, I just lost all the performance that I purposely built the library to have. This is one of the main arguments I hear from the other Tango devs against moving to D2: the automatic dynamic closure.

I think many people are not aware of how important it is to avoid heap allocation when possible. It is one of the central goals that makes Tango so much faster than other libraries.

-Steve
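The chunking idea can be sketched like this. It is not the dcollections or Tango allocator, only an assumed minimal version: each GC request returns a block of chunkSize slots, and individual nodes are handed out from that block. It never reuses freed nodes, which a real container allocator would need to do.

```d
// Hypothetical chunk allocator for fixed-size items of type T.
struct ChunkAllocator(T, size_t chunkSize = 64)
{
    private T[] chunk;      // current block obtained from the GC
    private size_t used;    // slots of `chunk` already handed out
    size_t gcCalls;         // how many times we went to the GC

    T* allocate()
    {
        if (used == chunk.length)
        {
            chunk = new T[chunkSize];   // one GC allocation per chunk
            used = 0;
            ++gcCalls;
        }
        return &chunk[used++];
    }
}
```

Handing out 200 items with a chunk size of 64 costs four GC allocations instead of 200; earlier chunks stay alive because the handed-out pointers still refer into them.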
Nov 03 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-11-03 14:47:25 -0500, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 I hope to avoid this last situation.  Having the compiler make decisions
 for me, especially when heap allocation occurs, is bad.

How so? Please explain why it's bad (an opinion by itself isn't an argument).

Allocating on the heap involves locking a global mutex (as long as the heap is global), searching for a free memory space, possibly running a garbage collection cycle, and finally possibly allocating more memory from the OS. All of these are very expensive compared to adjusting the stack pointer.

I won't dispute this. I'll note that the upcoming "shared" keyword may help avoid locking a global mutex for unshared variables, but even without the mutex the operation is still expensive.
 For instance, I wrote a 'chunk allocator' which uses D's allocator to
 allocate memory in chunks instead of going to the GC for each piece in
 dcollections' implementation.  Doing this achieved at least a 2x speedup
 because I was calling on the GC less often.  The author of Tango's new
 container implementation wrote a similar allocator that's even faster than
 that because it doesn't use the GC for any allocation (of course, you cannot
 use it to allocate items which have references, because the GC doesn't look
 at that memory).

Nothing of the sort should be prevented by a scoping system. If it is, then I'd consider the system a failure.
 In Tango, many operations rely on using stack allocation for buffers and
 temporary classes.  If the compiler decides I don't know what I'm doing and
 helpfully allocates those on the heap for my protection, I just lost all the
 performance that I purposely built the library to have.  This is one of the
 main arguments I hear from the other Tango devs about moving to D2, the
 automatic dynamic closure.

Then we must make sure the compiler doesn't heap-allocate when it doesn't absolutely need to. And, *in addition*, when the programmer really needs to be sure that a variable is not heap-allocated, marking a variable "scope" would do the trick.
 I think many people are not aware of how important it is to avoid heap
 allocation when possible.  It is one of the central goals that makes Tango
 so much faster than other libraries.

I agree with your first assertion (and am not familiar enough with Tango to say anything about the second), and this is exactly why I'm in favor of the compiler deciding what to heap-allocate. People are not aware enough of how important it is to avoid heap allocation, so I expect that if the compiler can be made to know about scopes, it can avoid heap allocation where many users wouldn't bother (especially in a garbage-collected language where you can heap-allocate without thinking). That would result in faster programs with fewer bugs, all without having to think about the technical details.

Note that I may be wrong about this, but there's no way to be sure without trying. Anyway, once we have a proper scoping system, it'll be easy to try and decide between auto-allocation and simply enforcing constraints by emitting errors.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Nov 04 2008
"Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Sun, Nov 2, 2008 at 10:39 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Andrei Alexandrescu Wrote:
 * If people want to implement e.g. linked lists, they should do it with
 classes.

Hm... I see. But I am not sure I like that. Isn't that a waste of memory? All objects have a vtable.

No, they have a *pointer* to a vtable. There is only one vtable per class, allocated in static memory. You only pay the cost of one pointer.
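To see the per-object cost concretely, the following sketch compares a struct node with a class node and checks that two instances share one vtable. The node types are hypothetical. Note that on current D compilers each class instance also carries a hidden monitor slot next to the vtable pointer, so the per-object overhead is typically two words rather than one; exact sizes depend on the target.

```d
struct StructNode
{
    int value;
    StructNode* next;
}

class ClassNode
{
    int value;
    ClassNode next;
}

// Bytes needed per node in each design.
enum structNodeBytes = StructNode.sizeof;
enum classNodeBytes = __traits(classInstanceSize, ClassNode);

// The first hidden word of a D class instance is the vtable pointer,
// so two instances of the same class should hold the same value there.
bool sameVtable(ClassNode a, ClassNode b)
{
    return (cast(void**) cast(void*) a)[0] == (cast(void**) cast(void*) b)[0];
}
```

On a 64-bit target this typically gives 16 bytes for the struct node and 32 for the class instance (before GC rounding), while the vtable itself exists only once.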
Nov 02 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-10-29 15:10:00 -0400, "Robert Jacques" <sandford jhu.edu> said:

 On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin  
 <michel.fortin michelf.com> wrote:
 
 Basically, all the scope variables you can get are guaranteed to be in
 the current or in some ancestor scope. To allow a reference to a scope
 variable, or a scope function, to be put inside a member of a struct or
 class, you only need to prove that the struct or class lifetime is
 smaller than or equal to that of the reference to your scope variable. If
 you could tell the compiler the scope relationship of the various
 arguments, then you'd have pretty good scope analysis.
 
 For instance, with this syntax, we could define i to be available 
 during  the whole lifetime of o:
 
 	void foo(scope MyObject o, scope(o) int* i)
 	{
 		o.i = i;
 	}

What does the scope part of 'scope MyObject o' mean? (i.e. is this D's current scope or something else?)

OK, I should have defined that better. It means that o is bound to the caller's scope (possibly on the stack). Scopes are created for each function and each {}-delimited block in them; basically it's the stack of the current thread. Once you exit a scope, its variables cease to exist and we must ensure there are no more references to them.

In this case, "scope MyObject o" means that we're receiving a MyObject reference which could be pointing to somewhere down in the stack *or* the heap. We have to consider the most restrictive constraint, however, so let's say it's in the stack. The rule is that you can't place a reference to a scoped variable anywhere below its scope in the stack, making sure that you can't keep a reference to a variable which no longer exists once the top scope has disappeared.

Scope stack (call stack with the global scope at the bottom):

    1. foo ( scope MyObject o = function1.o ) { }
    2. function1 () { scope MyObject o, int i }
    3. main () { }
    ...
    n. global scope

In practical terms, "scope MyObject o" means that we can't put a reference to the object anywhere that lives beyond the current function call... except in a scope return value, but I haven't gotten into that yet.
 What does 'scope(o)' explicitly mean? I'm going to assume scope(o) 
 means  the scope of o.

That's it... mostly. scope(o) is the scope of o, or any scope below o. Take it as any scope valid as long as o exists. If o was not scope, scope(o) would be noscope.
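The rule can be modelled as a comparison of scope depths. This is a toy model under stated assumptions (depth 0 is the global scope and the GC heap; each entered function or block adds one), not a compiler implementation; ScopedVar and mayStore are hypothetical names.

```d
struct ScopedVar
{
    string name;
    size_t depth;   // scope in which the variable lives (0 = global/heap)
}

// A reference to `referent` may be stored inside `holder` only if the
// holder dies no later than the referent, i.e. it lives at the same or a
// deeper (more recent) scope. This is the constraint scope(o) expresses.
bool mayStore(ScopedVar holder, ScopedVar referent)
{
    return holder.depth >= referent.depth;
}
```

In bar() both o and i live in the same scope, so the call is accepted; in the "reverse" test1/test2 example, i lives one level deeper than o, so the check fails and the compiler would either move i to the heap (depth 0, which always passes) or report an error.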
 So you could do:
 
 	void bar()
 	{
 		scope int i;
 		scope MyObject o = new MyObject;
 		foo(o, &i);
 	}
 
 And the compiler would let it pass because foo guarantees not to keep
 references to i outside of o's scope, and o's scope is the same as i.
 
 Or you could do:
 
 	void test1()
 	{
 		int i;
 		test2(&i);
 	}
 
 	void test2(scope int* i)
 	{
 		scope o = new MyObject;
 		foo(o, &i);


 	}
 
 Again, the compiler can statically check that test2 won't keep a  
 reference to i outside of the caller's scope (test1) because o scope is 
  limited to test2.

The way I read your example, no useful escape analysis can be done by the compiler, and it works mainly because i is a pointer to a value type.

It's not escape analysis. It's scoping constraints, enforced by making sure that every function declares what may escape and what may not. If this was a pure value type passed by copy, scope would be meaningless indeed, as there would be no reference that could escape.
 And if you try the reverse:
 
 	void test1()
 	{
 		scope o = new MyObject;
 		test2(o);
 	}
 
 	void test2(scope MyObject o)
 	{
 		int i;
 		foo(o, &i);
 	}
 
 Then the compiler could determine automatically that i needs to escape  
 test2's scope and allocate the variable on the heap to make its 
 lifetime  as long as the object's scope (as it does currently with 
 nested  functions) [see my reserves to this in post scriptum]. This 
 could be  avoided by explicitly binding i to the current scope, in which 
 case the  compiler could issue a scope error:

The way I read this is o is of type scope MyObject, i is of type scope int and therefore foo(o,&i) is valid and an escape happens.

That's my point. The compiler can detect that an escape may happen just by looking at the function prototype for foo. The prototype tells us that foo needs i to be at the same or a lower scope than o, something we don't have here. The compiler can then decide to allocate i dynamically on the heap to make sure it exists for at least the scope of o, or we could decide to just make that illegal.

I prefer automatic heap allocation, as it means we can get rid of the decision to statically or dynamically allocate variables: the compiler can decide, based on the function prototypes, whichever is best. For cases where you really mean a variable to be on the stack, you can use scope, as in:

    scope int i;

and the compiler would just issue an error if you attempt to give a reference to i to a function that wants to use it in a lower scope. Otherwise, the compiler would be free to choose between a local and a heap-allocated variable.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Oct 30 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-10-30 09:04:10 -0400, "Robert Jacques" <sandford jhu.edu> said:

 Just to clarify:
    	void test2(scope MyObject o)	// the scope of o is a parent of test2
   	{
   		int i;			// the scope of i is test2
   		foo(o, &i);		// foo(o,&i) requires &i to have o's scope or a parent
   				// of o's scope, so i must be heap (the root parent) allocated.
   	}
 
 A problem I see is that once shared/local are introduced, you have  
 multiple heaps where i should be allocated, depending on the runtime 
 type  of o. How would this be handled in this scheme?

Well, it all depends on whether foo wants its second argument to be shared or not. If foo's declaration was like this:

    void foo(scope MyObject o, scope(o) shared int* i);

then you'd need to use "shared int i" in test2 to avoid an error at the call site.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Oct 30 2008
Michel Fortin <michel.fortin michelf.com> writes:
On 2008-10-30 14:07:42 -0400, "Robert Jacques" <sandford jhu.edu> said:

 On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin  
 <michel.fortin michelf.com> wrote:
 
 P.P.S.: This syntax doesn't fit very well with the current  
 scope(success/failure/exit) feature.

contract-like syntax:

    void foo (myObject o, int* i) if (o.scope <= i.scope) { ... }

Hum, but can that syntax guarantee that a reference to o or i won't escape the current function's scope, the way void foo(scope Object o); does?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Oct 30 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Wed, 29 Oct 2008 11:01:35 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 "Michel Fortin" wrote
 On 2008-10-28 23:52:04 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I've run across some academic work on ownership types which seems
 relevant  to this discussion on share/local/scope/noscope.

I haven't read the paper yet, but the overview seems to go in the same direction as I was thinking.

This is exactly the kind of thing I DON'T want to have. Here, you have to specify everything, even though the compiler is also doing the work, and making sure it matches. Tack on const modifiers, shared modifiers, and pure functions and there's going to be more decorations on function signatures than there are parameters. Note that especially this scope stuff will be required more often than the others. I'd much rather have either no checks, or have the compiler (or a lint tool) do all the work to tell me if anything escapes. -Steve

Note that one of the major points in the Pedigree paper is static type inference, so you don't have to specify everything.
Oct 29 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2008-10-28 23:52:04 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I've run across some academic work on ownership types which seems  
 relevant  to this discussion on share/local/scope/noscope.

I haven't read the paper yet, but the overview seems to go in the same direction as I was thinking.

Basically, all the scope variables you can get are guaranteed to be in the current or in some ancestor scope. To allow a reference to a scope variable, or a scope function, to be put inside a member of a struct or class, you only need to prove that the struct or class lifetime is smaller than or equal to that of the reference to your scope variable. If you could tell the compiler the scope relationship of the various arguments, then you'd have pretty good scope analysis.

For instance, with this syntax, we could define i to be available during the whole lifetime of o:

 	void foo(scope MyObject o, scope(o) int* i)
 	{
 		o.i = i;
 	}

What does the scope part of 'scope MyObject o' mean? (i.e. is this D's current scope or something else?) What does 'scope(o)' explicitly mean? I'm going to assume scope(o) means the scope of o.
 So you could do:

 	void bar()
 	{
 		scope int i;
 		scope MyObject o = new MyObject;
 		foo(o, &i);
 	}

 And the compiler would let it pass because foo guarantees not to keep
 references to i outside of o's scope, and o's scope is the same as i.

 Or you could do:

 	void test1()
 	{
 		int i;
 		test2(&i);
 	}

 	void test2(scope int* i)
 	{
 		scope o = new MyObject;
 		foo(o, &i);

 	}

 Again, the compiler can statically check that test2 won't keep a  
 reference to i outside of the caller's scope (test1) because o scope is  
 limited to test2.

The way I read your example, no useful escape analysis can be done by the compiler, and it works mainly because i is a pointer to a value type.
 And if you try the reverse:

 	void test1()
 	{
 		scope o = new MyObject;
 		test2(o);
 	}

 	void test2(scope MyObject o)
 	{
 		int i;
 		foo(o, &i);
 	}

 Then the compiler could determine automatically that i needs to escape  
 test2's scope and allocate the variable on the heap to make its lifetime  
 as long as the object's scope (as it does currently with nested  
 functions) [see my reserves to this in post scriptum]. This could be  
 avoided by explicitly binding i to the current scope, in which case the
 compiler could issue a scope error:

The way I read this is o is of type scope MyObject, i is of type scope int and therefore foo(o,&i) is valid and an escape happens.
Oct 29 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Oct 2008 08:14:31 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 And if you try the reverse:
  	void test1()
 	{
 		scope o = new MyObject;
 		test2(o);
 	}
  	void test2(scope MyObject o)
 	{
 		int i;
 		foo(o, &i);
 	}
  Then the compiler could determine automatically that i needs to  
 escape  test2's scope and allocate the variable on the heap to make  
 its lifetime  as long as the object's scope (as it does currently with  
 nested  functions) [see my reserves to this in post scriptum]. This  
 could be  avoided by explicitly binding i to the current scope, in
 which case the  compiler could issue a scope error:

The way I read this is o is of type scope MyObject, i is of type scope int and therefore foo(o,&i) is valid and an escape happens.

That's my point. The compiler can detect that an escape may happen just by looking at the function prototype for foo. The prototype tells us that foo needs i to be at the same or a lower scope than o, something we don't have here. The compiler can then decide to allocate i dynamically on the heap to make sure it exists for at least the scope of o, or we could decide to just make that illegal.

I prefer automatic heap allocation, as it means we can get rid of the decision to statically or dynamically allocate variables: the compiler can decide, based on the function prototypes, whichever is best. For cases where you really mean a variable to be on the stack, you can use scope, as in:

    scope int i;

and the compiler would just issue an error if you attempt to give a reference to i to a function that wants to use it in a lower scope. Otherwise, the compiler would be free to choose between a local and a heap-allocated variable.

Just to clarify:

    void test2(scope MyObject o)   // the scope of o is a parent of test2
    {
        int i;                     // the scope of i is test2
        foo(o, &i);                // foo(o,&i) requires &i to have o's scope or a parent
                                   // of o's scope, so i must be heap (the root parent) allocated.
    }

A problem I see is that once shared/local are introduced, you have multiple heaps where i should be allocated, depending on the runtime type of o. How would this be handled in this scheme?
Oct 30 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 P.P.S.: This syntax doesn't fit very well with the current  
 scope(success/failure/exit) feature.

contract-like syntax: void foo (myObject o, int* i) if (o.scope <= i.scope) { ... }
Oct 30 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Oct 2008 21:01:28 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2008-10-30 14:07:42 -0400, "Robert Jacques" <sandford jhu.edu> said:

 On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin   
 <michel.fortin michelf.com> wrote:

 P.P.S.: This syntax doesn't fit very well with the current   
 scope(success/failure/exit) feature.

contract-like syntax: void foo (myObject o, int* i) if (o.scope <= i.scope) { ... }

Hum, but can that syntax guarantee that a reference to o or i won't escape the current function's scope, like void foo(scope Object o); ?

No, the syntax was meant to address the more complex problem of specifying the concept of scope(o). It also adds some flexibility for other relationships. As for do not escape, I'm assuming a no_escape type (it would behave as a transitive version of final). I dislike reusing the scope keyword for this, as:
void foo(scope Object a)
{
    scope Object b = new Object();
    scope Object c = b; // Okay
    scope Object d = a; // Error
}
Oct 31 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Oct 2008 21:01:27 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2008-10-30 09:04:10 -0400, "Robert Jacques" <sandford jhu.edu> said:

 Just to clarify:
    	void test2(scope MyObject o)	// the scope of o is a parent of test2
   	{
   		int i;			// the scope of i is test2
   		foo(o, &i);		// foo(o,&i) requires &i to have o's scope or a parent  
 of  o's scope, so i must be heap (the root parent) allocated.
   	}
  A problem I see is that once shared/local are introduced, you have   
 multiple heaps where i should be allocated, depending on the runtime  
 type  of o. How would this be handled in this scheme?

Well, it all depends on whether foo requires its second argument, i, to be shared or not. If foo's declaration was like this: void foo(scope MyObject o, scope(o) shared int* i); then you'd need to use "shared int i" in test2 to avoid an error at the call site.

Actually, what I meant was that o may be local or shared. However, assuming thin locks, o may be tested cheaply at runtime for shared/local and the right allocation done.
Oct 31 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Thu, 30 Oct 2008 21:01:28 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2008-10-30 14:07:42 -0400, "Robert Jacques" <sandford jhu.edu> said:

 On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin   
 <michel.fortin michelf.com> wrote:

 P.P.S.: This syntax doesn't fit very well with the current   
 scope(success/failure/exit) feature.

contract-like syntax: void foo (myObject o, int* i) if (o.scope <= i.scope) { ... }

Hum, but can that syntax guarantee that a reference to o or i won't escape the current function's scope, like void foo(scope Object o); ?

Another option is for the default to be escape, i.e. a contract is required for an escape to happen:
Object o;
void foo(Object a, Object b) if(b.scope <= o.scope)
{
    o = b; // Okay
    o = a; // Error
}
Oct 31 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Fri, 31 Oct 2008 11:02:31 -0400, Robert Jacques <sandford jhu.edu>  
wrote:
 Another option is for the default to be escape.

Oct 31 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Fri, 31 Oct 2008 11:11:26 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 "Michel Fortin" wrote
 Basically, by documenting better the interfaces in a machine-readable  
 way,
 we are freed of other burdens the compiler can now take care of. In
 addition, we have better defined interfaces and the compiler has a lot
 more room to optimize things.

But the burden you have left for the developer is a tough one. You have to analyze the inputs and function calls from a function and determine which variable depends on what. This is a perfect problem for a tool to solve.

Tools can't handle function pointers, which is why escape analysis has been limited to dynamic languages like Java so far.
 The problem is that as soon as you have a function declaration without  
 the
 body, the lint tool won't be able to tell you if it escapes or not.

This I agree is a problem. In fact, without specifications in the function things like interfaces would be very difficult to determine scope-ness at compile time. The only way I can see to solve this is to do it at link time. When you link, piece together the parts of the graph that were incomplete, and see if they all work. It would be a very radical change, and might not even work with the current linkers. Especially if you want to do shared libraries, where the linker is builtin to the OS.

One option is link time compilation, although that doesn't apply to shared libs.
 A related question: how do you handle C functions?

Hope and pray? (i.e. The same way C functions and immutable types are handled now.)
 So, without a way to specify the requested scope of the parameters,  
 you'll
 very often have holes in your escape analysis that will propagate down  
 the
 caller chain, preventing any useful conclusion.

Yes, and if a function has mis-specified some of its parameters, then you have code that doesn't compile. Or the function itself won't compile, and you need to do some more manual analysis. Imagine a function that calls 5 or 6 other functions with its parameters. And there are multiple different dependencies you have to resolve. That's a lot of analysis you have to do manually.

Well, the same problem occurs with const today, and just like const you'd have specific compiler errors to guide you.
 I don't think it's bad to force interfaces to be well documented, and
 documented in a format that the compiler can understand to find errors
 like this.

I think this concept is going to be really hard for a person to decipher, and really hard to get right. We are talking about a graph dependency analysis, in which many edges can exist, and the vertices do not necessarily have to be parameters. This is not stuff for the meager developer looking to get work done to have to think about. I'd much rather have a tool that does it, if not the compiler, then something else. Or partial analysis. Or no analysis. I agree it's good to have bugs caught by the compiler, but this solution requires too much work from the developer to be used.

Well, I'd guess most functions are either no escape or heap escape. Only functions that permit escape and want to play nice with stack variables need to do actual graph analysis. You'll note Walter's blog ignores this usage.
 Some fun puzzles for you to come up with a proper scope syntax to use:

 void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

if( a.scope <= b.scope && a.scope <= c.scope )
 struct S
 {
    int *v;
 }

 int *f2(S* s) { return s.v;}

int* f2(S* s) if( return.scope >= s.scope )
 void f3(ref int *a, ref int *b, ref int *c)
 {
    int *tmp = a;
    a = b; b = c; c = tmp;
 }

if ( a.scope == b.scope && a.scope == c.scope )
 -Steve

This is actually pretty straightforward, as a = b implies a.scope <= b.scope.
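The constraint-chasing in these puzzles can be checked mechanically. The sketch below is a toy model, not any proposed syntax: it assumes each value's lifetime is the nesting depth of the scope that owns it (0 for global/heap, larger for more deeply nested frames), and that an assignment dst = src is safe only if src lives at least as long as dst. The helper names canAssign, fIsSafe and f3IsSafe are hypothetical, and the depth ordering is this sketch's own convention, which may not match the direction of the <= used in the posts.

```d
// Toy model: a lifetime is the nesting depth of the owning scope.
// Depth 0 is the global/heap scope; larger depths die sooner.
alias Depth = uint;

// `dst = src` is safe when src lives at least as long as dst,
// i.e. src is declared at the same depth or further out.
bool canAssign(Depth dst, Depth src)
{
    return src <= dst;
}

// f(ref a, b, c) may execute either a = b or a = c, so both must be safe.
bool fIsSafe(Depth a, Depth b, Depth c)
{
    return canAssign(a, b) && canAssign(a, c);
}

// f3 performs a = b, b = c, c = (old) a. The three constraints form a
// cycle, so they can only all hold when the depths are equal.
bool f3IsSafe(Depth a, Depth b, Depth c)
{
    return canAssign(a, b) && canAssign(b, c) && canAssign(c, a);
}

unittest
{
    assert(fIsSafe(2, 0, 1));    // b and c both outlive a
    assert(!fIsSafe(1, 2, 0));   // b dies before a
    assert(f3IsSafe(1, 1, 1));
    assert(!f3IsSafe(1, 1, 0));  // c = a would store a shorter-lived pointer
}
```

Because f3's three assignments form a cycle, only equal depths satisfy all of them, which matches the a.scope == b.scope && a.scope == c.scope answer above.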
Oct 31 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Sun, 02 Nov 2008 10:12:46 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 * Make all ref parameters scoped by default. There will be impossible  
 for a function to escape the address of a ref parameter without a cast.  
 I haven't proved it to myself yet, but I believe that if pointers are  
 not used and with the amendments below regarding arrays and delegates,  
 this makes things entirely safe. In Walter's words, "it buttons things  
 pretty tight".

Does this mean the whole shared/local/scope issue for classes is being sidestepped for now?
Nov 02 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Sat, 01 Nov 2008 12:00:10 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 "Michel Fortin" wrote
 The only way I can see to solve this is to do it at link time.  When  
 you
 link, piece together the parts of the graph that were incomplete, and  
 see
 if
 they all work.  It would be a very radical change, and might not even
 work
 with the current linkers.  Especially if you want to do shared  
 libraries,
 where the linker is builtin to the OS.

I think you're dreaming... not that it's a bad thing to have ambition, but that's probably not even possible.

Sure it is ;) You have to write a special linker. I think everyone who thinks a scope decoration proposal is going to 1) solve all scope escape issues and 2) be easy to use is dreaming :P

Various research languages have shown both 1 and 2 are possible.
 I don't think it's bad to force interfaces to be well documented, and
 documented in a format that the compiler can understand to find errors
 like this.

I think this concept is going to be really hard for a person to decipher, and really hard to get right.

It takes some thinking to get the prototype right at first. But it takes less caution calling the function later with local variables since the compiler will either issue an error or automatically fix the issue by allocating on the heap when an argument requires a greater scope.

I hope to avoid this last situation. Having the compiler make decisions for me, especially when heap allocation occurs, is bad.

How so? Please explain why it's bad (an opinion by itself isn't an argument).
Nov 02 2008
"Robert Jacques" <sandford jhu.edu> writes:
On Mon, 03 Nov 2008 00:29:29 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 On Sun, 02 Nov 2008 10:12:46 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 * Make all ref parameters scoped by default. There will be impossible  
 for a function to escape the address of a ref parameter without a  
 cast. I haven't proved it to myself yet, but I believe that if  
 pointers are not used and with the amendments below regarding arrays  
 and delegates, this makes things entirely safe. In Walter's words, "it  
 buttons things pretty tight".

Does this mean the whole shared/local/scope issue for classes is being sidestepped for now?

What issue do you have in mind?

Right now, it's trivial for scope classes to escape due to automatic conversion to 'local'. And under the current shared/local scheme, one has to write multiple functions (one for each type combination).
Nov 03 2008
"Bill Baxter" <wbaxter gmail.com> writes:
On Thu, Oct 30, 2008 at 12:21 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 11:40 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 7:23 AM, Sean Kelly <sean invisibleduck.org>
 wrote:
 Walter Bright wrote:
 Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for two reasons. It's backwards-compatible, which is handy. But more importantly, I'd say that probably 95% of the current uses of delegates are scoped, and that isn't likely to shift all the way to 50% even if D moved to a much more functional style of programming. Algorithms, for example, all use scoped delegates, which I'd say is far and away their most common current use.

The counter to that is that when there is an inadvertent escape of a reference, the error is often undetectable even while it silently corrupts data and behaves erratically. In other words (as Andrei pointed out to me) the cost of those errors, even though rare, is very high. This makes it highly desirable to prevent them automatically, rather than relying on the skill and attention to detail of the programmer.

I think the cost/benefit of this could probably be argued either way. I've never encountered a bug related to this, for example, so to me the benefit is entirely theoretical while the cost is immediate.

I've had bugs caused by this but they were pretty easy to find. Some delegate I'm calling crashes and all the variables are nonsensical garbage... Hmm maybe I was using out-of-scope variables in that delegate that I wasn't supposed to? Maybe there are real cases where the bugs caused are harder to find. But I'll just add my 2c to Sean's. I haven't had many such bugs, and when I've had them they've been pretty easy to find.

I don't think we can afford program correctness to rest on anecdote and "it works for me". That age is long gone.

I haven't seen any real data about how serious a problem this is from you either. Chasing bogeymen is at least as bad as ignoring real problems.

Well to provide real data I'd have to spend time on user studies, which would be time-intensive. I also think it's not an interesting research problem because it is generally accepted in the community that memory un-safety is a source of problems. So I don't quite feel burdened with the need to provide a proof. Reframing the problem as chasing a bogeyman won't help with addressing it. Andrei

I just wanted to issue an apology to Bill for the above, which is brusque and demeaning. He was delicate enough to email me privately what he thought about my response, and in very levelheaded terms. After having answered privately as well, I thought I'd post a public apology; it would be quite unethical to apologize in private for a public remark! Hopefully this helps with undoing the damage and with keeping the recent streak of good discussions going.

No problem. My comment leading to that response was a bit snarky too, though I tried really hard not to make it snarky. It still is basically saying "I know you are but what am I?" Back to the technical topic: as I told Andrei, all I want is some solution that doesn't kill performance with lots of hidden memory allocations. I doubt that's something anyone really wants, so all this huffing and puffing about it probably isn't necessary. --bb
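For the scoped-delegate case Sean describes, current D2 compilers can already avoid the kind of hidden allocation Bill is worried about: a delegate literal passed to a scope delegate parameter does not need a heap closure. A minimal sketch, assuming a D2 compiler recent enough to support @nogc; forEach and sum are hypothetical helpers, not library functions.

```d
// Hypothetical helper: calls dg on every element. Marking the delegate
// parameter `scope` promises it is not stored anywhere, so callers do
// not need a heap-allocated closure for it.
void forEach(const(int)[] xs, scope void delegate(int) @nogc dg) @nogc
{
    foreach (x; xs)
        dg(x);
}

// The delegate literal refers to the local `total`. Because it is passed
// to a `scope` parameter, `total` can stay on the stack; @nogc makes
// compilation fail if a GC-allocated closure were required.
int sum(const(int)[] xs) @nogc
{
    int total = 0;
    forEach(xs, (int x) { total += x; });
    return total;
}

unittest
{
    assert(sum([1, 2, 3]) == 6);
    assert(sum(null) == 0);
}
```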
Oct 29 2008
Chad J <gamerchad __spam.is.bad__gmail.com> writes:
I wonder if it would be easy enough to allocate closures lazily at runtime.

So the compiler scans executable code, and any time there is an 
assignment (passing as function args doesn't count, returning does) 
involving delegates, it inserts code that will do the following:
- Check whether the delegate being assigned from is on the stack or the 
heap.
- If it's on the stack, make a copy on the heap, and use that.

Scope (partial) closures never get assigned to other things, so no extra 
code will ever be generated or executed for them.

I worry that this might be more complicated with multithreading though.

Also, I'm not sure how to make sure all calls to the closure access the 
same context, and that the function that contains the context also knows 
when its context has moved off of the stack and into the heap. I'm not
sure of this because I'm also not sure how that's handled anyways.

Also notable is that the heuristic I suggest is just that; it is not 
necessarily optimal or even strictly lazy.  There are cases where 
delegates could be passed around by assignment yet never escape their 
scope.  Maybe it is easy enough to add that as another condition for the 
runtime check: is this delegate being assigned to some place in the heap 
or too far up (down?) in the stack?  Just an optimization though, and 
probably one not nearly as important.

OK so all of this doesn't help much with the more general problem of 
/static/ escape analysis.  Oh well.
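Chad's runtime heuristic can be modelled in a few lines. This is only a sketch under stated assumptions: Context stands in for a delegate's captured frame, and promote is a hypothetical hook that the compiler would insert at each assignment or return of a delegate; real D exposes no such hook.

```d
// Hypothetical model: a closure context records whether it currently
// lives on the stack or on the heap.
struct Context
{
    int captured;   // stands in for the captured local variables
    bool onHeap;
}

// Hypothetical hook run at each delegate assignment or return:
// a stack context is copied to the GC heap, a heap context is shared.
Context* promote(Context* ctx)
{
    if (ctx.onHeap)
        return ctx;
    auto copy = new Context;
    *copy = *ctx;
    copy.onHeap = true;
    return copy;
}

unittest
{
    Context local = Context(1, false);
    auto h = promote(&local);
    assert(h !is &local && h.onHeap && h.captured == 1);
    assert(promote(h) is h);      // already on the heap: no second copy

    local.captured = 5;           // enclosing function writes its stack copy...
    assert(h.captured == 1);      // ...and the escaped copy does not see it
}
```

The last two lines of the unittest show the difficulty raised above: once the context is copied, the enclosing function and the escaped delegate no longer share one context, so a real implementation would also have to redirect the frame's own accesses to the heap copy.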
Oct 30 2008
Christopher Wright <dhasenan gmail.com> writes:
Walter Bright wrote:
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.
 
 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

I appreciate OOP. I also appreciate it when it takes no significant effort to write safe code. I also appreciate it when I don't have to convince the compiler that what I'm doing is safe when I know it's safe. In the case of pointers, I don't use them, most of the time. (I'm working on a variable-key-length cache-oblivious lookahead array right now, and that requires pointers for efficiency, but this is probably the first time I've used pointers in D.) In the case of delegates, I use them. I've been confused and upset by the lack of closures in D1. I think a lot of new programmers will expect closures and get confused by having two different ways of declaring them. For my code, I won't mind using whatever new syntax for closures, even if it's slightly verbose. For new programmers, I'd recommend using closures by default, since they're safer. Once they're more comfortable with the language, you can introduce the idea of allocating delegate context on the stack as an occasionally unsafe optimization.
Nov 01 2008