digitalmars.D - Escape analysis

Walter Bright (23/23) Oct 27 2008 The delegate closure issue is part of a wider issue - escape analysis. A...

Steven Schveighoffer (33/56) Oct 27 2008 I think the default should be no escape. This should cover 90% of cases...
Hxal (15/24) Oct 27 2008 While requiring parameters by default to not escape the function
Walter Bright (7/7) Oct 27 2008 The reason the scope/noscope needs to be part of the function signature

Steven Schveighoffer (37/44) Oct 28 2008 But the documentation is not enough. You cannot express the intricacies...

Robert Jacques (6/43) Oct 28 2008 Escape analysis also applies to shared/local/scope storage types and not...

Steven Schveighoffer (19/76) Oct 28 2008 shared/unshared is not a storage class, it is a type modifier (like cons...

Robert Jacques (8/16) Oct 28 2008 No, because shared and local objects get created and garbage collected o...

Steven Schveighoffer (24/39) Oct 28 2008 That's an interesting point. Shared definitely has to be a type modifie...

Robert Jacques (15/63) Oct 28 2008 This is a desirable error that was discussed back when shared was

Walter Bright (15/54) Oct 28 2008 I think it is conceptually straightforward whether a reference escapes

Sean Kelly (22/45) Oct 28 2008 There's another weird issue that I'm not sure if anyone has touched on:

Walter Bright (5/9) Oct 28 2008 What you're talking about is the escaping of pointers to local

Don (3/14) Oct 29 2008 You could allow it in inside a pure function, whenever the return type

Sergey Gromov (3/52) Oct 28 2008 Part of s escapes, so the compiler should assume that the whole s

Sean Kelly (9/52) Oct 28 2008 So are you saying that I'd have to rewrite fnA as:

Steven Schveighoffer (21/76) Oct 28 2008 This 'feature' is basically useless ;) D has no shared libraries, so I

Walter Bright (12/57) Oct 28 2008 I disagree. The whole idea behind separate compilation and using

Steven Schveighoffer (47/104) Oct 28 2008 Any decent build tool (including make, assuming dependencies are created...

Walter Bright (4/4) Oct 27 2008 Pure functions almost implicitly imply that its parameters are all

Michel Fortin (9/13) Oct 27 2008 Not if you define "scope" in the function prototype as not escaping the
Andrei Alexandrescu (4/8) Oct 27 2008 I think even the return value can be considered scoped. Essentially it

Jason House (2/12) Oct 28 2008 As far as I know, there's no way for functions to specially prepare obje...

Jason House (3/36) Oct 27 2008 I like the original definition of in as "const scope". I would also lik...
Robert Fraser (2/35) Oct 27 2008 I get the feeling that D's type system is going to become the joke of th...

Walter Bright (4/8) Oct 27 2008 That argues that "noscope" should be the default. Using "scope" would be...

Denis Koroskin (9/17) Oct 27 2008 I hope that 'noscope' is considered to be default *not* because is would...
Robert Fraser (6/16) Oct 27 2008 My point wasn't the number of keywords... ("shared" is actually the

Walter Bright (6/11) Oct 27 2008 The complexity is an issue that concerns me. That's why I suspect that
Andrei Alexandrescu (3/20) Oct 27 2008 I don't think you have a case.

Michel Fortin (16/18) Oct 27 2008 I don't think you have much choice. Take these examples:

Walter Bright (2/15) Oct 27 2008 scope is a storage class, not a type constructor.

Jason House (2/18) Oct 27 2008 How do you treat members of objects passed in? If I pass in a struct wit...

Walter Bright (2/7) Oct 27 2008 The scope applies to the bits of the object, not what they may refer to.

Michel Fortin (18/27) Oct 28 2008 So basically, we always have head-scope. Here's my question:

Jason House (2/36) Oct 28 2008

Jason House (2/10) Oct 28 2008 This seems rather limiting. I know this is aimed at addressing the dynam...

Andrei Alexandrescu (10/25) Oct 28 2008 I think it's clear that scope is transitive as much as const or

Jason House (2/31) Oct 28 2008 Transitive scope means that scope can't be a storage class. It's a trick...
Steven Schveighoffer (14/38) Oct 28 2008 A quick patch is not possible IMO.

Denis Koroskin (21/25) Oct 28 2008 On Tue, 28 Oct 2008 16:54:15 +0300, Steven Schveighoffer
Bill Baxter (10/16) Oct 28 2008 So basically programmers have to memorize all the rules the compiler

Steven Schveighoffer (26/46) Oct 28 2008 First, the compiler does not have any sound rules for this. It currentl...

Bill Baxter (21/60) Oct 28 2008 I don't see why not. Because the compiler might be allocating a

Steven Schveighoffer (27/98) Oct 28 2008 No, I'm proposing the compiler SHOULDN'T allocate closures unless it can

Sean Kelly (4/11) Oct 28 2008 Like const, I'd rather have no solution than a bad solution insofar as

Bill Baxter (10/20) Oct 28 2008 The only serious problem people have right now is that closures are

Sean Kelly (32/52) Oct 28 2008 This would be the most backwards-compatible way also. The only real

Walter Bright (13/20) Oct 28 2008 The counter to that is that when there is an inadvertent escape of a

Sean Kelly (13/32) Oct 28 2008 I think the cost/benefit of this could probably be argued either way.

Jason House (2/11) Oct 28 2008 As the author of an open source multithreaded application in D1, I've ha...

Jarrett Billingsley (5/16) Oct 28 2008 For what it's worth, std.bind I think depends on one Phobos-specific

Jason House (2/22) Oct 28 2008 I ported a bind implementation and maintain it in my code base. I didn't...

Andrei Alexandrescu (6/22) Oct 28 2008 I agree. Particularly in higher-order code this kind of problem is bound...

Bill Baxter (10/32) Oct 28 2008 I've had bugs caused by this but they were pretty easy to find.

Andrei Alexandrescu (4/34) Oct 28 2008 I don't think we can afford program correctness to rest on anecdote and

Walter Bright (8/10) Oct 28 2008 I agree. When you're managing a program with a million lines of code in
Bill Baxter (6/56) Oct 28 2008 I haven't seen any real data about how serious a problem this is from

Andrei Alexandrescu (8/57) Oct 28 2008 Well to provide real data I'd have to spend time on user studies, which

Andrei Alexandrescu (9/74) Oct 29 2008 I just wanted to issue an apology to Bill for the above, which is

Bill Baxter (11/95) Oct 29 2008 No problem. My comment leading to that response was a bit snarky too.

Steven Schveighoffer (18/23) Oct 30 2008 I doubt anyone wants that. But here is my main concern (my defense for

ore-sama (2/4) Oct 30 2008 Moreover that sematics in some cases will force allocation when it's not...

Walter Bright (9/12) Oct 28 2008 I have. Not often in my own code because I am very careful to avoid it,

Sean Kelly (9/22) Oct 28 2008 I tend to ask a question along these lines to entry-level interviewees

Walter Bright (11/16) Oct 28 2008 To me that is akin to building a car with no brakes and justifying it by...

Steven Schveighoffer (6/18) Oct 28 2008 I agree with this. It would be nice to be able to flag these kinds of

Bill Baxter (7/25) Oct 28 2008 Ok, I think we're completely on the same page here. I'm for the

Robert Fraser (5/32) Oct 29 2008 How about adding a warning switch (I know Walter you're against them but...

Robert Jacques (15/16) Oct 27 2008 Okay, I'm confused. I had assumed that the escape scope was different fr...

Andrei Alexandrescu (4/41) Oct 27 2008 This is a misunderstanding. Scope is a storage class, not a type
Mosfet (4/41) Oct 28 2008 I agree I think that D will be used only by people like you that

Andrei Alexandrescu (5/50) Oct 28 2008 Well I think you were right. The question is how much you spend learning...

dsimcha (18/22) Oct 28 2008 Seconded. Both C++ and D are very complex languages, but I don't see th...

Jason House (7/40) Oct 27 2008 In D1, local variables implicitly follow a mixed rule:
ore-sama (1/1) Oct 28 2008 Allocation is determined on delegate creation, not on passing it somewhe...
Sergey Gromov (29/38) Oct 28 2008 I'm for safe defaults. Programs shouldn't crash for no reason.

Steven Schveighoffer (31/69) Oct 28 2008 If safe defaults means 75% performance decrease, I'm for using unsafe

Sergey Gromov (28/113) Oct 28 2008 Please note the "in the absence of function calls" part. I'm talking

Steven Schveighoffer (27/145) Oct 28 2008 Ah, sorry. I read 'absence of function source'. My bad, in that case w...

Sergey Gromov (13/76) Oct 29 2008 Allocation only happens when a stack variable reference escapes via a

Steven Schveighoffer (38/120) Oct 29 2008 A static array declared on the stack absolutely is a stack variable.

Sergey Gromov (14/124) Oct 29 2008 There is no delegate, therefore nothing to allocate a closure for. If

Steven Schveighoffer (17/89) Oct 29 2008 I was under the impression that closures are currently allocated if you

Sergey Gromov (6/24) Oct 29 2008 I do understand that. I just wanted to discuss whether it is possible

Chad J (15/32) Oct 28 2008 If safe defaults means 2% performance decrease, I'm for using unsafe

Robert Jacques (24/24) Oct 28 2008 I've run across some academic work on ownership types which seems releva...

Michel Fortin (110/112) Oct 29 2008 I haven't read the paper yet, but the overview seems to go in the same

Steven Schveighoffer (12/17) Oct 29 2008 [snip]

Robert Jacques (4/27) Oct 29 2008 Note that one of a major points in the Pedigree paper is the static type...
Michel Fortin (65/74) Oct 30 2008 I agree that this is becomming a problem, even without scope. What we

Steven Schveighoffer (40/52) Oct 31 2008 But the burden you have left for the developer is a tough one. You have...

Robert Jacques (19/87) Oct 31 2008 Tools can't handle function pointers, which is why escape analysis has
Michel Fortin (62/122) Oct 31 2008 If you can't determine yourself that a function can work with scoped

Steven Schveighoffer (27/80) Nov 01 2008 But often times, the safety of the call depends on how it is being calle...

Andrei Alexandrescu (7/9) Nov 01 2008 I think that's a fair assessment. One suggestion I made Walter is to

Steven Schveighoffer (4/12) Nov 02 2008 If scope delegates means trust the coder knows what he is doing (in the

Andrei Alexandrescu (31/43) Nov 02 2008 It looks like things will move that way. Bartosz, Walter and I talked a

bearophile (4/6) Nov 02 2008 UHm... I see. But I am not sure I like that. Isn't that a waste of memor...

Andrei Alexandrescu (7/13) Nov 02 2008 Yah, we can't get rid of that. Possibilities discussed were (a) make
dsimcha (7/13) Nov 02 2008 And a monitor. And RTTI. Then again, for code that absolutely must be a...
Jarrett Billingsley (4/8) Nov 02 2008 No, they have a *pointer* to a vtable. There is only one vtable per

Michel Fortin (56/89) Nov 02 2008 That's a little disapointing. I was hoping for something to fix all

Andrei Alexandrescu (33/134) Nov 02 2008 That's only the half of it. If you want to take a look at a C-like

Michel Fortin (25/40) Nov 02 2008 First, I think it's a pretty good idea to have this. Second, I think

Andrei Alexandrescu (15/39) Nov 02 2008 [snip]

Michel Fortin (62/75) Nov 03 2008 Studying things more in depth often at first leave you with the

Andrei Alexandrescu (13/18) Nov 03 2008 It may be wise to read some more before writing some more. As far as I

Michel Fortin (35/57) Nov 04 2008 Pretty interesting slides.

Andrei Alexandrescu (10/74) Nov 04 2008 Cyclone has region subtyping which takes care of that.

Michel Fortin (25/47) Nov 04 2008 Indeed, I was somewhat mistaken that the <> notation was templates
Michel Fortin (47/69) Nov 05 2008 Not the same way as I'm proposing. What cyclone does is make p

Andrei Alexandrescu (25/98) Nov 06 2008 Well how about this:

Steven Schveighoffer (8/14) Nov 07 2008 FWIW, I still think the proposal you have put forth about references bei...
Michel Fortin (70/126) Nov 09 2008 I don't see a problem at all. The compiler would expand the lifetime of

Christopher Wright (5/33) Nov 09 2008 In point of fact, it's expensive to extend the stack, so any compiler

Michel Fortin (60/79) Nov 14 2008 If you mean there could be a problem with functions referring to the

Andrei Alexandrescu (33/116) Nov 09 2008 I agree that an escape analyzer would improve things. I am not sure that...

Michel Fortin (87/176) Nov 12 2008 If you think I proposed a region-oblivious scheme, then you've got me

Andrei Alexandrescu (55/187) Nov 12 2008 But how do you type then the assignment example?

Hxal (42/80) Nov 12 2008 Examples such as this one are rare enough to afford the need for
Michel Fortin (106/242) Nov 12 2008 Everywhere I said there was no need for named regions, I also said

Andrei Alexandrescu (46/180) Nov 12 2008 No, the code is correct as written (without the if). You may want to

Michel Fortin (162/300) Nov 14 2008 Ok, I've reread that part and it's true that using Cyclone's subtyping

Andrei Alexandrescu (5/8) Nov 14 2008 By this I meant I don't have time (t < 0), not that I was writing while

Robert Jacques (4/10) Nov 02 2008 Does this mean the whole shared/local/scope issue for classes is being

Andrei Alexandrescu (3/15) Nov 02 2008 What issue do you have in mind?

Robert Jacques (5/18) Nov 03 2008 Right now, it's trivial for scope classes to escape due to automatic

Steven Schveighoffer (64/105) Nov 03 2008 Isn't this already the case?

Andrei Alexandrescu (20/121) Nov 03 2008 It's planned as a compiler switch and module option. Essentially SafeD

Steven Schveighoffer (38/69) Nov 04 2008 I personally probably won't use it, as I feel I have enough experience t...

Robert Jacques (5/39) Nov 02 2008 Various research languages have shown both 1 and 2 are possible.

Steven Schveighoffer (28/70) Nov 03 2008 I think 1 can be possibly done. 2 is a matter of subjectivity, and so f...

Michel Fortin (28/55) Nov 04 2008 I won't dispute this. I'll note that the upcomming "shared" keyword may

Robert Jacques (11/69) Oct 29 2008 What does the scope part of 'scope MyObject o' mean? (i.e. is this D's

Michel Fortin (50/130) Oct 30 2008 Ok, I should have defined that better. It means that o is bound the

Robert Jacques (12/47) Oct 30 2008 Just to clarify:

Michel Fortin (10/21) Oct 30 2008 Well, it all depends if foo wants the second argument of i must be

Robert Jacques (5/21) Oct 31 2008 Actually, what I meant was that o may be local or shared. However,

Robert Jacques (8/10) Oct 30 2008 How about o.scope instead of scope(o)? Also, this would allow

Michel Fortin (9/20) Oct 30 2008 Hum, but can that syntax guarenty a reference to o or i won't escape

Robert Jacques (12/28) Oct 31 2008 No, the syntax was meant to address the more complex problem of specifyi...
Robert Jacques (10/26) Oct 31 2008 Another option is for the default to be escape. i.e. a contract is

Robert Jacques (3/4) Oct 31 2008 Correction: default to be _no_ escape.

bearophile (5/5) Oct 29 2008 I think C++ designers are fully mad, this shows how to use C++ lambdas:

Sergey Gromov (4/11) Oct 29 2008 Well, they're somewhat limited, and a bit manual, and actually just a

Bill Baxter (8/19) Oct 29 2008 I think it's mostly the capture mode [] stuff that's a bit ugly.

Sergey Gromov (5/17) Oct 29 2008 The discussed features are really a significant improvement for C++

Jarrett Billingsley (2/3) Oct 29 2008 It's called decltype().

Robert Fraser (2/6) Oct 29 2008 C++ is a .NET language now ;-P

Chad J (23/23) Oct 30 2008 I wonder if it would be easy enough to allocate closures lazily at runti...
Christopher Wright (16/25) Nov 01 2008 I appreciate OOP. I also appreciate it when it takes no significant

Walter Bright <newshound1 digitalmars.com> writes:

The delegate closure issue is part of a wider issue - escape analysis. A 
reference is said to 'escape' a scope if it, well, leaves that scope. 
Here's a trivial example:

int* foo() { int i; return &i; }

The reference to i escapes the scope of i, thus courting disaster. 
Another form of escaping:

int* p;
void bar(int* x) { p = x; }

which is, on the surface, legitimate, but fails for:

void abc(int j)
{
     bar(&j);
}

This kind of problem is currently undetectable by the compiler.

The first step is, are function parameters considered to be escaping by 
default or not by default? I.e.:

void bar(noscope int* p);    // p escapes
void bar(scope int* p);      // p does not escape
void bar(int* p);            // what should be the default?

What should be the default? The functional programmer would probably 
choose scope as the default, and the OOP programmer noscope.

(The issue with delegates is we need the dynamic closure only if the 
delegate 'escapes'.)
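
To make the delegate case concrete, here is a minimal sketch. It assumes
present-day D2 syntax rather than anything proposed in this thread, and the
names makeCounter, applyTwice and sumTwice are made up for illustration.

```d
// Hypothetical illustration; the function names are not from the thread.

// The delegate is returned, so it escapes makeCounter's scope.
// `count` must outlive the call, which is what a heap closure provides.
int delegate() makeCounter()
{
    int count = 0;
    return () { return ++count; };
}

// The `scope` parameter states that dg is only used during the call,
// so a caller's captured variables can stay on the caller's stack.
int applyTwice(scope int delegate() dg)
{
    return dg() + dg();
}

int sumTwice()
{
    int x = 21;
    return applyTwice(() => x); // the literal never leaves this call
}
```

Only makeCounter actually needs the dynamic closure; sumTwice passes its
delegate down the call chain and never lets it out.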

Oct 27 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Walter Bright" wrote
 The delegate closure issue is part of a wider issue - escape analysis. A 
 reference is said to 'escape' a scope if it, well, leaves that scope. 
 Here's a trivial example:

 int* foo() { int i; return &i; }

 The reference to i escapes the scope of i, thus courting disaster. Another 
 form of escaping:

 int* p;
 void bar(int* x) { p = x; }

 which is, on the surface, legitimate, but fails for:

 void abc(int j)
 {
     bar(&j);
 }

 This kind of problem is currently undetectable by the compiler.

 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.

 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

I think the default should be no escape.  This should cover 90% of cases, 
and does not have an 'allocate by default' policy.

But I think whether a variable escapes or not cannot really be determined by 
the function accepting the variable, since the function doesn't know where 
the variable comes from.  An example:

void bar(int *x, ref int *y) { y = x;}

How do you know that y is not defined in the same scope or a sub-scope of 
the scope of x?  If the compiler sees:

void bar(noscope int *x, scope ref int *y)

It's going to assume that x will always escape, and probably allocate a 
closure so it can call bar.  Which might not be the right decision.

I think that without a full graph analysis of what escapes to where, it is 
going to be impossible to make this correct for the compiler to use, and 
that might be too much for the compiler to deal with.  I'd rather just have 
the compiler assume scope unless told otherwise (at the point of use, not in 
the function signature).

For instance:

void bar(int *x, ref int *y) { y = x;}

void abc(int x, ref int *y)
{
   bar(noscope &x, y);
}

void abc2()
{
   int x;
   int *y;
   bar(&x, y);
}

This tells the compiler to allocate a closure for abc, because x might escape, 
but not for abc2, because there are no escapes indicated by the developer.
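
Since "noscope &x" at the call site is only a suggested syntax, the sketch
below shows roughly what the two callers would amount to if written by hand
in plain D. It is an approximation under that assumption: the heap
allocation stands in for the closure, and abc takes an extra value parameter
purely for illustration.

```d
// Same bar as above: stores x into y.
void bar(int* x, ref int* y) { y = x; }

// The caller knows the pointer is handed back through y and outlives the
// call, so it puts the integer on the heap by hand.
void abc(int value, ref int* y)
{
    int* x = new int;
    *x = value;
    bar(x, y);
}

// The caller knows y dies together with x, so the stack is fine.
int abc2()
{
    int x = 7;
    int* y;
    bar(&x, y);
    return *y; // still inside x's lifetime
}
```

In both cases the decision is made where x is created, not in bar.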

-Steve

Oct 27 2008

Hxal <Hxal freenode.irc> writes:

Walter Bright Wrote:
 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.

While requiring parameters by default to not escape the function
would be great, because it'd cause less spam (I think they don't escape
in most cases) and potentially make programmers think around and
refactor their code - it'd also be quite a breaking change.

Defaulting to no escape checking being done and providing a scope parameter
class seems therefore the more obvious choice. It keeps existing code intact
and allows correctness checking and optimization on demand.

My only fear is that the feature will cause much frustration when we can reason
that a reference doesn't escape, but the compiler can't know that. For example
putting one scope parameter into another's field, or referencing a scope
parameter from a complex return value.

Anyway, if escape analysis is implemented, I'd suggest using a more high level
terminology like temporary and permanent objects. Might make more sense to
beginners.

Oct 27 2008

Walter Bright <newshound1 digitalmars.com> writes:

The reason the scope/noscope needs to be part of the function signature 
is because:

1. that makes it self-documenting
2. function bodies may be external, i.e. not present
3. virtual functions
4. notifies the user if a library function parameter scope-ness changes 
(you'll get a compile time error)
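
A small sketch of point 3, assuming present-day D2 syntax; the class names
Reader and DoublingReader and the function readLocal are invented for the
example. How strictly a compiler would check the override is not something
this post specifies.

```d
class Reader
{
    // The promise is stated once, on the base class signature.
    int read(scope const(int)* p) { return *p; }
}

class DoublingReader : Reader
{
    // Called through the same signature, so the annotation is repeated here.
    override int read(scope const(int)* p) { return 2 * *p; }
}

int readLocal(Reader r)
{
    int local = 21;
    // Only the declared signature is visible at this call; the body that
    // actually runs is picked at run time.
    return r.read(&local);
}
```

The caller of readLocal cannot inspect which body runs, which is why the
information has to live in the signature.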

Oct 27 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function signature is 
 because:

 1. that makes it self-documenting

But the documentation is not enough.  You cannot express the intricacies of 
which variables escape which scopes well enough for the compiler to make 
intelligent decisions.  What this will result in is slightly fewer unnecessary 
closures, but not enough to make a difference.  Or else you won't be able to 
declare things the way you want, so you will be forced to declare something 
that *could* result in an escape, but usually doesn't.

 2. function bodies may be external, i.e. not present
 3. virtual functions

Yes, so you are now implying a scope escape contract on all derived classes. 
But not a very expressive one.

 4. notifies the user if a library function parameter scope-ness changes 
 (you'll get a compile time error)

Oh really?  I imagined that if the scope-ness changed it just results in a 
new heap allocation when I call the function.

i.e.  Joe library developer has this function foo:

int foo(scope int *x) {return *x;}

And he now decides he wants to change it somehow:

int *lastFooCalledWith;
int foo(int *x) {lastFooCalledWith = x; return *x;}

I used foo like this:

int i;
auto j = foo(&i);

So does this now fail to compile?  Or does it silently kill the performance 
of my code?

If the latter, we are left with the same problem we have now.

If the former, how does one call a function with a noscope parameter?

The more I think about this, the more I'd rather have D1 behavior and some 
sort of way to indicate my function should allocate a heap frame (except on 
easily provable scope escapes).

I think the case most likely to cause unnecessary allocations is also a 
very common one: a class setter.

class X
{
   private int *v_;
   int *v(int *newV) {return v_ = newV;}
   int *v() { return v_;}
}

Clearly, newV escapes into the class instance, but how do we know what the 
scope of the class instance is to know if newV truly escapes its own scope?

-Steve

Oct 28 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 28 Oct 2008 08:58:18 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 "Walter Bright" wrote
 4. notifies the user if a library function parameter scope-ness changes
 (you'll get a compile time error)

 Oh really?  I imagined that if the scope-ness changed it just results in a
 new heap allocation when I call the function.

 i.e.  Joe library developer has this function foo:

 int foo(scope int *x) {return *x;}

 And he now decides he wants to change it somehow:

 int *lastFooCalledWith;
 int foo(int *x) {lastFooCalledWith = x; return *x;}

 I used foo like this:

 int i;
 auto j = foo(&i);

 So does this now fail to compile?  Or does it silently kill the performance
 of my code?

 If the latter, we are left with the same problem we have now.

 If the former, how does one call a function with a noscope parameter?

 The more I think about this, the more I'd rather have D1 behavior and some
 sort of way to indicate my function should allocate a heap frame (except on
 easily provable scope escapes).

 The most common case I think which will cause unnecessary allocations, is a
 very common case.  A class setter:

 class X
 {
    private int *v_;
    int *v(int *newV) {return v_ = newV;}
    int *v() { return v_;}
 }

 Clearly, newV escapes into the class instance, but how do we know what the
 scope of the class instance is to know if newV truly escapes its own scope?

Escape analysis also applies to shared/local/scope storage types and not  
just delegates. Consider having to write a function for every combination  
of shared/local/scope for every object or pointer in the function  
signature.

Oct 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Robert Jacques" wrote
 On Tue, 28 Oct 2008 08:58:18 -0400, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 "Walter Bright" wrote
 4. notifies the user if a library function parameter scope-ness changes
 (you'll get a compile time error)

 Oh really?  I imagined that if the scope-ness changed it just results in a
 new heap allocation when I call the function.

 i.e.  Joe library developer has this function foo:

 int foo(scope int *x) {return *x;}

 And he now decides he wants to change it somehow:

 int *lastFooCalledWith;
 int foo(int *x) {lastFooCalledWith = x; return *x;}

 I used foo like this:

 int i;
 auto j = foo(&i);

 So does this now fail to compile?  Or does it silently kill the performance
 of my code?

 If the latter, we are left with the same problem we have now.

 If the former, how does one call a function with a noscope parameter?

 The more I think about this, the more I'd rather have D1 behavior and some
 sort of way to indicate my function should allocate a heap frame (except on
 easily provable scope escapes).

 The most common case I think which will cause unnecessary allocations, is a
 very common case.  A class setter:

 class X
 {
    private int *v_;
    int *v(int *newV) {return v_ = newV;}
    int *v() { return v_;}
 }

 Clearly, newV escapes into the class instance, but how do we know what the
 scope of the class instance is to know if newV truly escapes its own scope?

 Escape analysis also applies to shared/local/scope storage types and not 
 just delegates. Consider having to write a function for every combination 
 of shared/local/scope for every object or pointer in the function 
 signature.

shared/unshared is not a storage class, it is a type modifier (like const). 
But in any case, shared is much easier to define.  Only one line needs to be 
checked -- is this accessible by another thread or not.  Since it is a type 
modifier, it's carried around for every reference to shared data, and you 
can easily do escape analysis there.

Scope is much more difficult because there are many scopes to consider. 
It's not just global or not global, you have a scope for each function, a 
scope for each set of braces within a function, and there is no easy way to 
say which scope you are referring to when you say a variable is scope.  If 
you can only refer to the current scope, then you have not solved the 
closure problem, and useful escape analysis is impossible beyond simply 'a 
pointer to a variable I declared in this scope is being returned.'

In order for escape analysis to be useful, I need to be able to specify in a 
function such as:

void foo(int *x, int **y, int **z)

That x might escape to y's or z's scope.  How do you allow that 
specification without making function signatures dreadfully complicated?

-Steve

Oct 28 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 28 Oct 2008 09:44:28 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 shared/unshared is not a storage class, it is a type modifier (like const).

No, because shared and local objects get created and garbage collected on  
different heaps.

 In order for escape analysis to be useful, I need to be able to specify in a
 function such as:

 void foo(int *x, int **y, int **z)

 That x might escape to y's or z's scope.  How do you do allow that
 specification without making function signatures dreadfully complicated?

Well, "x escapes to y or z" is easy since it's how D works today. And if you  
have a no_assignment type, then "x won't escape to y or z" is easy too.  
It's in the mixed cases that things get complicated. I'd recommend looking  
up pedigree types as one possible solution.

Oct 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Robert Jacques" wrote
 On Tue, 28 Oct 2008 09:44:28 -0400, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 shared/unshared is not a storage class, it is a type modifier (like const).

 No, because shared and local objects get created and garbage collected on 
 different heaps.

That's an interesting point.  Shared definitely has to be a type modifier, 
otherwise, it cannot do this:

shared int x = 0;

int *xp = &x; // error, xp now is unshared, and points to shared data.

But it probably has to be a storage class as well.  Not sure about that.

 In order for escape analysis to be useful, I need to be able to specify in a
 function such as:

 void foo(int *x, int **y, int **z)

 That x might escape to y's or z's scope.  How do you do allow that
 specification without making function signatures dreadfully complicated?

 Well, x escapes to y or z is easy since it's how D works today.

But what if y or z is not in x's scope?  For instance:

void bar(ref int *y, ref int *z)
{
   int x = 5;
   foo(&x, &y, &z);
}

If y or z gets set to &x, then you have to allocate a closure for bar.

The opposite example:

void bar(int *y, int *z)
{
   int x = 5;
   foo(&x, &y, &z);
}

No closure necessary.  So you need something to say that y or z can get set 
to x, so the compiler would be smart enough to only allocate a closure if y 
or z exists outside x's scope.  Otherwise, you have unnecessary closures, 
and we are in the same boat as today.

-Steve

Oct 28 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Tue, 28 Oct 2008 14:46:34 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 "Robert Jacques" wrote
 On Tue, 28 Oct 2008 09:44:28 -0400, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 shared/unshared is not a storage class, it is a type modifier (like const).

 No, because shared and local objects get created and garbage collected on
 different heaps.

 That's an interesting point.  Shared definitely has to be a type modifier,
 otherwise, it cannot do this:

 shared int x = 0;

 int *xp = &x; // error, xp now is unshared, and points to shared data.

 But it probably also has to be a storage class also.  Not sure about that.

This is a desirable error that was discussed back when shared was  
introduced. You can think of shared / local like immutable and mutable.  
The real problem is that a 'const' for shared/local/scope isn't clear yet.

 In order for escape analysis to be useful, I need to be able to specify in a
 function such as:

 void foo(int *x, int **y, int **z)

 That x might escape to y's or z's scope.  How do you do allow that
 specification without making function signatures dreadfully complicated?

 Well, x escapes to y or z is easy since it's how D works today.

 But what if y or z is not in x's scope?

Which is an issue with the user of foo, but not foo's signature.

 For instance:
 void bar(ref int *y, ref int *z)
 {
    int x = 5;
    foo(&x, &y, &z);
 }

 If y or z gets set to &x, then you have to allocate a closure for bar.

 The opposite example:

 void bar(int *y, int *z)
 {
    int x = 5;
    foo(&x, &y, &z);
 }

 No closure necessary.  So you need something to say that y or z can get set
 to x, so the compiler would be smart enough to only allocate a closure if y
 or z exists outside x's scope.  Otherwise, you have unnecessary closures,
 and we are in the same boat as today.

This example, although important, is essentially about whether optimizing
the closure is valid or not and has nothing to do with the behaviour of
foo. However, this does seem to illustrate a need for three types: global
escape (variable may escape to anywhere), pure escape (variable may escape
to other inputs), no escape (variable is guaranteed not to escape). For
example, if foo saved &x to a static variable (global escape) then in all
cases it needs to be heap allocated. But if (as in your example) &x is
saved to one of the function inputs (pure escape), then the caller can
detect if it can ensure no escape and therefore use the stack.
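
A minimal sketch of these three categories in present-day D. The names globalEscape, inputEscape, noEscape and saved are hypothetical and appear nowhere in the thread; the code only shows what each kind of escape looks like from the caller's side, not how a compiler would classify it.

```d
// Hypothetical names for illustration; none of these appear in the thread.
int* saved; // module-level variable that outlives every call

// "Global escape": x is stored somewhere that outlives the call.
void globalEscape(int* x) { saved = x; }

// "Pure escape": x can only leave through another input (here *slot).
void inputEscape(int* x, int** slot) { *slot = x; }

// "No escape": x is only dereferenced during the call.
int noEscape(int* x) { return *x + 1; }

void main()
{
    int local = 5;

    // Safe on the stack: nothing keeps the address after the call.
    assert(noEscape(&local) == 6);

    // Safe on the stack only because slot lives in the same frame as local.
    int* slot;
    inputEscape(&local, &slot);
    assert(slot is &local);

    // A global escape needs storage that outlives the caller's frame.
    int* heap = new int;
    *heap = 7;
    globalEscape(heap);
    assert(*saved == 7);
}
```

In the inputEscape call the caller can see that slot and local share a lifetime, which is the case where stack allocation stays valid.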

Oct 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Steven Schveighoffer wrote:
 "Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function signature is 
 because:

 1. that makes it self-documenting

 
 But the documentation is not enough.  You cannot express the intricacies of 
 what variables are scope escapes so that the compiler can make intelligent 
 enough decisions.  What this will result in is slightly less unnecessary 
 closures, but not enough to make a difference.  Or else you won't be able to 
 declare things the way you want, so you will be forced to declare something 
 that *could* result in an escape, but usually doesn't.

I think it is conceptually straightforward whether a reference escapes 
or not, though it is difficult for the compiler to detect it reliably.

 2. function bodies may be external, i.e. not present
 3. virtual functions

 
 Yes, so you are now implying a scope escape contract on all derived classes. 
 But not a very expressive one.
 
 4. notifies the user if a library function parameter scope-ness changes 
 (you'll get a compile time error)

 
 Oh really?  I imagined that if the scope-ness changed it just results in a 
 new heap allocation when I call the function.

First off, the mangled names will be different, so it won't link until 
you recompile. This is critical because the caller's code depends on the 
scope/noscope characteristic.

Secondly, passing a scoped reference to a noscope parameter should be a 
compile time error.

 The more I think about this, the more I'd rather have D1 behavior and some 
 sort of way to indicate my function should allocate a heap frame (except on 
 easily provable scope escapes).

Having the caller specify it is not tenable, because the caller has no 
control over (and likely no knowledge of) what the callee does. 
Functions should be regarded as black boxes, where all you can know 
about them is in the function signature.

 The most common case I think which will cause unnecessary allocations, is a 
 very common case.  A class setter:
 
 class X
 {
    private int *v_;
    int *v(int *newV) {return v_ = newV;}
    int *v() { return v_;}
 }
 
 Clearly, newV escapes into the class instance,

Then it's noscope.

 but how do we know what the 
 scope of the class instance is to know if newV truly escapes its own scope?

We take the conservative approach, and regard "might escape" and "don't 
know if it escapes" as "treat as if it does escape".

Oct 28 2008

Sean Kelly <sean invisibleduck.org> writes:

Walter Bright wrote:
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function 
 signature is because:

 1. that makes it self-documenting

 But the documentation is not enough.  You cannot express the 
 intricacies of what variables are scope escapes so that the compiler 
 can make intelligent enough decisions.  What this will result in is 
 slightly less unnecessary closures, but not enough to make a 
 difference.  Or else you won't be able to declare things the way you 
 want, so you will be forced to declare something that *could* result 
 in an escape, but usually doesn't.

 
 I think it is conceptually straightforward whether a reference escapes 
 or not, though it is difficult for the compiler to detect it reliably.
 
 2. function bodies may be external, i.e. not present
 3. virtual functions

 Yes, so you are now implying a scope escape contract on all derived 
 classes. But not a very expressive one.


There's another weird issue that I'm not sure if anyone has touched on:

struct S
{
     int x;
     int getX() { return x; }
}

void main()
{
     auto s = new S;
     fnA( s );
}

void fnA( S* s )
{
     fnB( &s.getX );
}

void fnB( noscope int delegate() dg ) {}

How does the compiler handle this?  It can't tell by inspecting the type 
whether the data for S is dynamic... in fact, the same could be said of 
a "scope" instance of a class.  I guess it would have to assume that 
object variables without a "noscope" label must be scoped?


Sean

Oct 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Sean Kelly wrote:
 How does the compiler handle this?  It can't tell by inspecting the type 
 whether the data for S is dynamic... in fact, the same could be said of 
 a "scope" instance of a class.  I guess it would have to assume that 
 object variables without a "noscope" label must be scoped?

What you're talking about is the escaping of pointers to local 
variables. The compiler does not detect it, except in trivial cases.

This is why, in safe mode, taking the address of a local variable will 
not be allowed.

Oct 28 2008

Don <nospam nospam.com.au> writes:

Walter Bright wrote:
 Sean Kelly wrote:
 How does the compiler handle this?  It can't tell by inspecting the 
 type whether the data for S is dynamic... in fact, the same could be 
 said of a "scope" instance of a class.  I guess it would have to 
 assume that object variables without a "noscope" label must be scoped?

 
 What you're talking about is the escaping of pointers to local 
 variables. The compiler does not detect it, except in trivial cases.
 
 This is why, in safe mode, taking the address of a local variable will 
 not be allowed.

You could allow it inside a pure function, whenever the return type 
does not contain pointers.

Oct 29 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Sean Kelly wrote:
 Walter Bright wrote:
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function 
 signature is because:

 1. that makes it self-documenting

 But the documentation is not enough.  You cannot express the 
 intricacies of what variables are scope escapes so that the compiler 
 can make intelligent enough decisions.  What this will result in is 
 slightly less unnecessary closures, but not enough to make a 
 difference.  Or else you won't be able to declare things the way you 
 want, so you will be forced to declare something that *could* result 
 in an escape, but usually doesn't.

 I think it is conceptually straightforward whether a reference escapes 
 or not, though it is difficult for the compiler to detect it reliably.

 2. function bodies may be external, i.e. not present
 3. virtual functions

 Yes, so you are now implying a scope escape contract on all derived 
 classes. But not a very expressive one.


 
 There's another weird issue that I'm not sure if anyone has touched on:
 
 struct S
 {
      int x;
      int getX() { return x; }
 }
 
 void main()
 {
      auto s = new S;
      fnA( s );
 }
 
 void fnA( S* s )
 {
      fnB( &s.getX );
 }

Part of s escapes, so the compiler should assume that the whole s
escapes.  If s is scope by default, it should be a compile-time error here.

 void fnB( noscope int delegate() dg ) {}
 
 How does the compiler handle this?  It can't tell by inspecting the type 
 whether the data for S is dynamic... in fact, the same could be said of 
 a "scope" instance of a class.  I guess it would have to assume that 
 object variables without a "noscope" label must be scoped?
 
 
 Sean

Oct 28 2008

Sean Kelly <sean invisibleduck.org> writes:

Sergey Gromov wrote:
 Sean Kelly wrote:
 Walter Bright wrote:
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function 
 signature is because:

 1. that makes it self-documenting

 But the documentation is not enough.  You cannot express the 
 intricacies of what variables are scope escapes so that the compiler 
 can make intelligent enough decisions.  What this will result in is 
 slightly less unnecessary closures, but not enough to make a 
 difference.  Or else you won't be able to declare things the way you 
 want, so you will be forced to declare something that *could* result 
 in an escape, but usually doesn't.

 I think it is conceptually straightforward whether a reference escapes 
 or not, though it is difficult for the compiler to detect it reliably.

 2. function bodies may be external, i.e. not present
 3. virtual functions

 Yes, so you are now implying a scope escape contract on all derived 
 classes. But not a very expressive one.


 There's another weird issue that I'm not sure if anyone has touched on:

 struct S
 {
      int x;
      int getX() { return x; }
 }

 void main()
 {
      auto s = new S;
      fnA( s );
 }

 void fnA( S* s )
 {
      fnB( &s.getX );
 }

 
 Part of s escapes, so the compiler should assume that the whole s
 escapes.  If s is scope by default, it should be a compile-time error here.

So are you saying that I'd have to rewrite fnA as:

     void fnA( noscope S* s ) {...}

I guess I can see the point, but that's horribly viral.  Particularly 
when classes come into the picture.  With this in mind, from a syntax 
standpoint I'd be leaning towards what D does right now (ie having 
noscope as the default), but from a performance standpoint this is 
absolutely not an option--I may as well just switch to something like Ruby.


Sean

Oct 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Walter Bright" wrote
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 The reason the scope/noscope needs to be part of the function signature 
 is because:

 1. that makes it self-documenting

 But the documentation is not enough.  You cannot express the intricacies 
 of what variables are scope escapes so that the compiler can make 
 intelligent enough decisions.  What this will result in is slightly less 
 unnecessary closures, but not enough to make a difference.  Or else you 
 won't be able to declare things the way you want, so you will be forced 
 to declare something that *could* result in an escape, but usually 
 doesn't.

 I think it is conceptually straightforward whether a reference escapes or 
 not, though it is difficult for the compiler to detect it reliably.

 2. function bodies may be external, i.e. not present
 3. virtual functions

 Yes, so you are now implying a scope escape contract on all derived 
 classes. But not a very expressive one.

 4. notifies the user if a library function parameter scope-ness changes 
 (you'll get a compile time error)

 Oh really?  I imagined that if the scope-ness changed it just results in 
 a new heap allocation when I call the function.

 First off, the mangled names will be different, so it won't link until you 
 recompile. This is critical because the caller's code depends on the 
 scope/noscope characteristic.

This 'feature' is basically useless ;)  D has no shared libraries, so I 
don't think anyone generally keeps their stale object files around and tries 
to link with them instead of trying to recompile the sources.  You are 
asking for trouble otherwise.

 Secondly, passing a scoped reference to a noscope parameter should be a 
 compile time error.

OK, so when does a closure happen?  I thought the point of this was to 
specify when a closure was necessary...

compiler sees foo(noscope int *x)

I try to pass in an address to a local variable.  Compiler says, hm... I 
need a closure to convert my scope variable into a noscope.

 The more I think about this, the more I'd rather have D1 behavior and 
 some sort of way to indicate my function should allocate a heap frame 
 (except on easily provable scope escapes).

 Having the caller specify it is not tenable, because the caller has no 
 control over (and likely no knowledge of) what the callee does. Functions 
 should be regarded as black boxes, where all you can know about them is in 
 the function signature.

But the compiler's lack of knowledge/proof about the escape intricacies of a 
function will cause either a) unnecessary closure allocation, or b) 
impossible specifications.  i.e. I want to specify that either a scope or 
noscope variable can be passed in, and the variable might escape depending 
on what you pass in for other arguments, how do I do that?

 The most common case I think which will cause unnecessary allocations, is 
 a very common case.  A class setter:

 class X
 {
    private int *v_;
    int *v(int *newV) {return v_ = newV;}
    int *v() { return v_;}
 }

 Clearly, newV escapes into the class instance,

 Then it's noscope.

So then to call X.v, the function must allocate a closure?  How does this 
improve the current situation where closures are allocated by default?

 but how do we know what the scope of the class instance is to know if 
 newV truly escapes its own scope?

 We take the conservative approach, and regard "might escape" and "don't 
 know if it escapes" as "treat as if it does escape".

Also untenable.  We have the same situation today.  You will have achieved 
nothing with this syntax except making people write scope or noscope 
everywhere to satisfy incomplete compiler rules.

-Steve

Oct 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Steven Schveighoffer wrote:
 "Walter Bright" wrote
 First off, the mangled names will be different, so it won't link until you 
 recompile. This is critical because the caller's code depends on the 
 scope/noscope characteristic.

 
 This 'feature' is basically useless ;)  D has no shared libraries, so I 
 don't think anyone generally keeps their stale object files around and tries 
 to link with them instead of trying to recompile the sources.  You are 
 asking for trouble otherwise.

I disagree. The whole idea behind separate compilation and using 
makefiles is to recompile only what is necessary. Encoding the function 
specification into its identifier is a tried and true way of detecting 
mistakes in that.


 Secondly, passing a scoped reference to a noscope parameter should be a 
 compile time error.

 
 OK, so when does a closure happen?  I thought the point of this was to 
 specify when a closure was necessary...
 
 compiler sees foo(noscope int *x)
 
 I try to pass in an address to a local variable.  Compiler says, hm... I 
 need a closure to convert my scope variable into a noscope.

Either the compiler issues an error, or it allocates the scoped variable 
on the heap. I prefer the former behavior.


 But the compiler's lack of knowledge/proof about the escape intricacies of a 
 function will cause either a) unnecessary closure allocation, or b) 
 impossible specifications.  i.e. I want to specify that either a scope or 
 noscope variable can be passed in, and the variable might escape depending 
 on what you pass in for other arguments, how do I do that?

You make it noscope. Remember that scope is an optimization.

 The most common case I think which will cause unnecessary allocations, is 
 a very common case.  A class setter:

 class X
 {
    private int *v_;
    int *v(int *newV) {return v_ = newV;}
    int *v() { return v_;}
 }

 Clearly, newV escapes into the class instance,

 Then it's noscope.

 
 So then to call X.v, the function must allocate a closure?  How does this 
 improve the current situation where closures are allocated by default?

If it's escaping, you MUST allocate it in a way that doesn't disappear 
when the escape happens.


 We take the conservative approach, and regard "might escape" and "don't 
 know if it escapes" as "treat as if it does escape".

 
 Also untenable.  We have the same situation today.  You will have achieved 
 nothing with this syntax except making people write scope or noscope 
 everywhere to satisfy incomplete compiler rules.

The improvement with the 'scope' keyword is it allows the compiler to 
assume that the reference does not escape.
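
A rough sketch of that difference, written in present-day D rather than the syntax under discussion here. The names callTwice and makeCounter are hypothetical. It assumes that a delegate parameter marked scope is promised not to be stored, so the caller's frame can stay on the stack, while a returned delegate forces its captured variables onto the heap.

```d
// Hypothetical names; a sketch, not code from the thread.

// dg is only called, never stored, so it is marked scope.
int callTwice(scope int delegate() dg)
{
    return dg() + dg();
}

// The returned delegate outlives makeCounter, so count must be heap-allocated.
int delegate() makeCounter()
{
    int count = 0;
    return () { return ++count; };
}

void main()
{
    int n = 20;
    // The literal reads n from main's stack frame; with a scope parameter
    // the compiler is free to skip the closure allocation.
    assert(callTwice(() { return n + 1; }) == 42);

    auto next = makeCounter();
    assert(next() == 1);
    assert(next() == 2);
}
```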

Oct 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Walter Bright" wrote
 Steven Schveighoffer wrote:
 "Walter Bright" wrote
 First off, the mangled names will be different, so it won't link until 
 you recompile. This is critical because the caller's code depends on the 
 scope/noscope characteristic.

 This 'feature' is basically useless ;)  D has no shared libraries, so I 
 don't think anyone generally keeps their stale object files around and 
 tries to link with them instead of trying to recompile the sources.  You 
 are asking for trouble otherwise.

 I disagree. The whole idea behind separate compilation and using makefiles 
 is to recompile only what is necessary. Encoding the function 
 specification into its identifier is a tried and true way of detecting 
 mistakes in that.

Any decent build tool (including make, assuming dependencies are created)
will rebuild the source when it sees the dependency changed.  In this case,
if the new signature can be used, it will recompile silently.  That was my
point.

However, I'm no longer sure what you are planning, because you have
sufficiently confused me ;)

So if the recompile causes a compile failure, then it would fail.  But that
is unrelated to the requirement that you have to recompile to get it to
link.  Even if the function signatures are the same, the build tool is going
to recompile the file instead of linking the stale object.

 Secondly, passing a scoped reference to a noscope parameter should be a 
 compile time error.

 OK, so when does a closure happen?  I thought the point of this was to 
 specify when a closure was necessary...

 compiler sees foo(noscope int *x)

 I try to pass in an address to a local variable.  Compiler says, hm... I 
 need a closure to convert my scope variable into a noscope.

 Either the compiler issues an error, or it allocates the scoped variable 
 on the heap. I prefer the former behavior.

Huh?  So no automatic closures?  If the compiler can't prove that a closure
is or is not necessary, does code now just fail to compile?

 But the compiler's lack of knowledge/proof about the escape intricacies 
 of a function will cause either a) unnecessary closure allocation, or b) 
 impossible specifications.  i.e. I want to specify that either a scope or 
 noscope variable can be passed in, and the variable might escape 
 depending on what you pass in for other arguments, how do I do that?

 You make it noscope. Remember that scope is an optimization.

 The most common case I think which will cause unnecessary allocations, 
 is a very common case.  A class setter:

 class X
 {
    private int *v_;
    int *v(int *newV) {return v_ = newV;}
    int *v() { return v_;}
 }

 Clearly, newV escapes into the class instance,

 Then it's noscope.

 So then to call X.v, the function must allocate a closure?  How does this 
 improve the current situation where closures are allocated by default?

 If it's escaping, you MUST allocate it in a way that doesn't disappear 
 when the escape happens.

The problem is, what if I know it's escaping in some cases, but not in
others, but the compiler can't tell either way?  (see example below)

 We take the conservative approach, and regard "might escape" and "don't 
 know if it escapes" as "treat as if it does escape".

 Also untenable.  We have the same situation today.  You will have 
 achieved nothing with this syntax except making people write scope or 
 noscope everywhere to satisfy incomplete compiler rules.

 The improvement with the 'scope' keyword is it allows the compiler to 
 assume that the reference does not escape.

And is that property enforced while compiling the function, or does the
compiler assume the author knows best?

Like I said, I'm sufficiently confused...

How do I markup class X so that at least foo and foo2 compile without
issues?

class X
{
   int *p;
   this(int *p_) {p = p_;}
}

// I expect this to compile and work.
void foo()
{
   int i;
   auto x = new X(&i);
}

// I expect this to compile and work.
X foo2()
{
   int[] arr = new int[1];
   return new X(&arr[0]);
}

// What happens here, a closure or a failure?
X foo3()
{
  int i;
  auto x = new X(&i);
  return x;
}

If you have some syntax such that all 3 compile (i.e. foo3 creates a
closure), then how does the compiler know foo3 is ok?

-Steve

Oct 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Pure functions almost imply that all of their parameters are scoped. 
The exception is the return value of the pure function. If the 
return value can contain any references that came from the parameters, 
then those parameters are not scoped.
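
A small sketch of the two cases, using present-day D and hypothetical function names (sum, larger) that are not from the thread. It only illustrates the distinction drawn above: a pure function that returns no references cannot leak its parameters, while one that returns a pointer can hand a parameter back to the caller.

```d
// Hypothetical functions for illustration only.

// Pure and returns no references: p cannot outlive the call.
pure int sum(const(int)* p, size_t n)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += p[i];
    return total;
}

// Pure, but the result aliases a parameter, so that parameter is not scoped.
pure int* larger(int* a, int* b)
{
    return *a >= *b ? a : b;
}

void main()
{
    int[3] data = [1, 2, 3];
    assert(sum(data.ptr, data.length) == 6);

    int x = 4, y = 9;
    int* p = larger(&x, &y); // p now refers to y
    assert(p is &y);
}
```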

Oct 27 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-10-27 17:33:36 -0400, Walter Bright <newshound1 digitalmars.com> said:

 Pure functions almost implicitly imply that its parameters are all 
 scoped. The exception is the return value of the pure function. If the 
 return value can contain any references that came from the parameters, 
 then those parameters are not scoped.

Not if you define "scope" in the function prototype as not escaping the 
caller's scope. That would mean that you can receive a "caller scope" 
pointer on input and return it back to the caller when the function 
ends. It never escapes the caller's scope, so all is fine.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Oct 27 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Walter Bright wrote:
 Pure functions almost implicitly imply that its parameters are all 
 scoped. The exception is the return value of the pure function. If the 
 return value can contain any references that came from the parameters, 
 then those parameters are not scoped.

I think even the return value can be considered scoped. Essentially it 
does not leave the scope of the caller.

Andrei

Oct 27 2008

Jason House <jason.james.house gmail.com> writes:

Andrei Alexandrescu Wrote:

 Walter Bright wrote:
 Pure functions almost implicitly imply that its parameters are all 
 scoped. The exception is the return value of the pure function. If the 
 return value can contain any references that came from the parameters, 
 then those parameters are not scoped.

 
 I think even the return value can be considered scoped. Essentially it 
 does not leave the scope of the caller.
 
 Andrei


As far as I know, there's no way for functions to specially prepare objects to
be called scope. Isn't that the caller's choice?

Oct 28 2008

Jason House <jason.james.house gmail.com> writes:

Walter Bright Wrote:

 The delegate closure issue is part of a wider issue - escape analysis. A 
 reference is said to 'escape' a scope if it, well, leaves that scope. 
 Here's a trivial example:
 
 int* foo() { int i; return &i; }
 
 The reference to i escapes the scope of i, thus courting disaster. 
 Another form of escaping:
 
 int* p;
 void bar(int* x) { p = x; }
 
 which is, on the surface, legitimate, but fails for:
 
 void abc(int j)
 {
      bar(&j);
 }
 
 This kind of problem is currently undetectable by the compiler.
 
 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.
 
 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

I like the original definition of in as "const scope".  I would also like in to
be the default for function parameters.

Does that make me a heretic OOP programmer? :)

Oct 27 2008

Robert Fraser <fraserofthenight gmail.com> writes:

Walter Bright Wrote:

 The delegate closure issue is part of a wider issue - escape analysis. A 
 reference is said to 'escape' a scope if it, well, leaves that scope. 
 Here's a trivial example:
 
 int* foo() { int i; return &i; }
 
 The reference to i escapes the scope of i, thus courting disaster. 
 Another form of escaping:
 
 int* p;
 void bar(int* x) { p = x; }
 
 which is, on the surface, legitimate, but fails for:
 
 void abc(int j)
 {
      bar(&j);
 }
 
 This kind of problem is currently undetectable by the compiler.
 
 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.
 
 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

I get the feeling that D's type system is going to become the joke of the
programming world. Are we really going to have to worry about a scope
unshared(invariant(int)*) ...? What other type modifiers can we put on that?

Oct 27 2008

Walter Bright <newshound1 digitalmars.com> writes:

Robert Fraser wrote:
 I get the feeling that D's type system is going to become the joke of
 the programming world. Are we really going to have to worry about a
 scope unshared(invariant(int)*) ...? What other type modifiers can we
 put on that?

That argues that "noscope" should be the default. Using "scope" would be 
an optional optimization.

BTW, "unshared" is the default. "shared" would be the keyword.

Oct 27 2008

"Denis Koroskin" <2korden gmail.com> writes:

On Tue, 28 Oct 2008 01:15:24 +0300, Walter Bright  
<newshound1 digitalmars.com> wrote:

 Robert Fraser wrote:
 I get the feeling that D's type system is going to become the joke of
 the programming world. Are we really going to have to worry about a
 scope unshared(invariant(int)*) ...? What other type modifiers can we
 put on that?

 That argues that "noscope" should be the default. Using "scope" would be  
 an optional optimization.

 BTW, "unshared" is the default. "shared" would be the keyword.

I hope that 'noscope' is considered to be default *not* because it would  
introduce one more keyword otherwise...

OTOH, noscope *should* be a keyword in either case, due to some casts:

scope int* sp;
noscope int* nsp;

nsp = cast(noscope int*)sp;
sp = cast(scope int*)nsp;

Oct 27 2008

Robert Fraser <fraserofthenight gmail.com> writes:

Walter Bright wrote:
 Robert Fraser wrote:
 I get the feeling that D's type system is going to become the joke of
 the programming world. Are we really going to have to worry about a
 scope unshared(invariant(int)*) ...? What other type modifiers can we
 put on that?

 
 That argues that "noscope" should be the default. Using "scope" would be 
 an optional optimization.
 
 BTW, "unshared" is the default. "shared" would be the keyword.

My point wasn't the number of keywords... ("shared" is actually the 
first keyword introduced that's conflicted with an identifier I've 
used). My point was the type system is getting incredibly complex. The 
theory that static typing is the solution to everything is what led to 
the beast known as checked exceptions.

Oct 27 2008

Walter Bright <newshound1 digitalmars.com> writes:

Robert Fraser wrote:
 My point wasn't the number of keywords... ("shared" is actually the 
 first keyword introduced that's conflicted with an identifier I've 
 used). My point was the type system is getting incredibly complex. The 
 theory that static typing is the solution to everything is what led to 
 the beast known as checked exceptions.

The complexity is an issue that concerns me. That's why I suspect that 
if one doesn't use them, the defaults should work.

I wouldn't worry about checked exceptions. *Why* it's a disaster is well 
understood, and the reason isn't because it is complicated or does 
static checking.

Oct 27 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Robert Fraser wrote:
 Walter Bright wrote:
 Robert Fraser wrote:
 I get the feeling that D's type system is going to become the joke of
 the programming world. Are we really going to have to worry about a
 scope unshared(invariant(int)*) ...? What other type modifiers can we
 put on that?

 That argues that "noscope" should be the default. Using "scope" would 
 be an optional optimization.

 BTW, "unshared" is the default. "shared" would be the keyword.

 
 My point wasn't the number of keywords... ("shared" is actually the 
 first keyword introduced that's conflicted with an identifier I've 
 used). My point was the type system is getting incredibly complex. The 
 theory that static typing is the solution to everything is what led to 
 the beast known as checked exceptions.

I don't think you have a case.

Andrei

Oct 27 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-10-27 18:15:24 -0400, Walter Bright <newshound1 digitalmars.com> said:

 That argues that "noscope" should be the default. Using "scope" would 
 be an optional optimization.

I don't think you have much choice. Take these examples:

	scope(int*)* a; // noscope pointer to a scope pointer.

	noscope(int*)* b; // scope pointer to a noscope pointer.

Only one of these two makes sense.

 - - -

On the other side, you could make a different syntax for scope than for 
const and shared, and then the noscope could be the default:

	int*scope* b; // scope pointer to a noscope pointer.

But that looks as attractive as const in C++.

 - - -

Hum, and please find a better name than "noscope".

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Oct 27 2008

Walter Bright <newshound1 digitalmars.com> writes:

Michel Fortin wrote:
 On 2008-10-27 18:15:24 -0400, Walter Bright <newshound1 digitalmars.com> 
 said:
 
 That argues that "noscope" should be the default. Using "scope" would 
 be an optional optimization.

 
 I don't think you have much choice. Take these examples:
 
     scope(int*)* a; // noscope pointer to a scope pointer.
 
     noscope(int*)* b; // scope pointer to a noscope pointer.
 
 Only one of these two makes sense.

scope is a storage class, not a type constructor.

Oct 27 2008

Jason House <jason.james.house gmail.com> writes:

Walter Bright Wrote:

 Michel Fortin wrote:
 On 2008-10-27 18:15:24 -0400, Walter Bright <newshound1 digitalmars.com> 
 said:
 
 That argues that "noscope" should be the default. Using "scope" would 
 be an optional optimization.

 
 I don't think you have much choice. Take these examples:
 
     scope(int*)* a; // noscope pointer to a scope pointer.
 
     noscope(int*)* b; // scope pointer to a noscope pointer.
 
 Only one of these two makes sense.

 
 scope is a storage class, not a type constructor.

How do you treat members of objects passed in? If I pass in a struct with a
delegate in it, is it treated as scope too? What if it's an array? A class?

Oct 27 2008

Walter Bright <newshound1 digitalmars.com> writes:

Jason House wrote:
 scope is a storage class, not a type constructor.

 
 How do you treat members of objects passed in? If I pass in a struct
 with a delegate in it, is it treated as scope too? What if it's an
 array? A class?

The scope applies to the bits of the object, not what they may refer to.

Oct 27 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-10-28 00:28:27 -0400, Walter Bright <newshound1 digitalmars.com> said:

 Jason House wrote:
 Walter Bright wrote:
 scope is a storage class, not a type constructor.

 
 How do you treat members of objects passed in? If I pass in a struct
 with a delegate in it, is it treated as scope too? What if it's an
 array? A class?

 
 The scope applies to the bits of the object, not what they may refer to.

So basically, we always have head-scope. Here's my question:

	int** a;

	void foo() {
		scope int b;
		scope int* c = &b;
		scope int** d = &c;
		a = &c; // error, c is scope, can't copy address of scope to non-scope.
		a = d; // error? d is scope, but we're only making a copy of its bits.
		       // It's what d points to that is scope, but do we know about that?
	}

In this case, it's obvious that the last assignment (a = d) is bogus. 
Is there any plan in having this fail to compile? If so, where does it 
fail?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Oct 28 2008

Jason House <jason.james.house gmail.com> writes:

Michel Fortin Wrote:

 On 2008-10-28 00:28:27 -0400, Walter Bright <newshound1 digitalmars.com> said:
 
 Jason House wrote:
 Walter Bright wrote:
 scope is a storage class, not a type constructor.

 
 How do you treat members of objects passed in? If I pass in a struct
 with a delegate in it, is it treated as scope too? What if it's an
 array? A class?

 
 The scope applies to the bits of the object, not what they may refer to.

 
 So basically, we always have head-scope. Here's my question:
 
 	int** a;
 
 	void foo() {
 		scope int b;
 		scope int* c = &b;
 		scope int** d = &c;
 		a = &c; // error, c is scope, can't copy address of scope to non-scope.
 		a = d; // error? d is scope, but we're only making a copy of its bits.
 		       // It's what d points to that is scope, but do we know about that?
 	}

Your assignment to c discards the scope protection. Taking the address of scope
variables should be an error.



 
 In this case, it's obvious that the last assignment (a = d) is bogus. 
 Is there any plan in having this fail to compile? If so, where does it 
 fail?
 
 -- 
 Michel Fortin
 michel.fortin michelf.com
 http://michelf.com/

Oct 28 2008

Jason House <jason.james.house gmail.com> writes:

Walter Bright Wrote:

 Jason House wrote:
 scope is a storage class, not a type constructor.

 
 How do you treat members of objects passed in? If I pass in a struct
 with a delegate in it, is it treated as scope too? What if it's an
 array? A class?

 
 The scope applies to the bits of the object, not what they may refer to.

This seems rather limiting. I know this is aimed at addressing the dynamic
closure problem. This solution would mean that I can't encapsulate delegates.
Ideally, I should be able to declare my encapsulating struct as scope or
noscope and manage the member delegate accordingly.

Oct 28 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Jason House wrote:
 Walter Bright Wrote:
 
 Jason House wrote:
 scope is a storage class, not a type constructor.

 How do you treat members of objects passed in? If I pass in a
 struct with a delegate in it, is it treated as scope too? What if
 it's an array? A class?

 The scope applies to the bits of the object, not what they may
 refer to.

 
 This seems rather limiting. I know this is aimed at addressing the
 dynamic closure problem. This solution would mean that I can't
 encapsulate delegates. Ideally, I should be able to declare my
 encapsulating struct as scope or noscope and manage the member
 delegate accordingly.

I think it's clear that scope is transitive as much as const or 
immutable are. Noscope is also transitive.

Escape analysis is a tricky business. My opinion is that we either take 
care of it properly or blissfully ignore the entire issue. That opinion 
may disagree a bit with Walter's, who'd prefer a quick patch for 
delegates so he returns to threading. I think if we opt for a quick 
patch now, it'll turn to gangrene later. Among other things, it will 
hurt the threading infrastructure it was supposed to give precedence to.


Andrei

Oct 28 2008

Jason House <jason.james.house gmail.com> writes:

Andrei Alexandrescu Wrote:

 Jason House wrote:
 Walter Bright Wrote:
 
 Jason House wrote:
 scope is a storage class, not a type constructor.

 How do you treat members of objects passed in? If I pass in a
 struct with a delegate in it, is it treated as scope too? What if
 it's an array? A class?

 The scope applies to the bits of the object, not what they may
 refer to.

 
 This seems rather limiting. I know this is aimed at addressing the
 dynamic closure problem. This solution would mean that I can't
 encapsulate delegates. Ideally, I should be able to declare my
 encapsulating struct as scope or noscope and manage the member
 delegate accordingly.

 
 I think it's clear that scope is transitive as much as const or 
 immutable are. Noscope is also transitive.
 
 Escape analysis is a tricky business. My opinion is that we either take 
 care of it properly or blissfully ignore the entire issue. That opinion 
 may disagree a bit with Walter's, who'd prefer a quick patch for 
 delegates so he returns to threading. I think if we opt for a quick 
 patch now, it'll turn to gangrene later. Among other things, it will 
 hurt the threading infrastructure it was supposed to give precedence to.
 
 
 Andrei

Transitive scope means that scope can't be a storage class. It's a tricky
subject and threading is way more important to me. I'm fine with a quick fix, I
just don't want to pretend it's more than that.

Oct 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 Jason House wrote:
 Walter Bright Wrote:

 Jason House wrote:
 scope is a storage class, not a type constructor.

 How do you treat members of objects passed in? If I pass in a
 struct with a delegate in it, is it treated as scope too? What if
 it's an array? A class?

 The scope applies to the bits of the object, not what they may
 refer to.

 This seems rather limiting. I know this is aimed at addressing the
 dynamic closure problem. This solution would mean that I can't
 encapsulate delegates. Ideally, I should be able to declare my
 encapsulating struct as scope or noscope and manage the member
 delegate accordingly.

 I think it's clear that scope is transitive as much as const or immutable 
 are. Noscope is also transitive.

 Escape analysis is a tricky business. My opinion is that we either take 
 care of it properly or blissfully ignore the entire issue. That opinion 
 may disagree a bit with Walter's, who'd prefer a quick patch for delegates 
 so he returns to threading. I think if we opt for a quick patch now, it'll 
 turn to gangrene later. Among other things, it will hurt the threading 
 infrastructure it was supposed to give precedence to.

A quick patch is not possible IMO.

What I'd prefer is to allocate a closure when the compiler can prove one is
needed, and to allow the developer to specify one when it can't.  That is,
allocate a closure automatically in simple cases like this:

int *f() {int x = 5; return &x;}

And in cases where you can't prove it, default to not allocating a closure, 
and allow the developer to specify that a closure is necessary:

int *f2(int *y){...}

int *f() <insert closure keyword here> {int x = 5; return f2(&x);}

Syntax to be debated ;)

I do *not* think the problem should be ignored (i.e. continue with the 
current D2 implementation).

-Steve

Oct 28 2008

"Denis Koroskin" <2korden gmail.com> writes:

On Tue, 28 Oct 2008 16:54:15 +0300, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
[snip]
 What I'd prefer is allocate closure when you can prove it, allow
 specification when you can't.  That is, allocate a closure automatically  
 in simple cases like this:

 int *f() {int x = 5; return &x;}

Hmm.. This is nice! You can implement 'new' in pure D in just a few lines:

template new(T)
{
     T* new(Args...)(Args args)
     {
         T t = T(args);
         return &t;
     }
}

Example:

struct Foo
{
     public this(int value) {
         this.value = value;
     }

     private int value;
}

Foo* foo = new!(Foo)(42);

Oct 28 2008
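
For comparison with the template above: 'new' is a reserved word in D, so the template could not actually be declared under that name, and it only works if the compiler silently moves 't' to the heap as proposed. A minimal sketch of the explicit equivalent follows, assuming a hypothetical helper named 'make' (the name is not from the thread); it requests the heap allocation directly instead of relying on escape detection.

```d
struct Foo
{
    int value;

    this(int value)
    {
        this.value = value;
    }
}

// Hypothetical helper (the name `make` is an assumption for illustration):
// allocates a T on the GC heap and forwards the constructor arguments.
T* make(T, Args...)(Args args)
{
    return new T(args);
}
```

Usage would read 'Foo* foo = make!Foo(42);', with the allocation visible at the call site rather than inferred by the compiler.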

"Bill Baxter" <wbaxter gmail.com> writes:

On Tue, Oct 28, 2008 at 10:54 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 What I'd prefer is allocate closure when you can prove it, allow
 specification when you can't.  That is, allocate a closure automatically in
 simple cases like this:

 int *f() {int x = 5; return &x;}

 And in cases where you can't prove it, default to not allocating a closure,
 and allow the developer to specify that a closure is necessary:

So basically programmers have to memorize all the rules the compiler
uses to prove when it's necessary to allocate a closure, and then run
those rules in their heads to determine if the current line of code
will trigger allocation or not?

And when the compiler gets a little smarter, the programmers need to
get smarter, too.  In lock step.

That doesn't sound like a good solution to me.

--bb

Oct 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Bill Baxter" wrote
 On Tue, Oct 28, 2008 at 10:54 PM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 What I'd prefer is allocate closure when you can prove it, allow
 specification when you can't.  That is, allocate a closure automatically 
 in
 simple cases like this:

 int *f() {int x = 5; return &x;}

 And in cases where you can't prove it, default to not allocating a 
 closure,
 and allow the developer to specify that a closure is necessary:

 So basically programmers have to memorize all the rules the compiler
 uses to prove when it's necessary to allocate a closure, and then run
 those rules in their heads to determine if the current line of code
 will trigger allocation or not?

First, the compiler does not have any sound rules for this.  It currently 
allocates a closure on a knee jerk reaction from taking the address of a 
stack variable.  And it's either this or substitute in your statement "prove 
when it's *not* necessary to allocate a closure", which is about as hard and 
probably 10x more common.

Second, for 90% of functions that don't require you to allocate closures, 
you don't have to think about any rules.

For the 9% of functions which return a pointer to local data, proven by the 
compiler, you don't have to think about rules.

For the last 1% of functions, the documentation should clarify how your data 
can escape, and then you have to think about how that affects your usage of 
it.  The docs could say 'best to allocate a closure unless you know what you 
are doing'.

 And when the compiler gets a little smarter, the programmers need to
 get smarter, too.  In lock step.

Not really.  If the compiler can some day store the scope dependency 
information in the object file (and get rid of reading source to determine 
function signature), then this whole manual requirement goes away.

 That doesn't sound like a good solution to me.

Then let's go back to D1's solution -- no closures ;)

For example, NONE of Tango uses closures (as evidenced by the fact that it's 
D1), and it uses pointers to stack data very often (to improve performance). 
So if closure-by-default is the choice, then I'll have to mark all these 
usages as non-closure, which is going to make the whole code base look 
awful.

With the way Walter is thinking of implementing, it might be impossible to 
specify correctly.

-Steve

Oct 28 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Oct 29, 2008 at 4:56 AM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 On Tue, Oct 28, 2008 at 10:54 PM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 What I'd prefer is allocate closure when you can prove it, allow
 specification when you can't.  That is, allocate a closure automatically
 in
 simple cases like this:

 int *f() {int x = 5; return &x;}

 And in cases where you can't prove it, default to not allocating a
 closure,
 and allow the developer to specify that a closure is necessary:

 So basically programmers have to memorize all the rules the compiler
 uses to prove when it's necessary to allocate a closure, and then run
 those rules in their heads to determine if the current line of code
 will trigger allocation or not?

 First, the compiler does not have any sound rules for this.  It currently
 allocates a closure on a knee jerk reaction from taking the address of a
 stack variable.  And it's either this or substitute in your statement "prove
 when it's *not* necessary to allocate a closure", which is about as hard and
 probably 10x more common.

 Second, for 90% of functions that don't require you to allocate closures,
 you don't have to think about any rules.

I don't see why not.  Because the compiler might be allocating a
closure when I don't want it to, killing  performance.  So I'll either
be surprised later, or I need to think about it when I'm writing that
line of code.

 For the 9% of functions which return a pointer to local data, proven by the
 compiler, you don't have to think about rules.

Except didn't you just give us some examples where the function does
things that escape in the local sense, but can be seen not to escape
when examining the full context?

So in those 9% of the cases I may also want to think about what the
compiler will do to avoid unnecessary hidden allocations in my code.
And if I am getting one of these unnecessary allocations, then I will
have to think about how to rearrange my code so that the compiler
doesn't get tricked.  But it could be a library function that's
causing it.

 For the last 1% of functions, the documentation should clarify how your data
 can escape, and then you have to think about how that affects your usage of
 it.  The docs could say 'best to allocate a closure unless you know what you
 are doing'.

 And when the compiler gets a little smarter, the programmers need to
 get smarter, too.  In lock step.

 Not really.  If the compiler can some day store the scope dependency
 information in the object file (and get rid of reading source to determine
 function signature), then this whole manual requirement goes away.

Until the compiler can do the right thing 100% of the time, I have to
be on the lookout for spurious allocations.

 That doesn't sound like a good solution to me.

 Then let's go back to D1's solution -- no closures ;)

Sure.  But if you're going to do that, then at least give us an easy
way to explicitly request a closure for those of us who know when we
need one and when we don't. :-)

--bb

Oct 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Bill Baxter" wrote
 On Wed, Oct 29, 2008 at 4:56 AM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 "Bill Baxter" wrote
 On Tue, Oct 28, 2008 at 10:54 PM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 What I'd prefer is allocate closure when you can prove it, allow
 specification when you can't.  That is, allocate a closure
 automatically
 in
 simple cases like this:

 int *f() {int x = 5; return &x;}

 And in cases where you can't prove it, default to not allocating a
 closure,
 and allow the developer to specify that a closure is necessary:

 So basically programmers have to memorize all the rules the compiler
 uses to prove when it's necessary to allocate a closure, and then run
 those rules in their heads to determine if the current line of code
 will trigger allocation or not?

 First, the compiler does not have any sound rules for this.  It currently
 allocates a closure on a knee jerk reaction from taking the address of a
 stack variable.  And it's either this or substitute in your statement
 "prove
 when it's *not* necessary to allocate a closure", which is about as hard
 and
 probably 10x more common.

 Second, for 90% of functions that don't require you to allocate closures,
 you don't have to think about any rules.

 I don't see why not.  Because the compiler might be allocating a
 closure when I don't want it to, killing  performance.  So I'll either
 be surprised later, or I need to think about it when I'm writing that
 line of code.

No, I'm proposing the compiler SHOULDN'T allocate closures unless it can
prove without a shadow of a doubt that a closure is required.  i.e. it
defaults to D1 behavior, which should cover 90% of functions today.

 For the 9% of functions which return a pointer to local data, proven by
 the
 compiler, you don't have to think about rules.

 Except didn't you just give us some examples where the function does
 things that escape in the local sense, but can be seen not to escape
 when examining the full context?

This is what I'm thinking as proven by the compiler:

int *f()
{
   int x = 5;
   return &x;
}

There is no doubt that this will cause an escape.  A more common scenario (I
just ran into this with a newb on IRC):

char[] readData(InputStream s)
{
   char[64] buf;
   auto len = s.read(buf);
   return buf[0..len];
}

 So in those 9% of the cases I may also want to think about what the
 compiler will do to avoid unnecessary hidden allocations in my code.
 And if I am getting one of these unnecessary allocations, then I will
 have to think about how to rearrange my code so that the compiler
 doesn't get tricked.  But it could be a library function that's
 causing it.

I'm starting to think that if you compile with warnings on, these 9% of
functions shouldn't compile.  Perhaps they shouldn't compile by default
since it's very easy to do this kind of stuff explicitly without closures.

 For the last 1% of functions, the documentation should clarify how your
 data
 can escape, and then you have to think about how that affects your usage
 of
 it.  The docs could say 'best to allocate a closure unless you know what
 you
 are doing'.

 And when the compiler gets a little smarter, the programmers need to
 get smarter, too.  In lock step.

 Not really.  If the compiler can some day store the scope dependency
 information in the object file (and get rid of reading source to
 determine
 function signature), then this whole manual requirement goes away.

 Until the compiler can do the right thing 100% of the time, I have to
 be on the lookout for spurious allocations.

I'm saying no automatic closures unless it's absolutely provable that an
escape occurs.

 That doesn't sound like a good solution to me.

 Then let's go back to D1's solution -- no closures ;)

 Sure.  But if you're going to do that, then at least give us an easy
 way to explicitly request a closure for those of us who know when we
 need one and when we don't. :-)

Assuming the compiler does not ever allocate closures needlessly, I agree
with a way to specify when closures should occur, but not when they should
not (since there's no need).

-Steve

Oct 28 2008
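
The readData example in the post above returns a slice of a stack buffer, which dangles as soon as the function returns. A minimal sketch of the explicit, closure-free fix follows. It assumes a plain const(char)[] source in place of the thread's InputStream, and the name readDataCopy is hypothetical; both substitutions are only there to keep the example self-contained.

```d
// Hypothetical stand-in for the thread's readData: the input is a plain
// slice rather than an InputStream, so the example has no dependencies.
char[] readDataCopy(const(char)[] source)
{
    char[64] buf;                       // lives on the stack
    size_t len = source.length < buf.length ? source.length : buf.length;
    buf[0 .. len] = source[0 .. len];   // fill the stack buffer

    // Returning buf[0 .. len] directly would point into this stack frame;
    // .dup copies the bytes to the GC heap so the result outlives the call.
    return buf[0 .. len].dup;
}
```

The copy costs one allocation per call, but the allocation is written out by the programmer and does not depend on what the compiler can prove.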

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu wrote:
 
 Escape analysis is a tricky business. My opinion is that we either take 
 care of it properly or blissfully ignore the entire issue. That opinion 
 may disagree a bit with Walter's, who'd prefer a quick patch for 
 delegates so he returns to threading. I think if we opt for a quick 
 patch now, it'll turn to gangrene later. Among other things, it will 
 hurt the threading infrastructure it was supposed to give precedence to.

Like const, I'd rather have no solution than a bad solution insofar as 
escape analysis is concerned.


Sean

Oct 28 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Oct 29, 2008 at 1:04 AM, Sean Kelly <sean invisibleduck.org> wrote:
 Andrei Alexandrescu wrote:
 Escape analysis is a tricky business. My opinion is that we either take
 care of it properly or blissfully ignore the entire issue. That opinion may
 disagree a bit with Walter's, who'd prefer a quick patch for delegates so he
 returns to threading. I think if we opt for a quick patch now, it'll turn to
 gangrene later. Among other things, it will hurt the threading
 infrastructure it was supposed to give precedence to.

 Like const, I'd rather have no solution than a bad solution insofar as
 escape analysis is concerned.

The only serious problem people have right now is that closures are
allocated automatically when they may not need to be.

Making closure allocation manual for now seems like the most
future-compatible way to fix things.  In some nebulous future, the
manual allocation could become unnecessary, or it could become
compiler-checked, but it seems to me that for now just making it
manual does the least harm and lets Walter get back to work on other
things.

--bb

Oct 28 2008

Sean Kelly <sean invisibleduck.org> writes:

Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 1:04 AM, Sean Kelly <sean invisibleduck.org> wrote:
 Andrei Alexandrescu wrote:
 Escape analysis is a tricky business. My opinion is that we either take
 care of it properly or blissfully ignore the entire issue. That opinion may
 disagree a bit with Walter's, who'd prefer a quick patch for delegates so he
 returns to threading. I think if we opt for a quick patch now, it'll turn to
 gangrene later. Among other things, it will hurt the threading
 infrastructure it was supposed to give precedence to.

 Like const, I'd rather have no solution than a bad solution insofar as
 escape analysis is concerned.

 
 The only serious problem people have right now is that closures are
 allocated automatically when they may not need to be.
 
 Making closure allocation manual for now seems like the most
 future-compatible way to fix things.  In some nebulous future, the
 manual allocation could become unnecessary, or it could become
 compiler-checked, but it seems to me that for now just making it
 manual does the least harm and lets Walter get back to work on other
 things.

This would be the most backwards-compatible way also.  The only real 
argument against it in my mind is that it makes the default behavior the 
unsafe behavior.  I don't think this is a big deal given what I see as 
the target market for D, but then I don't see a point in SafeD either, 
for the same reason.  The syntax seems like it should be pretty 
straightforward: use 'new' (Andrei will love that ;-)):

     void fn( int delegate() dg );

     void main()
     {
         int x;

         int getX() { return x; }

         // static closure
         fn( &getX );

         // dynamic closure
         fn( new &getX );
     }

That said, the fact that some function calls will always be opaque 
suggests to me that automatic escape analysis will never be possible in 
all situations.  Therefore, we'll likely need something roughly similar 
to the proposed keyword eventually.  So perhaps it really is worth 
considering adding some sort of 'noscope' storage class now:

     // generates a dynamic closure
     noscope int delegate() dg = &getX;

I do think, however, that 'scope' should be the default behavior, for 
two reasons.  It's backwards-compatible, which is handy.  But more 
importantly, I'd say that probably 95% of the current uses of delegates 
are scoped, and that isn't likely to shift all the way to 50% even if D 
moved to a much more functional style of programming.  Algorithms, for 
example, all use scoped delegates, which I'd say is far and away their 
most common current use.


Sean

Oct 28 2008
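
The scoped-versus-escaping distinction in the post above can be sketched with the behavior of D2 compilers, where the default is the reverse of what Sean prefers: taking the address of a nested function produces a heap-allocated closure when the delegate may escape, and 'scope' on a delegate parameter states that it will not. The function names below are hypothetical, and exactly when a given compiler skips the allocation is an assumption not spelled out here.

```d
// Calls the delegate immediately and does not keep it; 'scope' documents
// that the delegate does not outlive the call.
int callNow(scope int delegate() dg)
{
    return dg();
}

// Returns a delegate that outlives its enclosing frame, so the variable
// it captures has to live on the heap rather than the stack.
int delegate() makeCounter()
{
    int x = 0;
    int next() { return ++x; }
    return &next;
}

// getX is only ever passed to a scope parameter, so the compiler may
// leave x on the stack instead of allocating a closure.
int sumTwice()
{
    int x = 21;
    int getX() { return x; }
    return callNow(&getX) + callNow(&getX);
}
```

Calling makeCounter() and invoking the result twice yields 1 and then 2, which only works because x survived the return of makeCounter.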

Walter Bright <newshound1 digitalmars.com> writes:

Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for 
 two reasons.  It's backwards-compatible, which is handy.  But more 
 importantly, I'd say that probably 95% of the current uses of delegates 
 are scoped, and that isn't likely to shift all the way to 50% even if D 
 moved to a much more functional style of programming.  Algorithms, for 
 example, all use scoped delegates, which I'd say is far and away their 
 most common current use.

The counter to that is that when there is an inadvertent escape of a 
reference, the error is often undetectable even while it silently 
corrupts data and behaves erratically.

In other words (as Andrei pointed out to me) the cost of those errors, 
even though rare, is very high. This makes it highly desirable to 
prevent them automatically, rather than relying on the skill and 
attention to detail of the programmer.

Contrast that with, say, a null pointer bug which results in an 
unambiguous sudden halt to the program with a clear indication of what 
happened.

The 'scope' storage class also has a future in that it is possible using 
data flow analysis to statically verify it.

Oct 28 2008

Sean Kelly <sean invisibleduck.org> writes:

Walter Bright wrote:
 Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for 
 two reasons.  It's backwards-compatible, which is handy.  But more 
 importantly, I'd say that probably 95% of the current uses of 
 delegates are scoped, and that isn't likely to shift all the way to 
 50% even if D moved to a much more functional style of programming.  
 Algorithms, for example, all use scoped delegates, which I'd say is 
 far and away their most common current use.

 
 The counter to that is that when there is an inadvertent escape of a 
 reference, the error is often undetectable even while it silently 
 corrupts data and behaves erratically.
 
 In other words (as Andrei pointed out to me) the cost of those errors, 
 even though rare, is very high. This makes it highly desirable to 
 prevent them automatically, rather than relying on the skill and 
 attention to detail of the programmer.

I think the cost/benefit of this could probably be argued either way. 
I've never encountered a bug related to this, for example, so to me the 
benefit is entirely theoretical while the cost is immediate.

 The 'scope' storage class also has a future in that it is possible using 
 data flow analysis to statically verify it.

This is the real benefit in my mind.  From a "features I want in a 
systems programming language" standpoint I absolutely do not want 
default dynamic closures (today at any rate).  However, just like 
'const' I very much appreciate that this approach allows for static 
verification.  So as much as I hate to say so I think that default 
dynamic closures would be the best long-term option for D.  The cost of 
dynamic memory allocation will continue to come down anyway, and once a 
codebase is converted it probably won't be too difficult to maintain 
going forward.


Sean

Oct 28 2008

Jason House <jason.james.house gmail.com> writes:

Sean Kelly Wrote:

 Walter Bright wrote:
 In other words (as Andrei pointed out to me) the cost of those errors, 
 even though rare, is very high. This makes it highly desirable to 
 prevent them automatically, rather than relying on the skill and 
 attention to detail of the programmer.

 
 I think the cost/benefit of this could probably be argued either way. 
 I've never encountered a bug related to this, for example, so to me the 
 benefit is entirely theoretical while the cost is immediate.

As the author of an open source multithreaded application in D1, I've had these
errors pop up.  It's easy to overlook this stuff and pass things the wrong way
(it's easier to code).  Tango doesn't even have a bind library to make it
easier!

Oct 28 2008

"Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:

On Tue, Oct 28, 2008 at 6:29 PM, Jason House
<jason.james.house gmail.com> wrote:
 Sean Kelly Wrote:

 Walter Bright wrote:
 In other words (as Andrei pointed out to me) the cost of those errors,
 even though rare, is very high. This makes it highly desirable to
 prevent them automatically, rather than relying on the skill and
 attention to detail of the programmer.

 I think the cost/benefit of this could probably be argued either way.
 I've never encountered a bug related to this, for example, so to me the
 benefit is entirely theoretical while the cost is immediate.

 As the author of an open source multithreaded application in D1, I've had
 these errors pop up.  It's easy to overlook this stuff and pass things the
 wrong way (it's easier to code).  Tango doesn't even have a bind library to
 make it easier!

For what it's worth, std.bind I think depends on one Phobos-specific
function.  It would probably take a matter of a minute or two to port
it to work with Tango.

Oct 28 2008

Jason House <jason.james.house gmail.com> writes:

Jarrett Billingsley Wrote:

 On Tue, Oct 28, 2008 at 6:29 PM, Jason House
 <jason.james.house gmail.com> wrote:
 Sean Kelly Wrote:

 Walter Bright wrote:
 In other words (as Andrei pointed out to me) the cost of those errors,
 even though rare, is very high. This makes it highly desirable to
 prevent them automatically, rather than relying on the skill and
 attention to detail of the programmer.

 I think the cost/benefit of this could probably be argued either way.
 I've never encountered a bug related to this, for example, so to me the
 benefit is entirely theoretical while the cost is immediate.

 As the author of an open source multithreaded application in D1, I've had
 these errors pop up.  It's easy to overlook this stuff and pass things the
 wrong way (it's easier to code).  Tango doesn't even have a bind library to
 make it easier!

 
 For what it's worth, std.bind I think depends on one Phobos-specific
 function.  It would probably take a matter of a minute or two to port
 it to work with Tango.

I ported a bind implementation and maintain it in my code base. I didn't
mention that because I still maintain hope that Tango will add it. The last
time I asked the Tango folks why they didn't have it, the answer was something
to the effect of "we don't recognize a need for it".

Oct 28 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Jason House wrote:
 Sean Kelly Wrote:
 
 Walter Bright wrote:
 In other words (as Andrei pointed out to me) the cost of those
 errors, even though rare, is very high. This makes it highly
 desirable to prevent them automatically, rather than relying on
 the skill and attention to detail of the programmer.

 I think the cost/benefit of this could probably be argued either
 way. I've never encountered a bug related to this, for example, so
 to me the benefit is entirely theoretical while the cost is
 immediate.

 
 As the author of an open source multithreaded application in D1, I've
 had these errors pop up.  It's easy to overlook this stuff and pass
 things the wrong way (it's easier to code).  Tango doesn't even have
 a bind library to make it easier!

I agree. Particularly in higher-order code this kind of problem is bound 
to show itself. And it's a really really nasty case of reality ripping 
straight through a carefully-conceived abstraction - something like a 
bullet carving through a precision microprocessor.

Andrei

Oct 28 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Oct 29, 2008 at 7:23 AM, Sean Kelly <sean invisibleduck.org> wrote:
 Walter Bright wrote:
 Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for two
 reasons.  It's backwards-compatible, which is handy.  But more importantly,
 I'd say that probably 95% of the current uses of delegates are scoped, and
 that isn't likely to shift all the way to 50% even if D moved to a much more
 functional style of programming.  Algorithms, for example, all use scoped
 delegates, which I'd say is far and away their most common current use.

 The counter to that is that when there is an inadvertent escape of a
 reference, the error is often undetectable even while it silently corrupts
 data and behaves erratically.

 In other words (as Andrei pointed out to me) the cost of those errors,
 even though rare, is very high. This makes it highly desirable to prevent
 them automatically, rather than relying on the skill and attention to detail
 of the programmer.

 I think the cost/benefit of this could probably be argued either way. I've
 never encountered a bug related to this, for example, so to me the benefit
 is entirely theoretical while the cost is immediate.

I've had bugs caused by this but they were pretty easy to find.
Some delegate I'm calling crashes and all the variables are
nonsensical garbage...
Hmm maybe I was using out-of-scope variables in that delegate that I
wasn't supposed to?

Maybe there are real cases where the bugs caused are harder to find.
But I'll just add my 2c to Sean's.  I haven't had many such bugs, and
when I've had them they've been pretty easy to find.

--bb

Oct 28 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 7:23 AM, Sean Kelly <sean invisibleduck.org> wrote:
 Walter Bright wrote:
 Sean Kelly wrote:
 I do think, however, that 'scope' should be the default behavior, for two
 reasons.  It's backwards-compatible, which is handy.  But more importantly,
 I'd say that probably 95% of the current uses of delegates are scoped, and
 that isn't likely to shift all the way to 50% even if D moved to a much more
 functional style of programming.  Algorithms, for example, all use scoped
 delegates, which I'd say is far and away their most common current use.

 The counter to that is that when there is an inadvertent escape of a
 reference, the error is often undetectable even while it silently corrupts
 data and behaves erratically.

 In other words (as Andrei pointed out to me) the cost of those errors,
 even though rare, is very high. This makes it highly desirable to prevent
 them automatically, rather than relying on the skill and attention to detail
 of the programmer.

 I think the cost/benefit of this could probably be argued either way. I've
 never encountered a bug related to this, for example, so to me the benefit
 is entirely theoretical while the cost is immediate.

 
 I've had bugs caused by this but they were pretty easy to find.
 Some delegate I'm calling crashes and all the variables are
 nonsensical garbage...
 Hmm maybe I was using out-of-scope variables in that delegate that I
 wasn't supposed to?
 
 Maybe there are real cases where the bugs caused are harder to find.
 But I'll just add my 2c to Sean's.  I haven't had many such bugs, and
 when I've had them they've been pretty easy to find.

I don't think we can afford program correctness to rest on anecdote and 
"it works for me". That age is long gone.

Andrei

Oct 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 I don't think we can afford program correctness to rest on anecdote and 
 "it works for me". That age is long gone.

I agree. When you're managing a program with a million lines of code in 
it, there is great value in being able to *prove* that it does not 
suffer from as many kinds of bugs as practical, especially memory 
corruption bugs.

Think of buffer overflow bugs, for example. Think of all the grief that 
would have been saved if the C/C++ compiler could prove that buffer 
overflows could not happen.

Oct 28 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Oct 29, 2008 at 11:40 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 I don't think we can afford program correctness to rest on anecdote and "it
 works for me". That age is long gone.

I haven't seen any real data about how serious a problem this is from
you either.
Chasing bogeymen is at least as bad as ignoring real problems.

--bb

Oct 28 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 11:40 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I don't think we can afford program correctness to rest on anecdote and "it
 works for me". That age is long gone.

 
 I haven't seen any real data about how serious a problem this is from
 you either.
 Chasing bogeymen is at least as bad as ignoring real problems.

Well to provide real data I'd have to spend time on user studies, which 
would be time-intensive. I also think it's not an interesting research 
problem because it is generally accepted in the community that memory 
un-safety is a source of problems. So I don't quite feel burdened with 
the need to provide a proof. Reframing the problem as chasing a bogeyman 
won't help with addressing it.

Andrei

Oct 28 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 11:40 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I don't think we can afford program correctness to rest on anecdote and
 "it works for me". That age is long gone.

 I haven't seen any real data about how serious a problem this is from
 you either.
 Chasing bogeymen is at least as bad as ignoring real problems.

 
 Well to provide real data I'd have to spend time on user studies, which 
 would be time-intensive. I also think it's not an interesting research 
 problem because it is generally accepted in the community that memory 
 un-safety is a source of problems. So I don't quite feel burdened with 
 the need to provide a proof. Reframing the problem as chasing a bogeyman 
 won't help with addressing it.
 
 Andrei

I just wanted to issue an apology to Bill for the above, which is 
brusque and demeaning. He was delicate enough to email me privately what 
he thought about my response, and in very levelheaded terms. After 
having answered privately as well, I thought I'd post a public apology; 
it would be quite unethical to apologize in private for a public remark! 
Hopefully this helps with undoing the damage and with keeping the recent 
streak of good discussions going.


Andrei

Oct 29 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Oct 30, 2008 at 12:21 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 11:40 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 I don't think we can afford program correctness to rest on anecdote and
 "it works for me". That age is long gone.

 I haven't seen any real data about how serious a problem this is from
 you either.
 Chasing bogeymen is at least as bad as ignoring real problems.

 Well to provide real data I'd have to spend time on user studies, which
 would be time-intensive. I also think it's not an interesting research
 problem because it is generally accepted in the community that memory
 un-safety is a source of problems. So I don't quite feel burdened with the
 need to provide a proof. Reframing the problem as chasing a bogeyman won't
 help with addressing it.

 Andrei

 I just wanted to issue an apology to Bill for the above, which is brusque
 and demeaning. He was delicate enough to email me privately what he thought
 about my response, and in very levelheaded terms. After having answered
 privately as well, I thought I'd post a public apology; it would be quite
 unethical to apologize in private for a public remark! Hopefully this helps
 with undoing the damage and with keeping the recent streak of good
 discussions going.

No problem.  My comment leading to that response was a bit snarky too,
though I tried really hard not to make it snarky.  It is still
basically saying "I know you are, but what am I?"

Back to the technical topic, as I told Andrei, all I want is some
solution that doesn't kill performance with lots of hidden memory
allocations.
I doubt that's something anyone really wants, so all this huffing and
puffing about it probably isn't necessary.

--bb

Oct 29 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Bill Baxter" wrote
 Back to the technical topic, as I told Andrei, all I want is some
 solution that doesn't kill performance with lots of hidden memory
 allocations.
 I doubt that's something anyone really wants, so all this huffing and
 puffing about it probably isn't necessary.

I doubt anyone wants that.  But here is my main concern (my defense for 
huffing):

One of my main goals for D at the moment is to have Tango compile on D2. 
Right now, I'm slowly getting everything constified, and dealing with small 
design changes to make that happen (and filing bugs that I find).

However, when dissecting solutions to unnecessary dynamic closures, I want 
to make sure that the solution does not force Tango to change its overall 
design.  Right now, with Walter's proposal, I fear a large amount of scope 
decorations would be necessary (making the api very unattractive), and 
possibly some of the ways Tango uses stack variables might be made 
uncompilable.  I would like to avoid that.  It has happened in the past that 
things considered closed on D2 did not work with Tango because the main code 
used to test D2 (Phobos) does not have a similar design, and does not use 
the same features as Tango does.

When I think a solution solves the problem, and will allow Tango to compile, 
I'll stop my whining ;)

-Steve

Oct 30 2008

ore-sama <spam here.lot> writes:

Steven Schveighoffer Wrote:

 Right now, with Walter's proposal, I fear a large amount of scope 
 decorations would be necessary (making the api very unattractive)

Moreover, those semantics will in some cases force allocation when it's not needed.

Oct 30 2008

Walter Bright <newshound1 digitalmars.com> writes:

Sean Kelly wrote:
 I think the cost/benefit of this could probably be argued either way. 
 I've never encountered a bug related to this, for example, so to me the 
 benefit is entirely theoretical while the cost is immediate.

I have. Not often in my own code because I am very careful to avoid it, 
but it frequently happens in 'bug' reports I get sent. This trap does 
happen to programmers who are less familiar with how the underlying 
stack machine actually works.

The real problem is there is no way to verify that this isn't happening 
in some arbitrarily large code base. I strongly believe that it is good 
for D and for programming languages in general to work towards a design 
that can provably eliminate certain types of bugs.

Oct 28 2008

Sean Kelly <sean invisibleduck.org> writes:

Walter Bright wrote:
 Sean Kelly wrote:
 I think the cost/benefit of this could probably be argued either way. 
 I've never encountered a bug related to this, for example, so to me 
 the benefit is entirely theoretical while the cost is immediate.

 
 I have. Not often in my own code because I am very careful to avoid it, 
 but it frequently happens in 'bug' reports I get sent. This trap does 
 happen to programmers who are less familiar with how the underlying 
 stack machine actually works.

I tend to ask a question along these lines to entry-level interviewees 
and it's surprising how often they get it wrong.  So I agree that this 
is a fair point.  I mostly brought up this argument because C++ is 
unapologetically designed for experts and I'm occasionally inclined to 
view D the same way... even though its goal is really somewhat different.

 The real problem is there is no way to verify that this isn't happening 
 in some arbitrarily large code base. I strongly believe that it is good 
 for D and for programming languages in general to work towards a design 
 that can provably eliminate certain types of bugs.

I agree, which is why I'm actually in favor of this despite what I said 
above.


Sean

Oct 28 2008

Walter Bright <newshound1 digitalmars.com> writes:

Sean Kelly wrote:
 I tend to ask a question along these lines to entry-level interviewees 
 and it's surprising how often they get it wrong.  So I agree that this 
 is a fair point.  I mostly brought up this argument because C++ is 
 unapologetically designed for experts and I'm occasionally inclined to 
 view D the same way... even though its goal is really somewhat different.

To me that is akin to building a car with no brakes and justifying it by 
saying it is "designed for experts." Sure, an expert who never makes any 
mistakes could effectively drive such a car. The trouble is, the road is 
full of non-expert drivers the expert ones are forced to interact with, 
and even the experts still make mistakes now and then.

I don't believe that having brakes impairs the performance of my car one 
bit.

I also would not say that C++ was deliberately designed without brakes, 
it just kinda worked out that way. We have the benefit of hindsight in 
designing D.

Oct 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Walter Bright" wrote
 Sean Kelly wrote:
 I think the cost/benefit of this could probably be argued either way. 
 I've never encountered a bug related to this, for example, so to me the 
 benefit is entirely theoretical while the cost is immediate.

 I have. Not often in my own code because I am very careful to avoid it, 
 but it frequently happens in 'bug' reports I get sent. This trap does 
 happen to programmers who are less familiar with how the underlying stack 
 machine actually works.

 The real problem is there is no way to verify that this isn't happening in 
 some arbitrarily large code base. I strongly believe that it is good for D 
 and for programming languages in general to work towards a design that can 
 provably eliminate certain types of bugs.

I agree with this.  It would be nice to be able to flag these kinds of 
things.  Even if it was a warning and not a true error.  Just not a solution 
which silently allocates data that shouldn't be allocated.

This would be a great candidate for a lint tool.

-Steve

Oct 28 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Wed, Oct 29, 2008 at 1:04 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Walter Bright" wrote
 Sean Kelly wrote:
 I think the cost/benefit of this could probably be argued either way.
 I've never encountered a bug related to this, for example, so to me the
 benefit is entirely theoretical while the cost is immediate.

 I have. Not often in my own code because I am very careful to avoid it,
 but it frequently happens in 'bug' reports I get sent. This trap does
 happen to programmers who are less familiar with how the underlying stack
 machine actually works.

 The real problem is there is no way to verify that this isn't happening in
 some arbitrarily large code base. I strongly believe that it is good for D
 and for programming languages in general to work towards a design that can
 provably eliminate certain types of bugs.

 I agree with this.  It would be nice to be able to flag these kinds of
 things.  Even if it was a warning and not a true error.  Just not a solution
 which silently allocates data that shouldn't be allocated.


Ok, I think we're completely on the same page here.  I'm for the
compiler finding bugs.  But I'm not for the compiler being
conservative and allocating memory when it doesn't have to, as it does
currently.

--bb

Oct 28 2008

Robert Fraser <fraserofthenight gmail.com> writes:

Bill Baxter wrote:
 On Wed, Oct 29, 2008 at 1:04 PM, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 "Walter Bright" wrote
 Sean Kelly wrote:
 I think the cost/benefit of this could probably be argued either way.
 I've never encountered a bug related to this, for example, so to me the
 benefit is entirely theoretical while the cost is immediate.

 I have. Not often in my own code because I am very careful to avoid it,
 but it frequently happens in 'bug' reports I get sent. This trap does
 happen to programmers who are less familiar with how the underlying stack
 machine actually works.

 The real problem is there is no way to verify that this isn't happening in
 some arbitrarily large code base. I strongly believe that it is good for D
 and for programming languages in general to work towards a design that can
 provably eliminate certain types of bugs.

 I agree with this.  It would be nice to be able to flag these kinds of
 things.  Even if it was a warning and not a true error.  Just not a solution
 which silently allocates data that shouldn't be allocated.

 
 
 Ok, I think we're completely on the same page here.  I'm for the
 compiler finding bugs.  But I'm not for the compiler being
 conservative and allocating memory when it doesn't have to, as it does
 currently.
 
 --bb

How about adding a warning switch (I know Walter you're against them but 
it might be justified here) that would flag all the closure allocations. 
I know that should be the job of a "lint" tool, but the compiler already 
has the context here.
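
For reference, here is a minimal sketch of what compiler-flagged closure allocation can look like in present-day D. It was not available to anyone in this 2008 thread. It assumes two later features: the `@nogc` function attribute, under which the compiler rejects hidden GC allocations (closures included), and the rule that a delegate passed to a `scope` parameter needs no heap closure. The names `applyTwice` and `addOffsetTwice` are made up for illustration.

```d
// Hypothetical helper: `scope` promises the delegate is not stored past the call.
int applyTwice(scope int delegate(int) @nogc dg, int x) @nogc
{
    return dg(dg(x));
}

// The lambda captures the parameter `offset`. Because the delegate is only
// handed to a `scope` parameter, the captured frame can stay on the stack,
// so this function is accepted under @nogc.
int addOffsetTwice(int x, int offset) @nogc
{
    return applyTwice((int v) => v + offset, x);
}

unittest
{
    assert(addOffsetTwice(1, 3) == 7); // (1 + 3) + 3
}
```

If `scope` is dropped from the parameter, the compiler should instead report that `addOffsetTwice` allocates a closure, which is roughly the diagnostic being asked for. Later dmd releases also have a `-vgc` switch that lists such allocations without rejecting the code.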

Oct 29 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Mon, 27 Oct 2008 23:05:48 -0400, Walter Bright  
<newshound1 digitalmars.com> wrote:
 scope is a storage class, not a type constructor.

Okay, I'm confused. I had assumed that the escape scope was different from  
the storage scope as the storage scope has a few known problems with  
regard to escape analysis as currently defined, e.g.:
class Node { Node next; }

void append(scope Node a) {
     scope b = new Node();
     a.next = b; // b just escaped
}

scope const also has similar issues. So is the plan for the compiler
to do a static escape analysis based on the function signature?

Alternatively, a deep type which prevents assignment except at declaration
guarantees (I think) no escape.

Oct 27 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Robert Fraser wrote:
 Walter Bright Wrote:
 
 The delegate closure issue is part of a wider issue - escape
 analysis. A reference is said to 'escape' a scope if it, well,
 leaves that scope. Here's a trivial example:
 
 int* foo() { int i; return &i; }
 
 The reference to i escapes the scope of i, thus courting disaster.
  Another form of escaping:
 
 int* p; void bar(int* x) { p = x; }
 
 which is, on the surface, legitimate, but fails for:
 
 void abc(int j) { bar(&j); }
 
 This kind of problem is currently undetectable by the compiler.
 
 The first step is, are function parameters considered to be
 escaping by default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would
 probably choose scope as the default, and the OOP programmer
 noscope.
 
 (The issue with delegates is we need the dynamic closure only if
 the delegate 'escapes'.)

 
 I get the feeling that D's type system is going to become the joke of
 the programming world. Are we really going to have to worry about a
 scope unshared(invariant(int)*) ...? What other type modifiers can we
 put on that?

This is a misunderstanding. Scope is a storage class, not a type
modifier, so it's not as pervasive as you may think.

Andrei

Oct 27 2008

Mosfet <mosfet anonymous.org> writes:

Robert Fraser wrote:
 Walter Bright Wrote:
 
 The delegate closure issue is part of a wider issue - escape analysis. A 
 reference is said to 'escape' a scope if it, well, leaves that scope. 
 Here's a trivial example:

 int* foo() { int i; return &i; }

 The reference to i escapes the scope of i, thus courting disaster. 
 Another form of escaping:

 int* p;
 void bar(int* x) { p = x; }

 which is, on the surface, legitimate, but fails for:

 void abc(int j)
 {
      bar(&j);
 }

 This kind of problem is currently undetectable by the compiler.

 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.

 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

 
 I get the feeling that D's type system is going to become the joke of the
 programming world. Are we really going to have to worry about a scope
 unshared(invariant(int)*) ...? What other type modifiers can we put on that?

I agree. I think that D will be used only by people like you who
understand all these shared/scope/mutable/lazy things.
I thought C++ was complex and difficult to learn, but I think I was wrong.

Oct 28 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Mosfet wrote:
 Robert Fraser wrote:
 Walter Bright Wrote:

 The delegate closure issue is part of a wider issue - escape 
 analysis. A reference is said to 'escape' a scope if it, well, leaves 
 that scope. Here's a trivial example:

 int* foo() { int i; return &i; }

 The reference to i escapes the scope of i, thus courting disaster. 
 Another form of escaping:

 int* p;
 void bar(int* x) { p = x; }

 which is, on the surface, legitimate, but fails for:

 void abc(int j)
 {
      bar(&j);
 }

 This kind of problem is currently undetectable by the compiler.

 The first step is, are function parameters considered to be escaping 
 by default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.

 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

 I get the feeling that D's type system is going to become the joke of 
 the programming world. Are we really going to have to worry about a 
 scope unshared(invariant(int)*) ...? What other type modifiers can we 
 put on that?

 
 I agree I think that D will be used only by people like you that 
 understand all this shared/scope/mutable/lazy things.
 I thought C++ was complex and difficult to learn but I think I was wrong.

Well I think you were right. The question is how much you spend learning 
things that are actually useful, versus learning gratuitous complexity. 
I think D is much more rewarding per unit of effort invested than C++.

Andrei

Oct 28 2008

dsimcha <dsimcha yahoo.com> writes:

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s
 Well I think you were right. The question is how much you spend learning
 things that are actually useful, versus learning gratuitous complexity.
 I think D is much more rewarding per unit of effort invested than C++.
 Andrei

Seconded.  Both C++ and D are very complex languages, but I don't see that as a
problem.  As Bjarne <insert correct spelling of his last name here> would say,
"Complexity has to go somewhere."  If you oversimplify the core language, you end
up acting as a human compiler to make your code fit within the confines of the
simple language.  See Java and C.

The real problem with C++ is not complexity per se, but cruft, the fact that it's
a low-level language masquerading as a high-level language, and the complete
ignorance of convenience as a design goal.

This can be exemplified just by examining how arrays "work" in C++.  First, you
have the cruft of C arrays that are very low-level and really aren't good for
much, except making things more confusing.  To get around this without doing
anything to the core language, C++ adds vector to the STL.  This is fine, except
that you have no vector literals, no slice syntax, horrible error messages,
inefficient copying semantics by default, vectors can't be used in
metaprogramming, etc.  It works, but it's not very convenient.  Furthermore, the
reason you have no nice slice syntax or default reference semantics is that you
have no garbage collection.

Oct 28 2008

Jason House <jason.james.house gmail.com> writes:

Walter Bright Wrote:

 The delegate closure issue is part of a wider issue - escape analysis. A 
 reference is said to 'escape' a scope if it, well, leaves that scope. 
 Here's a trivial example:
 
 int* foo() { int i; return &i; }
 
 The reference to i escapes the scope of i, thus courting disaster. 
 Another form of escaping:
 
 int* p;
 void bar(int* x) { p = x; }
 
 which is, on the surface, legitimate, but fails for:
 
 void abc(int j)
 {
      bar(&j);
 }
 
 This kind of problem is currently undetectable by the compiler.
 
 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.
 
 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

In D1, local variables implicitly follow a mixed rule:
Objects are noscope
Primitive types are scope
Structs are headscope

Headscope may be a bit of a misnomer because member scope is type dependent.

When it comes to escaping, do we need transitive scope? I currently can't
imagine that without allowing some exceptions. That insane path seems to lead
to 3 scopes for member variables...

Oct 27 2008

ore-sama <spam here.lot> writes:

Allocation is determined when the delegate is created, not when it is passed
somewhere else, isn't it? Closure allocation is the caller's task, so the caller
is responsible for it and should be able to control it. Documentation is
generally not needed; it's usually quite obvious what's going on. Allocating
should be the default, and the programmer should be able to track allocations
with the compiler's help if he wants.

Oct 28 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Walter Bright wrote:
 The first step is, are function parameters considered to be escaping by 
 default or not by default? I.e.:
 
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.

I'm for safe defaults.  Programs shouldn't crash for no reason.

Here are my thoughts on escape analysis.  Sorry if they're obvious.

I think it is possible to detect whether a reference escapes or not in
the absence of function calls by analyzing an expression graph.

Assigning to a global state variable is an ultimate escape.

In the worst case, when only the current function can be analyzed and no
meta-info is available about other functions, the compiler must assume a
reference escapes if it is passed as an argument to another function.
This is the current D2 behavior.
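
To make the two cases concrete, here is a small sketch. The names saved, keep, peek and demo are hypothetical, and the `scope` annotation is used as proposed in this thread:

```d
int* saved;                         // module-level state

void keep(int* p)
{
    saved = p;                      // p escapes: it outlives the call
}

int peek(scope const(int)* p)
{
    return *p;                      // p is only read, nothing is stored
}

int demo()
{
    int local = 7;
    int viaPeek = peek(&local);     // fine: &local does not escape
    // keep(&local);                // would leave `saved` dangling once demo returns
    return viaPeek;
}
```

Without any information about keep() or peek(), a compiler looking only at demo() has to treat both calls as escaping.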

Pure functions provide some meta-info because any reference passed as an
argument can only escape via a reference return value or other mutable
reference arguments.  This makes escape analysis possible even after an
unknown pure function is called.
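
A sketch of what that buys, assuming D2's `pure` attribute. The helper fill and the caller countFilled are hypothetical names:

```d
// Pure, so it cannot stash `buf` in any global state.
// The only way out for the reference is the return value.
char[] fill(char[] buf, char c) pure
{
    foreach (ref ch; buf)
        ch = c;
    return buf;
}

size_t countFilled()
{
    char[8] tmp;
    // The result aliases tmp, but it is consumed before this function
    // returns, so nothing referring to the stack survives the call.
    char[] r = fill(tmp[], 'x');
    return r.length;
}
```

The caller only needs to track what happens to the returned slice; it does not need the body of fill() to know tmp was not saved elsewhere.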

For any function in a tree of imported modules the compiler could keep
some meta-data about which argument escapes where, if at all.  This way
even regular functions can participate in escape analysis without
blowing it up.

An argument to a virtual function call always escapes by default.  It
may be possible to declare an argument as non-escaping (scope?) and
compiler should then enforce non-escaping contract upon any overriding
functions.
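
Roughly what I have in mind, as a sketch only. The classes Sink and CountingSink are hypothetical, and the enforcement on overrides is the proposal here, not something I claim the compiler does today:

```d
class Sink
{
    size_t total;

    // The contract: `data` must not be retained after the call.
    void write(scope const(char)[] data)
    {
        total += data.length;
    }
}

class CountingSink : Sink
{
    size_t calls;

    // An override repeats the contract and would be held to it as well.
    override void write(scope const(char)[] data)
    {
        ++calls;
        super.write(data);
    }
}

size_t demo()
{
    char[4] tmp = "abcd";
    Sink s = new CountingSink;
    s.write(tmp[]);             // virtual call with a stack buffer
    return s.total;
}
```

The caller sees only Sink.write, so the guarantee has to hold for every override for the stack buffer to be safe.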

An argument to a function declared as a prototype always escapes by
default.  It may be possible for the compiler to export some meta-info
along with the prototype when a .di file is generated, whether an
argument is guaranteed to not escape, or maybe even detailed info about
which argument escapes where, to mimic the compile-time meta-info.

The expression graph analysis should be the first step towards safe
stack closures.

Oct 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Sergey Gromov" wrote
 Walter Bright wrote:
 The first step is, are function parameters considered to be escaping by
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably
 choose scope as the default, and the OOP programmer noscope.

 I'm for safe defaults.  Programs shouldn't crash for no reason.

If safe defaults mean a 75% performance decrease, I'm for using unsafe 
defaults that are safe 99% of the time, with the ability to make them 100% 
safe if needed.

 Here are my thoughts on escape analysis.  Sorry if they're obvious.

 I think it is possible to detect whether a reference escapes or not in
 the absence of function calls by analyzing an expression graph.

Yes, but not in D, since import uses uncompiled files as input.

 Assigning to a global state variable is an ultimate escape.

Agree there.

 In the worst case, when only the current function can be analyzed and no
 meta-info is available about other functions, the compiler must assume a
 reference escapes if it is passed as an argument to another function.
 This is the current D2 behavior.

This leads to the current situation, where you have a huge performance 
decrease for little or no gain in reliability.

 Pure functions provide some meta-info because any reference passed as an
 argument can only escape via a reference return value or other mutable
 reference arguments.  This makes escape analysis possible even after an
 unknown pure function is called.

Good point.  Easy analysis on pure functions.

 For any function in a tree of imported modules the compiler could keep
 some meta-data about which argument escapes where, if at all.  This way
 even regular functions can participate in escape analysis without
 blowing it up.

Where is the data kept?  It must be in the object file, and D imports must 
then read the object file for api instead of the source file.  I don't think 
it's worth anything to break the single file for imports/code model. 
Requiring a .di file is a little iffy as it is today.

 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

This is tricky, because most class member functions are virtual, so you are 
forced to litter all your functions with escaping/non-escaping syntax.  To 
be accurate you need to define the escape graph in the signature, which will 
be a PITA.  What would be worse is to not have a way to express the complete 
graph.

Another solution is that a derived function must have the same expression 
graph or a tighter one than the base class'.  But without being able to 
store the graph with the compiled code (and having the compiler import the 
metadata instead of the source file), this is a moot point.

 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info about
 which argument escapes where, to mimic the compile-time meta-info.

No, the di file might not be auto-generated.  You are also now back to a 
separate import and source file, like C has.  I think in order for this to 
work, the graph and object code must be stored in the same file that is 
imported.

 The expression graph analysis should be the first step towards safe
 stack closures.

I would agree with this.  But I don't think it's happening in the near 
future.  And I hope it's not done through .di files.

In the meantime, to make D2 a systems language again, it should drop 
conservative closures.

-Steve

Oct 28 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 Walter Bright wrote:
 The first step is, are function parameters considered to be escaping by
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably
 choose scope as the default, and the OOP programmer noscope.

 I'm for safe defaults.  Programs shouldn't crash for no reason.

 
 If safe defaults means 75% performance decrease, I'm for using unsafe 
 defaults that are safe 99% of the time, with the ability to make them 100% 
 safe if needed.
 
 Here are my thoughts on escape analysis.  Sorry if they're obvious.

 I think it is possible to detect whether a reference escapes or not in
 the absence of function calls by analyzing an expression graph.

 
 Yes, but not in D, since import uses uncompiled files as input.

Please note the "in the absence of function calls" part.  I'm talking
about code which is doing pure calculus, without calling anything
external.  It's pretty useless by itself, but it's the basics.

Unfortunately I don't know how import is implemented.  It should do some
parsing though, to be able to inline functions from other modules, and
to expand templates.

 Assigning to a global state variable is an ultimate escape.

 
 Agree there.
 
 In the worst case, when only the current function can be analyzed and no
 meta-info is available about other functions, the compiler must assume a
 reference escapes if it is passed as an argument to another function.
 This is the current D2 behavior.

 
 This leads to the current situation, where you have a huge performance 
 decrease for little or no gain in reliability.
 
 Pure functions provide some meta-info because any reference passed as an
 argument can only escape via a reference return value or other mutable
 reference arguments.  This makes escape analysis possible even after an
 unknown pure function is called.

 
 Good point.  Easy analysis on pure functions.
 
 For any function in a tree of imported modules the compiler could keep
 some meta-data about which argument escapes where, if at all.  This way
 even regular functions can participate in escape analysis without
 blowing it up.

 
 Where is the data kept?  It must be in the object file, and D imports must 
 then read the object file for api instead of the source file.  I don't think 
 it's worth anything to break the single file for imports/code model. 
 Requiring a .di file is a little iffy as it is today.

Here I'm talking about disposable compile-time data, module-local if you
wish.  This means that local optimization is better than inter-module
optimization.  Nothing new here I suppose.

Of course it would be nice if this data is exported somehow and used
when compiling other modules.  But it'd make the compilation process
asymmetric, when meta-data is available for already compiled modules and
not available for others.

 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

 
 This is tricky, because most class member functions are virtual, so you are 
 forced to litter all your functions with escaping/non-escaping syntax.  To 
 be accurate you need to define the escape graph in the signature, which will 
 be a PITA.  What would be worse is to not have a way to express the complete 
 graph.

Not every call to a virtual function is itself virtual, and not every
virtual function cares whether its argument escapes.

I'd say more: the noscope should be default for all reference types
except delegates because you usually don't care.  I agree that having
scope delegates the default is probably the right thing to do, but only
if a compiler can detect violations of this contract.

 Another solution is that a derived function must have the same expression 
 graph or a tighter one than the base class'.  But without being able to 
 store the graph with the compiled code (and having the compiler import the 
 metadata instead of the source file), this is a moot point.
 
 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info about
 which argument escapes where, to mimic the compile-time meta-info.

 
 No, the di file might not be auto-generated.  You are also now back to a 
 separate import and source file, like C has.  I think in order for this to 
 work, the graph and object code must be stored in the same file that is 
 imported.

There are separate import files.  Actually compiler can simply put
scope/noscope for the arguments based upon the meta-data collected
during compilation.  If your .di is manually created, you either put
them manually as well, or you don't care.

 The expression graph analysis should be the first step towards safe
 stack closures.

 
 I would agree with this.  But I don't think it's happening in the near 
 future.  And I hope it's not done through .di files.

You can limit analysis to a single module for now.  This will cover
local function calls, including some local method calls, and I hope
it'll also cover template function calls which means std.algorithm will
work without memory allocation again.

 In the meantime, to make D2 a systems language again, it should drop 
 conservative closures.
 
 -Steve

Oct 28 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Sergey Gromov" wrote
 Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 Walter Bright wrote:
 The first step is, are function parameters considered to be escaping by
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably
 choose scope as the default, and the OOP programmer noscope.

 I'm for safe defaults.  Programs shouldn't crash for no reason.

 If safe defaults means 75% performance decrease, I'm for using unsafe
 defaults that are safe 99% of the time, with the ability to make them 
 100%
 safe if needed.

 Here are my thoughts on escape analysis.  Sorry if they're obvious.

 I think it is possible to detect whether a reference escapes or not in
 the absence of function calls by analyzing an expression graph.

 Yes, but not in D, since import uses uncompiled files as input.

 Please note the "in the absence of function calls" part.  I'm talking
 about code which is doing pure calculus, without calling anything
 external.  It's pretty useless by itself, but it's the basics.

Ah, sorry.  I read 'absence of function source'.  My bad, in that case we 
agree on this one.

 Unfortunately I don't know how import is implemented.  It should do some
 parsing though, to be able to inline functions from other modules, and
 to expand templates.

Those are all problems to be solved.  But if the file used by the linker and 
the file that contains the expression graphs aren't the same, or at least 
forced to be related, then you end up with very weird issues.

 Assigning to a global state variable is an ultimate escape.

 Agree there.

 In the worst case, when only the current function can be analyzed and no
 meta-info is available about other functions, the compiler must assume a
 reference escapes if it is passed as an argument to another function.
 This is the current D2 behavior.

 This leads to the current situation, where you have a huge performance
 decrease for little or no gain in reliability.

 Pure functions provide some meta-info because any reference passed as an
 argument can only escape via a reference return value or other mutable
 reference arguments.  This makes escape analysis possible even after an
 unknown pure function is called.

 Good point.  Easy analysis on pure functions.

 For any function in a tree of imported modules the compiler could keep
 some meta-data about which argument escapes where, if at all.  This way
 even regular functions can participate in escape analysis without
 blowing it up.

 Where is the data kept?  It must be in the object file, and D imports 
 must
 then read the object file for api instead of the source file.  I don't 
 think
 it's worth anything to break the single file for imports/code model.
 Requiring a .di file is a little iffy as it is today.

 Here I'm talking about disposable compile-time data, module-local if you
 wish.  This means that local optimization is better than inter-module
 optimization.  Nothing new here I suppose.

Except the linker has to enforce it.  Which means it needs to somehow be 
munged into the signature.  If the signature is defined only in a .di file 
then it might not match.  I just think the object file and .di file are too 
unrelated to force continuity.  Weird issues can happen when these things 
are edited separately.

If .di files were not editable and always generated with object files, I'd 
say they were a good place to put this info.  But they aren't.

 Of course it would be nice if this data is exported somehow and used
 when compiling other modules.  But it'd make the compilation process
 asymmetric, when meta-data is available for already compiled modules and
 not available for others.

It would have to be available for all of them.  That would be the point of 
including it in the object file.

 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

 This is tricky, because most class member functions are virtual, so you 
 are
 forced to litter all your functions with escaping/non-escaping syntax. 
 To
 be accurate you need to define the escape graph in the signature, which 
 will
 be a PITA.  What would be worse is to not have a way to express the 
 complete
 graph.

 Not every call to a virtual function is itself virtual, and not every
 virtual function cares whether its argument escapes.

 I'd say more: the noscope should be default for all reference types
 except delegates because you usually don't care.  I agree that having
 scope delegates the default is probably the right thing to do, but only
 if a compiler can detect violations of this contract.

A very very common technique in Tango to save using heap allocation is to 
declare a static array as a buffer, and then pass that buffer to be used as 
scratch space in a function (which is possibly virtual).

This would be my golden use case that has to not allocate anything and has 
to work in order for any solution to be viable.

Saying all reference types are noscope would prevent this, no?

 Another solution is that a derived function must have the same expression
 graph or a tighter one than the base class'.  But without being able to
 store the graph with the compiled code (and having the compiler import 
 the
 metadata instead of the source file), this is a moot point.

 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info about
 which argument escapes where, to mimic the compile-time meta-info.

 No, the di file might not be auto-generated.  You are also now back to a
 separate import and source file, like C has.  I think in order for this 
 to
 work, the graph and object code must be stored in the same file that is
 imported.

 There are separate import files.  Actually compiler can simply put
 scope/noscope for the arguments based upon the meta-data collected
 during compilation.  If your .di is manually created, you either put
 them manually as well, or you don't care.

I think the graph has to be complete for this to be usable.  Otherwise, it 
becomes an unused feature.  Using .di files is optional.  I generally don't 
use them.

 The expression graph analysis should be the first step towards safe
 stack closures.

 I would agree with this.  But I don't think it's happening in the near
 future.  And I hope it's not done through .di files.

 You can limit analysis to a single module for now.  This will cover
 local function calls, including some local method calls, and I hope
 it'll also cover template function calls which means std.algorithm will
 work without memory allocation again.

Yes, but not class virtual methods or interface methods.  These are used 
quite a bit in Tango.  End result, not a lot of benefit.

-Steve

Oct 28 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Tue, 28 Oct 2008 23:33:53 -0400, Steven Schveighoffer wrote:

 "Sergey Gromov" wrote
 Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

 
 This is tricky, because most class member functions are virtual, so
 you are forced to litter all your functions with escaping/non-escaping
 syntax. To be accurate you need to define the escape graph in the
 signature, which will be a PITA.  What would be worse is to not have a
 way to express the complete graph.

 Not every call to a virtual function is itself virtual, and not every
 virtual function cares whether its argument escapes.

 I'd say more: the noscope should be default for all reference types
 except delegates because you usually don't care.  I agree that having
 scope delegates the default is probably the right thing to do, but only
 if a compiler can detect violations of this contract.

 
 A very very common technique in Tango to save using heap allocation is to 
 declare a static array as a buffer, and then pass that buffer to be used as 
 scratch space in a function (which is possibly virtual).
 
 This would be my golden use case that has to not allocate anything and has 
 to work in order for any solution to be viable.
 
 Saying all reference types are noscope would prevent this, no?

Allocation only happens when a stack variable reference escapes via a
delegate.  A static array is not a stack variable, therefore the compiler
doesn't care if it escapes.

 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info about
 which argument escapes where, to mimic the compile-time meta-info.

 No, the di file might not be auto-generated.  You are also now back to a
 separate import and source file, like C has.  I think in order for this 
 to
 work, the graph and object code must be stored in the same file that is
 imported.

 There are separate import files.  Actually compiler can simply put
 scope/noscope for the arguments based upon the meta-data collected
 during compilation.  If your .di is manually created, you either put
 them manually as well, or you don't care.

 
 I think the graph has to be complete for this to be usable.  Otherwise, it 
 becomes an unused feature.  Using .di files is optional.  I generally don't 
 use them.

For the incomplete graph to be usable, the compiler must assume the worst
for nodes with absent meta-info.  Therefore if you don't care to provide
meta-info for your modules, it'll still work, though not as efficiently.
On the other hand, if you supply .di files with your library and you do
care enough, or you generate your .di files automatically, the meta-info
will be present there saving some allocations for the user.

 The expression graph analysis should be the first step towards safe
 stack closures.

 I would agree with this.  But I don't think it's happening in the near
 future.  And I hope it's not done through .di files.

 You can limit analysis to a single module for now.  This will cover
 local function calls, including some local method calls, and I hope
 it'll also cover template function calls which means std.algorithm will
 work without memory allocation again.

 
 Yes, but not class virtual methods or interface methods.  These are used 
 quite a bit in Tango.  End result, not a lot of benefit.

If those virtual and interface methods are often used with function-local
delegates as parameters then yes, the benefit wouldn't be that significant.
Are you sure this is the case with Tango?

Oct 29 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Sergey Gromov" wrote
 Tue, 28 Oct 2008 23:33:53 -0400, Steven Schveighoffer wrote:

 "Sergey Gromov" wrote
 Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

 This is tricky, because most class member functions are virtual, so
 you are forced to litter all your functions with escaping/non-escaping
 syntax. To be accurate you need to define the escape graph in the
 signature, which will be a PITA.  What would be worse is to not have a
 way to express the complete graph.

 Not every call to a virtual function is itself virtual, and not every
 virtual function cares whether its argument escapes.

 I'd say more: the noscope should be default for all reference types
 except delegates because you usually don't care.  I agree that having
 scope delegates the default is probably the right thing to do, but only
 if a compiler can detect violations of this contract.

 A very very common technique in Tango to save using heap allocation is to
 declare a static array as a buffer, and then pass that buffer to be used 
 as
 scratch space in a function (which is possibly virtual).

 This would be my golden use case that has to not allocate anything and 
 has
 to work in order for any solution to be viable.

 Saying all reference types are noscope would prevent this, no?

 Allocation only happens when a stack variable reference escapes via a
 delegate.  A static array is not a stack variable, therefore the compiler
 doesn't care if it escapes.

A static array declared on the stack absolutely is a stack variable.

An example (from Tango's integer to text converter):

char[] toString (long i, char[] fmt = null)
{
        char[66] tmp = void;
        return format (tmp, i, fmt).dup;
}

Without the dup, toString returns a pointer to its own stack.  With a full 
graph analysis, it can be proven that tmp doesn't escape, but without either 
that or some crazy scope scheme, it would either allocate a closure, or fail 
to compile.  Neither of those options is acceptable.
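
For anyone without Tango at hand, here is a self-contained sketch of the same pattern. formatInto is a hypothetical stand-in for Tango's format (it only handles unsigned decimal), and toText is a made-up name:

```d
// Hypothetical stand-in for format(): writes the decimal digits of `value`
// into the tail of `scratch` and returns the slice it used.
char[] formatInto(char[] scratch, ulong value)
{
    size_t pos = scratch.length;
    do
    {
        scratch[--pos] = cast(char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    return scratch[pos .. $];
}

char[] toText(ulong value)
{
    char[20] tmp = void;                    // stack scratch space, 20 digits fit ulong.max
    return formatInto(tmp[], value).dup;    // the copy is the only heap allocation
}
```

The slice returned by formatInto points into tmp, so only the dup'd copy may leave toText.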

 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info 
 about
 which argument escapes where, to mimic the compile-time meta-info.

 No, the di file might not be auto-generated.  You are also now back to a
 separate import and source file, like C has.  I think in order for this
 to
 work, the graph and object code must be stored in the same file that is
 imported.

 There are separate import files.  Actually compiler can simply put
 scope/noscope for the arguments based upon the meta-data collected
 during compilation.  If your .di is manually created, you either put
 them manually as well, or you don't care.

 I think the graph has to be complete for this to be usable.  Otherwise, 
 it
 becomes an unused feature.  Using .di files is optional.  I generally 
 don't
 use them.

 For the incomplete graph to be usable, the compiler must assume the worst
 for nodes with absent meta-info.  Therefore if you don't care to provide
 meta-info for your modules, it'll still work, though not as efficiently.
 On the other hand, if you supply .di files with your library and you do
 care enough, or you generate your .di files automatically, the meta-info
 will be present there saving some allocations for the user.

This doesn't cover virtual functions or runtime-determined delegates.  I'd 
rather just have a separate meta file or have the meta data included in the 
object file.  What is wrong with that?  Why must it be in the .di file?  If 
the compiler always generates these meta files, then the graph is always 
complete.

 The expression graph analysis should be the first step towards safe
 stack closures.

 I would agree with this.  But I don't think it's happening in the near
 future.  And I hope it's not done through .di files.

 You can limit analysis to a single module for now.  This will cover
 local function calls, including some local method calls, and I hope
 it'll also cover template function calls which means std.algorithm will
 work without memory allocation again.

 Yes, but not class virtual methods or interface methods.  These are used
 quite a bit in Tango.  End result, not a lot of benefit.

 If those virtual and interface methods are often used with function-local
 delegates as parameters then yes, the benefit wouldn't be that 
 significant.
 Are you sure this is the case with Tango?

Any time you use opApply (and opApply is virtual), you are doing this.  I 
suppose opApply is a special case, and the compiler could reject code that 
saves the delegate somewhere.  But what about being able to pass the delegate 
to another virtual function while inside your opApply?
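
A minimal sketch of the opApply case, with a hypothetical class Bag and the `scope` annotation as proposed earlier in the thread:

```d
class Bag
{
    int[] items;

    this(int[] items)
    {
        this.items = items;
    }

    // Virtual by default. `scope` says the loop-body delegate is not retained.
    int opApply(scope int delegate(ref int) dg)
    {
        foreach (ref item; items)
            if (auto r = dg(item))
                return r;
        return 0;
    }
}

int sum(Bag b)
{
    int total = 0;
    foreach (ref x; b)      // the body becomes a delegate that touches `total`
        total += x;
    return total;
}
```

The foreach body reads and writes a local of sum(), so without some guarantee about opApply the conservative choice is to allocate a closure for every such loop.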

Here is another example from Tango that isn't used via foreach:

final bool putCache (char[] key, IMessage message)
{
        void send (IConduit conduit)
        {
                buffer.setConduit (conduit);
                writer.put (ProtocolWriter.Command.Add, name_, key, message).flush;
        }

        // return false if the cache server said there's
        // already something newer
        if (cluster_.cache.request (&send, reader, key))
            return false;
        return true;
}

cluster_.cache is a class.

-Steve

Oct 29 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Wed, 29 Oct 2008 11:52:14 -0400, Steven Schveighoffer wrote:

 "Sergey Gromov" wrote
 Tue, 28 Oct 2008 23:33:53 -0400, Steven Schveighoffer wrote:

 "Sergey Gromov" wrote
 Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 An argument to a virtual function call always escapes by default.  It
 may be possible to declare an argument as non-escaping (scope?) and
 compiler should then enforce non-escaping contract upon any overriding
 functions.

 This is tricky, because most class member functions are virtual, so
 you are forced to litter all your functions with escaping/non-escaping
 syntax. To be accurate you need to define the escape graph in the
 signature, which will be a PITA.  What would be worse is to not have a
 way to express the complete graph.

 Not every call to a virtual function is itself virtual, and not every
 virtual function cares whether its argument escapes.

 I'd say more: the noscope should be default for all reference types
 except delegates because you usually don't care.  I agree that having
 scope delegates the default is probably the right thing to do, but only
 if a compiler can detect violations of this contract.

 A very very common technique in Tango to save using heap allocation is to
 declare a static array as a buffer, and then pass that buffer to be used 
 as
 scratch space in a function (which is possibly virtual).

 This would be my golden use case that has to not allocate anything and 
 has
 to work in order for any solution to be viable.

 Saying all reference types are noscope would prevent this, no?

 Allocation only happens when a stack variable reference escapes via a
 delegate.  A static array is not a stack variable, therefore the compiler
 doesn't care if it escapes.

 
 A static array declared on the stack absolutely is a stack variable.
 
 An example (from Tango's integer to text converter):
 
 char[] toString (long i, char[] fmt = null)
 {
         char[66] tmp = void;
         return format (tmp, i, fmt).dup;
 }
 
 Without the dup, toString returns a pointer to its own stack.  With a full 
 graph analysis, it can be proven that tmp doesn't escape, but without either 
 that or some crazy scope scheme, it would either allocate a closure, or fail 
 to compile.  Neither of those options is acceptable.

There is no delegate, therefore nothing to allocate a closure for.  If
tmp escapes, it is a compile-time error.

If format() were pure it would be trivial to prove that tmp didn't
escape.  If format() is not pure, and escape graph for it is not known,
then issuing an error here would be too much of a breaking change, I
agree.

 An argument to a function declared as a prototype always escapes by
 default.  It may be possible for the compiler to export some meta-info
 along with the prototype when a .di file is generated, whether an
 argument is guaranteed to not escape, or maybe even detailed info 
 about
 which argument escapes where, to mimic the compile-time meta-info.

 No, the di file might not be auto-generated.  You are also now back to a
 separate import and source file, like C has.  I think in order for this
 to
 work, the graph and object code must be stored in the same file that is
 imported.

 There are separate import files.  Actually compiler can simply put
 scope/noscope for the arguments based upon the meta-data collected
 during compilation.  If your .di is manually created, you either put
 them manually as well, or you don't care.

 I think the graph has to be complete for this to be usable.  Otherwise, 
 it
 becomes an unused feature.  Using .di files is optional.  I generally 
 don't
 use them.

 For the incomplete graph to be usable, the compiler must assume the worst
 for nodes with absent meta-info.  Therefore if you don't care to provide
 meta-info for your modules, it'll still work, though not as efficiently.
 On the other hand, if you supply .di files with your library and you do
 care enough, or you generate your .di files automatically, the meta-info
 will be present there saving some allocations for the user.

 
 This doesn't cover virtual functions or runtime-determined delegates.  I'd 
 rather just have a separate meta file or have the meta data included in the 
 object file.  What is wrong with that?  Why must it be in the .di file?  If 
 the compiler always generates these meta files, then the graph is always 
 complete.

If you compile two files for the first time, and the first file imports
the second one, where do you get that meta-data for the second file?
What if you compile only one file, and that file imports another which
wasn't compiled yet?  Either you construct meta-data on the fly, or
require it included in the source, or assume it's not present (worst
case).
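
The "assume the worst when meta-data is absent" fallback could be sketched roughly as below.  This is only an illustration: Escape, paramEscapes and the table of known functions are made-up names, not anything the compiler actually has.

```d
// Hypothetical escape summary for one function parameter.
enum Escape { no, yes }

// Look up what is known about a function's parameter.  When the imported
// module supplied no meta-data, fall back to the conservative answer.
Escape paramEscapes(const Escape[string] known, string funcName)
{
    if (auto p = funcName in known)
        return *p;
    return Escape.yes;   // unknown callee: assume the argument escapes
}

void main()
{
    Escape[string] known = ["format": Escape.no];
    assert(paramEscapes(known, "format") == Escape.no);
    assert(paramEscapes(known, "thirdPartyCall") == Escape.yes);
}
```

With this shape, a missing entry never makes the analysis unsafe; it only costs an allocation that complete meta-data might have avoided.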

 The expression graph analysis should be the first step towards safe
 stack closures.

 I would agree with this.  But I don't think it's happening in the near
 future.  And I hope it's not done through .di files.

 You can limit analysis to a single module for now.  This will cover
 local function calls, including some local method calls, and I hope
 it'll also cover template function calls which means std.algorithm will
 work without memory allocation again.

 Yes, but not class virtual methods or interface methods.  These are used
 quite a bit in Tango.  End result, not a lot of benefit.

 If those virtual and interface methods are often used with function-local
 delegates as parameters then yes, the benefit wouldn't be that 
 significant.
 Are you sure this is the case with Tango?

 
 Any time you use opApply (and opApply is virtual), you are doing this.

Fair enough.  opApply() is an important technique.
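
For readers who have not used it, here is a minimal sketch of the opApply pattern in question.  Numbers and sum are illustrative names; the scope on the delegate parameter is the annotation being debated in this thread, written the way D2 accepts it today.

```d
class Numbers
{
    int[] items;
    this(int[] items) { this.items = items; }

    // `scope` promises that the delegate is not stored past this call.
    int opApply(scope int delegate(ref int) dg)
    {
        foreach (ref item; items)
            if (auto result = dg(item))
                return result;     // loop body asked to stop (break/return)
        return 0;
    }
}

int sum(Numbers n)
{
    int total = 0;                 // stack local used by the loop body
    foreach (x; n)                 // the body is passed to opApply as a delegate
        total += x;
    return total;
}

void main()
{
    assert(sum(new Numbers([1, 2, 3])) == 6);
}
```

The loop body reads and writes total, a local of sum, through the delegate.  Because opApply is virtual, a per-module analysis cannot see which override will run, which is the limitation described above.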

Oct 29 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Sergey Gromov" wrote
 Wed, 29 Oct 2008 11:52:14 -0400, Steven Schveighoffer wrote:

 "Sergey Gromov" wrote
 Tue, 28 Oct 2008 23:33:53 -0400, Steven Schveighoffer wrote:
 A very very common technique in Tango to save using heap allocation is 
 to
 declare a static array as a buffer, and then pass that buffer to be 
 used
 as
 scratch space in a function (which is possibly virtual).

 This would be my golden use case that has to not allocate anything and
 has
 to work in order for any solution to be viable.

 Saying all reference types are noscope would prevent this, no?

 Allocation only happens when a stack variable reference escapes via a
 delegate.  A static array is not a stack variable, therefore the 
 compiler
 doesn't care if it escapes.

 A static array declared on the stack absolutely is a stack variable.

 An example (from Tango's integer to text converter):

 char[] toString (long i, char[] fmt = null)
 {
         char[66] tmp = void;
         return format (tmp, i, fmt).dup;
 }

 Without the dup, toString returns a pointer to its own stack.  With a 
 full
 graph analysis, it can be proven that tmp doesn't escape, but without 
 either
 that or some crazy scope scheme, it would either allocate a closure, or 
 fail
 to compile.  Neither of those options is acceptable.

 There is no delegate, therefore nothing to allocate a closure for.  If
 tmp escapes, it is a compile-time error.

I was under the impression that closures are currently allocated if you 
return a reference to a stack variable, not just for delegates.  Maybe I'm 
wrong...

 If format() were pure it would be trivial to prove that tmp didn't
 escape.  If format() is not pure, and the escape graph for it is not known,
 then issuing an error here would be too much of a breaking change, I
 agree.

format cannot be pure because it accepts mutable reference data.  It happens 
to be in the same file, so it probably would not be an issue because a graph 
is generated for the current file, but these are not the only cases that 
Tango has.
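
A self-contained sketch of the scratch-buffer idiom, for reference.  formatInto and toText are hypothetical stand-ins, not Tango's actual format and toString; the point is only that the callee returns a slice of the caller's stack buffer and the caller copies it before returning.

```d
import std.stdio;

// Hypothetical stand-in for format(): writes the digits into the
// caller-supplied buffer and returns a slice of that same buffer.
char[] formatInto(char[] buf, long value)
{
    size_t pos = buf.length;
    const bool negative = value < 0;
    // Work with the magnitude as ulong so that long.min is handled too.
    ulong v = negative ? (0UL - cast(ulong) value) : cast(ulong) value;
    do
    {
        buf[--pos] = cast(char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    if (negative)
        buf[--pos] = '-';
    return buf[pos .. $];
}

char[] toText(long value)
{
    char[66] tmp = void;                   // scratch space on the stack
    return formatInto(tmp[], value).dup;   // copy to the heap before returning
}

void main()
{
    writeln(toText(-1234));
}
```

Dropping the .dup would hand the caller a slice into toText's dead stack frame, which is the escape that any scheme here has to either prove absent or reject.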

 I think the graph has to be complete for this to be usable.  Otherwise,
 it
 becomes an unused feature.  Using .di files is optional.  I generally
 don't
 use them.

 For the incomplete graph to be usable, the compiler must assume the 
 worst
 for nodes with absent meta-info.  Therefore if you don't care to provide
 meta-info for your modules, it'll still work, though not as efficiently.
 On the other hand, if you supply .di files with your library and you do
 care enough, or you generate your .di files automatically, the meta-info
 will be present there saving some allocations for the user.

 This doesn't cover virtual functions or runtime-determined delegates. 
 I'd
 rather just have a separate meta file or have the meta data included in 
 the
 object file.  What is wrong with that?  Why must it be in the .di file? 
 If
 the compiler always generates these meta files, then the graph is always
 complete.

 If you compile two files for the first time, and the first file imports
 the second one, where do you get that meta-data for the second file?
 What if you compile only one file, and that file imports another which
 wasn't compiled yet?  Either you construct meta-data on the fly, or
 require it included in the source, or assume it's not present (worst
 case).

My vote would be for compiling it on the fly.  The compiler already does 
parsing of the source file, so it can also generate this graph data.  It 
shouldn't be too hard a task.

Look, I agree that a graph analysis is the best possible solution.  It 
requires no work from the user, no extra specification, and it will solve 
the problem accurately.

But the current mode of compilation doesn't allow for that easily.  That's 
all I was saying.

-Steve

Oct 29 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Wed, 29 Oct 2008 15:23:12 -0400, Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 If you compile two files for the first time, and the first file imports
 the second one, where do you get that meta-data for the second file?
 What if you compile only one file, and that file imports another which
 wasn't compiled yet?  Either you construct meta-data on the fly, or
 require it included in the source, or assume it's not present (worst
 case).

 
 My vote would be for compiling it on the fly.  The compiler already does 
 parsing of the source file, so it can also generate this graph data.  It 
 shouldn't be too hard a task.
 
 Look, I agree that a graph analysis is the best possible solution.  It 
 requires no work from the user, no extra specification, and it will solve 
 the problem accurately.
 
 But the current mode of compilation doesn't allow for that easily.  That's 
 all I was saying.

I do understand that.  I just wanted to discuss whether it is possible
to approach this problem incrementally, so that relatively simple
changes significantly improve the situation without breaking safety.
And I thought that a dispute was a nice way for probing an idea for
hidden flaws.

Oct 29 2008

Chad J <gamerchad __spam.is.bad__gmail.com> writes:

Steven Schveighoffer wrote:
 "Sergey Gromov" wrote
 Walter Bright wrote:
 The first step is, are function parameters considered to be escaping by
 default or not by default? I.e.:

 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?

 What should be the default? The functional programmer would probably
 choose scope as the default, and the OOP programmer noscope.

 I'm for safe defaults.  Programs shouldn't crash for no reason.

 
 If safe defaults means 75% performance decrease, I'm for using unsafe 
 defaults that are safe 99% of the time, with the ability to make them 100% 
 safe if needed.
 

If safe defaults means 2% performance decrease, I'm for using unsafe 
defaults that are safe 10% of the time, with the inability to make them 
100% safe if needed.

I might also be insane.

...

I'm initially biased towards the safe default.  I remember reading that 
part of D's design philosophy is to be safe by default, and I like that 
A LOT because it saves me from wasting many many hours of my life on 
stupid bugs.  I'm also not convinced that full closures really run that 
much slower.  That said, I'd be happy to ignore escape analysis for a 
while longer and just have D1 closures with the option to manually heap 
allocate them.  I say that under the assumption that it's really easy to 
implement, mostly sort of solves the problem, and allows better (more 
general, safer) solutions to be put in place later.
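
To make the two options concrete, a small sketch; makeCounter, Counter and makeCounterManually are illustrative names.  The first function relies on a D2 full closure, the second allocates the state on the heap by hand and needs no closure support from the compiler.

```d
// D2-style full closure: count outlives makeCounter, so the frame
// holding it is allocated on the heap by the compiler.
int delegate() makeCounter()
{
    int count = 0;
    return () { return ++count; };
}

// Manual alternative: keep the state in an explicitly allocated object
// and return a delegate bound to it.
class Counter
{
    int count;
    int next() { return ++count; }
}

int delegate() makeCounterManually()
{
    auto c = new Counter;
    return &c.next;
}

void main()
{
    auto a = makeCounter();
    assert(a() == 1 && a() == 2);

    auto b = makeCounterManually();
    assert(b() == 1 && b() == 2);
}
```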

Oct 28 2008

"Robert Jacques" <sandford jhu.edu> writes:

I've run across some academic work on ownership types which seems relevant  
to this discussion on share/local/scope/noscope.

Paper: http://www.cs.jhu.edu/~scott/pll/papers/pedigree-types.pdf
Slides: http://www.cs.jhu.edu/~scott/pll/papers/iwaco.ppt
Site: http://www.cs.jhu.edu/~scott/pll/abinitio.html
Overview:
Pedigree Types are an intuitive ownership type system requiring minimal  
programmer annotations. Reusing the vocabulary of human genealogy,  
Pedigree Types programmers can qualify any object reference with a  
pedigree -- a child, sibling, parent, grandparent, etc -- to indicate what  
relationship the object being referred to has with the referant on the  
standard ownership tree, following the owners-as-dominators convention.  
Such a qualifier serves as a heap shape constraint that must hold at run  
time and is enforced statically. Pedigree child captures the intention of  
encapsulation, i.e. ownership: the modified object reference is ensured  
not to escape the boundary of its parent. Among existing ownership type  
systems, Pedigree Types are closest to Universe Types. The former can be  
viewed as extending the latter with a more general form of pedigree  
modifiers, so that the relationship between any pair of objects on the  
aforementioned ownership tree can be named and -- more importantly --  
inferred. We use a constraint-based type system which is proved sound via  
subject reduction. Other technical originalities include a polymorphic  
treatment of pedigrees not explicitly specified by programmers, and use of  
linear diophantine equations in type constraints to enforce the hierarchy.

Oct 28 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-10-28 23:52:04 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I've run across some academic work on ownership types which seems 
 relevant  to this discussion on share/local/scope/noscope.

I haven't read the paper yet, but the overview seems to go in the same 
direction as I was thinking.

Basically, all the scope variables you can get are guaranteed to be in 
the current or in some ancestor scope. To allow a reference to a scope 
variable, or a scope function, to be put inside a member of a struct or 
class, you only need to prove that the struct or class lifetime is 
smaller or equal to the one of the reference to your scope variable. If 
you could tell to the compiler the scope relationship of the various 
arguments, then you'd have pretty good scope analysis.

For instance, with this syntax, we could define i to be available 
during the whole lifetime of o:

	void foo(scope MyObject o, scope(o) int* i)
	{
		o.i = i;
	}

So you could do:

	void bar()
	{
		scope int i;
		scope MyObject o = new MyObject;
		foo(o, &i);
	}

And the compiler would let it pass because foo guarantees not to keep 
references to i outside of o's scope, and o's scope is the same as i's.

Or you could do:

	void test1()
	{
		int i;
		test2(&i);
	}

	void test2(scope int* i)
	{
		scope o = new MyObject;
		foo(o, i);
	}

Again, the compiler can statically check that test2 won't keep a 
reference to i outside of the caller's scope (test1) because o's scope is 
limited to test2.

And if you try the reverse:

	void test1()
	{
		scope o = new MyObject;
		test2(o);
	}

	void test2(scope MyObject o)
	{
		int i;
		foo(o, &i);
	}

Then the compiler could determine automatically that i needs to escape 
test2's scope and allocate the variable on the heap to make its 
lifetime as long as the object's scope (as it does currently with 
nested functions) [see my reservations about this in the postscript]. This 
could be avoided by explicitly binding i to the current scope, in which 
case the compiler could issue a scope error:

	void test2(scope MyObject o)
	{
		scope int i;
		// error, i scope needs to match o's, but i is bound to
		// the current scope.
		foo(o, &i);
	}

Interestingly, with this scheme, assuming your function arguments are 
properly scope-labeled, you never need to allocate variables on the 
heap explicitly anymore, the compiler can take care of it for you when 
the use of the variable inside the function body requires it.

	void test3(int* i); // unscoped parameter
	void test4()
	{
		// allocated on heap because calling test3 requires an
		// unscoped variable.
		int i;
		test3(&i);
	}

The reverse is also true: objects declared as allocated on the heap 
could be automatically rescoped as local stack variables if their use 
inside the function is limited in scope:

	void test5()
	{
		auto o = new MyObject;
		test2(o);
	}

For instance, in test5 above where o isn't declared as scope, the 
compiler could still allocate o on the stack (as long as it knows the 
constructor doesn't leave unwanted references to the object in the 
global state), because it knows from the argument declaration of test2 
that no references to o will leave the current scope.

So basically, what to heap-allocate and what to stack-allocate could be 
left entirely to the compiler's discretion.
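
For comparison, this is what the programmer has to write by hand today to get the stack-style lifetime.  Resource and liveInside are made-up names for the sketch; the static counter is only there to make the destruction point observable.

```d
class Resource
{
    static int live;          // number of instances currently alive
    this()  { ++live; }
    ~this() { --live; }
}

int liveInside()
{
    scope r = new Resource;   // explicitly scoped: destroyed at function exit
    return Resource.live;     // read while r is still alive
}

void main()
{
    assert(liveInside() == 1);
    assert(Resource.live == 0);   // destructor already ran
}
```

Under the proposal, the same effect would follow from the parameter annotations of the functions o is passed to, without writing scope at the declaration.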

Note that for all this to work, the pointer "i" in MyObject must be 
defined as not escaping the scope of the class:

	class MyObject
	{
		scope int* i;
	}

or else someone could take the reference and put it into a global 
variable, or a variable of a greater scope than the object.

P.S.: I'm still somewhat skeptical about this automatic allocation 
thing because it would mean a lot of extra heap allocation (and thus 
loss of performance) for any function where the parameters are not 
properly scoped. Perhaps the default should be local scope and you 
explicitly make it greater by declaring variables as noscope, which 
would allow the compiler to allocate if needed, but it doesn't solve 
the issue of the need to allocate on the heap for calling safely 
functions not using scope-labeled arguments.

P.P.S.: This syntax doesn't fit very well with the current 
scope(success/failure/exit) feature.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Oct 29 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Michel Fortin" wrote
 On 2008-10-28 23:52:04 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I've run across some academic work on ownership types which seems 
 relevant  to this discussion on share/local/scope/noscope.

 I haven't read the paper yet, but the overview seems to go in the same 
 direction as I was thinking.

[snip]

This is exactly the kind of thing I DON'T want to have.  Here, you have to 
specify everything, even though the compiler is also doing the work, and 
making sure it matches.  Tack on const modifiers, shared modifiers, and pure 
functions and there's going to be more decorations on function signatures 
than there are parameters.

Note that especially this scope stuff will be required more often than the 
others.

I'd much rather have either no checks, or have the compiler (or a lint tool) 
do all the work to tell me if anything escapes.

-Steve

Oct 29 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Wed, 29 Oct 2008 11:01:35 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 "Michel Fortin" wrote
 On 2008-10-28 23:52:04 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I've run across some academic work on ownership types which seems
 relevant  to this discussion on share/local/scope/noscope.

 I haven't read the paper yet, but the overview seems to go in the same
 direction as I was thinking.

 [snip]

 This is exactly the kind of thing I DON'T want to have.  Here, you have  
 to
 specify everything, even though the compiler is also doing the work, and
 making sure it matches.  Tack on const modifiers, shared modifiers, and  
 pure
 functions and there's going to be more decorations on function signatures
 than there are parameters.

 Note that especially this scope stuff will be required more often than  
 the
 others.

 I'd much rather have either no checks, or have the compiler (or a lint  
 tool)
 do all the work to tell me if anything escapes.

 -Steve

Note that one of the major points in the Pedigree paper is the static type  
inference, so you don't have to specify everything.

Oct 29 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-10-29 11:01:35 -0400, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 This is exactly the kind of thing I DON'T want to have.  Here, you have to
 specify everything, even though the compiler is also doing the work, and
 making sure it matches.  Tack on const modifiers, shared modifiers, and pure
 functions and there's going to be more decorations on function signatures
 than there are parameters.

I agree that this is becoming a problem, even without scope. What we 
need is good defaults so that you don't have to decorate most of the 
time, and especially when you want to bypass it.

I'd also like to point out that beside the possibility of better 
optimization and error catching by the compiler, specifying more 
properties in function interfaces can free us from handling other related 
things. With "immutable" values you don't need to worry about 
duplicating them everywhere to avoid other references from changing it; 
with "shared", you'll have less to worry about thread synchronization; 
and with "scope" as I proposed, you no longer have to worry about 
providing variables with the correct scope as the compiler can 
dynamically allocate when it sees the variable is needed outside of the 
current scope.

Basically, by documenting better the interfaces in a machine-readable 
way, we are freed of other burdens the compiler can now take care of. 
In addition, we have better defined interfaces and the compiler has a 
lot more room to optimize things.

 Note that especially this scope stuff will be required more often than the
 others.

Indeed.

 I'd much rather have either no checks, or have the compiler (or a lint tool)
 do all the work to tell me if anything escapes.

The problem is that as soon as you have a function declaration without 
the body, the lint tool won't be able to tell you if it escapes or not. 
So, without a way to specify the requested scope of the parameters, 
you'll very often have holes in your escape analysis that will 
propagate down the caller chain, preventing any useful conclusion.

For instance:

	void foo()
	{
		char[5] x = ['1', '2', '3', '4', '\0'];
		bar(x);
	}

	void bar(char* x)
	{
		printf(x);
	}

	void printf(char* x);

Here you have no specification telling you that printf won't keep a 
reference to x beyond its scope, so we have to expect that it may do 
so. Turns out that because of that, a compiler or lint tool can't deduce 
whether bar may or may not leak the reference beyond its scope, which 
basically means that calling bar(x) in foo may or may not be safe. With my 
proposal, it'd become this:

	void foo()
	{
		char[5] x = ['1', '2', '3', '4', '\0'];
		bar(x.ptr);
	}

	void bar(scope char* x)
	{
		printf(x);
	}

	void printf(scope char* x);

And here the compiler, or the lint tool, can see that x doesn't need to 
live outside of foo's scope and that all is fine. If bar decided to 
keep the pointer in a global variable for further use, then the 
function signature would have a noscope x or the assignment to a global 
wouldn't work, and once bar has a noscope argument then foo won't 
compile unless x is allocated on the heap.

I don't think it's bad to force interfaces to be well documented, and 
documented in a format that the compiler can understand to find errors 
like this.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Oct 30 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Michel Fortin" wrote
 Basically, by documenting better the interfaces in a machine-readable way, 
 we are freed of other burdens the compiler can now take care of. In 
 addition, we have better defined interfaces and the compiler has a lot 
 more room to optimize things.

But the burden you have left for the developer is a tough one.  You have to 
analyze the inputs and function calls from a function and determine which 
variable depends on what.  This is a perfect problem for a tool to solve.

 The problem is that as soon as you have a function declaration without the 
 body, the lint tool won't be able to tell you if it escapes or not.

This I agree is a problem.  In fact, without specifications in the function 
signature, scope-ness would be very difficult to determine at compile time 
for things like interfaces.

The only way I can see to solve this is to do it at link time.  When you 
link, piece together the parts of the graph that were incomplete, and see if 
they all work.  It would be a very radical change, and might not even work 
with the current linkers.  Especially if you want to do shared libraries, 
where the linker is built into the OS.

A related question: how do you handle C functions?

 So, without a way to specify the requested scope of the parameters, you'll 
 very often have holes in your escape analysis that will propagate down the 
 caller chain, preventing any useful conclusion.

Yes, and if a function has mis-specified some of its parameters, then you 
have code that doesn't compile.  Or the function itself won't compile, and 
you need to do some more manual analysis.  Imagine a function that calls 5 
or 6 other functions with its parameters.  And there are multiple different 
dependencies you have to resolve.  That's a lot of analysis you have to do 
manually.

 I don't think it's bad to force interfaces to be well documented, and 
 documented in a format that the compiler can understand to find errors 
 like this.

I think this concept is going to be really hard for a person to decipher, 
and really hard to get right.  We are talking about a graph dependency 
analysis, in which many edges can exist, and the vertices do not necessarily 
have to be parameters.  This is not stuff for the meager developer looking 
to get work done to have to think about.  I'd much rather have a tool that 
does it, if not the compiler, then something else.  Or partial analysis.  Or 
no analysis.  I agree it's good to have bugs caught by the compiler, but 
this solution requires too much work from the developer to be used.

Some fun puzzles for you to come up with a proper scope syntax to use:

void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

struct S
{
   int *v;
}

int *f2(S* s) { return s.v;}

void f3(ref int *a, ref int *b, ref int *c)
{
   int *tmp = a;
   a = b; b = c; c = tmp;
}

-Steve

Oct 31 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Fri, 31 Oct 2008 11:11:26 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 "Michel Fortin" wrote
 Basically, by documenting better the interfaces in a machine-readable  
 way,
 we are freed of other burdens the compiler can now take care of. In
 addition, we have better defined interfaces and the compiler has a lot
 more room to optimize things.

 But the burden you have left for the developer is a tough one.  You have  
 to
 analyze the inputs and function calls from a function and determine which
 variable depends on what.  This is a perfect problem for a tool to solve.

Tools can't handle function pointers, which is why escape analysis has  
been limited to dynamic languages like Java so far.

 The problem is that as soon as you have a function declaration without  
 the
 body, the lint tool won't be able to tell you if it escapes or not.

 This I agree is a problem.  In fact, without specifications in the  
 function
 things like interfaces would be very difficult to determine scope-ness at
 compile time.

 The only way I can see to solve this is to do it at link time.  When you
 link, piece together the parts of the graph that were incomplete, and  
 see if
 they all work.  It would be a very radical change, and might not even  
 work
 with the current linkers.  Especially if you want to do shared libraries,
 where the linker is built into the OS.

One option is link time compilation, although that doesn't apply to shared  
libs.

 A related question: how do you handle C functions?

Hope and pray? (i.e. The same way C functions and immutable types are  
handled now.)

 So, without a way to specify the requested scope of the parameters,  
 you'll
 very often have holes in your escape analysis that will propagate down  
 the
 caller chain, preventing any useful conclusion.

 Yes, and if a function has mis-specified some of its parameters, then you
 have code that doesn't compile.  Or the function itself won't compile,  
 and
 you need to do some more manual analysis.  Imagine a function that calls  
 5
 or 6 other functions with its parameters.  And there are multiple  
 different
 dependencies you have to resolve.  That's a lot of analysis you have to  
 do
 manually.

Well, the same problem occurs with const today, and just like const you'd  
have specific compiler errors to guide you.

 I don't think it's bad to force interfaces to be well documented, and
 documented in a format that the compiler can understand to find errors
 like this.

 I think this concept is going to be really hard for a person to decipher,
 and really hard to get right.  We are talking about a graph dependency
 analysis, in which many edges can exist, and the vertices do not  
 necessarily
 have to be parameters.  This is not stuff for the meager developer  
 looking
 to get work done to have to think about.  I'd much rather have a tool  
 that
 does it, if not the compiler, then something else.  Or partial  
 analysis.  Or
 no analysis.  I agree it's good to have bugs caught by the compiler, but
 this solution requires too much work from the developer to be used.

Well, I'd guess most functions are either no escape or heap escape. Only  
functions that permit escape and want to play nice with stack variables  
need to do actual graph analysis. You'll note Walter's blog ignores this  
usage.

 Some fun puzzles for you to come up with a proper scope syntax to use:

 void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

if( a.scope <= b.scope && a.scope <= c.scope )

 struct S
 {
    int *v;
 }

 int *f2(S* s) { return s.v;}

int* f2(S* s) if( return.scope >= s.scope )

 void f3(ref int *a, ref int *b, ref int *c)
 {
    int *tmp = a;
    a = b; b = c; c = tmp;
 }

if ( a.scope == b.scope && a.scope == c.scope )

 -Steve

This is actually pretty straightforward, as a = b implies a.scope <=  
b.scope.
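
A toy model of that rule, to show how the constraint for f3 falls out.  Everything here is hypothetical: a lifetime is just an integer where larger means "lives longer", and Assign and satisfies are made-up names.

```d
// One assignment dst = src inside a function body.
struct Assign { string dst; string src; }

// True when every assignment dst = src satisfies
// lifetime[dst] <= lifetime[src].
bool satisfies(const Assign[] stmts, const int[string] lifetime)
{
    foreach (a; stmts)
        if (lifetime[a.dst] > lifetime[a.src])
            return false;
    return true;
}

void main()
{
    // f3: tmp = a; a = b; b = c; c = tmp;
    // With tmp folded away this is the cycle a <- b <- c <- a.
    auto f3 = [Assign("a", "b"), Assign("b", "c"), Assign("c", "a")];

    assert(satisfies(f3, ["a": 1, "b": 1, "c": 1]));   // all equal: accepted
    assert(!satisfies(f3, ["a": 1, "b": 2, "c": 2]));  // c = a would shorten c
}
```

The cycle of <= constraints can only hold when all three lifetimes are equal, which is where a.scope == b.scope && a.scope == c.scope comes from.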

Oct 31 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-10-31 11:11:26 -0400, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 "Michel Fortin" wrote
 Basically, by documenting better the interfaces in a machine-readable way,
 we are freed of other burdens the compiler can now take care of. In
 addition, we have better defined interfaces and the compiler has a lot
 more room to optimize things.

 
 But the burden you have left for the developer is a tough one.  You have to
 analyze the inputs and function calls from a function and determine which
 variable depends on what.  This is a perfect problem for a tool to solve.
 
 The problem is that as soon as you have a function declaration without the
 body, the lint tool won't be able to tell you if it escapes or not.

 
 This I agree is a problem.  In fact, without specifications in the function
 things like interfaces would be very difficult to determine scope-ness at
 compile time.

If you can't determine yourself that a function can work with scoped 
parameters, you'd better never call that function with reference to 
local variables and leave its prototype with noscope parameters, making 
the compiler aware of the situation.

In any case, the one who designs the function is the one who is most 
likely able to tell you whether or not it accepts scoped arguments. The 
current situation makes the caller of that function responsible for 
calling it correctly. I think that's backward.


 The only way I can see to solve this is to do it at link time.  When you
 link, piece together the parts of the graph that were incomplete, and see if
 they all work.  It would be a very radical change, and might not even work
 with the current linkers.  Especially if you want to do shared libraries,
 where the linker is built into the OS.

I think you're dreaming... not that it's a bad thing to have ambition, 
but that's probably not even possible.


 A related question: how do you handle C functions?

You read the documentation of the function to determine if the function 
will let the pointer escape somewhere, and if not declare the parameter 
scope. For instance:

	extern (C)
	void printf(scope char* format, scope...);

By the way, extern (C) functions with noscope parameters need careful 
consideration since their pointers aren't tracked by the garbage 
collector.


 So, without a way to specify the requested scope of the parameters, you'll
 very often have holes in your escape analysis that will propagate down the
 caller chain, preventing any useful conclusion.

 
 Yes, and if a function has mis-specified some of its parameters, then you
 have code that doesn't compile.  Or the function itself won't compile, and
 you need to do some more manual analysis.  Imagine a function that calls 5
 or 6 other functions with its parameters.  And there are multiple different
 dependencies you have to resolve.  That's a lot of analysis you have to do
 manually.

You'll get an error at some call site, which can mean only two things: 
either your local variable shouldn't be bound to the local scope 
(because the function expects a reference it can keep beyond its scope) 
so you should allocate it on the heap, or the function you're calling 
has its prototype wrong.

There's a chance that fixing the function prototype will create 
problems upward if it tries to put a reference to a scope variable in a 
global, or pass it to a function as a noscope argument.


 I don't think it's bad to force interfaces to be well documented, and
 documented in a format that the compiler can understand to find errors
 like this.

 
 I think this concept is going to be really hard for a person to decipher,
 and really hard to get right.

It takes some thinking to get the prototype right at first. But it 
takes less caution calling the function later with local variables 
since the compiler will either issue an error or automatically fix the 
issue by allocating on the heap when an argument requires a greater 
scope.


 We are talking about a graph dependency
 analysis, in which many edges can exist, and the vertices do not necessarily
 have to be parameters.  This is not stuff for the meager developer looking
 to get work done to have to think about.  I'd much rather have a tool that
 does it, if not the compiler, then something else.  Or partial analysis.  Or
 no analysis.  I agree it's good to have bugs caught by the compiler, but
 this solution requires too much work from the developer to be used.
 
 Some fun puzzles for you to come up with a proper scope syntax to use:
 
 void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

	void f(scope ref int *a, scopeof(a) int *b, scopeof(a) int *c)
	{
		if (*b < *c) a = b; else a = c;
	}


 struct S
 {
    int *v;
 }
 
 int *f2(S* s) { return s.v;}

Here you have two options depending on what you mean. Your example 
above is valid, but would allow v to point only to heap variables. If 
your intention is that S.v should be able to refer to scope variables 
too, then you'd need to write S as:

	struct S
	{
		scope int *v;
	}

Then, no function can copy this pointer and keep it beyond the scope 
of S. Therefore, the function needs to be updated to propagate this 
property:

	scopeof(s) int *f2(scope S* s) { return s.v; }


 void f3(ref int *a, ref int *b, ref int *c)
 {
    int *tmp = a;
    a = b; b = c; c = tmp;
 }

This one is special, because you have a circular reference between the 
parameters. Note that a simpler example of this would be swapping two 
values. I had to invent something here saying that all these variables 
share the same scope... but I'd agree the syntax isn't so good.

	void f3(ref scope(1) int *a, ref scope(1) int *b, ref scope(1) int *c)
	{
		scope int *tmp = a;
		a = b; b = c; c = tmp;
	}


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Oct 31 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Michel Fortin" wrote
 If you can't determine yourself that a function can work with scoped 
 parameters, you'd better never call that function with reference to local 
 variables and leave its prototype with noscope parameters, making the 
 compiler aware of the situation.

 In any case, the one who design the function is the one who is most likely 
 able to tell you whether or not it accepts scoped arguments. The current 
 situation makes the caller of that function responsible of calling it 
 correctly. I think that's backward.

But often times, the safety of the call depends on how it is being called. 
Unless the function has fully documented the scope escapes of its 
parameters, which as I have been saying, is going to be difficult, or 
impossible, for a person to figure out.

 The only way I can see to solve this is to do it at link time.  When you
 link, piece together the parts of the graph that were incomplete, and see if
 they all work.  It would be a very radical change, and might not even work
 with the current linkers.  Especially if you want to do shared libraries,
 where the linker is builtin to the OS.

 I think you're dreaming... not that it's a bad thing to have ambition, but 
 that's probably not even possible.

Sure it is ;)  You have to write a special linker.

I think everyone who thinks a scope decoration proposal is going to 1) solve 
all scope escape issues and 2) be easy to use is dreaming :P

 So, without a way to specify the requested scope of the parameters, you'll
 very often have holes in your escape analysis that will propagate down the
 caller chain, preventing any useful conclusion.

 Yes, and if a function has mis-specified some of its parameters, then you
 have code that doesn't compile.  Or the function itself won't compile, and
 you need to do some more manual analysis.  Imagine a function that calls 5
 or 6 other functions with its parameters.  And there are multiple different
 dependencies you have to resolve.  That's a lot of analysis you have to do
 manually.

 You'll get an error at some call site, which can mean only two things: 
 either your local variable shouldn't be bound to the local scope (because 
 the function expects a reference it can keep beyond its scope) so you 
 should allocate it on the heap, or the function you're calling has its 
 prototype wrong.

Or, the prototype can't be written correctly, even though it is provable 
that no escapes occur.

 I don't think it's bad to force interfaces to be well documented, and
 documented in a format that the compiler can understand to find errors
 like this.

 I think this concept is going to be really hard for a person to decipher,
 and really hard to get right.

 It takes some thinking to get the prototype right at first. But it takes 
 less caution calling the function later with local variables since the 
 compiler will either issue an error or automatically fix the issue by 
 allocating on the heap when an argument requires a greater scope.

I hope to avoid this last situation.  Having the compiler make decisions for 
me, especially when heap allocation occurs, is bad.

 void f(ref int *a, int *b, int *c) { if(*b < *c) a = b;  else a = c;}

 void f(scope ref int *a, scopeof(a) int *b, scopeof(o) int *c)
 {
 if (*b < *c) a = b; else a = c;
 }

I assume you meant scopeof(a) instead of scopeof(o), but in any case, your 
design is incorrect.  a depends on b and c's scope, not the other way 
around.

Consider this valid usage:

void foo()
{
   int b = 1, c = 2;
   bar(&b, &c);
}

void bar(scope int *b, scope int *c)
{
   int *a;
   f(a, b, c);// should not fail, but would with your decorations
}

-Steve

Nov 01 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 I think everyone who thinks a scope decoration proposal is going to 1) solve 
 all scope escape issues and 2) be easy to use is dreaming :P

I think that's a fair assessment. One suggestion I made to Walter is to 
only allow and implement the scope storage class for delegates, which 
simply means the callee will not squirrel away a pointer to the delegate. 
That would allow us to solve the closure issue and for now sleep some 
more on the other issues.

Andrei

Nov 01 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 I think everyone who thinks a scope decoration proposal is going to 1) 
 solve all scope escape issues and 2) be easy to use is dreaming :P

 I think that's a fair assessment. One suggestion I made Walter is to only 
 allow and implement the scope storage class for delegates, which simply 
 means the callee will not squirrel away a pointer to delegate. That would 
 allow us to solve the closure issue and for now sleep some more on the 
 other issues.

If scope delegates means trust the coder knows what he is doing (in the 
beginning), I agree with that plan of attack.

-Steve

Nov 02 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 I think everyone who thinks a scope decoration proposal is going to 1) 
 solve all scope escape issues and 2) be easy to use is dreaming :P

 I think that's a fair assessment. One suggestion I made Walter is to only 
 allow and implement the scope storage class for delegates, which simply 
 means the callee will not squirrel away a pointer to delegate. That would 
 allow us to solve the closure issue and for now sleep some more on the 
 other issues.

 
 If scope delegates means trust the coder knows what he is doing (in the 
 beginning), I agree with that plan of attack.

It looks like things will move that way. Bartosz, Walter and I talked a 
lot yesterday about it - a lot of crazy things were on the table! The 
next step is to make this a reference, which is highly related to escape 
analysis. At the risk of anticipating a bit an unfinalized design, 
here's what's on the table:

* Continue an "anything goes" policy for *explicit* pointers, i.e. those 
written explicitly by user code with stars and stuff.

* Disallow pointers in SafeD.

* Make all ref parameters scoped by default. It will be impossible 
for a function to escape the address of a ref parameter without a cast. 
I haven't proved it to myself yet, but I believe that if pointers are 
not used and with the amendments below regarding arrays and delegates, 
this makes things entirely safe. In Walter's words, "it buttons things 
pretty tight".

* Make this a reference so that it obeys what references obey.

* If people want to implement e.g. linked lists, they should do it with 
classes. Implementing them with structs will require casts to obtain and 
escape &this. That also means they'd be using pointers, so anything goes 
- pointers are not restricted from escaping.

* There are two cases in which things escape without the user explicitly 
using pointers: delegates and dynamic arrays initialized from 
stack-allocated arrays.

* For delegates require the scope keyword in the signature of the 
callee. A scoped delegate cannot be stored, only called or passed down 
to another function that in turn takes a scoped delegate. This makes 
scope delegates entirely safe. Non-scoped delegates use dynamic allocation.

* We don't have an idea for dynamic arrays initialized from 
stack-allocated arrays.
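
To make the delegate rule above concrete, here is a minimal sketch in D of a callee that takes a scoped delegate. The names forEach and sumAll are invented for illustration and do not come from the thread. The sketch assumes a compiler that accepts scope on delegate parameters, as the proposal describes.

```d
import std.stdio : writeln;

// The callee takes a scope delegate: it may call dg, or hand it to another
// function that also takes a scope delegate, but it must not store it.
void forEach(const int[] items, scope void delegate(int) dg)
{
    foreach (x; items)
        dg(x);
}

// The delegate literal captures the local `total`. Since forEach promises
// not to keep the delegate, the captured frame has no need to outlive sumAll.
int sumAll(const int[] items)
{
    int total = 0;
    forEach(items, (int x) { total += x; });
    return total;
}

void main()
{
    writeln(sumAll([1, 2, 3])); // prints 6
}
```

A callee that wanted to keep the delegate in a field or a global would have to declare the parameter without scope, and the caller's captured variables would then need dynamic allocation.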

Thoughts? Ideas?


Andrei

Nov 02 2008

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu Wrote:
 * If people want to implement e.g. linked lists, they should do it with 
 classes.

UHm... I see. But I am not sure I like that. Isn't that a waste of memory? All
objects have a vtable.

Bye,
bearophile

Nov 02 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Andrei Alexandrescu Wrote:
 * If people want to implement e.g. linked lists, they should do it
 with classes.

 
 UHm... I see. But I am not sure I like that. Isn't that a waste of
 memory? All objects have a vtable.

Yah, we can't get rid of that. Possibilities discussed were (a) make 
final classes not have a vtable, and (b) define a new kind of struct 
that's only heap allocated. Walter thinks both add quite some 
complication for little benefit. Let's not forget that a cast will allow 
the trick for those interested in saving the extra word.

Andrei

Nov 02 2008

dsimcha <dsimcha yahoo.com> writes:

== Quote from bearophile (bearophileHUGS lycos.com)'s article
 Andrei Alexandrescu Wrote:
 * If people want to implement e.g. linked lists, they should do it with
 classes.

 UHm... I see. But I am not sure I like that. Isn't that a waste of memory? All
 objects have a vtable.
 Bye,
 bearophile

And a monitor.  And RTTI. Then again, for code that absolutely must be as
efficient as possible, doing some fairly hackish/unsafe things is generally
considered more acceptable than in run-of-the-mill programming.  In these cases,
you could always do it with structs and just use the casts.  In the other 97% of
cases, when we should forget about small efficiencies, a class works fine.

Nov 02 2008

"Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:

On Sun, Nov 2, 2008 at 10:39 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Andrei Alexandrescu Wrote:
 * If people want to implement e.g. linked lists, they should do it with
 classes.

 UHm... I see. But I am not sure I like that. Isn't that a waste of memory? All
 objects have a vtable.

No, they have a *pointer* to a vtable.  There is only one vtable per
class, allocated in static memory.  You only pay the cost of one
pointer.
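
The per-node cost being debated can be measured rather than guessed. The following is a rough sketch, with StructNode and ClassNode as made-up names. It assumes the usual D object layout, where (as far as I know) each class instance carries a vtable pointer and a monitor slot ahead of its fields, so the exact numbers depend on the compiler and target.

```d
import std.stdio : writeln;

// The same singly linked node, once as a struct and once as a class.
struct StructNode { int value; StructNode* next; }
class ClassNode { int value; ClassNode next; }

// Bytes each class instance carries beyond the equivalent struct node.
enum size_t overhead =
    __traits(classInstanceSize, ClassNode) - StructNode.sizeof;

void main()
{
    writeln("struct node: ", StructNode.sizeof, " bytes");
    writeln("class node:  ", __traits(classInstanceSize, ClassNode), " bytes");
    writeln("overhead:    ", overhead, " bytes");
}
```

The vtable itself is shared by all instances of the class, so only the hidden per-instance words show up in the difference.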

Nov 02 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-02 10:12:46 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 It looks like things will move that way. Bartosz, Walter and I talked a 
 lot yesterday about it - a lot of crazy things were on the table! The 
 next step is to make this a reference, which is highly related to 
 escape analysis. At the risk of anticipating a bit an unfinalized 
 design, here's what's on the table:
 
 * Continue an "anything goes" policy for *explicit* pointers, i.e. 
 those written explicitly by user code with stars and stuff.

That's a little disappointing. I was hoping for something to fix all 
holes. I know it isn't easy to design and implement, but once done I 
firmly believe it would have the potential to completely eliminate the 
need for explicit memory allocation. For the programmer, it's a good 
trade: less worrying about what needs to be dynamically allocated and 
better documented function signatures.

Perhaps that would be too much of a departure from C and C++ though.


 * Disallow pointers in SafeD.

Again a consequence of not having a full scoping solution.

Couldn't you allow pointers in SafeD, while disallowing taking the 
address of local variables? This would limit pointers to heap-allocated 
variables. And disallow pointer arithmetic too.


 * Make all ref parameters scoped by default. There will be impossible 
 for a function to escape the address of a ref parameter without a cast. 
 I haven't proved it to myself yet, but I believe that if pointers are 
 not used and with the amendments below regarding arrays and delegates, 
 this makes things entirely safe. In Walter's words, "it buttons things 
 pretty tight".

If this means you can't implement a swap function for this struct, then 
I think you're right that it's safe:

	struct A
	{
		ref A a;
	}

	void swap(ref A a0, ref A a1);

On the other side, if you can implement the swap function, then calling 
it is unsafe since you can rebind a reference to another without being 
able to check that their scopes are compatible.

So basically, references must always be initialized at construction and 
should be non-rebindable, just like in C++. (Hum, and I should mention 
I don't like too much references in C++.)


 * Make this a reference so that it obeys what references obey.

Ah, so that's why Walter wanted to change that suddenly. This is a good 
thing by itself, even without correct scoping.


 * If people want to implement e.g. linked lists, they should do it with 
 classes. Implementing them with structs will require casts to obtain 
 and escape &this. That also means they'd be using pointers, so anything 
 goes - pointers are not restricted from escaping.
 
 * There are two cases in which things escape without the user 
 explicitly using pointers: delegates and dynamic arrays initialized 
 from stack-allocated arrays.
 
 * For delegates require the scope keyword in the signature of the 
 callee. A scoped delegate cannot be stored, only called or passed down 
 to another function that in turn takes a scoped delegate. This makes 
 scope delegates entirely safe. Non-scoped delegates use dynamic 
 allocation.

Again, I'd say that if you can implement a swap function with those 
scope delegates, it's unsafe. Case in point:

	void f1(ref scope void delegate() arg)
	{
		int i;
		scope void f2()
		{
			++i;
		}

		scope void delegate() inner = &f2;
		swap(arg, inner); // this should be an error.
		arg = inner; // this too should be an error.
	}

If you can't rebind the value of a scope delegate pointer, then all is fine.


 * We don't have an idea for dynamic arrays initialized from 
 stack-allocated arrays.

Either disallow it, keep it as unsafe as pointers (bad for SafeD 
I expect), or implement a complete scope-checking system (if you do it 
for arrays, you'll have done it for pointers too). You don't have much 
choice there, as arrays are pretty much the same thing as pointers.
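
For readers unsure what the array case looks like, here is a small sketch of the hazard and of the conservative workaround. The function name copyOut is invented for illustration. Slicing a stack-allocated array yields a dynamic array that aliases stack memory, and copying it with .dup before it leaves the function is one way to stay safe without any scope checking.

```d
import std.stdio : writeln;

int[] copyOut()
{
    int[4] buf = [1, 2, 3, 4]; // fixed-size array, lives on the stack
    int[] view = buf[];        // dynamic array aliasing that stack memory
    // Returning `view` itself would hand the caller a dangling slice once
    // this frame is gone; returning a heap copy avoids that.
    return view.dup;
}

void main()
{
    writeln(copyOut()); // prints [1, 2, 3, 4]
}
```

The copy costs an allocation, which is exactly the trade-off a scope-checking system would try to avoid when the slice provably does not escape.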


 Thoughts? Ideas?

I'm under the impression that scope classes could be dangerous in this 
system: an object reference is not necessarily on the heap.

Personally, I'd have liked to have a language where you can be 
completely scope safe, where you could document interfaces so they know 
the scope they're evolving in. This concept of something in between is 
a nice attempt at a compromise, but I find it somewhat limiting.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 02 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2008-11-02 10:12:46 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 It looks like things will move that way. Bartosz, Walter and I talked 
 a lot yesterday about it - a lot of crazy things were on the table! 
 The next step is to make this a reference, which is highly related to 
 escape analysis. At the risk of anticipating a bit an unfinalized 
 design, here's what's on the table:

 * Continue an "anything goes" policy for *explicit* pointers, i.e. 
 those written explicitly by user code with stars and stuff.

 
 That's a little disapointing. I was hoping for something to fix all 
 holes. I know it isn't easy to design and implement, but once done I 
 firmly believe it would have the potential to completely eliminate the 
 need for explicit memory allocation. For the programmer, it's a good 
 trade: less worrying about what needs to be dynamically allocated and 
 better documented function signatures.
 
 Perhaps that would be too much of a departure from C and C++ though.

That's only the half of it. If you want to take a look at a C-like 
language that is safe, you may want to look at Cyclone. The reality is 
that making things 100% safe is going to require more or less the moral 
equivalent of Cyclone's limitations and demands from its user. I think 
Dan Grossman has done an excellent job making things "as tight as 
possible but not tighter", so Cyclone is a great yardstick to measure 
D's tradeoffs against.

 * Disallow pointers in SafeD.

 
 Again a consequence of not having a full scoping solution.

A "full scoping solution" would impose demands on you that you'd be the 
first to dislike.

 Couldn't you allow pointers in SafeD, while disallowing taking the 
 address of local variables? This would limit pointers to heap-allocated 
 variables. And disallow pointer arithmetic too.

I think pointers can be allowed in SafeD under certain restrictions 
starting with the ones you mention. We best start from the safe end.

 * Make all ref parameters scoped by default. There will be impossible 
 for a function to escape the address of a ref parameter without a 
 cast. I haven't proved it to myself yet, but I believe that if 
 pointers are not used and with the amendments below regarding arrays 
 and delegates, this makes things entirely safe. In Walter's words, "it 
 buttons things pretty tight".

 
 If this means you can't implement a swap function for this struct, then 
 I think you're right that it's safe:
 
     struct A
     {
         ref A a;
     }
 
     void swap(ref A a0, ref A a1);
 
 On the other side, if you can implement the swap function, then calling 
 it is unsafe since you can rebind a reference to another without being 
 able to check that their scopes are compatible.

Swap will work fine because ref is not a type constructor. Struct A is 
in error. In fact ref not being a type constructor is much of the beauty 
of it all.

 So basically, references must always be initialized at construction and 
 should be non-rebindable, just like in C++. (Hum, and I should mention I 
 don't like too much references in C++.)

No, C++ references are "almost" type constructors. Also note that 
rvalues won't bind to any kind of references in D. (More on that later.)

 * Make this a reference so that it obeys what references obey.

 
 Ah, so that's why Walter wanted to change that suddenly. This is a good 
 thing by itself, even without correct scoping.

Yah, in fact it's pretty amazing it seems to work out so well. We gain a 
huge guarantee without changing much in the language.

 * If people want to implement e.g. linked lists, they should do it 
 with classes. Implementing them with structs will require casts to 
 obtain and escape &this. That also means they'd be using pointers, so 
 anything goes - pointers are not restricted from escaping.

 * There are two cases in which things escape without the user 
 explicitly using pointers: delegates and dynamic arrays initialized 
 from stack-allocated arrays.

 * For delegates require the scope keyword in the signature of the 
 callee. A scoped delegate cannot be stored, only called or passed down 
 to another function that in turn takes a scoped delegate. This makes 
 scope delegates entirely safe. Non-scoped delegates use dynamic 
 allocation.

 
 Again, I'd say that if you can implement a swap function with those 
 scope delegates, it's unsafe. Case in point:
 
     void f1(ref scope void delegate() arg)
     {
         int i;
         scope void f2()
         {
             ++i;
         }
 
         scope void delegate() inner = &f2;
         swap(arg, inner); // this should be an error.
         arg = inner; // this too should be an error.
     }
 
 If you can't rebind a the value of a scope delegate pointer, then all is 
 fine.

Indeed, rebinding would be disallowed.

 * We don't have an idea for dynamic arrays initialized from 
 stack-allocated arrays.

 
 Either disallow it, either keep it as unsafe as pointers (bad for SafeD 
 I expect), or implement a complete scope-checking system (if you do it 
 for arrays, you'll have done it for pointers too). You don't have much 
 choice there, as arrays are pretty much the same thing as pointers.

Exactly. Essentially arrays are as "bad" as structs containing pointers.

 Thoughts? Ideas?

 
 I'm under the impression that scope classes could be dangerous in this 
 system: an object reference is not necessarly on the heap.

I think a fair move is to do away with scope classes. We can still 
allow them via systems-level tricks, but not with an innocuous construct 
that's in fact a weapon of mass destruction.

 Personally, I'd have liked to have a language where you can be 
 completely scope safe, where you could document interfaces so they know 
 the scope they're evolving in. This concept of something in between is a 
 nice attempt at a compromize, but I find it somewhat limitting.

I agree. Again, something like this was on the table:

void wyda(scope T* a, scope U* b)
         if (scope(a) <= scope(b))
{
     a.field = b;
}

I think it's not hard to appreciate the toll this kind of user-written 
function summary exacts on the user of the language.


Andrei

Nov 02 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-02 19:04:37 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Personally, I'd have liked to have a language where you can be 
 completely scope safe, where you could document interfaces so they know 
 the scope they're evolving in. This concept of something in between is 
 a nice attempt at a compromize, but I find it somewhat limitting.

 
 I agree. Again, something like this was on the table:
 
 void wyda(scope T* a, scope U* b)
          if (scope(a) <= scope(b))
 {
      a.field = b;
 }
 
 I think it's not hard to appreciate the toll this kind of user-written 
 function summary exacts on the user of the language.

First, I think it's a pretty good idea to have this. Second, I think 
it's possible to improve the syntax; there should be a way to not have 
to worry about the scope rules when you don't want them to bother you. 
Here's something we could do about it...

Add a special keyword (let's call it "autoscope" for now) that you can 
put at the start of the function, making the compiler automatically 
derive the least restrictive scope constraints from the function 
body and apply them to the signature. The restriction is that the 
source must be available for the compiler to see and there must not be 
any override based solely on scope constraints.

So basically, you could write:

	autoscope void wyda(T* a, U* b)
	{
		a.field = b;
	}

and the compiler would make the signature like your example above.

And it'd be a good idea if the compiler could generate correct scoping 
constraints (without using "autoscope") in an eventual generated .di 
file to make things faster and not reliant on the code itself.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 02 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2008-11-02 19:04:37 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Personally, I'd have liked to have a language where you can be 
 completely scope safe, where you could document interfaces so they 
 know the scope they're evolving in. This concept of something in 
 between is a nice attempt at a compromize, but I find it somewhat 
 limitting.

 I agree. Again, something like this was on the table:

 void wyda(scope T* a, scope U* b)
          if (scope(a) <= scope(b))
 {
      a.field = b;
 }

 I think it's not hard to appreciate the toll this kind of user-written 
 function summary exacts on the user of the language.

 
 First, I think it's a pretty good idea to have this. Second, I think 
 it's possible to improve the syntax; there should be a way to not have 
 to worry about the scope rules when you don't want them to bother you. 
 Here's something we could do about it...

[snip]

But syntax is so little a part of it. I knew since age immemorial that 
escape analysis is a bitch. I mean, everybody knows. Every once in a 
while, I'd get lulled into the belief that things can get "a little 
pregnant" in a sweet spot where the implementation isn't too hard, 
limitations aren't too severe, and the language doesn't get too complex. 
A couple of weeks ago was the (n + 1)th time that that happened; I got 
encouraged that Walter was willing to tackle the task of writing even a 
context/flow insensitive escape analyzer, and I also got hope from 
"scope" being an easy way to express something about a function. 
Ironically, it was your example that disabused me of my mistaken belief. 
  That leaves me in the position that if someone wants to show me there 
*is* such a sweet spot, they better come with a very airtight argument.


Andrei

Nov 02 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-03 00:39:34 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 But syntax is so little a part of it. I knew since age immemorial that 
 escape analysis is a bitch. I mean, everybody knows. Every once in a 
 while, I'd get lulled into the belief that things can get "a little 
 pregnant" in a sweet spot where the implementation isn't too hard, 
 limitations aren't too severe, and the language doesn't get too 
 complex. A couple of weeks ago was the (n + 1)th time that that 
 happened; I got encouraged that Walter was willing to tackle the task 
 of writing even a context/flow insensitive escape analyzer, and I also 
 got hope from "scope" being an easy way to express something about a 
 function. Ironically, it was your example that disabused me of my 
 mistaken belief.

Studying things more in depth often leaves you, at first, with the 
impression that things are more complicated than they are. But after 
some time, you start to see a few common patterns and you can start to 
simplify and unify the concepts. Who would have thought some centuries 
ago that you could use the same math formulas to understand how an 
apple falls from a tree and how the Moon is orbiting around the Earth?

Perhaps it's a wise choice to forget about the idea and avoid wasting 
time on making things more complicated *if* they'll indeed make things 
more complicated. But right now I have the feeling that you're bailing 
out after a first try seeing things are more complicated than they 
first looked like, without even digging further to see if there are 
common patterns that would allow simplification and unification with 
other concepts further down the line.


 That leaves me in the position that if someone wants to show me there 
 *is* such a sweet spot, they better come with a very airtight argument.

I believe I have a complete solution by placing the scope annotations 
on the type as I will explain below, although I don't have a good 
syntax for it. My solution doesn't revolve around escape analysis but 
more around explicit scoping constraints (which could and should be made 
implicit through escape analysis, but that isn't strictly needed for the 
scoping system to work).

And, as a bonus, it can provide a way for the compiler to completely 
free the programmer from having to explicitly allocate things 
dynamically in his program (because all scopes are known at compile-time, 
the compiler can tell what needs to be dynamically allocated and what 
doesn't need to be).

So, are you interested?

 - - -

Personally, I'd implement scoping rules by reusing the framework that 
was built for const. I'd make scope like const (a type modifier, is it 
called like that?), but with the additional variation that each scope 
qualifier could be bound to another variable's scope that would become 
a child scope (needed for a swap function for instance).

Basically, each pointer or reference in a type can get its own scope 
qualifier. Scope restrictions work in the reverse direction however: the 
data you point to imposes scoping restrictions on pointers leading to 
it, not the other way around like with transitive const. You can have a 
scope pointer to no-scope data. You can't have a no-scope pointer 
pointing to scope data. So "scope(char)*" makes little sense, since 
"char" being scope, the pointer needs to be scope too. This makes more 
sense: "char scope(*)", a scope pointer to non-scope data. Basically, 
scope should be more and more restricted while reading a type from left 
to right, so you could have something like "char scope(* 
scopeof(x)(*))".

There's of course a need for a better syntax than the above. But I 
think the ugly syntax above conceptualizes pretty well a good solution 
to the scoping problem that could extend to arrays, structs and classes. 
We sure should make it prettier, perhaps by imposing restrictions like 
forcing all pointers in the type to be of the most restrictive scope 
which would avoid placing scope annotations everywhere in it. But in 
essence, I think this solution is workable.

From there, we can define scope comparisons, and scope restriction 
checks to apply when assigning variables to one another. Scope 
restriction checks could allow restriction propagation when doing 
escape analysis.

If you want more details, I can provide them as I've thought of the 
matter a lot in the last few days. I just don't have the time to write 
about everything about it right now.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 03 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2008-11-03 00:39:34 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

[snip]
 If you want more details, I can provide them as I've thought of the 
 matter a lot in the last few days. I just don't have the time to write 
 about everything about it right now.

It may be wise to read some more before writing some more. As far as I
understand it, your idea, if taken to completion, is very much like
region analysis as defined in Cyclone.

http://www.research.att.com/~trevor/papers/pldi2002.pdf

Here are some slides:

http://www.cs.washington.edu/homes/djg/slides/cyclone_pldi02.ppt

My hope was that we can obtain an approximation of that idea by defining
only two regions - "inside this function" and "outside this function".
It looks like that's not much gain for a lot of pain.

So the question is - should we introduce region analysis to D, or not?


Andrei

Nov 03 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-03 11:21:08 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 On 2008-11-03 00:39:34 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 [snip]
 If you want more details, I can provide them as I've thought of the 
 matter a lot in the last few days. I just don't have the time to write 
 about everything about it right now.

 
 It may be wise to read some more before writing some more. As far as I
 understand it, your idea, if taken to completion, is very much like
 region analysis as defined in Cyclone.
 
 http://www.research.att.com/~trevor/papers/pldi2002.pdf
 
 Here are some slides:
 
 http://www.cs.washington.edu/homes/djg/slides/cyclone_pldi02.ppt

Pretty interesting slides.

Yeah, that looks pretty much like my idea, in concept, where I call 
regions scopes. But I'd have made things simpler by having only local 
function regions (on the stack) and the global region 
(dynamically-allocated garbage-collected heap), which means you don't 
need templates at all for dealing with them. I also believe we can 
completely avoid the use of named regions, such as:

	{
		int*`L p;
		L: { int x; p = &x; }
	}

The problem illustrated above, of having a pointer outside the inner 
braces take the address of a variable inside it, solves itself if you 
allow a variable's region to be "promoted" automatically to a broader 
one. For instance, you could write:

	{
		int* p;
		{ int x; p = &x; }
	}

and p = &x would make the compiler automatically extend the life of x up 
to p's region (local scope), although x wouldn't be accessible outside 
of the inner braces other than by dereferencing p.

If the pointer was copied outside of the function, then the only 
available broader region to promote x to would be the heap. I think 
this should be done automatically, although it could be decided to 
require dynamic allocation to be explicit too; this is of little 
importance to the escape analysis and scope restriction problem.


 My hope was that we can obtain an approximation of that idea by defining
 only two regions - "inside this function" and "outside this function".
 It looks like that's not much gain for a lot of pain.
 
 So the question is - should we introduce region analysis to D, or not?

I think we should at least try. I don't think we need everything 
Cyclone does however; we can and should keep things simpler.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 04 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2008-11-03 11:21:08 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Michel Fortin wrote:
 On 2008-11-03 00:39:34 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:

 [snip]
 If you want more details, I can provide them as I've thought of the 
 matter a lot in the last few days. I just don't have the time to 
 write about everything about it right now.

 It may be wise to read some more before writing some more. As far as I
 understand it, your idea, if taken to completion, is very much like
 region analysis as defined in Cyclone.

 http://www.research.att.com/~trevor/papers/pldi2002.pdf

 Here are some slides:

 http://www.cs.washington.edu/homes/djg/slides/cyclone_pldi02.ppt

 
 Pretty interesting slides.
 
 Yeah, that looks pretty much like my idea, in concept, where I call 
 regions scopes. But I'd have made things simpler by having only local 
 function regions (on the stack) and the global region 
 (dynamically-allocated garbage-collected heap), which means you don't 
 need templates at all for dealing with them.

I don't understand that part.

 I also believe we can 
 completely avoid the use of named regions, such as:
 
     {
         int*`L p;
         L: { int x; p = &x; }
     }
 
 The problem illustrated above, of having a pointer outside the inner 
 braces take the address of a variable inside it, solves itself if you 
 allow a variable's region to be "promoted" automatically to a broader 
 one. For instance, you could write:
 
     {
         int* p;
         { int x; p = &x; }
     }
 
 and p = &x would make the compiler automatically extend the life of x up 
 to p's region (local scope), although x wouldn't be accessible outside 
 of the inner braces other than by dereferencing p.

Cyclone has region subtyping which takes care of that.

 If the pointer was copied outside of the function, then the only 
 available broader region to promote x to would be the heap. I think this 
 should be done automatically, although it could be decided to require 
 dynamic allocation to be explicit too; this is of little importance to 
 the escape analysis and scope restriction problem.
 
 
 My hope was that we can obtain an approximation of that idea by defining
 only two regions - "inside this function" and "outside this function".
 It looks like that's not much gain for a lot of pain.

 So the question is - should we introduce region analysis to D, or not?

 
 I think we should at least try. I don't think we need everything Cyclone 
 does however; we can and should keep things simpler.

I'm not sure how to read this. From what I can tell, Cyclone's region 
analysis does not introduce undue complexity. It does the minimum 
necessary to prove that functions manipulating pointers are safe. So if 
you suggest a simpler scheme, then either it is more limiting, less 
safe, or both. What are the tradeoffs you are thinking about, and how do 
they compare to Cyclone?


Andrei

Nov 04 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-04 12:36:15 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Yeah, that looks pretty much like my idea, in concept, where I call 
 regions scopes. But I'd have made things simpler by having only local 
 function regions (on the stack) and the global region 
 (dynamically-allocated garbage-collected heap), which means you don't 
 need templates at all for dealing with them.

 
 I don't understand that part.

Indeed, I was somewhat mistaken in thinking that the <> notation was 
templates (seen too much C++ lately), which somewhat confused my 
analysis of a few things later. And perhaps I should have read a little 
more about Cyclone before attempting a comparison, as it seems I got a 
few things wrong from the slides.


 My hope was that we can obtain an approximation of that idea by defining
 only two regions - "inside this function" and "outside this function".
 It looks like that's not much gain for a lot of pain.
 
 So the question is - should we introduce region analysis to D, or not?

 
 I think we should at least try. I don't think we need everything 
 Cyclone does however; we can and should keep things simpler.

 
 I'm not sure how to read this. For what I can tell, Cyclone's region 
 analysis does not introduce undue complexity. It does the minimum 
 necessary to prove that function manipulating pointers are safe. So if 
 you suggest a simpler scheme, then either it is more limiting, less 
 safe, or both. What are the tradeoffs you are thinking about, and how 
 do they compare to Cyclone?

I guess I'd have to familiarize myself with Cyclone a little more to be 
able to do a good comparison. Right now I've just been scratching the 
surface, but it looks more complicated than what I had in mind for D. 
I'd tend to believe Cyclone may cover some cases that my scheme 
wouldn't, but I'm not sure which ones, and I am currently under the 
impression that they are not that important (they could be handled in 
other manners).

Don't forget that Cyclone is targeted at the C language, which has 
neither templates nor garbage collection (although Cyclone supports an 
optional garbage collector). Since D has both, it can leverage some of 
this to simplify things. For instance, because of the garbage collector 
I don't think we need what Cyclone calls dynamic regions: I'd simply 
put everything escaping a function on the heap. It then follows that we 
don't need to propagate region handles.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 04 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-04 12:36:15 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I also believe we can completely avoid the use of named regions, such as:
 
     {
         int*`L p;
         L: { int x; p = &x; }
     }
 
 The problem illustrated above, of having a pointer outside the inner 
 braces take the address of a variable inside it, solves itself if you 
 allow a variable's region to be "promoted" automatically to a broader 
 one. For instance, you could write:
 
     {
         int* p;
         { int x; p = &x; }
     }
 
 and p = &x would make the compiler automatically extend the life of x up 
 to p's region (local scope), although x wouldn't be accessible outside 
 of the inner braces other than by dereferencing p.

 
 Cyclone has region subtyping which takes care of that.

Not the same way as I'm proposing. What Cyclone does is make p 
impossible to dereference outside the scope of L. So if I add an 
assignment through p outside of L, it won't compile:

    {
        int*`L p;
        L: { int x; p = &x; }
        *p = 42; // error, dereferencing p outside of L.
    }

What I'm proposing is that such code extends the life of the storage of 
the local variable x to p's region:

    {
        int* p;
        { int x; p = &x; }
        *p = 42; // okay; per assignment to p, x lives up to p's scope.
        x; // error, x is not accessible in this scope, except through p.
    }

It follows that if p is outside of the local function, x needs to be 
allocated dynamically (just as closures currently do for each variable 
they use):

    void f(ref int* p)
    {
        int x;
        p = &x;
    }

If you want to make sure x never escapes the memory region associated 
to its scope, then you can declare x as scope and get a compile-time 
error when assigning it to p.

So, in essence, the system I propose is a little simpler because 
pointer variables just cannot point to values coming from a region that 
doesn't exist in the scope in which the pointer is declared. The 
guarantee I propose is that during the whole lifetime of a pointer, it 
points to either a valid memory region, or null. Cyclone's approach is 
to forbid you from dereferencing the pointer.
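
To make that rule concrete, here is a rough model of the check, written as ordinary D code rather than as anything the compiler does today. The names Region, outlives and mayStore are invented for this sketch, and it makes a simplifying assumption: regions form a single nesting chain (heap, then function scope, then inner blocks), so sibling blocks are not modelled.

```d
// Illustrative model only; not part of any compiler.
struct Region
{
    int depth; // 0 = heap; 1, 2, ... = increasingly nested stack scopes
}

// True when storage in region a lives at least as long as region b.
bool outlives(Region a, Region b)
{
    return a.depth <= b.depth;
}

// A pointer declared in pointerRegion may hold an address taken in
// addressRegion only if that address outlives the pointer.
bool mayStore(Region pointerRegion, Region addressRegion)
{
    return outlives(addressRegion, pointerRegion);
}

unittest
{
    auto heap  = Region(0);
    auto outer = Region(1);
    auto inner = Region(2);

    assert(mayStore(inner, outer));  // inner pointer to outer variable: fine
    assert(mayStore(outer, heap));   // any pointer may point into the heap
    assert(!mayStore(outer, inner)); // outer pointer to inner variable: rejected
}
```

When mayStore would return false, my proposal either promotes the variable to the pointer's region or, if the variable is declared scope, reports an error.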

Combine this with my proposal to not have dynamic regions and we don't 
need named regions anymore. Perhaps the syntax could be made simpler 
with region names, but technically, we don't need them as we can always 
go the route of saying that a pointer value is "valid within the scope 
of variable_x". This is what I'm expressing with "scopeof(variable_x)" 
in my other examples, and I believe it is analogous to the 
"regions_of(variable_x)" in Cyclone, although Cyclone doesn't use it 
pervasively.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 05 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2008-11-04 12:36:15 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 I also believe we can completely avoid the use of named regions, such as:

     {
         int*`L p;
         L: { int x; p = &x; }
     }

 The problem illustrated above, of having a pointer outside the inner 
 braces take the address of a variable inside it, solves itself if you 
 allow a variable's region to be "promoted" automatically to a broader 
 one. For instance, you could write:

     {
         int* p;
         { int x; p = &x; }
     }

 and p = &x would make the compiler automatically extend the life of x 
 up to p's region (local scope), although x wouldn't be accessible 
 outside of the inner braces other than by dereferencing p.

 Cyclone has region subtyping which takes care of that.

 
 Not the same way as I'm proposing. What cyclone does is make p 
 undereferencable outside the scope of L. So if I add an assignment to p 
 outside of L, it won't compile:
 
    {
        int*`L p;
        L: { int x; p = &x; }
         *p = 42; // error, dereferencing p outside of L.
    }
 
 What I'm proposing is that such code extends the life of the storage of 
 the local variable x to p's region:
 
    {
        int* p;
        { int x; p = &x; }
         *p = 42; // okay; per assignment to p, x lives up to p's scope.
         x; // error, x is not accessible in this scope, except through p.
    }

Well how about this:

int * p;
float * q;
if (condition) {
     int x; p = &x;
} else {
     float y; q = &y;
}

Houston, we have a problem.

You can of course patch that little rule in a number of ways, but really 
at the end of the day what happens only inside a function is 
uninteresting. The main challenge is making the analysis scalable to 
multiple functions.

 Follows that if p is outside of the local function, x needs to be 
 allocated dynamically (just as closures currently do for each variable 
 they use):
 
     void f(ref int* p)
    {
        int x;
         p = &x;
    }

Well this pretty much hamstrings pointers. You can take addresses of 
things inside a function but you can't pass them around. Moreover, 
people disliked the stealth dynamic allocation when delegates are being 
used; you are adding more of those.

 If you want to make sure x never escapes the memory region associated to 
 its scope, then you can declare x as scope and get a compile-time error 
 when assigning it to p.
 
 So, in essence, the system I propose is a little simpler because pointer 
 variables just cannot point to values coming from a region that doesn't 
 exist in the scope the pointer is declared. The guaranty I propose is 
 that during the whole lifetime of a pointer, it points to either a valid 
 memory region, or null. Cyclone's approach is to forbid you from 
 dereferencing the pointer.
 
 Combine this with my proposal to not have dynamic regions and we don't 
 need named regions anymore. Perhaps the syntax could be made simpler 
 with region names, but technically, we don't need them as we can always 
 go the route of saying that a pointer value is "valid within the scope 
 of variable_x". This is what I'm expressing with "scopeof(variable_x)" 
 in my other examples, and I believe it is analogous to the 
 "regions_of(variable_x)" in Cyclone, although Cyclone doesn't use it 
 pervasively.

IMHO this may be made to work. I personally prefer the system in which 
ref is safe and pointers are permissive. The system you are referring to 
makes ref and pointer of the same power, so we could as well dispense 
with either. But I'd be curious what others think of it. Notice how the 
discussion participants got reduced to you and me, and from what I saw 
that's not a good sign.


Andrei

Nov 06 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 IMHO this may be made to work. I personally prefer the system in which ref 
 is safe and pointers are permissive. The system you are referring to makes 
 ref and pointer of the same power, so we could as well dispense with 
 either. But I'd be curious what others think of it. Notice how the 
 discussion participants got reduced to you and me, and from what I saw 
 that's not a good sign.

FWIW, I still think the proposal you have put forth about references being 
the safe type and pointers being permissive is the best one so far.  It's 
clean, doesn't add excessive syntax, and makes good practical sense.

I think full scope analysis is an interesting problem to solve, but it may 
just be an academic exercise, as it would be impractical to develop with. 
Just MHO.

-Steve

Nov 07 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-06 23:36:55 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Well how about this:
 
 int * p;
 float * q;
 if (condition) {
      int x; p = &x;
 } else {
      float y; q = &y;
 }
 
 Houston, we have a problem.

I don't see a problem at all. The compiler would expand the lifetime of 
x to the outer scope, and do the same for y. Basically, the compiler 
would make it this way in the compiled code:

	int * p;
	float * q;
	int x;
	float y;
	if (condition) {
		p = &x;
	} else {
		q = &y;
	}

A good optimising compiler could also place x and y in a union to save 
some space.
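
Written out by hand, the union variant might look like the sketch below. The names Slot and demo are made up for the illustration, and this is only what I imagine the generated layout to be, not something a current compiler is known to produce.

```d
// Sketch of the hoisted storage: x and y are never live at the same
// time, so they can share one slot in the enclosing scope.
int demo(bool condition)
{
    union Slot
    {
        int x;
        float y;
    }

    Slot slot; // lives as long as p and q do
    int* p;
    float* q;

    if (condition)
    {
        slot.x = 7;
        p = &slot.x;
    }
    else
    {
        slot.y = 2.5f;
        q = &slot.y;
    }

    // Each pointer is either null or points into slot, which is still alive.
    return p ? *p : cast(int) *q;
}

unittest
{
    assert(demo(true) == 7);
    assert(demo(false) == 2); // 2.5f truncated to int
}
```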


 You can of course patch that little rule in a number of ways, but 
 really at the end of the day what happens only inside a function is 
 uninteresting. The main challenge is making the analysis scalable to 
 multiple functions.

Indeed. Personally, I take the case above as a simple optimisation to 
avoid unnecessary dynamic allocation of x and y when you need to extend 
variable lifetime to a broader scope within the same function.


 Follows that if p is outside of the local function, x needs to be 
 allocated dynamically (just as closures currently do for each variable 
 they use):
 
     void f(ref int* p)
    {
        int x;
         p = &x;
    }

 
 Well this pretty much hamstrings pointers. You can take addresses of 
 things inside a function but you can't pass them around. Moreover, 
 people disliked the stealth dynamic allocation when delegates are being 
 used; you are adding more of those.

I'd like to point out the two things people complained about the most 
regarding the automatic dynamic allocation for dynamic closures:

1.	There is no way to prevent it, to make sure there is no allocation.
2.	The compiler does allocate a lot more than necessary.

In my proposal, these two points are addressed:

1.	You can declare any variable as "scope", preventing it from being placed
	in a broader scope, preventing at the same time dynamic allocation.
2.	The compiler being aware of what arguments do and do not escape the
	scope of the called functions, it won't allocate unnecessarily.

So I think the situation would be much better.

But all this is orthogonal to whether or not we have an escape analysis 
system, as we could choose the reverse convention: no variable can 
escape its scope unless explicitly authorized by some new syntactic 
construct.


 If you want to make sure x never escapes the memory region associated 
 to its scope, then you can declare x as scope and get a compile-time 
 error when assigning it to p.
 
 So, in essence, the system I propose is a little simpler because 
 pointer variables just cannot point to values coming from a region that 
 doesn't exist in the scope the pointer is declared. The guaranty I 
 propose is that during the whole lifetime of a pointer, it points to 
 either a valid memory region, or null. Cyclone's approach is to forbid 
 you from dereferencing the pointer.
 
 Combine this with my proposal to not have dynamic regions and we don't 
 need named regions anymore. Perhaps the syntax could be made simpler 
 with region names, but technically, we don't need them as we can always 
 go the route of saying that a pointer value is "valid within the scope 
 of variable_x". This is what I'm expressing with "scopeof(variable_x)" 
 in my other examples, and I believe it is analogous to the 
 "regions_of(variable_x)" in Cyclone, although Cyclone doesn't use it 
 pervasively.

 
 IMHO this may be made to work. I personally prefer the system in which 
 ref is safe and pointers are permissive. The system you are referring 
 to makes ref and pointer of the same power, so we could as well 
 dispense with either.

I'm not too thrilled by references. I once got a question from someone 
coming from C: what is the difference between a pointer and a reference 
in C++? I had to answer: references are pointers with a different 
syntax, no rebindability, and no possibility of being null. It seems he 
and I both agree that references are mostly a cosmetic patch to solve a 
syntactic problem. References in D aren't much different.

If we could have a unified syntax for pointers of all kinds, I think 
it'd be more convenient than having two kinds of pointers. A 
null-forbidding but rebindable pointer would be more useful in my 
opinion than the current reference concept.


 But I'd be curious what others think of it. Notice how the discussion 
 participants got reduced to you and me, and from what I saw that's not 
 a good sign.

Indeed. I'm interested in other opinions too.

But I'm under the impression that many lost track of what was being 
discussed, especially since we started referring to Cyclone, which few 
are familiar with and whose paper few have probably read.

One of the fears expressed at the start of the thread was about 
excessive need for annotation, but as the Cyclone paper says, with good 
defaults, you need to add scoping annotations only in a few specific 
places. (It took me some time to read the paper and start discussing 
things sanely after that, remember?) So perhaps we could get more 
people involved if we could propose a tangible syntax for it.

Or perhaps not; for advanced programmers who already understand well 
what can and cannot be done by passing pointers around, full escape 
analysis may not seem such an interesting gain, since they've already 
adopted the right conventions to avoid most bugs it would prevent. And 
most people here who can discuss this topic with some confidence are 
not newbies to programming and don't make too many mistakes of the sort 
anymore.

Which makes me think of beginners saying pointers are hard. You've 
certainly seen beginners struggle as they learn how to correctly use 
pointers in C or C++. Making sure their program fails at compile-time, 
with an explanatory error message as to why they mustn't do this or 
that, is certainly going to help their experience learning the language 
more than cryptic and frustrating segfaults and access violations at 
runtime, sometimes far from the source of the problem.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 09 2008

Christopher Wright <dhasenan gmail.com> writes:

Michel Fortin wrote:
 On 2008-11-06 23:36:55 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Well how about this:

 int * p;
 float * q;
 if (condition) {
      int x; p = &x;
 } else {
      float y; q = &y;
 }

 Houston, we have a problem.

 
 I don't see a problem at all. The compiler would expand the lifetime of 
 x to the outer scope, and do the same for y. Basically, the compiler 
 would make it this way in the compiled code:
 
     int * p;
     float * q;
     int x;
     float y;
     if (condition) {
         p = &x;
     } else {
         q = &y;
     }

In point of fact, it's expensive to extend the stack, so any compiler 
would do that, even without escape analysis.

On the other hand, what about nested functions? I don't think they'd 
cause any trouble, but I'm not certain.

Nov 09 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-09 08:59:18 -0500, Christopher Wright <dhasenan gmail.com> said:

 Michel Fortin wrote:
 I don't see a problem at all. The compiler would expand the lifetime of 
 x to the outer scope, and do the same for y. Basically, the compiler 
 would make it this way in the compiled code:
 
     int * p;
     float * q;
     int x;
     float y;
     if (condition) {
         p = &x;
     } else {
         q = &y;
     }

 
 In point of fact, it's expensive to extend the stack, so any compiler 
 would do that, even without escape analysis.

Indeed.


 On the other hand, what about nested functions? I don't think they'd 
 cause any trouble, but I'm not certain.

If you mean there could be a problem with functions referring to the 
pointer, I'd say that with properly propagated escape constraints, it's 
safe. But it's an interesting case nonetheless. Consider this:

    int * p;
    if (condition) {
        int x;
        p = &x;
    } else {
        int y;
        p = &y;
    }
    int f() { return *p; }
    return &f;

Now returning &f forces p to be dynamically allocated on the heap, which 
puts a constraint on p forcing it to point only to variables on the 
heap, which in turn forces x and y to be allocated on the heap.

I haven't verified, but I'm pretty certain this doesn't work correctly 
with the current dynamic closures in D2 however (because escape 
analysis doesn't see through pointers).
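
In the meantime, the allocation I'd want the compiler to perform can be spelled out by hand. This is only a sketch of the intended result, with makeReader being a name I made up: x and y become explicit heap allocations, so the pointer captured by the closure stays valid after the function returns.

```d
// Hand-written version of the promotion: the storage for x and y is
// taken from the GC heap instead of the stack.
int delegate() makeReader(bool condition)
{
    int* p;
    if (condition)
    {
        int* x = new int; // stands in for the local "int x"
        *x = 1;
        p = x;
    }
    else
    {
        int* y = new int; // stands in for the local "int y"
        *y = 2;
        p = y;
    }

    // f reads through p; because &f is returned, the frame holding p
    // is allocated as a heap closure.
    int f() { return *p; }
    return &f;
}

unittest
{
    auto readX = makeReader(true);
    auto readY = makeReader(false);
    assert(readX() == 1);
    assert(readY() == 2);
}
```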

Also, if you made p point to a value it received as an argument, and the 
scope of that argument isn't the global scope, it'd be an error. For 
instance, this wouldn't work:

	int delegate() foo1(int* arg) {
		int f() { return *arg; }
		return &f;
		// error, returned closure may live longer than *arg;
		// need constraint
	}

Constraining the lifetime of the returned value to be no longer than 
the one of the argument would allow it to work safely (disregard the 
bizarre syntax for expressing the constraint on the delegate):

	int delegate(arg)() foo2(int* arg) {
		int f() { return *arg; }
		return &f;
		// ok, returned closure lifetime guaranteed to be
		// at most as long as the lifetime of *arg.
	}

	int globalInt;
	int delegate() globalDelegate;

	void bar() {
		int localInt;
		int delegate() localDelegate;

		globalDelegate = foo2(&globalInt); // ok, same lifetime
		localDelegate = foo2(&globalInt); // ok, delegate lifetime shorter
		localDelegate = foo2(&localInt); // ok, same lifetime

		globalDelegate = foo2(&localInt);
		// ok, but forces bar to allocate localInt on the heap since otherwise
		// localInt lifetime would be shorter than lifetime of the delegate
	}

Note that what I want to demonstrate is that the compiler can see 
pretty clearly what needs and what doesn't need to be allocated on the 
heap to guarantee safety. Whether we decide that it allocates 
automatically or that it generates an error is of lesser concern to me. 
(And I'll add that some other issues with templates may make this 
automatic allocation scheme unworkable.)

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 14 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 I'd like to point out that the two things people complained the most 
 about regarding the automatic dynamic allocation for dynamic closures:
 
 1.    There is no way to prevent it, to make sure there is no allocation.
 2.    The compiler does allocate a lot more than necessary.
 
 In my proposal, these two points are addressed:
 
 1.    You can declare any variable as "scope", preventing it from being 
 placed
     in a broader scope, preventing at the same time dynamic allocation.
 2.    The compiler being aware of what arguments do and do not escape the
     scope of the called functions, it won't allocate unnecessarily.
 
 So I think the situation would be much better.

I agree that an escape analyzer would improve things. I am not sure that 
one oblivious to regions is expressive enough.

 But all this is orthogonal to having or not an escape analysis system, 
 as we could choose the reverse conventions: no variable can escape its 
 scope unless explicitly authorized by some new syntactic construct.

It's not orthogonal. Whatever the default is, you must be able to 
enforce escaping rules, otherwise the system would be as good as a 
convention.

 If you want to make sure x never escapes the memory region associated 
 to its scope, then you can declare x as scope and get a compile-time 
 error when assigning it to p.

 So, in essence, the system I propose is a little simpler because 
 pointer variables just cannot point to values coming from a region 
 that doesn't exist in the scope the pointer is declared. The guaranty 
 I propose is that during the whole lifetime of a pointer, it points 
 to either a valid memory region, or null. Cyclone's approach is to 
 forbid you from dereferencing the pointer.

 Combine this with my proposal to not have dynamic regions and we 
 don't need named regions anymore. Perhaps the syntax could be made 
 simpler with region names, but technically, we don't need them as we 
 can always go the route of saying that a pointer value is "valid 
 within the scope of variable_x". This is what I'm expressing with 
 "scopeof(variable_x)" in my other examples, and I believe it is 
 analogous to the "regions_of(variable_x)" in Cyclone, although 
 Cyclone doesn't use it pervasively.

 IMHO this may be made to work. I personally prefer the system in which 
 ref is safe and pointers are permissive. The system you are referring 
 to makes ref and pointer of the same power, so we could as well 
 dispense with either.

 
 I'm not too thrilled by references. I once got a question from someone 
 coming from C: what is the difference between a pointer and a reference 
 in C++? I had to answer: references are pointers with a different 
 syntax, no rebindability, and no possibility of being null. It seems he 
 and I both agree that references are mostly a cosmetic patch to solve a 
 syntactic problem. References in D aren't much different.

I disagree. References in D are very different. They are not type 
constructors. They are storage classes that can only be used in function 
signatures, which makes them impossible to dangle. I think C++ 
references would also have been much better off as storage classes 
instead of half-life types.

 If we could have a unified syntax for pointers of all kinds, I think 
 it'd be more convenient than having two kinds of pointers. A 
 null-forbiding but rebindable pointer would be more useful in my opinion 
 than the current reference concept.

Well ref means "This function wants to modify its argument". That is a 
very different charter from what pointers mean. So I'm not sure how you 
say you'd much prefer this to that. They are not comparable.
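
To illustrate what I mean by ref being a storage class on the parameter rather than part of the type, here is a minimal example; the function name increment is just something picked for the illustration.

```d
// ref belongs to the parameter declaration, not to the type int.
void increment(ref int value)
{
    ++value; // modifies the caller's variable directly
}

unittest
{
    int counter = 41;
    increment(counter); // no address-of operator at the call site
    assert(counter == 42);
}
```

The reference exists only for the duration of the call, which is why it cannot be left dangling the way a stored pointer can.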

 But I'd be curious what others think of it. Notice how the discussion 
 participants got reduced to you and me, and from what I saw that's not 
 a good sign.

 
 Indeed. I'm interested in other opinions too.
 
 But I'm under the impression that many lost track of what was being 
 discussed, especially since we started referring to Cyclone which few 
 are familiar with and probably few have read the paper.

In my experience, when someone is interested in something, she'd make 
time for it. So I take that as lack of interest. And hey, since when was 
lack of expertise a real deterrent? :o)

 One of the fears expressed at the start of the thread was about 
 excessive need for annotation, but as the Cyclone paper say, with good 
 defaults, you need to add scoping annotation only to a few specific 
 places. (It took me some time to read the paper and start discussing 
 things sanely after that, remember?) So perhaps we could get more people 
 involved if we could propose a tangible syntax for it.

To be very frank, I think we are very far from having an actual 
proposal, and syntax is of very low priority now if you want to put one 
together. Right now what we have is a few vague ideas and conjectures 
(e.g., there's no need for named regions because the need would be rare 
enough to require dynamic allocation for those cases). I'm not saying 
that to criticize, but merely to underline the difficulties.

 Or perhaps not; for advanced programmers who already understand well 
 what can and cannot be done by passing pointers around, full escape 
 analysis may not seem to be a so interesting gain since they've already 
 adopted the right conventions to avoid most bugs it would prevent. And 
 most people here who can discuss this topic with some confidence are not 
 newbies to programming and don't make too much mistakes of the sort 
 anymore.
 
 Which makes me think of beginners saying pointers are hard. You've 
 certainly seen beginners struggle as they learn how to correctly use 
 pointers in C or C++. Making sure their program fail at compile-time, 
 with an explicative error message as to why they mustn't do this or 
 that, is certainly going to help their experience learning the language 
 more than cryptic and frustrating segfaults and access violations at 
 runtime, sometime far from the source of the problem.

I totally agree that pointers are hard and good static checking for them 
would help. Currently, what we try to do is obviate the need for 
pointers in most cases, and to actually forbid them in safe modules. The 
question that remains is, how many unsafe modules are necessary, and 
what liability do they entail? If there are few and not too unwieldy, 
maybe we can declare victory without constructing an escape analyzer. I 
agree if you or anyone says they don't think so. At this point, I am not 
sure, but what I can say is that it's good to reduce the need for 
pointers regardless.


Andrei

Nov 09 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 I'd like to point out that the two things people complained the most 
 about regarding the automatic dynamic allocation for dynamic closures:
 
 1.    There is no way to prevent it, to make sure there is no allocation.
 2.    The compiler does allocate a lot more than necessary.
 
 In my proposal, these two points are addressed:
 
 1.    You can declare any variable as "scope", preventing it from being placed
     in a broader scope, preventing at the same time dynamic allocation.
 2.    The compiler being aware of what arguments do and do not escape the
     scope of the called functions, it won't allocate unnecessarily.
 
 So I think the situation would be much better.

 
 I agree that an escape analyzer would improve things. I am not sure 
 that one oblivious to regions is expressive enough.

If you think I proposed a region-oblivious scheme, then you've got me 
wrong (and perhaps it's my fault for not explaining well enough). Let 
me explain again, and I'll try to not skip anything this time.

Cyclone has dynamic regions, regions which are allocated on the heap 
but that are deleted at the end of the scope that created them. 
Basically, those are scoped heaps offering a very useful system to 
automatically free memory. (It's somewhat similar in concept to Cocoa's 
NSAutoreleasePool, for instance.) Their downside is that you need 
to pass region handles around (so that called functions can allocate 
objects within them).

So my first point is that since we have a garbage collector in D, and 
moreover since we're likely to get one heap per thread in D2, we don't 
need dynamic regions. The remaining regions are: 1) the shared heap, 2) 
the thread-local heap, 3) All the stack frames; and you can't allocate 
other stack frames than the current one. Because none of these regions 
require a handle to allocate into, we (A) don't need region handles.

We still have many regions. Besides the two heaps (shared and 
thread-local), each function's stack frame, and each block within it, 
creates a distinct memory region. But nowhere do we need to know exactly 
which region a function parameter comes from; what we need to know is 
which address outlives which pointer, and then we can forbid assigning 
addresses to pointers that outlive them. All we need is a relative 
ordering of the various regions, and for that we don't need to attach 
*names* to the regions so that you can refer explicitly to them in the 
syntax. Instead, you could say something like "region of (x)", or 
"region of (*y)" and that would be enough.

So there is still a region for every pointer, only regions don't need 
to be *named* because you can always refer to them by referring to the 
variables. (And perhaps the syntax would be clearer with region names 
than without, in which case I don't mind we use them. But they're not 
required for the concept to work.)


 I'm not too thrilled by references. I once got a question from someone 
 coming from C: what is the difference between a pointer and a reference 
 in C++? I had to answer: references are pointers with a different 
 syntax, no rebindability, and no possibility of being null. It seems he 
 and I both agree that references are mostly a cosmetic patch to solve a 
 syntactic problem. References in D aren't much different.

 
 I disagree. References in D are very different. They are not type 
 constructors. They are storage classes that can only be used in 
 function signatures, which makes them impossible to dangle. I think C++ 
 references would also have been much better off as storage classes 
 instead of half-life types.

Which makes me think of this:

	struct A { int i; this(); }
	ref A foo(ref A a) { return a; }

	ref A bar()
	{
		foo(A()).i = 1;

		ref A a = foo(A()); // illegal, ref cannot be used outside function signature
		a.i = 1;

		return foo(A()); // illegal ?
	}

Also, I'd like to point out that ref (and out) being storage classes 
somewhat hinders me from using them where it makes sense in the 
D/Objective-C bridge, since most functions there are instantiated by 
templates where template arguments give the type of each function 
argument. Perhaps there should be a way to specify "ref" and "out" in 
template arguments...


 If we could have a unified syntax for pointers of all kinds, I think 
 it'd be more convenient than having two kinds of pointers. A 
 null-forbiding but rebindable pointer would be more useful in my 
 opinion than the current reference concept.

 
 Well ref means "This function wants to modify its argument". That is a 
 very different charter from what pointers mean. So I'm not sure how you 
 say you'd much prefer this to that. They are not comparable.

I was under the impression that ref would be allowed as a storage class 
for local variables. I'll say it's perfectly acceptable for function 
arguments, but I'm less sure about function return types.

Also, I'd still like to have a non-null pointer type, especially for 
clarifying function signatures. A template can do. If it were in the 
language, however, it would be used by more people, which would be better.


 But I'd be curious what others think of it. Notice how the discussion 
 participants got reduced to you and me, and from what I saw that's not 
 a good sign.

 
 Indeed. I'm interested in other opinions too.
 
 But I'm under the impression that many lost track of what was being 
 discussed, especially since we started referring to Cyclone which few 
 are familiar with and probably few have read the paper.

 
 In my experience, when someone is interested in something, she'd make 
 time for it. So I take that as lack of interest. And hey, since when 
 was lack of expertise a real deterrent? :o)

As I said below, I think many people in this group are already 
comfortable with using pointers, which may explain why they're not so 
interested. Having no one interested in something doesn't necessarily 
mean they won't appreciate it when it comes.

It does, however, reduce the incentive for continuing forward. So I 
understand why you're backing off, even if it displeases me somewhat.


 One of the fears expressed at the start of the thread was about 
 excessive need for annotation, but as the Cyclone paper say, with good 
 defaults, you need to add scoping annotation only to a few specific 
 places. (It took me some time to read the paper and start discussing 
 things sanely after that, remember?) So perhaps we could get more 
 people involved if we could propose a tangible syntax for it.

 
 To be very frank, I think we are very far from having an actual 
 proposal, and syntax is of very low priority now if you want to put one 
 together. Right now what we have is a few vague ideas and conjectures 
 (e.g., there's no need for named regions because the need would be rare 
 enough to require dynamic allocation for those cases). I'm not saying 
 that to criticize, but merely to underline the difficulties.

I never said the need for dynamic regions would be rare: I said the 
garbage collector obsoletes them. If we can justify the need for dynamic 
regions later, we can add them back (with all the added complexity that 
requires), but I'd try without them first.


 Or perhaps not; for advanced programmers who already understand well 
 what can and cannot be done by passing pointers around, full escape 
 analysis may not seem to be a so interesting gain since they've already 
 adopted the right conventions to avoid most bugs it would prevent. And 
 most people here who can discuss this topic with some confidence are 
 not newbies to programming and don't make too much mistakes of the sort 
 anymore.
 
 Which makes me think of beginners saying pointers are hard. You've 
 certainly seen beginners struggle as they learn how to correctly use 
 pointers in C or C++. Making sure their program fail at compile-time, 
 with an explicative error message as to why they mustn't do this or 
 that, is certainly going to help their experience learning the language 
 more than cryptic and frustrating segfaults and access violations at 
 runtime, sometime far from the source of the problem.

 
 I totally agree that pointers are hard and good static checking for 
 them would help. Currently, what we try to do is obviate the need for 
 pointers in most cases, and to actually forbid them in safe modules.

But dynamic arrays *are* pointers; how are you obviating the need for 
them? If you find a solution for dynamic arrays, you'll have a solution 
for pointers too.

You could forbid dynamic arrays from referring to stack-allocated static 
ones, or automatically dynamically allocate those when they escape in a 
dynamic array. And if I were you, whatever you choose for arrays I'd 
allow for pointers too, to keep things consistent. Pointers to heap 
objects should be retained in my opinion.
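
A minimal sketch of the second option, done by hand rather than by the compiler: the slice is copied to the garbage-collected heap with `.dup` before it leaves the function. The function name `escapingCopy` is made up for illustration; it is not part of any proposal in this thread.

```d
// Hypothetical helper: returns a dynamic array that is safe to keep
// after the call, because it no longer refers to the stack.
int[] escapingCopy()
{
    int[3] local = [1, 2, 3]; // static array stored in this stack frame
    int[] view = local[];     // dynamic array: pointer + length into the stack

    // Returning `view` itself would hand out a pointer into a dead frame.
    // `.dup` allocates a copy on the GC heap, which outlives the call.
    return view.dup;
}

unittest
{
    int[] a = escapingCopy();
    assert(a == [1, 2, 3]);
}
```

This can be tried with `dmd -unittest -main -run file.d`. An escape analyzer would, in effect, decide where such a copy is needed instead of leaving it to the programmer.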


 The question that remains is, how many unsafe modules are necessary, 
 and what liability do they entail? If there are few and not too 
 unwieldy, maybe we can declare victory without constructing an escape 
 analyzer. I agree if you or anyone says they don't think so. At this 
 point, I am not sure, but what I can say is that it's good to reduce 
 the need for pointers regardless.

But are you reducing the need for pointers or hiding and restricting 
them? I'd say the latter. References are pointers with restrictions. 
Object references are no different from pointers except in syntax (they 
can even point to stack-allocated objects with scope classes). Dynamic 
arrays are pointers with a certain range. Closures have a pointer to a 
stack frame, which can be heap-allocated or not.

The only way to have a safe system without escape analysis is to force 
everything they can point to to be on the heap, or prevent them from 
escaping at all (as with ref). I wish there could be some consistency 
here.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 12 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 So my first point is that since we have a garbage collector in D, and 
 moreover since we're likely to get one heap per thread in D2, we don't 
 need dynamic regions. The remaining regions are: 1) the shared heap, 2) 
 the thread-local heap, 3) All the stack frames; and you can't allocate 
 other stack frames than the current one. Because none of these regions 
 require a handle to allocate into, we (A) don't need region handles.
 
 We still have many regions. Beside the two heaps (shared, thread-local), 
 each function's stack frame, and each block within them, creates a 
 distinct memory region. But nowhere we need to know exactly which region 
 a function parameter comes from; what we need to know is which address 
 outlives which pointer, and then we can forbid assigning addresses to 
 pointers that outlive them. All we need is a relative ordering of the 
 various regions, and for that we don't need to attach *names* to the 
 regions so that you can refer explicitly to them in the syntax. Instead, 
 you could say something like "region of (x)", or "region of (*y)" and 
 that would be enough.

But then how do you type the assignment example?

void assign(int** p, int * r) { *p = r; }

How do you reflect the requirement that r's region outlives *p's region?

But that's not even the point. Say you define some notation, such as:

void assign(int** p, int * r) if (region(r) <= region(p));

But the whole point of regions was to _simplify_ notations like the 
above into:

void assign(region R)(int*R* p, int *R r);

So although you think you simplified things by using region(symbol) 
instead of symbolic names, you complicated things. The compiler still 
needs to infer regions for each value, so it is as complicated as a 
named-regions compiler, and in addition you require the user to write 
bulkier expressions because you disallow use of symbols. So everybody is 
worse off. Note how in the example using a symbolic region the outlives 
relationship is enforced implicitly by using the same symbol name in two 
places.
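
For reference, a small sketch of how the unannotated `assign` behaves in plain D, assuming the intended body is `*p = r`. The variable `longLived` is invented for illustration. The point is only that the signature carries no lifetime information, whichever notation ends up expressing it.

```d
int* longLived; // module-level variable; outlives every stack frame

// Stores r into the pointer that p refers to. For this to be sound,
// whatever r points at must live at least as long as *p does.
void assign(int** p, int* r) { *p = r; }

unittest
{
    int x = 7;
    int* q;          // q and x share a stack frame, so they die together
    assign(&q, &x);  // sound: q does not outlive x
    assert(*q == 7);

    // assign(&longLived, &x) has exactly the same types and is accepted
    // in unchecked code, yet it would leave longLived dangling once this
    // frame is gone. Nothing in the signature rules it out.
}
```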

I suspect there are things you can't even express without symbolic 
regions. Consider this example from Dan's slides:

struct ILst(region R1, region R2) {
     int *R1 hd;
     ILst!(R1, R2) *R2 tl;
}

This code reflects the fact that the list holds pointers to integers in 
one region, whereas the nodes themselves are in a different region. It 
would be a serious challenge to tackle that without symbolic regions, 
and simpler that won't be for anybody.
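
To make the two-region situation concrete, here is a sketch in plain D, which has no region parameters. The integers live on the stack and the nodes on the GC heap, so the two pointer fields really do refer to different regions, but the type cannot say so. The `sum` helper is an assumption added for the example.

```d
struct ILst
{
    int*  hd; // points to an integer stored elsewhere (R1 in the notation above)
    ILst* tl; // points to the next node (R2 in the notation above)
}

// Walks the list and adds up the integers the nodes point to.
int sum(const(ILst)* node)
{
    int total = 0;
    for (; node !is null; node = node.tl)
        total += *node.hd;
    return total;
}

unittest
{
    int a = 1, b = 2;                  // integers in this stack frame
    auto second = new ILst(&b, null);  // nodes allocated on the GC heap
    auto first  = new ILst(&a, second);
    assert(sum(first) == 3);
}
```

If `first` were stored somewhere that outlives `a` and `b`, the `hd` pointers would dangle. That is the relationship the region annotations are meant to capture.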

I'll insert a few more points below in this sprawling discussion.

 Which makes me think of this:
 
     struct A { int i; this(); }
     ref A foo(ref A a) { return a; }
 
     ref A bar()
     {
         foo(A()).i = 1;
 
         ref A a = foo(A()); // illegal, ref cannot be used outside 
 function signature
         a.i = 1;
 
         return foo(A()); // illegal ?
     }

foo(A()) is illegal because ref does not bind to an rvalue.
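
A sketch of that rule as current compilers apply it under default settings. The `return ref` annotation is not in the quoted code; it is added here on the assumption of a recent compiler, where it marks a result that aliases the parameter.

```d
struct A { int i; }

// Returns its argument by reference, so the result aliases the caller's variable.
ref A foo(return ref A a) { return a; }

unittest
{
    A a;              // a named local is an lvalue, so it can bind to `ref`
    foo(a).i = 1;     // writes through the returned reference
    assert(a.i == 1);

    // A temporary such as A() is an rvalue and is rejected for a `ref` parameter.
    static assert(!__traits(compiles, foo(A())));
}
```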

 Also, I'd like to point out that ref (and out) being storage classes 
 somewhat hinder me from using them where it makes sense in the 
 D/Objective-C bridge, since there most functions are instanciated by 
 templates where template arguments give the type of each function 
 argument. Perhaps there should be a way to specify "ref" and "out" in 
 template arguments...

I agree. Something like that is on the list.

 If we could have a unified syntax for pointers of all kinds, I think 
 it'd be more convenient than having two kinds of pointers. A 
 null-forbiding but rebindable pointer would be more useful in my 
 opinion than the current reference concept.

 Well ref means "This function wants to modify its argument". That is a 
 very different charter from what pointers mean. So I'm not sure how 
 you say you'd much prefer this to that. They are not comparable.

 
 I was under the impression that ref would be allowed as a storage class 
 for local variables. I'll say it's perfectly acceptable for function 
 arguments, but I'm less sure about function return types.

As of now, ref is not planned for local variables.

 Also, I'd still like to have a non-null pointer type, especially for 
 clarifying function sigatures. A template can do. If it was in the 
 language however it be used by more people, which would be better.

I don't grok this notion "if it's in the language it would be used by 
more people". How does that come about? Does it mean templates are at 
such a high syntactic disadvantage? Maybe we should do something about 
that then, such as replacing !() with something else :o). If we put it 
in phobos (which after integration will be usable alongside with tango) 
could it count as being in the language?

 But I'd be curious what others think of it. Notice how the 
 discussion participants got reduced to you and me, and from what I 
 saw that's not a good sign.

 Indeed. I'm interested in other opinions too.

 But I'm under the impression that many lost track of what was being 
 discussed, especially since we started referring to Cyclone which few 
 are familiar with and probably few have read the paper.

 In my experience, when someone is interested in something, she'd make 
 time for it. So I take that as lack of interest. And hey, since when 
 was lack of expertise a real deterrent? :o)

 
 As I said below, I think many people in this group are already 
 confortable with using pointers, which may explain why they're not so 
 interested. Having no one interested in something doesn't necessarly 
 mean they won't appreciate it when it comes.

That I totally agree with. It's happened a couple of times with D features.

 It does, however reduce the incitative for continuing forward. So I 
 understand why you're backing off, even if it displease me somewhat.

I'm sorry about how you feel. Now we're in a conundrum of sorts. You 
seem to strongly believe you can make some nice simplified regions work, 
and make people like them. Taking that to a proof is hard. The conundrum 
is, you are facing the prospect of putting work into it and creating a 
system that, albeit correct, is not enticing.

 One of the fears expressed at the start of the thread was about 
 excessive need for annotation, but as the Cyclone paper say, with 
 good defaults, you need to add scoping annotation only to a few 
 specific places. (It took me some time to read the paper and start 
 discussing things sanely after that, remember?) So perhaps we could 
 get more people involved if we could propose a tangible syntax for it.

 To be very frank, I think we are very far from having an actual 
 proposal, and syntax is of very low priority now if you want to put 
 one together. Right now what we have is a few vague ideas and 
 conjectures (e.g., there's no need for named regions because the need 
 would be rare enough to require dynamic allocation for those cases). 
 I'm not saying that to criticize, but merely to underline the 
 difficulties.

 
 I never said the need for dynamic regions would be rare: I said garbage 
 collector obsoletes it. If we can justify the need for dynamic regions 
 later, we can add them back (with all the added complexity it requires) 
 but I'd try without them first.

Let's not forget that symbolic regions (for typing purposes) should not 
be confused with dynamic regions (for efficiency purposes). I agree we 
can do away with the latter and put them in later if we care. I disagree 
that dropping symbolic regions simplifies things.

 Or perhaps not; for advanced programmers who already understand well 
 what can and cannot be done by passing pointers around, full escape 
 analysis may not seem to be a so interesting gain since they've 
 already adopted the right conventions to avoid most bugs it would 
 prevent. And most people here who can discuss this topic with some 
 confidence are not newbies to programming and don't make too much 
 mistakes of the sort anymore.

 Which makes me think of beginners saying pointers are hard. You've 
 certainly seen beginners struggle as they learn how to correctly use 
 pointers in C or C++. Making sure their program fail at compile-time, 
 with an explicative error message as to why they mustn't do this or 
 that, is certainly going to help their experience learning the 
 language more than cryptic and frustrating segfaults and access 
 violations at runtime, sometime far from the source of the problem.

 I totally agree that pointers are hard and good static checking for 
 them would help. Currently, what we try to do is obviate the need for 
 pointers in most cases, and to actually forbid them in safe modules.

 
 But dynamic arrays *are* pointers, how are you oblivating the need for 
 them? If you find a solution for dynamic arrays, you'll have a solution 
 for pointers too.
 
 You could forbid dynamic arrays from refering to stack-allocated static 
 ones, or automatically dynamically allocate those when they escape in a 
 dynamic array. And if I were you, whatever you choose for arrays I'd 
 allow it for pointers too, to keep things consistent. Pointer to heap 
 objects should be retained in my opinion.

But a possible path is to make arrays safe and leave pointers for those 
cases in which efficiency is of utmost importance. With luck, those 
cases are rare.

 The question that remains is, how many unsafe modules are necessary, 
 and what liability do they entail? If there are few and not too 
 unwieldy, maybe we can declare victory without constructing an escape 
 analyzer. I agree if you or anyone says they don't think so. At this 
 point, I am not sure, but what I can say is that it's good to reduce 
 the need for pointers regardless.

 
 But are you reducing the need for pointers or hiding and restricting 
 them?

Of course - that's the whole point. In fact, I'll insert a small 
correction: we are reducing the need for pointers BY hiding and 
restricting them. And that's a good thing. If you can do most of your 
work with restricted pointers (e.g. ref), then that's a net win.


Andrei

Nov 12 2008

Hxal <hxal freenode.irc> writes:

Andrei Alexandrescu Wrote:
 But how do you type then the assignment example?
 
 void assign(int** p, int * r) { *p = r; }
 
 How do you reflect the requirement that r's region outlives *p's region?
 
 But that's not even the point. Say you define some notation, such as:
 
 void assign(int** p, int * r) if (region(r) <= region(p));
 
 But the whole point of regions was to _simplify_ notations like the 
 above into:
 
 void assign(region R)(int*R* p, int *R r);
 
 So although you think you simplified things by using region(symbol) 
 instead of symbolic names, you complicated things. The compiler still 
 needs to infer regions for each value, so it is as complicated as a 
 named-regions compiler, and in addition you require the user to write 
 bulkier expressions because you disallow use of symbols. So everybody is 
 worse off. Note how in the example using a symbolic region the outlives 
 relationship is enforced implicitly by using the same symbol name in two 
 places.

Examples such as this one are rare enough to afford the need for
annotations. I was under the impression that D was supposed to promote
the use of references over pointers. People working with low-level
code will probably either appreciate the optimization and correctness
checking, or can request a way to turn off compiler enforcement of
scoping in low-level code fragments.

 I suspect there are things you can't even express without symbolic 
 regions. Consider this example from Dan's slides:
 
 struct ILst(region R1, region R2) {
      int *R1 hd;
      ILst!(R1, R2) *R2 tl;
 }
 
 This code reflects the fact that the list holds pointer to integers in 
 one region, whereas the nodes themselves are in a different region. It 
 would be a serious challenge to tackle that without symbolic regions, 
 and simpler that won't be for anybody.

Transitive scope ownership ensures that a member of a structure outlives
the structure itself. In which case we can create a list in a local scope,
and either add objects allocated in that scope or any parent scope or
the heap. Referencing objects from child scopes would be incorrect and
I don't think it's unreasonable to expect the programmer to code around
such a desire.

foo*R*Q x, if (R in Q)
is illegal, because it could produce a dangling reference.

foo*R*Q x, if (Q in R)
is equivalent to foo*Q*Q, for the purpose of:
*x = y;
where y is one of foo*R, foo*Q or foo*global

A problem arises for other operations though:
foo*R*Q might have different semantics than foo*Q*Q
when being on the right-hand side of the assignment.
y = *x;
is legal for foo*R y, but not for foo*Q y.

Therefore, while the lifetime must always stay constant
or be reduced towards the right side of the type declaration,
it's necessary to be able to explicitly relax restrictions
towards the left.

The problem is that the type syntax is suited for scope
relaxation rules to be transitive, not scope restriction.
I.e., global(foo*)* makes sense when * is scoped by default,
but scope(foo*)* doesn't make sense when * is global
by default.

So we could either implement it with regions, which I'm
not a big fan of (better than nothing though!);
or ditch "scope" (as a restriction) in favor of "global"
and maybe "scopeof()" (as a relaxation).

Hopefully soon D2 and the book will be done and
the development of D3 can start, and such a breaking
change can be introduced.

 But a possible path is to make arrays safe and leave pointers for those 
 cases in which efficiency is of utmost importance. With luck, those 
 cases are rare.

Safe, sure, but not by forbidding the usage of stack arrays.

Nov 12 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-12 10:02:02 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 So my first point is that since we have a garbage collector in D, and 
 moreover since we're likely to get one heap per thread in D2, we don't 
 need dynamic regions. The remaining regions are: 1) the shared heap, 2) 
 the thread-local heap, 3) All the stack frames; and you can't allocate 
 other stack frames than the current one. Because none of these regions 
 require a handle to allocate into, we (A) don't need region handles.
 
 We still have many regions. Beside the two heaps (shared, 
 thread-local), each function's stack frame, and each block within them, 
 creates a distinct memory region. But nowhere we need to know exactly 
 which region a function parameter comes from; what we need to know is 
 which address outlives which pointer, and then we can forbid assigning 
 addresses to pointers that outlive them. All we need is a relative 
 ordering of the various regions, and for that we don't need to attach 
 *names* to the regions so that you can refer explicitly to them in the 
 syntax. Instead, you could say something like "region of (x)", or 
 "region of (*y)" and that would be enough.

 
 But how do you type then the assignment example?
 
 void assign(int** p, int * r) { *p = r; }
 
 How do you reflect the requirement that r's region outlives *p's region?
 
 But that's not even the point. Say you define some notation, such as:
 
 void assign(int** p, int * r) if (region(r) <= region(p));
 
 But the whole point of regions was to _simplify_ notations like the above into:
 
 void assign(region R)(int*R* p, int *R r);
 
 So although you think you simplified things by using region(symbol) 
 instead of symbolic names, you complicated things. The compiler still 
 needs to infer regions for each value, so it is as complicated as a 
 named-regions compiler, and in addition you require the user to write 
 bulkier expressions because you disallow use of symbols. So everybody 
 is worse off. Note how in the example using a symbolic region the 
 outlives relationship is enforced implicitly by using the same symbol 
 name in two places.

Everywhere I said there was no need for named regions, I also said 
named regions could be kept to ease the syntax. That said, I'm not so 
sure named regions are that good at simplifying the syntax. In your 
assign example above, the named-region version has an error: it forces 
the two pointers to be of the same region. That could be fine, but, 
assuming you're assigning to *p, it'd be more precise to write it like 
this:

	void assign(region R1, region R2)(int*R1* p, int*R2 r) if (R1 <= R2);

Once we get there, I think the no-named region syntax is better. That 
said, for the swap example, where both values need to share the same 
region, the named region notation is simpler:

	void swap(region R)(int*R a, int*R b);
	void swap(int* a, int* b) if (region(a) == region(b));

But I'd argue that most of the time regions do not need to be equal, 
but are subset or superset of each other, so reusing variable names 
makes more sense in my opinion.

In any case, I prefer a notation where region constraints are attached 
directly to the type instead of being expressed somewhere else. 
Something like this (explained below):

	void assign(int*(r)* p, int* r) { *p = r; }
	void swap(ref int*(b) a, ref int*(a) b);

Here, a parenthesized suffix after a pointer indicates the region 
constraint on the pointer, based on the region of another pointer. In 
the first example, int*(r)* means that the integer pointer "*p" must 
not live beyond the value pointed to by "r" (because we're going to assign 
"r" to "*p"). In the second example, the value pointed to by "a" must not 
live longer than the one pointed to by "b", and the value pointed to by "b" 
must not live longer than the one pointed to by "a"; the net result is that 
they must have the same lifetime and need to be in the same region.

For something more complicated, you could give multiple 
comma-separated constraints:

	void choose(ref int*(a,b) result, int* a, int* b)
	{
		result = rand() > 0.5 ? a : b;
	}


 I suspect there are things you can't even express without symbolic 
 regions. Consider this example from Dan's slides:
 
 struct ILst(region R1, region R2) {
      int *R1 hd;
      ILst!(R1, R2) *R2 tl;
 }
 
 This code reflects the fact that the list holds pointer to integers in 
 one region, whereas the nodes themselves are in a different region. It 
 would be a serious challenge to tackle that without symbolic regions, 
 and simpler that won't be for anybody.

Today's templates are just fine for that. Just propagate variables 
through template arguments and apply region constraints to the members:

	struct ILst(alias var1, alias var2) {
		int*(var1) hd;
		ILst!(var1, var2)*(var2) tl;
	}
	
	int z;
	int*(z) a, b;
	ILst!(a, b) lst1;
	ILst!(&z, &z) lst2;

We could even allow regions to propagate through type arguments too:

	struct ILst2(T1, T2) {
		int*(T1) hd;
		ILst2!(T1, T2)*(T2) tl;
	}
	ILst2!(typeof(&z), typeof(b)) lst3;

I think this example is a good case for attaching region constraints 
directly to types instead of expressing them as conditional expressions 
elsewhere, as in "if (region a <= region b)".


 I'll insert a few more points below in this sprawling discussion.
 
 Which makes me think of this:
 
     struct A { int i; this(); }
     ref A foo(ref A a) { return a; }
 
     ref A bar()
     {
         foo(A()).i = 1;
 
         ref A a = foo(A()); // illegal, ref cannot be used outside 
 function signature
         a.i = 1;
 
         return foo(A()); // illegal ?
     }

 
 foo(A()) is illegal because ref does not bind to an rvalue.

Ah, you're right.


 Also, I'd like to point out that ref (and out) being storage classes 
 somewhat hinder me from using them where it makes sense in the 
 D/Objective-C bridge, since there most functions are instanciated by 
 templates where template arguments give the type of each function 
 argument. Perhaps there should be a way to specify "ref" and "out" in 
 template arguments...

 
 I agree. Something like that is on the list.

Great!


 Also, I'd still like to have a non-null pointer type, especially for 
 clarifying function sigatures. A template can do. If it was in the 
 language however it be used by more people, which would be better.

 
 I don't grok this notion "if it's in the language it would be used by 
 more people". How does that come about?

No, I really think it's true that if it is in the language, explained 
right alongside nullable pointers, more people would learn it and use 
it. Isn't it this exact notion that made Walter add Ddoc and unit tests 
directly into the language?


 Does it mean templates are at such a high syntactic disadvantage? Maybe 
 we should do something about that then, such as replacing !() with 
 something else :o). If we put it in phobos (which after integration 
 will be usable alongside with tango) could it count as being in the 
 language?

Pointers that shouldn't be null are pretty common, possibly even more 
common than can-be-null pointers, which is why I think they deserve a 
good, short, easy to read and remember syntax. I'd even suggest 
changing the standard syntax for pointer "*" so it only allows non-null 
pointers, and having something else "*?" for nullable ones. This would 
force people into giving more consideration before allowing nullable 
pointers, and the same syntax could apply to objects too.

That said, having a non-nullable pointer in the standard library would 
certainly be better than nothing. And the standard library should make 
use of it everywhere it makes sense. But is a standard-library solution 
going to work with "extern (C)" functions? I think it'd be sad if it 
didn't, and it would look strange if it did (C functions with template 
arguments!).
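
For the sake of discussion, a rough sketch of what such a library type could look like. The names `NonNull`, `get` and `twice` are hypothetical and not taken from Phobos or Tango. The null check is a run-time `assert` in the constructor, so this is weaker than a language-level guarantee: `.init` still holds a null pointer, and asserts disappear in release builds.

```d
// Hypothetical library wrapper around a pointer that is checked once, on construction.
struct NonNull(T)
{
    private T* ptr;

    @disable this(); // forbid default construction, which would leave ptr null

    this(T* p)
    {
        assert(p !is null, "NonNull constructed from a null pointer");
        ptr = p;
    }

    // Dereference without a further check; the constructor already validated ptr.
    ref T get() { return *ptr; }
}

// The wrapper adds no storage beyond the raw pointer itself.
static assert(NonNull!int.sizeof == (int*).sizeof);

// The signature now documents that a null argument is not expected.
int twice(NonNull!int n) { return n.get() * 2; }

unittest
{
    int x = 21;
    auto p = NonNull!int(&x);
    assert(twice(p) == 42);

    p.get() = 5; // writes through to x
    assert(x == 5);
}
```

Whether a wrapper like this can appear in an "extern (C)" prototype is exactly the open question raised above; the sketch does not settle it.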


 As I said below, I think many people in this group are already 
 confortable with using pointers, which may explain why they're not so 
 interested. Having no one interested in something doesn't necessarly 
 mean they won't appreciate it when it comes.

 
 That I totally agree with. It's happened a couple of times with D features.
 
 It does, however reduce the incitative for continuing forward. So I 
 understand why you're backing off, even if it displease me somewhat.

 
 I'm sorry about how you feel. Now we're in a conundrum of sorts. You 
 seem to strongly believe you can make some nice simplified regions 
 work, and make people like them. Taking that to a proof is hard. The 
 conundrum is, you are facing the prospect of putting work into it and 
 creating a system that, albeit correct, is not enticing.

Currently, I'm just trying to convince you (and any other potential 
silent listeners) that it can work. I haven't given much thought to 
the syntax before today as I wanted to clear up the concepts first. But 
now, in part because of your syntactic arguments above, I'm wondering 
if this was the right path to take.

I don't mind much if it never gets into the language, although I'd like 
it very much. I'm doing it for myself too, to better understand how you 
can document and analyse the region/scope relationship of various 
variables in a program piece by piece.


 I never said the need for dynamic regions would be rare: I said garbage 
 collector obsoletes it. If we can justify the need for dynamic regions 
 later, we can add them back (with all the added complexity it requires) 
 but I'd try without them first.

 
 Let's not forget that symbolic regions (for typing purposes) should not 
 be confused with dynamic regions (for efficiency purposes). I agree we 
 can do away with the latter and put them in later if we care. I 
 disagree that dropping symbolic regions simplifies things.

I was under the impression that Cyclone's requirement for named regions 
came with its use of dynamic regions, which I now believe was 
incorrect. If I take this example from the paper:

	char?p rstrdup(region_t<p>, const char? s);

you *need* a name for the region handle. Since region handles are there 
for supporting dynamic regions, it therefore follows that you need 
named regions to make things work at all... well here's the catch: you 
need named *region handles* as variables, not necessarily named 
regions, as you could always arrange the syntax so that the returned 
pointer is of the region of the region handle... or something like that.


 But dynamic arrays *are* pointers, how are you obviating the need for 
 them? If you find a solution for dynamic arrays, you'll have a solution 
 for pointers too.
 
 You could forbid dynamic arrays from referring to stack-allocated static 
 ones, or automatically dynamically allocate those when they escape in a 
 dynamic array. And if I were you, whatever you choose for arrays I'd 
 allow it for pointers too, to keep things consistent. Pointer to heap 
 objects should be retained in my opinion.

 
 But a possible path is to make arrays safe and leave pointers for those 
 cases in which efficiency is of utmost importance. With luck, those 
 cases are rare.

"make arrays safe"... by forcing dynamic ones to always be on the heap? 
Or by implementing a full region system that applies only to arrays? 
Obviously it's not the latter; the former is the only choice I can see.

And I think you should at least allow pointers to work with heap 
variables in SafeD... otherwise people will work around that by 
creating one-item arrays. :-)
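
To make the "always on the heap" option concrete, here is a minimal sketch in present-day D, assuming the rule is simply "copy the data before the slice escapes". The function name makeEscapable is made up for illustration; .dup is the existing array property that allocates a copy on the garbage-collected heap.

```d
// Hypothetical helper: returns a slice that is safe to escape.
int[] makeEscapable()
{
    int[3] local = [1, 2, 3]; // static array, lives in this stack frame
    int[] view = local[];     // slice that still refers to stack memory
    return view.dup;          // heap copy that outlives the frame
}

void main()
{
    int[] a = makeEscapable();
    assert(a == [1, 2, 3]);
}
```

Returning view directly instead of view.dup is the case a safe subset would have to reject or silently convert into the copy shown here.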


 But are you reducing the need for pointers or hiding and restricting them?

 
 Of course - that's the whole point. In fact, I'll insert a small 
 correction: we are reducing the need for pointers BY hiding and 
 restricting them. And that's a good thing. If you can do most of your 
 work with restricted pointers (e.g. ref), then that's a net win.

Whether you can work effectively only with ref or not remains to be seen.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 12 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2008-11-12 10:02:02 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Michel Fortin wrote:
 On 2008-11-09 10:10:03 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 So my first point is that since we have a garbage collector in D, and 
 moreover since we're likely to get one heap per thread in D2, we 
 don't need dynamic regions. The remaining regions are: 1) the shared 
 heap, 2) the thread-local heap, 3) All the stack frames; and you 
 can't allocate other stack frames than the current one. Because none 
 of these regions require a handle to allocate into, we (A) don't need 
 region handles.

 We still have many regions. Beside the two heaps (shared, 
 thread-local), each function's stack frame, and each block within 
 them, creates a distinct memory region. But nowhere we need to know 
 exactly which region a function parameter comes from; what we need to 
 know is which address outlives which pointer, and then we can forbid 
 assigning addresses to pointers that outlive them. All we need is a 
 relative ordering of the various regions, and for that we don't need 
 to attach *names* to the regions so that you can refer explicitly to 
 them in the syntax. Instead, you could say something like "region of 
 (x)", or "region of (*y)" and that would be enough.

 But how do you type then the assignment example?

 void assign(int** p, int * r) { *p = r; }

 How do you reflect the requirement that r's region outlives *p's region?

 But that's not even the point. Say you define some notation, such as:

 void assign(int** p, int * r) if (region(r) <= region(p));

 But the whole point of regions was to _simplify_ notations like the 
 above into:

 void assign(region R)(int*R* p, int *R r);

 So although you think you simplified things by using region(symbol) 
 instead of symbolic names, you complicated things. The compiler still 
 needs to infer regions for each value, so it is as complicated as a 
 named-regions compiler, and in addition you require the user to write 
 bulkier expressions because you disallow use of symbols. So everybody 
 is worse off. Note how in the example using a symbolic region the 
 outlives relationship is enforced implicitly by using the same symbol 
 name in two places.
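
The outlives requirement behind the assign example can be seen with plain pointers in today's D, without any region notation. This is only a sketch: the names global and slot are assumptions for illustration, and the compiler checks none of this in ordinary (non-safe) code.

```d
// Stores r into the pointer that p refers to.
void assign(int** p, int* r) { *p = r; }

int global = 42; // module-level variable, outlives any stack frame

void main()
{
    int* slot;              // the pointer being written lives in main's frame
    assign(&slot, &global); // fine: global outlives slot
    assert(*slot == 42);

    {
        int local = 7;
        // assign(&slot, &local); // would leave slot dangling after this block
    }
}
```

The commented-out call is the one a region system has to reject: the pointed-to int would die before the pointer that refers to it.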

 
 Everywhere I said there was no need for named regions, I also said named 
 regions could be kept to ease the syntax. That said, I'm not so sure 
 named regions are that good at simplifying the syntax. In your assign 
 example above, the named-region version has an error: it forces the two 
 pointers to be of the same region. That could be fine, but, assuming 
 you're assigning to *p, it'd be more precise to write it like that:
 
     void assign(region R1, region R2)(int*R1* p, int*R2 r) if (R1 <= R2);

No, the code is correct as written (without the if). You may want to 
reread the paper with an eye for region subtyping rules. This partly 
backs up my point: understanding region analysis may be quite a burden 
for the average programmer. Even you, who took pains to think through 
everything and absorb the paper, are having trouble. And me too to be 
honest :o).

 Once we get there, I think the no-named region syntax is better.

This is invalidated by the wrong assertion above.

 That 
 said, for the swap example, where both values need to share the same 
 region, the named region notation is simpler:
 
     void swap(region R)(int*R a, int*R b);
     void swap(int* a, int* b) if (region(a) == region(b));

No, for that swap there is no need to specify any region. You can swap 
ints in any two regions. Probably you meant to use int** throughout.

 But I'd argue that most of the time regions do not need to be equal, but 
 are subset or superset of each other, so reusing variable names makes 
 more sense in my opinion.

Don't forget that using a region name twice may actually work with two 
different regions, so far as they are in a subtyping relationship. 
Region subtyping is key to both simplifying code and to understanding 
code after simplification.

 In any case, I prefer a notation where region constraints are attached 
 directly to the type instead of being expressed somewhere else. 
 Something like this (explained below):
 
     void assign(int*(r)* p, int* r) { *p = r; }
     void swap(ref int*(b) a, ref int*(a) b);

Sure. I'm sure there's understanding that that doesn't make anything any 
simpler or any easier to implement or understand. It's just a minor 
change in notation, and IMHO not to the better.

 Here, a parenthesis suffix after a pointer indicates the region 
 constraint of the pointer, based on the region of another pointer.

I thought it means pointer to function. Oops.

 In the 
 first example, int*(r)* means that the integer pointer "*p" must not 
 live beyond the value pointed by "r" (because we're going to assign "r" 
 to "*p"). In the second example, the value pointed by "a" must not live 
 longer than the one pointed by "b" and the value pointed by "b" must not 
 live longer than the one pointed by "a"; the net result is that they must 
 have the same lifetime and need to be in the same region.
 
 For something more complicated, you could give multiple comma-separated 
 constraints:
 
     void choose(ref int*(a,b) result, int* a, int* b)
     {
         result = rand() > 0.5 ? a : b;
     }

This all is irrelevant. You essentially change the syntax. Syntax is, 
again, the least of the problems to be solved.

 I suspect there are things you can't even express without symbolic 
 regions. Consider this example from Dan's slides:

 struct ILst(region R1, region R2) {
      int *R1 hd;
      ILst!(R1, R2) *R2 tl;
 }

 This code reflects the fact that the list holds pointer to integers in 
 one region, whereas the nodes themselves are in a different region. It 
 would be a serious challenge to tackle that without symbolic regions, 
 and simpler that won't be for anybody.

 
 Today's templates are just fine for that. Just propagate variables 
 through template arguments and apply region constraints to the members:
 
     struct ILst(alias var1, alias var2) {
         int*(var1) hd;
         ILst!(var1, var2)*(var2) tl;
     }
     
     int z;
     int*(z) a, b;
     ILst!(a, b) lst1;
     ILst!(&z, &z) lst2;

I hope you agree that this is just written symbols without much meaning. 
This is not half-baked. It's not even rare. The cow is still moving. I 
can't eat that! :o) I can't even start replying to it because there are 
so many actual and potential issues, I'd need to get to work on them first.

 We could even allow regions to propagate through type arguments too:
 
     struct ILst2(T1, T2) {
         int*(T1) hd;
         ILst2!(T1, T2)*(T2) tl;
     }
     ILst2!(typeof(&z), typeof(b)) lst3;
 
 I think this example is a good case for attaching region constraints 
 directly to types instead of expressing them as conditional expressions 
 elsewhere, as in "if (region a <= region b)".

I am thoroughly lost here, sorry. I can't even answer "this is so wrong" 
or "this is pure genius". Probably it's somewhere in between :o). At any 
rate, I suggest you develop a solid understanding of Cyclone if you want 
to build something related to it.

[In the interest of coherence I snipped away unrelated parts of the 
discussion.]

 I'm sorry about how you feel. Now we're in a conundrum of sorts. You 
 seem to strongly believe you can make some nice simplified regions 
 work, and make people like them. Taking that to a proof is hard. The 
 conundrum is, you are facing the prospect of putting work into it and 
 creating a system that, albeit correct, is not enticing.

 
 Currently, I'm just trying to convince you (and any other potential 
 silent listeners) that it can work.

I understand I've been blunt throughout this post, but please side with 
me for a minute. I'm doing so for the following reasons: (a) I'm 
essentially writing this post in negative time; (b) I believe you 
currently don't have an attack on the problem you're trying to solve; 
(c) I believe it's worthwhile for you to develop an attack on the 
problem, (d) I think "we" = "the D community" should seriously consider 
safety and consequently things like region analysis.

You can now stop siding with me and side again with yourself. At this 
point you can easily guess that all of the above was to prepare you for 
an even blunter comment. Here goes.

You say you want to convince people "it can work". But right now there 
is no "it". You have no "it". Much less an "it" that can work.

But there is of course good hope that an "it" could emerge, and I 
encourage you to continue working towards that goal. It's just a lot 
more work than it might appear.



Andrei

Nov 12 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-13 00:53:50 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Michel Fortin wrote:
 Everywhere I said there was no need for named regions, I also said 
 named regions could be kept to ease the syntax. That said, I'm not so 
 sure named regions are that good at simplifying the syntax. In your 
 assign example above, the named-region version has an error: it forces 
 the two pointers to be of the same region. That could be fine, but, 
 assuming you're assigning to *p, it'd be more precise to write it like 
 that:
 
     void assign(region R1, region R2)(int*R1* p, int*R2 r) if (R1 <= R2);

 
 No, the code is correct as written (without the if). You may want to 
 reread the paper with an eye for region subtyping rules. This partly 
 backs up my point: understanding region analysis may be quite a burden 
 for the average programmer. Even you, who took pains to think through 
 everything and absorb the paper, are having trouble. And me too to be 
 honest :o).

Ok, I've reread that part and it's true that using Cyclone's subtyping 
rules it'd work fine with only one region name because Cyclone 
implicitly creates two regions from that, the first being a subset of 
the other, just as I wrote explicitly here. But what I missed out was 
one of Cyclone's syntactic constructs, not a concept of regions.

... or perhaps we have a different notion of what is a syntax and what 
is a concept?


 Once we get there, I think the no-named region syntax is better.

 
 This is invalidated by the wrong assertion above.

Yes and no. It's true that Cyclone's region subtyping makes the syntax 
prettier. On the other side, the programmer has to be aware of how it 
works, and especially aware that changing the order of his arguments will 
implicitly change the region relationship between them.


 That said, for the swap example, where both values need to share the 
 same region, the named region notation is simpler:
 
     void swap(region R)(int*R a, int*R b);
     void swap(int* a, int* b) if (region(a) == region(b));

 
 No, for that swap there is no need to specify any region. You can swap 
 ints in any two regions. Probably you meant to use int** throughout.

Hum, you're right, I meant to make these "ref int*".


 But I'd argue that most of the time regions do not need to be equal, 
 but are subset or superset of each other, so reusing variable names 
 makes more sense in my opinion.

 
 Don't forget that using a region name twice may actually work with two 
 different regions, so far as they are in a subtyping relationship. 
 Region subtyping is key to both simplifying code and to understanding 
 code after simplification.

I'm not convinced that region subtyping is so simple to understand for 
neophytes, especially because you may assume the same region at first 
glance. Cyclone isn't C++, but this region subtyping rule makes me 
think of one of those many little known corners in C++ such as Koenig 
name lookup.

But I consider this just a syntactic issue about how to express regions 
though. And I may be completely wrong about its unintuitiveness.


 In any case, I prefer a notation where region constraints are attached 
 directly to the type instead of being expressed somewhere else. 
 Something like this (explained below):
 
     void assign(int*(r)* p, int* r) { *p = r; }
     void swap(ref int*(b) a, ref int*(a) b);

 
 Sure. I'm sure there's understanding that that doesn't make anything 
 any simpler or any easier to implement or understand. It's just a minor 
 change in notation, and IMHO not to the better.

Ok, then we disagree here. I think this notation is better because it 
makes you think about things in terms of pointer lifetime vs. the 
pointed data lifetime, which I think is much less abstract than 
variables being part of different regions where some regions encompass 
other regions. It's a shift in perspective from the syntactic approach 
of Cyclone, although under the hood the compiler would do mostly the 
same work.


 Here, a parenthesis suffix after a pointer indicates the region 
 constraint of the pointer, based on the region of another pointer.

 
 I thought it means pointer to function. Oops.

And I thought the syntax was the least of your concerns right now? :-) 
This probably can't be the final syntax, but I think it makes things 
clear enough to talk about the concepts... for now.


 In the first example, int*(r)* means that the integer pointer "*p" must 
 not live beyond the value pointed by "r" (because we're going to assign 
 "r" to "*p"). In the second example, the value pointed by "a" must not 
 live longer than the one pointed by "b" and the value pointed by "b" 
 must not live longer than the one pointed by "a"; the net result is that 
 they must have the same lifetime and need to be in the same region.
 
 For something more complicated, you could give multiple 
 comma-separated constraints:
 
     void choose(ref int*(a,b) result, int* a, int* b)
     {
         result = rand() > 0.5 ? a : b;
     }

 
 This all is irrelevant. You essentially change the syntax. Syntax is, 
 again, the least of the problems to be solved.

Ok then. Let's go to the real problems.


 I suspect there are things you can't even express without symbolic 
 regions. Consider this example from Dan's slides:
 
 struct ILst(region R1, region R2) {
      int *R1 hd;
      ILst!(R1, R2) *R2 tl;
 }
 
 This code reflects the fact that the list holds pointer to integers in 
 one region, whereas the nodes themselves are in a different region. It 
 would be a serious challenge to tackle that without symbolic regions, 
 and simpler that won't be for anybody.

 
 Today's templates are just fine for that. Just propagate variables 
 through template arguments and apply region constraints to the members:
 
     struct ILst(alias var1, alias var2) {
         int*(var1) hd;
         ILst!(var1, var2)*(var2) tl;
     }
     int z;
     int*(z) a, b;
     ILst!(a, b) lst1;
     ILst!(&z, &z) lst2;

 
 I hope you agree that this is just written symbols without much 
 meaning. This is not half-baked. It's not even rare. The cow is still 
 moving. I can't eat that! :o) I can't even start replying to it because 
 there are so many actual and potential issues, I'd need to get to work 
 on them first.

If you mean there aren't any explanations, then you're right that 
explanations were somewhat missing from my last post. Sorry. I guess I 
was too tired to notice the lack of instructions.

Basically you apply the same rules as for the function signatures in 
the preceding function examples. For instance, "int*(var1)" means the 
hd pointer points to an int that lives at least as long as the one 
pointed by var1 (var1 must be an "int*" pointer). This means that you 
can assign the content of var1 to it, or anything else that will live 
at least as long as var1. It also means you can take its value and place 
it in var1, or any pointer with a shorter life.

Then, we have "ILst!(var1, var2)*(var2)". It's the same rules as the 
first, except that we have a different type beyond the pointer which 
must be valid through var2's lifetime.

The last code snippet shows how to use that template.

    int z;
    int*(z) a, b;
    ILst!(a, b) lst1;
    ILst!(&z, &z) lst2;

Here, we're declaring "int*(z)", which is a pointer to an int whose 
lifetime is equal to or longer than that of the address of z. (ok, there's an 
error here, it should have been "int*(&z)"). And normally, you wouldn't 
explicitly write that, "int*" would be enough: the compiler should 
determine the default constraints automatically.

Then when you instantiate ILst!(a, b), the template will take the 
lifetime of a and b (which is the lifetime of the address of z) and 
apply it to pointers inside the struct.


 We could even allow regions to propagate through type arguments too:
 
     struct ILst2(T1, T2) {
         int*(T1) hd;
         ILst2!(T1, T2)*(T2) tl;
     }
     ILst2!(typeof(&z), typeof(b)) lst3;


Again, some explanations were missing... Basically, 
region/scoping/lifetime constraints are attached to pointers. Which 
means that propagating a type ought to be enough to propagate the 
lifetime constraints too. "ILst2!(typeof(&z), typeof(b))" is exactly the 
same as "ILst!(&z, b)". ILst takes its constraints from variables while 
ILst2 takes its constraints from types.

But the two previous examples are a little stretched to make the 
concept more similar to Cyclone. With my proposal, you can do much 
better than this.

I think in most cases where you want to propagate constraints, you'll 
want to propagate a type too. If what you want is a linked list, it'd 
be better expressed generically like this:

	struct ListRoot(T) {
		ListNode!(T)* first;
	}
	struct ListNode(T) {
		T hd;
		ListNode!(T)* tl;
	}

	int global;
	void foo() {
		int a;
		ListRoot!(int*) listRoot;
		ListNode!(int*) listNode;
		listRoot.first = &listNode;
		listNode.hd = &a;
		listNode.hd = &global;
	}

Notice how there is absolutely no special annotation here; it's already 
valid template code.

Now, let the compiler apply some defaults according to these rules: 
types declared in local variables will be allowed to point to values of 
their own region, and structs members will be allowed to point to 
values of the same region the struct comes from. Annotated explicitly, 
the default annotations would look like this:

	struct ListRoot(T) {
		ListNode!(T)*(this) first; // pointer to something in the same region as this
	}
	struct ListNode(T) {
		T value; // if T is a pointer, it holds its own region annotations
		ListNode!(T)*(this) next; // pointer to something in the same region as this
	}

	int global;
	void foo() {
		int a;
		ListRoot!(int*(&listRoot)) listRoot;
		ListNode!(int*(&listNode)) listNode;
		listRoot.first = &listNode;
		listNode.value = &a;
		listNode.value = &global;
	}

With this scheme, the lifetime of all nodes in the linked list needs to 
be equal to or longer than that of the preceding node (normally, they 
will all be equal), and the lifetime of the value pointer is determined 
by the type you give as a template argument to ListRoot and ListNode. 
Therefore, it becomes possible to construct the linked list on the 
stack when the root is on the stack, with no need for explicit 
annotations.

There is still one problem though. If you want to swap two nodes, you 
can't, because there is no guarantee that the lifetime of the "this" 
pointer of a ListNode is equal to the lifetime of the "next" pointer. (In 
fact, the next pointer's lifetime is longer than or equal to the struct's 
lifetime). So if we're going to swap or reorder nodes, we'll need a way 
to constrain the "this" pointer against the "next" pointer to create a 
circular reference, thus forcing the two pointers to point to the 
same region... perhaps something like this:
	
	struct ListNode(T) {
		ListNode*(next) this;
		T value;
		ListNode!(T)*(this) next;
	}

Not a very good syntax though.


 I think this example is a good case for attaching region constraints 
 directly to types instead of expressing them as conditional expressions 
 elsewhere, as in "if (region a <= region b)".

 
 I am thoroughly lost here, sorry. I can't even answer "this is so 
 wrong" or "this is pure genius". Probably it's somewhere in between 
 :o). At any rate, I suggest you develop a solid understanding of 
 Cyclone if you want to build something related to it.

I'll side with "pure genius", but I also consider myself biased. :-)


 I'm sorry about how you feel. Now we're in a conundrum of sorts. You 
 seem to strongly believe you can make some nice simplified regions 
 work, and make people like them. Taking that to a proof is hard. The 
 conundrum is, you are facing the prospect of putting work into it and 
 creating a system that, albeit correct, is not enticing.

 
 Currently, I'm just trying to convince you (and any other potential 
 silent listeners) that it can work.

 
 I understand I've been blunt throughout this post, but please side with 
 me for a minute. I'm doing so for the following reasons: (a) I'm 
 essentially writing this post in negative time; (b) I believe you 
 currently don't have an attack on the problem you're trying to solve; 
 (c) I believe it's worthwhile for you to develop an attack on the 
 problem, (d) I think "we" = "the D community" should seriously consider 
 safety and consequently things like region analysis.

I don't mind about (a) and I agree about (d).

I'll say that because of my lack of expertise with Cyclone I have some 
difficulty expressing my proposal as a comparison of what is different 
from Cyclone (it's difficult enough without it). You're the one asking 
for such a comparison and increasing the difficulty. I do not dislike 
the challenge, but I don't think you can take this as a proof that I 
don't understand well the problem I'm trying to solve when I may just 
be mixing some things about the approach taken by Cyclone.

Another thing not helping is that my original proposal has evolved a 
little since the first time I started the "full scope analysis 
proposal" thread. I also revamped the syntax I use to talk about the 
problem (and apparently I should do it again to avoid a conflict with 
function names). Hunting through previous posts for the details I leave 
out of the more recent ones doesn't help anyone understand what I'm 
talking about.

I'm thinking that maybe I should put everything in one document to have 
a coherent proposal that could evolve as a whole instead of one 
scattered across various posts between which the syntax I use and some 
concepts have evolved.


 You can now stop siding with me and side again with yourself. At this 
 point you can easily guess that all of the above was to prepare you for 
 an even blunter comment. Here goes.
 
 You say you want to convince people "it can work". But right now there 
 is no "it". You have no "it". Much less an "it" that can work.
 
 But there is of course good hope that an "it" could emerge, and I 
 encourage you to continue working towards that goal. It's just a lot 
 more work than it might appear.

I'm pretty sure I hold that "it" just now, or something very near it. 
It's just that it seems I haven't explained it well enough for you (and 
probably anyone) to understand correctly. I should probably write it 
all down in one coherent and more formal document rather than 
scattering all the details over many different posts as half-documented 
concept-name-changing written-too-fast examples.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 14 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Just to fix a little misunderstanding:

Michel Fortin wrote:
 On 2008-11-13 00:53:50 -0500, Andrei Alexandrescu 
 with me for a minute. I'm doing so for the following reasons: (a) I'm 
 essentially writing this post in negative time;


By this I meant I don't have time (t < 0), not that I was writing while 
being at a time when I had a negative outlook.

Andrei

Nov 14 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Sun, 02 Nov 2008 10:12:46 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 * Make all ref parameters scoped by default. It will be impossible  
 for a function to escape the address of a ref parameter without a cast.  
 I haven't proved it to myself yet, but I believe that if pointers are  
 not used and with the amendments below regarding arrays and delegates,  
 this makes things entirely safe. In Walter's words, "it buttons things  
 pretty tight".

Does this mean the whole shared/local/scope issue for classes is being  
sidestepped for now?

Nov 02 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Robert Jacques wrote:
 On Sun, 02 Nov 2008 10:12:46 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 * Make all ref parameters scoped by default. It will be impossible 
 for a function to escape the address of a ref parameter without a 
 cast. I haven't proved it to myself yet, but I believe that if 
 pointers are not used and with the amendments below regarding arrays 
 and delegates, this makes things entirely safe. In Walter's words, "it 
 buttons things pretty tight".

 
 Does this mean the whole shared/local/scope issue for classes is being 
 sidestepped for now?

What issue do you have in mind?

Andrei

Nov 02 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Mon, 03 Nov 2008 00:29:29 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:
 Robert Jacques wrote:
 On Sun, 02 Nov 2008 10:12:46 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 * Make all ref parameters scoped by default. It will be impossible  
 for a function to escape the address of a ref parameter without a  
 cast. I haven't proved it to myself yet, but I believe that if  
 pointers are not used and with the amendments below regarding arrays  
 and delegates, this makes things entirely safe. In Walter's words, "it  
 buttons things pretty tight".

  Does this mean the whole shared/local/scope issue for classes is being  
 sidestepped for now?

 What issue do you have in mind?

Right now, it's trivial for scope classes to escape due to automatic  
conversion to 'local'. And under the current shared/local scheme, one has  
to write multiple functions (one for each type combination).

Nov 03 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 I think everyone who thinks a scope decoration proposal is going to 1) 
 solve all scope escape issues and 2) be easy to use is dreaming :P

 I think that's a fair assessment. One suggestion I made to Walter is to 
 only allow and implement the scope storage class for delegates, which 
 simply means the callee will not squirrel away a pointer to delegate. 
 That would allow us to solve the closure issue and for now sleep some 
 more on the other issues.

 If scope delegates means trust the coder knows what he is doing (in the 
 beginning), I agree with that plan of attack.

 It looks like things will move that way. Bartosz, Walter and I talked a 
 lot yesterday about it - a lot of crazy things were on the table! The next 
 step is to make this a reference, which is highly related to escape 
 analysis. At the risk of anticipating a bit an unfinalized design, here's 
 what's on the table:

 * Continue an "anything goes" policy for *explicit* pointers, i.e. those 
 written explicitly by user code with stars and stuff.

 * Disallow pointers in SafeD.

Isn't this already the case?

BTW, slightly OT, I read Bartosz' article on digitalmars about SafeD.  This 
isn't an implemented language right?  Is the plan for D to become SafeD?  Or 
is there going to be a compiler switch?  Or something else maybe?  I've 
heard SafeD mentioned a lot on this NG, without ever really knowing how it 
exists (concrete or theory).

 * Make all ref parameters scoped by default. It will be impossible for 
 a function to escape the address of a ref parameter without a cast. I 
 haven't proved it to myself yet, but I believe that if pointers are not 
 used and with the amendments below regarding arrays and delegates, this 
 makes things entirely safe. In Walter's words, "it buttons things pretty 
 tight".

I think this sounds reasonable.  However, will there be a way to override 
this behavior?  For example, some modifier to signify that a reference is 
not scope?  The advantage to having the other be the default is that the 
scope keyword already exists.

Having to cast for every time I convert to a pointer will be unpleasant, but 
not horrific.  I'd prefer to state one time 'this is an unsafe reference', 
preferrably in the signature, and be able to use it like before.  The same 
semantics still apply as far as calling the function, it just says "the 
author of this function knows what he is doing" to the compiler.

You would also disallow this keyword usage in SafeD which would be easy to 
filter.

noscope would be a good keyword...

 * Make this a reference so that it obeys what references obey.

This is one place where I think whole-heartedly it should be done.  One 
rarely needs the address of this. In fact, I generally end up returning *this 
quite a bit in struct operators, so this change will be most welcome.

 * If people want to implement e.g. linked lists, they should do it with 
 classes. Implementing them with structs will require casts to obtain and 
 escape &this. That also means they'd be using pointers, so anything goes - 
 pointers are not restricted from escaping.

I implemented dcollections' node-based containers (tree, hash, linked list) 
as structs, because I wanted to control the allocation of them.  I agree 
with others that the de facto standard is going to be structs, since 
performance is paramount, and you have little need for OOP in the internal 
node structures.

Also, if the noscope (or equivalent keyword) is implemented as above, you 
can easily decorate your pointer-using functions:

struct LinkNode(T)
{
noscope
{
    LinkNode *find(T value);
    LinkNode *findReverse(T value);
    ...
}
}

 * There are two cases in which things escape without the user explicitly 
 using pointers: delegates and dynamic arrays initialized from 
 stack-allocated arrays.

 * For delegates require the scope keyword in the signature of the callee. 
 A scoped delegate cannot be stored, only called or passed down to another 
 function that in turn takes a scoped delegate. This makes scope delegates 
 entirely safe. Non-scoped delegates use dynamic allocation.

If noscope (or equivalent keyword) is used, can we make scope the default? 
I'd much rather have the default be the higher-performance, more commonly 
used option.

Also, when you say stored, do you mean stored anywhere, or stored anywhere 
but the stack?  Because there is no harm in storing a scope delegate in a 
local variable (as long as it is also scope).

 * We don't have an idea for dynamic arrays initialized from 
 stack-allocated arrays.

Hm... this is a tough one.  At the very least, you can disallow returning 
such arrays, as long as the compiler can prove the array's origins.  That 
should cover 90% of the issues.

The other 10% are ones that are passed into functions.  You might employ the 
same techniques as for delegates, but then we are stuck with the same 
problems as needed for full escape analysis.  Plus the need to return a 
slice of an array is much greater than the need to return a delegate.

You could also argue that an array contains a pointer, and morphing into a 
dynamic array is the same as taking the address of a stack local variable 
(which would require a cast).  But that means SafeD cannot use dynamic 
arrays to reference static arrays.  However, you can then argue that dynamic 
arrays allocated using new are OK for SafeD because you didn't take the 
address of a local stack variable.  My understanding is that in SafeD, 
safety trumps performance.

Note that a slice of a static array can be used as a rebindable reference, 
since the slice holds a rebindable pointer, so it is really an unsafe operation:

int[2] a;

int[] aref = a[0..1]; // reference to a[0]
aref = a[1..2]; // rebind to a[1]

-Steve

Nov 03 2008

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 "Andrei Alexandrescu" wrote
 * Disallow pointers in SafeD.

 
 Isn't this already the case?

At a point we wanted to allow pointers in restricted ways.

 BTW, slightly OT, I read Bartosz' article on digitalmars about SafeD.  This 
 isn't an implemented language right?  Is the plan for D to become SafeD?  Or 
 is there going to be a compiler switch?  Or something else maybe?  I've 
 heard SafeD mentioned a lot on this NG, without ever really knowing how it 
 exists (concrete or theory).

It's planned as a compiler switch and module option. Essentially SafeD 
is slated to be a safe, proper, well-defined subset of D. It was 
Bartosz's idea, and IMHO an important dimension of D's development. 
Walter is implementing module safety options like this:

module(safe) mymodule;

which means the module must always be compiled with safety on. On the 
contrary,

module(system) mymodule;

means the module is getting its hands greasy.

 * Make all ref parameters scoped by default. It will be impossible for 
 a function to escape the address of a ref parameter without a cast. I 
 haven't proved it to myself yet, but I believe that if pointers are not 
 used and with the amendments below regarding arrays and delegates, this 
 makes things entirely safe. In Walter's words, "it buttons things pretty 
 tight".

 
 I think this sounds reasonable.  However, will there be a way to override 
 this behavior?  For example, some modifier to signify that a reference is 
 not scope?  The advantage to having the other be the default is that the 
 scope keyword already exists.

Good point. I think escaping the address of a ref should be allowed via 
a cast.

 Having to cast for every time I convert to a pointer will be unpleasant, but 
 not horrific.  I'd prefer to state one time 'this is an unsafe reference', 
 preferably in the signature, and be able to use it like before.  The same 
 semantics still apply as far as calling the function, it just says "the 
 author of this function knows what he is doing" to the compiler.

Currently Walter plans to do that at module granularity.

 You would also disallow this keyword usage in SafeD which would be easy to 
 filter.
 
 noscope would be a good keyword...
 
 * Make this a reference so that it obeys what references obey.

 
 This is one place where I think whole-heartedly it should be done.  One 
 rarely needs the address of this. In fact, I generally end up returning *this 
 quite a bit in struct operators, so this change will be most welcome.
 
 * If people want to implement e.g. linked lists, they should do it with 
 classes. Implementing them with structs will require casts to obtain and 
 escape &this. That also means they'd be using pointers, so anything goes - 
 pointers are not restricted from escaping.

 
 I implemented dcollections' node-based containers (tree, hash, linked list) 
 as structs, because I wanted to control the allocation of them.  I agree 
 with others that the de facto standard is going to be structs, since 
 performance is paramount, and you have little need for OOP in the internal 
 node structures.
 
 Also, if the noscope (or equivalent keyword) is implemented as above, you 
 can easily decorate your pointer-using functions:
 
 struct LinkNode(T)
 {
 noscope
 {
     LinkNode *find(T value);
     LinkNode *findReverse(T value);
     ...
 }
 }

 * There are two cases in which things escape without the user explicitly 
 using pointers: delegates and dynamic arrays initialized from 
 stack-allocated arrays.

 * For delegates require the scope keyword in the signature of the callee. 
 A scoped delegate cannot be stored, only called or passed down to another 
 function that in turn takes a scoped delegate. This makes scope delegates 
 entirely safe. Non-scoped delegates use dynamic allocation.

 
 If noscope (or equivalent keyword) is used, can we make scope the default? 
 I'd much rather have the default be the higher-performance, more commonly 
 used option.

I think safety should be the default. People who care about efficiency 
will be willing to write a little bit more. I agree that this is 
annoying if that's the more frequent situation.

 Also, when you say stored, do you mean stored anywhere, or stored anywhere 
 but the stack?  Because there is no harm in storing a scope delegate in a 
 local variable (as long as it is also scope).

That could be allowed, but probably it's not really needed.

 * We don't have an idea for dynamic arrays initialized from 
 stack-allocated arrays.

 
 Hm... this is a tough one.  At the very least, you can disallow returning 
 such arrays, as long as the compiler can prove the array's origins.  That 
 should cover 90% of the issues.
 
 The other 10% are ones that are passed into functions.  You might employ the 
 same techniques as for delegates, but then we are stuck with the same 
 problems as needed for full escape analysis.  Plus the need to return a 
 slice of an array is much greater than the need to return a delegate.
 
 You could also argue that an array contains a pointer, and morphing into a 
 dynamic array is the same as taking the address of a stack local variable 
 (which would require a cast).  But that means SafeD cannot use dynamic 
 arrays to reference static arrays.  However, you can then argue that dynamic 
 arrays allocated using new are OK for SafeD because you didn't take the 
 address of a local stack variable.  My understanding is that in SafeD, 
 safety trumps performance.
 
 Note that a static array could be used for a rebindable reference, since it 
 has a rebindable pointer in it, so it is really an unsafe operation:
 
 int[2] a;
 
 int[] aref = a[0..1]; // reference to a[0]
 aref = a[1..2]; // rebind to a[1]

I agree with the above. The floor is open for more ideas.


Andrei

Nov 03 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Andrei Alexandrescu" wrote
 Steven Schveighoffer wrote:
 BTW, slightly OT, I read Bartosz' article on digitalmars about SafeD. 
 This isn't an implemented language right?  Is the plan for D to become 
 SafeD?  Or is there going to be a compiler switch?  Or something else 
 maybe?  I've heard SafeD mentioned a lot on this NG, without ever really 
 knowing how it exists (concrete or theory).

 It's planned as a compiler switch and module option. Essentially SafeD is 
 slated to be a safe, proper, well-defined subset of D. It was Bartosz's 
 idea, and IMHO an important dimension of D's development.

I personally probably won't use it, as I feel I have enough experience to 
avoid the problems that SafeD will prevent.  But it does sound like a very 
important version of the language.

 Walter is implementing module safety options like this:

 module(safe) mymodule;

 which means the module must always be compiled with safety on. On the 
 contrary,

 module(system) mymodule;

 means the module is getting its hands greasy.

Hm... that's kinda too high level.  I might have one function in a class 
that does things that are 'unsafe', but I don't want to have to mark my 
whole class as unsafe.

 * For delegates require the scope keyword in the signature of the 
 callee. A scoped delegate cannot be stored, only called or passed down 
 to another function that in turn takes a scoped delegate. This makes 
 scope delegates entirely safe. Non-scoped delegates use dynamic 
 allocation.

 If noscope (or equivalent keyword) is used, can we make scope the 
 default? I'd much rather have the default be the higher-performance, more 
 commonly used option.

 I think safety should be the default. People who care about efficiency 
 will be willing to write a little bit more. I agree that this is annoying 
 if that's the more frequent situation.

What I meant was, make the default behavior as if scope was marked on the 
delegate.  This doesn't make it unsafe (you said so yourself).  But it does 
line up with most code today, which doesn't do anything with a delegate but 
call it.  i.e. less decorations on current code that is already considered 
safe.

The most obvious usage is opApply.  Every opApply will have to have its 
delegate marked scope unless it's the default.

The only downside is that you then have to come up with a way to mark a 
delegate as noscope.
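
To make the opApply case concrete, here is a minimal sketch of an opApply whose delegate parameter carries the scope decoration. Range3 and sumAll are hypothetical names used only for illustration, and the placement of scope follows the D2 parameter storage class, which is an assumption about how the final design spells it.

```d
import std.stdio : writeln;

// Hypothetical container, for illustration only.
struct Range3
{
    // The delegate is marked scope: opApply only calls it and never
    // stores it anywhere that outlives this call.
    int opApply(scope int delegate(ref int) dg)
    {
        foreach (i; 0 .. 3)
        {
            int value = i;
            if (auto result = dg(value))
                return result; // non-zero: the loop body asked to stop
        }
        return 0;
    }
}

// Sums the elements; the foreach body captures the stack local `sum`.
int sumAll()
{
    int sum = 0;
    auto r = Range3();
    foreach (x; r)
        sum += x;
    return sum;
}

void main()
{
    writeln(sumAll()); // prints 3 (0 + 1 + 2)
}
```

If scope were the default, the decoration on dg would simply disappear and the rest would stay the same.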

 Also, when you say stored, do you mean stored anywhere, or stored 
 anywhere but the stack?  Because there is no harm in storing a scope 
 delegate in a local variable (as long as it is also scope).

 That could be allowed, but probably it's not really needed.

I can think of certain cases to need it, for example if you have two inner 
functions that have the same signature, and you want to decide which one to 
use at runtime, you might store the one to use in a local variable.
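
A minimal sketch of that case, assuming a scope local delegate is accepted. The function pick and its inner functions are made up for the example.

```d
import std.stdio : writeln;

// Hypothetical example: choose between two inner functions at runtime.
int pick(bool useAdd, int a, int b)
{
    int calls = 0; // stack local that both inner functions touch

    int add(int x, int y) { ++calls; return x + y; }
    int mul(int x, int y) { ++calls; return x * y; }

    // The delegate is stored, but only in a scope local of this frame,
    // so it cannot outlive the variables it refers to.
    scope int delegate(int, int) op = useAdd ? &add : &mul;

    int result = op(a, b);
    assert(calls == 1); // exactly one of the two inner functions ran
    return result;
}

void main()
{
    writeln(pick(true, 3, 4));  // prints 7
    writeln(pick(false, 3, 4)); // prints 12
}
```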

---------------------------

It seems to me like the way you are saying things will work is that you will 
have either safety checks or no safety checks at a module level.  I think 
that is a mistake.  Most of my code should be safe, and I'd prefer it to be 
safety checked.  The ideas that all of you have come up with in this post 
are very good, and should be easy to use for most code.  I especially like 
the requirement to cast in order to take the address of a reference.  But if 
all those checks go away when you mark your module as system, then this 
seems like it will either require me to split up my modules into safe and 
unsafe parts, or just not use safety checks where they could be used.  I'd 
prefer to be able to mark specific functions/parameters as unsafe or safe so 
I know exactly where I have disabled the safety checks.

And I'd prefer safety by default, not have to mark for safety.  As long as 
the safety can be easily verified and allows most usages.  I really like how 
pointers are simply considered unsafe, so all safety checks are off.  That 
draws a clear line of where it's difficult to verify safety without 
hindering ability.  The further check of compliance to SafeD can eliminate 
possible pointer usages that you miss.

-Steve

Nov 04 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 01 Nov 2008 12:00:10 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 "Michel Fortin" wrote
 The only way I can see to solve this is to do it at link time.  When you
 link, piece together the parts of the graph that were incomplete, and see
 if they all work.  It would be a very radical change, and might not even
 work with the current linkers.  Especially if you want to do shared
 libraries, where the linker is built into the OS.

 I think you're dreaming... not that it's a bad thing to have ambition, but
 that's probably not even possible.

 Sure it is ;)  You have to write a special linker.

 I think everyone who thinks a scope decoration proposal is going to 1) solve
 all scope escape issues and 2) be easy to use is dreaming :P

Various research languages have shown both 1 and 2 are possible.

 I don't think it's bad to force interfaces to be well documented, and
 documented in a format that the compiler can understand to find errors
 like this.

 I think this concept is going to be really hard for a person to decipher,
 and really hard to get right.

 It takes some thinking to get the prototype right at first. But it takes
 less caution calling the function later with local variables since the
 compiler will either issue an error or automatically fix the issue by
 allocating on the heap when an argument requires a greater scope.

 I hope to avoid this last situation.  Having the compiler make decisions for
 me, especially when heap allocation occurs, is bad.

How so? Please explain why it's bad (an opinion by itself isn't an
argument).

Nov 02 2008

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

"Robert Jacques" wrote
 On Sat, 01 Nov 2008 12:00:10 -0400, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 "Michel Fortin" wrote
 The only way I can see to solve this is to do it at link time.  When you
 link, piece together the parts of the graph that were incomplete, and see
 if they all work.  It would be a very radical change, and might not even
 work with the current linkers.  Especially if you want to do shared
 libraries, where the linker is built into the OS.

 I think you're dreaming... not that it's a bad thing to have ambition, but
 that's probably not even possible.

 Sure it is ;)  You have to write a special linker.

 I think everyone who thinks a scope decoration proposal is going to 1) solve
 all scope escape issues and 2) be easy to use is dreaming :P

 Various research languages have shown both 1 and 2 are possible.

I think 1 can be possibly done.  2 is a matter of subjectivity, and so far, 
I haven't seen an example of it.

But I also don't want D to become a purely academic language.  I want it to 
keep the system-level performance and usability that drew me to it in the 
first place.

 I don't think it's bad to force interfaces to be well documented, and
 documented in a format that the compiler can understand to find errors
 like this.

 I think this concept is going to be really hard for a person to decipher,
 and really hard to get right.

 It takes some thinking to get the prototype right at first. But it takes
 less caution calling the function later with local variables since the
 compiler will either issue an error or automatically fix the issue by
 allocating on the heap when an argument requires a greater scope.

 I hope to avoid this last situation.  Having the compiler make decisions for
 me, especially when heap allocation occurs, is bad.

 How so? Please explain why it's bad (an opinion by itself isn't an
 argument).

Allocating on the heap involves locking a global mutex (as long as the heap 
is global), searching for a free memory space, possibly running a garbage 
collection cycle, and finally possibly allocating more memory from the OS.

All of these are very expensive compared to adjusting the stack pointer.

For instance, I wrote a 'chunk allocator' which uses D's allocator to 
allocate memory in chunks instead of going to the GC for each piece in 
dcollections' implementation.  Doing this achieved at least a 2x speedup 
because I was calling on the GC less often.  The author of Tango's new 
container implementation wrote a similar allocator that's even faster than 
that because it doesn't use the GC for any allocation (of course, you cannot 
use it to allocate items which have references, because the GC doesn't look 
at that memory).
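
The chunking idea can be sketched roughly as follows. This is not the dcollections code; ChunkAllocator, Node, the chunk size of 64 and the gcCalls counter are all hypothetical and exist only to show the GC being asked for memory once per chunk instead of once per node. The sketch never frees anything and leaves reclamation to the GC.

```d
import std.stdio : writeln;

// Hypothetical node type, for illustration only.
struct Node
{
    int value;
    Node* next;
}

// Hands out elements from GC-allocated blocks of chunkSize elements.
struct ChunkAllocator(T, size_t chunkSize = 64)
{
    private T[] chunk;   // current block obtained from the GC
    private size_t used; // slots already handed out from that block
    size_t gcCalls;      // number of times the GC was asked for memory

    T* allocate()
    {
        if (used == chunk.length)
        {
            chunk = new T[chunkSize]; // one GC allocation serves chunkSize nodes
            used = 0;
            ++gcCalls;
        }
        return &chunk[used++];
    }
}

void main()
{
    ChunkAllocator!Node alloc;
    Node* head = null;
    foreach (i; 0 .. 200)
    {
        Node* n = alloc.allocate();
        n.value = i;
        n.next = head;
        head = n;
    }
    writeln(alloc.gcCalls); // prints 4: 200 nodes fit in four chunks of 64
}
```

Because the chunks come from the GC, the nodes may still hold references that the collector will scan, which is the trade-off against the faster non-GC variant mentioned above.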

In Tango, many operations rely on using stack allocation for buffers and 
temporary classes.  If the compiler decides I don't know what I'm doing and 
helpfully allocates those on the heap for my protection, I just lost all the 
performance that I purposely built the library to have.  This is one of the 
main arguments I hear from the other Tango devs about moving to D2, the 
automatic dynamic closure.
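
For readers who have not run into it, here is a small sketch of the difference being discussed, written against D2 semantics as I understand them. makeCounter and callTwice are hypothetical names. In the first function the delegate escapes, so the captured variable has to live on the heap; in the second the callee promises through scope not to keep the delegate, so the caller's frame can stay on the stack.

```d
import std.stdio : writeln;

// The returned delegate outlives makeCounter(), so `count` cannot live
// in the stack frame; D2 places it in a heap-allocated closure.
int delegate() makeCounter()
{
    int count = 0;
    int next() { return ++count; }
    return &next;
}

// Hypothetical helper: the scope parameter says the delegate is only
// called here, never stored.
int callTwice(scope int delegate() dg)
{
    dg();
    return dg();
}

void main()
{
    auto counter = makeCounter();
    counter();
    writeln(counter()); // prints 2: `count` survived makeCounter returning

    int local = 0;
    int bump() { return ++local; }
    writeln(callTwice(&bump)); // prints 2
}
```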

I think many people are not aware of how important it is to avoid heap 
allocation when possible.  It is one of the central goals that makes Tango 
so much faster than other libraries.

-Steve

Nov 03 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-11-03 14:47:25 -0500, "Steven Schveighoffer" 
<schveiguy yahoo.com> said:

 I hope to avoid this last situation.  Having the compiler make decisions
 for me, especially when heap allocation occurs, is bad.

 
 How so? Please explain why it's bad (an opinion by itself isn't an argument).

 
 Allocating on the heap involves locking a global mutex (as long as the heap
 is global), searching for a free memory space, possibly running a garbage
 collection cycle, and finally possibly allocating more memory from the OS.
 
 All of these are very expensive compared to adjusting the stack pointer.

I won't dispute this. I'll note that the upcoming "shared" keyword may 
help regarding not locking a global mutex for unshared variables, but 
even without the mutex the operation still is expensive.


 For instance, I wrote a 'chunk allocator' which uses D's allocator to
 allocate memory in chunks instead of going to the GC for each piece in
 dcollections' implementation.  Doing this achieved at least a 2x speedup
 because I was calling on the GC less often.  The author of Tango's new
 container implementation wrote a similar allocator that's even faster than
 that because it doesn't use the GC for any allocation (of course, you cannot
 use it to allocate items which have references, because the GC doesn't look
 at that memory).

Nothing of the sort should be prevented by a scoping system. If it is, 
then I'd consider the system a failure.


 In Tango, many operations rely on using stack allocation for buffers and
 temporary classes.  If the compiler decides I don't know what I'm doing and
 helpfully allocates those on the heap for my protection, I just lost all the
 performance that I purposely built the library to have.  This is one of the
 main arguments I hear from the other Tango devs about moving to D2, the
 automatic dynamic closure.

Then we must make sure the compiler doesn't heap allocate when it 
doesn't absolutely need to. And, *in addition*, when the programmer 
really needs to be sure that a variable is not heap-allocated, marking 
a variable "scope" would do the trick.


 I think many people are not aware of how important it is to avoid heap
 allocation when possible.  It is one of the central goals that makes Tango
 so much faster than other libraries.

I agree with your first assertion (and am not familiar enough with 
Tango to say anything about the second), and this is exactly why I'm in 
favor of the compiler deciding what to heap-allocate. People are not 
aware enough of how important it is to avoid heap allocation, so I 
expect that if the compiler can be made to know about scopes, it can 
avoid heap allocation where many users wouldn't bother (especially in a 
garbage-collected language where you can heap-allocate without 
thinking). That would result in faster programs with fewer bugs, all 
without having to think about the technical details.

Note that I may be wrong with this, but there's no way to be sure 
without trying. Anyway, once we have a proper scoping system, it'll be 
easy to try and decide between auto-allocation and simply enforcing 
constraints by emitting errors.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Nov 04 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2008-10-28 23:52:04 -0400, "Robert Jacques" <sandford jhu.edu> said:

 I've run across some academic work on ownership types which seems  
 relevant  to this discussion on share/local/scope/noscope.

 I haven't read the paper yet, but the overview seems to go in the same  
 direction as I was thinking.

 Basically, all the scope variables you can get are guaranteed to be in  
 the current or in some ancestor scope. To allow a reference to a scope  
 variable, or a scope function, to be put inside a member of a struct or  
 class, you only need to prove that the struct or class lifetime is  
 smaller or equal to the one of the reference to your scope variable. If  
 you could tell to the compiler the scope relationship of the various  
 arguments, then you'd have pretty good scope analysis.

 For instance, with this syntax, we could define i to be available during  
 the whole lifetime of o:

 	void foo(scope MyObject o, scope(o) int* i)
 	{
 		o.i = i;
 	}

What does the scope part of 'scope MyObject o' mean? (i.e. is this D's  
current scope or something else?)
What does 'scope(o)' explicitly mean? I'm going to assume scope(o) means  
the scope of o.

 So you could do:

 	void bar()
 	{
 		scope int i;
 		scope MyObject o = new MyObject;
 		foo(o, &i);
 	}

 And the compiler would let it pass because foo guarantees not to keep  
 references to i outside of o's scope, and o's scope is the same as i's.

 Or you could do:

 	void test1()
 	{
 		int i;
 		test2(&i);
 	}

 	void test2(scope int* i)
 	{
 		scope o = new MyObject;
 		foo(o, &i);

Error: &i is of type int** while foo takes an int*. Did you mean foo(o, i)?

 	}

 Again, the compiler can statically check that test2 won't keep a  
 reference to i outside of the caller's scope (test1) because o scope is  
 limited to test2.

The way I read your example, no useful escape analysis can be done by the  
compiler, and it works mainly because i is a pointer to a value type.

 And if you try the reverse:

 	void test1()
 	{
 		scope o = new MyObject;
 		test2(o);
 	}

 	void test2(scope MyObject o)
 	{
 		int i;
 		foo(o, &i);
 	}

 Then the compiler could determine automatically that i needs to escape  
 test2's scope and allocate the variable on the heap to make its lifetime  
 as long as the object's scope (as it does currently with nested  
 functions) [see my reserves to this in post scriptum]. This could be  
 avoided by explicitly binding i to the current scope, in which case the  
 compiler could issue a scope error:

The way I read this is o is of type scope MyObject, i is of type scope int  
and therefore foo(o,&i) is valid and an escape happens.

Oct 29 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-10-29 15:10:00 -0400, "Robert Jacques" <sandford jhu.edu> said:

 On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin  
 <michel.fortin michelf.com> wrote:
 
 Basically, all the scope variables you can get are guaranteed to be in  
 the current or in some ancestor scope. To allow a reference to a scope  
 variable, or a scope function, to be put inside a member of a struct or 
  class, you only need to prove that the struct or class lifetime is  
 smaller or equal to the one of the reference to your scope variable. If 
  you could tell to the compiler the scope relationship of the various  
 arguments, then you'd have pretty good scope analysis.
 
 For instance, with this syntax, we could define i to be available 
 during  the whole lifetime of o:
 
 	void foo(scope MyObject o, scope(o) int* i)
 	{
 		o.i = i;
 	}

 
 What does the scope part of 'scope MyObject o' mean? (i.e. is this D's  
 current scope or something else?)

Ok, I should have defined that better. It means that o is bound to the 
caller's scope (possibly on the stack). Scopes are created for each 
function and each {}-delimited block in it; basically it's the stack 
of the current thread. Once you exit a scope, its variables cease to 
exist and we must ensure there are no more references to them.

In this case, "scope MyObject o" means that we're receiving a MyObject 
reference which could be pointing to somewhere down in the stack *or* 
the heap. We have to consider the most restrictive constraint however, 
so let's say it's in the stack. The rule is that you can't place a 
reference to a scoped variable anywhere below its scope in the stack, 
making sure that you can't keep a reference to a variable which no 
longer exists once the top scope has disappeared.

Scope stack (call stack with the global scope at the bottom):
 1. foo ( scope MyObject o = function1.o ) { }
 2. function1 () { scope MyObject o, int i }
 3. main () { }
 ...
 n. global scope

In practical terms, "scope MyObject o" means that we can't put a 
reference to the object anywhere that lives beyond the current function 
call... except in a scope return value, but I haven't entered that yet.

 What does 'scope(o)' explicitly mean? I'm going to assume scope(o) 
 means  the scope of o.

That's it... mostly. scope(o) is the scope of o, or any scope below o. 
Take it as any scope valid as long as o exists. If o was not scope, 
scope(o) would be noscope.


 So you could do:
 
 	void bar()
 	{
 		scope int i;
 		scope MyObject o = new MyObject;
 		foo(o, &i);
 	}
 
 And the compiler would let it pass because foo guarantees not to keep  
 references to i outside of o's scope, and o's scope is the same as i's.
 
 Or you could do:
 
 	void test1()
 	{
 		int i;
 		test2(&i);
 	}
 
 	void test2(scope int* i)
 	{
 		scope o = new MyObject;
 		foo(o, &i);

 Error: &i is of type int** while foo takes an int*. Did you mean foo(o, i)?

Oops. Indeed, I meant foo(o, i).

 	}
 
 Again, the compiler can statically check that test2 won't keep a  
 reference to i outside of the caller's scope (test1) because o scope is 
  limited to test2.

 
 The way I read your example, no useful escape analysis can be done by
 the compiler, and it works mainly because i is a pointer to a value
 type.

It's not escape analysis. It's scoping constraints enforced by making sure 
that every function declares what may escape and what may not.

If this was a pure value type passed by copy, scope would be 
meaningless indeed as there would be no reference that could escape.


 And if you try the reverse:
 
 	void test1()
 	{
 		scope o = new MyObject;
 		test2(o);
 	}
 
 	void test2(scope MyObject o)
 	{
 		int i;
 		foo(o, &i);
 	}
 
 Then the compiler could determine automatically that i needs to escape  
 test2's scope and allocate the variable on the heap to make its 
 lifetime  as long as the object's scope (as it does currently with 
 nested  functions) [see my reserves to this in post scriptum]. This 
 could be avoided by explicitly binding i to the current scope, in which 
 case the  compiler could issue a scope error:

 
 The way I read this is o is of type scope MyObject, i is of type scope 
 int  and therefore foo(o,&i) is valid and an escape happens.

That's my point. The compiler can detect that an escape may happen just by 
looking at the function prototype for foo. The prototype tells us that 
foo needs i to be at the same or a lower scope than o, something we 
don't have here.

The compiler can then decide to allocate i dynamically on the heap to 
make sure it exists for at least the scope of o; or it could be decided 
to just make that illegal. I prefer automatic heap allocation, as it 
means we can get rid of the decision to statically or dynamically 
allocate variables: the compiler can decide based on the function 
prototypes whichever is best. For cases you really mean a variable to 
be on the stack, you can use scope, as in:

	scope int i;

and the compiler would just issue an error if you attempt to give a 
reference to i to a function that wants to use it in a lower scope. 
Otherwise, the compiler would be free to decide whichever scope to use 
between local or heap-allocated.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Oct 30 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Thu, 30 Oct 2008 08:14:31 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 And if you try the reverse:
  	void test1()
 	{
 		scope o = new MyObject;
 		test2(o);
 	}
  	void test2(scope MyObject o)
 	{
 		int i;
 		foo(o, &i);
 	}
  Then the compiler could determine automatically that i needs to  
 escape  test2's scope and allocate the variable on the heap to make  
 its lifetime  as long as the object's scope (as it does currently with  
 nested  functions) [see my reserves to this in post scriptum]. This  
 could be avoided by explicitly binding i to the current scope, in  
 which case the  compiler could issue a scope error:

  The way I read this is o is of type scope MyObject, i is of type scope  
 int  and therefore foo(o,&i) is valid and an escape happens.

 That's my point. The compiler can detect an escape may happen just by  
 looking at the function prototype for foo. The prototype tells us that  
 foo needs i to be at the same or a lower scope than o, something we  
 don't have here.

 The compiler can then decide to allocate i dynamically on the heap to  
 make sure it exists for at least the scope of o; or it could be decided  
 to just make that illegal. I prefer automatic heap allocation, as it  
 means we can get rid of the decision to statically or dynamically  
 allocate variables: the compiler can decide based on the function  
 prototypes whichever is best. For cases you really mean a variable to be  
 on the stack, you can use scope, as in:

 	scope int i;

 and the compiler would just issue an error if you attempt to give a  
 reference to i to a function that wants to use it in a lower scope.  
 Otherwise, the compiler would be free to decide whichever scope to use  
 between local or heap-allocated.

Just to clarify:
   	void test2(scope MyObject o)	// the scope of o is a parent of test2
  	{
  		int i;			// the scope of i is test2
  		foo(o, &i);		// foo(o,&i) requires &i to have o's scope or a parent of
  				// o's scope, so i must be heap (the root parent) allocated.
  	}
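
For comparison, here is a minimal sketch of what that heap allocation looks like when written by hand in D as it exists today, without any of the proposed scope annotations. The field name p and the body of foo are assumptions made for illustration; the thread never shows how foo uses its arguments.

```d
// Hypothetical class: foo keeps the pointer in the object, so the
// pointee has to live at least as long as the object does.
class MyObject
{
    int* p;
}

// Assumed body for foo: it lets i escape into o.
void foo(MyObject o, int* i)
{
    o.p = i;
}

void test2(MyObject o)
{
    // Allocating i on the GC heap by hand. Under the proposal the
    // compiler would pick this allocation automatically after reading
    // foo's prototype.
    int* i = new int;
    *i = 42;
    foo(o, i);
}
```

After test2 returns, o.p still points at valid memory, which would not be the case had i been a plain stack variable whose address was passed to foo.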

A problem I see is that once shared/local are introduced, you have  
multiple heaps where i should be allocated, depending on the runtime type  
of o. How would this be handled in this scheme?

Oct 30 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-10-30 09:04:10 -0400, "Robert Jacques" <sandford jhu.edu> said:

 Just to clarify:
    	void test2(scope MyObject o)	// the scope of o is a parent of test2
   	{
   		int i;			// the scope of i is test2
   		foo(o, &i);		// foo(o,&i) requires &i to have o's scope or a parent
   					// of o's scope, so i must be heap (the root parent) allocated.
   	}
 
 A problem I see is that once shared/local are introduced, you have  
 multiple heaps where i should be allocated, depending on the runtime 
 type  of o. How would this be handled in this scheme?

Well, it all depends on whether foo requires its second argument, i, to
be shared or not. If foo's declaration were like this:

	void foo(scope MyObject o, scope(o) shared int* i);

then you'd need to use "shared int i" in test2 to avoid an error at the 
call site.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Oct 30 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Thu, 30 Oct 2008 21:01:27 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2008-10-30 09:04:10 -0400, "Robert Jacques" <sandford jhu.edu> said:

 Just to clarify:
    	void test2(scope MyObject o)	// the scope of o is a parent of test2
   	{
   		int i;			// the scope of i is test2
   		foo(o, &i);		// foo(o,&i) requires &i to have o's scope or a parent
   					// of o's scope, so i must be heap (the root parent) allocated.
   	}
  A problem I see is that once shared/local are introduced, you have   
 multiple heaps where i should be allocated, depending on the runtime  
 type  of o. How would this be handled in this scheme?

 Well, it all depends on whether foo requires its second argument, i, to
 be shared or not. If foo's declaration were like this:

 	void foo(scope MyObject o, scope(o) shared int* i);

 then you'd need to use "shared int i" in test2 to avoid an error at the  
 call site.

Actually, what I meant was that o may be local or shared. However,
assuming thin-locks, o may be tested cheaply at runtime for shared/local
and the right allocation done.

Oct 31 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 P.P.S.: This syntax doesn't fit very well with the current  
 scope(success/failure/exit) feature.

How about o.scope instead of scope(o)? Also, this would allow  
contract-like syntax:
void foo (myObject o, int* i)
  if (o.scope <= i.scope) {
...
}

Oct 30 2008

Michel Fortin <michel.fortin michelf.com> writes:

On 2008-10-30 14:07:42 -0400, "Robert Jacques" <sandford jhu.edu> said:

 On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin  
 <michel.fortin michelf.com> wrote:
 
 P.P.S.: This syntax doesn't fit very well with the current  
 scope(success/failure/exit) feature.

 How about o.scope instead of scope(o)? Also, this would allow  
 contract-like syntax:
 void foo (myObject o, int* i)
   if (o.scope <= i.scope) {
 ...
 }

Hum, but can that syntax guarantee that a reference to o or i won't
escape the current function's scope, like

	void foo(scope Object o);

?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Oct 30 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Thu, 30 Oct 2008 21:01:28 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2008-10-30 14:07:42 -0400, "Robert Jacques" <sandford jhu.edu> said:

 On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin   
 <michel.fortin michelf.com> wrote:

 P.P.S.: This syntax doesn't fit very well with the current   
 scope(success/failure/exit) feature.

 How about o.scope instead of scope(o)? Also, this would allow   
 contract-like syntax:
 void foo (myObject o, int* i)
   if (o.scope <= i.scope) {
 ...
 }

 Hum, but can that syntax guarantee that a reference to o or i won't
 escape the current function's scope, like

 	void foo(scope Object o);

 ?

No, the syntax was meant to address the more complex problem of specifying
the concept of scope(o). It also adds some flexibility for other
relationships. As for "do not escape", I'm assuming a no_escape type (it
would behave as a transitive version of final). I dislike reusing the
scope keyword for this, as

void foo(scope Object a) {
     scope Object b = new Object();
     scope Object c = b; // Okay
     scope Object d = a; // Error
}

Oct 31 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Thu, 30 Oct 2008 21:01:28 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2008-10-30 14:07:42 -0400, "Robert Jacques" <sandford jhu.edu> said:

 On Wed, 29 Oct 2008 07:28:55 -0400, Michel Fortin   
 <michel.fortin michelf.com> wrote:

 P.P.S.: This syntax doesn't fit very well with the current   
 scope(success/failure/exit) feature.

 How about o.scope instead of scope(o)? Also, this would allow   
 contract-like syntax:
 void foo (myObject o, int* i)
   if (o.scope <= i.scope) {
 ...
 }

 Hum, but can that syntax guarantee that a reference to o or i won't
 escape the current function's scope, like

 	void foo(scope Object o);

 ?

Another option is for the default to be escape. i.e. a contract is  
required for an escape to happen

Object o;
void foo(Object a, Object b)
    if(b.scope <= o.scope) {
       o = b; // Okay
       o = a; // Error
}

Oct 31 2008

"Robert Jacques" <sandford jhu.edu> writes:

On Fri, 31 Oct 2008 11:02:31 -0400, Robert Jacques <sandford jhu.edu>  
wrote:
 Another option is for the default to be escape.

Correction: default to be _no_ escape.

Oct 31 2008

bearophile <bearophileHUGS lycos.com> writes:

I think C++ designers are fully mad, this shows how to use C++ lambdas:

http://blogs.msdn.com/vcblog/archive/2008/10/28/lambdas-auto-and-static-assert-c-0x-features-in-vc10-part-1.aspx

If D2 lambdas (and closures) become *half* as complex as that I'm going to
stop using D on the spot :-)

Bye,
bearophile

Oct 29 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Wed, 29 Oct 2008 09:46:22 -0400, bearophile wrote:

 I think C++ designers are fully mad, this shows how to use C++
 lambdas: 
 
 http://blogs.msdn.com/vcblog/archive/2008/10/28/lambdas-auto-and-static-assert-c-0x-features-in-vc10-part-1.aspx
 
 If D2 lambdas (and closures) become *half* as complex as that I'm going
 to stop using D on the spot :-)

Well, they're somewhat limited, and a bit manual, and actually just
syntactic sugar, but otherwise they're quite close to D's stack
delegates, even in syntax.  I couldn't see what scared you that much.

Oct 29 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Thu, Oct 30, 2008 at 12:49 AM, Sergey Gromov <snake.scaly gmail.com> wrote:
 Wed, 29 Oct 2008 09:46:22 -0400, bearophile wrote:

 I think C++ designers are fully mad, this shows how to use C++
 lambdas:

 http://blogs.msdn.com/vcblog/archive/2008/10/28/lambdas-auto-and-static-assert-c-0x-features-in-vc10-part-1.aspx

 If D2 lambdas (and closures) become *half* as complex as that I'm going
 to stop using D on the spot :-)

 Well, they're somewhat limited, and a bit manual, and actually just
 syntactic sugar, but otherwise they're quite close to D's stack
 delegates, even in syntax.  I couldn't see what scared you that much.

I think it's mostly the capture mode [] stuff that's a bit ugly.
I think this is a legal lambda:  [=,this,&x,&y](int& r) mutable { ... }

Anyway, I'm impressed that MS is getting these things into the
compiler so quickly.  I had expected to see another C99 foot-dragging
extravaganza.  Guess it just goes to show how little they care about
C.

--bb

Oct 29 2008

Sergey Gromov <snake.scaly gmail.com> writes:

Thu, 30 Oct 2008 04:06:52 +0900, Bill Baxter wrote:

 On Thu, Oct 30, 2008 at 12:49 AM, Sergey Gromov <snake.scaly gmail.com> wrote:
 Wed, 29 Oct 2008 09:46:22 -0400, bearophile wrote:

 I think C++ designers are fully mad, this shows how to use C++
 lambdas:

 http://blogs.msdn.com/vcblog/archive/2008/10/28/lambdas-auto-and-static-assert-c-0x-features-in-vc10-part-1.aspx


 
 Anyway, I'm impressed that MS is getting these things into the
 compiler so quickly.  I had expected to see another C99 foot-dragging
 extravaganza.  Guess it just goes to show how little they care about
 C.

The discussed features are really a significant improvement for C++
productivity.  I think a lot of C++ code is still being written by MS so
improving productivity here should be a priority for them.

I wonder if there are any chances for typeof() in C++.

Oct 29 2008

"Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:

On Wed, Oct 29, 2008 at 5:30 PM, Sergey Gromov <snake.scaly gmail.com> wrote:
 I wonder if there are any chances for typeof() in C++.

It's called decltype().

Oct 29 2008

Robert Fraser <fraserofthenight gmail.com> writes:

Bill Baxter wrote:
 Anyway, I'm impressed that MS is getting these things into the
 compiler so quickly.  I had expected to see another C99 foot-dragging
 extravaganza.  Guess it just goes to show how little they care about
 C.

C++ is a .NET language now ;-P

Oct 29 2008

Chad J <gamerchad __spam.is.bad__gmail.com> writes:

I wonder if it would be easy enough to allocate closures lazily at runtime.

So the compiler scans executable code, and any time there is an 
assignment (passing as function args doesn't count, returning does) 
involving delegates, it inserts code that will do the following:
- Check whether the delegate being assigned from is on the stack or the 
heap.
- If it's on the stack, make a copy on the heap, and use that.

Scope (partial) closures never get assigned to other things, so no extra 
code will ever be generated or executed for them.

I worry that this might be more complicated with multithreading though.

Also, I'm not sure how to make sure all calls to the closure access the
same context, and that the function that contains the context also knows
when its context has moved off of the stack and into the heap.  I'm not
sure of this because I'm also not sure how that's handled anyway.

Also notable is that the heuristic I suggest is just that; it is not 
necessarily optimal or even strictly lazy.  There are cases where 
delegates could be passed around by assignment yet never escape their 
scope.  Maybe it is easy enough to add that as another condition for the 
runtime check: is this delegate being assigned to some place in the heap 
or too far up (down?) in the stack?  Just an optimization though, and 
probably one not nearly as important.

OK so all of this doesn't help much with the more general problem of 
/static/ escape analysis.  Oh well.
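
The copy-on-assignment check described above can be modelled in ordinary D to show where the shared-context worry comes from. This is only a sketch under stated assumptions: Context, ModelDelegate, the onHeap flag and escapeByAssignment are all hypothetical names, and a real implementation would have to inspect the context pointer itself rather than carry a flag around.

```d
// Hypothetical stand-in for a function's stack frame.
struct Context
{
    int counter;
}

// Hypothetical model of a delegate: a context pointer plus a flag
// recording where that context currently lives.
struct ModelDelegate
{
    Context* ctx;
    bool onHeap;

    int call()
    {
        return ++ctx.counter;
    }
}

// The check the compiler would insert at every assignment or return:
// copy a stack context to the heap, leave a heap context alone.
ModelDelegate escapeByAssignment(ModelDelegate src)
{
    if (!src.onHeap)
    {
        auto copy = new Context; // GC-allocated copy of the frame
        *copy = *src.ctx;
        return ModelDelegate(copy, true);
    }
    return src;
}
```

In this model, calling the heap copy increments the counter in the copied context only; the original stack context is left unchanged. That divergence is exactly the difficulty raised above about keeping every user of the context, including the enclosing function, looking at the same storage.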

Oct 30 2008

Christopher Wright <dhasenan gmail.com> writes:

Walter Bright wrote:
 void bar(noscope int* p);    // p escapes
 void bar(scope int* p);      // p does not escape
 void bar(int* p);            // what should be the default?
 
 What should be the default? The functional programmer would probably 
 choose scope as the default, and the OOP programmer noscope.
 
 (The issue with delegates is we need the dynamic closure only if the 
 delegate 'escapes'.)

I appreciate OOP. I also appreciate it when it takes no significant 
effort to write safe code. I also appreciate it when I don't have to 
convince the compiler that what I'm doing is safe when I know it's safe.

In the case of pointers, I don't use them, most of the time. (I'm 
working on a variable-key-length cache-oblivious lookahead array right 
now, and that requires pointers for efficiency, but this is probably the 
first time I've used pointers in D.)

In the case of delegates, I use them. I've been confused and upset by 
the lack of closures in D1. I think a lot of new programmers will expect 
closures and get confused by having two different ways of declaring them.

For my code, I won't mind using whatever new syntax for closures, even 
if it's slightly verbose. For new programmers, I'd recommend using 
closures by default, since they're safer. Once they're more comfortable 
with the language, you can introduce the idea of allocating delegate 
context on the stack as an occasionally unsafe optimization.
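
The two kinds of delegate being contrasted here can be sketched in D2 as follows. The function names makeCounter, repeat and sumBelow are made up for illustration, and the sketch assumes the D2 behaviour in which an escaping delegate gets a heap-allocated context while a scope delegate parameter tells the caller that no closure is needed.

```d
// Escaping delegate: count must outlive the call to makeCounter,
// so its context is allocated on the heap (a closure).
int delegate() makeCounter()
{
    int count = 0;
    return delegate int() { return ++count; };
}

// Non-escaping delegate: 'scope' promises that dg is not stored
// anywhere, so the caller's context can stay on the stack.
void repeat(int n, scope void delegate(int) dg)
{
    foreach (i; 0 .. n)
        dg(i);
}

int sumBelow(int n)
{
    int total = 0;
    // The literal only reads and writes total while repeat is running.
    repeat(n, (int i) { total += i; });
    return total;
}
```

Each call on the delegate returned by makeCounter sees the same count, even though makeCounter itself has already returned; sumBelow gets the same convenience without that delegate ever leaving the function's own stack frame.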

Nov 01 2008
