digitalmars.D - ref parameters: there is no escape

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter and I have had a long discussion and we thought we'd bring an 
idea for community review.

We believe it would be useful for safety purposes to disallow escaping 
addresses of ref parameters. Consider:

class C {
   int * p;
   this(ref int x) {
     p = &x; // escapes the address of a ref parameter
   }
}

Such code is accepted today. We believe it is error-prone and dangerous, 
particularly because the caller has no syntactic cue that the address of 
the parameter is passed into the function (in this case constructor). 
Worse, such a function cannot be characterized as @safe.

So we want to make the above an error. The workaround is obvious - just 
take int* as a parameter instead of ref int. What a function can do with 
a ref parameter in general is:

* use it directly just like a local;

* pass it down to other functions (which may take it by value or reference);

* pass its address down to pure functions because a pure function cannot 
escape the address anyway (cool insight by Walter);

* take its address as long as the address doesn't outlive the frame of 
the function.

The third bullet is not easy to implement as it requires flow analysis, 
but we may start with a conservative version first. Probably there won't 
be a lot of broken code anyway.
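
A minimal sketch of the pointer-taking workaround mentioned above, assuming current compiler behavior in non-@safe code. The helper name `demo` is hypothetical and exists only to exercise the class. Once the constructor takes `int*`, the caller has to write `&` at the call site, which is the syntactic cue the `ref` version lacks.

```d
class C {
    int* p;

    // Takes a pointer instead of `ref int`, so storing it is no surprise.
    this(int* x) {
        p = x;
    }
}

// Hypothetical helper that exercises the class.
int demo() {
    int value = 42;
    auto c = new C(&value); // the caller visibly hands over an address
    *c.p = 7;               // writes through the stored pointer
    return value;           // value's frame is still alive here
}
```

The stored pointer is still only valid while `value` lives; the change makes the hand-off visible, it does not extend the lifetime.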

Please chime in with any comments you might have!


Thanks,

Andrei
Aug 14 2011
next sibling parent Jakob Ovrum <jakobovrum gmail.com> writes:
I like the idea, but don't we already have (currently non-enforced) scope parameters for this?

Of course it would be nice to have "ref" also mean scope, like "in" meaning "scope const", but it would be nice to have scope working properly. Currently, this code compiles fine:

---------------------
const(char)[] test;

void foo(in char[] s)
{
    test = s;
}

void main()
{
    foo("bar");
}
----------------------

This is a big problem when writing a library (accepting delegate callbacks and such) and you're not sure whether the user wants to just "read" a variable or hold onto it. Whether or not to make a copy should be the user's choice, not the library's.
Aug 14 2011
prev sibling next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 08/14/2011 04:20 PM, Andrei Alexandrescu wrote:
 Walter and I have had a long discussion and we thought we'd bring an
 idea for community review.

 We believe it would be useful for safety purposes to disallow escaping
 addresses of ref parameters. Consider:

 class C {
 int * p;
 this(ref int x) {
 p = &x; // escapes the address of a ref parameter
 }
 }

 Such code is accepted today. We believe it is error-prone and dangerous,
 particularly because the caller has no syntactic cue that the address of
 the parameter is passed into the function (in this case constructor).
 Worse, such a function cannot be characterized as @safe.

 So we want to make the above an error. The workaround is obvious - just
 take int* as a parameter instead of ref int. What a function can do with
 a ref parameter in general is:

 * use it directly just like a local;

 * pass it down to other functions (which may take it by value or
 reference);

 * pass its address down to pure functions because a pure function cannot
 escape the address anyway (cool insight by Walter);

Well, then it is possible to 'wash clean' a pointer to a ref argument using a pure function:

int* identity(int* p) pure { return p; }

int* global;

void escapeRef(ref int x) @safe {
    global = identity(&x);
}

 * take its address as long as the address doesn't outlive the frame of
 the function.

 The third bullet is not easy to implement as it requires flow analysis,
 but we may start with a conservative version first. Probably there won't
 be a lot of broken code anyway.

 Please chime in with any comments you might have!


 Thanks,

 Andrei

I agree, disallow. But more important is that scope parameters start working. Probably some of the code could be reused.
Aug 14 2011
prev sibling next sibling parent kennytm <kennytm gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Walter and I have had a long discussion and we thought we'd bring an idea
 for community review.
 
 We believe it would be useful for safety purposes to disallow escaping
 addresses of ref parameters. Consider:
 
 class C {
   int * p;
   this(ref int x) {
     p = &x; // escapes the address of a ref parameter
   }
 }
 
 Such code is accepted today. We believe it is error-prone and dangerous,
 particularly because the caller has no syntactic cue that the address of
 the parameter is passed into the function (in this case constructor). 

Well, you could adopt bug 6442 and call the constructor as

    auto c = new C(ref x);

<g>

 Worse, such a function cannot be characterized as @safe.

 So we want to make the above an error. The workaround is obvious - just
 take int* as a parameter instead of ref int. What a function can do with
 a ref parameter in general is:
 
 * use it directly just like a local;
 
 * pass it down to other functions (which may take it by value or reference);
 
 * pass its address down to pure functions because a pure function cannot
 escape the address anyway (cool insight by Walter);

Does this mean strongly pure? Because for now we can write a weakly pure function

    pure int* escape(int* q) { return q; }

and change that constructor to

    this(ref int x) { p = escape(&x); }

 * take its address as long as the address doesn't outlive the frame of the
function.
 
 The third bullet is not easy to implement as it requires flow analysis,
 but we may start with a conservative version first. Probably there won't
 be a lot of broken code anyway.
 
 Please chime in with any comments you might have!
 
 
 Thanks,
 
 Andrei

Aug 14 2011
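
The laundering concern raised in the two replies above can be sketched as follows, assuming a compiler that accepts taking the address of a ref parameter in non-@safe code. `escape`, `global` and `escapeRef` follow the names used in the posts; `demo` is a hypothetical helper. A pure function that takes a mutable pointer is only weakly pure: it cannot touch global state itself, but nothing stops it from returning its argument to an impure caller.

```d
// Weakly pure: no access to mutable globals, but it may hand its argument back.
int* escape(int* q) pure { return q; }

int* global; // module-level variable (thread-local by default in D)

void escapeRef(ref int x) {
    // The pure call returns the address, and the impure caller stores it.
    global = escape(&x);
}

// Hypothetical helper showing that the address really did get out.
int demo() {
    int local = 1;
    escapeRef(local);
    *global = 99;  // valid only because local's frame is still alive
    return local;
}
```

After `demo` returns, `global` still holds the address of a dead stack slot, which is the hazard the posters point out.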
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
I think this is an absolutely terrible idea, unless it has an "I know 
what I'm doing, let me cast away the safety" loophole.  Consider the 
case of designing a D wrapper for C functionality.

// C, we know it doesn't escape its parameters but the compiler doesn't.
void cFun(int* a, int* b);

// D:
void dWrapper(ref int a, ref int b) {
     cFun(&a, &b);
}

If you want the compiler to put extra restrictions on you in the name of 
safety, that's what SafeD is for.  If you're writing an @system
function, then the compiler should stay out of your way and let you do 
what you want, unless it can **prove** that it's wrong.

Aug 14 2011
next sibling parent reply Jakob Ovrum <jakobovrum+ng gmail.com> writes:
On 2011/08/15 0:28, dsimcha wrote:
 I think this is an absolutely terrible idea, unless it has an "I know
 what I'm doing, let me cast away the safety" loophole. Consider the case
 of designing a D wrapper for C functionality.

 // C, we know it doesn't escape its parameters but the compiler doesn't.
 void cFun(int* a, int* b);

What if it was allowed if the parameters were explicitly marked scope?

    void cFun(scope int* a, scope int* b);

I can imagine it being a proper inconvenience most of the time though, with many libraries not escaping a lot at all, you'd have to mark pretty much everything scope manually.
Aug 14 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/11 10:33 AM, Jakob Ovrum wrote:
 On 2011/08/15 0:28, dsimcha wrote:
 I think this is an absolutely terrible idea, unless it has an "I know
 what I'm doing, let me cast away the safety" loophole. Consider the case
 of designing a D wrapper for C functionality.

 // C, we know it doesn't escape its parameters but the compiler doesn't.
 void cFun(int* a, int* b);

What if it was allowed if the parameters were explicitly marked scope? void cFun(scope int* a, scope int* b); I can imagine it being a proper inconvenience most of the time though, with many libraries not escaping a lot at all, you'd have to mark pretty much everything scope manually.

Exactly. Using scope has been part of the discussion, and our agreement was that it would be a lot of burden to require manual scope annotations for non-escaping parameters.

Andrei
Aug 14 2011
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2011-08-14 18:45, Andrei Alexandrescu wrote:
 On 8/14/11 10:33 AM, Jakob Ovrum wrote:
 On 2011/08/15 0:28, dsimcha wrote:
 I think this is an absolutely terrible idea, unless it has an "I know
 what I'm doing, let me cast away the safety" loophole. Consider the case
 of designing a D wrapper for C functionality.

 // C, we know it doesn't escape its parameters but the compiler doesn't.
 void cFun(int* a, int* b);

What if it was allowed if the parameters were explicitly marked scope? void cFun(scope int* a, scope int* b); I can imagine it being a proper inconvenience most of the time though, with many libraries not escaping a lot at all, you'd have to mark pretty much everything scope manually.

Exactly. Using scope has been part of the discussion, and our agreement was that it would be a lot of burden to require manual scope annotations for non-escaping parameters. Andrei

Can we do the opposite, somehow indicating that the parameters might escape?

--
/Jacob Carlborg
Aug 14 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/11 11:50 AM, Jacob Carlborg wrote:
 On 2011-08-14 18:45, Andrei Alexandrescu wrote:
 On 8/14/11 10:33 AM, Jakob Ovrum wrote:
 On 2011/08/15 0:28, dsimcha wrote:
 I think this is an absolutely terrible idea, unless it has an "I know
 what I'm doing, let me cast away the safety" loophole. Consider the
 case
 of designing a D wrapper for C functionality.

 // C, we know it doesn't escape its parameters but the compiler
 doesn't.
 void cFun(int* a, int* b);

What if it was allowed if the parameters were explicitly marked scope? void cFun(scope int* a, scope int* b); I can imagine it being a proper inconvenience most of the time though, with many libraries not escaping a lot at all, you'd have to mark pretty much everything scope manually.

Exactly. Using scope has been part of the discussion, and our agreement was that it would be a lot of burden to require manual scope annotations for non-escaping parameters. Andrei

Can we do the opposite, somehow indicating that the parameters might escape?

We talked about this, too. I even aired ~scope. Such a change would be doable but is liable to break a lot of code.

Andrei
Aug 14 2011
prev sibling parent dsimcha <dsimcha yahoo.com> writes:
On 8/14/2011 12:45 PM, Andrei Alexandrescu wrote:
 Exactly. Using scope has been part of the discussion, and our agreement
 was that it would be a lot of burden to require manual scope annotations
 for non-escaping parameters.

 Andrei

Let's assume for the sake of argument that scope is part of the game. (How) would it be checked?
Aug 14 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/11 10:28 AM, dsimcha wrote:
 I think this is an absolutely terrible idea, unless it has an "I know
 what I'm doing, let me cast away the safety" loophole.

I'm weary of absolute qualifications, particularly after arguments have been made in favor of the idea that are not refuted.
 Consider the case
 of designing a D wrapper for C functionality.

 // C, we know it doesn't escape its parameters but the compiler doesn't.
 void cFun(int* a, int* b);

 // D:
 void dWrapper(ref int a, ref int b) {
 cFun(&a, &b);
 }

I understand. Probably it's fine to require an explicit cast for taking the address. Offhand, I don't see this as a frequent situation, or one that would make pass-by-pointer unpalatable.
 If you want the compiler to put extra restrictions on you in the name of
 safety, that's what SafeD is for. If you're writing an @system function,
 then the compiler should stay out of your way and let you do what you
 want, unless it can **prove** that it's wrong.

The problem is, currently all functions that pass locals by ref cannot be proven safe modularly.

Andrei
Aug 14 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 8/14/2011 12:44 PM, Andrei Alexandrescu wrote:
 On 8/14/11 10:28 AM, dsimcha wrote:
 I think this is an absolutely terrible idea, unless it has an "I know
 what I'm doing, let me cast away the safety" loophole.

I'm weary of absolute qualifications, particularly after arguments have been made in favor of the idea that are not refuted.

What do you mean "absolute qualifications"?
 Consider the case
 of designing a D wrapper for C functionality.

 // C, we know it doesn't escape its parameters but the compiler doesn't.
 void cFun(int* a, int* b);

 // D:
 void dWrapper(ref int a, ref int b) {
 cFun(&a, &b);
 }

I understand. Probably it's fine to require an explicit cast for taking the address. Offhand, I don't see this as a frequent situation, or one that would make pass-by-pointer unpalatable.

Pass-by-
 If you want the compiler to put extra restrictions on you in the name of
 safety, that's what SafeD is for. If you're writing an @system function,
 then the compiler should stay out of your way and let you do what you
 want, unless it can **prove** that it's wrong.

The problem is, currently all functions that pass locals by ref cannot be proven safe modularly.

Right, but they can be proven safe if they pass locals by ref **to @safe functions**. I don't think there's any disagreement that @safe functions shouldn't be allowed to take the address of locals or parameters.
Aug 14 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/11 11:51 AM, dsimcha wrote:
 On 8/14/2011 12:44 PM, Andrei Alexandrescu wrote:
 On 8/14/11 10:28 AM, dsimcha wrote:
 I think this is an absolutely terrible idea, unless it has an "I know
 what I'm doing, let me cast away the safety" loophole.

I'm weary of absolute qualifications, particularly after arguments have been made in favor of the idea that are not refuted.

What do you mean "absolute qualifications"?

"absolutely terrible" Andrei
Aug 14 2011
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
Argh, accidentally hit send before I meant to on my last post.  Please 
ignore.

On 8/14/2011 12:44 PM, Andrei Alexandrescu wrote:
 Consider the case
 of designing a D wrapper for C functionality.

 // C, we know it doesn't escape its parameters but the compiler doesn't.
 void cFun(int* a, int* b);

 // D:
 void dWrapper(ref int a, ref int b) {
 cFun(&a, &b);
 }

I understand. Probably it's fine to require an explicit cast for taking the address. Offhand, I don't see this as a frequent situation, or one that would make pass-by-pointer unpalatable.

Pass-by-pointer is really, really ugly when used in high-level D-style code, and exposes the implementation detail that the D wrapper is using C code. By explicit cast, do you mean one in dWrapper() that's encapsulated and invisible to the caller?
 If you want the compiler to put extra restrictions on you in the name of
 safety, that's what SafeD is for. If you're writing an @system function,
 then the compiler should stay out of your way and let you do what you
 want, unless it can **prove** that it's wrong.

The problem is, currently all functions that pass locals by ref cannot be proven safe modularly.

Right, but they can be proven safe if they pass locals by ref **to @safe functions**. I don't think there's any disagreement that @safe functions shouldn't be allowed to take the address of locals or parameters.
Aug 14 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/11 11:55 AM, dsimcha wrote:
 Argh, accidentally hit send before I meant to on my last post. Please
 ignore.

 On 8/14/2011 12:44 PM, Andrei Alexandrescu wrote:
 Consider the case
 of designing a D wrapper for C functionality.

 // C, we know it doesn't escape its parameters but the compiler doesn't.
 void cFun(int* a, int* b);

 // D:
 void dWrapper(ref int a, ref int b) {
 cFun(&a, &b);
 }

I understand. Probably it's fine to require an explicit cast for taking the address. Offhand, I don't see this as a frequent situation, or one that would make pass-by-pointer unpalatable.

Pass-by-pointer is really, really ugly when used in high-level D-style code, and exposes the implementation detail that the D wrapper is using C code. By explicit cast, do you mean one in dWrapper() that's encapsulated and invisible to the caller?

Yah, dWrapper would become:

void dWrapper(ref int a, ref int b) {
    cFun(cast(int*) &a, cast(int*) &b);
}

If the casts are missing, the compiler's error message could clarify under what assumptions they might be inserted.

 If you want the compiler to put extra restrictions on you in the name of
 safety, that's what SafeD is for. If you're writing an @system function,
 then the compiler should stay out of your way and let you do what you
 want, unless it can **prove** that it's wrong.

The problem is, currently all functions that pass locals by ref cannot be proven safe modularly.

Right, but they can be proven safe if they pass locals by ref **to @safe functions**. I don't think there's any disagreement that @safe functions shouldn't be allowed to take the address of locals or parameters.

We don't have that rule yet, but we can enact it. I strongly believe it would help if we enacted the "unescapable ref" rule for all D code. It disallows a patently dangerous pattern that many C++ coding standards (including Facebook's) explicitly disallow. We found pernicious bugs in our code caused by escaping reference parameters, and we're looking into adding a rule in our lint program to statically disallow it. If that's worthwhile (and I have evidence it is), then it's all the better to put the check straight in the language.

Andrei
Aug 14 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 8/14/2011 1:05 PM, Andrei Alexandrescu wrote:
 Pass-by-pointer is really, really ugly when used in high-level D-style
 code, and exposes the implementation detail that the D wrapper is using
 C code. By explicit cast, do you mean one in dWrapper() that's
 encapsulated and invisible to the caller?

Yah, dWrapper would become: void dWrapper(ref int a, ref int b) { cFun(cast(int*) &a, cast(int*) &b); } If the casts are missing, the compiler's error message could clarify under what assumptions they might be inserted.

Ok, IIUC we might have found some common ground here. Is the idea that, if you insert the cast, then it's an unsafe cast and you're free to take the address of a ref parameter, period? I think this is a reasonable compromise:

1. There's enormous precedent for the idea that casts are for things you **probably** shouldn't be doing but may occasionally have a good reason to do.

2. It's greppable, unlike the status quo, where there's no easy way to search for possible escaping of addresses of ref parameters.

3. It solves the encapsulation problem mentioned in my previous post.

4. It can be disallowed in SafeD as an unsafe cast.

5. If you allow taking the address of ref parameters without a cast as long as the compiler can prove that they don't escape, then performing this type of cast is a very explicit statement that you **know** the compiler can't prove that those addresses don't escape and that you take full responsibility for ensuring they don't.

6. It may eventually lead to a more comprehensive ownership type system similar to one that's been discussed here before, where there's ScopedPointers and regular pointers. A ScopedPointer is a super type of a regular pointer, isn't allowed to escape from where it was created, etc.

Bottom line: I completely agree that escaping addresses of ref parameters is a terrible design. I'm fine with making constructs that have the potential to do so more verbose and explicit by requiring casts. However, I am against disallowing it completely for the following reasons:

1. The rules against it would have to be conservative, meaning at least some valid designs are tossed out as well. This is completely unacceptable in a systems language.

2. I'm not convinced that escaping addresses of ref parameters is at all easy to do by accident.

3. In a systems language the compiler should **never** go out of its way to completely disallow a design, no matter how bad that design is. A language is a tool that should do what the user tells it to, not a nanny that should prevent the user from being naughty. (Though this does not preclude the compiler making it hard to do bad things **by accident**.) I don't care if said design is wholeheartedly endorsed by The Devil himself.
Aug 14 2011
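
A minimal sketch of the wrapper with explicit casts as discussed above. The body given to `cFun` here is a stand-in assumption (the thread only declares it), and `demo` is a hypothetical helper. With current compilers `cast(int*) &a` is a no-op, since `&a` already has type `int*`; under the proposed rule it would be the greppable marker that the author vouches for the address not escaping.

```d
// Stand-in for the C function; the swap body is an assumption for illustration.
// extern(C) gives it C linkage, as a real binding would have.
extern(C) void cFun(int* a, int* b) {
    int t = *a;
    *a = *b;
    *b = t;
}

// D-style wrapper: callers pass by ref and never see a pointer.
void dWrapper(ref int a, ref int b) {
    // The casts document that the addresses are taken deliberately.
    cFun(cast(int*) &a, cast(int*) &b);
}

// Hypothetical helper exercising the wrapper.
int[2] demo() {
    int x = 1, y = 2;
    dWrapper(x, y);
    return [x, y];
}
```

The cast stays inside `dWrapper`, so the wrapper's signature and its callers are unaffected.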
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/11 12:39 PM, dsimcha wrote:
 On 8/14/2011 1:05 PM, Andrei Alexandrescu wrote:
 Pass-by-pointer is really, really ugly when used in high-level D-style
 code, and exposes the implementation detail that the D wrapper is using
 C code. By explicit cast, do you mean one in dWrapper() that's
 encapsulated and invisible to the caller?

Yah, dWrapper would become: void dWrapper(ref int a, ref int b) { cFun(cast(int*) &a, cast(int*) &b); } If the casts are missing, the compiler's error message could clarify under what assumptions they might be inserted.

Ok, IIUC we might have found some common ground here. Is the idea that, if you insert the cast, then it's an unsafe cast and you're free to take the address of a ref parameter, period?

Everything sounds great, thanks.

Andrei
Aug 14 2011
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 We found pernicious bugs in 
 our code caused by escaping reference parameters, and we're looking into 
 adding a rule in our lint program to statically disallow it. If that's 
 worthwhile (and I have evidence it is), then it's all the better to put 
 the check straight in the language.

Another possibility is to add it to an experimental branch of DMD, study its use and effects for some time, and then decide what to do with this idea.

 We talked about this, too. I even aired ~scope. Such a change would be
 doable but is liable to break a lot of code.

It would be interesting to know how much code is affected and how hard the changes are to make. As in Go, a small tool similar to the one that converts Python 2 code to Python 3 code is becoming useful in D too, to reduce the amount of work for people (including Phobos devs) who have to update D code to follow changes in the D language.

Bye,
bearophile
Aug 14 2011
prev sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2011-08-14 16:55:08 +0000, dsimcha <dsimcha yahoo.com> said:

 Right, but they can be proven safe if they pass locals by ref **to
 @safe functions**.  I don't think there's any disagreement that @safe
 functions shouldn't be allowed to take the address of locals or
 parameters.

Actually, no, that's not safe by itself. Consider this:

ref int foo(ref int a) @safe {
    return a;
}

ref int bar() @safe {
    int a;
    return foo(a);
}

And now 'bar' returns its local variable 'a' by ref, thanks to the complicity of 'foo'. All this unsafety is perfectly @safe. I think a @safe function shouldn't be allowed to return by ref one of its parameters.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Aug 14 2011
prev sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 14.08.2011, 21:08 Uhr, schrieb bearophile <bearophileHUGS lycos.com>:

 Like in Go, a small tool similar to the one that converts Python2 code  
 to Python3 code is getting useful in D too, to reduce the amount of work  
 of people (including Phobos dev) that have to update D code to follow  
 changed in D language.

 Bye,
 bearophile

+1 (although with __gshared it would have created some horrible code when the change was made :p)
Aug 14 2011
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
I have code relying on this, probably not good practice but it works. This is a usage example:

void main ()
{
    int i = 3;

    restore(i) in {
        i = 4;
    };

    assert(i == 3);
}

Restore returns a struct which overloads the "in" operator and stores a pointer to the value passed to "restore". I'm overloading the "in" operator to have a nicer looking delegate syntax. But I guess this could be seen as operator overload abuse. If D just could have a good looking syntax for delegate literals, like this:

restore(i) {
    i = 4;
}

Then this wouldn't be needed.

--
/Jacob Carlborg
Aug 14 2011
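
The post above does not show how `restore` is implemented, so the following is only a guess at its shape: the struct name `Restore`, its fields and the helper `demo` are all assumptions. It illustrates why such code depends on escaping the address of a ref parameter: `restore` returns a struct holding `&value`, which outlives the call.

```d
// Hypothetical reconstruction; not the poster's actual code.
struct Restore(T) {
    T* target; // address of the ref argument, kept after restore() returns
    T saved;   // copy of the value at the time restore() was called

    // `lhs in rhs` is rewritten to lhs.opBinary!"in"(rhs).
    void opBinary(string op : "in")(scope void delegate() dg) {
        dg();            // run the block, which may modify the variable
        *target = saved; // put the old value back
    }
}

Restore!T restore(T)(ref T value) {
    // This is the escape the proposal would reject without a cast.
    return Restore!T(&value, value);
}

// Hypothetical helper mirroring the usage example from the post.
int demo() {
    int i = 3;
    restore(i) in { i = 4; };
    return i;
}
```

Under the cast compromise discussed elsewhere in the thread, only the `&value` inside `restore` would need to change; the call site would stay as it is.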
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
Another example of why this is a bad idea:

In std.parallelism, I have a function called TaskPool.put, which takes a 
Task object by reference, takes its address and puts it on the task 
queue.  This is used for scoped tasks.  However, it's safe because Task 
has a destructor that waits for the task to be finished and out of the 
task queue before destroying the stack frame it's on and returning.

Why can't we just establish a strong convention that, if a function 
truly escapes the address of a ref parameter (meaning it actually lives 
longer than the lifetime of the function), you take a pointer instead of 
a ref?  It's not like escaping ref parameters unintentionally is a 
common source of bugs.

My point is that any rule we come up with will always be conservative. 
D is a **systems language** and needs to give the benefit of the doubt 
to assuming the programmer knows what he/she is doing.  If you want 
extra checks, that's what SafeD is for.

Aug 14 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/11 10:41 AM, dsimcha wrote:
 Another example of why this is a bad idea:

 In std.parallelism, I have a function called TaskPool.put, which takes a
 Task object by reference, takes its address and puts it on the task
 queue. This is used for scoped tasks. However, it's safe because Task
 has a destructor that waits for the task to be finished and out of the
 task queue before destroying the stack frame it's on and returning.

I understand. Would it be agreeable to require a cast to take the address of the parameter since you're relying on an extralinguistic invariant? Basically you'd be more motivated to do so if you recognized how problematic escaping ref parameters is for most cases.
 Why can't we just establish a strong convention that, if a function
 truly escapes the address of a ref parameter (meaning it actually lives
 longer than the lifetime of the function), you take a pointer instead of
 a ref? It's not like escaping ref parameters unintentionally is a common
 source of bugs.

Convention has its usefulness, but also major downsides. The problem here is that we can't verify (or infer) @safe for a lot of functions. Basically all functions taking ref become very difficult to use from safe code, including the common idiom of passing a stack variable into a function by reference. I don't think we can afford to lose so much when turning safety on.
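
[Editor's illustration] For reference, the pointer-taking alternative mentioned in the quoted paragraph and in the original proposal might look like the sketch below. The class mirrors the one in the opening post, with the constructor changed to take int*; everything in main is a hypothetical usage example.

```d
import std.stdio;

// Variant of the class from the opening post: the constructor takes int*
// instead of ref int, so the caller has to write the pointer explicitly.
class C
{
    int* p;

    this(int* x)
    {
        p = x; // storing the pointer is now visible in the signature
    }
}

void main()
{
    // The int lives on the GC heap, so it outlives any stack frame.
    int* value = new int;
    *value = 42;

    auto c = new C(value);
    *c.p += 1;
    writeln(*value); // 43
}
```

The call site now carries the syntactic cue that the proposal says is missing with ref.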
 My point is that any rule we come up with will always be conservative. D
 is a **systems language** and needs to give the benefit of the doubt to
 assuming the programmer knows what he/she is doing. If you want extra
 checks, that's what SafeD is for.

I have only little sympathy for this argument; it actually leaves me more convinced we're on the right path. We're not talking about making it impossible to do something that you want to do. We're discussing a change that will make a lot of functions efficient _and_ safe, leaving a minority of cases to a slight syntactic change.

Andrei
Aug 14 2011
parent reply dsimcha <dsimcha yahoo.com> writes:
On 8/14/2011 12:54 PM, Andrei Alexandrescu wrote:
 I have only little sympathy for this argument; it actually leaves me
 more convinced we're on the right path. We're not talking about making
 it impossible to do something that you want to do. We're discussing a
 change that will make a lot of functions efficient _and_ safe,
 leaving a minority of cases to a slight syntactic change.


 Andrei

But this breaks encapsulation horribly in the presence of conservative rules. Let's say you start off with a function:

SomeType fun(ref T arg) { .... }

Then you change fun()'s implementation such that it takes the address of arg. It does **not** escape this address, so the fact that the address is taken is an implementation detail. However, since the compiler's rules are conservative, this code might be illegal if the compiler can't prove via its static analysis that the address doesn't escape. Bam! Implementation details leaking into function signatures.
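
[Editor's illustration] A minimal sketch of the situation described above, assuming a hypothetical struct T and a hypothetical helper bump that needs a pointer but does not keep it. Whether a conservative analysis would accept this is exactly the open question in the post; the code only shows that the address never leaves the frame of fun.

```d
import std.stdio;

struct T { int value; }

// Hypothetical helper: uses the pointer, does not store it anywhere.
void bump(T* p) { p.value += 1; }

int fun(ref T arg)
{
    T* local = &arg; // address taken as an implementation detail
    bump(local);     // passed down, but never stored
    return arg.value;
}

void main()
{
    T t = T(41);
    writeln(fun(t)); // 42
}
```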
Aug 14 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/11 12:02 PM, dsimcha wrote:
 On 8/14/2011 12:54 PM, Andrei Alexandrescu wrote:
 I have only little sympathy for this argument; it actually leaves me
 more convinced we're on the right path. We're not talking about making
 it impossible to do something that you want to do. We're discussing a
 change that will make a lot of functions efficient _and_ safe,
 leaving a minority of cases to a slight syntactic change.


 Andrei

But this breaks encapsulation horribly in the presence of conservative rules. Let's say you start off with a function:

SomeType fun(ref T arg) { .... }

Then you change fun()'s implementation such that it takes the address of arg. It does **not** escape this address, so the fact that the address is taken is an implementation detail. However, since the compiler's rules are conservative, this code might be illegal if the compiler can't prove via its static analysis that the address doesn't escape. Bam! Implementation details leaking into function signatures.

You are exploring an increasingly narrow niche. Is it worth keeping a hole in the language for the sake of that? Andrei
Aug 14 2011
parent reply dsimcha <dsimcha yahoo.com> writes:
On 8/14/2011 1:06 PM, Andrei Alexandrescu wrote:
 You are exploring an increasingly narrow niche. Is it worth keeping a
 hole in the language for the sake of that?

 Andrei

Yes!!! Such conservative and inflexible rules have no place in a systems language, period.
Aug 14 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/14/11 12:12 PM, dsimcha wrote:
 On 8/14/2011 1:06 PM, Andrei Alexandrescu wrote:
 You are exploring an increasingly narrow niche. Is it worth keeping a
 hole in the language for the sake of that?

 Andrei

Yes!!! Such conservative and inflexible rules have no place in a systems language, period.

I see. Do you have a response to any of the arguments I brought? Among other things, does the fact that you still can do what you want to do assuage your perceived inconvenience? Andrei
Aug 14 2011
prev sibling next sibling parent reply Mehrdad <wfunction hotmail.com> writes:
On 8/14/2011 7:20 AM, Andrei Alexandrescu wrote:
 Walter and I have had a long discussion and we thought we'd bring an 
 idea for community review.

 We believe it would be useful for safety purposes to disallow escaping 
 addresses of ref parameters. <snip>

... I hope you're joking. (1) I thought the whole point of D was that you didn't need pointers to program effectively? (2) Isn't this what compiler **warnings** are for?
Aug 14 2011
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08/15/2011 08:16 AM, Mehrdad wrote:
 On 8/14/2011 7:20 AM, Andrei Alexandrescu wrote:
 Walter and I have had a long discussion and we thought we'd bring an
 idea for community review.

 We believe it would be useful for safety purposes to disallow escaping
 addresses of ref parameters. <snip>

... I hope you're joking. (1) I thought the whole point of D was that you didn't need pointers to program effectively?

Why would the change contradict this?
 (2) Isn't this what compiler **warnings** are for?

In a well designed language, warnings are useless. Either the code is well formed or it is not. All you will have to do to convince the compiler that what you are doing is safe is to insert a cast.
Aug 15 2011
parent reply Mehrdad <wfunction hotmail.com> writes:
On 8/15/2011 4:10 AM, Timon Gehr wrote:
 On 08/15/2011 08:16 AM, Mehrdad wrote:
 (1) I thought the whole point of D was that you didn't need pointers to
 program effectively?

Why would the change contradict this?

 (2) Isn't this what compiler **warnings** are for?


You've never used C#/C/Java/whatever? Or do you think they're all poorly designed?
Aug 15 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08/15/2011 04:04 PM, Mehrdad wrote:
 On 8/15/2011 4:10 AM, Timon Gehr wrote:
 On 08/15/2011 08:16 AM, Mehrdad wrote:
 (1) I thought the whole point of D was that you didn't need pointers to
 program effectively?

Why would the change contradict this?


Only if the function intends to escape the reference. And if you really need to, you can still use a cast. Furthermore, escaping a reference is generally unsafe when it is to stack memory (there can be some higher-level invariant that guarantees safety, but that is not within the compiler's reach -- use a cast.) When the reference is to a value type on heap memory, you had pointers in your code all along.
 (2) Isn't this what compiler **warnings** are for?


You've never used C#/C/Java/whatever? Or do you think they're all poorly designed?

Warnings are issued for constructs that are regarded as potentially dangerous/error-prone by many people, but are still valid code. Therefore, they usually reflect suboptimal language design.
Aug 15 2011
parent reply Mehrdad <wfunction hotmail.com> writes:
On 8/15/2011 2:04 PM, Timon Gehr wrote:
 On 08/15/2011 04:04 PM, Mehrdad wrote:
 On 8/15/2011 4:10 AM, Timon Gehr wrote: Because now you need pointers 
 to pass things by reference?

really need to, you can still use a cast.

... introducing D#?

    void dangerous(ref int x)
    {
        unsafe
        {
            bar(&x);
        }
    }

(i.e. "you can still use a cast" is pretty much equivalent to using C#'s unsafe block. Maybe C#'s done some things correctly after all, huh?...)
 Furthermore, escaping a reference is generally unsafe when it is to 
 stack memory (there can be some higher-level invariant that guarantees 
 safety, but that is not within the compiler's reach -- use a cast.)

Right, so let's also ban "auto ref" as a return value, since it could be returning the parameter itself, without the caller's knowledge:

    auto ref trySomethingExpensive(T)(auto ref T input, bool condition)
    {
        return condition ? doSomethingExpensive(input) : input; // Optimize the input
    }

Thoughts? Should we ban ref return types, too?
 When the reference is to a value type on heap memory, you had pointers 
 in your code all along.

Java and C# have had pointers all along, but they're pretty darn successful at rendering them useless *except* for interop purposes. Obviously, it looks as though D is failing to achieve that same goal, requiring pointers for something so simple -- do we really want that?
 Warnings are issued for constructs that are regarded as potentially 
 dangerous/error-prone by many people, but are still valid code. 
 Therefore, they usually reflect suboptimal language design.

Is that why my C compiler issues "Warning: Uninitialized local variable p" when I run this?

    int *p;
    *p = 5;

AFAIK this is clearly *INVALID* (i.e. undefined) code according to the standard...
Aug 15 2011
next sibling parent reply Jakob Ovrum <jakobovrum+ng gmail.com> writes:
On 2011/08/16 12:33, Mehrdad wrote:
 On 8/15/2011 2:04 PM, Timon Gehr wrote:
 On 08/15/2011 04:04 PM, Mehrdad wrote:
 On 8/15/2011 4:10 AM, Timon Gehr wrote: Because now you need pointers
 to pass things by reference?

really need to, you can still use a cast.

... introducing D#? void dangerous(ref int x) { unsafe { bar(&x); } } (i.e. "you can still use a cast" is pretty much equivalent to using C#'s unsafe block. Maybe C#'s done some things correctly after all, huh?...)
 Furthermore, escaping a reference is generally unsafe when it is to
 stack memory (there can be some higher-level invariant that guarantees
 safety, but that is not within the compiler's reach -- use a cast.)

Right, so let's also ban "auto ref" as a return value, since it could be returning the parameter itself, without the caller's knowledge: auto ref trySomethingExpensive(T)(auto ref T input, bool condition) { return condition ? doSomethingExpensive(input) : input; // Optimize the input } Thoughts? Should we ban ref return types, too?

No reference to stack memory can escape its stack frame in your example. I don't see how this has anything to do with the discussion.
 When the reference is to a value type on heap memory, you had pointers
 in your code all along.

Java and C# have had pointers all along, but they're pretty darn successful at rendering them useless *except* for interop purposes. Obviously, it looks as though D is failing to achieve that same goal, requiring pointers for something so simple -- do we really want that?

Java doesn't have pointers. (unless you're referring to its reference types, which you obviously aren't, considering your interop comment). Pointers aren't required in D either, but are supported nevertheless, one reason being that D allows for putting value types directly on the heap, like structs and primitives, which C# doesn't.
 Warnings are issued for constructs that are regarded as potentially
 dangerous/error-prone by many people, but are still valid code.
 Therefore, they usually reflect suboptimal language design.

Is that why my C compiler issues "Warning: Uninitialized local variable p" when I run this? int *p; *p = 5; AFAIK this is clearly *INVALID* (i.e. undefined) code according to the standard...

Aug 15 2011
parent reply Mehrdad <wfunction hotmail.com> writes:
On 8/15/2011 9:09 PM, Jakob Ovrum wrote:
 On 2011/08/16 12:33, Mehrdad wrote:
 auto ref trySomethingExpensive(T)(auto ref T input, bool condition)
 {
     return condition ? doSomethingExpensive(input) : input; // 
 Optimize the input
 }
 Thoughts? Should we ban ref return types, too?

I don't see how this has anything to do with the discussion.

Oh, I assumed it was obvious that you usually have to /call/ a function for anything to happen. But perhaps it wasn't, my bad. Here's something (hopefully) more obvious/verbose:

    auto ref call_if(T)(auto ref T input, bool cond) {
        return cond ? call_something(input) : input;
    }

    ref int get_foo(int s) {
        int a = 9.8 * s;
        return call_if(x, true); // oops?
    }

    void test() {
        writeln(GetFoo()); // <--- what do you suppose gets returned?
    }
 When the reference is to a value type on heap memory, you had 
 pointers in your code all along.



had pointers in your Java code all along?
 Pointers aren't required in D either.

 Warnings are issued for constructs that are regarded as potentially 
 dangerous/error-prone by many people, but are still valid code.

*p = 5; AFAIK this is clearly *INVALID* (i.e. undefined) code according to the standard...


Would be curious to know why you just ignored that comment, btw. Do you really consider that to be "valid" code, in any sense of the term?
Aug 15 2011
next sibling parent Mehrdad <wfunction hotmail.com> writes:
On 8/15/2011 9:48 PM, Mehrdad wrote:
 ref int get_foo(int s) {
     int a = 9.8 * s;
     return call_if(x, true);  // oops?
 }

Sorry, that should say "false", not "true".
Aug 15 2011
prev sibling parent Jakob Ovrum <jakobovrum+ng gmail.com> writes:
On 2011/08/16 13:48, Mehrdad wrote:
 On 8/15/2011 9:09 PM, Jakob Ovrum wrote:
 On 2011/08/16 12:33, Mehrdad wrote:
 auto ref trySomethingExpensive(T)(auto ref T input, bool condition)
 {
 return condition ? doSomethingExpensive(input) : input; // Optimize
 the input
 }
 Thoughts? Should we ban ref return types, too?

I don't see how this has anything to do with the discussion.

Oh, I assumed it was obvious that you usually have to /call/ a function for anything to happen. But perhaps it wasn't, my bad. Here's something (hopefully) more obvious/verbose:

Not all calls to that function would be dangerous. I would even go so far as to claim most uses of such a function would not be dangerous. No need to be snarky, anyway.
 auto ref call_if(T)(auto ref T input, bool cond) {
 return cond ? call_something(input) : input;
 }

 ref int get_foo(int s) {
 int a = 9.8 * s;
 return call_if(x, true); // oops?
 }

 void test() {
 writeln(GetFoo()); // <--- what do you suppose gets returned?
 }

I agree these cases are problematic too, but disallowing escaping of ref parameters does not affect them. I don't think Andrei's suggestion includes disallowing returning ref parameters by ref, if it does, I agree that it would reduce the usefulness of ref return.
 When the reference is to a value type on heap memory, you had
 pointers in your code all along.



had pointers in your Java code all along?
 Pointers aren't required in D either.


Things will work the same, except you'll need an explicit cast if you want to take the address of the ref parameter (which, as before, yields a pointer). You shouldn't take the address of ref parameters unless you're sure it's safe (but you seem to agree on this, with your C# unsafe example).
 Warnings are issued for constructs that are regarded as potentially
 dangerous/error-prone by many people, but are still valid code.

*p = 5; AFAIK this is clearly *INVALID* (i.e. undefined) code according to the standard...


Would be curious to know why you just ignored that comment, btw. Do you really consider that to be "valid" code, in any sense of the term?

I'm not the one who said it was...
Aug 16 2011
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 16.08.2011 7:33, Mehrdad wrote:

     void dangerous(ref int x)
     {
         unsafe
         {
            bar(&x);
         }
     }

{ bar(&x); } Fixed? -- Dmitry Olshansky
Aug 16 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 08/16/2011 12:09 PM, Dmitry Olshansky wrote:
 On 16.08.2011 7:33, Mehrdad wrote:

 void dangerous(ref int x)
 {
 unsafe
 {
 bar(&x);
 }
 }

{ bar(&x); } Fixed?

Andrei proposed to make this invalid even for @system functions. The only way to escape a ref param's address would be

    @system void dangerous(ref int x) {
        bar(cast(int*)&x);
    }
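
[Editor's illustration] A self-contained version of the snippet above, for readers who want to try it. The global saved and the body of bar are assumptions added so that the example actually escapes the address; in D as it stands today the cast is redundant, while under the proposal it would be the explicit opt-in.

```d
int* saved; // hypothetical module-level storage that outlives the call

// Hypothetical callee that keeps the pointer it is given.
void bar(int* p) { saved = p; }

@system void dangerous(ref int x)
{
    // The cast marks the escape as deliberate.
    bar(cast(int*)&x);
}

void main()
{
    auto heapInt = new int; // heap storage, so the escape is harmless here
    *heapInt = 5;
    dangerous(*heapInt);
    assert(saved is heapInt);
    assert(*saved == 5);
}
```

Passing a stack variable to dangerous instead would leave saved dangling once the caller's frame is gone, which is the case the proposal is aimed at.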
Aug 16 2011
parent Mehrdad <wfunction hotmail.com> writes:
On 8/16/2011 8:08 AM, Timon Gehr wrote:
 Andrei proposed to make this invalid even for @system functions. The 
 only way to escape a ref param's address would be

  @system void dangerous(ref int x) {
     bar(cast(int*)&x);
 }

 So if you are supposing I think the C language is well designed and 
 warnings are useless for C, you are on the wrong path.

consider to be well-designed, other than D? Maybe we can (hopefully) find some common ground then...
Aug 16 2011
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 08/16/2011 05:33 AM, Mehrdad wrote:
 On 8/15/2011 2:04 PM, Timon Gehr wrote:
 On 08/15/2011 04:04 PM, Mehrdad wrote:
 On 8/15/2011 4:10 AM, Timon Gehr wrote: Because now you need pointers
 to pass things by reference?

really need to, you can still use a cast.

... introducing D#? void dangerous(ref int x) { unsafe { bar(&x); } } (i.e. "you can still use a cast" is pretty much equivalent to using C#'s unsafe block. Maybe C#'s done some things correctly after all, huh?...)

Sure. I was never implying otherwise. But the main reason that in C# casts are not suitable for that kind of thing is that C# adopted the C cast syntax and that in a managed language, casts should always be safe.
 Furthermore, escaping a reference is generally unsafe when it is to
 stack memory (there can be some higher-level invariant that guarantees
 safety, but that is not within the compiler's reach -- use a cast.)

Right, so let's also ban "auto ref" as a return value, since it could be returning the parameter itself, without the caller's knowledge: auto ref trySomethingExpensive(T)(auto ref T input, bool condition) { return condition ? doSomethingExpensive(input) : input; // Optimize the input } Thoughts? Should we ban ref return types, too?

Well obviously some measure to ban the improper usage of ref return types from SafeD has to be taken eventually (otherwise SafeD would not be memory safe, duh). Basically the problem arises when you directly pass on a reference you got from a function that took some of your local variables by ref (maybe indirectly through multiple ref returns). So that could be the construct to ban, which is less restrictive than banning ref return altogether.
 When the reference is to a value type on heap memory, you had pointers
 in your code all along.


    auto x=new int;
    static assert(is(typeof(x)==int*));

Alternatively, you can use a wrapper class and then you don't need pointers at all.
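
[Editor's illustration] The wrapper-class alternative mentioned above could look roughly like this. Box is a hypothetical name, not a Phobos type; the sketch only shows that a class reference gives heap storage for a value type without a pointer appearing in user code.

```d
import std.stdio;

// Hypothetical wrapper: a class instance is a reference type on the GC heap.
class Box(T)
{
    T value;
    this(T v) { value = v; }
}

// Takes the box by (class) reference; no ref parameter, no pointer.
void increment(Box!int b) { b.value += 1; }

void main()
{
    auto x = new int;
    static assert(is(typeof(x) == int*)); // heap int: a pointer, as in the post

    auto b = new Box!int(7);
    static assert(is(typeof(b) == Box!int)); // heap int behind a class reference
    increment(b);
    writeln(b.value); // 8
}
```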
 Java and C# have had pointers all along, but they're pretty darn
 successful at rendering them useless *except* for interop purposes.
 Obviously, it looks as though D is failing to achieve that same goal,
 requiring pointers for something so simple -- do we really want that?

D can still express everything that Java and C# can (and more) without allowing escaping ref argument addresses.
 Warnings are issued for constructs that are regarded as potentially
 dangerous/error-prone by many people, but are still valid code.
 Therefore, they usually reflect suboptimal language design.

Is that why my C compiler issues "Warning: Uninitialized local variable p" when I run this? int *p; *p = 5; AFAIK this is clearly *INVALID* (i.e. undefined) code according to the standard...

C is a machine-friendly language for writing lightning fast portable programs for arbitrary von Neumann architecture computers. Furthermore, writing a conformant compiler should be very easy. It does not need to be a well designed language, because that would interfere with other goals. So if you are supposing I think the C language is well designed and warnings are useless for C, you are on the wrong path. It is the other way round. But it is really not a problem at all. I still like C. Usually, well designed languages have somewhat worse run-time characteristics.
Aug 16 2011
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/15/11 1:16 AM, Mehrdad wrote:
 On 8/14/2011 7:20 AM, Andrei Alexandrescu wrote:
 Walter and I have had a long discussion and we thought we'd bring an
 idea for community review.

 We believe it would be useful for safety purposes to disallow escaping
 addresses of ref parameters. <snip>

... I hope you're joking. (1) I thought the whole point of D was that you didn't need pointers to program effectively?

The proposed change has little to do with needing pointers or not.
 (2) Isn't this what compiler **warnings** are for?

No. Andrei
Aug 15 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 14 Aug 2011 10:20:37 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Walter and I have had a long discussion and we thought we'd bring an  
 idea for community review.

 We believe it would be useful for safety purposes to disallow escaping  
 addresses of ref parameters. Consider:

 class C {
    int * p;
    this(ref int x) {
      p = &x; // escapes the address of a ref parameter
    }
 }

 Such code is accepted today. We believe it is error-prone and dangerous,  
 particularly because the caller has no syntactic cue that the address of  
 the parameter is passed into the function (in this case constructor).  
 Worse, such a function cannot be characterized as @safe.

 So we want to make the above an error. The workaround is obvious - just  
 take int* as a parameter instead of ref int. What a function can do with  
 a ref parameter in general is:

 * use it directly just like a local;

 * pass it down to other functions (which may take it by value or  
 reference);

 * pass its address down to pure functions because a pure function cannot  
 escape the address anyway (cool insight by Walter);

 * take its address as long as the address doesn't outlive the frame of  
 the function.

 The third bullet is not easy to implement as it requires flow analysis,  
 but we may start with a conservative version first. Probably there won't  
 be a lot of broken code anyway.

 Please chime in with any comments you might have!

It sounds reasonable, especially with the added clarification that you can cast yourself back to the good old unsafe pointer.

The one thing I'm leery of is that structs are passed by reference for member functions, which is *forced* by the compiler. Not that it's going to be horrible, but I think in certain cases, especially for things that allocate structs on the heap, this is going to require a lot of casting. Here is a real world example for dcollections that is full of &this:

http://www.dsource.org/projects/dcollections/browser/branches/d2/dcollections/Link.d#L37

Is there no way to say "for this section of code, allow taking reference addresses"?

-Steve
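
[Editor's illustration] To make the &this concern concrete, here is a small sketch of a heap-allocated linked-list node whose member function stores its own address in a neighbour. It is not the dcollections code linked above; the struct layout and the name insertAfter are assumptions made for illustration.

```d
// Hypothetical doubly linked node, always allocated on the heap here.
struct Link
{
    Link* next;
    Link* prev;
    int value;

    // Splice `node` in right after this link.
    void insertAfter(Link* node)
    {
        node.prev = &this; // `this` is passed by ref, so &this is an escape
        node.next = next;
        if (next !is null)
            next.prev = node;
        next = node;
    }
}

void main()
{
    auto a = new Link;
    a.value = 1;
    auto b = new Link;
    b.value = 2;

    a.insertAfter(b);
    assert(a.next is b);
    assert(b.prev is a);
}
```

Under the proposal as discussed in this thread, the line storing &this is the kind that would presumably need a cast.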
Aug 15 2011