digitalmars.D - ref is unsafe

reply Jonathan M Davis <jmdavisProg gmx.com> writes:
After some recent discussions relating to auto ref and const ref, I have come 
to the conclusion that as it stands, ref is not @safe. It's @system. And I 
think that we need to take a serious look at it to see what we can do to make 
it @safe. The problem is combining code that takes ref parameters with code 
that returns by ref. Take this code for example:

ref int foo(ref int i)
{
    return i;
}

ref int bar()
{
    int i = 7;
    return foo(i);
}

ref int baz(int i)
{
    return foo(i);
}

void main()
{
    auto a = bar();
    auto b = baz(5);
}

Both bar and baz return a ref to a local variable which no longer exists. They 
refer to garbage. It's exactly the same problem as in

int* foo(int* i)
{
    return i;
}

int* bar()
{
    int i = 7;
    return foo(&i);
}

void main()
{
    auto a = bar();
}

However, that code is considered @system, because it's taking the address of a 
local variable, whereas the code using ref is considered to be @safe. But it's 
just as unsafe as taking the address of the local variable is. Really, it's 
the same thing but with differing syntax.

The question is what to do about this. The most straightforward thing is to 
just make ref parameters @system, but that would be horrible. With that sort 
of restriction, a _lot_ of code suddenly won't be able to be @safe, and it 
affects const ref and auto ref and anything else along those lines, so whatever 
solution we come up with for having auto ref with non-templated functions will 
almost certainly have the problem, and once that works, I'd expect it to be 
used pretty much by default, making most D code @system, which would be a 
_big_ problem.

Another possibility is to make ref imply scope, but given the transitive 
nature of that, that could be _really_ annoying. Maybe it's the correct 
solution though.

Another possibility would be to make it so that functions with a ref parameter 
are only @system if they also return by ref (the lack of ability to have ref 
variables outside of parameters and return types saves us from such a ref 
being squirreled away somewhere). I don't know how good or bad an idea that 
is. It certainly reduces how much code using ref would have to be @system, but 
it might not be sufficient given how much stuff like std.algorithm uses auto 
ref for its return types.

And maybe another solution which I can't think of at the moment would be 
better. But my point is that we currently have a _major_ hole in SafeD thanks 
to the combination of ref parameters and ref return types, and we need to find 
a solution.

- Jonathan M Davis


Related: http://d.puremagic.com/issues/show_bug.cgi?id=8838
Dec 30 2012
next sibling parent reply "Daniel Kozak" <kozzi11 gmail.com> writes:
On Sunday, 30 December 2012 at 08:38:27 UTC, Jonathan M Davis 
wrote:
 After some recent discussions relating to auto ref and const 
 ref, I have come
 to the conclusion that as it stands, ref is not @safe. It's 
 @system. And I
 think that we need to take a serious look at it to see what we 
 can do to make
 it @safe. The problem is combining code that takes ref 
 parameters with code
 that returns by ref. Take this code for example:

 ref int foo(ref int i)
 {
     return i;
 }

 ref int bar()
 {
     int i = 7;
     return foo(i);
 }

 ref int baz(int i)
 {
     return foo(i);
 }

 void main()
 {
     auto a = bar();
     auto b = baz(5);
 }

 Both bar and baz return a ref to a local variable which no 
 longer exists. They
 refer to garbage. It's exactly the same problem as in

IMHO, trying to return a ref to a local variable should be an error, and such code shouldn't compile.
Dec 30 2012
parent reply Nick Treleaven <ntrel-public yahoo.co.uk> writes:
On 30/12/2012 09:17, Jonathan M Davis wrote:
 The problem is the wrapper function.
 You'd also have to disallow functions from returning ref parameters by ref.
 Otherwise,

 ref int foo(ref int i)
 {
      return i;
 }

 ref int baz(int i)
 {
      return foo(i);
 }

 continues to cause problems. And making it illegal to return ref parameters by
 ref would be a serious problem for wrapper ranges, because they do that sort
 of thing all the time with front. So, that's not really going to work.

I think the compiler needs to be able to mark foo as a function that returns its input reference. Then, any arguments to foo that are locals should cause an error at the call site (e.g. in baz). So legal calls to foo can always be @safe.

To extend the above code:

ref int quux(ref int i)
{
    return foo(i);
}

Here the compiler already knows that foo returns its input reference. So it checks whether foo is being passed a local - no; but it also has to check if foo is passed any ref parameters of quux, which it is. The compiler now has to mark quux as a function that returns its input reference.

Works?
Dec 30 2012
next sibling parent Michel Fortin <michel.fortin michelf.ca> writes:
On 2012-12-30 22:29:33 +0000, Jonathan M Davis <jmdavisProg gmx.com> said:

 Good point. Member variables of parameters also cause problems. So, it very
 quickly devolves to any function which accepts a user-defined type by ref and
 returns anything by ref would have to be @system, which is far from pleasant.

Note that the above definition includes every struct member function that returns a ref because of the implicit this parameter. Also, it's not just functions returning by ref, it could be a function returning a delegate too, if the delegate happens to make use of the reference:

void delegate() foo(ref int a)
{
    return { writeln(a); };
}

void delegate() bar()
{
    int a;
    return foo(a); // leaking reference to a beyond bar's scope
}

And similar to passing a value by ref: you can pass a slice to a static array, then return the slice:

int[] foo(int[] a)
{
    return a;
}

int[] bar()
{
    int[2] a;
    return foo(a[]);
}

Three variations on the same theme.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/
Dec 30 2012
prev sibling parent reply Nick Treleaven <ntrel-public yahoo.co.uk> writes:
On 30/12/2012 22:01, Jonathan M Davis wrote:
 On Sunday, December 30, 2012 17:32:40 Nick Treleaven wrote:
 I think the compiler needs to be able to mark foo as a function that
 returns its input reference. Then, any arguments to foo that are locals
 should cause an error at the call site (e.g. in baz). So legal calls to
 foo can always be @safe.

 To extend the above code:

 ref int quux(ref int i)
 {
       return foo(i);
 }

 Here the compiler already knows that foo returns its input reference. So
 it checks whether foo is being passed a local - no; but it also has to
 check if foo is passed any ref parameters of quux, which it is. The
 compiler now has to mark quux as a function that returns its input
 reference.

 Works?

No. There's no guarantee that the compiler has access to the function's body, and the function being called could be compiled after the function which calls it. There's a reason that attribute inference only works with templated functions. In every other case, the programmer has to mark it. We're _not_ going to get any kind of inference without templates. D's compilation model doesn't allow it.

I was aware attributes would be needed for .di files. I suppose attribute inference for non-template functions is not doable.
 The closest that we could get to what you suggest would be to add a new
 attribute similar to nothrow but which guarantees that the function does not
 return a ref to a parameter. So, you'd have to mark your functions that way
 (e.g. with @norefparamreturn). Maybe the compiler could infer it for templated
 ones, but this attribute would basically have to work like other inferred
 attributes and be marked manually in all other cases. Certainly, you can't
 have the compiler figuring it out for you in general, because D's compilation
 model allows the function being called to be compiled separately from (and
 potentially after) the function calling it.

As you suggested below that, I would have the attribute mean the opposite, refparamreturn. Functions that need it but don't have it can be detected by recompiling them. The syntax could be 'in ref':

in ref int quux(ref int i);
 And when you think about what this attribute would be needed for, it gets a
 bit bizarre to have it. The _only_ time that it's applicable is when a
 function takes an argument by ref and returns the same type by ref. In all
 other cases, the compiler can guarantee it just based on the type system.

As jerro and Michel Fortin pointed out, 'in ref' would be needed for returning input struct members, and capturing inputs with delegates and slices. So the feature might pull its weight. I think detecting all these situations would be essentially the same as the checks needed for a scope parameter, so if/when scope parameters get implemented, 'in ref' might not be hard to bolt on. If a simpler solution doesn't disallow any sensible uses of ref returns, that would be preferable. But I don't think we've found it yet.
 I suppose that we could have an attribute that indicated that a function _did_
 return a ref to one of its params and then have the compiler give an error if
 it were missing, which means that the foo function

 ref int foo(ref int i)
 {
      return i;
 }

 would end up with an error for not having the attribute, whereas a function
 like baz

 ref int baz(int i)
 {
      return foo(i);
 }

 would not end up with the error unless foo had the attribute on it. But that's
 very different from any attribute that we currently have. It would be like
 having a throw attribute instead of a nothrow attribute. I suppose that it is
 a possible solution though. I could also see an argument that the attribute
 should go on the parameter rather than the function, in which case you could
 have more fine-grained control over it, but it does complicate things further.

I suppose a parameter attribute might be useful to allow passing locals to other ref parameters which aren't returned, and as documentation.
Dec 31 2012
parent Nick Treleaven <ntrel-public yahoo.co.uk> writes:
On 31/12/2012 14:44, Nick Treleaven wrote:
 On 30/12/2012 22:01, Jonathan M Davis wrote:
 The closest that we could get to what you suggest would be to add a new
 attribute similar to nothrow but which guarantees that the function
 does not
 return a ref to a parameter. So, you'd have to mark your functions
 that way
 (e.g. with @norefparamreturn). Maybe the compiler could infer it for
 templated
 ones, but this attribute would basically have to work like other inferred
 attributes and be marked manually in all other cases. Certainly, you
 can't
 have the compiler figuring it out for you in general, because D's
 compilation
 model allows the function being called to be compiled separately from
 (and
 potentially after) the function calling it.

As you suggested below that, I would have the attribute mean the opposite, refparamreturn. Functions that need it but don't have it can be detected by recompiling them. The syntax could be 'in ref':

in ref int quux(ref int i);

Actually, overloading 'in ref' with a different meaning on the return type from the one it has on a parameter (const scope) is probably a bad idea. Instead:

ref int quux(@escape ref int i);

[...]
 I suppose a parameter attribute might be useful to allow passing locals
 to other ref parameters which aren't returned, and as documentation.
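
A minimal sketch of the parameter-level marking discussed above. It does not use the 'in ref' or escape spellings floated in this post; it assumes the `return ref` parameter attribute that, as far as I know, later D compilers provide (DIP25). The names foo, quux and global are illustrative only.

```d
// Sketch only: assumes a D compiler that supports `return ref` (DIP25).

// `return ref` declares that the result may refer to the parameter i.
ref int foo(return ref int i) @safe
{
    return i;
}

// A wrapper that forwards its own ref parameter carries the same marking.
ref int quux(return ref int i) @safe
{
    return foo(i);
}

int global = 1; // module-level storage outlives every call

void main() @safe
{
    quux(global) = 5;   // writes through the returned reference
    assert(global == 5);

    int local = 2;
    quux(local) += 1;   // fine: the reference is used while local is alive
    assert(local == 3);
    // Returning quux(local) by ref from a function is the case such a
    // marking is meant to let the compiler reject at the call site.
}
```

The marking sits on the parameter rather than on the function, so a function with several ref parameters can say which of them may be returned.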

Dec 31 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, December 30, 2012 10:04:01 Daniel Kozak wrote:
 IMHO, trying to return a ref to a local variable should be an error, and
 such code shouldn't compile.

You can disallow that in the easy case of

ref int boo(int i)
{
    return i;
}

and in fact, that's already illegal. The problem is the wrapper function. You'd also have to disallow functions from returning ref parameters by ref. Otherwise,

ref int foo(ref int i)
{
    return i;
}

ref int baz(int i)
{
    return foo(i);
}

continues to cause problems. And making it illegal to return ref parameters by ref would be a serious problem for wrapper ranges, because they do that sort of thing all the time with front. So, that's not really going to work.

- Jonathan M Davis
Dec 30 2012
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Sunday, 30 December 2012 at 09:18:30 UTC, Jonathan M Davis 
wrote:
 On Sunday, December 30, 2012 10:04:01 Daniel Kozak wrote:
 IMHO, trying to return a ref to a local variable should be an error, and
 such code shouldn't compile.

You can disallow that in the easy case of

ref int boo(int i)
{
    return i;
}

and in fact, that's already illegal. The problem is the wrapper function. You'd also have to disallow functions from returning ref parameters by ref. Otherwise,

ref int foo(ref int i)
{
    return i;
}

ref int baz(int i)
{
    return foo(i);
}

continues to cause problems. And making it illegal to return ref parameters by ref would be a serious problem for wrapper ranges, because they do that sort of thing all the time with front. So, that's not really going to work.

- Jonathan M Davis

Wouldn't it be enough to disallow functions that both take and return by ref? There would still be some limitations, but at least:

//----
@property ref T front(T)(T[] a);
//----

would still be @safe.

It seems the only code that is unsafe always boils down to taking an argument by ref and returning it by ref...

At best, we'd (try to) only make that illegal (when we can), or (seeing things the other (safer) way around), only allow returning by ref if the compiler is able to prove it is not also an input by ref?
Dec 30 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, December 30, 2012 11:04:35 monarch_dodra wrote:
 Wouldn't it be enough to disallow functions that both take and
 return by ref? There would still be some limitations, but at
 least:
 
 //----
 @property ref T front(T)(T[] a);
 //----
 Would still be @safe.
 
 It seems the only code that is unsafe always boils down to taking
 an argument by ref and returning it by ref...
 
 At best, we'd (try) to only make that illegal (when we can), or
 (seeing things the other (safer) way around), only allow
 returning by ref, if the compiler is able to prove it is not also
 an input by ref?

The question is whether that would be too limiting. Certainly, it risks being a big problem for wrapper functions, since they may _need_ to take an argument by ref and return it by ref (or more probably, auto ref for both, but that amounts to the same thing as far as this issue goes). We could go with making such functions @system rather than @safe, but I don't know how problematic that would be. We may have no choice though, since unless you can prove that the ref being passed in will stay valid as long as the ref being passed out is used, you can't prove that that code is safe.

- Jonathan M Davis
Dec 30 2012
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
12/30/2012 12:37 PM, Jonathan M Davis пишет:
 After some recent discussions relating to auto ref and const ref, I have come
 to the conclusion that as it stands, ref is not @safe. It's @system. And I
 think that we need to take a serious look at it to see what we can do to make
 it @safe. The problem is combining code that takes ref parameters with code
 that returns by ref. Take this code for example:

[snip]
 And maybe another solution which I can't think of at the moment would be
 better. But my point is that we currently have a _major_ hole in SafeD thanks
 to the combination of ref parameters and ref return types, and we need to find
 a solution.

 - Jonathan M Davis


 Related: http://d.puremagic.com/issues/show_bug.cgi?id=8838

And another one: http://d.puremagic.com/issues/show_bug.cgi?id=9195

-- 
Dmitry Olshansky
Dec 30 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Sunday, 30 December 2012 at 17:32:41 UTC, Nick Treleaven wrote:
 On 30/12/2012 09:17, Jonathan M Davis wrote:
 The problem is the wrapper function.
 You'd also have to disallow functions from returning ref 
 parameters by ref.
 Otherwise,

 ref int foo(ref int i)
 {
     return i;
 }

 ref int baz(int i)
 {
     return foo(i);
 }

 continues to cause problems. And making it illegal to return 
 ref parameters by
 ref would be a serious problem for wrapper ranges, because 
 they do that sort
 of thing all the time with front. So, that's not really going 
 to work.

I think the compiler needs to be able to mark foo as a function that returns its input reference. Then, any arguments to foo that are locals should cause an error at the call site (e.g. in baz). So legal calls to foo can always be @safe.

To extend the above code:

ref int quux(ref int i)
{
    return foo(i);
}

Here the compiler already knows that foo returns its input reference. So it checks whether foo is being passed a local - no; but it also has to check if foo is passed any ref parameters of quux, which it is. The compiler now has to mark quux as a function that returns its input reference.

Works?

That seems like a promising approach. If the compiler can track where the local is being passed by ref and returned by ref, then it should be able to determine if the ref to the local is leaving the scope it was originally conceived in and issue a compiler error if it is.

The idea of "tagging" the local so that it can be tracked may work well. You may still be able to hide it from the compiler using pointers, but at that point you're not @safe anymore, and that should be fine because all we want to do is allow returns by ref to be proven safe or not.

In general terms, no reference to a local should ever leave its scope, so ultimately the compiler *has* to track the scope of any local, no matter if it is being passed by ref or not, so really this is a solution that has to be implemented one way or the other.

--rt
Dec 30 2012
prev sibling next sibling parent "jerro" <a a.com> writes:
 Here the compiler already knows that foo returns its input 
 reference. So it checks whether foo is being passed a local - 
 no; but it also has to check if foo is passed any ref 
 parameters of quux, which it is. The compiler now has to mark 
 quux as a function that returns its input reference.

 Works?

If a function's source isn't available, the compiler can't know what the function does. This could only work if this property of a function (whether it returns a reference to its ref parameter) were part of its type. The compiler could still infer it for function literals and templates, similar to how pure works now.
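
To make the comparison with pure concrete, here is a small sketch of attribute inference as it already works for templates. The function names are made up for illustration; only the existing pure, nothrow and @safe attributes are used.

```d
// Template function: the compiler sees the body at instantiation and
// infers pure, nothrow and @safe for twice!int.
T twice(T)(T x)
{
    return x + x;
}

// Non-template function: nothing is inferred, so the attributes are
// written out by hand and become part of the declared signature.
int twicePlain(int x) pure nothrow @safe
{
    return x + x;
}

// This only compiles if both callees are pure, nothrow and @safe.
int useBoth(int x) pure nothrow @safe
{
    return twice(x) + twicePlain(x);
}

void main()
{
    assert(useBoth(2) == 8);
}
```

A "returns a reference to its parameter" property would presumably have to follow the same split: inferred where the body is always visible, spelled out in the signature everywhere else.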
Dec 30 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, December 30, 2012 17:32:40 Nick Treleaven wrote:
 I think the compiler needs to be able to mark foo as a function that
 returns its input reference. Then, any arguments to foo that are locals
 should cause an error at the call site (e.g. in baz). So legal calls to
 foo can always be @safe.
 
 To extend the above code:
 
 ref int quux(ref int i)
 {
      return foo(i);
 }
 
 Here the compiler already knows that foo returns its input reference. So
 it checks whether foo is being passed a local - no; but it also has to
 check if foo is passed any ref parameters of quux, which it is. The
 compiler now has to mark quux as a function that returns its input
 reference.
 
 Works?

No. There's no guarantee that the compiler has access to the function's body, and the function being called could be compiled after the function which calls it. There's a reason that attribute inference only works with templated functions. In every other case, the programmer has to mark it. We're _not_ going to get any kind of inference without templates. D's compilation model doesn't allow it.

The closest that we could get to what you suggest would be to add a new attribute similar to nothrow but which guarantees that the function does not return a ref to a parameter. So, you'd have to mark your functions that way (e.g. with @norefparamreturn). Maybe the compiler could infer it for templated ones, but this attribute would basically have to work like other inferred attributes and be marked manually in all other cases. Certainly, you can't have the compiler figuring it out for you in general, because D's compilation model allows the function being called to be compiled separately from (and potentially after) the function calling it.

And when you think about what this attribute would be needed for, it gets a bit bizarre to have it. The _only_ time that it's applicable is when a function takes an argument by ref and returns the same type by ref. In all other cases, the compiler can guarantee it just based on the type system.

I suppose that we could have an attribute that indicated that a function _did_ return a ref to one of its params and then have the compiler give an error if it were missing, which means that the foo function

ref int foo(ref int i)
{
    return i;
}

would end up with an error for not having the attribute, whereas a function like baz

ref int baz(int i)
{
    return foo(i);
}

would not end up with the error unless foo had the attribute on it. But that's very different from any attribute that we currently have. It would be like having a throw attribute instead of a nothrow attribute. I suppose that it is a possible solution though.

I could also see an argument that the attribute should go on the parameter rather than the function, in which case you could have more fine-grained control over it, but it does complicate things further.

Honestly though, I'm inclined to argue that functions which return by ref and have a ref parameter of that same type just be considered @system. It's just way simpler. It's also more in line with how pointers to locals are handled, though because ref is far more restrictive, it should be possible to come up with a different solution (like the attribute), whereas the fact that you can squirrel away pointers to things makes it rather complicated (if not impossible) to have a solution other than simply making taking the address of a local variable @system. You can't squirrel away ref.

- Jonathan M Davis
Dec 30 2012
prev sibling next sibling parent "jerro" <a a.com> writes:
 Honestly though, I'm inclined to argue that functions which 
 return by ref and
 have a ref parameter of that same type just be considered 
 @system.

What about this:

struct Foo
{
    int a;
}

ref int bar(ref Foo foo)
{
    return foo.a;
}

The parameter type and the return type here are different, but bar still returns a reference to its parameter. I guess you should consider all functions that return ref and have at least one ref parameter @system (unless they are marked @trusted).
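
The aliasing can be checked directly. This sketch is an assumption-laden variant of the example: the `return` marking on the parameter is not in the post and is only there so the code is accepted by newer compilers (DIP25).

```d
struct Foo
{
    int a;
}

// Parameter type (Foo) and return type (int) differ, yet the result
// still points into the argument.
ref int bar(return ref Foo foo) @safe
{
    return foo.a;
}

void main() @safe
{
    Foo f;
    bar(f) = 42;        // assigns through the reference to f.a
    assert(f.a == 42);
}
```

Because the write through bar's result lands in f.a, the result is only valid for as long as f is, which is why matching on parameter and return types alone cannot catch the escape.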
Dec 30 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, December 30, 2012 23:18:43 jerro wrote:
 Honestly though, I'm inclined to argue that functions which
 return by ref and
 have a ref parameter of that same type just be considered
 @system.

What about this:

struct Foo
{
    int a;
}

ref int bar(ref Foo foo)
{
    return foo.a;
}

The parameter type and the return type here are different, but bar still returns a reference to its parameter. I guess you should consider all functions that return ref and have at least one ref parameter @system (unless they are marked @trusted).

Good point. Member variables of parameters also cause problems. So, it very quickly devolves to any function which accepts a user-defined type by ref and returns anything by ref would have to be @system, which is far from pleasant.

- Jonathan M Davis
Dec 30 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Sunday, 30 December 2012 at 22:30:24 UTC, Jonathan M Davis 
wrote:
 On Sunday, December 30, 2012 23:18:43 jerro wrote:
 Honestly though, I'm inclined to argue that functions which
 return by ref and
 have a ref parameter of that same type just be considered
 @system.

What about this:

struct Foo
{
    int a;
}

ref int bar(ref Foo foo)
{
    return foo.a;
}

The parameter type and the return type here are different, but bar still returns a reference to its parameter. I guess you should consider all functions that return ref and have at least one ref parameter @system (unless they are marked @trusted).

Good point. Member variables of parameters also cause problems. So, it very quickly devolves to any function which accepts a user-defined type by ref and returns anything by ref would have to be @system, which is far from pleasant.

- Jonathan M Davis

This may be far-fetched, but consider this:

If a function returns a ref that is the same as what was passed in by ref, then the passed and return addresses would match, which means that it still may be possible for the compiler to detect the situation.

This is more complicated in the case where a user-defined struct was passed by ref, and the ref return type is a member from that struct, but it still may be possible to detect it.

If address matching is possible (or is that determined only at link time?) then it may be possible to detect a situation that should be illegal and flagged as a compiler error.

--rt
Dec 30 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, December 31, 2012 02:37:56 Rob T wrote:
 This may be far fetched, but consider this,
 
 If a function returns a ref that is the same as what was passed
 in by ref, then the passed and return addresses would match,
 which means that it still may be possible for the compiler to
 detect the situation.
 
 This is more complicated in the case where a user defined struct
 was passed by ref, and the ref return type is a member from that
 struct, but it still may be possible to detect it.
 
 If address matching is possible (or is that determined only at
 link time?) then it may be possible to detect a situation that
 should be illegal and flagged as a compiler error.

Addresses would only be known at runtime, and it's far too late at that point.

- Jonathan M Davis
Dec 30 2012
prev sibling next sibling parent "Carl Sturtivant" <sturtivant gmail.com> writes:
/*
The implementation of delegates has solved an analogous problem. 
e.g.
*/
import std.stdio;

auto getfun( int x) {
	int y = x*x;
	int ysquared() {
		return y*y;
	}
	return &ysquared;
}

void main() {
	auto f1 = getfun(2);
	auto f2 = getfun(3);
	writeln( f1() );
	writeln( f2() );
}
/*
The variable y no longer exists local to getfun, but its 
existence is prolonged making its use safe.

Just like _immutable_, some clever compiler inference that is 
transitive plus a delegate-like solution may do the job with ref 
without imposing constraints upon the sane. Plus, it may lead to 
new terrain when the liberation of local variables as above 
occurs in this context.

Warning: this is all speculation.
*/
Dec 30 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, December 31, 2012 03:34:18 Carl Sturtivant wrote:
 The implementation of delegates has solved an analogous problem.

Delegates use closures. The stack of the calling function is copied onto the heap so that it will continue to be valid for the delegate after the function returns. We don't want to be doing anything like that with ref, and it's generally completely unnecessary. It's just that there are cases where you can escape such references if you're not careful, and the compiler currently considers those to be @safe, when they're not actually safe. So, we presumably need to do one of

1. Limit what can be legally done with ref to get rid of the problem.

2. Make ref @system in cases where we can't prove that it's safe and try and prove it to be safe in as many situations as possible.

3. Create a new attribute which has to be used when a function returns a ref to a parameter and use that to make it illegal to pass a ref to a local variable to such functions.

4. Something else that similarly protects against this at compile time without any extra overhead at runtime.

I really don't think that something like closures would be acceptable for ref parameters.

- Jonathan M Davis
Dec 30 2012
prev sibling next sibling parent "Zach the Mystic" <reachBUTMINUSTHISzach gOOGLYmail.com> writes:
On Monday, 31 December 2012 at 02:47:46 UTC, Jonathan M Davis 
wrote:

 3. Create a new attribute which has to be used when a function 
 returns a ref
 to a parameter and use that to make it illegal to pass a ref to 
 a local
 variable to such functions.

If this is the way to go, maybe "@saferef" could double as both @safe and inoutref.

[OT] I've not been here for a while, but I've been reading up on the D boards again. I might want to help with the standard library lexer and parser. Happy New Year...
Dec 30 2012
prev sibling next sibling parent "Mehrdad" <wfunction hotmail.com> writes:
I don't understand why there is a discussion on trying to 
special-case ref parameters. There's nothing special about ref 
parameters... what's special is ref _returns_.


Therefore all we need to do is disallow ref returns in @safe code.
Dec 31 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Monday, 31 December 2012 at 21:25:53 UTC, Mehrdad wrote:
 I don't understand why there is a discussion on trying to 
 special-case ref parameters. There's nothing special about ref 
 parameters... what's special is ref _returns_.


 Therefore all we need to do is disallow ref returns in @safe 
 code.

Yes, but that will prevent a whole lot of perfectly safe code, including easily provable safe code, from being marked as @safe.

The real problem is the ability to return a temp as a ref by obfuscating the temp from the compiler through an intermediate wrapper of some kind.

Perhaps what must be disallowed (as being @safe) are ref returns where the return result cannot be proven to be safe from within the calling function, i.e. the wrapper may be safe, but the usage of the wrapper cannot be guaranteed to be safe when used as a ref return.

--rt
Dec 31 2012
prev sibling next sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Monday, 31 December 2012 at 23:39:35 UTC, Rob T wrote:
 ref returns where the return result cannot be proven to be safe

Halting problem?
Dec 31 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, December 31, 2012 22:25:52 Mehrdad wrote:
 I don't understand why there is a discussion on trying to
 special-case ref parameters. There's nothing special about ref
 parameters... what's special is ref _returns_.
 
 Therefore all we need to do is disallow ref returns in @safe code.

The problem is ranges. auto ref returns are _very_ common with the front and back of ranges (especially wrapper ranges). If returning by ref or auto ref automatically renders code @system, then we just made a _large_ portion of Phobos @system when most of it is actually perfectly safe.

We _might_ be able to say that a function is @system if it both takes an argument by ref and returns by ref, but even that is likely to be a problem unless we can statically prove _in most cases_ that there's no way that the ref being returned could be to any portion of any of the arguments passed by ref.

- Jonathan M Davis
Jan 01 2013
prev sibling next sibling parent reply "Maxim Fomin" <maxim maxim-fomin.ru> writes:
On Sunday, 30 December 2012 at 08:38:27 UTC, Jonathan M Davis 
wrote:
 After some recent discussions relating to auto ref and const 
 ref, I have come
 to the conclusion that as it stands, ref is not @safe. It's 
 @system.

This is not a surprise; I remember Andrei talking about it a year and a half ago.
 And I think that we need to take a serious look at it to see 
 what we can do to make
 it @safe. The problem is combining code that takes ref 
 parameters with code
 that returns by ref. Take this code for example:
 <skipped>

I have not come across any Bugzilla issue or forum thread where someone has fallen into this double ref trap. The only cases I remember are discussions noting that such a problem is possible. Requiring some new attribute or new keyword does not really help, because almost all D language constraints can be circumvented by low-level tricks. Inferring this trap is not always possible, as was mentioned here, because the compiler does not always have access to the function definition. I think it should not be fixed, but the compiler could probably issue a warning in circumstances where it can detect this situation.

By the way, there is another issue with ref - http://dpaste.dzfl.pl/928767a9 - which was discussed at least several months ago. Do you think this should also be fixed?
 But my point is that we currently have a _major_ hole in SafeD 
 thanks
 to the combination of ref parameters and ref return types, and 
 we need to find
 a solution.

 - Jonathan M Davis

I don't take D's @safety seriously because it can be easily hacked.
Jan 02 2013
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/2/13 5:21 PM, Maxim Fomin wrote:
 On Wednesday, 2 January 2013 at 19:37:51 UTC, Jonathan M Davis wrote:
 On Wednesday, January 02, 2013 13:45:32 Maxim Fomin wrote:
 I think it should not be fixed, but probably compiler may issue
 warning at some circumstances when it can realize this situation.

It's a hole in @safe. It must be fixed. That's not even vaguely up for discussion. The question is _how_ to fix it. Ideally, it would be fixed in a way that limits how much more code has to become @system.

I argue that @safety can be easily broken (not only by the example I provided) and there is no way to fix all holes, because D is a systems language and provides access to low-level features. @safe is good for warning about (not preventing) doing something wrong, but it cannot stop all safety breakages.

That is incorrect.

Andrei
Jan 02 2013
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, January 02, 2013 13:45:32 Maxim Fomin wrote:
 I think it should not be fixed, but probably compiler may issue
 warning at some circumstances when it can realize this situation.

It's a hole in @safe. It must be fixed. That's not even vaguely up for discussion. The question is _how_ to fix it. Ideally, it would be fixed in a way that limits how much more code has to become @system.
 By the way, there is another issue with ref -
 http://dpaste.dzfl.pl/928767a9 which was discussed at least several
 months ago. Do you think this should be also fixed?

It's not a bug. You're dereferencing a null pointer, so you get a segfault. There's nothing surprising there.
 I don't take D's @safety seriously because it can be easily
 hacked.

It's fine if you don't care about it, but as the maintainers of the language and standard library, we have to take it seriously. Regardless of the likelihood of there being a bug caused by this, it breaks @safe, so it must be fixed, even if that means simply making all functions which both accept by ref and return by ref @system. But that's very undesirable, because it will lead to too much code being considered @system even when it's perfectly safe. Hence why this is being discussed.

- Jonathan M Davis
Jan 02 2013
parent Artur Skawina <art.08.09 gmail.com> writes:
On 01/03/13 00:06, H. S. Teoh wrote:
 All extern(C) functions must be @system by default. It makes no sense to
 allow a @safe extern(C) function, since there is no way for the compiler
 to verify anything at all. The best you can do is @trusted.

extern(C) does not imply extern. And for extern functions -- @safe and @trusted are equivalent, unless this makes a difference for name mangling, which it a) shouldn't, b) already does not for the extern(C) case.

artur
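
A minimal sketch of the distinction drawn above, under stated assumptions: addChecked is a hypothetical name invented for illustration, and abs is the C library function, declared by hand here rather than imported. An extern(C) function that has a body is ordinary D code with C linkage, so the compiler can check @safe on it; a body-less declaration gives the compiler nothing to check, so whatever attribute is written on it is taken on faith.

```d
// extern(C) with a body: only the linkage (calling convention and symbol
// name) is C. The body is ordinary D, so @safe is verified by the compiler.
// (addChecked is a hypothetical name used only for this illustration.)
extern(C) int addChecked(int a, int b) @safe pure nothrow
{
    return a + b;
}

// extern(C) without a body: there is nothing for the compiler to verify,
// so the attributes on this declaration are a promise made by the programmer.
// This declares the C library's abs by hand.
extern(C) int abs(int x) @trusted pure nothrow @nogc;

void main() @safe
{
    assert(addChecked(2, 3) == 5); // checked D code with C linkage
    assert(abs(-7) == 7);          // trusted declaration of a C function
}
```

In the second case, writing @safe instead of @trusted would change nothing about what the compiler can actually verify.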
Jan 03 2013
prev sibling next sibling parent "Maxim Fomin" <maxim maxim-fomin.ru> writes:
On Wednesday, 2 January 2013 at 19:37:51 UTC, Jonathan M Davis 
wrote:
 On Wednesday, January 02, 2013 13:45:32 Maxim Fomin wrote:
 I think it should not be fixed, but probably compiler may issue
 warning at some circumstances when it can realize this 
 situation.

It's a hole in @safe. It must be fixed. That's not even vaguely up for discussion. The question is _how_ to fix it. Ideally, it would be fixed in a way that limits how much more code has to become @system.

I argue that @safety can be easily broken (not only by the example I provided) and there is no way to fix all holes, because D is a systems language and provides access to low-level features. @safe is good for warning about (not preventing) doing something wrong, but it cannot stop all safety breakages. Nor should it make plenty of code uncompilable just because some trick may cause a segfault. Actually, many things can cause segfaults, but they are not intended to be fixed.
 By the way, there is another issue with ref -
 http://dpaste.dzfl.pl/928767a9 which was discussed at least several 
 months ago. Do you think this should be also fixed?

It's not a bug. You're dereferencing a null pointer, so you get a segfault. There's nothing surprising there.

Consider a broader example where a function takes a pointer, does not check it for null, and later passes it on as a reference. This is similar to the double ref trick. Consider another example:

---main.d-----
extern(C) void foo() @safe pure nothrow;

void notThatSafe() @safe pure nothrow
{
    foo();
}

void main()
{
    notThatSafe();
}
----foo.d---
extern(C) void foo()
{
    throw new Exception("");
}
----------

So, pure, nothrow and @safe are effectively stripped off by separate compilation. Another example, which does not require a separate file: http://dpaste.dzfl.pl/f968cab5
 I don't take D's @safety seriously because it can be 
 easily
 hacked.

It's fine if you don't care about it, but as the maintainers of the language and standard library, we have to take it seriously. Regardless of the likelihood of there being a bug caused by this, it breaks @safe, so it must be fixed, even if that means simply making all functions which both accept by ref and return by ref @system. But that's very undesirable, because it will lead to too much code being considered @system even when it's perfectly safe. Hence why this is being discussed. - Jonathan M Davis

Again, I argue that D is a systems language and there are many possibilities to break @safety. Although fixing holes does make sense in general, it does not make sense to fix obvious issues in such a way that plenty of code becomes uncompilable and @safe usage becomes very annoying.
Jan 02 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, January 02, 2013 23:21:55 Maxim Fomin wrote:
 Again, I argue that D is a system language and there are many
 possibilities to break @safety. Although fixing holes does make
 sense in general, it does not make sense fixing obvious issues so
 that plenty of code becomes uncompilable and @safety usage
 becomes very annoying.

Then we're going to have to disagree, and I believe that Walter and Andrei are completely with me on this one. If all of the constructs that you use are @safe, then it should be _guaranteed_ that your program is memory-safe. That's what @safe is for. Yes, it can be gotten around if the programmer marks @system code as @trusted when it's not really memory-safe, but that's the programmer's problem. @safe is not doing its job and is completely pointless if it has any holes in it beyond programmers mislabeling functions as @trusted.

- Jonathan M Davis
Jan 02 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 02, 2013 at 05:52:54PM -0500, Jonathan M Davis wrote:
 On Wednesday, January 02, 2013 23:21:55 Maxim Fomin wrote:
 Again, I argue that D is a system language and there are many
 possibilities to break @safety. Although fixing holes does make
 sense in general, it does not make sense fixing obvious issues so
 that plenty of code becomes uncompilable and @safety usage becomes
 very annoying.

Then we're going to have to disagree, and I believe that Walter and Andrei are completely with me on this one. If all of the constructs that you use are @safe, then it should be _guaranteed_ that your program is memory-safe. That's what @safe is for. Yes, it can be gotten around if the programmer marks @system code as @trusted when it's not really memory-safe, but that's the programmer's problem. @safe is not doing its job and is completely pointless if it has any holes in it beyond programmers mislabeling functions as @trusted.

All extern(C) functions must be @system by default. It makes no sense to allow a @safe extern(C) function, since there is no way for the compiler to verify anything at all. The best you can do is @trusted.


T

-- 
Two American lawyers went down to the beach for a swim. Seeing a canoe rental nearby, one asked the other, "Roe, or Wade?"
Jan 02 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, January 02, 2013 15:06:24 H. S. Teoh wrote:
 All extern(C) functions must be @system by default. It makes no sense to
 allow a @safe extern(C) function, since there is no way for the compiler
 to verify anything at all. The best you can do is @trusted.

Agreed. And @trusted is seriously questionable.

- Jonathan M Davis
Jan 02 2013
prev sibling next sibling parent "Thiez" <thiezz gmail.com> writes:
On Wednesday, 2 January 2013 at 22:53:04 UTC, Jonathan M Davis 
wrote:
 Then we're going to have to disagree, and I believe that Walter 
 and Andrei are
 completely with me on this one. If all of the constructs that 
 you use are
 @safe, then it should be _guaranteed_ that your program is 
 memory-safe. That's
 what @safe is for. Yes, it can be gotten around if the 
 programmer marks
 @system code as @trusted when it's not really memory-safe, but 
 that's the
 programmer's problem. @safe is not doing its job and is 
 completely pointless
 if it has any holes in it beyond programmers mislabeling 
 functions as @trusted.
 - Jonathan M Davis

Perhaps it is worth looking at Rust for this problem? They have been looking pretty hard at the lifetimes of data/pointers and perhaps they have a (possibly partial) solution that can be used in the D compiler. It seems to me a ref in D has many things in common with Rust's borrowed pointers.
Jan 02 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jan 02, 2013 at 06:30:38PM -0500, Jonathan M Davis wrote:
 On Wednesday, January 02, 2013 15:06:24 H. S. Teoh wrote:
 All extern(C) functions must be @system by default. It makes no sense to
 allow a @safe extern(C) function, since there is no way for the compiler
 to verify anything at all. The best you can do is @trusted.

Agreed. And @trusted is seriously questionable.

We may not have a choice on that, if some Phobos code needs to call C in the backend but needs to expose a @safe interface. But yeah, if possible, we should prohibit @trusted on extern(C) functions as well.


T

-- 
Two American lawyers went down to the beach for a swim. Seeing a canoe rental nearby, one asked the other, "Roe, or Wade?"
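
The situation described above (D code that has to call C underneath but wants to present a @safe interface) might look roughly like the following sketch. cLength is a hypothetical wrapper name made up for illustration; strlen and toStringz are the existing declarations from core.stdc.string and std.string.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical wrapper: the C call is confined to one small @trusted
// function whose body a human has reviewed for memory safety.
size_t cLength(string s) @trusted
{
    // strlen requires a NUL-terminated buffer; toStringz guarantees one,
    // which is what justifies marking this wrapper @trusted.
    return strlen(toStringz(s));
}

void main() @safe
{
    // @safe callers only ever see the wrapper, never the raw C function.
    assert(cLength("hello") == 5);
    assert(cLength("") == 0);
}
```

Keeping the @trusted surface this small is what makes the manual review tractable: the only claim being trusted is that the pointer handed to strlen is NUL-terminated.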
Jan 02 2013
prev sibling next sibling parent reply "Jason House" <jason.james.house gmail.com> writes:
On Sunday, 30 December 2012 at 08:38:27 UTC, Jonathan M Davis 
wrote:
 After some recent discussions relating to auto ref and const 
 ref, I have come
 to the conclusion that as it stands, ref is not @safe. It's 
 @system. And I
 think that we need to take a serious look at it to see what we 
 can do to make
 it @safe. The problem is combining code that takes ref 
 parameters with code
 that returns by ref.

The best solution I can think of is for @safe code to require that a ref return value be treated with the same care as all of the function's input arguments. I'll try to annotate the example code you gave to explain.
 Take this code for example:

 ref int foo(ref int i)
 {
     return i;
 }

This function is valid. Ref input arguments can be returned.
 ref int bar()
 {
     int i = 7;
     return foo(i);
 }

If @safe, this code will not compile.
Error: foo may return a local stack variable
Since "i" is a local variable, "foo(i)" might return it.
 ref int baz(int i)
 {
     return foo(i);
 }

This function is fine. "i" is an input argument so "foo(i)" is considered to be equivalent to an input argument.
 void main()
 {
     auto a = bar();
     auto b = baz(5);
 }

Both function calls compile. The variable a could be returned. I'm not sure if b should be returnable by ref. If "5" is a manifest constant, it must be an error in @safe code. If it has a permanent address, it could be returned.
Jan 02 2013
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 01/03/2013 12:48 AM, Jason House wrote:
 ...

 ref int bar()
 {
     int i = 7;
     return foo(i);
 }

If @safe, this code will not compile.
Error: foo may return a local stack variable
Since "i" is a local variable, "foo(i)" might return it.
 ref int baz(int i)
 {
     return foo(i);
 }

This function is fine. "i" is an input argument so "foo(i)" is considered to be equivalent to an input argument.

Those two cases are pretty much the same.
Jan 02 2013
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 01/03/2013 01:52 PM, Jason House wrote:
 On Thursday, 3 January 2013 at 05:56:27 UTC, Timon Gehr wrote:
 On 01/03/2013 12:48 AM, Jason House wrote:
 ...

 ref int bar()
 {
    int i = 7;
    return foo(i);
 }

If @safe, this code will not compile.
Error: foo may return a local stack variable
Since "i" is a local variable, "foo(i)" might return it.
 ref int baz(int i)
 {
    return foo(i);
 }

This function is fine. "i" is an input argument so "foo(i)" is considered to be equivalent to an input argument.

Those two cases are pretty much the same.

If what I suggest is done, they must be differentiated. If you replace "return foo(i)" with "return i", the compiler will already issue an error for the local variable case.

Obviously _both_ examples result in memory corruption. i is not a ref parameter.
Jan 03 2013
prev sibling parent Sönke Ludwig <sludwig outerproduct.org> writes:
On 03.01.2013 00:48, Jason House wrote:
 On Sunday, 30 December 2012 at 08:38:27 UTC, Jonathan M Davis wrote:
 After some recent discussions relating to auto ref and const ref, I have come
 to the conclusion that as it stands, ref is not @safe. It's @system. And I
 think that we need to take a serious look at it to see what we can do to make
 it @safe. The problem is combining code that takes ref parameters with code
 that returns by ref.

The best solution I can think of is for @safe code to require that a ref return value be treated with the same care as all of the function's input arguments. I'll try to annotate the example code you gave to explain.

+1

In other words, references returned by a function call that took any references to locals would be tainted as possibly local (in the function local data flow) and thus are not allowed to escape the scope. References derived from non-local refs could still be returned and returning references to fields from a struct method also works.

---
@safe ref int test(ref int v) {
	return v; // fine
}

@safe ref int test2() {
	int local;
	return test(local); // error: (possibly) returning ref to local
}

@safe ref int test3() {
	int local;
	int* ptr = &test(local); // fine, ptr is tainted 'local'
	return *ptr; // error: (possibly) returning ref to local
}

@safe ref int test4(ref int val) {
	return test(val); // fine, can only be a ref to the external 'val' or to a global
}
---
Jan 03 2013
prev sibling next sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Wednesday, 2 January 2013 at 23:08:14 UTC, H. S. Teoh wrote:
 All extern(C) functions must be @system by default. It makes no 
 sense to
 allow a @safe extern(C) function, since there is no way for the 
 compiler
 to verify anything at all. The best you can do is @trusted.

@trusted shouldn't be a part of the function signature anyway, see http://forum.dlang.org/thread/blrglebkzhrilxkbprgh forum.dlang.org. Somebody up for creating a DIP on that?

David
Jan 02 2013
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jan 03, 2013 at 12:53:56AM +0100, David Nadlinger wrote:
 On Wednesday, 2 January 2013 at 23:08:14 UTC, H. S. Teoh wrote:
All extern(C) functions must be @system by default. It makes no sense
to allow a @safe extern(C) function, since there is no way for the
compiler to verify anything at all. The best you can do is @trusted.

@trusted shouldn't be a part of the function signature anyway, see http://forum.dlang.org/thread/blrglebkzhrilxkbprgh forum.dlang.org. Somebody up for creating a DIP on that?

Good point, @trusted is an implementation detail that should not pollute public APIs.


T

-- 
Live a century, learn a century, and you'll still die a fool.
Jan 02 2013
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Thursday, 3 January 2013 at 00:13:41 UTC, H. S. Teoh wrote:
 On Thu, Jan 03, 2013 at 12:53:56AM +0100, David Nadlinger wrote:
 On Wednesday, 2 January 2013 at 23:08:14 UTC, H. S. Teoh wrote:
All extern(C) functions must be @system by default. It makes 
no sense
to allow a @safe extern(C) function, since there is no way 
for the
compiler to verify anything at all. The best you can do is 
@trusted.

@trusted shouldn't be a part of the function signature anyway, see http://forum.dlang.org/thread/blrglebkzhrilxkbprgh forum.dlang.org. Somebody up for creating a DIP on that?

Good point, @trusted is an implementation detail that should not pollute public APIs. T

Reading through the discussion on the subject, I have to agree that @trusted should be used to mark implementation details and not the API.

--rt
Jan 02 2013
prev sibling next sibling parent "Jason House" <jason.james.house gmail.com> writes:
On Thursday, 3 January 2013 at 05:56:27 UTC, Timon Gehr wrote:
 On 01/03/2013 12:48 AM, Jason House wrote:
 ...

 ref int bar()
 {
    int i = 7;
    return foo(i);
 }

If @safe, this code will not compile.
Error: foo may return a local stack variable
Since "i" is a local variable, "foo(i)" might return it.
 ref int baz(int i)
 {
    return foo(i);
 }

This function is fine. "i" is an input argument so "foo(i)" is considered to be equivalent to an input argument.

Those two cases are pretty much the same.

If what I suggest is done, they must be differentiated. If you replace "return foo(i)" with "return i", the compiler will already issue an error for the local variable case.
Jan 03 2013
prev sibling next sibling parent "Zach the Mystic" <reachBUTMINUSTHISzach gOOGLYmail.com> writes:
On Sunday, 30 December 2012 at 08:38:27 UTC, Jonathan M Davis 
wrote:
 And maybe another solution which I can't think of at the moment 
 would be
 better. But my point is that we currently have a _major_ hole 
 in SafeD thanks
 to the combination of ref parameters and ref return types, and 
 we need to find
 a solution.

 - Jonathan M Davis


 Related: http://d.puremagic.com/issues/show_bug.cgi?id=8838

I've thought about how I think the attributes should work if D is forced to use them. This was the first system I came up with, but as you'll see below, the system can be simplified by ignoring @safe-ty altogether:

Two attributes: @saferef and @inoutref

// "@saferef" is semantically equivalent to "@safe @inoutref"
@saferef ref int fupz(ref int a)
{
    somethingUnsafe(); // Error
    return a; // Okay
}

// The same function won't work with just @safe
@safe ref int fuz(ref int a)
{
    return a; // Error: a @safe function which returns a reference to
              // a variable deriving from one of its parameters must be
              // marked @saferef
}

// Basic rule against using it when not necessary:
// a @saferef or @inoutref function must both accept and return a ref
@saferef int validate1(ref int a) { return a; } // Error
@inoutref ref int validate2(int a) { return a; } // Error

// @saferef's are chained by compiler enforcement:
@saferef ref int fonz(ref int a) { return a; }
@safe ref int frooz(ref int a)
{
    return fonz(a); // Error: a function which returns the result of one of
                    // its parameters being passed to a @saferef or @inoutref
                    // function must itself be marked @saferef or @inoutref
}

// The problem of escaping local variables:
@saferef ref int fonz(ref int a) { return a; }
ref int dollop()
{
    int local;
    return fonz(local); // Error: a function may not return the result of a
                        // local variable passed to a @saferef or an
                        // @inoutref function
}

// @inoutref may be used when you have otherwise un-safe code:
@inoutref ref int froes(ref int a)
{
    /+…some unsafe code…+/
    return a;
}
ref int f()
{
    int local;
    return froes(local); // Bug caught now even in @system code
}

// An enhancement: mark harmless parameters as @saferef
@saferef ref int twoParams(@saferef ref int a, ref int b)
{
    return a; // Error: a @saferef or @inoutref function may not return a
              // reference derived from a parameter marked @saferef
    return b; // Fine
}

// Only @saferef or @inoutref functions would be able to use @saferef parameters:
ref int zorf(@saferef ref int a, ref int b) {} // Error

So I typed all of that out and realized that a simpler alternative would be to ignore @safe altogether and have the @inoutref functionality be on by default. The only attribute now required would be @outref, which could be simplified to just "out" so long as it appeared *before* the parameter list, since it could be confused for an out contract if it came afterwards. So:

"@saferef" <=> "@safe @outref" is unnecessary because all functions are checked, not just @safe ones.

ref int lugs(ref int a)
{
    return a; // Okay
}

ref int h(ref int a)
{
    return lugs(a); // Okay
    int local;
    return lugs(local); // Error: may not return the result of a local variable
                        // passed to a function which both accepts and returns a
                        // ref unless that function is marked "@outref"
}

int d;
@outref ref int saml(ref int a)
{
    return *(new int); // Fine
    return d; // Fine
    return a; // Error: a function marked "@outref" may not return a reference
              // deriving from one of its parameters
}

ref int lugs(ref int a) { return a; }
@outref ref int druh(ref int a)
{
    return lugs(a); // Error: a function marked @outref may not return the result
                    // of one of its parameters being passed to a function unless
                    // that function is itself marked @outref
}

// Must both accept and return a reference
@outref int boops(ref int a) {} // Error
@outref ref int bop(int a) {} // Error

// Harmless parameters may be marked @trusted:
@outref ref int lit(@trusted ref int a, ref int b)
{
    return a; // Passes based on the honor system
    return b; // Error
}

The second system is much simpler, and it's only a little more computationally expensive than the first, since the signature of all functions called with local variables must be scanned for ref output and input, not just @safe ones.
Jan 03 2013
prev sibling next sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Sunday, 30 December 2012 at 08:38:27 UTC, Jonathan M Davis 
wrote:
 After some recent discussions relating to auto ref and const 
 ref, I have come
 to the conclusion that as it stands, ref is not @safe. It's 
 @system. And I
 think that we need to take a serious look at it to see what we 
 can do to make
 it @safe. The problem is combining code that takes ref 
 parameters with code
 that returns by ref. Take this code for example:

 ref int foo(ref int i)
 {
     return i;
 }

 ref int bar()
 {
     int i = 7;
     return foo(i);
 }

 ref int baz(int i)
 {
     return foo(i);
 }

 void main()
 {
     auto a = bar();
     auto b = baz(5);
 }

I must admit that I haven't read the rest of the thread yet, but I think the obvious and correct solution is to disallow passing locals (including non-ref parameters, which are effectively locals in D) as non-scope ref arguments. The scope attribute, once properly implemented, would make sure that the reference is not escaped. For now, we could just make it behave overly conservatively in @safe code.

David
Jan 03 2013
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Thursday, 3 January 2013 at 21:56:22 UTC, David Nadlinger 
wrote:
 I must admit that I haven't read the rest of the thread yet, 
 but I think the obvious and correct solution is to disallow 
 passing locals (including non-ref parameters, which are 
 effectively locals in D) as non-scope ref arguments.

The problem with that idea is that a ref return function with no arguments may call another ref return function that returns something that escapes the scope it was created in. If the source code is not available, then there's no way for the compiler to determine that this is going on. I would suggest disallowing all ref returns that make use of a ref return function call *unless* the code portion is marked as @trusted, and to do that requires following the ideas presented for changing how @trusted should be implemented, i.e. allowing selected portions of otherwise unsafe code to be marked as trusted by a programmer who has verified the use of the code to be safe given the context.
 The scope attribute, once properly implemented, would make sure 
 that the reference is not escaped. For now, we could just make 
 it behave overly conservative in @safe code.

 David

My understanding was that in some cases the source code is not available to the compiler, which I would think means that preventing scope escaping cannot be 100% guaranteed, correct?

--rt
Jan 03 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Thursday, 3 January 2013 at 21:56:22 UTC, David Nadlinger 
wrote:
 I must admit that I haven't read the rest of the thread yet, 
 but I think the obvious and correct solution is to disallow 
 passing locals (including non-ref parameters, which are 
 effectively locals in D) as non-scope ref arguments.

 The scope attribute, once properly implemented, would make sure 
 that the reference is not escaped. For now, we could just make 
 it behave overly conservative in @safe code.

This seems to me like the sane thing to do.
Jan 03 2013
prev sibling next sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Thursday, 3 January 2013 at 22:50:38 UTC, Rob T wrote:
 On Thursday, 3 January 2013 at 21:56:22 UTC, David Nadlinger 
 wrote:
 I must admit that I haven't read the rest of the thread yet, 
 but I think the obvious and correct solution is to disallow 
 passing locals (including non-ref parameters, which are 
 effectively locals in D) as non-scope ref arguments.

The problem with that idea, is that a ref return with no arguments may call another ref return that returns something that escapes the scope it was created in. If the source code is not available, then there's no way for the compiler to determine that this is going on.

I am not quite sure what you are trying to say. If the compiler never sees the source code for the functions, then codegen is going to be difficult. ;)

Yes, if you just see "void iPromiseNotToEscapeMyParameter(scope ref int a) @safe;", then there is no way to directly check that the function actually does not leak the parameter address. However, you can be sure that the compiler checked that when generating the code for the function.

David
Jan 03 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Thursday, 3 January 2013 at 22:50:38 UTC, Rob T wrote:
 The problem with that idea, is that a ref return with no 
 arguments may call another ref return that returns something 
 that escapes the scope it was created in. If the source code is 
 not available, then there's no way for the compiler to 
 determine that this is going on.

You can't return a scope ref in @safe code, so that is not an issue.
 I would suggest to disallow all ref returns that make use of a 
 ref return function call *unless* the code portion is marked as 
 @trusted, and to do that requires following the ideas presented 
 for changing how @trusted should be implemented, ie allowing 
 selected portions of otherwise unsafe code to be marked as 
 trusted by a programmer who has verified the use of the code to 
 be safe given the context.

 The scope attribute, once properly implemented, would make 
 sure that the reference is not escaped. For now, we could just 
 make it behave overly conservative in @safe code.

 David

My understanding was that in some cases that source code is not available to the compiler, which I would think means that preventing scope escaping cannot be 100% guaranteed, correct?

This is why the scope qualifier exists.
Jan 03 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Thursday, January 03, 2013 23:50:37 Rob T wrote:
 My understanding was that in some cases that source code is not
 available to the compiler, which I would think means that
 preventing scope escaping cannot be 100% guaranteed, correct?

The source code is always available when compiling the function itself. So (assuming that scope is fully implemented - which it's not right now), the compiler will be able to verify that a scope parameter does not escape the function when it compiles that function.

What doesn't work is inferring function attributes at the call site, because that requires that the full code be available at the call site. And that's not necessarily true unless you're dealing with a templated function (which is part of why attribute inference only works with templated functions). But as long as you're talking about stuff that can be verified when the function itself is compiled, then the fact that the source code isn't necessarily available to the caller isn't an issue.

- Jonathan M Davis
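
As a rough illustration of the difference described above, here is a minimal sketch; twice, twiceInt and useBoth are hypothetical names chosen for this example. The template carries no annotations and relies on inference, which works because a template's body is always visible where it is instantiated. The non-template function states its attributes in its signature, and they are checked once, when its own body is compiled.

```d
// Templated function: no attributes are written, but the body is visible
// at every instantiation, so @safe, pure and nothrow are inferred for
// instantiations where they hold (such as twice!int).
T twice(T)(T x)
{
    return x + x;
}

// Non-templated function: the attributes are part of the signature and are
// verified against this body. Callers need only the signature.
int twiceInt(int x) @safe pure nothrow
{
    return x + x;
}

// This compiles only if both callees are usable from @safe pure nothrow code.
int useBoth() @safe pure nothrow
{
    return twice(3) + twiceInt(4);
}

void main()
{
    assert(twice(3) == 6);
    assert(twiceInt(4) == 8);
    assert(useBoth() == 14);
}
```

If twice were a non-template function without annotations, the call inside useBoth would be rejected, since nothing would be inferred for it.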
Jan 03 2013
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
OK, I understand what you mean by "scope" and how that can be 
used to prevent leaking a local ref out.

Don't forget to consider this kind of scenario, which has no ref 
arguments to consider

struct X
{
    int _i;
    ref int f()
    {
       return _i;
    }
}

ref int F()
{

    X x;
    return x.f();
}

int main()
{
     // example uses that currently compile
     F = 1000;
     writeln(F());
}

Is this valid? Does local x remain defined up until the function 
call terminates completely, ie until after the reference is no 
longer valid?

I can also mark everything as @safe and it will compile, and also 
scope x

@safe ref int F()
{

    scope X x;
    return x.f();

    // this compiles too

    return x._i;

}

--rt
Jan 03 2013
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Thursday, 3 January 2013 at 23:06:03 UTC, David Nadlinger 
wrote:
 The problem with that idea, is that a ref return with no 
 arguments may call another ref return that returns something 
 that escapes the scope it was created in. If the source code 
 is not available, then there's no way for the compiler to 
 determine that this is going on.

I am not quite sure what you are trying to say. If the compiler never sees the source code for the functions, then codegen is going to be difficult. ;)

See my post directly above this one that has an example. --rt
Jan 03 2013
prev sibling next sibling parent "Araq" <rumpf_a gmx.de> writes:
On Wednesday, 2 January 2013 at 23:33:16 UTC, Thiez wrote:
 On Wednesday, 2 January 2013 at 22:53:04 UTC, Jonathan M Davis 
 wrote:
 Then we're going to have to disagree, and I believe that 
 Walter and Andrei are
 completely with me on this one. If all of the constructs that 
 you use are
 @safe, then it should be _guaranteed_ that your program is 
 memory-safe. That's
 what @safe is for. Yes, it can be gotten around if the 
 programmer marks
 @system code as @trusted when it's not really memory-safe, but 
 that's the
 programmer's problem. @safe is not doing its job and is 
 completely pointless
 if it has any holes in it beyond programmers mislabeling 
 functions as @trusted.
 - Jonathan M Davis

Perhaps it is worth looking at Rust for this problem?

You can also look at how Algol solved this over 40 years ago: insert a runtime check that the escaping reference does not point to the current stack frame, which is about to be destroyed. The check should be very cheap at runtime, but it can be deactivated in a release build for efficiency, just like it is done for array indexing.

FYI Nimrod has the same problem and it's planned to prevent these cases statically with a type-based alias analysis; however, at least the first versions will still keep the dynamic check, as these kinds of static analyses cry out for correctness proofs IMO.
Jan 03 2013
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Friday, 4 January 2013 at 00:46:33 UTC, Araq wrote:
 On Wednesday, 2 January 2013 at 23:33:16 UTC, Thiez wrote:
 On Wednesday, 2 January 2013 at 22:53:04 UTC, Jonathan M Davis 
 wrote:
 Then we're going to have to disagree, and I believe that 
 Walter and Andrei are
 completely with me on this one. If all of the constructs that 
 you use are
  safe, then it should be _guaranteed_ that your program is 
 memory-safe. That's
 what  safe is for. Yes, it can be gotten around if the 
 programmer marks
  system code as  trusted when it's not really memory-safe, 
 but that's the
 programmer's problem.  safe is not doing its job and is 
 completely pointless
 if it has any holes in it beyond programmers mislabeling 
 functions as  trusted.
 - Jonathan M Davis

Perhaps it is worth looking at Rust for this problem?

You can also look at how Algol solved this over 40 years ago: Insert a runtime check that the escaping reference does not point to the current stack frame, which is about to be destroyed. The check should be very cheap at runtime, but it can be deactivated in a release build for efficiency, just like it is done for array indexing. FYI, Nimrod has the same problem, and the plan is to prevent these cases statically with a type-based alias analysis; however, at least the first versions will still keep the dynamic check, as these kinds of static analyses cry for correctness proofs IMO.

I did suggest something like that, and it may be a good idea to implement it as a debugging aid (like runtime range checking). I wonder how difficult it would be to implement? Unfortunately, it does not help with the @safe compile-time checks. --rt
Jan 03 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Friday, 4 January 2013 at 06:30:55 UTC, Sönke Ludwig wrote:
 In other words, references returned by a function call that 
 took any references to locals would be
 tainted as possibly local (in the function local data flow) and 
 thus are not allowed to escape the
 scope. References derived from non-local refs could still be 
 returned and returning references to
 fields from a struct method also works.

 ---
  safe ref int test(ref int v) {
 	return v; // fine
 }

v should be scope here. If not, other functions have no guarantee that the reference will not escape.

  safe ref int test2() {
 	int local;
 	return test(local); // error: (possibly) returning ref to local
 }

  safe ref int test3() {
 	int local;
 	int* ptr = &test(local); // fine, ptr is tainted 'local'
 	return *ptr; // error: (possibly) returning ref to local
 }

  safe ref int test4(ref int val) {
 	return test(val); // fine, can only be a ref to the external 
 'val' or to a global
 }
 ---

Given the modification mentioned above, this looks like the way to go.
Jan 04 2013
prev sibling next sibling parent "Tommi" <tommitissari hotmail.com> writes:
On Friday, 4 January 2013 at 06:30:55 UTC, Sönke Ludwig wrote:
 In other words, references returned by a function call that 
 took any references to locals would be
 tainted as possibly local (in the function local data flow) and 
 thus are not allowed to escape the
 scope. References derived from non-local refs could still be 
 returned and returning references to
 fields from a struct method also works.

 ---
  safe ref int test(ref int v) {
 	return v; // fine
 }

  safe ref int test2() {
 	int local;
 	return test(local); // error: (possibly) returning ref to local
 }

  safe ref int test3() {
 	int local;
 	int* ptr = &test(local); // fine, ptr is tainted 'local'
 	return *ptr; // error: (possibly) returning ref to local
 }

  safe ref int test4(ref int val) {
 	return test(val); // fine, can only be a ref to the external 
 'val' or to a global
 }
 ---

Trying to say that formally:

Definitions:

'Tainter function':
    A function that:
    1. takes at least one of its parameters by reference and
    2. returns by reference

'Tainting function call':
    A call to a 'tainter function' where at least one of the
    arguments passed by reference is ref to a local variable

Then the rules become:

Function may not return a reference to:
    Rule 1: a function-local variable
    Rule 2: a value returned by a 'tainting function call'

@safe:

ref int tfun(ref int v) { // tfun tagged 'tainter function'
    ...
}

ref int test1() {
    int local;
    return local; // error by Rule 1
}

ref int test2() {
    int local;
    return tfun(local); // error by Rule 2
}

ref int test3() {
    int local;
    int* ptr = &tfun(local); // ptr tagged 'local'
    return *ptr; // error by Rule 2
}

ref int test4(ref int val) {
    return tfun(val); // fine
}

int global;

ref int test5() {
    int local;
    int* ptr = &tfun(local); // ptr tagged 'local'
    ptr = &global; // ptr's 'local' tag removed
    return *ptr; // fine
}
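
The two rules can also be read as a small classification over where a reference may point. The following is only a rough sketch of that reading, written in C++ rather than D, and it is not how any D compiler implements the check. The names Origin, RefArg, callResult and mayReturnByRef are invented here for illustration.

```cpp
#include <vector>

// Hypothetical classification of where a reference may point.
enum class Origin { NonLocal, Local };

// Hypothetical record for one argument passed by reference at a call site.
struct RefArg {
    Origin origin;
};

// Classifies the result of a call to a function that returns by reference.
// calleeTakesRef says whether the callee has at least one by-reference
// parameter, i.e. whether it is a 'tainter function' in the sense above.
Origin callResult(bool calleeTakesRef, const std::vector<RefArg>& refArgs) {
    if (!calleeTakesRef) {
        return Origin::NonLocal; // nothing of the caller's can be handed back
    }
    for (const RefArg& arg : refArgs) {
        if (arg.origin == Origin::Local) {
            return Origin::Local; // this is a 'tainting function call'
        }
    }
    return Origin::NonLocal;
}

// Rules 1 and 2 combined: only non-local origins may be returned by ref.
bool mayReturnByRef(Origin origin) {
    return origin == Origin::NonLocal;
}
```

Under this model, test2 corresponds to callResult(true, {RefArg{Origin::Local}}), which is rejected, while test4 passes an argument of NonLocal origin and is accepted. The model is deliberately conservative: it rejects a call even when the callee never actually returns the local it was given.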
Jan 04 2013
prev sibling next sibling parent "Tommi" <tommitissari hotmail.com> writes:
On Friday, 4 January 2013 at 14:15:01 UTC, Tommi wrote:
 'Tainting function call':
     A call to a 'tainter function' where at least one of the
     arguments passed by reference is ref to a local variable

I forgot to point out that the return value of a 'tainting function call' is considered to be a "reference to a function-local variable" (even if it's not in reality).
Jan 04 2013
prev sibling next sibling parent "Zach the Mystic" <reachBUTMINUSTHISzach gOOGLYmail.com> writes:
On Sunday, 30 December 2012 at 22:02:16 UTC, Jonathan M Davis 
wrote:
 The closest that we could get to what you suggest would be to 
 add a new
 attribute similar to nothrow but which guarantees that the 
 function does not
 return a ref to a parameter. So, you'd have to mark your 
 functions that way
 (e.g. with  norefparamreturn). Maybe the compiler could infer 
 it for templated
 ones, but this attribute would basically have to work like 
 other inferred
 attributes and be marked manually in all other cases. 
 Certainly, you can't
 have the compiler figuring it out for you in general, because 
 D's compilation
 model allows the function being called to be compiled 
 separately from (and
 potentially after) the function calling it.

 And when you think about what this attribute would be needed 
 for, it gets a
 bit bizarre to have it. The _only_ time that it's applicable is 
 when a
 function takes an argument by ref and returns the same type by 
 ref. In all
 other cases, the compiler can guarantee it just based on the 
 type system.

I realized just now that it's also applicable to member functions:

struct F
{
    int _i;
    ref int ser() { return _i; } // Needs to be marked as well
}

A struct's fields are implicit parameters in anything it returns.

 Honestly though, I'm inclined to argue that functions which 
 return by ref and
 have a ref parameter of that same type just be considered 
  system.

Structs mess that up as well:

struct S { int i; }
ref int d(ref S s) { return s.i; }
Jan 04 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, January 04, 2013 17:26:59 Zach the Mystic wrote:
 Honestly though, I'm inclined to argue that functions which
 return by ref and
 have a ref parameter of that same type just be considered
  system.

Structs mess that up as well:

struct S { int i; }
ref int d(ref S s) { return s.i; }

Yes. That's a function which takes a ref and returns by ref just like I said. It's just that in this case, the ref returned isn't the full object that was passed by ref but just a portion of it. What that means is that you can't assume that the ref being returned is safe just because the type of the parameter and the return type aren't the same. But it doesn't change the statement that a function which takes a parameter by ref and returns by ref can't be considered @safe without additional constraints of some kind. It just shows why you don't have an easy way out to make many of them @safe based on the differing types involved.

- Jonathan M Davis
Jan 04 2013
prev sibling next sibling parent "Zach the Mystic" <reachBUTMINUSTHISzach gOOGLYmail.com> writes:
On Friday, 4 January 2013 at 20:20:08 UTC, Jonathan M Davis wrote:
 ... But it doesn't change the
 statement that a function which takes a parameter by ref and 
 returns by ref
 can't be considered  safe without additional constraints of 
 some kind. It just
 shows why you don't have an easy way out to make many of them 
  safe based on
 the differing types involved.

Well, I've been working on just that. I'll have it for you tomorrow, I think.
Jan 04 2013
prev sibling next sibling parent "Zach the Mystic" <reachBUTMINUSTHISzach gOOGLYmail.com> writes:
Here I've formalized how I think the constraints on a non-scope 
ref-taking and ref-returning function should work. This 
represents a whole addition to the type system. The attribute 
"@outref" from my previous post has been shortened to the keyword 
"out" (must come before parentheses). This is all I have left to 
say about this topic:

ref int lugs(ref int a) { return a; }
ref int h(ref int a)
{
   return lugs(a); // Okay

   int local;
   return lugs(local); // Error: the result of a function which
   // accepts a local as non-scope ref and returns ref is treated as
   // local and cannot be escaped unless that function is marked "out"

   int* p = &lugs(local); // Same error
}

int d;
out ref int saml(ref int a)
{
   return *(new int); // Fine
   return d; // Fine

   return a; // Error: a function marked "out" may not escape a
   // non-scope ref parameter
}

ref int lugh(ref int a) { return a; }
out ref int druh(ref int a)
{
   return lugh(a); // Error: a function marked "out" may not
   // escape the result of a function which accepts its non-scope ref
   // parameter and returns a ref unless that function is also marked
   // "out"
}

out int boops(ref int a) {} // Error: a function marked "out"
// must return a reference
out ref int bop(int a, in ref b, scope ref c) {} // Error: a
// non-member function marked "out" must accept at least one
// non-scope ref parameter

// "cast(out)" provides all needed flexibility:
out ref int lit(ref int a)
{
   return cast(out) a; // Not @safe

   // But with @trusted blocks, we could do:
   @trusted { return cast(out) a; } // @safe

   // And with @trusted statements, the brackets are gone:
   @trusted return cast(out) a; // @safe

   // Otherwise, this function must be marked "@trusted"
}

// You can use cast(out) anywhere:
ref int hugs(ref int a) { return a; }
ref int g(ref int a)
{
   int local;
   return cast(out) (hugs(local)); // Okay
   return cast(out) local; // Okay??
   return hugs(cast(out) local); // Won't know what hit 'em
}

// Nor did I forget about structs:
struct S
{
   int _i;
   static int _s;
   out ref int club() { return _i; } // Error: a member function
   // marked "out" may not escape a non-static field
   out ref int trob() { return _s; } // Okay
   out ref int blub() { return cast(out) _i; } // Okay
}

struct B { int _i; ref int snub() { return _i; } }
ref int bub()
{
   B b;
   return b.snub(); // Error: the result of a local instance's
   // non-static method which returns ref is considered local and may
   // not be escaped unless that method is marked "out"

   int* i = &b.snub(); // Same error
}
Jan 05 2013
prev sibling next sibling parent "comco" <void.unsigned gmail.com> writes:
On Friday, 4 January 2013 at 20:20:08 UTC, Jonathan M Davis wrote:
 On Friday, January 04, 2013 17:26:59 Zach the Mystic wrote:
 Honestly though, I'm inclined to argue that functions which
 return by ref and
 have a ref parameter of that same type just be considered
  system.

Structs mess that up as well:

struct S { int i; }
ref int d(ref S s) { return s.i; }

Yes. That's a function which takes a ref and returns by ref just like I said. It's just that in this case, the ref returned isn't the full object that was passed by ref but just a portion of it. What that means is that you can't assume that the ref being returned is safe just because the type of the parameter and the return type aren't the same. But it doesn't change the statement that a function which takes a parameter by ref and returns by ref can't be considered @safe without additional constraints of some kind. It just shows why you don't have an easy way out to make many of them @safe based on the differing types involved.

- Jonathan M Davis

Why won't this work?

1. If the function code is available at compile time, we can check for escaping locals.
2. Otherwise, we want to statically say to the compiler that the returned ref is safe exactly in those lines in which the particular function argument, from which the ref has been extracted, has not yet gone out of scope. So the returned ref safety guarantee tracks the argument safety guarantee. Something like this:

ref int f(@infer_safe_from ref int a);

With such an annotation on an argument, the compiler will be able to infer the safety of a function when used in cases in which a is in scope whenever the returned ref is referenced. Now, if f is used in this manner:

{
Jan 06 2013
prev sibling next sibling parent "Zach the Mystic" <reachBUTMINUSTHISzach gOOGLYmail.com> writes:
I felt confident enough about my proposal to submit it as an 
enhancement request:

http://d.puremagic.com/issues/show_bug.cgi?id=9283
Jan 08 2013
prev sibling next sibling parent "Tommi" <tommitissari hotmail.com> writes:
On Wednesday, 9 January 2013 at 04:33:21 UTC, Zach the Mystic 
wrote:
 I felt confident enough about my proposal to submit it as 
 enhancement request:

 http://d.puremagic.com/issues/show_bug.cgi?id=9283

I like it. One issue though, like you also indicated by putting question marks on it:

ref T get(T)()
{
    T local;
    return cast(out) local; // This shouldn't compile
}

Because, wouldn't returning a local variable as a reference be a dangling reference in all cases? No matter if the programmer claims it's correct by saying cast(out)... it just can't be correct. And T can be a type that has reference semantics or value semantics, it doesn't matter. That function would always return a dangling reference, were it allowed to compile.
Jan 09 2013
prev sibling next sibling parent "Tommi" <tommitissari hotmail.com> writes:
On Wednesday, 9 January 2013 at 04:33:21 UTC, Zach the Mystic 
wrote:
 I felt confident enough about my proposal to submit it as 
 enhancement request:

 http://d.puremagic.com/issues/show_bug.cgi?id=9283

By the way, what do you propose is the correct placement of this "new" out keyword:

#1: out ref int get(ref int a);
#2: ref out int get(ref int a);
#3: ref int get(ref int a) out;

I wouldn't allow #3.
Jan 09 2013
prev sibling next sibling parent "comco" <void.unsigned gmail.com> writes:
On Sunday, 30 December 2012 at 22:02:16 UTC, Jonathan M Davis 
wrote:
  But that's
 very different from any attribute that we currently have. It 
 would be like
 having a throw attribute instead of a nothrow attribute. I 
 suppose that it is
 a possible solution though. I could also see an argument that 
 the attribute
 should go on the parameter rather than the function, in which 
 case you could
 have more fine-grained control over it, but it does complicate 
 things further.

I think this is the most reasonable thing to do, and I would argue that the complications are not a valid argument against it. I came up with roughly the same idea some days ago. Comparing this with nothrow is a nice point, but I don't see it as an argument against it. This is the most logical thing to do, and it solves problems.

So, the general notion that we want to (statically) express is that the ref result of a function __can__ depend on one (or more - depending on a condition for example) ref function parameters. Now, if the result is used when all the annotated arguments are still in scope, that usage can be considered safe. So a function declaration will (conceptually) look like this:

ref int min(@result_tracks_scope_of ref int a, @result_tracks_scope_of ref int b)
{
    return a < b ? a : b;
}

Now the interface itself provides enough information to infer when a usage is safe and when it is not. This will work equally well when we refer to members of the ref parameters.

ref int a(@result_tracks_scope_of A a)
{
    return a.la.bala;
}

The crucial thing is that the compiler can simply infer these attributes when the implementation is available, so we won't have to issue errors when the user has not added them (if the code is available).
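
To make the "result tracks the scope of its arguments" idea concrete, here is a rough C++ analogue of the min example. It is only a sketch: C++ has no such attribute, so the constraint lives in comments, and the names minRef and safeUse are made up for illustration.

```cpp
// The result aliases one of the two arguments, so it is only valid
// while both arguments are still in scope.
int& minRef(int& a, int& b) {
    return a < b ? a : b;
}

// A use that the proposed annotation would accept: x and y outlive
// every use of the reference returned by minRef.
int safeUse() {
    int x = 3;
    int y = 5;
    int& m = minRef(x, y); // m aliases x, the smaller of the two
    m = 10;                // writes through to x
    return x;              // returned by value, so nothing escapes
}
```

The unsafe counterpart would be a function that declares x and y locally and returns minRef(x, y) by reference. That is the case the annotation is meant to let the compiler reject from the signature alone.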
Jan 09 2013
prev sibling next sibling parent "Tommi" <tommitissari hotmail.com> writes:
On Thursday, 3 January 2013 at 21:56:22 UTC, David Nadlinger 
wrote:
 I must admit that I haven't read the rest of the thread yet, 
 but I think the obvious and correct solution is to disallow 
 passing locals (including non-ref parameters, which are 
 effectively locals in D) as non-scope ref arguments.

 The scope attribute, once properly implemented, would make sure 
 that the reference is not escaped. For now, we could just make 
 it behave overly conservative in  safe code.

 David

If you disallow passing local variables as non-scope ref arguments, then you effectively disallow all method calls on local variables. My reasoning is as follows:

struct T
{
    int get(int v) const;
    void set(int v);
}

Those methods of T can be thought of as free functions with these signatures:

int get(ref const T obj, int v);
void set(ref T obj, int v);

And these kinds of method calls:

T obj;
int n = obj.get(v);
obj.set(n);

...can be thought of as being converted to these free function calls:

T obj;
int n = .get(obj, v);
.set(obj, n);

I don't know what the compiler does or doesn't do, but it is *as_if* the compiler did this conversion from method calls to free functions.

Now it's obvious, given those free function signatures, that if you disallow passing function-local variables as non-scope references, you also disallow this code:

void func()
{
    T obj;
    obj.set(123);
}

Because that would effectively be the same as:

void func()
{
    T obj; // obj is a local variable
    .set(obj, 123); // obj is passed as non-scope ref
}

Then, you might ask, why don't those methods of T correspond to these free function signatures:

int get(scope ref const T obj, int v);
void set(scope ref T obj, int v);

And the answer is obviously that it would prevent these kinds of methods:

struct T
{
    int v;

    ref T increment()
    {
        v++;
        return this;
    }
}

...because that would then convert to this free function signature:

ref T increment(scope ref T obj)
{
    obj.v++;
    return obj; // Can't return a reference to a scope argument
}
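
The same method-versus-free-function correspondence can be sketched in C++, where the implicit object is reached through this. This is only an illustration of the analogy, not a statement about what the D compiler does, and incrementFree is a name invented for the sketch.

```cpp
struct T {
    int v = 0;

    // Member form: the object is an implicit by-reference parameter,
    // and the returned reference aliases it.
    T& increment() {
        ++v;
        return *this;
    }
};

// Free-function form of the same operation, with the object explicit.
T& incrementFree(T& obj) {
    ++obj.v;
    return obj;
}
```

Both forms hand back a reference to the object they were called on, so obj.increment().increment() and incrementFree(incrementFree(obj)) each advance obj.v by two. A rule that forbids returning the by-reference parameter would therefore rule out both.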
Jan 10 2013
prev sibling next sibling parent "Tommi" <tommitissari hotmail.com> writes:
...Although, I should add that my analogy between methods and 
free functions seems to break when the object is an rvalue. Like 
in:

struct T
{
     int v;

     this(int a)
     {
         v = a;
     }

     int get()
     {
         return v;
     }
}

int v = T(4).get();

Given my analogy, the method get() should be able to be thought 
of as a free function:

int gget(ref T obj)
{
     return obj.v;
}

But then the above method call should be able to be thought of as:

int v = gget(T(4));

...which won't compile because T(4) is an rvalue, and according 
to D, rvalues can't be passed as ref (nor const ref). I don't 
know which one is flawed, my analogy, or the logic of how D is 
designed.
Jan 10 2013
prev sibling parent "Tommi" <tommitissari hotmail.com> writes:
On Thursday, 10 January 2013 at 16:42:09 UTC, Tommi wrote:
 ...which won't compile because T(4) is an rvalue, and according 
 to D, rvalues can't be passed as ref (nor const ref). I don't 
 know which one is flawed, my analogy, or the logic of how D is 
 designed.

My analogy is a bit broken in the sense that methods actually see their designated object as a reference to lvalue even if it is an rvalue. But I don't think that affects the logic of my main argument about scope arguments.

A more strict language logic would be inconvenient. But, this logic does introduce a discrepancy between non-member operators and member operators in C++ (which D actually side-steps by disallowing non-member operators... and then re-introduces by providing UFCS):

// C++:

struct T
{
    int val = 10;

    T& operator--()
    {
        --val;
        return *this;
    }
};

T& operator++(T& t)
{
    ++t.val;
    return t;
}

int main()
{
    _cprintf("%d\n", (--T()).val); // Prints: 9
    _cprintf("%d\n", (++T()).val); // Error: no known conversion
                                   // from 'T' to 'T&'
    return 0;
}

// D:

import std.stdio;

struct T
{
    int val;

    int get()
    {
        ++val;
        return val;
    }
}

int het(ref T t)
{
    ++t.val;
    return t.val;
}

void main()
{
    writeln(T().get());
    writeln(T().het()); // Error: T(0) is not an lvalue
}
Jan 10 2013