www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - safe inference fundamentally broken

reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
I was just viewing an interesting pull request. The pull wanted to change  
a line like this (buf is a fixed-size stack allocated array):

auto x = buf.ptr[0..buf.len];

To this:

auto trustedBuffer(ref typeof(buf) b)  trusted { return b.ptr[0..buf.len];  
}

for the sake of allowing the enclosing template function to be inferred  
 safe. But the interesting thing is that the original line is DELIBERATELY  
that way to PREVENT the compiler from inferring  safety (the buffer could  
potentially be squirreled away somewhere).

I questioned the idea of using the .ptr trick to fool the compiler, and  
asked why doesn't just do the correct thing. Turns out, the compiler is  
very bad at determining if stack-allocated data is escaped or returned.

A quick example:

T[] getBuf(T)()  safe
{
     T[100] ret;
     auto r = ret[];
     return r;
}

void main()  safe
{
     auto buf = getBuf!int();
}

Note, the above compiles. An interesting thing here is that we have  
explicitly marked getBuf as  safe. So what if we want to remove that? It  
STILL compiles, because the compiler infers  safety!

In order to fix it, we can either mark the function getBuf as  system, or  
do what was done in the original code above. The benefit of using ptr  
instead of  system is that it's much less susceptible to someone coming  
along and saying "hey look, if I just remove  system, it works!" It also  
allows conditionally compiled un- safe code to be included in the same  
function, and let the compiler infer correctly.

But this is extremely alarming. Here we have to be vigilant for anything  
like this, and make SURE we explicitly mark this as  system. If there is  
some possibility that a function could be inferred  safe or  system  
depending on the template parameters, then we would need to write separate  
templates for both, or use the .ptr trick above.

This situation is very bad. I personally think that we need to make  
slicing a stack-allocated array INVALID in  safe code, and not let that  
code be inferred safe. We have already demonstrated an easy way to make an  
internal delegate that can be  trusted for one line. That should be used  
to work around this limitation.

I propose that we start migrating towards making slicing of stack data  
un- safe, first by making it a warning, enabled with -w. Then making it an  
error.

Thoughts?

-Steve
Jun 05 2014
next sibling parent reply "deadalnix" <deadalnix gmail.com> writes:
 safe is fundamentally broken.
Jun 05 2014
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 06/05/2014 08:17 PM, deadalnix wrote:
  safe is fundamentally broken.

What's the "fundamental" problem? The construct seems perfectly fit to specify memory safety in at least the following context: void main() safe{} :o)
Jun 05 2014
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06/05/2014 08:47 PM, deadalnix wrote:
 On Thursday, 5 June 2014 at 18:33:22 UTC, Timon Gehr wrote:
 On 06/05/2014 08:17 PM, deadalnix wrote:
  safe is fundamentally broken.

What's the "fundamental" problem? The construct seems perfectly fit to specify memory safety in at least the following context: void main() safe{} :o)

Many constructs are assumed to be safe on basis safe don't guarantee. T[] arr = [ ... ]; arr = arr[$ .. $]; auto garbage = *(arr.ptr); safe is as safe a a condom with holes poked in it.

I see this not as a fundamental problem with safe, but with how its implementation has been approached.
Jun 05 2014
prev sibling next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 06/05/2014 08:09 PM, Steven Schveighoffer wrote:
 A quick example:

 T[] getBuf(T)()  safe
 {
     T[100] ret;
     auto r = ret[];
     return r;
 }

 void main()  safe
 {
     auto buf = getBuf!int();
 }

 Note, the above compiles. An interesting thing here is that we have explicitly
marked getBuf as  safe. So what if we want to remove that? It STILL compiles,
because the compiler infers  safety!

AFAIK it's a well known problem.
 This situation is very bad. I personally think that we need to make
 slicing a stack-allocated array INVALID in  safe code,

This issue is not a fundamental problem with safe, nor is it a matter of personal opinion, this is simply a compiler bug.
 and not let that
 code be inferred safe. We have already demonstrated an easy way to make
 an internal delegate that can be  trusted for one line. That should be
 used to work around this limitation.

 I propose that we start migrating towards making slicing of stack data
 un- safe, first by making it a warning, enabled with -w. Then making it
 an error.

 Thoughts?

The fundamental issue seems to lie in methodology and it is that safe is approximated by the DMD implementation from the wrong side. Instead of gradually banning usage of more and more constructs in safe, the implementation should have started out with not allowing any constructs in safe code and then should have gradually allowed more and more manually verified to be memory safe constructs.
Jun 05 2014
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06/05/2014 08:35 PM, Steven Schveighoffer wrote:
 On Thu, 05 Jun 2014 14:30:49 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 The fundamental issue seems to lie in methodology and it is that  safe
 is approximated by the DMD implementation from the wrong side. Instead
 of gradually banning usage of more and more constructs in  safe, the
 implementation should have started out with not allowing any
 constructs in  safe code and then should have gradually allowed more
 and more manually verified to be memory safe constructs.

I think I was one of those who argued to do it gradually. I was wrong.

I don't understand. Both strategies are gradual.
 When one is manually marking  safe things, it's not as bad as when the
 compiler is automatically marking them. But in either case,  safe
 doesn't really mean safe, so it is pretty much useless.

 -Steve

Actually there are two conflicting definitions of safe in the documentation. One says that safe means memory safe and the other says that safe means what DMD does: dlang.org/function.html "Function Safety Safe functions are functions that are statically checked to exhibit no possibility of undefined behavior. Undefined behavior is often used as a vector for malicious attacks. Safe Functions Safe functions are marked with the safe attribute. The following operations are not allowed in safe functions: [ too short list ]" It then goes on to note: "Note: The verifiable safety of functions may be compromised by bugs in the compiler and specification. Please report all such errors so they can be corrected." This is simply the wrong way to go about it. The documentation should instead say: "The following operations are _allowed_ in safe functions:"
Jun 05 2014
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/5/14, 8:30 PM, Timon Gehr wrote:
 The fundamental issue seems to lie in methodology and it is that  safe
 is approximated by the DMD implementation from the wrong side. Instead
 of gradually banning usage of more and more constructs in  safe, the
 implementation should have started out with not allowing any constructs
 in  safe code and then should have gradually allowed more and more
 manually verified to be memory safe constructs.

Agreed. -- Andrei
Jun 05 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 05 Jun 2014 14:30:49 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 The fundamental issue seems to lie in methodology and it is that  safe  
 is approximated by the DMD implementation from the wrong side. Instead  
 of gradually banning usage of more and more constructs in  safe, the  
 implementation should have started out with not allowing any constructs  
 in  safe code and then should have gradually allowed more and more  
 manually verified to be memory safe constructs.

I think I was one of those who argued to do it gradually. I was wrong. When one is manually marking safe things, it's not as bad as when the compiler is automatically marking them. But in either case, safe doesn't really mean safe, so it is pretty much useless. -Steve
Jun 05 2014
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Thursday, 5 June 2014 at 18:33:22 UTC, Timon Gehr wrote:
 On 06/05/2014 08:17 PM, deadalnix wrote:
  safe is fundamentally broken.

What's the "fundamental" problem? The construct seems perfectly fit to specify memory safety in at least the following context: void main() safe{} :o)

Many constructs are assumed to be safe on basis safe don't guarantee. T[] arr = [ ... ]; arr = arr[$ .. $]; auto garbage = *(arr.ptr); safe is as safe a a condom with holes poked in it.
Jun 05 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 05 Jun 2014 14:52:39 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 06/05/2014 08:35 PM, Steven Schveighoffer wrote:
 On Thu, 05 Jun 2014 14:30:49 -0400, Timon Gehr <timon.gehr gmx.ch>  
 wrote:

 The fundamental issue seems to lie in methodology and it is that  safe
 is approximated by the DMD implementation from the wrong side. Instead
 of gradually banning usage of more and more constructs in  safe, the
 implementation should have started out with not allowing any
 constructs in  safe code and then should have gradually allowed more
 and more manually verified to be memory safe constructs.

I think I was one of those who argued to do it gradually. I was wrong.

I don't understand. Both strategies are gradual.

One starts with a decidedly non-gradual breaking change. -Steve
Jun 05 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 05 Jun 2014 14:47:54 -0400, deadalnix <deadalnix gmail.com> wrote:

 On Thursday, 5 June 2014 at 18:33:22 UTC, Timon Gehr wrote:
 On 06/05/2014 08:17 PM, deadalnix wrote:
  safe is fundamentally broken.

What's the "fundamental" problem? The construct seems perfectly fit to specify memory safety in at least the following context: void main() safe{} :o)

Many constructs are assumed to be safe on basis safe don't guarantee. T[] arr = [ ... ]; arr = arr[$ .. $]; auto garbage = *(arr.ptr);

Believe it or not, this is actually safe. -Steve
Jun 05 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 5 June 2014 at 19:27:56 UTC, Steven Schveighoffer 
wrote:
 On Thu, 05 Jun 2014 14:47:54 -0400, deadalnix
 T[] arr = [ ... ];
 arr = arr[$ .. $];
 auto garbage = *(arr.ptr);

Believe it or not, this is actually safe. -Steve

What do you mean by "is actually safe" ? In the "can you actually believe this obviously wrong code is marked as safe" or "this code that looks wrong is actually perfectly safe"? AFAIK, it's only safe if arr is GC allocated to a bloc smaller than a page, so the conditions are implementation defined behavior, and so is the result.
Jun 05 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 05 Jun 2014 15:34:13 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Thursday, 5 June 2014 at 19:27:56 UTC, Steven Schveighoffer wrote:
 On Thu, 05 Jun 2014 14:47:54 -0400, deadalnix
 T[] arr = [ ... ];
 arr = arr[$ .. $];
 auto garbage = *(arr.ptr);

Believe it or not, this is actually safe.

What do you mean by "is actually safe" ? In the "can you actually believe this obviously wrong code is marked as safe" or "this code that looks wrong is actually perfectly safe"?

It's safe because of the implementation of arrays. There is always one sentinel byte that cannot be used for the block of data. This is why when you allocate e.g. 8 ints, it goes into a 32-byte block.
 AFAIK, it's only safe if arr is GC allocated to a bloc smaller than a  
 page, so the conditions are implementation defined behavior, and so is  
 the result.

No, pages also have a sentinel byte. -Steve
Jun 05 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 05 Jun 2014 15:48:09 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 05 Jun 2014 15:34:13 -0400, monarch_dodra  
 <monarchdodra gmail.com> wrote:

 On Thursday, 5 June 2014 at 19:27:56 UTC, Steven Schveighoffer wrote:
 On Thu, 05 Jun 2014 14:47:54 -0400, deadalnix
 T[] arr = [ ... ];
 arr = arr[$ .. $];
 auto garbage = *(arr.ptr);

Believe it or not, this is actually safe.

What do you mean by "is actually safe" ? In the "can you actually believe this obviously wrong code is marked as safe" or "this code that looks wrong is actually perfectly safe"?

It's safe because of the implementation of arrays. There is always one sentinel byte that cannot be used for the block of data. This is why when you allocate e.g. 8 ints, it goes into a 32-byte block.

I take it back, it could be unsafe. You could have e.g. a 12 byte struct be T, and then the last "element" could extend through the end of the block. -Steve
Jun 05 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 05 Jun 2014 15:54:33 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 05 Jun 2014 15:48:09 -0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Thu, 05 Jun 2014 15:34:13 -0400, monarch_dodra  
 <monarchdodra gmail.com> wrote:

 On Thursday, 5 June 2014 at 19:27:56 UTC, Steven Schveighoffer wrote:
 On Thu, 05 Jun 2014 14:47:54 -0400, deadalnix
 T[] arr = [ ... ];
 arr = arr[$ .. $];
 auto garbage = *(arr.ptr);

Believe it or not, this is actually safe.

What do you mean by "is actually safe" ? In the "can you actually believe this obviously wrong code is marked as safe" or "this code that looks wrong is actually perfectly safe"?

It's safe because of the implementation of arrays. There is always one sentinel byte that cannot be used for the block of data. This is why when you allocate e.g. 8 ints, it goes into a 32-byte block.

I take it back, it could be unsafe. You could have e.g. a 12 byte struct be T, and then the last "element" could extend through the end of the block.

A possible fix could be to reject the call to ptr at runtime if the slice is empty. -Steve
Jun 05 2014
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 5 June 2014 at 19:57:08 UTC, Steven Schveighoffer 
wrote:
 A possible fix could be to reject the call to ptr at runtime if 
 the slice is empty.

I don't know why you'd ever do "arr.ptr" in the first place, other than to avoid the bounds check. So I think the call should just be unsafe, and we call it a day. Or maybe to interface with a function that want a pointer? "Maybe", we could get away with allowing "&arr[someIndex]" though. The compiler would have to be able to "understand" this is not escaping a reference (for *DYNAMIC* arrays anyways).
Jun 05 2014
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 05 Jun 2014 16:32:24 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Thursday, 5 June 2014 at 19:57:08 UTC, Steven Schveighoffer wrote:
 A possible fix could be to reject the call to ptr at runtime if the  
 slice is empty.

I don't know why you'd ever do "arr.ptr" in the first place, other than to avoid the bounds check. So I think the call should just be unsafe, and we call it a day. Or maybe to interface with a function that want a pointer?

That's true. You can always get a pointer to any valid element with &arr[x]. Then at least you have bounds checking to save you. In fact, in safe code, arr.ptr could be replaced with &arr[0].
 "Maybe", we could get away with allowing "&arr[someIndex]" though. The  
 compiler would have to be able to "understand" this is not escaping a  
 reference (for *DYNAMIC* arrays anyways).

If we are going to allow pointers, we need to allow pointers to data we know is valid and heap-allocated. This should include dynamic array elements. -Steve
Jun 05 2014
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Thursday, 5 June 2014 at 20:37:51 UTC, Steven Schveighoffer
wrote:
 On Thu, 05 Jun 2014 16:32:24 -0400, monarch_dodra 
 <monarchdodra gmail.com> wrote:

 On Thursday, 5 June 2014 at 19:57:08 UTC, Steven Schveighoffer 
 wrote:
 A possible fix could be to reject the call to ptr at runtime 
 if the slice is empty.

I don't know why you'd ever do "arr.ptr" in the first place, other than to avoid the bounds check. So I think the call should just be unsafe, and we call it a day. Or maybe to interface with a function that want a pointer?

That's true. You can always get a pointer to any valid element with &arr[x]. Then at least you have bounds checking to save you. In fact, in safe code, arr.ptr could be replaced with &arr[0].

&arr[0] does bound checking. That is different ;)
Jun 05 2014
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/5/2014 11:09 AM, Steven Schveighoffer wrote:
 Thoughts?

Please file bugzilla issue(s) for this.
Jun 05 2014
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06/06/2014 01:15 AM, Walter Bright wrote:
 On 6/5/2014 11:09 AM, Steven Schveighoffer wrote:
 Thoughts?

Please file bugzilla issue(s) for this.

There already is one: https://issues.dlang.org/show_bug.cgi?id=8838
Jun 05 2014