www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - dip1000 + pure is a DEADLY COMBO

reply Dennis <dkorpel gmail.com> writes:
Sorry for the attention-grabbing title, but I think it's 
warranted, because the gist of it is this:

**With `-preview=dip1000` enabled, the compiler will happily 
compile valid, ` safe` D code into memory corrupting machine 
code.**

The root cause is:
[Issue 20150 - -dip1000 defeated by 
pure](https://issues.dlang.org/show_bug.cgi?id=20150)

The compiler ignores "reference to local variable `x` assigned to 
non-scope parameter `y`" errors when the function is annotated or 
inferred `pure`. The idea is, presumably, that `pure` functions 
can't escape references because they have no interaction with 
global variables. This is false of course, since they can still 
return them or assign them to other parameters.

The deadly part it that using this flawed logic, the compiler 
sometimes turns GC allocations into stack allocations too 
eagerly. Here I got memory corruption because the compiler 
allocated an array literal on the stack instead of the heap:

[Issue 21291 - Array literal that escapes scope is allocated on 
stack](https://issues.dlang.org/show_bug.cgi?id=21291)

Later I encountered another instance of it where a closure was 
not heap-allocated, which looked something like this:

```D
import core.thread;
 safe:
void main() {
     S s;
     s.memberFunc();
}

struct S {
     int a;
     auto memberFunc() {
         auto t = new Thread({
             auto pa = &a; // pointer to stack frame of main!
         });
     }
}
```

I'm not the only one who encountered memory corruption bugs this 
way, user Dechcaudron commented on my issue: "This has also 
happened to me, no idea it could be due to -dip1000".

And most recently:
[Issue 21912 - Invalid stack closure when calling delegate inside 
lambda](https://issues.dlang.org/show_bug.cgi?id=21912)



Walter made a PR for fixing the behavior in dmd: (March 2020)
https://github.com/dlang/dmd/pull/10924

Later, aG0aep6G made a better fix: (November 2020)
https://github.com/dlang/dmd/pull/12010

But they're both blocked by the fact that Phobos relies on the 
bug to compile with -dip1000. This makes sense, because the 
conversion process was mostly "add `scope` and `return` 
annotations until the compile errors go away". `pure` functions 
did not give error messages, so they did not get those 
annotations.

Regarding this extra work, [aG0aep6G 
commented:](https://github.com/dlang/dmd/pull/12010#issuecomment-759070687)
(January 2021)

 I had started on it, but it's tedious work tracking down the 
 errors through templates and overloads. If I remember 
 correctly, dup gave me some trouble, too.

 So I've put it on ice for the time being. If someone else wants 
 to give it a shot, that would be great.
And that's where we are now. Matthias asked "Is there a plan to enable DIP1000 by default?" during [DConf Online 2020 Day One Q \& A Livestream, at 4:50:11](https://youtu.be/o-2_mxaCL9w?t=17411). Walter mentioned "we can do it now" and Atila mentioned how the first step would be to change -dip1000 errors into equivalent deprecation warnings. Clearly, issue 20150 is a blocker for dip1000 by default. In the meantime, since I absolutely don't want another unfortunate soul debugging memory corruption bugs that dip1000 introduces, this post is meant to raise awareness, and discuss intermediate solutions. Maybe the compiler can defensively heap-allocate for now, though that would break ` nogc` code. Or maybe we can add another switch, `-preview=dip1000proper`, since the fix is a breaking change. What do you think?
May 12
next sibling parent reply 12345swordy <alexanderheistermann gmail.com> writes:
On Wednesday, 12 May 2021 at 13:14:30 UTC, Dennis wrote:
 Sorry for the attention-grabbing title, but I think it's 
 warranted, because the gist of it is this:

 **With `-preview=dip1000` enabled, the compiler will happily 
 compile valid, ` safe` D code into memory corrupting machine 
 code.**

 The root cause is:
 [Issue 20150 - -dip1000 defeated by 
 pure](https://issues.dlang.org/show_bug.cgi?id=20150)

 The compiler ignores "reference to local variable `x` assigned 
 to non-scope parameter `y`" errors when the function is 
 annotated or inferred `pure`. The idea is, presumably, that 
 `pure` functions can't escape references because they have no 
 interaction with global variables. This is false of course, 
 since they can still return them or assign them to other 
 parameters.

 The deadly part it that using this flawed logic, the compiler 
 sometimes turns GC allocations into stack allocations too 
 eagerly. Here I got memory corruption because the compiler 
 allocated an array literal on the stack instead of the heap:

 [Issue 21291 - Array literal that escapes scope is allocated on 
 stack](https://issues.dlang.org/show_bug.cgi?id=21291)

 Later I encountered another instance of it where a closure was 
 not heap-allocated, which looked something like this:

 ```D
 import core.thread;
  safe:
 void main() {
     S s;
     s.memberFunc();
 }

 struct S {
     int a;
     auto memberFunc() {
         auto t = new Thread({
             auto pa = &a; // pointer to stack frame of main!
         });
     }
 }
 ```

 I'm not the only one who encountered memory corruption bugs 
 this way, user Dechcaudron commented on my issue: "This has 
 also happened to me, no idea it could be due to -dip1000".

 And most recently:
 [Issue 21912 - Invalid stack closure when calling delegate 
 inside lambda](https://issues.dlang.org/show_bug.cgi?id=21912)



 Walter made a PR for fixing the behavior in dmd: (March 2020)
 https://github.com/dlang/dmd/pull/10924

 Later, aG0aep6G made a better fix: (November 2020)
 https://github.com/dlang/dmd/pull/12010

 But they're both blocked by the fact that Phobos relies on the 
 bug to compile with -dip1000. This makes sense, because the 
 conversion process was mostly "add `scope` and `return` 
 annotations until the compile errors go away". `pure` functions 
 did not give error messages, so they did not get those 
 annotations.

 Regarding this extra work, [aG0aep6G 
 commented:](https://github.com/dlang/dmd/pull/12010#issuecomment-759070687)
(January 2021)

 I had started on it, but it's tedious work tracking down the 
 errors through templates and overloads. If I remember 
 correctly, dup gave me some trouble, too.

 So I've put it on ice for the time being. If someone else 
 wants to give it a shot, that would be great.
And that's where we are now. Matthias asked "Is there a plan to enable DIP1000 by default?" during [DConf Online 2020 Day One Q \& A Livestream, at 4:50:11](https://youtu.be/o-2_mxaCL9w?t=17411). Walter mentioned "we can do it now" and Atila mentioned how the first step would be to change -dip1000 errors into equivalent deprecation warnings. Clearly, issue 20150 is a blocker for dip1000 by default. In the meantime, since I absolutely don't want another unfortunate soul debugging memory corruption bugs that dip1000 introduces, this post is meant to raise awareness, and discuss intermediate solutions. Maybe the compiler can defensively heap-allocate for now, though that would break ` nogc` code. Or maybe we can add another switch, `-preview=dip1000proper`, since the fix is a breaking change. What do you think?
Should it be a bug with Pure rather than dip1000? -Alex
May 12
parent reply Dennis <dkorpel gmail.com> writes:
On Wednesday, 12 May 2021 at 14:58:23 UTC, 12345swordy wrote:
 Should it be a bug with Pure rather than dip1000?
Purity is inferred correctly. The problem is that function parameters get the `scope` storage class for free when the function is strongly pure. In any case, I don't think it makes a difference whether you call it a "bug with pure" or a "bug with dip1000".
May 12
parent reply 12345swordy <alexanderheistermann gmail.com> writes:
On Wednesday, 12 May 2021 at 15:47:24 UTC, Dennis wrote:
 On Wednesday, 12 May 2021 at 14:58:23 UTC, 12345swordy wrote:
 Should it be a bug with Pure rather than dip1000?
Purity is inferred correctly. The problem is that function parameters get the `scope` storage class for free when the function is strongly pure. In any case, I don't think it makes a difference whether you call it a "bug with pure" or a "bug with dip1000".
Can phobos be rewritten, such that it doesn't depend on the bug? -Alex
May 12
parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 12 May 2021 at 21:20:03 UTC, 12345swordy wrote:
 On Wednesday, 12 May 2021 at 15:47:24 UTC, Dennis wrote:
 On Wednesday, 12 May 2021 at 14:58:23 UTC, 12345swordy wrote:
 Should it be a bug with Pure rather than dip1000?
Purity is inferred correctly. The problem is that function parameters get the `scope` storage class for free when the function is strongly pure. In any case, I don't think it makes a difference whether you call it a "bug with pure" or a "bug with dip1000".
Can phobos be rewritten, such that it doesn't depend on the bug? -Alex
This was discussed here: https://github.com/dlang/dmd/pull/12010 Short answer: yes, but it requires a lot of difficult and tedious debugging work, and nobody has volunteered yet.
May 12
parent reply MoonlightSentinel <moonlightsentinel disroot.org> writes:
On Wednesday, 12 May 2021 at 22:11:36 UTC, Paul Backus wrote:
 Short answer: yes, but it requires a lot of difficult and 
 tedious debugging work, and nobody has volunteered yet.
... and will probably break code in several other libraries.
May 12
parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 12 May 2021 at 22:51:24 UTC, MoonlightSentinel 
wrote:
 On Wednesday, 12 May 2021 at 22:11:36 UTC, Paul Backus wrote:
 Short answer: yes, but it requires a lot of difficult and 
 tedious debugging work, and nobody has volunteered yet.
... and will probably break code in several other libraries.
If they're using -preview=dip1000, yes. Do preview flags come with a stability guarantee?
May 12
parent Mathias LANG <geod24 gmail.com> writes:
On Wednesday, 12 May 2021 at 22:53:20 UTC, Paul Backus wrote:
 On Wednesday, 12 May 2021 at 22:51:24 UTC, MoonlightSentinel 
 wrote:
 On Wednesday, 12 May 2021 at 22:11:36 UTC, Paul Backus wrote:
 Short answer: yes, but it requires a lot of difficult and 
 tedious debugging work, and nobody has volunteered yet.
... and will probably break code in several other libraries.
If they're using -preview=dip1000, yes. Do preview flags come with a stability guarantee?
Nope, that's the point of a preview flag, it can break from release to release. Our company uses `-preview=in` and `-checkaction=context` (the later isn't a preview, but it's still very experimental) and we know that the tradeoff is that you will most likely need to stick to one version of the compiler if you want to keep your sanity.
May 13
prev sibling next sibling parent Dukc <ajieskola gmail.com> writes:
On Wednesday, 12 May 2021 at 13:14:30 UTC, Dennis wrote:
 **With `-preview=dip1000` enabled, the compiler will happily 
 compile valid, ` safe` D code into memory corrupting machine 
 code.**
This is indeed horrible. Thanks for bringing it up.
 Maybe the compiler can defensively heap-allocate for now, 
 though that would break ` nogc` code. Or maybe we can add 
 another switch, `-preview=dip1000proper`, since the fix is a 
 breaking change. What do you think?
The compiler switch, but with some changes: - It would be inverse. Correct behaviour by default. - You have to list modules/packages where the buggy behaviour would apply. - One can set that flag to `core,std` to continue using Phobos with `-dip1000`, while still fixing the bug regarding user's own code. - The error messages caused by the bug fix should clearly redirect to instructions to the above.
May 13
prev sibling next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 12 May 2021 at 13:14:30 UTC, Dennis wrote:
 The idea is, presumably, that `pure` functions can't escape 
 references because they have no interaction with global 
 variables. This is false of course, since they can still return 
 them or assign them to other parameters.
No, pure function can _neither_ access module global _nor_ process global (__gshared) variables whatsoever regardless of `scope` nor `return` qualifier on parameters and return type. Please show an example that contradicts this statement. I fail to see the general issue here.
May 14
next sibling parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Friday, 14 May 2021 at 09:02:35 UTC, Per Nordlöw wrote:
 I fail to see the general issue here.
Nevermind this comment. I see the problem now, thanks. I was given the code example ```d int bad_global = 1; safe pure void purefunc(int* x) { *x = 2; } void main() { assert(bad_global == 1); purefunc(&bad_global); assert(bad_global == 2); } ``` that incorrectly succeeds to compile [1]. [1] https://run.dlang.io/?compiler=dmd&source=int%20bad_global%20%3D%201;%0Apure%20void%20purefunc(int*%20x)%0A%7B%0A%20%20%20%20*x%20%3D%202;%0A%7D%0Avoid%20main()%0A%7B%0A%20%20%20%20assert(bad_global%20%3D%3D%201);%0A%20%20%20%20purefunc(%26bad_global);%0A%20%20%20%20assert(bad_global%20%3D%3D%202);%0A%7D
May 14
prev sibling parent Dennis <dkorpel gmail.com> writes:
On Friday, 14 May 2021 at 09:02:35 UTC, Per Nordlöw wrote:
 No, pure function can _neither_ access module global _nor_ 
 process global (__gshared) variables whatsoever regardless of 
 `scope` nor `return` qualifier on parameters and return type.

 Please show an example that contradicts this statement.
I can't, I agree with the statement. I'm not saying that adding scope or return breaks pure, it's the other way around. Adding `pure` breaks `scope`, since the compiler incorrectly assumes that a `pure` function implies that the parameters are `scope`.
May 14
prev sibling next sibling parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 12 May 2021 at 13:14:30 UTC, Dennis wrote:
 The root cause is:
 [Issue 20150 - -dip1000 defeated by 
 pure](https://issues.dlang.org/show_bug.cgi?id=20150)
Moreover, the code example ```d int* escape(int* r) safe pure { return r; } int* f() safe { int x = 42; return escape(&x); /* Should not compile. */ } ``` referenced at [1] correctly fails to compile with dmd master using `-dip1000` as main.d(9,20): Error: cannot take address of local `x` in ` safe` function `f` Again, I fail to see the problem here. [1] https://issues.dlang.org/show_bug.cgi?id=20150
May 14
parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 14 May 2021 at 09:15:18 UTC, Per Nordlöw wrote:
 fails to compile with dmd master using `-dip1000` as

     main.d(9,20): Error: cannot take address of local `x` in 
 ` safe` function `f`
That's the old error message, I don't understand how you get that with `-dip1000`, I can only reproduce that by forgetting that flag.
May 14
parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Friday, 14 May 2021 at 09:41:01 UTC, Dennis wrote:
 On Friday, 14 May 2021 at 09:15:18 UTC, Per Nordlöw wrote:
 fails to compile with dmd master using `-dip1000` as

     main.d(9,20): Error: cannot take address of local `x` in 
 ` safe` function `f`
That's the old error message, I don't understand how you get that with `-dip1000`, I can only reproduce that by forgetting that flag.
Are you using dmd master?
May 14
parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Friday, 14 May 2021 at 09:43:53 UTC, Per Nordlöw wrote:
 That's the old error message, I don't understand how you get 
 that with `-dip1000`, I can only reproduce that by forgetting 
 that flag.
Ahh, sorry my script used -dip1008 not -dip1000. Didn't see the difference.
May 14
prev sibling parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 12 May 2021 at 13:14:30 UTC, Dennis wrote:


 Walter made a PR for fixing the behavior in dmd: (March 2020)
 https://github.com/dlang/dmd/pull/10924

 Later, aG0aep6G made a better fix: (November 2020)
 https://github.com/dlang/dmd/pull/12010
I vote for modifying this PR to emit a deprecation message for the new case not covered by phobos and many other projects.
May 14