
digitalmars.D - Discussion Thread: DIP 1035--@system Variables--Final Review

reply Mike Parker <aldacron gmail.com> writes:
This is the discussion thread for the Final Review of DIP 1035, 
"@system Variables":

https://github.com/dlang/DIPs/blob/4d73e17901a3a620bf59a2a5bfb8c433069c5f52/DIPs/DIP1035.md

The review period will end at 11:59 PM ET on March 5, or when I 
make a post declaring it complete. Discussion in this thread may 
continue beyond that point.

Here in the discussion thread, you are free to discuss anything 
and everything related to the DIP. Express your support or 
opposition, debate alternatives, argue the merits, etc.

However, if you have any specific feedback on how to improve the 
proposal itself, then please post it in the feedback thread. The 
feedback thread will be the source for the review summary I write 
at the end of this review round. I will post a link to that 
thread immediately following this post. Just be sure to read and 
understand the Reviewer Guidelines before posting there:

https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

And my blog post on the difference between the Discussion and 
Feedback threads:

https://dlang.org/blog/2020/01/26/dip-reviews-discussion-vs-feedback/

Please stay on topic here. I will delete posts that are 
completely off-topic.
Feb 19 2022
next sibling parent Mike Parker <aldacron gmail.com> writes:
On Saturday, 19 February 2022 at 12:24:04 UTC, Mike Parker wrote:

 However, if you have any specific feedback on how to improve 
 the proposal itself, then please post it in the feedback 
 thread. The feedback thread will be the source for the review 
 summary I write at the end of this review round. I will post a 
 link to that thread immediately following this post.
The feedback thread is located here: https://forum.dlang.org/post/kwabfusqvczenjjacbmq@forum.dlang.org
Feb 19 2022
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Saturday, 19 February 2022 at 12:24:04 UTC, Mike Parker wrote:
 This is the discussion thread for the Final Review of DIP 1035, 
 "@system Variables":

 https://github.com/dlang/DIPs/blob/4d73e17901a3a620bf59a2a5bfb8c433069c5f52/DIPs/DIP1035.md

 The review period will end at 11:59 PM ET on March 5, or when I 
 make a post declaring it complete. Discussion in this thread 
 may continue beyond that point.

 Here in the discussion thread, you are free to discuss anything 
 and everything related to the DIP. Express your support or 
 opposition, debate alternatives, argue the merits, etc.

 However, if you have any specific feedback on how to improve 
 the proposal itself, then please post it in the feedback 
 thread. The feedback thread will be the source for the review 
 summary I write at the end of this review round. I will post a 
 link to that thread immediately following this post. Just be 
 sure to read and understand the Reviewer Guidelines before 
 posting there:

 https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

 And my blog post on the difference between the Discussion and 
 Feedback threads:

 https://dlang.org/blog/2020/01/26/dip-reviews-discussion-vs-feedback/

 Please stay on topic here. I will delete posts that are 
 completely off-topic.
While this DIP will solve most of the issues of `@safe` `__traits(getMember, xxx, yyy)` fetching private members, there is one thing that will remain a problem: destructors. The issue is that we want a way to specify a destructor so that it could be called by `@safe` code at end of the lifetime of an instance, but not early:

```D
auto shouldBeSafe()
{ ObjectWithDestructor x;
}

auto shouldBeSystem()
{ ObjectWithDestructor x;
  __traits(getMember, x, "__dtor")();
}
```

For that to work, we will either have to make privacy inviolable from `@safe` (the destructor that does not want to be called early would be private, and `object.destroy` would be changed to be `@system` for private destructors), or add some alternative way to define destructors.

Why do we want to do that at all? DIP1000. Most of the potential of DIP1000 is wasted if you cannot prevent destruction before end of the scope:

```D
@safe void abuse()
{ auto cont = SomeRaiiContainer([1,2,3]);
  scope ptr = &cont.front;
  destroy(cont);
  int oops = *ptr;
}
```

Yes, you can prevent that by also marking `abuse` `@live`. But I think we ought to do better than that. As I see it, `@live` is mainly meant as a partial memory safety mechanism for low-level code that cannot be `@safe`. It isn't intended that you start to mark your average `@safe` code as `@live`; that would be terribly onerous. So we don't want to settle for our libraries being memory safe in `@live`, if we can make them memory safe in normal `@safe`.

This is not intended as an argument against DIP1035; in fact I'm still in favour of it. I just wanted to point out that it does not entirely solve the problem of `__traits(getMember, xxx, yyy)` bypassing privacy.
Feb 20 2022
next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Sunday, 20 February 2022 at 15:16:30 UTC, Dukc wrote:
 The issue is that we want a way to specify a destructor so that 
 it could be called by `@safe` code at end of the lifetime of an 
 instance, but not early:
Yes, there's an issue for that: [Issue 21981 - Manually calling a __dtor can violate memory safety ](https://issues.dlang.org/show_bug.cgi?id=21981)
Feb 20 2022
next sibling parent Dukc <ajieskola gmail.com> writes:
On Friday, 4 March 2022 at 09:39:53 UTC, Dennis wrote at the 
feedback thread:
 On Friday, 25 February 2022 at 21:46:25 UTC, Dukc wrote:
 Wouldn't putting the handle in union with `void[1]` work?
No, `void[1]` is not a type with unsafe values.
I was just checking what the language spec says about this, and found an alternative we have all been overlooking. A type can be declared unsafe in the present language by giving it an invariant. Yes, I mean the contract programming invariant! The spec says that void-initializing a type with an invariant, or using a union that has a member with an invariant, is `@system`-only. Thus the invariant effectively declares the type unsafe. It also means that `void[1]` is an unsafe type, because it can contain a struct with an invariant.

This DIP still has the advantage that `@safe` functions in the same module with the invariant type do not need any special care. But still, that sounds like a pretty trivial gain to me - in the `IntSlice` example you can make the members read-only with a bit of union trickery if you want to, or define a string mixin that does the same automatically.

I'm starting to think it's probably not worth it overall. Still, I'm only slightly against, because the proposed rules blend so nicely with the existing language, and it sure is sometimes convenient to have an alternative.
Mar 04 2022
prev sibling parent reply Dukc <ajieskola gmail.com> writes:
On Friday, 4 March 2022 at 09:39:53 UTC, Dennis wrote at the 
feedback thread:
 On Friday, 25 February 2022 at 21:46:25 UTC, Dukc wrote:
 Wouldn't putting the handle in union with `void[1]` work?
No, `void[1]` is not a type with unsafe values.
I was just checking what the language spec says about this, and found an alternative we have all been overlooking. A type can be declared unsafe in the present language by giving it an invariant. Yes, I mean the contract programming invariant! The spec says that void-initializing a type with an invariant, or using a union that has a member with an invariant, is `@system`-only. Thus the invariant effectively declares the type unsafe. It also means that `void[1]` is an unsafe type, because it can contain a struct with an invariant.

This DIP still has the advantage that `@safe` functions in the same module with the invariant type do not need any special care. But still, that sounds like a pretty trivial gain to me - in the `IntSlice` example you can make the members read-only with a bit of union trickery if you want to, or define a string mixin that does the same automatically.

I'm starting to think it's probably not worth it overall. Still, I'm only slightly against, because the proposed rules blend so nicely with the existing language, and it sure is sometimes convenient to have an alternative.
Mar 04 2022
parent Paul Backus <snarwin gmail.com> writes:
On Friday, 4 March 2022 at 13:06:35 UTC, Dukc wrote:
 On Friday, 4 March 2022 at 09:39:53 UTC, Dennis wrote at the 
 feedback thread:
 On Friday, 25 February 2022 at 21:46:25 UTC, Dukc wrote:
 Wouldn't putting the handle in union with `void[1]` work?
No, `void[1]` is not a type with unsafe values.
I was just checking what the language spec says about this, and found an alternative we have all been overlooking. A type can be declared unsafe in the present language by giving it an invariant. Yes, I mean the contract programming invariant! The spec says that void-initializing a type with an invariant, or using a union that has a member with an invariant, is `@system`-only. Thus the invariant effectively declares the type unsafe.
First, this was not "overlooked"--it was added to the language spec well after DIP 1035 was written and submitted. Dennis and I have been aware of this spec change since it was first proposed in [DMD PR 12326][1].

Second, this is not a complete alternative to DIP 1035, because it does not solve [the `__traits(getMember)` issue][2]. As long as `@safe` code is allowed to bypass encapsulation and access the fields of user-defined types directly, it is impossible for `@trusted` code to rely on the integrity of the data in those fields.

[1]: https://github.com/dlang/dmd/pull/12326#issuecomment-812575730
[2]: https://issues.dlang.org/show_bug.cgi?id=20941
Mar 04 2022
prev sibling parent reply Nick Treleaven <nick geany.org> writes:
On Sunday, 20 February 2022 at 15:16:30 UTC, Dukc wrote:
 Why do we want to do that at all? DIP1000. Most of the potential 
 of DIP1000 is wasted if you cannot prevent destruction before 
 end of the scope:

 ```D
 @safe void abuse()
 { auto cont = SomeRaiiContainer([1,2,3]);
   scope ptr = &cont.front;
   destroy(cont);
   int oops = *ptr;
 }
 ```
Doesn't the same problem occur just with reassignment?:

```d
scope ptr = &cont.front;
cont = cont.init;
int oops = *ptr;
```
Feb 22 2022
parent Dukc <ajieskola gmail.com> writes:
On Tuesday, 22 February 2022 at 15:05:14 UTC, Nick Treleaven 
wrote:
 On Sunday, 20 February 2022 at 15:16:30 UTC, Dukc wrote:
 Why do we want to do that at all? DIP1000. Most of the potential 
 of DIP1000 is wasted if you cannot prevent destruction before 
 end of the scope:

 ```D
 @safe void abuse()
 { auto cont = SomeRaiiContainer([1,2,3]);
   scope ptr = &cont.front;
   destroy(cont);
   int oops = *ptr;
 }
 ```
Doesn't the same problem occur just with reassignment?:

```d
scope ptr = &cont.front;
cont = cont.init;
int oops = *ptr;
```
Reassignment can be forbidden at least. But still, my point does not stand scrutiny, because:

1: No reassignment is clumsy.

2: Even accepting that, this would still do the same thing:

```D
auto pCont = new SomeRaiiContainer([1,2,3]);
scope ptr = &pCont.front();
pCont = null;
GC.collect;
int oops = *ptr;
```

So, scratch that. RAII or reference counted containers can only be `@safe` with a callback based usage, as suggested by Paul: https://github.com/dlang/phobos/pull/8368#issuecomment-1024917439 . And that idiom is `@safe` even with present destructors.

Hopefully: We have discovered so many in our DIP1000-based memory safety schemes lately so there is no guarantee that this isn't still just some oversight.
Feb 23 2022
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Saturday, 19 February 2022 at 12:24:04 UTC, Mike Parker wrote:
 This is the discussion thread for the Final Review of DIP 1035, 
 "@system Variables":

 https://github.com/dlang/DIPs/blob/4d73e17901a3a620bf59a2a5bfb8c433069c5f52/DIPs/DIP1035.md
In the "Example: `int` as pointer" section, the following sentence appears:
 Because an `int` is a safe type, any `int` value can be created 
 from `@safe` code, so any memory corruption that could follow 
 from escaping a `scope int` could also result from creating the 
 same `int` value without accessing the variable.
This sentence correctly recognizes that (absent incorrect `@trusted` code elsewhere) there is no memory-safety risk in allowing a value without indirections to escape from a function. It also completely undermines the example's motivation.

If there is no benefit to memory-safety from applying `scope` checking to data without indirections, then there is no justification for enabling such checks in all `@safe` code, even if they may occasionally be "desirable" for other, non-memory-safety-related reasons.

Later, in the "Description" section, we find the following sentence:
 The `scope` keyword is not stripped away [from an aggregate 
 with at least one `@system` field], even when the aggregate has 
 no members that contain pointers.
The only justification for this appears to be the example discussed above. Both this sentence, and the example that attempts to support it, should be removed from the DIP.
Feb 21 2022
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Monday, 21 February 2022 at 19:49:58 UTC, Paul Backus wrote:
 In the "Example: `int` as pointer" section, the following 
 sentence appears:
 [...]
This should have been in the Feedback thread. I've reposted it there now: https://forum.dlang.org/post/lmmpuyeurzavwqiylwlp forum.dlang.org
Feb 21 2022
prev sibling parent reply Dennis <dkorpel gmail.com> writes:
On Monday, 21 February 2022 at 19:49:58 UTC, Paul Backus wrote:
 If there is no benefit to memory-safety from applying `scope`
 checking to data without indirections, then there is no
 justification for enabling such checks in all `@safe` code, 
 even if they may occasionally be "desirable" for other, 
 non-memory-safety-related reasons.
It is memory-safety related, it allows you to create custom pointer types. A pointer is just an integer under the hood, the idea of indirections and lifetimes is just a compile time idea around a `size_t` which indexes into memory. Why can't we do the same with a `ushort` which indexes into an array?
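
To make the "integer as pointer" idea concrete, here is a minimal sketch of a `ushort` handle that indexes into a fixed array. It is not taken from the DIP; `pool`, `Handle`, `make` and `get` are hypothetical names, and the sketch assumes that DIP 1035 stops `@safe` code from touching the `@system` field directly. It only illustrates the invariant on the index, not the `scope` checking being debated here.

```d
import std.exception : enforce;
import std.stdio : writeln;

// Hypothetical backing store; every name in this sketch is made up.
private int[4] pool = [10, 20, 30, 40];

struct Handle
{
    // Assumption: with DIP 1035 enabled, @safe code may not read or write
    // this field directly, so the invariant `index < pool.length` holds.
    // The default value 0 satisfies the invariant as well.
    @system private ushort index;

    // Checked factory: the only place where an index is chosen.
    static Handle make(ushort i) @trusted
    {
        enforce(i < pool.length, "index out of range");
        Handle h;
        h.index = i;
        return h;
    }

    // Unchecked access that relies on the invariant above.
    ref int get() @trusted
    {
        return pool.ptr[index];
    }
}

void main() @safe
{
    auto h = Handle.make(2);
    h.get() = 99;      // writes through the "integer pointer"
    writeln(pool[2]);  // prints 99
}
```

Without the `@system` marker, `@safe` code in the same module could set `index` to any value, and the unchecked access in `get` would no longer be justified.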
Feb 21 2022
parent reply Paul Backus <snarwin gmail.com> writes:
On Monday, 21 February 2022 at 20:30:07 UTC, Dennis wrote:
 On Monday, 21 February 2022 at 19:49:58 UTC, Paul Backus wrote:
 If there is no benefit to memory-safety from applying `scope`
 checking to data without indirections, then there is no
 justification for enabling such checks in all `@safe` code, 
 even if they may occasionally be "desirable" for other, 
 non-memory-safety-related reasons.
It is memory-safety related, it allows you to create custom pointer types. A pointer is just an integer under the hood, the idea of indirections and lifetimes is just a compile time idea around a `size_t` which indexes into memory. Why can't we do the same with a `ushort` which indexes into an array?
If the goal is being able to define custom pointer types, then the DIP should use that as an example instead of talking about file descriptors, and it should explain *exactly* which part of the example depends on this feature for memory safety (as the other examples do).

I still don't think it's a compelling use-case, though. [`TailUnqual`][1] does something very similar, using the `union` workaround, and it would not benefit from having access to `scope`-checked integers because (a) it stores a `size_t`, so eliminating the `union` wouldn't save any space; and (b) it needs the `union` for correct GC scanning regardless.

[1]: https://gist.github.com/pbackus/1638523a5b6ea3ce2c0a73358cff4dc6
Feb 21 2022
parent reply Dennis <dkorpel gmail.com> writes:
On Monday, 21 February 2022 at 21:50:31 UTC, Paul Backus wrote:
 If the goal is being able to define custom pointer types, then 
 the DIP should use that as an example instead of talking about 
 file descriptors, and it should explain *exactly* which part of 
 the example depends on this feature for memory safety (as the 
 other examples do).
A double `fclose` on a `FILE*` is basically a double free. I thought the same would apply to raw file descriptors, but I just read that a double `close` simply results in an `EBADF` error, so maybe it's not a good example.
 I still don't think it's a compelling use-case, though.
 [`TailUnqual`][1] does something very similar, using the 
 `union` workaround, and it would not benefit from having access 
 to `scope`-checked integers because (a) it stores a `size_t`, 
 so eliminating the `union` wouldn't save any space; and (b) it 
 needs the `union` for correct GC scanning regardless.
Yes, TailUnqual doesn't need `scope`-checked integers, but that doesn't mean other code doesn't need it. I added the rule for two reasons:

- The compiler currently has a notion of a type that `hasPointers`. The extra complexity of adding a notion `hasSystemVariables` was daunting, but then I thought we could just make them the same. I think that would not only simplify the implementation, but also the feature in general. It makes it easy to draw a parallel to a pointer and a `@system size_t`.
- Some people asked for the feature (see links in the rationale section)

I can improve the DIP text, but I'm not yet convinced the rule should be scrapped.
Feb 21 2022
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Monday, 21 February 2022 at 22:56:30 UTC, Dennis wrote:
 - The compiler currently has a notion of a type that 
 `hasPointers`. The extra complexity of adding a notion 
 `hasSystemVariables` was daunting, but then I thought we could 
 just make them the same. I think that would not only simplify 
 the implementation, but also the feature in general. It makes 
 it easy to draw a parallel to a pointer and a `@system size_t`.
The compiler needs at least two notions: `hasPointers` and `hasUnsafeValues`. `hasUnsafeValues` is a superset of `hasPointers`, and also includes aggregate types with `@system` fields and [`bool`][1]. Since I assume you do not intend to expand `scope` checking to `bool`, folding everything into a single concept will not be possible.

[1]: https://issues.dlang.org/show_bug.cgi?id=20148
 - Some people asked for the feature (see links in the rationale 
 section)
I've read those links. In one of them, the problem was solved using existing DIP 1000 features. The other asks for "unique references (and borrow) to plain ints". It is not clear to me that `scope` checking for integers would solve either of these problems. If you believe it would, then it should not be difficult for you to write up an example demonstrating how.
Feb 21 2022
prev sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Monday, 21 February 2022 at 22:56:30 UTC, Dennis wrote:
 On Monday, 21 February 2022 at 21:50:31 UTC, Paul Backus wrote:
 If the goal is being able to define custom pointer types, then 
 the DIP should use that as an example instead of talking about 
 file descriptors, and it should explain *exactly* which part 
 of the example depends on this feature for memory safety (as 
 the other examples do).
A double `fclose` on a `FILE*` is basically a double free. I thought the same would apply to raw file descriptors, but I just read that a double `close` simply results in an `EBADF` error, so maybe it's not a good example.
A more pertinent example around file descriptors and memory safety is void-initialization:

```d
struct File
{
    void write(const(void)[] data) @safe;
    // ...
    private int fd;
}

void main() @safe
{
    File f = void; // this compiles in current language, because `File` doesn't have pointers
    f.write("hello"); // may corrupt memory if (implementation-defined) value of `f.fd` happens to correspond to an existing mapping
}
```
Feb 22 2022
parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 22 February 2022 at 08:47:55 UTC, Stanislav Blinov 
wrote:
 A more pertinent example around file descriptors and memory 
 safety is void-initialization:

 ```d
 struct File
 {
     void write(const(void)[] data) @safe;
     // ...
     private int fd;
 }

 void main() @safe
 {
     File f = void; // this compiles in current language, because `File` doesn't have pointers
     f.write("hello"); // may corrupt memory if (implementation-defined) value of `f.fd` happens to correspond to an existing mapping
 }
 ```
If you attempt to fill in the missing part of your example, I think you will find that you cannot actually demonstrate memory corruption resulting from `void`-initialization of a file descriptor without the use of `@trusted` code (e.g., to cast the `void*` returned from `mmap` to some other type of pointer whose target type has unsafe values).
Feb 22 2022
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Tuesday, 22 February 2022 at 13:13:43 UTC, Paul Backus wrote:
 On Tuesday, 22 February 2022 at 08:47:55 UTC, Stanislav Blinov 
 wrote:
 A more pertinent example around file descriptors and memory 
 safety is void-initialization:

 ```d
 struct File
 {
     void write(const(void)[] data) @safe;
     // ...
     private int fd;
 }

 void main() @safe
 {
     File f = void; // this compiles in current language, because `File` doesn't have pointers
     f.write("hello"); // may corrupt memory if (implementation-defined) value of `f.fd` happens to correspond to an existing mapping
 }
 ```
If you attempt to fill in the missing part of your example, I think you will find that you cannot actually demonstrate memory corruption resulting from `void`-initialization of a file descriptor without the use of `@trusted` code (e.g., to cast the `void*` returned from `mmap` to some other type of pointer whose target type has unsafe values).
Yes, the implementation of `File` would need @trusted code. How would that invalidate the example?

Your process can inherit fds from its parent. Or you may have pipes, shared memfds, sockets. Plenty of ways of obtaining a valid fd without requiring any casts OR having to deal with pointers. The example shows a way to (unintentionally) alias an existing fd (obtained through whichever means) and write to it, in @safe context. An fd is an unsafe quantity encoded in a safe type. We need a way to express that in the language.
Feb 22 2022
parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 22 February 2022 at 15:55:16 UTC, Stanislav Blinov 
wrote:
 Yes, the implementation of `File` would need @trusted code. How 
 would that invalidate the example?
If completing the example required *incorrect* use of `@trusted` (i.e., on a function that does not have a [safe interface][1]), it would not be valid. Using `@trusted` in the implementation of `File.write` to call POSIX `write` would not be a problem.

[1]: https://dlang.org/spec/function.html#safe-interfaces
 Your process can inherit fds from its parent. Or you may have 
 pipes, shared memfds, sockets. Plenty of ways of obtaining a 
 valid fd without requiring any casts OR having to deal with 
 pointers. The example shows a way to (unintentionally) alias an 
 existing fd (obtained through whichever means) and write to it, 
 in @safe context.
The example shows a write to an fd, and then hand-waves about how this could maybe, hypothetically, somehow, cause memory corruption. Aliasing an fd does not, by itself, constitute memory corruption.

Remember, if you can cause memory corruption in `@safe` code, that means you can also cause undefined behavior in `@safe` code. So if you cannot write a program that uses this alleged loophole to cause UB, then what you have found is not actually memory corruption. (Although it may still be "data corruption".)
Feb 22 2022
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Tuesday, 22 February 2022 at 16:16:30 UTC, Paul Backus wrote:
 On Tuesday, 22 February 2022 at 15:55:16 UTC, Stanislav Blinov 
 wrote:
 Yes, the implementation of `File` would need @trusted code. 
 How would that invalidate the example?
If completing the example required *incorrect* use of `@trusted` (i.e., on a function that does not have a [safe interface][1]), it would not be valid.
It doesn't. The program, as presented, is enough.
 Using `@trusted` in the implementation of `File.write` to call 
 POSIX `write` would not be a problem.
All the more reason for me to not understand the objection then.
 [1]: https://dlang.org/spec/function.html#safe-interfaces

 Your process can inherit fds from its parent. Or you may have 
 pipes, shared memfds, sockets. Plenty of ways of obtaining a 
 valid fd without requiring any casts OR having to deal with 
 pointers. The example shows a way to (unintentionally) alias 
 an existing fd (obtained through whichever means) and write to 
 it, in @safe context.
The example shows a write to an fd, and then hand-waves about how this could maybe, hypothetically, somehow, cause memory corruption.
I don't follow. It seems pretty clear to me how the example is expressed. Where's the handwaving? Compare to this:

```d
void main() @safe
{
    char[5]* ptr = void;
    *ptr = "hello";
}
```

The above won't compile, since void-initialization of pointers is not allowed in @safe code, with good reason. Is there anything to handwave here? The fd example, however, *will* compile in current language, despite doing the same thing.
 Aliasing an fd does not, by itself, constitute memory 
 corruption.
I did not say it did. Writing to an fd initialized with "implementation-defined" (read: garbage) value may - that's what I said, and that's what the example shows.
 Remember, if you can cause memory corruption in `@safe` code, 
 that means you can also cause undefined behavior in `@safe` 
 code. So if you cannot write a program that uses this alleged 
 loophole to cause UB, then what you have found is not actually 
 memory corruption. (Although it may still be "data corruption".)
You *can* write such a program with fds. The example is one such program. Do you have any suggestions on how to make it clearer?
Feb 22 2022
parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 22 February 2022 at 17:29:46 UTC, Stanislav Blinov 
wrote:
 On Tuesday, 22 February 2022 at 16:16:30 UTC, Paul Backus wrote:
 Remember, if you can cause memory corruption in `@safe` code, 
 that means you can also cause undefined behavior in `@safe` 
 code. So if you cannot write a program that uses this alleged 
 loophole to cause UB, then what you have found is not actually 
 memory corruption. (Although it may still be "data 
 corruption".)
You *can* write such a program with fds. The example is one such program. Do you have any suggestions on how to make it clearer?
The example, as written, does not link, because `File.write` is missing a function body. If I fill in the obvious implementation, I get the following program:

```d
struct File
{
    void write(const(void)[] data) @safe
    {
        import core.sys.posix.unistd: write;
        () @trusted { write(fd, data.ptr, data.length); }();
    }
    private int fd;
}

void main() @safe
{
    File f = void;
    f.write("hello");
}
```

The above program does not have undefined behavior. The call to `write` will either fail with `EBADF`, or attempt to write the string `"hello"` to some unspecified open file.

If you believe there is some way to get the above program to produce undefined behavior, or to complete your original example in such a way that it produces undefined behavior without the use of incorrect `@trusted` code, I'm afraid you will have to spell it out for me.
Feb 22 2022
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Tuesday, 22 February 2022 at 18:33:58 UTC, Paul Backus wrote:
 On Tuesday, 22 February 2022 at 17:29:46 UTC, Stanislav Blinov 
 wrote:
 You *can* write such a program with fds. The example is one 
 such program. Do you have any suggestions on how to make it 
 clearer?
The example, as written, does not link, because `File.write` is missing a function body.
If you're going to go there, then...
 If I fill in the obvious implementation, I get the following 
 program:

 ```d
 struct File
 {
     void write(const(void)[] data) @safe
     {
         import core.sys.posix.unistd: write;
         () @trusted { write(fd, data.ptr, data.length); }();
     }
     private int fd;
 }
...that @trusted code is incorrect, at least on some platforms (yes, I can nitpick too). Or we can simply agree that `File.write` is implemented correctly in terms of `write` (which is the important part) and leave it at that, as the rest is irrelevant to the example. I am seriously perplexed at this kind of nitpicking, not to mention the implied expectation of having to spell out full-blown libraries in an example code.
 void main() @safe
 {
     File f = void;
     f.write("hello");
 }
 ```

 The above program does not have undefined behavior. The call to 
 `write` will either fail with `EBADF`, or attempt to write the 
 string `"hello"` to some unspecified open file.
I am reasonably certain that the results may be much more varied than that, including some that aren't specified: https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
 If you believe there is some way to get the above program to 
 produce undefined behavior, or to complete your original 
 example in such a way that it produces undefined behavior 
 without the use of incorrect `@trusted` code, I'm afraid you 
 will have to spell it out for me.
Not exhaustive: It may corrupt a given GC implementation's heap, which means what occurs after the } is anyone's guess. It may mutate data that's supposed to be immutable (i.e. in a parent process, though you could argue that might not be relevant to the DIP). It may block indefinitely, or crash, or complete with no effect. If you could demonstrate that it cannot possibly exhibit at least the above, I'll happily accept being mistaken.

...but seriously. What is it with all the condescending tone on the forums lately?
Feb 22 2022
next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Wednesday, 23 February 2022 at 00:14:13 UTC, Stanislav Blinov 
wrote:
 If you're going to go there, then...

 ...

 ...that @trusted code is incorrect, at least on some platforms 
 (yes, I can nitpick too).
Why? I don't see it.
 ...but seriously. What is it with all the condescending tone on 
 the forums lately?
I think Paul makes a valid point and uses an appropriate tone. The DIP should not be hand-wavy about how `scope` checking would help memory safety when using file descriptors. I didn't go into much detail there because I didn't think it would be a contested addition.
Feb 23 2022
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Wednesday, 23 February 2022 at 16:14:51 UTC, Dennis wrote:
 On Wednesday, 23 February 2022 at 00:14:13 UTC, Stanislav 
 Blinov wrote:
 If you're going to go there, then...

 ...

 ...that @trusted code is incorrect, at least on some platforms 
 (yes, I can nitpick too).
Why? I don't see it.
Because not all possible values of `data.length` are valid values for `write`'s third argument.
 ...but seriously. What is it with all the condescending tone 
 on the forums lately?
I think Paul makes a valid point and uses an appropriate tone. The DIP should not be hand-wavy about how `scope` checking would help memory safety when using file descriptors. I didn't go into much detail there because I didn't think it would be a contested addition.
I see the problem now, thanks.
Feb 23 2022
parent Paul Backus <snarwin gmail.com> writes:
On Wednesday, 23 February 2022 at 22:01:55 UTC, Stanislav Blinov 
wrote:
 Because not all possible values of `data.length` are valid 
 values for `write`'s third argument.
POSIX says:
 Before any action described below is taken, and if nbyte is 
 zero and the file is a regular file, the write() function may 
 detect and return errors as described below. In the absence of 
 errors, or if error detection is not performed, the write() 
 function shall return zero and have no other results. If nbyte 
 is zero and the file is not a regular file, the results are 
 unspecified.
https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

I was unable to find a definition in the standard itself of exactly what "unspecified" means in this context, but I think we can assume that it does not mean the same thing as "undefined", because the POSIX standard uses the actual word "undefined" elsewhere (e.g., in the description of [`pthread_mutex_destroy`][1]).

If we assume that it means the same thing as ["unspecified behavior" in C][2], then it means that there are multiple possible behaviors, and the standard does not require an implementation to commit to any particular one in any given situation.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_destroy.html
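
For illustration, here is a minimal sketch of a wrapper that sidesteps both loosely specified lengths. It assumes the concern is about lengths of zero (the passage quoted above) and lengths above `SSIZE_MAX` (which POSIX describes as implementation-defined). The name `checkedWrite` is hypothetical and not from the DIP. The function is deliberately left `@system`, because checking the length says nothing about what the fd refers to.

```d
import core.sys.posix.sys.types : ssize_t;
import core.sys.posix.unistd : write;

/// Hypothetical helper (not from the DIP): filters out the lengths that
/// POSIX leaves unspecified or implementation-defined before calling write.
ssize_t checkedWrite(int fd, scope const(void)[] data) @system
{
    if (data.length == 0)
        return 0; // skip the "unspecified" zero-length case for non-regular files
    if (data.length > cast(size_t) ssize_t.max)
        return -1; // above SSIZE_MAX the result is implementation-defined
    return write(fd, data.ptr, data.length);
}

void main()
{
    checkedWrite(1, "hello\n"); // fd 1 is standard output
}
```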
Feb 23 2022
prev sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Wednesday, 23 February 2022 at 00:14:13 UTC, Stanislav Blinov 
wrote:
 On Tuesday, 22 February 2022 at 18:33:58 UTC, Paul Backus wrote:
 If you believe there is some way to get the above program to 
 produce undefined behavior, or to complete your original 
 example in such a way that it produces undefined behavior 
 without the use of incorrect `@trusted` code, I'm afraid you 
 will have to spell it out for me.
 Not exhaustive:

 It may corrupt a given GC's implementation's heap, which means what occurs after the } is anyone's guess.
 It may mutate data that's supposed to be immutable (i.e. in a parent process, though you could argue that might not be relevant to the DIP).
 It may block indefinitely, or crash, or complete with no effect.

 If you could demonstrate that it cannot possibly exhibit at least the above, I'll happily accept being mistaken.
Having spent some more time scratching my head over this, I now realize what I was missing: it is indeed possible to open a file descriptor that can corrupt *arbitrary* memory in a process's address space, using something like `/proc/self/mem`.

Maybe I'm an idiot for missing this the first time around; I can only ask that you take pity on me. :)

This means that calling `write` on a fd is only memory safe if you have previously verified that the file the fd refers to is "well behaved" (i.e., satisfies a particular invariant). It follows that the fd itself must be stored in a `@system` variable in order to ensure that the invariant is maintained in `@safe` code.

I don't think adding `scope` checking to the fd makes any difference here, though. *Reading* from `/proc/self/mem` in `@safe` code is perfectly fine, even if you are reading from uninitialized or deallocated memory. The reason such reads are UB when done through pointers is that *dereferencing an invalid pointer* is UB, not because reading from the memory is UB.

(I'm also not sure if it's possible in practice to tell whether a file is "well behaved". If not, that means we have to either accept that `write` is *always* `@system`, or allow a permanent loophole in `@safe`. But that's a separate issue.)
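
To make the `/proc/self/mem` problem concrete, here is a rough sketch. It assumes Linux and a kernel configuration that lets a process open its own `/proc/self/mem` for writing. The helper name `pokeSelf` is made up for this example. Note that the helper never dereferences `target`; the address is only used as a file offset.

```d
import core.stdc.stdio : printf, SEEK_SET;
import core.sys.posix.fcntl : open, O_WRONLY;
import core.sys.posix.sys.types : off_t;
import core.sys.posix.unistd : close, lseek, write;

/// Hypothetical helper: overwrites memory at `target` using only file I/O.
bool pokeSelf(void* target, const(void)[] payload)
{
    int fd = open("/proc/self/mem", O_WRONLY); // Linux-specific path
    if (fd < 0)
        return false;
    scope (exit) close(fd);

    // For this file, the offset is interpreted as an address in this process.
    if (lseek(fd, cast(off_t) cast(size_t) target, SEEK_SET) == -1)
        return false;
    return write(fd, payload.ptr, payload.length) == payload.length;
}

void main()
{
    int victim = 1;
    int payload = 42;
    bool ok = pokeSelf(&victim, (&payload)[0 .. 1]);
    printf("write succeeded: %d, victim = %d\n", ok ? 1 : 0, victim);
}
```

If the write goes through, `victim` changes without any pointer store in the program, which is why a `@trusted` wrapper around `write` cannot be correct for an arbitrary fd.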
Feb 23 2022
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Wednesday, 23 February 2022 at 18:16:17 UTC, Paul Backus wrote:
 (I'm also not sure if it's possible in practice to tell whether 
 a file is "well behaved". If not, that means we have to either 
 accept that `write` is *always* `@system`, or allow a permanent 
 loophole in `@safe`. But that's a separate issue.)
By the way, this issue has also come up in Rust:

- https://github.com/rust-lang/rust/issues/32670
- https://blog.yossarian.net/2021/03/16/totally_safe_transmute-line-by-line
Feb 23 2022
prev sibling parent Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Wednesday, 23 February 2022 at 18:16:17 UTC, Paul Backus wrote:

 Having spent some more time scratching my head over this, I now 
 realize what I was missing: it is indeed possible to open a 
 file descriptor that can corrupt *arbitrary* memory in a 
 process's address space, using something like `/proc/self/mem`.
Yes, or you may use e.g. `memfd_create`. And you can inherit such an fd from a parent process. Or receive a shared memory descriptor from another process.
 Maybe I'm an idiot for missing this the first time around; I 
 can only ask that you take pity on me. :)
Never! How dare you make me question myself!!! :)
 This means that calling `write` on a fd is only memory safe if 
 you have previously verified that the file the fd refers to is 
 "well behaved" (i.e., satisfies a particular invariant). It 
 follows that the fd itself must be stored in a `@system` 
 variable in order to ensure that the invariant is maintained in 
 `@safe` code.
Yup.
 I don't think adding `scope` checking to the fd makes any 
 difference here, though. *Reading* from `/proc/self/mem` in 
 `@safe` code is perfectly fine, even if you are reading from 
 uninitialized or deallocated memory. The reason such reads are 
 UB when done through pointers is that *dereferencing an invalid 
 pointer* is UB, not because reading from the memory is UB.
Well, results of `read`ing from some types of fds are also not specified. So, if I'm not mistaken, performing such a read *and* then using the resulting "data" would be undefined behavior (provided the program even gets there).

As for `scope` checks themselves - as Dennis mentions, double `close` looks dissimilar to double free. Yet it *is* subject to a superset of that - use after free, as are `read` and `write`. You may well safely "dangle" an fd and not invoke UB by calling those functions with it, but only up to the point when the program opens another descriptor. Calling `close` on a dangled fd, which would then succeed, would be a mere bug and not invoke UB, but attempting to `write` or `read`+use may.

So I do think that fds could still be a good example material for the DIP.
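
A small sketch of the "dangling fd" hazard, assuming a single-threaded POSIX program where nothing else opens descriptors in between. The function name is made up for the example. It relies on the POSIX rule that `open` returns the lowest-numbered descriptor not currently open.

```d
import core.stdc.stdio : printf;
import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close;

/// Returns true if a closed descriptor number is handed out again.
bool closedFdIsReused()
{
    int stale = open("/dev/null", O_RDONLY);
    if (stale < 0)
        return false;
    close(stale); // `stale` now dangles: the number no longer names /dev/null

    // With no other opens in between, the next open is expected to get
    // the same number back.
    int fresh = open("/dev/zero", O_RDONLY);
    if (fresh < 0)
        return false;
    bool reused = (fresh == stale);
    // A read or write through `stale` at this point would silently
    // operate on /dev/zero instead.
    close(fresh);
    return reused;
}

void main()
{
    printf("dangling fd reused: %d\n", closedFdIsReused() ? 1 : 0);
}
```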
 (I'm also not sure if it's possible in practice to tell whether 
 a file is "well behaved". If not, that means we have to either 
 accept that `write` is *always* `@system`, or allow a permanent 
 loophole in `@safe`. But that's a separate issue.)
I don't think that should be necessary in concrete cases, as the onus of ensuring the implicit invariant would lie on the implementation of, in this case, `File` - e.g. making it non-copyable (or reference-counted), ensuring that the constructor opens an appropriate kind of file, etc. etc. That way the only way to make it unsafe would be to corrupt the given instance of `File` itself, which means there's a memory safety issue somewhere else in the program (for example, that same void-initialization).
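
Roughly what I have in mind, as a sketch only. `TempFile` and its members are hypothetical names, and `mkstemp` is used here simply because it guarantees the constructor opens a freshly created regular file. The field is plain `private` so the example compiles today; under the DIP it would be a `@system` variable, since `private` alone does not stop `@safe` code in the same module from overwriting it.

```d
import core.sys.posix.stdlib : mkstemp;
import core.sys.posix.sys.types : ssize_t;
import core.sys.posix.unistd : close, unlink, write;

/// Hypothetical wrapper; the name and API are illustrative only.
struct TempFile
{
    // Invariant: fd is -1 or refers to a regular file this struct created.
    // Under DIP 1035 this field would be declared `@system`.
    private int fd = -1;

    @disable this(this); // non-copyable: exactly one owner closes the fd

    static TempFile create() @trusted
    {
        char[] path = "/tmp/dip1035-XXXXXX\0".dup;
        TempFile f;
        f.fd = mkstemp(path.ptr); // creates and opens a fresh regular file
        if (f.fd >= 0)
            unlink(path.ptr); // the file goes away once the fd is closed
        return f;
    }

    ~this() @trusted
    {
        if (fd >= 0)
            close(fd);
    }

    bool isOpen() const @safe { return fd >= 0; }

    // Only sound as long as the invariant on `fd` holds.
    ssize_t put(scope const(void)[] data) @trusted
    {
        if (fd < 0)
            return -1;
        if (data.length == 0)
            return 0;
        return write(fd, data.ptr, data.length);
    }
}

void main() @safe
{
    auto f = TempFile.create();
    if (f.isOpen)
        f.put("hello");
}
```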
Feb 23 2022
prev sibling parent reply Paul Backus <snarwin gmail.com> writes:
This is my reply to [this post][1] from the feedback thread:

[1]: https://forum.dlang.org/post/qbbatlviwhjsnytbypfw@forum.dlang.org

On Friday, 4 March 2022 at 09:39:53 UTC, Dennis wrote:
 On Friday, 25 February 2022 at 21:46:25 UTC, Dukc wrote:
 Wouldn't putting the handle in union with `void[1]` work?
 No, `void[1]` is not a type with unsafe values.
`void[1]` is considered by the compiler to potentially contain pointer data, in accordance with this section of the language spec:

https://dlang.org/spec/arrays.html#void_arrays

Note in particular the paragraph that begins, "Void arrays can also be static".

As a result, the compiler will not allow you to void-initialize a `void[1]` in `@safe` code:

```d
void main() @safe {
    void[1] a = void; // error
}
```

So, the workaround suggested by Dukc would indeed work.

(By the way, I know this because the first thing I did after I read his post in the feedback thread was to actually write out a complete example using the `void[1]` workaround and check to see if it worked.)
Mar 04 2022
parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 4 March 2022 at 13:09:42 UTC, Paul Backus wrote:
 As a result, the compiler will not allow you to void-initialize 
 a `void[1]` in `@safe` code:

 ```d
 void main() @safe {
     void[1] a = void; // error
 }
 ```
That's new to me, and the error makes no sense considering you can (implicitly) convert any array to a `void[]` even in `@safe` code, so you can still do this:

```D
void main() @safe {
    ubyte[1] x = void;
    void[1] y = x;
}
```
 So, the workaround suggested by Dukc would indeed work.
It's an interesting alternative if we can nail it down.
Mar 04 2022
parent Paul Backus <snarwin gmail.com> writes:
On Friday, 4 March 2022 at 14:00:31 UTC, Dennis wrote:
 On Friday, 4 March 2022 at 13:09:42 UTC, Paul Backus wrote:
 As a result, the compiler will not allow you to 
 void-initialize a `void[1]` in `@safe` code:

 ```d
 void main() @safe {
     void[1] a = void; // error
 }
 ```
 That's new to me, and the error makes no sense considering you can (implicitly) convert any array to a `void[]` even in `@safe` code, so you can still do this:

 ```D
 void main() @safe {
     ubyte[1] x = void;
     void[1] y = x;
 }
 ```
Yes, I think this is a case of the compiler (and the spec) applying rules more broadly than is strictly necessary, since `void`-initializing a `void[1]` cannot *actually* lead to UB in `@safe` code on its own.
Mar 04 2022