digitalmars.D - Discussion Thread: DIP 1035--@system Variables--Final Review

Mike Parker (21/21) Feb 19 2022 This is the discussion thread for the Final Review of DIP 1035,

Mike Parker (3/8) Feb 19 2022 The feedback thread is located here:
Dukc (44/66) Feb 20 2022 While this DIP will solve most of the issues of `@safe`

Dennis (4/7) Feb 20 2022 Yes, there's an issue for that: [Issue 21981 - Manually calling a

Dukc (22/25) Mar 04 2022 I was just checking what the language spec says about this, and

Paul Backus (14/28) Mar 04 2022 First, this was not "overlooked"--it was added to the language

Nick Treleaven (7/18) Feb 22 2022 Doesn't the same problem occur just with reassignment?:

Dukc (20/39) Feb 23 2022 Reassignment can be forbidden at least. But still, my point does

Paul Backus (18/28) Feb 21 2022 In the "Example: `int` as pointer" section, the following

Paul Backus (4/7) Feb 21 2022 This should have been in the Feedback thread. I've reposted it
Dennis (6/11) Feb 21 2022 It is memory-safety related, it allows you to create custom

Paul Backus (14/25) Feb 21 2022 If the goal is being able to define custom pointer types, then

Dennis (18/29) Feb 21 2022 A double `fclose` on a `FILE*` is basically a double free. I

Paul Backus (15/23) Feb 21 2022 The compiler needs at least two notions: `hasPointers` and
Stanislav Blinov (19/29) Feb 22 2022 A more pertinent example around file descriptors and memory

Paul Backus (8/26) Feb 22 2022 If you attempt to fill in the missing part of your example, I

Stanislav Blinov (11/39) Feb 22 2022 Yes, the implementation of `File` would need @trusted code. How

Paul Backus (17/25) Feb 22 2022 If completing the example required *incorrect* use of `@trusted`

Stanislav Blinov (20/47) Feb 22 2022 More the reason for me to not understand the objection then.

Paul Backus (29/39) Feb 22 2022 The example, as written, does not link, because `File.write` is

Stanislav Blinov (23/56) Feb 22 2022 ...that @trusted code is incorrect, at least on some platforms

Dennis (8/14) Feb 23 2022 Why? I don't see it.

Stanislav Blinov (4/20) Feb 23 2022 Because not all possible values of `data.length` are valid values

Paul Backus (18/27) Feb 23 2022 POSIX says:

Paul Backus (24/39) Feb 23 2022 Having spent some more time scratching my head over this, I now

Paul Backus (5/9) Feb 23 2022 By the way, this issue has also come up in Rust:
Stanislav Blinov (29/51) Feb 23 2022 Yes, or you may use e.g. `memfd_create`. And you can inherit such

Paul Backus (22/25) Mar 04 2022 This is my reply to [this post][1] from the feedback thread:

Dennis (11/19) Mar 04 2022 That's new to me, and the error makes no sense considering you

Paul Backus (5/23) Mar 04 2022 Yes, I think this is a case of the compiler (and the spec)

Mike Parker <aldacron gmail.com> writes:

This is the discussion thread for the Final Review of DIP 1035, 
"@system Variables":

https://github.com/dlang/DIPs/blob/4d73e17901a3a620bf59a2a5bfb8c433069c5f52/DIPs/DIP1035.md

The review period will end at 11:59 PM ET on March 5, or when I 
make a post declaring it complete. Discussion in this thread may 
continue beyond that point.

Here in the discussion thread, you are free to discuss anything 
and everything related to the DIP. Express your support or 
opposition, debate alternatives, argue the merits, etc.

However, if you have any specific feedback on how to improve the 
proposal itself, then please post it in the feedback thread. The 
feedback thread will be the source for the review summary I write 
at the end of this review round. I will post a link to that 
thread immediately following this post. Just be sure to read and 
understand the Reviewer Guidelines before posting there:

https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

And my blog post on the difference between the Discussion and 
Feedback threads:

https://dlang.org/blog/2020/01/26/dip-reviews-discussion-vs-feedback/

Please stay on topic here. I will delete posts that are 
completely off-topic.

Feb 19 2022

Mike Parker <aldacron gmail.com> writes:

On Saturday, 19 February 2022 at 12:24:04 UTC, Mike Parker wrote:

 However, if you have any specific feedback on how to improve 
 the proposal itself, then please post it in the feedback 
 thread. The feedback thread will be the source for the review 
 summary I write at the end of this review round. I will post a 
 link to that thread immediately following this post.

The feedback thread is located here:

https://forum.dlang.org/post/kwabfusqvczenjjacbmq@forum.dlang.org

Feb 19 2022

Dukc <ajieskola gmail.com> writes:

On Saturday, 19 February 2022 at 12:24:04 UTC, Mike Parker wrote:
 This is the discussion thread for the Final Review of DIP 1035, 
 "@system Variables":

 https://github.com/dlang/DIPs/blob/4d73e17901a3a620bf59a2a5bfb8c433069c5f52/DIPs/DIP1035.md

 The review period will end at 11:59 PM ET on March 5, or when I 
 make a post declaring it complete. Discussion in this thread 
 may continue beyond that point.

 Here in the discussion thread, you are free to discuss anything 
 and everything related to the DIP. Express your support or 
 opposition, debate alternatives, argue the merits, etc.

 However, if you have any specific feedback on how to improve 
 the proposal itself, then please post it in the feedback 
 thread. The feedback thread will be the source for the review 
 summary I write at the end of this review round. I will post a 
 link to that thread immediately following this post. Just be 
 sure to read and understand the Reviewer Guidelines before 
 posting there:

 https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

 And my blog post on the difference between the Discussion and 
 Feedback threads:

 https://dlang.org/blog/2020/01/26/dip-reviews-discussion-vs-feedback/

 Please stay on topic here. I will delete posts that are 
 completely off-topic.

While this DIP will solve most of the issues of `@safe` 
`__traits(getMember, xxx, yyy)` fetching private members, there 
is one thing that will remain a problem: destructors.

The issue is that we want a way to specify a destructor so that 
it could be called by `@safe` code at the end of the lifetime of an 
instance, but not early:

```D
auto shouldBeSafe()
{ ObjectWithDestructor x;
}

auto shouldBeSystem()
{ ObjectWithDestructor x;
   __traits(getMember, x, "__dtor")();
}
```

For that to work, we will either have to make privacy inviolable 
from `@safe` (the destructor that does not want to be called 
early would be private, and `object.destroy` would be changed to 
be `@system` for private destructors), or add some alternative 
way to define destructors.

Why do we want to do that at all? DIP1000. Most of the potential of 
DIP1000 is wasted if you cannot prevent destruction before the end of 
the scope:

```D
@safe void abuse()
{ auto cont = SomeRaiiContainer([1,2,3]);
   scope ptr = &cont.front;
   destroy(cont);
   int oops = *ptr;
}
```

Yes, you can prevent that by also marking `abuse` `@live`. But I 
think we ought to do better than that. As I see it, `@live` is 
mainly meant as a partial memory safety mechanism for low-level 
code that cannot be `@safe`. It isn't intended that you start to 
mark your average `@safe` code as `@live`; that would be terribly 
onerous. So we don't want to settle for our libraries being 
memory safe in `@live`, if we can make them memory safe in normal 
`@safe`.

This is not intended as an argument against DIP1035, in fact I'm 
still in favour of it. I just wanted to point out that it does 
not entirely solve the problem of `__traits(getMember, xxx, yyy)` 
bypassing privacy.

Feb 20 2022

Dennis <dkorpel gmail.com> writes:

On Sunday, 20 February 2022 at 15:16:30 UTC, Dukc wrote:
 The issue is that we want a way to specify a destructor so that 
 it could be called by `@safe` code at end of the lifetime of an 
 instance, but not early:

Yes, there's an issue for that: [Issue 21981 - Manually calling a 
__dtor can violate memory safety](https://issues.dlang.org/show_bug.cgi?id=21981)

Feb 20 2022

Dukc <ajieskola gmail.com> writes:

On Friday, 4 March 2022 at 09:39:53 UTC, Dennis wrote at the 
feedback thread:
 On Friday, 25 February 2022 at 21:46:25 UTC, Dukc wrote:
 Wouldn't putting the handle in union with `void[1]` work?

 No, `void[1]` is not a type with unsafe values.

I was just checking what the language spec says about this, and 
found an alternative we have all been overlooking.

A type can be declared unsafe in the present language by giving 
it an invariant.

Yes, I meant that contract programming invariant! The spec says 
that void-initializing a type with an invariant, or using a 
union that has a member with an invariant, is `@system`-only. Thus 
the invariant effectively declares the type unsafe. It also means 
that `void[1]` is an unsafe type, because it can contain a struct 
with an invariant.

This DIP still has the advantage that `@safe` functions in the 
same module with the invariant type do not need any special care. 
But still, that sounds a pretty trivial gain to me - in the 
`IntSlice` example you can make the members read-only with a bit 
union trickery if you want to, or define a string mixin that does 
the same automatically.

I'm starting to think it's probably not worth it overall. Still 
I'm only slightly against because the rules proposed blend such 
nicely with the existing language, and it sure is sometimes 
convenient to have an alternative.

Mar 04 2022

Paul Backus <snarwin gmail.com> writes:

On Friday, 4 March 2022 at 13:06:35 UTC, Dukc wrote:
 On Friday, 4 March 2022 at 09:39:53 UTC, Dennis wrote at the 
 feedback thread:
 On Friday, 25 February 2022 at 21:46:25 UTC, Dukc wrote:
 Wouldn't putting the handle in union with `void[1]` work?

 No, `void[1]` is not a type with unsafe values.

 I was just checking what the language spec says about this, and 
 found an alternative we have all been overlooking.

 A type can be declared unsafe in the present language by giving 
 it an invariant.

 Yes, I meant that contract programming invariant! The spec says 
 that void-initializing a type with an invariant, or using a 
 union that has a member with an invariant, is `@system`-only. 
 Thus the invariant effectively declares the type unsafe.

First, this was not "overlooked"--it was added to the language 
spec well after DIP 1035 was written and submitted. Dennis and I 
have been aware of this spec change since it was first proposed 
in [DMD PR 12326][1].

Second, this is not a complete alternative to DIP 1035, because 
it does not solve [the `__traits(getMember)` issue][2]. As long 
as `@safe` code is allowed to bypass encapsulation and access the 
fields of user-defined types directly, it is impossible for 
`@trusted` code to rely on the integrity of the data in those 
fields.

[1]: 
https://github.com/dlang/dmd/pull/12326#issuecomment-812575730
[2]: https://issues.dlang.org/show_bug.cgi?id=20941

Mar 04 2022

Nick Treleaven <nick geany.org> writes:

On Sunday, 20 February 2022 at 15:16:30 UTC, Dukc wrote:
 Why we want to do that at all? DIP1000. Most of the potential 
 of DIP1000 is wasted if you cannot prevent destruction before 
 end of the scope:

 ```D
 @safe void abuse()
 { auto cont = SomeRaiiContainer([1,2,3]);
   scope ptr = &cont.front;
   destroy(cont);
   int oops = *ptr;
 }
 ```

Doesn't the same problem occur just with reassignment?:

```d
   scope ptr = &cont.front;
   cont = cont.init;
   int oops = *ptr;
```

Feb 22 2022

Dukc <ajieskola gmail.com> writes:

On Tuesday, 22 February 2022 at 15:05:14 UTC, Nick Treleaven 
wrote:
 On Sunday, 20 February 2022 at 15:16:30 UTC, Dukc wrote:
 Why we want to do that at all? DIP1000. Most of the potential 
 of DIP1000 is wasted if you cannot prevent destruction before 
 end of the scope:

 ```D
 @safe void abuse()
 { auto cont = SomeRaiiContainer([1,2,3]);
   scope ptr = &cont.front;
   destroy(cont);
   int oops = *ptr;
 }
 ```

 Doesn't the same problem occur just with reassignment?:

 ```d
   scope ptr = &cont.front;
   cont = cont.init;
   int oops = *ptr;
 ```

Reassignment can be forbidden at least. But still, my point does 
not stand scrutiny, because:

1: No reassignment is clumsy.
2: Even accepting that, this would still do the same thing:
```D
   auto pCont = new SomeRaiiContainer([1,2,3]);
   scope ptr = &pCont.front();
   pCont = null;
   GC.collect;
   int oops = *ptr;
```

So, scratch that. RAII or reference counted containers can only 
be `@safe` with a callback based usage, as suggested by Paul: 
https://github.com/dlang/phobos/pull/8368#issuecomment-1024917439

And that idiom is `@safe` even with present destructors. 
Hopefully: we have discovered so many holes in our DIP1000-based memory 
safety schemes lately that there is no guarantee that this isn't 
still just some oversight.
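
For illustration, a minimal sketch of what callback-based usage could look like. `SomeRaiiContainer`, `withFront` and `readFront` are hypothetical names here, not the API from the linked pull request, and the storage is GC-backed so the sketch itself is trivially memory safe. A real RAII container would additionally have to keep its storage alive while the callback runs.

```d
// Hypothetical container, for illustration only.
struct SomeRaiiContainer
{
    private int[] items;

    this(int[] initial) @safe
    {
        items = initial.dup;
    }

    // Lend the first element to a callback instead of returning a
    // reference that the caller could keep around.
    void withFront(scope void delegate(ref int) @safe dg) @safe
    {
        dg(items[0]);
    }
}

int readFront() @safe
{
    auto cont = SomeRaiiContainer([1, 2, 3]);
    int result;
    // The element is only reachable inside the callback body.
    cont.withFront((ref int x) { result = x; });
    return result;
}
```

The caller never holds a reference to the element outside the callback, so nothing is left to dangle once `withFront` returns.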

Feb 23 2022

Paul Backus <snarwin gmail.com> writes:

On Saturday, 19 February 2022 at 12:24:04 UTC, Mike Parker wrote:
 This is the discussion thread for the Final Review of DIP 1035, 
 "@system Variables":

 https://github.com/dlang/DIPs/blob/4d73e17901a3a620bf59a2a5bfb8c433069c5f52/DIPs/DIP1035.md

In the "Example: `int` as pointer" section, the following 
sentence appears:

 Because an `int` is a safe type, any `int` value can be created 
 from `@safe` code, so any memory corruption that could follow 
 from escaping a `scope int` could also result from creating the 
 same `int` value without accessing the variable.

This sentence correctly recognizes that (absent incorrect 
`@trusted` code elsewhere) there is no memory-safety risk in 
allowing a value without indirections to escape from a function.

It also completely undermines the example's motivation. If there 
is no benefit to memory-safety from applying `scope` checking to 
data without indirections, then there is no justification for 
enabling such checks in all `@safe` code, even if they may 
occasionally be "desirable" for other, non-memory-safety-related 
reasons.

Later, in the "Description" section, we find the following 
sentence:

 The `scope` keyword is not stripped away [from an aggregate 
 with at least one `@system` field], even when the aggregate has 
 no members that contain pointers.

The only justification for this appears to be the example 
discussed above.

Both this sentence, and the example that attempts to support it, 
should be removed from the DIP.

Feb 21 2022

Paul Backus <snarwin gmail.com> writes:

On Monday, 21 February 2022 at 19:49:58 UTC, Paul Backus wrote:
 In the "Example: `int` as pointer" section, the following 
 sentence appears:
 [...]

This should have been in the Feedback thread. I've reposted it 
there now:

https://forum.dlang.org/post/lmmpuyeurzavwqiylwlp@forum.dlang.org

Feb 21 2022

Dennis <dkorpel gmail.com> writes:

On Monday, 21 February 2022 at 19:49:58 UTC, Paul Backus wrote:
 If there is no benefit to memory-safety from applying `scope`
 checking to data without indirections, then there is no
 justification for enabling such checks in all `@safe` code, 
 even if they may occasionally be "desirable" for other, 
 non-memory-safety-related reasons.

It is memory-safety related, it allows you to create custom 
pointer types. A pointer is just an integer under the hood, the 
idea of indirections and lifetimes is just a compile time idea 
around a `size_t` which indexes into memory. Why can't we do the 
same with a `ushort` which indexes into an array?
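
To make the question concrete, here is a rough sketch of such an index-based handle. `Pool`, `Handle`, `alloc` and `get` are made-up names, not taken from the DIP. In this version the lookup is bounds-checked, so a bogus index cannot corrupt memory; the situation under discussion arises once the lookup skips that check in `@trusted` code and has to rely on `index` being valid.

```d
// Hypothetical pool whose handles are small integers rather than pointers.
struct Pool
{
    private int[] slots;

    // A handle is just an index, much as a pointer is just an address.
    struct Handle
    {
        private ushort index;
    }

    Handle alloc(int value) @safe
    {
        assert(slots.length <= ushort.max, "pool is full");
        slots ~= value;
        return Handle(cast(ushort)(slots.length - 1));
    }

    int get(Handle h) const @safe
    {
        // Bounds-checked lookup: an invalid index throws a RangeError
        // instead of reading arbitrary memory.
        return slots[h.index];
    }
}
```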

Feb 21 2022

Paul Backus <snarwin gmail.com> writes:

On Monday, 21 February 2022 at 20:30:07 UTC, Dennis wrote:
 On Monday, 21 February 2022 at 19:49:58 UTC, Paul Backus wrote:
 If there is no benefit to memory-safety from applying `scope`
 checking to data without indirections, then there is no
 justification for enabling such checks in all `@safe` code, 
 even if they may occasionally be "desirable" for other, 
 non-memory-safety-related reasons.

 It is memory-safety related, it allows you to create custom 
 pointer types. A pointer is just an integer under the hood, the 
 idea of indirections and lifetimes is just a compile time idea 
 around a `size_t` which indexes into memory. Why can't we do 
 the same with a `ushort` which indexes into an array?

If the goal is being able to define custom pointer types, then 
the DIP should use that as an example instead of talking about 
file descriptors, and it should explain *exactly* which part of 
the example depends on this feature for memory safety (as the 
other examples do).

I still don't think it's a compelling use-case, though. 
[`TailUnqual`][1] does something very similar, using the `union` 
workaround, and it would not benefit from having access to 
`scope`-checked integers because (a) it stores a `size_t`, so 
eliminating the `union` wouldn't save any space; and (b) it needs 
the `union` for correct GC scanning regardless.

[1]: 
https://gist.github.com/pbackus/1638523a5b6ea3ce2c0a73358cff4dc6

Feb 21 2022

Dennis <dkorpel gmail.com> writes:

On Monday, 21 February 2022 at 21:50:31 UTC, Paul Backus wrote:
 If the goal is being able to define custom pointer types, then 
 the DIP should use that as an example instead of talking about 
 file descriptors, and it should explain *exactly* which part of 
 the example depends on this feature for memory safety (as the 
 other examples do).

A double `fclose` on a `FILE*` is basically a double free. I 
thought the same would apply to raw file descriptors, but I just 
read that a double `close` simply results in an `EBADF` error, so 
maybe it's not a good example.

 I still don't think it's a compelling use-case, though.
 [`TailUnqual`][1] does something very similar, using the 
 `union` workaround, and it would not benefit from having access 
 to `scope`-checked integers because (a) it stores a `size_t`, 
 so eliminating the `union` wouldn't save any space; and (b) it 
 needs the `union` for correct GC scanning regardless.

Yes, TailUnqual doesn't need `scope`-checked integers, but that 
doesn't mean other code doesn't need it. I added the rule for two 
reasons:

- The compiler currently has a notion of a type that 
`hasPointers`. The extra complexity of adding a notion 
`hasSystemVariables` was daunting, but then I thought we could 
just make them the same. I think that would not only simplify the 
implementation, but also the feature in general. It makes it easy 
to draw a parallel to a pointer and a `@system size_t`.
- Some people asked for the feature (see links in the rationale 
section)

I can improve the DIP text, but I'm not yet convinced the rule 
should be scrapped.

Feb 21 2022

Paul Backus <snarwin gmail.com> writes:

On Monday, 21 February 2022 at 22:56:30 UTC, Dennis wrote:
 - The compiler currently has a notion of a type that 
 `hasPointers`. The extra complexity of adding a notion 
 `hasSystemVariables` was daunting, but then I thought we could 
 just make them the same. I think that would not only simplify 
 the implementation, but also the feature in general. It makes 
 it easy to draw a parallel to a pointer and a `@system size_t`.

The compiler needs at least two notions: `hasPointers` and 
`hasUnsafeValues`. `hasUnsafeValues` is a superset of 
`hasPointers`, and also includes aggregate types with `@system` 
fields and [`bool`][1].

Since I assume you do not intend to expand `scope` checking to 
`bool`, folding everything into a single concept will not be 
possible.

[1]: https://issues.dlang.org/show_bug.cgi?id=20148

 - Some people asked for the feature (see links in the rationale 
 section)

I've read those links. In one of them, the problem was solved 
using existing DIP 1000 features. The other asks for "unique 
references (and borrow) to plain ints". It is not clear to me 
that `scope` checking for integers would solve either of these 
problems. If you believe it would, then it should not be 
difficult for you to write up an example demonstrating how.

Feb 21 2022

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Monday, 21 February 2022 at 22:56:30 UTC, Dennis wrote:
 On Monday, 21 February 2022 at 21:50:31 UTC, Paul Backus wrote:
 If the goal is being able to define custom pointer types, then 
 the DIP should use that as an example instead of talking about 
 file descriptors, and it should explain *exactly* which part 
 of the example depends on this feature for memory safety (as 
 the other examples do).

 A double `fclose` on a `FILE*` is basically a double free. I 
 thought the same would apply to raw file descriptors, but I 
 just read that a double `close` simply results in an `EBADF` 
 error, so maybe it's not a good example.

A more pertinent example around file descriptors and memory 
safety is void-initialization:

```d
struct File
{
     void write(const(void)[] data) @safe;
     // ...
     private int fd;
}

void main() @safe
{
     // This compiles in the current language, because `File` doesn't have pointers.
     File f = void;
     // May corrupt memory if the (implementation-defined) value of `f.fd`
     // happens to correspond to an existing mapping.
     f.write("hello");
}
```

Feb 22 2022

Paul Backus <snarwin gmail.com> writes:

On Tuesday, 22 February 2022 at 08:47:55 UTC, Stanislav Blinov 
wrote:
 A more pertinent example around file descriptors and memory 
 safety is void-initialization:

 ```d
 struct File
 {
      void write(const(void)[] data) @safe;
      // ...
      private int fd;
 }

 void main() @safe
 {
      // This compiles in the current language, because `File` doesn't have pointers.
      File f = void;
      // May corrupt memory if the (implementation-defined) value of `f.fd`
      // happens to correspond to an existing mapping.
      f.write("hello");
 }
 ```

If you attempt to fill in the missing part of your example, I 
think you will find that you cannot actually demonstrate memory 
corruption resulting from `void`-initialization of a file 
descriptor without the use of `@trusted` code (e.g., to cast the 
`void*` returned from `mmap` to some other type of pointer whose 
target type has unsafe values).

Feb 22 2022

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Tuesday, 22 February 2022 at 13:13:43 UTC, Paul Backus wrote:
 On Tuesday, 22 February 2022 at 08:47:55 UTC, Stanislav Blinov 
 wrote:
 A more pertinent example around file descriptors and memory 
 safety is void-initialization:

 ```d
 struct File
 {
      void write(const(void)[] data) @safe;
      // ...
      private int fd;
 }

 void main() @safe
 {
      // This compiles in the current language, because `File` doesn't have pointers.
      File f = void;
      // May corrupt memory if the (implementation-defined) value of `f.fd`
      // happens to correspond to an existing mapping.
      f.write("hello");
 }
 ```

 If you attempt to fill in the missing part of your example, I 
 think you will find that you cannot actually demonstrate memory 
 corruption resulting from `void`-initialization of a file 
 descriptor without the use of `@trusted` code (e.g., to cast 
 the `void*` returned from `mmap` to some other type of pointer 
 whose target type has unsafe values).

Yes, the implementation of `File` would need @trusted code. How 
would that invalidate the example?

Your process can inherit fds from its parent. Or you may have 
pipes, shared memfds, sockets. Plenty of ways of obtaining a 
valid fd without requiring any casts OR having to deal with 
pointers. The example shows a way to (unintentionally) alias an 
existing fd (obtained through whichever means) and write to it, 
in @safe context.

An fd is an unsafe quantity encoded in a safe type. We need a way 
to express that in the language.

Feb 22 2022

Paul Backus <snarwin gmail.com> writes:

On Tuesday, 22 February 2022 at 15:55:16 UTC, Stanislav Blinov 
wrote:
 Yes, the implementation of `File` would need @trusted code. How 
 would that invalidate the example?

If completing the example required *incorrect* use of `@trusted` 
(i.e., on a function that does not have a [safe interface][1]), 
it would not be valid.

Using `@trusted` in the implementation of `File.write` to call 
POSIX `write` would not be a problem.

[1]: https://dlang.org/spec/function.html#safe-interfaces

 Your process can inherit fds from its parent. Or you may have 
 pipes, shared memfds, sockets. Plenty of ways of obtaining a 
 valid fd without requiring any casts OR having to deal with 
 pointers. The example shows a way to (unintentionally) alias an 
 existing fd (obtained through whichever means) and write to it, 
 in @safe context.

The example shows a write to an fd, and then hand-waves about how 
this could maybe, hypothetically, somehow, cause memory 
corruption. Aliasing an fd does not, by itself, constitute memory 
corruption.

Remember, if you can cause memory corruption in `@safe` code, 
that means you can also cause undefined behavior in `@safe` code. 
So if you cannot write a program that uses this alleged loophole 
to cause UB, then what you have found is not actually memory 
corruption. (Although it may still be "data corruption".)

Feb 22 2022

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Tuesday, 22 February 2022 at 16:16:30 UTC, Paul Backus wrote:
 On Tuesday, 22 February 2022 at 15:55:16 UTC, Stanislav Blinov 
 wrote:
 Yes, the implementation of `File` would need @trusted code. 
 How would that invalidate the example?

 If completing the example required *incorrect* use of 
 `@trusted` (i.e., on a function that does not have a [safe 
 interface][1]), it would not be valid.

It doesn't. The program, as presented, is enough.

 Using `@trusted` in the implementation of `File.write` to call 
 POSIX `write` would not be a problem.

All the more reason for me to not understand the objection then.

 [1]: https://dlang.org/spec/function.html#safe-interfaces

 Your process can inherit fds from its parent. Or you may have 
 pipes, shared memfds, sockets. Plenty of ways of obtaining a 
 valid fd without requiring any casts OR having to deal with 
 pointers. The example shows a way to (unintentionally) alias 
 an existing fd (obtained through whichever means) and write to 
 it, in @safe context.

 The example shows a write to an fd, and then hand-waves about 
 how this could maybe, hypothetically, somehow, cause memory 
 corruption.

I don't follow. It seems pretty clear to me how the example is 
expressed. Where's the handwaving? Compare to this:

```d
void main() @safe {
     char[5]* ptr = void;
     *ptr = "hello";
}
```

The above won't compile, since void-initialization of pointers is 
not allowed in @safe code, with good reason. Is there anything to 
handwave here? The fd example, however, *will* compile in the current 
language, despite doing the same thing.

 Aliasing an fd does not, by itself, constitute memory 
 corruption.

I did not say it did. Writing to an fd initialized with 
"implementation-defined" (read: garbage) value may - that's what 
I said, and that's what the example shows.

 Remember, if you can cause memory corruption in `@safe` code, 
 that means you can also cause undefined behavior in `@safe` 
 code. So if you cannot write a program that uses this alleged 
 loophole to cause UB, then what you have found is not actually 
 memory corruption. (Although it may still be "data corruption".)

You *can* write such a program with fds. The example is one such 
program. Do you have any suggestions on how to make it clearer?

Feb 22 2022

Paul Backus <snarwin gmail.com> writes:

On Tuesday, 22 February 2022 at 17:29:46 UTC, Stanislav Blinov 
wrote:
 On Tuesday, 22 February 2022 at 16:16:30 UTC, Paul Backus wrote:
 Remember, if you can cause memory corruption in `@safe` code, 
 that means you can also cause undefined behavior in `@safe` 
 code. So if you cannot write a program that uses this alleged 
 loophole to cause UB, then what you have found is not actually 
 memory corruption. (Although it may still be "data 
 corruption".)

 You *can* write such a program with fds. The example is one 
 such program. Do you have any suggestions on how to make it 
 clearer?

The example, as written, does not link, because `File.write` is 
missing a function body. If I fill in the obvious implementation, 
I get the following program:

```d
struct File
{
     void write(const(void)[] data) @safe
     {
         import core.sys.posix.unistd: write;
         () @trusted { write(fd, data.ptr, data.length); }();
     }
     private int fd;
}

void main() @safe
{
     File f = void;
     f.write("hello");
}
```

The above program does not have undefined behavior. The call to 
`write` will either fail with `EBADF`, or attempt to write the 
string `"hello"` to some unspecified open file.

If you believe there is some way to get the above program to 
produce undefined behavior, or to complete your original example 
in such a way that it produces undefined behavior without the use 
of incorrect `@trusted` code, I'm afraid you will have to spell 
it out for me.

Feb 22 2022

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Tuesday, 22 February 2022 at 18:33:58 UTC, Paul Backus wrote:
 On Tuesday, 22 February 2022 at 17:29:46 UTC, Stanislav Blinov 
 wrote:

 You *can* write such a program with fds. The example is one 
 such program. Do you have any suggestions on how to make it 
 clearer?

 The example, as written, does not link, because `File.write` is 
 missing a function body.

If you're going to go there, then...

 If I fill in the obvious implementation, I get the following 
 program:

 ```d
 struct File
 {
      void write(const(void)[] data) @safe
      {
          import core.sys.posix.unistd: write;
          () @trusted { write(fd, data.ptr, data.length); }();
     }
     private int fd;
 }

...that @trusted code is incorrect, at least on some platforms 
(yes, I can nitpick too). Or we can simply agree that 
`File.write` is implemented correctly in terms of `write` (which 
is the important part) and leave it at that, as the rest is 
irrelevant to the example. I am seriously perplexed at this kind 
of nitpicking, not to mention the implied expectation of having 
to spell out full-blown libraries in an example code.

 void main() @safe
 {
     File f = void;
     f.write("hello");
 }
 ```

 The above program does not have undefined behavior. The call to 
 `write` will either fail with `EBADF`, or attempt to write the 
 string `"hello"` to some unspecified open file.

I am reasonably certain that the results may be much more varied 
than that, including some that aren't specified:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

 If you believe there is some way to get the above program to 
 produce undefined behavior, or to complete your original 
 example in such a way that it produces undefined behavior 
 without the use of incorrect `@trusted` code, I'm afraid you 
 will have to spell it out for me.

Not exhaustive:
It may corrupt a given GC's implementation's heap, which means 
what occurs after the } is anyone's guess.
It may mutate data that's supposed to be immutable (i.e. in a 
parent process, though you could argue that might not be relevant 
to the DIP).
It may block indefinitely, or crash, or complete with no effect.

If you could demonstrate that it cannot possibly exhibit at least 
the above, I'll happily accept being mistaken.

...but seriously. What is it with all the condescending tone on 
the forums lately?

Feb 22 2022

Dennis <dkorpel gmail.com> writes:

On Wednesday, 23 February 2022 at 00:14:13 UTC, Stanislav Blinov 
wrote:
 If you're going to go there, then...

 ...

 ...that @trusted code is incorrect, at least on some platforms 
 (yes, I can nitpick too).

Why? I don't see it.

 ...but seriously. What is it with all the condescending tone on 
 the forums lately?

I think Paul makes a valid point and uses an appropriate tone. 
The DIP should not be hand-wavy about how `scope` checking would 
help memory safety when using file descriptors. I didn't go into 
much detail there because I didn't think it would be a contested 
addition.

Feb 23 2022

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Wednesday, 23 February 2022 at 16:14:51 UTC, Dennis wrote:
 On Wednesday, 23 February 2022 at 00:14:13 UTC, Stanislav 
 Blinov wrote:
 If you're going to go there, then...

 ...

 ...that @trusted code is incorrect, at least on some platforms 
 (yes, I can nitpick too).

 Why? I don't see it.

Because not all possible values of `data.length` are valid values 
for `write`'s third argument.

 ...but seriously. What is it with all the condescending tone 
 on the forums lately?

 I think Paul makes a valid point and uses an appropriate tone. 
 The DIP should not be hand-wavy about how `scope` checking 
 would help memory safety when using file descriptors. I didn't 
 go into much detail there because I didn't think it would be a 
 contested addition.

I see the problem now, thanks.

Feb 23 2022

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 23 February 2022 at 22:01:55 UTC, Stanislav Blinov 
wrote:
 Because not all possible values of `data.length` are valid 
 values for `write`'s third argument.

POSIX says:

 Before any action described below is taken, and if nbyte is 
 zero and the file is a regular file, the write() function may 
 detect and return errors as described below. In the absence of 
 errors, or if error detection is not performed, the write() 
 function shall return zero and have no other results. If nbyte 
 is zero and the file is not a regular file, the results are 
 unspecified.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

I was unable to find a definition in the standard itself of 
exactly what "unspecified" means in this context, but I think we 
can assume that it does not mean the same thing as "undefined", 
because the POSIX standard uses the actual word "undefined" 
elsewhere (e.g., in the description of 
[`pthread_mutex_destroy`][1]).

If we assume that it means the same thing as ["unspecified 
behavior" in C][2], then it means that there are multiple 
possible behaviors, and the standard does not require an 
implementation to commit to any particular one in any given 
situation.

[1]: 
https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_destroy.html

Feb 23 2022

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 23 February 2022 at 00:14:13 UTC, Stanislav Blinov 
wrote:
 On Tuesday, 22 February 2022 at 18:33:58 UTC, Paul Backus wrote:
 If you believe there is some way to get the above program to 
 produce undefined behavior, or to complete your original 
 example in such a way that it produces undefined behavior 
 without the use of incorrect `@trusted` code, I'm afraid you 
 will have to spell it out for me.

 Not exhaustive:
 It may corrupt a given GC's implementation's heap, which means 
 what occurs after the } is anyone's guess.
 It may mutate data that's supposed to be immutable (i.e. in a 
 parent process, though you could argue that might not be 
 relevant to the DIP).
 It may block indefinitely, or crash, or complete with no effect.

 If you could demonstrate that it cannot possibly exhibit at 
 least the above, I'll happily accept being mistaken.

Having spent some more time scratching my head over this, I now 
realize what I was missing: it is indeed possible to open a file 
descriptor that can corrupt *arbitrary* memory in a process's 
address space, using something like `/proc/self/mem`. Maybe I'm 
an idiot for missing this the first time around; I can only ask 
that you take pity on me. :)

This means that calling `write` on a fd is only memory safe if 
you have previously verified that the file the fd refers to is 
"well behaved" (i.e., satisfies a particular invariant). It 
follows that the fd itself must be stored in a `@system` variable 
in order to ensure that the invariant is maintained in `@safe` 
code.

I don't think adding `scope` checking to the fd makes any 
difference here, though. *Reading* from `/proc/self/mem` in 
`@safe` code is perfectly fine, even if you are reading from 
uninitialized or deallocated memory. The reason such reads are UB 
when done through pointers is that *dereferencing an invalid 
pointer* is UB, not because reading from the memory is UB.

(I'm also not sure if it's possible in practice to tell whether a 
file is "well behaved". If not, that means we have to either 
accept that `write` is *always* `@system`, or allow a permanent 
loophole in `@safe`. But that's a separate issue.)

Feb 23 2022

Paul Backus <snarwin gmail.com> writes:

On Wednesday, 23 February 2022 at 18:16:17 UTC, Paul Backus wrote:
 (I'm also not sure if it's possible in practice to tell whether 
 a file is "well behaved". If not, that means we have to either 
 accept that `write` is *always* `@system`, or allow a permanent 
 loophole in `@safe`. But that's a separate issue.)

By the way, this issue has also come up in Rust:

- https://github.com/rust-lang/rust/issues/32670
- 
https://blog.yossarian.net/2021/03/16/totally_safe_transmute-line-by-line

Feb 23 2022

Stanislav Blinov <stanislav.blinov gmail.com> writes:

On Wednesday, 23 February 2022 at 18:16:17 UTC, Paul Backus wrote:

 Having spent some more time scratching my head over this, I now 
 realize what I was missing: it is indeed possible to open a 
 file descriptor that can corrupt *arbitrary* memory in a 
 process's address space, using something like `/proc/self/mem`.

Yes, or you may use e.g. `memfd_create`. And you can inherit such 
an fd from a parent process. Or receive a shared memory 
descriptor from another process.

 Maybe I'm an idiot for missing this the first time around; I 
 can only ask that you take pity on me. :)

Never! How dare you make me question myself!!! :)

 This means that calling `write` on a fd is only memory safe if 
 you have previously verified that the file the fd refers to is 
 "well behaved" (i.e., satisfies a particular invariant). It 
 follows that the fd itself must be stored in a `@system` 
 variable in order to ensure that the invariant is maintained in 
 `@safe` code.

Yup.

 I don't think adding `scope` checking to the fd makes any 
 difference here, though. *Reading* from `/proc/self/mem` in 
 `@safe` code is perfectly fine, even if you are reading from 
 uninitialized or deallocated memory. The reason such reads are 
 UB when done through pointers is that *dereferencing an invalid 
 pointer* is UB, not because reading from the memory is UB.

Well, results of `read`ing from some types of fds are also not 
specified. So, if I'm not mistaken, performing such a read *and* 
then using the resulting "data" would be undefined behavior 
(provided the program even gets there).

As for `scope` checks themselves - as Dennis mentions, double 
`close` looks dissimilar to double free. Yet it *is* subject to a 
superset of that - use after free, as are `read` and `write`. You 
may well safely "dangle" an fd and not invoke UB by calling those 
functions with it, but only up to the point when the program 
opens another descriptor. Calling `close` on a dangled fd, which 
would then succeed, would be a mere bug and not invoke UB, but 
attempting to `write` or `read`+use may.

So I do think that fds could still be a good example material for 
the DIP.
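
To make the "only up to the point when the program opens another descriptor" part concrete, here is a minimal sketch. It assumes a POSIX system and a single-threaded program, and `/dev/null` is only a convenient file to open; none of this comes from the DIP itself.

```d
import core.sys.posix.fcntl : open, O_RDONLY;
import core.sys.posix.unistd : close;

void main()
{
    int stale = open("/dev/null", O_RDONLY);
    assert(stale >= 0);
    close(stale); // `stale` now dangles, but is still just an int

    // POSIX open() returns the lowest-numbered descriptor that is not
    // currently open, so the number that was just freed comes straight back.
    int fresh = open("/dev/null", O_RDONLY);
    assert(fresh == stale);

    // From here on, any read/write/close through `stale` would silently
    // act on whatever `fresh` refers to.
    close(fresh);
}
```

Nothing in the type system distinguishes `stale` from `fresh` here, which is the use-after-free analogy in a nutshell.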

 (I'm also not sure if it's possible in practice to tell whether 
 a file is "well behaved". If not, that means we have to either 
 accept that `write` is *always* `@system`, or allow a permanent 
 loophole in `@safe`. But that's a separate issue.)

I don't think that should be necessary in concrete cases, as the 
onus of ensuring the implicit invariant would lie on the 
implementation of, in this case, `File` - e.g. making it 
non-copyable (or reference-counted), ensuring that the 
constructor opens an appropriate kind of file, etc. etc. That way 
the only way to make it unsafe would be to corrupt the given 
instance of `File` itself, which means there's a memory safety 
issue somewhere else in the program (for example, that same 
void-initialization).
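
A rough sketch of what such a `File` could look like, assuming a POSIX system and DIP 1035 semantics for the `@system` field (at the time of writing these are behind `-preview=systemVariables`). The struct layout, the constructor signature and the `posixWrite` alias are made up for illustration; this is not Phobos's `File`, and the constructor does not actually check what kind of file it opened.

```d
import core.sys.posix.fcntl : open, O_WRONLY;
import core.sys.posix.unistd : close, posixWrite = write;
import std.string : toStringz;

struct File
{
    // @system: under DIP 1035, @safe code may not touch this field directly,
    // so only the @trusted members below can establish or break the invariant.
    @system private int fd = -1;

    // Non-copyable: exactly one owner per descriptor, so the destructor
    // cannot close an fd that another copy still uses.
    @disable this(this);

    this(string path) @trusted
    {
        // Assumption: this is where the implementation would decide which
        // files are acceptable (e.g. refuse something like /proc/self/mem).
        // That check is omitted in this sketch.
        fd = open(path.toStringz, O_WRONLY);
    }

    ~this() @trusted
    {
        if (fd >= 0)
            close(fd);
    }

    // Returns the number of bytes written, or -1 on error.
    ptrdiff_t write(const(void)[] data) @trusted
    {
        if (fd < 0)
            return -1;
        return posixWrite(fd, data.ptr, data.length);
    }
}

void main() @safe
{
    auto f = File("/dev/null");
    assert(f.write("hello") == 5);
    // f.fd = 3; // the kind of access DIP 1035 is meant to reject in @safe code
}
```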

Feb 23 2022

Paul Backus <snarwin gmail.com> writes:

This is my reply to [this post][1] from the feedback thread:

[1]: 
https://forum.dlang.org/post/qbbatlviwhjsnytbypfw@forum.dlang.org

On Friday, 4 March 2022 at 09:39:53 UTC, Dennis wrote:
 On Friday, 25 February 2022 at 21:46:25 UTC, Dukc wrote:
 Wouldn't putting the handle in union with `void[1]` work?

 No, `void[1]` is not a type with unsafe values.

`void[1]` is considered by the compiler to potentially contain 
pointer data, in accordance with this section of the language 
spec:

https://dlang.org/spec/arrays.html#void_arrays

Note in particular the paragraph that begins, "Void arrays can 
also be static".

As a result, the compiler will not allow you to void-initialize a 
`void[1]` in `@safe` code:

```d
void main() @safe {
     void[1] a = void; // error
}
```

So, the workaround suggested by Dukc would indeed work.

(By the way, I know this because the first thing I did after I 
read his post in the feedback thread was to actually write out a 
complete example using the `void[1]` workaround and check to see 
if it worked.)

Mar 04 2022

Dennis <dkorpel gmail.com> writes:

On Friday, 4 March 2022 at 13:09:42 UTC, Paul Backus wrote:
 As a result, the compiler will not allow you to void-initialize 
 a `void[1]` in `@safe` code:

 ```d
 void main() @safe {
     void[1] a = void; // error
 }
 ```

That's new to me, and the error makes no sense considering you 
can (implicitly) convert any array to a `void[]` even in `@safe` 
code, so you can still do this:

```D
void main() @safe {
     ubyte[1] x = void;
     void[1] y = x;
}
```

 So, the workaround suggested by Dukc would indeed work.

It's an interesting alternative if we can nail it down.

Mar 04 2022

Paul Backus <snarwin gmail.com> writes:

On Friday, 4 March 2022 at 14:00:31 UTC, Dennis wrote:
 On Friday, 4 March 2022 at 13:09:42 UTC, Paul Backus wrote:
 As a result, the compiler will not allow you to 
 void-initialize a `void[1]` in `@safe` code:

 ```d
 void main() @safe {
     void[1] a = void; // error
 }
 ```

 That's new to me, and the error makes no sense considering you 
 can (implicitly) convert any array to a `void[]` even in 
 `@safe` code, so you can still do this:

 ```D
 void main() @safe {
     ubyte[1] x = void;
     void[1] y = x;
 }
 ```

Yes, I think this is a case of the compiler (and the spec) 
applying rules more broadly than is strictly necessary, since 
`void`-initializing a `void[1]` cannot *actually* lead to UB in 
`@safe` code on its own.

Mar 04 2022
