digitalmars.D - DIP1000 scope inference

reply Steven Schveighoffer <schveiguy gmail.com> writes:
Deprecation messages due to dip1000's imminent arrival are scheduled to 
happen on the next release of the compiler. I have some concerns about 
scope inference, and wanted to find out the answers here.

Let's say I have a scope array like this in a @trusted function:

```d
int[] mkarr() @trusted {
    scope arr = [1, 2, 3];
    return arr;
}
```

Clearly, this is a bad idea. The compiler might actually put the array 
data on the stack (right?), and therefore return stack data when it 
shouldn't.

But what if you *don't* mark it scope? Let's try something here:

```d
int[] mkarr() @safe {
    int[3] arr = [1, 2, 3];
    int[] other = arr[];

    other = [4, 5, 6];
    return other;
}
```

By the time `other` is returned, it should no longer be pointing at 
stack data. But *because* it was originally assigned to the static 
array, `other` is inferred as scope (as is proven by the code above 
failing to compile with dip1000 enabled with an error about returning 
scope data).

Let's switch that back to `@trusted`, and now it does compile, even with 
dip1000. BUT, let me ask this very crucial question:

Does the inferred `scope` make it so that the compiler is *allowed* to 
allocate the `[4, 5, 6]` literal on the stack? Keep in mind that I never 
put `scope` here, this is something the compiler did on its own.

In a `@trusted` function today, without dip1000, the above is perfectly 
reasonable and not invalid. Will dip1000 make it corrupt memory?
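
A minimal sketch for probing this empirically, assuming druntime's `core.memory.GC.addrOf`, which returns `null` for pointers that do not point into GC-managed memory. The helper name `pointsIntoGcHeap` is hypothetical. It only reports what one particular compiler build does; it says nothing about what the language guarantees.

```d
import core.memory : GC;

// Same shape as the second example above, marked @trusted so it
// compiles with or without -preview=dip1000.
int[] mkarr() @trusted
{
    int[3] arr = [1, 2, 3];
    int[] other = arr[];   // initially a slice of stack memory

    other = [4, 5, 6];     // reassigned to an array literal
    return other;
}

// Hypothetical helper: true if the slice's data lives in a GC-managed block.
bool pointsIntoGcHeap(int[] slice)
{
    return GC.addrOf(slice.ptr) !is null;
}

void main()
{
    auto result = mkarr();
    assert(result == [4, 5, 6]);
    // If the literal were ever hoisted to the stack, this check would fail
    // (and the comparison above would be reading a dead stack frame).
    assert(pointsIntoGcHeap(result));
}
```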

-Steve
Oct 24 2022
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 25 October 2022 at 01:35:28 UTC, Steven Schveighoffer 
wrote:
 Does the inferred `scope` make it so that the compiler is 
 *allowed* to allocate the `[4, 5, 6]` literal on the stack? 
 Keep in mind that I never put `scope` here, this is something 
 the compiler did on its own.
No, it does not. This capability was added only for array literals, and only for variable initialization:

DMD PR: https://github.com/dlang/dmd/pull/14562
Spec PR (pending): https://github.com/dlang/dlang.org/pull/3442

However, this thread raises an important point: changing the way existing language constructs allocate memory in the presence of `scope` may cause `@trusted` code which relied on the original behavior to become unsound.

For example, the `@trusted` function below is memory safe when using the current compiler release, but will become unsafe when compiled with DMD 2.101:

```d
@trusted int[] example()
{
    scope example = [1, 2, 3];
    return example;
}
```

The worst part is that the potential memory corruption is introduced silently. Users who upgrade to DMD 2.101 will have no idea that the ground has shifted beneath their feet until their code invokes UB at runtime.
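
One way to keep such a function independent of where the literal is placed is to return an explicit copy. This is only a sketch: the name `exampleCopy` is hypothetical, and it assumes `.dup` accepts a `scope` slice, as it is commonly used under `-preview=dip1000`.

```d
import core.memory : GC;

// The scope local may live on the stack; the returned array is a
// fresh GC allocation made by .dup, so nothing scoped escapes.
int[] exampleCopy() @safe
{
    scope tmp = [1, 2, 3];
    return tmp.dup;
}

void main()
{
    auto copy = exampleCopy();
    assert(copy == [1, 2, 3]);
    // The copy is expected to sit in GC-managed memory.
    assert(GC.addrOf(copy.ptr) !is null);
}
```

The cost is one extra allocation and copy, which is the price of not depending on how the compiler treats the `scope` initializer.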
Oct 24 2022
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 25/10/2022 3:09 PM, Paul Backus wrote:
 For example, the `@trusted` function below is memory safe when using the 
 current compiler release, but will become unsafe when compiled with DMD 
 2.101:
 
 ```d
 @trusted int[] example()
 {
     scope example = [1, 2, 3];
     return example;
 }
 ```
Does this also apply to @safe?
Oct 24 2022
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 25/10/2022 3:13 PM, rikki cattermole wrote:
 On 25/10/2022 3:09 PM, Paul Backus wrote:
 For example, the `@trusted` function below is memory safe when using 
 the current compiler release, but will become unsafe when compiled 
 with DMD 2.101:

 ```d
 @trusted int[] example()
 {
     scope example = [1, 2, 3];
     return example;
 }
 ```
Does this also apply to @safe?
Apparently. I can't find any checks in the PR. REVERT REVERT REVERT (or ya know add the check for @safe). #SuddenlyWorried lol
Oct 24 2022
prev sibling parent Paul Backus <snarwin gmail.com> writes:
On Tuesday, 25 October 2022 at 02:13:20 UTC, rikki cattermole 
wrote:
 On 25/10/2022 3:09 PM, Paul Backus wrote:
 For example, the `@trusted` function below is memory safe when 
 using the current compiler release, but will become unsafe 
 when compiled with DMD 2.101:
 
 ```d
 @trusted int[] example()
 {
     scope example = [1, 2, 3];
     return example;
 }
 ```
Does this also apply to @safe?
No, because you are not allowed to return a `scope` variable in `@safe` code, even if you happen to know that it points to a heap allocation.
Oct 24 2022
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/24/22 10:09 PM, Paul Backus wrote:
 On Tuesday, 25 October 2022 at 01:35:28 UTC, Steven Schveighoffer wrote:
 Does the inferred `scope` make it so that the compiler is *allowed* to 
 allocate the `[4, 5, 6]` literal on the stack? Keep in mind that I 
 never put `scope` here, this is something the compiler did on its own.
No, it does not. This capability was added only for array literals, and only for variable initialization: DMD PR: https://github.com/dlang/dmd/pull/14562 Spec PR (pending): https://github.com/dlang/dlang.org/pull/3442
OK, what about this?

```d
int[] mkarr() @trusted {
    int[3] arr = [1, 2, 3];
    int[] other = [4, 5, 6];

    auto foo = other;
    other = arr[];
    return foo;
}
```

`other` is inferred as `scope` (along with `foo`), because it touches `arr[]` later (but after it was pointing at what should have been heap memory). So does that count as possible for stack allocation, or is it still heap allocated?

-Steve
Oct 24 2022
parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 25 October 2022 at 02:38:02 UTC, Steven Schveighoffer 
wrote:
 OK, what about this?

 ```d
 int[] mkarr() @trusted {
     int[3] arr = [1, 2, 3];
     int[] other = [4, 5, 6];

     auto foo = other;
     other = arr[];
     return foo;
 }
 ```

 `other` is inferred as `scope` (along with `foo`), because it 
 touches `arr[]` later (but after it was pointing at what should 
 have been heap memory). So does that count as possible for 
 stack allocation, or is it still heap allocated?
When I compile the above with `@safe` and `-preview=dip1000`, I get

    Error: reference to local variable `arr` assigned to non-scope `other`

...using both DMD 2.100.2 and DMD master. So `scope` is not actually being inferred here, and the array is allocated on the heap.

My expectation is that `scope` will probably *never* be inferred for `other`, because doing multi-step inference like this requires dataflow analysis in the general case, which is something Walter wants to avoid (see discussion in [issue 20674][1]). So I don't think you have anything to worry about.

Still, this is a good illustration of how silently changing the rules on people can have unintended consequences. If Walter ever *does* consider adding dataflow analysis, overly-aggressive "optimizations" like these could easily become obstacles in the way of that goal.

[1]: https://issues.dlang.org/show_bug.cgi?id=20674
Oct 24 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/24/22 10:59 PM, Paul Backus wrote:
 On Tuesday, 25 October 2022 at 02:38:02 UTC, Steven Schveighoffer wrote:
 OK, what about this?

 ```d
 int[] mkarr() @trusted {
     int[3] arr = [1, 2, 3];
     int[] other = [4, 5, 6];

     auto foo = other;
     other = arr[];
     return foo;
 }
 ```

 `other` is inferred as `scope` (along with `foo`), because it touches 
 `arr[]` later (but after it was pointing at what should have been heap 
 memory). So does that count as possible for stack allocation, or is it 
 still heap allocated?
When I compile the above with `@safe` and `-preview=dip1000`, I get     Error: reference to local variable `arr` assigned to non-scope `other`
OK, I misread the error here, it's the same on run.dlang.io. But we did just go through an exercise where a struct not labeled scope is inferred scope not because of its declaration, but because of later things done with it. It doesn't seem to be the case here.
 
 ...using both DMD 2.100.2 and DMD master. So `scope` is not actually 
 being inferred here, and the array is allocated on the heap.
 
 My expectation is that `scope` will probably *never* be inferred for 
 `other`, because doing multi-step inference like this requires dataflow 
 analysis in the general case, which is something Walter wants to avoid 
 (see discussion in [issue 20674][1]). So I don't think you have anything 
 to worry about.
My biggest concern is that this inference takes priority over what is actually written, and then can cause memory problems to occur in code that seemingly reads like it shouldn't cause memory problems. I'm trying to find a hole because I'm worried about that hole showing up without intention later (especially with the way the compiler can inline and rewrite code for optimization).

The compiler doing things that are not checkable (I know of no way to introspect that something is scope inferred), hard to describe, and impossible to prevent makes things uncomfortable. Especially if the compiler might make disastrous decisions based on that inference.

It would be relieving to have some rule that says "any data inferred scope inside a @system or @trusted context without explicitly being declared scope shall not result in memory allocations hoisting to the stack".

I can deal, begrudgingly, with compiler errors that are misguided. I can't deal with memory errors caused by the compiler knowing better than me.

-Steve
Oct 25 2022
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/25/22 9:44 AM, Steven Schveighoffer wrote:
 But we did just go through an exercise where a struct not labeled scope 
 is inferred scope not because of its declaration, but because of later 
 things done with it.
It's very curious. I can't get any indication that the struct is inferred scope except by throwing an exception contained in it.

If I declare the function @safe, it won't let me assign the scope variable to the struct member. If I mark it as @trusted, that succeeds, but then it won't let me throw the exception out of the struct because it says the struct is scope.

Again, with no way to tell whether scope is inferred, it's hard to judge.

-Steve
Oct 25 2022
prev sibling parent reply Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Tuesday, 25 October 2022 at 02:09:02 UTC, Paul Backus wrote:
 For example, the `@trusted` function below is memory safe when 
 using the current compiler release, but will become unsafe when 
 compiled with DMD 2.101:

 ```d
 @trusted int[] example()
 {
     scope example = [1, 2, 3];
     return example;
 }
 ```

 The worst part is that the potential memory corruption is 
 introduced silently. Users who upgrade to DMD 2.101 will have 
 no idea that the ground has shifted beneath their feet until 
 their code invokes UB at runtime.
Asking curiously, wasn’t the function UB before, but the behavior changed?
Oct 25 2022
parent Paul Backus <snarwin gmail.com> writes:
On Tuesday, 25 October 2022 at 14:14:00 UTC, Quirin Schroll wrote:
 Asking curiously, wasn’t the function UB before, but the 
 behavior changed?
Until very recently, the language spec [1] said that a `scope` *parameter* "must not escape", but was silent on whether the same rule applied to `scope` local variables (although it would be reasonable to infer that it did). At some point between the release of DMD 2.100.2 and current `master`, the spec was updated to additionally state that returning a `scope` variable from a function is "disallowed" [2].

So, yes, I think the most reasonable interpretation is that this was always intended to be UB. But I am not confident that the average D user would have *known* for certain it was UB at the time DMD 2.100.2 was released.

[1] https://dlang.org/spec/function.html#scope-parameters
[2] https://dlang.org/spec/attribute.html#scope
Oct 25 2022
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
 In a `@trusted` function today, without dip1000, the above is perfectly 
 reasonable and not invalid. Will dip1000 make it corrupt memory?
A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
Oct 26 2022
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 26/10/2022 9:03 PM, Walter Bright wrote:
 On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
 In a `@trusted` function today, without dip1000, the above is 
 perfectly reasonable and not invalid. Will dip1000 make it corrupt 
 memory?
A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
At the very least, if no solution can be determined this needs to be reverted before 2.101.0.
Oct 26 2022
parent Nick Treleaven <nick geany.org> writes:
On Wednesday, 26 October 2022 at 08:37:36 UTC, rikki cattermole 
wrote:
 On 26/10/2022 9:03 PM, Walter Bright wrote:
 A very good question. Clearly, having code work when it is 
  safe, but cause memory corruption when it is marked  trusted, 
 is the wrong solution. This should never happen. I'm not sure 
 what the solution should be here.
At the very least, if no solution can be determined this needs to be reverted before 2.101.0.
There's nothing to revert that corrupts memory (without incorrectly writing scope), see Paul's reply.
Oct 26 2022
prev sibling next sibling parent reply German Diago <germandiago gmail.com> writes:
On Wednesday, 26 October 2022 at 08:03:37 UTC, Walter Bright 
wrote:

 A very good question. Clearly, having code work when it is 
 @safe, but cause memory corruption when it is marked @trusted, 
 is the wrong solution. This should never happen. I'm not sure 
 what the solution should be here.
Is not trusted code (note my little D experience so sorry if I am asking something relatively stupid) unsafe? I mean, @safe is safe, @trusted is ??, @system is you go your own.

- So what are the guarantees of @trusted compared to @system?

Also, as far as I understood from my limited D usage, only type[N] are static arrays on the stack and the rest are GC-allocated, by default, right? So in the presence of scope, probably that should be a dynamically sized array that was to be "freed" by the GC and invalid at the end of the function.

I would assume a move can be done if the array is not static, independently of scope being there or not and an error if it is statically allocated, since the return type is type[] (without explicit size in the type).
Oct 26 2022
next sibling parent Salih Dincer <salihdb hotmail.com> writes:
On Wednesday, 26 October 2022 at 10:43:11 UTC, German Diago wrote:
 On Wednesday, 26 October 2022 at 08:03:37 UTC, Walter Bright 
 wrote:

 A very good question. Clearly, having code work when it is 
 @safe, but cause memory corruption when it is marked @trusted, 
 is the wrong solution. This should never happen. I'm not sure 
 what the solution should be here.
Is not trusted code (note my little D experience so sorry if I am asking something relatively stupid) unsafe? I mean, @safe is safe, @trusted is ??, @system is you go your own.
@safe: it's like a seat belt. You can take your children, who come to see their uncle at the weekend, by car with their seat belts.

@trusted: it's like an uncle who didn't crash with his tractor. You can take your children around the field with their uncles by tractor.

We trust the uncle, but even if he did not have an accident, this 2nd situation is not safe.

SDB 79
Oct 26 2022
prev sibling parent reply tsbockman <thomas.bockman gmail.com> writes:
On Wednesday, 26 October 2022 at 10:43:11 UTC, German Diago wrote:
 Is not trusted code (note my little D experience so sorry if I 
 am asking something relatively stupid) unsafe? I mean, @safe is 
 safe, @trusted is ??, @system is you go your own.

 - So what are the guarantees of @trusted compared to @system?
A `@safe` function is guaranteed by the compiler to be memory safe to call from other `@safe` code with (almost) any possible arguments and under (almost) any circumstances.

A `@trusted` function is guaranteed by its author to be memory safe to call from other `@safe` code with (almost) any possible arguments and under (almost) any circumstances.

A `@system` function may require the caller to follow additional rules beyond those enforced by the compiler, even in `@safe` code, to maintain memory safety. Since the compiler does not know what these additional rules are and cannot enforce them automatically, calling `@system` functions directly from `@safe` code is forbidden.

| Attribute  | Must check definition | Must check each caller |
|------------|-----------------------|------------------------|
| `@safe`    | compiler              | compiler               |
| `@trusted` | programmer            | compiler               |
| `@system`  | programmer            | programmer             |

Assume the function is implemented correctly, then try to figure out how to call the function from `@safe` code in a way that violates memory safety. If there is a way to do so, the function should be `@system`.

Otherwise, it should be `@safe` if that compiles, or `@trusted` if not.
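
To make the division of responsibility concrete, here is a small sketch. All function names (`readUnchecked`, `readOr`, `caller`) are hypothetical and chosen only for illustration.

```d
// @system: the caller must guarantee that base[index] is valid memory.
// The compiler cannot check that rule, so @safe code may not call this.
@system int readUnchecked(const(int)* base, size_t index)
{
    return base[index];
}

// @trusted: the author has checked the definition by hand. The bounds
// test makes the function memory safe for any arguments @safe code can pass.
@trusted int readOr(const(int)[] data, size_t index, int fallback)
{
    if (index >= data.length)
        return fallback;
    return readUnchecked(data.ptr, index);
}

// @safe: the compiler verifies this body, including every call it makes.
@safe int caller()
{
    int[] values = [10, 20, 30];
    return readOr(values, 1, -1) + readOr(values, 99, -1); // 20 + (-1)
}

void main()
{
    assert(caller() == 19);
}
```

Applying the test from the post: there is no way to call `readOr` from `@safe` code that breaks memory safety, so `@trusted` is appropriate. `readUnchecked` can be misused with a bad pointer or index, so it has to stay `@system`.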
Oct 26 2022
parent reply Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Wednesday, 26 October 2022 at 20:24:38 UTC, tsbockman wrote:
 On Wednesday, 26 October 2022 at 10:43:11 UTC, German Diago 
 wrote:
 Is not trusted code (note my little D experience so sorry if I 
 am asking something relatively stupid) unsafe? I mean, @safe 
 is safe, @trusted is ??, @system is you go your own.

 - So what are the guarantees of @trusted compared to @system?
A `@safe` function is guaranteed by the compiler to be memory safe to call from other `@safe` code with (almost) any possible arguments and under (almost) any circumstances. A `@trusted` function is guaranteed by its author to be memory safe to call from other `@safe` code with (almost) any possible arguments and under (almost) any circumstances.
The “(almost)” should be absent. If you mean something other than compiler bugs, please tell us.
 A `@system` function may require the caller to follow 
 additional rules beyond those enforced by the compiler, even in 
 `@safe` code, to maintain memory safety. Since the compiler 
 does not know what these additional rules are and cannot 
 enforce them automatically, calling `@system` functions 
 directly from `@safe` code is forbidden.

 | Attribute  | Must check definition | Must check each caller |
 |------------|-----------------------|------------------------|
 | `@safe`    | compiler              | compiler               |
 | `@trusted` | programmer            | compiler               |
 | `@system`  | programmer            | programmer             |

 Assume the function is implemented correctly, then try to 
 figure out how to call the function from `@safe` code in a way 
 that violates memory safety. If there is a way to do so, the 
 function should be `@system`.

 Otherwise, it should be `@safe` if that compiles, or `@trusted` 
 if not.
I agree with the characterization of `@safe` and `@system`. For `@trusted` functions, there’s something more to say:

* Widely accessible ones (e.g. `public`, `package`, `protected`, even `private` in a big module) should have a `@safe` interface, i.e. you can use them like `@safe` functions in all regards; they just aren’t `@safe` because of some implementation details.
* Narrowly accessible ones (e.g. `private` (in a small module), local functions, immediately executed lambdas) can have a `@system` interface, but their surroundings can be trusted to use the function correctly.
Oct 27 2022
next sibling parent ag0aep6g <anonymous example.com> writes:
On 27.10.22 19:39, Quirin Schroll wrote:
 I agree with the characterization of `@safe` and `@system`. For 
 `@trusted` functions, there’s something more to say:
 * Widely accessible ones (e.g. `public`, `package`, `protected`, even 
 `private` in a big module) should have a `@safe` interface, i.e. you can 
 use them like `@safe` functions in all regards; they just aren’t `@safe` 
 because of some implementation details.
Every single @trusted function must have a safe interface. That includes local functions and immediately called literals.
 * Narrowly accessible ones (e.g. `private` (in a small module), local 
 functions, immediately executed lambdas) can have a `@system` interface, 
 but their surroundings can be trusted to use the function correctly.
You say it yourself: In that case, the surroundings need to be @trusted. The function that is being called can only be @system when it doesn't have a safe interface.
Oct 27 2022
prev sibling parent tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 27 October 2022 at 17:39:14 UTC, Quirin Schroll 
wrote:
 On Wednesday, 26 October 2022 at 20:24:38 UTC, tsbockman wrote:
 A `@safe` function is guaranteed by the compiler to be memory 
 safe to call from other `@safe` code with (almost) any 
 possible arguments and under (almost) any circumstances.

 A `@trusted` function is guaranteed by its author to be memory 
 safe to call from other `@safe` code with (almost) any 
 possible arguments and under (almost) any circumstances.
The “(almost)” should be absent. If you mean something other than compiler bugs, please tell us.
In practice, `@safe` code depends upon a guard page to catch `null` pointer dereferences. If a struct field or static array element is at a sufficiently large offset from the `null` pointer, this can theoretically result in a silent buffer overrun. As far as I can tell, this is not considered a bug, but rather a reasonable trade-off for improved performance.

Also, doing anything at all is [officially undefined behavior](https://dlang.org/spec/expression.html#assert_expressions) after a failed assertion. This is, again, theoretically problematic because debug builds may call user code to prepare or log the `AssertError`.

There are probably other obscure cases like these, as well, which `@safe` and `@trusted` functions are not responsible for handling correctly.
 I agree with the characterization of `@safe` and `@system`. For 
 `@trusted` functions, there’s something more to say:
 ...
 * Narrowly accessible ones (e.g. `private` (in a small module), 
 local functions, immediately executed lambdas) can have a 
 `@system` interface, but their surroundings can be trusted to 
 use the function correctly.
My characterization [agrees with the language spec](https://dlang.org/spec/function.html#trusted-functions), yours does not. You are essentially redefining `@trusted` to mean "ignore any memory safety issues here", instead of what it is actually intended to mean, "trust me, this is actually memory safe".
Oct 27 2022
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 26 October 2022 at 08:03:37 UTC, Walter Bright 
wrote:
 On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
 In a `@trusted` function today, without dip1000, the above is 
 perfectly reasonable and not invalid. Will dip1000 make it 
 corrupt memory?
A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
It's not quite exactly that. The code in question fails with `@safe`. The problem is that Steven's `@trusted` code not only happens to work, but is defined behaviour without dip1000, yet undefined behaviour with `-preview=dip1000`.

My proposal: disable local variable `scope` inference for `@system` and `@trusted` code. This has the downside that it's difficult to test whether the implementation really turns the inference off. But unless we're ready to ditch `scope` inference altogether I can't come up with anything better.
Oct 26 2022
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/26/22 8:49 AM, Dukc wrote:
 On Wednesday, 26 October 2022 at 08:03:37 UTC, Walter Bright wrote:
 On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
 In a `@trusted` function today, without dip1000, the above is 
 perfectly reasonable and not invalid. Will dip1000 make it corrupt 
 memory?
A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
It's not quite exactly that. The code in question fails with `@safe`. The problem is that Steven's `@trusted` code not only happens to work, but is defined behaviour without dip1000, yet undefined behaviour with `-preview=dip1000`.
Yes, maybe. I don't know if it's UB, because I don't know the rules/philosophy.
 
 My proposal: disable local variable `scope` inference for `@system` and 
 `@trusted` code. This has the downside that it's difficult to test 
 whether the implementation really turns the inference off. But unless 
 we're ready to ditch `scope` inference altogether I can't come up with 
 anything better.
This is a possibility. I don't know the consequences of this, especially for template code. -Steve
Oct 26 2022
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/26/22 4:03 AM, Walter Bright wrote:
 On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
 In a `@trusted` function today, without dip1000, the above is 
 perfectly reasonable and not invalid. Will dip1000 make it corrupt 
 memory?
A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
I should be clear here -- the code does *not* compile in @safe code, but is perfectly reasonable as @trusted code. What I don't want is the compiler taking actions based on scope inference that cause memory corruption.

I get that we can say "if it wouldn't compile in @safe, it's on you to make sure it doesn't corrupt memory as @trusted". But if the reason it's unsafe is not because of things you wrote, but because of compiler inference (as in this case), then the compiler should either not do the inference, or not hoist allocations to the stack based on that inference.

A philosophy/statement to that effect should be satisfactory. The last thing we want dip1000 to do is *cause* memory corruption.

-Steve
Oct 26 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/26/2022 7:38 AM, Steven Schveighoffer wrote:
 On 10/26/22 4:03 AM, Walter Bright wrote:
 On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
 In a `@trusted` function today, without dip1000, the above is perfectly 
 reasonable and not invalid. Will dip1000 make it corrupt memory?
A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
I should be clear here
I understood the issue <g>.
 The last thing we 
 want dip1000 to do is *cause* memory corruption.
We're in full agreement here.
Oct 26 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/26/2022 1:03 AM, Walter Bright wrote:
 On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
 In a `@trusted` function today, without dip1000, the above is perfectly 
 reasonable and not invalid. Will dip1000 make it corrupt memory?
A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
[Some more thinking about the problem]

The question is when is [1,2,3] allocated on the stack, and when is it allocated on the GC heap? Some points:

1. in C it is allocated on the stack. D's behavior to allocate it on the heap is kinda surprising in that light, even though D had such literals before C did
2. allocating on the heap means it is unusable in @nogc code
3. when writing expressions, the only way to get it on the stack is to assign it to a scope variable, which is inconvenient and inefficient
4. it runs against the idea that the simpler code should be more efficient than the complex code

Therefore, I suggest the following:

    [1,2,3] is always allocated on the stack
    [1,2,3].dup is always allocated on the heap

and thus, its behavior is not dependent on inference. How we transition to this, we'll have to figure out.
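
For comparison, a sketch of how the two placements can already be requested explicitly today, without relying on any inference. The function names `sumFixed` and `makeHeapCopy` are hypothetical.

```d
// A fixed-size array lives in the function's own frame, so no GC
// allocation is needed and the function can be marked @nogc.
int sumFixed() @safe @nogc nothrow pure
{
    int[3] values = [1, 2, 3];
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}

// .dup always yields a fresh GC-heap array, which is safe to return.
int[] makeHeapCopy() @safe nothrow pure
{
    return [1, 2, 3].dup;
}

void main()
{
    assert(sumFixed() == 6);
    assert(makeHeapCopy() == [1, 2, 3]);
}
```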
Oct 26 2022
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/26/22 8:57 PM, Walter Bright wrote:
 On 10/26/2022 1:03 AM, Walter Bright wrote:
 On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
 In a `@trusted` function today, without dip1000, the above is 
 perfectly reasonable and not invalid. Will dip1000 make it corrupt 
 memory?
A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
[Some more thinking about the problem] The question is when is [1,2,3] allocated on the stack, and when is it allocated on the GC heap? Some points: 1. in C it is allocated on the stack. D's behavior to allocate it on the heap is kinda surprising in that light, even though D had such literals before C did 2. allocating on the heap means it is unusable in @nogc code 3. when writing expressions, the only way to get it on the stack is to assign it to a scope variable, which is inconvenient and inefficient 4. it runs against the idea that the simpler code should be more efficient than the complex code Therefore, I suggest the following:     [1,2,3] is always allocated on the stack     [1,2,3].dup is always allocated on the heap and thus, its behavior is not dependent on inference. How we transition to this, we'll have to figure out.
Please no! We can allocate on the stack by explicitly requesting it:

```d
int[3] arr = [1, 2, 3];
```

The issue is the DRYness of it. This has been proposed before, just:

```d
int[$] arr = [1, 2, 3];
```

If we are going to fix something, let's fix this! It's backwards compatible too.

If anything, the compiler can just punt and say all array literals that aren't immediately assigned to static arrays are allocated on the heap. Then it's consistent.

Allocating array literals on the heap is *awesome*, please don't change that! D is one of the best learning languages for high-performance code because you don't have to worry at all about memory management out of the box. I'm actually OK with backends using stack allocations because it can prove they aren't escaping, why can't we just rely on that?

-Steve
Oct 26 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/26/2022 6:26 PM, Steven Schveighoffer wrote:
 Please no! We can allocate on the stack by explicitly requesting it:
 
 ```d
 int[3] arr = [1, 2, 3];
 ```
 
 The issue is the DRYness of it. This has been proposed before, just:
 
 ```d
 int[$] arr = [1, 2, 3];
 ```
How would this be done: foo([1,2,3] + a) i.e. using an array literal in places other than an initialization?
 If we are going to fix something, let's fix this! It's backwards compatible too.
 
 If anything, the compiler can just punt and say all array literals that aren't 
 immediately assigned to static arrays are allocated on the heap. Then it's 
 consistent.
And inefficient.
 Allocating array literals on the heap is *awesome*, please don't change that! D 
 is one of the best learning languages for high-performance code because you 
 don't have to worry at all about memory management out of the box. I'm actually 
 OK with backends using stack allocations because it can prove they aren't 
 escaping, why can't we just rely on that?
I thought your test case showed the problem with that :-/
Oct 27 2022
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/27/22 9:44 AM, Walter Bright wrote:
 On 10/26/2022 6:26 PM, Steven Schveighoffer wrote:
 Please no! We can allocate on the stack by explicitly requesting it:

 ```d
 int[3] arr = [1, 2, 3];
 ```

 The issue is the DRYness of it. This has been proposed before, just:

 ```d
 int[$] arr = [1, 2, 3];
 ```
How would this be done:     foo([1,2,3] + a)
This already works today, except I don't know what the `+ a` means: `foo([1, 2, 3].staticArray);`
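To make the `staticArray` suggestion concrete, here is a minimal sketch. It assumes `staticArray` is imported from `std.array`; the `sum` function is a hypothetical consumer invented for illustration and does not appear in this thread.

```d
import std.array : staticArray;

// Hypothetical consumer, not from the thread: sums a slice without allocating.
int sum(const(int)[] values) @nogc nothrow pure
{
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}

void main() @nogc nothrow
{
    // The literal becomes an int[3] held in main's stack frame; no GC allocation.
    auto arr = [1, 2, 3].staticArray;
    static assert(is(typeof(arr) == int[3]));

    // Slice the named local explicitly; the slice must not outlive arr.
    assert(sum(arr[]) == 6);
}
```

The length is deduced from the literal, so the count is not repeated by hand, which is the DRY concern raised above.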
 If we are going to fix something, let's fix this! It's backwards 
 compatible too.

 If anything, the compiler can just punt and say all array literals 
 that aren't immediately assigned to static arrays are allocated on the 
 heap. Then it's consistent.
And inefficient.
Inefficiencies that are taken care of by modern backends, such as LLVM and GCC.
 Allocating array literals on the heap is *awesome*, please don't 
 change that! D is one of the best learning languages for 
 high-performance code because you don't have to worry at all about 
 memory management out of the box. I'm actually OK with backends using 
 stack allocations because it can prove they aren't escaping, why can't 
 we just rely on that?
I thought your test case showed the problem with that :-/
Backends that put it on the stack are not using language constructs such as `scope` to make assumptions; they are using actual analysis of the control flow to prove that it doesn't escape.

-Steve
Oct 27 2022
prev sibling next sibling parent reply German Diago <germandiago gmail.com> writes:
On Thursday, 27 October 2022 at 00:57:47 UTC, Walter Bright wrote:
 On 10/26/2022 1:03 AM, Walter Bright wrote:
 On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
 In a `@trusted` function today, without dip1000, the above is 
 perfectly reasonable and not invalid. Will dip1000 make it 
 corrupt memory?
A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
[Some more thinking about the problem]

The question is when is [1,2,3] allocated on the stack, and when is it allocated on the GC heap? Some points:

1. in C it is allocated on the stack. D's behavior to allocate it on the heap is kinda surprising in that light, even though D had such literals before C did
2. allocating on the heap means it is unusable in @nogc code
3. when writing expressions, the only way to get it on the stack is to assign it to a scope variable, which is inconvenient and inefficient
4. it runs against the idea that the simpler code should be more efficient than the complex code

Therefore, I suggest the following:

    [1,2,3] is always allocated on the stack
    [1,2,3].dup is always allocated on the heap
As someone who has used D, but not extensively, I was surprised by the type[] vs type[N] behavior all the time. I agree that [1, 2, 3] should be allocated on the stack, but I am not sure how much code that could break. For example, if it was on the heap before, what happens with this now?

    int[] func() {
        // Allocated on the stack, I presume that is not safe, should I add .dup?
        int[] v = [1, 2, 3];
        return v;
    }

How should it work?
Oct 27 2022
next sibling parent Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Thursday, 27 October 2022 at 09:36:25 UTC, German Diago wrote:
 On Thursday, 27 October 2022 at 00:57:47 UTC, Walter Bright 
 wrote:
 On 10/26/2022 1:03 AM, Walter Bright wrote:
 On 10/24/2022 6:35 PM, Steven Schveighoffer wrote:
 In a `@trusted` function today, without dip1000, the above 
 is perfectly reasonable and not invalid. Will dip1000 make 
 it corrupt memory?
A very good question. Clearly, having code work when it is @safe, but cause memory corruption when it is marked @trusted, is the wrong solution. This should never happen. I'm not sure what the solution should be here.
[Some more thinking about the problem]

The question is when is `[1,2,3]` allocated on the stack, and when is it allocated on the GC heap? Some points:

1. in C it is allocated on the stack. D's behavior to allocate it on the heap is kinda surprising in that light, even though D had such literals before C did
2. allocating on the heap means it is unusable in `@nogc` code
3. when writing expressions, the only way to get it on the stack is to assign it to a scope variable, which is inconvenient and inefficient
4. it runs against the idea that the simpler code should be more efficient than the complex code

Therefore, I suggest the following:

```d
[1,2,3] // is always allocated on the stack
[1,2,3].dup // is always allocated on the heap
```
As someone who has used D, but not extensively, I was surprised by the `type[]` vs `type[N]` behavior all the time. I agree that `[1, 2, 3]` should be allocated on the stack, but I am not sure how much code that could break. For example, if it was on the heap before, what happens with this now?

```d
int[] func() {
    // Allocated on the stack, I presume that is not safe, should I add .dup?
    int[] v = [1, 2, 3];
    return v;
}
```

How should it work?
If `[1, 2, 3]` is stack allocated, it should not compile (at least not in `@safe` code, probably not in `@system` code either). The problem is not the assignment to `v` (that is of the same kind as a pointer to a local variable), but that its value is returned, which leaks the address of a local.
Oct 27 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 10/27/2022 2:36 AM, German Diago wrote:
 As a person who has used D but not extensively, I was surprised by type[] vs 
 type[N] behavior all the time. I agree that [1, 2, 3] should allocate in the 
 stack but I am not sure how much code that could break? For example, if before 
 it was on the heap, what happens with this now?
You'll get an error on [1]
 int [] func() {
    // Allocated in the stack, I presume that not safe, should add .dup?
    int[] v  = [1, 2, 3];
    return v;  [1]
 }
 
 How it should work?
Add .dup for those that need the array to survive the function.
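As a sketch of what that fix would look like for the function quoted above (this assumes the proposed always-on-the-stack rule, which is not how current compilers behave; the code also compiles and works today, where `.dup` just makes an extra copy):

```d
// Sketch of the fix under the proposal: copy the literal to the GC heap
// before returning it, so the result does not refer to the stack frame.
int[] func()
{
    int[] v = [1, 2, 3].dup;
    return v;
}

void main()
{
    int[] a = func();
    int[] b = func();
    assert(a == [1, 2, 3]);

    // Each call returns its own heap copy, so the results are independent.
    assert(a.ptr !is b.ptr);
    a[0] = 42;
    assert(b[0] == 1);
}
```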
Oct 27 2022
prev sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 27 October 2022 at 00:57:47 UTC, Walter Bright wrote:
     [1,2,3] is always allocated on the stack
Why isn't this an immutable constant just like a string literal?
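For comparison, a small sketch of what can already be written explicitly today: an `immutable` array declared at module scope lives in static data rather than on the stack or the GC heap. The names `table` and `getTable` are hypothetical, chosen only for illustration.

```d
// Hypothetical lookup table: module-level immutable data, initialized at
// compile time, so no stack or GC allocation happens at run time.
immutable int[] table = [1, 2, 3];

// Handing out a slice of it is fine in @safe @nogc code, much like
// returning a string literal.
immutable(int)[] getTable() @safe @nogc nothrow pure
{
    return table;
}

void main() @safe @nogc nothrow
{
    auto t = getTable();
    assert(t.length == 3 && t[2] == 3);
}
```

The difference from a string literal is that this has to be requested explicitly and the elements cannot be mutated afterwards, which a plain `int[]` literal allows.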
Oct 27 2022
prev sibling parent Dukc <ajieskola gmail.com> writes:
On Thursday, 27 October 2022 at 00:57:47 UTC, Walter Bright wrote:
 Therefore, I suggest the following:

     [1,2,3] is always allocated on the stack
Please no. Far too much breakage for the value (even without going to the question whether it'd be added value in the first place).
 2. allocating on the heap means it is unusable in @nogc code
The compiler will error, and the programmer can manually fix it. No silent errors. `@nogc` code is still a bit of a special case; GC-using code is the norm we want to optimise the language for.
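A minimal sketch of such a manual fix, assuming the literal's length is known at the declaration (the function name `sumOnStack` is made up for this example):

```d
int sumOnStack() @safe @nogc nothrow pure
{
    // A static array initialized from a literal needs no GC allocation,
    // so this is accepted in @nogc code.
    int[3] values = [1, 2, 3];

    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}

void main() @safe @nogc nothrow
{
    assert(sumOnStack() == 6);
}
```

Declaring `values` as a plain `int[]` instead is the case where the `@nogc` error described above would be expected.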
 3. when writing expressions, the only way to get it on the 
 stack is to assign it to a scope variable, which is 
 inconvenient and inefficient
The compiler is still free to optimise those into a stack allocation if it can prove the data does not escape. `scope` is just used to enforce that in `@safe` code, or to give the compiler permission to assume it in `@trusted` and `@system` code.
Oct 27 2022