digitalmars.D.announce - DIP1000: Memory Safety in a Modern System Programming Language Pt.1

Mike Parker <aldacron gmail.com> writes:
Ate Eskola was inspired to write a series of tutorials about 
DIP1000 for the D Blog. The first post in the series is live. If 
you haven't yet dug into DIP1000 much or understood how to use 
it, this should give you enough to get started.

The blog:
https://dlang.org/blog/2022/06/21/dip1000-memory-safety-in-a-modern-system-programming-language-pt-1/

Reddit:
https://www.reddit.com/r/programming/comments/vhfd28/memory_safety_in_a_modern_system_programming/
Jun 21 2022
StarCanopy <starcanopy protonmail.com> writes:
On Tuesday, 21 June 2022 at 15:05:46 UTC, Mike Parker wrote:
 [...]

```d
int[5] stackData = [-1, -2, -3, -4, -5];

// Lifetime of stackData2 ends
// before limitedRef, so this is
// disallowed.
limitedRef = stackData[];
```

In the above example, `stackData2` seems to be a typo.
Jun 21 2022
Dukc <ajieskola gmail.com> writes:
On Tuesday, 21 June 2022 at 22:55:56 UTC, StarCanopy wrote:
 In the above example, `stackData2` seems to be a typo.
Thanks, you're right. Missed that when editing.
Jun 21 2022
ezneh <petitv.isat gmail.com> writes:
On Wednesday, 22 June 2022 at 06:48:34 UTC, Dukc wrote:
 Thanks, you're right. Missed that when editing.
Other typo: ```, as that dcoument is what ```
Jun 22 2022
zjh <fqbqrr 163.com> writes:
On Tuesday, 21 June 2022 at 15:05:46 UTC, Mike Parker wrote:

Good article!
Jun 22 2022
zjh <fqbqrr 163.com> writes:
On Wednesday, 22 June 2022 at 07:15:34 UTC, zjh wrote:
 On Tuesday, 21 June 2022 at 15:05:46 UTC, Mike Parker wrote:

 Good article!
[chinese version](https://fqbqrr.blog.csdn.net/article/details/125409915)
Jun 24 2022
Dukc <ajieskola gmail.com> writes:
On Saturday, 25 June 2022 at 02:03:01 UTC, zjh wrote:
 [chinese version](https://fqbqrr.blog.csdn.net/article/details/125409915)
Wow, thanks!
Jun 25 2022
Dukc <ajieskola gmail.com> writes:
On Tuesday, 21 June 2022 at 15:05:46 UTC, Mike Parker wrote:
 The blog:
 https://dlang.org/blog/2022/06/21/dip1000-memory-safety-in-a-modern-system-programming-language-pt-1/
Now in 26th place on Hacker News.
Jun 22 2022
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 22 June 2022 at 19:09:28 UTC, Dukc wrote:
 Now in 26th place on Hacker News.
This was a nice presentation. If there is a follow-up, maybe create examples with a `main` and a «run this» button that opens them in run.dlang.org? I suspect some readers will think TL;DR when faced with longer blog posts and just look at the examples (hence the show-don't-tell principle).
Jun 22 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/21/22 11:05 AM, Mike Parker wrote:
 Ate Eskola was inspired to write a series of tutorials about DIP1000 for 
 the D Blog. The first post in the series is live. If you haven't yet dug 
 into DIP1000 much or understood how to use it, this should give you 
 enough to get started.
 
 The blog:
 https://dlang.org/blog/2022/06/21/dip1000-memory-safety-in-a-modern-system-programming-language-pt-1/
 
 Reddit:
 https://www.reddit.com/r/programming/comments/vhfd28/memory_safety_in_a_modern_system_programming/
 
DIP1000's point is starting to seep in. I still think it's going to be a challenge for people new to D (not just us old-timers). But...

The part about `scope` being shallow. This is a problem.

```d
scope a = "first";
scope b = "second";
string[] arr = [a, b]; // invalid regardless of attributes in safe code
```

Sometimes algorithms require manipulation of structure, such as sorting arrays, or using linked lists, and sometimes it's nice to be able to point at things on the stack, temporarily. This is one of the things I was looking forward to with dip1000, since it does allow pointing at the stack when it can work out the details.

Is there any plan to address this other than "just use `@system`"?

-Steve
Jun 22 2022
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 22 June 2022 at 20:48:13 UTC, Steven Schveighoffer 
wrote:
 The part about `scope` being shallow. This is a problem.
One thing that will be confusing to most users is that it appears to be using "taint" rather than proper flow analysis on the pointed-to-object?

```d
int* test(int arg1, int arg2)
{
    int* p = null;
    p = &arg1;
    p = new int(5);
    return p; // complains about p being scope
}
```
Jun 22 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/22/22 5:07 PM, Ola Fosheim Grøstad wrote:
 One thing that will be confusing to most users is that it appears to be using "taint" rather than proper flow analysis on the pointed-to-object?
The other option is to complain about the assignment of `&arg` to `p`. That might be a better answer. At least it's *understandable*, and not sneaky.

Full flow analysis will be defeatable by more complex situations:

```d
int *p = null;
if(alwaysEvaluateToFalse()) p = &arg;
else p = new int(5);
return p;
```

That would take a lot of effort just to prove it shouldn't be scope.

-Steve
Jun 22 2022
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 22 June 2022 at 21:20:33 UTC, Steven Schveighoffer 
wrote:
 Full flow analysis will be defeatable by more complex 
 situations:

 ```d
 int *p = null;
 if(alwaysEvaluateToFalse()) p = &arg;
 else p = new int(5);
 return p;
 ```

 That would take a lot of effort just to prove it shouldn't be 
 scope.
I guess this is the wrong forum, but two quick points. Some C programmers reuse variables extensively, those programmers will be confused or annoyed. The analysis can be done after an optimization pass, so at least the simple cases go through smoothly.
Jun 22 2022
Dom Disc <dominikus scherkl.de> writes:
On Wednesday, 22 June 2022 at 21:58:07 UTC, Ola Fosheim Grøstad 
wrote:
 Some C programmers reuse variables extensively, those 
 programmers will be confused or annoyed.
And rightly so. MISRA has been saying for 30 years or longer: don't reuse variables if possible (and it should almost always be possible). If this bad habit now gives you another way to shoot yourself in the foot, so what?
Jun 23 2022
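The "don't reuse variables" advice can be applied to the `test` function discussed earlier in the thread. What follows is only a minimal sketch, assuming compilation with `-preview=dip1000`; the names `stackPtr` and `heapPtr` are illustrative and do not come from any post. Since `heapPtr` is never assigned a stack address, it should not be inferred `scope`, so returning it is expected to be accepted:

```d
// Sketch only; assumes a compiler invoked with -preview=dip1000.
int* test(int arg1, int arg2) @safe
{
    // Holds a stack address, so the compiler infers `scope` for it.
    int* stackPtr = &arg1;
    assert(*stackPtr == arg1);

    // Only ever holds GC-allocated memory, so it stays non-scope.
    int* heapPtr = new int(5);
    return heapPtr; // fine: heapPtr never held &arg1
}

void main() @safe
{
    assert(*test(1, 2) == 5);
}
```

Splitting the variable keeps the inference local: only the pointer that actually touches the stack becomes `scope`.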
Dukc <ajieskola gmail.com> writes:
On Wednesday, 22 June 2022 at 21:07:50 UTC, Ola Fosheim Grøstad 
wrote:
 One thing that will be confusing to most users is that it appears to be using "taint" rather than proper flow analysis on the pointed-to-object?
I'd personally prefer if variable `scope` auto-inference worked only in the declaration, not later assignments. I guess the intention is to break less existing code. Your solution would break even less, but it'd mean the language rules depend on flow analysis. Because the rules are now "official", probably best to leave them as is to avoid confusion.
Jun 22 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/22/22 5:44 PM, Dukc wrote:
 I'd personally prefer if variable `scope` auto-inference worked only in the declaration, not later assignments. I guess the intention is to break less existing code.
I think this is the better option. Either that, or that when it returns `p` that trumps any possible `scope` inference.

Imagine you have a function like this:

```d
int foo()
{
   int x = 0;
   x = long.max;
   x = 2;
   return x;
}
```

Now today, this causes an error on the assignment to `long.max`, because obviously `x` is an int. But what if, instead, the compiler decides to backtrack and say "actually, if I make x a `long`, then it works!", and *now*, at the end, says "Oh, actually, you can't return a long as an int, what were you thinking?!"

This is the equivalent here, you declare something *without* scope, assign it to something that is *not* scope, and then because sometime later you assigned it to something that *is* scope, it goes back and rewrites the declaration as if you did make it scope, and then complains to you that the magic trick it tried is not valid.

This is going to be one of the most confusing features of DIP1000.

-Steve
Jun 22 2022
Johan <j j.nl> writes:
On Thursday, 23 June 2022 at 00:45:09 UTC, Steven Schveighoffer 
wrote:
 On 6/22/22 5:44 PM, Dukc wrote:
 I'd personally prefer if variable `scope` auto-inference worked 
 only in the declaration, not later assignments.
I agree.
 I think this is the better option. Either that, or that when it 
 returns `p` that trumps any possible `scope` inference.

 Imagine you have a function like this:

 ```d
 int foo()
 {
    int x = 0;
    x = long.max;
    x = 2;
    return x;
 }
 ```

 Now today, this causes an error on the assignment to 
 `long.max`, because obviously `x` is an int. But what if, 
 instead, the compiler decides to backtrack and say "actually, 
 if I make x a `long`, then it works!", and *now*, at the end, 
 says "Oh, actually, you can't return a long as an int, what 
 were you thinking?!"

 This is the equivalent here, you declare something *without* 
 scope, assign it to something that is *not* scope, and then 
 because sometime later you assigned it to something that *is* 
 scope, it goes back and rewrites the declaration as if you did 
 make it scope, and then complains to you that the magic trick 
 it tried is not valid.

 This is going to be one of the most confusing features of 
 DIP1000.
On top of that, the error message just sounds like a compiler bug. It says that variable `p` is `scope`, but nowhere in the source does it say that. The error message should say that the assignment `p = &arg1;` makes the variable `scope`. (An alternative is to give a warning on `p = &arg1;`.)

-Johan
Jun 22 2022
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 23 June 2022 at 00:45:09 UTC, Steven Schveighoffer 
wrote:
 I think this is the better option. Either that, or that when it 
 returns `p` that trumps any possible `scope` inference.

 Imagine you have a function like this:

 ```d
 int foo()
 {
    int x = 0;
    x = long.max;
    x = 2;
    return x;
 }
 ```

 Now today, this causes an error on the assignment to 
 `long.max`, because obviously `x` is an int. But what if, 
 instead, the compiler decides to backtrack and say "actually, 
 if I make x a `long`, then it works!", and *now*, at the end, 
 says "Oh, actually, you can't return a long as an int, what 
 were you thinking?!"

 This is the equivalent here, you declare something *without* 
 scope, assign it to something that is *not* scope, and then 
 because sometime later you assigned it to something that *is* 
 scope, it goes back and rewrites the declaration as if you did 
 make it scope, and then complains to you that the magic trick 
 it tried is not valid.

 This is going to be one of the most confusing features of 
 DIP1000.
It is confusing because it introduces flow typing without having flow typing. This messes up the user’s mental model of the type system, which is a basic usability flaw. Track the object instead and don’t change the type of the pointer to scope.

If D wants to do flow typing, do it properly and make it clear to the user. It would be a good feature to have, but it would become D3.
Jun 22 2022
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 23 June 2022 at 06:36:23 UTC, Ola Fosheim Grøstad 
wrote:
 Track the object instead and don’t change the type of the 
 pointer to scope.
I guess this is flow typing too, but it is less intrusive to say that the object is either of type «scope» or type «heap» and that regular pointers can hold both than to change the concrete pointer type. Specified concrete types should not change.
Jun 22 2022
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Thursday, 23 June 2022 at 06:52:48 UTC, Ola Fosheim Grøstad 
wrote:
 I guess this is flow typing too, but it is less intrusive to say that the object is either of type «scope» or type «heap» and that regular pointers can hold both than to change the concrete pointer type. Specified concrete types should not change.
For people interested in getting more intuition for flow typing: https://www.typescriptlang.org/docs/handbook/2/narrowing.html or chapter 3: https://whiley.org/pdfs/GettingStartedWithWhiley.pdf
Jun 23 2022
Dukc <ajieskola gmail.com> writes:
On Wednesday, 22 June 2022 at 20:48:13 UTC, Steven Schveighoffer 
wrote:
 The part about `scope` being shallow. This is a problem.

 Is there any plan to address this other than "just use 
 `@system`"?

 -Steve
I think a custom `struct` containing a GC'd slice of `scope` arrays could be done. The struct itself needs `@system` code of course, but could be `@safe` from an outside perspective, unless there's some issue I haven't thought of.

A relatively quick-and-dirty solution would be to use a static array for this, if you know an upper bound on the array's size. Kinda cheap but probably better than `@system`.
Jun 22 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/22/22 5:58 PM, Dukc wrote:
 I think a custom `struct` containing a GC'd slice of `scope` arrays could be done. The struct itself needs `@system` code of course but could be `@safe` from outwards perspective, unless there's some issue I haven't thought of.
You mean like a system function which removes the scope-ness of an array? Let me introduce you to my other thread: https://forum.dlang.org/thread/t7qd45$1lrb$1@digitalmars.com

-Steve
Jun 22 2022
Dukc <ajieskola gmail.com> writes:
On Thursday, 23 June 2022 at 00:37:24 UTC, Steven Schveighoffer 
wrote:
 You mean like a system function which removes the scope-ness of 
 an array? Let me introduce you to my other thread: 
 https://forum.dlang.org/thread/t7qd45$1lrb$1 digitalmars.com

 -Steve
You are allowed to remove `scope` from an argument in unsafe code. You only trigger undefined behaviour if you then escape that argument. It's just like casting away `const`: that is fine as long as you don't actually mutate the cast variable.
Jun 23 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/23/22 8:01 AM, Dukc wrote:
 You are allowed to remove `scope` from an argument in unsafe code. It's only if you escape that argument when you trigger undefined behaviour. Just like you can cast away `const`, if you don't actually mutate the cast variable.
And what do you think a custom struct that circumvents the scopeness is going to do with that parameter? To be clear, I think we're talking about something like:

```d
struct ScopeArray(T)
{
    T[] arr;
    @system void opAssign(scope T[] param)
    {
        arr = param; // escaping scope
    }
}
```

-Steve
Jun 23 2022
Dukc <ajieskola gmail.com> writes:
On Thursday, 23 June 2022 at 14:08:15 UTC, Steven Schveighoffer 
wrote:
 And what do you think a custom struct that circumvents the scopeness is going to do with that parameter?
By `return scope`. I'm not sure if it'd work with that `opAssign` example since we don't actually return the `this` pointer. I need to recheck what `return scope` means with a `void` return type before I write the next article. There is, or at least was, some rule regarding that situation. But in any case you could accomplish the same with

```D
// `scopeArray` is a placeholder name; the snippet as posted has no function name
@safe ScopeArray!T scopeArray(T)(return scope T[] param)
{
    return ScopeArray!T(param);
}
```

Note that in this case you are not even trying to store `scope` variables as elements of an array, so we don't need `@trusted`.
Jun 23 2022
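The `return scope` factory idea can be tried out end to end. The following is a minimal sketch, assuming compilation with `-preview=dip1000`; the helper name `scopeArray` and the variables in `main` are hypothetical, and the struct is a stripped-down version of the `ScopeArray` from the post above, without the `@system` `opAssign`:

```d
// Sketch only; assumes a compiler invoked with -preview=dip1000.
struct ScopeArray(T)
{
    T[] arr;
}

// `return scope` ties the lifetime of the result to the argument,
// so no @system or @trusted code is needed here.
ScopeArray!T scopeArray(T)(return scope T[] param) @safe
{
    return ScopeArray!T(param);
}

void main() @safe
{
    int[3] stackData = [1, 2, 3];

    // The wrapper refers to stack memory, so it is declared `scope`.
    scope wrapped = scopeArray(stackData[]);

    assert(wrapped.arr.length == 3);
    assert(wrapped.arr[0] == 1);
}
```

The wrapper cannot outlive `stackData`, so the compiler should reject any attempt to return `wrapped` or store it in a global.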
Kagamin <spam here.lot> writes:
On Wednesday, 22 June 2022 at 20:48:13 UTC, Steven Schveighoffer 
wrote:
 Sometimes algorithms require manipulation of structure, such as 
 sorting arrays, or using linked lists, and sometimes it's nice 
 to be able to point at things on the stack, temporarily. This 
 is one of the things I was looking forward to with dip1000, 
 since it does allow pointing at the stack when it can work out 
 the details.
This works:

```
struct S
{
    int[] a;
    int[] get() return scope @safe { return a; }
    void set(return int[] b) return scope @safe { a=b; }
}

int[] f() @safe
{
    int[2] a;
    scope S t;
    int[] b=t.get;
    t.set=a;
    return b; //no
}
```
Jun 23 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/23/22 8:14 AM, Kagamin wrote:
 This works: [...]
This is just saying the same thing -- you must allocate your data on the stack in order to have scope elements assignable. This isn't always feasible.

```d
scope a = "first";
scope b = "second";
string[2] x = [a, b];
auto arr = x[]; // ok
arr = ["first", "second"]; // ok
arr = [a, b]; // not ok
```

It's established that arr is attributed such that it's able to point at stack data. Fine. It can also point at allocated data. Fine. But the *allocated data itself* cannot point at scope data. This "somewhat" makes sense, because by the time the GC cleans up this array, the data is likely invalid (not in this case, but that's because we deliberately labeled non-scope data as scope).

However, you may need to mix both non-scope and scope data, which means you can't allocate on the stack.

But there can be ways to mitigate this:

1. For this simple case, just allocate [a, b] on the stack. It's happening in other places (see my other thread), why not here?
2. The compiler could insert a call at the end of the scope to free the allocated data. It can't escape anyway, and freeing it early is part of the benefit of having scope. Or if it's determined that no destructors are involved, just let the GC clean it up.

The more cases where we make this painless for the user, the better.

-Steve
Jun 23 2022