
digitalmars.D - dip1000 and preview in combine to cause extra safety errors

Steven Schveighoffer <schveiguy gmail.com> writes:
```d
string foo(in string s)
{
     return s;
}

void main()
{
     import std.stdio;
     string[] result;
     foreach(c; "hello")
     {
         result ~= foo([c]);
     }
     writeln(result);
}
```

With no previews, with only `-preview=dip1000`, or with only `-preview=in`, this outputs: `["h", "e", "l", "l", "o"]`

With both `-preview=dip1000` and `-preview=in`, this outputs: `["o", "o", "o", "o", "o"]`

What is happening is that the compiler is somehow convinced it can 
allocate the array literal on the stack (and it overwrites that stack 
slot on each loop iteration).

I know this isn't `@safe` code, but `@system` code shouldn't be made 
less safe by the preview switches!
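(As an aside, a hedged sketch of a signature that keeps the old behavior even under both previews: `return scope` admits that the parameter may escape through the return value, so the call site cannot stack-promote the argument.)

```d
// Hedged sketch: `return scope` declares that `s` may escape through the
// return value, so callers must keep the argument alive -- the literal
// at the call site stays GC-allocated instead of being stack-promoted.
string foo(return scope string s) @safe
{
    return s;
}
```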

I know people write `in` instead of `const` all the time *simply because 
it's shorter*.

Thoughts?

-Steve
Jun 08 2022
Dukc <ajieskola gmail.com> writes:
On Wednesday, 8 June 2022 at 14:52:53 UTC, Steven Schveighoffer 
wrote:
 ```d
 string foo(in string s)
 {
     return s;
 }

 void main()
 {
     import std.stdio;
     string[] result;
     foreach(c; "hello")
     {
         result ~= foo([c]);
     }
     writeln(result);
 }
 ```

 Thoughts?
This is simply the result of using `in` wrong. `in` means `const scope`. `scope` (without a preceding `return`) means you won't return a reference to the argument (unless the function can reach it via some other channel). Result: undefined behaviour.
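A minimal sketch of that contract, assuming `-preview=dip1000` (a hedged illustration):

```d
// Hedged sketch of the contract difference under -preview=dip1000:

// Rejected in @safe code: plain `scope` promises `s` does not escape,
// so returning it is an error.
// string bad(scope string s) @safe { return s; }

// Accepted: `return scope` admits that `s` may escape via the return value.
string ok(return scope string s) @safe
{
    return s;
}
```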
Jun 08 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/8/22 11:10 AM, Dukc wrote:
 On Wednesday, 8 June 2022 at 14:52:53 UTC, Steven Schveighoffer wrote:
 ```d
 string foo(in string s)
 {
     return s;
 }

 void main()
 {
     import std.stdio;
     string[] result;
     foreach(c; "hello")
     {
         result ~= foo([c]);
     }
     writeln(result);
 }
 ```

 Thoughts?
This is simply the result of using `in` wrong. `in` means `const scope`. `scope` (without a preceding `return`) means you won't return a reference to the argument (unless the function can reach it via some other channel). Result: undefined behaviour.
So silently changing behavior to create new dangling pointers with a preview switch is OK? Remember, there is already code that does this. It's not trying to be clever via `scope`, it's not trying to be `@safe`, it's expecting that an array literal is allocated on the GC heap (as has always been the case).

-Steve
Jun 08 2022
John Colvin <john.loughran.colvin gmail.com> writes:
On Wednesday, 8 June 2022 at 15:35:56 UTC, Steven Schveighoffer 
wrote:
 On 6/8/22 11:10 AM, Dukc wrote:
 On Wednesday, 8 June 2022 at 14:52:53 UTC, Steven 
 Schveighoffer wrote:
 ```d
 string foo(in string s)
 {
     return s;
 }

 void main()
 {
     import std.stdio;
     string[] result;
     foreach(c; "hello")
     {
         result ~= foo([c]);
     }
     writeln(result);
 }
 ```

 Thoughts?
This is simply the result of using `in` wrong. `in` means `const scope`. `scope` (without a preceding `return`) means you won't return a reference to the argument (unless the function can reach it via some other channel). Result: undefined behaviour.
So silently changing behavior to create new dangling pointers with a preview switch is OK? Remember, there is already code that does this. It's not trying to be clever via `scope`, it's not trying to be `@safe`, it's expecting that an array literal is allocated on the GC heap (as has always been the case). -Steve
The preview switch is changing the meaning of `in` which changes the signature of `foo` (which is then inconsistent with the implementation), which in turn will affect the call sites. This seems roughly as expected, no?
Jun 08 2022
John Colvin <john.loughran.colvin gmail.com> writes:
On Wednesday, 8 June 2022 at 15:58:10 UTC, John Colvin wrote:
 The preview switch is changing the meaning of `in` which 
 changes the signature of `foo` (which is then inconsistent with 
 the implementation), which in turn will affect the call sites. 
 This seems roughly as expected, no?
E.g. this prints `["o", "o", "o", "o", "o"]` regardless of compiler flags:

```d
string foo(scope string s)
{
    return s;
}

void main()
{
    import std.stdio;
    string[] result;
    foreach(c; "hello")
    {
        result ~= foo([c]);
    }
    writeln(result);
}
```
Jun 08 2022
deadalnix <deadalnix gmail.com> writes:
On Wednesday, 8 June 2022 at 16:01:55 UTC, John Colvin wrote:
 On Wednesday, 8 June 2022 at 15:58:10 UTC, John Colvin wrote:
 The preview switch is changing the meaning of `in` which 
 changes the signature of `foo` (which is then inconsistent 
 with the implementation), which in turn will affect the call 
 sites. This seems roughly as expected, no?
 E.g. this prints `["o", "o", "o", "o", "o"]` regardless of compiler flags:

     string foo(scope string s)
     {
         return s;
     }

     void main()
     {
         import std.stdio;
         string[] result;
         foreach(c; "hello")
         {
             result ~= foo([c]);
         }
         writeln(result);
     }
There is no frame of reference in which this result is in any way reasonable.
Jun 08 2022
Tejas <notrealemail gmail.com> writes:
On Wednesday, 8 June 2022 at 16:16:37 UTC, deadalnix wrote:
 On Wednesday, 8 June 2022 at 16:01:55 UTC, John Colvin wrote:
 On Wednesday, 8 June 2022 at 15:58:10 UTC, John Colvin wrote:
 [...]
 E.g. this prints `["o", "o", "o", "o", "o"]` regardless of compiler flags:

     string foo(scope string s)
     {
         return s;
     }

     void main()
     {
         import std.stdio;
         string[] result;
         foreach(c; "hello")
         {
             result ~= foo([c]);
         }
         writeln(result);
     }
There is no frame of reference in which this result is in any way reasonable.
Yeah, shouldn't the compiler complain that `s` is trying to escape the scope of `foo`?
Jun 08 2022
John Colvin <john.loughran.colvin gmail.com> writes:
On Wednesday, 8 June 2022 at 16:16:37 UTC, deadalnix wrote:
 On Wednesday, 8 June 2022 at 16:01:55 UTC, John Colvin wrote:
 E.g. this prints `["o", "o", "o", "o", "o"]` regardless of 
 compiler flags

     string foo(scope string s)
     {
         return s;
     }

     void main()
     {
         import std.stdio;
         string[] result;
         foreach(c; "hello")
         {
             result ~= foo([c]);
         }
         writeln(result);
     }
There is no frame of reference in which this result is in any way reasonable.
My guess is that technically `foo` has undefined behaviour.
Jun 08 2022
deadalnix <deadalnix gmail.com> writes:
On Wednesday, 8 June 2022 at 16:32:25 UTC, John Colvin wrote:
 There is no frame of reference in which this result is in any 
 way reasonable.
My guess is that technically `foo` has undefined behaviour.
Sure, but that also means it could format your hard drive, and it'd be hard to argue that this is reasonable.

If the compiler understands enough of what's going on to decide it can recycle the memory, it understands enough to tell you that you are using it after freeing it; and if it cannot, then it shouldn't do it.

In this case specifically, assuming the compiler sees that the memory doesn't escape and promotes `[c]` onto the stack, it should still do the right thing. That means the compiler is somehow going out of its way to break the code. That doesn't sound reasonable, no matter how you slice it.
Jun 08 2022
John Colvin <john.loughran.colvin gmail.com> writes:
On Wednesday, 8 June 2022 at 16:58:41 UTC, deadalnix wrote:
 On Wednesday, 8 June 2022 at 16:32:25 UTC, John Colvin wrote:
 There is no frame of reference in which this result is in any 
 way reasonable.
My guess is that technically `foo` has undefined behaviour.
 Sure, but that also means it could format your hard drive, and it'd be 
 hard to argue this is reasonable. If the compiler understands enough of 
 what's going on to decide it can recycle the memory, it understands 
 enough to tell you you are using it after freeing it; and if it cannot, 
 then it shouldn't do it. In this case specifically, assuming the 
 compiler sees the memory doesn't escape and promotes `[c]` onto the 
 stack, it should still do the right thing. That means the compiler is 
 somehow going out of its way to break the code. That doesn't sound 
 reasonable, no matter how you slice it.
The compiler is going “you told me `foo` doesn’t leak references to the string passed to it; I believe you. Based on that, this temporary array is safe to put on the stack”. I think it’s reasonable for the compiler to lean on `scope` like this.

The problem is `foo` and whether the compiler should somehow prevent the inconsistency between the signature and implementation. Obviously the answer is “yes, ideally”, but in practice with @safe, @system, dip1000, @live and so on it’s all a mess.
Jun 08 2022
deadalnix <deadalnix gmail.com> writes:
On Wednesday, 8 June 2022 at 17:50:18 UTC, John Colvin wrote:
 The compiler is going “you told me `foo` doesn’t leak 
 references to the string passed to it, I believe you. Based on 
 that, this temporary array is safe to put on the stack”. I 
 think it’s reasonable for the compiler to lean on `scope` like 
 this.
Never mind, result is a `string[]` so yes, it's somewhat expected.
Jun 08 2022
claptrap <clap trap.com> writes:
On Wednesday, 8 June 2022 at 17:50:18 UTC, John Colvin wrote:
 On Wednesday, 8 June 2022 at 16:58:41 UTC, deadalnix wrote:
 On Wednesday, 8 June 2022 at 16:32:25 UTC, John Colvin wrote:
The problem is `foo` and whether the compiler should somehow prevent the inconsistency between the signature and implementation. Obviously the answer is “yes, ideally”, but in practice with @safe, @system, dip1000, @live and so on it’s all a mess.
Isn't the DIP process supposed to catch these sorts of things?
Jun 08 2022
rikki cattermole <rikki cattermole.co.nz> writes:
On 09/06/2022 7:18 AM, claptrap wrote:
 Isn't the DIP process supposed to catch these sorts of things?
DIP1000 was the first DIP to go through the new system. But even then, we are still having to change how it works many years after the fact due to the complexities involved.
Jun 08 2022
claptrap <clap trap.com> writes:
On Wednesday, 8 June 2022 at 19:21:56 UTC, rikki cattermole wrote:
 On 09/06/2022 7:18 AM, claptrap wrote:
 Isn't the DIP process supposed to catch these sorts of things?
DIP1000 was the first DIP to go through the new system. But even then, we are still having to change how it works many years after the fact due to the complexities involved.
Sorry, I was being facetious...

DIP1000: "This DIP did not complete the review process"

But we're getting it anyway... Oh wait, nobody seems to really understand it; even the gurus can't seem to agree on how it works or how it should work.

But we're getting it anyway... Oh wait, weird stuff is happening.

But we're getting it anyway...

Not directed at you FWIW; it's just a joke that the first DIP through the new "let's add more rigour to the process by which we add things to D" process failed to pass, and yet we're getting it anyway.
Jun 09 2022
deadalnix <deadalnix gmail.com> writes:
On Wednesday, 8 June 2022 at 19:18:20 UTC, claptrap wrote:
 On Wednesday, 8 June 2022 at 17:50:18 UTC, John Colvin wrote:
 On Wednesday, 8 June 2022 at 16:58:41 UTC, deadalnix wrote:
 On Wednesday, 8 June 2022 at 16:32:25 UTC, John Colvin wrote:
The problem is `foo` and whether the compiler should somehow prevent the inconsistency between the signature and implementation. Obviously the answer is “yes, ideally”, but in practice with @safe, @system, dip1000, @live and so on it’s all a mess.
Isn't the DIP process supposed to catch these sorts of things?
It did.
Jun 08 2022
Walter Bright <newshound2 digitalmars.com> writes:
On 6/8/2022 12:18 PM, claptrap wrote:
 Isn't the DIP process supposed to catch these sorts of things?
It does catch those sorts of things. But the checks do not happen for @system code, and the code snippet is @system code. I submitted a DIP to make @safe the default (rather than @system), but it was rejected. This is the result.
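A hedged illustration of that point: with the snippet annotated `@safe`, the same previews turn the silent corruption into a compile-time error (a sketch; the exact diagnostic text is not shown here).

```d
string foo(in string s) @safe
{
    // With -preview=in and -preview=dip1000, `in` implies `scope`, and
    // @safe code may not return a scope parameter: the compiler rejects
    // this line instead of silently stack-promoting at the call site.
    return s;
}
```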
Jun 08 2022
"H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Jun 08, 2022 at 04:21:33PM -0700, Walter Bright via Digitalmars-d wrote:
 On 6/8/2022 12:18 PM, claptrap wrote:
 Isn't the DIP process supposed to catch these sorts of things?
It does catch those sorts of things. But the checks do not happen for @system code, and the code snippet is @system code. I submitted a DIP to make @safe the default (rather than @system), but it was rejected. This is the result.
I was in favor of the @safe-by-default DIP. The only amendment I wanted was that @safe by default would only apply to extern(D) functions, which IIRC just about everyone else also wanted. IIRC, you were the only one who opposed this amendment, which then caused the majority to reject the DIP. Had you accepted the compromise then, we would already have @safe by default today.

So it seems a bit incongruous to now place blame on the DIP being rejected, as though everyone flatly rejected the whole idea. Such was not the case. A compromise *could* have been reached.

(And for the record, if the amended DIP were to be submitted today, I'd vote in favor. @safe by default is a good thing to have -- except on extern(C) interfaces to C code, which by definition is un-@safe -- the most it can be is @trusted, and I'm sure nobody wants @trusted by default.)

T

-- 
Long, long ago, the ancient Chinese invented a device that lets them see through walls. It was called the "window".
Jun 08 2022
rikki cattermole <rikki cattermole.co.nz> writes:
On 09/06/2022 11:47 AM, H. S. Teoh wrote:
 (And for the record, if the amended DIP were to be submitted today, I'd
 vote in favor. @safe by default is a good thing to have -- except on
 extern(C) interfaces to C code, which by definition is un-@safe -- the
 most it can be is @trusted, and I'm sure nobody wants @trusted by
 default.)
I'm in favor of turning on attribute inference instead. It'll do the same thing (effectively) while also covering things like DIP1000's attributes.

Why not try it and see how much breakage it would really cause? If hardly anything, let's just do it without a DIP.
Jun 08 2022
Ali Çehreli <acehreli yahoo.com> writes:
On 6/8/22 16:47, H. S. Teoh wrote:

  @safe by default is a good thing to have
I think we used wrong names. @safe is not safe, because it allows an escape hatch. Today's @safe is actually "@trusted", because the compiler trusts the programmer but checks whatever it is allowed to check. Basically, today's @safe is "verify, but trust".
 -- except on
 extern(C) interfaces to C code, which by definition is un-@safe
I see it differently: extern(C) interfaces are @trusted, but they can't be checked. (More below.)

I was convinced (after having an email exchange with Walter) that unless we assumed extern(C) functions @safe, nobody would bother marking their declarations as @trusted one-by-one. And whoever marked them as such would do it without actually auditing any source code.

What have we gained by disapproving @safe-by-default? Nothing: the C API would either not be called or be marked blindly as @trusted. I think this is more embarrassing than @safe-by-default C libraries.

So, D's presumed embarrassment of "C functions are assumed @safe" was against both practicality and the truth: the truth is, we indeed "trust" C functions, because we use C libraries all the time without reading their source code. This is the definition of trust. And that's why I say we chose wrong names around this topic.
 -- the
 most it can be is @trusted, and I'm sure nobody wants @trusted by
 default.)
Me wants @trusted by default, but with some semantic changes! :)

I think I have written the following proposal before. It requires changing the semantics, but I haven't thought about every detail. (I am not methodic nor complete when it comes to such design ideas.)

So, this is what we have currently:

  @safe: Checked, with escape hatch

  @trusted: Assumed safe, unchecked

  @system: Assumed unsafe, unchecked

  default: @system

  extern(C): @system

The whole thing could have started (and I believe can be changed into) like the following instead:

  @safe: Checked, without escape hatch

  @trusted: Checked, with escape hatch (@system will be the escape hatch)

  @system: Assumed unsafe, unchecked

  default: @trusted

  extern(C): @trusted, but can't check

As that list may be hard to parse, here is a commentary:

  @safe: We had it wrong. @safe should mean "safe", without any escape hatch.

  @trusted: The name was fine, but why not check D code that is not marked? So, let's make this the default and check all D code. Everybody will benefit. Except, we will have to add @system{} in some places.

  @system: No change here, but this becomes the escape hatch.

  extern(C): We will happily call them from @trusted code (but not @safe code), but we can't check them. So what? Society trusts C libraries, so do we.

Ali
Jun 08 2022
Timon Gehr <timon.gehr gmx.ch> writes:
On 09.06.22 02:44, Ali Çehreli wrote:
 The society trusts C libraries, so do we.
```d
free(cast(void*)0xDEADBEEF);
```

Seems legit.
Jun 08 2022
Timon Gehr <timon.gehr gmx.ch> writes:
On 09.06.22 02:54, Timon Gehr wrote:
 On 09.06.22 02:44, Ali Çehreli wrote:
 The society trusts C libraries, so do we.
free(cast(void*)0xDEADBEEF) Seems legit.
I guess this does not actually make the point very well. Second try:

```d
free(new int);
```

Seems legit. The C library can do no wrong!
Jun 08 2022
Ali Çehreli <acehreli yahoo.com> writes:
On 6/8/22 18:04, Timon Gehr wrote:
 On 09.06.22 02:54, Timon Gehr wrote:
 On 09.06.22 02:44, Ali Çehreli wrote:
 The society trusts C libraries, so do we.
free(cast(void*)0xDEADBEEF) Seems legit.
I guess this does not actually make the point very well. Second try: ```d free(new int); ``` Seems legit. The C library can do no wrong!
I still don't get it. :( That mistake has nothing to do with the C library.

If your objection is to @trusted code being able to call `free`, then no special marking can be practically useful. Forcing D code to be @system just to call `free()` is counterproductive, because then the D code does not get checked. When D code is @trusted, at least a situation like the one in my other response would be caught by D.

I mean, who wins by @system-by-default? Nobody. The code is not safer.

Ali
Jun 08 2022
Timon Gehr <timon.gehr gmx.ch> writes:
On 09.06.22 03:10, Ali Çehreli wrote:
 On 6/8/22 18:04, Timon Gehr wrote:
  > On 09.06.22 02:54, Timon Gehr wrote:
  >> On 09.06.22 02:44, Ali Çehreli wrote:
  >>> The society trusts C libraries, so do we.
  >>
  >> free(cast(void*)0xDEADBEEF)
  >>
  >> Seems legit.
  >
  > I guess this does not actually make the point very well. Second try:
  >
  > ```d
  > free(new int);
  > ```
  >
  > Seems legit. The C library can do no wrong!
 
 I still don't get it. :(
 ...
`@trusted` has a specific meaning; it does not mean we believe the implementer of `free` is a nice guy. It means the specification of `free` says it's safe to call with any valid pointer and we believe that it is true. This is not the case, hence it cannot be `@trusted`.
 ...
 
 I mean, who wins by @system-by-default? Nobody. The code is not safer.
 ...
That's on Walter.
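For contrast, a hedged sketch of something that *can* legitimately carry the attribute: a wrapper whose interface is safe for every possible argument, because the raw pointer never crosses the boundary (the wrapper name is made up for illustration).

```d
import core.stdc.stdlib : calloc;

// Hypothetical wrapper: the caller can pass any `n` and never sees the
// raw pointer, so the *interface* is safe even though the body performs
// @system operations.
int[] trustedAllocInts(size_t n) @trusted
{
    auto p = cast(int*) calloc(n, int.sizeof);
    return p is null ? [] : p[0 .. n];
}

void main() @safe
{
    auto a = trustedAllocInts(4); // callable from @safe code
    a[0] = 42;                    // bounds-checked slice access
}
```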
Jun 08 2022
Ali Çehreli <acehreli yahoo.com> writes:
OOn 6/8/22 17:54, Timon Gehr wrote:
 On 09.06.22 02:44, Ali Çehreli wrote:
 The society trusts C libraries, so do we.
free(cast(void*)0xDEADBEEF) Seems legit.
I don't get the point. Society uses C libraries in many places. I assume many important libraries like libssh are written in C.

If your point was about my proposal: I meant only @trusted code would be able to call a C library. So, if your expression above were in D code, the compiler would reject it (@trusted-by-default, but @trusted is checked). The programmer would have to mark it as @system.

Ali
Jun 08 2022
Timon Gehr <timon.gehr gmx.ch> writes:
On 09.06.22 03:07, Ali Çehreli wrote:
 
 If your point was about my proposal
It does not matter. There's nothing in your proposal that justifies `@trusted` being the default for extern(C) functions. In fact, the opposite is the case.
Jun 08 2022
forkit <forkit gmail.com> writes:
On Thursday, 9 June 2022 at 00:44:58 UTC, Ali Çehreli wrote:
 .... So, this is what we have currently:

    @safe: Checked with escape hatch

    @trusted: Assumed safe, unchecked

    @system: Assumed unsafe, unchecked

   default: @system

   extern(C): @system
 ....
And people complain about the extra cognitive load associated with my idea of `private(this) int x;`!

All these attributes you mention wanna make my head explode ;-)
Jun 08 2022
zjh <fqbqrr 163.com> writes:
On Thursday, 9 June 2022 at 01:10:04 UTC, forkit wrote:

 All these attributes you mention, wanna make my head explode ;-)
They are very funny. For something obviously simple, they say there is a `cognitive burden` and `trouble`; for this, they said it wasn't that hard.
Jun 08 2022
zjh <fqbqrr 163.com> writes:
On Thursday, 9 June 2022 at 01:35:43 UTC, zjh wrote:

 And `trouble`,They said it wasn't that hard.
We hope to have a `complete, comprehensive and accurate` introduction to `dip1000`; otherwise `no one` can use it correctly!
Jun 08 2022
Paul Backus <snarwin gmail.com> writes:
On Thursday, 9 June 2022 at 00:44:58 UTC, Ali Çehreli wrote:
 I was convinced (after having an email exchange with Walter) 
 that unless we assumed extern(C) functions @safe, then nobody 
 would bother marking their declarations as @trusted one-by-one. 
 And whoever marked them as such, they would do it without 
 actually auditing any source code.
It is actually even worse than this: no matter how diligent you are, it is impossible to audit the source code corresponding to an extern(C) function declaration, even in principle.

The reason for this is that when your code calls an extern(C) function, it is not calling a specific implementation. Rather, it is calling whatever implementation happens to get linked into the final binary when the code is compiled. If someone builds your code 10 years from now on a new system, it may end up calling a C implementation that was not even written yet when you wrote your extern(C) function declaration!

Strictly speaking, this means that you can *never* be absolutely, 100% mathematically sure that an extern(C) function is memory-safe. So if you are aiming for an absolute, 100% mathematically-ironclad guarantee of memory safety, your program cannot call any extern(C) functions, ever, period. Unfortunately, this also means your program cannot make any system calls, so you will probably not get very far attempting to program this way.

The solution is to give up on aiming for an absolute, 100% mathematical guarantee of memory safety under all possible circumstances. Instead, what we aim for in practice is a *conditional* guarantee: "my program is memory safe, so long as assumptions A, B, and C about the external world hold true." Some common assumptions we make are:

* The OS's system call interface behaves as documented.
* The C standard library API behaves as specified in the relevant C standard.
* Other C libraries I depend on behave as documented.

It is important to understand that relying on C libraries to behave according to their documentation is *not* the same thing as marking their functions as @trusted. Many C library functions say explicitly, in their documentation, that they *will* corrupt memory if called in certain ways. These functions must still be marked as @system, even if you are willing to "trust" the libraries to behave as documented.
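A hedged sketch of that last distinction in declaration form (the attributes shown here are illustrative, not druntime's actual annotations):

```d
// sqrt is documented to accept any double value; its interface is safe
// for every possible argument, so @trusted is defensible.
extern(C) @trusted nothrow @nogc double sqrt(double x);

// free is documented to corrupt the heap if handed a pointer it does
// not own; no declaration-site attribute can repair that, so it must
// stay @system.
extern(C) @system nothrow @nogc void free(void* ptr);
```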
Jun 08 2022
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/8/22 8:44 PM, Ali Çehreli wrote:
  > -- except on
  > extern(C) interfaces to C code, which by definition is un-@safe
 
 I see it differently: extern(C) interfaces are @trusted but they can't 
 be checked. (More below.)
 
 I was convinced (after having an email exchange with Walter) that unless 
 we assumed extern(C) functions @safe, then nobody would bother marking 
 their declarations as @trusted one-by-one. And whoever marked them as 
 such, they would do it without actually auditing any source code.
 
 What have we gained by disapproving @safe-by-default? Nothing: the C API 
 would either not be called or be marked blindly as @trusted. I think 
 this is more embarrassing than @safe-by-default C libraries.
 
 So, D's presumed embarrassment of "C functions are assumed @safe" was 
 against both practicality and the truth: The truth is, we indeed "trust" 
 C functions because we use C libraries all the time without reading 
 their source code. This is the definition of trust. And that's why I say 
 we chose wrong names around this topic.
You are missing the point.

```d
extern(C) void *malloc(size_t);
extern(C) void free(void *); // @safe?

void main() // @safe!
{
    auto ptr = malloc(int.sizeof);
    int *v = ((p) @trusted => cast(int*)p)(ptr);
    free(v);
    *v = 5; // @safe??!
}
```

See, you can trust `malloc` and `free` to have *valid* implementations (read: they are memory safe as long as you obey their rules). But they don't obey `@safe` rules. They can't; they aren't written in D, and they aren't checked by D. There is no way to express the invariants that they require, and there certainly isn't a way to *infer* those invariants based on the types of the parameters.

Marking them `@safe` by default is a disaster. It is the complete and utter destruction of memory safety in D. I can't stress this enough. I wish Walter had not brought this up, because I don't think it's fruitful to have this discussion again.

-Steve
Jun 08 2022
Ali Çehreli <acehreli yahoo.com> writes:
On 6/8/22 18:34, Steven Schveighoffer wrote:

 You are missing the point.
Clearly. :) I actually want checked-by-default but don't know how to get it for D code around C library calls. But I think I see it better now. Ali
Jun 08 2022
Walter Bright <newshound2 digitalmars.com> writes:
The point of @safe by default for C declarations was:

1. so that we would not be deluged with complaints about breaking existing code

2. so people would use it

What people *will* do with C unsafe by default is:

1. slap `@trusted:` at the beginning and go on their merry way, and nothing was 
accomplished except annoying people
Jun 08 2022
Adrian Matoga <dlang.spam matoga.info> writes:
On Thursday, 9 June 2022 at 06:53:55 UTC, Walter Bright wrote:
 The point of @safe by default for C declarations was:

 1. so that we would not be deluged with complaints about 
 breaking existing code

 2. so people would use it

 What people *will* do with C unsafe by default is:

 1. slap `@trusted:` at the beginning and go on their merry way, 
 and nothing was accomplished except annoying people
Still, slapping @trusted is explicit and greppable, and so can get flagged by a simple script even before review by a human.

Also, once ImportC becomes the default way of interfacing to C APIs, their memory safety attributes are under compiler control, and circumventing that would require extra effort.
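A hedged sketch of such a script (the `src/` layout and `*.d` glob are assumptions about the codebase):

```shell
# Hedged sketch: surface every @trusted occurrence in a hypothetical src/
# tree as file:line:match output, so a human reviews each one before merge.
grep -rn --include='*.d' '@trusted' src/ || echo "no @trusted found"
```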
Jun 09 2022
Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Thursday, 9 June 2022 at 06:53:55 UTC, Walter Bright wrote:
 The point of @safe by default for C declarations was:

 1. so that we would not be deluged with complaints about 
 breaking existing code

 2. so people would use it

 What people *will* do with C unsafe by default is:

 1. slap `@trusted:` at the beginning and go on their merry way, 
 and nothing was accomplished except annoying people
As said during the infinitely long thread about that in the past, rejecting a slapped `@trusted:` is work for code reviewers, if your company minds safety. But please, not again that discussion.

(That said, if there were a -preview with @safe-as-default but @system extern(C), it would become the default in our codebase.)
Jun 09 2022
John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 9 June 2022 at 06:53:55 UTC, Walter Bright wrote:
 The point of @safe by default for C declarations was:

 1. so that we would not be deluged with complaints about 
 breaking existing code

 2. so people would use it

 What people *will* do with C unsafe by default is:

 1. slap `@trusted:` at the beginning and go on their merry way, 
 and nothing was accomplished except annoying people
That is their fault, and it provides a clear warning sign of where to look to fix the problem and improve safety. Grepping for `@trusted` is the number 1 way to find starting points for memory safety problems (aside from gdb/asan).
Jun 09 2022
Walter Bright <newshound2 digitalmars.com> writes:
On 6/9/2022 5:44 AM, John Colvin wrote:
 That is their fault
If the language encourages it, it's the language's fault.
Jun 13 2022
Timon Gehr <timon.gehr gmx.ch> writes:
On 14.06.22 00:53, Walter Bright wrote:
 On 6/9/2022 5:44 AM, John Colvin wrote:
 That is their fault
If the language encourages it, it's the language's fault.
Exactly. You have literally been arguing in favor of the language doing this exact faulty thing _implicitly by default_. _That's_ what encouragement looks like.
Jun 13 2022
John Colvin <john.loughran.colvin gmail.com> writes:
On Tuesday, 14 June 2022 at 00:16:37 UTC, Timon Gehr wrote:
 On 14.06.22 00:53, Walter Bright wrote:
 On 6/9/2022 5:44 AM, John Colvin wrote:
 That is their fault
If the language encourages it, it's the language's fault.
Exactly. You have literally been arguing in favor of the language doing this exact faulty thing _implicitly by default_. _That's_ what encouragement looks like.
I am normally on team “Walter’s been around the block a few times, he knows what’s up”, but this one just makes no sense at all.

I am working at what is presumably the largest employer of D programmers in the world. I was one of the very first using D there, and I just can’t imagine us ever blaming you for encouraging us to blindly slap @trusted on C bindings when the alternative was the compiler doing it implicitly!
Jun 14 2022
Timon Gehr <timon.gehr gmx.ch> writes:
On 09.06.22 08:53, Walter Bright wrote:
 The point of @safe by default for C declarations was:
 
 1. so that we would not be deluged with complaints about breaking 
 existing code
 ...
It really does not help much with that. In addition, it would slap `@safe` on code that is not actually memory safe and was not intended to be. That's also breakage.
 2. so people would use it
 
 What people *will* do with C unsafe by default is:
 
 1. slap `@trusted:` at the beginning and go on their merry way,
This is not what I will do, but they can of course just do that. It's very visible in code review.
 and nothing was accomplished except annoying people
You are predicting that some people will explicitly do the wrong and lazy thing, hence the compiler should do the wrong and lazy thing implicitly by default. This just makes no sense.

What's the big harm in annoying lazy people slightly more? It's not like they won't complain loudly about `@safe` by default in any case. May as well do it right or not at all.
Jun 09 2022
Walter Bright <newshound2 digitalmars.com> writes:
On 6/9/2022 7:51 AM, Timon Gehr wrote:
 You are predicting that some people will explicitly do the wrong and lazy 
 thing,
My experience is that the vast bulk of people will do the least amount of effort. It's why software is always larded up with technical debt. I do it, too. Yes, sometimes I've used duct tape and baling wire. Anyone who claims they haven't, I don't believe :-)
 hence the compiler should do the wrong and lazy thing implicitly by 
 default. This just makes no sense. What's the big harm in annoying lazy 
 people slightly more? It's not like they won't complain loudly about 
 `@safe` by default in any case.
I'm the recipient of all the complaints that I'm breaking their existing code.
 May as well do it right or not at all.
This entire thread is what happens with "not at all".

At some point all C functions have to be trusted in some form or other because the D compiler has NO way to check them, and neither does the D programmer. Putting `@trusted` on the C declarations accomplishes nothing, it's safety theater.

In druntime, we've gone through many (certainly not all) of the C declarations and appropriately added correct annotations to them. But realistically, this is not scalable.
Jun 09 2022
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/9/22 11:07 PM, Walter Bright wrote:
 On 6/9/2022 7:51 AM, Timon Gehr wrote:
 You are predicting that some people will explicitly do the wrong and 
 lazy thing,
My experience is that the vast bulk of people will do the least amount of effort. It's why software is always larded up with technical debt. I do it, too. Yes, sometimes I've used duct tape and baling wire. Anyone who claims they haven't, I don't believe :-)
In order for safe-by-default extern(C) to actually prevent code breakage, their code:

1. Has to be marked @system or unmarked
2. Has to call C functions that are unmarked
3. Has to all be actually safe code (as checked by the D compiler) that they didn't mark as @safe

1 and 2, I can see being true. 3, not so much. Especially if dip1000 is enabled.

They aren't going to bother with `@trusted:`, they'll just apply `@system:` to *all their code*, not just the extern(C) functions. Which is fine. That's actually the correct way to mark "I don't care about safety", and it's not lying.

Not to mention that we have lots of C modules like this:

https://github.com/dlang/druntime/blob/ae0724769e3808398b3efdaed4ebdb59c676100d/src/core/stdc/stdio.d#L52

So even if you make `extern(C)` @safe by default, most C modules are ALREADY MARKED WITH `@system:`!! Will they not complain when `printf` can't be called from their hello world program? Hey, maybe they'll just declare a new `printf` prototype, because that will now make it @safe!
 At some point all C functions have to be trusted in some form or other 
 because the D compiler has NO way to check them, and neither does the D 
 programmer. Putting `@trusted` on the C declarations accomplishes 
 nothing, it's safety theater.
Then you don't understand what @trusted or @safe means. Marking a C function @safe doesn't mean it's bug free. It means that it *obeys the rules of D @safe*. I trust that the libc authors implemented `free` correctly. I don't trust that they completely disregarded the C spec and made it valid for D @safe.

This means extern(C) calls that take a `char *` must only read at most one byte from that pointer. This means that all array parameters can only have one value read (because a C array is a pointer). This means that memory can never be freed, because it would leave dangling pointers. This means that `size_t` parameters accompanying pointers can't be used to judge how many elements of the pointer to read. This is not an exhaustive list.
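To make that concrete, here's a sketch (the `@safe` annotation here is hypothetical, not what druntime actually does) of what blindly marking a C declaration `@safe` would permit:

```d
// Hypothetical: pretend someone declared C's free as @safe.
// (druntime correctly does NOT do this.)
extern(C) @safe void free(void* ptr);

void main() @safe
{
    int* p = new int;
    *p = 42;
    free(p);    // accepted: we told the compiler free is @safe
    int x = *p; // @safe code now reads freed memory - a dangling pointer
}
```

The compiler can't check the C implementation; it just believes the annotation. That's why such a declaration has to stay @system, or be @trusted only where the signature makes misuse impossible.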
 In druntime, we've gone through many (certainly not all) of the C 
 declarations and appropriately added correct annotations to them. But 
 realistically, this is not scalable.
I am completely lost here. How is it not scalable to go into every libc module of druntime and mark them with `@system:` at the top? I can do it if you want, it probably will take 10 minutes. Most time will be spent searching for already existing `@system:` attributes to skip having to attribute that module.

-Steve
Jun 09 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/9/2022 8:49 PM, Steven Schveighoffer wrote:
 In druntime, we've gone through many (certainly not all) of the C declarations 
 and appropriately added correct annotations to them. But realistically, this 
 is not scalable.
I am completely lost here. How is it not scalable to go into every libc module of druntime and mark them with `@system:` at the top? I can do it if you want, it probably will take 10 minutes. Most time will be spent searching for already existing `@system:` attributes to skip having to attribute that module.
I did say "correct annotations". Granted, just slapping `@system` on them is easier.

My experience is that asking people to make *any* edits to existing files is asking a lot, especially if it is code that is considered tested and working, and especially if it is code they don't have rights to the repository to change.

I'm the one who gets the earful when these changes are needed.
Jun 09 2022
next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Friday, 10 June 2022 at 05:14:21 UTC, Walter Bright wrote:
 My experience is that asking people to make *any* edits to 
 existing files is asking a lot, especially if it is code that 
 is considered tested and working, and especially if it is code 
 they don't have rights to the repository to change.

 I'm the one who gets the earful when these changes are needed.
Just have a compiler option that gives the old behaviour.
Jun 09 2022
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/10/22 1:14 AM, Walter Bright wrote:
 On 6/9/2022 8:49 PM, Steven Schveighoffer wrote:
 In druntime, we've gone through many (certainly not all) of the C 
 declarations and appropriately added correct annotations to them. But 
 realistically, this is not scalable.
I am completely lost here. How is it not scalable to go into every libc module of druntime and mark them with `@system:` at the top? I can do it if you want, it probably will take 10 minutes. Most time will be spent searching for already existing `@system:` attributes to skip having to attribute that module.
I did say "correct annotations". Granted, just slapping `@system` on them is easier.
I'm trying to parse this. You mean to say, there are enough unmarked extern(C) functions inside druntime, that fixing them all *as they come up* is not scalable? That seems unlikely. Note that modules like core.stdc.math have `@trusted:` at the top already.

My point above is that unmarked extern(C) calls are @system now, and marking them as @system will not change anything.

I will volunteer to mark any druntime extern(C) functions within a 2-day turnaround if they are posted on bugzilla and assigned to me. Start with `@system:` at the top, and mark them as the errors occur.
 
 My experience is that asking people to make *any* edits to existing 
 files is asking a lot, especially if it is code that is considered 
 tested and working, and especially if it is code they don't have rights 
 to the repository to change.
 
 I'm the one who gets the earful when these changes are needed.
The vast majority of compiler errors that will come from a safe-by-default DIP are functions that are actually @system and now implicitly get marked @safe -- not because they call unmarked extern(C) functions, but because they do @system things (like casting). If you think the "no-edits" bar is only cleared if extern(C) functions are assumed @safe, you are 100% wrong.

You can point the complaints about extern(C) functions at my ear, and deal with the significant majority of complaints that are about @safe by default D code.

I would love to see a viable safe-by-default DIP get added.

-Steve
Jun 10 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/10/2022 5:46 AM, Steven Schveighoffer wrote:
 I'm trying to parse this. You mean to say, there are enough unmarked extern(C) 
 functions inside druntime, that fixing them all *as they come up* is not 
 scalable? That seems unlikely. Note that modules like core.stdc.math has 
  `@trusted:` at the top already.
It is not scalable for the reasons mentioned. Nobody has ever gone through windows.d to see what to mark things as, and nobody ever will.
 I will volunteer to mark any druntime extern(C) functions within a 2-day 
 turnaround if they are posted on bugzilla and assigned to me. Start with 
  `@system:` at the top, and mark them as the errors occur.
They aren't even done *now*, after 15+ years. See windows.d and all the others.
 If you think the "no-edits" bar is only cleared if 
  extern(C) functions are assumed @safe, you are 100% wrong.
I'm not seeing how I am wrong.
 You can point the complaints about extern(C) functions at my ear, and deal with 
 the significant majority of complaints that are about @safe by default D code.
That's a very nice offer, but that won't change that I get the complaints and people want me to fix it, not brush it off on you.
 I would love to see a viable safe-by-default DIP get added.
At least we can agree on that!
Jun 13 2022
next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Monday, 13 June 2022 at 22:49:53 UTC, Walter Bright wrote:
 On 6/10/2022 5:46 AM, Steven Schveighoffer wrote:
 I would love to see a viable safe-by-default DIP get added.
At least we can agree on that!
I think the best proposal so far has been Adam Ruppe's idea to make safe-by-default something you can opt into at the module level.

https://dpldocs.info/this-week-in-d/Blog.Posted_2020_01_13.html

If we don't want to change the meaning of the existing `@safe:` syntax, we can adopt some new syntax for it (`@safe module foo;`? `default(@safe):`?). As a bonus, Adam's proposal will also give us opt-in "nothrow by default" and "@nogc by default" for free.
Jun 13 2022
parent reply Mike Parker <aldacron gmail.com> writes:
On Monday, 13 June 2022 at 23:44:51 UTC, Paul Backus wrote:

  syntax, we can adopt some new syntax for it (`@safe module 
  foo;`? `default(@safe):`?).
Adding the attributes to the module statement works, but I'd like to see it this way:

```d
module foo default @safe nothrow;
```

The `default` must follow the module name, and the attributes must follow the `default`. Having a single pattern for it increases readability. You don't have to scan the line to pick out the module name, since it's always second, and the `default` visually marks the beginning of the attribute list.
Jun 13 2022
next sibling parent reply forkit <forkit gmail.com> writes:
On Tuesday, 14 June 2022 at 01:55:01 UTC, Mike Parker wrote:

the word `default` seems unnecessary.

how about just:

module foo : @safe, nothrow;
Jun 13 2022
parent Mike Parker <aldacron gmail.com> writes:
On Tuesday, 14 June 2022 at 02:04:58 UTC, forkit wrote:
 On Tuesday, 14 June 2022 at 01:55:01 UTC, Mike Parker wrote:

 the word `default` seems unnecessary.

 how about just:

 module foo : @safe, nothrow;
Yeah, that works.
Jun 13 2022
prev sibling parent Martin B <martin.brzenska googlemail.com> writes:
On Tuesday, 14 June 2022 at 01:55:01 UTC, Mike Parker wrote:
 ```d
  module foo default @safe nothrow;
 ```

 The `default` must follow the module name, and the attributes 
 must follow the `default`. Having a single pattern for it 
 increases readability. You don't have to scan the line to pick 
 out the module name since it's always second, and the `default` 
 visually marks the beginning of the attribute list.
Yes, please!
Jun 14 2022
prev sibling next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 14/06/2022 10:49 AM, Walter Bright wrote:
 On 6/10/2022 5:46 AM, Steven Schveighoffer wrote:
 I would love to see a viable safe-by-default DIP get added.
At least we can agree on that!
I would love to see a PR turn on attribute inference for all functions. Who knows, we might be closer to having this work for most code than we think!
Jun 13 2022
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/13/22 6:49 PM, Walter Bright wrote:
 On 6/10/2022 5:46 AM, Steven Schveighoffer wrote:
 I'm trying to parse this. You mean to say, there are enough unmarked 
 extern(C) functions inside druntime, that fixing them all *as they 
 come up* is not scalable? That seems unlikely. Note that modules like 
 core.stdc.math has  trusted: at the top already.
It is not scalable for the reasons mentioned. Nobody has ever gone through windows.d to see what to mark things as, and nobody ever will.
It's done already. Nearly all the modules have `@system:` at the top. The few I checked that don't are just types, no functions.
 
 
 I will volunteer to mark any druntime extern(C) functions within a 
 2-day turnaround if they are posted on bugzilla and assigned to me. 
 Start with  system: at the top, and mark them as the errors occur.
They aren't even done *now*, after 15+ years. See windows.d and all the others.
They are mostly marked @system, with a smattering of @safe and @trusted.

I'll tell you what, I'll do a *whole file* at a time: `winsock32.d` ...

OK, I did it in less than 10 minutes.

https://github.com/dlang/druntime/pull/3839
 
 
 If you think the "no-edits" bar is only cleared if extern(C) functions 
  are assumed @safe, you are 100% wrong.
I'm not seeing how I am wrong.
```d
import core.stdc.stdio;

void main() @safe
{
    printf("hello world!\n"); // fails
}
```

You are saying that nobody has any unmarked D code that uses `extern(C)` functions that are *already and correctly* marked @system? I'm willing to bet 100% breakage. Not just like 99%, but 100% (as in, a project that has unmarked D code which calls `extern(C)` functions will have at least one compiler error).

Unless... you plan to remark files like core.stdc.stdio as @safe? I hope not.
 You can point the complaints about extern(C) functions at my ear, and 
 deal with the significant majority of complaints that are about  safe 
 by default D code.
That's a very nice offer, but that won't change that I get the complaints and people want me to fix it, not brush it off on you.
It's trivial:

User: hey, this function in core.sys.windows.windows looks like it should be @safe?
Walter: it's probably just that we haven't marked it yet, file a bug and assign it to Steve.
Me: OK, I marked it and all the related functions as @trusted (10 minutes later)

- or -

Me: Sorry, that's not actually safe, please use a @trusted escape.

There are 166 files in core/sys/windows. For each one where someone has a problem, I'll fix it in 10 minutes; that's 1660 minutes, or 28 hours of work (spread out over however long, as people find interfaces that need fixing). Less than a man-week. How does this not scale?

You need to learn to delegate! Especially for library functions, you aren't responsible for all of it!
 I would love to see a viable safe-by-default DIP get added.
At least we can agree on that!
Please, let's make it happen! -Steve
Jun 13 2022
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 6/14/22 04:39, Steven Schveighoffer wrote:
 
  They are mostly marked @system, with a smattering of @safe and @trusted.
 
 I'll tell you what, I'll do a *whole file* at a time `winsock32.d` ...
 
 OK, I did it in less than 10 minutes.
 
 https://github.com/dlang/druntime/pull/3839
There is a post-merge review of that pull request that points out that two of the functions cannot be `@trusted`. It seems in the current version of druntime in DMD master [1], they are still `@trusted`. (I would have commented on the pull request, but it is now archived.)

[1] https://github.com/dlang/dmd/blob/master/druntime/src/core/sys/windows/winsock2.d

I don't know much about Windows sockets, so I am not sure what is the best way to fix this. I guess for `inet_ntoa` we should just remove `@trusted`. For `getprotobynumber`, I am not sure if we should just remove `@trusted` or if it is sufficient to mark the return value `const` (it seems like it might not be: given that it says Windows sockets will return pointers pointing to stuff it has allocated internally, it might also deallocate it internally at a later point?)
Nov 13 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/13/22 3:54 AM, Timon Gehr wrote:
 On 6/14/22 04:39, Steven Schveighoffer wrote:
  They are mostly marked @system, with a smattering of @safe and @trusted.

 I'll tell you what, I'll do a *whole file* at a time `winsock32.d` ...

 OK, I did it in less than 10 minutes.

 https://github.com/dlang/druntime/pull/3839
There is a post-merge review of that pull request that points out that two of the functions cannot be `@trusted`. It seems in the current version of druntime in DMD master [1], they are still `@trusted`. (I would have commented on the pull request, but it is now archived.) [1] https://github.com/dlang/dmd/blob/master/druntime/src/core/sys/windows/winsock2.d I don't know much about Windows sockets, so I am not sure what is the best way to fix this. I guess for `inet_ntoa` we should just remove `@trusted`. For `getprotobynumber`, I am not sure if we should just remove `@trusted` or if it is sufficient to mark the return value `const` (it seems like it might not be: given that it says Windows sockets will return pointers pointing to stuff it has allocated internally, it might also deallocate it internally at a later point?)
Thanks! I didn't notice that review. `getprotobynumber` also states that the "application should copy any information that it needs before issuing any other Windows Sockets function calls" Which suggests the data may not be valid on a second call. In other words, the struct contains e.g. a `char *`. If you copy that *pointer*, it may not be valid upon a second call. When I did the first PR, I did not focus enough on the return values. https://github.com/dlang/dmd/pull/14639 -Steve
Nov 13 2022
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 13.11.22 17:06, Steven Schveighoffer wrote:
 
 When I did the first PR, I did not focus enough on the return values.
 
 https://github.com/dlang/dmd/pull/14639
Thank you! :)
Nov 13 2022
prev sibling next sibling parent mee6 <mee6 lookat.me> writes:
On Friday, 10 June 2022 at 03:07:23 UTC, Walter Bright wrote:
 On 6/9/2022 7:51 AM, Timon Gehr wrote:
 You are predicting that some people will explicitly do the 
  wrong and lazy thing,
My experience is that the vast bulk of people will do the least amount of effort. It's why software is always larded up with technical debt. I do it, too. Yes, sometimes I've used duct tape and baling wire. Anyone who claims they haven't, I don't believe :-)
 hence the compiler should do the wrong and lazy thing 
 implicitly by default. This just makes no sense. What's the 
 big harm in annoying lazy people slightly more? It's not like 
  they won't complain loudly about `@safe` by default in any 
 case.
I'm the recipient of all the complaints that I'm breaking their existing code.
 May as well do it right or not at all.
This entire thread is what happens with "not at all". At some point all C functions have to be trusted in some form or other because the D compiler has NO way to check them, and neither does the D programmer. Putting `@trusted` on the C declarations accomplishes nothing, it's safety theater. In druntime, we've gone through many (certainly not all) of the C declarations and appropriately added correct annotations to them. But realistically, this is not scalable.
That's why it's a mistake to even mark C declarations with @safe or @trusted. Rust treats all C declarations as unsafe. That mistake could have been fixed, at least slightly, by having C/C++ declarations be unsafe by default.

The word used for the `@trusted` keyword is kind of wrong: C code should never be "trusted", it is always unsafe. So when you say at some point all C code is trusted, that statement is just wrong. Think of it as safe and unsafe. You are basically saying all C code is safe, when that's wrong. All C code is unsafe.
Jun 10 2022
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 6/10/22 05:07, Walter Bright wrote:
 
 hence the compiler should do the wrong and lazy thing implicitly by 
 default. This just makes no sense. What's the big harm in annoying 
 lazy people slightly more? It's not like they won't complain loudly 
  about `@safe` by default in any 
I'm the recipient of all the complaints that I'm breaking their existing code. ..
I am aware. But if that's your concern, it kills the DIP on its own. If anything, you'll receive more complaints with the broken behavior, because then nobody is happy, not even safe-by-default advocates.
  > May as well do it right or not at all.
 
 This entire thread is what happens with "not at all".
Well, I contest that. It's not even closely related. This is an issue concerning `@system` code. What's the default has not much at all to do with it.
Jun 10 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/10/2022 4:59 PM, Timon Gehr wrote:
 On 6/10/22 05:07, Walter Bright wrote:
 This entire thread is what happens with "not at all".
Well, I contest that. It's not even closely related. This is an issue concerning `@system` code. What's the default has not much at all to do with it.
Steven is compiling ordinary code with the default, when there is no obvious reason why it should be @system code. @system code should be relatively rare.
Jun 13 2022
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 14.06.22 00:52, Walter Bright wrote:
 On 6/10/2022 4:59 PM, Timon Gehr wrote:
 On 6/10/22 05:07, Walter Bright wrote:
 This entire thread is what happens with "not at all".
Well, I contest that. It's not even closely related. This is an issue concerning `@system` code. What's the default has not much at all to do with it.
Steven is compiling ordinary code with the default, when there is no obvious reason why it should be @system code. ...
The obvious reason why Steven's example should be `@system` code is because that's where he is observing the issue.
  @system code should be relatively rare.
Steven made very clear in the original post that he's aware that this is `@system` code. It's not annotated because `@system` is currently the default.
Jun 13 2022
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/8/2022 10:50 AM, John Colvin wrote:
 The problem is `foo` and whether the compiler should somehow prevent the 
 inconsistency between the signature and implementation. Obviously the answer is 
 “yes, ideally”, but in practice with @safe, @system, dip1000, @live and so on 
 it’s all a mess.
The checks aren't done for @system code. Yes, the compiler believes you for @system code. It's the point of @system code. If foo() is annotated with @safe:

    test6.d(5): Deprecation: scope variable `s` may not be returned

The compiler is working as intended, this is not unexpected behavior.
Jun 08 2022
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09.06.22 01:19, Walter Bright wrote:
 On 6/8/2022 10:50 AM, John Colvin wrote:
 The problem is `foo` and whether the compiler should somehow prevent 
 the inconsistency between the signature and implementation. Obviously 
  the answer is “yes, ideally”, but in practice with @safe, @system, 
  dip1000, @live and so on it’s all a mess.
The checks aren't done for @system code. Yes, the compiler believes you for @system code. It's the point of @system code. If foo() is annotated with @safe:

    test6.d(5): Deprecation: scope variable `s` may not be returned

The compiler is working as intended, this is not unexpected behavior.
Actually it *is* unexpected behavior.

```d
int* foo() @system{
    int x;
    return &x; // error
}

int* foo(ref int x) @system{
    return &x; // error
}

int* foo(scope int* x) @system{
    return x; // ok
}
```

This does not have anything to do with `@safe` by default, it's just an inconsistency in the compiler implementation.
Jun 08 2022
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/8/2022 5:38 PM, Timon Gehr wrote:
  This does not have anything to do with `@safe` by default, it's just an 
  inconsistency in the compiler implementation.
I could make a case for every one of the safety checks being checked in @system code, too. The existing checks you note were there long before @safe/@trusted/@system was added. Them remaining is an artifact of evolution and a wish to support legacy behavior.
Jun 08 2022
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09.06.22 08:46, Walter Bright wrote:
 On 6/8/2022 5:38 PM, Timon Gehr wrote:
  This does not have anything to do with `@safe` by default, it's just 
  an inconsistency in the compiler implementation.
 I could make a case for every one of the safety checks being checked in @system code, too. ...
No. You really would not be able to. The point of `@safe` is to catch everything that's bad. The point of `@system` is to allow everything that makes sense. If the compiler can't figure out if something is fine, it should be an error in `@safe` code and allowed in `@system`. But if the compiler can easily tell that something makes no sense, it should still be an error in both `@safe` and `@system` code!
  The existing checks you note were there long before 
  @safe/@trusted/@system was added. Them remaining is an artifact of 
  evolution and a wish to support legacy behavior.
It's a pity you feel this way. Clearly if something is _always wrong_ it makes sense to have a diagnostic. There is type checking in `@system` code, and that makes sense. E.g., we don't just make accessing an undefined identifier in `@system` code UB, because that would be ridiculous.
Jun 09 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/9/2022 5:58 AM, Timon Gehr wrote:
  But if the compiler can easily tell that something makes no sense, it should 
  still be an error in both `@safe` and `@system` code!
Sometimes it makes sense for a function to return the address of a local. For example, if you want to detect how large the stack has gotten. I use this in, for example, the garbage collector to see how much stack needs to be scanned. It can also be used to "step" on the stack after a function returns, as one might want to do for security software.

I've also done things like write 0xDEADBEEF all over memory in order to flush out memory bugs. This involves using pointers in UB ways that don't make sense as far as the language is concerned. In @safe code it is nonsense to write specific numbers into a pointer. But in @system code, it does make sense.

I don't think one could write a symbolic debugger with @safe code. Like writing instruction bytes into a buffer, and then calling it? How unsafe can one get? :-)

And so on.
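The stack-measurement trick looks something like this (a sketch; the function name is made up, and the result is only meaningful on the usual grows-downward stacks):

```d
// Estimate how much stack has been consumed since `anchor` was taken.
// Taking the address of a local like this is exactly the kind of thing
// that only makes sense in @system code.
size_t stackUsedSince(const(void)* anchor) @system
{
    byte marker;
    // On a downward-growing stack, deeper frames have smaller addresses.
    return cast(size_t)anchor - cast(size_t)&marker;
}

void main() @system
{
    byte anchor;
    import std.stdio : writeln;
    writeln("approximate stack bytes in use: ", stackUsedSince(&anchor));
}
```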
Jun 09 2022
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 6/10/22 05:15, Walter Bright wrote:
 On 6/9/2022 5:58 AM, Timon Gehr wrote:
  But if the compiler can easily tell that something makes no sense, it 
  should still be an error in both `@safe` and `@system` code!
 Sometimes it makes sense for a function to return the address of a local. For example, if you want to detect how large the stack has gotten. I use this in, for example, the garbage collector to see how much stack needs to be scanned. It can also be used to "step" on the stack after a function returns, as one might want to do for security software. I've also done things like write 0xDEADBEEF all over memory in order to flush out memory bugs. This involves using pointers in UB ways that don't make sense as far as the language is concerned. In @safe code it is nonsense to write specific numbers into a pointer. But in @system code, it does make sense. I don't think one could write a symbolic debugger with @safe code. Like writing instruction bytes into a buffer, and then calling it? How unsafe can one get? :-) And so on.
Well, you can do that knowing what the backend will or will not do with code that the spec says could mean anything at all. Others may be a bit less privileged. ;) If it's needed, I think it's better to have explicit support for such use cases, not involving any UB.
Jun 10 2022
prev sibling parent reply Dennis <dkorpel gmail.com> writes:
On Thursday, 9 June 2022 at 00:38:13 UTC, Timon Gehr wrote:
 ```d
  int* foo() @system{
     int x;
     return &x; // error
 }

  int* foo(ref int x) @system{
     return &x; // error
 }

  int* foo(scope int* x) @system{
     return x; // ok
 }
 ```

  This does not have anything to do with `@safe` by default, it's 
  just an inconsistency in the compiler implementation.
I noticed this as well, and as of https://github.com/dlang/dmd/pull/14107 the `&ref` escape is treated the same as returning a scope pointer (an error in @safe code only). Returning &local directly is still an error in @system code; that error predates @safe and dip1000.
Jun 09 2022
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09.06.22 14:45, Dennis wrote:
 On Thursday, 9 June 2022 at 00:38:13 UTC, Timon Gehr wrote:
 ```d
  int* foo() @system{
     int x;
     return &x; // error
 }

  int* foo(ref int x) @system{
     return &x; // error
 }

  int* foo(scope int* x) @system{
     return x; // ok
 }
 ```

  This does not have anything to do with `@safe` by default, it's just 
  an inconsistency in the compiler implementation.
I noticed this as well, and as of https://github.com/dlang/dmd/pull/14107 the `&ref` escape is treated the same as returning a scope pointer (error in @safe code only). Returning &local directly is still an error in @system code, that error predates @safe and dip1000.
Well, not a big fan. This is the wrong way around.
Jun 09 2022
parent Dennis <dkorpel gmail.com> writes:
On Thursday, 9 June 2022 at 13:00:01 UTC, Timon Gehr wrote:
 Well, not a big fan. This is the wrong way around.
It was needed to avoid breaking existing code, which is sometimes annotated incorrectly because of compiler bugs and the `return ref scope` ambiguity issue. Once the dip1000-by-default transition has progressed and people have corrected the `return ref` / `return scope` annotations in their code, I think it can become an error in `@system` code again.
Jun 09 2022
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 8 June 2022 at 17:50:18 UTC, John Colvin wrote:
 The compiler is going “you told me `foo` doesn’t leak 
 references to the string passed to it, I believe you. Based on 
 that, this temporary array is safe to put on the stack”. I 
 think it’s reasonable for the compiler to lean on `scope` like 
 this.

 The problem is `foo` and whether the compiler should somehow 
 prevent the inconsistency between the signature and 
 implementation. Obviously the answer is “yes, ideally”, but in 
 practice with @safe, @system, dip1000, @live and so on it’s all 
 a mess.
So I gave it some time, and I think I am now convinced that doing this optimization is simply not a good idea.

If the value stays on the stack - which is all that DIP1000 can check for anyway - then a modern backend can track it. LLVM, for instance, will annotate function parameters to indicate whether they escape, and does so recursively through the call graph. LDC is already able to do stack promotion when escape analysis proves something doesn't escape.

This is WAY preferable because:
- It works regardless of annotations from the dev.
- It is always correct; it will not fubar existing @system code.
- Inlining is likely to uncover more opportunities to do this; there is no point doing it before.

Doing this type of optimization to explicitly free elements on the heap is worth it. But DIP1000 doesn't allow tracking this reliably.
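For illustration, the backend-driven approach looks something like this (a sketch of the idea, not actual LDC output or source):

```d
// No `scope` annotation anywhere. After inlining `sum` into `caller`,
// a backend with escape analysis can prove the literal never escapes
// and may promote its allocation from the GC heap to the stack - and
// it simply stays on the heap if the proof fails, so it's always correct.
int sum(const int[] a)
{
    int total = 0;
    foreach (x; a)
        total += x;
    return total;
}

int caller()
{
    return sum([1, 2, 3]); // candidate for stack promotion
}
```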
Jun 13 2022
next sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Monday, 13 June 2022 at 11:14:36 UTC, deadalnix wrote:
 On Wednesday, 8 June 2022 at 17:50:18 UTC, John Colvin wrote:
 The compiler is going “you told me `foo` doesn’t leak 
 references to the string passed to it, I believe you. Based on 
 that, this temporary array is safe to put on the stack”. I 
 think it’s reasonable for the compiler to lean on `scope` like 
 this.

 The problem is `foo` and whether the compiler should somehow 
 prevent the inconsistency between the signature and 
 implementation. Obviously the answer is “yes, ideally”, but in 
  practice with @safe, @system, dip1000, @live and so on it’s 
 all a mess.
So I gave it some time, and I think I am now convinced that doing this optimization is simply not a good idea. If the value stays on the stack - which is all that DIP1000 can check for anyway - then a modern backend can track it. LLVM, for instance, will annotate function parameters to indicate whether they escape, and does so recursively through the call graph. LDC is already able to do stack promotion when escape analysis proves something doesn't escape. This is WAY preferable because: - It works regardless of annotations from the dev. - It is always correct; it will not fubar existing @system code. - Inlining is likely to uncover more opportunities to do this; there is no point doing it before. Doing this type of optimization to explicitly free elements on the heap is worth it. But DIP1000 doesn't allow tracking this reliably.
Without expressing an opinion either way, I want to note that this has implications for `@nogc`. If you remove `scope` from the parameter of `foo`, the compiler won't let `main` be `@nogc` due to the slice literal allocation.

    string foo(scope string s) @nogc {
        return s;
    }

    void main() @nogc {
        foo(['a']);
    }
Jun 13 2022
parent reply deadalnix <deadalnix gmail.com> writes:
On Monday, 13 June 2022 at 12:02:31 UTC, John Colvin wrote:
 Without expressing an opinion either way, I want to note that 
 this has implications for `@nogc`. If you remove `scope` from 
 the parameter of `foo`, the compiler won't let `main` be 
 `@nogc` due to the slice literal allocation.
 
     string foo(scope string s) @nogc {
         return s;
     }
 
     void main() @nogc {
         foo(['a']);
     }
It's as if @nogc should track leaks and not allocations...
Jun 13 2022
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/13/22 9:25 AM, deadalnix wrote:
 On Monday, 13 June 2022 at 12:02:31 UTC, John Colvin wrote:
 Without expressing an opinion either way, I want to note that this 
 has implications for `@nogc`. If you remove `scope` from the 
 parameter of `foo`, the compiler won't let `main` be `@nogc` due to 
 the slice literal allocation.
 
     string foo(scope string s) @nogc {
         return s;
     }
 
     void main() @nogc {
         foo(['a']);
     }
 
 It's as if @nogc should track leaks and not allocations...
Wouldn't that be a leak if `['a']` was GC allocated?

-Steve
Jun 13 2022
prev sibling next sibling parent Nick Treleaven <nick geany.org> writes:
On Monday, 13 June 2022 at 11:14:36 UTC, deadalnix wrote:
 If the value stays on stack - which is all that DIP1000 can 
 check for anyways, then modern backend can track it. LLVM for 
 instance, will annotate function parameter to indicate if they 
 escape or not and do so recursively through the callgraph.
LLVM might not have the library source code.
 LDC is already able to do stack promotion when escape analysis 
 proves something doesn't escape.

 This is WAY preferable because:
  - It works regardless of annotations from the dev.
So does scope inference (when it's done).
  - It is always correct, it will not fubar existing @system 
 code.
I'm not sure why someone would have written `scope` in @system code unless they didn't understand it's only checked in @safe code. The stack promotion optimization is the only reason to use it in @system code, AIUI.
  - Inlining is likely to uncover more opportunity to do this, 
 there is no point doing it before.
Then the LLVM optimization and the `scope` optimization are both needed for the best code in all cases.
 Doing this type of optimization to explicitly free elements on 
 heap is worth it. But DIP1000 doesn't allow to track this 
 reliably.
Please can you provide an example? (There's one Timon came up with in your scope thread recently; Walter's working on that.)
Jun 13 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/13/2022 4:14 AM, deadalnix wrote:
 On Wednesday, 8 June 2022 at 17:50:18 UTC, John Colvin wrote:
 The compiler is going “you told me `foo` doesn’t leak references to 
 the string passed to it, I believe you. Based on that, this 
 temporary array is safe to put on the stack”. I think it’s 
 reasonable for the compiler to lean on `scope` like this.
 
 The problem is `foo` and whether the compiler should somehow prevent 
 the inconsistency between the signature and implementation. 
 Obviously the answer is “yes, ideally”, but in practice with @safe, 
 @system, dip1000, @live and so on it’s all a mess.
 So I gave it some time, and I think I am now convinced that doing 
 this optimization is simply not a good idea.
 
 If the value stays on stack - which is all that DIP1000 can check 
 for anyways, then modern backend can track it. LLVM for instance, 
 will annotate function parameter to indicate if they escape or not 
 and do so recursively through the callgraph. LDC is already able to 
 do stack promotion when escape analysis proves something doesn't 
 escape.
 
 This is WAY preferable because:
  - It works regardless of annotations from the dev.
  - It is always correct, it will not fubar existing @system code.
  - Inlining is likely to uncover more opportunity to do this, there 
    is no point doing it before.
 
 Doing this type of optimization to explicitly free elements on heap 
 is worth it.
The D compiler *does* keep track of this when it does attribute inference. Attribute inference is not currently done for regular functions because of the risk of mismatch between a function declaration and a function definition that may arise because of inference.
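For illustration, a minimal sketch (not from the thread) of where that inference already applies today: templated functions get their attributes derived from the body, while regular functions keep only what they declare.

```d
// Attributes of a template are inferred from its body; here the
// compiler can prove @safe pure nothrow @nogc on its own.
auto twice(T)(T x) { return x + x; }

void main() @safe nothrow @nogc
{
    // Accepted only because inference proved the attributes; a plain
    // `int twice(int x)` with no annotations would be rejected here.
    auto x = twice(21);
    assert(x == 42);
}
```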
 But DIP1000 doesn't allow to track this reliably.
Yes, it does. Any instances where it doesn't are a bug and are fixed.
Jun 13 2022
parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Monday, 13 June 2022 at 22:56:11 UTC, Walter Bright wrote:
 The D compiler *does* keep track of this when it does attribute 
 inference. Attribute inference is not currently done for 
 regular functions because of the risk of mismatch between a 
 function declaration and a function definition that may arise 
 because of inference.
It'd cause a linker error though, right? This isn't really much different from any other mismatched header, and since we have dmd -H, it seems manageable.

I'll write a blog about this when I have time though. Been swamped with work lately, but I do think there are some cases we have to be careful of; still, it should be doable.
Jun 13 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/13/2022 6:32 PM, Adam D Ruppe wrote:
 On Monday, 13 June 2022 at 22:56:11 UTC, Walter Bright wrote:
 The D compiler *does* keep track of this when it does attribute inference. 
 Attribute inference is not currently done for regular functions because of the 
 risk of mismatch between a function declaration and a function definition that 
 may arise because of inference.
It'd cause a linker error though, right? This isn't really much different than any other mismatched header - and since we have dmd -H, it seems manageable. I'll write a blog about this when I have time though. Been swamped with work lately but I do think there's some cases we have to be careful of but it should be doable.
Even with a linker error, there'd be a constant maintenance problem for the user.
Jun 14 2022
parent reply Adam Ruppe <destructionator gmail.com> writes:
On Tuesday, 14 June 2022 at 23:41:55 UTC, Walter Bright wrote:
 Even with a linker error there'd be constant maintenance 
 problem for the user.
What maintenance problem? When a library changes, it recompiles and generates its new interface file (if it even uses interface files; they're extremely rare in real world D code). If the library hasn't changed, there's no need to update anything, inference or not. This is near zero work.
Jun 14 2022
parent reply Dukc <ajieskola gmail.com> writes:
On Tuesday, 14 June 2022 at 23:51:45 UTC, Adam Ruppe wrote:
 On Tuesday, 14 June 2022 at 23:41:55 UTC, Walter Bright wrote:
 Even with a linker error there'd be constant maintenance 
 problem for the user.
 What maintenance problem? When a library changes, it recompiles and 
 generates its new interface file (if it even uses interface files; 
 they're extremely rare in real world D code). If the library hasn't 
 changed, there's no need to update anything, inference or not. This 
 is near zero work.
I'm assuming you're arguing for an always-on function attribute inference, at least for `scope` and `return scope`. Please no!

This would be outright poison for a stable API or ABI of a library, probably including ARSD. All your public functions would have attributes added to them under your nose, and then you could not change them without breaking client code, because it depends on those attributes you did not intend to add.

Even in a top-level application it happens every now and then that I intentionally want to disable some attributes on a function. Maybe to hunt for some bug, maybe for DBI or testing purposes. Now I can do that with `void nonScopeArgument(int*){}`. With universal inference I'd have to resort to error-prone hacks like

```d
void nonScopeArgument(int* arg)
{
    static int* dummy;
    if(false) dummy = arg;
}
```

No thanks.
Jun 15 2022
next sibling parent Arafel <er.krali gmail.com> writes:
On 15.06.22 12:49, Dukc wrote:
 This would be outright poison for a stable API or ABI of a library, 
 probably including ARSD. All your public functions have attributes added 
 to them under your nose, and then you cannot change them without 
 breaking client code because it depends on those attributes you did not 
 intend to add.
This. For this to work, there would need to be some way of saying that, for instance, you don't want your function inferred `@nogc`, even if it might be in its current implementation (which, btw, can be just returning `false` or throwing "Not implemented yet").

Why? Because you want the actual interface to leave open the possibility of needing to allocate later, and if your library isn't overall `@nogc`, why would you want to lock just one function as `@nogc`?
Jun 15 2022
prev sibling parent Adam D Ruppe <destructionator gmail.com> writes:
On Wednesday, 15 June 2022 at 10:49:13 UTC, Dukc wrote:
 This would be outright poison for a stable API or ABI of a 
 library, probably including ARSD. All your public functions 
 have attributes added to them under your nose, and then you 
 cannot change them without breaking client code because it 
 depends on those attributes you did not intend to add.
My view of this (which I need to write about in more detail in the blog, but I have 1,000 more lines of code to translate for work..... by tomorrow...... yeah im not gonna make the deadline but still wanna get as close as i can, so blog on hold for a bit) is that the explicit attributes are all that'd count for breakage, and the inferred ones are specifically subject to change at any time; people shouldn't rely on them. The auto-generated .di file might list them, but the documentation absolutely would not.

I suppose some users might want to statically guarantee they are only using the stable published interface instead of the unstable inferred one, and that would be a bit tricky to declare, given the attributes are part of the ABI too. That's one of the things that needs more consideration.
 Even in a top-level application it happens every now and then 
 that I intentionally want to disable some attributes from a 
 function.
It is obvious to me that everything that can be turned on needs to be able to be turned off. tbh it is one of the biggest embarrassments of the attribute soup process we have right now that there STILL isn't a `@gc` or `impure` or whatever after all these years. Even the lowest hanging fruit out of this mess is in limbo.
Jun 15 2022
prev sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
This compiles and prints hello.

```d
string foo(return in string s) @safe
{
     return s;
}
```

So the problem here is something to do with `-preview=in` and the 
`return` detection of DIP1000.
Jun 08 2022
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/8/22 11:58 AM, John Colvin wrote:
 
 The preview switch is changing the meaning of `in` which changes the 
 signature of `foo` (which is then inconsistent with the implementation), 
 which in turn will affect the call sites. This seems roughly as 
 expected, no?
I guess it is! I always thought `in` meant `const scope`, but apparently it no longer does unless you add `-preview=in`. However, I will note that just using `-preview=in` does not cause it to print the `o` strings; only when paired with dip1000 does it do that.

Still, I would think a warning, even for `@system` code, is warranted. Especially since the compiler is already able to figure this out for `@safe` code.

The reality is that people are (mostly) only dealing with the existing implementation, until it stops working. But to silently break it, and silently break it with *memory corruption*, does not seem an appropriate penalty.

-Steve
Jun 08 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/8/2022 9:22 AM, Steven Schveighoffer wrote:
 The reality is that people are (mostly) only dealing with the existing 
 implementation, until it stops working. But to silently break it, and silently 
 break it with *memory corruption* does not seem an appropriate penalty.
The point of @system code is people can do whatever they want. The answer is to make @safe the default. But that DIP was rejected.
Jun 08 2022
parent Paulo Pinto <pjmlp progtools.org> writes:
On Wednesday, 8 June 2022 at 23:24:35 UTC, Walter Bright wrote:
 On 6/8/2022 9:22 AM, Steven Schveighoffer wrote:
 The reality is that people are (mostly) only dealing with the 
 existing implementation, until it stops working. But to 
 silently break it, and silently break it with *memory 
 corruption* does not seem an appropriate penalty.
 The point of @system code is people can do whatever they want. The 
 answer is to make @safe the default. But that DIP was rejected.
Because, contrary to Ada, .NET, Rust, Go, Swift, ..., it considered calling into C "safe".
Jun 09 2022
prev sibling parent Dukc <ajieskola gmail.com> writes:
On Wednesday, 8 June 2022 at 15:35:56 UTC, Steven Schveighoffer 
wrote:
 So silently changing behavior to create new dangling pointers 
 with a preview switch is ok?

 Remember, there is already code that does this. It's not trying 
 to be clever via scope, it's not trying to be `@safe`, it's 
 expecting that an array literal is allocated on the GC (as has 
 always been the case).
This is one of the reasons why all code should endeavour to be `@safe` wherever possible. I believe C and C++ code often has the same problem: accidentally relying on undefined behaviour that then changes later. D in `@system` or `@trusted` is fundamentally no different, even if it sometimes tries to make footguns harder to make.

Alas, I do agree that most of us use `@system` way too much, and thus changes like this always trip us, even when they theoretically should not. But I can't see a good way to avoid that. We could in principle try to avoid UB changes until `@safe` has become more widespread, but since we are people, I suspect the habits won't change before we are kicked often enough :(.
Jun 08 2022
prev sibling next sibling parent reply Mathias LANG <pro.mathias.lang gmail.com> writes:
On Wednesday, 8 June 2022 at 14:52:53 UTC, Steven Schveighoffer 
wrote:
 ```d
 string foo(in string s)
 {
     return s;
 }

 void main()
 {
     import std.stdio;
     string[] result;
     foreach(c; "hello")
     {
         result ~= foo([c]);
     }
     writeln(result);
 }
 ```
This has nothing to do with `-preview=in`. Change `foo`'s signature to:
 string foo(scope string s)
And you'll see the bug, even without `-preview=dip1000`.

Why is this happening? As you correctly guessed, it's because the frontend wrongfully lets the `string` go on the stack instead of allocating it on the GC heap.

Some of the changes for DIP1000 made it into releases even without the switch; that's one example.
Jun 08 2022
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 8 June 2022 at 17:09:49 UTC, Mathias LANG wrote:
 And you'll see the bug, even without `-preview=dip1000`.

 Why is this happening ? You correctly guessed, because the 
 frontend wrongfully lets the `string` go on the stack instead 
 of allocating with it.

 Some of the changes for DIP1000 made it to releases even 
 without the switch, that's one example.
No, promoting the array on the stack is not sufficient to explain the behavior - though it is certainly part of it. The compiler is going out of its way in some other way to break the code.
Jun 08 2022
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 6/8/22 19:22, deadalnix wrote:
 On Wednesday, 8 June 2022 at 17:09:49 UTC, Mathias LANG wrote:
 And you'll see the bug, even without `-preview=dip1000`.

 Why is this happening ? You correctly guessed, because the frontend 
 wrongfully lets the `string` go on the stack instead of allocating 
 with it.
 ...
Your code is literally calling this function:

```d
string foo(scope string s){ return s; }
```

This causes UB, therefore you can't blame the compiler frontend here. I guess you can complain about the language specification, but what else are you expecting `scope` to do? There could be some more diagnostics I guess, like for the case where a stack variable is escaped directly.
 Some of the changes for DIP1000 made it to releases even without the 
 switch, that's one example.
 No, promoting the array on stack is not sufficient to explain the 
 behavior - thought it is certainly part of it. The compiler is going 
 out of his way in some other way to break the code.
It's reusing the same location on the stack for all instances of `[c]`. I think that's a pretty complete and straightforward explanation of the behavior. What is missing?

Anyway, this kind of issue is why one should never rely on undefined behavior giving a specific result; the compiler may get smart about it later.
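A rough model of that reuse (a hypothetical sketch, not the actual codegen): pretend the stack slot for `[c]` is a single one-char buffer that every appended slice aliases.

```d
import std.stdio;

void main()
{
    char[1] buf;          // stand-in for the single reused stack slot
    string[] result;
    foreach (c; "hello")
    {
        buf[0] = c;       // each iteration overwrites the same slot
        // every appended slice points at `buf`, just like the
        // stack-allocated `[c]` in the original example
        result ~= cast(string) buf[];
    }
    writeln(result);      // all five slices now show the final 'o'
}
```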
Jun 08 2022
next sibling parent reply 12345swordy <alexanderheistermann gmail.com> writes:
On Wednesday, 8 June 2022 at 18:32:41 UTC, Timon Gehr wrote:
 On 6/8/22 19:22, deadalnix wrote:
 On Wednesday, 8 June 2022 at 17:09:49 UTC, Mathias LANG wrote:
 And you'll see the bug, even without `-preview=dip1000`.

 Why is this happening ? You correctly guessed, because the 
 frontend wrongfully lets the `string` go on the stack instead 
 of allocating with it.
 ...
 Your code is literally calling this function:
 
 ```d
 string foo(scope string s){ return s; }
 ```
 
 This causes UB, therefore you can't blame the compiler frontend here.
I've got to say here, you shouldn't be able to compile that code at all if it is going to shoot you in the foot unintentionally.

- Alex
Jun 08 2022
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 6/8/22 20:44, 12345swordy wrote:
 On Wednesday, 8 June 2022 at 18:32:41 UTC, Timon Gehr wrote:
 On 6/8/22 19:22, deadalnix wrote:
 On Wednesday, 8 June 2022 at 17:09:49 UTC, Mathias LANG wrote:
 And you'll see the bug, even without `-preview=dip1000`.

 Why is this happening ? You correctly guessed, because the frontend 
 wrongfully lets the `string` go on the stack instead of allocating 
 with it.
 ...
 Your code is literally calling this function:
 
 ```d
 string foo(scope string s){ return s; }
 ```
 
 This causes UB, therefore you can't blame the compiler frontend here.
I got to say here, you shouldn't be able to compile that code at all if it is going to shoot you in the foot unintentionally. - Alex
Well, I agree that in simple cases like this one, the compiler should just complain. In general though, it won't understand what's going on. If you want to catch everything, you'll have to use @safe, but that will also reject some things that are actually fine.
Jun 08 2022
prev sibling parent reply Meta <jared771 gmail.com> writes:
On Wednesday, 8 June 2022 at 18:44:28 UTC, 12345swordy wrote:
 On Wednesday, 8 June 2022 at 18:32:41 UTC, Timon Gehr wrote:
 On 6/8/22 19:22, deadalnix wrote:
 On Wednesday, 8 June 2022 at 17:09:49 UTC, Mathias LANG wrote:
 And you'll see the bug, even without `-preview=dip1000`.

 Why is this happening ? You correctly guessed, because the 
 frontend wrongfully lets the `string` go on the stack 
 instead of allocating with it.
 ...
 Your code is literally calling this function:
 
 ```d
 string foo(scope string s){ return s; }
 ```
 
 This causes UB, therefore you can't blame the compiler frontend here.
 I got to say here, you shouldn't be able to compile that code at all 
 if it is going to shoot you in the foot unintentionally.
 
 - Alex
I believe this is because `foo` is not annotated with @safe, thus it's @system by default and you're allowed to do all kinds of unsafe things. Mark it @safe and the compiler will correctly complain:

```d
@safe string foo(in string s)
{
    return s; // Error: scope variable `s` may not be returned
}

void main()
{
    import std.stdio;
    string[] result;
    foreach(c; "hello")
    {
        result ~= foo([c]);
    }
    writeln(result);
}
```

In addition, changing `in` to `const return scope` makes the compiler aware that you intend to return the value, and thus it seems to somehow know not to re-use that stack space, and correctly prints `["h", "e", "l", "l", "o"]`.
Jun 08 2022
parent reply 12345swordy <alexanderheistermann gmail.com> writes:
On Wednesday, 8 June 2022 at 19:07:00 UTC, Meta wrote:
 On Wednesday, 8 June 2022 at 18:44:28 UTC, 12345swordy wrote:
 On Wednesday, 8 June 2022 at 18:32:41 UTC, Timon Gehr wrote:
 [...]
 I got to say here, you shouldn't be able to compile that code at all 
 if it is going to shoot you in the foot unintentionally.
 
 - Alex
 I believe this is because foo is not annotated with @safe, thus it's 
 @system by default and you're allowed to do all kinds of unsafe 
 things. Mark it @safe and the compiler will correctly complain:
 
 ```d
 @safe string foo(in string s)
 {
     return s; // Error: scope variable `s` may not be returned
 }
 
 void main()
 {
     import std.stdio;
     string[] result;
     foreach(c; "hello")
     {
         result ~= foo([c]);
     }
     writeln(result);
 }
 ```
 
 In addition, changing `in` to `const return scope` makes the 
 compiler aware that you intend to return the value, and thus it 
 seems to somehow know not to re-use that stack space, and correctly 
 prints ["h", "e", "l", "l", "o"].
You shouldn't have to mark your functions @safe to prevent shooting yourself in the foot. It should give a warning message that can be suppressed by explicitly marking your function as @system.

-Alex
Jun 08 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/8/2022 12:33 PM, 12345swordy wrote:
 You shouldn't have to mark your functions @safe to prevent shooting yourself in 
 the foot.
I agree, but the DIP to make functions @safe by default was rejected.
Jun 08 2022
parent John Colvin <john.loughran.colvin gmail.com> writes:
On Wednesday, 8 June 2022 at 23:29:53 UTC, Walter Bright wrote:
 On 6/8/2022 12:33 PM, 12345swordy wrote:
 You shouldn't have to mark your functions @safe to prevent 
 shooting yourself in the foot.
 I agree, but the DIP to make functions @safe by default was rejected.
Because it contained “assume all C code is @safe unless explicitly marked otherwise”, which is pretty wild!
Jun 09 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/8/2022 11:32 AM, Timon Gehr wrote:
 Your code is literally calling this function:
 
 ```d
 string foo(scope string s){ return s; }
 ```
 
 This causes UB, therefore you can't blame the compiler frontend here. I guess 
 you can complain about the language specification, but what else are you 
 expecting `scope` to do? There could be some more diagnostics I 
 guess, like for the case where a stack variable is escaped directly.
Annotating `foo()` with `scope` yields:

    test6.d(5): Deprecation: scope variable `s` may not be returned
Jun 08 2022
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/8/22 1:09 PM, Mathias LANG wrote:
 On Wednesday, 8 June 2022 at 14:52:53 UTC, Steven Schveighoffer wrote:
 ```d
 string foo(in string s)
 {
     return s;
 }

 void main()
 {
     import std.stdio;
     string[] result;
     foreach(c; "hello")
     {
         result ~= foo([c]);
     }
     writeln(result);
 }
 ```
This has nothing to do with `-preview=in`. Change `foo`'s signature to:
 string foo(scope string s)
And you'll see the bug, even without `-preview=dip1000`.
Yes, it has been noted by John. But for some reason, this specific code doesn't fail with `-preview=dip1000` or `-preview=in` alone, only when both are specified.

Apparently `in` under `-preview=in` doesn't really mean the same thing as `scope const`. So does it have nothing to do with `-preview=in`? Simple experimentation says it does.

Note that `scope` arrays started being allocated on the stack in 2.092.0, coincidentally the same release that added `-preview=in`.
 Why is this happening ? You correctly guessed, because the frontend 
 wrongfully lets the `string` go on the stack instead of allocating with it.
Whether it's right or wrong, it's a change that silently introduces memory corruption. It ought to produce a warning for such code.

I'm not blaming anybody here for behavior that is probably correct per the spec. But is there no mechanism to be had for warning about this? D already doesn't allow returning a pointer to stack data, even in `@system` code. Doesn't this also qualify?

-Steve
Jun 08 2022
next sibling parent reply Mathias LANG <pro.mathias.lang gmail.com> writes:
On Wednesday, 8 June 2022 at 19:02:53 UTC, Steven Schveighoffer 
wrote:
 But for some reason, this specific code doesn't fail with 
 `-preview=dip1000` or `-preview=in`, but only when both are 
 specified.

 Apparently `in` under preview in doesn't really mean the same 
 thing as `scope const`. So does it have nothing to do with 
 preview in? simple experimentation says it does.
Well, the real bug in this code lies in `scope` + the array being allocated on the stack. Cherry-picking the instance where it happens with `-preview=in` is just pointing the finger at a subset of the problem, which is probably where you first encountered the issue?

You unfortunately are at the weird intersection of a few design decisions made by a few different people. Story time:

When DIP1000 was first introduced, there was a lot of discussion to make sure we had it behind a switch. So `-dip1000` was born. Later on, we introduced the `-preview` / `-revert` / `-transition` trifecta as more work was being done on potentially breaking features.

Now the decision that Walter made, and that I, honestly, still do not understand to this day, was to make `in` mean simply `const`. The rationale was that it would "break too much code", and if I'm not mistaken, that people expected `scope` to do nothing (with the exception of delegates / class alloc), but changing the semantics of `in` would have too much of an impact. The DIP1000 changes being behind a preview switch, IMO, it shouldn't have been a problem.

Later on, there was a push to get rid of most, if not all, usages of `in`, especially in druntime C bindings where they were favored. In parallel, Atila submitted a PR for a new `-preview` switch which would make `in` have its actual meaning (dear old `-preview=inMeansConstRef`). I once again expressed my (unfavorable) opinion on the topic (https://github.com/dlang/dmd/pull/10769#issuecomment-583229694).

Now, that's where I come in. I implemented `-preview=in` as you see it today. But there was ONE constraint I had to keep when I did so: `-preview=in` should also make `in` act as `scope`. That's how you end up with: https://github.com/dlang/dmd/blob/7e1b115a0c62c04b74fecfed8a220d5e31cf4fe0/src/dmd/mtype.d#L4397-L4399

Does it make sense? I don't think so. If it was up to me, `in` would always mean `scope`. Should we change it now?
If we couldn't do it when `scope` had *no* effect unless `-dip1000` was used, I don't see how we could now that `scope` does have an effect even without `-dip1000`.
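To collect the states described above in one place (a summary of this thread's discussion, not the language spec):

```d
// void f(in string s);
//
// no switches:     `in` ≈ `const`        (the scope part is dropped)
// -preview=in:     `in` ≈ `const scope`  (plus possible pass-by-ref)
// -preview=in -preview=dip1000:
//     the implied `scope` additionally participates in DIP1000
//     analysis, e.g. letting array-literal arguments be
//     stack-allocated at call sites
```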
 Whether it's right or wrong, it's a change that silently 
 introduces memory corruption. It ought to produce a warning for 
 such code. I'm not blaming anybody here for behavior that is 
 probably correct per the spec. But is there no mechanism to be 
 had for warning about this? D already doesn't allow returning a 
 pointer to stack data, even in `@system` code. Doesn't this 
 also qualify?
I have argued with Walter for a long time that having `scope` enforced only in `@safe` code was a grave mistake, and would be extremely confusing. Enforcing it in all code would have avoided this situation.
Jun 08 2022
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/8/2022 12:59 PM, Mathias LANG wrote:
 I have argued with Walter for a long time that having `scope` enforced only in 
 `@safe` code was a grave mistake, and would be extremely confusing. It would 
 have avoided this situation.
The idea is that @safe should be the default, and so @system code would be rare and would stick out like a sore thumb. In @system code, maintaining the invariants of the function parameters is entirely up to the programmer.
Jun 08 2022
parent reply Mathias LANG <pro.mathias.lang gmail.com> writes:
On Wednesday, 8 June 2022 at 23:35:03 UTC, Walter Bright wrote:
 On 6/8/2022 12:59 PM, Mathias LANG wrote:
 I have argued with Walter for a long time that having `scope` 
 enforced only in `@safe` code was a grave mistake, and would 
 be extremely confusing. It would have avoided this situation.
 The idea is that @safe should be the default and so @system code 
 would be rare and would stick out like a sore thumb. In @system 
 code, maintaining the invariants of the function parameters is 
 entirely up to the programmer.
`@safe` by default is only viable if `@safe` has minimal to no friction. This isn't the case *at all*. We have examples in the standard library itself.

It's also departing from D's identity as a systems programming language which makes it trivial to interop with C / C++, alienating a sizeable portion of our community in the process.
Jun 08 2022
next sibling parent reply forkit <forkit gmail.com> writes:
On Thursday, 9 June 2022 at 00:04:07 UTC, Mathias LANG wrote:
 `@safe` by default is only viable if `@safe` has minimum to no 
 friction.
 This isn't the case *at all*. We have examples in the standard 
 library itself.

 It's also departing from D's identity as a system programming 
 language which makes it trivial to interop with C / C++, 
 alienating a sizeable portion of our community in the process.
It does seem to me that there is a greater move towards memory safety 'by default', and not away from it. Rust has demonstrated that this is not antithetical to systems programming. Unsafe is always available when it's required.

But yes, certainly, @safe needs a lot more work for it to ever be considered as a default in D. I have it as default in every module I create. I usually end up having to comment it out, even for simple things.

But IMO, in programming, one should be moving towards being explicit about being unsafe, rather than about being safe. Those who are truly alienated by such an idea have C/C++ ;-)
Jun 08 2022
parent Paulo Pinto <pjmlp progtools.org> writes:
On Thursday, 9 June 2022 at 00:34:51 UTC, forkit wrote:
 On Thursday, 9 June 2022 at 00:04:07 UTC, Mathias LANG wrote:
 `@safe` by default is only viable if `@safe` has minimum to no 
 friction.
 This isn't the case *at all*. We have examples in the standard 
 library itself.

 It's also departing from D's identity as a system programming 
 language which makes it trivial to interop with C / C++, 
 alienating a sizeable portion of our community in the process.
 It does seem to me, that there is greater move towards memory safety 
 'by default', and not away from it. Rust has demonstrated, that this 
 is not antithetical to systems programming. Unsafe is always 
 available when it's required.
 
 But yes, certainly, @safe needs a lot more work for it to ever be 
 considered as a default in D. I have it as default in every module I 
 create. I usually have to end up commenting it out, even for simple 
 things.
 
 But IMO, in programming, one should be moving towards being explicit 
 about being unsafe, rather than being safe. Those who are truly 
 alienated by such an idea, have C/C++ ;-)
Fun historical fact: the first languages to have unsafe code blocks date back to the early 60's and late 50's, JOVIAL and ESPOL (superseded by NEWP). NEWP is still used in production on ClearPath MCP mainframes sold by Unisys, and I wouldn't be surprised if the US Army still had some stuff running JOVIAL.
Jun 09 2022
prev sibling next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Thursday, 9 June 2022 at 00:04:07 UTC, Mathias LANG wrote:
 On Wednesday, 8 June 2022 at 23:35:03 UTC, Walter Bright wrote:
 On 6/8/2022 12:59 PM, Mathias LANG wrote:
 I have argued with Walter for a long time that having `scope` 
 enforced only in `@safe` code was a grave mistake, and would 
 be extremely confusing. It would have avoided this situation.
 The idea is that @safe should be the default, and so @system code would be rare and would stick out like a sore thumb. In @system code, maintaining the invariants of the function parameters is entirely up to the programmer.
 `@safe` by default is only viable if `@safe` has minimum to no friction. This isn't the case *at all*. We have examples in the standard library itself.

 It's also departing from D's identity as a system programming language which makes it trivial to interop with C / C++, alienating a sizeable portion of our community in the process.
Swift, Ada, Modula-2, Modula-3, Rust - to pick only examples of languages that have an acknowledged identity as systems programming languages - had no issue considering calls into C and C++ unsafe. Three of them have more users in 2022 than D can ever aspire to on current trends.
Jun 09 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/9/2022 12:20 AM, Paulo Pinto wrote:
 Swift, Ada, Modula-2, Modula-3, Rust, to pick only examples of languages that 
 have an acknowledged identity as systems programming languages, had no issue 
 considering calls into C and C++ unsafe.
There isn't a conversion problem when the language starts out as safe.
Jun 14 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/8/2022 5:04 PM, Mathias LANG wrote:
 `@safe` by default is only viable if `@safe` has minimum to no friction.
 This isn't the case *at all*. We have examples in the standard library itself.
I've converted a lot of old C code to D. @safe errors crop up regularly, but in nearly every case they are trivial to fix.
 It's also departing from D's identity as a system programming language which 
 makes it trivial to interop with C / C++, alienating a sizeable portion of our 
 community in the process.
On the other hand, people want their code to be more reliable. Buffer overflows
Jun 13 2022
prev sibling next sibling parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 8 June 2022 at 19:02:53 UTC, Steven Schveighoffer 
wrote:
 D already doesn't allow returning a pointer to stack data, even 
 in `@system` code. Doesn't this also qualify? 

 -Steve
D does allow it, just not directly:

```D
// error
int* escape(int arg) { return &arg; }

// ok
int* escape2(int arg)
{
    auto temp = &arg;
    return temp;
}
```

The philosophy is to force to be explicit in the simplest cases that are almost always errors, but not trying to seriously prevent the escape.
Jun 08 2022
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/8/22 4:49 PM, Dukc wrote:

 The philosophy is to force to be explicit in the simplest cases that are 
 almost always errors, but not trying to seriously prevent the escape.
This is the same case. You are returning what is a reference to a scope (local) variable, directly.

Yes, I know this doesn't mean it catches everything.

There is also something to be said about dip1000 not allowing this even for @system code, when it can prove imminent memory corruption. @system code should not set footguns all over the floor; they should be on a back shelf, where you have to go and grab one in order to fire southward.

-Steve
Jun 08 2022
parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 8 June 2022 at 21:00:25 UTC, Steven Schveighoffer 
wrote:
 On 6/8/22 4:49 PM, Dukc wrote:

 The philosophy is to force to be explicit in the simplest 
 cases that are almost always errors, but not trying to 
 seriously prevent the escape.
 This is the same case. You are returning what is a reference to a scope (local) variable, directly.
I suppose you're right; it would be good if that were an error. In fact, I'm surprised the compiler does not silently add `return`. I did complain about the silent adding of `return` in a fairly similar function here: https://forum.dlang.org/thread/edtbjavjzkwogvutxpho@forum.dlang.org

I did initially advocate for compiling without adding `return`, but as your example demonstrates, an error would be better.
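For reference, here is roughly what the fixed-up signature would look like if written out by hand. This is only a sketch, and my reading of the previews: under `-preview=in -preview=dip1000`, marking the parameter `return` says the argument may escape through the return value, so the caller can no longer place the array literal on the stack:

```D
// Sketch, assuming -preview=in and -preview=dip1000: `return` documents
// that `s` may be returned, so the argument must outlive the call.
string foo(return in string s)
{
    return s;
}

void main()
{
    string[] result;
    foreach (c; "hello")
        result ~= foo([c]); // the literal can no longer be stack-placed
    assert(result == ["h", "e", "l", "l", "o"]);
}
```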
Jun 08 2022
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/8/2022 3:03 PM, Dukc wrote:
 In fact, I'm surprised the compiler does not silently add `return`.
D does not do attribute inference for regular functions, because of the risk of a mismatch between a function's definition (with body) and a declaration (without body) of the same function.
Jun 08 2022
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/8/2022 1:49 PM, Dukc wrote:
 The philosophy is to force to be explicit in the simplest cases that are almost
 always errors, but not trying to seriously prevent the escape.
Yup. That simple check is also what the original D1 compiler did (and what C and C++ compilers do). Making `scope` work to plug all the holes was a vast increase in technology.

`scope` engenders a lot of complaints, too, because it is ruthless and demanding and annoys people with the annotations.

Fortunately, D gives you a choice - use @safe and the compiler will kick in and demand that the code stays in its lane. Or use @system, and do whatevs, and the onus is on you to be careful with that saw as the blade guards are removed.
Jun 08 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/8/2022 12:02 PM, Steven Schveighoffer wrote:
 But for some reason,
The reason is it's @system code, where it's on the programmer. @safe layers on a vast smorgasbord of extra checking.
Jun 08 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/8/22 7:31 PM, Walter Bright wrote:
 On 6/8/2022 12:02 PM, Steven Schveighoffer wrote:
 But for some reason,
 The reason is it's @system code, where it's on the programmer. @safe layers on a vast smorgasbord of extra checking.
I'll respond basically to all your points here.

1. Yes, I get that this is @system code, and it appears that returning scope data in @system code is obviously subject to memory corruption. For some reason, while you can't return a pointer to a local, you can return a scope pointer.

2. The programmer is *not* expecting this. They did not write `scope`, they wrote `in`, which according to the spec is "equivalent to const" (see https://dlang.org/spec/function.html#in-params). I'm convinced that we *absolutely cannot* turn on preview `in` by default until this is addressed. I can't even recommend using the preview switch, as this is too dangerous for memory safety.

3. The @safe-by-default DIP (as everyone else has mentioned) was great, except for extern(C) functions. I believe a vast majority wanted it without that poison pill.

-Steve
Jun 08 2022
next sibling parent reply Mathias LANG <pro.mathias.lang gmail.com> writes:
On Thursday, 9 June 2022 at 01:18:30 UTC, Steven Schveighoffer 
wrote:
 2. The programmer is *not* expecting this. They did not write 
 `scope`, they wrote `in`, which according to the spec is 
 "equivalent to const" (see 
 https://dlang.org/spec/function.html#in-params). I'm convinced 
 that we *absolutely cannot* turn on preview in by default until 
 this is addressed. I can't even recommend using the preview 
 switch, as this is too dangerous for memory safety.
According to the spec, `in` means `const scope` when you are using `-preview=in`.
Jun 08 2022
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/8/22 9:27 PM, Mathias LANG wrote:
 On Thursday, 9 June 2022 at 01:18:30 UTC, Steven Schveighoffer wrote:
 2. The programmer is *not* expecting this. They did not write `scope`, 
 they wrote `in`, which according to the spec is "equivalent to const" 
 (see https://dlang.org/spec/function.html#in-params). I'm convinced 
 that we *absolutely cannot* turn on preview in by default until this 
 is addressed. I can't even recommend using the preview switch, as this 
 is too dangerous for memory safety.
According to the spec, `in` means `const scope` when you are using `-preview=in`.
Existing code likely does not use -preview=in. -Steve
Jun 08 2022
prev sibling parent reply Dennis <dkorpel gmail.com> writes:
On Thursday, 9 June 2022 at 01:18:30 UTC, Steven Schveighoffer 
wrote:
 For some reason, while you can't return a pointer to a local, 
 you can return a scope pointer.
A pointer to a local is guaranteed to be a dangling pointer when you return it, while a `scope` pointer is not guaranteed to be memory with limited lifetime when you return it. `scope` is only a conservative compile-time approximation of what's actually happening, which makes it susceptible to false positives:

```D
int* f(int x) @safe {
    int* p = &x; // p is inferred scope here
    p = new int; // p is no longer pointing to stack memory
    return p;    // Error: scope variable `p` may not be returned
}
```

This function could be permitted as @system or @trusted code.
Jun 09 2022
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09.06.22 16:46, Dennis wrote:
 On Thursday, 9 June 2022 at 01:18:30 UTC, Steven Schveighoffer wrote:
 For some reason, while you can't return a pointer to a local, you can 
 return a scope pointer.
 A pointer to a local is guaranteed to be a dangling pointer when you return it, while a `scope` pointer is not guaranteed to be memory with limited lifetime when you return it. `scope` is only a conservative compile-time approximation of what's actually happening, which makes it susceptible to false positives:

 ```D
 int* f(int x) @safe {
     int* p = &x; // p is inferred scope here
     p = new int; // p is no longer pointing to stack memory
     return p;    // Error: scope variable `p` may not be returned
 }
 ```

 This function could be permitted as @system or @trusted code.
Sure, and it should be. But the example was this:

```d
int* foo(scope int* s){ return s; }
```

There is no upside to allowing this `scope` annotation.
Jun 09 2022
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 9 June 2022 at 15:23:35 UTC, Timon Gehr wrote:
 [snip]

 Sure, and it should be. But the example was this:

 ```d
 int* foo(scope int* s){ return s; }
 ```

 There is no upside to allowing this `scope` annotation.
Am I right that the reason to only issue the error in @safe code is that the compiler can only be 100% sure that it is scope in @safe code?

What about making it so that in @system code it does the checks the best that it can? In other words, if the compiler can verify that you aren't using scope properly in @system code, then it gives an error as if it were @safe, but if it can't verify it, then the current behavior with -preview=in occurs (I understand some risk of undefined behavior). I suppose the problem with that is that the user may not know when scope suddenly stops actually checking correctly in @system code.

The other alternative would be that scope works regardless of @safe/@system code and fails to compile if the compiler can't verify it properly. That makes it a bit harder to use though, which I imagine is why they didn't go that way.
Jun 09 2022
parent jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 9 June 2022 at 15:38:10 UTC, jmh530 wrote:
 On Thursday, 9 June 2022 at 15:23:35 UTC, Timon Gehr wrote:
 [snip]
 There is no upside to allowing this `scope` annotation.
[snip]
Below is also unintuitive, as the compiler has all the information needed to verify that the reference is escaping. Also, since the escape analysis occurs only in @safe functions, it is assumed that no escape happens in the @trusted one called by the @safe one.

The situation reminds me a little of `restrict` in C. `restrict` tells the compiler to make certain assumptions about the code, but it is up to the programmer to ensure that those assumptions are upheld.

```d
int* a;

@safe void foo(scope int* x) {
    //a = x; //error
    bar(x);
}

@trusted void bar(scope int* x) {
    a = x;
}

void main() {
    int x = 1;
    foo(&x);
    assert(*a == 1);
}
```
Jun 09 2022
prev sibling parent reply Dennis <dkorpel gmail.com> writes:
On Thursday, 9 June 2022 at 15:23:35 UTC, Timon Gehr wrote:
 There is no upside to allowing this `scope` annotation.
You could call that function `assumeNonScope` and use it to bypass lifetime errors that are false positives.
Jun 09 2022
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 6/9/22 20:46, Dennis wrote:
 On Thursday, 9 June 2022 at 15:23:35 UTC, Timon Gehr wrote:
 There is no upside to allowing this `scope` annotation.
You could call that function `assumeNonScope` and use it to bypass lifetime errors that are false positives.
In exchange for apparently inviting UB. Not a great trade-off.
Jun 09 2022
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/9/22 10:46 AM, Dennis wrote:
 On Thursday, 9 June 2022 at 01:18:30 UTC, Steven Schveighoffer wrote:
 For some reason, while you can't return a pointer to a local, you can 
 return a scope pointer.
 A pointer to a local is guaranteed to be a dangling pointer when you return it, while a `scope` pointer is not guaranteed to be memory with limited lifetime when you return it. `scope` is only a conservative compile-time approximation of what's actually happening, which makes it susceptible to false positives:

 ```D
 int* f(int x) @safe {
     int* p = &x; // p is inferred scope here
     p = new int; // p is no longer pointing to stack memory
     return p;    // Error: scope variable `p` may not be returned
 }
 ```

 This function could be permitted as @system or @trusted code.
I want to stress that I'm actually OK with the current situation as far as returning scope pointers as non-scope pointers in @system code. I've returned &this quite a bit in my code.

What is not OK is the compiler turning actual requests for GC allocation into stack allocations based on that. At this point, it's a literal, but if it did this for e.g. `new T`, we are in for a lot of trouble.

I'll ask: is it undefined behavior to return a scope pointer as a non-scope pointer? If so, should we make UB so easy to do?

This obscure interaction between the attributes, the weird relationship between the scope-ness of the return value (you can't label the return value as scope; you have to instead label the parameter as `return`), and the odd choices the compiler makes are going to lead to insanely hard-to-find memory corruption problems.

In any case, I filed a bugzilla issue: https://issues.dlang.org/show_bug.cgi?id=23175

-Steve
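To spell out the annotation relationship I mean (a sketch of my own, assuming `-preview=dip1000`; the function name is made up): there is no way to mark the return type itself as `scope`; instead the parameter carries `return scope`, and the compiler ties the result's lifetime to the argument at each call site:

```D
// Sketch assuming -preview=dip1000. The `return scope` annotation on the
// parameter is what expresses "the returned pointer may alias `p`".
int* identity(return scope int* p) @safe
{
    return p; // allowed, because the parameter is marked `return`
}

void main() @safe
{
    int x = 42;
    scope int* q = identity(&x); // q is scoped to x's lifetime
    assert(*q == 42);
}
```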
Jun 09 2022
next sibling parent Nick Treleaven <nick geany.org> writes:
On Thursday, 9 June 2022 at 15:30:47 UTC, Steven Schveighoffer 
wrote:
 What is not OK is the compiler turning actual requests for GC 
 allocation into stack allocations based on that. At this point, 
 it's a literal, but if it did this for e.g. `new T`, we are in 
 for a lot of trouble.
scope has to have a strong guarantee to be meaningful. Why use scope in @system code if it was ignored by the compiler? The optimization should be allowed.
 I'll ask, is it undefined behavior to return a scope pointer as 
 a non-scope pointer? If so, should we make UB so easy to do?
We shouldn't, but we can't detect all cases of wrong use of scope in general in @system code. We do need a warning in the scope docs though. ...
 In any case, I filed a bugzilla issue: 
 https://issues.dlang.org/show_bug.cgi?id=23175
Thanks for finding this. I think the problem is changing the meaning of `in` in @system code. If `in` is to mean scope too, then a deprecation period is needed to weed out any uses of `in` in @system code. (`in` meaning scope too in @safe code is fine.)
Jun 09 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/9/2022 8:30 AM, Steven Schveighoffer wrote:
 What is not OK is the compiler turning actual requests for GC allocation into 
 stack allocations based on that.
It's necessary for D generated code to be competitive.
Jun 13 2022
parent reply forkit <forkit gmail.com> writes:
On Monday, 13 June 2022 at 23:59:56 UTC, Walter Bright wrote:
 On 6/9/2022 8:30 AM, Steven Schveighoffer wrote:
 What is not OK is the compiler turning actual requests for GC 
 allocation into stack allocations based on that.
It's necessary for D generated code to be competitive.
As I work at a much higher level, conceptually, I've always accepted the proposition that stack allocation is much faster than heap allocation. But I am curious: how does one, these days, compare the performance of stack allocation vs heap allocation?

Are you also factoring de-allocation into your argument (for competitively generated code)? In which case, if there is a lot of spare memory, how does one compare the speed of stack de-allocation to GC de-allocation (since the more spare memory, the less need to GC-deallocate)?

I am surprised, if it's true, that modern hardware still can't provide heap-based allocation at least as fast as stack-based allocation. Of course, I have no idea - it's not my area, by a long shot - I'm just curious whether it's really true these days, or whether it's just something we accept to be true because it used to be true.
Jun 13 2022
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 14/06/2022 1:45 PM, forkit wrote:
 I am surprised, if it's true, that modern hardware still can't provide 
 heap based allocation at least as fast as stack based allocation. Of 
 course, I have no idea - it's not my area, by a long shot - I'm just 
 curious whether its really true, these days, or it's just something we 
 accept to be true, cause it used to be true?
There is no such thing as hardware-assisted memory allocation.

Memory allocators were data structures in 1980, and they still are today. The only difference is that sbrk has now been deprecated by the POSIX standard.

https://www.amazon.com/Structure-Techniques-Addison-Wesley-computer-science/dp/0201072564
https://www.amazon.com/Garbage-Collection-Handbook-Management-Algorithms/dp/0367659247
Jun 13 2022
parent reply forkit <forkit gmail.com> writes:
On Tuesday, 14 June 2022 at 01:54:03 UTC, rikki cattermole wrote:
 There is no such thing as hardware assisted memory allocation.

 Memory allocators were data structures in 1980, and they still 
 are today.
If I have a lot of spare memory, I cannot see the advantage of stack allocation over heap allocation. i.e., the advantages of stack allocation (over heap allocation):

- no GC pauses
- better memory utilisation

These advantages become less relevant when plenty of spare memory is available, true?

In this situation, the only advantage stack allocation would have over heap allocation is that stack allocation is somehow faster than heap allocation. But what would be the basis (evidence) for such an assertion? Do we know this assertion to be true?
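One way to put a rough number on it (only a sketch of mine, not a rigorous benchmark: `benchmark` from `std.datetime.stopwatch` is the real Phobos helper, the two workloads are made up, and results depend heavily on machine, compiler, and flags):

```D
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

int sink; // global sink so the optimiser can't discard the work entirely

void onStack()
{
    int[16] buf;            // stack allocation: a stack-pointer adjustment
    buf[0] = 1;
    sink += buf[0];
}

void onHeap()
{
    auto buf = new int[16]; // heap allocation: a call into the GC runtime
    buf[0] = 1;
    sink += buf[0];
}

void main()
{
    auto times = benchmark!(onStack, onHeap)(100_000);
    writeln("stack: ", times[0], "  heap: ", times[1]);
}
```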
Jun 13 2022
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
All heap memory allocations are expensive.

I cannot emphasize this enough.

Stack allocation costs one instruction, an add. That's it. Heap 
allocation cannot compete.
Jun 13 2022
next sibling parent zjh <fqbqrr 163.com> writes:
On Tuesday, 14 June 2022 at 02:35:22 UTC, rikki cattermole wrote:
 All heap memory allocations are expensive.

 I cannot emphasize this enough.

 Stack allocation costs one instruction, an add. That's it. Heap 
 allocation cannot compete.
I don't even use `new`!
Jun 13 2022
prev sibling parent reply forkit <forkit gmail.com> writes:
On Tuesday, 14 June 2022 at 02:35:22 UTC, rikki cattermole wrote:
 All heap memory allocations are expensive.

 I cannot emphasize this enough.

 Stack allocation costs one instruction, an add. That's it. Heap 
 allocation cannot compete.
ok. just in terms of 'allocation', for an int, for example:

https://d.godbolt.org/z/Y6E668hEn

it's a difference of 4 instructions (with stack having the lesser amount).

then the question I have is: how much faster is it for *my* cpu to do those 6 instructions vs those 10 instructions? that seems difficult to accurately determine - as a constant ;-)

So I presume the assertion that stack 'allocation' is faster than heap 'allocation' is based purely on the basis that there are 'a few less' instructions involved in stack allocation.
Jun 13 2022
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 14/06/2022 4:24 PM, forkit wrote:
 So I presume the assertion that stack 'allocation' is faster than heap 
 'allocation', is based purely on the basis that there are 'a few less' 
 instructions involved in stack allocation.
No. The stack allocation uses a single mov instruction, which is as cheap as you can get in terms of instructions.

You are comparing that against a function call which uses linked lists, atomics, locks and syscalls. All of which are pretty darn expensive individually, let alone together.

These two things are in very different categories of costs. One is practically free; the other is measurable.
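To make those categories concrete in code (my own sketch; the comments describe typical codegen, which of course varies with compiler and flags):

```D
// Typical stack case: the space for `buf` is carved out of the frame on
// entry (folded into the usual stack-pointer adjustment) and released as
// part of the return - no runtime call at all.
int stackCase()
{
    int[4] buf;
    buf[0] = 1;
    return buf[0];
}

// Typical heap case: `new` lowers to a call into the GC allocator, which
// consults free lists, may take locks, and may even run a collection
// before it can hand back a pointer.
int heapCase()
{
    auto buf = new int[4];
    buf[0] = 1;
    return buf[0];
}
```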
Jun 13 2022
parent reply forkit <forkit gmail.com> writes:
On Tuesday, 14 June 2022 at 04:40:44 UTC, rikki cattermole wrote:
 No.

 The stack allocation uses a single mov instruction which is as 
 cheap as you can get in terms of instructions.

 You are comparing that against a function call which uses 
 linked lists, atomics, locks and syscalls. All of which are 
 pretty darn expensive individually let alone together.

 These two things are in very different categories of costs.

 One is practically free, the other is measurable.
How do I explain this result in godbolt (using the -O parameter to ldc2):

https://d.godbolt.org/z/hhT8MPesv

If I understand the output correctly (and it's possible I don't), then it's telling me that there is no difference, in terms of the number of instructions needed, to allocate an int on the stack vs allocating it on the heap - no difference whatsoever.

I don't get it. Is the outcome peculiar to just this simple example?
Jun 14 2022
parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 14 June 2022 at 07:17:39 UTC, forkit wrote:
 On Tuesday, 14 June 2022 at 04:40:44 UTC, rikki cattermole 
 wrote:
 [...]
 How do I explain this result in godbolt (using the -O parameter to ldc2): https://d.godbolt.org/z/hhT8MPesv

 [...]

 Is the outcome peculiar to just this simple example?
The compiler removed it, as you aren't using it; naturally it looks the same.

Now tell the compiler not to be so smart about helping you:

https://d.godbolt.org/z/fY14oz86E
Jun 14 2022
parent reply forkit <forkit gmail.com> writes:
On Tuesday, 14 June 2022 at 07:34:29 UTC, Paulo Pinto wrote:
 The compiler removed it, as you aren't using it,  naturally it 
 looks the same.

 Now tell the compiler not to be smart helping you,

 https://d.godbolt.org/z/fY14oz86E
I expect the compiler to be smart in helping me ;-)

In yet another example, the argument that 'less instructions are needed to allocate on the heap vs the stack' does not hold up to further scrutiny (at least in this optimised example).

https://d.godbolt.org/z/E85hWrocM

Presumably, the more complex the type of allocation being requested, the more difficult it is for the optimiser to optimise that request, and so you do in fact end up with more instructions for heap allocation. That is the only explanation I can come up with ;-)

But I leave further analysis to another thread ;-)
Jun 14 2022
next sibling parent Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 14 June 2022 at 09:23:36 UTC, forkit wrote:
 On Tuesday, 14 June 2022 at 07:34:29 UTC, Paulo Pinto wrote:
 [...]
 I expect the compiler to be smart in helping me ;-) In yet another example, the argument that 'less instructions are needed to allocate on the heap vs the stack' does not hold up to further scrutiny (at least in this optimised example). https://d.godbolt.org/z/E85hWrocM [...]
Now you asked for too little help; with -O3 it is the same:

https://d.godbolt.org/z/M41EKvdT1
Jun 14 2022
prev sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 14/06/2022 9:23 PM, forkit wrote:
 In yet another example, the argument that 'less instructions are needed 
 to allocate on the heap vs the stack', does not hold up to further 
 scrutiny (at least in this optimised example).
It still holds up. In both assembly outputs, no heap allocation took place. The compiler optimized it down to a register.
Jun 14 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/9/2022 7:46 AM, Dennis wrote:
 A pointer to a local is guaranteed to be a dangling pointer when you return it, while a `scope` pointer is not guaranteed to be memory with limited lifetime when you return it. `scope` is only a conservative compile-time approximation of what's actually happening, which makes it susceptible to false positives:

 ```D
 int* f(int x) @safe {
      int* p = &x; // p is inferred scope here
      p = new int; // p is no longer pointing to stack memory
      return p;    // Error: scope variable `p` may not be returned
 }
 ```

 This function could be permitted as @system or @trusted code.
I suggest there is little point to permitting it, as good style would expect that a different variable be used for each purpose, rather than "recycling" an existing variable. I.e.:

```D
int* f(int x) @safe {
    int* p = &x;
    int* q = new int;
    return q;
}
```
Jun 13 2022
parent Dennis <dkorpel gmail.com> writes:
On Monday, 13 June 2022 at 23:58:26 UTC, Walter Bright wrote:
 I suggest there is little point to permitting it, as good style 
 would expect that a different variable be used for each 
 purpose, rather than "recycling" an existing variable.
I chose the example for its simplicity, not for its good style. The point remains that `scope` checking has false positives, and it does crop up in real code. Look for example at [this refactoring that had to be done in Phobos](https://github.com/dlang/phobos/pull/8116/files), because `tempCString` deals with pointers to either stack-allocated or heap-allocated memory based on the length of a run-time string.

That being said, I wouldn't be against eventually doing `scope` checks in `@system` code as long as there's still some kind of escape hatch.
Jun 14 2022
prev sibling parent reply Dukc <ajieskola gmail.com> writes:
On Wednesday, 8 June 2022 at 14:52:53 UTC, Steven Schveighoffer 
wrote:
 ```d
 string foo(in string s)
 {
     return s;
 }

 void main()
 {
     import std.stdio;
     string[] result;
     foreach(c; "hello")
     {
         result ~= foo([c]);
     }
     writeln(result);
 }
 ```

 With no previews, preview=dip1000, or preview=in, this outputs: 
 `["h", "e", "l", "l", "o"]`

 With both preview=dip1000 and preview=in, this outputs: `["o", 
 "o", "o", "o", "o"]`

 What is happening is the compiler is somehow convinced that it 
 can allocate the array literal on the stack (and overwrites 
 that literal each loop).

 I know this isn't `@safe` code, but `@system` code shouldn't be 
 made less safe by the preview switches!

 I know people write `in` instead of `const` all the time 
 *simply because it's shorter*.

 Thoughts?

 -Steve
Sorry to wake up an old thread, but I have a request. May I use this as an example of what can go wrong in `@system` code for the "Memory safety in modern systems programming language" series on the D blog? And if so, is a direct link here okay?
Nov 12 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/12/22 2:35 PM, Dukc wrote:
 
 Sorry to wake up an old thread, but I have a request. May I use this as 
 an example of what can go wrong in `@system` code for the "Memory safety 
 in modern systems programming language" series on the D blog? And if so, is 
 a direct link here okay?
Of course! Feel free to link to anything I've said on these forums. -Steve
Nov 12 2022
parent Dukc <ajieskola gmail.com> writes:
On Saturday, 12 November 2022 at 22:09:45 UTC, Steven 
Schveighoffer wrote:
 On 11/12/22 2:35 PM, Dukc wrote:
 
 Sorry to wake up an old thread, but I have a request. May I 
 use this as an example of what can go wrong in `@system` code 
 for the "Memory safety in modern systems programming language" 
 series in the D blog? And if, is a direct link here okay?
Of course! Feel free to link to anything I've said on these forums.
Thanks!
Nov 13 2022