digitalmars.D - D defined behavior

reply Luís Marques <luis luismarques.eu> writes:
I've lost track. Does D have undefined behavior at all? (e.g. 
outside @safe code).

In any case, I imagine that the D spec (ha!) would dictate that 
the following code should print 0x10001, although that's not 
quite clear.

GDC (optimized) prints 0x1.
LDC2 prints 0x10001, but I suspect only because LLVM doesn't 
happen to do the same optimization, not because this is handled 
properly.

```
import std.stdio;

@safe:

int foo(int* a, int* b) {
     *a = 1;
     *b = 1;
     return *a;
}

void main() {
     ubyte[6] buffer;
     auto a = cast(int[]) (buffer[0 .. 4]);
     auto b = cast(int[]) (buffer[2 .. 6]);
     writefln("0x%X", foo(&a[0], &b[0]));
}
```
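
For reference, a minimal sketch of where the expected 0x10001 comes from, assuming a little-endian target and that the two stores simply land in memory in program order. The helpers `storeLE` and `loadLE` are hypothetical names introduced for illustration; they spell out the byte-level effect of `*a = 1; *b = 1; return *a;` without creating any overlapping `int*`:

```d
import std.stdio;

// Hypothetical helper: store a 32-bit value into four bytes,
// least significant byte first (little-endian layout).
void storeLE(scope ubyte[] dst, uint value) @safe
{
    foreach (i; 0 .. 4)
        dst[i] = cast(ubyte)(value >> (8 * i));
}

// Hypothetical helper: read four bytes back as a 32-bit value,
// least significant byte first.
uint loadLE(scope const(ubyte)[] src) @safe
{
    uint value = 0;
    foreach (i; 0 .. 4)
        value |= cast(uint) src[i] << (8 * i);
    return value;
}

void main() @safe
{
    ubyte[6] buffer;
    storeLE(buffer[0 .. 4], 1); // corresponds to *a = 1
    storeLE(buffer[2 .. 6], 1); // corresponds to *b = 1
    // buffer is now [1, 0, 1, 0, 0, 0]
    writefln("0x%X", loadLE(buffer[0 .. 4])); // corresponds to return *a; prints 0x10001
}
```

The second store overwrites bytes 2 and 3 of the first value, so reading bytes 0 to 3 back gives 0x00010001. A result of 0x1 means the compiler returned the value it had just stored through `a` without reloading it.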
Apr 27 2020
next sibling parent reply Johan <j j.nl> writes:
On Monday, 27 April 2020 at 14:07:49 UTC, Luís Marques wrote:
 I've lost track. Does D have undefined behavior at all? (e.g. 
 outside @safe code).
Yes, D has UB.
 In any case, I imagine that the D spec (ha!) would dictate that 
 the following code should print 0x10001, although that's not 
 quite clear.

 GDC (optimized) prints 0x1.
 LDC2 prints 0x10001, but I suspect only because LLVM doesn't 
 happen to do the same optimization, not because this is handled 
 properly.
I think you mean to ask whether D defines writing to partially overlapping objects, or whether it is defined to write an `int` to memory locations that were typed as `char`.

LDC assumes `a` and `b` may alias partially, and therefore `a` must be reloaded in the return statement. LDC does not pass type-based alias information (TBAA, if you are interested) to the optimizer, which Clang _would_ do; it would be nice if someone could check me on this:
https://llvm.org/docs/LangRef.html#pointer-aliasing-rules

Still, Clang does not optimize the code as it probably could, the way GCC does. My guess is that GDC inherits the C or C++ type-based aliasing rules here.

-Johan
Apr 27 2020
next sibling parent Luís Marques <luis luismarques.eu> writes:
On Monday, 27 April 2020 at 14:57:09 UTC, Johan wrote:
 I think you mean to ask whether D defines writing to partially 
 overlapping objects. Or if it is defined to write a type `int` 
 to memory locations that were typed as `char`.
I left it purposefully vague because it wasn't clear what question should be asked at that level (and because I'm *really* short on time at the moment, sorry). For instance, another possible question is whether it's valid for @safe code to create pointers to unaligned values. Disallowing that could potentially provide another avenue for addressing issues like this, I guess.
 LDC assumes `a` and `b` may alias partially, and therefore `a` 
 must be reloaded in the return statement. LDC does not pass 
 type-based alias information to the optimizer (TBAA if you are 
 interested), which Clang _would_ do; it be nice if someone 
 could check me on this: 
 https://llvm.org/docs/LangRef.html#pointer-aliasing-rules
 Still, Clang does not optimize the code as it probably could, 
 same as GCC does. My guess is that GDC inherits the C or C++ 
 type-based aliasing rules here.
Thanks for the details.
Apr 27 2020
prev sibling parent reply Luís Marques <luis luismarques.eu> writes:
On Monday, 27 April 2020 at 14:57:09 UTC, Johan wrote:
 Yes, D has UB.
Could you expand on that, please? Do you mean in the implementation or per the spec? Outside @safe or also in @safe code? Etc.
Apr 27 2020
next sibling parent reply welkam <wwwelkam gmail.com> writes:
On Monday, 27 April 2020 at 15:14:06 UTC, Luís Marques wrote:
 On Monday, 27 April 2020 at 14:57:09 UTC, Johan wrote:
 Yes, D has UB.
 Could you expand on that, please? Do you mean in the implementation or per the spec? Outside @safe or also in @safe code? Etc.
Casting away immutable or const and then modifying the value would lead to incorrect code.
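
A minimal sketch of the defined alternative, assuming the goal is just to get a modifiable version of immutable data: take a mutable copy with `.dup` instead of casting the qualifier away. The variable names are illustrative only.

```d
import std.stdio;

void main() @safe
{
    immutable int[] original = [1, 2, 3];

    // .dup allocates a fresh, mutable copy; the immutable data is never written to.
    int[] copy = original.dup;
    copy[0] = 42;

    writeln(original); // [1, 2, 3]
    writeln(copy);     // [42, 2, 3]
}
```

Because the write goes to separate memory, the compiler remains free to assume that `original` never changes.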
Apr 27 2020
parent Luís Marques <luis luismarques.eu> writes:
On Monday, 27 April 2020 at 16:10:12 UTC, welkam wrote:
 casting out immutable or const and then modifying the value 
 would lead to incorrect code.
Ah, yes, I should have phrased it to make it clear I didn't have that kind of thing in mind. In your example we are in the realm of "well, you promised something and then you didn't follow through, and this is @system, so you're on your own". Whereas things like evaluation order, alignment, etc. could arguably all be specified such that D does the right thing, even in various cases where in C they would be undefined behavior.
Apr 27 2020
prev sibling next sibling parent Johan <j j.nl> writes:
On Monday, 27 April 2020 at 15:14:06 UTC, Luís Marques wrote:
 On Monday, 27 April 2020 at 14:57:09 UTC, Johan wrote:
 Yes, D has UB.
 Could you expand on that, please? Do you mean in the implementation or per the spec? Outside @safe or also in @safe code? Etc.
What comes to mind are null dereference and shifting by more than the bit width. (@safe plays no role for these.)
-Johan
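
A minimal sketch of guarding against the shift case, assuming the caller wants a well-defined result (here 0) when the count is not smaller than the bit width. `shiftLeftChecked` is a hypothetical helper, not a Phobos function, and returning 0 is just one possible policy.

```d
import std.stdio;

// Hypothetical helper: only performs the shift when the count is in range,
// so the shift expression itself is never evaluated with a count of 32 or more.
uint shiftLeftChecked(uint value, uint count) @safe pure nothrow @nogc
{
    enum bits = 8 * uint.sizeof; // 32 for uint
    return count < bits ? value << count : 0;
}

void main() @safe
{
    writefln("0x%X", shiftLeftChecked(1, 4));  // prints 0x10
    writefln("0x%X", shiftLeftChecked(1, 32)); // prints 0x0
}
```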
Apr 27 2020
prev sibling parent reply Arine <arine1283798123 gmail.com> writes:
On Monday, 27 April 2020 at 15:14:06 UTC, Luís Marques wrote:
 On Monday, 27 April 2020 at 14:57:09 UTC, Johan wrote:
 Yes, D has UB.
 Could you expand on that, please? Do you mean in the implementation or per the spec? Outside @safe or also in @safe code? Etc.
D has UB even in @safe. @safe doesn't mean there's no UB, it simply means it is memory safe. This will print both true and false:

```
import std.stdio;

@safe void foo() {
    bool v = void;
    if ( v ) {
        writeln("true");
    }
    if ( !v ) {
        writeln("false");
    }
}

@safe void main() {
    foo();
}
```
Apr 27 2020
next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Monday, 27 April 2020 at 18:21:45 UTC, Arine wrote:
 D has UB even in @safe. @safe doesn't mean there's no UB, it 
 simply means it is memory safe. This will print both true and 
 false:
Note that @safe is defined to have no undefined behavior.
 Safe functions are functions that are statically checked to 
 exhibit no possibility of undefined behavior. Undefined 
 behavior is often used as a vector for malicious attacks.
https://dlang.org/spec/function.html#function-safety

Any time there is UB in a @safe function, it's a bug. The example you posted in particular is filed under https://issues.dlang.org/show_bug.cgi?id=20148
Apr 27 2020
parent Arine <arine1283798123 gmail.com> writes:
On Monday, 27 April 2020 at 18:59:22 UTC, Dennis wrote:
 On Monday, 27 April 2020 at 18:21:45 UTC, Arine wrote:
 D has UB even in @safe. @safe doesn't mean there's no UB, it 
 simply means it is memory safe. This will print both true and 
 false:
 Note that @safe is defined to have no undefined behavior.
 Safe functions are functions that are statically checked to 
 exhibit no possibility of undefined behavior. Undefined 
 behavior is often used as a vector for malicious attacks.
 https://dlang.org/spec/function.html#function-safety
 Any time there is UB in a @safe function, it's a bug. The example you posted in particular is filed under https://issues.dlang.org/show_bug.cgi?id=20148
There's lots of bugs filed there. A lot of them aren't valid. No one's confirmed whether that actually even is a bug. It is working as intended, otherwise the fix is rather simple.
Apr 27 2020
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 27.04.20 20:21, Arine wrote:
 @safe doesn't mean there's no UB, it simply means it is memory safe.
UB precludes memory safety.
Apr 27 2020
prev sibling next sibling parent Johan <j j.nl> writes:
On Monday, 27 April 2020 at 14:07:49 UTC, Luís Marques wrote:
 I've lost track. Does D have undefined behavior at all? (e.g. 
 outside @safe code).

 In any case, I imagine that the D spec (ha!) would dictate that 
 the following code should print 0x10001, although that's not 
 quite clear.
I would greatly appreciate it if the outcome of this discussion led to the addition of this case to the compiler test suite.
-Johan
Apr 27 2020
prev sibling next sibling parent kinke <noone nowhere.com> writes:
On Monday, 27 April 2020 at 14:07:49 UTC, Luís Marques wrote:
 I've lost track. Does D have undefined behavior at all? (e.g. 
 outside @safe code).

 In any case, I imagine that the D spec (ha!) would dictate that 
 the following code should print 0x10001, although that's not 
 quite clear.

 GDC (optimized) prints 0x1.
 LDC2 prints 0x10001, but I suspect only because LLVM doesn't 
 happen to do the same optimization, not because this is handled 
 properly.

 ```
 import std.stdio;

 @safe:

 int foo(int* a, int* b) {
     *a = 1;
     *b = 1;
     return *a;
 }

 void main() {
     ubyte[6] buffer;
     auto a = cast(int[]) (buffer[0 .. 4]);
     auto b = cast(int[]) (buffer[2 .. 6]);
     writefln("0x%X", foo(&a[0], &b[0]));
 }
 ```
A case where LDC goes further than GDC, wrt. infinite loops (see https://github.com/ldc-developers/ldc/issues/2827):

```
void foo() @safe {
    int x;
    while (x != x + 1) ++x;
}

int bar(int p) @safe {
    if (p > 1)
        foo();
    return p;
}
```

With -O, LDC optimizes bar() to simply returning p. See https://godbolt.org/z/84wZ9k.
Apr 27 2020
prev sibling parent reply ag0aep6g <anonymous example.com> writes:
On Monday, 27 April 2020 at 14:07:49 UTC, Luís Marques wrote:
 I've lost track. Does D have undefined behavior at all? (e.g. 
 outside @safe code).
In @system and @trusted code, there is plenty of undefined behavior. Just search the spec for it: https://www.google.com/search?q=%22undefined%20behavior%22%20site:dlang.org/spec

Per definition, @safe is supposed to be free of undefined behavior [1]. But there are many holes in the system, so that's not really true at the moment. Maybe we get there some day.

Even then, all guarantees are off when you feed bad data to @safe code from @system code (e.g. an invalid pointer).
 In any case, I imagine that the D spec (ha!) would dictate that 
 the following code should print 0x10001, although that's not 
 quite clear.

 GDC (optimized) prints 0x1.
 LDC2 prints 0x10001, but I suspect only because LLVM doesn't 
 happen to do the same optimization, not because this is handled 
 properly.

 ```
 import std.stdio;

 @safe:

 int foo(int* a, int* b) {
     *a = 1;
     *b = 1;
     return *a;
 }

 void main() {
     ubyte[6] buffer;
     auto a = cast(int[]) (buffer[0 .. 4]);
     auto b = cast(int[]) (buffer[2 .. 6]);
     writefln("0x%X", foo(&a[0], &b[0]));
 }
 ```
As far as I'm aware, an implementation can only assume alignment on GC pointers [2]. It cannot assume that two `int*`s don't point to overlapping locations. So GDC shouldn't do that optimization.

Then again, Walter wants to disallow passing multiple mutable references to the same memory with DIP 1021 [3]. If he goes through with that, implementations should reject that code.

[1] https://dlang.org/spec/function.html#function-safety
[2] https://dlang.org/spec/attribute.html#align
[3] https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1021.md
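
A minimal sketch of the simpler, fully overlapping case, assuming current default compiler settings (no preview switches): both pointers refer to the same properly aligned `int`, so the final read has to observe the second store. It says nothing about whether DIP 1021 would eventually reject such a call.

```d
import std.stdio;

int foo(int* a, int* b) @safe
{
    *a = 1;
    *b = 2;
    return *a; // has to be reloaded, because b may point to the same int as a
}

void main() @safe
{
    auto storage = new int[1];
    // Both arguments point at storage[0].
    writeln(foo(&storage[0], &storage[0])); // prints 2
}
```

A compiler that returned 1 here would be assuming that the two `int*` parameters never alias.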
Apr 27 2020
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/27/2020 12:04 PM, ag0aep6g wrote:
 Then again, Walter wants to disallow passing multiple mutable references to
 the same memory with DIP 1021 [3]. If he goes through with that,
 implementations should reject that code.
Yes, but there will still be a way to trick it. @live will stop that trickery, unless bad pointers are passed to the @live function.
Apr 27 2020