digitalmars.D - Should (p - q) be disallowed in safe code?

Walter Bright (30/30) Dec 31 Consider:

Richard (Rikki) Andrew Cattermole (3/40) Dec 31 Make it ptrdiff_t not size_t, and I'm happy.

Walter Bright (2/3) Dec 31 My bad.

Richard (Rikki) Andrew Cattermole (9/13) Dec 31 I wasn't correcting you, I was saying what I wanted it to do.

user1234 (14/28) Jan 01 Yes but this has nothing to do with the subtraction. You simply

Richard (Rikki) Andrew Cattermole (7/39) Jan 01 That sounds completely overkill.

Vladimir Panteleev (5/6) Jan 01 I don't think so. An expression which calculates a `size_t` (or

Timon Gehr (6/15) Jan 01 In C, subtracting pointers to different memory objects is undefined

Walter Bright (3/9) Jan 01 Since the same front end is used, I'd be surprised if they behaved diffe...

Timon Gehr (21/35) Jan 01 Well, that was the point. Vladimir had said that pointer subtraction is

Walter Bright (16/37) Jan 01 That "can do whatever it wants" is a correct interpretation, but it woul...

Richard (Rikki) Andrew Cattermole (12/15) Jan 01 That sounds a little like you're wanting to make safety designate which

Walter Bright (13/24) Jan 01 Right. Just disallow the p-q.

Walter Bright (4/8) Jan 02 The original C semantics were designed to map directly onto PDP-11 CPU i...

Timon Gehr (56/108) Jan 01 Usually if the UB does anything of note it is because an attacker is

Nick Treleaven (7/20) Jan 03 Point 6 here:

claptrap (5/12) Jan 01 I use this all the time to iterate multiple arrays in lockstep.

Walter Bright (3/10) Jan 01 @safe code doesn't allow pointer arithmetic, and so such code would have...

Paul Backus (13/23) Jan 01 Pointer *subtraction* is allowed in @safe code because the result

Timon Gehr (4/7) Jan 01 There are plenty of unsafe operations whose result is considered to be
Walter Bright (6/19) Jan 01 Yes, that code would be safe, but it would also be garbage.

Timon Gehr (13/24) Jan 01 It's one question whether you have to mark it `@trusted` for it to type

Walter Bright (8/20) Jan 01 My proposal would not affect that - the frontend would diagnose p-q as a...

Timon Gehr (12/44) Jan 01 I understand, but your latest point was "just put `@trusted` on it".

Walter Bright (15/30) Jan 02 Let's step back a bit. I expect it to behave as a C backend would. More

Timon Gehr (47/95) Jan 02 There is some conversion going on here that you did not mention, and in

Walter Bright (7/7) Jan 02 It seems we are in full agreement that p-q should be disallowed in @safe...

Timon Gehr (23/34) Jan 02 I am happy with each of these two outcomes:

Walter Bright (16/35) Jan 02 You are obviously correct. But using known computers, it is not a memory...

Timon Gehr (37/86) Jan 02 Underlying this admission is your utterly wrong claim, namely that it is...

Walter Bright (48/71) Jan 02 The bottom line here is why are we arguing about this? Haven't we agreed...

Richard (Rikki) Andrew Cattermole (2/12) Jan 02 Web assembly is segmented :(

Walter Bright (2/3) Jan 02 They should have talked to me first!

Richard (Rikki) Andrew Cattermole (15/19) Jan 02 I understand why they are doing it.

Walter Bright (2/4) Jan 03 LDC has a webassembly back end, so it is not a killer.

Richard (Rikki) Andrew Cattermole (7/12) Jan 03 You have misunderstood the situation.

Walter Bright (2/9) Jan 03 I don't see why calls to `new` cannot be redirected to whatever WASM doe...

Richard (Rikki) Andrew Cattermole (10/22) Jan 03 You can't do pointer arithmetic with WasmGC.

Walter Bright (9/19) Jan 03 ```

Richard (Rikki) Andrew Cattermole (41/65) Jan 03 Loading and storing fields work.

Walter Bright (2/2) Jan 04 Thanks for the explanation. What it suggest to me is that a subset of D ...

Richard (Rikki) Andrew Cattermole (6/8) Jan 04 Yes it should do, but it'll be limited enough that people will get

H. S. Teoh (37/48) Jan 04 In the past I've managed to get a rudimentary (highly-hacked) druntime

Adam D. Ruppe (3/5) Jan 04 https://dpldocs.info/this-week-in-arsd/Blog.Posted_2024_10_25.html

Walter Bright (2/3) Jan 04 It'd be little different from the WASM targets for C++.
Adam Ruppe (3/4) Jan 04 Just use opend, we ported druntime to wasm and make cross compilation

Timon Gehr (13/16) Jan 03 You brought up some tangential points that I think are based on flawed

Walter Bright (22/24) Jan 02 The current documentation says:

Walter Bright (1/1) Jan 02 https://github.com/dlang/dlang.org/pull/4358
Timon Gehr (14/23) Jan 02 This is what people intuitively assume will happen, but UB is not this.

Walter Bright (11/22) Jan 03 I've said that it was memory safe all along. I've said this proposal is

Timon Gehr (20/50) Jan 03 You have both said that and then also said that I am obviously correct

Walter Bright (2/2) Jan 03 I'm just going to make p-q an error in @safe code. If someone still want...

jmh530 (3/7) Jan 02 To what extent can D know when pointers are known to point to the

Walter Bright (4/11) Jan 02 Some can be done trivially, such as (&c - &d). More can be discovered wi...

Richard (Rikki) Andrew Cattermole (30/43) Jan 02 Four days ago I started implementing value tracking.

Walter Bright (1/1) Jan 03 Sorry about that. If you ask me, it may be possible, but it sure looks i...

Richard (Rikki) Andrew Cattermole (3/5) Jan 03 Its not your fault, its a hard problem that I don't think has any good

Walter Bright (1/1) Jan 03 https://github.com/dlang/dmd/pull/22348

jmh530 (4/5) Jan 04 2024 edition?

Walter Bright <newshound2 digitalmars.com> writes:

Consider:
```d
@safe
size_t distance(int* p, int* q) => p - q;
```
The difficulty here is when p and q may not be pointing into the same memory 
object. If they're not, the result is nonsense:
```d
int a;
int b;
size_t distance = &b - &a;
```
The address relationship between `a` and `b` is implementation-defined, and
code like this would be almost certainly a bug.

Where this could be valid:
```d
struct S
{
     int a,b;
}
S s;
size_t distance = &s.b - &s.a;
```

So this would be valid, as the two pointers are known to point to the same 
memory object.

A corollary to this would be disallowing < <= > >= comparisons between pointers.

p-q is commonplace in C code, where one traverses a loop. But in D code the 
preferred way would be to use arrays.

Thoughts?

P.S. I don't recall ever having a bug with misusing `p-q`. Has anyone?
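
As a minimal sketch of the contrast drawn above between C-style traversal and arrays (the function names are hypothetical and not from the post), a pointer loop bounded by `p - q` can be set next to a slice-based loop that needs no pointer subtraction:

```d
// Hypothetical helper names, used only for illustration.

// C-style traversal: uses pointer arithmetic and a `p - q` style bound,
// so it is marked @system here.
int sumWithPointers(int* begin, int* end) @system
{
    int total = 0;
    // The loop stops once the distance between `end` and `q` reaches 0.
    for (int* q = begin; end - q; ++q)
        total += *q;
    return total;
}

// Slice-based traversal: the slice carries its own length, so no pointer
// subtraction is involved and the function can be @safe.
int sumWithSlice(const(int)[] values) @safe
{
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}
```

Both functions compute the same sum when `begin` and `end` delimit one array; only the slice version stays inside what `@safe` accepts without any question about which memory object the pointers refer to.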

Dec 31

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 01/01/2026 7:15 PM, Walter Bright wrote:
 Consider:
 ```d
 @safe
 size_t distance(int* p, int* q) => p - q;
 ```
 The difficulty here is when p and q may not be pointing into the same 
 memory object. If they're not, the result is nonsense:
 ```d
 int a;
 int b;
 size_t distance = &b - &a;
 ```
 The address relationship between `a` and `b` is implementation-defined, 
 and code like this would be almost certainly a bug.
 
 Where this could be valid:
 ```d
 struct S
 {
      int a,b;
 }
 S s;
 size_t distance = &s.b - &s.a;
 ```
 
 So this would be valid, as the two pointers are known to point to the 
 same memory object.
 
 A corollary to this would be disallowing < <= > >= comparisons between 
 pointers.
 
 p-q is commonplace in C code, where one traverses a loop. But in D code 
 the preferred way would be to use arrays.
 
 Thoughts?
 
 P.S. I don't recall ever having a bug with misusing `p-q`. Has anyone?

Make it ptrdiff_t not size_t, and I'm happy.

The loops might go bad, but hey that is what static analyzers are for ;)

Dec 31

Walter Bright <newshound2 digitalmars.com> writes:

On 12/31/2025 10:54 PM, Richard (Rikki) Andrew Cattermole wrote:
 Make it ptrdiff_t not size_t, and I'm happy.

My bad.

Dec 31

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 01/01/2026 8:19 PM, Walter Bright wrote:
 On 12/31/2025 10:54 PM, Richard (Rikki) Andrew Cattermole wrote:
 Make it ptrdiff_t not size_t, and I'm happy.

 
 My bad.

I wasn't correcting you, I was saying what I wanted it to do.

```d
void func(void* a, void* b) {
	ptrdiff_t diff = b - a;
	// size_t diff = b - a; ERROR
	assert(diff >= 0, "ARGUMENTS BACKWARDS");
}
```

Dec 31

user1234 <user1234 12.de> writes:

On Thursday, 1 January 2026 at 07:27:04 UTC, Richard (Rikki) 
Andrew Cattermole wrote:
 On 01/01/2026 8:19 PM, Walter Bright wrote:
 On 12/31/2025 10:54 PM, Richard (Rikki) Andrew Cattermole 
 wrote:
 Make it ptrdiff_t not size_t, and I'm happy.

 
 My bad.

 I wasn't correcting you, I was saying what I wanted it to do.

 ```d
 void func(void* a, void* b) {
 	ptrdiff_t diff = b - a;
 	// size_t diff = b - a; ERROR
 	assert(diff >= 0, "ARGUMENTS BACKWARDS");
 }
 ```

Yes but this has nothing to do with the subtraction. You simply 
hit the implicit coercion rules there

```d
ptrdiff_t a;
size_t b;
a = b;
b = a;
```

You need some kind of tracking/dfa/vrp to put restrictions. Same 
remark about the initial question I would say. You can imagine 
some code based on aliasing where `p - q` is finally totally 
fine. However, `@safe` is a good deal, I would say.
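
A minimal sketch of the implicit-conversion point above, assuming nothing beyond D's ordinary integer conversion rules (the helper name is hypothetical): a `ptrdiff_t` is accepted wherever a `size_t` is expected, so a negative difference silently becomes a very large unsigned value.

```d
// Hypothetical helper, for illustration only.
// D converts implicitly between signed and unsigned integers of the same
// size, so no cast is needed here.
size_t toUnsigned(ptrdiff_t d) @safe
{
    return d; // a negative value wraps around to a large size_t
}
```

With this helper, `toUnsigned(-1)` yields `size_t.max`, which is the kind of surprise the proposed restriction on `size_t diff = b - a;` is meant to catch.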

Jan 01

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 01/01/2026 9:24 PM, user1234 wrote:
 On Thursday, 1 January 2026 at 07:27:04 UTC, Richard (Rikki) Andrew 
 Cattermole wrote:
 On 01/01/2026 8:19 PM, Walter Bright wrote:
 On 12/31/2025 10:54 PM, Richard (Rikki) Andrew Cattermole wrote:
 Make it ptrdiff_t not size_t, and I'm happy.

 My bad.

 I wasn't correcting you, I was saying what I wanted it to do.

 ```d
 void func(void* a, void* b) {
     ptrdiff_t diff = b - a;
     // size_t diff = b - a; ERROR
     assert(diff >= 0, "ARGUMENTS BACKWARDS");
 }
 ```

 
 Yes but this has nothing to do with the subtraction. You simply hit the 
 implicit coercion rules there
 
 ```d
 ptrdiff_t a;
 size_t b;
 a = b;
 b = a;
 ```
 
 You need some kind of tracking/dfa/vrp to put restrictions. Same remark 
 about the initial question I would say. You can imagine some code based 
 on aliasing where `p - q` is finally totally fine. However, `@safe` is a 
 good deal, I would say.

That sounds completely overkill.

Disable implicit unsigned <-> signed conversion, for a subtraction of 
pointers.

Doesn't need to be inter statement and definitely does not need to solve 
exact values for the pointers, which may only be known at runtime with 
100% certainty.

Jan 01

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Thursday, 1 January 2026 at 06:15:09 UTC, Walter Bright wrote:
 Thoughts?

I don't think so. An expression which calculates a `size_t` (or 
`ptrdiff_t`) value without side effects is memory-safe.

What you do with the index (valid or not) would be scrutinized by 
the usual rules.
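
To illustrate the claim above with a small sketch (the helper name is hypothetical): once the difference is just an integer, using it as an index into a slice goes through the usual bounds rules. The check is written out explicitly here so that a nonsense index yields a fallback value instead of touching memory outside the array.

```d
// Hypothetical helper, for illustration only.
// Returns arr[index] when the index is in range, otherwise `fallback`.
int elementOr(const(int)[] arr, size_t index, int fallback) @safe
{
    return index < arr.length ? arr[index] : fallback;
}
```

This only shows the indexing side of the argument; it says nothing about whether computing the difference itself is defined behavior, which is the question raised later in the thread.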

Jan 01

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/1/26 10:10, Vladimir Panteleev wrote:
 On Thursday, 1 January 2026 at 06:15:09 UTC, Walter Bright wrote:
 Thoughts?

 
 I don't think so. An expression which calculates a `size_t` (or 
 `ptrdiff_t`) value without side effects is memory-safe.
 
 What you do with the index (valid or not) would be scrutinized by the 
 usual rules.
 

In C, subtracting pointers to different memory objects is undefined 
behavior, hence side-effecting.

Subtracting pointers can be `@safe` iff it is always defined behavior. 
(Even if the defined behavior is to yield a nonsense value.)

I am not sure how GDC and LDC are currently treating this.

Jan 01

Walter Bright <newshound2 digitalmars.com> writes:

On 1/1/2026 7:18 AM, Timon Gehr wrote:
 In C, subtracting pointers to different memory objects is undefined behavior, 
 hence side-effecting.
 
 Subtracting pointers can be `@safe` iff it is always defined behavior. (Even if 
 the defined behavior is to yield a nonsense value.)

Getting a nonsense value is memory safe, but is almost certainly a bug.


 I am not sure how GDC and LDC are currently treating this.

Since the same front end is used, I'd be surprised if they behaved differently.

Jan 01

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/1/26 18:18, Walter Bright wrote:
 On 1/1/2026 7:18 AM, Timon Gehr wrote:
 In C, subtracting pointers to different memory objects is undefined 
 behavior, hence side-effecting.

 Subtracting pointers can be `@safe` iff it is always defined behavior. 
 (Even if the defined behavior is to yield a nonsense value.)

 
 Getting a nonsense value is memory safe, but is almost certainly a bug.
 ...

Well, that was the point. Vladimir had said that pointer subtraction is 
free of side effects, but UB would *be* the side effect. And AFAIU 
according to the C standard it can be UB. This does not merely mean that 
the result could be nonsensical (which, for the record, would *not* be a 
bug in C), it means the program can do whatever it wants.

As long as it is defined behavior in D, keeping it `@safe` is perfectly 
fine. But differences to C and C++ may nevertheless trip up some 
implementations and violate memory safety, as backends were developed 
with C and C++ in mind.

`@safe` has to be consistent with the backend semantics. This means 
either making certain constructs `@system` or ensuring all backends 
compile them safely, or having a broken `@safe`.

 
 I am not sure how GDC and LDC are currently treating this.

 
 Since the same front end is used, I'd be surprised if they behaved 
 differently.

UB is mostly a glue/backend thing, it's about what the code *means*, not 
about how it is type checked by the frontend. And backends are often 
biased towards C and C++ semantics.

There are other cases, e.g., with DMD a null dereference may be a 
guaranteed segfault, but I think it's likely UB with GDC and LDC.

It seems LDC even has the flag `-fno-delete-null-pointer-checks` to turn 
off UB on null pointer dereference, which would indeed indicate it is UB 
by default.

Jan 01

Walter Bright <newshound2 digitalmars.com> writes:

On 1/1/2026 12:47 PM, Timon Gehr wrote:
 Well, that was the point. Vladimir had said that pointer subtraction is free of 
 side effects, but UB would *be* the side effect. And AFAIU according to the C 
 standard it can be UB. This does not merely mean that the result could be 
 nonsensical (which, for the record, would *not* be a bug in C), it means the 
 program can do whatever it wants.

That "can do whatever it wants" is a correct interpretation, but it would be 
insane to deliberately set up a system that launched nuclear missiles upon 
encountering UB.

I also object to common optimizations that interpret UB as license to delete the 
offending code path.


 As long as it is defined behavior in D, keeping it `@safe` is perfectly fine. 
 But differences to C and C++ may nevertheless trip up some implementations and 
 violate memory safety, as backends were developed with C and C++ in mind.
 
 `@safe` has to be consistent with the backend semantics. This means either 
 making certain constructs `@system` or ensuring all backends compile them 
 safely, or having a broken `@safe`.

The backends do not, to my knowledge, have any awareness of @safe or @system. I 
don't see a scenario where not allowing p-q in @safe code would have any effect 
on the backend.


 UB is mostly a glue/backend thing, it's about what the code *means*, not about 
 how it is type checked by the frontend. And backends are often biased towards C 
 and C++ semantics.
 
 There are other cases, e.g., with DMD a null dereference may be a guaranteed 
 segfault, but I think it's likely UB with GDC and LDC.
 
 It seems LDC even has the flag `-fno-delete-null-pointer-checks` to turn off UB 
 on null pointer dereference, which would indeed indicate it is UB by default.

DMD's optimizer can detect null pointer dereferences as a result of copy 
propagation, etc., and always gives a compile time error when it does. 
Otherwise, it just dereferences the null pointer and whatever the CPU does with 
it happens.

What the proposal in this thread is about is extending the @safe semantics to 
not just be about memory safety, but about checking for common bugs where 
rewriting the code slightly to avoid it is practical.

Jan 01

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 02/01/2026 10:53 AM, Walter Bright wrote:
 What the proposal in this thread is about is extending the @safe 
 semantics to not just be about memory safety, but about checking for 
 common bugs where rewriting the code slightly to avoid it is practical.

That sounds a little like you're wanting to make safety designate which 
functions need static analysis.

However in this case it isn't required to make it disallow the operation:

``for (auto q = &array[0]; p - q; ++q)``

If ``p - q`` is a /signed/ integer that cannot implicitly cast to 
unsigned, it will never iterate.

Both ``size_t`` and ``ptrdiff_t`` should be built in types that cannot 
implicitly cast off them. Making them aliases was a mistake.

Note: @safe even with this upgrade does not track aliasing or 
not-aliasing. So if it is positive there is no way to know if it is the 
same object.

Jan 01

Walter Bright <newshound2 digitalmars.com> writes:

On 1/1/2026 4:42 PM, Richard (Rikki) Andrew Cattermole wrote:
 That sounds a little like you're wanting to make safety designate which 
 functions need static analysis.

Possibly, but not for the p-q case.


 However in this case it isn't required to make it disallow the operation:
 
 ``for (auto q = &array[0]; p - q; ++q)``

Right. Just disallow the p-q.


 If ``p - q`` is a /signed/ integer that cannot implicitly cast to unsigned, it 
 will never iterate.

The loop works correctly whether the difference is signed or not.


 Both ``size_t`` and ``ptrdiff_t`` should be built in types that cannot 
 implicitly cast off them. Making them aliases was a mistake.

Andrei proposed something like that years ago. Trying to write code with this 
soup of same-but-different types turns out to be an awful soup, resulting in 
lots of casting.

I've used a language that did not have implicit casting. The result was casts 
everywhere, which winds up *increasing* the number of hidden bugs. C has a 
well-designed implicit casting system (it isn't perfect) that is a lot more 
flexible when one, for instance, wants to change the type of an integer.

 Note: @safe even with this upgrade does not track aliasing or not-aliasing. So 
 if it is positive there is no way to know if it is the same object.

That's right. That's why p-q would be disallowed, whether or not it points to 
the same memory object.

Jan 01

Walter Bright <newshound2 digitalmars.com> writes:

On 1/1/2026 5:55 PM, Walter Bright wrote:
 I've used a language that did not have implicit casting. The result was casts 
 everywhere, which winds up *increasing* the number of hidden bugs. C has a 
 well-designed implicit casting system (it isn't perfect) that is a lot more 
 flexible when one, for instance, wants to change the type of an integer.

The original C semantics were designed to map directly onto PDP-11 CPU instructions.

Along with C's rise to dominance in the 80s, the design of other CPUs was 
adjusted to match C semantics.

Jan 02

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/1/26 22:53, Walter Bright wrote:
 On 1/1/2026 12:47 PM, Timon Gehr wrote:
 Well, that was the point. Vladimir had said that pointer subtraction 
 is free of side effects, but UB would *be* the side effect. And AFAIU 
 according to the C standard it can be UB. This does not merely mean 
 that the result could be nonsensical (which, for the record, would 
 *not* be a bug in C), it means the program can do whatever it wants.

 
 That "can do whatever it wants" is a correct interpretation, but it 
 would be insane to deliberately set up a system that launched nuclear 
 missiles upon encountering UB.
 ...

Usually if the UB does anything of note it is because an attacker is 
exploiting the hole in the language semantics to, deliberately, make the 
program do something actively malicious that was never intended by its 
author.

It's a common misunderstanding that these kinds of scenarios are 
hypothetical. UB just sucks in this way, whether you deliberately want 
it to or not.

 I also object to common optimizations that interpret UB as license to 
 delete the offending code path.
 ...

Whether it is deliberately interpreted in any way or not, the code has 
to do _SOMETHING_ if the UB condition actually occurs, and it's unclear 
how you would specify that optimizations are supposed to maintain that 
specific behavior when it may not even be clear at a point in the 
optimization pipeline what it will end up being in the end. Often 
enough, it will end up being an exploitable weakness.

If you want optimizers to preserve the behavior of code that has UB in 
it, you have to turn any potential of UB into an optimization blocker. 
The optimizer's intermediate representation just does not carry the 
final machine semantics for expressions with UB.

It's a fundamental problem of any language design with UB, not some sort 
of conspiracy by evil compiler developers.

UB exists because language designers and programmers want/need power 
(runnable low-level fast code fast) without responsibility (using formal 
methods such as advanced type systems).

 
 As long as it is defined behavior in D, keeping it `@safe` is 
 perfectly fine. But differences to C and C++ may nevertheless trip up 
 some implementations and violate memory safety, as backends were 
 developed with C and C++ in mind.

 `@safe` has to be consistent with the backend semantics. This means 
 either making certain constructs `@system` or ensuring all backends 
 compile them safely, or having a broken `@safe`.

 
 The backends do not, to my knowledge, have any awareness of @safe or 
 @system.

I think this is true yet completely irrelevant to what I was saying.

 I don't see a scenario where not allowing p-q in @safe code 
 would have any effect on the backend.
 ...

Not the point at all. I was reacting to Vladimir's statement that:

 I don't think so. An expression which calculates a `size_t` (or 
 `ptrdiff_t`) value without side effects is memory-safe.
 
 What you do with the index (valid or not) would be scrutinized by the 
 usual rules.

My point was that:

a) There does not actually seem to be any explicit documentation in the 
D spec about pointer subtraction. If there is, I have not found it.

b) In some popular languages, `p-q` is UB if `p` and `q` point to 
different memory objects.

c) It's hence possible that some D backends give UB to this expression 
when according to your intention they should not.

d) This scenario is not implausible, I think it already happens for null 
pointer dereferences that code that the frontend says is `@safe` is 
treated as UB by some of the backends.

 
 UB is mostly a glue/backend thing, it's about what the code *means*, 
 not about how it is type checked by the frontend. And backends are 
 often biased towards C and C++ semantics.

 There are other cases, e.g., with DMD a null dereference may be a 
 guaranteed segfault, but I think it's likely UB with GDC and LDC.

 It seems LDC even has the flag `-fno-delete-null-pointer-checks` to 
 turn off UB on null pointer dereference, which would indeed indicate 
 it is UB by default.

 
 DMD's optimizer can detect null pointer dereferences

The flag is somewhat of a misnomer, you might have to actually look into 
its documentation.

 as a result of copy propagation, etc., and always gives a compile time error 
 when it does.

Sure, when you can prove that a piece of code is always wrong to 
execute, you can do that (and I think it's a good idea). Often you 
however can't.

 Otherwise, it just dereferences the null pointer and whatever the CPU 
 does with it happens.
 ...

And hence you are now stuck treating pointer dereferences as a 
side-effecting operation. Some backends don't like doing that.

Another issue is that some targets will not trap at all and just treat 0 
as a valid memory address. (Less relevant for DMD's supported targets.)

 What the proposal in this thread is about is extending the @safe 
 semantics to not just be about memory safety, but about checking for 
 common bugs where rewriting the code slightly to avoid it is practical.
 

I understand, but there are more than two positions here.

Your position: `p-q` is memory safe yet might be error prone and we 
might want to start banning error prone constructs in `@safe` code even 
though it was originally meant to be strictly about memory safety.

Vladimir's position: `p-q` is memory safe, hence there is no need to 
reject it in `@safe` code.

My position: Wait, is `p-q` even _currently implemented_ in a memory 
safe way? Where is it documented? What are the backends doing? There are 
already cases where `@safe` code is treated as UB by some backends and 
`p-q` might be among these cases.

Jan 01

Nick Treleaven <nick geany.org> writes:

On Friday, 2 January 2026 at 02:00:29 UTC, Timon Gehr wrote:
 a) There does not actually seem to be any explicit 
 documentation in the D spec about pointer subtraction. If there 
 is, I have not found it.

Point 6 here: 
https://dlang.org/spec/expression.html#pointer_arithmetic

 If both operands are pointers, and the operator is -, the 
 pointers are subtracted and the result is divided by the size 
 of the type pointed to by the operands.

It sounds like we need to put in an undefined behaviour note. 
What about *RelExpression* on pointers, UB or not?

 b) In some popular languages, `p-q` is UB if `p` and `q` point 
 to different memory objects.

 c) It's hence possible that some D backends give UB to this 
 expression when according to your intention they should not.

 d) This scenario is not implausible, I think it already happens 
 for null pointer dereferences that code that the frontend says 
 is `@safe` is treated as UB by some of the backends.

Yes as of May, that requirement is in the spec:
https://dlang.org/spec/function.html#null-dereferences

Jan 03

claptrap <clap trap.com> writes:

On Thursday, 1 January 2026 at 06:15:09 UTC, Walter Bright wrote:
 Consider:
 ```d
 @safe
 size_t distance(int* p, int* q) => p - q;
 ```
 The difficulty here is when p and q may not be pointing into 
 the same memory object. If they're not, the result is nonsense:

I use this all the time to iterate multiple arrays in lockstep.

size_t offset = q-p;

you access q with "p[offset]", and you just iterate p

I tried to avoid using it but it is just faster sometimes,
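
A hedged sketch of the lockstep idiom described above, next to a slice-based equivalent. The function names are hypothetical, and the pointer version assumes both pointers refer into the same memory object, which is the case the rest of the thread treats as unproblematic:

```d
// Hypothetical names, for illustration only.

// Pointer version: add each element reachable from q to the matching
// element reachable from p, while advancing only p.
void addLockstep(int* p, int* q, size_t n) @system
{
    // Distance between the two runs, measured in elements.
    immutable ptrdiff_t offset = q - p;
    foreach (_; 0 .. n)
    {
        *p += p[offset]; // p[offset] is the element at the same position in q's run
        ++p;
    }
}

// Slice version: the same update expressed with bounds-checked indexing.
void addSlices(int[] dst, const(int)[] src) @safe
{
    foreach (i, ref d; dst)
        d += src[i];
}
```

Whether the pointer version is measurably faster depends on the compiler and target; the sketch only shows that the two forms perform the same update.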

Jan 01

Walter Bright <newshound2 digitalmars.com> writes:

On 1/1/2026 7:16 AM, claptrap wrote:
 I use this all the time to iterate multiple arrays in lockstep.
 
 size_t offset = q-p;
 
 you access q with "p[offset]", and you just iterate p
 
 I tried to avoid using it but it is just faster sometimes,

@safe code doesn't allow pointer arithmetic, and so such code would have to be 
marked @trusted anyway.

Jan 01

Paul Backus <snarwin gmail.com> writes:

On Thursday, 1 January 2026 at 17:20:33 UTC, Walter Bright wrote:
 On 1/1/2026 7:16 AM, claptrap wrote:
 I use this all the time to iterate multiple arrays in lockstep.
 
 size_t offset = q-p;
 
 you access q with "p[offset]", and you just iterate p
 
 I tried to avoid using it but it is just faster sometimes,

 @safe code doesn't allow pointer arithmetic, and so such code 
 would have to be marked @trusted anyway.

Pointer *subtraction* is allowed in @safe code because the result 
is an integer, and all integers are [safe values][1].

For example, this compiles using the latest release of DMD:

```d
import std.stdio;

void main() @safe
{
     int* p = new int, q = new int;
     writeln(q - p);
}
```

[1]: https://dlang.org/spec/function.html#safe-values

Jan 01

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/1/26 18:56, Paul Backus wrote:
 
 Pointer *subtraction* is allowed in @safe code because the result is an 
 integer, and all integers are [safe values][1].

There are plenty of unsafe operations whose result is considered to be 
an integer by the type checker, so I don't think this justification is 
sufficient.

Jan 01

Walter Bright <newshound2 digitalmars.com> writes:

On 1/1/2026 9:56 AM, Paul Backus wrote:
 For example, this compiles using the latest release of DMD:
 
 ```d
 import std.stdio;
 
 void main() @safe
 {
      int* p = new int, q = new int;
      writeln(q - p);
 }
 ```
 
 [1]: https://dlang.org/spec/function.html#safe-values

Yes, that code would be safe, but it would also be garbage.

I was referring more to code like this:

char* p = &array[array.length];
for (auto q = &array[0]; p - q; ++q)
     ...

Jan 01

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/1/26 18:20, Walter Bright wrote:
 On 1/1/2026 7:16 AM, claptrap wrote:
 I use this all the time to iterate multiple arrays in lockstep.

 size_t offset = q-p;

 you access q with "p[offset]", and you just iterate p

 I tried to avoid using it but it is just faster sometimes,

 
 @safe code doesn't allow pointer arithmetic, and so such code would have 
 to be marked @trusted anyway.

It's one question whether you have to mark it `@trusted` for it to type 
check, it's another question whether you are allowed to mark it 
`@trusted` (i.e., whether it is actually memory safe).

In C, I think adding an integer to a pointer aiming to get a result that 
points to a different memory object entirely would just be UB.

I.e., there is some potential for `q+(p-q)` to do something other than 
give you `p` unless glue code and backends are careful to handle it as 
intended.

As far as I can tell, such nuances are not documented in the D spec and 
so it would be defensible for GDC and LDC to assume it's just supposed 
to mimic C behavior. Whether the backends actually do end up breaking 
assumptions like they would be allowed to is still another question.

Jan 01

Walter Bright <newshound2 digitalmars.com> writes:

On 1/1/2026 12:57 PM, Timon Gehr wrote:
 It's one question whether you have to mark it `@trusted` for it to type check, 
 it's another question whether you are allowed to mark it `@trusted` (i.e., 
 whether it is actually memory safe).

@trusted only applies to the interface, not the code itself.

 In C, I think adding an integer to a pointer aiming to get a result that points 
 to a different memory object entirely would just be UB.

You're right.


 I.e., there is some potential for `q+(p-q)` to do something other than give you 
 `p` unless glue code and backends are careful to handle it as intended.

My proposal would not affect that - the frontend would diagnose p-q as an error 
in @safe code.


 As far as I can tell, such nuances are not documented in the D spec and so it 
 would be defensible for GDC and LDC to assume it's just supposed to mimic C 
 behavior. Whether the backends actually do end up breaking assumptions like they 
 would be allowed to is still another question.

I designed the semantics of D fully aware of the reality that the usable 
backends (including mine) were designed for C, and that to not do so would be 
language suicide.

(And it's not just the backends, there are the debuggers, etc.)

Jan 01

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/1/26 23:14, Walter Bright wrote:
  > 1/1/2026 12:57 PM, Timon Gehr wrote:
 It's one question whether you have to mark it `@trusted` for it to 
 type check, it's another question whether you are allowed to mark it 
 `@trusted` (i.e., whether it is actually memory safe).

 
 @trusted only applies to the interface, not the code itself.
 
 In C, I think adding an integer to a pointer aiming to get a result 
 that points to a different memory object entirely would just be UB.

 
 You're right.
 
 
 I.e., there is some potential for `q+(p-q)` to do something other than 
 give you `p` unless glue code and backends are careful to handle it as 
 intended.

 
 My proposal would not affect that - the frontend would diagnose p-q as 
 an error in @safe code.
 ...

I understand, but your latest point was "just put `@trusted` on it".


Let's say the frontend now treats `p-q` as `@system`, and there is not 
even any documentation of what its semantics is supposed to be.

Do you believe with this background, alternative backends will in the 
future be more likely to:

- treat `p-q` as UB when different memory objects are involved

- treat `p-q` as defined behavior when different memory objects are involved

I just think the overall effect of this will be to cause confusion about
what is allowed among all parties involved. I think it's better to stick
to banning language constructs from `@safe` if they can actually exhibit UB.

 
 As far as I can tell, such nuances are not documented in the D spec 
 and so it would be defensible for GDC and LDC to assume it's just 
 supposed to mimic C behavior. Whether the backends actually do end up 
 breaking assumptions like they would be allowed to is still another 
 question.

 
 I designed the semantics of D fully aware of the reality that the usable 
 backends (including mine) were designed for C, and that to not do so 
 would be language suicide.
 
 (And it's not just the backends, there are the debuggers, etc.)

And yet it seems for `p-q` you differed.

Jan 01

Walter Bright <newshound2 digitalmars.com> writes:

On 1/1/2026 6:11 PM, Timon Gehr wrote:
 On 1/1/26 23:14, Walter Bright wrote:
 I understand, but your latest point was "just put `@trusted` on it".

 Let's say the frontend now treats `p-q` as `@system`, and there is not even any
 documentation of what its semantics is supposed to be.

Its semantics are: subtract q from p and divide by the size of the pointed-to
type.


 Do you believe with this background, alternative backends will in the future be
 more likely to:
 
 - treat `p-q` as UB when different memory objects are involved
 
 - treat `p-q` as defined behavior when different memory objects are involved

Let's step back a bit. I expect it to behave as a C backend would. More 
precisely, I have read the C/C++ memory model specification. It is very 
carefully written and well done. I requested a license to copy it to use in the 
D specification, but my request was ignored.

I could rewrite it to an equivalent definition, but that's a lot of work.

But still, D is going to adhere to it. It works, everyone understands it, and 
the existing backends are carefully tuned to match it.

All my proposal does is disallow pointer subtraction in @safe code. Code
generation is not affected in any material way.

It's the same thing as disallowing p+=1 in @safe code. The memory model does not
change.


 I just think the overall effect of this will be to cause confusion about what is
 allowed among all parties involved. I think it's better to stick to banning
 language constructs from `@safe` if they can actually exhibit UB.

Isn't that what I proposed?


 And yet it seems for `p-q` you differed.

How did I differ? I am confused.

Jan 02

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/2/26 18:53, Walter Bright wrote:
 On 1/1/2026 6:11 PM, Timon Gehr wrote:
 On 1/1/26 23:14, Walter Bright wrote:
 I understand, but your latest point was "just put `@trusted` on it".

 Let's say the frontend now treats `p-q` as `@system`, and there is not
 even any documentation of what its semantics is supposed to be.

 
 It's semantics are subtract q from p and divide by the size of the 
 pointed to type.
 ...

There is some conversion going on here that you did not mention, and in 
C the subtraction is sometimes invalid. I understand how to subtract 
pointers in e.g. x86 assembly, but the abstract semantics in a 
high-level language is a different thing. E.g., there is no such thing 
as a "memory object" at the assembly level.

 
 Do you believe with this background, alternative backends will in the 
 future be more likely to:

 - treat `p-q` as UB when different memory objects are involved

 - treat `p-q` as defined behavior when different memory objects are 
 involved

 
 Let's step back a bit. I expect it to behave as a C backend would.

(What any given C backend does _in practice_ is yet another question.)

But it seems you'd like it to be UB sometimes. Then it must be `@system`.

 More 
 precisely, I have read the C/C++ memory model specification. It is very 
 carefully written and well done. I requested a license to copy it to use 
 in the D specification, but my request was ignored.
 
 I could rewrite it to an equivalent definition, but that's a lot of work.
 ...

That's not really the point of contention, if you are saying "D pointer 
arithmetic semantics is like C", that's a sufficient specification as 
far as I am concerned. And then it immediately follows that `p-q` cannot 
be allowed in `@safe` code.

 But still, D is going to adhere to it. It works, everyone understands 
 it, and the existing backends are carefully tuned to match it.
 ...

Ok.

 All my proposal does is disallow pointer subtraction in @safe code. Code
 generation is not affected in any material way.
 ...

The point of contention is really not whether banning a construct will 
affect codegen. The actual dependency is:

type checking <- semantics -> codegen

However, if you allow `p-q` in `@safe` code, assuming logical
consistency, we can infer an intent about semantics that will put 
certain restrictions on code generation.

 It's the same thing as disallowing p+=1 in @safe code.

Maybe to you this is the same, but to me `p-q` and `p+=1` are materially 
different: one yields an integer, the other one yields a potentially 
invalid pointer. It is conceivable _in principle_ to have a language 
semantics where `p-q` is defined behavior.

 The memory model does not change.
 ...

That's fine, but to allow `p-q` in `@safe` code with C semantics is
inconsistent with _the definition of `@safe`_. And now you are saying
that this is the _current behavior_. It seems something is broken, and
fixing it is a _design problem_.

There are two different ways to fix it:
- Make cross-memory-object `p-q` implementation-defined (as you claimed
in your OP was already the case), differing from C.
- Make cross-memory-object `p-q` UB (as you are claiming now is already
the case), then ban `p-q` from `@safe` code.

You can't ignore the intended semantics of your programming constructs
when deciding if they can be `@safe`, even if changing the type checker
to consider something `@safe` or not does not have a material effect on
code generation by itself.

 
 I just think the overall effect of this will be to cause confusion 
 about what is allowed among all parties involved. I think it's better 
 to stick to banning language constructs from `@safe` if they can
 actually exhibit UB.

 
 Isn't that what I proposed?
 ...

I am not able to tell, which is the problem. You are saying 
contradictory things.

You so far made all of these claims:
- cross-memory-object `p-q` is implementation-defined in D
- `p-q` in D is like in C
- cross-memory-object `p-q` is UB in C.

One of these three statements must be false. I think the last one is 
correct.

 
 And yet it seems for `p-q` you differed.

 
 How did I differ? I am confused.

`p-q` is sometimes UB in C and hence not memory safe. You said `p-q` is 
memory safe in D. Hence it would have to be different.

There is no such thing as "UB yet memory safe".

Jan 02

Walter Bright <newshound2 digitalmars.com> writes:

It seems we are in full agreement that p-q should be disallowed in @safe code,
which is my proposal here.

BTW, p-q is not a memory safety issue. At worst you get an integer result that
is an unpredictable value. Yes, I am suggesting expanding the scope of @safe.

`i<<j` can also result in nonsense if `j>=32`. But it is not unsafe. Given the
pervasiveness of C, it would be insanity for a CPU to do anything other than seg
fault or produce a random result.

Jan 02

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/2/26 22:03, Walter Bright wrote:
 It seems we are in full agreement that p-q should be disallowed in @safe
 code, which is my proposal here.
 ...

I am happy with each of these two outcomes:
1. `p-q` is `@safe`, implementation-defined.
2. `p-q` can be UB, must be `@system`.

So, works for me.

 BTW, p-q is not a memory safety issue.

Any type of UB is a memory safety issue.

 At worst you get an integer result that is an unpredictable value.

No, _at worst_ you get e.g. the nuclear launch thing you mentioned (or 
worse). Undefined semantics is starkly distinct from nondeterministic 
semantics.

Any assumption that any type of UB is benign must rely on additional 
information about specific backends. So what you claim may be true with 
DMD, but that is about the extent of it.

 Yes, I am suggesting expanding the scope of @safe.
 ...

As long as `@safe` code consistently bans UB, the discussion of whether
banning UB from `@safe` code is an expansion of the scope of `@safe` is
mostly a philosophical one.

I am happy if `@safe` code disallows language constructs that can cause UB.

I think you are completely wrong to claim that this is an expansion of
the scope of `@safe`, but I will not lose any sleep over that part.

 `i<<j` can also result in nonsense if `j>=32`.

It is UB in C.

 But it is not unsafe.

UB implies unsafe. If it is not unsafe, it is not UB (e.g., in Java it 
is safe and hence not UB).

 Given the pervasiveness of C, it would be insanity for a CPU to do 
 anything other than seg fault or produce a random result.

I would expect a CPU to just do `i<<(j&31)`.

The C abstract machine is however not the CPU.
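
Editorial sketch of the distinction drawn in this exchange: code that wants the wrap-around result can mask the shift count itself instead of relying on what a given target does with an oversized count. The helper name `shlMasked` is hypothetical, and the masking policy is an assumption chosen to match the `i<<(j&31)` expression above. It is not something the thread says the D spec mandates.

```d
// Hypothetical helper: a 32-bit left shift whose result is defined for
// every count. Masking to the low 5 bits mirrors `i << (j & 31)`.
uint shlMasked(uint i, uint j) @safe pure nothrow @nogc
{
    return i << (j & 31);
}

void main() @safe
{
    assert(shlMasked(1, 4) == 16);
    assert(shlMasked(1, 32) == 1); // count wraps around to 0
    assert(shlMasked(1, 33) == 2); // count wraps around to 1
}
```

With the mask written out, the result no longer depends on the backend or the CPU, which is the difference between a defined result and one left open by the language.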

Jan 02

Walter Bright <newshound2 digitalmars.com> writes:

On 1/2/2026 2:54 PM, Timon Gehr wrote:
 On 1/2/26 22:03, Walter Bright wrote:
 It seems we are in full agreement that p-q should be disallowed in @safe code,
 which is my proposal here.
 ...

 I am happy with each of these two outcomes:
 1. `p-q` is `@safe`, implementation-defined.
 2. `p-q` can be UB, must be `@system`.
 
 So, works for me.
 
 BTW, p-q is not a memory safety issue.

 
 Any type of UB is a memory safety issue.

You are obviously correct. But using known computers, it is not a memory safety 
measure. I don't see any reason anyone would implement p-q such that it trashes 
memory or sets the CPU on fire. Maybe what actually happens should be 
documented, to make it "implementation defined", but I'm not in a position to 
authoritatively document what CPUs do.

Dereferencing random pointers, on the other hand, can realistically corrupt 
memory. This is why pointer arithmetic is not allowed in @safe code.

 Any assumption that any type of UB is benign must rely on additional information
 about specific backends. So what you claim may be true with DMD, but that is
 about the extent of it.

I can't see a professionally designed CPU catching fire or corrupting memory by 
subtracting two unrelated pointers. One would have to add more transistors to 
make that happen. Nobody would buy such a machine.

Current CPUs are what they are. We live with that, and we trade off performance 
for some level of unpredictable failure.

 I would expect a CPU to just do `i<<(j&31)`.

The X86_64 and Aarch64 give different results, I ran into that bug.

 The C abstract machine is however not the CPU.

CPU design has very much followed C semantics since the 80s. Unfortunately, the 
C spec didn't nail down certain behaviors, and so we have different behaviors.

Jan 02

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/3/26 01:00, Walter Bright wrote:
 On 1/2/2026 2:54 PM, Timon Gehr wrote:
 On 1/2/26 22:03, Walter Bright wrote:
 It seems we are in full agreement that p-q should be disallowed in
 @safe code, which is my proposal here.
 ...

 I am happy with each of these two outcomes:
 1. `p-q` is `@safe`, implementation-defined.
 2. `p-q` can be UB, must be `@system`.

 So, works for me.

 BTW, p-q is not a memory safety issue.

 Any type of UB is a memory safety issue.

 
 You are obviously correct.

Underlying this admission is your utterly wrong claim, namely that it is 
a theoretical issue without practical significance.

 But using known computers, it is not a memory safety measure.

What "known computers" are doing at the machine level is only part of 
the puzzle. You can't just ignore "known compilers".

This is not about hardware.

 I don't see any reason anyone would implement p-q such 
 that it trashes memory or sets the CPU on fire.

Compiler passes just do what they do, assuming things like that if you 
see `p-q` then `p` and `q` are pointing to the same memory object.

Garbage in, garbage out. Wrong assumptions entering optimizers can and 
do cause befuddling miscompilation.

The optimizer does not care to explicitly trash your memory on `p-q`, 
it's just a side effect of completely disregarding the case where `p` 
and `q` are unrelated.

 Maybe what actually happens should be documented, to make it "implementation
 defined", but I'm not in a position to authoritatively document what CPUs do.
 ...

UB does not care about what CPUs do. Even saying "it will do whatever 
the CPU does in this and this situation" is much, much safer than saying 
"this is UB". However, most backends made for C will not be able to 
implement this semantics while still performing optimizations.

 Dereferencing random pointers, on the other hand, can realistically
 corrupt memory. This is why pointer arithmetic is not allowed in @safe
 code.
 ...

`p-q` in a C program can _realistically_ corrupt memory even if the CPU 
will never corrupt memory when subtracting addresses.

This is not just a theoretical problem, UB is UB and it has caused 
problems in practice.

 Any assumption that any type of UB is benign must rely on additional 
 information about specific backends. So what you claim may be true 
 with DMD, but that is about the extent of it.

 
 I can't see a professionally designed CPU catching fire or corrupting 
 memory by subtracting two unrelated pointers. One would have to add more 
 transistors to make that happen.

You absolutely can make a more efficient CPU by adding UB to it that can 
cause it to destroy itself or corrupt other components of the system if 
you run the wrong program. Professionals just indeed don't do that, 
because for some reason hardware reliability is taken seriously while 
software reliability is not.

CPUs come with manufacturer warranties, software comes with EULAs that 
read "ABSOLUTELY NO WARRANTY OF FITNESS FOR ANY PARTICULAR PURPOSE".

CPU manufacturers are using formal methods to verify their designs.

 Nobody would buy such a machine.
 ...

This is not about the CPU, it's about compilers.

 Current CPUs are what they are. We live with that, and we trade off 
 performance for some level of unpredictable failure.
 
 I would expect a CPU to just do `i<<(j&31)`.

 
 The X86_64 and Aarch64 give different results, I ran into that bug.
 
 The C abstract machine is however not the CPU.

 
 CPU design has very much followed C semantics since the 80s. 

The CPU does not have a concept of "memory object" or "different memory 
objects". It usually does not even distinguish addresses from other 
machine-word integers.

 Unfortunately, the C spec didn't nail down certain behaviors, and so we 
 have different behaviors.
 

This is analogous to implementation-defined behavior, not undefined 
behavior. The C spec has undefined behavior, it is not saying "do what 
the CPU does", it is saying "do whatever is expedient, e.g. so to make 
the program run fast".

Jan 02

Walter Bright <newshound2 digitalmars.com> writes:

The bottom line here is why are we arguing about this? Haven't we agreed that
p-q should be disallowed in @safe code? The rest of this message you can ignore
if you like.

---------------------

On 1/2/2026 4:46 PM, Timon Gehr wrote:
 This is not about hardware.

Good, we can move on from that issue!


 The optimizer does not care to explicitly trash your memory on `p-q`, it's just
 a side effect of completely disregarding the case where `p` and `q` are
 unrelated.

```
int i, j;
int* p = &i;
int* q = &j;
ptrdiff_t x = p - q; // p and q point to different objects
```
The compiler can detect that p-q would be undefined behavior. A sane
compiler would issue an error message upon such detection. Note that the C11
spec says not doing a "shall" means undefined behavior. Taking that literally
means any syntax/semantic error in your code can legitimately cause the compiler
to generate undefined behavior. But not a sane compiler.

And yes, I oppose optimizers that detect UB and just delete it. That's a 
disservice to the users, who find out the hard way about this behavior, rather 
than getting a useful error message.

If the compiler does not detect that error (which will be most cases), then it 
will do the reasonable thing and just subtract the two numbers, which will not 
cause memory corruption in any mainstream CPU.


 `p-q` in a C program can _realistically_ corrupt memory even if the CPU will 
 never corrupt memory when subtracting addresses.
 This is not just a theoretical problem, UB is UB and it has caused problems in 
 practice.

I know, but I haven't seen an example of it for `p-q`. It would be interesting 
if you could devise one! The UB problems I've seen were for other constructions.


 You absolutely can make a more efficient CPU by adding UB to it that can cause 
 it to destroy itself or corrupt other components of the system if you run the 
 wrong program.

I don't know if that is possible for `p-q`. It's just a subtraction. It may very
well be possible for other UBs.

 Professionals just indeed don't do that, because for some reason 
 hardware reliability is taken seriously while software reliability is not.

The reason is pretty simple. Remember the disaster with the Intel Pentium 
floating point bug? Wow was that expensive! I bore some of that cost because I 
had to add workarounds to the code generator. Software updates are a lot cheaper
than having to pry out everyone's CPU chip and replace it, and even so,
compilers had to assume they were running on a bad CPU.


 CPUs come with manufacturer warranties, software comes with EULAs that read 
 "ABSOLUTELY NO WARRANTY OF FITNESS FOR ANY PARTICULAR PURPOSE".

The software industry would cease to exist without that clause.


 CPU manufacturers are using formal methods to verify their designs.

Formal methods have bugs, too. Though I agree that formal methods are highly 
useful. I know how to set up DFA and such and get them right, but I can't say I 
have expertise in formal methods. For example, I don't know how to prove that 
DFA converges to a solution, though I know it does, because the paper I learned 
it from says they proved it :-) and have never found it to not be true. Full 
disclosure: I have no formal education in computer science, which you have 
surely inferred by now!


 The CPU does not have a concept of "memory object" or "different memory
 objects". It usually does not even distinguish addresses from other machine-word
 integers.

It does with the segmented memory system of the IBM PC, and the banked memory 
card add-ons. I wrote a software virtual memory system using banked memory and 
segment registers. You didn't really want to use an offset larger than the 
memory allocated to that segment!

But those designs are all obsolete now and irrelevant.


 Unfortunately, the C spec didn't nail down certain behaviors, and so we have 
 different behaviors.

 This is analogous to implementation-defined behavior, not undefined behavior. 
 The C spec has undefined behavior, it is not saying "do what the CPU does", it 
 is saying "do whatever is expedient, e.g. so to make the program run fast".

The commercial reality is starting in the 80s CPU designs changed to be very 
friendly to actual C behavior. The C spec doesn't say anything about expedience 
or speed (that I recall).

Jan 02

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 03/01/2026 3:29 PM, Walter Bright wrote:
 The CPU does not have a concept of "memory object" or "different 
 memory objects". It usually does not even distinguish addresses from 
 other machine-word integers.

 
 It does with the segmented memory system of the IBM PC, and the banked 
 memory card add-ons. I wrote a software virtual memory system using 
 banked memory and segment registers. You didn't really want to use an 
 offset larger than the memory allocated to that segment!
 
 But those designs are all obsolete now and irrelevant.

Web assembly is segmented :(

Jan 02

Walter Bright <newshound2 digitalmars.com> writes:

On 1/2/2026 6:37 PM, Richard (Rikki) Andrew Cattermole wrote:
 Web assembly is segmented :(

They should have talked to me first!

Jan 02

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 03/01/2026 4:24 PM, Walter Bright wrote:
 On 1/2/2026 6:37 PM, Richard (Rikki) Andrew Cattermole wrote:
 Web assembly is segmented :(

 
 They should have talked to me first!

I understand why they are doing it.

It's not like a traditional CPU ISA, it's all typed.

The killer though for D is you can't get a pointer with whatever offset 
you want into a GC object.

There are some improvements being worked on:

https://github.com/WebAssembly/memory-control/blob/main/proposals/memory-control/Overview.md

https://github.com/WebAssembly/multibyte-array-access/blob/main/proposals/multibyte-array-access/Overview.md

But what we'd need to take full advantage is a reference type that can 
point to whatever segment of memory + an arbitrary offset, and do 
arithmetic on it.

Possible to do that due to it all being typed and JIT'd.

Funnily enough I watched a video on Web Assembly's GC today, left a 
comment about how its a bit of a disappointment that it is DOA for us.

https://www.youtube.com/watch?v=nbqjDEaRkVI

Jan 02

Walter Bright <newshound2 digitalmars.com> writes:

On 1/2/2026 7:57 PM, Richard (Rikki) Andrew Cattermole wrote:
 The killer though for D is you can't get a pointer with whatever offset you want
 into a GC object.

LDC has a webassembly back end, so it is not a killer.

Jan 03

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 04/01/2026 8:30 AM, Walter Bright wrote:
 On 1/2/2026 7:57 PM, Richard (Rikki) Andrew Cattermole wrote:
 The killer though for D is you can't get a pointer with whatever 
 offset you want into a GC object.

 
 LDC has a webassembly back end, so it is not a killer.

You have misunderstood the situation.

As far as GC is concerned we are on our own and stuck on linear memory 
aka sbrk, we cannot use WasmGC with our pointers.

Upstream ldc does not have runtime support and I'm not sure I'd even
suggest the -betterC support as acceptable.

Having the target enabled isn't the same thing as being a supported target.

Jan 03

Walter Bright <newshound2 digitalmars.com> writes:

On 1/3/2026 6:03 PM, Richard (Rikki) Andrew Cattermole wrote:
 As far as GC is concerned we are on our own and stuck on linear memory aka sbrk,
 we cannot use WasmGC with our pointers.
 
 Upstream ldc does not have runtime supported and I'm not sure I'd even suggest 
 the -betterC support as acceptable.
 
 Having the target enabled isn't the same thing as being a supported target.


I don't see why calls to `new` cannot be redirected to whatever WASM does?

Jan 03

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 04/01/2026 3:30 PM, Walter Bright wrote:
 On 1/3/2026 6:03 PM, Richard (Rikki) Andrew Cattermole wrote:
 As far as GC is concerned we are on our own and stuck on linear memory 
 aka sbrk, we cannot use WasmGC with our pointers.

 Upstream ldc does not have runtime supported and I'm not sure I'd even 
 suggest the -betterC support as acceptable.

 Having the target enabled isn't the same thing as being a supported 
 target.

 
 
 I don't see why calls to `new` cannot be redirected to whatever WASM does?

You can't do pointer arithmetic with WasmGC.
No subtraction, no getting pointers to fields, nothing like that.
That is the GC offering currently.

For the linear memory, it's a memory mapper only, sbrk.
Oh and you can have multiple linear memories that you have to keep track 
what the offset is actually for when dereferencing.

They are typed entirely differently, you cannot mix them.
It is exactly like near vs far pointers.

Basically you're on your own as a compiler developer.

Jan 03

Walter Bright <newshound2 digitalmars.com> writes:

On 1/3/2026 7:30 PM, Richard (Rikki) Andrew Cattermole wrote:
 I don't see why calls to `new` cannot be redirected to whatever WASM does?

 
 You can't do pointer arithmetic with WasmGC.
 No subtraction, no getting pointers to fields, nothing like that.
 That is the GC offering currently.

```
struct S { int a; }
S* s = new S();
s.a = 3;
```

What's the problem?

 For the linear memory, its a memory mapper only, sbrk.
 Oh and you can have multiple linear memories that you have to keep track what 
 the offset is actually for when dereferencing.

I don't get it.

 They are typed entirely differently, you cannot mix them.
 It is exactly like near vs far pointers.

??

Jan 03

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 04/01/2026 6:28 PM, Walter Bright wrote:
 On 1/3/2026 7:30 PM, Richard (Rikki) Andrew Cattermole wrote:
 I don't see why calls to `new` cannot be redirected to whatever WASM 
 does?

 You can't do pointer arithmetic with WasmGC.
 No subtraction, no getting pointers to fields, nothing like that.
 That is the GC offering currently.

 
 ```
 struct S { int a; }
 S* s = new S();
 s.a = 3;
 ```
 
 What's the problem?

Loading and storing fields work.

But this doesn't when using the Wasm GC:

``int* ptr = &s.a;``

Or this:

```d
func(s.a);
void func(ref int);
```

 For the linear memory, its a memory mapper only, sbrk.
 Oh and you can have multiple linear memories that you have to keep 
 track what the offset is actually for when dereferencing.

 
 I don't get it.

Ahhh, I see.

Here is the man page for bsd 2.11 which is the last BSD (that is 
actively in use by retro community) to run on PDP-11: 
https://man.freebsd.org/cgi/man.cgi?query=sbrk&apropos=0&sektion=0&manpath=2.11+BSD&arch=default&format=html

It was removed in Posix 2001.
https://en.wikipedia.org/wiki/Sbrk

This is how memory is mapped into a process and then is cut up and 
returned by memory allocators like malloc.

For an example of this, open K&R C Programming Language 2nd edition to 
page 185. It has an example malloc implementation that uses sbrk.

This isn't how it's done today; these days memory is mapped using mmap
instead, sbrk isn't an option.

Web assembly folks however decided to do it the way we did it in the
70s before MMUs were a thing.

 They are typed entirely differently, you cannot mix them.
 It is exactly like near vs far pointers.

 
 ??

Okay, I think I understand the confusion.

Web assembly isn't a byte code like x86 is.

It is fully typed, a reference to a GC object is different to a pointer 
to linear memory (sbrk).

In pseudo code:

```
struct S {
	int field;
}

linear(S*) l = cast(S*)linear_alloc(4);
l++; // ok


gc(S) g = new S;
int* ptr = &g.field; // error no instruction to do this

l = g; // error
g = l; // error

l - l; // ok
g - g; // error no instruction to do this
```
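
Editorial sketch: the pseudo code above can be modelled in ordinary D with two unrelated wrapper types, which makes the "cannot mix them" point checkable at compile time. All names here (`LinearPtr`, `GcRef`, `get`, `set`) are hypothetical illustrations. They are not a WebAssembly API or anything LDC provides.

```d
struct S { int field; }

// Models a pointer into linear memory: just an offset, so arithmetic works.
struct LinearPtr(T)
{
    size_t offset; // byte offset into one linear memory

    // l + n: advance by n elements
    LinearPtr opBinary(string op : "+")(ptrdiff_t n) const
    {
        return LinearPtr(offset + n * T.sizeof);
    }

    // l - l: distance in elements
    ptrdiff_t opBinary(string op : "-")(LinearPtr rhs) const
    {
        return (cast(ptrdiff_t) offset - cast(ptrdiff_t) rhs.offset)
            / cast(ptrdiff_t) T.sizeof;
    }
}

// Models an opaque GC reference: whole-field load/store only, no operators.
struct GcRef(T)
{
    private T* obj;
    auto get(string name)() const { return mixin("obj." ~ name); }
    void set(string name, V)(V v) { mixin("obj." ~ name ~ " = v;"); }
}

static assert( __traits(compiles, LinearPtr!S.init - LinearPtr!S.init)); // l - l: ok
static assert(!__traits(compiles, GcRef!S.init - GcRef!S.init));         // g - g: error
static assert(!__traits(compiles, { LinearPtr!S l = GcRef!S.init; }));   // l = g: error
static assert(!__traits(compiles, { GcRef!S g = LinearPtr!S.init; }));   // g = l: error

void main()
{
    S obj;
    auto g = GcRef!S(&obj);  // private is module-level in D, so this is allowed here
    g.set!"field"(3);        // storing a field works
    assert(g.get!"field"() == 3);

    auto l = LinearPtr!S(0);
    assert((l + 2) - l == 2); // arithmetic is only available on the linear kind
}
```

The model deliberately offers no way to obtain an `int*` into the object behind a `GcRef`, which is the restriction the post describes for `&g.field`.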

Jan 03

Walter Bright <newshound2 digitalmars.com> writes:

Thanks for the explanation. What it suggests to me is that a subset of D will
work perfectly fine with WASM.

Jan 04

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 05/01/2026 1:33 PM, Walter Bright wrote:
 Thanks for the explanation. What it suggest to me is that a subset of D 
 will work perfectly fine with WASM.

Yes it should do, but it'll be limited enough that people will get 
annoyed with it rather fast, at the very least I would.

Not worth my time building up a new backend and a new runtime for it.

The whole time it'll be nope can't do that, or that or that... Ugh me no 
like.

Jan 04

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Mon, Jan 05, 2026 at 01:48:25PM +1300, Richard (Rikki) Andrew Cattermole via Digitalmars-d wrote:
 On 05/01/2026 1:33 PM, Walter Bright wrote:
 Thanks for the explanation. What it suggest to me is that a subset
 of D will work perfectly fine with WASM.

 
 Yes it should do, but it'll be limited enough that people will get
 annoyed with it rather fast, at the very least I would.
 
 Not worth my time building up a new backend and a new runtime for it.
 
 The whole time it'll be nope can't do that, or that or that... Ugh me
 no like.

In the past I've managed to get a rudimentary (highly-hacked) druntime
running in WASM, with bare-minimum support for module ctors and JS
interface.  It's quite comfortable to use, except for memory allocation.

Problem is, either you have to port the entire GC implementation to
WASM, which will take up a LOT of code (i.e., slow loading of your
project over the web), and require gobs of memory to run (your memory
requirements will go way up, even for the simplest of modules), or you
have to live with completely no GC, or some hackish in-between.  For
things like frame-based animated games, you could get away with
per-frame allocation, i.e., allocate everything statically before the
main loop, then during the main loop all allocations only last until the
end of the frame, after that it's thrown away.  While it will work, it
lacks the comfort of programming with full GC support.  You couldn't
just use standard D features like delegates and closures without
worrying about lifetime issues.  Things like threading and other
advanced features will of course be very limited as well.
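
Editorial sketch of the per-frame allocation scheme described above: one buffer is set aside before the main loop, allocations bump an offset during the frame, and everything is discarded by resetting that offset at the end of the frame. `FrameArena`, `make` and `reset` are hypothetical names, and this is a minimal illustration under those assumptions, not the code from the project mentioned in the post.

```d
// Hypothetical per-frame arena: bump allocation over a caller-supplied buffer.
struct FrameArena
{
    private ubyte[] buffer;
    private size_t used;

    this(ubyte[] backing) { buffer = backing; }

    // Returns `count` default-initialized elements, or null if they do not fit.
    T[] make(T)(size_t count) @trusted
    {
        // Round the next free address up to T's alignment.
        immutable base = cast(size_t) buffer.ptr;
        immutable aligned = (base + used + T.alignof - 1) & ~(T.alignof - 1);
        immutable start = aligned - base;

        if (start > buffer.length) return null;
        if (count > (buffer.length - start) / T.sizeof) return null;

        used = start + count * T.sizeof;
        auto slice = cast(T[]) buffer[start .. used];
        slice[] = T.init;
        return slice;
    }

    // End of frame: every slice handed out so far must no longer be used.
    void reset() { used = 0; }
}

void main()
{
    static ubyte[1024] storage;          // set aside once, before the main loop
    auto arena = FrameArena(storage[]);

    foreach (frame; 0 .. 3)
    {
        auto positions = arena.make!float(16); // valid only until reset()
        assert(positions.length == 16);
        positions[0] = frame;
        arena.reset();                   // the frame's allocations are thrown away
    }

    assert(arena.make!int(1000) is null); // 4000 bytes do not fit in 1024
}
```

Nothing stops a slice from being kept past `reset()`, which is the lifetime worry the post raises about delegates and closures under this scheme.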

I haven't had the motivation to actually port the GC to WASM, because it
adds so much code that it becomes a bigger project than the target app
itself. I ended up going back to JS for web projects just to avoid
having to grapple with these issues.  Dreamed about writing a D to JS
translator, actually, just haven't gotten around to it yet. :-P

WASM GC is a thing, but it requires treating GC references as separate
types from normal pointers, which D's memory model just doesn't fit in
well with.  Host-managed GC'd memory is also treated differently from
linear memory; the layout of the object must be known to the host so
generic pointers and unions are unsupported.  It also requires LLVM
support if you're using LDC, but AFAIK LLVM doesn't have full support
for WASM GC yet.  IOW, WASM GC imposes restrictions that are
incompatible with D's memory model, so it will be very hard to work with
it.  The only alternative for full GC support is to port the GC itself
into WASM.  As I said, it greatly increases the payload size and memory
requirements, and also won't be as efficient as the host browser's GC.
All in all, a suboptimal situation.


T

-- 
First Rule of History: History doesn't repeat itself -- historians merely
repeat each other.

Jan 04

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 5 January 2026 at 01:17:11 UTC, H. S. Teoh wrote:
 In the past I've managed to get a rudimentary (highly-hacked) 
 druntime running in WASM

https://dpldocs.info/this-week-in-arsd/Blog.Posted_2024_10_25.html

OpenD did most of it successfully in 2024.

Jan 04

Walter Bright <newshound2 digitalmars.com> writes:

On 1/4/2026 4:48 PM, Richard (Rikki) Andrew Cattermole wrote:
 The whole time it'll be nope can't do that, or that or that... Ugh me no like.

It'd be little different from the WASM targets for C++.

Jan 04

Adam Ruppe <destructionator gmail.com> writes:

 I haven't had the motivation to actually port the GC to WASM,

Just use opend, we ported druntime to wasm and make cross compilation
easy. see https://dpldocs.info/this-week-in-arsd/Blog.Posted_2024_10_25.html

Yeah it is a lil bloated, like a megabyte download, but meh.

Jan 04

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/3/26 03:29, Walter Bright wrote:
 The bottom line here is why are we arguing about this?

You brought up some tangential points that I think are based on flawed 
reasoning.

 Haven't we agreed that p-q should be disallowed in  safe code?

With the semantics you clarified it is intended to have, it must indeed 
be `@system`.

 And yes, I oppose optimizers that detect UB and just delete it.

The optimizers don't crave or need your approval, all they need is your 
specification that it is UB. You are thereby inviting them to do this.

Your UB is their dead code. And it helps them delete real dead code that 
they otherwise would not be able to detect. They will not stop doing 
this unless the language stops giving them UB to exploit.

If you don't mean UB, don't say UB.

There are some claims in your last post with which I disagree, but as I 
said, I will not sacrifice sleep in order to argue against everything.

Jan 03

Walter Bright <newshound2 digitalmars.com> writes:

On 1/1/2026 6:11 PM, Timon Gehr wrote:
 Let's say the frontend now treats `p-q` as `@system`, and there is not even any
 documentation of what its semantics is supposed to be.

The current documentation says:

"If both operands are pointers, and the operator is -, the pointers are 
subtracted and the result is divided by the size of the type pointed to by the 
operands. In this calculation the assumed size of void is one byte. It is an 
error if the pointers point to different types. The type of the result is 
ptrdiff_t."

https://dlang.org/spec/expression.html#pointer_arithmetic

C11 says:

"When two pointers are subtracted, both shall point to elements of the same 
array object, or one past the last element of the array object; the result is 
the difference of the subscripts of the two array elements."

and:

"The behavior is undefined in the following circumstances: A ‘‘shall’’ or 
‘‘shall not’’ requirement that appears outside of a constraint is violated 
(clause 4)."

In general, it is not possible for the compiler to ensure two pointers point to 
the same object without expensive instrumentation added to the code. The 
practical effect is to assume they do, subtract the values, and divide by the 
size of the type.

The only thing D can do is in @safe code simply disallow p-q, as there are good 
alternatives to do the equivalent thing.
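
As a minimal sketch of the documented semantics quoted above (the helper name `elementsBetween` is made up for illustration, and the code is deliberately `@system`), subtracting two pointers into the same array yields the difference of the subscripts as a `ptrdiff_t`, not the byte distance:

```d
// Hypothetical helper: both pointers are assumed to point into the same array.
ptrdiff_t elementsBetween(const(int)* p, const(int)* q) @system
{
    // The addresses are subtracted and the result is divided by int.sizeof.
    return q - p;
}

void main() @system
{
    int[8] arr;
    const(int)* p = &arr[1];
    const(int)* q = &arr[6];

    // The result type is ptrdiff_t, as the spec text states.
    static assert(is(typeof(q - p) == ptrdiff_t));

    // Difference of the subscripts: 6 - 1.
    assert(elementsBetween(p, q) == 5);
    // The raw address difference is 5 * int.sizeof bytes.
    assert(cast(size_t) q - cast(size_t) p == 5 * int.sizeof);
    // Swapping the operands gives a negative result, which is why the
    // result is signed (ptrdiff_t) rather than size_t.
    assert(elementsBetween(q, p) == -5);
}
```

Nothing in this sketch checks that `p` and `q` belong to the same memory object; that is exactly the property the compiler cannot verify in general.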

Jan 02

Walter Bright <newshound2 digitalmars.com> writes:

https://github.com/dlang/dlang.org/pull/4358

Jan 02

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/2/26 20:51, Walter Bright wrote:
 ...
 
 In general, it is not possible for the compiler to ensure two pointers 
 point to the same object without expensive instrumentation added to the 
 code. The practical effect is to assume they do, subtract the values, 
 and divide by the size of the type.
 ...

This is what people intuitively assume will happen, but UB is not this. 
UB is notoriously prone to confusing programmers as well as compiler 
writers.

One memory-safe alternative semantics to UB would be: if arguments point 
to different memory objects, you may get any result value. This can 
still be implemented by your "practical effect" above, but now it's 
memory safe (in isolation).

You could have an even stronger semantics that also guarantees things 
like `p is p+(q-p)`.

The source and target semantics are the only things the optimizer really 
tends to care about.

 The only thing D can do is in @safe code simply disallow p-q, as there 
 are good alternatives to do the equivalent thing.

It's absolutely not the only possible thing. It is just one way (the C 
way) to deal with the issue.

Jan 02

Walter Bright <newshound2 digitalmars.com> writes:

On 1/2/2026 3:17 PM, Timon Gehr wrote:
 One memory-safe alternative semantics to UB would be: if arguments point to 
 different memory objects, you may get any result value. This can still be 
 implemented by your "practical effect" above, but now it's memory safe (in 
 isolation).

I've said that it was memory safe all along. I've said this proposal is 
extending @safe beyond memory safety to include bug detection of other things 
like p-q.

 You could have an even stronger semantics that also guarantees things like
 `p is p+(q-p)`.

I don't see any particular use for recognizing that. There are an infinite 
number of patterns that can be recognized, it's only useful to recognize ones 
that occur commonly.

BTW, the optimizer does recognize i+(j-i) as being just j, and it will do it 
for pointers to 1 byte objects. For pointers to int, the intermediate code 
looks like p+((q-p)/4)*4 which is not recognized.

 The only thing D can do is in @safe code simply disallow p-q, as there are 
 good alternatives to do the equivalent thing.

 
 It's absolutely not the only possible thing. It is just one way (the C way) to 
 deal with the issue.

Please explain the other ways.

Jan 03

Timon Gehr <timon.gehr gmx.ch> writes:

On 1/3/26 20:28, Walter Bright wrote:
 On 1/2/2026 3:17 PM, Timon Gehr wrote:
 One memory-safe alternative semantics to UB would be: if arguments 
 point to different memory objects, you may get any result value. This 
 can still be implemented by your "practical effect" above, but now 
 it's memory safe (in isolation).

 
 I've said that it was memory safe all along.

You have both said that and then also said that I am obviously correct 
when I say that UB is not memory safe. `p-q` is sometimes UB in C.

 I've said this proposal is 
 extending @safe beyond memory safety to include bug detection of other 
 things like p-q.
 ...

You said that, it's false. This is not merely a bug, it is UB if you 
copy the C semantics.

 You could have an even stronger semantics that also guarantees things 
 like `p is p+(q-p)`.

 
 I don't see any particular use for recognizing that.

It was an example of what even stronger semantics would possibly 
guarantee, not a suggestion to detect something as a special case. In 
this thread, you have often shot off on a tangent and then made a moot 
unrelated claim along that tangent. I have to assume these are all the 
result of misunderstandings.

 There are an 
 infinite number of patterns that can be recognized, it's only useful to 
 recognize ones that occur commonly.
 ...

There was literally someone in this thread who said they commonly rely 
on this particular pattern and that it gives them better performance.

 BTW, the optimizer does recognize i+(j-i) as being just j, and it will 
 do it for pointers to 1 byte objects. For pointers to int, the 
 intermediate code looks like p+((q-p)/4)*4 which is not recognized.
 ...

The point was not about whether it is actually just the expression `j`, 
the point was whether it will always result in the same value as `j`, no 
matter how it got there (assuming the pointers were properly aligned).
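
A rough sketch of why the alignment caveat matters, modelling the lowered form `p+((q-p)/4)*4` on plain integer addresses instead of real pointers (the function name `roundTrip` and the addresses are hypothetical, chosen only for illustration):

```d
// Models p + ((q - p) / 4) * 4 for int pointers, using byte addresses.
size_t roundTrip(size_t p, size_t q)
{
    enum ptrdiff_t size = int.sizeof;
    // Signed element count; integer division truncates toward zero.
    ptrdiff_t elems = cast(ptrdiff_t)(q - p) / size;
    // Scale back to bytes and add to the base address.
    return p + cast(size_t)(elems * size);
}

void main()
{
    // Byte distance is a multiple of int.sizeof: the round trip yields q.
    assert(roundTrip(0x1000, 0x1010) == 0x1010);
    // It also holds when q lies below p.
    assert(roundTrip(0x1010, 0x1000) == 0x1000);
    // Byte distance is not a multiple of int.sizeof: the division
    // truncates and the round trip no longer yields q.
    assert(roundTrip(0x1000, 0x1012) != 0x1012);
}
```

On this integer model the identity holds whenever the byte distance is a multiple of the element size, regardless of which memory objects the addresses belong to. Whether the language guarantees the same for real pointers is the question under discussion.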

 The only thing D can do is in @safe code simply disallow p-q, as 
 there are good alternatives to do the equivalent thing.

 It's absolutely not the only possible thing. It is just one way (the C 
 way) to deal with the issue.

 
 Please explain the other ways.

I already have, in that same post. What is important for a `@safe` 
construct is that it always has defined behavior. That behavior can be 
nondeterministic, it just can't be undefined. You can for example say: 
the semantics is that it will yield an arbitrary value or crash. That 
would be a memory safe semantics. UB is not.

Jan 03

Walter Bright <newshound2 digitalmars.com> writes:

I'm just going to make p-q an error in @safe code. If someone still wants to do 
it, they can cast the pointers to size_t, or make it @trusted/@system code.
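
A minimal sketch of the two workarounds mentioned above. The helper names `distanceViaCast` and `distanceWithin` are hypothetical, and the sketch assumes that casting a pointer to `size_t` stays permitted in `@safe` code, as the post implies:

```d
// Hypothetical helper: the element distance is computed from integer
// casts, so no pointer subtraction appears in @safe code.
ptrdiff_t distanceViaCast(const(int)* p, const(int)* q) @safe
{
    enum ptrdiff_t size = int.sizeof;
    return cast(ptrdiff_t)(cast(size_t) q - cast(size_t) p) / size;
}

// Hypothetical @trusted wrapper: both pointers are derived from the same
// slice and the indices are bounds-checked, so the subtraction always
// refers to a single memory object.
ptrdiff_t distanceWithin(const(int)[] arr, size_t i, size_t j) @trusted
{
    return &arr[j] - &arr[i];
}

void main() @safe
{
    int[] arr = new int[8];
    assert(distanceViaCast(&arr[1], &arr[6]) == 5);
    assert(distanceWithin(arr, 1, 6) == 5);
    assert(distanceWithin(arr, 6, 1) == -5);
}
```

The `@trusted` variant takes a slice rather than two raw pointers so that callers cannot pass pointers into unrelated objects; a wrapper accepting arbitrary pointers would simply move the problem behind the `@trusted` boundary.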

Jan 03

jmh530 <john.michael.hall gmail.com> writes:

On Thursday, 1 January 2026 at 06:15:09 UTC, Walter Bright wrote:
 [snip]
 So this would be valid, as the two pointers are known to point 
 to the same memory object.
 [snip]

To what extent can D know when pointers are known to point to the 
same object?

Jan 02

Walter Bright <newshound2 digitalmars.com> writes:

On 1/2/2026 11:20 AM, jmh530 wrote:
 On Thursday, 1 January 2026 at 06:15:09 UTC, Walter Bright wrote:
 [snip]
 So this would be valid, as the two pointers are known to point to the same 
 memory object.
 [snip]

 
 To what extent can D know when pointers are known to point to the same object?

Some can be done trivially, such as (&c - &d). More can be discovered with DFA 
(Data Flow Analysis), but not really that much. Just like not many cases of 
null dereference can be unambiguously discovered with DFA.

Jan 02

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 03/01/2026 9:21 AM, Walter Bright wrote:
 On 1/2/2026 11:20 AM, jmh530 wrote:
 On Thursday, 1 January 2026 at 06:15:09 UTC, Walter Bright wrote:
 [snip]
 So this would be valid, as the two pointers are known to point to the 
 same memory object.
 [snip]

 To what extent can D know when pointers are known to point to the same 
 object?

 
 Some can be done trivially, such as (&c - &d). More can be discovered 
 with DFA (Data Flow Analysis), but not really that much. Just like not 
 many cases of null dereference can be unambiguously discovered with DFA.

Four days ago I started implementing value tracking.

I can get objects like:

```d
int a, b;
int* ptr = condition ? &a : &b;
```

And see that ptr could be either a or b.

It's also possible to see:

```d
int* ptr = new int, oldObj = ptr;

foreach(i; 0 .. 10) {
	ptr = new int;
}

assert(ptr !is oldObj);
```

But what you can't do with DFA alone:

```d
void func(int* a, int* b) {
	assert(a is b); // how can I know this?
}
```

While I'd love to have knowledge that pointers are not from the same 
object, or are, realistically it's beyond what can be annotated on code 
explicitly, let alone inferred.

I've been trying to solve for this for well over a year and have not 
made progress on it.

The best you can really hope for here, I suspect, is ownership transfer 
and modelling it in the function that borrows from it, as well as new 
and stack allocations etc.

Jan 02

Walter Bright <newshound2 digitalmars.com> writes:

Sorry about that. If you ask me, it may be possible, but it sure looks
impractical.

Jan 03

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 04/01/2026 3:13 PM, Walter Bright wrote:
 Sorry about that. If you ask me, it may be possible, but it sure looks 
 impractical.

It's not your fault, it's a hard problem that I don't think has any good 
answers. I expect that I would've found one by now if it wasn't.

Jan 03

Walter Bright <newshound2 digitalmars.com> writes:

https://github.com/dlang/dmd/pull/22348

Jan 03

jmh530 <john.michael.hall gmail.com> writes:

On Sunday, 4 January 2026 at 06:57:11 UTC, Walter Bright wrote:
 https://github.com/dlang/dmd/pull/22348

2024 edition?

And I recall a discussion of a procedure in place for checking 
for breakage in projects. Is there anything formal on this front?

Jan 04
