digitalmars.D - Is D's pointer subtraction more permissive than C (and C++)?

reply Ali Çehreli <acehreli yahoo.com> writes:
As the following quote from a Microsoft document claims, and as I've 
already known, pointer subtraction is legal only if the pointers are 
into the same array: "ANSI 3.3.6, 4.1.1 The type of integer required to 
hold the difference between two pointers to elements of the same array, 
ptrdiff_t." ( 
https://docs.microsoft.com/en-us/cpp/c-language/pointer-subtraction?view=msvc-170 
)

I suspect "array" means "a block of memory" there because arrays are 
ordinarily malloc'ed pieces of memory in C.

1) Is D more permissive (or anemic in documentation:))? I ask because 
paragraph 5 below does not mention the pointers should be related in any 
way:

   https://dlang.org/spec/expression.html#pointer_arithmetic

2) Is subtracting pointers that used to be in the same array legal?

void main() {
   auto a = [ 1, 2 ];
   auto b = a;
   assert(a.ptr - b.ptr == 0);    // i) Obviously legal?

   // Drop the first element
   a = a[1..$];
   assert(a.ptr - b.ptr == 1);    // ii) GC-behaviorally legal?

   // Save the pointer
   const old_aPtr = a.ptr;
   // and move the array to another memory
   a.length = 1_000_000;
   // Expect a and b are on different blocks of memory
   assert(a.ptr != old_aPtr);

   assert(old_aPtr - b.ptr == 1);  // iii) Practically legal?
}

Regardless of your answer, I will go ahead and perform that last 
subtraction :).

Ali

P.S. I am trying to implement a type where slices will follow the 
elements as the elements may be moved in memory:

   const old_aPtr = a.ptr;

   a ~= e;
   if (a.ptr != old_aPtr) {
     // Elements are moved; adjust the slice
     assert(b.ptr >= old_aPtr);
     const old_bOffset = b.ptr - old_aPtr;
     b = a[old_bOffset .. $];
   }

If you ask why I don't keep offsets instead of slices to begin with, I 
want to use pointers (implicitly in D slices) so that they participate 
in the ownership of array elements so that the GC does not free earlier 
elements as the buffer is popFronted as well.
Apr 01 2022
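For comparison, the "follow the elements when they move" idea from the P.S. can be sketched in C, where the strict same-array rule applies. This is only a sketch under assumptions: `grow_and_rebase` is a hypothetical helper, not something from the post, and it takes the offset before the buffer can move, so the subtraction is between two pointers into the same live block. C has no GC, so the ownership concern raised in the post is not addressed here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: grow the block at *base to new_count ints and
 * re-point *view at the same element in the (possibly moved) block.
 * Returns the new base, or NULL if the allocation failed (nothing changes). */
static int *grow_and_rebase(int **base, size_t new_count, int **view)
{
    /* Take the offset while both pointers still refer to the same
     * live allocation; this subtraction is well defined. */
    ptrdiff_t offset = *view - *base;

    int *moved = realloc(*base, new_count * sizeof *moved);
    if (moved == NULL)
        return NULL;

    *base = moved;
    *view = moved + offset;   /* same element, new location */
    return moved;
}
```

After a successful call, `*view` refers to the same element it did before, whether or not `realloc` moved the block.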
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/22 11:52 AM, Ali Çehreli wrote:
 As the following quote from a Microsoft document claims, and as I've 
 already known, pointer subtraction is legal only if the pointers are 
 into the same array: "ANSI 3.3.6, 4.1.1 The type of integer required to 
 hold the difference between two pointers to elements of the same array, 
 ptrdiff_t." ( 
 https://docs.microsoft.com/en-us/cpp/c-language/pointer-subtraction?view=msvc-170 
 )
 
 I suspect "array" means "a block of memory" there because arrays are 
 ordinarily malloc'ed pieces of memory in C.
I'm assuming this has to do with the ability to detect artifacts of how the compiler/library lays out memory, which shouldn't really figure into program behavior. In practice, I don't see how it affects the behavior *of the compiler*. When you subtract two pointers, I don't see how the compiler/optimizer can make some other decision based on the subtraction not being between two pointers to the same block of memory.
 
 1) Is D more permissive (or anemic in documentation:))? I ask because 
 paragraph 5 below does not mention the pointers should be related in any 
 way:
 
    https://dlang.org/spec/expression.html#pointer_arithmetic
I assume this is because nobody thought about it? But I don't see a problem with omitting that requirement.
 
 2) Is subtracting pointers that used to be in the same array legal?
 
 void main() {
    auto a = [ 1, 2 ];
    auto b = a;
    assert(a.ptr - b.ptr == 0);    // i) Obviously legal?
 
    // Drop the first element
    a = a[1..$];
    assert(a.ptr - b.ptr == 1);    // ii) GC-behaviorally legal?
 
    // Save the pointer
    const old_aPtr = a.ptr;
    // and move the array to another memory
    a.length = 1_000_000;
    // Expect a and b are on different blocks of memory
    assert(a.ptr != old_aPtr);
 
    assert(old_aPtr - b.ptr == 1);  // iii) Practically legal?
 }
Assuming C rules, I still think all this is legal. I'd even hazard to guess it's legal to do this:

```c
struct S {
   int arr1[5], arr2[5];
};

void foo()
{
   struct S s;
   ptrdiff_t p = &s.arr1[0] - &s.arr2[0];
}
```

Because you know the relationship between the pointers is defined. I.e. this is NEVER going to change from run to run, or build to build.
 
 Regardless of your answer, I will go ahead and perform that last 
 subtraction :).
 
 Ali
 
 P.S. I am trying to implement a type where slices will follow the 
 elements as the elements may be moved in memory:
 
    const old_aPtr = a.ptr;
 
    a ~= e;
    if (a.ptr != old_aPtr) {
      // Elements are moved; adjust the slice
      assert(b.ptr >= old_aPtr);
      const old_bOffset = b.ptr - old_aPtr;
      b = a[old_bOffset .. $];
    }
 
 If you ask why I don't keep offsets instead of slices to begin with, I 
 want to use pointers (implicitly in D slices) so that they participate 
 in the ownership of array elements so that the GC does not free earlier 
 elements as the buffer is popFronted as well.
This should be fine. I would suggest to store things as offsets anyway, and have accessors for the pointers. -Steve
Apr 01 2022
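The "store offsets, expose pointers through accessors" suggestion could look roughly like this in C. It is a minimal sketch under assumptions: `Buffer`, `View`, `view_ptr`, and `buffer_append` are hypothetical names chosen for illustration. Because a view holds an offset rather than a raw pointer, nothing needs adjusting when the storage moves, and no cross-block pointer subtraction is ever performed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable buffer of ints. */
typedef struct {
    int   *data;
    size_t len;
    size_t cap;
} Buffer;

/* Hypothetical view: remembers an offset into the buffer, not a pointer. */
typedef struct {
    const Buffer *buf;
    size_t        offset;
} View;

/* Accessor: the pointer is recomputed from the buffer's current base. */
static int *view_ptr(View v)
{
    return v.buf->data + v.offset;
}

/* Append one element; the storage may move. Returns 0 on success, -1 on
 * allocation failure (the buffer is left unchanged in that case). */
static int buffer_append(Buffer *b, int value)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 4;
        int *moved = realloc(b->data, new_cap * sizeof *moved);
        if (moved == NULL)
            return -1;
        b->data = moved;
        b->cap  = new_cap;
    }
    b->data[b->len++] = value;
    return 0;
}
```

The trade-off mentioned in the original post still applies: in D, a view that holds only an offset does not by itself keep earlier elements alive for the GC.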
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 4/1/22 10:39, Steven Schveighoffer wrote:

 I don't see how the compiler/optimizer
 can make some other decision based on the subtraction not being between
 two pointers to the same block of memory.
I think this rule is related to C's accepting wildly different platforms, some of which may have different kinds of memory. Two pointers to different kinds of memory may not be subtracted. Ali
Apr 01 2022
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/22 4:22 PM, Ali Çehreli wrote:
 On 4/1/22 10:39, Steven Schveighoffer wrote:
 
  > I don't see how the compiler/optimizer
  > can make some other decision based on the subtraction not being between
  > two pointers to the same block of memory.
 
 I think this rule is related to C's accepting wildly different 
 platforms, some of which may have different kinds of memory. Two 
 pointers to different kinds of memory may not be subtracted.
Well, can the pointers be subtracted? Yes. What is the result? If they are in the same block, the difference in elements between two pointers. If they are not in the same block, anything. This is why I don't know that it's important to avoid it. -Steve
Apr 01 2022
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 4/1/22 13:47, Steven Schveighoffer wrote:

 I think this rule is related to C's accepting wildly different
 platforms, some of which may have different kinds of memory. Two
 pointers to different kinds of memory may not be subtracted.
Well, can the pointers be subtracted? Yes.
My understanding is that depending on the CPU, certain operations would make the CPU barf. I am not sure but the old protected memory, extended memory, etc. systems might not be able to subtract between the systems at all. (Not sure; I am making this up.)
 This is why I don't know that it's important to avoid it.
I will not avoid it. :) Ali
Apr 01 2022
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Apr 01, 2022 at 01:56:19PM -0700, Ali Çehreli via Digitalmars-d wrote:
 On 4/1/22 13:47, Steven Schveighoffer wrote:
 
 I think this rule is related to C's accepting wildly different
 platforms, some of which may have different kinds of memory. Two
 pointers to different kinds of memory may not be subtracted.
Well, can the pointers be subtracted? Yes.
My understanding is that depending on the CPU, certain operations would make the CPU barf. I am not sure but the old protected memory, extended memory, etc. systems might not be able to subtract between the systems at all. (Not sure; I am making this up.)
[...]

In the bad ole days of segmented protected memory (around the days of the 386 or 486, IIRC), you could have memory with different segment prefixes, referenced using a convoluted scheme of far ptrs and near ptrs. Near ptrs are relative to a particular segment; subtracting near ptrs associated with diverse segment pointers yields nonsensical values. You can subtract far ptrs, sorta-kinda, but the results are likely to be either garbage, or else refer to an address that can't be addressed with existing segment pointers. So basically, it's Trouble with a capital T.

On modern machines, though, this is no longer relevant. I think people figured out real quick that segmented addressing is just way more trouble than it's worth, so we came running back, tail between legs, to the flat memory model and embraced it like there's no tomorrow. :-D

T

-- 
What doesn't kill me makes me stranger.
Apr 01 2022
next sibling parent norm <norm.rowtree gmail.com> writes:
On Friday, 1 April 2022 at 21:20:00 UTC, H. S. Teoh wrote:
 On Fri, Apr 01, 2022 at 01:56:19PM -0700, Ali Çehreli via 
 Digitalmars-d wrote:
 On 4/1/22 13:47, Steven Schveighoffer wrote:
 
 I think this rule is related to C's accepting wildly 
 different platforms, some of which may have different kinds 
 of memory. Two pointers to different kinds of memory may 
 not be subtracted.
Well, can the pointers be subtracted? Yes.
My understanding is that depending on the CPU, certain operations would make the CPU barf. I am not sure but the old protected memory, extended memory, etc. systems might not be able to subtract between the systems at all. (Not sure; I am making this up.)
[...] In the bad ole days of segmented protected memory (around the days of the 386 or 486, IIRC), you could have memory with different segment prefixes, referenced using a convoluted scheme of far ptrs and near ptrs. Near ptrs are relative to a particular segment; subtracting near ptrs associated with diverse segment pointers yields nonsensical values. You can subtract far ptrs, sorta-kinda, but the results are likely to be either garbage, or else refer to an address that can't be addressed with existing segment pointers. So basically, it's Trouble with a capital T. On modern machines, though, this is no longer relevant. I think people figured out real quick that segmented addressing is just way more trouble than it's worth, so we came running back, tail between legs, to the flat memory model and embraced it like there's no tomorrow. :-D T
Ahh the "good" old days were not that good really when it came to addressing :-)

All x86 still start up in real mode and the "flat" modes (protected mode, long mode etc.) are still reached by SW putting the CPU into that mode. Today that happens in your boot loader as all desktop SW pretty much runs in 64bit flat mode but back in the 90's most SW would have to manage this itself.

There is an old, yet still interesting, blog about Win95 and the world as it was when we were all slowly migrating from 16bit real mode DOS to 32 bit flat mode OS's: https://devblogs.microsoft.com/oldnewthing/20071224-00/?p=24063

(Any blog by Raymond Chen is well worth a read IMO, I always learn something new)

Sorry my post has gone completely off topic.
Apr 01 2022
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/22 5:20 PM, H. S. Teoh wrote:
 On Fri, Apr 01, 2022 at 01:56:19PM -0700, Ali Çehreli via Digitalmars-d wrote:
 On 4/1/22 13:47, Steven Schveighoffer wrote:

 I think this rule is related to C's accepting wildly different
 platforms, some of which may have different kinds of memory. Two
 pointers to different kinds of memory may not be subtracted.
Well, can the pointers be subtracted? Yes.
My understanding is that depending on the CPU, certain operations would make the CPU barf. I am not sure but the old protected memory, extended memory, etc. systems might not be able to subtract between the systems at all. (Not sure; I am making this up.)
[...] In the bad ole days of segmented protected memory (around the days of the 386 or 486, IIRC), you could have memory with different segment prefixes, referenced using a convoluted scheme of far ptrs and near ptrs. Near ptrs are relative to a particular segment; subtracting near ptrs associated with diverse segment pointers yields nonsensical values. You can subtract far ptrs, sorta-kinda, but the results are likely to be either garbage, or else refer to an address that can't be addressed with existing segment pointers. So basically, it's Trouble with a capital T. On modern machines, though, this is no longer relevant. I think people figured out real quick that segmented addressing is just way more trouble than it's worth, so we came running back, tail between legs, to the flat memory model and embraced it like there's no tomorrow. :-D
Right, but my larger point was, the *subtraction itself* is not harmful. There's two ways to look at this. First, if you subtract two pointers that *aren't to the same block*, then the data is garbage. The other way is that it is *undefined behavior*. I think *using* that subtracted difference to e.g. index a pointer is what would be UB. But the subtraction itself is ok. -Steve
Apr 01 2022
parent reply Paul Backus <snarwin gmail.com> writes:
On Saturday, 2 April 2022 at 02:20:34 UTC, Steven Schveighoffer 
wrote:
 Right, but my larger point was, the *subtraction itself* is not 
 harmful.

 There's two ways to look at this. First, if you subtract two 
 pointers that *aren't to the same block*, then the data is 
 garbage. The other way is that it is *undefined behavior*. I 
 think *using* that subtracted difference to e.g. index a 
 pointer is what would be UB. But the subtraction itself is ok.
It's UB just to perform the subtraction:
 [C11 § 6.5.6 ¶ 9][1] When two pointers are subtracted, both 
 **shall** point to elements of the same array object, or one 
 past the last element of the array object
 [C11 § 4 ¶ 2][2] If a ''shall'' or ''shall not'' requirement 
 that appears outside of a constraint or runtime-constraint is 
 violated, the behavior is undefined.
Apr 02 2022
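To make the distinction in the quoted C11 text concrete, here is a hedged sketch in C. `element_distance` and `address_distance` are hypothetical names. The first is only defined when both pointers refer to elements of the same array (or one past its end). The second converts to integers first, which sidesteps the "shall" requirement on pointer subtraction; its result is implementation-defined, and it assumes the optional `uintptr_t`/`intptr_t` types exist and that addresses are flat, as on mainstream platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Defined only if both pointers point into (or one past the end of)
 * the same array object; otherwise the subtraction itself is UB in C. */
static ptrdiff_t element_distance(const int *from, const int *to)
{
    return to - from;
}

/* Distance in bytes computed on integers rather than pointers.
 * No pointer subtraction happens here, so the same-array rule does not
 * apply; the numeric result is implementation-defined and only
 * meaningful on flat address spaces (an assumption of this sketch). */
static intptr_t address_distance(const void *from, const void *to)
{
    return (intptr_t)((uintptr_t)to - (uintptr_t)from);
}
```

For two pointers into the same array on a flat-memory platform, the two functions agree up to a factor of `sizeof(int)`; for unrelated objects only `address_distance` may be called, and its result should not be used to form a pointer.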
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/2/22 1:08 PM, Paul Backus wrote:
 On Saturday, 2 April 2022 at 02:20:34 UTC, Steven Schveighoffer wrote:
 Right, but my larger point was, the *subtraction itself* is not harmful.

 There's two ways to look at this. First, if you subtract two pointers 
 that *aren't to the same block*, then the data is garbage. The other 
 way is that it is *undefined behavior*. I think *using* that 
 subtracted difference to e.g. index a pointer is what would be UB. But 
 the subtraction itself is ok.
It's UB just to perform the subtraction:
 [C11 § 6.5.6 ¶ 9][1] When two pointers are subtracted, both **shall** 
 point to elements of the same array object, or one past the last 
 element of the array object
 [C11 § 4 ¶ 2][2] If a ''shall'' or ''shall not'' requirement that 
 appears outside of a constraint or runtime-constraint is violated, 
 the behavior is undefined.
Yep. That's a steep penalty. It looks like C is trying to avoid having to specify how memory works without actually specifying it (likely on purpose so it doesn't tie down hardware developers to one memory model). An interesting read I found on C and pointer comparisons here: https://stefansf.de/post/pointers-are-more-abstract-than-you-might-expect/ -Steve
Apr 02 2022
prev sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Friday, 1 April 2022 at 20:22:46 UTC, Ali Çehreli wrote:
 On 4/1/22 10:39, Steven Schveighoffer wrote:

 I don't see how the compiler/optimizer
 can make some other decision based on the subtraction not being between
 two pointers to the same block of memory.
I think this rule is related to C's accepting wildly different platforms, some of which may have different kinds of memory. Two pointers to different kinds of memory may not be subtracted. Ali
In C you're not even allowed to cast a function pointer to a data pointer. POSIX requires that they are convertible (or else no dynamic linking). These C restrictions are put in place because of all the platforms it is supposed to work on (subtracting pointers on MS-DOS or on PDP-10 is not an easy proposition). For simplicity's sake, D was defined to be implemented only on machines that are at least 32-bit, with memory protection and a linear address range.
Apr 03 2022
prev sibling parent reply Elronnd <elronnd elronnd.net> writes:
On Friday, 1 April 2022 at 17:39:24 UTC, Steven Schveighoffer 
wrote:
 In practice, I don't see how it affects the behavior *of the 
 compiler*. When you subtract two pointers, I don't see how the 
 compiler/optimizer can make some other decision based on the 
 subtraction not being between two pointers to the same block of 
 memory.
Unfortunately, they can and do. For instance, consider this snippet of c code:

    #include <stddef.h>
    #include <stdlib.h>
    int f() {
        int *x = malloc(1), *y = malloc(1);
        ptrdiff_t d = y - x;
        return y == x + d;
    }

GCC compiles this to the same code as:

    int f() { return 1; }

This is intertwined with issues of provenance.
Apr 02 2022
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/2/22 5:38 AM, Elronnd wrote:
 On Friday, 1 April 2022 at 17:39:24 UTC, Steven Schveighoffer wrote:
 In practice, I don't see how it affects the behavior *of the 
 compiler*. When you subtract two pointers, I don't see how the 
 compiler/optimizer can make some other decision based on the 
 subtraction not being between two pointers to the same block of memory.
 Unfortunately, they can and do.  For instance, consider this snippet of c code:

     #include <stddef.h>
     #include <stdlib.h>
     int f() {
         int *x = malloc(1), *y = malloc(1);
         ptrdiff_t d = y - x;
         return y == x + d;
     }

 GCC compiles this to the same code as:

     int f() { return 1; }

 This is intertwined with issues of provenance.
Wait, how does that differ from how it would handle pointers to the same block? If we use something other than pointers, it becomes obvious why this happens:

```c
int x = ...;
int y = ...;
int d = y - x;
return y == x + d;
```

This is trivially always going to be true.

-Steve
Apr 02 2022
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Friday, 1 April 2022 at 15:52:39 UTC, Ali Çehreli wrote:
 1) Is D more permissive (or anemic in documentation:))? I ask 
 because paragraph 5 below does not mention the pointers should 
 be related in any way:

   https://dlang.org/spec/expression.html#pointer_arithmetic
The spec is permissive, but I would not be terribly surprised if the implementation (specifically, LDC and GDC, which share backends with C compilers) actually enforced the same restrictions as C. There are similar issues with null dereferences: D's spec says they have defined behavior, but actual D compilers fail to guarantee this in some cases.
 2) Is subtracting pointers that used to be in the same array 
 legal?

 void main() {
   auto a = [ 1, 2 ];
   auto b = a;
   assert(a.ptr - b.ptr == 0);    // i) Obviously legal?

   // Drop the first element
   a = a[1..$];
   assert(a.ptr - b.ptr == 1);    // ii) GC-behaviorally legal?

   // Save the pointer
   const old_aPtr = a.ptr;
   // and move the array to another memory
   a.length = 1_000_000;
   // Expect a and b are on different blocks of memory
   assert(a.ptr != old_aPtr);

   assert(old_aPtr - b.ptr == 1);  // iii) Practically legal?
 }
According to the C rules, (i) and (ii) are legal, since they point to the same memory block, but (iii) is illegal.
Apr 01 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/22 2:44 PM, Paul Backus wrote:
 On Friday, 1 April 2022 at 15:52:39 UTC, Ali Çehreli wrote:
 2) Is subtracting pointers that used to be in the same array legal?

 void main() {
   auto a = [ 1, 2 ];
   auto b = a;
   assert(a.ptr - b.ptr == 0);    // i) Obviously legal?

   // Drop the first element
   a = a[1..$];
   assert(a.ptr - b.ptr == 1);    // ii) GC-behaviorally legal?

   // Save the pointer
   const old_aPtr = a.ptr;
   // and move the array to another memory
   a.length = 1_000_000;
   // Expect a and b are on different blocks of memory
   assert(a.ptr != old_aPtr);

   assert(old_aPtr - b.ptr == 1);  // iii) Practically legal?
 }
According to the C rules, (i) and (ii) are legal, since they point to the same memory block, but (iii) is illegal.
(iii) is the same as (ii) because old_aPtr is the same as a.ptr at that time. -Steve
Apr 01 2022
parent Paul Backus <snarwin gmail.com> writes:
On Friday, 1 April 2022 at 19:43:01 UTC, Steven Schveighoffer 
wrote:
 On 4/1/22 2:44 PM, Paul Backus wrote:
 According to the C rules, (i) and (ii) are legal, since they 
 point to the same memory block, but (iii) is illegal.
(iii) is the same as (ii) because old_aPtr is the same as a.ptr at that time. -Steve
You're right; my mistake. I misread it as `a.ptr - b.ptr`, since that's what the other two `assert`s do.
Apr 01 2022
prev sibling next sibling parent Salih Dincer <salihdb hotmail.com> writes:
On Friday, 1 April 2022 at 15:52:39 UTC, Ali Çehreli wrote:
 As the following quote from a Microsoft document claims, and as 
 I've already known, pointer subtraction is legal only if the 
 pointers are into the same array: "ANSI 3.3.6, 4.1.1 The type 
 of integer required to hold the difference between two pointers 
 to elements of the same array, ptrdiff_t." ( 
 https://docs.microsoft.com/en-us/cpp/c-language/pointer-subtraction?view=msvc-170 )

 I suspect "array" means "a block of memory" there because 
 arrays are ordinarily malloc'ed pieces of memory in C.
 [...]
```d
import std.stdio;

enum testLimit = 1024 * 1024 * 62;

void main()
{
  char[] first ="a12345".dup; // length =>6
  first.length = testLimit;
  first[$-1] = 'z';

  auto hLen = first.length/2; // =>3
  auto sliceRight = first[hLen..$];// "345"

  char* oneHalf = &first[hLen++]; // =>[3]
  char[] half12 = first[0..hLen]; // =>[0..4]
  char[] half22 = oneHalf[0..hLen];// =>[0..4]

  // legal? Ok.

  --hLen; // =>3
  for(int i; i < hLen; i++)
    assert(&sliceRight[i] - &half12[i] == hLen);

  // legal? Ok.

  half12[0..3].write("..."); // "a12..."
  sliceRight[$-3..$].writeln; // "��z"

  char* a = &half12[0]; // "a"
  char* z = &sliceRight[$-1]; // "z"

  writefln("[%c]%s\n[%c]%s", *a, &(*a), *z, &(*z));
  assert( (&(*z) - &(*a)) == (testLimit - 1) );

  // legal? Ok.

  auto test = half22.ptr - half12.ptr;
  assert(test == hLen);/* test.writeln;//*/

  // legal? Ok.

  auto last = first;
  assert(first.ptr - last.ptr == 0);

  // legal? Ok.

  last ~= '.';
  assert(&last[$-1] - &first[$-1] == 1);
}
```

Everything is legal...

SDB 79
Apr 01 2022
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
You're right that the spec should add some commentary saying that pointer 
arithmetic should be confined to being within the same memory object.
Apr 02 2022
parent reply Tobias Pankrath <tobias pankrath.net> writes:
On Saturday, 2 April 2022 at 18:54:58 UTC, Walter Bright wrote:
 You're right that the spec should add some commentary saying 
 that pointer arithmetic should be confined to being within the 
 same memory object.
What's the definition of memory object here? Does the C Standard treat malloc/calloc and co as special functions and memory objects are what is returned by them?
Apr 03 2022
parent Paul Backus <snarwin gmail.com> writes:
On Sunday, 3 April 2022 at 12:22:22 UTC, Tobias Pankrath wrote:
 On Saturday, 2 April 2022 at 18:54:58 UTC, Walter Bright wrote:
 You're right that the spec should add some commentary saying 
 that pointer arithmetic should be confined to being within the 
 same memory object.
What's the definition of memory object here? Does the C Standard treat malloc/calloc and co as special functions and memory objects are what is returned by them?
In D: https://dlang.org/spec/intro.html#object-model
Apr 03 2022