digitalmars.D - Is D's pointer subtraction more permissive than C (and C++)?
- Ali Çehreli (46/46) Apr 01 2022 As the following quote from a Microsoft document claims, and as I've
- Steven Schveighoffer (27/87) Apr 01 2022 I'm assuming this has to do with the ability to detect artifacts of how
- Ali Çehreli (5/8) Apr 01 2022 I think this rule is related to C's accepting wildly different
- Steven Schveighoffer (6/15) Apr 01 2022 Well, can the pointers be subtracted? Yes. What is the result? If they
- Ali Çehreli (7/12) Apr 01 2022 My understanding is that depending on the CPU, certain operations would
- H. S. Teoh (17/29) Apr 01 2022 [...]
- norm (15/48) Apr 01 2022 Ahh the "good" old days were not that good really when it came to
- Steven Schveighoffer (8/36) Apr 01 2022 Right, but my larger point was, the *subtraction itself* is not harmful.
- Paul Backus (5/18) Apr 02 2022 It's UB just to perform the subtraction:
- Steven Schveighoffer (8/29) Apr 02 2022 Yep. That's a steep penalty.
- Patrick Schluter (10/19) Apr 03 2022 In C you're not even allowed to cast a function pointer to a data
- Elronnd (14/19) Apr 02 2022 Unfortunately, they can and do. For instance, consider this
- Steven Schveighoffer (12/34) Apr 02 2022 Wait, how does that differ from how it would handle pointers to the same...
- Paul Backus (9/30) Apr 01 2022 The spec is permissive, but I would not be terribly surprised if
- Steven Schveighoffer (4/28) Apr 01 2022 (iii) is the same as (ii) because old_aPtr is the same as a.ptr at that
- Paul Backus (4/11) Apr 01 2022 You're right; my mistake. I misread it as `a.ptr - b.ptr`, since
- Salih Dincer (41/50) Apr 01 2022 ```d
- Walter Bright (2/2) Apr 02 2022 You're right that the spec should add some commentary saying that pointe...
- Tobias Pankrath (4/7) Apr 03 2022 What's the definition of memory object here? Does the C Standard
- Paul Backus (3/10) Apr 03 2022 In C: http://port70.net/~nsz/c/c11/n1570.html#3.15
As the following quote from a Microsoft document claims, and as I've already known, pointer subtraction is legal only if the pointers are into the same array:

"ANSI 3.3.6, 4.1.1 The type of integer required to hold the difference between two pointers to elements of the same array, ptrdiff_t."

( https://docs.microsoft.com/en-us/cpp/c-language/pointer-subtraction?view=msvc-170 )

I suspect "array" means "a block of memory" there because arrays are ordinarily malloc'ed pieces of memory in C.

1) Is D more permissive (or anemic in documentation:))? I ask because paragraph 5 below does not mention the pointers should be related in any way:

https://dlang.org/spec/expression.html#pointer_arithmetic

2) Is subtracting pointers that used to be in the same array legal.

void main() {
  auto a = [ 1, 2 ];
  auto b = a;
  assert(a.ptr - b.ptr == 0);    // i) Obviously legal?

  // Drop the first element
  a = a[1..$];
  assert(a.ptr - b.ptr == 1);    // ii) GC-behaviorally legal?

  // Save the pointer
  const old_aPtr = a.ptr;

  // and move the array to another memory
  a.length = 1_000_000;

  // Expect a and b are on different blocks of memory
  assert(a.ptr != old_aPtr);
  assert(old_aPtr - b.ptr == 1); // iii) Practically legal?
}

Regardless of your answer, I will go ahead and perform that last subtraction :).

Ali

P.S. I am trying to implement a type where slices will follow the elements as the elements may be moved in memory:

const old_aPtr = a.ptr;
a ~= e;
if (a.ptr != old_aPtr) {
  // Elements are moved; adjust the slice
  assert(b.ptr >= old_aPtr);
  const old_bOffset = b.ptr - old_aPtr;
  b = a[old_bOffset .. $];
}

If you ask why I don't keep offsets instead of slices to begin with, I want to use pointers (implicitly in D slices) so that they participate in the ownership of array elements so that the GC does not free earlier elements as the buffer is popFronted as well.
Apr 01 2022
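As a C-side illustration of the P.S. above, here is a minimal sketch of the offset idea: the position is captured as an element offset while both pointers still refer to the same block, and the pointer is rebuilt from the current base after the buffer may have moved. `Buffer`, `View`, `view_from_ptr`, `view_ptr`, and `buffer_push` are hypothetical names invented for this example, and `realloc` merely stands in for D's array growth. Unlike the D slices in the post, an offset alone does nothing to keep old elements alive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable int buffer. */
typedef struct {
    int   *data;
    size_t len;
} Buffer;

/* Hypothetical "view": remembers its start as an element offset,
   not as a raw pointer, so it survives a move of the buffer. */
typedef struct {
    size_t offset;
} View;

/* Record a view starting at p. Precondition: p points into buf->data
   (or one past its end), so the subtraction stays within one array. */
View view_from_ptr(const Buffer *buf, const int *p)
{
    ptrdiff_t d = p - buf->data;
    assert(d >= 0 && (size_t)d <= buf->len);
    View v = { (size_t)d };
    return v;
}

/* Accessor: rebuild the pointer from the buffer's current base. */
int *view_ptr(const Buffer *buf, View v)
{
    return buf->data + v.offset;
}

/* Append one element; realloc may move the block. Returns 0 on success. */
int buffer_push(Buffer *buf, int value)
{
    int *grown = realloc(buf->data, (buf->len + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    buf->data = grown;
    buf->data[buf->len++] = value;
    return 0;
}
```

With this shape, a pointer into the old block is never subtracted from a pointer into the new one; only the stored offset crosses the reallocation.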
On 4/1/22 11:52 AM, Ali Çehreli wrote:
> As the following quote from a Microsoft document claims, and as I've already known, pointer subtraction is legal only if the pointers are into the same array:
>
> "ANSI 3.3.6, 4.1.1 The type of integer required to hold the difference between two pointers to elements of the same array, ptrdiff_t."
>
> ( https://docs.microsoft.com/en-us/cpp/c-language/pointer-subtraction?view=msvc-170 )
>
> I suspect "array" means "a block of memory" there because arrays are ordinarily malloc'ed pieces of memory in C.

I'm assuming this has to do with the ability to detect artifacts of how the compiler/library lays out memory, which shouldn't really figure into program behavior.

In practice, I don't see how it affects the behavior *of the compiler*. When you subtract two pointers, I don't see how the compiler/optimizer can make some other decision based on the subtraction not being between two pointers to the same block of memory.

> 1) Is D more permissive (or anemic in documentation:))? I ask because paragraph 5 below does not mention the pointers should be related in any way:
>
> https://dlang.org/spec/expression.html#pointer_arithmetic

I assume this is because nobody thought about it? But I don't see a problem with omitting that requirement.

> 2) Is subtracting pointers that used to be in the same array legal.
>
> void main() {
>   auto a = [ 1, 2 ];
>   auto b = a;
>   assert(a.ptr - b.ptr == 0);    // i) Obviously legal?
>
>   // Drop the first element
>   a = a[1..$];
>   assert(a.ptr - b.ptr == 1);    // ii) GC-behaviorally legal?
>
>   // Save the pointer
>   const old_aPtr = a.ptr;
>
>   // and move the array to another memory
>   a.length = 1_000_000;
>
>   // Expect a and b are on different blocks of memory
>   assert(a.ptr != old_aPtr);
>   assert(old_aPtr - b.ptr == 1); // iii) Practically legal?
> }

Assuming C rules, I still think all this is legal. I'd even hazard to guess it's legal to do this:

```c
#include <stddef.h>

struct S { int arr1[5], arr2[5]; };

void foo()
{
    struct S s;
    /* difference between the first elements of two member arrays */
    ptrdiff_t p = &s.arr1[0] - &s.arr2[0];
}
```

Because you know the relationship between the pointers is defined. I.e. this is NEVER going to change from run to run, or build to build.

> Regardless of your answer, I will go ahead and perform that last subtraction :).
>
> Ali
>
> P.S. I am trying to implement a type where slices will follow the elements as the elements may be moved in memory:
>
> const old_aPtr = a.ptr;
> a ~= e;
> if (a.ptr != old_aPtr) {
>   // Elements are moved; adjust the slice
>   assert(b.ptr >= old_aPtr);
>   const old_bOffset = b.ptr - old_aPtr;
>   b = a[old_bOffset .. $];
> }
>
> If you ask why I don't keep offsets instead of slices to begin with, I want to use pointers (implicitly in D slices) so that they participate in the ownership of array elements so that the GC does not free earlier elements as the buffer is popFronted as well.

This should be fine. I would suggest to store things as offsets anyway, and have accessors for the pointers.

-Steve
Apr 01 2022
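For the struct case in the post above, the fixed distance between two members can also be read off the layout itself. The following C sketch uses `offsetof` rather than subtracting pointers into two different member arrays; the helper names are hypothetical, and the element count relies on the assumption (checked by an `assert`) that the byte distance is a whole number of `int`s, as it is on typical targets.

```c
#include <assert.h>
#include <stddef.h>

struct S {
    int arr1[5];
    int arr2[5];
};

/* Byte distance from the start of arr1 to the start of arr2, taken from
   the struct layout alone; no pointers are subtracted. */
ptrdiff_t arr2_minus_arr1_bytes(void)
{
    return (ptrdiff_t)offsetof(struct S, arr2)
         - (ptrdiff_t)offsetof(struct S, arr1);
}

/* The same distance expressed in int elements. */
ptrdiff_t arr2_minus_arr1_elems(void)
{
    ptrdiff_t bytes = arr2_minus_arr1_bytes();
    /* Assumption: members are laid out on int-sized boundaries. */
    assert(bytes % (ptrdiff_t)sizeof(int) == 0);
    return bytes / (ptrdiff_t)sizeof(int);
}
```

Because the value comes from the type's layout, it is a compile-time property of `struct S` and does not depend on any particular object or on how the compiler treats pointers to different arrays.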
On 4/1/22 10:39, Steven Schveighoffer wrote:
> I don't see how the compiler/optimizer can make some other decision based on the subtraction not being between two pointers to the same block of memory.

I think this rule is related to C's accepting wildly different platforms, some of which may have different kinds of memory. Two pointers to different kinds of memory may not be subtracted.

Ali
Apr 01 2022
On 4/1/22 4:22 PM, Ali Çehreli wrote:
> On 4/1/22 10:39, Steven Schveighoffer wrote:
> > I don't see how the compiler/optimizer can make some other decision based on the subtraction not being between two pointers to the same block of memory.
>
> I think this rule is related to C's accepting wildly different platforms, some of which may have different kinds of memory. Two pointers to different kinds of memory may not be subtracted.

Well, can the pointers be subtracted? Yes. What is the result? If they are in the same block, the difference in elements between two pointers. If they are not in the same block, anything.

This is why I don't know that it's important to avoid it.

-Steve
Apr 01 2022
On 4/1/22 13:47, Steven Schveighoffer wrote:
> > I think this rule is related to C's accepting wildly different platforms, some of which may have different kinds of memory. Two pointers to different kinds of memory may not be subtracted.
>
> Well, can the pointers be subtracted? Yes.

My understanding is that depending on the CPU, certain operations would make the CPU barf. I am not sure but the old protected memory, extended memory, etc. systems might not be able to subtract between the systems at all. (Not sure; I am making this up.)

> This is why I don't know that it's important to avoid it.

I will not avoid it. :)

Ali
Apr 01 2022
On Fri, Apr 01, 2022 at 01:56:19PM -0700, Ali Çehreli via Digitalmars-d wrote:
> On 4/1/22 13:47, Steven Schveighoffer wrote:
> > > I think this rule is related to C's accepting wildly different platforms, some of which may have different kinds of memory. Two pointers to different kinds of memory may not be subtracted.
> >
> > Well, can the pointers be subtracted? Yes.
>
> My understanding is that depending on the CPU, certain operations would make the CPU barf. I am not sure but the old protected memory, extended memory, etc. systems might not be able to subtract between the systems at all. (Not sure; I am making this up.)
[...]

In the bad ole days of segmented protected memory (around the days of the 386 or 486, IIRC), you could have memory with different segment prefixes, referenced using a convoluted scheme of far ptrs and near ptrs. Near ptrs are relative to a particular segment; subtracting near ptrs associated with diverse segment pointers yields nonsensical values. You can subtract far ptrs, sorta-kinda, but the results are likely to be either garbage, or else refer to an address that can't be addressed with existing segment pointers. So basically, it's Trouble with a capital T.

On modern machines, though, this is no longer relevant. I think people figured out real quick that segmented addressing is just way more trouble than it's worth, so we came running back, tail between legs, to the flat memory model and embraced it like there's no tomorrow. :-D

T

--
What doesn't kill me makes me stranger.
Apr 01 2022
On Friday, 1 April 2022 at 21:20:00 UTC, H. S. Teoh wrote:
> On Fri, Apr 01, 2022 at 01:56:19PM -0700, Ali Çehreli via Digitalmars-d wrote:
> > On 4/1/22 13:47, Steven Schveighoffer wrote:
> > > > I think this rule is related to C's accepting wildly different platforms, some of which may have different kinds of memory. Two pointers to different kinds of memory may not be subtracted.
> > >
> > > Well, can the pointers be subtracted? Yes.
> >
> > My understanding is that depending on the CPU, certain operations would make the CPU barf. I am not sure but the old protected memory, extended memory, etc. systems might not be able to subtract between the systems at all. (Not sure; I am making this up.)
> [...]
>
> In the bad ole days of segmented protected memory (around the days of the 386 or 486, IIRC), you could have memory with different segment prefixes, referenced using a convoluted scheme of far ptrs and near ptrs. Near ptrs are relative to a particular segment; subtracting near ptrs associated with diverse segment pointers yields nonsensical values. You can subtract far ptrs, sorta-kinda, but the results are likely to be either garbage, or else refer to an address that can't be addressed with existing segment pointers. So basically, it's Trouble with a capital T.
>
> On modern machines, though, this is no longer relevant. I think people figured out real quick that segmented addressing is just way more trouble than it's worth, so we came running back, tail between legs, to the flat memory model and embraced it like there's no tomorrow. :-D
>
> T

Ahh the "good" old days were not that good really when it came to addressing :-)

All x86 still start up in real mode and the "flat" modes (protected mode, long mode etc.) are still reached by SW putting the CPU into that mode. Today that happens in your boot loader as all desktop SW pretty much runs in 64bit flat mode but back in the 90's most SW would have to manage this itself.

There is an old, yet still interesting, blog about Win95 and the world as it was when we were all slowly migrating from 16bit real mode DOS to 32 bit flat mode OS's:

https://devblogs.microsoft.com/oldnewthing/20071224-00/?p=24063

(Any blog by Raymond Chen is well worth a read IMO, I always learn something new)

Sorry my post has gone completely off topic.
Apr 01 2022
On 4/1/22 5:20 PM, H. S. Teoh wrote:
> On Fri, Apr 01, 2022 at 01:56:19PM -0700, Ali Çehreli via Digitalmars-d wrote:
> > On 4/1/22 13:47, Steven Schveighoffer wrote:
> > > > I think this rule is related to C's accepting wildly different platforms, some of which may have different kinds of memory. Two pointers to different kinds of memory may not be subtracted.
> > >
> > > Well, can the pointers be subtracted? Yes.
> >
> > My understanding is that depending on the CPU, certain operations would make the CPU barf. I am not sure but the old protected memory, extended memory, etc. systems might not be able to subtract between the systems at all. (Not sure; I am making this up.)
> [...]
>
> In the bad ole days of segmented protected memory (around the days of the 386 or 486, IIRC), you could have memory with different segment prefixes, referenced using a convoluted scheme of far ptrs and near ptrs. Near ptrs are relative to a particular segment; subtracting near ptrs associated with diverse segment pointers yields nonsensical values. You can subtract far ptrs, sorta-kinda, but the results are likely to be either garbage, or else refer to an address that can't be addressed with existing segment pointers. So basically, it's Trouble with a capital T.
>
> On modern machines, though, this is no longer relevant. I think people figured out real quick that segmented addressing is just way more trouble than it's worth, so we came running back, tail between legs, to the flat memory model and embraced it like there's no tomorrow. :-D

Right, but my larger point was, the *subtraction itself* is not harmful.

There's two ways to look at this. First, if you subtract two pointers that *aren't to the same block*, then the data is garbage. The other way is that it is *undefined behavior*. I think *using* that subtracted difference to e.g. index a pointer is what would be UB. But the subtraction itself is ok.

-Steve
Apr 01 2022
On Saturday, 2 April 2022 at 02:20:34 UTC, Steven Schveighoffer wrote:
> Right, but my larger point was, the *subtraction itself* is not harmful.
>
> There's two ways to look at this. First, if you subtract two pointers that *aren't to the same block*, then the data is garbage. The other way is that it is *undefined behavior*. I think *using* that subtracted difference to e.g. index a pointer is what would be UB. But the subtraction itself is ok.

It's UB just to perform the subtraction:

[C11 § 6.5.6 ¶ 9][1]
> When two pointers are subtracted, both **shall** point to elements of the same array object, or one past the last element of the array object

[C11 § 4 ¶ 2][2]
> If a ''shall'' or ''shall not'' requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined.
Apr 02 2022
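Given the C11 rule quoted above, one way to stay inside it is to establish that a pointer belongs to a known array before subtracting. The C sketch below does the membership test on integer addresses and subtracts only afterwards. `index_within` is a hypothetical helper, and the approach rests on assumptions the standard does not guarantee: that `uintptr_t` exists and that integer order matches address order (both implementation-defined, though true on common flat-memory targets). It is a heuristic, not a proof of conformance; a one-past-the-end pointer of a neighbouring object can share an address with the first element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and stores the element index in *index if p's address is that
   of an element of base[0 .. count); returns 0 otherwise, without
   subtracting the two pointers.
   Assumption: uintptr_t exists and integer order matches address order. */
int index_within(const int *base, size_t count, const int *p, ptrdiff_t *index)
{
    assert(base != NULL && p != NULL && index != NULL);

    uintptr_t lo = (uintptr_t)base;
    uintptr_t hi = (uintptr_t)(base + count);  /* one past the end is valid */
    uintptr_t at = (uintptr_t)p;

    if (at < lo || at >= hi || (at - lo) % sizeof(int) != 0)
        return 0;

    /* p's address lies on an element of the array, so both operands of
       the subtraction refer to the same array object. */
    *index = p - base;
    return 1;
}
```

The cross-object case is answered with a plain "not in this array" instead of a garbage difference, which is the situation the quoted paragraphs leave undefined.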
On 4/2/22 1:08 PM, Paul Backus wrote:
> On Saturday, 2 April 2022 at 02:20:34 UTC, Steven Schveighoffer wrote:
> > Right, but my larger point was, the *subtraction itself* is not harmful.
> >
> > There's two ways to look at this. First, if you subtract two pointers that *aren't to the same block*, then the data is garbage. The other way is that it is *undefined behavior*. I think *using* that subtracted difference to e.g. index a pointer is what would be UB. But the subtraction itself is ok.
>
> It's UB just to perform the subtraction:
>
> [C11 § 6.5.6 ¶ 9][1]
> > When two pointers are subtracted, both **shall** point to elements of the same array object, or one past the last element of the array object
>
> [C11 § 4 ¶ 2][2]
> > If a ''shall'' or ''shall not'' requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined.

Yep. That's a steep penalty.

It looks like C is trying to avoid having to specify how memory works without actually specifying it (likely on purpose so it doesn't tie down hardware developers to one memory model).

An interesting read I found on C and pointer comparisons here: https://stefansf.de/post/pointers-are-more-abstract-than-you-might-expect/

-Steve
Apr 02 2022
On Friday, 1 April 2022 at 20:22:46 UTC, Ali Çehreli wrote:
> On 4/1/22 10:39, Steven Schveighoffer wrote:
> > I don't see how the compiler/optimizer can make some other decision based on the subtraction not being between two pointers to the same block of memory.
>
> I think this rule is related to C's accepting wildly different platforms, some of which may have different kinds of memory. Two pointers to different kinds of memory may not be subtracted.
>
> Ali

In C you're not even allowed to cast a function pointer to a data pointer. POSIX requires that such a cast work (or else no dynamic linking). These C restrictions are in place because of all the platforms it is supposed to work on (subtracting pointers on MS-DOS or on a PDP-10 is not an easy proposition). For simplicity's sake, D was defined to be implemented only on machines of at least 32 bits with memory protection and a linear address range.
Apr 03 2022
On Friday, 1 April 2022 at 17:39:24 UTC, Steven Schveighoffer wrote:
> In practice, I don't see how it affects the behavior *of the compiler*. When you subtract two pointers, I don't see how the compiler/optimizer can make some other decision based on the subtraction not being between two pointers to the same block of memory.

Unfortunately, they can and do. For instance, consider this snippet of C code:

#include <stddef.h>
#include <stdlib.h>

int f() {
    int *x = malloc(1), *y = malloc(1);
    ptrdiff_t d = y - x;
    return y == x + d;
}

GCC compiles this to the same code as:

int f() { return 1; }

This is intertwined with issues of provenance.
Apr 02 2022
On 4/2/22 5:38 AM, Elronnd wrote:
> On Friday, 1 April 2022 at 17:39:24 UTC, Steven Schveighoffer wrote:
> > In practice, I don't see how it affects the behavior *of the compiler*. When you subtract two pointers, I don't see how the compiler/optimizer can make some other decision based on the subtraction not being between two pointers to the same block of memory.
>
> Unfortunately, they can and do. For instance, consider this snippet of c code:
>
> #include <stddef.h>
> #include <stdlib.h>
>
> int f() {
>     int *x = malloc(1), *y = malloc(1);
>     ptrdiff_t d = y - x;
>     return y == x + d;
> }
>
> GCC compiles this to the same code as:
>
> int f() { return 1; }
>
> This is intertwined with issues of provenance.

Wait, how does that differ from how it would handle pointers to the same block?

If we use something other than pointers, it becomes obvious why this happens:

```c
int x = ...;
int y = ...;
int d = y - x;
return y == x + d;
```

This is trivially always going to be true.

-Steve
Apr 02 2022
On Friday, 1 April 2022 at 15:52:39 UTC, Ali Çehreli wrote:
> 1) Is D more permissive (or anemic in documentation:))? I ask because paragraph 5 below does not mention the pointers should be related in any way:
>
> https://dlang.org/spec/expression.html#pointer_arithmetic

The spec is permissive, but I would not be terribly surprised if the implementation (specifically, LDC and GDC, which share backends with C compilers) actually enforced the same restrictions as C. There are similar issues with null dereferences: D's spec says they have defined behavior, but actual D compilers fail to guarantee this in some cases.

> 2) Is subtracting pointers that used to be in the same array legal.
>
> void main() {
>   auto a = [ 1, 2 ];
>   auto b = a;
>   assert(a.ptr - b.ptr == 0);    // i) Obviously legal?
>
>   // Drop the first element
>   a = a[1..$];
>   assert(a.ptr - b.ptr == 1);    // ii) GC-behaviorally legal?
>
>   // Save the pointer
>   const old_aPtr = a.ptr;
>
>   // and move the array to another memory
>   a.length = 1_000_000;
>
>   // Expect a and b are on different blocks of memory
>   assert(a.ptr != old_aPtr);
>   assert(old_aPtr - b.ptr == 1); // iii) Practically legal?
> }

According to the C rules, (i) and (ii) are legal, since they point to the same memory block, but (iii) is illegal.
Apr 01 2022
On 4/1/22 2:44 PM, Paul Backus wrote:
> On Friday, 1 April 2022 at 15:52:39 UTC, Ali Çehreli wrote:
> > 2) Is subtracting pointers that used to be in the same array legal.
> >
> > void main() {
> >   auto a = [ 1, 2 ];
> >   auto b = a;
> >   assert(a.ptr - b.ptr == 0);    // i) Obviously legal?
> >
> >   // Drop the first element
> >   a = a[1..$];
> >   assert(a.ptr - b.ptr == 1);    // ii) GC-behaviorally legal?
> >
> >   // Save the pointer
> >   const old_aPtr = a.ptr;
> >
> >   // and move the array to another memory
> >   a.length = 1_000_000;
> >
> >   // Expect a and b are on different blocks of memory
> >   assert(a.ptr != old_aPtr);
> >   assert(old_aPtr - b.ptr == 1); // iii) Practically legal?
> > }
>
> According to the C rules, (i) and (ii) are legal, since they point to the same memory block, but (iii) is illegal.

(iii) is the same as (ii) because old_aPtr is the same as a.ptr at that time.

-Steve
Apr 01 2022
On Friday, 1 April 2022 at 19:43:01 UTC, Steven Schveighoffer wrote:
> On 4/1/22 2:44 PM, Paul Backus wrote:
> > According to the C rules, (i) and (ii) are legal, since they point to the same memory block, but (iii) is illegal.
>
> (iii) is the same as (ii) because old_aPtr is the same as a.ptr at that time.
>
> -Steve

You're right; my mistake. I misread it as `a.ptr - b.ptr`, since that's what the other two `assert`s do.
Apr 01 2022
On Friday, 1 April 2022 at 15:52:39 UTC, Ali Çehreli wrote:
> As the following quote from a Microsoft document claims, and as I've already known, pointer subtraction is legal only if the pointers are into the same array:
>
> "ANSI 3.3.6, 4.1.1 The type of integer required to hold the difference between two pointers to elements of the same array, ptrdiff_t."
>
> ( https://docs.microsoft.com/en-us/cpp/c-language/pointer-subtraction?view=msvc-170 )
>
> I suspect "array" means "a block of memory" there because arrays are ordinarily malloc'ed pieces of memory in C.
> [...]

```d
import std.stdio;

enum testLimit = 1024 * 1024 * 62;

void main()
{
  char[] first ="a12345".dup; // length =>6
  first.length = testLimit;
  first[$-1] = 'z';

  auto hLen = first.length/2; // =>3
  auto sliceRight = first[hLen..$];// "345"

  char* oneHalf = &first[hLen++]; // =>[3]
  char[] half12 = first[0..hLen]; // =>[0..4]
  char[] half22 = oneHalf[0..hLen];// =>[0..4]
  // legal? Ok.

  --hLen; // =>3
  for(int i; i < hLen; i++)
    assert(&sliceRight[i] - &half12[i] == hLen);
  // legal? Ok.

  half12[0..3].write("..."); // "a12..."
  sliceRight[$-3..$].writeln; // "��z"

  char* a = &half12[0]; // "a"
  char* z = &sliceRight[$-1]; // "z"
  writefln("[%c]%s\n[%c]%s", *a, &(*a), *z, &(*z));

  assert( (&(*z) - &(*a)) == (testLimit - 1) );
  // legal? Ok.

  auto test = half22.ptr - half12.ptr;
  assert(test == hLen);/* test.writeln;//*/
  // legal? Ok.

  auto last = first;
  assert(first.ptr - last.ptr == 0);
  // legal? Ok.

  last ~= '.';
  assert(&last[$-1] - &first[$-1] == 1);
}
```

Everything is legal...

SDB 79
Apr 01 2022
You're right that the spec should add some commentary saying that pointer arithmetic should be confined to being within the same memory object.
Apr 02 2022
On Saturday, 2 April 2022 at 18:54:58 UTC, Walter Bright wrote:
> You're right that the spec should add some commentary saying that pointer arithmetic should be confined to being within the same memory object.

What's the definition of memory object here? Does the C Standard treat malloc/calloc and co as special functions and memory objects are what is returned by them?
Apr 03 2022
On Sunday, 3 April 2022 at 12:22:22 UTC, Tobias Pankrath wrote:
> On Saturday, 2 April 2022 at 18:54:58 UTC, Walter Bright wrote:
> > You're right that the spec should add some commentary saying that pointer arithmetic should be confined to being within the same memory object.
>
> What's the definition of memory object here? Does the C Standard treat malloc/calloc and co as special functions and memory objects are what is returned by them?

In C: http://port70.net/~nsz/c/c11/n1570.html#3.15

In D: https://dlang.org/spec/intro.html#object-model
Apr 03 2022