digitalmars.D.learn - How ptr arithmetic works??? It doesn't make any sense....

rempas <rempas tutanota.com> writes:
First, a little bit of theory. A pointer just points to a memory
address, which is a number. So when I add "10" to this pointer,
it will point ten bytes after the place it was pointing to, right?
Another thing with pointers is that they don't have "types". A
pointer always just points to a location, so types exist for the
compiler, so it can catch bugs when we point somewhere and try to
manipulate a number of bytes we probably didn't intend to. For
example: if you have allocated 4 bytes and then you point to them
with a "short" pointer, you can only manipulate 2 of those 4
bytes. You probably didn't want that and did something wrong, so
we have types, and the compiler requires explicit pointer casts
(in contrast to C) so it can protect you from these bugs.

This type casting brings some problems, however. I played around
with it and figured out that, to get the location you expect when
returning from a function, you need to do the math, then cast the
whole expression (that is, the result) and return that. If you
only cast the first value (the one of the different type) and then
do the addition (or whatever expression you want), it returns a
wrong address. But WAIT!!! This doesn't work in a different
example. I'm breaking my head trying to understand why, so I
thought I'd ask if anyone can help and explain why to me. Btw, all
the testing was done with `ldc` in `BetterC` "mode". Code:

```d
import core.stdc.stdio;
import core.stdc.stdlib;

struct MemoryBlock {
   char* ptr;
   ulong length;
}

void* ptr = cast(void*)0x7a7;

void* right() {
   // Cast the whole expression between parentheses. Got the right value!
   return cast(MemoryBlock*)(ptr + MemoryBlock.sizeof);
}

void* wrong() {
   // First cast the `ptr` variable and then add the number. Got a wrong value...
   return cast(MemoryBlock*)ptr + MemoryBlock.sizeof;
}

char* return_address_wrong() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   // Cast the whole expression. BUT GOT THE WRONG VALUE!!!! Why???
   return cast(char*)(local_ptr + MemoryBlock.sizeof);
}

char* return_address_right() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   // Now I first cast the `local_ptr` variable and then added the number,
   // but this time it gave me the right value....
   return cast(char*)local_ptr + MemoryBlock.sizeof;
}

extern (C) void main() {
   printf("EXPECTED LOCATION: %p\n", ptr + MemoryBlock.sizeof);
   printf("RIGHT LOCATION: %p\n", right());
   printf("WRONG LOCATION: %p\n", wrong());

   printf("RETURNED ADDRESS (wrong): %p\n", return_address_wrong());
   printf("RETURNED ADDRESS (right): %p\n", return_address_right());
}
```
Dec 04 2022
ag0aep6g <anonymous example.com> writes:
On Sunday, 4 December 2022 at 16:33:35 UTC, rempas wrote:
 First a little bit of theory. A pointer just points to a memory 
 address which is a number. So when I add "10" to this pointer, 
 it will point ten bytes after the place it was pointing to, 
 right?
Not quite. Adding 10 to a T* means adding 10 * T.sizeof.
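Here is a minimal sketch of that rule. The buffer and variable names are invented for illustration. The addresses are converted to `size_t` only so that the distance in bytes can be checked. `int.sizeof` is always 4 in D, so `p + 10` should land 40 bytes further on.

```d
import std.stdio : writeln;

void main()
{
    int[16] buffer;            // illustrative buffer: room for 16 ints, so p + 10 stays in bounds
    int* p = buffer.ptr;
    int* q = p + 10;           // moves 10 elements, i.e. 10 * int.sizeof bytes

    // Distance in bytes, computed on the raw addresses.
    size_t bytes = cast(size_t) q - cast(size_t) p;
    writeln("bytes moved: ", bytes);   // prints "bytes moved: 40"
    assert(bytes == 10 * int.sizeof);

    // Subtracting two int* gives the distance in elements, not bytes.
    assert(q - p == 10);

    // Through a void*, + counts single bytes.
    void* v = p;
    assert(cast(size_t)(v + 10) - cast(size_t) v == 10);
}
```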
Dec 04 2022
rempas <rempas tutanota.com> writes:
On Sunday, 4 December 2022 at 16:40:17 UTC, ag0aep6g wrote:
 Not quite. Adding 10 to a T* means adding 10 * T.sizeof.
Oh! I thought it was plain addition. Is there a specific reason for that, if you are aware of one?
Dec 04 2022
bauss <jacobbauss gmail.com> writes:
On Monday, 5 December 2022 at 06:12:44 UTC, rempas wrote:
 On Sunday, 4 December 2022 at 16:40:17 UTC, ag0aep6g wrote:
 Not quite. Adding 10 to a T* means adding 10 * T.sizeof.
 Oh! I thought it was addition. Is there a specific reasoning for that if you are aware of?
Because it's much easier to work with.

Ex. if you have an array of 4 signed 32 bit integers that you're pointing to then you can simply just increment the pointer by 1.

If it was raw bytes then you'd have to increment the pointer by 4 to move to the next element.

This is counter-intuitive if you're moving to the next element in a loop ex.

This is how you'd do it idiomatically:

```d
foreach (i; 0 .. list.length)
{
    (*cast(int*)(ptr + i)) = i;
}
```

Compared to:

```d
foreach (i; 0 .. list.length)
{
    (*cast(int*)(ptr + (i * 4))) = i;
}
```
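Here is a rough sketch of the two loop styles. It assumes the ints live in a `malloc`'d block held as a `void*`. `raw` and `count` are hypothetical names. On a `void*`, `+` counts single bytes, so the byte-wise loop has to scale by `int.sizeof`, while the typed `int*` steps whole elements. `cast(int) i` is needed because `i` is a `size_t` here.

```d
import core.stdc.stdlib : malloc, free;
import std.stdio : writeln;

void main()
{
    enum size_t count = 4;                    // hypothetical element count
    void* raw = malloc(count * int.sizeof);   // untyped block big enough for 4 ints
    assert(raw !is null);
    scope (exit) free(raw);

    // Element-wise: cast once, then + i moves i whole ints.
    int* p = cast(int*) raw;
    foreach (i; 0 .. count)
        *(p + i) = cast(int) i;

    // Byte-wise: on a void*, + moves single bytes, so scale by int.sizeof.
    foreach (i; 0 .. count)
        *cast(int*)(raw + i * int.sizeof) += 10;

    writeln(p[0 .. count]);   // prints "[10, 11, 12, 13]"
}
```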
Dec 05 2022
rempas <rempas tutanota.com> writes:
On Monday, 5 December 2022 at 08:21:44 UTC, bauss wrote:
 Because it's much easier to work with.

 Ex. if you have an array of 4 signed 32 bit integers that 
 you're pointing to then you can simply just increment the 
 pointer by 1.

 If it was raw bytes then you'd have to increment the pointer by 
 4 to move to the next element.

 This is counter-intuitive if you're moving to the next element 
 in a loop ex.

 This is how you'd do it idiomatically:

 ```d
 foreach (i; 0 .. list.length)
 {
     (*cast(int*)(ptr + i)) = i;
 }
 ```
Is this `(*cast(int*)(ptr + i)) = i;` right, or did you make a mistake and mean to write `(*cast(int*)ptr + i) = i;`? Because, like we said before, the first operand must be cast to the type for this to work right.
 Compared to:

 ```d

 foreach (i; 0 .. list.length)
 {
     (*cast(int*)(ptr + (i * 4))) = i;
 }
 ```
Got it! I guess they could also just allow us to use bracket notation to do the same thing. So something like:

```d
foreach (i; 0 .. list.length) {
  (cast(int*)ptr[i]) = i;
}
```

This is what happens with arrays anyways. And arrays ARE pointers to a contiguous memory block anyways so they could do the same with regular pointers. The example also looks more readable.
Dec 05 2022
ag0aep6g <anonymous example.com> writes:
On Monday, 5 December 2022 at 15:08:41 UTC, rempas wrote:
 Got it! I guess they could also just allow us to use bracket 
 notation to do the same thing. So something like:

 ```d
 foreach (i; 0 .. list.length) {
   (cast(int*)ptr[i]) = i;
 }
 ```

 This is what happens with arrays anyways. And arrays ARE 
 pointers to a contiguous memory block anyways so they could do 
 the same with regular pointers. The example also looks more 
 readable.
You can use bracket notation with pointers. You just need to move your closing parenthesis a bit.

Assuming that `ptr` is a `void*`, these are all equivalent:

```d
(cast(int*) ptr)[i] = whatever;
*((cast(int*) ptr) + i) = whatever;
*(cast(int*) (ptr + i * int.sizeof)) = whatever;
```
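Here is a quick sanity check of those three forms. It assumes three small stack buffers reached only through `void*`, and the names are illustrative. If the forms are really equivalent, all three buffers end up identical.

```d
import std.stdio : writeln;

void main()
{
    int[4] a, b, c;                              // illustrative buffers, one per form
    void* pa = a.ptr, pb = b.ptr, pc = c.ptr;    // start from untyped pointers

    foreach (i; 0 .. 4)
    {
        (cast(int*) pa)[i] = i * 10;                  // index a typed pointer
        *((cast(int*) pb) + i) = i * 10;              // typed pointer + i elements
        *(cast(int*)(pc + i * int.sizeof)) = i * 10;  // void* + bytes, then cast
    }

    assert(a == b && b == c);
    writeln(a);   // prints "[0, 10, 20, 30]"
}
```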
Dec 05 2022
Salih Dincer <salihdb hotmail.com> writes:
On Monday, 5 December 2022 at 18:01:47 UTC, ag0aep6g wrote:
 You can use bracket notation with pointers. You just need to 
 move your closing parenthesis a bit.

 Assuming that `ptr` is a `void*`, these are all equivalent...
Yeah, there is such a thing! I'm sure you'll all like this example:

```d
struct AAish(K, V, size_t s)
{
  K key;
  V[s] value; // 5 + 1(\0)

  string toString()
  {
    import std.format : format;
    import core.stdc.string : strlen;
    auto result = value[0..strlen(value.ptr)];
    return format("%s: %s", key, result);
  }
}

void stringCopy(C)(ref C src, string str)
{
  import std.algorithm : min;
  auto len = min(src.length - 1, str.length);
  src[0 .. len] = str[0 .. len];
  src[len] = '\0';
}

enum n = 9;
alias AA = AAish!(int, char, n);

void main()
{
  // First, we malloc for multiple AA()
  import core.stdc.stdlib;
  auto v = malloc(n * AA.sizeof);
  static assert( is (typeof(v) == void*) );

  // Cast to use memory space for AA()'s
  auto ptr = cast(AA*)v;
  static assert( is (typeof(ptr) == AA*) );

  // init AA()'s:
  foreach(i; 0..n)
  {
    ptr[i] = AA(i);
  }

  import std.stdio;
  ptr[0].value.stringCopy = "zero";
  ptr[0].writeln;

  ptr[1].value.stringCopy = "one";
  ptr[1].writeln;

  ptr[2].value.stringCopy = "two";
  ptr[2].writeln;

  ptr[3].value.stringCopy = "three";
  ptr[3].writeln;

  "...".writeln; //...

  ptr[8].value.stringCopy = "eight";
  ptr[8].writeln;

  ptr[0..n/2].writeln;
}
/* Prints:
0: zero
1: one
2: two
3: three
...
8: eight
[0: zero, 1: one, 2: two, 3: three]
*/
```

SDB 79
Dec 05 2022
rempas <rempas tutanota.com> writes:
On Monday, 5 December 2022 at 22:21:06 UTC, Salih Dincer wrote:
 Yeah, there is such a thing!  I'm sure you'll all like this 
 example:

 [...]
Great example! Thank you my friend!
Dec 06 2022
rempas <rempas tutanota.com> writes:
On Monday, 5 December 2022 at 18:01:47 UTC, ag0aep6g wrote:
 You can use bracket notation with pointers. You just need to 
 move your closing parenthesis a bit.

 Assuming that `ptr` is a `void*`, these are all equivalent:

 ```d
 (cast(int*) ptr)[i] = whatever;
 *((cast(int*) ptr) + i) = whatever;
 *(cast(int*) (ptr + i * int.sizeof)) = whatever;
 ```
Oh, wow! That's sure interesting! Thanks a lot!
Dec 05 2022
Nick Treleaven <nick geany.org> writes:
On Sunday, 4 December 2022 at 16:33:35 UTC, rempas wrote:
 struct MemoryBlock {
   char* ptr;
   ulong length;
 }
(MemoryBlock.sizeof is 16 on my 64-bit system).
 void* ptr = cast(void*)0x7a7;

 void* right() {
   return cast(MemoryBlock*)(ptr + MemoryBlock.sizeof); // Cast 
 the whole expression between paranthesis. Got the right value!
 }
The above adds 16 bytes to ptr.
 void* wrong() {
   return cast(MemoryBlock*)ptr + MemoryBlock.sizeof; // First 
 cast the `ptr` variable and then add the number. Got a wronge 
 value...
 }
The above adds 16 * MemoryBlock.sizeof bytes (16 * 16) to ptr, because ptr is cast first. Should be `+ 1` to be equivalent.

https://dlang.org/spec/expression.html#pointer_arithmetic

"the resulting value is the pointer plus (or minus) the second operand **multiplied by the size of the type pointed to by the first operand**."
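Here is a sketch that measures what `right()` and `wrong()` compute. It uses a real stack buffer instead of the made-up address `0x7a7` so the offsets stay in bounds. `storage`, `base` and `offset` are names invented here. On a typical 64-bit target `MemoryBlock.sizeof` is 16, so `wrong` should end up 256 bytes past `base`, while `+ 1` on the `MemoryBlock*` matches `right`.

```d
import std.stdio : writeln;

struct MemoryBlock
{
    char* ptr;
    ulong length;
}

void main()
{
    ubyte[1024] storage;       // illustrative buffer, large enough for every offset below
    void* base = storage.ptr;

    // As in right(): void* arithmetic counts bytes.
    void* right = base + MemoryBlock.sizeof;

    // As in wrong(): after the cast, + counts whole MemoryBlocks.
    void* wrong = cast(MemoryBlock*) base + MemoryBlock.sizeof;

    // Nick's fix: on a MemoryBlock*, step by one element.
    void* fixed = cast(MemoryBlock*) base + 1;

    // Hypothetical helper: byte distance from the start of the buffer.
    size_t offset(void* p) { return cast(size_t) p - cast(size_t) base; }

    writeln("right: ", offset(right));   // MemoryBlock.sizeof bytes
    writeln("wrong: ", offset(wrong));   // MemoryBlock.sizeof squared bytes
    assert(offset(right) == MemoryBlock.sizeof);
    assert(offset(wrong) == MemoryBlock.sizeof * MemoryBlock.sizeof);
    assert(fixed == right);
}
```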
 char* return_address_wrong() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   return cast(char*)(local_ptr + MemoryBlock.sizeof); // Casted 
 the whole expression. BUT GOT THE WRONG VALUE!!!! Why???
 }
Because you are adding to a pointer that points to a 16-byte block, rather than a void* which points to a single byte.
 char* return_address_right() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   return cast(char*)local_ptr + MemoryBlock.sizeof; // Now I 
 first casted the `local_ptr` variable and then added the number 
 but this time this gave me the right value....
 }
The casted pointer points to a single byte.
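Here is the same idea applied to the two `return_address_*` functions, as a sketch with an invented stack buffer. What seems to matter is the pointer type at the moment `+` is evaluated, not where the cast is written. Casting the whole expression therefore also works if the step is counted in blocks (`+ 1`).

```d
import std.stdio : writeln;

struct MemoryBlock
{
    char* ptr;
    ulong length;
}

void main()
{
    ubyte[1024] storage;    // illustrative buffer standing in for the real allocation
    MemoryBlock* local_ptr = cast(MemoryBlock*) storage.ptr;
    char* start = cast(char*) local_ptr;

    // Like return_address_wrong(): + runs on a MemoryBlock*, so it moves
    // MemoryBlock.sizeof whole blocks; the later cast to char* changes nothing.
    char* a = cast(char*)(local_ptr + MemoryBlock.sizeof);

    // Like return_address_right(): cast first, so + moves single chars (bytes).
    char* b = cast(char*) local_ptr + MemoryBlock.sizeof;

    // Casting the whole expression is fine too, if the step is one block.
    char* c = cast(char*)(local_ptr + 1);

    writeln(a - start, " ", b - start, " ", c - start);   // typical 64-bit target: 256 16 16
    assert(b == c);
    assert(cast(size_t)(a - start) == MemoryBlock.sizeof * MemoryBlock.sizeof);
}
```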
Dec 04 2022
rempas <rempas tutanota.com> writes:
On Sunday, 4 December 2022 at 17:27:39 UTC, Nick Treleaven wrote:
 On Sunday, 4 December 2022 at 16:33:35 UTC, rempas wrote:

 (MemoryBlock.sizeof is 16 on my 64-bit system).

 The above adds 16 bytes to ptr.

 The above adds 16 * MemoryBlock.sizeof bytes (16 * 16) to ptr, 
 because ptr is cast first. Should be `+ 1` to be equivalent.

 https://dlang.org/spec/expression.html#pointer_arithmetic

 "the resulting value is the pointer plus (or minus) the second 
 operand **multiplied by the size of the type pointed to by the 
 first operand**."
Thanks! This explains it. I have also tried it, and I can only use "+" or "-" with a pointer, so that fits too.
 char* return_address_wrong() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   return cast(char*)(local_ptr + MemoryBlock.sizeof); // 
 Casted the whole expression. BUT GOT THE WRONG VALUE!!!! Why???
 }
 Because you are adding to a pointer that points to a 16-byte block, rather than a void* which points to a single byte.
 char* return_address_right() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   return cast(char*)local_ptr + MemoryBlock.sizeof; // Now I 
 first casted the `local_ptr` variable and then added the 
 number but this time this gave me the right value....
 }
 The casted pointer points to a single byte.
I think I get it! The first part, about the arithmetic, explains it all well. I was also able to fix my program. The way I see it, when you return from a function you first cast the first operand, and when you want to get a variable (or pass one to a function) you cast the whole expression. At least that's how it worked in my program.
Dec 04 2022
"H. S. Teoh" <hsteoh qfbox.info> writes:
On Sun, Dec 04, 2022 at 04:33:35PM +0000, rempas via Digitalmars-d-learn wrote:
 First a little bit of theory. A pointer just points to a memory
 address which is a number. So when I add "10" to this pointer, it will
 point ten bytes after the place it was pointing to, right?
This is true only if you're talking about pointers in the sense of pointers in assembly language. Languages like C and D add another layer of abstraction over this.
 Another thing with pointers is that it doesn't have "types".
This is where you went wrong. In assembly language, yes, a pointer value is just a number, and there's no type associated with it. However, experience has shown that manipulating pointers at this raw, untyped level is extremely error-prone. Therefore, in languages like C or D, a pointer *does* have a type. It's a way of preventing the programmer from making silly mistakes, by associating a type (at compile-time only, of course) with the pointer value. It's a way of keeping track that address 1234 points to a short, and not to a float, for example.

At the assembly level, of course, this type information is erased, and the pointers are just integer addresses. However, at compile-time, this type exists to prevent the programmer from treating the value at the pointed-to address as the wrong type, or at least to warn about it. This is not only because of data sizes, but also because of the interpretation of data. A 32-bit value interpreted as an int is completely different from a 32-bit value interpreted as a float, for example. You wouldn't want to perform integer arithmetic on something that's supposed to be a float; the result would be garbage.

In addition, although in theory memory is byte-addressable, many architectures impose alignment restrictions on values larger than a byte. For example, the CPU may require that 32-bit values (ints or floats) be aligned to an address that's a multiple of 4 bytes. If you add 1 byte to the address in an int* and try to access the result, it may cause performance issues (the CPU may have to load 2 32-bit values and reassemble parts of them to form the misaligned 32-bit value) or a fault (the CPU may refuse to load a non-aligned address), which could be a silent failure or may cause your program to be forcefully terminated. Therefore, typed pointers like short* and int* may not be entirely an artifact that only exists in the compiler; it may not actually be legal to add a non-aligned byte offset to an int*, depending on the hardware you're running on.
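The following small sketch illustrates the int/float point by reading the same four bytes through two pointer types. It assumes IEEE 754 single-precision `float`, which is what D's `float` is. In that format, `1.0f` is stored as `0x3F800000`.

```d
import std.stdio : writefln;

void main()
{
    float f = 1.0f;

    // Same four bytes, read through a uint* instead of a float*.
    uint bits = *cast(uint*) &f;
    writefln("1.0f as bits: 0x%08X", bits);   // prints "1.0f as bits: 0x3F800000"
    assert(bits == 0x3F80_0000);

    // The reverse: an integer's bytes reinterpreted as a float are not 1.0.
    uint one = 1;
    float tiny = *cast(float*) &one;          // a tiny positive subnormal value
    assert(tiny != 1.0f && tiny > 0);
}
```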
Because of this, C and D implement pointer arithmetic in terms of the underlying value type. I.e., adding 1 to a char* will add 1 to the underlying address, but adding 1 to an int* will add int.sizeof to the underlying address instead of 1. I.e.:

    int[2] x;
    int* p = &x[0]; // let's say this is address 1234
    p++;            // p is now 1238, *not* 1235 (int.sizeof == 4)

As a consequence, when you cast a raw pointer value to a typed pointer, you are responsible for respecting any underlying alignment requirements that the machine may have. Casting a non-aligned address like 1235 to a possibly-aligned pointer like int* may cause problems if you're not careful.

Also, the value type of the pointer *does* matter; you will get different results depending on the size of the type and any alignment requirements it may have. Pointer arithmetic involving T* operates in units of T.sizeof, *not* in terms of the raw pointer value.

T

--
Change is inevitable, except from a vending machine.
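Here is a sketch of respecting alignment before casting a raw pointer. It uses a hypothetical helper, `isAlignedFor`, which is not a Phobos function and is built on `T.alignof`. It assumes a target where `int.alignof` is 4, which holds on common platforms.

```d
// Hypothetical helper, not part of Phobos: true if `p` is suitably
// aligned to be dereferenced as a T*.
bool isAlignedFor(T)(const(void)* p)
{
    return (cast(size_t) p) % T.alignof == 0;
}

void main()
{
    int[2] storage;              // an int array is always int-aligned
    void* base = storage.ptr;

    assert(isAlignedFor!int(base));
    // Assumes int.alignof == 4: one byte in is not safe to read as an int.
    assert(!isAlignedFor!int(base + 1));
    assert(isAlignedFor!int(base + int.sizeof));

    // Only cast and dereference after the check passes.
    void* candidate = base + int.sizeof;
    if (isAlignedFor!int(candidate))
        *cast(int*) candidate = 42;
    assert(storage[1] == 42);
}
```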
Dec 04 2022
rempas <rempas tutanota.com> writes:
On Sunday, 4 December 2022 at 19:00:15 UTC, H. S. Teoh wrote:
 This is true only if you're talking about pointers in the sense 
 of pointers in assembly language.  Languages like C and D add 
 another layer of abstraction over this.


 Another thing with pointers is that it doesn't have "types".
 [...]
Wow! Seriously, thanks a lot for this detailed explanation! I want to write a compiler, and explanations like this, which not only give me the answer but explain in detail why something happens, are a gift for me! I wish I could meet you in person and buy you a coffee. Maybe one day, you never know! Thanks a lot and have an amazing day!
Dec 05 2022