digitalmars.D.learn - How ptr arithmitic works??? It doesn't make any sense....

rempas <rempas tutanota.com> writes:

First a little bit of theory. A pointer just points to a memory 
address which is a number. So when I add "10" to this pointer, it 
will point ten bytes after the place it was pointing to, right? 
Another thing with pointers is that it doesn't have "types". A 
pointer always just points to a location so types are created for 
the compiler so we can catch bugs when pointing to places and 
trying to manipulate the bytes to a size we probably wouldn't 
want to. For example: if you have allocated 4 bytes and then you 
try to point to it with a type of "short" for example, then you 
could only manipulate 2 of these 4 bytes but you probably 
wouldn't and you did something wrong so we do have types and the 
compiler requires explicit pointer type casting (in contrast to 
C) so it can protect you from these bugs.

This type-casting brings some problems, however. I played around
with it and figured out that, to get the location you expect when
returning from a function, you need to do the math and then cast
the whole expression (so the result) and return that. If you only
cast the first value (the one of the different type) and then do
the addition (or whatever expression you want), it will return a
wrong address. But WAIT!!! This doesn't work in a different
example. I'm racking my brain to understand why, so I thought I'd
ask if anyone can help and explain it to me. Btw, all the testing
was done with `ldc` in `BetterC` "mode". Code:

```d
import core.stdc.stdio;
import core.stdc.stdlib;

struct MemoryBlock {
   char* ptr;
   ulong length;
}

void* ptr = cast(void*)0x7a7;

void* right() {
   // Cast the whole expression between parentheses. Got the right value!
   return cast(MemoryBlock*)(ptr + MemoryBlock.sizeof);
}

void* wrong() {
   // First cast the `ptr` variable and then add the number. Got a wrong value...
   return cast(MemoryBlock*)ptr + MemoryBlock.sizeof;
}

char* return_address_wrong() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   // Cast the whole expression. BUT GOT THE WRONG VALUE!!!! Why???
   return cast(char*)(local_ptr + MemoryBlock.sizeof);
}

char* return_address_right() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   // Now I first cast the `local_ptr` variable and then added the number,
   // but this time it gave me the right value....
   return cast(char*)local_ptr + MemoryBlock.sizeof;
}

extern (C) void main() {
   printf("EXPECTED LOCATION: %p\n", ptr + MemoryBlock.sizeof);
   printf("RIGHT LOCATION: %p\n", right());
   printf("WRONG LOCATION: %p\n", wrong());

   printf("RETURNED ADDRESS (wrong): %p\n", return_address_wrong());
   printf("RETURNED ADDRESS (right): %p\n", return_address_right());
}
```

Dec 04 2022

ag0aep6g <anonymous example.com> writes:

On Sunday, 4 December 2022 at 16:33:35 UTC, rempas wrote:
 First a little bit of theory. A pointer just points to a memory 
 address which is a number. So when I add "10" to this pointer, 
 it will point ten bytes after the place it was pointing to, 
 right?

Not quite. Adding 10 to a T* means adding 10 * T.sizeof.
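
A minimal sketch of that rule, measuring how far `+ 10` really moves a pointer. The helper name `byteDistance` is made up for this illustration and is not part of any library:

```d
import std.stdio;

// Hypothetical helper: distance in bytes between two addresses,
// measured by viewing both as pointers to single bytes.
ptrdiff_t byteDistance(const(void)* from, const(void)* to)
{
    return cast(const(ubyte)*) to - cast(const(ubyte)*) from;
}

void main()
{
    int[16] ints;
    int* ip = ints.ptr;
    // int.sizeof is 4 in D, so "+ 10" on an int* moves 40 bytes.
    assert(byteDistance(ip, ip + 10) == 40);

    char[16] chars;
    char* cp = chars.ptr;
    // char.sizeof is 1, so here "+ 10" really is ten bytes.
    assert(byteDistance(cp, cp + 10) == 10);

    writeln(byteDistance(ip, ip + 10), " bytes for int*, ",
            byteDistance(cp, cp + 10), " bytes for char*");
    // prints: 40 bytes for int*, 10 bytes for char*
}
```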

Dec 04 2022

rempas <rempas tutanota.com> writes:

On Sunday, 4 December 2022 at 16:40:17 UTC, ag0aep6g wrote:
 Not quite. Adding 10 to a T* means adding 10 * T.sizeof.

Oh! I thought it was addition. Is there a specific reasoning for 
that if you are aware of?

Dec 04 2022

bauss <jacobbauss gmail.com> writes:

On Monday, 5 December 2022 at 06:12:44 UTC, rempas wrote:
 On Sunday, 4 December 2022 at 16:40:17 UTC, ag0aep6g wrote:
 Not quite. Adding 10 to a T* means adding 10 * T.sizeof.

 Oh! I thought it was addition. Is there a specific reasoning 
 for that if you are aware of?

Because it's much easier to work with.

Ex. if you have an array of 4 signed 32 bit integers that you're 
pointing to then you can simply just increment the pointer by 1.

If it was raw bytes then you'd have to increment the pointer by 4 
to move to the next element.

This is counter-intuitive if you're moving to the next element in 
a loop ex.

This is how you'd do it idiomatically:

```d
foreach (i; 0 .. list.length)
{
     (*cast(int*)(ptr + i)) = i;
}
```

Compared to:

```d

foreach (i; 0 .. list.length)
{
     (*cast(int*)(ptr + (i * 4))) = i;
}
```
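
For comparison, here is a small self-contained sketch of the same loop, assuming `ptr` is already an `int*` (so no cast is needed) and `list` is a fixed array of four ints. The function name `fillWithIndex` is invented for the example:

```d
import std.stdio;

// Fill `count` ints starting at `ptr`, stepping one element at a time.
void fillWithIndex(int* ptr, size_t count)
{
    foreach (i; 0 .. count)
    {
        // ptr + i is already scaled by int.sizeof; no "* 4" needed.
        // The index is a size_t, so it is narrowed explicitly.
        *(ptr + i) = cast(int) i;
    }
}

void main()
{
    int[4] list;
    fillWithIndex(list.ptr, list.length);
    writeln(list); // prints: [0, 1, 2, 3]
}
```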

Dec 05 2022

rempas <rempas tutanota.com> writes:

On Monday, 5 December 2022 at 08:21:44 UTC, bauss wrote:
 Because it's much easier to work with.

 Ex. if you have an array of 4 signed 32 bit integers that 
 you're pointing to then you can simply just increment the 
 pointer by 1.

 If it was raw bytes then you'd have to increment the pointer by 
 4 to move to the next element.

 This is counter-intuitive if you're moving to the next element 
 in a loop ex.

 This is how you'd do it idiomatically:

 ```d
 foreach (i; 0 .. list.length)
 {
     (*cast(int*)(ptr + i)) = i;
 }
 ```

Is this `(*cast(int*)(ptr + i)) = i;` correct, or did you make a
mistake and mean to write `(*cast(int*)ptr + i) = i;`? Because,
like we said before, the first operand must be cast to the type
for this to work right.

 Compared to:

 ```d

 foreach (i; 0 .. list.length)
 {
     (*cast(int*)(ptr + (i * 4))) = i;
 }
 ```

Got it! I guess they could also just allow us to use bracket 
notation to do the same thing. So something like:

```d
foreach (i; 0 .. list.length) {
   (cast(int*)ptr[i]) = i;
}
```

This is what happens with arrays anyways. And arrays ARE pointers 
to a contiguous memory block anyways so they could do the same 
with regular pointers. The example also looks more readable.

Dec 05 2022

ag0aep6g <anonymous example.com> writes:

On Monday, 5 December 2022 at 15:08:41 UTC, rempas wrote:
 Got it! I guess they could also just allow us to use bracket 
 notation to do the same thing. So something like:

 ```d
 foreach (i; 0 .. list.length) {
   (cast(int*)ptr[i]) = i;
 }
 ```

 This is what happens with arrays anyways. And arrays ARE 
 pointers to a contiguous memory block anyways so they could do 
 the same with regular pointers. The example also looks more 
 readable.

You can use bracket notation with pointers. You just need to move 
your closing parenthesis a bit.

Assuming that `ptr` is a `void*`, these are all equivalent:

```d
(cast(int*) ptr)[i] = whatever;
*((cast(int*) ptr) + i) = whatever;
*(cast(int*) (ptr + i * int.sizeof)) = whatever;
```
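
One way to convince yourself, sketched under the assumption that `ptr` points at a buffer of ints: compute the address each form would touch and compare them. `sameAddress` is a hypothetical helper written for this check only:

```d
import std.stdio;

// Returns true if all three spellings refer to the same int slot.
bool sameAddress(void* ptr, size_t i)
{
    int* a = &(cast(int*) ptr)[i];               // bracket notation
    int* b = (cast(int*) ptr) + i;               // typed arithmetic, counts ints
    int* c = cast(int*) (ptr + i * int.sizeof);  // void* arithmetic, counts bytes
    return a is b && b is c;
}

void main()
{
    int[8] buffer;
    void* ptr = buffer.ptr;

    foreach (i; 0 .. buffer.length)
        assert(sameAddress(ptr, i));

    (cast(int*) ptr)[3] = 42;
    writeln(buffer[3]); // prints: 42
}
```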

Dec 05 2022

Salih Dincer <salihdb hotmail.com> writes:

On Monday, 5 December 2022 at 18:01:47 UTC, ag0aep6g wrote:
 You can use bracket notation with pointers. You just need to 
 move your closing parenthesis a bit.

 Assuming that `ptr` is a `void*`, these are all equivalent...

Yeah, there is such a thing!  I'm sure you'll all like this 
example:

```d
struct AAish(K, V, size_t s)
{
   K key;
   V[s] value; // 5 + 1(\0)

   string toString() {
     import std.format : format;
     import core.stdc.string : strlen;
     auto result = value[0..strlen(value.ptr)];
     return format("%s: %s", key, result);
   }
}

void stringCopy(C)(ref C src, string str) {
   import std.algorithm : min;
   auto len = min(src.length - 1,
                  str.length);
   src[0 .. len] = str[0 .. len];
   src[len] = '\0';
}

enum n = 9;
alias AA = AAish!(int, char, n);

void main()
{
   // First, we malloc for multiple AA()
   import core.stdc.stdlib;
   auto v = malloc(n * AA.sizeof);
   static assert(
     is (typeof(v) == void*)
   );

   // Cast to use memory space for AA()'s
   auto ptr = cast(AA*)v;
   static assert(
     is (typeof(ptr) == AA*)
   );

   // init AA()'s:
   foreach(i; 0..n)
   {
     ptr[i] = AA(i);
   }

   import std.stdio;
   ptr[0].value.stringCopy = "zero";
   ptr[0].writeln;

   ptr[1].value.stringCopy = "one";
   ptr[1].writeln;

   ptr[2].value.stringCopy = "two";
   ptr[2].writeln;

   ptr[3].value.stringCopy = "three";
   ptr[3].writeln;

   "...".writeln; //...

   ptr[8].value.stringCopy = "eight";
   ptr[8].writeln;

   ptr[0..n/2].writeln;
}
/* Prints:

0: zero
1: one
2: two
3: three
...
8: eight
[0: zero, 1: one, 2: two, 3: three]

*/
```
SDB 79

Dec 05 2022

rempas <rempas tutanota.com> writes:

On Monday, 5 December 2022 at 22:21:06 UTC, Salih Dincer wrote:
 Yeah, there is such a thing!  I'm sure you'll all like this 
 example:

 [...]

Great example! Thank you my friend!

Dec 06 2022

rempas <rempas tutanota.com> writes:

On Monday, 5 December 2022 at 18:01:47 UTC, ag0aep6g wrote:
 You can use bracket notation with pointers. You just need to 
 move your closing parenthesis a bit.

 Assuming that `ptr` is a `void*`, these are all equivalent:

 ```d
 (cast(int*) ptr)[i] = whatever;
 *((cast(int*) ptr) + i) = whatever;
 *(cast(int*) (ptr + i * int.sizeof)) = whatever;
 ```

Oh, wow! That's sure interesting! Thanks a lot!

Dec 05 2022

Nick Treleaven <nick geany.org> writes:

On Sunday, 4 December 2022 at 16:33:35 UTC, rempas wrote:
 struct MemoryBlock {
   char* ptr;
   ulong length;
 }

(MemoryBlock.sizeof is 16 on my 64-bit system).

 void* ptr = cast(void*)0x7a7;

 void* right() {
   // Cast the whole expression between parentheses. Got the right value!
   return cast(MemoryBlock*)(ptr + MemoryBlock.sizeof);
 }

The above adds 16 bytes to ptr.

 void* wrong() {
   // First cast the `ptr` variable and then add the number. Got a wrong value...
   return cast(MemoryBlock*)ptr + MemoryBlock.sizeof;
 }

The above adds 16 * MemoryBlock.sizeof bytes (16 * 16) to ptr, 
because ptr is cast first. Should be `+ 1` to be equivalent.

https://dlang.org/spec/expression.html#pointer_arithmetic

"the resulting value is the pointer plus (or minus) the second 
operand **multiplied by the size of the type pointed to by the 
first operand**."

 char* return_address_wrong() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   // Cast the whole expression. BUT GOT THE WRONG VALUE!!!! Why???
   return cast(char*)(local_ptr + MemoryBlock.sizeof);
 }

Because you are adding to a pointer that points to a 16-byte 
block, rather than a void* which points to a single byte.

 char* return_address_right() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   // Now I first cast the `local_ptr` variable and then added the number,
   // but this time it gave me the right value....
   return cast(char*)local_ptr + MemoryBlock.sizeof;
 }

The casted pointer points to a single byte.
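
Putting the four cases together, here is a hedged sketch of the two correct ways to step one `MemoryBlock` forward, next to the mistake. The function names are invented for this example, and it uses a real array instead of the fake address `0x7a7` so every computed pointer stays inside valid memory:

```d
import std.stdio;

struct MemoryBlock
{
    char* ptr;
    ulong length;
}

// Step one MemoryBlock forward, counting in bytes (void* moves 1 byte per unit).
void* nextByBytes(void* p)
{
    return p + MemoryBlock.sizeof;
}

// Step one MemoryBlock forward, counting in elements.
void* nextByElements(void* p)
{
    return cast(MemoryBlock*) p + 1;
}

// The mistake from the thread: an element-typed pointer plus a byte count.
void* overshoot(void* p)
{
    return cast(MemoryBlock*) p + MemoryBlock.sizeof;
}

void main()
{
    MemoryBlock[32] blocks; // large enough that overshoot() stays in bounds
    void* base = blocks.ptr;

    // Both correct spellings land on blocks[1].
    assert(nextByBytes(base) is nextByElements(base));
    assert(nextByBytes(base) is &blocks[1]);

    // The mistaken version moves sizeof * sizeof bytes instead.
    auto extra = cast(size_t) overshoot(base) - cast(size_t) base;
    assert(extra == MemoryBlock.sizeof * MemoryBlock.sizeof);
    writeln("overshoot moved ", extra, " bytes");
}
```

The rule of thumb is to pick one unit per expression: either stay with `void*` (or `ubyte*`) and add byte counts, or cast to the element type first and add element counts.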

Dec 04 2022

rempas <rempas tutanota.com> writes:

On Sunday, 4 December 2022 at 17:27:39 UTC, Nick Treleaven wrote:
 On Sunday, 4 December 2022 at 16:33:35 UTC, rempas wrote:

 (MemoryBlock.sizeof is 16 on my 64-bit system).

 The above adds 16 bytes to ptr.

 The above adds 16 * MemoryBlock.sizeof bytes (16 * 16) to ptr, 
 because ptr is cast first. Should be `+ 1` to be equivalent.

 https://dlang.org/spec/expression.html#pointer_arithmetic

 "the resulting value is the pointer plus (or minus) the second 
 operand **multiplied by the size of the type pointed to by the 
 first operand**."

Thanks! This explains it. And I have tried and I can only use "+" 
or "-" with a pointer so it explains it.

 char* return_address_wrong() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   // Cast the whole expression. BUT GOT THE WRONG VALUE!!!! Why???
   return cast(char*)(local_ptr + MemoryBlock.sizeof);
 }

 Because you are adding to a pointer that points to a 16-byte 
 block, rather than a void* which points to a single byte.

 char* return_address_right() {
   MemoryBlock* local_ptr = cast(MemoryBlock*)ptr;
   // Now I first cast the `local_ptr` variable and then added the number,
   // but this time it gave me the right value....
   return cast(char*)local_ptr + MemoryBlock.sizeof;
 }

 The casted pointer points to a single byte.

I think I get it! The first part about the arithmetic explains it
all well. I was also able to fix my program. The way I see it,
you return from a function by first casting the first operand,
and when you want to get a variable (or pass one to a function),
you cast the whole expression. At least that's how it worked with
my program.

Dec 04 2022

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Sun, Dec 04, 2022 at 04:33:35PM +0000, rempas via Digitalmars-d-learn wrote:
 First a little bit of theory. A pointer just points to a memory
 address which is a number. So when I add "10" to this pointer, it will
 point ten bytes after the place it was pointing to, right?

This is true only if you're talking about pointers in the sense of
pointers in assembly language.  Languages like C and D add another layer
of abstraction over this.


 Another thing with pointers is that it doesn't have "types".

This is where you went wrong.  In assembly language, yes, a pointer
value is just a number, and there's no type associated with it.
However, experience has shown that manipulating pointers at this raw,
untyped level is extremely error-prone.  Therefore, in languages like C
or D, a pointer *does* have a type.  It's a way of preventing the
programmer from making silly mistakes, by associating a type (at
compile-time only, of course) to the pointer value.  It's a way of
keeping track that address 1234 points to a short, and not to a float,
for example.  At the assembly level, of course, this type information is
erased, and the pointers are just integer addresses.  However, at
compile-time, this type exists to prevent, or at least warn, the
programmer from treating the value at the pointed-to address as the
wrong type.  This is not only because of data sizes, but the
interpretation of data.  A 32-bit value interpreted as an int is
completely different from a 32-bit value interpreted as a float, for
example.  You wouldn't want to perform integer arithmetic on something
that's supposed to be a float; the result would be garbage.

In addition, although in theory memory is byte-addressable, many
architectures impose alignment restrictions on values larger than a
byte. For example, the CPU may require that 32-bit values (ints or
floats) must be aligned to an address that's a multiple of 4 bytes.  If
you add 1 to an int* address and try to access the result, it may cause
performance issues (the CPU may have to load 2 32-bit values and
reassemble parts of them to form the misaligned 32-bit value) or a fault
(the CPU may refuse to load a non-aligned address), which could be a
silent failure or may cause your program to be forcefully terminated.
Therefore, typed pointers like short* and int* may not be entirely an
artifact that only exists in the compiler; it may not actually be legal
to add a non-aligned value to an int*, depending on the hardware you're
running on.

Because of this, C and D implement pointer arithmetic in terms of the
underlying value type. I.e., adding 1 to a char* will add 1 to the
underlying address, but adding 1 to an int* will add int.sizeof to the
underlying address instead of 1. I.e.:

	int[2] x;
	int* p = &x[0];	// let's say this is address 1234
	p++;		// p is now 1238, *not* 1235 (int.sizeof == 4)

As a consequence, when you cast a raw pointer value to a typed pointer,
you are responsible for respecting any underlying alignment requirements
that the machine may have. Casting a non-aligned address like 1235 to a
possibly-aligned pointer like int* may cause problems if you're not
careful.  Also, the value type of the pointer *does* matter; you will
get different results depending on the size of the type and any
alignment requirements it may have.  Pointer arithmetic involving T*
operates in units of T.sizeof, *not* in terms of the raw pointer value.
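
As a rough illustration of the alignment point, the sketch below checks addresses against `int.alignof`. The helper `isAligned` is hypothetical, and the example only inspects addresses; it never reads through the misaligned pointer:

```d
import std.stdio;

// Hypothetical helper: is `p` a suitably aligned address for a T?
bool isAligned(T)(const(void)* p)
{
    return cast(size_t) p % T.alignof == 0;
}

void main()
{
    int[2] x;
    int* p = &x[0];

    // Stepping by whole elements keeps the pointer aligned.
    assert(isAligned!int(p));
    assert(isAligned!int(p + 1));

    // Stepping by a single raw byte does not (assuming int.alignof > 1,
    // which holds on common targets).
    void* raw = cast(void*) p + 1;
    assert(!isAligned!int(raw));

    writeln("int.alignof = ", int.alignof);
}
```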


T

-- 
Change is inevitable, except from a vending machine.

Dec 04 2022

rempas <rempas tutanota.com> writes:

On Sunday, 4 December 2022 at 19:00:15 UTC, H. S. Teoh wrote:
 This is true only if you're talking about pointers in the sense 
 of pointers in assembly language.  Languages like C and D add 
 another layer of abstraction over this.


 Another thing with pointers is that it doesn't have "types".

 This is where you went wrong.  In assembly language, yes, a 
 pointer value is just a number, and there's no type associated 
 with it. However, experience has shown that manipulating 
 pointers at this raw, untyped level is extremely error-prone.  
 Therefore, in languages like C or D, a pointer *does* have a 
 type.  It's a way of preventing the programmer from making 
 silly mistakes, by associating a type (at compile-time only, of 
 course) to the pointer value.  It's a way of keeping track that 
 address 1234 points to a short, and not to a float, for 
 example.  At the assembly level, of course, this type 
 information is erased, and the pointers are just integer 
 addresses.  However, at compile-time, this type exists to 
 prevent, or at least warn, the programmer from treating the 
 value at the pointed-to address as the wrong type.  This is not 
 only because of data sizes, but the interpretation of data.  A 
 32-bit value interpreted as an int is completely different from 
 a 32-bit value interpreted as a float, for example.  You 
 wouldn't want to perform integer arithmetic on something that's 
 supposed to be a float; the result would be garbage.

 In addition, although in theory memory is byte-addressable, 
 many architectures impose alignment restrictions on values 
 larger than a byte. For example, the CPU may require that 
 32-bit values (ints or floats) must be aligned to an address 
 that's a multiple of 4 bytes.  If you add 1 to an int* address 
 and try to access the result, it may cause performance issues 
 (the CPU may have to load 2 32-bit values and reassemble parts 
 of them to form the misaligned 32-bit value) or a fault (the 
 CPU may refuse to load a non-aligned address), which could be a 
 silent failure or may cause your program to be forcefully 
 terminated. Therefore, typed pointers like short* and int* may 
 not be entirely an artifact that only exists in the compiler; 
 it may not actually be legal to add a non-aligned value to an 
 int*, depending on the hardware you're running on.

 Because of this, C and D implement pointer arithmetic in terms 
 of the underlying value type. I.e., adding 1 to a char* will 
 add 1 to the underlying address, but adding 1 to an int* will 
 add int.sizeof to the underlying address instead of 1. I.e.:

 	int[2] x;
 	int* p = &x[0];	// let's say this is address 1234
 	p++;		// p is now 1238, *not* 1235 (int.sizeof == 4)

 As a consequence, when you cast a raw pointer value to a typed 
 pointer, you are responsible for respecting any underlying 
 alignment requirements that the machine may have. Casting a 
 non-aligned address like 1235 to a possibly-aligned pointer 
 like int* may cause problems if you're not careful.  Also, the 
 value type of the pointer *does* matter; you will get different 
 results depending on the size of the type and any alignment 
 requirements it may have.  Pointer arithmetic involving T* 
 operates in units of T.sizeof, *not* in terms of the raw 
 pointer value.


 T

Wow! Seriously, thanks a lot for this detailed explanation! I
want to write a compiler, and explanations like this, which not
only give me the answer but also explain in detail why something
happens, are a gift for me! I wish I could meet you in person and
buy you a coffee. Maybe one day, you never know! Thanks a lot and
have an amazing day!

Dec 05 2022
