www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Does associative array change the location of values?

reply Andrey Zherikov <andrey.zherikov gmail.com> writes:
I want to have a pointer to a value in an associative array. Does 
AA guarantee that the value will remain at the same address all 
the time unless I remove the corresponding key? I couldn't find 
any guarantees similar to C++ iterator invalidation in D Language 
Reference.
Oct 29 2021
parent reply Paul Backus <snarwin gmail.com> writes:
On Friday, 29 October 2021 at 17:40:38 UTC, Andrey Zherikov wrote:
 I want to have a pointer to a value in an associative array. 
 Does AA guarantee that the value will remain at the same 
 address all the time unless I remove the corresponding key? I 
 couldn't find any guarantees similar to C++ iterator 
 invalidation in D Language Reference.
No, the AA does not guarantee that the value will remain in the same location. Inserting or removing *any* keys could cause the AA to resize, which may change the locations of all of its values. However, you do not have to worry about undefined behavior, because the garbage collector will keep the "stale" copy of the value alive as long as you hold a pointer to it.
Oct 29 2021
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Oct 29, 2021 at 05:58:24PM +0000, Paul Backus via Digitalmars-d-learn
wrote:
 On Friday, 29 October 2021 at 17:40:38 UTC, Andrey Zherikov wrote:
 I want to have a pointer to a value in an associative array. Does AA
 guarantee that the value will remain at the same address all the
 time unless I remove the corresponding key? I couldn't find any
 guarantees similar to C++ iterator invalidation in D Language
 Reference.
No, the AA does not guarantee that the value will remain in the same location. Inserting or removing *any* keys could cause the AA to resize, which may change the locations of all of its values. However, you do not have to worry about undefined behavior, because the garbage collector will keep the "stale" copy of the value alive as long as you hold a pointer to it.
One way to achieve what the OP wants is to store a pointer to a heap-allocated object in the AA. Then AA rehashing won't change the value of the pointer. T -- Designer clothes: how to cover less by paying more.
Oct 29 2021
prev sibling next sibling parent Andrey Zherikov <andrey.zherikov gmail.com> writes:
On Friday, 29 October 2021 at 17:58:24 UTC, Paul Backus wrote:
 No, the AA does not guarantee that the value will remain in the 
 same location. Inserting or removing *any* keys could cause the 
 AA to resize, which may change the locations of all of its 
 values.

 However, you do not have to worry about undefined behavior, 
 because the garbage collector will keep the "stale" copy of the 
 value alive as long as you hold a pointer to it.
Thanks a lot for clarification
Oct 29 2021
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/29/21 1:58 PM, Paul Backus wrote:
 On Friday, 29 October 2021 at 17:40:38 UTC, Andrey Zherikov wrote:
 I want to have a pointer to a value in an associative array. Does AA 
 guarantee that the value will remain at the same address all the time 
 unless I remove the corresponding key? I couldn't find any guarantees 
 similar to C++ iterator invalidation in D Language Reference.
No, the AA does not guarantee that the value will remain in the same location. Inserting or removing *any* keys could cause the AA to resize, which may change the locations of all of its values. However, you do not have to worry about undefined behavior, because the garbage collector will keep the "stale" copy of the value alive as long as you hold a pointer to it.
This is incorrect, the buckets are each heap allocated. Just the array of bucket pointers would change. In addition, AAs do not deallocate the key/value pairs ever. You are safe to obtain a pointer to a value and it will stay there, even if you remove the key. -Steve
Oct 29 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Friday, 29 October 2021 at 21:00:48 UTC, Steven Schveighoffer 
wrote:

 This is incorrect, the buckets are each heap allocated. Just 
 the array of bucket pointers would change.

 In addition, AAs do not deallocate the key/value pairs ever. 
 You are safe to obtain a pointer to a value and it will stay 
 there, even if you remove the key.
Who's going to document these implementation details? ;) I mean, if no one, then the above shouldn't be stated. Wouldn't you agree? Given the premise of the question at hand, it does seem useful to know these. But at least one should stress what is and isn't subject to change (even if unlikely).
Oct 29 2021
next sibling parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Saturday, 30 October 2021 at 00:49:04 UTC, Stanislav Blinov 
wrote:
 On Friday, 29 October 2021 at 21:00:48 UTC, Steven 
 Schveighoffer wrote:

 This is incorrect, the buckets are each heap allocated. Just 
 the array of bucket pointers would change.

 In addition, AAs do not deallocate the key/value pairs ever. 
 You are safe to obtain a pointer to a value and it will stay 
 there, even if you remove the key.
Who's going to document these implementation details? ;) I mean, if no one, then the above shouldn't be stated. Wouldn't you agree? Given the premise of the question at hand, it does seem useful to know these. But at least one should stress what is and isn't subject to change (even if unlikely).
This should be documented for sure
Oct 29 2021
parent reply Andrey Zherikov <andrey.zherikov gmail.com> writes:
On Saturday, 30 October 2021 at 00:52:23 UTC, Imperatorn wrote:
 On Saturday, 30 October 2021 at 00:49:04 UTC, Stanislav Blinov 
 wrote:
 On Friday, 29 October 2021 at 21:00:48 UTC, Steven 
 Schveighoffer wrote:

 This is incorrect, the buckets are each heap allocated. Just 
 the array of bucket pointers would change.

 In addition, AAs do not deallocate the key/value pairs ever. 
 You are safe to obtain a pointer to a value and it will stay 
 there, even if you remove the key.
Who's going to document these implementation details? ;) I mean, if no one, then the above shouldn't be stated. Wouldn't you agree? Given the premise of the question at hand, it does seem useful to know these. But at least one should stress what is and isn't subject to change (even if unlikely).
This should be documented for sure
I did small test and it printed the same values three times so even rehash doesn't change the address of the value: ```d long[long] aa = [0:0]; writeln(&aa[0]); foreach(i; 0 .. 100_000_000) aa[i]=i; writeln(&aa[0]); aa.rehash; writeln(&aa[0]); ``` So it seems pretty safe to store a pointer to a value in AA. And I agree that this should definitely be documented.
Oct 30 2021
next sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 30 October 2021 at 18:31:16 UTC, Andrey Zherikov 
wrote:

 I did small test and it printed the same values three times so 
 even rehash doesn't change the address of the value:
 So it seems pretty safe to store a pointer to a value in AA. 
 And I agree that this should definitely be documented.
Address itself may change though. While the AA won't move stuff around, a GC might. I don't think current GC moves anything, but it's definitely allowed to. Which would be transparent to your code so long as you don't depend on the value of address itself :) https://dlang.org/spec/garbage.html#pointers_and_gc
Oct 30 2021
parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Saturday, 30 October 2021 at 20:05:17 UTC, Stanislav Blinov 
wrote:
 On Saturday, 30 October 2021 at 18:31:16 UTC, Andrey Zherikov 
 wrote:

 I did small test and it printed the same values three times so 
 even rehash doesn't change the address of the value:
 So it seems pretty safe to store a pointer to a value in AA. 
 And I agree that this should definitely be documented.
Address itself may change though. While the AA won't move stuff around, a GC might. I don't think current GC moves anything, but it's definitely allowed to. Which would be transparent to your code so long as you don't depend on the value of address itself :) https://dlang.org/spec/garbage.html#pointers_and_gc
What test could be written to verify the behaviour?
Oct 30 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 30 October 2021 at 20:19:58 UTC, Imperatorn wrote:

 https://dlang.org/spec/garbage.html#pointers_and_gc
What test could be written to verify the behaviour?
Assuming the GC was moving? You'd need a loop allocating different sizes, storing the addresses somewhere the GC won't see (i.e. in memory allocated not with the GC: malloc, VirtualAlloc, mmap...), orphaning some allocations, repeating a bunch of times, and then comparing addresses of remaining allocations with stored ones. Behavior would very much depend on the GC implementation, so would a concrete test. Point is, it's allowed to move so we have to assume it would even if it doesn't, and relying on it not moving (i.e. depending on concrete addresses) is UB.
Oct 30 2021
parent reply Elronnd <elronnd elronnd.net> writes:
On Saturday, 30 October 2021 at 21:20:15 UTC, Stanislav Blinov 
wrote:
 On Saturday, 30 October 2021 at 20:19:58 UTC, Imperatorn wrote:

 https://dlang.org/spec/garbage.html#pointers_and_gc
What test could be written to verify the behaviour?
Assuming the GC was moving?
If the GC were moving, it would also have to move the pointers you took to AA elements. You would never get stale pointers in any event.
Oct 30 2021
next sibling parent Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 30 October 2021 at 22:47:57 UTC, Elronnd wrote:

 If the GC were moving, it would also have to move the pointers 
 you took to AA elements.  You would never get stale pointers in 
 any event.
Who said you would?..
Oct 30 2021
prev sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 10/30/21 3:47 PM, Elronnd wrote:

 If the GC were moving, it would also have to move the pointers you took
 to AA elements.
I doubt D's GC can ever change pointer values because the values may be hiding inside e.g. ulong variables. And we would definitely not want the GC to change ulong values just because it thought they were familiar pointer values in disguise. :) Ali
Oct 30 2021
next sibling parent Elronnd <elronnd elronnd.net> writes:
On Sunday, 31 October 2021 at 02:56:35 UTC, Ali Çehreli wrote:
 On 10/30/21 3:47 PM, Elronnd wrote:

 If the GC were moving, it would also have to move the
pointers you took
 to AA elements.
I doubt D's GC can ever change pointer values because the values may be hiding inside e.g. ulong variables. And we would definitely not want the GC to change ulong values just because it thought they were familiar pointer values in disguise. :)
Precise GC exists now.
Oct 30 2021
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Oct 30, 2021 at 07:56:35PM -0700, Ali Çehreli via Digitalmars-d-learn
wrote:
 On 10/30/21 3:47 PM, Elronnd wrote:
 
 If the GC were moving, it would also have to move the pointers you
 took to AA elements.
I doubt D's GC can ever change pointer values because the values may be hiding inside e.g. ulong variables. And we would definitely not want the GC to change ulong values just because it thought they were familiar pointer values in disguise. :)
[...] The current spec explicitly states that masking pointers this way is UB. T -- Без труда не выловишь и рыбку из пруда.
Oct 31 2021
parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 10/31/21 9:49 AM, H. S. Teoh wrote:

 The current spec explicitly states that masking pointers this way is UB.
Ok. :) What about unions? union U { ulong u; void* p; } Can the GC deal with that? Or other smart ways where I store the pointer in two parts as page+offset? (Perhaps that's UB as well.) Ali
Oct 31 2021
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Oct 31, 2021 at 09:54:23AM -0700, Ali Çehreli via Digitalmars-d-learn
wrote:
 On 10/31/21 9:49 AM, H. S. Teoh wrote:
 
 The current spec explicitly states that masking pointers this way is UB.
Ok. :) What about unions? union U { ulong u; void* p; } Can the GC deal with that?
https://dlang.org/spec/garbage.html#obj_pinning_and_gc Though I have to say, object pinning does impose restrictions on a moving GC implementation that may negate some of its benefits. T -- Never criticize a man until you've walked a mile in his shoes. Then when you do criticize him, you'll be a mile away and he won't have his shoes.
Oct 31 2021
prev sibling parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 10/31/21 9:54 AM, Ali =C3=87ehreli wrote:

 Or other smart ways where I store the pointer in two parts as=20
 page+offset?
Ok, that was silly. There would be no reference to the GC memory if I=20 chopped off a pointer. :/ Ali
Oct 31 2021
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/30/21 2:31 PM, Andrey Zherikov wrote:
 On Saturday, 30 October 2021 at 00:52:23 UTC, Imperatorn wrote:
 On Saturday, 30 October 2021 at 00:49:04 UTC, Stanislav Blinov wrote:
 On Friday, 29 October 2021 at 21:00:48 UTC, Steven Schveighoffer wrote:

 This is incorrect, the buckets are each heap allocated. Just the 
 array of bucket pointers would change.

 In addition, AAs do not deallocate the key/value pairs ever. You are 
 safe to obtain a pointer to a value and it will stay there, even if 
 you remove the key.
Who's going to document these implementation details? ;) I mean, if no one, then the above shouldn't be stated. Wouldn't you agree? Given the premise of the question at hand, it does seem useful to know these. But at least one should stress what is and isn't subject to change (even if unlikely).
This should be documented for sure
I did small test and it printed the same values three times so even rehash doesn't change the address of the value: ```d             long[long] aa = [0:0];             writeln(&aa[0]);             foreach(i; 0 .. 100_000_000)                     aa[i]=i;             writeln(&aa[0]);             aa.rehash;             writeln(&aa[0]); ``` So it seems pretty safe to store a pointer to a value in AA. And I agree that this should definitely be documented.
Here is the rehash code, it definitely just moves around the buckets: https://github.com/dlang/druntime/blob/20963f956c550b7b1b52849e5cf41f436d0df788/src/rt/aaA.d#L144-L157 That free at the end is just for the original `Bucket[]` array. A Bucket is just a hash and a void pointer to the actual key/value pair. When you get a pointer to an AA value, it's to the key/value pair block (there is no struct for this, it's done via runtime type information, which is horrific legacy). -Steve
Oct 30 2021
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/29/21 8:49 PM, Stanislav Blinov wrote:
 On Friday, 29 October 2021 at 21:00:48 UTC, Steven Schveighoffer wrote:
 
 This is incorrect, the buckets are each heap allocated. Just the array 
 of bucket pointers would change.

 In addition, AAs do not deallocate the key/value pairs ever. You are 
 safe to obtain a pointer to a value and it will stay there, even if 
 you remove the key.
Who's going to document these implementation details? ;) I mean, if no one, then the above shouldn't be stated. Wouldn't you agree?
It should be documented. There isn't a valid way to remove these requirements, even if they are currently just an implementation detail -- code already depends on these properties. And D is a GC-based language, especially when using AAs. There is no reason to introduce undefined behavior for existing usage.
 Given the premise of the question at hand, it does seem useful to know 
 these. But at least one should stress what is and isn't subject to 
 change (even if unlikely).
The whole AA documentation probably needs some attention. -Steve
Oct 30 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 30 October 2021 at 11:59:15 UTC, Steven 
Schveighoffer wrote:

 It should be documented. There isn't a valid way to remove 
 these requirements, even if they are currently just an 
 implementation detail -- code already depends on these 
 properties.
 And D is a GC-based language, especially when using AAs. There 
 is no reason to introduce undefined behavior for existing usage.
You won't introduce UB by deallocating unreferenced elements, which a given GC *may* be able to figure out. Therefore I object to "AAs do not deallocate the key/value pairs ever" part. Strongly :) Until such time that such a requirement is indeed set in stone, and not incidental.
Oct 30 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/30/21 10:51 AM, Stanislav Blinov wrote:
 On Saturday, 30 October 2021 at 11:59:15 UTC, Steven Schveighoffer wrote:
 
 It should be documented. There isn't a valid way to remove these 
 requirements, even if they are currently just an implementation detail 
 -- code already depends on these properties.
 And D is a GC-based language, especially when using AAs. There is no 
 reason to introduce undefined behavior for existing usage.
You won't introduce UB by deallocating unreferenced elements, which a given GC *may* be able to figure out. Therefore I object to "AAs do not deallocate the key/value pairs ever" part. Strongly :) Until such time that such a requirement is indeed set in stone, and not incidental.
auto v = k in aa; aa.remove(k); How can the GC/compiler work out that there is still a reference? -Steve
Oct 30 2021
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 30 October 2021 at 16:55:03 UTC, Steven 
Schveighoffer wrote:

 auto v = k in aa;
 aa.remove(k);

 How can the GC/compiler work out that there is still a 
 reference?
??? The same way it does for all other references. I think either you misunderstood me, or I misunderstood you.
Oct 30 2021
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/30/21 1:38 PM, Stanislav Blinov wrote:
 On Saturday, 30 October 2021 at 16:55:03 UTC, Steven Schveighoffer wrote:
 
 auto v = k in aa;
 aa.remove(k);

 How can the GC/compiler work out that there is still a reference?
??? The same way it does for all other references. I think either you misunderstood me, or I misunderstood you.
You said "deallocating unreferenced elements". I thought you meant elements unreferenced by the AA. What I mean is, the AA isn't going to change implementations where it now deallocates values that may still have existing references. If that's the case, we can state that in the docs. e.g.: Removing a key does not deallocate the value that was removed. The value's lifetime is managed by the GC and will be alive until there are no references to that value. -Steve
Oct 30 2021
parent Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 30 October 2021 at 17:45:57 UTC, Steven 
Schveighoffer wrote:

 You said "deallocating unreferenced elements". I thought you 
 meant elements unreferenced by the AA.
Yup, I misunderstood you :)
 What I mean is, the AA isn't going to change implementations 
 where it now deallocates values that may still have existing 
 references. If that's the case, we can state that in the docs.

 e.g.:

 Removing a key does not deallocate the value that was removed. 
 The value's lifetime is managed by the GC and will be alive 
 until there are no references to that value.
:whateveritistodoathumbsuphere:
Oct 30 2021