www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Has AA a bad implementation?

reply frame <frame86 live.com> writes:
Talking about this:

aaA.d: 157
 void clear() pure nothrow
 {
    import core.stdc.string : memset;
    // clear all data, but don't change bucket array length
    memset(&buckets[firstUsed], 0, (buckets.length - firstUsed) 
 * Bucket.sizeof);
    deleted = used = 0;
    firstUsed = cast(uint) dim;
 }
I see this is for performance but it also means that an associative array never frees the memory it has allocated for life time - or I'm wrong? I am missing a real empty() method here.
Feb 12
parent reply welkam <wwwelkam gmail.com> writes:
On Friday, 12 February 2021 at 12:41:32 UTC, frame wrote:
 Talking about this:

 aaA.d: 157
 void clear() pure nothrow
 {
    import core.stdc.string : memset;
    // clear all data, but don't change bucket array length
    memset(&buckets[firstUsed], 0, (buckets.length - firstUsed) 
 * Bucket.sizeof);
    deleted = used = 0;
    firstUsed = cast(uint) dim;
 }
I see this is for performance but it also means that an associative array never frees the memory it has allocated for life time - or I'm wrong? I am missing a real empty() method here.
There can be slices that point to the memory owned by this array so you cant just free it. If you want manually manage the memory then you should look at dub for different containers.
Feb 12
parent reply frame <frame86 live.com> writes:
On Friday, 12 February 2021 at 12:48:30 UTC, welkam wrote:

 There can be slices that point to the memory owned by this 
 array so you cant just free it. If you want manually manage the 
 memory then you should look at dub for different containers.
The array is overwritten with 0 so any pointers are lost anyway. Yeah, I can use any element I want. But my problem is that any thirdparty code may not care about. In any static context it produces leaks. This behavior is not documented - anyone thinks that clear() means clear, not: fill with 0. In some context it's needed to reduce memory. As a standard element it should at least have an explicit ability to free memory when it's really desired.
Feb 12
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 2/12/21 3:35 PM, frame wrote:
 On Friday, 12 February 2021 at 12:48:30 UTC, welkam wrote:
 
 There can be slices that point to the memory owned by this array so 
 you cant just free it. If you want manually manage the memory then you 
 should look at dub for different containers.
The array is overwritten with 0 so any pointers are lost anyway. Yeah, I can use any element I want. But my problem is that any thirdparty code may not care about. In any static context it produces leaks. This behavior is not documented - anyone thinks that clear() means clear, not: fill with 0. In some context it's needed to reduce memory. As a standard element it should at least have an explicit ability to free memory when it's really desired.
There's a misunderstanding here. The bucket array is not the data itself, its just basically a pointer and the hash. The Bucket struct is here: https://github.com/dlang/druntime/blob/a026ea46088b369f1182c397e7e046f5fc950358/src/rt/aaA.d#L173 When you use clear, it removes the pointers to all the existing key/value pairs. Those data items are separate GC blocks. If you no longer have any pointers to them, they will be garbage collected. If you want the bucket memory itself to be freed on the next GC cycle, you can just set the AA to null. But the bucket memory shouldn't be that cumbersome. I don't think there's any current way to explicitly free all the items in the AA. Even aa.remove(key) is not going to free the item. -Steve
Feb 12
next sibling parent reply frame <frame86 live.com> writes:
On Friday, 12 February 2021 at 20:59:50 UTC, Steven Schveighoffer 
wrote:

 There's a misunderstanding here. The bucket array is not the 
 data itself, its just basically a pointer and the hash. The 
 Bucket struct is here:
No, I'm talking about those Buckets.
 If you want the bucket memory itself to be freed on the next GC 
 cycle, you can just set the AA to null. But the bucket memory 
 shouldn't be that cumbersome.
Maybe not, but it comes in mind for daemon applications. 10000 items are consuming 600 ~ 800K each array. Yes, nulling helps to get rid of the Bucket list. Also destroy(). But that's not so intentional than using clear() - this is my point.
Feb 12
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 2/12/21 4:56 PM, frame wrote:
 On Friday, 12 February 2021 at 20:59:50 UTC, Steven Schveighoffer wrote:
 
 There's a misunderstanding here. The bucket array is not the data 
 itself, its just basically a pointer and the hash. The Bucket struct 
 is here:
No, I'm talking about those Buckets.
 If you want the bucket memory itself to be freed on the next GC cycle, 
 you can just set the AA to null. But the bucket memory shouldn't be 
 that cumbersome.
Maybe not, but it comes in mind for daemon applications. 10000 items are consuming 600 ~ 800K each array.
Your numbers are a bit high, but I don't know what else you would expect. If you have an AA with 10000 elements, it's going to consume a multiple of that number in memory.
 Yes, nulling helps to get rid of the Bucket list. Also destroy(). But 
 that's not so intentional than using clear() - this is my point.
The intent of clear is to reset the array so I DON'T have to reallocate the bucket array. Before clear existed, you had to remove each element one at a time, and then it would resize down throwing away all that space. If your use case is to clear an aa to fill it up again, then clear is perfect for that. It also has implications if you have multiple references to the same AA. Note, I thought rehash would shrink the bucket array down, but it looks like it ignores that when the AA is empty. That seems like an omission. Other types in Phobos treat clear the same way (e.g. appender). I think you are just misunderstanding what clear is for. It means, reset but don't deallocate. -Steve
Feb 12
next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer 
wrote:
 Note, I thought rehash would shrink the bucket array down, but 
 it looks like it ignores that when the AA is empty. That seems 
 like an omission.
Do we necessarily want that shrinkage by default? If an AA has had a certain size in the past, and then is cleared, it seems likely that the typical use case is that it will fill up to a similar size again. For comparison, does `rehash` shrink the bucket array when the AA is non-empty but much, much smaller than the bucket array? I don't object to the possibility to shrink the bucket array where possible but it should probably require an explicit and unambiguous request.
Feb 12
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 2/12/21 6:13 PM, Joseph Rushton Wakeling wrote:
 On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer wrote:
 Note, I thought rehash would shrink the bucket array down, but it 
 looks like it ignores that when the AA is empty. That seems like an 
 omission.
Do we necessarily want that shrinkage by default?  If an AA has had a certain size in the past, and then is cleared, it seems likely that the typical use case is that it will fill up to a similar size again.
The shrinkage happens by default if you remove an element and it goes below a threshold. It does not shrink on clear (on purpose, for the reason you say).
 
 For comparison, does `rehash` shrink the bucket array when the AA is 
 non-empty but much, much smaller than the bucket array?
Yes. If it has one element, then it will resize the array.
 I don't object to the possibility to shrink the bucket array where 
 possible but it should probably require an explicit and unambiguous 
 request.
I think it already does shrink for a rehash. It's just set up to not shrink when the AA is "empty". I'm not really sure if it's a problem, just a weird inconsistency. Technically nobody should care about the bucket array, it's an implementation detail. -Steve
Feb 12
prev sibling parent reply frame <frame86 live.com> writes:
On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer 
wrote:
 On 2/12/21 4:56 PM, frame wrote:
 The intent of clear is to reset the array so I DON'T have to 
 reallocate the bucket array.
I know.
 Other types in Phobos treat clear the same way (e.g. appender).
Is this really the same? This looks more like clearing in Appender:
 void clear()  trusted pure nothrow
 {
    if (_data)
    {
        _data.arr = _data.arr.ptr[0 .. 0];
    }
 }
 I think you are just misunderstanding what clear is for. It 
 means, reset but don't deallocate.
So it should be called reset() or blank() then.
 I'm not really sure if it's a problem, just a weird 
 inconsistency. Technically nobody should care about the bucket 
 array, it's an implementation detail.
I just don't like a widely used but leaky standard element that imply it does not. Better correct than error. But ok, it's not a big deal. Should be maybe documented that null can be useful.
Feb 12
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Saturday, 13 February 2021 at 03:43:13 UTC, frame wrote:
 On Friday, 12 February 2021 at 22:23:45 UTC, Steven 
 Schveighoffer wrote:
 I think you are just misunderstanding what clear is for. It 
 means, reset but don't deallocate.
So it should be called reset() or blank() then.
Think of it this way: when you "clear" a room, you remove all of its current occupants, but you do not demolish the room itself.
Feb 12
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 2/12/21 10:43 PM, frame wrote:
 On Friday, 12 February 2021 at 22:23:45 UTC, Steven Schveighoffer wrote:
 On 2/12/21 4:56 PM, frame wrote:
 Other types in Phobos treat clear the same way (e.g. appender).
Is this really the same? This looks more like clearing in Appender:
 void clear()  trusted pure nothrow
 {
    if (_data)
    {
        _data.arr = _data.arr.ptr[0 .. 0];
    }
 }
Yes, it's the same. Appender uses knowledge of the array allocated length, the _data.arr slice is the current "valid" portion. Your request is the the bucket list is deallocated. Appender does not deallocate the array, as it still deliberately retains a pointer to the GC block.
 I think you are just misunderstanding what clear is for. It means, 
 reset but don't deallocate.
So it should be called reset() or blank() then.
You are not the first to interpret the meaning of words differently. These are not winnable arguments. Just learn what it does now, not what you think it should do. And if you want a feature to do what you want, propose it under a different name. The name "clear" is not changing.
 I'm not really sure if it's a problem, just a weird inconsistency. 
 Technically nobody should care about the bucket array, it's an 
 implementation detail.
I just don't like a widely used but leaky standard element that imply it does not. Better correct than error. But ok, it's not a big deal. Should be maybe documented that null can be useful.
Setting to null is useful, but only if you have one reference to the AA. It won't deallocate the buckets if you have more than one reference. e.g.: int[int] aa = [1 : 2, 3 : 4]; auto bb = aa; aa = null; assert(bb.length == 2); // still there aa = bb; aa.clear; assert(bb.length == 0); // all references are now empty, but bucket array is still fully there There is probably room to have a mechanism to destroy the buckets for all references. Looking at destroy, it looks like it's equivalent to setting equal to null. Which is consistent, but I'm surprised there's no "recursive destroy" type function, even for normal arrays. -Steve
Feb 13
prev sibling parent Elronnd <elronnd elronnd.net> writes:
On Friday, 12 February 2021 at 20:59:50 UTC, Steven Schveighoffer 
wrote:
 I don't think there's any current way to explicitly free all 
 the items in the AA. Even aa.remove(key) is not going to free 
 the item.
Removing all the keys will trigger resizes, freeing the memory. But doing that individually for each key incurs overhead. Why not expose such functionality?
Feb 16