www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Pre-expanding alloc cell(s) / reserving space for an associative array

reply Cecil Ward <cecil cecilward.com> writes:
Before I posted a question about avoiding unnecessary 
allocs/reallocs when adding entries to an array like so
     uint[ dstring ]  arr;

when I build it up from nothing with successive insertions.

The array is accessed by a key that is a dstring. I was told that 
I can’t use .reserve or the like on it? Is that correct? My 
memory fails me, powerful pain drugs.

Is there an alternate method I could use ? To be honest, the 
number of entries is likely to be extremely modest, so this is 
not a huge performance issue, six entries would be considered 
quite a lot. The string keys are probably not very very long. But 
I’m learning good habits for the day when I might have a bigger 
such database.
Jul 09 2023
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/9/23 4:24 PM, Cecil Ward wrote:
 Before I posted a question about avoiding unnecessary allocs/reallocs 
 when adding entries to an array like so
      uint[ dstring ]  arr;
 
 when I build it up from nothing with successive insertions.
 
 The array is accessed by a key that is a dstring. I was told that I 
 can’t use .reserve or the like on it? Is that correct? My memory fails 
 me, powerful pain drugs.
 
 Is there an alternate method I could use ? To be honest, the number of 
 entries is likely to be extremely modest, so this is not a huge 
 performance issue, six entries would be considered quite a lot. The 
 string keys are probably not very very long. But I’m learning good 
 habits for the day when I might have a bigger such database.
No there is no such `.reserve` call for associative arrays. It might be possible to implement, but it's quite a bit different from a normal array reserve -- an associative array allocates all its elements as individual memory blocks, so there is no place to reserve things. You would have to add e.g. a free-list, that is reserved from the allocator itself. -Steve
Jul 09 2023
prev sibling parent reply IchorDev <zxinsworld gmail.com> writes:
On Sunday, 9 July 2023 at 20:24:24 UTC, Cecil Ward wrote:
 Before I posted a question about avoiding unnecessary 
 allocs/reallocs when adding entries to an array like so
     uint[ dstring ]  arr;

 when I build it up from nothing with successive insertions.

 The array is accessed by a key that is a dstring. I was told 
 that I can’t use .reserve or the like on it? Is that correct? 
 My memory fails me, powerful pain drugs.

 Is there an alternate method I could use ? To be honest, the 
 number of entries is likely to be extremely modest, so this is 
 not a huge performance issue, six entries would be considered 
 quite a lot. The string keys are probably not very very long. 
 But I’m learning good habits for the day when I might have a 
 bigger such database.
From the spec it sounds as though (but good luck testing for sure) that if you have (for example) 6 big dummy key-value pairs in the AA to begin with, then if you use `.clear` it "Removes all remaining keys and values from [the] associative array. The array is not rehashed after removal, __to allow for the existing storage to be reused.__" ([source](https://dlang.org/spec/hash-map.html#properties)) I'm not 100% sure this would work how you want, and it's such a hack, but it's something. D has a few user-made hash-map libraries, so it might be worth investigating those to see if they offer a pre-allocation method, too.
Jul 10 2023
parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Mon, Jul 10, 2023 at 09:30:57AM +0000, IchorDev via Digitalmars-d-learn
wrote:
[...]
 From the spec it sounds as though (but good luck testing for sure)
 that if you have (for example) 6 big dummy key-value pairs in the AA
 to begin with, then if you use `.clear` it "Removes all remaining keys
 and values from [the] associative array. The array is not rehashed
 after removal, __to allow for the existing storage to be reused.__"
[...] This is not an accurate understanding of what actually happens. The AA implementation consists of a primary hashtable (an array), each slot of which points to a list of buckets. Clearing the AA does not discard the hashtable, but does dispose of the buckets, so adding new keys afterwards will allocate new buckets. So the buckets used by the dummy key-value pairs do not get reused without a reallocation. T -- This is a tpyo.
Jul 10 2023
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/10/23 8:44 AM, H. S. Teoh wrote:
 On Mon, Jul 10, 2023 at 09:30:57AM +0000, IchorDev via Digitalmars-d-learn
wrote:
 [...]
  From the spec it sounds as though (but good luck testing for sure)
 that if you have (for example) 6 big dummy key-value pairs in the AA
 to begin with, then if you use `.clear` it "Removes all remaining keys
 and values from [the] associative array. The array is not rehashed
 after removal, __to allow for the existing storage to be reused.__"
[...] This is not an accurate understanding of what actually happens. The AA implementation consists of a primary hashtable (an array), each slot of which points to a list of buckets. Clearing the AA does not discard the hashtable, but does dispose of the buckets, so adding new keys afterwards will allocate new buckets. So the buckets used by the dummy key-value pairs do not get reused without a reallocation.
This used to be the case, but in the latest implementation, the AA uses open addressing. That is, if a slot is full, it linearly searches the table for the next open slot. This cuts down significantly on cache misses, since the hash slot itself stores the hash, so the pointer only needs to be followed if the hash matches. Fun historical fact, the original AA used a tree to store colliding elements, which means that all AA keys had to support both opEquals and opCmp. BUT, you are right that the elements themselves are individual blocks of memory, and only the bucket array is reused. If you want to see how the hash table works, without having to deal with all the typeinfo acrobatics that the actual builtin AA uses, you can look at my newaa library: https://github.com/schveiguy/newaa This is the exact implementation of the AA internals, but uses compile-time introspection instead of typeinfo. -Steve
Jul 10 2023