digitalmars.D - https://issues.dlang.org/show_bug.cgi?id=2504: reserve for
- Andrei Alexandrescu (27/27) Nov 01 2016 Our student Alex has been looking at this and I wanted to open the floor...
- Jonathan M Davis via Digitalmars-d (35/36) Nov 01 2016 Hasn't Martin been working on a new, templatized AA implementation to
- Era Scarecrow (9/13) Nov 01 2016 I don't see why it would be odd.
- safety0ff (9/13) Nov 02 2016 F.Y.I. It now appears to use quadratic probing since druntime PR
- Jon Degenhardt (13/18) Nov 02 2016 Aren't associative arrays using open addressing for collision
- Steven Schveighoffer (14/20) Nov 03 2016 They have changed. Collisions are now handled via finding the next
- safety0ff (4/5) Nov 03 2016 In case I wasn't clear in my previous post:
- Steven Schveighoffer (5/10) Nov 03 2016 Yes, you can't use a free-list when removing nodes. AA used to actually
- Yuxuan Shui (4/10) Nov 03 2016 How about let the user provide the memory block when they calls
- Era Scarecrow (2/4) Nov 03 2016 I'd love to see this type of option almost everywhere.
- Shachar Shemesh (3/9) Nov 04 2016 I think that's just a halfhearted attempt at introducing STL's allocator...
- Alexandru Caciulescu (5/19) Nov 05 2016 I see this topic started a clash of opinions regarding the future
- Jon Degenhardt (17/39) Nov 05 2016 To make a concrete proposal - For near-term, against current AAs,
- safety0ff (5/18) Nov 06 2016 I agree with what Jon Degenhardt said. It also seems to be in
- Jon Degenhardt (2/23) Jan 04 2017 Alex, Andrei - Any updates on pursuing this?
- Alexandru Caciulescu (3/4) Oct 09 2017 https://github.com/dlang/druntime/pull/1929#pullrequestreview-67691304
- Shachar Shemesh (26/30) Nov 04 2016 I think that, in itself, is a mistake. A language that thinks of itself
Our student Alex has been looking at this and I wanted to open the floor for a bit of discussion before we embark on an implementation. Last time I looked our associative arrays were arrays of singly-linked lists. It follows that in order to implement reserve() we'd need a freelist allocator approach, which preallocates a bunch of nodes in a single contiguous block of memory. Each hashtable would have its own freelist, or alternatively all hashtables of the same types share the same freelist. So should we get into this? Pros: * It provides a nice primitive to use with built-in hashtables * Accelerates without much effort a variety of codes using hashtables Cons: * Built-in hashtables are a convenience facility more than a high-performance one, so providing an efficiency-oriented primitive seems a bit odd. * We're considering a number of improvements and refactorings to built-in hashtables, which may obviate the freelist implementation or need extra effort to implement it. Basically defining reserve() limits our future implementation freedom. * Implementation effort is nontrivial. * The time needed to initialize the freelist in one contiguous block of memory is still O(n) (although the constant is small and special implementation techniques may make it O(1)). * The implementation cannot use Phobos' allocator because it's in druntime. Please share your thoughts. Thanks, Andrei
Nov 01 2016
On Tuesday, November 01, 2016 23:36:42 Andrei Alexandrescu via Digitalmars-d wrote:Please share your thoughts.Hasn't Martin been working on a new, templatized AA implementation to replace the current one? As I recall, he said something about postponing it by a release to focus on something else (helping Walter with safe issues IIRC), but it sounded like it was close to ready. And if that's the case, I really don't think that it makes sense to be doing anything major with the current implementation. But regardless of that, I think that the focus of the built-in AAs should be to have something simple that works. It should be reasonably performant, but anything involving performance tuning really should be in library type. We've had quite a few problems with the built-in AAs over the years (especially with stuff like user-defined key types - in part because it uses void* internally IIRC), and while the new implementation will hopefully be much better in that regard, I think that based on the issues with had with the built-in AAs historically, doing anything that would make them more complicated should be approached with extreme caution. And anyone who really wants to do performance tuning is going to need better facilities than we're going to be able to reasonably provide in the built-in AAs anyway. The only real downside to user-defined hash tables at this point is that you can't use AA literals with them (which we should probably create a DIP for at some point). Aside from that, there's no reason that a library AA can't work as well or better than the built-in AAs, and it would be much easier to have facilities for tuning performance with a library type than with something built into the language. I think that effort put towards containers would be much better served working towards the std.container rewrite (though you would know where that stands better than the rest of us as well as whether it makes any sense for anyone to help you right now). Or maybe Martin could use some help in finishing whatever he's done with the new, built-in AA implementation. I don't know. But considering that adding reserve to the built-in AAs is not a small task and that it'is debatable that it should be done at all, there pretty much has to be something better for Alex to work on. - Jonathan M Davis
Nov 01 2016
On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei Alexandrescu wrote:Cons: * Built-in hashtables are a convenience facility more than a high-performance one, so providing an efficiency-oriented primitive seems a bit odd.I don't see why it would be odd. If I know my code is going to be working with 10,000,000 values, then telling/hinting so it could prepare appropriately up front rather than taking hits later seems useful. It's not like it couldn't work without it right? I'd hope for a few other features, like the AA could be pre-initialized and ready to go, for fixed/immutable lists.
Nov 01 2016
On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei Alexandrescu wrote:Last time I looked our associative arrays were arrays of singly-linked lists.F.Y.I. It now appears to use quadratic probing since druntime PREach hashtable would have its own freelist, or alternatively all hashtables of the same types share the same freelist.How do you return memory to the freelist when GC is expected to managed memory of the entries? Without belaboring the point, I think we'd better off with good library AA offerings as well.
Nov 02 2016
On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei Alexandrescu wrote:Last time I looked our associative arrays were arrays of singly-linked lists. It follows that in order to implement reserve() we'd need a freelist allocator approach, which preallocates a bunch of nodes in a single contiguous block of memory.Aren't associative arrays using open addressing for collision resolution? I'm looking at routines findSlotInsert() and findSlotLookup() here: https://github.com/dlang/druntime/blob/master/src/rt/aaA.d#L102 There is already a resize() routine: https://github.com/dlang/druntime/blob/master/src/rt/aaA.d#L141 The resize() routine takes the new size of the bucket array as an argument. Could a 'reserve' routine call this? Scanning through the code, it appears policy on size of the bucket array is established by callers to resize(), e.g. see the _aaRehash() function, which calls resize().
Nov 02 2016
On 11/1/16 11:36 PM, Andrei Alexandrescu wrote:Our student Alex has been looking at this and I wanted to open the floor for a bit of discussion before we embark on an implementation. Last time I looked our associative arrays were arrays of singly-linked lists. It follows that in order to implement reserve() we'd need a freelist allocator approach, which preallocates a bunch of nodes in a single contiguous block of memory.They have changed. Collisions are now handled via finding the next available slot in the array. However, elements of the array are still pointers to entries. So technically, the freelist is still needed. But I'm not sure preallocating into a contiguous block is the best approach. People don't expect a hashtable to hold onto all the memory it has used in the past when you remove most of the elements. In dcollections, I let the allocator take care of the details of allocation. I think the best thing to do is to preallocate the array at least (so no rehashing occurs during additions of nodes), and then move towards a templated solution that can use a custom allocator for tuning the performance/memory usage tradeoff. -Steve
Nov 03 2016
On Thursday, 3 November 2016 at 13:19:17 UTC, Steven Schveighoffer wrote:So technically, the freelist is still needed.In case I wasn't clear in my previous post: We can't use a freelist because it breaks safety.
Nov 03 2016
On 11/3/16 11:48 PM, safety0ff wrote:On Thursday, 3 November 2016 at 13:19:17 UTC, Steven Schveighoffer wrote:Yes, you can't use a free-list when removing nodes. AA used to actually free the memory, but that was decided against long ago. There are so many possibilities if we just move to a templated type. -SteveSo technically, the freelist is still needed.In case I wasn't clear in my previous post: We can't use a freelist because it breaks safety.
Nov 03 2016
On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei Alexandrescu wrote:[snip] * The implementation cannot use Phobos' allocator because it's in druntime.How about let the user provide the memory block when they calls reserve()?Please share your thoughts. Thanks, Andrei
Nov 03 2016
On Thursday, 3 November 2016 at 21:00:56 UTC, Yuxuan Shui wrote:How about let the user provide the memory block when they calls reserve()?I'd love to see this type of option almost everywhere.
Nov 03 2016
On 03/11/16 23:00, Yuxuan Shui wrote:On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei Alexandrescu wrote:I think that's just a halfhearted attempt at introducing STL's allocators. Shachar[snip] * The implementation cannot use Phobos' allocator because it's in druntime.How about let the user provide the memory block when they calls reserve()?
Nov 04 2016
On Friday, 4 November 2016 at 09:37:39 UTC, Shachar Shemesh wrote:On 03/11/16 23:00, Yuxuan Shui wrote:I see this topic started a clash of opinions regarding the future of AAs. After Andrei suggested a free-list implementation I had a good idea of how to proceed but now I am not so sure since this discussion isn't converging to a single idea/implementation.On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei Alexandrescu wrote:I think that's just a halfhearted attempt at introducing STL's allocators.[snip] * The implementation cannot use Phobos' allocator because it's in druntime.How about let the user provide the memory block when they calls reserve()?
Nov 05 2016
On Sunday, 6 November 2016 at 02:12:12 UTC, Alexandru Caciulescu wrote:On Friday, 4 November 2016 at 09:37:39 UTC, Shachar Shemesh wrote:To make a concrete proposal - For near-term, against current AAs, add a 'reserve()' function that pre-allocates the bucket array, but not the memory for elements. This would be consistent with the current implementation. It looks like it could be done via a call to the 'resize()' function. It would be similar to the 'rehash()' function (see _aaRehash() in aaA.d). Memory allocation and management for inserted elements is unchanged, it continues to operate as current. This approach would be similar to how the C++ std::unordered_map::reserve() function works. This still permits a longer term approach adding template-based hash tables in the future, with various options for memory management, etc. I think this suggestion is consistent with Steve Schveighoffer's suggestion earlier in the thread. --JonOn 03/11/16 23:00, Yuxuan Shui wrote:I see this topic started a clash of opinions regarding the future of AAs. After Andrei suggested a free-list implementation I had a good idea of how to proceed but now I am not so sure since this discussion isn't converging to a single idea/implementation.On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei Alexandrescu wrote:I think that's just a halfhearted attempt at introducing STL's allocators.[snip] * The implementation cannot use Phobos' allocator because it's in druntime.How about let the user provide the memory block when they calls reserve()?
Nov 05 2016
On Sunday, 6 November 2016 at 03:28:20 UTC, Jon Degenhardt wrote:On Sunday, 6 November 2016 at 02:12:12 UTC, Alexandru Caciulescu wrote:I agree with what Jon Degenhardt said. It also seems to be in agreement with what Shachar Shemesh and Steven Schveighoffer wrote regarding the implementation of a reserve function for the built in AAs.I see this topic started a clash of opinions regarding the future of AAs. After Andrei suggested a free-list implementation I had a good idea of how to proceed but now I am not so sure since this discussion isn't converging to a single idea/implementation.[Snip] I think this suggestion is consistent with Steve Schveighoffer's suggestion earlier in the thread. --Jon
Nov 06 2016
On Sunday, 6 November 2016 at 09:32:33 UTC, safety0ff wrote:On Sunday, 6 November 2016 at 03:28:20 UTC, Jon Degenhardt wrote:Alex, Andrei - Any updates on pursuing this?On Sunday, 6 November 2016 at 02:12:12 UTC, Alexandru Caciulescu wrote:I agree with what Jon Degenhardt said. It also seems to be in agreement with what Shachar Shemesh and Steven Schveighoffer wrote regarding the implementation of a reserve function for the built in AAs.I see this topic started a clash of opinions regarding the future of AAs. After Andrei suggested a free-list implementation I had a good idea of how to proceed but now I am not so sure since this discussion isn't converging to a single idea/implementation.[Snip] I think this suggestion is consistent with Steve Schveighoffer's suggestion earlier in the thread. --Jon
Jan 04 2017
On Wednesday, 4 January 2017 at 08:00:14 UTC, Jon Degenhardt wrote:Alex, Andrei - Any updates on pursuing this?https://github.com/dlang/druntime/pull/1929#pullrequestreview-67691304
Oct 09 2017
On 02/11/16 05:36, Andrei Alexandrescu wrote:Cons: * Built-in hashtables are a convenience facility more than a high-performance one, so providing an efficiency-oriented primitive seems a bit odd.I think that, in itself, is a mistake. A language that thinks of itself as a system programming language cannot, I think, provide hash functions that are designed to be "convenience over performance". I also think the basic approach is flawed. If we're talking about GC backs storage for everything, then "reserve" should not attempt to pre-allocate memory to the values. Instead, "reserve" should set the hash table size. In other words, if I call "hash.reserve(10_000)", and then insert 10,000 elements, I'm less concerned with allocations for the values (especially if it's defined to be GC controlled and non-contiguous anyways), and more concerned about it not rehashing while I insert due to the table getting full. To me, "reserve" is more about preventing copying data around due to the data structure being surprised from the number of elements it needs to handle than it is about preventing any allocations during the insert. If the values are stored in a linked list, and do not need to be copied no matter what, then reserve need not do anything about them at all. Lastly, if "reserve" doesn't do anything under a certain implementation because it makes no sense, that is not a problem as far as I'm concerned. If an implementation does not gain performance from calling "reserve", that is not a reason not to have the function. It can just be a no-op, and I'm fine with calling it an "implementation detail". As such, I don't think it is a problem for future changes to the underlying structure. Shachar
Nov 04 2016