digitalmars.D - https://issues.dlang.org/show_bug.cgi?id=2504: reserve for

Andrei Alexandrescu (27/27) Nov 01 2016 Our student Alex has been looking at this and I wanted to open the floor...

Jonathan M Davis via Digitalmars-d (35/36) Nov 01 2016 Hasn't Martin been working on a new, templatized AA implementation to
Era Scarecrow (9/13) Nov 01 2016 I don't see why it would be odd.
safety0ff (9/13) Nov 02 2016 F.Y.I. It now appears to use quadratic probing since druntime PR
Jon Degenhardt (13/18) Nov 02 2016 Aren't associative arrays using open addressing for collision
Steven Schveighoffer (14/20) Nov 03 2016 They have changed. Collisions are now handled via finding the next

safety0ff (4/5) Nov 03 2016 In case I wasn't clear in my previous post:

Steven Schveighoffer (5/10) Nov 03 2016 Yes, you can't use a free-list when removing nodes. AA used to actually

Yuxuan Shui (4/10) Nov 03 2016 How about let the user provide the memory block when they calls

Era Scarecrow (2/4) Nov 03 2016 I'd love to see this type of option almost everywhere.
Shachar Shemesh (3/9) Nov 04 2016 I think that's just a halfhearted attempt at introducing STL's allocator...

Alexandru Caciulescu (5/19) Nov 05 2016 I see this topic started a clash of opinions regarding the future

Jon Degenhardt (17/39) Nov 05 2016 To make a concrete proposal - For near-term, against current AAs,
safety0ff (5/18) Nov 06 2016 I agree with what Jon Degenhardt said. It also seems to be in

Jon Degenhardt (2/23) Jan 04 2017 Alex, Andrei - Any updates on pursuing this?

Alexandru Caciulescu (3/4) Oct 09 2017 https://github.com/dlang/druntime/pull/1929#pullrequestreview-67691304

Shachar Shemesh (26/30) Nov 04 2016 I think that, in itself, is a mistake. A language that thinks of itself

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Our student Alex has been looking at this and I wanted to open the floor 
for a bit of discussion before we embark on an implementation.

Last time I looked our associative arrays were arrays of singly-linked 
lists. It follows that in order to implement reserve() we'd need a 
freelist allocator approach, which preallocates a bunch of nodes in a 
single contiguous block of memory.

Each hashtable would have its own freelist, or alternatively all 
hashtables of the same types share the same freelist.

So should we get into this? Pros:

* It provides a nice primitive to use with built-in hashtables

* Accelerates without much effort a variety of codes using hashtables

Cons:

* Built-in hashtables are a convenience facility more than a 
high-performance one, so providing an efficiency-oriented primitive 
seems a bit odd.

* We're considering a number of improvements and refactorings to 
built-in hashtables, which may obviate the freelist implementation or 
need extra effort to implement it. Basically defining reserve() limits 
our future implementation freedom.

* Implementation effort is nontrivial.

* The time needed to initialize the freelist in one contiguous block of 
memory is still O(n) (although the constant is small and special 
implementation techniques may make it O(1)).

* The implementation cannot use Phobos' allocator because it's in druntime.

Please share your thoughts.


Thanks,

Andrei

Nov 01 2016

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Tuesday, November 01, 2016 23:36:42 Andrei Alexandrescu via Digitalmars-d 
wrote:
 Please share your thoughts.

Hasn't Martin been working on a new, templatized AA implementation to
replace the current one? As I recall, he said something about postponing it
by a release to focus on something else (helping Walter with  safe issues
IIRC), but it sounded like it was close to ready. And if that's the case, I
really don't think that it makes sense to be doing anything major with the
current implementation.

But regardless of that, I think that the focus of the built-in AAs should be
to have something simple that works. It should be reasonably performant, but
anything involving performance tuning really should be in library type.
We've had quite a few problems with the built-in AAs over the years
(especially with stuff like user-defined key types - in part because it uses
void* internally IIRC), and while the new implementation will hopefully be
much better in that regard, I think that based on the issues with had with
the built-in AAs historically, doing anything that would make them more
complicated should be approached with extreme caution.

And anyone who really wants to do performance tuning is going to need better
facilities than we're going to be able to reasonably provide in the built-in
AAs anyway.

The only real downside to user-defined hash tables at this point is that you
can't use AA literals with them (which we should probably create a DIP for
at some point). Aside from that, there's no reason that a library AA can't
work as well or better than the built-in AAs, and it would be much easier to
have facilities for tuning performance with a library type than with
something built into the language.

I think that effort put towards containers would be much better served
working towards the std.container rewrite (though you would know where that
stands better than the rest of us as well as whether it makes any sense for
anyone to help you right now). Or maybe Martin could use some help in
finishing whatever he's done with the new, built-in AA implementation. I
don't know. But considering that adding reserve to the built-in AAs is not a
small task and that it'is debatable that it should be done at all, there
pretty much has to be something better for Alex to work on.

- Jonathan M Davis

Nov 01 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei 
Alexandrescu wrote:
 Cons:

 * Built-in hashtables are a convenience facility more than a 
 high-performance one, so providing an efficiency-oriented 
 primitive seems a bit odd.

  I don't see why it would be odd.

  If I know my code is going to be working with 10,000,000 values, 
then telling/hinting so it could prepare appropriately up front 
rather than taking hits later seems useful. It's not like it 
couldn't work without it right?

  I'd hope for a few other features, like the AA could be 
pre-initialized and ready to go, for fixed/immutable lists.

Nov 01 2016

safety0ff <safety0ff.dev gmail.com> writes:

On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei 
Alexandrescu wrote:
 Last time I looked our associative arrays were arrays of 
 singly-linked lists.

F.Y.I. It now appears to use quadratic probing since druntime PR 


 Each hashtable would have its own freelist, or alternatively 
 all hashtables of the same types share the same freelist.

How do you return memory to the freelist when GC is expected to 
managed memory of the entries?




Without belaboring the point, I think we'd better off with good 
library AA offerings as well.

Nov 02 2016

Jon Degenhardt <jond noreply.com> writes:

On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei 
Alexandrescu wrote:
 Last time I looked our associative arrays were arrays of 
 singly-linked lists. It follows that in order to implement 
 reserve() we'd need a freelist allocator approach, which 
 preallocates a bunch of nodes in a single contiguous block of 
 memory.

Aren't associative arrays using open addressing for collision 
resolution? I'm looking at routines findSlotInsert() and 
findSlotLookup() here: 
https://github.com/dlang/druntime/blob/master/src/rt/aaA.d#L102

There is already a resize() routine: 
https://github.com/dlang/druntime/blob/master/src/rt/aaA.d#L141

The resize() routine takes the new size of the bucket array as an 
argument. Could a 'reserve' routine call this? Scanning through 
the code, it appears policy on size of the bucket array is 
established by callers to resize(), e.g. see the _aaRehash() 
function, which calls resize().

Nov 02 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 11/1/16 11:36 PM, Andrei Alexandrescu wrote:
 Our student Alex has been looking at this and I wanted to open the floor
 for a bit of discussion before we embark on an implementation.

 Last time I looked our associative arrays were arrays of singly-linked
 lists. It follows that in order to implement reserve() we'd need a
 freelist allocator approach, which preallocates a bunch of nodes in a
 single contiguous block of memory.

They have changed. Collisions are now handled via finding the next 
available slot in the array. However, elements of the array are still 
pointers to entries.

So technically, the freelist is still needed. But I'm not sure 
preallocating into a contiguous block is the best approach. People don't 
expect a hashtable to hold onto all the memory it has used in the past 
when you remove most of the elements.

In dcollections, I let the allocator take care of the details of 
allocation. I think the best thing to do is to preallocate the array at 
least (so no rehashing occurs during additions of nodes), and then move 
towards a templated solution that can use a custom allocator for tuning 
the performance/memory usage tradeoff.

-Steve

Nov 03 2016

safety0ff <safety0ff.dev gmail.com> writes:

On Thursday, 3 November 2016 at 13:19:17 UTC, Steven 
Schveighoffer wrote:
 So technically, the freelist is still needed.

In case I wasn't clear in my previous post:
We can't use a freelist because it breaks safety.

Nov 03 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 11/3/16 11:48 PM, safety0ff wrote:
 On Thursday, 3 November 2016 at 13:19:17 UTC, Steven Schveighoffer wrote:
 So technically, the freelist is still needed.

 In case I wasn't clear in my previous post:
 We can't use a freelist because it breaks safety.

Yes, you can't use a free-list when removing nodes. AA used to actually 
free the memory, but that was decided against long ago.

There are so many possibilities if we just move to a templated type.

-Steve

Nov 03 2016

Yuxuan Shui <yshuiv7 gmail.com> writes:

On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei 
Alexandrescu wrote:
 [snip]

 * The implementation cannot use Phobos' allocator because it's 
 in druntime.

How about let the user provide the memory block when they calls 
reserve()?

 Please share your thoughts.


 Thanks,

 Andrei

Nov 03 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Thursday, 3 November 2016 at 21:00:56 UTC, Yuxuan Shui wrote:
 How about let the user provide the memory block when they calls 
 reserve()?

  I'd love to see this type of option almost everywhere.

Nov 03 2016

Shachar Shemesh <shachar weka.io> writes:

On 03/11/16 23:00, Yuxuan Shui wrote:
 On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei Alexandrescu wrote:
 [snip]

 * The implementation cannot use Phobos' allocator because it's in
 druntime.

 How about let the user provide the memory block when they calls reserve()?

I think that's just a halfhearted attempt at introducing STL's allocators.

Shachar

Nov 04 2016

Alexandru Caciulescu <alexandru.razvan.c gmail.com> writes:

On Friday, 4 November 2016 at 09:37:39 UTC, Shachar Shemesh wrote:
 On 03/11/16 23:00, Yuxuan Shui wrote:
 On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei 
 Alexandrescu wrote:
 [snip]

 * The implementation cannot use Phobos' allocator because 
 it's in
 druntime.

 How about let the user provide the memory block when they 
 calls reserve()?

 I think that's just a halfhearted attempt at introducing STL's 
 allocators.

I see this topic started a clash of opinions regarding the future 
of AAs. After Andrei suggested a free-list implementation I had a 
good idea of how to proceed but now I am not so sure since this 
discussion isn't converging to a single idea/implementation.

Nov 05 2016

Jon Degenhardt <jond noreply.com> writes:

On Sunday, 6 November 2016 at 02:12:12 UTC, Alexandru Caciulescu 
wrote:
 On Friday, 4 November 2016 at 09:37:39 UTC, Shachar Shemesh 
 wrote:
 On 03/11/16 23:00, Yuxuan Shui wrote:
 On Wednesday, 2 November 2016 at 03:36:42 UTC, Andrei 
 Alexandrescu wrote:
 [snip]

 * The implementation cannot use Phobos' allocator because 
 it's in
 druntime.

 How about let the user provide the memory block when they 
 calls reserve()?

 I think that's just a halfhearted attempt at introducing STL's 
 allocators.

 I see this topic started a clash of opinions regarding the 
 future of AAs. After Andrei suggested a free-list 
 implementation I had a good idea of how to proceed but now I am 
 not so sure since this discussion isn't converging to a single 
 idea/implementation.

To make a concrete proposal - For near-term, against current AAs, 
add a 'reserve()' function that pre-allocates the bucket array, 
but not the memory for elements. This would be consistent with 
the current implementation. It looks like it could be done via a 
call to the 'resize()' function. It would be similar to the 
'rehash()' function (see _aaRehash() in aaA.d). Memory allocation 
and management for inserted elements is unchanged, it continues 
to operate as current. This approach would be similar to how the 
C++ std::unordered_map::reserve() function works.

This still permits a longer term approach adding template-based 
hash tables in the future, with various options for memory 
management, etc.

I think this suggestion is consistent with Steve Schveighoffer's 
suggestion earlier in the thread.

--Jon

Nov 05 2016

safety0ff <safety0ff.dev gmail.com> writes:

On Sunday, 6 November 2016 at 03:28:20 UTC, Jon Degenhardt wrote:
 On Sunday, 6 November 2016 at 02:12:12 UTC, Alexandru 
 Caciulescu wrote:
 I see this topic started a clash of opinions regarding the 
 future of AAs. After Andrei suggested a free-list 
 implementation I had a good idea of how to proceed but now I 
 am not so sure since this discussion isn't converging to a 
 single idea/implementation.

 [Snip]

 I think this suggestion is consistent with Steve 
 Schveighoffer's suggestion earlier in the thread.

 --Jon

I agree with what Jon Degenhardt said. It also seems to be in 
agreement with what Shachar Shemesh and Steven Schveighoffer 
wrote regarding the implementation of a reserve function for the 
built in AAs.

Nov 06 2016

Jon Degenhardt <noreply noreply.com> writes:

On Sunday, 6 November 2016 at 09:32:33 UTC, safety0ff wrote:
 On Sunday, 6 November 2016 at 03:28:20 UTC, Jon Degenhardt 
 wrote:
 On Sunday, 6 November 2016 at 02:12:12 UTC, Alexandru 
 Caciulescu wrote:
 I see this topic started a clash of opinions regarding the 
 future of AAs. After Andrei suggested a free-list 
 implementation I had a good idea of how to proceed but now I 
 am not so sure since this discussion isn't converging to a 
 single idea/implementation.

 [Snip]

 I think this suggestion is consistent with Steve 
 Schveighoffer's suggestion earlier in the thread.

 --Jon

 I agree with what Jon Degenhardt said. It also seems to be in 
 agreement with what Shachar Shemesh and Steven Schveighoffer 
 wrote regarding the implementation of a reserve function for 
 the built in AAs.

Alex, Andrei - Any updates on pursuing this?

Jan 04 2017

Alexandru Caciulescu <alexandru.razvan.c gmail.com> writes:

On Wednesday, 4 January 2017 at 08:00:14 UTC, Jon Degenhardt 
wrote:
 Alex, Andrei - Any updates on pursuing this?

https://github.com/dlang/druntime/pull/1929#pullrequestreview-67691304

Oct 09 2017

Shachar Shemesh <shachar weka.io> writes:

On 02/11/16 05:36, Andrei Alexandrescu wrote:
 Cons:

 * Built-in hashtables are a convenience facility more than a
 high-performance one, so providing an efficiency-oriented primitive
 seems a bit odd.

I think that, in itself, is a mistake. A language that thinks of itself 
as a system programming language cannot, I think, provide hash functions 
that are designed to be "convenience over performance".

I also think the basic approach is flawed. If we're talking about GC 
backs storage for everything, then "reserve" should not attempt to 
pre-allocate memory to the values. Instead, "reserve" should set the 
hash table size.

In other words, if I call "hash.reserve(10_000)", and then insert 10,000 
elements, I'm less concerned with allocations for the values (especially 
if it's defined to be GC controlled and non-contiguous anyways), and 
more concerned about it not rehashing while I insert due to the table 
getting full.

To me, "reserve" is more about preventing copying data around due to the 
data structure being surprised from the number of elements it needs to 
handle than it is about preventing any allocations during the insert. If 
the values are stored in a linked list, and do not need to be copied no 
matter what, then reserve need not do anything about them at all.

Lastly, if "reserve" doesn't do anything under a certain implementation 
because it makes no sense, that is not a problem as far as I'm 
concerned. If an implementation does not gain performance from calling 
"reserve", that is not a reason not to have the function. It can just be 
a no-op, and I'm fine with calling it an "implementation detail". As 
such, I don't think it is a problem for future changes to the underlying 
structure.

Shachar

Nov 04 2016

D Programming

C/C++ Programming

Other

digitalmars.D - https://issues.dlang.org/show_bug.cgi?id=2504: reserve for