www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - GC-less Hash-Tables (AA)

reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
Have anybody cooked any GC-less variants of hash-tables 
(associative arrays) that take keys and values with value 
semantics only.

Similar to how

     X[]

relates to

     std.containers.Array!X

I need this to index my nodes in graphs with tens of millions of 
nodes.
Sep 17 2014
next sibling parent =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Wednesday, 17 September 2014 at 10:39:07 UTC, Nordlöw wrote:
 Have anybody cooked any GC-less variants of hash-tables 
 (associative arrays) that take keys and values with value 
 semantics only.
Important follow-up question: If types of key and value all have value semantics (no indirections) will the hash-table still use the GC for allocations?
Sep 17 2014
prev sibling parent reply Justin Whear <justin economicmodeling.com> writes:
On Wed, 17 Sep 2014 10:39:05 +0000, Nordlöw wrote:

 Have anybody cooked any GC-less variants of hash-tables (associative
 arrays) that take keys and values with value semantics only.
 
 Similar to how
 
      X[]
 
 relates to
 
      std.containers.Array!X
 
 I need this to index my nodes in graphs with tens of millions of nodes.
See our hashmap and hashset implementations here: https://github.com/economicmodeling/containers/tree/master/src/containers These containers are all certified GC-free.
Sep 17 2014
next sibling parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Wednesday, 17 September 2014 at 15:27:40 UTC, Justin Whear 
wrote:
 These containers are all certified GC-free.
Superb! One question though: AFAIK a builtin hash-table in D shouldn't require nor use any GC-allocations if the keys and values all have reference semantics right (no string class, nor member indirections)? What is the current status on this in dmd/druntime?
Sep 17 2014
parent reply "H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
On Wed, Sep 17, 2014 at 07:23:01PM +0000, "Nordlw" via Digitalmars-d-learn
wrote:
 On Wednesday, 17 September 2014 at 15:27:40 UTC, Justin Whear wrote:
These containers are all certified GC-free.
Superb! One question though: AFAIK a builtin hash-table in D shouldn't require nor use any GC-allocations if the keys and values all have reference semantics right (no string class, nor member indirections)? What is the current status on this in dmd/druntime?
How do you implement a completely GC-free AA with no limit on number of entries stored? I mean, where would it get the memory to store the hashtable from? T -- Once bitten, twice cry...
Sep 17 2014
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
H. S. Teoh:

 How do you implement a completely GC-free AA with no limit on 
 number of entries stored?
Ada2012 has a fixed-size hash in the standard library, it can even be allocated on the stack. But the number of entries is not unlimited. Bye, bearophile
Sep 17 2014
prev sibling parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Wednesday, 17 September 2014 at 19:51:06 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 How do you implement a completely GC-free AA with no limit on 
 number of
 entries stored? I mean, where would it get the memory to store 
 the
 hashtable from?
I mean, GC-free not heap-free. An AA of course needs dynamic memory management a la C++'s std::vector.
Sep 17 2014
parent reply "H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
On Wed, Sep 17, 2014 at 10:26:16PM +0000, "Nordlw" via Digitalmars-d-learn
wrote:
 On Wednesday, 17 September 2014 at 19:51:06 UTC, H. S. Teoh via
 Digitalmars-d-learn wrote:
How do you implement a completely GC-free AA with no limit on number
of entries stored? I mean, where would it get the memory to store the
hashtable from?
I mean, GC-free not heap-free. An AA of course needs dynamic memory management a la C++'s std::vector.
So you would use malloc/free? T -- It is of the new things that men tire --- of fashions and proposals and improvements and change. It is the old things that startle and intoxicate. It is the old things that are young. -- G.K. Chesterton
Sep 17 2014
parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Wednesday, 17 September 2014 at 22:36:55 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 So you would use malloc/free?
Yes, with GC-free I mean *not* involving D's automatic garbage collector for the individual allocations of the AA *keys* and *values*. How are these normally allocated in a hash-table?...as individual heap allocations?
Sep 17 2014
parent reply "H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
On Wed, Sep 17, 2014 at 10:44:17PM +0000, "Nordlw" via Digitalmars-d-learn
wrote:
 On Wednesday, 17 September 2014 at 22:36:55 UTC, H. S. Teoh via
 Digitalmars-d-learn wrote:
So you would use malloc/free?
Yes, with GC-free I mean *not* involving D's automatic garbage collector for the individual allocations of the AA *keys* and *values*. How are these normally allocated in a hash-table?...as individual heap allocations?
Assuming you're talking about the built-in AA's, the current implementation is as follows. There is the hashtable itself, which is just an array of pointers to individual buckets. The buckets are implemented as a linked-list of Slots, which are just structs containing a next pointer, the hash value, and a copy of the key and value. Obviously, individual allocations of Slots come from the GC heap, but in theory it could come from any allocator, potentially even malloc/free, since the lifetime of the Slots is well-defined (user code generally does not have direct access to them; so the AA always knows when a Slot is going out of scope). It does *not* perform separate allocations for keys and values. This means that if your keys and values are value-types, then there is no additional allocation besides the Slots themselves. For reference types, only the reference is stored (this is one of the sources of currently-open AA bugs where unexpected behaviour happens when non-immutable reference-type keys are used in AA's: somebody else changes the key after it's hashed, and this breaks the AA's ability to find that key/value again). No additional allocation is done unless the key/value type itself has a postblit that performs additional allocations. T -- I think Debian's doing something wrong, `apt-get install pesticide', doesn't seem to remove the bugs on my system! -- Mike Dresser
Sep 17 2014
parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Wednesday, 17 September 2014 at 22:56:17 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 Slots come from the GC heap, but in theory it could come from 
 any
 allocator, potentially even malloc/free, since the lifetime of 
 the Slots
So I guess there is room for some optimizations here right? We could use, for instance, http://dlang.org/phobos/std_traits.html#hasIndirections to check if we need to use GC. If not we can use malloc/free. Couldn't such an enhancement give significant speed-ups for small key-value allocations?
Sep 17 2014
next sibling parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Wednesday, 17 September 2014 at 23:27:10 UTC, Nordlöw wrote:
 to check if we need to use GC. If not we can use malloc/free.
Further if they key and values are all fixed-size we should probably use some allocator instead of malloc/free. That would make more efficient use of this allocation regularity, right? Is the current AA-allocator easily accessible/modifable somewhere in druntime?
Sep 17 2014
parent reply "H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
On Wed, Sep 17, 2014 at 11:43:27PM +0000, "Nordlw" via Digitalmars-d-learn
wrote:
 On Wednesday, 17 September 2014 at 23:27:10 UTC, Nordlw wrote:
to check if we need to use GC. If not we can use malloc/free.
Further if they key and values are all fixed-size we should probably use some allocator instead of malloc/free. That would make more efficient use of this allocation regularity, right?
Keys and values are *always* fixed sized. Either they are a fixed-sized value type, or they are a fixed-sized reference to some object (of reference type -- the references themselves are of course constant sized). We don't care about managing the resources for the reference type because that's supposed to be the user's problem.
 Is the current AA-allocator easily accessible/modifable somewhere in
 druntime?
Well, (most of) the code is in druntime src/rt/aaA.d and while it's certainly *modifiable*, let's just say it's not a very pleasant experience to do so. :-) There is still a lot of compiler-dependency built into the code, and one of the major obstacles to making significant improvements is the reliance on typeinfo's rather than compile-time type information via templates. Ideally it should be a fully-decoupled library implementation interfacing with the compiler via a set API, but we're still some ways off from that right now. If you'd like to help it reach that point, it would be most welcome, but be aware that right now, doing things like changing the allocation scheme is not going to be an easy change. T -- To provoke is to call someone stupid; to argue is to call each other stupid.
Sep 17 2014
parent =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Wednesday, 17 September 2014 at 23:57:03 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 compile-time type information via templates. Ideally it should 
 be a
 fully-decoupled library implementation interfacing with the 
 compiler via
 a set API, but we're still some ways off from that right now. 
 If you'd
 like to help it reach that point, it would be most welcome, but 
 be aware
 that right now, doing things like changing the allocation 
 scheme is not
 going to be an easy change.
Very interesting, indeed. Do you have an D issue (list) to begin with? Thanks for now.
Sep 18 2014
prev sibling parent "H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
On Wed, Sep 17, 2014 at 11:27:08PM +0000, "Nordlw" via Digitalmars-d-learn
wrote:
 On Wednesday, 17 September 2014 at 22:56:17 UTC, H. S. Teoh via
 Digitalmars-d-learn wrote:
Slots come from the GC heap, but in theory it could come from any
allocator, potentially even malloc/free, since the lifetime of the
Slots
So I guess there is room for some optimizations here right? We could use, for instance, http://dlang.org/phobos/std_traits.html#hasIndirections to check if we need to use GC. If not we can use malloc/free. Couldn't such an enhancement give significant speed-ups for small key-value allocations?
All of this has been thought of before. The big obstacle is that right now, the AA implementation is split across dmd hacks, druntime, and object.di, and relies on typeinfo's rather than compile-time type information, so it's not easy to implement these improvements. Recently, there has been a slow but steady trickle of pull requests that have begun to address the implementation issues, and hopefully in the not-so-distant future we will be able to finally decouple the AA implementation from dmd, and stand a fighting chance of moving it into a library implementation. Once it's in the library, the improvements you mention will be trivial to implement. T -- Help a man when he is in trouble and he will remember you when he is in trouble again.
Sep 17 2014
prev sibling next sibling parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Wednesday, 17 September 2014 at 15:27:40 UTC, Justin Whear 
wrote:
 These containers are all certified GC-free.
What's the difference between std.containers.Array and DynamicArray? The RC?
Sep 17 2014
parent =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Wednesday, 17 September 2014 at 19:39:16 UTC, Nordlöw wrote:
 On Wednesday, 17 September 2014 at 15:27:40 UTC, Justin Whear 
 wrote:
 These containers are all certified GC-free.
What'a the cost of using hashset in favour of hashmap if I want the to use the Range range() property?
Sep 17 2014
prev sibling next sibling parent "Brian Schott" <briancschott gmail.com> writes:
On Wednesday, 17 September 2014 at 15:27:40 UTC, Justin Whear 
wrote:
 These containers are all certified GC-free.
With the exception of getting keys and values arrays. Those return GC memory to the caller. I'm pretty sure it's documented that it does that though. Everything else uses allocators.
Sep 17 2014
prev sibling parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
 See our hashmap and hashset implementations here:
 https://github.com/economicmodeling/containers/tree/master/src/containers

 These containers are all certified GC-free.
I get loads of erros on DMD 2.066: memory/allocators.d(81,4): Error: pure function 'memory.allocators.BlockAllocator!1024LU.BlockAllocator.~this' cannot access mutable static data 'it' memory/allocators.d(81,4): Error: pure function 'memory.allocators.BlockAllocator!1024LU.BlockAllocator.~this' cannot access mutable static data 'it' memory/allocators.d(81,28): Error: pure function 'memory.allocators.BlockAllocator!1024LU.BlockAllocator.~this' cannot call impure function 'std.allocator.Mallocator.deallocate' memory/allocators.d(81,28): Error: 'std.allocator.Mallocator.deallocate' is not nothrow memory/allocators.d(72,2): Error: destructor 'memory.allocators.BlockAllocator!1024LU.BlockAllocator.~this' is nothrow yet may throw memory/allocators.d(20,34): Error: template instance memory.allocators.BlockAllocator!1024LU error instantiating /home/per/Work/justd/containers/hashmap.d(339,29): instantiated from here: NodeAllocator!(32LU, 1024LU) /home/per/Work/justd/containers/hashmap.d(13,18): instantiated from here: HashMap!(string, int, hashString) /home/per/Work/justd/containers/hashmap.d(351,12): instantiated from here: HashMap!(string, int) /home/per/Work/justd/containers/hashmap.d(340,17): Error: template instance Freelist!(BlockAllocator!1024LU, 32LU, 32LU) is used as a type /home/per/Work/justd/containers/hashmap.d(341,22): Error: template instance Freelist!(BlockAllocator!1024LU, 32LU, 32LU) is used as a type /home/per/Work/justd/containers/hashmap.d(342,11): Error: template instance SList!(Node, SListNodeAllocator*) is used as a type /home/per/Work/justd/containers/hashmap.d(55,3): Error: undefined identifier deallocate, did you mean template reallocate(Allocator)(ref Allocator a, ref void[] b, size_t s)? /home/per/Work/justd/containers/hashmap.d(340,17): Error: template instance Freelist!(BlockAllocator!1024LU, 8LU, 8LU) is used as a type /home/per/Work/justd/containers/hashmap.d(341,22): Error: template instance Freelist!(BlockAllocator!1024LU, 8LU, 8LU) is used as a type /home/per/Work/justd/containers/hashmap.d(342,11): Error: template instance SList!(Node, SListNodeAllocator*) is used as a type /home/per/Work/justd/containers/hashmap.d(55,3): Error: undefined identifier deallocate, did you mean template reallocate(Allocator)(ref Allocator a, ref void[] b, size_t s)? /home/per/Work/justd/containers/hashmap.d(19,18): Error: template instance containers.hashmap.HashMap!(char, char, builtinHash) error instantiating /home/per/Work/justd/containers/hashmap.d(372,13): instantiated from here: HashMap!(char, char) memory/allocators.d(81,4): Error: pure function 'memory.allocators.BlockAllocator!512LU.BlockAllocator.~this' cannot access mutable static data 'it' memory/allocators.d(81,4): Error: pure function 'memory.allocators.BlockAllocator!512LU.BlockAllocator.~this' cannot access mutable static data 'it' memory/allocators.d(81,28): Error: pure function 'memory.allocators.BlockAllocator!512LU.BlockAllocator.~this' cannot call impure function 'std.allocator.Mallocator.deallocate' memory/allocators.d(81,28): Error: 'std.allocator.Mallocator.deallocate' is not nothrow memory/allocators.d(72,2): Error: destructor 'memory.allocators.BlockAllocator!512LU.BlockAllocator.~this' is nothrow yet may throw memory/allocators.d(20,34): Error: template instance memory.allocators.BlockAllocator!512LU error instantiating Compilation finished at Thu Sep 18 12:20:19
Sep 18 2014
parent =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
On Thursday, 18 September 2014 at 10:21:29 UTC, Nordlöw wrote:
 These containers are all certified GC-free.
I get loads of erros on DMD 2.066:
My mistake. I hade accidentally used another version of std.allocator.
Sep 18 2014