www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Memory Efficient HashSet

reply =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
Has anybody put together a memory-efficient D-implementation of a 
HashSet

Something like

sparse_hash_set<> contained in

https://github.com/sparsehash/sparsehash

but in D.
Mar 08 2016
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
 sparse_hash_set<> contained in

 https://github.com/sparsehash/sparsehash
It appears to be very slow? What do you need it for?
Mar 08 2016
parent reply =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Tuesday, 8 March 2016 at 12:25:36 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
 sparse_hash_set<> contained in

 https://github.com/sparsehash/sparsehash
It appears to be very slow? What do you need it for?
My knowledge database engine I'm building cannot afford the memory overhead of D's builtin associative arrays. For instance the following program void main(string[] args) { alias T = uint; bool[T] x; // emulates a HashSet with significant memory overhead import std.range : iota; const n = 10_000_000; foreach (const i; iota(0, n)) { x[i] = true; // indicate that it's stored } import std.stdio : writeln, readln; writeln("Press return: "); readln; } consumes 842.m MiB on my Ubuntu. The consumption with Google's sparsehash unordered_set is about 50 MiB.
Mar 09 2016
next sibling parent reply Martin Tschierschke <mt smartdolphin.de> writes:
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
[...]
     foreach (const i; iota(0, n))
     {
         x[i] = true; // indicate that it's stored
     }

     import std.stdio : writeln, readln;
     writeln("Press return: ");
     readln;
 }

 consumes 842.m MiB on my Ubuntu.

 The consumption with Google's sparsehash unordered_set is about 
 50 MiB.
May be the discussion associated with this post, can give help??? https://forum.dlang.org/post/mailman.217.1457590242.26339.digitalmars-d-bugs puremagic.com I tried your little program and played around. I tried x.rehash, but this resulted in even more memory consumption. 80 bytes for 4 bytes key and one bit storage seams a lot. With smaller n it comes down to app. 60 Bytes. With how many data sets at the end, you will have to deal?
Mar 10 2016
parent =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Thursday, 10 March 2016 at 10:15:10 UTC, Martin Tschierschke 
wrote:
 With how many data sets at the end, you will have to deal?
I need something close to the memory overhead of a sorted array of values. That is why I'm considering converting SparseHash's sparse_hash_set to D.
Mar 10 2016
prev sibling next sibling parent =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
 The consumption with Google's sparsehash unordered_set is about 
 50 MiB.
See also: http://smerity.com/articles/2015/google_sparsehash.html
Mar 10 2016
prev sibling next sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
 My knowledge database engine I'm building cannot afford the 
 memory overhead of D's builtin associative arrays.
Sounds like a cool project! You could also look into using a trie.
Mar 10 2016
prev sibling parent thedeemon <dlang thedeemon.com> writes:
On Wednesday, 9 March 2016 at 22:31:50 UTC, Nordlöw wrote:
 consumes 842.m MiB on my Ubuntu.
Here's mine: https://bitbucket.org/infognition/robinhood/src (you just need one file rbhash.d to use in your apps) The following test takes ~130 MB and can take less with some tweaks in the settings: import std.stdio, rbhash; alias Set(T) = RHHash!(T, void); void main() { auto set = new Set!uint; foreach(i; 0..10_000_000) set.add(i); writeln(set.contains(444)); readln; } It's a GC-free Robin Hood hash table implementation with special treatment for void as value type, i.e. sets.
Mar 10 2016
prev sibling next sibling parent Edwin van Leeuwen <edder tkwsping.nl> writes:
On Tuesday, 8 March 2016 at 08:12:04 UTC, Nordlöw wrote:
 Has anybody put together a memory-efficient D-implementation of 
 a HashSet

 Something like

 sparse_hash_set<> contained in

 https://github.com/sparsehash/sparsehash

 but in D.
There is an implementation in: code.dlang.org/packages/emsi_containers But to be honest I got stuck trying to use it (copy constructor disabled), so I used this very minimal wrapper around associative array: private struct HashSet(E) { // TODO switch to proper implementation (not using AA) bool put( E el ) { if ( el in set ) return false; set[el] = set.length; return true; } size_t[E] set; } (I only needed put, since I used it to check whether we already encountered a value before in a lazy/non sorted implementation of uniq)
Mar 08 2016
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 3/8/16 3:12 AM, Nordlöw wrote:
 Has anybody put together a memory-efficient D-implementation of a HashSet

 Something like

 sparse_hash_set<> contained in

 https://github.com/sparsehash/sparsehash

 but in D.
D needs a way to use allocators for hash nodes. In Dcollections, both the performance and memory usage I was able to optimize by using a specialized allocator that allocates in pages and then divides up the pages. -Steve
Mar 10 2016