www.digitalmars.com         C & C++   DMDScript  

D - assoc array performance

reply Ben Hinkle <bhinkle4 juno.com> writes:
well, based on Serge's eariler post I've been playing around with 
different assoc array implementations and believe it or not the current 
one is the only one that behaves well on both Windows and Linux under 
various tests that vary the table size and GC interactions.

The only trick that significantly improved performance was preallocating 
the node pointer array. I hadn't realized this before but you can 
already do this as follows:

  int[char[]] X; // an assoc array of ints indexed by strings
  void*[] Y;
  Y.length = 1000000;     // allocate array of void*
  X = cast(int[char[]])Y; // voila, a pre-allocated assoc array

Another performance boost came by using stdup with malloc to duplicate 
strings instead of .dup but if you do that you have to manage the dup'ed 
strings so they don't leak.

I tried all kinds of combinations of pre-allocating the data nodes as 
well as the node pointers and messing with the hashing function but they 
all had drawbacks when mixed with either the GC or the default pointer 
array (the default assoc array allocates 10 pointers no matter how many 
entries you eventually put in). Go figure. Either Walter got lucky or he 
actually tested this stuff out ;-)

-Ben
Apr 17 2004
next sibling parent reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
Thanks Ben!

I've been tearing my hair out trying to figure a way to preallocate AA's.
Was just about to post to the "What's needed for 1.0" thread regarding this.
Would be nice/appropriate if there was a cleaner 'syntax' that the example
below <g>

BTW, did you find a quick way to delete all entries?

- Kris


"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:c5sgbp$2p8j$1 digitaldaemon.com...
 well, based on Serge's eariler post I've been playing around with
 different assoc array implementations and believe it or not the current
 one is the only one that behaves well on both Windows and Linux under
 various tests that vary the table size and GC interactions.

 The only trick that significantly improved performance was preallocating
 the node pointer array. I hadn't realized this before but you can
 already do this as follows:

   int[char[]] X; // an assoc array of ints indexed by strings
   void*[] Y;
   Y.length = 1000000;     // allocate array of void*
   X = cast(int[char[]])Y; // voila, a pre-allocated assoc array

 Another performance boost came by using stdup with malloc to duplicate
 strings instead of .dup but if you do that you have to manage the dup'ed
 strings so they don't leak.

 I tried all kinds of combinations of pre-allocating the data nodes as
 well as the node pointers and messing with the hashing function but they
 all had drawbacks when mixed with either the GC or the default pointer
 array (the default assoc array allocates 10 pointers no matter how many
 entries you eventually put in). Go figure. Either Walter got lucky or he
 actually tested this stuff out ;-)

 -Ben
Apr 17 2004
parent reply Ben Hinkle <bhinkle4 juno.com> writes:
 BTW, did you find a quick way to delete all entries?
either void*[] Y; Y.length = 0; // not really needed since Y has 0 length by default X = cast(int[char[]])Y; to dump the node pointer array entirely. If you want to preserve the array (for instance you had preallocated the pointer array) and just clear out all the pointers try void*[] Y = cast(void*[])X; Y[] = null; // set all the node pointers to null X = cast(int[char[]])Y; Isolating code like that would be nice since it depends on implementation details of assoc arrays. -Ben
Apr 17 2004
parent "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
Cheers! Really appreciate that.

Let's hope Walter will add some properties for doing this ...


"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:c5sr12$7q4$1 digitaldaemon.com...
 BTW, did you find a quick way to delete all entries?
either void*[] Y; Y.length = 0; // not really needed since Y has 0 length by default X = cast(int[char[]])Y; to dump the node pointer array entirely. If you want to preserve the array (for instance you had preallocated the pointer array) and just clear out all the pointers try void*[] Y = cast(void*[])X; Y[] = null; // set all the node pointers to null X = cast(int[char[]])Y; Isolating code like that would be nice since it depends on implementation details of assoc arrays. -Ben
Apr 17 2004
prev sibling parent reply "Scott Egan" <scotte tpg.com.aux> writes:
Isn't this dangerous?  You're relying on the semantics of how an Assoc array
is implemented by the compiler.

I would have thought the X.length = 100000 would work, but I'm obviously
missing a subtle point.

Regards,

Scott


"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:c5sgbp$2p8j$1 digitaldaemon.com...
 well, based on Serge's eariler post I've been playing around with
 different assoc array implementations and believe it or not the current
 one is the only one that behaves well on both Windows and Linux under
 various tests that vary the table size and GC interactions.

 The only trick that significantly improved performance was preallocating
 the node pointer array. I hadn't realized this before but you can
 already do this as follows:

   int[char[]] X; // an assoc array of ints indexed by strings
   void*[] Y;
   Y.length = 1000000;     // allocate array of void*
   X = cast(int[char[]])Y; // voila, a pre-allocated assoc array

 Another performance boost came by using stdup with malloc to duplicate
 strings instead of .dup but if you do that you have to manage the dup'ed
 strings so they don't leak.

 I tried all kinds of combinations of pre-allocating the data nodes as
 well as the node pointers and messing with the hashing function but they
 all had drawbacks when mixed with either the GC or the default pointer
 array (the default assoc array allocates 10 pointers no matter how many
 entries you eventually put in). Go figure. Either Walter got lucky or he
 actually tested this stuff out ;-)

 -Ben
Apr 18 2004
parent "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
"Scott Egan" <scotte tpg.com.aux> wrote in message
news:c5tels$175v$1 digitaldaemon.com...
 Isn't this dangerous?  You're relying on the semantics of how an Assoc
array
 is implemented by the compiler.
Sure it is, Scott. But at least this is a workaround, until Walter is "encouraged" by enough of us to add such properties to AAs <g>
Apr 18 2004