www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 2504] New: Reserve for associative arrays

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2504

           Summary: Reserve for associative arrays
           Product: D
           Version: 2.021
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: dsimcha yahoo.com


It appears that adding an element to an associative array always triggers a
memory allocation.  Especially in multithreaded code, this is inefficient.  It
would be nice if associative arrays had a .reserve(size_t n) property, which
reserved enough space for n objects, and stored the capacity internally.  The
idea is that, until the reserve buffer is exhausted, no interaction of any kind
with the GC would be needed to add an element to the AA.


-- 
Dec 10 2008
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2504


ZY Zhou <rinick gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rinick gmail.com


--- Comment #1 from ZY Zhou <rinick gmail.com> 2009-09-23 01:38:43 PDT ---
(In reply to comment #0)
 It appears that adding an element to an associative array always triggers a
 memory allocation.  Especially in multithreaded code, this is inefficient.  It
 would be nice if associative arrays had a .reserve(size_t n) property, which
 reserved enough space for n objects, and stored the capacity internally.  The
 idea is that, until the reserve buffer is exhausted, no interaction of any kind
 with the GC would be needed to add an element to the AA.

It's not all about memory int[int] a; foreach(i;0..20_000_000) a[i] = i; //takes 2 _minutes_ foreach(i;0..20_000_000) a[i] = i; //do it again, only 2 seconds 60 times slower. If it's memory allocation problem, the result should be similar to Dynamic array: int[] t; foreach(i;0..20_000_000) t ~= i; // 2 seconds foreach(i;0..20_000_000) t[i] = i; // 0.3 seconds I guess AA rehashs the whole array when capacity changes, and rehashing is much slower than memory allocation. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 23 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2504



--- Comment #2 from David Simcha <dsimcha yahoo.com> 2009-11-30 20:36:16 PST ---
No, this is because, on the second run, the program has already reserved a
bunch of memory from the OS, so the GC doesn't run as often.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 30 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2504



--- Comment #3 from ZY Zhou <rinick gmail.com> 2009-11-30 21:24:04 PST ---
(In reply to comment #2)
 No, this is because, on the second run, the program has already reserved a
 bunch of memory from the OS, so the GC doesn't run as often.

try this: import std.stdio; void main(){ int[int] a; foreach(i;0..20_000_000){ a[i] = i; if((i&0xFFFF) == 0) writeln(i); } } You can see how slow it becomes when AA is large. I don't think memory allocation could take so much time. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Nov 30 2009
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2504


Michael Rynn <y0uf00bar gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |y0uf00bar gmail.com


--- Comment #4 from Michael Rynn <y0uf00bar gmail.com> 2010-04-17 04:50:13 PDT
---
See enhancement 4098.
I have experimented with a NodeHeap for the AA, that allocates Nodes in much
bigger blocks, and this does speed by up a considerable amount the insertion
and cleanup times. The NodeHeap idea is a single sized block dedicated heap
manager created for each AA instance.

No advantage at all for lookup times.

The disadvantage is that Nodes released by a remove, are not returned to the GC
pool until the entire block of Nodes has been removed. Of course when it does
get freed, its in a big blocks at once. Also when allocating Nodes one after
the other, I suppose a smart memory heap might be carving up much bigger blocks
anyway, with various Pools for different object sizes.  

Having a manual AA.clear helps the GC as well. 

Even so, having a node heap and using for the first time, the system must get a
new big bunch of memory, and maybe that will always take some time.

Its not in the current 4098 set (which is a lot to swallow already), but I it
could be an optional run time extra parameter to the HashMap/HashSet setup
wrapper. It uses lots of really small nodes.

The AA management object would then have a non-null heap manager object, to
allocate and free blocks from a dedicated instance pool (Just like the Tango
HashMap, which does exactly this).

I have already proposed to add some more fields to the hash map management
object, and one more non-default runtime option (only available from HashMap
template wrapper) will hopefully not be a heap of trouble.

If it was an option, would people use it wisely?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Apr 17 2010