www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 2105] New: Manual Memory Management for Associative Arrays

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105

           Summary: Manual Memory Management for Associative Arrays
           Product: D
           Version: 2.013
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: dsimcha yahoo.com


Due to the nature of conservative garbage collection, very large dynamic and
associative arrays are often not freed properly by the Phobos garbage
collector.  This is due to the high probability of a bit pattern on the stack,
when interpreted as a pointer, pointing to heap space occupied by these
structures.  In the case of dynamic arrays, this can be worked around by simply
deleting the very large array manually:

uint[] foo=new uint[100_000_000];
//stuff
delete foo;

However, if foo is an associative array, this cannot be done.

uint[string] foo;
//Put a large amount of data into array.
delete foo;  //Compile time error.

Furthermore, looping through such an array, such as:

foreach(i, bar; foo) {
    foo.remove(i);
}

fails to prevent memory leaks.  What is needed is a method of manually deleting
associative arrays that assumes that does not rely on the garbage collector.


-- 
May 13 2008
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105





------- Comment #1 from mmoncure gmail.com  2008-05-17 10:45 -------
You can declare the array 'scope'...does this help?


-- 
May 17 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105





------- Comment #2 from dsimcha yahoo.com  2008-05-17 12:25 -------
Good idea, but surprisingly, no, declaring the array as scope does not solve
the problem for either the dynamic or the associative array.  I honestly have
no idea why.


-- 
May 17 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105





------- Comment #3 from mmoncure gmail.com  2008-05-19 07:52 -------
(In reply to comment #2)
 Good idea, but surprisingly, no, declaring the array as scope does not solve
 the problem for either the dynamic or the associative array.  I honestly have
 no idea why.

Is it possible to get a short program demonstrating the issue? --
May 19 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105


dsimcha yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|2.013                       |2.014




------- Comment #4 from dsimcha yahoo.com  2008-05-19 11:20 -------
There really isn't much to it, but here's basically the canonical example:

import std.gc, std.stdio;

void main () {
    uint count;
    while(true) {
        test();
        fullCollect();
        writefln(++count);
    }
    return;
}

void test() {
    scope uint[uint] foo;
    for(uint i=0; i < 10_000_000; i++) {
        foo[i]=i;
    }
    return;
}

Basically, the only reference to foo goes out of scope on every cycle through
the main loop, and should obviously be freed.  However, with each iteration,
this program will use progressively more memory, even when fullCollect() is
called explicitly.  This also happens with very large dynamic arrays, but this
is less of a problem, since, at least for the simple cases, it's easy enough to
just free them manually using the delete keyword.

Also note that I have tested this on 2.014 and it still happens, so I changed
the version number.  


-- 
May 19 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105





------- Comment #5 from mmoncure gmail.com  2008-05-19 13:47 -------
(In reply to comment #4)
 There really isn't much to it, but here's basically the canonical example:
 
 import std.gc, std.stdio;
 
 void main () {
     uint count;
     while(true) {
         test();
         fullCollect();
         writefln(++count);
     }
     return;
 }
 
 void test() {
     scope uint[uint] foo;
     for(uint i=0; i < 10_000_000; i++) {
         foo[i]=i;
     }
     return;
 }
 
 Basically, the only reference to foo goes out of scope on every cycle through
 the main loop, and should obviously be freed.  However, with each iteration,
 this program will use progressively more memory, even when fullCollect() is
 called explicitly.  This also happens with very large dynamic arrays, but this
 is less of a problem, since, at least for the simple cases, it's easy enough to
 just free them manually using the delete keyword.
 
 Also note that I have tested this on 2.014 and it still happens, so I changed
 the version number.  

I tested the above program with both gdc v0.25 and dmd v1.028 on linux. In both cases the memory increased until the first insert loop completed and then reached steady state (I dropped the iterations down a bit to make it more obvious). When I injected a sleep(1) after the gc collect, memory usage drop back to baseline was noticeable in top with a low refresh interval. Maybe this is a 2.0 bug?? merlin --
May 19 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105





------- Comment #6 from dsimcha yahoo.com  2008-05-19 16:06 -------
If by dropping the iterations down, you mean changing the loop counter in
test(), which controls the array size, from 10_000_000 to a smaller number,
this misses the point of this bug.  It only happens on very large arrays.  From
the informal testing I've done, it seems that size in bytes, not size in
elements, is what matters.  This also makes sense in terms of the tentative
diagnosis of this as a conservative GC issue.  If I drop the array size down a
few notches to 5_000_000, the memory size reaches steady state as you describe,
using GDC 2.014.

Also, I tried this code on GDC 0.24 (32-bit) with an array size of 20_000_000
and it seems to work there, even without the scope keyword on the Linux cluster
I'm working on now.  However, I had tried a similar version of this with a GDC
and a dynamic array instead of an associative on my Windows box a while back
and array sizes over about 13_000_000 broke it.  Don't know why this
inconsistency exists.


-- 
May 19 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105


davidl 126.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|DMD                         |Phobos




------- Comment #7 from davidl 126.com  2008-07-02 22:11 -------
It's a phobos related issue.
and also following func possibly need a fix.
in _aaRehash func : 
                len = prime_list[i];
                newb.b = new aaA*[len];

                foreach (e; aa.b)
                {
                    if (e)
                        _aaRehash_x(e);
                }

                newb.nodes = aa.nodes;
            }

            *paa.a = newb; // here the code doesn't delete the old BB struct.
where it should delete the old BB struct


-- 
Jul 02 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105





------- Comment #8 from dsimcha yahoo.com  2008-09-07 18:22 -------
Here's a possible fix.  A one-line patch that just deletes the old array of
pointers to AA structs when rehashing is needed in aaA.d.  I will attach it. 
Once that is added, the following test case works without memory consumption
going up at all on successive iterations, even with the GC disabled.  I am not
fully confident that this patch doesn't break other stuff, since I don't know
exactly what the Phobos unit tests cover and don't cover, but I did run the
Phobos unit tests with this patch in place and they all passed.

IMHO, especially while GC is partially conservative, it is essential that
builtin types be usable without GC as much as possible.  This certainly seems
feasible in the case of AAs.

import std.gc, std.stdio;

void main () {
    std.gc.disable;
    while(true) {
        test;
        printGCStats;
    }
}

void test() {
    uint[uint] foo;
    for(uint i=0; i < 20_000_000; i++) {
        foo[i]=i;
    }
    deleteAA(foo);
}

void printGCStats() {
    std.gc.GCStats stats;
    std.gc.getStats(stats);
    foreach(s; stats.tupleof)
        write(s, "\t");
    writeln();
}

//These structs are copied and pasted from aaA.d.
struct aaA
{
    aaA *left;
    aaA *right;
    hash_t hash;
    /* key   */
    /* value */
}

struct BB
{
    aaA*[] b;
    size_t nodes;       // total number of aaA nodes
}

struct AA
{
    BB* a;
}

//Could be included in the Phobos runtime
//and called when a delete AA; line is
//encountered.  Might also be called
//automatically if AA is allocated as scope.

void deleteAA(T, U)(T[U] input) {
    auto aaPtr = cast(BB*) cast(void*) input;
    deleteBB(aaPtr);
}

void deleteBB(BB* aaPtr) {

    void rdelAA(aaA* input) {
        if(input.left !is null)
            rdelAA(input.left);
        if(input.right !is null)
            rdelAA(input.right);
        delete input;
    }

    foreach(node; aaPtr.b)
        if(node !is null)
            rdelAA(node);
    delete aaPtr.b;
    delete aaPtr;
}


-- 
Sep 07 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105





------- Comment #9 from dsimcha yahoo.com  2008-09-07 18:26 -------
Created an attachment (id=274)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=274&action=view)
Fix for aaA.d


-- 
Sep 07 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105





------- Comment #10 from dsimcha yahoo.com  2008-09-10 22:14 -------
Here's a slightly updated version of deleteAA() that allows a reference to be
reused after the AA is deleted, so that deleteAA() can be used as both a memory
management tool and a clear function.

void deleteAA(T, U)(ref T[U] input) {
    auto aaPtr = cast(BB*) cast(void*) input;
    deleteBB(aaPtr);
     //Allows AA reference to be reused, i.e. this can be used as a clear
function.
    input = (T[U]).init;
}

void deleteBB(BB* aaPtr) {

    void rdelAA(aaA* input) {
        if(input.left !is null)
            rdelAA(input.left);
        if(input.right !is null)
            rdelAA(input.right);
        delete input;
    }

    foreach(node; aaPtr.b)
        if(node !is null)
            rdelAA(node);
    delete aaPtr.b;
    delete aaPtr;
}


-- 
Sep 10 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105


bugzilla digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Comment #11 from bugzilla digitalmars.com  2008-12-25 04:38 -------
Fixed dmd 1.038 amd 2.022


-- 
Dec 25 2008
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105


dsimcha yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |




------- Comment #12 from dsimcha yahoo.com  2009-01-01 23:38 -------
The patch only partially resolves the issue.  There is still no way to manually
delete AAs, it's just that now, old arrays of pointers to AA structs is
deterministically freed when the array is rehashed.  The other part was that
there should be a delete statement for AAs.


-- 
Jan 01 2009
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=2105





------- Comment #13 from mmoncure gmail.com  2009-01-13 06:59 -------
I would be personally satisfied if AA memory was cleaned up if they are defined
'scope' when they leave scope...is this the case?


-- 
Jan 13 2009