
digitalmars.D - Array Capacity Field

reply dsimcha <dsimcha yahoo.com> writes:
I know it's been talked about before, but what is the status of the array
capacity field that's independent of length?  It would be useful to have such
a thing, for example, when working on a program that uses multi-dimensional
dynamic arrays to avoid excessive wasted space.  For example:

void main () {
    uint[][] Table;
    //Read in a huge jagged table of data from a file.
    //Do stuff that involves read-only access to Table[][] and
    // requires a lot of memory.
}

The above might result in running out of memory due to the amount of wasted
space in Table[][].  If there were a way to compact Table[][] once one knows
it will not be appended to, this would solve a lot of problems.
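
One workaround along these lines, as a rough sketch: copy each row into a fresh
allocation once the table is final, so the oversized blocks left over from
appending can be collected. This assumes .dup allocates only what the current
length needs (modulo the GC's block-size rounding):

// Hypothetical compaction pass over a finished jagged table.
void compact(uint[][] table) {
    foreach (i, row; table) {
        table[i] = row.dup;  // tight copy; the old, padded block becomes garbage
    }
}

It briefly doubles the memory for the row being copied, but afterwards the wasted
capacity is gone.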
May 07 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from dsimcha (dsimcha yahoo.com)'s article
 I know it's been talked about before, but what is the status of the array
 capacity field that's independent of length?  It would be useful to have such
 a thing, for example, when working on a program that uses multi-dimensional
 dynamic arrays to avoid excessive wasted space.  For example:
 void main () {
     uint[][] Table;
     //Read in a huge jagged table of data from a file.
     //Do stuff that involves read-only access to Table[][] and
     // requires a lot of memory.
 }
 The above might result in running out of memory due to the amount of wasted
 space in Table[][].  If there were a way to compact Table[][] once one knows
 it will not be appended to, this would solve a lot of problems.
Use std.gc.capacity() in Phobos and tango.core.Memory.GC.sizeOf() in Tango.

Sean
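
A rough usage sketch for the Phobos call (D1-era std.gc; the exact behavior of
std.gc.capacity is assumed from the docs quoted in the next post):

import std.stdio, std.gc;

void reportCapacity(uint[] arr) {
    // capacity() returns the size of the GC block that arr.ptr begins,
    // or 0 if arr.ptr is not the start of a GC allocation.
    writefln("block size in bytes: ", std.gc.capacity(arr.ptr));
}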
May 07 2008
parent reply dsimcha <dsimcha yahoo.com> writes:
According to the Phobos docs, std.gc.capacity "Returns capacity (size of the
memory block) that p points to the beginning of. If p does not point into the gc
memory pool, or does not point to the beginning of an allocated memory block, 0 is
returned."  I had looked through this before.

It also doesn't appear that writing my own function to do this works:

void capacity(T)(ref T[] array, uint newCapacity) {
    if(newCapacity < array.length) {
        array.length=newCapacity;
    }
    std.gc.realloc(array.ptr, T.sizeof*newCapacity);
}

When I use this after allocating a huge array, and then run a fullCollect() and a
minimize(), the memory usage of my program appears not to go down.
May 07 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
dsimcha wrote:
 According to the Phobos docs, std.gc.capacity "Returns capacity (size of the
 memory block) that p points to the beginning of. If p does not point into the gc
 memory pool, or does not point to the beginning of an allocated memory block, 0 is
 returned."  I had looked through this before.
Right. And since resizing an interior slice "in place" is typically undesirable, this does exactly what you want. If you really want the capacity regardless of this however, you can use tango.core.Memory.GC.query(), which returns the size of the block even for interior pointers.
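
A sketch of the Tango route, assuming tango.core.Memory.GC.query() returns a
BlkInfo with base and size fields (hedged; check the Tango docs for the exact
struct layout):

import tango.core.Memory;
import tango.io.Stdout;

void blockInfoOf(void* p) {
    // query() resolves interior pointers too; size is 0 if p is not
    // in the GC heap at all.
    auto info = GC.query(p);
    Stdout.formatln("base = {}, size = {}", info.base, info.size);
}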
 It also doesn't appear that writing my own function to do this works:
 
 void capacity(T)(ref T[] array, uint newCapacity) {
     if(newCapacity < array.length) {
         array.length=newCapacity;
     }
     std.gc.realloc(array.ptr, T.sizeof*newCapacity);
 }
 
 When I use this after allocating a huge array, and then run a fullCollect()
and a
 minimize(), the memory usage of my programs appear not to go down.
This is somewhat of a different issue. The Phobos GC never returns any memory to the OS. It's also possible to make a Phobos app run out of memory simply by looping on an append op (if I remember the example correctly). The Tango GC behaves a bit differently, and will actually return memory to the OS in certain circumstances.

Sean
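
For what it's worth, the append loop Sean half-remembers would look something like
the sketch below; this is illustrative only, not a verified reproduction, and the
observed behavior depended on which GC (Phobos or Tango) was in use:

import std.stdio;

void main() {
    uint[] a;
    while (true) {
        a ~= 1u;  // repeated reallocation; old blocks may be falsely retained
        if (a.length % 1_000_000 == 0)
            writefln(a.length);
    }
}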
May 07 2008
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
Because of some difficulties encountered using D for large arrays (see previous
posts about array capacity fields), I produced the following test case that seems
to be a reproducible bug in D 2.0.13.  The following program keeps allocating a
huge array in a function and then letting all references to this array go out of
scope.  This should result in the array being freed as soon as more memory is
needed.

import std.stdio, std.gc;

void main(){
    uint count=0;
    while(true) {
        test();
        fullCollect();
        writefln(++count);
    }
}

void test() {
    uint[] stuff=new uint[15_000_000];
}

This produced an out of memory error after 21 iterations on my machine w/ 2 GB of
RAM.  Using an array size of 10_000_000 instead of 15_000_000, its memory usage
stabilized at 350 megs, which seems rather large since a uint[10_000_000] should
only use 40 megs, plus maybe another 40 for overhead.  Furthermore, it takes
several iterations for the memory usage to reach this level.  Using a larger array
size, such as 100_000_000, made this test run out of memory after even fewer
iterations.  Furthermore, changing from a uint[] to real[] with a size of
15_000_000 made it run out after 8 iterations instead of 21.
May 07 2008
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
After further testing, I've found an exact threshold for this bug.  When an array
of uints gets to 48_693_248 bytes (12_173_312 elements) this problem occurs, after
26 iterations at the threshold, or less for larger arrays.  Anything below that,
even one element smaller, and memory usage is stable over at least hundreds of
iterations.  It appears that the number of bytes is the key, since a ulong[] will
allow the same number of bytes (1/2 the elements) before causing problems, and a
ushort[] will allow twice as many elements (same number of bytes) without
crashing.  Furthermore, using equivalent sized floats instead of ints (float
instead of uint, double instead of ulong) or using signed ints, has no effect.

== Quote from dsimcha (dsimcha yahoo.com)'s article
 Because of some difficulties encountered using D for large arrays (See previous
 posts about array capacity fields), I produced the following test case that
seems
 to be a reproducible bug in D 2.0.13.  The following program keeps allocating a
 huge array in a function and then letting all references to this array go out
of
 scope.  This should result in the array being freed as soon as more memory is
needed.
 import std.stdio, std.gc;
 void main(){
     uint count=0;
     while(true) {
         test();
         fullCollect();
         writefln(++count);
     }
 }
 void test() {
     uint[] stuff=new uint[15_000_000];
 }
 This produced an out of memory error after 21 iterations on my machine w/ 2 GB
of
 RAM.  Using an array size of 10_000_000 instead of 15_000_000, its memory usage
 stabilized at 350 megs, which seems rather large since a uint[10_000_000]
should
 only use 40 megs, plus maybe another 40 for overhead.  Furthermore, it takes
 several iterations for the memory usage to reach this level.  Using a larger
array
 size, such as 100_000_000, made this test run out of memory after even fewer
 iterations.  Furthermore, changing from a uint[] to real[] with a size of
 15_000_000 made it run out after 8 iterations instead of 21.
May 07 2008
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
I'm not sure what to say.  This sample works fine with D 1.0 using 
Tango, though the memory usage is strangely high at around 370 megs. 
Here's the weird thing: I tried running these two versions of the test 
function for comparison:

void test() {
     void* stuff = (new byte[15_000_000 * 4]).ptr;
}

void test() {
     void* stuff=GC.malloc(15_000_000 * 4, GC.BlkAttr.NO_SCAN);
}

The first one uses a stable 370 megs of memory and the second a stable 
18 megs.  Obviously there's something weird going on with how arrays are 
handled.

dsimcha wrote:
 After further testing, I've found an exact threshold for this bug.  When an
array
 of uints gets to 48_693_248 bytes (12_173_312 elements) this problem occurs,
after
 26 iterations at the threshold, or less for larger arrays.  Anything below
that,
 even one element smaller, and memory usage is stable over at least hundreds of
 iterations.  It appears that the number of bytes is the key, since a ulong[]
will
 allow the same number of bytes (1/2 the elements) before causing problems, and
a
 ushort[] will allow twice as many elements (same number of bytes) without
 crashing.  Furthermore, using equivalent sized floats instead of ints (float
 instead of uint, double instead of ulong) or using signed ints, has no effect.
 
 == Quote from dsimcha (dsimcha yahoo.com)'s article
 Because of some difficulties encountered using D for large arrays (See previous
 posts about array capacity fields), I produced the following test case that
seems
 to be a reproducible bug in D 2.0.13.  The following program keeps allocating a
 huge array in a function and then letting all references to this array go out
of
 scope.  This should result in the array being freed as soon as more memory is
needed.
 import std.stdio, std.gc;
 void main(){
     uint count=0;
     while(true) {
         test();
         fullCollect();
         writefln(++count);
     }
 }
 void test() {
     uint[] stuff=new uint[15_000_000];
 }
 This produced an out of memory error after 21 iterations on my machine w/ 2 GB
of
 RAM.  Using an array size of 10_000_000 instead of 15_000_000, its memory usage
 stabilized at 350 megs, which seems rather large since a uint[10_000_000]
should
 only use 40 megs, plus maybe another 40 for overhead.  Furthermore, it takes
 several iterations for the memory usage to reach this level.  Using a larger
array
 size, such as 100_000_000, made this test run out of memory after even fewer
 iterations.  Furthermore, changing from a uint[] to real[] with a size of
 15_000_000 made it run out after 8 iterations instead of 21.
May 07 2008
parent reply Sean Kelly <sean invisibleduck.org> writes:
Well whatever's going on, it's not tracked by gc_stats().  I modified 
the previous functions slightly to ensure that both allocated exactly 
the same amount of memory, then verified this via a printf inside the GC 
code--malloc calls were for 60000000 bytes:

void test() {
     void* stuff = (new byte[15_000_000 * 4 - 1]).ptr;
}

void test() {
     void* stuff=GC.malloc(15_000_000 * 4, GC.BlkAttr.NO_SCAN);
}

Output using gc_stats() on each iteration after the collect was 
identical for both function calls after things had stabilized:

*** stats ***
poolsize: 360251392
usedsize: 640
freeblocks: 56
freelistsize: 7552
pageblocks: 6

What I don't understand is why Task Manager is reporting the GC.malloc 
version using only 18 megs while the other uses 370, since everything 
I've looked at internally so far suggests they should be the same. 
Perhaps gc_stats() isn't reporting something properly--I really haven't 
given that routine a very close look.

Sean Kelly wrote:
 I'm not sure what to say.  This sample works fine with D 1.0 using 
 Tango, though the memory usage is strangely high at around 370 megs. 
 Here's the weird thing, I tried running these two versions of the test 
 function for comparison:
 
 void test() {
     void* stuff = (new byte[15_000_000 * 4]).ptr;
 }
 
 void test() {
     void* stuff=GC.malloc(15_000_000 * 4, GC.BlkAttr.NO_SCAN);
 }
 
 The first one uses a stable 370 megs of memory and the second a stable 
 18 megs.  Obviously there's something weird going on with how arrays are 
 handled.
 
 dsimcha wrote:
 After further testing, I've found an exact threshold for this bug.  
 When an array
 of uints gets to 48_693_248 bytes (12_173_312 elements) this problem 
 occurs, after
 26 iterations at the threshold, or less for larger arrays.  Anything 
 below that,
 even one element smaller, and memory usage is stable over at least 
 hundreds of
 iterations.  It appears that the number of bytes is the key, since a 
 ulong[] will
 allow the same number of bytes (1/2 the elements) before causing 
 problems, and a
 ushort[] will allow twice as many elements (same number of bytes) without
 crashing.  Furthermore, using equivalent sized floats instead of ints 
 (float
 instead of uint, double instead of ulong) or using signed ints, has no 
 effect.

 == Quote from dsimcha (dsimcha yahoo.com)'s article
 Because of some difficulties encountered using D for large arrays 
 (See previous
 posts about array capacity fields), I produced the following test 
 case that seems
 to be a reproducible bug in D 2.0.13.  The following program keeps 
 allocating a
 huge array in a function and then letting all references to this 
 array go out of
 scope.  This should result in the array being freed as soon as more 
 memory is
needed.
 import std.stdio, std.gc;
 void main(){
     uint count=0;
     while(true) {
         test();
         fullCollect();
         writefln(++count);
     }
 }
 void test() {
     uint[] stuff=new uint[15_000_000];
 }
 This produced an out of memory error after 21 iterations on my 
 machine w/ 2 GB of
 RAM.  Using an array size of 10_000_000 instead of 15_000_000, its 
 memory usage
 stabilized at 350 megs, which seems rather large since a 
 uint[10_000_000] should
 only use 40 megs, plus maybe another 40 for overhead.  Furthermore, 
 it takes
 several iterations for the memory usage to reach this level.  Using a 
 larger array
 size, such as 100_000_000, made this test run out of memory after 
 even fewer
 iterations.  Furthermore, changing from a uint[] to real[] with a 
 size of
 15_000_000 made it run out after 8 iterations instead of 21.
May 08 2008
parent reply dsimcha <dsimcha yahoo.com> writes:
Another interesting point:  This bug also occurs on GDC 0.24, but here the
threshold is lower, 33_423_360 bytes.  Must be something pretty deep.


== Quote from Sean Kelly (sean invisibleduck.org)'s article
 Well whatever's going on, it's not tracked by gc_stats().  I modified
 the previous functions slightly to ensure that both allocated exactly
 the same amount of memory, then verified this via a printf inside the GC
 code--malloc calls were for 60000000 bytes:
 void test() {
      void* stuff = (new byte[15_000_000 * 4 - 1]).ptr;
 }
 void test() {
      void* stuff=GC.malloc(15_000_000 * 4, GC.BlkAttr.NO_SCAN);
 }
 Output using gc_stats() on each iteration after the collect was
 identical for both function calls after things had stabalized:
 *** stats ***
 poolsize: 360251392
 usedsize: 640
 freeblocks: 56
 freelistsize: 7552
 pageblocks: 6
 What I don't understand is why Task Manager is reporting the GC.malloc
 version using only 18 megs while the other uses 370, since everything
 I've looked at internally so far suggests they should be the same.
 Perhaps gc_stats() isn't reporting something properly--I really haven't
 given that routine a very close look.
 Sean Kelly wrote:
 I'm not sure what to say.  This sample works fine with D 1.0 using
 Tango, though the memory usage is strangely high at around 370 megs.
 Here's the weird thing, I tried running these two versions of the test
 function for comparison:

 void test() {
     void* stuff = (new byte[15_000_000 * 4]).ptr;
 }

 void test() {
     void* stuff=GC.malloc(15_000_000 * 4, GC.BlkAttr.NO_SCAN);
 }

 The first one uses a stable 370 megs of memory and the second a stable
 18 megs.  Obviously there's something weird going on with how arrays are
 handled.

 dsimcha wrote:
 After further testing, I've found an exact threshold for this bug.
 When an array
 of uints gets to 48_693_248 bytes (12_173_312 elements) this problem
 occurs, after
 26 iterations at the threshold, or less for larger arrays.  Anything
 below that,
 even one element smaller, and memory usage is stable over at least
 hundreds of
 iterations.  It appears that the number of bytes is the key, since a
 ulong[] will
 allow the same number of bytes (1/2 the elements) before causing
 problems, and a
 ushort[] will allow twice as many elements (same number of bytes) without
 crashing.  Furthermore, using equivalent sized floats instead of ints
 (float
 instead of uint, double instead of ulong) or using signed ints, has no
 effect.

 == Quote from dsimcha (dsimcha yahoo.com)'s article
 Because of some difficulties encountered using D for large arrays
 (See previous
 posts about array capacity fields), I produced the following test
 case that seems
 to be a reproducible bug in D 2.0.13.  The following program keeps
 allocating a
 huge array in a function and then letting all references to this
 array go out of
 scope.  This should result in the array being freed as soon as more
 memory is
needed.
 import std.stdio, std.gc;
 void main(){
     uint count=0;
     while(true) {
         test();
         fullCollect();
         writefln(++count);
     }
 }
 void test() {
     uint[] stuff=new uint[15_000_000];
 }
 This produced an out of memory error after 21 iterations on my
 machine w/ 2 GB of
 RAM.  Using an array size of 10_000_000 instead of 15_000_000, its
 memory usage
 stabilized at 350 megs, which seems rather large since a
 uint[10_000_000] should
 only use 40 megs, plus maybe another 40 for overhead.  Furthermore,
 it takes
 several iterations for the memory usage to reach this level.  Using a
 larger array
 size, such as 100_000_000, made this test run out of memory after
 even fewer
 iterations.  Furthermore, changing from a uint[] to real[] with a
 size of
 15_000_000 made it run out after 8 iterations instead of 21.
May 08 2008
parent reply dsimcha <dsimcha yahoo.com> writes:
After looking into this further, it appears it's a fundamental issue with
conservative GC.  I scanned the stack manually for bit patterns, and it seems that
with very large arrays, one is virtually guaranteed to have some bit pattern on
the stack that looks like a pointer into the array.  Manually deleting just the
very large arrays is a solution I can live with, since in the type of code I
typically write, all the huge arrays are only around for a short time and only
accessed from one or two places.  However, this still leaves the problem of large
associative arrays.  There appears to be no way of manually deleting these.  Using
delete on one (ex. string[string] foo; delete foo;) produces a compile time
error.  Does anyone know of another way to manually delete associative arrays?


== Quote from dsimcha (dsimcha yahoo.com)'s article
 Another interesting point:  This bug also occurs on GDC 0.24, but here the
 threshold is lower, 33_423_360 bytes.  Must be something pretty deep.
 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 Well whatever's going on, it's not tracked by gc_stats().  I modified
 the previous functions slightly to ensure that both allocated exactly
 the same amount of memory, then verified this via a printf inside the GC
 code--malloc calls were for 60000000 bytes:
 void test() {
      void* stuff = (new byte[15_000_000 * 4 - 1]).ptr;
 }
 void test() {
      void* stuff=GC.malloc(15_000_000 * 4, GC.BlkAttr.NO_SCAN);
 }
 Output using gc_stats() on each iteration after the collect was
 identical for both function calls after things had stabalized:
 *** stats ***
 poolsize: 360251392
 usedsize: 640
 freeblocks: 56
 freelistsize: 7552
 pageblocks: 6
 What I don't understand is why Task Manager is reporting the GC.malloc
 version using only 18 megs while the other uses 370, since everything
 I've looked at internally so far suggests they should be the same.
 Perhaps gc_stats() isn't reporting something properly--I really haven't
 given that routine a very close look.
 Sean Kelly wrote:
 I'm not sure what to say.  This sample works fine with D 1.0 using
 Tango, though the memory usage is strangely high at around 370 megs.
 Here's the weird thing, I tried running these two versions of the test
 function for comparison:

 void test() {
     void* stuff = (new byte[15_000_000 * 4]).ptr;
 }

 void test() {
     void* stuff=GC.malloc(15_000_000 * 4, GC.BlkAttr.NO_SCAN);
 }

 The first one uses a stable 370 megs of memory and the second a stable
 18 megs.  Obviously there's something weird going on with how arrays are
 handled.

 dsimcha wrote:
 After further testing, I've found an exact threshold for this bug.
 When an array
 of uints gets to 48_693_248 bytes (12_173_312 elements) this problem
 occurs, after
 26 iterations at the threshold, or less for larger arrays.  Anything
 below that,
 even one element smaller, and memory usage is stable over at least
 hundreds of
 iterations.  It appears that the number of bytes is the key, since a
 ulong[] will
 allow the same number of bytes (1/2 the elements) before causing
 problems, and a
 ushort[] will allow twice as many elements (same number of bytes) without
 crashing.  Furthermore, using equivalent sized floats instead of ints
 (float
 instead of uint, double instead of ulong) or using signed ints, has no
 effect.

 == Quote from dsimcha (dsimcha yahoo.com)'s article
 Because of some difficulties encountered using D for large arrays
 (See previous
 posts about array capacity fields), I produced the following test
 case that seems
 to be a reproducible bug in D 2.0.13.  The following program keeps
 allocating a
 huge array in a function and then letting all references to this
 array go out of
 scope.  This should result in the array being freed as soon as more
 memory is
needed.
 import std.stdio, std.gc;
 void main(){
     uint count=0;
     while(true) {
         test();
         fullCollect();
         writefln(++count);
     }
 }
 void test() {
     uint[] stuff=new uint[15_000_000];
 }
 This produced an out of memory error after 21 iterations on my
 machine w/ 2 GB of
 RAM.  Using an array size of 10_000_000 instead of 15_000_000, its
 memory usage
 stabilized at 350 megs, which seems rather large since a
 uint[10_000_000] should
 only use 40 megs, plus maybe another 40 for overhead.  Furthermore,
 it takes
 several iterations for the memory usage to reach this level.  Using a
 larger array
 size, such as 100_000_000, made this test run out of memory after
 even fewer
 iterations.  Furthermore, changing from a uint[] to real[] with a
 size of
 15_000_000 made it run out after 8 iterations instead of 21.
May 09 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
dsimcha Wrote:
 Does anyone know of another way to manually delete associative arrays?
Some time ago I asked that question, and someone showed me a way; you can find a modified version of it as the clear() function in my libs:
http://www.fantascienza.net/leonardo/so/libs_d.zip

T clear(T)(ref T items) {
    static if (IsAA!(T)) { // This only works with the current AA implementation
        struct BB {
            void*[] b;
            size_t nodes;
        }
        union ToPtr_clear {
            T x;
            void* ptr;
        }
        ToPtr_clear toptr;
        toptr.x = items;
        BB* b = cast(BB*) toptr.ptr;
        if (b) {
            b.b = null;
            b.nodes = 0;
        }
        return items;
    } else static if (IsDynamicArray!(T)) {
        items.length = 0;
        return items;
    } else
        static assert(0, "clear() accepts only AAs and dynamic arrays.");
}

You can find unittests, ddocs and the IsAA!() in those libs, in the "func" module (IsAA isn't necessary if you just want to use it for AAs).

Bye,
bearophile
May 09 2008
next sibling parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-05-09 18:20:19 +0200, bearophile <bearophileHUGS lycos.com> said:

 dsimcha Wrote:
 Does anyone know of another way to manually delete associative arrays?
Time ago I have asked that question, and someone has shown me a way, you can find a modified version of it as the clear() function in my libs: http://www.fantascienza.net/leonardo/so/libs_d.zip [...] Bye, bearophile
Nice that there is a workaround, but shouldn't an AA:
1) possibly contain pointers
2) have no external pointers pointing to its inside

So is it possible to tell the GC to ignore internal pointers to it, but still (if kept) scan all the pointers contained in it?

Fawzi
May 09 2008
parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Fawzi Mohamed (fmohamed mac.com)'s article
 On 2008-05-09 18:20:19 +0200, bearophile <bearophileHUGS lycos.com> said:
 dsimcha Wrote:
 Does anyone know of another way to manually delete associative arrays?
Time ago I have asked that question, and someone has shown me a way, you can find a modified version of it as the clear() function in my libs: http://www.fantascienza.net/leonardo/so/libs_d.zip [...] Bye, bearophile
Nice that there is a workaround, but shouldn't an AA:
1) possibly contain pointers
2) have no external pointers pointing to its inside

So is it possible to tell the GC to ignore internal pointers to it, but still (if kept) scan all the pointers contained in it?
The GC just deals in memory blocks, so the nodes of a chained AA are just blocks containing pointers. The GC doesn't know any more than that, and it doesn't treat them any differently than any other block containing pointers.

Sean
May 09 2008
prev sibling parent dsimcha <dsimcha yahoo.com> writes:
This appears to be analogous to simply setting the length field of a dynamic array
to zero.  It just nulls the pointers and still requires the GC to know to free the
memory properly.  On the other hand, delete for dynamic arrays is truly manual
memory management and actually frees the memory in question.  The following test
case illustrates my point, and fails after a few iterations.

import std.stdio, std.gc;

void main () {
    uint count;
    while(true) {
        test();
        fullCollect();
        minimize();
        writefln(++count);
    }
}

void test() {
    string[uint] foo;
    for(uint i=1; i < 10_000_000; i++) {
        foo[i]="Some ridiculously random text foo foo foo foo foo foo foo foo foo";
    }
    clear(foo);
}

Note:  clear() is the clear() function copied and pasted from bearophile's post.

== Quote from bearophile (bearophileHUGS lycos.com)'s article
 dsimcha Wrote:
 Does anyone know of another way to manually delete associative arrays?
Time ago I have asked that question, and someone has shown me a way, you can
find a modified version of it as the clear() function in my libs:
 http://www.fantascienza.net/leonardo/so/libs_d.zip
 T clear(T)(ref T items) {
     static if (IsAA!(T)) { // This only works with the current AA
implementation
         struct BB {
             void*[] b;
             size_t nodes;
         }
         union ToPtr_clear {
             T x;
             void* ptr;
         }
         ToPtr_clear toptr;
         toptr.x = items;
         BB* b = cast(BB*) toptr.ptr;
         if (b) {
             b.b = null;
             b.nodes = 0;
         }
         return items;
     } else static if (IsDynamicArray!(T)) {
         items.length = 0;
         return items;
     } else
         static assert(0, "clear() accepts only AAs and dynamic arrays.");
 }
 You can find unittests, ddocs and the IsAA!() into those libs, the "func"
module
(the IsAA isn't necessary if you just want to use it for AAs).
 Bye,
 bearophile
May 09 2008
prev sibling parent reply Sivo Schilling <sivo.schilling web.de> writes:
This bug also occurs using DMD 1.029. It seems one has to help the Phobos
GC in such cases (intended as a workaround). If you explicitly delete your
large array then there's no error.
---
import std.stdio, std.gc;

void main()
{
    char[] buf;
    uint[] stuff;

    uint count=0;
    while(true && count < 100)
    {
        testa(stuff);
        // fullCollect();
        delete stuff;
        writefln(++count);
    }
    readln(stdin, buf);
}

void testa(ref uint[] stuff)
{
    stuff=new uint[15_000_000];
    // improve execution speed
    std.gc.hasNoPointers(cast(void*)stuff);
}
---
This works with DMD 1.029 and DMD 2.013 and Phobos (using Win XP SP3, 4 GB RAM).
 
dsimcha Wrote:

 After further testing, I've found an exact threshold for this bug.  When an
array
 of uints gets to 48_693_248 bytes (12_173_312 elements) this problem occurs,
after
 26 iterations at the threshold, or less for larger arrays.  Anything below
that,
 even one element smaller, and memory usage is stable over at least hundreds of
 iterations.  It appears that the number of bytes is the key, since a ulong[]
will
 allow the same number of bytes (1/2 the elements) before causing problems, and
a
 ushort[] will allow twice as many elements (same number of bytes) without
 crashing.  Furthermore, using equivalent sized floats instead of ints (float
 instead of uint, double instead of ulong) or using signed ints, has no effect.
 
 == Quote from dsimcha (dsimcha yahoo.com)'s article
 Because of some difficulties encountered using D for large arrays (See previous
 posts about array capacity fields), I produced the following test case that
seems
 to be a reproducible bug in D 2.0.13.  The following program keeps allocating a
 huge array in a function and then letting all references to this array go out
of
 scope.  This should result in the array being freed as soon as more memory is
needed.
 import std.stdio, std.gc;
 void main(){
     uint count=0;
     while(true) {
         test();
         fullCollect();
         writefln(++count);
     }
 }
 void test() {
     uint[] stuff=new uint[15_000_000];
 }
 This produced an out of memory error after 21 iterations on my machine w/ 2 GB
of
 RAM.  Using an array size of 10_000_000 instead of 15_000_000, its memory usage
 stabilized at 350 megs, which seems rather large since a uint[10_000_000]
should
 only use 40 megs, plus maybe another 40 for overhead.  Furthermore, it takes
 several iterations for the memory usage to reach this level.  Using a larger
array
 size, such as 100_000_000, made this test run out of memory after even fewer
 iterations.  Furthermore, changing from a uint[] to real[] with a size of
 15_000_000 made it run out after 8 iterations instead of 21.
May 08 2008
parent dsimcha <dsimcha yahoo.com> writes:
Actually, if I stick a delete in my original test case at the end of test(), the
problem seems to go away.  Furthermore, the ridiculously high memory usage, even
for smaller arrays, goes down to about 41 megs instead of 350 for a
uint[10_000_000].  Adding a hasNoPointers seems unnecessary.  Why is the Phobos
garbage collector so bad at freeing unused dynamic arrays?
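
Concretely, the change to the original test() would look something like the sketch
below (this just spells out "stick a delete at the end"; it plugs into the earlier
test program and is not a separate benchmark):

void test() {
    uint[] stuff = new uint[15_000_000];
    // ... use stuff ...
    delete stuff;  // free the block explicitly instead of waiting for the
                   // conservative collector, which may see false pointers to it
}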

== Quote from Sivo Schilling (sivo.schilling web.de)'s article
 This bug also occurs using DMD 1.029. It seems one have to help phobos
 gc for such cases (intended as a workaround). If you excplicitly deletes your
large array than there's no error.
 ---
 import std.stdio, std.gc;
 void main()
 {
     char[] buf;
     uint[] stuff;
     uint count=0;
     while(true && count < 100)
     {
         testa(stuff);
         // fullCollect();
         delete stuff;
         writefln(++count);
     }
     readln(stdin, buf);
 }
 void testa(ref uint[] stuff)
 {
     stuff=new uint[15_000_000];
     // improve execution speed
     std.gc.hasNoPointers(cast(void*)stuff);
 }
 ---
 This works with DMD 1.029 and DMD 2.013 and phobos (using Wiin XP SP3, 4 GB
RAM).
 dsimcha Wrote:
 After further testing, I've found an exact threshold for this bug.  When an
array
 of uints gets to 48_693_248 bytes (12_173_312 elements) this problem occurs,
after
 26 iterations at the threshold, or less for larger arrays.  Anything below
that,
 even one element smaller, and memory usage is stable over at least hundreds of
 iterations.  It appears that the number of bytes is the key, since a ulong[]
will
 allow the same number of bytes (1/2 the elements) before causing problems, and
a
 ushort[] will allow twice as many elements (same number of bytes) without
 crashing.  Furthermore, using equivalent sized floats instead of ints (float
 instead of uint, double instead of ulong) or using signed ints, has no effect.

 == Quote from dsimcha (dsimcha yahoo.com)'s article
 Because of some difficulties encountered using D for large arrays (See previous
 posts about array capacity fields), I produced the following test case that
seems
 to be a reproducible bug in D 2.0.13.  The following program keeps allocating a
 huge array in a function and then letting all references to this array go out
of
 scope.  This should result in the array being freed as soon as more memory is
needed.
 import std.stdio, std.gc;
 void main(){
     uint count=0;
     while(true) {
         test();
         fullCollect();
         writefln(++count);
     }
 }
 void test() {
     uint[] stuff=new uint[15_000_000];
 }
 This produced an out of memory error after 21 iterations on my machine w/ 2
GB of
 RAM.  Using an array size of 10_000_000 instead of 15_000_000, its memory usage
 stabilized at 350 megs, which seems rather large since a uint[10_000_000]
should
 only use 40 megs, plus maybe another 40 for overhead.  Furthermore, it takes
 several iterations for the memory usage to reach this level.  Using a larger
array
 size, such as 100_000_000, made this test run out of memory after even fewer
 iterations.  Furthermore, changing from a uint[] to real[] with a size of
 15_000_000 made it run out after 8 iterations instead of 21.
May 08 2008
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
dsimcha and I debugged this last night. It appears it's a case of the
"partly conservative" collector problem
( http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Precise_vs._conservative_and_internal_pointers ).
Essentially, there are values on the stack (including the array's own
length parameter) which, when interpreted as pointers, point into the array.
And the bigger the array, the bigger the target and the more likely at
least one such "pointer" is found. The case where it works but uses too much
memory seems to be one where some allocations 'stick' before the GC
finds some clear ranges, which it uses thereafter.
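
A back-of-envelope illustration of why bigger blocks are likelier to be falsely
retained. The uniform-randomness assumption and the number of words scanned are
purely illustrative, not measured:

import std.stdio;

void main() {
    // If a scanned word were a uniformly random 32-bit value, the chance
    // that it happens to land inside an N-byte block is roughly N / 2^32.
    double blockBytes   = 48_693_248.0;                 // dsimcha's threshold
    double perWord      = blockBytes / 4_294_967_296.0; // ~1.1% per word
    int    wordsScanned = 500;                          // hypothetical stack/register words
    double noHit = 1.0;
    for (int i = 0; i < wordsScanned; i++)
        noHit *= 1.0 - perWord;
    // Prints something close to 1.0, i.e. a false "pointer" is almost certain.
    writefln("P(at least one false hit) ~= ", 1.0 - noHit);
}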
May 09 2008
prev sibling next sibling parent reply Nick B <nick.barbalich gmail.com> writes:
Sean Kelly wrote:
 dsimcha wrote:
 According to the Phobos docs, std.gc.capacity "Returns capacity (size 
 of the
 memory block) that p  points to the beginning of. If p does not point 
 into the gc
 memory pool, or does not point to the beginning of an allocated memory 
 block, 0 is
 returned."  I had looked through this before.
Right. And since resizing an interior slice "in place" is typically undesirable, this does exactly what you want. If you really want the capacity regardless of this however, you can use tango.core.Memory.GC.query(), which returns the size of the block even for interior pointers.
Can someone tell me what "interior pointers" are?

thanks
Nick
May 08 2008
parent torhu <no spam.invalid> writes:
Nick B wrote:
  > Can someone tell me what "interior pointers" are ?

Sure.

http://www.memorymanagement.org/glossary/i.html#interior.pointer
May 08 2008
prev sibling parent reply Pontus Pihlgren <pontus update.uu.se> writes:
 This is somewhat of a different issue.  The Phobos GC never returns any 
 memory to the OS.  
This is my absolute favourite pet peeve with Perl, which has the same behaviour.

I'm working on a fairly large program which is used on a cluster system with
several instances going at the same time. It initially uses quite a bit of memory
(a few hundred megabytes) but then releases a large part of it and lingers in this
state for quite some time (waiting for other processes), yet from the system's
point of view it never decreases in size.

I realize that the GC is trying to be nice to me and speed up new allocs, but I
would like to force it to release memory to the system. Is this possible in Phobos
or Tango? It really should be.

Pontus.
May 09 2008
parent Sean Kelly <sean invisibleduck.org> writes:
Pontus Pihlgren wrote:
 
 This is somewhat of a different issue.  The Phobos GC never returns 
 any memory to the OS.  
This is my absolute favourite pet peeve with Perl, it has the same behaviour. I'm working on a fairly large program which is used on a cluster system with several instances going at the same time. It is initially uses quite a bit of memory (a few hundred megabytes) but then releases a large part of it and lingers in this state for quite some time (waiting for other processes), but from the systems point of view it never decreases in size. I realize that the GC is trying to be nice to me and speed up new allocs, but I would like to force it to release memory to the system. Is this possible in Phobos or Tango, it really should be?
The Tango GC actually does release memory back to the OS in some situations, which is why Tango doesn't explode on the example in this thread. It also actually implements GC.minimize(), which I believe is a no-op in Phobos.

However, it's not always easy to return memory to the OS given the design of the GC in Phobos (which Tango inherited and simply modified). Basically, memory is obtained from the OS in large pools, and then chunks of these pools are designated as free space for different sized blocks as needed. So to return a pool to the OS (by far the easiest approach), the entire pool must obviously be free. The problem is that the chance that a pool has at least one chunk used for some existing memory block is pretty high in most situations.

However, sufficiently large single allocations (like the 15_000_000 uint array) are given their own pool by the GC, so when this array is collected it's possible to return its pool to the OS immediately. This is basically what Tango does, and is likely why it doesn't crash on this sample app.

Sean
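
A sketch of asking for this explicitly from user code under Tango (assuming the
tango.core.Memory API of the time; under Phobos the matching std.gc.minimize()
was reportedly a no-op):

import tango.core.Memory;

void releaseWhatWeCan() {
    GC.collect();   // full collection first, so whole pools can become empty
    GC.minimize();  // then ask the GC to return any empty pools to the OS
}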
May 09 2008