
digitalmars.D - either me or GC sux badly (GC don't reuse free memory)

reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
Hello.

let's run this program:

  import core.sys.posix.unistd;
  import std.stdio;
  import core.memory;

  void main () {
    uint size = 1024*1024*300;
    for (;;) {
      auto buf = new ubyte[](size);
      writefln("%s", size);
      sleep(1);
      size += 1024*1024*100;
      buf = null;
      GC.collect();
      GC.minimize();
    }
  }

pretty innocent, right? i even trying to help GC here. but...

  314572800
  419430400
  524288000
  629145600
  734003200
  core.exception.OutOfMemoryError (0)

oooops.

by the way, this is not actually "no more memory", this is "i'm out of
address space" (yes, i'm on 32-bit system, GNU/Linux).

the question is: am i doing something wrong here? how can i force GC to
stop eating my address space and reuse what it already has?

sure, i can use libc malloc(), refcounting, and so on, but the question
remains: why GC not reusing already allocated and freed memory?
Nov 12 2014
next sibling parent reply "thedeemon" <dlang thedeemon.com> writes:
On Wednesday, 12 November 2014 at 11:05:11 UTC, ketmar via 
Digitalmars-d wrote:
   734003200
 address space" (yes, i'm on 32-bit system, GNU/Linux).

 the question is: am i doing something wrong here? how can i 
 force GC to stop eating my address space and reuse what it 
 already has?
Sure: just make the GC precise, not conservative. ;) With current GC implementation and array this big chances of having a word on the stack that looks like a pointer to it and prevents it from being collected are almost 100%. Just don't store big arrays in GC heap or switch to 64 bits where the problem is not that bad since address space is much larger and chances of false pointers are much smaller.
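The "don't store big arrays in GC heap" advice can be sketched as follows. This is only an illustration under assumptions: the helper names `cAlloc` and `cFree` are made up for this example, and it relies on libc `malloc`/`free` as exposed by `core.stdc.stdlib`. The buffer is never scanned or tracked by the GC, so a stray word on the stack cannot keep it alive, but it has to be released by hand.

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writefln;

/// Hypothetical helper: allocates `size` bytes with libc malloc.
/// Returns a null slice on failure. The memory is NOT zero-initialized.
ubyte[] cAlloc(size_t size)
{
    auto p = cast(ubyte*) malloc(size);
    return p is null ? null : p[0 .. size];
}

/// Hypothetical helper: releases a buffer obtained from cAlloc.
void cFree(ref ubyte[] buf)
{
    free(buf.ptr);
    buf = null;
}

void main()
{
    size_t size = 300 * 1024 * 1024;
    foreach (i; 0 .. 5)
    {
        auto buf = cAlloc(size);
        if (buf is null)
        {
            writefln("malloc failed at %s bytes", size);
            break;
        }
        buf[0] = 1;          // touch both ends of the block
        buf[$ - 1] = 2;
        writefln("%s", size);
        cFree(buf);          // explicit release, no GC involved
        size += 100 * 1024 * 1024;
    }
}
```

The trade-off is the usual one: no false-pointer retention, but also no safety net if a `cFree` call is forgotten.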
Nov 12 2014
next sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 12:05:25 +0000
thedeemon via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wednesday, 12 November 2014 at 11:05:11 UTC, ketmar via
 Digitalmars-d wrote:
   734003200
 address space" (yes, i'm on 32-bit system, GNU/Linux).

 the question is: am i doing something wrong here? how can i
 force GC to stop eating my address space and reuse what it
 already has?
 Sure: just make the GC precise, not conservative. ;)
 With current GC implementation and array this big chances of
 having a word on the stack that looks like a pointer to it and
 prevents it from being collected are almost 100%. Just don't
 store big arrays in GC heap or switch to 64 bits where the
 problem is not that bad since address space is much larger and
 chances of false pointers are much smaller.
even with NO_INTERIOR the sample keeps failing (yet after more iterations). no, really, there is ALWAYS a pointer exactly to the start of the allocated buffer somewhere? i feel something smelly here.
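For readers who want to try the NO_INTERIOR variant: the post does not show how the flag was applied, so the following is only a guess at one way to do it, using `GC.malloc` from `core.memory` with `GC.BlkAttr.NO_SCAN | GC.BlkAttr.NO_INTERIOR`. The 16 MB size is an arbitrary choice for the sketch.

```d
import core.memory : GC;
import std.stdio : writefln;

void main()
{
    enum size = 16 * 1024 * 1024; // arbitrary size for the sketch

    // NO_SCAN: the block contains no pointers, so the GC does not scan it.
    // NO_INTERIOR: only a pointer to the very start of the block keeps it
    // alive; pointers into the middle of the block are ignored.
    auto p = cast(ubyte*) GC.malloc(size,
        GC.BlkAttr.NO_SCAN | GC.BlkAttr.NO_INTERIOR);

    ubyte[] buf = p[0 .. size];
    buf[] = 0;               // GC.malloc does not zero the memory
    writefln("allocated %s bytes at %s", buf.length, p);

    GC.free(p);              // p is the block base here, so no addrOf needed
}
```

With NO_INTERIOR set, a false pointer has to hit the exact base address of the block to pin it, which is why the failure is expected to take more iterations to show up.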
Nov 12 2014
prev sibling next sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 12:05:25 +0000
thedeemon via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wednesday, 12 November 2014 at 11:05:11 UTC, ketmar via
 Digitalmars-d wrote:
   734003200
 address space" (yes, i'm on 32-bit system, GNU/Linux).

 the question is: am i doing something wrong here? how can i
 force GC to stop eating my address space and reuse what it
 already has?
 Sure: just make the GC precise, not conservative. ;)
 With current GC implementation and array this big chances of
 having a word on the stack that looks like a pointer to it and
 prevents it from being collected are almost 100%. Just don't
 store big arrays in GC heap or switch to 64 bits where the
 problem is not that bad since address space is much larger and
 chances of false pointers are much smaller.
but 'mkay, let's change the sample a little:

  import core.memory;
  import std.stdio;

  void main () {
    uint size = 1024*1024*300;
    for (;;) {
      auto buf = new ubyte[](size);
      writefln("%s", size);
      size += 1024*1024*100;
      GC.free(GC.addrOf(buf.ptr));
      buf = null;
      GC.collect();
      GC.minimize();
    }
  }

this shouldn't fail so soon, right? i'm freeing the memory, so... it
still dying on 1,887,436,800. 1.7GB and that's all? this can't be true,
i have 3GB of free RAM (with 1.2GB used) and 8GB of unused swap. and
yes, it consumed all of the process address space again.
Nov 12 2014
next sibling parent reply "Matthias Bentrup" <matthias.bentrup googlemail.com> writes:
On Wednesday, 12 November 2014 at 12:30:15 UTC, ketmar via 
Digitalmars-d wrote:
 but 'mkay, let's change the sample a little:

   import core.memory;
   import std.stdio;

   void main () {
     uint size = 1024*1024*300;
     for (;;) {
       auto buf = new ubyte[](size);
       writefln("%s", size);
       size += 1024*1024*100;
       GC.free(GC.addrOf(buf.ptr));
       buf = null;
       GC.collect();
       GC.minimize();
     }
   }

 this shouldn't fail so soon, right? i'm freeing the memory, so... it
 still dying on 1,887,436,800. 1.7GB and that's all? this can't be true,
 i have 3GB of free RAM (with 1.2GB used) and 8GB of unused swap. and
 yes, it consumed all of the process address space again.
On Linux/x86 you have only 3 GB virtual address space, and this has to include the program code + all loaded libraries too. Check out /proc/<pid>/maps, to see where the dlls are loaded, and look at the largest chunk of free space available. That is the theoretical limit that could be allocated.
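The check suggested here can be automated. The sketch below is one possible way to do it, not something from the thread: the function name `largestGap` is invented, and it assumes the usual Linux `/proc/<pid>/maps` line layout where each line starts with `start-end` in hexadecimal. It only measures gaps between consecutive mappings, so free space above the last mapping is not counted.

```d
import std.algorithm : max;
import std.conv : to;
import std.stdio : File, writefln;
import std.string : indexOf;

/// Hypothetical helper: largest unmapped gap (in bytes) between two
/// consecutive mappings, given lines in /proc/<pid>/maps format.
ulong largestGap(R)(R lines)
{
    ulong prevEnd = 0, best = 0;
    bool seen = false;
    foreach (line; lines)
    {
        // Each line starts with "start-end perms ...", addresses in hex.
        const dash = line.indexOf('-');
        const space = line.indexOf(' ');
        if (dash <= 0 || space <= dash + 1)
            continue; // not a mapping line, skip it
        const start = line[0 .. dash].to!ulong(16);
        const end = line[dash + 1 .. space].to!ulong(16);
        if (seen && start > prevEnd)
            best = max(best, start - prevEnd);
        prevEnd = max(prevEnd, end);
        seen = true;
    }
    return best;
}

void main()
{
    // Inspect the current process; use /proc/<pid>/maps for another one.
    const gap = largestGap(File("/proc/self/maps").byLine);
    writefln("largest gap between mappings: %s MB", gap >> 20);
}
```

Calling this inside the allocation loop, once per iteration, would show whether the largest free hole shrinks faster than the requested size grows.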
Nov 12 2014
parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 12:42:10 +0000
Matthias Bentrup via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Linux/x86 you have only 3 GB virtual address space, and this
 has to include the program code + all loaded libraries too. Check
 out /proc/<pid>/maps, to see where the dlls are loaded, and look
 at the largest chunk of free space available. That is the
 theoretical limit that could be allocated.
i know it. what i can't get is why D allocates more and more address space with each 'new'. what i'm expecting is address space consumption on par with 'size', but it grows a lot faster. seems that i should either read the GC code to see what's going on (oh, boring!) or write a memory region dumper (it's funnier). i bet that something is wrong with the GC memory manager though, but can't prove it for now.
Nov 12 2014
prev sibling parent reply "Kagamin" <spam here.lot> writes:
On Wednesday, 12 November 2014 at 12:30:15 UTC, ketmar via 
Digitalmars-d wrote:
 this shouldn't fail so soon, right? i'm freeing the memory, 
 so... it
 still dying on 1,887,436,800. 1.7GB and that's all? this can't 
 be true,
 i have 3GB of free RAM (with 1.2GB used) and 8GB of unused 
 swap. and
 yes, it consumed all of the process address space again.
Maybe you fragmented the heap and don't have 1.7GB of contiguous memory?
Nov 12 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 15:24:08 +0000
Kagamin via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wednesday, 12 November 2014 at 12:30:15 UTC, ketmar via
 Digitalmars-d wrote:
 this shouldn't fail so soon, right? i'm freeing the memory, so... it
 still dying on 1,887,436,800. 1.7GB and that's all? this can't be true,
 i have 3GB of free RAM (with 1.2GB used) and 8GB of unused swap. and
 yes, it consumed all of the process address space again.
 Maybe you fragmented the heap and don't have 1.7GB of contiguous
 memory?
i gave two example programs which demonstrate the effect, and they aren't excerpts. the only allocating `writef` can be removed too, but the effect stays. so heap fragmentation from other allocations can't be the issue.
Nov 12 2014
parent reply "Kagamin" <spam here.lot> writes:
On Wednesday, 12 November 2014 at 15:36:48 UTC, ketmar via 
Digitalmars-d wrote:
 so heap fragmentation from other allocations can't be the issue.
Why do you think so? Try to go in opposite direction: start from 700MB and decrease allocation size.
Nov 12 2014
parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 16:23:01 +0000
Kagamin via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wednesday, 12 November 2014 at 15:36:48 UTC, ketmar via
 Digitalmars-d wrote:
 so heap fragmentation from other allocations can't be the issue.
 Why do you think so?
 Try to go in opposite direction: start from 700MB and decrease
 allocation size.
i mean "from allocations in other places of the program", not the "previous allocations in this code". sorry.
Nov 12 2014
prev sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 12:05:25 +0000
thedeemon via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wednesday, 12 November 2014 at 11:05:11 UTC, ketmar via
 Digitalmars-d wrote:
   734003200
 address space" (yes, i'm on 32-bit system, GNU/Linux).

 the question is: am i doing something wrong here? how can i
 force GC to stop eating my address space and reuse what it
 already has?
 Sure: just make the GC precise, not conservative. ;)
 With current GC implementation and array this big chances of
 having a word on the stack that looks like a pointer to it and
 prevents it from being collected are almost 100%. Just don't
 store big arrays in GC heap or switch to 64 bits where the
 problem is not that bad since address space is much larger and
 chances of false pointers are much smaller.
for information: yes, RES is jumping high and low as it should. but VIRT is steadily growing until there is no more address space available. so the problem is clearly not in false pointers this time.
Nov 12 2014
prev sibling next sibling parent reply "Kagamin" <spam here.lot> writes:
On Wednesday, 12 November 2014 at 11:05:11 UTC, ketmar via 
Digitalmars-d wrote:
 the question is: am i doing something wrong here? how can i 
 force GC to
 stop eating my address space and reuse what it already has?
Try to allocate the arrays with NO_SCAN flag.
Nov 12 2014
next sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 15:19:51 +0000
Kagamin via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wednesday, 12 November 2014 at 11:05:11 UTC, ketmar via
 Digitalmars-d wrote:
 the question is: am i doing something wrong here? how can i
 force GC to
 stop eating my address space and reuse what it already has?
 Try to allocate the arrays with NO_SCAN flag.
why this must make any difference in the demonstrated cases? ubyte arrays are initialized to zeroes, so they can't contain false pointers.
Nov 12 2014
prev sibling next sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/12/14 10:19 AM, Kagamin wrote:
 On Wednesday, 12 November 2014 at 11:05:11 UTC, ketmar via Digitalmars-d
 wrote:
 the question is: am i doing something wrong here? how can i force GC to
 stop eating my address space and reuse what it already has?
Try to allocate the arrays with NO_SCAN flag.
Really that shouldn't matter. The arrays should all be 0-initialized. -Steve
Nov 12 2014
prev sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 15:19:51 +0000
Kagamin via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wednesday, 12 November 2014 at 11:05:11 UTC, ketmar via
 Digitalmars-d wrote:
 the question is: am i doing something wrong here? how can i
 force GC to
 stop eating my address space and reuse what it already has?
 Try to allocate the arrays with NO_SCAN flag.
btw, compiler is smart enough to allocate array with NO_SCAN flag, i checked this with `GC.getAttr()`.
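The `GC.getAttr()` check mentioned here can be reproduced roughly like this. The helper name `isNoScan` is made up for the sketch. `GC.addrOf` is used first because, as far as I understand, the data pointer of a large array may not be the base address of its GC block (the second sample in this thread calls `GC.addrOf` before `GC.free` for what looks like the same reason).

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper: true if the GC block containing `p` carries
/// the NO_SCAN attribute.
bool isNoScan(const void* p)
{
    // Resolve to the block base before querying the attribute bits.
    return (GC.getAttr(GC.addrOf(p)) & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    auto bytes = new ubyte[](1024 * 1024); // element type has no pointers
    auto ptrs = new void*[](1024);         // element type is a pointer

    writefln("ubyte[]: NO_SCAN=%s", isNoScan(bytes.ptr));
    writefln("void*[]: NO_SCAN=%s", isNoScan(ptrs.ptr));
}
```

The pointer array is there only for contrast: its block has to be scanned, so the flag should be absent.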
Nov 12 2014
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/12/14 6:04 AM, ketmar via Digitalmars-d wrote:
 Hello.

 let's run this program:

    import core.sys.posix.unistd;
    import std.stdio;
    import core.memory;

    void main () {
      uint size = 1024*1024*300;
      for (;;) {
        auto buf = new ubyte[](size);
        writefln("%s", size);
        sleep(1);
        size += 1024*1024*100;
        buf = null;
        GC.collect();
        GC.minimize();
      }
    }

 pretty innocent, right? i even trying to help GC here. but...

    314572800
    419430400
    524288000
    629145600
    734003200
    core.exception.OutOfMemoryError (0)

 oooops.

 by the way, this is not actually "no more memory", this is "i'm out of
 address space" (yes, i'm on 32-bit system, GNU/Linux).

 the question is: am i doing something wrong here? how can i force GC to
 stop eating my address space and reuse what it already has?

 sure, i can use libc malloc(), refcounting, and so on, but the question
 remains: why GC not reusing already allocated and freed memory?
I think I might know what's going on.

You are continually adding 100MB to the allocation size. Memory is contiguous from the OS, but can get fragmented inside the GC.

So let's say, you allocate 300MB. Fine. It needs more space from the OS, allocates it, and assigns a pool to that 300MB. Now, you add another 100MB. At this point, it can't fit into the original pool, so it allocates another 400MB. BUT, it doesn't merge the 300MB into that (I don't think), so when it adds another 100MB, it has a 300MB space, and a 400MB space, neither of which can hold 500MB. And it goes on and on. Keep in mind also that it is a frequent error that people make to set a pointer to null and expect the data will be collected. For example, buf could still be in a register.

I would be interested in how much memory the GC has vs. how much is actually used.

-Steve
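One way to get the "has vs. actually used" numbers is `GC.stats()` from `core.memory`. Note the assumption: this call exists in recent druntime releases and, as far as I know, was not available in the runtime of the time, so treat this as a sketch for a current compiler. The sizes are deliberately small so it runs anywhere.

```d
import core.memory : GC;
import std.stdio : writefln;

void main()
{
    size_t size = 8 * 1024 * 1024; // small sizes, just to watch the trend
    foreach (i; 0 .. 4)
    {
        auto buf = new ubyte[](size);
        const live = GC.stats();   // snapshot while the buffer is alive

        GC.free(GC.addrOf(buf.ptr));
        buf = null;
        GC.collect();
        GC.minimize();
        const freed = GC.stats();  // snapshot after explicit free

        // usedSize: bytes in live blocks; freeSize: bytes the GC holds
        // but has not handed out.
        writefln("%s MB: live used=%s free=%s | freed used=%s free=%s",
            size >> 20, live.usedSize, live.freeSize,
            freed.usedSize, freed.freeSize);
        size += 8 * 1024 * 1024;
    }
}
```

If pools were being kept but not reused, `freeSize` after the free would be expected to climb from one iteration to the next.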
Nov 12 2014
next sibling parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 10:51:31 -0500
Steven Schveighoffer via Digitalmars-d <digitalmars-d puremagic.com>
wrote:

 I think I might know what's going on.

 You are continually adding 100MB to the allocation size. Memory is
 contiguous from the OS, but can get fragmented inside the GC.

 So let's say, you allocate 300MB. Fine. It needs more space from the OS,
 allocates it, and assigns a pool to that 300MB. Now, you add another
 100MB. At this point, it can't fit into the original pool, so it
 allocates another 400MB. BUT, it doesn't merge the 300MB into that (I
 don't think), so when it adds another 100MB, it has a 300MB space, and a
 400MB space, neither of which can hold 500MB. And it goes on and on.
 Keep in mind also that it is a frequent error that people make to set a
 pointer to null and expect the data will be collected. For example, buf
 could still be in a register.

 I would be interested in how much memory the GC has vs. how much is
 actually used.
i posted the second sample where i'm doing `GC.free()` to reclaim memory. as i said, RES is jumping between "almost nothing" and "several GB", as the sample allocates and frees. but VIRT is growing constantly.

i believe that GC just can't merge segments, so it keeps asking for more and more address space for new segments, leaving old ones unused and unmerged. this way GC has a lot of free memory, but when it can't allocate another segment, it throws "out of memory error".

if i use libc malloc() for allocating, everything works as i expected: address space consumption is on par with allocation size.
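The "segments are never merged or reused" hypothesis can be checked against the first sample's output with simple arithmetic. The toy model below is only that, a model: it assumes every request reserves a fresh segment of exactly its own size, nothing is ever given back, and roughly 3 GB (3072 MB, the figure quoted earlier for Linux/x86) of address space is usable. The function name `lastSuccessfulMB` is invented.

```d
import std.stdio : writefln;

/// Toy model: each request of `size` MB reserves a new segment that is
/// never reused or merged. Returns the last request size (in MB) that
/// still fits under `limitMB` of total reserved address space.
uint lastSuccessfulMB(uint startMB, uint stepMB, uint limitMB)
{
    uint reserved = 0;
    uint last = 0;
    for (uint size = startMB; reserved + size <= limitMB; size += stepMB)
    {
        reserved += size; // address space consumed so far
        last = size;
    }
    return last;
}

void main()
{
    // 300 + 400 + 500 + 600 + 700 = 2500 MB fits under 3072 MB;
    // the next request (800 MB) would need 3300 MB in total.
    writefln("%s", lastSuccessfulMB(300, 100, 3072)); // prints 700
}
```

The model stops at 700 MB, which is where the first sample died (734003200 bytes). That agreement is suggestive rather than proof, since the model ignores program code, libraries and the stack, and it does not explain why the `GC.free()` variant got further.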
Nov 12 2014
next sibling parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 12 November 2014 at 16:06:32 UTC, ketmar via 
Digitalmars-d wrote:
 if i use libc malloc() for allocating, everything works as i
 expected: address space consumption is on par with allocation
 size.
The gc uses C's calloc rather than implementing memory handling itself using the OS, so you get fragmentation: https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L2223
Nov 12 2014
next sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 16:13:39 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wednesday, 12 November 2014 at 16:06:32 UTC, ketmar via
 Digitalmars-d wrote:
 if i use libc malloc() for allocating, everything works as i
 expected: address space consumption is on par with allocation
 size.
 The gc uses C's calloc rather than implementing memory handling itself
 using the OS, so you get fragmentation:
 https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L2223
hm. my bad, i was checking malloc() with C program and it happens to use custom allocator. i just carefully re-checked it and it really works the same as D GC. sorry. so this seems to be libc memory manager fault after all. sorry for the noise.
Nov 12 2014
prev sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 16:13:39 +0000
via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wednesday, 12 November 2014 at 16:06:32 UTC, ketmar via
 Digitalmars-d wrote:
 if i use libc malloc() for allocating, everything works as i
 expected: address space consumption is on par with allocation
 size.
 The gc uses C's calloc rather than implementing memory handling itself
 using the OS, so you get fragmentation:
 https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L2223
ah, and sorry once again, i was tired. ;-) the C program stops at 2,936,012,800, which is much more realistic. i checked three times and it's really using libc `malloc()` now. so libc malloc is perfectly able to merge segments, while D GC is not.
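The C test program is not shown in the thread. A guess at an equivalent, written in D but calling libc `malloc`/`free` directly through `core.stdc.stdlib`, might look like this. The function name `largestMalloc` and the 3900 MB cap are choices made for this sketch; the cap keeps the size counter from overflowing a 32-bit `size_t`.

```d
import core.stdc.stdlib : free, malloc;
import std.stdio : writefln;

/// Hypothetical helper: requests ever larger blocks from libc, freeing
/// each one before the next request. Returns the largest size that
/// malloc satisfied, never going beyond `limit`.
size_t largestMalloc(size_t start, size_t step, size_t limit)
{
    size_t last = 0;
    for (size_t size = start; size <= limit; size += step)
    {
        auto p = malloc(size);
        if (p is null)
            break;      // libc could not find a large enough hole
        free(p);        // give the block back before growing
        last = size;
    }
    return last;
}

void main()
{
    enum MB = 1024 * 1024;
    // Same schedule as the GC samples: start at 300 MB, grow by 100 MB.
    writefln("%s", largestMalloc(300 * MB, 100 * MB, 3900u * MB));
}
```

On a 64-bit system this will most likely just run up to the cap, so the comparison with the GC samples is only meaningful on a 32-bit build.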
Nov 12 2014
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/12/14 11:06 AM, ketmar via Digitalmars-d wrote:
 i posted the second sample where i'm doing `GC.free()` to reclaim
 memory. as i said, RES is jumping between "almost nothing" and "several
 GB", as the sample allocates and frees. but VIRT is growing constantly.

 i believe that GC just can't merge segments, so it keeps asking for more
 and more address space for new segments, leaving old ones unused and
 unmerged. this way GC has a lot of free memory, but when it can't
 allocate another segment, it throws "out of memory error".
Yes, this is what I think is happening.
 if i use libc malloc() for allocating, everything works as i
 expected: address space consumption is on par with allocation size.
I don't know the internals of C malloc. But I think it should be possible to make D merge segments when it needs to.

One thing I am curious about -- it needs to allocate space to deal with metadata in the heap. That data should be moveable, but I bet it doesn't get moved. That may be why it can't merge segments.

-Steve
Nov 12 2014
next sibling parent ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 11:20:48 -0500
Steven Schveighoffer via Digitalmars-d <digitalmars-d puremagic.com>
wrote:

 One thing I am curious about -- it needs to allocate space to deal with
 metadata in the heap. That data should be moveable, but I bet it doesn't
 get moved. That may be why it can't merge segments.
looks like a good GC improvement. ;-)
Nov 12 2014
prev sibling parent "Ola Fosheim Grøstad" writes:
On Wednesday, 12 November 2014 at 16:20:48 UTC, Steven 
Schveighoffer wrote:
 I don't know the internals of C malloc. But I think it should 
 be possible to make D merge segments when it needs to.
Yes, but then D must provide a malloc replacement.
Nov 12 2014
prev sibling parent "Sean Kelly" <sean invisibleduck.org> writes:
It's been a while since I Dded this, but I think the GC will 
effectively call minimize after collecting, so any collected 
large allocations should be returned to the OS. Allocations 
larger than 4K get their own dedicated pool, so fragmentation 
shouldn't come into play here.
Nov 12 2014
prev sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
Try following the big allocation with a really small allocation 
to clear out any registers that may be referencing the large 
block.
Nov 12 2014
parent reply ketmar via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Wed, 12 Nov 2014 16:40:10 +0000
Sean Kelly via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Try following the big allocation with a really small allocation
 to clear out any registers that may be referencing the large
 block.
but this clearly not an issue with sample which does `GC.free()`, and it stops at 1.7GB, while C sample does the same and stops at 2.9GB.
Nov 12 2014
next sibling parent "Ola Fosheim Grøstad" writes:
On Wednesday, 12 November 2014 at 16:47:47 UTC, ketmar via 
Digitalmars-d wrote:
 On Wed, 12 Nov 2014 16:40:10 +0000
 Sean Kelly via Digitalmars-d <digitalmars-d puremagic.com> 
 wrote:

 Try following the big allocation with a really small 
 allocation to clear out any registers that may be referencing 
 the large block.
but this clearly not an issue with sample which does `GC.free()`, and it stops at 1.7GB, while C sample does the same and stops at 2.9GB.
You should get debuginfo by compiling the runtime with the PRINTF debugging flag set: https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L20
Nov 12 2014
prev sibling parent "Kagamin" <spam here.lot> writes:
On Wednesday, 12 November 2014 at 16:47:47 UTC, ketmar via 
Digitalmars-d wrote:
 but this clearly not an issue with sample which does 
 `GC.free()`, and
 it stops at 1.7GB, while C sample does the same and stops at 
 2.9GB.
GC probably allocates some small blocks for pools and other data. If it gets in a middle of address space, that can cause fragmentation too. Try to add small allocations to the C sample too.
Nov 13 2014