digitalmars.D - how to use GC as a leak detector? i.e. get some help info from GC?

nobody <no where.com> writes:
Hi,

I'm writing a data processing program in D that deals with large numbers of
small objects. One thing I found is that D's GC is horribly slow in such
situations. I tried my program with the GC enabled and disabled (with some
manual deletes). The GC-disabled version (2 min) is ~100 times faster than
the GC-enabled version (4 hours)!

But of course the GC-disabled version still leaks memory, and it soon exceeds
the machine's memory limit when I try to process more data, while the
GC-enabled version doesn't have this problem.

So my plan is to use the GC-disabled version with manual deletes. But it is
very hard to find all the memory leaks. I'm wondering: is there any way to use
the GC as a leak detector? Can the GC-enabled version give me some
information about which objects get collected, so that I can manually delete
them in my GC-disabled version? Thanks!
May 24 2009
Jason House <jason.james.house gmail.com> writes:
Why not use valgrind? With the GC disabled, it should give accurate results.
May 24 2009
nobody <no where.com> writes:
== Quote from Jason House (jason.james.house gmail.com)'s article
> Why not use valgrind? With the GC disabled, it should give accurate results.

Strangely enough, I have indeed tried valgrind with the GC-disabled version. It didn't report anything useful. That's why I'm puzzled: does D's GC do something special? The GC-disabled version ran out of 3G of memory, but the GC-enabled version stays at ~800M throughout the run.
May 24 2009
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
nobody wrote:
> == Quote from Jason House (jason.james.house gmail.com)'s article
>> Why not use valgrind? With the GC disabled, it should give accurate results.
>
> Strangely enough, I have indeed tried valgrind with the GC-disabled version. It didn't report anything useful. That's why I'm puzzled: does D's GC do something special?

The GC allocates memory directly from the OS; it doesn't use malloc/free and friends. It does this even when the GC is "disabled", which just means that collections won't happen (disabling the GC doesn't change the method of allocation). Valgrind probably doesn't detect those OS calls, and almost certainly doesn't know about the GC calls.

If you're using Tango, you can link to the 'stub' GC instead of the normal ('basic') one. The stub GC doesn't actually collect; it passes calls on to malloc/calloc/realloc/free instead. That should make Valgrind work. Something similar probably applies if you're using D2 with druntime.
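
For illustration only, and separate from the stub GC itself: memory obtained through the C heap is what Valgrind tracks, so allocations like the ones below are visible to it. This is a minimal sketch; Record, makeRecord and destroyRecord are hypothetical names, not anything from the poster's program.

```d
import core.stdc.stdlib : malloc, free;

/// Hypothetical plain-data record, standing in for a small object.
struct Record
{
    int id;
    double value;
}

/// Allocates one Record on the C heap (not the GC heap).
/// Returns null if malloc fails.
Record* makeRecord(int id, double value)
{
    auto r = cast(Record*) malloc(Record.sizeof);
    if (r is null)
        return null;
    *r = Record(id, value); // initialize the raw memory
    return r;
}

/// Releases a Record obtained from makeRecord.
void destroyRecord(Record* r)
{
    free(r);
}
```

Every makeRecord that is not matched by a destroyRecord is then an ordinary malloc leak, which is the kind of thing a leak checker is built to report.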
May 24 2009
Leandro Lucarella <llucax gmail.com> writes:
nobody, on May 24 at 20:03, you wrote:
> == Quote from Jason House (jason.james.house gmail.com)'s article
>> Why not use valgrind? With the GC disabled, it should give accurate results.
>
> Strangely enough, I have indeed tried valgrind with the GC-disabled version. It didn't report anything useful. That's why I'm puzzled: does D's GC do something special? The GC-disabled version ran out of 3G of memory, but the GC-enabled version stays at ~800M throughout the run.

I guess that with that much memory in use, your program could benefit greatly from using NO_SCAN if your 800M of data is plain old data. Did you try it? And if you never have interior pointers to that data, your program could possibly avoid a lot of false positives caused by the GC's conservatism if you use NO_INTERIOR (this is only available if you patch the GC with David Simcha's patch [1]).

[1] http://d.puremagic.com/issues/show_bug.cgi?id=2927

--
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
This is what you get, when you mess with us.
May 24 2009
nobody <no where.com> writes:
== Quote from Leandro Lucarella (llucax gmail.com)'s article
> benefit from using NO_SCAN if your 800M of data are plain old data. Did
> you tried it? And if you never have interior pointers to that data, your
> program can possibly avoid a lot of false positives due to the
> conservativism if you use NO_INTERIOR (this is only available if you patch

No, my data are classes (not structs), and they need to be classes because of other design considerations. Worse, they contain pointers to other data, e.g.:

    class SmallDataA { // needs to be a class
    }
    class SmallDataB { // needs to be a class
      SmallDataA a;    // in D, 'a' is a reference, or 'pointer'
    }

I have thought about using POD. I think the above code in C++ would be closer to what I want, i.e. the 'a' object (not a reference to it) is embedded directly in SmallDataB. I guess that when I have millions of such SmallDataB objects, it will keep the GC busy in D, since 'a' is a reference.

So, a question: can we have such expanded class objects in D?
May 25 2009
Leandro Lucarella <llucax gmail.com> writes:
nobody, on May 25 at 07:37, you wrote:
> So, a question: can we have such expanded class objects in D?

There are requests for them, but not for now...
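
Until then, one possible workaround, assuming the "must be a class" constraint can be relaxed for the inner type only: D structs are value types, so a struct field is stored inline in the object that declares it. A minimal sketch (the field names x and y are made up for illustration):

```d
/// Value type: stored inline wherever it is declared as a field.
struct SmallDataA
{
    int x;
    int y;
}

/// Still a class, but 'a' now lives inside each SmallDataB instance
/// instead of being a reference to a separately allocated object.
final class SmallDataB
{
    SmallDataA a;
}
```

With this layout each SmallDataB costs one allocation instead of two, and the object no longer holds a reference the collector has to follow. It does not help if SmallDataA really has to be a class.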
May 25 2009
"Unknown W. Brackets" <unknown simplemachines.org> writes:
Theoretically, you could recompile the GC to write to a log file any 
time it frees anything.

For data processing, though, you really want to try to have a fixed 
memory buffer.  You've got to be hurting from the allocations and frees, 
which if at all possible you should get rid of.

Also, if you're allocating buffers of memory (e.g. for the data), you 
can tell the GC not to scan them.  This will probably solve the problem 
of the GC being so slow.
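
A buffer that the GC is told not to scan might be allocated like this in D2, using GC.malloc with GC.BlkAttr.NO_SCAN from core.memory. This is a minimal sketch: allocUnscanned is a hypothetical helper name, and it assumes the buffer holds plain data only. If it ever held pointers or class references, objects reachable only through it could be collected.

```d
import core.memory : GC;

/// Allocates a GC-managed buffer of n doubles that the collector will not scan.
/// Only safe because the buffer never stores pointers or class references.
double[] allocUnscanned(size_t n)
{
    auto p = cast(double*) GC.malloc(n * double.sizeof, GC.BlkAttr.NO_SCAN);
    auto buf = p[0 .. n];
    buf[] = 0.0; // GC.malloc does not initialize the memory
    return buf;
}
```

The buffer is still owned by the GC and is freed once nothing refers to it; the attribute only stops the collector from looking inside it for pointers.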

-[Unknown]


May 24 2009
nobody <no where.com> writes:
== Quote from Unknown W. Brackets (unknown simplemachines.org)'s article
> Theoretically, you could recompile the GC to write to a log file any
> time it frees anything.

Is it possible to recompile Phobos so that the GC writes to a log whenever it frees something? I guess I would also need the type info of the object being freed.
May 24 2009
"Nick Sabalausky" <a a.a> writes:
Depending on how exactly your program works, another common thing that might help is to manage free pools manually. I.e., allocate a bunch of objects up front, and instead of letting one get GCed when you're done with it, hold on to it, note that it is available for reuse, and then reuse it instead of allocating a new one. Or allocate one big chunk of memory and stick your small objects in there. This sort of thing is typically done for particle systems.
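
A rough sketch of the free-pool idea, under the assumption that the objects can be reset and reused. Particle and Pool are hypothetical names chosen for illustration, not part of any library:

```d
/// Hypothetical small object; stands in for the program's real classes.
final class Particle
{
    double x = 0, y = 0;

    /// Puts the object back into its freshly constructed state.
    void reset() { x = 0; y = 0; }
}

/// Minimal free-list pool: objects are recycled rather than left for the GC.
struct Pool
{
    private Particle[] idle; // stack of instances available for reuse
    size_t allocations;      // how many times `new` was actually called

    /// Hands out a recycled instance if one is available, else allocates.
    Particle acquire()
    {
        if (idle.length > 0)
        {
            auto p = idle[$ - 1];
            idle = idle[0 .. $ - 1];
            idle.assumeSafeAppend(); // let later appends reuse the same storage
            return p;
        }
        ++allocations;
        return new Particle;
    }

    /// Returns an instance to the pool; the caller must not use it afterwards.
    void release(Particle p)
    {
        p.reset();
        idle ~= p;
    }
}
```

Once the pool has warmed up, the steady state performs no new allocations, so the collector has little reason to run at all.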
May 24 2009
dsimcha <dsimcha yahoo.com> writes:
I've dealt with a bunch of somewhat similar situations in code I've written. Here are some tips that others have not already mentioned, and that might be less drastic than going with fully manual memory management:

One thing you could try is disabling the GC (this really just disables automatic running of the collector) and running it manually at points that you know make sense. For example, you could just insert a GC.collect() statement at the end of every run of your main loop.

Another thing to try is avoiding appending to arrays. If you know the length in advance, you can get pretty good speedups by pre-allocating the array instead of appending using the ~= operator.

You can safely delete specific objects manually even when the GC is enabled. For very large objects with trivial lifetimes, this is probably worth doing. First of all, the GC will run less frequently. Secondly, D's GC is partially conservative, meaning that occasionally memory will not be freed when it should be. The probability of this happening is proportional to the size of the memory block.

Lastly, I've been working on a generic second stack/mark-release allocator for D2, called TempAlloc. It's useful when you need to temporarily allocate memory in last-in, first-out order but can't use the call stack for whatever reason. I've also implemented a few basic data structures (hash tables and hash sets) that are specifically designed for this allocator. Right now it's coevolving with my dstats statistics lib, but if you want to try it, or at least look at it and give me some feedback, I'd like to eventually get it to the point where it can be added to Phobos and/or Tango. See http://svn.dsource.org/projects/dstats/docs/alloc.html .
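
The first two tips (manual collection points and pre-allocation) can be sketched roughly as follows with D2's core.memory. The function processAll and its parameters are hypothetical, and the loop body is a placeholder for real work:

```d
import core.memory : GC;

/// Processes `batches` batches with automatic collection suppressed,
/// forcing one collection at the end of each loop iteration.
size_t processAll(size_t batches, size_t itemsPerBatch)
{
    GC.disable();             // suppress automatic collections; allocation still goes through the GC
    scope (exit) GC.enable(); // restore normal behaviour on the way out

    size_t total;
    foreach (b; 0 .. batches)
    {
        // The length is known up front, so allocate once instead of growing with ~=.
        auto items = new int[itemsPerBatch];
        foreach (i, ref item; items)
            item = cast(int) i;

        total += items.length;
        GC.collect();         // collect at a point where the batch's data is no longer needed
    }
    return total;
}
```

Whether once per iteration is the right frequency depends on how much garbage each iteration produces; collecting every N iterations is an obvious variation.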
May 24 2009
nobody <no where.com> writes:
> One thing you could try is disabling the GC (this really just disables automatic
> running of the collector) and run it manually at points that you know make sense.
> For example, you could just insert a GC.collect() statement at the end of every
> run of your main loop.
> Another thing to try is avoiding appending to arrays. If you know the length in
> advance, you can get pretty good speedups by pre-allocating the array instead of
> appending using the ~= operator.
> You can safely delete specific objects manually even when the GC is enabled. For
> very large objects with trivial lifetimes, this is probably worth doing. First of
> all, the GC will run less frequently. Secondly, D's GC is partially conservative,
> meaning that occasionally memory will not be freed when it should be. The
> probability of this happening is proportional to the size of the memory block.

I have tried all of these. With the GC enabled only periodically in the main loop, memory still grows faster than I expected when I feed more data into the program. Then I manually deleted some specific objects, but the program started to fail randomly.

Has anyone experienced similar issues, i.e. with the GC on, you define your own dtor for a certain class and call delete manually on certain objects? The program fails at random stages, with stack traces showing GC calls like:

    0x0821977a in _D2gc3gcx3Gcx16fullcollectshellMFZk ()

I suspect the GC is buggy when mixed with manual deletes.
May 24 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from nobody (no where.com)'s article
> however the memory still grows faster than I expected when I feed more data into
> the program. Then I manually delete some specific objects. However the program
> start to fail randomly. Has anyone experienced similar issues: i.e. with GC on,
> you defined you own dtor for certain class, and called delete manually on certain
> objects. The program fails at random stages, with some stack trace showing some
> GC calls
>   0x0821977a in _D2gc3gcx3Gcx16fullcollectshellMFZk ()
> I suspected the GC is buggy when mixed with manual deletes.

I personally have not experienced this. Please be more specific: D1 or D2? If D1, Phobos or Tango? DMD, LDC, or GDC? Compiler version?

Also, please file a bug report, especially if you can create a concise, reproducible test case.
May 24 2009
nobody <no where.com> writes:
>> I suspected the GC is buggy when mixed with manual deletes.
>
> D1 or D2?

D2.

> If D1, Phobos or Tango?
> DMD, LDC, or GDC?

DMD v2.030

> Compiler version?
> Also, please file a bug report, especially if you can create a concise,
> reproducible test case.

It's hard to isolate the code, and since the program is non-trivial I'm not 100% sure; it could be my own bug.
May 24 2009
nobody <no where.com> writes:
== Quote from Brad Roberts (braddr puremagic.com)'s article
> After enabling the gc, did you force a collection? Just enabling it won't cause
> one to occur.

Yes, I called:

    core.memory.GC.enable();
    core.memory.GC.collect();
    core.memory.GC.disable();
May 24 2009
Brad Roberts <braddr puremagic.com> writes:
nobody wrote:
> I have tried all these: with GC enabled only periodically runs in the main loop,
> however the memory still grows faster than I expected when I feed more data into
> the program.

After enabling the GC, did you force a collection? Just enabling it won't cause one to occur.

Later,
Brad
May 24 2009
Leandro Lucarella <llucax gmail.com> writes:
As others asked, are you using D1 with Tango or Phobos? D2? In Tango/D2 you can enable logging in the GC (using the LOGGING version identifier).

Is your program's source available? I'm gathering programs to make a D GC benchmark suite, and your program seems like a good candidate for measuring GC performance.

Thank you.
May 24 2009
nobody <no where.com> writes:
> As other asked, are you using D1 Tango/Phobos? D2? In Tango/D2 you can

DMD v2.030 on Linux.

> enable logging in the GC (using the LOGGING version identifier).

How do I do that in D2?

> Is your program source available? I'm gathering programs to make a D GC

Sorry, no.
May 24 2009
Leandro Lucarella <llucax gmail.com> writes:
nobody, on May 25 at 03:24, you wrote:
>> enable logging in the GC (using the LOGGING version identifier).
>
> How do I do that in D2?

You should recompile Druntime's GC with -version=LOGGING.

>> Is your program source available? I'm gathering programs to make a D GC
>
> Sorry, no.

=(
May 25 2009
nobody <no where.com> writes:
== Quote from Leandro Lucarella (llucax gmail.com)'s article
> You should recompile Druntime's GC with -version=LOGGING.

When linking, does DMD link against libphobos2.a, libdruntime.a, or both?

>>> Is your program source available? I'm gathering programs to make a D GC
>>
>> Sorry, no.

But I did post a simple example earlier. It's silly, but disabling the GC gives a 5x speedup: http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D&artnum=90983
May 25 2009
Leandro Lucarella <llucax gmail.com> writes:
nobody, on May 25 at 18:31, you wrote:
> When linking, does DMD link against libphobos2.a, libdruntime.a, or both?

I think libphobos2.a contains libdruntime.a, but I'm not sure. You'd better check the sources.

> But I did post a simple example earlier. It's silly, but disabling the GC gives a 5x speedup: http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D&artnum=90983

Oh, I saw that, thank you.
May 25 2009