digitalmars.D.learn - Profiling Garbage Collector

Chad J <gamerChad _spamIsBad_gmail.com> writes:
As I contemplated the challenge of determining whether or not a library 
(not an entire program) causes heap activity and whether or not it 
leaves garbage for the gc, I decided that having a profiling garbage 
collector would be really useful for such tasks.

Such a profiling collector would have the following features:
- For each type in the program, it would track how many times that type 
is allocated, how many times it is manually deleted, and how many times 
it is collected by the garbage collector.
- It could return a string containing the above information.  Also, it'd 
be nice if the gc could summarize by saying which types were "leaked" 
most frequently.
- Ideally, it would not only use type information but also get some help 
from the compiler.  That way it would know not only about types, but could 
also track every single allocation on a file-by-file, line-by-line basis.  
Finding leaks would then be as simple as reading "the allocation of Foo 
at line 42 in file bar.d was collected 1762 times".  This may require 
storing some extra origination information in each object/array/etc 
for programs undergoing GC profiling.
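The counters described in the features above could be kept in a simple table keyed by allocation site. What follows is a minimal sketch in Python, used only for brevity; it is not D or Tango code, and `AllocationTally`, `SiteStats` and the `record_*` methods are hypothetical names. It assumes the GC reports each event together with the type name, file and line of the original allocation, and it treats "leaked" as "reclaimed by the GC rather than deleted manually".

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Counters for one allocation site: (type, file, line)."""
    allocated: int = 0
    deleted: int = 0    # freed explicitly by the program
    collected: int = 0  # reclaimed by the GC, i.e. "leaked" garbage


class AllocationTally:
    """Hypothetical per-site tally that a profiling GC might maintain."""

    def __init__(self):
        self.sites = defaultdict(SiteStats)

    def record_allocation(self, type_name, filename, line):
        self.sites[(type_name, filename, line)].allocated += 1

    def record_deletion(self, type_name, filename, line):
        self.sites[(type_name, filename, line)].deleted += 1

    def record_collection(self, type_name, filename, line):
        self.sites[(type_name, filename, line)].collected += 1

    def report(self, top=None):
        """Return one line per site the GC had to clean up, worst first."""
        leaky = [(key, s) for key, s in self.sites.items() if s.collected]
        leaky.sort(key=lambda item: item[1].collected, reverse=True)
        return "\n".join(
            f"the allocation of {t} at line {ln} in file {f} "
            f"was collected {s.collected} times"
            for (t, f, ln), s in leaky[:top]
        )

    def leaked_by_type(self):
        """Sum GC collections per type, most frequently 'leaked' first."""
        totals = Counter()
        for (type_name, _filename, _line), stats in self.sites.items():
            totals[type_name] += stats.collected
        return totals.most_common()


# Example: three Foo allocations at bar.d:42; one deleted, two left to the GC.
tally = AllocationTally()
for _ in range(3):
    tally.record_allocation("Foo", "bar.d", 42)
tally.record_deletion("Foo", "bar.d", 42)
tally.record_collection("Foo", "bar.d", 42)
tally.record_collection("Foo", "bar.d", 42)
print(tally.report())
# the allocation of Foo at line 42 in file bar.d was collected 2 times
```

Keying on (type, file, line) gives the line-by-line report directly, while `leaked_by_type` rolls the same data up into the per-type summary.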

Well, other than spouting the idea and refusing to implement it, I am 
wondering - has anyone made this kind of thing yet?
Nov 28 2007
"Kris" <foo bar.com> writes:
There was a conversation about this just the other week, for Tango. It's on 
the cards, with some really slick features :)
Nov 28 2007
Chad J <gamerChad _spamIsBad_gmail.com> writes:
Kris wrote:
 There was a conversation about this just the other week, for Tango. It's on 
 the cards, with some really slick features :)
 
 
Awesome!
Nov 28 2007
Sean Kelly <sean f4.ca> writes:
Tracking "leaked" objects is quite easy to do in Tango.  Check out GC.collectHandler in tango.core.Memory.  There is currently no way to track objects that were manually deleted, but it wouldn't be difficult to add a similar hook for that.

Sean
Nov 29 2007
Chad J <gamerChad _spamIsBad_gmail.com> writes:
Sean Kelly wrote:
 
 Tracking "leaked" objects is quite easy to do in Tango.  Check out 
 GC.collectHandler in tango.core.Memory.  There is currently no way to 
 track objects that were manually deleted, but it wouldn't be difficult 
 to add a similar hook for that.
 
 
 Sean
Yeah, when I thought of this I checked Tango and saw that.  I like that collect handler feature a lot.  I don't think it covers profiling, though: even if I implement all of the logic and pass it to the Tango GC, it still doesn't handle non-object entities, such as the ubiquitous array.

Well, you guys have probably discussed this already, so feel free to ignore the below.  I feel like rambling.  I'm having fun with this :)

I also realized that even though the compiler might not be able to supply file and line info for an allocation, the library can still do a stack trace and discover the name of the function that caused the allocation.  This does, of course, assume debugging info that allows stack tracing, which already exists.

So perhaps the GC/Tango needs these things to pull it off:
- An allocation handler.
- A collection handler. (partially done)
- A deletion handler.
- A copying handler, if the GC wants to do copying.  This will be necessary to persist origination info.
- Maybe a reallocation handler, or maybe this can be thought of as a kind of allocation.
- All handlers must disclose the address(es) involved.
- All handlers must disclose full runtime type information.
- Each kind of handler (allocation/collection/deletion/copying) must handle ANY kind of heap activity, not just objects.  Specialized handlers may be built on top of that.
- A stack trace function.  Preferably it lets you select which frames you want to see.

All of those are useful in general, but they also happen to be the right combination of features to implement a powerful GC profiler.  The profiler itself could then live in a separate module or some such.  It can keep track of where each individual chunk of memory was allocated by means of an (associative?) array that maps addresses onto the names of the functions that allocated them.  Then, when a deletion/collection occurs, it can just look up the name of the function that caused the corresponding allocation.
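The profiler Chad describes (an associative array from addresses to allocating functions, fed by allocation, deletion, collection and copying handlers) can be sketched roughly as follows. This is a minimal illustration in Python, chosen only for brevity; it is not D or Tango code, and `GCProfiler`, its `on_*` handlers, `allocate` and `caller_name` are all hypothetical names. It assumes the GC invokes one handler per event with the block's address, and it uses Python's `traceback.extract_stack` as a stand-in for the stack trace function.

```python
import traceback
from collections import Counter

# Origin recorded for blocks allocated before profiling started.
UNKNOWN = ("<unknown type>", "<unknown origin>")


def caller_name(depth=1):
    """Stand-in stack trace function: name of a function on the call stack.

    depth=1 is whoever called caller_name, depth=2 is that function's
    caller, and so on (extract_stack()[-1] is caller_name's own frame).
    """
    return traceback.extract_stack()[-1 - depth].name


class GCProfiler:
    """Maps each live block's address to (type name, allocating function)."""

    def __init__(self):
        self.origin_of = {}         # address -> (type_name, function_name)
        self.deleted = Counter()    # origin -> manual deletions
        self.collected = Counter()  # origin -> GC collections ("leaks")

    def on_allocate(self, address, type_name, function_name):
        self.origin_of[address] = (type_name, function_name)

    def on_delete(self, address):
        self.deleted[self.origin_of.pop(address, UNKNOWN)] += 1

    def on_collect(self, address):
        self.collected[self.origin_of.pop(address, UNKNOWN)] += 1

    def on_copy(self, old_address, new_address):
        # A copying GC moves blocks; carry the origin to the new address.
        self.origin_of[new_address] = self.origin_of.pop(old_address, UNKNOWN)

    def on_reallocate(self, old_address, new_address):
        # Treat a reallocation as a move that keeps the original origin.
        self.on_copy(old_address, new_address)


def allocate(profiler, address, type_name):
    """Hypothetical allocation hook: attribute the block to our caller."""
    profiler.on_allocate(address, type_name, caller_name(depth=2))


def load_level(profiler):
    allocate(profiler, 0x1000, "int[]")
    allocate(profiler, 0x2000, "Foo")


p = GCProfiler()
load_level(p)
p.on_copy(0x1000, 0x3000)  # the GC moved the array
p.on_collect(0x3000)       # ...and later collected it
p.on_delete(0x2000)        # Foo was deleted manually
print(p.collected)  # Counter({('int[]', 'load_level'): 1})
```

Because the map is keyed by address, a non-object block such as an array is handled exactly like an object; the copy handler only matters for a moving collector, where the origin would otherwise be lost when a block changes address.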
I would love to see even just a stack trace function in Tango.  Even better if Exception called it and dumped the trace to the console so I could see where I screwed up.
Nov 29 2007
Robert Fraser <fraserofthenight gmail.com> writes:
Chad J wrote:
 I would love to even see just a stack trace function in Tango.  Even 
 better if Exception calls it and dumps the thing to the console so I can 
 know where I screwed up.
There is already a stack trace "hook" in Tango.  You can use Flectioned with it to get stack traces for exceptions.
Nov 29 2007