
digitalmars.D.learn - Debugging a Memory Leak

"Maxime Chevalier-Boisvert" <maximechevalierb gmail.com> writes:
There seems to be a memory leak in the Higgs compiler. This 
problem shows up when running our test suite (`make test` 
command).

A new VM object is created for each unittest block, e.g.:
https://github.com/maximecb/Higgs/blob/master/source/runtime/tests.d#L201

These VM objects are unfortunately *never freed*. Not until the 
whole series of tests is run and the process terminates. The VM 
objects keep references to many other objects, and so the process 
keeps using more and more memory, up to over 2GB.

The VM allocates its own JS data heap that it manages itself,
i.e.:
https://github.com/maximecb/Higgs/blob/master/source/runtime/gc.d#L186

This memory is clearly marked as NO_SCAN, and so references to 
the VM in there should presumably not be counted. There is also 
executable memory I allocate with mmap, but this should also be 
ignored by the D GC in principle (I do not mark executable code 
as roots):
https://github.com/maximecb/Higgs/blob/master/source/jit/codeblock.d#L129
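
The reasoning above (NO_SCAN blocks and memory the GC does not own are not searched for pointers) can be checked per address from D. A minimal sketch, assuming druntime's core.memory API (GC.malloc, GC.addrOf, GC.getAttr); gcView and GcView are hypothetical names, and C malloc stands in for the mmap'd code block, since both yield memory the GC does not own.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

/// Hypothetical classification of how the D GC treats an address.
enum GcView { notOwned, noScan, scanned }

GcView gcView(void* p)
{
    // addrOf returns the base of the GC block containing p (interior
    // pointers included), or null when the GC does not own p at all.
    void* base = GC.addrOf(p);
    if (base is null)
        return GcView.notOwned; // ignored unless registered with GC.addRange
    if (GC.getAttr(base) & GC.BlkAttr.NO_SCAN)
        return GcView.noScan;   // the block's contents are never searched for pointers
    return GcView.scanned;      // any word inside can keep another block alive
}
```

NO_SCAN only stops the GC from looking *inside* a block. A pointer to the VM held in a scanned block, a stack slot, a register or static data still keeps the VM, and everything it references, alive.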

I don't know where the problem lies. There could be false
pointers, but I'm on a 64-bit system, which should presumably
make this less likely. I wish there were a way to ask the D
runtime "can you tell me what is pointing to this object?", but
the situation is more complex because many objects in my system
refer to the VM object; there is a complicated graph of
references. If anything points into that graph, the whole thing
stays "live".

Help or advice on solving this problem is welcome.
Nov 17 2014
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/17/14 6:12 PM, Maxime Chevalier-Boisvert wrote:
 I don't know where the problem lies. There could be false pointers, but
 I'm on a 64-bit system, which should presumably make this less likely. I
 wish there were a way to ask the D runtime "can you tell me what is
 pointing to this object?", but the situation is more complex because
 many objects in my system refer to the VM object; there is a complicated
 graph of references. If anything points into that graph, the whole thing
 stays "live".
Hm... such a function could be created. However, it would be tricky to make work. First, you would need a way to store the pointer without having it actually point at the data. Clearly, if you pass the pointer to the function, it's going to be on the stack, so that would then refer to it. You have to somehow obfuscate it the whole time. Second, you may be given "memory x is pointing at your target", but what does memory x actually mean? That isn't something the GC can deal with. Perhaps when precise scanning is included (and I think we are close on that), you will have at least some type info.
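
Steven's first obstacle, keeping the target's address without that copy counting as a reference, could look like this. A minimal sketch, assuming a conservative scanner that only treats a word as a pointer when its value falls inside a GC pool; HiddenPtr and its XOR mask are hypothetical.

```d
/// Hypothetical holder that keeps an address XOR-masked, so the stored
/// word is not a value the conservative scan would mistake for a pointer.
struct HiddenPtr
{
    // Assumed mask: flipping every bit moves a user-space address far
    // outside any GC pool on common 64-bit systems.
    private enum size_t mask = ~cast(size_t)0;
    private size_t bits;

    this(const(void)* p) { bits = cast(size_t)p ^ mask; }

    /// Recover the address for comparisons; keeping the result in a
    /// variable turns it back into a real reference.
    const(void)* get() const { return cast(const(void)*)(bits ^ mask); }
}
```

As Steven points out, the argument handed to the constructor may still linger in a stack slot or register afterwards, so this hides only the stored copy, not every transient one.
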
 Help or advice on solving this problem is welcome.
GC problems are *nasty*. My advice is to run the simplest program you can think of that still exhibits the problem, and then put in printf debugging everywhere to see where it breaks down.

Not sure if this is useful.

-Steve
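
For a leak rather than a crash, the "printf everywhere" idea becomes measuring what survives a full collection at known points. A minimal sketch, assuming a druntime recent enough to provide GC.stats (it may be newer than the druntime used in this 2014 thread); reportLive is a hypothetical helper.

```d
import core.memory : GC;
import core.stdc.stdio : printf;

/// Hypothetical instrumentation point: force a full collection, then print
/// how many bytes the GC still considers in use. Called after each unittest
/// block, a figure that keeps climbing points at the test that leaks.
size_t reportLive(const(char)* label)
{
    GC.collect();
    immutable size_t used = GC.stats().usedSize; // GC.stats needs a recent druntime
    printf("%s: %zu bytes in use after collect\n", label, used);
    return used;
}
```
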
Nov 17 2014
"Maxime Chevalier-Boisvert" <maximechevalierb gmail.com> writes:
 GC problems are *nasty*. My advice is to run the simplest 
 program you can think of that still exhibits the problem, and 
 then put in printf debugging everywhere to see where it breaks 
 down.

 Not sure if this is useful.
Unfortunately, the program doesn't break or crash. It just keeps allocating memory that doesn't get freed. There must be some false reference somewhere. I'm not sure how I can printf debug my way out of that.
Nov 17 2014
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/17/14 11:41 PM, Maxime Chevalier-Boisvert wrote:
 Unfortunately, the program doesn't break or crash. It just keeps allocating memory that doesn't get freed. There must be some false reference somewhere. I'm not sure how I can printf debug my way out of that.
By "break down", I mean it does what you don't want :)

You will need to instrument the GC and/or druntime. Note, if there is a false pointer, it's likely stack-based, and there are likely not very many of them. But you have NO_INTERIOR set. This means the false pointer MUST point at the beginning of the block in order to keep it alive.

As I said, these are tricky issues. The source would not be easy to determine.

One thing you can try -- allocate the block as a class, with a finalizer. This gives you the ability to sense when/if a block is finalized. That can help you determine the point at which your program starts to misbehave.

-Steve
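
Steven's finalizer suggestion can be sketched as a small class whose destructor records that the GC finalized it; a VM would own one, so a finalization that never happens points at a VM that stayed reachable. A minimal sketch: Sentinel, current, runRounds and collectAndCount are hypothetical names, and runRounds mimics "one new VM per unittest block" by replacing the previous instance each round. Destructors run during a collection must not allocate GC memory, so the counter is a plain atomic increment.

```d
import core.atomic : atomicLoad, atomicOp;
import core.memory : GC;

/// Hypothetical sentinel: its destructor runs only when the GC has found
/// the object unreachable and finalizes it.
final class Sentinel
{
    // shared: finalizers run on whichever thread performs the collection.
    static shared size_t finalized;

    ~this()
    {
        // No GC allocation allowed here; an atomic increment is fine.
        atomicOp!"+="(finalized, 1);
    }
}

// Stands in for "the VM of the current test": each round replaces it.
__gshared Sentinel current;

void runRounds(size_t n)
{
    foreach (_; 0 .. n)
        current = new Sentinel; // the previous instance becomes unreachable
}

/// Number of sentinels finalized by one full collection.
size_t collectAndCount()
{
    immutable before = atomicLoad(Sentinel.finalized);
    GC.collect();
    return atomicLoad(Sentinel.finalized) - before;
}
```

Because the scan is conservative, a few instances may survive a given collection through stale stack values; the telling sign is a count that stops rising while tests keep creating VMs.
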
Nov 18 2014
"Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Monday, 17 November 2014 at 23:12:10 UTC, Maxime 
Chevalier-Boisvert wrote:
 Help or advice on solving this problem is welcome.
The D GC has some debugging code which might be a little helpful (check the commented-out debug = X lines in druntime/src/gc/gc.d). Specifically, debug=LOGGING activates some sort of leak detector, though I'm not sure how effective it is, as I've never used it.

I've begun work on reviving Diamond to work for D2, multiple threads and x64. Once complete, it should be able to answer such questions definitively, but it'll probably take a few days at least. Watch this space:

https://github.com/CyberShadow/druntime/commits/diamond
https://github.com/CyberShadow/Diamond
Nov 17 2014
Etienne <etcimon gmail.com> writes:
On 2014-11-17 6:12 PM, Maxime Chevalier-Boisvert wrote:
 Help or advice on solving this problem is welcome.
I've tried dumping logs from the garbage collection process, and it's the biggest waste of time. Even if you left a reference somewhere, the logs will not help identify the code that caused it.

Instead, you should do a test with the following. Store the pointers that should have been collected in a string[size_t], keyed by address, with the variable name as the value. Once you assume they should have been collected, call thread_scanAll: it will pass you the valid memory ranges in your program (the thread stacks and registers). Check every value contained in each memory range against the stored addresses. Accumulate everything that matches into another hashmap, and then fail with the error "Variables [list of identifiers] still have references in the code!"
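
Etienne's procedure can be sketched as follows. This is a minimal illustration, not his code: it assumes current druntime, where core.thread exports thread_suspendAll, thread_scanAll and thread_resumeAll, and thread_scanAll is meant to run with all threads suspended. expectDead, stillReferenced and the watch-list arrays are hypothetical names. It deviates from the description in one respect: instead of a string[size_t], it keeps parallel arrays of XOR-masked addresses and names, because a raw address stored in scanned GC memory (a string[size_t] entry holds a string, so it is scanned) would itself keep the watched object alive.

```d
import core.thread : thread_resumeAll, thread_scanAll, thread_suspendAll;

// Hypothetical watch list; addresses are XOR-masked (an assumption, not part
// of the description above) so the list itself is not a reference.
private enum size_t mask = ~cast(size_t)0;
private size_t[] watchedMasked;
private string[] watchedNames;

/// Record an object that is expected to be garbage, with a label.
void expectDead(Object o, string name)
{
    watchedMasked ~= cast(size_t)cast(void*)o ^ mask;
    watchedNames ~= name;
}

/// Labels of watched objects whose address still appears in a range
/// reported by thread_scanAll.
string[] stillReferenced()
{
    // Allocate up front: nothing may call into the GC while threads are suspended.
    auto hit = new bool[watchedMasked.length];
    {
        thread_suspendAll();
        scope (exit) thread_resumeAll();
        thread_scanAll((void* lo, void* hi) nothrow {
            enum size_t wordMask = size_t.sizeof - 1;
            // Round up to a word boundary, then compare every word in the range.
            for (auto p = cast(size_t*)((cast(size_t)lo + wordMask) & ~wordMask);
                 p + 1 <= cast(size_t*)hi; ++p)
                foreach (i, m; watchedMasked)
                    if (*p == (m ^ mask))
                        hit[i] = true;
        });
    }
    string[] names;
    foreach (i, h; hit)
        if (h)
            names ~= watchedNames[i];
    return names;
}
```

The check only sees what thread_scanAll reports. A reference held by another heap object or by __gshared/static data will not show up, and a match may still be a stale stack slot, which is exactly the false-pointer case.
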
Nov 18 2014