
digitalmars.D - Debugging heap corruption

"Vladimir Panteleev" <thecybershadow gmail.com> writes:
Hello,

I have a memory corruption problem: my application randomly crashes with
access violations, or its heap data gets corrupted.

The application in question is sizeable (over 8000 lines). The crash is hard to
reproduce (it requires a fair amount of user interaction), but I've noticed that
it always happens after a certain user action. I have thoroughly examined the
code handling that action and tried modifying it (adding asserts and .dups),
but to no avail. I'm starting to suspect that it's a bug in the compiler or
Phobos.

I noticed that Phobos's GC has some debug code for "underrun/overrrun
protection" [sic] in phobos\internal\gc\gcx.d, however I found that the code is
unfinished. For some reason the corresponding code is put in "version
(SENTINEL)" blocks, instead of "debug (SENTINEL)" ones - which would explain
why it never worked (there is also a typo in code on line 567). Even with those
obvious mistakes fixed, I have no idea how much work would be required to get
the SENTINEL debug option working, since it's interfering with other language
features such as array concatenation, and firing off false alarms.
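
For reference, a minimal standalone sketch of how the two kinds of conditional compilation are switched on in D (the enum names are illustrative, not taken from gcx.d): a version (SENTINEL) block is compiled only when the build passes -version=SENTINEL, while a debug (SENTINEL) block needs -debug=SENTINEL. A bare -debug does not enable debug (SENTINEL), and neither is on by default.

```d
// Illustrative only: shows which compiler switch enables each kind of block.
// SENTINEL mirrors the identifier used in gcx.d; the enums are made up.
import std.stdio : writeln;

version (SENTINEL)
    enum bool sentinelByVersion = true;   // needs: dmd -version=SENTINEL
else
    enum bool sentinelByVersion = false;

debug (SENTINEL)
    enum bool sentinelByDebug = true;     // needs: dmd -debug=SENTINEL
else
    enum bool sentinelByDebug = false;

void main()
{
    writeln("version(SENTINEL) active: ", sentinelByVersion,
            ", debug(SENTINEL) active: ", sentinelByDebug);
}
```

Built with no extra switches, both flags stay false.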

Someone has also suggested GDB/Valgrind, however I haven't attempted this
combination, reasoning that since D's GC handles all the allocation and
manages the memory, Valgrind wouldn't be able to hook the memory allocation
routines and take over memory management from D's GC. If I reasoned wrongly,
please let me know.

Has anyone encountered and fought heap corruption issues with D before? I could
really use some advice, since I've been trying to solve this for weeks and it's
causing me to lose motivation in my project :(
Any help is appreciated!

-- 
Thanks,
  Vladimir                          mailto:thecybershadow gmail.com
Jul 29 2007
Leandro Lucarella <llucax gmail.com> writes:
Vladimir Panteleev, on July 29 at 10:03, you wrote:
> Someone has also suggested GDB/Valgrind, however I haven't attempted this
> combination out of reasoning that since D's GC handles all the allocation and
> manages the memory, Valgrind wouldn't be able to hook the memory allocation
> routines and take over memory management from D's GC. If I reasoned wrongly,
> please let me know.

Valgrind overrides both libc and system calls, and D's GC has to use them, so
my wild guess is *yes*, Valgrind should help.

-- 
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
GPG: 5F5A8D05 // F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05
Hope is a friend who lends us illusion.
Jul 29 2007
Matthias Walter <walter mail.math.uni-magdeburg.de> writes:
Leandro Lucarella wrote:
> Vladimir Panteleev, on July 29 at 10:03, you wrote:
>> Someone has also suggested GDB/Valgrind, however I haven't attempted this
>> combination out of reasoning that since D's GC handles all the allocation and
>> manages the memory, Valgrind wouldn't be able to hook the memory allocation
>> routines and take over memory management from D's GC. If I reasoned wrongly,
>> please let me know.
>
> Valgrind override both libc and system calls, and D's GC have to use them, so
> my wild guess is *yes*, valgrind should help.

Yes, Valgrind works like a charm with D, except that D doesn't seem to free all
GC-managed memory at exit, but this should be no problem. Memory corruption is
still reported completely. The only problem I see here is if you have errors in
destructors: then you'll see many corruptions reported inside GC routines, even
though the cause is in your code. Also, the symbol names are the mangled ones,
so reading them is a bit odd.

Matthias Walter
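
On the mangled names: a hedged sketch, assuming a current D2 toolchain, where the runtime's core.demangle.demangle turns names such as _D8demangle8demangleFAaZAa back into readable signatures. The demangleLine helper and the regex that picks out _D... tokens are assumptions made for this example; piping a Valgrind log through a filter like this is one way to read the report, not a feature of Valgrind itself.

```d
// Hypothetical filter: reads text (e.g. a Valgrind log) from stdin and replaces
// every D-mangled token (_D...) with its demangled form. Assumes D2 druntime.
import core.demangle : demangle;
import std.regex : regex, replaceAll;
import std.stdio : stdin, writeln;

/// Demangles every `_D...` token in one line. Tokens that are not valid
/// mangled names are left as they are, because demangle() returns its
/// input unchanged when it cannot parse it.
string demangleLine(string line)
{
    auto re = regex(`_D[0-9A-Za-z_]+`);
    return replaceAll!(m => demangle(m.hit).idup)(line, re);
}

void main()
{
    foreach (line; stdin.byLine)
        writeln(demangleLine(line.idup));
}
```

Usage (binary names hypothetical): `valgrind ./myapp 2>&1 | ./demanglefilter`. Valgrind writes its report to stderr, hence the redirection.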
Jul 29 2007
Witold Baryluk <baryluk smp.if.uj.edu.pl> writes:
> Also the
> symbol names are the mangled ones, so reading them is a bit odd.

If I remember correctly, someone has a Valgrind patch for this. Thomas Kuehne?
Jul 29 2007
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sun, 29 Jul 2007 17:02:08 +0300, Matthias Walter <walter mail.math.uni-magdeburg.de> wrote:

> Leandro Lucarella wrote:
>> Valgrind override both libc and system calls, and D's GC have to use them,
>> so my wild guess is *yes*, valgrind should help.
>
> Yes, valgrind works like a charm with D - except that D seems to not free all
> GC variables at the end, but this should be no problem. Memory corruption is
> still shown completely. Only problem I see here is if you have some errors in
> destructors, because then you'll see many corruptions evoked by GC-routines
> although the reasons are in your code.
> Also the symbol names are the mangled ones, so reading them is a bit odd.

Thanks for the replies. I'll clarify what I meant earlier about my assumption
that Valgrind doesn't work as well with D as with C/C++ programs. Consider this
D program:

    void main()
    {
        ubyte[] a;
        a.length = 10; // the allocated memory is in the GC-managed heap
        auto p = a.ptr;
        for (int i = 0; i <= 10; i++) // off-by-one: i == 10 writes past the end
            *(p + i) = cast(ubyte) i;
    }

In this case, the off-by-one error will go unnoticed, even when running under
Valgrind. The same doesn't happen if you don't use the GC:

    import std.c.stdlib; // core.stdc.stdlib in current D2

    void main()
    {
        ubyte* p = cast(ubyte*) malloc(10);
        for (int i = 0; i <= 10; i++) // same off-by-one
            *(p + i) = cast(ubyte) i;
        free(p);
    }

malloc() goes directly to libc, which means that it will be recognized and
instrumented by Valgrind; thus, the off-by-one error will be caught.

Of course, this doesn't mean that heap corruptions related to D's GC are
untraceable by Valgrind, as practice shows. It's just that not all bugs,
especially minor (off-by-one) ones, will be detected immediately by Valgrind.
Such a bug can then corrupt other memory allocated via the GC, or even some of
the GC's control structures, causing the program to crash as an indirect effect
of the memory corruption (and making it impossible to find the original cause
of the corruption). "Fixing" that would probably involve either teaching
Valgrind to hook into D's GC, or using the built-in (albeit unfinished)
SENTINEL debugging option mentioned in the original post.

I am currently running my program under Valgrind, waiting for some results,
hoping for the best. Will post updates when I have some progress :)

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com
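
A related point, offered as a hedged sketch rather than a fix for the GC: if the loop indexes the array (a[i]) instead of doing arithmetic on a.ptr, D's own bounds checking stops the same off-by-one at the faulting write, with or without Valgrind. This assumes a current D2 compiler with bounds checks left on (the default; -release drops them outside @safe code and -boundscheck=off drops them everywhere). In D2 the failure is a core.exception.RangeError; 2007-era D1 also bounds-checked indexing but reported it under a different name. fill and offByOneIsCaught are illustrative names.

```d
// Sketch: the GC off-by-one from the first program, written with array
// indexing instead of pointer arithmetic so the bounds check sees it.
// Assumes D2 with bounds checks enabled (the default build).
import core.exception : RangeError;
import std.stdio : writeln;

/// Illustrative helper: writes 0, 1, 2, ... into a[0 .. upTo], inclusive.
/// Passing a.length reproduces the `i <= 10` off-by-one.
void fill(ubyte[] a, size_t upTo)
{
    for (size_t i = 0; i <= upTo; i++)
        a[i] = cast(ubyte) i; // bounds-checked: throws once i == a.length
}

/// Returns true if the out-of-range write was stopped by the bounds check.
bool offByOneIsCaught()
{
    auto a = new ubyte[](10); // GC-allocated, like `a.length = 10`
    try
    {
        fill(a, a.length); // attempts to write a[10]
    }
    catch (RangeError)
    {
        return true; // stopped before the byte past the end was written
    }
    return false;
}

void main()
{
    writeln(offByOneIsCaught() ? "off-by-one caught" : "off-by-one not caught");
}
```

Writing through a.ptr bypasses this check entirely, which is why the pointer version slips past both the compiler's checks and Valgrind.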
Jul 29 2007
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 30 Jul 2007 05:23:18 +0300, Vladimir Panteleev
<thecybershadow gmail.com> wrote:

> I am currently running my program under Valgrind, waiting for some results,
> hoping for the best. Will post updates when will have some progress :)

As I expected: after two runs of a few hours each, the program hung once and
segfaulted the other time. In the second case, the segfault happened when the GC
stumbled on an invalid pointer, which was there, no doubt, due to heap
corruption. I think I'll give modding Phobos's GC another shot (and perhaps take
a look at the warnings Valgrind spits out about the GC referencing uninitialized
memory).

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com
Jul 29 2007
Regan Heath <regan netmail.co.nz> writes:
Vladimir Panteleev wrote:
> On Mon, 30 Jul 2007 05:23:18 +0300, Vladimir Panteleev
> <thecybershadow gmail.com> wrote:
>
>> I am currently running my program under Valgrind, waiting for some
>> results, hoping for the best. Will post updates when will have some
>> progress :)
>
> As I expected - after two few-hour runs, the program hung once and
> segfaulted the other. In the second case, the segfault happened when the GC
> stumbled on an invalid pointer, which was there, no doubt, due to heap
> corruption. I think I'll give modding Phobos's GC another shot (and perhaps
> take a look at the warnings Valgrind spits out about the GC referencing
> uninitialized memory).

One method for finding heap corruption is to write custom memory allocation,
reallocation and free routines.

In the allocator you allocate extra memory before and after the block you
actually return, and initialise these padding blocks to some known pattern.
When it comes time to reallocate or free the memory, you verify that the
padding is intact and has not been modified.

This allows you to figure out which piece of memory has been corrupted and how
(overrun etc.).

I also used this to check that I wasn't leaking any memory, but with a GC
that's no longer important.

I'm not sure whether D allows you to define global custom allocators; anyone?
Or perhaps Tango has that capability?

Regan
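
A minimal sketch of that padding technique in D, assuming plain C malloc/free from core.stdc, so it only covers memory your own code allocates through these helpers; it does not hook D's GC. guardedAlloc, guardsIntact and guardedFree are hypothetical names, and the 16-byte guards and 0xFD fill byte are arbitrary choices. Reallocation is left out; it would check the guards the same way before copying.

```d
// Sketch of guard-byte padding around malloc'd blocks (not D's GC).
// guardedAlloc, guardsIntact and guardedFree are hypothetical names; the
// 16-byte guards and the 0xFD fill pattern are arbitrary choices.
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memset;
import std.stdio : writeln;

enum size_t headerSize = 16; // stores the requested size; keeps 16-byte alignment
enum size_t guardSize = 16;  // padding bytes before and after the user block
enum ubyte guardByte = 0xFD; // known pattern written into the padding

// Block layout: [header: size][front guard][user data (n bytes)][back guard]
void* guardedAlloc(size_t n)
{
    auto base = cast(ubyte*) malloc(headerSize + 2 * guardSize + n);
    if (base is null)
        return null;
    *cast(size_t*) base = n;                                         // remember n
    memset(base + headerSize, guardByte, guardSize);                 // front guard
    memset(base + headerSize + guardSize + n, guardByte, guardSize); // back guard
    return base + headerSize + guardSize;
}

private ubyte* baseOf(void* p)
{
    return cast(ubyte*) p - guardSize - headerSize;
}

/// True if neither guard has been overwritten (no underrun or overrun seen).
bool guardsIntact(void* p)
{
    auto base = baseOf(p);
    immutable n = *cast(size_t*) base;
    auto front = base + headerSize;
    auto back = front + guardSize + n;
    foreach (i; 0 .. guardSize)
        if (front[i] != guardByte || back[i] != guardByte)
            return false;
    return true;
}

/// Verifies the guards, then releases the block; false means corruption.
bool guardedFree(void* p)
{
    if (p is null)
        return true;
    immutable ok = guardsIntact(p);
    free(baseOf(p));
    return ok;
}

void main()
{
    auto p = cast(ubyte*) guardedAlloc(10);
    assert(p !is null);
    foreach (i; 0 .. 11) // same off-by-one as in this thread: writes p[10]
        p[i] = cast(ubyte) i;
    writeln(guardedFree(p) ? "guards intact" : "overrun detected");
}
```

Because it only wraps malloc, this finds overruns in buffers allocated through it, not in GC-allocated arrays; for those, the GC itself would have to pad its blocks.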
Jul 30 2007
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 30 Jul 2007 12:03:15 +0300, Regan Heath <regan netmail.co.nz> wrote:

> Vladimir Panteleev wrote:
> One method for finding heap corruption is to write custom memory
> allocation, reallocation and free routines.
>
> In the allocator you allocate extra memory before and after the block
> you actually return, you initialise these padding blocks to some known
> pattern and when it comes time to reallocate or free the memory you
> verify the padding is intact and has not been modified.
>
> This allows you to figure out which piece of memory has been corrupted
> and how (overrun etc).
>
> I used to use this to check I wasn't leaking any memory also but with a
> GC that's no longer important.
>
> I'm not sure whether D allows you to define global custom allocators,
> anyone?  Or perhaps Tango has that capability?

That's exactly what Phobos's GC SENTINEL option is supposed to do (and is what
I'll be looking at next). I assume that what you said doesn't apply to D?

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com
Jul 30 2007
Regan Heath <regan netmail.co.nz> writes:
Vladimir Panteleev wrote:
> On Mon, 30 Jul 2007 12:03:15 +0300, Regan Heath <regan netmail.co.nz>
> wrote:
>
>> Vladimir Panteleev wrote:
>> One method for finding heap corruption is to write custom memory
>> allocation, reallocation and free routines.
>>
>> In the allocator you allocate extra memory before and after the
>> block you actually return, you initialise these padding blocks to
>> some known pattern and when it comes time to reallocate or free the
>> memory you verify the padding is intact and has not been modified.
>>
>> This allows you to figure out which piece of memory has been
>> corrupted and how (overrun etc).
>>
>> I used to use this to check I wasn't leaking any memory also but
>> with a GC that's no longer important.
>>
>> I'm not sure whether D allows you to define global custom
>> allocators, anyone?  Or perhaps Tango has that capability?
>
> That's exactly what Phobos's GC SENTINEL option is supposed to do (and is
> what I'll be looking at next).

I haven't heard of "Phobos's GC SENTINEL option"; what is it? Where can I read
about it in the D docs?

> I assume that what you said
> doesn't apply to D?

I'm not sure what you mean. Which part of what I said are you assuming doesn't
apply to D?

Regan
Jul 30 2007
"Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Mon, 30 Jul 2007 13:16:32 +0300, Regan Heath <regan netmail.co.nz> wrote:

> I haven't heard of "Phobos's GC SENTINEL option" what is it?  Where can
> I read about it in the D docs?

I described it in the original post. It is unfinished and thus undocumented.
Quoted:

> I noticed that Phobos's GC has some debug code for "underrun/overrrun
> protection" [sic] in phobos\internal\gc\gcx.d, however I found that the code is
> unfinished. For some reason the corresponding code is put in "version
> (SENTINEL)" blocks, instead of "debug (SENTINEL)" ones - which would explain
> why it never worked (there is also a typo in code on line 567). Even with those
> obvious mistakes fixed, I have no idea how much work would be required to get
> the SENTINEL debug option working, since it's interfering with other language
> features such as array concatenation, and firing off false alarms.

>> I assume that what you said doesn't apply to D?
>
> I'm not sure what you mean? Which part of what I said are you assuming
> doesn't apply to D?

If what you said applies to D, then you must have used your own allocation
routines only in your own code (i.e. you didn't hook D's memory allocation),
since you didn't know whether it's possible to substitute the standard "global
allocator". In that case, your code will have the same effect as using libc's
malloc() with Valgrind (except that your code wouldn't be able to detect an
overflow at the moment it happens). See the rest of this thread for details
(particularly my reply to Matthias).

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com
Jul 30 2007
Regan Heath <regan netmail.co.nz> writes:
Vladimir Panteleev wrote:
> On Mon, 30 Jul 2007 13:16:32 +0300, Regan Heath <regan netmail.co.nz>
> wrote:
>
>> I haven't heard of "Phobos's GC SENTINEL option" what is it?  Where
>> can I read about it in the D docs?
>
> I described it in the original post. It is unfinished and thus
> undocumented. Quoted:
>> I noticed that Phobos's GC has some debug code for
>> "underrun/overrrun protection" [sic] in phobos\internal\gc\gcx.d,
>> however I found that the code is unfinished. For some reason the
>> corresponding code is put in "version (SENTINEL)" blocks, instead
>> of "debug (SENTINEL)" ones - which would explain why it never
>> worked (there is also a typo in code on line 567). Even with those
>> obvious mistakes fixed, I have no idea how much work would be
>> required to get the SENTINEL debug option working, since it's
>> interfering with other language features such as array
>> concatenation, and firing off false alarms.

Ahh, sorry, I didn't recall the phrase SENTINEL in your earlier post.

>>> I assume that what you said doesn't apply to D?
>>
>> I'm not sure what you mean? Which part of what I said are you
>> assuming doesn't apply to D?
>
> If what you said applies to D, then you must have used your own allocation
> routines only in your own code (I.E. you don't hook D's memory allocation),
> since you didn't know if it's possible to substitute the standard "global
> allocator". In that case, your code will have the same effect as using
> libc's malloc() and Valgrind (except your code wouldn't be able to detect
> immediately when an overflow has happened). See the rest of this thread for
> details (particularly my reply to Matthias).

My own code which used this technique was written in C.

Regan
Jul 30 2007
Jason House <jason.james.house gmail.com> writes:
I don't know if this helps, but it's certainly related.

Using GDC, I had SEVERE issues with the garbage collector. The more I
avoided the need for garbage collection, the further my program would
run. Disabling the GC or using DMD let it run indefinitely. I've
ported to Tango, but have not tested with GDC yet. I did not try using
the latest repository revision of GDC either.
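
The "disable the GC" experiment can be reproduced in a few lines. This is a hedged sketch assuming current D2 druntime, where core.memory.GC.disable() and GC.enable() suspend and resume automatic collections (the docs note the GC may still collect when it is out of memory, and that every disable() needs a matching enable()). withoutCollections is a made-up helper name. If crashes disappear while collections are off, that points toward the collector, or toward memory it frees while your code still uses it, rather than proving a GC bug.

```d
// Sketch: run a suspect piece of work with automatic collections suspended.
// Assumes D2 druntime's core.memory.GC; older Phobos used a different module.
import core.memory : GC;
import std.stdio : writeln;

/// Hypothetical helper: GC.disable() is counted, so pairing it with
/// GC.enable() in scope(exit) restores the previous state even if work() throws.
void withoutCollections(scope void delegate() work)
{
    GC.disable();
    scope (exit) GC.enable();
    work();
}

void main()
{
    withoutCollections(delegate() {
        // Allocations still succeed; they just aren't reclaimed by an
        // automatic collection while disabled (barring out-of-memory).
        auto buf = new ubyte[](1024 * 1024);
        writeln("allocated ", buf.length, " bytes with collections disabled");
    });
}
```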

Jul 29 2007
Walter Bright <newshound1 digitalmars.com> writes:
Vladimir Panteleev wrote:
> I noticed that Phobos's GC has some debug code for "underrun/overrrun
> protection" [sic] in phobos\internal\gc\gcx.d, however I found that the code is
> unfinished. For some reason the corresponding code is put in "version
> (SENTINEL)" blocks, instead of "debug (SENTINEL)" ones - which would explain
> why it never worked (there is also a typo in code on line 567). Even with those
> obvious mistakes fixed, I have no idea how much work would be required to get
> the SENTINEL debug option working, since it's interfering with other language
> features such as array concatenation, and firing off false alarms.

Please post any bugs you've found or patches you've made to bugzilla.
Jul 29 2007