
digitalmars.D - Need help with debugging Segfault

reply d coder <dlang.coder gmail.com> writes:

Greetings

I have code that crashes with a segfault when built with the current github
dmd snapshot. It compiles and runs fine with the released versions of DMD.
I am using lots of structs and classes in the code, and I believe the
problem could be related to the other known issues with structs. When I run
the compiled binary under valgrind, the report suggests that the segfault
might be related to the garbage collector. I want to isolate the issue and
report it on bugzilla. Can somebody help me with the valgrind trace below
and guide me where to look for the problem? My actual code is thousands of
lines long and I am at a loss as to how I should isolate and report this
issue.

Regards
- Puneet

==4453== Invalid read of size 8
==4453==    at 0x44EFF5: _D4nett5mule5Mule3esl6__dtorMFZv
(../src/nett/mule.d:115)
==4453==    by 0x471124: _D4nett5mule5Mule9EslDomain11__fieldDtorMFZv
(../src/nett/mule.d:7923)
==4453==    by 0x4AD845: rt_finalize2 (in
/home/pgoel/mule/examples/test_code)
==4453==    by 0x4ABC92: _D2gc3gcx3Gcx11fullcollectMFZm (in
/home/pgoel/mule/examples/test_code)
==4453==    by 0x4A9BBA: _D2gc3gcx2GC18fullCollectNoStackMFZv (in
/home/pgoel/mule/examples/test_code)
==4453==    by 0x4A7F2C: gc_term (in /home/pgoel/mule/examples/test_code)
==4453==    by 0x48661B: _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZi6runAllMFZv
(in /home/pgoel/mule/examples/test_code)
==4453==    by 0x4860F5:
_D2rt6dmain211_d_run_mainUiPPaPUAAaZiZi7tryExecMFMDFZvZv (in
/home/pgoel/mule/examples/test_code)
==4453==    by 0x4860B1: _d_run_main (in
/home/pgoel/mule/examples/test_code)
==4453==    by 0x485EF2: main (in /home/pgoel/mule/examples/test_code)
==4453==  Address 0x2c0 is not stack'd, malloc'd or (recently) free'd

Dec 01 2012
next sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 12/01/2012 08:44 PM, d coder wrote:

 ==4453== Invalid read of size 8
 ==4453==    at 0x44EFF5: _D4nett5mule5Mule3esl6__dtorMFZv
 (../src/nett/mule.d:115)

Are you accessing any resource in Mule's destructor that is maintained by the GC? If so, it is possible that the resource has already been finalized. The destruction order of GC-maintained resources is not deterministic as e.g. in C++. It is quite possible that the member of an object is destroyed before the object itself.

Ali
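
The hazard described above can be illustrated with a minimal sketch. It is not the code from the original project: `Handle`, `RiskyOwner`, `SaferOwner`, `liveBuffers` and `demo` are hypothetical names chosen for illustration. The risky class dereferences a GC-managed member in its destructor; the safer one only releases memory obtained from C's `malloc`, which the GC does not manage.

```d
import core.stdc.stdlib : free, malloc;

int liveBuffers; // demo-only counter of outstanding malloc'd buffers

class Handle { int id; } // hypothetical GC-managed member

class RiskyOwner
{
    Handle handle;
    this() { handle = new Handle; }
    ~this()
    {
        // Unsafe when run by the GC: `handle` may already have been
        // finalized or reclaimed, so this read can touch invalid memory.
        auto id = handle.id;
    }
}

class SaferOwner
{
    private void* buffer;
    this() { buffer = malloc(64); ++liveBuffers; }
    ~this()
    {
        // No GC-managed object is dereferenced here, so the order in
        // which the GC finalizes other objects does not matter.
        free(buffer); // free(null) is a no-op
        buffer = null;
        --liveBuffers;
    }
}

// Returns the number of buffers still outstanding after cleanup.
int demo()
{
    auto owner = new SaferOwner;
    destroy(owner); // runs ~this() right now, not at some later collection
    return liveBuffers;
}

void main()
{
    assert(demo() == 0);
}
```

`RiskyOwner` is never instantiated in the sketch; it is only there to show the pattern that can fail when the destructor is run by the collector, for example during the final collection at program exit.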
Dec 01 2012
prev sibling next sibling parent "Maxim Fomin" <maxim maxim-fomin.ru> writes:
On Sunday, 2 December 2012 at 04:45:13 UTC, d coder wrote:
 ==4453== Invalid read of size 8
 ==4453==    at 0x44EFF5: _D4nett5mule5Mule3esl6__dtorMFZv
 (../src/nett/mule.d:115)

In addition to accessing objects already reclaimed by the GC in class destructors, you may encounter segfaults with structs (see http://forum.dlang.org/thread/50B3859D.7060900 webdrake.net and http://forum.dlang.org/thread/mailman.2410.1354281296.5162.digitalmars-d puremagic.com).
Dec 01 2012
prev sibling next sibling parent reply "SomeDude" <lovelydear mailmetrash.com> writes:
On Sunday, 2 December 2012 at 04:45:13 UTC, d coder wrote:
 ==4453== Invalid read of size 8
 ==4453==    at 0x44EFF5: _D4nett5mule5Mule3esl6__dtorMFZv
 (../src/nett/mule.d:115)

You have two pretty powerful tools to help you isolate and debug; they are not advertised enough imho:

1) DustMite, a tool which allows you to automatically reduce test cases. It has been used with success several times here.

2) The deterministic replay engine (or rather, I should say, debugging environment) I mentioned here: http://forum.dlang.org/post/ymfxuozenafnsvuipnjr forum.dlang.org
I haven't tried it yet, to tell the truth, but it DOES look pretty powerful. I'm really surprised no one seems interested, as it should allow one to reproduce a bug exactly, even on another computer, down to threading and GC issues.

Maybe these can help.
Dec 02 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
12/3/2012 12:56 PM, d coder wrote:
     1) DustMite a tool which allows to automatically reduce test cases.
     It has been used with success several times here.


 Awesome!

 The tool took some 2 hours to reduce my testcase to less than 50 lines.

 I have filed a regression.
 http://d.puremagic.com/issues/show_bug.cgi?id=9111

But it's not a bug. Like Ali said:

 The destruction order of GC-maintained resources is not deterministic as
 e.g. in C++. It is quite possible that the member of an object is destroyed
 before the object itself.

--
Dmitry Olshansky
Dec 03 2012
prev sibling next sibling parent d coder <dlang.coder gmail.com> writes:

 1) DustMite a tool which allows to automatically reduce test cases. It has
 been used with success several times here.

Awesome!

The tool took some 2 hours to reduce my testcase to less than 50 lines.

I have filed a regression.
http://d.puremagic.com/issues/show_bug.cgi?id=9111
Dec 03 2012
prev sibling parent d coder <dlang.coder gmail.com> writes:

 But it's not a bug. Like Ali said:

 The destruction order of GC-maintained resources is not deterministic as
 e.g. in C++. It is quite possible that the member of an object is destroyed
 before the object itself.

Oops. I get it now.

What should be done to avoid this situation? I think I need to add a destructor to the parent object's class that would make sure that such child objects (the ones that need the parent to be alive during the GC process) are destroyed before the GC process kicks in. Would that be sufficient, or would the GC again group such objects together and still leave the sequence indeterminate? In that case, I will need to introduce a finalize() function which needs to be called explicitly.

Thanks and Regards
- Puneet
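
The explicit finalize() idea raised in the question above can be sketched roughly as follows. This is only an illustration under assumptions: `Parent`, `Child`, `addChild`, `childClosed` and `demo` are hypothetical names, not code from the actual project. Cleanup is triggered by user code through `scope (exit)`, so the order is fixed by the program rather than by the collector.

```d
// Hypothetical child that needs its parent to be alive while closing.
class Child
{
    Parent parent;
    bool closed;

    this(Parent p) { parent = p; }

    // Explicit cleanup; safe to call more than once.
    void finalize()
    {
        if (closed) return;
        closed = true;
        parent.childClosed(); // the parent is still valid at this point
    }
}

class Parent
{
    Child[] children;
    int closedChildren;
    bool finalized;

    Child addChild()
    {
        auto c = new Child(this);
        children ~= c;
        return c;
    }

    void childClosed() { ++closedChildren; }

    // Called explicitly by user code, never by the GC.
    void finalize()
    {
        if (finalized) return;
        finalized = true;
        foreach (c; children)
            c.finalize(); // children first, while the parent is intact
        children = null;
    }
}

// Returns how many children were closed by the explicit cleanup.
int demo()
{
    auto p = new Parent;
    {
        scope (exit) p.finalize(); // runs when this block is left
        p.addChild();
        p.addChild();
    }
    return p.closedChildren;
}

void main()
{
    assert(demo() == 2);
}
```

Neither class defines a destructor here, so nothing in the sketch depends on the order in which the GC finalizes objects. Whether this fits the real code depends on how its object lifetimes are structured.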
Dec 03 2012