digitalmars.D - Threads and GC

Juan Jose Comellas <jcomellas gmail.com> writes:
I'm having a problem with the garbage collector when working with threads
and DMD 0.149 on Linux. I'm currently writing an application to test some
socket-related functionality and it's crashing whenever the garbage
collector kicks in. 

I have two threads (one acting as server and the other one acting as
client). Both threads are running tight loops processing messages from each
other. In each of the iterations, a small amount of memory is used. At some
point, the garbage collector is activated and the SIGUSR1 signal is sent to
suspend all the other threads, and just after that I see a crash in the
other thread.

From what I've seen of Phobos, when activating the garbage collector, the
threads are suspended using the SIGUSR1 signal and are resumed with the
SIGUSR2 signal. In my test I never see the SIGUSR2 signal being sent.
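
For readers unfamiliar with this scheme, here is a minimal C sketch of a signal-based suspend/resume handshake on POSIX threads. It is not the Phobos implementation: every name in it (`suspend_handler`, `resume_handler`, `run_suspend_resume_demo`, and so on) is hypothetical, and it only assumes what is described above, namely that SIGUSR1 parks a thread and SIGUSR2 releases it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

static sem_t suspend_ack;            /* posted by the worker once it is parked */
static atomic_ulong iterations;      /* bumped by the worker on every loop pass */
static atomic_int keep_running = 1;

/* SIGUSR2 only has to wake sigsuspend(), so the handler does nothing. */
static void resume_handler(int sig) { (void)sig; }

/* SIGUSR1: acknowledge the suspend request, then sleep until SIGUSR2. */
static void suspend_handler(int sig)
{
    int saved_errno = errno;
    sigset_t wait_mask;
    (void)sig;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIGUSR2);  /* only SIGUSR2 may end the wait */
    sem_post(&suspend_ack);          /* sem_post is async-signal-safe */
    sigsuspend(&wait_mask);          /* returns after resume_handler has run */
    errno = saved_errno;
}

/* Stand-in for an application thread running a tight loop. */
static void *worker(void *arg)
{
    (void)arg;
    while (atomic_load(&keep_running))
        atomic_fetch_add(&iterations, 1);
    return NULL;
}

/* Returns 0 if the worker made no progress while suspended. */
int run_suspend_resume_demo(void)
{
    struct sigaction sa;
    struct timespec pause = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    unsigned long before, after;
    pthread_t tid;
    int rc;

    memset(&sa, 0, sizeof sa);
    /* Block everything (including SIGUSR2) while a handler runs, so a
       resume cannot slip in between sem_post() and sigsuspend(). */
    sigfillset(&sa.sa_mask);
    sa.sa_handler = suspend_handler;
    sigaction(SIGUSR1, &sa, NULL);
    sa.sa_handler = resume_handler;
    sigaction(SIGUSR2, &sa, NULL);

    atomic_store(&keep_running, 1);
    sem_init(&suspend_ack, 0, 0);
    rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);
    while (atomic_load(&iterations) == 0)
        sched_yield();                       /* wait until the worker is looping */

    pthread_kill(tid, SIGUSR1);              /* "stop the world" */
    while (sem_wait(&suspend_ack) == -1 && errno == EINTR)
        ;                                    /* wait for the acknowledgement */

    before = atomic_load(&iterations);
    nanosleep(&pause, NULL);                 /* a collector would scan here */
    after = atomic_load(&iterations);

    pthread_kill(tid, SIGUSR2);              /* resume the worker */
    while (atomic_load(&iterations) == after)
        sched_yield();                       /* confirm it is running again */

    atomic_store(&keep_running, 0);
    pthread_join(tid, NULL);
    sem_destroy(&suspend_ack);
    return before == after ? 0 : 1;
}
```

In a handshake of this shape the suspending thread always sends the resume signal once it has finished its work, so a debugger session that shows SIGUSR1 but never SIGUSR2 suggests the collecting thread did not get that far.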

Has anybody else seen something like this before? It seems that Sean and
Kris have found some problem with the GC too in Ares, but I haven't read
their postings yet (dsource.org is down right now).

In case anybody else finds the backtraces useful, I'm including what I could
get using an unpatched gdb:


Program received signal SIGUSR1, User defined signal 1.
[Switching to Thread 1442708400 (LWP 8344)]
0x5557a84e in send () from /lib/tls/libpthread.so.0
(gdb) bt
#0  0x5557a84e in send () from /lib/tls/libpthread.so.0
#1  0x08053641 in _D5mango2io6Socket6Socket4sendFAvE5mango2io6Socket6Socket5FlagsZi () at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:1413
#2  0x08052e70 in _D5mango2io6Socket6Socket6writerFAvZk () at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:869
#3  0x0804ef8e in _D5mango2io7Conduit7Conduit5writeFAvZk () at /home/jcomellas/devel/d/mango_test/mango/io/Conduit.d:198
#4  0x0805c881 in _D8selector16clientThreadFuncFZv () at selector.d:338
#5  0x0805c776 in _D8selector21dummyClientThreadFuncFPvZi () at selector.d:308
#6  0x08063213 in _D3std6thread6Thread3runFZi ()
#7  0x08063557 in _D3std6thread6Thread11threadstartUPvZPv ()
#8  0x55575cfd in start_thread () from /lib/tls/libpthread.so.0
#9  0x5567913e in clone () from /lib/tls/libc.so.6
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1433270496 (LWP 8341)]
0x080673b1 in _D3gcx3Gcx4markFPvPvZv ()
(gdb) bt
#0  0x080673b1 in _D3gcx3Gcx4markFPvPvZv ()
#1  0x080675a8 in _D3gcx3Gcx11fullcollectFPvZk ()
#2  0x0806746a in _D3gcx3Gcx16fullcollectshellFZk ()
#3  0x080665bc in _D3gcx2GC12mallocNoSyncFkZPv ()
#4  0x0806650c in _D3gcx2GC6mallocFkZPv ()
#5  0x08062686 in _d_newclass ()
#6  0x08056004 in _D5mango10containers7HashMap89__T7HashMapTT5mango2io5model8IConduit8IConduit6HandleTC5mango2io5model8IConduit8IConduitZ7HashMap8iteratorFZC5mango10containers8Iterator101__T18MutableMapIteratorTT5mango2io5model8IConduit8IConduit6HandleTC5mango2io5model8IConduit8IConduitZ18MutableMapIterator () at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:303
#7  0x08055439 in _D5mango2io8selector14SelectSelector18SelectSelectionSet7opApplyFDFKC5mango2io8selector5model9ISelector12SelectionKeyZiZi () at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:609
#8  0x0805c5ca in _D8selector12testSelectorFC5mango2io8selector5model9ISelector9ISelectorZv () at selector.d:130
#9  0x0805c349 in _Dmain () at selector.d:47
#10 0x0805e52b in main ()
Mar 17 2006
Sean Kelly <sean f4.ca> writes:
Juan Jose Comellas wrote:
 I'm having a problem with the garbage collector when working with threads
 and DMD 0.149 on Linux. I'm currently writing an application to test some
 socket-related functionality and it's crashing whenever the garbage
 collector kicks in. 
 
 I have two threads (one acting as server and the other one acting as
 client). Both threads are running tight loops processing messages from each
 other. In each of the iterations, a small amount of memory is used. At some
 point, the garbage collector is activated and the SIGUSR1 signal is sent to
 suspend all the other threads, and just after that I see a crash in the
 other thread.
 
 From what I've seen of Phobos, when activating the garbage collector, the
 threads are suspended using the SIGUSR1 signal and are resumed with the
 SIGUSR2 signal. In my test I never see the SIGUSR2 signal being sent.
 
 Has anybody else seen something like this before? It seems that Sean and
 Kris have found some problem with the GC too in Ares, but I haven't read
 their postings yet (dsource.org is down right now).

To sum up, Kris had encountered deadlock problems both with Phobos and with Ares. I've since fixed Ares and have been trying to suss out the Phobos issues. I've been focusing on the Win32 code up to now, and have found a potential resource leak with Phobos threads, but no sign of a potential deadlock yet. But perhaps I should give the Posix code a look as well.

 In case anybody else finds the backtraces useful, I'm including what I could
 get using an unpatched gdb:
 
 
 Program received signal SIGUSR1, User defined signal 1.
 [Switching to Thread 1442708400 (LWP 8344)]
 0x5557a84e in send () from /lib/tls/libpthread.so.0
 (gdb) bt
 #0  0x5557a84e in send () from /lib/tls/libpthread.so.0
 #1  0x08053641 in
 _D5mango2io6Socket6Socket4sendFAvE5mango2io6Socket6Socket5FlagsZi ()
 at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:1413
 #2  0x08052e70 in _D5mango2io6Socket6Socket6writerFAvZk ()
 at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:869
 #3  0x0804ef8e in _D5mango2io7Conduit7Conduit5writeFAvZk ()
 at /home/jcomellas/devel/d/mango_test/mango/io/Conduit.d:198
 #4  0x0805c881 in _D8selector16clientThreadFuncFZv () at selector.d:338
 #5  0x0805c776 in _D8selector21dummyClientThreadFuncFPvZi () at
 selector.d:308
 #6  0x08063213 in _D3std6thread6Thread3runFZi ()
 #7  0x08063557 in _D3std6thread6Thread11threadstartUPvZPv ()
 #8  0x55575cfd in start_thread () from /lib/tls/libpthread.so.0
 #9  0x5567913e in clone () from /lib/tls/libc.so.6
 (gdb) cont
 Continuing.
 
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 1433270496 (LWP 8341)]
 0x080673b1 in _D3gcx3Gcx4markFPvPvZv ()
 (gdb) bt
 #0  0x080673b1 in _D3gcx3Gcx4markFPvPvZv ()
 #1  0x080675a8 in _D3gcx3Gcx11fullcollectFPvZk ()
 #2  0x0806746a in _D3gcx3Gcx16fullcollectshellFZk ()
 #3  0x080665bc in _D3gcx2GC12mallocNoSyncFkZPv ()
 #4  0x0806650c in _D3gcx2GC6mallocFkZPv ()
 #5  0x08062686 in _d_newclass ()
 #6  0x08056004 in
 _D5mango10containers7HashMap89__T7HashMapTT5mango2io5model8IConduit8IConduit6HandleTC5mango2io5model8IConduit8IConduitZ7HashMap8iteratorFZC5mango10containers8Iterator101__T18MutableMapIteratorTT5mango2io5model8IConduit8IConduit6HandleTC5mango2io5model8IConduit8IConduitZ18MutableMapIterator
 ()
    
 at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:303
 #7  0x08055439 in
 _D5mango2io8selector14SelectSelector18SelectSelectionSet7opApplyFDFKC5mango2io8selector5model9ISelector12SelectionKeyZiZi
 ()
    
 at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:609
 #8  0x0805c5ca in
 _D8selector12testSelectorFC5mango2io8selector5model9ISelector9ISelectorZv
 () at selector.d:130
 #9  0x0805c349 in _Dmain () at selector.d:47
 #10 0x0805e52b in main ()

Hrm, so the GC thread blows up while trying to scan into pthread library code? I don't see any reason for this to happen, so long as the stack range being passed to the GC is valid. I know there are some library functions that are not considered cancelable, but I would think that they simply turn off signal handling for the span where that's true.

Sean
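
The idea floated above, that a routine might turn off signal handling for a span, would look roughly like this at the POSIX level. This is only an illustration of per-thread signal masking, not a claim about what libpthread or Phobos actually do; the names `note_usr1` and `run_signal_mask_demo` are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_seen;

/* Records that SIGUSR1 was actually delivered to this thread. */
static void note_usr1(int sig) { (void)sig; usr1_seen = 1; }

/* Returns 0 if SIGUSR1 stayed pending while blocked and was delivered
   as soon as it was unblocked. */
int run_signal_mask_demo(void)
{
    struct sigaction sa;
    sigset_t usr1_only, pending;
    int rc, seen_while_blocked, was_pending, seen_after_unblock;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_usr1;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    sigemptyset(&usr1_only);
    sigaddset(&usr1_only, SIGUSR1);
    usr1_seen = 0;

    /* Start of the protected span: SIGUSR1 is held back for this thread. */
    pthread_sigmask(SIG_BLOCK, &usr1_only, NULL);
    pthread_kill(pthread_self(), SIGUSR1);   /* a suspend request arrives */
    seen_while_blocked = usr1_seen;          /* handler has not run yet */
    sigpending(&pending);
    was_pending = sigismember(&pending, SIGUSR1);

    /* End of the span: the pending signal is delivered before this returns. */
    pthread_sigmask(SIG_UNBLOCK, &usr1_only, NULL);
    seen_after_unblock = usr1_seen;

    return (!seen_while_blocked && was_pending == 1 && seen_after_unblock) ? 0 : 1;
}
```

Under this approach the suspend request is only delayed, not lost: the thread parks as soon as it leaves the protected span.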
Mar 17 2006