digitalmars.D - Threads and GC

Juan Jose Comellas <jcomellas gmail.com> writes:
I'm having a problem with the garbage collector when working with threads
and DMD 0.149 on Linux. I'm currently writing an application to test some
socket-related functionality and it's crashing whenever the garbage
collector kicks in. 

I have two threads (one acting as server and the other one acting as
client). Both threads are running tight loops processing messages from each
other. In each of the iterations, a small amount of memory is used. At some
point, the garbage collector is activated and the SIGUSR1 signal is sent to
suspend all the other threads, and just after that I see a crash in the
other thread.

From what I've seen of Phobos, when activating the garbage collector, the
threads are suspended using the SIGUSR1 signal and are resumed with the
SIGUSR2 signal. In my test I never see the SIGUSR2 signal being sent.
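
For readers unfamiliar with the scheme, the suspend/resume pattern described above can be illustrated with a minimal C sketch for Linux. This is not the Phobos implementation, only a sketch of the general technique under stated assumptions: the names `demo_suspend_resume`, `on_suspend`, `on_resume`, `suspend_ack` and `work_counter` are hypothetical, and the semaphore acknowledgement is just one possible way for the collecting thread to learn that the other thread has parked.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

static sem_t suspend_ack;         /* posted by a thread once it is parked */
static atomic_ulong work_counter; /* advances only while the worker runs  */
static atomic_int stop_worker;

/* SIGUSR2 handler: does nothing, it only has to interrupt sigsuspend(). */
static void on_resume(int sig) { (void)sig; }

/* SIGUSR1 handler: acknowledge, then sleep until SIGUSR2 arrives.
   sa_mask blocks every signal while this handler runs, so a SIGUSR2 sent
   right after sem_post() stays pending until sigsuspend() unblocks it. */
static void on_suspend(int sig)
{
    int saved_errno = errno;
    sigset_t wait_mask;
    (void)sig;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIGUSR2);
    sem_post(&suspend_ack);
    sigsuspend(&wait_mask);
    errno = saved_errno;
}

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_worker))
        atomic_fetch_add(&work_counter, 1);
    return NULL;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Returns 0 if the worker stood still while suspended and ran again
   after the resume signal; nonzero otherwise. */
int demo_suspend_resume(void)
{
    struct sigaction sa;
    pthread_t tid;
    unsigned long before, after;
    int resumed = 0;

    atomic_store(&stop_worker, 0);
    memset(&sa, 0, sizeof sa);
    sigfillset(&sa.sa_mask);
    sa.sa_handler = on_suspend;
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;
    sa.sa_handler = on_resume;
    if (sigaction(SIGUSR2, &sa, NULL) != 0) return -1;
    if (sem_init(&suspend_ack, 0, 0) != 0) return -1;
    if (pthread_create(&tid, NULL, worker, NULL) != 0) return -1;

    /* "Stop the world": signal the thread and wait for its acknowledgement. */
    pthread_kill(tid, SIGUSR1);
    while (sem_wait(&suspend_ack) != 0 && errno == EINTR) { }

    /* A collector would scan stacks here; the sketch only checks that
       the worker makes no progress while parked. */
    before = atomic_load(&work_counter);
    sleep_ms(50);
    after = atomic_load(&work_counter);

    /* "Restart the world". */
    pthread_kill(tid, SIGUSR2);
    for (int i = 0; i < 1000 && !resumed; i++) {
        sleep_ms(1);
        resumed = atomic_load(&work_counter) != after;
    }

    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    sem_destroy(&suspend_ack);
    if (before != after) return 1;
    return resumed ? 0 : 2;
}
```

Calling `demo_suspend_resume()` from `main` exercises one full suspend/resume cycle; on older glibc versions the program has to be built with `-pthread`. If the collecting thread dies between the two `pthread_kill` calls, the resume signal is never sent, which is one way to end up with a SIGUSR1 that has no matching SIGUSR2.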

Has anybody else seen something like this before? It seems that Sean and
Kris have found some problem with the GC too in Ares, but I haven't read
their postings yet (dsource.org is down right now).

In case anybody else finds the backtraces useful, I'm including what I could
get using an unpatched gdb:


Program received signal SIGUSR1, User defined signal 1.
[Switching to Thread 1442708400 (LWP 8344)]
0x5557a84e in send () from /lib/tls/libpthread.so.0
(gdb) bt


_D5mango2io6Socket6Socket4sendFAvE5mango2io6Socket6Socket5FlagsZi ()
at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:1413

at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:869

at /home/jcomellas/devel/d/mango_test/mango/io/Conduit.d:198


selector.d:308




(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1433270496 (LWP 8341)]
0x080673b1 in _D3gcx3Gcx4markFPvPvZv ()
(gdb) bt







_D5mango10containers7HashMap89__T7HashMapTT5mango2io5model8IConduit8IConduit6HandleTC5mango2io5model8IConduit8IConduitZ7HashMap8iteratorFZC5mango10containers8Iterator101__T18MutableMapIteratorTT5mango2io5model8IConduit8IConduit6HandleTC5mango2io5model8IConduit8IConduitZ18MutableMapIterator
()
   
at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:303

_D5mango2io8selector14SelectSelector18SelectSelectionSet7opApplyFDFKC5mango2io8selector5model9ISelector12SelectionKeyZiZi
()
   
at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:609

_D8selector12testSelectorFC5mango2io8selector5model9ISelector9ISelectorZv
() at selector.d:130


Mar 17 2006
Sean Kelly <sean f4.ca> writes:
Juan Jose Comellas wrote:
> I'm having a problem with the garbage collector when working with threads
> and DMD 0.149 on Linux. I'm currently writing an application to test some
> socket-related functionality and it's crashing whenever the garbage
> collector kicks in.
>
> I have two threads (one acting as server and the other one acting as
> client). Both threads are running tight loops processing messages from each
> other. In each of the iterations, a small amount of memory is used. At some
> point, the garbage collector is activated and the SIGUSR1 signal is sent to
> suspend all the other threads, and just after that I see a crash in the
> other thread.
>
> From what I've seen of Phobos, when activating the garbage collector, the
> threads are suspended using the SIGUSR1 signal and are resumed with the
> SIGUSR2 signal. In my test I never see the SIGUSR2 signal being sent.
>
> Has anybody else seen something like this before? It seems that Sean and
> Kris have found some problem with the GC too in Ares, but I haven't read
> their postings yet (dsource.org is down right now).

To sum up, Kris had encountered deadlock problems both with Phobos and with
Ares. I've since fixed Ares and have been trying to suss out the Phobos
issues. I've been focusing on the Win32 code up to now, and have found a
potential resource leak with Phobos threads, but no sign of a potential
deadlock yet. But perhaps I should give the Posix code a look as well.

> In case anybody else finds the backtraces useful, I'm including what I could
> get using an unpatched gdb:
>
>
> Program received signal SIGUSR1, User defined signal 1.
> [Switching to Thread 1442708400 (LWP 8344)]
> 0x5557a84e in send () from /lib/tls/libpthread.so.0
> (gdb) bt
>
>
> _D5mango2io6Socket6Socket4sendFAvE5mango2io6Socket6Socket5FlagsZi ()
> at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:1413
>
> at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:869
>
> at /home/jcomellas/devel/d/mango_test/mango/io/Conduit.d:198
>
>
> selector.d:308
>
>
>
>
> (gdb) cont
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 1433270496 (LWP 8341)]
> 0x080673b1 in _D3gcx3Gcx4markFPvPvZv ()
> (gdb) bt
>
>
>
>
>
>
>
> _D5mango10containers7HashMap89__T7HashMapTT5mango2io5model8IConduit8IConduit6HandleTC5mango2io5model8IConduit8IConduitZ7HashMap8iteratorFZC5mango10containers8Iterator101__T18MutableMapIteratorTT5mango2io5model8IConduit8IConduit6HandleTC5mango2io5model8IConduit8IConduitZ18MutableMapIterator
> ()
>
> at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:303
>
> _D5mango2io8selector14SelectSelector18SelectSelectionSet7opApplyFDFKC5mango2io8selector5model9ISelector12SelectionKeyZiZi
> ()
>
> at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:609
>
> _D8selector12testSelectorFC5mango2io8selector5model9ISelector9ISelectorZv
> () at selector.d:130

Hrm, so the GC thread blows up while trying to scan into pthread library
code? I don't see any reason for this to happen, so long as the stack range
being passed to the GC is valid. I know there are some library functions
that are not considered cancelable, but I would think that they simply turn
off signal handling for the span where that's true.

Sean
Mar 17 2006
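
The condition Sean mentions, that the stack range handed to the GC be valid, can be checked on Linux with glibc's non-portable `pthread_getattr_np`. The following C sketch is only an illustration and makes no claim about how Phobos or Ares obtain their stack bounds; the helper name `address_in_own_stack` is hypothetical.

```c
#define _GNU_SOURCE   /* pthread_getattr_np is a glibc extension */
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr lies inside the calling thread's stack mapping,
   0 if it does not, and -1 if the bounds could not be queried. */
int address_in_own_stack(const void *addr)
{
    pthread_attr_t attr;
    void *stack_low = NULL;   /* lowest address of the stack area */
    size_t stack_size = 0;
    int rc;

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return -1;
    rc = pthread_attr_getstack(&attr, &stack_low, &stack_size);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;

    uintptr_t lo = (uintptr_t)stack_low;
    uintptr_t hi = lo + stack_size;
    uintptr_t p  = (uintptr_t)addr;
    return p >= lo && p < hi;
}
```

A conservative collector that scans memory outside the range reported here can touch unmapped pages, so a check of this kind, applied to the bounds recorded for each suspended thread, is one way to test whether a bad stack range is behind a fault in the mark phase.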