digitalmars.D - Threads and GC

Juan Jose Comellas <jcomellas gmail.com> writes:

I'm having a problem with the garbage collector when working with threads
and DMD 0.149 on Linux. I'm currently writing an application to test some
socket-related functionality and it's crashing whenever the garbage
collector kicks in.

I have two threads (one acting as server and the other one acting as
client). Both threads are running tight loops processing messages from each
other. In each of the iterations, a small amount of memory is used. At some
point, the garbage collector is activated and the SIGUSR1 signal is sent to
suspend all the other threads, and just after that I see a crash in the
other thread.

From what I've seen of Phobos, when activating the garbage collector, the
threads are suspended using the SIGUSR1 signal and are resumed with the
SIGUSR2 signal. In my test I never see the SIGUSR2 signal being sent.

Has anybody else seen something like this before? It seems that Sean and
Kris have found some problem with the GC too in Ares, but I haven't read
their postings yet (dsource.org is down right now).

In case anybody else finds the backtraces useful, I'm including what I could
get using an unpatched gdb:

Program received signal SIGUSR1, User defined signal 1.
[Switching to Thread 1442708400 (LWP 8344)]
0x5557a84e in send () from /lib/tls/libpthread.so.0
(gdb) bt

_D5mango2io6Socket6Socket4sendFAvE5mango2io6Socket6Socket5FlagsZi ()
at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:1413

at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:869

at /home/jcomellas/devel/d/mango_test/mango/io/Conduit.d:198

selector.d:308

(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1433270496 (LWP 8341)]
0x080673b1 in _D3gcx3Gcx4markFPvPvZv ()
(gdb) bt

_D5mango10containers7HashMap89__T7HashMapTT5mango2io5model8IConduit8IConduit6HandleTC5mango2io5model8IConduit8IConduitZ7HashMap8iteratorFZC5mango10containers8Iterator101__T18MutableMapIteratorTT5mango2io5model8IConduit8IConduit6HandleTC5mango2io5model8IConduit8IConduitZ18MutableMapIterator
()

at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:303

_D5mango2io8selector14SelectSelector18SelectSelectionSet7opApplyFDFKC5mango2io8selector5model9ISelector12SelectionKeyZiZi
()

at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:609

_D8selector12testSelectorFC5mango2io8selector5model9ISelector9ISelectorZv
() at selector.d:130

Mar 17 2006

Sean Kelly <sean f4.ca> writes:

Juan Jose Comellas wrote:
> I'm having a problem with the garbage collector when working with threads
> and DMD 0.149 on Linux. I'm currently writing an application to test some
> socket-related functionality and it's crashing whenever the garbage
> collector kicks in.
>
> Has anybody else seen something like this before? It seems that Sean and
> Kris have found some problem with the GC too in Ares, but I haven't read
> their postings yet (dsource.org is down right now).

To sum up, Kris had encountered deadlock problems both with Phobos and
with Ares. I've since fixed Ares and have been trying to suss out the
Phobos issues. I've been focusing on the Win32 code up to now, and have
found a potential resource leak with Phobos threads, but no sign of a
potential deadlock yet. But perhaps I should give the Posix code a look
as well.

> In case anybody else finds the backtraces useful, I'm including what I could
> get using an unpatched gdb:
>
> Program received signal SIGUSR1, User defined signal 1.
> [Switching to Thread 1442708400 (LWP 8344)]
> 0x5557a84e in send () from /lib/tls/libpthread.so.0
> (gdb) bt
>
> _D5mango2io6Socket6Socket4sendFAvE5mango2io6Socket6Socket5FlagsZi ()
> at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:1413
>
> at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:869
>
> at /home/jcomellas/devel/d/mango_test/mango/io/Conduit.d:198
>
> selector.d:308
>
> (gdb) cont
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 1433270496 (LWP 8341)]
> 0x080673b1 in _D3gcx3Gcx4markFPvPvZv ()
> (gdb) bt
>
> at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:303
>
> _D5mango2io8selector14SelectSelector18SelectSelectionSet7opApplyFDFKC5mango2io8selector5model9ISelector12SelectionKeyZiZi
> ()
>
> at /home/jcomellas/devel/d/mango_test/mango/io/selector/SelectSelector.d:609
>
> _D8selector12testSelectorFC5mango2io8selector5model9ISelector9ISelectorZv
> () at selector.d:130

Hrm, so the GC thread blows up while trying to scan into pthread library
code? I don't see any reason for this to happen, so long as the stack
range being passed to the GC is valid. I know there are some library
functions that are not considered cancelable, but I would think that
they simply turn off signal handling for the span where that's true.

Sean

Mar 17 2006
