www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - GC dead-locking ?

reply Marco Leise <Marco.Leise gmx.de> writes:
Here is an excerpt from a stack trace I got while profiling
with OProfile:

#0  sem_wait () from /lib64/libpthread.so.0
#1  thread_suspendAll () at core/thread.d:2471
#2  gc.gcx.Gcx.fullcollect() (this=...) at gc/gcx.d:2427
#3  gc.gcx.Gcx.bigAlloc() (this=..., size=16401, poolPtr=0x7fc3d4bfe3c8,
alloc_size=0x7fc3d4bfe418) at gc/gcx.d:2099
#4  gc.gcx.GC.mallocNoSync (alloc_size=0x7fc3d4bfe418, bits=10, size=16401,
this=...) gc/gcx.d:503
#5  gc.gcx.GC.malloc() (this=..., size=16401, bits=10,
alloc_size=0x7fc3d4bfe418) gc/gcx.d:421
#6  gc.gc.gc_qalloc (ba=10, sz=<optimized out>) gc/gc.d:203
#7  gc_qalloc (sz=<optimized out>, ba=10) gc/gc.d:198
#8  _d_newarrayT (ti=..., length=4096) rt/lifetime.d:807
#9  sequencer.algorithm.gzip.HuffmanTree.__T6__ctorTG32hZ.__ctor() (this=...,
bitLengths=...) sequencer/algorithm/gzip.d:444

Two more threads are alive, but waiting on a condition
variable (i.e.: in pthread_cond_wait(), but from my own and
not from druntime code. Is there some obvious way I could have
dead-locked the GC ? Or is there a bug ?

This was compiled with GDC using DMD FE 2.062.

-- 
Marco
Jun 13 2013
next sibling parent Marco Leise <Marco.Leise gmx.de> writes:
One more note: I get this consistently during profiling, but
not without.
I don't count kernel involvement out either, since OProfile is
a kernel based profiler and there could be a quirk in its
interaction with semaphores.

-- 
Marco
Jun 13 2013
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Jun 13, 2013, at 2:22 AM, Marco Leise <Marco.Leise gmx.de> wrote:

 Here is an excerpt from a stack trace I got while profiling
 with OProfile:
=20
 #0  sem_wait () from /lib64/libpthread.so.0
 #1  thread_suspendAll () at core/thread.d:2471
 #2  gc.gcx.Gcx.fullcollect() (this=3D...) at gc/gcx.d:2427
 #3  gc.gcx.Gcx.bigAlloc() (this=3D..., size=3D16401, =
poolPtr=3D0x7fc3d4bfe3c8, alloc_size=3D0x7fc3d4bfe418) at gc/gcx.d:2099
 #4  gc.gcx.GC.mallocNoSync (alloc_size=3D0x7fc3d4bfe418, bits=3D10, =
size=3D16401, this=3D...) gc/gcx.d:503
 #5  gc.gcx.GC.malloc() (this=3D..., size=3D16401, bits=3D10, =
alloc_size=3D0x7fc3d4bfe418) gc/gcx.d:421
 #6  gc.gc.gc_qalloc (ba=3D10, sz=3D<optimized out>) gc/gc.d:203
 #7  gc_qalloc (sz=3D<optimized out>, ba=3D10) gc/gc.d:198
 #8  _d_newarrayT (ti=3D..., length=3D4096) rt/lifetime.d:807
 #9  sequencer.algorithm.gzip.HuffmanTree.__T6__ctorTG32hZ.__ctor() =
(this=3D..., bitLengths=3D...) sequencer/algorithm/gzip.d:444
=20
 Two more threads are alive, but waiting on a condition
 variable (i.e.: in pthread_cond_wait(), but from my own and
 not from druntime code. Is there some obvious way I could have
 dead-locked the GC ? Or is there a bug ?
I assume you're running on Linux, which uses signals (SIGUSR1, = specifically) to suspend threads for a collection. So I imagine what's = happening is that your thread is trying to suspend all the other threads = so it can collect, and those threads are ignoring the signal for some = reason. I would expect pthread_cond_wait to be interrupted if a signal = arrives though. Have you overridden the signal handler for SIGUSR1?=
Jun 17 2013
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Mon, 17 Jun 2013 10:46:19 -0700
schrieb Sean Kelly <sean invisibleduck.org>:

 On Jun 13, 2013, at 2:22 AM, Marco Leise <Marco.Leise gmx.de> wrote:
 
 Here is an excerpt from a stack trace I got while profiling
 with OProfile:
 
 #0  sem_wait () from /lib64/libpthread.so.0
 #1  thread_suspendAll () at core/thread.d:2471
 #2  gc.gcx.Gcx.fullcollect() (this=...) at gc/gcx.d:2427
 #3  gc.gcx.Gcx.bigAlloc() (this=..., size=16401, poolPtr=0x7fc3d4bfe3c8,
alloc_size=0x7fc3d4bfe418) at gc/gcx.d:2099
 #4  gc.gcx.GC.mallocNoSync (alloc_size=0x7fc3d4bfe418, bits=10, size=16401,
this=...) gc/gcx.d:503
 #5  gc.gcx.GC.malloc() (this=..., size=16401, bits=10,
alloc_size=0x7fc3d4bfe418) gc/gcx.d:421
 #6  gc.gc.gc_qalloc (ba=10, sz=<optimized out>) gc/gc.d:203
 #7  gc_qalloc (sz=<optimized out>, ba=10) gc/gc.d:198
 #8  _d_newarrayT (ti=..., length=4096) rt/lifetime.d:807
 #9  sequencer.algorithm.gzip.HuffmanTree.__T6__ctorTG32hZ.__ctor() (this=...,
bitLengths=...) sequencer/algorithm/gzip.d:444
 
 Two more threads are alive, but waiting on a condition
 variable (i.e.: in pthread_cond_wait(), but from my own and
 not from druntime code. Is there some obvious way I could have
 dead-locked the GC ? Or is there a bug ?
I assume you're running on Linux, which uses signals (SIGUSR1, specifically) to suspend threads for a collection. So I imagine what's happening is that your thread is trying to suspend all the other threads so it can collect, and those threads are ignoring the signal for some reason. I would expect pthread_cond_wait to be interrupted if a signal arrives though. Have you overridden the signal handler for SIGUSR1?
No, I have not overridden the signal handler. I'm aware of the fact that signals make pthread_cond_wait() return early and put them in a while loop as one would expect, that is all. -- Marco
Jun 18 2013
parent reply Sean Kelly <sean invisibleduck.org> writes:
On Jun 18, 2013, at 7:01 AM, Marco Leise <Marco.Leise gmx.de> wrote:

 Am Mon, 17 Jun 2013 10:46:19 -0700
 schrieb Sean Kelly <sean invisibleduck.org>:
=20
 On Jun 13, 2013, at 2:22 AM, Marco Leise <Marco.Leise gmx.de> wrote:
=20
 Here is an excerpt from a stack trace I got while profiling
 with OProfile:
=20
 #0  sem_wait () from /lib64/libpthread.so.0
 #1  thread_suspendAll () at core/thread.d:2471
 #2  gc.gcx.Gcx.fullcollect() (this=3D...) at gc/gcx.d:2427
 #3  gc.gcx.Gcx.bigAlloc() (this=3D..., size=3D16401, poolPtr=3D0x7fc3d4b=
fe3c8, alloc_size=3D0x7fc3d4bfe418) at gc/gcx.d:2099
 #4  gc.gcx.GC.mallocNoSync (alloc_size=3D0x7fc3d4bfe418, bits=3D10, size=
=3D16401, this=3D...) gc/gcx.d:503
 #5  gc.gcx.GC.malloc() (this=3D..., size=3D16401, bits=3D10, alloc_size=3D=
0x7fc3d4bfe418) gc/gcx.d:421
 #6  gc.gc.gc_qalloc (ba=3D10, sz=3D<optimized out>) gc/gc.d:203
 #7  gc_qalloc (sz=3D<optimized out>, ba=3D10) gc/gc.d:198
 #8  _d_newarrayT (ti=3D..., length=3D4096) rt/lifetime.d:807
 #9  sequencer.algorithm.gzip.HuffmanTree.__T6__ctorTG32hZ.__ctor() (this=
=3D..., bitLengths=3D...) sequencer/algorithm/gzip.d:444
=20
 Two more threads are alive, but waiting on a condition
 variable (i.e.: in pthread_cond_wait(), but from my own and
 not from druntime code. Is there some obvious way I could have
 dead-locked the GC ? Or is there a bug ?
=20 I assume you're running on Linux, which uses signals (SIGUSR1, specifical=
ly) to suspend threads for a collection. So I imagine what's happening is t= hat your thread is trying to suspend all the other threads so it can collect= , and those threads are ignoring the signal for some reason. I would expect= pthread_cond_wait to be interrupted if a signal arrives though. Have you o= verridden the signal handler for SIGUSR1?
=20
 No, I have not overridden the signal handler. I'm aware of the
 fact that signals make pthread_cond_wait() return early and
 put them in a while loop as one would expect, that is all.
Hrm... Can you trap this in a debugger and post the stack traces of all thre= ads? That stack above is a thread waiting for others to say they're suspend= ed so it can collect.=20=
Jun 18 2013
parent Marco Leise <Marco.Leise gmx.de> writes:
Am Tue, 18 Jun 2013 19:12:06 -0700
schrieb Sean Kelly <sean invisibleduck.org>:

 On Jun 18, 2013, at 7:01 AM, Marco Leise <Marco.Leise gmx.de> wrote:
 
 Am Mon, 17 Jun 2013 10:46:19 -0700
 schrieb Sean Kelly <sean invisibleduck.org>:
 
 On Jun 13, 2013, at 2:22 AM, Marco Leise <Marco.Leise gmx.de> wrote:
 
 Here is an excerpt from a stack trace I got while profiling
 with OProfile:
 
 #0  sem_wait () from /lib64/libpthread.so.0
 #1  thread_suspendAll () at core/thread.d:2471
 #2  gc.gcx.Gcx.fullcollect() (this=...) at gc/gcx.d:2427
 #3  gc.gcx.Gcx.bigAlloc() (this=..., size=16401, poolPtr=0x7fc3d4bfe3c8,
alloc_size=0x7fc3d4bfe418) at gc/gcx.d:2099
 #4  gc.gcx.GC.mallocNoSync (alloc_size=0x7fc3d4bfe418, bits=10, size=16401,
this=...) gc/gcx.d:503
 #5  gc.gcx.GC.malloc() (this=..., size=16401, bits=10,
alloc_size=0x7fc3d4bfe418) gc/gcx.d:421
 #6  gc.gc.gc_qalloc (ba=10, sz=<optimized out>) gc/gc.d:203
 #7  gc_qalloc (sz=<optimized out>, ba=10) gc/gc.d:198
 #8  _d_newarrayT (ti=..., length=4096) rt/lifetime.d:807
 #9  sequencer.algorithm.gzip.HuffmanTree.__T6__ctorTG32hZ.__ctor() (this=...,
bitLengths=...) sequencer/algorithm/gzip.d:444
 
 Two more threads are alive, but waiting on a condition
 variable (i.e.: in pthread_cond_wait(), but from my own and
 not from druntime code. Is there some obvious way I could have
 dead-locked the GC ? Or is there a bug ?
I assume you're running on Linux, which uses signals (SIGUSR1, specifically) to suspend threads for a collection. So I imagine what's happening is that your thread is trying to suspend all the other threads so it can collect, and those threads are ignoring the signal for some reason. I would expect pthread_cond_wait to be interrupted if a signal arrives though. Have you overridden the signal handler for SIGUSR1?
No, I have not overridden the signal handler. I'm aware of the fact that signals make pthread_cond_wait() return early and put them in a while loop as one would expect, that is all.
Hrm... Can you trap this in a debugger and post the stack traces of all threads? That stack above is a thread waiting for others to say they're suspended so it can collect.
I could do that (with a little work setting the scenario up again), but it wont help. As I said, the other two threads were paused in pthread_cond_wait() in my own code. There was nothing special about their stack trace. -- Marco
Jul 01 2013