digitalmars.D - Multithreading woes on Linux
- Juan Jose Comellas <jcomellas gmail.com> Apr 23 2006
- Thomas Kuehne <thomas-dloop kuehne.cn> Apr 23 2006
- Dave <Dave_member pathlink.com> Apr 23 2006
- Juan Jose Comellas <jcomellas gmail.com> Apr 23 2006
- Justin C Calvarese <technocrat7 gmail.com> Apr 23 2006
- pmoore <pmoore_member pathlink.com> Apr 24 2006
It seems that there is a problem in the code generated by DMD or the code in Phobos when using multithreading on Linux. I've been trying several ways of rewriting my programs to avoid this problem, but I've had no success so far. The crashes always happen inside the garbage collector. The line reported by gdb is:
#0 0x0806a978 in _D3gcx3Gcx4markFPvPvZv () at gcx.d:1318
1318 byte *p = cast(byte *)(*p1);
It looks like the pointer that's being dereferenced by the GC is invalid. I've added checks before this line to see if it was a NULL pointer and it's not. Surprisingly (or not), my program crashes almost immediately if Phobos and the GC are compiled with optimizations. If I only leave "-g" as the DFLAGS in the makefiles I get these crashes much less frequently.
In the test program I'm using I have two threads. The crash is happening on thread 1. The full backtrace I get for the crash is attached to this post. I'm trying to write a simplified sample program and I'll post it once I have it ready. Walter, if you have a minute, I'd appreciate you looking into this.
Apr 23 2006
Juan Jose Comellas wrote on 2006-04-23:
It seems that there is a problem in the code generated by DMD or the code in Phobos when using multithreading on Linux. I've been trying several ways of rewriting my programs to avoid this problem, but I've had no success so far. The crashes always happen inside the garbage collector. The line reported by gdb is:
#0 0x0806a978 in _D3gcx3Gcx4markFPvPvZv () at gcx.d:1318
1318 byte *p = cast(byte *)(*p1);
Might be related to http://d.puremagic.com/bugzilla/show_bug.cgi?id=72
A potential workaround:
1) edit dmd/src/phobos/internal/gc/linux.mak and remove -release from DFLAGS, leaving:
DFLAGS=-O -inline -I../..
2) recompile libphobos.a
3) replace your current libphobos.a with the one found at dmd/src/phobos/libphobos.a
Thomas
Apr 23 2006
I just ran into this - the fix in std/thread.d:
extern (C) static void pauseHandler(int sig)
{
    int result;
    // Save all registers on the stack so they'll be scanned by the GC
    asm
    {
        pusha ;
    }
    assert(sig == SIGUSR1);
    // Move sem_post to after t.stackTop = getESP();
    //sem_post(&flagSuspend);
    sigset_t sigmask;
    result = sigfillset(&sigmask);
    assert(result == 0);
    result = sigdelset(&sigmask, SIGUSR2);
    assert(result == 0);
    Thread t = getThis();
    t.stackTop = getESP();
    t.flags &= ~1;
    sem_post(&flagSuspend); // HERE
    while (1)
    {
        sigsuspend(&sigmask); // suspend until SIGUSR2
        if (t.flags & 1) // ensure it was resumeHandler()
            break;
    }
    // Restore all registers
    asm
    {
        popa ;
    }
}
The problem is that the t.stackTop is not valid when it is passed into
gcx.mark() because it is being munged as pauseAll returns (and lets the
GC commence) before the stackTop is set for all of the paused threads.
Please give it a try and if it also solves your problem then it will be
a confirmed fix.
- Dave
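For readers following along, the ordering that the fix relies on can be shown in isolation. What follows is only a minimal sketch in present-day D, not the Phobos code: it assumes the coordinator waits on the semaphore once per paused thread, and it uses core.sync.semaphore and core.atomic in place of the raw sem_post and signal handlers. The names publishThenSignal, makeWorker, allPublishedBeforeScan, ready, publishedTop and workerCount are all made up for illustration.

```d
import core.atomic : atomicLoad, atomicStore;
import core.sync.semaphore : Semaphore;
import core.thread : Thread;

enum workerCount = 4;                    // arbitrary, for illustration only

__gshared Semaphore ready;               // stands in for flagSuspend
shared size_t[workerCount] publishedTop; // stands in for each thread's t.stackTop

// Hypothetical worker body: record a stack address first, report readiness second.
void publishThenSignal(size_t slot)
{
    int marker;                                            // lives on this thread's stack
    atomicStore(publishedTop[slot], cast(size_t) &marker); // 1. publish the value
    ready.notify();                                        // 2. only then wake the coordinator
}

// Wraps the slot number in a closure so every thread gets its own copy.
void delegate() makeWorker(size_t slot)
{
    return () { publishThenSignal(slot); };
}

// Hypothetical coordinator, playing the role of pauseAll() followed by the GC scan.
bool allPublishedBeforeScan()
{
    ready = new Semaphore(0);
    Thread[] workers;
    foreach (i; 0 .. workerCount)
    {
        workers ~= new Thread(makeWorker(i));
        workers[$ - 1].start();
    }

    foreach (i; 0 .. workerCount)
        ready.wait();                    // one wait per worker

    // Every slot must be filled by now; the addresses are only compared, never dereferenced.
    bool ok = true;
    foreach (i; 0 .. workerCount)
        ok = ok && atomicLoad(publishedTop[i]) != 0;

    foreach (w; workers)
        w.join();
    return ok;
}

void main()
{
    assert(allPublishedBeforeScan());
}
```

If the two numbered steps in publishThenSignal were swapped, the coordinator could return from its last wait while a slot was still zero, which is the same kind of window the original sem_post placement left open for t.stackTop.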
Juan Jose Comellas wrote:
[...] The full backtrace I get for the crash is attached to this post.
------------------------------------------------------------------------
(gdb) thread apply all bt
Thread 2 (process 8953):
#0 0x5557db9d in sem_post GLIBC_2.0 () from /lib/tls/libpthread.so.0
#1 0x08062f27 in _D3std6thread6Thread12pauseHandlerUiZv () at std/thread.d:940
#2 <signal handler called>
#3 0x5557e83e in send () from /lib/tls/libpthread.so.0
#4 0x08050a61 in _D5mango2io6Socket6Socket4sendFAvE5mango2io6Socket6Socket5FlagsZi () at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:1423
#5 0x08050290 in _D5mango2io6Socket6Socket6writerFAvZk () at /home/jcomellas/devel/d/mango_test/mango/io/Socket.d:879
#6 0x0804cbde in _D5mango2io7Conduit7Conduit5writeFAvZk () at /home/jcomellas/devel/d/mango_test/mango/io/Conduit.d:198
#7 0x0805821f in _D8selector16clientThreadFuncFZv () at selector.d:363
#8 0x0805816e in _D8selector21dummyClientThreadFuncFPvZi () at selector.d:327
#9 0x080628c5 in _D3std6thread6Thread3runFZi () at std/thread.d:609
#10 0x08062d50 in _D3std6thread6Thread11threadstartUPvZPv () at std/thread.d:845
#11 0x55579ced in start_thread () from /lib/tls/libpthread.so.0
#12 0x5567ddde in clone () from /lib/tls/libc.so.6
Thread 1 (process 8949):
#0 0x0806a978 in _D3gcx3Gcx4markFPvPvZv () at gcx.d:1318
#1 0x0806ad05 in _D3gcx3Gcx11fullcollectFPvZk () at gcx.d:1462
#2 0x0806aab5 in _D3gcx3Gcx16fullcollectshellFZk () at gcx.d:1382
#3 0x080692de in _D3gcx2GC12mallocNoSyncFkZPv () at gcx.d:275
#4 0x080691c1 in _D3gcx2GC6mallocFkZPv () at gcx.d:228
#5 0x080684db in _d_newclass () at gc.d:127
#6 0x08053df7 in _D5mango2io8selector12PollSelector12PollSelector11selectedSetFZC5mango2io8selector5model9ISelector13ISelectionSet () at /home/jcomellas/devel/d/mango_test/mango/io/selector/PollSelector.d:353
#7 0x08057d69 in _D8selector12testSelectorFC5mango2io8selector5model9ISelector9ISelectorZv () at selector.d:142
#8 0x08057c24 in _Dmain () at selector.d:66
#9 0x0805a38a in main () at internal/dmain2.d:94
Apr 23 2006
Great fix! This solved all the problems I've found so far when working with multiple threads on Linux. I'm going to start running more complex test cases with several hundred threads to see if I can find any additional problems. Thank you very much for this.
Walter, please add this fix to Phobos. Should I create an entry in D's bugzilla?
Apr 23 2006
Juan Jose Comellas wrote:
Great fix! This solved all the problems I've found so far when working with multiple threads on Linux. I'm going to start running more complex test cases with several hundred threads to see if I can find any additional problems. Thank you very much for this. Walter, please add this fix to Phobos. Should I create an entry in D's bugzilla?
I think this is exactly what bugzilla is for. I think you should go ahead and add it.
-- jcc7
Apr 23 2006
Slightly off topic: Why does this function do a pusha and popa? Surely they are 16-bit pushes and pops? Wouldn't you want pushad and popad instead?
Note though that individual pushes and pops would probably be better with the 64-bit future in mind, as pushad and popad become invalid instructions in x86_64.
Apr 24 2006








