digitalmars.D - I think race condition exists in tango & phobos gc code
- redsea (22/22) Sep 07 2008 I have a programm wrote in D and run 24 * 7, I found it would block one...
- Sean Kelly (7/34) Sep 08 2008 SIGUSR2 shouldn't be lost. Tango sets sa_mask for the signal handlers
I have a program written in D that runs 24/7. I found that it blocks once or twice a week (without using any CPU). Whenever I use strace to check whether it is blocked in a system call, it continues running (strange?), and I can also resume it with kill -SIGUSR2, so I think this situation may be associated with the GC.

But why strace? I checked the strace code and found that it causes SIGSTOP to be sent, and I found that SIGSTOP cannot be blocked by a signal mask.

Then I checked the library, and I think the problem may be caused by the following execution order:

thread A:                                   thread B:
fullcollect
  thread_suspendAll
    suspend
                                            thread_suspendHandler
                                              sem_post( &suspendCount );
    ret from sem_wait( &suspendCount );
  do collect
  thread_resumeAll
    !! this signal would be lost
    pthread_kill( t.m_addr, SIGUSR2 )
                                              sigsuspend( &sigres );

Thread B would block because the SIGUSR2 is lost.

Then I checked the Phobos code, and the code is similar.

Now I'm trying to use a semaphore to do the resume, and I will check whether my program runs correctly. Any suggestions?
Sep 07 2008
redsea wrote:
> I have a program written in D that runs 24/7. I found that it blocks once or twice a week (without using any CPU). Whenever I use strace to check whether it is blocked in a system call, it continues running (strange?), and I can also resume it with kill -SIGUSR2, so I think this situation may be associated with the GC.
>
> But why strace? I checked the strace code and found that it causes SIGSTOP to be sent, and I found that SIGSTOP cannot be blocked by a signal mask.
>
> Then I checked the library, and I think the problem may be caused by the following execution order:
>
> thread A:                                   thread B:
> fullcollect
>   thread_suspendAll
>     suspend
>                                             thread_suspendHandler
>                                               sem_post( &suspendCount );
>     ret from sem_wait( &suspendCount );
>   do collect
>   thread_resumeAll
>     !! this signal would be lost
>     pthread_kill( t.m_addr, SIGUSR2 )
>                                               sigsuspend( &sigres );
>
> Thread B would block because the SIGUSR2 is lost.

SIGUSR2 shouldn't be lost. Tango sets sa_mask for the signal handlers to tell the OS to block all signals while the handler is processing. The call to sigsuspend is supposed to manually change that for the signals requested.

> Then I checked the Phobos code, and the code is similar.
>
> Now I'm trying to use a semaphore to do the resume, and I will check whether my program runs correctly.

Thanks, please do. If it really is a problem I'd be happy to change it.

Sean
Sep 08 2008
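For readers following along, here is a minimal C sketch of the suspend/resume handshake discussed in the two posts above, under the assumption that both handlers are installed with a full sa_mask as Sean describes. It is not Tango's or Phobos's actual code: the names suspend_handler, resume_handler, worker and run_suspend_resume_once are made up for illustration, and only the roles of suspendCount and sigres follow the identifiers quoted in the thread. It also assumes Linux with POSIX threads and unnamed semaphores.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static sem_t suspend_count;                /* posted once the worker is parked */
static volatile sig_atomic_t resumed = 0;  /* set after sigsuspend() returns */

/* SIGUSR2 only has to interrupt sigsuspend(); the handler itself does nothing. */
static void resume_handler(int sig) { (void)sig; }

/* SIGUSR1 handler (hypothetical stand-in for thread_suspendHandler). */
static void suspend_handler(int sig)
{
    sigset_t sigres;
    (void)sig;
    sigfillset(&sigres);
    sigdelset(&sigres, SIGUSR2);   /* wait mask: everything blocked except SIGUSR2 */
    sem_post(&suspend_count);      /* tell the collector we are suspended */
    /* Because of sa_mask, SIGUSR2 is still blocked here, so a resume signal
       sent at this point stays pending until sigsuspend() unblocks it. */
    sigsuspend(&sigres);
    resumed = 1;
}

/* Worker thread: idles until it has been suspended and resumed once. */
static void *worker(void *arg)
{
    struct timespec ts = { 0, 1000000 };   /* 1 ms */
    (void)arg;
    while (!resumed)
        nanosleep(&ts, NULL);
    return NULL;
}

/* Plays the collector's role once. Returns 1 if the worker resumed, -1 on setup failure. */
int run_suspend_resume_once(void)
{
    struct sigaction sa;
    pthread_t t;
    int rc;

    resumed = 0;
    memset(&sa, 0, sizeof sa);
    sigfillset(&sa.sa_mask);       /* block all signals while a handler runs */
    sa.sa_flags = SA_RESTART;
    sa.sa_handler = suspend_handler;
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;
    sa.sa_handler = resume_handler;
    if (sigaction(SIGUSR2, &sa, NULL) != 0) return -1;

    if (sem_init(&suspend_count, 0, 0) != 0) return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0) return -1;

    rc = pthread_kill(t, SIGUSR1);                   /* suspend request */
    assert(rc == 0);
    while (sem_wait(&suspend_count) == -1 && errno == EINTR) { /* retry */ }

    /* ... a collector would scan memory here ... */

    rc = pthread_kill(t, SIGUSR2);                   /* resume request */
    assert(rc == 0);
    (void)rc;

    pthread_join(t, NULL);
    sem_destroy(&suspend_count);
    return resumed;
}
```

Calling run_suspend_resume_once() from main should return 1. SIGUSR2 stays blocked from the moment the handler is entered until sigsuspend swaps in the wait mask, so a resume signal sent any time after sem_post is held pending rather than dropped.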
Sean Kelly wrote:
> SIGUSR2 shouldn't be lost. Tango sets sa_mask for the signal handlers to tell the OS to block all signals while the handler is processing. The call to sigsuspend is supposed to manually change that for the signals requested.
>
> > Then I checked the Phobos code, and the code is similar. Now I'm trying to use a semaphore to do the resume, and I will check whether my program runs correctly.
>
> Thanks, please do. If it really is a problem I'd be happy to change it.

I wrote a small program that calls kill and sigsuspend in the order I mentioned before, and the signal is not lost. So the real cause must lie deeper.

The semaphore version is finished, but I have to wait for the administrator to test and upload the program. I will do more checking.

Thanks for your opinions.
Sep 09 2008
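A small test of the kind redsea describes might look like the following C sketch. redsea's own program was not posted, so this is only an assumed equivalent: it is single-threaded, uses raise instead of kill from another thread, and the names on_usr2 and signal_sent_before_sigsuspend_is_delivered are hypothetical. It sends SIGUSR2 while the signal is blocked, confirms it is pending, and only then calls sigsuspend.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_usr2 = 0;

static void on_usr2(int sig) { (void)sig; got_usr2 = 1; }

/* Returns 1 if a SIGUSR2 sent before sigsuspend() is still delivered,
   0 if it was not, and -1 on setup failure. */
int signal_sent_before_sigsuspend_is_delivered(void)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask, pending;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) != 0) return -1;

    /* Block SIGUSR2, as sa_mask would do inside a running handler. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0) return -1;

    got_usr2 = 0;
    raise(SIGUSR2);                        /* the "too early" resume signal */

    /* The signal is not delivered yet, but it is recorded as pending. */
    sigpending(&pending);
    assert(sigismember(&pending, SIGUSR2) == 1);
    assert(got_usr2 == 0);

    /* Wait with a mask that unblocks only SIGUSR2: the pending signal is
       delivered immediately, the handler runs, and sigsuspend() returns. */
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIGUSR2);
    sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);  /* restore the original mask */
    return got_usr2;
}
```

If the function returns 1, the signal survived being sent before sigsuspend, which matches what redsea reports. The result only holds while the signal is blocked at the time it is sent; an unblocked SIGUSR2 would be handled right away and sigsuspend would then wait for another one.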
Sean Kelly wrote:
> SIGUSR2 shouldn't be lost. Tango sets sa_mask for the signal handlers to tell the OS to block all signals while the handler is processing. The call to sigsuspend is supposed to manually change that for the signals requested.

I was wrong. The program actually has two components, a client and a server, and both are multithreaded. It was reported to me that both components had the same problem. After checking, I found that the client is correct and runs stably, so the bug must have nothing to do with Tango. Sorry!
Sep 10 2008