digitalmars.D.learn - Program locked at joinAll and sched_yield

tcak <1ltkrs+3wyh1ow7kzn1k sharklasers.com> writes:
I have my own HTTP server. Every request is handled by a thread, 
and threads are reused.

I send 35,000 requests (7 different terminals send 5,000 
requests each) to the server again and again. Each request is 
short-lived.

Everything works fine under that load; there is no problem at all.

I put a "readln" in the main function. When I press Enter, all 
currently idle threads are stopped (I use thread.join()).

The problem is that all the other threads stop, but the last thread, 
Thread#1, gets stuck in sched_yield(). It uses one CPU core 
at 100%, and the program never quits.

Only that one thread remains at the end, and below is its 
stack trace.

sched_yield() in /build/glibc-GKVZIf/glibc-2.23/posix/../sysdeps/unix/syscall-template.S:84
thread_joinAll() in
rt_term() in
rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) function).runAll()() in
rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) function).tryExec(scope void() delegate)() in
_d_run_main() in
main() in
__libc_start_main(int (*)(int, char **, char **) main, int argc, char ** argv, int (*)(int, char **, char **) init, void (*)(void) fini, void (*)(void) rtld_fini, void * stack_end) in /build/glibc-GKVZIf/glibc-2.23/csu/../csu/libc-start.c:291
_start() in


Is there a known issue related to this, or anything that is known to 
cause this problem?
Jul 01 2016
Lodovico Giaretta <lodovico giaretart.net> writes:
On Friday, 1 July 2016 at 12:02:11 UTC, tcak wrote:
 Is there any known issue about this? or anything that is known 
 to cause this problem?
Hi! Can you provide a reduced test case that shows the issue? Without any code, it's difficult to tell what's going on.
Jul 03 2016
tcak <1ltkrs+3wyh1ow7kzn1k sharklasers.com> writes:
On Sunday, 3 July 2016 at 17:19:04 UTC, Lodovico Giaretta wrote:
 Hi! Can you provide a reduced test case that shows the issue? 
 Without any code, it's difficult to tell what's going on.
Well, I have actually found the cause of the issue, and worked around it in a different way.

I put a memory limit on the process for testing.

At some point, because of the memory limit, the thread.start() method fails. But this method does not restore the runtime's state correctly, and the D runtime still behaves as if the thread had been started.

If I understand correctly, this happens because of the variable "nAboutToStart" in core.thread, line 685. Its value is increased there, and is decreased by 1 in the "add" function on line 1775. When start() fails, add() is never called for that thread, so thread_joinAll() on line 2271 gets into an endless loop. As a result, the program cannot quit, and the loop uses 100% CPU.

---

What I did to work around this issue is that I created my thread by using the pthread_create() function, and called thread_attachThis(). This way, the problem is avoided.

---

As a fix, when thread creation fails in the start() method, we should decrease the value of "nAboutToStart" by 1, but it seems that "pAboutToStart" also needs to be updated to restore the state properly. Fortunately there is not much code in the start() method.
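For readers who want to try the same workaround, here is a minimal sketch of what it might look like on a POSIX system. It is not the code from my server: the names worker and spawnWorker and the value 42 are made up for illustration, and I am assuming that thread_attachThis() is called from inside the new thread, with thread_detachThis() called before the thread exits.

```d
import core.sys.posix.pthread; // pthread_create, pthread_join, pthread_t
import core.thread;            // thread_attachThis, thread_detachThis

// Hypothetical entry point run by the raw POSIX thread.
extern (C) void* worker(void* arg)
{
    // Register the calling thread with the D runtime (so the GC knows
    // about it), and unregister it again before the thread exits.
    thread_attachThis();
    scope (exit) thread_detachThis();

    // Stand-in for real request handling: write a result for the caller.
    auto result = cast(int*) arg;
    *result = 42;
    return null;
}

// Hypothetical helper: returns false if the OS could not create the thread.
bool spawnWorker(int* result)
{
    pthread_t tid;
    // pthread_create reports failure through its return value, so the
    // caller can handle it directly.
    if (pthread_create(&tid, null, &worker, result) != 0)
        return false;
    pthread_join(tid, null);
    return true;
}

void main()
{
    import core.stdc.stdio : printf;

    int result;
    if (spawnWorker(&result))
        printf("worker finished, result = %d\n", result);
    else
        printf("could not create thread\n");
}
```

The point of this design is that a failed pthread_create() leaves nothing registered with the runtime, so there is nothing for thread_joinAll() to wait on at exit.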
Jul 03 2016
Lodovico Giaretta <lodovico giaretart.net> writes:
On Sunday, 3 July 2016 at 18:25:32 UTC, tcak wrote:
 As a fix, when thread creation fails in the start() 
 method, we should decrease the value of "nAboutToStart" by 1, 
 but it seems that "pAboutToStart" also needs to be updated to 
 restore the state properly. Fortunately there is not much code 
 in the start() method.
I suggest you create an issue for this, if you didn't already, so that it can be fixed.
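For such a report, the bookkeeping described above can be reduced to a toy model. The sketch below is only an illustration of the reasoning, not druntime's actual code: ThreadRegistry, its fields, and the rollBackOnFailure flag are hypothetical names, and the real start() also maintains "pAboutToStart", which this model ignores.

```d
// Toy model only: none of these names are druntime's real internals.
struct ThreadRegistry
{
    int aboutToStart; // threads announced but not yet registered
    int running;      // threads registered and still alive

    // Models start(): announce the thread, then try to create it.
    bool start(bool creationSucceeds, bool rollBackOnFailure)
    {
        ++aboutToStart;
        if (!creationSucceeds)
        {
            if (rollBackOnFailure)
                --aboutToStart; // the rollback proposed in the thread
            return false;
        }
        // Models add(): the new thread registers itself.
        --aboutToStart;
        ++running;
        return true;
    }

    // Models one registered thread finishing.
    void finish()
    {
        --running;
    }

    // Models the exit condition of a joinAll-style loop.
    bool canTerminate() const
    {
        return aboutToStart == 0 && running == 0;
    }
}

void main()
{
    import std.stdio : writeln;

    ThreadRegistry leaky;
    leaky.start(false, false);     // creation fails, no rollback
    writeln(leaky.canTerminate()); // false: a joinAll-style loop would spin

    ThreadRegistry fixed;
    fixed.start(false, true);      // creation fails, counter rolled back
    writeln(fixed.canTerminate()); // true
}
```

In the model, a single failed start without rollback is enough to keep the exit condition false forever, which matches the 100% CPU spin reported in the first post.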
Jul 03 2016