digitalmars.D - Issues with Socket.accept() and SIGUSR1

LeqxLeqx <mitchelldlarson protonmail.ch> writes:
Hello,

I've been trying to create a small server-client program, and 
I've run into a rather strange problem. There's a thread separate 
from the main which accepts incoming connections. The established 
connections are then passed over to the main thread for the 
actual logic of the interaction, and the accept thread loops back 
to listen for further connections.

Normally the accept will throw a timeout and then the loop will 
continue to listen, but sometimes (and I can't find a decent 
pattern) the Socket.accept() method will raise a SIGUSR1 rather 
than throwing an exception of any kind.

Can anyone help with this? I have no idea why this happens. Below 
is the output from GDB when I've been testing it.


   Thread 2 "sverse" received signal SIGUSR1, User defined signal 1.
   [Switching to Thread 0x7ffff6a16700 (LWP 1562)]
   0x00007ffff72a5840 in __libc_accept (fd=3, addr=addr@entry=..., len=len@entry=0x0) at ../sysdeps/unix/sysv/linux/accept.c:26
   26	../sysdeps/unix/sysv/linux/accept.c: No such file or directory.
   (gdb) backtrace
   #0  0x00007ffff72a5840 in __libc_accept (fd=3, addr=addr@entry=..., len=len@entry=0x0) at ../sysdeps/unix/sysv/linux/accept.c:26
   #1  0x00005555555be5d5 in std.socket.Socket.accept() (this=0x7ffff7edc0e0) at ../../../../src/libphobos/src/std/socket.d:2817
   #2  0x00005555555801c1 in sverse.server.server.Server.acceptCallback() (this=0x7ffff7eda100) at ./sverse/server/server.d:264
   #3  0x00005555555fa842 in core.thread.Thread.run() (this=0x7ffff7eda200) at ../../../../src/libphobos/libdruntime/core/thread.d:1403
   #4  thread_entryPoint (arg=0x7ffff7eda200) at ../../../../src/libphobos/libdruntime/core/thread.d:392
   #5  0x00007ffff6c227fc in start_thread (arg=0x7ffff6a16700) at pthread_create.c:465
   #6  0x00007ffff72a4b0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
   (gdb) info locals
   resultvar = 18446744073709551612
   sc_cancel_oldtype = 0
   (gdb) up
   #1  0x00005555555be5d5 in std.socket.Socket.accept() (this=0x7ffff7edc0e0) at ../../../../src/libphobos/src/std/socket.d:2817
   2817	../../../../src/libphobos/src/std/socket.d: No such file or directory.
   (gdb) info locals
   newSocket = <optimized out>
   newsock = <optimized out>


I'm using GDC, if that helps.

Any and all assistance will be greatly appreciated.

Thanks,
Dec 08 2017
Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 8 December 2017 at 22:27:41 UTC, LeqxLeqx wrote:
 Normally the accept will throw a timeout and then the loop will 
 continue to listen, but sometimes (and I can't find a decent 
 pattern) the Socket.accept() method will raise a SIGUSR1 rather 
 than throwing an exception of any kind.
That probably means the *other* thread started a garbage collection cycle. The D GC uses that signal to pause threads while it scans memory, so they don't change memory out from under it mid-scan. All you need to do is try the accept again if that happens. It isn't really an exception, it is just an EINTR (a system call interrupted by a signal), and you are supposed to just try again when that happens (unless the interruption meant the program is now instructed to terminate, e.g. SIGINT).
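
A minimal sketch of the "try the accept again" advice, assuming Phobos surfaces the interrupted call as a SocketAcceptException whose errorCode field holds errno. The helper name acceptRetrying is made up for illustration and is not part of std.socket. For reference, the resultvar value in the GDB output above, 18446744073709551612, is -4 read as an unsigned 64-bit number, which matches -EINTR on Linux.

```d
import core.stdc.errno : EINTR;
import std.socket;

/// Hypothetical helper (not part of Phobos): accept a connection,
/// retrying when a signal such as the GC's SIGUSR1 interrupted accept(2).
Socket acceptRetrying(Socket listener)
{
    while (true)
    {
        try
        {
            return listener.accept();
        }
        catch (SocketAcceptException e)
        {
            // EINTR only means "interrupted, try again". Anything else,
            // including a receive timeout, is passed on to the caller.
            if (e.errorCode != EINTR)
                throw e;
        }
    }
}

unittest
{
    // Listen on an ephemeral loopback port, connect to it, then accept.
    auto listener = new TcpSocket();
    listener.bind(new InternetAddress("127.0.0.1", InternetAddress.PORT_ANY));
    listener.listen(1);

    auto client = new TcpSocket(listener.localAddress);
    auto peer = acceptRetrying(listener);
    assert(peer.isAlive);
}
```

The accept thread would call acceptRetrying(listener) where it currently calls listener.accept(); the existing timeout handling stays as it is, since only EINTR is swallowed here.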
Dec 08 2017
Nemanja Boric <4burgos gmail.com> writes:
On Friday, 8 December 2017 at 23:11:47 UTC, Adam D. Ruppe wrote:
 [...]
Sorry, I've completely missed your first paragraph! It's after midnight here, good night!
Dec 08 2017
LeqxLeqx <mitchelldlarson protonmail.ch> writes:
On Friday, 8 December 2017 at 23:11:47 UTC, Adam D. Ruppe wrote:
 [...]
Thank you both for answering my stupid question.

Nonetheless, it seems that there is still a very strange thing going on. I'm getting a segfault (which was the error I got before I opened GDB and ran into the SIGUSR1 thing) in the middle of an object.opEquals call. It seems to be triggered right after the GC's SIGUSR1.

Perhaps this is just another stupid question, but is it possible that the D GC is collecting a resource which my program is still attempting to use? I'm not using pointers directly at all in this program.

   Thread 2 "sverse" received signal SIGUSR1, User defined signal 1.
   Thread 3 "sverse" received signal SIGUSR1, User defined signal 1.
   Thread 2 "sverse" received signal SIGUSR2, User defined signal 2.
   Thread 3 "sverse" received signal SIGUSR2, User defined signal 2.
   Thread 1 "sverse" received signal SIGSEGV, Segmentation fault.
   0x00007ffff7edc200 in ?? ()
   (gdb) backtracfe
   Undefined command: "backtracfe".  Try "help".
   (gdb) backtrace
   #0  0x00007ffff7edc200 in ?? ()
   #1  0x0000555555606242 in object.opEquals(Object, Object) (lhs=0x7ffff7edc120, rhs=0x7ffff7edc080) at ../../../../src/libphobos/libdruntime/object.d:152
   #2  0x0000555555581fb9 in sverse.server.serverpanel.ServerPanel.canMove(sverse.core.entity.Entity) (this=0x7ffff7ede000, movedEntity=0x7ffff7edf1c0) at ./sverse/server/serverpanel.d:129
   #3  0x0000555555581e13 in sverse.server.serverpanel.ServerPanel.attemptToApplyMove(sverse.core.entity.Entity, dmath.vector.Vector!(int).Vector) (this=0x7ffff7ede000, entity=0x7ffff7edf1c0, originalPosition=0x7ffff7fd6ec0) at ./sverse/server/serverpanel.d:107
   #4  0x0000555555581afb in sverse.server.serverpanel.ServerPanel.update() (this=0x7ffff7ede000) at ./sverse/server/serverpanel.d:72
   #5  0x000055555557f971 in sverse.server.server.Server.updateAllPanels() (this=0x7ffff7eda100) at ./sverse/server/server.d:190
   #6  0x000055555557f15e in sverse.server.server.Server.tick() (this=0x7ffff7eda100) at ./sverse/server/server.d:113
   #7  0x000055555557c926 in sverse.sverse.runServer(immutable(char)[], ushort) (addressString=..., port=6001) at ./sverse/sverse.d:99
   #8  0x000055555557c398 in D main (args=...) at ./sverse/sverse.d:26
Dec 08 2017
Nemanja Boric <4burgos gmail.com> writes:
On Friday, 8 December 2017 at 22:27:41 UTC, LeqxLeqx wrote:
 Hello,

 I've been trying to create a small server-client program, and 
 I've run into a rather strange problem. There's a thread 
 separate from the main which accepts incoming connections. The 
 established connections are then passed over to the main thread 
 for the actual logic of the interaction, and the accept thread 
 loops back to listen for further connections.

 [...]
Looking at the trace, your thread 2 did not raise a signal, it received one. I have a feeling a GC collection started from another thread, and the GC sends SIGUSR1 to all threads registered with the runtime to pause them while the collection is running. As Adam said, repeat the call (check how Phobos sockets handle this and what you should do), and you probably want to instruct the debugger to ignore SIGUSR1/2.
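
For the debugger side, something along these lines at the (gdb) prompt should keep GDB from stopping on the GC's signals. handle is a standard GDB command; the nostop/noprint/pass combination is just one reasonable choice, and it assumes the program does not use SIGUSR1/SIGUSR2 for anything of its own:

```
handle SIGUSR1 nostop noprint pass
handle SIGUSR2 nostop noprint pass
```

With pass set, the signals are still delivered to the program, so the GC keeps working; GDB simply stops reporting them, and a later SIGSEGV is the first thing that halts execution.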
Dec 08 2017