digitalmars.D - Bogus thread termination under linux when blocked on a socket receive

reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
I have a SocketListener class that accepts a socket during construction, and
fires up a thread to read said socket and dispatch to a user-supplied
notification routine. The socket is of the datagram variety.

Everything works fine until program termination, where the blocked thread
(waiting in a socket-receive function) segfaults instead of returning.
Here's the kicker: if, just after starting the thread, I pause() and then
resume() it, everything works perfectly during termination also. That is,
the socket-receive unblocks and returns as expected, causing the thread to
shut down gracefully.

There are no such woes under Win32 ...

Help! Any ideas?
Jul 17 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
I believe I/we ran into this where I work once. I have sent a message to the guys involved; hopefully someone remembers why it occurs and how we solved it.

Regan
Jul 17 2004
parent reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
Thanks Regan ...

Jul 17 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
Unfortunately.. either we've forgotten we had this problem, or we didn't 
have this exact problem.

Is it definitely a SIGSEGV, or could it be a SIGHUP you're getting during 
termination?
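One rough way to check which signal actually arrives is to record it in a handler. The sketch below is in C rather than D because `sigaction` is the underlying POSIX call; `watch_signal` and `last_signal_seen` are hypothetical helper names, not part of Mango or Phobos.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Last signal number seen by the handler. sig_atomic_t is the one
   integer type that is safe to write from inside a signal handler. */
static volatile sig_atomic_t last_signal = 0;

static void record_signal(int signo)
{
    last_signal = signo;
}

/* Install record_signal for one signal number.
   Returns 0 on success, -1 on failure (as sigaction does). */
int watch_signal(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Returns the most recently recorded signal number, or 0 if none. */
int last_signal_seen(void)
{
    return (int)last_signal;
}
```

This is only useful for signals such as SIGHUP that a handler can return from. Returning from a handler after a genuine SIGSEGV normally re-runs the faulting instruction, so for a real segfault a core dump or a debugger backtrace is likely to say more.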

Regan

Jul 18 2004
parent reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
Bummer; it's a SEGV.  Thanks for trying mate ...

- Kris

Jul 18 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
The simplest answer would be that the main thread is closing/releasing the 
socket before the thread was done with it; Occam's Razor and all that.

A more complex answer: might it have something to do with who owns the 
socket handle? It's created in the main thread, and can either be 
inherited by the child thread, or duplicated, or.. depending on how the 
thread is created.

Does the main thread wait for its child threads to finish before exiting?

Is the code somewhere I can look at it?

Regan

Jul 18 2004
next sibling parent reply John Reimer <brk_6502 NO_S_PAM.yahoo.com> writes:
On Mon, 19 Jul 2004 13:21:13 +1200, Regan Heath wrote:

 Is the code somewhere I can look at it?

Yes, please have a look at it! I'm kind of stumped myself about this problem. Go to http://www.dsource.org/projects/mango/ for the zip downloads. I'm running Gentoo Linux, and my limited knowledge of threads on linux hasn't helped ferret out the problem. (Sorry, Kris, couldn't resist jumping in on this one).
Jul 18 2004
parent "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
"John Reimer"  wrote in message
 (Sorry, Kris, couldn't resist jumping in on this one).


You and I have both expended a lot of effort on this :-)
Jul 18 2004
prev sibling next sibling parent reply Ben Hinkle <bhinkle4 juno.com> writes:
 Does the main thread wait for its child threads to finish before exiting?

no - Mike Swieton and I have been battling this problem with the concurrent library. I think Mike suggested making sure the last part of your main gets the list of threads and waits for them to finish explicitly. Something like

   import std.thread;
   int main(char[][] args) {
     ...
     while (Thread.nthreads > 1) Thread.yield();
     return 0;
   }
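For comparison, the same "wait for every thread before leaving main" idea at the pthread level might look like the sketch below. It is written in C rather than D because pthreads is the usual threading layer on Linux. `run_and_join`, `worker` and `MAX_WORKERS` are made-up names for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 8

/* Trivial worker: marks its slot as finished and returns. */
static void *worker(void *arg)
{
    int *slot = arg;
    *slot = 1;
    return NULL;
}

/* Starts n workers and joins every one that was started.
   Returns the number that completed, or -1 if not all n could be created. */
int run_and_join(int n)
{
    pthread_t tids[MAX_WORKERS];
    int done[MAX_WORKERS] = {0};
    int started = 0;
    int finished = 0;

    assert(n >= 0 && n <= MAX_WORKERS);

    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, worker, &done[i]) != 0)
            break;
        started++;
    }

    /* pthread_join blocks until that worker has returned, so nothing
       it uses can be torn down underneath it by the caller. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        finished += done[i];
    }

    return started == n ? finished : -1;
}
```

Unlike a yield loop, the join blocks without spinning, but it has the same limitation: a thread stuck in a blocking receive never finishes on its own, so something still has to wake it.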
Jul 18 2004
parent "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
Good idea: might at least show some alternate behaviour. Thanks Ben.

Jul 18 2004
prev sibling parent reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
 The simplest answer would be that the main thread is closing/releasing the
 socket before the thread was done with it; Occam's Razor and all that.

Tried various combinations of that, Regan; the thread watches for termination conditions and exceptions, but the segfault (apparently) happens within the lower-level (sockets) library. As noted earlier, if the thread is paused and then resumed, everything terminates perfectly. It's almost as though there's an early race condition or something, and some socket state gets clobbered (I really need to get a linux box for Mango).

 A more complex answer.. might it have something to do with who owns the
 socket handle. It's created in the main thread, and can either be
 inherited by the child thread, or duplicated, or.. depending on how the
 thread is created.

The socket is created by the main thread, and handed to the child.

 Does the main thread wait for its child threads to finish before exiting?

No. The order of termination is determined by the D runtime. Experience indicates that the live D threads are paused during shutdown, and any live sockets are already interrupted by then. The reason main() does not wait is that the child threads are typically blocked on a read() by the OS. It is possible to keep track of all sockets used, but that can lead to all kinds of other problems (this is a library, not a managed application).

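One common way around "the thread is blocked in the OS, so nobody can wait for it" is to have the reader block in poll() on both the socket and a wake-up pipe, so that it can be asked to stop and then joined. The following is a minimal sketch of that pattern in C (not D, and not how Mango's SocketListener is actually written); `struct listener`, `listener_start` and `listener_stop` are hypothetical names, and MSG_DONTWAIT is assumed to be available as it is on Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct listener {
    int sock;        /* datagram socket, owned by the caller */
    int wake[2];     /* pipe: wake[0] is polled, wake[1] is written to stop */
    pthread_t tid;
    int datagrams;   /* packets received; read only after listener_stop() */
};

static void *listener_loop(void *arg)
{
    struct listener *l = arg;
    char buf[1500];
    struct pollfd fds[2] = {
        { .fd = l->sock,    .events = POLLIN },
        { .fd = l->wake[0], .events = POLLIN },
    };

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            break;                   /* any other poll error: give up */
        }
        if (fds[1].revents)
            break;                   /* stop was requested via the pipe */
        if (fds[0].revents & POLLIN) {
            /* Non-blocking read, so a dropped packet cannot stall us.
               A real listener would dispatch to its callback here. */
            if (recv(l->sock, buf, sizeof buf, MSG_DONTWAIT) >= 0)
                l->datagrams++;
        } else if (fds[0].revents) {
            break;                   /* POLLERR, POLLHUP or POLLNVAL */
        }
    }
    return NULL;
}

/* Starts the reader thread. Returns 0 on success, -1 on failure. */
int listener_start(struct listener *l, int sock)
{
    assert(l != NULL);
    l->sock = sock;
    l->datagrams = 0;
    if (pipe(l->wake) != 0)
        return -1;
    if (pthread_create(&l->tid, NULL, listener_loop, l) != 0) {
        close(l->wake[0]);
        close(l->wake[1]);
        return -1;
    }
    return 0;
}

/* Wakes the reader, waits for it to return, then frees the pipe.
   The socket itself is left open for the caller to close afterwards. */
int listener_stop(struct listener *l)
{
    char byte = 'x';
    int rc;

    assert(l != NULL);
    if (write(l->wake[1], &byte, 1) != 1)
        return -1;
    rc = pthread_join(l->tid, NULL);
    close(l->wake[0]);
    close(l->wake[1]);
    return rc == 0 ? 0 : -1;
}
```

The point of the design is ordering: the socket is closed only after `listener_stop()` has returned, so the reader can never be inside a receive on a handle that has already gone away. Whether that ordering is what goes wrong in the Linux case here is not established by the thread.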
 Is the code somewhere I can look at it?

Absolutely. Pick up the v0.91 zipfiles from here: http://dsource.org/projects/mango/?sec=downloads

Even better, grab the very latest from SVN here: http://svn.dsource.org/svn/projects/mango/trunk/

The latest stuff in SVN has some version (LinuxTrace){} stuff in it to help follow the execution path. Take a look near the bottom of unittest.d where there's a testMulticast() function ...

Though the quantity of source might look intimidating, there are actually only a few classes involved with this issue (SocketListener, Thread, Socket).

If you wish, let's move onto email. I can walk you through a variety of "uncomment this, then this ..." scenarios to try out, plus some version() stuff to try.

For those just checking in: this is a linux only issue ~ works as designed running under Win32, which might be the problem :-)

- Kris

Jul 18 2004
parent reply teqDruid <me teqdruid.com> writes:
On Sun, 18 Jul 2004 19:06:28 -0700, Kris wrote:
 some socket state gets clobbered (I really need to get a linux box for
 Mango)

I'm sure you can find someone with a linux server that's willing to give you a shell account. I know a coupla guys you might try to contact.
Jul 19 2004
parent reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
I somehow missed your post Druid. Sorry, and thank you for the suggestion. I
was given one in the meantime (for doing remote sanity checks).

Jul 21 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
I got me a linux box too.. kinda.. VMware is really cool!
I'll have a play with this tomorrow :)

Jul 22 2004
parent Regan Heath <regan netwin.co.nz> writes:
Ok.. trying to replicate the working case on my VMware linux machine I get:

[root linux mango]# ./mangotest
looking up exception error code ...
throwing socket exception: Unable to join multicast group. Error # 19
4 FATAL mango.unittest - Unable to join multicast group. Error # 19
socket cancel status now set
closing resource via destructor
closing socket handle ...
socket handle closed

Error #19 is "No such device" (according to strerror).
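The lookup can be reproduced with a few lines of C. This is a sketch only: `describe_errno` is a made-up helper, and the numeric value 19 for ENODEV is specific to Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Formats "Error # <n> (<text>)" into buf and returns buf.
   The text comes from strerror, so it depends on the C library. */
const char *describe_errno(int err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "Error # %d (%s)", err, strerror(err));
    return buf;
}
```

Called with ENODEV on a Linux/glibc system in the default locale, this yields "Error # 19 (No such device)", matching the number in the trace above.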

From a brief look at the code the exception is thrown here:
   mango\io\MulticastSocket.d(121): exception ("Unable to join multicast group. Error # ");

due to this call:
   if (! setGroup (groupAddress, Option.IP_ADD_MEMBERSHIP))

which returns whether setsockopt worked or not. According to the man pages 
on setsockopt it does not set that error code, so something weird appears 
to be going on, perhaps due to the way VMware handles the networking.

I found getError() is used to get the errno value; what about setting it? 
It would be good to set it to 0 before the setsockopt call to ensure 
setsockopt is really setting errno to this value.
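In C terms, the check being proposed might look like the sketch below: clear errno, make the call, and read errno only if the call reports failure. `join_group` is a hypothetical helper (it is not Mango's setGroup), and joining on INADDR_ANY is an assumption about how the interface is chosen.

```c
#define _DEFAULT_SOURCE 1   /* exposes struct ip_mreq on glibc */
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Tries to join the IPv4 multicast group given as a dotted quad,
   letting the kernel pick the interface (INADDR_ANY).
   Returns 0 on success, EINVAL if the address does not parse,
   otherwise the errno value left by the failing setsockopt call. */
int join_group(int fd, const char *group)
{
    struct ip_mreq mreq;

    assert(group != NULL);
    memset(&mreq, 0, sizeof mreq);
    if (inet_pton(AF_INET, group, &mreq.imr_multiaddr) != 1)
        return EINVAL;
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);

    errno = 0;   /* rule out a stale value left by an earlier call */
    if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                   &mreq, sizeof mreq) != 0)
        return errno;
    return 0;
}
```

If this still reports 19 on the VMware guest with a valid socket, the value really is coming from setsockopt rather than being left over from something earlier. Note that errno is only meaningful after a call has reported failure, so the reset is a diagnostic aid rather than a requirement.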

As a side note, how does getError work? It's not a C fn, so someone must have 
stubbed access to errno, right? But where/how? :)

Regan.

Jul 22 2004