
digitalmars.D - [std.concurrency] Critical bug

osa <osa aso.osa> writes:
I've struggled with occasional hangs in my code using std.concurrency
for a while. Initially I thought it was my fault, but it seems that
std.concurrency has a critical bug which makes it completely unusable,
at least for me. The problem is that in some situations a message sent
by send() is never delivered to receive(). This happens when the thread
on the receive() side has other linked threads (started by spawnLinked)
and those threads terminate, causing a LinkTerminated exception. This
exception completely corrupts the receiving thread's message queue.
Here is the smallest example I could come up with: suppose we have two
threads, "main" and "service", exchanging messages in a simple loop.
The "main" thread sends message A to the "service" thread and waits for
message B. The "service" thread waits for message A and sends message B
back to the main thread:

main thread:
   for( ;; ) {
     send( service, A() );
     receive( ( B ){} );
   }

service thread:
   for( ;; ) {
     receive( ( A ){} );
     send( main, B() );
   }

This works like a charm. But if the service spawns another linked
thread, and that thread terminates so that a call to receive() in the
service thread raises a LinkTerminated exception, subsequent receive()
calls never succeed. Below is the actual test program:
-------
// compile with -version=hang to see the problem
import std.concurrency;
import std.stdio;

struct A { int c; }
struct B {}

void main() {
     auto service = spawn( &service_proc, thisTid );
     int count;
     for( count = 0; count < 200; ++count ) {

         send( service, A( count ) );
         writeln( "main\t: waiting for B" );
         receive( ( B ){} );
     }
     writeln( "done: ", count, " iterations" );
}

void service_proc( Tid main_tid ) {
     Tid child;
     for( ;; ) {
         version(hang) if( child == Tid.init ) child = spawnLinked( &child_proc );
         try {
             if( child != Tid.init ) send( child, 42 );
             writeln( "service\t: waiting for A" );
             receive( ( A ){ writeln( "service\t: A received, sending B to main" ); } );
             send( main_tid, B() );
         }
         catch( LinkTerminated e ) {
             assert( e.tid == child );
             writeln( "service\t: link terminated" );
             child = Tid.init;
         }
         catch( OwnerTerminated ) {
             return;
         }
     }
}

void child_proc() {
     for( int i = 0; i < 2; ++i )
         receive( ( int ){} );
}
-------
Without -version=hang, no child thread is started from the service and
everything works fine. The output looks like this:

	service	: waiting for A
	main	: waiting for B

	service	: waiting for A
	.......................

	service	: waiting for A
	done: 200 iterations


If compiled with -version=hang (dmd v2.049; I tried both Windows and
Linux, and it makes no difference), the service starts a child thread
and the output is this:

	main	: waiting for B
	service	: waiting for A

	service	: waiting for A

	main	: waiting for B

	service	: waiting for A

	main	: waiting for B
	service	: link terminated
	service	: waiting for A
and the program hangs forever.

Sometimes it takes more messages, and sometimes it even works fine, but
in most cases the last A sent to the service before the LinkTerminated
exception is lost and never received.
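To make the symptom concrete, here is a purely hypothetical, single-threaded model of how a mailbox can lose a message when an exception escapes while its queue is being scanned. It is NOT the std.concurrency code and not necessarily what was changed to fix this bug; `Mailbox`, `Entry`, `LinkDead`, `receiveBuggy` and `receiveFixed` are invented names for illustration only. In the buggy variant the queue is detached before it is examined, so when the "linked thread died" notice triggers a throw, the A queued behind it is dropped. The fixed variant removes only the entry it consumes.

```d
// Hypothetical model only: NOT the Phobos std.concurrency implementation.
import std.stdio;

// Hypothetical stand-in for std.concurrency.LinkTerminated.
class LinkDead : Exception
{
    this() { super("linked thread terminated"); }
}

// A queue entry is either an ordinary message or a "link died" notice.
struct Entry
{
    bool linkDead;   // true for a "linked thread terminated" notice
    string payload;  // message contents for ordinary entries
}

struct Mailbox
{
    Entry[] queue;

    // Buggy variant: the whole queue is detached before scanning, so
    // everything still detached when LinkDead is thrown is lost.
    string receiveBuggy(string wanted)
    {
        Entry[] pending = queue;
        queue = null;
        Entry[] skipped;
        foreach (i, e; pending)
        {
            if (e.linkDead)
                throw new LinkDead; // skipped and pending[i + 1 .. $] are dropped
            if (e.payload == wanted)
            {
                queue = skipped ~ pending[i + 1 .. $];
                return e.payload;
            }
            skipped ~= e;
        }
        queue = skipped;
        return null; // a real receive() would block here instead
    }

    // Fixed variant: remove exactly the entry being consumed (message or
    // notice) before throwing or returning; all other entries stay queued.
    string receiveFixed(string wanted)
    {
        for (size_t i = 0; i < queue.length; ++i)
        {
            auto e = queue[i];
            if (e.linkDead || e.payload == wanted)
            {
                queue = queue[0 .. i] ~ queue[i + 1 .. $];
                if (e.linkDead)
                    throw new LinkDead;
                return e.payload;
            }
        }
        return null;
    }
}

void main()
{
    // The link-dead notice is queued just ahead of the A that main sent.
    auto buggy = Mailbox([Entry(true, null), Entry(false, "A")]);
    try { buggy.receiveBuggy("A"); }
    catch (LinkDead) { writeln("buggy: link terminated"); }
    auto lost = buggy.receiveBuggy("A");
    assert(lost is null); // A was dropped along with the notice

    auto fixed = Mailbox([Entry(true, null), Entry(false, "A")]);
    try { fixed.receiveFixed("A"); }
    catch (LinkDead) { writeln("fixed: link terminated"); }
    auto got = fixed.receiveFixed("A");
    assert(got == "A"); // A survived the exception
    writeln("fixed: A received");
}
```

In the real program the second receive() does not return null; it simply waits forever, which matches the hang shown above.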

I haven't filed the bug in Bugzilla yet, but if anyone can confirm the
problem, I'll file it.
Sep 30 2010
Sean Kelly <sean invisibleduck.org> writes:
osa Wrote:

 I've struggled with occasional hangs in my code using std.concurrency 
 for a while. Initially I thought it was my fault but it seems that 
 std.concurrency has a critical bug which makes it completely unusable, 
 at least for me. The problem is that in some situations message sent by 
 send() is never delivered to receive(). This happens when the thread on 
 receive() side has other linked threads (started by spawnLinked) and 
 those threads terminate causing LinkTerminated exception. And this 
 exception screws receiving thread queue completely.
Thanks for the repro. The bug is fixed in SVN rev 2078.
Sep 30 2010