digitalmars.D - Thread fails to start

reply Adam Conner-Sax <adam_conner_sax yahoo.com> writes:
As a way to learn D, I am writing a quick test setup for examining different
ways of passing data from one set of threads to another.  I am trying a few
kinds of queues (resizeable array with locking, linked list with locking and
lockfree with cas) and trying to also add message passing and then compare
performance.

Anyway, I'm running into an odd case where a thread fails to start.  The code
simply hangs in the ThreadGroup.create(...) call. I am printing (with
unbuffered i/o) right before the call to "create" and then as soon as the
thread function starts, so as far as I can tell, the "create" call is made but
the thread function never starts and "create" never returns.

It's repeatable but doesn't happen every time the queue is used. It happens
sometimes when the queue is locking and sometimes when it is lockfree. I'd be
happy to post code but for now I thought I'd just see if anyone can think of
why that might happen or can provide some ideas for how to debug it.

The code always starts the one consumer thread (via spawn; I'm learning!),
then loops over the producers and creates them in a ThreadGroup just so I can
do a "joinAll" to wait for them to finish.  Also, when the problem occurs,
it's always the first producer thread that fails.  I never get, e.g., 2 out of
4 started.

I am truly enjoying learning D.  Coming from a C++ and C# background, it's a
pleasure.

Any ideas would be appreciated.

Thanks!

Adam
Jan 01 2011
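
For readers unfamiliar with the pattern described in this post, here is a minimal sketch of creating threads in a core.thread ThreadGroup and waiting on them with joinAll. It is not the poster's code: startProducers and startedCount are hypothetical names, and each thread body only bumps a counter so the caller can see how many thread functions were actually entered.

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.thread : ThreadGroup;

// Hypothetical counter: how many thread bodies actually ran.
shared int startedCount;

// Create numProducers threads in one group, wait for all of them,
// and report how many of the thread functions were entered.
int startProducers(int numProducers)
{
    atomicStore(startedCount, 0);
    auto group = new ThreadGroup;
    foreach (k; 0 .. numProducers)
    {
        // create() constructs the thread, starts it, and adds it to the group.
        group.create({ atomicOp!"+="(startedCount, 1); });
    }
    group.joinAll(); // blocks until every thread in the group has finished
    return atomicLoad(startedCount);
}
```

Calling startProducers(4) from main should return 4 when every thread starts. A hang like the one described would show up as the call never returning.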
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Welcome. I suggest you post some code to serve as a basis for suggestions.

Andrei
Jan 01 2011
parent reply Adam Conner-Sax <adam_conner_sax yahoo.com> writes:
Okay, thanks. Was trying to avoid subjecting you all to my naive first try code...

    alias void delegate(in Data_Packet d) sender;

    void produce(sender send, Random r, int packets, int packets_per_s, int us_variability)
    {
        debug { printf("in produce(...)\n"); }
        int packet_spacing = 10000000/packets_per_s; //units are 100ns
        // make sure we can't have -tve sleep
        int packet_variability = min(10*us_variability,packet_spacing+1);
        packets--; // one extra produced after loop with signal to end
        int counter = 0;
        while (counter++ < packets) {
            Data_Packet data;
            data.pTime = systime();
            debug (5) { printf("Sending t=%i\n",data.pTime.value); }
            debug (3) { printf("S"); }
            send(data);
            Thread.sleep(packet_spacing + uniform(-packet_variability,packet_variability,r));
        }
        Data_Packet data;
        data.last = true;
        data.pTime = systime();
        send(data);
    }

    for (int k=0; k<num_producers_; ++k) {
        debug { printf("creating & starting producer\n"); }
        producer_threads[k] = producerThreads.create({produce(delegate void(in Data_Packet d) { q.produce(d); }, gen,packets_per_producer_,pps,microsecond_variability_);});
        debug { printf("started.\n"); }
    }

So, in the normal case, I get matched "creating & starting..." and "started" and "in produce()"

In the case where it hangs, all I get is "Creating and starting..." and then it stays forever (or as long as I have patience to leave it).

Full source attached
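
A side note on the sleep interval in produce, offered as a sketch rather than a diagnosis of the hang. As far as I can tell, capping the variability at packet_spacing+1 together with uniform's default half-open range still allows a result of -1. The version below clamps the jitter to the spacing itself and returns a Duration, which is what Thread.sleep takes in current druntime. The function name jitteredSpacing and its parameter names are hypothetical, and it assumes packetsPerSecond is positive.

```d
import core.time : Duration, hnsecs;
import std.algorithm.comparison : min;
import std.random : Random, uniform;

// Spacing between packets with symmetric random jitter, never negative.
// Internally everything is in hnsecs (100 ns units), as in the post.
Duration jitteredSpacing(int packetsPerSecond, int usVariability, ref Random rng)
{
    long spacing = 10_000_000 / packetsPerSecond; // hnsecs per packet
    long jitter = min(10L * usVariability, spacing); // 1 us == 10 hnsecs
    // Closed range [-jitter, jitter], so spacing + offset >= 0 always holds.
    long offset = uniform!"[]"(-jitter, jitter, rng);
    return hnsecs(spacing + offset);
}
```

A caller would then write something like Thread.sleep(jitteredSpacing(pps, variability, gen)), with pps, variability and gen standing in for whatever the surrounding code provides.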
Jan 02 2011
parent reply Sean Kelly <sean invisibleduck.org> writes:
Adam Conner-Sax Wrote:

 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 On 01/01/2011 06:02 PM, Adam Conner-Sax wrote:
 As a way to learn D, I am writing a quick test setup for examining different
 ways of passing data from one set of threads to another.  I am trying a few
 kinds of queues (resizeable array with locking, linked list with locking and
 lockfree with cas) and trying to also add message passing and then compare
 performance.

 Anyway, I'm running into an odd case where a thread fails to start.  The code
 simply hangs in the ThreadGroup.create(...) call. I am printing (with
 unbuffered i/o) right before the call to "create" and then as soon as the
 thread function starts, so as far as I can tell, the "create" call is made but
 the thread function never starts and "create" never returns.



What OS are you using? ThreadGroup.create is extremely simple, so the problem is almost definitely in either the thread startup code itself or in the preamble of your supplied thread routine. What would be great is if you could produce a minimal repro. The full source you included is a bit much to easily figure out where the problem may be.
 It's repeatable but doesn't happen every time the queue is used. It happens
 sometimes when the queue is locking and sometimes when it is lockfree. I'd be
 happy to post code but for now I thought I'd just see if anyone can think of
 why that might happen or can provide some ideas for how to debug it.

 The code always starts the one consumer thread (via spawn. I'm learning!),
 then loops over the producers and creates them in a threadgroup just so I can
 do a "joinAll" to wait for them to finish.  Also, when the problem occurs,
 it's always the first producer thread that fails.  I never get 2 out of 4
 started, e.g.



If you're using std.concurrency then don't start threads manually using core.thread. The mailbox for a thread is created by spawn, and you'll get a segfault trying to send a message to a thread started using core.thread. I could change this to throw an exception instead though.
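
To illustrate the advice above, here is a minimal sketch of a producer started with spawn instead of core.thread, so that std.concurrency sets up its mailbox. Packet is a hypothetical stand-in for the Data_Packet type in the posted code, and producer, consumeAll and runDemo are illustrative names. It assumes count is at least 1.

```d
import std.concurrency : receiveOnly, send, spawn, thisTid, Tid;

// Hypothetical packet type standing in for the thread's Data_Packet.
struct Packet
{
    int seq;
    bool last; // true on the final packet, as in the posted produce()
}

// Producer body. It is started by spawn, so it gets a mailbox of its own.
void producer(Tid consumer, int count)
{
    foreach (i; 0 .. count)
        send(consumer, Packet(i, i == count - 1));
}

// Receive packets in the calling thread until the one marked `last` arrives.
int consumeAll()
{
    int received = 0;
    for (;;)
    {
        auto p = receiveOnly!Packet();
        ++received;
        if (p.last)
            return received;
    }
}

// Spawn one producer that sends to the caller, then drain its packets.
int runDemo(int count)
{
    spawn(&producer, thisTid, count);
    return consumeAll();
}
```

Here the consumer is the calling thread, which keeps the sketch short. The same shape works with a spawned consumer whose Tid is passed to the producers.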
 So, in the normal case, I get matched "creating & starting..." and "started"
 and "in produce()"
 
 In the case where it hangs, all I get is "Creating and starting..." and then it
 stays forever (or as long as I have patience to leave it).

I tried this twice. The first time I got partway into the third test and got a bus error. The second time (run via GDB) I have this on my screen and it's been this way for a while now:

    F,LL,Lk 8.87 64.41 970.64 0 50566 1.91 4512.36
    starting consumer
    started.
    creating & starting producer
    in consume(...)
    in produce(...)

Seems different from what you've experienced.
Jan 02 2011
parent reply Adam Conner-Sax <adam_conner_sax yahoo.com> writes:
Thanks for trying it!

I've seen that outcome once also but usually I don't get the "in produce(...)"
when it hangs.  And I don't get the bus errors (I've gotten them other ways).

I get that I should use spawn (and I am writing a new version to use spawn
everywhere), though I did make it all work without ever using spawn so the
mailbox setup seems to work anyway though I do not understand how. Are both
(from spawner to spawnee and vice versa) mailboxes set up by spawn?  How does
the unit test in std.concurrency work?  It spawns but then sends messages in
both directions.

If I get the error on a smaller subset I will post.  It was all working then at
some point as I was moving toward the std.concurrency way, it all broke in the
way I described so I don't know how to make a smaller version that also has
the error.

Adam
Jan 02 2011
parent reply Sean Kelly <sean invisibleduck.org> writes:
Adam Conner-Sax Wrote:

 Thanks for trying it!
 
 I've seen that outcome once also but usually I don't get the "in produce(...)"
 when it hangs.  And I don't get the bus errors (I've gotten them other ways).
 
 I get that I should use spawn (and I am writing a new version to use spawn
 everywhere), though I did make it all work without ever using spawn so the
 mailbox setup seems to work anyway though I do not understand how. Are both
 (from spawner to spawnee and vice versa) mailboxes set up by spawn?  How does
 the unit test in std.concurrency work?  It spawns but then sends messages in
 both directions.

spawn creates the mailbox in the new thread. I'm going to change thisTid to create a mailbox for the current thread if none exists though. That should resolve the issue I mentioned earlier. The main thread gets a mailbox by default, if I remember correctly. I just didn't want to give other threads one by default because the ref is immediately overwritten by spawn.
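
A small sketch of messages going in both directions between a spawner and its spawnee, in the spirit of the question above. It is not the std.concurrency unit test itself. echoDoubled and roundTrip are hypothetical names, and it assumes roundTrip is called from a thread that already has a mailbox, such as the main thread.

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn, Tid;

// Child: wait for one int from whoever spawned us, reply with its double.
void echoDoubled()
{
    int n = receiveOnly!int();
    send(ownerTid, n * 2); // ownerTid is the Tid of the spawning thread
}

// Parent: spawn the child, send it a value, then wait for the reply.
int roundTrip(int n)
{
    Tid child = spawn(&echoDoubled);
    send(child, n);            // parent -> child, via the mailbox spawn created
    return receiveOnly!int();  // child -> parent, via the caller's own mailbox
}
```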
 If I get the error on a smaller subset I will post.  It was all working then at
 some point as I was moving toward the std.concurrency way, it all broke in the
 way I described so I don't know how to make a smaller version that also has
 the error.

I'll try to find some time to give your code a closer look. It seems like it could be used as a more thorough test suite for the thread and messaging code.
Jan 02 2011
parent reply Adam Conner-Sax <adam_conner_sax yahoo.com> writes:
Thanks!

It's OSX, by the way.

So it's clear, I understand that message passing is preferred and I can see how
to do that (kind of!) but I want to compare the performance to other queue
implementations so I can see that message passing is faster or comparable.

Originally, I was just trying to compare (avg and worst case latency) Locking
and LockFree queues (and learn the basics of using CAS) but then I wanted to
add message passing to the list to compare.  Combining them all in one test
harness is proving challenging.

That the main thread has a default mailbox makes sense and explains a lot.
Though I doubt I know enough to comment usefully, creating a mailbox if
"thisTid" is called makes sense to me.

Thanks again.

Adam
Jan 02 2011
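
Since CAS comes up in this post, here is a minimal sketch of the basic compare-and-swap retry loop using core.atomic. It is a shared counter rather than a lock-free queue, and it is not taken from the attached source. hits, casIncrement and hammer are hypothetical names.

```d
import core.atomic : atomicLoad, atomicStore, cas;
import core.thread : ThreadGroup;

// Hypothetical shared counter that several threads update without a lock.
shared int hits;

// Increment via compare-and-swap: retry until no other thread changed
// the value between our read and our write.
void casIncrement(ref shared int value)
{
    int seen;
    do
    {
        seen = atomicLoad(value);
    } while (!cas(&value, seen, seen + 1));
}

// Run `threads` threads, each doing `perThread` CAS increments,
// and return the final counter value.
int hammer(int threads, int perThread)
{
    atomicStore(hits, 0);
    auto group = new ThreadGroup;
    foreach (t; 0 .. threads)
        group.create({ foreach (i; 0 .. perThread) casIncrement(hits); });
    group.joinAll();
    return atomicLoad(hits);
}
```

If the loop is correct, no increments are lost, so hammer(4, 1000) should return 4000. A lock-free linked list applies the same read, compute, cas, retry shape to node pointers instead of an int.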
parent Adam Conner-Sax <adam_conner_sax yahoo.com> writes:
Okay.  Here's a working version.  It does all the hand-coded queues in
old-style threads and the message passing via spawn.

Code attached.

FWIW, message passing is (in terms of avg and max latency) on par with the
locking linked-list queues, but the lockfree linked-list queues are much faster
(6x to 7x) in all the scenarios I tried.  I'm running on a 2 x 2.8 GHz
Quad-Core Intel Xeon running Mac OS X.

Am I doing the message passing wrong somehow, or is it expected to underperform
lockfree queueing if transaction speed is all that matters?

D really is great.  I haven't had this much fun programming in a while.  Thanks!

Adam
Jan 02 2011