digitalmars.D - Thread fails to start
- Adam Conner-Sax <adam_conner_sax yahoo.com> Jan 01 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jan 01 2011
- Adam Conner-Sax <adam_conner_sax yahoo.com> Jan 02 2011
- Sean Kelly <sean invisibleduck.org> Jan 02 2011
- Adam Conner-Sax <adam_conner_sax yahoo.com> Jan 02 2011
- Sean Kelly <sean invisibleduck.org> Jan 02 2011
- Adam Conner-Sax <adam_conner_sax yahoo.com> Jan 02 2011
- Adam Conner-Sax <adam_conner_sax yahoo.com> Jan 02 2011
As a way to learn D, I am writing a quick test setup for examining different ways of passing data from one set of threads to another. I am trying a few kinds of queues (resizable array with locking, linked list with locking, and lock-free with CAS), and I am also trying to add message passing and then compare performance.

Anyway, I'm running into an odd case where a thread fails to start. The code simply hangs in the ThreadGroup.create(...) call. I am printing (with unbuffered I/O) right before the call to "create" and then as soon as the thread function starts. As far as I can tell, the "create" call is made but the thread function never starts and "create" never returns.

It's repeatable but doesn't happen every time the queue is used. It happens sometimes when the queue is locking and sometimes when it is lock-free. I'd be happy to post code, but for now I thought I'd just see if anyone can think of why that might happen or can provide some ideas for how to debug it.

The code always starts the one consumer thread (via spawn. I'm learning!), then loops over the producers and creates them in a thread group just so I can do a "joinAll" to wait for them to finish. Also, when the problem occurs, it's always the first producer thread that fails. I never get, e.g., 2 out of 4 started.

I am truly enjoying learning D. Coming from a C++ and C# background, it's a pleasure. Any ideas would be appreciated.

Thanks!
Adam
Jan 01 2011
Welcome. I suggest you post some code to serve as a basis for suggestions. Andrei
Jan 01 2011
Okay, thanks. Was trying to avoid subjecting you all to my naive first try code...

alias void delegate(in Data_Packet d) sender;

void produce(sender send, Random r, int packets, int packets_per_s, int us_variability)
{
  debug { printf("in produce(...)\n"); }
  int packet_spacing = 10000000/packets_per_s; //units are 100ns
  // make sure we can't have -tve sleep
  int packet_variability = min(10*us_variability,packet_spacing+1);
  packets--; // one extra produced after loop with signal to end
  int counter = 0;
  while (counter++ < packets) {
    Data_Packet data;
    data.pTime = systime();
    debug (5) { printf("Sending t=%i\n",data.pTime.value); }
    debug (3) { printf("S"); }
    send(data);
    Thread.sleep(packet_spacing + uniform(-packet_variability,packet_variability,r));
  }
  Data_Packet data;
  data.last = true;
  data.pTime = systime();
  send(data);
}

for (int k=0; k<num_producers_; ++k) {
  debug { printf("creating & starting producer\n"); }
  producer_threads[k] = producerThreads.create({produce(delegate void(in Data_Packet d) { q.produce(d); }, gen,packets_per_producer_,pps,microsecond_variability_);});
  debug { printf("started.\n"); }
}

So, in the normal case, I get matched "creating & starting..." and "started" and "in produce()"

In the case where it hangs, all I get is "Creating and starting..." and then it stays forever (or as long as I have patience to leave it).

Full source attached
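A side note on the sleep arithmetic in produce(): as far as I can tell, clamping the variability to packet_spacing+1 while drawing from the half-open range of uniform(a, b) still allows a total of -1. The sketch below is one way to keep the delay non-negative. It is not the posted code: jitteredDelay is a hypothetical helper, and it assumes a current druntime/Phobos where Thread.sleep takes a core.time Duration.

```d
import core.time : Duration, hnsecs;
import std.algorithm.comparison : min;
import std.random : Random, uniform;

// Hypothetical helper, not part of the posted source.
// Returns the pause between two packets, in 100 ns units (hnsecs),
// with a symmetric random jitter that can never make the total negative.
Duration jitteredDelay(int packetsPerSecond, int usVariability, ref Random rng)
{
    // 10_000_000 hnsecs per second, so this is the nominal spacing.
    immutable long spacing = 10_000_000L / packetsPerSecond;

    // Clamp the jitter to the spacing itself, so spacing - jitter >= 0.
    immutable long jitter = min(10L * usVariability, spacing);

    // "[]" makes both bounds inclusive: the offset lies in [-jitter, jitter].
    immutable long offset = uniform!"[]"(-jitter, jitter, rng);

    return hnsecs(spacing + offset);
}

void main()
{
    auto rng = Random(42);
    foreach (i; 0 .. 1_000)
        assert(jitteredDelay(1_000, 5_000, rng) >= Duration.zero);
}
```

With a helper like this, the call in the loop would read Thread.sleep(jitteredDelay(packets_per_s, us_variability, r)). Nothing here is claimed to explain the hang; it only tightens the bound the comment in the original code asks for.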
Jan 02 2011
Adam Conner-Sax Wrote:
> As a way to learn D, I am writing a quick test setup for examining different ways of passing data from one set of threads to another. I am trying a few kinds of queues (resizeable array with locking, linked list with locking and lockfree with cas) and trying to also add message passing and then compare performance. Anyway, I'm running into an odd case where a thread fails to start. The code simply hangs in the Threadgroup.create(...) call. I am printing (with unbuffered i/o) right before the call to "create" and then as soon as the threadfunction starts so as far as I can tell, the "create" call is made but the threadfunc never starts and "create" never returns.

What OS are you using? ThreadGroup.create is extremely simple, the problem is almost definitely in either the thread startup code itself or in the preamble of your supplied thread routine. What would be great is if you could produce a minimal repro. The full source you included is a bit much to easily figure out where the problem may be.

> It's repeatable but doesn't happen every time the queue is used. It happens sometimes when the queue is locking and sometimes when it is lockfree. I'd be happy to post code but for now I thought I'd just see if anyone can think of why that might happen or can provide some ideas for how to debug it. The code always starts the one consumer thread (via spawn. I'm learning!), then loops over the producers and creates them in a threadgroup just so I can do a "joinAll" to wait for them to finish. Also, when the problem occurs, it's always the first producer thread that fails. I never get 2 out of 4 started, e.g.

If you're using std.concurrency then don't start threads manually using core.thread. The mailbox for a thread is created by spawn, and you'll get a segfault trying to send a message to a thread started using core.thread. I could change this to throw an exception instead though.

> So, in the normal case, I get matched "creating & starting..." and "started" and "in produce()" In the case where it hangs, all I get is "Creating and starting..." and then it stays forever (or as long as I have patience to leave it).
I tried this twice. The first time I got partway into the third test and got a bus error. The second time (run via GDB) I have this on my screen and it's been this way for a while now:

F,LL,Lk 8.87 64.41 970.64 0 50566 1.91 4512.36
starting consumer
started.
creating & starting producer
in consume(...)
in produce(...)

Seems different from what you've experienced.
Jan 02 2011
Thanks for trying it! I've seen that outcome once also but usually I don't get the "in produce(...)" when it hangs. And I don't get the bus errors (I've gotten them other ways).

I get that I should use spawn (and I am writing a new version to use spawn everywhere), though I did make it all work without ever using spawn so the mailbox setup seems to work anyway though I do not understand how. Are both (from spawner to spawnee and vice versa) mailboxes set up by spawn? How does the unit test in std.concurrency work? It spawns but then sends messages in both directions.

If I get the error on a smaller subset I will post. It was all working then at some point as I was moving toward the std.concurrency way, it all broke in the way I described so I don't know how to make a smaller version that also has the error.

Adam
Jan 02 2011
Adam Conner-Sax Wrote:
> Thanks for trying it! I've seen that outcome once also but usually I don't get the "in produce(...)" when it hangs. And I don't get the bus errors (I've gotten them other ways). I get that I should use spawn (and I am writing a new version to use spawn everywhere), though I did make it all work without ever using spawn so the mailbox setup seems to work anyway though I do not understand how. Are both (from spawner to spawnee and vice versa) mailboxes set up by spawn? How does the unit test in std.concurrency work? It spawns but then sends messages in both directions.

spawn creates the mailbox in the new thread. I'm going to change thisTid to create a mailbox for the current thread if none exists though. That should resolve the issue I mentioned earlier. The main thread gets a mailbox by default, if I remember correctly. I just didn't want to give other threads one by default because the ref is immediately overwritten by spawn.

> If I get the error on a smaller subset I will post. It was all working then at some point as I was moving toward the std.concurrency way, it all broke in the way I described so I don't know how to make a smaller version that also has the error.
I'll try to find some time to give your code a closer look. It seems like it could be used as a more thorough test suite for the thread and messaging code.
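For readers following the mailbox discussion, here is a minimal sketch of two-way messaging where the worker is started with spawn rather than core.thread, so it has a mailbox from the start. The names worker and roundTrip are hypothetical, and the sketch assumes a current Phobos (ownerTid was not available when this thread was written).

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn, Tid;

// Hypothetical worker: doubles every non-negative int it receives and
// sends the result back to the thread that spawned it. A negative value
// is the signal to stop.
void worker()
{
    for (;;)
    {
        immutable n = receiveOnly!int();
        if (n < 0)
            break;
        ownerTid.send(n * 2);
    }
}

// Spawns a worker, sends it one value, waits for the reply, then shuts
// the worker down.
int roundTrip(int value)
{
    Tid tid = spawn(&worker); // spawn sets up the new thread's mailbox
    tid.send(value);          // owner to worker
    immutable reply = receiveOnly!int(); // worker to owner
    tid.send(-1);             // ask the worker to exit
    return reply;
}

void main()
{
    assert(roundTrip(21) == 42);
}
```

The owner here is the main thread, which already has a mailbox, so messages flow in both directions without any manual setup.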
Jan 02 2011
Thanks! It's OSX, by the way. So it's clear, I understand that message passing is preferred and I can see how to do that (kind of!) but I want to compare the performance to other queue implementations so I can see that message passing is faster or comparable. Originally, I was just trying to compare (avg and worst case latency) Locking and LockFree queues (and learn the basics of using CAS) but then I wanted to add message passing to the list to compare. Combining them all in one test harness is proving challenging. That the main thread has a default mailbox makes sense and explains a lot. Though I doubt I know enough to comment usefully, creating a mailbox if "thisTid" is called makes sense to me. Thanks again. Adam
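Since the basics of CAS come up here, this is a small sketch of the usual compare-and-swap retry loop, driven by threads from a ThreadGroup and waited on with joinAll. It is an illustration only, not the attached queue code: counter, casIncrement and runWorkers are hypothetical names, and a shared integer stands in for the queue's head pointer.

```d
import core.atomic : atomicLoad, atomicStore, cas;
import core.thread : ThreadGroup;

shared int counter; // hypothetical shared state, stands in for a queue head

// Classic CAS retry loop: read the current value, compute the new one,
// and publish it only if nobody changed the value in between.
void casIncrement(ref shared int value)
{
    int seen;
    do
    {
        seen = atomicLoad(value);
    } while (!cas(&value, seen, seen + 1));
}

// Starts `workers` threads that each increment the counter `perWorker`
// times, waits for all of them, and returns the final count.
int runWorkers(int workers, int perWorker)
{
    atomicStore(counter, 0);
    auto group = new ThreadGroup;
    foreach (i; 0 .. workers)
    {
        group.create({
            foreach (j; 0 .. perWorker)
                casIncrement(counter);
        });
    }
    group.joinAll(); // block until every worker has finished
    return atomicLoad(counter);
}

void main()
{
    assert(runWorkers(4, 1_000) == 4_000);
}
```

A lock-free linked list uses the same loop shape, with the node pointer taking the place of the integer; that variant needs more care (for example around node reuse), which this sketch does not attempt to cover.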
Jan 02 2011
Okay. Here's a working version. It does all the hand-coded queues in old-style threads and the message passing via spawn. Code attached.

FWIW, message passing is (in terms of avg and max latency) on par with the locking linked-list queues, but the lock-free linked-list queues are much faster (6x to 7x) in all the scenarios I tried. I'm running on a 2 x 2.8 GHz Quad-Core Intel Xeon running Mac OS X.

Am I doing the message passing wrong somehow, or is it expected to underperform lock-free queueing if transaction speed is all that matters?

D really is great. I haven't had this much fun programming in a while.

Thanks!
Adam
Jan 02 2011