
digitalmars.D.learn - Getting started with threads in D

reply "Henrik Valter Vogelius Hansson" <groogy groogy.se> writes:
Hi again!

I have looked around a little at what D offers, but I don't really know what I should use, since D offers several ways to use threads, some more high-level than others. I also don't really know which one would be suitable for me.

A little background could help. I am a game developer, and during my semester I want to experiment with making games in D. I use threads to separate some tasks that can easily work in parallel with each other, the most common being a Logic/Graphics separation. But as development progresses I usually add more threads; inside graphics, for example, I can end up with 2 or 3 more threads.

I want to avoid Amdahl's law as much as possible and keep the synchronization points as small as possible. The data exchange should be as basic as possible but still leave room for improvements and future additions.

The Concurrency library looked very promising, but it felt like the synchronization wouldn't be that nice, even though it would provide random access to the data in your code. Correct me, of course, if I am wrong. Is there a good thread pool system that could be used? Does that system also handle resolving dependencies in the work-flow? This is more or less what we use at my work.

In the worst-case scenario I will just use the basic thread class and implement my own system on top of that. Then there is the question: are there any pitfalls in the current library that I should be aware of?
Jun 16 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, June 17, 2012 03:15:44 Henrik Valter Vogelius Hansson wrote:
 Hi again!
 
 I have looked around a little with what D offers but don't know
 really what I should use since D offers several ways to use
 threads. Some more high level than others. Don't really also know
 which one would be suitable for me.
 
 A little background could help. I am a game developer and during
 my semester I want to experiment with making games in D. I use
 threads to separate some tasks that can easily work in parallel
 with each other. The most common being a Logic/Graphics
 separation. But as development progresses I usually add more
 threads like inside graphics I can end up with 2 or 3 more
 threads.
 
 I want to avoid Amdahl's law as much as possible and have as
 small synchronization nodes. The data exchange should be as basic
 as possible but still have room for improvements and future
 additions.
 
 The Concurrency library looked very promising but felt like the
 synchronization wouldn't be that nice but it would provide a
 random-access to the data in your code. Correct me of course if I
 am wrong. Is there a good thread pool system that could be used?
 Does that system also handle solving dependencies in the
 work-flow? This is what we use at my work more or less.
 
 In worst case scenario I will just use the basic thread class and
 implement my own system above that. Then there is the question,
 is there any pitfalls in the current library that I should be
 aware of?
For starters, read this:

http://www.informit.com/articles/article.aspx?p=1609144

And look at these modules in the standard library:

http://dlang.org/phobos/std_concurrency.html
http://dlang.org/phobos/std_parallelism.html

- Jonathan M Davis
Jun 16 2012
Russel Winder <russel winder.org.uk> writes:
On Sun, 2012-06-17 at 03:15 +0200, Henrik Valter Vogelius Hansson wrote:
 Hi again!
 I have looked around a little with what D offers but don't know
 really what I should use since D offers several ways to use
 threads. Some more high level than others. Don't really also know
 which one would be suitable for me.
My take on this is that as soon as an applications programmer talks about using threads in their program, they have admitted they are working at the wrong level. Applications programmers do not manage their control stacks, applications programmers do not manage their heaps, so why on earth manage your threads? Threads are an implementation resource best managed by an abstraction.

Using processes and message passing (over a thread pool, as you are heading towards in comments below) has proven over the last 30+ years to be the only scalable way of managing parallelism, so use it as a concurrency technique as well and get parallelism as near to free as it is possible to get. Ancient models and techniques such as actors, dataflow, CSP, and data parallelism are making a resurgence exactly because explicit shared-memory multi-threading is an inappropriate technique. It has just taken the world 15+ years to appreciate this.
 A little background could help. I am a game developer and during
 my semester I want to experiment with making games in D. I use
 threads to separate some tasks that can easily work in parallel
 with each other. The most common being a Logic/Graphics
 separation. But as development progresses I usually add more
 threads like inside graphics I can end up with 2 or 3 more
 threads.
I can only repeat the above: don't think in terms of threads and shared memory, think in terms of processes and messages passed between them.
 I want to avoid Amdahl's law as much as possible and have as
 small synchronization nodes. The data exchange should be as basic
 as possible but still have room for improvements and future
 additions.
Isn't the current hypothesis that you can't avoid Amdahl's Law? If what you mean is that you want to ensure you have an embarrassingly parallel solution so that speed-up is linear, that seems entirely reasonable, but then D has a play in this game with the std.parallelism module. It uses the term "task" rather than process or thread to try and enforce an algorithm-focused view. cf. http://dlang.org/phobos/std_parallelism.html
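For illustration only, a minimal std.parallelism sketch might look roughly like the following; the expensiveComputation function is just a made-up placeholder, not anything from Phobos:

import std.parallelism;
import std.stdio;

// Hypothetical CPU-bound helper, only here so there is something to run as a task.
long expensiveComputation(int n)
{
    long sum = 0;
    foreach (i; 0 .. n)
        sum += cast(long) i * i;
    return sum;
}

void main()
{
    // Data-parallel loop: iterations are spread over the default task pool.
    auto values = new long[1_000];
    foreach (i, ref v; parallel(values))
        v = cast(long)(i * i);

    // Explicit task: queue it on the pool, force the result when it is needed.
    auto t = task!expensiveComputation(10_000);
    taskPool.put(t);
    writeln("task result: ", t.yieldForce);
}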
 The Concurrency library looked very promising but felt like the
 synchronization wouldn't be that nice but it would provide a
 random-access to the data in your code. Correct me of course if I
 am wrong. Is there a good thread pool system that could be used?
 Does that system also handle solving dependencies in the
 work-flow? This is what we use at my work more or less.
What makes you say synchronization is not that nice? Random access, data, threads and parallelism in the same paragraph raise a red flag of warning! std.concurrency is a realization of actors so there is effectively a variety of thread pool involved. std.parallelism has task pools explicitly.
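A minimal spawn/send/receive sketch, just to show the shape of the std.concurrency API (the adder function is purely illustrative):

import std.concurrency;
import std.stdio;

// An "actor": a thread that owns its data and is driven entirely by messages.
void adder()
{
    // Wait for a message containing two ints, then send the sum back to the owner.
    receive((int a, int b) { ownerTid.send(a + b); });
}

void main()
{
    auto tid = spawn(&adder);
    tid.send(2, 3);                  // queued in the worker's message box
    auto sum = receiveOnly!int();    // wait for the reply
    writeln("sum = ", sum);          // prints 5
}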
 In worst case scenario I will just use the basic thread class and
 implement my own system above that. Then there is the question,
 is there any pitfalls in the current library that I should be
 aware of?
I am sure D's current offerings are not perfect but they do represent a good part of the right direction to be travelling. What is missing is a module for dataflow processing(*) and one for CSP. Sadly I haven't had time to get stuck into doing an implementation as I had originally planned 18 months or so ago: most of my time is now in the Python and Groovy arena as that is where the income comes from. cf. GPars (http://gpars.codehaus.org) and Python-CSP – though the latter has stopped moving due to planning a whole new Python framework for concurrency and parallelism.

(*) People who talk about "you can implement dataflow with actors and vice versa" miss the point about provision of appropriate abstractions with appropriate performance characteristics.

--
Russel.
Jun 17 2012
parent reply "Henrik Valter Vogelius Hansson" <groogy groogy.se> writes:
On Sunday, 17 June 2012 at 07:23:38 UTC, Russel Winder wrote:
 On Sun, 2012-06-17 at 03:15 +0200, Henrik Valter Vogelius 
 Hansson wrote:
 Hi again!
 
 I have looked around a little with what D offers but don't 
 know really what I should use since D offers several ways to 
 use threads. Some more high level than others. Don't really 
 also know which one would be suitable for me.
 My take on this is that as soon as an applications programmer talks about using threads in their program, they have admitted they are working at the wrong level. Applications programmers do not manage their control stacks, applications programmers do not manage their heaps, so why on earth manage your threads? Threads are an implementation resource best managed by an abstraction.

 Using processes and message passing (over a thread pool, as you are heading towards in comments below) has proven over the last 30+ years to be the only scalable way of managing parallelism, so use it as a concurrency technique as well and get parallelism as near to free as it is possible to get. Ancient models and techniques such as actors, dataflow, CSP, and data parallelism are making a resurgence exactly because explicit shared-memory multi-threading is an inappropriate technique. It has just taken the world 15+ years to appreciate this.
 A little background could help. I am a game developer and 
 during my semester I want to experiment with making games in 
 D. I use threads to separate some tasks that can easily work 
 in parallel with each other. The most common being a 
 Logic/Graphics separation. But as development progresses I 
 usually add more threads like inside graphics I can end up 
 with 2 or 3 more threads.
 I can only repeat the above: don't think in terms of threads and shared memory, think in terms of processes and messages passed between them.
 I want to avoid Amdahl's law as much as possible and have as 
 small synchronization nodes. The data exchange should be as 
 basic as possible but still have room for improvements and 
 future additions.
 Isn't the current hypothesis that you can't avoid Amdahl's Law? If what you mean is that you want to ensure you have an embarrassingly parallel solution so that speed-up is linear, that seems entirely reasonable, but then D has a play in this game with the std.parallelism module. It uses the term "task" rather than process or thread to try and enforce an algorithm-focused view. cf. http://dlang.org/phobos/std_parallelism.html
 The Concurrency library looked very promising but felt like 
 the synchronization wouldn't be that nice but it would provide 
 a random-access to the data in your code. Correct me of course 
 if I am wrong. Is there a good thread pool system that could 
 be used? Does that system also handle solving dependencies in 
 the work-flow? This is what we use at my work more or less.
 What makes you say synchronization is not that nice? Random access, data, threads and parallelism in the same paragraph raise a red flag of warning! std.concurrency is a realization of actors so there is effectively a variety of thread pool involved. std.parallelism has task pools explicitly.
 In worst case scenario I will just use the basic thread class 
 and implement my own system above that. Then there is the 
 question, is there any pitfalls in the current library that I 
 should be aware of?
 I am sure D's current offerings are not perfect but they do represent a good part of the right direction to be travelling. What is missing is a module for dataflow processing(*) and one for CSP. Sadly I haven't had time to get stuck into doing an implementation as I had originally planned 18 months or so ago: most of my time is now in the Python and Groovy arena as that is where the income comes from. cf. GPars (http://gpars.codehaus.org) and Python-CSP – though the latter has stopped moving due to planning a whole new Python framework for concurrency and parallelism.

 (*) People who talk about "you can implement dataflow with actors and vice versa" miss the point about provision of appropriate abstractions with appropriate performance characteristics.
Aight, been reading a lot now about it. I'm interested in the TaskPool, but there is a problem, and it is also why I have to think about threads: OpenGL/DirectX contexts are only valid for one thread at a time, and with the task pool I can't control which thread will be used for a given task, right? At least from what I could find I couldn't. So that's out of the question.

The concurrency library is... I don't know. I usually do a very fast synchronization swap (just swap two pointers), while the concurrency library seems like it would halt both threads for a longer time. Or am I viewing this from the wrong direction? Should I do it like lazy evaluation maybe? If you need code examples of what I am talking about I can give you that, though I don't know the code tag for this message board.

I will still use the task pool, I think, though all OpenGL calls will have to be routed so they are all done on the same thread somehow.

The message boxes for the threads in concurrency, are they thread safe? Let's say we have two logic tasks running in parallel and both are sending messages to the graphics thread. Would that result in undefined behavior, or does the concurrency library handle this kind of scenario for you?
Jun 22 2012
Sean Kelly <sean invisibleduck.org> writes:
On Jun 22, 2012, at 11:17 AM, Henrik Valter Vogelius Hansson wrote:
 Aight been reading a lot now about it. I'm interested in the TaskPool
 but there is a problem and also why I have to think about threads.
 OpenGL/DirectX contexts are only valid for one thread at a time. And
 with the task pool I can't control what thread to be used with the
 specified task right?

That's pretty much the entire point of a thread pool--it aims for optimal task completion time, and does this via an opaque scheduling mechanism.
 At least from what I could find I couldn't. So that's out of the
 question. The concurrency library is... I don't know. I most usually do
 a very fast synchronization swap (just swap two pointers) while the
 concurrency library seems like it would halt both threads for a longer
 time. Or am I viewing this from the wrong direction? Should I do it like
 lazy evaluation maybe? If you need code examples of what I am talking
 about I can give you that. Though I don't know the code-tag for this
 message board.

Games are an odd bird in that performance comes at the expense of much else, and that it really isn't easy to parallelize the main loop. That said, the only time the concurrency library would halt a thread is if you do a receive() with no timeout and the message you want isn't in the queue. So you can bypass this by using a timeout of 0 (basically a peek operation), and changing the code path based on whether the desired message was received.
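In code, that zero-timeout peek might look roughly like this; pollMessages and the int "command" message are only placeholders:

import std.concurrency;
import core.time : Duration;

// Hypothetical per-frame poll: handle whatever has arrived, but never block.
void pollMessages()
{
    bool handled;
    do
    {
        // A timeout of zero returns immediately; the return value says
        // whether one of the handlers below actually ran.
        handled = receiveTimeout(Duration.zero,
            (int command) { /* react to the command */ }
        );
    } while (handled);
}

void main()
{
    // Send ourselves a message so the poll has something to pick up.
    thisTid.send(123);
    pollMessages();
}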
 I will still use the task pool I think though all OpenGL calls will
 have to be routed so they are all done on the same thread somehow.

I think that will net you worse performance than if the main thread just did everything. You still have synchronous execution but thread synchronization on top of that. Can ownership of an OpenGL/DirectX context be passed between threads? Can you maybe just give every thread its own context and let it process whatever task you give to it, or is a context necessarily linked with some set of operations?
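For what it's worth, routing all GL work to a single owner thread via std.concurrency could look roughly like the sketch below; DrawCommand and the commented-out context call are purely hypothetical:

import std.concurrency;
import std.stdio;

// Hypothetical message type describing one piece of rendering work.
struct DrawCommand
{
    int meshId;
    int textureId;
}

// The render thread would create and own the GL context; no other thread touches it.
void renderThread()
{
    // createContext();  // hypothetical: context creation stays on this thread
    bool running = true;
    while (running)
    {
        receive(
            (DrawCommand c) { writeln("draw mesh ", c.meshId); /* issue GL calls here */ },
            (string s)      { if (s == "quit") running = false; }
        );
    }
}

void main()
{
    auto renderer = spawn(&renderThread);
    renderer.send(DrawCommand(1, 7));  // any thread can queue work this way
    renderer.send("quit");
}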
 The message box for the threads in concurrency, are they thread safe?
 Let's say we have two logic tasks running in parallel and both are
 sending messages to the graphics thread. Would that result in undefined
 behavior or does the concurrency library handle this kind of scenario
 for you?

Since it's a concurrency library, of course the API is thread safe :-) Basically, how receive() works is that it first looks in a thread-local queue for the desired message. If one wasn't found, it acquires a lock on that thread's shared message queue, moves the shared queue elements into the local queue, and releases the mutex. Then it scans the new elements in the list for a match. If it still doesn't find one, it re-acquires the mutex on the shared queue and does the same thing. If the shared queue is ever empty during this process, receive() will block on a condition variable up to the supplied timeout value.

The only performance issue with the concurrency API right now is that it allocates a struct to wrap each sent message, so there is some GC load. I experimented with using a shared free list instead, however, and it didn't really help performance in my test cases. I suspect I'd either have to go to a lock-free free list or some other fairly fancy approach. Beyond that, I've experimented with using ref and not using ref attributes for parameters everywhere applicable, etc. The current implementation is as fast as I could get things.

For future directions, I really want to add inter-process messaging. That means serialization support and a scalable socket implementation, though. Not to mention free time. I've considered just hacking together the implementation and limiting inter-process messages to concrete variables as a proof of concept. That would need just free time.
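As a much-simplified sketch of that two-queue idea (only an illustration of the pattern described above, not the actual std.concurrency implementation):

import core.sync.mutex : Mutex;
import core.sync.condition : Condition;
import core.time : Duration;

// Toy mailbox: an unsynchronized local queue for the owner plus a
// mutex-guarded shared queue that senders append to.
class Mailbox(T)
{
    private T[] local;     // only ever touched by the owning thread
    private T[] pending;   // filled by senders, guarded by mtx
    private Mutex mtx;
    private Condition cond;

    this()
    {
        mtx = new Mutex;
        cond = new Condition(mtx);
    }

    // Any thread may call this.
    void put(T msg)
    {
        synchronized (mtx)
        {
            pending ~= msg;
            cond.notify();
        }
    }

    // Only the owning thread calls this. Returns false on timeout.
    bool get(out T msg, Duration timeout)
    {
        if (takeLocal(msg))
            return true;                 // fast path: no locking at all
        synchronized (mtx)
        {
            if (pending.length == 0)
                cond.wait(timeout);      // block until a sender notifies, or time out
            local ~= pending;            // drain the shared queue into the local one
            pending.length = 0;
        }
        return takeLocal(msg);
    }

    private bool takeLocal(out T msg)
    {
        if (local.length == 0)
            return false;
        msg = local[0];
        local = local[1 .. $];
        return true;
    }
}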
Jun 22 2012
parent "Henrik Valter Vogelius Hansson" <groogy groogy.se> writes:
 Games are an odd bird in that performance comes at the expense 
 of much else, and that it really isn't easy to parallelize the 
 main loop.  That said, the only time the concurrency library 
 would halt a thread is if you do a receive() with no timeout 
 and the message you want isn't in the queue.  So you can bypass 
 this by using a timeout of 0 (basically a peek operation), and 
 changing the code path based on whether the desired message was 
 received.
Well, it also depends on how you do the receive. Right now I am thinking of something like lazy evaluation, so I only try to receive the messages (with a timeout) where I expect to use them, instead of doing it all in the same place. And the same goes on the other end. I might be over-thinking it, because it's starting to sound more and more like how I used to work before I tried task pools.

And I guess it won't be added in the near future that you can specify thread ids for a task? Like, all OpenGL-related tasks get a specific thread while for all the others it doesn't matter.
 Can ownership of an OpenGL/DirectX context be passed between 
 threads?  Can you maybe just give every thread its own context 
 and let it process whatever task you give to it, or is a 
 context necessarily linked with some set of operations?
Ownership can not be passed between threads. And giving every thread its own context is possible but bothersome, because, for instance, the different contexts would have different states (backface culling, depth settings, and so on). Plus it would be pretty slow, because I would have to call glFlush or similar to force the drivers to make sure all texture data has been updated for all threads, and so on. Most of these problems with contexts and threads I have learned the hard way :P

If you have an opinion on what you think would be the best way to do this then I am interested, even if it is single-threading it. But I want a motivation, of course. Otherwise I'll just go with the concurrency library and the lazy evaluation idea. I'll probably profile a little and consider what is easiest to work with and expand on later as well.
Jun 22 2012