digitalmars.D.learn - Modify thread-local storage from parent thread

reply Kai Meyer <kai unixlords.com> writes:
I am playing with threading, and I am doing something like this:
         file.rawRead(bytes);
         auto tmpTask = task!do_something(bytes.idup);
         task_pool.put(tmpTask);
Is there a way to avoid the idup (or can somebody explain why idup here 
is not expensive?)

If the logic above is expressed as:
Read bytes into an array
Create a thread (task) to execute a function that takes a copy of 'bytes'
Execute the thread

I wonder if I could:
Create a thread (task)
Read bytes directly into the task's thread local storage
Execute the thread
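
For reference on what the idup costs: `idup` allocates a new GC array and copies every element into it, so each call is one allocation plus one copy of the block. A minimal sketch of that behavior follows; the helper name `snapshot` is made up for illustration and does not come from the code above.

```d
import std.stdio : writeln;

// Hypothetical helper: returns an immutable copy of the given bytes.
immutable(ubyte)[] snapshot(const(ubyte)[] bytes)
{
    return bytes.idup; // allocates a new array and copies the contents
}

void main()
{
    auto bytes = new ubyte[](4);
    bytes[] = 42;

    auto copy = snapshot(bytes);

    // The copy lives in its own memory block...
    assert(cast(const(void)*) copy.ptr !is cast(const(void)*) bytes.ptr);

    // ...so later writes to the read buffer do not affect it.
    bytes[0] = 0;
    assert(copy[0] == 42);

    writeln("copy is independent of the read buffer");
}
```

The cost therefore grows with the block size, which is why it only looks cheap for small reads.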
Aug 08 2011
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 08 Aug 2011 14:17:28 -0400, Kai Meyer <kai unixlords.com> wrote:

 I am playing with threading, and I am doing something like this:
          file.rawRead(bytes);
          auto tmpTask = task!do_something(bytes.idup);
          task_pool.put(tmpTask);
 Is there a way to avoid the idup (or can somebody explain why idup here  
 is not expensive?)

I'd have to see where bytes is created. If it's created in the same context, just casting to immutable is allowed, as long as you never use the mutable reference again.
 If the logic above is expressed as:
 Read bytes into an array
 Create a thread (task) to execute a function that takes a copy of 'bytes'
 Execute the thread

 I wonder if I could:
 Create a thread (task)
 Read bytes directly into the task's thread local storage
 Execute the thread

This *might* be possible. However, in many cases the OS is responsible for creating the TLS when the thread starts, so you have to wait until the thread is actually running to access it (I'm not an expert on this, but I think this is the case for everything but OSX). So you would have to create the thread, have it pause while you fill its TLS, then resume it.

But I think this is clearly a weird approach to this problem. Finding a way to reliably pass the data to the sub-thread seems more appropriate.

BTW, I've dealt with having to access other threads' TLS. It's not pretty, and I don't recommend it except in specialized situations (mine was adding a GC hook).

-Steve
Aug 08 2011
parent reply Kai Meyer <kai unixlords.com> writes:
On 08/08/2011 01:38 PM, Steven Schveighoffer wrote:
 I'd have to see where bytes is created. If it's created in the same context, just casting to immutable is allowed, as long as you never use the mutable reference again.

Well, bytes is in a loop, so casting to immutable wouldn't do it. The idea is to read a block of bytes and hand it off to a worker thread to operate on. Everything is working; I'm just trying to avoid allocating a block of bytes for the read and then allocating it again to pass it off to the worker thread. If I could get away with one allocation, I'd be happier.

-Kai Meyer
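
One way to get down to a single allocation per block, sketched under the assumption that a fresh buffer can be allocated on every loop iteration instead of reusing one: read into the new buffer, then convert it to immutable with `std.exception.assumeUnique`, which does not copy. The names `sumBlock`, `readBlock`, and the temporary file are made up for the demo; only `rawRead`, `task`, and `taskPool` correspond to calls discussed in the thread.

```d
import std.exception : assumeUnique;
import std.parallelism : task, taskPool;
import std.path : buildPath;
import std.stdio : File, writeln;
static import std.file;

// Hypothetical worker: adds up the bytes of one block.
ulong sumBlock(immutable(ubyte)[] block)
{
    ulong total = 0;
    foreach (b; block)
        total += b;
    return total;
}

// Reads up to blockSize bytes into a freshly allocated buffer and returns
// the filled part as immutable. No mutable reference survives this call,
// which is the condition for the conversion being safe.
immutable(ubyte)[] readBlock(File file, size_t blockSize)
{
    auto buffer = new ubyte[](blockSize);  // the only allocation per block
    auto filled = file.rawRead(buffer);    // slice of buffer that was filled
    return assumeUnique(filled);           // no copy, unlike idup
}

void main()
{
    // Demo input: a small temporary file (path and contents are made up).
    auto path = buildPath(std.file.tempDir(), "block_demo.bin");
    ubyte[] data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    std.file.write(path, data);
    scope (exit) std.file.remove(path);

    auto file = File(path, "rb");
    alias BlockTask = typeof(task!sumBlock((immutable(ubyte)[]).init));
    BlockTask[] tasks;

    for (;;)
    {
        auto block = readBlock(file, 4);
        if (block.length == 0)
            break;                         // end of file
        auto t = task!sumBlock(block);
        taskPool.put(t);
        tasks ~= t;
    }
    file.close();

    ulong total = 0;
    foreach (t; tasks)
        total += t.yieldForce;             // wait for each worker's result
    writeln(total);
}
```

This trades buffer reuse for one allocation per block; whether that beats a reused buffer plus idup depends on block size and GC pressure, so it is worth measuring.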
Aug 09 2011
parent Kai Meyer <kai unixlords.com> writes:
On 08/09/2011 10:27 AM, Steven Schveighoffer wrote:

 OK, there are other options. First, you could keep a "pool" of buffers, which are marked as shared. When you want to run a task, get one of those buffers, fill it, then pass the buffer to the task thread to process. Make sure the task thread puts the buffer back into the pool when it's done. I'd recommend casting the buffer to unshared while inside the task thread to save some cycles. This is probably the option I'd go with. Second, you can have the task thread give you its TLS buffer to read data into (you need to do some casting to get this around the type system). Note that in order for it to truly be stored in TLS, the buffer has to be a fixed-size array. -Steve

These are concepts that I'm only vaguely familiar with. I think I would like to try the "pool" of buffers. I can't say I know how to mark buffers as shared, though. Could you modify this for me, to show me an example?

    import std.parallelism;

    ubyte[8][] pool; // Dynamic array of (array of 8 bytes)

    void thread_func()
    {
        //my_pool[0] = 50;
    }

    void main(string[] args)
    {
        uint threads = 2;
        pool.length = threads;
        TaskPool taskpool = new TaskPool(threads);
        foreach(i; 0..threads) {
            auto tmpTask = task!thread_func();
            taskpool.put(tmpTask);
        }
        taskpool.stop();
    }
Aug 09 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 09 Aug 2011 11:36:13 -0400, Kai Meyer <kai unixlords.com> wrote:

 Well, bytes is in a loop, so casting to immutable wouldn't do it. The idea is to read a block of bytes and hand it off to a worker thread to operate on. Everything is working; I'm just trying to avoid allocating a block of bytes for the read and then allocating it again to pass it off to the worker thread. If I could get away with one allocation, I'd be happier.

OK, there are other options.

First, you could keep a "pool" of buffers, which are marked as shared. When you want to run a task, get one of those buffers, fill it, then pass the buffer to the task thread to process. Make sure the task thread puts the buffer back into the pool when it's done. I'd recommend casting the buffer to unshared while inside the task thread to save some cycles. This is probably the option I'd go with.

Second, you can have the task thread give you its TLS buffer to read data into (you need to do some casting to get this around the type system). Note that in order for it to truly be stored in TLS, the buffer has to be a fixed-size array.

-Steve
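
To make the TLS versus shared distinction concrete, here is a small sketch. It relies on the fact that module-level variables in D are thread-local unless marked `shared` (or `__gshared`); the variable and function names are made up for illustration.

```d
import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread;
import std.stdio : writeln;

ubyte[8] tlsBuffer;            // thread-local by default: one copy per thread
shared ubyte[8] sharedBuffer;  // a single copy visible to all threads

// Hypothetical thread body: touches both buffers.
void child()
{
    // A new thread starts with its own zero-initialized tlsBuffer.
    assert(tlsBuffer[0] == 0);
    tlsBuffer[0] = 2;                             // changes only this thread's copy
    atomicStore(sharedBuffer[0], cast(ubyte) 7);  // visible to the parent
}

void main()
{
    tlsBuffer[0] = 1;

    auto t = new Thread(&child);
    t.start();
    t.join();

    assert(tlsBuffer[0] == 1);                    // parent's copy is untouched
    assert(atomicLoad(sharedBuffer[0]) == 7);     // shared write is visible
    writeln("parent TLS untouched, shared buffer updated");
}
```

A fixed-size array such as `ubyte[8]` keeps its bytes inside the thread-local block itself. With a dynamic array only the pointer and length would be thread-local, while the data sits on the GC heap.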
Aug 09 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 09 Aug 2011 12:59:46 -0400, Kai Meyer <kai unixlords.com> wrote:

 These are concepts that I'm only vaguely familiar with. I think I would like to try the "pool" of buffers. I can't say I know how to mark buffers as shared, though. Could you modify this for me, to show me an example?

shared is just like const: you use a cast to mark something as shared. It can also be a storage class. So, for example, you can simply mark your pool as shared, and all threads can see it.

I'm not very familiar with std.parallelism, so I don't know how to pass the buffer (or its pool index) to the task thread. What you have to be careful about is that you somehow mark the pool buffers as being "used" by the thread. I'd recommend something like this:

    struct buffer
    {
        bool inUse;
        ubyte[8] buf; // ubyte, matching the ubyte[8] blocks in your pool
    }

Then use this as your pool:

    shared buffer[] pool; // this is now not in TLS, it's accessible from all threads.

Someone more familiar with std.parallelism can probably find a way to do this with parallel foreach.

-Steve
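
The pool idea can be sketched as follows. This is one possible arrangement, not a tested design: it assumes the buffer's pool index is what gets passed to the task, and it claims buffers with an atomic compare-and-swap on the flag. The names `Buffer`, `acquire`, `release`, and `process` are hypothetical; the fixed pool size of four and the two worker threads are arbitrary.

```d
import core.atomic : atomicLoad, atomicStore, cas;
import core.thread : Thread;
import std.parallelism : task, TaskPool;
import std.stdio : writeln;

struct Buffer
{
    shared bool inUse;  // claim flag, only touched with atomic operations
    ubyte[8] buf;       // payload
}

shared Buffer[4] pool;  // not in TLS: every thread sees the same four buffers

// Claims a free buffer and returns its index, or -1 if all are in use.
ptrdiff_t acquire()
{
    foreach (i; 0 .. pool.length)
        if (cas(&pool[i].inUse, false, true))
            return cast(ptrdiff_t) i;
    return -1;
}

// Hands a buffer back to the pool.
void release(size_t index)
{
    atomicStore(pool[index].inUse, false);
}

// Hypothetical worker: increments each byte in place, then frees the buffer.
void process(size_t index)
{
    // This thread owns the buffer until release(), so shared can be cast away.
    auto local = cast(ubyte[8]*) &pool[index].buf;
    foreach (ref b; *local)
        b += 1;
    release(index);
}

void main()
{
    auto workers = new TaskPool(2);

    foreach (job; 0 .. 8)
    {
        ptrdiff_t index;
        while ((index = acquire()) < 0)
            Thread.yield();                  // every buffer is busy, try again

        // Fill the claimed buffer; this stands in for file.rawRead.
        auto local = cast(ubyte[8]*) &pool[index].buf;
        (*local)[] = cast(ubyte) job;

        workers.put(task!process(cast(size_t) index));
    }

    workers.finish(true);                    // block until all queued tasks ran

    foreach (i; 0 .. pool.length)
        assert(!atomicLoad(pool[i].inUse));  // every buffer was returned
    writeln("all buffers returned to the pool");
}
```

The read buffers are allocated once, and the reader naturally slows down when all of them are busy. Spinning with `Thread.yield` keeps the sketch short; a condition variable or message passing would be kinder to the CPU.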
Aug 09 2011
prev sibling parent reply Ali =?iso-8859-1?q?=C7ehreli?= <acehreli yahoo.com> writes:
On Mon, 08 Aug 2011 12:17:28 -0600, Kai Meyer wrote:

 I am playing with threading, and I am doing something like this:
          file.rawRead(bytes);
          auto tmpTask = task!do_something(bytes.idup);
          task_pool.put(tmpTask);
 Is there a way to avoid the idup (or can somebody explain why idup here
 is not expensive?)
 
 If the logic above is expressed as:
 Read bytes into an array
 Create a thread (task) to execute a function that takes a copy of 'bytes'
 Execute the thread
 
 I wonder if I could:
 Create a thread (task)
 Read bytes directly into the task's thread local storage
 Execute the thread

I don't know what copies happen behind the scenes in the following code, but std.parallelism is great when threads don't need to interact with each other:

    import std.stdio;
    import std.parallelism;

    void main()
    {
        ubyte[8][10] buffers;

        foreach (i, ref buffer; parallel(buffers[])) {
            ubyte value = cast(ubyte)i;
            workWith(value, buffer);
        }

        writeln(buffers);
    }

    void workWith(ubyte value, ref ubyte[8] buffer)
    {
        foreach (ref b; buffer) {
            b = value;
        }
    }

Notes:

- I had to give buffers[] to parallel() as it calls popFront(), which my constant-size array can't provide. (Yes, I could have used a dynamic array.)

- Note the three ref's that I used; two of those are because constant-size arrays are value types.

Ali
Aug 09 2011
parent Ali =?iso-8859-1?q?=C7ehreli?= <acehreli yahoo.com> writes:
On Tue, 09 Aug 2011 20:37:04 +0000, Ali Çehreli wrote:

 I don't know what copies happen behind the scenes in the following code, but std.parallelism is great when threads don't need to interact with each other: [...]

The following is a program that uses std.concurrency. In this case the threads communicate with each other:

    import std.stdio;
    import std.concurrency;

    void main()
    {
        shared(ubyte[8])[10] buffers;

        /* Spawn the threads */
        foreach (i, ref buffer; buffers) {
            spawn(&worker, thisTid, i, &buffer);
        }

        /* Collect the results */
        foreach (i; 0 .. buffers.length) {
            size_t id = receiveOnly!size_t();
            writefln("thread %s is done", id);
        }

        writeln(buffers);
    }

    /* The buffer parameter is declared shared so that it matches the
       &buffer argument that main passes to spawn */
    void worker(Tid owner, size_t myId, shared(ubyte[8]) * buffer)
    {
        foreach (ref b; *buffer) {
            b = cast(ubyte)myId;
        }

        owner.send(myId);
    }

Ali
Aug 09 2011