www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Proposal: Thread-sealed containers

reply dsimcha <dsimcha yahoo.com> writes:
I'm thinking about ways to make std.concurrency's model more flexible 
without compromising safety.  It's sometimes useful to cheaply _move_ 
(as opposed to share or copy) data between threads.  I wonder if we 
could co-opt sealed containers here:

1.  Define basic data structures like arrays, hash tables and trees that 
are otherwise identical to sealed containers, but maintain a 
_thread-specific_ reference count.  This can be naively implemented as a 
hash table with Thread keys and size_t values, but there's probably a 
better way.  Provisionally call these "thread-sealed containers".

A thread-sealed container has the invariant that no two thread-specific 
reference counts may be simultaneously nonzero.  If this becomes untrue, 
an Error is thrown at runtime.  This is not as good as a compile time 
error, but it's much better than a low-level data race.

2.  Restrict the contents of thread-sealed containers to types that can 
be safely shared/moved between threads:  Primitives, shared and 
immutable data, and other thread-sealed containers.

3.  Define a std.concurrency.moveSend() function.  This function takes a 
thread-sealed container by reference, decrements the sending thread's 
reference count, enforces that it's zero, then sends the container as a 
message to the receiving thread.  The receiving thread increments its 
reference count of the container.
Mar 26 2011
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
dsimcha Wrote:

 I'm thinking about ways to make std.concurrency's model more flexible 
 without compromising safety.  It's sometimes useful to cheaply _move_ 
 (as opposed to share or copy) data between threads.  I wonder if we 
 could co-opt sealed containers here:

What definition of sealed are you using? A quick web search mostly brought up C#'s sealed keyword which I think is the same as D's final. I vaguely remember Andrei might have used the term to mean no aliases to the data held by the container?
 1.  Define basic data structures like arrays, hash tables and trees that 
 are otherwise identical to sealed containers, but maintain a 
 _thread-specific_ reference count.  This can be naively implemented as a 
 hash table with Thread keys and size_t values, but there's probably a 
 better way.  Provisionally call these "thread-sealed containers".

It might be possible to simply store the owning thread id. Since the thread that sent the container will still have a reference to it, you may be forced to do some extra runtime checks on every method call anyway...
 A thread-sealed container has the invariant that no two thread-specific 
 reference counts may be simultaneously nonzero.  If this becomes untrue, 
 an Error is thrown at runtime.  This is not as good as a compile time 
 error, but it's much better than a low-level data race.
 
 2.  Restrict the contents of thread-sealed containers to types that can 
 be safely shared/moved between threads:  Primitives, shared and 
 immutable data, and other thread-sealed containers.

Some kind of equivalent to assumeUnique may be needed here?
 3.  Define a std.concurrency.moveSend() function.  This function takes a 
 thread-sealed container by reference, decrements the sending thread's 
 reference count, enforces that it's zero, then sends the container as a 
 message to the receiving thread.  The receiving thread increments its 
 reference count of the container.

I like that you're thinking about this kind of stuff. I am over tired right now and haven't really thought this through, but it feels like this could lead to a major safety upgrade for std.concurrency.
Mar 26 2011
parent dsimcha <dsimcha yahoo.com> writes:
On 3/27/2011 2:47 AM, Jason House wrote:
 dsimcha Wrote:

 I'm thinking about ways to make std.concurrency's model more flexible
 without compromising safety.  It's sometimes useful to cheaply _move_
 (as opposed to share or copy) data between threads.  I wonder if we
 could co-opt sealed containers here:

What definition of sealed are you using? A quick web search mostly brought up C#'s sealed keyword which I think is the same as D's final. I vaguely remember Andrei might have used the term to mean no aliases to the data held by the container?

I was using Andrei's definition that had been floated around here previously. For those not already familiar, a sealed container is one where references/pointers to the elements cannot be obtained. This allows things like reference-counted memory management to be done safely.
 1.  Define basic data structures like arrays, hash tables and trees that
 are otherwise identical to sealed containers, but maintain a
 _thread-specific_ reference count.  This can be naively implemented as a
 hash table with Thread keys and size_t values, but there's probably a
 better way.  Provisionally call these "thread-sealed containers".

It might be possible to simply store the owning thread id. Since the thread that sent the container will still have a reference to it, you may be forced to do some extra runtime checks on every method call anyway...

 3.  Define a std.concurrency.moveSend() function.  This function takes a
 thread-sealed container by reference, decrements the sending thread's
 reference count, enforces that it's zero, then sends the container as a
 message to the receiving thread.  The receiving thread increments its
 reference count of the container.

I like that you're thinking about this kind of stuff. I am over tired right now and haven't really thought this through, but it feels like this could lead to a major safety upgrade for std.concurrency.

Now that I think about it some more, we don't need explicit thread-specific reference counts. All we need to prove is that we're dealing with a unique reference, i.e. that the reference count is 1. moveSend() then clears the sending thread's view of the container so that the sending thread no longer has access to it and sends it as a message to the receiving thread. The receiving thread is the new owner. The bottom line concept is that uniqueness is useful but difficult to prove statically. (To refresh people's memory uniqueness in this context means that a piece of data only has one pointer/reference pointing to it.) Static solutions will always be conservative and risk making the type system absurdly complex. Therefore, enforce uniqueness at runtime when it can be determined exactly via reference counting. The same principle may solve the problem of immutable data creation.
Mar 27 2011
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/26/2011 05:26 PM, dsimcha wrote:
 I'm thinking about ways to make std.concurrency's model more flexible
 without compromising safety. It's sometimes useful to cheaply _move_ (as
 opposed to share or copy) data between threads. I wonder if we could
 co-opt sealed containers here:

 1. Define basic data structures like arrays, hash tables and trees that
 are otherwise identical to sealed containers, but maintain a
 _thread-specific_ reference count. This can be naively implemented as a
 hash table with Thread keys and size_t values, but there's probably a
 better way. Provisionally call these "thread-sealed containers".

 A thread-sealed container has the invariant that no two thread-specific
 reference counts may be simultaneously nonzero. If this becomes untrue,
 an Error is thrown at runtime. This is not as good as a compile time
 error, but it's much better than a low-level data race.

 2. Restrict the contents of thread-sealed containers to types that can
 be safely shared/moved between threads: Primitives, shared and immutable
 data, and other thread-sealed containers.

 3. Define a std.concurrency.moveSend() function. This function takes a
 thread-sealed container by reference, decrements the sending thread's
 reference count, enforces that it's zero, then sends the container as a
 message to the receiving thread. The receiving thread increments its
 reference count of the container.

Sounds like a good idea. Given the restriction, instead of one reference count per thread you only need a pair threadid + refcount. Then the refcounts of all other threads are implicitly zero. Andrei
Mar 27 2011