
digitalmars.D - std.container.BinaryHeap + refCounted = WTF???

reply dsimcha <dsimcha yahoo.com> writes:
I'm trying to use BinaryHeap in a multithreaded program using
std.parallelism/parallelfuture.  I kept getting ridiculous segfaults and
stuff, and when I looked into it in a debugger, I realized the crashes had to
do with reference counting, probably caused by this line in BinaryHeap:

    private RefCounted!(Tuple!(Store, "_store", size_t, "_length"),
                       RefCountedAutoInitialize.no) _payload;

Why the heck are the payload and the length being stored in such a weird way?
 IMHO BinaryHeap shouldn't use reference counting unless Store does.  More
generally, anything that uses reference counting, especially where unexpected,
should come with a very strong warning in its ddoc so that people are aware of
the hidden caveats, like copying it using memcpy() and sharing it across
threads.
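[Editor's note: a minimal sketch of the hazard described above, using std.typecons.RefCounted directly rather than BinaryHeap; exact RefCounted construction details may vary across Phobos versions.]

```d
// Sketch: why a bitwise copy (memcpy) of a ref-counted struct crashes.
// RefCounted's postblit bumps the count on copy; memcpy skips the
// postblit, so two "owners" end up sharing a count of one.
import std.typecons : RefCounted;
import core.stdc.string : memcpy;

void main()
{
    auto a = RefCounted!int(42);   // count == 1
    RefCounted!int b = void;
    memcpy(&b, &a, a.sizeof);      // count still == 1, but two owners

    // On scope exit both destructors run: the first drops the count to
    // zero and frees the payload, the second touches freed memory --
    // the same class of crash as the segfaults described above.
}
```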
Nov 16 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/16/10 12:47 PM, dsimcha wrote:
 I'm trying to use BinaryHeap in a multithreaded programming using
 std.parallelism/parallelfuture.  I kept getting ridiculous segfaults and
 stuff, and when I looked into it in a debugger, I realized the crashes had to
 do with reference counting, probably caused by this line in BinaryHeap:

      private RefCounted!(Tuple!(Store, "_store", size_t, "_length"),
                         RefCountedAutoInitialize.no) _payload;

 Why the heck are the payload and the length being stored in such a weird way?
   IMHO BinaryHeap shouldn't use reference counting unless Store does.  More
 generally, anything that uses reference counting, especially where unexpected,
 should come with a very strong warning in its ddoc so that people are aware of
 the hidden caveats, like copying it using memcpy() and sharing it across
threads.

BinaryHeap, being a container, has reference semantics. In order to also be sealed, it needs to be reference counted. Whether the store itself uses reference counting is not relevant, because BinaryHeap has two fields.

Andrei
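[Editor's note: the reference semantics can be seen directly in a small sketch; `heapify` and the max-heap default are as documented for std.container.]

```d
import std.container : heapify;
import std.stdio : writeln;

void main()
{
    auto h1 = heapify([3, 1, 4]);  // BinaryHeap over an int[] store
    auto h2 = h1;                  // copies the handle only: the
                                   // ref-counted payload is shared
    h2.insert(9);
    writeln(h1.length);  // h1 sees the insertion made through h2
    writeln(h1.front);   // and the new maximum (default is a max-heap)
}
```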
Nov 16 2010
prev sibling next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-11-16 15:47:07 -0500, dsimcha <dsimcha yahoo.com> said:

 I'm trying to use BinaryHeap in a multithreaded programming using
 std.parallelism/parallelfuture.  I kept getting ridiculous segfaults and
 stuff, and when I looked into it in a debugger, I realized the crashes had to
 do with reference counting, probably caused by this line in BinaryHeap:
 
     private RefCounted!(Tuple!(Store, "_store", size_t, "_length"),
                        RefCountedAutoInitialize.no) _payload;
 
 Why the heck are the payload and the length being stored in such a weird way?
  IMHO BinaryHeap shouldn't use reference counting unless Store does.  More
 generally, anything that uses reference counting, especially where unexpected,
 should come with a very strong warning in its ddoc so that people are aware of
 the hidden caveats, like copying it using memcpy() and sharing it 
 across threads.

RefCounted, and many other things in Phobos, aren't thread-safe because of those reference counters. This problem can occur even if you don't share it with other threads, as the GC may destroy objects in another thread. See bug 4624: <http://d.puremagic.com/issues/show_bug.cgi?id=4624>

Also bug 4621 about the compiler not catching the problem at compile time (as it should catch low-level races): <http://d.puremagic.com/issues/show_bug.cgi?id=4621>

I suggest you try to make RefCounted use an atomic reference counter to see if it removes your segfaults. Andrei acknowledged the issue recently (Walter too, I think), so I trust it'll be fixed at some point. One idea was to change the GC so it knows in which thread it should call destructors...

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
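[Editor's note: a sketch of that suggestion: maintain the count with atomic operations so a destruction running on the GC's thread cannot race a copy on the owner's thread. The type and names below are illustrative, not Phobos API.]

```d
import core.atomic : atomicOp;

struct AtomicRefCounted(T)
{
    private static struct Impl
    {
        T payload;
        shared size_t count;
    }
    private Impl* impl;

    this(T value)
    {
        impl = new Impl(value, 1);
    }

    this(this)  // postblit: atomic increment instead of a plain ++
    {
        if (impl !is null)
            atomicOp!"+="(impl.count, 1);
    }

    ~this()     // atomic decrement; only the last owner releases
    {
        if (impl !is null && atomicOp!"-="(impl.count, 1) == 0)
            impl = null;  // let the GC reclaim it; no double-free race
    }
}
```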
Nov 16 2010
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 16 Nov 2010 15:47:07 -0500, dsimcha <dsimcha yahoo.com> wrote:

 I'm trying to use BinaryHeap in a multithreaded programming using
 std.parallelism/parallelfuture.  I kept getting ridiculous segfaults and
 stuff, and when I looked into it in a debugger, I realized the crashes  
 had to
 do with reference counting, probably caused by this line in BinaryHeap:

     private RefCounted!(Tuple!(Store, "_store", size_t, "_length"),
                        RefCountedAutoInitialize.no) _payload;

 Why the heck are the payload and the length being stored in such a weird  
 way?
  IMHO BinaryHeap shouldn't use reference counting unless Store does.   
 More
 generally, anything that uses reference counting, especially where  
 unexpected,
 should come with a very strong warning in its ddoc so that people are  
 aware of
 the hidden caveats, like copying it using memcpy() and sharing it across  
 threads.

I think in general containers don't work across multiple threads unless specifically designed to do that. dcollections containers would probably all fail if you tried to use them from multiple threads.

That being said, I'm not a huge fan of reference counting, period. Containers have no business being reference counted anyway, since their resource is memory, which should be handled by the GC. This doesn't mean pieces of a container shouldn't be reference counted or allocated via malloc or whatever, but the object's own lifetime should be managed by the GC, IMO. Not coincidentally, this is how dcollections is set up.

-Steve
Nov 16 2010
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 I think in general containers don't work across multiple threads unless
 specifically designed to do that.

I'm making the assumption that you'd handle all the synchronization issues yourself. When you need to update the container, there's an obvious issue.

In general I don't like the trend in D of building things into arrays, containers, etc. such that they can't be shared across threads due to obscure implementation details even when it looks safe. (For arrays, I'm referring to the appending issue, which is problematic when I try to append to an array from multiple threads, synchronizing manually.)
 dcollections containers would probably all fail if you tried to use them
  from multiple threads.
 That being said, I'm not a huge fan of reference counting period.
 Containers have no business being reference counted anyways, since their
 resource is memory, and should be handled by the GC.  This doesn't mean
 pieces of it shouldn't be reference counted or allocated via malloc or
 whatever, but the object itself's lifetime should be managed by the GC
 IMO.  Not coincidentally, this is how dcollections is set up.

I think reference counting is an elegant solution to a niche problem (namely, memory management of large containers that won't have circular references, when memory is tight), but given all the baggage it creates, I don't think it should be the default for any container. I think we need to start thinking about custom allocators: allow reference counting, but make the GC the default.
Nov 17 2010
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 I think that we need a wrapper for containers that implements the shared
 methods required and manually locks things in order to use them.  Then you
 apply this wrapper to any container type, and it's now a shared container.
 There are also certain types of containers that lend themselves to shared
 access.  For example, I can see a linked list where each node contains a
 lock being a useful type.

This is a good idea to some degree, but the thing is that it forces you to use shared even when you're going for fine-grained parallelism and want to just cowboy it. For fine-grained parallelism use cases, my hunch is that cowboying is going to be the only game in town for a long time in all languages, not just D.
 (For arrays, I'm referring to the appending issue, which is problematic
 when I try
 to append to an array from multiple threads, synchronizing manually.)

I'm interested what you mean here, I tried to make sure cross-thread appending is possible.
 dcollections containers would probably all fail if you tried to use them
  from multiple threads.



Ok, I stand corrected. It seemed to work in practice, but I always just assumed that it was a Bad Thing to do and worked for the Wrong Reasons.
 With something like a File it's different since you are coupling closing a file (a very limited resource) with cleaning memory (a relatively abundant resource).

Apparently you've never tried working with multigigabyte datasets using a conservative garbage collector and a 32-bit address space.
Nov 17 2010
parent reply Sean Kelly <sean invisibleduck.org> writes:
Steven Schveighoffer Wrote:
 
 There is specific code in array appending that locks a global lock when  
 appending to shared arrays.  Appending to __gshared arrays from multiple  
 threads likely will not work in some cases though.  I don't know how to  
 get around this, since the runtime is not made aware that the data is  
 shared.

The shared attribute will have to become a part of the TypeInfo, much like const is now. Knowing whether data is shared can affect where/how the memory block is allocated by the GC, etc.
Nov 17 2010
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 The issue is that if you append to such an array and it adds more pages in
 place, the block length location will move.  Since each thread caches its
 own copy of the block info, one will be wrong and look at array data
 thinking it's a length field.
 Even if you surround the appends with a lock, it will still cause problems
 because of the cache.  I'm not sure there's any way to reliably append to
 such data from multiple threads.
 -Steve

Would assumeSafeAppend() do the trick?
Nov 17 2010
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 The issue is that if you append to such an array and it adds more pages in place, the block length location will move. Since each thread caches its own copy of the block info, one will be wrong and look at array data thinking it's a length field.
 Even if you surround the appends with a lock, it will still cause problems because of the cache. I'm not sure there's any way to reliably append to such data from multiple threads.
 -Steve

Would assumeSafeAppend() do the trick?

No, that does not affect your cache. I probably should add a function to append without using the cache.
-Steve

I thought the whole point of assumeSafeAppend is that it puts the current ptr and length into the cache as-is.
Nov 17 2010
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 The issue is that if you append to such an array and it adds more pages in place, the block length location will move. Since each thread caches its own copy of the block info, one will be wrong and look at array data thinking it's a length field.
 Even if you surround the appends with a lock, it will still cause problems because of the cache. I'm not sure there's any way to reliably append to such data from multiple threads.
 -Steve

Would assumeSafeAppend() do the trick?

No, that does not affect your cache. I probably should add a function to append without using the cache.
-Steve

How about using std.array.Appender? This looks safe as far as I can tell, but I want to make sure there aren't any obscure implementation details that would prevent this from working, too.
Nov 29 2010
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On Mon, 29 Nov 2010 10:58:05 -0500, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 The issue is that if you append to such an array and it adds more pages in place, the block length location will move. Since each thread caches its own copy of the block info, one will be wrong and look at array data thinking it's a length field.
 Even if you surround the appends with a lock, it will still cause problems because of the cache. I'm not sure there's any way to reliably append to such data from multiple threads.
 -Steve

Would assumeSafeAppend() do the trick?

No, that does not affect your cache. I probably should add a function to append without using the cache.
-Steve

How about using std.array.Appender? This looks safe as far as I can tell, but I want to make sure there aren't any obscure implementation details that would prevent this from working, too.

That should be fine, std.array.Appender does not affect the LRU cache. In fact, if you try to do normal appending to an array built with Appender, it will automatically reallocate because the memory does not have the APPENDABLE bit set (a new GC bit I added to avoid misinterpretations). This is a good solution.
-Steve

Sounds like a good solution to me, too. As long as this situation is unlikely to change in the future, I think we should scrap the idea of making appending to __gshared arrays using ~= safe with manual synchronization, BUT the following needs to be documented:

1. Due to obscure implementation details, it is **not** safe to append to a non-shared array even if you synchronize manually.

2. If you're doing low-level shared-memory cowboy multithreading, Appender is the correct solution.

3. In general, any data structure, etc. in Phobos/druntime that can't be made thread-safe by using a synchronized/shared wrapper or manual synchronization due to obscure implementation details (especially if this would be safe with a textbook implementation) should come with a **bold** or ALL CAPS explicit warning against doing such things.
Nov 29 2010
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 1.  Due to obscure implementation details, it is **not** safe to append
 to a
 non-shared array even if you synchronize manually.

I think you meant __gshared, not non-shared. It's perfectly safe to append to non-shared arrays.
-Steve

No, I meant appending to non-shared arrays from multiple threads, i.e. __gshared or passed by reference between threads.
Nov 29 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Nov 2010 09:17:05 -0500, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 I think in general containers don't work across multiple threads unless
 specifically designed to do that.

I'm making the assumption that you'd handle all the synchronization issues yourself. When you need to update the container, there's an obvious issue.

In general I don't like the trend in D of building things into arrays, containers, etc. such that they can't be shared across threads due to obscure implementation details even when it looks safe.

I think that we need a wrapper for containers that implements the shared methods required and manually locks things in order to use them. Then you apply this wrapper to any container type, and it's now a shared container.

There are also certain types of containers that lend themselves to shared access. For example, I can see a linked list where each node contains a lock being a useful type.
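[Editor's note: a rough sketch of such a wrapper. The type and method names are made up for illustration; it forwards a couple of representative operations under a private mutex and could be instantiated with any container type that supports them.]

```d
import core.sync.mutex : Mutex;

struct Locked(C)
{
    private C impl;
    private Mutex mtx;

    this(C c)
    {
        impl = c;
        mtx = new Mutex;
    }

    void insert(E)(E e)
    {
        mtx.lock();
        scope(exit) mtx.unlock();
        impl.insert(e);   // every operation holds the container's lock
    }

    size_t length()
    {
        mtx.lock();
        scope(exit) mtx.unlock();
        return impl.length;
    }
}
```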
 (For arrays, I'm referring to the appending issue, which is problematic  
 when I try
 to append to an array from multiple threads, synchronizing manually.)

I'm interested what you mean here, I tried to make sure cross-thread appending is possible.
 dcollections containers would probably all fail if you tried to use them
  from multiple threads.
 That being said, I'm not a huge fan of reference counting period.
 Containers have no business being reference counted anyways, since their
 resource is memory, and should be handled by the GC.  This doesn't mean
 pieces of it shouldn't be reference counted or allocated via malloc or
 whatever, but the object itself's lifetime should be managed by the GC
 IMO.  Not coincidentally, this is how dcollections is set up.

I think reference counting is an elegant solution to a niche problem (namely, memory management of large containers that won't have circular references when memory is tight), but given all the baggage it creates, I don't think it should be the default for any container. I think we need to start thinking about custom allocators, and allow reference counting but make GC the default.

The memory management of a container's innards is open to reference counting (or whatever; I agree that allocators should be supported somewhere). I just object to reference counting of the container itself, as it's not important to me whether a container gets closed outside the GC automatically.

With something like a File it's different, since you are coupling closing a file (a very limited resource) with cleaning memory (a relatively abundant resource).

-Steve
Nov 17 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Nov 2010 10:14:21 -0500, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 I think that we need a wrapper for containers that implements the shared
 methods required and manually locks things in order to use them.  Then  
 you
 apply this wrapper to any container type, and it's now a shared  
 container.
 There are also certain types of containers that lend themselves to  
 shared
 access.  For example, I can see a linked list where each node contains a
 lock being a useful type.

This is a good idea to some degree, but the thing is that it forces you to use shared even when you're going for fine-grained parallelism and want to just cowboy it. For fine-grained parallelism use cases, my hunch is that cowboying is going to be the only game in town for a long time in all languages, not just D.

There is always the possibility of "cowboy"ing it. But I don't see that the standard lib should be catering to this.
 (For arrays, I'm referring to the appending issue, which is problematic when I try to append to an array from multiple threads, synchronizing manually.)

I'm interested what you mean here, I tried to make sure cross-thread appending is possible.
 dcollections containers would probably all fail if you tried to use them from multiple threads.

Ok, I stand corrected. It seemed to work in practice, but I always just assumed that it was a Bad Thing to do and worked for the Wrong Reasons.

There is specific code in array appending that locks a global lock when appending to shared arrays. Appending to __gshared arrays from multiple threads likely will not work in some cases though. I don't know how to get around this, since the runtime is not made aware that the data is shared.
 With something like a File it's different since you are coupling closing a file (a very limited resource) with cleaning memory (a relatively abundant resource).

Apparently you've never tried working with multigigabyte datasets using a conservative garbage collector and a 32-bit address space.

Is that supported by out-of-the-box containers? I would expect you need to create a special data structure to deal with such things. And no, I don't regularly work with such issues.

But my point is, reference counting the *container*, which uses some sort of memory allocation to implement its innards, is not coupling a limited resource to memory allocation/deallocation. In other words, I think it's better to have the container be a non-reference-counted type, even if you reference-count the elements. I prefer class semantics, to be quite honest, where explicit initialization is required.

-Steve
Nov 17 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Nov 2010 11:58:20 -0500, Sean Kelly <sean invisibleduck.org>  
wrote:

 Steven Schveighoffer Wrote:
 There is specific code in array appending that locks a global lock when
 appending to shared arrays.  Appending to __gshared arrays from multiple
 threads likely will not work in some cases though.  I don't know how to
 get around this, since the runtime is not made aware that the data is
 shared.

The shared attribute will have to become a part of the TypeInfo, much like const is now. Knowing whether data is shared can affect where/how the memory block is allocated by the GC, etc.

shared is part of it, but __gshared is not. Since __gshared is the hack to allow "bare metal" sharing, I don't see how it can be part of the type info.

The issue is that if you append to such an array and it adds more pages in place, the block length location will move. Since each thread caches its own copy of the block info, one will be wrong and look at array data thinking it's a length field. Even if you surround the appends with a lock, it will still cause problems because of the cache. I'm not sure there's any way to reliably append to such data from multiple threads.

-Steve
Nov 17 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 The issue is that if you append to such an array and it adds more pages  
 in
 place, the block length location will move.  Since each thread caches  
 its
 own copy of the block info, one will be wrong and look at array data
 thinking it's a length field.
 Even if you surround the appends with a lock, it will still cause  
 problems
 because of the cache.  I'm not sure there's any way to reliably append  
 to
 such data from multiple threads.
 -Steve

Would assumeSafeAppend() do the trick?

No, that does not affect your cache. I probably should add a function to append without using the cache.

-Steve
Nov 17 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 17 Nov 2010 13:58:55 -0500, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 The issue is that if you append to such an array and it adds more pages in place, the block length location will move. Since each thread caches its own copy of the block info, one will be wrong and look at array data thinking it's a length field.
 Even if you surround the appends with a lock, it will still cause problems because of the cache. I'm not sure there's any way to reliably append to such data from multiple threads.
 -Steve

Would assumeSafeAppend() do the trick?

No, that does not affect your cache. I probably should add a function to append without using the cache.
-Steve

I thought the whole point of assumeSafeAppend is that it puts the current ptr and length into the cache as-is.

All the cache does is store the block info -- block start, block size, and block flags. The length is stored in the block directly. The cache allows me to skip a call to the GC (and lock the GC's global mutex) by getting the block info directly from a small cache.

The block info is then used to determine where and how the "used" length is stored. Since the length is stored at the end, a change in block size in one cache while being unchanged in another cache can lead to problems.

assumeSafeAppend sets the "used" block length as the given array's length so the block can be used again for appending. It does not affect the cache.

Another option is to go back to the mode where the "used" length is stored at the beginning of large blocks (this caused alignment problems for some people).

-Steve
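[Editor's note: for reference, the documented single-threaded use of assumeSafeAppend, which is the only thing it is designed for; whether the second append really extends in place depends on the runtime.]

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    a.length = 2;          // block still records 4 "used" elements
    a ~= 99;               // reallocates rather than stomp the old tail

    int[] c = [1, 2, 3, 4];
    c.length = 2;
    c.assumeSafeAppend();  // resets the stored "used" length to 2
    c ~= 99;               // may now extend in place
    assert(c == [1, 2, 99]);
}
```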
Nov 17 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 29 Nov 2010 10:58:05 -0500, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 The issue is that if you append to such an array and it adds more pages in place, the block length location will move. Since each thread caches its own copy of the block info, one will be wrong and look at array data thinking it's a length field.
 Even if you surround the appends with a lock, it will still cause problems because of the cache. I'm not sure there's any way to reliably append to such data from multiple threads.
 -Steve

Would assumeSafeAppend() do the trick?

No, that does not affect your cache. I probably should add a function to append without using the cache.
-Steve

How about using std.array.Appender? This looks safe as far as I can tell, but I want to make sure there aren't any obscure implementation details that would prevent this from working, too.

That should be fine, std.array.Appender does not affect the LRU cache. In fact, if you try to do normal appending to an array built with Appender, it will automatically reallocate because the memory does not have the APPENDABLE bit set (a new GC bit I added to avoid misinterpretations). This is a good solution.

-Steve
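[Editor's note: Appender usage for reference. It manages storage private to itself, which is why a later ~= on the result falls back to reallocating; the `.data` accessor is per the Phobos of the time.]

```d
import std.array : appender;

void main()
{
    auto app = appender!(int[])();
    foreach (i; 0 .. 10)
        app.put(i);             // grows storage private to the appender
    int[] result = app.data;    // the array built so far
    assert(result.length == 10);

    result ~= 42;  // no APPENDABLE bit on the block: this reallocates,
                   // so it cannot corrupt the appender's storage
}
```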
Nov 29 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 29 Nov 2010 11:44:23 -0500, dsimcha <dsimcha yahoo.com> wrote:

 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On Mon, 29 Nov 2010 10:58:05 -0500, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On Wed, 17 Nov 2010 12:09:11 -0500, dsimcha <dsimcha yahoo.com> wrote:
 == Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 The issue is that if you append to such an array and it adds more pages in place, the block length location will move. Since each thread caches its own copy of the block info, one will be wrong and look at array data thinking it's a length field.
 Even if you surround the appends with a lock, it will still cause problems because of the cache. I'm not sure there's any way to reliably append to such data from multiple threads.
 -Steve

 Would assumeSafeAppend() do the trick?

 No, that does not affect your cache. I probably should add a function to append without using the cache.
 -Steve

 How about using std.array.Appender? This looks safe as far as I can tell, but I want to make sure there aren't any obscure implementation details that would prevent this from working, too.

 That should be fine, std.array.Appender does not affect the LRU cache. In fact, if you try to do normal appending to an array built with Appender, it will automatically reallocate because the memory does not have the APPENDABLE bit set (a new GC bit I added to avoid misinterpretations). This is a good solution.
 -Steve

Sounds like a good solution to me, too. As long as this situation is unlikely to change in the future, I think we should scrap the idea of making appending to __gshared arrays using ~= safe with manual synchronization,

I think it's worth giving people who want low-level access to the runtime functions the ability to avoid using the LRU cache. The cache is meant to support everyday coding, but can possibly cause problems if you are interested in doing low-level optimizations.
 BUT the following needs to be
 documented:

 1.  Due to obscure implementation details, it is **not** safe to append  
 to a
 non-shared array even if you synchronize manually.

I think you meant __gshared, not non-shared. It's perfectly safe to append to non-shared arrays.

-Steve
Nov 29 2010