digitalmars.D - RFC: @safe ref counting

reply Steven Schveighoffer <schveiguy gmail.com> writes:
In trying to make iopipe @safe, I came to the realization that if you
have auto-managed items such as files and the like (std.io Files and
Sockets are non-copyable), you need to rely on some form of @safe
reference counting. Unfortunately std.typecons.RefCounted is not and
cannot be safe. This is because it allocates in the C heap, and
deallocates regardless of whether anyone has ever squirreled away a
reference.

So I thought I'd make a refCounted struct that uses the GC [1]. The 
concept is simple -- allocate the refCounted payload in a GC block, then 
pin the block as a root. Once all references are gone, remove the root. 
But the memory stays behind to keep things memory safe (if, for example, 
you saved a pointer to it outside a reference count object). The memory 
will be in an initial state, but not invalid.

This means that if you include it in e.g. an array or a class, then it
should still work correctly (the memory, and anything it points at, is
guaranteed to be present).
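
For illustration, here is a minimal single-threaded sketch of the idea
described above. It is not the code from the linked refc.d: the names
GCRefCounted, Impl, create, get and Handle are hypothetical, the count
is not atomic, and cycles are not handled. It only shows the payload
living in a GC block that is pinned with GC.addRoot, and being destroyed
and unpinned (but not freed) when the last counted reference goes away.

```d
import core.memory : GC;
import std.conv : emplace;

// Hypothetical wrapper: payload lives in GC memory, pinned while counted
// references exist.
struct GCRefCounted(T)
{
    private static struct Impl
    {
        T payload;
        size_t count;
    }

    private Impl* impl;

    // Hypothetical factory; builds the payload in place inside a GC block.
    static GCRefCounted create(Args...)(auto ref Args args)
    {
        auto p = new Impl;                    // GC allocation
        emplace(&p.payload, args);            // construct payload in place
        p.count = 1;
        () @trusted { GC.addRoot(p); }();     // pin the block as a root
        GCRefCounted rc;
        rc.impl = p;
        return rc;
    }

    this(this)
    {
        if (impl !is null)
            ++impl.count;
    }

    ~this()
    {
        if (impl is null)
            return;
        if (--impl.count == 0)
        {
            // Release the resource now; the payload is reset to T.init.
            destroy(impl.payload);
            // Unpin only. The GC reclaims the block once nothing points to it.
            () @trusted { GC.removeRoot(impl); }();
        }
        impl = null;
    }

    ref T get()
    {
        assert(impl !is null, "empty reference");
        return impl.payload;
    }
}

// Stand-in for a non-copyable resource such as a file descriptor.
struct Handle
{
    int fd = -1;
    @disable this(this);
    ~this() { fd = -1; } // a real type would close the descriptor here
}

void main()
{
    auto a = GCRefCounted!Handle.create(3);
    int* escaped = &a.get().fd;   // pointer saved outside the wrapper
    {
        auto b = a;               // second counted reference
        assert(b.get().fd == 3);
    }
    assert(a.get().fd == 3);      // still alive: one reference left
    destroy(a);                   // last reference gone, resource released
    assert(*escaped == -1);       // memory is still valid, in its initial state
}
```

The escaped pointer never dangles in this sketch, because the block is
only unpinned, never freed explicitly.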

Of course, you can have cycles that prevent it from ever being cleaned
up. But most of the time, this is for automatically cleaning up stack
items. So maybe that's OK?

Let me know what you think. It sucks that we have no valid way to do 
reference counting in safe code, because std.io and iopipe highly depend 
on it.

-Steve

[1] https://github.com/schveiguy/iopipe/blob/makesafe/source/iopipe/refc.d
May 01
next sibling parent reply Meta <jared771 gmail.com> writes:
On Saturday, 2 May 2020 at 02:27:10 UTC, Steven Schveighoffer 
wrote:
 In trying to make iopipe @safe, I came to the realization that
 having auto-managed items such as files and the like (std.io
 Files and Sockets are non-copyable), you need to rely on some
 form of @safe reference counting. Unfortunately
 std.typecons.RefCounted is not and cannot be safe. This is 
 because it allocates in the C heap, and deallocates regardless 
 of whether anyone has ever squirreled away a reference.

 So I thought I'd make a refCounted struct that uses the GC [1]. 
 The concept is simple -- allocate the refCounted payload in a 
 GC block, then pin the block as a root. Once all references are 
 gone, remove the root. But the memory stays behind to keep 
 things memory safe (if, for example, you saved a pointer to it 
 outside a reference count object). The memory will be in an 
 initial state, but not invalid.

 This means that if you include it in e.g. an array or a class, 
 then it still should work correctly (the memory is guaranteed 
 to be present, and anything it points at).

 Of course, you can have cycles that prevent it ever from being 
 cleaned up. But most of the time, this is for auto cleaning up 
 stack items. So maybe that's OK?

 Let me know what you think. It sucks that we have no valid way 
 to do reference counting in safe code, because std.io and 
 iopipe highly depend on it.

 -Steve

 [1] 
 https://github.com/schveiguy/iopipe/blob/makesafe/source/iopipe/refc.d
Is it not enough to add a flag to RefCounted that tells it to destroy (but not deallocate) the ref counted object, instead of deallocating it? Then it could be conditionally @safe.
May 01
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/1/20 10:52 PM, Meta wrote:
 Is it not enough to add a flag to RefCounted that tells it to destroy 
 (but not deallocate) the ref counted object, instead of deallocating it? 
 Then it could be conditionally @safe.
You mean for Phobos? Phobos uses C malloc/free, so no. Something has to deallocate it. If you mean something else, I'm not sure what you mean.

-Steve
May 01
parent Meta <jared771 gmail.com> writes:
On Saturday, 2 May 2020 at 03:31:55 UTC, Steven Schveighoffer 
wrote:
 On 5/1/20 10:52 PM, Meta wrote:
 Is it not enough to add a flag to RefCounted that tells it to 
 destroy (but not deallocate) the ref counted object, instead 
 of deallocating it? Then it could be conditionally @safe.
You mean for Phobos? Phobos uses C malloc/free, so no. Something has to deallocate it. If you mean something else, I'm not sure what you mean. -Steve
Err... right. I wasn't thinking clearly.
May 01
prev sibling next sibling parent ikod <geller.garry gmail.com> writes:
On Saturday, 2 May 2020 at 02:27:10 UTC, Steven Schveighoffer 
wrote:
 In trying to make iopipe @safe, I came to the realization that
 having auto-managed items such as files and the like (std.io
 Files and Sockets are non-copyable), you need to rely on some
 form of @safe reference counting. Unfortunately
 std.typecons.RefCounted is not and cannot be safe. This is 
 because it allocates in the C heap, and deallocates regardless 
 of whether anyone has ever squirreled away a reference.

 So I thought I'd make a refCounted struct that uses the GC [1]. 
 The concept is simple -- allocate the refCounted payload in a 
 GC block, then pin the block as a root. Once all references are 
 gone, remove the root. But the memory stays behind to keep 
 things memory safe (if, for example, you saved a pointer to it 
 outside a reference count object). The memory will be in an 
 initial state, but not invalid.

 This means that if you include it in e.g. an array or a class, 
 then it still should work correctly (the memory is guaranteed 
 to be present, and anything it points at).

 Of course, you can have cycles that prevent it ever from being 
 cleaned up. But most of the time, this is for auto cleaning up 
 stack items. So maybe that's OK?

 Let me know what you think. It sucks that we have no valid way 
 to do reference counting in safe code, because std.io and 
 iopipe highly depend on it.
For my small memory-buffer management library I use the following solution: the library user can't hold raw pointers to memory (I understand this is not your case with files etc.). The user can only hold a unique_ptr to a mutable memory chunk, which can be filled with data (from a file or the network), and can convert this unique_ptr into a ref_counted (an immutable view of the same memory). The conversion destroys the unique_ptr, so it can't be used anymore. It is safe to keep ref_counted in arrays.

PS. It is better to make things @nogc from the beginning.
 -Steve

 [1] 
 https://github.com/schveiguy/iopipe/blob/makesafe/source/iopipe/refc.d
May 02
prev sibling next sibling parent reply Seb <seb wilzba.ch> writes:
On Saturday, 2 May 2020 at 02:27:10 UTC, Steven Schveighoffer 
wrote:
 In trying to make iopipe @safe, I came to the realization that
 having auto-managed items such as files and the like (std.io
 Files and Sockets are non-copyable), you need to rely on some
 form of @safe reference counting. Unfortunately
 std.typecons.RefCounted is not and cannot be safe. This is 
 because it allocates in the C heap, and deallocates regardless 
 of whether anyone has ever squirreled away a reference.

 [...]
Are you aware of the stalled work to get reference counting in druntime?

https://github.com/dlang/druntime/pull/2679
https://github.com/dlang/druntime/pull/2646
https://github.com/dlang/druntime/pull/2760
May 02
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/2/20 7:14 AM, Seb wrote:
 On Saturday, 2 May 2020 at 02:27:10 UTC, Steven Schveighoffer wrote:
 In trying to make iopipe @safe, I came to the realization that having
 auto-managed items such as files and the like (std.io Files and
 Sockets are non-copyable), you need to rely on some form of @safe
 reference counting. Unfortunately std.typecons.RefCounted is not and 
 cannot be safe. This is because it allocates in the C heap, and 
 deallocates regardless of whether anyone has ever squirreled away a 
 reference.

 [...]
Are you aware of the stalled work to get reference counting in druntime? https://github.com/dlang/druntime/pull/2679 https://github.com/dlang/druntime/pull/2646 https://github.com/dlang/druntime/pull/2760
No, but these are not what I'm focused on. I want a @safe API, I don't
care about @nogc in iopipe.

The problem with letting the GC clean up your non-memory resources is
that it may never happen, or it may happen at a much later time. A
process can have a lot of memory, but its open file descriptors are much
more limited. But the memory used to implement said resources I'm fine
leaving to the GC to clean up.

In other words, I want my files to close as soon as I no longer need
them, but the shell being used to store the file info (e.g. buffer etc)
can stick around while something has a pointer to it.

I don't know if reference counting without GC can be made @safe for D.
ikod has a good plan, don't give access to the actual data. But it falls
apart in practice, because then you can't use standard functions,
everything has to use reference counted pointers and arrays. Not only
that, but a generic library can easily give away the data without you
wanting it to. You have to be really cautious about how you use the
reference counting.

I like the plan of letting the GC ensure the memory is always valid. I'm
just not sure about the problem of cycles. That's the one place this
might fall down.

-Steve
May 02
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/2/20 2:16 PM, Steven Schveighoffer wrote:
 ikod has a good plan, don't give access to the actual data. But it falls 
 apart in practice, because then you can't use standard functions, 
 everything has to use reference counted pointers and arrays. Not only 
 that, but a generic library can easily give away the data without you 
 wanting it to. You have to be really cautious about how you use the 
 reference counting.
As an example, if you have something like:

    struct S
    {
        int[100] buf;
        @disable this(this);
        int[] opSlice() { return buf[]; }
    }

Now, reference counting this struct, you can prevent direct access to
the S instance, but just calling x[] now gives you naked access to the
data.

How do you solve this issue in D and still make it @safe without using
the GC?

-Steve
May 02
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 03/05/2020 6:23 AM, Steven Schveighoffer wrote:
 How do you solve this issue in D and still make it @safe without using
 the GC?
(head)const + DIP25/1000 would be a good place to begin. I would love to have a headconst tied to lifetimes. Most of the work has already been done, it just needs exposing in a nice way. I'm not sure if @live is the right way to do this though.
May 02
prev sibling parent reply ikod <geller.garry gmail.com> writes:
On Saturday, 2 May 2020 at 18:23:36 UTC, Steven Schveighoffer 
wrote:
 On 5/2/20 2:16 PM, Steven Schveighoffer wrote:
 ikod has a good plan, don't give access to the actual data. 
 But it falls apart in practice, because then you can't use 
 standard functions, everything has to use reference counted 
 pointers and arrays. Not only that, but a generic library can 
 easily give away the data without you wanting it to. You have 
 to be really cautious about how you use the reference counting.
As an example, if you have something like: struct S { int[100] buf; @disable this(this); int[] opSlice() { return buf[]; } } Now, reference counting this struct, you can prevent direct access to the S instance, but just calling x[] now gives you naked access to the data. How do you solve this issue in D and still make it @safe without using the GC?
Yes, there is no magic, and there are a lot of limitations and inconveniences, but at least I know where data can leak.

Here is a gist with a code sample and comments: https://gist.github.com/ikod/2c35851581b59677a0d9511812592df0
 -Steve
May 02
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/2/20 4:28 PM, ikod wrote:
 Yes, there is no magic, and there are a lot of limitations and
 inconveniences, but at least I know where data can leak.
 
 Here is a gist with a code sample and comments:
 https://gist.github.com/ikod/2c35851581b59677a0d9511812592df0
OK, so essentially you need to have a lot of @trusted escapes. I'm
looking for something that doesn't need that by default for useful code.
Though that does look correct in terms of memory safety.

My biggest problem with doing something like that is that, for instance,
an iopipe buffered output stream uses reference counting to ensure that
once all references to the output stream are done, the final data in the
buffer is flushed to the output. This is a perfect fit for reference
counting, but of course, it can't be made @safe because iopipe provides
direct buffer access (that is part of the design).

So I want something @safe that provides direct buffer access, and also
can clean up whatever needs cleaning synchronously (e.g. closing files,
flushing data, etc). Developers are not going to be keen on a buffered
file flushing its buffer at some time in the future (or never).

AND I would like it to be storable inside a GC block (many people don't
realize that std.typecons.RefCounted isn't valid to put in the GC when
you have multiple threads).

-Steve
May 02
prev sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Saturday, May 2, 2020 12:16:13 PM MDT Steven Schveighoffer via 
Digitalmars-d wrote:
 I like the plan of letting the GC ensure the memory is always valid. I'm
 just not sure about the problem of cycles. That's the one place this
 might fall down.
As I understand it, as long as nothing that the program still has access to refers to the objects with circular references, the cycle won't be a problem, and the GC will be able to collect them. I recall Andrei talking in the past about having reference counting which did basically what you're describing with the GC being left to handle the cycles.

- Jonathan M Davis
May 02
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 5/2/20 2:38 PM, Jonathan M Davis wrote:
 On Saturday, May 2, 2020 12:16:13 PM MDT Steven Schveighoffer via
 Digitalmars-d wrote:
 I like the plan of letting the GC ensure the memory is always valid. I'm
 just not sure about the problem of cycles. That's the one place this
 might fall down.
As I understand it, as long as nothing that the program still has access to refers to the objects with circular references, the cycle won't be a problem, and the GC will be able to collect them. I recall Andrei talking in the past about having reference counting which did basically what you're describing with the GC being left to handle the cycles.
This is different. I'm pinning the blocks so they won't be collected until all "appropriate" (e.g. "counted") references are no more. Essentially, what I want is ref counting for resource management, but instead of freeing the memory, I'm releasing it for the GC to clean up.

So there is definitely the possibility of cycles.

-Steve
May 02
prev sibling parent aliak <something something.com> writes:
On Saturday, 2 May 2020 at 11:14:23 UTC, Seb wrote:
 On Saturday, 2 May 2020 at 02:27:10 UTC, Steven Schveighoffer 
 wrote:
 In trying to make iopipe @safe, I came to the realization that
 having auto-managed items such as files and the like (std.io
 Files and Sockets are non-copyable), you need to rely on some
 form of @safe reference counting. Unfortunately
 std.typecons.RefCounted is not and cannot be safe. This is 
 because it allocates in the C heap, and deallocates regardless 
 of whether anyone has ever squirreled away a reference.

 [...]
Are you aware of the stalled work to get reference counting in druntime? https://github.com/dlang/druntime/pull/2679 https://github.com/dlang/druntime/pull/2646 https://github.com/dlang/druntime/pull/2760
Do you know what happened to the whole __mutable thing?
May 02
prev sibling parent Jon Degenhardt <jond noreply.com> writes:
On Saturday, 2 May 2020 at 02:27:10 UTC, Steven Schveighoffer 
wrote:
 In trying to make iopipe @safe, I came to the realization that
 having auto-managed items such as files and the like (std.io
 Files and Sockets are non-copyable), you need to rely on some
 form of @safe reference counting. Unfortunately
 std.typecons.RefCounted is not and cannot be safe. This is 
 because it allocates in the C heap, and deallocates regardless 
 of whether anyone has ever squirreled away a reference.
I was solving a much more constrained problem, but I wrote a couple of one-pass input ranges over a set of files, providing open file access to each in turn. This has the nice benefit of closing the file immediately when it is popFront'ed off the range. There's nothing preventing the caller from holding a copy of the underlying File object, but it is only open while it is the front element of the range.

If the concept might be of interest, the code is here:
* https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/utils.d#L1968
* https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/utils.d#L2344

--Jon
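
As a rough illustration of that concept (not the tsv-utils code linked
above), a one-pass input range over file paths might look like the
sketch below. The name FileSequence and its layout are hypothetical. It
relies on std.stdio.File.close closing the handle for every copy of the
File, so a copy kept by the caller refers to a closed file once the
range has moved on.

```d
import std.stdio : File, writeln;

// Hypothetical one-pass input range: only the front file is ever open.
struct FileSequence
{
    private string[] paths;
    private File current;

    this(string[] paths)
    {
        this.paths = paths;
        openFront();
    }

    bool empty() const { return paths.length == 0; }

    ref File front()
    {
        assert(!empty, "front called on an empty range");
        return current;
    }

    void popFront()
    {
        assert(!empty, "popFront called on an empty range");
        current.close();          // closes the handle for all File copies
        paths = paths[1 .. $];
        openFront();
    }

    private void openFront()
    {
        if (paths.length > 0)
            current = File(paths[0], "r");
    }
}

void main(string[] args)
{
    // Prints every line of each file named on the command line, in order.
    foreach (f; FileSequence(args[1 .. $]))
        foreach (line; f.byLine)
            writeln(line);
}
```

This closes each file deterministically without any reference counting
in user code, at the cost of only supporting sequential, one-at-a-time
access.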
May 02