www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Orphan ranges

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I'm making good progress on an allocator design. If things come together 
as I hope, it'll kick some serious ass.

I'm currently looking at four allocation models:

* straight GC, safe (no deallocation)
* GC + the ability to free memory
* malloc/free, unsafe, buyer beware (STL-level safety)
* reference counted (based on either malloc or GC+free)
* region (scoped)

I need to kink out a few details, most important being - is safety part 
of the allocator or an extricable property that's a different policy? 
For now I have a related question about orphan ranges.

Consider this motivating example:

void main()
{
     int[] x;
     {
         auto b = new int[20];
         x = b[5 .. $ - 5];
     }
     ... use x ...
}

The range x is a 10-elements range that originates in a 20-element 
array. There is no safe way to access the original array again, so in a 
sense the other 10 elements are "lost". That's why I call x an orphan 
range - a range of which original container is gone.

Built-in arrays work routinely like that, and in fact the originating 
arrays are not distinguished by type in any way from their ranges (be 
they orphan or not). The question is, what should std.container do about 
orphan ranges in general? Should it allow them, disallow them, or leave 
the decision open (e.g. to be made by the allocator)? Depending on what 
way we go, the low-level design would be quite different.


Thanks,

Andrei
Apr 15 2012
next sibling parent "Tove" <tove fransson.se> writes:
On Sunday, 15 April 2012 at 16:34:12 UTC, Andrei Alexandrescu 
wrote:
 I'm making good progress on an allocator design. If things come 
 together as I hope, it'll kick some serious ass.

 I'm currently looking at four allocation models:

 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)

 I need to kink out a few details, most important being - is 
 safety part of the allocator or an extricable property that's a 
 different policy? For now I have a related question about 
 orphan ranges.

 Consider this motivating example:

 void main()
 {
     int[] x;
     {
         auto b = new int[20];
         x = b[5 .. $ - 5];
     }
     ... use x ...
 }

 The range x is a 10-elements range that originates in a 
 20-element array. There is no safe way to access the original 
 array again, so in a sense the other 10 elements are "lost". 
 That's why I call x an orphan range - a range of which original 
 container is gone.

 Built-in arrays work routinely like that, and in fact the 
 originating arrays are not distinguished by type in any way 
 from their ranges (be they orphan or not). The question is, 
 what should std.container do about orphan ranges in general? 
 Should it allow them, disallow them, or leave the decision open 
 (e.g. to be made by the allocator)? Depending on what way we 
 go, the low-level design would be quite different.


 Thanks,

 Andrei

Wow, cool! To flat out disallow Orphan ranges is imho too restrictive, especially considering we already have a safe solution, what is missing is an unsafe(no overhead) version. If the design can handle safe/unsafe as a policy for a stack(scoped) based allocator... that would be kick ass, indeed.
Apr 15 2012
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-04-15 18:34, Andrei Alexandrescu wrote:
 I'm making good progress on an allocator design. If things come together
 as I hope, it'll kick some serious ass.

 I'm currently looking at four allocation models:

 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)

 I need to kink out a few details, most important being - is safety part
 of the allocator or an extricable property that's a different policy?
 For now I have a related question about orphan ranges.

 Consider this motivating example:

 void main()
 {
 int[] x;
 {
 auto b = new int[20];
 x = b[5 .. $ - 5];
 }
 ... use x ...
 }

 The range x is a 10-elements range that originates in a 20-element
 array. There is no safe way to access the original array again, so in a
 sense the other 10 elements are "lost". That's why I call x an orphan
 range - a range of which original container is gone.

 Built-in arrays work routinely like that, and in fact the originating
 arrays are not distinguished by type in any way from their ranges (be
 they orphan or not). The question is, what should std.container do about
 orphan ranges in general? Should it allow them, disallow them, or leave
 the decision open (e.g. to be made by the allocator)? Depending on what
 way we go, the low-level design would be quite different.


 Thanks,

 Andrei

As you say, built-in arrays work like this. Therefore at least an allocator using the GC should allow this. -- /Jacob Carlborg
Apr 15 2012
prev sibling next sibling parent reply Chad J <chadjoan __spam.is.bad__gmail.com> writes:
On 04/15/2012 12:34 PM, Andrei Alexandrescu wrote:
 I'm making good progress on an allocator design. If things come together
 as I hope, it'll kick some serious ass.

 I'm currently looking at four allocation models:

 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)

 I need to kink out a few details, most important being - is safety part
 of the allocator or an extricable property that's a different policy?
 For now I have a related question about orphan ranges.

 ...
 Thanks,

 Andrei

How about * User-defined allocator ?? At the very least, I think we should be able to choose from the above strategies for a given container. If this isn't done, then there will always be people wanting to re-implement the containers just to accomplish silly stuff like allocating a little differently. As for safety: I would probably want to be able to tweak the ways my containers are safe/unsafe. I'd want safety by default, with the ability to choose my vulnerabilities. "Safety" is a fairly vague notion, since there are a number of things that could cause memory corruption or undefined behavior. Allowing me to decide which risk factors to take allows me to decide which causes of corruption/undefined behavior I want to expose myself to. Once there's a sufficiently large panel of twiddley knobs and toggle switches, then it might make sense to create "policies" like "safe" vs "blazing fast". As for orphan ranges: I've always written under the assumption that an array like 'x' would keep 'b' around. It would be nice if large amounts of memory wasted from orphans could be reused in some way, but not at the expense of a lot of speed. If large waste from orphans sticks around, a warning in the docs would be nice. Other note: For the longest time I've thought that D should do reference counting for types that can be proven to have no reference cycles. I don't understand why this is difficult, but even if it is it might be worth it. I don't actually need it personally, but I constantly hear complaints about "oh, D requires garbage collection because you can't use most of Phobos without it". What they really want, I bet, is deterministic memory management for realtime systems that does not cause thread contention/world stops. Add reference counting and this will suddenly be doable for what I'd image would be a lot of important stuff in Phobos (think CoW string functions--probably reference counter friendly). At this point the "reference counted" option gets subsumed into the "straight GC" / "GC+free" options, because the language/compiler would decide at compile-time which strategy to use for a range. I see D's current coverage of memory management techniques looking like this: - GC garbage collection: allocation for objects with complicated lifespans, with the caveat of complex allocation/deallocation code (check, not %100 featureful) - malloc/free for mediocre-speed allocation of objects whose lifespans can't be easily calculated but are known by the programmer (check) - custom allocators for when everything else fails (check, in most cases) - memory reuse with slices/immutability which gives us an awesome zero-overhead solution in a number of situations (check) - reference counting for almost-deterministic allocation of objects with easily calculated lifespans (**D does not cover this case!**) - stack for fast allocation of short-lived objects (check)
Apr 15 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/15/12 12:50 PM, Chad J wrote:
 How about
 * User-defined allocator
 ??

Yah, the allocator would be a generic parameter. Any allocator that supports a compatible interface with the archetypes used would work. But making e.g. file-backed allocators or whatnot may be difficult.
 As for safety:
 I would probably want to be able to tweak the ways my containers are
 safe/unsafe. I'd want safety by default, with the ability to choose my
 vulnerabilities. "Safety" is a fairly vague notion, since there are a
 number of things that could cause memory corruption or undefined
 behavior. Allowing me to decide which risk factors to take allows me to
 decide which causes of corruption/undefined behavior I want to expose
 myself to. Once there's a sufficiently large panel of twiddley knobs and
 toggle switches, then it might make sense to create "policies" like
 "safe" vs "blazing fast".

Makes sense.
 As for orphan ranges:
 I've always written under the assumption that an array like 'x' would
 keep 'b' around. It would be nice if large amounts of memory wasted from
 orphans could be reused in some way, but not at the expense of a lot of
 speed. If large waste from orphans sticks around, a warning in the docs
 would be nice.

So orphans should be allowed. Wonder if that should be an option or just "it".
 Other note:
 For the longest time I've thought that D should do reference counting
 for types that can be proven to have no reference cycles. I don't
 understand why this is difficult, but even if it is it might be worth
 it. I don't actually need it personally, but I constantly hear
 complaints about "oh, D requires garbage collection because you can't
 use most of Phobos without it". What they really want, I bet, is
 deterministic memory management for realtime systems that does not cause
 thread contention/world stops. Add reference counting and this will
 suddenly be doable for what I'd image would be a lot of important stuff
 in Phobos (think CoW string functions--probably reference counter
 friendly). At this point the "reference counted" option gets subsumed
 into the "straight GC" / "GC+free" options, because the
 language/compiler would decide at compile-time which strategy to use for
 a range.

Agreed. RC is not that difficult, and implementing it in terms of containers should be doable pretty nicely.
 I see D's current coverage of memory management techniques looking like
 this:

 - GC garbage collection: allocation for objects with complicated
 lifespans, with the caveat of complex allocation/deallocation code
 (check, not %100 featureful)

 - malloc/free for mediocre-speed allocation of objects whose lifespans
 can't be easily calculated but are known by the programmer (check)

 - custom allocators for when everything else fails (check, in most cases)

 - memory reuse with slices/immutability which gives us an awesome
 zero-overhead solution in a number of situations (check)

 - reference counting for almost-deterministic allocation of objects with
 easily calculated lifespans (**D does not cover this case!**)

 - stack for fast allocation of short-lived objects (check)

Sounds great. There's quite a few difficulties along the road, but I think the roadmap is good. Andrei
Apr 15 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/15/2012 9:34 AM, Andrei Alexandrescu wrote:
 Built-in arrays work routinely like that, and in fact the originating arrays
are
 not distinguished by type in any way from their ranges (be they orphan or not).
 The question is, what should std.container do about orphan ranges in general?
 Should it allow them, disallow them, or leave the decision open (e.g. to be
made
 by the allocator)? Depending on what way we go, the low-level design would be
 quite different.

I suspect that disallowing them would be surprising behavior, as D with array slices supports it well.
Apr 15 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/15/2012 11:43 AM, Walter Bright wrote:
 On 4/15/2012 9:34 AM, Andrei Alexandrescu wrote:
 Built-in arrays work routinely like that, and in fact the originating arrays
are
 not distinguished by type in any way from their ranges (be they orphan or not).
 The question is, what should std.container do about orphan ranges in general?
 Should it allow them, disallow them, or leave the decision open (e.g. to be
made
 by the allocator)? Depending on what way we go, the low-level design would be
 quite different.

I suspect that disallowing them would be surprising behavior, as D with array slices supports it well.

I'd add that unless there's a pretty compelling reason not to, the collections should support orphan ranges.
Apr 15 2012
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 15.04.2012 20:34, Andrei Alexandrescu wrote:
 I'm making good progress on an allocator design. If things come together
 as I hope, it'll kick some serious ass.

 I'm currently looking at four allocation models:

 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)

Say a pool allocator on top of any of these. Same goes for free-list which is arguably a limited form of pool allocator.
 I need to kink out a few details, most important being - is safety part
 of the allocator or an extricable property that's a different policy?

That was the main Q apparently. Potential answers follow.
 For now I have a related question about orphan ranges.

 Consider this motivating example:

 void main()
 {
 int[] x;
 {
 auto b = new int[20];

Imagine it's an Array!int(20) from here on. And x is a slice type for it.
 x = b[5 .. $ - 5];
 }
 ... use x ...
 }

Here, obviously the ability to use x not thinking at all of b is largely a property of GC allocator. I'd call it "any reference suffice" property, meaning that no extra effort needed to keep container alive. Otherwise if say used refcounted scheme then Array *itself* has to be aware of the fact its allocator is !any_reference_suffice thus put a an extra reference into it's slice. Now scale-up from this low-level issue. All of the above was written in the usual memory-safety-not-negotiable perspective. Call it a policy and enact it by default. Now kill all of extra safety hooks and call it unsafe-speed-comes-first policy. The answer then in my opinion that it's *both*. Allocator tells what kind of guarantees it has and container then observes if it has to do some extra work to ensure his policy stands (e.g. add extra refs to allow memory safety for orphan ranges). Because by the end of day it's container that provides ranges and thus responsible for this decision. -- Dmitry Olshansky
Apr 16 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 15 Apr 2012 12:34:11 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I'm making good progress on an allocator design. If things come together  
 as I hope, it'll kick some serious ass.

 I'm currently looking at four allocation models:

 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)

* pool/freelist based. Dcollections has such an allocator (Called a chunk allocator), which I hope would be applicable. It's in fact the default allocator. I saw you responded later that it will be a generic parameter. While I support that, and it is the same model in dcollections, it leaves a lot to be desired. For instance, if I have a RedBlackTree of SList!int, will each SList have it's own copy of an allocator? Sure you could use a singleton allocator to avoid overhead, but what if I simply want only the SLists in my RedBlackTree to share an allocator, and not with all SList!int? I've been occasionally putting some thought into that for dcollections, and I haven't really come up with a good solution yet, but it feels withink reach with some dedicated effort. I'm interested if you have a solution for this in mind.
 I need to kink out a few details, most important being - is safety part  
 of the allocator or an extricable property that's a different policy?  
 For now I have a related question about orphan ranges.

It's definitely an allocator property. For example, malloc/free is unsafe, there is no way to make it safe.
 Consider this motivating example:

 void main()
 {
      int[] x;
      {
          auto b = new int[20];
          x = b[5 .. $ - 5];
      }
      ... use x ...
 }

 The range x is a 10-elements range that originates in a 20-element  
 array. There is no safe way to access the original array again, so in a  
 sense the other 10 elements are "lost". That's why I call x an orphan  
 range - a range of which original container is gone.

 Built-in arrays work routinely like that, and in fact the originating  
 arrays are not distinguished by type in any way from their ranges (be  
 they orphan or not). The question is, what should std.container do about  
 orphan ranges in general? Should it allow them, disallow them, or leave  
 the decision open (e.g. to be made by the allocator)? Depending on what  
 way we go, the low-level design would be quite different.

I'd like to slightly correct your statement ;) b is a 20-element range that originates in a 31-element array (b.capacity is 31). The GC is the allocator and owner of the array type, which has no formal type. But I understand the question. and I think the situation is somewhat different -- most ranges will not be able to be attached spontaneously to different container instances. Nor will they rely on the allocator to generate extra instances at will to appease the requests. Built-in arrays/slices are like that. However, I think absolutely we need orphan range support. I think it's really a property of the allocator, not the container, as to whether orphan ranges exist. For proof, look no further than your list of possible allocators at the malloc/free version. Clearly orphan ranges are not implementable using that back-end, since once you lose your original container reference, the memory is leaked. If we make it container policy, any container would have to disallow that allocator. I think that's too restrictive, and makes the malloc/free allocator somewhat useless. In terms of "allowance", I don't see how you disallow them. All I think you can do is make using orphaned ranges defined behavior or not. -Steve
Apr 16 2012
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Sunday, April 15, 2012 11:34:11 Andrei Alexandrescu wrote:
 I'm making good progress on an allocator design. If things come together
 as I hope, it'll kick some serious ass.
 
 I'm currently looking at four allocation models:
 
 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)
 
 I need to kink out a few details, most important being - is safety part
 of the allocator or an extricable property that's a different policy?
 For now I have a related question about orphan ranges.
 
 Consider this motivating example:
 
 void main()
 {
 int[] x;
 {
 auto b = new int[20];
 x = b[5 .. $ - 5];
 }
 ... use x ...
 }
 
 The range x is a 10-elements range that originates in a 20-element
 array. There is no safe way to access the original array again, so in a
 sense the other 10 elements are "lost". That's why I call x an orphan
 range - a range of which original container is gone.
 
 Built-in arrays work routinely like that, and in fact the originating
 arrays are not distinguished by type in any way from their ranges (be
 they orphan or not). The question is, what should std.container do about
 orphan ranges in general? Should it allow them, disallow them, or leave
 the decision open (e.g. to be made by the allocator)? Depending on what
 way we go, the low-level design would be quite different.

So, the problem that we have is basically void main() { Range r; { Container c = //allocated using allocator, however that works r = c[]; } //Container may or may not exist in memory, depending on the allocator, //but the range does exist and may or may not be valid. } How is this different from an iterator or range being invalidated after a function is called on the container which alters its state? You could theoretically add plumbing to the range to keep track of whether it's valid or not, but we're not doing that or planning to do that with std.container are we? And if we're not doing that, why would we go out of our way to do so in the case where a custom allocator causes a container to exist for a shorter lifetime than it would normally? - Jonathan M Davis
Apr 16 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/16/12 12:57 PM, Jonathan M Davis wrote:
 So, the problem that we have is basically

 void main()
 {
   Range r;
   {
   Container c = //allocated using allocator, however that works
   r = c[];
   }
   //Container may or may not exist in memory, depending on the allocator,
   //but the range does exist and may or may not be valid.
 }

Yes.
 How is this different from an iterator or range being invalidated after a
 function is called on the container which alters its state?

Well it's just a different matter. In particular containers offer the unstable versions of their primitives when they mess iterators up.
 You could
 theoretically add plumbing to the range to keep track of whether it's valid or
 not, but we're not doing that or planning to do that with std.container are
 we?

I guess we have to, at least plant the decision to do so or not in the allocator or in a policy. Andrei
Apr 16 2012
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, April 16, 2012 23:07:34 Andrei Alexandrescu wrote:
 On 4/16/12 12:57 PM, Jonathan M Davis wrote:
 So, the problem that we have is basically
 
 void main()
 {
 
   Range r;
   {
   Container c = //allocated using allocator, however that works
   r = c[];
   }
   //Container may or may not exist in memory, depending on the allocator,
   //but the range does exist and may or may not be valid.
 
 }

Yes.
 How is this different from an iterator or range being invalidated after a
 function is called on the container which alters its state?

Well it's just a different matter. In particular containers offer the unstable versions of their primitives when they mess iterators up.

Yeah, but the example is pretty much the same as code where someone called remove instead of stableRemove. They did something that invalidates the existing ranges for the container. It's just that rather than calling a function to do it, they did it with scope and allocators.
 You could
 theoretically add plumbing to the range to keep track of whether it's
 valid or not, but we're not doing that or planning to do that with
 std.container are we?

I guess we have to, at least plant the decision to do so or not in the allocator or in a policy.

I don't really see how else we could handle it. At minimum, if we're going to do it normally, it needs to be via assertions so that they'll go away in release mode - the same goes for checking for ranges which get invalidated by function calls to the container. Otherwise, we get stuck with undesirable overhead. - Jonathan M Davis
Apr 16 2012