digitalmars.D - Orphan ranges

Andrei Alexandrescu (33/33) Apr 15 2012 I'm making good progress on an allocator design. If things come together...

Tove (8/44) Apr 15 2012 Wow, cool!
Jacob Carlborg (5/38) Apr 15 2012 As you say, built-in arrays work like this. Therefore at least an
Chad J (52/66) Apr 15 2012 How about

Andrei Alexandrescu (12/59) Apr 15 2012 Yah, the allocator would be a generic parameter. Any allocator that

Walter Bright (3/9) Apr 15 2012 I suspect that disallowing them would be surprising behavior, as D with ...

Walter Bright (3/12) Apr 15 2012 I'd add that unless there's a pretty compelling reason not to, the colle...

Dmitry Olshansky (23/44) Apr 16 2012 Nice overview. I believe there should adapters on top of that.
Steven Schveighoffer (38/69) Apr 16 2012 * pool/freelist based. Dcollections has such an allocator (Called a chu...
Jonathan M Davis (20/58) Apr 16 2012 So, the problem that we have is basically

Andrei Alexandrescu (7/24) Apr 16 2012 Well it's just a different matter. In particular containers offer the

Jonathan M Davis (11/41) Apr 16 2012 Yeah, but the example is pretty much the same as code where someone call...

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I'm making good progress on an allocator design. If things come together 
as I hope, it'll kick some serious ass.

I'm currently looking at four allocation models:

* straight GC, safe (no deallocation)
* GC + the ability to free memory
* malloc/free, unsafe, buyer beware (STL-level safety)
* reference counted (based on either malloc or GC+free)
* region (scoped)

I need to kink out a few details, most important being - is safety part 
of the allocator or an extricable property that's a different policy? 
For now I have a related question about orphan ranges.

Consider this motivating example:

void main()
{
     int[] x;
     {
         auto b = new int[20];
         x = b[5 .. $ - 5];
     }
     ... use x ...
}

The range x is a 10-elements range that originates in a 20-element 
array. There is no safe way to access the original array again, so in a 
sense the other 10 elements are "lost". That's why I call x an orphan 
range - a range of which original container is gone.

Built-in arrays work routinely like that, and in fact the originating 
arrays are not distinguished by type in any way from their ranges (be 
they orphan or not). The question is, what should std.container do about 
orphan ranges in general? Should it allow them, disallow them, or leave 
the decision open (e.g. to be made by the allocator)? Depending on what 
way we go, the low-level design would be quite different.


Thanks,

Andrei

Apr 15 2012

"Tove" <tove fransson.se> writes:

On Sunday, 15 April 2012 at 16:34:12 UTC, Andrei Alexandrescu 
wrote:
 I'm making good progress on an allocator design. If things come 
 together as I hope, it'll kick some serious ass.

 I'm currently looking at four allocation models:

 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)

 I need to kink out a few details, most important being - is 
 safety part of the allocator or an extricable property that's a 
 different policy? For now I have a related question about 
 orphan ranges.

 Consider this motivating example:

 void main()
 {
     int[] x;
     {
         auto b = new int[20];
         x = b[5 .. $ - 5];
     }
     ... use x ...
 }

 The range x is a 10-elements range that originates in a 
 20-element array. There is no safe way to access the original 
 array again, so in a sense the other 10 elements are "lost". 
 That's why I call x an orphan range - a range of which original 
 container is gone.

 Built-in arrays work routinely like that, and in fact the 
 originating arrays are not distinguished by type in any way 
 from their ranges (be they orphan or not). The question is, 
 what should std.container do about orphan ranges in general? 
 Should it allow them, disallow them, or leave the decision open 
 (e.g. to be made by the allocator)? Depending on what way we 
 go, the low-level design would be quite different.


 Thanks,

 Andrei

Wow, cool!

To flat out disallow Orphan ranges is imho too restrictive, 
especially considering we already have a safe solution, what is 
missing is an unsafe(no overhead) version.

If the design can handle safe/unsafe as a policy for a 
stack(scoped) based allocator... that would be kick ass, indeed.

Apr 15 2012

Jacob Carlborg <doob me.com> writes:

On 2012-04-15 18:34, Andrei Alexandrescu wrote:
 I'm making good progress on an allocator design. If things come together
 as I hope, it'll kick some serious ass.

 I'm currently looking at four allocation models:

 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)

 I need to kink out a few details, most important being - is safety part
 of the allocator or an extricable property that's a different policy?
 For now I have a related question about orphan ranges.

 Consider this motivating example:

 void main()
 {
 int[] x;
 {
 auto b = new int[20];
 x = b[5 .. $ - 5];
 }
 ... use x ...
 }

 The range x is a 10-elements range that originates in a 20-element
 array. There is no safe way to access the original array again, so in a
 sense the other 10 elements are "lost". That's why I call x an orphan
 range - a range of which original container is gone.

 Built-in arrays work routinely like that, and in fact the originating
 arrays are not distinguished by type in any way from their ranges (be
 they orphan or not). The question is, what should std.container do about
 orphan ranges in general? Should it allow them, disallow them, or leave
 the decision open (e.g. to be made by the allocator)? Depending on what
 way we go, the low-level design would be quite different.


 Thanks,

 Andrei

As you say, built-in arrays work like this. Therefore at least an 
allocator using the GC should allow this.

-- 
/Jacob Carlborg

Apr 15 2012

Chad J <chadjoan __spam.is.bad__gmail.com> writes:

On 04/15/2012 12:34 PM, Andrei Alexandrescu wrote:
 I'm making good progress on an allocator design. If things come together
 as I hope, it'll kick some serious ass.

 I'm currently looking at four allocation models:

 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)

 I need to kink out a few details, most important being - is safety part
 of the allocator or an extricable property that's a different policy?
 For now I have a related question about orphan ranges.

 ...
 Thanks,

 Andrei

How about
* User-defined allocator
??

At the very least, I think we should be able to choose from the above 
strategies for a given container.  If this isn't done, then there will 
always be people wanting to re-implement the containers just to 
accomplish silly stuff like allocating a little differently.

As for safety:
I would probably want to be able to tweak the ways my containers are 
safe/unsafe.  I'd want safety by default, with the ability to choose my 
vulnerabilities.  "Safety" is a fairly vague notion, since there are a 
number of things that could cause memory corruption or undefined 
behavior.  Allowing me to decide which risk factors to take allows me to 
decide which causes of corruption/undefined behavior I want to expose 
myself to.  Once there's a sufficiently large panel of twiddley knobs 
and toggle switches, then it might make sense to create "policies" like 
"safe" vs "blazing fast".

As for orphan ranges:
I've always written under the assumption that an array like 'x' would 
keep 'b' around.  It would be nice if large amounts of memory wasted 
from orphans could be reused in some way, but not at the expense of a 
lot of speed.  If large waste from orphans sticks around, a warning in 
the docs would be nice.

Other note:
For the longest time I've thought that D should do reference counting 
for types that can be proven to have no reference cycles.  I don't 
understand why this is difficult, but even if it is it might be worth 
it.  I don't actually need it personally, but I constantly hear 
complaints about "oh, D requires garbage collection because you can't 
use most of Phobos without it".  What they really want, I bet, is 
deterministic memory management for realtime systems that does not cause 
thread contention/world stops.  Add reference counting and this will 
suddenly be doable for what I'd image would be a lot of important stuff 
in Phobos (think CoW string functions--probably reference counter 
friendly).  At this point the "reference counted" option gets subsumed 
into the "straight GC" / "GC+free" options, because the 
language/compiler would decide at compile-time which strategy to use for 
a range.

I see D's current coverage of memory management techniques looking like 
this:

- GC garbage collection: allocation for objects with complicated 
lifespans, with the caveat of complex allocation/deallocation code 
(check, not %100 featureful)

- malloc/free for mediocre-speed allocation of objects whose lifespans 
can't be easily calculated but are known by the programmer (check)

- custom allocators for when everything else fails (check, in most cases)

- memory reuse with slices/immutability which gives us an awesome 
zero-overhead solution in a number of situations (check)

- reference counting for almost-deterministic allocation of objects with 
easily calculated lifespans (**D does not cover this case!**)

- stack for fast allocation of short-lived objects (check)

Apr 15 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/15/12 12:50 PM, Chad J wrote:
 How about
 * User-defined allocator
 ??

Yah, the allocator would be a generic parameter. Any allocator that 
supports a compatible interface with the archetypes used would work. But 
making e.g. file-backed allocators or whatnot may be difficult.

 As for safety:
 I would probably want to be able to tweak the ways my containers are
 safe/unsafe. I'd want safety by default, with the ability to choose my
 vulnerabilities. "Safety" is a fairly vague notion, since there are a
 number of things that could cause memory corruption or undefined
 behavior. Allowing me to decide which risk factors to take allows me to
 decide which causes of corruption/undefined behavior I want to expose
 myself to. Once there's a sufficiently large panel of twiddley knobs and
 toggle switches, then it might make sense to create "policies" like
 "safe" vs "blazing fast".

Makes sense.

 As for orphan ranges:
 I've always written under the assumption that an array like 'x' would
 keep 'b' around. It would be nice if large amounts of memory wasted from
 orphans could be reused in some way, but not at the expense of a lot of
 speed. If large waste from orphans sticks around, a warning in the docs
 would be nice.

So orphans should be allowed. Wonder if that should be an option or just 
"it".

 Other note:
 For the longest time I've thought that D should do reference counting
 for types that can be proven to have no reference cycles. I don't
 understand why this is difficult, but even if it is it might be worth
 it. I don't actually need it personally, but I constantly hear
 complaints about "oh, D requires garbage collection because you can't
 use most of Phobos without it". What they really want, I bet, is
 deterministic memory management for realtime systems that does not cause
 thread contention/world stops. Add reference counting and this will
 suddenly be doable for what I'd image would be a lot of important stuff
 in Phobos (think CoW string functions--probably reference counter
 friendly). At this point the "reference counted" option gets subsumed
 into the "straight GC" / "GC+free" options, because the
 language/compiler would decide at compile-time which strategy to use for
 a range.

Agreed. RC is not that difficult, and implementing it in terms of 
containers should be doable pretty nicely.

 I see D's current coverage of memory management techniques looking like
 this:

 - GC garbage collection: allocation for objects with complicated
 lifespans, with the caveat of complex allocation/deallocation code
 (check, not %100 featureful)

 - malloc/free for mediocre-speed allocation of objects whose lifespans
 can't be easily calculated but are known by the programmer (check)

 - custom allocators for when everything else fails (check, in most cases)

 - memory reuse with slices/immutability which gives us an awesome
 zero-overhead solution in a number of situations (check)

 - reference counting for almost-deterministic allocation of objects with
 easily calculated lifespans (**D does not cover this case!**)

 - stack for fast allocation of short-lived objects (check)

Sounds great. There's quite a few difficulties along the road, but I 
think the roadmap is good.


Andrei

Apr 15 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 4/15/2012 9:34 AM, Andrei Alexandrescu wrote:
 Built-in arrays work routinely like that, and in fact the originating arrays
are
 not distinguished by type in any way from their ranges (be they orphan or not).
 The question is, what should std.container do about orphan ranges in general?
 Should it allow them, disallow them, or leave the decision open (e.g. to be
made
 by the allocator)? Depending on what way we go, the low-level design would be
 quite different.

I suspect that disallowing them would be surprising behavior, as D with array 
slices supports it well.

Apr 15 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 4/15/2012 11:43 AM, Walter Bright wrote:
 On 4/15/2012 9:34 AM, Andrei Alexandrescu wrote:
 Built-in arrays work routinely like that, and in fact the originating arrays
are
 not distinguished by type in any way from their ranges (be they orphan or not).
 The question is, what should std.container do about orphan ranges in general?
 Should it allow them, disallow them, or leave the decision open (e.g. to be
made
 by the allocator)? Depending on what way we go, the low-level design would be
 quite different.

 I suspect that disallowing them would be surprising behavior, as D with array
 slices supports it well.

I'd add that unless there's a pretty compelling reason not to, the collections 
should support orphan ranges.

Apr 15 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 15.04.2012 20:34, Andrei Alexandrescu wrote:
 I'm making good progress on an allocator design. If things come together
 as I hope, it'll kick some serious ass.

 I'm currently looking at four allocation models:

 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)

Nice overview. I believe there should adapters on top of that.
Say a pool allocator on top of any of these. Same goes for free-list 
which is arguably a limited form of pool allocator.

 I need to kink out a few details, most important being - is safety part
 of the allocator or an extricable property that's a different policy?

That was the main Q apparently. Potential answers follow.

 For now I have a related question about orphan ranges.

 Consider this motivating example:

 void main()
 {
 int[] x;
 {
 auto b = new int[20];

Imagine it's an Array!int(20) from here on. And x is a slice type for it.

 x = b[5 .. $ - 5];
 }
 ... use x ...
 }

Here, obviously the ability to use x not thinking at all of b is largely 
a property of GC allocator. I'd call it "any reference suffice" 
property, meaning that no extra effort needed to keep container alive.

Otherwise if say used refcounted scheme then Array *itself* has to be 
aware of the fact its allocator is !any_reference_suffice thus put a an 
extra reference into it's slice.

Now scale-up from this low-level issue. All of the above was written in 
the usual memory-safety-not-negotiable perspective. Call it a policy and 
enact it by default. Now kill all of extra safety hooks and call it 
unsafe-speed-comes-first policy.

The answer then in my opinion that it's *both*. Allocator tells what 
kind of guarantees it has and container then observes if it has to do 
some extra work to ensure his policy stands (e.g. add extra refs to 
allow memory safety for orphan ranges). Because by the end of day it's 
container that provides ranges and thus responsible for this decision.


-- 
Dmitry Olshansky

Apr 16 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 15 Apr 2012 12:34:11 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I'm making good progress on an allocator design. If things come together  
 as I hope, it'll kick some serious ass.

 I'm currently looking at four allocation models:

 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)

* pool/freelist based.  Dcollections has such an allocator (Called a chunk  
allocator), which I hope would be applicable.  It's in fact the default  
allocator.

I saw you responded later that it will be a generic parameter.  While I  
support that, and it is the same model in dcollections, it leaves a lot to  
be desired.  For instance, if I have a RedBlackTree of SList!int, will  
each SList have it's own copy of an allocator?  Sure you could use a  
singleton allocator to avoid overhead, but what if I simply want only the  
SLists in my RedBlackTree to share an allocator, and not with all  
SList!int?

I've been occasionally putting some thought into that for dcollections,  
and I haven't really come up with a good solution yet, but it feels  
withink reach with some dedicated effort.  I'm interested if you have a  
solution for this in mind.

 I need to kink out a few details, most important being - is safety part  
 of the allocator or an extricable property that's a different policy?  
 For now I have a related question about orphan ranges.

It's definitely an allocator property.  For example, malloc/free is  
unsafe, there is no way to make it safe.

 Consider this motivating example:

 void main()
 {
      int[] x;
      {
          auto b = new int[20];
          x = b[5 .. $ - 5];
      }
      ... use x ...
 }

 The range x is a 10-elements range that originates in a 20-element  
 array. There is no safe way to access the original array again, so in a  
 sense the other 10 elements are "lost". That's why I call x an orphan  
 range - a range of which original container is gone.

 Built-in arrays work routinely like that, and in fact the originating  
 arrays are not distinguished by type in any way from their ranges (be  
 they orphan or not). The question is, what should std.container do about  
 orphan ranges in general? Should it allow them, disallow them, or leave  
 the decision open (e.g. to be made by the allocator)? Depending on what  
 way we go, the low-level design would be quite different.

I'd like to slightly correct your statement ;)  b is a 20-element range  
that originates in a 31-element array (b.capacity is 31).  The GC is the  
allocator and owner of the array type, which has no formal type.

But I understand the question.  and I think the situation is somewhat  
different -- most ranges will not be able to be attached spontaneously to  
different container instances.  Nor will they rely on the allocator to  
generate extra instances at will to appease the requests.  Built-in  
arrays/slices are like that.

However, I think absolutely we need orphan range support.  I think it's  
really a property of the allocator, not the container, as to whether  
orphan ranges exist.  For proof, look no further than your list of  
possible allocators at the malloc/free version.  Clearly orphan ranges are  
not implementable using that back-end, since once you lose your original  
container reference, the memory is leaked.

If we make it container policy, any container would have to disallow that  
allocator.  I think that's too restrictive, and makes the malloc/free  
allocator somewhat useless.

In terms of "allowance", I don't see how you disallow them.  All I think  
you can do is make using orphaned ranges defined behavior or not.

-Steve

Apr 16 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Sunday, April 15, 2012 11:34:11 Andrei Alexandrescu wrote:
 I'm making good progress on an allocator design. If things come together
 as I hope, it'll kick some serious ass.
 
 I'm currently looking at four allocation models:
 
 * straight GC, safe (no deallocation)
 * GC + the ability to free memory
 * malloc/free, unsafe, buyer beware (STL-level safety)
 * reference counted (based on either malloc or GC+free)
 * region (scoped)
 
 I need to kink out a few details, most important being - is safety part
 of the allocator or an extricable property that's a different policy?
 For now I have a related question about orphan ranges.
 
 Consider this motivating example:
 
 void main()
 {
 int[] x;
 {
 auto b = new int[20];
 x = b[5 .. $ - 5];
 }
 ... use x ...
 }
 
 The range x is a 10-elements range that originates in a 20-element
 array. There is no safe way to access the original array again, so in a
 sense the other 10 elements are "lost". That's why I call x an orphan
 range - a range of which original container is gone.
 
 Built-in arrays work routinely like that, and in fact the originating
 arrays are not distinguished by type in any way from their ranges (be
 they orphan or not). The question is, what should std.container do about
 orphan ranges in general? Should it allow them, disallow them, or leave
 the decision open (e.g. to be made by the allocator)? Depending on what
 way we go, the low-level design would be quite different.

So, the problem that we have is basically

void main()
{
 Range r;
 {
 Container c = //allocated using allocator, however that works
 r = c[];
 }
 //Container may or may not exist in memory, depending on the allocator,
 //but the range does exist and may or may not be valid.
}

How is this different from an iterator or range being invalidated after a 
function is called on the container which alters its state? You could 
theoretically add plumbing to the range to keep track of whether it's valid or 
not, but we're not doing that or planning to do that with std.container are 
we? And if we're not doing that, why would we go out of our way to do so in 
the case where a custom allocator causes a container to exist for a shorter 
lifetime than it would normally?

- Jonathan M Davis

Apr 16 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/16/12 12:57 PM, Jonathan M Davis wrote:
 So, the problem that we have is basically

 void main()
 {
   Range r;
   {
   Container c = //allocated using allocator, however that works
   r = c[];
   }
   //Container may or may not exist in memory, depending on the allocator,
   //but the range does exist and may or may not be valid.
 }

Yes.

 How is this different from an iterator or range being invalidated after a
 function is called on the container which alters its state?

Well it's just a different matter. In particular containers offer the 
unstable versions of their primitives when they mess iterators up.

 You could
 theoretically add plumbing to the range to keep track of whether it's valid or
 not, but we're not doing that or planning to do that with std.container are
 we?

I guess we have to, at least plant the decision to do so or not in the 
allocator or in a policy.


Andrei

Apr 16 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday, April 16, 2012 23:07:34 Andrei Alexandrescu wrote:
 On 4/16/12 12:57 PM, Jonathan M Davis wrote:
 So, the problem that we have is basically
 
 void main()
 {
 
   Range r;
   {
   Container c = //allocated using allocator, however that works
   r = c[];
   }
   //Container may or may not exist in memory, depending on the allocator,
   //but the range does exist and may or may not be valid.
 
 }

 
 Yes.
 
 How is this different from an iterator or range being invalidated after a
 function is called on the container which alters its state?

 
 Well it's just a different matter. In particular containers offer the
 unstable versions of their primitives when they mess iterators up.

Yeah, but the example is pretty much the same as code where someone called 
remove instead of stableRemove. They did something that invalidates the 
existing ranges for the container. It's just that rather than calling a 
function to do it, they did it with scope and allocators.

 You could
 theoretically add plumbing to the range to keep track of whether it's
 valid or not, but we're not doing that or planning to do that with
 std.container are we?

 
 I guess we have to, at least plant the decision to do so or not in the
 allocator or in a policy.

I don't really see how else we could handle it. At minimum, if we're going to 
do it normally, it needs to be via assertions so that they'll go away in 
release mode - the same goes for checking for ranges which get invalidated by 
function calls to the container. Otherwise, we get stuck with undesirable 
overhead.

- Jonathan M Davis

Apr 16 2012

D Programming

C/C++ Programming

Other

digitalmars.D - Orphan ranges