www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - The great slice debate -- should slices be separated from arrays?

reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
In many other posts, people have been festering over dropping T[new] and  
not having a reference array type.  The argument is (and forgive me if  
this is a strawman, someone from the other side can post if I'm  
incorrect): If we make arrays a separate type from slices, and only allow  
appending on arrays, then we solve the stomping problem and the  
hard-to-determine reallocating problem.  For those who are unfamiliar with  
these problems, I'll try to outline them at the bottom of the post.

I contend that even if we make arrays a separate type, even if arrays are  
made a true reference type, slices will still suffer from the same  
hard-to-determine reallocation problem *unless* we make slices fatter.

My proof is as simple as an example.  Assume 'Array' is the new type for  
an array:

auto a = new Array!(int)(15); // allocate an array of 15 integers, all 0
auto s = a[0..5];
a ~= [1,2,3,4,5];
a[0] = 1.

Now, what does s[0] equal?

It depends on how s is defined.  If s is a simple pointer + length slice  
as defined today, then it is *STILL* hard-to-determine because you are  
unsure whether a had to reallocate to a new block!  If you make s a "fat"  
slice, that is, it contains a reference to the *original* array, and 2  
indexes, then you have issues:

1. you are now passing around 12 (24 for 64-bit) bytes for a slice instead  
of 8 (16), that hurts performance
2. you are now subject to a slice becoming invalid if a decides to  
reallocate into a smaller block and your slice indexes data that is now  
outside the block.
3. you are now subject to corruption if the array decides to truncate from  
the *front* of it's block, since the slice will now refer to different  
data.

Because of points 2 and 3, you lose static provability of slices being  
valid, since their backing array can reallocate at any time.

Are there any alternative implementations for slices that work better?  If  
not, I think thin slices (pointer and length) are the way to go, which  
leaves us with a still hard-to-determine reallocation strategy.  In this  
case, why not leave the arrays the way they are?

The answer is, stomping.  But we have already found ways to fix that  
without changing any code.  In the instance where you need the fastest  
appending possible, we can create a library type to handle that, then you  
convert to an actual array when you need it.  I think dsichma already is  
working on such a type.

So is there anyone in the "please separate arrays from slices" camp that  
can counter this?  Are there other solutions someone can think of or has  
already brought up that I didn't mention?

======================

Definition of the "hard-to-determine" reallocating problem:  (I'd classify  
this as non-deterministic reallocating, but it depends on your point of  
view, and Walter has fits over that term :)

If you reallocate an array, it may or may not copy the data to another  
memory block depending on if it can resize into the existing memory  
block.  If it can, then the original memory is still reference by the  
array.  The determination of when this occurs is dependent on the  
implementation of the runtime.  In the current incarnation of dmd, this  
occurs when the array is sized beyond powers of 2 up to a page size, and  
then at page size increments.

The trouble is, when you see code like this:

void foo(char[] buf)
{
    buf ~= "abcd";
    buf[0] = 'z';
}

You cannot tell whether the input array was affected because the append  
statement may or may not change the address of buf.  If it doesn't, then  
the argument passed in is affected.  If it does, the argument passed in is  
not affected.  The current solutions to this problem are to either  
document the behavior or dup the incoming buffer before appending (or  
using the buf = buf ~ "abcd" form which guarantees reallocation).

======================

Definition of the stomping problem:

If you append to a slice which *starts* at the beginning of a  
heap-allocated memory block, and the append operation fits within the  
memory block, the compiler reallocates in place.  However, if that slice  
was a smaller piece of a larger array, it will overwrite the data in the  
array.  The optimization came into effect when people noticed code like  
the following was very slow and ate up too much memory:

int[] x;

for(int i = 0; i < 10000; i++)
    x ~= i;

If x reallocates on every loop, it will allocate 10000 times, each time  
allocating a larger memory block.  This forces more GC collection cycles  
and can leave behind scores of unused pages of memory if the collector  
doesn't reuse or allow the OS to reclaim them.

The optimization depends on the likelihood of a slice that points to the  
beginning of a block being the owner of the memory block (that is, all  
other slices were created from it, and therefore do not go beyond the  
extents of the primary slice).  This is certainly the case after the first  
reallocation that occurs in the loop above, making this loop perfectly  
safe.  However, it is not hard to come up with an unsafe case:

int[] x = [1,2,3];
auto y = x[0..1]; // slice only the first element
y ~= 5; // now x == [1,5,3]

This is even more disturbing for const or immutable data:

string x = "hello".idup; // need the idup to put the data on the heap.
auto y = x[0..1]; // slice only "h";
y ~= "owdy"; // now x, which is supposed to be immutable, changed to  
"howdy"

This stomping becomes more of a problem for library writers:

char * toStringZ(string s)
{
    s ~= '\0';
    return s.ptr;
}

This seemingly innocuous function could easily corrupt immutable data if s  
is a slice of the beginning of a larger array.  Therefore, toStringZ has  
to be defensive:

char * toStringZ(string s)
{
    s = s ~ '\0'; // not using ~= operator forces a reallocation
    return s.ptr;
}

But this "defensive" style would be unnecessary if stomping could not  
occur.
Nov 24 2009
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 24 Nov 2009 09:27:22 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 This stomping becomes more of a problem for library writers:

 char * toStringZ(string s)
 {
     s ~= '\0';
     return s.ptr;
 }
Apologies, this should be written: immutable(char) * toStringZ(string s) -Steve
Nov 24 2009
prev sibling next sibling parent reply Lutger <lutger.blijdestijn gmail.com> writes:
Just to clarify, is it true that stomping over mutable arrays is covered by 
the spec (currently), but stomping immutable arrays is classified as a bug? 
I was very surprised by this. 
Nov 24 2009
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 24 Nov 2009 10:19:57 -0500, Lutger <lutger.blijdestijn gmail.com>  
wrote:

 Just to clarify, is it true that stomping over mutable arrays is covered  
 by
 the spec (currently), but stomping immutable arrays is classified as a  
 bug?
 I was very surprised by this.
Stomping of arrays is covered by the spec (without reference to how it affects immutability), but so is immutability, so technically it is an inconsistency in the spec, but it's an inconsistency that cannot be fixed without declaring that all append operations render other slices that alias the same data invalid. See http://www.digitalmars.com/d/2.0/arrays.html#resize for the spec info on stomping, note there is no mention of immutability being compromised. -Steve
Nov 24 2009
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 24 Nov 2009 09:27:22 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 I think dsichma already is working on such a type.
Gah! my apologies to David, I should know better as someone who has a hard-to-spell last name. For some reason I always read "dee-sic-ma" when I see your handle :) I meant of course dsimcha. -Steve
Nov 24 2009
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping T[new] and 
 not having a reference array type.  The argument is (and forgive me if 
 this is a strawman, someone from the other side can post if I'm 
 incorrect): If we make arrays a separate type from slices, and only 
 allow appending on arrays, then we solve the stomping problem and the 
 hard-to-determine reallocating problem.  For those who are unfamiliar 
 with these problems, I'll try to outline them at the bottom of the post.
 
 I contend that even if we make arrays a separate type, even if arrays 
 are made a true reference type, slices will still suffer from the same 
 hard-to-determine reallocation problem *unless* we make slices fatter.
 
 My proof is as simple as an example.  Assume 'Array' is the new type for 
 an array:
 
 auto a = new Array!(int)(15); // allocate an array of 15 integers, all 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
 
 Now, what does s[0] equal?
Array may include a field bool sliceExtracted; that is set to true whenever you take a slice from the array and set to false whenever the array's data is reallocated. The array's documentation could mention that ~= is amortized constant if there are no intervening slicing operations. Andrei
Nov 24 2009
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 24 Nov 2009 11:07:43 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping T[new]  
 and not having a reference array type.  The argument is (and forgive me  
 if this is a strawman, someone from the other side can post if I'm  
 incorrect): If we make arrays a separate type from slices, and only  
 allow appending on arrays, then we solve the stomping problem and the  
 hard-to-determine reallocating problem.  For those who are unfamiliar  
 with these problems, I'll try to outline them at the bottom of the post.
  I contend that even if we make arrays a separate type, even if arrays  
 are made a true reference type, slices will still suffer from the same  
 hard-to-determine reallocation problem *unless* we make slices fatter.
  My proof is as simple as an example.  Assume 'Array' is the new type  
 for an array:
  auto a = new Array!(int)(15); // allocate an array of 15 integers, all  
 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
  Now, what does s[0] equal?
Array may include a field bool sliceExtracted; that is set to true whenever you take a slice from the array and set to false whenever the array's data is reallocated. The array's documentation could mention that ~= is amortized constant if there are no intervening slicing operations.
We weren't talking about performance, but I think I know what you are saying: To solve the hard-to-determine reallocation problem, you propose that any array which has had a slice taken from it, will reallocate on append even if it could grow into the block? That seems like a rather blunt solution... if(arr[0.."hello".length] == "hello") arr ~= "xyz"; // oops, reallocate because we took a slice earlier. Wouldn't this make slicing combined with appending unusable for performance reasons? Then the only time you want to use slices is if you are done appending to the array, and then even today's implementation wouldn't cause the hard-to-determine problem for code written that way. Maybe I misunderstood you, let me know. -Steve
Nov 24 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 11:07:43 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping T[new] 
 and not having a reference array type.  The argument is (and forgive 
 me if this is a strawman, someone from the other side can post if I'm 
 incorrect): If we make arrays a separate type from slices, and only 
 allow appending on arrays, then we solve the stomping problem and the 
 hard-to-determine reallocating problem.  For those who are unfamiliar 
 with these problems, I'll try to outline them at the bottom of the post.
  I contend that even if we make arrays a separate type, even if 
 arrays are made a true reference type, slices will still suffer from 
 the same hard-to-determine reallocation problem *unless* we make 
 slices fatter.
  My proof is as simple as an example.  Assume 'Array' is the new type 
 for an array:
  auto a = new Array!(int)(15); // allocate an array of 15 integers, 
 all 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
  Now, what does s[0] equal?
Array may include a field bool sliceExtracted; that is set to true whenever you take a slice from the array and set to false whenever the array's data is reallocated. The array's documentation could mention that ~= is amortized constant if there are no intervening slicing operations.
We weren't talking about performance, but I think I know what you are saying: To solve the hard-to-determine reallocation problem, you propose that any array which has had a slice taken from it, will reallocate on append even if it could grow into the block? That seems like a rather blunt solution... if(arr[0.."hello".length] == "hello") arr ~= "xyz"; // oops, reallocate because we took a slice earlier. Wouldn't this make slicing combined with appending unusable for performance reasons? Then the only time you want to use slices is if you are done appending to the array, and then even today's implementation wouldn't cause the hard-to-determine problem for code written that way. Maybe I misunderstood you, let me know.
You understood me correctly. I think it's reasonable if Array defines ~= in relation to using other primitives. It's a common approach, for example, in C++ and Java to define an operation (such as bumping or dereferencing an iterator) in terms of no intervening conflicting operations against the same object. Andrei
Nov 24 2009
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 24 Nov 2009 14:07:27 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 11:07:43 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping T[new]  
 and not having a reference array type.  The argument is (and forgive  
 me if this is a strawman, someone from the other side can post if I'm  
 incorrect): If we make arrays a separate type from slices, and only  
 allow appending on arrays, then we solve the stomping problem and the  
 hard-to-determine reallocating problem.  For those who are unfamiliar  
 with these problems, I'll try to outline them at the bottom of the  
 post.
  I contend that even if we make arrays a separate type, even if  
 arrays are made a true reference type, slices will still suffer from  
 the same hard-to-determine reallocation problem *unless* we make  
 slices fatter.
  My proof is as simple as an example.  Assume 'Array' is the new type  
 for an array:
  auto a = new Array!(int)(15); // allocate an array of 15 integers,  
 all 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
  Now, what does s[0] equal?
Array may include a field bool sliceExtracted; that is set to true whenever you take a slice from the array and set to false whenever the array's data is reallocated. The array's documentation could mention that ~= is amortized constant if there are no intervening slicing operations.
We weren't talking about performance, but I think I know what you are saying: To solve the hard-to-determine reallocation problem, you propose that any array which has had a slice taken from it, will reallocate on append even if it could grow into the block? That seems like a rather blunt solution... if(arr[0.."hello".length] == "hello") arr ~= "xyz"; // oops, reallocate because we took a slice earlier. Wouldn't this make slicing combined with appending unusable for performance reasons? Then the only time you want to use slices is if you are done appending to the array, and then even today's implementation wouldn't cause the hard-to-determine problem for code written that way. Maybe I misunderstood you, let me know.
You understood me correctly. I think it's reasonable if Array defines ~= in relation to using other primitives. It's a common approach, for example, in C++ and Java to define an operation (such as bumping or dereferencing an iterator) in terms of no intervening conflicting operations against the same object.
Why do we need this scheme then? If it only makes sense to use slicing when you are done appending, then why not use that model with today's rules? Why make code that uses slices before you are finished appending perform poorly? Why make code that uses temporary slices (like I showed in my example) perform poorly? -Steve
Nov 24 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 14:07:27 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 11:07:43 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping 
 T[new] and not having a reference array type.  The argument is (and 
 forgive me if this is a strawman, someone from the other side can 
 post if I'm incorrect): If we make arrays a separate type from 
 slices, and only allow appending on arrays, then we solve the 
 stomping problem and the hard-to-determine reallocating problem.  
 For those who are unfamiliar with these problems, I'll try to 
 outline them at the bottom of the post.
  I contend that even if we make arrays a separate type, even if 
 arrays are made a true reference type, slices will still suffer 
 from the same hard-to-determine reallocation problem *unless* we 
 make slices fatter.
  My proof is as simple as an example.  Assume 'Array' is the new 
 type for an array:
  auto a = new Array!(int)(15); // allocate an array of 15 integers, 
 all 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
  Now, what does s[0] equal?
Array may include a field bool sliceExtracted; that is set to true whenever you take a slice from the array and set to false whenever the array's data is reallocated. The array's documentation could mention that ~= is amortized constant if there are no intervening slicing operations.
We weren't talking about performance, but I think I know what you are saying: To solve the hard-to-determine reallocation problem, you propose that any array which has had a slice taken from it, will reallocate on append even if it could grow into the block? That seems like a rather blunt solution... if(arr[0.."hello".length] == "hello") arr ~= "xyz"; // oops, reallocate because we took a slice earlier. Wouldn't this make slicing combined with appending unusable for performance reasons? Then the only time you want to use slices is if you are done appending to the array, and then even today's implementation wouldn't cause the hard-to-determine problem for code written that way. Maybe I misunderstood you, let me know.
You understood me correctly. I think it's reasonable if Array defines ~= in relation to using other primitives. It's a common approach, for example, in C++ and Java to define an operation (such as bumping or dereferencing an iterator) in terms of no intervening conflicting operations against the same object.
Why do we need this scheme then? If it only makes sense to use slicing when you are done appending, then why not use that model with today's rules? Why make code that uses slices before you are finished appending perform poorly? Why make code that uses temporary slices (like I showed in my example) perform poorly?
Because we don't know how to make things perfect in all circumstances. Andrei
Nov 24 2009
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 24 Nov 2009 14:55:55 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Because we don't know how to make things perfect in all circumstances.
Well, this makes things less than perfect in most circumstances for the sake of a small questionable use case. IMO, this is a huge step backwards. -Steve
Nov 24 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 14:55:55 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Because we don't know how to make things perfect in all circumstances.
Well, this makes things less than perfect in most circumstances for the sake of a small questionable use case. IMO, this is a huge step backwards.
I don't understand (I think I've lost track of what's going on). What is the problem, what is a step backwards, and what is a better solution? Andrei
Nov 24 2009
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 14:07:27 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 11:07:43 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping 
 T[new] and not having a reference array type.  The argument is (and 
 forgive me if this is a strawman, someone from the other side can 
 post if I'm incorrect): If we make arrays a separate type from 
 slices, and only allow appending on arrays, then we solve the 
 stomping problem and the hard-to-determine reallocating problem.  
 For those who are unfamiliar with these problems, I'll try to 
 outline them at the bottom of the post.
  I contend that even if we make arrays a separate type, even if 
 arrays are made a true reference type, slices will still suffer 
 from the same hard-to-determine reallocation problem *unless* we 
 make slices fatter.
  My proof is as simple as an example.  Assume 'Array' is the new 
 type for an array:
  auto a = new Array!(int)(15); // allocate an array of 15 integers, 
 all 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
  Now, what does s[0] equal?
Array may include a field bool sliceExtracted; that is set to true whenever you take a slice from the array and set to false whenever the array's data is reallocated. The array's documentation could mention that ~= is amortized constant if there are no intervening slicing operations.
We weren't talking about performance, but I think I know what you are saying: To solve the hard-to-determine reallocation problem, you propose that any array which has had a slice taken from it, will reallocate on append even if it could grow into the block? That seems like a rather blunt solution... if(arr[0.."hello".length] == "hello") arr ~= "xyz"; // oops, reallocate because we took a slice earlier. Wouldn't this make slicing combined with appending unusable for performance reasons? Then the only time you want to use slices is if you are done appending to the array, and then even today's implementation wouldn't cause the hard-to-determine problem for code written that way. Maybe I misunderstood you, let me know.
You understood me correctly. I think it's reasonable if Array defines ~= in relation to using other primitives. It's a common approach, for example, in C++ and Java to define an operation (such as bumping or dereferencing an iterator) in terms of no intervening conflicting operations against the same object.
Why do we need this scheme then? If it only makes sense to use slicing when you are done appending, then why not use that model with today's rules? Why make code that uses slices before you are finished appending perform poorly? Why make code that uses temporary slices (like I showed in my example) perform poorly? -Steve
Oh, I think I understand. I wasn't talking about the current proposal, but instead of your hypothetical Array class (note that I capitalized Array). So probably I caused confusion. Andrei
Nov 24 2009
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 24 Nov 2009 15:25:27 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
  Why do we need this scheme then?  If it only makes sense to use  
 slicing when you are done appending, then why not use that model with  
 today's rules?  Why make code that uses slices before you are finished  
 appending perform poorly?  Why make code that uses temporary slices  
 (like I showed in my example) perform poorly?
  -Steve
Oh, I think I understand. I wasn't talking about the current proposal, but instead of your hypothetical Array class (note that I capitalized Array). So probably I caused confusion.
I might be the one causing confusion :) I assumed you were taking the side of "Array should be a separate type from slice" and proposing a method of avoiding hard-to-determine reallocation properties. But my counter-point is that your method forces users (at least users who care about performance) to avoid slicing in the first place until all appending operations are finished. A common usage of slicing is to temporarily use a read-only view of a portion of an array, as in my example. In that case, you inadvertently triggered a reallocation when none was necessary. While that is a conservative approach, it hinders performance unnecessarily. Basically, in many cases where slicing and appending are interleaved, you will incur an unreasonable performance penalty. *and* without this scheme, you can achieve the same results by writing your code the way you would have written it if this scheme were in place. In other words, isn't it just enough to recommend "don't slice before you're done appending" instead of inventing weird allocation rules to herd programmers in that direction? -Steve
Nov 24 2009
prev sibling parent reply Bartosz Milewski <bartosz-nospam relisoft.com> writes:
Andrei Alexandrescu Wrote:

 Array may include a field
 
     bool sliceExtracted;
 
 that is set to true whenever you take a slice from the array and set to 
 false whenever the array's data is reallocated. The array's 
 documentation could mention that ~= is amortized constant if there are 
 no intervening slicing operations.
In other words, an array may include a field "isUnique" that is set to true after every re-allocation and turned off whenever the compiler can't prove uniqueness--e.g., when a slice is taken. This "dynamic uniqueness" is different from "static uniqueness". The latter could be made part of the type system and checked at compile time, but the former might be just enough for the purpose of optimization. What is important is that it would clear the semantics of array expansion--it would always predictably break the connection to its slices, if any. Note also that if slices are just two pointers, they will never dangle because the GC will guarantee that the original is not recycled as long as somebody is pointing into it. If this could be implemented, I'd feel much better about D arrays.
Nov 24 2009
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 24 Nov 2009 15:29:22 -0500, Bartosz Milewski  
<bartosz-nospam relisoft.com> wrote:

 Andrei Alexandrescu Wrote:

 Array may include a field

     bool sliceExtracted;

 that is set to true whenever you take a slice from the array and set to
 false whenever the array's data is reallocated. The array's
 documentation could mention that ~= is amortized constant if there are
 no intervening slicing operations.
In other words, an array may include a field "isUnique" that is set to true after every re-allocation and turned off whenever the compiler can't prove uniqueness--e.g., when a slice is taken. This "dynamic uniqueness" is different from "static uniqueness". The latter could be made part of the type system and checked at compile time, but the former might be just enough for the purpose of optimization. What is important is that it would clear the semantics of array expansion--it would always predictably break the connection to its slices, if any. Note also that if slices are just two pointers, they will never dangle because the GC will guarantee that the original is not recycled as long as somebody is pointing into it. If this could be implemented, I'd feel much better about D arrays.
I'll add this to my opinion on Andrei's idea: If Array is a suplementary type in addition to slices (that is, appending to slices is still possible, and do not necessarily reallocate on appending), then I agree this is along the right lines for an appending array type used to build arrays as efficiently as possible. -Steve
Nov 24 2009
prev sibling parent "Phil Deets" <pjdeets2 gmail.com> writes:
I'm somewhat new to D; so take everything I say with a grain of salt.

It seems to me that there is a tension here between high-level and  
low-level. A high-level Array type might make slices be a Range. The Range  
would keep a reference to the Array type plus a start and end index. When  
the slice gets accessed, array range checking occurs to make sure the  
array hasn't shrunk outside of the area the Range covers.

Array!(int) x = [1, 2, 3, 4, 5];
auto y = x[0..$];
x.length -= 1;
y[4] // array out of bounds error

Also, when Array reallocates, all the Range slices would remain valid  
since they call .ptr to get the pointer anyway whenever they are indexed.  
Maybe even appending to a Range slice would insert the elements instead of  
stomp.

Array!(int) x = [1, 2, 3, 4, 5];
auto y = x[0..1];
y ~= [5, 6]; // x is [1, 5, 6, 2, 3, 4, 5];

The Range would tell the Array to insert the elements, and the Array would  
take care of informing all the slices to increase their indices if they  
are after the inserted data. This would require Arrays to hold weak  
references to all of their slices. (I know the GC doesn't support weak  
references, but I'm just speaking hypothetically.)

I think this would be perfectly safe and would avoid all the issues  
presented, but the performance would suffer. This level of "fatness" would  
be unacceptable for many system-level applications. So maybe int[] should  
be low-level and perhaps even avoid appending altogether, while Array  
would be available as a library type. But now, we are almost back to C++  
with pointers and vectors. The main difference would be D's low-level  
arrays store their length and can be bounds checked.

Please forgive me if someone has already proposed this. I didn't go back  
and read all the past discussion before I posted this. I don't even know  
what the functionality of T[new] was; so I probably don't understand all  
the issues here.
Nov 24 2009