digitalmars.D - The great slice debate -- should slices be separated from arrays?

Steven Schveighoffer (112/112) Nov 24 2009 In many other posts, people have been festering over dropping T[new] and...

Steven Schveighoffer (5/11) Nov 24 2009 Apologies, this should be written:
Lutger (3/3) Nov 24 2009 Just to clarify, is it true that stomping over mutable arrays is covered...

Steven Schveighoffer (10/15) Nov 24 2009 Stomping of arrays is covered by the spec (without reference to how it

Steven Schveighoffer (7/8) Nov 24 2009 Gah! my apologies to David, I should know better as someone who has a
Andrei Alexandrescu (8/29) Nov 24 2009 Array may include a field

Steven Schveighoffer (15/40) Nov 24 2009 We weren't talking about performance, but I think I know what you are

Andrei Alexandrescu (7/56) Nov 24 2009 You understood me correctly. I think it's reasonable if Array defines ~=...

Steven Schveighoffer (8/60) Nov 24 2009 Why do we need this scheme then? If it only makes sense to use slicing ...

Andrei Alexandrescu (3/65) Nov 24 2009 Because we don't know how to make things perfect in all circumstances.

Steven Schveighoffer (5/6) Nov 24 2009 Well, this makes things less than perfect in most circumstances for the ...

Andrei Alexandrescu (4/11) Nov 24 2009 I don't understand (I think I've lost track of what's going on). What is...

Andrei Alexandrescu (5/69) Nov 24 2009 Oh, I think I understand. I wasn't talking about the current proposal,

Steven Schveighoffer (20/30) Nov 24 2009 I might be the one causing confusion :) I assumed you were taking the

Bartosz Milewski (5/13) Nov 24 2009

Steven Schveighoffer (8/29) Nov 24 2009 I'll add this to my opinion on Andrei's idea: If Array is a suplementar...

Phil Deets (33/33) Nov 24 2009 I'm somewhat new to D; so take everything I say with a grain of salt.

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

In many other posts, people have been festering over dropping T[new] and  
not having a reference array type.  The argument is (and forgive me if  
this is a strawman, someone from the other side can post if I'm  
incorrect): If we make arrays a separate type from slices, and only allow  
appending on arrays, then we solve the stomping problem and the  
hard-to-determine reallocating problem.  For those who are unfamiliar with  
these problems, I'll try to outline them at the bottom of the post.

I contend that even if we make arrays a separate type, even if arrays are  
made a true reference type, slices will still suffer from the same  
hard-to-determine reallocation problem *unless* we make slices fatter.

My proof is as simple as an example.  Assume 'Array' is the new type for  
an array:

auto a = new Array!(int)(15); // allocate an array of 15 integers, all 0
auto s = a[0..5];
a ~= [1,2,3,4,5];
a[0] = 1.

Now, what does s[0] equal?

It depends on how s is defined.  If s is a simple pointer + length slice  
as defined today, then it is *STILL* hard-to-determine because you are  
unsure whether a had to reallocate to a new block!  If you make s a "fat"  
slice, that is, it contains a reference to the *original* array, and 2  
indexes, then you have issues:

1. you are now passing around 12 (24 for 64-bit) bytes for a slice instead  
of 8 (16), that hurts performance
2. you are now subject to a slice becoming invalid if a decides to  
reallocate into a smaller block and your slice indexes data that is now  
outside the block.
3. you are now subject to corruption if the array decides to truncate from  
the *front* of it's block, since the slice will now refer to different  
data.

Because of points 2 and 3, you lose static provability of slices being  
valid, since their backing array can reallocate at any time.

Are there any alternative implementations for slices that work better?  If  
not, I think thin slices (pointer and length) are the way to go, which  
leaves us with a still hard-to-determine reallocation strategy.  In this  
case, why not leave the arrays the way they are?

The answer is, stomping.  But we have already found ways to fix that  
without changing any code.  In the instance where you need the fastest  
appending possible, we can create a library type to handle that, then you  
convert to an actual array when you need it.  I think dsichma already is  
working on such a type.

So is there anyone in the "please separate arrays from slices" camp that  
can counter this?  Are there other solutions someone can think of or has  
already brought up that I didn't mention?

======================

Definition of the "hard-to-determine" reallocating problem:  (I'd classify  
this as non-deterministic reallocating, but it depends on your point of  
view, and Walter has fits over that term :)

If you reallocate an array, it may or may not copy the data to another  
memory block depending on if it can resize into the existing memory  
block.  If it can, then the original memory is still reference by the  
array.  The determination of when this occurs is dependent on the  
implementation of the runtime.  In the current incarnation of dmd, this  
occurs when the array is sized beyond powers of 2 up to a page size, and  
then at page size increments.

The trouble is, when you see code like this:

void foo(char[] buf)
{
    buf ~= "abcd";
    buf[0] = 'z';
}

You cannot tell whether the input array was affected because the append  
statement may or may not change the address of buf.  If it doesn't, then  
the argument passed in is affected.  If it does, the argument passed in is  
not affected.  The current solutions to this problem are to either  
document the behavior or dup the incoming buffer before appending (or  
using the buf = buf ~ "abcd" form which guarantees reallocation).

======================

Definition of the stomping problem:

If you append to a slice which *starts* at the beginning of a  
heap-allocated memory block, and the append operation fits within the  
memory block, the compiler reallocates in place.  However, if that slice  
was a smaller piece of a larger array, it will overwrite the data in the  
array.  The optimization came into effect when people noticed code like  
the following was very slow and ate up too much memory:

int[] x;

for(int i = 0; i < 10000; i++)
    x ~= i;

If x reallocates on every loop, it will allocate 10000 times, each time  
allocating a larger memory block.  This forces more GC collection cycles  
and can leave behind scores of unused pages of memory if the collector  
doesn't reuse or allow the OS to reclaim them.

The optimization depends on the likelihood of a slice that points to the  
beginning of a block being the owner of the memory block (that is, all  
other slices were created from it, and therefore do not go beyond the  
extents of the primary slice).  This is certainly the case after the first  
reallocation that occurs in the loop above, making this loop perfectly  
safe.  However, it is not hard to come up with an unsafe case:

int[] x = [1,2,3];
auto y = x[0..1]; // slice only the first element
y ~= 5; // now x == [1,5,3]

This is even more disturbing for const or immutable data:

string x = "hello".idup; // need the idup to put the data on the heap.
auto y = x[0..1]; // slice only "h";
y ~= "owdy"; // now x, which is supposed to be immutable, changed to  
"howdy"

This stomping becomes more of a problem for library writers:

char * toStringZ(string s)
{
    s ~= '\0';
    return s.ptr;
}

This seemingly innocuous function could easily corrupt immutable data if s  
is a slice of the beginning of a larger array.  Therefore, toStringZ has  
to be defensive:

char * toStringZ(string s)
{
    s = s ~ '\0'; // not using ~= operator forces a reallocation
    return s.ptr;
}

But this "defensive" style would be unnecessary if stomping could not  
occur.

Nov 24 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 24 Nov 2009 09:27:22 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 This stomping becomes more of a problem for library writers:

 char * toStringZ(string s)
 {
     s ~= '\0';
     return s.ptr;
 }

Apologies, this should be written:

immutable(char) * toStringZ(string s)


-Steve

Nov 24 2009

Lutger <lutger.blijdestijn gmail.com> writes:

Just to clarify, is it true that stomping over mutable arrays is covered by 
the spec (currently), but stomping immutable arrays is classified as a bug? 
I was very surprised by this.

Nov 24 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 24 Nov 2009 10:19:57 -0500, Lutger <lutger.blijdestijn gmail.com>  
wrote:

 Just to clarify, is it true that stomping over mutable arrays is covered  
 by
 the spec (currently), but stomping immutable arrays is classified as a  
 bug?
 I was very surprised by this.

Stomping of arrays is covered by the spec (without reference to how it  
affects immutability), but so is immutability, so technically it is an  
inconsistency in the spec, but it's an inconsistency that cannot be fixed  
without declaring that all append operations render other slices that  
alias the same data invalid.

See http://www.digitalmars.com/d/2.0/arrays.html#resize for the spec info  
on stomping, note there is no mention of immutability being compromised.

-Steve

Nov 24 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 24 Nov 2009 09:27:22 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 I think dsichma already is working on such a type.

Gah!  my apologies to David, I should know better as someone who has a  
hard-to-spell last name.

For some reason I always read "dee-sic-ma" when I see your handle :)

I meant of course dsimcha.

-Steve

Nov 24 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping T[new] and 
 not having a reference array type.  The argument is (and forgive me if 
 this is a strawman, someone from the other side can post if I'm 
 incorrect): If we make arrays a separate type from slices, and only 
 allow appending on arrays, then we solve the stomping problem and the 
 hard-to-determine reallocating problem.  For those who are unfamiliar 
 with these problems, I'll try to outline them at the bottom of the post.
 
 I contend that even if we make arrays a separate type, even if arrays 
 are made a true reference type, slices will still suffer from the same 
 hard-to-determine reallocation problem *unless* we make slices fatter.
 
 My proof is as simple as an example.  Assume 'Array' is the new type for 
 an array:
 
 auto a = new Array!(int)(15); // allocate an array of 15 integers, all 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
 
 Now, what does s[0] equal?

Array may include a field

    bool sliceExtracted;

that is set to true whenever you take a slice from the array and set to 
false whenever the array's data is reallocated. The array's 
documentation could mention that ~= is amortized constant if there are 
no intervening slicing operations.


Andrei

Nov 24 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 24 Nov 2009 11:07:43 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping T[new]  
 and not having a reference array type.  The argument is (and forgive me  
 if this is a strawman, someone from the other side can post if I'm  
 incorrect): If we make arrays a separate type from slices, and only  
 allow appending on arrays, then we solve the stomping problem and the  
 hard-to-determine reallocating problem.  For those who are unfamiliar  
 with these problems, I'll try to outline them at the bottom of the post.
  I contend that even if we make arrays a separate type, even if arrays  
 are made a true reference type, slices will still suffer from the same  
 hard-to-determine reallocation problem *unless* we make slices fatter.
  My proof is as simple as an example.  Assume 'Array' is the new type  
 for an array:
  auto a = new Array!(int)(15); // allocate an array of 15 integers, all  
 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
  Now, what does s[0] equal?

 Array may include a field

     bool sliceExtracted;

 that is set to true whenever you take a slice from the array and set to  
 false whenever the array's data is reallocated. The array's  
 documentation could mention that ~= is amortized constant if there are  
 no intervening slicing operations.

We weren't talking about performance, but I think I know what you are  
saying: To solve the hard-to-determine reallocation problem, you propose  
that any array which has had a slice taken from it, will reallocate on  
append even if it could grow into the block?  That seems like a rather  
blunt solution...

if(arr[0.."hello".length] == "hello")
    arr ~= "xyz"; // oops, reallocate because we took a slice earlier.

Wouldn't this make slicing combined with appending unusable for  
performance reasons?  Then the only time you want to use slices is if you  
are done appending to the array, and then even today's implementation  
wouldn't cause the hard-to-determine problem for code written that way.

Maybe I misunderstood you, let me know.

-Steve

Nov 24 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 11:07:43 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping T[new] 
 and not having a reference array type.  The argument is (and forgive 
 me if this is a strawman, someone from the other side can post if I'm 
 incorrect): If we make arrays a separate type from slices, and only 
 allow appending on arrays, then we solve the stomping problem and the 
 hard-to-determine reallocating problem.  For those who are unfamiliar 
 with these problems, I'll try to outline them at the bottom of the post.
  I contend that even if we make arrays a separate type, even if 
 arrays are made a true reference type, slices will still suffer from 
 the same hard-to-determine reallocation problem *unless* we make 
 slices fatter.
  My proof is as simple as an example.  Assume 'Array' is the new type 
 for an array:
  auto a = new Array!(int)(15); // allocate an array of 15 integers, 
 all 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
  Now, what does s[0] equal?

 Array may include a field

     bool sliceExtracted;

 that is set to true whenever you take a slice from the array and set 
 to false whenever the array's data is reallocated. The array's 
 documentation could mention that ~= is amortized constant if there are 
 no intervening slicing operations.

 
 We weren't talking about performance, but I think I know what you are 
 saying: To solve the hard-to-determine reallocation problem, you propose 
 that any array which has had a slice taken from it, will reallocate on 
 append even if it could grow into the block?  That seems like a rather 
 blunt solution...
 
 if(arr[0.."hello".length] == "hello")
    arr ~= "xyz"; // oops, reallocate because we took a slice earlier.
 
 Wouldn't this make slicing combined with appending unusable for 
 performance reasons?  Then the only time you want to use slices is if 
 you are done appending to the array, and then even today's 
 implementation wouldn't cause the hard-to-determine problem for code 
 written that way.
 
 Maybe I misunderstood you, let me know.

You understood me correctly. I think it's reasonable if Array defines ~= 
in relation to using other primitives. It's a common approach, for 
example, in C++ and Java to define an operation (such as bumping or 
dereferencing an iterator) in terms of no intervening conflicting 
operations against the same object.


Andrei

Nov 24 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 24 Nov 2009 14:07:27 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 11:07:43 -0500, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping T[new]  
 and not having a reference array type.  The argument is (and forgive  
 me if this is a strawman, someone from the other side can post if I'm  
 incorrect): If we make arrays a separate type from slices, and only  
 allow appending on arrays, then we solve the stomping problem and the  
 hard-to-determine reallocating problem.  For those who are unfamiliar  
 with these problems, I'll try to outline them at the bottom of the  
 post.
  I contend that even if we make arrays a separate type, even if  
 arrays are made a true reference type, slices will still suffer from  
 the same hard-to-determine reallocation problem *unless* we make  
 slices fatter.
  My proof is as simple as an example.  Assume 'Array' is the new type  
 for an array:
  auto a = new Array!(int)(15); // allocate an array of 15 integers,  
 all 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
  Now, what does s[0] equal?

 Array may include a field

     bool sliceExtracted;

 that is set to true whenever you take a slice from the array and set  
 to false whenever the array's data is reallocated. The array's  
 documentation could mention that ~= is amortized constant if there are  
 no intervening slicing operations.

  We weren't talking about performance, but I think I know what you are  
 saying: To solve the hard-to-determine reallocation problem, you  
 propose that any array which has had a slice taken from it, will  
 reallocate on append even if it could grow into the block?  That seems  
 like a rather blunt solution...
  if(arr[0.."hello".length] == "hello")
    arr ~= "xyz"; // oops, reallocate because we took a slice earlier.
  Wouldn't this make slicing combined with appending unusable for  
 performance reasons?  Then the only time you want to use slices is if  
 you are done appending to the array, and then even today's  
 implementation wouldn't cause the hard-to-determine problem for code  
 written that way.
  Maybe I misunderstood you, let me know.

 You understood me correctly. I think it's reasonable if Array defines ~=  
 in relation to using other primitives. It's a common approach, for  
 example, in C++ and Java to define an operation (such as bumping or  
 dereferencing an iterator) in terms of no intervening conflicting  
 operations against the same object.

Why do we need this scheme then?  If it only makes sense to use slicing  
when you are done appending, then why not use that model with today's  
rules?  Why make code that uses slices before you are finished appending  
perform poorly?  Why make code that uses temporary slices (like I showed  
in my example) perform poorly?

-Steve

Nov 24 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 14:07:27 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 11:07:43 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping 
 T[new] and not having a reference array type.  The argument is (and 
 forgive me if this is a strawman, someone from the other side can 
 post if I'm incorrect): If we make arrays a separate type from 
 slices, and only allow appending on arrays, then we solve the 
 stomping problem and the hard-to-determine reallocating problem.  
 For those who are unfamiliar with these problems, I'll try to 
 outline them at the bottom of the post.
  I contend that even if we make arrays a separate type, even if 
 arrays are made a true reference type, slices will still suffer 
 from the same hard-to-determine reallocation problem *unless* we 
 make slices fatter.
  My proof is as simple as an example.  Assume 'Array' is the new 
 type for an array:
  auto a = new Array!(int)(15); // allocate an array of 15 integers, 
 all 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
  Now, what does s[0] equal?

 Array may include a field

     bool sliceExtracted;

 that is set to true whenever you take a slice from the array and set 
 to false whenever the array's data is reallocated. The array's 
 documentation could mention that ~= is amortized constant if there 
 are no intervening slicing operations.

  We weren't talking about performance, but I think I know what you 
 are saying: To solve the hard-to-determine reallocation problem, you 
 propose that any array which has had a slice taken from it, will 
 reallocate on append even if it could grow into the block?  That 
 seems like a rather blunt solution...
  if(arr[0.."hello".length] == "hello")
    arr ~= "xyz"; // oops, reallocate because we took a slice earlier.
  Wouldn't this make slicing combined with appending unusable for 
 performance reasons?  Then the only time you want to use slices is if 
 you are done appending to the array, and then even today's 
 implementation wouldn't cause the hard-to-determine problem for code 
 written that way.
  Maybe I misunderstood you, let me know.

 You understood me correctly. I think it's reasonable if Array defines 
 ~= in relation to using other primitives. It's a common approach, for 
 example, in C++ and Java to define an operation (such as bumping or 
 dereferencing an iterator) in terms of no intervening conflicting 
 operations against the same object.

 
 Why do we need this scheme then?  If it only makes sense to use slicing 
 when you are done appending, then why not use that model with today's 
 rules?  Why make code that uses slices before you are finished appending 
 perform poorly?  Why make code that uses temporary slices (like I showed 
 in my example) perform poorly?

Because we don't know how to make things perfect in all circumstances.

Andrei

Nov 24 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 24 Nov 2009 14:55:55 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Because we don't know how to make things perfect in all circumstances.

Well, this makes things less than perfect in most circumstances for the  
sake of a small questionable use case.  IMO, this is a huge step backwards.

-Steve

Nov 24 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 14:55:55 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Because we don't know how to make things perfect in all circumstances.

 
 Well, this makes things less than perfect in most circumstances for the 
 sake of a small questionable use case.  IMO, this is a huge step backwards.

I don't understand (I think I've lost track of what's going on). What is 
the problem, what is a step backwards, and what is a better solution?

Andrei

Nov 24 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 14:07:27 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Tue, 24 Nov 2009 11:07:43 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 In many other posts, people have been festering over dropping 
 T[new] and not having a reference array type.  The argument is (and 
 forgive me if this is a strawman, someone from the other side can 
 post if I'm incorrect): If we make arrays a separate type from 
 slices, and only allow appending on arrays, then we solve the 
 stomping problem and the hard-to-determine reallocating problem.  
 For those who are unfamiliar with these problems, I'll try to 
 outline them at the bottom of the post.
  I contend that even if we make arrays a separate type, even if 
 arrays are made a true reference type, slices will still suffer 
 from the same hard-to-determine reallocation problem *unless* we 
 make slices fatter.
  My proof is as simple as an example.  Assume 'Array' is the new 
 type for an array:
  auto a = new Array!(int)(15); // allocate an array of 15 integers, 
 all 0
 auto s = a[0..5];
 a ~= [1,2,3,4,5];
 a[0] = 1.
  Now, what does s[0] equal?

 Array may include a field

     bool sliceExtracted;

 that is set to true whenever you take a slice from the array and set 
 to false whenever the array's data is reallocated. The array's 
 documentation could mention that ~= is amortized constant if there 
 are no intervening slicing operations.

  We weren't talking about performance, but I think I know what you 
 are saying: To solve the hard-to-determine reallocation problem, you 
 propose that any array which has had a slice taken from it, will 
 reallocate on append even if it could grow into the block?  That 
 seems like a rather blunt solution...
  if(arr[0.."hello".length] == "hello")
    arr ~= "xyz"; // oops, reallocate because we took a slice earlier.
  Wouldn't this make slicing combined with appending unusable for 
 performance reasons?  Then the only time you want to use slices is if 
 you are done appending to the array, and then even today's 
 implementation wouldn't cause the hard-to-determine problem for code 
 written that way.
  Maybe I misunderstood you, let me know.

 You understood me correctly. I think it's reasonable if Array defines 
 ~= in relation to using other primitives. It's a common approach, for 
 example, in C++ and Java to define an operation (such as bumping or 
 dereferencing an iterator) in terms of no intervening conflicting 
 operations against the same object.

 
 Why do we need this scheme then?  If it only makes sense to use slicing 
 when you are done appending, then why not use that model with today's 
 rules?  Why make code that uses slices before you are finished appending 
 perform poorly?  Why make code that uses temporary slices (like I showed 
 in my example) perform poorly?
 
 -Steve

Oh, I think I understand. I wasn't talking about the current proposal, 
but instead of your hypothetical Array class (note that I capitalized 
Array). So probably I caused confusion.

Andrei

Nov 24 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 24 Nov 2009 15:25:27 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
  Why do we need this scheme then?  If it only makes sense to use  
 slicing when you are done appending, then why not use that model with  
 today's rules?  Why make code that uses slices before you are finished  
 appending perform poorly?  Why make code that uses temporary slices  
 (like I showed in my example) perform poorly?
  -Steve

 Oh, I think I understand. I wasn't talking about the current proposal,  
 but instead of your hypothetical Array class (note that I capitalized  
 Array). So probably I caused confusion.

I might be the one causing confusion :)  I assumed you were taking the  
side of "Array should be a separate type from slice" and proposing a  
method of avoiding hard-to-determine reallocation properties.

But my counter-point is that your method forces users (at least users who  
care about performance) to avoid slicing in the first place until all  
appending operations are finished.  A common usage of slicing is to  
temporarily use a read-only view of a portion of an array, as in my  
example.  In that case, you inadvertently triggered a reallocation when  
none was necessary.  While that is a conservative approach, it hinders  
performance unnecessarily.  Basically, in many cases where slicing and  
appending are interleaved, you will incur an unreasonable performance  
penalty.

*and* without this scheme, you can achieve the same results by writing  
your code the way you would have written it if this scheme were in place.   
In other words, isn't it just enough to recommend "don't slice before  
you're done appending" instead of inventing weird allocation rules to herd  
programmers in that direction?

-Steve

Nov 24 2009

Bartosz Milewski <bartosz-nospam relisoft.com> writes:

Andrei Alexandrescu Wrote:

 Array may include a field
 
     bool sliceExtracted;
 
 that is set to true whenever you take a slice from the array and set to 
 false whenever the array's data is reallocated. The array's 
 documentation could mention that ~= is amortized constant if there are 
 no intervening slicing operations.

 
In other words, an array may include a field "isUnique" that is set to true
after every re-allocation and turned off whenever the compiler can't prove
uniqueness--e.g., when a slice is taken. This "dynamic uniqueness" is different
from "static uniqueness". The latter could be made part of the type system and
checked at compile time, but the former might be just enough for the purpose of
optimization. 

What is important is that it would clear the semantics of array expansion--it
would always predictably break the connection to its slices, if any. Note also
that if slices are just two pointers, they will never dangle because the GC
will guarantee that the original is not recycled as long as somebody is
pointing into it.

If this could be implemented, I'd feel much better about D arrays.

Nov 24 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 24 Nov 2009 15:29:22 -0500, Bartosz Milewski  
<bartosz-nospam relisoft.com> wrote:

 Andrei Alexandrescu Wrote:

 Array may include a field

     bool sliceExtracted;

 that is set to true whenever you take a slice from the array and set to
 false whenever the array's data is reallocated. The array's
 documentation could mention that ~= is amortized constant if there are
 no intervening slicing operations.

 In other words, an array may include a field "isUnique" that is set to  
 true after every re-allocation and turned off whenever the compiler  
 can't prove uniqueness--e.g., when a slice is taken. This "dynamic  
 uniqueness" is different from "static uniqueness". The latter could be  
 made part of the type system and checked at compile time, but the former  
 might be just enough for the purpose of optimization.

 What is important is that it would clear the semantics of array  
 expansion--it would always predictably break the connection to its  
 slices, if any. Note also that if slices are just two pointers, they  
 will never dangle because the GC will guarantee that the original is not  
 recycled as long as somebody is pointing into it.

 If this could be implemented, I'd feel much better about D arrays.

I'll add this to my opinion on Andrei's idea:  If Array is a suplementary  
type in addition to slices (that is, appending to slices is still  
possible, and do not necessarily reallocate on appending), then I agree  
this is along the right lines for an appending array type used to build  
arrays as efficiently as possible.

-Steve

Nov 24 2009

"Phil Deets" <pjdeets2 gmail.com> writes:

I'm somewhat new to D; so take everything I say with a grain of salt.

It seems to me that there is a tension here between high-level and  
low-level. A high-level Array type might make slices be a Range. The Range  
would keep a reference to the Array type plus a start and end index. When  
the slice gets accessed, array range checking occurs to make sure the  
array hasn't shrunk outside of the area the Range covers.

Array!(int) x = [1, 2, 3, 4, 5];
auto y = x[0..$];
x.length -= 1;
y[4] // array out of bounds error

Also, when Array reallocates, all the Range slices would remain valid  
since they call .ptr to get the pointer anyway whenever they are indexed.  
Maybe even appending to a Range slice would insert the elements instead of  
stomp.

Array!(int) x = [1, 2, 3, 4, 5];
auto y = x[0..1];
y ~= [5, 6]; // x is [1, 5, 6, 2, 3, 4, 5];

The Range would tell the Array to insert the elements, and the Array would  
take care of informing all the slices to increase their indices if they  
are after the inserted data. This would require Arrays to hold weak  
references to all of their slices. (I know the GC doesn't support weak  
references, but I'm just speaking hypothetically.)

I think this would be perfectly safe and would avoid all the issues  
presented, but the performance would suffer. This level of "fatness" would  
be unacceptable for many system-level applications. So maybe int[] should  
be low-level and perhaps even avoid appending altogether, while Array  
would be available as a library type. But now, we are almost back to C++  
with pointers and vectors. The main difference would be D's low-level  
arrays store their length and can be bounds checked.

Please forgive me if someone has already proposed this. I didn't go back  
and read all the past discussion before I posted this. I don't even know  
what the functionality of T[new] was; so I probably don't understand all  
the issues here.

Nov 24 2009

D Programming

C/C++ Programming

Other

digitalmars.D - The great slice debate -- should slices be separated from arrays?