www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Resizable Arrays?

reply Jason House <jason.james.house gmail.com> writes:
Are there any good reasons to allow built in arrays to be resizable?

There's already plenty of empirical evidence that building arrays by appending
is incredibly slow.

There are also bugs in bugzilla where resizing arrays after assigning one array
to another violates type safety. Those bugs can be solved by always
reallocatimg arrays when resizing, but that makes performance even worse...

While not much of an argument, C# also prevents array resizing.

Is it time to rely on standard libraries for data structures and let built in
arrays be fixed length? It's a breaking change... 
Feb 27 2009
next sibling parent reply "Tim M" <a b.com> writes:
On Sat, 28 Feb 2009 12:55:44 +1300, Jason House  
<jason.james.house gmail.com> wrote:

 Are there any good reasons to allow built in arrays to be resizable?

 There's already plenty of empirical evidence that building arrays by  
 appending is incredibly slow.

 There are also bugs in bugzilla where resizing arrays after assigning  
 one array to another violates type safety. Those bugs can be solved by  
 always reallocatimg arrays when resizing, but that makes performance  
 even worse...

 While not much of an argument, C# also prevents array resizing.

 Is it time to rely on standard libraries for data structures and let  
 built in arrays be fixed length? It's a breaking change...

C++ has stl vectors to make it up for lack of resizeable arrays. Theres nothing stopping you from having all your arrays static in D. I think your crazy :)
Feb 27 2009
next sibling parent reply BCS <ao pathlink.com> writes:
Reply to tim,

 Theres nothing stopping you from having all your arrays static in D. I
 think your crazy :)
 

I think the thought is that arrays can have dynamic length but can't be changed after allocation. starting with: int[] arr = new int[15]; assigning to length arr.length = 30; would always become auto tmp = new int[30]; tmp[0..min(arr.length,$)] = arr[0..min(tmp.length,$)]; arr = tmp;
Feb 27 2009
parent Jason House <jason.james.house gmail.com> writes:
BCS Wrote:

 Reply to tim,
 
 Theres nothing stopping you from having all your arrays static in D. I
 think your crazy :)
 

I think the thought is that arrays can have dynamic length but can't be changed after allocation.

Right. Length is fixed until (re)allication.
 starting with:
 
 int[] arr = new int[15];
 
 assigning to length
 
 arr.length = 30;
 
 would always become
 
 auto tmp = new int[30];
 tmp[0..min(arr.length,$)] = arr[0..min(tmp.length,$)];
 arr = tmp;

Either that or a simple compile error... It's also possible a middle ground where only const(T)[] is fixed length until (re)allocation. I'm hoping there will be good discussion on the pros and cons of alternatives.
Feb 27 2009
prev sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 28 Feb 2009 04:27:52 +0300, Jason House <jason.james.house gmail.com>
wrote:

 BCS Wrote:

 Reply to tim,

 Theres nothing stopping you from having all your arrays static in D. I
 think your crazy :)

I think the thought is that arrays can have dynamic length but can't be changed after allocation.

Right. Length is fixed until (re)allication.
 starting with:

 int[] arr = new int[15];

 assigning to length

 arr.length = 30;

 would always become

 auto tmp = new int[30];
 tmp[0..min(arr.length,$)] = arr[0..min(tmp.length,$)];
 arr = tmp;

Either that or a simple compile error... It's also possible a middle ground where only const(T)[] is fixed length until (re)allocation. I'm hoping there will be good discussion on the pros and cons of alternatives.

I might be wrong, but it seems that arrays are always reallocating memory when resize. I might be wrong but that's the behaviour I discovered in a recent test.
Feb 27 2009
prev sibling next sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Jason House wrote:
 Are there any good reasons to allow built in arrays to be resizable?

http://www.digitalmars.com/d/2.0/builtin.html
 There's already plenty of empirical evidence that building arrays by 
 appending is incredibly slow.

You need to really get into programming in D and explore the many possible ways of using arrays. Stewart.
Feb 27 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Jason House (jason.james.house gmail.com)'s article
 Are there any good reasons to allow built in arrays to be resizable?
 There's already plenty of empirical evidence that building arrays by appending

 There are also bugs in bugzilla where resizing arrays after assigning one array

arrays when resizing, but that makes performance even worse...
 While not much of an argument, C# also prevents array resizing.
 Is it time to rely on standard libraries for data structures and let built in

I think that builtin arrays in D are a godsend. Resizable arrays are such an important, heavily used feature that they should get special treatment from the language, to ensure that they are syntactically elegant, efficient both at runtime and in terms of compilation time, easy to use, free of odd corner cases, and in general are true first class citizens. Nonetheless, D's arrays do have some rough edges. My proposal would be as follows: There should be two array types: T[new] and T[]. This has been proposed before, though not in as much detail as what I'm suggesting. The semantics should work as follows: T[new] is considered a subtype of T[]. This works mostly like you would expect. T[new] can be implicitly converted to T[]. The twist is that T[] can also be assigned to T[new] without any explicit conversion, but a copy of the underlying data is implicitly made for safety. Assigning either a T[] or a T[new] to a T[] will result in aliasing: T[] foo; T[new] bar = new T[N]; bar[0] = 1; foo = bar; foo[0] = 2; writeln(bar[0]); // Prints 2. T[new]s are always assigned by value to make concatenation and resizing safe. The underlying data is copied implicitly when assigning from a T[] to a T[new] or from a T[new] to another T[new]. T[new] foo; T[] bar = new T[N]; // new returns T[new], using implicit conversion. bar[0] = 1; foo = bar; foo[0] = 2; writeln(bar[0]); // Prints 1. T[]s can be sliced, but cannot be enlarged or appended to. They should be laid out the same way they are now, as a struct of a pointer and a length. T[new]s can be appended to and enlarged in addition to being sliced. Their layout should be a struct of pointer, length, capacity to make appending fast. This will also make implicit conversion to T[] essentially free, since all that is necessary is slicing off the capacity field. I also wonder if this proposal could be extended to fix some of the covariance issues with arrays (see Bugzilla 2095). This might work by only allowing covariant types to be assigned to T[new], not T[]. For example: class A{} class B:A{} B[new] ba = [new B]; A[] aa = ba; // Error: Cannot assign covariant types to T[]. A[new] aa = ba; // Works, but copy is implicitly made.
Feb 27 2009
parent dsimcha <dsimcha yahoo.com> writes:
== Quote from dsimcha (dsimcha yahoo.com)'s article
 == Quote from Jason House (jason.james.house gmail.com)'s article
 Are there any good reasons to allow built in arrays to be resizable?
 There's already plenty of empirical evidence that building arrays by appending

 There are also bugs in bugzilla where resizing arrays after assigning one array

arrays when resizing, but that makes performance even worse...
 While not much of an argument, C# also prevents array resizing.
 Is it time to rely on standard libraries for data structures and let built in

I think that builtin arrays in D are a godsend. Resizable arrays are such an important, heavily used feature that they should get special treatment from the language, to ensure that they are syntactically elegant, efficient both at runtime and in terms of compilation time, easy to use, free of odd corner cases, and in general are true first class citizens. Nonetheless, D's arrays do have some rough edges. My proposal would be as follows: There should be two array types: T[new] and T[]. This has been proposed before, though not in as much detail as what I'm suggesting. The semantics should work as follows: T[new] is considered a subtype of T[]. This works mostly like you would expect. T[new] can be implicitly converted to T[]. The twist is that T[] can also be assigned to T[new] without any explicit conversion, but a copy of the underlying data is implicitly made for safety. Assigning either a T[] or a T[new] to a T[] will result in aliasing: T[] foo; T[new] bar = new T[N]; bar[0] = 1; foo = bar; foo[0] = 2; writeln(bar[0]); // Prints 2. T[new]s are always assigned by value to make concatenation and resizing safe. The underlying data is copied implicitly when assigning from a T[] to a T[new] or from a T[new] to another T[new]. T[new] foo; T[] bar = new T[N]; // new returns T[new], using implicit conversion. bar[0] = 1; foo = bar; foo[0] = 2; writeln(bar[0]); // Prints 1. T[]s can be sliced, but cannot be enlarged or appended to. They should be laid out the same way they are now, as a struct of a pointer and a length. T[new]s can be appended to and enlarged in addition to being sliced. Their layout should be a struct of pointer, length, capacity to make appending fast. This will also make implicit conversion to T[] essentially free, since all that is necessary is slicing off the capacity field. I also wonder if this proposal could be extended to fix some of the covariance issues with arrays (see Bugzilla 2095). This might work by only allowing covariant types to be assigned to T[new], not T[]. For example: class A{} class B:A{} B[new] ba = [new B]; A[] aa = ba; // Error: Cannot assign covariant types to T[]. A[new] aa = ba; // Works, but copy is implicitly made.

One minor detail that I thought of: When returning a T[new] from a function, the copying could be optimized away. If any escaping happened during the function call, this would be either by value or as a T[] instead of a T[new]. Therefore, at the end of a function, the only T[new] reference to the heap space in question is guaranteed to be the T[new] that is about to be returned. Since the reference in the callee is about to be destroyed, it can safely be returned without any copying.
Feb 27 2009
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Jason House wrote:
 Are there any good reasons to allow built in arrays to be resizable?
 
 There's already plenty of empirical evidence that building arrays by appending
is incredibly slow.
 
 There are also bugs in bugzilla where resizing arrays after assigning one
array to another violates type safety. Those bugs can be solved by always
reallocatimg arrays when resizing, but that makes performance even worse...
 
 While not much of an argument, C# also prevents array resizing.
 
 Is it time to rely on standard libraries for data structures and let built in
arrays be fixed length? It's a breaking change... 

C has dynamic arrays now. If D got rid of them I might just cry.
Feb 28 2009
parent "Tim M" <a b.com> writes:
On Sun, 01 Mar 2009 05:30:28 +1300, Sean Kelly <sean invisibleduck.org>  
wrote:

 Jason House wrote:
 Are there any good reasons to allow built in arrays to be resizable?
  There's already plenty of empirical evidence that building arrays by  
 appending is incredibly slow.
  There are also bugs in bugzilla where resizing arrays after assigning  
 one array to another violates type safety. Those bugs can be solved by  
 always reallocatimg arrays when resizing, but that makes performance  
 even worse...
  While not much of an argument, C# also prevents array resizing.
  Is it time to rely on standard libraries for data structures and let  
 built in arrays be fixed length? It's a breaking change...

C has dynamic arrays now. If D got rid of them I might just cry.

How do you mean? Are you talking about something like this: void someFunc(int s) { int ar[s]; }
Feb 28 2009