www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - proposal: capacity variable for dynamic arrays

reply "Ameer Armaly" <ameer_armaly hotmail.com> writes:
Right now dynamic arrays use length to resize and handle memmory allocation. 
This is a bit limiting as far as the old memmory  reservation problem as 
well as a few other things.  What I propose is adding a new variable called 
capacity, which handles the actual memmory allocation and using length for 
the accepted size of the array.  Here are a few code fragments to show how 
it would work:

char[] string = new char[1024];
assert(string.length==1024 && string.capacity=1024);
string.length=0; // we've reserved memmory, but haven't freed it
version(break) writefln(string[1..4]); // throws a bounds exception, since 
length is zero
string.capacity = 1300; // traditional resize with a different variable, 
will be set to whatever the true capacity is
string.length = 2048; // would still work, since length is greater than 
capacity so it acts the same way.  However, capacity would be the actual 
amount of memmory reserved by the malloc algorithm, in this case doubling 
the size of the array.
string.capacity = 0; // freed

In short, this  solves the problem of reserving memmory with little cost, 
while maintaining backwards compatibility.  It could also be used in the 
stream and socket code, where the actual capacity doesn't change but the 
length does, removing the need for a separate size variable, something like 
this:

char[] buf = new char[1024];
buf.length = 0;
Socket s;
// set up the socket
s.read(buf); // Length is set to the actual data, but capacity is the same. 
This way the array is truely dynamic.

Thoughts?
-- 


Ameer
---
Life is either tragedy or comedy. Usually it's your choice. You can whine or 
you can laugh.
--Animorphs 
Jul 10 2006
next sibling parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Ameer Armaly skrev:
 Right now dynamic arrays use length to resize and handle memmory allocation. 
 This is a bit limiting as far as the old memmory  reservation problem as 
 well as a few other things.  What I propose is adding a new variable called 
 capacity, which handles the actual memmory allocation and using length for 
 the accepted size of the array.  Here are a few code fragments to show how 
 it would work:
 
 char[] string = new char[1024];
 assert(string.length==1024 && string.capacity=1024);
 string.length=0; // we've reserved memmory, but haven't freed it
 version(break) writefln(string[1..4]); // throws a bounds exception, since 
 length is zero
 string.capacity = 1300; // traditional resize with a different variable, 
 will be set to whatever the true capacity is
 string.length = 2048; // would still work, since length is greater than 
 capacity so it acts the same way.  However, capacity would be the actual 
 amount of memmory reserved by the malloc algorithm, in this case doubling 
 the size of the array.
 string.capacity = 0; // freed
 
 In short, this  solves the problem of reserving memmory with little cost, 
 while maintaining backwards compatibility.  It could also be used in the 
 stream and socket code, where the actual capacity doesn't change but the 
 length does, removing the need for a separate size variable, something like 
 this:
 
 char[] buf = new char[1024];
 buf.length = 0;
 Socket s;
 // set up the socket
 s.read(buf); // Length is set to the actual data, but capacity is the same. 
 This way the array is truely dynamic.
 
 Thoughts?
Where would the capacity value be stored? Today the capacity is (as far as I understand) stored by the GC in correspondance to the the allocated chunk of memory. What could be done is having a virtual .capacity property that when set to a value makes sure the capacity is /at least/ the value set and returns the actual capacity when read. So for example after doing arr.capacity = 1024; arr.capacity could be something at least 1024, like probably 2047. /Oskar
Jul 10 2006
parent reply "Ameer Armaly" <ameer_armaly hotmail.com> writes:
"Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
news:e8tvq5$1i3d$1 digitaldaemon.com...
 Ameer Armaly skrev:
 Right now dynamic arrays use length to resize and handle memmory 
 allocation. This is a bit limiting as far as the old memmory  reservation 
 problem as well as a few other things.  What I propose is adding a new 
 variable called capacity, which handles the actual memmory allocation and 
 using length for the accepted size of the array.  Here are a few code 
 fragments to show how it would work:

 char[] string = new char[1024];
 assert(string.length==1024 && string.capacity=1024);
 string.length=0; // we've reserved memmory, but haven't freed it
 version(break) writefln(string[1..4]); // throws a bounds exception, 
 since length is zero
 string.capacity = 1300; // traditional resize with a different variable, 
 will be set to whatever the true capacity is
 string.length = 2048; // would still work, since length is greater than 
 capacity so it acts the same way.  However, capacity would be the actual 
 amount of memmory reserved by the malloc algorithm, in this case doubling 
 the size of the array.
 string.capacity = 0; // freed

 In short, this  solves the problem of reserving memmory with little cost, 
 while maintaining backwards compatibility.  It could also be used in the 
 stream and socket code, where the actual capacity doesn't change but the 
 length does, removing the need for a separate size variable, something 
 like this:

 char[] buf = new char[1024];
 buf.length = 0;
 Socket s;
 // set up the socket
 s.read(buf); // Length is set to the actual data, but capacity is the 
 same. This way the array is truely dynamic.

 Thoughts?
Where would the capacity value be stored? Today the capacity is (as far as I understand) stored by the GC in correspondance to the the allocated chunk of memory. What could be done is having a virtual .capacity property that when set to a value makes sure the capacity is /at least/ the value set and returns the actual capacity when read. So for example after doing arr.capacity = 1024; arr.capacity could be something at least 1024, like probably 2047.
Sounds good to me, especially since that means no redundant variables. What I see capacity as holding is the maximum amount of elements I can put in this array _without_ any new allocations.
 /Oskar 
Jul 10 2006
parent Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Ameer Armaly skrev:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:e8tvq5$1i3d$1 digitaldaemon.com...
 Ameer Armaly skrev:
 Right now dynamic arrays use length to resize and handle memmory 
 allocation. This is a bit limiting as far as the old memmory  reservation 
 problem as well as a few other things.  What I propose is adding a new 
 variable called capacity, which handles the actual memmory allocation and 
 using length for the accepted size of the array.  Here are a few code 
 fragments to show how it would work:

 char[] string = new char[1024];
 assert(string.length==1024 && string.capacity=1024);
 string.length=0; // we've reserved memmory, but haven't freed it
 version(break) writefln(string[1..4]); // throws a bounds exception, 
 since length is zero
 string.capacity = 1300; // traditional resize with a different variable, 
 will be set to whatever the true capacity is
 string.length = 2048; // would still work, since length is greater than 
 capacity so it acts the same way.  However, capacity would be the actual 
 amount of memmory reserved by the malloc algorithm, in this case doubling 
 the size of the array.
 string.capacity = 0; // freed

 In short, this  solves the problem of reserving memmory with little cost, 
 while maintaining backwards compatibility.  It could also be used in the 
 stream and socket code, where the actual capacity doesn't change but the 
 length does, removing the need for a separate size variable, something 
 like this:

 char[] buf = new char[1024];
 buf.length = 0;
 Socket s;
 // set up the socket
 s.read(buf); // Length is set to the actual data, but capacity is the 
 same. This way the array is truely dynamic.

 Thoughts?
Where would the capacity value be stored? Today the capacity is (as far as I understand) stored by the GC in correspondance to the the allocated chunk of memory. What could be done is having a virtual .capacity property that when set to a value makes sure the capacity is /at least/ the value set and returns the actual capacity when read. So for example after doing arr.capacity = 1024; arr.capacity could be something at least 1024, like probably 2047.
Sounds good to me, especially since that means no redundant variables. What I see capacity as holding is the maximum amount of elements I can put in this array _without_ any new allocations.
Here is to play with. (You probably need to add ~/dmd/src/phobos/internal/gc to module search path). No warranties: :) --- arrcapacity.d import std.gc; import internal.gc.gcx; // :) int capacity(T)(T x) { gc_t gc = cast(gc_t)getGCHandle(); uint gccap = gc.capacity(x); if (gccap == 0) return x.length; return (gccap-1)/(typeof(x[0]).sizeof); } void setCapacity(T)(inout T x, int capacity) { int oldLength = x.length; x.length = capacity; x = x[0..oldLength]; } --- usage: if (arr.capacity() < 100) arr.setCapacity(200); It's a shame implicit injected array member functions don't support the property syntax: arr.capacity; arr.capacity=200; NOTE: I have no idea if the above code is even close to correct... The code would also be nicer if Phobos was altered slightly. Especially if _gc.capacity() was exposed by std.gc. /Oskar
Jul 10 2006
prev sibling next sibling parent "Chris Miller" <chris dprogramming.com> writes:
I'd like such a capacity property. The implementation need not even do  
anything and it merely be a request. A good thing about capacity is that  
it doesn't need to pre-initialize this extra memory since it's not even  
accessible yet.
Jul 10 2006
prev sibling next sibling parent reply "Lionello Lunesu" <lionello lunesu.remove.com> writes:
What's the capacity of a slice?

Futhermore, the capacity can't be stored in the memory block, since you 
don't know the start and end of the block. Could be stored in the array 
variables itself (as are length and ptr), but that would make the 
sizeof(array) and strange 12, probably being padded to 16, resulting in 
twice the mem transfer for array arguments, return values.

L. 
Jul 10 2006
parent reply "Ameer Armaly" <ameer_armaly hotmail.com> writes:
"Lionello Lunesu" <lionello lunesu.remove.com> wrote in message 
news:e8u973$21uo$1 digitaldaemon.com...
 What's the capacity of a slice?
The capacity of a slice is strictly the number of elements in that slice; any expansion in capacity would result in a new memmory block.
 Futhermore, the capacity can't be stored in the memory block, since you 
 don't know the start and end of the block. Could be stored in the array 
 variables itself (as are length and ptr), but that would make the 
 sizeof(array) and strange 12, probably being padded to 16, resulting in 
 twice the mem transfer for array arguments, return values.

 L.
My favorite solution is the one mentioned previously in which capacity is a virtual property that obtains the block size from the gc itself in some manner.
 
Jul 10 2006
parent Kevin Bealer <Kevin_member pathlink.com> writes:
In article <e8udip$2c0r$1 digitaldaemon.com>, Ameer Armaly says...
"Lionello Lunesu" <lionello lunesu.remove.com> wrote in message 
news:e8u973$21uo$1 digitaldaemon.com...
 What's the capacity of a slice?
The capacity of a slice is strictly the number of elements in that slice; any expansion in capacity would result in a new memmory block.
An alternate solution would be to use 0 for the capacity of slices; then allow people to downsize slice objects, but as soon as they increase them, reallocation is triggered. My rationale is the following scenario: char[] b = /* something */; char[] a = b[0..10]; // 10 bytes a.resize(5); a.resize(10); If slice capacity is the same as length, the slice gets smaller, and when it gets larger, a[5..10] is initialized, BUT so is b[5..10]. If the capacity were left at zero for the slice, the second resize would reallocate rather than corrupting the contents of B. This also provides a handy way of checking if an array is a slice.
 Futhermore, the capacity can't be stored in the memory block, since you 
 don't know the start and end of the block. Could be stored in the array 
 variables itself (as are length and ptr), but that would make the 
 sizeof(array) and strange 12, probably being padded to 16, resulting in 
 twice the mem transfer for array arguments, return values.

 L.
Since most systems will be 64 bit soon, I think this padding could be avoided most of the time -- you rarely want padding to more than 8 bytes. Of course, an array of arrays has the unfortunate "multiply by 24" when indexing (i.e. multiply by a non-power of two, which is slower than shifting.) Kevin
Jul 18 2006
prev sibling parent "Derek Parnell" <derek psych.ward> writes:
On Tue, 11 Jul 2006 02:22:30 +1000, Ameer Armaly  =

<ameer_armaly hotmail.com> wrote:

 Right now dynamic arrays use length to resize and handle memmory  =
 allocation.
 This is a bit limiting as far as the old memmory  reservation problem =
as
 well as a few other things.
If the behaviour in D changed (dare I say 'fixed') such that assigning a= n = array length to zero would *not* also set the pointer to null, then the = = capacity idea can be easily implemented as ... int[] arr; . . . arr.length =3D capacity; arr.length =3D 0; This would assign RAM to the required size and by setting length back to= = zero, it would be kept and 'resused' as the array elements where added. -- = Derek Parnell Melbourne, Australia
Jul 10 2006