digitalmars.D - proposal: capacity variable for dynamic arrays

Ameer Armaly (36/36) Jul 10 2006 Right now dynamic arrays use length to resize and handle memmory allocat...

Oskar Linde (10/44) Jul 10 2006 Where would the capacity value be stored? Today the capacity is (as far

Ameer Armaly (5/49) Jul 10 2006 Sounds good to me, especially since that means no redundant variables. ...

Oskar Linde (28/78) Jul 10 2006 Here is to play with. (You probably need to add

Chris Miller (4/4) Jul 10 2006 I'd like such a capacity property. The implementation need not even do
Lionello Lunesu (7/7) Jul 10 2006 What's the capacity of a slice?

Ameer Armaly (7/15) Jul 10 2006 The capacity of a slice is strictly the number of elements in that slice...

Kevin Bealer (19/32) Jul 18 2006 An alternate solution would be to use 0 for the capacity of slices; then...

Derek Parnell (18/22) Jul 10 2006 as

"Ameer Armaly" <ameer_armaly hotmail.com> writes:

Right now dynamic arrays use length to resize and handle memmory allocation. 
This is a bit limiting as far as the old memmory  reservation problem as 
well as a few other things.  What I propose is adding a new variable called 
capacity, which handles the actual memmory allocation and using length for 
the accepted size of the array.  Here are a few code fragments to show how 
it would work:

char[] string = new char[1024];
assert(string.length==1024 && string.capacity=1024);
string.length=0; // we've reserved memmory, but haven't freed it
version(break) writefln(string[1..4]); // throws a bounds exception, since 
length is zero
string.capacity = 1300; // traditional resize with a different variable, 
will be set to whatever the true capacity is
string.length = 2048; // would still work, since length is greater than 
capacity so it acts the same way.  However, capacity would be the actual 
amount of memmory reserved by the malloc algorithm, in this case doubling 
the size of the array.
string.capacity = 0; // freed

In short, this  solves the problem of reserving memmory with little cost, 
while maintaining backwards compatibility.  It could also be used in the 
stream and socket code, where the actual capacity doesn't change but the 
length does, removing the need for a separate size variable, something like 
this:

char[] buf = new char[1024];
buf.length = 0;
Socket s;
// set up the socket
s.read(buf); // Length is set to the actual data, but capacity is the same. 
This way the array is truely dynamic.

Thoughts?
-- 


Ameer
---
Life is either tragedy or comedy. Usually it's your choice. You can whine or 
you can laugh.
--Animorphs

Jul 10 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Ameer Armaly skrev:
 Right now dynamic arrays use length to resize and handle memmory allocation. 
 This is a bit limiting as far as the old memmory  reservation problem as 
 well as a few other things.  What I propose is adding a new variable called 
 capacity, which handles the actual memmory allocation and using length for 
 the accepted size of the array.  Here are a few code fragments to show how 
 it would work:
 
 char[] string = new char[1024];
 assert(string.length==1024 && string.capacity=1024);
 string.length=0; // we've reserved memmory, but haven't freed it
 version(break) writefln(string[1..4]); // throws a bounds exception, since 
 length is zero
 string.capacity = 1300; // traditional resize with a different variable, 
 will be set to whatever the true capacity is
 string.length = 2048; // would still work, since length is greater than 
 capacity so it acts the same way.  However, capacity would be the actual 
 amount of memmory reserved by the malloc algorithm, in this case doubling 
 the size of the array.
 string.capacity = 0; // freed
 
 In short, this  solves the problem of reserving memmory with little cost, 
 while maintaining backwards compatibility.  It could also be used in the 
 stream and socket code, where the actual capacity doesn't change but the 
 length does, removing the need for a separate size variable, something like 
 this:
 
 char[] buf = new char[1024];
 buf.length = 0;
 Socket s;
 // set up the socket
 s.read(buf); // Length is set to the actual data, but capacity is the same. 
 This way the array is truely dynamic.
 
 Thoughts?

Where would the capacity value be stored? Today the capacity is (as far 
as I understand) stored by the GC in correspondance to the the allocated 
  chunk of memory.

What could be done is having a virtual .capacity property that when set 
to a value makes sure the capacity is /at least/ the value set and 
returns the actual capacity when read. So for example after doing
arr.capacity = 1024;
arr.capacity could be something at least 1024, like probably 2047.

/Oskar

Jul 10 2006

"Ameer Armaly" <ameer_armaly hotmail.com> writes:

"Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
news:e8tvq5$1i3d$1 digitaldaemon.com...
 Ameer Armaly skrev:
 Right now dynamic arrays use length to resize and handle memmory 
 allocation. This is a bit limiting as far as the old memmory  reservation 
 problem as well as a few other things.  What I propose is adding a new 
 variable called capacity, which handles the actual memmory allocation and 
 using length for the accepted size of the array.  Here are a few code 
 fragments to show how it would work:

 char[] string = new char[1024];
 assert(string.length==1024 && string.capacity=1024);
 string.length=0; // we've reserved memmory, but haven't freed it
 version(break) writefln(string[1..4]); // throws a bounds exception, 
 since length is zero
 string.capacity = 1300; // traditional resize with a different variable, 
 will be set to whatever the true capacity is
 string.length = 2048; // would still work, since length is greater than 
 capacity so it acts the same way.  However, capacity would be the actual 
 amount of memmory reserved by the malloc algorithm, in this case doubling 
 the size of the array.
 string.capacity = 0; // freed

 In short, this  solves the problem of reserving memmory with little cost, 
 while maintaining backwards compatibility.  It could also be used in the 
 stream and socket code, where the actual capacity doesn't change but the 
 length does, removing the need for a separate size variable, something 
 like this:

 char[] buf = new char[1024];
 buf.length = 0;
 Socket s;
 // set up the socket
 s.read(buf); // Length is set to the actual data, but capacity is the 
 same. This way the array is truely dynamic.

 Thoughts?

 Where would the capacity value be stored? Today the capacity is (as far as 
 I understand) stored by the GC in correspondance to the the allocated 
 chunk of memory.

 What could be done is having a virtual .capacity property that when set to 
 a value makes sure the capacity is /at least/ the value set and returns 
 the actual capacity when read. So for example after doing
 arr.capacity = 1024;
 arr.capacity could be something at least 1024, like probably 2047.

Sounds good to me, especially since that means no redundant variables.  What 
I see capacity as holding is the maximum amount of elements I can put in 
this array _without_ any new allocations.
 /Oskar

Jul 10 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Ameer Armaly skrev:
 "Oskar Linde" <oskar.lindeREM OVEgmail.com> wrote in message 
 news:e8tvq5$1i3d$1 digitaldaemon.com...
 Ameer Armaly skrev:
 Right now dynamic arrays use length to resize and handle memmory 
 allocation. This is a bit limiting as far as the old memmory  reservation 
 problem as well as a few other things.  What I propose is adding a new 
 variable called capacity, which handles the actual memmory allocation and 
 using length for the accepted size of the array.  Here are a few code 
 fragments to show how it would work:

 char[] string = new char[1024];
 assert(string.length==1024 && string.capacity=1024);
 string.length=0; // we've reserved memmory, but haven't freed it
 version(break) writefln(string[1..4]); // throws a bounds exception, 
 since length is zero
 string.capacity = 1300; // traditional resize with a different variable, 
 will be set to whatever the true capacity is
 string.length = 2048; // would still work, since length is greater than 
 capacity so it acts the same way.  However, capacity would be the actual 
 amount of memmory reserved by the malloc algorithm, in this case doubling 
 the size of the array.
 string.capacity = 0; // freed

 In short, this  solves the problem of reserving memmory with little cost, 
 while maintaining backwards compatibility.  It could also be used in the 
 stream and socket code, where the actual capacity doesn't change but the 
 length does, removing the need for a separate size variable, something 
 like this:

 char[] buf = new char[1024];
 buf.length = 0;
 Socket s;
 // set up the socket
 s.read(buf); // Length is set to the actual data, but capacity is the 
 same. This way the array is truely dynamic.

 Thoughts?

 Where would the capacity value be stored? Today the capacity is (as far as 
 I understand) stored by the GC in correspondance to the the allocated 
 chunk of memory.

 What could be done is having a virtual .capacity property that when set to 
 a value makes sure the capacity is /at least/ the value set and returns 
 the actual capacity when read. So for example after doing
 arr.capacity = 1024;
 arr.capacity could be something at least 1024, like probably 2047.

 Sounds good to me, especially since that means no redundant variables.  What 
 I see capacity as holding is the maximum amount of elements I can put in 
 this array _without_ any new allocations.

Here is to play with. (You probably need to add 
~/dmd/src/phobos/internal/gc to module search path). No warranties: :)


--- arrcapacity.d
import std.gc;
import internal.gc.gcx; // :)

int capacity(T)(T x) {
         gc_t gc = cast(gc_t)getGCHandle();
         uint gccap = gc.capacity(x);
         if (gccap == 0)
                 return x.length;
         return (gccap-1)/(typeof(x[0]).sizeof);
}

void setCapacity(T)(inout T x, int capacity) {
         int oldLength = x.length;
         x.length = capacity;
         x = x[0..oldLength];
}
---

usage:


if (arr.capacity() < 100)
	arr.setCapacity(200);

It's a shame implicit injected array member functions don't support the 
property syntax: arr.capacity; arr.capacity=200;

NOTE: I have no idea if the above code is even close to correct...
The code would also be nicer if Phobos was altered slightly. Especially 
if _gc.capacity() was exposed by std.gc.

/Oskar

Jul 10 2006

"Chris Miller" <chris dprogramming.com> writes:

I'd like such a capacity property. The implementation need not even do  
anything and it merely be a request. A good thing about capacity is that  
it doesn't need to pre-initialize this extra memory since it's not even  
accessible yet.

Jul 10 2006

"Lionello Lunesu" <lionello lunesu.remove.com> writes:

What's the capacity of a slice?

Futhermore, the capacity can't be stored in the memory block, since you 
don't know the start and end of the block. Could be stored in the array 
variables itself (as are length and ptr), but that would make the 
sizeof(array) and strange 12, probably being padded to 16, resulting in 
twice the mem transfer for array arguments, return values.

L.

Jul 10 2006

"Ameer Armaly" <ameer_armaly hotmail.com> writes:

"Lionello Lunesu" <lionello lunesu.remove.com> wrote in message 
news:e8u973$21uo$1 digitaldaemon.com...
 What's the capacity of a slice?

The capacity of a slice is strictly the number of elements in that slice; 
any expansion in capacity would result in a new memmory block.
 Futhermore, the capacity can't be stored in the memory block, since you 
 don't know the start and end of the block. Could be stored in the array 
 variables itself (as are length and ptr), but that would make the 
 sizeof(array) and strange 12, probably being padded to 16, resulting in 
 twice the mem transfer for array arguments, return values.

 L.

My favorite solution is the one mentioned previously in which capacity is a 
virtual property that obtains the block size from the gc itself in some 
manner.

Jul 10 2006

Kevin Bealer <Kevin_member pathlink.com> writes:

In article <e8udip$2c0r$1 digitaldaemon.com>, Ameer Armaly says...
"Lionello Lunesu" <lionello lunesu.remove.com> wrote in message 
news:e8u973$21uo$1 digitaldaemon.com...
 What's the capacity of a slice?

The capacity of a slice is strictly the number of elements in that slice; 
any expansion in capacity would result in a new memmory block.

An alternate solution would be to use 0 for the capacity of slices; then allow
people to downsize slice objects, but as soon as they increase them,
reallocation is triggered.

My rationale is the following scenario:

char[] b = /* something */;
char[] a = b[0..10]; // 10 bytes

a.resize(5);
a.resize(10);

If slice capacity is the same as length, the slice gets smaller, and when it
gets larger, a[5..10] is initialized, BUT so is b[5..10].  If the capacity were
left at zero for the slice, the second resize would reallocate rather than
corrupting the contents of B.

This also provides a handy way of checking if an array is a slice.

 Futhermore, the capacity can't be stored in the memory block, since you 
 don't know the start and end of the block. Could be stored in the array 
 variables itself (as are length and ptr), but that would make the 
 sizeof(array) and strange 12, probably being padded to 16, resulting in 
 twice the mem transfer for array arguments, return values.

 L.


Since most systems will be 64 bit soon, I think this padding could be avoided
most of the time -- you rarely want padding to more than 8 bytes.  Of course, an
array of arrays has the unfortunate "multiply by 24" when indexing (i.e.
multiply by a non-power of two, which is slower than shifting.)

Kevin

Jul 18 2006

"Derek Parnell" <derek psych.ward> writes:

On Tue, 11 Jul 2006 02:22:30 +1000, Ameer Armaly  =

<ameer_armaly hotmail.com> wrote:

 Right now dynamic arrays use length to resize and handle memmory  =

 allocation.
 This is a bit limiting as far as the old memmory  reservation problem =

as
 well as a few other things.

If the behaviour in D changed (dare I say 'fixed') such that assigning a=
n  =

array length to zero would *not* also set the pointer to null, then the =
 =

capacity idea can be easily implemented as ...

   int[] arr;
   . . .
   arr.length =3D capacity;
   arr.length =3D 0;

This would assign RAM to the required size and by setting length back to=
  =

zero, it would be kept and 'resused' as the array elements where added.

-- =

Derek Parnell
Melbourne, Australia

Jul 10 2006

D Programming

C/C++ Programming

Other

digitalmars.D - proposal: capacity variable for dynamic arrays