www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Array Design Idea

reply "Craig Black" <cblack ara.com> writes:
I know that there has been discussion about the possibility of changing 
arrays to use two pointers insead of a pointer and a size.  Since this idea 
seems to be on the table, I though I would share another related idea.  I 
have written my own array template class in C++ a long while ago and I 
thought I would share my design because it has worked well for me and is 
quite efficient.

There are three things that need to be stored for a resizable array:  the 
pointer to the array, the size, and the capacity.  My array implementation, 
both the size and the capacity are stored on the heap with the array.  There 
is only one heap allocation, and the first 8 bytes are reserved for the size 
and capacity values.  Further, there is no heap allocation if the array is 
empty.  If the array is empty, then the pointer is null, and the array is 
assumed to have a size and capacity of zero, so there is no need to store 
the size or capacity of an empty array.

This means that an empty array is just a null pointer.  This is beneficial 
for my purposes because arrays are used a lot in my code, and about half of 
them end up being empty.  So storing an empty array efficiently is a good 
thing.  I have been using this system for a long time and haven't had any 
problems with it.

-Craig 
Dec 10 2007
next sibling parent reply "Jb" <jb nowhere.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:fjk2k6$pfl$1 digitalmars.com...

 There are three things that need to be stored for a resizable array:  the 
 pointer to the array, the size, and the capacity.  My array 
 implementation, both the size and the capacity are stored on the heap with 
 the array.  There is only one heap allocation, and the first 8 bytes are 
 reserved for the size and capacity values.  Further, there is no heap 
 allocation if the array is empty.  If the array is empty, then the pointer 
 is null, and the array is assumed to have a size and capacity of zero, so 
 there is no need to store the size or capacity of an empty array.

Thats how Delphi does dynamic arrays and strings. Well except that it has length & refcount rather than length & capacity. Wouldnt it break slicing though? You cant point halfway into an existing array and still have the length & capacity at -4,-8 bytes?
Dec 10 2007
parent reply "Craig Black" <cblack ara.com> writes:
"Jb" <jb nowhere.com> wrote in message news:fjk5g9$10lb$1 digitalmars.com...
 "Craig Black" <cblack ara.com> wrote in message 
 news:fjk2k6$pfl$1 digitalmars.com...

 There are three things that need to be stored for a resizable array:  the 
 pointer to the array, the size, and the capacity.  My array 
 implementation, both the size and the capacity are stored on the heap 
 with the array.  There is only one heap allocation, and the first 8 bytes 
 are reserved for the size and capacity values.  Further, there is no heap 
 allocation if the array is empty.  If the array is empty, then the 
 pointer is null, and the array is assumed to have a size and capacity of 
 zero, so there is no need to store the size or capacity of an empty 
 array.

Thats how Delphi does dynamic arrays and strings. Well except that it has length & refcount rather than length & capacity. Wouldnt it break slicing though? You cant point halfway into an existing array and still have the length & capacity at -4,-8 bytes?

I still haven't wrapped my head around the slicing issue. Obviously it's not an issue for C++ arrays. Doesn't D already allocate the capacity of the array on the heap? If this is so why doesn't this break slicing?
Dec 10 2007
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Craig Black wrote:
 "Jb" <jb nowhere.com> wrote in message news:fjk5g9$10lb$1 digitalmars.com...
 "Craig Black" <cblack ara.com> wrote in message 
 news:fjk2k6$pfl$1 digitalmars.com...

 There are three things that need to be stored for a resizable array:  the 
 pointer to the array, the size, and the capacity.  My array 
 implementation, both the size and the capacity are stored on the heap 
 with the array.  There is only one heap allocation, and the first 8 bytes 
 are reserved for the size and capacity values.  Further, there is no heap 
 allocation if the array is empty.  If the array is empty, then the 
 pointer is null, and the array is assumed to have a size and capacity of 
 zero, so there is no need to store the size or capacity of an empty 
 array.

length & refcount rather than length & capacity. Wouldnt it break slicing though? You cant point halfway into an existing array and still have the length & capacity at -4,-8 bytes?

I still haven't wrapped my head around the slicing issue. Obviously it's not an issue for C++ arrays. Doesn't D already allocate the capacity of the array on the heap? If this is so why doesn't this break slicing?

D relies on the GC to tell it what the capacity for a block is. However, because operations on slices should always reallocate, it is enough for the GC to simply return 0 for the capacity if the supplied pointer is not at the head of a memory block. Sean
Dec 10 2007
parent reply "Jb" <jb nowhere.com> writes:
"Sean Kelly" <sean f4.ca> wrote in message 
news:fjk706$134g$1 digitalmars.com...
 Craig Black wrote:
 "Jb" <jb nowhere.com> wrote in message 
 news:fjk5g9$10lb$1 digitalmars.com...
 "Craig Black" <cblack ara.com> wrote in message 
 news:fjk2k6$pfl$1 digitalmars.com...

 There are three things that need to be stored for a resizable array: 
 the pointer to the array, the size, and the capacity.  My array 
 implementation, both the size and the capacity are stored on the heap 
 with the array.  There is only one heap allocation, and the first 8 
 bytes are reserved for the size and capacity values.  Further, there is 
 no heap allocation if the array is empty.  If the array is empty, then 
 the pointer is null, and the array is assumed to have a size and 
 capacity of zero, so there is no need to store the size or capacity of 
 an empty array.

has length & refcount rather than length & capacity. Wouldnt it break slicing though? You cant point halfway into an existing array and still have the length & capacity at -4,-8 bytes?

I still haven't wrapped my head around the slicing issue. Obviously it's not an issue for C++ arrays. Doesn't D already allocate the capacity of the array on the heap? If this is so why doesn't this break slicing?

D relies on the GC to tell it what the capacity for a block is. However, because operations on slices should always reallocate, it is enough for the GC to simply return 0 for the capacity if the supplied pointer is not at the head of a memory block.

Except that a slice can somtimes point at the begining of a block. So slices dont always reallocate, although perhaps they should?
Dec 10 2007
parent Sean Kelly <sean f4.ca> writes:
Jb wrote:
 "Sean Kelly" <sean f4.ca> wrote in message 
 news:fjk706$134g$1 digitalmars.com...
 Craig Black wrote:
 "Jb" <jb nowhere.com> wrote in message 
 news:fjk5g9$10lb$1 digitalmars.com...
 "Craig Black" <cblack ara.com> wrote in message 
 news:fjk2k6$pfl$1 digitalmars.com...

 There are three things that need to be stored for a resizable array: 
 the pointer to the array, the size, and the capacity.  My array 
 implementation, both the size and the capacity are stored on the heap 
 with the array.  There is only one heap allocation, and the first 8 
 bytes are reserved for the size and capacity values.  Further, there is 
 no heap allocation if the array is empty.  If the array is empty, then 
 the pointer is null, and the array is assumed to have a size and 
 capacity of zero, so there is no need to store the size or capacity of 
 an empty array.

has length & refcount rather than length & capacity. Wouldnt it break slicing though? You cant point halfway into an existing array and still have the length & capacity at -4,-8 bytes?

not an issue for C++ arrays. Doesn't D already allocate the capacity of the array on the heap? If this is so why doesn't this break slicing?

because operations on slices should always reallocate, it is enough for the GC to simply return 0 for the capacity if the supplied pointer is not at the head of a memory block.

Except that a slice can somtimes point at the begining of a block. So slices dont always reallocate, although perhaps they should?

Steven Schveighoffer's proposal entitled "Proposal: alternative GC idea that negates the need for T[new]" would address issue, though it has yet to receive any responses. Sean
Dec 10 2007
prev sibling parent "Jb" <jb nowhere.com> writes:
"Craig Black" <cblack ara.com> wrote in message 
news:fjk5ra$11h6$1 digitalmars.com...
 "Jb" <jb nowhere.com> wrote in message 
 news:fjk5g9$10lb$1 digitalmars.com...
 "Craig Black" <cblack ara.com> wrote in message 
 news:fjk2k6$pfl$1 digitalmars.com...

 There are three things that need to be stored for a resizable array: 
 the pointer to the array, the size, and the capacity.  My array 
 implementation, both the size and the capacity are stored on the heap 
 with the array.  There is only one heap allocation, and the first 8 
 bytes are reserved for the size and capacity values.  Further, there is 
 no heap allocation if the array is empty.  If the array is empty, then 
 the pointer is null, and the array is assumed to have a size and 
 capacity of zero, so there is no need to store the size or capacity of 
 an empty array.

Thats how Delphi does dynamic arrays and strings. Well except that it has length & refcount rather than length & capacity. Wouldnt it break slicing though? You cant point halfway into an existing array and still have the length & capacity at -4,-8 bytes?

I still haven't wrapped my head around the slicing issue. Obviously it's not an issue for C++ arrays. Doesn't D already allocate the capacity of the array on the heap? If this is so why doesn't this break slicing?

From looking at the docs it seems 'capacity' is a property of the heap allocated block, not the array. And arrays only try to take advantage of this capacity when they happen to point at the begining of the block. If you slice into an existing array, starting at > 0, then resizing that slize will always result in a copy.
Dec 10 2007
prev sibling parent Robert Fraser <fraserofthenight gmail.com> writes:
Craig Black wrote:
 I know that there has been discussion about the possibility of changing 
 arrays to use two pointers insead of a pointer and a size.  Since this idea 
 seems to be on the table, I though I would share another related idea.  I 
 have written my own array template class in C++ a long while ago and I 
 thought I would share my design because it has worked well for me and is 
 quite efficient.
 
 There are three things that need to be stored for a resizable array:  the 
 pointer to the array, the size, and the capacity.  My array implementation, 
 both the size and the capacity are stored on the heap with the array.  There 
 is only one heap allocation, and the first 8 bytes are reserved for the size 
 and capacity values.  Further, there is no heap allocation if the array is 
 empty.  If the array is empty, then the pointer is null, and the array is 
 assumed to have a size and capacity of zero, so there is no need to store 
 the size or capacity of an empty array.
 
 This means that an empty array is just a null pointer.  This is beneficial 
 for my purposes because arrays are used a lot in my code, and about half of 
 them end up being empty.  So storing an empty array efficiently is a good 
 thing.  I have been using this system for a long time and haven't had any 
 problems with it.
 
 -Craig 
 
 

If you're looking for super-fast dynamic arrays... http://judy.sourceforge.net/
Dec 10 2007