digitalmars.D.learn - Array length & allocation question

Robert Atkinson <Robert_member pathlink.com> writes:
Quick question concerning Array lengths and memory allocations.

When array.length = array.length + 1 (or length - 1) happens, does the system
only increase (decrease) the memory allocation by 1 [unit], or does it internally
maintain a buffer and try to minimise the resizing of the array?

I think I can remember seeing posts saying to maintain the buffer yourself and
other posts saying it was done automatically behind the scenes.
Jun 08 2006
BCS <BCS pathlink.com> writes:
Robert Atkinson wrote:
 Quick question concerning Array lengths and memory allocations.
 
 When an array.length = array.length + 1 (or length - 1) happens, does the
 system only increase (decrease) the memory allocation by 1 [unit] or does
 it internally maintain a buffer and try to minimise the resizing of the
 array?
 
 I think I can remember seeing posts saying to maintain the buffer yourself and
 other posts saying it was done automatically behind the scenes.

[...] it yourself, because you have more information to decide how to do it:

    char[] first = "foo bar";
    func(first[0..3]);

    char[] func(char[] inp)
    {
        // first time around can't extend in place
        // logic to check this would be costly
        while(go()) inp.length = inp.length+1;
        return inp;
    }
Jun 08 2006
Sean Kelly <sean f4.ca> writes:
Robert Atkinson wrote:
 Quick question concerning Array lengths and memory allocations.
 
 When an array.length = array.length + 1 (or length - 1) happens, does the
 system only increase (decrease) the memory allocation by 1 [unit] or does
 it internally maintain a buffer and try to minimise the resizing of the
 array?

The latter.

 I think I can remember seeing posts saying to maintain the buffer yourself and
 other posts saying it was done automatically behind the scenes.

In most cases it's not worth it to try and maintain the buffer yourself. At the very least, you should test both methods and see which is faster.

Sean
Jun 08 2006
Lars Ivar Igesund <larsivar igesund.net> writes:
Sean Kelly wrote:

 In most cases it's not worth it to try and maintain the buffer yourself.
 At the very least, you should test both methods and see which is faster.

I think the "double-the-size-when-more-is-needed" strategy is used and, AFAIK, it is the one that performs best in the general case.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
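
One way to check the growth behaviour yourself is to count how often an appended array actually moves. This is a minimal sketch, assuming a current D2 compiler rather than the DMD 0.160 discussed in this thread; countMoves is a hypothetical helper name, and the exact growth policy (doubling or otherwise) is an implementation detail that the sketch only observes, not confirms.

```d
import std.stdio : writeln;

// Counts how many times appending n ints hands back a different memory block.
// countMoves is a hypothetical helper for illustration, not a runtime API.
size_t countMoves(size_t n)
{
    int[] arr;
    size_t moves = 0;
    foreach (i; 0 .. n)
    {
        auto before = arr.ptr;
        arr ~= cast(int) i;
        if (arr.ptr !is before)
            ++moves; // the array now lives in a different block
    }
    return moves;
}

void main()
{
    writeln("moves for 1_000_000 appends: ", countMoves(1_000_000));
}
```

If the runtime keeps spare capacity, the number of moves should be far smaller than the number of appends.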
Jun 08 2006
Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Lars Ivar Igesund wrote:

 I think the "double-the-size-when-more-is-needed" strategy is used, and
 afaik, it is the one that performs best in the general case.

Hmm, and what happens when one shortens the length of the array? Does the memory manager's "back" buffer size remain the same?

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 11 2006
"Derek Parnell" <derek psych.ward> writes:
On Mon, 12 Jun 2006 09:11:04 +1000, Bruno Medeiros
<brunodomedeirosATgmail SPAM.com> wrote:

 Hmm, and what happens when one shortens the length of the array? Does the
 memory manager's "back" buffer size remain the same?

Yes. However, there is a bug (oops - an issue) in which, if the length is set to zero, the RAM is released back to the system.

-- 
Derek Parnell
Melbourne, Australia
Jun 11 2006
Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Derek Parnell wrote:

 Yes. However there is a bug (oops - an issue) in which if the length is set
 to zero the RAM is released back to the system.

That makes perfect sense. Why would it be a bug?

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 12 2006
Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Bruno Medeiros wrote:

 That makes perfect sense, why would it be a bug?

I don't know if this is what Derek refers to, but it used to be recommended practice to reserve space for an array by doing:

    arr.length = 1024;
    arr.length = 0;
    (start filling arr with data)

I'm quite sure this used to be mentioned in the documentation, but I can no longer find any reference to it (except this old post: D/17691).

Today, I guess you should do the following instead:

    arr.length = 1024;
    arr = arr[0..0];
    (start filling arr with data)

/Oskar
Jun 12 2006
"Derek Parnell" <derek psych.ward> writes:
On Tue, 13 Jun 2006 05:27:44 +1000, Bruno Medeiros
<brunodomedeirosATgmail SPAM.com> wrote:

 That makes perfect sense, why would it be a bug?

Agreed, it is not a bug in the sense of being contrary to the specification, because this behaviour isn't specified. However, it does prevent a coder from distinguishing an empty array from a null array. An empty array is one that (no longer) has any elements, and a null array is one that doesn't have any RAM to reference.

I suggest that Walter either document this functionality or fix it.

"When an array length is reduced, the RAM it owns is not released and can be reused when the array is subsequently expanded (unless the length is set to zero, in which case the RAM is released)."

Setting the length to zero is a convenient way to reserve RAM for an array.

Also consider this ...

    foo("");

Now how can 'foo' be written to detect a coder's error of passing it an uninitialized array?

    char[] x;
    foo(x);

-- 
Derek Parnell
Melbourne, Australia
Jun 12 2006
Sean Kelly <sean f4.ca> writes:
Derek Parnell wrote:

 Setting the length to zero is a convenient way to reserve RAM for an
 array.

Perhaps D arrays simply need a reserve property?

Sean
Jun 12 2006
Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Sean Kelly wrote:

 Perhaps D arrays simply need a reserve property?

Something like this ought to work:

    template reserve(ArrTy,IntTy) {
        void reserve(inout ArrTy a, IntTy size) {
            if (size > a.length) {
                size_t old_length = a.length;
                a.length = size;
                a = a[0..old_length];
            }
        }
    }

usage:

    arr.reserve(1000);

/Oskar
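
For comparison, later D releases (D2) provide reserve and capacity for dynamic arrays out of the box. The following is a minimal sketch under that assumption; it does not apply to the DMD version discussed in this thread, and appendsStayInPlace is a hypothetical helper name used only for illustration.

```d
import std.stdio : writeln;

// Reserves room for n ints, appends n values, and reports whether the
// array stayed in the block that reserve() set up.
// appendsStayInPlace is a hypothetical helper, not a library function.
bool appendsStayInPlace(size_t n)
{
    int[] arr;
    arr.reserve(n);            // D2 built-in; the length stays 0
    assert(arr.length == 0);
    assert(arr.capacity >= n); // room for at least n elements

    auto start = arr.ptr;
    foreach (i; 0 .. n)
        arr ~= cast(int) i;
    return arr.ptr is start;
}

void main()
{
    writeln("stayed in place: ", appendsStayInPlace(1000));
}
```

Unlike the length trick, this never exposes elements that were not appended, so there is no dummy data to slice away afterwards.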
Jun 12 2006
Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Derek Parnell wrote:

 I suggest that Walter either document this functionality or fix it.

I agree that it should be better documented.

 "When an array length is reduced the RAM it owns is not released and can
 be reused when the array subsequently is expanded (, unless the length
 is set to zero in which case the RAM is released). "

 Setting the length to zero is a convenient way to reserve RAM for an
 array.

    arr.length = 100_000_000;
    arr = arr[0..0];

is almost as convenient.

 Also consider this ...

     foo("");

 Now how can 'foo' be written to detect a coder's error of passing it an
 uninitialized array.

     char[] x;
     foo(x);

Like this:

    void foo(char[] arr) {
        if (!arr)
            writefln("Uninitialized array passed");
        else if (arr.length == 0)
            writefln("Zero length array received");
    }

/Oskar
Jun 12 2006
Derek Parnell <derek psych.ward> writes:
On Tue, 13 Jun 2006 01:05:04 +0200, Oskar Linde wrote:

     arr.length = 100_000_000;
     arr = arr[0..0];

 is almost as convenient.

Unfortunately this only appears to reserve the RAM, because the next change in length will cause a new allocation to be made. See the example program below ...

 Like this:

     void foo(char[] arr) {
         if (!arr)
             writefln("Uninitialized array passed");
         else if (arr.length == 0)
             writefln("Zero length array received");
     }

Yes, I can see that D can now distinguish between the two. This didn't use to be the case, IIRC. However, there is still a 'bug' with this, as the program here demonstrates...

    import std.stdio;
    void main()
    {
        char[] arr;
        foo(arr);
        foo("");
        foo("".dup);

        writefln("%s %s", arr.length, arr.ptr);
        arr.length = 100;
        writefln("%s %s", arr.length, arr.ptr);
        arr = arr[0..0];
        writefln("%s %s", arr.length, arr.ptr);
        arr.length = 50;
        writefln("%s %s", arr.length, arr.ptr);
        arr.length = 500;
        writefln("%s %s", arr.length, arr.ptr);
    }

    void foo(char[] t)
    {
        writefln("foo: %s %s", t.length, t.ptr);
    }

The results are ...

    foo: 0 0000
    foo: 0 413080
    foo: 0 0000     *** A 'dup'ed empty string is now a null string.
    0 0000
    100 8A2F00
    0 8A2F00        *** RAM appears to be reserved.
    50 8A1F80       *** But it is not as a new allocation just occurred.
    500 8A3E00      *** This allocation is expected.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
13/06/2006 11:08:24 AM
Jun 12 2006
Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Derek Parnell wrote:

 Unfortunately this only appears to reserve the RAM, because the next
 change in length will cause a new allocation to be made.

You are right, changing length forces a reallocation. Interestingly, the following works:

    arr.length = 100;
    arr = arr[0..0];
    writefln("%s %s",arr.length,arr.ptr);
    for (int i = 0; i < 50; i++)
        arr ~= i;
    writefln("%s %s",arr.length,arr.ptr);

prints (for me):

    0 b7ee9e00
    50 b7ee9e00

What is even more interesting is that the above "buggy" behavior seems intentional. The following patch removes the forced reallocation when changing length of a 0-length array:

--- gc.d.orig 2006-06-04 11:50:08.979945284 +0200
+++ gc.d 2006-06-13 09:19:02.135348959 +0200
@@ -382,8 +382,6 @@
 }
 //printf("newsize = %x, newlength = %x\n", newsize, newlength);
- if (p.length)
- {
 newdata = p.data;
 if (newlength > p.length)
 {
@@ -397,11 +395,6 @@
 }
 newdata[size .. newsize] = 0;
 }
- }
- else
- {
- newdata = cast(byte *)_gc.calloc(newsize + 1, 1);
- }
 }
 else
 {

With this change, your above code prints:

    $build -run ./arrtest ~/dmd/src/phobos/internal/gc/gc.d
    Path and Version : build v2.9(1197)
    built on Thu Aug 11 16:07:55 2005
    foo: 0 0
    foo: 0 805765c
    foo: 0 0
    0 0
    100 b7ee8e80
    0 b7ee8e80      *** RAM is reserved
    50 b7ee8e80     *** and is used
    500 b7ee9e00    *** This causes reallocation as expected

I wonder why the code looks like it does...

/Oskar
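
On a current D2 compiler the same question (is the old block reused after shrinking?) can be tested directly. In this minimal sketch, which assumes D2 and not the DMD version patched above, an append after slicing back to zero length goes to a new block unless the array is first marked with assumeSafeAppend. reusesBlock is a hypothetical helper name.

```d
import std.stdio : writeln;

// Shrinks a 100-element array to an empty slice, optionally marks it as
// safe to append, then appends 50 elements.
// Returns true if the appended data ended up in the original block.
// reusesBlock is a hypothetical helper, not a library function.
bool reusesBlock(bool markSafe)
{
    int[] arr = new int[100];
    int[] original = arr;       // keeps the first block referenced
    arr = arr[0 .. 0];
    if (markSafe)
        arr.assumeSafeAppend(); // promise: the old elements may be overwritten
    foreach (i; 0 .. 50)
        arr ~= i;
    return arr.ptr is original.ptr;
}

void main()
{
    writeln("with assumeSafeAppend:    ", reusesBlock(true));
    writeln("without assumeSafeAppend: ", reusesBlock(false));
}
```

The default is conservative because another slice might still refer to the old elements; assumeSafeAppend is the caller's statement that none does.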
Jun 13 2006
Derek Parnell <derek psych.ward> writes:
On Tue, 13 Jun 2006 09:24:34 +0200, Oskar Linde wrote:

 What is even more interesting is that the above "buggy" behavior
 seems intentional. The following patch removes the forced
 reallocation when changing length of a 0-length array:

Hmmm... I just rewrote that function as below and it seems to test out quite well too. I incorporated your change plus I removed the check for a zero new length. Seems to work without any problems.

-----------------
extern (C)
byte[] _d_arraysetlength(size_t newlength, size_t sizeelem, Array *p)
in
{
    assert(sizeelem);
    assert(!p.length || p.data);
}
body
{
    byte* newdata;

    newdata = p.data;
    if (newlength > p.length)
    {
        version (D_InlineAsm_X86)
        {
            size_t newsize = void;

            asm
            {
                mov EAX,newlength ;
                mul EAX,sizeelem ;
                mov newsize,EAX ;
                jc Loverflow ;
            }
        }
        else
        {
            size_t newsize = sizeelem * newlength;

            if (newsize / newlength != sizeelem)
                goto Loverflow;
        }

        size_t size = p.length * sizeelem;
        size_t cap = _gc.capacity(p.data);

        if (cap <= newsize)
        {
            newdata = cast(byte *)_gc.malloc(newsize + 1);
            newdata[0 .. size] = p.data[0 .. size];
        }
        newdata[size .. newsize] = 0;
    }

    p.data = newdata;
    p.length = newlength;
    return newdata[0 .. newlength];

Loverflow:
    _d_OutOfMemory();
}
---------------

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
13/06/2006 5:54:57 PM
Jun 13 2006
Sean Kelly <sean f4.ca> writes:
Oskar Linde wrote:

 What is even more interesting is that the above "buggy" behavior seems
 intentional.

Hrm, there were some changes to gc.d a while back, but it was more than 10 versions ago, and that's as far back as I have installed at the moment. Perhaps Walter could comment on the change? I suspect it was probably a bug fix.

Sean
Jun 13 2006
Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Oskar Linde wrote:
 
 Like this:
 
 void foo(char[] arr) {
     if (!arr)
         writefln("Uninitialized array passed");
     else if (arr.length == 0)
         writefln("Zero length array received");
 }
 
 /Oskar

This is not safe to do. Currently in D, null arrays and zero-length arrays are conceptually the same. It just so happens that sometimes arr.ptr is null and sometimes not, depending on the previous operations. The "A 'dup'ed empty string is now a null string." result is an example of why that is not safe. I thought you knew this already? This is nothing new.

BTW, I do find it (at first sight at least) unnatural that a null array is the same as a zero-length array. It doesn't seem conceptually right/consistent.

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
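
The distinction being argued about can be made explicit with an identity test. This is a minimal sketch, assuming a current D2 compiler; describe is a hypothetical helper name. It shows only that a null array and an empty literal are distinguishable by identity while both have zero length, which is why relying on the difference is fragile.

```d
import std.stdio : writeln;

// Classifies an array by identity first, then by length.
// describe is a hypothetical helper, not a library function.
string describe(const(char)[] arr)
{
    if (arr is null)
        return "null";      // no memory referenced at all
    if (arr.length == 0)
        return "empty";     // references memory but has no elements
    return "non-empty";
}

void main()
{
    char[] x;               // never initialized
    writeln(describe(x));
    writeln(describe(""));
    writeln(describe("abc"));
    writeln(x == "");       // value comparison ignores the pointer
}
```

Whether an operation such as dup hands back a null or a merely empty array is left to the runtime, so code that branches on the first case inherits that uncertainty.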
Jun 13 2006
Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Bruno Medeiros wrote:

 This is not safe to do. Currently in D null arrays and zero-length arrays
 are conceptually the same. It just so happens that sometimes the arr.ptr
 is null and sometimes not, depending on the previous operations. The "A
 'dup'ed empty string is now a null string." is an example of why that is
 not safe. I thought you knew this already? This is nothing new.

Yeah, I knew about that. I did not mean to imply that D is flawless in this regard. The cases given were:

    foo("");

and

    char[] s;
    foo(s);

And for those, the above function works. My only point, if I had one, was that there are differences between zero-length arrays and null arrays in some cases in D.

 BTW, I do find it (at first sight at least) unnatural that a null array
 is the same as a zero-length arrays. It doesn't seem conceptually
 right/consistent.

In my view, D's dynamic arrays are quite different from a conceptually ideal array. Conceptually, I see an array as an ordered collection of elements. The elements belong to (or are part of) the array. One could imagine such arrays as both value and reference types. For a reference-type ideal array, there has to be a clear difference between null and zero length. A value-type ideal array, on the other hand, would not need such a distinction.

Another conceptual entity apart from an array is an array view. An array view refers to a selection of indices of another array, for example a range of indices (aka a slice). An array view may or may not remain valid when the referred array changes.

D's dynamic array is quite far from my ideal array, in both its reference and its value version. A closer match is actually a by-value array slice.

Does it make sense for a by-value array slice type to discriminate between null and zero length? I would say that it has its uses. For example, a regexp could match a zero-length portion of a string. It is still important to know where in the string the match was made.

D's arrays have both the role of a non-reference array and of an array slice. In the role of a non-reference array, it makes sense that null is equivalent to zero length. In the role of an array slice, on the other hand, it does make sense to discriminate between zero length and null.

There are other differences. Appending elements only makes sense for the array role, not the slice role. dup creates an array from a slice or an array. It therefore makes sense that dup returns null on zero-length arrays.

The semantics of some operations depend on the role the array has. D has no way of knowing, so it guesses. Take that with a grain of salt, but operations on arrays depend on a runtime judgment by the gc.

Take the append operation. Appending elements to a D array that is in the array role makes sense and works like a charm. Appending elements to an array slice doesn't make any sense, but D will create a new array with copies of the elements the slice refers to and append the element to that array. The slice has been transformed into an array.

But how does D know when an array is in the slice role or the array role? It doesn't. Here is where the (educated) guess comes in. Any array that starts at the beginning of a gc chunk is assumed to be an array. Otherwise, it is assumed to be a slice. The implications are:

    char[] mystr = "abcd".dup;
    char[] slice1 = mystr[0..1];
    char[] slice2 = mystr[1..2];
    slice1 ~= "x"; // alters the original mystr
    slice2 ~= "y"; // doesn't alter the original

I've written too much nonsense now. Some condensed conclusions:

- D's arrays have a schizophrenic nature (slice vs array)
- The compiler is unable to tell the difference and can't protect you against mistakes
- D arrays are not self documenting:

    char[] foo(); // <- returns an array or a slice of someone else's array?

/Oskar
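
The slice1 case above is the one that later changed. This is a minimal sketch assuming a current D2 compiler, where the runtime tracks how much of each block is in use instead of guessing from the start address; it does not describe the DMD version discussed in this thread. originalSurvivesAppend is a hypothetical helper name.

```d
import std.stdio : writeln;

// Appends to a slice that starts at the beginning of a block but does not
// cover all of the block's used data, then checks the original is intact.
// originalSurvivesAppend is a hypothetical helper, not a library function.
bool originalSurvivesAppend()
{
    char[] mystr = "abcd".dup;
    char[] slice1 = mystr[0 .. 1];
    slice1 ~= 'x';              // must not overwrite mystr[1]
    return mystr == "abcd"
        && slice1 == "ax"
        && slice1.ptr !is mystr.ptr; // slice1 was copied to a new block
}

void main()
{
    writeln("original untouched: ", originalSurvivesAppend());
}
```

The self-documentation problem from the conclusions remains: the type char[] still says nothing about who owns the memory.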
Jun 13 2006
Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Oskar Linde wrote:

 But how does D know when an array is in the slice role or the array role?
 It doesn't. Here is where the (educated) guess comes in. Any array that
 starts at the beginning of a gc chunk is assumed to be an array.
 Otherwise, it is assumed to be a slice.

Well, those new things you mentioned are actually more related to ownership management and reference/object immutability than to arrays themselves.

 I've written too much nonsense now. Some condensed conclusions:
 
 - D's arrays have a schizophrenic nature (slice vs array)
 - The compiler is unable to tell the difference and can't protect you 
 against mistakes
 - D arrays are not self documenting:
 
 char[] foo(); // <- returns an array or a slice of someone else's array?
 
 /Oskar

We have often mentioned the problems of arrays (both static and dynamic) before. They should eventually be brought up for discussion with the "general" D public (although for me preferably not soon, as I have other things to take care of).

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jun 14 2006
Dave <Dave_member pathlink.com> writes:
Robert Atkinson wrote:
 Quick question concerning Array lengths and memory allocations.
 
 When an array.length = array.length + 1 (or length - 1) happens, does the
 system only increase (decrease) the memory allocation by 1 [unit] or does
 it internally maintain a buffer and try to minimise the resizing of the
 array?
 
 I think I can remember seeing posts saying to maintain the buffer yourself and
 other posts saying it was done automatically behind the scenes.

Setting the array length does just that and nothing more or less. But using the array concatenation operator (~) will preallocate some space.

time this:

    int[] arr;
    for(int i = 0; i < 1000000; i++)
    {
        arr.length = arr.length + 1;
        arr[i] = i;
    }

vs this:

    int[] arr;
    for(int i = 0; i < 1000000; i++)
    {
        arr ~= i;
    }
Jun 08 2006
Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
Dave wrote:
 Setting the array length does just that and nothing more or less. But 
 using the array concatenation operator (~) will preallocate some space.
 
 time this:
 
     int[] arr;
     for(int i = 0; i < 1000000; i++)
     {
         arr.length = arr.length + 1;
         arr[i] = i;
     }
 
 vs this:
 
     int[] arr;
     for(int i = 0; i < 1000000; i++)
     {
         arr ~= i;
     }

So I did. :) My test program:

# module array_alloc;
#
# import cashew .utils .Benchmark ;
# import mango  .io    .Stdout    ;
#
# void main () {
#   auto bench = new BaselineBenchmark ("Index Assign"c);
#
#   for (int i; i < 10; i++) {
#     bench .begin ();
#     viaIndexAssign ();
#     bench .end ();
#   }
#
#   Stdout(CR);
#   bench.reset("Cat Assign");
#   for (int i; i < 10; i++) {
#     bench .begin ();
#     viaCatAssign ();
#     bench .end ();
#   }
# }
#
# void viaIndexAssign () {
#   int[] arr ;
#
#   for(int i; i < 1_000_000; i++) {
#     arr.length = arr.length + 1;
#     arr[i] = i;
#   }
# }
#
# void viaCatAssign () {
#   int[] arr ;
#
#   for(int i; i < 1_000_000; i++)
#     arr ~= i;
# }

And my results, compiling with "-release -O -inline", were:

<Benchmark Index Assign> Baseline 79.090000
<Benchmark Index Assign> 43.830000 & 1.804472 versus baseline
<Benchmark Index Assign> 42.570000 & 1.857881 versus baseline
<Benchmark Index Assign> 42.560000 & 1.858318 versus baseline
<Benchmark Index Assign> 42.410000 & 1.864890 versus baseline
<Benchmark Index Assign> 41.680000 & 1.897553 versus baseline
<Benchmark Index Assign> 41.640000 & 1.899376 versus baseline
<Benchmark Index Assign> 41.580000 & 1.902116 versus baseline
<Benchmark Index Assign> 41.580000 & 1.902116 versus baseline
<Benchmark Index Assign> 41.680000 & 1.897553 versus baseline

<Benchmark Cat Assign> Baseline 0.720000
<Benchmark Cat Assign> 0.600000 & 1.200000 versus baseline
<Benchmark Cat Assign> 0.550000 & 1.309091 versus baseline
<Benchmark Cat Assign> 0.610000 & 1.180328 versus baseline
<Benchmark Cat Assign> 0.600000 & 1.200000 versus baseline
<Benchmark Cat Assign> 0.550000 & 1.309091 versus baseline
<Benchmark Cat Assign> 0.600000 & 1.200000 versus baseline
<Benchmark Cat Assign> 0.610000 & 1.180328 versus baseline
<Benchmark Cat Assign> 0.600000 & 1.200000 versus baseline
<Benchmark Cat Assign> 0.550000 & 1.309091 versus baseline

DMD 0.160, Win32. That's a rather disturbing disparity, if you ask me.

Now, what I didn't test but probably should have, was the effect of "pre-allocating" the array by setting the .length to a large value and then back to zero, expanding the behind-the-scenes capacity of the array. I'm betting in that case the IndexAssign would be the faster.

-- Chris Nicholson-Sauls
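
The same comparison can be repeated without the cashew and mango libraries. This is a minimal sketch, assuming a current D2 compiler and the benchmark helper from std.datetime.stopwatch; N and the function names are illustrative choices, and the timings it prints are not expected to match the DMD 0.160 figures above.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

enum N = 100_000; // illustrative element count, smaller than in the post

// Grows the array one element at a time through the length property.
int[] viaLength()
{
    int[] arr;
    foreach (i; 0 .. N)
    {
        arr.length = arr.length + 1;
        arr[i] = i;
    }
    return arr;
}

// Grows the array with the append operator.
int[] viaAppend()
{
    int[] arr;
    foreach (i; 0 .. N)
        arr ~= i;
    return arr;
}

// benchmark is given void wrappers so the results are simply discarded.
void runLength() { viaLength(); }
void runAppend() { viaAppend(); }

void main()
{
    auto times = benchmark!(runLength, runAppend)(10);
    writefln("length + 1: %s ms", times[0].total!"msecs");
    writefln("append (~=): %s ms", times[1].total!"msecs");
}
```

Both functions build the same array, so any difference in the printed times comes from how the storage is grown.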
Jun 08 2006
"Derek Parnell" <derek psych.ward> writes:
On Fri, 09 Jun 2006 05:33:33 +1000, Chris Nicholson-Sauls
<ibisbasenji gmail.com> wrote:

 Now, what I didn't test but probably should have, was the effect of
 "pre-allocating" the array by setting the .length to a large value and
 then back to zero, expanding the behind-the-scenes capacity of the
 array. I'm betting in that case the IndexAssign would be the faster.

Not if you set it back to zero. If you do that, D also deallocates the RAM. Setting its length back to 1, however, is okay in that the allocated RAM stays allocated to the array. This means that the first element is just a dummy to get around the 'bug'.

-- 
Derek Parnell
Melbourne, Australia
Jun 08 2006