digitalmars.D.learn - Array length & allocation question

Robert Atkinson <Robert_member pathlink.com> writes:

Quick question concerning Array lengths and memory allocations.

When an array.length = array.length + 1 (or length - 1) happens, does the system
only increase (decrease) the memory allocation by 1 [unit] or does it internally
maintain a buffer and try to minimise the resizing of the array?

I think I can remember seeing posts saying to maintain the buffer yourself and
other posts saying it was done automatically behind the scenes.

Jun 08 2006

BCS <BCS pathlink.com> writes:

Robert Atkinson wrote:
 Quick question concerning Array lengths and memory allocations.
 
 When an array.length = array.length + 1 (or length - 1) happens, does the
 system only increase (decrease) the memory allocation by 1 [unit] or does it
 internally maintain a buffer and try to minimise the resizing of the array?
 
 I think I can remember seeing posts saying to maintain the buffer yourself and
 other posts saying it was done automatically behind the scenes.
 
 
 
 

Even if the buffer is there, I would think that it would be faster to do
it yourself, because you have more information to decide how to do it:


char[] first = "foo bar";

func(first[0..3]);


char[] func(char[] inp)
{
		// first time around can't extend in place
		// logic to check this would be costly
	while(go())	// go() stands for whatever loop condition applies
		inp.length = inp.length+1;

	return inp;
}

Jun 08 2006

Sean Kelly <sean f4.ca> writes:

Robert Atkinson wrote:
 Quick question concerning Array lengths and memory allocations.
 
 When an array.length = array.length + 1 (or length - 1) happens, does the
 system only increase (decrease) the memory allocation by 1 [unit] or does it
 internally maintain a buffer and try to minimise the resizing of the array?

The latter.

 I think I can remember seeing posts saying to maintain the buffer yourself and
 other posts saying it was done automatically behind the scenes.

In most cases it's not worth it to try and maintain the buffer yourself. 
  At the very least, you should test both methods and see which is faster.


Sean

Jun 08 2006

Lars Ivar Igesund <larsivar igesund.net> writes:

Sean Kelly wrote:

 Robert Atkinson wrote:
 Quick question concerning Array lengths and memory allocations.
 
 When an array.length = array.length + 1 (or length - 1) happens, does the
 system only increase (decrease) the memory allocation by 1 [unit] or does
 it internally maintain a buffer and try to minimise the resizing of the
 array?

 
 The latter.
 
 I think I can remember seeing posts saying to maintain the buffer
 yourself and other posts saying it was done automatically behind the
 scenes.

 
 In most cases it's not worth it to try and maintain the buffer yourself.
   At the very least, you should test both methods and see which is faster.
 
 
 Sean

I think the "double-the-size-when-more-is-needed" strategy is used, and
afaik, it is the one that performs best in the general case.
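
One way to check the growth strategy rather than guess is to count how often the array's data pointer changes while appending. The sketch below assumes a present-day D2 compiler and runtime, not the 2006 DMD discussed in this thread; `countReallocations` is a name made up for this illustration, and the exact count depends on the runtime's allocator.

```d
import std.stdio : writefln;

// Hypothetical helper: appends n ints one at a time and counts how many
// times the array's storage moved to a new address.
size_t countReallocations(size_t n)
{
    int[] arr;
    size_t moves = 0;
    const(int)* last = null;   // arr.ptr is null for a fresh, empty array

    foreach (i; 0 .. n)
    {
        arr ~= cast(int) i;
        if (arr.ptr !is last)
        {
            ++moves;
            last = arr.ptr;
        }
    }
    return moves;
}

void main()
{
    enum n = 100_000;
    writefln("%s appends, %s reallocations", n, countReallocations(n));
}
```

If the runtime grows the buffer geometrically, the number of moves should be far smaller than the number of appends.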

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi

Jun 08 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Lars Ivar Igesund wrote:
 Sean Kelly wrote:
 
 Robert Atkinson wrote:
 Quick question concerning Array lengths and memory allocations.

 When an array.length = array.length + 1 (or length - 1) happens, does the
 system only increase (decrease) the memory allocation by 1 [unit] or does
 it internally maintain a buffer and try to minimise the resizing of the
 array?

 The latter.

 I think I can remember seeing posts saying to maintain the buffer
 yourself and other posts saying it was done automatically behind the
 scenes.

 In most cases it's not worth it to try and maintain the buffer yourself.
   At the very least, you should test both methods and see which is faster.


 Sean

 
 I think the "double-the-size-when-more-is-needed" strategy is used, and
 afaik, it is the one that performs best in the general case.
 

Hmm, and what happens when one shortens the length of the array? Does the
Memory Manager "back" buffer size remain the same?

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jun 11 2006

"Derek Parnell" <derek psych.ward> writes:

On Mon, 12 Jun 2006 09:11:04 +1000, Bruno Medeiros  
<brunodomedeirosATgmail SPAM.com> wrote:

 Hum, and happens when one shortens the length of the array? The Memory  
 Manager "back" buffer size remains the same?

Yes. However there is a bug (oops - an issue) in which if the length is
set to zero the RAM is released back to the system.

-- 
Derek Parnell
Melbourne, Australia

Jun 11 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Derek Parnell wrote:
 On Mon, 12 Jun 2006 09:11:04 +1000, Bruno Medeiros 
 <brunodomedeirosATgmail SPAM.com> wrote:
 
 Hum, and happens when one shortens the length of the array? The Memory 
 Manager "back" buffer size remains the same?

 
 Yes. However there is a bug (oops - an issue) in which if the length is 
 set to zero the RAM is released back to the the system.
 
 --Derek Parnell
 Melbourne, Australia

That makes perfect sense, why would it be a bug?

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jun 12 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Bruno Medeiros skrev:
 Derek Parnell wrote:
 On Mon, 12 Jun 2006 09:11:04 +1000, Bruno Medeiros 
 <brunodomedeirosATgmail SPAM.com> wrote:

 Hum, and happens when one shortens the length of the array? The 
 Memory Manager "back" buffer size remains the same?

 Yes. However there is a bug (oops - an issue) in which if the length 
 is set to zero the RAM is released back to the the system.

 --Derek Parnell
 Melbourne, Australia

 
 That makes perfect sense, why would it be a bug?
 

I don't know if this is what Derek refers to, but it used to be 
recommended practice to reserve space for an array by doing:

arr.length = 1024;
arr.length = 0;
(start filling arr with data)

I'm quite sure this used to be mentioned in the documentation, but I can 
no longer find any reference to it (except this old post: 
http://www.digitalmars.com/drn-bin/wwwnews?D/17691)

Today, I guess you should do the following instead:

arr.length = 1024;
arr = arr[0..0];
(start filling arr with data)
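
For readers on a current D2 toolchain, dynamic arrays now have `reserve` and `capacity`, which cover the same need without the set-and-reset trick. The following is a minimal sketch assuming those D2 runtime facilities (they did not exist in the DMD versions discussed here); `fillReserved` is an illustrative name, not a library function.

```d
// Hypothetical helper: reserves room for n ints up front, then appends
// n values and reports whether the storage stayed at the same address.
bool fillReserved(size_t n, out int[] result)
{
    int[] arr;
    arr.reserve(n);              // ask the runtime for room for at least n elements
    assert(arr.length == 0);     // reserving does not change the length
    assert(arr.capacity >= n);

    arr ~= 0;                    // first append lands in the reserved block
    auto start = arr.ptr;
    foreach (i; 1 .. n)
        arr ~= cast(int) i;

    result = arr;
    return arr.ptr is start;     // true if no reallocation happened
}

void main()
{
    int[] data;
    bool stayedPut = fillReserved(1024, data);
    assert(data.length == 1024);
    assert(stayedPut);
}
```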

/Oskar

Jun 12 2006

"Derek Parnell" <derek psych.ward> writes:

On Tue, 13 Jun 2006 05:27:44 +1000, Bruno Medeiros  
<brunodomedeirosATgmail SPAM.com> wrote:

 Derek Parnell wrote:
 On Mon, 12 Jun 2006 09:11:04 +1000, Bruno Medeiros  
 <brunodomedeirosATgmail SPAM.com> wrote:

 Hum, and happens when one shortens the length of the array? The Memory  
 Manager "back" buffer size remains the same?

  Yes. However there is a bug (oops - an issue) in which if the length  
 is set to zero the RAM is released back to the the system.
  --Derek Parnell
 Melbourne, Australia

 That makes perfect sense, why would it be a bug?

Agreed, it is not a bug in the sense of being contrary to the specification,
because this behaviour isn't specified. However, it does prevent a coder
from distinguishing an empty array from a null array. An empty one
is an array that (no longer) has any elements, and a null array is one that
doesn't have any RAM to reference.

I suggest that Walter either document this functionality or fix it.

"When an array length is reduced the RAM it owns is not released and can
be reused when the array subsequently is expanded (, unless the length is
set to zero in which case the RAM is released). "

Setting the length to zero is a convenient way to reserve RAM for an
array.

Also consider this ...

     foo("");

Now how can 'foo' be written to detect a coder's error of passing it an
uninitialized array?

     char[] x;
     foo(x);


-- 
Derek Parnell
Melbourne, Australia

Jun 12 2006

Sean Kelly <sean f4.ca> writes:

Derek Parnell wrote:
 On Tue, 13 Jun 2006 05:27:44 +1000, Bruno Medeiros 
 <brunodomedeirosATgmail SPAM.com> wrote:
 
 Derek Parnell wrote:
 On Mon, 12 Jun 2006 09:11:04 +1000, Bruno Medeiros 
 <brunodomedeirosATgmail SPAM.com> wrote:

 Hum, and happens when one shortens the length of the array? The 
 Memory Manager "back" buffer size remains the same?

  Yes. However there is a bug (oops - an issue) in which if the length 
 is set to zero the RAM is released back to the the system.
  --Derek Parnell
 Melbourne, Australia

 That makes perfect sense, why would it be a bug?

 
 Agreed, it is not a bug in the sense that it is contrary to 
 specifications because this behaviour isn't specified. However it does 
 prevent a coder from distinguishing between an empty array from a null 
 array. An Empty one is an array that (no longer) has any elements and a 
 null array is one that doesn't have any RAM to reference.
 
 I sugest that Walter either document this functionality or fix it.
 
 "When an array length is reduced the RAM it owns is not released and can 
 be reused when the array subsequently is expanded (, unless the length 
 is set to zero in which case the RAM is released). "
 
 Setting the length to zero is a convenient way to reserved RAM for an 
 array.
 
 Also consider this ...
 
     foo("");
 
 Now how can 'foo' be written to detect a coder's error of passing it an 
 uninitialized array.
 
     char[] x;
     foo(x);

Perhaps D arrays simply need a reserve property?


Sean

Jun 12 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Sean Kelly skrev:

 Perhaps D arrays simply need a reserve property?

Something like this ought to work:

template reserve(ArrTy,IntTy) {
         void reserve(inout ArrTy a, IntTy size) {
                 if (size > a.length) {
                         size_t old_length = a.length;
                         a.length = size;
                         a = a[0..old_length];
                 }
         }
}


usage:

arr.reserve(1000);

/Oskar

Jun 12 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Derek Parnell skrev:

 I sugest that Walter either document this functionality or fix it.

I agree that it should be better documented.

 
 "When an array length is reduced the RAM it owns is not released and can 
 be reused when the array subsequently is expanded (, unless the length 
 is set to zero in which case the RAM is released). "
 
 Setting the length to zero is a convenient way to reserved RAM for an 
 array.


arr.length = 100_000_000; arr = arr[0..0]; is almost as convenient.

 Also consider this ...
 
     foo("");
 
 Now how can 'foo' be written to detect a coder's error of passing it an 
 uninitialized array.
 
     char[] x;
     foo(x);
 

Like this:

void foo(char[] arr) {
	if (!arr)
		writefln("Uninitialized array passed");
	else if (arr.length == 0)
		writefln("Zero length array received");
}
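
The same distinction can be spelled out explicitly with `is null`. This is a sketch assuming present-day D2 semantics; `describe` and its labels are made up for illustration. As the rest of the thread points out, whether an empty array happens to carry a null pointer depends on how it was produced, so this kind of check is best treated as a diagnostic rather than a guarantee.

```d
import std.stdio : writeln;

// Illustrative classifier; the name and the returned labels are made up here.
string describe(const(char)[] arr)
{
    if (arr is null)
        return "null";
    if (arr.length == 0)
        return "empty";
    return "non-empty";
}

void main()
{
    char[] x;                    // never assigned: pointer is null
    writeln(describe(x));        // null
    writeln(describe(""));       // empty: the literal has a non-null pointer
    writeln(describe("abc"));    // non-empty
    assert(x == "");             // == compares contents, so the two are equal
}
```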

/Oskar

Jun 12 2006

Derek Parnell <derek psych.ward> writes:

On Tue, 13 Jun 2006 01:05:04 +0200, Oskar Linde wrote:

 Setting the length to zero is a convenient way to reserved RAM for an 
 array.

 
 t arr.length = 100_000_000; arr = arr[0..0]; is almost as convenient.

Unfortunately this only appears to reserve the RAM, because the next change
in length will cause a new allocation to be made. See the example program
below ...
 
 Also consider this ...
 
     foo("");
 
 Now how can 'foo' be written to detect a coder's error of passing it an 
 uninitialized array.
 
     char[] x;
     foo(x);
 

 
 Like this:
 
 void foo(char[] arr) {
 	if (!arr)
 		writefln("Uninitialized array passed");
 	else if (arr.length == 0)
 		writefln("Zero length array received");
 }

Yes, I can see that D can now distinguish between the two. This didn't use
to be the case, IIRC. However there is still a 'bug' with this, as the
program here demonstrates...


 import std.stdio;
 void main()
 { 

    char[] arr;

    foo(arr);
    foo("");
    foo("".dup);

    writefln("%s %s", arr.length, arr.ptr);
    arr.length = 100;
    writefln("%s %s", arr.length, arr.ptr);
    arr = arr[0..0];
    writefln("%s %s", arr.length, arr.ptr);
    arr.length = 50;
    writefln("%s %s", arr.length, arr.ptr);
    arr.length = 500;
    writefln("%s %s", arr.length, arr.ptr);
 
 }

 void foo(char[] t)
 {
    writefln("foo: %s %s", t.length, t.ptr);
 }

The results are ...
foo: 0 0000
foo: 0 413080
foo: 0 0000  *** A 'dup'ed empty string is now a null string.
0 0000
100 8A2F00
0 8A2F00   *** RAM appears to be reserved.
50 8A1F80  *** But it is not as a new allocation just occurred.
500 8A3E00 *** This allocation is expected.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
13/06/2006 11:08:24 AM

Jun 12 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Derek Parnell skrev:
 On Tue, 13 Jun 2006 01:05:04 +0200, Oskar Linde wrote:
 
 Setting the length to zero is a convenient way to reserved RAM for an 
 array.

 t arr.length = 100_000_000; arr = arr[0..0]; is almost as convenient.

 
 Unfortunately this only appears to reserve the RAM, because the next change
 in length will cause a new allocation to be made. See the example program
 below ...
  
 Also consider this ...

     foo("");

 Now how can 'foo' be written to detect a coder's error of passing it an 
 uninitialized array.

     char[] x;
     foo(x);

 Like this:

 void foo(char[] arr) {
 	if (!arr)
 		writefln("Uninitialized array passed");
 	else if (arr.length == 0)
 		writefln("Zero length array received");
 }

 
 Yes, I can see that D can now distinguish between the two. This didn't used
 to be the case, IIRC. However there is still a 'bug' with this as the
 program here demonstrates...
 
 
  import std.stdio;
  void main()
  { 
 
     char[] arr;
 
     foo(arr);
     foo("");
     foo("".dup);
 
     writefln("%s %s", arr.length, arr.ptr);
     arr.length = 100;
     writefln("%s %s", arr.length, arr.ptr);
     arr = arr[0..0];
     writefln("%s %s", arr.length, arr.ptr);
     arr.length = 50;
     writefln("%s %s", arr.length, arr.ptr);
     arr.length = 500;
     writefln("%s %s", arr.length, arr.ptr);
  
  }
 
  void foo(char[] t)
  {
     writefln("foo: %s %s", t.length, t.ptr);
  }
 
 The results are ...
 foo: 0 0000
 foo: 0 413080
 foo: 0 0000  *** A 'dup'ed empty string is now a null string.
 0 0000
 100 8A2F00
 0 8A2F00   *** RAM appears to be reserved.
 50 8A1F80  *** But it is not as a new allocation just occurred.
 500 8A3E00 *** This allocation is expected.

You are right, changing length forces a reallocation. Interestingly, the 
following works:

arr.length = 100;
arr = arr[0..0];
writefln("%s %s",arr.length,arr.ptr);
for (int i = 0; i < 50; i++)
	arr ~= i;
writefln("%s %s",arr.length,arr.ptr);

prints (for me):

0 b7ee9e00
50 b7ee9e00
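
On a present-day D2 runtime the picture is different, and a short sketch may help readers compare. It assumes D2's `assumeSafeAppend`, which is not part of the DMD version tested above; `appendMoves` and `appendReuses` are illustrative names. In D2, appending to a zero-length slice of a live array copies the data elsewhere unless the program states that the old contents may be overwritten.

```d
// Both helpers are illustrative names, not runtime APIs.

// Appends to a zero-length slice of a live array; returns true if the
// runtime moved the data instead of overwriting the original.
bool appendMoves()
{
    int[] arr = new int[100];
    int[] head = arr[0 .. 0];    // same pointer, length 0
    head ~= 1;
    return head.ptr !is arr.ptr && arr[0] == 0;
}

// Same setup, but assumeSafeAppend declares the old contents dead,
// so the appends reuse the existing block.
bool appendReuses()
{
    int[] arr = new int[100];
    int[] head = arr[0 .. 0];
    head.assumeSafeAppend();
    foreach (i; 0 .. 50)
        head ~= i;
    return head.ptr is arr.ptr && head.length == 50;
}

void main()
{
    assert(appendMoves());
    assert(appendReuses());
}
```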

What is even more interesting is that the above "buggy" behavior seems 
intentional. The following patch removes the forced reallocation when 
changing length of a 0-length array:

--- gc.d.orig   2006-06-04 11:50:08.979945284 +0200
+++ gc.d        2006-06-13 09:19:02.135348959 +0200
@@ -382,8 +382,6 @@
         }
         //printf("newsize = %x, newlength = %x\n", newsize, newlength);

-       if (p.length)
-       {
             newdata = p.data;
             if (newlength > p.length)
             {
@@ -397,11 +395,6 @@
                 }
                 newdata[size .. newsize] = 0;
             }
-       }
-       else
-       {
-           newdata = cast(byte *)_gc.calloc(newsize + 1, 1);
-       }
      }
      else
      {


With this change, your above code prints:

$build -run ./arrtest ~/dmd/src/phobos/internal/gc/gc.d
Path and Version : build v2.9(1197)
   built on Thu Aug 11 16:07:55 2005
foo: 0 0
foo: 0 805765c
foo: 0 0
0 0
100 b7ee8e80
0 b7ee8e80    *** RAM is reserved
50 b7ee8e80   *** and is used
500 b7ee9e00  *** This causes reallocation as expected

I wonder why the code looks like it does...

/Oskar

Jun 13 2006

Derek Parnell <derek psych.ward> writes:

On Tue, 13 Jun 2006 09:24:34 +0200, Oskar Linde wrote:

 What is even more interesting is that the above "buggy" behavior
 seems intentional. The following patch removes the forced
 reallocation when changing length of a 0-length array:

Hmmm... I just rewrote that function as below and it seems to test out
quite well too. I incorporated your change plus I removed the check for a
zero new length. Seems to work without any problems.

-----------------
extern (C)
byte[] _d_arraysetlength(size_t newlength, size_t sizeelem, Array *p)
in
{
    assert(sizeelem);
    assert(!p.length || p.data);
}
body
{
    byte* newdata;
    newdata = p.data;
    if (newlength > p.length)
    {
        version (D_InlineAsm_X86)
        {
            size_t newsize = void;
            asm
            {
            mov EAX,newlength   ;
            mul EAX,sizeelem    ;
            mov newsize,EAX ;
            jc  Loverflow   ;
            }
        }
        else
        {
            size_t newsize = sizeelem * newlength;
            if (newsize / newlength != sizeelem)
            goto Loverflow;
        }
        size_t size = p.length * sizeelem;
        size_t cap = _gc.capacity(p.data);
        if (cap <= newsize)
        {
            newdata = cast(byte *)_gc.malloc(newsize + 1);
            newdata[0 .. size] = p.data[0 .. size];
        }
        newdata[size .. newsize] = 0;
    }
    p.data = newdata;
    p.length = newlength;
    return newdata[0 .. newlength];
Loverflow:
    _d_OutOfMemory();
}
---------------

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
13/06/2006 5:54:57 PM

Jun 13 2006

Sean Kelly <sean f4.ca> writes:

Oskar Linde wrote:
 Derek Parnell skrev:
 On Tue, 13 Jun 2006 01:05:04 +0200, Oskar Linde wrote:

 Setting the length to zero is a convenient way to reserved RAM for 
 an array.

 t arr.length = 100_000_000; arr = arr[0..0]; is almost as convenient.

 Unfortunately this only appears to reserve the RAM, because the next 
 change
 in length will cause a new allocation to be made. See the example program
 below ...
  
 Also consider this ...

     foo("");

 Now how can 'foo' be written to detect a coder's error of passing it 
 an uninitialized array.

     char[] x;
     foo(x);

 Like this:

 void foo(char[] arr) {
     if (!arr)
         writefln("Uninitialized array passed");
     else if (arr.length == 0)
         writefln("Zero length array received");
 }

 Yes, I can see that D can now distinguish between the two. This didn't 
 used
 to be the case, IIRC. However there is still a 'bug' with this as the
 program here demonstrates...


  import std.stdio;
  void main()
  {
     char[] arr;

     foo(arr);
     foo("");
     foo("".dup);

     writefln("%s %s", arr.length, arr.ptr);
     arr.length = 100;
     writefln("%s %s", arr.length, arr.ptr);
     arr = arr[0..0];
     writefln("%s %s", arr.length, arr.ptr);
     arr.length = 50;
     writefln("%s %s", arr.length, arr.ptr);
     arr.length = 500;
     writefln("%s %s", arr.length, arr.ptr);
  
  }

  void foo(char[] t)
  {
     writefln("foo: %s %s", t.length, t.ptr);
  }

 The results are ...
 foo: 0 0000
 foo: 0 413080
 foo: 0 0000  *** A 'dup'ed empty string is now a null string.
 0 0000
 100 8A2F00
 0 8A2F00   *** RAM appears to be reserved.
 50 8A1F80  *** But it is not as a new allocation just occurred.
 500 8A3E00 *** This allocation is expected.

 
 You are right, changing length forces a reallocation. Interestingly, the 
 following works:
 
 arr.length = 100;
 arr = arr[0..0];
 writefln("%s %s",arr.length,arr.ptr);
 for (int i = 0; i < 50; i++)
     arr ~= i;
 writefln("%s %s",arr.length,arr.ptr);
 
 prints (for me):
 
 0 b7ee9e00
 50 b7ee9e00
 
 What is even more interesting is that the above "buggy" behavior seems 
 intentional.

Hrm, there were some changes to gc.d a while back, but it was more than 
10 versions ago as that's as far back as I have installed at the moment. 
  Perhaps Walter could comment on the change?  I suspect it was probably 
a bug fix.


Sean

Jun 13 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Oskar Linde wrote:
 
 Like this:
 
 void foo(char[] arr) {
     if (!arr)
         writefln("Uninitialized array passed");
     else if (arr.length == 0)
         writefln("Zero length array received");
 }
 
 /Oskar

This is not safe to do. Currently in D null arrays and zero-length 
arrays are conceptually the same. It just so happens that sometimes the 
arr.ptr is null and sometimes not, depending on the previous operations.
The "A 'dup'ed empty string is now a null string." is an example of why 
that is not safe. I thought you knew this already? This is nothing new.

BTW, I do find it (at first sight at least) unnatural that a null array
is the same as a zero-length array. It doesn't seem conceptually
right/consistent.



-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jun 13 2006

Oskar Linde <oskar.lindeREM OVEgmail.com> writes:

Bruno Medeiros skrev:
 Oskar Linde wrote:
 Like this:

 void foo(char[] arr) {
     if (!arr)
         writefln("Uninitialized array passed");
     else if (arr.length == 0)
         writefln("Zero length array received");
 }

 /Oskar

 
 This is not safe to do. Currently in D null arrays and zero-length 
 arrays are conceptually the same. It just so happens that sometimes the 
 arr.ptr is null and sometimes not, depending on the previous operations.
 The "A 'dup'ed empty string is now a null string." is an example of why 
 that is not safe. I thought you knew this already? This is nothing new.

Yeah, I knew about that. I did not mean to imply that D is flawless in
this regard. The cases given were:

foo(""); and char[] s; foo(s);

And for those, the above function works. My only point, if I had one, 
was that there are differences between zero length arrays and null 
arrays in some cases in D.

 BTW, I do find it (at first sight at least) unnatural that a null array 
 is the same as a zero-length arrays. It doesn't seem conceptually 
 right/consistent.

In my view, D's dynamic arrays are quite different from a conceptually 
ideal array.

Conceptually, I see an array as an ordered collection of elements. The 
elements belong to (or are part of) the array.

One could imagine such arrays as both value and reference types. For a 
reference type ideal array, there has to be a clear difference between 
null and zero length. A value type ideal array on the other hand would 
not need one such distinction.

Another conceptual entity apart from an array is an array view. An array 
view refers to a selection of indices of another array. For example, a 
range of indices (aka a slice). An array view may or may not remain 
valid when the referred array changes.

D's dynamic array is quite far from my ideal array. Both its reference 
and its value version. A closer match is actually a by-value array slice.

Does it make sense for a by-value array slice type to discriminate 
between null and zero-length? I would say that it has its uses. For 
example, a regexp could match a zero length portion of a string. It is 
still important to know where in the string the match was made.

D's arrays have both the role of a non-reference array and of an array 
slice. In the role of a non-reference array, it makes sense that null
is equivalent to zero-length. In the role of an array slice on the other 
hand, it does make sense to discriminate between zero length and null. 
There are other differences. Appending elements only makes sense to the 
array role, not the slice role. dup creates an array from a slice or an 
array. It therefore makes sense that dup returns null on zero length arrays.

The semantics of some operations depends on the role the array has. D 
has no way of knowing, so it guesses. Take that with a grain of salt, 
but operations on arrays depend on a runtime judgment by the gc.

Take the append operation. Appending elements to a D array that is in 
the array role makes sense and works like a charm. Appending elements to 
an array slice doesn't make any sense, but D will create a new array 
with copies of the elements the slice refers to and append the element 
to that array. The slice has been transformed into an array.

But how does D know when an array is in the slice role or the array 
role? It doesn't. Here is where the (educated) guess comes in. Any array 
that starts at the beginning of a gc chunk is assumed to be an array. 
Otherwise, it is assumed to be a slice. The implications are:

char[] mystr = "abcd".dup;
char[] slice1 = mystr[0..1];
char[] slice2 = mystr[1..2];
slice1 ~= "x"; // alters the original mystr
slice2 ~= "y"; // doesn't alter the original
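
For comparison, here is the same experiment as a sketch under present-day D2 semantics, where the runtime tracks how much of each block is in use (an assumption about the reader's toolchain; the behaviour described above is that of the 2006 compiler). `originalAfterSliceAppends` is an illustrative name. In D2 neither append alters the original.

```d
// Illustrative helper: returns the original array after appending to two
// slices of it.
char[] originalAfterSliceAppends()
{
    char[] mystr  = "abcd".dup;
    char[] slice1 = mystr[0 .. 1];
    char[] slice2 = mystr[1 .. 2];

    slice1 ~= "x";   // reallocates: mystr[1] is still in use
    slice2 ~= "y";   // reallocates: mystr[2] is still in use

    assert(slice1 == "ax");
    assert(slice2 == "by");
    assert(slice1.ptr !is mystr.ptr);
    return mystr;
}

void main()
{
    assert(originalAfterSliceAppends() == "abcd");   // original untouched
}
```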

I've written too much nonsense now. Some condensed conclusions:

- D's arrays have a schizophrenic nature (slice vs array)
- The compiler is unable to tell the difference and can't protect you 
against mistakes
- D arrays are not self documenting:

char[] foo(); // <- returns an array or a slice of someone else's array?

/Oskar

Jun 13 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Oskar Linde wrote:
 Bruno Medeiros skrev:
 Oskar Linde wrote:
 Like this:

 void foo(char[] arr) {
     if (!arr)
         writefln("Uninitialized array passed");
     else if (arr.length == 0)
         writefln("Zero length array received");
 }

 /Oskar

 This is not safe to do. Currently in D null arrays and zero-length 
 arrays are conceptually the same. It just so happens that sometimes 
 the arr.ptr is null and sometimes not, depending on the previous 
 operations.
 The "A 'dup'ed empty string is now a null string." is an example of 
 why that is not safe. I thought you knew this already? This is nothing 
 new.

 
 Yeah, I knew about that. I did mot mean to imply that D is flawless in 
 this regard. The cases given were:
 
 foo(""); and char[] s; foo(s);
 
 And for those, the above function works. My only point, if I had one, 
 was that there are differences between zero length arrays and null 
 arrays in some cases in D.
 
 BTW, I do find it (at first sight at least) unnatural that a null 
 array is the same as a zero-length arrays. It doesn't seem 
 conceptually right/consistent.

 
 In my view, D's dynamic arrays are quite different from a conceptually 
 ideal array.
 
 Conceptually, I see an array as an ordered collection of elements. The 
 elements belong to (or are part of) the array.
 
 One could imagine such arrays as both value and reference types. For a 
 reference type ideal array, there has to be a clear difference between 
 null and zero length. A value type ideal array on the other hand would 
 not need one such distinction.
 
 Another conceptual entity apart from an array is an array view. An array 
 view refers to a selection of indices of another array. For example, a 
 range of indices (aka a slice). An array view may or may not remain 
 valid when the referred array changes.
 
 D's dynamic array is quite far from my ideal array. Both its reference 
 and its value version. A closer match is actually a by-value array slice.
 
 Does it make sense for a by-value array slice type to discriminate 
 between null and zero-length? I would say that it has its uses. For 
 example, a regexp could match a zero length portion of a string. It is 
 still important to know where in the string the match was made.
 
 D's arrays have both the role of a non-reference array and of an array 
 slice. In the role of an non-reference array, it makes sense that null 
 is equivalent to zero-length. In the role of an array slice on the other 
 hand, it does make sense to discriminate between zero length and null. 
 There are other differences. Appending elements only makes sense to the 
 array role, not the slice role. dup creates an array from a slice or an 
 array. It therefore makes sense that dup returns null on zero length 
 arrays.
 
 The semantics of some operations depends on the role the array has. D 
 has no way of knowing, so it guesses. Take that with a grain of salt, 
 but operations on arrays depend on a runtime judgment by the gc.
 
 Take the append operation. Appending elements to a D array that is in 
 the array role makes sense and works like a charm. Appending elements to 
 an array slice doesn't make any sense, but D will create a new array 
 with copies of the elements the slice refers to and append the element 
 to that array. The slice has been transformed into an array.
 
 But how does D know when an array is in the slice role or the array 
 role? It doesn't. Here is where the (educated) guess comes in. Any array 
 that starts at the beginning of a gc chunk is assumed to be an array. 
 Otherwise, it is assumed to be a slice. The implications are:
 
 char[] mystr = "abcd".dup;
 char[] slice1 = mystr[0..1];
 char[] slice2 = mystr[1..2];
 slice1 ~= "x"; // alters the original mystr
 slice2 ~= "y"; // doesn't alter the original
 

Well, those new things you mentioned are actually more related to
ownership management and reference/object immutability than to
arrays themselves.


 I've written too much nonsense now. Some condensed conclusions:
 
 - D's arrays have a schizophrenic nature (slice vs array)
 - The compiler is unable to tell the difference and can't protect you 
 against mistakes
 - D arrays are not self documenting:
 
 char[] foo(); // <- returns an array or a slice of someone else's array?
 
 /Oskar

We have often mentioned the problems of arrays (both static and dynamic) 
before. It should be brought under discussion to the "general" D public 
eventually. (although for me preferably not soon, other things to take care of)


-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jun 14 2006

Dave <Dave_member pathlink.com> writes:

Robert Atkinson wrote:
 Quick question concerning Array lengths and memory allocations.
 
 When an array.length = array.length + 1 (or length - 1) happens, does the
 system only increase (decrease) the memory allocation by 1 [unit] or does it
 internally maintain a buffer and try to minimise the resizing of the array?
 
 I think I can remember seeing posts saying to maintain the buffer yourself and
 other posts saying it was done automatically behind the scenes.
 
 

Setting the array length does just that and nothing more or less. But
using the array concatenation operator (~) will preallocate some space.

time this:

     int[] arr;
     for(int i = 0; i < 1000000; i++)
     {
         arr.length = arr.length + 1;
         arr[i] = i;
     }

vs this:

     int[] arr;
     for(int i = 0; i < 1000000; i++)
     {
         arr ~= i;
     }
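
A possible harness for timing the two loops, offered as a sketch only. It assumes a present-day D2 standard library with `std.datetime.stopwatch.benchmark`; the helper names (`fillByLength`, `fillByAppend`, `viaLength`, `viaAppend`) are invented for this example. Current runtimes may also reuse spare capacity when `.length` is assigned, so the gap may not match what a 2006 compiler shows.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

enum count = 1_000_000;

// Grows the array by assigning to .length, as in the first loop.
int[] fillByLength(int n)
{
    int[] arr;
    foreach (i; 0 .. n)
    {
        arr.length = arr.length + 1;
        arr[i] = i;
    }
    return arr;
}

// Grows the array with the append operator, as in the second loop.
int[] fillByAppend(int n)
{
    int[] arr;
    foreach (i; 0 .. n)
        arr ~= i;
    return arr;
}

void viaLength() { fillByLength(count); }
void viaAppend() { fillByAppend(count); }

void main()
{
    auto times = benchmark!(viaLength, viaAppend)(5);   // 5 runs of each
    writefln("length: %s", times[0]);
    writefln("append: %s", times[1]);
}
```

Both helpers produce the same array, so any difference in the timings comes from how the storage is grown.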

Jun 08 2006

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Dave wrote:
 Setting the array length does just that and nothing more or less. But 
 using the array concatenation operator (~) will preallocate some space.
 
 time this:
 
     int[] arr;
     for(int i = 0; i < 1000000; i++)
     {
         arr.length = arr.length + 1;
         arr[i] = i;
     }
 
 vs this:
 
     int[] arr;
     for(int i = 0; i < 1000000; i++)
     {
         arr ~= i;
     }

So I did.  :)  My test program:

[test program not preserved in this copy of the thread]

And my results, compiling with "-release -O -inline", were:
<Benchmark Index Assign> Baseline 79.090000
<Benchmark Index Assign> 43.830000 & 1.804472 versus baseline
<Benchmark Index Assign> 42.570000 & 1.857881 versus baseline
<Benchmark Index Assign> 42.560000 & 1.858318 versus baseline
<Benchmark Index Assign> 42.410000 & 1.864890 versus baseline
<Benchmark Index Assign> 41.680000 & 1.897553 versus baseline
<Benchmark Index Assign> 41.640000 & 1.899376 versus baseline
<Benchmark Index Assign> 41.580000 & 1.902116 versus baseline
<Benchmark Index Assign> 41.580000 & 1.902116 versus baseline
<Benchmark Index Assign> 41.680000 & 1.897553 versus baseline

<Benchmark Cat Assign> Baseline 0.720000
<Benchmark Cat Assign> 0.600000 & 1.200000 versus baseline
<Benchmark Cat Assign> 0.550000 & 1.309091 versus baseline
<Benchmark Cat Assign> 0.610000 & 1.180328 versus baseline
<Benchmark Cat Assign> 0.600000 & 1.200000 versus baseline
<Benchmark Cat Assign> 0.550000 & 1.309091 versus baseline
<Benchmark Cat Assign> 0.600000 & 1.200000 versus baseline
<Benchmark Cat Assign> 0.610000 & 1.180328 versus baseline
<Benchmark Cat Assign> 0.600000 & 1.200000 versus baseline
<Benchmark Cat Assign> 0.550000 & 1.309091 versus baseline


DMD 0.160, Win32.
That's a rather disturbing disparity, if you ask me.  Now, what I didn't test
but probably should have, was the effect of "pre-allocating" the array by
setting the .length to a large value and then back to zero, expanding the
behind-the-scenes capacity of the array.  I'm betting in that case the
IndexAssign would be the faster.

-- Chris Nicholson-Sauls

Jun 08 2006

"Derek Parnell" <derek psych.ward> writes:

On Fri, 09 Jun 2006 05:33:33 +1000, Chris Nicholson-Sauls  
<ibisbasenji gmail.com> wrote:

 Now, what I didn't test but probably should have, was the effect of  
 "pre-allocating" the array by setting the .length to a large value and  
 then back to zero, expanding the behind-the-scenes capacity of the  
 array.   I'm betting in that case the IndexAssign would be the faster.

Not if you set it back to zero. If you do that, D also deallocates the
RAM. Setting its length back to 1, however, is okay in that the allocated
RAM stays allocated to the array. This means that the first element is
just a dummy to get around the 'bug'.


-- 
Derek Parnell
Melbourne, Australia

Jun 08 2006
