digitalmars.D.learn - why does array appending add a page size instead of doubling ?

timotheecour (24/24) Feb 02 2013 What's the rationale behind array appending adding a page size to

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (13/29) Feb 02 2013 It is not always the case. The page is added only if the next page

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (110/115) Feb 02 2013 I have tested this for D and C++ with the following programs. They count...

Chris Cain (5/16) Feb 02 2013 Okay, I've got a question for you Ali... why is it that sometimes

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (5/21) Feb 02 2013 We must be seeing the effects of how the GC works. Note that each line

monarch_dodra (10/15) Feb 03 2013 Note that dynamic arrays are generic containers, so they aren't

timotheecour (320/329) Feb 03 2013 modified a bit (see below) to include timing etc. I'm on OSX

monarch_dodra (6/18) Feb 03 2013 One more: Did you try using std.container.Array?

timotheecour (87/92) Feb 03 2013 it's a bit better (see below for timings), with T=22.628 sec

monarch_dodra (4/20) Feb 03 2013 Yeah: std.container.Array does not allow references to its
=?UTF-8?B?QWxpIMOHZWhyZWxp?= (9/11) Feb 03 2013 I had heard about this years ago first on comp.lang.c++.moderated. I am

Steven Schveighoffer (9/33) Feb 03 2013 The performance increase is due to a) using malloc instead of GC

monarch_dodra (5/46) Feb 03 2013 True...

Steven Schveighoffer (9/12) Feb 03 2013 in fact, D's appender uses something more than doubling. I did not writ...

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (26/35) Feb 03 2013 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/li...

"timotheecour" <thelastmammoth gmail.com> writes:

What's the rationale behind array appending adding a page size to 
capacity instead of doubling, after the size of the array reaches 
a certain size (cf what std::vector does) ? That seems like an 
example of (evil?) early optimization.

----
T[]a;
int n=some big value;
foreach(i;0..n)
     a~=T.init;
----
Depending on T.sizeof or n, the code above will be linear 
complexity in n (while capacity remains below page size) or 
quadratic (while capacity increments are one page at a time).

The C++ version doesn't have this problem, as doubling is always 
the case (in most implementations at least).
----
std::vector<T>a;
int n=some big value;
for(int i=0;i<n;i++)
     a.push_back(T());
----

Note that we can't always use a.reserve(n) as in more complex 
examples, we don't know final size. And appender is more 
restrictive. Would love some explanations!

Feb 02 2013

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 02/02/2013 06:22 PM, timotheecour wrote:
 What's the rationale behind array appending adding a page size to
 capacity

It is not always the case. The page is added only if the next page 
conveniently free. There is no cost for that "allocation".

 instead of doubling,

Rather, a better growth factor is 150%. It has been shown that 150% 
growth factor works much better with memory allocators.

 after the size of the array reaches a
 certain size (cf what std::vector does) ? That seems like an example of
 (evil?) early optimization.

Wouldn't you prefer simply increasing the allocation size without 
copying any elements? A growth strategy that always doubled would have 
to give up even if the next page has been free.

 ----
 T[]a;
 int n=some big value;
 foreach(i;0..n)
 a~=T.init;
 ----
 Depending on T.sizeof or n, the code above will be linear complexity in
 n (while capacity remains below page size) or quadratic (while capacity
 increments are one page at a time).

Have you actually tested the performance? You should not see linear 
complexity.

 Would love some explanations!

The best explanation is this document:

   http://dlang.org/d-array-article.html

Ali

Feb 02 2013

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 02/02/2013 08:37 PM, Ali Çehreli wrote:
 On 02/02/2013 06:22 PM, timotheecour wrote:
  > What's the rationale behind array appending adding a page size to
  > capacity

 It is not always the case. The page is added only if the next page
 conveniently free. There is no cost for that "allocation".

I have tested this for D and C++ with the following programs. They count 
the number of times the underlying buffer gets reallocated and the 
number of elements that get moved as a result of that.

Here is the C++ program:

#include <iostream>
#include <vector>

using namespace std;

int main()
{
     for (size_t maxElements = 1; maxElements <= 0x4000000; maxElements 
*= 2) {
         vector<int> s;
         size_t totalElementsMoved = 0;
         size_t totalAllocations = 0;

         for (size_t i = 0; i < maxElements; ++i) {
             int *oldPtr = s.empty() ? NULL : &s[0];
             s.push_back(0);

             if (&s[0] != oldPtr) {
                 totalElementsMoved += (s.size() - 1);
                 ++totalAllocations;
             }
         }

         cout << "Elements: " << maxElements
              << " Allocations: " << totalAllocations
              << " Moved: " << totalElementsMoved
              << '\n';
     }
}

It indeed shows amortized constant growth. The total number of elements 
that get moved is always one less than the eventual size of the array 
and the number of times of the allocations is predictable:

Elements: 1 Allocations: 1 Moved: 0
Elements: 2 Allocations: 2 Moved: 1
Elements: 4 Allocations: 3 Moved: 3
Elements: 8 Allocations: 4 Moved: 7
Elements: 16 Allocations: 5 Moved: 15
Elements: 32 Allocations: 6 Moved: 31
Elements: 64 Allocations: 7 Moved: 63
Elements: 128 Allocations: 8 Moved: 127
Elements: 256 Allocations: 9 Moved: 255
Elements: 512 Allocations: 10 Moved: 511
Elements: 1024 Allocations: 11 Moved: 1023
Elements: 2048 Allocations: 12 Moved: 2047
Elements: 4096 Allocations: 13 Moved: 4095
Elements: 8192 Allocations: 14 Moved: 8191
Elements: 16384 Allocations: 15 Moved: 16383
Elements: 32768 Allocations: 16 Moved: 32767
Elements: 65536 Allocations: 17 Moved: 65535
Elements: 131072 Allocations: 18 Moved: 131071
Elements: 262144 Allocations: 19 Moved: 262143
Elements: 524288 Allocations: 20 Moved: 524287
Elements: 1048576 Allocations: 21 Moved: 1048575
Elements: 2097152 Allocations: 22 Moved: 2097151
Elements: 4194304 Allocations: 23 Moved: 4194303
Elements: 8388608 Allocations: 24 Moved: 8388607
Elements: 16777216 Allocations: 25 Moved: 16777215
Elements: 33554432 Allocations: 26 Moved: 33554431
Elements: 67108864 Allocations: 27 Moved: 67108863

Here is the D program:

import std.stdio;

void main()
{
     for (auto maxElements = 1; maxElements <= 0x400_0000; maxElements 
*= 2) {
         int[] s;
         auto totalElementsMoved = 0;
         auto totalAllocations = 0;

         foreach (i; 0 .. maxElements) {
             auto oldPtr = s.ptr;
             s ~= 0;

             if (s.ptr != oldPtr) {
                 totalElementsMoved += (s.length - 1);
                 ++totalAllocations;
             }
         }

         writefln("Elements: %s, Allocations: %s, Moved: %s",
                  maxElements, totalAllocations, totalElementsMoved);
     }
}

Its output is different in that the number of allocations is less than 
C++ but the total number of elements that get moved are more:

Elements: 1, Allocations: 1, Moved: 0
Elements: 2, Allocations: 1, Moved: 0
Elements: 4, Allocations: 2, Moved: 3
Elements: 8, Allocations: 3, Moved: 10
Elements: 16, Allocations: 4, Moved: 25
Elements: 32, Allocations: 5, Moved: 56
Elements: 64, Allocations: 6, Moved: 119
Elements: 128, Allocations: 7, Moved: 246
Elements: 256, Allocations: 8, Moved: 501
Elements: 512, Allocations: 9, Moved: 1012
Elements: 1024, Allocations: 9, Moved: 1012
Elements: 2048, Allocations: 9, Moved: 1012
Elements: 4096, Allocations: 9, Moved: 1012
Elements: 8192, Allocations: 9, Moved: 1012
Elements: 16384, Allocations: 9, Moved: 1012
Elements: 32768, Allocations: 9, Moved: 1012
Elements: 65536, Allocations: 9, Moved: 1012
Elements: 131072, Allocations: 10, Moved: 28655
Elements: 262144, Allocations: 9, Moved: 1012
Elements: 524288, Allocations: 12, Moved: 435173
Elements: 1048576, Allocations: 13, Moved: 1431520
Elements: 2097152, Allocations: 11, Moved: 1993706
Elements: 4194304, Allocations: 13, Moved: 5877728
Elements: 8388608, Allocations: 14, Moved: 14838747
Elements: 16777216, Allocations: 12, Moved: 30438373
Elements: 33554432, Allocations: 12, Moved: 59881445
Elements: 67108864, Allocations: 12, Moved: 109627365

Ali

Feb 02 2013

"Chris Cain" <clcain uncg.edu> writes:

Okay, I've got a question for you Ali... why is it that sometimes 
as Elements increased, there were a _drop_ of allocations?

So, these in particular:

On Sunday, 3 February 2013 at 06:02:04 UTC, Ali Çehreli wrote:
 Elements: 65536, Allocations: 9, Moved: 1012
 Elements: 131072, Allocations: 10, Moved: 28655
 [[ Elements: 262144, Allocations: 9, Moved: 1012 ]]
 Elements: 524288, Allocations: 12, Moved: 435173
 Elements: 1048576, Allocations: 13, Moved: 1431520
 [[ Elements: 2097152, Allocations: 11, Moved: 1993706 ]]
 Elements: 4194304, Allocations: 13, Moved: 5877728
 Elements: 8388608, Allocations: 14, Moved: 14838747
 [[ Elements: 16777216, Allocations: 12, Moved: 30438373 ]]
 Elements: 33554432, Allocations: 12, Moved: 59881445
 Elements: 67108864, Allocations: 12, Moved: 109627365

That's quite an oddity to me, especially looking at the code.

Feb 02 2013

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 02/02/2013 10:48 PM, Chris Cain wrote:
 Okay, I've got a question for you Ali... why is it that sometimes as
 Elements increased, there were a _drop_ of allocations?

 So, these in particular:

 On Sunday, 3 February 2013 at 06:02:04 UTC, Ali Çehreli wrote:
 Elements: 65536, Allocations: 9, Moved: 1012
 Elements: 131072, Allocations: 10, Moved: 28655
 [[ Elements: 262144, Allocations: 9, Moved: 1012 ]]
 Elements: 524288, Allocations: 12, Moved: 435173
 Elements: 1048576, Allocations: 13, Moved: 1431520
 [[ Elements: 2097152, Allocations: 11, Moved: 1993706 ]]
 Elements: 4194304, Allocations: 13, Moved: 5877728
 Elements: 8388608, Allocations: 14, Moved: 14838747
 [[ Elements: 16777216, Allocations: 12, Moved: 30438373 ]]
 Elements: 33554432, Allocations: 12, Moved: 59881445
 Elements: 67108864, Allocations: 12, Moved: 109627365

 That's quite an oddity to me, especially looking at the code.

We must be seeing the effects of how the GC works. Note that each line 
is printed for a different slice 's'. My guess is that the GC runs a 
collection right before the observed drop and the next array gets lucky.

Ali

Feb 02 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Sunday, 3 February 2013 at 06:02:04 UTC, Ali Çehreli wrote:
 Its output is different in that the number of allocations is 
 less than C++ but the total number of elements that get moved 
 are more:

 [SNIP]

 Ali

Note that dynamic arrays are generic containers, so they aren't 
exactly optimized for anything. You could try that test with the 
"made for appending" std.array.appender: It always tries to 
extend without reallocating, and only relocates when the current 
memory segment is 100% full.

Furthermore, when it *does* relocate, it use D-style memmove 
without calling postblit. Independent of language, and with the 
requirement of contiguous memory, I can't really think of a 
scheme that could be better than that...?

Feb 03 2013

"timotheecour" <thelastmammoth gmail.com> writes:

 Note that dynamic arrays are generic containers, so they aren't 
 exactly optimized for anything. You could try that test with 
 the "made for appending" std.array.appender: It always tries to 
 extend without reallocating, and only relocates when the 
 current memory segment is 100% full.

 Furthermore, when it *does* relocate, it use D-style memmove 
 without calling postblit. Independent of language, and with the 
 requirement of contiguous memory, I can't really think of a 
 scheme that could be better than that...?


modified a bit (see below) to include timing etc. I'm on OSX 
(16GB ram, core i7 2.3GHz).
dmd is slow even with optimizations so I used ldc:

ldmd2 -inline -O -release -noboundscheck  (similar results with 
ldc2 -O3 -release)
for C++: g++ -O2

It seems that:
* the number of moves is up to 2x (resp. 1.8x) as the C++ version 
for T[] (resp. appender)
* the C++ version is 7x (resp 1.8x) times faster .
* it seems even the appender version grows super-linearly whereas 
C++ version stays roughly linear in that regime in terms of time.

Based on this, I'm curious whether the current growing scheme 
could be improved, or be closer to the simple doubling scheme 
used in C++.

T[]: 94 s
appender!(T[]): 26.130
std::vector<T> (C++): 14.635s



D version with T[]:
Elements[0]: 1, Allocations: 1, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[1]: 2, Allocations: 1, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[2]: 4, Allocations: 1, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[3]: 8, Allocations: 1, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[4]: 16, Allocations: 2, Moved: 15; Moved/Elements = 
0.9375; time/Elements=0
Elements[5]: 32, Allocations: 3, Moved: 46; Moved/Elements = 
1.4375; time/Elements=0
Elements[6]: 64, Allocations: 4, Moved: 109; Moved/Elements = 
1.70312; time/Elements=0
Elements[7]: 128, Allocations: 5, Moved: 236; Moved/Elements = 
1.84375; time/Elements=0
Elements[8]: 256, Allocations: 6, Moved: 491; Moved/Elements = 
1.91797; time/Elements=0
Elements[9]: 512, Allocations: 7, Moved: 1001; Moved/Elements = 
1.95508; time/Elements=0
Elements[10]: 1024, Allocations: 8, Moved: 2023; Moved/Elements = 
1.97559; time/Elements=0
Elements[11]: 2048, Allocations: 9, Moved: 4069; Moved/Elements = 
1.98682; time/Elements=0
Elements[12]: 4096, Allocations: 9, Moved: 4069; Moved/Elements = 
0.993408; time/Elements=0
Elements[13]: 8192, Allocations: 9, Moved: 4069; Moved/Elements = 
0.496704; time/Elements=0
Elements[14]: 16384, Allocations: 9, Moved: 4069; Moved/Elements 
= 0.248352; time/Elements=0
Elements[15]: 32768, Allocations: 9, Moved: 4069; Moved/Elements 
= 0.124176; time/Elements=0
Elements[16]: 65536, Allocations: 9, Moved: 4069; Moved/Elements 
= 0.062088; time/Elements=1.52588e-05
Elements[17]: 131072, Allocations: 9, Moved: 4069; Moved/Elements 
= 0.031044; time/Elements=1.52588e-05
Elements[18]: 262144, Allocations: 9, Moved: 4069; Moved/Elements 
= 0.015522; time/Elements=2.28882e-05
Elements[19]: 524288, Allocations: 10, Moved: 114644; 
Moved/Elements = 0.218666; time/Elements=2.09808e-05
Elements[20]: 1048576, Allocations: 11, Moved: 745411; 
Moved/Elements = 0.710879; time/Elements=2.19345e-05
Elements[21]: 2097152, Allocations: 12, Moved: 2342834; 
Moved/Elements = 1.11715; time/Elements=2.24113e-05
Elements[22]: 4194304, Allocations: 13, Moved: 7344033; 
Moved/Elements = 1.75095; time/Elements=2.19345e-05
Elements[23]: 8388608, Allocations: 13, Moved: 10469281; 
Moved/Elements = 1.24804; time/Elements=2.18153e-05
Elements[24]: 16777216, Allocations: 13, Moved: 17981345; 
Moved/Elements = 1.07177; time/Elements=2.15173e-05
Elements[25]: 33554432, Allocations: 14, Moved: 59359120; 
Moved/Elements = 1.76904; time/Elements=2.18451e-05
Elements[26]: 67108864, Allocations: 14, Moved: 127885200; 
Moved/Elements = 1.90564; time/Elements=2.17557e-05
Elements[27]: 134217728, Allocations: 15, Moved: 196579199; 
Moved/Elements = 1.46463; time/Elements=2.16886e-05
Elements[28]: 268435456, Allocations: 14, Moved: 408059792; 
Moved/Elements = 1.52014; time/Elements=2.19792e-05
Elements[29]: 536870912, Allocations: 14, Moved: 1094434704; 
Moved/Elements = 2.03854; time/Elements=2.21152e-05
Elements[30]: 1073741824, Allocations: 14, Moved: 1874734992; 
Moved/Elements = 1.74598; time/Elements=2.17576e-05
Elements[31]: 2147483648, Allocations: 13, Moved: 2723954593; 
Moved/Elements = 1.26844; time/Elements=2.20812e-05
./test_010  91.65s user 2.97s system 99% cpu 1:34.79 total

D version with appender:
Elements[0]: 1, Allocations: 1, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[1]: 2, Allocations: 1, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[2]: 4, Allocations: 1, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[3]: 8, Allocations: 1, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[4]: 16, Allocations: 1, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[5]: 32, Allocations: 2, Moved: 16; Moved/Elements = 0.5; 
time/Elements=0
Elements[6]: 64, Allocations: 2, Moved: 16; Moved/Elements = 
0.25; time/Elements=0
Elements[7]: 128, Allocations: 3, Moved: 80; Moved/Elements = 
0.625; time/Elements=0
Elements[8]: 256, Allocations: 3, Moved: 80; Moved/Elements = 
0.3125; time/Elements=0
Elements[9]: 512, Allocations: 4, Moved: 336; Moved/Elements = 
0.65625; time/Elements=0
Elements[10]: 1024, Allocations: 4, Moved: 336; Moved/Elements = 
0.328125; time/Elements=0
Elements[11]: 2048, Allocations: 5, Moved: 1360; Moved/Elements = 
0.664062; time/Elements=0
Elements[12]: 4096, Allocations: 6, Moved: 3408; Moved/Elements = 
0.832031; time/Elements=0
Elements[13]: 8192, Allocations: 6, Moved: 3408; Moved/Elements = 
0.416016; time/Elements=0
Elements[14]: 16384, Allocations: 6, Moved: 3408; Moved/Elements 
= 0.208008; time/Elements=0
Elements[15]: 32768, Allocations: 6, Moved: 3408; Moved/Elements 
= 0.104004; time/Elements=0
Elements[16]: 65536, Allocations: 6, Moved: 3408; Moved/Elements 
= 0.052002; time/Elements=0
Elements[17]: 131072, Allocations: 6, Moved: 3408; Moved/Elements 
= 0.026001; time/Elements=0
Elements[18]: 262144, Allocations: 6, Moved: 3408; Moved/Elements 
= 0.0130005; time/Elements=3.8147e-06
Elements[19]: 524288, Allocations: 7, Moved: 462160; 
Moved/Elements = 0.8815; time/Elements=3.8147e-06
Elements[20]: 1048576, Allocations: 7, Moved: 527696; 
Moved/Elements = 0.50325; time/Elements=4.76837e-06
Elements[21]: 2097152, Allocations: 9, Moved: 3083600; 
Moved/Elements = 1.47038; time/Elements=5.24521e-06
Elements[22]: 4194304, Allocations: 9, Moved: 3304784; 
Moved/Elements = 0.787922; time/Elements=5.00679e-06
Elements[23]: 8388608, Allocations: 9, Moved: 6622544; 
Moved/Elements = 0.789469; time/Elements=5.36442e-06
Elements[24]: 16777216, Allocations: 10, Moved: 14069072; 
Moved/Elements = 0.838582; time/Elements=5.36442e-06
Elements[25]: 33554432, Allocations: 10, Moved: 29244752; 
Moved/Elements = 0.871562; time/Elements=5.42402e-06
Elements[26]: 67108864, Allocations: 10, Moved: 57773392; 
Moved/Elements = 0.860891; time/Elements=5.39422e-06
Elements[27]: 134217728, Allocations: 11, Moved: 241548624; 
Moved/Elements = 1.79968; time/Elements=5.93811e-06
Elements[28]: 268435456, Allocations: 10, Moved: 478076240; 
Moved/Elements = 1.78097; time/Elements=5.68479e-06
Elements[29]: 536870912, Allocations: 9, Moved: 773229904; 
Moved/Elements = 1.44025; time/Elements=5.64754e-06
Elements[30]: 1073741824, Allocations: 9, Moved: 1165659472; 
Moved/Elements = 1.0856; time/Elements=5.46221e-06
Elements[31]: 2147483648, Allocations: 10, Moved: 3299478864; 
Moved/Elements = 1.53644; time/Elements=6.32741e-06
./test_010  22.63s user 3.31s system 99% cpu 26.130 total


C++ version with std::vector:
g++ -O2 test8.cc -o test && time ./test 	
Elements[0] : 1 Allocations: 1 Moved: 0 Moved/Elements: 0 
time/Elements: 0
Elements[1] : 2 Allocations: 2 Moved: 1 Moved/Elements: 0.5 
time/Elements: 0
Elements[2] : 4 Allocations: 3 Moved: 3 Moved/Elements: 0.75 
time/Elements: 0
Elements[3] : 8 Allocations: 4 Moved: 7 Moved/Elements: 0.875 
time/Elements: 0
Elements[4] : 16 Allocations: 5 Moved: 15 Moved/Elements: 0.9375 
time/Elements: 0
Elements[5] : 32 Allocations: 6 Moved: 31 Moved/Elements: 0.96875 
time/Elements: 0
Elements[6] : 64 Allocations: 7 Moved: 63 Moved/Elements: 
0.984375 time/Elements: 0
Elements[7] : 128 Allocations: 8 Moved: 127 Moved/Elements: 
0.992188 time/Elements: 0
Elements[8] : 256 Allocations: 9 Moved: 255 Moved/Elements: 
0.996094 time/Elements: 0
Elements[9] : 512 Allocations: 10 Moved: 511 Moved/Elements: 
0.998047 time/Elements: 0
Elements[10] : 1024 Allocations: 11 Moved: 1023 Moved/Elements: 
0.999023 time/Elements: 0
Elements[11] : 2048 Allocations: 12 Moved: 2047 Moved/Elements: 
0.999512 time/Elements: 0
Elements[12] : 4096 Allocations: 13 Moved: 4095 Moved/Elements: 
0.999756 time/Elements: 0
Elements[13] : 8192 Allocations: 14 Moved: 8191 Moved/Elements: 
0.999878 time/Elements: 0
Elements[14] : 16384 Allocations: 15 Moved: 16383 Moved/Elements: 
0.999939 time/Elements: 0
Elements[15] : 32768 Allocations: 16 Moved: 32767 Moved/Elements: 
0.999969 time/Elements: 0
Elements[16] : 65536 Allocations: 17 Moved: 65535 Moved/Elements: 
0.999985 time/Elements: 0
Elements[17] : 131072 Allocations: 18 Moved: 131071 
Moved/Elements: 0.999992 time/Elements: 0
Elements[18] : 262144 Allocations: 19 Moved: 262143 
Moved/Elements: 0.999996 time/Elements: 3.8147e-06
Elements[19] : 524288 Allocations: 20 Moved: 524287 
Moved/Elements: 0.999998 time/Elements: 3.8147e-06
Elements[20] : 1048576 Allocations: 21 Moved: 1048575 
Moved/Elements: 0.999999 time/Elements: 2.86102e-06
Elements[21] : 2097152 Allocations: 22 Moved: 2097151 
Moved/Elements: 1 time/Elements: 2.86102e-06
Elements[22] : 4194304 Allocations: 23 Moved: 4194303 
Moved/Elements: 1 time/Elements: 2.86102e-06
Elements[23] : 8388608 Allocations: 24 Moved: 8388607 
Moved/Elements: 1 time/Elements: 3.09944e-06
Elements[24] : 16777216 Allocations: 25 Moved: 16777215 
Moved/Elements: 1 time/Elements: 3.03984e-06
Elements[25] : 33554432 Allocations: 26 Moved: 33554431 
Moved/Elements: 1 time/Elements: 3.15905e-06
Elements[26] : 67108864 Allocations: 27 Moved: 67108863 
Moved/Elements: 1 time/Elements: 3.08454e-06
Elements[27] : 134217728 Allocations: 28 Moved: 134217727 
Moved/Elements: 1 time/Elements: 3.09944e-06
Elements[28] : 268435456 Allocations: 29 Moved: 268435455 
Moved/Elements: 1 time/Elements: 3.24473e-06
Elements[29] : 536870912 Allocations: 30 Moved: 536870911 
Moved/Elements: 1 time/Elements: 3.32668e-06
Elements[30] : 1073741824 Allocations: 31 Moved: 1073741823 
Moved/Elements: 1 time/Elements: 3.34531e-06
Elements[31] : 2147483648 Allocations: 32 Moved: 2147483647 
Moved/Elements: 1 time/Elements: 3.36301e-06

real	0m14.635s
user	0m11.665s
sys	0m2.962s



----
import std.stdio;
import std.datetime;

import std.array;


static private StopWatch sw;//TEMP:move?
auto TIC(){
   sw.reset();
   sw.start();
}
auto TOC(){
   sw.stop();
   return sw.peek().msecs;
}

void main(){
   enum N=32;
   size_t maxElements=1;
   alias ubyte T;
    for (size_t iter=0;iter<N;iter++ , maxElements*=2) {
     //if(iter<N-1) continue;


        size_t totalElementsMoved = 0;
        size_t totalAllocations = 0;
        TIC();
        version(none){
        T[] s;
        foreach (i; 0 .. maxElements) {
            auto oldPtr = s.ptr;
            s ~= 0;
            if (s.ptr != oldPtr) {
                totalElementsMoved += (s.length - 1);
                ++totalAllocations;
            }
        }
      }
      else{
        auto s = appender!(T[])();
        foreach (i; 0 .. maxElements) {
            auto oldPtr = s.data.ptr;
            s.put(cast(T)0);
            if (s.data.ptr != oldPtr) {
                totalElementsMoved += (s.data.length - 1);
                ++totalAllocations;
            }
        }
      }
        auto t=TOC();
        writefln("Elements[%s]: %s, Allocations: %s, Moved: %s; 
Moved/Elements = %s; time/Elements=%s",
         iter, maxElements, totalAllocations, totalElementsMoved, 
totalElementsMoved/cast(double)(maxElements), 
t/cast(double)(maxElements));
    }
}
----

----
#include <iostream>
#include <vector>

#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>

using namespace std;

struct timeval start, end;
     long mtime, seconds, useconds;

void tic(){
     gettimeofday(&start, NULL);
   }
long toc(){
     gettimeofday(&end, NULL);

     seconds  = end.tv_sec  - start.tv_sec;
     useconds = end.tv_usec - start.tv_usec;

     mtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;
     return mtime;
}


int main(){
   int N=32;
   size_t maxElements=1;
   typedef unsigned char T;
    for (size_t iter=0;iter<N;iter++ , maxElements*=2) {
        vector<T> s;
        size_t totalElementsMoved = 0;
        size_t totalAllocations = 0;

tic();
        for (size_t i = 0; i < maxElements; ++i) {
            T *oldPtr = s.empty() ? NULL : &s[0];
            s.push_back(0);

            if (&s[0] != oldPtr) {
                totalElementsMoved += (s.size() - 1);
                ++totalAllocations;
            }
        }
long t=toc();

        cout << "Elements[" << iter << "] : " << maxElements
             << " Allocations: " << totalAllocations
             << " Moved: " << totalElementsMoved
             << " Moved/Elements: " << 
totalElementsMoved/static_cast<double>(maxElements)
             << " time/Elements: " << 
t/static_cast<double>(maxElements)
             << '\n';
    }
}
----

Feb 03 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Sunday, 3 February 2013 at 08:53:11 UTC, timotheecour wrote:
 Note that dynamic arrays are generic containers, so they 
 aren't exactly optimized for anything. You could try that test 
 with the "made for appending" std.array.appender: It always 
 tries to extend without reallocating, and only relocates when 
 the current memory segment is 100% full.

 Furthermore, when it *does* relocate, it use D-style memmove 
 without calling postblit. Independent of language, and with 
 the requirement of contiguous memory, I can't really think of 
 a scheme that could be better than that...?


 modified a bit (see below) to include timing etc. I'm on OSX 
 (16GB ram, core i7 2.3GHz).

One more: Did you try using std.container.Array?

Even if appender is more optimized than naked array appending, it 
still has to work with the underlying system. std.container.Array 
should be able to do just as well as vector, or better (D move 
semnantics).

Feb 03 2013

"timotheecour" <thelastmammoth gmail.com> writes:

 One more: Did you try using std.container.Array?

 Even if appender is more optimized than naked array appending, 
 it still has to work with the underlying system. 
 std.container.Array should be able to do just as well as 
 vector, or better (D move semnantics).

it's a bit better (see below for timings), with T=22.628 sec 
total vs 14s for C++'s std::vector.

Note, I'm not sure how to get address of 1st element in an 
std.container.Array (is it even possible??? seems like an 
overreaching limitation for the sake of safety in a system 
language) so I don't know how to tell the reallocations. But I 
guess the scheme is clear (reserve(1+3/2*capacity)). In any case, 
there's still a noticable gap with C++. Also, I'm curious whether 
you have a reference explaining why 3/2 works better than 2x (as 
in std::vector) in resizing?

Elements[0]: 1, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[1]: 2, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[2]: 4, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[3]: 8, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[4]: 16, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[5]: 32, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[6]: 64, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[7]: 128, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[8]: 256, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[9]: 512, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[10]: 1024, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[11]: 2048, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[12]: 4096, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[13]: 8192, Allocations: 0, Moved: 0; Moved/Elements = 0; 
time/Elements=0
Elements[14]: 16384, Allocations: 0, Moved: 0; Moved/Elements = 
0; time/Elements=0
Elements[15]: 32768, Allocations: 0, Moved: 0; Moved/Elements = 
0; time/Elements=0
Elements[16]: 65536, Allocations: 0, Moved: 0; Moved/Elements = 
0; time/Elements=0
Elements[17]: 131072, Allocations: 0, Moved: 0; Moved/Elements = 
0; time/Elements=0
Elements[18]: 262144, Allocations: 0, Moved: 0; Moved/Elements = 
0; time/Elements=3.8147e-06
Elements[19]: 524288, Allocations: 0, Moved: 0; Moved/Elements = 
0; time/Elements=5.72205e-06
Elements[20]: 1048576, Allocations: 0, Moved: 0; Moved/Elements = 
0; time/Elements=4.76837e-06
Elements[21]: 2097152, Allocations: 0, Moved: 0; Moved/Elements = 
0; time/Elements=4.76837e-06
Elements[22]: 4194304, Allocations: 0, Moved: 0; Moved/Elements = 
0; time/Elements=5.00679e-06
Elements[23]: 8388608, Allocations: 0, Moved: 0; Moved/Elements = 
0; time/Elements=4.88758e-06
Elements[24]: 16777216, Allocations: 0, Moved: 0; Moved/Elements 
= 0; time/Elements=4.88758e-06
Elements[25]: 33554432, Allocations: 0, Moved: 0; Moved/Elements 
= 0; time/Elements=5.00679e-06
Elements[26]: 67108864, Allocations: 0, Moved: 0; Moved/Elements 
= 0; time/Elements=4.97699e-06
Elements[27]: 134217728, Allocations: 0, Moved: 0; Moved/Elements 
= 0; time/Elements=4.92483e-06
Elements[28]: 268435456, Allocations: 0, Moved: 0; Moved/Elements 
= 0; time/Elements=4.97699e-06
Elements[29]: 536870912, Allocations: 0, Moved: 0; Moved/Elements 
= 0; time/Elements=4.93042e-06
Elements[30]: 1073741824, Allocations: 0, Moved: 0; 
Moved/Elements = 0; time/Elements=4.94439e-06
Elements[31]: 2147483648, Allocations: 0, Moved: 0; 
Moved/Elements = 0; time/Elements=5.16605e-06
./test_010  20.71s user 1.89s system 99% cpu 22.628 total


----
			auto s=Array!T();
			foreach (i; 0 .. maxElements) {
				auto oldPtr = &(s.front); //NOTE: incorrect, doesn't take 
address of returned element but of the function...
				s.insertBack(cast(T)0);
				if (&(s.front) != oldPtr) {
					totalElementsMoved += (s.length - 1);
					++totalAllocations;
				}
			}
----

Feb 03 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Sunday, 3 February 2013 at 10:32:10 UTC, timotheecour wrote:
 One more: Did you try using std.container.Array?

 Even if appender is more optimized than naked array appending, 
 it still has to work with the underlying system. 
 std.container.Array should be able to do just as well as 
 vector, or better (D move semnantics).

 it's a bit better (see below for timings), with T=22.628 sec 
 total vs 14s for C++'s std::vector.

 Note, I'm not sure how to get address of 1st element in an 
 std.container.Array (is it even possible??? seems like an 
 overreaching limitation for the sake of safety in a system 
 language) so I don't know how to tell the reallocations. But I 
 guess the scheme is clear (reserve(1+3/2*capacity)). In any 
 case, there's still a noticable gap with C++. Also, I'm curious 
 whether you have a reference explaining why 3/2 works better 
 than 2x (as in std::vector) in resizing?

Yeah: std.container.Array does not allow references to its 
internals. D's design is more of a safety first, compared to 
C++'s performance first.

Feb 03 2013

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 02/03/2013 02:32 AM, timotheecour wrote:

 I'm curious whether you have a reference explaining why
 3/2 works better than 2x (as in std::vector) in resizing?

I had heard about this years ago first on comp.lang.c++.moderated. I am 
pretty sure it is common practice in modern libraries.

This may be the thread:

 
https://groups.google.com/forum/?fromgroups=#!topic/comp.lang.c++.moderated/asH_VojWKJw/discussion[1-25-false]

Scott Meyers refers to an older thread there. So it may have been that 
one. But it doesn't matter, there are articles online now. :)

Ali

Feb 03 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 03 Feb 2013 03:53:10 -0500, timotheecour  
<thelastmammoth gmail.com> wrote:

 Note that dynamic arrays are generic containers, so they aren't exactly  
 optimized for anything. You could try that test with the "made for  
 appending" std.array.appender: It always tries to extend without  
 reallocating, and only relocates when the current memory segment is  
 100% full.

 Furthermore, when it *does* relocate, it use D-style memmove without  
 calling postblit. Independent of language, and with the requirement of  
 contiguous memory, I can't really think of a scheme that could be  
 better than that...?


 modified a bit (see below) to include timing etc. I'm on OSX (16GB ram,  
 core i7 2.3GHz).
 dmd is slow even with optimizations so I used ldc:

 ldmd2 -inline -O -release -noboundscheck  (similar results with ldc2 -O3  
 -release)
 for C++: g++ -O2

 It seems that:
 * the number of moves is up to 2x (resp. 1.8x) as the C++ version for  
 T[] (resp. appender)
 * the C++ version is 7x (resp 1.8x) times faster .
 * it seems even the appender version grows super-linearly whereas C++  
 version stays roughly linear in that regime in terms of time.

 Based on this, I'm curious whether the current growing scheme could be  
 improved, or be closer to the simple doubling scheme used in C++.

The performance increase is due to a) using malloc instead of GC  
allocation, which would run quite a few collection cycles while  
allocating.  If you profile the D code, you will see this.  b) vector is  
FREEING its previously used array space, D array appending is relying on  
the GC to free that space.  The consumed D space will grow more rapidly.

You are essentially comparing apples to oranges here.

-Steve

Feb 03 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Sunday, 3 February 2013 at 11:47:21 UTC, Steven Schveighoffer 
wrote:
 On Sun, 03 Feb 2013 03:53:10 -0500, timotheecour 
 <thelastmammoth gmail.com> wrote:

 Note that dynamic arrays are generic containers, so they 
 aren't exactly optimized for anything. You could try that 
 test with the "made for appending" std.array.appender: It 
 always tries to extend without reallocating, and only 
 relocates when the current memory segment is 100% full.

 Furthermore, when it *does* relocate, it use D-style memmove 
 without calling postblit. Independent of language, and with 
 the requirement of contiguous memory, I can't really think of 
 a scheme that could be better than that...?


 modified a bit (see below) to include timing etc. I'm on OSX 
 (16GB ram, core i7 2.3GHz).
 dmd is slow even with optimizations so I used ldc:

 ldmd2 -inline -O -release -noboundscheck  (similar results 
 with ldc2 -O3 -release)
 for C++: g++ -O2

 It seems that:
 * the number of moves is up to 2x (resp. 1.8x) as the C++ 
 version for T[] (resp. appender)
 * the C++ version is 7x (resp 1.8x) times faster .
 * it seems even the appender version grows super-linearly 
 whereas C++ version stays roughly linear in that regime in 
 terms of time.

 Based on this, I'm curious whether the current growing scheme 
 could be improved, or be closer to the simple doubling scheme 
 used in C++.

 The performance increase is due to a) using malloc instead of 
 GC allocation, which would run quite a few collection cycles 
 while allocating.  If you profile the D code, you will see 
 this.  b) vector is FREEING its previously used array space, D 
 array appending is relying on the GC to free that space.  The 
 consumed D space will grow more rapidly.

 You are essentially comparing apples to oranges here.

 -Steve

True...

But not true for std.container.Array though. Array uses malloc. 
That's an apples to apples comparison right there.

Feb 03 2013

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 02 Feb 2013 23:37:10 -0500, Ali =C3=87ehreli <acehreli yahoo.com=
 wrote:

 Rather, a better growth factor is 150%. It has been shown that 150%  =

 growth factor works much better with memory allocators.

in fact, D's appender uses something more than doubling.  I did not writ=
e  =

that part (though I did massage it for optimizations), so I don't know t=
he  =

rationale behind it or how it is designed to work, but the code is here:=


https://github.com/D-Programming-Language/druntime/blob/master/src/rt/li=
fetime.d#L1651

-Steve

Feb 03 2013

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 02/03/2013 03:57 AM, Steven Schveighoffer wrote:
 On Sat, 02 Feb 2013 23:37:10 -0500, Ali Çehreli <acehreli yahoo.com> 

wrote:
 Rather, a better growth factor is 150%. It has been shown that 150%
 growth factor works much better with memory allocators.

 in fact, D's appender uses something more than doubling. I did not write
 that part (though I did massage it for optimizations), so I don't know
 the rationale behind it or how it is designed to work, but the code is
 here:

 

https://github.com/D-Programming-Language/druntime/blob/master/src
rt/lifetime.d#L1651 

 -Steve

If the comments are current, the implementer is Dave Fladebo. That 
algorithm considers arrays differently by how large they are:

/*
* Better version by Dave Fladebo:
* This uses an inverse logorithmic algorithm to pre-allocate a bit more
* space for larger arrays.
* - Arrays smaller than PAGESIZE bytes are left as-is, so for the most
* common cases, memory allocation is 1 to 1. The small overhead added
* doesn't affect small array perf. (it's virtually the same as
* current).
* - Larger arrays have some space pre-allocated.
* - As the arrays grow, the relative pre-allocated space shrinks.
* - The logorithmic algorithm allocates relatively more space for
* mid-size arrays, making it very fast for medium arrays (for
* mid-to-large arrays, this turns out to be quite a bit faster than the
* equivalent realloc() code in C, on Linux at least. Small arrays are
* just as fast as GCC).
* - Perhaps most importantly, overall memory usage and stress on the GC
* is decreased significantly for demanding environments.
*/

The comments include some test results as well and where the diminishing 
return is.

Ali

Feb 03 2013

D Programming

C/C++ Programming

Other

digitalmars.D.learn - why does array appending add a page size instead of doubling ?