## digitalmars.D - log(n) amortized growth for containers

• Andrei Alexandrescu (42/42) Dec 10 2015 Many thanks to Jason Causey and Jakob Ovrum for their work on heaps and
• Andrei Alexandrescu (5/8) Dec 10 2015 I was wrong here, that's the number of total moves. The growth schedule
• Timon Gehr (3/15) Dec 11 2015 E.g. the following sequence of capacities leads to amortized Θ(log n).
• deadalnix (5/5) Dec 11 2015 I've been using the same mechanism as jemalloc in SDC's runtime
• Jimmy Cao (6/9) Dec 13 2015 Yes, my go-to for my university plotting needs is SageMath.
• Jimmy Cao (13/26) Dec 13 2015 I like this approach to amortized analysis. I have a question,
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```Many thanks to Jason Causey and Jakob Ovrum for their work on heaps and
Randomized Slide to Front.

Here's a quick thought on growth schedule for arrays. The classic
approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
16, etc. That way, n memory moves are necessary for achieving a capacity
equal to n, so on average we spend a constant amount of time per
appended element.

The main disadvantage with this growth schedule is there are on average
n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
memory allocator (long discussion) so a smaller one (such as the golden
cut or less) would be better.

Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
latter is considered scalable all right and the hope is to get a
corresponding reduction in the slack memory space.

To figure a growth schedule that leads to log(n) work per element, a
differential equation must be solved:

s(n) / s'(n) = log(x)

where s(n) will be the (continuous extension of the) growth schedule's
sum, i.e. s(n) = f(1) + f(2) + ... + f(n). In other words, s(n) is how
many elements we moved around, and s'(n) is the capacity after n steps.
The division gives us how much work was done per element.

To verify, if s(n) = n, you get O(n) work per element, i.e. quadratic
behavior for linear capacity growth. If s(n) = 2^n, you get O(1) work
per element.

So I entered the equation above in Wolfram:

https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29

So the growth schedule is the exponential of LogIntegral(n). Let's write
it as:

f(n) = 2^^li(n)

Not too nice, but easy to tabulate for the relevant ranges. Let's take a
look at the slack space at capacity f(n):

slack(f(n)) = f(n) - f(n - 1)

Sadly the expression doesn't seem to simplify and I couldn't find a good
online plotter that can plot 2^^li(x). Here's what Mathematica free shows:

https://www.wolframalpha.com/input/?i=2%5Eli%28x%29+-+2%5Eli%28x+-+1%29

Compare with:

https://www.wolframalpha.com/input/?i=2%5Ex+-+2%5E%28x+-+1%29

The slack elements seem to grow considerably slower, which is great.

Does anyone know of a plotter that supports logarithmic integral? Of
course better ideas about all of the above are welcome!

Thanks,

Andrei
```
Dec 10 2015
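The 2^^li(n) curve above is easy to tabulate numerically. Below is a rough sketch (the midpoint-rule `li` and the helper name `f` are illustrative, not from any library); the relative slack per step shrinks roughly like ln 2 / ln n, consistent with the plots linked in the post:

```python
import math

def li(x, steps=10_000):
    # Offset logarithmic integral: integral of dt/ln(t) from 2 to x,
    # approximated with the midpoint rule (fine for tabulation).
    if x <= 2:
        return 0.0
    h = (x - 2) / steps
    return sum(h / math.log(2 + (i + 0.5) * h) for i in range(steps))

def f(n):
    # The 2^^li(n) curve from the post.
    return 2.0 ** li(n)

# slack(f(n)) = f(n) - f(n-1); the relative slack shrinks as n grows.
for n in (10, 100, 1000):
    print(n, f(n), (f(n) - f(n - 1)) / f(n))
```

Compare with pure doubling, where the relative slack is a constant 1/2.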
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 12/10/15 10:55 PM, Andrei Alexandrescu wrote:
So the growth schedule is the exponential of LogIntegral(n). Let's write
it as:

f(n) = 2^^li(n)

I was wrong here, that's the number of total moves. The growth schedule
is proportional to the derivative of that:

f(n) = 2^^li(n)/log(n)

Andrei
```
Dec 10 2015
Timon Gehr <timon.gehr gmx.ch> writes:
```On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
Here's a quick thought on growth schedule for arrays. The classic
approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
16, etc. That way, n memory moves are necessary for achieving a capacity
equal to n, so on average we spend a constant amount of time per
appended element.

The main disadvantage with this growth schedule is there are on average
n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
memory allocator (long discussion) so a smaller one (such as the golden
cut or less) would be better.

Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
latter is considered scalable all right and the hope is to get a
corresponding reduction in the slack memory space. ...

E.g. the following sequence of capacities leads to amortized Θ(log n).

c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.
```
Dec 11 2015
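Timon's schedule can be checked empirically. A small sketch (assuming integer division in the c(k)/⌊log₂(c(k)+2)⌋ term, so capacities stay whole numbers): summing the copies done at each regrow and dividing by the final capacity should land near log₂ n.

```python
import math

def capacities(limit):
    # c(0) = 0, c(k+1) = c(k) + c(k)//floor(log2(c(k)+2)) + 1
    c, caps = 0, [0]
    while c < limit:
        c += c // int(math.log2(c + 2)) + 1
        caps.append(c)
    return caps

caps = capacities(10**6)
n = caps[-1]
total_moves = sum(caps[:-1])  # each regrow copies the previous contents
print(n, total_moves / n, math.log2(n))
```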
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 12/11/15 9:02 PM, Timon Gehr wrote:
On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
Here's a quick thought on growth schedule for arrays. The classic
approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
16, etc. That way, n memory moves are necessary for achieving a capacity
equal to n, so on average we spend a constant amount of time per
appended element.

The main disadvantage with this growth schedule is there are on average
n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
memory allocator (long discussion) so a smaller one (such as the golden
cut or less) would be better.

Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
latter is considered scalable all right and the hope is to get a
corresponding reduction in the slack memory space. ...

E.g. the following sequence of capacities leads to amortized Θ(log n).

c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.

How do you prove that? -- Andrei
```
Dec 11 2015
Marco Nembrini <marco.nembrini.co gmail.com> writes:
```On Saturday, 12 December 2015 at 02:12:13 UTC, Andrei
Alexandrescu wrote:
On 12/11/15 9:02 PM, Timon Gehr wrote:
On 12/11/2015 04:55 AM, Andrei Alexandrescu wrote:
Here's a quick thought on growth schedule for arrays. The classic
approach is to grow by a geometric schedule. For example: 1, 2, 4, 8,
16, etc. That way, n memory moves are necessary for achieving a capacity
equal to n, so on average we spend a constant amount of time per
appended element.

The main disadvantage with this growth schedule is there are on average
n/2 slack memory slots. Also, a growth factor of 2 is unfriendly to the
memory allocator (long discussion) so a smaller one (such as the golden
cut or less) would be better.

Anyhow, I was thinking why O(1) amortized growth and not O(log n)? The
latter is considered scalable all right and the hope is to get a
corresponding reduction in the slack memory space. ...

E.g. the following sequence of capacities leads to amortized Θ(log n).

c(0) = 0, c(k+1) = c(k)+c(k)/⌊log₂(c(k)+2)⌋+1.

How do you prove that? -- Andrei

Forget about going to continuous extensions, and substitute the
derivative of s(n) with (s(n+1) - s(n)) / 1.
Plugging into s(n) / s'(n) = log(n) and solving for s(n+1) gives
you a recursion for s:
s(n+1) = s(n) + s(n) / log(n).
Then you decide what you want for s(0) and fix small things like
log(0) not being defined or s(n) / log(n) not always being a
whole number.

You'll notice that Timon has log(c(k)) and not log(k) in his
expression, which I think comes from some confusion in your
original post between the number of elements n (which is c(k)?)
and the number of times you have to move elements (the k used by
Timon).
```
Dec 12 2015
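Marco's recursion links back to the closed form: ln s(n) is a sum of ln(1 + 1/log₂ k) terms, which is approximately the sum of 1/log₂ k, a discrete stand-in for li(n). A quick numerical sketch of that correspondence (working in log space, since s itself overflows a double; not a proof, just a sanity check):

```python
import math

# Iterate s(n+1) = s(n) + s(n)/log2(n) from s(2) = 1, which sidesteps
# log(0) and log(1). Track ln(s) directly instead of s.
log_s = 0.0
li_sum = 0.0  # partial sums of 1/log2(k), a discrete analogue of li(n)
for k in range(2, 100_000):
    log_s += math.log1p(1 / math.log2(k))
    li_sum += 1 / math.log2(k)

# ln(1 + x) < x, so the ratio sits just below 1.
print(log_s / li_sum)
```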
deadalnix writes:
```I've been using the same mechanism as jemalloc in SDC's runtime,
and it buckets basically by keeping 2 bits of precision + shift.
It goes as:

4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 32, 40, 48, 56, 64, ...

It works quite well in practice.
```
Dec 11 2015
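The sequence deadalnix lists falls out of the "2 bits of precision + shift" rule directly: each class has the form (4 + m) << e with m in 0..3, so consecutive classes differ by one unit in the last kept bit and worst-case internal slack stays below 25%. A sketch of the scheme as described (not SDC's actual code):

```python
def size_classes(limit):
    # Sizes of the form (4 + m) << e, m in 0..3: four classes per
    # doubling, spaced 1 << e apart, so slack per class is < 25%.
    out, e = [], 0
    while True:
        for m in range(4):
            s = (4 + m) << e
            if s > limit:
                return out
            out.append(s)
        e += 1

print(size_classes(64))
# [4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 32, 40, 48, 56, 64]
```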
Jimmy Cao <jc2462 cornell.edu> writes:
```On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu
wrote:
Does anyone know of a plotter that supports logarithmic
integral? Of course better ideas about all of the above are
welcome!

Yes, my go-to for my university plotting needs is SageMath.

Here is the plot of the two functions, the red one is without the
li(x).  The vertical axis scale is logarithmic.

https://sagecell.sagemath.org/?z=eJyVjk0KgzAQhfeCdxjcmNAI1mXBK_QIlWAmaWhMZEzB9vRNBDdFCt3NPL73o1CDZiu_QFkAEMYneVgFdLcrc8EM1kc0JF1CODQHcnPmvCzKQqUgk4O-c9bNtoGZm6GHZZQxIg2zC5FNcmZaAElvkHXi3LacC9ByxDG4QH3lrLlHQ4i-EoDK7PouTZIeSIt9Y9-1-9vXSy1ykcN04mTT6lfNc__pYID5NYBQ_V_zAe2Da2c=&lang=sage
```
Dec 13 2015
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
```On 12/13/15 2:56 PM, Jimmy Cao wrote:
On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu wrote:
Does anyone know of a plotter that supports logarithmic integral? Of
course better ideas about all of the above are welcome!

Yes, my go-to for my university plotting needs is SageMath.

Here is the plot of the two functions, the red one is without the
li(x).  The vertical axis scale is logarithmic.

https://sagecell.sagemath.org/?z=eJyVjk0KgzAQhfeCdxjcmNAI1mXBK_QIlWAmaWhMZEzB9vRNBDdFCt3NPL73o1CDZiu_QFkAEMYneVgFdLcrc8EM1kc0JF1CODQHcnPmvCzKQqUgk4O-c9bNtoGZm6GHZZQxIg2zC5FNcmZaAElvkHXi3LacC9ByxDG4QH3lrLlHQ4i-EoDK7PouTZIeSIt9Y9-1-9vXSy1ykcN04mTT6lfNc__pYID5NYBQ_V_zAe2Da2c=&lang=sage

Thanks, great!

Now, this is going to be difficult to evaluate in the real world.
Microbenchmarks are unlikely to be useful - most just grow a vector up
the wazoo, which works faster with a more aggressive growth schedule. A
better test is in a real application where several arrays compete for
data cache.

Andrei
```
Dec 13 2015
Jimmy Cao <jc2462 cornell.edu> writes:
```On Friday, 11 December 2015 at 03:55:27 UTC, Andrei Alexandrescu
wrote:
To figure a growth schedule that leads to log(n) work per
element, a differential equation must be solved:

s(n) / s'(n) = log(x)

where s(n) will be the (continuous extension of the) growth
schedule's sum, i.e. s(n) = f(1) + f(2) + ... + f(n). In other
words, s(n) is how many elements we moved around, and s'(n) is
the capacity after n steps. The division gives us how much work
was done per element.

To verify, if s(n) = n, you get O(n) work per element, i.e.
quadratic behavior for linear capacity growth. If s(n) = 2^n,
you get O(1) work per element.

So I entered the equation above in Wolfram:

https://www.wolframalpha.com/input/?i=y%28x%29+%2F+y%27%28x%29+%3D+log%28x%29

I like this approach to amortized analysis.  I have a question,
though.

The diff eq you have is

s(n) / s'(n) = log(x)

But the WolframAlpha query solves for effectively:

s(n) / s'(n) = log(n)

Why does x = n?

Shouldn't x be s'(n), so that s(n)/s'(n), the work done per
element, is bounded in terms of the number of elements appended
(which is about the capacity s'(n))?  I thought the number of
elements != n, but rather something between s'(n-1) and s'(n)?
```
Dec 13 2015