digitalmars.D.learn - Impressed with Appender - Is there design/implementation description?

Jon Degenhardt (11/11) Dec 04 2016 I've been using Appender quite a bit recently, typically when I

thedeemon (15/27) Dec 06 2016 It's rather simple, just take a look at its source code in

Anonymouse (6/15) Dec 06 2016 I always assumed it kept its own manually allocated array on a

Jonathan M Davis via Digitalmars-d-learn (36/51) Dec 06 2016 No. The main thing that Appender does is reduce the number of checks

Jon Degenhardt (6/26) Dec 06 2016 Thanks everyone for the explanations. I should probably look into

Jon Degenhardt <jond noreply.com> writes:

I've been using Appender quite a bit recently, typically when I 
need append-only arrays with variable and unknown final sizes. I 
had been prepared to write a custom data structure when I started 
using it with large amounts of data, but very nicely this has not 
surfaced as a need. Appender has held up quite well.

I haven't actually benchmarked it against competing data 
structures, nor have I studied the implementation. I'd be very 
interested in understanding the design and how it compares to 
other data structures. Are there any write-ups or articles 
describing it?

--Jon

Dec 04 2016

thedeemon <dlang thedeemon.com> writes:

On Sunday, 4 December 2016 at 20:03:37 UTC, Jon Degenhardt wrote:
 I've been using Appender quite a bit recently, typically when I 
 need append-only arrays with variable and unknown final sizes. 
 I had been prepared to write a custom data structure when I 
 started using it with large amounts of data, but very nicely 
 this has not surfaced as a need. Appender has held up quite 
 well.

 I haven't actually benchmarked it against competing data 
 structures, nor have I studied the implementation. I'd be very 
 interested in understanding the design and how it compares to 
 other data structures. Are there any write-ups or articles 
 describing it?

 --Jon

It's rather simple, just take a look at its source code in 
std.array.
It's an ordinary linear array partially filled with your data. 
When your data fills it up, it gets resized to make more space. 
Just two interesting points:

1. When it needs to grow it uses a formula like k = 1 + 10 / 
log2(newsize) for the growth factor (but with a limit of k <= 2), 
which means up to 16 KB it doubles the size each time, then tries 
to grow by a factor of 2/3, then by lower and lower factors.

2. Up until 4 KB it reallocates when growing, but after 4 KB the 
array lives in a larger pool of memory where it can often grow a 
lot without reallocating, so in many scenarios where other 
allocations do not interfere, the data array of appender grows in 
place without copying any data, thanks to GC.extend() method.

Dec 06 2016

Anonymouse <asdf asdf.net> writes:

On Tuesday, 6 December 2016 at 10:52:44 UTC, thedeemon wrote:
 It's rather simple, just take a look at its source code in 
 std.array.
 It's an ordinary linear array partially filled with your data.

[...]

 2. Up until 4 KB it reallocates when growing, but after 4 KB 
 the array lives in a larger pool of memory where it can often 
 grow a lot without reallocating, so in many scenarios where 
 other allocations do not interfere, the data array of appender 
 grows in place without copying any data, thanks to GC.extend() 
 method.

I always assumed it kept its own manually allocated array on a 
malloc heap :O

Hence a practice of using .dup and .idup on the .data member when 
you're happy with the result?

Dec 06 2016

Jonathan M Davis via Digitalmars-d-learn writes:

On Tuesday, December 06, 2016 13:19:22 Anonymouse via Digitalmars-d-learn 
wrote:
 On Tuesday, 6 December 2016 at 10:52:44 UTC, thedeemon wrote:
 It's rather simple, just take a look at its source code in
 std.array.
 It's an ordinary linear array partially filled with your data.

 [...]

 2. Up until 4 KB it reallocates when growing, but after 4 KB
 the array lives in a larger pool of memory where it can often
 grow a lot without reallocating, so in many scenarios where
 other allocations do not interfere, the data array of appender
 grows in place without copying any data, thanks to GC.extend()
 method.

 I always assumed it kept its own manually allocated array on a
 malloc heap :O

No. The main thing that Appender does is reduce the number of checks
required for whether there's room for the array to append in place, because
that check is a good chunk of why ~= is expensive for arrays. And using ~=
isn't all that bad. It's just that it adds up, and when you're clearly going
to be doing a lot of appending when you first create a dynamic array, it's
just more efficient to use Appender and avoid the extra overhead that ~=
has. What Appender does with put is actually pretty close to what druntime
does with ~=. It's just that Appender can just look at the member variable
to know the capacity of the dynamic array, whereas ~= has to look it up by
looking at the data for the memory block that the dynamic array currently
points to.

All Appender really is is a way to make appending to dynamic arrays more
efficient. It doesn't do anything magical with data structures. At present,
this is what it's member variables look like

    private alias T = ElementEncodingType!A;
    private struct Data
    {
        size_t capacity;
        Unqual!T[] arr;
        bool canExtend = false;
    }

    private Data* _data;

The magic is in that extra capacity variable, but otherwise, you're
basically dealing with the same array appending you get out of ~=.

 Hence a practice of using .dup and .idup on the .data member when
 you're happy with the result?

That isn't necessary. I don't think that I ever do it. If you want to keep
appending with Appender, and you want a dynamic array that has its own
memory to grow into without being affected by Appender, then it would make
sense to use dup/idup, but that's true any time that you append to a dynamic
array. And you might want dup/idup to get a dynamic array with a different
type of mutability than the dynamic array in the Appender just like you
might want with a naked dynamic array, but since you define the mutability
with Appender, you'd usually just change the array type that you
instantiated Appender with.

- Jonathan M Davis

Dec 06 2016

Jon Degenhardt <jond noreply.com> writes:

On Tuesday, 6 December 2016 at 15:29:59 UTC, Jonathan M Davis 
wrote:
 On Tuesday, December 06, 2016 13:19:22 Anonymouse via 
 Digitalmars-d-learn wrote:
 On Tuesday, 6 December 2016 at 10:52:44 UTC, thedeemon wrote:

 [...]

 2. Up until 4 KB it reallocates when growing, but after 4 KB 
 the array lives in a larger pool of memory where it can 
 often grow a lot without reallocating, so in many scenarios 
 where other allocations do not interfere, the data array of 
 appender grows in place without copying any data, thanks to 
 GC.extend() method.

 I always assumed it kept its own manually allocated array on a 
 malloc heap :O

 No. The main thing that Appender does is reduce the number of 
 checks required for whether there's room for the array to 
 append in place, because that check is a good chunk of why ~= 
 is expensive for arrays.
 [...]

Thanks everyone for the explanations. I should probably look into 
my data and see how often I'm reaching the 4kb size triggering 
GC.extend() use.

--Jon

Dec 06 2016

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Impressed with Appender - Is there design/implementation description?