www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Impressed with Appender - Is there design/implementation description?

reply Jon Degenhardt <jond noreply.com> writes:
I've been using Appender quite a bit recently, typically when I 
need append-only arrays with variable and unknown final sizes. I 
had been prepared to write a custom data structure when I started 
using it with large amounts of data, but very nicely this has not 
surfaced as a need. Appender has held up quite well.

I haven't actually benchmarked it against competing data 
structures, nor have I studied the implementation. I'd be very 
interested in understanding the design and how it compares to 
other data structures. Are there any write-ups or articles 
describing it?

--Jon
Dec 04 2016
parent reply thedeemon <dlang thedeemon.com> writes:
On Sunday, 4 December 2016 at 20:03:37 UTC, Jon Degenhardt wrote:
 I've been using Appender quite a bit recently, typically when I 
 need append-only arrays with variable and unknown final sizes. 
 I had been prepared to write a custom data structure when I 
 started using it with large amounts of data, but very nicely 
 this has not surfaced as a need. Appender has held up quite 
 well.

 I haven't actually benchmarked it against competing data 
 structures, nor have I studied the implementation. I'd be very 
 interested in understanding the design and how it compares to 
 other data structures. Are there any write-ups or articles 
 describing it?

 --Jon
It's rather simple, just take a look at its source code in std.array. It's an ordinary linear array partially filled with your data. When your data fills it up, it gets resized to make more space. Just two interesting points: 1. When it needs to grow it uses a formula like k = 1 + 10 / log2(newsize) for the growth factor (but with a limit of k <= 2), which means up to 16 KB it doubles the size each time, then tries to grow by a factor of 2/3, then by lower and lower factors. 2. Up until 4 KB it reallocates when growing, but after 4 KB the array lives in a larger pool of memory where it can often grow a lot without reallocating, so in many scenarios where other allocations do not interfere, the data array of appender grows in place without copying any data, thanks to GC.extend() method.
Dec 06 2016
parent reply Anonymouse <asdf asdf.net> writes:
On Tuesday, 6 December 2016 at 10:52:44 UTC, thedeemon wrote:
 It's rather simple, just take a look at its source code in 
 std.array.
 It's an ordinary linear array partially filled with your data.
[...]
 2. Up until 4 KB it reallocates when growing, but after 4 KB 
 the array lives in a larger pool of memory where it can often 
 grow a lot without reallocating, so in many scenarios where 
 other allocations do not interfere, the data array of appender 
 grows in place without copying any data, thanks to GC.extend() 
 method.
I always assumed it kept its own manually allocated array on a malloc heap :O Hence a practice of using .dup and .idup on the .data member when you're happy with the result?
Dec 06 2016
parent reply Jonathan M Davis via Digitalmars-d-learn writes:
On Tuesday, December 06, 2016 13:19:22 Anonymouse via Digitalmars-d-learn 
wrote:
 On Tuesday, 6 December 2016 at 10:52:44 UTC, thedeemon wrote:
 It's rather simple, just take a look at its source code in
 std.array.
 It's an ordinary linear array partially filled with your data.
[...]
 2. Up until 4 KB it reallocates when growing, but after 4 KB
 the array lives in a larger pool of memory where it can often
 grow a lot without reallocating, so in many scenarios where
 other allocations do not interfere, the data array of appender
 grows in place without copying any data, thanks to GC.extend()
 method.
I always assumed it kept its own manually allocated array on a malloc heap :O
No. The main thing that Appender does is reduce the number of checks required for whether there's room for the array to append in place, because that check is a good chunk of why ~= is expensive for arrays. And using ~= isn't all that bad. It's just that it adds up, and when you're clearly going to be doing a lot of appending when you first create a dynamic array, it's just more efficient to use Appender and avoid the extra overhead that ~= has. What Appender does with put is actually pretty close to what druntime does with ~=. It's just that Appender can just look at the member variable to know the capacity of the dynamic array, whereas ~= has to look it up by looking at the data for the memory block that the dynamic array currently points to. All Appender really is is a way to make appending to dynamic arrays more efficient. It doesn't do anything magical with data structures. At present, this is what it's member variables look like private alias T = ElementEncodingType!A; private struct Data { size_t capacity; Unqual!T[] arr; bool canExtend = false; } private Data* _data; The magic is in that extra capacity variable, but otherwise, you're basically dealing with the same array appending you get out of ~=.
 Hence a practice of using .dup and .idup on the .data member when
 you're happy with the result?
That isn't necessary. I don't think that I ever do it. If you want to keep appending with Appender, and you want a dynamic array that has its own memory to grow into without being affected by Appender, then it would make sense to use dup/idup, but that's true any time that you append to a dynamic array. And you might want dup/idup to get a dynamic array with a different type of mutability than the dynamic array in the Appender just like you might want with a naked dynamic array, but since you define the mutability with Appender, you'd usually just change the array type that you instantiated Appender with. - Jonathan M Davis
Dec 06 2016
parent Jon Degenhardt <jond noreply.com> writes:
On Tuesday, 6 December 2016 at 15:29:59 UTC, Jonathan M Davis 
wrote:
 On Tuesday, December 06, 2016 13:19:22 Anonymouse via 
 Digitalmars-d-learn wrote:
 On Tuesday, 6 December 2016 at 10:52:44 UTC, thedeemon wrote:

 [...]

 2. Up until 4 KB it reallocates when growing, but after 4 KB 
 the array lives in a larger pool of memory where it can 
 often grow a lot without reallocating, so in many scenarios 
 where other allocations do not interfere, the data array of 
 appender grows in place without copying any data, thanks to 
 GC.extend() method.
I always assumed it kept its own manually allocated array on a malloc heap :O
No. The main thing that Appender does is reduce the number of checks required for whether there's room for the array to append in place, because that check is a good chunk of why ~= is expensive for arrays. [...]
Thanks everyone for the explanations. I should probably look into my data and see how often I'm reaching the 4kb size triggering GC.extend() use. --Jon
Dec 06 2016