
digitalmars.D.announce - Re: RFC on range design for D2

reply bearophile <bearophileHUGS lycos.com> writes:
I don't have enough experience to have a clear view on the whole subject, so I
write only some general comments:

1) In time I have seen that everyone who likes/loves language X wants D to
become like X. This is true for Andrei too: you clearly like C++ a lot. The
worst fault of C++ is excessive complexity; that's the single fault that's
killing it, that has pushed people to invent Java, and so on. Walter has
clearly tried hard to make D a less complex language. This means that D is
sometimes less powerful than C++, and every smart programmer can see that it's
less powerful, but having 15% less power is a good deal if you have 1/3 of the
complexity. As you may guess I don't like C++ (even if I like its efficiency
I'm not going to use it for real coding), so I don't want D to become just a
resyntaxed C++. Note that originally D was more like a statically compiled
Java, and while that's not perfect, that's probably better than a resyntaxed
C++. So I suggest less power, if this decreases a lot what the programmer has
to keep in mind while programming. If you don't believe me, take a look at
what the blurb of D says:

D is a systems programming language. Its focus is on combining the power and
high performance of C and C++ with the programmer productivity of modern
languages like Ruby and Python. Special attention is given to the needs of
quality assurance, documentation, management, portability and reliability.<

As you see it's a matter of balance.

2) Keep in mind that not all D programmers are lovers of C++ and they don't all
come from C++; some of them are actually lovers of other languages, like Java,
C#, Python, Ruby, Lisp, etc. And as you may expect, they may want to push D to
look more like Lisp, Python, C#, etc.

3) opApply has some problems, but it isn't infamous.

4) Syntax matters, and function/method/attribute names too. So they have to be
chosen with care. I suggest using single words when possible, and considering
"alllowercase" names too (I know this contrasts with the D style guide).

4b) Copying function/method/attribute names from C++ isn't bad if people think
such names are good. Inventing different names just to be different is not good.

5) The source code of the current algorithm module of D2 is already very
complex to follow; it smells of over-generalization here and there. Sometimes
it's better to reduce the generality of things, even if that reduces their
power a little, to reduce complexity, etc. Tango code isn't perfect either, but
it often looks more human. While you have created the algorithm module, I too
have created something similar, but based on different grounds.

6) Built-in data types are important, but they aren't meant to replace a good
standard library, where you can find more specialized and more efficient data
structures. The built-in data types are meant to:
- offer a very handy syntax, easy to use and remember, and short to type too;
- be efficient in a very wide variety of situations, so they must avoid having
really bad performance in a large number of situations, while it's okay for
them not to be the fastest in any situation;
- be useful, for example, when you have little data, in the 90% of the code
where max performance isn't so important. In the other situations you are
supposed to be able to import things like an IntrusiveRedBlackHashMap from the
std lib.

7) Take a look at the lazy "views" of keys/values/items of Python 3, and how
they fit into your view of such ranges. Java too has something similar. (I can
give a summary if you want. But in a few words: if you have an associative
array (dict) d, then d.keys() doesn't return an array but a "view", a lazy
iterable that's an object that can be sliced lazily, iterated, etc. This is way
more efficient than the .keys/.values of the current D implementation.)

8) Lots of functions/things in my mostly-functional library are designed to be
lazy, that is, they generate items on the fly. At the moment D largely lacks
such a view of the world. Again, take a good look at how Python 3 has shifted
from eager to lazy in a lot of its style. How do such things fit in your range
view? I think they can fit well, but the D language has to shift its philosophy
a little to support such lazy generators/iterators more often and commonly in
the built-ins too.

9) I had a long discussion in the main D newsgroup; a partial conclusion was
that a struct of 2 items may not be enough for the current implementation of
dynamic arrays, because without a capacity field, the append is dead-slow. I
presume this is orthogonal to your range proposal, but I have mentioned it here
because you keep talking about dynamic arrays as 2-pointer structs, and that
may be changed in the future.

Bye,
bearophile
Sep 09 2008
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 I don't have enough experience to have a clear view on the whole
 subject, so I write only some general comments:

Uh-oh. That doesn't quite bode well :o).
 1) In time I have seen that everyone that likes/loves language X
 wants D to become like X. This is true for Andrei too, you clearly
 like C++ a lot.

Stop right there. This is just a presupposition. I think I could say I *know* C++ a lot, which is quite different. But then I know a few other languages quite well, along with the advantages and disadvantages that made them famous. Maybe I could say I do like the STL. Not the quirks it takes to implement it in C++, but for the fact that it brings clarity and organization in defining fundamental data structures and algorithms. For my money, other collection/algorithms designs don't hold a candle to STL's.
 The worst fault of C++ is excessive complexity,
 that's the single fault that's killing it, that has pushed people to
 invent Java, and so on. Walter has clearly tried hard to make D a
 less complex language. This means that D is sometimes less powerful
 than C++, and every smart programmer can see that it's less powerful,
 but having 15% less power is a good deal if you have 1/3 of the
 complexity. As you may guess I don't like C++ (even if I like its
 efficiency I'm not going to use it for real coding), so I don't like
 D to become just a resyntaxed C++. Note that originally D was more
 like a statically compiled Java, and while that's not perfect, that's
 probably better than a resyntaxed C++. So I suggest less power, if
 this decreases a lot what the programmer has to keep in mind
 while programming. If you don't believe me take a look at what the
 blurb of D says:
 
 D is a systems programming language. Its focus is on combining the
 power and high performance of C and C++ with the programmer
 productivity of modern languages like Ruby and Python. Special
 attention is given to the needs of quality assurance,
 documentation, management, portability and reliability.<

As you see it's a matter of balance.

I know it's a matter of balance. I am not sure what gave you the idea that I didn't.
 2) Keep in mind that not all D programmers are lovers of C++ and they
 don't all come from C++, some of them are actually lovers of other
 languages, like Java, C#, Python, Ruby, Lisp, etc. And as you may
 expect, they may want to push D to look more like Lisp, Python, C#, etc.

I like many things in all of the languages above. Don't forget it was me who opened Walter to Lisp :o).
 3) opApply has some problems, but it isn't infamous.

opApply has two problems:

1. It can't save the state of the iteration for later resumption.
2. It uses repeated function calls through a pointer, which is measurably slow.

Both disadvantages are major. The first fosters container design without true iteration. That's just bad design. (Look at built-in hashes.) The second is even worse because it makes foreach a toy useful when you read lines from the console, but not in inner loops. Which is kind of ironic, because they are inner *loops*. I think many would agree that foreach minus the disadvantages would be better.
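To make the first problem concrete, here is a minimal sketch of my own (the isEmpty/next names are just placeholders, not necessarily the final spelling of the proposal): with opApply the iteration state lives only on the call stack and can't outlive the call, whereas a range object can be stopped, copied, and resumed.

struct Container
{
    int[] data;

    // opApply: the delegate drives everything; the iteration state cannot
    // be saved and resumed later.
    int opApply(int delegate(ref int) dg)
    {
        foreach (ref x; data)
        {
            if (auto r = dg(x)) return r;
        }
        return 0;
    }

    // Range-style iteration: this struct *is* the iteration state, so it
    // can be copied, paused, and resumed at will.
    struct Range
    {
        int[] slice;
        bool isEmpty() { return slice.length == 0; }
        int next() { auto x = slice[0]; slice = slice[1 .. $]; return x; }
    }

    Range all() { return Range(data); }
}

unittest
{
    auto c = Container([1, 2, 3, 4]);
    auto r = c.all;
    auto first = r.next;   // consume one element...
    auto rest = r;         // ...then save the rest of the iteration for later
    assert(first == 1 && !rest.isEmpty);
}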
 4) Syntax matters, and function/ method /attribute names too. So they
 have to be chosen with care. I suggest to use single words when
 possible, and to consider "alllowercase" names too (I know this
 contrasts with D style guide).

It's very easy to give very general advice (which, to be honest, your post is replete with). It would help if you followed up with a few concrete points.
 4b) Copying function/method/attribute names from C++ isn't bad if
 people think such names are good. Inventing different names just to
 be different is not good.
 
 5) The source code of the current algorithm module of D2 is already
 very complex to follow, it smells of over-generalization here and
 there. Sometimes it's better to reduce the generality of things, even
 if that reduces their power a little, to reduce complexity, etc.
 Tango code too isn't perfect, but it often looks more human. While
 you have created the algorithm module I too have created something
 similar, but based on different grounds.

I am sure you like your library more than mine, because it's your design and your realization. The code in std.algorithm is complex because it implements algorithms that are complex. I know that if someone looked over partition, rotate, topN, or sort without knowing how they work, they wouldn't have an easy job picking it up. That is fine. The purpose of std.algorithm's implementation is not to be a tutorial on algorithms.

On occasion std.algorithm does a few flip-flops, but always for a good reason. For example, it takes some effort to allow multiple functions in reduce. Consider:

double[] a = [ 3.0, 4, 7, 11, 3, 2, 5 ];
// Compute minimum and maximum in one pass
auto r = reduce!(min, max)(double.max, -double.max, a);
// The type of r is Tuple!(double, double)
assert(r._0 == 2);  // minimum
assert(r._1 == 11); // maximum

A simpler reduce would have only allowed one function, so I would have had to write:

auto m = reduce!(min)(double.max, a);
auto M = reduce!(max)(-double.max, a);

On the face of it, this looks reasonable. After all, come on, why can't one write two lines instead of one? However, min and max are such simple functions that the cost of computing them is drowned out by the cost of looping alone. On a large array the two passes become onerous, so I would have had to write the loop by hand. How do I know that? Because I measured. Why did I measure? Because I care. Why do I care? Because the typical runtime of my programs measures in hours of sheer computation, and because things like reduce and others in std.algorithm are in the core loops. And I am not alone.

If I can help it, I wouldn't want to write an algorithms library for a systems-level programming language with fundamental limitations that make it unsuitable for efficient computation. Costly abstractions are a dime a dozen. It's efficient abstractions that are harder to come across. If you believe I am wasting time with cutesies in std.algorithm, please let me know of the places and of the suggested improvements.
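For reference, this is the difference being measured; a throwaway sketch of mine, not std.algorithm code: two passes with single-function reductions versus one hand-written (or multi-function) pass over the same data.

import std.stdio;

void main()
{
    double[] a = [ 3.0, 4, 7, 11, 3, 2, 5 ];

    // With a single-function reduce you end up walking the array twice...
    double mn = double.max, mx = -double.max;
    foreach (x; a) if (x < mn) mn = x;
    foreach (x; a) if (x > mx) mx = x;

    // ...whereas the hand-written (or multi-function) version does one pass.
    double mn2 = double.max, mx2 = -double.max;
    foreach (x; a)
    {
        if (x < mn2) mn2 = x;
        if (x > mx2) mx2 = x;
    }

    assert(mn == 2 && mx == 11 && mn2 == mn && mx2 == mx);
    writefln("min = %s, max = %s", mn, mx);
}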
 6) Built-in data types are important, they aren't meant to replace a
 good standard library, where you can find more specialized and more
 efficient data structures. The built-in data types are meant to: -
 offer a very handy syntax, easy to use and remember, short to type
 too. - They have to be efficient in a very wide variety of
 situations, so they must avoid having really bad performance in a
 large number of situations, while it's okay for them not to be the
 fastest in any situation. - They are useful for example when you have
 little data, in 90% of the code where max performance isn't so
 important. In the other situations you are supposed to be able to import
 things like an IntrusiveRedBlackHashMap from the std lib.

I agree.
 7) Take a look at the lazy "views" of keys/values/items of Python3,
 how they fit into your view of such ranges. Java too has something
 similar. (I can give a summary if you want. But in a few words: if you
 have an associative array (dict) d.keys() doesn't return an array,
 but a "view", a lazy iterable that's an object that can be sliced
 lazily, iterated, etc. This is way more efficient than the
 .keys/.values of the current D implementation).

To quote a classic, Lisp has had them all for 50 years. Well-defined ranges are the perfect stepping stone towards lazy computation. In particular input ranges are really generators. I plan to implement things like lazyMap and lazyReduce, and also generators that are so popular in functional languages.
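As a rough illustration of the shape such a lazy map could take over ranges (a sketch of mine with hypothetical names and a toy ArrayRange, not the eventual Phobos API):

// A toy input range over an array, using hypothetical isEmpty/next names.
struct ArrayRange(T)
{
    T[] a;
    bool isEmpty() { return a.length == 0; }
    T next() { auto x = a[0]; a = a[1 .. $]; return x; }
}

// A lazy map: fun is applied to one element at a time, only when the
// result is actually consumed -- nothing is computed up front.
struct LazyMap(alias fun, R)
{
    R r;
    bool isEmpty() { return r.isEmpty; }
    auto next() { return fun(r.next); }
}

LazyMap!(fun, R) lazyMap(alias fun, R)(R r)
{
    return LazyMap!(fun, R)(r);
}

int square(int x) { return x * x; }

unittest
{
    auto squares = lazyMap!(square)(ArrayRange!(int)([1, 2, 3]));
    int[] result;
    while (!squares.isEmpty) result ~= squares.next;
    assert(result == [1, 4, 9]);
}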
 8) Lots of functions/things in my mostly-functional library are
 designed to be lazy, that is, they generate items on the fly. At the
 moment D largely lacks such a view of the world. Again, take a good
 look at how Python 3 has shifted from eager to lazy in a lot of its
 style. How do such things fit in your range view? I think they can
 fit well, but the D language has to shift its philosophy a little to
 support such lazy generators/iterators more often and commonly in the
 built-ins too.

I agree that D could use more lazy iteration instead of eager computation.
 9) I had a long discussion in the main D newsgroup; a partial
 conclusion was that a struct of 2 items may not be enough for the
 current implementation of dynamic arrays, because without a capacity
 field, the append is dead-slow. I presume this is orthogonal to your
 range proposal, but I have mentioned it here because you keep
 talking about dynamic arrays as 2-pointer structs, and that may be
 changed in the future.

I saw that discussion. In my opinion that discussion clarifies why slices shouldn't be conflated with full-fledged containers. Handling storage strategy should not be the job of a slice. That's why there's a need for true containers (including straight block arrays as a particular case). Ranges are perfect for their internal implementation and for iterating them. Andrei
Sep 09 2008
parent reply superdan <super dan.org> writes:
Andrei Alexandrescu Wrote:

 bearophile wrote:
 I don't have enough experience to have a clear view on the whole
 subject, so I write only some general comments:

Uh-oh. That doesn't quite bode well :o).

u could've stopped readin'. general comments without experience are oxpoop. my mailman could give'em.
Sep 09 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
superdan wrote:
 u could've stopped readin'. general comments without experience are
 oxpoop. my mailman could give'em.

The thing about iterators and collections is that they look so simple, but getting the right design is fiendishly difficult.
Sep 09 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Walter Bright wrote:
 superdan wrote:
 u could've stopped readin'. general comments without experience are
 oxpoop. my mailman could give'em.

The thing about iterators and collections is that they look so simple, but getting the right design is fiendishly difficult.

And when a correct design is devised, the mark of genius is everyone will think it in retrospect to be simple and obvious!
Sep 09 2008
parent Benji Smith <dlanguage benjismith.net> writes:
Walter Bright wrote:
 Walter Bright wrote:
 superdan wrote:
 u could've stopped readin'. general comments without experience are
 oxpoop. my mailman could give'em.

The thing about iterators and collections is that they look so simple, but getting the right design is fiendishly difficult.

And when a correct design is devised, the mark of genius is everyone will think it in retrospect to be simple and obvious!

Also: everyone will measure the quality of that design using a different yardstick.

For my money, the best collection design I've ever worked with is the C5 library for .Net, developed at the IT University of Copenhagen:

http://www.itu.dk/research/c5/
http://www.ddj.com/windows/199902700

--benji
Sep 09 2008
prev sibling next sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
bearophile wrote:
 6) Built-in data types are important, they aren't meant to replace a good
standard library, where you can find more specialized and more efficient data
structures. The built-in data types are meant to:
 - offer a very handy syntax, easy to use and remember, short to type too.
 - They have to be efficient in a very wide variety of situations, so they must
avoid having really bad peformance in a large number of situations, while it's
okay for them to be not much fast in any situation.
 - They are useful for example when you have little data, in 90% of the code
where max performance isn't so important. In the other situations you are
supposed able to import things like an IntrusiveRedBlackHashMap from the std
lib.

I'd also add this: a built-in data type can't implement an interface, so there should be one and only one obvious implementation of its interface. For example, it would be a mistake to have a "File" as a built-in type, because a "File" really ought to implement an "InputStream" interface, so that people can write stream-consumer code generically. It's one of the reasons I think dynamic arrays and associative arrays belong in the standard library (as List and Map interfaces) rather than as built-in types.

On a separate note...

I'm also a little skeptical about the range proposal (though I'll have to give it more thought and maybe play with an implementation before I can come to any firm conclusion).

The proposal seems to cover the standard cases of bounded and unbounded forward iteration, reverse iteration, etc. But I can think of a few examples where the concept of "iteration" needs to be even more generalized.

For example, I worked on a project last year (in Java) that used a hidden Markov model to navigate through the nodes of a graph, driving a Monte Carlo simulation. I designed the MarkovModel<T> class to implement the Collection<T> interface so that I could use foreach iteration to crawl through the nodes of the graph. It basically looked like this:

MarkovModel<SimulationState> model = ...;
for (SimulationState state : model) {
    state.doStuff();
}

The cool thing about this is that the simulation would continue running until it reached a natural termination point (a simulation state with no outbound state transition probabilities). Depending upon the particulars of the model, it might never end. And although the ordering of the nodes was significant, it was never deterministic. Iterating through the states of a Markov model is essentially a probabilistically guided random walk through the elements of the collection.

For me, Java iterators made a very natural design choice, since the iterator is such a dirt-simple interface (with just a "hasNext" and "next" method).

How would the D range proposal address the sorts of problems that require non-deterministic iteration? To me, the metaphor of a "range" is nice, but it doesn't cover all the cases I'd like to see in a general-purpose iteration metaphor.

Thanks!

--benji
Sep 09 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 bearophile wrote:
 6) Built-in data types are important, they aren't meant to replace a 
 good standard library, where you can find more specialized and more 
 efficient data structures. The built-in data types are meant to:
 - offer a very handy syntax, easy to use and remember, short to type too.
 - They have to be efficient in a very wide variety of situations, so 
 they must avoid having really bad peformance in a large number of 
 situations, while it's okay for them to be not much fast in any 
 situation.
 - They are useful for example when you have little data, in 90% of the 
 code where max performance isn't so important. In the other situations 
 you are supposed able to import things like an 
 IntrusiveRedBlackHashMap from the std lib.

I'd also add this: A built-in data-type can't implement an interface, so there should be one and only one obvious implementation of its interface. For example, it would be a mistake to have a "File" as a built-in type, because a "File" really ought to implement an "InputStream" interface, so that people can write stream-consumer code generically. It's one of the reasons I think dynamic arrays and associate arrays belong in the standard library (as List and Map interfaces) rather than as built-in types. On a separate note... I'm also a little skeptical about the range proposal (though I'll have to give it more thought any maybe play with an implementation before I can come to any firm conclusion). The proposal seems to cover the standard cases, of bounded and unbounded forward iteration, reverse iteration, etc. But I can think of a few examples where the concept of "iteration" needs to be even more generalized. For example, I worked on a project last year (in Java) that used a hidden markov model to navigate through the nodes of a graph, driving a montel carlo simulation. I designed the MarkovModel<T> class to implement the Collection<T> interface so that I could use foreach iteration to crawl through the nodes of the graph. It basically looked like this: MarkovModel<SimulationState> model = ...; for (SimulationState state : model) { state.doStuff(); } The cool thing about this is that the simulation would continue running until it reached a natural termination point (a simulation state with no outbound state transition probabilities). Depending upon the particulars of the model, it might never end. And although the ordering of the nodes was significant, it was never deterministic. Iterating through the states of a markov model is essentially a probabilistically guided random-walk through the elements of the collection. For me, java iterators made a very natural design choice, since the iterator is such a dirt-simple interface (with just a "hasNext" and "next" method). How would the D range proposal address the sorts of problems that require non-deterministic iteration? To me, the metaphor of a "range" is nice, but it doesn't cover all the cases I'd like to see in a general-purpose iteration metaphor.

Hmm, HMMs :o). If you could do it with Java's hasNext and next, you can do it with D's isEmpty and next. There's no difference. Andrei
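For instance, a minimal sketch of such a probabilistic walk exposed through isEmpty/next (this is my own toy example with a stand-in transition table and a throwaway inline RNG; the real range API may be spelled differently):

// A toy probabilistic walk presented as a range: next() returns the current
// state and picks a random successor; the walk is empty once it reaches a
// state with no outgoing transitions.
struct RandomWalk
{
    int[][] transitions;   // transitions[s] = possible successor states
    int current;
    uint seed = 42;
    bool done;

    bool isEmpty() { return done; }

    int next()
    {
        auto result = current;
        auto outgoing = transitions[current];
        if (outgoing.length == 0)
        {
            done = true;
        }
        else
        {
            seed = seed * 1664525 + 1013904223;   // tiny LCG, good enough here
            current = outgoing[seed % outgoing.length];
        }
        return result;
    }
}

unittest
{
    int[][] transitions = new int[][](3);
    transitions[0] = [1, 2];
    transitions[1] = [2];
    // transitions[2] stays empty: state 2 is terminal

    auto walk = RandomWalk(transitions, 0);
    int visited;
    while (!walk.isEmpty)
    {
        auto state = walk.next;
        assert(state >= 0 && state < 3);
        ++visited;
    }
    assert(visited >= 1);   // always visits at least the start state
}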
Sep 09 2008
parent Benji Smith <dlanguage benjismith.net> writes:
Andrei Alexandrescu wrote:
 Hmm, HMMs :o). If you could do it with Java's hasNext and next, you can 
 do it with D's isEmpty and next. There's no difference.
 
 
 Andrei

Oh. Okay. Good to know :) I guess all the talk about "ranges" has me visualizing contiguous items and sequential ordering and determinism. It's a good word for a well-defined set of items with a true beginning and end, but I wonder whether "cursor" might be a better word than "range", especially for input consumption and non-deterministic iteration. One thing I definitely like better about the new range proposal is the notion that the container is not responsible for providing iteration logic or, more importantly, maintaining the state of any iteration. I think it'll be a welcome change. Nice work, and I'm looking forward to working with the new stuff :) --benji
Sep 09 2008
prev sibling parent reply Benji Smith <dlanguage benjismith.net> writes:
bearophile wrote:
 5) The source code of the current algorithm module of D2 is already very
complex to follow, it smells of over-generalization here and there. Sometimes
it's better to reduce the generality of things, even if that reduces their
power a little, to reduce complexity, etc. Tango code too isn't perfect, but it
often looks more human. While you have created the algorithm module I too have
created something similar, but based on different grounds.

Along these same lines, while D is still young, the documentation is often thin, and code examples are scarce. I've been doing a lot of programming lately with Tango, and despite the growing documentation, I've had to refer directly to the Tango source code on more occasions than I can count. Luckily, the Tango sources are well-written and pretty easy to read and understand, and I've had very few problems figuring out how to use the library. I hope the range implementation makes readability a high priority. --benji
Sep 09 2008
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 bearophile wrote:
 5) The source code of the current algorithm module of D2 is already 
 very complex to follow, it smells of over-generalization here and 
 there. Sometimes it's better to reduce the generality of things, even 
 if that reduces their power a little, to reduce complexity, etc. Tango 
 code too isn't perfect, but it often looks more human. While you have 
 created the algorithm module I too have created something similar, but 
 based on different grounds.

Along these same lines, while D is still young, the documentation is often thin, and code examples are scarce. I've been doing a lot of programming lately with Tango, and despite the growing documentation, I've had to refer directly to the Tango source code on more occasions than I can count. Luckily, the Tango sources are well-written and pretty easy to read and understand, and I've had very few problems figuring out how to use the library. I hope the range implementation makes readability a high priority.

This is a valid concern. The sample ranges I have coded so far look deceptively simple, and that's a good sign. It only takes a dozen lines to write an interesting range. (This is very unlike STL iterators.) Andrei
Sep 09 2008
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Benji Smith wrote:
 bearophile wrote:
 5) The source code of the current algorithm module of D2 is already 
 very complex to follow, it smells of over-generalization here and 
 there. Sometimes it's better to reduce the generality of things, even 
 if that reduces their power a little, to reduce complexity, etc. Tango 
 code too isn't perfect, but it often looks more human. While you have 
 created the algorithm module I too have created something similar, but 
 based on different grounds.

Along these same lines, while D is still young, the documentation is often thin, and code examples are scarce. I've been doing a lot of programming lately with Tango, and despite the growing documentation, I've had to refer directly to the Tango source code on more occasions than I can count. Luckily, the Tango sources are well-written and pretty easy to read and understand, and I've had very few problems figuring out how to use the library. I hope the range implementation makes readability a high priority.

Speaking of examples and readability, and to tie this in with the discussion on array reallocation, I was curious about an array appender built on the output range interface. I've seen a quite complicated array builder in digitalmars.d. I wanted a simpler appender that should not do worse than the built-in ~= and that works with algorithm2 whenever data is written out.

It turned out quite simple, and it improved performance of a large-scale data preprocessing task of mine (which involved reading and parsing about 1M lines of integers) by 15%. I'd be curious how it fares with other tests that you guys may have.

The idea is very simple: just use D's native append operation, but cache the capacity to avoid too many lookups (I understand that that's the bottleneck). I paste the code below; I'd be indebted if you guys grabbed it and tested it.

Andrei

struct ArrayAppender(T)
{
    private T* _buffer;
    private size_t _length;
    private size_t _capacity;

    this(T[] array)
    {
        _buffer = array.ptr;
        _length = array.length;
        if (_buffer) _capacity = .capacity(array.ptr) / T.sizeof;
    }

    size_t capacity() const { return _capacity; }

    void putNext(T item)
    {
        invariant desiredLength = _length + 1;
        if (desiredLength <= _capacity)
        {
            // Should do in-place construction here
            _buffer[_length] = item;
        }
        else
        {
            // Time to reallocate, do it and cache capacity
            auto tmp = _buffer[0 .. _length];
            tmp ~= item;
            _buffer = tmp.ptr;
            _capacity = .capacity(_buffer) / T.sizeof;
        }
        _length = desiredLength;
    }

    T[] release()
    {
        // Shrink-to-fit
        auto result = cast(T[]) realloc(_buffer, _length * T.sizeof);
        // Relinquish state
        _buffer = null;
        _length = _capacity = 0;
        return result;
    }
}

unittest
{
    auto app = arrayAppender(new char[0]);
    string b = "abcdefg";
    foreach (char c; b) app.putNext(c);
    assert(app.release == "abcdefg");
}
Sep 09 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 Speaking of examples and readability, and to tie this with the 
 discussion on array reallocation, I was curious on an array appender 
 built on the output range interface. I've seen a quite complicated array 
 builder in digitalmars.d. I wanted a simpler appender that should not do 
 worse than the built-in ~= and that works with algorithm2 whenever data 
 is written out.

That builder was probably mine, designed for D1. For the latest version take a look at the 'builder.d' module in the libs I have shown you. It's not overly complex: its API is simple and its internals are as complex as they have to be in order to be efficient. (I'm still slowly improving it; I'm now trying to make it more flexible, making its extending functionality work with other kinds of iterables too.)
 I'd be curious how it fares with other tests that you guys may have.

That module has about 380 lines of benchmark code (after a few hundred lines of unit tests). I think you can add a few lines to them to benchmark your implementation, but I presume mine is more efficient :-) I may do a few benchmarks later... Bye, bearophile
Sep 10 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
A few benchmarks, appending ints; note this is a worst-case situation (other
benchmarks are less dramatic). Just many appends, followed by the "release":

benchmark 10, N=10_000_000:
  Array append:  0.813 s
  ArrayBuilder:  0.159 s
  ArrayAppender: 1.056 s

benchmark 10, N=100_000_000:
  Array append:  10.887 s
  ArrayBuilder:   1.477 s
  ArrayAppender: 13.305 s

-----------------

Chunk of the code I have used:

struct ArrayAppender(T)
{
     private T* _buffer;
     private size_t _length;
     private size_t _capacity;

     void build(T[] array)
     {
         _buffer = array.ptr;
         _length = array.length;
         //if (_buffer) _capacity = .capacity(array.ptr) / T.sizeof;
     }

     size_t capacity() /* const */ { return _capacity; }

     void putNext(T item)
     {
         /* invariant */ int desiredLength = _length + 1;
         if (desiredLength <= _capacity)
         {
             // Should do in-place construction here
             _buffer[_length] = item;
         }
         else
         {
             // Time to reallocate, do it and cache capacity
             auto tmp = _buffer[0 .. _length];
             tmp ~= item;
             _buffer = tmp.ptr;
             _capacity = this.capacity() / T.sizeof;
         }
         _length = desiredLength;
     }

     T[] release()
     {
         // Shrink-to-fit
         //auto result = cast(T[]) realloc(_buffer, _length * T.sizeof);
         auto result = cast(T[]) gcrealloc(_buffer, _length * T.sizeof);
         // Relinquish state
         _buffer = null;
         _length = _capacity = 0;
         return result;
     }
}

unittest
{
     auto app = arrayAppender(new char[0]);
     string b = "abcdefg";
     foreach (char c; b) app.putNext(c);
     assert(app.release == "abcdefg");
}


    void benchmark10(int ntest, int N) {
        putr("\nbenchmark 10, N=", thousands(N), ":");

        if (ntest == 0) {
            auto t0 = clock();
            int[] a1;
            for (int i; i < N; i++)
                a1 ~= i;
            auto last1 = a1[$ - 1];
            auto t2 = clock() - t0;
            putr("  Array append: %.3f", t2, " s   ", last1);
        } else if (ntest == 1) {
            auto t0 = clock();
            ArrayBuilder!(int) a2;
            for (int i; i < N; i++)
                a2 ~= i;
            auto last2 = a2.toarray[$ - 1];
            auto t2 = clock() - t0;
            putr("  ArrayBuilder: %.3f", t2, " s   ", last2);
        } else {
            auto t0 = clock();
            ArrayAppender!(int) a3;
            a3.build(null);
            for (int i; i < N; i++)
                a3.putNext(i);
            auto last3 = a3.release()[$ - 1];
            auto t2 = clock() - t0;
            putr("  ArrayAppender: %.3f", t2, " s   ", last3);
        }
    }

Bye,
bearophile
Sep 10 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Few benchmarks, appending ints, note this is a worst-case situation (other
benchmarks are less dramatic). Just many appends, followed by the "release":
 
 benchmark 10, N=10_000_000:
   Array append:  0.813 s
   ArrayBuilder:  0.159 s
   ArrayAppender: 1.056 s
 
 benchmark 10, N=100_000_000:
   Array append:  10.887 s
   ArrayBuilder:   1.477 s
   ArrayAppender: 13.305 s

That's odd. The array appender should never, by definition, do significantly worse than the straight array append. I think some other factor intervened (e.g. swapping). Also don't forget to compile with -O -release -inline and to test several times after a warmup run. I adapted your code, obtaining the numbers below:

benchmark 10, N=10000000:
  Array append:  0.69 s
  ArrayAppender: 0.19 s

benchmark 10, N=25000000:
  Array append:  2.06 s
  ArrayAppender: 0.82 s

benchmark 10, N=50000000:
  Array append:  4.28 s
  ArrayAppender: 1.75 s

benchmark 10, N=75000000:
  Array append:  9.62 s
  ArrayAppender: 5.8 s

benchmark 10, N=100000000:
  Array append:  11.35 s
  ArrayAppender: 6.20 s

Andrei
Sep 10 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 That's odd. The array appender should never, by definition, do 
 significantly worse than the straight array append.

But the reality of your code and benchmarks may differ from the abstract definition :-)
 I think some other factor intervened (e.g. swapping). Also don't forget to
 compile with -O -release -inline and to test several times after a warmup run.

I have kept an eye on such things too. Note that benchmarks are generally tricky. I am using DMD v1.035, on a Core Duo 2 GHz, 2 GB RAM, on Win, the code doesn't make HD swap, and timings are warm. My timings are repeatable within 0.02-0.03 seconds on my PC. If you want I can give you the whole testing code, of course. But it's just the builders.d module of my libs plus the added testing code I have shown you. Anyway, in the end it doesn't matter much, I think. Bye, bearophile
Sep 10 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 That's odd. The array appender should never, by definition, do 
 significantly worse than the straight array append.

But the reality of your code and benchmarks may differ from the abstract definition :-)

But it's not abstract, and besides my benchmarks do support my hypothesis. On the common path my code does the minimum amount that any append would do: test, assign at index, bump. On the less common path (invoked an amortized constant number of times) my code does an actual built-in append plus a call to capacity to cache it. Assuming the built-in array does a screaming fast append, my code should be just about as fast, probably insignificantly slower because of the extra straggler operations.
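As a back-of-the-envelope check of that amortization argument (my own sketch, assuming the built-in append roughly doubles capacity whenever it actually reallocates):

import std.stdio;

// With doubling growth, appending N elements triggers only about log2(N)
// reallocations, and the total number of elements ever copied stays below
// 2 * N -- i.e. amortized constant work per append, with capacity() called
// only on the rare reallocating path.
void main()
{
    enum N = 10_000_000;
    size_t capacity = 1, length = 0, reallocs = 0, copied = 0;
    foreach (i; 0 .. N)
    {
        if (length == capacity)
        {
            copied += length;   // cost of moving the old contents
            capacity *= 2;
            ++reallocs;
        }
        ++length;
    }
    writefln("%s appends: %s reallocations, %s elements copied", N, reallocs, copied);
    assert(copied < 2 * N);
}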
 I think some other factor intervened (e.g. swapping). Also don't
 forget to compile with -O -release -inline and to test several
 times after a warmup run.

I have kept an eye on such things too. Note that benchmarks are generally tricky. I am using DMD v1.035, on a Core Duo 2 GHz, 2 GB RAM, on Win, the code doesn't make HD swap, and timings are warm. My timings are repeatable within 0.02-0.03 seconds on my PC. If you want I can give you the whole testing code, of course. But it's just the builders.d module of my libs plus the added testing code I have shown you.

But I copied your test code from your post. Then I adapted it to Phobos (replaced putr with writefln, clock with ctime, and such, which shouldn't matter). Then I ran it under Phobos. For the built-in array append we get comparable numbers. So there must be something that makes your numbers for ArrayAppender skewed. I'm saying yours and not mine because mine are in line with expectations and yours aren't. Andrei
Sep 10 2008
prev sibling parent reply superdan <super dan.org> writes:
Andrei Alexandrescu Wrote:

 bearophile wrote:
 Few benchmarks, appending ints, note this is a worst-case situation (other
benchmarks are less dramatic). Just many appends, followed by the "release":
 
 benchmark 10, N=10_000_000:
   Array append:  0.813 s
   ArrayBuilder:  0.159 s
   ArrayAppender: 1.056 s
 
 benchmark 10, N=100_000_000:
   Array append:  10.887 s
   ArrayBuilder:   1.477 s
   ArrayAppender: 13.305 s

That's odd. The array appender should never, by definition, do significantly worse than the straight array append. I think some other factor intervened (e.g. swapping). Also don't forget to compile with -O -release -inline and to test several times after a warmup run. I adapted your code obtaining the numbers below: benchmark 10, N=10000000: Array append: 0.69 s ArrayAppender: 0.19 s benchmark 10, N=25000000: Array append: 2.06 s ArrayAppender: 0.82 s benchmark 10, N=50000000: Array append: 4.28 s ArrayAppender: 1.75 s benchmark 10, N=75000000: Array append: 9.62 s ArrayAppender: 5.8 s benchmark 10, N=100000000: Array append: 11.35 s ArrayAppender: 6.20 s Andrei

arrayappender is simple as dumb. compare and contrast with the ginormous arraybuilder. yet works great. why? because it is on the right thing. the common case. on the uncommon case it does whatever to get the job done. don't matter if it's rare.

not sure anybody saw the irony. it was bearophile advocating simplicity eh. allegedly andre's the complexity guy. code seems to tell nother story.

btw doods i figured indexed access is slower than access via pointer. so i recoded arrayappender to only use pointers. there's some half a second savings for the large case. small arrays don't feel it.

struct ArrayAppender(T)
{
    private T* _begin;
    private T* _end;
    private T* _eos;

    this(T[] array)
    {
        _begin = array.ptr;
        _end = _begin + array.length;
        if (_begin) _eos = _begin + .capacity(_begin) / T.sizeof;
    }

    size_t capacity() const { return _eos - _begin; }

    void putNext(T item)
    {
        if (_end < _eos)
        {
            *_end++ = item;
        }
        else
        {
            auto tmp = _begin[0 .. _end - _begin];
            tmp ~= item;
            _begin = tmp.ptr;
            _end = _begin + tmp.length;
            _eos = _begin + .capacity(_begin) / T.sizeof;
        }
    }

    T[] releaseArray()
    {
        auto result = _begin[0 .. _end - _begin];
        _begin = _end = _eos = null;
        return result;
    }
}
Sep 10 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
superdan wrote:
 Andrei Alexandrescu Wrote:
 
 bearophile wrote:
 Few benchmarks, appending ints, note this is a worst-case situation (other
benchmarks are less dramatic). Just many appends, followed by the "release":

 benchmark 10, N=10_000_000:
   Array append:  0.813 s
   ArrayBuilder:  0.159 s
   ArrayAppender: 1.056 s

 benchmark 10, N=100_000_000:
   Array append:  10.887 s
   ArrayBuilder:   1.477 s
   ArrayAppender: 13.305 s

significantly worse than the straight array append. I think some other factor intervened (e.g. swapping). Also don't forget to compile with -O -release -inline and to test several times after a warmup run. I adapted your code obtaining the numbers below: benchmark 10, N=10000000: Array append: 0.69 s ArrayAppender: 0.19 s benchmark 10, N=25000000: Array append: 2.06 s ArrayAppender: 0.82 s benchmark 10, N=50000000: Array append: 4.28 s ArrayAppender: 1.75 s benchmark 10, N=75000000: Array append: 9.62 s ArrayAppender: 5.8 s benchmark 10, N=100000000: Array append: 11.35 s ArrayAppender: 6.20 s Andrei

arrayappender is simple as dumb. compare and contrast with the ginormous arraybuilder. yet works great. why? because it is on the right thing. the common case. on the uncommon case it does whatever to get the job done. don't matter if it's rare. not sure anybody saw the irony. it was bearophile advocating simplicity eh. allegedly andre's the complexity guy. code seems to tell nother story. btw doods i figured indexed access is slower than access via pointer. so i recoded arrayappender to only use pointers. there's some half a second savings for the large case. small arrays don't feel it. struct ArrayAppender(T) { private T* _begin; private T* _end; private T* _eos; this(T[] array) { _begin = array.ptr; _end = _begin + array.length; if (_begin) _eos = _begin + .capacity(_begin) / T.sizeof; } size_t capacity() const { return _eos - _begin; } void putNext(T item) { if (_end < _eos) { *_end++ = item; } else { auto tmp = _begin[0 .. _end - _begin]; tmp ~= item; _begin = tmp.ptr; _end = _begin + tmp.length; _eos = _begin + .capacity(_begin) / T.sizeof; } } T[] releaseArray() { auto result = _begin[0 .. _end - _begin]; _begin = _end = _eos = null; return result; } }

Thanks! Can't hurt. Guess I'll integrate your code if you don't mind. I gave it some more thought and I have a theory for the root of the issue. My implementation assumes there's exponential (multiplicative) increase of capacity in the built-in ~=. I hope Walter wouldn't do anything else. If there are differences in growth strategies between D1 and D2, that could explain the difference between bearophile's benchmarks and mine. Andrei
Sep 10 2008
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 I hope Walter wouldn't do 
 anything else. If there are differences in growth strategies between D1 
 and D2, that could explain the difference between bearophile's 
 benchmarks and mine.

You can't compare benchmarks from two different compilers. (Your code doesn't work (on D1) if T is a static array, and using ~= looks like a better interface for the append. Your code doesn't append arrays, so you have to call it many times if you want to append strings, which (in D1) is a very common case.) Bye, bearophile
Sep 10 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
bearophile:
 (Your code doesn't work (on D1) if T is a static array,

Sorry, ignore what I have written, I'm a little nervous... Bye, bearophile
Sep 10 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 bearophile:
 (Your code doesn't work (on D1) if T is a static array,

Sorry, ignore what I have written, I'm a little nervous...

I think I've unnecessarily overstated my case, which has put both of us in defensive. You are very right that tests on D1 and D2 are not comparable. And Walter has at one point made crucial changes in the allocator at my behest. Specifically, he introduced in-place growth whenever possible and added the expand() primitive to the gc. These are present in D2 but I don't know whether and when he has regressed those to D1. (And btw why wouldn't you try it :o).) Andrei
Sep 10 2008
parent Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 bearophile wrote:
 bearophile:
 (Your code doesn't work (on D1) if T is a static array,

Sorry, ignore what I have written, I'm a little nervous...

I think I've unnecessarily overstated my case, which has put both of us in defensive. You are very right that tests on D1 and D2 are not comparable. And Walter has at one point made crucial changes in the allocator at my behest. Specifically, he introduced in-place growth whenever possible and added the expand() primitive to the gc. These are present in D2 but I don't know whether and when he has regressed those to D1. (And btw why wouldn't you try it :o).)

For the record, in-place growth has been in D1 for as long as it's been in D2. Sean
Sep 10 2008
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu wrote:
 
 I gave it some more thought and I have a theory for the root of the 
 issue. My implementation assumes there's exponential (multiplicative) 
 increase of capacity in the built-in ~=. I hope Walter wouldn't do 
 anything else. If there are differences in growth strategies between D1 
 and D2, that could explain the difference between bearophile's 
 benchmarks and mine.

Arrays larger than 4k grow logarithmically, smaller than 4k they grow exponentially. This is certainly true of D1 and Tango, and I'd assume D2 is no different. Sean
Sep 10 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sean Kelly wrote:
 Andrei Alexandrescu wrote:
 I gave it some more thought and I have a theory for the root of the 
 issue. My implementation assumes there's exponential (multiplicative) 
 increase of capacity in the built-in ~=. I hope Walter wouldn't do 
 anything else. If there are differences in growth strategies between 
 D1 and D2, that could explain the difference between bearophile's 
 benchmarks and mine.

Arrays larger than 4k grow logarithmically, smaller than 4k they grow exponentially. This is certainly true of D1 and Tango, and I'd assume D2 is no different.

Yes, but with in-place expansion, effective growth stays exponential even beyond 4k. So I'm not that worried that it could become quadratic, unless there are some really wicked workloads. I modified my putNext to print what's happening:

void putNext(T item)
{
    if (_end < _eos)
    {
        // Should do in-place construction here
        *_end++ = item;
    }
    else
    {
        // Time to reallocate, do it and cache capacity
        auto tmp = _begin[0 .. _end - _begin];
        tmp ~= item;
        if (_begin != tmp.ptr)
        {
            _begin = tmp.ptr;
            _end = _begin + tmp.length;
            writeln(_end - _begin);
        }
        else
        {
            ++_end;
        }
        _eos = _begin + .capacity(_begin) / T.sizeof;
    }
}

Notice the writeln. On my system the console reads:

benchmark 10, N=75000000:
1 5 9 17 33 65 129 257 513 253953 1155073 4743169 18505729 68861953

That's exponential alright. On the other hand, if you move the writeln after the if, indeed:

1 5 9 17 33 65 129 257 513 1025 2049 3073 4097 5121 6145 7169 8193 9217 10241 11265 12289

But it's the former column that matters, because moving chunks is where the real work is being done. In-place block expansion should take constant time. With this behavior in place, my code amortizes calls to capacity() by a factor of 1024 and keeps amortized append complexity constant.

Andrei
Sep 10 2008
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Benji Smith wrote:
 bearophile wrote:
 5) The source code of the current algorithm module of D2 is already
 very complex to follow, it smells of over-generalization here and
 there. Sometimes it's better to reduce the generality of things, even
 if that reduces their power a little, to reduce complexity, etc. Tango
 code too isn't perfect, but it often looks more human. While you have
 created the algorithm module I too have created something similar, but
 based on different grounds.

Along these same lines, while D is still young, the documentation is often thin, and code examples are scarce. I've been doing a lot of programming lately with Tango, and despite the growing documentation, I've had to refer directly to the Tango source code on more occasions than I can count. Luckily, the Tango sources are well-written and pretty easy to read and understand, and I've had very few problems figuring out how to use the library. I hope the range implementation makes readability a high priority.

discussion on array reallocation, I was curious on an array appender built on the output range interface. I've seen a quite complicated array builder in digitalmars.d. I wanted a simpler appender that should not do worse than the built-in ~= and that works with algorithm2 whenever data is written out. It turned out quite simple and it improved performance of a large-scale data preprocessing task of mine (which involved reading and parsing about 1M lines of integers) by 15%. I'd be curious how it fares with other tests that you guys may have. The idea is very simple: just use D's native append operation, but cache the capacity to avoid too many lookups (I understand that that's the bottleneck). I paste the code below, I'd be indebted if you guys grabbed it and tested it. Andrei struct ArrayAppender(T) { private T* _buffer; private size_t _length; private size_t _capacity; this(T[] array) { _buffer = array.ptr; _length = array.length; if (_buffer) _capacity = .capacity(array.ptr) / T.sizeof; } size_t capacity() const { return _capacity; } void putNext(T item) { invariant desiredLength = _length + 1; if (desiredLength <= _capacity) { // Should do in-place construction here _buffer[_length] = item; } else { // Time to reallocate, do it and cache capacity auto tmp = _buffer[0 .. _length]; tmp ~= item; _buffer = tmp.ptr; _capacity = .capacity(_buffer) / T.sizeof; } _length = desiredLength; } T[] release() { // Shrink-to-fit auto result = cast(T[]) realloc(_buffer, _length * T.sizeof); // Relinquish state _buffer = null; _length = _capacity = 0; return result; } } unittest { auto app = arrayAppender(new char[0]); string b = "abcdefg"; foreach (char c; b) app.putNext(c); assert(app.release == "abcdefg"); }

One definite problem that I've just realized is that there's no putNext(T[]). What if you need to append another array to your ArrayAppender, not just a single element?
Jan 08 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
dsimcha:
 One definite problem that I've just realized is that there's no putNext(T[]).
 What if you need to append another array to your ArrayAppender, not just a
single
 element?

Just add a simple method overload for that purpose; it's easy enough to do. (The ArrayBuilder of my dlibs has this already, of course.) Bye, bearophile
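A minimal sketch of the shape such an overload could take, meant to sit inside the ArrayAppender(T) struct posted earlier in this thread and written against its _buffer/_length/_capacity fields (untested, just to show the idea):

// Appends a whole slice at once; mirrors putNext(T item): fill in place
// while the cached capacity suffices, otherwise fall back to the built-in
// ~= and re-cache the capacity.
void putNext(T[] items)
{
    auto desiredLength = _length + items.length;
    if (desiredLength <= _capacity)
    {
        _buffer[_length .. desiredLength] = items[];
    }
    else
    {
        auto tmp = _buffer[0 .. _length];
        tmp ~= items;
        _buffer = tmp.ptr;
        _capacity = .capacity(_buffer) / T.sizeof;
    }
    _length = desiredLength;
}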
Jan 08 2009
prev sibling next sibling parent reply jq <jlquinn optonline.net> writes:
dsimcha Wrote:

 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article

 The idea is very simple: just use D's native append operation, but cache
 the capacity to avoid too many lookups (I understand that that's the
 bottleneck).
 I paste the code below, I'd be indebted if you guys grabbed it and
 tested it.
 Andrei
 struct ArrayAppender(T)
 {
      size_t capacity() const { return _capacity; }
      void putNext(T item)


I have thoughts:

1) There should probably be a length/size call.
2) How about add() or append() for a shorter name?
3) What about using ~=? Maybe this is too short...

Jerry
Jan 08 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
jq wrote:
 dsimcha Wrote:
 
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article

 The idea is very simple: just use D's native append operation, but cache
 the capacity to avoid too many lookups (I understand that that's the
 bottleneck).
 I paste the code below, I'd be indebted if you guys grabbed it and
 tested it.
 Andrei
 struct ArrayAppender(T)
 {
      size_t capacity() const { return _capacity; }
      void putNext(T item)


I have thoughts: 1) There should probably be a length/size call 2) How about add() or append() for a shorter name 3) What about using ~=? Maybe this is too short...

Length sounds good. The other two I'm more hesitant about because ArrayAppender supports the interface of an output range. The output range only allows putting one element and making a step simultaneously, hence putNext. The ~= is also a bit of an unfortunate choice because it's odd to define a type with ~= but no meaningful/desirable binary ~. Andrei
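To make the structural nature of that interface concrete, here is a tiny sketch of my own (not the actual algorithm2 code): generic copying written against nothing but isEmpty/next on the source and putNext on the sink, which is exactly what lets ArrayAppender plug in.

// Copies any input range (isEmpty/next) into any output range (putNext).
// ArrayAppender qualifies as an output range purely by having putNext.
void copyTo(Input, Output)(Input src, ref Output sink)
{
    while (!src.isEmpty)
        sink.putNext(src.next);
}

// Usage sketch (hypothetical range variable):
//     auto app = ArrayAppender!(int)(null);
//     copyTo(someInputRange, app);
//     int[] result = app.release();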
Jan 09 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article

 One definite problem that I've just realized is that there's no putNext(T[]).
 What if you need to append another array to your ArrayAppender, not just a
single
 element?

My current codebase has that. It's about time I commit. Andrei
Jan 09 2009