digitalmars.D.announce - Re: RFC on range design for D2
- bearophile <bearophileHUGS lycos.com> Sep 09 2008
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Sep 09 2008
- superdan <super dan.org> Sep 09 2008
- Walter Bright <newshound1 digitalmars.com> Sep 09 2008
- Walter Bright <newshound1 digitalmars.com> Sep 09 2008
- Benji Smith <dlanguage benjismith.net> Sep 09 2008
- Benji Smith <dlanguage benjismith.net> Sep 09 2008
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Sep 09 2008
- Benji Smith <dlanguage benjismith.net> Sep 09 2008
- Benji Smith <dlanguage benjismith.net> Sep 09 2008
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Sep 09 2008
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Sep 09 2008
- bearophile <bearophileHUGS lycos.com> Sep 10 2008
- bearophile <bearophileHUGS lycos.com> Sep 10 2008
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Sep 10 2008
- bearophile <bearophileHUGS lycos.com> Sep 10 2008
- superdan <super dan.org> Sep 10 2008
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Sep 10 2008
- bearophile <bearophileHUGS lycos.com> Sep 10 2008
- bearophile <bearophileHUGS lycos.com> Sep 10 2008
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Sep 10 2008
- Sean Kelly <sean invisibleduck.org> Sep 10 2008
- Sean Kelly <sean invisibleduck.org> Sep 10 2008
- dsimcha <dsimcha yahoo.com> Jan 08 2009
- bearophile <bearophileHUGS lycos.com> Jan 08 2009
- jq <jlquinn optonline.net> Jan 08 2009
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jan 09 2009
I don't have enough experience to have a clear view on the whole subject, so I write only some general comments:

1) In time I have seen that everyone who likes/loves language X wants D to become like X. This is true for Andrei too: you clearly like C++ a lot. The worst fault of C++ is excessive complexity; that's the single fault that's killing it, that has pushed people to invent Java, and so on. Walter has clearly tried hard to make D a less complex language. This means that D is sometimes less powerful than C++, and every smart programmer can see that it's less powerful, but having 15% less power is a good deal if you have 1/3 of the complexity. As you may guess I don't like C++ (even if I like its efficiency, I'm not going to use it for real coding), so I don't want D to become just a resyntaxed C++. Note that originally D was more like a statically compiled Java, and while that's not perfect, it's probably better than a resyntaxed C++. So I suggest less power, if that greatly decreases what the programmer has to keep in mind while programming. If you don't believe me, take a look at what the blurb of D says:

> D is a systems programming language. Its focus is on combining the power and high performance of C and C++ with the programmer productivity of modern languages like Ruby and Python. Special attention is given to the needs of quality assurance, documentation, management, portability and reliability.

As you see it's a matter of balance.

2) Keep in mind that not all D programmers are lovers of C++, and they don't all come from C++; some of them are actually lovers of other languages, like Java, C#, Python, Ruby, Lisp, etc. And as you may expect, they may want to push D to look more like Lisp, Python, C#, etc.

3) opApply has some problems, but it isn't infamous.

4) Syntax matters, and function/method/attribute names too, so they have to be chosen with care. I suggest using single words when possible, and considering all-lowercase names too (I know this contrasts with the D style guide).

4b) Copying function/method/attribute names from C++ isn't bad if people think such names are good. Inventing different names just to be different is not good.

5) The source code of the current algorithm module of D2 is already very complex to follow; it smells of over-generalization here and there. Sometimes it's better to reduce the generality of things, even if that reduces their power a little, to reduce complexity. Tango code isn't perfect either, but it often looks more human. While you have created the algorithm module, I too have created something similar, but based on different grounds.

6) Built-in data types are important, but they aren't meant to replace a good standard library, where you can find more specialized and more efficient data structures. The built-in data types are meant to:
- offer a very handy syntax, easy to use and remember, and short to type;
- be efficient in a very wide variety of situations, avoiding really bad performance in a large number of situations, while it's okay for them to be not very fast in any particular situation;
- be useful, for example, when you have little data, in the 90% of the code where max performance isn't so important. In the other situations you are supposed to be able to import things like an IntrusiveRedBlackHashMap from the std lib.

7) Take a look at the lazy "views" of keys/values/items of Python 3, and how they fit into your view of such ranges. Java too has something similar. (I can give a summary if you want, but in a few words: if you have an associative array (dict) d, d.keys() doesn't return an array but a "view", a lazy iterable object. This is way more efficient than the .keys/.values of the current D implementation.)

8) Lots of functions/thingies in my mostly-functional library are designed to be lazy, that is, they generate items on the fly. At the moment D largely lacks such a view of the world. Again, take a good look at how Python 3 has shifted from eager to lazy in much of its style. How do such things fit in your range view? I think they can fit well, but the D language has to shift its philosophy a little to support such lazy generators/iterators more often and commonly, in the built-ins too.

9) I had a long discussion in the main D newsgroup; a partial conclusion was that a struct of 2 items may not be enough for the current implementation of dynamic arrays, because without a capacity field, append is dead slow. I presume this is orthogonal to your range proposal, but I mention it here because you keep talking about dynamic arrays as 2-pointer structs, and that may change in the future.

Bye,
bearophile
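For readers unfamiliar with the Python 3 behavior mentioned in point 7, a minimal sketch of it (standard CPython semantics, nothing D-specific):

```python
# Python 3: dict.keys()/.values()/.items() return lazy views, not lists.
d = {"a": 1, "b": 2}
ks = d.keys()        # a view object; no copy of the keys is made
d["c"] = 3           # mutating the dict...
live = sorted(ks)    # ...is reflected by the already-existing view
```

The view costs O(1) to obtain regardless of dict size, which is the efficiency point being made about D's eager .keys/.values.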
Sep 09 2008
bearophile wrote:
> I don't have enough experience to have a clear view on the whole subject, so I write only some general comments:

Uh-oh. That doesn't quite bode well :o).

> 1) In time I have seen that everyone who likes/loves language X wants D to become like X. This is true for Andrei too: you clearly like C++ a lot.
Stop right there. This is just a presupposition. I think I could say I *know* C++ a lot, which is quite different. But then I know a few other languages quite well, along with the advantages and disadvantages that made them famous. Maybe I could say I do like the STL. Not for the quirks it takes to implement it in C++, but for the fact that it brings clarity and organization to defining fundamental data structures and algorithms. For my money, other collection/algorithm designs don't hold a candle to STL's.

> The worst fault of C++ is excessive complexity; that's the single fault that's killing it, that has pushed people to invent Java, and so on. Walter has clearly tried hard to make D a less complex language. This means that D is sometimes less powerful than C++, and every smart programmer can see that it's less powerful, but having 15% less power is a good deal if you have 1/3 of the complexity. As you may guess I don't like C++ (even if I like its efficiency, I'm not going to use it for real coding), so I don't want D to become just a resyntaxed C++. Note that originally D was more like a statically compiled Java, and while that's not perfect, it's probably better than a resyntaxed C++. So I suggest less power, if that greatly decreases what the programmer has to keep in mind while programming. If you don't believe me, take a look at what the blurb of D says:
>
> > D is a systems programming language. Its focus is on combining the power and high performance of C and C++ with the programmer productivity of modern languages like Ruby and Python. Special attention is given to the needs of quality assurance, documentation, management, portability and reliability.
>
> As you see it's a matter of balance.
I know it's a matter of balance. I am not sure what gave you the idea that I didn't.

> 2) Keep in mind that not all D programmers are lovers of C++, and they don't all come from C++; some of them are actually lovers of other languages, like Java, C#, Python, Ruby, Lisp, etc. And as you may expect, they may want to push D to look more like Lisp, Python, C#, etc.

I like many things in all of the languages above. Don't forget it was me who opened Walter to Lisp :o).

> 3) opApply has some problems, but it isn't infamous.
opApply has two problems: 1. It can't save the state of the iteration for later resumption. 2. It uses repeated function calls through a pointer, which is measurably slow. Both disadvantages are major. The first fosters container design without true iteration. That's just bad design. (Look at built-in hashes.) The second is even worse because it makes foreach a toy, useful when you read lines from the console but not in inner loops. Which is kind of ironic, because they are inner *loops*. I think many would agree that foreach minus the disadvantages would be better.

> 4) Syntax matters, and function/method/attribute names too, so they have to be chosen with care. I suggest using single words when possible, and considering all-lowercase names too (I know this contrasts with the D style guide).
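The first opApply limitation described above, iteration state that cannot be saved and resumed, is the classic contrast between internal (callback-driven) and external (resumable) iteration. A minimal Python sketch of that contrast; the names is_empty/next are chosen to echo the range proposal and are illustrative only:

```python
# Internal iteration, opApply-style: the container drives the loop via a
# callback, so the caller cannot pause the traversal and resume it later.
def each(items, body):
    for x in items:
        body(x)

# External iteration, range-style: the caller holds the iteration state
# and may stop, copy, or interleave traversals at will.
class Range:
    def __init__(self, items):
        self._items = list(items)
        self._i = 0

    def is_empty(self):
        return self._i >= len(self._items)

    def next(self):
        x = self._items[self._i]
        self._i += 1
        return x

r = Range([1, 2, 3])
first = r.next()          # consume one element now...
rest = []
while not r.is_empty():   # ...and resume the same traversal later,
    rest.append(r.next()) # which each() cannot express
```

The second limitation (indirect calls per element) is invisible in Python, but in compiled D it is exactly the per-iteration delegate call that the range design removes.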
It's very easy to give very general advice (which, to be honest, your post is replete with). It would help if you followed up with a few concrete points.

> 4b) Copying function/method/attribute names from C++ isn't bad if people think such names are good. Inventing different names just to be different is not good.
>
> 5) The source code of the current algorithm module of D2 is already very complex to follow; it smells of over-generalization here and there. Sometimes it's better to reduce the generality of things, even if that reduces their power a little, to reduce complexity. Tango code isn't perfect either, but it often looks more human. While you have created the algorithm module, I too have created something similar, but based on different grounds.
I am sure you like your library more than mine, because it's your design and your realization. The code in std.algorithm is complex because it implements algorithms that are complex. I know that if someone looked over partition, rotate, topN, or sort without knowing how they work, they wouldn't have an easy job picking it up. That is fine. The purpose of std.algorithm's implementation is not to be a tutorial on algorithms. On occasion std.algorithm does a few flip-flops, but always for a good reason. For example, it takes some effort to allow multiple functions in reduce. Consider:

double[] a = [ 3.0, 4, 7, 11, 3, 2, 5 ];
// Compute minimum and maximum in one pass
auto r = reduce!(min, max)(double.max, -double.max, a);
// The type of r is Tuple!(double, double)
assert(r._0 == 2);  // minimum
assert(r._1 == 11); // maximum

A simpler reduce would have allowed only one function, so I would have had to write:

auto m = reduce!(min)(double.max, a);
auto M = reduce!(max)(-double.max, a);

On the face of it, this looks reasonable. After all, come on, why can't one write two lines instead of one? However, min and max are such simple functions that the cost of computing them is drowned by the cost of looping alone. On large arrays things become onerous, and I would have had to write the loop by hand. How do I know that? Because I measured. Why did I measure? Because I care. Why do I care? Because the typical runtimes of my programs measure in hours of sheer computation, and because things like reduce and others in std.algorithm are in the core loops. And I am not alone. If I can help it, I wouldn't want to write an algorithms library for a systems-level programming language with fundamental limitations that make it unsuitable for efficient computation. Costly abstractions are a dime a dozen. It's efficient abstractions that are harder to come across.
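The trade-off Andrei describes (one fused pass versus two separate loops over the data) can be mirrored in Python for readers who want to experiment outside D; the lambda-based fusion below is an illustration, not std.algorithm's API:

```python
from functools import reduce

a = [3.0, 4, 7, 11, 3, 2, 5]

# Two separate passes, one reduction function each:
m = reduce(min, a, float("inf"))
M = reduce(max, a, float("-inf"))

# One fused pass carrying both accumulators in a tuple, analogous to
# D's reduce!(min, max): the data is traversed only once.
lo, hi = reduce(lambda acc, x: (min(acc[0], x), max(acc[1], x)),
                a, (float("inf"), float("-inf")))
```

Both versions produce the same result; the fused one halves the loop overhead, which is the dominant cost when the per-element work is as cheap as min/max.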
If you believe I am wasting time with cutesies in std.algorithm, please let me know of the places and of the suggested improvements.

> 6) Built-in data types are important, but they aren't meant to replace a good standard library, where you can find more specialized and more efficient data structures. The built-in data types are meant to:
> - offer a very handy syntax, easy to use and remember, and short to type;
> - be efficient in a very wide variety of situations, avoiding really bad performance in a large number of situations, while it's okay for them to be not very fast in any particular situation;
> - be useful, for example, when you have little data, in the 90% of the code where max performance isn't so important. In the other situations you are supposed to be able to import things like an IntrusiveRedBlackHashMap from the std lib.
I agree.

> 7) Take a look at the lazy "views" of keys/values/items of Python 3, and how they fit into your view of such ranges. Java too has something similar. (I can give a summary if you want, but in a few words: if you have an associative array (dict) d, d.keys() doesn't return an array but a "view", a lazy iterable object. This is way more efficient than the .keys/.values of the current D implementation.)
To quote a classic, Lisp has had them all for 50 years. Well-defined ranges are the perfect stepping stone towards lazy computation. In particular, input ranges are really generators. I plan to implement things like lazyMap and lazyReduce, and also the generators that are so popular in functional languages.

> 8) Lots of functions/thingies in my mostly-functional library are designed to be lazy, that is, they generate items on the fly. At the moment D largely lacks such a view of the world. Again, take a good look at how Python 3 has shifted from eager to lazy in much of its style. How do such things fit in your range view? I think they can fit well, but the D language has to shift its philosophy a little to support such lazy generators/iterators more often and commonly, in the built-ins too.
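A hypothetical lazyMap of the kind Andrei mentions behaves much like a Python generator; a sketch (the name lazy_map is invented here for illustration):

```python
import itertools

def lazy_map(f, xs):
    # Produces values only on demand; nothing is computed until the
    # result is iterated, so xs may even be infinite.
    for x in xs:
        yield f(x)

squares = lazy_map(lambda n: n * n, itertools.count(1))  # infinite source
first_four = list(itertools.islice(squares, 4))  # only 4 values computed
```

This is the input-ranges-as-generators correspondence: is_empty/next on an input range plays the same role as the generator protocol here.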
I agree that D could use more lazy iteration instead of eager computation.

> 9) I had a long discussion in the main D newsgroup; a partial conclusion was that a struct of 2 items may not be enough for the current implementation of dynamic arrays, because without a capacity field, append is dead slow. I presume this is orthogonal to your range proposal, but I mention it here because you keep talking about dynamic arrays as 2-pointer structs, and that may change in the future.
I saw that discussion. In my opinion that discussion clarifies why slices shouldn't be conflated with full-fledged containers. Handling storage strategy should not be the job of a slice. That's why there's a need for true containers (including straight block arrays as a particular case). Ranges are perfect for their internal implementation and for iterating over them.

Andrei
Sep 09 2008
Andrei Alexandrescu wrote:
> bearophile wrote:
> > I don't have enough experience to have a clear view on the whole subject, so I write only some general comments:
>
> Uh-oh. That doesn't quite bode well :o).
u could've stopped readin'. general comments without experience are oxpoop. my mailman could give'em.
Sep 09 2008
superdan wrote:
> u could've stopped readin'. general comments without experience are oxpoop. my mailman could give'em.
The thing about iterators and collections is that they look so simple, but getting the right design is fiendishly difficult.
Sep 09 2008
Walter Bright wrote:
> superdan wrote:
> > u could've stopped readin'. general comments without experience are oxpoop. my mailman could give'em.
>
> The thing about iterators and collections is that they look so simple, but getting the right design is fiendishly difficult.
And when a correct design is devised, the mark of genius is that everyone will, in retrospect, think it simple and obvious!
Sep 09 2008
Walter Bright wrote:
> Walter Bright wrote:
> > superdan wrote:
> > > u could've stopped readin'. general comments without experience are oxpoop. my mailman could give'em.
> >
> > The thing about iterators and collections is that they look so simple, but getting the right design is fiendishly difficult.
>
> And when a correct design is devised, the mark of genius is that everyone will, in retrospect, think it simple and obvious!
Also: everyone will measure the quality of that design using a different yardstick. For my money, the best collection design I've ever worked with is the C5 library for .NET, developed at the IT University of Copenhagen:

http://www.itu.dk/research/c5/
http://www.ddj.com/windows/199902700

--benji
Sep 09 2008
bearophile wrote:
> 6) Built-in data types are important, but they aren't meant to replace a good standard library, where you can find more specialized and more efficient data structures. The built-in data types are meant to:
> - offer a very handy syntax, easy to use and remember, and short to type;
> - be efficient in a very wide variety of situations, avoiding really bad performance in a large number of situations, while it's okay for them to be not very fast in any particular situation;
> - be useful, for example, when you have little data, in the 90% of the code where max performance isn't so important. In the other situations you are supposed to be able to import things like an IntrusiveRedBlackHashMap from the std lib.
I'd also add this: a built-in data type can't implement an interface, so there should be one and only one obvious implementation of its interface. For example, it would be a mistake to have a "File" as a built-in type, because a "File" really ought to implement an "InputStream" interface, so that people can write stream-consumer code generically. It's one of the reasons I think dynamic arrays and associative arrays belong in the standard library (as List and Map interfaces) rather than as built-in types.

On a separate note... I'm also a little skeptical about the range proposal (though I'll have to give it more thought, and maybe play with an implementation, before I can come to any firm conclusion). The proposal seems to cover the standard cases of bounded and unbounded forward iteration, reverse iteration, etc. But I can think of a few examples where the concept of "iteration" needs to be even more generalized.

For example, I worked on a project last year (in Java) that used a hidden Markov model to navigate through the nodes of a graph, driving a Monte Carlo simulation. I designed the MarkovModel<T> class to implement the Collection<T> interface so that I could use foreach iteration to crawl through the nodes of the graph. It basically looked like this:

MarkovModel<SimulationState> model = ...;
for (SimulationState state : model) {
    state.doStuff();
}

The cool thing about this is that the simulation would continue running until it reached a natural termination point (a simulation state with no outbound state transition probabilities). Depending upon the particulars of the model, it might never end. And although the ordering of the nodes was significant, it was never deterministic. Iterating through the states of a Markov model is essentially a probabilistically guided random walk through the elements of the collection.

For me, Java iterators made a very natural design choice, since the iterator is such a dirt-simple interface (with just a "hasNext" and a "next" method). How would the D range proposal address the sorts of problems that require non-deterministic iteration? To me, the metaphor of a "range" is nice, but it doesn't cover all the cases I'd like to see in a general-purpose iteration metaphor.

Thanks!

--benji
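The probabilistic random walk described above fits a hasNext/next interface precisely because that interface promises nothing about ordering, determinism, or termination. A toy Python sketch; the two-state chain and all names here are invented for illustration:

```python
import random

class MarkovWalk:
    """Walk states until one with no outbound transitions is reached."""

    def __init__(self, transitions, start, rng):
        self._t = transitions   # state -> list of (next_state, probability)
        self._state = start
        self._rng = rng

    def has_next(self):
        return self._state is not None

    def next(self):
        s = self._state
        out = self._t.get(s, [])
        if not out:
            self._state = None  # terminal state: the walk ends here
        else:
            r = self._rng.random()
            acc = 0.0
            for nxt, p in out:  # pick the next state by cumulative probability
                acc += p
                if r <= acc:
                    self._state = nxt
                    break
        return s

chain = {"A": [("B", 0.5), ("END", 0.5)], "B": [("END", 1.0)]}
walk = MarkovWalk(chain, "A", random.Random(42))
path = []
while walk.has_next():
    path.append(walk.next())
# path always starts at "A" and ends at the terminal state "END",
# but the route between them is probabilistic
```

Renaming has_next to the negation of is_empty gives exactly the input-range shape, which is the point Andrei makes in his reply.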
Sep 09 2008
Benji Smith wrote:
> [...]
>
> For me, Java iterators made a very natural design choice, since the iterator is such a dirt-simple interface (with just a "hasNext" and a "next" method). How would the D range proposal address the sorts of problems that require non-deterministic iteration? To me, the metaphor of a "range" is nice, but it doesn't cover all the cases I'd like to see in a general-purpose iteration metaphor.
Hmm, HMMs :o). If you could do it with Java's hasNext and next, you can do it with D's isEmpty and next. There's no difference.

Andrei
Sep 09 2008
Andrei Alexandrescu wrote:
> Hmm, HMMs :o). If you could do it with Java's hasNext and next, you can do it with D's isEmpty and next. There's no difference.
Oh. Okay. Good to know :)

I guess all the talk about "ranges" has me visualizing contiguous items and sequential ordering and determinism. It's a good word for a well-defined set of items with a true beginning and end, but I wonder whether "cursor" might be a better word than "range", especially for input consumption and non-deterministic iteration.

One thing I definitely like better about the new range proposal is the notion that the container is not responsible for providing iteration logic or, more importantly, maintaining the state of any iteration. I think it'll be a welcome change. Nice work, and I'm looking forward to working with the new stuff :)

--benji
Sep 09 2008
bearophile wrote:
> 5) The source code of the current algorithm module of D2 is already very complex to follow; it smells of over-generalization here and there. Sometimes it's better to reduce the generality of things, even if that reduces their power a little, to reduce complexity. Tango code isn't perfect either, but it often looks more human. While you have created the algorithm module, I too have created something similar, but based on different grounds.
Along these same lines: while D is still young, the documentation is often thin, and code examples are scarce. I've been doing a lot of programming lately with Tango, and despite the growing documentation, I've had to refer directly to the Tango source code on more occasions than I can count. Luckily, the Tango sources are well written and pretty easy to read and understand, and I've had very few problems figuring out how to use the library. I hope the range implementation makes readability a high priority.

--benji
Sep 09 2008
Benji Smith wrote:
> bearophile wrote:
> > 5) The source code of the current algorithm module of D2 is already very complex to follow; it smells of over-generalization here and there. [...]
>
> Along these same lines: while D is still young, the documentation is often thin, and code examples are scarce. I've been doing a lot of programming lately with Tango, and despite the growing documentation, I've had to refer directly to the Tango source code on more occasions than I can count. Luckily, the Tango sources are well written and pretty easy to read and understand, and I've had very few problems figuring out how to use the library. I hope the range implementation makes readability a high priority.
This is a valid concern. The sample ranges I have coded so far look deceptively simple, and that's a good sign. It only takes a dozen lines to write an interesting range. (This is very unlike STL iterators.)

Andrei
Sep 09 2008
Benji Smith wrote:
> bearophile wrote:
> > 5) The source code of the current algorithm module of D2 is already very complex to follow; it smells of over-generalization here and there. [...]
>
> Along these same lines: while D is still young, the documentation is often thin, and code examples are scarce. [...] I hope the range implementation makes readability a high priority.
Speaking of examples and readability, and to tie this in with the discussion on array reallocation, I was curious about an array appender built on the output range interface. I've seen a quite complicated array builder in digitalmars.d. I wanted a simpler appender that should do no worse than the built-in ~= and that works with algorithm2 whenever data is written out. It turned out quite simple, and it improved performance of a large-scale data preprocessing task of mine (which involved reading and parsing about 1M lines of integers) by 15%. I'd be curious how it fares with other tests that you guys may have.

The idea is very simple: just use D's native append operation, but cache the capacity to avoid too many lookups (I understand that that's the bottleneck). I paste the code below; I'd be indebted if you guys grabbed it and tested it.

Andrei

struct ArrayAppender(T)
{
    private T* _buffer;
    private size_t _length;
    private size_t _capacity;

    this(T[] array)
    {
        _buffer = array.ptr;
        _length = array.length;
        if (_buffer) _capacity = .capacity(array.ptr) / T.sizeof;
    }

    size_t capacity() const { return _capacity; }

    void putNext(T item)
    {
        invariant desiredLength = _length + 1;
        if (desiredLength <= _capacity)
        {
            // Should do in-place construction here
            _buffer[_length] = item;
        }
        else
        {
            // Time to reallocate; do it and cache the capacity
            auto tmp = _buffer[0 .. _length];
            tmp ~= item;
            _buffer = tmp.ptr;
            _capacity = .capacity(_buffer) / T.sizeof;
        }
        _length = desiredLength;
    }

    T[] release()
    {
        // Shrink-to-fit
        auto result = cast(T[]) realloc(_buffer, _length * T.sizeof);
        // Relinquish state
        _buffer = null;
        _length = _capacity = 0;
        return result;
    }
}

unittest
{
    auto app = arrayAppender(new char[0]);
    string b = "abcdefg";
    foreach (char c; b) app.putNext(c);
    assert(app.release == "abcdefg");
}
Sep 09 2008
Andrei Alexandrescu wrote:
> Speaking of examples and readability, and to tie this in with the discussion on array reallocation, I was curious about an array appender built on the output range interface. I've seen a quite complicated array builder in digitalmars.d. I wanted a simpler appender that should do no worse than the built-in ~= and that works with algorithm2 whenever data is written out.
That builder was probably mine, designed for D1. For the latest version take a look at the 'builder.d' module in the libs I have shown you. It's not too complex: its API is simple, and its internals are as complex as they have to be to be efficient. (I'm still slowly improving it; I'm now trying to make it more flexible, making its extending functionality work with other kinds of iterables too.)

> I'd be curious how it fares with other tests that you guys may have.
That module has about 380 lines of benchmark code (after a few hundred lines of unit tests); I think you can add a few lines to them to benchmark your implementation, but I presume mine is more efficient :-) I may do a few benchmarks later...

Bye,
bearophile
Sep 10 2008
A few benchmarks, appending ints. Note this is a worst-case situation (other benchmarks are less dramatic): just many appends, followed by the "release".

benchmark 10, N=10_000_000:
  Array append:  0.813 s
  ArrayBuilder:  0.159 s
  ArrayAppender: 1.056 s

benchmark 10, N=100_000_000:
  Array append:  10.887 s
  ArrayBuilder:   1.477 s
  ArrayAppender: 13.305 s

-----------------
Chunk of the code I have used:

struct ArrayAppender(T)
{
    private T* _buffer;
    private size_t _length;
    private size_t _capacity;

    void build(T[] array)
    {
        _buffer = array.ptr;
        _length = array.length;
        //if (_buffer) _capacity = .capacity(array.ptr) / T.sizeof;
    }

    size_t capacity() /* const */ { return _capacity; }

    void putNext(T item)
    {
        /* invariant */ int desiredLength = _length + 1;
        if (desiredLength <= _capacity)
        {
            // Should do in-place construction here
            _buffer[_length] = item;
        }
        else
        {
            // Time to reallocate; do it and cache the capacity
            auto tmp = _buffer[0 .. _length];
            tmp ~= item;
            _buffer = tmp.ptr;
            _capacity = this.capacity() / T.sizeof;
        }
        _length = desiredLength;
    }

    T[] release()
    {
        // Shrink-to-fit
        //auto result = cast(T[]) realloc(_buffer, _length * T.sizeof);
        auto result = cast(T[]) gcrealloc(_buffer, _length * T.sizeof);
        // Relinquish state
        _buffer = null;
        _length = _capacity = 0;
        return result;
    }
}

unittest
{
    auto app = arrayAppender(new char[0]);
    string b = "abcdefg";
    foreach (char c; b) app.putNext(c);
    assert(app.release == "abcdefg");
}

void benchmark10(int ntest, int N)
{
    putr("\nbenchmark 10, N=", thousands(N), ":");
    if (ntest == 0)
    {
        auto t0 = clock();
        int[] a1;
        for (int i; i < N; i++) a1 ~= i;
        auto last1 = a1[$ - 1];
        auto t2 = clock() - t0;
        putr("  Array append:  %.3f", t2, " s ", last1);
    }
    else if (ntest == 1)
    {
        auto t0 = clock();
        ArrayBuilder!(int) a2;
        for (int i; i < N; i++) a2 ~= i;
        auto last2 = a2.toarray[$ - 1];
        auto t2 = clock() - t0;
        putr("  ArrayBuilder:  %.3f", t2, " s ", last2);
    }
    else
    {
        auto t0 = clock();
        ArrayAppender!(int) a3;
        a3.build(null);
        for (int i; i < N; i++) a3.putNext(i);
        auto last3 = a3.release()[$ - 1];
        auto t2 = clock() - t0;
        putr("  ArrayAppender: %.3f", t2, " s ", last3);
    }
}

Bye,
bearophile
Sep 10 2008
bearophile wrote:Few benchmarks, appending ints, note this is a worst-case situation (other benchmarks are less dramatic). Just many appends, followed by the "release":

benchmark 10, N=10_000_000:
  Array append: 0.813 s
  ArrayBuilder: 0.159 s
  ArrayAppender: 1.056 s
benchmark 10, N=100_000_000:
  Array append: 10.887 s
  ArrayBuilder: 1.477 s
  ArrayAppender: 13.305 s
That's odd. The array appender should never, by definition, do significantly worse than the straight array append. I think some other factor intervened (e.g. swapping). Also don't forget to compile with -O -release -inline and to test several times after a warmup run. I adapted your code obtaining the numbers below:

benchmark 10, N=10000000:
  Array append: 0.69 s
  ArrayAppender: 0.19 s
benchmark 10, N=25000000:
  Array append: 2.06 s
  ArrayAppender: 0.82 s
benchmark 10, N=50000000:
  Array append: 4.28 s
  ArrayAppender: 1.75 s
benchmark 10, N=75000000:
  Array append: 9.62 s
  ArrayAppender: 5.8 s
benchmark 10, N=100000000:
  Array append: 11.35 s
  ArrayAppender: 6.20 s

Andrei
Sep 10 2008
Andrei Alexandrescu:That's odd. The array appender should never, by definition, do significantly worse than the straight array append.
But the reality of your code and benchmarks may differ from the abstract definition :-)

I think some other factor intervened (e.g. swapping). Also don't forget to compile with -O -release -inline and to test several times after a warmup run.
I have kept an eye on such things too. Note that benchmarks are generally tricky. I am using DMD v1.035, on a Core Duo 2 GHz, 2 GB RAM, on Win, the code doesn't make HD swap, and timings are warm. My timings are repeatable within 0.02-0.03 seconds on my PC. If you want I can give you the whole testing code, of course. But it's just the builders.d module of my libs plus the added testing code I have shown you. Anyway, in the end it doesn't matter much, I think. Bye, bearophile
Sep 10 2008
bearophile wrote:Andrei Alexandrescu:That's odd. The array appender should never, by definition, do significantly worse than the straight array append.
But the reality of your code and benchmarks may differ from the abstract definition :-)
But it's not abstract, and besides my benchmarks do support my hypothesis. On the common path my code does the minimum amount that any append would do: test, assign at index, bump. On the less common path (invoked an amortized constant number of times) my code does an actual built-in append plus a call to capacity to cache it. Assuming the built-in array does a screaming fast append, my code should be just about as fast, probably insignificantly slower because of the extra straggler operations.

I think some other factor intervened (e.g. swapping). Also don't forget to compile with -O -release -inline and to test several times after a warmup run.
I have kept an eye on such things too. Note that benchmarks are generally tricky. I am using DMD v1.035, on a Core Duo 2 GHz, 2 GB RAM, on Win, the code doesn't make HD swap, and timings are warm. My timings are repeatable within 0.02-0.03 seconds on my PC. If you want I can give you the whole testing code, of course. But it's just the builders.d module of my libs plus the added testing code I have shown you.
But I copied your test code from your post. Then I adapted it to Phobos (replaced putr with writefln, clock with ctime, and such, which shouldn't matter). Then I ran it under Phobos. For the built-in array append we get comparable numbers. So there must be something that makes your numbers for ArrayAppender skewed. I'm saying yours and not mine because mine are in line with expectations and yours aren't. Andrei
Sep 10 2008
Andrei Alexandrescu Wrote:bearophile wrote:Few benchmarks, appending ints, note this is a worst-case situation (other benchmarks are less dramatic). Just many appends, followed by the "release":

benchmark 10, N=10_000_000:
  Array append: 0.813 s
  ArrayBuilder: 0.159 s
  ArrayAppender: 1.056 s
benchmark 10, N=100_000_000:
  Array append: 10.887 s
  ArrayBuilder: 1.477 s
  ArrayAppender: 13.305 s

That's odd. The array appender should never, by definition, do significantly worse than the straight array append. I think some other factor intervened (e.g. swapping). Also don't forget to compile with -O -release -inline and to test several times after a warmup run. I adapted your code obtaining the numbers below:

benchmark 10, N=10000000:
  Array append: 0.69 s
  ArrayAppender: 0.19 s
benchmark 10, N=25000000:
  Array append: 2.06 s
  ArrayAppender: 0.82 s
benchmark 10, N=50000000:
  Array append: 4.28 s
  ArrayAppender: 1.75 s
benchmark 10, N=75000000:
  Array append: 9.62 s
  ArrayAppender: 5.8 s
benchmark 10, N=100000000:
  Array append: 11.35 s
  ArrayAppender: 6.20 s

Andrei
arrayappender is simple as dumb. compare and contrast with the ginormous arraybuilder. yet works great. why? because it is on the right thing. the common case. on the uncommon case it does whatever to get the job done. don't matter if it's rare. not sure anybody saw the irony. it was bearophile advocating simplicity eh. allegedly andre's the complexity guy. code seems to tell nother story. btw doods i figured indexed access is slower than access via pointer. so i recoded arrayappender to only use pointers. there's some half a second savings for the large case. small arrays don't feel it.

struct ArrayAppender(T) {
    private T* _begin;
    private T* _end;
    private T* _eos;

    this(T[] array) {
        _begin = array.ptr;
        _end = _begin + array.length;
        if (_begin) _eos = _begin + .capacity(_begin) / T.sizeof;
    }

    size_t capacity() const { return _eos - _begin; }

    void putNext(T item) {
        if (_end < _eos) {
            *_end++ = item;
        } else {
            auto tmp = _begin[0 .. _end - _begin];
            tmp ~= item;
            _begin = tmp.ptr;
            _end = _begin + tmp.length;
            _eos = _begin + .capacity(_begin) / T.sizeof;
        }
    }

    T[] releaseArray() {
        auto result = _begin[0 .. _end - _begin];
        _begin = _end = _eos = null;
        return result;
    }
}
Sep 10 2008
superdan wrote:Andrei Alexandrescu Wrote:bearophile wrote:Few benchmarks, appending ints, note this is a worst-case situation (other benchmarks are less dramatic). Just many appends, followed by the "release":

benchmark 10, N=10_000_000:
  Array append: 0.813 s
  ArrayBuilder: 0.159 s
  ArrayAppender: 1.056 s
benchmark 10, N=100_000_000:
  Array append: 10.887 s
  ArrayBuilder: 1.477 s
  ArrayAppender: 13.305 s

That's odd. The array appender should never, by definition, do significantly worse than the straight array append. I think some other factor intervened (e.g. swapping). Also don't forget to compile with -O -release -inline and to test several times after a warmup run. I adapted your code obtaining the numbers below:

benchmark 10, N=10000000:
  Array append: 0.69 s
  ArrayAppender: 0.19 s
benchmark 10, N=25000000:
  Array append: 2.06 s
  ArrayAppender: 0.82 s
benchmark 10, N=50000000:
  Array append: 4.28 s
  ArrayAppender: 1.75 s
benchmark 10, N=75000000:
  Array append: 9.62 s
  ArrayAppender: 5.8 s
benchmark 10, N=100000000:
  Array append: 11.35 s
  ArrayAppender: 6.20 s

Andrei

arrayappender is simple as dumb. compare and contrast with the ginormous arraybuilder. yet works great. why? because it is on the right thing. the common case. on the uncommon case it does whatever to get the job done. don't matter if it's rare. not sure anybody saw the irony. it was bearophile advocating simplicity eh. allegedly andre's the complexity guy. code seems to tell nother story. btw doods i figured indexed access is slower than access via pointer. so i recoded arrayappender to only use pointers. there's some half a second savings for the large case. small arrays don't feel it.

struct ArrayAppender(T) {
    private T* _begin;
    private T* _end;
    private T* _eos;

    this(T[] array) {
        _begin = array.ptr;
        _end = _begin + array.length;
        if (_begin) _eos = _begin + .capacity(_begin) / T.sizeof;
    }

    size_t capacity() const { return _eos - _begin; }

    void putNext(T item) {
        if (_end < _eos) {
            *_end++ = item;
        } else {
            auto tmp = _begin[0 .. _end - _begin];
            tmp ~= item;
            _begin = tmp.ptr;
            _end = _begin + tmp.length;
            _eos = _begin + .capacity(_begin) / T.sizeof;
        }
    }

    T[] releaseArray() {
        auto result = _begin[0 .. _end - _begin];
        _begin = _end = _eos = null;
        return result;
    }
}
Thanks! Can't hurt. Guess I'll integrate your code if you don't mind. I gave it some more thought and I have a theory for the root of the issue. My implementation assumes there's exponential (multiplicative) increase of capacity in the built-in ~=. I hope Walter wouldn't do anything else. If there are differences in growth strategies between D1 and D2, that could explain the difference between bearophile's benchmarks and mine. Andrei
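Andrei's amortization argument can be sketched with a toy model (hypothetical code, not the actual runtime allocator): if every reallocation doubles the capacity, the total number of elements copied over n appends is bounded by 2n, so each append costs O(1) amortized.

    // Hypothetical model of geometric (doubling) growth.
    // Total copy work over n appends is 1 + 2 + 4 + ... < 2n,
    // hence constant amortized cost per append.
    size_t totalCopies(size_t n)
    {
        size_t cap = 1, len = 0, copies = 0;
        for (size_t i = 0; i < n; i++)
        {
            if (len == cap)
            {
                copies += len; // a reallocation copies every existing element
                cap *= 2;      // geometric growth buys room for as many again
            }
            len++;
        }
        return copies; // always bounded by 2 * n
    }

Any constant growth factor greater than 1 (e.g. 1.5) gives the same asymptotic result; only the constant changes. A constant *increment* instead would make the sum quadratic, which is the failure mode Andrei worries about below.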
Sep 10 2008
Andrei Alexandrescu:I hope Walter wouldn't do anything else. If there are differences in growth strategies between D1 and D2, that could explain the difference between bearophile's benchmarks and mine.
You can't compare benchmarks of two different compilers. (Your code doesn't work (on D1) if T is a static array, and using ~= looks like a better interface for the append. Your code doesn't append arrays, so you have to call it many times if you want to append strings, which (in D1) is a very common case.) Bye, bearophile
Sep 10 2008
bearophile:(Your code doesn't work (on D1) if T is a static array,
Sorry, ignore what I have written, I'm a little nervous... Bye, bearophile
Sep 10 2008
bearophile wrote:bearophile:(Your code doesn't work (on D1) if T is a static array,
Sorry, ignore what I have written, I'm a little nervous...
I think I've unnecessarily overstated my case, which has put both of us in defensive. You are very right that tests on D1 and D2 are not comparable. And Walter has at one point made crucial changes in the allocator at my behest. Specifically, he introduced in-place growth whenever possible and added the expand() primitive to the gc. These are present in D2 but I don't know whether and when he has regressed those to D1. (And btw why wouldn't you try it :o).) Andrei
Sep 10 2008
Andrei Alexandrescu wrote:bearophile wrote:bearophile:(Your code doesn't work (on D1) if T is a static array,
Sorry, ignore what I have written, I'm a little nervous...
I think I've unnecessarily overstated my case, which has put both of us in defensive. You are very right that tests on D1 and D2 are not comparable. And Walter has at one point made crucial changes in the allocator at my behest. Specifically, he introduced in-place growth whenever possible and added the expand() primitive to the gc. These are present in D2 but I don't know whether and when he has regressed those to D1. (And btw why wouldn't you try it :o).)
For the record, in-place growth has been in D1 for as long as it's been in D2. Sean
Sep 10 2008
Andrei Alexandrescu wrote:I gave it some more thought and I have a theory for the root of the issue. My implementation assumes there's exponential (multiplicative) increase of capacity in the built-in ~=. I hope Walter wouldn't do anything else. If there are differences in growth strategies between D1 and D2, that could explain the difference between bearophile's benchmarks and mine.
Arrays larger than 4k grow logarithmically, smaller than 4k they grow exponentially. This is certainly true of D1 and Tango, and I'd assume D2 is no different. Sean
Sep 10 2008
Sean Kelly wrote:Andrei Alexandrescu wrote:I gave it some more thought and I have a theory for the root of the issue. My implementation assumes there's exponential (multiplicative) increase of capacity in the built-in ~=. I hope Walter wouldn't do anything else. If there are differences in growth strategies between D1 and D2, that could explain the difference between bearophile's benchmarks and mine.
Arrays larger than 4k grow logarithmically, smaller than 4k they grow exponentially. This is certainly true of D1 and Tango, and I'd assume D2 is no different.
Yes, but with in-place expansion, effective growth stays exponential even beyond 4k. So I'm not that worried that it could become quadratic, unless there are some really wicked workloads. I modified my putNext to print what's happening:

void putNext(T item) {
    if (_end < _eos) {
        // Should do in-place construction here
        *_end++ = item;
    } else {
        // Time to reallocate, do it and cache capacity
        auto tmp = _begin[0 .. _end - _begin];
        tmp ~= item;
        if (_begin != tmp.ptr) {
            _begin = tmp.ptr;
            _end = _begin + tmp.length;
            writeln(_end - _begin);
        } else {
            ++_end;
        }
        _eos = _begin + .capacity(_begin) / T.sizeof;
    }
}

Notice the writeln. On my system the console reads:

benchmark 10, N=75000000:
1
5
9
17
33
65
129
257
513
253953
1155073
4743169
18505729
68861953

That's exponential alright. On the other hand, if you move the writeln after the if, indeed:

1 5 9 17 33 65 129 257 513 1025 2049 3073 4097 5121 6145 7169 8193 9217 10241 11265 12289

But it's the former column that matters, because moving chunks is where real work is being done. In-place block expansion should take constant time. With this behavior in place, my code amortizes calls to capacity() by a factor of 1024, and keeps amortized append complexity constant. Andrei
Sep 10 2008
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
Benji Smith wrote:bearophile wrote:5) The source code of the current algorithm module of D2 is already very complex to follow, it smells of over-generalization here and there. Sometimes it's better to reduce the generality of things, even if that reduces their power a little, to reduce complexity, etc. Tango code too isn't perfect, but it often looks more human. While you have created the algorithm module I too have created something similar, but based on different grounds.
Along these same lines, while D is still young, the documentation is often thin, and code examples are scarce. I've been doing a lot of programming lately with Tango, and despite the growing documentation, I've had to refer directly to the Tango source code on more occasions than I can count. Luckily, the Tango sources are well-written and pretty easy to read and understand, and I've had very few problems figuring out how to use the library. I hope the range implementation makes readability a high priority.
discussion on array reallocation, I was curious on an array appender built on the output range interface. I've seen a quite complicated array builder in digitalmars.d. I wanted a simpler appender that should not do worse than the built-in ~= and that works with algorithm2 whenever data is written out. It turned out quite simple and it improved performance of a large-scale data preprocessing task of mine (which involved reading and parsing about 1M lines of integers) by 15%. I'd be curious how it fares with other tests that you guys may have. The idea is very simple: just use D's native append operation, but cache the capacity to avoid too many lookups (I understand that that's the bottleneck). I paste the code below, I'd be indebted if you guys grabbed it and tested it. Andrei

struct ArrayAppender(T) {
    private T* _buffer;
    private size_t _length;
    private size_t _capacity;

    this(T[] array) {
        _buffer = array.ptr;
        _length = array.length;
        if (_buffer) _capacity = .capacity(array.ptr) / T.sizeof;
    }

    size_t capacity() const { return _capacity; }

    void putNext(T item) {
        invariant desiredLength = _length + 1;
        if (desiredLength <= _capacity) {
            // Should do in-place construction here
            _buffer[_length] = item;
        } else {
            // Time to reallocate, do it and cache capacity
            auto tmp = _buffer[0 .. _length];
            tmp ~= item;
            _buffer = tmp.ptr;
            _capacity = .capacity(_buffer) / T.sizeof;
        }
        _length = desiredLength;
    }

    T[] release() {
        // Shrink-to-fit
        auto result = cast(T[]) realloc(_buffer, _length * T.sizeof);
        // Relinquish state
        _buffer = null;
        _length = _capacity = 0;
        return result;
    }
}

unittest {
    auto app = arrayAppender(new char[0]);
    string b = "abcdefg";
    foreach (char c; b) app.putNext(c);
    assert(app.release == "abcdefg");
}
One definite problem that I've just realized is that there's no putNext(T[]). What if you need to append another array to your ArrayAppender, not just a single element?
Jan 08 2009
dsimcha:One definite problem that I've just realized is that there's no putNext(T[]). What if you need to append another array to your ArrayAppender, not just a single element?
Just add a simple method overload for that purpose, it's easy enough to do. (the ArrayBuilder of my dlibs has this already, of course). Bye, bearophile
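The overload bearophile suggests could look like the following minimal sketch (hypothetical type and names; it omits the capacity caching that the real ArrayAppender does, relying on plain ~= for both cases):

    // Hypothetical appender with both element and slice overloads.
    struct SimpleAppender(T) {
        private T[] _data;

        // Output-range style: put one element and make a step.
        void putNext(T item) { _data ~= item; }

        // Overload for appending a whole slice in one call,
        // addressing dsimcha's putNext(T[]) request.
        void putNext(T[] items) { _data ~= items; }

        T[] release() {
            auto result = _data;
            _data = null;
            return result;
        }
    }

Since D's ~= already accepts both an element and an array on the right-hand side, the slice overload is a one-liner; the only subtlety is ambiguity when T is itself an array type.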
Jan 08 2009
dsimcha Wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
The idea is very simple: just use D's native append operation, but cache the capacity to avoid too many lookups (I understand that that's the bottleneck). I paste the code below, I'd be indebted if you guys grabbed it and tested it. Andrei

struct ArrayAppender(T) {
    size_t capacity() const { return _capacity; }
    void putNext(T item)
I have thoughts:
1) There should probably be a length/size call
2) How about add() or append() for a shorter name
3) What about using ~=? Maybe this is too short...
Jerry
Jan 08 2009
jq wrote:dsimcha Wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
The idea is very simple: just use D's native append operation, but cache the capacity to avoid too many lookups (I understand that that's the bottleneck). I paste the code below, I'd be indebted if you guys grabbed it and tested it. Andrei

struct ArrayAppender(T) {
    size_t capacity() const { return _capacity; }
    void putNext(T item)
I have thoughts:
1) There should probably be a length/size call
2) How about add() or append() for a shorter name
3) What about using ~=? Maybe this is too short...
Length sounds good. The other two I'm more hesitant about because ArrayAppender supports the interface of an output range. The output range only allows putting one element and making a step simultaneously, hence putNext. The ~= is also a bit of an unfortunate choice because it's odd to define a type with ~= but no meaningful/desirable binary ~. Andrei
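The output-range contract Andrei describes (put one element and make a step, in a single call) is what lets generic algorithms write into any sink without knowing it is an array. A hypothetical sketch of such a consumer:

    // Hypothetical generic algorithm: any type exposing putNext(T)
    // can serve as the destination, be it an ArrayAppender, a file
    // writer, or a counter.
    void copyTo(Source, Sink)(Source src, ref Sink dst)
    {
        foreach (e; src)
            dst.putNext(e); // write one element and advance, in one call
    }

With this shape, copyTo("abcdefg", app) would feed each char through putNext exactly like the unittest earlier in the thread, which is why keeping the putNext name (rather than ~= or append) preserves the uniform output-range interface.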
Jan 09 2009
dsimcha wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
One definite problem that I've just realized is that there's no putNext(T[]). What if you need to append another array to your ArrayAppender, not just a single element?
My current codebase has that. It's about time I commit. Andrei
Jan 09 2009