www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - dcollections version 0.02

reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
changes:

 * utilizes chunk allocator for more efficient memory management (Tango 
only)
 * includes a script to build as a library to make setting up your 
environment easier
 * fix to hash implementation for efficiency
 * other bug fixes

-Steve 
Aug 04 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Steven Schveighoffer" wrote
 changes:

 * utilizes chunk allocator for more efficient memory management (Tango 
 only)
 * includes a script to build as a library to make setting up your 
 environment easier
 * fix to hash implementation for efficiency
 * other bug fixes

And the link :) http://www.dsource.org/projects/dcollections -Steve
Aug 04 2008
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
Nice work.  This brings to the table a lot of stuff that I really felt was
missing
in D2.  One quick question, though.  Given that associative arrays are built-in
and implemented as hash tables, are there any tradeoffs dcollections makes
differently than the builtins that would make me choose one over the other?  As
a
hypothetical example, is the dcollections HashMap more tuned for space at the
expense of speed, or vice-versa than the builtin?
Aug 05 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"dsimcha" wrote
 Nice work.  This brings to the table a lot of stuff that I really felt was 
 missing
 in D2.  One quick question, though.  Given that associative arrays are 
 built-in
 and implemented as hash tables, are there any tradeoffs dcollections makes
 differently than the builtins that would make me choose one over the 
 other?  As a
 hypothetical example, is the dcollections HashMap more tuned for space at 
 the
 expense of speed, or vice-versa than the builtin?

Two main reasons you might choose HashMap over the builtin. 1. customizable. You can customize down to the hash implementation. 2. api. HashMap offers a lot more features than the builtin, such as removal while traversing, or keeping a cursor to a specific element for later use and possible removal. The ArrayList only really exists to implement the list interface as an array-style container. I made it really easy to switch between using the builtin array type and the ArrayList. There is the performance benefit, but you must be using Tango (and therefore D1), as Phobos does not provide a necessary GC function needed for the custom allocator. At that point, you can just use the Tango containers (although there are some features that the tango containers don't have). On a side note, have you used dcollections with D2? I haven't tested it. And it's not const-ified. -Steve
Aug 05 2008
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 On a side note, have you used dcollections with D2?  I haven't tested it.
 And it's not const-ified.
 -Steve

Actually, yes, I thought that was part of the point of the release was D2 support. I've tried it very little, but it at least compiles seems to work. One small hitch: You have to do a find/replace, find all instances of "int opEquals" and replace with "bool opEquals". This can be done perfectly as an automated search. Anyhow, D2 support is the reason I'm not using Tango.
Aug 05 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"dsimcha" wrote
 == Quote from Steven Schveighoffer article
 On a side note, have you used dcollections with D2?  I haven't tested it.
 And it's not const-ified.
 -Steve

Actually, yes, I thought that was part of the point of the release was D2 support. I've tried it very little, but it at least compiles seems to work. One small hitch: You have to do a find/replace, find all instances of "int opEquals" and replace with "bool opEquals". This can be done perfectly as an automated search.

Yeah, it should work (with the opEquals change) as long as you don't need a const container to work :) Good to know that it builds at least. The only thing is that just because the library 'builds' doesn't mean it all works :) The classes/structs are all templates and so won't really 'compile' until you use them. But if you could just build the examples, and let me know if they work, I'd appreciate it! And BTW, the point of the release was to add the library building scripts, and fix some bugs (as discussed in the start of this thread).
 Anyhow, D2 support is the reason I'm not using Tango.

I'm on the opposite side, not using D2 because Tango can't build with it (yet) :) As soon as Tango builds with D2, I'll be switching over. And then I'll probably migrate dcollections to D2. -Steve
Aug 05 2008
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
 But if you could just build the examples, and let me know if they work, I'd
 appreciate it!

Actually, they all work with only very minor tweaks to the test files and no tweaks to the library files except the int opEquals/bool opEquals I mentioned previously. The tweaks are little things like: 1. Use writefln instead of Stdout, since I can't use Tango w/ D2 yet. 2. Explicitly cast some classes to their base class before passing to template functions b/c apparently (IDK why) D2 template instantiation has some quirks when used w/ inheritance that D1 doesn't have. I've attached the slightly tweaked test files. I did all the tweaks quick and dirty b/c I didn't want to waste too much time, for example, formatting the printing, but you can diff them against the origninals and see that almost nothing has changed.
Aug 05 2008
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"dsimcha" wrote
 But if you could just build the examples, and let me know if they work, 
 I'd
 appreciate it!

Actually, they all work with only very minor tweaks to the test files and no tweaks to the library files except the int opEquals/bool opEquals I mentioned previously. The tweaks are little things like:

Cool thanks!
 1.  Use writefln instead of Stdout, since I can't use Tango w/ D2 yet.

Right, I forgot about that...
 2.  Explicitly cast some classes to their base class before passing to 
 template
 functions b/c apparently (IDK why) D2 template instantiation has some 
 quirks when
 used w/ inheritance that D1 doesn't have.

That's odd... There probably should be a bug filed on that. I'll investigate further.
 I've attached the slightly tweaked test files.  I did all the tweaks quick 
 and
 dirty b/c I didn't want to waste too much time, for example, formatting 
 the
 printing, but you can diff them against the origninals and see that almost 
 nothing
 has changed.

Yes, looks good, thanks again! -Steve
Aug 05 2008
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:
 The only thing is that just because 
 the library 'builds' doesn't mean it all works :)  The classes/structs are 
 all templates and so won't really 'compile' until you use them.

But D unittests exists to test all templates too! Is an efficient deque too included? I often find the need for it (or just a stack). I have partially written a deque data structure (implemented as dynamic array of pointers to fixed size arrays, it's not an unrolled linked list), I may finish it to add it to your collections... Bye, bearophile
Aug 05 2008
next sibling parent reply Lars Ivar Igesund <larsivar igesund.net> writes:
bearophile wrote:

 Steven Schveighoffer:
 The only thing is that just because
 the library 'builds' doesn't mean it all works :)  The classes/structs
 are all templates and so won't really 'compile' until you use them.

But D unittests exists to test all templates too! Is an efficient deque too included? I often find the need for it (or just a stack). I have partially written a deque data structure (implemented as dynamic array of pointers to fixed size arrays, it's not an unrolled linked list), I may finish it to add it to your collections... Bye, bearophile

You can find a very efficient stack in the new Tango containers. -- Lars Ivar Igesund blog at http://larsivi.net DSource, #d.tango & #D: larsivi Dancing the Tango
Aug 06 2008
next sibling parent reply maelp <mael.primet gmail.com> writes:
 You can find a very efficient stack in the new Tango containers.

This raises the questions: a. How does the structures from dcollection and Tango compare, in terms of integration to tango, speed, memory efficiency, richness of API? b. Why aren't the efforts of both libraries combined ? Dcollection is maybe not mature enough ? Are there plans to merge them if dcollections happen to be more efficient ?
Aug 06 2008
next sibling parent Lars Ivar Igesund <larsivar igesund.net> writes:
maelp wrote:

 You can find a very efficient stack in the new Tango containers.

This raises the questions: a. How does the structures from dcollection and Tango compare, in terms of integration to tango, speed, memory efficiency, richness of API? b. Why aren't the efforts of both libraries combined ? Dcollection is maybe not mature enough ? Are there plans to merge them if dcollections happen to be more efficient ?

Implementation wise, the approaches are quite different, thus it is not likely that it is easy to merge much. It is possible that dcollections has a richer API (I don't know), but Tango's is implemented to be extemely fast and memory efficient (and have a fairly rich API). In terms of age, dcollections is actually slightly older than tango.util.container, the tango.util.collection package in Tango will be deprecated. Steven is probably best suited to answer further questions, but he _has_ contributed to the new Tango containers. -- Lars Ivar Igesund blog at http://larsivi.net DSource, #d.tango & #D: larsivi Dancing the Tango
Aug 06 2008
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"maelp" wrote
 You can find a very efficient stack in the new Tango containers.

This raises the questions: a. How does the structures from dcollection and Tango compare, in terms of integration to tango, speed, memory efficiency, richness of API?

At the moment, Tango is more efficient for speed when using keys or values that do not have GC pointers. When using keys or values that do have GC pointers, dcollections is more efficient, however, i have submitted a ticket for Tango to include the allocator I use in dcollections to make this more efficient, and then I think Tango will have the edge in performance. However, I plan at some point to implement the same allocators in dcollections that Tango uses for the non-GC pointer types. At this time, Tango probably still will have the edge in performance because of the design of it.
 b. Why aren't the efforts of both libraries combined ? Dcollection is 
 maybe not mature enough ? Are there plans to merge them if dcollections 
 happen to be more efficient ?

The two libraries have different design goals. My design is much closer to C++'s containers, and I made sure all of dcollection's containers can be iterated in both directions, and the cursors were the main mode of operation when operating on the containers. Tango's containers focus on speed and efficiency, while ease of iteration is a secondary goal (I'm only guessing here, as I didn't design it, but it appears that way). However, as a contributor to Tango, I'll always try to contribute any new ideas I have with dcollections as long as it fits within the design of Tango's containers. I doubt that dcollections will ever be more efficient than Tango's containers, but I still plan on maintaining it because I prefer the interface. Complexity-wise, they should be about the same. -Steve
Aug 06 2008
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Lars Ivar Igesund:
 You can find a very efficient stack in the new Tango containers.

I see... I'll try to port that code to Phobos then, thank you (but I'll keep my deque, because sometimes I need more than a stack. I'll add the deque to my libs when I can). Bye, bearophile
Aug 06 2008
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"bearophile" wrote
 Steven Schveighoffer:
 The only thing is that just because
 the library 'builds' doesn't mean it all works :)  The classes/structs 
 are
 all templates and so won't really 'compile' until you use them.

But D unittests exists to test all templates too! Is an efficient deque too included? I often find the need for it (or just a stack). I have partially written a deque data structure (implemented as dynamic array of pointers to fixed size arrays, it's not an unrolled linked list), I may finish it to add it to your collections...

I'm assuming you are talking about STL's deque? I don't have anything exactly. You can implement a stack with the LinkList class. If you end up finishing your deque, I'll gladly take a look at adding it to dcollections. The ArrayMultiset is implemented as a linked list of dynamic arrays, a similar approach would probably work for a deque. -Steve
Aug 06 2008
parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Steven Schveighoffer"
 "bearophile" wrote
 Steven Schveighoffer:
 The only thing is that just because
 the library 'builds' doesn't mean it all works :)  The classes/structs 
 are
 all templates and so won't really 'compile' until you use them.

But D unittests exists to test all templates too! Is an efficient deque too included? I often find the need for it (or just a stack). I have partially written a deque data structure (implemented as dynamic array of pointers to fixed size arrays, it's not an unrolled linked list), I may finish it to add it to your collections...

I'm assuming you are talking about STL's deque? I don't have anything exactly. You can implement a stack with the LinkList class. If you end up finishing your deque, I'll gladly take a look at adding it to dcollections. The ArrayMultiset is implemented as a linked list of dynamic arrays, a similar approach would probably work for a deque.

Actually, it wouldn't work to provide O(1) lookup. I think you need probably 2 arrays of arrays to provide the O(1) prepend, one that goes 'down' instead of 'up'. I'll think about how this could be implemented. Maybe I'll take a look at the design of gcc's deque. -Steve
Aug 06 2008
parent bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:
 I'm assuming you are talking about STL's deque?

I am talking about Python collections.deque: http://docs.python.org/lib/deque-objects.html The purpose of my libs is yet another, (often but not always) to mimic the ease of use of Python ;-)
 I don't have anything exactly.  You can implement a stack with the LinkList
class.

A normal linked list may be too much slow for this because it allocates too many small structures, and iterating on it can be slow for low space efficiency and low cache coherence.
 If you end up finishing your deque, I'll gladly take a look at adding it to 
 dcollections.

Very good :-) But I don't know how much well its style can fit in. For example I have already written an hash set into my libs, but its API is modeled around the Python sets API...
 Actually, it wouldn't work to provide O(1) lookup.  I think you need 
 probably 2 arrays of arrays to provide the O(1) prepend, one that goes 
 'down' instead of 'up'.
 I'll think about how this could be implemented.

I think there are two good ways to implement a mutable deque data structure: 1) Dynamic array of pointers to fixed-size arrays that I call blocks. 2) Unrolled linked list (double linked, usually), that is a chain of fixed-size arrays, where you add/remove items only at the tail/head and not in the middle. Image of the first implementation: http://pages.cpsc.ucalgary.ca/~kremer/STL/1024x768/deque.jpeg (If you are using immutable data structures you need much more complex stuff, like finger trees). Python deque is implemented as the second one, the deque of the STL is often the first one. They have different advantages and disadvantages, so ideally you may create a collection that includes both implementations (but I presume no one does this). And both can be tuned changing the block size (a smart data structure can keep few runtime statistics of its usage and adapt its block size to the specific usage as time goes on, I'll think about this). They are both fast in iteration, probably just about 2-2.5 times slower than iterating over a normal array. The second one is probably a bit simpler to implement. The bigger difference is that in the second one the access time to random items isn't O(1), but if the blocks are large enough this isn't a big problem. The first implementation may be a bit slower for the normal deque usage, because once in a while you have to reset/tidy the main dynamic array (see below). In both implementations you have to keep two pointers or two lengths that keep how much items are contained into the first and last fixed-size arrays (in a generic unrolled linked list implementation you can have holes everywhere, so you have to store how many items are present in each block, and you have to merge/split adjacent blocks too much empty/filled up, this makes the management less easy. A deque makes things simpler). In the first implementation the dynamic array can be managed as a circular deque, so you may need a modulus operation each time you access an item randomly, this requires a bit of time more than accessing items sequentially in the second implementation. When the circular deque is too much empty/filled up, you have to create a new one and copy data, but this generally requires little time because this dynamic arrays is 300/1000/10000 times smaller than the total number of items inserted into the deque. So there's probably no need to invent more complex management schemes for this, like you say. Bye, bearophile
Aug 06 2008
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"bearophile" wrote
 Steven Schveighoffer:
 The only thing is that just because
 the library 'builds' doesn't mean it all works :)  The classes/structs 
 are
 all templates and so won't really 'compile' until you use them.

But D unittests exists to test all templates too!

Yeah, dcollections is pretty lacking in unittests :( That's one of my TODOs -Steve
Aug 06 2008
prev sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Wed, Aug 6, 2008 at 11:04 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 "Steven Schveighoffer"
 "bearophile" wrote
 Steven Schveighoffer:
 The only thing is that just because
 the library 'builds' doesn't mean it all works :)  The classes/structs
 are
 all templates and so won't really 'compile' until you use them.

But D unittests exists to test all templates too! Is an efficient deque too included? I often find the need for it (or just a stack). I have partially written a deque data structure (implemented as dynamic array of pointers to fixed size arrays, it's not an unrolled linked list), I may finish it to add it to your collections...

I'm assuming you are talking about STL's deque? I don't have anything exactly. You can implement a stack with the LinkList class. If you end up finishing your deque, I'll gladly take a look at adding it to dcollections. The ArrayMultiset is implemented as a linked list of dynamic arrays, a similar approach would probably work for a deque.

Actually, it wouldn't work to provide O(1) lookup. I think you need probably 2 arrays of arrays to provide the O(1) prepend, one that goes 'down' instead of 'up'. I'll think about how this could be implemented. Maybe I'll take a look at the design of gcc's deque.

I think the wikipedia article on deque implementations is actually pretty good. One way to implement is to use a circular buffer internally. You can tack on elements at either end of the empty space. It also works very efficiently for the case where you are hovering around a constant number of elements. No re-allocs needed in that case. And all your extra capacity can always be used for either end of the queue. When you are forced to realloc you can shift elements at the same time so they're all at the front or back end of the new buffer. Anyway there are various ways to do it, but that was the implementation that sounded coolest to me. Most of the others don't do well with the case of stead flow in and steady flow out. They force either copying or reallocations. --bb
Aug 06 2008