www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.container update

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I just implemented a singly-linked list type to illustrate the container 
abstraction.

http://erdani.com/d/phobos/std_container.html
http://erdani.com/d/phobos/container.d

One interesting aspect is that SList must cooperate with Take. More 
details are to be found in code, but essentially SList ranges are all 
sentinel-terminated, which makes them "right-bound". If you want to only 
remove a few elements from the middle of a list, you need to construct a 
range that spans a limited portion of the list. I encoded that by using 
the Take abstraction in std.range.

Please let me know of how you find it. There are a few refinements and 
ancillary functions to define, but those are pretty clear given the rest.


Andrei
May 26 2010
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 May 2010 00:36:24 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I just implemented a singly-linked list type to illustrate the container  
 abstraction.

 http://erdani.com/d/phobos/std_container.html
 http://erdani.com/d/phobos/container.d

 One interesting aspect is that SList must cooperate with Take. More  
 details are to be found in code, but essentially SList ranges are all  
 sentinel-terminated, which makes them "right-bound". If you want to only  
 remove a few elements from the middle of a list, you need to construct a  
 range that spans a limited portion of the list. I encoded that by using  
 the Take abstraction in std.range.

 Please let me know of how you find it. There are a few refinements and  
 ancillary functions to define, but those are pretty clear given the rest.

Well, one thing I don't like about it is that you cannot do O(1) insertion or removal. insertBefore requires O(n) complexity and insertAfter requires O(m) complexity. Removal will require O(n) complexity, even though it doesn't exist right now. Fast removal and insertion are key to having a linked list. I'd suggest modifying the range so it uses a Node ** instead, which points to the _next element of the previous node of the one currently being iterated, or _root if it's the first node in the list. -Steve
May 27 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/27/2010 08:23 AM, Steven Schveighoffer wrote:

You're making a number of great points in the four posts starting with
this. Since they sort of augment one another, let me quote and answer
them all here.

 On Thu, 27 May 2010 00:36:24 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 I just implemented a singly-linked list type to illustrate the
 container abstraction.

 http://erdani.com/d/phobos/std_container.html
 http://erdani.com/d/phobos/container.d

 Please let me know of how you find it. There are a few refinements
  and ancillary functions to define, but those are pretty clear
 given the rest.

Well, one thing I don't like about it is that you cannot do O(1) insertion or removal. insertBefore requires O(n) complexity and insertAfter requires O(m) complexity. Removal will require O(n) complexity, even though it doesn't exist right now. Fast removal and insertion are key to having a linked list. I'd suggest modifying the range so it uses a Node ** instead, which points to the _next element of the previous node of the one currently being iterated, or _root if it's the first node in the list. -Steve

The performance profile of SList is what one would expect for a straightforward singly-linked list implemented with nodes, pointer to root, and the such. You are making a good point that adding one 1-2 extra indirections would lead to improvements of certain primitives. There have been various experiments with STL-like slist implementations that dwell on such ideas. See e.g. http://manpages.ubuntu.com/manpages/lucid/man3/std::forward_list.3cxx.html with functions like before_begin() in addition to begin(), and also insert_after() instead of insert(). It has become clear to STL implementors that adding hidden indirections puts forward_list at a disadvantage compared to the straightforward lists against which they compete. I want to avoid that by abiding to the straight storage strategy and deal with its consequences. Second post:
 In fact, I'd say that it would be critical to split the insert and
 remove functions to slow (O(n) or greater) versions and fast( O(lg(n)
 or less)) functions.  Some algorithms may depend on fast insertion
 and removal, such as mergesort on a linked list.

This is a good idea, which generalizes linearRemove(). I'd suspected if there's one place to distinguish linear from better, then there ought to be more. From here we have a fork in the road: a) Tighten complexity requirements for e.g. insertBefore(Range, Stuff) and let containers that can't implement them just not have them. b) Define linearInsertBefore(Range, Stuff) etc. I'm leaning towards (a) because a list can insert virtually anywhere by only using insertAfter(Take!Range, Stuff) (which I haven't defined yet but I will, and which also takes care of the complexity issue) and insertFront(Stuff). Third post:
 Actualy mergesort could not be implemented without some sort of way
 to restructure the list without allocating new nodes, requiring
 knowledge of the link structure.  I'm not sure it could be
 generic... I'll think about this a bit.

You are entirely right. A mergesort that relies on link shuffling needs to have intimate knowledge of the node structure. It would most likely be a member of the list. Fourth post:
 I have another general question in general about these collections.

 Ranges fix a very bad problem with iterators -- non-matching begin
 and end iterators.  For example, passing a begin from one list and
 and end from another.

Right.
 However, all your range functions such as remove, insertBefore, etc.
 are called via the container type, using a range as an argument.  But
 there can be no static verification that a range actually is part of
 a given container.  Should there be some requirement that a container
 validates that a range is part of the container before performing the
 operation?  This is the case for dcollections.

 In your current implementation of SList, verification seems
 incidental for insertBefore because you must find the previous node
 from the root anyways (and that will throw on enforce if it cannot
 find it).  But insertAfter does not, so something like this will
 compile, run, and result in strange behavior:

 auto sl1 = make(SList!int, 1, 2, 3, 4, 5); // I may not have this
 correct, but it's not important auto sl2 = make(SList!int, 6, 7, 8,
 9, 10);

 auto r1 = sl1[]; r1.popFront(); r1.popFront();

 auto r2 = take(r1, 2); sl2.insertAfter(r2, 666); // actually inserts
 into sl1

It's a good question. My current view of the matter is to maximize library flexibility and versatility. Following the adage that you can build safe on fast but not the other way around, I'd go with the following philosophy: checking of range membership to their collections should be made only to the extent of ensuring memory safety. Beyond that, if efficiency is in tension with verifiability (as is the case with your example above), leave it to the programmer or the supra-structure to correctly pair collections with their ranges. Sounds good? Andrei
May 27 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 May 2010 09:23:03 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 27 May 2010 00:36:24 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I just implemented a singly-linked list type to illustrate the  
 container abstraction.

 http://erdani.com/d/phobos/std_container.html
 http://erdani.com/d/phobos/container.d

 One interesting aspect is that SList must cooperate with Take. More  
 details are to be found in code, but essentially SList ranges are all  
 sentinel-terminated, which makes them "right-bound". If you want to  
 only remove a few elements from the middle of a list, you need to  
 construct a range that spans a limited portion of the list. I encoded  
 that by using the Take abstraction in std.range.

 Please let me know of how you find it. There are a few refinements and  
 ancillary functions to define, but those are pretty clear given the  
 rest.

Well, one thing I don't like about it is that you cannot do O(1) insertion or removal. insertBefore requires O(n) complexity and insertAfter requires O(m) complexity. Removal will require O(n) complexity, even though it doesn't exist right now. Fast removal and insertion are key to having a linked list. I'd suggest modifying the range so it uses a Node ** instead, which points to the _next element of the previous node of the one currently being iterated, or _root if it's the first node in the list.

In fact, I'd say that it would be critical to split the insert and remove functions to slow (O(n) or greater) versions and fast( O(lg(n) or less)) functions. Some algorithms may depend on fast insertion and removal, such as mergesort on a linked list. -Steve
May 27 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 May 2010 09:27:39 -0400, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 27 May 2010 09:23:03 -0400, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 On Thu, 27 May 2010 00:36:24 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I just implemented a singly-linked list type to illustrate the  
 container abstraction.

 http://erdani.com/d/phobos/std_container.html
 http://erdani.com/d/phobos/container.d

 One interesting aspect is that SList must cooperate with Take. More  
 details are to be found in code, but essentially SList ranges are all  
 sentinel-terminated, which makes them "right-bound". If you want to  
 only remove a few elements from the middle of a list, you need to  
 construct a range that spans a limited portion of the list. I encoded  
 that by using the Take abstraction in std.range.

 Please let me know of how you find it. There are a few refinements and  
 ancillary functions to define, but those are pretty clear given the  
 rest.

Well, one thing I don't like about it is that you cannot do O(1) insertion or removal. insertBefore requires O(n) complexity and insertAfter requires O(m) complexity. Removal will require O(n) complexity, even though it doesn't exist right now. Fast removal and insertion are key to having a linked list. I'd suggest modifying the range so it uses a Node ** instead, which points to the _next element of the previous node of the one currently being iterated, or _root if it's the first node in the list.

In fact, I'd say that it would be critical to split the insert and remove functions to slow (O(n) or greater) versions and fast( O(lg(n) or less)) functions. Some algorithms may depend on fast insertion and removal, such as mergesort on a linked list.

Actualy mergesort could not be implemented without some sort of way to restructure the list without allocating new nodes, requiring knowledge of the link structure. I'm not sure it could be generic... I'll think about this a bit. -Steve
May 27 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 27 May 2010 00:36:24 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I just implemented a singly-linked list type to illustrate the container  
 abstraction.

 http://erdani.com/d/phobos/std_container.html
 http://erdani.com/d/phobos/container.d

 One interesting aspect is that SList must cooperate with Take. More  
 details are to be found in code, but essentially SList ranges are all  
 sentinel-terminated, which makes them "right-bound". If you want to only  
 remove a few elements from the middle of a list, you need to construct a  
 range that spans a limited portion of the list. I encoded that by using  
 the Take abstraction in std.range.

 Please let me know of how you find it. There are a few refinements and  
 ancillary functions to define, but those are pretty clear given the rest.

I have another general question in general about these collections. Ranges fix a very bad problem with iterators -- non-matching begin and end iterators. For example, passing a begin from one list and and end from another. However, all your range functions such as remove, insertBefore, etc. are called via the container type, using a range as an argument. But there can be no static verification that a range actually is part of a given container. Should there be some requirement that a container validates that a range is part of the container before performing the operation? This is the case for dcollections. In your current implementation of SList, verification seems incidental for insertBefore because you must find the previous node from the root anyways (and that will throw on enforce if it cannot find it). But insertAfter does not, so something like this will compile, run, and result in strange behavior: auto sl1 = make(SList!int, 1, 2, 3, 4, 5); // I may not have this correct, but it's not important auto sl2 = make(SList!int, 6, 7, 8, 9, 10); auto r1 = sl1[]; r1.popFront(); r1.popFront(); auto r2 = take(r1, 2); sl2.insertAfter(r2, 666); // actually inserts into sl1 -Steve
May 27 2010
prev sibling next sibling parent reply Pillsy <pillsbury gmail.com> writes:
== Quote from Andrei Alexandrescu
(SeeWebsiteForEmail erdani.org)'s article:
 I just implemented a singly-linked list type to illustrate
 the container abstraction.
 http://erdani.com/d/phobos/std_container.html
 http://erdani.com/d/phobos/container.d

 Please let me know of how you find it.

One thing worries me a bit, but may be based on a misunderstanding of how D's GC works. In dup, you allocate all the nodes at once as an array and then go through them one by one setting their payload_ and next_ values to make the list. My worry is that the array is allocated as a single block and (presumably) GC'd as a single block. If I copy a 1000 element list and then remove 999 elements from the front using removeFront(), won't I end up leaking memory for the first 999 nodes? Cheers, Pillsy
May 27 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/27/2010 11:01 AM, Pillsy wrote:
 == Quote from Andrei Alexandrescu
 (SeeWebsiteForEmail erdani.org)'s article:
 I just implemented a singly-linked list type to illustrate
 the container abstraction.
 http://erdani.com/d/phobos/std_container.html
 http://erdani.com/d/phobos/container.d

 Please let me know of how you find it.

One thing worries me a bit, but may be based on a misunderstanding of how D's GC works. In dup, you allocate all the nodes at once as an array and then go through them one by one setting their payload_ and next_ values to make the list. My worry is that the array is allocated as a single block and (presumably) GC'd as a single block. If I copy a 1000 element list and then remove 999 elements from the front using removeFront(), won't I end up leaking memory for the first 999 nodes?

There are indeed tradeoffs associated with allocation of nodes that way. Technically that's not a leak, but indeed the entire block will stay put for as long as at least one node in it it's used. One improvement would be to put removed nodes in a static freelist to be used by all SLists of the same type in the current thread. In that case, remove becomes unstable. I think I hastened a bit with that optimization, it's debatable and detracts from the main point of the discussion. Andrei
May 27 2010
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Another update:

http://erdani.com/d/phobos/std_container.html
http://erdani.com/d/phobos/container.d

I simplified the implementation (no more array allocation etc.), 
eliminated replace() and insertBefore() after convincing myself they are 
not a good fit for lists, and added linearRemove(Take!Range).

Take!Range seems to be the secret recipe for an enjoyable SList experience.

I'll follow up later with an Array sample implementation. Then on the 
radar are an associative container (probably a straight hash or a binary 
tree) and then TightSList and TightArray which use deterministic storage 
management.

I figured out a lot of stuff, and it seems to settle together quite nicely.


Andrei
May 27 2010
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmail.com> writes:
I take it that Array is basically supposed to be std.container's version of 
C++'s vector or Java's ArrayList? If so, I would suggest that Array is not 
the best of names in that it would become very easy to confuse it with 
built-in arrays when discussing them (particularly in verbal communication 
where you can't see the capital A). Personally, I would have just gone with 
Vector, since it's a fairly standard name for that sort of container and 
wouldn't be confused with anything else. If you really want Array, that's 
fine - it should be clear enough when it's in code - but I would think that 
talking about it could easily be confusing enough that it wouldn't be the 
best of names.

- Jonathan M Davis
May 27 2010
next sibling parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Jonathan M Davis <jmdavisProg gmail.com> wrote:

 wouldn't be confused with anything else. If you really want Array, that's
 Personally, I would have just gone with Vector, since it's a fairly

anything else. I take it you don't work in simulation or games, then? Don't do much linear algebra? While I understand the reasoning for it, I dislike the name vector for arrays. A vector to me, is a geometric object with a length and magnitude, not a random collection of whatevers. -- Simen
May 27 2010
parent Jonathan M Davis <jmdavisProg gmail.com> writes:
Simen kjaeraas wrote:
 I take it you don't work in simulation or games, then? Don't do much
 linear algebra?
 
 While I understand the reasoning for it, I dislike the name vector for
 arrays. A vector to me, is a geometric object with a length and magnitude,
 not a random collection of whatevers.

Vector is arguably a poor name because of that. However, it is, at this point, a very standard name for such a container. At minimum, both C++ and Java have a vector class, though the main one that you'd use in Java at this point would be ArrayList rather than Vector. So, if a better name can be found, that's great, but Vector is quite a standard name for the concept at this point, and I do think that it works better than Array. I also don't like the name ArrayList because I wouldn't really consider a vector to be a list, but that's another option. I'm not all that familiar with what other languages have used for such a class, but there are probably other pre- existing options out there as well. - Jonathan M Davis
May 28 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/27/2010 06:28 PM, Jonathan M Davis wrote:
 I take it that Array is basically supposed to be std.container's version of
 C++'s vector or Java's ArrayList? If so, I would suggest that Array is not
 the best of names in that it would become very easy to confuse it with
 built-in arrays when discussing them (particularly in verbal communication
 where you can't see the capital A). Personally, I would have just gone with
 Vector, since it's a fairly standard name for that sort of container and
 wouldn't be confused with anything else. If you really want Array, that's
 fine - it should be clear enough when it's in code - but I would think that
 talking about it could easily be confusing enough that it wouldn't be the
 best of names.

 - Jonathan M Davis

Good point. Other opinions? Andrei
May 27 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/27/2010 08:42 PM, Leandro Lucarella wrote:
 Andrei Alexandrescu, el 27 de mayo a las 20:06 me escribiste:
 On 05/27/2010 06:28 PM, Jonathan M Davis wrote:
 I take it that Array is basically supposed to be std.container's version of
 C++'s vector or Java's ArrayList? If so, I would suggest that Array is not
 the best of names in that it would become very easy to confuse it with
 built-in arrays when discussing them (particularly in verbal communication
 where you can't see the capital A). Personally, I would have just gone with
 Vector, since it's a fairly standard name for that sort of container and
 wouldn't be confused with anything else. If you really want Array, that's
 fine - it should be clear enough when it's in code - but I would think that
 talking about it could easily be confusing enough that it wouldn't be the
 best of names.

 - Jonathan M Davis

Good point. Other opinions?

I always thought vector was a bad name choice in C++, I had that word associated with what C++ calls a valarray (a physics vector). I agree that Array is too generic for D, though, and unfortunately I don't have a better suggestion, but you asked for other opinions ;) BTW, what would be the point of an array/vector when you have built-in arrays? If built-in arrays would be syntax sugar for a real library type, like AAs, I can see as a good option using Array for that type, since built-in arrays and the library Array would be the same thing.

This is a good question. The relationship between Array and T[] is simple: * Array is a compliant container so it is a reference type * T[] is Array's range With built-in associative arrays the question becomes even more interesting because the associative array _is_ already a reference type. So Walter and I agreed to make built-in associative arrays just compliant std.container objects, with three differences: * AssocArray lives in object.di for various reasons * The compiler translates the type name V[K] as AssocArray!(K, V) * The compiler translates associative array literals into associative array objects Other than that... an associative array will be just one of the growing offering of collections in std.container. And I think that's the way it should be. Andrei
May 27 2010
prev sibling parent reply Jonathan M Davis <jmdavisProg gmail.com> writes:
 BTW, what would be the point of an array/vector when you have built-in
 arrays? If built-in arrays would be syntax sugar for a real library type,
 like AAs, I can see as a good option using Array for that type, since
 built-in arrays and the library Array would be the same thing.

The biggest difference is that a vector has capacity separate from its size/length. You can efficiency insert elements at the end with a vector - amortized constant time usually - but you can't do that with a built-in array because it would have to reallocate every time. D's arrays are fantastic, but they're still not quite good enough to outright replace a vector type. On top of that, of course, there's the issue of the API used to interact with it, which is what we're getting with std.container, but that could probably be gotten around in much the same way as the range stuff was with std.array. Really, the big difference is that you can efficiently append to a vector type while you cannot do that with an array. The array is still a tad too low level. And since you do need that low levelness at least some of the time (particularly in a systems language), it's not like you could replace arrays with vectors or make them act more like vectors. Both types have their uses. - Jonathan M Davis
May 28 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Jonathan M Davis:

 The biggest difference is that a vector has capacity separate from its 
 size/length. You can efficiency insert elements at the end with a vector - 
 amortized constant time usually - but you can't do that with a built-in 
 array because it would have to reallocate every time. D's arrays are 
 fantastic, but they're still not quite good enough to outright replace a 
 vector type.

I can think about the opposite too: to remove the built-in dynamic arrays and replace them as it's being done with associative arrays, keeping only the syntax sugar and mapping them to the Array/Vector. Is this a bad idea? (this is only about dynamic arrays, at the moment built-in fixed-sized arrays have to be kept). Bye, bearophile
May 28 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 28 May 2010 07:58:55 -0400, Jonathan M Davis  
<jmdavisProg gmail.com> wrote:

 BTW, what would be the point of an array/vector when you have built-in
 arrays? If built-in arrays would be syntax sugar for a real library  
 type,
 like AAs, I can see as a good option using Array for that type, since
 built-in arrays and the library Array would be the same thing.

The biggest difference is that a vector has capacity separate from its size/length. You can efficiency insert elements at the end with a vector - amortized constant time usually - but you can't do that with a built-in array because it would have to reallocate every time. D's arrays are fantastic, but they're still not quite good enough to outright replace a vector type.

This isn't exactly true. D's builtin arrays also are amortized constant time to append, but the constant is way higher :) I agree that an array-like container would provide features that D's builtin arrays do not, including much better append performance. -Steve
May 28 2010
prev sibling next sibling parent Leandro Lucarella <llucax gmail.com> writes:
Andrei Alexandrescu, el 27 de mayo a las 20:06 me escribiste:
 On 05/27/2010 06:28 PM, Jonathan M Davis wrote:
I take it that Array is basically supposed to be std.container's version of
C++'s vector or Java's ArrayList? If so, I would suggest that Array is not
the best of names in that it would become very easy to confuse it with
built-in arrays when discussing them (particularly in verbal communication
where you can't see the capital A). Personally, I would have just gone with
Vector, since it's a fairly standard name for that sort of container and
wouldn't be confused with anything else. If you really want Array, that's
fine - it should be clear enough when it's in code - but I would think that
talking about it could easily be confusing enough that it wouldn't be the
best of names.

- Jonathan M Davis

Good point. Other opinions?

I always thought vector was a bad name choice in C++, I had that word associated with what C++ calls a valarray (a physics vector). I agree that Array is too generic for D, though, and unfortunately I don't have a better suggestion, but you asked for other opinions ;) BTW, what would be the point of an array/vector when you have built-in arrays? If built-in arrays would be syntax sugar for a real library type, like AAs, I can see as a good option using Array for that type, since built-in arrays and the library Array would be the same thing. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- DETIENEN A PADRE, MADRE, TIOS Y ABUELOS: TODOS DEPRAVADOS -- Crónica TV
May 27 2010
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
Jonathan M Davis, el 28 de mayo a las 04:58 me escribiste:
 BTW, what would be the point of an array/vector when you have built-in
 arrays? If built-in arrays would be syntax sugar for a real library type,
 like AAs, I can see as a good option using Array for that type, since
 built-in arrays and the library Array would be the same thing.

The biggest difference is that a vector has capacity separate from its size/length. You can efficiency insert elements at the end with a vector - amortized constant time usually - but you can't do that with a built-in array because it would have to reallocate every time. D's arrays are fantastic, but they're still not quite good enough to outright replace a vector type.

OK, I forgot that, I guess my brain choose to forget it to stop keeping the flame alive, so I'll forget all about it again =P -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Estamos cantando a la sombra de nuestra parra una canción que dice que uno sólo conserva lo que no amarra
May 28 2010