www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - DList and SList missing length property?

reply Paulo Pinto <pjmlp progtools.org> writes:
Hi,

maybe this is more indicated for d.learn, but I am not sure if
it is not a bug in the documentation/implementation.

According to the phobos documentation, all containers in std.container
should provide a length property.

However it is only available for arrays.

Sure one could use the std.algorithm.count() or std.range.walkLength(), 
but I don't see a reason why SList and DList aren't aware of their 
current size.

Is this on purpose?

--
Paulo
May 28 2013
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, May 28, 2013 20:03:49 Paulo Pinto wrote:
 Hi,
 
 maybe this is more indicated for d.learn, but I am not sure if
 it is not a bug in the documentation/implementation.
 
 According to the phobos documentation, all containers in std.container
 should provide a length property.

It never says that. It lists what complexity each operation must have in order for a container to have that operation. It doesn't guarantee that any particular operation will be present on all containers (though any that's O(n) or worse is pretty much bound to be on all containers unless in does something fundamentally incompatible with a particular container's data structure).
 However it is only available for arrays.
 
 Sure one could use the std.algorithm.count() or std.range.walkLength(),
 but I don't see a reason why SList and DList aren't aware of their
 current size.
 
 Is this on purpose?

Yes. It's on purpose. When dealing with lists, because you can insert at any point, you have two choices: 1. Make length efficent. 2. Make splicing efficient. If you keep track of the length, then you have to count all of the elements when you splice two lists, so splicing is O(n) instead of O(1). If you don't keep track of the length, then splicing is O(1), but then length is O(n), and length can't be O(n) per std.container's complexity requirements. Now, that being said, I don't see a splice operation on either SList or DList (and I don't think that splicing an SList can be done in O(1) anyway), so it would appear that the implementation is optimizing for an operation that it doesn't currently support. The same situation exists with C++'s std::list type though - size() is O(n) and splicing is O(1). However, we opted for not having containers implement inefficient operations like that (which C++ mostly did as well, but not in this case). If you want the length of a container which doesn't have a length property, then use walkLength on its range - e.g. walkLength(container[]). - Jonathan M Davis
May 28 2013
parent Paulo Pinto <pjmlp progtools.org> writes:
Am 28.05.2013 20:16, schrieb Jonathan M Davis:
 On Tuesday, May 28, 2013 20:03:49 Paulo Pinto wrote:
 Hi,

 maybe this is more indicated for d.learn, but I am not sure if
 it is not a bug in the documentation/implementation.

 According to the phobos documentation, all containers in std.container
 should provide a length property.

It never says that. It lists what complexity each operation must have in order for a container to have that operation. It doesn't guarantee that any particular operation will be present on all containers (though any that's O(n) or worse is pretty much bound to be on all containers unless in does something fundamentally incompatible with a particular container's data structure).
 However it is only available for arrays.

 Sure one could use the std.algorithm.count() or std.range.walkLength(),
 but I don't see a reason why SList and DList aren't aware of their
 current size.

 Is this on purpose?

Yes. It's on purpose. When dealing with lists, because you can insert at any point, you have two choices: 1. Make length efficent. 2. Make splicing efficient. If you keep track of the length, then you have to count all of the elements when you splice two lists, so splicing is O(n) instead of O(1). If you don't keep track of the length, then splicing is O(1), but then length is O(n), and length can't be O(n) per std.container's complexity requirements. Now, that being said, I don't see a splice operation on either SList or DList (and I don't think that splicing an SList can be done in O(1) anyway), so it would appear that the implementation is optimizing for an operation that it doesn't currently support. The same situation exists with C++'s std::list type though - size() is O(n) and splicing is O(1). However, we opted for not having containers implement inefficient operations like that (which C++ mostly did as well, but not in this case). If you want the length of a container which doesn't have a length property, then use walkLength on its range - e.g. walkLength(container[]). - Jonathan M Davis

Ok, maybe the documentation should explicitly state to which containers each operation applies to. Currently it is a bit confusing to see the complexity listed, but not finding the methods. Thanks for the lengthy explanation, it is actually in sync with what I was thinking might be the reason. -- Paulo
May 28 2013
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 28 May 2013 at 18:16:48 UTC, Jonathan M Davis wrote:
 Now, that being said, I don't see a splice operation on either 
 SList or DList
 (and I don't think that splicing an SList can be done in O(1) 
 anyway), so it
 would appear that the implementation is optimizing for an 
 operation that it
 doesn't currently support.

SList can be spliced, but the semantics are a bit more complicated. In most cases, if you need to splice, you'd use a DList anyway: SList is really for particular use cases. I think that if splice is not in DList, it is only because it is not *yet* in DList. I have an implementation ready to submit, which adheres to Andrei's proposed container semantics. I have not yet submitted it though, because fixing DList is more important that inserting more functionality. It's current semantics are neither value nor ref. It's... something else. My fault for inserting it when trying to fix previous clusterfuck. I've since submitted fix that gives it classic semantics, but it's kind of just sitting there... We should *really* see to getting it (or something else) pulled through.
 The same situation exists with C++'s std::list type though - 
 size() is O(n)
 and splicing is O(1). However, we opted for not having 
 containers implement
 inefficient operations like that (which C++ mostly did as well, 
 but not in this
 case).

 - Jonathan M Davis

That was in C++03. In C++11, length was reduced to 0(1). And splice was increased to 0(N) (I think, I don't remember the exact details, but it was definitely changed). I'll look up the exact requirement later tonight.
May 28 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, May 28, 2013 20:51:55 monarch_dodra wrote:
 That was in C++03. In C++11, length was reduced to 0(1). And
 splice was increased to 0(N) (I think, I don't remember the exact
 details, but it was definitely changed).

I do remember hearing something about that, though that will actually result in some nasty performance changes to some of the code out there (including code that I've written). Regardless, you have the choice of making length O(1) or splicing O(1) and not both, and C++98/03 made splicing O(1), which I'm inclined to believe is the better choice, though having size when it's O(n) was a definite mistake on C++'s part IMHO. I hate to think of how many times I've seen programmer's write container.size() == 0 instead of empty(), particularly when size() was O(n)... - Jonathan M Davis
May 28 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 28 May 2013 15:06:53 -0400, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Tuesday, May 28, 2013 20:51:55 monarch_dodra wrote:
 That was in C++03. In C++11, length was reduced to 0(1). And
 splice was increased to 0(N) (I think, I don't remember the exact
 details, but it was definitely changed).

I do remember hearing something about that, though that will actually result in some nasty performance changes to some of the code out there (including code that I've written). Regardless, you have the choice of making length O(1) or splicing O(1) and not both, and C++98/03 made splicing O(1), which I'm inclined to believe is the better choice, though having size when it's O(n) was a definite mistake on C++'s part IMHO. I hate to think of how many times I've seen programmer's write container.size() == 0 instead of empty(), particularly when size() was O(n)...

I actually had performance bugs the OTHER way -- I needed the length of a list, and was using .size() without realizing it was O(n). In profiling my code, I found that std::list<T>::iterator::operator++ was the most used function, and I had no idea why for a LONG time :) You know, it is trivial to have an O(1) splice O(n) length list wrapped into an o(n) splice O(1) length list. All you need is to add a tracked length. And this is a REAL pain when you have to do it manually. There should just be two list types in <list> -Steve
May 28 2013
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 28 May 2013 at 19:24:05 UTC, Steven Schveighoffer 
wrote:
 On Tue, 28 May 2013 15:06:53 -0400, Jonathan M Davis 
 <jmdavisProg gmx.com> wrote:

 On Tuesday, May 28, 2013 20:51:55 monarch_dodra wrote:
 That was in C++03. In C++11, length was reduced to 0(1). And
 splice was increased to 0(N) (I think, I don't remember the 
 exact
 details, but it was definitely changed).

I do remember hearing something about that, though that will actually result in some nasty performance changes to some of the code out there (including code that I've written). Regardless, you have the choice of making length O(1) or splicing O(1) and not both, and C++98/03 made splicing O(1), which I'm inclined to believe is the better choice, though having size when it's O(n) was a definite mistake on C++'s part IMHO. I hate to think of how many times I've seen programmer's write container.size() == 0 instead of empty(), particularly when size() was O(n)...

I actually had performance bugs the OTHER way -- I needed the length of a list, and was using .size() without realizing it was O(n). In profiling my code, I found that std::list<T>::iterator::operator++ was the most used function, and I had no idea why for a LONG time :) You know, it is trivial to have an O(1) splice O(n) length list wrapped into an o(n) splice O(1) length list. All you need is to add a tracked length. And this is a REAL pain when you have to do it manually. There should just be two list types in <list> -Steve

A proper implementation should be able to track length anyway: provide 0(1) splice, and an "amortized" 0(1) length. I've always wondered why the standard didn't decide to do *that*? I think *we* should provide that...
May 28 2013
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 28 May 2013 at 19:24:05 UTC, Steven Schveighoffer 
wrote:
 You know, it is trivial to have an O(1) splice O(n) length list 
 wrapped into an o(n) splice O(1) length list.  All you need is 
 to add a tracked length.  And this is a REAL pain when you have 
 to do it manually.  There should just be two list types in 
 <list>

I think having two lists in list would be overkill. I *do* find the standard's choice odd though: The *primary* reason I ever use list, ever, is for its splice ability to move elements without copying them, and still having their addresses valid. No other container can do that. I don't get why they borked list's primary strength for a function you really shouldn't be using anyway. Worst, as you say, it is easy to implement one in terms of the other... but not the other way around. By doing what they just did, they effectively closed *any* way to splice in 0(1)... If you don't need splice, then most of the time, deque is a better choice of a container, and *that* has 0(1) length (along with a few neat properties). All this really doesn't make sense to me.
May 28 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 28 May 2013 17:11:06 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:

 A proper implementation should be able to track length anyway: provide  
 0(1) splice, and an "amortized" 0(1) length.

 I've always wondered why the standard didn't decide to do *that*? I  
 think *we* should provide that...

I'm not sure how that works, can you explain/have a link? -Steve
May 28 2013
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Tuesday, 28 May 2013 at 22:29:02 UTC, Steven Schveighoffer 
wrote:
 On Tue, 28 May 2013 17:11:06 -0400, monarch_dodra 
 <monarchdodra gmail.com> wrote:

 A proper implementation should be able to track length anyway: 
 provide 0(1) splice, and an "amortized" 0(1) length.

 I've always wondered why the standard didn't decide to do 
 *that*? I think *we* should provide that...

I'm not sure how that works, can you explain/have a link? -Steve

Well, the basic idea is to give your list an "m_size" member. This starts at 0. Whenever the user does a push_back/pop_back or whatnot operation, the the "m_size" attribute gets correctly upgraded. At that point, calling "size()" simply returns "m_size". Now, once a splice operation gets called, the m_size gets reset to a magic value, to indicate that tracking of the size has been lost. Operations will seize upgrading m_size, until a call to size is made, at which point it will be re-calculated, once. This, I think is the best solution, since it keeps people that use length in a safe position, while users of splice are also satisfied. Also, I *think* people who use splice tend to be more aware of the situation, and avoid calling length entirely. Implementation wise, it would mostly look like this: template <typename T> class list { size_t m_size = 0; size_t push_back(T other) { //Normal Code if ( m_size != std::numeric_limits<size_t>::max() ) ++m_size; } size_t splice(list::iterator first, list::iterator last) { //Normal Code //Reset length m_size == std::numeric_limits<size_t>::max(); } size_t size() const { if ( m_size == std::numeric_limits<size_t>::max() ) //Re-evaluate m_size m_size = std::distance( cbegin(), cend() ); return m_size; } };
May 29 2013
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 29 May 2013 05:15:00 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:

 On Tuesday, 28 May 2013 at 22:29:02 UTC, Steven Schveighoffer wrote:
 On Tue, 28 May 2013 17:11:06 -0400, monarch_dodra  
 <monarchdodra gmail.com> wrote:

 A proper implementation should be able to track length anyway: provide  
 0(1) splice, and an "amortized" 0(1) length.

 I've always wondered why the standard didn't decide to do *that*? I  
 think *we* should provide that...

I'm not sure how that works, can you explain/have a link? -Steve

Well, the basic idea is to give your list an "m_size" member. This starts at 0. Whenever the user does a push_back/pop_back or whatnot operation, the the "m_size" attribute gets correctly upgraded. At that point, calling "size()" simply returns "m_size". Now, once a splice operation gets called, the m_size gets reset to a magic value, to indicate that tracking of the size has been lost. Operations will seize upgrading m_size, until a call to size is made, at which point it will be re-calculated, once.

This is O(n) length, not amortized O(1) length, as it is highly dependent on usage. For example, in a project I am currently working on, I most frequently am using splice to move one list element at a time. This degrades your solution to basically O(n) length for every move, and I move a lot. -Steve
May 29 2013
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 30 May 2013 at 01:56:03 UTC, Steven Schveighoffer 
wrote:
 On Wed, 29 May 2013 05:15:00 -0400, monarch_dodra 
 <monarchdodra gmail.com> wrote:

 On Tuesday, 28 May 2013 at 22:29:02 UTC, Steven Schveighoffer 
 wrote:
 On Tue, 28 May 2013 17:11:06 -0400, monarch_dodra 
 <monarchdodra gmail.com> wrote:

 A proper implementation should be able to track length 
 anyway: provide 0(1) splice, and an "amortized" 0(1) length.

 I've always wondered why the standard didn't decide to do 
 *that*? I think *we* should provide that...

I'm not sure how that works, can you explain/have a link? -Steve

Well, the basic idea is to give your list an "m_size" member. This starts at 0. Whenever the user does a push_back/pop_back or whatnot operation, the the "m_size" attribute gets correctly upgraded. At that point, calling "size()" simply returns "m_size". Now, once a splice operation gets called, the m_size gets reset to a magic value, to indicate that tracking of the size has been lost. Operations will seize upgrading m_size, until a call to size is made, at which point it will be re-calculated, once.

This is O(n) length, not amortized O(1) length, as it is highly dependent on usage.

That's why I said "amortized" with quotes. Not the exact term, I know. I should have been clearer. "You can get 0(1) length, except in certain situations, where you'll have to pay once an 0(n) to get back to 0(1)". And yeah, it is dependent on usage. Like vector's push_back. (although incidentally *that* is true amortized)
 For example, in a project I am currently working on, I most 
 frequently am using splice to move one list element at a time.  
 This degrades your solution to basically O(n) length for every 
 move, and I move a lot.

http://www.cplusplus.com/reference/list/list/splice/ Note that 1 element splice (2) is not a problem. It has always been 0(1), regardless of scheme. In my scheme, it wouldn't invalidate legnth. Only [iterator .. iterator] splice (3) is is problematic. Full list splice (1) status is more complex. It is actually 0(1) in the scheme where length is 0(1). With my scheme, the guarantees depends on where you want to place the best compromise.
May 29 2013
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 30 May 2013 02:09:41 -0400, monarch_dodra <monarchdodra gmail.com>  
wrote:


 http://www.cplusplus.com/reference/list/list/splice/

 Note that 1 element splice (2) is not a problem. It has always been  
 0(1), regardless of scheme. In my scheme, it wouldn't invalidate legnth.

 Only [iterator .. iterator] splice (3) is is problematic.

That is true, and a good point. -Steve
May 30 2013