www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - std.container & ranges

reply Max Wolter <awishformore gmail.com> writes:
Hello there.

I seem to be having problems wrapping my head around how to use the 
ranges in the context of containers in phobos. Specifically, I can't 
seem to figure out how to remove an element from a linked list.

          foreach(cell; organism)
          {
             if(cell.x == x && cell.y == y)
             {
                organism.stableLinearRemove(cell);
                break;
             }
          }

mind.d(123): Error: function 
std.container.SList!(Cell).SList.linearRemove (Range r) is not callable 
using argument types (Cell)
mind.d(123): Error: cannot implicitly convert expression (cell) of type 
cell.Cell to Take!(Range)

I somehow get the feeling such a basic operation should just...work?

/Max
Oct 30 2011
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, October 30, 2011 11:38:30 Max Wolter wrote:
 Hello there.
 
 I seem to be having problems wrapping my head around how to use the
 ranges in the context of containers in phobos. Specifically, I can't
 seem to figure out how to remove an element from a linked list.
 
           foreach(cell; organism)
           {
              if(cell.x == x && cell.y == y)
              {
                 organism.stableLinearRemove(cell);
                 break;
              }
           }
 
 mind.d(123): Error: function
 std.container.SList!(Cell).SList.linearRemove (Range r) is not callable
 using argument types (Cell)
 mind.d(123): Error: cannot implicitly convert expression (cell) of type
 cell.Cell to Take!(Range)
 
 I somehow get the feeling such a basic operation should just...work?

linearRemove (and stableLinearRemove) takes a _range_ not a value. cell is an element in the list, not a range over the list. The range that it takes must be either of type SList.Range or Take!(SList.Range). You get that range by slicing an SList. Take!(SList.Range) is for the case where you want only a portion of the beginning of a range rather than the whole range. Your example actually has a really simple solution: auto found = find!((a){return a.x == x && a.y == y;})(organism[]); organism.stableLinearRemove(take(found, 1)); It finds the element in the list that you're looking for, and then it passes a range with that one element to stableLinearRemove so that it'll remove it. - Jonathan M Davis
Oct 30 2011
parent reply Max Wolter <awishformore gmail.com> writes:
On 10/30/2011 6:45 PM, Jonathan M Davis wrote:
 On Sunday, October 30, 2011 11:38:30 Max Wolter wrote:
 Hello there.

 I seem to be having problems wrapping my head around how to use the
 ranges in the context of containers in phobos. Specifically, I can't
 seem to figure out how to remove an element from a linked list.

            foreach(cell; organism)
            {
               if(cell.x == x&&  cell.y == y)
               {
                  organism.stableLinearRemove(cell);
                  break;
               }
            }

 mind.d(123): Error: function
 std.container.SList!(Cell).SList.linearRemove (Range r) is not callable
 using argument types (Cell)
 mind.d(123): Error: cannot implicitly convert expression (cell) of type
 cell.Cell to Take!(Range)

 I somehow get the feeling such a basic operation should just...work?

linearRemove (and stableLinearRemove) takes a _range_ not a value. cell is an element in the list, not a range over the list. The range that it takes must be either of type SList.Range or Take!(SList.Range). You get that range by slicing an SList. Take!(SList.Range) is for the case where you want only a portion of the beginning of a range rather than the whole range. Your example actually has a really simple solution: auto found = find!((a){return a.x == x&& a.y == y;})(organism[]); organism.stableLinearRemove(take(found, 1)); It finds the element in the list that you're looking for, and then it passes a range with that one element to stableLinearRemove so that it'll remove it. - Jonathan M Davis

Hello there. Thank you very much for the explanation. However, while I really liked the concept of ranges in Andrei's book and a lot of it seems intuitive and faster than using iterators, I can't shake the feeling that in this case, it's just unnecessarily convoluted. Maybe it's just the fact that the container library is still very basic, but I don't think I should go through such a complicated procedure to remove an known element from a list. It's just not a "very simple" or intuitive solution, which is something I came to love D for thus far. /Max
Oct 30 2011
next sibling parent reply Mike Parker <aldacron gmail.com> writes:
On 10/31/2011 5:28 AM, Jonathan M Davis wrote:


 So, in comparison to C++, there's no significant difference. Now, Java does
have
 a remove function which will take an element and remove the first occurence of
 that element from a list, and we could theoretically add one, but why?

IMO, it's much more intuitive to say list.remove(item). It's the first thing a lot of people expect coming from a Java background, that's for sure. The first time I tried to use SList, being unfamiliar with ranges as I was, it took a while to figure out what I needed to do. IIRC, I had to post here to ask. The problem is that you really have to understand ranges and the Phobos functions that manipulate them before you can begin to use containers. I think that's wrong. Ideally, containers should be usable without having to know about find and Take and whatever else. This isn't the first time this question has come up in the NG and I've got a feeling it won't be the last. At the very least it would be nice to see something in the documentation tying std.algorithm and std.range together with std.container. Something to point the way for those to whom it isn't obvious.
Oct 30 2011
parent reply Tobias Pankrath <tobias pankrath.net> writes:
Jonathan M Davis wrote:

  find allows
 you to do that just fine, and such a remove function would simply be
 duplicating its functionality.

But this thread serves as a proof that it is not obvious and something like containers should be obvious to use. Deleting an element with a certain value is a common task and should should not take three function calls, with functions from three different modules.
 You would be doing exactly the same thing in C++ except that it would be
 with an iterator rather than a range. You would use find to find the
 iterator to the element that you wanted and then you'd pass that to the
 list's erase function. 

How do you delete every occurence of v? Not just the first one? Is this equally "easy" with find and take? I didn't find an easy way not envolving a loop within 10 Minutes by reading the documentation. What do you think of an construct like this: foreach(item, itemRange; container) { // itemRange is a range only containing item // itemRange.length == 1 && itemRange.front == item } That would be analogous to foreach over array with an index. You might think that iterating explicity may not be elegant. But it is what people want to do and it should not be any harder than necessary. -- Tobias
Oct 31 2011
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 31.10.2011 11:16, Tobias Pankrath wrote:
 Jonathan M Davis wrote:

   find allows
 you to do that just fine, and such a remove function would simply be
 duplicating its functionality.

But this thread serves as a proof that it is not obvious and something like containers should be obvious to use. Deleting an element with a certain value is a common task and should should not take three function calls, with functions from three different modules.
 You would be doing exactly the same thing in C++ except that it would be
 with an iterator rather than a range. You would use find to find the
 iterator to the element that you wanted and then you'd pass that to the
 list's erase function.

How do you delete every occurence of v? Not just the first one? Is this equally "easy" with find and take? I didn't find an easy way not envolving a loop within 10 Minutes by reading the documentation.

It is, let's use remove from std.algorithm! ... though there is no overload of remove for forward ranges ... crap, worthy of bugzilla report. Or SList could be special enough to offer a built-in remove with predicate done in O(n). However this should work( yet because of apparent bug in SList it doesn't). The bug is: Line 1294 of std.container should be: for (; !r.empty; r.popFront()) ... or it will remove the whole container. Code: import std.container, std.range; import std.functional, std.algorithm, std.stdio; void removeFromList(alias pred, T)(ref SList!T s) { static if(is(typeof(pred) == string)) alias unaryFun!pred Pred; else alias pred Pred; for(auto r=s[]; !r.empty; ) { if(Pred(r.front)) { r = s.linearRemove(take(r,1)); continue; } else r.popFront(); } } void main() { SList!int list(20, 30, 40, 50, 60); // will work with stock Phobos if the first element % 20 != 0 removeFromList!"a % 20 == 0"(list); writeln(list[]); } And more importantly, it still would be horribly slow O(N^2). Personally, because of that I'd prefer hand-rolled intrusive singly-linked list any time of day.
 What do you think of an construct like this:

 foreach(item, itemRange; container)
 {
 // itemRange is a range only containing item
 // itemRange.length == 1&&  itemRange.front == item
 }

 That would be analogous to foreach over array with an index. You might think
 that iterating explicity may not be elegant. But it is what people want to
 do and it should not be any harder than necessary.

You still can, just use plain range interface with empty/front/popFront. for(auto r = list[]; !r.empty; r.popFront()) { ... } -- Dmitry Olshansky
Oct 31 2011
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 31.10.2011 18:38, Steven Schveighoffer wrote:
 On Mon, 31 Oct 2011 05:16:11 -0400, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 31.10.2011 11:16, Tobias Pankrath wrote:
 Jonathan M Davis wrote:

 find allows
 you to do that just fine, and such a remove function would simply be
 duplicating its functionality.

But this thread serves as a proof that it is not obvious and something like containers should be obvious to use. Deleting an element with a certain value is a common task and should should not take three function calls, with functions from three different modules.
 You would be doing exactly the same thing in C++ except that it
 would be
 with an iterator rather than a range. You would use find to find the
 iterator to the element that you wanted and then you'd pass that to the
 list's erase function.

How do you delete every occurence of v? Not just the first one? Is this equally "easy" with find and take? I didn't find an easy way not envolving a loop within 10 Minutes by reading the documentation.

It is, let's use remove from std.algorithm! ... though there is no overload of remove for forward ranges ... crap, worthy of bugzilla report. Or SList could be special enough to offer a built-in remove with predicate done in O(n). However this should work( yet because of apparent bug in SList it doesn't). The bug is: Line 1294 of std.container should be: for (; !r.empty; r.popFront()) ... or it will remove the whole container. Code: import std.container, std.range; import std.functional, std.algorithm, std.stdio; void removeFromList(alias pred, T)(ref SList!T s) { static if(is(typeof(pred) == string)) alias unaryFun!pred Pred; else alias pred Pred; for(auto r=s[]; !r.empty; ) { if(Pred(r.front)) { r = s.linearRemove(take(r,1)); continue; } else r.popFront(); } } void main() { SList!int list(20, 30, 40, 50, 60); // will work with stock Phobos if the first element % 20 != 0 removeFromList!"a % 20 == 0"(list); writeln(list[]); } And more importantly, it still would be horribly slow O(N^2). Personally, because of that I'd prefer hand-rolled intrusive singly-linked list any time of day.

ahem, using dcollections: foreach(ref doRemove, cell; &organism.purge) doRemove = cell.x == x && cell.y == y; complexity: O(n)

While computer happily does O(n) work here, my brain screams in cognitive dissonance figuring out that this assignment actually does delete... meh voodoo through and through :) I'd rather organism.remove!((cell){ return cell.x == x && cell.y == y; })(); or something like that. But it's not a question of form, but more of a missing stuff that should be there.
 :)

 -Steve

-- Dmitry Olshansky
Oct 31 2011
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
Looks like it was accidental cross-posting:

On 31.10.2011 20:11, Alessandro Stamatto wrote:
     it still would be horribly slow O(N^2).
     Personally, because of that I'd prefer hand-rolled intrusive
     singly-linked list any time of day.


 Now you're scaring me... You're saying that SList in D not only is 

 a templated remove_if would run in O(N^2) instead of O(N) ????!!!!

Basically I'm saying that with current SList *implementation* it will run in O(N^2), because for some reason it always do a check on each range passed to remove that it does belong to this list. (i.e. it walks from head till hits it or it's own tail) -- Dmitry Olshansky
Oct 31 2011
prev sibling next sibling parent reply Kagamin <spam here.lot> writes:
Steven Schveighoffer Wrote:

 ahem, using dcollections:
 
 foreach(ref doRemove, cell; &organism.purge)
      doRemove = cell.x == x && cell.y == y;
 
 complexity: O(n)

may be a generic iteration handler would be more useful? foreach(ref handler, cell; &organism.each) if(cell.x == x && cell.y == y) handler.removeCurrent(); it could provide a whole api, say, you may want to have lookahead foreach(ref handler, cell; &organism.each) if(cell.x == x && cell.y == y && handler.next.x==0) handler.removeCurrent();
Nov 01 2011
parent travert phare.normalesup.org (Christophe) writes:
Kagamin , dans le message (digitalmars.D.learn:30362), a écrit :
 Steven Schveighoffer Wrote:
 
 ahem, using dcollections:
 
 foreach(ref doRemove, cell; &organism.purge)
      doRemove = cell.x == x && cell.y == y;
 
 complexity: O(n)

may be a generic iteration handler would be more useful? foreach(ref handler, cell; &organism.each) if(cell.x == x && cell.y == y) handler.removeCurrent(); it could provide a whole api, say, you may want to have lookahead foreach(ref handler, cell; &organism.each) if(cell.x == x && cell.y == y && handler.next.x==0) handler.removeCurrent();

That's not easier than using Tobias' improved foreach: foreach(cell, cellRange; organism[]) if (cell.x == x && cell.y == y) organism.remove(cellRange); If you want to use algorithm specialized and optimized for the container, I would prefer to use purge than each + handler. purge seems better to me, because it can be specialized for the container and for the action, and do not require to learn a new handler structure. "each" do not seem to add a lot to foreach(cell, cellRange; range), since it must be able to cope with any operation provided by handler, and any operation combinaisons...
Nov 02 2011
prev sibling parent reply Tobias Pankrath <tobias pankrath.net> writes:
Dmitry Olshansky wrote:

 And more importantly, it still would be horribly slow O(N^2).
 Personally, because of that I'd prefer hand-rolled intrusive
 singly-linked list any time of day.
 

To be honest, I don't understand this. A "remove_if" for non-intrusive single-linked lists should be doable in O(N). I still think the interface to container lacks a important feature here.
Nov 03 2011
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 03.11.2011 19:34, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
 <tobias pankrath.net> wrote:

 Dmitry Olshansky wrote:

 And more importantly, it still would be horribly slow O(N^2).
 Personally, because of that I'd prefer hand-rolled intrusive
 singly-linked list any time of day.

To be honest, I don't understand this. A "remove_if" for non-intrusive single-linked lists should be doable in O(N).


Yeah, poor wording going to kill me one day :) And looking at how SList works with "checked" remove does O(N^2) turned out to be a context for reference for intrusive singly-linked list, just my bad. As for why I'd rather go with intrusive lists, it's because of it usually uses less memory due to node structures being more padding-free. Anyway the point should have been focused on _hand-rolled_ part, mostly because SList is plain not ready for prime time*. And btw singly-linked list it's not hard to implement and you can fine tune it how you like it e.g. they do play nice with free list allocation strategy. Not to say of circular lists and/or using sentinels.
 SList is a poor singly linked list implementation. It does not support
 O(1) removal.

 -Steve

Aye, looking at SList implementation I can say that it sort of tries to verify that this is a correct list. Otherwise it would be O(1). Indeed passing wrong range/iterator is quite bad in linked lists and could lead to all manner of funky bugs, but the cost is horrific. Especially since insert doesn't do this extra check. Actually, it opens up a question of "Checked ranges" akin to "Checked iterators" used in many STL implementations. Any ideas what they can carry around as an "ID" of list? Root pointer won't cut it, as it can be easily removed during iteration. If SList was a class it's reference probably would do. * I think I'll scrap together a pull request to address both these issues though. -- Dmitry Olshansky
Nov 03 2011
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11/03/2011 06:13 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 03.11.2011 19:34, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
 <tobias pankrath.net> wrote:

 Dmitry Olshansky wrote:

 And more importantly, it still would be horribly slow O(N^2).
 Personally, because of that I'd prefer hand-rolled intrusive
 singly-linked list any time of day.

To be honest, I don't understand this. A "remove_if" for non-intrusive single-linked lists should be doable in O(N).


Yeah, poor wording going to kill me one day :) And looking at how SList works with "checked" remove does O(N^2) turned out to be a context for reference for intrusive singly-linked list, just my bad. As for why I'd rather go with intrusive lists, it's because of it usually uses less memory due to node structures being more padding-free. Anyway the point should have been focused on _hand-rolled_ part, mostly because SList is plain not ready for prime time*. And btw singly-linked list it's not hard to implement and you can fine tune it how you like it e.g. they do play nice with free list allocation strategy. Not to say of circular lists and/or using sentinels.
 SList is a poor singly linked list implementation. It does not support
 O(1) removal.

 -Steve

Aye, looking at SList implementation I can say that it sort of tries to verify that this is a correct list. Otherwise it would be O(1). Indeed passing wrong range/iterator is quite bad in linked lists and could lead to all manner of funky bugs, but the cost is horrific. Especially since insert doesn't do this extra check.

No, it's necessarily doing O(n) search for current node because it has no reference to the previous node. The range type for a SList has a single pointer to the currently iterated node. How do you remove that node without having access to the head/previous pointer?

The container interface does not expose references to the 'Node' struct, therefore the following approach would be practical: 0. Have a sentinel for end of list. (O(1) additional memory for the entire list). It is the only node in the list that has a null 'next' pointer. example, for removing just the current node: 1. if the following node is not the sentinel 1.1. Move value from following node into the current node. 1.2. Remove following node. 2. if the following node is the sentinel 2.1. erase the contents of the current node (iff it has indirections) 2.2. set it's 'next' field to null Removing an entire range is just erasing the contents of the current node and setting the 'next' field of the associated Node to zero. Removing a Take!Range is a simple generalization of the algorithm above.
Nov 03 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 11/03/2011 07:15 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 13:47:28 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 11/03/2011 06:13 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 03.11.2011 19:34, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
 <tobias pankrath.net> wrote:

 Dmitry Olshansky wrote:

 And more importantly, it still would be horribly slow O(N^2).
 Personally, because of that I'd prefer hand-rolled intrusive
 singly-linked list any time of day.

To be honest, I don't understand this. A "remove_if" for non-intrusive single-linked lists should be doable in O(N).


Yeah, poor wording going to kill me one day :) And looking at how SList works with "checked" remove does O(N^2) turned out to be a context for reference for intrusive singly-linked list, just my bad. As for why I'd rather go with intrusive lists, it's because of it usually uses less memory due to node structures being more padding-free. Anyway the point should have been focused on _hand-rolled_ part, mostly because SList is plain not ready for prime time*. And btw singly-linked list it's not hard to implement and you can fine tune it how you like it e.g. they do play nice with free list allocation strategy. Not to say of circular lists and/or using sentinels.
 SList is a poor singly linked list implementation. It does not support
 O(1) removal.

 -Steve

Aye, looking at SList implementation I can say that it sort of tries to verify that this is a correct list. Otherwise it would be O(1). Indeed passing wrong range/iterator is quite bad in linked lists and could lead to all manner of funky bugs, but the cost is horrific. Especially since insert doesn't do this extra check.

No, it's necessarily doing O(n) search for current node because it has no reference to the previous node. The range type for a SList has a single pointer to the currently iterated node. How do you remove that node without having access to the head/previous pointer?

The container interface does not expose references to the 'Node' struct, therefore the following approach would be practical:

It does, actually. A range is a reference to a Node struct. But I still think the following approach will work.
 0. Have a sentinel for end of list. (O(1) additional memory for the
 entire list). It is the only node in the list that has a null 'next'
 pointer.

 example, for removing just the current node:
 1. if the following node is not the sentinel
 1.1. Move value from following node into the current node.
 1.2. Remove following node.
 2. if the following node is the sentinel
 2.1. erase the contents of the current node (iff it has indirections)
 2.2. set it's 'next' field to null


 Removing an entire range is just erasing the contents of the current
 node and setting the 'next' field of the associated Node to zero.
 Removing a Take!Range is a simple generalization of the algorithm above.

This would be a good addition to the type. It could not be a stableRemove, however, because it would invalidate any ranges pointing at the node you removed. Can you put together a pull request?

I can do that, but it will have to wait a few days, as I am quite busy at the moment.
 If not, I
 can see about doing it. One issue, you could not do this if the value is
 immutable/const.

That is true. But how to fix this? Resolving it by simply not exposing the remove method if it is impossible to implement the straightforward way would be an option, but I don't think that is satisfying. Probably it could be made to work by creating some kind of head mutable structure. I will think about it. It should even be possible to generalize std.typecons.Rebindable for structs to help this idea.
 I could use this idea, I think, to implement a singly linked list in
 dcollections as well (the prospect of not having O(1) removal is what
 has stopped me). Thanks for the idea!

Nice!
Nov 03 2011
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 11/03/2011 09:46 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 14:50:31 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 11/03/2011 07:15 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 13:47:28 -0400, Timon Gehr <timon.gehr gmx.ch>
 wrote:

 On 11/03/2011 06:13 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 03.11.2011 19:34, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
 <tobias pankrath.net> wrote:

 Dmitry Olshansky wrote:

 And more importantly, it still would be horribly slow O(N^2).
 Personally, because of that I'd prefer hand-rolled intrusive
 singly-linked list any time of day.

To be honest, I don't understand this. A "remove_if" for non-intrusive single-linked lists should be doable in O(N).


Yeah, poor wording going to kill me one day :) And looking at how SList works with "checked" remove does O(N^2) turned out to be a context for reference for intrusive singly-linked list, just my bad. As for why I'd rather go with intrusive lists, it's because of it usually uses less memory due to node structures being more padding-free. Anyway the point should have been focused on _hand-rolled_ part, mostly because SList is plain not ready for prime time*. And btw singly-linked list it's not hard to implement and you can fine tune it how you like it e.g. they do play nice with free list allocation strategy. Not to say of circular lists and/or using sentinels.
 SList is a poor singly linked list implementation. It does not
 support
 O(1) removal.

 -Steve

Aye, looking at SList implementation I can say that it sort of tries to verify that this is a correct list. Otherwise it would be O(1). Indeed passing wrong range/iterator is quite bad in linked lists and could lead to all manner of funky bugs, but the cost is horrific. Especially since insert doesn't do this extra check.

No, it's necessarily doing O(n) search for current node because it has no reference to the previous node. The range type for a SList has a single pointer to the currently iterated node. How do you remove that node without having access to the head/previous pointer?

The container interface does not expose references to the 'Node' struct, therefore the following approach would be practical:

It does, actually. A range is a reference to a Node struct. But I still think the following approach will work.
 0. Have a sentinel for end of list. (O(1) additional memory for the
 entire list). It is the only node in the list that has a null 'next'
 pointer.

 example, for removing just the current node:
 1. if the following node is not the sentinel
 1.1. Move value from following node into the current node.
 1.2. Remove following node.
 2. if the following node is the sentinel
 2.1. erase the contents of the current node (iff it has indirections)
 2.2. set it's 'next' field to null


 Removing an entire range is just erasing the contents of the current
 node and setting the 'next' field of the associated Node to zero.
 Removing a Take!Range is a simple generalization of the algorithm
 above.

This would be a good addition to the type. It could not be a stableRemove, however, because it would invalidate any ranges pointing at the node you removed. Can you put together a pull request?

I can do that, but it will have to wait a few days, as I am quite busy at the moment.

No rush. I just wanted to know if you would do it, and if not, I would do it.
 If not, I
 can see about doing it. One issue, you could not do this if the value is
 immutable/const.

That is true. But how to fix this? Resolving it by simply not exposing the remove method if it is impossible to implement the straightforward way would be an option, but I don't think that is satisfying. Probably it could be made to work by creating some kind of head mutable structure. I will think about it. It should even be possible to generalize std.typecons.Rebindable for structs to help this idea.

Hm... rebindable doesn't help if the item is a struct. You'd have to store an allocated struct on the heap. -Steve

My point is, if you have a struct like this: struct S{ const int* x; immutable int y; } Then that could be stored in the list as struct _S{ const(int)* x; int y; S getS(){return S(x,y);} } Without putting the type system under pressure. But on second thought, std.typecons.Rebindable can probably not meaningfully be generalized to structs because it could not deal with member functions.
Nov 03 2011
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 03.11.2011 21:13, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 03.11.2011 19:34, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
 <tobias pankrath.net> wrote:

 Dmitry Olshansky wrote:

 And more importantly, it still would be horribly slow O(N^2).
 Personally, because of that I'd prefer hand-rolled intrusive
 singly-linked list any time of day.

To be honest, I don't understand this. A "remove_if" for non-intrusive single-linked lists should be doable in O(N).


Yeah, poor wording going to kill me one day :) And looking at how SList works with "checked" remove does O(N^2) turned out to be a context for reference for intrusive singly-linked list, just my bad. As for why I'd rather go with intrusive lists, it's because of it usually uses less memory due to node structures being more padding-free. Anyway the point should have been focused on _hand-rolled_ part, mostly because SList is plain not ready for prime time*. And btw singly-linked list it's not hard to implement and you can fine tune it how you like it e.g. they do play nice with free list allocation strategy. Not to say of circular lists and/or using sentinels.
 SList is a poor singly linked list implementation. It does not support
 O(1) removal.

 -Steve

Aye, looking at SList implementation I can say that it sort of tries to verify that this is a correct list. Otherwise it would be O(1). Indeed passing wrong range/iterator is quite bad in linked lists and could lead to all manner of funky bugs, but the cost is horrific. Especially since insert doesn't do this extra check.

No, it's necessarily doing O(n) search for current node because it has no reference to the previous node. The range type for a SList has a single pointer to the currently iterated node. How do you remove that node without having access to the head/previous pointer?

removeAfter ? ;) Hm though it does hit as strange. Looking at my code, I usually keep previous node but only when I remove elements during iteration.
 I suggested to Andrei having the range keep track of the *previous*
 node/head, but he did not like that idea (too many dereferences for
 front()). Another option is to have a removeAllButFirst method, but
 that's kind of awkward...

And then using a sentinel as in Timon's idea doesn't look half bad.
 Actually, it opens up a question of "Checked ranges" akin to "Checked
 iterators" used in many STL implementations. Any ideas what they can
 carry around as an "ID" of list? Root pointer won't cut it, as it can
 be easily removed during iteration. If SList was a class it's
 reference probably would do.

dcollections stipulates that all ranges/cursors can be verified in O(lgn) time or better to belong to a specific container. In some cases, this adds an extra word to the range/cursor, and in others, it's easy to determine or the owner-reference was already needed. Since everything is a class, the fallback is to just stick an owner class instance in the range. This stipulation is necessary to allow safe slicing.

Seems reasonable, I'd expect checks to go away in release, right(?). -- Dmitry Olshansky
Nov 03 2011
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 03.11.2011 22:37, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 14:08:31 -0400, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 03.11.2011 21:13, Steven Schveighoffer wrote:
 dcollections stipulates that all ranges/cursors can be verified in
 O(lgn) time or better to belong to a specific container. In some cases,
 this adds an extra word to the range/cursor, and in others, it's easy to
 determine or the owner-reference was already needed. Since everything is
 a class, the fallback is to just stick an owner class instance in the
 range.

 This stipulation is necessary to allow safe slicing.

Seems reasonable, I'd expect checks to go away in release, right(?).

For the moment, no. I am not sure whether this is the right decision or not, because once you get beyond arrays, when to do bounds checks becomes fuzzy. For example, imagine you have this: auto ts = new TreeSet!int(1, 3, 5, 7, 9); What does this mean? auto r = ts[2..4]; Note that a range type for a treeset has a pointer to a begin and end node for the container. For arrays, not doing a bounds check is simply less code. For a RBTree, you still have to look for the specific node, even if you are in release mode. Another example: auto ts2 = ts.dup; // duplicate the treeset auto c1 = ts2.elemAt(3); // get the node for 3 in ts2 auto r2 = ts[c1..ts.end];

hm-h will that actually work? I mean searching in ts for node from ts2?
 Here I can say, we can skip the belongs check (which is an O(lgn) walk
 up the tree).

Well, I was mostly talking about this kind of scenario i.e. skip checking that cursor do belong to this collection. While in ts[2..4] example there is no (explicit) cursors so it's a different situation.
 But I'm still doing it in release mode. I'm not sure what's best. Should
 I just do the least work possible, or should I make it consistent with
 ts[2..4]?

I'd say release mode should avoid as much extra work as possible while keeping primary functionality intact, but that's just me. -- Dmitry Olshansky
Nov 03 2011
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, October 31, 2011 13:16:11 Dmitry Olshansky wrote:
 On 31.10.2011 11:16, Tobias Pankrath wrote:
 Jonathan M Davis wrote:
   find allows
 
 you to do that just fine, and such a remove function would simply be
 duplicating its functionality.

But this thread serves as a proof that it is not obvious and something like containers should be obvious to use. Deleting an element with a certain value is a common task and should should not take three function calls, with functions from three different modules.
 You would be doing exactly the same thing in C++ except that it would
 be
 with an iterator rather than a range. You would use find to find the
 iterator to the element that you wanted and then you'd pass that to
 the
 list's erase function.

I don't think we should refer to C++ as an benchmark for ease of use. How do you delete every occurence of v? Not just the first one? Is this equally "easy" with find and take? I didn't find an easy way not envolving a loop within 10 Minutes by reading the documentation.

It is, let's use remove from std.algorithm! ... though there is no overload of remove for forward ranges ... crap, worthy of bugzilla report.

Probably because it has to swap elements around, which requires a bidirectional range.
 Or SList could be special enough to offer a built-in remove with
 predicate done in O(n).

It would be valuable in general IMHO. We should add something similar to the STL's remove_if function on linked lists. The fact that std.algorith.remove must shuffle elements around instead of actually removing them (similar to what the STL does with its free function erase) just shows how it doesn't really work all that well. It can be done _much_ more efficiently by putting remove_if (or whatever we would call it) on the container. It's also less confusing, since std.algorithm.remove (and the STL's erase function) doesn't actually _anything_. You have to pass the returned range (or iterator in C++) to the container's linearRemove function (or erase in C++) to remove the elements, which is _highly_ confusing (but since iterators and ranges don't really know about their containers beyond the elements that they reference, it's the best that can be done with a free function). For the case where you're searching for a specific element to remove though, I see no reason to add another function. That's what find is for (and given that we have lambdas, it's easy to use even in the case where you need a predicate other than ==), and if you really want to, you can iterate the range explicitly and pass the portion of the range to the container's remove function that you want to remove. The main problem with it is that we don't have proper examples, and we don't have enough documentation explaining ranges in general, so it's not immediately obvious to people how to properly use a container's remove function. - Jonathan M Davis
Oct 31 2011
prev sibling parent reply Max Wolter <awishformore gmail.com> writes:
On 10/30/2011 9:28 PM, Jonathan M Davis wrote:
 On Sunday, October 30, 2011 20:53:02 Max Wolter wrote:
 Hello there.

 Thank you very much for the explanation.

 However, while I really liked the concept of ranges in Andrei's book and
 a lot of it seems intuitive and faster than using iterators, I can't
 shake the feeling that in this case, it's just unnecessarily convoluted.

 Maybe it's just the fact that the container library is still very basic,
 but I don't think I should go through such a complicated procedure to
 remove an known element from a list. It's just not a "very simple" or
 intuitive solution, which is something I came to love D for thus far.

You would be doing exactly the same thing in C++ except that it would be with an iterator rather than a range. You would use find to find the iterator to the element that you wanted and then you'd pass that to the list's erase function. It is only more complicated in D in that you get a range which _starts_ with the element that you want (essentially, you get the iterator to that element plus the end iterator for the list), so if you want to remove only the first element, you take it from the front of the range. Other than that, it's the same as in C++. You can't remove an element from a list in C++ by giving that element to the list anymore than you can in D. In both cases, you need an iterator or a range. So, in comparison to C++, there's no significant difference. Now, Java does have a remove function which will take an element and remove the first occurence of that element from a list, and we could theoretically add one, but why? It would just be duplicating the functionality that find already gives. Java doesn't use iterators the way the C++ does, and it doesn't have ranges, so it _can't_ have a find function the way that C++ and D do, but D can. And in some cases, find can do it more efficiently than your loop could have. I grant you that if you're not used to using find like this in C++ (and far, far too many C++ programmers use loops instead of find - in part because pre- C++11, there were no lambdas and any time you need a custom comparison function, it's a pain to get one), then it's not immediately intuitive, but it's far more flexible and powerful than removing elements by giving the element to a remove function on the list. And if you really want to use a loop, then you still can. You just can't use foreach. for(auto r = organism[]; !r.empty; r.popFront()) { if(r.front.x == x&& r.front.y == y) { organism.stableLinearRemove(take(r, 1)); break; } } but that's a lot more verbose than simply using find, and as I said, in at least some cases, find can be more efficient than simply looping, since it's optimized for finding elements. - Jonathan M Davis

Hello again. I've read all further replies in this thread, but it seems to me this is the most appropriate place to respond. Simply put, all of those options are too verbose. If what you want to do is simple, the syntax should also be simple, this is what I love about D. As a side-note, I don't come from Java, but I still expect a container to be able to remove an element by passing it to a method. The verbosity and the details of this operation should be hidden in the implementation of the method and I shouldn't need to know about the details. Otherwise, I could just as well implement my own container. /Max
Nov 01 2011
parent reply Ary Manzana <ary esperanto.org.ar> writes:
On 11/2/11 8:48 AM, Steven Schveighoffer wrote:
 On Tue, 01 Nov 2011 16:53:26 -0400, Max Wolter <awishformore gmail.com>
 wrote:

 On 10/30/2011 9:28 PM, Jonathan M Davis wrote:
 On Sunday, October 30, 2011 20:53:02 Max Wolter wrote:
 Hello there.

 Thank you very much for the explanation.

 However, while I really liked the concept of ranges in Andrei's book
 and
 a lot of it seems intuitive and faster than using iterators, I can't
 shake the feeling that in this case, it's just unnecessarily
 convoluted.

 Maybe it's just the fact that the container library is still very
 basic,
 but I don't think I should go through such a complicated procedure to
 remove an known element from a list. It's just not a "very simple" or
 intuitive solution, which is something I came to love D for thus far.

You would be doing exactly the same thing in C++ except that it would be with an iterator rather than a range. You would use find to find the iterator to the element that you wanted and then you'd pass that to the list's erase function. It is only more complicated in D in that you get a range which _starts_ with the element that you want (essentially, you get the iterator to that element plus the end iterator for the list), so if you want to remove only the first element, you take it from the front of the range. Other than that, it's the same as in C++. You can't remove an element from a list in C++ by giving that element to the list anymore than you can in D. In both cases, you need an iterator or a range. So, in comparison to C++, there's no significant difference. Now, Java does have a remove function which will take an element and remove the first occurence of that element from a list, and we could theoretically add one, but why? It would just be duplicating the functionality that find already gives. Java doesn't use iterators the way the C++ does, and it doesn't have ranges, so it _can't_ have a find function the way that C++ and D do, but D can. And in some cases, find can do it more efficiently than your loop could have. I grant you that if you're not used to using find like this in C++ (and far, far too many C++ programmers use loops instead of find - in part because pre- C++11, there were no lambdas and any time you need a custom comparison function, it's a pain to get one), then it's not immediately intuitive, but it's far more flexible and powerful than removing elements by giving the element to a remove function on the list. And if you really want to use a loop, then you still can. You just can't use foreach. for(auto r = organism[]; !r.empty; r.popFront()) { if(r.front.x == x&& r.front.y == y) { organism.stableLinearRemove(take(r, 1)); break; } } but that's a lot more verbose than simply using find, and as I said, in at least some cases, find can be more efficient than simply looping, since it's optimized for finding elements. - Jonathan M Davis

Hello again. I've read all further replies in this thread, but it seems to me this is the most appropriate place to respond. Simply put, all of those options are too verbose. If what you want to do is simple, the syntax should also be simple, this is what I love about D. As a side-note, I don't come from Java, but I still expect a container to be able to remove an element by passing it to a method. The verbosity and the details of this operation should be hidden in the implementation of the method and I shouldn't need to know about the details. Otherwise, I could just as well implement my own container.

The basic response to this is, when dealing with containers generically (that is, you know you have a container, but you don't know what type), the "remove this element" operation is not necessarily a good primitive to have. Simply because from the myriad of containers, only some can implement this operation efficiently. Java embeds this operation in the interface, which means any interface you have to a container could potentially use O(n) time to remove that element. Such an innocuous piece of syntax *should* have a cost if it's not efficient IMO. BTW, the original question doesn't provide enough information to say "remove this element." Even in Java, if you aren't using the default comparison, you must use a comparator method to determine which one to remove. If cell.x == x && cell.y == y *is* the comparison operator for the type, then the syntax gets much simpler, because you don't need to pass a specialized comparison function. In dcollections, removing a specific element (using the default comparison operator for that element) on a *fast lookup* container is as simple as: container.remove(container.find(x)); Which removes the element x if it's found. However, this is not defined for containers which use O(n) time to search (such as linked list), you must use std.algorithm.find for that: container.remove(find(container[], x).begin); Should work, and takes O(n) time. -Steve

I don't really understand what's wrong with inefficient methods. You can have inefficient methods that are convenient, like removing an element by the default comparison, or giving it a delegate to match the element(s) to remove. You profile your application. Is that method the bottle-neck? If so, you change it to a more efficient one. If not, you are happy you had that method there, performing in an inefficient way, but which doesn't matter that much compared to, say, opening an SQL connection. Programmers want to program, fast. They have schedules, they need to deliver. They don't need to always find the best solution. They can find a compromise between "working" and "fast", move on, and later profile and worry about what matters most. Programmers don't want to fight with the language or think "Oh, so to remove an element I need to use this operation and combine it with that one and with that other one"...
Nov 02 2011
parent reply Ary Manzana <ary esperanto.org.ar> writes:
On 11/2/11 10:12 AM, Steven Schveighoffer wrote:
 On Wed, 02 Nov 2011 08:40:19 -0400, Ary Manzana <ary esperanto.org.ar>
 wrote:

 On 11/2/11 8:48 AM, Steven Schveighoffer wrote:
 The basic response to this is, when dealing with containers generically
 (that is, you know you have a container, but you don't know what type),
 the "remove this element" operation is not necessarily a good primitive
 to have.

 Simply because from the myriad of containers, only some can implement
 this operation efficiently. Java embeds this operation in the interface,
 which means any interface you have to a container could potentially use
 O(n) time to remove that element. Such an innocuous piece of syntax
 *should* have a cost if it's not efficient IMO.

 BTW, the original question doesn't provide enough information to say
 "remove this element." Even in Java, if you aren't using the default
 comparison, you must use a comparator method to determine which one to
 remove. If cell.x == x && cell.y == y *is* the comparison operator for
 the type, then the syntax gets much simpler, because you don't need to
 pass a specialized comparison function.

 In dcollections, removing a specific element (using the default
 comparison operator for that element) on a *fast lookup* container is as
 simple as:

 container.remove(container.find(x));

 Which removes the element x if it's found. However, this is not defined
 for containers which use O(n) time to search (such as linked list), you
 must use std.algorithm.find for that:

 container.remove(find(container[], x).begin);

 Should work, and takes O(n) time.

 -Steve

I don't really understand what's wrong with inefficient methods. You can have inefficient methods that are convenient, like removing an element by the default comparison, or giving it a delegate to match the element(s) to remove. You profile your application. Is that method the bottle-neck? If so, you change it to a more efficient one. If not, you are happy you had that method there, performing in an inefficient way, but which doesn't matter that much compared to, say, opening an SQL connection. Programmers want to program, fast. They have schedules, they need to deliver. They don't need to always find the best solution. They can find a compromise between "working" and "fast", move on, and later profile and worry about what matters most. Programmers don't want to fight with the language or think "Oh, so to remove an element I need to use this operation and combine it with that one and with that other one"...

Or use the right container for the job? Where it really comes into play is generic programming. Let's say I write some algorithm that removes certain elements from a container: removeElements(C, T)(C c, T t[]...) { foreach(x; t) c.remove(t); } What's the complexity of this algorithm? For a HashSet, for instance, it will be O(n) where n is the number of elements to remove. But for an ArrayList, it will be O(n*m) where m is the number of elements in c.

But I'm sure in this algorithm I have for this app I'm making, my collection won't have more than 50 elements. So everything will be O(1). I need to remove an element from the collection. I really don't care about the complexity of the operation, because if n is 50, everything is O(1). Why can't I have an inefficient (for you) remove operation that, for me, will be ok? I don't want to spend 2 hours browsing the ranges algorithms to figure out how to combine them to remove an element... My point is, as someone else said it in another post: add inefficient operations and tell the programmer so. Then he can decide what to do. If he's sure that that performance penalty is not a big issue for him, he'll be happy to have that method available.
Nov 02 2011
next sibling parent reply Max Wolter <awishformore gmail.com> writes:
On 11/2/2011 2:41 PM, Steven Schveighoffer wrote:
 On Wed, 02 Nov 2011 09:17:39 -0400, Ary Manzana <ary esperanto.org.ar>
 wrote:

 On 11/2/11 10:12 AM, Steven Schveighoffer wrote:
 On Wed, 02 Nov 2011 08:40:19 -0400, Ary Manzana <ary esperanto.org.ar>
 wrote:

 On 11/2/11 8:48 AM, Steven Schveighoffer wrote:
 The basic response to this is, when dealing with containers
 generically
 (that is, you know you have a container, but you don't know what
 type),
 the "remove this element" operation is not necessarily a good
 primitive
 to have.

 Simply because from the myriad of containers, only some can implement
 this operation efficiently. Java embeds this operation in the
 interface,
 which means any interface you have to a container could potentially
 use
 O(n) time to remove that element. Such an innocuous piece of syntax
 *should* have a cost if it's not efficient IMO.

 BTW, the original question doesn't provide enough information to say
 "remove this element." Even in Java, if you aren't using the default
 comparison, you must use a comparator method to determine which one to
 remove. If cell.x == x && cell.y == y *is* the comparison operator for
 the type, then the syntax gets much simpler, because you don't need to
 pass a specialized comparison function.

 In dcollections, removing a specific element (using the default
 comparison operator for that element) on a *fast lookup* container
 is as
 simple as:

 container.remove(container.find(x));

 Which removes the element x if it's found. However, this is not
 defined
 for containers which use O(n) time to search (such as linked list),
 you
 must use std.algorithm.find for that:

 container.remove(find(container[], x).begin);

 Should work, and takes O(n) time.

 -Steve

I don't really understand what's wrong with inefficient methods. You can have inefficient methods that are convenient, like removing an element by the default comparison, or giving it a delegate to match the element(s) to remove. You profile your application. Is that method the bottle-neck? If so, you change it to a more efficient one. If not, you are happy you had that method there, performing in an inefficient way, but which doesn't matter that much compared to, say, opening an SQL connection. Programmers want to program, fast. They have schedules, they need to deliver. They don't need to always find the best solution. They can find a compromise between "working" and "fast", move on, and later profile and worry about what matters most. Programmers don't want to fight with the language or think "Oh, so to remove an element I need to use this operation and combine it with that one and with that other one"...

Or use the right container for the job? Where it really comes into play is generic programming. Let's say I write some algorithm that removes certain elements from a container: removeElements(C, T)(C c, T t[]...) { foreach(x; t) c.remove(t); } What's the complexity of this algorithm? For a HashSet, for instance, it will be O(n) where n is the number of elements to remove. But for an ArrayList, it will be O(n*m) where m is the number of elements in c.

But I'm sure in this algorithm I have for this app I'm making, my collection won't have more than 50 elements. So everything will be O(1). I need to remove an element from the collection. I really don't care about the complexity of the operation, because if n is 50, everything is O(1).

Then your specific application can use std.algorithm.find to implement the equivalent. Only the algorithm body changes, not the usage of the algorithm.
 Why can't I have an inefficient (for you) remove operation that, for
 me, will be ok?

I never said you couldn't (and I've even given examples of such implementations). It's just not neatly packaged into a method. But again, if the method is exactly the same as the efficient version for other containers, it becomes *impossible* to design an algorithm that guarantees any sort of complexity. As I said before, quadratic sort is epic fail, and needs to always be avoided. I'll give you a scenario: People often complain that Linked List does not have an opIndex on it. Yes it's inefficient, but so what? "I know it's inefficient, let me decide whether it's worth it or not." So let's say I add it to LinkList. Those people are happy. But now, LinkList becomes defined as a *random-access-range* according to std.ranges. Therefore, std.algorithm.sort(linklist) compiles! And is now something like O(n^3). Whereas LinkList already defines a sort method, which uses mergesort (O(nlgn)). So are you going to realize this when reading someones code and you see: sort(somelist); That it's going to be horribly inefficient? Why shouldn't we strive for a library where such things just don't compile?
 I don't want to spend 2 hours browsing the ranges algorithms to figure
 out how to combine them to remove an element...

You severely exaggerate the amount of time needed to browse the algorithms. I wrote an equivalent in less than a minute. And I don't even look at std.algorithm maybe once every few months (and didn't look at it to write the solution).
 My point is, as someone else said it in another post: add inefficient
 operations and tell the programmer so. Then he can decide what to do.
 If he's sure that that performance penalty is not a big issue for him,
 he'll be happy to have that method available.

The operation is available using std.algorithm. There is no need to reimplement it as a method, sanctioned by the container. As I said before, if you find yourself wanting to remove specific elements from a container, perhaps Array is not the smartest choice for a container. There are plenty of others. There is something to be said for a language/library that discourages or prevents you from writing dumb code. It means that I'm that much more confident a piece of code written in that language is more efficient than one written in another language. -Steve

Hello. You generally arguing this is all nice and good, but this is a very specific case. I am using a LinkList because in my code, the elements are iterated over a million times and during this, I add stuff in-between elements all the time. However, I will be removing stuff *very* rarely. I am thus using the appropriate container and it doesn't matter whether the remove will be inefficient. To put it another way: if removing elements was of crucial importance to the performance of my code in the first place, I wouldn't (and shouldn't) be using a LinkList. Therefore, implementing an inefficient method which does this won't be of consequence. Finally, the fundamental statement I'm trying to make here is: adding and removing *single* elements should be a straightforward method call for *any* container. As a side note, your example about generic programming is really neat and makes sense; personally, I would never want a linked list with indexes and it's also a horrible analogy to the complaint at hand. /Max
Nov 02 2011
parent Max Wolter <awishformore gmail.com> writes:
 As long as you don't need to search for the element to remove using its
 value, removal in a linked list should be O(1). A linked list that does
 not allow O(1) removal and O(1) insertion given a topological reference
 is a failure (yes, that includes the current version of SList).

Well, thank god that's cleared up. Time to move on. /Max
Nov 04 2011
prev sibling parent reply Mike Parker <aldacron gmail.com> writes:
On 11/2/2011 10:41 PM, Steven Schveighoffer wrote:

 Then your specific application can use std.algorithm.find to implement
 the equivalent. Only the algorithm body changes, not the usage of the
 algorithm.

This is the root of the problem. How are we supposed to know that we need something from std.algorithm to remove something from a container in std.container? It's non-intuitive and non-obvious unless you already have more than a passing familiarity with ranges. If we can't have remove(element) on a container, then we need some good documentation in the std.container module that points the way.
Nov 03 2011
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-11-03 08:41, Mike Parker wrote:
 On 11/2/2011 10:41 PM, Steven Schveighoffer wrote:

 Then your specific application can use std.algorithm.find to implement
 the equivalent. Only the algorithm body changes, not the usage of the
 algorithm.

This is the root of the problem. How are we supposed to know that we need something from std.algorithm to remove something from a container in std.container? It's non-intuitive and non-obvious unless you already have more than a passing familiarity with ranges. If we can't have remove(element) on a container, then we need some good documentation in the std.container module that points the way.

What about having a remove method on a container that calls remove in std.algorithm, just as a convenience. -- /Jacob Carlborg
Nov 03 2011
prev sibling parent travert phare.normalesup.org (Christophe) writes:
"Steven Schveighoffer" , dans le message (digitalmars.D.learn:30402), a
 The primitive for a container is remove(range).  Ranges are essential to  
 containers, and should be the major interface to them.

Programmers have to learn ranges to use containers. Hiding ranges is not helping them. But here, it is more complicated than that, because range may be more powerful than iterators, they are less friendly to use when a single element reference has to be used. c.remove(find(c.all, E)); will not remove the first occurence of E, but all elements beyond E. so instead we have to write: c.remove(take(find(c[], E), 1)); Then it's too much. The problem is that range are not practical to refer to a single element in the container. We need to have single-element reference to manipulate the range. Then a function should be used to find such one-element reference. std.find is already taken, and can hardly be changed (although it should be name popUntil), but we can still use a find method of the container. auto r = take(find(c[], E), 1); should just be written: auto r = c.find(E); Then the syntax to remove a single element from c is acceptable: c.remove(c.find(E)). Now let us remove several consecutive elements from c, let us say, all elements between the value E and the next value F: | auto r = find(c[], E); | int i=0; | foreach(e, r) | if (e == F) break; | else ++i; | c.remove(take(r, i+1)); That is not practical at all, and in addition, it is not efficient, since r is walked again from E to F. If we add little methods to single element reference to make them behave a little like iterators, we can recreate ranges from two single element references, and regain all the power of iterators. To remove all elements from E to the next F included: auto r = c.find( E); c.remove(r, r.find(F)++); // or c.remove(range(r, r.find(F)++)); (I use the find method a bit like a joker in this exemple, it is just to get the idea). In conclusion, to refer to a single element of a container for simple operations, range looses against iterator. Ranges even loose to refer to a range of consecutive elements... Many alternatives are possible, but a simple iterator structure to refer to a single element, and that you can combine to recreate a range (and use all range algorithms) would be enough, and would complete the range interface to make them have no drawback against iterators.
Nov 03 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, October 30, 2011 20:53:02 Max Wolter wrote:
 Hello there.
 
 Thank you very much for the explanation.
 
 However, while I really liked the concept of ranges in Andrei's book and
 a lot of it seems intuitive and faster than using iterators, I can't
 shake the feeling that in this case, it's just unnecessarily convoluted.
 
 Maybe it's just the fact that the container library is still very basic,
 but I don't think I should go through such a complicated procedure to
 remove an known element from a list. It's just not a "very simple" or
 intuitive solution, which is something I came to love D for thus far.

You would be doing exactly the same thing in C++ except that it would be with an iterator rather than a range. You would use find to find the iterator to the element that you wanted and then you'd pass that to the list's erase function. It is only more complicated in D in that you get a range which _starts_ with the element that you want (essentially, you get the iterator to that element plus the end iterator for the list), so if you want to remove only the first element, you take it from the front of the range. Other than that, it's the same as in C++. You can't remove an element from a list in C++ by giving that element to the list anymore than you can in D. In both cases, you need an iterator or a range. So, in comparison to C++, there's no significant difference. Now, Java does have a remove function which will take an element and remove the first occurence of that element from a list, and we could theoretically add one, but why? It would just be duplicating the functionality that find already gives. Java doesn't use iterators the way the C++ does, and it doesn't have ranges, so it _can't_ have a find function the way that C++ and D do, but D can. And in some cases, find can do it more efficiently than your loop could have. I grant you that if you're not used to using find like this in C++ (and far, far too many C++ programmers use loops instead of find - in part because pre- C++11, there were no lambdas and any time you need a custom comparison function, it's a pain to get one), then it's not immediately intuitive, but it's far more flexible and powerful than removing elements by giving the element to a remove function on the list. And if you really want to use a loop, then you still can. You just can't use foreach. for(auto r = organism[]; !r.empty; r.popFront()) { if(r.front.x == x && r.front.y == y) { organism.stableLinearRemove(take(r, 1)); break; } } but that's a lot more verbose than simply using find, and as I said, in at least some cases, find can be more efficient than simply looping, since it's optimized for finding elements. - Jonathan M Davis
Oct 30 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, October 31, 2011 12:11:45 Mike Parker wrote:
 On 10/31/2011 5:28 AM, Jonathan M Davis wrote:
 So, in comparison to C++, there's no significant difference. Now, Java
 does have a remove function which will take an element and remove the
 first occurence of that element from a list, and we could theoretically
 add one, but why?

thing a lot of people expect coming from a Java background, that's for sure. The first time I tried to use SList, being unfamiliar with ranges as I was, it took a while to figure out what I needed to do. IIRC, I had to post here to ask. The problem is that you really have to understand ranges and the Phobos functions that manipulate them before you can begin to use containers. I think that's wrong. Ideally, containers should be usable without having to know about find and Take and whatever else. This isn't the first time this question has come up in the NG and I've got a feeling it won't be the last. At the very least it would be nice to see something in the documentation tying std.algorithm and std.range together with std.container. Something to point the way for those to whom it isn't obvious.

I definitely agree that the documentation should be improved to at least give the basic example of using find with linearRemove. Adding something similar to C++'s remove_if which removes all elements which match a predicate could also be useful, but I don't at all agree that there should be a function which takes an element and removes the first element which is equal to it. find allows you to do that just fine, and such a remove function would simply be duplicating its functionality. As for understanding ranges, it's definitely true that there needs to be more documentation and/or articles on them so that the concept can better communicated. That's a definite flaw in the current documentation. But there's a lot in Phobos which you're just not going to be able to use if you don't understand ranges, and the library would definitely be worse if it used them less, since they're such a powerful concept, so I think that the problem is the lack of communication on ranges, not the fact that ranges are used so heavily in Phobos. - Jonathan M Davis
Oct 30 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 30 Oct 2011 06:38:30 -0400, Max Wolter <awishformore gmail.com>  
wrote:

 Hello there.

 I seem to be having problems wrapping my head around how to use the  
 ranges in the context of containers in phobos. Specifically, I can't  
 seem to figure out how to remove an element from a linked list.

           foreach(cell; organism)
           {
              if(cell.x == x && cell.y == y)
              {
                 organism.stableLinearRemove(cell);
                 break;
              }
           }

 mind.d(123): Error: function  
 std.container.SList!(Cell).SList.linearRemove (Range r) is not callable  
 using argument types (Cell)
 mind.d(123): Error: cannot implicitly convert expression (cell) of type  
 cell.Cell to Take!(Range)

 I somehow get the feeling such a basic operation should just...work?

I offer an alternative collection library to std.container which has a simpler (IMO) interface and features. There is specifically a feature for removal during iteration. It would look like this: foreach(ref doRemove, cell; &organism.purge) { doRemove = cell.x == x && cell.y == y; // flag indicating the current element should be removed // note, not necessary to break here if(doRemove) break; } Note two things, removal during iteration is directly supported, meaning it does not derail the iteration; and removal during iteration is a quick operation (O(lg(n)) or better) in terms of the individual removal. If you have a linked list, you can also do it the "range" way: organism.remove(find!((a){return a.x == x && a.y == y;})(organism[]).begin); begin is an accessor for a dcollections range that returns a reference (cursor) to the first element in the range. Note also that dcollections' linked list is doubly-linked, so removal is O(1). -Steve
Oct 31 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 31 Oct 2011 05:16:11 -0400, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 On 31.10.2011 11:16, Tobias Pankrath wrote:
 Jonathan M Davis wrote:

   find allows
 you to do that just fine, and such a remove function would simply be
 duplicating its functionality.

But this thread serves as a proof that it is not obvious and something like containers should be obvious to use. Deleting an element with a certain value is a common task and should should not take three function calls, with functions from three different modules.
 You would be doing exactly the same thing in C++ except that it would  
 be
 with an iterator rather than a range. You would use find to find the
 iterator to the element that you wanted and then you'd pass that to the
 list's erase function.

How do you delete every occurence of v? Not just the first one? Is this equally "easy" with find and take? I didn't find an easy way not envolving a loop within 10 Minutes by reading the documentation.

It is, let's use remove from std.algorithm! ... though there is no overload of remove for forward ranges ... crap, worthy of bugzilla report. Or SList could be special enough to offer a built-in remove with predicate done in O(n). However this should work( yet because of apparent bug in SList it doesn't). The bug is: Line 1294 of std.container should be: for (; !r.empty; r.popFront()) ... or it will remove the whole container. Code: import std.container, std.range; import std.functional, std.algorithm, std.stdio; void removeFromList(alias pred, T)(ref SList!T s) { static if(is(typeof(pred) == string)) alias unaryFun!pred Pred; else alias pred Pred; for(auto r=s[]; !r.empty; ) { if(Pred(r.front)) { r = s.linearRemove(take(r,1)); continue; } else r.popFront(); } } void main() { SList!int list(20, 30, 40, 50, 60); // will work with stock Phobos if the first element % 20 != 0 removeFromList!"a % 20 == 0"(list); writeln(list[]); } And more importantly, it still would be horribly slow O(N^2). Personally, because of that I'd prefer hand-rolled intrusive singly-linked list any time of day.

ahem, using dcollections: foreach(ref doRemove, cell; &organism.purge) doRemove = cell.x == x && cell.y == y; complexity: O(n) :) -Steve
Oct 31 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 31 Oct 2011 12:40:37 -0400, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 On 31.10.2011 18:38, Steven Schveighoffer wrote:
 On Mon, 31 Oct 2011 05:16:11 -0400, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 31.10.2011 11:16, Tobias Pankrath wrote:
 Jonathan M Davis wrote:

 find allows
 you to do that just fine, and such a remove function would simply be
 duplicating its functionality.

But this thread serves as a proof that it is not obvious and something like containers should be obvious to use. Deleting an element with a certain value is a common task and should should not take three function calls, with functions from three different modules.
 You would be doing exactly the same thing in C++ except that it
 would be
 with an iterator rather than a range. You would use find to find the
 iterator to the element that you wanted and then you'd pass that to  
 the
 list's erase function.

How do you delete every occurence of v? Not just the first one? Is this equally "easy" with find and take? I didn't find an easy way not envolving a loop within 10 Minutes by reading the documentation.

It is, let's use remove from std.algorithm! ... though there is no overload of remove for forward ranges ... crap, worthy of bugzilla report. Or SList could be special enough to offer a built-in remove with predicate done in O(n). However this should work( yet because of apparent bug in SList it doesn't). The bug is: Line 1294 of std.container should be: for (; !r.empty; r.popFront()) ... or it will remove the whole container. Code: import std.container, std.range; import std.functional, std.algorithm, std.stdio; void removeFromList(alias pred, T)(ref SList!T s) { static if(is(typeof(pred) == string)) alias unaryFun!pred Pred; else alias pred Pred; for(auto r=s[]; !r.empty; ) { if(Pred(r.front)) { r = s.linearRemove(take(r,1)); continue; } else r.popFront(); } } void main() { SList!int list(20, 30, 40, 50, 60); // will work with stock Phobos if the first element % 20 != 0 removeFromList!"a % 20 == 0"(list); writeln(list[]); } And more importantly, it still would be horribly slow O(N^2). Personally, because of that I'd prefer hand-rolled intrusive singly-linked list any time of day.

ahem, using dcollections: foreach(ref doRemove, cell; &organism.purge) doRemove = cell.x == x && cell.y == y; complexity: O(n)

While computer happily does O(n) work here, my brain screams in cognitive dissonance figuring out that this assignment actually does delete... meh voodoo through and through :) I'd rather organism.remove!((cell){ return cell.x == x && cell.y == y; })(); or something like that. But it's not a question of form, but more of a missing stuff that should be there.

That could also be added (and I agree it's a good idea). Alas, interface templates still don't work, so it'd have to be defined as a convention. Note that this isn't really the intended usage for purge (though it works quite nicely as an afterthought). Typically, when I use purge, I'm iterating through all the elements, processing them somehow, and then deciding to remove them or not based on the processing. For instance, a main thread processing results from sub-threads, and removing ones that are finished. Think of it as an implementation of Java's Iterator.remove: http://download.oracle.com/javase/7/docs/api/java/util/Iterator.html#remove() -Steve
Oct 31 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 01 Nov 2011 16:53:26 -0400, Max Wolter <awishformore gmail.com>  
wrote:

 On 10/30/2011 9:28 PM, Jonathan M Davis wrote:
 On Sunday, October 30, 2011 20:53:02 Max Wolter wrote:
 Hello there.

 Thank you very much for the explanation.

 However, while I really liked the concept of ranges in Andrei's book  
 and
 a lot of it seems intuitive and faster than using iterators, I can't
 shake the feeling that in this case, it's just unnecessarily  
 convoluted.

 Maybe it's just the fact that the container library is still very  
 basic,
 but I don't think I should go through such a complicated procedure to
 remove an known element from a list. It's just not a "very simple" or
 intuitive solution, which is something I came to love D for thus far.

You would be doing exactly the same thing in C++ except that it would be with an iterator rather than a range. You would use find to find the iterator to the element that you wanted and then you'd pass that to the list's erase function. It is only more complicated in D in that you get a range which _starts_ with the element that you want (essentially, you get the iterator to that element plus the end iterator for the list), so if you want to remove only the first element, you take it from the front of the range. Other than that, it's the same as in C++. You can't remove an element from a list in C++ by giving that element to the list anymore than you can in D. In both cases, you need an iterator or a range. So, in comparison to C++, there's no significant difference. Now, Java does have a remove function which will take an element and remove the first occurence of that element from a list, and we could theoretically add one, but why? It would just be duplicating the functionality that find already gives. Java doesn't use iterators the way the C++ does, and it doesn't have ranges, so it _can't_ have a find function the way that C++ and D do, but D can. And in some cases, find can do it more efficiently than your loop could have. I grant you that if you're not used to using find like this in C++ (and far, far too many C++ programmers use loops instead of find - in part because pre- C++11, there were no lambdas and any time you need a custom comparison function, it's a pain to get one), then it's not immediately intuitive, but it's far more flexible and powerful than removing elements by giving the element to a remove function on the list. And if you really want to use a loop, then you still can. You just can't use foreach. for(auto r = organism[]; !r.empty; r.popFront()) { if(r.front.x == x&& r.front.y == y) { organism.stableLinearRemove(take(r, 1)); break; } } but that's a lot more verbose than simply using find, and as I said, in at least some cases, find can be more efficient than simply looping, since it's optimized for finding elements. - Jonathan M Davis

Hello again. I've read all further replies in this thread, but it seems to me this is the most appropriate place to respond. Simply put, all of those options are too verbose. If what you want to do is simple, the syntax should also be simple, this is what I love about D. As a side-note, I don't come from Java, but I still expect a container to be able to remove an element by passing it to a method. The verbosity and the details of this operation should be hidden in the implementation of the method and I shouldn't need to know about the details. Otherwise, I could just as well implement my own container.

The basic response to this is, when dealing with containers generically (that is, you know you have a container, but you don't know what type), the "remove this element" operation is not necessarily a good primitive to have. Simply because from the myriad of containers, only some can implement this operation efficiently. Java embeds this operation in the interface, which means any interface you have to a container could potentially use O(n) time to remove that element. Such an innocuous piece of syntax *should* have a cost if it's not efficient IMO. BTW, the original question doesn't provide enough information to say "remove this element." Even in Java, if you aren't using the default comparison, you must use a comparator method to determine which one to remove. If cell.x == x && cell.y == y *is* the comparison operator for the type, then the syntax gets much simpler, because you don't need to pass a specialized comparison function. In dcollections, removing a specific element (using the default comparison operator for that element) on a *fast lookup* container is as simple as: container.remove(container.find(x)); Which removes the element x if it's found. However, this is not defined for containers which use O(n) time to search (such as linked list), you must use std.algorithm.find for that: container.remove(find(container[], x).begin); Should work, and takes O(n) time. -Steve
Nov 02 2011
prev sibling next sibling parent Paolo Invernizzi <arathorn fastwebnet.it> writes:
On Nov 2, 2011, at 12:48 PM, Steven Schveighoffer wrote:

 Hello again.
=20
 I've read all further replies in this thread, but it seems to me this =


=20
 Simply put, all of those options are too verbose. If what you want to =


about D. As a side-note, I don't come from Java, but I still expect a = container to be able to remove an element by passing it to a method. The = verbosity and the details of this operation should be hidden in the = implementation of the method and I shouldn't need to know about the = details. Otherwise, I could just as well implement my own container.
=20
 The basic response to this is, when dealing with containers =

what type), the "remove this element" operation is not necessarily a = good primitive to have.
=20
 Simply because from the myriad of containers, only some can implement =

interface, which means any interface you have to a container could = potentially use O(n) time to remove that element. Such an innocuous = piece of syntax *should* have a cost if it's not efficient IMO.
=20
 BTW, the original question doesn't provide enough information to say =

comparison, you must use a comparator method to determine which one to = remove. If cell.x =3D=3D x && cell.y =3D=3D y *is* the comparison = operator for the type, then the syntax gets much simpler, because you = don't need to pass a specialized comparison function.
=20
 In dcollections, removing a specific element (using the default =

simple as:
=20
 container.remove(container.find(x));
=20
 Which removes the element x if it's found.  However, this is not =

list), you must use std.algorithm.find for that:
=20
 container.remove(find(container[], x).begin);
=20
 Should work, and takes O(n) time.
=20
 -Steve

I can't really understand what is wrong with an inefficient "remove this = element" primitive if it's really what the user had to do... Once it's in the documentation that the operation is inefficient, why = the user must be forced to dig into the newsgroup to find out some code?=20= Why can't that be furnished in a free function, for example? Cheers, Paolo
Nov 02 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 02 Nov 2011 08:40:19 -0400, Ary Manzana <ary esperanto.org.ar>  
wrote:

 On 11/2/11 8:48 AM, Steven Schveighoffer wrote:
 The basic response to this is, when dealing with containers generically
 (that is, you know you have a container, but you don't know what type),
 the "remove this element" operation is not necessarily a good primitive
 to have.

 Simply because from the myriad of containers, only some can implement
 this operation efficiently. Java embeds this operation in the interface,
 which means any interface you have to a container could potentially use
 O(n) time to remove that element. Such an innocuous piece of syntax
 *should* have a cost if it's not efficient IMO.

 BTW, the original question doesn't provide enough information to say
 "remove this element." Even in Java, if you aren't using the default
 comparison, you must use a comparator method to determine which one to
 remove. If cell.x == x && cell.y == y *is* the comparison operator for
 the type, then the syntax gets much simpler, because you don't need to
 pass a specialized comparison function.

 In dcollections, removing a specific element (using the default
 comparison operator for that element) on a *fast lookup* container is as
 simple as:

 container.remove(container.find(x));

 Which removes the element x if it's found. However, this is not defined
 for containers which use O(n) time to search (such as linked list), you
 must use std.algorithm.find for that:

 container.remove(find(container[], x).begin);

 Should work, and takes O(n) time.

 -Steve

I don't really understand what's wrong with inefficient methods. You can have inefficient methods that are convenient, like removing an element by the default comparison, or giving it a delegate to match the element(s) to remove. You profile your application. Is that method the bottle-neck? If so, you change it to a more efficient one. If not, you are happy you had that method there, performing in an inefficient way, but which doesn't matter that much compared to, say, opening an SQL connection. Programmers want to program, fast. They have schedules, they need to deliver. They don't need to always find the best solution. They can find a compromise between "working" and "fast", move on, and later profile and worry about what matters most. Programmers don't want to fight with the language or think "Oh, so to remove an element I need to use this operation and combine it with that one and with that other one"...

Or use the right container for the job? Where it really comes into play is generic programming. Let's say I write some algorithm that removes certain elements from a container: removeElements(C, T)(C c, T t[]...) { foreach(x; t) c.remove(t); } What's the complexity of this algorithm? For a HashSet, for instance, it will be O(n) where n is the number of elements to remove. But for an ArrayList, it will be O(n*m) where m is the number of elements in c. So what we get is, an algorithm whose complexity depends on the complexity of an operation that varies *widely*. What you end up with is things like sorting algorithms which are O(n^2) or even worse. What omitting those methods from the containers that don't support it does is allow you to make *predictably efficient* generic algorithms. For example, you know sorting is going to be at most O(nlgn). I don't care how beautiful the syntax is, a quadratic sort is epic fail, and should be avoided at all costs. std.container tries to allow having these inefficient methods by changing the name (so algorithms can still claim efficiency by not using those names). My stance is this makes the API overly complex for very little gain. If something is inefficient, either don't use it, or use the range interface via std.algorithm. Why should I write a linear search algorithm when std.algorithm already has done this? -Steve
Nov 02 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 02 Nov 2011 09:17:39 -0400, Ary Manzana <ary esperanto.org.ar>  
wrote:

 On 11/2/11 10:12 AM, Steven Schveighoffer wrote:
 On Wed, 02 Nov 2011 08:40:19 -0400, Ary Manzana <ary esperanto.org.ar>
 wrote:

 On 11/2/11 8:48 AM, Steven Schveighoffer wrote:
 The basic response to this is, when dealing with containers  
 generically
 (that is, you know you have a container, but you don't know what  
 type),
 the "remove this element" operation is not necessarily a good  
 primitive
 to have.

 Simply because from the myriad of containers, only some can implement
 this operation efficiently. Java embeds this operation in the  
 interface,
 which means any interface you have to a container could potentially  
 use
 O(n) time to remove that element. Such an innocuous piece of syntax
 *should* have a cost if it's not efficient IMO.

 BTW, the original question doesn't provide enough information to say
 "remove this element." Even in Java, if you aren't using the default
 comparison, you must use a comparator method to determine which one to
 remove. If cell.x == x && cell.y == y *is* the comparison operator for
 the type, then the syntax gets much simpler, because you don't need to
 pass a specialized comparison function.

 In dcollections, removing a specific element (using the default
 comparison operator for that element) on a *fast lookup* container is  
 as
 simple as:

 container.remove(container.find(x));

 Which removes the element x if it's found. However, this is not  
 defined
 for containers which use O(n) time to search (such as linked list),  
 you
 must use std.algorithm.find for that:

 container.remove(find(container[], x).begin);

 Should work, and takes O(n) time.

 -Steve

I don't really understand what's wrong with inefficient methods. You can have inefficient methods that are convenient, like removing an element by the default comparison, or giving it a delegate to match the element(s) to remove. You profile your application. Is that method the bottle-neck? If so, you change it to a more efficient one. If not, you are happy you had that method there, performing in an inefficient way, but which doesn't matter that much compared to, say, opening an SQL connection. Programmers want to program, fast. They have schedules, they need to deliver. They don't need to always find the best solution. They can find a compromise between "working" and "fast", move on, and later profile and worry about what matters most. Programmers don't want to fight with the language or think "Oh, so to remove an element I need to use this operation and combine it with that one and with that other one"...

Or use the right container for the job? Where it really comes into play is generic programming. Let's say I write some algorithm that removes certain elements from a container: removeElements(C, T)(C c, T t[]...) { foreach(x; t) c.remove(t); } What's the complexity of this algorithm? For a HashSet, for instance, it will be O(n) where n is the number of elements to remove. But for an ArrayList, it will be O(n*m) where m is the number of elements in c.

But I'm sure in this algorithm I have for this app I'm making, my collection won't have more than 50 elements. So everything will be O(1). I need to remove an element from the collection. I really don't care about the complexity of the operation, because if n is 50, everything is O(1).

Then your specific application can use std.algorithm.find to implement the equivalent. Only the algorithm body changes, not the usage of the algorithm.
 Why can't I have an inefficient (for you) remove operation that, for me,  
 will be ok?

I never said you couldn't (and I've even given examples of such implementations). It's just not neatly packaged into a method. But again, if the method is exactly the same as the efficient version for other containers, it becomes *impossible* to design an algorithm that guarantees any sort of complexity. As I said before, quadratic sort is epic fail, and needs to always be avoided. I'll give you a scenario: People often complain that Linked List does not have an opIndex on it. Yes it's inefficient, but so what? "I know it's inefficient, let me decide whether it's worth it or not." So let's say I add it to LinkList. Those people are happy. But now, LinkList becomes defined as a *random-access-range* according to std.ranges. Therefore, std.algorithm.sort(linklist) compiles! And is now something like O(n^3). Whereas LinkList already defines a sort method, which uses mergesort (O(nlgn)). So are you going to realize this when reading someones code and you see: sort(somelist); That it's going to be horribly inefficient? Why shouldn't we strive for a library where such things just don't compile?
 I don't want to spend 2 hours browsing the ranges algorithms to figure  
 out how to combine them to remove an element...

You severely exaggerate the amount of time needed to browse the algorithms. I wrote an equivalent in less than a minute. And I don't even look at std.algorithm maybe once every few months (and didn't look at it to write the solution).
 My point is, as someone else said it in another post: add inefficient  
 operations and tell the programmer so. Then he can decide what to do. If  
 he's sure that that performance penalty is not a big issue for him,  
 he'll be happy to have that method available.

The operation is available using std.algorithm. There is no need to reimplement it as a method, sanctioned by the container. As I said before, if you find yourself wanting to remove specific elements from a container, perhaps Array is not the smartest choice for a container. There are plenty of others. There is something to be said for a language/library that discourages or prevents you from writing dumb code. It means that I'm that much more confident a piece of code written in that language is more efficient than one written in another language. -Steve
Nov 02 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, November 03, 2011 16:41:27 Mike Parker wrote:
 On 11/2/2011 10:41 PM, Steven Schveighoffer wrote:
 Then your specific application can use std.algorithm.find to implement
 the equivalent. Only the algorithm body changes, not the usage of the
 algorithm.

This is the root of the problem. How are we supposed to know that we need something from std.algorithm to remove something from a container in std.container? It's non-intuitive and non-obvious unless you already have more than a passing familiarity with ranges. If we can't have remove(element) on a container, then we need some good documentation in the std.container module that points the way.

It's quite clear that we need to do a better job getting the concept of ranges across (probably with at least one explanatory article) and that std.container's documentation needs improvement. - Jonathan M Davis
Nov 03 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Nov 2011 03:41:27 -0400, Mike Parker <aldacron gmail.com> wrote:

 On 11/2/2011 10:41 PM, Steven Schveighoffer wrote:

 Then your specific application can use std.algorithm.find to implement
 the equivalent. Only the algorithm body changes, not the usage of the
 algorithm.

This is the root of the problem. How are we supposed to know that we need something from std.algorithm to remove something from a container in std.container? It's non-intuitive and non-obvious unless you already have more than a passing familiarity with ranges. If we can't have remove(element) on a container, then we need some good documentation in the std.container module that points the way.

The primitive for a container is remove(range). Ranges are essential to containers, and should be the major interface to them. remove(value) translates to remove(findRangeFor(value)). I'm asserting that we should not promote this "shortcut" if it's unavoidably linear in nature, otherwise, remove(value) has the stigma of being slow, even for containers which can do it quickly. It's not the only choice, and there are ways to argue both sides. But the fact that one can still do this using std.algorithm makes it at least so you can have the best of both worlds -- it's difficult to do, not impossible, but we can still develop generic code with complexity guarantees. I agree the documentation needs more care. I think std.container suffers from neglect in other areas too. I argued this position, however, because even though it's not spelled out in std.container's docs, it *is* the position std.container is taking. -Steve
Nov 03 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Wed, 02 Nov 2011 19:24:03 -0400, Max Wolter <awishformore gmail.com>  
wrote:

 On 11/2/2011 2:41 PM, Steven Schveighoffer wrote:

 I never said you couldn't (and I've even given examples of such
 implementations). It's just not neatly packaged into a method.

 But again, if the method is exactly the same as the efficient version
 for other containers, it becomes *impossible* to design an algorithm
 that guarantees any sort of complexity. As I said before, quadratic sort
 is epic fail, and needs to always be avoided.

 I'll give you a scenario:

 People often complain that Linked List does not have an opIndex on it.
 Yes it's inefficient, but so what? "I know it's inefficient, let me
 decide whether it's worth it or not."

 So let's say I add it to LinkList. Those people are happy.

 But now, LinkList becomes defined as a *random-access-range* according
 to std.ranges. Therefore, std.algorithm.sort(linklist) compiles! And is
 now something like O(n^3).

 Whereas LinkList already defines a sort method, which uses mergesort
 (O(nlgn)). So are you going to realize this when reading someones code
 and you see:

 sort(somelist);

 That it's going to be horribly inefficient? Why shouldn't we strive for
 a library where such things just don't compile?

Hello. You generally arguing this is all nice and good, but this is a very specific case. I am using a LinkList because in my code, the elements are iterated over a million times and during this, I add stuff in-between elements all the time. However, I will be removing stuff *very* rarely. I am thus using the appropriate container and it doesn't matter whether the remove will be inefficient.

A linked list (any list really) is important if you want to maintain insertion order. If that's not important, don't use a list, a hash or tree is about as efficient. There exist (not in D) hybrid container types that allow O(1) removal using value, and maintain insertion order. In fact, the actual remove is not inefficient if you have a reference to an element (not just the value). Unfortunately, for SList, this is not the case, but it should be fixed at some point. But I still maintain, anything in std.container is not just a container for your code, it's a container that tries to cater to the most common needs. If you want a remove(value) function, it is trivial to write.
 To put it another way: if removing elements was of crucial importance to  
 the performance of my code in the first place, I wouldn't (and  
 shouldn't) be using a LinkList.

As long as you don't need to search for the element to remove using its value, removal in a linked list should be O(1). A linked list that does not allow O(1) removal and O(1) insertion given a topological reference is a failure (yes, that includes the current version of SList).
 Therefore, implementing an inefficient method which does this won't be  
 of consequence. Finally, the fundamental statement I'm trying to make  
 here is: adding and removing *single* elements should be a  
 straightforward method call for *any* container.

Adding, yes. Removing a container-selected element, yes. Removing a *specific* element, no. Inserting an element in a *specific location*, no. std.container has taken the stance that primitives should reflect the efficiency of the operation. This is not the only valid position to have. It's just std.container's position. For example, Java allows this.
 As a side note, your example about generic programming is really neat  
 and makes sense; personally, I would never want a linked list with  
 indexes and it's also a horrible analogy to the complaint at hand.

People have asked for it and argued vigorously for it on this newsgroup, just as you are arguing for this. -Steve
Nov 03 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath <tobias pankrath.net>  
wrote:

 Dmitry Olshansky wrote:

 And more importantly, it still would be horribly slow O(N^2).
 Personally, because of that I'd prefer hand-rolled intrusive
 singly-linked list any time of day.

To be honest, I don't understand this. A "remove_if" for non-intrusive single-linked lists should be doable in O(N).

SList is a poor singly linked list implementation. It does not support O(1) removal. -Steve
Nov 03 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 On 03.11.2011 19:34, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
 <tobias pankrath.net> wrote:

 Dmitry Olshansky wrote:

 And more importantly, it still would be horribly slow O(N^2).
 Personally, because of that I'd prefer hand-rolled intrusive
 singly-linked list any time of day.

To be honest, I don't understand this. A "remove_if" for non-intrusive single-linked lists should be doable in O(N).


Yeah, poor wording going to kill me one day :) And looking at how SList works with "checked" remove does O(N^2) turned out to be a context for reference for intrusive singly-linked list, just my bad. As for why I'd rather go with intrusive lists, it's because of it usually uses less memory due to node structures being more padding-free. Anyway the point should have been focused on _hand-rolled_ part, mostly because SList is plain not ready for prime time*. And btw singly-linked list it's not hard to implement and you can fine tune it how you like it e.g. they do play nice with free list allocation strategy. Not to say of circular lists and/or using sentinels.
 SList is a poor singly linked list implementation. It does not support
 O(1) removal.

 -Steve

Aye, looking at SList implementation I can say that it sort of tries to verify that this is a correct list. Otherwise it would be O(1). Indeed passing wrong range/iterator is quite bad in linked lists and could lead to all manner of funky bugs, but the cost is horrific. Especially since insert doesn't do this extra check.

No, it's necessarily doing O(n) search for current node because it has no reference to the previous node. The range type for a SList has a single pointer to the currently iterated node. How do you remove that node without having access to the head/previous pointer? I suggested to Andrei having the range keep track of the *previous* node/head, but he did not like that idea (too many dereferences for front()). Another option is to have a removeAllButFirst method, but that's kind of awkward...
 Actually, it opens up a question of "Checked ranges" akin to "Checked  
 iterators" used in many STL implementations. Any ideas what they can  
 carry around as an "ID" of list? Root pointer won't cut it, as it can be  
 easily removed during iteration. If SList was a class it's reference  
 probably would do.

dcollections stipulates that all ranges/cursors can be verified in O(lgn) time or better to belong to a specific container. In some cases, this adds an extra word to the range/cursor, and in others, it's easy to determine or the owner-reference was already needed. Since everything is a class, the fallback is to just stick an owner class instance in the range. This stipulation is necessary to allow safe slicing. -Steve
Nov 03 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Nov 2011 12:32:06 -0400, Christophe  
<travert phare.normalesup.org> wrote:

 "Steven Schveighoffer" , dans le message (digitalmars.D.learn:30402), a
 The primitive for a container is remove(range).  Ranges are essential to
 containers, and should be the major interface to them.

Programmers have to learn ranges to use containers. Hiding ranges is not helping them. But here, it is more complicated than that, because range may be more powerful than iterators, they are less friendly to use when a single element reference has to be used. c.remove(find(c.all, E)); will not remove the first occurence of E, but all elements beyond E. so instead we have to write: c.remove(take(find(c[], E), 1)); Then it's too much. The problem is that range are not practical to refer to a single element in the container. We need to have single-element reference to manipulate the range. Then a function should be used to find such one-element reference. std.find is already taken, and can hardly be changed (although it should be name popUntil), but we can still use a find method of the container. auto r = take(find(c[], E), 1); should just be written: auto r = c.find(E); Then the syntax to remove a single element from c is acceptable: c.remove(c.find(E)). Now let us remove several consecutive elements from c, let us say, all elements between the value E and the next value F: | auto r = find(c[], E); | int i=0; | foreach(e, r) | if (e == F) break; | else ++i; | c.remove(take(r, i+1)); That is not practical at all, and in addition, it is not efficient, since r is walked again from E to F. If we add little methods to single element reference to make them behave a little like iterators, we can recreate ranges from two single element references, and regain all the power of iterators. To remove all elements from E to the next F included: auto r = c.find( E); c.remove(r, r.find(F)++); // or c.remove(range(r, r.find(F)++)); (I use the find method a bit like a joker in this exemple, it is just to get the idea). In conclusion, to refer to a single element of a container for simple operations, range looses against iterator. Ranges even loose to refer to a range of consecutive elements... Many alternatives are possible, but a simple iterator structure to refer to a single element, and that you can combine to recreate a range (and use all range algorithms) would be enough, and would complete the range interface to make them have no drawback against iterators.

Preaching to the choir sir :) http://www.dsource.org/projects/dcollections/browser/branches/d2/concepts.txt If you can convince Andrei that cursors are a good addition to std.container, you have my admiration. I've tried and failed quite a few times. To illustrate syntax: auto cursor = c.elemAt(E); c.remove(c[cursor..c.end]); // note, $ could be used here, but opDollar is broken. Note, this only works if c supports elemAt. For example, TreeSet supports this, LinkList does not. But dcollections supports even other slicing abilities. For example TreeSet can do this too: c.remove(c[E..F]); If you have a container that doesn't support elemAt, you can use std.algorithm.find, and the dcollections' range.begin method to get a valid cursor: auto cursor = find(c[], E).begin; I realize this isn't much better than take(find(c[], E), 1), but I don't know if you realize what a task/chore it would be to create a simple shortcut in a class for std.algorithm.find. -Steve
Nov 03 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Nov 2011 13:47:28 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 11/03/2011 06:13 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 03.11.2011 19:34, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
 <tobias pankrath.net> wrote:

 Dmitry Olshansky wrote:

 And more importantly, it still would be horribly slow O(N^2).
 Personally, because of that I'd prefer hand-rolled intrusive
 singly-linked list any time of day.

To be honest, I don't understand this. A "remove_if" for non-intrusive single-linked lists should be doable in O(N).


Yeah, poor wording going to kill me one day :) And looking at how SList works with "checked" remove does O(N^2) turned out to be a context for reference for intrusive singly-linked list, just my bad. As for why I'd rather go with intrusive lists, it's because of it usually uses less memory due to node structures being more padding-free. Anyway the point should have been focused on _hand-rolled_ part, mostly because SList is plain not ready for prime time*. And btw singly-linked list it's not hard to implement and you can fine tune it how you like it e.g. they do play nice with free list allocation strategy. Not to say of circular lists and/or using sentinels.
 SList is a poor singly linked list implementation. It does not support
 O(1) removal.

 -Steve

Aye, looking at SList implementation I can say that it sort of tries to verify that this is a correct list. Otherwise it would be O(1). Indeed passing wrong range/iterator is quite bad in linked lists and could lead to all manner of funky bugs, but the cost is horrific. Especially since insert doesn't do this extra check.

No, it's necessarily doing O(n) search for current node because it has no reference to the previous node. The range type for a SList has a single pointer to the currently iterated node. How do you remove that node without having access to the head/previous pointer?

The container interface does not expose references to the 'Node' struct, therefore the following approach would be practical:

It does, actually. A range is a reference to a Node struct. But I still think the following approach will work.
 0. Have a sentinel for end of list. (O(1) additional memory for the  
 entire list). It is the only node in the list that has a null 'next'  
 pointer.

 example, for removing just the current node:
 1. if the following node is not the sentinel
 1.1. Move value from following node into the current node.
 1.2. Remove following node.
 2. if the following node is the sentinel
 2.1. erase the contents of the current node (iff it has indirections)
 2.2. set it's 'next' field to null


 Removing an entire range is just erasing the contents of the current  
 node and setting the 'next' field of the associated Node to zero.  
 Removing a Take!Range is a simple generalization of the algorithm above.

This would be a good addition to the type. It could not be a stableRemove, however, because it would invalidate any ranges pointing at the node you removed. Can you put together a pull request? If not, I can see about doing it. One issue, you could not do this if the value is immutable/const. I could use this idea, I think, to implement a singly linked list in dcollections as well (the prospect of not having O(1) removal is what has stopped me). Thanks for the idea! -Steve
Nov 03 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Nov 2011 14:08:31 -0400, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 On 03.11.2011 21:13, Steven Schveighoffer wrote:
 dcollections stipulates that all ranges/cursors can be verified in
 O(lgn) time or better to belong to a specific container. In some cases,
 this adds an extra word to the range/cursor, and in others, it's easy to
 determine or the owner-reference was already needed. Since everything is
 a class, the fallback is to just stick an owner class instance in the
 range.

 This stipulation is necessary to allow safe slicing.

Seems reasonable, I'd expect checks to go away in release, right(?).

For the moment, no. I am not sure whether this is the right decision or not, because once you get beyond arrays, when to do bounds checks becomes fuzzy. For example, imagine you have this: auto ts = new TreeSet!int(1, 3, 5, 7, 9); What does this mean? auto r = ts[2..4]; Note that a range type for a treeset has a pointer to a begin and end node for the container. For arrays, not doing a bounds check is simply less code. For a RBTree, you still have to look for the specific node, even if you are in release mode. Another example: auto ts2 = ts.dup; // duplicate the treeset auto c1 = ts2.elemAt(3); // get the node for 3 in ts2 auto r2 = ts[c1..ts.end]; Here I can say, we can skip the belongs check (which is an O(lgn) walk up the tree). But I'm still doing it in release mode. I'm not sure what's best. Should I just do the least work possible, or should I make it consistent with ts[2..4]? -Steve
Nov 03 2011
prev sibling next sibling parent Ali =?iso-8859-1?q?=C7ehreli?= <acehreli yahoo.com> writes:
On Thu, 03 Nov 2011 22:08:31 +0400, Dmitry Olshansky wrote:

 On 03.11.2011 21:13, Steven Schveighoffer wrote:
 The range type for a SList has a single pointer to the currently
 iterated node. How do you remove that node without having access to the
 head/previous pointer?

usually keep previous node but only when I remove elements during iteration.

Matt Austern's excellent "STL Singly Linked Lists" presentation is relevant: http://accu.org/index.php/accu_branches/accu_usa/past Slide 11 is titled "Solving the insert problem: off-by-one iterators". Also slide 16. Ali
Nov 03 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Nov 2011 14:50:31 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 11/03/2011 07:15 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 13:47:28 -0400, Timon Gehr <timon.gehr gmx.ch>  
 wrote:

 On 11/03/2011 06:13 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 03.11.2011 19:34, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
 <tobias pankrath.net> wrote:

 Dmitry Olshansky wrote:

 And more importantly, it still would be horribly slow O(N^2).
 Personally, because of that I'd prefer hand-rolled intrusive
 singly-linked list any time of day.

To be honest, I don't understand this. A "remove_if" for non-intrusive single-linked lists should be doable in O(N).


Yeah, poor wording going to kill me one day :) And looking at how SList works with "checked" remove does O(N^2) turned out to be a context for reference for intrusive singly-linked list, just my bad. As for why I'd rather go with intrusive lists, it's because of it usually uses less memory due to node structures being more padding-free. Anyway the point should have been focused on _hand-rolled_ part, mostly because SList is plain not ready for prime time*. And btw singly-linked list it's not hard to implement and you can fine tune it how you like it e.g. they do play nice with free list allocation strategy. Not to say of circular lists and/or using sentinels.
 SList is a poor singly linked list implementation. It does not  
 support
 O(1) removal.

 -Steve

Aye, looking at SList implementation I can say that it sort of tries to verify that this is a correct list. Otherwise it would be O(1). Indeed passing wrong range/iterator is quite bad in linked lists and could lead to all manner of funky bugs, but the cost is horrific. Especially since insert doesn't do this extra check.

No, it's necessarily doing O(n) search for current node because it has no reference to the previous node. The range type for a SList has a single pointer to the currently iterated node. How do you remove that node without having access to the head/previous pointer?

The container interface does not expose references to the 'Node' struct, therefore the following approach would be practical:

It does, actually. A range is a reference to a Node struct. But I still think the following approach will work.
 0. Have a sentinel for end of list. (O(1) additional memory for the
 entire list). It is the only node in the list that has a null 'next'
 pointer.

 example, for removing just the current node:
 1. if the following node is not the sentinel
 1.1. Move value from following node into the current node.
 1.2. Remove following node.
 2. if the following node is the sentinel
 2.1. erase the contents of the current node (iff it has indirections)
 2.2. set it's 'next' field to null


 Removing an entire range is just erasing the contents of the current
 node and setting the 'next' field of the associated Node to zero.
 Removing a Take!Range is a simple generalization of the algorithm  
 above.

This would be a good addition to the type. It could not be a stableRemove, however, because it would invalidate any ranges pointing at the node you removed. Can you put together a pull request?

I can do that, but it will have to wait a few days, as I am quite busy at the moment.

No rush. I just wanted to know if you would do it, and if not, I would do it.
 If not, I
 can see about doing it. One issue, you could not do this if the value is
 immutable/const.

That is true. But how to fix this? Resolving it by simply not exposing the remove method if it is impossible to implement the straightforward way would be an option, but I don't think that is satisfying. Probably it could be made to work by creating some kind of head mutable structure. I will think about it. It should even be possible to generalize std.typecons.Rebindable for structs to help this idea.

Hm... rebindable doesn't help if the item is a struct. You'd have to store an allocated struct on the heap. -Steve
Nov 03 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Nov 2011 16:27:46 -0400, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 On 03.11.2011 22:37, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 14:08:31 -0400, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 03.11.2011 21:13, Steven Schveighoffer wrote:
 dcollections stipulates that all ranges/cursors can be verified in
 O(lgn) time or better to belong to a specific container. In some  
 cases,
 this adds an extra word to the range/cursor, and in others, it's easy  
 to
 determine or the owner-reference was already needed. Since everything  
 is
 a class, the fallback is to just stick an owner class instance in the
 range.

 This stipulation is necessary to allow safe slicing.

Seems reasonable, I'd expect checks to go away in release, right(?).

For the moment, no. I am not sure whether this is the right decision or not, because once you get beyond arrays, when to do bounds checks becomes fuzzy. For example, imagine you have this: auto ts = new TreeSet!int(1, 3, 5, 7, 9); What does this mean? auto r = ts[2..4]; Note that a range type for a treeset has a pointer to a begin and end node for the container. For arrays, not doing a bounds check is simply less code. For a RBTree, you still have to look for the specific node, even if you are in release mode. Another example: auto ts2 = ts.dup; // duplicate the treeset auto c1 = ts2.elemAt(3); // get the node for 3 in ts2 auto r2 = ts[c1..ts.end];

hm-h will that actually work? I mean searching in ts for node from ts2?

c1 is a cursor, so there is no need to search for it (the point of a cursor is to keep a reference to an element for later use). All you have to do is verify it's in the container (and the ordering is valid). If unchecked, it will result in likely a segfault, because the two endpoints are not connected.
 Here I can say, we can skip the belongs check (which is an O(lgn) walk
 up the tree).

Well, I was mostly talking about this kind of scenario i.e. skip checking that cursor do belong to this collection. While in ts[2..4] example there is no (explicit) cursors so it's a different situation.

So what should happen? Should it throw?
 But I'm still doing it in release mode. I'm not sure what's best. Should
 I just do the least work possible, or should I make it consistent with
 ts[2..4]?

I'd say release mode should avoid as much extra work as possible while keeping primary functionality intact, but that's just me.

Yes, it's a dilemma that I'm not sure what the right answer is. It does make sense that release mode should just avoid the checks altogether. -Steve
Nov 03 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Nov 2011 17:13:55 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 11/03/2011 09:46 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 14:50:31 -0400, Timon Gehr <timon.gehr gmx.ch>  
 wrote:

 On 11/03/2011 07:15 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 13:47:28 -0400, Timon Gehr <timon.gehr gmx.ch>
 wrote:

 On 11/03/2011 06:13 PM, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 12:35:36 -0400, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 03.11.2011 19:34, Steven Schveighoffer wrote:
 On Thu, 03 Nov 2011 10:02:22 -0400, Tobias Pankrath
 <tobias pankrath.net> wrote:

 Dmitry Olshansky wrote:

 And more importantly, it still would be horribly slow O(N^2).
 Personally, because of that I'd prefer hand-rolled intrusive
 singly-linked list any time of day.

To be honest, I don't understand this. A "remove_if" for non-intrusive single-linked lists should be doable in O(N).


Yeah, poor wording going to kill me one day :) And looking at how SList works with "checked" remove does O(N^2) turned out to be a context for reference for intrusive singly-linked list, just my bad. As for why I'd rather go with intrusive lists, it's because of it usually uses less memory due to node structures being more padding-free. Anyway the point should have been focused on _hand-rolled_ part, mostly because SList is plain not ready for prime time*. And btw singly-linked list it's not hard to implement and you can fine tune it how you like it e.g. they do play nice with free list allocation strategy. Not to say of circular lists and/or using sentinels.
 SList is a poor singly linked list implementation. It does not
 support
 O(1) removal.

 -Steve

Aye, looking at SList implementation I can say that it sort of tries to verify that this is a correct list. Otherwise it would be O(1). Indeed passing wrong range/iterator is quite bad in linked lists and could lead to all manner of funky bugs, but the cost is horrific. Especially since insert doesn't do this extra check.

No, it's necessarily doing O(n) search for current node because it has no reference to the previous node. The range type for a SList has a single pointer to the currently iterated node. How do you remove that node without having access to the head/previous pointer?

The container interface does not expose references to the 'Node' struct, therefore the following approach would be practical:

It does, actually. A range is a reference to a Node struct. But I still think the following approach will work.
 0. Have a sentinel for end of list. (O(1) additional memory for the
 entire list). It is the only node in the list that has a null 'next'
 pointer.

 example, for removing just the current node:
 1. if the following node is not the sentinel
 1.1. Move value from following node into the current node.
 1.2. Remove following node.
 2. if the following node is the sentinel
 2.1. erase the contents of the current node (iff it has indirections)
 2.2. set it's 'next' field to null


 Removing an entire range is just erasing the contents of the current
 node and setting the 'next' field of the associated Node to zero.
 Removing a Take!Range is a simple generalization of the algorithm
 above.

This would be a good addition to the type. It could not be a stableRemove, however, because it would invalidate any ranges pointing at the node you removed. Can you put together a pull request?

I can do that, but it will have to wait a few days, as I am quite busy at the moment.

No rush. I just wanted to know if you would do it, and if not, I would do it.
 If not, I
 can see about doing it. One issue, you could not do this if the value  
 is
 immutable/const.

That is true. But how to fix this? Resolving it by simply not exposing the remove method if it is impossible to implement the straightforward way would be an option, but I don't think that is satisfying. Probably it could be made to work by creating some kind of head mutable structure. I will think about it. It should even be possible to generalize std.typecons.Rebindable for structs to help this idea.

Hm... rebindable doesn't help if the item is a struct. You'd have to store an allocated struct on the heap. -Steve

My point is, if you have a struct like this: struct S{ const int* x; immutable int y; } Then that could be stored in the list as struct _S{ const(int)* x; int y; S getS(){return S(x,y);} } Without putting the type system under pressure. But on second thought, std.typecons.Rebindable can probably not meaningfully be generalized to structs because it could not deal with member functions.

Yes, this looks like a viable solution. Not sure how it would be done in a generic way. I think what is best is to have getS be like this: S *getS(){ return cast(S*)&this;} And never provide access to this As long as S.opCmp, S.opEquals, and S.toHash are all const, this should be no issue (this needs to be a requirement). I wonder if we really have to go through such trouble... -Steve
Nov 03 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 03 Nov 2011 14:50:31 -0400, Timon Gehr <timon.gehr gmx.ch> wrote:

 On 11/03/2011 07:15 PM, Steven Schveighoffer wrote:
 I could use this idea, I think, to implement a singly linked list in
 dcollections as well (the prospect of not having O(1) removal is what
 has stopped me). Thanks for the idea!

Nice!

Looking at dcollections' List interface, the one thing I can't implement is back() (i.e. get the last element in the list). I can implement everything else. It might be worth it to make an exception for this (i.e. just throw if someone calls back()) in order to have a singly-linked list implementation. -Steve
Nov 03 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 02.11.2011, 12:48 Uhr, schrieb Steven Schveighoffer  
<schveiguy yahoo.com>:
 In dcollections, removing a specific element (using the default  
 comparison operator for that element) on a *fast lookup* container is as  
 simple as:

 container.remove(container.find(x));

 Which removes the element x if it's found.  However, this is not defined  
 for containers which use O(n) time to search (such as linked list), you  
 must use std.algorithm.find for that:

 container.remove(find(container[], x).begin);

 Should work, and takes O(n) time.

 -Steve

While I stumble over this. I found that I sometimes need a certain container (data structure/algorithm), but with an additional fast removal or lookup. So I returned a "handle" (a pointer or array index) for any inserted element that you can store (in the element itself for example) and use for lookup/removal. Templates bake the correct struct for this: // The element I want to store struct Element { // ... actual data size_t handle; // a pointer to a node in disguise, returned from insert(...) } // A collection node struct Node { Element e; Node* prev; Node* next; } // Usage elem.handle = list.insert(elem); list.remove(elem.handle); Now you have a double-linked list with O(1) removal, as long as you keep the handle around. The limitation is, that the handle has to remain constant. So a binary heap with the elements embedded in an array and no indirection through a pointer would not work. As soon as elements are reorganized, their handle (array index in this case) becomes invalid. But the point is, that you can reduce the time complexity for removal in every data structure like this in an implementation agnostic way (unless plain array of elements) and I think it is better than Java's approach that just works, but may result in an O(n) lookup operation. Iterators with remove functionality overlap with this partially. (They don't need an extra handle stored somewhere, but generally go over the complete list.)
Nov 05 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 05 Nov 2011 17:28:34 -0400, Marco Leise <Marco.Leise gmx.de> wrote:

 Am 02.11.2011, 12:48 Uhr, schrieb Steven Schveighoffer  
 <schveiguy yahoo.com>:
 In dcollections, removing a specific element (using the default  
 comparison operator for that element) on a *fast lookup* container is  
 as simple as:

 container.remove(container.find(x));

 Which removes the element x if it's found.  However, this is not  
 defined for containers which use O(n) time to search (such as linked  
 list), you must use std.algorithm.find for that:

 container.remove(find(container[], x).begin);

 Should work, and takes O(n) time.

 -Steve

While I stumble over this. I found that I sometimes need a certain container (data structure/algorithm), but with an additional fast removal or lookup. So I returned a "handle" (a pointer or array index) for any inserted element that you can store (in the element itself for example) and use for lookup/removal. Templates bake the correct struct for this: // The element I want to store struct Element { // ... actual data size_t handle; // a pointer to a node in disguise, returned from insert(...) } // A collection node struct Node { Element e; Node* prev; Node* next; } // Usage elem.handle = list.insert(elem); list.remove(elem.handle);

You should be able to do this with dcollections: struct Element(T) { T data; List!T.cursor handle; } elem.handle = list.insert(list.end, elem.data); // x is the value being inserted list.remove(elem.handle); // removes the element you inserted earlier. Note, you have to specify an insertion point (this is similar to C++). But note that you can't exactly create a handle to a list of the Element type, because you need that cursor type to define Element.
 Now you have a double-linked list with O(1) removal, as long as you keep  
 the handle around. The limitation is, that the handle has to remain  
 constant. So a binary heap with the elements embedded in an array and no  
 indirection through a pointer would not work. As soon as elements are  
 reorganized, their handle (array index in this case) becomes invalid.
 But the point is, that you can reduce the time complexity for removal in  
 every data structure like this in an implementation agnostic way (unless  
 plain array of elements) and I think it is better than Java's approach  
 that just works, but may result in an O(n) lookup operation.
 Iterators with remove functionality overlap with this partially. (They  
 don't need an extra handle stored somewhere, but generally go over the  
 complete list.)

Definitely. This is the benefit of iterators/cursors. The ability to refer to exactly one element makes things much more straightforward. It's why dcollections has cursors. -Steve
Nov 05 2011