www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - to invalidate a range

reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
in std.container, the stable* container functions advocate that they do 
not invalidate the ranges of their containers. What does it mean to 
invalidate a range?

my assumption is it means causing e.g. front or popFront to fail when 
empty says they should succeed or vice versa.
Aug 12 2011
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Ellery Newcomer:

 in std.container, the stable* container functions advocate that they do 
 not invalidate the ranges of their containers. What does it mean to 
 invalidate a range?

Generally modifying a collection while you iterate on it causes troubles. When you iterate on a range and you modify the range during the iteration Python gives you an error, because the "for" temporary sets boolean inside the iteratee. In D this problem was avoided in another way, using those stable functions. Bye, bearophile
Aug 12 2011
parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 08/12/2011 03:13 PM, bearophile wrote:
 Ellery Newcomer:

 in std.container, the stable* container functions advocate that they do
 not invalidate the ranges of their containers. What does it mean to
 invalidate a range?

Generally modifying a collection while you iterate on it causes troubles. When you iterate on a range and you modify the range during the iteration Python gives you an error, because the "for" temporary sets boolean inside the iteratee. In D this problem was avoided in another way, using those stable functions. Bye, bearophile

I am not convinced of this. In the following code, there is definitely a problem; it just remains to be seen whether the range is invalidated, or merely noisy according to the specified semantics. import std.container; import std.stdio; import std.array; void main(){ auto arr = make!(Array!int)([1,2,3]); auto r = arr[]; writeln(array(r.save())); arr.stableRemoveAny(); writeln(array(r.save())); }
Aug 12 2011
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 12 Aug 2011 15:54:53 -0400, Ellery Newcomer  
<ellery-newcomer utulsa.edu> wrote:

 in std.container, the stable* container functions advocate that they do  
 not invalidate the ranges of their containers. What does it mean to  
 invalidate a range?

Say for example, you are iterating a red black tree, and your current "front" points at a certain node. Then that node is removed from the tree. That range is now invalid, because the node it points to is not valid. What happens when you use an invalidated range? Well, we could implement something that throws an exception, but that's an efficiency problem. I contemplated doing this for debug mode in dcollections, I probably still will. Another example of an invalidated range, let's say you have a hash map. The range has a start and a finish, with the finish being iterated after the start. If you add a node, it could cause a rehash, which could potentially put the finish *before* the start! However, the same hash implementation could potentially define a stable add, which is guaranteed not to rehash the map, even when it exceeds the rehash threshold :) -Steve
Aug 12 2011
parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 08/12/2011 03:29 PM, Steven Schveighoffer wrote:
 On Fri, 12 Aug 2011 15:54:53 -0400, Ellery Newcomer
 <ellery-newcomer utulsa.edu> wrote:

 in std.container, the stable* container functions advocate that they
 do not invalidate the ranges of their containers. What does it mean to
 invalidate a range?

Say for example, you are iterating a red black tree, and your current "front" points at a certain node. Then that node is removed from the tree. That range is now invalid, because the node it points to is not valid.

Then there is no way to implement a stable remove from a node based container?
 Another example of an invalidated range, let's say you have a hash map.
 The range has a start and a finish, with the finish being iterated after
 the start. If you add a node, it could cause a rehash, which could
 potentially put the finish *before* the start!

Then the invalidation is that the range failed to produce an element of the container?
Aug 12 2011
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, August 12, 2011 12:54 Ellery Newcomer wrote:
 in std.container, the stable* container functions advocate that they do
 not invalidate the ranges of their containers. What does it mean to
 invalidate a range?
 
 my assumption is it means causing e.g. front or popFront to fail when
 empty says they should succeed or vice versa.

Short answer: The range doesn't point to what it's supposed to point to anymore. Don't use it. Its behavior is undefined. Long answer: This is a classic issue in C++ with the STL, and it applies to D's ranges for the same reason. An iterator or a range is valid only so long as it continues to point to a valid element in the container that it points to. With a vector or Array for instance, if you have an iterator or range pointer to that vector/Array and the container is reallocated because you appended to it, and it didn't have any capacity left, then you have an iterator/range which points to memory which isn't in the container anymore. Iterating with that iterator/range would be problematic. In C++, you'd likely be iterating over memory which had been deleted, which could cause all kinds of problems and would blow up on you in a variety of ways at least some of the time. In D, the memory is probably still sitting on the stack exactly as it was, so iterating over it would mean iterating over an old version of the container. It probably wouldn't blow up, but it definitely wouldn't be what you wanted. Adding and removing elements without reallocations causes problems too, because the elements get shifted around. The iterator/range may still technically be valid and useable, but it doesn't necessarily point to the same data anymore. In the case of container that uses nodes - such as a linked list - because you can add and remove elements without affecting other elements, iterators and ranges don't tend to get invalidated as easily. As long as you don't remove the element (or elements in the case of a range - assuming that it keeps track of its two end points, as is likely) that it points to, then adding or removing elements from the container shouldn't invalidate the iterator/range. So, whether a particular operation invalidates an iterator or range depends very much on the container and the operation. std.container provides stableX functions which do whatever is necessary to guarantee that any ranges which point to the container stay valid. However, any other operation which alters a container risks invalidating any existing range for that container. It may or may not invalidate the range, depending on the container and the operation, but it's a risk. The only way to avoid any risk of invalidating ranges is to not keep ranges over a container when you alter that container. So, basically what it comes down to is the short answer. A range which has been invalidated doesn't point to what it's supposed to point to anymore, and using it results in undefined behavior. It's less likely to blow up in D, because it's generally memory-safe, but you're going to get incorrect behavior. - Jonathan M Davis
Aug 12 2011
parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 08/12/2011 03:54 PM, Jonathan M Davis wrote:
 In the case of container that uses nodes - such as a linked list - because you
 can add and remove elements without affecting other elements, iterators and
 ranges don't tend to get invalidated as easily. As long as you don't remove
 the element (or elements in the case of a range - assuming that it keeps track
 of its two end points, as is likely) that it points to, then adding or
 removing elements from the container shouldn't invalidate the iterator/range.

"shouldn't" isn't a guarantee. Where there is "shouldn't", there can't be stableRemove*, no?
 So, basically what it comes down to is the short answer. A range which has
 been invalidated doesn't point to what it's supposed to point to anymore, and
 using it results in undefined behavior. It's less likely to blow up in D,
 because it's generally memory-safe, but you're going to get incorrect
 behavior.

 - Jonathan M Davis

suppose your linked list range points to a node X. element in X is removed by the linked list, and the range automagically moves to X.next (or X.prev). Is the range invalid by this standard or not? (no way 'san ifrinn I'm going to implement that, though). heh heh. most of this business has only convinced me I want immutable containers.
Aug 12 2011
parent reply Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 08/12/2011 05:51 PM, Jonathan M Davis wrote:
 An implementation can guarantee it as long as your range doesn't directly
 point to an element being removed (i.e. as long as the element isn't on the
 ends - or maybe one past the end, depending on the implementation). But _no_
 container can guarantee that an iterator or range which directly references an
 element which is removed is going to stay valid - not without playing some
 serious games internally which make iterators and ranges too inefficent, and
 possibly not even then. So, stableRemove is only going to guarantee that a
 range stays valid on as long as the end points of that range aren't what was
 being removed.

Forgive my being dense, but where is this 'as long as' coming from? If your range only points to ends in e.g. a linked list, how is it supposed to retrieve elements in the middle? I'm having a hard time visualizing a range over a node based container that doesn't point to a node in the middle (at some point in time). The range points to the node to retrieve the current front quickly, the node can get removed, the removed function won't know its invalidating the range without playing yon internal games, ergo stable remove cannot be.
Aug 12 2011
parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 08/12/2011 06:34 PM, Jonathan M Davis wrote:
 Forgive my being dense, but where is this 'as long as' coming from? If
 your range only points to ends in e.g. a linked list, how is it supposed
 to retrieve elements in the middle? I'm having a hard time visualizing a
 range over a node based container that doesn't point to a node in the
 middle (at some point in time). The range points to the node to retrieve
 the current front quickly, the node can get removed, the removed
 function won't know its invalidating the range without playing yon
 internal games, ergo stable remove cannot be.

Are you familiar with iterators? This will be a lot easier if you are. An iterator points to one element and one element only. In C++, you tend to pass around pairs of iterators - one pointing to the first element in a range of elements and one pointing to one past the end. You then usually iterate by incrementing the first iterator until it equals the second.

Now you're just bludgeoning me into apathy (though my ability to communicate seems lacking). The iterator is an abstraction. Beneath it, in a node based container, [I expect] will be a pointer to a node, which might point to any node in the container. This means that removing any node could potentially invalidate a range somewhere. When such a conflict arises, you cannot both perform the removal and keep a valid range, regardless of whether you even knew of the conflict.
Aug 12 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, August 12, 2011 13:58 Ellery Newcomer wrote:
 On 08/12/2011 03:29 PM, Steven Schveighoffer wrote:
 On Fri, 12 Aug 2011 15:54:53 -0400, Ellery Newcomer
 
 <ellery-newcomer utulsa.edu> wrote:
 in std.container, the stable* container functions advocate that they
 do not invalidate the ranges of their containers. What does it mean to
 invalidate a range?

Say for example, you are iterating a red black tree, and your current "front" points at a certain node. Then that node is removed from the tree. That range is now invalid, because the node it points to is not valid.

Then there is no way to implement a stable remove from a node based container?

Removing elements from a node-based container only invalidates ranges if they specifically point to an element which was removed. Whether an iterator is invalidated is more obvious, because it points to a specific element, and as long as it's not the one which was removed, you're fine. For a range, it's not as obvious, because you don't really know how it was implemented internally. However, as long as it's effectively holding the begin and end iterators for the range (which is almost certainly what it has to do), then you know that your rang is fine as long as the first element pointed to and the last elemented pointed to weren't removed. You _do_ have the possible concern that your range doesn't contain the same elements that it did before (if an element was removed from its middle), but the range is still valid. - Jonathan M Davis
Aug 12 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, August 12, 2011 15:29 Ellery Newcomer wrote:
 On 08/12/2011 03:54 PM, Jonathan M Davis wrote:
 In the case of container that uses nodes - such as a linked list -
 because you can add and remove elements without affecting other
 elements, iterators and ranges don't tend to get invalidated as easily.
 As long as you don't remove the element (or elements in the case of a
 range - assuming that it keeps track of its two end points, as is
 likely) that it points to, then adding or removing elements from the
 container shouldn't invalidate the iterator/range.

"shouldn't" isn't a guarantee. Where there is "shouldn't", there can't be stableRemove*, no?

An implementation can guarantee it as long as your range doesn't directly point to an element being removed (i.e. as long as the element isn't on the ends - or maybe one past the end, depending on the implementation). But _no_ container can guarantee that an iterator or range which directly references an element which is removed is going to stay valid - not without playing some serious games internally which make iterators and ranges too inefficent, and possibly not even then. So, stableRemove is only going to guarantee that a range stays valid on as long as the end points of that range aren't what was being removed.
 So, basically what it comes down to is the short answer. A range which
 has been invalidated doesn't point to what it's supposed to point to
 anymore, and using it results in undefined behavior. It's less likely to
 blow up in D, because it's generally memory-safe, but you're going to
 get incorrect behavior.
 
 - Jonathan M Davis

suppose your linked list range points to a node X. element in X is removed by the linked list, and the range automagically moves to X.next (or X.prev). Is the range invalid by this standard or not? (no way 'san ifrinn I'm going to implement that, though).

If the element that you removed was the end point of a range, then the range won't be valid anymore.
 heh heh. most of this business has only convinced me I want immutable
 containers.

It's only an issue if you keep ranges of a container around and then alter the container. If you're just use ranges to do an operation or two and then throw them away, it's not an issue. C++ has been this way for years, and it's generally not a problem. It _can_ be a problem if you try and keep iterators/ranges around while altering a container, but there's not really a good way around that. And as long as you're aware of that, you'll be fine. It's only when you try and alter a container while retaining ranges to it that you're going to have to start worrying about whether a range has been invalidated or not. - Jonathan M Davis
Aug 12 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, August 12, 2011 16:16 Ellery Newcomer wrote:
 On 08/12/2011 05:51 PM, Jonathan M Davis wrote:
 An implementation can guarantee it as long as your range doesn't directly
 point to an element being removed (i.e. as long as the element isn't on
 the ends - or maybe one past the end, depending on the implementation).
 But _no_ container can guarantee that an iterator or range which
 directly references an element which is removed is going to stay valid -
 not without playing some serious games internally which make iterators
 and ranges too inefficent, and possibly not even then. So, stableRemove
 is only going to guarantee that a range stays valid on as long as the
 end points of that range aren't what was being removed.

Forgive my being dense, but where is this 'as long as' coming from? If your range only points to ends in e.g. a linked list, how is it supposed to retrieve elements in the middle? I'm having a hard time visualizing a range over a node based container that doesn't point to a node in the middle (at some point in time). The range points to the node to retrieve the current front quickly, the node can get removed, the removed function won't know its invalidating the range without playing yon internal games, ergo stable remove cannot be.

Are you familiar with iterators? This will be a lot easier if you are. An iterator points to one element and one element only. In C++, you tend to pass around pairs of iterators - one pointing to the first element in a range of elements and one pointing to one past the end. You then usually iterate by incrementing the first iterator until it equals the second. Ranges at their most basic level are a pair of iterators - one pointing to the first element in the range and one pointing either to the last element or one past that, depending on the implementation. The range API and concept is actually much more flexible than that, allowing us to implement stuff like the fibonacci sequence as a range, but when it comes to containers, they're almost certainly doing what C++ by using two iterators, except that it's wrapped them so that you don't ever have to worry about them pointing to separate containers or the first iterator actually being past the second one. Wrapping them as a range makes it much cleaner, but ultimately, for containers at least, you're still going to have those iterators internally. So, front returns the element that the first iterator points to and popFront just increments that iterator by one. If the range has back, then back points to the last element of the range (though the internal iterator may point one elment past that), and popBack decrements that iterator by one. It doesn't directly refer to _any_ elements in the middle. So, in a node-based container, adding or removing elements in the middle of the range will have no effect on the internal iterators. It'll have an effect on what elements you ultimately iterate over if you iterate over the range later, but the two iterators are still completely valid and point to the same elements that they always have. However, if you remove either of the elements that the iterators point to, _then_ the range is going to be invalidated, because its iterators are invalid. They don't point to valid elements in the container anymore. In a contiguous container, such as Array, adding or removing elements is more disruptive, since the elements get copied around inside of the contiguous block of memory, and while the iterators may continue to point at the same indices as before, the exact elements will have shifted. So, they're still valid in the sense that they point to valid elements, but they don't point to the same elements. The two places where they end up no longer pointing to valid elements are when you remove enough elements that one or both iterators points at an index which is greater than the size of the container and when the container has to be reallocated (typically when you append enough elements that it no longer has the capacity to resize in place). So, in general, altering contiguous containers, such as Array, risks invalidating all iterators or ranges which point to them, whereas with node-based containers it's only when the end point of a range gets removed that you run into that kind of trouble. Really, to understand range invalidation, you probably should have a fair understanding of iterators. But again, as long as you don't keep any ranges around when you alter a container by adding or removing elements from it, you don't have anything to worry about. - Jonathan M Davis
Aug 12 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, August 12, 2011 20:03:59 Ellery Newcomer wrote:
 On 08/12/2011 06:34 PM, Jonathan M Davis wrote:
 Forgive my being dense, but where is this 'as long as' coming from? If
 your range only points to ends in e.g. a linked list, how is it
 supposed
 to retrieve elements in the middle? I'm having a hard time visualizing
 a
 range over a node based container that doesn't point to a node in the
 middle (at some point in time). The range points to the node to
 retrieve
 the current front quickly, the node can get removed, the removed
 function won't know its invalidating the range without playing yon
 internal games, ergo stable remove cannot be.

Are you familiar with iterators? This will be a lot easier if you are. An iterator points to one element and one element only. In C++, you tend to pass around pairs of iterators - one pointing to the first element in a range of elements and one pointing to one past the end. You then usually iterate by incrementing the first iterator until it equals the second.

Now you're just bludgeoning me into apathy (though my ability to communicate seems lacking). The iterator is an abstraction. Beneath it, in a node based container, [I expect] will be a pointer to a node, which might point to any node in the container. This means that removing any node could potentially invalidate a range somewhere. When such a conflict arises, you cannot both perform the removal and keep a valid range, regardless of whether you even knew of the conflict.

It means that if you're dealing with a node-based container, and you remove an element from that container, and you have a range which does not have that element at either of its ends, then you know that your range is valid. If you're keeping random ranges around and removing elements from the container, then no, you can't know whether the range is still valid or not. What it really comes down to is that you don't keep ranges around long term, and that if you're altering a container, and you're using a range over that container at the same time, you need to be sure of what you're doing. If you are, then you can use the range in spite of altering the container. If you're not, then you're in trouble. The simplest thing to do, of course, is to just not alter a container while you have ranges which refer to it (or to get rid of any ranges that you have over a container when you do alter that container). - Jonathan M Davis
Aug 12 2011
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 12 Aug 2011 16:58:15 -0400, Ellery Newcomer  
<ellery-newcomer utulsa.edu> wrote:

 On 08/12/2011 03:29 PM, Steven Schveighoffer wrote:
 On Fri, 12 Aug 2011 15:54:53 -0400, Ellery Newcomer
 <ellery-newcomer utulsa.edu> wrote:

 in std.container, the stable* container functions advocate that they
 do not invalidate the ranges of their containers. What does it mean to
 invalidate a range?

Say for example, you are iterating a red black tree, and your current "front" points at a certain node. Then that node is removed from the tree. That range is now invalid, because the node it points to is not valid.

Then there is no way to implement a stable remove from a node based container?

Not one that guarantees stability. However, you can implement a remove that can be proven to be stable for certain cases (basically, as long as you don't remove one of the endpoints).
 Another example of an invalidated range, let's say you have a hash map.
 The range has a start and a finish, with the finish being iterated after
 the start. If you add a node, it could cause a rehash, which could
 potentially put the finish *before* the start!

Then the invalidation is that the range failed to produce an element of the container?

No, it may crash the program, for example if empty does: return this.beginNode is this.endNode; If beginNode is sequentially after endNode, this condition will never be true. But there are other definitions of "invalid", I'd call any of these cases invalidated ranges: * it fails to iterate valid nodes that were in the range before the operation, and are still valid after the operation. * it iterates a node more than once (for example, iterates a node before the operation, then iterates it again after the operation) * it iterates invalid nodes (nodes that have been removed). -Steve
Aug 15 2011
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 12 Aug 2011 18:29:00 -0400, Ellery Newcomer  
<ellery-newcomer utulsa.edu> wrote:

 On 08/12/2011 03:54 PM, Jonathan M Davis wrote:
 In the case of container that uses nodes - such as a linked list -  
 because you
 can add and remove elements without affecting other elements, iterators  
 and
 ranges don't tend to get invalidated as easily. As long as you don't  
 remove
 the element (or elements in the case of a range - assuming that it  
 keeps track
 of its two end points, as is likely) that it points to, then adding or
 removing elements from the container shouldn't invalidate the  
 iterator/range.

"shouldn't" isn't a guarantee. Where there is "shouldn't", there can't be stableRemove*, no?

I don't think it's possible to implement stableRemove IMO. I believe SList does claim it, but IMO once you remove an element from a container, any range that iterates that element is invalid. Once we get custom allocators, this is going to become a lot dicier, because removing elements actually may deallocate them. stableAdd is more possible for implementing, as long as adding does not significantly change the topology of the container (for example, adding to a hash may do a rehash which changes the topology). -Steve
Aug 15 2011