digitalmars.D.learn - to invalidate a range

Ellery Newcomer (5/5) Aug 12 2011 in std.container, the stable* container functions advocate that they do

bearophile (4/7) Aug 12 2011 Generally modifying a collection while you iterate on it causes troubles...

Ellery Newcomer (14/21) Aug 12 2011 I am not convinced of this. In the following code, there is definitely a...

Steven Schveighoffer (18/21) Aug 12 2011 Say for example, you are iterating a red black tree, and your current

Ellery Newcomer (5/18) Aug 12 2011 Then there is no way to implement a stable remove from a node based

Jonathan M Davis (13/28) Aug 12 2011 Removing elements from a node-based container only invalidates ranges if...
Steven Schveighoffer (17/37) Aug 15 2011 Not one that guarantees stability. However, you can implement a remove ...

Jonathan M Davis (40/46) Aug 12 2011 Short answer: The range doesn't point to what it's supposed to point to

Ellery Newcomer (9/21) Aug 12 2011 "shouldn't" isn't a guarantee. Where there is "shouldn't", there can't

Jonathan M Davis (22/47) Aug 12 2011 An implementation can guarantee it as long as your range doesn't directl...

Ellery Newcomer (9/18) Aug 12 2011 Forgive my being dense, but where is this 'as long as' coming from? If

Jonathan M Davis (48/67) Aug 12 2011 Are you familiar with iterators? This will be a lot easier if you are. A...

Ellery Newcomer (8/22) Aug 12 2011 Now you're just bludgeoning me into apathy (though my ability to

Jonathan M Davis (15/43) Aug 12 2011 It means that if you're dealing with a node-based container, and you rem...

Steven Schveighoffer (11/26) Aug 15 2011 I don't think it's possible to implement stableRemove IMO. I believe

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

in std.container, the stable* container functions advocate that they do 
not invalidate the ranges of their containers. What does it mean to 
invalidate a range?

my assumption is it means causing e.g. front or popFront to fail when 
empty says they should succeed or vice versa.

Aug 12 2011

bearophile <bearophileHUGS lycos.com> writes:

Ellery Newcomer:

 in std.container, the stable* container functions advocate that they do 
 not invalidate the ranges of their containers. What does it mean to 
 invalidate a range?

Generally modifying a collection while you iterate on it causes troubles. When
you iterate on a range and you modify the range during the iteration Python
gives you an error, because the "for" temporary sets boolean inside the
iteratee. In D this problem was avoided in another way, using those stable
functions.

Bye,
bearophile

Aug 12 2011

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 08/12/2011 03:13 PM, bearophile wrote:
 Ellery Newcomer:

 in std.container, the stable* container functions advocate that they do
 not invalidate the ranges of their containers. What does it mean to
 invalidate a range?

 Generally modifying a collection while you iterate on it causes troubles. When
you iterate on a range and you modify the range during the iteration Python
gives you an error, because the "for" temporary sets boolean inside the
iteratee. In D this problem was avoided in another way, using those stable
functions.

 Bye,
 bearophile

I am not convinced of this. In the following code, there is definitely a 
problem; it just remains to be seen whether the range is invalidated, or 
merely noisy according to the specified semantics.


import std.container;
import std.stdio;
import std.array;

void main(){
     auto arr = make!(Array!int)([1,2,3]);
     auto r = arr[];
     writeln(array(r.save()));
     arr.stableRemoveAny();
     writeln(array(r.save()));
}

Aug 12 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 12 Aug 2011 15:54:53 -0400, Ellery Newcomer  
<ellery-newcomer utulsa.edu> wrote:

 in std.container, the stable* container functions advocate that they do  
 not invalidate the ranges of their containers. What does it mean to  
 invalidate a range?

Say for example, you are iterating a red black tree, and your current  
"front" points at a certain node.  Then that node is removed from the  
tree.  That range is now invalid, because the node it points to is not  
valid.

What happens when you use an invalidated range?  Well, we could implement  
something that throws an exception, but that's an efficiency problem.  I  
contemplated doing this for debug mode in dcollections, I probably still  
will.

Another example of an invalidated range, let's say you have a hash map.   
The range has a start and a finish, with the finish being iterated after  
the start.  If you add a node, it could cause a rehash, which could  
potentially put the finish *before* the start!

However, the same hash implementation could potentially define a stable  
add, which is guaranteed not to rehash the map, even when it exceeds the  
rehash threshold :)

-Steve

Aug 12 2011

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 08/12/2011 03:29 PM, Steven Schveighoffer wrote:
 On Fri, 12 Aug 2011 15:54:53 -0400, Ellery Newcomer
 <ellery-newcomer utulsa.edu> wrote:

 in std.container, the stable* container functions advocate that they
 do not invalidate the ranges of their containers. What does it mean to
 invalidate a range?

 Say for example, you are iterating a red black tree, and your current
 "front" points at a certain node. Then that node is removed from the
 tree. That range is now invalid, because the node it points to is not
 valid.

Then there is no way to implement a stable remove from a node based 
container?

 Another example of an invalidated range, let's say you have a hash map.
 The range has a start and a finish, with the finish being iterated after
 the start. If you add a node, it could cause a rehash, which could
 potentially put the finish *before* the start!

Then the invalidation is that the range failed to produce an element of 
the container?

Aug 12 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, August 12, 2011 13:58 Ellery Newcomer wrote:
 On 08/12/2011 03:29 PM, Steven Schveighoffer wrote:
 On Fri, 12 Aug 2011 15:54:53 -0400, Ellery Newcomer
 
 <ellery-newcomer utulsa.edu> wrote:
 in std.container, the stable* container functions advocate that they
 do not invalidate the ranges of their containers. What does it mean to
 invalidate a range?

 
 Say for example, you are iterating a red black tree, and your current
 "front" points at a certain node. Then that node is removed from the
 tree. That range is now invalid, because the node it points to is not
 valid.

 
 Then there is no way to implement a stable remove from a node based
 container?

Removing elements from a node-based container only invalidates ranges if they 
specifically point to an element which was removed. Whether an iterator is 
invalidated is more obvious, because it points to a specific element, and as 
long as it's not the one which was removed, you're fine. For a range, it's not 
as obvious, because you don't really know how it was implemented internally. 
However, as long as it's effectively holding the begin and end iterators for 
the range (which is almost certainly what it has to do), then you know that 
your rang is fine as long as the first element pointed to and the last 
elemented pointed to weren't removed. You _do_ have the possible concern that 
your range doesn't contain the same elements that it did before (if an element 
was removed from its middle), but the range is still valid.

- Jonathan M Davis

Aug 12 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 12 Aug 2011 16:58:15 -0400, Ellery Newcomer  
<ellery-newcomer utulsa.edu> wrote:

 On 08/12/2011 03:29 PM, Steven Schveighoffer wrote:
 On Fri, 12 Aug 2011 15:54:53 -0400, Ellery Newcomer
 <ellery-newcomer utulsa.edu> wrote:

 in std.container, the stable* container functions advocate that they
 do not invalidate the ranges of their containers. What does it mean to
 invalidate a range?

 Say for example, you are iterating a red black tree, and your current
 "front" points at a certain node. Then that node is removed from the
 tree. That range is now invalid, because the node it points to is not
 valid.

 Then there is no way to implement a stable remove from a node based  
 container?

Not one that guarantees stability.  However, you can implement a remove  
that can be proven to be stable for certain cases (basically, as long as  
you don't remove one of the endpoints).

 Another example of an invalidated range, let's say you have a hash map.
 The range has a start and a finish, with the finish being iterated after
 the start. If you add a node, it could cause a rehash, which could
 potentially put the finish *before* the start!

 Then the invalidation is that the range failed to produce an element of  
 the container?

No, it may crash the program, for example if empty does:

return this.beginNode is this.endNode;

If beginNode is sequentially after endNode, this condition will never be  
true.

But there are other definitions of "invalid", I'd call any of these cases  
invalidated ranges:

  * it fails to iterate valid nodes that were in the range before the  
operation, and are still valid after the operation.
  * it iterates a node more than once (for example, iterates a node before  
the operation, then iterates it again after the operation)
  * it iterates invalid nodes (nodes that have been removed).

-Steve

Aug 15 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, August 12, 2011 12:54 Ellery Newcomer wrote:
 in std.container, the stable* container functions advocate that they do
 not invalidate the ranges of their containers. What does it mean to
 invalidate a range?
 
 my assumption is it means causing e.g. front or popFront to fail when
 empty says they should succeed or vice versa.

Short answer: The range doesn't point to what it's supposed to point to 
anymore. Don't use it. Its behavior is undefined.

Long answer: This is a classic issue in C++ with the STL, and it applies to 
D's ranges for the same reason. An iterator or a range is valid only so long 
as it continues to point to a valid element in the container that it points 
to. With a vector or Array for instance, if you have an iterator or range 
pointer to that vector/Array and the container is reallocated because you 
appended to it, and it didn't have any capacity left, then you have an 
iterator/range which points to memory which isn't in the container anymore. 
Iterating with that iterator/range would be problematic. In C++, you'd likely 
be iterating over memory which had been deleted, which could cause all kinds 
of problems and would blow up on you in a variety of ways at least some of the 
time. In D, the memory is probably still sitting on the stack exactly as it 
was, so iterating over it would mean iterating over an old version of the 
container. It probably wouldn't blow up, but it definitely wouldn't be what 
you wanted. Adding and removing elements without reallocations causes problems 
too, because the elements get shifted around. The iterator/range may still 
technically be valid and useable, but it doesn't necessarily point to the same 
data anymore.

In the case of container that uses nodes - such as a linked list - because you 
can add and remove elements without affecting other elements, iterators and 
ranges don't tend to get invalidated as easily. As long as you don't remove 
the element (or elements in the case of a range - assuming that it keeps track 
of its two end points, as is likely) that it points to, then adding or 
removing elements from the container shouldn't invalidate the iterator/range.

So, whether a particular operation invalidates an iterator or range depends 
very much on the container and the operation. std.container provides stableX 
functions which do whatever is necessary to guarantee that any ranges which 
point to the container stay valid. However, any other operation which alters a 
container risks invalidating any existing range for that container. It may or 
may not invalidate the range, depending on the container and the operation, 
but it's a risk. The only way to avoid any risk of invalidating ranges is to 
not keep ranges over a container when you alter that container.

So, basically what it comes down to is the short answer. A range which has 
been invalidated doesn't point to what it's supposed to point to anymore, and 
using it results in undefined behavior. It's less likely to blow up in D, 
because it's generally memory-safe, but you're going to get incorrect 
behavior.

- Jonathan M Davis

Aug 12 2011

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 08/12/2011 03:54 PM, Jonathan M Davis wrote:
 In the case of container that uses nodes - such as a linked list - because you
 can add and remove elements without affecting other elements, iterators and
 ranges don't tend to get invalidated as easily. As long as you don't remove
 the element (or elements in the case of a range - assuming that it keeps track
 of its two end points, as is likely) that it points to, then adding or
 removing elements from the container shouldn't invalidate the iterator/range.

"shouldn't" isn't a guarantee. Where there is "shouldn't", there can't 
be stableRemove*, no?

 So, basically what it comes down to is the short answer. A range which has
 been invalidated doesn't point to what it's supposed to point to anymore, and
 using it results in undefined behavior. It's less likely to blow up in D,
 because it's generally memory-safe, but you're going to get incorrect
 behavior.

 - Jonathan M Davis

suppose your linked list range points to a node X. element in X is 
removed by the linked list, and the range automagically moves to X.next 
(or X.prev). Is the range invalid by this standard or not? (no way 'san 
ifrinn I'm going to implement that, though).


heh heh. most of this business has only convinced me I want immutable 
containers.

Aug 12 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, August 12, 2011 15:29 Ellery Newcomer wrote:
 On 08/12/2011 03:54 PM, Jonathan M Davis wrote:
 In the case of container that uses nodes - such as a linked list -
 because you can add and remove elements without affecting other
 elements, iterators and ranges don't tend to get invalidated as easily.
 As long as you don't remove the element (or elements in the case of a
 range - assuming that it keeps track of its two end points, as is
 likely) that it points to, then adding or removing elements from the
 container shouldn't invalidate the iterator/range.

 
 "shouldn't" isn't a guarantee. Where there is "shouldn't", there can't
 be stableRemove*, no?

An implementation can guarantee it as long as your range doesn't directly 
point to an element being removed (i.e. as long as the element isn't on the 
ends - or maybe one past the end, depending on the implementation). But _no_ 
container can guarantee that an iterator or range which directly references an 
element which is removed is going to stay valid - not without playing some 
serious games internally which make iterators and ranges too inefficent, and 
possibly not even then. So, stableRemove is only going to guarantee that a 
range stays valid on as long as the end points of that range aren't what was 
being removed.

 So, basically what it comes down to is the short answer. A range which
 has been invalidated doesn't point to what it's supposed to point to
 anymore, and using it results in undefined behavior. It's less likely to
 blow up in D, because it's generally memory-safe, but you're going to
 get incorrect behavior.
 
 - Jonathan M Davis

 
 suppose your linked list range points to a node X. element in X is
 removed by the linked list, and the range automagically moves to X.next
 (or X.prev). Is the range invalid by this standard or not? (no way 'san
 ifrinn I'm going to implement that, though).

If the element that you removed was the end point of a range, then the range 
won't be valid anymore.

 heh heh. most of this business has only convinced me I want immutable
 containers.

It's only an issue if you keep ranges of a container around and then alter the 
container. If you're just use ranges to do an operation or two and then throw 
them away, it's not an issue. C++ has been this way for years, and it's 
generally not a problem. It _can_ be a problem if you try and keep 
iterators/ranges around while altering a container, but there's not really a 
good way around that. And as long as you're aware of that, you'll be fine. 
It's only when you try and alter a container while retaining ranges to it that 
you're going to have to start worrying about whether a range has been 
invalidated or not.

- Jonathan M Davis

Aug 12 2011

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 08/12/2011 05:51 PM, Jonathan M Davis wrote:
 An implementation can guarantee it as long as your range doesn't directly
 point to an element being removed (i.e. as long as the element isn't on the
 ends - or maybe one past the end, depending on the implementation). But _no_
 container can guarantee that an iterator or range which directly references an
 element which is removed is going to stay valid - not without playing some
 serious games internally which make iterators and ranges too inefficent, and
 possibly not even then. So, stableRemove is only going to guarantee that a
 range stays valid on as long as the end points of that range aren't what was
 being removed.

Forgive my being dense, but where is this 'as long as' coming from? If 
your range only points to ends in e.g. a linked list, how is it supposed 
to retrieve elements in the middle? I'm having a hard time visualizing a 
range over a node based container that doesn't point to a node in the 
middle (at some point in time). The range points to the node to retrieve 
the current front quickly, the node can get removed, the removed 
function won't know its invalidating the range without playing yon 
internal games, ergo stable remove cannot be.

Aug 12 2011

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Friday, August 12, 2011 16:16 Ellery Newcomer wrote:
 On 08/12/2011 05:51 PM, Jonathan M Davis wrote:
 An implementation can guarantee it as long as your range doesn't directly
 point to an element being removed (i.e. as long as the element isn't on
 the ends - or maybe one past the end, depending on the implementation).
 But _no_ container can guarantee that an iterator or range which
 directly references an element which is removed is going to stay valid -
 not without playing some serious games internally which make iterators
 and ranges too inefficent, and possibly not even then. So, stableRemove
 is only going to guarantee that a range stays valid on as long as the
 end points of that range aren't what was being removed.

 
 Forgive my being dense, but where is this 'as long as' coming from? If
 your range only points to ends in e.g. a linked list, how is it supposed
 to retrieve elements in the middle? I'm having a hard time visualizing a
 range over a node based container that doesn't point to a node in the
 middle (at some point in time). The range points to the node to retrieve
 the current front quickly, the node can get removed, the removed
 function won't know its invalidating the range without playing yon
 internal games, ergo stable remove cannot be.

Are you familiar with iterators? This will be a lot easier if you are. An 
iterator points to one element and one element only. In C++, you tend to pass 
around pairs of iterators - one pointing to the first element in a range of 
elements and one pointing to one past the end. You then usually iterate by 
incrementing the first iterator until it equals the second.

Ranges at their most basic level are a pair of iterators - one pointing to the 
first element in the range and one pointing either to the last element or one 
past that, depending on the implementation. The range API and concept is 
actually much more flexible than that, allowing us to implement stuff like the 
fibonacci sequence as a range, but when it comes to containers, they're almost 
certainly doing what C++ by using two iterators, except that it's wrapped them 
so that you don't ever have to worry about them pointing to separate 
containers or the first iterator actually being past the second one. Wrapping 
them as a range makes it much cleaner, but ultimately, for containers at 
least, you're still going to have those iterators internally.

So, front returns the element that the first iterator points to and popFront 
just increments that iterator by one. If the range has back, then back points 
to the last element of the range (though the internal iterator may point one 
elment past that), and popBack decrements that iterator by one. It doesn't 
directly refer to _any_ elements in the middle.

So, in a node-based container, adding or removing elements in the middle of 
the range will have no effect on the internal iterators. It'll have an effect 
on what elements you ultimately iterate over if you iterate over the range 
later, but the two iterators are still completely valid and point to the same 
elements that they always have. However, if you remove either of the elements 
that the iterators point to, _then_ the range is going to be invalidated, 
because its iterators are invalid. They don't point to valid elements in the 
container anymore.

In a contiguous container, such as Array, adding or removing elements is more 
disruptive, since the elements get copied around inside of the contiguous 
block of memory, and while the iterators may continue to point at the same 
indices as before, the exact elements will have shifted. So, they're still 
valid in the sense that they point to valid elements, but they don't point to 
the same elements. The two places where they end up no longer pointing to 
valid elements are when you remove enough elements that one or both iterators 
points at an index which is greater than the size of the container and when 
the container has to be reallocated (typically when you append enough elements 
that it no longer has the capacity to resize in place). So, in general, 
altering contiguous containers, such as Array, risks invalidating all 
iterators or ranges which point to them, whereas with node-based containers 
it's only when the end point of a range gets removed that you run into that 
kind of trouble.

Really, to understand range invalidation, you probably should have a fair 
understanding of iterators. But again, as long as you don't keep any ranges 
around when you alter a container by adding or removing elements from it, you 
don't have anything to worry about.

- Jonathan M Davis

Aug 12 2011

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 08/12/2011 06:34 PM, Jonathan M Davis wrote:
 Forgive my being dense, but where is this 'as long as' coming from? If
 your range only points to ends in e.g. a linked list, how is it supposed
 to retrieve elements in the middle? I'm having a hard time visualizing a
 range over a node based container that doesn't point to a node in the
 middle (at some point in time). The range points to the node to retrieve
 the current front quickly, the node can get removed, the removed
 function won't know its invalidating the range without playing yon
 internal games, ergo stable remove cannot be.

 Are you familiar with iterators? This will be a lot easier if you are. An
 iterator points to one element and one element only. In C++, you tend to pass
 around pairs of iterators - one pointing to the first element in a range of
 elements and one pointing to one past the end. You then usually iterate by
 incrementing the first iterator until it equals the second.

Now you're just bludgeoning me into apathy (though my ability to 
communicate seems lacking). The iterator is an abstraction. Beneath it, 
in a node based container, [I expect] will be a pointer to a node, which 
might point to any node in the container. This means that removing any 
node could potentially invalidate a range somewhere. When such a 
conflict arises, you cannot both perform the removal and keep a valid 
range, regardless of whether you even knew of the conflict.

Aug 12 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, August 12, 2011 20:03:59 Ellery Newcomer wrote:
 On 08/12/2011 06:34 PM, Jonathan M Davis wrote:
 Forgive my being dense, but where is this 'as long as' coming from? If
 your range only points to ends in e.g. a linked list, how is it
 supposed
 to retrieve elements in the middle? I'm having a hard time visualizing
 a
 range over a node based container that doesn't point to a node in the
 middle (at some point in time). The range points to the node to
 retrieve
 the current front quickly, the node can get removed, the removed
 function won't know its invalidating the range without playing yon
 internal games, ergo stable remove cannot be.

 
 Are you familiar with iterators? This will be a lot easier if you are.
 An
 iterator points to one element and one element only. In C++, you tend to
 pass around pairs of iterators - one pointing to the first element in a
 range of elements and one pointing to one past the end. You then
 usually iterate by incrementing the first iterator until it equals the
 second.

 
 Now you're just bludgeoning me into apathy (though my ability to
 communicate seems lacking). The iterator is an abstraction. Beneath it,
 in a node based container, [I expect] will be a pointer to a node, which
 might point to any node in the container. This means that removing any
 node could potentially invalidate a range somewhere. When such a
 conflict arises, you cannot both perform the removal and keep a valid
 range, regardless of whether you even knew of the conflict.

It means that if you're dealing with a node-based container, and you remove an 
element from that container, and you have a range which does not have that 
element at either of its ends, then you know that your range is valid. If 
you're keeping random ranges around and removing elements from the container, 
then no, you can't know whether the range is still valid or not.

What it really comes down to is that you don't keep ranges around long term, 
and that if you're altering a container, and you're using a range over that 
container at the same time, you need to be sure of what you're doing. If you 
are, then you can use the range in spite of altering the container. If you're 
not, then you're in trouble. The simplest thing to do, of course, is to just 
not alter a container while you have ranges which refer to it (or to get rid 
of any ranges that you have over a container when you do alter that 
container).

- Jonathan M Davis

Aug 12 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 12 Aug 2011 18:29:00 -0400, Ellery Newcomer  
<ellery-newcomer utulsa.edu> wrote:

 On 08/12/2011 03:54 PM, Jonathan M Davis wrote:
 In the case of container that uses nodes - such as a linked list -  
 because you
 can add and remove elements without affecting other elements, iterators  
 and
 ranges don't tend to get invalidated as easily. As long as you don't  
 remove
 the element (or elements in the case of a range - assuming that it  
 keeps track
 of its two end points, as is likely) that it points to, then adding or
 removing elements from the container shouldn't invalidate the  
 iterator/range.

 "shouldn't" isn't a guarantee. Where there is "shouldn't", there can't  
 be stableRemove*, no?

I don't think it's possible to implement stableRemove IMO.  I believe  
SList does claim it, but IMO once you remove an element from a container,  
any range that iterates that element is invalid.

Once we get custom allocators, this is going to become a lot dicier,  
because removing elements actually may deallocate them.

stableAdd is more possible for implementing, as long as adding does not  
significantly change the topology of the container (for example, adding to  
a hash may do a rehash which changes the topology).

-Steve

Aug 15 2011

D Programming

C/C++ Programming

Other

digitalmars.D.learn - to invalidate a range