digitalmars.D - Container insertion and removal

Andrei Alexandrescu (21/21) Mar 06 2010 In the STL world, writing container-independent code is generally

Steven Schveighoffer (5/31) Mar 06 2010 How can softRemove not affect iterating ranges? What if the range is po...

Robert Jacques (17/51) Mar 06 2010 It always affects ranges in so far as the range and container are

Steven Schveighoffer (22/47) Mar 06 2010 If my linked list range has two node pointers as the implementation, and...

Robert Jacques (21/71) Mar 06 2010 A linked list maps to a forward range: so the range simply points to a

Steven Schveighoffer (10/97) Mar 07 2010 How do you know when to stop? A range has a beginning and an ending, ot...

Fawzi Mohamed (38/95) Mar 07 2010 I agree with this point of view, it makes sense to make some operations

Andrei Alexandrescu (17/110) Mar 08 2010 These are great thoughts. We need to explore this niche because STL has

Michel Fortin (25/30) Mar 08 2010 That won't work for a vector as ranges that are further then the

Robert Jacques (53/178) Mar 07 2010 Please define for me an O(1) slice or index operation for a linked-list....

Steven Schveighoffer (42/124) Mar 07 2010 One for which you have references to the slice end points. I think this...

Robert Jacques (34/156) Mar 07 2010 [snip]

Steven Schveighoffer (16/55) Mar 08 2010 Not two ranges, two references. That is what cursors are in

Andrei Alexandrescu (29/44) Mar 08 2010 The way people usually take advantage of non-invalidating container

Steven Schveighoffer (8/47) Mar 08 2010 Hm... this seems to be a different problem than I originally was thinkin...

Andrei Alexandrescu (8/60) Mar 08 2010 That's the idea. Any documentation of STL containers specifies whether

Steven Schveighoffer (20/32) Mar 08 2010 What I meant was, your example only focuses on the iterator being advanc...

Andrei Alexandrescu (15/52) Mar 08 2010 The examples were given just to illustrate common cases when

dsimcha (13/20) Mar 07 2010 Using a 32-bit version tag probably could lead to overflow in some corne...

Andrei Alexandrescu (7/30) Mar 08 2010 Yah, there are many similar assessments that at least for a couple of

Steve Teale (6/19) Mar 06 2010 Steve,

Steven Schveighoffer (3/24) Mar 06 2010 I think the point of soft functions is to not compile if they are not su...

Andrei Alexandrescu (5/36) Mar 06 2010 Singly linked lists, doubly linked lists, various trees - three's a

Michel Fortin (10/15) Mar 06 2010 What exactly is an "invalidated" iterator? Is an iterator that no

Andrei Alexandrescu (7/21) Mar 06 2010 Good question. In STL, invalidation roughly means undefined behavior if

Joel C. Salomon (5/10) Mar 08 2010 So it’s still a bug to use an invalidated iterator/range, and probably

Andrei Alexandrescu (5/15) Mar 08 2010 I'd think so. In SafeD, clearly there's no memory corruption so throwing...

BLS (13/15) Mar 06 2010 slightly OT

Steven Schveighoffer (11/16) Mar 06 2010 We have had private discussions, and we have fundamental differences of ...

Andrei Alexandrescu (11/30) Mar 07 2010 I concur with Steve's characterization of the situation. My interface

Steven Schveighoffer (25/62) Mar 06 2010 If you define invalidation by "not pointing to unallocated memory,"

Andrei Alexandrescu (13/81) Mar 07 2010 Well consider an array of File objects, which are reference counted. I'd...

Steven Schveighoffer (10/15) Mar 07 2010 Is there any experience that led to a solution, even in theory?

Andrei Alexandrescu (11/30) Mar 08 2010 I don't know of one. There are of course the safe iterators that rely on...

Fawzi Mohamed (59/59) Mar 07 2010 I am coming late into this discussion, some comments:

Michel Fortin (26/33) Mar 06 2010 I think it'd be better to say "they are assumed to invalidate linked

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

In the STL world, writing container-independent code is generally 
shunned (see e.g. 
http://www.informit.com/content/images/0201749629/items/item2-2.pdf).

One problem is a very small intersection between the functionalities 
offered by the various STL containers, and the conceptual organization 
that is weaker than that of iterators.

A worse problem is iterator invalidation rules, something that we'll 
need to address too. I'm thinking that the best defense is a strong 
offense, and I plan to define the following naming convention:

Methods such as insert, remove, pushFront, pushBack, removeFront, 
removeBack, are assumed to affect the container's topology and must be 
handled in user code as such.

In addition to those, a container may also define functions named after 
the above by adding a "soft" prefix (e.g. softInsert, softRemove...) 
that are guaranteed to not affect the ranges currently iterating the 
container.

Generic code that needs specific iterator (non-)invalidation rules can 
use softXxx methods, in confidence that containers not supporting it 
will be ruled out during compilations.

Sounds good?


Andrei

Mar 06 2010

Steven Schveighoffer <schveiguy yahoo.com> writes:

Andrei Alexandrescu Wrote:

 In the STL world, writing container-independent code is generally 
 shunned (see e.g. 
 http://www.informit.com/content/images/0201749629/items/item2-2.pdf).
 
 One problem is a very small intersection between the functionalities 
 offered by the various STL containers, and the conceptual organization 
 that is weaker than that of iterators.
 
 A worse problem is iterator invalidation rules, something that we'll 
 need to address too. I'm thinking that the best defense is a strong 
 offense, and I plan to define the following naming convention:
 
 Methods such as insert, remove, pushFront, pushBack, removeFront, 
 removeBack, are assumed to affect the container's topology and must be 
 handled in user code as such.
 
 In addition to those, a container may also define functions named after 
 the above by adding a "soft" prefix (e.g. softInsert, softRemove...) 
 that are guaranteed to not affect the ranges currently iterating the 
 container.
 
 Generic code that needs specific iterator (non-)invalidation rules can 
 use softXxx methods, in confidence that containers not supporting it 
 will be ruled out during compilations.
 
 Sounds good?

How can softRemove not affect iterating ranges?  What if the range is
positioned on the element removed?

The only two containers that would support softInsert would be linked list and
sorted map/set.  Anything else might completely screw up the iteration.  I
don't see a lot of "generic" use for it.

Another option is to use a "mutation" field that is checked every chance by the
range.  If it changes, then the range is invalidated.

-Steve

Mar 06 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 Andrei Alexandrescu Wrote:

 In the STL world, writing container-independent code is generally
 shunned (see e.g.
 http://www.informit.com/content/images/0201749629/items/item2-2.pdf).

 One problem is a very small intersection between the functionalities
 offered by the various STL containers, and the conceptual organization
 that is weaker than that of iterators.

 A worse problem is iterator invalidation rules, something that we'll
 need to address too. I'm thinking that the best defense is a strong
 offense, and I plan to define the following naming convention:

 Methods such as insert, remove, pushFront, pushBack, removeFront,
 removeBack, are assumed to affect the container's topology and must be
 handled in user code as such.

 In addition to those, a container may also define functions named after
 the above by adding a "soft" prefix (e.g. softInsert, softRemove...)
 that are guaranteed to not affect the ranges currently iterating the
 container.

 Generic code that needs specific iterator (non-)invalidation rules can
 use softXxx methods, in confidence that containers not supporting it
 will be ruled out during compilations.

 Sounds good?

 How can softRemove not affect iterating ranges?  What if the range is  
 positioned on the element removed?

It always affects ranges in so far as the range and container are  
inconsistent, but that is a problem of softInsert as well. Removing an  
element from an array doesn't invalidate the underlying range, since the  
memory is still there. And so long as you're not trying to use free lists,  
linked-lists, trees, etc. can be written so that the ranges never enter an  
invalid state. If they happen to be pointing at the removed node, they  
just end up stopping early.

 The only two containers that would support softInsert would be linked  
 list and sorted map/set.  Anything else might completely screw up the  
 iteration.  I don't see a lot of "generic" use for it.

There's all the containers based upon linked-lists, etc like hashes,  
stacks, queues and dequeues.

 Another option is to use a "mutation" field that is checked every chance  
 by the range.  If it changes, then the range is invalidated.

The mutation field would have to be a version number to support multiple  
ranges, and given experience with lock-free algorithms which use a 'tag'  
in a similar manner, this concept is bug prone and should not be relied  
upon. It would be better to 'lock' the node or container to topology  
changes, though this does slow things down and has no conflict resolution:  
removing a locked node would have to throw an exception.

Mar 06 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford jhu.edu>  
wrote:

 On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 How can softRemove not affect iterating ranges?  What if the range is  
 positioned on the element removed?

 It always affects ranges in so far as the range and container are  
 inconsistent, but that is a problem of softInsert as well. Removing an  
 element from an array doesn't invalidate the underlying range, since the  
 memory is still there. And so long as you're not trying to use free  
 lists, linked-lists, trees, etc. can be written so that the ranges never  
 enter an invalid state. If they happen to be pointing at the removed  
 node, they just end up stopping early.

If my linked list range has two node pointers as the implementation, and  
you remove the end node, it's going to end later, not early.
Of course, you can get around this by just storing a current node and a  
length, but that may not be what the user is expecting.  For example, if  
you did a range of a sorted tree from "a" to "f" and it stops at "n", I  
think that would be extremely unexpected.

Stopping early is invalidation also IMO.  If your program logic depends on  
iterating over all the elements you originally intended to iterate, then  
we have big problems if they stop early.

 The only two containers that would support softInsert would be linked  
 list and sorted map/set.  Anything else might completely screw up the  
 iteration.  I don't see a lot of "generic" use for it.

 There's all the containers based upon linked-lists, etc like hashes,  
 stacks, queues and dequeues.

Hashes may be rehashed when inserting, completely invalidating a range  
(possibly the end point will be before the starting point).

Yes, the others you mentioned will be valid.  But I still don't see it  
being more useful than just using documentation to indicate insertion will  
not invalidate ranges.  Perhaps I am wrong.

 Another option is to use a "mutation" field that is checked every  
 chance by the range.  If it changes, then the range is invalidated.

 The mutation field would have to be a version number to support multiple  
 ranges, and given experience with lock-free algorithms which use a 'tag'  
 in a similar manner, this concept is bug prone and should not be relied  
 upon. It would be better to 'lock' the node or container to topology  
 changes, though this does slow things down and has no conflict  
 resolution: removing a locked node would have to throw an exception.

I was not thinking of multithreaded applications.  I don't think it's  
worth making containers by default be multithreaded safe.

The mutation index has been used in Tango forever, and I think was in Doug  
Lea's original container implementations.  I'm pretty sure it is sound in  
single-threaded uses.

-Steve

Mar 06 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:

 On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 How can softRemove not affect iterating ranges?  What if the range is  
 positioned on the element removed?

 It always affects ranges in so far as the range and container are  
 inconsistent, but that is a problem of softInsert as well. Removing an  
 element from an array doesn't invalidate the underlying range, since  
 the memory is still there. And so long as you're not trying to use free  
 lists, linked-lists, trees, etc. can be written so that the ranges  
 never enter an invalid state. If they happen to be pointing at the  
 removed node, they just end up stopping early.

 If my linked list range has two node pointers as the implementation, and  
 you remove the end node, it's going to end later, not early.

A linked list maps to a forward range: so the range simply points to a  
single internal node. If you're going to store a pointer to the end, then  
you should be using a doubly linked list, not a singly linked list.

 Of course, you can get around this by just storing a current node and a  
 length, but that may not be what the user is expecting.  For example, if  
 you did a range of a sorted tree from "a" to "f" and it stops at "n", I  
 think that would be extremely unexpected.

By this logic, any and all forms of mutation, not just insertions and  
deletions must not be allowed.

 Stopping early is invalidation also IMO.  If your program logic depends  
 on iterating over all the elements you originally intended to iterate,  
 then we have big problems if they stop early.

Stopping early is the result of a logical error by the programmer. The  
code itself, however, is completely valid.

 The only two containers that would support softInsert would be linked  
 list and sorted map/set.  Anything else might completely screw up the  
 iteration.  I don't see a lot of "generic" use for it.

 There's all the containers based upon linked-lists, etc like hashes,  
 stacks, queues and dequeues.

 Hashes may be rehashed when inserting, completely invalidating a range  
 (possibly the end point will be before the starting point).

Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a  
stale view)

 Yes, the others you mentioned will be valid.  But I still don't see it  
 being more useful than just using documentation to indicate insertion  
 will not invalidate ranges.  Perhaps I am wrong.

The difference is that algorithms can document in their template  
constraints that they need a container with 'soft' properties.

 Another option is to use a "mutation" field that is checked every  
 chance by the range.  If it changes, then the range is invalidated.

 The mutation field would have to be a version number to support  
 multiple ranges, and given experience with lock-free algorithms which  
 use a 'tag' in a similar manner, this concept is bug prone and should  
 not be relied upon. It would be better to 'lock' the node or container  
 to topology changes, though this does slow things down and has no  
 conflict resolution: removing a locked node would have to throw an  
 exception.

 I was not thinking of multithreaded applications.  I don't think it's  
 worth making containers by default be multithreaded safe.

I wasn't thinking of multi-threaded containers. I was trying to point out  
that version ids have failed in lock-free containers, where things are  
happening on the order of a single atomic op or a context switch. Given  
the time a range could go unused in standard code, versioning won't work.

 The mutation index has been used in Tango forever, and I think was in  
 Doug Lea's original container implementations.  I'm pretty sure it is  
 sound in single-threaded uses.

No it's not. version tags + integer overflow = bug. Doug Lea knew about  
the problem and but thought it would never happen in real code. And Bill  
Gates thought no one will need more than 640k of ram. They both have been  
proven wrong.

Mar 06 2010

Steven Schveighoffer <schveiguy yahoo.com> writes:

Robert Jacques Wrote:

 On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:

 On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 How can softRemove not affect iterating ranges?  What if the range is  
 positioned on the element removed?

 It always affects ranges in so far as the range and container are  
 inconsistent, but that is a problem of softInsert as well. Removing an  
 element from an array doesn't invalidate the underlying range, since  
 the memory is still there. And so long as you're not trying to use free  
 lists, linked-lists, trees, etc. can be written so that the ranges  
 never enter an invalid state. If they happen to be pointing at the  
 removed node, they just end up stopping early.

 If my linked list range has two node pointers as the implementation, and  
 you remove the end node, it's going to end later, not early.

 
 A linked list maps to a forward range: so the range simply points to a  
 single internal node. If you're going to store a pointer to the end, then  
 you should be using a doubly linked list, not a singly linked list.

How do you know when to stop?  A range has a beginning and an ending, otherwise
it's an iterator.  Whether you store it via a pointer to the last
(not-to-be-iterated) node, or via a length, or via a value to compare with for
stopping, you have to use something.  Or are you asserting the only useful
range for a linked list iterates the entire list?

Think of it as the equivalent of a slice of an array.

 
 Of course, you can get around this by just storing a current node and a  
 length, but that may not be what the user is expecting.  For example, if  
 you did a range of a sorted tree from "a" to "f" and it stops at "n", I  
 think that would be extremely unexpected.

 
 By this logic, any and all forms of mutation, not just insertions and  
 deletions must not be allowed.

Allowed, yes.  Not invalidating iterators, no.

 
 Stopping early is invalidation also IMO.  If your program logic depends  
 on iterating over all the elements you originally intended to iterate,  
 then we have big problems if they stop early.

 
 Stopping early is the result of a logical error by the programmer. The  
 code itself, however, is completely valid.

I still fail to see the difference between "soft" operations and non-soft. 
What does soft guarantee?  Give me a concrete definition, an example would help
too.

 The only two containers that would support softInsert would be linked  
 list and sorted map/set.  Anything else might completely screw up the  
 iteration.  I don't see a lot of "generic" use for it.

 There's all the containers based upon linked-lists, etc like hashes,  
 stacks, queues and dequeues.

 Hashes may be rehashed when inserting, completely invalidating a range  
 (possibly the end point will be before the starting point).

 
 Wouldn't re-hashing necessitate re-allocation? (Thus the range would see a  
 stale view)

God no.  If my hash collision solution is linked-list based (which it is in
dcollections), why should I reallocate all those nodes?  I just rearrange them
in a new bucket array.

 
 Yes, the others you mentioned will be valid.  But I still don't see it  
 being more useful than just using documentation to indicate insertion  
 will not invalidate ranges.  Perhaps I am wrong.

 
 The difference is that algorithms can document in their template  
 constraints that they need a container with 'soft' properties.

What is the advantage?  Why would an algorithm require soft functions?  What is
an example of such an algorithm?

 
 Another option is to use a "mutation" field that is checked every  
 chance by the range.  If it changes, then the range is invalidated.

 The mutation field would have to be a version number to support  
 multiple ranges, and given experience with lock-free algorithms which  
 use a 'tag' in a similar manner, this concept is bug prone and should  
 not be relied upon. It would be better to 'lock' the node or container  
 to topology changes, though this does slow things down and has no  
 conflict resolution: removing a locked node would have to throw an  
 exception.

 I was not thinking of multithreaded applications.  I don't think it's  
 worth making containers by default be multithreaded safe.

 
 I wasn't thinking of multi-threaded containers. I was trying to point out  
 that version ids have failed in lock-free containers, where things are  
 happening on the order of a single atomic op or a context switch. Given  
 the time a range could go unused in standard code, versioning won't work.

Are you trying to say that if you don't use your range for exactly 2^32
mutations, it could mistakenly think the range is still valid?  That's a valid,
but very very weak point.

 
 The mutation index has been used in Tango forever, and I think was in  
 Doug Lea's original container implementations.  I'm pretty sure it is  
 sound in single-threaded uses.

 
 No it's not. version tags + integer overflow = bug. Doug Lea knew about  
 the problem and but thought it would never happen in real code. And Bill  
 Gates thought no one will need more than 640k of ram. They both have been  
 proven wrong.

Overflow != bug.  Wrapping completely to the same value == bug, but is so
unlikely, it's worth the possibility.

-Steve

Mar 07 2010

Fawzi Mohamed <fmohamed mac.com> writes:

On 2010-03-07 14:23:03 +0100, Steven Schveighoffer <schveiguy yahoo.com> said:

 Robert Jacques Wrote:
 
 On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford jhu.edu>
 wrote:
 
 On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 
 How can softRemove not affect iterating ranges?  What if the range is
 positioned on the element removed?

 
 It always affects ranges in so far as the range and container are
 inconsistent, but that is a problem of softInsert as well. Removing an
 element from an array doesn't invalidate the underlying range, since
 the memory is still there. And so long as you're not trying to use free
 lists, linked-lists, trees, etc. can be written so that the ranges
 never enter an invalid state. If they happen to be pointing at the
 removed node, they just end up stopping early.

 
 If my linked list range has two node pointers as the implementation, and
 you remove the end node, it's going to end later, not early.

 
 A linked list maps to a forward range: so the range simply points to a
 single internal node. If you're going to store a pointer to the end, then
 you should be using a doubly linked list, not a singly linked list.

 
 How do you know when to stop?  A range has a beginning and an ending, 
 otherwise it's an iterator.  Whether you store it via a pointer to the 
 last (not-to-be-iterated) node, or via a length, or via a value to 
 compare with for stopping, you have to use something.  Or are you 
 asserting the only useful range for a linked list iterates the entire 
 list?
 
 Think of it as the equivalent of a slice of an array.
 
 
 Of course, you can get around this by just storing a current node and a
 length, but that may not be what the user is expecting.  For example, if
 you did a range of a sorted tree from "a" to "f" and it stops at "n", I
 think that would be extremely unexpected.

 
 By this logic, any and all forms of mutation, not just insertions and
 deletions must not be allowed.

 
 Allowed, yes.  Not invalidating iterators, no.

I agree with this point of view, it makes sense to make some operations 
invalidating iteration, and even there there is a hierarcy of 
"invalidation":

1) all iterator are still valid, and iterate on the same set that they 
were iterating before. This is the best thing from a user, and is what 
persistent structures do, all the structures described in the book I 
gave have this property.
To achieve this you need to basically *always* create a copy when 
modifying a structure, but fortunately often there are ways to share 
most of the structure with the old value. Still this has a cost, and 
normally puts pressure on the memory/gc (all the nodes, or groups of 
them have to be single objects for the gc).

2) iterators are still valid, but they might iterate on a different set 
of elements than originally planned.
Array append with atomic update of elements gives this for example, 
removal in general has more problems, even if the memory might still be 
valid. Invariants of the array might be violated (for example strict 
ordering), one could separate this in two sub points, invariant 
violating and non, but probably it is not worth the effort.

3) iterators are fully invalid, and at worst they might iterate on 
completely wrong/invalid data without detection of failure, hopefully 
at least in debug mode detection should be an option.

 Stopping early is invalidation also IMO.  If your program logic depends
 on iterating over all the elements you originally intended to iterate,
 then we have big problems if they stop early.

 
 Stopping early is the result of a logical error by the programmer. The
 code itself, however, is completely valid.

 
 I still fail to see the difference between "soft" operations and 
 non-soft.  What does soft guarantee?  Give me a concrete definition, an 
 example would help too.

I suppose soft should guarantee either 1 or 2, from what andrei was 
saying I suppose he wants 2

This information is very important for the user for example to iterate 
+manipulate (addition/removal), one doesn't necessarily need to do a 
copy with all containers.

In general one wants to have clean interface, + just one or two "good" 
implementations, I don't feel that there is a strong need to have many 
implementations of the basic containers in a standard library, (the 
other would be just implemented for a specific application, if needed). 
Then there might be different containers (db, distributed hashes, 
xml,...), and it would be nice if those can be as close as possible in 
some aspects of the api to the standard containers.
Again here generic algorithms are useful to augment the usefulness of 
special containers with a minimal coding effort, but should not 
"disallow" ad-hoc implementation in the container.

Mar 07 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Fawzi Mohamed wrote:
 On 2010-03-07 14:23:03 +0100, Steven Schveighoffer <schveiguy yahoo.com> 
 said:
 
 Robert Jacques Wrote:

 On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford jhu.edu>
 wrote:

 On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 How can softRemove not affect iterating ranges?  What if the range is
 positioned on the element removed?

 It always affects ranges in so far as the range and container are
 inconsistent, but that is a problem of softInsert as well. Removing an
 element from an array doesn't invalidate the underlying range, since
 the memory is still there. And so long as you're not trying to use 
 free
 lists, linked-lists, trees, etc. can be written so that the ranges
 never enter an invalid state. If they happen to be pointing at the
 removed node, they just end up stopping early.

 If my linked list range has two node pointers as the implementation, 
 and
 you remove the end node, it's going to end later, not early.

 A linked list maps to a forward range: so the range simply points to a
 single internal node. If you're going to store a pointer to the end, 
 then
 you should be using a doubly linked list, not a singly linked list.

 How do you know when to stop?  A range has a beginning and an ending, 
 otherwise it's an iterator.  Whether you store it via a pointer to the 
 last (not-to-be-iterated) node, or via a length, or via a value to 
 compare with for stopping, you have to use something.  Or are you 
 asserting the only useful range for a linked list iterates the entire 
 list?

 Think of it as the equivalent of a slice of an array.

 Of course, you can get around this by just storing a current node and a
 length, but that may not be what the user is expecting.  For 
 example, if
 you did a range of a sorted tree from "a" to "f" and it stops at "n", I
 think that would be extremely unexpected.

 By this logic, any and all forms of mutation, not just insertions and
 deletions must not be allowed.

 Allowed, yes.  Not invalidating iterators, no.

 
 I agree with this point of view, it makes sense to make some operations 
 invalidating iteration, and even there there is a hierarcy of 
 "invalidation":
 
 1) all iterator are still valid, and iterate on the same set that they 
 were iterating before. This is the best thing from a user, and is what 
 persistent structures do, all the structures described in the book I 
 gave have this property.
 To achieve this you need to basically *always* create a copy when 
 modifying a structure, but fortunately often there are ways to share 
 most of the structure with the old value. Still this has a cost, and 
 normally puts pressure on the memory/gc (all the nodes, or groups of 
 them have to be single objects for the gc).
 
 2) iterators are still valid, but they might iterate on a different set 
 of elements than originally planned.
 Array append with atomic update of elements gives this for example, 
 removal in general has more problems, even if the memory might still be 
 valid. Invariants of the array might be violated (for example strict 
 ordering), one could separate this in two sub points, invariant 
 violating and non, but probably it is not worth the effort.
 
 3) iterators are fully invalid, and at worst they might iterate on 
 completely wrong/invalid data without detection of failure, hopefully at 
 least in debug mode detection should be an option.

These are great thoughts. We need to explore this niche because STL has 
a black-and-white approach to invalidation - once invalidated, behavior 
is undefined. D could get much more nuanced than that.

Perhaps for the beginning we could just say that invalidation entails 
implementation-specific behavior.

 Stopping early is invalidation also IMO.  If your program logic depends
 on iterating over all the elements you originally intended to iterate,
 then we have big problems if they stop early.

 Stopping early is the result of a logical error by the programmer. The
 code itself, however, is completely valid.

 I still fail to see the difference between "soft" operations and 
 non-soft.  What does soft guarantee?  Give me a concrete definition, 
 an example would help too.

 
 I suppose soft should guarantee either 1 or 2, from what andrei was 
 saying I suppose he wants 2

I think softXxx functions that add data guarantee that existing ranges 
continue iterating the new collection correctly. Depending on where 
insertion was effected, iteration may or may not span the newly-inserted 
elements. For example, inserting elements in a linked list does preserve 
ranges so a linked list can afford to define softInsert and softPrepend.

softXxx functions that remove data should guarantee that existing ranges 
not including the removed elements will continue to operate on the range 
without seeing the removed elements. There is no guarantee that soft 
removal offers for ranges spanning the removed elements. (I'm unclear on 
this last topic.)

Andrei

Mar 08 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-03-08 13:01:32 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 softXxx functions that remove data should guarantee that existing 
 ranges not including the removed elements will continue to operate on 
 the range without seeing the removed elements. There is no guarantee 
 that soft removal offers for ranges spanning the removed elements. (I'm 
 unclear on this last topic.)

That won't work for a vector as ranges that are further then the 
removed element need adjustment too.

Just an idea, perhaps remove and other mutator functions for the 
container could take a list of range to update at the same time they do 
the operation. This way you don't even need a soft function, if remove 
can't accept other ranges then it won't take other ranges as an 
argument. For instance:

	class Container {
		void remove(Range toRemove, Range[] needAdjustment);
	}

	Container x = new Conatiner();
	auto a = x[];
	auto b = x[4..6];
	auto c = x[2..4];
	x.remove(c, [a, b]); // remove elements from c, adjust ranges a and b.

	OtherContainer y = new OtherContainer();
	auto d = y[];
	auto e = y[2..4];
	y.remove(e, [d]); // error, container is unable to adjust other ranges.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 08 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 Robert Jacques Wrote:

 On Sat, 06 Mar 2010 21:54:50 -0500, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 On Sat, 06 Mar 2010 11:19:15 -0500, Robert Jacques <sandford jhu.edu>
 wrote:

 On Sat, 06 Mar 2010 08:19:36 -0500, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:
 How can softRemove not affect iterating ranges?  What if the range  



 is
 positioned on the element removed?

 It always affects ranges in so far as the range and container are
 inconsistent, but that is a problem of softInsert as well. Removing  


 an
 element from an array doesn't invalidate the underlying range, since
 the memory is still there. And so long as you're not trying to use  


 free
 lists, linked-lists, trees, etc. can be written so that the ranges
 never enter an invalid state. If they happen to be pointing at the
 removed node, they just end up stopping early.

 If my linked list range has two node pointers as the implementation,  

 and
 you remove the end node, it's going to end later, not early.

 A linked list maps to a forward range: so the range simply points to a
 single internal node. If you're going to store a pointer to the end,  
 then
 you should be using a doubly linked list, not a singly linked list.

 How do you know when to stop?  A range has a beginning and an ending,  
 otherwise it's an iterator.  Whether you store it via a pointer to the  
 last (not-to-be-iterated) node, or via a length, or via a value to  
 compare with for stopping, you have to use something.  Or are you  
 asserting the only useful range for a linked list iterates the entire  
 list?

 Think of it as the equivalent of a slice of an array.

Please define for me an O(1) slice or index operation for a linked-list.  
The only way of doing this is to search the entire list in order,  
comparing for the search terms or counting the number of nodes passed. The  
range itself can do this just as easily as the host container, if you  
really want this functionality (I'd argue this isn't a valid list  
operation). For simple list[5..10] operations its definitely more  
efficient and then the range could even throw an exception when it reaches  
the end of the list before finding the end slice value (due to list  
topology manipulation). The efficiency of more complex ranges, like  
list["apple".."oranges"], is still much more efficient for the single  
range case. Even when you start to take many slices, cache effects can  
marginalize a lot of the cost.

 Of course, you can get around this by just storing a current node and  

 a
 length, but that may not be what the user is expecting.  For example,  

 if
 you did a range of a sorted tree from "a" to "f" and it stops at "n",  

 I
 think that would be extremely unexpected.

 By this logic, any and all forms of mutation, not just insertions and
 deletions must not be allowed.

 Allowed, yes.  Not invalidating iterators, no.

If I change a node in a sorted tree from 'a' to 'n', the node moves,  
changing the tree topology. Any range currently at the node would continue  
to look for the 'f' node, and iterate the rest of the tree erroneously. By  
the way, having ranges detect if they reach their end nodes or not is  
fairly easy to do.

 Stopping early is invalidation also IMO.  If your program logic  

 depends
 on iterating over all the elements you originally intended to iterate,
 then we have big problems if they stop early.

 Stopping early is the result of a logical error by the programmer. The
 code itself, however, is completely valid.

 I still fail to see the difference between "soft" operations and  
 non-soft.  What does soft guarantee?  Give me a concrete definition, an  
 example would help too.

There are a couple of possible definitions for soft operations: 1) the  
memory safety of the ranges of a collection are guaranteed. 2) That for  
the topology viewed by a range isn't logically changed. i.e. the range  
will continue to perform the same logical function if the topology its  
operating on is updated 3) That for the topology viewed by a range isn't  
actually changed and all elements selected at range creation will be  
viewed. 4) Like 3, but with all values being viewed.

For example, modifying an array in any way doesn't change 1, 2 or 3 for  
any of its slices.
For a linked list defining a forward range, mutation, insertion and  
removal can be done under 1 & 2.
The same can be said about doubly linked lists and bidirectional ranges.
For other containers, such as a sorted tree, mutation can break a 2/3  
though insertion and deletion don't break 2.
Although, the ranges will see many values, they may not see all the values  
currently in the collection nor all the values in the collection when the  
iterator was generated. So code that relies on such properties would be  
logically invalid.

I'd probably define hard ops as being 1) and soft ops at level 2. 4) is  
really only possible with immutable containers.

 The only two containers that would support softInsert would be  



 linked
 list and sorted map/set.  Anything else might completely screw up  



 the
 iteration.  I don't see a lot of "generic" use for it.

 There's all the containers based upon linked-lists, etc like hashes,
 stacks, queues and dequeues.

 Hashes may be rehashed when inserting, completely invalidating a range
 (possibly the end point will be before the starting point).

 Wouldn't re-hashing necessitate re-allocation? (Thus the range would  
 see a
 stale view)

 God no.  If my hash collision solution is linked-list based (which it is  
 in dcollections), why should I reallocate all those nodes?  I just  
 rearrange them in a new bucket array.

Sorry, I was assuming that if you were going to implement a hash  
collection you wouldn't be using a linked list approach, since that's what  
D's associative arrays already do. The are some really good reasons to not  
use a list based hash in D due to GC false pointer issues, but basically  
none to re-implementing (poorly?) D's built-in data structure.

 Yes, the others you mentioned will be valid.  But I still don't see it
 being more useful than just using documentation to indicate insertion
 will not invalidate ranges.  Perhaps I am wrong.

 The difference is that algorithms can document in their template
 constraints that they need a container with 'soft' properties.

 What is the advantage?  Why would an algorithm require soft functions?   
 What is an example of such an algorithm?

Something that uses toUpperCase or toLowerCase, for example.

 Another option is to use a "mutation" field that is checked every
 chance by the range.  If it changes, then the range is invalidated.

 The mutation field would have to be a version number to support
 multiple ranges, and given experience with lock-free algorithms which
 use a 'tag' in a similar manner, this concept is bug prone and should
 not be relied upon. It would be better to 'lock' the node or  


 container
 to topology changes, though this does slow things down and has no
 conflict resolution: removing a locked node would have to throw an
 exception.

 I was not thinking of multithreaded applications.  I don't think it's
 worth making containers by default be multithreaded safe.

 I wasn't thinking of multi-threaded containers. I was trying to point  
 out
 that version ids have failed in lock-free containers, where things are
 happening on the order of a single atomic op or a context switch. Given
 the time a range could go unused in standard code, versioning won't  
 work.

 Are you trying to say that if you don't use your range for exactly 2^32  
 mutations, it could mistakenly think the range is still valid?  That's a  
 valid, but very very weak point.

Umm, no. That is a valid point that happens in production code to  
disastrous effects. Worse, it doesn't happen often and there's no good  
unit test for it. I for one, never want to debug a program that only  
glitches after days of intensive use in a live environment with real  
customer data. Integer overflow bugs like this are actually one of the few  
bugs that have ever killed anyone.

 The mutation index has been used in Tango forever, and I think was in
 Doug Lea's original container implementations.  I'm pretty sure it is
 sound in single-threaded uses.

 No it's not. version tags + integer overflow = bug. Doug Lea knew about
 the problem and but thought it would never happen in real code. And Bill
 Gates thought no one will need more than 640k of ram. They both have  
 been
 proven wrong.

 Overflow != bug.  Wrapping completely to the same value == bug, but is  
 so unlikely, it's worth the possibility.

Statistics 101: do a test enough times and even the highly improbable will  
happen.

Mar 07 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford jhu.edu>  
wrote:

 On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 How do you know when to stop?  A range has a beginning and an ending,  
 otherwise it's an iterator.  Whether you store it via a pointer to the  
 last (not-to-be-iterated) node, or via a length, or via a value to  
 compare with for stopping, you have to use something.  Or are you  
 asserting the only useful range for a linked list iterates the entire  
 list?

 Think of it as the equivalent of a slice of an array.

 Please define for me an O(1) slice or index operation for a linked-list.

One for which you have references to the slice end points.  I think this  
will work, and I was planning on providing it in the upcoming dcollections  
port.  The only thing you cannot guarantee is that the order is correct.

 By the way, having ranges detect if they reach their end nodes or not is  
 fairly easy to do.

you are correct in that point, you could throw an exception as long as the  
end point is part of the range structure.  If you just use a current node  
plus a length, then you cannot do that.  But soft ops are not necessary to  
create this.

 I still fail to see the difference between "soft" operations and  
 non-soft.  What does soft guarantee?  Give me a concrete definition, an  
 example would help too.

 There are a couple of possible definitions for soft operations: 1) the  
 memory safety of the ranges of a collection are guaranteed. 2) That for  
 the topology viewed by a range isn't logically changed. i.e. the range  
 will continue to perform the same logical function if the topology its  
 operating on is updated 3) That for the topology viewed by a range isn't  
 actually changed and all elements selected at range creation will be  
 viewed. 4) Like 3, but with all values being viewed.

 For example, modifying an array in any way doesn't change 1, 2 or 3 for  
 any of its slices.
 For a linked list defining a forward range, mutation, insertion and  
 removal can be done under 1 & 2.
 The same can be said about doubly linked lists and bidirectional ranges.
 For other containers, such as a sorted tree, mutation can break a 2/3  
 though insertion and deletion don't break 2.
 Although, the ranges will see many values, they may not see all the  
 values currently in the collection nor all the values in the collection  
 when the iterator was generated. So code that relies on such properties  
 would be logically invalid.

 I'd probably define hard ops as being 1) and soft ops at level 2. 4) is  
 really only possible with immutable containers.



I don't really understand 2, "the range will continue to perform the same  
logical function," what does that mean?  Define that function.  I would  



don't think it can.

 Wouldn't re-hashing necessitate re-allocation? (Thus the range would  
 see a
 stale view)

 God no.  If my hash collision solution is linked-list based (which it  
 is in dcollections), why should I reallocate all those nodes?  I just  
 rearrange them in a new bucket array.

 Sorry, I was assuming that if you were going to implement a hash  
 collection you wouldn't be using a linked list approach, since that's  
 what D's associative arrays already do. The are some really good reasons  
 to not use a list based hash in D due to GC false pointer issues, but  
 basically none to re-implementing (poorly?) D's built-in data structure.

Hm... my hash outperforms builtin AAs by a wide margin.  But this is not  
technically because my implementation is better, it's because AA's use  
"dumb" allocation methods.  I don't know about false pointers, the hash  
nodes in my implementation only contain pointers, so I'm not sure there is  
any possibility for false ones.

What my hash implementation *does* do that AA's don't is to provide a  
better interface -- removal while iterating, storing cursors for later  
reference and removal (i.e. avoid double lookups).

Also, technically AAs are implemented with a tree, not a LL, so no opCmp  
is required for my hash.

 The difference is that algorithms can document in their template
 constraints that they need a container with 'soft' properties.

 What is the advantage?  Why would an algorithm require soft functions?   
 What is an example of such an algorithm?

 Something that uses toUpperCase or toLowerCase, for example.

I guess I won't get a real example.  I'm not sure it matters.  When Andrei  
starts implementing the soft methods, either they will be a huge win or  
obviously useless.  If I were a betting man, I'd bet on the latter, but  
I'm not really good at betting, and Andrei's ideas are usually good :)

 I wasn't thinking of multi-threaded containers. I was trying to point  
 out
 that version ids have failed in lock-free containers, where things are
 happening on the order of a single atomic op or a context switch. Given
 the time a range could go unused in standard code, versioning won't  
 work.

 Are you trying to say that if you don't use your range for exactly 2^32  
 mutations, it could mistakenly think the range is still valid?  That's  
 a valid, but very very weak point.

 Umm, no. That is a valid point that happens in production code to  
 disastrous effects. Worse, it doesn't happen often and there's no good  
 unit test for it. I for one, never want to debug a program that only  
 glitches after days of intensive use in a live environment with real  
 customer data. Integer overflow bugs like this are actually one of the  
 few bugs that have ever killed anyone.

I think you are overestimating the life of ranges on a container.  They  
will not survive that many mutations, and the worst that happens is the  


You can also force the container to invalidate itself once the first wrap  
occurs.  This at least will be a hard error.

 No it's not. version tags + integer overflow = bug. Doug Lea knew about
 the problem and but thought it would never happen in real code. And  
 Bill
 Gates thought no one will need more than 640k of ram. They both have  
 been
 proven wrong.

 Overflow != bug.  Wrapping completely to the same value == bug, but is  
 so unlikely, it's worth the possibility.

 Statistics 101: do a test enough times and even the highly improbable  
 will happen.

In statistics we generally ignore the outliers.  I normally am on your  
side in these types of cases, but we are talking about a statistical  
impossibility -- nobody will leave a range untouched for exactly 2^32  
mutations, it simply won't happen.  Your computer will probably be  
obsolete before it does.

BTW, I'm not advocating adding mutation counters, I don't plan on adding  
them.  It's just another alternative to "soft" mutations.

-Steve

Mar 07 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Sun, 07 Mar 2010 22:07:14 -0500, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:
 On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:
 On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:


[snip]
 Please define for me an O(1) slice or index operation for a linked-list.

 One for which you have references to the slice end points.  I think this  
 will work, and I was planning on providing it in the upcoming  
 dcollections port.  The only thing you cannot guarantee is that the  
 order is correct.

The container would have to do an O(N) search to verify the ranges are  
actually part of the collection. And using two ranges as iterators to  
create a third range feels very wrong and very bug prone: see all the  
issues raised during Andrei's iterators vs ranges presentations.  
Similarly, it feels wrong for something to define slicing and not indexing.

 By the way, having ranges detect if they reach their end nodes or not  
 is fairly easy to do.

 you are correct in that point, you could throw an exception as long as  
 the end point is part of the range structure.  If you just use a current  
 node plus a length, then you cannot do that.  But soft ops are not  
 necessary to create this.

Soft ops are necessary to document in code whether that invalidation is  
expected to happen.

 I still fail to see the difference between "soft" operations and  
 non-soft.  What does soft guarantee?  Give me a concrete definition,  
 an example would help too.

 There are a couple of possible definitions for soft operations: 1) the  
 memory safety of the ranges of a collection are guaranteed. 2) That for  
 the topology viewed by a range isn't logically changed. i.e. the range  
 will continue to perform the same logical function if the topology its  
 operating on is updated 3) That for the topology viewed by a range  
 isn't actually changed and all elements selected at range creation will  
 be viewed. 4) Like 3, but with all values being viewed.

 For example, modifying an array in any way doesn't change 1, 2 or 3 for  
 any of its slices.
 For a linked list defining a forward range, mutation, insertion and  
 removal can be done under 1 & 2.
 The same can be said about doubly linked lists and bidirectional ranges.
 For other containers, such as a sorted tree, mutation can break a 2/3  
 though insertion and deletion don't break 2.
 Although, the ranges will see many values, they may not see all the  
 values currently in the collection nor all the values in the collection  
 when the iterator was generated. So code that relies on such properties  
 would be logically invalid.

 I'd probably define hard ops as being 1) and soft ops at level 2. 4) is  
 really only possible with immutable containers.



 I don't really understand 2, "the range will continue to perform the  
 same logical function," what does that mean?  Define that function.  I  



 I don't think it can.

I was thinking of "iterate from the start to the end", for example. One  
might better describe this concept as the topology of the container  
relative to the range doesn't change: things before it stay before, things  
after it stay after and in the case of bidirectional ranges, things in the  
middle, stay in the middle.

 Wouldn't re-hashing necessitate re-allocation? (Thus the range would  
 see a
 stale view)

 God no.  If my hash collision solution is linked-list based (which it  
 is in dcollections), why should I reallocate all those nodes?  I just  
 rearrange them in a new bucket array.

 Sorry, I was assuming that if you were going to implement a hash  
 collection you wouldn't be using a linked list approach, since that's  
 what D's associative arrays already do. The are some really good  
 reasons to not use a list based hash in D due to GC false pointer  
 issues, but basically none to re-implementing (poorly?) D's built-in  
 data structure.

 Hm... my hash outperforms builtin AAs by a wide margin.  But this is not  
 technically because my implementation is better, it's because AA's use  
 "dumb" allocation methods.  I don't know about false pointers, the hash  
 nodes in my implementation only contain pointers, so I'm not sure there  
 is any possibility for false ones.

The GC isn't precise, so if you have a non-pointer type in a structure  
with a pointer or in a class, you'll get false pointers. (i.e. the hash  
value at each node)

 The difference is that algorithms can document in their template
 constraints that they need a container with 'soft' properties.

 What is the advantage?  Why would an algorithm require soft  
 functions?  What is an example of such an algorithm?

 Something that uses toUpperCase or toLowerCase, for example.

 I guess I won't get a real example.  I'm not sure it matters.  When  
 Andrei starts implementing the soft methods, either they will be a huge  
 win or obviously useless.  If I were a betting man, I'd bet on the  
 latter, but I'm not really good at betting, and Andrei's ideas are  
 usually good :)

Though this was the first thing that popped into my head, it is a fairly  
real example. Consider you are doing some processing involving a container  
and you call a library function, like toUpperCase, that will perform some  
mutation. For some containers, this is completely reasonable. But for  
others, the container topology is going to massively change, invalidating  
all your current ranges. And you really like to guarantee that both the  
current implementation and the one 6 months from now don't mess up your  
ranges and cause a bunch of exceptions to be thrown.

 I wasn't thinking of multi-threaded containers. I was trying to point  
 out
 that version ids have failed in lock-free containers, where things are
 happening on the order of a single atomic op or a context switch.  
 Given
 the time a range could go unused in standard code, versioning won't  
 work.

 Are you trying to say that if you don't use your range for exactly  
 2^32 mutations, it could mistakenly think the range is still valid?   
 That's a valid, but very very weak point.

 Umm, no. That is a valid point that happens in production code to  
 disastrous effects. Worse, it doesn't happen often and there's no good  
 unit test for it. I for one, never want to debug a program that only  
 glitches after days of intensive use in a live environment with real  
 customer data. Integer overflow bugs like this are actually one of the  
 few bugs that have ever killed anyone.

 I think you are overestimating the life of ranges on a container.  They  
 will not survive that many mutations, and the worst that happens is the  


A year ago I would have agreed with you, but then I saw a couple of  
articles about how this unlikely event does start to occur repeatably in  
certain circumstances. This made a bunch of news since it threw a massive  
spanner in lock-free containers, which relied on this never happening.

 You can also force the container to invalidate itself once the first  
 wrap occurs.  This at least will be a hard error.

So every container will have a finite number of operation and that's it?

 No it's not. version tags + integer overflow = bug. Doug Lea knew  
 about
 the problem and but thought it would never happen in real code. And  
 Bill
 Gates thought no one will need more than 640k of ram. They both have  
 been
 proven wrong.

 Overflow != bug.  Wrapping completely to the same value == bug, but is  
 so unlikely, it's worth the possibility.

 Statistics 101: do a test enough times and even the highly improbable  
 will happen.

 In statistics we generally ignore the outliers.  I normally am on your  
 side in these types of cases, but we are talking about a statistical  
 impossibility -- nobody will leave a range untouched for exactly 2^32  
 mutations, it simply won't happen.  Your computer will probably be  
 obsolete before it does.

No, this is a statistical implausibility. One that has already happened to  
some people. ulongs on the other hand...

 BTW, I'm not advocating adding mutation counters, I don't plan on adding  
 them.  It's just another alternative to "soft" mutations.

I understand.

Mar 07 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 08 Mar 2010 00:20:31 -0500, Robert Jacques <sandford jhu.edu>  
wrote:

 On Sun, 07 Mar 2010 22:07:14 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:
 On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:


 [snip]
 Please define for me an O(1) slice or index operation for a  
 linked-list.

 One for which you have references to the slice end points.  I think  
 this will work, and I was planning on providing it in the upcoming  
 dcollections port.  The only thing you cannot guarantee is that the  
 order is correct.

 The container would have to do an O(N) search to verify the ranges are  
 actually part of the collection. And using two ranges as iterators to  
 create a third range feels very wrong and very bug prone: see all the  
 issues raised during Andrei's iterators vs ranges presentations.  
 Similarly, it feels wrong for something to define slicing and not  
 indexing.

Not two ranges, two references.  That is what cursors are in  
dcollections.  They currently are akin to iterators, but are being changed  
to immovable references.

BTW, I don't plan on adding this as a simple slice operation on a  
container that cannot quickly verify that the two cursors are ordered, to  
ensure it isn't accidentally used in functions that want safe slicing.

You don't need O(n) to verify the cursors are part of the collection if  
the cursors identify the collection they are from.

 Hm... my hash outperforms builtin AAs by a wide margin.  But this is  
 not technically because my implementation is better, it's because AA's  
 use "dumb" allocation methods.  I don't know about false pointers, the  
 hash nodes in my implementation only contain pointers, so I'm not sure  
 there is any possibility for false ones.

 The GC isn't precise, so if you have a non-pointer type in a structure  
 with a pointer or in a class, you'll get false pointers. (i.e. the hash  
 value at each node)

I don't store the hash in the node, it is recalculated when a re-hash is  
done.

 I think you are overestimating the life of ranges on a container.  They  
 will not survive that many mutations, and the worst that happens is the  


 A year ago I would have agreed with you, but then I saw a couple of  
 articles about how this unlikely event does start to occur repeatably in  
 certain circumstances. This made a bunch of news since it threw a  
 massive spanner in lock-free containers, which relied on this never  
 happening.

references?

 You can also force the container to invalidate itself once the first  
 wrap occurs.  This at least will be a hard error.

 So every container will have a finite number of operation and that's it?

That's one solution, if you want to be sure this bug doesn't occur, and  
you want to be sure your ranges aren't invalidated.

-Steve

Mar 08 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford jhu.edu> 
 wrote:
 
 On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 What is the advantage?  Why would an algorithm require soft 
 functions?  What is an example of such an algorithm?

 Something that uses toUpperCase or toLowerCase, for example.

 
 I guess I won't get a real example.  I'm not sure it matters.  When 
 Andrei starts implementing the soft methods, either they will be a huge 
 win or obviously useless.  If I were a betting man, I'd bet on the 
 latter, but I'm not really good at betting, and Andrei's ideas are 
 usually good :)

The way people usually take advantage of non-invalidating container 
primitives is altering the container during an iteration. The canonical 
way to remove while iterating an erase-non-invalidating container (e.g. 
map) is this:

typedef ... Container;
Container c;
for (Container::iterator i = c.begin(); i != c.end(); )
{
     if (should_delete_this_guy)
         c.remove(i++);
     else
         ++i;
}

The canonical way to remove while iterating an erase-invalidating 
container (e.g. vector) is:

typedef ... Container;
Container c;
for (Container::iterator i = c.begin(); i != c.end(); )
{
     if (should_delete_this_guy)
         i = c.remove(i);
     else
         ++i;
}

I know, subtle :o/. Of course, things can get quickly more interesting 
when calling functions from within loops that may affect the container 
(think the Observer pattern).


Andrei

Mar 08 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 08 Mar 2010 12:53:40 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford jhu.edu>  
 wrote:

 On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:
 What is the advantage?  Why would an algorithm require soft  
 functions?  What is an example of such an algorithm?

 Something that uses toUpperCase or toLowerCase, for example.

  I guess I won't get a real example.  I'm not sure it matters.  When  
 Andrei starts implementing the soft methods, either they will be a huge  
 win or obviously useless.  If I were a betting man, I'd bet on the  
 latter, but I'm not really good at betting, and Andrei's ideas are  
 usually good :)

 The way people usually take advantage of non-invalidating container  
 primitives is altering the container during an iteration. The canonical  
 way to remove while iterating an erase-non-invalidating container (e.g.  
 map) is this:

 typedef ... Container;
 Container c;
 for (Container::iterator i = c.begin(); i != c.end(); )
 {
      if (should_delete_this_guy)
          c.remove(i++);
      else
          ++i;
 }

 The canonical way to remove while iterating an erase-invalidating  
 container (e.g. vector) is:

 typedef ... Container;
 Container c;
 for (Container::iterator i = c.begin(); i != c.end(); )
 {
      if (should_delete_this_guy)
          i = c.remove(i);
      else
          ++i;
 }

Hm... this seems to be a different problem than I originally was thinking  
about.  So in essence, you want a set of "safe" functions that will not  
adversely affect a range you are using to iterate with?

BTW, I solved this problem (removing while iterating) in a much  
better/simpler way in dcollections.

-Steve

Mar 08 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Mon, 08 Mar 2010 12:53:40 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 On Sun, 07 Mar 2010 12:43:09 -0500, Robert Jacques <sandford jhu.edu> 
 wrote:

 On Sun, 07 Mar 2010 08:23:03 -0500, Steven Schveighoffer 
 <schveiguy yahoo.com> wrote:
 What is the advantage?  Why would an algorithm require soft 
 functions?  What is an example of such an algorithm?

 Something that uses toUpperCase or toLowerCase, for example.

  I guess I won't get a real example.  I'm not sure it matters.  When 
 Andrei starts implementing the soft methods, either they will be a 
 huge win or obviously useless.  If I were a betting man, I'd bet on 
 the latter, but I'm not really good at betting, and Andrei's ideas 
 are usually good :)

 The way people usually take advantage of non-invalidating container 
 primitives is altering the container during an iteration. The 
 canonical way to remove while iterating an erase-non-invalidating 
 container (e.g. map) is this:

 typedef ... Container;
 Container c;
 for (Container::iterator i = c.begin(); i != c.end(); )
 {
      if (should_delete_this_guy)
          c.remove(i++);
      else
          ++i;
 }

 The canonical way to remove while iterating an erase-invalidating 
 container (e.g. vector) is:

 typedef ... Container;
 Container c;
 for (Container::iterator i = c.begin(); i != c.end(); )
 {
      if (should_delete_this_guy)
          i = c.remove(i);
      else
          ++i;
 }

 
 Hm... this seems to be a different problem than I originally was 
 thinking about.  So in essence, you want a set of "safe" functions that 
 will not adversely affect a range you are using to iterate with?

That's the idea. Any documentation of STL containers specifies whether 
iterators are invalidated or not by a primitive. The plan is to simply 
define a nomenclature (i.e. "softXxx") for primitives that don't 
invalidate iterators. That moves the documentation into the name of the 
function.

 BTW, I solved this problem (removing while iterating) in a much 
 better/simpler way in dcollections.

Do tell!


Andrei

Mar 08 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 08 Mar 2010 13:21:14 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
  Hm... this seems to be a different problem than I originally was  
 thinking about.  So in essence, you want a set of "safe" functions that  
 will not adversely affect a range you are using to iterate with?

 That's the idea. Any documentation of STL containers specifies whether  
 iterators are invalidated or not by a primitive. The plan is to simply  
 define a nomenclature (i.e. "softXxx") for primitives that don't  
 invalidate iterators. That moves the documentation into the name of the  
 function.

What I meant was, your example only focuses on the iterator being advanced  
in the loop.  Another iterator might be invalidated that is not used in  
the remove function.  Was that the intent?

 BTW, I solved this problem (removing while iterating) in a much  
 better/simpler way in dcollections.

 Do tell!

I created a method purge, which is used as an opApply in a foreach loop  
like so:

foreach(ref erase, v; &container.purge)
{
    erase = should_delete_this_guy;
}

One nifty thing about this, purging an entire collection is an O(n) up to  
O(nlgn) operation, whereas purging a vector as you wrote is an O(n*n)  
operation.

I found that one of the biggest reasons I liked about STL's containers  
over others that I've used is the ability to remove elements from the  
container while iterating.  Java allows this also, but no foreach, and it  
doesn't make optimizations, since the iterator is in charge of removal,  
not the container itself.

-Steve

Mar 08 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Mon, 08 Mar 2010 13:21:14 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
  Hm... this seems to be a different problem than I originally was 
 thinking about.  So in essence, you want a set of "safe" functions 
 that will not adversely affect a range you are using to iterate with?

 That's the idea. Any documentation of STL containers specifies whether 
 iterators are invalidated or not by a primitive. The plan is to simply 
 define a nomenclature (i.e. "softXxx") for primitives that don't 
 invalidate iterators. That moves the documentation into the name of 
 the function.

 
 What I meant was, your example only focuses on the iterator being 
 advanced in the loop.  Another iterator might be invalidated that is not 
 used in the remove function.  Was that the intent?

The examples were given just to illustrate common cases when 
invalidation is relevant. As I mentioned, interesting cases occur when 
long-distance effects occur. The Observer pattern is a perfect example.

 BTW, I solved this problem (removing while iterating) in a much 
 better/simpler way in dcollections.

 Do tell!

 
 I created a method purge, which is used as an opApply in a foreach loop 
 like so:
 
 foreach(ref erase, v; &container.purge)
 {
    erase = should_delete_this_guy;
 }

Very nifty indeed!

 One nifty thing about this, purging an entire collection is an O(n) up 
 to O(nlgn) operation, whereas purging a vector as you wrote is an O(n*n) 
 operation.

Good point. The loop with erase is good if you have to occasionally 
erase some element; for bulk conditional erasures the canonical STL 
approach is to use remove() followed by erase():

http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Erase-Remove

 I found that one of the biggest reasons I liked about STL's containers 
 over others that I've used is the ability to remove elements from the 
 container while iterating.  Java allows this also, but no foreach, and 
 it doesn't make optimizations, since the iterator is in charge of 
 removal, not the container itself.

Yah, the tail that removes the dog. That's one of the reasons I think 
Java's containers are missing more points than one. In fact the more I 
think of Java containers, the more reasons I find to dislike them. To me 
they're more of a point of negative potential in the design space that 
needs to be carefully navigated away from.


Andrei

Mar 08 2010

dsimcha <dsimcha yahoo.com> writes:

== Quote from Robert Jacques (sandford jhu.edu)'s article
 The mutation index has been used in Tango forever, and I think was in
 Doug Lea's original container implementations.  I'm pretty sure it is
 sound in single-threaded uses.

 No it's not. version tags + integer overflow = bug. Doug Lea knew about
 the problem and but thought it would never happen in real code. And Bill
 Gates thought no one will need more than 640k of ram. They both have been
 proven wrong.

Using a 32-bit version tag probably could lead to overflow in some corner cases,
but in the 64-bit case it can be shown that this is absurdly unlikely.  Assume
that an increment operation takes about 1 clock cycle (in reality it's more) and
that most CPUs are 10 GHz (to make things future-proof in case people figure out
how to get the clock speed race going again).

This means that a version tag could, at most, increment at about 10^10 ticks per
second.  A 64-bit unsigned int goes up to about 1.8x10^19.  This means that,
even
if our program does nothing but update the version counter in an empty for loop,
it will take 1.8x10^19 / 10^10 = 1.8x10^9 seconds to overflow.  There are about
3.15x10^7 seconds in a year.

Therefore, even under very pessimistic assumptions the fastest you could
overflow
a 64-bit unsigned int by incrementing it starting at zero is in about half a
century.

Mar 07 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

dsimcha wrote:
 == Quote from Robert Jacques (sandford jhu.edu)'s article
 The mutation index has been used in Tango forever, and I think was in
 Doug Lea's original container implementations.  I'm pretty sure it is
 sound in single-threaded uses.

 No it's not. version tags + integer overflow = bug. Doug Lea knew about
 the problem and but thought it would never happen in real code. And Bill
 Gates thought no one will need more than 640k of ram. They both have been
 proven wrong.

 
 Using a 32-bit version tag probably could lead to overflow in some corner
cases,
 but in the 64-bit case it can be shown that this is absurdly unlikely.  Assume
 that an increment operation takes about 1 clock cycle (in reality it's more)
and
 that most CPUs are 10 GHz (to make things future-proof in case people figure
out
 how to get the clock speed race going again).
 
 This means that a version tag could, at most, increment at about 10^10 ticks
per
 second.  A 64-bit unsigned int goes up to about 1.8x10^19.  This means that,
even
 if our program does nothing but update the version counter in an empty for
loop,
 it will take 1.8x10^19 / 10^10 = 1.8x10^9 seconds to overflow.  There are about
 3.15x10^7 seconds in a year.
 
 Therefore, even under very pessimistic assumptions the fastest you could
overflow
 a 64-bit unsigned int by incrementing it starting at zero is in about half a
century.

Yah, there are many similar assessments that at least for a couple of 
centuries going we can assume that 64-bit counters never overflow and 
64-bit virtual address space is sparse. First time I saw this was when I 
read about the Opal OS: 
http://www.cs.washington.edu/homes/levy/opal/sigops92.ps

Andrei

Mar 08 2010

Steve Teale <steve.teale britseyeview.com> writes:

 Sounds good?

 
 How can softRemove not affect iterating ranges?  What if the range is
 positioned on the element removed?
 
 The only two containers that would support softInsert would be linked
 list and sorted map/set.  Anything else might completely screw up the
 iteration.  I don't see a lot of "generic" use for it.
 
 Another option is to use a "mutation" field that is checked every chance
 by the range.  If it changes, then the range is invalidated.
 
 -Steve

Steve,

Wouldn't the soft things throw (or maybe just nudge) an exception if the 
container was in a state where the soft option was not appropriate.

That would be the other way round to what you said, The container would 
note when some range operation had a conflicting currency.

Steve

Mar 06 2010

Steven Schveighoffer <schveiguy yahoo.com> writes:

Steve Teale Wrote:

 Sounds good?

 
 How can softRemove not affect iterating ranges?  What if the range is
 positioned on the element removed?
 
 The only two containers that would support softInsert would be linked
 list and sorted map/set.  Anything else might completely screw up the
 iteration.  I don't see a lot of "generic" use for it.
 
 Another option is to use a "mutation" field that is checked every chance
 by the range.  If it changes, then the range is invalidated.
 
 -Steve

 
 Steve,
 
 Wouldn't the soft things throw (or maybe just nudge) an exception if the 
 container was in a state where the soft option was not appropriate.
 
 That would be the other way round to what you said, The container would 
 note when some range operation had a conflicting currency.

I think the point of soft functions is to not compile if they are not supported
by the container.  Can you think of a way a container can have a soft option
that is appropriate sometimes, but not others (thereby enabling a runtime
option)?  I can't really see it, but maybe I'm missing it.

-Steve

Mar 06 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 Andrei Alexandrescu Wrote:
 
 In the STL world, writing container-independent code is generally 
 shunned (see e.g. 
 http://www.informit.com/content/images/0201749629/items/item2-2.pdf).

 One problem is a very small intersection between the functionalities 
 offered by the various STL containers, and the conceptual organization 
 that is weaker than that of iterators.

 A worse problem is iterator invalidation rules, something that we'll 
 need to address too. I'm thinking that the best defense is a strong 
 offense, and I plan to define the following naming convention:

 Methods such as insert, remove, pushFront, pushBack, removeFront, 
 removeBack, are assumed to affect the container's topology and must be 
 handled in user code as such.

 In addition to those, a container may also define functions named after 
 the above by adding a "soft" prefix (e.g. softInsert, softRemove...) 
 that are guaranteed to not affect the ranges currently iterating the 
 container.

 Generic code that needs specific iterator (non-)invalidation rules can 
 use softXxx methods, in confidence that containers not supporting it 
 will be ruled out during compilations.

 Sounds good?

 
 How can softRemove not affect iterating ranges?  What if the range is
positioned on the element removed?

With GC, you can softRemove things without invalidating iterators.

 The only two containers that would support softInsert would be linked list and
sorted map/set.  Anything else might completely screw up the iteration.  I
don't see a lot of "generic" use for it.

Singly linked lists, doubly linked lists, various trees - three's a 
crowd. Most node-based containers support softInsert.


Andrei

Mar 06 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-03-06 14:55:49 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Steven Schveighoffer wrote:
 How can softRemove not affect iterating ranges?  What if the range is 
 positioned on the element removed?

 
 With GC, you can softRemove things without invalidating iterators.

What exactly is an "invalidated" iterator? Is an iterator that no 
longer points to the container still "valid"? Or is "valid" just 
another word for "memory safe"?

Wouldn't the notion of "detached" iterators/ranges be more useful?

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 06 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Michel Fortin wrote:
 On 2010-03-06 14:55:49 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Steven Schveighoffer wrote:
 How can softRemove not affect iterating ranges?  What if the range is 
 positioned on the element removed?

 With GC, you can softRemove things without invalidating iterators.

 
 What exactly is an "invalidated" iterator? Is an iterator that no longer 
 points to the container still "valid"? Or is "valid" just another word 
 for "memory safe"?
 
 Wouldn't the notion of "detached" iterators/ranges be more useful?

Good question. In STL, invalidation roughly means undefined behavior if 
you use it. With GC in tow, the concept could be significantly milder. 
For example, reallocating an array would leave the old contents of the 
array sort of around, just obsoleted and also depleted of meaningful 
content if the data type has a destructor.

Andrei

Mar 06 2010

"Joel C. Salomon" <joelcsalomon gmail.com> writes:

On 3/6/2010 5:18 PM, Andrei Alexandrescu wrote:
 Good question. In STL, invalidation roughly means undefined behavior if
 you use it. With GC in tow, the concept could be significantly milder.
 For example, reallocating an array would leave the old contents of the
 array sort of around, just obsoleted and also depleted of meaningful
 content if the data type has a destructor.

So it’s still a bug to use an invalidated iterator/range, and probably
still has unusual (if no longer undefined) effects if you do follow it,
but it (probably) won’t cause memory corruption.  Is that correct?

—Joel Salomon

Mar 08 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Joel C. Salomon wrote:
 On 3/6/2010 5:18 PM, Andrei Alexandrescu wrote:
 Good question. In STL, invalidation roughly means undefined behavior if
 you use it. With GC in tow, the concept could be significantly milder.
 For example, reallocating an array would leave the old contents of the
 array sort of around, just obsoleted and also depleted of meaningful
 content if the data type has a destructor.

 
 So it’s still a bug to use an invalidated iterator/range, and probably
 still has unusual (if no longer undefined) effects if you do follow it,
 but it (probably) won’t cause memory corruption.  Is that correct?

I'd think so. In SafeD, clearly there's no memory corruption so throwing 
or iterating over removed elements would be valid choices. In non-safe 
mode, the user should expect undefined behavior.

Andrei

Mar 08 2010

BLS <windevguy hotmail.de> writes:

On 06/03/2010 20:55, Andrei Alexandrescu wrote:
 Singly linked lists, doubly linked lists, various trees - three's a
 crowd. Most node-based containers support softInsert.

slightly OT
I am pretty sure that you have to implement IsRangeOfRange! and 
SubRange! for non linear collections. (if not already done)

Guess what I want to say is that I doubt that std.algo will fit in every 
case; and creating  a collection lib reduced to what is possible under 
the std.algo umbrella is (IMHO) a bad idea.

..and frankly said I wonder .. no offense.. why you are the (only) one 
who is creating and designing the collection library ?
finally :
from a pragmatic view > why create from scratch instead of reusing 
Stevens data-structures..
Bjoern

Mar 06 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 06 Mar 2010 19:41:57 -0500, BLS <windevguy hotmail.de> wrote:

 ..and frankly said I wonder .. no offense.. why you are the (only) one  
 who is creating and designing the collection library ?
 finally :
 from a pragmatic view > why create from scratch instead of reusing  
 Stevens data-structures..

We have had private discussions, and we have fundamental differences of  
opinion for what belongs in containers.  It may be that through trial and  
error and constructive discussion we come up with the same conclusions, in  
which case I'm sure we will combine forces (recently, I think we both  
moved our points of view closer, but not quite there yet).  However, I  
will not stop maintaining dcollections as long as Phobos' collection  
solution does not implement the features I think are important.  I am in  
the process of porting/improving dcollections to D2, and when it is done,  
I think it will have some really useful features in it.

-Steve

Mar 06 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Sat, 06 Mar 2010 19:41:57 -0500, BLS <windevguy hotmail.de> wrote:
 
 ..and frankly said I wonder .. no offense.. why you are the (only) one 
 who is creating and designing the collection library ?
 finally :
 from a pragmatic view > why create from scratch instead of reusing 
 Stevens data-structures..

 
 We have had private discussions, and we have fundamental differences of 
 opinion for what belongs in containers.  It may be that through trial 
 and error and constructive discussion we come up with the same 
 conclusions, in which case I'm sure we will combine forces (recently, I 
 think we both moved our points of view closer, but not quite there 
 yet).

I concur with Steve's characterization of the situation. My interface 
definitions are very crisp and my cost function looks more like a step 
function, so I'm difficult to negotiate with. I'm running a mile away 
faster than you can say "Here, there's this Container.contains() method, 
which..."

But Steve suggested that he could volunteer, with credit, some of his 
implementation code to my interfaces, even though he has disagreements 
regarding the exact definitions of those interfaces.

 However, I will not stop maintaining dcollections as long as 
 Phobos' collection solution does not implement the features I think are 
 important.  I am in the process of porting/improving dcollections to D2, 
 and when it is done, I think it will have some really useful features in 
 it.

That's great. This kind of competition can only be best for everyone.


Andrei

Mar 07 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 06 Mar 2010 14:55:49 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:
 Andrei Alexandrescu Wrote:

 In the STL world, writing container-independent code is generally  
 shunned (see e.g.  
 http://www.informit.com/content/images/0201749629/items/item2-2.pdf).

 One problem is a very small intersection between the functionalities  
 offered by the various STL containers, and the conceptual organization  
 that is weaker than that of iterators.

 A worse problem is iterator invalidation rules, something that we'll  
 need to address too. I'm thinking that the best defense is a strong  
 offense, and I plan to define the following naming convention:

 Methods such as insert, remove, pushFront, pushBack, removeFront,  
 removeBack, are assumed to affect the container's topology and must be  
 handled in user code as such.

 In addition to those, a container may also define functions named  
 after the above by adding a "soft" prefix (e.g. softInsert,  
 softRemove...) that are guaranteed to not affect the ranges currently  
 iterating the container.

 Generic code that needs specific iterator (non-)invalidation rules can  
 use softXxx methods, in confidence that containers not supporting it  
 will be ruled out during compilations.

 Sounds good?

  How can softRemove not affect iterating ranges?  What if the range is  
 positioned on the element removed?

 With GC, you can softRemove things without invalidating iterators.

If you define invalidation by "not pointing to unallocated memory,"  
wouldn't normal remove (not soft remove) also result in valid memory being  
pointed to?  Why do we need a special soft version?

Even if this is what you mean, with allocators (which significantly speed  
up container insert/removal), you may be pointing to a newly valid piece  
of memory that may not be where you want.

I thought you meant by not invalidating that the range would iterate over  
the elements it was originally scheduled to iterate over, sans the removed  
element.

 The only two containers that would support softInsert would be linked  
 list and sorted map/set.  Anything else might completely screw up the  
 iteration.  I don't see a lot of "generic" use for it.

 Singly linked lists, doubly linked lists, various trees - three's a  
 crowd. Most node-based containers support softInsert.

Hashes may rehash on insert, invalidating any ranges (it's one of the  
notes of dcollections' HashMap and HashSet).  I guess it depends on if an  
insert alters the ordering of the nodes.  This is why I said sorted  
containers which should be unaffected.

You have a point that this covers a lot of containers however, but I think  
the usefulness of just having a softInsert (I think softRemove is not a  
useful concept) is pretty limited.  It limits you to

BTW, I think its a good idea to think about how to make this work, if we  
can come up with something that covers a lot of ground, I think that's a  
good thing.  I thought of making ranges slower but more robust to  
container changes in debug mode, does this sound useful?  I.e. if you  
compile in non-release mode, removing/adding an element may trigger the  
range to do a O(n) check to validate itself.

-Steve

Mar 06 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Sat, 06 Mar 2010 14:55:49 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Steven Schveighoffer wrote:
 Andrei Alexandrescu Wrote:

 In the STL world, writing container-independent code is generally 
 shunned (see e.g. 
 http://www.informit.com/content/images/0201749629/items/item2-2.pdf).

 One problem is a very small intersection between the functionalities 
 offered by the various STL containers, and the conceptual 
 organization that is weaker than that of iterators.

 A worse problem is iterator invalidation rules, something that we'll 
 need to address too. I'm thinking that the best defense is a strong 
 offense, and I plan to define the following naming convention:

 Methods such as insert, remove, pushFront, pushBack, removeFront, 
 removeBack, are assumed to affect the container's topology and must 
 be handled in user code as such.

 In addition to those, a container may also define functions named 
 after the above by adding a "soft" prefix (e.g. softInsert, 
 softRemove...) that are guaranteed to not affect the ranges 
 currently iterating the container.

 Generic code that needs specific iterator (non-)invalidation rules 
 can use softXxx methods, in confidence that containers not 
 supporting it will be ruled out during compilations.

 Sounds good?

  How can softRemove not affect iterating ranges?  What if the range 
 is positioned on the element removed?

 With GC, you can softRemove things without invalidating iterators.

 
 If you define invalidation by "not pointing to unallocated memory," 
 wouldn't normal remove (not soft remove) also result in valid memory 
 being pointed to?  Why do we need a special soft version?

Well consider an array of File objects, which are reference counted. I'd 
want the resizing of an array to not increment the reference count of 
the files. That means that resizing leaves the old array without 
meaningful content by calling clear() against the remaining files.

 Even if this is what you mean, with allocators (which significantly 
 speed up container insert/removal), you may be pointing to a newly valid 
 piece of memory that may not be where you want.
 
 I thought you meant by not invalidating that the range would iterate 
 over the elements it was originally scheduled to iterate over, sans the 
 removed element.

For remove that should be the case, but probably not for insert.

 The only two containers that would support softInsert would be linked 
 list and sorted map/set.  Anything else might completely screw up the 
 iteration.  I don't see a lot of "generic" use for it.

 Singly linked lists, doubly linked lists, various trees - three's a 
 crowd. Most node-based containers support softInsert.

 
 Hashes may rehash on insert, invalidating any ranges (it's one of the 
 notes of dcollections' HashMap and HashSet).  I guess it depends on if 
 an insert alters the ordering of the nodes.  This is why I said sorted 
 containers which should be unaffected.
 
 You have a point that this covers a lot of containers however, but I 
 think the usefulness of just having a softInsert (I think softRemove is 
 not a useful concept) is pretty limited.  It limits you to

Iterator invalidation has been a notion thoroughly explored by the STL 
and a commonly-mentioned liability of STL iterators. People find it very 
jarring that syntactically identical interfaces have distinct effects on 
iterators, it's a dependency very difficult to track. I'm glad that that 
experience has already been accumulated and that we can build upon it.

 BTW, I think its a good idea to think about how to make this work, if we 
 can come up with something that covers a lot of ground, I think that's a 
 good thing.  I thought of making ranges slower but more robust to 
 container changes in debug mode, does this sound useful?  I.e. if you 
 compile in non-release mode, removing/adding an element may trigger the 
 range to do a O(n) check to validate itself.

I think debug vs. release should not affect complexity.


Andrei

Mar 07 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 07 Mar 2010 06:42:10 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:


 Iterator invalidation has been a notion thoroughly explored by the STL  
 and a commonly-mentioned liability of STL iterators. People find it very  
 jarring that syntactically identical interfaces have distinct effects on  
 iterators, it's a dependency very difficult to track. I'm glad that that  
 experience has already been accumulated and that we can build upon it.

Is there any experience that led to a solution, even in theory?

I'm going to stop arguing this point any further, because the quickest  
path to resolution is probably for you to stop wasting time debating with  
me and implement what you think will work.  Then we can see how useful it  
might be.

If you can solve this problem in a way that is useful, I think it might be  
as revolutionary as ranges are.

-Steve

Mar 07 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Sun, 07 Mar 2010 06:42:10 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 
 Iterator invalidation has been a notion thoroughly explored by the STL 
 and a commonly-mentioned liability of STL iterators. People find it 
 very jarring that syntactically identical interfaces have distinct 
 effects on iterators, it's a dependency very difficult to track. I'm 
 glad that that experience has already been accumulated and that we can 
 build upon it.

 
 Is there any experience that led to a solution, even in theory?

I don't know of one. There are of course the safe iterators that rely on 
a variety of techniques (some, such as serial numbers, discussed in this 
thread) to detect invalidation dynamically.

For the style of containers discussed herein, a static solution should 
be a better fit than a dynamic one. I plan to enact a solution by means 
of the softXxx primitives.

 I'm going to stop arguing this point any further, because the quickest 
 path to resolution is probably for you to stop wasting time debating 
 with me and implement what you think will work.  Then we can see how 
 useful it might be.
 
 If you can solve this problem in a way that is useful, I think it might 
 be as revolutionary as ranges are.

Probably not. The way I see it, it would be simply the removal of a 
large annoyance that stands in the way of writing container-independent 
generic code.


Andrei

Mar 08 2010

Fawzi Mohamed <fmohamed mac.com> writes:

I am coming late into this discussion, some comments:

I like a duck typing approach whenever possible, and don't need runtime 
adaptability (which seems to be in line with others)

different container have different trade offs, and switching between 
them should be a design choice

Generic algorithm in my opinion are useful to easily (and with little 
well tested code) give a wise range interface to several containers.
They should not "forbid" special implementation of some algorithms in a 
container.
To have a common interface one can always use a trampoline function, 
that looks for the specialized implementation in various places.
I know that in theory one should be able to write specialized templates 
instead, but in practice I have often encountered bugs and strange 
behavior, in container implementation seems much cleaner to me

Different containers have different "threadsafeness". Normally some 
subset of operations is threadsafe, but mixing in other might break it. 
This in my opinion *must* be documented, for example one should not 
have to look to the source to know if addition of an element to an AA 
breaks a foreach loop or not.
Threadsafeness has a cost, in general non threadsafe structures are 
faster, so it makes sense to have them (especially thinking that often 
even in the non threadsafe structures one can use a subset of 
operations concurrently without problems.
There is a large class of structures that *are* threadsafe, these have 
some beautiful properties, and are very useful to know. A very good 
review of them (I don't say introduction because the book is quite 
advanced) is "Purely Functional Data Structures" by Chris Okasaki. In 
these structures the values read are always the same, but lazy 
evaluation might be used to transform amortized cost in non amortized 
cost.

I had already spoken about these back during the const discussion, 
because I would have liked that these structures were "immutable" and 
usable by pure functions, which meant a slighlty different thing from 
immutable. Some casting might be used to implement them, if one assumes 
that casting a mutable structure to immutable and modifying it later is 
doable (no compiler optimization on immutable disallow it, this could 
be done even just for a special "thunk" that contain a lazy function 
that will produce the value). Anyway that is another discussion.

For the current discussion the important thing is that yes, there are 
fully threadsafe structures, and it would be useful to have them, but 
in general they have a slightly larger cost.
Not safe structures are also useful as they are faster, in general in 
the documentation one expects to find which operations are safe and 
which not.
The minima baseline typically is that just reading is threadsafe, 
modifications in general aren't and might invalidate all pointers to 
the structure.
Exceptions to this should be clearly stated, for example appending to 
an array can be made threadsafe without a large cost, whereas insertion 
(if the update of an element is not atomic) cannot.

If D really wants to go after multithreading it makes sense to offer as 
many safe structures as possible, and in any case clearly document the 
non threadsafeness of the others.

it might make sense to give "safe" interfaces (I would say basically 
just reading without updates), but one cannot expect to really express 
all threadsafeness of a container though interfaces, because that would 
be too complicated and then one would end up with one container per 
interface without much gain.

Fawzi

Mar 07 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-03-06 07:49:18 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Methods such as insert, remove, pushFront, pushBack, removeFront, 
 removeBack, are assumed to affect the container's topology and must be 
 handled in user code as such.

I think it'd be better to say "they are assumed to invalidate linked 
ranges". I know it's the same thing, but I think it's preferable to 
state the effect on the container's interface than the effects on the 
container's internals.

Side note: me thinks it'd be better to rename pushFront and pushBack to 
insertFront and insertBack. One reason for this is that "push" is often 
seen as the opposite of "pop" in container parlance while in our case 
it's not. Also, "insert" is a more natural opposite for "remove". And 
you're already using the "insert" term standalone anyway when it's not 
accompanied by "Front" or "Back".


 In addition to those, a container may also define functions named after 
 the above by adding a "soft" prefix (e.g. softInsert, softRemove...) 
 that are guaranteed to not affect the ranges currently iterating the 
 container.

I think that's a good idea. The problem is that you don't necessarily 
know when writing an algorithm whether you are authorized or not to 
invalidate iterators. Wouldn't it be more useful if it worked with a 
wrapper instead?

	auto range = container[];
	playWithContainer(container.softwrap); // playWithContainer can't 
invalidate range
	playWithRange(range); // range is still valid here

The soft wrapper could be build automatically from compile time 
reflection by looking at which soft methods are available.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Mar 06 2010

D Programming

C/C++ Programming

Other

digitalmars.D - Container insertion and removal