www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - BIg semantic issues with SList & DList

reply "monarch_dodra" <monarchdodra gmail.com> writes:
Basically, the problem is that their copy semantics adhere to 
neither value nor reference semantics. The embody a "reference to 
a third party chain system" semantic.

Erm... What does this mean?

Basically, a "chain of nodes is created", and the lists reference 
points in that chain, but not necessarily the beginnings nor 
ends. How is this a problem? I'll start with SList:

SList
--------
import std.stdio, std.container, std.range, std.iostream, 
std.algorithm;
void main()
{
     auto a = SList!int([3, 4]);
     auto b = a;
     // ab -> 3 -> 4 -> null
     assert(a[].equal([3, 4]));
     assert(b[].equal([3, 4]));
     a.stableInsertFront(1);
     b.stableInsertFront(2);
     // a -> 1
     //       \
     //        3 -> 4 -> null
     //       /
     // b -> 2
     assert(a[].equal([1, 3, 4]));
     assert(b[].equal([2, 3, 4]));

     //a: change the 2nd element (4) into 5;
     a[].drop(2).front = 5;
     // a -> 1
     //       \
     //        3 -> 5 -> null
     //       /
     // b -> 2
     assert(a[].equal([1, 3, 5]));
     assert(b[].equal([2, 3, 5]));

     a.clear();
     // a -> null
     //
     //        3 -> 4 -> null
     //       /
     // b -> 2
     assert(a[].empty);
     assert(b[].equal([2, 3, 5]));

     a.insert(1);
     // -> 1 -> null
     //
     //      3 -> 5-> null
     //     /
     // -> 2
     assert(a[].equal([1]));
     assert(b[].equal([2, 3, 5]));
}
--------
Is this a clear example of what I mean?
NOW: The big question:

IS THIS A LEGITIMATE BEHAVIOR?

IMO, for SList, yes, yes it is. _HOWEVER_, it must be much 
clearly documented in the docs, as well as examples provided.

I have done some relatively heavy testing, and I have found no 
bugs in SList.

However, there are a couple *Gotcha's*, in particular, regarding 
the effects of "remove": Are we merely advancing the SList's 
"root pointer", or actually excising elements from the "chain" 
itself? Depends on context :D but nothing that can't be 
documented.

DLists has the same semantic:
--------
import std.stdio, std.container, std.range, std.algorithm;
void main()
{
     auto a = DList!int([3, 4]);
     auto b = a;
     // (3 - 4)
     assert(a[].equal([3, 4]));
     assert(b[].equal([3, 4]));

     b.stableInsertFront(1);
     b.stableInsertBack(5);
     // (2 - (3 - 4) - 5)
     assert(a[].equal([3, 4]));
     assert(b[].equal([1, 3, 4, 5]));

     a.stableInsertFront(2);
     // (1 - (2 - 3 - 4) - 5)
     assert(a[].equal([2, 3, 4]));
     writeln(b[]);
     assert(b[].equal([1, 2, 3, 4, 5]));

     //Does not work as expected!
     a.remove(a[]);
     writeln(a[]); //[2, 3, 4]
     writeln(b[]); //[1, 5]
}
--------
The example stops here, because DList was obviously written 
without this in mind. I don't have enough fingers on my hands to 
list the use-cases that break it.

Again, same question:

IS THIS A LEGITIMATE BEHAVIOR?

The gotchas are even more prevalent here. Not only gotchas, but 
just plain problems regarding theoretical behavior: What does it 
mean to append two DLists, if those DLists are views into two 
different, but, bigger, chains?

IMO: For DList, it is just too damn complicated to keep this 
semantic. I'd migrate to "true reference".


--------
The good news is that having "true" reference semantics with 
{SD}List should be relatively easy. Depending on the output of 
this discussion, it is something <s>I wouldn't mind</s> I'd enjoy 
doing.

TY for reading.
Sep 12 2012
next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 12 September 2012 at 17:24:57 UTC, monarch_dodra
wrote:
 [SNIP]

I did some tests and implementations, and I came with 3 options: 1) Embrace the "view of a chain" implementation and document it. 2) Use classic reference semantics. However, ranges can still have an "out of bonds" view of the chain. This means the ranges are virtually never invalidated. 3) Use classic reference semantics. ENFORCE range bounds be inside the underlying DList. This forces us to systematically "detach" removed nodes (as opposed to just advancing the _first pointer) ... 4) There was a 4th solution I did not like: use classic reference semantics. Do not provide support for "out of bounds" view of the Range, but do not enforce it either. This is the "Bites you in the ass" solution, and kind of what we currently had :( All 3 solutions have been implemented. Common to all 3 implementations: *Added a LOT of unit tests. *Added a description in the beggining *Fixed the opOpAssign functions. I know they weren't working because their names were typoed (!). *MUCH more robust assertions of internals (and minor bugfixes). *Ranges are now assertion based validated. (as opposed to enforced). Implementation 1: This embraces the "chain" view. It works as advertised. The downside though is in the concept, in that it _IS_ actually possible to invalidate a DList itself (not just the range). Everything is Asserted, and I do not think it is possible to have some bad usage which isn't asserted. I had to change a few implementations, but everything works now. Here: https://github.com/monarchdodra/phobos/commit/054fe9bea5652aa0e3089d279d4df2ae069d9f30#std/container.d Implementation 2: Reference semantics. However, Ranges can still have an out of bounds view of the chain. This is because functions like "removeFront" don't actually detach nodes, merelly advance the get pointer. The *real* advantage of this approach is that Range are virtually never invalidated. Also had to change a few implementations. This is kind of a hybrid soltuion, as the DLists themselves keep the traditional semantics, but the ranges themselves are more of a "can see anywhere in the chain" object. https://github.com/monarchdodra/phobos/commit/2b08b46bf552f208deb897ec0f8c5b3327225c8c#std/container.d Implementation 3: Classic implementation. No surprises. Robust, safe, expected. Boring. Boring is good. IMO, this is what people expect. Not a lot of changes here actually. https://github.com/monarchdodra/phobos/commit/8ce21e1a9a0107e74ff86a9acc736595ef73362c#std/container.d Could I get any kind of feedback?
Sep 13 2012
prev sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Thursday, 13 September 2012 at 19:37:41 UTC, monarch_dodra 
wrote:
 On Wednesday, 12 September 2012 at 17:24:57 UTC, monarch_dodra
 wrote:
 [SNIP]


I'll just copy paste here the 3 headers which contain examples, to give a quicker idea of what I went for: 111 CHAIN IMPLEMENTATION 111 /** Implements a doubly-linked list. $(D DList) uses neither reference nor value semantics. They can be seen as several different handles into an external chain of nodes. Several different $(D DList)s can all reference different points in a same chain. $(D DList.Range) is, for all intents and purposes, a DList with range semantics. The $(D DList.Range) has a view directly into the chain itself. It is not tied to its parent $(D DList), and may be used to operate on other lists (that point to the same chain). The ONLY operation that can invalidate a $(D DList) or $(D DList.Range), but which will invalidate BOTH, is the $(D remove) operation, if the cut Range overlaps with the boundaries of another DList or DList.Range. Example: ---- auto a = DList!int([3, 4]); //Create a new chain auto b = a; //Point to the same chain // (3 - 4) assert(a[].equal([3, 4])); assert(b[].equal([3, 4])); b.stableInsertFront(1); //insert before of b b.stableInsertBack(5); //insert after of b // (2 - (3 - 4) - 5) assert(a[].equal([3, 4])); //a is not changed assert(b[].equal([1, 3, 4, 5])); // but a is a.stableInsertFront(2); //insert in front of a, this will insert "inside" the chain // (1 - (2 - 3 - 4) - 5) assert(a[].equal([2, 3, 4])); //a is modified assert(b[].equal([1, 2, 3, 4, 5])); //and so is b; a.remove(a[]); //remove all the elements of a; // (1 - 5) assert(a[].empty); //a is empty assert(b[].equal([1, 5])); //b has lost some of its elements; a.insert(2); //insert in a. This will create a new chain // (2) // (1 - 5) assert(a[].equal([2])); //a is empty assert(b[].equal([1, 5])); //b has lost some of its elements; ---- */ 222 HYBRID REFERENCE IMPLEMENATION 222 /** Implements a doubly-linked list. $(D DList) use reference semantics. They give a view into a bidirectional chain of nodes. $(D DList.Range) can be used to have $(D BidirectionalRange) view of the chain of nodes. $(D DList.Ranges) are almost never invalidated, but they may have access to nodes that are past the end of original $(D DList)'s, if that $(D DList) was shortened. Example: ---- auto a = DList!int([1, 2, 3]); //Create a new DList auto b = a; //A reference to that DList auto r = a[]; //A range view of the list a.removeFront(); a.removeBack(); assert(a[].equal([2])); // both a and b have been affected assert(b[].equal([2])); // both a and b have been affected assert(r.equal([1, 2, 3])); //But the range remains unchanged a.insertFront(1); assert(a[].equal([1, 2])); // both a and b have been affected assert(b[].equal([1, 2])); // both a and b have been affected assert(r.equal([1, 1, 2, 3])); //But so has the range auto c = a; //Create a third reference a.clear(); //Detaches from the chain assert(a[].empty); //a is now empty assert(b[].equal([1, 2])); //But a and b are unchanged assert(c[].equal([1, 2])); //But a and b are unchanged assert(r.equal([1, 1, 2, 3])); b.remove(b[]); //Do a hard removal of the actual elements assert(b[].empty); //b is empty of course assert(c[].empty); //but so is c assert(r.equal([1, 3])); //and the range has lost an element: It was removed from the chain b.insert(1); //Insert an element into b assert(a[].empty); //a remains unchanged assert(b[].equal([1])); //b has the new element assert(c[].equal([1])); //and so does c assert(r.equal([1, 3])); //r points to the old chain, so does not see this new element ---- */ 333 CLASSIC IMPLEMENTATION 333 /** Implements a doubly-linked list. $(D DList) use reference semantics. $(D DList.Range) may be used to have a $(D BidirectionalRange) view into the DList. Example: ---- auto a = DList!int([1, 2, 3]); //Create a new DList auto b = a; //A reference to that DList a.removeFront(); a.removeBack(); assert(a[].equal([2])); // both a and b have been affected assert(b[].equal([2])); // both a and b have been affected a.insertFront(1); assert(a[].equal([1, 2])); // both a and b have been affected assert(b[].equal([1, 2])); // both a and b have been affected auto c = a; //Create a third reference a.clear(); //Detaches from the chain assert(a[].empty); //a is now empty assert(b[].equal([1, 2])); //But a and b are unchanged assert(c[].equal([1, 2])); //But a and b are unchanged b.remove(b[]); //Do a hard removal of the actual elements assert(b[].empty); //b is empty of course assert(c[].empty); //but so is c b.insert(1); //Insert an element into b assert(a[].empty); //a remains unchanged assert(b[].equal([1])); //b has the new element assert(c[].equal([1])); //and so does c ---- */ /* There is not much to show here actually: Ranges are *immediatly* invalidated, and attempt to use an invalid range will assert pretty quickly. */
Sep 13 2012