digitalmars.D.bugs - [Issue 10821] New: .byKey erroneously returns a null key
- d-bugmail puremagic.com (73/81) Aug 14 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (7/7) Aug 16 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (10/10) Aug 16 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (9/15) Aug 17 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (9/9) Aug 17 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (47/47) Aug 17 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (78/83) Aug 17 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (13/19) Aug 17 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (7/8) Aug 17 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (12/12) Aug 17 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (7/7) Aug 17 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (9/11) Aug 17 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (40/40) Aug 17 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (22/22) Sep 02 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (85/85) Sep 05 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
- d-bugmail puremagic.com (11/11) Sep 05 2013 http://d.puremagic.com/issues/show_bug.cgi?id=10821
http://d.puremagic.com/issues/show_bug.cgi?id=10821 Summary: .byKey erroneously returns a null key Product: D Version: D2 Platform: All OS/Version: All Status: NEW Severity: critical Priority: P2 Component: DMD AssignedTo: nobody puremagic.com ReportedBy: andrej.mitrovich gmail.com 15:14:26 PDT --- ----- import std.stdio; struct Signal { void connect(void delegate(int) func) { stderr.writefln("Func connect: %s", *cast(void**)&func); funcs[func] = 1; } void disconnect(void delegate(int) target) { stderr.writefln("Func disconnect: %s", *cast(void**)&target); funcs.remove(target); } void emit(int i) { version(VERSION_OK) { foreach (func; funcs.keys) { stderr.writefln("Calling: %s", *cast(void**)&func); func(i); } } else version(VERSION_CRASH) { foreach (func; funcs.byKey) { stderr.writefln("Calling: %s", *cast(void**)&func); func(i); } } } int[void delegate(int)] funcs; } void main() { Signal signal; void delegate(int) handler; handler = (int i) { signal.disconnect(handler); }; signal.connect(handler); signal.emit(1); signal.emit(2); } ----- Ok version: $ dmd -version=VERSION_OK -run test.dFunc connect: 1BF2FF0 Calling: 1BF2FF0 Func disconnect: 1BF2FF0Buggy version: $ dmd -version=VERSION_CRASH -run test.dFunc connect: 482FF0 Calling: 482FF0 Func disconnect: 482FF0 Calling: null object.Error: Access ViolationIt should have never called 'null', I don't know why byKey seems to return it after the delegate was removed from the associative array. This does not seem to be a regression (tested up to 2.050). -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 14 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 Even more strange: comment out the line that calls func(i), and suddenly the writefln displays a non-null address! A codegen bug perhaps? -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 16 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 Oh actually, nevermind, the delegate removed itself from the hash. Ahhh, I see what's going on. You're modifying the AA while iterating over it. That's very bad, because byKey is implemented using a pointer to the current Slot. But once the delegate removes itself from the table, this pointer is invalidated. That's the bug in your code. :) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 16 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 01:59:09 PDT ---Oh actually, nevermind, the delegate removed itself from the hash. Ahhh, I see what's going on. You're modifying the AA while iterating over it. That's very bad, because byKey is implemented using a pointer to the current Slot. But once the delegate removes itself from the table, this pointer is invalidated. That's the bug in your code. :)Oh damn, I can't believe I've run into this hash issue again. And it's very hard to spot too. I hope we can make new hashes which are not unstable like this. Hows your new hashes branch going? -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 17 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 Andrej Mitrovic <andrej.mitrovich gmail.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 17 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 I don't know if it's possible to make hash iteration stable across deletes. This isn't specific to hashes either; any data structure involving pointers will have this problem. If you have an iterator pointing to bucket B1, and then you delete B1 from inside the delegate and then try to advance the iterator, you will run into this problem. Even if you use an array to store your func ptrs, say, and keep an array index to track where you are. If you then delete the current element from inside the delegate and move the array elements up to fill in the gap, the loop wouldn't know about it, and would increment the index, skipping over what was supposed to be the next element. So your iteration breaks (even though in this case you won't get invalid pointers). It may be possible to hack around this particular instance of the problem, say by checking if the entry at the current index has changed, but it doesn't help in the general case, say if your delegate removes multiple entries from the array. Either way, your iteration will be screwed up. Basically, it's impossible to have straightforward iteration when you're modifying the container in the middle of things. Even your approach of using .key to get an array of keys is not reliable -- imagine if the delegate deletes some of the subsequent keys that you're about to visit. Then you'll get RangeErrors because the keys in the array no longer exist in the AA. I've had to deal with code that has to iterate over an array (or hash) of callbacks before (this was in C++ not D, but the same issues apply). I found that the only way to keep things consistent without strange side-effects was to keep a list of to-be-deleted entries separately, and the delegate would never modify the array of callbacks directly, but would append itself to the to-be-deleted list if it wants to remove itself. Then during the iteration, anything that is found in the to-be-deleted list is skipped, and after the iteration is finished, then loop through the to-be-deleted list and actually delete the entries. This kind of issue gets worse if the delegates have the possibility of triggering the destruction of the entire container. This is not applicable to your code, but I had to deal with this when writing a protocol stack with callback handlers -- the stack itself doesn't handle things like logout, it's all handled by callbacks, which makes it very tricky when one of the callbacks may be the logout handler, which will cleanup by destructing the entire protocol stack. Upon returning to the protocol stack code that called the callback in the first place, all references to the stack instance are now invalid, so basically callbacks must be called after everything else has been processed, otherwise it will segfault if the callback happens to be the logout handler. This code is so tricky that even years after I wrote it I still have to go back and fix subtle bugs in it every so often. tl;dr: modifying a container while iterating over it is tricky business, and is best avoided if you can. :) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 17 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 11:00:42 PDT ---I found that the only way to keep things consistent without strange side-effects was to keep a list of to-be-deleted entries separately, and the delegate would never modify the array of callbacks directly, but would append itself to the to-be-deleted list if it wants to remove itself.Hmm yeah, it gets complicated real fast. Thanks for the insight. So here's a quick safer workaround implementation: ----- struct Signal { void connect(void delegate(int) func) { if (emitting) { funcsToAdd[func] = 1; } else { stderr.writefln("Func connect: %s", *cast(void**)&func); funcs[func] = 1; } } void disconnect(void delegate(int) func) { if (emitting) { funcsToDelete[func] = 1; } else { stderr.writefln("Func disconnect: %s", *cast(void**)&func); funcs.remove(func); } } void emit(int i) { emitting = true; scope(exit) emitting = false; foreach (func; funcs.byKey) { stderr.writefln("Calling: %s", *cast(void**)&func); func(i); } foreach (func; funcsToDelete.byKey()) funcs.remove(func); funcsToDelete = null; foreach (func; funcsToAdd.byKey()) funcs[func] = 1; funcsToAdd = null; } int[void delegate(int)] funcs; int[void delegate(int)] funcsToAdd; int[void delegate(int)] funcsToDelete; bool emitting; } void main() { Signal signal; void delegate(int) handler; handler = (int i) { signal.disconnect(handler); }; signal.connect(handler); signal.emit(1); signal.emit(2); signal.emit(3); signal.emit(4); } ----- Of course this still isn't perfect, there's no telling what a signal handler really wants to do, whether it actually wants to add/remove some other handler immediately or only schedule add/removal for later. I guess the bottom line is this stuff is more complicated than I thought. I wonder if that new signals implementation recently announced handles this sort of stuff. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 17 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 [...]Of course this still isn't perfect, there's no telling what a signal handler really wants to do, whether it actually wants to add/remove some other handler immediately or only schedule add/removal for later.I think the only safe way is to always process adds/deletes immediately after iterating over the current handlers. Adds/removes for later can use a timer mechanism. Modifying a container while iterating over it is never a good idea. :)I guess the bottom line is this stuff is more complicated than I thought. I wonder if that new signals implementation recently announced handles this sort of stuff.Well, it should be audited for this, then. :) It should be fixed if it doesn't already handle this. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 17 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 12:30:29 PDT ---Modifying a container while iterating over it is never a good idea.However some containers explicitly support this, like dcollections. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 17 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 It does not eliminate the bad behaviour though. Suppose you're iterating over a RedBlackTree, and then add and delete some keys. Depending on whether those keys are greater or less than the current key, they will be included / excluded from the current iteration. So whether the key will be found in the current iteration depends on the relative values of the current key and the key being added/deleted, and there's no way to change this apart from scheduling adds/deletes after the iteration is over. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 17 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 Besides, if you're iterating a RedBlackTree and then in the middle you delete all keys, I bet that your iteration will break in a rather ugly way. :) -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 17 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 14:36:30 PDT ---It does not eliminate the bad behaviour though. Suppose you're iterating over a RedBlackTree, and then add and delete some keys.I was thinking of the purge method in various dcollection containers (purge doesn't seem to be in Phobos), it's explicitly used for removal during traversal. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 17 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 Sorry, I mistakenly thought you were talking about std.container. :-/ Anyway, looking over the dcollections code, I'm pretty impressed to realize that foreach can take function pointers?! That's something I've never knew before! In any case, purge is basically a restricted form of container modification while iterating -- it iterates over the elements and lets you decide whether or not to remove each one. This is a more controlled form of container modification during iteration, and is made safe by (1) (implicitly) caching the reference to the next element and not advancing the iterator if the element *was* elected to be removed, and (2) only allowing the current element to be removed, not any arbitrary element. If you relax (2), say your foreach body directly references the container and deletes random elements, then all hell breaks loose. If you enforce (2), then you can use (1) as implementation strategy to allow deletion during iteration. So one possibility in the case of your signal/slot code, is to have the delegate return a bool/flag to indicate whether to remove itself from the table. That way, you can avoid keeping lists of what to remove after the iteration. It won't be as nice as a foreach loop, though, since you have to manually control where your iterator is: auto r = aa.byKey; while (!r.empty) { auto key = r.front; auto dg = aa[key]; bool deleteDg = dg(args) == Flag.deleteMe; r.popFront(); // important: this must come before aa.remove(key) if (deleteDg == Flag.deleteMe) { aa.remove(key); } } By advancing the range before the aa.remove(key) call and only allowing the current element to be deleted, you avoid running into invalid pointers. This doesn't address the problem with dg inserting new things into aa, though, since the newly inserted entry may or may not get iterated over by the loop depending on where its hash value falls relative to the current position of the byKey range. So there's still some unpredictability there. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Aug 17 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 13:53:50 PDT --- Btw, Slist has a stableLinearRemove, which Phobos documents as: stableLinearRemove - same as linearRemove, but guarantees iterators are not invalidated. That sounds like this pseudocode is safe: ----- auto range = slist[]; foreach (item; range) { // do something slist.stableLinearRemove(item); // do something } ----- Slist are used in a signals implementation by Johannes Pfau. He uses this stableLinearRemove in his disconnect method. If this is not actually safe, then the docs should be more clear by what they mean. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 02 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 Well, linked lists are designed to not invalidate iterators when things are removed. But still, it has the same issue with removing stuff while traversing over the list. For example, say you have a list: A -> B -> C -> D -> E and you have a foreach loop over the list, say foreach(e; myList) { ... }. Now suppose you have iterated to C. So e is C. Now you remove C from the list. Following the SList code in Phobos, it will result in this list: A -> C -> D -> E which is fine. But since e is C, e.next is now null, so your foreach loop will terminate prematurely. No iterators are invalidated, that's true; any existing references to A, B, C, D, or E will remain valid. However, their connections with each other may change when you modify the container. So still, changing stuff in a container while iterating over it has quirky side-effects. You cannot assume the iteration will go exactly from A to E unless you iterate over a copy of references to the elements. Let's take this further. Let's say you're in the middle of your foreach loop, and e is C. Let's say for whatever reason you decide to remove the range B..D. The resulting list would now be: A -> E But since e is C, it is now pointing to a removed section of the list: B -> C -> D ^ e So e.next is D, and the next iteration of the loop will be on D (which no longer exists in the container!). And after that, it stops, because D is no longer connected to E. So again, your iteration has "non-intuitive" behaviour. All hope is not lost, though. There *is* a way to guarantee "intuitive" iteration over the container while still being able to delete elements during the iteration. This is possible if you impose the restriction that you can only remove the *current* element being iterated over. That is, you're not allowed to remove arbitrary elements from the container (or add new elements while you're at it, or modify the container in any other way). Here's how: Element e = list.front; while (e) { if (wantToRemove(e)) { auto tmp = e.next; list.remove(e); e = tmp; } else e = e.next; } Unfortunately, there is no nice syntax for this, and you have to depend on the implementation details of the container to get it right. One way to make it nicer is for the container to provide opApply, so that the dirty details of what to do while removing the current element is abstracted away: struct MyList { ... int opApply(scope int delegate(ref Element, out bool removeMe) dg) { Element e = list.front; while (e) { bool removeMe = false; int ret; if ((ret = dg(e, removeMe)) != 0) return ret; if (removeMe) { auto tmp = e.next; this.remove(e); e = tmp; } else e = e.next; } } } Then user code won't have to worry about the implementation details: void main() { MyList l = ...; foreach (ref e, ref removeMe; l) { removeMe = wantToRemove(e); } } This is, in effect, what DCollections does. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 05 2013
http://d.puremagic.com/issues/show_bug.cgi?id=10821 Argh, I hate bugzilla not having an edit function... the first example in my previous comment, after removing C, should have the list: A -> B -> D -> E not A -> C -> D -> E Sorry for the confusion. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Sep 05 2013