www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 4179] New: Deleting items from an associative array iterated over

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4179

           Summary: Deleting items from an associative array iterated over
           Product: D
           Version: future
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: druntime
        AssignedTo: sean invisibleduck.org
        ReportedBy: bearophile_hugs eml.cc


--- Comment #0 from bearophile_hugs eml.cc 2010-05-12 13:00:32 PDT ---
This D2 program deletes items from an associative array while it is iterated
over with a foreach:


import std.stdio: writeln;
void main() {
    auto aa = [1: "one", 2: "two", 3: "three"];
    int count;
    foreach (k, v; aa) {
        count++;
        if (v == "two")
            aa.remove(k);
    }

    writeln(aa);
    writeln("count:", count);
}


The output, with DMD v.2.045:

[1:one,3:three]
count:104


The output shows this code deletes the item correctly from the associative
array, but I am not so sure it can work in general. In general this can be
unsafe (buggy, bad, wrong) code.

And the value of 'count' at the end is too much large, it can be a bug in the
associative array iteration.

----------------------

This is an alternative version of that program:


import std.stdio: writeln;
void main() {
    auto aa = [1: "one", 2: "two", 3: "three"];
    foreach (k; aa.byKey()) {
        if (aa[k] == "two")
            aa.remove(k);
    }

    writeln(aa);
}


When compiled with DMD v.2.045 it prints at run-time:
core.exception.RangeError test(5): Range violation

----------------------

Python avoids that probable programmer error (iterating over an associative
array being modified) and troubles with an esplicit safety and error message:


aa = {1: "one", 2: "two", 3: "three"}
for k in aa:
    if aa[k] == "two":
        del aa[k]


Python 2.6.5 prints:

Traceback (most recent call last):
  File "...\test.py", line 2, in <module>
    for k in aa:
RuntimeError: dictionary changed size during iteration


To do this they set a flag when the Python dictionary is iterated over, and
reset it later. Probably there is a way to circumvent this safety protection,
but in most cases it is enough to avoid bugs, especially in simple code written
by novices (that is the code that more often presents this error).

Introducing the same safety measure in D can be useful, especially for code
written by novices, safety belts are good, but it can slow down a little the
deletion of items from the associative array. To totally avoid this performance
problem this safety can be disabled when the code is compiled in release mode.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 12 2010
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4179


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


--- Comment #1 from bearophile_hugs eml.cc 2010-05-12 13:01:28 PDT ---
It's an enhancement, mostly.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 12 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4179



--- Comment #2 from bearophile_hugs eml.cc 2010-05-26 09:49:51 PDT ---
To implement this idea the foreach() has to change a little, in nonrelease mode
it has to set and later reset a boolean flag inside the container being
iterated. So this boolean value if present must have a standard name, that has
to be shown in the std.container page about the standard API of the container.
If implemented this idea lessens a little the need for the stable ("soft")
methods.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
May 26 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4179


Vladimir <thecybershadow gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thecybershadow gmail.com
           Severity|enhancement                 |critical


--- Comment #3 from Vladimir <thecybershadow gmail.com> 2011-03-29 04:08:16 PDT
---
Considering that this bug breaks safety of SafeD programs, it is definitely
much more severe than an "enhancement".

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 29 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4179


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg gmx.com


--- Comment #4 from Jonathan M Davis <jmdavisProg gmx.com> 2011-03-29 08:26:59
PDT ---
SafeD refers to memory safety. The current behavior is completely memory safe.
The same could happen with any container whose iterators/ranges are invalidated
by the removal of an element from the container. It would be nice if removing
elements from an AA while looping over it worked, but it's not at all
surprising that it doesn't, and it may not be reasonable to require that it
does. And it's easy enough to get around this. You just save a list of the
items that you want to remove, and then you remove them after you're done
iterating over the AA.

There's nothing unsafe about the current behavior. It may be a bit bug-prone
for the unwary, but this sort of things is normal when dealing with iterators
or ranges. If it were unsafe, it wouldn't be throwing a RangeViolation but
would keep trying to iterate using invalid data and do who-knows-what.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 29 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4179



--- Comment #5 from Vladimir <thecybershadow gmail.com> 2011-03-29 08:29:34 PDT
---
Eh? Is getting a segfault because the data structure has been modified
considered "safe"?

The workaround you suggested doesn't work when removing happens several calls
deep in the call hierarchy.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 29 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4179



--- Comment #6 from Jonathan M Davis <jmdavisProg gmx.com> 2011-03-29 08:39:23
PDT ---
segfault? I didn't think that anyone was getting a segfault from this. If
there's a segfault, then that's obviously bad. I'm not sure that it's
technically unsafe though. That would depend on whether it could possibly
corrupt memory. If it's guarantee to segfault and never corrupt memory, then I
don't believe that it's technically unsafe and therefore would not be a problem
for SafeD. Still, one would hope that it could do better than a segfault. I
thought that it was throwing an exception though.

If it's segfaulting, that would put increased pressure on either flagging it as
a warning or fixing it so that it's not necessary (but that may not be
reasonable if you want AA's to be reasonably efficient - I don't know how
iterating over them is implemented). However, if it's a segfault with no risk
of memory corruption, then it's still perfectly safe, just annoying.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 29 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4179


Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com


--- Comment #7 from Steven Schveighoffer <schveiguy yahoo.com> 2011-03-29
09:47:53 PDT ---
It's unsafe.  The issue is that if you remove the element being iterated, it
goes back to the memory pool.

Here is the code that removes an element:

https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L373

Note on line 402, the unused node is free'd using gc_free.

The only way I know to make it safe is letting the iteration routine remove the
current element at the right time .  Dcollections allows this.

The only way to solve this other than that is to keep a modification id, and
throw an error if the iteration discovers the array structure has been modified
during iteration.  A compiler error or warning will not help, because the
compiler cannot know whether a called function is going to modify the AA, or if
two AA references point to the same object.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 29 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4179



--- Comment #8 from Vladimir <thecybershadow gmail.com> 2011-03-29 09:58:07 PDT
---
There are more possible ways to make it safe than this.

One way is to have a list of current iterations associated with each AA. Any
time an item is removed, any current iterations are also updated. The nodes of
the list can be stored on the stack of aaApply or whatever function does the
"foreach" for AAs. It can be as simple as one pointer in the AA structure, and
a structure with two fields ("current iteration item" / "next list node") on
the stack - this should allow everything to "just work" and have a small impact
on performance.

Alternatively, when items are removed from an AA they can be left for the GC to
collect rather than deleted explicitly, and their "next" pointer can be set to
a magic value indicating that they were deleted (e.g. cast(void*)1).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 29 2011
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=4179



--- Comment #9 from bearophile_hugs eml.cc 2013-07-20 04:05:58 PDT ---
(In reply to comment #8)
 There are more possible ways to make it safe than this.
 
 One way is to have a list of current iterations associated with each AA. Any
 time an item is removed, any current iterations are also updated. The nodes of
 the list can be stored on the stack of aaApply or whatever function does the
 "foreach" for AAs. It can be as simple as one pointer in the AA structure, and
 a structure with two fields ("current iteration item" / "next list node") on
 the stack - this should allow everything to "just work" and have a small impact
 on performance.
 
 Alternatively, when items are removed from an AA they can be left for the GC to
 collect rather than deleted explicitly, and their "next" pointer can be set to
 a magic value indicating that they were deleted (e.g. cast(void*)1).
My suggestion was for the foreach to set a boolean flag inside the associative array, and reset it when the foreach ends. Then the AA.delete should look for this flag and throw an exception if it's set. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 20 2013