www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 12218] New: [AA] inserting into associative array invalidates foreach iteration

reply d-bugmail puremagic.com writes:
https://d.puremagic.com/issues/show_bug.cgi?id=12218

           Summary: [AA] inserting into associative array invalidates
                    foreach iteration
           Product: D
           Version: D2
          Platform: All
               URL: http://dlang.org/hash-map.html
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: gassa mail.ru



I have this piece of code.  It creates an associative array of integers from 1
to 124, inclusive, where a[i] = i.  It then proceeds by iterating over that
array and adding (k+1, v+1) pair to the same array for each (k, v) pair we see.
 This breaks in a reproducible manner.  I understand this was perhaps not
intended to work.  Even if so, there are some points of concern: memory safety,
lack of documentation and the possibility for improvement.

-----
import std.exception;
import std.stdio;

/*  safe */ void main ()
{
    int lo = 1;
    int hi = 125;
    writeln ("start ", lo, " ", hi);
    int [int] a;
    foreach (i; lo..hi)
    {
        a[i] = i;
    }
    foreach (k, v; a)
    {
        writeln ("element ", k, " ", v);
        enforce (k == v);
        a[k + 1] = v + 1;
    }
}
-----

The typical output on Win32 with DMD, GDC and LDC is:
-----
start 1 125
element 31 31
element 62 62
element 93 93
element 124 124
element 273870408 2048
-----
And then, the enforcement fails.

The first number 273870408 on the last line of output can be 1404296, 3616704
or some other number.  The second number seems to always be 2048.  In short, a
(key, value) pair of garbage values is added into the array.  What happens is
perhaps reallocation of the array when its size reaches around 127 or 128
elements, and the foreach loop does not get to learn the new location of the
data.

The points that concern me are memory safety, lack of documentation and the
possibility for improvement.

1.  safe?

This is not exactly memory safe: the program ends up using uninitialized
memory.  The junk number (like 273870408) appears in the array out of nowhere. 
Yet, the compiler permits writing " safe void main ()" once we comment out the
calls to writeln.  Perhaps this piece of uninitialized memory could well appear
under the "ptr" field of some dynamic array if the declaration was slightly
more complex than "int [int]", and that would lead to memory corruption.

2. Undocumented?

There is no warning on the associative arrays documentation page
(http://dlang.org/hash-map.html).  There are a few bug reports and discussions
stating that one can not safely *remove* elements while iterating on an
associative array.  Yet, I didn't find any report which stated the same
problems when *adding* items to an associative array.

Perhaps adding while iterating is not a much used feature: there is no
guarantee that the newly added (key, value) pairs will or will not be visited
in the foreach loop.  Still, it can be useful in some situations where the
insertion of new elements is somehow idempotent: for example, we add no more
than one single special key in the course of the foreach loop.

Even if such usage is unintended, a friendly way to handle it would be a
warning on the documentation page, and maybe a compile-time error in simply
visible cases, like the example above.

3. Possible to improve?

The C++ STL documentation for set and map explicitly states that (an example
quote from http://www.sgi.com/tech/stl/set.html):

-----
Set has the important property that inserting a new element into a set does not
invalidate iterators that point to existing elements. Erasing an element from a
set also does not invalidate any iterators, except, of course, for iterators
that actually point to the element that is being erased.
-----

The implementation of hash_set by Microsoft also boasts a similar feature
(http://msdn.microsoft.com/en-us/library/bb398039.aspx):

-----
Moreover, inserting an element invalidates no iterators, and removing an
element invalidates only those iterators which point at the removed element.
-----

These quotes suggest that it might be possible to maintain correctness of such
foreach loops in D, too, while preserving efficiency.

Ivan Kazmenko.

-- 
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 21 2014
next sibling parent d-bugmail puremagic.com writes:
https://d.puremagic.com/issues/show_bug.cgi?id=12218


Ivan Kazmenko <gassa mail.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://d.puremagic.com/iss
                   |                            |ues/show_bug.cgi?id=4179,
                   |                            |https://d.puremagic.com/iss
                   |                            |ues/show_bug.cgi?id=10876



This report is a reduction of the case that bit me during a programming
contest: Google Code Jam 2013, Round 1B, problem C.

The problem statement:
http://code.google.com/codejam/contest/2434486/dashboard#s=p2

My faulty solution:
code.google.com/codejam/contest/scoreboard/do?cmd=GetSourceCode&contest=2434486&problem=2705486&io_set_id=1&username=Gassa

-- 
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 21 2014
prev sibling next sibling parent d-bugmail puremagic.com writes:
https://d.puremagic.com/issues/show_bug.cgi?id=12218




Created an attachment (id=1333)
program to find a reduced test case

Well, 124 numbers sound like much, but the test case presented above is the
most reduced example I came up with.  I attached the program used to find such
a test case.  Basically, it just adds an interval [lo; hi] into an associative
array and then iterates on the array.  The second similar array is also created
to make the breakage more likely because of data relocation, but it turns out
to be unnecessary.

On the bright side, the original broken program processes several megabytes of
text, so some reduction did in fact take place.

-- 
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 28 2014
prev sibling next sibling parent d-bugmail puremagic.com writes:
https://d.puremagic.com/issues/show_bug.cgi?id=12218




Created an attachment (id=1334)
reduced test case printing diagnostic messages

-- 
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 28 2014
prev sibling next sibling parent d-bugmail puremagic.com writes:
https://d.puremagic.com/issues/show_bug.cgi?id=12218




Created an attachment (id=1335)
reduced test case as a safe function - which it is not!

-- 
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 28 2014
prev sibling next sibling parent d-bugmail puremagic.com writes:
https://d.puremagic.com/issues/show_bug.cgi?id=12218


Ivan Kazmenko <gassa mail.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

          mime type|                            |


-- 
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 28 2014
prev sibling next sibling parent d-bugmail puremagic.com writes:
https://d.puremagic.com/issues/show_bug.cgi?id=12218


Ivan Kazmenko <gassa mail.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

          mime type|                            |


-- 
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 28 2014
prev sibling next sibling parent d-bugmail puremagic.com writes:
https://d.puremagic.com/issues/show_bug.cgi?id=12218


Ivan Kazmenko <gassa mail.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

          mime type|                            |


-- 
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 28 2014
prev sibling next sibling parent d-bugmail puremagic.com writes:
https://d.puremagic.com/issues/show_bug.cgi?id=12218


yebblies <yebblies gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yebblies gmail.com



I can confirm this was not intended to work, and it most likely never will.  It
should be possible for the compiler to detect this and give you an error.

-- 
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 28 2014
prev sibling parent d-bugmail puremagic.com writes:
https://d.puremagic.com/issues/show_bug.cgi?id=12218


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc




 I can confirm this was not intended to work, and it most likely never will.  It
 should be possible for the compiler to detect this and give you an error.
Yes, while fixing it in general is probably not possible, giving a good compile-time error is a good thing. -- Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 28 2014