www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - associative arrays ate my lunch (and my support)

reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
So, I've been happily thinking this whole time that associative arrays (AA)
in D are rather cool; being built into the language and so on ... then today
I spent several hours tracking down what seemed to be a truly bizarre bug,
which turned out to be caused by the way D associative arrays operate.
Here's what happened:

I want to check if there's an entry in the AA; use it if so, otherwise
continue. The documentation say to use this construct for checking:

class X
{
     MyClass [char[]] aa;

    void test ()
    {
         if ("hello" in aa)
             ....
     }
}

and then (presumably) you have to subsequently retrieve the entry as
follows:

    void test ()
    {
         if ("hello" in aa)
            {
            MyClass x = aa ["hello"];
            // do something with x ...
            }
     }

Of course, you've just performed two full lookups instead of one: first you
look to see if the entry exists, then you lookup again to retrieve it. Being
the freak that I am, I don't want that overhead. So ... I proceeded with
this instead:

    void test ()
    {
         MyClass x = aa ["hello"];

         if (x)
            {
            // do something with x ...
            }
     }

Much better! Right?

Well, apparently not. Now clearly I did not assign anything to the AA in the
above code, but an entry is created regardless. That is, AA entries are
created as a /side-effect/ of doing that lookup. Now, I've used many
hash-table implementations before, but this behavior is truly a first.
Hands-up please, all those who would have expected this ...

This sneaky behavior really wasted a whole lot of effort (I can almost hear
Walter laughing in the background :-)

That aside, one is forced to do a double lookup in such cases. This is not
good at all, and I simply cannot fathom what benefit there is to such an
approach. In my particular application, the double-lookup (and, of course,
double hash-computation) is entirely unacceptable. Lookup is the most common
operation on a hash-table, by far. It has to be efficient, and this
"side-effect" totally flies in the face of reason. If it were a Phobos
module it would have been replaced long ago.

For the moment I have to assume the behavior is "by design", so fair-warning
to everyone.

(Ben? Do you have a proper hash-table I can use please? If not, I'll do an
FNV port; AA's be damned. Out! Damn spot! Out, I say!)
Jul 26 2004
next sibling parent Derek Parnell <derek psych.ward> writes:
On Mon, 26 Jul 2004 14:48:05 -0700, Kris wrote:

 class X
 {
      MyClass [char[]] aa;

It does seem a tad inefficient. Here is some simple code to demonstrate the effect... <code> import std.stdio; void main() { long[char[]] aa; long a = 9; long b = 9; a = aa["hello"]; aa["hello"] = 2; b = aa["hello"]; writef("%d %d\n", a,b); } </code> We sort of need a conditional fetch mechanism that maybe throws an exception if the key is not in the AA. try { a = aa[?"hello"]; } catch (AA_KeyMissing) { . . .}; -- Derek Melbourne, Australia 27/Jul/04 10:57:11 AM
Jul 26 2004
prev sibling next sibling parent reply "Michael Jørgensen" <ingen ukendt.dk> writes:
"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
news:ce3u1t$1h7i$1 digitaldaemon.com...

     void test ()
     {
          MyClass x = aa ["hello"];

          if (x)
             {
             // do something with x ...
             }
      }

 Much better! Right?

 Well, apparently not. Now clearly I did not assign anything to the AA in

 above code, but an entry is created regardless. That is, AA entries are
 created as a /side-effect/ of doing that lookup. Now, I've used many
 hash-table implementations before, but this behavior is truly a first.
 Hands-up please, all those who would have expected this ...

Well, I don't know about D [Just arrived here], but the same behaviour is present in C++: The STL map<> construct provides the same functionality (side-effect). However, in the STL there is also a find() method that returns an interator (a pointer) to the element - provided it exists - or to a fictive end-of-array: MyClass *xx = aa.find("hello"); if (xx != aa.end()) { // do something with *xx. } Presumably, something similar is possible in D. -Michael.
Jul 26 2004
parent Florian <Florian_member pathlink.com> writes:
     void test ()
     {
          MyClass x = aa ["hello"];

          if (x)
             {
             // do something with x ...
             }
      }

 Much better! Right?

 Well, apparently not. Now clearly I did not assign anything to the AA in

 above code, but an entry is created regardless. That is, AA entries are
 created as a /side-effect/ of doing that lookup. Now, I've used many
 hash-table implementations before, but this behavior is truly a first.
 Hands-up please, all those who would have expected this ...

Well, I don't know about D [Just arrived here], but the same behaviour is present in C++: The STL map<> construct provides the same functionality (side-effect). However, in the STL there is also a find() method that returns an interator (a pointer) to the element - provided it exists - or to a fictive end-of-array:

You may also note that in c++ stl (at least microsoft's implementation of it), the [] opperator on the hash_map does not only have this side effect, but also is not doing the single look-up it seems to do, but actualy a double lookup in some cases : There is always one lookup to find the element, but if it is not there, a second lookup starts to find where it should be inserted. I don't know if D works the same way, but it wouldn't be surprising to me.
Jul 27 2004
prev sibling next sibling parent Arcane Jill <Arcane_member pathlink.com> writes:
In article <ce3u1t$1h7i$1 digitaldaemon.com>, Kris says...

Now, I've used many
hash-table implementations before, but this behavior is truly a first.

No, C++'s std::map got there first.
Hands-up please, all those who would have expected this ...

Me. Because I'm used to std::map. Jill
Jul 26 2004
prev sibling next sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Kris wrote:

 So, I've been happily thinking this whole time that associative arrays (AA)
 in D are rather cool; being built into the language and so on ... then today
 I spent several hours tracking down what seemed to be a truly bizarre bug,

Which bug? The one that looking up a value adds it, or that your program is slow? <snip>
 and then (presumably) you have to subsequently retrieve the entry as
 follows:
 
     void test ()
     {
          if ("hello" in aa)
             {
             MyClass x = aa ["hello"];
             // do something with x ...
             }
      }

 That aside, one is forced to do a double lookup in such cases. This is not
 good at all, and I simply cannot fathom what benefit there is to such an
 approach.

The AA implementation could be improved to cache lookups. A simple idea would be to keep a note of the last key to be looked up, its hash and the memory address (if there) of the value. The next lookup (or even assignment) would check the key against the one in the cache (could get away with === rather than == for efficiency) and use the cached information to repeat the lookup. Even better, the double lookup itself could be optimised to a single lookup. <snip>
 For the moment I have to assume the behavior is "by design", so fair-warning
 to everyone.

I think the idea is that you can nest AAs to make a sparse matrix or something.... int[int][int] qwert; qwert[13][105] = 42;
 (Ben? Do you have a proper hash-table I can use please? If not, I'll do an
 FNV port; AA's be damned. Out! Damn spot! Out, I say!)

Depends on what you mean by 'proper'. But mine happens to implement lookup without adding.... http://smjg.port5.com/pr/d/ Stewart. -- My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Jul 27 2004
prev sibling parent James McComb <alan jamesmccomb.id.au> writes:
Kris wrote:

 Well, apparently not. Now clearly I did not assign anything to the AA in the
 above code, but an entry is created regardless. That is, AA entries are
 created as a /side-effect/ of doing that lookup.

Read operations with side-effects raise issues of exception safety. For example, x = aa["hello"] may throw an exception if creates the "hello" entry. Then the value of aa.length would not be predictable: It could equal the old .length or .length + 1 depending upon whether the exception was thrown before of after the .length was incremented. I know that std::map also behaves this way. Can anyone confirm whether std::map's [] operator suffers from the same issue? James McComb
Jul 27 2004