www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - associative array: unexpected results after static initialization

reply kdevel <kdevel vogtner.de> writes:
This program

```
void aa_stat(T, U) (T[U] aa)
{
    import std.stdio;
    writefln ("aa        : %s", aa);
    writefln ("aa.length : %d", aa.length);
    writefln ("aa.keys   : %s", aa.keys);
    writefln ("aa.values : %s", aa.values);
    foreach (k, v; aa)
       writeln (k, ": ", v);
}

void main ()
{
    string[int] aa = [
       0: "null",
       0: "eins"
    ];
    aa.aa_stat;
}
```

produces this output:

    aa        : [0:"eins", ]
    aa.length : 2
    aa.keys   : [0, 32767]
    aa.values : ["eins", ""]
    0: eins

I would have expected either a compile time or a runtime error or 
this result:

    aa        : [0:"eins"]
    aa.length : 1
    aa.keys   : [0]
    aa.values : ["eins"]
    0: eins

What's happening here?
Nov 30
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Nov 30, 2017 at 11:50:13PM +0000, kdevel via Digitalmars-d-learn
wrote:
 This program
 
 ``` void aa_stat(T, U) (T[U] aa) { import std.stdio; writefln ("aa
 : %s", aa); writefln ("aa.length : %d", aa.length); writefln ("aa.keys
 : %s", aa.keys); writefln ("aa.values : %s", aa.values); foreach (k,
 v; aa) writeln (k, ": ", v); }
 
 void main () { string[int] aa = [ 0: "null", 0: "eins" ]; aa.aa_stat;
 } ```
 
 produces this output:
 
    aa        : [0:"eins", ] aa.length : 2 aa.keys   : [0, 32767]
    aa.values : ["eins", ""] 0: eins
[...] This looks like a bug in the implementation of AA literals. Please file a bug at http://issues.dlang.org/. --T
Nov 30
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
Hmm, which compiler are you using, and which version?  I cannot
reproduce this bug with DMD git head.  It may be have fixed since the
release you're using?


--T
Nov 30
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Nov 30, 2017 at 03:57:37PM -0800, H. S. Teoh via Digitalmars-d-learn
wrote:
 Hmm, which compiler are you using, and which version?  I cannot
 reproduce this bug with DMD git head.  It may be have fixed since the
 release you're using?
[...] Sorry, I was testing the wrong code. This problem does still happen on DMD git head. Looking into the code right now... Please still file the bug if you haven't already. T -- "How are you doing?" "Doing what?"
Nov 30
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
Seems like this problem is already known:

	https://issues.dlang.org/show_bug.cgi?id=15290

Here's the fix:

	https://github.com/dlang/druntime/pull/1980


--T
Nov 30
next sibling parent reply kdevel <kdevel vogtner.de> writes:
On Friday, 1 December 2017 at 00:42:00 UTC, H. S. Teoh wrote:
 Here's the fix:

 	https://github.com/dlang/druntime/pull/1980
Great. By the way: It it true, that there cannot be more than 2^32 keys in an associative array (due to the use of uint)?
Dec 01
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 12/1/17 11:01 AM, kdevel wrote:
 On Friday, 1 December 2017 at 00:42:00 UTC, H. S. Teoh wrote:
 Here's the fix:

     https://github.com/dlang/druntime/pull/1980
Great. By the way: It it true, that there cannot be more than 2^32 keys in an associative array (due to the use of uint)?
Yes, and technically even less because there is a requirement to grow the array when it's 4/5 full. But such a thing is easily fixed if we need to. Note that due to the way it's currently implemented (and you aren't going to get much smaller than this), each key takes 16 bytes (bucket ptr + hash) + at least 16 bytes per key/value pair (depending on what's stored there). So you are talking minimum 4GB * 32 = 128GB RAM required if you wanted to have such an AA! -Steve
Dec 01
prev sibling parent reply kdevel <kdevel vogtner.de> writes:
On Friday, 1 December 2017 at 00:42:00 UTC, H. S. Teoh wrote:
 Here's the fix:

 	https://github.com/dlang/druntime/pull/1980
And wouldn't it be reasonable to add assert(aa.values == [ "b" ]); to the unittest?
Dec 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 01, 2017 at 04:06:50PM +0000, kdevel via Digitalmars-d-learn wrote:
 On Friday, 1 December 2017 at 00:42:00 UTC, H. S. Teoh wrote:
 Here's the fix:
 
 	https://github.com/dlang/druntime/pull/1980
And wouldn't it be reasonable to add assert(aa.values == [ "b" ]); to the unittest?
I did think about adding that, but then I didn't want the unittest to dictate which value should take precedence when there are duplicate keys, since that is currently implementation-dependent. At the very least, it merits further discussion. T -- Never trust an operating system you don't have source for! -- Martin Schulze
Dec 01
next sibling parent kdevel <kdevel vogtner.de> writes:
On Friday, 1 December 2017 at 16:23:33 UTC, H. S. Teoh wrote:
 On Fri, Dec 01, 2017 at 04:06:50PM +0000, kdevel via 
 Digitalmars-d-learn wrote:
 On Friday, 1 December 2017 at 00:42:00 UTC, H. S. Teoh wrote:
 Here's the fix:
 
 	https://github.com/dlang/druntime/pull/1980
And wouldn't it be reasonable to add assert(aa.values == [ "b" ]); to the unittest?
I did think about adding that, but then I didn't want the unittest to dictate which value should take precedence when there are duplicate keys, since that is currently implementation-dependent. At the very least, it merits further discussion.
I personally would have preferred a run time or even a compile-time error in the case of my orginal example. That would support the "fail fast" philosohphy. IMHO it makes no sense at all to initialize an AA in such a way. The way the initialization works hides copy-and-paste errors like the one of my example code.
Dec 01
prev sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 12/1/17 11:23 AM, H. S. Teoh wrote:
 On Fri, Dec 01, 2017 at 04:06:50PM +0000, kdevel via Digitalmars-d-learn wrote:
 On Friday, 1 December 2017 at 00:42:00 UTC, H. S. Teoh wrote:
 Here's the fix:

 	https://github.com/dlang/druntime/pull/1980
And wouldn't it be reasonable to add assert(aa.values == [ "b" ]); to the unittest?
I did think about adding that, but then I didn't want the unittest to dictate which value should take precedence when there are duplicate keys, since that is currently implementation-dependent. At the very least, it merits further discussion.
1. If there is going to be an "implementation defined" ambiguity, then nobody should ever use it, and it should be an error. This would mean a runtime error, since the keys may be runtime values. 2. I can think of possible (and questionable) reasons one may put in duplicate keys in an AA, like if they are defined separately (i.e. you use 2 symbols as keys that are sometimes the same, sometimes different, and the result is some defined order) 3. I can't think of a reason why anyone would implement the AA filling algorithm other than foreach-ing over the keys and values. So I would say we should define that the last key/value combo takes precedence, and it would be good to ensure that's the case with a test. -Steve
Dec 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 01, 2017 at 11:59:08AM -0500, Steven Schveighoffer via
Digitalmars-d-learn wrote:
 On 12/1/17 11:23 AM, H. S. Teoh wrote:
 On Fri, Dec 01, 2017 at 04:06:50PM +0000, kdevel via Digitalmars-d-learn wrote:
 On Friday, 1 December 2017 at 00:42:00 UTC, H. S. Teoh wrote:
 Here's the fix:
 
 	https://github.com/dlang/druntime/pull/1980
And wouldn't it be reasonable to add assert(aa.values == [ "b" ]); to the unittest?
I did think about adding that, but then I didn't want the unittest to dictate which value should take precedence when there are duplicate keys, since that is currently implementation-dependent. At the very least, it merits further discussion.
1. If there is going to be an "implementation defined" ambiguity, then nobody should ever use it, and it should be an error. This would mean a runtime error, since the keys may be runtime values. 2. I can think of possible (and questionable) reasons one may put in duplicate keys in an AA, like if they are defined separately (i.e. you use 2 symbols as keys that are sometimes the same, sometimes different, and the result is some defined order)
[...] There is at least 1 use case in the bugzilla issue that justifies AA literals with (possibly) duplicated keys: https://issues.dlang.org/show_bug.cgi?id=15290#c1 Code snippet: ------- foreach (i; 0..10) foreach (j; 0..10) if (pred2 (i, j)) foreach (k; 0..10) if (pred3 (i, j, k)) ... and maybe more ... { auto set = [i: true, j: true; k: true]; if (set.length == 3) { ... we found distinct i, j, k, ... } } ------- In this case, the order doesn't matter.
 3. I can't think of a reason why anyone would implement the AA filling
 algorithm other than foreach-ing over the keys and values.
 
 So I would say we should define that the last key/value combo takes
 precedence, and it would be good to ensure that's the case with a
 test.
Sure, but that means a spec update besides the unittest addition. T -- "The number you have dialed is imaginary. Please rotate your phone 90 degrees and try again."
Dec 01
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 12/1/17 12:02 PM, H. S. Teoh wrote:
 On Fri, Dec 01, 2017 at 11:59:08AM -0500, Steven Schveighoffer via
Digitalmars-d-learn wrote:
 So I would say we should define that the last key/value combo takes
 precedence, and it would be good to ensure that's the case with a
 test.
Sure, but that means a spec update besides the unittest addition.
I'm not sure it even needs a spec "update", as there is no spec for it :) But having the unit test at least ensures that someone implementing something different (though I don't ever see it happening) will notice the test. As of now, someone can be depending on this behavior, and then such an implementation rewrite could get through and break their code. I'll write a unit test, shouldn't be too hard. -Steve
Dec 01
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 01, 2017 at 01:14:14PM -0500, Steven Schveighoffer via
Digitalmars-d-learn wrote:
 On 12/1/17 12:02 PM, H. S. Teoh wrote:
 On Fri, Dec 01, 2017 at 11:59:08AM -0500, Steven Schveighoffer via
Digitalmars-d-learn wrote:
 So I would say we should define that the last key/value combo
 takes precedence, and it would be good to ensure that's the case
 with a test.
Sure, but that means a spec update besides the unittest addition.
I'm not sure it even needs a spec "update", as there is no spec for it :)
What I meant is, the AA spec needs to be updated to specify this behaviour. :-) Otherwise, it opens up the chance of somebody implementing an alternative D compiler (let's say, SDC, just for an actual example) that exhibits a different behaviour and breaking user code.
 But having the unit test at least ensures that someone implementing
 something different (though I don't ever see it happening) will notice
 the test.
 
 As of now, someone can be depending on this behavior, and then such an
 implementation rewrite could get through and break their code.
 
 I'll write a unit test, shouldn't be too hard.
[...] Maybe I'll take a shot at updating the AA spec. T -- Never step over a puddle, always step around it. Chances are that whatever made it is still dripping.
Dec 01
prev sibling parent kdevel <kdevel vogtner.de> writes:
On Friday, 1 December 2017 at 17:02:48 UTC, H. S. Teoh wrote:
 [...]
 There is at least 1 use case in the bugzilla issue that 
 justifies AA literals with (possibly) duplicated keys:

 	https://issues.dlang.org/show_bug.cgi?id=15290#c1

 Code snippet:
 -------
 foreach (i; 0..10)
     foreach (j; 0..10) if (pred2 (i, j))
         foreach (k; 0..10) if (pred3 (i, j, k))
             ... and maybe more ...
         {
             auto set = [i: true, j: true; k: true];
             if (set.length == 3)
             {
                  ... we found distinct i, j, k, ...
             }
         }
 -------
Sure. this looks "idiomatic" but the keys are not constants. If the compiler only complained on compile-time duplicate keys it would not do so in this case. BTW: I was under the impression (never checked or profiled such code) that in the following code trans_AA and trans_switch are technically equivalent: ``` string trans_AA (string s) { immutable string[string] aa = [ "src1" : "repl1", "src2" : "repl2", "src3" : "repl3", ]; return aa[s]; } string trans_switch (string s) { switch (s) { case "src1": return "repl1"; case "src2": return "repl2"; case "src3": return "repl3"; // case "src3": return "repl3"; default: throw new Exception ("no key " ~ s); } } void main () { import std.stdio; import std.string; auto a = stdin.readln.chomp; a.trans_switch.writeln; a.trans_AA.writeln; } ``` If one uncomments the "src3" line in trans_switch the compiler complains with aa.d(17): Error: duplicate case "src3" in switch statement while a duplication in trans_AA goes unnoticed.
Dec 01