www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Structs as Keys for AAs

reply Q. Schroll <qs.il.paperinik gmail.com> writes:
In [1] it says at 5. that

 For this reason, and for legacy reasons, an associative array 
 key is not allowed to define a specialized opCmp, but omit a 
 specialized opEquals. This restriction may be removed in future 
 versions of D.
I'm not completely sure what that means. Does "specialized" mean "user-defined"? I just challenged the spec and found an error by the way: [2]. Apart from that, it compiles. For 5. I used struct Key { int id; string tag; int opCmp(const Key other) const { return this.id < other.id ? -1 : this.id == other.id ? 0 : 1; } bool opEquals(ref const Key other) const safe pure nothrow { return this.id == other.id; } size_t toHash() const safe pure nothrow { return id; } } as a key type. To me the part "is not allowed to define a specialized opCmp" is clearly wrong, either a compiler bug or an error in the spec. Concerning opEquals and opCmp in general: Why isn't opEquals lowered to opCmp returning 0 if not present? [1] https://dlang.org/spec/hash-map.html#using_struct_as_key [2] https://github.com/dlang/dlang.org/pull/1861
Aug 09 2017
next sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Wed, Aug 09, 2017 at 11:31:44PM +0000, Q. Schroll via Digitalmars-d wrote:
 In [1] it says at 5. that
 
 For this reason, and for legacy reasons, an associative array key is
 not allowed to define a specialized opCmp, but omit a specialized
 opEquals.  This restriction may be removed in future versions of D.
I'm not completely sure what that means. Does "specialized" mean "user-defined"? I just challenged the spec and found an error by the way: [2]. Apart from that, it compiles.
The main issue is that the current AA implementation uses TypeInfo's .equal method for comparing keys. (This dates from before D has templates, and also because at the time, the AA implementation was too tightly bound to the compiler and couldn't be easily changed.) The way TypeInfo works, .equal will not do the right thing if you only define opCmp without also defining opEquals. This is why the spec was updated to say that you must essentially always define opEquals if you define opCmp. Otherwise, you may get strange or wrong behaviour when using the type as an AA key.
 For 5. I used
 
 struct Key
 {
     int id;
     string tag;
 
     int opCmp(const Key other) const
     {
         return this.id < other.id ? -1 : this.id == other.id ?  0 : 1;
     }
 
     bool opEquals(ref const Key other) const  safe pure nothrow
     {
         return this.id == other.id;
     }
 
     size_t toHash() const  safe pure nothrow
     {
         return id;
     }
 }
 
 as a key type. To me the part "is not allowed to define a specialized
 opCmp" is clearly wrong, either a compiler bug or an error in the
 spec.
Did you test the result at runtime with a real AA? A telling sign would be if you added the same Key twice, then iterate over the AA to print out the entries. You may get the same key more than once, which would be indicative of this problem. Just because the compiler accepts the code, doesn't necessarily mean it's right. (Of course, it's arguably a bug that the compiler accepts it in the first place. But the AA implementation in the compiler is a bit fragile to handle at the moment, I'm not sure if it can be easily fixed without causing other problems. We'll have to wait for Martin's library AA to get merged...)
 Concerning opEquals and opCmp in general: Why isn't opEquals lowered to
 opCmp returning 0 if not present?
[...] Mainly (1) for efficiency, and also, (2) to paraphrase Andrei, to allow partial orders that may not be linear. (1) because some types may have expensive computations to determine whether something is bigger or smaller, but trivial to determine equality. (2) because conceivably you can implement the subset relation in opCmp with a type that represents a set, so opCmp()==0 could mean the sets are equal, OR it could mean the sets are not subsets of each other (they are either disjoint, or have elements not in common with each other). Then you'd need opEquals() to tell you whether or not they are equal. For a less exotic example, consider a custom floating-point type where NAN < NAN and NAN > NAN are both false, so opCmp()==0 is the only reasonable return value, yet opEquals() also == 0 because NAN != NAN. T -- INTEL = Only half of "intelligence".
Aug 09 2017
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 8/9/17 7:31 PM, Q. Schroll wrote:
 In [1] it says at 5. that
 
 For this reason, and for legacy reasons, an associative array key is 
 not allowed to define a specialized opCmp, but omit a specialized 
 opEquals. This restriction may be removed in future versions of D.
I'm not completely sure what that means. Does "specialized" mean "user-defined"?
Yes, that's what it means.
 I just challenged the spec and found an error by the 
 way: [2]. Apart from that, it compiles.
 
 For 5. I used
 
 struct Key
 {
      int id;
      string tag;
 
      int opCmp(const Key other) const
      {
          return this.id < other.id ? -1 : this.id == other.id ?  0 : 1;
      }
 
      bool opEquals(ref const Key other) const  safe pure nothrow
      {
          return this.id == other.id;
      }
 
      size_t toHash() const  safe pure nothrow
      {
          return id;
      }
 }
 
 as a key type. To me the part "is not allowed to define a specialized 
 opCmp" is clearly wrong, either a compiler bug or an error in the spec.
You need to read it with the other part "but omit a specialized opEquals". In other words, you must implement opEquals if you implement opCmp. The reason is simple, because opEquals defaults to a comparison of all fields, and most likely if you are defining opCmp, it won't match the default opEquals. opHash uses opEquals, but does not use opCmp. Therefore, if this restriction wasn't in place, then you may just define opCmp thinking the AA would use it. Note that in your example, your opEquals is more efficient than opCmp == 0. This is the main reason opEquals is defined differently than opCmp.
 Concerning opEquals and opCmp in general: Why isn't opEquals lowered to 
 opCmp returning 0 if not present?
It really should IMO, but that's not how it works. I'm almost positive there's an enhancement request on this somewhere. -Steve
Aug 09 2017