www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - AA key conversion woes

reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
I'm having some major roadblocks with the AA implementation w.r.t. how
to store/convert AA key types correctly. I've been working on this for a
while now but still cannot find a satisfactory solution; hopefully y'all
can help.

First, in order to take advantage of the compiler's auto-conversion
magic (such as using literal [1,2,3] for ubyte[] keys, etc.) and to
avoid template bloat, we must take const(Key) as argument for all key
lookup methods. So Key must at least be stored as const. But since we
have to do this already, might as well do it right: Keys should be
stored as immutable.

The nice thing about this is that we can now *guarantee* that AA's won't
malfunction due to the user changing stored key values via an external
mutable reference.

The bad thing about this is how to deal with the various key types that
the user might try to pass in:

1) For value types like int, there's no problem: immutable(int)
interconverts freely with int via assignment, so we can store int keys
however we like, and we can always hand back unqualified ints to the
user. So if the user declares int[int], .keys will give int[], and if
the user declares const(int)[int], then .keys will give back
const(int)[]. No problem.

2) Where things get hairy is when non-trivial keys are stored. Take
arrays for example. If the user declares int[int[]], then we need an
.idup so that we can store the key as immutable(int)[] (or should that
be immutable(int[])?). But now if the user passes in an int[] key, we
will need to .idup it in order to store it. This is acceptable, but what
should .keys return?  If I were the user, I'd expect int[][], but since
we can't implicitly convert immutable(int)[] to int[], we need a .dup
for each key.  Which introduces unnecessary overhead, since for the most
part the user doesn't need to change the keys anyway. But handing back
immutable(int)[][] from .keys will break a lot of existing code.

3) With arrays, things are still not too bad because we have .dup and
.idup. But what about structs or classes? We do not have .idup if the
key type is specified as non-immutable, so how should the keys be
stored? (Besides, do we *want* to clone objects used as AA keys in the
first place?) And what type should .keys return?

4) What should be done if the key type is shared or inout? (I'm tempted
to say outright prohibit shared, but people may not like their code
breaking if they're actually using that in existing code.)

I'm tempted to propose that the language should be changed so that AA
keys are *always* immutable implicitly. That is, writing int[int] is
exactly the same as writing int[immutable(int)], and .keys will always
return immutable. This is the "correct" approach IMO, and existing code
should be fixed if this breaks them. This will simplify a lot of the
mess above (we can still support .idup for when the user wants to lookup
mutable keys, etc., but there will be no concessions for .keys to hand
back mutable keys -- .dup them yourself).

But I'm expecting some negative reactions to this hardline approach. :-)

But even then, I'm considering if .keys should return a mutable array if
the key is a value type (e.g., it's harmless for int[int].keys to return
int[] because the int's are a copy anyway, and this way user code won't
be unnecessarily straitjacketed to propagate immutable throughout).

I'd like to hear how people think this should work, so that the new AA
implementation will be more acceptable.


T

-- 
Never trust an operating system you don't have source for! -- Martin Schulze
Apr 17 2012
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 17.04.2012 22:41, H. S. Teoh wrote:
 I'm having some major roadblocks with the AA implementation w.r.t. how
 to store/convert AA key types correctly. I've been working on this for a
 while now but still cannot find a satisfactory solution; hopefully y'all
 can help.

 First, in order to take advantage of the compiler's auto-conversion
 magic (such as using literal [1,2,3] for ubyte[] keys, etc.) and to
 avoid template bloat, we must take const(Key) as argument for all key
 lookup methods. So Key must at least be stored as const. But since we
 have to do this already, might as well do it right: Keys should be
 stored as immutable.

 The nice thing about this is that we can now *guarantee* that AA's won't
 malfunction due to the user changing stored key values via an external
 mutable reference.

 The bad thing about this is how to deal with the various key types that
 the user might try to pass in:

 1) For value types like int, there's no problem: immutable(int)
 interconverts freely with int via assignment, so we can store int keys
 however we like, and we can always hand back unqualified ints to the
 user. So if the user declares int[int], .keys will give int[], and if
 the user declares const(int)[int], then .keys will give back
 const(int)[]. No problem.

 2) Where things get hairy is when non-trivial keys are stored. Take
 arrays for example. If the user declares int[int[]], then we need an
 .idup so that we can store the key as immutable(int)[] (or should that
 be immutable(int[])?). But now if the user passes in an int[] key, we
 will need to .idup it in order to store it. This is acceptable, but what
 should .keys return?  If I were the user, I'd expect int[][], but since
 we can't implicitly convert immutable(int)[] to int[], we need a .dup
 for each key.  Which introduces unnecessary overhead, since for the most
 part the user doesn't need to change the keys anyway. But handing back
 immutable(int)[][] from .keys will break a lot of existing code.

 3) With arrays, things are still not too bad because we have .dup and
 .idup. But what about structs or classes? We do not have .idup if the
 key type is specified as non-immutable, so how should the keys be
 stored? (Besides, do we *want* to clone objects used as AA keys in the
 first place?) And what type should .keys return?

 4) What should be done if the key type is shared or inout? (I'm tempted
 to say outright prohibit shared, but people may not like their code
 breaking if they're actually using that in existing code.)

 I'm tempted to propose that the language should be changed so that AA
 keys are *always* immutable implicitly. That is, writing int[int] is
 exactly the same as writing int[immutable(int)], and .keys will always
 return immutable. This is the "correct" approach IMO, and existing code
 should be fixed if this breaks them. This will simplify a lot of the
 mess above (we can still support .idup for when the user wants to lookup
 mutable keys, etc., but there will be no concessions for .keys to hand
 back mutable keys -- .dup them yourself).

 But I'm expecting some negative reactions to this hardline approach. :-)
+1 actually. Given H[Key] assoc; I just expect assoc.keys() to return array of immutable(Key). It most likely needs to be allocated.
 But even then, I'm considering if .keys should return a mutable array if
 the key is a value type (e.g., it's harmless for int[int].keys to return
 int[] because the int's are a copy anyway, and this way user code won't
 be unnecessarily straitjacketed to propagate immutable throughout).

 I'd like to hear how people think this should work, so that the new AA
 implementation will be more acceptable.


 T
-- Dmitry Olshansky
Apr 17 2012
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
H. S. Teoh:

 I'm tempted to propose that the language should be changed
 so that AA keys are *always* immutable implicitly. That is,
 writing int[int] is exactly the same as writing
 int[immutable(int)],
See: http://d.puremagic.com/issues/show_bug.cgi?id=6253 Bye, bearophile
Apr 17 2012
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
 http://d.puremagic.com/issues/show_bug.cgi?id=6253
Maybe it needs a bit more explanation. It goes according to this idea: http://en.wikipedia.org/wiki/Principle_of_least_astonishment If I define: Foo[] a; I expect those Foo items to be mutable. If I see: int[Foo] I expect those Foo keys to be mutable. If I see: int[Foo] I expect those Foo keys to be mutable. If I see: int[immutable Foo] I expect those Foo keys to be immutable. If I see a int[Foo] and I get immutable Foo keys, I am astonished. Not doing what I am saying here will add another special case to D language. Avoiding many special cases is one the reasons I program in D instead of C++. Bye, bearophile
Apr 17 2012
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Apr 17, 2012 at 10:13:49PM +0200, bearophile wrote:
http://d.puremagic.com/issues/show_bug.cgi?id=6253
Maybe it needs a bit more explanation. It goes according to this idea: http://en.wikipedia.org/wiki/Principle_of_least_astonishment If I define: Foo[] a; I expect those Foo items to be mutable. If I see: int[Foo] I expect those Foo keys to be mutable. If I see: int[Foo] I expect those Foo keys to be mutable. If I see: int[immutable Foo] I expect those Foo keys to be immutable. If I see a int[Foo] and I get immutable Foo keys, I am astonished. Not doing what I am saying here will add another special case to D language. Avoiding many special cases is one the reasons I program in D instead of C++.
[...] So you're basically saying that we should refuse all non-immutable keys in AA's? T -- Designer clothes: how to cover less by paying more.
Apr 17 2012
parent "bearophile" <bearophileHUGS lycos.com> writes:
H. S. Teoh:

 So you're basically saying that we should refuse all 
 non-immutable keys in AA's?
Something like that. Bye, bearophile
Apr 17 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, April 17, 2012 14:17:38 H. S. Teoh wrote:
 So you're basically saying that we should refuse all non-immutable keys
 in AA's?
I would not be against enforcing that all key types be immutable or implicitly convertible to immutable. So, something like int[Foo] would be illegal unless Foo was a struct which could be implicitly converted to immutable. But there's certainly an argument for allowing non-immutable key types in the declaration and automatically tacking on immutable so that int[Foo] is legal, but is of course really int[immutable Foo]. Allowing int[Foo] has the benefit of brevity but with a cost in clarity. The fact that int[Foo] is currently allowed though is a definite argument against making it illegal. - Jonathan M Davis
Apr 17 2012
prev sibling parent Somedude <lovelydear mailmetrash.com> writes:
Le 17/04/2012 22:13, bearophile a écrit :
 http://d.puremagic.com/issues/show_bug.cgi?id=6253
If I see a int[Foo] and I get immutable Foo keys, I am astonished. Not doing what I am saying here will add another special case to D language. Avoiding many special cases is one the reasons I program in D instead of C++. Bye, bearophile
I agree, but if I had to choose, I'd rather be astonished than forced to have a program twice as slow because I need to duplicate all my keys for the compiler to be happy.
Apr 18 2012
prev sibling next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 04/17/2012 08:41 PM, H. S. Teoh wrote:
 I'm having some major roadblocks with the AA implementation w.r.t. how
 to store/convert AA key types correctly. I've been working on this for a
 while now but still cannot find a satisfactory solution; hopefully y'all
 can help.

 First, in order to take advantage of the compiler's auto-conversion
 magic (such as using literal [1,2,3] for ubyte[] keys, etc.) and to
 avoid template bloat, we must take const(Key) as argument for all key
 lookup methods. So Key must at least be stored as const. But since we
 have to do this already, might as well do it right: Keys should be
 stored as immutable.

 The nice thing about this is that we can now *guarantee* that AA's won't
 malfunction due to the user changing stored key values via an external
 mutable reference.

 The bad thing about this is how to deal with the various key types that
 the user might try to pass in:

 1) For value types like int, there's no problem: immutable(int)
 interconverts freely with int via assignment, so we can store int keys
 however we like, and we can always hand back unqualified ints to the
 user. So if the user declares int[int], .keys will give int[], and if
 the user declares const(int)[int], then .keys will give back
 const(int)[]. No problem.

 2) Where things get hairy is when non-trivial keys are stored. Take
 arrays for example. If the user declares int[int[]], then we need an
 .idup so that we can store the key as immutable(int)[] (or should that
 be immutable(int[])?). But now if the user passes in an int[] key, we
 will need to .idup it in order to store it. This is acceptable, but what
 should .keys return?  If I were the user, I'd expect int[][], but since
 we can't implicitly convert immutable(int)[] to int[], we need a .dup
 for each key.  Which introduces unnecessary overhead, since for the most
 part the user doesn't need to change the keys anyway. But handing back
 immutable(int)[][] from .keys will break a lot of existing code.
How? This is the current behavior: int[int[]] aa; static assert(is(typeof(aa.keys)==const(int)[][])); A change from const(int)[][] to immutable(int)[][] shouldn't break a significant amount of code.
 3) With arrays, things are still not too bad because we have .dup and
 .idup. But what about structs or classes? We do not have .idup if the
 key type is specified as non-immutable, so how should the keys be
 stored? (Besides, do we *want* to clone objects used as AA keys in the
 first place?) And what type should .keys return?
They could implement .dup or .idup if they want those semantics. If the container cannot figure out how to get an immutable key from a mutable key, then indexing with mutable keys is disabled.
 4) What should be done if the key type is shared or inout? (I'm tempted
 to say outright prohibit shared, but people may not like their code
 breaking if they're actually using that in existing code.)
immutable is implicitly shared, but implicit .idup should be disabled. For inout there would need to be a general solution for resolving templated types. For example: struct S(T){ int[] x; auto opResolveInout(string op){ ... } // op is "const"/immutable/"" }
 I'm tempted to propose that the language should be changed so that AA
 keys are *always* immutable implicitly. That is, writing int[int] is
 exactly the same as writing int[immutable(int)], and .keys will always
 return immutable.
Probably it should return an array of tail-immutable keys. Additionally, as Andrej proposes, there should be ikeys that returns a tail-immutable array.
 This is the "correct" approach IMO, and existing code
 should be fixed if this breaks them. This will simplify a lot of the
 mess above (we can still support .idup for when the user wants to lookup
 mutable keys,
Lookup should succeed without implicit .idup whenever supportable. Otherwise I fail to see the benefit of the implicit .idup behavior and it should be dropped entirely.
 etc., but there will be no concessions for .keys to hand
 back mutable keys -- .dup them yourself).

 But I'm expecting some negative reactions to this hardline approach. :-)

 But even then, I'm considering if .keys should return a mutable array if
 the key is a value type (e.g., it's harmless for int[int].keys to return
 int[] because the int's are a copy anyway, and this way user code won't
 be unnecessarily straitjacketed to propagate immutable throughout).

 I'd like to hear how people think this should work, so that the new AA
 implementation will be more acceptable.


 T
'keys' should return an array of tail-immutable keys. int[int] => typeof(keys)==int[], typeof(ikeys)==immutable(int)[] int[int[]] => typeof(keys) == immutable(int)[][], typeof(ikeys)==immutable(int[])[]. etc.
Apr 17 2012
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 17 Apr 2012 14:41:42 -0400, H. S. Teoh <hsteoh quickfur.ath.cx>  
wrote:

 I'm having some major roadblocks with the AA implementation w.r.t. how
 to store/convert AA key types correctly. I've been working on this for a
 while now but still cannot find a satisfactory solution; hopefully y'all
 can help.

 First, in order to take advantage of the compiler's auto-conversion
 magic (such as using literal [1,2,3] for ubyte[] keys, etc.) and to
 avoid template bloat, we must take const(Key) as argument for all key
 lookup methods. So Key must at least be stored as const. But since we
 have to do this already, might as well do it right: Keys should be
 stored as immutable.
This is probably the best solution. I will note that dcollections does not require immutable keys for maps, and likely would need work to be able to do immutable keys anyway. The one part where I think immutable keys can be hindering/confusing is for classes. A coder may write something like: class Foo { private int _x; property int x() { return _x;} } In effect, Foo is immutable, since there is no valid way to change Foo. But having Foo not be actually marked as immutable is nice in that you don't have to deal with the lack of tail-immutable for classes. In such a case, effectively, int[Foo] is really never going to be a problem. But int[immutable(Foo)] will not allow you to access the x property. Yes, there are better ways to mark Foo, but it still poses an unnecessary restriction.
 1) For value types like int, there's no problem: immutable(int)
 interconverts freely with int via assignment, so we can store int keys
 however we like, and we can always hand back unqualified ints to the
 user. So if the user declares int[int], .keys will give int[], and if
 the user declares const(int)[int], then .keys will give back
 const(int)[]. No problem.
Sounds good (think you meant int[const(int)])
 2) Where things get hairy is when non-trivial keys are stored. Take
 arrays for example. If the user declares int[int[]], then we need an
 .idup so that we can store the key as immutable(int)[] (or should that
 be immutable(int[])?). But now if the user passes in an int[] key, we
 will need to .idup it in order to store it. This is acceptable, but what
 should .keys return?  If I were the user, I'd expect int[][], but since
 we can't implicitly convert immutable(int)[] to int[], we need a .dup
 for each key.  Which introduces unnecessary overhead, since for the most
 part the user doesn't need to change the keys anyway. But handing back
 immutable(int)[][] from .keys will break a lot of existing code.
My first reaction is, just don't allow it. If you are going to store immutable keys, require immutable keys. BTW, immutable(int[]) is likely going to be required to be stored as a key, since this can also be a key: struct S { int[] } and there's no way to turn that into immutable(int)[]. So you will have to deal with this situation, might as well do it now.
 3) With arrays, things are still not too bad because we have .dup and
 .idup. But what about structs or classes? We do not have .idup if the
 key type is specified as non-immutable, so how should the keys be
 stored? (Besides, do we *want* to clone objects used as AA keys in the
 first place?) And what type should .keys return?
.dup and .idup should not be used lightly. Specifically, setting a hash value should be an amortized O(1) operation. Using idup will make it O(unbounded) :)
 4) What should be done if the key type is shared or inout? (I'm tempted
 to say outright prohibit shared, but people may not like their code
 breaking if they're actually using that in existing code.)
I think with the current implementation, inout is disallowed on a member field. And for good reason. inout only really makes sense inside an inout function.
 I'm tempted to propose that the language should be changed so that AA
 keys are *always* immutable implicitly. That is, writing int[int] is
 exactly the same as writing int[immutable(int)], and .keys will always
 return immutable. This is the "correct" approach IMO, and existing code
 should be fixed if this breaks them. This will simplify a lot of the
 mess above (we can still support .idup for when the user wants to lookup
 mutable keys, etc., but there will be no concessions for .keys to hand
 back mutable keys -- .dup them yourself).
I think it already does this. But I don't like it. I think it's better to just require immutable and be done with it. People will say we can't do that because int[int[]] currently compiles, but somehow it was ok to make int[int[]] silently turn into int[immutable(int[])], which broke quite a bit of code. I see this as a similar scenario.
 But I'm expecting some negative reactions to this hardline approach. :-)

 But even then, I'm considering if .keys should return a mutable array if
 the key is a value type (e.g., it's harmless for int[int].keys to return
 int[] because the int's are a copy anyway, and this way user code won't
 be unnecessarily straitjacketed to propagate immutable throughout).

 I'd like to hear how people think this should work, so that the new AA
 implementation will be more acceptable.


 T
Apr 18 2012