www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - associative arrays

reply RenatoL <rexlen gmail.com> writes:
Very quick question

import std.stdio;
void main()
{
	auto arr1 = ["uno":1, "due":2, "tre":3];
	arr1.remove("xxx");
}

also in this case the compiler does not say anything and the
program goes out silently ... why? Would not it be better if an
exception was raised? After all if i write:

writeln(arr1["xxx"]);

runtime expresses its disappointment...
Jan 07 2012
next sibling parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 01/07/2012 02:11 PM, RenatoL wrote:
 Very quick question

 import std.stdio;
 void main()
 {
 	auto arr1 = ["uno":1, "due":2, "tre":3];
 	arr1.remove("xxx");
 }

 also in this case the compiler does not say anything and the
 program goes out silently ... why? Would not it be better if an
 exception was raised?

I think that could be a valid design choice. On the other hand, since what is required is to remove the element, it is not really an error if the object is already missing. The requirement is met.
 After all if i write:

 writeln(arr1["xxx"]);

 runtime expresses its disappointment...

You get the exception more simply by this: auto e = arr1["xxx"]; But that's understandable because the [] operator is supposed to provide access to a usable object. Since there is no general concept of a null object, there is no object to provide access to, so there is nothing else to do but to throw. Ali
Jan 07 2012
next sibling parent reply RenatoL <rexlen gmail.com> writes:
Yes, i agree this may acceptable. On the other hand if i really
want/have to remove an item i have to be very careful cause a
trivial typo could cause a disaster....
Jan 07 2012
parent reply RenatoL <rexlen gmail.com> writes:
Yes, Jonathan, you're right.
the question arose precisely from a typo... i had to remove an
item with key "length"... i wrote "lengt" and the item never went
away... i knew that "lengt" was not in my key list... This kind of
mistake is quite tricky, may be using and IDE could help.
Jan 07 2012
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
RenatoL:

 Yes, Jonathan, you're right.
 the question arose precisely from a typo... i had to remove an
 item with key "length"... i wrote "lengt" and the item never went
 away... i knew that "lengt" was not in my key list... This kind of
 mistake is quite tricky, may be using and IDE could help.

This an example that shows that silent failures are sources of bugs... So this time I don't agree with Jonathan Davis. Python Zen contains: Errors should never pass silently. And Python associative arrays show it:
 d = {1:2, 3:4}
 del d[5]



File "<stdin>", line 1, in <module> KeyError: 5 The successive rule of the Python Zen says: Unless explicitly silenced. Python sets show this at work, they have a 'remove' method that throws if the item is missing, plus a 'discard' method that acts more like D associative arrays, in a sense it's a way to silence explicitly the error:
 s = set([1, 2])
 s.discard(1)
 s



 s = set([1, 2])
 s.discard(3)
 s



 s.remove(3)



File "<stdin>", line 1, in <module> KeyError: 3 Maybe D associative arrays too should have both kinds of deleting methods :-) Bye, bearophile
Jan 07 2012
next sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Jonathan M Davis wrote:

 So, as far as I can tell, the current situation is more efficient

There are some more arguments: 1) Different threads might want to `remove' the same key from the AA. I don't see a reason why only the first served thread should complete the operation without an error. 2) Because there is only one operation for insertion of a key into an AA, only one operation should be used for removing that key. Note: This argument vanishes if insertions are divided in a:declaration of a key as valid in that AA, b:first assignment to the value of a key, c:reassignment to the value of a key 3) Reverse operations should not implement more functionality than the reversed operation. Because `remove' is the reverse operation for inserting a key and that operation does not cause any error, neither should errors be given from `remove'. Note: Reassigning to the value of a key might very well treated as an error. -manfred
Jan 07 2012
parent RenatoL <rexlen gmail.com> writes:
Very interesting discussion. Tk u all.
Jan 08 2012
prev sibling next sibling parent Stephen Bennett <d tarzenda.net> writes:
On 1/7/2012 8:54 PM, bearophile wrote:
 Yes, Jonathan, you're right.
 the question arose precisely from a typo... i had to remove an
 item with key "length"... i wrote "lengt" and the item never went
 away... i knew that "lengt" was not in my key list... This kind of
 mistake is quite tricky, may be using and IDE could help.

This an example that shows that silent failures are sources of bugs... So this time I don't agree with Jonathan Davis.

a.remove(x) doesn't fail though. It successfully satisfies its own postconditions by placing a in a state whereby it no longer contains x. The real source of bugs in this example is using hardcoded strings instead of named constants. :) Besides, one frequently doesn't know if (x in a) or not ahead of time. Think of evicting some computed intermediate results from a cache when you detect that the underlying resource has changed, for instance. In fact, it's really rare for an item to be removed from a long-lived collection anywhere near the point that it was added. Insisting on existence before removal clutters up the code for an extremely common use case. FWIW, Java's Map and .NET's IDictionary also handle this situation silently. Regards, ~Stephen
Jan 08 2012
prev sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
 assert(key in aa);
 aa.remove(key);

 So, as far as I can tell, the current situation is more efficient, and it
 doesn't cost you any expressiveness. You can still have an exception thrown
 when remove fails if you use enforce before the call if you want an exception
 thrown when the element isn't there.

but then it should be called optional_remove - because it "mabye" removes its like file_delete, DeleteFile (and all the others) on non existing files - why is there an exception/error neeeded if missing? the "maybe"-situation is not that often used in good code - because you can't find your errors if many routines would work like remove
Jan 09 2012
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
dennis luehring wrote:
 why is there an exception/error neeeded if missing?

Exceptions or errors are not _needed_. Their existence stems from the modell under which the user of the operation _has_ to think about the operation, especially whether it is a:only the outcome of the operation or b:the amount of work to get to this outcome and maybe c:at runtime has to be known, whether the object exists. Jesse Phillips wrote:
 I have a lot of if(exists) remove() in my code

As one can see, Jesse does not know, whether the object exists and has to ask, before he in fact removes the object, i.e. doubled access to the file, which may cause a slowdown. As Jesse states the designer of the API has made up false assumptions about his knowledge, described in c:, _and_ his interests, described in b:. Jesse is interested only in the outcome of the operation, as described in a:, Therefore the likely answer to your question stated at the beginning is: the designer of the API made a mistake. -manfred
Jan 09 2012
parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 09.01.2012 22:08, schrieb Manfred Nowak:
 dennis luehring wrote:
  why is there an exception/error neeeded if missing?

Exceptions or errors are not _needed_. Their existence stems from the modell under which the user of the operation _has_ to think about the operation, especially whether it is a:only the outcome of the operation or b:the amount of work to get to this outcome and maybe c:at runtime has to be known, whether the object exists. Jesse Phillips wrote:
  I have a lot of if(exists) remove() in my code

As one can see, Jesse does not know, whether the object exists and has to ask, before he in fact removes the object, i.e. doubled access to the file, which may cause a slowdown.

so your FileDelete would not return an FileDoesNotExists-Error? it isn't always clear what is "ok" to happen - there are million types of remove-like-semantic, an Jesse's scenario is one type of usage, which other type of remove allows silently non-remove-removes? STL? Python, etc. (how do others handle such cases) would it not help to better understand big code if the remove would be renamed to remove_existing or to add something like this?
Jan 09 2012
parent Manfred Nowak <svv1999 hotmail.com> writes:
dennis luehring wrote:

 so your FileDelete would not return an FileDoesNotExists-Error?

 would it not help to better understand big code if the remove
 would be renamed to remove_existing or to add something like this?

You possibly know about the `rm'-command of *nix-like systems and the by typo inserted space, which makes `rm -r *.obj' to `rm -r * .obj' This will certainly result in the nice error-message: "cannot remove `.obj': No such file or directory" Therefore: trying to help a minority possibly routes the majority. -manfred
Jan 10 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, January 07, 2012 21:54:07 bearophile wrote:
 Maybe D associative arrays too should have both kinds of deleting methods
 :-)

Why? What's the gain? If you care that the element is already in the AA, then do either assert(key in aa); aa.remove(key); or enforce(key in aa); aa.remove(key); I suppose that that's less efficient than remove throwing on failure, since it has to do a double lookup, but since throwing the exception costs _far_ more than that, I don't think that the extra cost really matters But if you _don't_ care whether the element is in the AA when you remove it, and remove threw when the element wasn't there, then you'd have to do if(key in aa) aa.remove(key); which would result is a double lookup when you don't want one, and performance _would_ matter, since there isn't necessarily another operation which would cost a lot which is overshadowing the cost of the double lookup (as occurs in the case where you throw the exception when the element isn't there). Also, if you considered the key not being there to be a bug (as seems more likely to me), the fact that remove then threw an assertion would force you to do the extra check _anyway_. assert(key in aa); aa.remove(key); So, as far as I can tell, the current situation is more efficient, and it doesn't cost you any expressiveness. You can still have an exception thrown when remove fails if you use enforce before the call if you want an exception thrown when the element isn't there. - Jonathan M Davis
Jan 07 2012
prev sibling parent "Jesse Phillips" <jessekphillips+D gmail.com> writes:
Based on these arguments does that mean std.file.remove should 
not be throwing when a file doesn't exist?

Or more precisely should I add a bugzilla entry to change this. I 
certainly have a lot of if(exists) remove() in my code and end up 
having to change where I just use remove().

On Sunday, 8 January 2012 at 05:03:45 UTC, Manfred Nowak wrote:
 Jonathan M Davis wrote:

 So, as far as I can tell, the current situation is more 
 efficient

There are some more arguments: 1) Different threads might want to `remove' the same key from the AA. I don't see a reason why only the first served thread should complete the operation without an error. 2) Because there is only one operation for insertion of a key into an AA, only one operation should be used for removing that key. Note: This argument vanishes if insertions are divided in a:declaration of a key as valid in that AA, b:first assignment to the value of a key, c:reassignment to the value of a key 3) Reverse operations should not implement more functionality than the reversed operation. Because `remove' is the reverse operation for inserting a key and that operation does not cause any error, neither should errors be given from `remove'. Note: Reassigning to the value of a key might very well treated as an error. -manfred

Jan 08 2012
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, January 07, 2012 15:06:30 Ali =C3=87ehreli wrote:
 On 01/07/2012 02:11 PM, RenatoL wrote:
  > Very quick question
  >=20
  > import std.stdio;
  > void main()
  > {
  >=20
  > =09auto arr1 =3D ["uno":1, "due":2, "tre":3];
  > =09arr1.remove("xxx");
  >=20
  > }
  >=20
  > also in this case the compiler does not say anything and the
  > program goes out silently ... why? Would not it be better if an
  > exception was raised?
=20
 I think that could be a valid design choice. On the other hand, since=

 what is required is to remove the element, it is not really an error =

 the object is already missing. The requirement is met.
=20
  > After all if i write:
  >=20
  > writeln(arr1["xxx"]);
  >=20
  > runtime expresses its disappointment...
=20
 You get the exception more simply by this:
=20
      auto e =3D arr1["xxx"];
=20
 But that's understandable because the [] operator is supposed to prov=

 access to a usable object. Since there is no general concept of a nul=

 object, there is no object to provide access to, so there is nothing
 else to do but to throw.

You likely know this, but I'll point it out in case a newbie to decide = to get=20 htrowing behavior out of the substript operator. If the element is not in the container, then the subscript operator thr= ows a=20 RangeError if -release isn't used, so it's an Error, not an Exception, = and=20 isn't a Throwable that you're supposed to catch (since the unwinding of= the=20 stack skips destructors, scope statements, finall blocks for any Throwa= bles=20 which aren't either Exception or derived from Exception). Also, without= - release, it does no bounds checking and will likely result in a segfaul= t. So,=20 relying on the subscript operator throwing is a _bad_ idea. The idea is= that=20 you never give it something that isn't actually in the container. If you need to check it, and you're dealing with a dynamic or static ar= ray,=20 then you check the array's length. If you're dealing with an associativ= e=20 array, then you use the in operator to get a pointer to the element fro= m the=20 container and check whether it's null or not (since if it's null, it wa= sn't=20 there). It would probably make remove more expensive if it had to throw when it= didn't=20 find an element, and for the most part, it's rather meaningless whether= it was=20 in the container before that. As pointed out, remove makes sure that th= e=20 element is no longer in the container after it's been called. It has no= real=20 reason to care whether an element was in there before or not. And if _y= ou_=20 don't care, then being forced to check whether an element was in the co= ntainer=20 before trying to remove it would make your code less efficient. The way= remove=20 is now, the check only occurs if you do it yourself and therefore actua= lly=20 _want_ it. And odds are, such a check would be an assertion anyway (and= =20 therefore would be throwing an AssertError on failure, not an Exception= ),=20 since the fact that the element wasn't in there anymore is a bug in you= r code,=20 since your code is assuming that it's there. So, while I can understand someone's first reaction being why remove do= esn't=20 throw if the element isn't there, when you really think through it, it = does=20 make more sense for it not to throw. - Jonathan M Davis
Jan 07 2012
parent Manfred Nowak <svv1999 hotmail.com> writes:
Jonathan M Davis wrote:

[...]

This sort of explanation is missing in the docs.

-manfred
Jan 07 2012
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, January 07, 2012 23:34:05 RenatoL wrote:
 Yes, i agree this may acceptable. On the other hand if i really
 want/have to remove an item i have to be very careful cause a
 trivial typo could cause a disaster....

In general, if an element isn't in a container, and you expect it to be there, it's bug in your code and _should_ cause disaster. That is, if your code is assuming that the element must be in the container before you remove it, you should be asserting for that. assert(key in aa); aa.remove(key); This will result in an AssertError if key is not in aa, and upon failure will indicate that you have a bug in your code that you need to fix. If, on the other hand, it's not a bug if that element is not in the container, then why would your code be assuming that it is? In that case, I don't see why you'd care whether it was actually in the container or not, and having remove not do anything if it isn't there is the perfect behavior in that case. So, I'd argue that if your code is assuming that the element is in the container, an assertion should be used, and the fact that remove doesn't throw an exception is irrelevant, since it would be the wrong thing to do anyway. And if your code can't assume that the element is in the container already, then just let remove remove it if it's there and do nothing if it's not. - Jonathan M Davis
Jan 07 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, January 08, 2012 00:13:12 Manfred Nowak wrote:
 Jonathan M Davis wrote:
 
 [...]
 
 This sort of explanation is missing in the docs.

Which part? - Jonathan M Davis
Jan 07 2012
prev sibling next sibling parent reply Kapps <Kapps NotValidEmail.com> writes:
For most languages (such as C# and maybe Java), the Remove method on 
collections returns a boolean indicating whether it was removed. So you 
could write:

enforce(MyAA.remove("lenght"))
or
bool Result = MyAA.remove("lenght");
assert(Result);

I'm not sure why it doesn't in D, but I thought I remembered seeing a 
pull request or change that added it. Maybe I'm imagining things since I 
can't find it now.

On 07/01/2012 4:11 PM, RenatoL wrote:
 Very quick question

 import std.stdio;
 void main()
 {
 	auto arr1 = ["uno":1, "due":2, "tre":3];
 	arr1.remove("xxx");
 }

 also in this case the compiler does not say anything and the
 program goes out silently ... why? Would not it be better if an
 exception was raised? After all if i write:

 writeln(arr1["xxx"]);

 runtime expresses its disappointment...

Jan 07 2012
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, January 08, 2012 01:39:24 Kapps wrote:
 For most languages (such as C# and maybe Java), the Remove method on
 collections returns a boolean indicating whether it was removed. So you
 could write:
 
 enforce(MyAA.remove("lenght"))
 or
 bool Result = MyAA.remove("lenght");
 assert(Result);
 
 I'm not sure why it doesn't in D, but I thought I remembered seeing a
 pull request or change that added it. Maybe I'm imagining things since I
 can't find it now.

There _is_ extra cost in returning bool rather than void, but I'm not at all sure that it's unreasonable to incur that cost. std.container.RedBlackTree's remove function returns how many elements were removed. Other functions in Phobos do similar in similar situations. It probably merits an enhancement request. - Jonathan M Davis
Jan 08 2012
prev sibling next sibling parent reply Kapps <Kapps NotValidEmail.com> writes:
Ah, found the bug / pull request.

https://github.com/D-Programming-Language/dmd/pull/597
http://d.puremagic.com/issues/show_bug.cgi?id=4523

On 08/01/2012 1:39 AM, Kapps wrote:
 For most languages (such as C# and maybe Java), the Remove method on
 collections returns a boolean indicating whether it was removed. So you
 could write:

 enforce(MyAA.remove("lenght"))
 or
 bool Result = MyAA.remove("lenght");
 assert(Result);

 I'm not sure why it doesn't in D, but I thought I remembered seeing a
 pull request or change that added it. Maybe I'm imagining things since I
 can't find it now.

 On 07/01/2012 4:11 PM, RenatoL wrote:
 Very quick question

 import std.stdio;
 void main()
 {
 auto arr1 = ["uno":1, "due":2, "tre":3];
 arr1.remove("xxx");
 }

 also in this case the compiler does not say anything and the
 program goes out silently ... why? Would not it be better if an
 exception was raised? After all if i write:

 writeln(arr1["xxx"]);

 runtime expresses its disappointment...


Jan 08 2012
next sibling parent reply simendsjo <simendsjo gmail.com> writes:
On 08.01.2012 10:43, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 03:24:22 Kapps wrote:
 Ah, found the bug / pull request.

 https://github.com/D-Programming-Language/dmd/pull/597
 http://d.puremagic.com/issues/show_bug.cgi?id=4523

Ah, TDPL says that it returns bool. Well, then it definitely needs to be changed, and it's good to see that that there's a pull request for it. So, it should be fixed in the next release. In fact, the number of TDPL-related bugs fixed for the next release should be quite high, which is great. - Jonathan M Davis

Wouldn't it make sense to return a pointer to the item being removed/null? This way you can continue to work with the item, and it would be consistent with `in`. if (auto item = aa.remove(key)) { // some additional work on item } else { // not found // throw or something }
Jan 08 2012
next sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
simendsjo wrote:

 Wouldn't it make sense to return a pointer to the item being 
 removed/null?

According to the docs this is the intended behavior. -manfred
Jan 08 2012
parent reply simendsjo <simendsjo gmail.com> writes:
On 08.01.2012 11:09, Manfred Nowak wrote:
 simendsjo wrote:

 Wouldn't it make sense to return a pointer to the item being
 removed/null?

According to the docs this is the intended behavior. -manfred

The only mention I can see of remove is at the top, and it doesn't state return value: http://dlang.org/hash-map.html The pull changes the return value to bool: https://github.com/9rnsr/dmd/commit/f3cc75445775927e2036054811afb686cf4f1624
Jan 08 2012
parent Manfred Nowak <svv1999 hotmail.com> writes:
simendsjo wrote:

 it doesn't state return value

-manfred
Jan 08 2012
prev sibling parent reply simendsjo <simendsjo gmail.com> writes:
On 08.01.2012 11:27, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 11:02:41 simendsjo wrote:
 On 08.01.2012 10:43, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 03:24:22 Kapps wrote:
 Ah, found the bug / pull request.

 https://github.com/D-Programming-Language/dmd/pull/597
 http://d.puremagic.com/issues/show_bug.cgi?id=4523

Ah, TDPL says that it returns bool. Well, then it definitely needs to be changed, and it's good to see that that there's a pull request for it. So, it should be fixed in the next release. In fact, the number of TDPL-related bugs fixed for the next release should be quite high, which is great. - Jonathan M Davis

Wouldn't it make sense to return a pointer to the item being removed/null? This way you can continue to work with the item, and it would be consistent with `in`. if (auto item = aa.remove(key)) { // some additional work on item } else { // not found // throw or something }

Since the element has been removed from the container, what would the pointer point to? The element doesn't exist anymore. - Jonathan m Davis

I was thinking it could be a shorthand for the following: auto item = key in aa; if (item) key.remove(key);
Jan 08 2012
parent reply simendsjo <simendsjo gmail.com> writes:
On 08.01.2012 12:18, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
 I was thinking it could be a shorthand for the following:

 auto item = key in aa;
 if (item) key.remove(key);

I take it that you then intend to use item after that? I'm not sure that item is actually valid at that point. It points to memory which _was_ inside of aa but may or may not be now, depending on the AA implementation, and it may or may not be reused by aa. Because it's on the GC heap, the memory itself won't be invalid, but the memory could be reused by the AA for another element, depending on its implementation. If you're lucky, it's memory that's just sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it. I would _not_ consider that to be good code. It's just asking for trouble. - Jonathan M Davis

Ah.. This should be documented in http://dlang.org/hash-map.html The only documentation for remove is: " Particular keys in an associative array can be removed with the remove function: b.remove("hello"); "
Jan 08 2012
parent reply simendsjo <simendsjo gmail.com> writes:
On 08.01.2012 12:57, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 12:35:27 simendsjo wrote:
 On 08.01.2012 12:18, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
 I was thinking it could be a shorthand for the following:

 auto item = key in aa;
 if (item) key.remove(key);

I take it that you then intend to use item after that? I'm not sure that item is actually valid at that point. It points to memory which _was_ inside of aa but may or may not be now, depending on the AA implementation, and it may or may not be reused by aa. Because it's on the GC heap, the memory itself won't be invalid, but the memory could be reused by the AA for another element, depending on its implementation. If you're lucky, it's memory that's just sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it. I would _not_ consider that to be good code. It's just asking for trouble. - Jonathan M Davis

Ah.. This should be documented in http://dlang.org/hash-map.html The only documentation for remove is: " Particular keys in an associative array can be removed with the remove function: b.remove("hello"); "

I confess that it's one of those things that I would have thought would have been fairly obvious, but it certainly wouldn't hurt to add it. The documentation does tend to be sparse of details in any case. - Jonathan M Davis

Certainly not obvious to me :) auto c = new C(); C[string] aa; aa["c"] = c; aa.remove("c"); I guess using c from this point is valid, but if the only reference to c was inside the aa (and we got c using `in`), using c would have been undefined..?
Jan 08 2012
next sibling parent reply simendsjo <simendsjo gmail.com> writes:
On 08.01.2012 13:49, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 13:06:06 simendsjo wrote:
 Certainly not obvious to me :)

Well, what's obvious to one person is not always obvious to another. I'm sure that there are plenty of things that Walter thinks should be perfectly obvious which 90% of programmers would never think of. A lot of it depends on what you're experience level is and what you have experience in. In this case, it's very much like dealing with C++ iterators, so if you're used to dealing with them and when they do or don't get invalidate, then the situation with in and remove is probably quite straightforward and intuitive.
 auto c = new C();
 C[string] aa;
 aa["c"] = c;
 aa.remove("c");

 I guess using c from this point is valid, but if the only reference to c
 was inside the aa (and we got c using `in`), using c would have been
 undefined..?

Using c is valid regardless. The problem is the in operator. It returns a pointer to the element inside of the AA. So, if you did C* d = aa["c"]; then d would be a pointer to a reference to a C, and after the call to remove, the pointer is pointing to memory which may or may not be part of the AA anymore and which may or may not hold a reference to what c refers to. It's bit like doing this int* a() { int b; return&b; } though the compiler should catch this, since it's obviously wrong. In both cases, you're referring to memory which your really not supposed to be accessing anymore and may point to invalid memory. The case of the AA probably isn't as bad, since either you're pointing to memory on the GC which isn't part of the AA (but hasn't been collected, since you have a pointer to it), or you're pointing to an element in the AA which may still be the same as it was before it was removed, or it may be another element, or it may be garbage memory. You really don't know. It's undefined and therefore shouldn't be used. I don't believe that it's ever safe to assume that the pointer returned by the in operator is still valid after the AA that it comes from has had an element added or removed. It's the same with iterators or ranges with many iterators and ranges. If they point to elements within a container, and that container is altered, they're invalidated and aren't safe to use. So, your example is fine, because it doesn't involve the in operator, and therefore doesn't involve any pointers into the AA. It's only once you have pointers into the AA that you get into trouble. - Jonathan M Davis

Thanks for your clarifications. Does this mean even this is undefined? aa["a"] = new C(); auto c = "a" in aa; aa["b"] = new C(); // Using c here is undefined as an element was added to aa
Jan 08 2012
parent simendsjo <simendsjo gmail.com> writes:
On 08.01.2012 14:40, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 14:24:32 simendsjo wrote:
 Thanks for your clarifications.

 Does this mean even this is undefined?
 aa["a"] = new C();
 auto c = "a" in aa;
 aa["b"] = new C();
 // Using c here is undefined as an element was added to aa

Yes. Depending on the current implementation of AAs and the state of aa when "b" is added to it, c could still point to exactly what it did before, or aa could have been rehashed, which could move any or all of its elements around, in which case who knows what c points to. - Jonathan M Davis

Thanks! You've saved me a couple of future bugs :)
Jan 08 2012
prev sibling parent Manfred Nowak <svv1999 hotmail.com> writes:
Jonathan M Davis wrote:

 In this case, it's very much like dealing with C++ iterators

A relevant part of Andrei's thread on "associative arrays iteration": http://www.digitalmars.com/d/archives/digitalmars/D/associative_arrays_ iteration_is_finally_here_99576.html#N99614 -manfred
Jan 08 2012
prev sibling parent Kapps <Kapps NotValidEmail.com> writes:
Looks like this is fixed for 2.058.

https://github.com/D-Programming-Language/dmd/commit/3e23b0f5834acb32eaee20d88c30ead7e03bb2f4

On 08/01/2012 3:43 AM, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 03:24:22 Kapps wrote:
 Ah, found the bug / pull request.

 https://github.com/D-Programming-Language/dmd/pull/597
 http://d.puremagic.com/issues/show_bug.cgi?id=4523

Ah, TDPL says that it returns bool. Well, then it definitely needs to be changed, and it's good to see that that there's a pull request for it. So, it should be fixed in the next release. In fact, the number of TDPL-related bugs fixed for the next release should be quite high, which is great. - Jonathan M Davis

Jan 09 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, January 08, 2012 03:24:22 Kapps wrote:
 Ah, found the bug / pull request.
 
 https://github.com/D-Programming-Language/dmd/pull/597
 http://d.puremagic.com/issues/show_bug.cgi?id=4523

Ah, TDPL says that it returns bool. Well, then it definitely needs to be changed, and it's good to see that that there's a pull request for it. So, it should be fixed in the next release. In fact, the number of TDPL-related bugs fixed for the next release should be quite high, which is great. - Jonathan M Davis
Jan 08 2012
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/8/12, simendsjo <simendsjo gmail.com> wrote:
 Wouldn't it make sense to return a pointer to the item being
 removed/null?

Seems like that would be even more costly. Personally I think returning bool is unnecessary, if we really want to know if something is in the hash we can check with the `in` operator. I filed that bug ages ago simply because it was in TDPL, but I didn't really give it much thought.
Jan 08 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, January 08, 2012 11:02:41 simendsjo wrote:
 On 08.01.2012 10:43, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 03:24:22 Kapps wrote:
 Ah, found the bug / pull request.
 
 https://github.com/D-Programming-Language/dmd/pull/597
 http://d.puremagic.com/issues/show_bug.cgi?id=4523

Ah, TDPL says that it returns bool. Well, then it definitely needs to be changed, and it's good to see that that there's a pull request for it. So, it should be fixed in the next release. In fact, the number of TDPL-related bugs fixed for the next release should be quite high, which is great. - Jonathan M Davis

Wouldn't it make sense to return a pointer to the item being removed/null? This way you can continue to work with the item, and it would be consistent with `in`. if (auto item = aa.remove(key)) { // some additional work on item } else { // not found // throw or something }

Since the element has been removed from the container, what would the pointer point to? The element doesn't exist anymore. - Jonathan m Davis
Jan 08 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
 I was thinking it could be a shorthand for the following:
 
 auto item = key in aa;
 if (item) key.remove(key);

I take it that you then intend to use item after that? I'm not sure that item is actually valid at that point. It points to memory which _was_ inside of aa but may or may not be now, depending on the AA implementation, and it may or may not be reused by aa. Because it's on the GC heap, the memory itself won't be invalid, but the memory could be reused by the AA for another element, depending on its implementation. If you're lucky, it's memory that's just sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it. I would _not_ consider that to be good code. It's just asking for trouble. - Jonathan M Davis
Jan 08 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, January 08, 2012 12:35:27 simendsjo wrote:
 On 08.01.2012 12:18, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
 I was thinking it could be a shorthand for the following:
 
 auto item = key in aa;
 if (item) key.remove(key);

I take it that you then intend to use item after that? I'm not sure that item is actually valid at that point. It points to memory which _was_ inside of aa but may or may not be now, depending on the AA implementation, and it may or may not be reused by aa. Because it's on the GC heap, the memory itself won't be invalid, but the memory could be reused by the AA for another element, depending on its implementation. If you're lucky, it's memory that's just sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it. I would _not_ consider that to be good code. It's just asking for trouble. - Jonathan M Davis

Ah.. This should be documented in http://dlang.org/hash-map.html The only documentation for remove is: " Particular keys in an associative array can be removed with the remove function: b.remove("hello"); "

I confess that it's one of those things that I would have thought would have been fairly obvious, but it certainly wouldn't hurt to add it. The documentation does tend to be sparse of details in any case. - Jonathan M Davis
Jan 08 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, January 08, 2012 13:06:06 simendsjo wrote:
 Certainly not obvious to me :)

Well, what's obvious to one person is not always obvious to another. I'm sure that there are plenty of things that Walter thinks should be perfectly obvious which 90% of programmers would never think of. A lot of it depends on what you're experience level is and what you have experience in. In this case, it's very much like dealing with C++ iterators, so if you're used to dealing with them and when they do or don't get invalidate, then the situation with in and remove is probably quite straightforward and intuitive.
 auto c = new C();
 C[string] aa;
 aa["c"] = c;
 aa.remove("c");
 
 I guess using c from this point is valid, but if the only reference to c
 was inside the aa (and we got c using `in`), using c would have been
 undefined..?

Using c is valid regardless. The problem is the in operator. It returns a pointer to the element inside of the AA. So, if you did C* d = aa["c"]; then d would be a pointer to a reference to a C, and after the call to remove, the pointer is pointing to memory which may or may not be part of the AA anymore and which may or may not hold a reference to what c refers to. It's bit like doing this int* a() { int b; return &b; } though the compiler should catch this, since it's obviously wrong. In both cases, you're referring to memory which your really not supposed to be accessing anymore and may point to invalid memory. The case of the AA probably isn't as bad, since either you're pointing to memory on the GC which isn't part of the AA (but hasn't been collected, since you have a pointer to it), or you're pointing to an element in the AA which may still be the same as it was before it was removed, or it may be another element, or it may be garbage memory. You really don't know. It's undefined and therefore shouldn't be used. I don't believe that it's ever safe to assume that the pointer returned by the in operator is still valid after the AA that it comes from has had an element added or removed. It's the same with iterators or ranges with many iterators and ranges. If they point to elements within a container, and that container is altered, they're invalidated and aren't safe to use. So, your example is fine, because it doesn't involve the in operator, and therefore doesn't involve any pointers into the AA. It's only once you have pointers into the AA that you get into trouble. - Jonathan M Davis
Jan 08 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, January 08, 2012 14:24:32 simendsjo wrote:
 Thanks for your clarifications.
 
 Does this mean even this is undefined?
 aa["a"] = new C();
 auto c = "a" in aa;
 aa["b"] = new C();
 // Using c here is undefined as an element was added to aa

Yes. Depending on the current implementation of AAs and the state of aa when "b" is added to it, c could still point to exactly what it did before, or aa could have been rehashed, which could move any or all of its elements around, in which case who knows what c points to. - Jonathan M Davis
Jan 08 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, January 08, 2012 14:57:51 simendsjo wrote:
 On 08.01.2012 14:40, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 14:24:32 simendsjo wrote:
 Thanks for your clarifications.
 
 Does this mean even this is undefined?
 aa["a"] = new C();
 auto c = "a" in aa;
 aa["b"] = new C();
 // Using c here is undefined as an element was added to aa

Yes. Depending on the current implementation of AAs and the state of aa when "b" is added to it, c could still point to exactly what it did before, or aa could have been rehashed, which could move any or all of its elements around, in which case who knows what c points to. - Jonathan M Davis

Thanks! You've saved me a couple of future bugs :)

No problem. It's a bit like how an array could be reallocated after you append to it, which causes slices to no longer point to that array, except that unlike the slice the pointer may not point to valid memory. Given that it's managed by the GC, it probably isn't pointing to complete garbage, but it could be used for something else now. In general, if you insert or remove elements from a container, you must be very careful about any references to the data within the container - be they pointers to elements in the container, or iterators or ranges for the container. That's why std.container has the stable functions. They guarantee that the container's existing ranges don't get invalidated when they're called, whereas the ones that aren't stable make no such guarantees. The issues with in are just one of the manifestations of the more general issue. - Jonathan M Davis
Jan 08 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sun, 08 Jan 2012 08:40:27 -0500, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Sunday, January 08, 2012 14:24:32 simendsjo wrote:
 Thanks for your clarifications.

 Does this mean even this is undefined?
 aa["a"] = new C();
 auto c = "a" in aa;
 aa["b"] = new C();
 // Using c here is undefined as an element was added to aa

Yes. Depending on the current implementation of AAs and the state of aa when "b" is added to it, c could still point to exactly what it did before, or aa could have been rehashed, which could move any or all of its elements around, in which case who knows what c points to.

Actually, not invalid for the current implementation. I don't know if it's stated whether an AA specifically requires that elements do not re-associate on a rehash. There are certain hash implementations (such as ones that involve a single array), which would invalidate pointers on a rehash. So there is the potential (if it's not a stated requirement to the contrary in the spec) that some future AA implementation would invalidated references. However, I'd say it's unlikely this would happen. BTW, dcollections' HashMap, HashSet, and HashMultiset do guarantee that adding elements does not invalidated cursors (dcollections' safe version of pointers) as long as you use the default Hash implementation. However, I just noticed this is not stated anywhere in the docs (will fix). -Steve
Jan 09 2012
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, January 09, 2012 09:25:14 Steven Schveighoffer wrote:
 Actually, not invalid for the current implementation. I don't know if
 it's stated whether an AA specifically requires that elements do not
 re-associate on a rehash.

Well, like I said, it depends on the current implementation. There are hash table implementations where rehashing would invalidate the pointer returned by in, and I don't believe that the spec specificies that D's AA guarantees that rehashing doesn't do that. So in the future, it _could_ be changed to an implementation which invalidates the pointers on rehash. As such, it doesn't really matter what the current implementation does. Relying on the current behavior is unsafe if it's not guaranteed by the spec. Now, if we want to change the spec to make such guarantees, that's fine, but I don't believe it makes them right now. Good to know what the current implementation is doing though. - Jonathan m Davis
Jan 09 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/9/12, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 BTW, dcollections' HashMap, HashSet, and HashMultiset do guarantee that
 adding elements does not invalidated cursors (dcollections' safe version
 of pointers) as long as you use the default Hash implementation.  However,
 I just noticed this is not stated anywhere in the docs (will fix).

Funny, I was looking at the docs a few days ago and it said "Adding an element can invalidate cursors depending on the implementation.", so I just assumed it did invalidate the cursors. I do think those are dcollections v1 docs though. Anyway glad to hear from you about this.
Jan 09 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 09 Jan 2012 13:35:26 -0500, Andrej Mitrovic  
<andrej.mitrovich gmail.com> wrote:

 On 1/9/12, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 BTW, dcollections' HashMap, HashSet, and HashMultiset do guarantee that
 adding elements does not invalidated cursors (dcollections' safe version
 of pointers) as long as you use the default Hash implementation.   
 However,
 I just noticed this is not stated anywhere in the docs (will fix).

Funny, I was looking at the docs a few days ago and it said "Adding an element can invalidate cursors depending on the implementation.", so I just assumed it did invalidate the cursors.

Yeah, I could have sworn I stated somewhere that the current implementation doesn't invalidate cursors, but apparently it's not stated. I think the docs need a lot of TLC before the first release. Some day when I have time... To clarify what I plan to do, the add doc will say something like "adding an element can invalidate cursors depending on the implementation, however the default implementation guarantees not to invalidate them." I don't want to specifically disallow implementations which do invalidate (dcollections' implementations are pluggable so anyone can replace the internals).
 I do think those are dcollections v1 docs though. Anyway glad to hear
 from you about this.

The D2 docs are somewhat leftover from D1 version. However, I do remember when implementing the hash code that it purposely does not invalidate cursors on a rehash. Ranges can be invalidated, however. Given the implementation, it's actually faster *not* to invalidate them when rehashing because I just relink all the link-list nodes instead of reallocating them. -Steve
Jan 09 2012
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Ok, allow me to temporarily hijack again and ask: Would you mind
adding opIn_r (or rather the newer opBinaryRight with "in") that
forwards to contains() for the HashSet and similar hash-based classes
that define contains()? It would make porting code that uses builtin
hashes to your own implementation easier.
Jan 09 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 09 Jan 2012 14:57:36 -0500, Andrej Mitrovic  
<andrej.mitrovich gmail.com> wrote:

 Ok, allow me to temporarily hijack again and ask: Would you mind
 adding opIn_r (or rather the newer opBinaryRight with "in") that
 forwards to contains() for the HashSet and similar hash-based classes
 that define contains()? It would make porting code that uses builtin
 hashes to your own implementation easier.

Could this be you? http://www.dsource.org/projects/dcollections/ticket/18 If so, this means you are OK with the last suggestion I made? -Steve
Jan 09 2012
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 1/9/12, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 Could this be you?

Ah, yes. I didn't even notice you've replied to that, sorry. Yes, I'm ok with it.
Jan 09 2012