digitalmars.D.learn - associative arrays

RenatoL (12/12) Jan 07 2012 Very quick question

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (11/24) Jan 07 2012 I think that could be a valid design choice. On the other hand, since

RenatoL (3/3) Jan 07 2012 Yes, i agree this may acceptable. On the other hand if i really

Jonathan M Davis (19/22) Jan 07 2012 In general, if an element isn't in a container, and you expect it to be ...

RenatoL (5/5) Jan 07 2012 Yes, Jonathan, you're right.

bearophile (20/34) Jan 07 2012 This an example that shows that silent failures are sources of bugs...

Jonathan M Davis (29/31) Jan 07 2012 Why? What's the gain? If you care that the element is already in the AA,...

Manfred Nowak (18/19) Jan 07 2012 There are some more arguments:

Jesse Phillips (6/27) Jan 08 2012 Based on these arguments does that mean std.file.remove should

RenatoL (1/1) Jan 08 2012 Very interesting discussion. Tk u all.

dennis luehring (5/11) Jan 09 2012 but then it should be called optional_remove - because it "mabye" remove...

Manfred Nowak (22/24) Jan 09 2012 Exceptions or errors are not _needed_.

dennis luehring (8/23) Jan 09 2012 so your FileDelete would not return an FileDoesNotExists-Error?

Manfred Nowak (12/15) Jan 10 2012 Correct.

Stephen Bennett (15/23) Jan 08 2012 a.remove(x) doesn't fail though. It successfully satisfies its own

Jonathan M Davis (60/94) Jan 07 2012 ide

Manfred Nowak (4/4) Jan 07 2012 Jonathan M Davis wrote:

Jonathan M Davis (3/8) Jan 07 2012 Which part?

Kapps (11/23) Jan 07 2012 For most languages (such as C# and maybe Java), the Remove method on

Jonathan M Davis (7/19) Jan 08 2012 There _is_ extra cost in returning bool rather than void, but I'm not at...
Kapps (4/31) Jan 08 2012 Ah, found the bug / pull request.

Jonathan M Davis (6/10) Jan 08 2012 Ah, TDPL says that it returns bool. Well, then it definitely needs to be...

simendsjo (9/19) Jan 08 2012 Wouldn't it make sense to return a pointer to the item being

Manfred Nowak (3/5) Jan 08 2012 According to the docs this is the intended behavior.

simendsjo (5/10) Jan 08 2012 The only mention I can see of remove is at the top, and it doesn't state...

Manfred Nowak (3/4) Jan 08 2012 Yes, I haven't read carefully enough.

Jonathan M Davis (4/28) Jan 08 2012 Since the element has been removed from the container, what would the po...

simendsjo (4/32) Jan 08 2012 I was thinking it could be a shorthand for the following:

Jonathan M Davis (10/14) Jan 08 2012 I take it that you then intend to use item after that? I'm not sure that...

simendsjo (8/22) Jan 08 2012 Ah.. This should be documented in http://dlang.org/hash-map.html

Jonathan M Davis (5/33) Jan 08 2012 I confess that it's one of those things that I would have thought would ...

simendsjo (9/42) Jan 08 2012 Certainly not obvious to me :)

Jonathan M Davis (37/46) Jan 08 2012 Well, what's obvious to one person is not always obvious to another. I'm...

simendsjo (7/53) Jan 08 2012 Thanks for your clarifications.

Jonathan M Davis (6/13) Jan 08 2012 Yes. Depending on the current implementation of AAs and the state of aa ...

simendsjo (2/15) Jan 08 2012 Thanks! You've saved me a couple of future bugs :)

Jonathan M Davis (14/32) Jan 08 2012 No problem. It's a bit like how an array could be reallocated after you ...

Steven Schveighoffer (15/30) Jan 09 2012 Actually, not invalid for the current implementation. I don't know if

Jonathan M Davis (12/15) Jan 09 2012 Well, like I said, it depends on the current implementation. There are h...
Andrej Mitrovic (6/10) Jan 09 2012 Funny, I was looking at the docs a few days ago and it said "Adding an

Steven Schveighoffer (19/30) Jan 09 2012 Yeah, I could have sworn I stated somewhere that the current

Andrej Mitrovic (5/5) Jan 09 2012 Ok, allow me to temporarily hijack again and ask: Would you mind

Steven Schveighoffer (6/11) Jan 09 2012 Could this be you?

Andrej Mitrovic (3/4) Jan 09 2012 Ah, yes. I didn't even notice you've replied to that, sorry. Yes, I'm

Manfred Nowak (5/6) Jan 08 2012 A relevant part of Andrei's thread on "associative arrays iteration":

Andrej Mitrovic (6/8) Jan 08 2012 Seems like that would be even more costly. Personally I think

Kapps (3/13) Jan 09 2012 Looks like this is fixed for 2.058.

RenatoL <rexlen gmail.com> writes:

Very quick question

import std.stdio;
void main()
{
	auto arr1 = ["uno":1, "due":2, "tre":3];
	arr1.remove("xxx");
}

also in this case the compiler does not say anything and the
program goes out silently ... why? Would not it be better if an
exception was raised? After all if i write:

writeln(arr1["xxx"]);

runtime expresses its disappointment...

Jan 07 2012

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 01/07/2012 02:11 PM, RenatoL wrote:
 Very quick question

 import std.stdio;
 void main()
 {
 	auto arr1 = ["uno":1, "due":2, "tre":3];
 	arr1.remove("xxx");
 }

 also in this case the compiler does not say anything and the
 program goes out silently ... why? Would not it be better if an
 exception was raised?

I think that could be a valid design choice. On the other hand, since 
what is required is to remove the element, it is not really an error if 
the object is already missing. The requirement is met.

 After all if i write:

 writeln(arr1["xxx"]);

 runtime expresses its disappointment...

You get the exception more simply by this:

     auto e = arr1["xxx"];

But that's understandable because the [] operator is supposed to provide 
access to a usable object. Since there is no general concept of a null 
object, there is no object to provide access to, so there is nothing 
else to do but to throw.

Ali

Jan 07 2012

RenatoL <rexlen gmail.com> writes:

Yes, i agree this may acceptable. On the other hand if i really
want/have to remove an item i have to be very careful cause a
trivial typo could cause a disaster....

Jan 07 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, January 07, 2012 23:34:05 RenatoL wrote:
 Yes, i agree this may acceptable. On the other hand if i really
 want/have to remove an item i have to be very careful cause a
 trivial typo could cause a disaster....

In general, if an element isn't in a container, and you expect it to be there, 
it's bug in your code and _should_ cause disaster. That is, if your code is 
assuming that the element must be in the container before you remove it, you 
should be asserting for that.

assert(key in aa);
aa.remove(key);

This will result in an AssertError if key is not in aa, and upon failure will 
indicate that you have a bug in your code that you need to fix.

If, on the other hand, it's not a bug if that element is not in the container, 
then why would your code be assuming that it is? In that case, I don't see why 
you'd care whether it was actually in the container or not, and having remove 
not do anything if it isn't there is the perfect behavior in that case.

So, I'd argue that if your code is assuming that the element is in the 
container, an assertion should be used, and the fact that remove doesn't throw 
an exception is irrelevant, since it would be the wrong thing to do anyway. 
And if your code can't assume that the element is in the container already, 
then just let remove remove it if it's there and do nothing if it's not.

- Jonathan M Davis

Jan 07 2012

RenatoL <rexlen gmail.com> writes:

Yes, Jonathan, you're right.
the question arose precisely from a typo... i had to remove an
item with key "length"... i wrote "lengt" and the item never went
away... i knew that "lengt" was not in my key list... This kind of
mistake is quite tricky, may be using and IDE could help.

Jan 07 2012

bearophile <bearophileHUGS lycos.com> writes:

RenatoL:

 Yes, Jonathan, you're right.
 the question arose precisely from a typo... i had to remove an
 item with key "length"... i wrote "lengt" and the item never went
 away... i knew that "lengt" was not in my key list... This kind of
 mistake is quite tricky, may be using and IDE could help.

This an example that shows that silent failures are sources of bugs...
So this time I don't agree with Jonathan Davis.

Python Zen contains:
Errors should never pass silently.

And Python associative arrays show it:

 d = {1:2, 3:4}
 del d[5]



Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 5


The successive rule of the Python Zen says:
Unless explicitly silenced.

Python sets show this at work, they have a 'remove' method that throws if the
item is missing, plus a 'discard' method that acts more like D associative
arrays, in a sense it's a way to silence explicitly the error:

 s = set([1, 2])
 s.discard(1)
 s



set([2])
 s = set([1, 2])
 s.discard(3)
 s



set([1, 2])
 s.remove(3)



Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 3

Maybe D associative arrays too should have both kinds of deleting methods :-)

Bye,
bearophile

Jan 07 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, January 07, 2012 21:54:07 bearophile wrote:
 Maybe D associative arrays too should have both kinds of deleting methods
 :-)

Why? What's the gain? If you care that the element is already in the AA, then 
do either

assert(key in aa);
aa.remove(key);

or

enforce(key in aa);
aa.remove(key);

I suppose that that's less efficient than remove throwing on failure, since it 
has to do a double lookup, but since throwing the exception costs _far_ more 
than that, I don't think that the extra cost really matters

But if you _don't_ care whether the element is in the AA when you remove it, 
and remove threw when the element wasn't there, then you'd have to do

if(key in aa)
    aa.remove(key);

which would result is a double lookup when you don't want one, and performance 
_would_ matter, since there isn't necessarily another operation which would 
cost a lot which is overshadowing the cost of the double lookup (as occurs in 
the case where you throw the exception when the element isn't there). Also, if 
you considered the key not being there to be a bug (as seems more likely to 
me), the fact that remove then threw an assertion would force you to do the 
extra check _anyway_.

assert(key in aa);
aa.remove(key);

So, as far as I can tell, the current situation is more efficient, and it 
doesn't cost you any expressiveness. You can still have an exception thrown 
when remove fails if you use enforce before the call if you want an exception 
thrown when the element isn't there.

- Jonathan M Davis

Jan 07 2012

Manfred Nowak <svv1999 hotmail.com> writes:

Jonathan M Davis wrote:

 So, as far as I can tell, the current situation is more efficient

There are some more arguments:

1) Different threads might want to `remove' the same key from the AA. I 
don't see a reason why only the first served thread should complete the 
operation without an error.

2) Because there is only one operation for insertion of a key into an 
AA, only one operation should be used for removing that key.

Note: This argument vanishes if insertions are divided in 
  a:declaration of a key as valid in that AA,
  b:first assignment to the value of a key,
  c:reassignment to the value of a key

3) Reverse operations should not implement more functionality than the 
reversed operation. Because `remove' is the reverse operation for 
inserting a key and that operation does not cause any error, neither 
should errors be given from `remove'.

Note: Reassigning to the value of a key might very well treated as an 
error.

-manfred

Jan 07 2012

"Jesse Phillips" <jessekphillips+D gmail.com> writes:

Based on these arguments does that mean std.file.remove should 
not be throwing when a file doesn't exist?

Or more precisely should I add a bugzilla entry to change this. I 
certainly have a lot of if(exists) remove() in my code and end up 
having to change where I just use remove().

On Sunday, 8 January 2012 at 05:03:45 UTC, Manfred Nowak wrote:
 Jonathan M Davis wrote:

 So, as far as I can tell, the current situation is more 
 efficient

 There are some more arguments:

 1) Different threads might want to `remove' the same key from 
 the AA. I don't see a reason why only the first served thread 
 should complete the operation without an error.

 2) Because there is only one operation for insertion of a key 
 into an AA, only one operation should be used for removing that 
 key.

 Note: This argument vanishes if insertions are divided in
 a:declaration of a key as valid in that AA,
 b:first assignment to the value of a key,
 c:reassignment to the value of a key

 3) Reverse operations should not implement more functionality 
 than the reversed operation. Because `remove' is the reverse 
 operation for inserting a key and that operation does not cause 
 any error, neither should errors be given from `remove'.

 Note: Reassigning to the value of a key might very well treated 
 as an error.

 -manfred

Jan 08 2012

RenatoL <rexlen gmail.com> writes:

Very interesting discussion. Tk u all.

Jan 08 2012

dennis luehring <dl.soluz gmx.net> writes:

 assert(key in aa);
 aa.remove(key);

 So, as far as I can tell, the current situation is more efficient, and it
 doesn't cost you any expressiveness. You can still have an exception thrown
 when remove fails if you use enforce before the call if you want an exception
 thrown when the element isn't there.

but then it should be called optional_remove - because it "mabye" removes

its like file_delete, DeleteFile (and all the others) on non existing 
files - why is there an exception/error neeeded if missing?

the "maybe"-situation is not that often used in good code - because you 
can't find your errors if many routines would work like remove

Jan 09 2012

Manfred Nowak <svv1999 hotmail.com> writes:

dennis luehring wrote:
 why is there an exception/error neeeded if missing?

Exceptions or errors are not _needed_.

Their existence stems from the modell under which the user of the 
operation _has_ to think about the operation, especially whether it 
is
  a:only the outcome of the operation or
  b:the amount of work to get to this outcome
and maybe 
  c:at runtime has to be known, whether the object exists.


Jesse Phillips wrote:
 I have a lot of if(exists) remove() in my code

As one can see, Jesse does not know, whether the object exists and 
has to ask, before he in fact removes the object, i.e. doubled access 
to the file, which may cause a slowdown.

As Jesse states the designer of the API has made up false assumptions 
about his knowledge, described in c:,  _and_ his interests, described 
in b:.

Jesse is interested only in the outcome of the operation, as 
described in a:,

Therefore the likely answer to your question stated at the beginning 
is: the designer of the API made a mistake.

-manfred

Jan 09 2012

dennis luehring <dl.soluz gmx.net> writes:

Am 09.01.2012 22:08, schrieb Manfred Nowak:
 dennis luehring wrote:
  why is there an exception/error neeeded if missing?

 Exceptions or errors are not _needed_.

 Their existence stems from the modell under which the user of the
 operation _has_ to think about the operation, especially whether it
 is
    a:only the outcome of the operation or
    b:the amount of work to get to this outcome
 and maybe
    c:at runtime has to be known, whether the object exists.


 Jesse Phillips wrote:
  I have a lot of if(exists) remove() in my code

 As one can see, Jesse does not know, whether the object exists and
 has to ask, before he in fact removes the object, i.e. doubled access
 to the file, which may cause a slowdown.

so your FileDelete would not return an FileDoesNotExists-Error?

it isn't always clear what is "ok" to happen - there are million types 
of remove-like-semantic, an Jesse's scenario is one type of usage, which 
other type of remove allows silently non-remove-removes? STL? Python, 
etc. (how do others handle such cases)

would it not help to better understand big code if the remove would be 
renamed to remove_existing or to add something like this?

Jan 09 2012

Manfred Nowak <svv1999 hotmail.com> writes:

dennis luehring wrote:

 so your FileDelete would not return an FileDoesNotExists-Error?

Correct.
 
 would it not help to better understand big code if the remove
 would be renamed to remove_existing or to add something like this?

Maybe.

You possibly know about the `rm'-command of *nix-like systems and the 
by typo inserted space, which makes
  `rm -r *.obj' to
  `rm -r * .obj'
This will certainly result in the nice error-message:
  "cannot remove `.obj': No such file or directory"

Therefore: trying to help a minority possibly routes the majority.

-manfred

Jan 10 2012

Stephen Bennett <d tarzenda.net> writes:

On 1/7/2012 8:54 PM, bearophile wrote:
 Yes, Jonathan, you're right.
 the question arose precisely from a typo... i had to remove an
 item with key "length"... i wrote "lengt" and the item never went
 away... i knew that "lengt" was not in my key list... This kind of
 mistake is quite tricky, may be using and IDE could help.

 
 This an example that shows that silent failures are sources of bugs...
 So this time I don't agree with Jonathan Davis.

a.remove(x) doesn't fail though. It successfully satisfies its own
postconditions by placing a in a state whereby it no longer contains x.
The real source of bugs in this example is using hardcoded strings
instead of named constants. :)

Besides, one frequently doesn't know if (x in a) or not ahead of time.
Think of evicting some computed intermediate results from a cache when
you detect that the underlying resource has changed, for instance. In
fact, it's really rare for an item to be removed from a long-lived
collection anywhere near the point that it was added. Insisting on
existence before removal clutters up the code for an extremely common
use case.

FWIW, Java's Map and .NET's IDictionary also handle this situation silently.

Regards,
~Stephen

Jan 08 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, January 07, 2012 15:06:30 Ali =C3=87ehreli wrote:
 On 01/07/2012 02:11 PM, RenatoL wrote:
  > Very quick question
  >=20
  > import std.stdio;
  > void main()
  > {
  >=20
  > =09auto arr1 =3D ["uno":1, "due":2, "tre":3];
  > =09arr1.remove("xxx");
  >=20
  > }
  >=20
  > also in this case the compiler does not say anything and the
  > program goes out silently ... why? Would not it be better if an
  > exception was raised?
=20
 I think that could be a valid design choice. On the other hand, since=

 what is required is to remove the element, it is not really an error =

if
 the object is already missing. The requirement is met.
=20
  > After all if i write:
  >=20
  > writeln(arr1["xxx"]);
  >=20
  > runtime expresses its disappointment...
=20
 You get the exception more simply by this:
=20
      auto e =3D arr1["xxx"];
=20
 But that's understandable because the [] operator is supposed to prov=

ide
 access to a usable object. Since there is no general concept of a nul=

l
 object, there is no object to provide access to, so there is nothing
 else to do but to throw.

You likely know this, but I'll point it out in case a newbie to decide =
to get=20
htrowing behavior out of the substript operator.

If the element is not in the container, then the subscript operator thr=
ows a=20
RangeError if -release isn't used, so it's an Error, not an Exception, =
and=20
isn't a Throwable that you're supposed to catch (since the unwinding of=
 the=20
stack skips destructors, scope statements, finall blocks for any Throwa=
bles=20
which aren't either Exception or derived from Exception). Also, without=
 -
release, it does no bounds checking and will likely result in a segfaul=
t. So,=20
relying on the subscript operator throwing is a _bad_ idea. The idea is=
 that=20
you never give it something that isn't actually in the container.

If you need to check it, and you're dealing with a dynamic or static ar=
ray,=20
then you check the array's length. If you're dealing with an associativ=
e=20
array, then you use the in operator to get a pointer to the element fro=
m the=20
container and check whether it's null or not (since if it's null, it wa=
sn't=20
there).

It would probably make remove more expensive if it had to throw when it=
 didn't=20
find an element, and for the most part, it's rather meaningless whether=
 it was=20
in the container before that. As pointed out, remove makes sure that th=
e=20
element is no longer in the container after it's been called. It has no=
 real=20
reason to care whether an element was in there before or not. And if _y=
ou_=20
don't care, then being forced to check whether an element was in the co=
ntainer=20
before trying to remove it would make your code less efficient. The way=
 remove=20
is now, the check only occurs if you do it yourself and therefore actua=
lly=20
_want_ it. And odds are, such a check would be an assertion anyway (and=
=20
therefore would be throwing an AssertError on failure, not an Exception=
),=20
since the fact that the element wasn't in there anymore is a bug in you=
r code,=20
since your code is assuming that it's there.

So, while I can understand someone's first reaction being why remove do=
esn't=20
throw if the element isn't there, when you really think through it, it =
does=20
make more sense for it not to throw.

- Jonathan M Davis

Jan 07 2012

Manfred Nowak <svv1999 hotmail.com> writes:

Jonathan M Davis wrote:

[...]

This sort of explanation is missing in the docs.

-manfred

Jan 07 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, January 08, 2012 00:13:12 Manfred Nowak wrote:
 Jonathan M Davis wrote:
 
 [...]
 
 This sort of explanation is missing in the docs.

Which part?

- Jonathan M Davis

Jan 07 2012

Kapps <Kapps NotValidEmail.com> writes:


collections returns a boolean indicating whether it was removed. So you 
could write:

enforce(MyAA.remove("lenght"))
or
bool Result = MyAA.remove("lenght");
assert(Result);

I'm not sure why it doesn't in D, but I thought I remembered seeing a 
pull request or change that added it. Maybe I'm imagining things since I 
can't find it now.

On 07/01/2012 4:11 PM, RenatoL wrote:
 Very quick question

 import std.stdio;
 void main()
 {
 	auto arr1 = ["uno":1, "due":2, "tre":3];
 	arr1.remove("xxx");
 }

 also in this case the compiler does not say anything and the
 program goes out silently ... why? Would not it be better if an
 exception was raised? After all if i write:

 writeln(arr1["xxx"]);

 runtime expresses its disappointment...

Jan 07 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, January 08, 2012 01:39:24 Kapps wrote:

 collections returns a boolean indicating whether it was removed. So you
 could write:
 
 enforce(MyAA.remove("lenght"))
 or
 bool Result = MyAA.remove("lenght");
 assert(Result);
 
 I'm not sure why it doesn't in D, but I thought I remembered seeing a
 pull request or change that added it. Maybe I'm imagining things since I
 can't find it now.

There _is_ extra cost in returning bool rather than void, but I'm not at all 
sure that it's unreasonable to incur that cost. std.container.RedBlackTree's 
remove function returns how many elements were removed. Other functions in 
Phobos do similar in similar situations.

It probably merits an enhancement request.

- Jonathan M Davis

Jan 08 2012

Kapps <Kapps NotValidEmail.com> writes:

Ah, found the bug / pull request.

https://github.com/D-Programming-Language/dmd/pull/597
http://d.puremagic.com/issues/show_bug.cgi?id=4523

On 08/01/2012 1:39 AM, Kapps wrote:

 collections returns a boolean indicating whether it was removed. So you
 could write:

 enforce(MyAA.remove("lenght"))
 or
 bool Result = MyAA.remove("lenght");
 assert(Result);

 I'm not sure why it doesn't in D, but I thought I remembered seeing a
 pull request or change that added it. Maybe I'm imagining things since I
 can't find it now.

 On 07/01/2012 4:11 PM, RenatoL wrote:
 Very quick question

 import std.stdio;
 void main()
 {
 auto arr1 = ["uno":1, "due":2, "tre":3];
 arr1.remove("xxx");
 }

 also in this case the compiler does not say anything and the
 program goes out silently ... why? Would not it be better if an
 exception was raised? After all if i write:

 writeln(arr1["xxx"]);

 runtime expresses its disappointment...

Jan 08 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, January 08, 2012 03:24:22 Kapps wrote:
 Ah, found the bug / pull request.
 
 https://github.com/D-Programming-Language/dmd/pull/597
 http://d.puremagic.com/issues/show_bug.cgi?id=4523

Ah, TDPL says that it returns bool. Well, then it definitely needs to be 
changed, and it's good to see that that there's a pull request for it. So, it 
should be fixed in the next release. In fact, the number of TDPL-related bugs 
fixed for the next release should be quite high, which is great.

- Jonathan M Davis

Jan 08 2012

simendsjo <simendsjo gmail.com> writes:

On 08.01.2012 10:43, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 03:24:22 Kapps wrote:
 Ah, found the bug / pull request.

 https://github.com/D-Programming-Language/dmd/pull/597
 http://d.puremagic.com/issues/show_bug.cgi?id=4523

 Ah, TDPL says that it returns bool. Well, then it definitely needs to be
 changed, and it's good to see that that there's a pull request for it. So, it
 should be fixed in the next release. In fact, the number of TDPL-related bugs
 fixed for the next release should be quite high, which is great.

 - Jonathan M Davis

Wouldn't it make sense to return a pointer to the item being 
removed/null? This way you can continue to work with the item, and it 
would be consistent with `in`.

if (auto item = aa.remove(key)) {
   // some additional work on item
} else { // not found
   // throw or something
}

Jan 08 2012

Manfred Nowak <svv1999 hotmail.com> writes:

simendsjo wrote:

 Wouldn't it make sense to return a pointer to the item being 
 removed/null?

According to the docs this is the intended behavior.

-manfred

Jan 08 2012

simendsjo <simendsjo gmail.com> writes:

On 08.01.2012 11:09, Manfred Nowak wrote:
 simendsjo wrote:

 Wouldn't it make sense to return a pointer to the item being
 removed/null?

 According to the docs this is the intended behavior.

 -manfred

The only mention I can see of remove is at the top, and it doesn't state 
return value: http://dlang.org/hash-map.html

The pull changes the return value to bool: 
https://github.com/9rnsr/dmd/commit/f3cc75445775927e2036054811afb686cf4f1624

Jan 08 2012

Manfred Nowak <svv1999 hotmail.com> writes:

simendsjo wrote:

 it doesn't state return value

Yes, I haven't read carefully enough.

-manfred

Jan 08 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, January 08, 2012 11:02:41 simendsjo wrote:
 On 08.01.2012 10:43, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 03:24:22 Kapps wrote:
 Ah, found the bug / pull request.
 
 https://github.com/D-Programming-Language/dmd/pull/597
 http://d.puremagic.com/issues/show_bug.cgi?id=4523

 
 Ah, TDPL says that it returns bool. Well, then it definitely needs to be
 changed, and it's good to see that that there's a pull request for it.
 So, it should be fixed in the next release. In fact, the number of
 TDPL-related bugs fixed for the next release should be quite high,
 which is great.
 
 - Jonathan M Davis

 
 Wouldn't it make sense to return a pointer to the item being
 removed/null? This way you can continue to work with the item, and it
 would be consistent with `in`.
 
 if (auto item = aa.remove(key)) {
    // some additional work on item
 } else { // not found
    // throw or something
 }

Since the element has been removed from the container, what would the pointer 
point to? The element doesn't exist anymore.

- Jonathan m Davis

Jan 08 2012

simendsjo <simendsjo gmail.com> writes:

On 08.01.2012 11:27, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 11:02:41 simendsjo wrote:
 On 08.01.2012 10:43, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 03:24:22 Kapps wrote:
 Ah, found the bug / pull request.

 https://github.com/D-Programming-Language/dmd/pull/597
 http://d.puremagic.com/issues/show_bug.cgi?id=4523

 Ah, TDPL says that it returns bool. Well, then it definitely needs to be
 changed, and it's good to see that that there's a pull request for it.
 So, it should be fixed in the next release. In fact, the number of
 TDPL-related bugs fixed for the next release should be quite high,
 which is great.

 - Jonathan M Davis

 Wouldn't it make sense to return a pointer to the item being
 removed/null? This way you can continue to work with the item, and it
 would be consistent with `in`.

 if (auto item = aa.remove(key)) {
     // some additional work on item
 } else { // not found
     // throw or something
 }

 Since the element has been removed from the container, what would the pointer
 point to? The element doesn't exist anymore.

 - Jonathan m Davis

I was thinking it could be a shorthand for the following:

auto item = key in aa;
if (item) key.remove(key);

Jan 08 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
 I was thinking it could be a shorthand for the following:
 
 auto item = key in aa;
 if (item) key.remove(key);

I take it that you then intend to use item after that? I'm not sure that item 
is actually valid at that point. It points to memory which _was_ inside of aa 
but may or may not be now, depending on the AA implementation, and it may or 
may not be reused by aa. Because it's on the GC heap, the memory itself won't 
be invalid, but the memory could be reused by the AA for another element, 
depending on its implementation. If you're lucky, it's memory that's just 
sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it.

I would _not_ consider that to be good code. It's just asking for trouble.

- Jonathan M Davis

Jan 08 2012

simendsjo <simendsjo gmail.com> writes:

On 08.01.2012 12:18, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
 I was thinking it could be a shorthand for the following:

 auto item = key in aa;
 if (item) key.remove(key);

 I take it that you then intend to use item after that? I'm not sure that item
 is actually valid at that point. It points to memory which _was_ inside of aa
 but may or may not be now, depending on the AA implementation, and it may or
 may not be reused by aa. Because it's on the GC heap, the memory itself won't
 be invalid, but the memory could be reused by the AA for another element,
 depending on its implementation. If you're lucky, it's memory that's just
 sitting on the GC heap and now unrelated to the AA, but I wouldn't bet on it.

 I would _not_ consider that to be good code. It's just asking for trouble.

 - Jonathan M Davis

Ah.. This should be documented in http://dlang.org/hash-map.html
The only documentation for remove is:
"
Particular keys in an associative array can be removed with the remove 
function:
     b.remove("hello");
"

Jan 08 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, January 08, 2012 12:35:27 simendsjo wrote:
 On 08.01.2012 12:18, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
 I was thinking it could be a shorthand for the following:
 
 auto item = key in aa;
 if (item) key.remove(key);

 
 I take it that you then intend to use item after that? I'm not sure that
 item is actually valid at that point. It points to memory which _was_
 inside of aa but may or may not be now, depending on the AA
 implementation, and it may or may not be reused by aa. Because it's on
 the GC heap, the memory itself won't be invalid, but the memory could
 be reused by the AA for another element, depending on its
 implementation. If you're lucky, it's memory that's just sitting on the
 GC heap and now unrelated to the AA, but I wouldn't bet on it.
 
 I would _not_ consider that to be good code. It's just asking for
 trouble.
 
 - Jonathan M Davis

 
 Ah.. This should be documented in http://dlang.org/hash-map.html
 The only documentation for remove is:
 "
 Particular keys in an associative array can be removed with the remove
 function:
      b.remove("hello");
 "

I confess that it's one of those things that I would have thought would have 
been fairly obvious, but it certainly wouldn't hurt to add it. The 
documentation does tend to be sparse of details in any case.

- Jonathan M Davis

Jan 08 2012

simendsjo <simendsjo gmail.com> writes:

On 08.01.2012 12:57, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 12:35:27 simendsjo wrote:
 On 08.01.2012 12:18, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 11:33:51 simendsjo wrote:
 I was thinking it could be a shorthand for the following:

 auto item = key in aa;
 if (item) key.remove(key);

 I take it that you then intend to use item after that? I'm not sure that
 item is actually valid at that point. It points to memory which _was_
 inside of aa but may or may not be now, depending on the AA
 implementation, and it may or may not be reused by aa. Because it's on
 the GC heap, the memory itself won't be invalid, but the memory could
 be reused by the AA for another element, depending on its
 implementation. If you're lucky, it's memory that's just sitting on the
 GC heap and now unrelated to the AA, but I wouldn't bet on it.

 I would _not_ consider that to be good code. It's just asking for
 trouble.

 - Jonathan M Davis

 Ah.. This should be documented in http://dlang.org/hash-map.html
 The only documentation for remove is:
 "
 Particular keys in an associative array can be removed with the remove
 function:
       b.remove("hello");
 "

 I confess that it's one of those things that I would have thought would have
 been fairly obvious, but it certainly wouldn't hurt to add it. The
 documentation does tend to be sparse of details in any case.

 - Jonathan M Davis

Certainly not obvious to me :)

auto c = new C();
C[string] aa;
aa["c"] = c;
aa.remove("c");

I guess using c from this point is valid, but if the only reference to c 
was inside the aa (and we got c using `in`), using c would have been 
undefined..?

Jan 08 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, January 08, 2012 13:06:06 simendsjo wrote:
 Certainly not obvious to me :)

Well, what's obvious to one person is not always obvious to another. I'm sure 
that there are plenty of things that Walter thinks should be perfectly obvious 
which 90% of programmers would never think of. A lot of it depends on what 
you're experience level is and what you have experience in. In this case, it's 
very much like dealing with C++ iterators, so if you're used to dealing with 
them and when they do or don't get invalidate, then the situation with in and 
remove is probably quite straightforward and intuitive.

 auto c = new C();
 C[string] aa;
 aa["c"] = c;
 aa.remove("c");
 
 I guess using c from this point is valid, but if the only reference to c
 was inside the aa (and we got c using `in`), using c would have been
 undefined..?

Using c is valid regardless. The problem is the in operator. It returns a 
pointer to the element inside of the AA. So, if you did

C* d = aa["c"];

then d would be a pointer to a reference to a C, and after the call to remove, 
the pointer is pointing to memory which may or may not be part of the AA 
anymore and which may or may not hold a reference to what c refers to. It's 
bit like doing this

int* a()
{
    int b;
    return &b;
}

though the compiler should catch this, since it's obviously wrong. In both 
cases, you're referring to memory which your really not supposed to be 
accessing anymore and may point to invalid memory. The case of the AA probably 
isn't as bad, since either you're pointing to memory on the GC which isn't 
part of the AA (but hasn't been collected, since you have a pointer to it), or 
you're pointing to an element in the AA which may still be the same as it was 
before it was removed, or it may be another element, or it may be garbage 
memory. You really don't know. It's undefined and therefore shouldn't be used.

I don't believe that it's ever safe to assume that the pointer returned by the 
in operator is still valid after the AA that it comes from has had an element 
added or removed. It's the same with iterators or ranges with many iterators 
and ranges. If they point to elements within a container, and that container 
is altered, they're invalidated and aren't safe to use.

So, your example is fine, because it doesn't involve the in operator, and 
therefore doesn't involve any pointers into the AA. It's only once you have 
pointers into the AA that you get into trouble.

- Jonathan M Davis

Jan 08 2012

simendsjo <simendsjo gmail.com> writes:

On 08.01.2012 13:49, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 13:06:06 simendsjo wrote:
 Certainly not obvious to me :)

 Well, what's obvious to one person is not always obvious to another. I'm sure
 that there are plenty of things that Walter thinks should be perfectly obvious
 which 90% of programmers would never think of. A lot of it depends on what
 you're experience level is and what you have experience in. In this case, it's
 very much like dealing with C++ iterators, so if you're used to dealing with
 them and when they do or don't get invalidate, then the situation with in and
 remove is probably quite straightforward and intuitive.

 auto c = new C();
 C[string] aa;
 aa["c"] = c;
 aa.remove("c");

 I guess using c from this point is valid, but if the only reference to c
 was inside the aa (and we got c using `in`), using c would have been
 undefined..?

 Using c is valid regardless. The problem is the in operator. It returns a
 pointer to the element inside of the AA. So, if you did

 C* d = aa["c"];

 then d would be a pointer to a reference to a C, and after the call to remove,
 the pointer is pointing to memory which may or may not be part of the AA
 anymore and which may or may not hold a reference to what c refers to. It's
 bit like doing this

 int* a()
 {
      int b;
      return&b;
 }

 though the compiler should catch this, since it's obviously wrong. In both
 cases, you're referring to memory which your really not supposed to be
 accessing anymore and may point to invalid memory. The case of the AA probably
 isn't as bad, since either you're pointing to memory on the GC which isn't
 part of the AA (but hasn't been collected, since you have a pointer to it), or
 you're pointing to an element in the AA which may still be the same as it was
 before it was removed, or it may be another element, or it may be garbage
 memory. You really don't know. It's undefined and therefore shouldn't be used.

 I don't believe that it's ever safe to assume that the pointer returned by the
 in operator is still valid after the AA that it comes from has had an element
 added or removed. It's the same with iterators or ranges with many iterators
 and ranges. If they point to elements within a container, and that container
 is altered, they're invalidated and aren't safe to use.

 So, your example is fine, because it doesn't involve the in operator, and
 therefore doesn't involve any pointers into the AA. It's only once you have
 pointers into the AA that you get into trouble.

 - Jonathan M Davis

Thanks for your clarifications.

Does this mean even this is undefined?
aa["a"] = new C();
auto c = "a" in aa;
aa["b"] = new C();
// Using c here is undefined as an element was added to aa

Jan 08 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, January 08, 2012 14:24:32 simendsjo wrote:
 Thanks for your clarifications.
 
 Does this mean even this is undefined?
 aa["a"] = new C();
 auto c = "a" in aa;
 aa["b"] = new C();
 // Using c here is undefined as an element was added to aa

Yes. Depending on the current implementation of AAs and the state of aa when 
"b" is added to it, c could still point to exactly what it did before, or aa 
could have been rehashed, which could move any or all of its elements around, 
in which case who knows what c points to.

- Jonathan M Davis

Jan 08 2012

simendsjo <simendsjo gmail.com> writes:

On 08.01.2012 14:40, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 14:24:32 simendsjo wrote:
 Thanks for your clarifications.

 Does this mean even this is undefined?
 aa["a"] = new C();
 auto c = "a" in aa;
 aa["b"] = new C();
 // Using c here is undefined as an element was added to aa

 Yes. Depending on the current implementation of AAs and the state of aa when
 "b" is added to it, c could still point to exactly what it did before, or aa
 could have been rehashed, which could move any or all of its elements around,
 in which case who knows what c points to.

 - Jonathan M Davis

Thanks! You've saved me a couple of future bugs :)

Jan 08 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, January 08, 2012 14:57:51 simendsjo wrote:
 On 08.01.2012 14:40, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 14:24:32 simendsjo wrote:
 Thanks for your clarifications.
 
 Does this mean even this is undefined?
 aa["a"] = new C();
 auto c = "a" in aa;
 aa["b"] = new C();
 // Using c here is undefined as an element was added to aa

 
 Yes. Depending on the current implementation of AAs and the state of aa
 when "b" is added to it, c could still point to exactly what it did
 before, or aa could have been rehashed, which could move any or all of
 its elements around, in which case who knows what c points to.
 
 - Jonathan M Davis

 
 Thanks! You've saved me a couple of future bugs :)

No problem. It's a bit like how an array could be reallocated after you append 
to it, which causes slices to no longer point to that array, except that 
unlike the slice the pointer may not point to valid memory. Given that it's 
managed by the GC, it probably isn't pointing to complete garbage, but it 
could be used for something else now.

In general, if you insert or remove elements from a container, you must be 
very careful about any references to the data within the container - be they 
pointers to elements in the container, or iterators or ranges for the 
container. That's why std.container has the stable functions. They guarantee 
that the container's existing ranges don't get invalidated when they're 
called, whereas the ones that aren't stable make no such guarantees. The 
issues with in are just one of the manifestations of the more general issue.

- Jonathan M Davis

Jan 08 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 08 Jan 2012 08:40:27 -0500, Jonathan M Davis <jmdavisProg gmx.com>  
wrote:

 On Sunday, January 08, 2012 14:24:32 simendsjo wrote:
 Thanks for your clarifications.

 Does this mean even this is undefined?
 aa["a"] = new C();
 auto c = "a" in aa;
 aa["b"] = new C();
 // Using c here is undefined as an element was added to aa

 Yes. Depending on the current implementation of AAs and the state of aa  
 when
 "b" is added to it, c could still point to exactly what it did before,  
 or aa
 could have been rehashed, which could move any or all of its elements  
 around,
 in which case who knows what c points to.

Actually, not invalid for the current implementation.  I don't know if  
it's stated whether an AA specifically requires that elements do not  
re-associate on a rehash.

There are certain hash implementations (such as ones that involve a single  
array), which would invalidate pointers on a rehash.  So there is the  
potential (if it's not a stated requirement to the contrary in the spec)  
that some future AA implementation would invalidated references.  However,  
I'd say it's unlikely this would happen.

BTW, dcollections' HashMap, HashSet, and HashMultiset do guarantee that  
adding elements does not invalidated cursors (dcollections' safe version  
of pointers) as long as you use the default Hash implementation.  However,  
I just noticed this is not stated anywhere in the docs (will fix).



-Steve

Jan 09 2012

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Monday, January 09, 2012 09:25:14 Steven Schveighoffer wrote:
 Actually, not invalid for the current implementation. I don't know if
 it's stated whether an AA specifically requires that elements do not
 re-associate on a rehash.

Well, like I said, it depends on the current implementation. There are hash 
table implementations where rehashing would invalidate the pointer returned by 
in, and I don't believe that the spec specificies that D's AA guarantees that 
rehashing doesn't do that. So in the future, it _could_ be changed to an 
implementation which invalidates the pointers on rehash. As such, it doesn't 
really matter what the current implementation does. Relying on the current 
behavior is unsafe if it's not guaranteed by the spec. Now, if we want to 
change the spec to make such guarantees, that's fine, but I don't believe it 
makes them right now. Good to know what the current implementation is doing 
though.

- Jonathan m Davis

Jan 09 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 1/9/12, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 BTW, dcollections' HashMap, HashSet, and HashMultiset do guarantee that
 adding elements does not invalidated cursors (dcollections' safe version
 of pointers) as long as you use the default Hash implementation.  However,
 I just noticed this is not stated anywhere in the docs (will fix).

Funny, I was looking at the docs a few days ago and it said "Adding an
element can invalidate cursors depending on the implementation.", so I
just assumed it did invalidate the cursors.

I do think those are dcollections v1 docs though. Anyway glad to hear
from you about this.

Jan 09 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 09 Jan 2012 13:35:26 -0500, Andrej Mitrovic  
<andrej.mitrovich gmail.com> wrote:

 On 1/9/12, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 BTW, dcollections' HashMap, HashSet, and HashMultiset do guarantee that
 adding elements does not invalidated cursors (dcollections' safe version
 of pointers) as long as you use the default Hash implementation.   
 However,
 I just noticed this is not stated anywhere in the docs (will fix).

 Funny, I was looking at the docs a few days ago and it said "Adding an
 element can invalidate cursors depending on the implementation.", so I
 just assumed it did invalidate the cursors.

Yeah, I could have sworn I stated somewhere that the current  
implementation doesn't invalidate cursors, but apparently it's not  
stated.  I think the docs need a lot of TLC before the first release.   
Some day when I have time...

To clarify what I plan to do, the add doc will say something like "adding  
an element can invalidate cursors depending on the implementation, however  
the default implementation guarantees not to invalidate them."  I don't  
want to specifically disallow implementations which do invalidate  
(dcollections' implementations are pluggable so anyone can replace the  
internals).

 I do think those are dcollections v1 docs though. Anyway glad to hear
 from you about this.

The D2 docs are somewhat leftover from D1 version.  However, I do remember  
when implementing the hash code that it purposely does not invalidate  
cursors on a rehash.  Ranges can be invalidated, however.  Given the  
implementation, it's actually faster *not* to invalidate them when  
rehashing because I just relink all the link-list nodes instead of  
reallocating them.

-Steve

Jan 09 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

Ok, allow me to temporarily hijack again and ask: Would you mind
adding opIn_r (or rather the newer opBinaryRight with "in") that
forwards to contains() for the HashSet and similar hash-based classes
that define contains()? It would make porting code that uses builtin
hashes to your own implementation easier.

Jan 09 2012

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 09 Jan 2012 14:57:36 -0500, Andrej Mitrovic  
<andrej.mitrovich gmail.com> wrote:

 Ok, allow me to temporarily hijack again and ask: Would you mind
 adding opIn_r (or rather the newer opBinaryRight with "in") that
 forwards to contains() for the HashSet and similar hash-based classes
 that define contains()? It would make porting code that uses builtin
 hashes to your own implementation easier.

Could this be you?

http://www.dsource.org/projects/dcollections/ticket/18

If so, this means you are OK with the last suggestion I made?

-Steve

Jan 09 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 1/9/12, Steven Schveighoffer <schveiguy yahoo.com> wrote:
 Could this be you?

Ah, yes. I didn't even notice you've replied to that, sorry. Yes, I'm
ok with it.

Jan 09 2012

Manfred Nowak <svv1999 hotmail.com> writes:

Jonathan M Davis wrote:

 In this case, it's very much like dealing with C++ iterators

A relevant part of Andrei's thread on "associative arrays iteration":

http://www.digitalmars.com/d/archives/digitalmars/D/associative_arrays_
iteration_is_finally_here_99576.html#N99614

-manfred

Jan 08 2012

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 1/8/12, simendsjo <simendsjo gmail.com> wrote:
 Wouldn't it make sense to return a pointer to the item being
 removed/null?

Seems like that would be even more costly. Personally I think
returning bool is unnecessary, if we really want to know if something
is in the hash we can check with the `in` operator.

I filed that bug ages ago simply because it was in TDPL, but I didn't
really give it much thought.

Jan 08 2012

Kapps <Kapps NotValidEmail.com> writes:

Looks like this is fixed for 2.058.

https://github.com/D-Programming-Language/dmd/commit/3e23b0f5834acb32eaee20d88c30ead7e03bb2f4

On 08/01/2012 3:43 AM, Jonathan M Davis wrote:
 On Sunday, January 08, 2012 03:24:22 Kapps wrote:
 Ah, found the bug / pull request.

 https://github.com/D-Programming-Language/dmd/pull/597
 http://d.puremagic.com/issues/show_bug.cgi?id=4523

 Ah, TDPL says that it returns bool. Well, then it definitely needs to be
 changed, and it's good to see that that there's a pull request for it. So, it
 should be fixed in the next release. In fact, the number of TDPL-related bugs
 fixed for the next release should be quite high, which is great.

 - Jonathan M Davis

Jan 09 2012

D Programming

C/C++ Programming

Other

digitalmars.D.learn - associative arrays