digitalmars.D - associative arrays: iteration is finally here

Andrei Alexandrescu (19/19) Oct 28 2009 Walter has magically converted his work on T[new] into work on making

=?ISO-8859-1?Q?Pelle_M=E5nsson?= (3/30) Oct 28 2009 aa.each, aa.keys and aa.values seem good names?

Andrei Alexandrescu (5/35) Oct 28 2009 That is debatable as it would make the same code do different things for...

=?ISO-8859-1?Q?Pelle_M=E5nsson?= (5/46) Oct 28 2009 Is this bad? If you want an array from them you could just construct it

Lars T. Kyllingstad (6/19) Oct 28 2009 I've used iteration over values more often than iteration over keys.

=?ISO-8859-1?Q?Pelle_M=E5nsson?= (5/28) Oct 28 2009 I don't understand this, when do you want the values without the keys?

Lars T. Kyllingstad (21/49) Oct 28 2009 Here's an example:

=?ISO-8859-1?Q?Pelle_M=E5nsson?= (9/71) Oct 28 2009 I think foreach should be consistent with opIn, that is,

Nick Sabalausky (8/16) Oct 29 2009 I've thought for a long while that "in" should be value-based (so you ca...

=?ISO-8859-1?Q?Pelle_M=E5nsson?= (3/21) Oct 30 2009 I, too, want opIn to work on arrays. On values. As a linear search. I do...

Leandro Lucarella (11/41) Oct 28 2009 I might be too pythonic, but aa.items sounds a little better for me ;)

Justin Johansson (5/32) Oct 28 2009 Don't know if I'm off track but ...
dsimcha (11/30) Oct 28 2009 Awesome, this definitely improves the interface, but how about the imple...

Andrei Alexandrescu (11/42) Oct 28 2009 I'm afraid that efficiency is a matter I need to defer to the community

Lars T. Kyllingstad (6/58) Oct 28 2009 Isn't that what the in operator does?

Andrei Alexandrescu (3/67) Oct 28 2009 Correct, sorry for the oversight.

Max Samukha (7/26) Oct 28 2009 It looks pretty intuitive to me.
yigal chripun (3/30) Oct 28 2009 that looks neat. What's the mechanism to tie the templates to the syntax...

Andrei Alexandrescu (5/38) Oct 28 2009 The cheapest to implement. As I mentioned in the Bogo containers

bearophile (9/11) Oct 28 2009 Iterating on the keys is more useful, in real-world programs.

Nick Sabalausky (18/33) Oct 29 2009 If you mean because it's trivial to get a value given just the key, but

Denis Koroskin (47/66) Oct 28 2009 Wow, this is outstanding! (I hope it didn't have any negative impact on ...

Robert Jacques (7/83) Oct 28 2009 Not finding an element is a common use case, not an exception. Using

=?ISO-8859-15?Q?Pelle_M=E5nsson?= (3/85) Oct 28 2009 Also, if opIn throws an exception, it kind of defeats the point of opIn,...
Denis Koroskin (5/106) Oct 28 2009 Ooops, right, I first wrote it to return a pointer but changed to a

=?UTF-8?B?UGVsbGUgTcOlbnNzb24=?= (4/54) Oct 28 2009 Try implementing the range interface (front, popFront and empty), and
Andrei Alexandrescu (11/90) Oct 28 2009 Of course. In fact, given the iterator with .key and .value, you can

Denis Koroskin (5/19) Oct 28 2009 It doesn't have to be the case: key and value are both properties (i.e. ...

Andrei Alexandrescu (5/31) Oct 28 2009 I see. So you want a pointer to an elaborate type featuring a key and a

Denis Koroskin (4/33) Oct 28 2009 Well, my extended test case looks something up, manipulates the found

Andrei Alexandrescu (3/43) Oct 28 2009 Ok, I understand your points, thanks for explaining.

Leandro Lucarella (17/26) Oct 29 2009 What about and overload of remove() like this:

Andrei Alexandrescu (7/26) Oct 29 2009 I think this all is overdoing it. First, I disagree that remove should

Bill Baxter (15/44) Oct 29 2009 I agree with the folks who say it's error-prone. I can just see

KennyTM~ (13/59) Oct 29 2009 Um what? aa["theKey"] = 1 doesn't fail, why should aa.remove("theKey")

bearophile (6/11) Oct 29 2009 That's a different situation.

KennyTM~ (6/17) Oct 29 2009 This does not contradicts with .remove() not throwing an exception when

Leandro Lucarella (12/37) Oct 29 2009 Not if it's an Error instead of an exception. I think that should be the
bearophile (5/8) Oct 29 2009 In nothrow functions you can use a different method, like "discard" (or ...

Simen Kjaeraas (4/10) Oct 29 2009 Please elucidate, what is unsafe about deleting something that isn't the...

Leandro Lucarella (12/23) Oct 29 2009 A leak, if the key was wrong (not if the key was right but deleted

Bill Baxter (16/71) Oct 29 2009 t

Andrei Alexandrescu (15/51) Oct 29 2009 I don't find this reasonable. If you know removal must have succeeded,

Leandro Lucarella (19/35) Oct 29 2009 I don't agree, this is like saying you should bound-check every array

Andrei Alexandrescu (8/38) Oct 29 2009 Bounds checking is eliminated in release mode solely for efficiency

Leandro Lucarella (30/70) Oct 29 2009 In the sense of "memory corruption" that's true. But your program will b...

Bill Baxter (15/43) Oct 29 2009 far.

Leandro Lucarella (43/62) Oct 29 2009 I don't think that's the case. If you have a wrong value in a pointer yo...

Bill Baxter (17/79) Oct 29 2009 AA

Steven Schveighoffer (9/26) Oct 29 2009 It could return a struct holding both pointers, instead of a pointer to ...

bearophile (5/7) Oct 29 2009 I think that's a small design mistake. In a high level language you want...

=?ISO-8859-1?Q?Pelle_M=E5nsson?= (2/13) Oct 29 2009 I agree with this. I usually want exceptions.

bearophile (4/5) Oct 29 2009 A problem with this is that currently exceptions are very slow in D comp...

bearophile (4/8) Oct 29 2009 I have just redone the timings with LDC on Ubuntu (I'll put the timings ...

rmcguire (4/104) Oct 29 2009 Wouldn't opIn be more useful if it returned a range starting with

Andrei Alexandrescu (4/6) Oct 29 2009 Thought about that, but it's hard to justify. Items aren't sorted in any...

rmcguire (9/18) Oct 29 2009 Oh, I thought that ranges were going to be applied to network/file IO.

Steven Schveighoffer (38/55) Oct 29 2009 This is one of the benefits of opApply.
Kagamin (2/4) Oct 29 2009 Well, AA already has properties keys and values and you can already iter...

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Walter has magically converted his work on T[new] into work on making 
associative arrays true templates defined in druntime and not considered 
very special by the compiler.

This is very exciting because it opens up or simplifies a number of 
possibilities. One is that of implementing true iteration. I actually 
managed to implement last night something that allows you to do:

int[int] aa = [ 1:1 ];
auto iter = aa.each;
writeln(iter.front.key);
writeln(iter.front.value);

Two other iterations are possible: by key and by value (in those cases 
iter.front just returns a key or a value).

One question is, what names should these bear? I am thinking of makign 
opSlice() a universal method of getting the "all" iterator, a default 
that every container must implement.

For AAs, there would be a "iterate keys" and "iterate values" properties 
or functions. How should they be called?


Thanks,

Andrei

Oct 28 2009

=?ISO-8859-1?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:

Andrei Alexandrescu wrote:
 Walter has magically converted his work on T[new] into work on making 
 associative arrays true templates defined in druntime and not considered 
 very special by the compiler.
 
 This is very exciting because it opens up or simplifies a number of 
 possibilities. One is that of implementing true iteration. I actually 
 managed to implement last night something that allows you to do:
 
 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);
 
 Two other iterations are possible: by key and by value (in those cases 
 iter.front just returns a key or a value).
 
 One question is, what names should these bear? I am thinking of makign 
 opSlice() a universal method of getting the "all" iterator, a default 
 that every container must implement.
 
 For AAs, there would be a "iterate keys" and "iterate values" properties 
 or functions. How should they be called?
 
 
 Thanks,
 
 Andrei

aa.each, aa.keys and aa.values seem good names?

Also, foreach with a single variable should default to keys, in my opinion.

Oct 28 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Pelle M�nsson wrote:
 Andrei Alexandrescu wrote:
 Walter has magically converted his work on T[new] into work on making 
 associative arrays true templates defined in druntime and not 
 considered very special by the compiler.

 This is very exciting because it opens up or simplifies a number of 
 possibilities. One is that of implementing true iteration. I actually 
 managed to implement last night something that allows you to do:

 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);

 Two other iterations are possible: by key and by value (in those cases 
 iter.front just returns a key or a value).

 One question is, what names should these bear? I am thinking of makign 
 opSlice() a universal method of getting the "all" iterator, a default 
 that every container must implement.

 For AAs, there would be a "iterate keys" and "iterate values" 
 properties or functions. How should they be called?


 Thanks,

 Andrei

 aa.each, aa.keys and aa.values seem good names?

The latter two would break existing definitions of keys and values.

 Also, foreach with a single variable should default to keys, in my opinion.

That is debatable as it would make the same code do different things for 
e.g. vectors and sparse vectors.


Andrei

Oct 28 2009

=?ISO-8859-1?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:

Andrei Alexandrescu wrote:
 Pelle M�nsson wrote:
 Andrei Alexandrescu wrote:
 Walter has magically converted his work on T[new] into work on making 
 associative arrays true templates defined in druntime and not 
 considered very special by the compiler.

 This is very exciting because it opens up or simplifies a number of 
 possibilities. One is that of implementing true iteration. I actually 
 managed to implement last night something that allows you to do:

 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);

 Two other iterations are possible: by key and by value (in those 
 cases iter.front just returns a key or a value).

 One question is, what names should these bear? I am thinking of 
 makign opSlice() a universal method of getting the "all" iterator, a 
 default that every container must implement.

 For AAs, there would be a "iterate keys" and "iterate values" 
 properties or functions. How should they be called?


 Thanks,

 Andrei

 aa.each, aa.keys and aa.values seem good names?

 
 The latter two would break existing definitions of keys and values.
 

Is this bad? If you want an array from them you could just construct it 
from the iterator.

 Also, foreach with a single variable should default to keys, in my 
 opinion.

 
 That is debatable as it would make the same code do different things for 
 e.g. vectors and sparse vectors.
 
 
 Andrei

Debatable indeed, but I find myself using either just the keys or the 
keys and values together, rarely just the values. Maybe that's just me.

Oct 28 2009

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:

Pelle M�nsson wrote:
 Andrei Alexandrescu wrote:
 Pelle M�nsson wrote:
 Also, foreach with a single variable should default to keys, in my 
 opinion.

 That is debatable as it would make the same code do different things 
 for e.g. vectors and sparse vectors.


 Andrei

 
 Debatable indeed, but I find myself using either just the keys or the 
 keys and values together, rarely just the values. Maybe that's just me.


I've used iteration over values more often than iteration over keys.

Besides, I think consistency is important. Since the default for an 
ordinary array is to iterate over the values, it should be the same for 
associative arrays.

-Lars

Oct 28 2009

=?ISO-8859-1?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:

Lars T. Kyllingstad wrote:
 Pelle M�nsson wrote:
 Andrei Alexandrescu wrote:
 Pelle M�nsson wrote:
 Also, foreach with a single variable should default to keys, in my 
 opinion.

 That is debatable as it would make the same code do different things 
 for e.g. vectors and sparse vectors.


 Andrei

 Debatable indeed, but I find myself using either just the keys or the 
 keys and values together, rarely just the values. Maybe that's just me.

 
 
 I've used iteration over values more often than iteration over keys.
 
 Besides, I think consistency is important. Since the default for an 
 ordinary array is to iterate over the values, it should be the same for 
 associative arrays.
 
 -Lars

I don't understand this, when do you want the values without the keys? 
If you do, shouldn't you be using a regular array?

Actually, it doesn't matter all that much, as long as we get .keys and 
.values as alternatives.

Oct 28 2009

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:

Pelle M�nsson wrote:
 Lars T. Kyllingstad wrote:
 Pelle M�nsson wrote:
 Andrei Alexandrescu wrote:
 Pelle M�nsson wrote:
 Also, foreach with a single variable should default to keys, in my 
 opinion.

 That is debatable as it would make the same code do different things 
 for e.g. vectors and sparse vectors.


 Andrei

 Debatable indeed, but I find myself using either just the keys or the 
 keys and values together, rarely just the values. Maybe that's just me.


 I've used iteration over values more often than iteration over keys.

 Besides, I think consistency is important. Since the default for an 
 ordinary array is to iterate over the values, it should be the same 
 for associative arrays.

 -Lars

 I don't understand this, when do you want the values without the keys? 
 If you do, shouldn't you be using a regular array?

Here's an example:

    class SomeObject { ... }
    void doStuffWith(SomeObject s) { ... }
    void doOtherStuffWith(SomeObject s) { ... }

    // Make a collection of objects indexed by ID strings.
    SomeObject[string] myObjects;
    ...

    // First I just want to do something with one of the
    // objects, namely the one called "foo".
    doStuffWith(myObjects["foo"]);

    // Then, I want to do something with all the objects.
    foreach (obj; myObjects)  doOtherStuffWith(obj);

Of course, if iteration was over keys instead of values, I'd just write

    foreach (id, obj; myObjects)  doOtherStuffWith(obj);

But then again, right now, when iteration is over values and I want the 
keys I can just write the same thing. It all comes down to preference, 
and I prefer things the way they are now. :)


 Actually, it doesn't matter all that much, as long as we get .keys and 
 .values as alternatives.

I still think the default for foreach should be consistent with normal 
arrays.

-Lars

Oct 28 2009

=?ISO-8859-1?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:

Lars T. Kyllingstad wrote:
 Pelle M�nsson wrote:
 Lars T. Kyllingstad wrote:
 Pelle M�nsson wrote:
 Andrei Alexandrescu wrote:
 Pelle M�nsson wrote:
 Also, foreach with a single variable should default to keys, in my 
 opinion.

 That is debatable as it would make the same code do different 
 things for e.g. vectors and sparse vectors.


 Andrei

 Debatable indeed, but I find myself using either just the keys or 
 the keys and values together, rarely just the values. Maybe that's 
 just me.


 I've used iteration over values more often than iteration over keys.

 Besides, I think consistency is important. Since the default for an 
 ordinary array is to iterate over the values, it should be the same 
 for associative arrays.

 -Lars

 I don't understand this, when do you want the values without the keys? 
 If you do, shouldn't you be using a regular array?

 
 Here's an example:
 
    class SomeObject { ... }
    void doStuffWith(SomeObject s) { ... }
    void doOtherStuffWith(SomeObject s) { ... }
 
    // Make a collection of objects indexed by ID strings.
    SomeObject[string] myObjects;
    ...
 
    // First I just want to do something with one of the
    // objects, namely the one called "foo".
    doStuffWith(myObjects["foo"]);
 
    // Then, I want to do something with all the objects.
    foreach (obj; myObjects)  doOtherStuffWith(obj);
 
 Of course, if iteration was over keys instead of values, I'd just write
 
    foreach (id, obj; myObjects)  doOtherStuffWith(obj);
 
 But then again, right now, when iteration is over values and I want the 
 keys I can just write the same thing. It all comes down to preference, 
 and I prefer things the way they are now. :)
 
 
 Actually, it doesn't matter all that much, as long as we get .keys and 
 .values as alternatives.

 
 I still think the default for foreach should be consistent with normal 
 arrays.
 
 -Lars

I think foreach should be consistent with opIn, that is,
if (foo in aa) { //it is in the aa.
   foreach (f; aa) { // loop over each item in the aa
     //I expect foo to show up in here, since it is "in" the aa.
   }
}

I use key iteration more than I use value iteration, and it is what I am 
used to. It is, as you say, a matter of preference.

Oct 28 2009

"Nick Sabalausky" <a a.a> writes:

"Pelle M�nsson" <pelle.mansson gmail.com> wrote in message 
news:hcaaro$15e4$1 digitalmars.com...
 I think foreach should be consistent with opIn, that is,
 if (foo in aa) { //it is in the aa.
   foreach (f; aa) { // loop over each item in the aa
     //I expect foo to show up in here, since it is "in" the aa.
   }
 }

 I use key iteration more than I use value iteration, and it is what I am 
 used to. It is, as you say, a matter of preference.

I've thought for a long while that "in" should be value-based (so you can do 
things like "if(foo in [1,2,7,9])" instead of the not-as-nice 
"if([1,2,7,9].contains(foo))"), and that there should be some other way to 
check for the existance of a key (like "aa.hasKey(key)" or "key in aa.keys", 
or something like that). I need to check for values in an array much more 
often than I need to check for keys in an aa.

Oct 29 2009

=?ISO-8859-1?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:

Nick Sabalausky wrote:
 "Pelle M�nsson" <pelle.mansson gmail.com> wrote in message 
 news:hcaaro$15e4$1 digitalmars.com...
 I think foreach should be consistent with opIn, that is,
 if (foo in aa) { //it is in the aa.
   foreach (f; aa) { // loop over each item in the aa
     //I expect foo to show up in here, since it is "in" the aa.
   }
 }

 I use key iteration more than I use value iteration, and it is what I am 
 used to. It is, as you say, a matter of preference.

 
 I've thought for a long while that "in" should be value-based (so you can do 
 things like "if(foo in [1,2,7,9])" instead of the not-as-nice 
 "if([1,2,7,9].contains(foo))"), and that there should be some other way to 
 check for the existance of a key (like "aa.hasKey(key)" or "key in aa.keys", 
 or something like that). I need to check for values in an array much more 
 often than I need to check for keys in an aa. 

I, too, want opIn to work on arrays. On values. As a linear search. I do 
not see why you would want to remove it on AA keys, though.

Oct 30 2009

Leandro Lucarella <llucax gmail.com> writes:

Pelle Månsson, el 28 de octubre a las 15:48 me escribiste:
 Andrei Alexandrescu wrote:
Walter has magically converted his work on T[new] into work on
making associative arrays true templates defined in druntime and
not considered very special by the compiler.

This is very exciting because it opens up or simplifies a number
of possibilities. One is that of implementing true iteration. I
actually managed to implement last night something that allows you
to do:

int[int] aa = [ 1:1 ];
auto iter = aa.each;
writeln(iter.front.key);
writeln(iter.front.value);

Two other iterations are possible: by key and by value (in those
cases iter.front just returns a key or a value).

One question is, what names should these bear? I am thinking of
makign opSlice() a universal method of getting the "all" iterator,
a default that every container must implement.

For AAs, there would be a "iterate keys" and "iterate values"
properties or functions. How should they be called?


Thanks,

Andrei

 aa.each, aa.keys and aa.values seem good names?

I might be too pythonic, but aa.items sounds a little better for me ;)

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Can you stand up?
I do believe it's working, good.
That'll keep you going through the show
Come on it's time to go.

Oct 28 2009

Justin Johansson <no spam.com> writes:

Andrei Alexandrescu Wrote:

 Walter has magically converted his work on T[new] into work on making 
 associative arrays true templates defined in druntime and not considered 
 very special by the compiler.
 
 This is very exciting because it opens up or simplifies a number of 
 possibilities. One is that of implementing true iteration. I actually 
 managed to implement last night something that allows you to do:
 
 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);
 
 Two other iterations are possible: by key and by value (in those cases 
 iter.front just returns a key or a value).
 
 One question is, what names should these bear? I am thinking of makign 
 opSlice() a universal method of getting the "all" iterator, a default 
 that every container must implement.
 
 For AAs, there would be a "iterate keys" and "iterate values" properties 
 or functions. How should they be called?
 
 
 Thanks,
 
 Andrei

Don't know if I'm off track but ...
would "opApplyKeys" and "opApplyValues" be a useful analogy in somewhat
keeping with the current semantics of opApply?

Justin

Oct 28 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Walter has magically converted his work on T[new] into work on making
 associative arrays true templates defined in druntime and not considered
 very special by the compiler.
 This is very exciting because it opens up or simplifies a number of
 possibilities. One is that of implementing true iteration. I actually
 managed to implement last night something that allows you to do:
 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);
 Two other iterations are possible: by key and by value (in those cases
 iter.front just returns a key or a value).
 One question is, what names should these bear? I am thinking of makign
 opSlice() a universal method of getting the "all" iterator, a default
 that every container must implement.
 For AAs, there would be a "iterate keys" and "iterate values" properties
 or functions. How should they be called?
 Thanks,
 Andrei

Awesome, this definitely improves the interface, but how about the
implementation?
 The current implementation, while fast for reading, is unbelievably slow for
adding elements, requires a heap allocation (read:  a global lock) on *every*
insertion, and generates an insane amount of false pointers.  Even if I succeed
in
making heap scanning (mostly) precise, it's not clear if the current AA impl.
could easily be made to benefit, since it isn't template based.  It uses RTTI
internally instead, and the types it's operating on aren't known to the
implementation at compile time, so I wouldn't be able to use templates to
generate
the bitmask at compile time.  The structs it uses internally would therefor have
to be scanned conservatively.

Oct 28 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Walter has magically converted his work on T[new] into work on making
 associative arrays true templates defined in druntime and not considered
 very special by the compiler.
 This is very exciting because it opens up or simplifies a number of
 possibilities. One is that of implementing true iteration. I actually
 managed to implement last night something that allows you to do:
 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);
 Two other iterations are possible: by key and by value (in those cases
 iter.front just returns a key or a value).
 One question is, what names should these bear? I am thinking of makign
 opSlice() a universal method of getting the "all" iterator, a default
 that every container must implement.
 For AAs, there would be a "iterate keys" and "iterate values" properties
 or functions. How should they be called?
 Thanks,
 Andrei

 
 Awesome, this definitely improves the interface, but how about the
implementation?
  The current implementation, while fast for reading, is unbelievably slow for
 adding elements, requires a heap allocation (read:  a global lock) on *every*
 insertion, and generates an insane amount of false pointers.  Even if I
succeed in
 making heap scanning (mostly) precise, it's not clear if the current AA impl.
 could easily be made to benefit, since it isn't template based.  It uses RTTI
 internally instead, and the types it's operating on aren't known to the
 implementation at compile time, so I wouldn't be able to use templates to
generate
 the bitmask at compile time.  The structs it uses internally would therefor
have
 to be scanned conservatively.

I'm afraid that efficiency is a matter I need to defer to the community 
for now. Right now, I am trying to get TDPL done. Having or not having 
range-style iteration influences the material. Making that efficient is 
a matter that would not influence the material (as long as there is a 
strong belief that that's doable).

Unrelated: one thing that we need to change about AAs is the inability 
to get a true reference to the stored element. aa[k] returns an rvalue, 
and a[k] = v is done in a manner akin to opIndexAssign. But a serious AA 
should have a method of reaching the actual storage for a value, I think.


Andrei

Oct 28 2009

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:

Andrei Alexandrescu wrote:
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s 
 article
 Walter has magically converted his work on T[new] into work on making
 associative arrays true templates defined in druntime and not considered
 very special by the compiler.
 This is very exciting because it opens up or simplifies a number of
 possibilities. One is that of implementing true iteration. I actually
 managed to implement last night something that allows you to do:
 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);
 Two other iterations are possible: by key and by value (in those cases
 iter.front just returns a key or a value).
 One question is, what names should these bear? I am thinking of makign
 opSlice() a universal method of getting the "all" iterator, a default
 that every container must implement.
 For AAs, there would be a "iterate keys" and "iterate values" properties
 or functions. How should they be called?
 Thanks,
 Andrei

 Awesome, this definitely improves the interface, but how about the 
 implementation?
  The current implementation, while fast for reading, is unbelievably 
 slow for
 adding elements, requires a heap allocation (read:  a global lock) on 
 *every*
 insertion, and generates an insane amount of false pointers.  Even if 
 I succeed in
 making heap scanning (mostly) precise, it's not clear if the current 
 AA impl.
 could easily be made to benefit, since it isn't template based.  It 
 uses RTTI
 internally instead, and the types it's operating on aren't known to the
 implementation at compile time, so I wouldn't be able to use templates 
 to generate
 the bitmask at compile time.  The structs it uses internally would 
 therefor have
 to be scanned conservatively.

 
 I'm afraid that efficiency is a matter I need to defer to the community 
 for now. Right now, I am trying to get TDPL done. Having or not having 
 range-style iteration influences the material. Making that efficient is 
 a matter that would not influence the material (as long as there is a 
 strong belief that that's doable).
 
 Unrelated: one thing that we need to change about AAs is the inability 
 to get a true reference to the stored element. aa[k] returns an rvalue, 
 and a[k] = v is done in a manner akin to opIndexAssign. But a serious AA 
 should have a method of reaching the actual storage for a value, I think.

Isn't that what the in operator does?

    T[U] aa;
    U key = something;
    T* p = key in aa;

-Lars

Oct 28 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Lars T. Kyllingstad wrote:
 Andrei Alexandrescu wrote:
 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s 
 article
 Walter has magically converted his work on T[new] into work on making
 associative arrays true templates defined in druntime and not 
 considered
 very special by the compiler.
 This is very exciting because it opens up or simplifies a number of
 possibilities. One is that of implementing true iteration. I actually
 managed to implement last night something that allows you to do:
 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);
 Two other iterations are possible: by key and by value (in those cases
 iter.front just returns a key or a value).
 One question is, what names should these bear? I am thinking of makign
 opSlice() a universal method of getting the "all" iterator, a default
 that every container must implement.
 For AAs, there would be a "iterate keys" and "iterate values" 
 properties
 or functions. How should they be called?
 Thanks,
 Andrei

 Awesome, this definitely improves the interface, but how about the 
 implementation?
  The current implementation, while fast for reading, is unbelievably 
 slow for
 adding elements, requires a heap allocation (read:  a global lock) on 
 *every*
 insertion, and generates an insane amount of false pointers.  Even if 
 I succeed in
 making heap scanning (mostly) precise, it's not clear if the current 
 AA impl.
 could easily be made to benefit, since it isn't template based.  It 
 uses RTTI
 internally instead, and the types it's operating on aren't known to the
 implementation at compile time, so I wouldn't be able to use 
 templates to generate
 the bitmask at compile time.  The structs it uses internally would 
 therefor have
 to be scanned conservatively.

 I'm afraid that efficiency is a matter I need to defer to the 
 community for now. Right now, I am trying to get TDPL done. Having or 
 not having range-style iteration influences the material. Making that 
 efficient is a matter that would not influence the material (as long 
 as there is a strong belief that that's doable).

 Unrelated: one thing that we need to change about AAs is the inability 
 to get a true reference to the stored element. aa[k] returns an 
 rvalue, and a[k] = v is done in a manner akin to opIndexAssign. But a 
 serious AA should have a method of reaching the actual storage for a 
 value, I think.

 
 Isn't that what the in operator does?
 
    T[U] aa;
    U key = something;
    T* p = key in aa;
 
 -Lars

Correct, sorry for the oversight.

Andrei

Oct 28 2009

Max Samukha <spambox d-coding.com> writes:

On Wed, 28 Oct 2009 09:22:00 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

Walter has magically converted his work on T[new] into work on making 
associative arrays true templates defined in druntime and not considered 
very special by the compiler.

This is very exciting because it opens up or simplifies a number of 
possibilities. One is that of implementing true iteration. I actually 
managed to implement last night something that allows you to do:

int[int] aa = [ 1:1 ];
auto iter = aa.each;
writeln(iter.front.key);
writeln(iter.front.value);

Two other iterations are possible: by key and by value (in those cases 
iter.front just returns a key or a value).

One question is, what names should these bear? I am thinking of makign 
opSlice() a universal method of getting the "all" iterator, a default 
that every container must implement.

It looks pretty intuitive to me.

For AAs, there would be a "iterate keys" and "iterate values" properties 
or functions. How should they be called?

eachKey, eachValue
keyRange, valueRange
keys, values

I'd prefer the last one.

Thanks,

Andrei

Oct 28 2009

yigal chripun <yigal100 gmail.com> writes:

Andrei Alexandrescu Wrote:

 Walter has magically converted his work on T[new] into work on making 
 associative arrays true templates defined in druntime and not considered 
 very special by the compiler.
 
 This is very exciting because it opens up or simplifies a number of 
 possibilities. One is that of implementing true iteration. I actually 
 managed to implement last night something that allows you to do:
 
 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);
 
 Two other iterations are possible: by key and by value (in those cases 
 iter.front just returns a key or a value).
 
 One question is, what names should these bear? I am thinking of makign 
 opSlice() a universal method of getting the "all" iterator, a default 
 that every container must implement.
 
 For AAs, there would be a "iterate keys" and "iterate values" properties 
 or functions. How should they be called?
 
 
 Thanks,
 
 Andrei

that looks neat. What's the mechanism to tie the templates to the syntax? 

I don't understand why all containers must provide a default range. What's the
default for a binary tree?  it has several orderings (pre, post, infix) but i
can't say that one is "more default" the the other.

Oct 28 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

yigal chripun wrote:
 Andrei Alexandrescu Wrote:
 
 Walter has magically converted his work on T[new] into work on making 
 associative arrays true templates defined in druntime and not considered 
 very special by the compiler.

 This is very exciting because it opens up or simplifies a number of 
 possibilities. One is that of implementing true iteration. I actually 
 managed to implement last night something that allows you to do:

 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);

 Two other iterations are possible: by key and by value (in those cases 
 iter.front just returns a key or a value).

 One question is, what names should these bear? I am thinking of makign 
 opSlice() a universal method of getting the "all" iterator, a default 
 that every container must implement.

 For AAs, there would be a "iterate keys" and "iterate values" properties 
 or functions. How should they be called?


 Thanks,

 Andrei

 
 that looks neat. What's the mechanism to tie the templates to the syntax? 
 
 I don't understand why all containers must provide a default range. What's the
default for a binary tree?  it has several orderings (pre, post, infix) but i
can't say that one is "more default" the the other.  

The cheapest to implement. As I mentioned in the Bogo containers 
discussion, I think any container must implement some way of iterating 
it. A container without an iterator is not a container.

Andrei

Oct 28 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 That is debatable as it would make the same code do different things for 
 e.g. vectors and sparse vectors.

Iterating on the keys is more useful, in real-world programs.

Regarding the names:
- "keys", "values" return lazy iterators. "keys" returns a set-like object that
supports an O(1) opIn_r (and eventually few other basic set operations).
- "items" returns a lazy iterator of Tuple(key, value) (structs, then). This
may also be named "pairs".
- "allkeys", "allvalues", "allitems"/"allpairs" return arrays of
items/vales/Tuple.

If you want to keep the API small you can even omit "allkeys", "allvalues",
"allitems" (so you need to do array(aa.keys) if you want them all.

Bye,
bearophile

Oct 28 2009

"Nick Sabalausky" <a a.a> writes:

"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:hca087$dvm$1 digitalmars.com...
 Andrei Alexandrescu:

 That is debatable as it would make the same code do different things for
 e.g. vectors and sparse vectors.

 Iterating on the keys is more useful, in real-world programs.

If you mean because it's trivial to get a value given just the key, but 
not-so-trivial the other way around, then the same argument could be made 
for regular arrays. In fact, IIRC, Haxe does just that. But I find that to 
be more of a pain than just adding that second iteration variable whenever I 
actually do need the index/key.

 Regarding the names:
 - "keys", "values" return lazy iterators. "keys" returns a set-like object 
 that supports an O(1) opIn_r (and eventually few other basic set 
 operations).
 - "items" returns a lazy iterator of Tuple(key, value) (structs, then). 
 This may also be named "pairs".
 - "allkeys", "allvalues", "allitems"/"allpairs" return arrays of 
 items/vales/Tuple.

 If you want to keep the API small you can even omit "allkeys", 
 "allvalues", "allitems" (so you need to do array(aa.keys) if you want them 
 all.

This is pretty much what I had in mind as well.

On the second bullet point, my vote goes for "pairs". In many contexts (like 
.NET, I think?) "item" is often used to refer to just a value, which could 
be potentially confusing.

And I think I'd prefer to omit the "all*" stuff, partly because under that 
scheme "keys"/"values" return ranges that *also* cover "all" the elements, 
which would be confusing, and also partly in hope of member-call-syntax 
eventually being extended to all types instead of just arrays so it could 
then become a nice clean "aa.keys.array". But if there's a performance issue 
with that, then maybe just something more like "aa.arraykeys" or 
"aa.eagerkeys".

Oct 29 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Wed, 28 Oct 2009 17:22:00 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Walter has magically converted his work on T[new] into work on making  
 associative arrays true templates defined in druntime and not considered  
 very special by the compiler.

Wow, this is outstanding! (I hope it didn't have any negative impact on  
compile-time AA capabilities).

 This is very exciting because it opens up or simplifies a number of  
 possibilities. One is that of implementing true iteration. I actually  
 managed to implement last night something that allows you to do:

 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);

 Two other iterations are possible: by key and by value (in those cases  
 iter.front just returns a key or a value).

 One question is, what names should these bear? I am thinking of makign  
 opSlice() a universal method of getting the "all" iterator, a default  
 that every container must implement.

 For AAs, there would be a "iterate keys" and "iterate values" properties  
 or functions. How should they be called?


 Thanks,

 Andrei

If AA is providing a way to iterate over both keys and values (and it's a  
default iteration scheme), why should AA provide 2 other iteration  
schemes? Can't they be implemented externally (using adaptor ranges) with  
the same efficiency?

foreach (e; keys(aa)) {
     writefln("key: %s", e);
}

foreach (e; values(aa)) {
     writefln("value: %s", e);
}

I'd also like you to add a few things in an AA interface.

First, opIn should not return a pointer to Value, but a pointer to a pair  
of Key and Value, if possible (i.e. if this change won't sacrifice  
performance).
Second, AA.remove method should accept result of opIn operation to avoid  
an additional lookup for removal:

if (auto value = key in aa) {
     aa.remove(key); // an unnecessary lookup
}

Something like this would be perfect:

struct Element(K,V)
{
     const K key;
     V value;
}

struct AA(K,V)
{
     //...
     ref Element opIn(K key) { /* throws an exception if element is not  
found */ }
     void remove(ref Element elem) { /* removes an element from an AA */ }
     void remove(K key) { remove(key in this); }

     AARange!(K,V) opSlice() { /* iterates over both keys and values */ }
}

Last, I believe foreach loop should automatically call opSlice() on  
iteratee. There is currently an inconsistency with built-in types - you  
don't have to call [] on them, yet you must call it on all the other types:

// fine if array is T[] or K[V]
foreach (i; array) { ... }

// opSlice() is explicit and mandatory for user-defined containers because  
they are not ranges.
foreach (i; container[]) { ... }

Thanks!

Oct 28 2009

"Robert Jacques" <sandford jhu.edu> writes:

On Wed, 28 Oct 2009 15:06:34 -0400, Denis Koroskin <2korden gmail.com>  
wrote:

 On Wed, 28 Oct 2009 17:22:00 +0300, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Walter has magically converted his work on T[new] into work on making  
 associative arrays true templates defined in druntime and not  
 considered very special by the compiler.

 Wow, this is outstanding! (I hope it didn't have any negative impact on  
 compile-time AA capabilities).

 This is very exciting because it opens up or simplifies a number of  
 possibilities. One is that of implementing true iteration. I actually  
 managed to implement last night something that allows you to do:

 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);

 Two other iterations are possible: by key and by value (in those cases  
 iter.front just returns a key or a value).

 One question is, what names should these bear? I am thinking of makign  
 opSlice() a universal method of getting the "all" iterator, a default  
 that every container must implement.

 For AAs, there would be a "iterate keys" and "iterate values"  
 properties or functions. How should they be called?


 Thanks,

 Andrei

 If AA is providing a way to iterate over both keys and values (and it's  
 a default iteration scheme), why should AA provide 2 other iteration  
 schemes? Can't they be implemented externally (using adaptor ranges)  
 with the same efficiency?

 foreach (e; keys(aa)) {
      writefln("key: %s", e);
 }

 foreach (e; values(aa)) {
      writefln("value: %s", e);
 }

 I'd also like you to add a few things in an AA interface.

 First, opIn should not return a pointer to Value, but a pointer to a  
 pair of Key and Value, if possible (i.e. if this change won't sacrifice  
 performance).
 Second, AA.remove method should accept result of opIn operation to avoid  
 an additional lookup for removal:

 if (auto value = key in aa) {
      aa.remove(key); // an unnecessary lookup
 }

 Something like this would be perfect:

 struct Element(K,V)
 {
      const K key;
      V value;
 }

 struct AA(K,V)
 {
      //...
      ref Element opIn(K key) { /* throws an exception if element is not  
 found */ }

Not finding an element is a common use case, not an exception. Using  
exceptions to pass information is bad style, slow and prevents the use of  
AAs in pure/nothrow functions. Returning a pointer to an element would  
allow both key and value to be accessed and could be null if no element is  
found.

      void remove(ref Element elem) { /* removes an element from an AA */  
 }
      void remove(K key) { remove(key in this); }

      AARange!(K,V) opSlice() { /* iterates over both keys and values */ }
 }

 Last, I believe foreach loop should automatically call opSlice() on  
 iteratee. There is currently an inconsistency with built-in types - you  
 don't have to call [] on them, yet you must call it on all the other  
 types:

 // fine if array is T[] or K[V]
 foreach (i; array) { ... }

 // opSlice() is explicit and mandatory for user-defined containers  
 because they are not ranges.
 foreach (i; container[]) { ... }

 Thanks!

Oct 28 2009

=?ISO-8859-15?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:

Robert Jacques wrote:
 On Wed, 28 Oct 2009 15:06:34 -0400, Denis Koroskin <2korden gmail.com> 
 wrote:
 
 On Wed, 28 Oct 2009 17:22:00 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 Walter has magically converted his work on T[new] into work on making 
 associative arrays true templates defined in druntime and not 
 considered very special by the compiler.

 Wow, this is outstanding! (I hope it didn't have any negative impact 
 on compile-time AA capabilities).

 This is very exciting because it opens up or simplifies a number of 
 possibilities. One is that of implementing true iteration. I actually 
 managed to implement last night something that allows you to do:

 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);

 Two other iterations are possible: by key and by value (in those 
 cases iter.front just returns a key or a value).

 One question is, what names should these bear? I am thinking of 
 makign opSlice() a universal method of getting the "all" iterator, a 
 default that every container must implement.

 For AAs, there would be a "iterate keys" and "iterate values" 
 properties or functions. How should they be called?


 Thanks,

 Andrei

 If AA is providing a way to iterate over both keys and values (and 
 it's a default iteration scheme), why should AA provide 2 other 
 iteration schemes? Can't they be implemented externally (using adaptor 
 ranges) with the same efficiency?

 foreach (e; keys(aa)) {
      writefln("key: %s", e);
 }

 foreach (e; values(aa)) {
      writefln("value: %s", e);
 }

 I'd also like you to add a few things in an AA interface.

 First, opIn should not return a pointer to Value, but a pointer to a 
 pair of Key and Value, if possible (i.e. if this change won't 
 sacrifice performance).
 Second, AA.remove method should accept result of opIn operation to 
 avoid an additional lookup for removal:

 if (auto value = key in aa) {
      aa.remove(key); // an unnecessary lookup
 }

 Something like this would be perfect:

 struct Element(K,V)
 {
      const K key;
      V value;
 }

 struct AA(K,V)
 {
      //...
      ref Element opIn(K key) { /* throws an exception if element is 
 not found */ }

 
 Not finding an element is a common use case, not an exception. Using 
 exceptions to pass information is bad style, slow and prevents the use 
 of AAs in pure/nothrow functions. Returning a pointer to an element 
 would allow both key and value to be accessed and could be null if no 
 element is found.

Also, if opIn throws an exception, it kind of defeats the point of opIn, 
and turns it to opIndex.

Oct 28 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Wed, 28 Oct 2009 22:24:46 +0300, Robert Jacques <sandford jhu.edu>  
wrote:

 On Wed, 28 Oct 2009 15:06:34 -0400, Denis Koroskin <2korden gmail.com>  
 wrote:

 On Wed, 28 Oct 2009 17:22:00 +0300, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 Walter has magically converted his work on T[new] into work on making  
 associative arrays true templates defined in druntime and not  
 considered very special by the compiler.

 Wow, this is outstanding! (I hope it didn't have any negative impact on  
 compile-time AA capabilities).

 This is very exciting because it opens up or simplifies a number of  
 possibilities. One is that of implementing true iteration. I actually  
 managed to implement last night something that allows you to do:

 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);

 Two other iterations are possible: by key and by value (in those cases  
 iter.front just returns a key or a value).

 One question is, what names should these bear? I am thinking of makign  
 opSlice() a universal method of getting the "all" iterator, a default  
 that every container must implement.

 For AAs, there would be a "iterate keys" and "iterate values"  
 properties or functions. How should they be called?


 Thanks,

 Andrei

 If AA is providing a way to iterate over both keys and values (and it's  
 a default iteration scheme), why should AA provide 2 other iteration  
 schemes? Can't they be implemented externally (using adaptor ranges)  
 with the same efficiency?

 foreach (e; keys(aa)) {
      writefln("key: %s", e);
 }

 foreach (e; values(aa)) {
      writefln("value: %s", e);
 }

 I'd also like you to add a few things in an AA interface.

 First, opIn should not return a pointer to Value, but a pointer to a  
 pair of Key and Value, if possible (i.e. if this change won't sacrifice  
 performance).
 Second, AA.remove method should accept result of opIn operation to  
 avoid an additional lookup for removal:

 if (auto value = key in aa) {
      aa.remove(key); // an unnecessary lookup
 }

 Something like this would be perfect:

 struct Element(K,V)
 {
      const K key;
      V value;
 }

 struct AA(K,V)
 {
      //...
      ref Element opIn(K key) { /* throws an exception if element is not  
 found */ }

 Not finding an element is a common use case, not an exception. Using  
 exceptions to pass information is bad style, slow and prevents the use  
 of AAs in pure/nothrow functions. Returning a pointer to an element  
 would allow both key and value to be accessed and could be null if no  
 element is found.

Ooops, right, I first wrote it to return a pointer but changed to a  
reference in last moment (mixed it up with opIndex for some reason).
AA.remove should accept a pointer, too.

      void remove(ref Element elem) { /* removes an element from an AA  
 */ }
      void remove(K key) { remove(key in this); }

      AARange!(K,V) opSlice() { /* iterates over both keys and values */  
 }
 }

 Last, I believe foreach loop should automatically call opSlice() on  
 iteratee. There is currently an inconsistency with built-in types - you  
 don't have to call [] on them, yet you must call it on all the other  
 types:

 // fine if array is T[] or K[V]
 foreach (i; array) { ... }

 // opSlice() is explicit and mandatory for user-defined containers  
 because they are not ranges.
 foreach (i; container[]) { ... }

 Thanks!

Oct 28 2009

=?UTF-8?B?UGVsbGUgTcOlbnNzb24=?= <pelle.mansson gmail.com> writes:

Denis Koroskin wrote:
 On Wed, 28 Oct 2009 17:22:00 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Walter has magically converted his work on T[new] into work on making 
 associative arrays true templates defined in druntime and not 
 considered very special by the compiler.

 
 Wow, this is outstanding! (I hope it didn't have any negative impact on 
 compile-time AA capabilities).
 
 This is very exciting because it opens up or simplifies a number of 
 possibilities. One is that of implementing true iteration. I actually 
 managed to implement last night something that allows you to do:

 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);

 Two other iterations are possible: by key and by value (in those cases 
 iter.front just returns a key or a value).

 One question is, what names should these bear? I am thinking of makign 
 opSlice() a universal method of getting the "all" iterator, a default 
 that every container must implement.

 For AAs, there would be a "iterate keys" and "iterate values" 
 properties or functions. How should they be called?


 Thanks,

 Andrei

 
 If AA is providing a way to iterate over both keys and values (and it's 
 a default iteration scheme), why should AA provide 2 other iteration 
 schemes? Can't they be implemented externally (using adaptor ranges) 
 with the same efficiency?
 
 foreach (e; keys(aa)) {
     writefln("key: %s", e);
 }
 
 foreach (e; values(aa)) {
     writefln("value: %s", e);
 }

Why would you prefer keys(aa) over aa.keys?

 Last, I believe foreach loop should automatically call opSlice() on 
 iteratee. There is currently an inconsistency with built-in types - you 
 don't have to call [] on them, yet you must call it on all the other types:

Try implementing the range interface (front, popFront and empty), and 
they are ranges. Magic! opApply is worth mentioning here, as well.

Oct 28 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 On Wed, 28 Oct 2009 17:22:00 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Walter has magically converted his work on T[new] into work on making 
 associative arrays true templates defined in druntime and not 
 considered very special by the compiler.

 
 Wow, this is outstanding! (I hope it didn't have any negative impact on 
 compile-time AA capabilities).
 
 This is very exciting because it opens up or simplifies a number of 
 possibilities. One is that of implementing true iteration. I actually 
 managed to implement last night something that allows you to do:

 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);

 Two other iterations are possible: by key and by value (in those cases 
 iter.front just returns a key or a value).

 One question is, what names should these bear? I am thinking of makign 
 opSlice() a universal method of getting the "all" iterator, a default 
 that every container must implement.

 For AAs, there would be a "iterate keys" and "iterate values" 
 properties or functions. How should they be called?


 Thanks,

 Andrei

 
 If AA is providing a way to iterate over both keys and values (and it's 
 a default iteration scheme), why should AA provide 2 other iteration 
 schemes? Can't they be implemented externally (using adaptor ranges) 
 with the same efficiency?
 
 foreach (e; keys(aa)) {
     writefln("key: %s", e);
 }
 
 foreach (e; values(aa)) {
     writefln("value: %s", e);
 }

Of course. In fact, given the iterator with .key and .value, you can 
always apply map!"a.key" or map!"a.value" to select the desired member.

 I'd also like you to add a few things in an AA interface.
 
 First, opIn should not return a pointer to Value, but a pointer to a 
 pair of Key and Value, if possible (i.e. if this change won't sacrifice 
 performance).

I'm coy about adding that because it forces the implementation to hold 
keys and values next to each other. I think that was a minor mistake of 
STL - there's too much exposure of layout details.

 Second, AA.remove method should accept result of opIn operation to avoid 
 an additional lookup for removal:
 
 if (auto value = key in aa) {
     aa.remove(key); // an unnecessary lookup
 }

I'll make aa.remove(key) always work and return a bool that tells you 
whether there was a mapping or not.

 Something like this would be perfect:
 
 struct Element(K,V)
 {
     const K key;
     V value;
 }
 
 struct AA(K,V)
 {
     //...
     ref Element opIn(K key) { /* throws an exception if element is not 
 found */ }
     void remove(ref Element elem) { /* removes an element from an AA */ }
     void remove(K key) { remove(key in this); }
 
     AARange!(K,V) opSlice() { /* iterates over both keys and values */ }
 }
 
 Last, I believe foreach loop should automatically call opSlice() on 
 iteratee.

foreach in D2 should already call opSlice() whenever it's defined. If it 
doesn't, that's a bug in the compiler.


Andrei

Oct 28 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Wed, 28 Oct 2009 23:18:08 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I'd also like you to add a few things in an AA interface.
  First, opIn should not return a pointer to Value, but a pointer to a  
 pair of Key and Value, if possible (i.e. if this change won't sacrifice  
 performance).

 I'm coy about adding that because it forces the implementation to hold  
 keys and values next to each other. I think that was a minor mistake of  
 STL - there's too much exposure of layout details.

It doesn't have to be the case: key and value are both properties (i.e.  
methods), and they doesn't have to be located next to each other.

 Second, AA.remove method should accept result of opIn operation to  
 avoid an additional lookup for removal:
  if (auto value = key in aa) {
     aa.remove(key); // an unnecessary lookup
 }

 I'll make aa.remove(key) always work and return a bool that tells you  
 whether there was a mapping or not.

Err... How does it solve the double lookup problem?

Oct 28 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 On Wed, 28 Oct 2009 23:18:08 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 I'd also like you to add a few things in an AA interface.
  First, opIn should not return a pointer to Value, but a pointer to a 
 pair of Key and Value, if possible (i.e. if this change won't 
 sacrifice performance).

 I'm coy about adding that because it forces the implementation to hold 
 keys and values next to each other. I think that was a minor mistake 
 of STL - there's too much exposure of layout details.

 
 It doesn't have to be the case: key and value are both properties (i.e. 
 methods), and they doesn't have to be located next to each other.

I see. So you want a pointer to an elaborate type featuring a key and a 
value.

 Second, AA.remove method should accept result of opIn operation to 
 avoid an additional lookup for removal:
  if (auto value = key in aa) {
     aa.remove(key); // an unnecessary lookup
 }

 I'll make aa.remove(key) always work and return a bool that tells you 
 whether there was a mapping or not.

 
 Err... How does it solve the double lookup problem?

Your test looks something up and then removes it.


Andrei

Oct 28 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Thu, 29 Oct 2009 03:08:34 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 On Wed, 28 Oct 2009 23:18:08 +0300, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I'd also like you to add a few things in an AA interface.
  First, opIn should not return a pointer to Value, but a pointer to a  
 pair of Key and Value, if possible (i.e. if this change won't  
 sacrifice performance).

 I'm coy about adding that because it forces the implementation to hold  
 keys and values next to each other. I think that was a minor mistake  
 of STL - there's too much exposure of layout details.

  It doesn't have to be the case: key and value are both properties  
 (i.e. methods), and they doesn't have to be located next to each other.

 I see. So you want a pointer to an elaborate type featuring a key and a  
 value.

 Second, AA.remove method should accept result of opIn operation to  
 avoid an additional lookup for removal:
  if (auto value = key in aa) {
     aa.remove(key); // an unnecessary lookup
 }

 I'll make aa.remove(key) always work and return a bool that tells you  
 whether there was a mapping or not.

  Err... How does it solve the double lookup problem?

 Your test looks something up and then removes it.


 Andrei

Well, my extended test case looks something up, manipulates the found  
value, and then possibly removes it.

Oct 28 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Denis Koroskin wrote:
 On Thu, 29 Oct 2009 03:08:34 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Denis Koroskin wrote:
 On Wed, 28 Oct 2009 23:18:08 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:

 I'd also like you to add a few things in an AA interface.
  First, opIn should not return a pointer to Value, but a pointer to 
 a pair of Key and Value, if possible (i.e. if this change won't 
 sacrifice performance).

 I'm coy about adding that because it forces the implementation to 
 hold keys and values next to each other. I think that was a minor 
 mistake of STL - there's too much exposure of layout details.

  It doesn't have to be the case: key and value are both properties 
 (i.e. methods), and they doesn't have to be located next to each other.

 I see. So you want a pointer to an elaborate type featuring a key and 
 a value.

 Second, AA.remove method should accept result of opIn operation to 
 avoid an additional lookup for removal:
  if (auto value = key in aa) {
     aa.remove(key); // an unnecessary lookup
 }

 I'll make aa.remove(key) always work and return a bool that tells 
 you whether there was a mapping or not.

  Err... How does it solve the double lookup problem?

 Your test looks something up and then removes it.


 Andrei

 
 Well, my extended test case looks something up, manipulates the found 
 value, and then possibly removes it.

Ok, I understand your points, thanks for explaining.

Andrei

Oct 28 2009

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, el 28 de octubre a las 20:29 me escribiste:
Your test looks something up and then removes it.


Andrei

Well, my extended test case looks something up, manipulates the
found value, and then possibly removes it.

 
 Ok, I understand your points, thanks for explaining.

What about and overload of remove() like this:
bool remove(in T key, out U value);

If the element was present, it's returned in "value", so you can
manipulate it. I thought about just returning a pointer:
U* remove(in T key);

But I guess that pointer would point to the element stored in the the AA
private data, but that element was just removed, so bad things would
happen, that's why the only option is to copy the data, right?

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Y no es el centro del Universo! El sol gira alrededor de la tierra!
Miles de girasoles no pueden estar equivocados!
	-- Inodoro Pereyra

Oct 29 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Leandro Lucarella wrote:
 Andrei Alexandrescu, el 28 de octubre a las 20:29 me escribiste:
 Your test looks something up and then removes it.


 Andrei

 Well, my extended test case looks something up, manipulates the
 found value, and then possibly removes it.

 Ok, I understand your points, thanks for explaining.

 
 What about and overload of remove() like this:
 bool remove(in T key, out U value);
 
 If the element was present, it's returned in "value", so you can
 manipulate it. I thought about just returning a pointer:
 U* remove(in T key);
 
 But I guess that pointer would point to the element stored in the the AA
 private data, but that element was just removed, so bad things would
 happen, that's why the only option is to copy the data, right?

I think this all is overdoing it. First, I disagree that remove should 
ever throw an exception. It's not a code that the user is supposed to 
check (with dire consequences if she doesn't), it's just additional 
information just in case you need it.

I think bool remove(key) is better than all other designs suggested so far.


Andrei

Oct 29 2009

Bill Baxter <wbaxter gmail.com> writes:

On Thu, Oct 29, 2009 at 8:39 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Leandro Lucarella wrote:
 Andrei Alexandrescu, el 28 de octubre a las 20:29 me escribiste:
 Your test looks something up and then removes it.


 Andrei

 Well, my extended test case looks something up, manipulates the
 found value, and then possibly removes it.

 Ok, I understand your points, thanks for explaining.

 What about and overload of remove() like this:
 bool remove(in T key, out U value);

 If the element was present, it's returned in "value", so you can
 manipulate it. I thought about just returning a pointer:
 U* remove(in T key);

 But I guess that pointer would point to the element stored in the the AA
 private data, but that element was just removed, so bad things would
 happen, that's why the only option is to copy the data, right?

 I think this all is overdoing it. First, I disagree that remove should ever
 throw an exception. It's not a code that the user is supposed to check (with
 dire consequences if she doesn't), it's just additional information just in
 case you need it.

 I think bool remove(key) is better than all other designs suggested so far.

I agree with the folks who say it's error-prone.  I can just see
myself now removing a key I know is in the dictionary and being
baffled when my program fails somewhere later on because I typed
aa.remove("theKey") when it should have been aa.remove("thekey").  I
knew it was there so I didn't want to clutter up my code with a check
for it.

So the advice would be to always check to make sure the key you remove
got removed to be on the safe side.   If that's the case I'd rather
just have the darn thing throw an exception so I don't have to bother
to write things like
     if (!aa.remove("theKey")) assert(false);
everywhere just to be on the safe side.



--bb

Oct 29 2009

KennyTM~ <kennytm gmail.com> writes:

On Oct 29, 09 23:59, Bill Baxter wrote:
 On Thu, Oct 29, 2009 at 8:39 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>  wrote:
 Leandro Lucarella wrote:
 Andrei Alexandrescu, el 28 de octubre a las 20:29 me escribiste:
 Your test looks something up and then removes it.


 Andrei

 Well, my extended test case looks something up, manipulates the
 found value, and then possibly removes it.

 Ok, I understand your points, thanks for explaining.

 What about and overload of remove() like this:
 bool remove(in T key, out U value);

 If the element was present, it's returned in "value", so you can
 manipulate it. I thought about just returning a pointer:
 U* remove(in T key);

 But I guess that pointer would point to the element stored in the the AA
 private data, but that element was just removed, so bad things would
 happen, that's why the only option is to copy the data, right?

 I think this all is overdoing it. First, I disagree that remove should ever
 throw an exception. It's not a code that the user is supposed to check (with
 dire consequences if she doesn't), it's just additional information just in
 case you need it.

 I think bool remove(key) is better than all other designs suggested so far.

 I agree with the folks who say it's error-prone.  I can just see
 myself now removing a key I know is in the dictionary and being
 baffled when my program fails somewhere later on because I typed
 aa.remove("theKey") when it should have been aa.remove("thekey").  I
 knew it was there so I didn't want to clutter up my code with a check
 for it.

Um what? aa["theKey"] = 1 doesn't fail, why should aa.remove("theKey") 
be special?

(Meanwhile, C++'s map::erase returns the number of elements removed, 

delete x[y] returns a bool, Perl, Ruby, Objective-C, Java, etc. all 
won't throw exceptions (Python being the only exceptional case). Do all 
languages clutter up with checks?)

 So the advice would be to always check to make sure the key you remove
 got removed to be on the safe side.   If that's the case I'd rather
 just have the darn thing throw an exception so I don't have to bother
 to write things like
       if (!aa.remove("theKey")) assert(false);
 everywhere just to be on the safe side.

void discard(K,V)(ref V[K] aa, in K key) {
   if (!aa.remove(key)) assert(false);
}

...

aa.discard("theKey");

 --bb

Oct 29 2009

bearophile <bearophileHUGS lycos.com> writes:

KennyTM~:

 Um what? aa["theKey"] = 1 doesn't fail, why should aa.remove("theKey") 
 be special?

That's a different situation.
You probably meant to say: If aa["theKey"]++; doesn't fail, why should
aa.remove("theKey") be special?


 void discard(K,V)(ref V[K] aa, in K key) {
    if (!aa.remove(key)) assert(false);
 }

This is stupid. Part of the point of built-in AAs is to avoid to import things.
If I need to import (or worse define) that discard template in all programs
where I use an AA, then I will create my own AAs and I'll just import and use
them in the first place.

Bye,
bearophile

Oct 29 2009

KennyTM~ <kennytm gmail.com> writes:

On Oct 30, 09 01:14, bearophile wrote:
 KennyTM~:

 Um what? aa["theKey"] = 1 doesn't fail, why should aa.remove("theKey")
 be special?

 That's a different situation.
 You probably meant to say: If aa["theKey"]++; doesn't fail, why should
aa.remove("theKey") be special?


 void discard(K,V)(ref V[K] aa, in K key) {
     if (!aa.remove(key)) assert(false);
 }

 This is stupid. Part of the point of built-in AAs is to avoid to import
things. If I need to import (or worse define) that discard template in all
programs where I use an AA, then I will create my own AAs and I'll just import
and use them in the first place.

 Bye,
 bearophile

This does not contradicts with .remove() not throwing an exception when 
a key is not found. If you like you can put it in object.d.

(Moreover, having .remove() to throw means you can't delete any 
dictionary items in nothrow functions. Sure you can silent it with 
try/catch but that's expensive.)

Oct 29 2009

Leandro Lucarella <llucax gmail.com> writes:

KennyTM~, el 30 de octubre a las 02:55 me escribiste:
 On Oct 30, 09 01:14, bearophile wrote:
KennyTM~:

Um what? aa["theKey"] = 1 doesn't fail, why should aa.remove("theKey")
be special?

That's a different situation.
You probably meant to say: If aa["theKey"]++; doesn't fail, why should
aa.remove("theKey") be special?


void discard(K,V)(ref V[K] aa, in K key) {
    if (!aa.remove(key)) assert(false);
}

This is stupid. Part of the point of built-in AAs is to avoid to import things.
If I need to import (or worse define) that discard template in all programs
where I use an AA, then I will create my own AAs and I'll just import and use
them in the first place.

Bye,
bearophile

 
 This does not contradicts with .remove() not throwing an exception
 when a key is not found. If you like you can put it in object.d.
 
 (Moreover, having .remove() to throw means you can't delete any
 dictionary items in nothrow functions. Sure you can silent it with
 try/catch but that's expensive.)

Not if it's an Error instead of an exception. I think that should be the
case, .remove() check should be like bound check, something done at
non-release mode, not something to use in the regular flow of a program to
avoid using opIn().

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Esta desenchufada la internet de ese televisor?
	-- Osvaldo Lucarella

Oct 29 2009

bearophile <bearophileHUGS lycos.com> writes:

KennyTM~:

 (Moreover, having .remove() to throw means you can't delete any 
 dictionary items in nothrow functions. Sure you can silent it with 
 try/catch but that's expensive.)

In nothrow functions you can use a different method, like "discard" (or a
similar name less intuitive than remove), that's like remove, but it doesn't
throw and just returns false when the key was absent.
The idea is to use the safer method by default and the less safe one as a
performance optimization (or where you are sure you want that semantics) in the
other places.

Bye,
bearophile

Oct 29 2009

"Simen Kjaeraas" <simen.kjaras gmail.com> writes:

bearophile <bearophileHUGS lycos.com> wrote:

 In nothrow functions you can use a different method, like "discard" (or  
 a similar name less intuitive than remove), that's like remove, but it  
 doesn't throw and just returns false when the key was absent.
 The idea is to use the safer method by default and the less safe one as  
 a performance optimization (or where you are sure you want that  
 semantics) in the other places.

Please elucidate, what is unsafe about deleting something that isn't there?

--
Simen

Oct 29 2009

Leandro Lucarella <llucax gmail.com> writes:

Simen Kjaeraas, el 29 de octubre a las 22:06 me escribiste:
 bearophile <bearophileHUGS lycos.com> wrote:
 
In nothrow functions you can use a different method, like
"discard" (or a similar name less intuitive than remove), that's
like remove, but it doesn't throw and just returns false when the
key was absent.
The idea is to use the safer method by default and the less safe
one as a performance optimization (or where you are sure you want
that semantics) in the other places.

 
 Please elucidate, what is unsafe about deleting something that isn't there?

A leak, if the key was wrong (not if the key was right but deleted
before).

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Soñé que tenia un caballo
Que me trataba mejor que vos
Tenia tan buena onda con ella
Era mi yegua, mucho mas que vos

Oct 29 2009

Bill Baxter <wbaxter gmail.com> writes:

On Thu, Oct 29, 2009 at 9:57 AM, KennyTM~ <kennytm gmail.com> wrote:
 On Oct 29, 09 23:59, Bill Baxter wrote:
 On Thu, Oct 29, 2009 at 8:39 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> =A0wrote:
 Leandro Lucarella wrote:
 Andrei Alexandrescu, el 28 de octubre a las 20:29 me escribiste:
 Your test looks something up and then removes it.


 Andrei

 Well, my extended test case looks something up, manipulates the
 found value, and then possibly removes it.

 Ok, I understand your points, thanks for explaining.

 What about and overload of remove() like this:
 bool remove(in T key, out U value);

 If the element was present, it's returned in "value", so you can
 manipulate it. I thought about just returning a pointer:
 U* remove(in T key);

 But I guess that pointer would point to the element stored in the the =




AA
 private data, but that element was just removed, so bad things would
 happen, that's why the only option is to copy the data, right?

 I think this all is overdoing it. First, I disagree that remove should
 ever
 throw an exception. It's not a code that the user is supposed to check
 (with
 dire consequences if she doesn't), it's just additional information jus=



t
 in
 case you need it.

 I think bool remove(key) is better than all other designs suggested so
 far.

 I agree with the folks who say it's error-prone. =A0I can just see
 myself now removing a key I know is in the dictionary and being
 baffled when my program fails somewhere later on because I typed
 aa.remove("theKey") when it should have been aa.remove("thekey"). =A0I
 knew it was there so I didn't want to clutter up my code with a check
 for it.

 Um what? aa["theKey"] =3D 1 doesn't fail, why should aa.remove("theKey") =

be
 special?

Huh?  aa["theKey"] also doesn't delete entries, so why should it have
any bearing on what remove does?  The closest analogy would be the
case where aa["theKey"]=3D1 can't do what you ask it to, which would be
an out-of-memory error, which does generate an error of some kind, I
think.

 (Meanwhile, C++'s map::erase returns the number of elements removed,

Erase is also used for multimaps, so it's not exactly apples to
apples.  Your other examples are ok AFAIK.

 void discard(K,V)(ref V[K] aa, in K key) {
 =A0if (!aa.remove(key)) assert(false);
 }

 ...

 aa.discard("theKey");

Yep, that would be perfect if it were part of the standard AA
interface.  That would actually be my preferred solution, to have two
methods for removing elements from AAs.  One that throws and one that
doesn't.

--bb

Oct 29 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Bill Baxter wrote:
 On Thu, Oct 29, 2009 at 8:39 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Leandro Lucarella wrote:
 Andrei Alexandrescu, el 28 de octubre a las 20:29 me escribiste:
 Your test looks something up and then removes it.


 Andrei

 Well, my extended test case looks something up, manipulates the
 found value, and then possibly removes it.

 Ok, I understand your points, thanks for explaining.

 What about and overload of remove() like this:
 bool remove(in T key, out U value);

 If the element was present, it's returned in "value", so you can
 manipulate it. I thought about just returning a pointer:
 U* remove(in T key);

 But I guess that pointer would point to the element stored in the the AA
 private data, but that element was just removed, so bad things would
 happen, that's why the only option is to copy the data, right?

 I think this all is overdoing it. First, I disagree that remove should ever
 throw an exception. It's not a code that the user is supposed to check (with
 dire consequences if she doesn't), it's just additional information just in
 case you need it.

 I think bool remove(key) is better than all other designs suggested so far.

 
 I agree with the folks who say it's error-prone.  I can just see
 myself now removing a key I know is in the dictionary and being
 baffled when my program fails somewhere later on because I typed
 aa.remove("theKey") when it should have been aa.remove("thekey").  I
 knew it was there so I didn't want to clutter up my code with a check
 for it.

I don't find this reasonable. If you know removal must have succeeded, 
just type enforce(aa.remove("theKey")). I don't think that's the 
overwhelmingly common case though, and if it's, say, about 50/50, then 
it's much more sensible to have a non-throwing primitive than a throwing 
one. And it looks like defining two primitives just to save a call to 
enforce is not a good design. After all, if you argue people forget and 
misspell things and all, I could argue they call the wrong function out 
of two with very similar charters. Honest, I just read a couple of posts 
proposing two primitives and for the life of me I already can't remember 
which was throwing and which wasn't.

 So the advice would be to always check to make sure the key you remove
 got removed to be on the safe side.

I'm not seeing how that comes about. The advice is to check if you care, 
which is common sense. I'm wondering how such an obvious design came 
into question.


Andrei

Oct 29 2009

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, el 29 de octubre a las 12:33 me escribiste:
 Bill Baxter wrote:
I think bool remove(key) is better than all other designs suggested so far.

I agree with the folks who say it's error-prone.  I can just see
myself now removing a key I know is in the dictionary and being
baffled when my program fails somewhere later on because I typed
aa.remove("theKey") when it should have been aa.remove("thekey").  I
knew it was there so I didn't want to clutter up my code with a check
for it.

 
 I don't find this reasonable. If you know removal must have
 succeeded, just type enforce(aa.remove("theKey")). I don't think

I don't agree, this is like saying you should bound-check every array
access. I think it's nice to have safety by default. If you are expecting
to have an error, you will be extra careful to spell the key correctly.
This should cover the cases where you are distracted and not expecting any
errors.

I think the check could be done in non-release mode only though, just
like bound checking.

 that's the overwhelmingly common case though, and if it's, say,
 about 50/50, then it's much more sensible to have a non-throwing
 primitive than a throwing one. And it looks like defining two
 primitives just to save a call to enforce is not a good design.

This is one case where I think practicality beats purity, because the
whole point of throwing an exception is for the cases you don't expect it
to fail. Again, think of array bounds, you can force the programmer to add
lots of enforce() in the code and remove bound checking from the compiler.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
EXPOSICION INTERNACIONAL DE INODOROS
	-- Crónica TV

Oct 29 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Leandro Lucarella wrote:
 Andrei Alexandrescu, el 29 de octubre a las 12:33 me escribiste:
 Bill Baxter wrote:
 I think bool remove(key) is better than all other designs suggested so far.

 I agree with the folks who say it's error-prone.  I can just see
 myself now removing a key I know is in the dictionary and being
 baffled when my program fails somewhere later on because I typed
 aa.remove("theKey") when it should have been aa.remove("thekey").  I
 knew it was there so I didn't want to clutter up my code with a check
 for it.

 I don't find this reasonable. If you know removal must have
 succeeded, just type enforce(aa.remove("theKey")). I don't think

 
 I don't agree, this is like saying you should bound-check every array
 access. I think it's nice to have safety by default.

remove from an associative array is not unsafe.

 If you are expecting
 to have an error, you will be extra careful to spell the key correctly.
 This should cover the cases where you are distracted and not expecting any
 errors.
 
 I think the check could be done in non-release mode only though, just
 like bound checking.

Bounds checking is eliminated in release mode solely for efficiency 
reasons. There is no other good reason to eliminate checks. But in the 
case of remove, the amount of work done is consistent enough to make a 
check negligible.

 that's the overwhelmingly common case though, and if it's, say,
 about 50/50, then it's much more sensible to have a non-throwing
 primitive than a throwing one. And it looks like defining two
 primitives just to save a call to enforce is not a good design.

 
 This is one case where I think practicality beats purity, because the
 whole point of throwing an exception is for the cases you don't expect it
 to fail. Again, think of array bounds, you can force the programmer to add
 lots of enforce() in the code and remove bound checking from the compiler.

I can't think of array bounds. The situations are completely unrelated.


Andrei

Oct 29 2009

Leandro Lucarella <llucax gmail.com> writes:

Andrei Alexandrescu, el 29 de octubre a las 13:26 me escribiste:
 Leandro Lucarella wrote:
Andrei Alexandrescu, el 29 de octubre a las 12:33 me escribiste:
Bill Baxter wrote:
I think bool remove(key) is better than all other designs suggested so far.

I agree with the folks who say it's error-prone.  I can just see
myself now removing a key I know is in the dictionary and being
baffled when my program fails somewhere later on because I typed
aa.remove("theKey") when it should have been aa.remove("thekey").  I
knew it was there so I didn't want to clutter up my code with a check
for it.

I don't find this reasonable. If you know removal must have
succeeded, just type enforce(aa.remove("theKey")). I don't think

I don't agree, this is like saying you should bound-check every array
access. I think it's nice to have safety by default.

 
 remove from an associative array is not unsafe.

In the sense of "memory corruption" that's true. But your program will be
at least leaking memory if you thought the key was present in the AA (the
majority of the use cases for remove(), because come on, if you are
removing something is because you think there are very good chances that
the key is in the AA :). If your program then iterate the AA and there is
a key that it doesn't supposed to be there, it will blow in very original,
hard to track ways. What's the point of that?

If you want to remove something you're not sure that it really is in the
AA, you should use a safer method. Maybe try_remove(), or an extra
parameter to remove() or maybe just ask first if key in aa (but this have
an extra lookup, and I think that's the whole point that triggered this
discussion).

If you are expecting
to have an error, you will be extra careful to spell the key correctly.
This should cover the cases where you are distracted and not expecting any
errors.

I think the check could be done in non-release mode only though, just
like bound checking.

 
 Bounds checking is eliminated in release mode solely for efficiency
 reasons. There is no other good reason to eliminate checks. But in

Well, same for removing an inexistent key from an AA then :)

 the case of remove, the amount of work done is consistent enough to
 make a check negligible.

Ok, but make it an "Error", not an exception, so it can be used in nothrow
functions. Removing an nonexistent key from an AA should be a programming
error, except if you state in some way that you know that maybe the key is
not there (which should be rare).

that's the overwhelmingly common case though, and if it's, say,
about 50/50, then it's much more sensible to have a non-throwing
primitive than a throwing one. And it looks like defining two
primitives just to save a call to enforce is not a good design.

This is one case where I think practicality beats purity, because the
whole point of throwing an exception is for the cases you don't expect it
to fail. Again, think of array bounds, you can force the programmer to add
lots of enforce() in the code and remove bound checking from the compiler.

 
 I can't think of array bounds. The situations are completely unrelated.

I don't think they are *completely* unrelated. I agree is not exactly the
same because array bound error automatically imply memory corruption and
nonexistent key removal don't, but they are a bug in the vast majority of
the cases. At least this is my experience.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Fitter, happier, more productive, comfortable, not drinking too much,
regular exercise at the gym (3 days a week),
getting on better with your associate employee contemporaries,

Oct 29 2009

Bill Baxter <wbaxter gmail.com> writes:

On Thu, Oct 29, 2009 at 11:16 AM, Leandro Lucarella <llucax gmail.com> wrot=
e:
 Andrei Alexandrescu, el 29 de octubre a las 12:33 me escribiste:
 Bill Baxter wrote:
I think bool remove(key) is better than all other designs suggested so=




 far.
I agree with the folks who say it's error-prone. =A0I can just see
myself now removing a key I know is in the dictionary and being
baffled when my program fails somewhere later on because I typed
aa.remove("theKey") when it should have been aa.remove("thekey"). =A0I
knew it was there so I didn't want to clutter up my code with a check
for it.

 I don't find this reasonable. If you know removal must have
 succeeded, just type enforce(aa.remove("theKey")). I don't think

 I don't agree, this is like saying you should bound-check every array
 access. I think it's nice to have safety by default. If you are expecting
 to have an error, you will be extra careful to spell the key correctly.
 This should cover the cases where you are distracted and not expecting an=

y
 errors.

 I think the check could be done in non-release mode only though, just
 like bound checking.

 that's the overwhelmingly common case though, and if it's, say,
 about 50/50, then it's much more sensible to have a non-throwing
 primitive than a throwing one. And it looks like defining two
 primitives just to save a call to enforce is not a good design.

 This is one case where I think practicality beats purity, because the
 whole point of throwing an exception is for the cases you don't expect it
 to fail. Again, think of array bounds, you can force the programmer to ad=

d
 lots of enforce() in the code and remove bound checking from the compiler=

.

The problem with this argument (which I was also about to make) is
that accessing an array out of bounds is almost never normal behavior.
 But for deleting things, it often is.  Think also about delete
itself.  In C++ and D delete on a null pointer is fine, and is defined
to be a no-op.  This was done explicitly to avoid always having to do
if(p) delete p; all the time.  And really it works out just fine in
practice.   If we're going to keep arguing that remove() should throw,
then really delete should too.

--bb

Oct 29 2009

Leandro Lucarella <llucax gmail.com> writes:

Bill Baxter, el 29 de octubre a las 11:31 me escribiste:
 On Thu, Oct 29, 2009 at 11:16 AM, Leandro Lucarella <llucax gmail.com> wrote:
 that's the overwhelmingly common case though, and if it's, say,
 about 50/50, then it's much more sensible to have a non-throwing
 primitive than a throwing one. And it looks like defining two
 primitives just to save a call to enforce is not a good design.

 This is one case where I think practicality beats purity, because the
 whole point of throwing an exception is for the cases you don't expect it
 to fail. Again, think of array bounds, you can force the programmer to add
 lots of enforce() in the code and remove bound checking from the compiler.

 
 The problem with this argument (which I was also about to make) is
 that accessing an array out of bounds is almost never normal behavior.
  But for deleting things, it often is.  Think also about delete
 itself.  In C++ and D delete on a null pointer is fine, and is defined
 to be a no-op.  This was done explicitly to avoid always having to do
 if(p) delete p; all the time.  And really it works out just fine in
 practice.   If we're going to keep arguing that remove() should throw,
 then really delete should too.

I don't think that's the case. If you have a wrong value in a pointer you
want to delete, you have 2 posibilities:
1) Is a valid pointer, in which case you are doomed, a wrong object would
   be deleted and your program is fucked.
2) Is an invalid pointer, in which case the program could explode
   inmediately. I don't know what happens right now, but an assertion in
   this case sould be useful too.

NULL pointers are not a problem, because it rarely indicates an error in
the program. delete NULL is useful for things like:

T* x = null;
if (blah)
	x = bleh();
foo(x);
delete x;

This might ever be a valid use case for AA too, and in that case I don't
really mind aa.remove(null) being a NOP.

The issue is with WRONG KEYS, not null ones. If you messed up with your
key (as well if you messed up with your pointer in delete), *bad* things
will happen, and even when it's not "memory corruption" per se (I think
the definition of SafeD is avoiding memory corruption only), it's close to
undefined behaviour, your program will start failing in very strange ways
and the bug will be too hard to track. It's basically the same problem
with dynamic languages that doesn't even need to "declare" a variable,
like perl or php.

So I think yes, remove should assert() if the key is not found (unless
you explicitly inform in some way that you don't want it to fail) and
delete should assert() if the pointer passed is not a pointer to a start
of a HEAP block. In either case is a bug, and I want to be notified about
it as soon as possible (at least in non-release mode).


-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Sé que tu me miras, pero yo me juraría que, en esos ojos negros que
tenés, hay un indio sensible que piensa: "Qué bárbaro que este tipo
blanco esté tratando de comunicarse conmigo que soy un ser inferior en
la escala del homo sapiens". Por eso, querido indio, no puedo dejar de
mirarte como si fueras un cobayo de mierda al que puedo pisar cuando
quiera.
	-- Ricardo Vaporeso. Carta a los aborígenes, ed. Gredos,
		Barcelona, 1912, página 102.

Oct 29 2009

Bill Baxter <wbaxter gmail.com> writes:

On Thu, Oct 29, 2009 at 10:33 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Thu, Oct 29, 2009 at 8:39 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Leandro Lucarella wrote:
 Andrei Alexandrescu, el 28 de octubre a las 20:29 me escribiste:
 Your test looks something up and then removes it.


 Andrei

 Well, my extended test case looks something up, manipulates the
 found value, and then possibly removes it.

 Ok, I understand your points, thanks for explaining.

 What about and overload of remove() like this:
 bool remove(in T key, out U value);

 If the element was present, it's returned in "value", so you can
 manipulate it. I thought about just returning a pointer:
 U* remove(in T key);

 But I guess that pointer would point to the element stored in the the =




AA
 private data, but that element was just removed, so bad things would
 happen, that's why the only option is to copy the data, right?

 I think this all is overdoing it. First, I disagree that remove should
 ever
 throw an exception. It's not a code that the user is supposed to check
 (with
 dire consequences if she doesn't), it's just additional information jus=



t
 in
 case you need it.

 I think bool remove(key) is better than all other designs suggested so
 far.

 I agree with the folks who say it's error-prone. =A0I can just see
 myself now removing a key I know is in the dictionary and being
 baffled when my program fails somewhere later on because I typed
 aa.remove("theKey") when it should have been aa.remove("thekey"). =A0I
 knew it was there so I didn't want to clutter up my code with a check
 for it.

 I don't find this reasonable. If you know removal must have succeeded, ju=

st
 type enforce(aa.remove("theKey")). I don't think that's the overwhelmingl=

y
 common case though, and if it's, say, about 50/50, then it's much more
 sensible to have a non-throwing primitive than a throwing one. And it loo=

ks
 like defining two primitives just to save a call to enforce is not a good
 design. After all, if you argue people forget and misspell things and all=

, I
 could argue they call the wrong function out of two with very similar
 charters. Honest, I just read a couple of posts proposing two primitives =

and
 for the life of me I already can't remember which was throwing and which
 wasn't.

 So the advice would be to always check to make sure the key you remove
 got removed to be on the safe side.

 I'm not seeing how that comes about. The advice is to check if you care,
 which is common sense. I'm wondering how such an obvious design came into
 question.

Ok.  You convinced me that your way doesn't suck.  And you can ask the
Python designers how the so-called "obvious" design came into
question.  Apparently it's not as obvious as you think.  And I would
suggest from experience with Python that having two functions is also
not as bad as you think.  But I'm convinced that having two functions
is not significantly better than having one function that doesn't
throw.

--bb

Oct 29 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 28 Oct 2009 20:08:34 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Denis Koroskin wrote:
 On Wed, 28 Oct 2009 23:18:08 +0300, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 I'd also like you to add a few things in an AA interface.
  First, opIn should not return a pointer to Value, but a pointer to a  
 pair of Key and Value, if possible (i.e. if this change won't  
 sacrifice performance).

 I'm coy about adding that because it forces the implementation to hold  
 keys and values next to each other. I think that was a minor mistake  
 of STL - there's too much exposure of layout details.

  It doesn't have to be the case: key and value are both properties  
 (i.e. methods), and they doesn't have to be located next to each other.

 I see. So you want a pointer to an elaborate type featuring a key and a  
 value.

It could return a struct holding both pointers, instead of a pointer to a  
struct.  I don't see this as unreasonable (in fact, it's what I do in  
dcollections, see the cursor types).  An implementation which stores the  
two close together in actuality may only require one pointer.

The only drawback of this is it can't be part of a formal interface, since  
the return type is defined by the derived type.

-Steve

Oct 29 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 I'll make aa.remove(key) always work and return a bool that tells you 
 whether there was a mapping or not.

I think that's a small design mistake. In a high level language you want things
to not fail silently. You want them to fail in an explicit way because
programmers often forget to read and use return values.

So AAs may have two methods, "remove" and "drop" (Python sets use "remove" and
"discard" for this). The "remove" can be the safer one and used by default in D
programs (especially in SafeD modules, safety is in things like this too), that
raises an exception when you try to remove a missing key. "drop/discard" is
faster and silent, it removes the key if it's present, as you want.

Bye,
bearophile

Oct 29 2009

=?ISO-8859-1?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:

bearophile wrote:
 Andrei Alexandrescu:
 
 I'll make aa.remove(key) always work and return a bool that tells you 
 whether there was a mapping or not.

 
 I think that's a small design mistake. In a high level language you want
things to not fail silently. You want them to fail in an explicit way because
programmers often forget to read and use return values.
 
 So AAs may have two methods, "remove" and "drop" (Python sets use "remove" and
"discard" for this). The "remove" can be the safer one and used by default in D
programs (especially in SafeD modules, safety is in things like this too), that
raises an exception when you try to remove a missing key. "drop/discard" is
faster and silent, it removes the key if it's present, as you want.
 
 Bye,
 bearophile

I agree with this. I usually want exceptions.

Oct 29 2009

bearophile <bearophileHUGS lycos.com> writes:

Pelle M�nsson:
 I agree with this. I usually want exceptions.

A problem with this is that currently exceptions are very slow in D compiled
with DMD (something like up to 11 times slower than Java exceptions, about 2-4
times slower than Python exceptions. I have a benchmark for this on my site).

Bye,
bearophile

Oct 29 2009

bearophile <bearophileHUGS lycos.com> writes:

bearophile:

 Pelle M�nsson:
 I agree with this. I usually want exceptions.

 
 A problem with this is that currently exceptions are very slow in D compiled
with DMD (something like up to 11 times slower than Java exceptions, about 2-4
times slower than Python exceptions. I have a benchmark for this on my site).

I have just redone the timings with LDC on Ubuntu (I'll put the timings online
soon): Java is only about 2.2 times faster. This is much better and almost
acceptable.

Bye,
bearophile

Oct 29 2009

rmcguire <rjmcguire gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 
 Denis Koroskin wrote:
 On Wed, 28 Oct 2009 17:22:00 +0300, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Walter has magically converted his work on T[new] into work on making 
 associative arrays true templates defined in druntime and not 
 considered very special by the compiler.

 
 Wow, this is outstanding! (I hope it didn't have any negative impact on 
 compile-time AA capabilities).
 
 This is very exciting because it opens up or simplifies a number of 
 possibilities. One is that of implementing true iteration. I actually 
 managed to implement last night something that allows you to do:

 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);

 Two other iterations are possible: by key and by value (in those cases 
 iter.front just returns a key or a value).

 One question is, what names should these bear? I am thinking of makign 
 opSlice() a universal method of getting the "all" iterator, a default 
 that every container must implement.

 For AAs, there would be a "iterate keys" and "iterate values" 
 properties or functions. How should they be called?


 Thanks,

 Andrei

 
 If AA is providing a way to iterate over both keys and values (and it's 
 a default iteration scheme), why should AA provide 2 other iteration 
 schemes? Can't they be implemented externally (using adaptor ranges) 
 with the same efficiency?
 
 foreach (e; keys(aa)) {
     writefln("key: %s", e);
 }
 
 foreach (e; values(aa)) {
     writefln("value: %s", e);
 }

 
 Of course. In fact, given the iterator with .key and .value, you can 
 always apply map!"a.key" or map!"a.value" to select the desired member.
 
 I'd also like you to add a few things in an AA interface.
 
 First, opIn should not return a pointer to Value, but a pointer to a 
 pair of Key and Value, if possible (i.e. if this change won't sacrifice 
 performance).

 
 I'm coy about adding that because it forces the implementation to hold 
 keys and values next to each other. I think that was a minor mistake of 
 STL - there's too much exposure of layout details.
 
 Second, AA.remove method should accept result of opIn operation to avoid 
 an additional lookup for removal:
 
 if (auto value = key in aa) {
     aa.remove(key); // an unnecessary lookup
 }

 
 I'll make aa.remove(key) always work and return a bool that tells you 
 whether there was a mapping or not.
 
 Something like this would be perfect:
 
 struct Element(K,V)
 {
     const K key;
     V value;
 }
 
 struct AA(K,V)
 {
     //...
     ref Element opIn(K key) { /* throws an exception if element is not 
 found */ }
     void remove(ref Element elem) { /* removes an element from an AA */ }
     void remove(K key) { remove(key in this); }
 
     AARange!(K,V) opSlice() { /* iterates over both keys and values */ }
 }
 
 Last, I believe foreach loop should automatically call opSlice() on 
 iteratee.

 
 foreach in D2 should already call opSlice() whenever it's defined. If it 
 doesn't, that's a bug in the compiler.
 
 
 Andrei
 


Wouldn't opIn be more useful if it returned a range starting with
the element that was found?

Oct 29 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

rmcguire wrote:
 Wouldn't opIn be more useful if it returned a range starting with
 the element that was found?

Thought about that, but it's hard to justify. Items aren't sorted in any 
particular order, so you'd get pretty much a random bunch of stuff.

Andrei

Oct 29 2009

rmcguire <rjmcguire gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 
 rmcguire wrote:
 Wouldn't opIn be more useful if it returned a range starting with
 the element that was found?

 
 Thought about that, but it's hard to justify. Items aren't sorted in any 
 particular order, so you'd get pretty much a random bunch of stuff.
 
 Andrei
 

Oh, I thought that ranges were going to be applied to network/file IO.

I which case there is plenty of places one might want to jump forward in a 
"range"...

ah but then we're talking AAs.

Well I suppose the Socket class could have opIn return a range.

hehe

Thanks for replying

Oct 29 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Wed, 28 Oct 2009 10:22:00 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Walter has magically converted his work on T[new] into work on making  
 associative arrays true templates defined in druntime and not considered  
 very special by the compiler.

 This is very exciting because it opens up or simplifies a number of  
 possibilities. One is that of implementing true iteration. I actually  
 managed to implement last night something that allows you to do:

 int[int] aa = [ 1:1 ];
 auto iter = aa.each;
 writeln(iter.front.key);
 writeln(iter.front.value);

 Two other iterations are possible: by key and by value (in those cases  
 iter.front just returns a key or a value).

 One question is, what names should these bear? I am thinking of makign  
 opSlice() a universal method of getting the "all" iterator, a default  
 that every container must implement.

 For AAs, there would be a "iterate keys" and "iterate values" properties  
 or functions. How should they be called?

This is one of the benefits of opApply.

Rather than define some new names to make up for the deficiencies of  
range-foreach (especially so near the release of D2), why not focus on  
fixing the deficiencies?

i.e. here is my desired interface:

foreach(key, value; aa) // iterate both keys and values
foreach(value; aa) // default to iterating values (to be consistent with  
arrays)
foreach(key; aa.keys) // iterate keys lazily.

If ranges cannot support this interface, I think the range paradigm needs  
some work.  opApply works great for this since the delegate type defines  
the desired interface.  The problem with ranges is the potential "opRange"  
method in it's natural form should overload on the return value, and have  
no parameters.

Here's a possible idea: make foreach(X x, Y y, Z z,...; container)  
translate to foreach(X x, Y y, Z z,...; container.opRange!(X, Y, Z,  
...)()).  In the case of inferred types, maybe you can do something like:

opRange!(T : keytype, U : valuetype)() and have the compiler infer types  
for x and y as keytype and valuetype.

The only drawback I see is opRange can't be virtual.  However, how does  
one define a "virtual" range return, when a range is invariably a struct?   
It might be a moot issue.

Of course, you could always meet the virtual requirement by naming your  
range-generating method...

So with my scheme, the AA becomes:

struct AA(Key, Val)
{
   ... // implementation
   auto opRange(T : Key, U : Val)() { ... }
   auto opRange(T : Val)() { ... }
   auto keys() { ... }
}

One other thing left to define is what foreach calls to get the key(s) in  
the case of multiple arguments to foreach.  The value is generally  
accessed via front().  Might as well tackle the ref possibilities too :)

-Steve

Oct 29 2009

Kagamin <spam here.lot> writes:

Andrei Alexandrescu Wrote:

 Two other iterations are possible: by key and by value (in those cases 
 iter.front just returns a key or a value).

Well, AA already has properties keys and values and you can already iterate
over them. I think, it's ok to restrict to opApply(int delegate(ref KeyType,
ref ValueType)).

Oct 29 2009

D Programming

C/C++ Programming

Other

digitalmars.D - associative arrays: iteration is finally here