digitalmars.D - Revamping associative arrays

Andrei Alexandrescu (10/10) Oct 17 2009 Associative arrays are today quite problematic because they don't offer

Jason House (3/18) Oct 17 2009 I've heard that D's AA's are not GC friendly.
Chris Nicholson-Sauls (3/18) Oct 17 2009 Idea: the .keys and .values properties, rather than creating arrays, cou...

BCS (5/26) Oct 17 2009 what will this do?

Moritz Warning (5/34) Oct 17 2009 It's undefined behavior.

Moritz Warning (4/40) Oct 17 2009 Uh, sorry.

BCS (3/21) Oct 17 2009 As it happens, your comment would be correct if .keys were, as proposed,...

BCS (9/26) Oct 17 2009 That's the reason for the above idiom. Under the current spec, the above...
Christopher Wright (9/20) Oct 18 2009 That's really annoying, and it's true of most (all?) C# base class

Chris Nicholson-Sauls (8/39) Oct 18 2009 Pre-cache a pointer to the next node in sequence; two pointers is still ...
Steven Schveighoffer (7/29) Oct 20 2009 http://www.dsource.org/projects/dcollections/docs/current/dcollections.m...

dsimcha (329/339) Oct 17 2009 I've thought about this a lot, here's a laundry list.

language_fan (5/19) Oct 17 2009 [snip]

Walter Bright (4/7) Oct 18 2009 They already are. In D1, the complete implementation is in

BCS (62/79) Oct 17 2009 Adding more primes to that list should resolve that issue nicely while r...

Max Samukha (12/22) Oct 17 2009 They should be true reference types. Now we have oddities like this:

grauzone (8/39) Oct 17 2009 That's because the actual AA is allocated on demand (when you add the

Max Samukha (9/48) Oct 18 2009 I agree. But the problem is the lack of a decent way to instantiate an

bearophile (18/26) Oct 18 2009 This is an empty AA, you need nothing else:

bearophile (5/7) Oct 18 2009 I have just found that you can also use "null" there. It's less explicit...
Max Samukha (4/15) Oct 18 2009 I showed in my original post it is not enough to use AA as a reference

Robert Clipsham (5/7) Oct 17 2009 Their speed. I did a benchmark a while ago comparing tango's HashMap

Andrei Alexandrescu (3/11) Oct 17 2009 I hear you, but fortunately that doesn't concern the interface.

bearophile (37/37) Oct 17 2009 Andrei Alexandrescu:

Walter Bright (3/9) Oct 18 2009 You can iterate over both keys and values with:

bearophile (6/8) Oct 18 2009 I know this, of course, but you are missing the point by a mile. An expl...

Piotrek (7/9) Oct 18 2009 No! No! No! Maybe you are wrong. Or it's a metter of taste. I remember t...

bearophile (11/18) Oct 18 2009 I have have followed many Python programmers for a lot of time and I thi...

Sergey Gromov (8/15) Oct 18 2009 It's easy to see what's intuitive if you consider what a collection
Piotrek (31/41) Oct 18 2009 I didn't mean to be offensive. All people (including me) have tendency

grauzone (8/16) Oct 18 2009 In a perfect world, iterating over an AA would yield a tuple (key,

bearophile (7/14) Oct 18 2009 I don't know why Python has originally chosen to iterate on just keys, m...

Christopher Wright (4/11) Oct 18 2009 Why not mandate using both keys and values? That should eliminate ambigu...

Andrei Alexandrescu (6/14) Oct 18 2009 Well it should be said Walter is doing his best too, I'm just bitching

bearophile (7/8) Oct 18 2009 The things I've written about aren't all equally important and they don'...

Bill Baxter (28/30) Oct 18 2009 I typed a long post that weighed lots of pros and cons of different

Piotrek (12/14) Oct 18 2009 I was almost convinced, because that rule has a sense. But treating

Bill Baxter (14/27) Oct 18 2009 That sounds like an argument that there should be no default, because

bearophile (4/5) Oct 18 2009 In the world for every line of D1 code you may want to port to D2, there...

Denis Koroskin (4/12) Oct 18 2009 Why would you want to port C code to D, if you can easily interface with...

bearophile (16/17) Oct 18 2009 First of all you have to consider programmer experience, they know C, so...

language_fan (7/15) Oct 19 2009 Is this a joke? Being backwards compatible with C often means that the

Piotrek (41/63) Oct 18 2009 force the user to be

Bill Baxter (18/52) Oct 18 2009 Got that, but we got two explanations for the "most logical behavior".

Piotrek (19/37) Oct 19 2009 And only python that I am aware of. Java,C#,PHP (which hold most of all

KennyTM~ (2/42) Oct 19 2009

Bill Baxter (11/31) Oct 19 2009 I took a look at C# yesterday, and I don't think it is true there.

Leandro Lucarella (9/39) Oct 19 2009 I guess everybody knows, but this is what C++ does =P
Piotrek (4/10) Oct 21 2009 True. Java does it too. My fault.

=?ISO-8859-1?Q?Pelle_M=E5nsson?= (8/92) Oct 21 2009 Put it this way:

Piotrek (10/109) Oct 21 2009 Yes!

bearophile (4/6) Oct 21 2009 Relax. Despite both you and Walter are wrong, things will not change, an...

Andrei Alexandrescu (6/20) Oct 18 2009 Well clearly sometimes both are needed. I think it would be limiting to

bearophile (14/25) Oct 19 2009 ...
Steven Schveighoffer (15/23) Oct 20 2009 Do not require opCmp for AAs. There are some good hashmap implementatio...

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Associative arrays are today quite problematic because they don't offer 
any true iteration. Furthermore, the .keys and .values properties create 
new arrays, which is wasteful.

Another issue with associative arrays is that ++a[k] is hacked, which 
reflects a grave language limitation. That needs to be replaced with a 
true facility.

Any other issues with AAs that you want to see fixed, and ideas guiding 
a redesign?


Thanks,

Andrei

Oct 17 2009

Jason House <jason.james.house gmail.com> writes:

Andrei Alexandrescu Wrote:

 Associative arrays are today quite problematic because they don't offer 
 any true iteration. Furthermore, the .keys and .values properties create 
 new arrays, which is wasteful.
 
 Another issue with associative arrays is that ++a[k] is hacked, which 
 reflects a grave language limitation. That needs to be replaced with a 
 true facility.
 
 Any other issues with AAs that you want to see fixed, and ideas guiding 
 a redesign?
 
 
 Thanks,
 
 Andrei

I've heard that D's AA's are not GC friendly.

I'd recommend looking at "high scale lib". It's a java container library
designed for high throughput and high concurrency. I was going to try my hand
at writing it, but the whole shared debacle kind of squashed that. With all the
polishing of D2, when will shared be both well documented and well implemented.
Shared updates haven't been in any of the recent changelogs.

Oct 17 2009

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

Andrei Alexandrescu wrote:
 Associative arrays are today quite problematic because they don't offer 
 any true iteration. Furthermore, the .keys and .values properties create 
 new arrays, which is wasteful.
 
 Another issue with associative arrays is that ++a[k] is hacked, which 
 reflects a grave language limitation. That needs to be replaced with a 
 true facility.
 
 Any other issues with AAs that you want to see fixed, and ideas guiding 
 a redesign?
 
 
 Thanks,
 
 Andrei

Idea: the .keys and .values properties, rather than creating arrays, could
create iterable 
ranges with the smallest possible footprint, internally walking the tree
structure.

Oct 17 2009

BCS <none anon.com> writes:

Hello Chris Nicholson-Sauls,

 Andrei Alexandrescu wrote:
 
 Associative arrays are today quite problematic because they don't
 offer any true iteration. Furthermore, the .keys and .values
 properties create new arrays, which is wasteful.
 
 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with
 a true facility.
 
 Any other issues with AAs that you want to see fixed, and ideas
 guiding a redesign?
 
 Thanks,
 
 Andrei
 

 Idea: the .keys and .values properties, rather than creating arrays,
 could create iterable ranges with the smallest possible footprint,
 internally walking the tree structure.
 

what will this do?

foreach(key; aa.keys)
   if(Test(key))
      aa.remove(key);

Oct 17 2009

Moritz Warning <moritzwarning web.de> writes:

On Sat, 17 Oct 2009 18:58:08 +0000, BCS wrote:

 Hello Chris Nicholson-Sauls,
 
 Andrei Alexandrescu wrote:
 
 Associative arrays are today quite problematic because they don't
 offer any true iteration. Furthermore, the .keys and .values
 properties create new arrays, which is wasteful.
 
 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with a
 true facility.
 
 Any other issues with AAs that you want to see fixed, and ideas
 guiding a redesign?
 
 Thanks,
 
 Andrei
 

 Idea: the .keys and .values properties, rather than creating arrays,
 could create iterable ranges with the smallest possible footprint,
 internally walking the tree structure.
 
 

 what will this do?
 
 foreach(key; aa.keys)
    if(Test(key))
       aa.remove(key);

It's undefined behavior.
You shouldn't try to mutate the aa while iterating.
I hope that will be fixed.
It took me some time to find this out.

Oct 17 2009

Moritz Warning <moritzwarning web.de> writes:

On Sat, 17 Oct 2009 19:06:36 +0000, Moritz Warning wrote:

 On Sat, 17 Oct 2009 18:58:08 +0000, BCS wrote:
 
 Hello Chris Nicholson-Sauls,
 
 Andrei Alexandrescu wrote:
 
 Associative arrays are today quite problematic because they don't
 offer any true iteration. Furthermore, the .keys and .values
 properties create new arrays, which is wasteful.
 
 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with
 a true facility.
 
 Any other issues with AAs that you want to see fixed, and ideas
 guiding a redesign?
 
 Thanks,
 
 Andrei
 

 Idea: the .keys and .values properties, rather than creating arrays,
 could create iterable ranges with the smallest possible footprint,
 internally walking the tree structure.
 
 

 what will this do?
 
 foreach(key; aa.keys)
    if(Test(key))
       aa.remove(key);

 
 It's undefined behavior.
 You shouldn't try to mutate the aa while iterating. I hope that will be
 fixed.
 It took me some time to find this out.

Uh, sorry.
You are iterating over an fresh allocated array of the keys.
That's ok.

Oct 17 2009

BCS <none anon.com> writes:

Hello Moritz,

 On Sat, 17 Oct 2009 19:06:36 +0000, Moritz Warning wrote:
 
 On Sat, 17 Oct 2009 18:58:08 +0000, BCS wrote:
 
 what will this do?
 
 foreach(key; aa.keys)
 if(Test(key))
 aa.remove(key);

 It's undefined behavior.
 You shouldn't try to mutate the aa while iterating. I hope that will
 be
 fixed.
 It took me some time to find this out.

 Uh, sorry.
 You are iterating over an fresh allocated array of the keys.
 That's ok.

As it happens, your comment would be correct if .keys were, as proposed, 
made to be a range that walks the AA.

Oct 17 2009

BCS <none anon.com> writes:

Hello Moritz,

 On Sat, 17 Oct 2009 18:58:08 +0000, BCS wrote:
 
 Hello Chris Nicholson-Sauls,
 
 Idea: the .keys and .values properties, rather than creating arrays,
 could create iterable ranges with the smallest possible footprint,
 internally walking the tree structure.
 

 what will this do?
 
 foreach(key; aa.keys)
 if(Test(key))
 aa.remove(key);

 It's undefined behavior.
 You shouldn't try to mutate the aa while iterating.
 I hope that will be fixed.
 It took me some time to find this out.

That's the reason for the above idiom. Under the current spec, the above 
is totally valid and, as far as I know, the only valid way to remove all 
elements fitting some test with only a single pass.

Whatever is chosen, the above code had better work or at a minimum, requiter 
nothing more than the addition of a .dup

OTOH building .keys must do a pass of it's own so all that's gained is sugar. 
But still, switching to a .keys that gets invalidated on updating the AA 
will break existing code so making the fix simple should be kept in mind.

Oct 17 2009

Christopher Wright <dhasenan gmail.com> writes:

Moritz Warning wrote:
 On Sat, 17 Oct 2009 18:58:08 +0000, BCS wrote:
 what will this do?

 foreach(key; aa.keys)
    if(Test(key))
       aa.remove(key);

 
 It's undefined behavior.
 You shouldn't try to mutate the aa while iterating.
 I hope that will be fixed.
 It took me some time to find this out.




var remove = new HashedSet<Foo>();
foreach (var foo in foos)
	if (Test(foo)) remove.Add(foo);
foos.RemoveAll(remove);

It's more allocation than necessary, but more than that, it's an 
unnecessarily complicated way of interacting with the collection.

Oct 18 2009

Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:

BCS wrote:
 Hello Chris Nicholson-Sauls,
 
 Andrei Alexandrescu wrote:

 Associative arrays are today quite problematic because they don't
 offer any true iteration. Furthermore, the .keys and .values
 properties create new arrays, which is wasteful.

 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with
 a true facility.

 Any other issues with AAs that you want to see fixed, and ideas
 guiding a redesign?

 Thanks,

 Andrei

 Idea: the .keys and .values properties, rather than creating arrays,
 could create iterable ranges with the smallest possible footprint,
 internally walking the tree structure.

 
 what will this do?
 
 foreach(key; aa.keys)
   if(Test(key))
      aa.remove(key);
 
 

Pre-cache a pointer to the next node in sequence; two pointers is still better
than an 
arbitrarily long on-the-fly array.  Its my understanding that AA trees do not 
automatically rebalance, so the sequence isn't going to (otherwise) change on a
removal. 
End of the sequence means a null "next" pointer.

For that matter: how common is looping over keys just to remove select ones
versus other 
purposes in looping just the keys?

-- Chris Nicholson-Sauls

Oct 18 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 17 Oct 2009 14:58:08 -0400, BCS <none anon.com> wrote:

 Hello Chris Nicholson-Sauls,

 Andrei Alexandrescu wrote:

 Associative arrays are today quite problematic because they don't
 offer any true iteration. Furthermore, the .keys and .values
 properties create new arrays, which is wasteful.
  Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with
 a true facility.
  Any other issues with AAs that you want to see fixed, and ideas
 guiding a redesign?
  Thanks,
  Andrei

 Idea: the .keys and .values properties, rather than creating arrays,
 could create iterable ranges with the smallest possible footprint,
 internally walking the tree structure.

 what will this do?

 foreach(key; aa.keys)
    if(Test(key))
       aa.remove(key);

http://www.dsource.org/projects/dcollections/docs/current/dcollections.model.Iterator.html

Search for Purgeable.

I always hated that limitation of not being able to remove elements while  

iteration compared to C++.

-Steve

Oct 20 2009

dsimcha <dsimcha yahoo.com> writes:

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 Associative arrays are today quite problematic because they don't offer
 any true iteration. Furthermore, the .keys and .values properties create
 new arrays, which is wasteful.
 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with a
 true facility.
 Any other issues with AAs that you want to see fixed, and ideas guiding
 a redesign?
 Thanks,
 Andrei

I've thought about this a lot, here's a laundry list.

1.  Huge AAs (over about 1 million elements) create massive amounts of false
pointers.  Whether this should be fixed in the AA design or whether the AA
design
should assume a good GC design and hope we get one is a matter of debate.

2.  Every insertion to an AA requires a memory allocation (read:  requires
taking
a lock in a multithreaded program).  Again, it's debatable whether an AA design
should consider this or whether the real answer is to fix the GC.

3.  The structure of AAs means that the GC must scan everything.  There is no
part
of the data structure that can be marked as not having pointers, so that the GC
can skip scanning it.  This takes forever and a day.

4.  AAs are really, really, *really* space inefficient.  For *every element* in
the array, you have a struct that looks like the following:

struct aaA
{
    aaA *left;
    aaA *right;
    hash_t hash;
    /* key   */
    /* value */
}

Assuming 32-bit ints and pointers and zero overhead from other sources, this
means
you have *at least* 12 bytes per element in wasted space.

5.  Everything in the current design is based on RTTI instead of templates.  I
haven't benchmarked this, but I'd imagine it has to slow things down a little.

6.  The array of aaA structs can only have the following sizes:

immutable size_t[] prime_list = [
              97UL,            389UL,
           1_543UL,          6_151UL,
          24_593UL,         98_317UL,
          393_241UL,      1_572_869UL,
        6_291_469UL,     25_165_843UL,
      100_663_319UL,    402_653_189UL,
    1_610_612_741UL,  4_294_967_291UL,
//  8_589_934_513UL, 17_179_869_143UL
];

I understand that there is theoretical justification for this, but in practice
I'd
rather have O(N) performance in a few more corner cases in exchange for having
my
AA not be horribly space-inefficient.


I've worked a little on a design that's based simply on two parallel arrays, one
for keys and one for values.  I used parallel arrays instead of structs to cut
down on alignment overhead and so that if one slot has ptrs and the other
doesn't,
only the one that does can be scanned by the GC.  Collisions are resolved by
probing in an order defined by a linear congruential random number generator
seeded with the hash before computing the modulus.  This means that, even if two
elements have the same hash in modulus space, as long as their hashes are
different in full 32-bit space, the probing sequences for resolving collisions
will be completely different.  I might have mentioned it here a while back, but
I'll post it here for comment.  AFAIK it solves every one of the problems listed
above.
begin 644 LinAA2.d
M+RHJ06X 87-S;V-I871I=F4 87)R87D :6UP;&5M96YT871I;VX =&AA="!U
M<V5S(')A;F1O;6EZ960 ;&EN96%R(&-O;F=R=65N=&EA;`T*("H <')O8FEN




M87)G92P =&\ 86YY('!E<G-O;B!O<B!O<F=A;FEZ871I;VX-"B`J(&]B=&%I
M;FEN9R!A(&-O<'D ;V8 =&AE('-O9G1W87)E(&%N9"!A8V-O;7!A;GEI;F< 

M92`B4V]F='=A<F4B*2!T;R!U<V4L(')E<')O9'5C92P 9&ES<&QA>2P 9&ES
M=')I8G5T92P-"B`J(&5X96-U=&4L(&%N9"!T<F%N<VUI="!T:&4 4V]F='=A

M*B!3;V9T=V%R92P 86YD('1O('!E<FUI="!T:&ER9"UP87)T:65S('1O('=H
M;VT =&AE(%-O9G1W87)E(&ES(&9U<FYI<VAE9"!T;PT*("H 9&\ <V\L(&%L

M:6=H="!N;W1I8V5S(&EN('1H92!3;V9T=V%R92!A;F0 =&AI<R!E;G1I<F4 
M<W1A=&5M96YT+"!I;F-L=61I;F<-"B`J('1H92!A8F]V92!L:6-E;G-E(&=R
M86YT+"!T:&ES(')E<W1R:6-T:6]N(&%N9"!T:&4 9F]L;&]W:6YG(&1I<V-L

M=&AE(%-O9G1W87)E+"!I;B!W:&]L92!O<B!I;B!P87)T+"!A;F0-"B`J(&%L
M;"!D97)I=F%T:79E('=O<FMS(&]F('1H92!3;V9T=V%R92P =6YL97-S('-U
M8V  8V]P:65S(&]R(&1E<FEV871I=F4-"B`J('=O<FMS(&%R92!S;VQE;'D 
M:6X =&AE(&9O<FT ;V8 ;6%C:&EN92UE>&5C=71A8FQE(&]B:F5C="!C;V1E
M(&=E;F5R871E9"!B>0T*("H 82!S;W5R8V4 ;&%N9W5A9V4 <')O8V5S<V]R
M+ T*("H-"B`J(%1(12!33T945T%212!)4R!04D]6241%1"`B05, 25,B+"!7

M35!,245$+"!)3D-,541)3D< 0E54($Y/5"!,24U)5$5$(%1/(%1(12!705)2
M04Y42453($]&($U%4D-(04Y404))3$E462P-"B`J($9)5$Y%4U, 1D]2($$ 


M4R!/4B!!3EE/3D4 1$E35%))0E5424Y'(%1(12!33T945T%212!"12!,24%"
M3$4-"B`J($9/4B!!3ED 1$%-04=%4R!/4B!/5$A%4B!,24%"24Q)5%DL(%=(
M151(15( 24X 0T].5%)!0U0L(%1/4E0 3U( 3U1(15)725-%+`T*("H 05))



M;65M;W)Y+"!S=&0N:6YT<FEN<VEC+"!C;W)E+F5X8V5P=&EO;BP <W1D+F%L
M9V]R:71H;2P-"B` ("!S=&0N8V]N=CL-" T*8VQA<W, 2V5Y17)R;W( .B!%
M>&-E<'1I;VX >PT*("` ('1H:7,H<W1R:6YG(&US9RD >PT*("` ("` ("!S

M.B!U:6YT('L-"B` ("!%35!462P-"B` ("!54T5$+`T*("` (%)%34]6140-
M"GT-" T*<W1R=6-T($ME>59A;%)A;F=E*$LL(%8L(&)O;VP <W1O<F5(87-H


M(&%L:6%S($L 5#L-
M("!087)A;&QE;$%!(2A++"!6+"!S=&]R94AA<V I(&%A.PT*<'5B;&EC. T*
M("` ('1H:7,H4&%R86QL96Q!02$H2RP 5BP <W1O<F5(87-H*2!A82D >PT*
M("` ("` ("!T:&ES+F%A(#T
M6VEN9&5X72`A/2!54T5$("8F(&EN9&5X(#P 86$N<W!A8V4I('L-"B` ("` 

M;VYT*"D >PT*("` ("` ("!S=&%T:6, :68H=F%L<RD >PT*("` ("` ("` 
M("` <F5T=7)N(&%A+G9A;'-;:6YD97A=.PT*("` ("` ("!](&5L<V4 >PT*
M("` ("` ("` ("` <F5T=7)N(&%A+E]K97ES6VEN9&5X73L-"B` ("` ("` 


M("8F(&EN9&5X(#P 86$N<W!A8V4I('L-"B` ("` ("` ("` (&EN9&5X*RL[


M+R!)="=S(&9A<W1E<B!T;R!S=&]R92!T:&4 :&%S:"!I9B!I="=S(&5X<&5N
M<VEV92!T;R!C;VUP=71E+"!B=70-"B\O(&9A<W1E<B!N;W0 =&\ :68 :70G
M<R!C:&5A<"!T;R!C;VUP=71E+ T*<')I=F%T92!T96UP;&%T92!S:&]U;&13
M=&]R94AA<V H2RD >PT*("` (&5N=6T 8F]O;"!S:&]U;&13=&]R94AA<V  
M/2`A:7-&;&]A=&EN9U!O:6YT(4L )B8 (6ES26YT96=R86PA2SL-"GT-" T*
M9FEN86P 8VQA<W, 4&%R86QL96Q!02A++"!6+"!B;V]L('-T;W)E2&%S:"`]




M7W0 7VQE;F=T:#L ("\O($QO9VEC86P <VEZ90T*("` ('-I>F5?="!S<&%C
M93L ("\O(%-T;W)A9V4 <W!A8V4-"B` ("!S:7IE7W0 ;D1E860[("`O+R!.
M=6UB97( ;V8 96QE;65N=', <F5M;W9E9"X-" T*("` ("\O($=O;V0 =F%L
M=65S(&9O<B!A(&QI;F5A<B!C;VYG<G5E;G1I86P <F%N9&]M(&YU;6)E<B!G



M;V8 /#T :&%S:%]T+G-I>F5O9BD >PT*("` ("` ("` ("` :&%S:%]T(&AA
M<V  /2!C87-T*&AA<VA?="D :V5Y.PT*("` ("` ("!](&5L<V4 <W1A=&EC
M(&EF*&ES*'1Y<&5O9BAK97DN=&](87-H*"DI*2D >PT*("` ("` ("` ("` 




M("` ("` 875T;R!H87-H1G5L;"`](&=E=$AA<V H:V5Y*3L-"B` ("` ("` 




M:7IE7W0 <&]S(#T
M:7IE7W0 <F%N9"`](&AA<VA&=6QL("L ,3L-" T*("` ("` ("` ("` =6EN
M="!F;&%G(#T =F]I9#L-"B` ("` ("` ("` ('=H:6QE*'1R=64I('L-"B` 
M("` ("` ("` ("` ("!F;&%G(#T 9FQA9W-;<&]S73L-"B` ("` ("` ("` 
M("` ("!I9B H:&%S:$9U;&P /3T :&%S:&5S6W!O<UT )B8 :V5Y(#T](%]K
M97ES6W!O<UT )B8 9FQA9R`A/2!%35!462D-"B` ("` ("` ("` ("` ("` 
M("!\?"!F;&%G(#T
M96%K.PT*("` ("` ("` ("` ("` ('T-" T*("` ("` ("` ("` ("` (')A
M;F0 /2!C87-T*'-I>F5?="D *&-A<W0H=6QO;F<I(')A;F0 *B!M=6P )2!M
M;V0I.PT*("` ("` ("` ("` ("` ('!O<R`]("AR86YD("L :&%S:$9U;&PI




M(#T :&%S:$9U;&P )2!S<&%C93L-"B` ("` ("` ("` ('-I>F5?="!R86YD
M(#T

M("8F(%]K97ES6W!O<UT /3T :V5Y*2!\?`T*("` ("` ("` ("` ("` ("` 
M(&9L86=S6W!O<UT (3T 55-%1"D >PT*("` ("` ("` ("` ("` ("` ("` 

M<F%N9"`](&-A<W0H<VEZ95]T*2`H8V%S="AU;&]N9RD <F%N9"`J(&UU;"`E

M;"D )2!S<&%C93L-"B` ("` ("` ("` ('T-" T*("` ("` ("` ("` :&%S
M:&5S6W!O<UT /2!H87-H1G5L;#L-"B` ("` ("` ("` (')E='5R;B!P;W,[



M7W0 <&]S(#T
M7W0 <F%N9"`](&AA<VA&=6QL("L ,3L-" T*("` ("` ("` ("` =6EN="!F
M;&%G(#T =F]I9#L-"B` ("` ("` ("` ('=H:6QE*'1R=64I('L-" T*("` 
M("` ("` ("` ("` (&9L86< /2!F;&%G<UMP;W-=.PT*("` ("` ("` ("` 
M("` (&EF*"A?:V5Y<UMP;W-=(#T](&ME>2`F)B!F;&%G("$]($5-4%19*2!\
M?"!F;&%G(#T
M.PT*("` ("` ("` ("` ("` ('T-" T*("` ("` ("` ("` ("` (')A;F0 
M/2!C87-T*'-I>F5?="D *&-A<W0H=6QO;F<I(')A;F0 *B!M=6P )2!M;V0I
M.PT*("` ("` ("` ("` ("` ('!O<R`]("AR86YD("L :&%S:$9U;&PI("4 


M("` ("` ('-I>F5?="!F:6YD1F]R26YS97)T*$L :V5Y+"!I;6UU=&%B;&4 

M:&%S:$9U;&P )2!S<&%C93L-"B` ("` ("` ("` ('-I>F5?="!R86YD(#T 

M(#T](%53140 )B8 7VME>7-;<&]S72`A/2!K97DI('L-"B` ("` ("` ("` 
M("` ("!R86YD(#T 8V%S="AS:7IE7W0I("AC87-T*'5L;VYG*2!R86YD("H 
M;75L("4 ;6]D*3L-"B` ("` ("` ("` ("` ("!P;W, /2`H<F%N9"`K(&AA


M<W-I9VY.;U)E:&%S:$-H96-K*$L :V5Y+"!6('9A;"P :&%S:%]T(&AA<VA&

M(&AA<VA&=6QL*3L-"B` ("` ("` =F%L<UMI72`]('9A;#L-"B` ("` ("` 
M:6UM=71A8FQE('5I;G0 9FQA9R`](&9L86=S6VE=.PT*("` ("` ("!I9BAF
M;&%G("$](%53140I('L-"B` ("` ("` ("` (&EF*&9L86< /3T 4D5-3U9%




M(&ME>2P 5B!V86PI('L-"B` ("` ("` <VEZ95]T(&D /2!F:6YD1F]R26YS

M;75T86)L92!U:6YT(&9L86< /2!F;&%G<UMI73L-"B` ("` ("` :68H9FQA

M('L-"B` ("` ("` ("` ("` ("!N1&5A9"TM.PT*("` ("` ("` ("` ?0T*
M("` ("` ("` ("` 7VQE;F=T:"LK.PT*("` ("` ("` ("` 9FQA9W-;:5T 


M("` ("` ("!I9BAC87-T*')E86PI("A?;&5N9W1H("L ;D1E860I("\ <W!A
M8V4 /"`P+C<I('L-"B` ("` ("` ("` (')E='5R;CL-"B` ("` ("` ?0T*









M("` ("` ("`H='EP96ED*%8I+F9L86=S*2`_(&-A<W0H1T,N0FQK071T<BD 

M("` ('-T871I8R!I9BAS=&]R94AA<V I('L-"B` ("` ("` ("` (&AA<VAE
M<R`](&-A<W0H:&%S:%]T*BD-"B` ("` ("` ("` ("` ("` ("` 1T,N;6%L
M;&]C*&EN:713:7IE("H :&%S:%]T+G-I>F5O9BP 1T,N0FQK071T<BY.3U]3

M95MI;FET4VEZ95TI+G!T<CL-"B` ("` ("` <W!A8V4 /2!I;FET4VEZ93L-



M("` ("` ('-C;W!E('1Y<&5O9BAT:&ES*2!N97=486)L92`](&YE=R!T>7!E

M"B` ("` ("` ("` ('-T871I8R!I9BAS=&]R94AA<V I('L-"B` ("` ("` 
M("` ("` ("!N97=486)L92YS<&%C92`](&UI;BA'0RYS:7IE3V8H;F5W5&%B

M("` ("` ("` ("` ("` 1T,N<VEZ94]F*&YE=U1A8FQE+G9A;',I("\ 5BYS

M1T,N<VEZ94]F*&YE=U1A8FQE+F9L86=S*2`O('5B>71E+G-I>F5O9BP-"B` 
M("` ("` ("` ("` ("` ("` ("` ("` ("` ("` ("` ("!'0RYS:7IE3V8H
M;F5W5&%B;&4N:&%S:&5S*2`O(&AA<VA?="YS:7IE;V8I.PT*("` ("` ("` 
M("` ?2!E;'-E('L-"B` ("` ("` ("` ("` ("!N97=486)L92YS<&%C92`]

M("` ("` ("` ("` ("` ("` ("` ("` 1T,N<VEZ94]F*&YE=U1A8FQE+G9A




M:"AI.R`P+BYS<&%C92D >PT*("` ("` ("` ("` :68H9FQA9W-;:5T /3T 
M55-%1"D >PT*("` ("` ("` ("` ("` ('-T871I8R!I9BAS=&]R94AA<V I
M('L-"B` ("` ("` ("` ("` ("` ("` ;F5W5&%B;&4N87-S:6=N3F]296AA

M("` ("` ("` ("!](&5L<V4 >PT*("` ("` ("` ("` ("` ("` ("!N97=4
M86)L92YA<W-I9VY.;U)E:&%S:$-H96-K*%]K97ES6VE=+"!V86QS6VE=*3L-

M" T*("` ("` ("!T:&ES+F9R964H*3L-" T*("` ("` ("!F;W)E86-H*'1I

M+G1U<&QE;V9;=&E=(#T 96QE;3L-"B` ("` ("` ?0T*("` ('T-" T*("` 
M("\O+PT*("` (')E9B!6(&]P26YD97 H2R!I;F1E>"D >PT*("` ("` ("!S

M/3T <VEZ95]T+FUA>"D >PT*("` ("` ("` ("` =&AR;W< ;F5W($ME>45R

M.PT*("` ("` ("!](&5L<V4 >PT*("` ("` ("` ("` <F5T=7)N('9A;'-;

M97 H2R!I;F1E>"D 8V]N<W0 >PT*("` ("` ("!S:7IE7W0 :2`](&9I;F1%


M9FEN9"!K97D (B!^('1O(7-T<FEN9RAI;F1E>"DI.PT*("` ("` ("!](&5L



M;F1E>"P =F%L*3L-"B` ("` ("` <F5H87-H269.96-E<W-A<GDH*3L-"B` 

M87-H+"!F86QS92D :V5Y<R I('L-"B` ("` ("` <F5T=7)N($ME>59A;%)A

M"B` ("`O+R\-"B` ("!+97E686Q286YG92$H2RP 5BP <W1O<F5(87-H+"!T



M(#T 9FEN9$5X:7-T:6YG*&EN9&5X*3L-"B` ("` ("` :68H:2`]/2!S:7IE



M("` ("!N1&5A9"LK.PT*("` ("` ("` ("` 9FQA9W-;:5T /2!214U/5D5$



M(&EF*&D /3T <VEZ95]T+FUA>"D >PT*("` ("` ("` ("` <F5T=7)N(&YU


M(&1E;&5T:6YG('1H92!C;VYT96YT<R!O9B!T:&4 87)R87D ;6%N=6%L;'DL
M(&EF('-U<'!O<G1E9`T*("` ("`J(&)Y('1H92!'0RXJ+PT*("` ('9O:60 
M9G)E92 I('L-"B` ("` ("` 1T,N9G)E92  8V%S="AV;VED*BD =&AI<RY?





M>PT*("` ("` ("!R971U<FX 7VQE;F=T:#L-


M<FEN9R!R86YD4W1R:6YG*')E86P 97AP96-T961,96YG=& I('L-"B` ("` 
M("` <W1R:6YG(')E=#L-"B` ("` ("` <F5A;"!C=71O9F8 /2`Q+C!,("\ 


M=&]F9BD >PT*("` ("` ("` ("` <F5T('X]('5N:69O<FTA(EM=(B G82<L



M;7E!02`](&YE=R!087)A;&QE;$%!(2AS=')I;F<L('-T<FEN9RDH*3L-" T*
M("` (&9O<F5A8V H:3L ,"XN,3`P7S`P,"D >PT*("` ("` ("!A=71O(&UY
M2V5Y(#T

M;#L-"B` ("` ("` ;7E!05MM>4ME>5T /2!M>59A;#L-
M("!E;F9O<F-E*&UY04$N;&5N9W1H(#T](&)U:6QT:6XN;&5N9W1H*3L-"B` 

M;7E!05MK97E=(#T
M;R!K97ES(#T 8G5I;'1I;BYK97ES.PT*("` (')A;F1O;5-H=69F;&4H:V5Y





M(&5N9F]R8V4H:R!I;B!M>4%!*3L-"B` ("` ("` 96YF;W)C92  *BAK(&EN
M(&UY04$I(#T]('8I.PT*("` ('T-" T*("` ('-T<FEN9UM=(&UY5F%L=65S
M.PT*("` (&9O<F5A8V H=F%L.R!M>4%!+G9A;'5E<RD >PT*("` ("` ("!M

M.PT*("` (&9O<F5A8V H:V5Y.R!M>4%!+FME>7,I('L-"B` ("` ("` ;7E+

M8G5I;'1I;BYK97ES.PT*("` (&%U=&\ 8G5I;'1I;E9A;', /2!B=6EL=&EN
M+G9A;'5E<SL-"B` ("!S;W)T*&)U:6QT:6Y686QS*3L-"B` ("!S;W)T*&)U



B=W)I=&5L;B B4&%S<V5D(&%L;"!T97-T<RXB*3L-"GT-" ``
`
end

Oct 17 2009

language_fan <foo bar.com.invalid> writes:

Sat, 17 Oct 2009 19:16:32 +0000, dsimcha thusly wrote:

 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s
 article
 Associative arrays are today quite problematic because they don't offer
 any true iteration. Furthermore, the .keys and .values properties
 create new arrays, which is wasteful.
 Another issue with associative arrays is that ++a[k] is hacked, which
 reflects a grave language limitation. That needs to be replaced with a
 true facility.
 Any other issues with AAs that you want to see fixed, and ideas guiding
 a redesign?
 Thanks,
 Andrei

 
 I've thought about this a lot, here's a laundry list.

[snip]

It would be great if the underlying implementation could be switched 
based on the task at hand. The AAs could be completely library provided, 
with only a small built-in syntactical sugaring on top of it.

Oct 17 2009

Walter Bright <newshound1 digitalmars.com> writes:

language_fan wrote:
 It would be great if the underlying implementation could be switched 
 based on the task at hand. The AAs could be completely library provided, 
 with only a small built-in syntactical sugaring on top of it.

They already are. In D1, the complete implementation is in 
phobos/internal/aaA.d, and the compiler knows nothing about it other 
than the function signatures.

Oct 18 2009

BCS <none anon.com> writes:

Hello dsimcha,

 6.  The array of aaA structs can only have the following sizes:
 
 immutable size_t[] prime_list = [
 97UL,            389UL,
 1_543UL,          6_151UL,
 24_593UL,         98_317UL,
 393_241UL,      1_572_869UL,
 6_291_469UL,     25_165_843UL,
 100_663_319UL,    402_653_189UL,
 1_610_612_741UL,  4_294_967_291UL,
 //  8_589_934_513UL, 17_179_869_143UL
 ];
 
 I understand that there is theoretical justification for this, but in
 practice I'd rather have O(N) performance in a few more corner cases
 in exchange for having my AA not be horribly space-inefficient.
 

Adding more primes to that list should resolve that issue nicely while
retaining 
the advantages of using mod-prime in the hashing.

for example:

1       97
2       197
3       397
4       797
5       1597
6       3203
7       6421
8       12853
9       25717
10      51437
11      102877
12      205759
13      411527
14      823117
15      1646237
16      3292489
17      6584983
18      13169977
19      26339969
20      52679969
21      105359939
22      ...

heres some code to generate a list at whatever rate you want:

import std.stdio;
import std.math;

ulong[] primes;


// running time O(intagral(pi(sqrt(x)), dx, 0, n))
void main()
{
	real rate = 2.0;
	uint from = 97;

	int count = 1;
	writef("%s\t%s\n", count, from);


	primes.length = 10_000; // stop somewhere north of 4_294_967_291UL WARNING: 
this might not end correctly

	long prime_n = 0;
	ulong sqrt_n = 0;
	ulong p = 1;

	outer: while(sqrt_n < primes.length - 5)
	{
		p += 2;
		ulong sqrt_p = 1+cast(ulong)sqrt(cast(real)p);
		sqrt_n = 0;

		while(sqrt_n < prime_n && primes[sqrt_n] <= sqrt_p)
			if(p % primes[sqrt_n] == 0)
				continue outer;
			else
				sqrt_n++;

		if(prime_n < primes.length) primes[prime_n] = p;
		prime_n++;
		if(p >= from * rate)
		{
			count++;
			from = p;
			writef("%s\t%s\n", count, from);
		}
	}
}

Oct 17 2009

Max Samukha <spambox d-coding.com> writes:

On Sat, 17 Oct 2009 13:28:51 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

Associative arrays are today quite problematic because they don't offer 
any true iteration. Furthermore, the .keys and .values properties create 
new arrays, which is wasteful.

Another issue with associative arrays is that ++a[k] is hacked, which 
reflects a grave language limitation. That needs to be replaced with a 
true facility.

Any other issues with AAs that you want to see fixed, and ideas guiding 
a redesign?


Thanks,

Andrei

They should be true reference types. Now we have oddities like this: 

int[int] a;
auto b = a;
a[1] =  2;
assert(b[1] == 2); // Range violation

int[int] a;
a[1] =  2;
auto b = a;
a[2] = 3;
assert(b[2] == 3); // Ok

Oct 17 2009

grauzone <none example.net> writes:

Max Samukha wrote:
 On Sat, 17 Oct 2009 13:28:51 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Associative arrays are today quite problematic because they don't offer 
 any true iteration. Furthermore, the .keys and .values properties create 
 new arrays, which is wasteful.

 Another issue with associative arrays is that ++a[k] is hacked, which 
 reflects a grave language limitation. That needs to be replaced with a 
 true facility.

 Any other issues with AAs that you want to see fixed, and ideas guiding 
 a redesign?


 Thanks,

 Andrei

 
 They should be true reference types. Now we have oddities like this: 
 
 int[int] a;
 auto b = a;
 a[1] =  2;
 assert(b[1] == 2); // Range violation
 
 int[int] a;
 a[1] =  2;
 auto b = a;
 a[2] = 3;
 assert(b[2] == 3); // Ok

That's because the actual AA is allocated on demand (when you add the 
first element). The new array type T[new] is going to have the same 
"problem".

I like it, because if you don't need it, it doesn't allocate memory. Of 
course it's possible that the complication in semantics isn't it worth. 
But D is still a systems programming language with a *very* bad GC, 
which is why I think it should continue to work this way.

Oct 17 2009

Max Samukha <spambox d-coding.com> writes:

On Sat, 17 Oct 2009 22:47:21 +0200, grauzone <none example.net> wrote:

Max Samukha wrote:
 On Sat, 17 Oct 2009 13:28:51 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Associative arrays are today quite problematic because they don't offer 
 any true iteration. Furthermore, the .keys and .values properties create 
 new arrays, which is wasteful.

 Another issue with associative arrays is that ++a[k] is hacked, which 
 reflects a grave language limitation. That needs to be replaced with a 
 true facility.

 Any other issues with AAs that you want to see fixed, and ideas guiding 
 a redesign?


 Thanks,

 Andrei

 
 They should be true reference types. Now we have oddities like this: 
 
 int[int] a;
 auto b = a;
 a[1] =  2;
 assert(b[1] == 2); // Range violation
 
 int[int] a;
 a[1] =  2;
 auto b = a;
 a[2] = 3;
 assert(b[2] == 3); // Ok

That's because the actual AA is allocated on demand (when you add the 
first element). The new array type T[new] is going to have the same 
"problem".

I agree. But the problem is the lack of a decent way to instantiate an
empty AA. Now you either have to do idiotic things like:

int[int] a;
a[1] = 1;
a.remove(1);

Or wrap the array in another type. 

And, talking about T[new], we should be able to instantiate them
eagerly as well.

I like it, because if you don't need it, it doesn't allocate memory. Of 
course it's possible that the complication in semantics isn't it worth. 
But D is still a systems programming language with a *very* bad GC, 
which is why I think it should continue to work this way.

Oct 18 2009

bearophile <bearophileHUGS lycos.com> writes:

Max Samukha:

 I agree. But the problem is the lack of a decent way to instantiate an
 empty AA. Now you either have to do idiotic things like:
 
 int[int] a;
 a[1] = 1;
 a.remove(1);
 
 Or wrap the array in another type.

This is an empty AA, you need nothing else:
int[int] a;

In my post I have also said I'd like to have a built-in expression literal for
empty AA, I use this in my dlibs:

private template AA_impl(KeyType, ValueType) {
    ValueType[KeyType] result;
    const ValueType[KeyType] res = result;
}
template AA(KeyType, ValueType) {
    const ValueType[KeyType] AA = AA_impl!(KeyType, ValueType).res;
}

That you can use for example like this, to give an empty AA to a foo() function:
foo(AA!(int, float));

Something built-in with a better syntax may be found, example:

That can be used like:
foo([int:float]);

Bye,
bearophile

Oct 18 2009

bearophile <bearophileHUGS lycos.com> writes:

bearophile:
 That you can use for example like this, to give an empty AA to a foo()
function:
 foo(AA!(int, float));

I have just found that you can also use "null" there. It's less explicit
because if you take a look at the call site only, you don't know that the
receiver will need an AA.

That's why I'd like to disallow "null" to be used where an empty array can be
given (and force the programmer to use [] there). This is also how Delight
language works.

Bye,
bearophile

Oct 18 2009

Max Samukha <spambox d-coding.com> writes:

On Sun, 18 Oct 2009 04:29:57 -0400, bearophile
<bearophileHUGS lycos.com> wrote:

Max Samukha:

 I agree. But the problem is the lack of a decent way to instantiate an
 empty AA. Now you either have to do idiotic things like:
 
 int[int] a;
 a[1] = 1;
 a.remove(1);
 
 Or wrap the array in another type.

This is an empty AA, you need nothing else:
int[int] a;

I showed in my original post it is not enough to use AA as a reference
type.

Oct 18 2009

Robert Clipsham <robert octarineparrot.com> writes:

Andrei Alexandrescu wrote:
 Any other issues with AAs that you want to see fixed, and ideas guiding 
 a redesign?

Their speed. I did a benchmark a while ago comparing tango's HashMap 
implementation and built in AAs, tango's hashmaps were twice as fast... 
It'd be nice if the built in language feature was faster than an 
implementation done within the language! :)

Oct 17 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Robert Clipsham wrote:
 Andrei Alexandrescu wrote:
 Any other issues with AAs that you want to see fixed, and ideas 
 guiding a redesign?

 
 Their speed. I did a benchmark a while ago comparing tango's HashMap 
 implementation and built in AAs, tango's hashmaps were twice as fast... 
 It'd be nice if the built in language feature was faster than an 
 implementation done within the language! :)

I hear you, but fortunately that doesn't concern the interface.

Andrei

Oct 17 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

I like how you leave no stone unturned. Regardless of what the "final" quality
of the D2 language will be, I will always think you have done your best to
improve it. I will say so to the people I know too.

I have already discussed about D AAs in the past, but I guess it was not the
right moment to do.

Do we want to keep them in the language? It's easy to do an:
import std.collections: HashMap;
And a flexible language is supposed to allow for a handy syntax even for
user-defined collections.
But the built-ins have a nicer syntax and a little different purpose. So I
think having the built-in syntax is good.

That syntax may map to built-in logic, or to library code. Both options are
acceptable, but I think having built-in logic is a little better.
But I need templated HashMaps and TreeMaps too in a std.collections.

Using templates leads to faster code, so the hash in collections can be
templated. But templates slow down compilation and inflate binary size. Built-i
AAs have a handy syntax, so they get used more often, even where you put only 6
items into them. So I'd like built-in AAs to be optimized for a very small
number of items too. There are several ways to do this. Built-in AAs must be
the most general and very flexible, easy to use, and not bug-prone. I don't
need built-in AAs to be the faster ones, or the ones the use the less memory.
This is an optimization, so for such more demanding needs I can use the HashMap
from the collections module.

So a good thing of built-in AAs is that they don't inflate code a lot, and the
compilation is fast, so they can be used everywhere in the program performance
in both speed and memory isn't critical (and this happens).

I like how the current AA design never degenerate to an O(n^2) behaviour
(Python dicts are much faster than the current D AAs but Python dicts in corner
cases go quadratic). But this quality has the disadvantage of requiring a opCmp
too to items that I want to put in an AA (because final chaining is resolved
with an ordered tree), while in Python and other languages I need just an opEq
and an opHash.

Regarding the iteration, I'd like the design used in Python3, where keys and
values (and items, that are pairs, if you want) return a small iterable object
that's a view to the dict. I may also want methods that return a true array of
values/items/keys as now, because once in a while you need this too (Python3
doesn't have this, and you need to do list(d.keys())).
The keys() returns an light immutable set object, this is very good and very
handy. Doing set operations is a godsend in certain kinds of code.

To delete an key-value another possible syntax is:
delete d[key]
But d.delete(key) too is acceptable.

LDC compiler is able to optimize away the double lookups if they are done
nearby, so it's even possible for the "in" to return a boolean. (But I like to
have both !in and !is too).

I don't use .rehash often, and it takes some time to run. So I don't know how
much useful it is.

I need a .clear method too. And maybe a .copy too. AAs *must* support opEqual
too, as in LDC, because I have to know when a function returns the correct AA
inside unittests.
(Python dicts even support opCmp, but this is a little tricky, and less useful,
so this can be omitted).
This features don't require 100 lines of code, but they make AAs quite more
useful and handy.

A method like Python .get(key[, default]) is quite useful, it returns default
if the key is missing. In D probably the default can't be optional.

fromkeys, update (and maybe pop and popitem too) methods can be useful, see
Python docs: 
http://docs.python.org/library/stdtypes.html#dict
.update can also be written as ~, but there it works only with another AA, and
it's not commutative.

I'd like to have a literal for an empty AA, I use AA!(K,V) in my dlibs.

An old idea: If the value of an AA is of type void, the AA may not allocate
memory for them, so the AA becomes a set (and it needs few basic set
operations, with operator overload).

I may like an AA to be false when empty.

I'd like to have a standard way to, given a pointer to a AA value, return its
key. I have a function that does this in my dlibs (module "extra") but it's not
officially, so it will break when the AA implementation will change. This is
useful if I want to implement a richer data structure on top of built-in AAs,
for example an "ordered AA" that when iterated on, yields items in the same
order as the insertion order, so values are kept linked in a double linked
list, to do this and keep the data structure flexible I need to know a key
given a pointer to a value.

I may like a "freeze" method that turns an AA into an immutable one. Ruby has
this.

A .reserve method may be useful to speed up AA creation when you know you have
to add tons of pairs.

I'd really like the "default" iteration on an AA to yield its keys, instead of
values as currently done. Because if I have a key I can find its value, while
the opposite is not possible, so having keys is much more useful. This is true
in Python too. In my dlibs all iterables and functions behave like this. The
current D design is just a design mistake ad Walter was wrong on this.

Reducing the number of small memory allocations done by the AAs currently used
may be positive. Making them a little more precise for the GC will be very good
as D matures.

The management of AAs at compile-time can be better. Having literals that work
at compile time, or compile-time functions used to create runtime AAs is
positive. It can even be possible to use perfect hashes for immutable AAs built
at compile time.

Bye,
bearophile

Oct 17 2009

Walter Bright <newshound1 digitalmars.com> writes:

bearophile wrote:
 I'd really like the "default" iteration on an AA to yield its keys,
 instead of values as currently done. Because if I have a key I can
 find its value, while the opposite is not possible, so having keys is
 much more useful. This is true in Python too. In my dlibs all
 iterables and functions behave like this. The current D design is
 just a design mistake ad Walter was wrong on this.

You can iterate over both keys and values with:

     foreach (key, value; aa)

Oct 18 2009

bearophile <bearophileHUGS lycos.com> writes:

Walter Bright:

 You can iterate over both keys and values with:
      foreach (key, value; aa)

I know this, of course, but you are missing the point by a mile. An explicit
foreach is not the only way you may want to iterate an AA on.

If I have a higher order function like map, I must be able to use it on any
kind of iterable. Like an array, associative array, a set, a list, and so on. I
must be able to change the data structure I give to the map, if the need arise
while I change the code, and it has to keep working. If the mapping function
takes a single argument the map has to choose to what iterate, among keys or
values (or even both, as pairs). In such case iterating on keys is more useful.
That's how all my dlibs higher order functions work when you give them an AA,
they iterate on AA keys, array items, set items, list items, etc.

If you think still I am not right, you may ask for a poll here, to see how many
like the default AA iteration to be on keys or values.

Bye,
bearophile

Oct 18 2009

Piotrek <starpit tlen.pl> writes:

bearophile Wrote:

 I'd really like the "default" iteration on an AA to yield its keys, instead of
values as currently done. Because if I have a key I can find its value, while
the opposite is not possible, so having keys is much more useful. This is true
in Python too. In my dlibs all iterables and functions behave like this. The
current D design is just a design mistake ad Walter was wrong on this.
 

No! No! No! Maybe you are wrong. Or it's a metter of taste. I remember that I
spent many hours on finding bug in python's script written by me in my job. The
reason was the python's behaviour decribed by you. Then I couldn't understand
why the hell iterating on collection returns a key in the first place. It's so
not intuitive. Your explanation is not even close in convincing me. If I wanted
keys I would write:
 foreach (key, value; set) or (key; set.keys)

Now I know why I don't like Python and I hope I will never have to need it
again. For scriptng (but not at work since I don't do scripting enymore) I
prefer D (rdmd). 

bearophile, I like your great commitment in D development but I don't like
pushing D toward pythonish world. (Of course some ideas from Python project
could be succesfully used in D) 

Cheers 
Piotrek

Oct 18 2009

bearophile <bearophileHUGS lycos.com> writes:

Piotrek:

No! No! No! Maybe you are wrong.<

Life is complex so I am usually wrong, because it's hard to consider all sides
of a thing, but sometimes other people are even more wrong :-)


I remember that I spent many hours on finding bug in python's script written by
me in my job. The reason was the python's behaviour decribed by you.<

I have have followed many Python programmers for a lot of time and I think such
problem of yours is not common. But I'll keep an eye open for possible other
people with your problem.


Then I couldn't understand why the hell iterating on collection returns a key
in the first place. It's so not intuitive.<

What's intuitive on iterating on values? Well, I think Walter agrees with you,
I remember his explanation (iterating on a normal array doesn't yield its
indexes), but beside what's intuitive you have also to keep in mind what's
handy, and iterating on keys is more useful.

Beside the things I have said, if you think of associative arrays as sets where
there is a value associated to each set item (and this is how Python dicts are
implemented and how the light iterable objects they give you when you ask for
keys in Python3), when you iterate on the set you get the set items, so if you
extend the set you keep iterating on the set items...


 Your explanation is not even close in convincing me. If I wanted keys I would
write:
  foreach (key, value; set) or (key; set.keys)

In D set.keys needs a good amount of memory and time, it's useful only in
special situations, for example when you are sure your AA is small. Andrei will
probably push to add/replace something to find keys and a lazy way.


Now I know why I don't like Python and I hope I will never have to need it
again. For scriptng (but not at work since I don't do scripting enymore) I
prefer D (rdmd).<

Refusing forever to use a popular (and usually quite appreciated) language just
because you don't like a single small feature is stupid. I don't want to force
you to like Python, but the choice of a language must be based on a bit more
global evaluation of it. Every language under the sun has plenty warts. C++ has
enough warts (far larger than the one you have listed) that you can write a big
book on them, but people keep using it still.

"Not doing scripting any more" too is probably a not smart thing to say,
because in most programming jobs I've seen, there are small files to munge,
commands to automate, things to show or plot, and so on, etc. A scripting
language (even just shell scripting) is designed for such things.


bearophile, I like your great commitment in D development but I don't like
pushing D toward pythonish world. (Of course some ideas from Python project
could be succesfully used in D)<

Thank you :-) I've known several languages (I think this is normal in this
newsgroup), and I try to suggest what my experience shows me :-) I'm often
wrong anyway.

Bye,
bearophile

Oct 18 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Sun, 18 Oct 2009 06:18:34 -0400, bearophile wrote:

 Then I couldn't understand why the hell iterating on collection
 returns a key in the first place. It's so not intuitive.<

 
 What's intuitive on iterating on values? Well, I think Walter agrees
 with you, I remember his explanation (iterating on a normal array
 doesn't yield its indexes), but beside what's intuitive you have also
 to keep in mind what's handy, and iterating on keys is more useful.

It's easy to see what's intuitive if you consider what a collection
contains.  To me, it contains *values*, always.  These values may be
indexed: by an arbitrary key (AA), by an integral index (array), or not
at all (single-linked list).  But the index is not the point, it's only
a way to access values.  And when I iterate over a collection, I
definitely wan to iterate over the values it contains, regardless of an
indexing scheme this particular collection uses.

Oct 18 2009

Piotrek <starpit tlen.pl> writes:

bearophile pisze:
 Piotrek:
 
 No! No! No! Maybe you are wrong.<

 
 Life is complex so I am usually wrong, because it's hard to consider all sides
of a thing, but sometimes other people are even more wrong :-)
 

I didn't mean to be offensive. All people (including me) have tendency 
to claim that they point is the only right.


 In D set.keys needs a good amount of memory and time, it's useful only in
special situations, for example when you are sure your AA is small. Andrei will
probably push to add/replace something to find keys and a lazy way.
 

Yes that could be good. Any views, ranges are option of course. However 
I was referring to semantics not mechanics.

 Refusing forever to use a popular (and usually quite appreciated) language
just because you don't like a single small feature is stupid. I don't want to
force you to like Python, but the choice of a language must be based on a bit
more global evaluation of it. Every language under the sun has plenty warts.
C++ has enough warts (far larger than the one you have listed) that you can
write a big book on them, but people keep using it still.

The more time I live the more stupid things I find in my behaviour.
But to be honest I have longer list of reasons why I don't like Python
- dynamic typing
- whitespace indentation
- performance
- not friendly for C-syntax-boys like me
- lack of many features that are in D
and some more

I find D to be sufficient alternative for all scripting language (good 
std library is a cure). I said it before but I can say it again. Using 
many languages when one should do is waste for my time and efficiency ;)

C++ (or C in embedded systems)is used because of extremely big momentum 
created for years expressed in invested money in learning, software 
development, etc. But hopefully everything could be stopped. It's a 
matter of time and force.

 "Not doing scripting any more" too is probably a not smart thing to say,
because in most programming jobs I've seen, there are small files to munge,
commands to automate, things to show or plot, and so on, etc. A scripting
language (even just shell scripting) is designed for such things.

It's not my choice and I'm glad here. My duties don't include scripting 
any more. And to me, scripting languages are some kind of joke (having D 
around).

Despite it could seem (at first look) that I'm against you I'm not 
(except that area connected with those creatures). I really like the 
benchmarking done by you and I really appreciate how much time (much 
much more than me of course) you gave this community.

First I must pay same debts then maybe I will start the project I'm 
thinking of for long time.

Cheers
Piotrek

Oct 18 2009

grauzone <none example.net> writes:

Piotrek wrote:
 bearophile Wrote:
 
 I'd really like the "default" iteration on an AA to yield its keys, instead of
values as currently done. Because if I have a key I can find its value, while
the opposite is not possible, so having keys is much more useful. This is true
in Python too. In my dlibs all iterables and functions behave like this. The
current D design is just a design mistake ad Walter was wrong on this.

 
 No! No! No! Maybe you are wrong. Or it's a metter of taste. I remember that I
spent many hours on finding bug in python's script written by me in my job. The
reason was the python's behaviour decribed by you. Then I couldn't understand
why the hell iterating on collection returns a key in the first place. It's so
not intuitive. Your explanation is not even close in convincing me. If I wanted
keys I would write:
  foreach (key, value; set) or (key; set.keys)
 

In a perfect world, iterating over an AA would yield a tuple (key, 
value). You could iterate over either the keys or values by iterating 
over a "view" on the key or value list. I'm surprised Python doesn't do 
that.

(I'm expecting that Andrei will replace the .key and .value properties 
by "lazy" ranges, that don't allocate memory; so that aspect will be 
alright. But too bad the gods don't see the need for better tuple support.)

Oct 18 2009

bearophile <bearophileHUGS lycos.com> writes:

grauzone:

 In a perfect world, iterating over an AA would yield a tuple (key, 
 value). You could iterate over either the keys or values by iterating 
 over a "view" on the key or value list. I'm surprised Python doesn't do 
 that.

I don't know why Python has originally chosen to iterate on just keys, maybe
it's a performance optimization (iterating on pairs is slower, because a single
reference needs no allocation, while a tuple of 2 may need it, in practice
CPython uses the same memory for the tuple, avoiding allocating it at every
loop cycle), or maybe for one of the reasons I've explained.

I don't know if D can make looping on key-value of built-in AAs as fast as
iterating on just keys or just values. Iterating on AA keys or values is a very
common operation that must be fast.
 

 (I'm expecting that Andrei will replace the .key and .value properties 
 by "lazy" ranges, that don't allocate memory; so that aspect will be 
 alright.

At least, the range of the keys supports a O(1) opIn_r, I hope :-)

Bye,
bearophile

Oct 18 2009

Christopher Wright <dhasenan gmail.com> writes:

Piotrek wrote:
 bearophile Wrote:
 
 I'd really like the "default" iteration on an AA to yield its keys, instead of
values as currently done. Because if I have a key I can find its value, while
the opposite is not possible, so having keys is much more useful. This is true
in Python too. In my dlibs all iterables and functions behave like this. The
current D design is just a design mistake ad Walter was wrong on this.

 
 No! No! No! Maybe you are wrong. Or it's a metter of taste. I remember that I
spent many hours on finding bug in python's script written by me in my job. The
reason was the python's behaviour decribed by you. Then I couldn't understand
why the hell iterating on collection returns a key in the first place. It's so
not intuitive. Your explanation is not even close in convincing me. If I wanted
keys I would write:
  foreach (key, value; set) or (key; set.keys)

Why not mandate using both keys and values? That should eliminate ambiguity.

Essentially, an associative array would be a Tuple!(Tkey, Tvalue)[] with 
some extra accessors.

Oct 18 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Andrei Alexandrescu:
 
 I like how you leave no stone unturned. Regardless of what the
 "final" quality of the D2 language will be, I will always think you
 have done your best to improve it. I will say so to the people I know
 too.

Well it should be said Walter is doing his best too, I'm just bitching 
more :o).

 I have already discussed about D AAs in the past, but I guess it was
 not the right moment to do.

My thoughts are aligned to yours. I'm not sure whether much or any of 
this is attainable within D2.


Andrei

Oct 18 2009

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

I'm not sure whether much or any of this is attainable within D2.<

The things I've written about aren't all equally important and they don't
require the same amount of work to be implemented. I am sorry for writing that
gazpacho.
There are two things that I think are more important that can implemented now:
the opEquals among AAs, and the default iteration with foreach on the keys. The
other things can wait.
The opEquals among AAs requires probably less than 20 lines of code.
Two persons (plus Walter, of course) have said they don't like to iterate on
the keys first. The other people have kept muzzle shut so I can't tell yet.

Bye,
bearophile

Oct 18 2009

Bill Baxter <wbaxter gmail.com> writes:

On Sun, Oct 18, 2009 at 10:56 AM, bearophile <bearophileHUGS lycos.com> wrote:

 The opEquals among AAs requires probably less than 20 lines of code.
 Two persons (plus Walter, of course) have said they don't like to iterate on
the keys first. The other people have kept muzzle shut so I can't tell yet.

I typed a long post that weighed lots of pros and cons of different
options, but then I hit upon a simple rule that I think makes a lot of
sense:

I think the default should be to iterate over whatever 'in' looks at.

And conversely, I think 'in' should compare against whatever default
iteration iterates over.

Proposed:
arrays -- default iteration over values,   "x in A" answers if x is
one of the values.
assoc arrays -- default iteration over keys,  "x in AA" answers if x
is one of the keys
sets -- iteration over keys (or call 'em values, could be either), "x
in S" answers if x is one of them

I think this is what Python uses, actually.  But I just think it makes
a lot of sense to say that 'x in Y' should be some kind of shorthand
for
foreach(thing; Y) {
     if (x == thing) {
           return something useful
     }
}

I guess that's even clearer in Python where you iterate by writing
"for thing in Y:"

This looks to me like a general rule that trumps a rule which merely
stems from the happenstantial syntactic similarity between arrays and
associative arrays.

--bb

Oct 18 2009

Piotrek <starpit tlen.pl> writes:

Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.
 

I was almost convinced, because that rule has a sense. But treating 
normal arrays and associative array has more sense to me.


fun (SomeObject object) {
foreach (element;object.arr1){ //normal, but how do I know at first look
//just do something with element  		
}

foreach (element;object.arr2){ // assoc, but how do I know at first look
//just do something with element hopefully not index
}

Cheers
Piotrek

Oct 18 2009

Bill Baxter <wbaxter gmail.com> writes:

On Sun, Oct 18, 2009 at 1:12 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.

 I was almost convinced, because that rule has a sense. But treating normal
 arrays and associative array has more sense to me.


 fun (SomeObject object) {
 foreach (element;object.arr1){ //normal, but how do I know at first look
 //just do something with element
 }

 foreach (element;object.arr2){ // assoc, but how do I know at first look
 //just do something with element hopefully not index
 }

That sounds like an argument that there should be no default, because
either way it's not clear whether you're iterating over keys or
values.  That's reasonable too, I think.  Just get rid of the the
one-argument foreach over AAs altogether and force the user to be
explicit about it.  Probably much less error-prone than quietly
changing the D1 default, for sure. :-)   As much as people go on about
making it easy to port C code, really ya gots to think about all the
D1 code too.  It shouldn't be harder to port D1 code to D2 than C
code!

So for a new language I would go for what I said before.  But for D, I
think the better move is to get rid of the one-arg foreach and require
.keys / .values explicitly.  (And make that efficient, of course).

--bb

Oct 18 2009

bearophile <bearophileHUGS lycos.com> writes:

Bill Baxter:
It shouldn't be harder to port D1 code to D2 than C code!<

In the world for every line of D1 code you may want to port to D2, there are
probably 100 or 1000+ lines of C code that you may want to port to D2, so the
situation is not the same. Keeping compatibility with C is far more important
than keeping compatibility with D1.

Bye,
bearophile

Oct 18 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Mon, 19 Oct 2009 01:37:36 +0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Bill Baxter:
 It shouldn't be harder to port D1 code to D2 than C code!<

 In the world for every line of D1 code you may want to port to D2, there  
 are probably 100 or 1000+ lines of C code that you may want to port to  
 D2, so the situation is not the same. Keeping compatibility with C is  
 far more important than keeping compatibility with D1.

 Bye,
 bearophile

Why would you want to port C code to D, if you can easily interface with  
it?

Oct 18 2009

bearophile <bearophileHUGS lycos.com> writes:

Denis Koroskin:

Why would you want to port C code to D, if you can easily interface with it?<

First of all you have to consider programmer experience, they know C, so
keeping the language backwards compatible with C helps them avoid bugs and
learn D faster.
If you ignore what programmers know, and you assume an easy interface between a
new language and C, then you don't need to design a language like C++/D, that
keeps lot of little compatibility with C, you are more free, and you can avoid
warts like the design of C switch().

I have ported some thousands of lines of C code to D. Surely there are
situations where keeping a large amount of C code is the best thing do to, and
saves you lot of time and work.

But other times you may want to port the C code. Some of the reasons you may
have to port C code to D:
- because I like the look of D code more than D code;
- because the original C code may be so old and ugly that keeping it in my
project hurts my aesthetic sense;
- gives me more safety than certain C code;
- allows me to use a GC that may be safer than the original manual memory
management;
- because I may use my dlibs and shorten the original C code. Less code is
usually a good thing;
- because I will probably need to change and improve the code and I prefer to
do it on D code that's nicer and allows me to program in a faster way;
- because I use bud to compile small D projects and in them adding a C
dependency requires more time than translating 20 lines of C code and adding it
into an already existing D module;
- because I am creating some pure D library, to keep things simpler and tidy.
- I'd even like to see the official D2 compiler to be written in D1/D2 (and a
little of assembly).

Bye,
bearophile

Oct 18 2009

language_fan <foo bar.com.invalid> writes:

Sun, 18 Oct 2009 18:01:51 -0400, bearophile thusly wrote:

 Denis Koroskin:
 
Why would you want to port C code to D, if you can easily interface with
it?<

 
 First of all you have to consider programmer experience, they know C, so
 keeping the language backwards compatible with C helps them avoid bugs
 and learn D faster.

Is this a joke? Being backwards compatible with C often means that the 
compiler does not test for as many bugs on compile time. You probably 
have some kind of vague idea of the improvements over traditional C. Are 
you talking about competent programmers here? One part of mastering the 
skill programming means that you can easily switch languages if need be. 
You are a sucky novice if the only languages you know are C & C++.

Oct 19 2009

Piotrek <starpit tlen.pl> writes:

Bill Baxter pisze:
 On Sun, Oct 18, 2009 at 1:12 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.

 I was almost convinced, because that rule has a sense. But treating normal
 arrays and associative array has more sense to me.


 fun (SomeObject object) {
 foreach (element;object.arr1){ //normal, but how do I know at first look
 //just do something with element
 }

 foreach (element;object.arr2){ // assoc, but how do I know at first look
 //just do something with element hopefully not index
 }

 
 That sounds like an argument that there should be no default, because
 either way it's not clear whether you're iterating over keys or
 values. 


Really?! That wasn't my intention :) In both cases I wish it were values ;)

 Just get rid of the the one-argument foreach over AAs altogether and 

force the user to be
 explicit about it.

I wouldn't do so. Would anybody do an error by thinking that foreach 
(elem,table) should iterate over keys?

Maybe I'm not thinking correctly but for me an assoc array is just an 
array with additional key (index) features thanks to which I save space 
and/or have more indexing method than only integers.


e.g.

Normal array

No.       Item
0	George
1	Fred
2	Dany
3	Lil

Index/key is infered from position (offset)


Now Assoc array:

No.	Item
10	Lindsey
21	Romeo
1001	C-Jay

Or
No.	Item
first	Europe
second	South America
third	Australia

Or
Names occurrence frequency:

No. Item
Andy	21
John	23
Kate	12

And the only difference is the need for using a hash function for value 
lookup (calculate position) which should not bother a user when he 
doesn't care.

Then when you ask somebody to iterate over the tables, what he will do 
almost for certain? If it would be me, you know... values all the time. 
Even for last example most important values are those numbers (despite 
in this case they're meaningless without keys).

Cheers
Piotrek

Oct 18 2009

Bill Baxter <wbaxter gmail.com> writes:

On Sun, Oct 18, 2009 at 3:28 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 On Sun, Oct 18, 2009 at 1:12 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.

 I was almost convinced, because that rule has a sense. But treating
 normal
 arrays and associative array has more sense to me.


 fun (SomeObject object) {
 foreach (element;object.arr1){ //normal, but how do I know at first look
 //just do something with element
 }

 foreach (element;object.arr2){ // assoc, but how do I know at first look
 //just do something with element hopefully not index
 }

 That sounds like an argument that there should be no default, because
 either way it's not clear whether you're iterating over keys or
 values.


 Really?! That wasn't my intention :) In both cases I wish it were values ;)

Got that, but we got two explanations for the "most logical behavior".
 I like my explanation, you like yours.  Given that in a sample size
of like 4 here we can't agree, I don't see much hope of there being
overwhelming agreement on the right behavior in the population at
large.  Seems like there's interest in making it harder to make
mistakes with D (see the T[new] discussion), and there's a genuine
ambiguity here so the user should be made to specify what they want.
Otherwise it's easy to make mistakes.  I know I've gotten wrong
before.  To the point where I pretty much always just use the
foreack(k,v; AA) form now just to be sure.  Even if I don't need the
values.  Or the keys or whichever it is ;-)

 Just get rid of the the one-argument foreach over AAs altogether and force
 the user to be
 explicit about it.

 I wouldn't do so. Would anybody do an error by thinking that foreach
 (elem,table) should iterate over keys?

Bearophile.  And anyone coming from python, at the least.  And anyone
who agrees with the logic of connecting 'in' with what gets iterated.

 Maybe I'm not thinking correctly but for me an assoc array is just an array
 with additional key (index) features thanks to which I save space and/or
 have more indexing method than only integers.

It can also be thought of as a set with some ancillary data associated
with each element.  In that case the keys are the set elements, and
the values are just some extra stuff hanging off the elements.

--bb

Oct 18 2009

Piotrek <starpit tlen.pl> writes:

Bill Baxter pisze:
 Just get rid of the the one-argument foreach over AAs altogether and force
 the user to be
 explicit about it.

 I wouldn't do so. Would anybody do an error by thinking that foreach
 (elem,table) should iterate over keys?

 
 Bearophile.  And anyone coming from python, at the least.  And anyone
 who agrees with the logic of connecting 'in' with what gets iterated.
 


programmers) are defaulting to values unless explicitly stated.

 Maybe I'm not thinking correctly but for me an assoc array is just an array
 with additional key (index) features thanks to which I save space and/or
 have more indexing method than only integers.

 
 It can also be thought of as a set with some ancillary data associated
 with each element.  In that case the keys are the set elements, and
 the values are just some extra stuff hanging off the elements.
 
 --bb

Sorry in advance, I couldn't resist.

 From Wikipedia:
"From the perspective of a computer programmer, an associative array can 
be viewed as a generalization of an array. While a regular array maps an 
index to an arbitrary data type such as integers, other primitive types, 
or even objects, an associative array's keys can be arbitrarily typed. 
The values of an associative array do not need to be the same type, 
although this is dependent on the programming language"

So it almost the same what I have said. I hadn't seen wiki entry before 
nor didn't changed the article's test ;) So now you can see that for 
most people (except Python guys and maybe a some more) it should behave 
like normal array. It's just intuitive.

What you are talking about it's a side effect of AAs (or rather 
derivative feature) and then we use the keys property or the key,value pair.

Cheers
Piotrek

Oct 19 2009

KennyTM~ <kennytm gmail.com> writes:

On Oct 20, 09 03:40, Piotrek wrote:
 Bill Baxter pisze:
 Just get rid of the the one-argument foreach over AAs altogether and
 force
 the user to be
 explicit about it.

 I wouldn't do so. Would anybody do an error by thinking that foreach
 (elem,table) should iterate over keys?

 Bearophile. And anyone coming from python, at the least. And anyone
 who agrees with the logic of connecting 'in' with what gets iterated.


 programmers) are defaulting to values unless explicitly stated.

and Javascript. and Objective-C.

 Maybe I'm not thinking correctly but for me an assoc array is just an
 array
 with additional key (index) features thanks to which I save space and/or
 have more indexing method than only integers.

 It can also be thought of as a set with some ancillary data associated
 with each element. In that case the keys are the set elements, and
 the values are just some extra stuff hanging off the elements.

 --bb

 Sorry in advance, I couldn't resist.

  From Wikipedia:
 "From the perspective of a computer programmer, an associative array can
 be viewed as a generalization of an array. While a regular array maps an
 index to an arbitrary data type such as integers, other primitive types,
 or even objects, an associative array's keys can be arbitrarily typed.
 The values of an associative array do not need to be the same type,
 although this is dependent on the programming language"

 So it almost the same what I have said. I hadn't seen wiki entry before
 nor didn't changed the article's test ;) So now you can see that for
 most people (except Python guys and maybe a some more) it should behave
 like normal array. It's just intuitive.

 What you are talking about it's a side effect of AAs (or rather
 derivative feature) and then we use the keys property or the key,value
 pair.

 Cheers
 Piotrek

Oct 19 2009

Bill Baxter <wbaxter gmail.com> writes:

On Mon, Oct 19, 2009 at 1:58 PM, KennyTM~ <kennytm gmail.com> wrote:
 On Oct 20, 09 03:40, Piotrek wrote:
 Bill Baxter pisze:
 Just get rid of the the one-argument foreach over AAs altogether and
 force
 the user to be
 explicit about it.

 I wouldn't do so. Would anybody do an error by thinking that foreach
 (elem,table) should iterate over keys?

 Bearophile. And anyone coming from python, at the least. And anyone
 who agrees with the logic of connecting 'in' with what gets iterated.


 programmers) are defaulting to values unless explicitly stated.

 and Javascript. and Objective-C.



for it default iteration yields Key,Value pairs called
DictionaryEntry.  MSDN Says: "Since each element of the IDictionary
object is a key/value pair, the element type is not the type of the
key or the type of the value. Instead, the element type is
DictionaryEntry."





--bb

Oct 19 2009

Leandro Lucarella <llucax gmail.com> writes:

Bill Baxter, el 19 de octubre a las 14:18 me escribiste:
 On Mon, Oct 19, 2009 at 1:58 PM, KennyTM~ <kennytm gmail.com> wrote:
 On Oct 20, 09 03:40, Piotrek wrote:
 Bill Baxter pisze:
 Just get rid of the the one-argument foreach over AAs altogether and
 force
 the user to be
 explicit about it.

 I wouldn't do so. Would anybody do an error by thinking that foreach
 (elem,table) should iterate over keys?

 Bearophile. And anyone coming from python, at the least. And anyone
 who agrees with the logic of connecting 'in' with what gets iterated.


 programmers) are defaulting to values unless explicitly stated.

 and Javascript. and Objective-C.

 


 for it default iteration yields Key,Value pairs called
 DictionaryEntry.  MSDN Says: "Since each element of the IDictionary
 object is a key/value pair, the element type is not the type of the
 key or the type of the value. Instead, the element type is
 DictionaryEntry."

I guess everybody knows, but this is what C++ does =P

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Did you know the originally a Danish guy invented the burglar-alarm
unfortunately, it got stolen

Oct 19 2009

Piotrek <starpit tlen.pl> writes:

Bill Baxter pisze:
 

 

 
 --bb

True. Java does it too. My fault.

Cheers
Piotrek

Oct 21 2009

=?ISO-8859-1?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:

Piotrek wrote:
 Bill Baxter pisze:
 On Sun, Oct 18, 2009 at 1:12 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.

 I was almost convinced, because that rule has a sense. But treating 
 normal
 arrays and associative array has more sense to me.

 fun (SomeObject object) {
 foreach (element;object.arr1){ //normal, but how do I know at first look
 //just do something with element
 }

 foreach (element;object.arr2){ // assoc, but how do I know at first look
 //just do something with element hopefully not index
 }

 That sounds like an argument that there should be no default, because
 either way it's not clear whether you're iterating over keys or
 values. 

 Really?! That wasn't my intention :) In both cases I wish it were values ;)

  > Just get rid of the the one-argument foreach over AAs altogether and 
 force the user to be
  > explicit about it.

 I wouldn't do so. Would anybody do an error by thinking that foreach 
 (elem,table) should iterate over keys?

 Maybe I'm not thinking correctly but for me an assoc array is just an 
 array with additional key (index) features thanks to which I save space 
 and/or have more indexing method than only integers.

 e.g.

 Normal array

 No.       Item
 0    George
 1    Fred
 2    Dany
 3    Lil

 Index/key is infered from position (offset)

 Now Assoc array:

 No.    Item
 10    Lindsey
 21    Romeo
 1001    C-Jay

 Or
 No.    Item
 first    Europe
 second    South America
 third    Australia

 Or
 Names occurrence frequency:

 No. Item
 Andy    21
 John    23
 Kate    12

 And the only difference is the need for using a hash function for value 
 lookup (calculate position) which should not bother a user when he 
 doesn't care.

 Then when you ask somebody to iterate over the tables, what he will do 
 almost for certain? If it would be me, you know... values all the time. 
 Even for last example most important values are those numbers (despite 
 in this case they're meaningless without keys).

 Cheers
 Piotrek

Put it this way:
  Is there any time you are interested in the values without the keys?
  Is there any time you are interested in the keys without the values?

If you're not interested in the keys, the real question would be why you 
are using an associative array instead of just an array.

I can think of at least one example of when you want key iteration, 
which would be when using a bool[T] as a set.

Oct 21 2009

Piotrek <starpit tlen.pl> writes:

Pelle M�nsson pisze:
 Piotrek wrote:
 Bill Baxter pisze:
 On Sun, Oct 18, 2009 at 1:12 PM, Piotrek <starpit tlen.pl> wrote:
 Bill Baxter pisze:
 I think the default should be to iterate over whatever 'in' looks at.

 I was almost convinced, because that rule has a sense. But treating 
 normal
 arrays and associative array has more sense to me.

 fun (SomeObject object) {
 foreach (element;object.arr1){ //normal, but how do I know at first 
 look
 //just do something with element
 }

 foreach (element;object.arr2){ // assoc, but how do I know at first 
 look
 //just do something with element hopefully not index
 }

 That sounds like an argument that there should be no default, because
 either way it's not clear whether you're iterating over keys or
 values. 

 Really?! That wasn't my intention :) In both cases I wish it were 
 values ;)

  > Just get rid of the the one-argument foreach over AAs altogether 
 and force the user to be
  > explicit about it.

 I wouldn't do so. Would anybody do an error by thinking that foreach 
 (elem,table) should iterate over keys?

 Maybe I'm not thinking correctly but for me an assoc array is just an 
 array with additional key (index) features thanks to which I save 
 space and/or have more indexing method than only integers.

 e.g.

 Normal array

 No.       Item
 0    George
 1    Fred
 2    Dany
 3    Lil

 Index/key is infered from position (offset)

 Now Assoc array:

 No.    Item
 10    Lindsey
 21    Romeo
 1001    C-Jay

 Or
 No.    Item
 first    Europe
 second    South America
 third    Australia

 Or
 Names occurrence frequency:

 No. Item
 Andy    21
 John    23
 Kate    12

 And the only difference is the need for using a hash function for 
 value lookup (calculate position) which should not bother a user when 
 he doesn't care.

 Then when you ask somebody to iterate over the tables, what he will do 
 almost for certain? If it would be me, you know... values all the 
 time. Even for last example most important values are those numbers 
 (despite in this case they're meaningless without keys).

 Cheers
 Piotrek

 Put it this way:
  Is there any time you are interested in the values without the keys?

Yes!

  Is there any time you are interested in the keys without the values?

Yes!

 If you're not interested in the keys, the real question would be why you 
 are using an associative array instead of just an array.

The answer is simple. I can reuse AA in many different functions. 
Sometimes I need keys other time values and even... both :) That isn't 
the issue. The problem was about what should return short version of 
foreach over AA.

 I can think of at least one example of when you want key iteration, 
 which would be when using a bool[T] as a set.

See above.

Cheers
Piotrek

Oct 21 2009

bearophile <bearophileHUGS lycos.com> writes:

Piotrek:

 The problem was about what should return short version of 
 foreach over AA.

Relax. Despite both you and Walter are wrong, things will not change, and D
will probably keep the wrong design you like :-)

Bye,
bearophile

Oct 21 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Andrei Alexandrescu:
 
 I'm not sure whether much or any of this is attainable within D2.<

 
 The things I've written about aren't all equally important and they
 don't require the same amount of work to be implemented. I am sorry
 for writing that gazpacho. There are two things that I think are more
 important that can implemented now: the opEquals among AAs, and the
 default iteration with foreach on the keys. The other things can
 wait. The opEquals among AAs requires probably less than 20 lines of
 code.

You may want to submit those to bugzilla so they don't get forgotten.

 Two persons (plus Walter, of course) have said they don't like to
 iterate on the keys first. The other people have kept muzzle shut so
 I can't tell yet.

Well clearly sometimes both are needed. I think it would be limiting to 
e.g. only offer iteration on keys, to then do one extra lookup to fetch 
the value.


Andrei

Oct 18 2009

bearophile <bearophileHUGS lycos.com> writes:

Moritz Warning:

 foreach(key; aa.keys)
    if(Test(key))
       aa.remove(key);

 
 It's undefined behavior.
 You shouldn't try to mutate the aa while iterating.
 I hope that will be fixed.
 It took me some time to find this out.

In Python:

 d = {1:2, 3:4}
 for k in d: del d[k]



...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration

The "for k in d" sets a flag inside the d dict (AA), and del and insert
operations test this flag, if it's true a runtime exception is thrown.

Something like this is enough in Python, but I think in D it's not safe enough
and it may be a little slow too (because you have to test a flag every time).
In D a compile-time test pass may be able to catch and warn against most of
such situations (lint tools are able to do far more complex things).

--------------------

Christopher Wright

Why not mandate using both keys and values?<

Iterating on tuples of an AA is probably a little slower than iterating on just
keys.
On the other hand such AAs will have something like xkeys and xvalues methods
too, that return light iterable objects (ranges, the first one has a O(1)
opIn_r, while the opIn_r of the second is O(n)) that are enough to regain the
lost performance when you need only keys or values, so this situation can
become acceptable.

Bye,
bearophile

Oct 19 2009

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 17 Oct 2009 14:28:51 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Associative arrays are today quite problematic because they don't offer  
 any true iteration. Furthermore, the .keys and .values properties create  
 new arrays, which is wasteful.

 Another issue with associative arrays is that ++a[k] is hacked, which  
 reflects a grave language limitation. That needs to be replaced with a  
 true facility.

 Any other issues with AAs that you want to see fixed, and ideas guiding  
 a redesign?

Do not require opCmp for AAs.  There are some good hashmap implementations  
do not require using a tree for collisions.  This would also eliminate at  
least one pointer in the element struct.

It also causes some problems with using arbitrary classes as keys.  It is  
easy to define a default hash and default opEquals for a class, but it is  
difficult to define a default opCmp.  In fact, the default opCmp in object  
simply throws an exception, making it a nuisance to have the compiler  
allow using a class as a key, and then throwing an exception at runtime  
when you use it.

Removing the requirement for opCmp would also eliminate the requirement  
for opCmp to be in object (it currently by default throws an exception),  
so it could be in an interface instead.

-Steve

Oct 20 2009

D Programming

C/C++ Programming

Other

digitalmars.D - Revamping associative arrays