digitalmars.D.bugs - Access violation with AA's

Kris (81/81) Oct 28 2005 class Foo

Walter Bright (11/12) Oct 28 2005 Nobody had a nice word to say about the original implementation.

Kris (43/58) Oct 28 2005 Hi, Walter ~

Regan Heath (56/56) Oct 29 2005 Just to add my 2c here as well.

Regan Heath (14/23) Oct 29 2005 If it really bothers people that "val = AA["key"];" throws an exception ...
Bruno Medeiros (10/90) Oct 31 2005 I strongly agree with all of this. I just want to add that we should

Walter Bright (3/4) Nov 05 2005 Yes, I'd like to fix that one. Can you post an example specifically for

kris (3/10) Nov 05 2005 There was an example in the original post, commented with the location

Walter Bright (6/16) Nov 05 2005 I checked it out. The GPF happens when compiled with -release, the

kris (26/31) Nov 05 2005 In the example provided, the lookup happened in a library. The library

Derek Parnell (42/78) Nov 05 2005 Huh? Of course!? Why would you think it wise to accept data from an unkn...

kris (52/153) Nov 06 2005 Hey Derek; I think you may have misunderstood the problem, so I'll

Derek Parnell (78/233) Nov 06 2005 I don't believe that I have misunderstood "the problem" at all.

kris (98/225) Nov 06 2005 I apologize for the long post. The salient point is right at the end, so...

Regan Heath (26/42) Nov 06 2005 Sez you! ;)
Sean Kelly (18/33) Nov 06 2005 I like the [] and 'in' syntax as it was originally implemented, as it
Derek Parnell (66/333) Nov 06 2005 That may be true. However I am working with we we've got, knowing that

kris (14/39) Nov 06 2005 Yes ~ there is that aspect. We should know better by now

Unknown W. Brackets (37/37) Nov 06 2005 I would strongly argue that if you want such checking, you should have

kris (22/52) Nov 07 2005 I must have made a serious mistake in the couching of that argument,

Unknown W. Brackets (31/40) Nov 07 2005 Perhaps it does. But, this is contrasted with the fact that every

Kris (16/34) Nov 07 2005 Interesting. Although scripting languages do have a different set of

Walter Bright (19/43) Nov 05 2005 It was repeatedly asked for.

Sean Kelly (7/16) Nov 06 2005 For what it's worth, I liked it too. And I believe the C++ map worked

Regan Heath (51/68) Nov 06 2005 I think we can and should avoid pointers.

Derek Parnell (14/28) Nov 06 2005
Bruno Medeiros (9/24) Nov 07 2005 ...

kris (12/19) Nov 06 2005 If you mean adding an AA method similar to this:

Regan Heath (10/27) Nov 06 2005 Good point, I hadn't thought of that. I was seeing this as redundant in ...
Ameer Armaly (2/21) Nov 07 2005

Kris (10/25) Oct 28 2005 Ah. I just remembered that AA lookup's would insert an empty entry if on...

David Medlock (15/53) Oct 31 2005 I will maintain my position that we should simply have a couple of built...

Kris (10/64) Oct 31 2005 Yep ~ that would be great.

Ben Hinkle (7/32) Oct 29 2005 FWIW, MinTL HashAA returns a user-settable missing value on invalid keys...

kris (14/58) Oct 29 2005 That all sounds cool, Ben.

"Kris" <fu bar.com> writes:

class Foo
{
        private Record [char[]] map;

        static class Record
        {
                void write (Foo parent, void[] data) {}
        }

        synchronized void put (char[] key, void[] data)
        {
                /****** access violation here *****/
                Record  r = map [key];

                if (r is null)
                   {
                   r = new Record ();
                   map [key] =  r;
                   }
                r.write (this, data);
        }
}


void main()
{
        Foo f = new Foo;
        f.put ("foo", new void[10]);
}



Depending upon where (in the code body) the dereference of 'map' occurs, one 
gets either an access-violation or an OutOfBounds exception. Fragile.

However ~ this, again, casts a shadow upon the AA implementation. I mean, 
throwing an exception in this case is surely dubious (a missing entry in an 
AA is *not* exceptional; often it is the norm). Further, compare these two 
implementations of the above code:


        synchronized void put (char[] key, void[] data)
        {
                Record  r = map [key];

                if (r is null)
                   {
                   r = new Record ();
                   map [key] =  r;
                   }
                r.write (this, data);
        }


        synchronized void put (char[] key, void[] data)
        {
                Record  *r = key in map;
                Record  rr;    // declared outside the if-block so &rr stays valid below

                if (r is null)
                   {
                   rr = new Record();
                   map [key] =  rr;
                   r = &rr;
                   }
                (*r).write (this, data);
        }



pointers. Is this a good thing? For something as rudimentary as a HashMap? 
Alternatively, one could do this:


        synchronized void put (char[] key, void[] data)
        {
                Record r;

                try {
                     r = map [key];
                     } catch (ArrayBoundsError e)
                                 {
                                  r = new Record ();
                                  map [key] =  r;
                                 }
                r.write (this, data);
        }



something one needs for a rudimentary container. I think this particular 
change to AA's is just flat-out bogus (and, none of this would be necessary 
if AA's were implemented as a template library).

Finally, any code written for the "old" implementation of AA's is broken. 

broken. Wonderful!


simplest and most in line with what D used to represent. What is one 
supposed to do?

Oct 28 2005

"Walter Bright" <newshound digitalmars.com> writes:

"Kris" <fu bar.com> wrote in message news:dju31o$2he6$1 digitaldaemon.com...
  I think this particular change to AA's is just flat-out bogus

Nobody had a nice word to say about the original implementation.

Just write it as:

    if (!(key in map))
        map[key] = new Record;
    r = map[key];
    ...

No, it isn't as efficient as the old way. But, like I said, the old way was
called lots of unkind things <g>.

It is possible that a future compiler may recognize the above as an idiom
and rewrite it so the array lookup is done only once.
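
To make the lookup counts argued over in this thread concrete, here is a minimal sketch in present-day D2 (not the 2005 dialect used here). `CountingMap`, `walterIdiom` and `pointerIdiom` are hypothetical names: the wrapper simply forwards `in`, `[]` and `[]=` to a built-in AA and counts each hash operation. Under that assumption, the test-then-index idiom costs two operations on a hit and three on a miss, while keeping the pointer returned by `in` costs one and two.

```d
import std.stdio;

// Hypothetical wrapper (not part of D or Phobos): forwards to a built-in AA
// and counts every hash lookup or insertion it performs.
struct CountingMap(K, V)
{
    V[K] data;
    int ops;

    V* opBinaryRight(string op)(K key) if (op == "in")
    {
        ++ops;
        return key in data;
    }

    V opIndex(K key)
    {
        ++ops;
        return data[key];
    }

    void opIndexAssign(V value, K key)
    {
        ++ops;
        data[key] = value;
    }
}

class Record {}

alias Map = CountingMap!(string, Record);

// The suggested idiom: test with 'in', insert if missing, then index.
Record walterIdiom(ref Map map, string key)
{
    if (!(key in map))
        map[key] = new Record;
    return map[key];
}

// Keep the pointer from 'in': one lookup on a hit, lookup + insert on a miss.
Record pointerIdiom(ref Map map, string key)
{
    if (auto p = key in map)
        return *p;
    auto r = new Record;
    map[key] = r;
    return r;
}

void main()
{
    Map a, b;

    walterIdiom(a, "x");               // miss
    writeln("Walter idiom, miss:  ", a.ops);
    a.ops = 0;
    walterIdiom(a, "x");               // hit
    writeln("Walter idiom, hit:   ", a.ops);

    pointerIdiom(b, "x");              // miss
    writeln("pointer idiom, miss: ", b.ops);
    b.ops = 0;
    pointerIdiom(b, "x");              // hit
    writeln("pointer idiom, hit:  ", b.ops);
}
```

Even the pointer form only gets down to one operation on the hit path; on a miss it still hashes the key twice (once for `in`, once for the insertion), since `in` and indexing are the only operations the sketch assumes.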

Oct 28 2005

"Kris" <fu bar.com> writes:

Hi, Walter ~

I haven't been at all shy in the past in terms of criticizing the multiple 
lookups required for AA usage. Nor for some of the other "deficiencies" (as 
I imagine you're referring to).

However, what you suggest below /minimally/ requires two lookups, and three 
on an insert <g>. Your changes just made the performance concern notably 
worse than it was, along with breaking the existing code-base. The compiler 
optimization you speak of does not exist, and really should not need to.

Can we step back a moment, please?

One issue here is the Access Violation (why this is posted in the bugs 
section rather than in the main forum), although this new behaviour of 
throwing an exception is, I think, highly questionable. How did that get by 
the wolves in the first place? <g>

For example, I quite often use an AA to identify things as 'special' ~ URL 
schemes for example. If the scheme is not in the AA then it ain't special. 
The missing case is /not/ exceptional; instead it is actually the norm; I 
certainly don't wish to be catching exceptions for the normal case (from 
either a semantic or performance perspective). Nor do I wish to use pointers 
for such usual, simplistic, cases.

OTOH, what you did with the 'in' keyword and pointers improved that aspect 
of it ~ if one wants to eliminate a potential double-lookup then one can use 
the pointer syntax. Good!

The problem here is that, at the same time, you changed the semantics of a 
simple lookup such that it now either requires pointer-syntax, the overhead 
of try/catch, or yet another lookup. I think that was a mistake, and am not 
too shy to say so :-)

With respect, I think this sets a rather poor precedent for D beginners ~ as 
a questionable example of exception usage, and the associated added 
complexity or overhead of the current AA lookup model (or, alternatively, 
the required use of pointers).

Let's not forget there's an access-violation here too. Just compile that 
example and run it.

Lastly, I /do/ actually have a kind word to say about the original 
implementation: other than the potential double-lookup, it was fast, and it 
was simple. I still think AA's could/should have been handled via templates 
when they came along, and could therefore have been treated as a library 
utility rather than being built into the compiler itself. Regardless, the 
usage model is now arguably slower and more complex than before ~ largely 
negating the effort of placing AA's within the compiler in the first place. 
IMO.

Regards;





"Walter Bright" <newshound digitalmars.com> wrote in message 
news:djuj2d$2vvv$1 digitaldaemon.com...
 "Kris" <fu bar.com> wrote in message 
 news:dju31o$2he6$1 digitaldaemon.com...
  I think this particular change to AA's is just flat-out bogus

 Nobody had a nice word to say about the original implementation.

 Just write it as:

    if (!(key in map))
        map[key] = new Record;
    r = map[key];
    ...

 No, it isn't as efficient as the old way. But, like I said, the old way
 was called lots of unkind things <g>.

 It is possible that a future compiler may recognize the above as an idiom
 and rewrite it so the array lookup is done only once.

Oct 28 2005

"Regan Heath" <regan netwin.co.nz> writes:

Just to add my 2c here as well.

I disliked the original AA and array behaviour because it inserted when  
you did a lookup.

I was a supporter of the "make it an exception" idea because to me the  
statement "val = aa["key"];" says "get me the value for key". However, it
seems that in practice, where we use AA's, it's more typical that we're
saying "have we got a value for key?" (as Kris mentioned). It is for this
reason I have come to dislike the exception.

The trouble, as I see it, is that the compiler cannot know in each case
whether a missing item is exceptional and this is because we have no way  
of telling it. We've got one way to ask for an item "item = aa[index]" and  
that's it.

The solution IMO, something I have been an advocate of for some time now,  
is adding different ways to ask for items. The first and most relevant  
here is a "contains" method, eg.

bool contains(VALUE[KEY] a);
bool contains(VALUE[KEY] a, out VALUE v);

The first essentially does what "in" does. The second does what "in"
does, but also assigns the value to 'v'.

Sure, I can and have written a template that uses 'in' to achieve these,  
but it seems that something this useful should be part of the default  
built-in array handling so that everyone has access to it. At the very  
least it should be part of the standard library.
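
A minimal sketch of the two proposed forms, written (as described above) as templates on top of 'in', in present-day D2 syntax. `contains` is a hypothetical free function, not a built-in AA property; D's uniform function call syntax lets it be called as `AA.contains(...)`. Each form hashes the key once.

```d
import std.stdio;

// Hypothetical helper: true if 'key' is present. One hash lookup via 'in'.
bool contains(K, V)(V[K] aa, K key)
{
    return (key in aa) !is null;
}

// Hypothetical helper: as above, but also copies the value into 'value'.
// Because 'value' is an 'out' parameter it is reset to V.init on entry,
// so a miss leaves it at V.init.
bool contains(K, V)(V[K] aa, K key, out V value)
{
    if (auto p = key in aa)
    {
        value = *p;
        return true;
    }
    return false;
}

void main()
{
    int[string] aa = ["a": 1, "b": 0];
    int val;

    if (aa.contains("b", val))
        writeln("b is present with value ", val);

    if (!aa.contains("c"))
        writeln("c is absent");
}
```

Because the boolean result is separate from the value, a stored 0 and a missing key remain distinguishable.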

Here is a list of things I can imagine a programmer wanting to do on  
lookup of an item, ideally all should be supported in the most efficient  
manner possible i.e. no double lookup/hash.

  - check for item (i.e. if ("a" in AA))
  - check for item
    - if exists, get value
    - if not exists, error
    - if not exists, add 'this one' (or .init value)

(feel free to add to this list)

My feeling is that we handle each of the above like so:

"check for item"
   if ("key" in AA) {}
   if (AA.contains("key")) {}

"check for item, if exists get value"
   if (AA.contains("key",val)) {}

"check for item, if not exists, error"
   val = AA["key"];
   val = AA.get("key");

So, contains and get are added; the rest remains as it is now.

The tricky one seems to be:

"check for item, if not exists, add 'this one' (or .init value)"

perhaps?

val = <value to insert>;
AA.getset("key",val);

so val would end up being the existing value, or the new inserted value.  
If you wanted to tell if there was an existing value you could keep a copy  
i.e.

nval = val = <value to insert>;
AA.getset("key",val);

if (nval is val) { /* new value was inserted */ }
else { /* we had an existing value */ }

A better name than "getset" can likely be found.
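
A sketch of the 'getset' operation under the same assumptions (present-day D2; `getset` is the placeholder name above, not a real AA property). One deliberate deviation: it returns `true` when it inserted, because the `nval is val` comparison only separates the two cases for reference types. For a value type such as `int`, an existing entry equal to the value being inserted would look like an insertion.

```d
import std.stdio;

// Hypothetical helper implementing the proposed 'getset': if 'key' exists,
// 'val' is overwritten with the stored value; otherwise 'val' is inserted.
// Returns true if an insertion happened. 'aa' is taken by ref so that an
// insertion into an empty (null) AA is visible to the caller.
bool getset(K, V)(ref V[K] aa, K key, ref V val)
{
    if (auto p = key in aa)
    {
        val = *p;          // existing entry: one hash lookup
        return false;
    }
    aa[key] = val;         // missing entry: lookup plus insertion
    return true;
}

void main()
{
    int[string] counts;
    int val = 1;

    if (counts.getset("apples", val))
        writeln("inserted ", val);

    val = 99;
    if (!counts.getset("apples", val))
        writeln("already had ", val);
}
```

Built only from `in` and `[]=`, the miss path still hashes twice; removing that second hash would need support inside the AA implementation itself rather than a wrapper.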

Regan

Oct 29 2005

"Regan Heath" <regan netwin.co.nz> writes:

 My feeling is that we handle each of the above like so:

 "check for item"
    if ("key" in AA) {}
    if (AA.contains("key")) {}

 "check for item, if exists get value"
    if (AA.contains("key",val)) {}

 "check for item, if not exists, error"
    val = AA["key"];
    val = AA.get("key");

If it really bothers people that "val = AA["key"];" throws an exception  
then perhaps it could return the .init value and the explicit "get" call  
could throw the exception. I don't think it matters too much as I can see  
myself using "contains" in most cases.

I dislike "val = AA["key"];" returning .init if there is no _other_ method  
(i.e. "contains") to get a value which can tell me whether the item did in  
fact exist or not. Example:

int[char[]] AA;
int v;

v = AA["test"];

the .init value for int is 0, so if v == 0 after this call we do not know  
whether it existed and was 0 or didn't exist at all.

Regan

Oct 29 2005

Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:

Regan Heath wrote:
 Just to add my 2c here as well.
 
 I disliked the original AA and array behaviour because it inserted when  
 you did a lookup.
 
 I was a supporter of the "make it an exception" idea because to me the  
 statement "val = aa["key"]; says get me the value for "key". However it  
 seems that in practice, where we use AA's it's more typical that we're  
 saying "have we got a value for "key" (as Kris mentioned). It is for 
 this  reason I have come to dislike the exception.
 
 The trouble as I see it is that the the compiler cannot know in each 
 case  whether a missing item is exceptional and this is because we have 
 no way  of telling it. We've got one way to ask for an item "item = 
 aa[index]" and  that's it.
 
 The solution IMO, something I have been an advocate of for some time 
 now,  is adding different ways to ask for items. The first and most 
 relevant  here is a "contains" method, eg.
 
 bool contains(VALUE[KEY] a);
 bool contains(VALUE[KEY] a, out VALUE v);
 
 The first from essentiall does what "in" does. The second does what 
 "in"  does but assigns the value to 'v'.
 
 Sure, I can and have written a template that uses 'in' to achieve 
 these,  but it seems that something this useful should be part of the 
 default  built-in array handling so that everyone has access to it. At 
 the very  least it should be part of the standard library.
 
 Here is a list of things I can imagine a programmer wanting to do on  
 lookup of an item, ideally all should be supported in the most 
 efficient  manner possible i.e. no double lookup/hash.
 
  - check for item (i.e. if ("a" in AA))
  - check for item
    - if exists, get value
    - if not exists, error
    - if not exists, add 'this one' (or .init value)
 
 (feel free to add to this list)
 
 My feeling is that we handle each of the above like so:
 
 "check for item"
   if ("key" in AA) {}
   if (AA.contains("key")) {}
 
 "check for item, if exists get value"
   if (AA.contains("key",val)) {}
 
 "check for item, if not exists, error"
   val = AA["key"];
   val = AA.get("key");
 
 So, contains is added (get is added), the rest remains as it is now.
 
 The tricky one seems to be:
 
 "check for item, if not exists, add 'this one' (or .init value)"
 
 perhaps?
 
 val = <value to insert>;
 AA.getset("key",val);
 
 so val would end up being the existing value, or the new inserted 
 value.  If you wanted to tell if there was an existing value you could 
 keep a copy  i.e.
 
 nval = val = <value to insert>;
 AA.getset("key",val);
 
 if (nval is val) { //new value was inserted }
 else { //we had an existing value }
 
 A better name than "getset" can likely be found.
 
 Regan

I strongly agree with all of this. I just want to add that we should 
also have a method to set/add a new pair. Like:

   AA.set("key",val);  // or AA.add("key",val);

I find the current array usage syntax (and the previous one too, for 
that matter), quite strange and unnatural.


-- 
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to 
be... unnatural."

Oct 31 2005

"Walter Bright" <newshound digitalmars.com> writes:

"Kris" <fu bar.com> wrote in message news:djumru$1a9$1 digitaldaemon.com...
 One issue here is the Access Violation

Yes, I'd like to fix that one. Can you post an example specifically for
that? That's a separate issue from discussion of how it should work.

Nov 05 2005

kris <fu bar.org> writes:

Walter Bright wrote:
 "Kris" <fu bar.com> wrote in message news:djumru$1a9$1 digitaldaemon.com...
 
One issue here is the Access Violation

 
 
 Yes, I'd like to fix that one. Can you post an example specifically for
 that? That's a separate issue from discussion of how it should work.



There was an example in the original post, commented with the location 
of the GPF.

Nov 05 2005

"Walter Bright" <newshound digitalmars.com> writes:

"kris" <fu bar.org> wrote in message news:dkiu4q$20bq$1 digitaldaemon.com...
 Walter Bright wrote:
 "Kris" <fu bar.com> wrote in message


news:djumru$1a9$1 digitaldaemon.com...
One issue here is the Access Violation


 Yes, I'd like to fix that one. Can you post an example specifically for
 that? That's a separate issue from discussion of how it should work.

 There was an example in the original post, commented with the location
 of the GPF.

I checked it out. The GPF happens when compiled with -release, the
ArrayBoundsError without -release. That is as designed; the idea is similar
for regular arrays. Array bounds checking is not done when -release is
thrown, and you get whatever happens.

Nov 05 2005

kris <fu bar.org> writes:

Walter Bright wrote:
 
 I checked it out. The GPF happens when compiled with -release, the
 ArrayBoundsError without -release. That is as designed; the idea is similar
 for regular arrays. Array bounds checking is not done when -release is
 thrown, and you get whatever happens.

In the example provided, the lookup happened in a library. The library 
was -release, while the client code was not.

*cough*

You do realize, I hope, that the above approach dictates one /must/ use 
'in' with AA's to avoid GPFs. I mean, if there might ever be a missing 
entry in an AA (which is pretty much assured in the general case, esp. 
within libraries) then one will end up with a GPF via -release.

Thus, since 'in' requires the use of pointers, it holds that AA's 
require pointer-syntax to enable robust coding. I sincerely hope you see 
the irony in that, Walter?

I can somewhat understand your sensitivity in this regard; yet, the 
design appears to be a ticking bomb. Everyone ~ anyone ~ please help me 
to understand this is not the case?

I posit the following:

1) use of the array-syntax for AA lookups will produce a GPF if the 
entry does not exist. Please put aside -release for the moment ~ it is a 
red herring.

2) To avoid GPFs, one must use pointer-syntax in conjunction with the AA 
'in' statement.

3) The usage of the AA 'in' statement renders the array-syntax lookup 
redundant, since one uses the pointer to reach the data instead.

4) Thus, the array-syntax is effectively worthless (and superfluous) for 
AA lookup, since pointer-syntax is a prerequisite to avoid GPFs. 
Further, the presence of AA array-syntax lookup is an invitation to 
write non-robust code.

Nov 05 2005

Derek Parnell <derek psych.ward> writes:

On Sat, 05 Nov 2005 18:57:08 -0800, kris wrote:

 Walter Bright wrote:
 
 I checked it out. The GPF happens when compiled with -release, the
 ArrayBoundsError without -release. That is as designed; the idea is similar
 for regular arrays. Array bounds checking is not done when -release is
 thrown, and you get whatever happens.

 
 In the example provided, the lookup happened in a library. The library 
 was -release, while the client code was not.
 
 *cough*
 
 You do realize, I hope, that the above approach dictates one /must/ use 
 'in' with AA's to avoid GPFs. I mean, if there might ever be a missing 
 entry in an AA (of which is pretty much assured in the general case; esp 
 within libraries) then one will end up with a GPF via -release

Huh? Of course!? Why would you think it wise to accept data from an unknown
source without either validating it or accepting the consequences? To
otherwise complain is not logical. It is parallel in concept to accepting a
user's keyboard-entered data without checking that it's okay to use.

In a library routine in which you open up its API to external usage, one
must either validate the parameters or accept the consequences of trying to
use unacceptable data.

 Thus, since 'in' requires the use of pointers, it holds that AA's 
 require pointer-syntax to enable robust coding. I sincerely hope you see 
 the irony in that, Walter?

It does not *require* one to use pointers. It is optional.

   if ( (UsrParm in MyArray) == null)
   {
       MyArray[UsrParm] = ArrayEntry = UsrData;
   }
   else
   {
       ArrayEntry = MyArray[UsrParm];
   }

Look! No pointers involved. And a good optimising D compiler might even be
able to make this more efficient by caching the 'double' look up.

 I can somewhat understand your sensitivity in this regard; yet, the 
 design appears to be a ticking bomb. Everyone ~ anyone ~ please help me 
 to understand this is not the case?
 
 I posit the following:
 
 1) use of the array-syntax for AA lookups will produce a GPF if the 
 entry does not exist. Please put aside -release for the moment ~ it is a 
 red-herring

Only on a 'get' access. Not on an 'enquiry' or 'set' access.

 2) To avoid GPFs, one must use pointer-syntax in conjunction with the AA 
 'in' statement.

Not so.
 
 3) The usage of the AA 'in' statement renders the array-syntax lookup 
 redundant, since one uses the pointer to reach the data instead.

Not so.
 
 4) Thus, the array-syntax is effectively worthless (and superfluous) for 
 AA lookup, since pointer-syntax is a prerequisite to avoid GPFs. 
 Further, the presence of AA array-syntax lookup is an invitation to 
 write non-robust code.

Not so.

What would have been nice, instead of Walter getting all upset over people
not liking his implementation, is to provide all four types of access.

   'enquiry' ::  Key in Array (returns pointer to Value or Null)
   'get'     ::  Value = Array[Key] (Gets the Value if it exists,
                                     error otherwise)
   'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
                                     it doesn't exist.)
   'initget' ::  Value = Array.initset(Key) (Gets the Value if it exists,
                                          otherwise creates an entry with
                                          .init values.)

Or any other equivalent syntax. The point is that there is no reason for
the old behaviour to be totally removed from the language, just shifted
away from being the default behaviour for 'Value = Array[Key]' syntax.
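
A minimal sketch of those four operations in present-day D2. 'enquiry', 'get' and 'set' map directly onto `in`, `[]` and `[]=`; `initset` is a hypothetical free function (named after the call `Array.initset(Key)` above) that restores the old lookup-inserts behaviour only when explicitly asked for.

```d
import std.stdio;

// Hypothetical helper for the 'initget' access type: returns the stored value,
// or inserts V.init for a missing key and returns that.
V initset(K, V)(ref V[K] aa, K key)
{
    if (auto p = key in aa)
        return *p;
    aa[key] = V.init;
    return V.init;
}

void main()
{
    int[string] aa;

    aa["one"] = 1;                                   // 'set'
    bool present = ("one" in aa) !is null;           // 'enquiry'
    int one = aa["one"];                             // 'get' (a missing key is an error)
    int two = aa.initset("two");                     // 'initget': creates "two" with int.init

    writeln(present, " ", one, " ", two, " ", aa.length);
}
```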

-- 
Derek Parnell
Melbourne, Australia
6/11/2005 5:41:50 PM

Nov 05 2005

kris <fu bar.org> writes:

Hey Derek; I think you may have misunderstood the problem, so I'll 
attempt to clarify somewhat ~

Derek Parnell wrote:
 On Sat, 05 Nov 2005 18:57:08 -0800, kris wrote:
 
 
Walter Bright wrote:

I checked it out. The GPF happens when compiled with -release, the
ArrayBoundsError without -release. That is as designed; the idea is similar
for regular arrays. Array bounds checking is not done when -release is
thrown, and you get whatever happens.

In the example provided, the lookup happened in a library. The library 
was -release, while the client code was not.

*cough*

You do realize, I hope, that the above approach dictates one /must/ use 
'in' with AA's to avoid GPFs. I mean, if there might ever be a missing 
entry in an AA (of which is pretty much assured in the general case; esp 
within libraries) then one will end up with a GPF via -release

 
 
 Huh? Of course!? Why would you think it wise to accept data from an unknown
 source without either validating it or accepting the consequences. To
 otherwise complain is not logical. It is parallel in concept to accepting a
 user's keyboard-entered data without checking that its okay to use.
 
 In a library routine in which you open up it's API to external usage, one
 must either validate the parameters or accept the consequences of trying
 use unacceptable data.

I failed miserably to get your drift here. Hash tables are *not* like 
arrays. If they don't contain a key it is surely not a reason to GPF. Is 
it? We're talking about this code causing a GPF:

char[char[]] AA;

char[] s = AA["unforseen key"];  // GPF; can't check for a null return



 
 
Thus, since 'in' requires the use of pointers, it holds that AA's 
require pointer-syntax to enable robust coding. I sincerely hope you see 
the irony in that, Walter?

 
 
 It does not *require* one to use pointers. It is optional.
 
    if ( (UsrParm in MyArray) == null)
    {
        MyArray[UsrParm] = ArrayEntry = UsrData;
    }
    else
    {
        ArrayEntry = MyArray[UsrParm];
    }
 
 Look! No pointers involved. And a good optimising D compiler might even be
 able to make this more efficient by caching the 'double' look up.

Oh please!

Let's try to stay in the land of reason here. Yes, you can come up with 
all sorts of ways to /make/ it work with /multiple/ lookups. Walter 
suggested a way to do it with three lookups instead. I know you 
appreciate optimal code paths, Derek, so can we sidestep this please? 
The above code has two lookups, where only one should be necessary. I 
sure hope you avoid multiple lookups within Build?

Since you're forcing the issue, let's change my posit to assert that 
gratuitous use of multiple lookups should not be considered ideal?


I can somewhat understand your sensitivity in this regard; yet, the 
design appears to be a ticking bomb. Everyone ~ anyone ~ please help me 
to understand this is not the case?

I posit the following:

1) use of the array-syntax for AA lookups will produce a GPF if the 
entry does not exist. Please put aside -release for the moment ~ it is a 
red-herring

 
 
 Only on a 'get' access. Not on an 'enquiry' or 'set' access.

False. We're talking about array-syntax, and not 'in' syntax.
As above: char[] s = AA["unforseen key"];  // causes GPF


 
 
2) To avoid GPFs, one must use pointer-syntax in conjunction with the AA 
'in' statement.

 
 
 No so.

Au contraire, my friend ~ unless you're prepared to perform unnecessary 
multiple lookups. I don't consider redundant lookups to be relevant, and 
neither should anyone following this ridiculous saga.


  
 
3) The usage of the AA 'in' statement renders the array-syntax lookup 
redundant, since one uses the pointer to reach the data instead.

 
 
 Not so.

Certainly! Let's waste our time looking the entry up once again, just 
for jollies. And let's make sure the key is very, very long; and there's 
lots of collisions in the hash table. Gratuitously wasting CPU cycles 
is surely a good thing.

  
 
4) Thus, the array-syntax is effectively worthless (and superfluous) for 
AA lookup, since pointer-syntax is a prerequisite to avoid GPFs. 
Further, the presence of AA array-syntax lookup is an invitation to 
write non-robust code.

 
 
 Not so.

Please re-read. Array-syntax lookup /by itself/ is borked. It has to be 
used in conjunction with 'in', and is therefore superfluous (since 'in' 
supplies the data anyway). Sure; you can lookup the AA again if you 
wish, but your counter-argument is redundant; just like the additional 
lookup. As noted, the existence of s=AA[], by itself, encourages rather 
fragile code. D requires pointer-syntax to lookup an AA entry without 
GPFing (redundant multiple lookups aside).

 
 What would have been nice is instead of Walter get all upset over people
 not liking his implementation, is to provide all four types of access.
 
    'enquiry' ::  Key in Array (returns pointer to Value or Null)
    'get'     ::  Value = Array[Key] (Gets the Value if it exists,
                                      error otherwise)
    'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
                                      it doesn't exist.
    'initget' ::  Value = Array.initset(Key) (Gets the Value if it exists,
                                           otherwise creates an entry with
                                           .init values.)
 
 Or any other equivalent syntax. The point is that there is no reason for
 the old behaviour to be totally removed from the language, just shifted
 away from being the default behaviour for 'Value = Array(Key)' syntax.
 

The point of this is to expose the flaws in the current design. The 
counterpoints you've made are based entirely upon the use of thoroughly 
redundant additional lookups, so of what true value are they? I mean 
that sincerely, since I just can't see any value in multiple lookups 
where perfectly sound alternatives have been around for decades.

Do you want an alternative, robust, optimal API? It would be good to see 
AAs use a set of 'properties' instead, such as bool get("key", inout 
value) along with put(key, value) ~ which, BTW, has no pointer-syntax 
and is optimal in terms of avoiding those wholly redundant lookups.
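
A sketch of that property-style API as a small template wrapper over the built-in AA, in present-day D2 (where the old `inout` parameter storage class is spelled `ref`). `HashMap`, `get` and `put` are illustrative names, not an existing library; the URL-scheme usage mirrors the 'special scheme' example from earlier in the thread.

```d
import std.stdio;

// Illustrative wrapper (hypothetical, not an existing library) exposing
// get(key, ref value) and put(key, value) over a built-in AA.
struct HashMap(K, V)
{
    private V[K] table;

    // Single lookup. On a hit, copies the stored value into 'value' and
    // returns true; on a miss, leaves 'value' untouched and returns false,
    // so the caller can pre-load a default.
    bool get(K key, ref V value)
    {
        if (auto p = key in table)
        {
            value = *p;
            return true;
        }
        return false;
    }

    // Inserts or replaces the entry for 'key'.
    void put(K key, V value)
    {
        table[key] = value;
    }
}

void main()
{
    HashMap!(string, string) schemes;
    schemes.put("http", "special");

    string kind = "ordinary";   // default survives a miss
    if (schemes.get("mailto", kind))
        writeln("found: ", kind);
    else
        writeln("not special: ", kind);
}
```

The pointer returned by `in` stays inside the wrapper, so call sites use no pointer syntax and never pay for a second lookup on a hit.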

The problem here is not the built-in AAs per se ~ instead, it's the 
force-fit of array-syntax as the API. Replace that with a set of 
'properties' and there would be nothing to bitch and moan about. Right? 
Either that, or replace them with a template? One with the above functions?

What's truly extraordinary is that such a fundamental aspect of the 
language is still so unsound after all this time; and after so much 
worthless bickering. I mean, it's just a frickin' hash table for Bob's 
sake ... it ain't rocket science, and it sure as heck shouldn't be a 
political football.

Oh ~ perhaps AAs are not intended to be hash tables?

Nov 06 2005

Derek Parnell <derek psych.ward> writes:

On Sun, 06 Nov 2005 00:49:26 -0800, kris wrote:

 Hey Derek; I think you may have misunderstand the problem, so I'll 
 attempt to clarify somewhat ~

I don't believe that I have misunderstood "the problem" at all.

Currently in D, when one attempts to retrieve a non-existent element in an
array, it causes a run-time error to occur. This applies to all array
types: fixed-length, dynamic-length, and associative. (And yes, in the
current D, an associative array is implemented as a hash-table.) The type
of error depends on whether the -release switch has been used or not. If it
has been used then a memory access violation occurs (ie. GPF under unix),
otherwise if -release was not used an ArrayBoundsError exception is thrown.

The problem is that you don't like this behaviour for associative arrays.

I assume that when trying to fetch a non-existent element you would either
like the element to be automatically created with .init value(s) and/or to
return some initialized value, or to always throw an ArrayBoundsError
regardless of the -release status. Which is it you'd like to see happen?

 Derek Parnell wrote:
 On Sat, 05 Nov 2005 18:57:08 -0800, kris wrote:
 
 
Walter Bright wrote:

I checked it out. The GPF happens when compiled with -release, the
ArrayBoundsError without -release. That is as designed; the idea is similar
for regular arrays. Array bounds checking is not done when -release is
thrown, and you get whatever happens.

In the example provided, the lookup happened in a library. The library 
was -release, while the client code was not.

*cough*

You do realize, I hope, that the above approach dictates one /must/ use 
'in' with AA's to avoid GPFs. I mean, if there might ever be a missing 
entry in an AA (of which is pretty much assured in the general case; esp 
within libraries) then one will end up with a GPF via -release

 
 
 Huh? Of course!? Why would you think it wise to accept data from an unknown
 source without either validating it or accepting the consequences. To
 otherwise complain is not logical. It is parallel in concept to accepting a
 user's keyboard-entered data without checking that it's okay to use.
 
 In a library routine in which you open up its API to external usage, one
 must either validate the parameters or accept the consequences of trying to
 use unacceptable data.

 
 I failed miserably to get your drift here. 

I apologize. Sometimes I'm not as good with words as I think I am.

I made the assumption that the library managed an AA, and that a public
function was available that fetches data from that AA based on a supplied
key in one of the parameters. I was just saying that if this is the case,
then you'd be wise to validate the key data prior to fetching from the AA
based on the externally supplied key value.

 Hash tables are *not* like arrays. If they don't contain a key it is surely
 not a reason to GPF. Is it?

D's associative arrays are a specific type of hash table. The entries in
the table are based on keys. And I agree, a GPF is only one of the possible
implementation behaviors in response to a fetch attempt for an element
that does not exist.

 We're talking about this code causing a GPF:
 
 char[char[]] AA;
 
 char[] s = AA["unforseen key"];  // GPF; can't check for a null return
 

This is why you might benefit from Walter reestablishing this sort of
behaviour in D - in addition to the current AA behaviour. Sophisticated
coders such as yourself can use such facilities. 

  char[] s = AA.initset("unforseen key");

Now you can check for s.length == 0 if that's important to you. Of course,
that isn't always a perfect way of detecting unforeseen key accesses.

In spite of the "double lookup" effect, I would still code it thus ...

  char[] s;
  if ("unforseen key" in AA)
     s = AA["unforseen key"];
  else
  {
     // some error processing, if appropriate
  }

because it tells the reader of the code that it is possible to get bad keys
and it implements a way to handle those unambiguously.

 
 
 
Thus, since 'in' requires the use of pointers, it holds that AAs
require pointer-syntax to enable robust coding. I sincerely hope you see
the irony in that, Walter?

 
 
 It does not *require* one to use pointers. It is optional.
 
    if ( (UsrParm in MyArray) == null)
    {
        MyArray[UsrParm] = ArrayEntry = UsrData;
    }
    else
    {
        ArrayEntry = MyArray[UsrParm];
    }
 
 Look! No pointers involved. And a good optimising D compiler might even be
 able to make this more efficient by caching the 'double' look up.

 
 Oh please!
 
 Let's try to stay in the land of reason here. Yes, you can come up with 
 all sorts of ways to /make/ it work with /multiple/ lookups. Walter
 suggested a way to do it with three lookups instead. I know you 
 appreciate optimal code paths, Derek, so can we sidestep this please? 

You might be misunderstanding me, now. I prize maintainable source code
over runtime performance any day. If run time performance is really such an
issue, code in assembler; otherwise, get back into the land of reason.

 The above code has two lookups, where only one should be necessary. I 
 sure hope you avoid multiple lookups within Build?

Not if I can help it ;-) By the way, Build runs pretty fast in spite of me
'wasting' cycles checking for valid AA keys.
 
 Since you're forcing the issue, let's change my posit to assert that 
 gratuitous use of multiple lookups should not be considered ideal?
 
 
I can somewhat understand your sensitivity in this regard; yet, the 
design appears to be a ticking bomb. Everyone ~ anyone ~ please help me 
to understand this is not the case?

I posit the following:

1) use of the array-syntax for AA lookups will produce a GPF if the 
entry does not exist. Please put aside -release for the moment ~ it is a 
red herring.

 
 
 Only on a 'get' access. Not on an 'enquiry' or 'set' access.

 
 False. We're talking about array-syntax, and not 'in' syntax.
 As above: char[] s = AA["unforseen key"];  // causes GPF

Isn't that what I said? Your code is performing a 'get' and not an
'enquiry'. 

 
 
 
2) To avoid GPFs, one must use pointer-syntax in conjunction with the AA 
'in' statement.

 
 
 Not so.

 
 Au contraire, my friend ~ unless you're prepared to perform unnecessary
 multiple lookups. I don't consider redundant lookups to be relevant, and 
 neither should anyone following this ridiculous saga.

I must be one of the clowns then. I don't follow your philosophy anymore.
Cost of the application over time is more important to me than trivial
optimizations. Trivial in the sense that if it doesn't account for more
than 5% of a program's execution time, why optimize it to death? My
philosophy regarding this is more along the lines of: code it legibly first,
and then profile it to locate areas that are worth optimizing.

 
  
 
3) The usage of the AA 'in' statement renders the array-syntax lookup 
redundant, since one uses the pointer to reach the data instead.

 
 
 Not so.

 
 Certainly! Let's waste our time looking the entry up once again, just 
 for jollies. And let's make sure the key is very, very long; and there's 
 lots of collisions in the hash table. Gratuitously wasting CPU cycles
 is surely a good thing.

And you have measured this, right?

  
 
4) Thus, the array-syntax is effectively worthless (and superfluous) for 
AA lookup, since pointer-syntax is a prerequisite to avoid GPFs. 
Further, the presence of AA array-syntax lookup is an invitation to 
write non-robust code.

 
 
 Not so.

 
 Please re-read. Array-syntax lookup /by itself/ is borked. It has to be 
 used in conjunction with 'in', and is therefore superfluous (since 'in' 
 supplies the data anyway). 

Well not actually so. The "in" supplies the key and not the data. The data
can be something totally different.

   real[char[]] AA;
   . . . 

   real X;
   . . . 
   if ("Some Key" in AA)
     X = AA["Some Key"];
   else
   {
     // Handle unknown key value.
   }


 Sure; you can lookup the AA again if you 
 wish, but your counter-argument is redundant; just like the additional 
 lookup. As noted, the existence of s=AA[], by itself, encourages rather 
 fragile code. D requires pointer-syntax to lookup an AA entry without 
 GPFing (redundant multiple lookups aside).

D does not *require* pointer syntax. It is optional. But for completeness
here is the pointer version.
 
   real* q;
   q = "Some Key" in AA;
   if (q !is null)
     X = *q;
   else
   {
     // Handle unknown key value.
   }
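A compilable sketch of the single-lookup `in` idiom above, written in present-day D (so `string` keys rather than `char[]`). `valueOr` is a hypothetical helper, not a language or runtime feature; on a miss it returns a caller-supplied fallback, where the snippet above would run its own error handling instead.

```d
/// Hypothetical helper: returns the value stored under `key`, or
/// `fallback` when the key is absent. The `in` operator performs the
/// only hash lookup and yields a pointer to the value, or null.
real valueOr(real[string] aa, string key, real fallback)
{
    if (auto q = key in aa)  // q is a real*; null means "no such key"
        return *q;
    return fallback;         // missing key: no GPF and no exception
}

unittest
{
    real[string] aa;
    aa["Some Key"] = 2.5;
    assert(valueOr(aa, "Some Key", 0.0) == 2.5);
    assert(valueOr(aa, "Other Key", -1.0) == -1.0);
}
```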

 
 What would have been nice, instead of Walter getting all upset over people
 not liking his implementation, is to provide all four types of access.
 
    'enquiry' ::  Key in Array (returns pointer to Value or Null)
    'get'     ::  Value = Array[Key] (Gets the Value if it exists,
                                      error otherwise)
    'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
                                      it doesn't exist.)
    'initget' ::  Value = Array.initset(Key) (Gets the Value if it exists,
                                           otherwise creates an entry with
                                           .init values.)
 
 Or any other equivalent syntax. The point is that there is no reason for
 the old behaviour to be totally removed from the language, just shifted
 away from being the default behaviour for 'Value = Array[Key]' syntax.
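One way the four access modes listed above could be layered over today's built-in associative arrays, as a sketch. The function names `enquiry`, `getOrThrow`, `set` and `initget` are hypothetical and chosen for illustration; only the `in` operator, indexing, and Phobos' `std.exception.enforce` are real. The 'get' mode uses `enforce`, so a missing key raises an exception whether or not `-release` is used.

```d
import std.exception : enforce;

// Hypothetical free functions sketching the four access modes,
// layered over D's built-in associative arrays. Illustrative names only.

/// 'enquiry': pointer to the stored value, or null if the key is absent.
V* enquiry(K, V)(V[K] aa, K key)
{
    return key in aa;
}

/// 'get': the stored value, or an Exception if the key is absent.
/// enforce is not removed by -release (unlike assert), so the
/// check stays in every build.
V getOrThrow(K, V)(V[K] aa, K key)
{
    return *enforce(key in aa, "key not found");
}

/// 'set': insert or replace. `aa` is passed by ref because an empty AA
/// may be null, and the first insertion must update the caller's variable.
void set(K, V)(ref V[K] aa, K key, V value)
{
    aa[key] = value;
}

/// 'initget': the stored value; on a miss, insert V.init and return it.
V initget(K, V)(ref V[K] aa, K key)
{
    if (auto p = key in aa)
        return *p;          // hit: a single lookup
    aa[key] = V.init;       // miss: a second lookup to insert
    return V.init;
}

unittest
{
    import std.exception : assertThrown;

    int[string] aa;
    assert(enquiry(aa, "a") is null);
    set(aa, "a", 1);
    assert(*enquiry(aa, "a") == 1);
    assert(getOrThrow(aa, "a") == 1);
    assertThrown(getOrThrow(aa, "b"));
    assert(initget(aa, "b") == 0);
    assert("b" in aa);      // initget created the entry
}
```

Only `initget` needs a second lookup, and only on the miss path.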
 

 
 The point of this is to expose the flaws in the current design. The 
 counterpoints you've made are based entirely upon the use of thoroughly 
 redundant additional lookups, so of what true value are they? 

They have true value to me. They help me write code which can be read by
other people because my intentions, etc., are made clearer in the code.

-- 
Derek Parnell
Melbourne, Australia
7/11/2005 2:17:21 AM

Nov 06 2005

kris <fu bar.org> writes:

I apologize for the long post. The salient point is right at the end, so 
please skip over the point/counterpoint argy-bargy.


Derek Parnell wrote:
 Currently in D, when one attempts to retrieve a non-existent element in an
 array, it causes a run-time error to occur. This applies to all array
 types: fixed-length, dynamic-length, and associative. (And yes, in the
 current D, an associative array is implemented as a hash-table.) The type
 of error depends on whether the -release switch has been used or not. If it
 has been used then a memory access violation occurs (i.e. a GPF, or a segfault under Unix),
 otherwise if -release was not used an ArrayBoundsError exception is thrown.

I see you've bought into that. There is no such thing as an array-bounds 
error from the API of a hash-table, Derek. It's purely a manufactured 
idiom of the current API.

 The problem is that you don't like this behaviour for associative arrays.

Really? If there's something I "don't like" here, it is an API that is 
problematic purely for the sake of using a particular syntax. You've 
read the tale of the Emperor's New Clothes, haven't you?


 I assume that when trying to fetch a non-existent element you would either
 like the element to be automatically created with .init value(s) and/or to
 return some initialized value, or to always throw an ArrayBoundsError
 regardless of the -release status. Which is it you'd like to see happen?

I can't understand why you feel these are the only options, Derek. I 
agree those are perhaps the options when using array-syntax, but that's 
exactly where the problem lies. Neither of your two options is
attractive; particularly so when they are purely artificial constraints.


I failed miserably to get your drift here. 

 
 
 I apologize. Sometimes I'm not as good with words as I think I am.
 
 I made the assumption that the library managed an AA, and that a public
 function was available that fetches data from that AA based on a supplied
 key in one of the parameters. I was just saying that if this is the case,
 then you'd be wise to validate the key data prior to fetching from the AA
 based on the externally supplied key value.

This is hardly on topic, and smells of smoke. I have to remind you that 
you do not, and should not, require redundant lookups to check if an 
entry exists before fetching it from a hash-table.


Hash tables are *not* like arrays. If they don't contain a key it is surely not
a reason to GPF. Is it?

 
 
 D's associative arrays are a specific type of hash table. The entries in
 the table are based on keys. And I agree, a GPF is only one of the possible
 implementation behaviors in response to a fetch attempt for an element
 that does not exist.

Well, thank goodness. But, "specific type"? It's just a plain old 
hash-table, with some unwieldy syntax bolted onto it. The latter is the
problem, not the former.


We're talking about this code causing a GPF:

char[char[]] AA;

char[] s = AA["unforseen key"];  // GPF; can't check for a null return

 
 
 This is why you might benefit from Walter reestablishing this sort of
 behaviour in D - in addition to the current AA behaviour. Sophisticated
 coders such as yourself can use such facilities. 

Sophisticated coders? Yer arse <g>. Hash tables are supposed to be 
trivial from the perspective of the user.


   char[] s = AA.initset("unforseen key");
 
 Now you can check for s.length == 0 if that's important to you. Of course,
 that isn't always a perfect way of detecting unforeseen key accesses.
 

Complexity just for the sake of it. This is entirely unnecessary.


 In spite of the "double lookup" effect, I would still code it thus ...
 
   char[] s;
   if ("unforseen key" in AA)
      s = AA["unforseen key"];
   else
   {
      // some error processing, if appropriate
   }
 
 because it tells the reader of the code that it is possible to get bad keys
 and it implements a way to handle those unambiguously.


I see. Pray explain why this alternate API is so appalling by comparison:

   char[] s;

   if (aa.get("unforseen key", s))
   {
       // do something with s
   }
   else
   {
      // do something else
   }

Look! No redundant lookups! No pointers! It must be magic! And, to quote 
you, "it tells the reader of the code that it is possible to get bad 
keys and it implements a way to handle those unambiguously". Wouldn't 
you agree?


Let's try to stay in the land of reason here. Yes, you can come up with 
all sorts of ways to /make/ it work with /multiple/ lookups. Walter
suggested a way to do it with three lookups instead. I know you 
appreciate optimal code paths, Derek, so can we sidestep this please? 

 
 
 You might be misunderstanding me, now. I prize maintainable source code
 over runtime performance any day. If run time performance is really such an
 issue, code in assembler; otherwise, get back into the land of reason.
 

Entirely misleading. Look at the example above and reconsider. You 
appear to be trying to turn this into something unrelated, Derek. Please 
desist. Yes, there is a performance related aspect here, but only 
because you insist on applying entirely redundant lookups.

One can write perfectly clear intentions (arguably more so) by using an 
alternate, and more appropriate, API.



The above code has two lookups, where only one should be necessary. I 
sure hope you avoid multiple lookups within Build?

 
 
 Not if I can help it ;-) By the way, Build runs pretty fast in spite of me
 'wasting' cycles checking for valid AA keys.


Build is a great tool. However, it appears as though Build takes longer
to execute than the compiler and linker together. It is not a high
performance application, because it doesn't really need to be. But 
that's hardly important since we're talking about API's here. Build is 
great at what it does ~ it does not represent every application.

Again, your statement vaguely implies that I argue against checking for 
valid AA keys. That's silly, Derek. I'm claiming that one can clearly 
and unambiguously both test the existence of, and avoid redundant 
lookups upon, a hash-table entry by using a more appropriate API.


 Isn't that what I said? Your code is performing a 'get' and not an
 'enquiry'. 

As far as hash tables are concerned, a get is equivalent to a query. You can
argue about separating them all you wish, but you're simply arguing for
redundant lookups. To avoid this is exactly why Walter is returning a 
pointer from the 'in' statement. Are you disagreeing with all perspectives?


Au contraire, my friend ~ unless you're prepared to perform unnecessary
multiple lookups. I don't consider redundant lookups to be relevant, and 
neither should anyone following this ridiculous saga.

 
 
 I must be one of the clowns then. I don't follow your philosophy anymore.
 Cost of the application over time is more important to me than trivial
 optimizations. Trivial in the sense that if it doesn't account for more
 than 5% of a program's execution time, why optimize it to death? My
 philosophy regarding this is more along the lines of: code it legibly first,
 and then profile it to locate areas that are worth optimizing.

(the saga is ridiculous because it's years old, whilst perfectly
suitable alternatives have existed for decades)

You're welcome to do double lookups all you want, Derek. However, you're 
insisting that D remain staunchly oblivious to better alternatives.

There's so much spin in your counter that I'm feeling dizzy. You're 
attempting to suggest I don't care a whit about legibility, and that any 
quest to avoid redundant code is misguided. That's utter nonsense.


 And you have measured this, right?

Nobody needs to measure it, Derek. If one executes two lookups where one 
would suffice, then one will expend close to twice the effort/time. It 
stands to reason. You're trying to argue that a single lookup somehow 
makes the code less clear (entirely false), therefore we should all use 
two lookups instead. It's a pointless argument. Please try to keep an 
open mind about alternative APIs.


Please re-read. Array-syntax lookup /by itself/ is borked. It has to be 
used in conjunction with 'in', and is therefore superfluous (since 'in' 
supplies the data anyway). 

 
 
 Well not actually so. The "in" supplies the key and not the data. The data
 can be something totally different.
 
    real[char[]] AA;
    . . . 
 
    real X;
    . . . 
    if ("Some Key" in AA)
      X = AA["Some Key"];
    else
    {
      // Handle unknown key value.
    }


What is your point there? That statement makes no sense at all. Here's
the pointer version of your example:

real[char[]] AA;
...
real* x;
...
x = ("Some Key" in AA);
if (x)
{
    // do something with *x
}
else
{
    // do something else
}


And here's a robust, simple, efficient API, sans pointers:

real[char[]] aa;
...
real x;
...
if (aa.get("Some Key", x))
{
    // do something with x
}
else
{
    // do something else
}



Sure; you can lookup the AA again if you 
wish, but your counter-argument is redundant; just like the additional 
lookup. As noted, the existence of s=AA[], by itself, encourages rather 
fragile code. D requires pointer-syntax to lookup an AA entry without 
GPFing (redundant multiple lookups aside).

 
 
 D does not *require* pointer syntax. It is optional. But for completeness
 here is the pointer version.


I'll repeat what I said above: "D requires pointer-syntax to lookup an 
AA entry without GPFing (redundant multiple lookups aside)"

Your counter chooses to ignore the parenthetical remark. Restated: to avoid
multiple lookups, and GPFs, one must use pointer syntax in D. Period.



What would have been nice, instead of Walter getting all upset over people
not liking his implementation, is to provide all four types of access.

   'enquiry' ::  Key in Array (returns pointer to Value or Null)
   'get'     ::  Value = Array[Key] (Gets the Value if it exists,
                                     error otherwise)
   'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
                                     it doesn't exist.)
   'initget' ::  Value = Array.initset(Key) (Gets the Value if it exists,
                                          otherwise creates an entry with
                                          .init values.)

Or any other equivalent syntax. The point is that there is no reason for
the old behaviour to be totally removed from the language, just shifted
away from being the default behaviour for 'Value = Array[Key]' syntax.




I agree. But I see you're insisting on force-fitting the [] syntax, 
resulting in a sub-optimal and overly busy API. All one needs is right here:

bool get(key, inout value);
void put(key, value);

That is simple, robust, intuitive, optimal, proven, succinct. No 
redundant lookups. No pointers anywhere to be seen.

The [] syntax seriously limits D in the API it can expose for these 
purposes. Which is why it's messy at this point. And it's why you have 
chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple 
methods are perfectly capable instead.
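A minimal sketch of the two-method API proposed here, wrapping a built-in associative array. `HashTable` is a hypothetical name, and `ref` is used where the post writes `inout` (`ref` being how current D spells a by-reference parameter). Each call performs exactly one hash lookup, and no pointers appear in the caller's code.

```d
/// Hypothetical wrapper exposing the proposed get/put pair.
struct HashTable(K, V)
{
    private V[K] table;

    /// Copies the value for `key` into `value` and returns true, or
    /// returns false and leaves `value` untouched. One lookup either way.
    bool get(K key, ref V value)
    {
        if (auto p = key in table)
        {
            value = *p;
            return true;
        }
        return false;
    }

    /// Inserts or replaces the entry for `key`.
    void put(K key, V value)
    {
        table[key] = value;
    }
}

unittest
{
    HashTable!(string, real) aa;
    aa.put("Some Key", 3.0);

    real x;
    assert(aa.get("Some Key", x) && x == 3.0);
    assert(!aa.get("Other Key", x));
    assert(x == 3.0);  // a miss leaves x unchanged
}
```

The sketch's unittest can be run with `dmd -unittest -main -run hashtable.d`.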

Nov 06 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 06 Nov 2005 13:23:40 -0800, kris <fu bar.org> wrote:
 Derek Parnell wrote:
 Currently in D, when one attempts to retrieve a non-existent element in an
 array, it causes a run-time error to occur. This applies to all array
 types: fixed-length, dynamic-length, and associative. (And yes, in the
 current D, an associative array is implemented as a hash-table.) The type
 of error depends on whether the -release switch has been used or not. If it
 has been used then a memory access violation occurs (i.e. a GPF, or a
 segfault under Unix), otherwise if -release was not used an
 ArrayBoundsError exception is thrown.

 I see you've bought into that. There is no such thing as an array-bounds  
 error from the API of a hash-table, Derek. It's purely a manufactured  
 idiom of the current API.

Sez you! ;)

Seriously though I disagree. I think it depends on what you're using it  
for. I have found the thrown exception useful for catching bugs in at  
least one app I have been writing. The code in question assumed a value  
existed, it was a program error for it not to exist. Thus, the current  
implementation, the current API was exactly what I desired in this case.  
"array bounds error" may not be exactly what it is, but whatever you want  
to call it an error when the item does not exist was a requirement in this  
case.

However, I agree with your original point. There are cases where it's  
never an error for the value to be non-existent; in fact I think perhaps
it's more common for this to be the case. In which case if you were to  
reword your statement above to say that an array bounds error was not  
common in the API of a hash table I would be quite happy to agree.

You guys seem to be arguing about all the wrong things. How about we start  
with what we want, i.e.

1. ability to code different "use cases" in a clear and simple manner.





See my reply to Sean in another branch of this thread, it has the API I  
would most like to see, essentially the addition of a function to check  



Regan

Nov 06 2005

Sean Kelly <sean f4.ca> writes:

kris wrote:
 
 I agree. But I see you're insisting on force-fitting the [] syntax, 
 resulting in a sub-optimal and overly busy API. All one needs is right 
 here:
 
 bool get(key, inout value);
 void put(key, value);
 
 That is simple, robust, intuitive, optimal, proven, succinct. No 
 redundant lookups. No pointers anywhere to be seen.
 
 The [] syntax seriously limits D in the API it can expose for these 
 purposes. Which is why it's messy at this point. And it's why you have 
 chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple 
 methods are perfectly capable instead.

I like the [] and 'in' syntax as it was originally implemented, as it 
covered the majority of cases that I typically use dictionaries: either 
testing for existence or adding/modifying something already there.  I 
personally have never used the [] syntax, for example, in instances 
where I did not want a value to be created if one did not exist, 
assuming it's modifying an lvalue.  ie.

var[key]++;
var[key] = val;

The only sticky issue with this syntax is how to handle rvalue expressions:

x = var[key];

Does the above insert or merely return the init() value?  I would prefer 
the latter, but I can see how it would be confusing.  Assuming creation 
in all cases seems entirely reasonable to me, and it would be consistent
with the C++ syntax.

That aside, I would like to see your proposed get/put syntax added as it 
is both meaningful and relatively succinct.


Sean

Nov 06 2005

Derek Parnell <derek psych.ward> writes:

On Sun, 06 Nov 2005 13:23:40 -0800, kris wrote:

 I apologize for the long post. The salient point is right at the end, so 
 please skip over the point/counterpoint argy-bargy.
 
 
 Derek Parnell wrote:
 Currently in D, when one attempts to retrieve a non-existent element in an
 array, it causes a run-time error to occur. This applies to all array
 types: fixed-length, dynamic-length, and associative. (And yes, in the
 current D, an associative array is implemented as a hash-table.) The type
 of error depends on whether the -release switch has been used or not. If it
 has been used then a memory access violation occurs (i.e. a GPF, or a segfault under Unix),
 otherwise if -release was not used an ArrayBoundsError exception is thrown.

 
 I see you've bought into that. There is no such thing as an array-bounds 
 error from the API of a hash-table, Derek. It's purely a manufactured 
 idiom of the current API.

That may be true. However, I am working with what we've got, knowing that
change to D is such an unlikely thing that we'd really be better off
building Beeblebrox's Probability Drive.

 The problem is that you don't like this behaviour for associative arrays.

 
 Really? If there's something I "don't like" here, it is an API that is 
 problematic purely for the sake of using a particular syntax. You've 
 read the tale of the Emperor's New Clothes, haven't you?

I see where you're coming from now. And I have to agree that the
functionality of AAs is being restricted by a strict adherence to the
'array' style of syntax.

 I assume that when trying to fetch a non-existent element you would either
 like the element to be automatically created with .init value(s) and/or to
 return some initialized value, or to always throw an ArrayBoundsError
 regardless of the -release status. Which is it you'd like to see happen?

 
 I can't understand why you feel these are the only options, Derek. I 
 agree those are perhaps the options when using array-syntax, but that's 
 exactly where the problem lies. Neither of your two options is
 attractive; particularly so when they are purely artificial constraints.

Can you help me see another option? If one is trying to access a
non-existent element, one either wants to know that it didn't exist or
wants a default value returned. What else could there be?

I failed miserably to get your drift here. 

 
 
 I apologize. Sometimes I'm not as good with words as I think I am.
 
 I made the assumption that the library managed an AA, and that a public
 function was available that fetches data from that AA based on a supplied
 key in one of the parameters. I was just saying that if this is the case,
 then you'd be wise to validate the key data prior to fetching from the AA
 based on the externally supplied key value.

 
 This is hardly on topic, and smells of smoke. I have to remind you that 
 you do not, and should not, require redundant lookups to check if an 
 entry exists before fetching it from a hash-table.

Of course one does not *require* redundant run time lookups. I still think
that I was 'on topic'. I thought the original post came about because you
have a library routine that GPFed when presented with a non-existent key.
My point was that **GIVEN THE TOOLS WE HAVE** you'd be wise to cater for
the possibility of 'bad' keys. If we had other tools (for example, a better
AA functionality) then you'd approach this topic differently.

Hash tables are *not* like arrays. If they don't contain a key it is surely not
a reason to GPF. Is it?

 
 
 D's associative arrays are a specific type of hash table. The entries in
 the table are based on keys. And I agree, a GPF is only one of the possible
 implementation behaviors in response to a fetch attempt for an element
 that does not exist.

 
 Well, thank goodness. But, "specific type"? It's just a plain old 
 hash-table, with some unwieldy syntax bolted onto it. The latter is the
 problem, not the former.

 
Okay. 'Specific type' in the sense that some hash tables are only used to
detect the presence of the element keys, whereas other types of hash tables
associate non-key data with the elements.

We're talking about this code causing a GPF:

char[char[]] AA;

char[] s = AA["unforseen key"];  // GPF; can't check for a null return

 
 
 This is why you might benefit from Walter reestablishing this sort of
 behaviour in D - in addition to the current AA behaviour. Sophisticated
 coders such as yourself can use such facilities. 

 
 Sophisticated coders? Yer arse <g>. Hash tables are supposed to be 
 trivial from the perspective of the user.

Totally agree.

 
   char[] s = AA.initset("unforseen key");
 
 Now you can check for s.length == 0 if that's important to you. Of course,
 that isn't always a perfect way of detecting unforeseen key accesses.
 

 
 Complexity just for the sake of it. This is entirely unnecessary.

We agree to differ.

 
 In spite of the "double lookup" effect, I would still code it thus ...
 
   char[] s;
   if ("unforseen key" in AA)
      s = AA["unforseen key"];
   else
   {
      // some error processing, if appropriate
   }
 
 because it tells the reader of the code that it is possible to get bad keys
 and it implements a way to handle those unambiguously.

 
 
 I see. Pray explain why this alternate API is so appalling by comparison:
 
    char[] s;
 
    if (aa.get("unforseen key", s))
    {
        // do something with s
    }
    else
    {
       // do something else
    }

It isn't appalling. Did I say that it was? In fact it is identical to my
example, except for the syntax. I'm trying to discuss concepts, and not
syntax.

 Look! No redundant lookups! No pointers! It must be magic! And, to quote 
 you, "it tells the reader of the code that it is possible to get bad 
 keys and it implements a way to handle those unambiguously". Wouldn't 
 you agree?

Yes. It is identical to my code (bar the syntax).

Let's try to stay in the land of reason here. Yes, you can come up with 
all sorts of ways to /make/ it work with /multiple/ lookups. Walter
suggested a way to do it with three lookups instead. I know you 
appreciate optimal code paths, Derek, so can we sidestep this please? 

 
 
 You might be misunderstanding me, now. I prize maintainable source code
 over runtime performance any day. If run time performance is really such an
 issue, code in assembler; otherwise, get back into the land of reason.
 

 
 Entirely misleading. Look at the example above and reconsider. You 
 appear to be trying to turn this into something unrelated, Derek. Please 
 desist. Yes, there is a performance related aspect here, but only 
 because you insist on applying entirely redundant lookups.
 
 One can write perfectly clear intentions (arguably more so) by using an 
 alternate, and more appropriate, API.

I agree. I didn't know your issue was with the syntax, as your original
post was talking about GPFs and not syntax. I admit my mistake in not
understanding your point of view regarding syntax.

The above code has two lookups, where only one should be necessary. I 
sure hope you avoid multiple lookups within Build?

 
 
 Not if I can help it ;-) By the way, Build runs pretty fast in spite of me
 'wasting' cycles checking for valid AA keys.

 
 
 Build is a great tool. However, it appears as though Build takes longer
 to execute than the compiler and linker together.

That would be because it does a shit load of work before calling the bloody
compiler and linker!

 It is not a high 
 performance application, because it doesn't really need to be. 

Exactly. And if it were, I'd definitely reconsider some of the coding idioms
used.

 But that's hardly important since we're talking about API's here. Build is
 great at what it does ~ it does not represent every application.
 
 Again, your statement vaguely implies that I argue against checking for 
 valid AA keys. That's silly, Derek. I'm claiming that one can clearly 
 and unambiguously both test the existence of, and avoid redundant 
 lookups upon, a hash-table entry by using a more appropriate API.

That was the part that I didn't get. Sorry for the waste of bandwidth. 

 Isn't that what I said? Your code is performing a 'get' and not an
 'enquiry'. 

 
 As far as HT's are concerned, a get is equivalent to a query.

Only for certain types of hash tables. If I want to get the data associated
with a key I need to validate the key before getting the data.

 You can 
 argue about separating them all you wish, but you're simply arguing for
 redundant lookups. To avoid this is exactly why Walter is returning a 
 pointer from the 'in' statement. Are you disagreeing with all perspectives?

No! Where did that come from?! I imagine that a pointer is being returned so
that the coder can get access to data (not the key) when a valid key is
presented.

 
Au contraire, my friend ~ unless you're prepared to perform unnecessary
multiple lookups. I don't consider redundant lookups to be relevant, and 
neither should anyone following this ridiculous saga.

 
 
 I must be one of the clowns then. I don't follow your philosophy anymore.
 Cost of the application over time is more important to me than trivial
 optimizations. Trivial in the sense that if it doesn't account for more
 than 5% of a program's execution time, why optimize it to death? My
 philosophy regarding this is more along the lines of: code it legibly first,
 and then profile it to locate areas that are worth optimizing.

 
 (the saga is ridiculous because it's years' old, whilst perfectly 
 suitable alternatives have existed for decades)
 
 You're welcome to do double lookups all you want, Derek. However, you're 
 insisting that D remain staunchly oblivious to better alternatives.

How *do* you read this into my words? I cannot understand where I have said
that the current D syntax is the best available and we should stop looking
for better? I'm sure that anyone could discover with a short scan of
previous posts, that I'm one of Walter's biggest critics. D is a great
language but I'm one of the first to say that some decisions that Walter
has made are terrible (IMNSHO), and that some other non-decisions are
inexcusable.

 There's so much spin in your counter that I'm feeling dizzy. You're 
 attempting to suggest I don't care a whit about legibility, and that any 
 quest to avoid redundant code is misguided. That's utter nonsense.

As are the words you've placed into my posts ;-)
 
 And you have measured this, right?

 
 Nobody needs to measure it, Derek. If one executes two lookups where one 
 would suffice, then one will expend close to twice the effort/time. It 
 stands to reason. You're trying to argue that a single lookup somehow 
 makes the code less clear (entirely false), therefore we should all use 
 two lookups instead. It's a pointless argument. Please try to keep an 
 open mind about alternative APIs.

My mind is not, and has never been closed (on that issue anyway). Of course
two lookups are going to take longer than one lookup! But there are some
situations that it doesn't actually matter.

Please re-read. Array-syntax lookup /by itself/ is borked. It has to be 
used in conjunction with 'in', and is therefore superfluous (since 'in' 
supplies the data anyway). 

 
 
 Well not actually so. The "in" supplies the key and not the data. The data
 can be something totally different.
 
    real[char[]] AA;
    ...
 
    real X;
    ...
    if ("Some Key" in AA)
      X = AA["Some Key"];
    else
      // Handle unknown key value.

 
 
 What is your point there? That statement makes no sense at all. Here's 
 the pointer version of your example:
 
 real[char[]] AA;
 ...
 real* x;
 ...
 x = ("Some Key" in AA);
 if (x)
      // do something with *x
 else
     // do something else
 
 
 And here's a robust, simple, efficient API, sans pointers:
 
 real[char[]] aa;
 ...
 real x;
 ...
 if (aa.get("Some Key", x))
      // do something with x
 else
     // do something else
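A minimal sketch of that `bool get(key, value)` style as a free function over the built-in AA, assuming present-day D2 (`string` keys, `ref` in place of `inout`). The name `tryGet` is hypothetical, chosen so it does not clash with the `get(key, default)` property later D releases added. It still uses `in` internally, so the pointer stays hidden in one place and there is a single hash lookup.

```d
/// Hypothetical helper (not a built-in AA method): looks `key` up once.
/// Returns true and copies the entry into `value` if the key exists;
/// returns false and leaves `value` untouched otherwise.
bool tryGet(K, V)(V[K] aa, K key, ref V value)
{
    if (auto p = key in aa)   // `in` yields a pointer to the value, or null
    {
        value = *p;
        return true;
    }
    return false;
}

unittest
{
    real[string] aa;
    aa["Some Key"] = 2.5;

    real x;
    if (aa.tryGet("Some Key", x))
        assert(x == 2.5);     // do something with x
    else
        assert(false);        // do something else
}
```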
 

The only difference is syntax. The concepts are the same.
 
Sure; you can lookup the AA again if you 
wish, but your counter-argument is redundant; just like the additional 
lookup. As noted, the existence of s=AA[], by itself, encourages rather 
fragile code. D requires pointer-syntax to lookup an AA entry without 
GPFing (redundant multiple lookups aside).

 
 
 D does not *require* pointer syntax. It is optional. But for completeness
 here is the pointer version.

 
 
 I'll repeat what I said above: "D requires pointer-syntax to lookup an 
 AA entry without GPFing (redundant multiple lookups aside)"

Agreed.
 
 Your counter chooses to ignore the parenthetical. Restated: to avoid 
 multiple lookups, and GPFs, one must use pointer syntax in D. Period.

Agreed.

What would have been nice, instead of Walter getting all upset over people
not liking his implementation, is to provide all four types of access.

   'enquiry' ::  Key in Array (returns pointer to Value or Null)
   'get'     ::  Value = Array[Key] (Gets the Value if it exists,
                                     error otherwise)
   'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
                                     it doesn't exist.)
   'initget' ::  Value = Array.initget(Key) (Gets the Value if it exists,
                                          otherwise creates an entry with
                                          .init values.)
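The 'initget' row is essentially the old insert-on-read behaviour, moved to a named call. A minimal sketch as a free function, assuming present-day D2; `initGet` is a hypothetical name. (Later D releases added a built-in `require(key, value)` along the same lines.)

```d
/// Hypothetical helper for the 'initget' access type: returns the value for
/// `key`, first inserting an entry holding V.init if the key is missing.
/// `aa` is taken by `ref`: inserting into a still-empty (null) AA passed by
/// value would only update the callee's copy.
V initGet(K, V)(ref V[K] aa, K key)
{
    if (auto p = key in aa)
        return *p;            // hit: a single lookup
    aa[key] = V.init;         // miss: create the default entry
    return V.init;
}

unittest
{
    int[string] counts;
    assert(counts.initGet("x") == 0);   // created with int.init
    assert(("x" in counts) !is null);   // ...and now present
    counts["x"] = 5;
    assert(counts.initGet("x") == 5);   // existing value returned untouched
}
```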

Or any other equivalent syntax. The point is that there is no reason for
the old behaviour to be totally removed from the language, just shifted
away from being the default behaviour for 'Value = Array[Key]' syntax.



 
 
 I agree. But I see you're insisting on force-fitting the [] syntax, 
 resulting in a sub-optimal and overly busy API.

Well actually it turns out that I was just trying to work within the
constraints that Walter has implemented. I didn't know that you were really
advocating syntax change. I wish for better functionality in AA's too.

 All one needs is right here:
 
 bool get(key, inout value);
 void put(key, value);
 
 That is simple, robust, intuitive, optimal, proven, succinct. No 
 redundant lookups. No pointers anywhere to be seen.

Well, 'inout' implements a pointer, but that's splitting hairs.
 
 The [] syntax seriously limits D in the API it can expose for these 
 purposes. Which is why it's messy at this point. And it's why you have 
 chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple 
 methods are perfectly capable instead.

Yep. A new syntax for AA would be a wonderful addition to D.

-- 
Derek Parnell
Melbourne, Australia
7/11/2005 9:15:39 AM

Nov 06 2005

kris <fu bar.org> writes:

Thanks ~ I'm really glad that's cleared up! Some replies inline:

Derek Parnell wrote:
 On Sun, 06 Nov 2005 13:23:40 -0800, kris wrote:

 That may be true. However I am working with what we've got, knowing that
 change to D is such an unlikely thing that we'd really be better off
 building Beeblebrox's Probability Drive.

Yes ~ there is that aspect. We should know better by now <g>

 Can you help me see another option? If one is trying to access an
 non-existent element, one either wants to know that it didn't exist or
 wants a default value returned. What else could there be?

Oh, I think one wants to know the entry does not exist; just via a more 
suitable API. I believe a "bool get(key, inout value)" style of function 
resolves those issues. I suspect you'd agree. It's something that 
various folks were asking for /eons/ ago. Worth revisiting, I felt.


 My point was that **GIVEN THE TOOLS WE HAVE** you'd be wise to cater for
 the possibility of 'bad' keys. If we had other tools (for example, a better
 AA functionality) then you'd approach this topic differently.

Point taken. Sorry for misconstruing your perspective.

 That would be because it does a shit load of work before calling the bloody
 compiler and linker!

Sure, it will take longer ~ I meant it takes about twice as long. Of 
course, that's neither here nor there since Build is such a great tool.

There's so much spin in your counter that I'm feeling dizzy. You're 
attempting to suggest I don't care a whit about legibility, and that any 
quest to avoid redundant code is misguided. That's utter nonsense.

 
 
 As are the words you've placed into my posts ;-)

Touche <g>

The [] syntax seriously limits D in the API it can expose for these 
purposes. Which is why it's messy at this point. And it's why you have 
chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple 
methods are perfectly capable instead.

 
 
 Yep. A new syntax for AA would be a wonderful addition to D.

I feel it is actually more essential than that ~ but wholeheartedly agree 
otherwise. The original syntax could well have stayed intact, if it were 
bolstered via the addition of a "bool get(key, inout value)" method.

Nov 06 2005

"Unknown W. Brackets" <unknown simplemachines.org> writes:

I would strongly argue that if you want such checking, you should have 
two versions of the library: one with -release, and one without.

For example, if I write a C program in release mode, and pass negative 
coordinates to a function that renders data to the screen (obviously, 
assuming it took 'int's), I would not be surprised if it crashed.  Nor 
would I complain to the makers of my compiler or library.  I am passing 
bad data.

It's clear you disagree; you want to catch every case of the bad data 
(even though, I'm entirely sure, there are places in your library where 
your OWN logic might cause bugs/crashes because of bad data.)

Back to the dual library concept, I think this is more of an argument 
for that than for changing the way associative arrays are handled 
*again*.  If I could have some way to compile my D program with the 
contracts-on version of Phobos, I'm sure that would be a great gain.

Anyway, your arguments are also flawed, as follows:

1. This is true.

2. This is true, only in the case that you use release mode and want to 
avoid GPFs when bad data is provided (assuming that a GPF can be stack 
traced, either by code in the program or a debugger.  That is outside 
this issue, so we assume reasonable-case.)

3. This is not true.  Having a bike, even if you need to use it to get 
gas sometimes, does not render your car redundant nor useless.  Even if 
gas prices are so high that you cannot use the car, that does not mean 
your wife would appreciate you selling it.  As argued elsewhere, the 
usage of the in statement does not necessitate using pointers to access 
the data at all.

4. Obviously, this is a bizarre conclusion to make.  For your uses, 
surely we might agree that the array-like syntax isn't commonly useful, 
but for other usage - indeed, for common usage - I really can't see such 
a wild statement being true.

Furthermore, saying that having an array-style syntax is an invitation 
to writing bad code is something some can (and have) said about arrays, 
pointers, classes, class-less functions, couches, and generally 
everything else.  Yes, for your uses of it, an inexperienced novice 
might fall into bad habits with such syntax, but that does not again 
mean it applies everywhere.

-[Unknown]

Nov 06 2005

kris <fu bar.org> writes:

Good points, but with some caveats:

Unknown W. Brackets wrote:
 I would strongly argue that if you want such checking, you should have 
 two versions of the library: one with -release, and one without.
 
 For example, if I write a C program in release mode, and pass negative 
 coordinates to a function that renders data to the screen (obviously, 
 assuming it took 'int's), I would not be surprised if it crashed.  Nor 
 would I complain to the makers of my compiler or library.  I am passing 
 bad data.
 
 It's clear you disagree; you want to catch every case of the bad data 
 (even though, I'm entirely sure, there are places in your library where 
 your OWN logic might cause bugs/crashes because of bad data.)

I must have made a serious mistake in the couching of that argument, 
since I'm not at all familiar with this angle. Sorry about that. The 
intent was to identify where [] syntax limits the expressiveness of the 
AA API; to the point where it trips over itself. Wanted to isolate that 
as a discussion point before making any suggestion as to how it might be 
resolved. Clearly, I failed pitifully in that goal.

As to catching a bad case of data, I'm much happier dealing with that on 
my own (rather than the compiler assuming it knows all). Given an 
amended API, everything would be groovy.


 Anyway, your arguments are also flawed, as follows:

 3. This is not true.  Having a bike, even if you need to use it to get 
 gas sometimes, does not render your car redundant nor useless.  Even if 
 gas prices are so high that you cannot use the car, that does not mean 
 your wife would appreciate you selling it.  As argued elsewhere, the 
 usage of the in statement does not necessitate using pointers to access 
 the data at all.

Well stated; though this point was amended to exclude multiple lookups 
as a reasonable alternative (as had originally been assumed). Hence, I 
feel it stands.

 
 4. Obviously, this is a bizarre conclusion to make.  For your uses, 
 surely we might agree that the array-like syntax isn't commonly useful, 
 but for other usage - indeed, for common usage - I really can't see such 
 a wild statement being true.

Fair enough. I feel it's valid, since I don't see any point in using [] 
rvalues without using 'in' to avoid GPFs. Given the need for 'in', one 
doesn't need a redundant lookup via []. But, hey ~ if we were to get an 
additional method of the style that's apparently agreeable, then we'll 
have good reason to rejoice and to leap about with gay abandon:




 Furthermore, saying that having an array-style syntax is an invitation 
 to writing bad code is something some can (and have) said about arrays, 
 pointers, classes, class-less functions, couches, and generally 
 everything else.  Yes, for your uses of it, an inexperienced novice 
 might fall into bad habits with such syntax, but that does not again 
 mean it applies everywhere.

Indeed <G>


Cheers.

Nov 07 2005

"Unknown W. Brackets" <unknown simplemachines.org> writes:

 I must have made a serious mistake in the couching of that argument, 
 since I'm not at all familiar with this angle. Sorry about that. The 
 intent was to identify where [] syntax limits the expressiveness of the 
 AA API; to the point where it trips over itself. Wanted to isolate that 
 as a discussion point before making any suggestion as to how it might be 
 resolved. Clearly, I failed pitifully in that goal.

Perhaps it does.  But, this is contrasted with the fact that every 

PHP, Perl, and others - uses the same syntax.  It may be well and good 
to explain how much healthier an apple is than candy, but that doesn't 
mean stores are going to replace their candy with apples.

In PHP, typically, you do something like this:

isset($var['associative key']) ? $var['associative key'] : 'fall back';

It's actually pretty ugly.  You'll also notice the double lookup. 
Otherwise, you may get an error - if associative key does not exist. 
Mind you, most people ignore this error (which leads to bugs.)  The PHP 
Group is actually considering some other sort of syntax, like 
ifsetor($var['associative key'], 'fall back'), which is much easier.
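For comparison, present-day D2 (well after this thread) has a built-in AA property `get(key, defaultValue)` that plays the role of the proposed `ifsetor`: one lookup, and the fallback is returned without being inserted. A minimal illustration:

```d
unittest
{
    string[string] table;
    table["present"] = "found";

    // Built-in D2 AA property: the default is returned on a miss,
    // and nothing is added to the table.
    assert(table.get("associative key", "fall back") == "fall back");
    assert(table.get("present", "fall back") == "found");
    assert(("associative key" in table) is null);
}
```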

But, still, that's using (or maybe abusing) the array-style syntax.  On 
the other hand, this:

int val;
if (assoc.get("associative key", val))
    writef("associative key: %d\n", val);

Looks like a class, yes, but not like an associative array.  I must say, 
while this has been the strength of C++ in many ways (as little as 
possible built in), it has also been (in my view) the reason why other 
languages, like even D, look so much better.

 Well stated; though this point was amended to exclude multiple lookups 
 as a reasonable alternative (as had originally been assumed). Hence, I 
 feel it stands.

Only assuming the worst-case theory that multiple lookups will happen. 
I think it very reasonable to assume a compiler might cache/optimize out 
such double lookups, even if the current does not.

This reminds me of web browsers.  You may not know, but there is one 
from the W3C (which writes/wrote HTML itself.)  It's called Amaya.  It 
sucks.  In fact, I'm not sure it handles their own standards as well as 
Opera/Mozilla/Safari.

While DMD may be good (I especially like the compile times), time will 
only tell if it's the best compiler for D in the future.

-[Unknown]

Nov 07 2005

"Kris" <fu bar.com> writes:

"Unknown W. Brackets" <unknown simplemachines.org> wrote
 In PHP, typically, you do something like this:

 isset($var['associative key']) ? $var['associative key'] : 'fall back';

 It's actually pretty ugly.  You'll also notice the double lookup. 
 Otherwise, you may get an error - if associative key does not exist. Mind 
 you, most people ignore this error (which leads to bugs.)  The PHP Group 
 is actually considering some other sort of syntax, like 
 ifsetor($var['associative key'], 'fall back'), which is much easier.

Interesting, although scripting languages do have a different set of 
priorities.


 But, still, that's using (or maybe abusing) the array-style syntax.  On 
 the other hand, this:

 int val;
 if (assoc.get("associative key", val))
    writef("associative key: %d\n", val);

 Looks like a class, yes, but not like an associative array.  I must say, 
 while this has been the strength of C++ in many ways (as little as 
 possible built in), it has also been (in my view) the reason why other 
 languages, like even D, look so much better.

Perhaps. Though I'd argue that the functionality is more important than the 
way it looks. Maybe you'd prefer an array-style version of the above:

bool opIndex (key, inout value);
~~~~~~~~~~~~~~~~~~~~~
int val;

if ( assoc["key", val] )
     writef("key: %d\n", val);
~~~~~~~~~~~~~~~~~~~~~

Would that be more apropos?
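D's operator overloading does allow an extra `ref` argument in `opIndex`, so the array-style query form can be sketched on a user-defined wrapper. This is a hypothetical type (`Assoc` is an illustrative name), written for present-day D2; it is not how the built-in AA behaves.

```d
/// Hypothetical wrapper type sketching the `assoc[key, value]` query form.
struct Assoc(K, V)
{
    private V[K] data;   // the built-in AA does the real work

    /// assoc[key, value]: true (and `value` filled in) if `key` exists.
    bool opIndex(K key, ref V value)
    {
        if (auto p = key in data)
        {
            value = *p;
            return true;
        }
        return false;    // `value` is left untouched
    }

    /// assoc[key] = value: create or replace.
    void opIndexAssign(V value, K key)
    {
        data[key] = value;
    }
}

unittest
{
    Assoc!(string, int) assoc;
    assoc["key"] = 42;

    int val;
    if (assoc["key", val])
        assert(val == 42);
    assert(!assoc["missing", val]);
}
```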


 While DMD may be good (I especially like the compile times), time will 
 only tell if it's the best compiler for D in the future.

True. Yet one must be able to depend on the feature-set across compiler 
implementations. This is a particularly sensitive concern since AAs are not 
part of the library; instead they are embedded within the language proper.

- Kris

Nov 07 2005

"Walter Bright" <newshound digitalmars.com> writes:

"Kris" <fu bar.com> wrote in message news:djumru$1a9$1 digitaldaemon.com...
 although this new behaviour of
 throwing an exception is, I think, highly questionable. How did that get by
 the wolves in the first place? <g>

It was repeatedly asked for.

 For example, I quite often use an AA to identify things as 'special' ~ URL
 schemes for example. If the scheme is not in the AA then it ain't special.
 The missing case is /not/ exceptional; instead it is actually the norm; I
 certainly don't wish to be catching exceptions for the normal case (from
 either a semantic or performance perspective). Nor do I wish to use pointers
 for such usual, simplistic, cases.

That's why the 'in' version is there.

 OTOH, what you did with the 'in' keyword and pointers improved that aspect
 of it ~ if one wants to eliminate a potential double-lookup then one can use
 the pointer syntax. Good!

 The problem here is that, at the same time, you changed the semantics of a
 simple lookup such that it now either requires pointer-syntax, the overhead
 of try/catch, or yet another lookup. I think that was a mistake, and am not
 too shy to say so :-)

I liked the previous syntax, and the way it worked, because it was
efficient. But nobody, not one, spoke out in favor of it, and all heaped
ridicule on it (and not completely without merit, I threw in the towel on it
when it was pointed out that javascript didn't do it that way either, though
I thought it did). Sorry if I'm a little sensitive about this <g>.

 Lastly, I /do/ actually have a kind word to say about the original
 implementation: other than the potential double-lookup, it was fast, and it
 was simple. I still think AA's could/should have been handled via templates
 when they came along, and could therefore have been treated as a library
 utility rather than being built into the compiler itself. Regardless, the
 usage model is now arguably slower and more complex than before ~ largely
 negating the effort of placing AA's within the compiler in the first place.
 IMO.

For most uses of AAs, the lookups of existing entries one expects to be in
there far outnumber the test/set style, so while it is a bit slower, it
isn't appreciably.

Nov 05 2005

Sean Kelly <sean f4.ca> writes:

Walter Bright wrote:
 
 I liked the previous syntax, and the way it worked, because it was
 efficient. But nobody, not one, spoke out in favor of it, and all heaped
 ridicule on it (and not completely without merit, I threw in the towel on it
 when it was pointed out that javascript didn't do it that way either, though
 I thought it did). Sorry if I'm a little sensitive about this <g>.

For what it's worth, I liked it too.  And I believe the C++ map worked 
this way (as justification for the design).

 For most uses of AAs, the lookups of existing entries one expects to be in
 there far outnumber the test/set style, so while it is a bit slower, it
 isn't appreciably.

I would have preferred leaving the existing syntax as-is and adding a 
new method called 'find' or some such that returned a pointer to the 
element or null if it doesn't exist.


Sean

Nov 06 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 06 Nov 2005 11:29:28 -0800, Sean Kelly <sean f4.ca> wrote:
 Walter Bright wrote:
  I liked the previous syntax, and the way it worked, because it was
 efficient. But nobody, not one, spoke out in favor of it, and all heaped
 ridicule on it (and not completely without merit, I threw in the towel on it
 when it was pointed out that javascript didn't do it that way either, though
 I thought it did). Sorry if I'm a little sensitive about this <g>.

 For what it's worth, I liked it too.  And I believe the C++ map worked  
 this way (as justification for the design).

 For most uses of AAs, the lookups of existing entries one expects to be in
 there far outnumber the test/set style, so while it is a bit slower, it
 isn't appreciably.

 I would have preferred leaving the existing syntax as-is and adding a  
 new method called 'find' or some such that returned a pointer to the  
 element or null if it doesn't exist.

I think we can and should avoid pointers.

I think the types of things we want to do can be broken into categories:

1. 'check' for existence of an item.
2. 'check' for existence of an item and get it.
3. 'get' value, error if not exists.
4. 'set' value, create or replace existing.
[optional]
5. 'set' value if not existing, i.e. create only, don't replace.
6. 'set' value if existing, i.e. replace only, don't create.
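A minimal sketch mapping these categories onto present-day D2. Categories 1 to 4 are already built-in AA operations; 5 and 6 are written as helpers whose names (`insertIfAbsent`, `replaceIfPresent`) are hypothetical.

```d
// Categories 1-4 map directly onto built-in D2 AA operations:
//   1. check:          (key in aa) !is null
//   2. check and get:  if (auto p = key in aa) { use(*p); }
//   3. get or error:   aa[key]   (raises a range error in checked builds)
//   4. set:            aa[key] = value

/// 5. Create only: returns false (and changes nothing) if `key` exists.
bool insertIfAbsent(K, V)(ref V[K] aa, K key, V value)
{
    if (key in aa)
        return false;
    aa[key] = value;          // note: a second lookup on the insert path
    return true;
}

/// 6. Replace only: returns false (and creates nothing) if `key` is missing.
bool replaceIfPresent(K, V)(ref V[K] aa, K key, V value)
{
    if (auto p = key in aa)   // one lookup, then write through the pointer
    {
        *p = value;
        return true;
    }
    return false;
}

unittest
{
    int[string] aa;
    aa["a"] = 1;                              // 4. set
    assert(("a" in aa) !is null);             // 1. check
    if (auto p = "a" in aa) assert(*p == 1);  // 2. check and get
    assert(aa["a"] == 1);                     // 3. get (key known to exist)

    assert(insertIfAbsent(aa, "b", 2));       // 5. creates "b"
    assert(!insertIfAbsent(aa, "b", 9) && aa["b"] == 2);
    assert(replaceIfPresent(aa, "b", 3) && aa["b"] == 3);
    assert(!replaceIfPresent(aa, "c", 4) && ("c" in aa) is null);
}
```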

I think ideally we want to be able to achieve all of the above without  
double lookups.
I think we can, or come pretty close without too many changes, here is  
what I recommend:


leave it as is, or change it back to returning true/false.
(NOCHANGE)


returns true/false and gets value if existing.
(ADD)


   - 'value = aa[key]'
returns value or throws error.
(ADD/NOCHANGE)


   - 'aa[key] = value'
creates or replaces value for key with (v).
(ADD/NOCHANGE)

[optional]

return true and assign value if non-existent (creating it), false
otherwise and get existing value.
(ADD)


return true and assign value if exists (replacing), false otherwise
(ADD)


if (!aa.find(key,cur)) { aa[key] = value; }
if (aa.find(key,cur)) { aa[key] = value; }

I believe we need one get/find method that throws and one that doesn't  
(get throws, find doesn't) allowing us to make our intentions clear, i.e.  
you use the one that throws in cases where the item should exist, and not  
existing is an error.
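The throwing half of that pair can be sketched on top of `in`, so the caller chooses the error instead of relying on the built-in range check. `getOrThrow` and the plain `Exception` it raises are assumptions for illustration; the non-throwing half is the `key in aa` pointer test (or a `bool find(key, ref value)` wrapper around it).

```d
import std.conv : to;

/// Hypothetical 'get' for keys that are expected to exist: a miss is an error.
V getOrThrow(K, V)(V[K] aa, K key)
{
    if (auto p = key in aa)
        return *p;                       // single lookup on the normal path
    throw new Exception("missing key: " ~ key.to!string);
}

unittest
{
    int[string] aa;
    aa["present"] = 1;
    assert(aa.getOrThrow("present") == 1);

    bool threw = false;
    try
    {
        aa.getOrThrow("absent");
    }
    catch (Exception e)
    {
        threw = true;
    }
    assert(threw);
}
```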

I think 'find' is the essential component of AA's which we are missing at  
present.

I really don't mind what the syntax looks like, be it method style i.e.  
"value = aa.get(key)" or array style "value = aa[key]".

However, I think one consistent style is a good idea, and I don't think  
it's possible for the array style to represent the different intentions we  
have, which is why 'find' is essential.

Regan

Nov 06 2005

Derek Parnell <derek psych.ward> writes:

On Mon, 07 Nov 2005 10:13:22 +1300, Regan Heath wrote:


[snip]
 I think the types of things we want to do can be broken into categories:
 
 1. 'check' for existence of an item.
 2. 'check' for existence of an item and get it.
 3. 'get' value, error if not exists.
 4. 'set' value, create or replace existing.
 [optional]
 5. 'set' value if not existing, i.e. create only, don't replace.
 6. 'set' value if existing, i.e. replace only, don't create.

 
Well said. I think you are on to a winner here.

[snip]

 I really don't mind what the syntax looks like, be it method style i.e.  
 "value = aa.get(key)" or array style "value = aa[key]".

Totally agree with you. I'm not wedded to either syntax. However, we should
really stop calling Associative Arrays, "arrays" if we drop the array
syntax ;-)

 However, I think one consistent style is a good idea, and I don't think  
 it's possible for the array style to represent the different intentions we  
 have, which is why 'find' is essential.

Agreed. The array syntax only covers some of the desired behaviours one
would want to see in a hash-table (a.k.a AA)

-- 
Derek Parnell
Melbourne, Australia
7/11/2005 9:11:40 AM

Nov 06 2005

Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:

Regan Heath wrote:
...
 I think we can and should avoid pointers.
 
 I think the types of things we want to do can be broken into categories:
 
 1. 'check' for existence of an item.
 2. 'check' for existence of an item and get it.

...
 

 leave it as is, or change it back to returning true/false.
 (NOCHANGE)
 

 returns true/false and gets value if existing.
 (ADD)
 

There should also be a 'bool aa.finds(key)' here since all of those 


-- 
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to 
be... unnatural."

Nov 07 2005

kris <fu bar.org> writes:

Sean Kelly wrote:
 
 I would have preferred leaving the existing syntax as-is and adding a 
 new method called 'find' or some such that returned a pointer to the 
 element or null if it doesn't exist.
 
 
 Sean

If you mean adding an AA method similar to this:



... then I'd fully agree with you. I think adding such a method is a 
good way to satisfy/resolve so many different requirements, and tastes. 
That particular signature avoids pointer usage and redundant lookups. An 
alternative would be to twist the array syntax some more, to do the same 
thing:




I do like how Walter changed 'in' (returning a pointer), since that can 
be useful for integration with C functions. But the AA[] rvalue, x = 
AA["foo"], change could be reverted in the presence of that new method.

Nov 06 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sun, 06 Nov 2005 19:57:15 -0800, kris <fu bar.org> wrote:
 Sean Kelly wrote:
  I would have preferred leaving the existing syntax as-is and adding a  
 new method called 'find' or some such that returned a pointer to the  
 element or null if it doesn't exist.
   Sean

 If you mean adding an AA method similar to this:



 ... then I'd fully agree with you. I think adding such a method is a  
 good way to satisfy/resolve so many different requirements, and tastes.  
 That particular signature avoids pointer usage and redundant lookups. An  
 alternative would be to twist the array syntax some more, to do the same  
 thing:




 I do like how Walter changed 'in' (returning a pointer), since that can  
 be useful for integration with C functions.

Good point, I hadn't thought of that. I was seeing this as redundant in  
the face of a 'get' function as shown above.

 But the AA[] rvalue, x = AA["foo"], change could be reverted in the  
 presence of that new method.

Do you mean reverted all the way back to inserting on lookup? i.e.

x = AA["foo"];

So, this causes the creation and insertion of an item for the key "foo"  
(assuming none existed prior to the call) into AA?

I can't see what advantage that gives us over returning typeof(v).init and  
not inserting?

Regan

Nov 06 2005

"Ameer Armaly" <ameer_armaly hotmail.com> writes:

"kris" <fu bar.org> wrote in message news:dkmio5$1qtr$1 digitaldaemon.com...
 Sean Kelly wrote:
 I would have preferred leaving the existing syntax as-is and adding a new 
 method called 'find' or some such that returned a pointer to the element 
 or null if it doesn't exist.


 Sean

 If you mean adding an AA method similar to this:



Yes something like this would be nice.
 ... then I'd fully agree with you. I think adding such a method is a good 
 way to satisfy/resolve so many different requirements, and tastes. That 
 particular signature avoids pointer usage and redundant lookups. An 
 alternative would be to twist the array syntax some more, to do the same 
 thing:




 I do like how Walter changed 'in' (returning a pointer), since that can be 
 useful for integration with C functions. But the AA[] rvalue, x = 
 AA["foo"], change could be reverted in the presence of that new method.

Nov 07 2005

"Kris" <fu bar.com> writes:

Ah. I just remembered that AA lookups would insert an empty entry if one 
was not there already ... that behaviour has been exchanged for throwing an 
exception instead. Bleah.  As for idioms (mentioned below), in this case 
they can be optimized by the API rather than by the compiler.

I believe there is a way, within D, to elegantly resolve these ongoing 
issues ... if you'd be open to change, then I'd be happy to make some 
suggestions <g>

- Kris


"Walter Bright" <newshound digitalmars.com> wrote in message 
news:djuj2d$2vvv$1 digitaldaemon.com...
 "Kris" <fu bar.com> wrote in message 
 news:dju31o$2he6$1 digitaldaemon.com...
  I think this particular change to AA's is just flat-out bogus

 Nobody had a nice word to say about the original implementation.

 Just write it as:

    if (!(key in map))
        map[key] = new Record;
    r = map[key];
    ...

 No, it isn't as efficient as the old way. But, like I said, the old way was
 called lots of unkind things <g>.

 It is possible that a future compiler may recognize the above as an idiom
 and rewrite it so the array lookup is done only once.
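The idiom above costs two lookups on a hit and three on a miss. A minimal sketch of a helper that needs one and two respectively, assuming present-day D2; `getOrCreate` is a hypothetical name, and `lazy` keeps the `new Record` from being evaluated unless the key is actually missing.

```d
/// Hypothetical helper: returns the entry for `key`, creating it with
/// `make` only when it is missing. A hit costs one lookup; a miss costs
/// two (the `in` probe plus the insert).
V getOrCreate(K, V)(ref V[K] aa, K key, lazy V make)
{
    if (auto p = key in aa)
        return *p;
    V value = make;           // the lazy argument is evaluated only here
    aa[key] = value;
    return value;
}

unittest
{
    static class Record {}
    Record[string] map;

    auto r1 = map.getOrCreate("k", new Record);
    auto r2 = map.getOrCreate("k", new Record);   // hit: factory not evaluated
    assert(r1 is r2 && map.length == 1);
}
```

Later D releases added a built-in `require(key, value)` that covers this case in a single call.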

Oct 28 2005

David Medlock <noone nowhere.com> writes:

Kris wrote:
 Ah. I just remembered that AA lookups would insert an empty entry if one 
 was not there already ... that behaviour has been exchanged for throwing an 
 exception instead. Bleah.  As for idioms (mentioned below), in this case 
 they can be optimized by the API rather than by the compiler.
 
 I believe there is a way, within D, to elegantly resolve these ongoing 
 issues ... if you'd be open to change, then I'd be happy to make some 
 suggestions <g>
 
 - Kris
 
 
 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:djuj2d$2vvv$1 digitaldaemon.com...
 
"Kris" <fu bar.com> wrote in message 
news:dju31o$2he6$1 digitaldaemon.com...

 I think this particular change to AA's is just flat-out bogus

Nobody had a nice word to say about the original implementation.

Just write it as:

   if (!(key in map))
       map[key] = new Record;
   r = map[key];
   ...

No, it isn't as efficient as the old way. But, like I said, the old way was
called lots of unkind things <g>.

It is possible that a future compiler may recognize the above as an idiom
and rewrite it so the array lookup is done only once.

 

I will maintain my position that we should simply have a couple of builtins:

Previous thread:
http://www.digitalmars.com/d/archives/digitalmars/D/26554.html

The builtin methods would handle all cases pretty easily:

  bool get( in key, out value )
  bool insert( in key, in value, out oldvalue )


int[ char[] ] 	map;

int n;
if ( map.get( "Hello" , n ) ) { ..do something with n.. }
else { ... no value in n ..  }

if ( map.insert( "Hello", 200, n ) ) { .. previous value in n.. }
else { .. no previous value was in map.. }

Built in methods are overloadable whereas 'in' is not.
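A minimal sketch of the proposed `insert` as a free function on present-day D2 (the name follows the post above; it is not a built-in). Writing through the pointer from `in` means the replace path needs only one lookup.

```d
/// Hypothetical helper following the proposed `insert(key, value, oldvalue)`:
/// stores `value` under `key`; returns true (with the replaced value in
/// `oldValue`) if the key already existed, false otherwise.
bool insert(K, V)(ref V[K] aa, K key, V value, out V oldValue)
{
    if (auto p = key in aa)
    {
        oldValue = *p;        // hand back the previous value...
        *p = value;           // ...and overwrite it in place: no second lookup
        return true;
    }
    aa[key] = value;          // new entry; `out` already reset oldValue to V.init
    return false;
}

unittest
{
    int[string] map;
    int n;
    assert(!map.insert("Hello", 200, n));             // no previous value
    assert(map.insert("Hello", 300, n) && n == 200);  // previous value in n
    assert(map["Hello"] == 300);
}
```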

-DavidM

Oct 31 2005

"Kris" <fu bar.com> writes:

Yep ~ that would be great.

At the risk of being repetitive: I suspect AAs are trying to extract a bit 
too much out of  [] semantics ~ would be more productive to concentrate on a 
solid API rather than wringing out the array syntax; IMO. If one could 

no-brainer.

AAs just need a more usable veneer, beyond the [] syntax ~ or be moved into 
templates instead.



"David Medlock" <noone nowhere.com> wrote in message 
news:dk5rvr$1lbm$1 digitaldaemon.com...
 Kris wrote:
 Ah. I just remembered that AA lookups would insert an empty entry if one 
 was not there already ... that behaviour has been exchanged for throwing 
 an exception instead. Bleah.  As for idioms (mentioned below), in this 
 case they can be optimized by the API rather than by the compiler.

 I believe there is a way, within D, to elegantly resolve these ongoing 
 issues ... if you'd be open to change, then I'd be happy to make some 
 suggestions <g>

 - Kris


 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:djuj2d$2vvv$1 digitaldaemon.com...

"Kris" <fu bar.com> wrote in message 
news:dju31o$2he6$1 digitaldaemon.com...

 I think this particular change to AA's is just flat-out bogus

Nobody had a nice word to say about the original implementation.

Just write it as:

   if (!(key in map))
       map[key] = new Record;
   r = map[key];
   ...

No, it isn't as efficient as the old way. But, like I said, the old way
was called lots of unkind things <g>.

It is possible that a future compiler may recognize the above as an idiom
and rewrite it so the array lookup is done only once.


 I will maintain my position that we should simply have a couple of 
 builtins:

 Previous thread:
 http://www.digitalmars.com/d/archives/digitalmars/D/26554.html

 The builtin methods would handle all cases pretty easily:

  bool get( in key, out value )
  bool insert( in key, in value, out oldvalue )


 int[ char[] ] map;

 int n;
 if ( map.get( "Hello" , n ) ) { ..do something with n.. }
 else { ... no value in n ..  }

 if ( map.insert( "Hello", 200, n ) ) { .. previous value in n.. }
 else { .. no previous value was in map.. }

 Built-in methods are overloadable, whereas 'in' is not.

 -DavidM

Oct 31 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Kris" <fu bar.com> wrote in message news:dju31o$2he6$1 digitaldaemon.com...
 class Foo
 {
        private Record [char[]] map;

        static class Record
        {
                void write (Foo parent, void[] data) {}
        }

        synchronized void put (char[] key, void[] data)
        {
                /****** access violation here *****/
                Record  r = map [key];

                if (r is null)
                   {
                   r = new Record ();
                   map [key] =  r;
                   }
                r.write (this, data);
        }
 }


 void main()
 {
        Foo f = new Foo;
        f.put ("foo", new void[10]);
 }
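For reference, here is a version of the `put` method from the original report that never indexes a missing key, sketched in modern D2. The key type is changed from `char[]` to `string` (an assumption, so that the string literal in `main` converts), and `in` is used to test for the entry before reading it.

```d
class Foo
{
    private Record[string] map;

    static class Record
    {
        void write(Foo parent, void[] data) {}
    }

    synchronized void put(string key, void[] data)
    {
        Record r;
        if (auto p = key in map)    // non-null only if the key exists
            r = *p;
        else
        {
            r = new Record;
            map[key] = r;           // insert on a miss instead of faulting
        }
        r.write(this, data);
    }
}
```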



FWIW, MinTL HashAA returns a user-settable missing value on invalid keys.
The default missing value is value.init, so your code above would have worked
if map was a HashAA!(char[],Record). HashAA also has more methods that tweak
things, such as "contains" and "take". It also supports sorting; by default,
elements are sorted by insertion order.
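The missing-value behaviour described here can be sketched as a small wrapper around a built-in AA. This is a hypothetical `DefaultAA` struct in modern D2, written only to illustrate the idea; it is not MinTL's HashAA and does not reproduce its API.

```d
/// Hypothetical wrapper, not MinTL's HashAA: reading an absent key yields
/// `missing` (V.init unless changed) instead of throwing.
struct DefaultAA(K, V)
{
    private V[K] data;
    V missing = V.init;     // value returned for keys that are not present

    V opIndex(K key)
    {
        if (auto p = key in data)
            return *p;
        return missing;
    }

    void opIndexAssign(V value, K key)
    {
        data[key] = value;
    }

    bool contains(K key)
    {
        return (key in data) !is null;
    }
}
```

With a class value type such as `Record`, `V.init` is `null`, so the `if (r is null)` test in the original `put` would take the insert branch on a missing key.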

-Ben

Oct 29 2005

kris <fu bar.org> writes:

That all sounds cool, Ben.

Perhaps the core problem with AAs is that they're just trying to do too 
much with the [] syntax? Perhaps if AAs supported a contains(key, inout 
value) property, as Regan noted a year or more ago, then it would be 
more palatable and useful?

On the other hand, templates are more than adequate for handling such
things, as you've proved with MinTL. Removing AAs would simplify the
compiler also (obviously). If the template syntax were deemed too
complicated for new users, perhaps the compiler could provide some
generic sugar for 'special' templates, instead of all the AA-specific
code? Perhaps some kind of alias (or variation thereof) might be
sufficiently sugary?

- Kris


Ben Hinkle wrote:
 "Kris" <fu bar.com> wrote in message news:dju31o$2he6$1 digitaldaemon.com...
 
class Foo
{
       private Record [char[]] map;

       static class Record
       {
               void write (Foo parent, void[] data) {}
       }

       synchronized void put (char[] key, void[] data)
       {
               /****** access violation here *****/
               Record  r = map [key];

               if (r is null)
                  {
                  r = new Record ();
                  map [key] =  r;
                  }
               r.write (this, data);
       }
}


void main()
{
       Foo f = new Foo;
       f.put ("foo", new void[10]);
}



 
 
 FWIW, MinTL HashAA returns a user-settable missing value on invalid keys.
 The default missing value is value.init, so your code above would have worked
 if map was a HashAA!(char[],Record). HashAA also has more methods that tweak
 things, such as "contains" and "take". It also supports sorting; by default,
 elements are sorted by insertion order.
 
 -Ben

Oct 29 2005
