digitalmars.D.bugs - Access violation with AA's

reply "Kris" <fu bar.com> writes:
class Foo
{
        private Record [char[]] map;

        static class Record
        {
                void write (Foo parent, void[] data) {}
        }

        synchronized void put (char[] key, void[] data)
        {
                /****** access violation here *****/
                Record  r = map [key];

                if (r is null)
                   {
                   r = new Record ();
                   map [key] =  r;
                   }
                r.write (this, data);
        }
}


void main()
{
        Foo f = new Foo;
        f.put ("foo", new void[10]);
}

# Error: Access Violation

Depending upon where (in the code body) the dereference of 'map' occurs, one 
gets either an access-violation or an OutOfBounds exception. Fragile.

However ~ this, again, casts a shadow upon the AA implementation. I mean, 
throwing an exception in this case is surely dubious (a missing entry in an 
AA is *not* exceptional; often it is the norm). Further, compare these two 
implementations of the above code:

#1
        synchronized void put (char[] key, void[] data)
        {
                Record  r = map [key];

                if (r is null)
                   {
                   r = new Record ();
                   map [key] =  r;
                   }
                r.write (this, data);
        }

#2
        synchronized void put (char[] key, void[] data)
        {
                Record  *r = key in map;

                if (r is null)
                   {
                   Record rr = new Record();
                   map [key] =  rr;
                   r = &rr;
                   }
                (*r).write (this, data);
        }


#2 is the way one is apparently expected to code; forcing one into using 
pointers. Is this a good thing? For something as rudimentary as a HashMap? 
Alternatively, one could do this:

#3
        synchronized void put (char[] key, void[] data)
        {
                Record r;

                try {
                     r = map [key];
                     } catch (OutOfBoundsException e)
                                 {
                                  r = new Record ();
                                  map [key] =  r;
                                 }
                r.write (this, data);
        }

While some might argue this is "better than #1", it has the side effect of 
being rather slow; and is rather less clear than #1, in my view. Again, not 
something one needs for a rudimentary container. I think this particular 
change to AA's is just flat-out bogus (and, none of this would be necessary 
if AA's were implemented as a template library).

Finally, any code written for the "old" implementation of AA's is broken. 
Turns out that Mango has quite a few cases similar to #1; each one now 
broken. Wonderful!

For Mango, neither #2 nor #3 is an attractive option; #1 is clearly (IMO) the 
simplest and most in line with what D used to represent. What is one 
supposed to do?
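
For readers on a current D2 compiler (an assumption; it postdates this thread), variant #2 can be written so that the pointer returned by 'in' is only read, never pointed at a local. What follows is a minimal sketch, not the Mango code: string keys and ubyte[] data stand in for the original char[] and void[], a synchronized block stands in for the synchronized method, and the 'writes' counter and 'writesFor' helper are hypothetical additions that exist only to make the behaviour observable.

```d
class Foo
{
    static class Record
    {
        size_t writes;   // hypothetical counter, for illustration only

        void write(Foo parent, const(ubyte)[] data) { ++writes; }
    }

    private Record[string] map;

    void put(string key, const(ubyte)[] data)
    {
        synchronized (this)
        {
            Record r;
            if (auto p = key in map)   // one lookup when the key exists
                r = *p;
            else
            {
                r = new Record;
                map[key] = r;          // a second lookup only on a miss
            }
            r.write(this, data);
        }
    }

    // Hypothetical helper: number of writes recorded for a key, 0 if absent.
    size_t writesFor(string key)
    {
        auto p = key in map;
        return p is null ? 0 : (*p).writes;
    }
}

void main()
{
    auto f = new Foo;
    f.put("foo", new ubyte[10]);
    f.put("foo", new ubyte[10]);
    assert(f.writesFor("foo") == 2);
    assert(f.writesFor("bar") == 0);
}
```

The pointer is still there, but it is confined to the three lines that test and dereference it.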
Oct 28 2005
next sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Kris" <fu bar.com> wrote in message news:dju31o$2he6$1 digitaldaemon.com...
  I think this particular change to AA's is just flat-out bogus

Nobody had a nice word to say about the original implementation. Just write it as:

        if (!(key in map))
            map[key] = new Record;
        r = map[key];
        ...

No, it isn't as efficient as the old way. But, like I said, the old way was called lots of unkind things <g>. It is possible that a future compiler may recognize the above as an idiom and rewrite it so the array lookup is done only once.
Oct 28 2005
next sibling parent reply "Kris" <fu bar.com> writes:
Hi, Walter ~

I haven't been at all shy in the past in terms of criticizing the multiple 
lookups required for AA usage. Nor for some of the other "deficiencies" (as 
I imagine you're referring to).

However, what you suggest below /minimally/ requires two lookups, and three 
on an insert <g>. Your changes just made the performance concern notably 
worse than it was, along with breaking the existing code-base. The compiler 
optimization you speak of does not exist, and really should not need to.

Can we step back a moment, please?

One issue here is the Access Violation (why this is posted in the bugs 
section rather than in the main forum), although this new behaviour of 
throwing an exception is, I think, highly questionable. How did that get by 
the wolves in the first place? <g>

For example, I quite often use an AA to identify things as 'special' ~ URL 
schemes for example. If the scheme is not in the AA then it ain't special. 
The missing case is /not/ exceptional; instead it is actually the norm; I 
certainly don't wish to be catching exceptions for the normal case (from 
either a semantic or performance perspective). Nor do I wish to use pointers 
for such usual, simplistic, cases.
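
The membership-only case described here can be sketched without any visible pointer or exception, assuming a current D2 compiler. The function name 'isSpecial' and the scheme names are made up for illustration.

```d
// Hypothetical helper: true when the scheme is listed in the table.
bool isSpecial(const bool[string] special, string scheme)
{
    // 'in' yields a pointer that is null when the key is absent, so a
    // missing entry is an ordinary false result rather than an exception.
    return (scheme in special) !is null;
}

void main()
{
    bool[string] special = ["http": true, "ftp": true];
    assert(isSpecial(special, "http"));
    assert(!isSpecial(special, "gopher"));
}
```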

OTOH, what you did with the 'in' keyword and pointers improved that aspect 
of it ~ if one wants to eliminate a potential double-lookup then one can use 
the pointer syntax. Good!

The problem here is that, at the same time, you changed the semantics of a 
simple lookup such that it now either requires pointer-syntax, the overhead 
of try/catch, or yet another lookup. I think that was a mistake, and am not 
too shy to say so :-)

With respect, I think this sets a rather poor precedent for D beginners ~ as 
a questionable example of exception usage, and the associated added 
complexity or overhead of the current AA lookup model (or, alternatively, 
the required use of pointers).

Let's not forget there's an access-violation here too. Just compile that 
example and run it.

Lastly, I /do/ actually have a kind word to say about the original 
implementation: other than the potential double-lookup, it was fast, and it 
was simple. I still think AA's could/should have been handled via templates 
when they came along, and could therefore have been treated as a library 
utility rather than being built into the compiler itself. Regardless, the 
usage model is now arguably slower and more complex than before ~ largely 
negating the effort of placing AA's within the compiler in the first place. 
IMO.

Regards;





"Walter Bright" <newshound digitalmars.com> wrote in message 
news:djuj2d$2vvv$1 digitaldaemon.com...
 "Kris" <fu bar.com> wrote in message 
 news:dju31o$2he6$1 digitaldaemon.com...
  I think this particular change to AA's is just flat-out bogus

Nobody had a nice word to say about the original implementation. Just write it as:

        if (!(key in map))
            map[key] = new Record;
        r = map[key];
        ...

No, it isn't as efficient as the old way. But, like I said, the old way was called lots of unkind things <g>. It is possible that a future compiler may recognize the above as an idiom and rewrite it so the array lookup is done only once.

Oct 28 2005
next sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
Just to add my 2c here as well.

I disliked the original AA and array behaviour because it inserted when  
you did a lookup.

I was a supporter of the "make it an exception" idea because to me the  
statement "val = aa["key"]; says get me the value for "key". However it  
seems that in practice, where we use AA's it's more typical that we're  
saying "have we got a value for "key" (as Kris mentioned). It is for this  
reason I have come to dislike the exception.

The trouble as I see it is that the compiler cannot know in each case  
whether a missing item is exceptional and this is because we have no way  
of telling it. We've got one way to ask for an item "item = aa[index]" and  
that's it.

The solution IMO, something I have been an advocate of for some time now,  
is adding different ways to ask for items. The first and most relevant  
here is a "contains" method, eg.

bool contains(VALUE[KEY] a);
bool contains(VALUE[KEY] a, out VALUE v);

The first form essentially does what "in" does. The second does what "in"  
does but assigns the value to 'v'.

Sure, I can and have written a template that uses 'in' to achieve these,  
but it seems that something this useful should be part of the default  
built-in array handling so that everyone has access to it. At the very  
least it should be part of the standard library.

Here is a list of things I can imagine a programmer wanting to do on  
lookup of an item, ideally all should be supported in the most efficient  
manner possible i.e. no double lookup/hash.

  - check for item (i.e. if ("a" in AA))
  - check for item
    - if exists, get value
    - if not exists, error
    - if not exists, add 'this one' (or .init value)

(feel free to add to this list)

My feeling is that we handle each of the above like so:

"check for item"
   if ("key" in AA) {}
   if (AA.contains("key")) {}

"check for item, if exists get value"
   if (AA.contains("key",val)) {}

"check for item, if not exists, error"
   val = AA["key"];
   val = AA.get("key");

So, contains is added (get is added), the rest remains as it is now.

The tricky one seems to be:

"check for item, if not exists, add 'this one' (or .init value)"

perhaps?

val = <value to insert>;
AA.getset("key",val);

so val would end up being the existing value, or the new inserted value.  
If you wanted to tell if there was an existing value you could keep a copy  
i.e.

nval = val = <value to insert>;
AA.getset("key",val);

if (nval is val) { //new value was inserted }
else { //we had an existing value }

A better name than "getset" can likely be found.
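
The helpers proposed above can be approximated today as free functions called with method syntax. This is a rough sketch assuming a current D2 compiler; 'contains' and 'getset' are the hypothetical names from this post, not library functions, and each is built on a single 'in' lookup.

```d
// Hypothetical helper: true when the key is present.
bool contains(K, V)(V[K] aa, K key)
{
    return (key in aa) !is null;
}

// Hypothetical helper: as above, and copies the stored value into v on a hit.
bool contains(K, V)(V[K] aa, K key, out V v)
{
    if (auto p = key in aa)
    {
        v = *p;
        return true;
    }
    return false;
}

// Hypothetical helper: after the call, val holds the stored value. That is
// the existing one if the key was present, otherwise val itself is inserted.
void getset(K, V)(ref V[K] aa, K key, ref V val)
{
    if (auto p = key in aa)
        val = *p;          // hit: one lookup
    else
        aa[key] = val;     // miss: one extra lookup for the insert
}

void main()
{
    int[string] aa = ["a": 1];
    int v;

    assert(aa.contains("a"));
    assert(aa.contains("a", v) && v == 1);
    assert(!aa.contains("b", v) && v == 0);   // 'out' resets v to int.init

    int n = 7;
    aa.getset("b", n);                        // missing: 7 is inserted
    assert(n == 7 && aa["b"] == 7);

    n = 9;
    aa.getset("b", n);                        // present: n becomes the stored 7
    assert(n == 7);
}
```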

Regan
Oct 29 2005
next sibling parent "Regan Heath" <regan netwin.co.nz> writes:
 My feeling is that we handle each of the above like so:

 "check for item"
    if ("key" in AA) {}
    if (AA.contains("key")) {}

 "check for item, if exists get value"
    if (AA.contains("key",val)) {}

 "check for item, if not exists, error"
    val = AA["key"];
    val = AA.get("key");

If it really bothers people that "val = AA["key"];" throws an exception then perhaps it could return the .init value and the explicit "get" call could throw the exception. I don't think it matters too much as I can see myself using "contains" in most cases.

I dislike "val = AA["key"];" returning .init if there is no _other_ method (i.e. "contains") to get a value which can tell me whether the item did in fact exist or not. Example:

int[char[]] AA;
int v;

v = AA["test"];

the .init value for int is 0, so if v == 0 after this call we do not know whether it existed and was 0 or didn't exist at all.

Regan
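
The ambiguity of a default return value can be reproduced on a current D2 compiler (an assumption; it postdates this thread) with the built-in 'get' property, which takes an explicit default. A small sketch:

```d
void main()
{
    int[string] aa;
    aa["zero"] = 0;

    // A lookup with a default cannot tell "stored 0" from "missing".
    assert(aa.get("zero", 0) == 0);
    assert(aa.get("test", 0) == 0);

    // The pointer returned by 'in' can tell the two cases apart.
    assert(("zero" in aa) !is null);
    assert(("test" in aa) is null);
}
```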
Oct 29 2005
prev sibling parent Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:
Regan Heath wrote:
 Just to add my 2c here as well.
 
 I disliked the original AA and array behaviour because it inserted when  
 you did a lookup.
 
 I was a supporter of the "make it an exception" idea because to me the  
 statement "val = aa["key"]; says get me the value for "key". However it  
 seems that in practice, where we use AA's it's more typical that we're  
 saying "have we got a value for "key" (as Kris mentioned). It is for 
 this  reason I have come to dislike the exception.
 
 The trouble as I see it is that the compiler cannot know in each 
 case  whether a missing item is exceptional and this is because we have 
 no way  of telling it. We've got one way to ask for an item "item = 
 aa[index]" and  that's it.
 
 The solution IMO, something I have been an advocate of for some time 
 now,  is adding different ways to ask for items. The first and most 
 relevant  here is a "contains" method, eg.
 
 bool contains(VALUE[KEY] a);
 bool contains(VALUE[KEY] a, out VALUE v);
 
 The first form essentially does what "in" does. The second does what 
 "in"  does but assigns the value to 'v'.
 
 Sure, I can and have written a template that uses 'in' to achieve 
 these,  but it seems that something this useful should be part of the 
 default  built-in array handling so that everyone has access to it. At 
 the very  least it should be part of the standard library.
 
 Here is a list of things I can imagine a programmer wanting to do on  
 lookup of an item, ideally all should be supported in the most 
 efficient  manner possible i.e. no double lookup/hash.
 
  - check for item (i.e. if ("a" in AA))
  - check for item
    - if exists, get value
    - if not exists, error
    - if not exists, add 'this one' (or .init value)
 
 (feel free to add to this list)
 
 My feeling is that we handle each of the above like so:
 
 "check for item"
   if ("key" in AA) {}
   if (AA.contains("key")) {}
 
 "check for item, if exists get value"
   if (AA.contains("key",val)) {}
 
 "check for item, if not exists, error"
   val = AA["key"];
   val = AA.get("key");
 
 So, contains is added (get is added), the rest remains as it is now.
 
 The tricky one seems to be:
 
 "check for item, if not exists, add 'this one' (or .init value)"
 
 perhaps?
 
 val = <value to insert>;
 AA.getset("key",val);
 
 so val would end up being the existing value, or the new inserted 
 value.  If you wanted to tell if there was an existing value you could 
 keep a copy  i.e.
 
 nval = val = <value to insert>;
 AA.getset("key",val);
 
 if (nval is val) { //new value was inserted }
 else { //we had an existing value }
 
 A better name than "getset" can likely be found.
 
 Regan

also have a method to set/add a new pair. Like:

   AA.set("key",val);
   // or
   AA.add("key",val);

I find the current array usage syntax (and the previous one too, for that matter), quite strange and unnatural.

--
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
Oct 31 2005
prev sibling next sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Kris" <fu bar.com> wrote in message news:djumru$1a9$1 digitaldaemon.com...
 One issue here is the Access Violation

Yes, I'd like to fix that one. Can you post an example specifically for that? That's a separate issue from discussion of how it should work.
Nov 05 2005
parent reply kris <fu bar.org> writes:
Walter Bright wrote:
 "Kris" <fu bar.com> wrote in message news:djumru$1a9$1 digitaldaemon.com...
 
One issue here is the Access Violation

Yes, I'd like to fix that one. Can you post an example specifically for that? That's a separate issue from discussion of how it should work.

There was an example in the original post, commented with the location of the GPF.
Nov 05 2005
parent reply "Walter Bright" <newshound digitalmars.com> writes:
"kris" <fu bar.org> wrote in message news:dkiu4q$20bq$1 digitaldaemon.com...
 Walter Bright wrote:
 "Kris" <fu bar.com> wrote in message


One issue here is the Access Violation

Yes, I'd like to fix that one. Can you post an example specifically for that? That's a separate issue from discussion of how it should work.

There was an example in the original post, commented with the location of the GPF.

I checked it out. The GPF happens when compiled with -release, the ArrayBoundsError without -release. That is as designed; the idea is similar for regular arrays. Array bounds checking is not done when -release is thrown, and you get whatever happens.
Nov 05 2005
parent reply kris <fu bar.org> writes:
Walter Bright wrote:
 
 I checked it out. The GPF happens when compiled with -release, the
 ArrayBoundsError without -release. That is as designed; the idea is similar
 for regular arrays. Array bounds checking is not done when -release is
 thrown, and you get whatever happens.

In the example provided, the lookup happened in a library. The library was -release, while the client code was not. *cough*

You do realize, I hope, that the above approach dictates one /must/ use 'in' with AA's to avoid GPFs. I mean, if there might ever be a missing entry in an AA (of which is pretty much assured in the general case; esp within libraries) then one will end up with a GPF via -release.

Thus, since 'in' requires the use of pointers, it holds that AA's require pointer-syntax to enable robust coding. I sincerely hope you see the irony in that, Walter?

I can somewhat understand your sensitivity in this regard; yet, the design appears to be a ticking bomb. Everyone ~ anyone ~ please help me to understand this is not the case?

I posit the following:

1) use of the array-syntax for AA lookups will produce a GPF if the entry does not exist. Please put aside -release for the moment ~ it is a red-herring

2) To avoid GPFs, one must use pointer-syntax in conjunction with the AA 'in' statement.

3) The usage of the AA 'in' statement renders the array-syntax lookup redundant, since one uses the pointer to reach the data instead.

4) Thus, the array-syntax is effectively worthless (and superfluous) for AA lookup, since pointer-syntax is a prerequisite to avoid GPFs. Further, the presence of AA array-syntax lookup is an invitation to write non-robust code.
Nov 05 2005
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sat, 05 Nov 2005 18:57:08 -0800, kris wrote:

 Walter Bright wrote:
 
 I checked it out. The GPF happens when compiled with -release, the
 ArrayBoundsError without -release. That is as designed; the idea is similar
 for regular arrays. Array bounds checking is not done when -release is
 thrown, and you get whatever happens.

In the example provided, the lookup happened in a library. The library was -release, while the client code was not. *cough* You do realize, I hope, that the above approach dictates one /must/ use 'in' with AA's to avoid GPFs. I mean, if there might ever be a missing entry in an AA (of which is pretty much assured in the general case; esp within libraries) then one will end up with a GPF via -release

Huh? Of course!? Why would you think it wise to accept data from an unknown source without either validating it or accepting the consequences? To otherwise complain is not logical. It is parallel in concept to accepting a user's keyboard-entered data without checking that it's okay to use. In a library routine in which you open up its API to external usage, one must either validate the parameters or accept the consequences of trying to use unacceptable data.
 Thus, since 'in' requires the use of pointers, it holds that AA's 
 require pointer-syntax to enable robust coding. I sincerely hope you see 
 the irony in that, Walter?

It does not *require* one to use pointers. It is optional.

   if ( (UsrParm in MyArray) == null)
   {
       MyArray[UsrParm] = ArrayEntry = UsrData;
   }
   else
   {
       ArrayEntry = MyArray[UsrParm];
   }

Look! No pointers involved. And a good optimising D compiler might even be able to make this more efficient by caching the 'double' look up.
 I can somewhat understand your sensitivity in this regard; yet, the 
 design appears to be a ticking bomb. Everyone ~ anyone ~ please help me 
 to understand this is not the case?
 
 I posit the following:
 
 1) use of the array-syntax for AA lookups will produce a GPF if the 
 entry does not exist. Please put aside -release for the moment ~ it is a 
 red-herring

Only on a 'get' access. Not on an 'enquiry' or 'set' access.
 2) To avoid GPFs, one must use pointer-syntax in conjunction with the AA 
 'in' statement.

Not so.
 3) The usage of the AA 'in' statement renders the array-syntax lookup 
 redundant, since one uses the pointer to reach the data instead.

Not so.
 4) Thus, the array-syntax is effectively worthless (and superfluous) for 
 AA lookup, since pointer-syntax is a prerequisite to avoid GPFs. 
 Further, the presence of AA array-syntax lookup is an invitation to 
 write non-robust code.

Not so.

What would have been nice, instead of Walter getting all upset over people not liking his implementation, is to provide all four types of access.

   'enquiry' ::  Key in Array (returns pointer to Value or Null)
   'get'     ::  Value = Array[Key] (Gets the Value if it exists,
                                     error otherwise)
   'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
                                     it doesn't exist.)
   'initget' ::  Value = Array.initset(Key) (Gets the Value if it exists,
                                          otherwise creates an entry with
                                          .init values.)

Or any other equivalent syntax. The point is that there is no reason for the old behaviour to be totally removed from the language, just shifted away from being the default behaviour for 'Value = Array[Key]' syntax.

--
Derek Parnell
Melbourne, Australia
6/11/2005 5:41:50 PM
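
For comparison, the four kinds of access listed in this post map onto a current D2 compiler roughly as follows. This is a sketch under that assumption only; the 'require' property is used here as the nearest present-day analogue of the hypothetical 'initset', and it is not the API that was under discussion in 2005.

```d
void main()
{
    int[string] aa;

    aa["a"] = 1;                            // 'set': creates or replaces

    assert(("a" in aa) !is null);           // 'enquiry': pointer, null on a miss
    assert(("z" in aa) is null);

    assert(aa["a"] == 1);                   // 'get': an error if the key is missing

    int b = aa.require("b");                // 'initget' analogue: inserts int.init
    assert(b == 0 && ("b" in aa) !is null);

    assert(aa.require("a", 5) == 1);        // an existing value is left alone
}
```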
Nov 05 2005
parent reply kris <fu bar.org> writes:
Hey Derek; I think you may have misunderstood the problem, so I'll 
attempt to clarify somewhat ~

Derek Parnell wrote:
 On Sat, 05 Nov 2005 18:57:08 -0800, kris wrote:
 
 
Walter Bright wrote:

I checked it out. The GPF happens when compiled with -release, the
ArrayBoundsError without -release. That is as designed; the idea is similar
for regular arrays. Array bounds checking is not done when -release is
thrown, and you get whatever happens.

In the example provided, the lookup happened in a library. The library was -release, while the client code was not. *cough* You do realize, I hope, that the above approach dictates one /must/ use 'in' with AA's to avoid GPFs. I mean, if there might ever be a missing entry in an AA (of which is pretty much assured in the general case; esp within libraries) then one will end up with a GPF via -release

Huh? Of course!? Why would you think it wise to accept data from an unknown source without either validating it or accepting the consequences. To otherwise complain is not logical. It is parallel in concept to accepting a user's keyboard-entered data without checking that its okay to use. In a library routine in which you open up it's API to external usage, one must either validate the parameters or accept the consequences of trying use unacceptable data.

I failed miserably to get your drift here. Hash tables are *not* like arrays. If they don't contain a key it is surely not a reason to GPF. Is it? We're talking about this code causing a GPF:

char[char[]] AA;

char[] s = AA["unforseen key"];  // GPF; can't check for a null return
 
 
Thus, since 'in' requires the use of pointers, it holds that AA's 
require pointer-syntax to enable robust coding. I sincerely hope you see 
the irony in that, Walter?

It does not *require* one to use pointers. It is optional. if ( (UsrParm in MyArray) == null) { MyArray[UsrParm] = ArrayEntry = UsrData; } else { ArrayEntry = MyArray[UsrParm]; } Look! No pointers involved. And a good optimising D compiler might even be able to make this more efficient by caching the 'double' look up.

Oh please! Let's try to stay in the land of reason here. Yes, you can come up with all sorts of ways to /make/ it work with /multiple/ lookups. Walter suggested a way to do it with three lookups instead. I know you appreciate optimal code paths, Derek, so can we sidestep this please?

The above code has two lookups, where only one should be necessary. I sure hope you avoid multiple lookups within Build?

Since you're forcing the issue, let's change my posit to assert that gratuitous use of multiple lookups should not be considered ideal?
I can somewhat understand your sensitivity in this regard; yet, the 
design appears to be a ticking bomb. Everyone ~ anyone ~ please help me 
to understand this is not the case?

I posit the following:

1) use of the array-syntax for AA lookups will produce a GPF if the 
entry does not exist. Please put aside -release for the moment ~ it is a 
red-herring

Only on a 'get' access. Not on an 'enquiry' or 'set' access.

False. We're talking about array-syntax, and not 'in' syntax. As above:

char[] s = AA["unforseen key"]; // causes GPF
 
 
2) To avoid GPFs, one must use pointer-syntax in conjunction with the AA 
'in' statement.

Not so.

Au contraire, my friend ~ unless you're prepared to perform unnecessary multiple lookups. I don't consider redundant lookups to be relevant, and neither should anyone following this ridiculous saga.
  
 
3) The usage of the AA 'in' statement renders the array-syntax lookup 
redundant, since one uses the pointer to reach the data instead.

Not so.

Certainly! Let's waste our time looking the entry up once again, just for jollies. And let's make sure the key is very, very long; and there's lots of collisions in the hash table. Gratuitously wasting CPU cycles is surely a good thing.
  
 
4) Thus, the array-syntax is effectively worthless (and superfluous) for 
AA lookup, since pointer-syntax is a prerequisite to avoid GPFs. 
Further, the presence of AA array-syntax lookup is an invitation to 
write non-robust code.

Not so.

Please re-read. Array-syntax lookup /by itself/ is borked. It has to be used in conjunction with 'in', and is therefore superfluous (since 'in' supplies the data anyway). Sure; you can lookup the AA again if you wish, but your counter-argument is redundant; just like the additional lookup. As noted, the existence of s=AA[], by itself, encourages rather fragile code. D requires pointer-syntax to lookup an AA entry without GPFing (redundant multiple lookups aside).
 
 What would have been nice is instead of Walter get all upset over people
 not liking his implementation, is to provide all four types of access.
 
    'enquiry' ::  Key in Array (returns pointer to Value or Null)
    'get'     ::  Value = Array[Key] (Gets the Value if it exists,
                                      error otherwise)
    'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
                                      it doesn't exist.
    'initget' ::  Value = Array.initset(Key) (Gets the Value if it exists,
                                           otherwise creates an entry with
                                           .init values.)
 
 Or any other equivalent syntax. The point is that there is no reason for
 the old behaviour to be totally removed from the language, just shifted
 away from being the default behaviour for 'Value = Array(Key)' syntax.
 

The point of this is to expose the flaws in the current design. The counterpoints you've made are based entirely upon the use of thoroughly redundant additional lookups, so of what true value are they? I mean that sincerely, since I just can't see any value in multiple lookups where perfectly sound alternatives have been around for decades.

Do you want an alternative, robust, optimal API? It would be good to see AAs use a set of 'properties' instead, such as bool get("key", inout value) along with put(key, value) ~ which, BTW, has no pointer-syntax and is optimal in terms of avoiding those wholly redundant lookups.

The problem here is not the built-in AAs per se ~ instead, it's the force-fit of array-syntax as the API. Replace that with a set of 'properties' and there would be nothing to bitch and moan about. Right? Either that, or replace them with a template? One with the above functions?

What's truly extraordinary is that such a fundamental aspect of the language is still so unsound after all this time; and after so much worthless bickering. I mean, it's just a frickin' hash table for Bob's sake ... it ain't rocket science, and it sure as heck shouldn't be a political football.

Oh ~ perhaps AAs are not intended to be hash tables?
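
The get/put API suggested in this post could be sketched as a thin template over the built-in AA, assuming a current D2 compiler. The name 'HashMap' is hypothetical, and 'ref' is the D2 spelling of what this post writes as 'inout'.

```d
// Hypothetical wrapper exposing the get/put API suggested above.
struct HashMap(K, V)
{
    private V[K] table;

    // On a hit, copies the stored value into value and returns true.
    // On a miss, leaves value untouched and returns false. One lookup.
    bool get(K key, ref V value)
    {
        if (auto p = key in table)
        {
            value = *p;
            return true;
        }
        return false;
    }

    // Inserts the pair, replacing any existing value for the key.
    void put(K key, V value)
    {
        table[key] = value;
    }
}

void main()
{
    HashMap!(string, string) m;
    string s = "unset";

    assert(!m.get("unforeseen key", s) && s == "unset");

    m.put("k", "v");
    assert(m.get("k", s) && s == "v");
}
```

The pointer from 'in' is still used internally, but callers of get and put never see it.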
Nov 06 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 06 Nov 2005 00:49:26 -0800, kris wrote:

 Hey Derek; I think you may have misunderstood the problem, so I'll 
 attempt to clarify somewhat ~

I don't believe that I have misunderstood "the problem" at all.

Currently in D, when one attempts to retrieve a non-existent element in an array, it causes a run-time error to occur. This applies to all array types: fixed-length, dynamic-length, and associative. (And yes, in the current D, an associative array is implemented as a hash-table.) The type of error depends on whether the -release switch has been used or not. If it has been used then a memory access violation occurs (i.e. GPF under unix), otherwise if -release was not used an ArrayBoundsError exception is thrown.

The problem is that you don't like this behaviour for associative arrays. I assume that when trying to fetch a non-existent element you would either like the element to be automatically created with .init value(s) and/or to return some initialized value, or to always throw an ArrayBoundsError regardless of the -release status. Which is it you'd like to see happen?
 Derek Parnell wrote:
 On Sat, 05 Nov 2005 18:57:08 -0800, kris wrote:
 
 
Walter Bright wrote:

I checked it out. The GPF happens when compiled with -release, the
ArrayBoundsError without -release. That is as designed; the idea is similar
for regular arrays. Array bounds checking is not done when -release is
thrown, and you get whatever happens.

In the example provided, the lookup happened in a library. The library was -release, while the client code was not. *cough* You do realize, I hope, that the above approach dictates one /must/ use 'in' with AA's to avoid GPFs. I mean, if there might ever be a missing entry in an AA (of which is pretty much assured in the general case; esp within libraries) then one will end up with a GPF via -release

Huh? Of course!? Why would you think it wise to accept data from an unknown source without either validating it or accepting the consequences. To otherwise complain is not logical. It is parallel in concept to accepting a user's keyboard-entered data without checking that its okay to use. In a library routine in which you open up it's API to external usage, one must either validate the parameters or accept the consequences of trying use unacceptable data.

I failed miserably to get your drift here.

I apologize. Sometimes I'm not as good with words as I think I am. I made the assumption that the library managed an AA, and that a public function was available that fetches data from that AA based on a supplied key in one of the parameters. I was just saying that if this is the case, then you'd be wise to validate the key data prior to fetching the AA based on the externally supplied key value.
 Hash tables are *not* like arrays. If they don't contain a key it is surely
not a reason to GPF. Is 
 it? 

D's associative arrays are a specific type of hash table. The entries in the table are based on keys. And I agree, a GPF is only one of the possible implementation behaviors that are possible in response to a fetch attempt for an element that does not exist.
 We're talking about this code causing a GPF:
 
 char[char[]] AA;
 
 char[] s = AA["unforseen key"];  // GPF; can't check for a null return
 

This is why you might benefit from Walter reestablishing this sort of behaviour in D - in addition to the current AA behaviour. Sophisticated coders such as yourself can use such facilities.

   char[] s = AA.initset("unforseen key");

Now you can check for s.length == 0 if that's important to you. Of course, that isn't always a perfect way of detecting unforeseen key accesses. In spite of the "double lookup" effect, I would still code it thus ...

   char[] s;
   if ("unforseen key" in AA)
       s = AA["unforseen key"];
   else
       -- some error processing if appropriate.

because it tells the reader of the code that it is possible to get bad keys and it implements a way to handle those unambiguously.
 
 
 
Thus, since 'in' requires the use of pointers, it holds that AA's 
require pointer-syntax to enable robust coding. I sincerely hope you see 
the irony in that, Walter?

It does not *require* one to use pointers. It is optional. if ( (UsrParm in MyArray) == null) { MyArray[UsrParm] = ArrayEntry = UsrData; } else { ArrayEntry = MyArray[UsrParm]; } Look! No pointers involved. And a good optimising D compiler might even be able to make this more efficient by caching the 'double' look up.

Oh please! Let's try to stay in the land of reason here. Yes, you can come up with all sort of ways to /make/ it work with /multiple/ lookups. Walter suggested a way to do it with three lookups instead. I know you appreciate optimal code paths, Derek, so can we sidestep this please?

You might be misunderstanding me, now. I prize maintainable source code over runtime performance any day. If run time performance is really such an issue, code in assembler; otherwise get back into the land of reason.
 The above code has two lookups, where only one should be necessary. I 
 sure hope you avoid multiple lookups within Build?

Not if I can help it ;-) By the way, Build runs pretty fast in spite of me 'wasting' cycles checking for valid AA keys.
 Since you're forcing the issue, let's change my posit to assert that 
 gratuitous use of multiple lookups should not be considered ideal?
 
 
I can somewhat understand your sensitivity in this regard; yet, the 
design appears to be a ticking bomb. Everyone ~ anyone ~ please help me 
to understand this is not the case?

I posit the following:

1) use of the array-syntax for AA lookups will produce a GPF if the 
entry does not exist. Please put aside -release for the moment ~ it is a 
red-herring

Only on a 'get' access. Not on an 'enquiry' or 'set' access.

False. We're talking about array-syntax, and not 'in' syntax. As above: char[] s = AA["unforseen key"]; // causes GPF

Isn't that what I said? Your code is performing a 'get' and not an 'enquiry'.
 
 
 
2) To avoid GPFs, one must use pointer-syntax in conjunction with the AA 
'in' statement.

Not so.

Au contraire, my friend ~ unless you're prepared to perform unnecessary multiple lookups. I don't consider redundant lookups to be relevant, and neither should anyone following this ridiculous saga.

I must be one of the clowns then. I don't follow your philosophy anymore. Cost of the application over time is more important to me than trivial optimizations. Trivial in the sense that if it doesn't account for more than 5% of a program's execution time, why optimize it to death? My philosophy regarding this is more along the lines of: code it legibly first, and then profile it to locate areas that are worth optimizing.
 
  
 
3) The usage of the AA 'in' statement renders the array-syntax lookup 
redundant, since one uses the pointer to reach the data instead.

Not so.

Certainly! Let's waste our time looking the entry up once again, just for jollies. And let's make sure the key is very, very long; and there's lots of collisions in the hash table. Gratuitously wasting CPU cycles is surely a good thing.

And you have measured this, right?
  
 
4) Thus, the array-syntax is effectively worthless (and superfluous) for 
AA lookup, since pointer-syntax is a prerequisite to avoid GPFs. 
Further, the presence of AA array-syntax lookup is an invitation to 
write non-robust code.

Not so.

Please re-read. Array-syntax lookup /by itself/ is borked. It has to be used in conjunction with 'in', and is therefore superfluous (since 'in' supplies the data anyway).

Well not actually so. The "in" supplies the key and not the data. The data can be something totally different.

   real[char[]] AA;
   . . .
   real X;
   . . .
   if ("Some Key" in AA)
       X = AA["Some Key"];
   else
       -- Handle unknown key value.
 Sure; you can lookup the AA again if you 
 wish, but your counter-argument is redundant; just like the additional 
 lookup. As noted, the existence of s=AA[], by itself, encourages rather 
 fragile code. D requires pointer-syntax to lookup an AA entry without 
 GPFing (redundant multiple lookups aside).

D does not *require* pointer syntax. It is optional. But for completeness here is the pointer version.

   real *q;
   q = "Some Key" in AA;
   if (q !is null)
       X = *q;
   else
       -- Handle unknown key value.
 
 What would have been nice, instead of Walter getting all upset over people
 not liking his implementation, is to provide all four types of access.
 
    'enquiry' ::  Key in Array (returns pointer to Value or Null)
    'get'     ::  Value = Array[Key] (Gets the Value if it exists,
                                      error otherwise)
    'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
                                      it doesn't exist.)
    'initget' ::  Value = Array.initset(Key) (Gets the Value if it exists,
                                           otherwise creates an entry with
                                           .init values.)
 
 Or any other equivalent syntax. The point is that there is no reason for
 the old behaviour to be totally removed from the language, just shifted
 away from being the default behaviour for 'Value = Array[Key]' syntax.
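As a rough illustration of the four kinds of access listed above, the sketch below shows how each might be spelled in present-day D. That is an assumption on two counts: the thread concerns the 2005 compiler, and `initset` is only a proposed name here, so the built-in `require` stands in for it. The function name `demoAccess` is hypothetical.

```d
// Sketch only: assumes present-day D; `demoAccess` is a hypothetical name.
int demoAccess()
{
    int[string] aa;

    aa["one"] = 1;                    // 'set': creates or replaces the entry
    int* p = "one" in aa;             // 'enquiry': pointer to the value, or null
    assert(p !is null && *p == 1);
    assert(("two" in aa) is null);    // an enquiry never creates an entry

    int v = aa["one"];                // 'get': the key must already exist
    int w = aa.require("two");        // 'initget': creates with int.init if absent
    assert(w == 0 && ("two" in aa) !is null);

    return v + w + cast(int) aa.length;   // 1 + 0 + 2
}
```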
 

The point of this is to expose the flaws in the current design. The counterpoints you've made are based entirely upon the use of thoroughly redundant additional lookups, so of what true value are they?

They have true value to me. They help me write code which can be read by other people because my intentions etc are made clear in the code.

-- 
Derek Parnell
Melbourne, Australia
7/11/2005 2:17:21 AM
Nov 06 2005
parent reply kris <fu bar.org> writes:
I apologize for the long post. The salient point is right at the end, so 
please skip over the point/counterpoint argy-bargy.


Derek Parnell wrote:
 Currently in D, when one attempts to retrieve a non-existent element in an
 array, it causes a run-time error to occur. This applies to all array
 types: fixed-length, dynamic-length, and associative. (And yes, in the
 current D, an associative array is implemented as a hash-table.) The type
 of error depends on whether the -release switch has been used or not. If it
 has been used then a memory access violation occurs (ie. GPF under unix),
 otherwise if -release was not used an ArrayBoundsError exception is thrown.

I see you've bought into that. There is no such thing as an array-bounds error from the API of a hash-table, Derek. It's purely a manufactured idiom of the current API.
 The problem is that you don't like this behaviour for associative arrays.

Really? If there's something I "don't like" here, it is an API that is problematic purely for the sake of using a particular syntax. You've read the tale of the Emperor's New Clothes, haven't you?
 I assume that when trying to fetch a non-existent element you would either
 like the element to be automatically created with .init value(s) and/or to
 return some initialized value, or to always throw an ArrayBoundsError
 regardless of the -release status. Which is it you'd like to see happen?

I can't understand why you feel these are the only options, Derek. I agree those are perhaps the options when using array-syntax, but that's exactly where the problem lies. Neither of your two options are attractive; particularly so when they are purely artificial constraints.
I failed miserably to get your drift here. 

I apologize. Sometimes I'm not as good with words as I think I am. I made the assumption that the library managed an AA, and that a public function was available that fetches data from that AA based on a supplied key in one of the parameters. I was just saying that if this is the case, then you'd be wise to validate the key data prior to fetching from the AA based on the externally supplied key value.

This is hardly on topic, and smells of smoke. I have to remind you that you do not, and should not, require redundant lookups to check if an entry exists before fetching it from a hash-table.
Hash tables are *not* like arrays. If they don't contain a key it is surely not
a reason to GPF. Is 
it? 

D's associative arrays are a specific type of hash table. The entries in the table are based on keys. And I agree, a GPF is only one of the possible implementation behaviors that are possible in response to a fetch attempt for an element that does not exist.

Well, thank goodness. But, "specific type"? It's just a plain old hash-table, with some unwieldy syntax bolted onto it. The latter is the problem, not the former.
We're talking about this code causing a GPF:

char[char[]] AA;

char[] s = AA["unforseen key"];  // GPF; can't check for a null return

This is why you might benefit from Walter reestablishing this sort of behaviour in D - in addition to the current AA behaviour. Sophisticated coders such as yourself can use such facilities.

Sophisticated coders? Yer arse <g>. Hash tables are supposed to be trivial from the perspective of the user.
   char[] s = AA.initset("unforseen key");
 
 Now you can check for s.length == 0 if that's important to you. Of course,
 that isn't always a perfect way of detecting unforseen key accesses.
 

Complexity just for the sake of it. This is entirely unnecessary.
 In spite of the "double lookup" effect, I would still code it thus ...
 
   char[] s;
   if ("unforseen key" in AA)
      s = AA["unforseen key"];
   else
      -- some error processing if appropriate.
 
 because it tells the reader of the code that it is possible to get bad keys
 and it implements a way to handle those unambiguously.

I see. Pray explain why this alternate API is so appalling by comparison:

   char[] s;
   if (aa.get("unforseen key", s))
       // do something with s
   else
       // do something else

Look! No redundant lookups! No pointers! It must be magic! And, to quote you, "it tells the reader of the code that it is possible to get bad keys and it implements a way to handle those unambiguously". Wouldn't you agree?
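A check-and-fetch call of this shape can be sketched as a small helper over the built-in AA. The name `tryGet` is hypothetical, and the sketch assumes present-day D, where the by-reference parameter is spelled `ref` rather than the `inout` of 2005.

```d
// Sketch only: `tryGet` is a hypothetical helper, written for present-day D.

/// Copies the value stored under `key` into `value` and returns true.
/// If the key is absent, leaves `value` untouched and returns false.
bool tryGet(K, V)(V[K] aa, K key, ref V value)
{
    if (auto p = key in aa)   // the only hash lookup performed
    {
        value = *p;
        return true;
    }
    return false;
}
```

The call site then reads `if (aa.tryGet("some key", s)) ... else ...`, with a single lookup and with the pointer kept inside the helper.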
Let's try to stay in the land of reason here. Yes, you can come up with 
all sort of ways to /make/ it work with /multiple/ lookups. Walter 
suggested a way to do it with three lookups instead. I know you 
appreciate optimal code paths, Derek, so can we sidestep this please? 

You might be misunderstanding me, now. I prize maintainable source code over runtime performance any day. If run time performance is really such an issue, code in assembler; otherwise get back into the land of reason.

Entirely misleading. Look at the example above and reconsider. You appear to be trying to turn this into something unrelated, Derek. Please desist. Yes, there is a performance related aspect here, but only because you insist on applying entirely redundant lookups. One can write perfectly clear intentions (arguably more so) by using an alternate, and more appropriate, API.
The above code has two lookups, where only one should be necessary. I 
sure hope you avoid multiple lookups within Build?

Not if I can help it ;-) By the way, Build runs pretty fast in spite of me 'wasting' cycles checking for valid AA keys.

Build is a great tool. However, it appears as though Build takes longer to execute than both the compiler plus linker together. It is not a high performance application, because it doesn't really need to be. But that's hardly important since we're talking about API's here. Build is great at what it does ~ it does not represent every application. Again, your statement vaguely implies that I argue against checking for valid AA keys. That's silly, Derek. I'm claiming that one can clearly and unambiguously both test the existence of, and avoid redundant lookups upon, a hash-table entry by using a more appropriate API.
 Isn't that what I said? Your code is performing a 'get' and not an
 'enquiry'. 

As far as HT's are concerned, a get is equivalent to a query. You can argue about separating them all you wish, but you're simply arguing for redundant lookups. To avoid this is exactly why Walter is returning a pointer from the 'in' statement. Are you disagreeing with all perspectives?
Au contraire, my friend ~ unless you're prepared to perform unnecessary 
multiple lookups. I don't consider redundant lookups to be relevant, and 
neither should anyone following this ridiculous saga.

I must be one of the clowns then. I don't follow your philosophy anymore. Cost of the application over time is more important to me than trivial optimizations. Trivial in the sense that if it doesn't account for more than 5% of a program's execution time, why optimize it to death? My philosophy regarding this is more along the lines of: code it legibly first, and then profile it to locate areas that are worth optimizing.

(the saga is ridiculous because it's years old, whilst perfectly suitable alternatives have existed for decades)

You're welcome to do double lookups all you want, Derek. However, you're insisting that D remain staunchly oblivious to better alternatives. There's so much spin in your counter that I'm feeling dizzy. You're attempting to suggest I don't care a whit about legibility, and that any quest to avoid redundant code is misguided. That's utter nonsense.
 And you have measured this, right?

Nobody needs to measure it, Derek. If one executes two lookups where one would suffice, then one will expend close to twice the effort/time. It stands to reason. You're trying to argue that a single lookup somehow makes the code less clear (entirely false), therefore we should all use two lookups instead. It's a pointless argument. Please try to keep an open mind about alternative APIs.
Please re-read. Array-syntax lookup /by itself/ is borked. It has to be 
used in conjunction with 'in', and is therefore superfluous (since 'in' 
supplies the data anyway). 

Well not actually so. The "in" supplies the key and not the data. The data can be something totally different.

   real[char[]] AA;
   . . .
   real X;
   . . .
   if ("Some Key" in AA)
       X = AA["Some Key"];
   else
       -- Handle unknown key value.

What is your point there? That statement makes no sense at all. Here's the pointer version of your example:

   real[char[]] AA;
   ...
   real* x;
   ...
   x = ("Some Key" in AA);
   if (x)
       // do something with *x
   else
       // do something else

And here's a robust, simple, efficient API, sans pointers:

   real[char[]] aa;
   ...
   real x;
   ...
   if (aa.get("Some Key", x))
       // do something with x
   else
       // do something else
Sure; you can lookup the AA again if you 
wish, but your counter-argument is redundant; just like the additional 
lookup. As noted, the existence of s=AA[], by itself, encourages rather 
fragile code. D requires pointer-syntax to lookup an AA entry without 
GPFing (redundant multiple lookups aside).

D does not *require* pointer syntax. It is optional. But for completeness here is the pointer version.

I'll repeat what I said above: "D requires pointer-syntax to lookup an AA entry without GPFing (redundant multiple lookups aside)"

Your counter chooses to ignore the parenthesis. Restated: to avoid multiple lookups, and GPFs, one must use pointer syntax in D. Period.
What would have been nice, instead of Walter getting all upset over people
not liking his implementation, is to provide all four types of access.

   'enquiry' ::  Key in Array (returns pointer to Value or Null)
   'get'     ::  Value = Array[Key] (Gets the Value if it exists,
                                     error otherwise)
   'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
                                     it doesn't exist.)
   'initget' ::  Value = Array.initset(Key) (Gets the Value if it exists,
                                          otherwise creates an entry with
                                          .init values.)

Or any other equivalent syntax. The point is that there is no reason for
the old behaviour to be totally removed from the language, just shifted
away from being the default behaviour for 'Value = Array[Key]' syntax.



I agree. But I see you're insisting on force-fitting the [] syntax, resulting in a sub-optimal and overly busy API. All one needs is right here:

   bool get(key, inout value);
   void put(key, value);

That is simple, robust, intuitive, optimal, proven, succinct. No redundant lookups. No pointers anywhere to be seen.

The [] syntax seriously limits D in the API it can expose for these purposes. Which is why it's messy at this point. And it's why you have chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple methods are perfectly capable instead.
Nov 06 2005
next sibling parent "Regan Heath" <regan netwin.co.nz> writes:
On Sun, 06 Nov 2005 13:23:40 -0800, kris <fu bar.org> wrote:
 Derek Parnell wrote:
 Currently in D, when one attempts to retrieve a non-existent element in  
 an
 array, it causes a run-time error to occur. This applies to all array
 types: fixed-length, dynamic-length, and associative. (And yes, in the
 current D, an associative array is implemented as a hash-table.) The  
 type
 of error depends on whether the -release switch has been used or not.  
 If it
 has been used then a memory access violation occurs (ie. GPF under  
 unix),
 otherwise if -release was not used an ArrayBoundsError exception is  
 thrown.

I see you've bought into that. There is no such thing as an array-bounds error from the API of a hash-table, Derek. It's purely a manufactured idiom of the current API.

Sez you! ;)

Seriously though, I disagree. I think it depends on what you're using it for. I have found the thrown exception useful for catching bugs in at least one app I have been writing. The code in question assumed a value existed; it was a program error for it not to exist. Thus, the current implementation, the current API, was exactly what I desired in this case. "Array bounds error" may not be exactly what it is, but whatever you want to call it, an error when the item does not exist was a requirement in this case.

However, I agree with your original point. There are cases where it's never an error for the value to be non-existent; in fact I think perhaps it's more common for this to be the case. In which case, if you were to reword your statement above to say that an array bounds error was not common in the API of a hash table, I would be quite happy to agree.

You guys seem to be arguing about all the wrong things. How about we start with what we want, i.e.

1. ability to code different "use cases" in a clear and simple manner.
2. avoid double lookups if possible, without sacrificing #1.

The problem we all have with the current implementation is that in places #1 destroys #2 and vice-versa.

See my reply to Sean in another branch of this thread; it has the API I would most like to see, essentially the addition of a function to check and get an item without an exception. I believe this API satisfies #1 and #2 above.

Regan
Nov 06 2005
prev sibling next sibling parent Sean Kelly <sean f4.ca> writes:
kris wrote:
 
 I agree. But I see you're insisting on force-fitting the [] syntax, 
 resulting in a sub-optimal and overly busy API. All one needs is right 
 here:
 
 bool get(key, inout value);
 void put(key, value);
 
 That is simple, robust, intuitive, optimal, proven, succinct. No 
 redundant lookups. No pointers anywhere to be seen.
 
 The [] syntax seriously limits D in the API it can expose for these 
 purposes. Which is why it's messy at this point. And it's why you have 
 chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple 
 methods are perfectly capable instead.

I like the [] and 'in' syntax as it was originally implemented, as it covered the majority of cases in which I typically use dictionaries: either testing for existence or adding/modifying something already there. I personally have never used the [] syntax, for example, in instances where I did not want a value to be created if one did not exist, assuming it's modifying an lvalue, i.e.

   var[key]++;
   var[key] = val;

The only sticky issue with this syntax is how to handle rvalue expressions:

   x = var[key];

Does the above insert or merely return the init() value? I would prefer the latter, but I can see how it would be confusing. Assuming creation in all cases seems entirely reasonable to me, and it would be consistent with the C++ syntax.

That aside, I would like to see your proposed get/put syntax added as it is both meaningful and relatively succinct.

Sean
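The lvalue case described above can be illustrated with a word counter. This is a sketch assuming a present-day D compiler, where an lvalue use such as `counts[w]++` creates a missing entry while a plain rvalue read does not insert one; the function name `countWords` is hypothetical.

```d
// Sketch only: assumes present-day D; `countWords` is a hypothetical name.
int[string] countWords(string[] words)
{
    int[string] counts;
    foreach (w; words)
        counts[w]++;   // lvalue use: a missing key starts at int.init, then is incremented
    return counts;
}
```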
Nov 06 2005
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 06 Nov 2005 13:23:40 -0800, kris wrote:

 I apologize for the long post. The salient point is right at the end, so 
 please skip over the point/counterpoint argy-bargy.
 
 
 Derek Parnell wrote:
 Currently in D, when one attempts to retrieve a non-existent element in an
 array, it causes a run-time error to occur. This applies to all array
 types: fixed-length, dynamic-length, and associative. (And yes, in the
 current D, an associative array is implemented as a hash-table.) The type
 of error depends on whether the -release switch has been used or not. If it
 has been used then a memory access violation occurs (ie. GPF under unix),
 otherwise if -release was not used an ArrayBoundsError exception is thrown.

I see you've bought into that. There is no such thing as an array-bounds error from the API of a hash-table, Derek. It's purely a manufactured idiom of the current API.

That may be true. However I am working with what we've got, knowing that change to D is such an unlikely thing that we'd really be better off building Beeblebrox's Probability Drive.
 The problem is that you don't like this behaviour for associative arrays.

Really? If there's something I "don't like" here, it is an API that is problematic purely for the sake of using a particular syntax. You've read the tale of the Emperor's New Clothes, haven't you?

I see where you're coming from now. And I have to agree that the functionality of AAs is being restricted by a strict adherence to the 'array' style of syntax.
 I assume that when trying to fetch a non-existent element you would either
 like the element to be automatically created with .init value(s) and/or to
 return some initialized value, or to always throw an ArrayBoundsError
 regardless of the -release status. Which is it you'd like to see happen?

I can't understand why you feel these are the only options, Derek. I agree those are perhaps the options when using array-syntax, but that's exactly where the problem lies. Neither of your two options are attractive; particularly so when they are purely artificial constraints.

Can you help me see another option? If one is trying to access an non-existent element, one either wants to know that it didn't exist or wants a default value returned. What else could there be?
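The "default value returned" option can be sketched with the built-in `get` property of present-day D associative arrays, which is an assumption relative to the 2005 compiler discussed here. The function name `portFor` and the sample table are hypothetical.

```d
// Sketch only: assumes present-day D; `portFor` and its data are hypothetical.
int portFor(int[string] ports, string service)
{
    // `get` returns the stored value, or the supplied default when the key
    // is absent. It does not insert an entry for the missing key.
    return ports.get(service, -1);
}
```

The caveat is the one raised earlier about `s.length == 0`: a default only signals a missing key when it cannot also be a legitimate stored value.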
I failed miserably to get your drift here. 

I apologize. Sometimes I'm not as good with words as I think I am. I made the assumption that the library managed an AA, and that a public function was available that fetches data from that AA based on a supplied key in one of the parameters. I was just saying that if this is the case, then you'd be wise to validate the key data prior to fetching from the AA based on the externally supplied key value.

This is hardly on topic, and smells of smoke. I have to remind you that you do not, and should not, require redundant lookups to check if an entry exists before fetching it from a hash-table.

Of course one does not *require* redundant run time lookups. I still think that I was 'on topic'. I thought the original post came about because you have a library routine that GPFed when presented with a non-existent key. My point was that **GIVEN THE TOOLS WE HAVE** you'd be wise to cater for the possibility of 'bad' keys. If we had other tools (for example, a better AA functionality) then you'd approach this topic differently.
Hash tables are *not* like arrays. If they don't contain a key it is surely not
a reason to GPF. Is 
it? 

D's associative arrays are a specific type of hash table. The entries in the table are based on keys. And I agree, a GPF is only one of the possible implementation behaviors that are possible in response to a fetch attempt for an element that does not exist.

Well, thank goodness. But, "specific type"? It's just a plain old hash-table, with some unwieldy syntax bolted onto it. The latter is the problem, not the former.

Okay. 'Specific type' in the sense that some hash tables are only used to detect the presence of the element keys, whereas other types of hash tables associate non-key data with the elements.
We're talking about this code causing a GPF:

char[char[]] AA;

char[] s = AA["unforseen key"];  // GPF; can't check for a null return

This is why you might benefit from Walter reestablishing this sort of behaviour in D - in addition to the current AA behaviour. Sophisticated coders such as yourself can use such facilities.

Sophisticated coders? Yer arse <g>. Hash tables are supposed to be trivial from the perspective of the user.

Totally agree.
 
   char[] s = AA.initset("unforseen key");
 
 Now you can check for s.length == 0 if that's important to you. Of course,
 that isn't always a perfect way of detecting unforseen key accesses.
 

Complexity just for the sake of it. This is entirely unnecessary.

We agree to differ.
 
 In spite of the "double lookup" effect, I would still code it thus ...
 
   char[] s;
   if ("unforseen key" in AA)
      s = AA["unforseen key"];
   else
      -- some error processing if appropriate.
 
 because it tells the reader of the code that it is possible to get bad keys
 and it implements a way to handle those unambiguously.

I see. Pray explain why this alternate API is so appalling by comparison:

   char[] s;
   if (aa.get("unforseen key", s))
       // do something with s
   else
       // do something else

It isn't appalling. Did I say that it was? In fact it is identical to my example, except for the syntax. I'm trying to discuss concepts, and not syntax.
 Look! No redundant lookups! No pointers! It must be magic! And, to quote 
 you, "it tells the reader of the code that it is possible to get bad 
 keys and it implements a way to handle those unambiguously". Wouldn't 
 you agree?

Yes. It is identical to my code (bar the syntax).
Let's try to stay in the land of reason here. Yes, you can come up with 
all sort of ways to /make/ it work with /multiple/ lookups. Walter 
suggested a way to do it with three lookups instead. I know you 
appreciate optimal code paths, Derek, so can we sidestep this please? 

You might be misunderstanding me, now. I prize maintainable source code over runtime performance any day. If run time performance is really such an issue, code in assembler; otherwise get back into the land of reason.

Entirely misleading. Look at the example above and reconsider. You appear to be trying to turn this into something unrelated, Derek. Please desist. Yes, there is a performance related aspect here, but only because you insist on applying entirely redundant lookups. One can write perfectly clear intentions (arguably more so) by using an alternate, and more appropriate, API.

I agree. I didn't know your issue was with the syntax, as your original post was talking about GPFs and not syntax. I admit my mistake in not understanding your point of view regarding syntax.
The above code has two lookups, where only one should be necessary. I 
sure hope you avoid multiple lookups within Build?

Not if I can help it ;-) By the way, Build runs pretty fast in spite of me 'wasting' cycles checking for valid AA keys.

Build is a great tool. However, it appears as though Build takes longer to execute than both the compiler plus linker together.

That would be because it does a shit load of work before calling the bloody compiler and linker!
 It is not a high 
 performance application, because it doesn't really need to be. 

Exactly. And if it were, I'd definitely reconsider some of the coding idioms used.
But 
 that's hardly important since we're talking about API's here. Build is 
 great at what it does ~ it does not represent every application.
 
 Again, your statement vaguely implies that I argue against checking for 
 valid AA keys. That's silly, Derek. I'm claiming that one can clearly 
 and unambiguously both test the existence of, and avoid redundant 
 lookups upon, a hash-table entry by using a more appropriate API.

That was the part that I didn't get. Sorry for the waste of bandwidth.
 Isn't that what I said? Your code is performing a 'get' and not an
 'enquiry'. 

As far as HT's are concerned, a get is equivalent to a query.

Only for certain types of hash tables. If I want to get the data associated with a key I need to validate the key before getting the data.
 You can 
 argue about separating them all you wish, but you're simply arguing for 
 redundant lookups. To avoid this is exactly why Walter is returning a 
 pointer from the 'in' statement. Are you disagreeing with all perspectives?

No! Where did that come from?! I imagine that a pointer is being returned so that the coder can get access to data (not the key) when a valid key is presented.
 
Au contraire, my friend ~ unless you're prepared to perform unnecessary 
multiple lookups. I don't consider redundant lookups to be relevant, and 
neither should anyone following this ridiculous saga.

I must be one of the clowns then. I don't follow your philosophy anymore. Cost of the application over time is more important to me than trivial optimizations. Trivial in the sense that if it doesn't account for more than 5% of a program's execution time, why optimize it to death? My philosophy regarding this is more along the lines of: code it legibly first, and then profile it to locate areas that are worth optimizing.

(the saga is ridiculous because it's years old, whilst perfectly suitable alternatives have existed for decades)

You're welcome to do double lookups all you want, Derek. However, you're insisting that D remain staunchly oblivious to better alternatives.

How *do* you read this into my words? I cannot understand where I have said that the current D syntax is the best available and we should stop looking for better. I'm sure that anyone could discover, with a short scan of previous posts, that I'm one of Walter's biggest critics. D is a great language but I'm one of the first to say that some decisions that Walter has made are terrible (IMNSHO), and that some other non-decisions are inexcusable.
 There's so much spin in your counter that I'm feeling dizzy. You're 
 attempting to suggest I don't care a whit about legibility, and that any 
 quest to avoid redundant code is misguided. That's utter nonsense.

As are the words you've placed into my posts ;-)
 And you have measured this, right?

Nobody needs to measure it, Derek. If one executes two lookups where one would suffice, then one will expend close to twice the effort/time. It stands to reason. You're trying to argue that a single lookup somehow makes the code less clear (entirely false), therefore we should all use two lookups instead. It's a pointless argument. Please try to keep an open mind about alternative APIs.

My mind is not, and has never been closed (on that issue anyway). Of course two lookups are going to take longer than one lookup! But there are some situations that it doesn't actually matter.
Please re-read. Array-syntax lookup /by itself/ is borked. It has to be 
used in conjunction with 'in', and is therefore superfluous (since 'in' 
supplies the data anyway). 

Well not actually so. The "in" supplies the key and not the data. The data can be something totally different.

   real[char[]] AA;
   . . .
   real X;
   . . .
   if ("Some Key" in AA)
       X = AA["Some Key"];
   else
       -- Handle unknown key value.

What is your point there? That statement makes no sense at all. Here's the pointer version of your example:

   real[char[]] AA;
   ...
   real* x;
   ...
   x = ("Some Key" in AA);
   if (x)
       // do something with *x
   else
       // do something else

And here's a robust, simple, efficient API, sans pointers:

   real[char[]] aa;
   ...
   real x;
   ...
   if (aa.get("Some Key", x))
       // do something with x
   else
       // do something else

The only difference is syntax. The concepts are the same.
Sure; you can lookup the AA again if you 
wish, but your counter-argument is redundant; just like the additional 
lookup. As noted, the existence of s=AA[], by itself, encourages rather 
fragile code. D requires pointer-syntax to lookup an AA entry without 
GPFing (redundant multiple lookups aside).

D does not *require* pointer syntax. It is optional. But for completeness here is the pointer version.

I'll repeat what I said above: "D requires pointer-syntax to lookup an AA entry without GPFing (redundant multiple lookups aside)"

Agreed.
 Your counter chooses to ignore the parenthesis. Restated: to avoid 
 multiple lookups, and GPFs, one must use pointer syntax in D. Period.

Agreed.
What would have been nice is instead of Walter get all upset over people
not liking his implementation, is to provide all four types of access.

   'enquiry' ::  Key in Array (returns pointer to Value or Null)
   'get'     ::  Value = Array[Key] (Gets the Value if it exists,
                                     error otherwise)
   'set'     ::  Array[Key] = Value (Sets/Replaces the Value. Creates if
                                     it doesn't exist.)
   'initget' ::  Value = Array.initset(Key) (Gets the Value if it exists,
                                          otherwise creates an entry with
                                          .init values.)

Or any other equivalent syntax. The point is that there is no reason for
the old behaviour to be totally removed from the language, just shifted
away from being the default behaviour for 'Value = Array(Key)' syntax.
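
The four access styles listed above can be sketched against a built-in AA. This is only an illustration, assuming a current D2 compiler rather than the 2005-era one under discussion; `initGet` is a hypothetical free function standing in for the proposed 'initget', not something the language provides under that name.

```d
import std.stdio;

// Hypothetical helper (an assumption for illustration): returns the value
// stored under `key`, first creating an entry holding V.init if none exists.
V initGet(K, V)(ref V[K] aa, K key)
{
    if (auto p = key in aa)   // `in` yields a pointer to the value, or null
        return *p;
    aa[key] = V.init;         // create the entry with the default value
    return V.init;
}

void main()
{
    int[string] counts;

    // 'enquiry': pointer to the value, or null when the key is absent
    writeln(("a" in counts) is null);

    // 'set': creates or replaces
    counts["a"] = 3;

    // 'get': fails at run time if the key is absent, so only used on a known key
    int v = counts["a"];
    writeln(v);

    // 'initget': creates the entry on demand
    int w = counts.initGet("b");
    writeln(w, " ", ("b" in counts) !is null);
}
```

Note that the create path of `initGet` still hashes the key twice (once for `in`, once for the assignment); only a built-in could fold that into one lookup.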



I agree. But I see you're insisting on force-fitting the [] syntax, resulting in a sub-optimal and overly busy API.

Well actually it turns out that I was just trying to work within the constraints that Walter has implemented. I didn't know that you were really advocating syntax change. I wish for better functionality in AA's too.
 All one needs is right here:
 
 bool get(key, inout value);
 void put(key, value);
 
 That is simple, robust, intuitive, optimal, proven, succinct. No 
 redundant lookups. No pointers anywhere to be seen.

Well, 'inout' implements a pointer, but that's splitting hairs.
 The [] syntax seriously limits D in the API it can expose for these 
 purposes. Which is why it's messy at this point. And it's why you have 
 chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple 
 methods are perfectly capable instead.

Yep. A new syntax for AA would be a wonderful addition to D.

--
Derek Parnell
Melbourne, Australia
7/11/2005 9:15:39 AM
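
For reference, the `bool get(key, inout value)` / `put(key, value)` pair debated in this post can be approximated with a free function over the built-in AA. This is a rough sketch under stated assumptions: it targets a current D2 compiler (where the `inout` parameter of this thread is spelled `out` or `ref`), and `tryGet` is a hypothetical name chosen to avoid clashing with anything the runtime already offers. As noted above, the pointer does not disappear; it is just hidden inside the helper.

```d
import std.stdio;

// Hypothetical helper (an assumption, not an existing AA method):
// one lookup, result reported as a bool, value copied out on success.
bool tryGet(K, V)(V[K] aa, K key, out V value)
{
    if (auto p = key in aa)   // the single lookup
    {
        value = *p;
        return true;
    }
    return false;             // `value` keeps V.init, set on entry by `out`
}

void main()
{
    real[string] aa;
    aa["Some Key"] = 1.5;     // 'put' is just the existing index assignment

    real x;
    if (aa.tryGet("Some Key", x))
        writeln("found: ", x);
    else
        writeln("unknown key");

    if (!aa.tryGet("Other Key", x))
        writeln("unknown key");
}
```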
Nov 06 2005
parent kris <fu bar.org> writes:
Thanks ~ I'm really glad that's cleared up! Some replies inline:

Derek Parnell wrote:
 On Sun, 06 Nov 2005 13:23:40 -0800, kris wrote:

 That may be true. However I am working with what we've got, knowing that
 change to D is such an unlikely thing that we'd really be better off
 building Beeblebrox's Probability Drive.

Yes ~ there is that aspect. We should know better by now <g>
 Can you help me see another option? If one is trying to access a
 non-existent element, one either wants to know that it didn't exist or
 wants a default value returned. What else could there be?

Oh, I think one wants to know the entry does not exist; just via a more suitable API. I believe a "bool get(key, inout value)" style of function resolves those issues. I suspect you'd agree. It's something that various folks were asking for /eons/ ago. Worth revisiting, I felt.
 My point was that **GIVEN THE TOOLS WE HAVE** you'd be wise to cater for
 the possibility of 'bad' keys. If we had other tools (for example, a better
 AA functionality) then you'd approach this topic differently.

Point taken. Sorry for misconstruing your perspective.
 That would be because it does a shit load of work before calling the bloody
 compiler and linker!

Sure, it will take longer ~ I meant it takes about twice as long. Of course, that's neither here nor there since Build is such a great tool.
There's so much spin in your counter that I'm feeling dizzy. You're 
attempting to suggest I don't care a whit about legibility, and that any 
quest to avoid redundant code is misguided. That's utter nonsense.

As are the words you've placed into my posts ;-)

Touche <g>
The [] syntax seriously limits D in the API it can expose for these 
purposes. Which is why it's messy at this point. And it's why you have 
chosen to suggest 4 methods, plus the use of pointers, whereas 2 simple 
methods are perfectly capable instead.

Yep. A new syntax for AA would be a wonderful addition to D.

I feel it is actually more essential than that ~ but wholeheartedly agree otherwise. The original syntax could well have stayed intact, if it were bolstered via the addition of a "bool get(key, inout value)" method.
Nov 06 2005
prev sibling parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
I would strongly argue that if you want such checking, you should have 
two versions of the library: one with -release, and one without.

For example, if I write a C program in release mode, and pass negative 
coordinates to a function that renders data to the screen (obviously, 
assuming it took 'int's), I would not be surprised if it crashed.  Nor 
would I complain to the makers of my compiler or library.  I am passing 
bad data.

It's clear you disagree; you want to catch every case of the bad data 
(even though, I'm entirely sure, there are places in your library where 
your OWN logic might cause bugs/crashes because of bad data.)

Back to the dual library concept, I think this is more of an argument 
for that than for changing the way associative arrays are handled 
*again*.  If I could have some way to compile my D program with the 
contracts-on version of phobos, I'm sure that would be a great gain.

Anyway, your arguments are also flawed, as follows:

1. This is true.

2. This is true, only in the case that you use release mode and want to 
avoid GPFs when bad data is provided (assuming that a GPF can be stack 
traced, either by code in the program or a debugger.  That is outside 
this issue, so we assume reasonable-case.)

3. This is not true.  Having a bike, even if you need to use it to get 
gas sometimes, does not render your car redundant nor useless.  Even if 
gas prices are so high that you cannot use the car, that does not mean 
your wife would appreciate you selling it.  As argued elsewhere, the 
usage of the in statement does not necessitate using pointers to access 
the data at all.

4. Obviously, this is a bizarre conclusion to make.  For your uses, 
surely we might agree that the array-like syntax isn't commonly useful, 
but for other usage - indeed, for common usage - I really can't see such 
a wild statement being true.

Furthermore, saying that having an array-style syntax is an invitation 
to writing bad code is something some can (and have) said about arrays, 
pointers, classes, class-less functions, couches, and generally 
everything else.  Yes, for your uses of it, an inexperienced novice 
might fall into bad habits with such syntax, but that does not again 
mean it applies everywhere.

-[Unknown]
Nov 06 2005
parent reply kris <fu bar.org> writes:
Good points, but with some caveats:

Unknown W. Brackets wrote:
 I would strongly argue that if you want such checking, you should have 
 two versions of the library: one with -release, and one without.
 
 For example, if I write a C program in release mode, and pass negative 
 coordinates to a function that renders data to the screen (obviously, 
 assuming it took 'int's), I would not be surprised if it crashed.  Nor 
 would I complain to the makers of my compiler or library.  I am passing 
 bad data.
 
 It's clear you disagree; you want to catch every case of the bad data 
 (even though, I'm entirely sure, there are places in your library where 
 your OWN logic might cause bugs/crashes because of bad data.)

I must have made a serious mistake in the couching of that argument, since I'm not at all familiar with this angle. Sorry about that. The intent was to identify where [] syntax limits the expressiveness of the AA API; to the point where it trips over itself. Wanted to isolate that as a discussion point before making any suggestion as to how it might be resolved. Clearly, I failed pitifully in that goal.

As to catching a bad case of data, I'm much happier dealing with that on my own (rather than the compiler assuming it knows all). Given an amended API, everything would be groovy.
 Anyway, your arguments are also flawed, as follows:

 3. This is not true.  Having a bike, even if you need to use it to get 
 gas sometimes, does not render your car redundant nor useless.  Even if 
 gas prices are so high that you cannot use the car, that does not mean 
 your wife would appreciate you selling it.  As argued elsewhere, the 
 usage of the in statement does not necessitate using pointers to access 
 the data at all.

Well stated; though this point was amended to exclude multiple lookups as a reasonable alternative (as had originally been assumed). Hence, I feel it stands.
 
 4. Obviously, this is a bizarre conclusion to make.  For your uses, 
 surely we might agree that the array-like syntax isn't commonly useful, 
 but for other usage - indeed, for common usage - I really can't see such 
 a wild statement being true.

Fair enough. I feel it's valid, since I don't see any point of using [] rvalues without using 'in' to avoid GPFs. Given the need for 'in', one doesn't need a redundant lookup via [].

But, hey ~ if we were to get an additional method of the style that's apparently agreeable, then we'll have good reason to rejoice and to leap about with gay abandon:

# bool get(key, inout value);  // or some equivalent twist on opArray()
 Furthermore, saying that having an array-style syntax is an invitation 
 to writing bad code is something some can (and have) said about arrays, 
 pointers, classes, class-less functions, couches, and generally 
 everything else.  Yes, for your uses of it, an inexperienced novice 
 might fall into bad habits with such syntax, but that does not again 
 mean it applies everywhere.

Indeed <G> Cheers.
Nov 07 2005
parent reply "Unknown W. Brackets" <unknown simplemachines.org> writes:
 I must have made a serious mistake in the couching of that argument, 
 since I'm not at all familiar with this angle. Sorry about that. The 
 intent was to identify where [] syntax limits the expressiveness of the 
 AA API; to the point where it trips over itself. Wanted to isolate that 
 as a discussion point before making any suggestion as to how it might be 
 resolved. Clearly, I failed pitifully in that goal.

Perhaps it does. But, this is contrasted with the fact that every major/popular C-like language that has associative arrays built in - C#, PHP, Perl, and others - uses the same syntax. It may be well and good to explain how much healthier an apple is than candy, but that doesn't mean stores are going to replace their candy with apples.

In PHP, typically, you do something like this:

    isset($var['associative key']) ? $var['associative key'] : 'fall back';

It's actually pretty ugly. You'll also notice the double lookup. Otherwise, you may get an error - if associative key does not exist. Mind you, most people ignore this error (which leads to bugs.) The PHP Group is actually considering some other sort of syntax, like ifsetor($var['associative key'], 'fall back'), which is much easier.

But, still, that's using (or maybe abusing) the array-style syntax. On the other hand, this:

    int val;
    if (assoc.get("associative key", val))
        writef("associative key: %d\n", val);

Looks like a class, yes, but not like an associative array. I must say, while this has been the strength of C++ in many ways (as little as possible built in), it has also been (in my view) the reason why other languages, like even D, look so much better.
 Well stated; though this point was amended to exclude multiple lookups 
 as a reasonable alternative (as had originally been assumed). Hence, I 
 feel it stands.

Only assuming the worst-case theory that multiple lookups will happen. I think it very reasonable to assume a compiler might cache/optimize out such double lookups, even if the current does not.

This reminds me of web browsers. You may not know, but there is one from the W3C (which writes/wrote HTML itself.) It's called Amaya. It sucks. In fact, I'm not sure it handles their own standards as well as Opera/Mozilla/Safari.

While DMD may be good (I especially like the compile times), time will only tell if it's the best compiler for D in the future.

-[Unknown]
Nov 07 2005
parent "Kris" <fu bar.com> writes:
"Unknown W. Brackets" <unknown simplemachines.org> wrote
 In PHP, typically, you do something like this:

 isset($var['associative key']) ? $var['associative key'] : 'fall back';

 It's actually pretty ugly.  You'll also notice the double lookup. 
 Otherwise, you may get an error - if associative key does not exist. Mind 
 you, most people ignore this error (which leads to bugs.)  The PHP Group 
 is actually considering some other sort of syntax, like 
 ifsetor($var['associative key'], 'fall back'), which is much easier.

Interesting, although scripting languages do have a different set of priorities.
 But, still, that's using (or maybe abusing) the array-style syntax.  On 
 the other hand, this:

 int val;
 if (assoc.get("associative key", val))
    writef("associative key: %d\n", val);

 Looks like a class, yes, but not like an associative array.  I must say, 
 while this has been the strength of C++ in many ways (as little as 
 possible built in), it has also been (in my view) the reason why other 
 languages, like even D, look so much better.

Perhaps. Though I'd argue that the functionality is more important than the way it looks. Maybe you'd prefer an array-style version of the above:

    bool opIndex (key, inout value);

    ~~~~~~~~~~~~~~~~~~~~~
    int val;
    if ( assoc["key", val] )
         writef("key: %d\n", val);
    ~~~~~~~~~~~~~~~~~~~~~

Would that be more apropos?
 While DMD may be good (I especially like the compile times), time will 
 only tell if it's the best compiler for D in the future.

True. Yet one must be able to depend on the feature-set across compiler implementations. This is a particularly sensitive concern since AAs are not part of the library; instead they are embedded within the language proper.

- Kris
Nov 07 2005
prev sibling parent reply "Walter Bright" <newshound digitalmars.com> writes:
"Kris" <fu bar.com> wrote in message news:djumru$1a9$1 digitaldaemon.com...
 although this new behaviour of
 throwing an exception is, I think, highly questionable. How did that get

 the wolves in the first place? <g>

It was repeatedly asked for.
 For example, I quite often use an AA to identify things as 'special' ~ URL
 schemes for example. If the scheme is not in the AA then it ain't special.
 The missing case is /not/ exceptional; instead it is actually the norm; I
 certainly don't wish to be catching exceptions for the normal case (from
 either a semantic or performance perspective). Nor do I wish to use

 for such usual, simplistic, cases.

That's why the 'in' version is there.
 OTOH, what you did with the 'in' keyword and pointers improved that aspect
 of it ~ if one wants to eliminate a potential double-lookup then one can

 the pointer syntax. Good!

 The problem here is that, at the same time, you changed the semantics of a
 simple lookup such that it now either requires pointer-syntax, the

 of try/catch, or yet another lookup. I think that was a mistake, and am

 too shy to say so :-)

I liked the previous syntax, and the way it worked, because it was efficient. But nobody, not one, spoke out in favor of it, and all heaped ridicule on it (and not completely without merit, I threw in the towel on it when it was pointed out that javascript didn't do it that way either, though I thought it did). Sorry if I'm a little sensitive about this <g>.
 Lastly, I /do/ actually have a kind word to say about the original
 implementation: other than the potential double-lookup, it was fast, and

 was simple. I still think AA's could/should have been handled via

 when they came along, and could therefore have been treated as a library
 utility rather than being built into the compiler itself. Regardless, the
 usage model is now arguably slower and more complex than before ~ largely
 negating the effort of placing AA's within the compiler in the first

 IMO.

For most uses of AAs, the lookups of existing entries one expects to be in there far outnumber the test/set style, so while it is a bit slower, it isn't appreciably.
Nov 05 2005
parent reply Sean Kelly <sean f4.ca> writes:
Walter Bright wrote:
 
 I liked the previous syntax, and the way it worked, because it was
 efficient. But nobody, not one, spoke out in favor of it, and all heaped
 ridicule on it (and not completely without merit, I threw in the towel on it
 when it was pointed out that javascript didn't do it that way either, though
 I thought it did). Sorry if I'm a little sensitive about this <g>.

For what it's worth, I liked it too. And I believe the C++ map worked this way (as justification for the design).
 For most uses of AAs, the lookups of existing entries one expects to be in
 there far outnumber the test/set style, so while it is a bit slower, it
 isn't appreciably.

I would have preferred leaving the existing syntax as-is and adding a new method called 'find' or some such that returned a pointer to the element or null if it doesn't exist.

Sean
Nov 06 2005
next sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Sun, 06 Nov 2005 11:29:28 -0800, Sean Kelly <sean f4.ca> wrote:
 Walter Bright wrote:
  I liked the previous syntax, and the way it worked, because it was
 efficient. But nobody, not one, spoke out in favor of it, and all heaped
 ridicule on it (and not completely without merit, I threw in the towel  
 on it
 when it was pointed out that javascript didn't do it that way either,  
 though
 I thought it did). Sorry if I'm a little sensitive about this <g>.

For what it's worth, I liked it too. And I believe the C++ map worked this way (as justification for the design).
 For most uses of AAs, the lookups of existing entries one expects to be  
 in
 there far outnumber the test/set style, so while it is a bit slower, it
 isn't appreciably.

I would have preferred leaving the existing syntax as-is and adding a new method called 'find' or some such that returned a pointer to the element or null if it doesn't exist.

I think we can and should avoid pointers.

I think the types of things we want to do can be broken into categories:

1. 'check' for existence of an item.
2. 'check' for existence of an item and get it.
3. 'get' value, error if not exists.
4. 'set' value, create or replace existing.
[optional]
5. 'set' value if not existing, i.e. create only, don't replace.
6. 'set' value if existing, i.e. replace only, don't create.

I think ideally we want to be able to achieve all of the above without double lookups. I think we can, or come pretty close without too many changes, here is what I recommend:

#1 - 'key in aa'
leave it as is, or change it back to returning true/false.
(NOCHANGE)

#2 - 'aa.finds(key,[out]value)'
returns true/false and gets value if existing.
(ADD)

#3
- 'value = aa.get(key)'
- 'value = aa[key]'
returns value or throws error.
(ADD/NOCHANGE)

#4
- 'aa.set(key,value)'
- 'aa[key] = value'
creates or replaces value for key with (v).
(ADD/NOCHANGE)

[optional]

#5 - 'aa.create(key,[inout]value)'
return true and assign value if non-existent (creating it), false otherwise and get existing value.
(ADD)

#6 - 'aa.replace(key,value)'
return true and assign value if exists (replacing), false otherwise.
(ADD)

#5 and #6 can be done with double lookups using 'find' eg.

if (!aa.find(key,cur)) { aa[key] = value; }
if (aa.find(key,cur)) { aa[key] = value; }

I believe we need one get/find method that throws and one that doesn't (get throws, find doesn't) allowing us to make our intentions clear, i.e. you use the one that throws in cases where the item should exist, and not existing is an error.

I think 'find' is the essential component of AA's which we are missing at present.

I really don't mind what the syntax looks like, be it method style i.e. "value = aa.get(key)" or array style "value = aa[key]". However, I think one consistent style is a good idea, and I don't think it's possible for the array style to represent the different intentions we have, which is why 'find' is essential.
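
The optional #5 and #6 above might look roughly like the following when layered on the pointer-returning 'in'. This is a sketch only, assuming a current D2 compiler (`ref` in place of this thread's `inout`); `createOnly` and `replaceOnly` are hypothetical names used for illustration, not existing AA methods.

```d
import std.stdio;

// Hypothetical #5: create only. Returns true and stores `value` if the key
// was absent; otherwise returns false and copies the existing value out.
bool createOnly(K, V)(ref V[K] aa, K key, ref V value)
{
    if (auto p = key in aa)
    {
        value = *p;           // hand back what is already stored
        return false;
    }
    aa[key] = value;
    return true;
}

// Hypothetical #6: replace only. Returns true and overwrites in place if
// the key exists; otherwise returns false and inserts nothing.
bool replaceOnly(K, V)(V[K] aa, K key, V value)
{
    if (auto p = key in aa)
    {
        *p = value;           // write through the pointer, no second lookup
        return true;
    }
    return false;
}

void main()
{
    int[string] aa;
    int v = 1;

    writeln(aa.createOnly("k", v));    // key absent, entry created
    v = 5;
    writeln(aa.createOnly("k", v));    // key present, v now holds the old value
    writeln(aa.replaceOnly("k", 9));   // key present, value overwritten
    writeln(aa.replaceOnly("zz", 1));  // key absent, nothing inserted
}
```

`replaceOnly` needs a single lookup, but the create path of `createOnly` still hashes the key twice; avoiding that would take support inside the AA implementation itself.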
Regan
Nov 06 2005
next sibling parent Derek Parnell <derek psych.ward> writes:
On Mon, 07 Nov 2005 10:13:22 +1300, Regan Heath wrote:


[snip]
 I think the types of things we want to do can be broken into categories:
 
 1. 'check' for existence of an item.
 2. 'check' for existence of an item and get it.
 3. 'get' value, error if not exists.
 4. 'set' value, create or replace existing.
 [optional]
 5. 'set' value if not existing, i.e. create only, don't replace.
 6. 'set' value if existing, i.e. replace only, don't create.

Well said. I think you are on to a winner here. [snip]
 I really don't mind what the syntax looks like, be it method style i.e.  
 "value = aa.get(key)" or array style "value = aa[key]".

Totally agree with you. I'm not wedded to either syntax. However, we should really stop calling Associative Arrays, "arrays" if we drop the array syntax ;-)
 However, I think one consistent style is a good idea, and I don't think  
 it's possible for the array style to represent the different intentions we  
 have, which is why 'find' is essential.

Agreed. The array syntax only covers some of the desired behaviours one would want to see in a hash-table (a.k.a. AA)

--
Derek Parnell
Melbourne, Australia
7/11/2005 9:11:40 AM
Nov 06 2005
prev sibling parent Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:
Regan Heath wrote:
...
 I think we can and should avoid pointers.
 
 I think the types of things we want to do can be broken into categories:
 
 1. 'check' for existence of an item.
 2. 'check' for existence of an item and get it.

 
 #1 - 'key in aa'
 leave it as is, or change it back to returning true/false.
 (NOCHANGE)
 
 #2 - 'aa.finds(key,[out]value)'
 returns true/false and gets value if existing.
 (ADD)
 

operations should be doable with methods (namely #1 too).

--
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
Nov 07 2005
prev sibling parent reply kris <fu bar.org> writes:
Sean Kelly wrote:
 
 I would have preferred leaving the existing syntax as-is and adding a 
 new method called 'find' or some such that returned a pointer to the 
 element or null if it doesn't exist.
 
 
 Sean

If you mean adding an AA method similar to this:

# bool get(key, inout value);

... then I'd fully agree with you. I think adding such a method is a good way to satisfy/resolve so many different requirements, and tastes. That particular signature avoids pointer usage and redundant lookups. An alternative would be to twist the array syntax some more, to do the same thing:

# bool opArray(key, inout value);

I do like how Walter changed 'in' (returning a pointer), since that can be useful for integration with C functions. But the AA[] rvalue, x = AA["foo"], change could be reverted in the presence of that new method.
Nov 06 2005
next sibling parent "Regan Heath" <regan netwin.co.nz> writes:
On Sun, 06 Nov 2005 19:57:15 -0800, kris <fu bar.org> wrote:
 Sean Kelly wrote:
  I would have preferred leaving the existing syntax as-is and adding a  
 new method called 'find' or some such that returned a pointer to the  
 element or null if it doesn't exist.
   Sean

If you mean adding an AA method similar to this:

# bool get(key, inout value);

... then I'd fully agree with you. I think adding such a method is a good way to satisfy/resolve so many different requirements, and tastes. That particular signature avoids pointer usage and redundant lookups. An alternative would be to twist the array syntax some more, to do the same thing:

# bool opArray(key, inout value);

I do like how Walter changed 'in' (returning a pointer), since that can be useful for integration with C functions.

Good point, I hadn't thought of that. I was seeing this as redundant in the face of a 'get' function as shown above.
 But the AA[] rvalue, x = AA["foo"], change could be reverted in the  
 presence of that new method.

Do you mean reverted all the way back to inserting on lookup? i.e.

    x = AA["foo"];

So, this causes the creation and insertion of an item for the key "foo" (assuming none existed prior to the call) into AA? I can't see what advantage that gives us over returning typeof(v).init and not inserting?

Regan
Nov 06 2005
prev sibling parent "Ameer Armaly" <ameer_armaly hotmail.com> writes:
"kris" <fu bar.org> wrote in message news:dkmio5$1qtr$1 digitaldaemon.com...
 Sean Kelly wrote:
 I would have preferred leaving the existing syntax as-is and adding a new 
 method called 'find' or some such that returned a pointer to the element 
 or null if it doesn't exist.


 Sean

If you mean adding an AA method similar to this:

# bool get(key, inout value);

 ... then I'd fully agree with you. I think adding such a method is a good 
 way to satisfy/resolve so many different requirements, and tastes. That 
 particular signature avoids pointer usage and redundant lookups. An 
 alternative would be to twist the array syntax some more, to do the same 
 thing:

 # bool opArray(key, inout value);


 I do like how Walter changed 'in' (returning a pointer), since that can be 
 useful for integration with C functions. But the AA[] rvalue, x = 
 AA["foo"], change could be reverted in the presence of that new method. 

Nov 07 2005
prev sibling parent reply "Kris" <fu bar.com> writes:
Ah. I just remembered that AA lookup's would insert an empty entry if one 
was not there already ... that behaviour has been exchanged for throwing an 
exception instead. Bleah.  As for idioms (mentioned below), in this case 
they can be optimized by the API rather than by the compiler.

I believe there is a way, within D, to elegantly resolve these ongoing 
issues ... if you'd be open to change, then I'd be happy to make some 
suggestions <g>

- Kris


"Walter Bright" <newshound digitalmars.com> wrote in message 
news:djuj2d$2vvv$1 digitaldaemon.com...
 "Kris" <fu bar.com> wrote in message 
 news:dju31o$2he6$1 digitaldaemon.com...
  I think this particular change to AA's is just flat-out bogus

Nobody had a nice word to say about the original implementation. Just write it as:

    if (!(key in map))
        map[key] = new Record;
    r = map[key];
    ...

No, it isn't as efficient as the old way. But, like I said, the old way was called lots of unkind things <g>. It is possible that a future compiler may recognize the above as an idiom and rewrite it so the array lookup is done only once.

Oct 28 2005
parent reply David Medlock <noone nowhere.com> writes:
Kris wrote:
 Ah. I just remembered that AA lookup's would insert an empty entry if one 
 was not there already ... that behaviour has been exchanged for throwing an 
 exception instead. Bleah.  As for idioms (mentioned below), in this case 
 they can be optimized by the API rather than by the compiler.
 
 I believe there is a way, within D, to elegantly resolve these ongoing 
 issues ... if you'd be open to change, then I'd be happy to make some 
 suggestions <g>
 
 - Kris
 
 
 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:djuj2d$2vvv$1 digitaldaemon.com...
 
"Kris" <fu bar.com> wrote in message 
news:dju31o$2he6$1 digitaldaemon.com...

 I think this particular change to AA's is just flat-out bogus

Nobody had a nice word to say about the original implementation. Just write it as:

    if (!(key in map))
        map[key] = new Record;
    r = map[key];
    ...

No, it isn't as efficient as the old way. But, like I said, the old way was called lots of unkind things <g>. It is possible that a future compiler may recognize the above as an idiom and rewrite it so the array lookup is done only once.


I will maintain my position that we should simply have a couple of builtins:

Previous thread:
http://www.digitalmars.com/d/archives/digitalmars/D/26554.html

The builtin methods would handle all cases pretty easily:

    bool get( in key, out value )
    bool insert( in key, in value, out oldvalue )

    int[ char[] ] map;
    int n;
    if ( map.get( "Hello" , n ) ) { ..do something with n.. }
    else { ... no value in n .. }

    if ( map.insert( "Hello", 200, n ) ) { .. previous value in n.. }
    else { .. no previous value was in map.. }

Built in methods are overloadable whereas 'in' is not.

-DavidM
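
The proposed `insert( in key, in value, out oldvalue )` can be emulated today as a free function. This is a minimal sketch, assuming a current D2 compiler; `insertValue` is a hypothetical name for illustration and is not a built-in AA method.

```d
import std.stdio;

// Hypothetical helper (an assumption): stores `value` under `key`.
// Returns true and reports the previous value if one existed,
// false if the key was new.
bool insertValue(K, V)(ref V[K] aa, K key, V value, out V oldValue)
{
    if (auto p = key in aa)
    {
        oldValue = *p;        // previous value handed back to the caller
        *p = value;           // overwrite in place, no second lookup
        return true;
    }
    aa[key] = value;          // new key: this path hashes the key again
    return false;
}

void main()
{
    int[string] map;
    int n;

    if (map.insertValue("Hello", 100, n))
        writeln("previous value: ", n);
    else
        writeln("no previous value was in map");

    if (map.insertValue("Hello", 200, n))
        writeln("previous value: ", n);
    else
        writeln("no previous value was in map");
}
```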
Oct 31 2005
parent "Kris" <fu bar.com> writes:
Yep ~ that would be great.

At the risk of being repetitive: I suspect AAs are trying to extract a bit 
too much out of  [] semantics ~ would be more productive to concentrate on a 
solid API rather than wringing out the array syntax; IMO. If one could 
attach properties to the compiler types (a la C#), then this would be a 
no-brainer.

AAs just need a more usable veneer, beyond the [] syntax ~ or be moved into 
templates instead.



"David Medlock" <noone nowhere.com> wrote in message 
news:dk5rvr$1lbm$1 digitaldaemon.com...
 Kris wrote:
 Ah. I just remembered that AA lookup's would insert an empty entry if one 
 was not there already ... that behaviour has been exchanged for throwing 
 an exception instead. Bleah.  As for idioms (mentioned below), in this 
 case they can be optimized by the API rather than by the compiler.

 I believe there is a way, within D, to elegantly resolve these ongoing 
 issues ... if you'd be open to change, then I'd be happy to make some 
 suggestions <g>

 - Kris


 "Walter Bright" <newshound digitalmars.com> wrote in message 
 news:djuj2d$2vvv$1 digitaldaemon.com...

"Kris" <fu bar.com> wrote in message 
news:dju31o$2he6$1 digitaldaemon.com...

 I think this particular change to AA's is just flat-out bogus

Nobody had a nice word to say about the original implementation. Just write it as:

    if (!(key in map))
        map[key] = new Record;
    r = map[key];
    ...

No, it isn't as efficient as the old way. But, like I said, the old way was called lots of unkind things <g>. It is possible that a future compiler may recognize the above as an idiom and rewrite it so the array lookup is done only once.


I will maintain my position that we should simply have a couple of builtins:

Previous thread:
http://www.digitalmars.com/d/archives/digitalmars/D/26554.html

The builtin methods would handle all cases pretty easily:

    bool get( in key, out value )
    bool insert( in key, in value, out oldvalue )

    int[ char[] ] map;
    int n;
    if ( map.get( "Hello" , n ) ) { ..do something with n.. }
    else { ... no value in n .. }

    if ( map.insert( "Hello", 200, n ) ) { .. previous value in n.. }
    else { .. no previous value was in map.. }

Built in methods are overloadable whereas 'in' is not.

-DavidM

Oct 31 2005
prev sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Kris" <fu bar.com> wrote in message news:dju31o$2he6$1 digitaldaemon.com...
 class Foo
 {
        private Record [char[]] map;

        static class Record
        {
                void write (Foo parent, void[] data) {}
        }

        synchronized void put (char[] key, void[] data)
        {
                /****** access violation here *****/
                Record  r = map [key];

                if (r is null)
                   {
                   r = new Record ();
                   map [key] =  r;
                   }
                r.write (this, data);
        }
 }


 void main()
 {
        Foo f = new Foo;
        f.put ("foo", new void[10]);
 }

 # Error: Access Violation

FWIW, MinTL HashAA returns a user-settable missing value on invalid keys. The default missing value is value.init. So your code above would have worked if map was a HashAA!(char[],Record). HashAA also has more methods that tweak things like "contains" and "take". It also supports sorting - by default elements are sorted by insertion order.

-Ben
Oct 29 2005
parent kris <fu bar.org> writes:
That all sounds cool, Ben.

Perhaps the core problem with AAs is that they're just trying to do too 
much with the [] syntax? Perhaps if AAs supported a contains(key, inout 
value) property, as Regan noted a year or more ago, then it would be 
more palatable and useful?

On the other hand, templates are more than adequate for handling such 
things; as you've proved with MinTL. Removing AAs would simplify the 
compiler also (obviously). If the template syntax were deemed too 
complicated for new users, perhaps the compiler could provide some 
generic sugar for 'special' templates, instead of all the AA specific 
code? Perhaps some kind of alias (or variation thereupon) might be 
sufficiently sugary?

- Kris


Ben Hinkle wrote:
 "Kris" <fu bar.com> wrote in message news:dju31o$2he6$1 digitaldaemon.com...
 
class Foo
{
       private Record [char[]] map;

       static class Record
       {
               void write (Foo parent, void[] data) {}
       }

       synchronized void put (char[] key, void[] data)
       {
               /****** access violation here *****/
               Record  r = map [key];

               if (r is null)
                  {
                  r = new Record ();
                  map [key] =  r;
                  }
               r.write (this, data);
       }
}


void main()
{
       Foo f = new Foo;
       f.put ("foo", new void[10]);
}

# Error: Access Violation

FWIW, MinTL HashAA returns a user-settable missing value on invalid keys. The default missing value is value.init. So your code above would have worked if map was a HashAA!(char[],Record). HashAA also has more methods that tweak things like "contains" and "take". It also supports sorting - by default elements are sorted by insertion order.

-Ben

Oct 29 2005