digitalmars.D - in operator generalization

reply Les Baker <les_baker REMOVEbellsouthREMOVE.net> writes:
Looking through old NG posts I found this one by Ben Hinkle last year 
about extending the "in" operator to support static/dynamic arrays.

http://www.digitalmars.com/d/archives/digitalmars/D/25164

Was this ever totally dismissed?  It just seemed to fall off the radar. 
I like the concept, though; I don't know how many times I've 
written loops in quickie utility programs just to hunt for an item and 
return it.  I would also suggest (if it hasn't already 
been) that "in" be overloadable, so that any other data structures 
D supports in the standard library can use that syntax as well.

The only disadvantage I can think of is that when a developer sees "in", 
he/she can't assume it's a constant-time operation anymore.  I think the 
increase in utility outweighs that, though.

Thoughts?

Les Baker
Mar 17 2006
next sibling parent reply BCS <BCS_member pathlink.com> writes:
Les Baker wrote:
 Looking through old NG posts I found this one by Ben Hinkle last year 
 about extending the "in" operator to support static/dynamic arrays.
 
 http://www.digitalmars.com/d/archives/digitalmars/D/25164
 
 Was this ever totally dismissed?  It just seemed to fall off the radar. 
  I'm liking the concept though; I don't know how many times I've written 
 loops in quickie utility programs just to hunt for an item and return 
 it.  I would also additionally suggest (if it hasn't already been) that 
 "in" be overloadable so that if D supports other data structures in the 
 standard library that they can use that syntax as well.
 
 The only disadvantage I can think of is that when a developer sees "in", 
 he/she can't assume it's a constant time operation anymore.  I think the 
 increase in utility outweighs that though.
 
 Thoughts?
 
 Les Baker

Actually, this would be backwards: "in" determines whether the given value is a key of the AA. Using it to examine the contents would differ from its current meaning, but this might not be a bad idea.
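
For reference, a minimal sketch of what "in" does on an AA. It assumes present-day D syntax (`string` keys, `auto` inside the `if` condition), which is newer than the compiler this thread was written against; the `counts` table is made up for illustration.

```d
void main()
{
    // Hypothetical table mapping a word to how often it was seen.
    int[string] counts = ["apple": 3, "pear": 1];

    // `key in aa` yields a pointer to the stored value, or null if the key is absent.
    int* hit = "apple" in counts;
    assert(hit !is null && *hit == 3);

    // The pointer is usable directly as a condition, and the value can be
    // updated through it without a second lookup.
    if (auto p = "pear" in counts)
        *p += 1;
    assert(counts["pear"] == 2);

    // Only keys are searched, so an absent key gives null.
    assert(("plum" in counts) is null);
}
```

Note that the search is over the keys, never over the stored values, which is the distinction being discussed here.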
Mar 17 2006
parent reply Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:
BCS wrote:
 Les Baker wrote:
 <snip>

 Actually this would be backwards, "in" determines if the given value is a key for the AA. Using it to examine the contents would be something different than its current meaning, but this might not be a bad idea.

I didn't understand before, nor do I understand now, why this was always an argument against 'in' for arrays.

float[MyObject] array; -> stores MyObjects
MyObject[] array; -> stores MyObjects

AAs are different from normal arrays. In AAs the index is the thing you are storing, and it makes sense to search for it.
Mar 17 2006
parent BCS <BCS_member pathlink.com> writes:
Ivan Senji wrote:
 BCS wrote:
 
 Les Baker wrote:
 <snip>

 Actually this would be backwards, "in" determines if the given value is a key for the AA. Using it to examine the contents would be something different than its current meaning, but this might not be a bad idea.

 I didn't understand before nor do I understand why this was always an argument against 'in' for arrays?
 
 float[MyObject] array; -> stores MyObjects
 MyObject[] array; -> stores MyObjects
 
 AA are different from normal arrays. In AA's the index is the thing you are storing, and it makes sense to search for it.

First of all, I don't have an opinion with regard to the "in" operator searching through an array.

Second, IIRC the intended use of an AA is for the keys to be used to reference the content, not for storing the keys. However, AAs do store the keys and would make a good device for storing a set of things.

I think the argument you refer to comes from a desire for orthogonality. With an AA, the "in" operator searches the things that go in the [], whereas using "in" on a normal array would search the things that come out when you index into the array.
Mar 17 2006
prev sibling next sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Fri, 17 Mar 2006 17:50:02 -0500, Les Baker  
<les_baker REMOVEbellsouthREMOVE.net> wrote:
 Looking through old NG posts I found this one by Ben Hinkle last year  
 about extending the "in" operator to support static/dynamic arrays.

 http://www.digitalmars.com/d/archives/digitalmars/D/25164

 Was this ever totally dismissed?  It just seemed to fall off the radar.  
   I'm liking the concept though; I don't know how many times I've  
 written loops in quickie utility programs just to hunt for an item and  
 return it.  I would also additionally suggest (if it hasn't already  
 been) that "in" be overloadable so that if D supports other data  
 structures in the standard library that they can use that syntax as well.

 The only disadvantage I can think of is that when a developer sees "in",  
 he/she can't assume it's a constant time operation anymore.  I think the  
 increase in utility outweighs that though.

 Thoughts?

I think it might be better to keep this in library code, i.e. it could be achieved with a utility function or template: for example, a template that does a binary search of a (presumed sorted) object which can be indexed with [], etc. It could then be used on dynamic/static arrays and other container-style objects. This sort of thing is what I'd expect to see in the proposed DTL library.

Regan
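
A minimal sketch of such a template, assuming present-day D template syntax. The name `contains` is hypothetical (it is not taken from DTL or from any library mentioned in this thread), and the caller is trusted to pass something already sorted in ascending order.

```d
// Hypothetical helper: works on anything with .length and [] indexing
// whose elements compare with <.
// Precondition: `haystack` is sorted in ascending order.
bool contains(Range, T)(Range haystack, T needle)
{
    size_t lo = 0;
    size_t hi = haystack.length;      // search the half-open window [lo, hi)
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
        if (haystack[mid] < needle)
            lo = mid + 1;             // needle can only be to the right
        else if (needle < haystack[mid])
            hi = mid;                 // needle can only be to the left
        else
            return true;
    }
    return false;
}

void main()
{
    int[] dynamic = [1, 3, 5, 8, 13];
    int[4] fixed = [2, 4, 6, 8];
    int[] empty;

    assert(contains(dynamic, 8));
    assert(!contains(dynamic, 4));
    assert(contains(fixed[], 6));     // slice a static array to avoid copying it
    assert(!contains(empty, 1));
}
```

Because the template only relies on `.length`, `[]` and `<`, a custom container offering those would work too. The cost is that nothing checks the sortedness assumption.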
Mar 18 2006
parent Les Baker <les_baker REMOVEbellsouthREMOVE.net> writes:
 I think it might be better to keep this in library code, i.e. it could 
 be achieved with a utility function or template. i.e. a template that 
 does a binary search of a (presumed sorted) object which can be indexed 
 with [], etc. It could then be used on dynamic/static arrays and other 
 container style objects. This sort of thing is what I'd expect to see 
 in the proposed DTL library.

You make a very good point, because all "in" is, is a shortcut for a method call. But why even have "in" in the first place, if it has just one limited usage (looking up a key in an AA)?

Looking at Python, IIRC, they allow "in" on lists and tuples, and allow overloading on it (__contains__). Why not D? (Actually D's seems better: you get the object as a result, whereas Python's "in" just gives a boolean stating whether the list/dictionary contains the object. Again, that's just from memory, could be wrong.)

I'm digressing though, although I do desire an increase in orthogonality. (Additionally, I like the "out-loud" readability of "in"; it does exactly what it says, and means the exact same thing as the discrete math "in" membership test operator.)

I do agree that as much as possible should be kept in the library. But I'm thinking only linear/binary search (based upon the "sortedness" of the array) would have to be added internally. Custom containers (like the ones you mentioned would be coming in DTL) would implement their own "opIn", so that would definitely be kept totally in the standard library, and of course equality testing would be delegated as well.

I think this needs more thought, as there is a fine line between having too much built in and not enough. My main point is this though: why have an operator that can only be used on _one_ type of container?

Les Baker
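
A rough sketch of what a container-defined "in" could look like. It assumes present-day D, where the hook is spelled `opBinaryRight` with the operator string "in" rather than the "opIn" name used above; the `Bag` container is hypothetical and uses a plain linear search.

```d
// Hypothetical container that opts in to `needle in bag`.
struct Bag(T)
{
    T[] items;

    // Called for `needle in bag`; the container is the right-hand operand.
    // Returns a pointer to the first matching element, or null if none.
    T* opBinaryRight(string op : "in")(T needle)
    {
        foreach (ref item; items)
            if (item == needle)
                return &item;
        return null;
    }
}

void main()
{
    auto bag = Bag!string(["ant", "bee", "cat"]);

    // As with AAs, the result doubles as a condition and as access to the element.
    if (auto p = "bee" in bag)
        *p = "wasp";

    assert(bag.items == ["ant", "wasp", "cat"]);
    assert(("dog" in bag) is null);
}
```

Returning a pointer mirrors what "in" gives for an AA, so the "you get the object as a result" property carries over. Equality testing is delegated to the element type's `==`.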
Mar 18 2006
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Les Baker wrote:
 Looking through old NG posts I found this one by Ben Hinkle last year 
 about extending the "in" operator to support static/dynamic arrays.
 
 http://www.digitalmars.com/d/archives/digitalmars/D/25164

There are two problems with that proposal:

1. At the moment, the only use of the in operator is to determine whether a key is present in an AA.  Logically, therefore, for linear arrays,

    x in y

should report on whether x is an index within the bounds of y, i.e.

    x >= 0 && x < y.length

This has been talked about before:

digitalmars.D/6082

2. The result of the in operator is designed to be directly usable as a boolean value and to do what it says on the tin when used as such. Making it return an index, instead of a pointer as in for AAs does, screws this up totally.  As such, under Ben's proposal

    if (x in y)

would be equivalent to

    if (y.length != 0 && y[0] != x)

which is well and truly counter-intuitive.

Stewart.

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++ a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox.  Please keep replies on the 'group where everyone may benefit.
Mar 20 2006
parent reply Ivan Senji <ivan.senji_REMOVE_ _THIS__gmail.com> writes:
Stewart Gordon wrote:
 Les Baker wrote:
 Looking through old NG posts I found this one by Ben Hinkle last year
 about extending the "in" operator to support static/dynamic arrays.

 http://www.digitalmars.com/d/archives/digitalmars/D/25164

There are two problems with that proposal:

To reply or not to reply :) I see that there are two groups of people when it comes to in and arrays. Much like there are in the bool story.
 
 1. At the moment, the only use of the in operator is to determine
 whether a key is present in an AA.  Logically, therefore, for linear
 arrays,
 
     x in y
 
 should report on whether x is an index within the bounds of y, i.e.

One group thinks this is the way it should work.
 
     x >= 0 && x < y.length

But that doesn't make any sense. Syntactic sugar for this is not needed because this expression is not that hard to write.

AAs and normal arrays are both called arrays but I see them as completely different things.

Let's say I'm implementing a word counting program in D.
I could do it like this:

int[char[]] words;
Then for each word words[word]++;

But if I wanted to use normal arrays the one storing words would be:

char[] [] words;
int[] count;

Now I have a word and a common thing would be to find out if it is in 'words'.

I would like to be able to simply write something like:

if( (auto x = word in words) == -1 )
{
  words.length = words.length ++;
  count.length = count.length ++;
}
else
{
  count[x]++;
}

The version of 'in' that would be testing an array index for range would never (IMO) be useful/used.

So the problem here is the key-value difference in array-AA.

Arrays:
Value - the interesting part
Key - index, usually not that important

AAs:
Key - the interesting part
Value - some additional information about my key, like number of words, or any other information associated with the interesting part.

I am interested in the interesting part when it comes both to arrays and AAs.

But I also understand why Walter is probably never going to add this potentially useful feature: There will always be those that disagree.
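
To make the comparison concrete, here is a runnable sketch of both word counters. It assumes present-day D (`string` instead of `char[]`, `std.array.split` for whitespace splitting). The helper `indexOfWord` is a hypothetical stand-in for the proposed array "in", and the arrays are grown with `~=` rather than by assigning to `.length`.

```d
import std.array : split;

// Version 1: associative array keyed by word.
int[string] countWithAA(string text)
{
    int[string] counts;
    foreach (word; text.split())
        counts[word]++;       // a missing key is created, starting from 0
    return counts;
}

// Hypothetical stand-in for `word in words` on a normal array:
// index of the first match, or -1 when absent.
ptrdiff_t indexOfWord(const string[] words, string word)
{
    foreach (i, w; words)
        if (w == word)
            return cast(ptrdiff_t) i;
    return -1;
}

// Version 2: two parallel arrays, words[i] paired with count[i].
void countWithArrays(string text, ref string[] words, ref int[] count)
{
    foreach (word; text.split())
    {
        immutable x = indexOfWord(words, word);
        if (x == -1)
        {
            words ~= word;    // grow both arrays together
            count ~= 1;
        }
        else
            count[cast(size_t) x]++;
    }
}

void main()
{
    string text = "the cat and the hat and the bat";

    auto aa = countWithAA(text);
    assert(aa["the"] == 3 && aa["and"] == 2 && aa["cat"] == 1);

    string[] words;
    int[] count;
    countWithArrays(text, words, count);
    assert(words == ["the", "cat", "and", "hat", "bat"]);
    assert(count == [3, 1, 2, 1, 1]);
}
```

The array version does a linear scan per word, so it is slower on large inputs. What it keeps is the order of first appearance, which the AA does not guarantee.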
 
 This has been talked about before:
 
 digitalmars.D/6082
 
 
 2. The result of the in operator is designed to be directly usable as a
 boolean value and to do what it says on the tin when used as such.
 Making it return an index, instead of a pointer as in for AAs does,
 screws this up totally.  As such, under Ben's proposal
 
     if (x in y)
 
 would be equivalent to
 
     if (y.length != 0 && y[0] != x)
 

I don't understand. Are you sure? Isn't what Ben was suggesting something like:

if(x in y)

is the same as:

int index=-1;
for(int i=0; i<y.length; i++)
{
  if(y[i]==x) {index = i; break;}
}
 which is well and truly counter-intuitive.

Sure is. But what I wrote above makes sense, doesn't it? Isn't that what everyone would expect in for arrays to do?

Actually I don't care that much if it returns a pointer or an index. A pointer might be a bit more useful.

y[x in y] ++; vs. (*(x in y)) ++;

Although this is the way I would expect things to work, there seem to be those who think that there should be a way to check if there is a certain value in the associative array. Is this really needed? Do I really want to know which word has a certain frequency in a text?

Actually I had to do something like that once, but I was able to do it by constructing a reverse AA (for example char[] [int]). Searching for values in AAs doesn't seem as interesting and useful as searching for values in normal arrays.
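
The two result types can be compared side by side in a small sketch. `indexIn` and `pointerIn` are hypothetical free functions standing in for the proposed operator, written in present-day D.

```d
// Index flavour: position of the first match, or -1.
ptrdiff_t indexIn(T)(T x, T[] y)
{
    foreach (i, item; y)
        if (item == x)
            return cast(ptrdiff_t) i;
    return -1;
}

// Pointer flavour: address of the first match, or null.
T* pointerIn(T)(T x, T[] y)
{
    foreach (ref item; y)
        if (item == x)
            return &item;
    return null;
}

void main()
{
    int[] y = [10, 20, 30];

    // The index has to be compared against -1 before it is used:
    immutable i = indexIn(10, y);
    assert(i == 0);                 // found, yet 0 counts as false in a bare if
    assert(indexIn(99, y) == -1);   // absent, yet -1 counts as true in a bare if
    y[cast(size_t) i]++;

    // The pointer can be tested and dereferenced directly:
    if (auto p = pointerIn(20, y))
        (*p)++;
    assert(pointerIn(99, y) is null);

    assert(y == [11, 21, 30]);
}
```

With the index flavour, the truth value of the raw result has nothing to do with "found" or "not found". With the pointer flavour, null is the only false result, which matches how "in" behaves on an AA.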
Mar 21 2006
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Ivan Senji wrote:
<snip>
 Let's say I'm implementing a word counting program in D.
 I could do it like this:
 
 int[char[]] words;
 Then for each word words[word]++;
 
 But if I wanted to use normal arrays the one storing words would be:

Why would you want to make life harder by using normal arrays instead?
 char[] [] words;
 int[] count;
 
 Now I have a word and a common thing would be to find out if it is in
 'words'.
 
 I would like to be able to simply write something like:
 
 if((auto x = word in words) == -1 )

std.string.find already has these semantics. How about having a version of this function for other array types?
 {
   words.length = words.length ++;

That strikes me as undefined behaviour. Either it's a complete no-op (increment words.length and then reassign the original value) or it'll assign words.length to itself and then increment it.
   count.length = count.length ++;
 }

<snip>
 But what I wrote above makes sense? Doesn't it? Isn't that what everyone
 would expect in for arrays to do?

When people discover that in works on LAs as well as AAs, they will expect the result to be equally usable directly as a boolean value.
 Actually I don't care that much if it return a pointer or an index.
 Pointer might be a bit more useful.
 
 y[x in y] ++; vs. (*(x in y)) ++;

Indeed.
 Although this is the way I would expect things to work there seem to be
 those thinking that there should be a way to check if there is a certain
 value in the associative array. Is this really needed?

I guess it might have a generic programming advantage in templates that can work on either kind of an array. But can anyone think of a practical example?
 Do I really want to know which word has a certain frequency in a text?

That's not the only purpose. It could be used to implement bidirectional mappings in general.
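
A minimal sketch of one way to get a bidirectional mapping without searching values: two AAs kept in step behind one interface. `BiMap`, `put`, `byKey` and `byValue` are hypothetical names, and the code assumes present-day D.

```d
struct BiMap(K, V)
{
    private V[K] forward;    // key   -> value
    private K[V] backward;   // value -> key

    void put(K key, V value)
    {
        // Drop stale pairs first so both tables stay one-to-one.
        if (auto oldValue = key in forward)
            backward.remove(*oldValue);
        if (auto oldKey = value in backward)
            forward.remove(*oldKey);
        forward[key] = value;
        backward[value] = key;
    }

    // Both lookups are ordinary key lookups; each returns null when absent.
    V* byKey(K key) { return key in forward; }
    K* byValue(V value) { return value in backward; }
}

void main()
{
    BiMap!(string, int) ids;
    ids.put("alice", 1);
    ids.put("bob", 2);

    assert(*ids.byKey("bob") == 2);
    assert(*ids.byValue(1) == "alice");

    ids.put("carol", 1);                 // re-using value 1 evicts "alice"
    assert(ids.byKey("alice") is null);
    assert(*ids.byValue(1) == "carol");
}
```

This trades memory (every pair is stored twice) for hashed lookups in both directions, and it only works when the values are unique.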
 Actually I had to do something like that once but I was able to do it by
 constructing a reverse AA (For example char[] [int]),

Yes, that's a way to do bidirectional mappings that might be more efficient than value searching at the moment.  Yet another approach would be a single data structure with two hash tables.

What you say makes sense on the whole.  But I recall it being stated somewhere in the docs (I forget where) that every operator has an intended meaning that should be taken into account when overloading.  Language builtins in particular have a moral duty to set a good example.

It's true that the current meaning of in and that being proposed have something in common, namely that they check for containment of what you consider the "interesting" feature in each case.  But "interesting" is a highly subjective concept, varying both from person to person and from application to application, and you could argue either way on whether a given judgement of "interesting" is an acceptable basis of what the overloaded uses of one operator have in common.

Stewart.
Mar 22 2006