www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 1323] New: Implement opIn_r for arrays

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323

           Summary: Implement opIn_r for arrays
           Product: D
           Version: 2.002
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: wbaxter gmail.com


int[] someInts = getInts();
if (0 in someInts) {
   handleZero(someInts);
}

 ERROR
I just did this again for the umpteenth time so I thought I'd go ahead and post it as an enhancement request. In Python at least the 'in' operator works on any sequence type. There's no reason why it shouldn't work on regular arrays in D too. The implementation should basically be the linear search equivalent of what associative arrays do. Something like: T* opIn_r(T searchval) { for(size_t i=0; i<array.length; i++) { if (array[i]==searchval) { return &array[i]; } } return null; } Maybe == might not be the right equality check to use (since that would fail for an array with nulls). Maybe it should be specialized depending on whether the elements are build-in/class/struct. But the idea is that "x in array" should do /something/ useful. --bb --
Jul 08 2007
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323


smjg iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg iname.com





It's been said before, no harm in repeating it here.

For associative arrays,

        X in A

means that there exists a Y for which

        A[X] == Y

According to the semantics that get proposed every time this is brought up, for
linear arrays, it would mean the reverse: there exists a Y for which

        A[Y] == X

This inconsistency might be undesirable from a generic programming POV.

Besides, if we're going to have an operator to check whether a specified
_value_ is in an array, shouldn't it be available for AAs as well?


-- 
Jul 09 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323






Workarounds... (that dont work if key and value type are the same)

bool contains(T)(T[] array, T value)
{
        foreach(v; array) if (v == value) return true;
        return false;
}

bool contains(V, K)(V[K] array, V value)
{
        foreach(v; array.values) if (v == value) return true;
        return false;
}

bool contains(V, K)(V[K] array, K key)
{
        foreach(k; array.keys) if (k == key) return true;
        return false;
}

int find(T)(T[] array, T value)
{
        foreach(i, v; array) if (v == value) return i;
        return -1;
}

K find(V, K)(V[K] array, V value)
{
        //sadly inefficient
        foreach(v; array.values)
        {
                if (v != value) continue;
                foreach(k; array.keys)
                {
                        if (array[k] == v) return k;
                }
        }
        return K.init;
}

V find(V, K)(V[K] array, K key)
{
        return array[key];
}

void main()
{
        string[char]    ctos;
        char[int]       itoc;
        char[]          carr;
        int[]           iarr;

        ctos['c'] = "string";   
        assert(ctos.contains('c'));
        assert(ctos.contains("string"));

        itoc[5] = 'c';
        assert(itoc.contains(5));
        assert(itoc.contains('c'));

        carr ~= 'c';
        assert(carr.contains('c'));

        iarr ~= 5;
        assert(iarr.contains(5));

        assert(ctos.find('c') == "string");
        assert(ctos.find("string") == 'c');

        assert(itoc.find(5) == 'c');
        assert(itoc.find('c') == 5);

        assert(carr.find('c') == 0);
        assert(iarr.find(5) == 0);
}


-- 
Jul 09 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323







 Workarounds... (that dont work if key and value type are the same)
Only because you've chosen to give the functions the same names.
 bool contains(V, K)(V[K] array, K key)
 {
         foreach(k; array.keys) if (k == key) return true;
         return false;
 }
What's wrong with return cast(bool) (k in array); ? --
Jul 09 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323







 It's been said before, no harm in repeating it here.
 
 For associative arrays,
 
         X in A
 
 means that there exists a Y for which
 
         A[X] == Y
 
 According to the semantics that get proposed every time this is brought up, for
 linear arrays, it would mean the reverse: there exists a Y for which
 
         A[Y] == X
 
 This inconsistency might be undesirable from a generic programming POV.
I can't see how. Arrays and associative arrays are different beasts. You can't really generically interchange the two, regardless of what the behavior of 'in' is, because of other semantic differences. And note that currently they certainly behave very differently regarding how 'in' works, and there hasn't been a meltdown because of the lack of ability to generically interchange arrays and associative arrays on account of that. In Python since everything is dynamic and uses 'duck typing' it's really got more to lose from loss of genericity than D. In Python you're effectively *always* writing generic template code. And yet in Python "foo in alist" checks if foo is in the alist, and "foo in aa" checks if foo is a key in aa (a 'dict' in Python). And there has been no outcry over this as far as I know. Probably because you usually know before you start coding whether you need a dict or an array. It's pretty obvious that if I want to look up people by their employee ID's that the ID shouldn't be the index in an regular array, an associative array would be better.
 Besides, if we're going to have an operator to check whether a specified
 _value_ is in an array, shouldn't it be available for AAs as well?
Yes, it would be spelled "foo in AA.values". Ok, that wouldn't be so efficient since it builds a big temporary array, but that's a problem with AA.values. AA's need some keys() and values()-type methods that return lightweight iterators. But the iterators situation in D is a whole 'nother mess. Arguing from another direction, there's only one 'opIn_r', and that's not likely to change, so which version of opIn_r is going to save you more work if implemented for arrays? An opIn_r for arrays that does (0<=k && k<A.length) or one which does bool found = false; foreach (tmp; A) { if (tmp==k) { found=true; break; } } The former is already an expression, and so can be used in an if condition, for example; the latter is a) more typing and b) not an expression. If you want to use it as an expression you need to do more typing to wrap it in an function. I'd say the former is already convenient enough (and you can even get away with just k<A.length if k is unsigned). And really the actual implementation of the latter should probably be a little more complex to dance around the ticking time-bomb behavior of == for classes. Something like: bool contains(T[] A, T k) { bool found = false; foreach (v; A) { static if (is(T==class)) { if (v is k || (k !is null && k==v)) { found=true; break; } } else { if (v==k) { found=true; break; } } } return found; } And maybe I've missed something there too. Like maybe the v should be 'ref' for better efficiency when it's a big struct type? Probably it shouldn't be ref for class types though, so maybe I need another static if there. Anyway, that's part of why I want it taken care of for me in the language. I just want to type 'x in some_array' and let the system find it for me in the most efficient way possible (which may include some sort of fancy parallel scanning in the future). --
Jul 09 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323






How about make

(i in array)

go to

((i>=0 && i<array.length) ? &array[i] : null)


-- 
Jul 09 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323







 How about make
 
 (i in array)
 
 go to
 
 ((i>=0 && i<array.length) ? &array[i] : null)
 
Whichever meaning it is given it should probably return a useful pointer like the opIn for AA's does. But I'd argue the above is pretty useless or at least brief enough already to not be worth the effort. Especially since now you've got a pointer that might or might not be null, so the next thing you're probably going to do is turn around and check if it's null and if not dereference it. So it's like saying: auto x = ((i>=0 && i<array.length) ? &array[i] : null); if (x !is null) { auto X = *x; ... } That's just silly when you can instead do if (i>=0 && i<array.length) { auto X = array[i]; } Do you folks actually have some killer use case for wanting to pretend that indices are like AA keys? Or is it just an ideological thing? Because I agree, you're totally right on the idealogical front. We can think of them both as a map or partial function from one set to another. Gotcha. But it's just not useful. Like I said, very seldom in my experience are you ever tempted to throw AA's and regular arrays at the same problem. You're more likely to switch an array implementation to a Set implementation of some sort. And guess what, if you implement said Set using AA's then you'll be putting the contents of the arrays in the _keys_, not the values, using a bool[T] or somesuch. So if X in Y checks keys for AA's then, in this common use case, it would be better for "generic code" if X in Y checked the values of the arrays rather than the indices. --
Jul 09 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323






as to in returning null, that is what 'in' does with AA's; if the key is in the
AA return a pointer to the value, otherwise, return null.

In fact, the code I wrote works the same as 'in' for AA's

The usual way to use the return from 'in' where it might give null is this:

if(auto v = i in j)
{
  use(*v);
}
else
{
  // i not in j
}


-- 
Jul 09 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323







 as to in returning null, that is what 'in' does with AA's; if the key is in the
 AA return a pointer to the value, otherwise, return null.
 
 In fact, the code I wrote works the same as 'in' for AA's
 
 The usual way to use the return from 'in' where it might give null is this:
 
 if(auto v = i in j)
 {
   use(*v);
 }
 else
 {
   // i not in j
 }
Sure, but my point was that it doesn't gain you any efficiency or clarity in the case of arrays. You might as well do: if(0<=i && i<j.length) { use(j[i]); } else { // i not in j } It's only a few characters longer and clearer in my opinion. --
Jul 09 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323







<snip>
 This inconsistency might be undesirable from a generic 
 programming POV.
I can't see how. Arrays and associative arrays are different beasts. You can't really generically interchange the two, regardless of what the behavior of 'in' is, because of other semantic differences.
Other than in how you can add/remove elements, are there many differences that actually compromise their interchangeability? For example, suppose you want a function that iterates over the keys and values, and possibly changes some of the values it encounters. This is already something that can be done for LAs and AAs alike with the same code.
 And note that currently they certainly behave very differently 
 regarding how 'in' works,
Associative arrays: in checks if the key is there Linear arrays: in doesn't work at all So why is this an issue?
 and there hasn't been a meltdown because of the lack of 
 ability to generically interchange arrays and associative 
 arrays on account of that.
That doesn't mean it won't happen. Suppose you want a container indexed by non-negative integers, and a moderate proportion of indexes within some range will actually have some useful value on the other end. You could use an AA, or you could use an LA and fill in the blanks with ValueType.init or something (which might be more efficient). At some point you change your mind about which to use. You're going to need to change some code anyway, and the compiler would likely catch quite a few instances. But if in does something semantically completely different because you've changed the internal data structure, it could easily lead to bugs. <snip>
 Besides, if we're going to have an operator to check whether 
 a specified _value_ is in an array, shouldn't it be available 
 for AAs as well?
Yes, it would be spelled "foo in AA.values".
Yes, you have a point there. The .values property could even be defined for LAs to just return the array, which would enable values of either to be searched in the same way.... <snip>
 And really the actual implementation of the latter should 
 probably be a little more complex to dance around the ticking 
 time-bomb behavior of == for classes.  Something like:
 bool contains(T[] A, T k) {
<snip>
 }
That depends on whether it's implemented using templates or TypeInfo. But TypeInfo_C.equals should probably be changed to do take nulls into account. After all, TypeInfo_C.compare already does take null references into account. --
Jul 10 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323








 <snip>
 This inconsistency might be undesirable from a generic 
 programming POV.
I can't see how. Arrays and associative arrays are different beasts. You can't really generically interchange the two, regardless of what the behavior of 'in' is, because of other semantic differences.
Other than in how you can add/remove elements, are there many differences that actually compromise their interchangeability?
Well the biggies to me are 1) "keys" in a linear array must be of type size_t. 2) "keys" in a linear array must contiguous. (You can say you'll put a special value in the slots you're not using, but the hypothetical built-in 'in' operator for arrays will have no knowledge of your convention.) We can represent any set!(T) using an array T[]. We cannot represent all AAs T[K] using an array, because we need two arbitrary types and arrays only have one. I can see that it might be nice to be able to write generically: if (x in y) return y[x]; But if you really need to write an algorithm that works generically on both arrays and AAs, you're going to need a whole lot more than that one syntactic similarity to make it work. So you might as well include something like the 'contains' functions that Regan posted. Ok, I can see that pretty much any argument here can be posed either way. Here's the way I see it: LA's as a special case of AA's. * Special case, only very particular AA's can be implemented using an LA * Syntactically/theoretically simple -- x in y means you can say y[x] * Use case is kind of specialized (Only applies if you're treating contiguous size_t's as keys) * No real need: - Checking if an index is in range is already easy (i<A.length). - If you need an AA just use an AA. It's built-in! (I believe it's pretty uncommon not to know up front if you'll need to support discontiguous indexes or not). * Leads to uninuitive code like assert(2 in ["Hi", "There", "Bob"]); (Ask five random beginning programmers if this is true or not...) * Lacks precedent? Is there some other language where this is used? LA's as sets: * General case -- a set can always be implemented as an LA * Syntactically non-uniform "x in y"==>"x[y] ok" for AA, but not for LA. But then again lots of other syntax is different between AA and LA too: ~= doesn't work on AA assigning length doesn't work on AA .remove, .rehash, .keys, .values don't work on LA foo["hi"] doesn't work on an any type of LA * Plenty of use cases from everyday coding in the trenches: char[][] what_we_seek = ["hi", "there", "bob"]; foreach(w; words) { if(w in what_we_seek) { return true; } * Useful - Checking if a value is in an array is not nearly as easy as checking if an index is valid for the array. - D lacks any sort of built in set or bag, so makes arrays more usable as "poor man's set/bag" * Intuitive: assert("Bob" in ["Jill","Jim","Bob"]) (Again, ask five random beginning programmers if this is true or not...) * Has precedent in Python's behavior. --
Jul 10 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323







 * Syntactically non-uniform "x in y"==>"x[y] ok" for AA, but not for LA.
Whoops, 'course that should be "x in y" ==> "y[x] ok". --
Jul 10 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323






Walter weighs in on the NG:

Walter Bright wrote:
 Denton Cockburn wrote:
 I just figure it's done often enough that adding it wouldn't be hard,
 especially considering the keyword is already there, and with the same
 general meaning.
With AAs, the 'in' expression looks for the key, not the value. It would be inconsistent to overload it to search for values in regular arrays.
digitalmars.com digitalmars.D:61103 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=61103 --
Nov 04 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc



Associative arrays are a set (where each item has associated one value), where
item order doesn't matter.
While arrays are an ordered sequence, each item has one or less predecessor and
one or less successor.
In both cases I often want to know if an item is present in this set or this
ordered sequence.
Walter is wrong on this.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 12 2010
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |andrei metalanguage.com
         Resolution|                            |WONTFIX


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jan 08 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323




The reverse "in" operator for built-in arrays is a really useful feature, so
useful that "Although practicality beats purity." as Python Zen says.

I am not reopening this yet, but closing this needs some discussion, it's not a
uncontroversial issue.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jan 08 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323




---
I think this absolutely needs to be reconsidered. The success of the same
semantics in Python should be enough proof of the convenience and low risk of
this feature.

On top of that, the `in' operator is overloadable anyway, meaning it already
has semantic ambiguity for different types. Should `~' be considered an issue
since it can accept both T and T[]?

Practicality seems preferable here. I think this should be reopened.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 30 2012
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1323


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg gmx.com



PDT ---
 I think this absolutely needs to be reconsidered. The success of the same
 semantics in Python should be enough proof of the convenience and low risk of
 this feature.
 On top of that, the `in' operator is overloadable anyway, meaning it already
 has semantic ambiguity for different types. Should `~' be considered an issue
 since it can accept both T and T[]?
 Practicality seems preferable here. I think this should be reopened.
We care a lot more about efficiency than python does, and we hold to one of the key concepts that C++ does - that certain operators and operations need to have a maximum algorithmic complexity. A prime example of this is std.container. All of the common operations have a maximum algorithmic complexity, and any container which can't implement them with tha complexity, doesn't have them. For instance, SList and DList don't have a length property, because they can't implement it in O(1). The same goes for range-base operations. front, popFront, length, slicing, etc. have to be O(1). If they can't be, then they're not implemented. That's why narrow strings aren't sliceable, have no length, and don't provide random access. They can't do it in O(1). In the case of in, it's required to have at most O(log n), which is what it takes to search for an element in a balanced binary tree. AAs can do it in O(1), which is even better, so they get in. Dynamic arrays require O(n), which is worse than O(log n), so they don't have it and never will. If we allowed them to have in, then functions could not rely on in's algorithmic complexity, which would make it useless for generic code. Of course, it's true that anyone can overload an operator or define a function which has worse algorithmic complexity than it's required to have (e.g. they define length on a range which has to count all of its elements to get its length, making it O(n) instead of the required O(1)), but in that case, the programmer screwed up, and it's their fault when algorithms using their type have horrible performance. But as long as they implement those functions with the correct algorithmic complexity, then algorithms using those functions can guarantee a certain level of performance. They can't do that if they can't rely on the maximum algorithmic complexity of the functions and operations that they use. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Oct 30 2012