digitalmars.D.bugs - [Issue 1323] New: Implement opIn

digitalmars.D.bugs - [Issue 1323] New: Implement opIn_r for arrays

d-bugmail puremagic.com (34/35) Jul 08 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323

d-bugmail puremagic.com (18/18) Jul 09 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (64/64) Jul 09 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (8/14) Jul 09 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (56/74) Jul 09 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (7/7) Jul 09 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (31/39) Jul 09 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (15/15) Jul 09 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (15/30) Jul 09 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (31/53) Jul 10 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (55/67) Jul 10 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (4/5) Jul 10 2007 Whoops, 'course that should be "x in y" ==> "y[x] ok".
d-bugmail puremagic.com (7/14) Nov 04 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (16/16) Aug 12 2010 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (10/10) Jan 08 2011 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (9/9) Jan 08 2011 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (12/12) Oct 30 2012 http://d.puremagic.com/issues/show_bug.cgi?id=1323
d-bugmail puremagic.com (35/42) Oct 30 2012 http://d.puremagic.com/issues/show_bug.cgi?id=1323

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323

           Summary: Implement opIn_r for arrays
           Product: D
           Version: 2.002
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: wbaxter gmail.com


int[] someInts = getInts();
if (0 in someInts) {
   handleZero(someInts);
}

 ERROR


I just did this again for the umpteenth time so I thought I'd go ahead and post
it as an enhancement request.

In Python at least the 'in' operator works on any sequence type.  There's no
reason why it shouldn't work on regular arrays in D too.  The implementation
should basically be the linear search equivalent of what associative arrays do.
 Something like:

T* opIn_r(T searchval) {
 for(size_t i=0; i<array.length; i++) {
    if (array[i]==searchval) { return &array[i]; }
 }
 return null;
}

Maybe == might not be the right equality check to use (since that would fail
for an array with nulls).  Maybe it should be specialized depending on whether
the elements are build-in/class/struct.  But the idea is that "x in array"
should do /something/ useful.

--bb


--

Jul 08 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323


smjg iname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg iname.com





It's been said before, no harm in repeating it here.

For associative arrays,

        X in A

means that there exists a Y for which

        A[X] == Y

According to the semantics that get proposed every time this is brought up, for
linear arrays, it would mean the reverse: there exists a Y for which

        A[Y] == X

This inconsistency might be undesirable from a generic programming POV.

Besides, if we're going to have an operator to check whether a specified
_value_ is in an array, shouldn't it be available for AAs as well?


--

Jul 09 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323






Workarounds... (that dont work if key and value type are the same)

bool contains(T)(T[] array, T value)
{
        foreach(v; array) if (v == value) return true;
        return false;
}

bool contains(V, K)(V[K] array, V value)
{
        foreach(v; array.values) if (v == value) return true;
        return false;
}

bool contains(V, K)(V[K] array, K key)
{
        foreach(k; array.keys) if (k == key) return true;
        return false;
}

int find(T)(T[] array, T value)
{
        foreach(i, v; array) if (v == value) return i;
        return -1;
}

K find(V, K)(V[K] array, V value)
{
        //sadly inefficient
        foreach(v; array.values)
        {
                if (v != value) continue;
                foreach(k; array.keys)
                {
                        if (array[k] == v) return k;
                }
        }
        return K.init;
}

V find(V, K)(V[K] array, K key)
{
        return array[key];
}

void main()
{
        string[char]    ctos;
        char[int]       itoc;
        char[]          carr;
        int[]           iarr;

        ctos['c'] = "string";   
        assert(ctos.contains('c'));
        assert(ctos.contains("string"));

        itoc[5] = 'c';
        assert(itoc.contains(5));
        assert(itoc.contains('c'));

        carr ~= 'c';
        assert(carr.contains('c'));

        iarr ~= 5;
        assert(iarr.contains(5));

        assert(ctos.find('c') == "string");
        assert(ctos.find("string") == 'c');

        assert(itoc.find(5) == 'c');
        assert(itoc.find('c') == 5);

        assert(carr.find('c') == 0);
        assert(iarr.find(5) == 0);
}


--

Jul 09 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323







 Workarounds... (that dont work if key and value type are the same)

Only because you've chosen to give the functions the same names.

 bool contains(V, K)(V[K] array, K key)
 {
         foreach(k; array.keys) if (k == key) return true;
         return false;
 }

What's wrong with
    return cast(bool) (k in array);
?


--

Jul 09 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323







 It's been said before, no harm in repeating it here.
 
 For associative arrays,
 
         X in A
 
 means that there exists a Y for which
 
         A[X] == Y
 
 According to the semantics that get proposed every time this is brought up, for
 linear arrays, it would mean the reverse: there exists a Y for which
 
         A[Y] == X
 
 This inconsistency might be undesirable from a generic programming POV.

I can't see how.  Arrays and associative arrays are different beasts.  You
can't really generically interchange the two, regardless of what the behavior
of 'in' is, because of other semantic differences.  And note that currently
they certainly behave very differently regarding how 'in' works, and there
hasn't been a meltdown because of the lack of ability to generically
interchange arrays and associative arrays on account of that.  

In Python since everything is dynamic and uses 'duck typing' it's really got
more to lose from loss of genericity than D.  In Python you're effectively
*always* writing generic template code.  And yet in Python "foo in alist"
checks if foo is in the alist, and "foo in aa" checks if foo is a key in aa (a
'dict' in Python).  And there has been no outcry over this as far as I know. 
Probably because you usually know before you start coding whether you need a
dict or an array.  It's pretty obvious that if I want to look up people by
their employee ID's that the ID shouldn't be the index in an regular array, an
associative array would be better.


 Besides, if we're going to have an operator to check whether a specified
 _value_ is in an array, shouldn't it be available for AAs as well?

Yes, it would be spelled "foo in AA.values".

Ok, that wouldn't be so efficient since it builds a big temporary array, but
that's a problem with AA.values.  AA's need some keys() and values()-type
methods that return lightweight iterators.  But the iterators situation in D is
a whole 'nother mess.


Arguing from another direction, there's only one 'opIn_r', and that's not
likely to change, so which version of opIn_r is going to save you more work if
implemented for arrays?  An opIn_r for arrays that does (0<=k && k<A.length) or
one which does 

  bool found = false;
  foreach (tmp; A) { if (tmp==k) { found=true; break; } }

The former is already an expression, and so can be used in an if condition, for
example; the latter is a) more typing and b) not an expression.  If you want to
use it as an expression you need to do more typing to wrap it in an function.  

I'd say the former is already convenient enough (and you can even get away with
just k<A.length if k is unsigned).  And really the actual implementation of the
latter should probably be a little more complex to dance around the ticking
time-bomb behavior of == for classes.  Something like:

bool contains(T[] A, T k) {
  bool found = false;
  foreach (v; A) {
    static if (is(T==class)) {
       if (v is k || (k !is null && k==v)) { found=true; break; }
    }
    else {
       if (v==k) { found=true; break; }
    }
  }
  return found;
}

And maybe I've missed something there too.  Like maybe the v should be 'ref'
for better efficiency when it's a big struct type?  Probably it shouldn't be
ref for class types though, so maybe I need another static if there.  Anyway,
that's part of why I want it taken care of for me in the language.  I just want
to type 'x in some_array' and let the system find it for me in the most
efficient way possible (which may include some sort of fancy parallel scanning
in the future).


--

Jul 09 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323






How about make

(i in array)

go to

((i>=0 && i<array.length) ? &array[i] : null)


--

Jul 09 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323







 How about make
 
 (i in array)
 
 go to
 
 ((i>=0 && i<array.length) ? &array[i] : null)
 

Whichever meaning it is given it should probably return a useful pointer like
the opIn for AA's does.  But I'd argue the above is pretty useless or at least
brief enough already to not be worth the effort.  Especially since now you've
got a pointer that might or might not be null, so the next thing you're
probably going to do is turn around and check if it's null and if not
dereference it.  So it's like saying:

auto x = ((i>=0 && i<array.length) ? &array[i] : null);
if (x !is null) {
   auto X = *x;
   ...
}
That's just silly when you can instead do
if (i>=0 && i<array.length) {
   auto X = array[i];
}

Do you folks actually have some killer use case for wanting to pretend that
indices are like AA keys?  Or is it just an ideological thing?  Because I
agree, you're totally right on the idealogical front.  We can think of them
both as a map or partial function from one set to another.  Gotcha.  

But it's just not useful.  Like I said, very seldom in my experience are you
ever tempted to throw AA's and regular arrays at the same problem.  You're more
likely to switch an array implementation to a Set implementation of some sort. 
And guess what, if you implement said Set using AA's then you'll be putting the
contents of the arrays in the _keys_, not the values, using a bool[T] or
somesuch.  So if X in Y checks keys for AA's then, in this common use case, it
would be better for "generic code" if X in Y checked the values of the arrays
rather than the indices.


--

Jul 09 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323






as to in returning null, that is what 'in' does with AA's; if the key is in the
AA return a pointer to the value, otherwise, return null.

In fact, the code I wrote works the same as 'in' for AA's

The usual way to use the return from 'in' where it might give null is this:

if(auto v = i in j)
{
  use(*v);
}
else
{
  // i not in j
}


--

Jul 09 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323







 as to in returning null, that is what 'in' does with AA's; if the key is in the
 AA return a pointer to the value, otherwise, return null.
 
 In fact, the code I wrote works the same as 'in' for AA's
 
 The usual way to use the return from 'in' where it might give null is this:
 
 if(auto v = i in j)
 {
   use(*v);
 }
 else
 {
   // i not in j
 }

Sure, but my point was that it doesn't gain you any efficiency or clarity in
the case of arrays.  You might as well do:

if(0<=i && i<j.length)
{
  use(j[i]);
}
else
{
  // i not in j
}

It's only a few characters longer and clearer in my opinion.


--

Jul 09 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323







<snip>
 This inconsistency might be undesirable from a generic 
 programming POV.

 
 I can't see how.  Arrays and associative arrays are different 
 beasts.  You can't really generically interchange the two, 
 regardless of what the behavior of 'in' is, because of other 
 semantic differences.

Other than in how you can add/remove elements, are there many differences that
actually compromise their interchangeability?

For example, suppose you want a function that iterates over the keys and
values, and possibly changes some of the values it encounters.  This is already
something that can be done for LAs and AAs alike with the same code.

 And note that currently they certainly behave very differently 
 regarding how 'in' works,

Associative arrays: in checks if the key is there
Linear arrays: in doesn't work at all

So why is this an issue?

 and there hasn't been a meltdown because of the lack of 
 ability to generically interchange arrays and associative 
 arrays on account of that.

That doesn't mean it won't happen.

Suppose you want a container indexed by non-negative integers, and a moderate
proportion of indexes within some range will actually have some useful value on
the other end.  You could use an AA, or you could use an LA and fill in the
blanks with ValueType.init or something (which might be more efficient).  At
some point you change your mind about which to use.  You're going to need to
change some code anyway, and the compiler would likely catch quite a few
instances.  But if in does something semantically completely different because
you've changed the internal data structure, it could easily lead to bugs.

<snip>
 Besides, if we're going to have an operator to check whether 
 a specified _value_ is in an array, shouldn't it be available 
 for AAs as well?

 
 Yes, it would be spelled "foo in AA.values".

Yes, you have a point there.  The .values property could even be defined for
LAs to just return the array, which would enable values of either to be
searched in the same way....

<snip>
 And really the actual implementation of the latter should 
 probably be a little more complex to dance around the ticking 
 time-bomb behavior of == for classes.  Something like:
 bool contains(T[] A, T k) {

<snip>
 }

That depends on whether it's implemented using templates or TypeInfo.  But
TypeInfo_C.equals should probably be changed to do take nulls into account. 
After all, TypeInfo_C.compare already does take null references into account.


--

Jul 10 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323








 <snip>
 This inconsistency might be undesirable from a generic 
 programming POV.

 
 I can't see how.  Arrays and associative arrays are different 
 beasts.  You can't really generically interchange the two, 
 regardless of what the behavior of 'in' is, because of other 
 semantic differences.

 
 Other than in how you can add/remove elements, are there many differences that
 actually compromise their interchangeability?

Well the biggies to me are 
1) "keys" in a linear array must be of type size_t.
2) "keys" in a linear array must contiguous.
   (You can say you'll put a special value in the slots you're not using, but
the hypothetical built-in 'in' operator for arrays will have no knowledge of
your convention.)

We can represent any set!(T) using an array T[].
We cannot represent all AAs T[K] using an array, because we need two arbitrary
types and arrays only have one.

I can see that it might be nice to be able to write generically:
   if (x in y) 
      return y[x];

But if you really need to write an algorithm that works generically on both
arrays and AAs, you're going to need a whole lot more than that one syntactic
similarity to make it work.  So you might as well include something like the
'contains' functions that Regan posted.

Ok, I can see that pretty much any argument here can be posed either way. 
Here's the way I see it:

LA's as a special case of AA's.
* Special case, only very particular AA's can be implemented using an LA
* Syntactically/theoretically simple -- x in y  means you can say y[x]
* Use case is kind of specialized
    (Only applies if you're treating contiguous size_t's as keys)
* No real need:
    - Checking if an index is in range is already easy (i<A.length).
    - If you need an AA just use an AA.  It's built-in!
      (I believe it's pretty uncommon not to know up front if you'll need to
support discontiguous indexes or not).
* Leads to uninuitive code like 
     assert(2 in ["Hi", "There", "Bob"]);
     (Ask five random beginning programmers if this is true or not...)
* Lacks precedent?  Is there some other language where this is used?

LA's as sets:
* General case -- a set can always be implemented as an LA
* Syntactically non-uniform "x in y"==>"x[y] ok" for AA, but not for LA.
  But then again lots of other syntax is different between AA and LA too:
     ~= doesn't work on AA
     assigning length doesn't work on AA
     .remove, .rehash, .keys, .values don't work on LA
     foo["hi"] doesn't work on an any type of LA
* Plenty of use cases from everyday coding in the trenches:
    char[][] what_we_seek = ["hi", "there", "bob"];
    foreach(w; words) { if(w in what_we_seek) { return true; }
* Useful
    - Checking if a value is in an array is not nearly as easy as checking if
an index is valid for the array.
    - D lacks any sort of built in set or bag, so makes arrays more usable as
"poor man's set/bag"
* Intuitive:  assert("Bob" in ["Jill","Jim","Bob"])
     (Again, ask five random beginning programmers if this is true or not...)
* Has precedent in Python's behavior.


--

Jul 10 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323







 * Syntactically non-uniform "x in y"==>"x[y] ok" for AA, but not for LA.

Whoops, 'course that should be "x in y" ==> "y[x] ok".


--

Jul 10 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323






Walter weighs in on the NG:

Walter Bright wrote:
 Denton Cockburn wrote:
 I just figure it's done often enough that adding it wouldn't be hard,
 especially considering the keyword is already there, and with the same
 general meaning.

 
 With AAs, the 'in' expression looks for the key, not the value. It would 
 be inconsistent to overload it to search for values in regular arrays.

digitalmars.com digitalmars.D:61103
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=61103


--

Nov 04 2007

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc



Associative arrays are a set (where each item has associated one value), where
item order doesn't matter.
While arrays are an ordered sequence, each item has one or less predecessor and
one or less successor.
In both cases I often want to know if an item is present in this set or this
ordered sequence.
Walter is wrong on this.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Aug 12 2010

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |andrei metalanguage.com
         Resolution|                            |WONTFIX


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Jan 08 2011

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323




The reverse "in" operator for built-in arrays is a really useful feature, so
useful that "Although practicality beats purity." as Python Zen says.

I am not reopening this yet, but closing this needs some discussion, it's not a
uncontroversial issue.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Jan 08 2011

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323




---
I think this absolutely needs to be reconsidered. The success of the same
semantics in Python should be enough proof of the convenience and low risk of
this feature.

On top of that, the `in' operator is overloadable anyway, meaning it already
has semantic ambiguity for different types. Should `~' be considered an issue
since it can accept both T and T[]?

Practicality seems preferable here. I think this should be reopened.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Oct 30 2012

d-bugmail puremagic.com writes:

http://d.puremagic.com/issues/show_bug.cgi?id=1323


Jonathan M Davis <jmdavisProg gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg gmx.com



PDT ---
 I think this absolutely needs to be reconsidered. The success of the same
 semantics in Python should be enough proof of the convenience and low risk of
 this feature.

 On top of that, the `in' operator is overloadable anyway, meaning it already
 has semantic ambiguity for different types. Should `~' be considered an issue
 since it can accept both T and T[]?

 Practicality seems preferable here. I think this should be reopened.

We care a lot more about efficiency than python does, and we hold to one of the
key concepts that C++ does - that certain operators and operations need to have
a maximum algorithmic complexity. A prime example of this is std.container. All
of the common operations have a maximum algorithmic complexity, and any
container which can't implement them with tha complexity, doesn't have them.
For instance, SList and DList don't have a length property, because they can't
implement it in O(1). The same goes for range-base operations. front, popFront,
length, slicing, etc. have to be O(1). If they can't be, then they're not
implemented. That's why narrow strings aren't sliceable, have no length, and
don't provide random access. They can't do it in O(1).

In the case of in, it's required to have at most O(log n), which is what it
takes to search for an element in a balanced binary tree. AAs can do it in
O(1), which is even better, so they get in. Dynamic arrays require O(n), which
is worse than O(log n), so they don't have it and never will. If we allowed
them to have in, then functions could not rely on in's algorithmic complexity,
which would make it useless for generic code.

Of course, it's true that anyone can overload an operator or define a function
which has worse algorithmic complexity than it's required to have (e.g. they
define length on a range which has to count all of its elements to get its
length, making it O(n) instead of the required O(1)), but in that case, the
programmer screwed up, and it's their fault when algorithms using their type
have horrible performance. But as long as they implement those functions with
the correct algorithmic complexity, then algorithms using those functions can
guarantee a certain level of performance. They can't do that if they can't rely
on the maximum algorithmic complexity of the functions and operations that they
use.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

Oct 30 2012

D Programming

C/C++ Programming

Other

digitalmars.D.bugs - [Issue 1323] New: Implement opIn_r for arrays