www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - "in" everywhere

reply atommixz <atommixz gmail.com> writes:
It would be nice if it were possible to use the "in" expression wherever
possible. Now it is only implemented for associative. arrays. (Weird).
Examples of how this could be used:
- Find string in string
- Search for a character in a string
- Search for an item in the array, array of characters, array of strings,
tuples, enum, structure
- what else?

In Python done something like this.

Here it would be useful to me
http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py
http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d
Oct 07 2010
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Oct 2010 07:54:14 -0400, atommixz <atommixz gmail.com> wrote:

 It would be nice if it were possible to use the "in" expression wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of strings,
 tuples, enum, structure
 - what else?

This has been suggested before. The problem is that 'in' is generally considered to be a fast operation (< O(n)) so linear search is out. -Steve
Oct 07 2010
parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Steven Schveighoffer schrieb:
 On Thu, 07 Oct 2010 07:54:14 -0400, atommixz <atommixz gmail.com> wrote:
 
 It would be nice if it were possible to use the "in" expression wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of strings,
 tuples, enum, structure
 - what else?

This has been suggested before. The problem is that 'in' is generally considered to be a fast operation (< O(n)) so linear search is out. -Steve

Is it? I wouldn't expect it to be < O(n) on a regular array, but it'd still be convenient to have instead of iterating over the array and comparing the contents yourself. Cheers, - Daniel
Oct 07 2010
next sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Steven Schveighoffer schrieb:
 On Thu, 07 Oct 2010 10:09:17 -0400, Daniel Gibson 
 <metalcaedes gmail.com> wrote:
 
 Steven Schveighoffer schrieb:
 On Thu, 07 Oct 2010 07:54:14 -0400, atommixz <atommixz gmail.com> wrote:

 It would be nice if it were possible to use the "in" expression 
 wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of 
 strings,
 tuples, enum, structure
 - what else?

generally considered to be a fast operation (< O(n)) so linear search is out. -Steve

Is it? I wouldn't expect it to be < O(n) on a regular array, but it'd still be convenient to have instead of iterating over the array and comparing the contents yourself.

The problem is with generic code. Generic code that accepts some type that defines the "in" operator. What can that generic code expect for performance when using "in"? As a general rule, generic programming must always assume the worst case, and if we have no rules for 'in', the worst case is linear. Which means generic code may not use 'in' when it would be a fast operation. Same thing with indexing. Try sorting a 'random access' range which uses a linear search on opIndex, and see what the performance is. In addition, arr.find(x) seems pretty simple to me. -Steve

I didn't know about arr.find(). Is that documented somewhere? I do however agree that arr.find() is sufficient if it's there. Cheers, - Daniel
Oct 07 2010
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
arr.find()

http://dpldocs.info/std.algorithm.find

With arrays, you can call free functions with the dot member
syntax so import std.algorithm and that just works.
Oct 07 2010
parent Daniel Gibson <metalcaedes gmail.com> writes:
Adam D. Ruppe schrieb:
 arr.find()
 
 http://dpldocs.info/std.algorithm.find
 
 With arrays, you can call free functions with the dot member
 syntax so import std.algorithm and that just works.

Ah ok, I thought it was a property of arrays or something like that. Thanks :-)
Oct 07 2010
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 07/10/2010 15:22, Steven Schveighoffer wrote:
<snip>
 The problem is with generic code. Generic code that accepts some type
 that defines the "in" operator. What can that generic code expect for
 performance when using "in"? As a general rule, generic programming must
 always assume the worst case, and if we have no rules for 'in', the
 worst case is linear. Which means generic code may not use 'in' when it
 would be a fast operation. Same thing with indexing. Try sorting a
 'random access' range which uses a linear search on opIndex, and see
 what the performance is.

Surely, what matters most for generic code is that in has consistent _semantics_, rather than the computational complexity? Stewart.
Oct 09 2010
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 10/10/2010 02:05, Jonathan M Davis wrote:
 On Saturday 09 October 2010 06:54:31 Stewart Gordon wrote:

 Surely, what matters most for generic code is that in has consistent
 _semantics_, rather than the computational complexity?

 Stewart.

_Both_ matter.

True, yet you don't address semantic differences at all in what follows, so it doesn't really follow on from what I said. But anyway....
 Imagine, for instance, if we were to make Java's mistake of having List be an
 interface which ArrayList and LinkedList implement and that List has a get()
 function. Using ArrayList, get() has a complexity of O(1). Using LinkedList, it
 has a complexity of O(n).

Are you suggesting we just don't have a List interface, just to prevent people from writing algorithms that work well in some list implementations but poorly in others? I'm not sure if that risk outweighs any benefits....
 Now, imagine code written with List rather than
 ArrayList or LinkedList directly. If you use get(), then your code will vary
 considerably in how efficient is. In small cases, this won't matter, but with
 large lists, it could matter a lot. Your generic code which uses List _has_ to
 worry about the computational complexity of get() or it could become very
 inefficient. In this case, for most situations, that would mean using a foreach
 loop or iterators directly, but using get() is likely a _bad_ idea. You need to
 know the computational complexity of get() in order to use List efficiently.

Indeed. But what is an efficient algorithm depends on how the data structure works in the first place, not just the computational complexity of particular operations. It is perhaps for this reason that it is foolish to try to write such generic algorithms. An example of this is Collections.sort(List) in Java, which gets around it by dumping the list into an array, an operation which itself adds overhead. Better would be to have sort as a method of List, so that for each list class an efficient implementation of sort can be provided.
 To write efficient algorithms, you _must_ be able to rely on certain complexity
 guarantees for the operations that you use. So, regardless of what those
 complexity gurantees actually _are_, they need to be there.

Setting the complexity guarantee of everything to O(n!) would achieve _that_ aim in most everyday cases.... :) Stewart.
Oct 12 2010
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
On 13/10/2010 21:11, Steven Schveighoffer wrote:
<snip>
 No, what is suggested is that we build the complexity guarantees into
 the interface. For example, if you had a List interface, and it could
 only guarantee O(n) performance for get, you'd call it e.g. slowGet(),
 to signify it's a slow operation, and define get to signify a fast
 operation. Similarly, 'in' is defined to be fast, so it has no business
 being a function on an array.

And make get assert(false) if a fast get isn't implementable? I think this is bug-prone - somebody may use it forgetting that it won't work on all List implementations, and then the algorithm will refuse at runtime to work at all rather than degrading gracefully to something that does. But something else that would be useful is methods in the interface to tell which operations are fast and which are slow. Stewart.
Oct 14 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Oct 2010 10:09:17 -0400, Daniel Gibson <metalcaedes gmail.com>  
wrote:

 Steven Schveighoffer schrieb:
 On Thu, 07 Oct 2010 07:54:14 -0400, atommixz <atommixz gmail.com> wrote:

 It would be nice if it were possible to use the "in" expression  
 wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of  
 strings,
 tuples, enum, structure
 - what else?

considered to be a fast operation (< O(n)) so linear search is out. -Steve

Is it? I wouldn't expect it to be < O(n) on a regular array, but it'd still be convenient to have instead of iterating over the array and comparing the contents yourself.

The problem is with generic code. Generic code that accepts some type that defines the "in" operator. What can that generic code expect for performance when using "in"? As a general rule, generic programming must always assume the worst case, and if we have no rules for 'in', the worst case is linear. Which means generic code may not use 'in' when it would be a fast operation. Same thing with indexing. Try sorting a 'random access' range which uses a linear search on opIndex, and see what the performance is. In addition, arr.find(x) seems pretty simple to me. -Steve
Oct 07 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of strings,
 tuples, enum, structure
 - what else?

 In Python done something like this.

 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei
Oct 07 2010
next sibling parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Andrei Alexandrescu schrieb:
 On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of strings,
 tuples, enum, structure
 - what else?

 In Python done something like this.

 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py 

 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei

That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird. Cheers, - Daniel
Oct 07 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/10 9:59 CDT, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of
 strings,
 tuples, enum, structure
 - what else?

 In Python done something like this.

 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py

 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei

That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.

It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me. Andrei
Oct 07 2010
parent reply Daniel Gibson <metalcaedes gmail.com> writes:
Andrei Alexandrescu schrieb:
 On 10/7/10 9:59 CDT, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression 
 wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of
 strings,
 tuples, enum, structure
 - what else?

 In Python done something like this.

 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py 


 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d 

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei

That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.

It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me.

So what about static arrays? Cheers, - Daniel
Oct 07 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/10 10:39 CDT, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/7/10 9:59 CDT, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression
 wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of
 strings,
 tuples, enum, structure
 - what else?

 In Python done something like this.

 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py


 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei

That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.

It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me.

So what about static arrays?

Same deal - same as literal arrays: they can be searched. The expected run time is known during compilation and independent of the input size. Andrei
Oct 07 2010
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-10-07 11:47:21 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 10/7/10 10:39 CDT, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/7/10 9:59 CDT, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression
 wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of
 strings,
 tuples, enum, structure
 - what else?
 
 In Python done something like this.
 
 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py
 
 
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d
 

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei

That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.

It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me.

So what about static arrays?

Same deal - same as literal arrays: they can be searched. The expected run time is known during compilation and independent of the input size.

What about a static array of 200000 elements? At which point does it become linear? I'm okay with it for compile-time constant literals, because the values are known at compile time and the compiler can optimize the search at compile time (so it doesn't have to be linear), but for static array where the content isn't known at compile time, I'd not allow it. Perhaps it could be allowed with a tuple. :-) if (x in ("abcde", "asd")) { ... } -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Oct 07 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/10 12:11 CDT, Michel Fortin wrote:
 On 2010-10-07 11:47:21 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 10/7/10 10:39 CDT, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/7/10 9:59 CDT, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression
 wherever
 possible. Now it is only implemented for associative. arrays.
 (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of
 strings,
 tuples, enum, structure
 - what else?

 In Python done something like this.

 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py



 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei

That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.

It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me.

So what about static arrays?

Same deal - same as literal arrays: they can be searched. The expected run time is known during compilation and independent of the input size.

What about a static array of 200000 elements? At which point does it become linear?

At no point. "Linear" means "linear in the input size". I don't think such arguments are valid. Andrei
Oct 07 2010
next sibling parent Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
 On 10/7/10 12:11 CDT, Michel Fortin wrote:
 On 2010-10-07 11:47:21 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 10/7/10 10:39 CDT, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/7/10 9:59 CDT, Daniel Gibson wrote:
 Andrei Alexandrescu schrieb:
 On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression
 wherever
 possible. Now it is only implemented for associative. arrays.
 (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of
 strings,
 tuples, enum, structure
 - what else?

 In Python done something like this.

 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py 




 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d 

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei

That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.

It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me.

So what about static arrays?

Same deal - same as literal arrays: they can be searched. The expected run time is known during compilation and independent of the input size.

What about a static array of 200000 elements? At which point does it become linear?

At no point. "Linear" means "linear in the input size". I don't think such arguments are valid.

I actually see a very significant difference between 'in' on a compile-time array literal, vs 'in' on a static array. When all values are known, the compiler can make a perfect hash. So potentially it could be O(1). But 'find' on a static array will run the same function which would be used for a dynamic array. The difference is theoretical ONLY -- the exact same asm instructions will be executed! In fact, x in ["abc", "def"] could be supported by allowing implicit conversion from array literals to AA literals. ["abc", "def"] -----> ["abc" : 0, "def" : 1]
Oct 07 2010
prev sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-10-07 14:38:50 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 At no point. "Linear" means "linear in the input size". I don't think 
 such arguments are valid.

It is linear in regard to the array length, the static array being the input. That the length is known at compile time doesn't make it less of an input for the "in" operator, even though it's not an input of the program at runtime. Let's say for instance that you have a big static array of keywords and you want to check if a string contains one of those keywords, would you really do this? string[200] keywords = ["for", "foreach", "auto", .... 200 keywords .... ]; foreach (word; listOfWords) if (word in keywords) writeln("Found a keyword! ", word); That's so easy to read and write! but so stupid at the same time if "in" performs a linear search. In the example above, if "keywords" was immutable then the compiler could optimize things by using a good search strategy, but otherwise it's just ridiculous to allow it to compile. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Oct 07 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/10 16:23 CDT, Michel Fortin wrote:
 On 2010-10-07 14:38:50 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 At no point. "Linear" means "linear in the input size". I don't think
 such arguments are valid.

It is linear in regard to the array length, the static array being the input. That the length is known at compile time doesn't make it less of an input for the "in" operator, even though it's not an input of the program at runtime.

"Input" == "Input to the program" i.e. not known during compilation of the program. Andrei
Oct 07 2010
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-10-07 18:53:51 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 10/7/10 16:23 CDT, Michel Fortin wrote:
 On 2010-10-07 14:38:50 -0400, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:
 
 At no point. "Linear" means "linear in the input size". I don't think
 such arguments are valid.

It is linear in regard to the array length, the static array being the input. That the length is known at compile time doesn't make it less of an input for the "in" operator, even though it's not an input of the program at runtime.

"Input" == "Input to the program" i.e. not known during compilation of the program.

Sorry, but my interpretation in this context is that "Input" should be "Input to the 'in' operator". Sure, it won't affect the complexity of the program if the input of the operator is constant, but the complexity of that operator is still linear in the sense that the time it takes for one search scales linearly with the size of the input. That the size of the input is decided at runtime or at compile time does not change the complexity of the operator, nor does it make it less stupid to use the operator on arrays longer than a few elements. "in" shouldn't be allowed to perform a linear operation, and I mean linear relative to the size of it's input. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Oct 07 2010
prev sibling parent reply "Manfred_Nowak" <svv1999 hotmail.com> writes:
Andrei Alexandrescu wrote:

 The size of the operand is constant, known, and visible.

 The expected run time is known during compilation and independent of
 the input size.

Seems that you are argumenting for some elaborated sort of hungarian notation: operators should be attributed by some otherwise hidden expectations of their runtime? -manfred
Oct 07 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/10 15:00 CDT, Manfred_Nowak wrote:
 Andrei Alexandrescu wrote:

 The size of the operand is constant, known, and visible.

 The expected run time is known during compilation and independent of
 the input size.

Seems that you are argumenting for some elaborated sort of hungarian notation: operators should be attributed by some otherwise hidden expectations of their runtime? -manfred

All I'm saying is that complexity is not an implementation detail. Andrei
Oct 07 2010
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, October 07, 2010 08:37:19 Andrei Alexandrescu wrote:
 
 I'm a bit leary of adopting this feature (it has been discussed). To
 me "in" implies a fast operation and substring searching isn't quite it.
 
 One thing that could be done is to allow "in" with literal arrays to
 their right:
 
 if (x in ["abcde", "asd"]) { ... }
 
 The size of the operand is constant, known, and visible.
 
 
 Andrei

That feels inconsistent.. to be able to use it with "literal arrays to their right" (and what about fixed size arrays?) but not with actual arrays and dynamic arrays seems weird.

It's not. It's all about constant size in the size of the input vs. arbitrary size. Makes perfect sense to me. Andrei

It feels inconsistent because it only works on a particular type (arrays) some of the time. At least with associative arrays it _always_ works. You don't have to worry about whether it's legal in a particular context. Being able to use in on only a subset of arrays (like array literals) would not be consistent with regards to type (even if it's consistent with regards to complexity) and could lead to confusion. Personally, I think that find() and indexof() do the the job just fine and I see no need to expand in. Regardless, I certainly don't want to have in working on arrays only some of the time. I completely agree with Daniel that doing so would be inconsistent. - Jonathan M Davis
Oct 07 2010
prev sibling next sibling parent Marianne Gagnon <auria.mg gmail.com> writes:
IMO this could be added, with the documentation specifying the implementation
doesn't need to best fast. So people who want quick development for things that
don't need high performance may use it, while people coding for performance
just need not use it. The compiler may try to optimize it if it can.

I have seen python programs avoid the use of "in" since it was slower

About substring I don't know, though. But I think it makes sense for
collections (arrays) and opIn or so could be added so other types of
collections may use the "in" keyword perhaps

-- Auria

 On 10/7/10 6:54 CDT, atommixz wrote:
 I'm a bit leary of adopting this feature (it has been discussed). To me 
 "in" implies a fast operation and substring searching isn't quite it.
 
 One thing that could be done is to allow "in" with literal arrays to 
 their right:
 
 if (x in ["abcde", "asd"]) { ... }
 
 The size of the operand is constant, known, and visible.
 
 
 Andrei

Oct 07 2010
prev sibling next sibling parent reply Austin Hastings <ah08010-d yahoo.com> writes:
On 10/7/2010 10:52 AM, Andrei Alexandrescu wrote:
 On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of strings,
 tuples, enum, structure
 - what else?

 In Python done something like this.

 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py

 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei

This is a non-argument. "In" is an operator. "*" is an operator. Everyone "knew" that multiplying longs was slower than multiplying ints. Everyone "knew" that multiplying floating point was slower than multiplying integers. But * was still defined for floating point types because it made sense. If querying a collection for the presence of a key makes sense (and it does to me) then the operator should work across as many collections as possible, regardless of the running time. (The presence of optimized variations is a bonus. Not a requirement.) =Austin
Oct 07 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/10 12:28 CDT, Austin Hastings wrote:
 On 10/7/2010 10:52 AM, Andrei Alexandrescu wrote:
 On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of
 strings,
 tuples, enum, structure
 - what else?

 In Python done something like this.

 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py


 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei

This is a non-argument. "In" is an operator. "*" is an operator. Everyone "knew" that multiplying longs was slower than multiplying ints. Everyone "knew" that multiplying floating point was slower than multiplying integers. But * was still defined for floating point types because it made sense. If querying a collection for the presence of a key makes sense (and it does to me) then the operator should work across as many collections as possible, regardless of the running time. (The presence of optimized variations is a bonus. Not a requirement.) =Austin

Sorry, I disagree with this so completely, I wouldn't know where to start. Andrei
Oct 07 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Oct 2010 13:28:51 -0400, Austin Hastings <ah08010-d yahoo.com>  
wrote:

 On 10/7/2010 10:52 AM, Andrei Alexandrescu wrote:
 On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression  
 wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of  
 strings,
 tuples, enum, structure
 - what else?

 In Python done something like this.

 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py

 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei

This is a non-argument. "In" is an operator. "*" is an operator. Everyone "knew" that multiplying longs was slower than multiplying ints. Everyone "knew" that multiplying floating point was slower than multiplying integers. But * was still defined for floating point types because it made sense.

No, this is not just a difference in runtime, this is a difference in complexity. The complexity of in on an associative array is O(1), the complexity of in on an array is O(n). This is a huge difference from comparing multiplying floats and ints. in's complexity should be O(lg(n)) or better. I also disagree with Andrei that in could work on arrays in certain cases. It has no business being called on an array, we already have find. -Steve
Oct 07 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Oct 2010 16:32:22 -0400, Don <nospam nospam.com> wrote:

 I actually see a very significant difference between 'in' on a  
 compile-time array literal, vs 'in' on a static array. When all values  
 are known, the compiler can make a perfect hash. So potentially it could  
 be O(1).

 But 'find' on a static array will run the same function which would be  
 used for a dynamic array. The difference is theoretical ONLY -- the  
 exact same asm instructions will be executed!

 In fact,   x in ["abc", "def"] could be supported by allowing implicit  
 conversion from array literals to AA literals.
 ["abc", "def"] -----> ["abc" : 0, "def" : 1]

Let's first make literals not allocate heap data or else the big-O complexity don't matter at all :) -Steve
Oct 07 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Oct 2010 19:28:59 -0400, Michel Fortin  
<michel.fortin michelf.com> wrote:

 On 2010-10-07 18:53:51 -0400, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> said:
  "Input" == "Input to the program" i.e. not known during compilation of  
 the program.

Sorry, but my interpretation in this context is that "Input" should be "Input to the 'in' operator". Sure, it won't affect the complexity of the program if the input of the operator is constant, but the complexity of that operator is still linear in the sense that the time it takes for one search scales linearly with the size of the input. That the size of the input is decided at runtime or at compile time does not change the complexity of the operator, nor does it make it less stupid to use the operator on arrays longer than a few elements. "in" shouldn't be allowed to perform a linear operation, and I mean linear relative to the size of it's input.

100% agree. If it can do something non-linear (i.e. auto sort the array or optimize using a static hash table), then I think that's fine. -Steve
Oct 08 2010
prev sibling next sibling parent so <so so.do> writes:
On Thu, 07 Oct 2010 17:52:10 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/7/10 6:54 CDT, atommixz wrote:
 It would be nice if it were possible to use the "in" expression wherever
 possible. Now it is only implemented for associative. arrays. (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of  
 strings,
 tuples, enum, structure
 - what else?

 In Python done something like this.

 Here it would be useful to me
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyze-x86.py
 http://code.google.com/p/atommixz/source/browse/analyze-x86/analyzex86.d

I'm a bit leary of adopting this feature (it has been discussed). To me "in" implies a fast operation and substring searching isn't quite it. One thing that could be done is to allow "in" with literal arrays to their right: if (x in ["abcde", "asd"]) { ... } The size of the operand is constant, known, and visible. Andrei

Reading these boards, both new and old posts, i hardly find any topic that i disagree with you. This one is one of those rare cases. It is really hard for me to connect "operator usage" and "operation complexity". To me, operators are just "cute", and mostly meaningful (and not necessarily) shortcuts. You can overload any operator, doesn't matter what it says/feels, you can do pretty much anything you like, on any complexity level. Either purge it (you get Java) or lift the restrictions and let coder have it. Thanks. -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Oct 09 2010
prev sibling parent so <so so.do> writes:
 Reading these boards, both new and old posts, i hardly find any topic  
 that i disagree with you.
 This one is one of those rare cases.

 It is really hard for me to connect "operator usage" and "operation  
 complexity". To me, operators are just "cute", and mostly meaningful  
 (and not necessarily) shortcuts.
 You can overload any operator, doesn't matter what it says/feels, you  
 can do pretty much anything you like, on any complexity level.

 Either purge it (you get Java) or lift the restrictions and let coder  
 have it.

 Thanks.

Language libraries are special and naturally care needs to be taken, you would need strong guaranties for pretty much anything in Phobos. You can just use some other function, as a reference library, by doing that you also discourage its usage on things that require strong guaranties. Thanks. -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Oct 09 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Oct 2010 13:51:36 -0400, Daniel Gibson <metalcaedes gmail.com>  
wrote:

 Steven Schveighoffer schrieb:
 On Thu, 07 Oct 2010 10:09:17 -0400, Daniel Gibson  
 <metalcaedes gmail.com> wrote:

 Steven Schveighoffer schrieb:
 On Thu, 07 Oct 2010 07:54:14 -0400, atommixz <atommixz gmail.com>  
 wrote:

 It would be nice if it were possible to use the "in" expression  
 wherever
 possible. Now it is only implemented for associative. arrays.  
 (Weird).
 Examples of how this could be used:
 - Find string in string
 - Search for a character in a string
 - Search for an item in the array, array of characters, array of  
 strings,
 tuples, enum, structure
 - what else?

generally considered to be a fast operation (< O(n)) so linear search is out. -Steve

Is it? I wouldn't expect it to be < O(n) on a regular array, but it'd still be convenient to have instead of iterating over the array and comparing the contents yourself.

that defines the "in" operator. What can that generic code expect for performance when using "in"? As a general rule, generic programming must always assume the worst case, and if we have no rules for 'in', the worst case is linear. Which means generic code may not use 'in' when it would be a fast operation. Same thing with indexing. Try sorting a 'random access' range which uses a linear search on opIndex, and see what the performance is. In addition, arr.find(x) seems pretty simple to me. -Steve

I didn't know about arr.find(). Is that documented somewhere? I do however agree that arr.find() is sufficient if it's there.

std.algorithm.find: http://www.digitalmars.com/d/2.0/phobos/std_algorithm.html#find arr.find works because of the universal call syntax for arrays makes arr.find(x) equivalent to find(arr, x). -Steve
Oct 07 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday 09 October 2010 06:54:31 Stewart Gordon wrote:
 On 07/10/2010 15:22, Steven Schveighoffer wrote:
 <snip>
 
 The problem is with generic code. Generic code that accepts some type
 that defines the "in" operator. What can that generic code expect for
 performance when using "in"? As a general rule, generic programming must
 always assume the worst case, and if we have no rules for 'in', the
 worst case is linear. Which means generic code may not use 'in' when it
 would be a fast operation. Same thing with indexing. Try sorting a
 'random access' range which uses a linear search on opIndex, and see
 what the performance is.

<snip> Surely, what matters most for generic code is that in has consistent _semantics_, rather than the computational complexity? Stewart.

_Both_ matter. Imagine, for instance, if we were to make Java's mistake of having List be an interface which ArrayList and LinkedList implement and that List has a get() function. Using ArrayList, get() has a complexity of O(1). Using LinkedList, it has a complexity of O(n). Now, imagine code written with List rather than ArrayList or LinkedList directly. If you use get(), then your code will vary considerably in how efficient is. In small cases, this won't matter, but with large lists, it could matter a lot. Your generic code which uses List _has_ to worry about the computational complexity of get() or it could become very inefficient. In this case, for most situations, that would mean using a foreach loop or iterators directly, but using get() is likely a _bad_ idea. You need to know the computational complexity of get() in order to use List efficiently. To write efficient algorithms, you _must_ be able to rely on certain complexity guarantees for the operations that you use. So, regardless of what those complexity gurantees actually _are_, they need to be there. - Jonathan M Davis
Oct 09 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Stewart Gordon <smjg_1998 yahoo.com> wrote:
 Imagine, for instance, if we were to make Java's mistake of having List  
 be an
 interface which ArrayList and LinkedList implement and that List has a  
 get()
 function. Using ArrayList, get() has a complexity of O(1). Using  
 LinkedList, it
 has a complexity of O(n).

Are you suggesting we just don't have a List interface, just to prevent people from writing algorithms that work well in some list implementations but poorly in others? I'm not sure if that risk outweighs any benefits....

Not at all. What is being argued is that an interface not only dictate what the object does, but to an extent how it does it (i.e. complexity guarantees). Just as an interface offering only function signatures with no explanation of what a particular function does, would be considered insufficient or outright dangerous, as should one not warning of complexity consideration.
 To write efficient algorithms, you _must_ be able to rely on certain  
 complexity
 guarantees for the operations that you use. So, regardless of what those
 complexity gurantees actually _are_, they need to be there.

Setting the complexity guarantee of everything to O(n!) would achieve _that_ aim in most everyday cases.... :)

As well as being completely useless. -- Simen
Oct 12 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 12 Oct 2010 10:48:26 -0400, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:

 On 10/10/2010 02:05, Jonathan M Davis wrote:
 On Saturday 09 October 2010 06:54:31 Stewart Gordon wrote:

 Surely, what matters most for generic code is that in has consistent
 _semantics_, rather than the computational complexity?

 Stewart.

_Both_ matter.

True, yet you don't address semantic differences at all in what follows, so it doesn't really follow on from what I said. But anyway....
 Imagine, for instance, if we were to make Java's mistake of having List  
 be an
 interface which ArrayList and LinkedList implement and that List has a  
 get()
 function. Using ArrayList, get() has a complexity of O(1). Using  
 LinkedList, it
 has a complexity of O(n).

Are you suggesting we just don't have a List interface, just to prevent people from writing algorithms that work well in some list implementations but poorly in others? I'm not sure if that risk outweighs any benefits....

No, what is suggested is that we build the complexity guarantees into the interface. For example, if you had a List interface, and it could only guarantee O(n) performance for get, you'd call it e.g. slowGet(), to signify it's a slow operation, and define get to signify a fast operation. Similarly, 'in' is defined to be fast, so it has no business being a function on an array. So you can still have the List interface, but you don't hijack the "fast operation" of 'in' with a slow operation. This allows it to still provide the slow operation, just not in a way that makes generic code use it assuming it is fast. -Steve
Oct 13 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Stewart Gordon <smjg_1998 yahoo.com> wrote:

 On 13/10/2010 21:11, Steven Schveighoffer wrote:
 <snip>
 No, what is suggested is that we build the complexity guarantees into
 the interface. For example, if you had a List interface, and it could
 only guarantee O(n) performance for get, you'd call it e.g. slowGet(),
 to signify it's a slow operation, and define get to signify a fast
 operation. Similarly, 'in' is defined to be fast, so it has no business
 being a function on an array.

And make get assert(false) if a fast get isn't implementable?

No. That list would not be able to implement the List interface.
 I think this is bug-prone - somebody may use it forgetting that it won't  
 work on all List implementations, and then the algorithm will refuse at  
 runtime to work at all rather than degrading gracefully to something  
 that does.

-- Simen
Oct 14 2010
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 14 Oct 2010 06:04:24 -0400, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:

 On 13/10/2010 21:11, Steven Schveighoffer wrote:
 <snip>
 No, what is suggested is that we build the complexity guarantees into
 the interface. For example, if you had a List interface, and it could
 only guarantee O(n) performance for get, you'd call it e.g. slowGet(),
 to signify it's a slow operation, and define get to signify a fast
 operation. Similarly, 'in' is defined to be fast, so it has no business
 being a function on an array.

And make get assert(false) if a fast get isn't implementable? I think this is bug-prone - somebody may use it forgetting that it won't work on all List implementations, and then the algorithm will refuse at runtime to work at all rather than degrading gracefully to something that does. But something else that would be useful is methods in the interface to tell which operations are fast and which are slow.

No, not really. What I meant by "define get to be a fast operation", I mean define the term "get" to mean fast operation, not to define get on the List interface and you implement the one you want. You'd only define slowGet as a method on the List interface, and then compile-time checks that require fast operations wouldn't be able to use List. If it's acceptable for a compile-time checked function to use slowGet, then it could use either. Essentially, in your interface, you define the operation's complexity, and avoid using terms that imply fast operation. This allows both compile-time interfaces and runtime interfaces. For example, dcollections' List interface doesn't even define something like get. Why? Because Lists aren't meant to be storage containers for quick access to elements. It used to have a contains method, but I eliminated that because the value to cost ratio was too low. -Steve
Oct 14 2010