www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - !in

reply bearophile <bearophileHUGS lycos.com> writes:
I've seen this change:
http://dsource.org/projects/dmd/changeset/388

!in is one of the things I have asked in the first list of requests I have
posted in this newsgroup a lot of time ago :-) Glad to see few of those things
get implemented. Thank you.

The related thing I was asking for, is to use "in" for a linear search of
single items inside an array, as in Python, a very handy thing that's used
frequently (CPython it also implements an efficient algorithm to search for a
substring: http://effbot.org/zone/stringlib.htm  But probably Andrei wants such
substring search to be a separated algorithm, even if it's very specifically
tuned for strings only and not for arrays in general. I can understand this).

Bye,
bearophile
Feb 17 2010
next sibling parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-02-17 10:22:01 -0500, bearophile <bearophileHUGS lycos.com> said:

 I've seen this change:
 http://dsource.org/projects/dmd/changeset/388
 
 !in is one of the things I have asked in the first list of requests I 
 have posted in this newsgroup a lot of time ago :-) Glad to see few of 
 those things get implemented. Thank you.

Indeed, that's great.
 The related thing I was asking for, is to use "in" for a linear search 
 of single items inside an array, as in Python, a very handy thing 
 that's used frequently (CPython it also implements an efficient 
 algorithm to search for a substring: 
 http://effbot.org/zone/stringlib.htm  But probably Andrei wants such 
 substring search to be a separated algorithm, even if it's very 
 specifically tuned for strings only and not for arrays in general. I 
 can understand this).

Are you sure the 'not in' operator in the Python algorithm you liked to works like D's '!in'? It seems to me like it's searching for the absence of a substring. In D 'in' and '!in' only look for one element. I agree that it'd be handy to have 'in' and '!in' work with arrays, especially with array literals: if (x in [1,2,5,6,10]) do(something); Perhaps this syntax could be allowed but only for array literals. The reason being that array literals are usually short so a linear search wouldn't be too bad. Moreover, an array literal being a literal, it's easy for the compiler to optimize things; you're not constrained to a linear search. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Feb 17 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Michel Fortin:
 Are you sure the 'not in' operator in the Python algorithm you liked to 
 works like D's '!in'? It seems to me like it's searching for the 
 absence of a substring.

In Python "in" calls the standard method __contains__() of an object. __contains__ performs hash search in an associative array and a linear search in a list, tuple, array, etc, or in a lazily generated sequence:
 txt = "How are you?"
 "are" in txt



 seq = [1, 2, 3, 4]
 3 in seq



 adict = {1:2, 3:4}
 3 in adict



 from itertools import count
 evens = (2*i for i in count())
 16 in evens



In Python there are no chars, there are just strings, a char is a string of length 1, so the __contains__ performs a substring search (with many optimizations, so if you look for a substring of length 1 it's faster, etc).
In D 'in' and '!in' only look for one element.<

Currently In D 'in' and '!in' only look for one element inside an AA, or calls the opIn/opIn_r if present.
 I agree that it'd be handy to have 'in' and '!in' work with arrays, 
 especially with array literals:
 	if (x in [1,2,5,6,10])
 		do(something);
 Perhaps this syntax could be allowed but only for array literals.

No silly special cases please. If it works for arrays, it works for all arrays. Otherwise nothing is better. Special cases increase complexity and kill languages with a death by a thousand cuts.
The reason being that array literals are usually short so a linear search
wouldn't be too bad.<

"bad" is meaningless. If I have to perform a single linear search in a long array I don't want a nanny compiler to tell me that's a bad thing. Not even Python does it.
Moreover, an array literal being a literal, it's easy for the compiler to
optimize things; you're not constrained to a linear search.<

A linear search on a small array is quite quick in a low level language like D, faster than a badly implemeted hash search (see LDC+Tango, the breaking point for an array of integers is about 15 numbers. DMD is better here). Bye, bearophile
Feb 17 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-02-17 11:24:43 -0500, bearophile <bearophileHUGS lycos.com> said:

 In Python there are no chars, there are just strings, a char is a 
 string of length 1, so the __contains__ performs a substring search 
 (with many optimizations, so if you look for a substring of length 1 
 it's faster, etc).

Actually that's an interesting topic. You're suggesting that 'in' for arrays could search not only for elements but also for subarrays? The interesting part is that the syntax is very natural. The less interesting part is that it's slower and slightly different from 'in' in associative arrays. I'm on the fence. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Feb 17 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Michel Fortin:

You're suggesting that 'in' for arrays could search not only for elements but
also for subarrays?<

I have suggested this probably two years ago or more. The main difference compared the Python situation is that D is statically typed, so the compiler can tell apart the case of searching of a single item from the case of searching a subsequence. A similar case: in Python lists have the .append() and .extend() methods because Python is dynamically typed, so it can't tell apart the two cases. But the static type system of D allows you to just use ~= for both cases.
 The interesting part is that the syntax is very natural. The less 
 interesting part is that it's slower and slightly different from 'in' 
 in associative arrays. I'm on the fence.

I am for it, because the operation is the same, "in" is used to search inside a collection, then the collection uses the faster method it has to answer. But D2 is now finalized, so this feature might need to wait after the phase of debugging and improvement of D2. -------------- Andrei Alexandrescu:
Generally I'd be hesitant to make it all too easy to do linear searches. Too
much of that would encourage people to stick with linear searches because it's
the path of least resistance.<

D programmers know the difference between linear search, hash search and search in an ordered search tree. They are O(n), O(1), and O(ln n). Do you want to use a different name for each of those searches (and maybe a (n log log n if you search with interpolated search, etc) just because their computational complexity is different, or do you want just a syntax that asks the collection to find something using the faster code it has? Even in my Python programs I am well aware of the performance difference between a linear search in a list and an hash search in a set. If I know I will need to perform a search operation often, on in an important loop, I will use a set, otherwise using a list (that is an array) is often good enough. The syntax to search is the same so you can swap data structure in a moment, eventually.
Searching a value in a literal should actually be allowed: x in [10, 20, 30, 0]<

This is very wrong. No special cases, please. If you don't want to support the linear search on all arrays, then please don't add this feature at all, I will keep using my isIn templated function (that digests AAs too, lazy iterables, etc too of course). I don't want more warts and special cases in D, thank you. As Python Zen says: "Special cases aren't special enough to break the rules." Such rules must be read and understood by D devs too, if you want to make a language that's not a mess as C++. Bye, bearophile
Feb 17 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 
 Generally I'd be hesitant to make it all too easy to do linear
 searches. Too much of that would encourage people to stick with
 linear searches because it's the path of least resistance.<

D programmers know the difference between linear search, hash search and search in an ordered search tree. They are O(n), O(1), and O(ln n). Do you want to use a different name for each of those searches (and maybe a (n log log n if you search with interpolated search, etc) just because their computational complexity is different, or do you want just a syntax that asks the collection to find something using the faster code it has?

I think you're giving D programmers too much credit. I think a best-effort search is not widely useful.
 Searching a value in a literal should actually be allowed: x in
 [10, 20, 30, 0]<

This is very wrong. No special cases, please. If you don't want to support the linear search on all arrays, then please don't add this feature at all, I will keep using my isIn templated function (that digests AAs too, lazy iterables, etc too of course). I don't want more warts and special cases in D, thank you. As Python Zen says: "Special cases aren't special enough to break the rules." Such rules must be read and understood by D devs too, if you want to make a language that's not a mess as C++.

Searching an array literal vs. an array variable is not a special case. Andrei
Feb 17 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

I think a best-effort search is not widely useful.<

I meant using a hash search in a hash and a linear search in an array, etc. And having the same syntax to search in different type of collections is widely useful.
Searching an array literal vs. an array variable is not a special case.<

I can't agree. They are both arrays. Bye, bearophile
Feb 17 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
Searching an array literal vs. an array variable is not a special case.<

if (5 in [1, 2, 5]) { ... // OK if (5 in [1, 2, 5].dup) { ... // not OK If you can't see a special case there you are blind. Bye, bearophile
Feb 17 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 
 I think a best-effort search is not widely useful.<

I meant using a hash search in a hash and a linear search in an array, etc. And having the same syntax to search in different type of collections is widely useful.

What evidence do you have that it's widely useful?
 Searching an array literal vs. an array variable is not a special 
 case.<

I can't agree. They are both arrays.

Well I understand you don't agree from the frame you're now in, but you haven't heard my argument yet. Under such conditions you shouldn't say "can't"! Here goes my argument. 1. An array literal has no name and cannot be aliased, hence it's private to the implementation. The implementation is therefore free to induce structure on the searched elements, e.g. by sorting the array or using a hash. 2. The size of an array variable must be conservatively assumed to scale with the size of the input. The size of an array literal scales with the size of the program, or in the worst case (code generation) with statically-known inputs to the program. Andrei
Feb 17 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Andrei Alexandrescu wrote:
 1. An array literal has no name and cannot be aliased, hence it's 
 private to the implementation.

Oops. I meant to say an array literal in the expression "a in [ ... ]". Generally array literals definitely can be aliased :o). Andrei
Feb 17 2010
prev sibling parent reply =?ISO-8859-1?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:
On 02/17/2010 10:25 PM, Andrei Alexandrescu wrote:
 What evidence do you have that it's widely useful?

I use it once in a while in python, evidence enough? :)
 Well I understand you don't agree from the frame you're now in, but you
 haven't heard my argument yet. Under such conditions you shouldn't say
 "can't"! Here goes my argument.

 1. An array literal has no name and cannot be aliased, hence it's
 private to the implementation. The implementation is therefore free to
 induce structure on the searched elements, e.g. by sorting the array or
 using a hash.

 2. The size of an array variable must be conservatively assumed to scale
 with the size of the input. The size of an array literal scales with the
 size of the program, or in the worst case (code generation) with
 statically-known inputs to the program.

I don't see how this is any argument against opIn_r for array variables as well. I mean, the argument against it seems to be to make it more difficult to do a linear search over an array, which seems rather backwards to me.
Feb 17 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Pelle Månsson wrote:
 On 02/17/2010 10:25 PM, Andrei Alexandrescu wrote:
 What evidence do you have that it's widely useful?

I use it once in a while in python, evidence enough? :)
 Well I understand you don't agree from the frame you're now in, but you
 haven't heard my argument yet. Under such conditions you shouldn't say
 "can't"! Here goes my argument.

 1. An array literal has no name and cannot be aliased, hence it's
 private to the implementation. The implementation is therefore free to
 induce structure on the searched elements, e.g. by sorting the array or
 using a hash.

 2. The size of an array variable must be conservatively assumed to scale
 with the size of the input. The size of an array literal scales with the
 size of the program, or in the worst case (code generation) with
 statically-known inputs to the program.

I don't see how this is any argument against opIn_r for array variables as well. I mean, the argument against it seems to be to make it more difficult to do a linear search over an array, which seems rather backwards to me.

Not more difficult, but not deceivingly easy either. I like how you can search with simple syntax in an associative array but not in a linear array. Andrei
Feb 17 2010
prev sibling parent reply BCS <none anon.com> writes:
Hello Michel,

 I agree that it'd be handy to have 'in' and '!in' work with arrays,
 especially with array literals:
 
 if (x in [1,2,5,6,10]) do(something);

 Perhaps this syntax could be allowed but only for array literals. The
 reason being that array literals are usually short so a linear search
 wouldn't be too bad. Moreover, an array literal being a literal, it's
 easy for the compiler to optimize things; you're not constrained to a
 linear search.

How about expanding that to compile time constants of any kind. From that, even with long arrays, you should always be able to get n-log(n) performance (tree search) or better (translate it into a normal AA 'in' or even a perfect hash). -- ... <IXOYE><
Feb 17 2010
parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-02-17 12:56:51 -0500, BCS <none anon.com> said:

 How about expanding that to compile time constants of any kind. From 
 that, even with long arrays, you should always be able to get n-log(n) 
 performance (tree search) or better (translate it into a normal AA 'in' 
 or even a perfect hash).  --

I agree with this. I wrote an example of it in my reply to Andrei. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Feb 17 2010
prev sibling next sibling parent daoryn <asdf sfa.we> writes:
bearophile Wrote:

 I've seen this change:
 http://dsource.org/projects/dmd/changeset/388
 
 !in is one of the things I have asked in the first list of requests I have
posted in this newsgroup a lot of time ago :-) Glad to see few of those things
get implemented. Thank you.
 
 The related thing I was asking for, is to use "in" for a linear search of
single items inside an array, as in Python, a very handy thing that's used
frequently (CPython it also implements an efficient algorithm to search for a
substring: http://effbot.org/zone/stringlib.htm  But probably Andrei wants such
substring search to be a separated algorithm, even if it's very specifically
tuned for strings only and not for arrays in general. I can understand this).
 
 Bye,
 bearophile

I tried to rename the function opIn hoping it would get automatically assumed to be synonim to the "in" operator, but ended up having to put it on my "common.d" file and explicitly use it. public bool inArray(T)(in const(T)[] array, in T elem) pure { foreach(size_t i, T e; array) if (e==elem) return true; return false; } Would be nice also to have a general search function such as public int findFirst(T)(in const(T)[] haystack, in const(T)[] needle) pure in { assert(haystack.length!=0, "haystack.length on slices!() == 0"); assert(needle.length!=0, "haystack.length on slices!() == 0"); assert(haystack.length>=needle.length, "Needle is bigger than haystack."); } out (result) { assert(result>=-1); assert(result<=(haystack.length-needle.length)); } body { const int end=(haystack.length-needle.length)+1; for (int i=0; i<end; i++) if (haystack[i..i+needle.length]==needle) return i; return -1; } I found something similar on std.array but discovered that syntax was very obscure and ended up making my own functions.
Feb 17 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 I've seen this change: http://dsource.org/projects/dmd/changeset/388
 
 !in is one of the things I have asked in the first list of requests I
 have posted in this newsgroup a lot of time ago :-) Glad to see few
 of those things get implemented. Thank you.
 
 The related thing I was asking for, is to use "in" for a linear
 search of single items inside an array, as in Python, a very handy
 thing that's used frequently (CPython it also implements an efficient
 algorithm to search for a substring:
 http://effbot.org/zone/stringlib.htm  But probably Andrei wants such
 substring search to be a separated algorithm, even if it's very
 specifically tuned for strings only and not for arrays in general. I
 can understand this).

That algorithm is nice. Generally I'd be hesitant to make it all too easy to do linear searches. Too much of that would encourage people to stick with linear searches because it's the path of least resistance. Searching a value in a literal should actually be allowed: x in [10, 20, 30, 0] is great because the compiler has complete discretion in how to conduct the search (e.g. linear vs. binary vs. hash search; it can take the initiative of presorting the literal). But general search in an unstructured range... maybe not. That being said, the CPython string you pointed me to looks very interesting. Phobos has a Boyer-Moore search implementation, but, just like Fredrik Lundh, I'm unhappy that it does a fair amount of computation upfront and that it needs dynamic allocation. If someone has the time, please contribute a better implementation of find() in std.algorithm. Andrei
Feb 17 2010
parent reply Michel Fortin <michel.fortin michelf.com> writes:
On 2010-02-17 12:07:05 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Searching a value in a literal should actually be allowed:
 
 x in [10, 20, 30, 0]
 
 is great because the compiler has complete discretion in how to conduct 
 the search (e.g. linear vs. binary vs. hash search; it can take the 
 initiative of presorting the literal). But general search in an 
 unstructured range... maybe not.

Are you talking about literals or compile-time constants? A literal can be built using variables and functions, such as: x in [a, b, c, d, e] This would be mostly equivalent to this: x == a || x == b || x == c || x == d || x == e I'd tend to allow it as it makes it easier to write and read conditionals with repeated comparisons against the same variable. But I guess that's less important than supporting compile-time constants: const allowedCharacters = ['0','1','2','3','4','5','6','7','8','9']; if (x in allowedCharacters) ...; -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Feb 17 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Michel Fortin wrote:
 On 2010-02-17 12:07:05 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> said:
 
 Searching a value in a literal should actually be allowed:

 x in [10, 20, 30, 0]

 is great because the compiler has complete discretion in how to 
 conduct the search (e.g. linear vs. binary vs. hash search; it can 
 take the initiative of presorting the literal). But general search in 
 an unstructured range... maybe not.

Are you talking about literals or compile-time constants? A literal can be built using variables and functions, such as: x in [a, b, c, d, e] This would be mostly equivalent to this: x == a || x == b || x == c || x == d || x == e I'd tend to allow it as it makes it easier to write and read conditionals with repeated comparisons against the same variable.

I was thinking CTFE-able array literals, but I think linear search in a short array (in a way all array literals are O(1)) is fair game too.
 But I guess that's less important than supporting compile-time constants:
 
     const allowedCharacters = ['0','1','2','3','4','5','6','7','8','9'];
 
     if (x in allowedCharacters)
         ...;
 
 

Indeed. Andrei
Feb 17 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Indeed.

Indubitably. (sips brandy)
Feb 17 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Indeed.

Indubitably. (sips brandy)

What's the cultural reference? I know of "indubitably" used as a punchline to a great joke, but there's no brandy in that one. Andrei
Feb 17 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Indeed.

Indubitably. (sips brandy)

What's the cultural reference? I know of "indubitably" used as a punchline to a great joke, but there's no brandy in that one.

Imagine a victorian era upper class gentlemen's club, where the members sit around sipping brandy and puffing on cigars, congratulating each other. (puffs on cigar)
Feb 17 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Walter Bright wrote:
 Andrei Alexandrescu wrote:
 Indeed.

Indubitably. (sips brandy)

What's the cultural reference? I know of "indubitably" used as a punchline to a great joke, but there's no brandy in that one.

Imagine a victorian era upper class gentlemen's club, where the members sit around sipping brandy and puffing on cigars, congratulating each other. (puffs on cigar)

My joke reference is better. Low-brow husband to low-brow woman: "What's with that vocabulary night class you go to, woman? I bet there's orgies in there!" "Indubitably." Andrei
Feb 17 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 My joke reference is better.

Harrumph. Pistols at dawn?
Feb 18 2010
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 Andrei Alexandrescu wrote:
 My joke reference is better.

Harrumph. Pistols at dawn?

A stinky sock is my weapon of choice. Andrei
Feb 18 2010
prev sibling parent reply BCS <none anon.com> writes:
Hello Walter,

 Andrei Alexandrescu wrote:
 
 My joke reference is better.
 


Now that would be D-isaster! -- ... <IXOYE><
Feb 18 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
BCS wrote:
 Hello Walter,
 
 Andrei Alexandrescu wrote:

 My joke reference is better.


Now that would be D-isaster!

Not really. I failed at marksmanship.
Feb 18 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Walter Bright wrote:
 BCS wrote:
 Hello Walter,

 Andrei Alexandrescu wrote:

 My joke reference is better.


Now that would be D-isaster!

Not really. I failed at marksmanship.

That means "really"! For the sake of D you'd better shoot me rather than the other way around :o). Andrei
Feb 18 2010
parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
On 02/18/2010 07:31 PM, Andrei Alexandrescu wrote:

 .. the other way around :o).

 Andrei

<skeptical> with a stinky sock? </skeptical>
Feb 18 2010