
digitalmars.D - Re: "in" everywhere

reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 I'm a bit leery of adopting this feature (it has been discussed). To me 
 "in" implies a fast operation and substring searching isn't quite it.
 
 One thing that could be done is to allow "in" with literal arrays to 
 their right:
 
 if (x in ["abcde", "asd"]) { ... }
 
 The size of the operand is constant, known, and visible.

The "in" operator is useful to tell whether an item is present in a collection (dynamic arrays, fixed-size arrays, sorted arrays, tuples, typetuples, hash sets, tree sets, hash maps, tree maps, trees, graphs, deques, skip lists, finger trees, etc.), and it's useful to tell whether a substring is present in a string. Those two purposes are very common in programs. Even though they are common, there is no strict need for an "in" syntax: in dlibs1 I created an isIn() function that was good enough. But D supports the "in" operator for associative arrays and in its operator overloading (my Set()/set() implementation supports the "in" operator), so it's natural for people to want that operator to be usable for other kinds of collections too, like arrays. Maybe those people have used Python, where the in operator (__contains__) is widely used.

Regarding generic programming: Python doesn't have templates, but it has dynamic typing, which serves purposes similar to generic programming. To a Python function that uses the "in" operator you may pass both a list and an associative array, or any other object that supports __contains__. In that case the Python code performs an O(n) or O(1) operation according to the dynamic type given to the function.

I can live without the "in" operation for arrays and strings (where it would be useful for substring search too). So one solution is to not change the D language and not add support for "in" to arrays. Another solution is just to accept O(n) as the worst-case complexity for the "in" operator; I don't understand what the problem with this is. Yet another solution is to support the "in" operator for dynamic arrays too, and define a new attribute, like complexity(), plus an enum that allows specifying the worst-case complexity. So associative arrays would be annotated with complexity(O.constant), while the function that searches for items/substrings in arrays/strings would be complexity(O.linear).

At compile time, generic code is then able to query the computational complexity of the "in" operator implementation. The problem is that the compiler today can't force functions annotated with complexity(O.constant) to actually avoid a linear search (but I don't think that's a problem; today, if the Range protocol asks that an operation be no worse than O(n log n), the compiler doesn't enforce it either).

I don't like the idea of allowing the "in" operator for sorted arrays, tuples, fixed-size arrays, skip lists, maps, sets, and trees while disallowing it for dynamic arrays. It looks a bit silly. People for the next ten years will keep asking for "in" to be extended to unsorted dynamic arrays and substring search, you can bet on it.

Bye,
bearophile
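The Python behavior bearophile describes can be made concrete. The sketch below (illustrative, not from the thread) shows one generic function whose `in` test is O(n) for a list but O(1) on average for a set or dict:

```python
def has_item(collection, item):
    """Generic membership test: works on any object supporting
    __contains__ (or plain iteration): lists, sets, dicts, strings."""
    return item in collection

# The same generic call, different complexities:
print(has_item([10, 20, 30], 20))       # list: O(n) linear scan
print(has_item({10, 20, 30}, 20))       # set: O(1) average hash lookup
print(has_item({"a": 1, "b": 2}, "b"))  # dict: O(1) average lookup on keys
print(has_item("abcdef", "cde"))        # str: substring search, O(n*m) worst case
```

This is exactly the situation the thread debates: the caller cannot tell from the call site which complexity it gets.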
Oct 07 2010
next sibling parent Pelle <pelle.mansson gmail.com> writes:
On 10/07/2010 09:40 PM, bearophile wrote:
 Andrei Alexandrescu:

 I'm a bit leery of adopting this feature (it has been discussed). To me
 "in" implies a fast operation and substring searching isn't quite it.

 One thing that could be done is to allow "in" with literal arrays to
 their right:

 if (x in ["abcde", "asd"]) { ... }

 The size of the operand is constant, known, and visible.

The "in" is useful to tell if an item is present in a collection (dynamic arrays, fixed-sized arrays, sorted arrays, tuple, typetuple, hash set, tree set, hash map, tree map, trees, graph, deque, skip list, finger tree, etc), and it's useful to tell if a substring is present in a string. [...] People for the next ten years will then ask "in" to be extended for unsorted dynamic arrays & substring search too, you can bet on it. Bye, bearophile

I think it's better to know that in always has logarithmic or better complexity. Otherwise, there is always canFind (which needs a new name :-)
Oct 07 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/10 14:40 CDT, bearophile wrote:
 Another solution is just to accept O(n) as the worst complexity for the "in"
operator. I don't understand what the problem with this is.

That means we'd have to define another operation, i.e. "quickIn", that has an O(log n) bound. Andrei
Oct 07 2010
next sibling parent reply Rainer Deyke <rainerd eldwood.com> writes:
On 10/7/2010 13:57, Andrei Alexandrescu wrote:
 On 10/7/10 14:40 CDT, bearophile wrote:
 Another solution is just to accept O(n) as the worst complexity for
 the "in" operator. I don't understand what the problem with this is.

That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.

Why? I can't say I've ever cared about the big-O complexity of an operation. All I care about is that it's "fast enough", which is highly context-dependent and may have nothing to do with complexity. I can't see myself replacing my 'int[]' arrays with the much slower and bigger 'int[MAX_SIZE]' arrays just to satisfy the compiler. I shouldn't have to. The type system shouldn't encourage me to.

I think it's an abuse of the type system to use it to guarantee performance. However, if I wanted the type system to provide performance guarantees, I would need a lot more language support than a convention that certain operations are supposed to be O(n). I'm talking about performance specifications on *all* functions, with a compile-time error if the compiler can't prove that the compiled function meets those guarantees. And *even then*, I would like to be able to use an O(n) implementation of 'in' where I know that O(n) performance is acceptable.

-- 
Rainer Deyke - rainerd eldwood.com
Oct 07 2010
next sibling parent Stanislav Blinov <stanislav.blinov gmail.com> writes:
Steven Schveighoffer wrote:
 On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com> 
 wrote:
 I can't say I've ever cared about the big-O complexity of an operation.

Then you don't understand how important it is. Here is an example of how caring about the big-O complexity cut the runtime of dmd to about 1/3: http://d.puremagic.com/issues/show_bug.cgi?id=4721

Big-O complexity is very important when you are writing libraries. Not so much when you are writing applications -- if you can live with it in your application, then fine. But Phobos should not have these problems for people who *do* care.

What I'd suggest is to write your own function that uses in when possible and find when not possible. Then use that in your code.

Personally, I'd vote for inclusion of such a function in Phobos. The big problem with C++ containers was and is the absence of a uniform interface. Typedefing those templates to save typing was good, but it didn't completely solve the problem of interchangeability. E.g., you change from list to vector and suddenly realize that remove() doesn't work anymore. Having a uniform interface for insertion/removal/lookup is plainly awesome.

What if Phobos defined a function, say, E* contains(C,E)(C container, E element), that would behave as the 'in' operator as best it could for type C? It would fit frequent demands without making big promises - in other words, it would still be largely useful. Yes, I understand that a good (performance-wise) generic solution isn't that simple to discover and implement. But having an easily accessible (i.e. standard), though maybe not all that efficient, solution is a good thing nevertheless.
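The uniform contains(C,E) idea could be sketched roughly as follows (a hypothetical helper, not an actual Phobos function; Python is used only for illustration): dispatch to the container's fast membership test when one exists, fall back to a linear scan otherwise, and return the found element or a null-like value in the spirit of the proposed E* return type:

```python
def contains(container, element):
    """Generic lookup in the spirit of a uniform contains(C, E):
    use the container's fast membership test when available (sets,
    dicts), otherwise fall back to an O(n) linear scan.
    Returns the stored element, or None if absent (mimicking the
    E* return of the proposed signature)."""
    if isinstance(container, (set, frozenset, dict)):
        # O(1) average: hash-based membership
        return element if element in container else None
    # O(n) fallback for arbitrary iterables
    for item in container:
        if item == element:
            return item
    return None

print(contains({1, 2, 3}, 2))   # fast path
print(contains([1, 2, 3], 2))   # linear fallback
print(contains([1, 2, 3], 9))   # absent: None
```

As in the proposal, this fits frequent demands without making big complexity promises: callers who need guarantees still have to pick an appropriate container.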
Oct 07 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/7/10 15:23 CDT, Rainer Deyke wrote:
 On 10/7/2010 13:57, Andrei Alexandrescu wrote:
 On 10/7/10 14:40 CDT, bearophile wrote:
 Another solution is just to accept O(n) as the worst complexity for
 the "in" operator. I don't understand what the problem with this is.

That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.

Why? I can't say I've ever cared about the big-O complexity of an operation.

Complexity composes very badly. Issues tend to manifest at moderate sizes and may make things unbearable at large sizes. I'm really grateful I'm at a workplace where the exclamation "Damn! I was waiting like an idiot for this quadratic append!" is met with understanding nods from workmates who've made the same mistake before.

As an example, only last week I was working on cramming a sort of an index of the entire Wikipedia on one machine. I was waiting for the indexer, which ran slower and slower and slower. In the end I figured there was only _one_ quadratic operation - appending to a vector<size_t> that held document lengths. That wasn't even the bulk of the data and it was the last thing I looked at! Yet it made the run time impossible to endure.
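The quadratic-append trap can be made concrete by counting element copies (an illustrative model, not the actual vector implementation): growing the buffer by one slot per append copies a quadratic number of elements in total, while capacity doubling keeps the total below 2n:

```python
def copies_grow_by_one(n):
    """Element copies when every append reallocates to size+1:
    0 + 1 + ... + (n-1) = n*(n-1)/2, i.e. quadratic."""
    copies = 0
    for size in range(n):
        copies += size          # reallocation copies all existing elements
    return copies

def copies_doubling(n):
    """Element copies with capacity doubling: each element is copied
    O(1) times amortized, so the total stays below 2*n."""
    copies, size, capacity = 0, 0, 1
    for _ in range(n):
        if size == capacity:
            copies += size      # reallocate once: copy everything over
            capacity *= 2
        size += 1
    return copies

print(copies_grow_by_one(100_000))  # 4999950000 copies: quadratic blow-up
print(copies_doubling(100_000))     # 131071 copies: linear overall
```

The quadratic variant does roughly 38,000 times the work at this size, which is why a single such append loop can dominate an entire indexing run.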
 All I care about is that it's "fast enough", which is highly
 context-dependent and may have nothing to do with complexity.  I can't
 see myself replacing my 'int[]' arrays with the much slower and bigger
 'int[MAX_SIZE]' arrays just to satisfy the compiler.  I shouldn't have
 to.  The type system shouldn't encourage me to.

 I think it's an abuse of the type system to use it to guarantee
 performance.  However, if I wanted the type system to provide
 performance guarantees, I would need a lot more language support than a
 convention that certain operations are supposed to be O(n).  I'm talking
 performance specification on *all* functions, with a compile-time error
 if the compiler can't prove that the compiled function meets those
 guarantees.  And *even then*, I would like to be able to use an O(n)
 implementation of 'in' where I know that O(n) performance is acceptable.

std.container introduces the convention that O(n) methods start with "linear". Andrei
Oct 07 2010
next sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
Andrei Alexandrescu wrote:

 
 Complexity composes very badly. Issues tend to manifest at moderate 
 sizes and may make things unbearable at large sizes. I'm really grateful 
 I'm at a workplace where the exclamation "Damn! I was waiting like an 
 idiot for this quadratic append!" is met with understanding nods from 
 workmates who've made the same mistake before.
 
 As an example, only last week I was working on cramming a sort of an 
 index of the entire Wikipedia on one machine. I was waiting for the 
 indexer which ran slower and slower and slower. In the end I figured 
 there was only _one_ quadratic operation - appending to a vector<size_t> 
 that held document lengths. That wasn't even the bulk of the data and it 
 was the last thing I looked at! Yet it made the run time impossible to 
 endure.
 

But that is not a matter of library interface, is it? It's a matter of algorithm/container choice. It's not push_back that was slow in the end, it was std::vector (yes, that's arguable, but the point is that the container defines the rules for its methods, not vice versa).
 I think it's an abuse of the type system to use it to guarantee
 performance.  However, if I wanted the type system to provide
 performance guarantees, I would need a lot more language support than a
 convention that certain operations are supposed to be O(n).  I'm talking
 performance specification on *all* functions, with a compile-time error
 if the compiler can't prove that the compiled function meets those
 guarantees.  And *even then*, I would like to be able to use an O(n)
 implementation of 'in' where I know that O(n) performance is acceptable.

std.container introduces the convention that O(n) methods start with "linear".

I find such a convention useful indeed, though it brings a fact to the surface: if we need to emphasize methods that make strong promises about complexity with prefixes/suffixes, or say by putting them into a separate module, then why don't we have a non-emphasized counterpart that doesn't make strong promises but would fit a wider range of containers? After all, std.algorithm offers different search mechanisms with varying complexity (e.g. find() vs boyerMooreFinder()).
Oct 07 2010
next sibling parent Stanislav Blinov <stanislav.blinov gmail.com> writes:
Stanislav Blinov wrote:

 ... make strong promises but would fit wider range of containers? After all, 
 std.algorithm offers ...

Sorry, that should be 'cases', not 'containers' there.
Oct 07 2010
prev sibling next sibling parent reply Pelle <pelle.mansson gmail.com> writes:
On 10/08/2010 03:18 AM, Juanjo Alvarez wrote:
 On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:
 Except that when you're dealing with generic code which has to deal with
 multiple container types (like std.algorithm), you _need_ certain complexity
 guarantees about an operation since it could happen on any container

Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.

What do you suggest for fast lookup in a container?
Oct 08 2010
parent reply Juanjo Alvarez <juanjux gmail.com> writes:
Pelle Wrote:

 On 10/08/2010 03:18 AM, Juanjo Alvarez wrote:
 On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:
 Except that when you're dealing with generic code which has to deal with
 multiple container types (like std.algorithm), you _need_ certain complexity
 guarantees about an operation since it could happen on any container

Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.

What do you suggest for fast lookup in a container?

What is being used now? How can having "in" and not using it (just like now) in functions requiring guaranteed complexity be worse than not having it? The only drawback I can see to having an "in" operator with all containers is that some programmers would not read the documentation and would use it expecting it to be fast. But that also happens with many other language constructs, and some programmers will write crappy algorithms anyway.
Oct 08 2010
parent Pelle <pelle.mansson gmail.com> writes:
On 10/08/2010 01:45 PM, Juanjo Alvarez wrote:
 Pelle Wrote:

 On 10/08/2010 03:18 AM, Juanjo Alvarez wrote:
 On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis
 <jmdavisProg gmx.com>  wrote:
 Except that when you're dealing with generic code which has to deal with
 multiple container types (like std.algorithm), you _need_ certain complexity
 guarantees about an operation since it could happen on any container

Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.

What do you suggest for fast lookup in a container?

 What is being used now? How can having "in" and not using it (just like now) in functions requiring guaranteed complexity be worse than not having it?

If I write a generic algorithm, I can use opIn and assume it is fast. If I don't need the speed, I can use canFind over the container's range instead. If we say opIn can be slow, the fast version goes away.
 The only drawback I can see to having an "in" operator with all containers is
that some programmers would not read the documentation and use it expecting it
to be fast. But then that also happens with many other language constructs and
some programmers will write crappy algorithms anyway.

Oct 08 2010
prev sibling next sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
Jonathan M Davis wrote:
 On Thursday, October 07, 2010 14:44:50 Stanislav Blinov wrote:

 But that is not a matter of library interface isn't it? It's a matter of
 algorithm/container choice. It's not the push_back that was slow in the
 end, it was std::vector (yes, that's arguable, but the point is that
 container defines rules for its methods, not vice-versa).

Except that when you're dealing with generic code which has to deal with multiple container types (like std.algorithm), you _need_ certain complexity guarantees about an operation, since it could happen on any container that it's given. Using the in operator in an algorithm could be perfectly reasonable if it had O(1) or O(log n) complexity but completely unreasonable if it had O(n) complexity. So the algorithm is then reasonable with some containers and not with others, whereas if in were restricted to O(1) or O(log n), it could be used by the algorithm without worrying that some container would make it an order of complexity greater, or worse.

If it were strictly a matter of writing code which directly used a particular algorithm and container type, then you could know what the complexities of the algorithm and of the operations on the container are. But once you're dealing with generic code, operations need to have complexity guarantees regardless of their container, or the complexity of the algorithm will vary with the container type. And if that algorithm is used in yet another algorithm, it gets that much worse. It becomes easy to bury the place where what you're trying to do becomes an order of complexity greater, because the container you selected was an order of complexity greater than other containers on an operation that an algorithm is using, buried in code somewhere.

- Jonathan M Davis

Yet still, generality ends at some point. You can't devise every possible algorithm for every possible type and have it have set-in-stone complexity independently of types. Take std.range.popFrontN(). It's generic, and it's used in other algorithms. Yet it has O(1) complexity for ranges that support slicing, and O(n) for those that do not. But you don't take into account the complexity of the slicing operation, or the complexity of stepping through the range. Or std.algorithm.find(): it's basically O(n), but then again, when using it, one should also consider the complexity of the predicate used.

Just the same happened in the case Andrei described: the algorithm was O(n) judging from the description (performing n insertions into a container), but the container itself blew it up into quadratic time because of its own insertion algorithm.

What I mean is you'll always have algorithms that perform differently for different containers, and you'll always have to choose the containers that best fit your needs. Generic code is not only about efficiency: you'd at least expect it to work for different types. Replacing vector with list in Andrei's case would probably solve the problem at the cost of losing random access (together with contiguous storage). Which means it'd also require changing code if random access was in use somewhere. Having a generic random-access function at(Container, index) (offering complexity that varies with the container used: e.g. O(1) for arrays, O(n) for lists) would save you maintenance.
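The hypothetical at() helper (the name and interface are the proposal above, not an existing API) could be sketched like this, with Python standing in as the illustration language: O(1) when the container supports true random access, an O(n) walk otherwise:

```python
from collections.abc import Sequence

def at(container, index):
    """Generic random access: O(1) for sequences with true indexing
    (lists, tuples, strings), O(n) walk for containers that only
    iterate (linked lists, generators, ...)."""
    if isinstance(container, Sequence):
        return container[index]           # O(1) for array-like types
    for i, item in enumerate(container):  # O(n) fallback: step through
        if i == index:
            return item
    raise IndexError(index)

print(at([10, 20, 30], 1))               # O(1) path
print(at((x * x for x in range(5)), 3))  # O(n) walk over a generator
```

As in the proposal, the complexity varies with the container, but callers get one interface and don't need code changes when the container type changes.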
Oct 08 2010
parent Stanislav Blinov <stanislav.blinov gmail.com> writes:
Jonathan M Davis wrote:
 On Friday 08 October 2010 03:06:01 Stanislav Blinov wrote:

 [...]
 What I mean is you'll always have algorithms that will perform
 differently for different containers, and you'll always have to choose
 containers that best fit your needs [...]


 
 All true. However, the point is that operations need to have a known
 complexity. If in is known to be O(1), the algorithms can safely use it,
 knowing that it's O(1). Any container that can't do it in O(1) doesn't
 implement in. You could, on the other hand, have in be O(n) (which doesn't
 stop containers from implementing it as O(1), since that's better than
 O(n)), and then any algorithm that uses it has to assume that it could be
 O(n) and therefore may need to avoid it. The real question is what
 complexity we want to define it as. At the moment, the only container to
 use it is the built-in associative arrays, and for them it's O(1). Since
 find() is already going to be O(n), it seems to me that in should be
 better than O(n) - be it O(1) or O(log n) - and Andrei appears to agree,
 but others don't, hence this long discussion...
 

Ahh, I think I get the perspective now, though I had to reread the whole thread two times. Thank you.
Oct 08 2010
prev sibling parent reply Juanjo Alvarez <juanjux gmail.com> writes:
Steven Schveighoffer Wrote:

 On Thu, 07 Oct 2010 21:18:56 -0400, Juanjo Alvarez <fake fakeemail.com>  
 wrote:
 
 On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis  
 <jmdavisProg gmx.com> wrote:
 Except that when you're dealing with generic code which has to deal with
 multiple container types (like std.algorithm), you _need_ certain complexity
 guarantees about an operation since it could happen on any container

Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.

That kind of "documentation" is useless, it doesn't prevent use, and it doesn't feel right to the person who accidentally uses it. When I call sort(x); and it performs horribly, am I going to blame x or sort? Certainly, I'll never think it's my own fault :) -Steve

True! And that's the only drawback I see in generalizing "in", but there are many things in programming languages that don't feel right when you don't know the language well. That doesn't mean that D should be the "programming for dummies on rails with a constant automated tutor included" language; if I read the site correctly, it is meant to be a practical language with the ability to shoot yourself in the foot. Still, I don't understand how generalizing "in" could affect std.algorithm et al. if they only use "in" for AAs, just like now.
Oct 08 2010
parent Lutger <lutger.blijdestijn gmail.com> writes:
Juanjo Alvarez wrote:

 Steven Schveighoffer Wrote:
 
 On Thu, 07 Oct 2010 21:18:56 -0400, Juanjo Alvarez <fake fakeemail.com>
 wrote:
 
 On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis
 <jmdavisProg gmx.com> wrote:
 Except that when you're dealing with generic code which has to deal with
 multiple container types (like std.algorithm), you _need_ certain complexity
 guarantees about an operation since it could happen on any container

Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.

That kind of "documentation" is useless, it doesn't prevent use, and it doesn't feel right to the person who accidentally uses it. When I call sort(x); and it performs horribly, am I going to blame x or sort? Certainly, I'll never think it's my own fault :) -Steve

 True! And that's the only drawback I see in generalizing "in", but there are many things in programming languages that don't feel right when you don't know the language well. That doesn't mean that D should be the "programming for dummies on rails with a constant automated tutor included" language; if I read the site correctly, it is meant to be a practical language with the ability to shoot yourself in the foot. Still, I don't understand how generalizing "in" could affect std.algorithm et al. if they only use "in" for AAs, just like now.

Suppose I would like to use a faster AA (for some cases) and define opIn with good lookup behavior. Then the algorithms in Phobos would think my opIn is crap; what now? All code using regular AAs could have just been replaced by my super-duper user-defined AA if it were not for this generalizing of opIn. So we need another operation that replaces the previously fast opIn, and make Phobos use that instead. But why bother, if we have a perfectly good one to begin with?
Oct 08 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, October 07, 2010 14:44:50 Stanislav Blinov wrote:
 Andrei Alexandrescu wrote:
 Complexity composes very badly. Issues tend to manifest at moderate
 sizes and may make things unbearable at large sizes. I'm really grateful
 I'm at a workplace where the exclamation "Damn! I was waiting like an
 idiot for this quadratic append!" is met with understanding nods from
 workmates who've made the same mistake before.
 
 As an example, only last week I was working on cramming a sort of an
 index of the entire Wikipedia on one machine. I was waiting for the
 indexer which ran slower and slower and slower. In the end I figured
 there was only _one_ quadratic operation - appending to a vector<size_t>
 that held document lengths. That wasn't even the bulk of the data and it
 was the last thing I looked at! Yet it made the run time impossible to
 endure.

But that is not a matter of library interface, is it? It's a matter of algorithm/container choice. It's not push_back that was slow in the end, it was std::vector (yes, that's arguable, but the point is that the container defines the rules for its methods, not vice versa).

Except that when you're dealing with generic code which has to deal with multiple container types (like std.algorithm), you _need_ certain complexity guarantees about an operation, since it could happen on any container that it's given. Using the in operator in an algorithm could be perfectly reasonable if it had O(1) or O(log n) complexity but completely unreasonable if it had O(n) complexity. So the algorithm is then reasonable with some containers and not with others, whereas if in were restricted to O(1) or O(log n), it could be used by the algorithm without worrying that some container would make it an order of complexity greater, or worse.

If it were strictly a matter of writing code which directly used a particular algorithm and container type, then you could know what the complexities of the algorithm and of the operations on the container are. But once you're dealing with generic code, operations need to have complexity guarantees regardless of their container, or the complexity of the algorithm will vary with the container type. And if that algorithm is used in yet another algorithm, it gets that much worse. It becomes easy to bury the place where what you're trying to do becomes an order of complexity greater, because the container you selected was an order of complexity greater than other containers on an operation that an algorithm is using, buried in code somewhere.

- Jonathan M Davis
Oct 07 2010
prev sibling next sibling parent Juanjo Alvarez <fake fakeemail.com> writes:
On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis 
<jmdavisProg gmx.com> wrote:
 Except that when you're dealing with generic code which has to deal with
 multiple container types (like std.algorithm), you _need_ certain complexity
 guarantees about an operation since it could happen on any container

Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.
Oct 07 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Oct 2010 21:18:56 -0400, Juanjo Alvarez <fake fakeemail.com>  
wrote:

 On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis  
 <jmdavisProg gmx.com> wrote:
 Except that when you're dealing with generic code which has to deal with
 multiple container types (like std.algorithm), you _need_ certain complexity
 guarantees about an operation since it could happen on any container

Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.

That kind of "documentation" is useless, it doesn't prevent use, and it doesn't feel right to the person who accidentally uses it. When I call sort(x); and it performs horribly, am I going to blame x or sort? Certainly, I'll never think it's my own fault :) -Steve
Oct 08 2010
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/8/10 5:24 CDT, Torarin wrote:
 2010/10/7 Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>:
 In the end I figured there was only _one_
 quadratic operation - appending to a vector<size_t>  that held document
 lengths. That wasn't even the bulk of the data and it was the last thing I
 looked at! Yet it made the run time impossible to endure.

 From sgi's stl vector:

void push_back(const _Tp& __x) {
    if (_M_finish != _M_end_of_storage) {
        construct(_M_finish, __x);
        ++_M_finish;
    }
    else
        _M_insert_aux(end(), __x);
}

How is this quadratic, let alone linear?

My code wasn't using push_back. Andrei
Oct 08 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Juanjo Alvarez <juanjux gmail.com> wrote:

 True! And that's the only drawback I see in generalizing "in", but there
 are many things in programming languages that don't feel right when you
 don't know the language well. That doesn't mean that D should be the
 "programming for dummies on rails with a constant automated tutor
 included" language; if I read the site correctly, it is meant to be a
 practical language with the ability to shoot yourself in the foot.

Absolutely. And one of its trademarks is being as fast as C. Now, C clearly does not have the 'in' operator, but it is a goal in D that the obvious way to do something should be fast and correct.
 Still, I don't understand how generalizing "in" could affect  
 std.algorithm et al if they only use "in" for AAs, just like now.

Because if 'in' is available for other uses for other containers, one would be tempted to use it also there. The alternative is to put it in the coding standards: 43. Thou shalt not use magic numbers. 44. Thou shalt not use 'in', as it may be slow as heck. 45. Thou shalt not write spaghetti code. Nor macaroni. This would make the feature useless. -- Simen
Oct 08 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 08 Oct 2010 09:46:54 -0400, Juanjo Alvarez <juanjux gmail.com>  
wrote:

 Steven Schveighoffer Wrote:

 That kind of "documentation" is useless, it doesn't prevent use, and it
 doesn't feel right to the person who accidentally uses it.  When I call

 sort(x);

 and it performs horribly, am I going to blame x or sort?  Certainly,  
 I'll
 never think it's my own fault :)

 -Steve

True! And that's the only drawback I see in generalizing "in", but there are many things in programming languages that don't feel right when you don't know the language well. That doesn't mean that D should be the "programming for dummies on rails with a constant automated tutor included" language; if I read the site correctly, it is meant to be a practical language with the ability to shoot yourself in the foot. Still, I don't understand how generalizing "in" could affect std.algorithm et al if they only use "in" for AAs, just like now.

Let's move off of "in" for a minute, so I can illustrate what I mean ("in" is kind of a non-universal operator; other languages define it differently). Let's try indexing. Consider the following code:

for(int i = 0; i < x.length; i++) x[i]++;

You might think it's O(n). But let's say the author of x didn't care about complexity, and just felt like [] should be for indexing, no matter what the cost. Then x could be a linked list, each index operation is O(n), and this block of code becomes O(n^2); your performance suffers. You may not notice it, it might be "acceptable", but then somewhere down the road you start calling this function more, and your program all of a sudden gets really slow. What gives? Say you spend 1-2 hours looking for this and find out that the problem is that indexing x is linear. You can change the loop to this:

foreach(ref i; x) i++;

and all of a sudden your performance comes back.

Maybe the author of x puts right in his docs that indexing is an O(n) operation. You might grumble about it and move on. But if this happens all the time, you are just going to start blaming the author of x more than your own incompetence. It's one of those things where user interface designers get it, and most engineers don't -- people have certain flaws, and a big one is not reading the manual. Making the interface as intuitive as possible is very important for the success of a product.

But what if we made the language such that this *couldn't possibly happen*, because you don't allow indexing on linked lists? This has two very good properties:

1. It makes the user more aware of the limitations, even though the syntax is harder to use (hm... it doesn't let me use indexing, there must be a good reason).
2. It makes *even experienced users* avoid this bug because they can't possibly compile it.

#2 is what I care about most.
As an experienced developer, I still might make the above mistake (look at Walter's mistake on the compiler that I mentioned earlier). We don't want to set minefields for experienced developers if we can help it. Yes they can shoot themselves in the foot, but we don't want to remove the safety on the gun so they are *more likely* to shoot themselves in the foot. -Steve
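Steve's linked-list scenario translates directly into a sketch. LinkedList here is a hypothetical stand-in for the container "x" in the post; it counts node visits so the hidden cost is measurable:

```python
# A container whose indexing is O(n) makes an innocent-looking loop O(n^2).

class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

class LinkedList:
    """Singly linked list that (unwisely) supports indexing."""
    def __init__(self, values):
        self.head = None
        for v in reversed(list(values)):
            self.head = Node(v, self.head)
        self.steps = 0                  # node visits, to expose the cost

    def __iter__(self):                 # the 'foreach' path: O(n) total
        node = self.head
        while node:
            self.steps += 1
            yield node.value
            node = node.next

    def __getitem__(self, i):           # O(n): walks from the head each call
        node = self.head
        for _ in range(i):
            self.steps += 1
            node = node.next
        self.steps += 1
        return node.value

    def __len__(self):
        n, node = 0, self.head
        while node:
            n, node = n + 1, node.next
        return n

xs = LinkedList(range(100))
for i in range(len(xs)):                # looks O(n), is actually O(n^2)
    xs[i]
print(xs.steps)                         # 1 + 2 + ... + 100 = 5050 visits

xs.steps = 0
for v in xs:                            # iteration: 100 visits
    pass
print(xs.steps)
```

The indexed loop touches 5050 nodes for 100 elements; plain iteration touches 100.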
Oct 08 2010
prev sibling next sibling parent so <so so.do> writes:
On Fri, 08 Oct 2010 14:45:43 +0300, Juanjo Alvarez <juanjux gmail.com>  
wrote:

 Pelle Wrote:

 The only drawback I can see to having an "in" operator with all  
 containers is that some programmers would not read the documentation and  
 use it expecting it to be fast. But then that also happens with many  
 other language constructs and some programmers will write crappy  
 algoritms anyway.

This is pretty much it. I would never use a language which is designed for "some programmers". -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Oct 09 2010
prev sibling parent so <so so.do> writes:
On Fri, 08 Oct 2010 14:58:27 +0300, Steven Schveighoffer  
<schveiguy yahoo.com> wrote:

 On Thu, 07 Oct 2010 21:18:56 -0400, Juanjo Alvarez <fake fakeemail.com>  
 wrote:

 On Thu, 7 Oct 2010 15:53:13 -0700, Jonathan M Davis  
 <jmdavisProg gmx.com> wrote:
 Except that when you're dealing with generic code which has to deal

 multiple container types (like std.algorithm), you _need_ certain

 guarantees about an operation since it could happen on any

Then, don't use it in std.algorithm or any other code that needs guaranteed complexity, just like now. I don't see the problem with a generic "in" operator, nobody would be forced to use it.

That kind of "documentation" is useless, it doesn't prevent use, and it doesn't feel right to the person who accidentally uses it. When I call sort(x); and it performs horribly, am I going to blame x or sort? Certainly, I'll never think it's my own fault :) -Steve

Sure, write some random strings and compile them; if they don't compile, you can always blame Walter, right? If documentation is useless, so are most programmers; you've got to accept it :) Question is, should this affect compiler design? If you think it should, you can't write a single line that calls "some other guy"'s code; it doesn't matter if he uses "in" or "out", operators or just simple functions. Thanks! -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Oct 09 2010
prev sibling next sibling parent Rainer Deyke <rainerd eldwood.com> writes:
On 10/7/2010 14:33, Steven Schveighoffer wrote:
 On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 I can't say I've ever cared about the big-O complexity of an operation.

Then you don't understand how important it is.

Let me rephrase that. I care about performance. Big-O complexity can obviously have a significant effect on performance, so I do care about it, but only to the extent that it affects performance. Low big-O complexity is a means to an end, not a goal in and of itself. If 'n' is low enough, then an O(2**n) algorithm may well be faster than an O(1) algorithm. I also believe that, in the absence of a sophisticated system that actually verifies performance guarantees, the language and standard library should trust the programmer to know what he is doing. The standard library should only provide transitive performance guarantees, e.g. this algorithm calls function 'f' 'n' times, so the algorithm's performance is O(n * complexity(f)). If 'f' runs in constant time, the algorithm runs in linear time. If 'f' runs in exponential time, the algorithm still runs.
 big O complexity is very important when you are writing libraries.  Not
 so much when you are writing applications -- if you can live with it in
 your application, then fine.  But Phobos should not have these problems
 for people who *do* care.
 
 What I'd suggest is to write your own function that uses in when
 possible and find when not possible.  Then use that in your code.

The issue is that algorithms may use 'in' internally, so I may have to rewrite large parts of Phobos. (And the issue isn't about 'in' specifically, but complexity guarantees in general.) -- Rainer Deyke - rainerd eldwood.com
Oct 07 2010
prev sibling parent reply =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:

Steven Schveighoffer wrote:
 On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 On 10/7/2010 13:57, Andrei Alexandrescu wrote:
 On 10/7/10 14:40 CDT, bearophile wrote:
 Another solution is just to accept O(n) as the worst complexity for
 the "in" operator. I don't understand what's the problem in this.

That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.

Why? I can't say I've ever cared about the big-O complexity of an operation.
 Then you don't understand how important it is.

If big O complexity is so important, then why does everyone use quicksort (which is O(n**2)) and not heap sort or merge sort (which are O(n*log(n)))? Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Oct 08 2010
next sibling parent Lutger <lutger.blijdestijn gmail.com> writes:
"Jérôme M. Berger" wrote:

 Steven Schveighoffer wrote:
 On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 
 On 10/7/2010 13:57, Andrei Alexandrescu wrote:
 On 10/7/10 14:40 CDT, bearophile wrote:
 Another solution is just to accept O(n) as the worst complexity for
 the "in" operator. I don't understand what's the problem in this.

That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.

Why? I can't say I've ever cared about the big-O complexity of an operation.

Then you don't understand how important it is.

If big O complexity is so important, then why does everyone use quicksort (which is O(n**2)) and not heap sort or merge sort (which are O(n*log(n)))? Jerome

For the average-case big O, and because it dwarfs heap/merge sort in the constant factor. But in fact it's not totally true that everyone uses quicksort pur sang: most sort implementations deal with the worst case of quicksort, by switching to heapsort for example. It is exactly because of this behavior.
Oct 08 2010
prev sibling parent reply =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:

Seth Hoenig wrote:
 2010/10/8 "Jérôme M. Berger" <jeberger free.fr>
 
  Steven Schveighoffer wrote:
  On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com>
  wrote:
  I can't say I've ever cared about the big-O complexity of an operation.
 
  Then you don't understand how important it is.
 
  If big O complexity is so important, then why does everyone use
  quicksort (which is O(n**2)) and not heap sort or merge sort (which
  are O(n*log(n)))?
 
                 Jerome
 
 Because on average quicksort is faster than heap sort, and uses much less
 space than merge sort. Also, trivial guards can be put in place to avoid
 running quicksort in a worst case (pre-sorted data) scenario.
 

Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Oct 09 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/9/10 9:05 CDT, "Jérôme M. Berger" wrote:
 Seth Hoenig wrote:
 2010/10/8 "Jérôme M. Berger"<jeberger free.fr>

 Steven Schveighoffer wrote:
 On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke<rainerd eldwood.com>
 wrote:
 I can't say I've ever cared about the big-O complexity of an operation.


quicksort (which is O(n**2)) and not heap sort or merge sort (which are O(n*log(n)))? Jerome

Because on average quicksort is faster than heap sort, and uses much less space than merge sort. Also, trivial guards can be put in place to avoid running quicksort in a worst case (pre-sorted data) scenario.

Jerome

In fact, guards can be put to ensure that the _expected_ (not average, not best-case) complexity is O(n log n). This makes the risk of hitting a worst-case scenario negligible in a principled manner.

http://en.wikipedia.org/wiki/Quicksort#Randomized_quicksort_expected_complexity

http://docs.google.com/viewer?a=v&q=cache:MdBVR26N5UsJ:www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/lecture_notes/lect0123.pdf+randomized+quicksort&hl=en&gl=us&pid=bl&srcid=ADGEESi3GTSxfHWkeb_f14H0pkbigduS94qJVc9XLQ7aPa6lPUJ5JZbggI0izFe3ogiVOJCYcVkGtdumaS9hBvrGw0-TA_yZQj2qd1-AEudKyEWEGXnO4sTwqCZL95OpFkdFHDF2WXFV&sig=AHIEtbT1R0q5RIR4rob17QUKlYVl90vXyQ

Currently std.algorithm.getPivot picks the middle of the range as the pivot, but I made it modular such that it can be improved in a number of ways (i.e. randomized, median-of-3, median-of-5, etc).

Andrei
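The randomized pivot Andrei mentions is a one-line change to a quicksort. A sketch (Python, illustrative only; not std.algorithm's implementation):

```python
# Picking the pivot uniformly at random gives expected O(n log n)
# comparisons on *every* input, so no fixed adversarial input exists.
import random

def quicksort(xs):
    if len(xs) <= 1:
        return list(xs)
    pivot = random.choice(xs)            # randomization defeats crafted inputs
    less    = [x for x in xs if x < pivot]
    equal   = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5]))  # -> [1, 1, 3, 4, 5]
```

An attacker would need to know the generator and seed to force the quadratic case.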
Oct 09 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 Currently std.algorithm.getPivot picks the middle of the range as the 
 pivot, but I made it modular such that it can be improved in a number of 
 ways (i.e. randomized, median-of-3, median-of-5, etc).

Isn't median of 3 a better default? Bye, bearophile
Oct 09 2010
parent reply Peter Alexander <peter.alexander.au gmail.com> writes:
On 9/10/10 4:53 PM, bearophile wrote:
 Andrei:

 Currently std.algorithm.getPivot picks the middle of the range as the
 pivot, but I made it modular such that it can be improved in a number of
 ways (i.e. randomized, median-of-3, median-of-5, etc).

Isn't median of 3 a better default? Bye, bearophile

There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot.

Middle - Fastest, not very good
Median-of-3 - Slower, but slightly better
Median-of-5 - Slower still, but even better
Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.
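For comparison, a median-of-3 selector is only a few comparisons. A standalone sketch (getPivot's actual signature in std.algorithm may differ):

```python
# Examine the first, middle and last elements of xs[lo..hi] and return
# the index holding their median.

def median_of_3(xs, lo, hi):
    mid = (lo + hi) // 2
    a, b, c = xs[lo], xs[mid], xs[hi]
    if a <= b <= c or c <= b <= a:
        return mid
    if b <= a <= c or c <= a <= b:
        return lo
    return hi

print(median_of_3([9, 1, 5, 7, 2], 0, 4))  # median of 9, 5, 2 is 5, at index 2
```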
Oct 09 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/9/10 11:23 CDT, Peter Alexander wrote:
 On 9/10/10 4:53 PM, bearophile wrote:
 Andrei:

 Currently std.algorithm.getPivot picks the middle of the range as the
 pivot, but I made it modular such that it can be improved in a number of
 ways (i.e. randomized, median-of-3, median-of-5, etc).

Isn't median of 3 a better default? Bye, bearophile

There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot. Middle - Fastest, not very good Median-of-3 - Slower, but slightly better Median-of-5 - Slower still, but even better Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.

Also: in-place sorting for the median cases. Andrei
Oct 09 2010
parent reply Peter Alexander <peter.alexander.au gmail.com> writes:
On 9/10/10 5:25 PM, Andrei Alexandrescu wrote:
 On 10/9/10 11:23 CDT, Peter Alexander wrote:
 On 9/10/10 4:53 PM, bearophile wrote:
 Andrei:

 Currently std.algorithm.getPivot picks the middle of the range as the
 pivot, but I made it modular such that it can be improved in a
 number of
 ways (i.e. randomized, median-of-3, median-of-5, etc).

Isn't median of 3 a better default? Bye, bearophile

There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot. Middle - Fastest, not very good Median-of-3 - Slower, but slightly better Median-of-5 - Slower still, but even better Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.

Also: in-place sorting for the median cases. Andrei

Could you elaborate? I don't see how you lose in-place sorting with the random and middle selection cases.
Oct 09 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/9/10 12:12 CDT, Peter Alexander wrote:
 On 9/10/10 5:25 PM, Andrei Alexandrescu wrote:
 On 10/9/10 11:23 CDT, Peter Alexander wrote:
 On 9/10/10 4:53 PM, bearophile wrote:
 Andrei:

 Currently std.algorithm.getPivot picks the middle of the range as the
 pivot, but I made it modular such that it can be improved in a
 number of
 ways (i.e. randomized, median-of-3, median-of-5, etc).

Isn't median of 3 a better default? Bye, bearophile

There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot. Middle - Fastest, not very good Median-of-3 - Slower, but slightly better Median-of-5 - Slower still, but even better Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.

Also: in-place sorting for the median cases. Andrei

Could you elaborate? I don't see how you lose in-place sorting with the random and middle selection cases.

You just take median-of-some and also sort those elements right away before returning the pivot. It's a minor improvement. Andrei
Oct 09 2010
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
Peter Alexander Wrote:

 On 9/10/10 4:53 PM, bearophile wrote:
 Andrei:

 Currently std.algorithm.getPivot picks the middle of the range as the
 pivot, but I made it modular such that it can be improved in a number of
 ways (i.e. randomized, median-of-3, median-of-5, etc).

Isn't median of 3 a better default? Bye, bearophile

There is no "best". They are all trade-offs between the time taken to select the pivot and the suitability of the pivot. Middle - Fastest, not very good Median-of-3 - Slower, but slightly better Median-of-5 - Slower still, but even better Randomized - Speed depends on the random number algorithm. Suitability is random, so could be good, could be bad, but on average it's good enough. Has the added quality that it's practically impossible to devise worst-case input for it (you'd need to know the generator and seed), whereas for the other 3, a malicious user could provide data that gives you O(n^2) complexity.

Introsort (a quicksort variant) degrades to heapsort for sorts that are going quadratic (it tracks the recursion depth and transitions when the depth is more than a desired threshold for the size of the range). The sort routine I wrote for Tango uses this approach plus median-of-3, the insertion sort fallback, and the quicksort variant that separately tracks equal values. All told, it's as fast or faster than everything else I've tested, even for contrived degenerate cases. Perhaps I'll convert it to use ranges and see how it does.
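The depth-limit trick Sean describes can be sketched like this (illustrative only, not Tango's actual sort):

```python
# Quicksort that falls back to heapsort once recursion depth exceeds
# ~2*log2(n), bounding the worst case at O(n log n).
import heapq
from math import log2

def introsort(xs):
    def heapsorted(ys):
        heapq.heapify(ys)
        return [heapq.heappop(ys) for _ in range(len(ys))]

    def sort(ys, depth):
        if len(ys) <= 1:
            return ys
        if depth == 0:                       # going quadratic: bail out
            return heapsorted(ys)
        pivot = ys[len(ys) // 2]
        less    = [y for y in ys if y < pivot]
        equal   = [y for y in ys if y == pivot]
        greater = [y for y in ys if y > pivot]
        return sort(less, depth - 1) + equal + sort(greater, depth - 1)

    return sort(list(xs), 2 * int(log2(len(xs))) if xs else 0)

print(introsort([5, 3, 8, 1, 5]))  # -> [1, 3, 5, 5, 8]
```

Because the fallback only fires on the rare pathological branches, the average-case speed of quicksort is kept.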
Oct 09 2010
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
Andrei Alexandrescu Wrote:
 
 In fact, guards can be put to ensure that the _expected_ (not average, 
 not best-case) complexity is O(n log n). This makes the risk of hitting 
 a worst-case scenario negligible in a principled manner.
 
 http://en.wikipedia.org/wiki/Quicksort#Randomized_quicksort_expected_complexity
 
 http://docs.google.com/viewer?a=v&q=cache:MdBVR26N5UsJ:www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/lecture_notes/lect0123.pdf+randomized+quicksort&hl=en&gl=us&pid=bl&srcid=ADGEESi3GTSxfHWkeb_f14H0pkbigduS94qJVc9XLQ7aPa6lPUJ5JZbggI0izFe3ogiVOJCYcVkGtdumaS9hBvrGw0-TA_yZQj2qd1-AEudKyEWEGXnO4sTwqCZL95OpFkdFHDF2WXFV&sig=AHIEtbT1R0q5RIR4rob17QUKlYVl90vXyQ
 
 Currently std.algorithm.getPivot picks the middle of the range as the 
 pivot, but I made it modular such that it can be improved in a number of 
 ways (i.e. randomized, median-of-3, median-of-5, etc).

It would be nice if it used insertion sort for small ranges as well. There's also a quicksort variant that partitions equal-to elements separately from less-than and greater-than values, which is faster for ranges with a lot of equal elements (and no slower for ranges without).
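The equal-element variant Sean refers to is commonly done with a three-way ("Dutch national flag") partition; a sketch:

```python
# One in-place pass splits xs[lo:hi] around the pivot xs[lo] into
# < pivot, == pivot, > pivot regions. With many duplicates, the ==
# block never needs to be recursed into again.

def partition3(xs, lo, hi):
    """Return (i, j) with xs[lo:i] < pivot, xs[i:j] == pivot, xs[j:hi] > pivot."""
    pivot = xs[lo]
    i, j, k = lo, lo, hi
    while j < k:
        if xs[j] < pivot:
            xs[i], xs[j] = xs[j], xs[i]
            i += 1
            j += 1
        elif xs[j] == pivot:
            j += 1
        else:
            k -= 1
            xs[j], xs[k] = xs[k], xs[j]
    return i, j

xs = [5, 2, 5, 9, 1, 5]
i, j = partition3(xs, 0, len(xs))
print(xs[:i], xs[i:j], xs[j:])
```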
Oct 09 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/9/10 11:52 CDT, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:
 In fact, guards can be put to ensure that the _expected_ (not average,
 not best-case) complexity is O(n log n). This makes the risk of hitting
 a worst-case scenario negligible in a principled manner.

 http://en.wikipedia.org/wiki/Quicksort#Randomized_quicksort_expected_complexity

 http://docs.google.com/viewer?a=v&q=cache:MdBVR26N5UsJ:www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/lecture_notes/lect0123.pdf+randomized+quicksort&hl=en&gl=us&pid=bl&srcid=ADGEESi3GTSxfHWkeb_f14H0pkbigduS94qJVc9XLQ7aPa6lPUJ5JZbggI0izFe3ogiVOJCYcVkGtdumaS9hBvrGw0-TA_yZQj2qd1-AEudKyEWEGXnO4sTwqCZL95OpFkdFHDF2WXFV&sig=AHIEtbT1R0q5RIR4rob17QUKlYVl90vXyQ

 Currently std.algorithm.getPivot picks the middle of the range as the
 pivot, but I made it modular such that it can be improved in a number of
 ways (i.e. randomized, median-of-3, median-of-5, etc).

It would be nice if it used insertion sort for small ranges as well. There's also a quicksort variant that partitions equal-to elements separately from less-than and greater-than values, which is faster for ranges with a lot of equal elements (and no slower for ranges without).

I guess that's "fit pivot" :o). Fat pivot does cost some more. Andrei
Oct 09 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 On 10/7/2010 13:57, Andrei Alexandrescu wrote:
 On 10/7/10 14:40 CDT, bearophile wrote:
 Another solution is just to accept O(n) as the worst complexity for
 the "in" operator. I don't understand what's the problem in this.

That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.

Why? I can't say I've ever cared about the big-O complexity of an operation.

Then you don't understand how important it is. Here is an example of how caring about the big O complexity cut the runtime of dmd to about 1/3:

http://d.puremagic.com/issues/show_bug.cgi?id=4721

big O complexity is very important when you are writing libraries. Not so much when you are writing applications -- if you can live with it in your application, then fine. But Phobos should not have these problems for people who *do* care.

What I'd suggest is to write your own function that uses in when possible and find when not possible. Then use that in your code.

-Steve
Oct 07 2010
prev sibling next sibling parent Torarin <torarind gmail.com> writes:
2010/10/7 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 In the end I figured there was only _one_
 quadratic operation - appending to a vector<size_t> that held document
 lengths. That wasn't even the bulk of the data and it was the last thing I
 looked at! Yet it made the run time impossible to endure.

From sgi's stl vector:

void push_back(const _Tp& __x) {
  if (_M_finish != _M_end_of_storage) {
    construct(_M_finish, __x);
    ++_M_finish;
  }
  else
    _M_insert_aux(end(), __x);
}

How is this quadratic, let alone linear?
Oct 08 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 07 Oct 2010 19:07:55 -0400, Rainer Deyke <rainerd eldwood.com>  
wrote:

 On 10/7/2010 14:33, Steven Schveighoffer wrote:
 On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:
 I can't say I've ever cared about the big-O complexity of an operation.

Then you don't understand how important it is.

Let me rephrase that. I care about performance. Big-O complexity can obviously have a significant effect on performance, so I so do care about it, but only to the extend that it affects performance. Low big-O complexity is a means to an end, not a goal in and of itself. If 'n' is low enough, then a O(2**n) algorithm may well be faster than an O(1) algorithm.

You'd be surprised at how important it is. I remember competing on TopCoder when they just barely added C++ as a possible language. I remember wondering "how are all the Java guys going to even compete with C++?" But the big-O complexity was always way way more important than the performance differences of native vs. JVM'd code.

The thing about big-O complexity is that it gives you a good idea of how well your library will perform under defined circumstances. And libraries must be completely aware and rigid about it, or else you have situations where things are nice and easy syntax-wise but perform horribly when actually used. What you end up with when libraries don't care about it are "mysterious" slowdowns, or cases where people just start blaming the language for being so slow.

Imagine if a database backend developer said "you know, who cares about big-O complexity, I think linear performance is good enough, most people have small databases anyways", who would ever use that backend? This is akin to how phobos needs to strive for the best performance.
 I also believe that, in the absence of a sophisticated system that
 actually verifies performance guarantees, the language and standard
 library should trust the programmer to know what he is doing.  The
 standard library should only provide transitive performance guarantees,
 e.g. this algorithm calls function 'f' 'n' times, so the algorithm's
 performance is O(n * complexity(f)).  If 'f' runs in constant time, the
 algorithm runs in linear time.  If 'f' runs in exponential time, the
 algorithm still runs.

 big O complexity is very important when you are writing libraries.  Not
 so much when you are writing applications -- if you can live with it in
 your application, then fine.  But Phobos should not have these problems
 for people who *do* care.

 What I'd suggest is to write your own function that uses in when
 possible and find when not possible.  Then use that in your code.

The issue is that algorithms may use 'in' internally, so I may have to rewrite large parts of Phobos. (And the issue isn't about 'in' specifically, but complexity guarantees in general.)

You have two options, define opIn how you want if you don't care about complexity guarantees (not recommended), or define a wrapper function that uses the best available option, which your code can use. Despite phobos defining opIn to require lg(n) or better complexity, it does not restrict you from defining opIn how you want (even on arrays if you wish). I personally find the tools phobos provides completely adequate for the tasks you would need to do. -Steve
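Steven's wrapper idea, sketched in Python terms (is_in is a hypothetical name; the D version would dispatch between the 'in' operator and a std.algorithm-style find):

```python
# A membership helper that uses the container's fast lookup when one
# exists and an explicit linear scan otherwise, so the O(n) case is a
# visible, deliberate choice at the definition site.
from collections.abc import Mapping, Set

def is_in(item, coll):
    if isinstance(coll, (Set, Mapping)):
        return item in coll                  # container's own fast lookup
    return any(item == x for x in coll)      # deliberate O(n) fallback

print(is_in(3, {1, 2, 3}), is_in(3, [1, 2, 3]), is_in(5, [1, 2, 3]))  # True True False
```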
Oct 08 2010
prev sibling parent Seth Hoenig <seth.a.hoenig gmail.com> writes:

2010/10/8 "Jérôme M. Berger" <jeberger free.fr>

 Steven Schveighoffer wrote:
 On Thu, 07 Oct 2010 16:23:47 -0400, Rainer Deyke <rainerd eldwood.com>
 wrote:

 On 10/7/2010 13:57, Andrei Alexandrescu wrote:
 On 10/7/10 14:40 CDT, bearophile wrote:
 Another solution is just to accept O(n) as the worst complexity for
 the "in" operator. I don't understand what's the problem in this.

That means we'd have to define another operation, i.e. "quickIn" that has O(log n) bound.

Why? I can't say I've ever cared about the big-O complexity of an operation.
 Then you don't understand how important it is.

If big O complexity is so important, then why does everyone use quicksort (which is O(n**2)) and not heap sort or merge sort (which are O(n*log(n)))? Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr

Because on average quicksort is faster than heap sort, and uses much less space than merge sort. Also, trivial guards can be put in place to avoid running quicksort in a worst case (pre-sorted data) scenario.
Oct 08 2010
prev sibling next sibling parent reply Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:
bearophile napisał:

 Another solution is to support the "in" operator for dynamic arrays too,
 and define a new attribute, like  complexity(), plus an Enum that allows
 to specify the worst case complexity. So associative arrays are annotated
 with  complexity(O.linear), while the function that searches for
 items/substrings in arrays/strings is  complexity(O.constant). At
 compile-time the generic code is then able to query the computational
 complexity of the "in" operator implementation. The problem is that the
 compiler today can't enforce functions annotated with
  complexity(O.constant) to actually not perform a linear search (but I
 don't think it's a problem, today if the Range protocol asks an operation
 to not be worse than O(n ln n) the compiler doesn't enforce it).

Good idea. It would make a nice use case for user-defined attributes in D3. Making the language aware of complexity specifically doesn't buy much, all you need is: __traits(getAttribute, opIn, complexity).bigOh == O.constant -- Tomek
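What such an inspectable attribute could look like can be mimicked today with a Python decorator (a sketch; the O enum and complexity() are made-up names mirroring bearophile's proposal, nothing like this exists in Phobos or Python):

```python
# Attach a worst-case complexity annotation to a function. Nothing is
# enforced; generic code can only inspect the claim, much like the
# proposed __traits(getAttribute, opIn, complexity) query would.
import bisect
import enum

class O(enum.Enum):
    constant = "O(1)"
    logarithmic = "O(log n)"
    linear = "O(n)"

def complexity(big_oh):
    def deco(fn):
        fn.big_oh = big_oh
        return fn
    return deco

@complexity(O.logarithmic)
def contains_sorted(xs, x):
    """Membership in a sorted list via binary search."""
    i = bisect.bisect_left(xs, x)
    return i < len(xs) and xs[i] == x

@complexity(O.linear)
def contains_scan(xs, x):
    """Membership by linear scan."""
    return any(x == y for y in xs)

print(contains_sorted.big_oh, contains_scan.big_oh)
```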
Oct 07 2010
next sibling parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Tomek Sowiński <just ask.me> wrote:

 __traits(getAttribute, opIn, complexity).bigOh == O.constant

How does this test for things like N log M? __traits(getAttribute, opIn, complexity).bigOh == tuple( O.linear, O.logarithmic ) ? -- Simen
Oct 07 2010
parent reply Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:
Simen kjaeraas napisał:

 Tomek Sowiński <just ask.me> wrote:
 
 __traits(getAttribute, opIn,  complexity).bigOh == O.constant

How does this test for things like N log M? __traits(getAttribute, opIn, complexity).bigOh == tuple( O.linear, O.logarithmic )

Or: __traits(getAttribute, opIn, complexity).bigOh = O.linear * O.log bigOh could be e.g. a struct with an overloaded multiplier. But you brought up something interesting -- how to bind N, M with different properties of function arguments; big oh expressions can get quite complex, e.g. void completeSort(alias less = "a < b", SwapStrategy ss = SwapStrategy.unstable, Range1, Range2)(SortedRange!(Range1,less) lhs, Range2 rhs); "[...] Performs Ο(lhs.length + rhs.length * log(rhs.length)) (best case) to Ο((lhs.length + rhs.length) * log(lhs.length + rhs.length)) (worst-case) evaluations of swap." Even if the attribute properties could see the arguments, how to deal with things like lhs.length + rhs.length? It has to be inspectable at compile-time. One idea is to store the expression's abstract syntax tree (we want AST macros in D3 anyway)... But I got a feeling we're heading for an overkill :) -- Tomek
Oct 07 2010
next sibling parent Tomek Sowiński <just ask.me> writes:
Tomek Sowiński napisał:

 Simen kjaeraas napisał:
 
 Tomek Sowiński <just ask.me> wrote:
 
 __traits(getAttribute, opIn, @complexity).bigOh == O.constant

How does this test for things like N log M?

__traits(getAttribute, opIn, @complexity).bigOh == tuple( O.linear, O.logarithmic )

Or:

__traits(getAttribute, opIn, @complexity).bigOh = O.linear * O.log

bigOh could be e.g. a struct with an overloaded multiplier. But you brought up something interesting -- how to bind N, M with different properties of function arguments; big-oh expressions can get quite complex, e.g.

void completeSort(alias less = "a < b", SwapStrategy ss = SwapStrategy.unstable,
    Range1, Range2)(SortedRange!(Range1, less) lhs, Range2 rhs);

"[...] Performs Ο(lhs.length + rhs.length * log(rhs.length)) (best case) to Ο((lhs.length + rhs.length) * log(lhs.length + rhs.length)) (worst case) evaluations of swap."

Even if the attribute properties could see the arguments, how to deal with things like lhs.length + rhs.length? It has to be inspectable at compile-time. One idea is to store the expression's abstract syntax tree (we want AST macros in D3 anyway)... But I got a feeling we're heading for an overkill :)

If O.linear were a template, it could take an alias...

alias completeSort f;

__traits(getAttribute, f, @complexity).bigOh ==
    (O.linear!(f.args.lhs.length) + O.linear!(f.args.rhs.length))
        * O.log!(O.linear!(f.args.lhs.length) + O.linear!(f.args.rhs.length))

:)

-- Tomek
Oct 07 2010
prev sibling parent reply Tomek Sowiński <just ask.me> writes:
Simen kjaeraas napisał:

 Tomek Sowiński <just ask.me> wrote:
 Even if the attribute properties could see the arguments, how to deal
 with things like
 lhs.length + rhs.length? It has to be inspectable at compile-time. One
 idea is to store the
 expression's abstract syntax tree (we want AST macros in D3 anyway)...
 But I got a feeling
 we're heading for an overkill :)

Just as soon as we solve the halting problem, eh?

What's the halting problem? -- Tomek
Oct 07 2010
next sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
On 08/10/2010 00:39, Tomek Sowiński wrote:
<snip>
 What's the halting problem?

Basically, it's the challenge of determining algorithmically whether an arbitrary algorithm given arbitrary input will eventually halt or carry on running forever.

The point is that the halting problem is known to be unsolvable. The standard proof of this is as follows. Suppose the halt analyser algorithm we seek exists. Call it WillHalt(Algorithm, Input). Then we can consider WillHalt(Algorithm, Algorithm). Then we can define a new algorithm, LoopIfHaltsOnItself(Algorithm), defined as

if WillHalt(Algorithm, Algorithm) then
    loop forever
else
    return

Now try to analyse the outcome of LoopIfHaltsOnItself(LoopIfHaltsOnItself).

Personally, I think it's a shame that the halting problem can't be solved. If it could, we could use it to solve many mathematical problems that have, as it happens, remained unsolved for centuries.

Stewart.
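[Editor's note: the diagonal argument above can be made runnable. In this Python sketch, "looping forever" is modelled by returning the string "loops" instead of actually looping, so we can observe that whatever answer a purported analyser gives about Q run on itself, Q does the opposite. All names are invented.]

```python
# Stewart's construction: build the troublemaker Q from a claimed
# halt-analyser, then show the analyser mispredicts Q run on itself.
def make_q(will_halt):
    """Q consults will_halt about its input run on itself, then defies it."""
    def q(program):
        if will_halt(program, program):
            return "loops"   # predicted to halt, so Q loops (modelled)
        return "halts"       # predicted to loop, so Q halts immediately
    return q

def defeated(will_halt):
    """True if will_halt gets Q(Q) wrong."""
    q = make_q(will_halt)
    predicted_halts = will_halt(q, q)
    actually_halts = q(q) == "halts"
    return predicted_halts != actually_halts

# Any total predictor is wrong about Q(Q), whichever answer it gives:
assert defeated(lambda prog, inp: True)
assert defeated(lambda prog, inp: False)
```

The two assertions cover both possible verdicts; since every total predictor must return one of them for Q(Q), no WillHalt can exist.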
Oct 09 2010
parent Peter Alexander <peter.alexander.au gmail.com> writes:
On 9/10/10 3:11 PM, Stewart Gordon wrote:
 Personally, I think it's a shame that the halting problem can't be
 solved. If it could, we could use it to solve many mathematical problems
 that have as it happens remained unsolved for centuries.

But solving those problems would mean nothing in that hypothetical situation, because, for the halting problem to be solvable, it would require that P <=> ~P, so any "theorem" would be meaningless. Besides, I don't care to think about universes where P <=> ~P :-)
Oct 09 2010
prev sibling parent Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:
On 08/10/2010 00:42, Jonathan M Davis wrote:
 On Thursday, October 07, 2010 16:39:15 Tomek Sowiński wrote:

 http://en.wikipedia.org/wiki/Halting_problem

 It's a classic problem in computer science. Essentially what it comes down to is that you can't determine when - or even if - a program will halt until it actually has. It's why stuff like file transfer dialogs can never be totally accurate. At best, you can estimate how long a file transfer will take based on the current progress, but you can't _know_ when it will complete.

O_o -- Bruno Medeiros - Software Engineer
Oct 29 2010
prev sibling next sibling parent Tomek Sowiński <just ask.me> writes:
Daniel Gibson napisał:

 2010/10/8 Tomek Sowiński <just ask.me>:
 bearophile napisał:

 Another solution is to support the "in" operator for dynamic arrays too,
 and define a new attribute, like @complexity(), plus an Enum that allows
 to specify the worst case complexity. So associative arrays are
 annotated with @complexity(O.linear), while the function that searches
 for items/substrings in arrays/strings is @complexity(O.constant). At
 compile-time the generic code is then able to query the computational
 complexity of the "in" operator implementation. The problem is that the
 compiler today can't enforce functions annotated with
 @complexity(O.constant) to actually not perform a linear search (but I
 don't think it's a problem, today if the Range protocol asks an
 operation to not be worse than O(n ln n) the compiler doesn't enforce
 it).

Good idea. It would make a nice use case for user-defined attributes in D3. Making the language aware of @complexity specifically doesn't buy much; all you need is: __traits(getAttribute, opIn, @complexity).bigOh == O.constant -- Tomek

Doesn't the language have to be aware of this if it's supposed to work with ordinary arrays?

I don't think so; there can always be uniform syntax wrappers (there's already a bunch of them in std.array):

@complexity(O.constant) size_t length(T)(T[] arr) { return arr.length; }

or special cases similar to hasLength!string.

-- Tomek
Oct 07 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Tomek Sowiński <just ask.me> wrote:

 Even if the attribute properties could see the arguments, how to deal
 with things like lhs.length + rhs.length? It has to be inspectable at
 compile-time. One idea is to store the expression's abstract syntax
 tree (we want AST macros in D3 anyway)... But I got a feeling we're
 heading for an overkill :)

Just as soon as we solve the halting problem, eh?

-- Simen
Oct 07 2010
prev sibling next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Tomek Sowiński <just ask.me> wrote:

 Simen kjaeraas napisał:

 Tomek Sowiński <just ask.me> wrote:
 Even if the attribute properties could see the arguments, how to deal
 with things like lhs.length + rhs.length? It has to be inspectable at
 compile-time. One idea is to store the expression's abstract syntax
 tree (we want AST macros in D3 anyway)... But I got a feeling we're
 heading for an overkill :)

 Just as soon as we solve the halting problem, eh?

 What's the halting problem?

http://en.wikipedia.org/wiki/Halting_problem

Basically, inspecting the AST in search of the complexity might have an infinite (or at least, arbitrarily high) complexity itself. It is likely possible in some situations, but in the general case, not so much.

Also, consider the while loop. It may have an arbitrarily complex termination condition, making it hard or impossible to find the complexity. Example, with added complexity by omitting the source of foo:

extern bool foo( );

void bar( bool delegate( ) dg ) {
    while ( 1 ) {
        if ( dg( ) ) {
            break;
        }
    }
}

bar( &foo );

-- Simen
Oct 07 2010
prev sibling parent Seth Hoenig <seth.a.hoenig gmail.com> writes:
On Sat, Oct 9, 2010 at 9:11 AM, Stewart Gordon <smjg_1998 yahoo.com> wrote:

 On 08/10/2010 00:39, Tomek Sowiński wrote:
 <snip>

 What's the halting problem?

 Basically, it's the challenge of determining algorithmically whether an
 arbitrary algorithm given arbitrary input will eventually halt or carry on
 running forever.

 The point is that the halting problem is known to be unsolvable. The
 standard proof of this is as follows. Suppose the halt analyser algorithm
 we seek exists. Call it WillHalt(Algorithm, Input). Then we can consider
 WillHalt(Algorithm, Algorithm).

 Then we can define a new algorithm, LoopIfHaltsOnItself(Algorithm), defined
 as

 if WillHalt(Algorithm, Algorithm) then
    loop forever
 else
    return

 Now try to analyse the outcome of LoopIfHaltsOnItself(LoopIfHaltsOnItself).

 Personally, I think it's a shame that the halting problem can't be solved.
 If it could, we could use it to solve many mathematical problems that have
 as it happens remained unsolved for centuries.

 Stewart.

Or more poetically,

No general procedure for bug checks succeeds.
Now, I won’t just assert that, I’ll show where it leads:
I will prove that although you might work till you drop,
you cannot tell if computation will stop.

For imagine we have a procedure called P
that for specified input permits you to see
whether specified source code, with all of its faults,
defines a routine that eventually halts.

You feed in your program, with suitable data,
and P gets to work, and a little while later
(in finite compute time) correctly infers
whether infinite looping behavior occurs.

If there will be no looping, then P prints out `Good.’
That means work on this input will halt, as it should.
But if it detects an unstoppable loop,
then P reports `Bad!’ — which means you’re in the soup.

Well, the truth is that P cannot possibly be,
because if you wrote it and gave it to me,
I could use it to set up a logical bind
that would shatter your reason and scramble your mind.

Here’s the trick that I’ll use — and it’s simple to do.
I’ll define a procedure, which I will call Q,
that will use P’s predictions of halting success
to stir up a terrible logical mess.

For a specified program, say A, one supplies,
the first step of this program called Q I devise
is to find out from P what’s the right thing to say
of the looping behavior of A run on A.

If P’s answer is `Bad!’, Q will suddenly stop.
But otherwise, Q will go back to the top,
and start off again, looping endlessly back,
till the universe dies and turns frozen and black.

And this program called Q wouldn’t stay on the shelf;
I would ask it to forecast its run on itself.
When it reads its own source code, just what will it do?
What’s the looping behavior of Q run on Q?

If P warns of infinite loops, Q will quit;
yet P is supposed to speak truly of it!
And if Q’s going to quit, then P should say `Good’
— which makes Q start to loop! (P denied that it would.)

No matter how P might perform, Q will scoop it:
Q uses P’s output to make P look stupid.
Whatever P says, it cannot predict Q:
P is right when it’s wrong, and is false when it’s true!

I’ve created a paradox, neat as can be —
and simply by using your putative P.
When you posited P you stepped into a snare;
Your assumption has led you right into my lair.

So where can this argument possibly go?
I don’t have to tell you; I’m sure you must know.
By reductio, there cannot possibly be
a procedure that acts like the mythical P.

You can never find general mechanical means
for predicting the acts of computing machines.
It’s something that cannot be done. So we users
must find our own bugs. Our computers are losers!

- Geoffrey K. Pullum
Oct 09 2010
prev sibling next sibling parent Daniel Gibson <metalcaedes gmail.com> writes:
2010/10/8 Tomek Sowiński <just ask.me>:
 bearophile napisał:

 Another solution is to support the "in" operator for dynamic arrays too,
 and define a new attribute, like @complexity(), plus an Enum that allows
 to specify the worst case complexity. So associative arrays are annotated
 with @complexity(O.linear), while the function that searches for
 items/substrings in arrays/strings is @complexity(O.constant). At
 compile-time the generic code is then able to query the computational
 complexity of the "in" operator implementation. The problem is that the
 compiler today can't enforce functions annotated with
 @complexity(O.constant) to actually not perform a linear search (but I
 don't think it's a problem, today if the Range protocol asks an operation
 to not be worse than O(n ln n) the compiler doesn't enforce it).

 Good idea. It would make a nice use case for user-defined attributes in D3.
 Making the language aware of @complexity specifically doesn't buy much;
 all you need is:

 __traits(getAttribute, opIn, @complexity).bigOh == O.constant

 --
 Tomek

Doesn't the language have to be aware of this if it's supposed to work with ordinary arrays?
Oct 07 2010
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, October 07, 2010 16:39:15 Tomek Sowiński wrote:
 Simen kjaeraas napisał:
 Tomek Sowiński <just ask.me> wrote:
 Even if the attribute properties could see the arguments, how to deal
 with things like
 lhs.length + rhs.length? It has to be inspectable at compile-time. One
 idea is to store the
 expression's abstract syntax tree (we want AST macros in D3 anyway)...
 But I got a feeling
 we're heading for an overkill :)

Just as soon as we solve the halting problem, eh?

What's the halting problem?

http://en.wikipedia.org/wiki/Halting_problem

It's a classic problem in computer science. Essentially what it comes down to is that you can't determine when - or even if - a program will halt until it actually has. It's why stuff like file transfer dialogs can never be totally accurate. At best, you can estimate how long a file transfer will take based on the current progress, but you can't _know_ when it will complete. It's even worse for algorithms where you can't even estimate how much work there is left. And of course, you don't even necessarily know that the program _will_ halt. It could be an infinite loop for all you know.

- Jonathan M Davis
Oct 07 2010
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday 08 October 2010 03:06:01 Stanislav Blinov wrote:
 Yet still, generality ends at some point. You can't devise every
 possible algorithm for any possible types and have it have set-in-stone
 complexity independently of types.
 
 Take std.range.popFrontN(). It's generic, and it's used in other
 algorithms. Yet it has O(1) complexity for ranges that support slicing,
 and O(n) for those that do not. But you don't take into account
 complexity of slicing operation, or complexity of stepping through the
 range.
 
 Or std.algorithm.find(). It's basically O(n), but then again, when using
 it, one should also consider complexity of used predicate.
 Just the same happened in the case Andrei described: the algorithm was
 O(n) judging from the description (performing n insertions into
 container). But the container itself blew it up into quadratic time
 because of its own insertion algorithm.
 
 What I mean is you'll always have algorithms that will perform
 differently for different containers, and you'll always have to choose
 containers that best fit your needs. Generic code is not only about
 efficiency: you'd at least expect it to work for different types.
 Replacing vector with list in Andrei's case would probably solve the
 problem at the cost of losing random access (together with contiguous
 storage). Which means it'd also require changing code if random access
 was in use somewhere. Having a generic random access function
 at(Container,index) (offering complexity that varies with container
 used: e.g. O(1) for arrays, O(n) for lists) would save you maintenance.

All true. However, the point is that operations need to have a known complexity. If in is known to be O(1), algorithms can safely use it, knowing that it's O(1). Any container that can't do it in O(1) doesn't implement in. You could, on the other hand, have in be O(n) (which doesn't stop containers from implementing it as O(1), since that's better than O(n)), and then any algorithm that uses it has to assume that it could be O(n) and therefore may need to avoid it.

The real question is what complexity we want to define it as. At the moment, the only containers to use it are the built-in associative arrays, and for them it's O(1). Since find() is already going to be O(n), it seems to me that in should be better than O(n) - be it O(1) or O(log n) - and Andrei appears to agree, but others don't, hence this long discussion...

- Jonathan M Davis
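[Editor's note: a sketch of this contract idea, in Python. A container advertises the worst-case cost of its membership test, and generic code consults that advertisement to pick a strategy instead of assuming every `in` is cheap. The attribute name `IN_COMPLEXITY` and the helpers are invented for illustration.]

```python
# Generic code dispatching on a container's advertised `in` complexity.
def membership_cost(container):
    """Worst-case cost of `x in container`; assume O(n) unless advertised."""
    return getattr(type(container), "IN_COMPLEXITY", "O(n)")

class HashSetLike(set):
    IN_COMPLEXITY = "O(1)"   # hash lookup: constant time on average

def intersect(a, b):
    # Scan the side whose membership test is expensive, and probe the
    # side that advertises a constant-time `in`, mirroring how an
    # algorithm would avoid an O(n) `in` when a cheap one is available.
    fast, scan = (a, b) if membership_cost(a) == "O(1)" else (b, a)
    return [x for x in scan if x in fast]
```

With a plain list on one side, `intersect` probes the hash set once per scanned element rather than doing a linear search per element; with two plain lists it degrades gracefully, which is exactly the trade-off being debated for a general-purpose `in`.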
Oct 08 2010