digitalmars.D - Do sorted ranges have any special properties?

Andrei Alexandrescu (26/26) Jul 25 2010 I'm trying to find justifications for keeping assumeSorted and friends

Tomek =?UTF-8?B?U293acWEc2tp?= (12/44) Jul 25 2010 Sort of trivial, but I'd like to see the predicate with which it's sorte...

Steven Schveighoffer (19/57) Jul 26 2010 I think the "range" returned by assumeSorted has a predicate alias in it...

Andrei Alexandrescu (15/81) Jul 26 2010 We just need to do the STL trick of re-exposing the template argument

Andrei Alexandrescu (9/53) Jul 26 2010 Good point, though that reintroduces the question of comparing find's

Michel Fortin (9/17) Jul 27 2010 That would be like implementing the uniform calling syntax for certain
Philippe Sigaud (6/12) Jul 27 2010 I really don't see how you would do that in a generic way... Even taking

Andrei Alexandrescu (4/23) Jul 27 2010 Yah; it would help somehow if there was a way to get the body of a

Tomek =?UTF-8?B?U293acWEc2tp?= (3/10) Jul 27 2010 Isn't dereferencing function pointers illegal?

Andrei Alexandrescu (3/13) Jul 27 2010 We need that too :o).

Tomek =?UTF-8?B?U293acWEc2tp?= (7/10) Jul 27 2010 Don't get it:) Need them to be dereferencable or not?

Tomek =?UTF-8?B?U293acWEc2tp?= (4/5) Jul 25 2010 Generalize to quantile.

Andrei Alexandrescu (3/9) Jul 26 2010 Makes sense.

Philippe Sigaud (15/26) Jul 27 2010 Ah, not as methods, but free functions taking sorted ranges:

bearophile (4/5) Jul 27 2010 I was planning to ask for it in an enhancement request.
Andrei Alexandrescu (5/42) Jul 27 2010 I definitely wanted to add that... in fact I vaguely thought I did!

Philippe Sigaud (5/9) Jul 30 2010 Better late than never...

Lutger (9/47) Jul 25 2010 I think the best justification is not in Phobos alone, but having a stan...

Philippe Sigaud (49/75) Jul 25 2010 It's not exactly what you're asking for, but my first thought was:

Andrei Alexandrescu (16/117) Jul 26 2010 Yah.

Peter Alexander (25/25) Jul 27 2010 Personally, I would just expose all accelerated algorithms for sorted

Andrei Alexandrescu (30/63) Jul 27 2010 I am quoting this in full because I agree with it in full! I think this

bearophile (4/9) Jul 27 2010 But the safety (reliability) of such test is low, and it can increase a ...

Andrei Alexandrescu (7/17) Jul 27 2010 Some reliability is considerably to have than patently none especially

Jonathan M Davis (9/12) Jul 27 2010 It's a range, so put it in std.range. It's already likely pretty common ...

Andrei Alexandrescu (7/20) Jul 27 2010 I agree with the argument. I just fear that someone wants binary search,...

Steven Schveighoffer (4/24) Jul 27 2010 That's what indexes are for :)

Tomek =?UTF-8?B?U293acWEc2tp?= (6/18) Jul 27 2010 Some of std.algorithm (e.g. find, partition) can profit on sortedness, o...

Andrei Alexandrescu (4/24) Jul 27 2010 Yah but the type is a range.

Philippe Sigaud (28/55) Jul 28 2010 t.

Tomek =?UTF-8?B?U293acWEc2tp?= (19/31) Jul 28 2010 OK, range then.

Andrei Alexandrescu (5/42) Jul 28 2010 Like it, just one issue - destructive copy is unlikely to solve any

Tomek =?UTF-8?B?U293acWEc2tp?= (3/29) Jul 28 2010 I know. Yet, it does no harm and is better than what is now.

KennyTM~ (6/16) Jul 27 2010 No

Andrei Alexandrescu (3/18) Jul 27 2010 Oops. Thanks for the correction.

Philippe Sigaud (33/57) Jul 27 2010 Another thing, though I'm not sure it's a good idea, it to have them def...

Andrei Alexandrescu (10/38) Jul 27 2010 One issue is dealing with sorted random-access ranges of size_t. Then

Philippe Sigaud (53/73) Jul 28 2010 You're right. And I cannot discard the notion of size_t ranges, because

bearophile (17/24) Jul 28 2010 In Python you can give a lamba function to filter(), or you can use:
bearophile (4/5) Jul 28 2010 In a so small array of chars (or even wchars/dchars) a sequential search...

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I'm trying to find justifications for keeping assumeSorted and friends 
within Phobos. Background: assumeSorted(r) where r is some range returns 
a value that wraps r and clarifies to the caller that it can assume r 
has been sorted.

The obvious use case in Phobos is that find(haystack, needle) can 
proceed faster if haystack == assumeSorted(something).

The nicest way to go about this is to make the type returned by 
assumeSorted a kind of "range with benefits". That is, it would 
implement all range primitives, plus a few more primitives that are 
advantageous for sorted ranges. Question is: what are those? What kind 
of cool primitives could a sorted range define?

Here are a few I can think of:

find -> uses binary search if random-access, or at least early stopping 
otherwise.

minElement -> r.front

maxElement -> r.back

topN -> essentially does nothing

median -> r[r.length / 2]

Such functions could have free counterparts in std.algorithm. The free 
functions check whether the range already implements the fast functions 
and uses them transparently if present. So then we have e.g. find() 
which works in linear time for most ranges, but logarithmic time on 
random-access sorted ranges.

Other ideas?


Thanks,

Andrei

Jul 25 2010

Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:

Andrei Alexandrescu wrote:

 I'm trying to find justifications for keeping assumeSorted and friends
 within Phobos. Background: assumeSorted(r) where r is some range returns
 a value that wraps r and clarifies to the caller that it can assume r
 has been sorted.
 
 The obvious use case in Phobos is that find(haystack, needle) can
 proceed faster if haystack == assumeSorted(something).
 
 The nicest way to go about this is to make the type returned by
 assumeSorted a kind of "range with benefits". That is, it would
 implement all range primitives, plus a few more primitives that are
 advantageous for sorted ranges. Question is: what are those? What kind
 of cool primitives could a sorted range define?
 
 Here are a few I can think of:
 
 find -> uses binary search if random-access, or at least early stopping
 otherwise.
 
 minElement -> r.front
 
 maxElement -> r.back
 
 topN -> essentially does nothing
 
 median -> r[r.length / 2]
 
 Such functions could have free counterparts in std.algorithm. The free
 functions check whether the range already implements the fast functions
 and uses them transparently if present. So then we have e.g. find()
 which works in linear time for most ranges, but logarithmic time on
 random-access sorted ranges.

Sort of trivial, but I'd like to see the predicate with which it's sorted.

Also, I can't stop thinking that it's stand-alone find()'s job utilize 
whatever features the range has (be it random access, sortedness, or 
anything else) to execute fast, not the passed in range's.

Checking for a member find() sounds good but doesn't have anything specific 
to do with the range being sorted. Any range could define its find() to 
special-case on its structure.

BTW, would it be a big deal if sort() returned assumeSorted!Range? Many uses 
are a re-sort + binary search combo.

 
Tomek

Jul 25 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 25 Jul 2010 15:44:36 -0400, Tomek Sowiński <just ask.me> wrote:

 Andrei Alexandrescu wrote:

 I'm trying to find justifications for keeping assumeSorted and friends
 within Phobos. Background: assumeSorted(r) where r is some range returns
 a value that wraps r and clarifies to the caller that it can assume r
 has been sorted.

 The obvious use case in Phobos is that find(haystack, needle) can
 proceed faster if haystack == assumeSorted(something).

 The nicest way to go about this is to make the type returned by
 assumeSorted a kind of "range with benefits". That is, it would
 implement all range primitives, plus a few more primitives that are
 advantageous for sorted ranges. Question is: what are those? What kind
 of cool primitives could a sorted range define?

 Here are a few I can think of:

 find -> uses binary search if random-access, or at least early stopping
 otherwise.

 minElement -> r.front

 maxElement -> r.back

 topN -> essentially does nothing

 median -> r[r.length / 2]

 Such functions could have free counterparts in std.algorithm. The free
 functions check whether the range already implements the fast functions
 and uses them transparently if present. So then we have e.g. find()
 which works in linear time for most ranges, but logarithmic time on
 random-access sorted ranges.

 Sort of trivial, but I'd like to see the predicate with which it's  
 sorted.

I think the "range" returned by assumeSorted has a predicate alias in it  
already (which defaults to "a < b")

 BTW, would it be a big deal if sort() returned assumeSorted!Range? Many  
 uses
 are a re-sort + binary search combo.

Yeah, that would be good.

The only problem I see with this is that "sortedness" is broken quite  
easily in ways that does not remove the assumeSorted type.  For example,  
if assumeSorted just forwards its underlying range's methods, then given  
an array arr:

auto x = assumeSorted(arr.sort);

x[0] = 999999; // probably no longer sorted

x.find(999999); // probably doesn't work.

And there are other subtle ways to break it in ways that you couldn't even  
program assumeSorted to deal with:

arr[0] = 999999; // how does x detect this?

I feel like assumeSorted is something that's temporary.  It can only be  
used as a parameter or return decoration, but as soon as you assign it to  
a variable, it's gone.  Not sure if that can work.  User-defined  
attributes would be nice here :)

-Steve

Jul 26 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Sun, 25 Jul 2010 15:44:36 -0400, Tomek Sowiński <just ask.me> wrote:
 
 Andrei Alexandrescu wrote:

 I'm trying to find justifications for keeping assumeSorted and friends
 within Phobos. Background: assumeSorted(r) where r is some range returns
 a value that wraps r and clarifies to the caller that it can assume r
 has been sorted.

 The obvious use case in Phobos is that find(haystack, needle) can
 proceed faster if haystack == assumeSorted(something).

 The nicest way to go about this is to make the type returned by
 assumeSorted a kind of "range with benefits". That is, it would
 implement all range primitives, plus a few more primitives that are
 advantageous for sorted ranges. Question is: what are those? What kind
 of cool primitives could a sorted range define?

 Here are a few I can think of:

 find -> uses binary search if random-access, or at least early stopping
 otherwise.

 minElement -> r.front

 maxElement -> r.back

 topN -> essentially does nothing

 median -> r[r.length / 2]

 Such functions could have free counterparts in std.algorithm. The free
 functions check whether the range already implements the fast functions
 and uses them transparently if present. So then we have e.g. find()
 which works in linear time for most ranges, but logarithmic time on
 random-access sorted ranges.

 Sort of trivial, but I'd like to see the predicate with which it's 
 sorted.

 
 I think the "range" returned by assumeSorted has a predicate alias in it 
 already (which defaults to "a < b")

We just need to do the STL trick of re-exposing the template argument 
from within the template:

struct SortedRange(alias _less)
{
     alias binaryFun!_less less;
     ...
}

Clients will use SomeType.less.

 BTW, would it be a big deal if sort() returned assumeSorted!Range? 
 Many uses
 are a re-sort + binary search combo.

 
 Yeah, that would be good.
 
 The only problem I see with this is that "sortedness" is broken quite 
 easily in ways that does not remove the assumeSorted type.  For example, 
 if assumeSorted just forwards its underlying range's methods, then given 
 an array arr:
 
 auto x = assumeSorted(arr.sort);
 
 x[0] = 999999; // probably no longer sorted
 
 x.find(999999); // probably doesn't work.

I was considering disabling that by having SortedRange's operators 
return rvalues instead of lvalues.

 And there are other subtle ways to break it in ways that you couldn't 
 even program assumeSorted to deal with:
 
 arr[0] = 999999; // how does x detect this?

That's pretty brutal :o).

 I feel like assumeSorted is something that's temporary.  It can only be 
 used as a parameter or return decoration, but as soon as you assign it 
 to a variable, it's gone.  Not sure if that can work.  User-defined 
 attributes would be nice here :)

I agree that assumeSorted is somewhat evanescent. I'm still undecided 
whether it deserves a place in Phobos or not.


Andrei

Jul 26 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Tomek Sowin'ski wrote:
 Andrei Alexandrescu wrote:
 
 I'm trying to find justifications for keeping assumeSorted and friends
 within Phobos. Background: assumeSorted(r) where r is some range returns
 a value that wraps r and clarifies to the caller that it can assume r
 has been sorted.

 The obvious use case in Phobos is that find(haystack, needle) can
 proceed faster if haystack == assumeSorted(something).

 The nicest way to go about this is to make the type returned by
 assumeSorted a kind of "range with benefits". That is, it would
 implement all range primitives, plus a few more primitives that are
 advantageous for sorted ranges. Question is: what are those? What kind
 of cool primitives could a sorted range define?

 Here are a few I can think of:

 find -> uses binary search if random-access, or at least early stopping
 otherwise.

 minElement -> r.front

 maxElement -> r.back

 topN -> essentially does nothing

 median -> r[r.length / 2]

 Such functions could have free counterparts in std.algorithm. The free
 functions check whether the range already implements the fast functions
 and uses them transparently if present. So then we have e.g. find()
 which works in linear time for most ranges, but logarithmic time on
 random-access sorted ranges.

 
 Sort of trivial, but I'd like to see the predicate with which it's sorted.

Thanks for your feedback. Yah, that's a given.

 Also, I can't stop thinking that it's stand-alone find()'s job utilize 
 whatever features the range has (be it random access, sortedness, or 
 anything else) to execute fast, not the passed in range's.

Good point, though that reintroduces the question of comparing find's 
predicate with SortedRange's predicate.

 Checking for a member find() sounds good but doesn't have anything specific 
 to do with the range being sorted. Any range could define its find() to 
 special-case on its structure.

And that's not bad I think. If the range feels it has defined enough 
structure to implement fast find, maybe find should defer to it. I'm 
unclear about the best strategy here.

 BTW, would it be a big deal if sort() returned assumeSorted!Range? Many uses 
 are a re-sort + binary search combo.

Yah, that would be great.


Andrei

Jul 26 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-07-27 01:04:28 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Tomek Sowin'ski wrote:
 Checking for a member find() sounds good but doesn't have anything 
 specific to do with the range being sorted. Any range could define its 
 find() to special-case on its structure.

 
 And that's not bad I think. If the range feels it has defined enough 
 structure to implement fast find, maybe find should defer to it. I'm 
 unclear about the best strategy here.

That would be like implementing the uniform calling syntax for certain 
functions. I like it, except that it being only for certain functions 
breaks the 'uniform' thing.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jul 27 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Tue, Jul 27, 2010 at 07:04, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 Also, I can't stop thinking that it's stand-alone find()'s job utilize
 whatever features the range has (be it random access, sortedness, or
 anything else) to execute fast, not the passed in range's.

 Good point, though that reintroduces the question of comparing find's
 predicate with SortedRange's predicate.

I really don't see how you would do that in a generic way... Even taking
into account that predicates return a very simple value (bool) and that they
terminate (well, the input range's one _was_ used to sort it), that's akin
to determining if two unknown functions produce equal values.

Jul 27 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Philippe Sigaud wrote:
 On Tue, Jul 27, 2010 at 07:04, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>> 
 wrote:
 
 
         Also, I can't stop thinking that it's stand-alone find()'s job
         utilize whatever features the range has (be it random access,
         sortedness, or anything else) to execute fast, not the passed in
         range's.
 
 
     Good point, though that reintroduces the question of comparing
     find's predicate with SortedRange's predicate.
 
 
 I really don't see how you would do that in a generic way... Even taking 
 into account that predicates return a very simple value (bool) and that 
 they terminate (well, the input range's one _was_ used to sort it), 
 that's akin to determining if two unknown functions produce equal values.

Yah; it would help somehow if there was a way to get the body of a 
function as a void[]. The address already exists, we need the sizeof.

Andrei

Jul 27 2010

Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:

Andrei Alexandrescu wrote:

 I really don't see how you would do that in a generic way... Even taking
 into account that predicates return a very simple value (bool) and that
 they terminate (well, the input range's one was used to sort it),
 that's akin to determining if two unknown functions produce equal values.

 
 Yah; it would help somehow if there was a way to get the body of a
 function as a void[]. The address already exists, we need the sizeof.

Isn't dereferencing function pointers illegal?


Tomek

Jul 27 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Tomek Sowiński wrote:
 Andrei Alexandrescu wrote:
 
 I really don't see how you would do that in a generic way... Even taking
 into account that predicates return a very simple value (bool) and that
 they terminate (well, the input range's one was used to sort it),
 that's akin to determining if two unknown functions produce equal values.

 Yah; it would help somehow if there was a way to get the body of a
 function as a void[]. The address already exists, we need the sizeof.

 
 Isn't dereferencing function pointers illegal?

We need that too :o).

Andrei

Jul 27 2010

Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:

Andrei Alexandrescu wrote:

  Isn't dereferencing function pointers illegal?
 
 We need that too :o).

Don't get it:) Need them to be dereferencable or not?

BTW, time ago I posted that the expression &fun should be immutable and that 
function pointers should be treated as values because they can't be 
dereferenced. Maybe we need to revive this forgotten post:

http://www.mail-archive.com/digitalmars-d puremagic.com/msg25104.html


Tomek

Jul 27 2010

Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:

Andrei Alexandrescu wrote:

 median -> r[r.length / 2]

Generalize to quantile.
http://en.wikipedia.org/wiki/Quantile


Tomek

Jul 25 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Tomek Sowiński wrote:
 Andrei Alexandrescu wrote:
 
 median -> r[r.length / 2]

 
 Generalize to quantile.
 http://en.wikipedia.org/wiki/Quantile

Makes sense.

Andrei

Jul 26 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Tue, Jul 27, 2010 at 07:13, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 Tomek Sowi=C5=84ski wrote:

 Andrei Alexandrescu wrote:

  median -> r[r.length / 2]

 Generalize to quantile.
 http://en.wikipedia.org/wiki/Quantile

 Makes sense.

 Andrei


Ah, not as methods, but free functions taking sorted ranges:


*  fast merge (on n ranges in one go),
* fast intersection, union, difference, symmetricDifference. But I see thes=
e
already exist in std.algo for sets.
* Also, maybe, some kind of takeUpTo(range, Max), that takes all values up
to a certain value Max. As it's ordered, we know there is no more value les=
s
than Max. I don't if there is a mathematical name for that.

It's just a specialized takeWhile!((ElementType!R v) { return
v<Max;})(range);

btw, takeWhile is a _very_ useful function to add to std.algorithm.



Philippe

Jul 27 2010

bearophile <bearophileHUGS lycos.com> writes:

Philippe Sigaud:
 btw, takeWhile is a _very_ useful function to add to std.algorithm.

I was planning to ask for it in an enhancement request.

Bye,
bearophile

Jul 27 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Philippe Sigaud wrote:
 
 
 On Tue, Jul 27, 2010 at 07:13, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>> 
 wrote:
 
     Tomek Sowi�ski wrote:
 
         Andrei Alexandrescu wrote:
 
             median -> r[r.length / 2]
 
 
         Generalize to quantile.
         http://en.wikipedia.org/wiki/Quantile
 
 
     Makes sense.
 
     Andrei
 
 
 
 Ah, not as methods, but free functions taking sorted ranges:
 
 
 *  fast merge (on n ranges in one go),
 * fast intersection, union, difference, symmetricDifference. But I see 
 these already exist in std.algo for sets.
 * Also, maybe, some kind of takeUpTo(range, Max), that takes all values 
 up to a certain value Max. As it's ordered, we know there is no more 
 value less than Max. I don't if there is a mathematical name for that.
 
 It's just a specialized takeWhile!((ElementType!R v) { return 
 v<Max;})(range);
 
 btw, takeWhile is a _very_ useful function to add to std.algorithm.

I definitely wanted to add that... in fact I vaguely thought I did! 
Could you please make a bugzilla entry so it's not forgotten?

Thanks,

Andrei

Jul 27 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

2010/7/27 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>

 btw, takeWhile is a _very_ useful function to add to std.algorithm.

 I definitely wanted to add that... in fact I vaguely thought I did! Could
 you please make a bugzilla entry so it's not forgotten?

Better late than never...

Here it is:
http://d.puremagic.com/issues/show_bug.cgi?id=4535

<http://d.puremagic.com/issues/show_bug.cgi?id=4535>Philippe

Jul 30 2010

Lutger <lutger.blijdestijn gmail.com> writes:

Andrei Alexandrescu wrote:

 I'm trying to find justifications for keeping assumeSorted and friends
 within Phobos. Background: assumeSorted(r) where r is some range returns
 a value that wraps r and clarifies to the caller that it can assume r
 has been sorted.
 
 The obvious use case in Phobos is that find(haystack, needle) can
 proceed faster if haystack == assumeSorted(something).

I think the best justification is not in Phobos alone, but having a standard
way 
of signaling that a range is sorted. If this is not in Phobos, other authors 
will have to use their own ways of communicating this precondition to the 
client. sortedness is general and useful enough.
 
 The nicest way to go about this is to make the type returned by
 assumeSorted a kind of "range with benefits". That is, it would
 implement all range primitives, plus a few more primitives that are
 advantageous for sorted ranges. Question is: what are those? What kind
 of cool primitives could a sorted range define?
 
 Here are a few I can think of:
 
 find -> uses binary search if random-access, or at least early stopping
 otherwise.
 
 minElement -> r.front
 
 maxElement -> r.back
 
 topN -> essentially does nothing
 
 median -> r[r.length / 2]
 
 Such functions could have free counterparts in std.algorithm. The free
 functions check whether the range already implements the fast functions
 and uses them transparently if present. So then we have e.g. find()
 which works in linear time for most ranges, but logarithmic time on
 random-access sorted ranges.
 
 Other ideas?
 
 
 Thanks,
 
 Andrei

I agree with Tomek Sowinski here, only the predicate is needed and perhaps a
way 
of retrieving the original range. A small assumeSorted range seems more 
consistent with the general ranges approach.

Jul 25 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

 Andrei Alexandrescu wrote:

 I'm trying to find justifications for keeping assumeSorted and friends
 within Phobos. Background: assumeSorted(r) where r is some range returns
 a value that wraps r and clarifies to the caller that it can assume r
 has been sorted.



 advantageous for sorted ranges. Question is: what are those? What kind
 of cool primitives could a sorted range define?

 Here are a few I can think of:

 find -> uses binary search if random-access, or at least early stopping
 otherwise.

 minElement -> r.front

 maxElement -> r.back

 topN -> essentially does nothing

 median -> r[r.length / 2]


It's not exactly what you're asking for, but my first thought was:
"propagation". When you have a sorted range, it's rich, it's something you
worked for to get. You don't want to waste it. So higher-order ranges and
some other functions should take care to propagate the information.

* slicing a sorted range should produce a sorted range. so your wrapper
range opIndex must be modified to account for this.
* ditto for .save()

* Take(sortedRange, 5) should indicate it's also sorted.
* The same for retro(sorted), but with a predicate being Not!pred (or
something akin to this anyway)
* filter ! predicate (sortedRange) is also sorted
* stride
etc, etc.


 Such functions could have free counterparts in std.algorithm. The free
 functions check whether the range already implements the fast functions
 and uses them transparently if present. So then we have e.g. find()
 which works in linear time for most ranges, but logarithmic time on
 random-access sorted ranges.



* Warning, daydreaming ahead *

As a side note, I'm playing with a Graph struct. And (generic) trees are
just graphs, statically speaking. I can encode trees as graphs. The only
difference is at runtime: no loop, one ancestor per node, etc. So, it's a
bit like the standard range/sorted range problem: in general, being sorted
is a runtime property. Being able to define this at the type level is quite
interesting and that's what your assumeSorted does.


But, in general, it would be nice to have a way to add flags/properties as
side-cars to an input type.

something like:

struct WithProperty(Original, alias property)
{
     Original _original;
     /+ insert here some alias this-like magic to make WithProperty melt
into an Original and expose _original most of the time +/
     /+ expose property (SortedBy!pred, ConvergesTo!someValue, etc) with an
alias.
}

Of course, it should be recursive: calling WithProperty on a
WithProperty!(SomeType, OtherProperties) should add the new property to a
CT-list, maybe a type-tuple.
And there should be a way to disable some properties. As I said before,
filter ! pred (some sorted range) is sorted, so it should indicate it. But
map ! fun (sorted range) is _not_ sorted in general, and should 'off' this
particular property.

What you're doing is defining 'interfaces', a sort of CT duck-typing, and
your constraints templates check for the presence or absence of those new
functions. I now realize much of what I longed for could be encoded in this
way.

* Convergence to a certain value => expose a .limit() primitive.
* every value in the range occupy a certain 'range', smaller than what
allows ElementType!Range (constraining values between 0.0 and 1.0 for a
double-producing range, for example) => expose minValue/maxValue methods.

OK, it's past midnight around here, tomorrow my daughters will want to jump
in the Mediterranean for hours, again. I'll shut up and go to bed.


Philippe

Jul 25 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Philippe Sigaud wrote:

     Andrei Alexandrescu wrote:

      > I'm trying to find justifications for keeping assumeSorted and
     friends
      > within Phobos. Background: assumeSorted(r) where r is some range
     returns
      > a value that wraps r and clarifies to the caller that it can assume r
      > has been sorted.

      > advantageous for sorted ranges. Question is: what are those? What
     kind
      > of cool primitives could a sorted range define?
      >
      > Here are a few I can think of:
      >
      > find -> uses binary search if random-access, or at least early
     stopping
      > otherwise.
      >
      > minElement -> r.front
      >
      > maxElement -> r.back
      >
      > topN -> essentially does nothing
      >
      > median -> r[r.length / 2]

 It's not exactly what you're asking for, but my first thought was: 
 "propagation". When you have a sorted range, it's rich, it's something 
 you worked for to get. You don't want to waste it. So higher-order 
 ranges and some other functions should take care to propagate the 
 information.

 * slicing a sorted range should produce a sorted range. so your wrapper 
 range opIndex must be modified to account for this.

Cool, makes sense.

 * ditto for .save()

Yah.

 * Take(sortedRange, 5) should indicate it's also sorted.

Fortunately take already returns the type of the slice if the range 
supports slice, so that's in place already.

 * The same for retro(sorted), but with a predicate being Not!pred (or 
 something akin to this anyway)

Whoa, interesting.

 * filter ! predicate (sortedRange) is also sorted
 * stride
 etc, etc.

Yah... that is all cool. My only reservations are that with what we have 
right now this looks like a lot of manual special casing. By the way, 
chain() with ranges sorted with the same predicate is also sorted...

Then, such an effort would be much better motivated if sorted ranges had 
some really interesting properties. Aside from those discussed - there's 
not a lot of them!

      > Such functions could have free counterparts in std.algorithm. The
     free
      > functions check whether the range already implements the fast
     functions
      > and uses them transparently if present. So then we have e.g. find()
      > which works in linear time for most ranges, but logarithmic time on
      > random-access sorted ranges.
      >

 * Warning, daydreaming ahead *

 As a side note, I'm playing with a Graph struct. And (generic) trees are 
 just graphs, statically speaking. I can encode trees as graphs. The only 
 difference is at runtime: no loop, one ancestor per node, etc. So, it's 
 a bit like the standard range/sorted range problem: in general, being 
 sorted is a runtime property. Being able to define this at the type 
 level is quite interesting and that's what your assumeSorted does.

 But, in general, it would be nice to have a way to add flags/properties 
 as side-cars to an input type. 

 something like:

 struct WithProperty(Original, alias property)
 {
      Original _original;
      /+ insert here some alias this-like magic to make WithProperty melt 
 into an Original and expose _original most of the time +/
      /+ expose property (SortedBy!pred, ConvergesTo!someValue, etc) with 
 an alias.
 }

 Of course, it should be recursive: calling WithProperty on a 
 WithProperty!(SomeType, OtherProperties) should add the new property to 
 a CT-list, maybe a type-tuple.
 And there should be a way to disable some properties. As I said before, 
 filter ! pred (some sorted range) is sorted, so it should indicate it. 
 But map ! fun (sorted range) is _not_ sorted in general, and should 
 'off' this particular property.

 What you're doing is defining 'interfaces', a sort of CT duck-typing, 
 and your constraints templates check for the presence or absence of 
 those new functions. I now realize much of what I longed for could be 
 encoded in this way.

 * Convergence to a certain value => expose a .limit() primitive.
 * every value in the range occupy a certain 'range', smaller than what 
 allows ElementType!Range (constraining values between 0.0 and 1.0 for a 
 double-producing range, for example) => expose minValue/maxValue methods.

 OK, it's past midnight around here, tomorrow my daughters will want to 
 jump in the Mediterranean for hours, again. I'll shut up and go to bed.

Sounds like fun - of both the hacking and family kind. Keep those ideas 
coming, this is very interesting and something terse and usable could 
come out of it.

Andrei

Jul 26 2010

Peter Alexander <peter.alexander.au gmail.com> writes:

Personally, I would just expose all accelerated algorithms for sorted 
ranges, and make the user call them explicitly e.g. if they know their 
range is sorted, make them call binarySearch (or whatever) instead of find.

I have two primary reasons for this:

1. Having the function choose the right algorithm puts an undue burden 
on the function writer, which translates into increased uncertainty for 
the user of the function, "If I call find(assumeSorted(r)), is it going 
to use binary search? What if I call SomeThirdParty.find? Will it know 
about assumeSorted?"

2. While we're only discussing assumeSorted now, there are a myriad of 
other assumeX wrappers that would be useful:

assumeUnimodal(range) - Allows ternary search for maxElement.
assumeDistinct(range) - uniq doesn't need to do anything.
assumePrime(int) - e.g. getFactors(assumePrime(x)) returns [1, x]
assumeAssociative(binOp) - power can work in O(lg(n)).

* power!(binOp)(a, b) := reduce!(binOp)(replicate(a, b))

Do we provide wrappers for these as well, and make the algorithms check 
for all relevant ones?

---

I say no. Provide ternarySearch, powerAssociative etc., and let the user 
choose them when he/she knows they are the right algorithm. That way, 
function writers can focus on writing specialized functions, and 
function users can rest assured that the function they are calling is 
performing the algorithm that they expect (with the complexity bounds 
that they expect).

Jul 27 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Peter Alexander wrote:
 Personally, I would just expose all accelerated algorithms for sorted 
 ranges, and make the user call them explicitly e.g. if they know their 
 range is sorted, make them call binarySearch (or whatever) instead of find.
 
 I have two primary reasons for this:
 
 1. Having the function choose the right algorithm puts an undue burden 
 on the function writer, which translates into increased uncertainty for 
 the user of the function, "If I call find(assumeSorted(r)), is it going 
 to use binary search? What if I call SomeThirdParty.find? Will it know 
 about assumeSorted?"
 
 2. While we're only discussing assumeSorted now, there are a myriad of 
 other assumeX wrappers that would be useful:
 
 assumeUnimodal(range) - Allows ternary search for maxElement.
 assumeDistinct(range) - uniq doesn't need to do anything.
 assumePrime(int) - e.g. getFactors(assumePrime(x)) returns [1, x]
 assumeAssociative(binOp) - power can work in O(lg(n)).
 
 * power!(binOp)(a, b) := reduce!(binOp)(replicate(a, b))
 
 Do we provide wrappers for these as well, and make the algorithms check 
 for all relevant ones?
 
 ---
 
 I say no. Provide ternarySearch, powerAssociative etc., and let the user 
 choose them when he/she knows they are the right algorithm. That way, 
 function writers can focus on writing specialized functions, and 
 function users can rest assured that the function they are calling is 
 performing the algorithm that they expect (with the complexity bounds 
 that they expect).

I am quoting this in full because I agree with it in full! I think this 
and opIn_r settle the matter in favor of making the additional goodies 
members of Sorted!Range.

A few additional thoughts:

- Experience with the STL has shown that "specialized functions for 
improved complexity with specialized syntax" (a la find as a member vs 
nonmember) fare better than "best-effort functions" (a la STL's distance).

- I'm considering allowing assignment to .front, .back, [k] etc. with 
runtime enforcement that the assignment doesn't break the ordering. 
There is a lure to have them actually shuffle the assigned element in 
the right place, but that would break many assumptions. Many algorithms 
assume that after r.front = x they can assert(r.front == x).

- All binary search operators (lowerBound, upperBound, equalRange) 
should be made members of Sorted!Range. So instead of writing:

// We know r is sorted, baby
auto r1 = upperBound(r, x);

you'd write:

auto r1 = assumeSorted(r).upperBound(x);

thus essentially moving the comment into the code. (I also have a cool 
idea for statistically checking sortedness without deteriorating 
complexity. Essentially assumeSorted(r) could check that r is sorted by 
tossing a rigged coin with P(head)=1.0/r.length and deciding to check if 
head comes up. That way, the amortized complexity of the check is O(1)).

This all raises the question: where should this Sorted(R) in the making 
go? It's a range so it should go in std.range, but it's mostly a 
motivator for algorithms, so it should go in std.algorithm. Tiebreakers?

I'm very excited about this development. Maybe I should include some of 
this stuff in my Google talk on Friday.


Andrei

Jul 27 2010

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 (I also have a cool 
 idea for statistically checking sortedness without deteriorating 
 complexity. Essentially assumeSorted(r) could check that r is sorted by 
 tossing a rigged coin with P(head)=1.0/r.length and deciding to check if 
 head comes up. That way, the amortized complexity of the check is O(1)).

But the safety (reliability) of such test is low, and it can increase a lot the
variance of the running time of a piece of code... So it's a cute idea, but I
am not sure it's a good one.

Bye,
bearophile

Jul 27 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Andrei Alexandrescu:
 (I also have a cool idea for statistically checking sortedness
 without deteriorating complexity. Essentially assumeSorted(r) could
 check that r is sorted by tossing a rigged coin with
 P(head)=1.0/r.length and deciding to check if head comes up. That
 way, the amortized complexity of the check is O(1)).

 
 But the safety (reliability) of such test is low, and it can increase
 a lot the variance of the running time of a piece of code... So it's
 a cute idea, but I am not sure it's a good one.

Some reliability is considerably to have than patently none especially 
when approached in a principled manner (i.e. engineered to not influence 
complexity), and I don't think the variance argument holds except for 
synthetic cases (very few searches in very large ranges). Anyway, I 
think I'll insert the checks only in the debug version.


Andrei

Jul 27 2010

Jonathan M Davis <jmdavisprog gmail.com> writes:

On Tuesday, July 27, 2010 07:05:11 Andrei Alexandrescu wrote:
 This all raises the question: where should this Sorted(R) in the making
 go? It's a range so it should go in std.range, but it's mostly a
 motivator for algorithms, so it should go in std.algorithm. Tiebreakers?

It's a range, so put it in std.range. It's already likely pretty common to be 
importing both anyway, and std.algorithm has more in it than std.range at this 
point. Sure, it may be used with std.algorithm, but someone may have their own 
functions that they'd want to use it with without std.algorithm.

I suppose that it is a bit borderline as to which module it should go in, but 
I'd argue that since it's a range that isn't associated with any particular 
function (like Find! or Until! or whatnot), it should go in std.range.

- Jonathan M Davis

Jul 27 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Jonathan M Davis wrote:
 On Tuesday, July 27, 2010 07:05:11 Andrei Alexandrescu wrote:
 This all raises the question: where should this Sorted(R) in the making
 go? It's a range so it should go in std.range, but it's mostly a
 motivator for algorithms, so it should go in std.algorithm. Tiebreakers?

 
 It's a range, so put it in std.range. It's already likely pretty common to be 
 importing both anyway, and std.algorithm has more in it than std.range at this 
 point. Sure, it may be used with std.algorithm, but someone may have their own 
 functions that they'd want to use it with without std.algorithm.
 
 I suppose that it is a bit borderline as to which module it should go in, but 
 I'd argue that since it's a range that isn't associated with any particular 
 function (like Find! or Until! or whatnot), it should go in std.range.

I agree with the argument. I just fear that someone wants binary search, 
searches the std.algorithm document for "binary", doesn't find any, and 
concludes it doesn't exist. I think I should drop such names as 
"binarySearch", "lowerBound" etc. in the documentation of std.algorithm 
and have the terms xref to std.range.

Andrei

Jul 27 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 27 Jul 2010 12:59:00 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Jonathan M Davis wrote:
 On Tuesday, July 27, 2010 07:05:11 Andrei Alexandrescu wrote:
 This all raises the question: where should this Sorted(R) in the making
 go? It's a range so it should go in std.range, but it's mostly a
 motivator for algorithms, so it should go in std.algorithm.  
 Tiebreakers?

  It's a range, so put it in std.range. It's already likely pretty  
 common to be importing both anyway, and std.algorithm has more in it  
 than std.range at this point. Sure, it may be used with std.algorithm,  
 but someone may have their own functions that they'd want to use it  
 with without std.algorithm.
  I suppose that it is a bit borderline as to which module it should go  
 in, but I'd argue that since it's a range that isn't associated with  
 any particular function (like Find! or Until! or whatnot), it should go  
 in std.range.

 I agree with the argument. I just fear that someone wants binary search,  
 searches the std.algorithm document for "binary", doesn't find any, and  
 concludes it doesn't exist. I think I should drop such names as  
 "binarySearch", "lowerBound" etc. in the documentation of std.algorithm  
 and have the terms xref to std.range.

That's what indexes are for :)

-Steve

Jul 27 2010

Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:

Andrei Alexandrescu wrote:

 - All binary search operators (lowerBound, upperBound, equalRange)
 should be made members of Sorted!Range. So instead of writing:
 
 // We know r is sorted, baby
 auto r1 = upperBound(r, x);
 
 you'd write:
 
 auto r1 = assumeSorted(r).upperBound(x);

Some of std.algorithm (e.g. find, partition) can profit on sortedness, others 
(e.g. group, setIntersection, setDifference, nWayUnion) require it. Do you 
want to put *all* of them as members of Sorted?

 This all raises the question: where should this Sorted(R) in the making
 go? It's a range so it should go in std.range, but it's mostly a
 motivator for algorithms, so it should go in std.algorithm. Tiebreakers?

If members are algorithms then in algorithm...


Tomek

Jul 27 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Tomek Sowiński wrote:
 Andrei Alexandrescu wrote:
 
 - All binary search operators (lowerBound, upperBound, equalRange)
 should be made members of Sorted!Range. So instead of writing:

 // We know r is sorted, baby
 auto r1 = upperBound(r, x);

 you'd write:

 auto r1 = assumeSorted(r).upperBound(x);

 
 Some of std.algorithm (e.g. find, partition) can profit on sortedness, others 
 (e.g. group, setIntersection, setDifference, nWayUnion) require it. Do you 
 want to put *all* of them as members of Sorted?

Affirmative.

 This all raises the question: where should this Sorted(R) in the making
 go? It's a range so it should go in std.range, but it's mostly a
 motivator for algorithms, so it should go in std.algorithm. Tiebreakers?

 
 If members are algorithms then in algorithm...

Yah but the type is a range.


Andrei

Jul 27 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Wed, Jul 28, 2010 at 00:18, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 Tomek Sowi=C5=84ski wrote:

 Andrei Alexandrescu wrote:

 - All binary search operators (lowerBound, upperBound, equalRange)
 should be made members of Sorted!Range. So instead of writing:

 // We know r is sorted, baby
 auto r1 =3D upperBound(r, x);

 you'd write:

 auto r1 =3D assumeSorted(r).upperBound(x);

 Some of std.algorithm (e.g. find, partition) can profit on sortedness,
 others (e.g. group, setIntersection, setDifference, nWayUnion) require i=


t.
 Do you want to put *all* of them as members of Sorted?

 Affirmative.



btw, std.algorithm.sort does an in-place sort and as such does not affect
the input type. It might be interesting to call it sortInPlace instead (or
anything that has your fancy) and have sort() _returns_ a Sorted!Range and
automatically get all the goodies.

Also, the .sort properties for arrays may be ditched and use std.algo
instead, by way of a free function in std.array.


And, while I'm at it, most of these primitives do not work for infinite
ranges, which may perfectly be sorted, have random-access and slicing. Case
in point: the natural numbers 0,1,2,...


I guess infinity will not be accepted in that case... If that's so, my
impression is that most algorithms will require something very array-like:
random-access && hasLength && hasSlicing && !isInfinite.
That may be interesting to produce a new template constraint:

template isArrayLike(R) // aka "Rich Range"
{...}



 This all raises the question: where should this Sorted(R) in the making
 go? It's a range so it should go in std.range, but it's mostly a
 motivator for algorithms, so it should go in std.algorithm. Tiebreakers=



?

 If members are algorithms then in algorithm...

 Yah but the type is a range.


But then, so are Map and Filter.
Personally, I do not consider them algorithms per se, they are too
low-level: they're building blocks and, as ranges, should go in std.range.
LevenshteinDistance or bringToFront, those are algorithms.

My own experience of this is that most of the time, I import std.range
_only_ to get map/filter/reduce. That is, without reaching for a real
algorithm. They are in std.algo for 'historical' reasons, because std.algo
precedes std.range.


Philippe

Jul 28 2010

Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:

Philippe Sigaud wrote:

 Yah but the type is a range.

 
 
 But then, so are Map and Filter.
 Personally, I do not consider them algorithms per se, they are too
 low-level: they're building blocks and, as ranges, should go in std.range.
 LevenshteinDistance or bringToFront, those are algorithms.
 
 My own experience of this is that most of the time, I import std.range
 only to get map/filter/reduce. That is, without reaching for a real
 algorithm. They are in std.algo for 'historical' reasons, because std.algo
 precedes std.range.

OK, range then.

From a different angle, I was thinking that assumptions should give more; 
look at this:

auto range = [2, 3, 4];
auto sorted = assumeSorted(r);
range[0] = 666; // BANG! assumeSorted replaced range with Range.init
sorted[0] = 666; // BANG! assertion in Sorted's setter went off
range = sorted.dropAssumption;
sorted[0] = 666;  // BANG! now sorted's range reference is nulled.
range[0] = 666;  // OK... phew!

Assumption producers like sort() would take the range by ref to comply with 
this idiom. Of course if you save() the range before it's eaten by 
assumeSorted, you can twiddle away through the saved range. Still, it's 
better than nothing. Oh, and the unique-like behavior can be expanded onto 
assumeUnique and, I dare say, assumeWhatever.

Like it?
 

Tomek

Jul 28 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Tomek Sowiński wrote:
 Philippe Sigaud wrote:
 
 Yah but the type is a range.

 But then, so are Map and Filter.
 Personally, I do not consider them algorithms per se, they are too
 low-level: they're building blocks and, as ranges, should go in std.range.
 LevenshteinDistance or bringToFront, those are algorithms.

 My own experience of this is that most of the time, I import std.range
 only to get map/filter/reduce. That is, without reaching for a real
 algorithm. They are in std.algo for 'historical' reasons, because std.algo
 precedes std.range.

 
 OK, range then.
 
 From a different angle, I was thinking that assumptions should give more; 
 look at this:
 
 auto range = [2, 3, 4];
 auto sorted = assumeSorted(r);
 range[0] = 666; // BANG! assumeSorted replaced range with Range.init
 sorted[0] = 666; // BANG! assertion in Sorted's setter went off
 range = sorted.dropAssumption;
 sorted[0] = 666;  // BANG! now sorted's range reference is nulled.
 range[0] = 666;  // OK... phew!
 
 Assumption producers like sort() would take the range by ref to comply with 
 this idiom. Of course if you save() the range before it's eaten by 
 assumeSorted, you can twiddle away through the saved range. Still, it's 
 better than nothing. Oh, and the unique-like behavior can be expanded onto 
 assumeUnique and, I dare say, assumeWhatever.
 
 Like it?
  
 
 Tomek

Like it, just one issue - destructive copy is unlikely to solve any 
problems. The big problems arise with long-distance aliasing, much less 
from messing with the source of the assignment directly.

Andrei

Jul 28 2010

Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:

Andrei Alexandrescu wrote:

 Tomek Sowiński wrote:
 From a different angle, I was thinking that assumptions should give more;
 look at this:
 
 auto range = [2, 3, 4];
 auto sorted = assumeSorted(r);
 range[0] = 666; // BANG! assumeSorted replaced range with Range.init
 sorted[0] = 666; // BANG! assertion in Sorted's setter went off
 range = sorted.dropAssumption;
 sorted[0] = 666;  // BANG! now sorted's range reference is nulled.
 range[0] = 666;  // OK... phew!
 
 Assumption producers like sort() would take the range by ref to comply
 with this idiom. Of course if you save() the range before it's eaten by
 assumeSorted, you can twiddle away through the saved range. Still, it's
 better than nothing. Oh, and the unique-like behavior can be expanded
 onto assumeUnique and, I dare say, assumeWhatever.
 
 Like it?
  
 
 Tomek

 
 Like it, just one issue - destructive copy is unlikely to solve any
 problems. The big problems arise with long-distance aliasing, much less
 from messing with the source of the assignment directly.

I know. Yet, it does no harm and is better than what is now.


Tomek

Jul 28 2010

KennyTM~ <kennytm gmail.com> writes:

On Jul 27, 10 13:20, Andrei Alexandrescu wrote:
 Philippe Sigaud wrote:
 Andrei Alexandrescu wrote:


[snip]
 Yah... that is all cool. My only reservations are that with what we have
 right now this looks like a lot of manual special casing. By the way,
 chain() with ranges sorted with the same predicate is also sorted...

No

      chain([4,5,6,7], [1,2])  // <-- not sorted

unless you can also assert   input[i-1].back <= input[i].front.

 Then, such an effort would be much better motivated if sorted ranges had
 some really interesting properties. Aside from those discussed - there's
 not a lot of them!

[snip]
 Andrei

Jul 27 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

KennyTM~ wrote:
 On Jul 27, 10 13:20, Andrei Alexandrescu wrote:
 Philippe Sigaud wrote:
 Andrei Alexandrescu wrote:


 [snip]
 Yah... that is all cool. My only reservations are that with what we have
 right now this looks like a lot of manual special casing. By the way,
 chain() with ranges sorted with the same predicate is also sorted...

 
 No
 
      chain([4,5,6,7], [1,2])  // <-- not sorted
 
 unless you can also assert   input[i-1].back <= input[i].front.

Oops. Thanks for the correction.

Andrei

Jul 27 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Tue, Jul 27, 2010 at 07:20, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 * slicing a sorted range should produce a sorted range. so your wrapper
 range opIndex must be modified to account for this.

 Cool, makes sense.

Another thing, though I'm not sure it's a good idea, it to have them define
an opIndex[ElementType!R e], that would just either forward to .find() or do
a simple binary search by itself: O(lg(n)) is quite good.

Yeah, isAssociativeRange!R  ;)
But that's blurring the line between container and range, I guess.


In the same vein, a sorted range could optionally define opIn_r that also
promises O(lg(n)) complexity.

As for taking ideas from other languages: in Clojure, data structures for
which it makes sense (maps, sets?) are functions of their element and return
true if the provided value is in the structure. That allows one to use them
as predicate for finding and such, which makes for very clean code, as they
also have encoded literal values.




  * ditto for .save()

 Yah.


  * Take(sortedRange, 5) should indicate it's also sorted.

 Fortunately take already returns the type of the slice if the range
 supports slice, so that's in place already.


I was about to reply that not all sorted ranges have slicing, but I think
it's better to restrict this to 'rich' ranges: Random-Access and slicing.
So, OK.



  * The same for retro(sorted), but with a predicate being Not!pred (or
 something akin to this anyway)

 Whoa, interesting.


It's the only case where I found that you could act on the predicate without
analyzing it.


  * filter ! predicate (sortedRange) is also sorted
 * stride
 etc, etc.

 Yah... that is all cool. My only reservations are that with what we have
 right now this looks like a lot of manual special casing. By the way,
 chain() with ranges sorted with the same predicate is also sorted...

 Then, such an effort would be much better motivated if sorted ranges had
 some really interesting properties. Aside from those discussed - there's not
 a lot of them!

Yes, I agree. Maybe for retro() it's interesting, as retro could replace
foreach_reverse...

Btw, should they really be called "sorted" or "ordered"? "sorted" implies
there was a sorting action, which is not always the case. It may very well
have been created that way.

Maybe (just maybe, I'm extending myself there) it's because you tend to use
ranges this way:

- read from a file some content over which you had no control.
- act on the read stuff, most probably sort it to prepare for the real
action
- do the heavy-duty computation.

Whereas I tend to produce my own ranges, so I design them to be
ordered/sorted, always positive or whatever.


Philippe

Jul 27 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Philippe Sigaud wrote:
 On Tue, Jul 27, 2010 at 07:20, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>> 
 wrote:
 
 
     * slicing a sorted range should produce a sorted range. so your
     wrapper range opIndex must be modified to account for this.
 
     Cool, makes sense.
 
 
 
 Another thing, though I'm not sure it's a good idea, it to have them 
 define an opIndex[ElementType!R e], that would just either forward to 
 .find() or do a simple binary search by itself: O(lg(n)) is quite good.
  
 Yeah, isAssociativeRange!R  ;)
 But that's blurring the line between container and range, I guess.

One issue is dealing with sorted random-access ranges of size_t. Then 
there's ambiguity - which [] did you mean, searching on straight indexing?

 In the same vein, a sorted range could optionally define opIn_r that 
 also promises O(lg(n)) complexity.

This is a strong argument for putting sorted range functionality inside 
Sorted!Range.

 As for taking ideas from other languages: in Clojure, data structures 
 for which it makes sense (maps, sets?) are functions of their element 
 and return true if the provided value is in the structure. That allows 
 one to use them as predicate for finding and such, which makes for very 
 clean code, as they also have encoded literal values.

Not getting it, could you please expand/insert a link?

 Btw, should they really be called "sorted" or "ordered"? "sorted" 
 implies there was a sorting action, which is not always the case. It may 
 very well have been created that way.

Not sure, but as a putative user, with "sorted" I know 100% what's going 
on; with the slightly oblique "ordered" I might be worried enough to 
look up the manual.


Andrei

Jul 27 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Tue, Jul 27, 2010 at 15:48, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 Another thing, though I'm not sure it's a good idea, it to have them
 define an opIndex[ElementType!R e], that would just either forward to
 .find() or do a simple binary search by itself: O(lg(n)) is quite good.
  Yeah, isAssociativeRange!R  ;)
 But that's blurring the line between container and range, I guess.

 One issue is dealing with sorted random-access ranges of size_t. Then
 there's ambiguity - which [] did you mean, searching on straight indexing?

You're right. And I cannot discard the notion of size_t ranges, because
ranges of indices are quite useful.
OK, carry on.



  In the same vein, a sorted range could optionally define opIn_r that also
 promises O(lg(n)) complexity.

 This is a strong argument for putting sorted range functionality inside
 Sorted!Range.

 agree. Does that also means that in that case, an algorithm trying to
determine if its argument range is sorted will not use compile-time duck
typing, with a isSortedRange!R, but will just see if the input is a
Sorted!T?

In that case, indicate quite clearly in Sorted docs that if the user knows a
range is sorted, then she really ought to wrap it in Sorted.

Anyway, Sorted will be an interesting example of a wrapper struct that
transmits type-encoded information, like I wanted to have. Seeing how it
works in practice will be interesting. It's really a way to provide at
compile-time information on the (runtime) values.

Though what I wanted is a proper subtype, whereas in Sorted!R case, Sorted
is parallel to R, but due to the duck typing used everywhere, it will get
along.
I'm making any sense there?


As a little brother to Sorted, you could have Bounded!R, that signifies the
ranges values won't take all possible values of their element type: a range
producing doubles, but all positive, for example, or a char range with all
values in the 'a'-'z' range...
Exposed methods: minBound, maxBound

These do not tell you what is the max or the min of the range, they may not
even exist if the range is infinite. It just tell you that all values will
be between minBound and maxBound)
Filter may be enough for this, but it's complicated to extract the min and
max values.

Another one I like is Periodic... (or, Cyclic). Right now, some of my
algorithms, like rotateWhile!predicate(range) detect Cycle!R and Repeat!R
and deal with it accordingly.
I do not want

rotateWhile!"a<6"(cycle([0,1,2,3]))

to rotate without stopping.



  As for taking ideas from other languages: in Clojure, data structures for
 which it makes sense (maps, sets?) are functions of their element and return
 true if the provided value is in the structure. That allows one to use them
 as predicate for finding and such, which makes for very clean code, as they
 also have encoded literal values.

 Not getting it, could you please expand/insert a link?



http://clojure.org/data_structures  (maps are functions of their keys, sets
are functions of their elements)

and

http://clojure.org/sequences
<http://clojure.org/data_structures>
That will fail in D, due to your remark on range with size_t elements that
prevents using opIndex to make a range an associative range. But the idea is
that if you want to filter a sequence to find all vowels for example:

auto vowels = set("aeiou");
auto vowelsInMyText = filter!vowels(myText);

vowels act as a predicate: if you call it with a char, it will returns true
iff the char is in vowels. In Clojure, it produces very clean code. I guess
in D, you'd do:

auto vowelsInMyText!((dchar d) { return d in vowels;})(myText);

or even:

auto vowelsInMyText = filter ! q{a in assumeSorted("aeiou")} (myText);

And I just convinced myself that the D way is good enough :)


Philippe

Jul 28 2010

bearophile <bearophileHUGS lycos.com> writes:

Philippe Sigaud:

 But the idea is that if you want to filter a sequence to find all vowels for
example:
 
 auto vowels = set("aeiou");
 auto vowelsInMyText = filter!vowels(myText);
 
 vowels act as a predicate: if you call it with a char, it will returns true
 iff the char is in vowels. In Clojure, it produces very clean code.

In Python you can give a lamba function to filter(), or you can use:

vowels = set("aeiou")
txt = "hello how are you"
print filter(vowels.__contains__, txt)


__contains__ is the standard method used by the Python "in" operator. That code
is a bit less nice than the Clojure code, but it's a bit more explicit (and you
can use other member functions beside __contains__).

This is similar code in D, but it doesn't work:

import std.range, std.algorithm, std.array;
void main() {
  auto vowels = ['a':0, 'e':0, 'i':0, 'o':0, 'u':0];
  string txt = "hello how are you";
  auto f = filter!(&vowels.opBinary!("in"))(txt);
  writeln(array(f));
}

I don't know if there is some way to make that code work.

Bye,
bearophile

Jul 28 2010

bearophile <bearophileHUGS lycos.com> writes:

Philippe Sigaud:

 auto vowelsInMyText = filter ! q{a in assumeSorted("aeiou")} (myText);

In a so small array of chars (or even wchars/dchars) a sequential search (if
well implemented!) is the fastest algorithm.

Bye,
bearophile

Jul 28 2010

D Programming

C/C++ Programming

Other

digitalmars.D - Do sorted ranges have any special properties?