digitalmars.D - range chunks

Adrian Matoga (13/13) Aug 05 2010 Hi,

Philippe Sigaud (29/32) Aug 06 2010 None that I know of.

bearophile (7/9) Aug 06 2010 It's better to give it a default chunk size of 2.

Philippe Sigaud (10/19) Aug 06 2010 Why?

Steven Schveighoffer (6/9) Aug 06 2010 [snip]

Philippe Sigaud (13/23) Aug 06 2010 Hmm, good idea. And that way, if n is big, you get a lazy range. But you

Steven Schveighoffer (7/31) Aug 06 2010 Doesn't take return a random-access range if the original is a random

Andrei Alexandrescu (3/38) Aug 06 2010 It does since recently.

Philippe Sigaud (5/13) Aug 06 2010 Damn. Uh, I mean, cool! I did not integrate this.

Adrian Matoga (8/54) Aug 06 2010 By an analogy to taking a file by chunks. The other case also occured to...

bearophile (7/14) Aug 06 2010 No duplication across chunks, so on default it can split the natural num...

Andrei Alexandrescu (3/15) Aug 06 2010 When I see "partition" I think it's quicksort's partition algorithm...

Peter Alexander (4/12) Aug 06 2010 Yeah, partition has always meant that to me as well.

bearophile (5/6) Aug 06 2010 Probably you come from a quite different part of computer science (The e...

Peter Alexander (21/25) Aug 07 2010 It's used a fair amount in geometry. I know the Kirkpatrick-Seidel

bearophile (5/12) Aug 07 2010 Yes, it's named partition in my implementations of Quicksort too :-) So ...

Peter Alexander (5/10) Aug 07 2010 Hmm, how about kPartitions?

Rory Mcguire (3/18) Aug 18 2010 how about

Andrei Alexandrescu (3/7) Aug 07 2010 http://en.wikipedia.org/wiki/Selection_algorithm for example.

KennyTM~ (10/24) Aug 07 2010 In D, 'partition' is already used for rearranging (splitting) a range

bearophile (4/5) Aug 07 2010 As alternative do you like "chunker"?

KennyTM~ (5/10) Aug 07 2010 I prefer 'chunks', but that conflicts with std.stdio.chunks. It can be

Adrian Matoga <epi atari8.info> writes:

Hi,

Is there any off the shelf solution for iterating over a range by 
chunks? Example:

int[] arr = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ];
int i;
foreach (chunk; chunks(arr, 3))
{
	assert(chunk == arr[i * 3 .. min(i * 3 + 3, arr.length)]);
}

(should substitute [1, 2, 3], [4, 5, 6], [7, 8, 9], [10] for chunk in 
subsequent iterations)

Regards,
Adrian Matoga

Aug 05 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

2010/8/6 Adrian Matoga <epi atari8.info>

 Hi,

 Is there any off the shelf solution for iterating over a range by chunks?


None that I know of.

(should substitute [1, 2, 3], [4, 5, 6], [7, 8, 9], [10] for chunk in
 subsequent iterations)

As a data point, why do you think it should produce [10] and not stop at
[7,8,9]?

Here is what I cooked, it's still a bit rough around the edges. It has an
optional step argument, to see how many elements to jump.

import std.range;
struct Chunks(R) if (isInputRange!R)
{
    R range;
    size_t n; // chunk size
    size_t step; // how many elements to jump

    bool empty()  property { return range.empty;}
    ElementType!R[] front()  property { return array(take(range, n));} //
inefficient if you call front() many times in a row
    void popFront()  property { popFrontN(range, step);}

    static if (hasLength!R)
        size_t length()  property { return (range.length+step-1)/step;}
}

Chunks!R chunks(R)(R range, size_t n) if (isInputRange!R)
{
    return Chunks!R(range, n, n); // default is step == n
}

Chunks!R chunks(R)(R range, size_t n, size_t step) if (isInputRange!R)
{
    return Chunks!R(range, n, step);
}


Philippe

Aug 06 2010

bearophile <bearophileHUGS lycos.com> writes:

Philippe Sigaud:
 Here is what I cooked, it's still a bit rough around the edges. It has an
 optional step argument, to see how many elements to jump.

It's better to give it a default chunk size of 2.
If I have understood this well, then this is the partition() function:
http://reference.wolfram.com/mathematica/ref/Partition.html

If you want to be more complete you can add two more features: to start
counting items from the end (but they are given from the start) and give a
"duplication window", so for example the natural numbers with a window of 2 and
a duplication of 1 are [1, 2], [2, 3], [3, 4]. But adding too many features
risks to produce functions like the Mathematica ones, that sometimes are hard
to use because they have tons of complex arguments.)

Bye,
bearophile

Aug 06 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Fri, Aug 6, 2010 at 19:41, bearophile <bearophileHUGS lycos.com> wrote:

 Philippe Sigaud:
 Here is what I cooked, it's still a bit rough around the edges. It has an
 optional step argument, to see how many elements to jump.

 It's better to give it a default chunk size of 2.

Why?
And what should the default step be, according to you? If chose n (the chunk
size), because that's want the OP wanted.


If I have understood this well, then this is the partition() function:
 http://reference.wolfram.com/mathematica/ref/Partition.html

Yeah, partition, chunk, segment, it a basic thing, worth including in
std.range.
Very useful in conjunction with mapping: calculating the moving average, for
example.


 If you want to be more complete you can add two more features: to start
 counting items from the end (but they are given from the start) and give a
 "duplication window", so for example the natural numbers with a window of 2
 and a duplication of 1 are [1, 2], [2, 3], [3, 4].


This simple code does this already: chunks(range, 2, 1).

Aug 06 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 06 Aug 2010 13:33:09 -0400, Philippe Sigaud  
<philippe.sigaud gmail.com> wrote:

 Here is what I cooked, it's still a bit rough around the edges. It has an
 optional step argument, to see how many elements to jump.

[snip]

     ElementType!R[] front()  property { return array(take(range, n));} //


efficient D code, avoid the heap when you can :)

-Steve

Aug 06 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Fri, Aug 6, 2010 at 19:48, Steven Schveighoffer <schveiguy yahoo.com>wrote:

 On Fri, 06 Aug 2010 13:33:09 -0400, Philippe Sigaud <
 philippe.sigaud gmail.com> wrote:

  Here is what I cooked, it's still a bit rough around the edges. It has an
 optional step argument, to see how many elements to jump.

 [snip]


     ElementType!R[] front()  property { return array(take(range, n));} //


 efficient D code, avoid the heap when you can :)

Hmm, good idea. And that way, if n is big, you get a lazy range. But you
lose the random access and such. I guess the user will call map!array() if
she wants to get arrays?

Intially, I had a cache, but I ran into problems to initiate it correctly
and told myself "Hey, it's a 10' function, don't sweat it, post it already,
if that may help."

Maybe the type produced could be decided by a policy: lazy, dynamic array...
In a version I use, the n arg is a template arg and the range returns
tuples. If found that easier to deal with: from a tuple, you can easily
create and array (static, or dynamic), if you want to. And I can easily use
it as a function argument list...

Philippe

Aug 06 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Fri, 06 Aug 2010 14:59:17 -0400, Philippe Sigaud  
<philippe.sigaud gmail.com> wrote:

 On Fri, Aug 6, 2010 at 19:48, Steven Schveighoffer  
 <schveiguy yahoo.com>wrote:

 On Fri, 06 Aug 2010 13:33:09 -0400, Philippe Sigaud <
 philippe.sigaud gmail.com> wrote:

  Here is what I cooked, it's still a bit rough around the edges. It has  
 an
 optional step argument, to see how many elements to jump.

 [snip]


     ElementType!R[] front()  property { return array(take(range, n));}  
 //


 efficient D code, avoid the heap when you can :)

 Hmm, good idea. And that way, if n is big, you get a lazy range. But you
 lose the random access and such. I guess the user will call map!array()  
 if
 she wants to get arrays?

Doesn't take return a random-access range if the original is a random  
access range?

I would actually expect take(range, n) to return range[0..n] if range  
supports that.

-Steve

Aug 06 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Steven Schveighoffer wrote:
 On Fri, 06 Aug 2010 14:59:17 -0400, Philippe Sigaud 
 <philippe.sigaud gmail.com> wrote:
 
 On Fri, Aug 6, 2010 at 19:48, Steven Schveighoffer 
 <schveiguy yahoo.com>wrote:

 On Fri, 06 Aug 2010 13:33:09 -0400, Philippe Sigaud <
 philippe.sigaud gmail.com> wrote:

  Here is what I cooked, it's still a bit rough around the edges. It 
 has an
 optional step argument, to see how many elements to jump.

 [snip]


     ElementType!R[] front()  property { return array(take(range, 
 n));} //


 efficient D code, avoid the heap when you can :)

 Hmm, good idea. And that way, if n is big, you get a lazy range. But you
 lose the random access and such. I guess the user will call 
 map!array() if
 she wants to get arrays?

 
 Doesn't take return a random-access range if the original is a random 
 access range?
 
 I would actually expect take(range, n) to return range[0..n] if range 
 supports that.

It does since recently.

Andrei

Aug 06 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Fri, Aug 6, 2010 at 22:11, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:

 Doesn't take return a random-access range if the original is a random
 access range?

 I would actually expect take(range, n) to return range[0..n] if range
 supports that.

 It does since recently.


Damn. Uh, I mean, cool! I did not integrate this.

Then everything is for the best.


Philippe

Aug 06 2010

Adrian Matoga <epi atari8.info> writes:

On 2010-08-06 19:33, Philippe Sigaud wrote:
 2010/8/6 Adrian Matoga <epi atari8.info <mailto:epi atari8.info>>
 
     Hi,
 
     Is there any off the shelf solution for iterating over a range by
     chunks?
 
 
 None that I know of.
 
     (should substitute [1, 2, 3], [4, 5, 6], [7, 8, 9], [10] for chunk
     in subsequent iterations)
 
 
 As a data point, why do you think it should produce [10] and not stop at 
 [7,8,9]?

By an analogy to taking a file by chunks. The other case also occured to 
me but I simply thought I could imagine more situations in which I need 
the whole input range to be exhausted, including remainder.

 Here is what I cooked, it's still a bit rough around the edges. It has 
 an optional step argument, to see how many elements to jump.
 
 import std.range;
 struct Chunks(R) if (isInputRange!R)
 {
     R range;
     size_t n; // chunk size
     size_t step; // how many elements to jump
 
     bool empty()  property { return range.empty;}
     ElementType!R[] front()  property { return array(take(range, n));} 
 // inefficient if you call front() many times in a row
     void popFront()  property { popFrontN(range, step);}
 
     static if (hasLength!R)
         size_t length()  property { return (range.length+step-1)/step;}
 }
 
 Chunks!R chunks(R)(R range, size_t n) if (isInputRange!R)
 {
     return Chunks!R(range, n, n); // default is step == n
 }
 
 Chunks!R chunks(R)(R range, size_t n, size_t step) if (isInputRange!R)
 {
     return Chunks!R(range, n, step);
 }
  
 

Thank you very much!

And yes, I also think it should be included in std.range (actually 
before posting the question I was almost sure I could find it there ;)).

Adrian

Aug 06 2010

bearophile <bearophileHUGS lycos.com> writes:

Philippe Sigaud:
 It's better to give it a default chunk size of 2.

 
 Why?

I'm using partition() for years in various languages, and I've seen that the
chunks of size 2 are the most common.


 And what should the default step be, according to you? If chose n (the chunk
 size), because that's want the OP wanted.

No duplication across chunks, so on default it can split the natural numbers as:
[1, 2], [3, 4], [5, 6], etc.


 Yeah, partition, chunk, segment, it a basic thing, worth including in
 std.range.

The most common name is partition().

Bye,
bearophile

Aug 06 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 08/06/2010 06:52 PM, bearophile wrote:
 Philippe Sigaud:
 It's better to give it a default chunk size of 2.

 Why?

 I'm using partition() for years in various languages, and I've seen that the
chunks of size 2 are the most common.


 And what should the default step be, according to you? If chose n (the chunk
 size), because that's want the OP wanted.

 No duplication across chunks, so on default it can split the natural numbers
as:
 [1, 2], [3, 4], [5, 6], etc.


 Yeah, partition, chunk, segment, it a basic thing, worth including in
 std.range.

 The most common name is partition().

When I see "partition" I think it's quicksort's partition algorithm...

Andrei

Aug 06 2010

Peter Alexander <peter.alexander.au gmail.com> writes:

On 7/08/10 2:51 AM, Andrei Alexandrescu wrote:
 On 08/06/2010 06:52 PM, bearophile wrote:
 Philippe Sigaud:
 Yeah, partition, chunk, segment, it a basic thing, worth including in
 std.range.

 The most common name is partition().

 When I see "partition" I think it's quicksort's partition algorithm...

 Andrei

Yeah, partition has always meant that to me as well.

I'd call that "chunking" algorithm "divide" because it is dividing the 
range into equal parts.

Aug 06 2010

bearophile <bearophileHUGS lycos.com> writes:

Peter Alexander:
 Yeah, partition has always meant that to me as well.

Probably you come from a quite different part of computer science (The example
I have shown from Mathematica is not the only one). 
What are the usages of that algorithm, beside being used in the QuickSort?

Bye,
bearophile

Aug 06 2010

Peter Alexander <peter.alexander.au gmail.com> writes:

On 7/08/10 7:57 AM, bearophile wrote:
 Peter Alexander:
 Yeah, partition has always meant that to me as well.

 Probably you come from a quite different part of computer science (The example
I have shown from Mathematica is not the only one).
 What are the usages of that algorithm, beside being used in the QuickSort?

It's used a fair amount in geometry. I know the Kirkpatrick-Seidel 
algorithm uses it and I've seen some Delaunay triangulation 
implementations use it. I'm pretty sure there's other geometrical 
applications for it as well.

Outside of specific algorithms, I think partition is quite a neglected 
algorithm, because people typically just use sort, even when partition 
would be more suitable. For example, consider some distribution service 
that distributes some product to premium customers first, followed by 
regular customers. Really, that's a job for partition, but in practice 
your average Joe coder would just sort based on the same predicate, 
wasting a factor of O(lg(n)). Either that or they'd just iterate over 
the sequence twice: once for premium, and a second time for regular.

I think the main argument for calling it that is because it is called 
that by almost all computer science literature. Do a Google search for 
quick sort pseudocode. Almost invariably you'll find that they'll define 
a helper function called partition, so programmers have come to learn 
that that's what partition does.

(I just Googled for "quicksort pseudocode" and sure enough, all of the 
top 10 entries that actually contained code defined a partition function 
that does what std.algorithm.partition does).

Aug 07 2010

bearophile <bearophileHUGS lycos.com> writes:

Peter Alexander:
 It's used a fair amount in geometry. I know the Kirkpatrick-Seidel 
 algorithm uses it and I've seen some Delaunay triangulation 
 implementations use it. I'm pretty sure there's other geometrical 
 applications for it as well.

Thank you for your explanations.


 (I just Googled for "quicksort pseudocode" and sure enough, all of the 
 top 10 entries that actually contained code defined a partition function 
 that does what std.algorithm.partition does).

Yes, it's named partition in my implementations of Quicksort too :-) So it's
better to find a different name for the other one.

Bye,
bearophile

Aug 07 2010

Peter Alexander <peter.alexander.au gmail.com> writes:

On 7/08/10 3:06 PM, bearophile wrote:
 Peter Alexander:
 (I just Googled for "quicksort pseudocode" and sure enough, all of the
 top 10 entries that actually contained code defined a partition function
 that does what std.algorithm.partition does).

 Yes, it's named partition in my implementations of Quicksort too :-) So it's
better to find a different name for the other one.

Hmm, how about kPartitions?

e.g. kPartitions([1,2,3,4,5,6], 3) == [[1,2,3], [4,5,6]]

A k-partition is common terminology for partitioning a set into 
k-element subsets, so that seems to fit, and isn't too verbose.

Aug 07 2010

Rory Mcguire <rjmcguire gm_no_ail.com> writes:

Peter Alexander wrote:

 On 7/08/10 3:06 PM, bearophile wrote:
 Peter Alexander:
 (I just Googled for "quicksort pseudocode" and sure enough, all of the
 top 10 entries that actually contained code defined a partition function
 that does what std.algorithm.partition does).

 Yes, it's named partition in my implementations of Quicksort too :-) So
 it's better to find a different name for the other one.

 
 Hmm, how about kPartitions?
 
 e.g. kPartitions([1,2,3,4,5,6], 3) == [[1,2,3], [4,5,6]]
 
 A k-partition is common terminology for partitioning a set into
 k-element subsets, so that seems to fit, and isn't too verbose.


how about

std.range.tochunks([1,2,3,4,5,6],3) == [[1,2,3],[4,5,6]]

Aug 18 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 08/07/2010 01:57 AM, bearophile wrote:
 Peter Alexander:
 Yeah, partition has always meant that to me as well.

 Probably you come from a quite different part of computer science (The example
I have shown from Mathematica is not the only one).
 What are the usages of that algorithm, beside being used in the QuickSort?

http://en.wikipedia.org/wiki/Selection_algorithm for example.

Andrei

Aug 07 2010

KennyTM~ <kennytm gmail.com> writes:

On Aug 7, 10 07:52, bearophile wrote:
 Philippe Sigaud:
 It's better to give it a default chunk size of 2.

 Why?

 I'm using partition() for years in various languages, and I've seen that the
chunks of size 2 are the most common.


 And what should the default step be, according to you? If chose n (the chunk
 size), because that's want the OP wanted.

 No duplication across chunks, so on default it can split the natural numbers
as:
 [1, 2], [3, 4], [5, 6], etc.


 Yeah, partition, chunk, segment, it a basic thing, worth including in
 std.range.

 The most common name is partition().

In D, 'partition' is already used for rearranging (splitting) a range 
into two according to a predicate*. (It's the same in C++, Haskell and 
Ruby.) Since the name is already taken, it's very bad to break the 
compatibility to change what the function does even if the name were 
more common.

Also, in Python this function called grouper() (in itertool's 
"Recipes"), and in Ruby it's called each_slice. I've never seen it 
called Partition[] besides Mathematica.

*: http://www.digitalmars.com/d/2.0/phobos/std_algorithm.html#partition

 Bye,
 bearophile

Aug 07 2010

bearophile <bearophileHUGS lycos.com> writes:

KennyTM~:
 I've never seen it called Partition[] besides Mathematica.

As alternative do you like "chunker"?

Bye,
bearophile

Aug 07 2010

KennyTM~ <kennytm gmail.com> writes:

On Aug 7, 10 20:11, bearophile wrote:
 KennyTM~:
 I've never seen it called Partition[] besides Mathematica.

 As alternative do you like "chunker"?

 Bye,
 bearophile

I prefer 'chunks', but that conflicts with std.stdio.chunks. It can be 
resolved by generalizing std.stdio.chunks to std.range.chunks and making 
File a range that iterates ubyte, but this doesn't seem to be the 
natural choice.

Aug 07 2010

D Programming

C/C++ Programming

Other

digitalmars.D - range chunks