www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - range chunks

reply Adrian Matoga <epi atari8.info> writes:
Hi,

Is there any off the shelf solution for iterating over a range by 
chunks? Example:

int[] arr = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ];
int i;
foreach (chunk; chunks(arr, 3))
{
	assert(chunk == arr[i * 3 .. min(i * 3 + 3, arr.length)]);
}

(should substitute [1, 2, 3], [4, 5, 6], [7, 8, 9], [10] for chunk in 
subsequent iterations)

Regards,
Adrian Matoga
Aug 05 2010
next sibling parent reply Philippe Sigaud <philippe.sigaud gmail.com> writes:
--0016e6d7eeb47c8d11048d2b0f7f
Content-Type: text/plain; charset=ISO-8859-1

2010/8/6 Adrian Matoga <epi atari8.info>

 Hi,

 Is there any off the shelf solution for iterating over a range by chunks?

None that I know of. (should substitute [1, 2, 3], [4, 5, 6], [7, 8, 9], [10] for chunk in
 subsequent iterations)

[7,8,9]? Here is what I cooked, it's still a bit rough around the edges. It has an optional step argument, to see how many elements to jump. import std.range; struct Chunks(R) if (isInputRange!R) { R range; size_t n; // chunk size size_t step; // how many elements to jump bool empty() property { return range.empty;} ElementType!R[] front() property { return array(take(range, n));} // inefficient if you call front() many times in a row void popFront() property { popFrontN(range, step);} static if (hasLength!R) size_t length() property { return (range.length+step-1)/step;} } Chunks!R chunks(R)(R range, size_t n) if (isInputRange!R) { return Chunks!R(range, n, n); // default is step == n } Chunks!R chunks(R)(R range, size_t n, size_t step) if (isInputRange!R) { return Chunks!R(range, n, step); } Philippe --0016e6d7eeb47c8d11048d2b0f7f Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">2010/8/6 Adrian Matoga <span dir=3D"ltr">&lt;<a = href=3D"mailto:epi atari8.info">epi atari8.info</a>&gt;</span><br><blockquo= te class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1= px solid rgb(204, 204, 204); padding-left: 1ex;"> Hi,<br> <br> Is there any off the shelf solution for iterating over a range by chunks?</= blockquote><div><br>None that I know of.<br><br></div><blockquote class=3D"= gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb= (204, 204, 204); padding-left: 1ex;"> (should substitute [1, 2, 3], [4, 5, 6], [7, 8, 9], [10] for chunk in subse= quent iterations)<br><br></blockquote><div><br>As a data point, why do you = think it should produce [10] and not=20 stop at [7,8,9]?<br><br>Here is what I cooked, it&#39;s still a bit rough a= round the edges. It has an optional step argument, to see how many elements= to jump.<br><br>import std.range;<br>struct Chunks(R) if (isInputRange!R)<= br> {<br>=A0=A0=A0 R range;<br>=A0=A0=A0 size_t n; // chunk size<br>=A0=A0=A0 s= ize_t step; // how many elements to jump<br><br>=A0=A0=A0 bool empty() pro= perty { return range.empty;}<br>=A0=A0=A0 ElementType!R[] front() property= { return array(take(range, n));} // inefficient if you call front() many t= imes in a row<br> =A0=A0=A0 void popFront() property { popFrontN(range, step);}<br><br>=A0= =A0=A0 static if (hasLength!R)<br>=A0=A0=A0=A0=A0=A0=A0 size_t length() pr= operty { return (range.length+step-1)/step;}<br>}<br><br>Chunks!R chunks(R)= (R range, size_t n) if (isInputRange!R)<br> {<br>=A0=A0=A0 return Chunks!R(range, n, n); // default is step =3D=3D n<br=
}<br><br>Chunks!R chunks(R)(R range, size_t n, size_t step) if (isInputRan=

<br><br>Philippe<br></div></div> --0016e6d7eeb47c8d11048d2b0f7f--
Aug 06 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Philippe Sigaud:
 Here is what I cooked, it's still a bit rough around the edges. It has an
 optional step argument, to see how many elements to jump.

It's better to give it a default chunk size of 2. If I have understood this well, then this is the partition() function: http://reference.wolfram.com/mathematica/ref/Partition.html If you want to be more complete you can add two more features: to start counting items from the end (but they are given from the start) and give a "duplication window", so for example the natural numbers with a window of 2 and a duplication of 1 are [1, 2], [2, 3], [3, 4]. But adding too many features risks to produce functions like the Mathematica ones, that sometimes are hard to use because they have tons of complex arguments.) Bye, bearophile
Aug 06 2010
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Steven Schveighoffer wrote:
 On Fri, 06 Aug 2010 14:59:17 -0400, Philippe Sigaud 
 <philippe.sigaud gmail.com> wrote:
 
 On Fri, Aug 6, 2010 at 19:48, Steven Schveighoffer 
 <schveiguy yahoo.com>wrote:

 On Fri, 06 Aug 2010 13:33:09 -0400, Philippe Sigaud <
 philippe.sigaud gmail.com> wrote:

  Here is what I cooked, it's still a bit rough around the edges. It 
 has an
 optional step argument, to see how many elements to jump.

[snip] ElementType!R[] front() property { return array(take(range, n));} //

I'd change this to just return take(range, n). Rule #1 in writing efficient D code, avoid the heap when you can :)

Hmm, good idea. And that way, if n is big, you get a lazy range. But you lose the random access and such. I guess the user will call map!array() if she wants to get arrays?

Doesn't take return a random-access range if the original is a random access range? I would actually expect take(range, n) to return range[0..n] if range supports that.

It does since recently. Andrei
Aug 06 2010
prev sibling parent Adrian Matoga <epi atari8.info> writes:
On 2010-08-06 19:33, Philippe Sigaud wrote:
 2010/8/6 Adrian Matoga <epi atari8.info <mailto:epi atari8.info>>
 
     Hi,
 
     Is there any off the shelf solution for iterating over a range by
     chunks?
 
 
 None that I know of.
 
     (should substitute [1, 2, 3], [4, 5, 6], [7, 8, 9], [10] for chunk
     in subsequent iterations)
 
 
 As a data point, why do you think it should produce [10] and not stop at 
 [7,8,9]?

By an analogy to taking a file by chunks. The other case also occured to me but I simply thought I could imagine more situations in which I need the whole input range to be exhausted, including remainder.
 Here is what I cooked, it's still a bit rough around the edges. It has 
 an optional step argument, to see how many elements to jump.
 
 import std.range;
 struct Chunks(R) if (isInputRange!R)
 {
     R range;
     size_t n; // chunk size
     size_t step; // how many elements to jump
 
     bool empty()  property { return range.empty;}
     ElementType!R[] front()  property { return array(take(range, n));} 
 // inefficient if you call front() many times in a row
     void popFront()  property { popFrontN(range, step);}
 
     static if (hasLength!R)
         size_t length()  property { return (range.length+step-1)/step;}
 }
 
 Chunks!R chunks(R)(R range, size_t n) if (isInputRange!R)
 {
     return Chunks!R(range, n, n); // default is step == n
 }
 
 Chunks!R chunks(R)(R range, size_t n, size_t step) if (isInputRange!R)
 {
     return Chunks!R(range, n, step);
 }
  
 

Thank you very much! And yes, I also think it should be included in std.range (actually before posting the question I was almost sure I could find it there ;)). Adrian
Aug 06 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 06 Aug 2010 13:33:09 -0400, Philippe Sigaud  
<philippe.sigaud gmail.com> wrote:

 Here is what I cooked, it's still a bit rough around the edges. It has an
 optional step argument, to see how many elements to jump.

[snip]
     ElementType!R[] front()  property { return array(take(range, n));} //

I'd change this to just return take(range, n). Rule #1 in writing efficient D code, avoid the heap when you can :) -Steve
Aug 06 2010
prev sibling next sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
--00032555a3da81058c048d2c43a7
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Aug 6, 2010 at 19:48, Steven Schveighoffer <schveiguy yahoo.com>wrote:

 On Fri, 06 Aug 2010 13:33:09 -0400, Philippe Sigaud <
 philippe.sigaud gmail.com> wrote:

  Here is what I cooked, it's still a bit rough around the edges. It has an
 optional step argument, to see how many elements to jump.

[snip] ElementType!R[] front() property { return array(take(range, n));} //

I'd change this to just return take(range, n). Rule #1 in writing efficient D code, avoid the heap when you can :)

Hmm, good idea. And that way, if n is big, you get a lazy range. But you lose the random access and such. I guess the user will call map!array() if she wants to get arrays? Intially, I had a cache, but I ran into problems to initiate it correctly and told myself "Hey, it's a 10' function, don't sweat it, post it already, if that may help." Maybe the type produced could be decided by a policy: lazy, dynamic array... In a version I use, the n arg is a template arg and the range returns tuples. If found that easier to deal with: from a tuple, you can easily create and array (static, or dynamic), if you want to. And I can easily use it as a function argument list... Philippe --00032555a3da81058c048d2c43a7 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On Fri, Aug 6, 2010 at 19:48, Steven Schveighoff= er <span dir=3D"ltr">&lt;<a href=3D"mailto:schveiguy yahoo.com">schveiguy y= ahoo.com</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote" style= =3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); p= adding-left: 1ex;"> <div class=3D"im">On Fri, 06 Aug 2010 13:33:09 -0400, Philippe Sigaud &lt;<= a href=3D"mailto:philippe.sigaud gmail.com" target=3D"_blank">philippe.siga= ud gmail.com</a>&gt; wrote:<br> <br> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> Here is what I cooked, it&#39;s still a bit rough around the edges. It has = an<br> optional step argument, to see how many elements to jump.<br> </blockquote> <br></div> [snip]<div class=3D"im"><br> <br> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> =A0 =A0ElementType!R[] front() property { return array(take(range, n));} = //<br> </blockquote> <br></div> I&#39;d change this to just return take(range, n). =A0Rule #1 in writing ef= ficient D code, avoid the heap when you can :)<br></blockquote><div><br>Hmm= , good idea. And that way, if n is big, you get a lazy range. But you lose = the random access and such. I guess the user will call map!array() if she w= ants to get arrays?<br> <br>Intially, I had a cache, but I ran into problems to initiate it correct= ly and told myself &quot;Hey, it&#39;s a 10&#39; function, don&#39;t sweat = it, post it already, if that may help.&quot;<br><br>Maybe the type produced= could be decided by a policy: lazy, dynamic array...<br> In a version I use, the n arg is a template arg and the range returns tuple= s. If found that easier to deal with: from a tuple, you can easily create a= nd array (static, or dynamic), if you want to. And I can easily use it as a= function argument list...<br> <br>Philippe<br><br></div></div> --00032555a3da81058c048d2c43a7--
Aug 06 2010
prev sibling next sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
--00032555b95a731bc7048d2c4e01
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Aug 6, 2010 at 19:41, bearophile <bearophileHUGS lycos.com> wrote:

 Philippe Sigaud:
 Here is what I cooked, it's still a bit rough around the edges. It has an
 optional step argument, to see how many elements to jump.

It's better to give it a default chunk size of 2.

Why? And what should the default step be, according to you? If chose n (the chunk size), because that's want the OP wanted. If I have understood this well, then this is the partition() function:
 http://reference.wolfram.com/mathematica/ref/Partition.html

Yeah, partition, chunk, segment, it a basic thing, worth including in std.range. Very useful in conjunction with mapping: calculating the moving average, for example.
 If you want to be more complete you can add two more features: to start
 counting items from the end (but they are given from the start) and give a
 "duplication window", so for example the natural numbers with a window of 2
 and a duplication of 1 are [1, 2], [2, 3], [3, 4].

This simple code does this already: chunks(range, 2, 1). --00032555b95a731bc7048d2c4e01 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <br><br><div class=3D"gmail_quote">On Fri, Aug 6, 2010 at 19:41, bearophile= <span dir=3D"ltr">&lt;<a href=3D"mailto:bearophileHUGS lycos.com">bearophi= leHUGS lycos.com</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote"= style=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 2= 04); padding-left: 1ex;"> Philippe Sigaud:<br> <div class=3D"im">&gt; Here is what I cooked, it&#39;s still a bit rough ar= ound the edges. It has an<br> &gt; optional step argument, to see how many elements to jump.<br> <br> </div>It&#39;s better to give it a default chunk size of 2.<br></blockquote=
<div><br>Why?<br>And what should the default step be, according to you? If=


8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> If I have understood this well, then this is the partition() function:<br> <a href=3D"http://reference.wolfram.com/mathematica/ref/Partition.html" tar= get=3D"_blank">http://reference.wolfram.com/mathematica/ref/Partition.html<= /a><br></blockquote><div><br>Yeah, partition, chunk, segment, it a basic th= ing, worth including in std.range.<br> Very useful in conjunction with mapping: calculating the moving average, fo= r example.<br>=A0<br></div><blockquote class=3D"gmail_quote" style=3D"margi= n: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-le= ft: 1ex;"> <br> If you want to be more complete you can add two more features: to start cou= nting items from the end (but they are given from the start) and give a &qu= ot;duplication window&quot;, so for example the natural numbers with a wind= ow of 2 and a duplication of 1 are [1, 2], [2, 3], [3, 4]. </blockquote> <div><br>This simple code does this already: chunks(range, 2, 1).<br>=A0<br=
</div></div><br>

--00032555b95a731bc7048d2c4e01--
Aug 06 2010
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 06 Aug 2010 14:59:17 -0400, Philippe Sigaud  
<philippe.sigaud gmail.com> wrote:

 On Fri, Aug 6, 2010 at 19:48, Steven Schveighoffer  
 <schveiguy yahoo.com>wrote:

 On Fri, 06 Aug 2010 13:33:09 -0400, Philippe Sigaud <
 philippe.sigaud gmail.com> wrote:

  Here is what I cooked, it's still a bit rough around the edges. It has  
 an
 optional step argument, to see how many elements to jump.

[snip] ElementType!R[] front() property { return array(take(range, n));} //

I'd change this to just return take(range, n). Rule #1 in writing efficient D code, avoid the heap when you can :)

Hmm, good idea. And that way, if n is big, you get a lazy range. But you lose the random access and such. I guess the user will call map!array() if she wants to get arrays?

Doesn't take return a random-access range if the original is a random access range? I would actually expect take(range, n) to return range[0..n] if range supports that. -Steve
Aug 06 2010
prev sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
--00032555b69605150f048d2de694
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Aug 6, 2010 at 22:11, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 Steven Schveighoffer wrote:

 Doesn't take return a random-access range if the original is a random
 access range?

 I would actually expect take(range, n) to return range[0..n] if range
 supports that.

It does since recently.

Damn. Uh, I mean, cool! I did not integrate this. Then everything is for the best. Philippe --00032555b69605150f048d2de694 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On Fri, Aug 6, 2010 at 22:11, Andrei Alexandresc= u <span dir=3D"ltr">&lt;<a href=3D"mailto:SeeWebsiteForEmail erdani.org">Se= eWebsiteForEmail erdani.org</a>&gt;</span> wrote:<br><blockquote class=3D"g= mail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(= 204, 204, 204); padding-left: 1ex;"> <div><div></div><div class=3D"h5">Steven Schveighoffer wrote:<br> <blockquote class=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; borde= r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> Doesn&#39;t take return a random-access range if the original is a random a= ccess range?<br> <br> I would actually expect take(range, n) to return range[0..n] if range suppo= rts that.<br> </blockquote> <br></div></div> It does since recently.<font color=3D"#888888"></font></blockquote><div><br=
Damn. Uh, I mean, cool! I did not integrate this.<br><br>Then everything i=

--00032555b69605150f048d2de694--
Aug 06 2010