digitalmars.D - groupBy/chunkBy redux

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Looks like there's a backlog of stuff to finalize for groupBy and aggregate:

* Perhaps rename groupBy to chunkBy. People coming from SQL and other 
languages might expect groupBy to do hash-based grouping.

* The unary function implementation must return for each group a tuple 
consisting of the key and the lazy range of values. The binary function 
implementation should continue to only return the lazy range of values.

* SortedRange should add a method called group(). Invoked with no 
predicate, group() should do what chunkBy does, using the sorting predicate.

* aggregate() should detect the two kinds of results per group (well, 
chunk) and process them accordingly: for unary-predicate chunks, pass 
the key through and only process the lazy range. Meaning:

auto data = [
   tuple("John", 100),
   tuple("John", 35),
   tuple("Jane", 200),
   tuple("Jane", 87),
];
auto r = data.chunkBy!(x => x[0]).aggregate!sum;

yields a range of tuples: tuple("John", 135), tuple("Jane", 287).
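A minimal D sketch of the distinction behind the first bullet. It assumes a Phobos release in which chunkBy exists and its unary form yields tuple(key, subRange), as std.algorithm.iteration documents today. chunkKeys and hashGroupCount are hypothetical helpers, and the associative array stands in for SQL-style hash grouping.

```d
// Assumes a Phobos in which chunkBy with a unary key function yields
// tuple(key, subRange); chunkKeys and hashGroupCount are hypothetical helpers.
import std.algorithm.iteration : chunkBy, map;
import std.array : array;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

alias Row = Tuple!(string, int);

// chunkBy merges only *adjacent* rows with equal keys: one chunk per run.
string[] chunkKeys(Row[] data)
{
    return data.chunkBy!(x => x[0]).map!(g => g[0]).array;
}

// Hash-based grouping (what SQL GROUP BY suggests): one group per distinct key.
size_t hashGroupCount(Row[] data)
{
    int[][string] groups;
    foreach (row; data)
        groups[row[0]] ~= row[1];
    return groups.length;
}

void main()
{
    auto data = [
        tuple("John", 100),
        tuple("Jane", 200),
        tuple("John", 35), // same key as the first row, but not adjacent
    ];
    writeln(chunkKeys(data));      // three chunks: John, Jane, John
    writeln(hashGroupCount(data)); // two hash groups: John, Jane
}
```

The two John rows land in separate chunks, which is the behaviour the name chunkBy is meant to signal.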



Andrei

Feb 13 2015

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Friday, 13 February 2015 at 18:32:35 UTC, Andrei Alexandrescu 
wrote:
 * Perhaps rename groupBy to chunkBy. People coming from SQL and 
 other languages might expect groupBy to do hash-based grouping.

Agreed.


 * The unary function implementation must return for each group 
 a tuple consisting of the key and the lazy range of values. The 
 binary function implementation should continue to only return 
 the lazy range of values.

Is the purpose of this just to avoid the user potentially needing 
to evaluate the key function twice?


 * SortedRange should add a method called group(). Invoked with 
 no predicate, group() should do what chunkBy does, using the 
 sorting predicate.

Will need to be called something else since there may be existing 
code trying to call std.algorithm.group using UFCS. This would 
change its behaviour.
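A small D sketch of the UFCS concern. In D a member function shadows a free function of the same name, and UFCS is only tried when member lookup fails, so adding a member group() to SortedRange would silently change what r.group() means. It assumes SortedRange still has no member named group; SortedWithGroup is a hypothetical stand-in, not Phobos code.

```d
// SortedWithGroup is a hypothetical stand-in for a SortedRange that has
// gained a member named group(); it is not Phobos code.
import std.algorithm.iteration : group;
import std.range : assumeSorted;
import std.stdio : writeln;

struct SortedWithGroup
{
    int[] data;

    // Minimal input-range interface, so the free function group() applies too.
    bool empty() const { return data.length == 0; }
    int front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }

    // A member with the same name as std.algorithm.iteration.group.
    string group() { return "member group()"; }
}

void main()
{
    // Assuming SortedRange has no member named group, UFCS rewrites this call
    // to std.algorithm.iteration.group(s), yielding (value, count) tuples.
    auto s = assumeSorted([1, 1, 2, 3, 3, 3]);
    writeln(s.group());

    // With a member of that name, member lookup wins and UFCS is never tried.
    auto w = SortedWithGroup([1, 1, 2]);
    writeln(w.group());  // calls the member
    writeln(group(w));   // explicitly calls the free function
}
```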


 * aggregate() should detect the two kinds of results per group 
 (well, chunk) and process them accordingly: for unary-predicate 
 chunks, pass the key through and only process the lazy range. 
 Meaning:

 auto data = [
   tuple("John", 100),
   tuple("John", 35),
   tuple("Jane", 200),
   tuple("Jane", 87),
 ];
 auto r = data.chunkBy!(x => x[0]).aggregate!sum;

 yields a range of tuples: tuple("John", 135), tuple("Jane", 
 287).

Not sure I understand how this is meant to work.

With your second bullet implemented, data.chunkBy!(x => x[0]) 
will return:

tuple("John", [tuple("John", 100), tuple("John", 35)]),
tuple("Jane", [tuple("Jane", 200), tuple("Jane", 87)])

(here [...] denotes the sub-range, not an array).

So aggregate will ignore the key part, but how does it know to 
ignore the name in sub-ranges?
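Peter's question can be made concrete with a short D sketch, assuming the unary chunkBy form from the second bullet (key plus lazy sub-range). Without a special aggregate(), the caller keeps the key and explicitly projects the numeric field out of each sub-range before summing; totalsByName is a hypothetical helper.

```d
// Assumes chunkBy's unary form yields tuple(key, subRange);
// totalsByName is a hypothetical helper.
import std.algorithm.iteration : chunkBy, map, sum;
import std.array : array;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

// The key comes from chunkBy, but each sub-range still holds whole
// tuple(name, amount) records, so the amount is projected before summing.
auto totalsByName(Tuple!(string, int)[] data)
{
    return data
        .chunkBy!(x => x[0])                              // tuple(key, subRange)
        .map!(g => tuple(g[0], g[1].map!(x => x[1]).sum)) // keep key, sum amounts
        .array;
}

void main()
{
    auto data = [
        tuple("John", 100),
        tuple("John", 35),
        tuple("Jane", 200),
        tuple("Jane", 87),
    ];
    writeln(totalsByName(data));
}
```

The result pairs each name with its total: John with 135 and Jane with 287.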

Feb 13 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/13/15 3:45 PM, Peter Alexander wrote:
 On Friday, 13 February 2015 at 18:32:35 UTC, Andrei Alexandrescu wrote:
 * Perhaps rename groupBy to chunkBy. People coming from SQL and other
 languages might expect groupBy to do hash-based grouping.

 Agreed.


 * The unary function implementation must return for each group a tuple
 consisting of the key and the lazy range of values. The binary
 function implementation should continue to only return the lazy range
 of values.

 Is the purpose of this just to avoid the user potentially needing to
 evaluate the key function twice?

Yah. Also in many cases of grouping you need the key anyway.
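To illustrate the point about not evaluating the key twice, here is a hedged D sketch under the same assumption about chunkBy's unary form. With a binary predicate each chunk is only a sub-range, so the caller has to re-derive the key from the chunk's first element. The unary form hands the key over directly. keysViaBinary and keysViaUnary are hypothetical helpers.

```d
// Assumes chunkBy's unary form yields tuple(key, subRange) and its binary
// form yields plain sub-ranges; keysViaBinary/keysViaUnary are hypothetical.
import std.algorithm.iteration : chunkBy, map;
import std.array : array;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

alias Row = Tuple!(string, int);

// Binary predicate: chunks carry no key, so it is recomputed from g.front.
string[] keysViaBinary(Row[] data)
{
    return data
        .chunkBy!((a, b) => a[0] == b[0])
        .map!(g => g.front[0])
        .array;
}

// Unary key function: chunkBy already computed the key; read it as g[0].
string[] keysViaUnary(Row[] data)
{
    return data
        .chunkBy!(x => x[0])
        .map!(g => g[0])
        .array;
}

void main()
{
    auto data = [tuple("John", 100), tuple("John", 35), tuple("Jane", 200)];
    writeln(keysViaBinary(data));
    writeln(keysViaUnary(data));
}
```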

 * SortedRange should add a method called group(). Invoked with no
 predicate, group() should do what chunkBy does, using the sorting
 predicate.

 Will need to be called something else since there may be existing code
 trying to call std.algorithm.group using UFCS. This would change its
 behaviour.

Oops, I thought that's groups. I guess we could call it groupBy as well, 
even though it has no predicate, so "by" does not participate in a sentence.

 * aggregate() should detect the two kinds of results per group (well,
 chunk) and process them accordingly: for unary-predicate chunks, pass
 the key through and only process the lazy range. Meaning:

 auto data = [
   tuple("John", 100),
   tuple("John", 35),
   tuple("Jane", 200),
   tuple("Jane", 87),
 ];
 auto r = data.chunkBy!(x => x[0]).aggregate!sum;

 yields a range of tuples: tuple("John", 135), tuple("Jane", 287).

 Not sure I understand how this is meant to work.

 With your second bullet implemented, data.chunkBy!(x => x[0]) will return:

 tuple("John", [tuple("John", 100), tuple("John", 35)]),
 tuple("Jane", [tuple("Jane", 200), tuple("Jane", 87)])

Correct.

 (here [...] denotes the sub-range, not an array).

 So aggregate will ignore the key part, but how does it know to ignore
 the name in sub-ranges?

Oops, I was wrong here. Let's think about aggregate() integration 
post-2.067 and remove it for now.

Peter, could you please take this?


Andrei

Feb 14 2015

"Peter Alexander" <peter.alexander.au gmail.com> writes:

On Saturday, 14 February 2015 at 19:39:44 UTC, Andrei 
Alexandrescu wrote:
 Peter, could you please take this?

Yep, I have some time.

https://issues.dlang.org/show_bug.cgi?id=14183

Feb 15 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/15/15 11:34 AM, Peter Alexander wrote:
 On Saturday, 14 February 2015 at 19:39:44 UTC, Andrei Alexandrescu wrote:
 Peter, could you please take this?

 Yep, I have some time.

 https://issues.dlang.org/show_bug.cgi?id=14183

Fantastic, thanks! Remember we plan to release on March 1. -- Andrei

Feb 15 2015

"Ulrich Küttler" <kuettler gmail.com> writes:

On Sunday, 15 February 2015 at 19:42:16 UTC, Andrei Alexandrescu 
wrote:
 On 2/15/15 11:34 AM, Peter Alexander wrote:
 https://issues.dlang.org/show_bug.cgi?id=14183

 Fantastic, thanks! Remember we plan to release on March 1. --

I am somewhat confused. I know these changes have been done. The 
function has been renamed to chunkBy and the return type of the 
unary version has been changed. I am surprised to learn, however, 
that the old version is included in the 2.067.0 release:



What has happened?

Apr 17 2015

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 4/17/15 2:30 PM, "Ulrich Küttler" <kuettler gmail.com> wrote:
 On Sunday, 15 February 2015 at 19:42:16 UTC, Andrei Alexandrescu wrote:
 On 2/15/15 11:34 AM, Peter Alexander wrote:
 https://issues.dlang.org/show_bug.cgi?id=14183

 Fantastic, thanks! Remember we plan to release on March 1. --

 I am somewhat confused. I know these changes have been done. The
 function has been renamed to chunkBy and the return type of the unary
 version has been changed. I am surprised to learn, however, that the old
 version is included in the 2.067.0 release:



 What has happened?

Sighhhh... the master has chunkBy but 2.067.0 has groupBy. Martin? -- Andrei

Apr 17 2015

"w0rp" <devw0rp gmail.com> writes:

I wonder what it's going to look like to see byChunk and chunkBy 
next to each other.

Apr 18 2015
