digitalmars.D - finish function for output ranges

Andrei Alexandrescu (34/34) Aug 11 2012 N.B. I haven't yet reviewed the proposal.

Jonathan M Davis (7/13) Aug 11 2012 finish is what I've used for similar functions in the past. It seems lik...
Russel Winder (18/24) Aug 12 2012 =20
Daniel (3/9) Aug 12 2012 How about naming "finish", flush? Which is unambiguous...

Walter Bright (3/4) Aug 12 2012 We discussed that and rejected it, because "flush" has connotations of b...

Johannes Pfau (26/43) Aug 12 2012 This is a little off topic, but when I implemented the recent changes

Walter Bright (3/6) Aug 12 2012 I worry about too many parts to an interface.

Jonathan M Davis (19/27) Aug 12 2012 The big question is whether it's merited in output ranges in general. A

Dmitry Olshansky (17/43) Aug 12 2012 Agreed. Current appender has .data (a-la peek) just because of

Jonathan M Davis (6/15) Aug 12 2012 I'm not sure. He was basically told to redo it as ArrayBuilder rather th...

Dmitry Olshansky (23/42) Aug 12 2012 Would have been nice to have operator '<-' for "place" *LOL* :)
=?ISO-8859-1?Q?Jos=E9_Armando_Garc=EDa_Sancio?= (15/19) Aug 12 2012 As another data point, std.log has a Logger interface which defines
Joseph Rushton Wakeling (7/11) Aug 12 2012 What about a start() method? You may recall in the RandomSample revisio...
Mehrdad (3/4) Aug 15 2012 Wouldn't "~" be a better choice?

Andrei Alexandrescu (3/6) Aug 15 2012 I think neither would be a good choice.

Joseph Rushton Wakeling (2/3) Aug 15 2012 Was this a daft question? :-)

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

N.B. I haven't yet reviewed the proposal.

There's been a lot of discussion about the behavior of hash 
accumulators, and I've just have a chat with Walter about such.

There are two angles in the discussion:

1. One is, the hash accumulator should work as an operand in an 
accumulation expression. Then the reduce() algorithm can be used as follows:

HashAccumulator ha;
reduce!((a, b) => a + b)(ha, [1, 2, 3]);
writeln(ha.finish());

This assumes the hash overloads operator +.

2. The other is, the hash accumulator is an output range - a sink! - 
that supports put() for a lot of stuff. Then the code would go:

HashAccumulator ha;
copy([1, 2, 3], ha);
writeln(ha.finish());

I think (2) is a much more fertile view than (1) because the notion of 
"reduce" emphasizes the accumulation operation (such as "+"), and that 
is a forced notion for hashes (we're not really adding stuff there). In 
contrast, the notion that the hash accumulator is a sink is very 
natural: you just dump a lot of stuff into the accumulator, and then you 
call finish and you get its digest.

So, where does this leave us?

I think we should reify the notion of finish() as an optional method for 
output ranges. We define in std.range a free finish(r) function that 
does nothing if r does not define a finish() method, and invokes the 
method if r does define it.

Then people can call r.finish() for all output ranges no problem.

For files, finish() should close the file (or at least flush it - 
unclear on that). I also wonder whether there exists a better name than 
finish(), and how to handle cases in which e.g. you finish() an output 
range and then you put more stuff into it, or you finish() a range 
several times, etc.


Destroy!

Andrei

Aug 11 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday, August 11, 2012 19:29:53 Andrei Alexandrescu wrote:
 I also wonder whether there exists a better name than
 finish()

finish is what I've used for similar functions in the past. It seems like a
fine 
name to me.

 and how to handle cases in which e.g. you finish() an output
 range and then you put more stuff into it, or you finish() a range
 several times, etc.

In all of the cases that I've dealt with where anything like finish is 
required, it's made no sense whatsoever to call finish mulitple times.

 Destroy!

Overall, seems like a sensible idea to me.

- Jonathan M Davis

Aug 11 2012

Russel Winder <russel winder.org.uk> writes:

On Sat, 2012-08-11 at 19:29 -0400, Andrei Alexandrescu wrote:
[=E2=80=A6]
 I think (2) is a much more fertile view than (1) because the notion of=

=20
 "reduce" emphasizes the accumulation operation (such as "+"), and that=

=20
 is a forced notion for hashes (we're not really adding stuff there). In=

=20
 contrast, the notion that the hash accumulator is a sink is very=20
 natural: you just dump a lot of stuff into the accumulator, and then you=

=20
 call finish and you get its digest.

One could also consider the hash generator to be a builder, which would
support 2 rather than 1.

--=20
Russel.
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.n=
et
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

Aug 12 2012

"Daniel" <wyrlon gmx.net> writes:

On Saturday, 11 August 2012 at 23:29:57 UTC, Andrei Alexandrescu 
wrote:
 N.B. I haven't yet reviewed the proposal.

 For files, finish() should close the file (or at least flush it 
 - unclear on that). I also wonder whether there exists a better 
 name than finish(), and how to handle cases in which e.g. you 
 finish() an output range and then you put more stuff into it, 
 or you finish() a range several times, etc.

How about naming "finish", flush? Which is unambiguous...

Aug 12 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/12/2012 1:25 AM, Daniel wrote:
 How about naming "finish", flush? Which is unambiguous...


We discussed that and rejected it, because "flush" has connotations of being an 
intermediate operation, not a final one.

Aug 12 2012

Johannes Pfau <nospam example.com> writes:

Am Sat, 11 Aug 2012 19:29:53 -0400
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 2. The other is, the hash accumulator is an output range - a sink! - 
 that supports put() for a lot of stuff. Then the code would go:
 
 HashAccumulator ha;
 copy([1, 2, 3], ha);

This is a little off topic, but when I implemented the recent changes
for std.hash I noticed the above code doesn't work, as ha is passed by
value. You currently have to do this:
ha = copy([1, 2, 3], ha); //or
copy([1, 2, 3], &ha);

 I think we should reify the notion of finish() as an optional method
 for output ranges. We define in std.range a free finish(r) function
 that does nothing if r does not define a finish() method, and invokes
 the method if r does define it.
 
 Then people can call r.finish() for all output ranges no problem.

Sounds good.
 
 For files, finish() should close the file (or at least flush it - 
 unclear on that). I also wonder whether there exists a better name
 than finish(), and how to handle cases in which e.g. you finish() an
 output range and then you put more stuff into it, or you finish() a
 range several times, etc.

The current behavior in std.hash is to reset the 'HashAccumulator' to
it's initial state after finish was called, so it can be reused. Finish
does some computation which leaves the 'HashAccumulator' in an invalid
state and resetting it is cheap, so I thought an implicit reset is
convenient.

I'm not sure about files.

The data property in Appender is also similar, but it doesn't modify
the internal state AFAIK, so it's possible to continue using Appender
after accessing data. It's probably more a peek function than a finish
function.

Probably we should distinguish between finish functions which destroy
the internal state and peek functions which do not modify the internal
state.
The implicit reset done in the hash finish functions would probably
have to be removed then. The downside of this is that it's then possible
to have a 'HashAccumulator' with invalid state, so 'put' would have to
check for that (at least in debug mode). With the implicit reset it's
not possible to get a 'HashAccumulator' with invalid state.

Aug 12 2012

Walter Bright <newshound2 digitalmars.com> writes:

On 8/12/2012 1:36 AM, Johannes Pfau wrote:
 Probably we should distinguish between finish functions which destroy
 the internal state and peek functions which do not modify the internal
 state.

I worry about too many parts to an interface.

A peek function needs a really strong use case to justify it.

Aug 12 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, August 12, 2012 03:20:49 Walter Bright wrote:
 On 8/12/2012 1:36 AM, Johannes Pfau wrote:
 Probably we should distinguish between finish functions which destroy
 the internal state and peek functions which do not modify the internal
 state.

 
 I worry about too many parts to an interface.
 
 A peek function needs a really strong use case to justify it.

The big question is whether it's merited in output ranges in general. A 
function could still be worth having on an individual type without making 
sense for output ranges in general as long as it's not required to use the 
type.

However, I really don't think that peek makes sense for output ranges in 
general. Most of the time, you're just writing to them and not worrying about 
what's already been written. It's basically the same as when you write to a 
stream. I don't think that I've seen a peek function on an output stream.

And I really don't think that peek makes sense in the context of hashes. You 
don't care what a hash is until it's finished. And once it's finished, it
really 
doesn't make sense to keep adding to it. I don't know why you'd ever want an 
intermediate hash result.

If you call finish, you're done. And if finish gets called again, it's just
like 
if you call popFront after an input range is empty. The behavior is undefined 
(though popFront - or finish in this case - probably has an assertion for 
checking in non-release mode). I really don't think that it's all that big a 
deal.

- Jonathan M Davis

Aug 12 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 12-Aug-12 14:35, Jonathan M Davis wrote:
 On Sunday, August 12, 2012 03:20:49 Walter Bright wrote:
 On 8/12/2012 1:36 AM, Johannes Pfau wrote:
 Probably we should distinguish between finish functions which destroy
 the internal state and peek functions which do not modify the internal
 state.

 I worry about too many parts to an interface.

 A peek function needs a really strong use case to justify it.

 The big question is whether it's merited in output ranges in general. A
 function could still be worth having on an individual type without making
 sense for output ranges in general as long as it's not required to use the
 type.

 However, I really don't think that peek makes sense for output ranges in
 general. Most of the time, you're just writing to them and not worrying about
 what's already been written. It's basically the same as when you write to a
 stream. I don't think that I've seen a peek function on an output stream.

Agreed. Current appender has .data (a-la peek) just because of 
implementation details that allow it. In fact there was a pull for 
Phobos including s better Appender (that would end up being a Builder I 
guess) that doesn't allow to peek at array in creation until the very 
end but provides much better performance profile.

BTW what's happened with that pull? I recall github nickname sandford, 
but can't recall whose awesome work that was. I'd love to see it make 
its way into Phobos.

 And I really don't think that peek makes sense in the context of hashes. You
 don't care what a hash is until it's finished. And once it's finished, it
really
 doesn't make sense to keep adding to it. I don't know why you'd ever want an
 intermediate hash result.

Having partial hashes over data is very useful e.g. for fast binary diff 
algorithms. However it requires specific form of hash function (so that 
you can look at result) and/or operating at specific block granularity.

 If you call finish, you're done. And if finish gets called again, it's just
like
 if you call popFront after an input range is empty. The behavior is undefined
 (though popFront - or finish in this case - probably has an assertion for
 checking in non-release mode). I really don't think that it's all that big a
 deal.

Agreed. finish seems as a good name because indicates that it's an end 
of operation. But consider that hash accumulator can be reset after 
finish and reused (unlike say network stream).

-- 
Dmitry Olshansky

Aug 12 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Sunday, August 12, 2012 14:57:00 Dmitry Olshansky wrote:
 Agreed. Current appender has .data (a-la peek) just because of
 implementation details that allow it. In fact there was a pull for
 Phobos including s better Appender (that would end up being a Builder I
 guess) that doesn't allow to peek at array in creation until the very
 end but provides much better performance profile.
 
 BTW what's happened with that pull? I recall github nickname sandford,
 but can't recall whose awesome work that was. I'd love to see it make
 its way into Phobos.

I'm not sure. He was basically told to redo it as ArrayBuilder rather than 
changing Appender, since changing Appender would break a lot of code, and his 
changes arguably made it so that it wasn't an appender anymore anyway. But he 
has yet to post a new pull request with those changes.

- Jonathan M Davis

Aug 12 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 12-Aug-12 03:29, Andrei Alexandrescu wrote:
 N.B. I haven't yet reviewed the proposal.

 There's been a lot of discussion about the behavior of hash
 accumulators, and I've just have a chat with Walter about such.

 There are two angles in the discussion:

 1. One is, the hash accumulator should work as an operand in an
 accumulation expression. Then the reduce() algorithm can be used as
 follows:

 HashAccumulator ha;
 reduce!((a, b) => a + b)(ha, [1, 2, 3]);
 writeln(ha.finish());

 This assumes the hash overloads operator +.

Would have been nice to have operator '<-'  for "place" *LOL* :)
(OT: I think C++ would be a much better place if it had it for e.g. 
iostream ...)

 I think we should reify the notion of finish() as an optional method for
 output ranges. We define in std.range a free finish(r) function that
 does nothing if r does not define a finish() method, and invokes the
 method if r does define it.

 Then people can call r.finish() for all output ranges no problem.

 For files, finish() should close the file (or at least flush it -
 unclear on that).

Easy to check:
{
   File f = File("myfile", "w");
   auto sink = f.lockingTextWriter;
   dumpTo(sink, some_vars);
   dumpTo(sink, some_other_vars);
   sink.finish(); //would be taking on f's job to close file
   return f; //and now what? clearly f is the one responsible
// (with the means to transfer that responsibility)
}

So IMO ranges should not step down to topology & origins of data be it 
output range or input range. This also means that with streams, finish 
is a flush and thus I'd expect finish to be callable many times in row.


 Destroy!

One thing I don't like about it is a by-hand nature of it. Manual way is 
good only when you are interested in the result of finish.

I half expect to see rule #X of D coding standard:

use RAI or scope(exit) to flush an output range

-- 
Dmitry Olshansky

Aug 12 2012

=?ISO-8859-1?Q?Jos=E9_Armando_Garc=EDa_Sancio?= <jsancio gmail.com> writes:

On Sat, Aug 11, 2012 at 4:29 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 For files, finish() should close the file (or at least flush it - unclear on
 that). I also wonder whether there exists a better name than finish(), and
 how to handle cases in which e.g. you finish() an output range and then you
 put more stuff into it, or you finish() a range several times, etc.

As another data point, std.log has a Logger interface which defines
log(...) and flush (...) so this fits perfectly into this output range
design. Here are the current signatures:

shared void log(const ref LogMessage message);
shared void flush();

finish would be a weird name for 'interface Logger' because it is not
a final operation. For what it is worth, Java's OutputStream has the
following:

void write(...) // overloaded three times
void flush() // "Flushes this output stream and forces any buffered
output bytes to be written out."
void close() // "Closes this output stream and releases any system
resources associated with this stream."

Aug 12 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 12/08/12 00:29, Andrei Alexandrescu wrote:
 I think we should reify the notion of finish() as an optional method for output
 ranges. We define in std.range a free finish(r) function that does nothing if r
 does not define a finish() method, and invokes the method if r does define it.

 Then people can call r.finish() for all output ranges no problem.

What about a start() method?  You may recall in the RandomSample revisions I
had 
to introduce a tweak to ensure that the first value returned by front() was set 
only the first time front() was called, and not in the constructor.

The idea of the start() method would be to addresses this requirement, i.e. to 
do something immediately before front() gets called for the first time and not 
earlier.

Aug 12 2012

"Mehrdad" <wfunction hotmail.com> writes:

On Saturday, 11 August 2012 at 23:29:57 UTC, Andrei Alexandrescu 
wrote:
 This assumes the hash overloads operator +.

Wouldn't "~" be a better choice?

Aug 15 2012

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 8/15/12 6:27 AM, Mehrdad wrote:
 On Saturday, 11 August 2012 at 23:29:57 UTC, Andrei Alexandrescu wrote:
 This assumes the hash overloads operator +.

 Wouldn't "~" be a better choice?

I think neither would be a good choice.

Andrei

Aug 15 2012

Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:

On 13/08/12 01:08, Joseph Rushton Wakeling wrote:
 What about a start() method?

Was this a daft question? :-)

Aug 15 2012

D Programming

C/C++ Programming

Other

digitalmars.D - finish function for output ranges