digitalmars.D - finish function for output ranges

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
N.B. I haven't yet reviewed the proposal.

There's been a lot of discussion about the behavior of hash 
accumulators, and I've just had a chat with Walter about it.

There are two angles in the discussion:

1. One is, the hash accumulator should work as an operand in an 
accumulation expression. Then the reduce() algorithm can be used as follows:

HashAccumulator ha;
reduce!((a, b) => a + b)(ha, [1, 2, 3]);
writeln(ha.finish());

This assumes the hash overloads operator +.

2. The other is, the hash accumulator is an output range - a sink! - 
that supports put() for a lot of stuff. Then the code would go:

HashAccumulator ha;
copy([1, 2, 3], ha);
writeln(ha.finish());

I think (2) is a much more fertile view than (1) because the notion of 
"reduce" emphasizes the accumulation operation (such as "+"), and that 
is a forced notion for hashes (we're not really adding stuff there). In 
contrast, the notion that the hash accumulator is a sink is very 
natural: you just dump a lot of stuff into the accumulator, and then you 
call finish and you get its digest.

So, where does this leave us?

I think we should reify the notion of finish() as an optional method for 
output ranges. We define in std.range a free finish(r) function that 
does nothing if r does not define a finish() method, and invokes the 
method if r does define it.

Then people can call r.finish() for all output ranges no problem.
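A minimal sketch of how such a free function might look, assuming it checks for a member with `__traits(hasMember, ...)`. `SumSink` and `NullSink` are hypothetical types made up for illustration, and nothing here is an actual std.range API:

```d
import std.range.primitives : put;

// Hypothetical sink that defines its own finish().
struct SumSink
{
    int total;
    void put(int x) { total += x; }
    int finish() { return total; }
}

// Hypothetical sink with no finish() at all.
struct NullSink
{
    void put(int) {}
}

// Free finish(r): forwards to the member when the type has one,
// otherwise compiles to a no-op that returns void.
auto finish(R)(ref R r)
{
    static if (__traits(hasMember, R, "finish"))
        return r.finish();
}

void main()
{
    SumSink s;
    foreach (x; [1, 2, 3])
        put(s, x);
    assert(finish(s) == 6); // free function forwards to the member

    NullSink n;
    put(n, 1);
    n.finish(); // UFCS picks the free function, which does nothing
}
```

Because member lookup wins over UFCS, `r.finish()` keeps calling the member where one exists and falls back to the free function everywhere else.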

For files, finish() should close the file (or at least flush it - 
unclear on that). I also wonder whether there exists a better name than 
finish(), and how to handle cases in which e.g. you finish() an output 
range and then you put more stuff into it, or you finish() a range 
several times, etc.


Destroy!

Andrei
Aug 11 2012
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, August 11, 2012 19:29:53 Andrei Alexandrescu wrote:
 I also wonder whether there exists a better name than
 finish()

finish is what I've used for similar functions in the past. It seems like a fine name to me.
 and how to handle cases in which e.g. you finish() an output
 range and then you put more stuff into it, or you finish() a range
 several times, etc.

In all of the cases that I've dealt with where anything like finish is required, it's made no sense whatsoever to call finish multiple times.
 Destroy!

Overall, seems like a sensible idea to me. - Jonathan M Davis
Aug 11 2012
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:

On Sat, 2012-08-11 at 19:29 -0400, Andrei Alexandrescu wrote:
[…]
 I think (2) is a much more fertile view than (1) because the notion of
 "reduce" emphasizes the accumulation operation (such as "+"), and that
 is a forced notion for hashes (we're not really adding stuff there). In
 contrast, the notion that the hash accumulator is a sink is very
 natural: you just dump a lot of stuff into the accumulator, and then you
 call finish and you get its digest.

One could also consider the hash generator to be a builder, which would support 2 rather than 1.

-- 
Russel.
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Aug 12 2012
prev sibling next sibling parent reply "Daniel" <wyrlon gmx.net> writes:
On Saturday, 11 August 2012 at 23:29:57 UTC, Andrei Alexandrescu 
wrote:
 N.B. I haven't yet reviewed the proposal.

 For files, finish() should close the file (or at least flush it 
 - unclear on that). I also wonder whether there exists a better 
 name than finish(), and how to handle cases in which e.g. you 
 finish() an output range and then you put more stuff into it, 
 or you finish() a range several times, etc.

How about naming "finish", flush? Which is unambiguous...
Aug 12 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/12/2012 1:25 AM, Daniel wrote:
 How about naming "finish", flush? Which is unambiguous...

We discussed that and rejected it, because "flush" has connotations of being an intermediate operation, not a final one.
Aug 12 2012
prev sibling next sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Sat, 11 Aug 2012 19:29:53 -0400
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 2. The other is, the hash accumulator is an output range - a sink! - 
 that supports put() for a lot of stuff. Then the code would go:
 
 HashAccumulator ha;
 copy([1, 2, 3], ha);

This is a little off topic, but when I implemented the recent changes for std.hash I noticed the above code doesn't work, as ha is passed by value. You currently have to do this:

ha = copy([1, 2, 3], ha);
// or
copy([1, 2, 3], &ha);

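The by-value behavior can be seen with a small sketch. `CountingSink` is a hypothetical stand-in for the hash accumulator; only `std.algorithm.copy` is a real Phobos function here:

```d
import std.algorithm : copy;

// Hypothetical output range that just counts what it receives.
struct CountingSink
{
    size_t count;
    void put(int) { ++count; }
}

void main()
{
    CountingSink a;
    copy([1, 2, 3], a);     // a is copied into the call; the original is untouched
    assert(a.count == 0);

    CountingSink b;
    b = copy([1, 2, 3], b); // copy returns the filled target, so assign it back
    assert(b.count == 3);

    CountingSink c;
    copy([1, 2, 3], &c);    // a pointer to a struct also works as an output range
    assert(c.count == 3);
}
```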
 I think we should reify the notion of finish() as an optional method
 for output ranges. We define in std.range a free finish(r) function
 that does nothing if r does not define a finish() method, and invokes
 the method if r does define it.
 
 Then people can call r.finish() for all output ranges no problem.

Sounds good.
 
 For files, finish() should close the file (or at least flush it - 
 unclear on that). I also wonder whether there exists a better name
 than finish(), and how to handle cases in which e.g. you finish() an
 output range and then you put more stuff into it, or you finish() a
 range several times, etc.

The current behavior in std.hash is to reset the 'HashAccumulator' to its initial state after finish was called, so it can be reused. Finish does some computation which leaves the 'HashAccumulator' in an invalid state and resetting it is cheap, so I thought an implicit reset is convenient. I'm not sure about files.

The data property in Appender is also similar, but it doesn't modify the internal state AFAIK, so it's possible to continue using Appender after accessing data. It's probably more a peek function than a finish function.

Probably we should distinguish between finish functions which destroy the internal state and peek functions which do not modify the internal state. The implicit reset done in the hash finish functions would probably have to be removed then. The downside of this is that it's then possible to have a 'HashAccumulator' with invalid state, so 'put' would have to check for that (at least in debug mode). With the implicit reset it's not possible to get a 'HashAccumulator' with invalid state.
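The two options can be sketched side by side. Both structs below are hypothetical toys using an FNV-1a style mix, not the std.hash implementation; `ResettingHash` resets itself in finish(), while `StrictHash` keeps a flag that put() checks:

```d
enum uint fnvSeed = 2166136261u;  // FNV-1a offset basis
enum uint fnvPrime = 16777619u;   // FNV-1a prime

// Variant 1: finish() implicitly resets, so the object is always valid.
struct ResettingHash
{
    private uint state = fnvSeed;

    void put(ubyte b) { state = (state ^ b) * fnvPrime; }

    uint finish()
    {
        immutable digest = state;
        state = fnvSeed; // implicit reset: ready for reuse
        return digest;
    }
}

// Variant 2: no reset; put() and finish() must guard against a finished object.
struct StrictHash
{
    private uint state = fnvSeed;
    private bool finished;

    void put(ubyte b)
    {
        assert(!finished, "put() after finish()");
        state = (state ^ b) * fnvPrime;
    }

    uint finish()
    {
        assert(!finished, "finish() called twice");
        finished = true;
        return state;
    }
}

void main()
{
    ubyte[] data = [1, 2, 3];

    ResettingHash r;
    foreach (b; data) r.put(b);
    immutable first = r.finish();
    foreach (b; data) r.put(b);  // reuse after finish is fine
    assert(r.finish() == first);

    StrictHash s;
    foreach (b; data) s.put(b);
    assert(s.finish() == first); // same digest, but s is now unusable
}
```

The strict variant pays for an extra flag and a check in put() (compiled out in release builds), which is the cost described above.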
Aug 12 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/12/2012 1:36 AM, Johannes Pfau wrote:
 Probably we should distinguish between finish functions which destroy
 the internal state and peek functions which do not modify the internal
 state.

I worry about too many parts to an interface. A peek function needs a really strong use case to justify it.
Aug 12 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12-Aug-12 14:35, Jonathan M Davis wrote:
 On Sunday, August 12, 2012 03:20:49 Walter Bright wrote:
 On 8/12/2012 1:36 AM, Johannes Pfau wrote:
 Probably we should distinguish between finish functions which destroy
 the internal state and peek functions which do not modify the internal
 state.

I worry about too many parts to an interface. A peek function needs a really strong use case to justify it.

The big question is whether it's merited in output ranges in general. A function could still be worth having on an individual type without making sense for output ranges in general as long as it's not required to use the type. However, I really don't think that peek makes sense for output ranges in general. Most of the time, you're just writing to them and not worrying about what's already been written. It's basically the same as when you write to a stream. I don't think that I've seen a peek function on an output stream.

Agreed. The current Appender has .data (a la peek) just because of implementation details that allow it. In fact there was a pull request for Phobos including a better Appender (that would end up being a Builder, I guess) that doesn't allow peeking at the array under construction until the very end, but provides a much better performance profile. BTW, what happened with that pull? I recall the github nickname sandford, but can't recall whose awesome work that was. I'd love to see it make its way into Phobos.
 And I really don't think that peek makes sense in the context of hashes. You
 don't care what a hash is until it's finished. And once it's finished, it really
 doesn't make sense to keep adding to it. I don't know why you'd ever want an
 intermediate hash result.

Having partial hashes over data is very useful, e.g. for fast binary diff algorithms. However, it requires a specific form of hash function (so that you can look at the result) and/or operating at a specific block granularity.
 If you call finish, you're done. And if finish gets called again, it's just like
 if you call popFront after an input range is empty. The behavior is undefined
 (though popFront - or finish in this case - probably has an assertion for
 checking in non-release mode). I really don't think that it's all that big a
 deal.

Agreed. finish seems like a good name because it indicates the end of an operation. But consider that a hash accumulator can be reset after finish and reused (unlike, say, a network stream). -- Dmitry Olshansky
Aug 12 2012
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, August 12, 2012 03:20:49 Walter Bright wrote:
 On 8/12/2012 1:36 AM, Johannes Pfau wrote:
 Probably we should distinguish between finish functions which destroy
 the internal state and peek functions which do not modify the internal
 state.

I worry about too many parts to an interface. A peek function needs a really strong use case to justify it.

The big question is whether it's merited in output ranges in general. A function could still be worth having on an individual type without making sense for output ranges in general as long as it's not required to use the type. However, I really don't think that peek makes sense for output ranges in general. Most of the time, you're just writing to them and not worrying about what's already been written. It's basically the same as when you write to a stream. I don't think that I've seen a peek function on an output stream.

And I really don't think that peek makes sense in the context of hashes. You don't care what a hash is until it's finished. And once it's finished, it really doesn't make sense to keep adding to it. I don't know why you'd ever want an intermediate hash result.

If you call finish, you're done. And if finish gets called again, it's just like if you call popFront after an input range is empty. The behavior is undefined (though popFront - or finish in this case - probably has an assertion for checking in non-release mode). I really don't think that it's all that big a deal. - Jonathan M Davis
Aug 12 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, August 12, 2012 14:57:00 Dmitry Olshansky wrote:
 Agreed. Current appender has .data (a-la peek) just because of
 implementation details that allow it. In fact there was a pull for
 Phobos including a better Appender (that would end up being a Builder I
 guess) that doesn't allow to peek at array in creation until the very
 end but provides much better performance profile.
 
 BTW what's happened with that pull? I recall github nickname sandford,
 but can't recall whose awesome work that was. I'd love to see it make
 its way into Phobos.

I'm not sure. He was basically told to redo it as ArrayBuilder rather than changing Appender, since changing Appender would break a lot of code, and his changes arguably made it so that it wasn't an appender anymore anyway. But he has yet to post a new pull request with those changes. - Jonathan M Davis
Aug 12 2012
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12-Aug-12 03:29, Andrei Alexandrescu wrote:
 N.B. I haven't yet reviewed the proposal.

 There's been a lot of discussion about the behavior of hash
 accumulators, and I've just had a chat with Walter about it.

 There are two angles in the discussion:

 1. One is, the hash accumulator should work as an operand in an
 accumulation expression. Then the reduce() algorithm can be used as
 follows:

 HashAccumulator ha;
 reduce!((a, b) => a + b)(ha, [1, 2, 3]);
 writeln(ha.finish());

 This assumes the hash overloads operator +.

Would have been nice to have operator '<-' for "place" *LOL* :) (OT: I think C++ would be a much better place if it had it for e.g. iostream ...)
 I think we should reify the notion of finish() as an optional method for
 output ranges. We define in std.range a free finish(r) function that
 does nothing if r does not define a finish() method, and invokes the
 method if r does define it.

 Then people can call r.finish() for all output ranges no problem.

 For files, finish() should close the file (or at least flush it -
 unclear on that).

Easy to check:

{
    File f = File("myfile", "w");
    auto sink = f.lockingTextWriter;
    dumpTo(sink, some_vars);
    dumpTo(sink, some_other_vars);
    sink.finish(); // would be taking on f's job to close file
    return f;      // and now what? clearly f is the one responsible
                   // (with the means to transfer that responsibility)
}

So IMO ranges should not step down to the topology and origins of data, be it an output range or an input range. This also means that with streams, finish is a flush, and thus I'd expect finish to be callable many times in a row.
 Destroy!

One thing I don't like about it is its by-hand nature. The manual way is good only when you are interested in the result of finish. I half expect to see rule #X of the D coding standard: use RAII or scope(exit) to flush an output range. -- Dmitry Olshansky
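A minimal sketch of the scope(exit) variant of that rule, assuming a hypothetical `LineSink` type with a finish() method. The guard runs on both normal return and exception:

```d
// Hypothetical output range that records whether finish() ran.
struct LineSink
{
    string[] lines;
    bool finished;
    void put(string s) { lines ~= s; }
    void finish() { finished = true; }
}

void fill(ref LineSink sink, bool fail)
{
    scope (exit) sink.finish(); // runs however this function is left
    sink.put("header");
    if (fail)
        throw new Exception("write failed");
    sink.put("body");
}

void main()
{
    LineSink ok;
    fill(ok, false);
    assert(ok.finished && ok.lines.length == 2);

    LineSink bad;
    try
    {
        fill(bad, true);
    }
    catch (Exception)
    {
        // swallowed on purpose for the demonstration
    }
    assert(bad.finished && bad.lines.length == 1);
}
```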
Aug 12 2012
prev sibling next sibling parent José Armando García Sancio <jsancio gmail.com> writes:
On Sat, Aug 11, 2012 at 4:29 PM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 For files, finish() should close the file (or at least flush it - unclear on
 that). I also wonder whether there exists a better name than finish(), and
 how to handle cases in which e.g. you finish() an output range and then you
 put more stuff into it, or you finish() a range several times, etc.

As another data point, std.log has a Logger interface which defines log(...) and flush(...), so this fits perfectly into this output range design. Here are the current signatures:

shared void log(const ref LogMessage message);
shared void flush();

finish would be a weird name for 'interface Logger' because it is not a final operation.

For what it is worth, Java's OutputStream has the following:

void write(...) // overloaded three times
void flush()    // "Flushes this output stream and forces any buffered output bytes to be written out."
void close()    // "Closes this output stream and releases any system resources associated with this stream."
Aug 12 2012
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 12/08/12 00:29, Andrei Alexandrescu wrote:
 I think we should reify the notion of finish() as an optional method for output
 ranges. We define in std.range a free finish(r) function that does nothing if r
 does not define a finish() method, and invokes the method if r does define it.

 Then people can call r.finish() for all output ranges no problem.

What about a start() method? You may recall that in the RandomSample revisions I had to introduce a tweak to ensure that the first value returned by front() was set only the first time front() was called, and not in the constructor. The idea of the start() method would be to address this requirement, i.e. to do something immediately before front() gets called for the first time and not earlier.
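Roughly what I have in mind, as a sketch only. `LazyCounter` is a hypothetical range invented for illustration (it is not the RandomSample code), and start() here is a private helper rather than an agreed range primitive:

```d
// Hypothetical infinite range whose first element is set lazily.
struct LazyCounter
{
    private int current;
    private bool started;
    int startCalls; // exposed only so the demo can observe start()

    // Runs once, right before the first access to front or popFront.
    private void start()
    {
        current = 10;
        started = true;
        ++startCalls;
    }

    enum bool empty = false;

    @property int front()
    {
        if (!started) start();
        return current;
    }

    void popFront()
    {
        if (!started) start();
        ++current;
    }
}

void main()
{
    LazyCounter r;
    assert(r.startCalls == 0); // construction does no work
    assert(r.front == 10);     // first front triggers start()
    assert(r.front == 10);
    r.popFront();
    assert(r.front == 11);
    assert(r.startCalls == 1); // start() ran exactly once
}
```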
Aug 12 2012
prev sibling next sibling parent reply "Mehrdad" <wfunction hotmail.com> writes:
On Saturday, 11 August 2012 at 23:29:57 UTC, Andrei Alexandrescu 
wrote:
 This assumes the hash overloads operator +.

Wouldn't "~" be a better choice?
Aug 15 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/15/12 6:27 AM, Mehrdad wrote:
 On Saturday, 11 August 2012 at 23:29:57 UTC, Andrei Alexandrescu wrote:
 This assumes the hash overloads operator +.

Wouldn't "~" be a better choice?

I think neither would be a good choice. Andrei
Aug 15 2012
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 13/08/12 01:08, Joseph Rushton Wakeling wrote:
 What about a start() method?

Was this a daft question? :-)
Aug 15 2012