www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Transience of .front in input vs. forward ranges

reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
Now that Andrei is back (I think?), I want to bring up this discussion
again, because I think it's important.

Recently, in another thread, it was found that std.algorithm.joiner
doesn't work properly with input ranges whose .front value is
invalidated by popFront(). Andrei stated that for input ranges .front
should not be assumed to return a persistent value, whereas for forward
ranges, .front can be assumed to be persistent. However, Jonathan
believes that .front should never be transient.

Obviously, both cannot be the case simultaneously. So we need to decide
exactly what it should be, because the current situation is subtly
broken, and this subtle brokenness is pervasive. For example, I recently
rewrote joiner to eliminate the assumption that .front is persistent,
only to discover that in the unittests, I can't use array() or equal()
(or, for that matter, writefln()), because they apparently all make this
assumption at some point (I didn't bother to find out exactly where).

In other words, right now input ranges really only work with arrays and
array-like objects. Not the generic ranges that Andrei has in mind in
his article "On Iteration". Many input ranges will subtly break, the
prime whipping boy example being byLine (which I hate to bring up
because it does not represent the full scope of such transient ranges),
a range that returns in-place permutations of an array, or anything that
reuses a buffer, really.

This situation isn't as simple as input ranges being transient and
forward ranges not, though. I want to bring another example besides the
dead horse byLine() to the spotlight. Let's say you have a range R that
spans all permutations of some starting array A. For efficiency reasons,
we don't want to allocate a new array every time we return a
permutation; so we have an internal buffer in R that holds the current
permutation, which is what .front returns. Then popFront() simply
permutes this buffer in-place. Something like this:

	struct AllPermutations(T) {
		T[] front, first;
		bool done;

		this(T[] initialBuf) {
			first = initialBuf;
			front = first.dup;
		}
		void popFront() {
			nextPermutation(current);
			if (equal(front, first))
				done = true;
		}
		 property bool empty() {
			return !done;
		}
	}

This is an input range, according to Andrei's definition. The value of
.front is transient, since popFront() modifies it in-place. According to
Jonathan's definition, however, this isn't a valid range for that very
reason.

Now consider what happens if we add this member:

	auto save() {
		AllPermutations!T copy;
		copy.front = this.front;
		copy.first = this.first;
		copy.done = this.done;
		return copy;
	}

This returns a separate instance of the same range, starting with the
current permutation, and ending with the original permutation, as
before. I submit that this makes it a forward range. However, this fails
to be a forward range under Andrei's definition, because forward ranges
require .front to be persistent. So we'd have to modify the range to be
something like this:

	struct AllPermutations(T) {
		T[] current, first;
		bool done;

		this(T[] initialBuf) {
			first = initialBuf;
			current = first.dup;
		}
		void popFront() {
			nextPermutation(current);
			if (equal(current, first))
				done = true;
		}
		 property bool empty() {
			return !done;
		}
		 property T[] front() {
			return current.dup; // <--- note this line
		}
		auto save() {
			AllPermutations!T copy;
			copy.front = this.front;
			copy.first = this.first;
			copy.done = this.done;
			return copy;
		}

	}

Note that now we have to duplicate the output array every time .front is
accessed. So whatever gains we may have had by using nextPermutation to
modify the array in-place is lost, just so that we can conform to an
arbitrary standard of what a forward range is.

Under Jonathan's definition, we'd have to incur this cost regardless of
whether we had save() or not, since .front is *always* required to be
persistent.

But I propose that the correct solution is to recognize that whether or
not .front is transient is orthogonal to whether a range is an input
range or a forward range. Many algorithms actually don't care if .front
is persistent or not (at least in theory), including equal(), find(),
map(), count(), reduce(), joiner(). Maybe they *currently* assume that
.front is persistent, but they are easily implementable *without* this
assumption (I have an implementation of joiner that doesn't make this
assumption).

Not allowing forward ranges to have transient .front values, or not
allowing transient .front values at all, will introduce the arbitrary
need for lots of implicit copying, which makes D code inefficient. It
will also limit the applicability of std.algorithm and std.range (if
said the cost of said copying is unacceptable for a particular
application).

The problem, of course, is how to check of a range has transient .front
or not. I proposed adding a .isTransient member to the range, but this
depends on the range implementor to remember to mark the range with
.isTransient=true, and we know that coding by convention is not
scalable. So here's another idea:

	template isTransient(R) if (isInputRange!R)
	{
		static if (is(typeof(R.front) : immutable(typeof(R.front))))
			enum isTransient = false;
		else
			enum isTransient = true;
	}

The idea is that value types are implicitly convertible to immutable,
and value types are non-transient. Reference types, if they can be made
immutable, ensures that calling popFront() will never invalidate them
(by definition of immutable).

To use byLine as an example, if we want to make it non-transient, we
simply have .front return string instead of char[]. Then we can safely
use it with any of the existing algorithms that assume that .front won't
mutate under them when popFront() is called.

Algorithms that *don't* need .front to be persistent can just take
input/forward/whatever ranges as usual.

In any case, whether we decide to do this or not, we NEED to set down a
clear, unambiguous definition of exactly what an input range is, and
what a forward range is, and what kind of assumptions may be safely made
regarding the value of .front. The current ambiguous situation is a
hiding place for subtle bugs, and is unacceptable.


T

-- 
Long, long ago, the ancient Chinese invented a device that lets them see
through walls. It was called the "window".
Oct 30 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 30/10/2012 23:29, H. S. Teoh a crit :
 Now that Andrei is back (I think?), I want to bring up this discussion
 again, because I think it's important.

Interesting post. It appears to me that invalidating .front is done for performances reasons most of the time. As this is the unsafe face of the coin, I strongly think that this shouldn't be the default behavior. To me the best solution is to provide a function or a property on ranges that provide the unsafe version of the range. It is easy to provide a default implementation as UFCS that is a NOOP so no code is being broken. With that design, it is possible to provide an always safe interface while allowing algorithm that can handle non persistent .front to run at full speed. As it is always safe to provide a range that persist .front when a range that do not persist it is expected, no code needs to be broken. Obviously, the performance boost will not show up in this case :D byLine and alike will have to be modified to follow this policy, but hopefully, no code should be broken in the process. If I use your permuting example, it should be done as follow : struct InvalidatingAllPermutations(T) { T[] current, first; bool done; this(T[] initialBuf) { first = initialBuf; current = first.dup; } void popFront() { nextPermutation(current); if (equal(current, first)) done = true; } property bool empty() { return !done; } property T[] front() { return current; } auto save() { AllPermutations!T copy; copy.front = this.front; copy.first = this.first; copy.done = this.done; return copy; } } struct AllPermutations(T) { InvalidatingAllPermutations!T innerRange; alias innerRange this; T[] current; this(T[] initialBuf) { innerRange = InvalidatingAllPermutations!T(initialBuf); current = innerRange.front.dup; } property T[] front() { return current; } void popFront() { innerRange.popFront(); current = innerRange.front.dup; } } I do think this is the way that conciliate safe as default but still allow to be fast when needed, without adding much on most code.
Oct 30 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 31/10/2012 01:56, H. S. Teoh a crit :
 +1. I like this better than my proposal.

The obvious drawback is that this make invalidating ranges harder to write. But they seems to be more the exception than the rule.
Oct 31 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 31/10/2012 18:19, H. S. Teoh a crit :
 On Wed, Oct 31, 2012 at 02:18:55PM +0100, deadalnix wrote:
 Le 31/10/2012 01:56, H. S. Teoh a crit :
 +1. I like this better than my proposal.

The obvious drawback is that this make invalidating ranges harder to write. But they seems to be more the exception than the rule.

Actually, there is another problem: many algorithms' output will be transient or not depending on the input range. For example, we could write map to use the transient version of byLine, say, but then the range that map returns will also be transient (no longer safe). IOW, transience of .front is transitive in some cases. This again makes things complicated: as soon as you use a single transient range, it makes downstream ranges transient as well. So we're back to the problem of how to mark the range as transient in a transparent way. :-(

No it isn't always transitive. But that is true that it is in some cases. This is why the code should default to the safe behavior. It is up to the programmer to ensure correctness when opting for an alternate behavior.
Oct 31 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 01/11/2012 05:30, H. S. Teoh a crit :
 On Thu, Nov 01, 2012 at 12:05:18AM +0100, deadalnix wrote:
 Le 31/10/2012 18:19, H. S. Teoh a crit :

 Actually, there is another problem: many algorithms' output will be
 transient or not depending on the input range. For example, we could
 write map to use the transient version of byLine, say, but then the
 range that map returns will also be transient (no longer safe).

 IOW, transience of .front is transitive in some cases. This again
 makes things complicated: as soon as you use a single transient
 range, it makes downstream ranges transient as well. So we're back to
 the problem of how to mark the range as transient in a transparent
 way. :-(

No it isn't always transitive. But that is true that it is in some cases. This is why the code should default to the safe behavior. It is up to the programmer to ensure correctness when opting for an alternate behavior.

But then can the alternate behaviour be used at all? For example, joiner will return a range which will be transient if the original range is transient. But since this is "unsafe" behaviour, joiner's implementation can't take advantage of the faster behaviour at all, since otherwise the user, not knowing that it is switching to, say, byLine.fast instead of the safe version of byLine, may assume that the resulting range is non-transient, but it is actually transient. So this nullifies the usefulness of having .fast, since many Phobos algorithms that would have been able to take advantage of .fast can't, because their result will be unsafe. In which case, is it even worth the bother to implement such a thing? I think we still haven't addressed the fundamental issue, that is, we need a way to differentiate between transient and non-transient .front values. Being able to work with transient values is the best, because it automatically also works for non-transient values, but some algos need to be able to assume non-transience. This core problem is still unsolved.

Would joiner be able to take advantage of this if it was known or not ?
Nov 01 2012
parent reply deadalnix <deadalnix gmail.com> writes:
This is up to the consumer to choose if .front can be oblitered on 
popFront, not to an intermediate algorithm.

joiner isn't a consumer, this is a  transformer . transformer have to 
propagate the .fast (I don't like this name, but this is unimportant for 
now) to its source.

Let'(s consider the following sheme :
source -> transformer (possibly several of them) -> consumer.

Now see example below :

auto transform(R)(R range) {
     struct Transfomer(R) {
         // Operations . . .

          property
         auto fast() {
             return transform(source.fast);
         }
     }
}

So the fast is used by the consumer and bubble throw all ranges that 
support it up to the source. Or is made a NOOP in the process in one of 
the transformers or the source do not support that optimization.

As said before, this is up to the consumer to know it it accept .front 
to be obliterated. In your case, writeln don't support that, so .fast 
must not be used with writeln.
Nov 01 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/3/12 7:21 PM, H. S. Teoh wrote:
 On Fri, Nov 02, 2012 at 04:17:10PM -0400, Jonathan M Davis wrote:
 On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:
 Ah, I see. That makes sense. So basically it's not the source (or
 any intermediate step) that decides whether to use the optimization,
 but the final consumer.

 Though now we have the issue that all intermediate ranges must
 propagate .fast, which is troublesome if every range has to do it
 manually. Can this be handled automatically by UFCS?

It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it.

I wish Andrei would give some input as to how we should proceed with this. I do consider this a major issue with ranges, because for efficiency reasons I often write ranges that have transient .front values, and they can lead to subtle bugs with the current implementation of std.algorithm. It would be good to settle on this issue one way or another. I'd be happy even if the decision is to say that transient ranges are invalid and shouldn't be considered "real" ranges. Anything is better than the current nebulous state of things which only leads to subtle bugs for the unwary.

I think at this point we should streamline and simplify ranges while addressing fundamental concepts (such as unbuffered ranges). Delving into defining increasingly niche attributes for ranges is important work, but to the extent we can make things simpler that would be a gain. In my view, transiency of .front has a lot to do with input ranges. An input range means that .front is evanescent upon calling .popBack; it has no lasting presence in memory, and invoking popBack means the next item will replace the current one. Fetching lines should be solved by using types; trafficking in char[] does not offer guarantees about the preservation of the content. In contrast, trafficking in string formalizes a binding agreement between the range and its user that the content is immutable. Andrei
Nov 04 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/4/12 11:16 AM, H. S. Teoh wrote:
 Fetching lines should be solved by using types; trafficking in
 char[] does not offer guarantees about the preservation of the
 content. In contrast, trafficking in string formalizes a binding
 agreement between the range and its user that the content is
 immutable.

I think that this may be oversimplifying the issue.

The express purpose here is to simplify instead of offering highly granular notions with many odd corner cases and combinations. So the risk of oversimplifying is indeed material.
 What about the
 example of a range of in-place permutations over an array? That range
 can be made a forward range, or even bidirectional (by reversing the
 sense of the comparison used by nextPermutation). But it is inherently a
 transient range.

That's an interesting example indeed. It would keep a T[] as its state and each popFront() would mutate that array to its next permutation. Implementing .save would be achieved by duplicating the array. I would argue this is a tenuous range to characterize as a forward range. Invoking .save on a "usual" forward range means saving the iteration state, with an understanding that the saved range will go through the same iteration steps as the initial one. But nonono, actually the saved range goes through DUPLICATES of the states that the original goes through. Clearly there is an isomorphism of the two sequences of states, but they're not the same states - for example, someone looking at the original array will see the original changing the array, but not the .save. So what you're looking at is a range that can define .dup, not one that can define .save.
 And this is merely one example. I have some non-trivial code that
 evaluates an expression tree, each node of which may generate multiple
 elements; one may take a range over all possible values that the tree
 may have. For efficiency reasons, each output value (which is
 vector-valued) reuses the same buffer. It is a forward range with a
 transient .front (it has to be a forward range, because otherwise one
 can't range over all combinations of multiple values that subtrees may
 produce).

I'm not sure about this example. Are we looking again at a .dup kind of thing?
 While I'm perfectly fine with the decision that such ranges shouldn't be
 considered proper ranges at all, it would greatly limit the
 applicability of std.algorithm in my code. Either I would have to
 artificially limit the range to an input range (which is a bit silly, as
 it amounts to renaming .save to something else just so std.algorithm
 wouldn't recognize it as a forward range), or I wouldn't be able to use
 std.algorithm to manipulate these ranges, but would have to roll my own.
 Of course, D makes it so that rolling my own isn't all that hard, but it
 would be a bit disappointing that the supposedly-generic algorithms in
 the standard library aren't that generic after all.

I understand. At the same time, there is a point where an abstraction can't comprehend all possible instances. There's already significant aggravation moveFront and hasLvalueElements etc., which I'd want to simplify given a chance. Continuing down that path seems risky to me. Andrei
Nov 04 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 02/11/2012 21:17, Jonathan M Davis a écrit :
 On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:
 Ah, I see. That makes sense. So basically it's not the source (or any
 intermediate step) that decides whether to use the optimization, but the
 final consumer.

 Though now we have the issue that all intermediate ranges must propagate
 .fast, which is troublesome if every range has to do it manually. Can
 this be handled automatically by UFCS?

It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it. - Jonathan M Davis

The whole point of my proposal is to avoid adding isTransient or something similar. Let's put back relevant elements of the solution I propose : 1/ range preserve .front by default . 2/ Ranges that would get performance boost from invalidating .front can propose a property (we called it .fast in the thread) that return a new range that invalidate .front . 3/ Range that don't support that feature can be made compatible with UFCS, .fast being a NOP in such case. 4/ Ranges that work as a transofmration on existing ranges can propose the property too if they don't require .front to be persistant. It is implemented by bubbling the property call to the source range. 5/ It is up to the ultimate consumer to call that property or not.
Nov 04 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/4/12 9:36 AM, deadalnix wrote:
 Let's put back relevant elements of the solution I propose :
 1/ range preserve .front by default .
 2/ Ranges that would get performance boost from invalidating .front can
 propose a property (we called it .fast in the thread) that return a new
 range that invalidate .front .

IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R) I think it would be vastly better to avoid defining a new property of ranges (that subsequently would need to be cross-checked by many algorithms). Then we have additional scalability issues because all derived ranges would need to propagate .fast etc. Then we need to cater for odd cases such as random-access ranges being .fast. The simpler way would be to simply state that input ranges can't guarantee the persistence of .front after calling .popFront. This is quite expected for input ranges, and no different for any object that gives a char[]-based API; subsequently changing the object may change the contents of the char[] returned. The algorithms that need to worry about .front's transience are only the ones that work with input ranges (as opposed to forward ranges or stronger). This approach puts the burden in the right place - on the implementers of specific algorithms supporting one-pass streaming. Andrei
Nov 04 2012
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11/4/2012 7:16 PM, Andrei Alexandrescu пишет:
 On 11/4/12 9:36 AM, deadalnix wrote:
 Let's put back relevant elements of the solution I propose :
 1/ range preserve .front by default .
 2/ Ranges that would get performance boost from invalidating .front can
 propose a property (we called it .fast in the thread) that return a new
 range that invalidate .front .

IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R)

Or rather onePass!R == ... but I seriously doubt that ability to save iteration state and the fact of reusing single slot/memory chunk for .front are always correlated.
 I think it would be vastly better to avoid defining a new property of
 ranges (that subsequently would need to be cross-checked by many
 algorithms). Then we have additional scalability issues because all
 derived ranges would need to propagate .fast etc. Then we need to cater
 for odd cases such as random-access ranges being .fast.

Neither .fast nor separate transient flag strike me as good solutions as placing a lot of burden on the range implementer (and introducing yet another convention).
 The simpler way would be to simply state that input ranges can't
 guarantee the persistence of .front after calling .popFront. This is
 quite expected for input ranges, and no different for any object that
 gives a char[]-based API; subsequently changing the object may change
 the contents of the char[] returned.

 The algorithms that need to worry about .front's transience are only the
 ones that work with input ranges (as opposed to forward ranges or
 stronger).

The fact that algorithm doesn't save iteration state != it counts on transient .front. A simple example of std.array.array will do - it iterates range once and pumps elements into appender/builder "sink". So it doesn't require forward range but just that elements are not _aliased_ under the hood. Another example is dirEntries - it returns entries with immutable strings, but it can't save iteration state. It does work with array because of no aliasing of .front values.
This approach puts the burden in the right place - on the
 implementers of specific algorithms supporting one-pass streaming.

Aside from mixing separate concepts into one* I like the approach. *InputRange vs ForwardRange has nothing to do with mutable indirections of .front. -- Dmitry Olshansky
Nov 04 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/4/12 11:36 AM, Dmitry Olshansky wrote:
 The fact that algorithm doesn't save iteration state != it counts on
 transient .front.

I didn't claim that. My strongest claim was: input-range-having-front-with-mutable-indirection IMPLIES no-transience-guarantee. Note the IMPLIES (not EQUIVALENT) and no guarantee as opposed to guaranteed change of contents.
 A simple example of std.array.array will do - it iterates range once and
 pumps elements into appender/builder "sink". So it doesn't require
 forward range but just that elements are not _aliased_ under the hood.

No surprise here.
 Another example is dirEntries - it returns entries with immutable
 strings, but it can't save iteration state. It does work with array
 because of no aliasing of .front values.

This example is actually nice because it shows that indeed an input range with NO mutable indirection on ElementType guarantees non-transience. Andrei
Nov 04 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11/4/2012 9:02 PM, Andrei Alexandrescu пишет:
 On 11/4/12 11:36 AM, Dmitry Olshansky wrote:
 The fact that algorithm doesn't save iteration state != it counts on
 transient .front.

I didn't claim that. My strongest claim was: input-range-having-front-with-mutable-indirection IMPLIES no-transience-guarantee.

Indeed, but it wasn't your claim that I meant. It's fine with me but it's hardly moving us forward. I was investigating your simplified proposal and its consequences for e.g. std.array.array. And apparently forgot to make the central point. I was mostly going by:
 The simpler way would be to simply state that input ranges can't 

and
The algorithms that need to worry about .front's transience are only 

stronger). And conclude that applying this simple rule to std.array.array* means it has to assume non-transient elements and somehow copy/dup'em. Problem is we don't have generic function to make 100% guaranteed unaliased copy (it could omit doing any work on non-aliased objects). Also that it misses forward ranges that have aliased .front, assuming they are non-transient. *It has to work with input ranges as I noted below.
 A simple example of std.array.array will do - it iterates range once and
 pumps elements into appender/builder "sink". So it doesn't require
 forward range but just that elements are not _aliased_ under the hood.

No surprise here.
 Another example is dirEntries - it returns entries with immutable
 strings, but it can't save iteration state. It does work with array
 because of no aliasing of .front values.

This example is actually nice because it shows that indeed an input range with NO mutable indirection on ElementType guarantees non-transience.

-- Dmitry Olshansky
Nov 04 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 04/11/2012 16:16, Andrei Alexandrescu a écrit :
 On 11/4/12 9:36 AM, deadalnix wrote:
 Let's put back relevant elements of the solution I propose :
 1/ range preserve .front by default .
 2/ Ranges that would get performance boost from invalidating .front can
 propose a property (we called it .fast in the thread) that return a new
 range that invalidate .front .

IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R) I think it would be vastly better to avoid defining a new property of ranges (that subsequently would need to be cross-checked by many algorithms).

You can stop here. This shows that you didn't understood the proposal. The whole point of the proposal is to avoid the cross checking part. See code sample posted earlier in the discussion for full understanding.
 Then we have additional scalability issues because all
 derived ranges would need to propagate .fast etc. Then we need to cater
 for odd cases such as random-access ranges being .fast.

The beauty of the proposal is that nothing NEED to be .fast .
 The simpler way would be to simply state that input ranges can't
 guarantee the persistence of .front after calling .popFront. This is
 quite expected for input ranges, and no different for any object that
 gives a char[]-based API; subsequently changing the object may change
 the contents of the char[] returned.

 The algorithms that need to worry about .front's transience are only the
 ones that work with input ranges (as opposed to forward ranges or
 stronger). This approach puts the burden in the right place - on the
 implementers of specific algorithms supporting one-pass streaming.

Nov 04 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/4/12 11:40 AM, deadalnix wrote:
 Le 04/11/2012 16:16, Andrei Alexandrescu a écrit :
 On 11/4/12 9:36 AM, deadalnix wrote:
 Let's put back relevant elements of the solution I propose :
 1/ range preserve .front by default .
 2/ Ranges that would get performance boost from invalidating .front can
 propose a property (we called it .fast in the thread) that return a new
 range that invalidate .front .

IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R) I think it would be vastly better to avoid defining a new property of ranges (that subsequently would need to be cross-checked by many algorithms).

You can stop here. This shows that you didn't understood the proposal. The whole point of the proposal is to avoid the cross checking part. See code sample posted earlier in the discussion for full understanding.

Indeed I'd misunderstood. So I went back and my current understanding is that it's all about defining this: auto transient(R r) if (isInputRange!R) { return r; } Then certain ranges would implement a property .transient if there's an opportunity for e.g. reusing buffers (a la ByLine and ByChunk). At the end of the day, simply invoking r.transient for any range r would either get the optimized range or would vacuously return that same range. As far as I can tell from the subsequent discussion, there are quite a few issues related to propagating transient by ranges that compose on top of other ranges. By and large, adding an optional attribute of ranges is difficult to make free or minimal in cost. Andrei
Nov 04 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 04/11/2012 17:57, Andrei Alexandrescu a écrit :
 Indeed I'd misunderstood. So I went back and my current understanding is
 that it's all about defining this:

 auto transient(R r) if (isInputRange!R)
 {
 return r;
 }

 Then certain ranges would implement a property .transient if there's an
 opportunity for e.g. reusing buffers (a la ByLine and ByChunk). At the
 end of the day, simply invoking r.transient for any range r would either
 get the optimized range or would vacuously return that same range.

Now we are on the same page. (and I like transient much better than .fast).
 As far as I can tell from the subsequent discussion, there are quite a
 few issues related to propagating transient by ranges that compose on
 top of other ranges. By and large, adding an optional attribute of
 ranges is difficult to make free or minimal in cost.

To sum it up, a transformation range should implement .transient if it can handle the fact that its source can be transient. The good news is that if it doesn't, then the program is still 100% correct. It is just slower. So this approach promote program correctness, but still allow crazy bearded programmer to optimize the code where needed. I think it fit nicely D's phylosophy, in the sense it does provide a safe, easy to use interface while providing a backdoor when this isn't enough.
Nov 04 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/4/12 12:26 PM, deadalnix wrote:
 I think it fit nicely D's phylosophy, in the sense it does provide a
 safe, easy to use interface while providing a backdoor when this isn't
 enough.

It doesn't fit the (admittedly difficult to fulfill) desideratum that the obvious code is safe and fast. And the obvious code uses byLine, not byLine.transient. Back to a simpler solution: what's wrong with adding alternative APIs for certain input ranges? We have byLine, byChunk, byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync and let the documentation explain the differences. Andrei
Nov 04 2012
parent deadalnix <deadalnix gmail.com> writes:
Le 04/11/2012 23:10, H. S. Teoh a écrit :
 On Sun, Nov 04, 2012 at 12:38:06PM -0500, Andrei Alexandrescu wrote:
 On 11/4/12 12:26 PM, deadalnix wrote:
 I think it fit nicely D's phylosophy, in the sense it does provide a
 safe, easy to use interface while providing a backdoor when this
 isn't enough.

It doesn't fit the (admittedly difficult to fulfill) desideratum that the obvious code is safe and fast. And the obvious code uses byLine, not byLine.transient.

Actually, it does. The proposal was to modify byLine so that it returns strings instead of a reused char[] buffer. The current version of byLine that has transient front will be moved into a .transient property. This has the following results: - All existing code doesn't break: it just becomes a tad slower from duplicating each line. Correctness is not broken. - Code that wants to be fast can call .transient at the end of the construction, e.g., stdin.byLine().map!myFunc.transient. Assuming that map propagates .transient (which also implies it's safe to use with transient ranges), this will return an optimized range that reuses the transient buffer safely. - Code that calls .transient on non-transient ranges get a no-op: [1,2,3].map!myFunc.transient is equivalent to [1,2,3].map!myFunc, because non-transient ranges just alias .transient to themselves (this can be done via UFCS so no actual change needs to be made to existing non-transient ranges). IOW, code is always 100% safe by default. You can optionally use .transient with certain ranges if you'd like for it to be faster. Using .transient where it isn't supported simply defaults to the usual safe behaviour (no performance increase, but no safety bugs either). Now, this can be taken one step further. User code need not even know what .transient is. As deadalnix said, the insight is that it is only invoked by range *consumers*. For example, one could in theory make canFind() transient-safe, so the user would write this: auto n = stdin.byLine().map!myFunc.canFind(myNeedle); and canFind uses the .transient range to do the finding. Assuming map!myFunc correctly supports .transient, this request for the faster range automatically propagates back to byLine(), and so the code is fast *without the user even asking for .transient*. And if the range being scanned is non-transient, then it behaves normally as it does today. For example, if map!myFunc does *not* support .transient, then when canFind asks for .transient, UFCS takes over and it just gets the (non-transient) mapped range back. Which in turn asks for the *non-transient* version of byLine(), so in no case will there be any safety breakage. I think this solution is most elegant so far, in the sense that: (1) Existing code doesn't break; (2) Existing code doesn't even need to change, or can be slowly optimized to use .transient on a case-by-case basis without any code breakage; (3) The user doesn't even need to know what .transient is to reap the benefits, where it's supported; (4) The algorithm writer decides whether or not .transient is supported, guaranteeing that there will be no hidden bugs caused by assuming .front is persistent and then getting a transient range passed to it. So it's the algorithm that declares whether or not it supports .transient.
 Back to a simpler solution: what's wrong with adding alternative
 APIs for certain input ranges? We have byLine, byChunk,
 byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync
 and let the documentation explain the differences.

This is less elegant than deadalnix's proposal, because there is a certain risk associated with using .transient ranges. Now the onus is on the user to make sure he doesn't pass a transient range to an algorithm that can't handle it. With deadalnix's proposal, it's the *algorithm* that decides whether or not it knows how to handle .transient properly. The user doesn't have to know, and can still reap the benefits. An algorithm that doesn't know how to deal with transient ranges simply does nothing, and UFCS takes over to provide the default .transient, which just returns the original non-transient range.

I think I'll hire you when I need to promote my ideas :D This is much better explained than what I did myself and is exactly what I had in mind.
Nov 05 2012