www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Transience of .front in input vs. forward ranges

reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
Now that Andrei is back (I think?), I want to bring up this discussion
again, because I think it's important.

Recently, in another thread, it was found that std.algorithm.joiner
doesn't work properly with input ranges whose .front value is
invalidated by popFront(). Andrei stated that for input ranges .front
should not be assumed to return a persistent value, whereas for forward
ranges, .front can be assumed to be persistent. However, Jonathan
believes that .front should never be transient.

Obviously, both cannot be the case simultaneously. So we need to decide
exactly what it should be, because the current situation is subtly
broken, and this subtle brokenness is pervasive. For example, I recently
rewrote joiner to eliminate the assumption that .front is persistent,
only to discover that in the unittests, I can't use array() or equal()
(or, for that matter, writefln()), because they apparently all make this
assumption at some point (I didn't bother to find out exactly where).

In other words, right now input ranges really only work with arrays and
array-like objects. Not the generic ranges that Andrei has in mind in
his article "On Iteration". Many input ranges will subtly break, the
prime whipping boy example being byLine (which I hate to bring up
because it does not represent the full scope of such transient ranges),
a range that returns in-place permutations of an array, or anything that
reuses a buffer, really.

This situation isn't as simple as input ranges being transient and
forward ranges not, though. I want to bring another example besides the
dead horse byLine() to the spotlight. Let's say you have a range R that
spans all permutations of some starting array A. For efficiency reasons,
we don't want to allocate a new array every time we return a
permutation; so we have an internal buffer in R that holds the current
permutation, which is what .front returns. Then popFront() simply
permutes this buffer in-place. Something like this:

	struct AllPermutations(T) {
		T[] front, first;
		bool done;

		this(T[] initialBuf) {
			first = initialBuf;
			front = first.dup;
		}
		void popFront() {
			nextPermutation(current);
			if (equal(front, first))
				done = true;
		}
		 property bool empty() {
			return !done;
		}
	}

This is an input range, according to Andrei's definition. The value of
.front is transient, since popFront() modifies it in-place. According to
Jonathan's definition, however, this isn't a valid range for that very
reason.

Now consider what happens if we add this member:

	auto save() {
		AllPermutations!T copy;
		copy.front = this.front;
		copy.first = this.first;
		copy.done = this.done;
		return copy;
	}

This returns a separate instance of the same range, starting with the
current permutation, and ending with the original permutation, as
before. I submit that this makes it a forward range. However, this fails
to be a forward range under Andrei's definition, because forward ranges
require .front to be persistent. So we'd have to modify the range to be
something like this:

	struct AllPermutations(T) {
		T[] current, first;
		bool done;

		this(T[] initialBuf) {
			first = initialBuf;
			current = first.dup;
		}
		void popFront() {
			nextPermutation(current);
			if (equal(current, first))
				done = true;
		}
		 property bool empty() {
			return !done;
		}
		 property T[] front() {
			return current.dup; // <--- note this line
		}
		auto save() {
			AllPermutations!T copy;
			copy.front = this.front;
			copy.first = this.first;
			copy.done = this.done;
			return copy;
		}

	}

Note that now we have to duplicate the output array every time .front is
accessed. So whatever gains we may have had by using nextPermutation to
modify the array in-place is lost, just so that we can conform to an
arbitrary standard of what a forward range is.

Under Jonathan's definition, we'd have to incur this cost regardless of
whether we had save() or not, since .front is *always* required to be
persistent.

But I propose that the correct solution is to recognize that whether or
not .front is transient is orthogonal to whether a range is an input
range or a forward range. Many algorithms actually don't care if .front
is persistent or not (at least in theory), including equal(), find(),
map(), count(), reduce(), joiner(). Maybe they *currently* assume that
.front is persistent, but they are easily implementable *without* this
assumption (I have an implementation of joiner that doesn't make this
assumption).

Not allowing forward ranges to have transient .front values, or not
allowing transient .front values at all, will introduce the arbitrary
need for lots of implicit copying, which makes D code inefficient. It
will also limit the applicability of std.algorithm and std.range (if
said the cost of said copying is unacceptable for a particular
application).

The problem, of course, is how to check of a range has transient .front
or not. I proposed adding a .isTransient member to the range, but this
depends on the range implementor to remember to mark the range with
.isTransient=true, and we know that coding by convention is not
scalable. So here's another idea:

	template isTransient(R) if (isInputRange!R)
	{
		static if (is(typeof(R.front) : immutable(typeof(R.front))))
			enum isTransient = false;
		else
			enum isTransient = true;
	}

The idea is that value types are implicitly convertible to immutable,
and value types are non-transient. Reference types, if they can be made
immutable, ensures that calling popFront() will never invalidate them
(by definition of immutable).

To use byLine as an example, if we want to make it non-transient, we
simply have .front return string instead of char[]. Then we can safely
use it with any of the existing algorithms that assume that .front won't
mutate under them when popFront() is called.

Algorithms that *don't* need .front to be persistent can just take
input/forward/whatever ranges as usual.

In any case, whether we decide to do this or not, we NEED to set down a
clear, unambiguous definition of exactly what an input range is, and
what a forward range is, and what kind of assumptions may be safely made
regarding the value of .front. The current ambiguous situation is a
hiding place for subtle bugs, and is unacceptable.


T

-- 
Long, long ago, the ancient Chinese invented a device that lets them see
through walls. It was called the "window".
Oct 30 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 30/10/2012 23:29, H. S. Teoh a crit :
 Now that Andrei is back (I think?), I want to bring up this discussion
 again, because I think it's important.

Interesting post. It appears to me that invalidating .front is done for performances reasons most of the time. As this is the unsafe face of the coin, I strongly think that this shouldn't be the default behavior. To me the best solution is to provide a function or a property on ranges that provide the unsafe version of the range. It is easy to provide a default implementation as UFCS that is a NOOP so no code is being broken. With that design, it is possible to provide an always safe interface while allowing algorithm that can handle non persistent .front to run at full speed. As it is always safe to provide a range that persist .front when a range that do not persist it is expected, no code needs to be broken. Obviously, the performance boost will not show up in this case :D byLine and alike will have to be modified to follow this policy, but hopefully, no code should be broken in the process. If I use your permuting example, it should be done as follow : struct InvalidatingAllPermutations(T) { T[] current, first; bool done; this(T[] initialBuf) { first = initialBuf; current = first.dup; } void popFront() { nextPermutation(current); if (equal(current, first)) done = true; } property bool empty() { return !done; } property T[] front() { return current; } auto save() { AllPermutations!T copy; copy.front = this.front; copy.first = this.first; copy.done = this.done; return copy; } } struct AllPermutations(T) { InvalidatingAllPermutations!T innerRange; alias innerRange this; T[] current; this(T[] initialBuf) { innerRange = InvalidatingAllPermutations!T(initialBuf); current = innerRange.front.dup; } property T[] front() { return current; } void popFront() { innerRange.popFront(); current = innerRange.front.dup; } } I do think this is the way that conciliate safe as default but still allow to be fast when needed, without adding much on most code.
Oct 30 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Oct 31, 2012 at 12:00:50AM +0100, deadalnix wrote:
[...]
 It appears to me that invalidating .front is done for performances
 reasons most of the time. As this is the unsafe face of the coin, I
 strongly think that this shouldn't be the default behavior.
 
 To me the best solution is to provide a function or a property on
 ranges that provide the unsafe version of the range.
 
 It is easy to provide a default implementation as UFCS that is a
 NOOP so no code is being broken.
 
 With that design, it is possible to provide an always safe interface
 while allowing algorithm that can handle non persistent .front to
 run at full speed.

Hmm. I like this idea! This is much better than my proposal. So let's say we change byLine() to always return a copy of the buffer, so that .front is never invalidated. Then for algorithms that don't care about .front being invalidated, they can do something like this: auto myAlgo(R)(R inputRange) { auto r = inputRange.fastRange; while (!r.empty) { auto x = r.front; // make use of x r.popFront(); // this invalidates x } return result; } void main() { myAlgo(stdin.byLine()); } We can use UFCS so that fastRange is always defined: // Default implementation auto fastRange(R)(R range) { return range; } For byLine(), then, we have: struct ByLine(...) { // safe implementation property auto fastRange() { struct UnsafeByLine(...) { ... } return UnsafeByLine(this); } } I like this. Very clean, and doesn't break backward compatibility, and allows select algorithms to be optimized for speed without affecting everything else. [...]
 If I use your permuting example, it should be done as follow :
 
 struct InvalidatingAllPermutations(T) {
 	T[] current, first;
 	bool done;
 
 	this(T[] initialBuf) {
 		first = initialBuf;
 		current = first.dup;
 	}
 
 	void popFront() {
 		nextPermutation(current);
 		if (equal(current, first))
 			done = true;
 	}
 
 	 property bool empty() {
 		return !done;
 	}
 
 	 property T[] front() {
 		return current;
 	}
 
 	auto save() {
 		AllPermutations!T copy;
 		copy.front = this.front;
 		copy.first = this.first;
 		copy.done = this.done;
 		return copy;
 	}
 }
 
 struct AllPermutations(T) {
 	InvalidatingAllPermutations!T innerRange;
 	alias innerRange this;
 
 	T[] current;
 
 	this(T[] initialBuf) {
 		innerRange = InvalidatingAllPermutations!T(initialBuf);
 		current = innerRange.front.dup;
 	}
 
 	 property T[] front() {
 		return current;
 	}
 
 	void popFront() {
 		innerRange.popFront();
 		current = innerRange.front.dup;
 	}
 }
 
 I do think this is the way that conciliate safe as default but still
 allow to be fast when needed, without adding much on most code.

+1. I like this better than my proposal. T -- Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye
Oct 30 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 31/10/2012 01:56, H. S. Teoh a crit :
 +1. I like this better than my proposal.

The obvious drawback is that this make invalidating ranges harder to write. But they seems to be more the exception than the rule.
Oct 31 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Oct 31, 2012 at 02:18:55PM +0100, deadalnix wrote:
 Le 31/10/2012 01:56, H. S. Teoh a crit :
+1. I like this better than my proposal.

The obvious drawback is that this make invalidating ranges harder to write. But they seems to be more the exception than the rule.

Actually, there is another problem: many algorithms' output will be transient or not depending on the input range. For example, we could write map to use the transient version of byLine, say, but then the range that map returns will also be transient (no longer safe). IOW, transience of .front is transitive in some cases. This again makes things complicated: as soon as you use a single transient range, it makes downstream ranges transient as well. So we're back to the problem of how to mark the range as transient in a transparent way. :-( T -- VI = Visual Irritation
Oct 31 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 31/10/2012 18:19, H. S. Teoh a crit :
 On Wed, Oct 31, 2012 at 02:18:55PM +0100, deadalnix wrote:
 Le 31/10/2012 01:56, H. S. Teoh a crit :
 +1. I like this better than my proposal.

The obvious drawback is that this make invalidating ranges harder to write. But they seems to be more the exception than the rule.

Actually, there is another problem: many algorithms' output will be transient or not depending on the input range. For example, we could write map to use the transient version of byLine, say, but then the range that map returns will also be transient (no longer safe). IOW, transience of .front is transitive in some cases. This again makes things complicated: as soon as you use a single transient range, it makes downstream ranges transient as well. So we're back to the problem of how to mark the range as transient in a transparent way. :-(

No it isn't always transitive. But that is true that it is in some cases. This is why the code should default to the safe behavior. It is up to the programmer to ensure correctness when opting for an alternate behavior.
Oct 31 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Nov 01, 2012 at 12:05:18AM +0100, deadalnix wrote:
 Le 31/10/2012 18:19, H. S. Teoh a crit :

[...]
Actually, there is another problem: many algorithms' output will be
transient or not depending on the input range. For example, we could
write map to use the transient version of byLine, say, but then the
range that map returns will also be transient (no longer safe).

IOW, transience of .front is transitive in some cases. This again
makes things complicated: as soon as you use a single transient
range, it makes downstream ranges transient as well. So we're back to
the problem of how to mark the range as transient in a transparent
way. :-(

No it isn't always transitive. But that is true that it is in some cases. This is why the code should default to the safe behavior. It is up to the programmer to ensure correctness when opting for an alternate behavior.

But then can the alternate behaviour be used at all? For example, joiner will return a range which will be transient if the original range is transient. But since this is "unsafe" behaviour, joiner's implementation can't take advantage of the faster behaviour at all, since otherwise the user, not knowing that it is switching to, say, byLine.fast instead of the safe version of byLine, may assume that the resulting range is non-transient, but it is actually transient. So this nullifies the usefulness of having .fast, since many Phobos algorithms that would have been able to take advantage of .fast can't, because their result will be unsafe. In which case, is it even worth the bother to implement such a thing? I think we still haven't addressed the fundamental issue, that is, we need a way to differentiate between transient and non-transient .front values. Being able to work with transient values is the best, because it automatically also works for non-transient values, but some algos need to be able to assume non-transience. This core problem is still unsolved. T -- Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder
Oct 31 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 01/11/2012 05:30, H. S. Teoh a crit :
 On Thu, Nov 01, 2012 at 12:05:18AM +0100, deadalnix wrote:
 Le 31/10/2012 18:19, H. S. Teoh a crit :

[...]
 Actually, there is another problem: many algorithms' output will be
 transient or not depending on the input range. For example, we could
 write map to use the transient version of byLine, say, but then the
 range that map returns will also be transient (no longer safe).

 IOW, transience of .front is transitive in some cases. This again
 makes things complicated: as soon as you use a single transient
 range, it makes downstream ranges transient as well. So we're back to
 the problem of how to mark the range as transient in a transparent
 way. :-(

No it isn't always transitive. But that is true that it is in some cases. This is why the code should default to the safe behavior. It is up to the programmer to ensure correctness when opting for an alternate behavior.

But then can the alternate behaviour be used at all? For example, joiner will return a range which will be transient if the original range is transient. But since this is "unsafe" behaviour, joiner's implementation can't take advantage of the faster behaviour at all, since otherwise the user, not knowing that it is switching to, say, byLine.fast instead of the safe version of byLine, may assume that the resulting range is non-transient, but it is actually transient. So this nullifies the usefulness of having .fast, since many Phobos algorithms that would have been able to take advantage of .fast can't, because their result will be unsafe. In which case, is it even worth the bother to implement such a thing? I think we still haven't addressed the fundamental issue, that is, we need a way to differentiate between transient and non-transient .front values. Being able to work with transient values is the best, because it automatically also works for non-transient values, but some algos need to be able to assume non-transience. This core problem is still unsolved.

Would joiner be able to take advantage of this if it was known or not ?
Nov 01 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Nov 01, 2012 at 03:18:13PM +0100, deadalnix wrote:
 Le 01/11/2012 05:30, H. S. Teoh a crit :
On Thu, Nov 01, 2012 at 12:05:18AM +0100, deadalnix wrote:
Le 31/10/2012 18:19, H. S. Teoh a crit :

[...]
Actually, there is another problem: many algorithms' output will be
transient or not depending on the input range. For example, we
could write map to use the transient version of byLine, say, but
then the range that map returns will also be transient (no longer
safe).

IOW, transience of .front is transitive in some cases. This again
makes things complicated: as soon as you use a single transient
range, it makes downstream ranges transient as well. So we're back
to the problem of how to mark the range as transient in a
transparent way. :-(

No it isn't always transitive. But that is true that it is in some cases. This is why the code should default to the safe behavior. It is up to the programmer to ensure correctness when opting for an alternate behavior.

But then can the alternate behaviour be used at all? For example, joiner will return a range which will be transient if the original range is transient. But since this is "unsafe" behaviour, joiner's implementation can't take advantage of the faster behaviour at all, since otherwise the user, not knowing that it is switching to, say, byLine.fast instead of the safe version of byLine, may assume that the resulting range is non-transient, but it is actually transient. So this nullifies the usefulness of having .fast, since many Phobos algorithms that would have been able to take advantage of .fast can't, because their result will be unsafe. In which case, is it even worth the bother to implement such a thing? I think we still haven't addressed the fundamental issue, that is, we need a way to differentiate between transient and non-transient .front values. Being able to work with transient values is the best, because it automatically also works for non-transient values, but some algos need to be able to assume non-transience. This core problem is still unsolved.

Would joiner be able to take advantage of this if it was known or not ?

I have a working version of joiner that doesn't assume the persistence of .front. The thing is, if the range you hand it is transient, then the resulting joined range is also transient. This shows up in one of my unittests: I can't use writeln() because it assumes persistence of .front, and I can't use array() either for the same reason. I have to explicit write a foreach to .dup the elements of the range into an array before it can be safely used in writeln(), etc.. This makes it impossible for joiner to transparently switch to a fast implementation (call .fast on the original range), because in this case the transience is transitive -- the user will have to explicitly call joiner with .fast. I don't like that, because then how would you know which algos are safe to call with .fast? It's only by documentation, i.e., by convention, and is bound to be a hiding place for bugs later on. T -- Those who don't understand Unix are condemned to reinvent it, poorly.
Nov 01 2012
parent reply deadalnix <deadalnix gmail.com> writes:
This is up to the consumer to choose if .front can be oblitered on 
popFront, not to an intermediate algorithm.

joiner isn't a consumer, this is a  transformer . transformer have to 
propagate the .fast (I don't like this name, but this is unimportant for 
now) to its source.

Let'(s consider the following sheme :
source -> transformer (possibly several of them) -> consumer.

Now see example below :

auto transform(R)(R range) {
     struct Transfomer(R) {
         // Operations . . .

          property
         auto fast() {
             return transform(source.fast);
         }
     }
}

So the fast is used by the consumer and bubble throw all ranges that 
support it up to the source. Or is made a NOOP in the process in one of 
the transformers or the source do not support that optimization.

As said before, this is up to the consumer to know it it accept .front 
to be obliterated. In your case, writeln don't support that, so .fast 
must not be used with writeln.
Nov 01 2012
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Nov 01, 2012 at 11:40:52PM +0100, deadalnix wrote:
 This is up to the consumer to choose if .front can be oblitered on
 popFront, not to an intermediate algorithm.
 
 joiner isn't a consumer, this is a  transformer . transformer have
 to propagate the .fast (I don't like this name, but this is
 unimportant for now) to its source.

What would be a better name for it?
 Let'(s consider the following sheme :
 source -> transformer (possibly several of them) -> consumer.
 
 Now see example below :
 
 auto transform(R)(R range) {
     struct Transfomer(R) {
         // Operations . . .
 
          property
         auto fast() {
             return transform(source.fast);
         }
     }
 }
 
 So the fast is used by the consumer and bubble throw all ranges that
 support it up to the source. Or is made a NOOP in the process in one
 of the transformers or the source do not support that optimization.
 
 As said before, this is up to the consumer to know it it accept
 .front to be obliterated. In your case, writeln don't support that,
 so .fast must not be used with writeln.

Ah, I see. That makes sense. So basically it's not the source (or any intermediate step) that decides whether to use the optimization, but the final consumer. Though now we have the issue that all intermediate ranges must propagate .fast, which is troublesome if every range has to do it manually. Can this be handled automatically by UFCS? T -- What doesn't kill me makes me stranger.
Nov 02 2012
prev sibling next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:
 Ah, I see. That makes sense. So basically it's not the source (or any
 intermediate step) that decides whether to use the optimization, but the
 final consumer.
 
 Though now we have the issue that all intermediate ranges must propagate
 .fast, which is troublesome if every range has to do it manually. Can
 this be handled automatically by UFCS?

It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it. - Jonathan M Davis
Nov 02 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 02/11/2012 21:17, Jonathan M Davis a écrit :
 On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:
 Ah, I see. That makes sense. So basically it's not the source (or any
 intermediate step) that decides whether to use the optimization, but the
 final consumer.

 Though now we have the issue that all intermediate ranges must propagate
 .fast, which is troublesome if every range has to do it manually. Can
 this be handled automatically by UFCS?

It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it. - Jonathan M Davis

The whole point of my proposal is to avoid adding isTransient or something similar. Let's put back relevant elements of the solution I propose : 1/ range preserve .front by default . 2/ Ranges that would get performance boost from invalidating .front can propose a property (we called it .fast in the thread) that return a new range that invalidate .front . 3/ Range that don't support that feature can be made compatible with UFCS, .fast being a NOP in such case. 4/ Ranges that work as a transofmration on existing ranges can propose the property too if they don't require .front to be persistant. It is implemented by bubbling the property call to the source range. 5/ It is up to the ultimate consumer to call that property or not.
Nov 04 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/4/12 9:36 AM, deadalnix wrote:
 Let's put back relevant elements of the solution I propose :
 1/ range preserve .front by default .
 2/ Ranges that would get performance boost from invalidating .front can
 propose a property (we called it .fast in the thread) that return a new
 range that invalidate .front .

IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R) I think it would be vastly better to avoid defining a new property of ranges (that subsequently would need to be cross-checked by many algorithms). Then we have additional scalability issues because all derived ranges would need to propagate .fast etc. Then we need to cater for odd cases such as random-access ranges being .fast. The simpler way would be to simply state that input ranges can't guarantee the persistence of .front after calling .popFront. This is quite expected for input ranges, and no different for any object that gives a char[]-based API; subsequently changing the object may change the contents of the char[] returned. The algorithms that need to worry about .front's transience are only the ones that work with input ranges (as opposed to forward ranges or stronger). This approach puts the burden in the right place - on the implementers of specific algorithms supporting one-pass streaming. Andrei
Nov 04 2012
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11/4/2012 7:16 PM, Andrei Alexandrescu пишет:
 On 11/4/12 9:36 AM, deadalnix wrote:
 Let's put back relevant elements of the solution I propose :
 1/ range preserve .front by default .
 2/ Ranges that would get performance boost from invalidating .front can
 propose a property (we called it .fast in the thread) that return a new
 range that invalidate .front .

IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R)

Or rather onePass!R == ... but I seriously doubt that ability to save iteration state and the fact of reusing single slot/memory chunk for .front are always correlated.
 I think it would be vastly better to avoid defining a new property of
 ranges (that subsequently would need to be cross-checked by many
 algorithms). Then we have additional scalability issues because all
 derived ranges would need to propagate .fast etc. Then we need to cater
 for odd cases such as random-access ranges being .fast.

Neither .fast nor separate transient flag strike me as good solutions as placing a lot of burden on the range implementer (and introducing yet another convention).
 The simpler way would be to simply state that input ranges can't
 guarantee the persistence of .front after calling .popFront. This is
 quite expected for input ranges, and no different for any object that
 gives a char[]-based API; subsequently changing the object may change
 the contents of the char[] returned.

 The algorithms that need to worry about .front's transience are only the
 ones that work with input ranges (as opposed to forward ranges or
 stronger).

The fact that algorithm doesn't save iteration state != it counts on transient .front. A simple example of std.array.array will do - it iterates range once and pumps elements into appender/builder "sink". So it doesn't require forward range but just that elements are not _aliased_ under the hood. Another example is dirEntries - it returns entries with immutable strings, but it can't save iteration state. It does work with array because of no aliasing of .front values.
This approach puts the burden in the right place - on the
 implementers of specific algorithms supporting one-pass streaming.

Aside from mixing separate concepts into one* I like the approach. *InputRange vs ForwardRange has nothing to do with mutable indirections of .front. -- Dmitry Olshansky
Nov 04 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/4/12 11:36 AM, Dmitry Olshansky wrote:
 The fact that algorithm doesn't save iteration state != it counts on
 transient .front.

I didn't claim that. My strongest claim was: input-range-having-front-with-mutable-indirection IMPLIES no-transience-guarantee. Note the IMPLIES (not EQUIVALENT) and no guarantee as opposed to guaranteed change of contents.
 A simple example of std.array.array will do - it iterates range once and
 pumps elements into appender/builder "sink". So it doesn't require
 forward range but just that elements are not _aliased_ under the hood.

No surprise here.
 Another example is dirEntries - it returns entries with immutable
 strings, but it can't save iteration state. It does work with array
 because of no aliasing of .front values.

This example is actually nice because it shows that indeed an input range with NO mutable indirection on ElementType guarantees non-transience. Andrei
Nov 04 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
11/4/2012 9:02 PM, Andrei Alexandrescu пишет:
 On 11/4/12 11:36 AM, Dmitry Olshansky wrote:
 The fact that algorithm doesn't save iteration state != it counts on
 transient .front.

I didn't claim that. My strongest claim was: input-range-having-front-with-mutable-indirection IMPLIES no-transience-guarantee.

Indeed, but it wasn't your claim that I meant. It's fine with me but it's hardly moving us forward. I was investigating your simplified proposal and its consequences for e.g. std.array.array. And apparently forgot to make the central point. I was mostly going by:
 The simpler way would be to simply state that input ranges can't 

guarantee the persistence of .front after calling .popFront. and
The algorithms that need to worry about .front's transience are only 

the ones that work with input ranges (as opposed to forward ranges or stronger). And conclude that applying this simple rule to std.array.array* means it has to assume non-transient elements and somehow copy/dup'em. Problem is we don't have generic function to make 100% guaranteed unaliased copy (it could omit doing any work on non-aliased objects). Also that it misses forward ranges that have aliased .front, assuming they are non-transient. *It has to work with input ranges as I noted below.
 A simple example of std.array.array will do - it iterates range once and
 pumps elements into appender/builder "sink". So it doesn't require
 forward range but just that elements are not _aliased_ under the hood.

No surprise here.
 Another example is dirEntries - it returns entries with immutable
 strings, but it can't save iteration state. It does work with array
 because of no aliasing of .front values.

This example is actually nice because it shows that indeed an input range with NO mutable indirection on ElementType guarantees non-transience.

-- Dmitry Olshansky
Nov 04 2012
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, November 04, 2012 21:49:58 Dmitry Olshansky wrote:
 I was mostly going by:
  > The simpler way would be to simply state that input ranges can't
 
 guarantee the persistence of .front after calling .popFront.
 
 and
 
  >The algorithms that need to worry about .front's transience are only
 
 the ones that work with input ranges (as opposed to forward ranges or
 stronger).
 
 And conclude that applying this simple rule to std.array.array* means it
 has to assume non-transient elements and somehow copy/dup'em. Problem is
 we don't have generic function to make 100% guaranteed unaliased copy
 (it could omit doing any work on non-aliased objects).
 
 Also that it misses forward ranges that have aliased .front, assuming
 they are non-transient.

The reality is that in the general case, you can't know for certain whether the result of front is transient or not. In some cases, you know, because the type is clearly immutable or a value type. In others (e.g. char[]), you can't know. A particularly annoying one though is classes, because they're almost never going to be immutable, so in almost all cases, if input ranges can have transient fronts and the range's element types is a class, then you have to assume that its front is transient. And I'm not sure whether it's possible to determine whether a struct is a value type or a reference type or not. The mutable indirections bit probably makes it so that you can determine for sure that some structs are value types, but you can't be certain for some of them whether they're a reference type or value type without examining the postblit constructor. So, what it ends up coming down to if we accept that input ranges can have transient fronts is that algorithms then either have to require forward ranges in the case where they need to keep the result of front around, or we need a template of some kind which can determine in at least some cases that front in not transient (but it's going to have to say that front is transient in many cases where it isn't), and such algorithms will have to use that trait when dealing with input ranges. The big problem though is std.array.array. Based on the functions that an input range has, all it needs is an input range, but it'll never work with a transient front. So, what is likely one of the most heavily used range-based functions can't work with input ranges. This is a big problem, and fixing it _will_ break code. Because either, we need to create a hasTransientFront template or make std.array.array require at least a forward range, which will make dealing with input ranges that much worse. Still, as much as I hate the idea of permitting transient fronts, I'd rather have it be part of input ranges and be able to assume that other range types have non-transient fronts than create isTransient. Ranges are already overly complicated as it is. Regardless, if we decide that input ranges can have transient fronts, then we need to make that very clear, which clearly hasn't been the case, because only Andrei thought that the transience of front had anything to do with the definition of an input range. - Jonathan M Davis
Nov 04 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 04/11/2012 16:16, Andrei Alexandrescu a écrit :
 On 11/4/12 9:36 AM, deadalnix wrote:
 Let's put back relevant elements of the solution I propose :
 1/ range preserve .front by default .
 2/ Ranges that would get performance boost from invalidating .front can
 propose a property (we called it .fast in the thread) that return a new
 range that invalidate .front .

IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R) I think it would be vastly better to avoid defining a new property of ranges (that subsequently would need to be cross-checked by many algorithms).

You can stop here. This shows that you didn't understood the proposal. The whole point of the proposal is to avoid the cross checking part. See code sample posted earlier in the discussion for full understanding.
 Then we have additional scalability issues because all
 derived ranges would need to propagate .fast etc. Then we need to cater
 for odd cases such as random-access ranges being .fast.

The beauty of the proposal is that nothing NEED to be .fast .
 The simpler way would be to simply state that input ranges can't
 guarantee the persistence of .front after calling .popFront. This is
 quite expected for input ranges, and no different for any object that
 gives a char[]-based API; subsequently changing the object may change
 the contents of the char[] returned.

 The algorithms that need to worry about .front's transience are only the
 ones that work with input ranges (as opposed to forward ranges or
 stronger). This approach puts the burden in the right place - on the
 implementers of specific algorithms supporting one-pass streaming.

Nov 04 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/4/12 11:40 AM, deadalnix wrote:
 Le 04/11/2012 16:16, Andrei Alexandrescu a écrit :
 On 11/4/12 9:36 AM, deadalnix wrote:
 Let's put back relevant elements of the solution I propose :
 1/ range preserve .front by default .
 2/ Ranges that would get performance boost from invalidating .front can
 propose a property (we called it .fast in the thread) that return a new
 range that invalidate .front .

IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R) I think it would be vastly better to avoid defining a new property of ranges (that subsequently would need to be cross-checked by many algorithms).

You can stop here. This shows that you didn't understood the proposal. The whole point of the proposal is to avoid the cross checking part. See code sample posted earlier in the discussion for full understanding.

Indeed I'd misunderstood. So I went back and my current understanding is that it's all about defining this: auto transient(R r) if (isInputRange!R) { return r; } Then certain ranges would implement a property .transient if there's an opportunity for e.g. reusing buffers (a la ByLine and ByChunk). At the end of the day, simply invoking r.transient for any range r would either get the optimized range or would vacuously return that same range. As far as I can tell from the subsequent discussion, there are quite a few issues related to propagating transient by ranges that compose on top of other ranges. By and large, adding an optional attribute of ranges is difficult to make free or minimal in cost. Andrei
Nov 04 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 04/11/2012 17:57, Andrei Alexandrescu a écrit :
 Indeed I'd misunderstood. So I went back and my current understanding is
 that it's all about defining this:

 auto transient(R r) if (isInputRange!R)
 {
 return r;
 }

 Then certain ranges would implement a property .transient if there's an
 opportunity for e.g. reusing buffers (a la ByLine and ByChunk). At the
 end of the day, simply invoking r.transient for any range r would either
 get the optimized range or would vacuously return that same range.

Now we are on the same page. (and I like transient much better than .fast).
 As far as I can tell from the subsequent discussion, there are quite a
 few issues related to propagating transient by ranges that compose on
 top of other ranges. By and large, adding an optional attribute of
 ranges is difficult to make free or minimal in cost.

To sum it up, a transformation range should implement .transient if it can handle the fact that its source can be transient. The good news is that if it doesn't, then the program is still 100% correct. It is just slower. So this approach promote program correctness, but still allow crazy bearded programmer to optimize the code where needed. I think it fit nicely D's phylosophy, in the sense it does provide a safe, easy to use interface while providing a backdoor when this isn't enough.
Nov 04 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/4/12 12:26 PM, deadalnix wrote:
 I think it fit nicely D's phylosophy, in the sense it does provide a
 safe, easy to use interface while providing a backdoor when this isn't
 enough.

It doesn't fit the (admittedly difficult to fulfill) desideratum that the obvious code is safe and fast. And the obvious code uses byLine, not byLine.transient. Back to a simpler solution: what's wrong with adding alternative APIs for certain input ranges? We have byLine, byChunk, byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync and let the documentation explain the differences. Andrei
Nov 04 2012
next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Sunday, 4 November 2012 at 17:38:07 UTC, Andrei Alexandrescu 
wrote:
 Back to a simpler solution: what's wrong with adding 
 alternative APIs for certain input ranges? We have byLine, 
 byChunk, byChunkAsync. We may as well add eachLine, eachChunk, 
 eachChunkAsync and let the documentation explain the 
 differences.


 Andrei

Yes, please! (<nitpick>Although I don't necessarily agree with the naming</nitpick>). I think we should go with transient ranges the same way we went with "non-size_t index" ranges. Acknowledge that they exist, but not guarantee their support when passed to Phobos algorithms. byLine is a perfect example of this: You can use it all you want in a for loop, but pass it "copy" or "array" at your own risk. At that point, all we do is "tag" byLine as "Warning: Transient.", and call it a day. Not as perfectly safe as we'd want, but good enough, IMO. We then offer "eachLine", which isn't transient, and everyone is happy.
Nov 04 2012
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Nov 04, 2012 at 12:38:06PM -0500, Andrei Alexandrescu wrote:
 On 11/4/12 12:26 PM, deadalnix wrote:
I think it fit nicely D's phylosophy, in the sense it does provide a
safe, easy to use interface while providing a backdoor when this
isn't enough.

It doesn't fit the (admittedly difficult to fulfill) desideratum that the obvious code is safe and fast. And the obvious code uses byLine, not byLine.transient.

Actually, it does. The proposal was to modify byLine so that it returns strings instead of a reused char[] buffer. The current version of byLine that has transient front will be moved into a .transient property. This has the following results: - All existing code doesn't break: it just becomes a tad slower from duplicating each line. Correctness is not broken. - Code that wants to be fast can call .transient at the end of the construction, e.g., stdin.byLine().map!myFunc.transient. Assuming that map propagates .transient (which also implies it's safe to use with transient ranges), this will return an optimized range that reuses the transient buffer safely. - Code that calls .transient on non-transient ranges get a no-op: [1,2,3].map!myFunc.transient is equivalent to [1,2,3].map!myFunc, because non-transient ranges just alias .transient to themselves (this can be done via UFCS so no actual change needs to be made to existing non-transient ranges). IOW, code is always 100% safe by default. You can optionally use .transient with certain ranges if you'd like for it to be faster. Using .transient where it isn't supported simply defaults to the usual safe behaviour (no performance increase, but no safety bugs either). Now, this can be taken one step further. User code need not even know what .transient is. As deadalnix said, the insight is that it is only invoked by range *consumers*. For example, one could in theory make canFind() transient-safe, so the user would write this: auto n = stdin.byLine().map!myFunc.canFind(myNeedle); and canFind uses the .transient range to do the finding. Assuming map!myFunc correctly supports .transient, this request for the faster range automatically propagates back to byLine(), and so the code is fast *without the user even asking for .transient*. And if the range being scanned is non-transient, then it behaves normally as it does today. For example, if map!myFunc does *not* support .transient, then when canFind asks for .transient, UFCS takes over and it just gets the (non-transient) mapped range back. Which in turn asks for the *non-transient* version of byLine(), so in no case will there be any safety breakage. I think this solution is most elegant so far, in the sense that: (1) Existing code doesn't break; (2) Existing code doesn't even need to change, or can be slowly optimized to use .transient on a case-by-case basis without any code breakage; (3) The user doesn't even need to know what .transient is to reap the benefits, where it's supported; (4) The algorithm writer decides whether or not .transient is supported, guaranteeing that there will be no hidden bugs caused by assuming .front is persistent and then getting a transient range passed to it. So it's the algorithm that declares whether or not it supports .transient.
 Back to a simpler solution: what's wrong with adding alternative
 APIs for certain input ranges? We have byLine, byChunk,
 byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync
 and let the documentation explain the differences.

[...] This is less elegant than deadalnix's proposal, because there is a certain risk associated with using .transient ranges. Now the onus is on the user to make sure he doesn't pass a transient range to an algorithm that can't handle it. With deadalnix's proposal, it's the *algorithm* that decides whether or not it knows how to handle .transient properly. The user doesn't have to know, and can still reap the benefits. An algorithm that doesn't know how to deal with transient ranges simply does nothing, and UFCS takes over to provide the default .transient, which just returns the original non-transient range. T -- Государство делает вид, что платит нам зарплату, а мы делаем вид, что работаем.
Nov 04 2012
parent deadalnix <deadalnix gmail.com> writes:
Le 04/11/2012 23:10, H. S. Teoh a écrit :
 On Sun, Nov 04, 2012 at 12:38:06PM -0500, Andrei Alexandrescu wrote:
 On 11/4/12 12:26 PM, deadalnix wrote:
 I think it fit nicely D's phylosophy, in the sense it does provide a
 safe, easy to use interface while providing a backdoor when this
 isn't enough.

It doesn't fit the (admittedly difficult to fulfill) desideratum that the obvious code is safe and fast. And the obvious code uses byLine, not byLine.transient.

Actually, it does. The proposal was to modify byLine so that it returns strings instead of a reused char[] buffer. The current version of byLine that has transient front will be moved into a .transient property. This has the following results: - All existing code doesn't break: it just becomes a tad slower from duplicating each line. Correctness is not broken. - Code that wants to be fast can call .transient at the end of the construction, e.g., stdin.byLine().map!myFunc.transient. Assuming that map propagates .transient (which also implies it's safe to use with transient ranges), this will return an optimized range that reuses the transient buffer safely. - Code that calls .transient on non-transient ranges get a no-op: [1,2,3].map!myFunc.transient is equivalent to [1,2,3].map!myFunc, because non-transient ranges just alias .transient to themselves (this can be done via UFCS so no actual change needs to be made to existing non-transient ranges). IOW, code is always 100% safe by default. You can optionally use .transient with certain ranges if you'd like for it to be faster. Using .transient where it isn't supported simply defaults to the usual safe behaviour (no performance increase, but no safety bugs either). Now, this can be taken one step further. User code need not even know what .transient is. As deadalnix said, the insight is that it is only invoked by range *consumers*. For example, one could in theory make canFind() transient-safe, so the user would write this: auto n = stdin.byLine().map!myFunc.canFind(myNeedle); and canFind uses the .transient range to do the finding. Assuming map!myFunc correctly supports .transient, this request for the faster range automatically propagates back to byLine(), and so the code is fast *without the user even asking for .transient*. And if the range being scanned is non-transient, then it behaves normally as it does today. For example, if map!myFunc does *not* support .transient, then when canFind asks for .transient, UFCS takes over and it just gets the (non-transient) mapped range back. Which in turn asks for the *non-transient* version of byLine(), so in no case will there be any safety breakage. I think this solution is most elegant so far, in the sense that: (1) Existing code doesn't break; (2) Existing code doesn't even need to change, or can be slowly optimized to use .transient on a case-by-case basis without any code breakage; (3) The user doesn't even need to know what .transient is to reap the benefits, where it's supported; (4) The algorithm writer decides whether or not .transient is supported, guaranteeing that there will be no hidden bugs caused by assuming .front is persistent and then getting a transient range passed to it. So it's the algorithm that declares whether or not it supports .transient.
 Back to a simpler solution: what's wrong with adding alternative
 APIs for certain input ranges? We have byLine, byChunk,
 byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync
 and let the documentation explain the differences.

[...] This is less elegant than deadalnix's proposal, because there is a certain risk associated with using .transient ranges. Now the onus is on the user to make sure he doesn't pass a transient range to an algorithm that can't handle it. With deadalnix's proposal, it's the *algorithm* that decides whether or not it knows how to handle .transient properly. The user doesn't have to know, and can still reap the benefits. An algorithm that doesn't know how to deal with transient ranges simply does nothing, and UFCS takes over to provide the default .transient, which just returns the original non-transient range.

I think I'll hire you when I need to promote my ideas :D This is much better explained than what I did myself and is exactly what I had in mind.
Nov 05 2012
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Nov 02, 2012 at 04:17:10PM -0400, Jonathan M Davis wrote:
 On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:
 Ah, I see. That makes sense. So basically it's not the source (or
 any intermediate step) that decides whether to use the optimization,
 but the final consumer.
 
 Though now we have the issue that all intermediate ranges must
 propagate .fast, which is troublesome if every range has to do it
 manually. Can this be handled automatically by UFCS?

It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it.

[...] I wish Andrei would give some input as to how we should proceed with this. I do consider this a major issue with ranges, because for efficiency reasons I often write ranges that have transient .front values, and they can lead to subtle bugs with the current implementation of std.algorithm. It would be good to settle on this issue one way or another. I'd be happy even if the decision is to say that transient ranges are invalid and shouldn't be considered "real" ranges. Anything is better than the current nebulous state of things which only leads to subtle bugs for the unwary. T -- Real Programmers use "cat > a.out".
Nov 03 2012
next sibling parent reply "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Saturday, 3 November 2012 at 23:19:11 UTC, H. S. Teoh wrote:
 I wish Andrei would give some input as to how we should proceed 
 with this. I do consider this a major issue with ranges, 
 because for efficiency reasons I often write ranges that have 
 transient .front values, and they can lead to subtle bugs with 
 the current implementation of std.algorithm. It would be good 
 to settle on this issue one way or another. I'd be happy even 
 if the decision is to say that transient ranges are invalid and 
 shouldn't be considered "real" ranges. Anything is better than 
 the current nebulous state of things which only leads to subtle 
 bugs for the unwary.

From watching and gleaming from what I have so far, I can only think that transient should NOT be the default way that ranges work (as it causes too many problems); However transient should be allowed (and available) when possible. I can only think that perhaps using a template to determine if it's transient should be present, that way it's a simple flag to enable/disable and should propagate anywhere that that range was used. struct Range(bool isTransient=false) { static if (isTransient) { //front and transient } else { //front and non-transient } } Unless someone thinks this is a bad approach?
Nov 03 2012
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, November 04, 2012 02:30:49 Era Scarecrow wrote:
   From watching and gleaming from what I have so far, I can only
 think that transient should NOT be the default way that ranges
 work (as it causes too many problems); However transient should
 be allowed (and available) when possible.
 
   I can only think that perhaps using a template to determine if
 it's transient should be present, that way it's a simple flag to
 enable/disable and should propagate anywhere that that range was
 used.
 
 struct Range(bool isTransient=false) {
    static if (isTransient) {
      //front and transient
    } else {
      //front and non-transient
    }
 }
 
   Unless someone thinks this is a bad approach?

That's just trying to be able to tell a range whether it should be transient or not and doesn't really solve the problem. The problem is that algorithms in general assume that front is _not_ transient, and we need to either decide that algorithms are free to continue to assume that front is non-transient (meaning that you make front transient on your range at your own risk and no guarantees that any range-based functions will work with it), or we need to provide a way for range-based functions to know whether a particular range has a transient front or not. As such, I think that it comes down primarily to either 1. Just decide that all input ranges can have transient fronts and that no ranges greater than input ranges can have transient fronts. Then algorithms can infer transience from the type of range. This is how Andrei thinks that it's supposed to work but pretty much no one else does (if nothing else, because that idea was never made clear anywhere, and no one else inferred that from the definitions of input ranges). 2. Make it so that any range can have a transient front but provide a template for checking for it like we do with stuff like hasSlicing or length. Presumably that template would check that for something like like the existence of R.isTransient, and ranges with transient fronts would just declare an enum with that name. Then every range-based function which relied on front not being transient would have to check whether the range that it was given was transient or not or fail to compile if it is, and every wrapper range would have to propogate the isTransient member. Failure to do so in either case (which _would_ happen at least some of the time, at least outside of Phobos), would result in transient ranges silently doing bizarre things. 3. Make it so that ranges which can be transient are non-transient by default but provide a function to get at a transient version for speed (which was the fastRange proposal in this thread). The main problem here is that when the fast range gets wrapped, it's transient, and so anything using the wrapped range will be forced to use the transient version rather than using the non- transient version and only using the transient version when it's asked for. So, I don't think that this is particularly viable. 4. Just decide that range-based algorithms can assume that front isn't ever transient. Any range types which make it transient do so at their own risk. Honestly, I'd really like to go with #4 and make ByLine and ByChunk use opApply. I think that transience complicates things too much. Even just having input ranges have transient fronts really screws with things. Something as simple as auto app = appender!E(); foreach(e; range) app.put(e); is totally screwed by transience, and it's only operating on input ranges. I seriously question that a function like std.array.array can be implemented with a transient front. Without a way to explicitly copy front, you can't have front be transient and keep it like std.array.array does. And even if there _were_ a way to explictly copy the result of front, that would be horribly inefficient in the cases where front isn't transient, because it would force you to copy when it was completely unnecessary. I honestly don't think that transience really works outside of some very specific use cases, and the cost of trying to support it will not be small. And given the issues that std.array.array has with it and the fact that std.array.array really shouldn't have to operate on forward ranges, I don't think that the idea of input ranges having transient fronts is really going to work, which would mean using something like isTransient, which has the same sort of problems as save tends to and is likely to be forgotten even more than save is. So, I really question that transience and ranges is really going to work at all. If we _do_ support it though, I think that we need to go the isTransient route. - Jonathan M Davis
Nov 03 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 04/11/2012 03:26, Jonathan M Davis a écrit :
 3. Make it so that ranges which can be transient are non-transient by default
 but provide a function to get at a transient version for speed (which was the
 fastRange proposal in this thread). The main problem here is that when the
 fast range gets wrapped, it's transient, and so anything using the wrapped
 range will be forced to use the transient version rather than using the non-
 transient version and only using the transient version when it's asked for.
 So, I don't think that this is particularly viable.

Can you elaborate on that. I definitively don't see a problem that big here.
Nov 04 2012
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, November 04, 2012 15:40:18 deadalnix wrote:
 Le 04/11/2012 03:26, Jonathan M Davis a =C3=A9crit :
 3. Make it so that ranges which can be transient are non-transient =


by
 default but provide a function to get at a transient version for sp=


eed
 (which was the fastRange proposal in this thread). The main problem=


here
 is that when the fast range gets wrapped, it's transient, and so an=


ything
 using the wrapped range will be forced to use the transient version=


 rather than using the non- transient version and only using the tra=


nsient
 version when it's asked for. So, I don't think that this is particu=


larly
 viable.

=20 Can you elaborate on that. I definitively don't see a problem that bi=

g here. The problem is that you can't opt out of the fast range once you've got= it. If=20 you use fastRange and then the result ends up getting wrapped by anothe= r=20 range, that range's front is transient and has no way of indicating it.= So,=20 you're right back in the boat that we're in right now. In order to avoi= d that,=20 virtually all range-based functions would have to not use fastRange, be= cause=20 they have no idea whether you're going to use their result to pass to a= nother=20 function or not. - Jonathan M Davis
Nov 04 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 05/11/2012 04:11, Jonathan M Davis a écrit :
 On Sunday, November 04, 2012 15:40:18 deadalnix wrote:
 Le 04/11/2012 03:26, Jonathan M Davis a écrit :
 3. Make it so that ranges which can be transient are non-transient by
 default but provide a function to get at a transient version for speed
 (which was the fastRange proposal in this thread). The main problem here
 is that when the fast range gets wrapped, it's transient, and so anything
 using the wrapped range will be forced to use the transient version
 rather than using the non- transient version and only using the transient
 version when it's asked for. So, I don't think that this is particularly
 viable.

Can you elaborate on that. I definitively don't see a problem that big here.

The problem is that you can't opt out of the fast range once you've got it. If you use fastRange and then the result ends up getting wrapped by another range, that range's front is transient and has no way of indicating it. So, you're right back in the boat that we're in right now. In order to avoid that, virtually all range-based functions would have to not use fastRange, because they have no idea whether you're going to use their result to pass to another function or not. - Jonathan M Davis

HS Teoh explained it nicely. The responsibility of using .transient or not belong to the consumer. Only the consumer can know if a transient range is suitable. So you don't wrap a transient range. You wrap a non transient range, and, if the consumer is able to handle the transcient version, it call .transient on its source, that call it on its source, etc . . . up to the real source. If one transformer is unable to handle transient range, the it don't pass .transient to its source.
Nov 05 2012
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, November 05, 2012 15:48:41 deadalnix wrote:
 HS Teoh explained it nicely.
 
 The responsibility of using .transient or not belong to the consumer.
 Only the consumer can know if a transient range is suitable.
 
 So you don't wrap a transient range. You wrap a non transient range,
 and, if the consumer is able to handle the transcient version, it call
 .transient on its source, that call it on its source, etc . . . up to
 the real source.
 
 If one transformer is unable to handle transient range, the it don't
 pass .transient to its source.

You still need to wrap it in every wrapper range which could possibly support transience. So, this affects every single range which could possibly support transience. At least with Andrei's proposal, transience is explicitly restricted to input ranges, which seriously reduces the problems caused by them, particularly since so few functions can really function with just an input range. - Jonathan M Davis
Nov 05 2012
parent deadalnix <deadalnix gmail.com> writes:
Le 05/11/2012 19:42, Jonathan M Davis a écrit :
 On Monday, November 05, 2012 15:48:41 deadalnix wrote:
 HS Teoh explained it nicely.

 The responsibility of using .transient or not belong to the consumer.
 Only the consumer can know if a transient range is suitable.

 So you don't wrap a transient range. You wrap a non transient range,
 and, if the consumer is able to handle the transcient version, it call
 .transient on its source, that call it on its source, etc . . . up to
 the real source.

 If one transformer is unable to handle transient range, the it don't
 pass .transient to its source.

You still need to wrap it in every wrapper range which could possibly support transience. So, this affects every single range which could possibly support transience.

As shown before, most wrapper range wouldn't have to do a thing except rewraping with .transient . The typical situation is that a wrapper range get its transientness from the wraped range. So it boils down to womething like : struct Wrapper(Wrapped) { Wrapped e; // Range implementation property auto transient() { return Wrapper!(typeof(e.transient))(e.transient); } } Considering that doing a wrapper that is compatible with transient ranges is already some work and require the writer to be aware of such a concept (you can expect most programmer to simply ignore that fact). The extra work for a wrapper range is really small, and add the benefice to ensure that the developer that programmed the wrapper took special care to ensure this will work with transient.
 At least with Andrei's proposal, transience is explicitly restricted to input
 ranges, which seriously reduces the problems caused by them, particularly
 since so few functions can really function with just an input range.

This proposal is simplistic. array() is not even implementable with such an approach. It is also slower for several use cases shown in this thread. And finally, it is likely that many non transient compatible stuff will proliferate, leading to undefined behavior all over the place. Simply telling to people to follow some guideline is not going to work. Many lead devs have experienced that with small teams, and we are talking here about every single D user.
Nov 05 2012
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Nov 05, 2012 at 01:42:47PM -0500, Jonathan M Davis wrote:
 On Monday, November 05, 2012 15:48:41 deadalnix wrote:
 HS Teoh explained it nicely.
 
 The responsibility of using .transient or not belong to the consumer.
 Only the consumer can know if a transient range is suitable.
 
 So you don't wrap a transient range. You wrap a non transient range,
 and, if the consumer is able to handle the transcient version, it call
 .transient on its source, that call it on its source, etc . . . up to
 the real source.
 
 If one transformer is unable to handle transient range, the it don't
 pass .transient to its source.

You still need to wrap it in every wrapper range which could possibly support transience. So, this affects every single range which could possibly support transience.

You *can* wrap it if the wrapper range can support transience. It doesn't *have* to be wrapped. That's what I like about deadalnix's proposal -- do nothing, and you get *correct* behaviour. Do something, and you make it faster. Sounds like a good deal to me. Besides, almost *all* of those wrapper ranges are currently _broken_ for transient ranges. You get undefined behaviour when you inadvertently pass a transient range to them. No matter what you do, this has to be fixed, one way or another. By fixing *all* ranges to be non-transient by default (and, as you say, there are only a scant few of them -- byLine is the only Phobos example I can think of), we fix all of this broken behaviour instantly. The rest is optional optimization.
 At least with Andrei's proposal, transience is explicitly restricted
 to input ranges, which seriously reduces the problems caused by them,
 particularly since so few functions can really function with just an
 input range.

[...] With Andrei's proposal, all code that assumes transient .front with input ranges are broken by definition. I've looked over std.algorithm -- there are a *lot* of places with this assumption. Instead of getting correct default behaviour, you get a whole bunch of broken code with buggy behaviour, unless you first rewrite a lot of code in std.algorithm (and probably std.range as well). Furthermore, afterwards there is still no safety net: accidentally create a transient forward range? You're screwed, std.algorithm breaks all over the place. You said that ranges are supposed to be easy for users to create. Well, with Andrei's scheme, all the user has to do is to write a .save method in his input range, and all of a sudden everything breaks. The problem is that input ranges allow transient .front to begin with. Now users have to remember, once I add .save, I have to make .front non-transient. This is very counterintuitive to me. I'm extending a range to make it usable as a forward range, and now suddenly I can't reuse my buffers anymore? In my mind, forward ranges should be a refinement of input ranges. So I should be able to just add more stuff to my input range, and it should become a valid forward range. But now this is not true anymore, and the reason? We arbitrarily decided to make forward ranges non-transient. By making *all* ranges non-transient by default, we avoid this problem. Users don't have to remember the strange interaction between .front and .save. You only get transient ranges when you explicitly ask for it. So code is safe by default, and if you know what you're doing, you can make it faster. And std.array.array actually works correctly 100% of the time with no further changes. The current situation is, code is *not* safe by default. Pass a transient range to std.algorithm, and sometimes it works, sometimes it breaks. Passing it to std.array.array sometimes works, sometimes doesn't. Subtle bugs arise everywhere. With Andrei's proposal, you still have the same problem: forget to update a Phobos algorithm that expects an input range, and it breaks. You have to pretty much rewrite every single algorithm in Phobos that expects an input range, AND complete this rewrite BEFORE code starts working correctly. Then after that, every time you write code that takes an input range, you have to watch out for transient ranges, otherwise things will break. And std.array.array can no longer take an input range because it can't copy .front correctly. IOW, input ranges become almost completely useless. How is this any better than deadalnix's solution? From what I can tell, it's a lot worse, for all of the above reasons. T -- "Hi." "'Lo."
Nov 05 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/5/12 9:45 PM, H. S. Teoh wrote:
 Besides, almost *all* of those wrapper ranges are currently _broken_ for
 transient ranges. You get undefined behaviour when you inadvertently
 pass a transient range to them.

It's not undefined behavior, it's just surprising behavior. UB would be a fair amount more worrisome.
 With Andrei's proposal, all code that assumes transient .front with
 input ranges are broken by definition.

I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.
 I've looked over std.algorithm --
 there are a *lot* of places with this assumption. Instead of getting
 correct default behaviour, you get a whole bunch of broken code with
 buggy behaviour, unless you first rewrite a lot of code in
 std.algorithm (and probably std.range as well).

Could you please put together a list of these algorithms so we have it? Thanks.
 Furthermore, afterwards there is still no safety net: accidentally
 create a transient forward range?

But that goes for any incorrect range.
 How is this any better than deadalnix's solution?  From what I can tell,
 it's a lot worse, for all of the above reasons.

I think we should start from here: the .transient proposal will not be accepted because it is too complex. Consider it a baseline that other proposals would be evaluated against. Then let's see how to devise a robust, simple solution. Andrei
Nov 05 2012
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 05/11/2012 22:51, Andrei Alexandrescu a crit :
 On 11/5/12 9:45 PM, H. S. Teoh wrote:
 Besides, almost *all* of those wrapper ranges are currently _broken_ for
 transient ranges. You get undefined behaviour when you inadvertently
 pass a transient range to them.

It's not undefined behavior, it's just surprising behavior. UB would be a fair amount more worrisome.

Unless you know the internal implementation of the range, it is undefined (nothing in the spec specify what the range does in this case, which IS undefined behavior).
 With Andrei's proposal, all code that assumes transient .front with
 input ranges are broken by definition.

I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.

You never addressed the std.array.array() problem. Neither the fact that forward ranges may require to be transient (I have that in my rubik's cube solver, and H. S. Teoh seems to needs that too).
 How is this any better than deadalnix's solution? From what I can tell,
 it's a lot worse, for all of the above reasons.

I think we should start from here: the .transient proposal will not be accepted because it is too complex. Consider it a baseline that other proposals would be evaluated against. Then let's see how to devise a robust, simple solution.

I'd like to see a proposal discarded in favor of a better proposal. I'm certain I can say that we don't have one. Let me explain what is wrong in your proposal (=> forward range = non transient / input range = transient). First two concepts are packed into one. I can safely say that both are not related, and uses cases has already been presented in all possible combination of input/forward transient or not. This usually seems like a win, but ends up creating a more complex situation at the ends. I've made that mistake quite a lot of time myself, surely enough to be sure that this is a bad idea. See for instance how the private method made final automagicaly (which sounds nice at first) create inconsistent behavior when it come to NVI as explained in TDPL. This is typical of how different concept packed into one tends to explode at some point. Additionally, the proposed solution put a responsibility on the developer : he/she must ensure that all its code with input ranges is compatible with transient range. The fact is that transient stuff are highly unusual in most programming languages (even when possible, it is usually avoided like plague), and that all input ranges will not be transient. This WILL result in a lot of code in the field using input range but not supporting when it is transient. Putting the responsibility on the programmer never worked since programmer exists, and it is not going to improve soon.
Nov 05 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/6/12 1:26 AM, deadalnix wrote:
 Le 05/11/2012 22:51, Andrei Alexandrescu a crit :
 On 11/5/12 9:45 PM, H. S. Teoh wrote:
 Besides, almost *all* of those wrapper ranges are currently _broken_ for
 transient ranges. You get undefined behaviour when you inadvertently
 pass a transient range to them.

It's not undefined behavior, it's just surprising behavior. UB would be a fair amount more worrisome.

Unless you know the internal implementation of the range, it is undefined (nothing in the spec specify what the range does in this case, which IS undefined behavior).

That would be unspecified behavior.
 With Andrei's proposal, all code that assumes transient .front with
 input ranges are broken by definition.

I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.

You never addressed the std.array.array() problem. Neither the fact that forward ranges may require to be transient (I have that in my rubik's cube solver, and H. S. Teoh seems to needs that too).
 How is this any better than deadalnix's solution? From what I can tell,
 it's a lot worse, for all of the above reasons.

I think we should start from here: the .transient proposal will not be accepted because it is too complex. Consider it a baseline that other proposals would be evaluated against. Then let's see how to devise a robust, simple solution.

I'd like to see a proposal discarded in favor of a better proposal. I'm certain I can say that we don't have one.

Good solutions are found in the minds of people. Starting from the idea that .transient is unacceptably complicated, work from that angle. You can't claim you have reached perfection in design can you?
 Let me explain what is wrong
 in your proposal (=> forward range = non transient / input range =
 transient).

I am well aware of the relative tradeoffs. Arguing for .transient is futile at this point. What we need to do is find something better. Andrei
Nov 05 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 06/11/2012 01:21, Andrei Alexandrescu a crit :
 I'd like to see a proposal discarded in favor of a better proposal. I'm
 certain I can say that we don't have one.

Good solutions are found in the minds of people. Starting from the idea that .transient is unacceptably complicated, work from that angle. You can't claim you have reached perfection in design can you?

I certainly don't ! I'd be happy to switch to another solution granted it is superior. I just don't see any right now, which obviously don't mean none exist.
 Let me explain what is wrong
 in your proposal (=> forward range = non transient / input range =
 transient).

I am well aware of the relative tradeoffs. Arguing for .transient is futile at this point. What we need to do is find something better.

To be honest, my biggest fear isn't that this proposal is rejected, but that we fallback as default on the input range = transient / forward range = non transient scheme, because we fail to come up with something better, or that the status quo is choosen (as both seems to me worse than the .transient proposal). Now, as previously said, if someone come up with something better, I'd be happy to drop it completely. I'm not pushing this proposal because it is mine.
Nov 05 2012
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Nov 06, 2012 at 01:44:28AM +0100, deadalnix wrote:
 Le 06/11/2012 01:21, Andrei Alexandrescu a crit :

[...]
Good solutions are found in the minds of people. Starting from the
idea that .transient is unacceptably complicated, work from that
angle. You can't claim you have reached perfection in design can you?


I'd like to hear a counterproposal to the .transient idea. Let's not get into a battle of words with nothing on the table to discuss. [...]
 To be honest, my biggest fear isn't that this proposal is rejected,
 but that we fallback as default on the input range = transient /
 forward range = non transient scheme, because we fail to come up with
 something better, or that the status quo is choosen (as both seems to
 me worse than the .transient proposal).

[...] Yeah, that is my fear also. Which is why I'm making noise about this issue. The status quo is definitely out of the question, as it leads to undefined/unspecified behaviour with basically no warning. Conflating transience with input range is not workable IMO, because it leads to leaky abstractions like transient forward ranges being considered non-forward ranges because we arbitrarily decided that forward ranges cannot be transient. I've already given a number of non-trivial examples of transient forward ranges, and apparently deadalnix has also encountered the same issue. Denying these cases just for the sake of holding on to an idealized abstraction is only further proof that the abstraction is leaky. We all know from experience that leaky abstractions will only lead to trouble down the road. The only other remaining proposal so far is Jonathan's, that is to redefine all ranges to be non-transient. Then we fix byLine() to be non-transient, and everything else will work with no further trouble. We also don't get to use std.algorithm with any transient ranges, and will in all likelihood reinvent std.algorithm everywhere a transient range pops up. I don't see this as a viable solution either. So the .transient idea currently seems to be the best we have. I'd love to hear a superior proposal, if there is one. T -- I don't trust computers, I've spent too long programming to think that they can get anything right. -- James Miller
Nov 05 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/6/12 3:06 AM, H. S. Teoh wrote:
 I've already given a number of
 non-trivial examples of transient forward ranges, and apparently
 deadalnix has also encountered the same issue.

I'd like to build more of this argument. I pointed out that your range is actually defining .dup, not .save. I think an argument can be made that forward ranges with transient .front do not qualify as forward ranges, seeing as a forward range is modeled after a list that actually has data sitting in memory. Andrei
Nov 05 2012
next sibling parent deadalnix <deadalnix gmail.com> writes:
Le 06/11/2012 02:36, Andrei Alexandrescu a crit :
 On 11/6/12 3:06 AM, H. S. Teoh wrote:
 I've already given a number of
 non-trivial examples of transient forward ranges, and apparently
 deadalnix has also encountered the same issue.

I'd like to build more of this argument. I pointed out that your range is actually defining .dup, not .save.

Yes my code can also fall in the .dup stuff. If we actually make that difference, our case is moot, but a hell lot of code is broken as well. For instance, the way map is defined is highly incorrect as each call to .front can pop a newly generated element as well, and I'm pretty that isn't the only one. We are here back to equality vs identity, and it seems D also have some stuff to fix in that regard (notably struct and AA). Discussing that here only make the problem more confuse and is not helping as long as this question is open in D.
Nov 05 2012
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Nov 06, 2012 at 03:36:54AM +0200, Andrei Alexandrescu wrote:
 On 11/6/12 3:06 AM, H. S. Teoh wrote:
I've already given a number of non-trivial examples of transient
forward ranges, and apparently deadalnix has also encountered the
same issue.

I'd like to build more of this argument. I pointed out that your range is actually defining .dup, not .save. I think an argument can be made that forward ranges with transient .front do not qualify as forward ranges, seeing as a forward range is modeled after a list that actually has data sitting in memory.

[...] I suppose you could argue that way. But that still doesn't solve the problem of std.array.array with input ranges. Making std.array.array require forward ranges seems to defeat its purpose, to me. So even if you get rid of transience in forward ranges, you still need to make that distinction with input ranges. Which still requires large-scale code changes to existing Phobos algorithms. And which still requires some sort of complication to the existing range code. T -- Music critic: "That's an imitation fugue!"
Nov 05 2012
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/6/12 2:44 AM, deadalnix wrote:
 To be honest, my biggest fear isn't that this proposal is rejected, but
 that we fallback as default on the input range = transient / forward
 range = non transient scheme, because we fail to come up with something
 better, or that the status quo is choosen (as both seems to me worse
 than the .transient proposal).

I think the simplification of input range = transient and forward range = not transient has a lot going for it. It is simple, easy to explain and understand, and builds on simple real-life examples (buffered input and singly-linked lists). Clearly adding a new notion to the soup makes for more expressiveness, but it also makes for more complexity and subtlety in support for niche ranges. This should not be neglected. Andrei
Nov 05 2012
parent deadalnix <deadalnix gmail.com> writes:
Le 06/11/2012 02:40, Andrei Alexandrescu a crit :
 On 11/6/12 2:44 AM, deadalnix wrote:
 To be honest, my biggest fear isn't that this proposal is rejected, but
 that we fallback as default on the input range = transient / forward
 range = non transient scheme, because we fail to come up with something
 better, or that the status quo is choosen (as both seems to me worse
 than the .transient proposal).

I think the simplification of input range = transient and forward range = not transient has a lot going for it. It is simple, easy to explain and understand, and builds on simple real-life examples (buffered input and singly-linked lists). Clearly adding a new notion to the soup makes for more expressiveness, but it also makes for more complexity and subtlety in support for niche ranges. This should not be neglected.

At this point, if transient is out I'd prefer Jonathan's proposal were everything is non transient. This is clearly simpler to use and break less code. Indeed, the added complexity of .transient exists. The beauty of it is that it is possible to write 100% correct code without even knowing .transient exists. This is why I like this option : the added complexity only exists for the programmer that what to explore the arcane of the language (which include you and me, but not most D users).
Nov 05 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Nov 05, 2012 at 11:51:35PM +0200, Andrei Alexandrescu wrote:
[...]
With Andrei's proposal, all code that assumes transient .front with
input ranges are broken by definition.

I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.

Then std.array.array is broken by definition, and cannot be implemented with anything less than a forward range. This will very likely break a lot of existing code.
I've looked over std.algorithm -- there are a *lot* of places with
this assumption. Instead of getting correct default behaviour, you
get a whole bunch of broken code with buggy behaviour, unless you
first rewrite a lot of code in std.algorithm (and probably std.range
as well).

Could you please put together a list of these algorithms so we have it? Thanks.

This is probably an incomplete list, but it's a start: std.algorithm.reduce - when no seed is given. std.algorithm.joiner - both variants (I have a fix for the second variant) std.algorithm.group std.algorithm.minCount (std.algorithm.minPos - but takes forwardRange, so probably OK) (std.algorithm.Levenshtein - takes forwardRange, probably OK) (std.algorithm.makeIndex - takes forwardRange, probably OK) std.algorithm.splitter - tries to take slices without checking if range is sliceable std.algorithm.uniq std.algorithm.topNCopy - copies .front of input range. std.algorithm.NWayUnion - copies .front of (possibly input) outer range into a heap (std.algorithm.largestPartialIntersection - uses NWayUnion) std.array.array - tries to copy input range std.array.insertInPlace - tries to copy input range std.array.join - tries to copy input range std.stdio.writeln - there's too much code here so I didn't narrow it down, but testing with a transient range shows that it doesn't work correctly. This probably implicates std.format somewhere. I didn't check the other Phobos modules, but basically any code that currently takes an input range needs to be audited for this issue. [...]
How is this any better than deadalnix's solution?  From what I can tell,
it's a lot worse, for all of the above reasons.

I think we should start from here: the .transient proposal will not be accepted because it is too complex. Consider it a baseline that other proposals would be evaluated against. Then let's see how to devise a robust, simple solution.

[...] I still think forcing input ranges to be transient is oversimplifying the issue. Whether a range is transient is orthogonal to whether you can .save the current position or whether you can traverse it in two directions. Trying to conflate the two only leads to leaky abstractions. The only real solution IMO is to address the issue head-on: either recognize transience as an inherent property of ranges, or (re)define ranges to exclude all transience. If we recognize transience as an inherent property of ranges, then no matter what we do, there's going to be complications, because you'll always have to deal with two cases. Either an algorithm can handle transience, or it can't. If it can't, there needs to be a way to make it so that it will reject transient ranges somehow. With deadalnix's proposal, the overhead of doing this is more or less minimized, but it is still there. If we exclude all transience, then there is no additional complication. We just have to fix the current code and make it crystal clear in the documentation that .front must always be persistent no matter what. The disadvantage here is that some algorithms, like find(), don't *need* the persistence of .front, so you lose generality. Any code that works with transient ranges will have to stick with foreach, or reinvent std.algorithm. With .transient proposal, this is the default behaviour, except that we have the *option* of using transient ranges if both the range type and the algorithm can handle it. So the .transient proposal is pretty close to a comfortable balance between the two extremes. If it is still considered too complicated, then I'd like to hear what other proposals will address all the underlying issues in just as nice a way (or an even nicer way), without compromising this balance between the two extremes. T -- In order to understand recursion you must first understand recursion.
Nov 05 2012
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, November 05, 2012 16:49:42 H. S. Teoh wrote:
 On Mon, Nov 05, 2012 at 11:51:35PM +0200, Andrei Alexandrescu wrote:
 [...]
 
With Andrei's proposal, all code that assumes transient .front with
input ranges are broken by definition.

I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.

Then std.array.array is broken by definition, and cannot be implemented with anything less than a forward range. This will very likely break a lot of existing code.

We can create an hasTransientFront trait for checking whether front is transient or not. It can't be 100% accurate, but it would fall on the side of declaring something transient when it wasn't, so it wouldn't declare something transient to be non-transient. Then any range for which hasTransientFront was false could work with std.array.array. For the rest, they can do something like auto arr = std.array.array(map!"a.dup"(file.byLine())); But it's certainly true that fixing std.array.array will break code, which is annoying. But as long as front can be transient, there's no way around that. byLine cannot possibly work with std.array.array as it stands. The only solutions that I see at this point are 1. Exclude transience completely. 2. Mark ranges as transient in some way and force algorithms to take this new trait into account. 3. Insist that input ranges have transient fronts (save perhaps when it can be statically verified that they aren't based on front's type) and that anything requiring a non-transient front require forward ranges. 4. Make all ranges non-transient but provide a way to get at a transient version of it and force algorithms to take this new property into account. All 4 solutions have problems. They just have different problems.
 I still think forcing input ranges to be transient is oversimplifying
 the issue. Whether a range is transient is orthogonal to whether you can
 .save the current position or whether you can traverse it in two
 directions. Trying to conflate the two only leads to leaky abstractions.

I completly agree that transience and whether something is an input range are orthogonal, but we need to simplify.
 The only real solution IMO is to address the issue head-on: either
 recognize transience as an inherent property of ranges, or (re)define
 ranges to exclude all transience.

Personally, I'm all for excluding transience completely and insisting that anything which would be transient use opApply, but Andrei doesn't like that idea. It would certainly be the simplest solution though, and it probably wouldn't really cause much of a problem for ranges like ByLine, because in most cases what you do is use them with a foreach loop. std.array.array would be the main loss there, but if opApply is used for foreach rather than the range functions (I don't know which currently gets precedence), then ByLine can become a normal range when used with range functions (i.e. no transience) and just reuse its buffer with opApply. Then it would work with range-based functions but still avoid extra allocations when simply iterating over it. That would be my ideal solution at this point, but clearly not everyone agrees. - Jonathan M Davis
Nov 05 2012
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Nov 06, 2012 at 02:03:45AM +0100, Jonathan M Davis wrote:
[...]
 The only solutions that I see at this point are
 
 1. Exclude transience completely.
 
 2. Mark ranges as transient in some way and force algorithms to take
 this new trait into account.
 
 3. Insist that input ranges have transient fronts (save perhaps when
 it can be statically verified that they aren't based on front's type)
 and that anything requiring a non-transient front require forward
 ranges.
 
 4. Make all ranges non-transient but provide a way to get at a
 transient version of it and force algorithms to take this new property
 into account.
 
 All 4 solutions have problems. They just have different problems.

[...] Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it). Then an algorithm like std.array.array can be written something like this: auto array(R)(R range) { auto a = appender!(ElementType!R)(); while (!range.empty) { // Self-documenting: we expect to get a // persistent copy of the front value. a.put(range.copyFront); range.popFront(); } return a.data; } OTOH, an algorithm that is agnostic to transience, like find(), can be written something like this: R find(R,S)(R haystack, S needle) { while (!haystack.empty) { // Self-documenting: we don't expect a // persistent front value. if (haystack.refFront == needle) return haystack; haystack.popFront(); } return haystack; } Of course, this immediately breaks many ranges, because they only have a single .front currently. Which is bad. So we make use of UFCS: property auto copyFront(R)(R range) { // Not sure how to do this yet, the idea is that // copyFront should be default. return range.front; } property auto refFront(R)(R range) { // By default, .refFront is the same as copyFront. return range.copyFront; } ByLine can then do this: struct ByLine { char[] buf; property auto copyFront() { return buf.dup; } property auto refFront() { return buf; } } I haven't worked out all the details yet, but this is the gist of it. While it's still not perfect, it is a slight improvement over the .transient proposal in that no new range types are involved. There is still the issue of naming: I chose .copyFront and .refFront just for illustration purposes. I'm thinking probably we can make .front == .copyFront, so that it will automatically work with (almost) all current ranges, which should be non-transient. UFCS saves us from having to implement .refFront for everything except actual transient ranges, which should be only a few rare cases. Of course, this has the disadvantage of needing to update existing algorithms to use .refFront or .copyFront where appropriate -- but then, we have to do that anyway if we're going to support transient ranges at all. Maybe somebody can improve on this idea? T -- Long, long ago, the ancient Chinese invented a device that lets them see through walls. It was called the "window".
Nov 05 2012
next sibling parent reply "Tommi" <tommitissari hotmail.com> writes:
I don't think this whole issue has anything to do with ranges. I 
think this is an issue of assuming that the symbol = means "copy 
what's on the right to what's on the left". When in reality, = 
could mean: (if what's on the right has reference semantics) 
"make what's on the left reference the same thing that the thing 
on the right references".

I think all range types should be allowed to return whatever they 
want from their front property. It's the responsibility of the 
user of the range to *actually* copy what front returns (if 
that's what he intends), instead of assuming that = means copy.

In the code below, X marks the bug:

module main;

import std.stdio;
import std.range;

class MyData
{
     int _value;
}

struct MyFwdRange
{
     MyData _myData;

     this(MyData md)
     {
         _myData = md;
     }

      property MyData front()
     {
         return _myData;
     }

      property bool empty() const
     {
         return _myData._value > 10;
     }

     void popFront()
     {
         _myData._value++;
     }

      property MyFwdRange save() const
     {
         auto copy = MyFwdRange();
         copy._myData = new MyData();
         copy._myData._value = _myData._value;
         return copy;
     }
}

void printFirstAndSecond(R)(R range)
     if (isForwardRange!MyFwdRange)
{
     //         X
     auto first = range.front; // That is *not* copying

     range.popFront();
     writeln(first._value, " ", range.front._value);
}

void main()
{
     auto mfr = MyFwdRange(new MyData());

     printFirstAndSecond(mfr); // prints: 1 1 (instead of 0 1)
}
Nov 05 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Nov 06, 2012 at 05:18:18AM +0100, Tommi wrote:
 I don't think this whole issue has anything to do with ranges. I
 think this is an issue of assuming that the symbol = means "copy
 what's on the right to what's on the left". When in reality, = could
 mean: (if what's on the right has reference semantics) "make what's
 on the left reference the same thing that the thing on the right
 references".
 
 I think all range types should be allowed to return whatever they
 want from their front property. It's the responsibility of the user
 of the range to *actually* copy what front returns (if that's what
 he intends), instead of assuming that = means copy.

[...] The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type. Unless we introduce a standardized deep copy operation to D, like a .clone method that all copyable types implement, this solution isn't really viable. T -- The computer is only a tool. Unfortunately, so is the user. -- Armaphine, K5
Nov 05 2012
parent reply "Tommi" <tommitissari hotmail.com> writes:
On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:
 The problem is that you can't do this in generic code, because 
 generic code by definition doesn't know how to copy an
 arbitrary type.

I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.
Nov 05 2012
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:
 On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:
The problem is that you can't do this in generic code, because
generic code by definition doesn't know how to copy an arbitrary
type.

I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.

OK I worded that poorly. All I meant was that currently, there is no generic way to make a copy of something. It could be construed to be a bug or a language deficiency, but that's how things are currently. One *could* introduce a new language construct for making a copy of something, of course, but that leads to all sorts of issues about implicit allocation, how to eliminate unnecessary implicit copying, etc.. It's not a simple problem, in spite of the simplicity of stating the problem. T -- Verbing weirds language. -- Calvin (& Hobbes)
Nov 05 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/6/12 7:48 AM, H. S. Teoh wrote:
 On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:
 On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:
 The problem is that you can't do this in generic code, because
 generic code by definition doesn't know how to copy an arbitrary
 type.

I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.

OK I worded that poorly. All I meant was that currently, there is no generic way to make a copy of something. It could be construed to be a bug or a language deficiency, but that's how things are currently. One *could* introduce a new language construct for making a copy of something, of course, but that leads to all sorts of issues about implicit allocation, how to eliminate unnecessary implicit copying, etc.. It's not a simple problem, in spite of the simplicity of stating the problem.

Here's where user defined tributes would help a lot. We'd then define owned to mention that a class reference field inside an object must be duplicated upon copy: class A { ... } struct B { owned A payload; A another; ... } That way a generic clone() routine could be written. Andrei
Nov 05 2012
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 06/11/2012 07:56, Andrei Alexandrescu a crit :
 On 11/6/12 7:48 AM, H. S. Teoh wrote:
 On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:
 On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:
 The problem is that you can't do this in generic code, because
 generic code by definition doesn't know how to copy an arbitrary
 type.

I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.

OK I worded that poorly. All I meant was that currently, there is no generic way to make a copy of something. It could be construed to be a bug or a language deficiency, but that's how things are currently. One *could* introduce a new language construct for making a copy of something, of course, but that leads to all sorts of issues about implicit allocation, how to eliminate unnecessary implicit copying, etc.. It's not a simple problem, in spite of the simplicity of stating the problem.

Here's where user defined tributes would help a lot. We'd then define owned to mention that a class reference field inside an object must be duplicated upon copy: class A { ... } struct B { owned A payload; A another; ... } That way a generic clone() routine could be written.

You mentioned me once that AOP was useless in D.
Nov 06 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/6/12 3:50 PM, deadalnix wrote:
 Le 06/11/2012 07:56, Andrei Alexandrescu a crit :
 On 11/6/12 7:48 AM, H. S. Teoh wrote:
 On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:
 On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:
 The problem is that you can't do this in generic code, because
 generic code by definition doesn't know how to copy an arbitrary
 type.

I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.

OK I worded that poorly. All I meant was that currently, there is no generic way to make a copy of something. It could be construed to be a bug or a language deficiency, but that's how things are currently. One *could* introduce a new language construct for making a copy of something, of course, but that leads to all sorts of issues about implicit allocation, how to eliminate unnecessary implicit copying, etc.. It's not a simple problem, in spite of the simplicity of stating the problem.

Here's where user defined tributes would help a lot. We'd then define owned to mention that a class reference field inside an object must be duplicated upon copy: class A { ... } struct B { owned A payload; A another; ... } That way a generic clone() routine could be written.

You mentioned me once that AOP was useless in D.

What's the connection? Andrei
Nov 06 2012
parent deadalnix <deadalnix gmail.com> writes:
Le 06/11/2012 15:44, Andrei Alexandrescu a crit :
 Here's where user defined  tributes would help a lot. We'd then define
  owned to mention that a class reference field inside an object must be
 duplicated upon copy:

 class A { ... }

 struct B
 {
  owned A payload;
 A another;
 ...
 }

 That way a generic clone() routine could be written.

You mentioned me once that AOP was useless in D.

What's the connection?

This is OT. But this owned stuff coupled with code that is generated depending on its presence IS AOP.
Nov 06 2012
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 11/06/2012 07:56 AM, Andrei Alexandrescu wrote:
 On 11/6/12 7:48 AM, H. S. Teoh wrote:
 On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:
 On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:
 The problem is that you can't do this in generic code, because
 generic code by definition doesn't know how to copy an arbitrary
 type.

I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.

OK I worded that poorly. All I meant was that currently, there is no generic way to make a copy of something. It could be construed to be a bug or a language deficiency, but that's how things are currently. One *could* introduce a new language construct for making a copy of something, of course, but that leads to all sorts of issues about implicit allocation, how to eliminate unnecessary implicit copying, etc.. It's not a simple problem, in spite of the simplicity of stating the problem.

Here's where user defined tributes would help a lot. We'd then define owned to mention that a class reference field inside an object must be duplicated upon copy: class A { ... } struct B { owned A payload; A another; ... } That way a generic clone() routine could be written. Andrei

That is actually the first thing that I will use the new attribute system for.
Nov 06 2012
prev sibling next sibling parent "Mehrdad" <wfunction hotmail.com> writes:
On Tuesday, 6 November 2012 at 04:49:45 UTC, Tommi wrote:
 On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:
 The problem is that you can't do this in generic code, because 
 generic code by definition doesn't know how to copy an 
 arbitrary type.

I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.

C++ as a language doesn't, but if you follow the convention that C++ establishes in its libraries (where copy assignment & copy construction == deep copy), it always works out correctly. D doesn't have that convention so that's why we're running into trouble.
Nov 05 2012
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/6/12 6:49 AM, Tommi wrote:
 On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:
 The problem is that you can't do this in generic code, because generic
 code by definition doesn't know how to copy an
 arbitrary type.

I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.

Languages commonly have trouble defining comparison and copying generically. More often than not user intervention is needed (e.g. see Java's clone, Lisp's many comparison operators etc). Andrei
Nov 05 2012
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/6/12 4:36 AM, H. S. Teoh wrote:
 Hmm. Another idea just occurred to me. The basic problem here is that we
 are conflating two kinds of values, transient and persistent, under a
 single name .front. What if we explicitly name them? Say, .copyFront for
 the non-transient value and .refFront for the transient value (the
 names are unimportant right now, let's consider the semantics of it).

We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine. Andrei
Nov 05 2012
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, November 06, 2012 08:49:26 Andrei Alexandrescu wrote:
 On 11/6/12 4:36 AM, H. S. Teoh wrote:
 Hmm. Another idea just occurred to me. The basic problem here is that we
 are conflating two kinds of values, transient and persistent, under a
 single name .front. What if we explicitly name them? Say, .copyFront for
 the non-transient value and .refFront for the transient value (the
 names are unimportant right now, let's consider the semantics of it).

We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.

peekFront would work better than copy, because whether front needs to be copied or not doesn't necessarily have much to do with its type. For instance, byLine/eachLine can return char[] from front just fine and still have it be a new array every time. So, while in some cases, you can tell from the type that no copy is needed (e.g. string), you can't tell in the general case and would be forced to make needless copies in a number of cases. For instance, every range of class objects would end up having to make a copy, because they'd be mutable reference types, and without knowing that the range is doing, you have no way of knowing whether it keeps replacing the objects referred to by front or not (it's not particularly likely that it would be, but you can't tell for sure just the same). If we defined peekFront via UFCS as a wrapper which calls front, then anything wanting to use peekFront could use peekFront regardless of whether the type defined it or not. So, that would reduce the impact caused by its introduction, but it would still impact a lot of range types ultimately, because we'd have to create appropriate wrappers for peekFront in most of them, or we'd end up making unnecessary copies. I don't like how much this impacts, but as H. S. Teoh points out, we don't exactly have very many options with minimal impact beyond banning transient fronts entirely (which I'd honestly like to do just the same). At least this avoids the need to create more traits to test ranges for, since if we create a free function peekFront, all range types can just assume that it's there and create wrappers for it without caring whether the wrapped range defines it itself or uses the free function. And it's less complicated than the .transient suggestion. Though it _does_ introduce the possibility of front and peekFront returning completely different types, which could complicate things a bit. It might be better to require that they be identical to avoid that problem. For better or worse though, this approach would mean that byLine (or eachLine or whatever) wouldn't be reusing the buffer with foreach like they do now, though I suppose that you could make them have opApply which does the same thing as now (meaning that it effectively uses peekFront), and then any range- based functions would use front until they were updated to use peekFront if appropriate. But then again, maybe we want byLine/eachLine to copy by default, since that's safer, much as it's less efficient, since then we have safe by default but still have an explicit means to be more efficient. That fits in well with our general approach. peekFront may be the way to go, but I think that we need to think through the consequences (like the potential problems caused by front and peekFront returning different types) before we decide on this. - Jonathan M Davis
Nov 05 2012
parent deadalnix <deadalnix gmail.com> writes:
Le 06/11/2012 08:17, Jonathan M Davis a écrit :
 If we defined peekFront via UFCS as a wrapper which calls front, then anything
 wanting to use peekFront could use peekFront regardless of whether the type
 defined it or not. So, that would reduce the impact caused by its introduction,
 but it would still impact a lot of range types ultimately, because we'd have
 to create appropriate wrappers for peekFront in most of them, or we'd end up
 making unnecessary copies.

Assuming you call front on the source range, you don't need to copy : you already work on a clean copy. With that one, you ends up duplicating all code that use front on wrapper range into one that use peekFront as well. I'm not sure this is better than .transient, but maybe.
 I don't like how much this impacts, but as H. S. Teoh points out, we don't
 exactly have very many options with minimal impact beyond banning transient
 fronts entirely (which I'd honestly like to do just the same).

If we really want to go simple, this seems like the way to go.
 At least this avoids the need to create more traits to test ranges for, since
 if we create a free function peekFront, all range types can just assume that
 it's there and create wrappers for it without caring whether the wrapped range
 defines it itself or uses the free function. And it's less complicated than the
 .transient suggestion. Though it _does_ introduce the possibility of front and
 peekFront returning completely different types, which could complicate things a
 bit. It might be better to require that they be identical to avoid that
 problem.

I'm not sure yet this is simpler.
 For better or worse though, this approach would mean that byLine (or eachLine
 or whatever) wouldn't be reusing the buffer with foreach like they do now,
 though I suppose that you could make them have opApply which does the same
 thing as now (meaning that it effectively uses peekFront), and then any range-
 based functions would use front until they were updated to use peekFront if
 appropriate. But then again, maybe we want byLine/eachLine to copy by default,
 since that's safer, much as it's less efficient, since then we have safe by
 default but still have an explicit means to be more efficient. That fits in
well
 with our general approach.

Safe by default seems like the way to go.
 peekFront may be the way to go, but I think that we need to think through the
 consequences (like the potential problems caused by front and peekFront
 returning different types) before we decide on this.

Yes especially string/char[] has already been detected as a potential problem.
Nov 06 2012
prev sibling next sibling parent deadalnix <deadalnix gmail.com> writes:
Le 06/11/2012 07:49, Andrei Alexandrescu a crit :
 On 11/6/12 4:36 AM, H. S. Teoh wrote:
 Hmm. Another idea just occurred to me. The basic problem here is that we
 are conflating two kinds of values, transient and persistent, under a
 single name .front. What if we explicitly name them? Say, .copyFront for
 the non-transient value and .refFront for the transient value (the
 names are unimportant right now, let's consider the semantics of it).

We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.

This have the same issue than .transient have in regard of transformer ranges (BTW what is the correct terminology for that ?).
Nov 06 2012
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Nov 05, 2012 at 11:17:08PM -0800, Jonathan M Davis wrote:
 On Tuesday, November 06, 2012 08:49:26 Andrei Alexandrescu wrote:
 On 11/6/12 4:36 AM, H. S. Teoh wrote:
 Hmm. Another idea just occurred to me. The basic problem here is
 that we are conflating two kinds of values, transient and
 persistent, under a single name .front. What if we explicitly name
 them? Say, .copyFront for the non-transient value and .refFront
 for the transient value (the names are unimportant right now,
 let's consider the semantics of it).

We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.


I like .peekFront. It's easy to understand, and is self-documenting. And this approach is safe by default (all current code uses .front), and can be implemented incrementally (doesn't require updating all of Phobos at once to avoid transience-related bugs in the interim).
 peekFront would work better than copy, because whether front needs to
 be copied or not doesn't necessarily have much to do with its type.

Yeah, I think trying to guess transience from the return type is risky, and prone to bugs. OTOH, having a generic language-wide way to duplicate a value is something worth considering. I think it will allow us to solve a number of other nagging issues. [...]
 At least this avoids the need to create more traits to test ranges
 for, since if we create a free function peekFront, all range types can
 just assume that it's there and create wrappers for it without caring
 whether the wrapped range defines it itself or uses the free function.
 And it's less complicated than the .transient suggestion. Though it
 _does_ introduce the possibility of front and peekFront returning
 completely different types, which could complicate things a bit. It
 might be better to require that they be identical to avoid that 
 problem.

The free function can be made to always conform to the same type as .front: ElementType!R peekFront(R)(R range) { return range.front; } Though it still doesn't prevent a member called .peekFront of a different type. We *could* use template constraints to enforce that though: auto myRangeFunc(R)(R range) if (is(typeof(range.front)==typeof(range.peekFront))) ... This could be put in a template that the other range-checking templates can call.
 For better or worse though, this approach would mean that byLine (or
 eachLine or whatever) wouldn't be reusing the buffer with foreach like
 they do now, though I suppose that you could make them have opApply
 which does the same thing as now (meaning that it effectively uses
 peekFront), and then any range- based functions would use front until
 they were updated to use peekFront if appropriate. But then again,
 maybe we want byLine/eachLine to copy by default, since that's safer,
 much as it's less efficient, since then we have safe by default but
 still have an explicit means to be more efficient. That fits in well
 with our general approach.

I think we should make byLine copy by default. Only if you explicitly call peekFront you get the non-copying behaviour. This fits better with D's motto of being safe by default, and fast if you know what you're doing.
 peekFront may be the way to go, but I think that we need to think
 through the consequences (like the potential problems caused by front
 and peekFront returning different types) before we decide on this.

[...] For simplicity, I think we should just enforce typeof(.front) == typeof(.peekFront). We will get needless complications otherwise. T -- We've all heard that a million monkeys banging on a million typewriters will eventually reproduce the entire works of Shakespeare. Now, thanks to the Internet, we know this is not true. -- Robert Wilensk
Nov 06 2012
parent "Mehrdad" <wfunction hotmail.com> writes:
On Tuesday, 6 November 2012 at 16:09:59 UTC, H. S. Teoh wrote:
 On Mon, Nov 05, 2012 at 11:17:08PM -0800, Jonathan M Davis 
 wrote:
 On Tuesday, November 06, 2012 08:49:26 Andrei Alexandrescu 
 wrote:
 On 11/6/12 4:36 AM, H. S. Teoh wrote:
 One quite simple approach would be to define (on the 
 contrary) .peekFront, which means "yeah, I'd like to take a 
 peek at the front but I don't plan to store it anywhere".



That's not what "peek" means in any of the languages I've seen. Peek just means "show me the element, but don't pop it off". It does not have to do anything with how the caller uses the result. It would be very confusing to have both front and peekFront.
Nov 06 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 06/11/2012 07:49, Andrei Alexandrescu a crit :
 On 11/6/12 4:36 AM, H. S. Teoh wrote:
 Hmm. Another idea just occurred to me. The basic problem here is that we
 are conflating two kinds of values, transient and persistent, under a
 single name .front. What if we explicitly name them? Say, .copyFront for
 the non-transient value and .refFront for the transient value (the
 names are unimportant right now, let's consider the semantics of it).

We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine. Andrei

Is it possible to have the pro and cons of peekFront vs transient ?
Nov 06 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Nov 06, 2012 at 10:03:56PM +0100, deadalnix wrote:
 Le 06/11/2012 07:49, Andrei Alexandrescu a crit :
On 11/6/12 4:36 AM, H. S. Teoh wrote:
Hmm. Another idea just occurred to me. The basic problem here is
that we are conflating two kinds of values, transient and
persistent, under a single name .front. What if we explicitly name
them? Say, .copyFront for the non-transient value and .refFront for
the transient value (the names are unimportant right now, let's
consider the semantics of it).

We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.


[...]
 Is it possible to have the pro and cons of peekFront vs transient ?

I'll give it a shot, since nobody else is replying: Both .peekFront and .transient require at least existing transient ranges to be modified, as well as algorithms that can take advantage of them. - For .peekFront, existing transient ranges' .front is just renamed to .peekFront, and we add a new .front which returns the .dup (or otherwise copy) of .peekFront. For .transient, we need to write a .transient method that returns a wrapper struct whose .front is transient. * So .peekFront is slightly simpler in this case. - For .peekFront, algorithms that can handle transience can simply have all occurrences of .front replaced with .peekFront. Whereas for .transient, they will need to modified to explicitly call .transient, save that range, and use that instead of the passed-in range. * So .peekFront is slightly simpler in this case. - For .peekFront, algorithms that *can't* handle transience will have to be rewritten. Same goes for .transient. The redeeming factor in both cases is that algorithms that aren't rewritten yet will continue to work as before, just a bit slower. They will also begin to work correctly with transient ranges even _before_ being rewritten, whereas right now they're broken. * Here, .peekFront and .transient are equal in terms of effort required. - For .peekFront, the .front of *any* range is guaranteed to be non-transient, so we never have to worry about whether a particular range's .front is transient or not. User code has no possibility of passing a transient range into an algorithm that cannot handle it. For .transient, the .front of ranges is non-transient by default, but once you call .transient on it, you get a range whose .front is transient. There is a small possibility of wrong user code that passes a .transient range to an algorithm that can't handle it. * So .peekFront is slightly safer on this point. These are the points that I can think of, off the top of my head. Are there any others? Either way, both .peekFront and .transient requires rewriting currently transient ranges (which AFAIK is only byLine) and any algorithms that should be able to handle transience (there are a few I can think of, like std.algorithm.joiner, std.array.join, and maybe NWayUnion and writeln & co.). We can't do better than this if we're going to support transience at all. I note, though, that the list of algorithms that need to be updated is shorter (perhaps significantly so) than the list of currently-broken algorithms that I posted earlier, because many of the algorithms in that list *cannot* be implemented to handle transience. E.g. there is no way to implement uniq on a transient input range since there's no way to save the previous value seen, so there's no way to compare two adjacent values. Likewise, findAdjacent cannot be implemented for transient ranges for the same reason. Once these are eliminated from the list, there should only be a few algorithms left. (And I already have a version of joiner that works correctly with transient ranges.) T -- Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder
Nov 07 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 07/11/2012 19:24, H. S. Teoh a crit :
 On Tue, Nov 06, 2012 at 10:03:56PM +0100, deadalnix wrote:
 Le 06/11/2012 07:49, Andrei Alexandrescu a crit :
 On 11/6/12 4:36 AM, H. S. Teoh wrote:
 Hmm. Another idea just occurred to me. The basic problem here is
 that we are conflating two kinds of values, transient and
 persistent, under a single name .front. What if we explicitly name
 them? Say, .copyFront for the non-transient value and .refFront for
 the transient value (the names are unimportant right now, let's
 consider the semantics of it).

We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.


[...]
 Is it possible to have the pro and cons of peekFront vs transient ?

I'll give it a shot, since nobody else is replying: Both .peekFront and .transient require at least existing transient ranges to be modified, as well as algorithms that can take advantage of them. - For .peekFront, existing transient ranges' .front is just renamed to .peekFront, and we add a new .front which returns the .dup (or otherwise copy) of .peekFront. For .transient, we need to write a .transient method that returns a wrapper struct whose .front is transient. * So .peekFront is slightly simpler in this case.

Agreed !
 - For .peekFront, algorithms that can handle transience can simply have
    all occurrences of .front replaced with .peekFront. Whereas for
    .transient, they will need to modified to explicitly call .transient,
    save that range, and use that instead of the passed-in range.
     * So .peekFront is slightly simpler in this case.

.transient seems simpler to me in this case. Adding one line is easier than replacing all occurrences of something.
 - For .peekFront, algorithms that *can't* handle transience will have to
    be rewritten. Same goes for .transient. The redeeming factor in both
    cases is that algorithms that aren't rewritten yet will continue to
    work as before, just a bit slower. They will also begin to work
    correctly with transient ranges even _before_ being rewritten, whereas
    right now they're broken.
     * Here, .peekFront and .transient are equal in terms of effort
       required.

Indeed.
 - For .peekFront, the .front of *any* range is guaranteed to be
    non-transient, so we never have to worry about whether a particular
    range's .front is transient or not. User code has no possibility of
    passing a transient range into an algorithm that cannot handle it.
    For .transient, the .front of ranges is non-transient by default, but
    once you call .transient on it, you get a range whose .front is
    transient. There is a small possibility of wrong user code that passes
    a .transient range to an algorithm that can't handle it.
     * So .peekFront is slightly safer on this point.

 These are the points that I can think of, off the top of my head. Are
 there any others?

 Either way, both .peekFront and .transient requires rewriting currently
 transient ranges (which AFAIK is only byLine) and any algorithms that
 should be able to handle transience (there are a few I can think of,
 like std.algorithm.joiner, std.array.join, and maybe NWayUnion and
 writeln&  co.). We can't do better than this if we're going to support
 transience at all.

 I note, though, that the list of algorithms that need to be updated is
 shorter (perhaps significantly so) than the list of currently-broken
 algorithms that I posted earlier, because many of the algorithms in that
 list *cannot* be implemented to handle transience. E.g. there is no way
 to implement uniq on a transient input range since there's no way to
 save the previous value seen, so there's no way to compare two adjacent
 values. Likewise, findAdjacent cannot be implemented for transient
 ranges for the same reason. Once these are eliminated from the list,
 there should only be a few algorithms left. (And I already have a
 version of joiner that works correctly with transient ranges.)

OK, overall, .peekFront seems like a good idea. Something I'm afraid with .peekFront is code duplication. Let's take the joiner example. Joiner's transientness depend on its source transientness. This seems to me like a very common case for transformer ranges. If we choose the .peekFront, the naive thing to do is to implement the same algorithm twice, using front and peekFront, which is code duplication, and usually a bad idea. Is a solution known here ? Could alias template parameter help ? I'm not sure what is the solution, but if one exist, I'm sold for .peekFront ( .transientFront seems like a better name as peek have other meaning in other languages ).
Nov 07 2012
next sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, November 07, 2012 21:31:13 deadalnix wrote:
 OK, overall, .peekFront seems like a good idea. Something I'm afraid
 with .peekFront is code duplication.
 
 Let's take the joiner example. Joiner's transientness depend on its
 source transientness. This seems to me like a very common case for
 transformer ranges. If we choose the .peekFront, the naive thing to do
 is to implement the same algorithm twice, using front and peekFront,
 which is code duplication, and usually a bad idea.

Why would you need to duplicate anything?. If you can implement it using peekFront, then you use peekFront. If you can't, you use front. And anything which uses front when you could have used peekFront will still work. Also, if a free function peekFront which forwards to front is defined, then all range- based functions can use peekFront if they need it regardless of whether a range defines it (it's just that it would do the same as front in the case where the range didn't define it). - Jonathan M Davis
Nov 07 2012
parent reply deadalnix <deadalnix gmail.com> writes:
Le 07/11/2012 22:40, Jonathan M Davis a écrit :
 On Wednesday, November 07, 2012 21:31:13 deadalnix wrote:
 OK, overall, .peekFront seems like a good idea. Something I'm afraid
 with .peekFront is code duplication.

 Let's take the joiner example. Joiner's transientness depend on its
 source transientness. This seems to me like a very common case for
 transformer ranges. If we choose the .peekFront, the naive thing to do
 is to implement the same algorithm twice, using front and peekFront,
 which is code duplication, and usually a bad idea.

Why would you need to duplicate anything?. If you can implement it using peekFront, then you use peekFront. If you can't, you use front. And anything which uses front when you could have used peekFront will still work. Also, if a free function peekFront which forwards to front is defined, then all range- based functions can use peekFront if they need it regardless of whether a range defines it (it's just that it would do the same as front in the case where the range didn't define it). - Jonathan M Davis

OK, seeing your answer and H. S Teoh, I think I badly expressed myself, because none of you understood. Let's try to explain it better. So back on our joiner range. This range have a source, and will be given to a consumer as follow : source -> joiner -> consumer . Now let's consider that source provide its own, transient peekFront. Now joiner can provide both peekFront (based on source.peekFront), which is transcient and front (based on source.front) which is not transcient. The fact is that both front and peekFront in joiner will have the same implementation, because the transience of joiner depends of the transience of its source (like most tranformer ranges). Or, with sample code : struct Joiner { R source; // Fields and range stuffs. property front() { // Some computation using source.front } property peekFront() { // Exact same computation using source.peekFront } } I hope this make the code duplication more apparent. It is impossible for joiner to provide only a version with peekFront, because joiner.front will be transient, and if it doesn't provide the version with .peekFront, it can't take advantage of the improvement that transience can provide. So my question is, how this duplication can be avoided ? If it can, .peekFront is definitively the way to go. But if it can't, this seems like a really problematic point.
Nov 09 2012
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Nov 09, 2012 at 02:52:20PM +0100, deadalnix wrote:
[...]
 OK, seeing your answer and H. S Teoh, I think I badly expressed
 myself, because none of you understood. Let's try to explain it
 better.
 
 So back on our joiner range. This range have a source, and will be
 given to a consumer as follow :
 source -> joiner -> consumer .
 
 Now let's consider that source provide its own, transient peekFront.
 Now joiner can provide both peekFront (based on source.peekFront),
 which is transcient and front (based on source.front) which is not
 transcient.
 
 The fact is that both front and peekFront in joiner will have the
 same implementation, because the transience of joiner depends of the
 transience of its source (like most tranformer ranges).
 
 Or, with sample code :
 
 struct Joiner {
     R source;
     // Fields and range stuffs.
 
      property front() {
         // Some computation using source.front
     }
 
      property peekFront() {
         // Exact same computation using source.peekFront
     }
 }
 
 I hope this make the code duplication more apparent.
 
 It is impossible for joiner to provide only a version with
 peekFront, because joiner.front will be transient, and if it doesn't
 provide the version with .peekFront, it can't take advantage of the
 improvement that transience can provide.

Hmm, you're right. At first I thought .front would just be a thin wrapper around .peekFront (e.g., return peekFront().dup), but this only works in the source range; joiner wouldn't know how to do that.
 So my question is, how this duplication can be avoided ? If it can,
 .peekFront is definitively the way to go. But if it can't, this
 seems like a really problematic point.

My first thought is to use a mixin to generate the function bodies, but that's really ugly and doesn't really avoid duplication in the generated code regardless. :-/ I'll have to think about it more carefully. T -- It is impossible to make anything foolproof because fools are so ingenious. -- Sammy
Nov 09 2012
parent reply "Tommi" <tommitissari hotmail.com> writes:
Now I finally see that deepDup/deepCopy/clone is not a (good) 
solution, because it would be inefficient in a lot of situations. 
This whole mess makes me wish that D was designed so that all 
types had value semantics (by convention, since it's probably not 
possible to enforce by the language).

That would mean:
1) No classes. Just duck-typing based polymorphism à la go 
language.
2) Dynamic arrays of mutable types would have had to been 
implemented as copy-on-write types à la Qt containers.

Don't know about the performance implications of COW though.
Nov 12 2012
next sibling parent reply "Tommi" <tommitissari hotmail.com> writes:
On Monday, 12 November 2012 at 08:37:20 UTC, Tommi wrote:
 This whole mess makes me wish that D was designed so that all 
 types had value semantics (by convention, since it's probably 
 not possible to enforce by the language).

 That would mean:
 1) No classes. Just duck-typing based polymorphism à la go 
 language.
 2) Dynamic arrays of mutable types would have had to been 
 implemented as copy-on-write types à la Qt containers.

...forgot to add how this relates to this issue: Then you'd solve this issue by specifying range concept so that front should return by value if it's transient, and front should return by reference or const reference if it's persistent. Then you'd capture front by const reference à la C++. If front returns reference or const reference, then const reference simply references that persistent data. If front returns by value (that's an rvalue from caller's point of view), then this C++ style const reference that we use for capture would extend the lifetime of this temporary rvalue to the const reference variable's scope. And, just to have some code in a post: ref const auto saved = range.front;
Nov 12 2012
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, November 12, 2012 20:51:53 Tommi wrote:
 Then you'd solve this issue by specifying range concept so that
 front should return by value if it's transient, and front should
 return by reference or const reference if it's persistent.

That wouldn't work. It's the complete opposite of what a generative range would require if it generates the return value in front. Narrow strings in particularly would be screwed by it, because their front is calculated, and since it's a free function, as is popFront, there's no way to save the return value of front in popFront. It has to be calculated in front, and it's not at all transient. It also screws with the ability to have sealed containers, since in that case, something like front would never return by ref. The refness of front's type really shouldn't have anything to do with its transience. They're completely unrelated. - Jonathan M Davis
Nov 12 2012
parent reply "Tommi" <tommitissari hotmail.com> writes:
On Monday, 12 November 2012 at 20:52:07 UTC, Jonathan M Davis 
wrote:
 That wouldn't work. It's the complete opposite of what a 
 generative range would require if it generates the return value 
 in front. Narrow strings in particularly would be screwed by 
 it, because their front is calculated, and
 since it's a free function, as is popFront, there's no way
 to save the return value of front in popFront. It has to
 be calculated in front, and it's not at all transient.

 It also screws with the ability to have sealed containers, 
 since in that case, something like front would never
 return by ref.

 The refness of front's type really shouldn't have anything
 to do with its transience. They're completely unrelated.

I didn't understand any of those arguments, so perhaps I still don't fully comprehend this issue. Persistence ----------- I thought persistent X means that there's a dedicated storage space for X where it will exist after front returns and X's value will not be tampered with by calling popFront. That notion of persistence led to me into thinking that persistent front could (and really should) return either a reference to X or a const reference if the range doesn't want to allow mutation through front. Transience ---------- The other side of the coin in my thinking was that transient means that X doesn't have a dedicated storage space. So transient X could be a variable that's local to function front, or it could be that X is stored in a buffer and will exist there after front returns, but the buffer would be overwritten by a call to popFront. And that notion of transience led me into thinking that transient front could never return a reference. NOTE: All of the above assumes that D were designed so that all types had value semantics.
Nov 12 2012
parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, November 12, 2012 23:01:22 Tommi wrote:
 On Monday, 12 November 2012 at 20:52:07 UTC, Jonathan M Davis
 
 wrote:
 That wouldn't work. It's the complete opposite of what a
 generative range would require if it generates the return value
 in front. Narrow strings in particularly would be screwed by
 it, because their front is calculated, and
 since it's a free function, as is popFront, there's no way
 to save the return value of front in popFront. It has to
 be calculated in front, and it's not at all transient.
 
 It also screws with the ability to have sealed containers,
 since in that case, something like front would never
 return by ref.
 
 The refness of front's type really shouldn't have anything
 to do with its transience. They're completely unrelated.

I didn't understand any of those arguments, so perhaps I still don't fully comprehend this issue. Persistence ----------- I thought persistent X means that there's a dedicated storage space for X where it will exist after front returns and X's value will not be tampered with by calling popFront. That notion of persistence led to me into thinking that persistent front could (and really should) return either a reference to X or a const reference if the range doesn't want to allow mutation through front. Transience ---------- The other side of the coin in my thinking was that transient means that X doesn't have a dedicated storage space. So transient X could be a variable that's local to function front, or it could be that X is stored in a buffer and will exist there after front returns, but the buffer would be overwritten by a call to popFront. And that notion of transience led me into thinking that transient front could never return a reference.

With regards to this discussion, transience means that front is a reference type which gets overwritten when popFront is called. So, when popFront is called any references to the return value of front which the caller has kept will change rather than staying the same as they were before popFront. That has _nothing_ to do with returning by ref. It's perfectly possible for a front which is non-transient to return by ref. front for arrays does that. On the other hand, front on narrow strings doesn't and _can't_ return by ref, because front is calculated on every call. There is no persistent storage for it. But it's _not_ transient like ByLine is. A range like ByLine _could_ return its front by ref. There's nothing stopping it, and it could work just fine. It's just that it doesn't benefit it any, and it would have to make sure that the retun value was still usable on each call to popFront (e.g. if ByLine returned by ref, it would have to worry about whether the buffer was then non-null or sufficiently large or whatever - but it could still work). So, it's unlikely that a range with a transient front would ever return by ref, but it can if it really wants to.
 NOTE: All of the above assumes that D were designed so that all
 types had value semantics.

Which will never happen. So, any discussion based on that premise is pretty pointless. And this discussion would never have happened in the first place if D had no reference types, because you'll never have a transient front with a value type. - Jonathan M Davis
Nov 12 2012
parent "Tommi" <tommitissari hotmail.com> writes:
On Monday, 12 November 2012 at 22:44:22 UTC, Jonathan M Davis 
wrote:
 On Monday, November 12, 2012 23:01:22 Tommi wrote:
 NOTE: All of the above assumes that D were designed so that all
 types had value semantics.

Which will never happen. So, any discussion based on that premise is pretty pointless. And this discussion would never have happened in the first place if D had no reference types, because you'll never have a transient front with a value type.

Oh, it was just a matter of miscommunication then. I bet you missed the following post of mine, which made it clear that I wasn't suggesting a solution, but simply dreaming of a better language design (like I usually am): On Monday, 12 November 2012 at 08:37:20 UTC, Tommi wrote:
 Now I finally see that deepDup/deepCopy/clone is not a (good) 
 solution, because it would be inefficient in a lot of 
 situations. This whole mess makes me wish that D was designed 
 so that all types had value semantics (by convention, since 
 it's probably not possible to enforce by the language).

 That would mean:
 1) No classes. Just duck-typing based polymorphism à la go 
 language.
 2) Dynamic arrays of mutable types would have had to been 
 implemented as copy-on-write types à la Qt containers.

 Don't know about the performance implications of COW though.

I may throw these wild ideas around, but I don't do it in order to have them implemented by D, but so that some-one could crush those ideas with counter-arguments. But yeah, this would be a wrong thread to have that discussion anyway.
Nov 12 2012
prev sibling parent "Tommi" <tommitissari hotmail.com> writes:
On Monday, 12 November 2012 at 08:37:20 UTC, Tommi wrote:
 This whole mess makes me wish that D was designed so that all 
 types had value semantics (by convention, since it's probably 
 not possible to enforce by the language).

"..so that all types had value semantics". That's a bit too harsh. Rather there would need to two kinds of types: 1) struct: own their data, value semantics 2) referer: don't own their data, reference semantics Dynamic arrays would fall into the first category; owns their data, have value semantics. Slices of dynamic arrays would be a separate type, falling into the second category; don't own the data, reference semantics. Range could be of either kind of type. You'd need two kinds of pointers too: the kind that owns its data, and the kind that references data that someone else owns. And you'd be able to differentiate between these two kinds of types at compile-time. Disclaimer: I promise not to spam this thread with this idea any further.
Nov 12 2012
prev sibling parent reply "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Friday, November 09, 2012 14:52:20 deadalnix wrote:
 So my question is, how this duplication can be avoided ? If it can,
 .peekFront is definitively the way to go. But if it can't, this seems
 like a really problematic point.

In most cases, it still won't be a problem. How much in the way of code is generally in front? Almost none. It's popFront which generally has the complicated logic. And if you do end up with a more abnormal range which requires a bunch of work in front, depending on what it does, it can just pass the result of front or peekFront to another function which does the work, or worst case, it can mixin the code. Mostly though, I wouldn't expect much code duplication to actually be required. - Jonathan M Davis
Nov 09 2012
parent "Mehrdad" <wfunction hotmail.com> writes:
On Friday, 9 November 2012 at 20:22:10 UTC, Jonathan M Davis 
wrote:
 On Friday, November 09, 2012 14:52:20 deadalnix wrote:
 So my question is, how this duplication can be avoided ? If it 
 can,
 .peekFront is definitively the way to go. But if it can't, 
 this seems
 like a really problematic point.

In most cases, it still won't be a problem. How much in the way of code is generally in front?

Quite a ton, if it's calculated lazily.
Nov 09 2012
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Nov 07, 2012 at 10:40:32PM +0100, Jonathan M Davis wrote:
 On Wednesday, November 07, 2012 21:31:13 deadalnix wrote:
 OK, overall, .peekFront seems like a good idea. Something I'm afraid
 with .peekFront is code duplication.
 
 Let's take the joiner example. Joiner's transientness depend on its
 source transientness. This seems to me like a very common case for
 transformer ranges. If we choose the .peekFront, the naive thing to
 do is to implement the same algorithm twice, using front and
 peekFront, which is code duplication, and usually a bad idea.

Why would you need to duplicate anything?. If you can implement it using peekFront, then you use peekFront. If you can't, you use front. And anything which uses front when you could have used peekFront will still work. Also, if a free function peekFront which forwards to front is defined, then all range- based functions can use peekFront if they need it regardless of whether a range defines it (it's just that it would do the same as front in the case where the range didn't define it).

[...] Yeah, UFCS will kick in if the range itself doesn't define peekFront, then the free function peekFront will simply call the range's .front instead. So basically: if an algo needs .front to be persistent, they use .front. If they don't care if .front is persistent, they use .peekFront. T -- You are only young once, but you can stay immature indefinitely. -- azephrahel
Nov 07 2012
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Nov 04, 2012 at 07:11:22PM -0800, Jonathan M Davis wrote:
 On Sunday, November 04, 2012 15:40:18 deadalnix wrote:
 Le 04/11/2012 03:26, Jonathan M Davis a crit :
 3. Make it so that ranges which can be transient are non-transient
 by default but provide a function to get at a transient version
 for speed (which was the fastRange proposal in this thread). The
 main problem here is that when the fast range gets wrapped, it's
 transient, and so anything using the wrapped range will be forced
 to use the transient version rather than using the non- transient
 version and only using the transient version when it's asked for.
 So, I don't think that this is particularly viable.

Can you elaborate on that. I definitively don't see a problem that big here.

The problem is that you can't opt out of the fast range once you've got it. If you use fastRange and then the result ends up getting wrapped by another range, that range's front is transient and has no way of indicating it. So, you're right back in the boat that we're in right now. In order to avoid that, virtually all range-based functions would have to not use fastRange, because they have no idea whether you're going to use their result to pass to another function or not.

[...] The crux of what deadalnix proposed is that range transformers simply pass .transient along, whilst range consumers decide whether or not to use .transient. So this isn't even an issue until you are in a range consumer, which can decide whether or not the transient range should be used. IOW, the choice between .transient or not isn't even made until the end. The range consumer is the one in the position to decide whether or not .transient should be used. T -- A bend in the road is not the end of the road unless you fail to make the turn. -- Brian White
Nov 04 2012
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/3/12 7:21 PM, H. S. Teoh wrote:
 On Fri, Nov 02, 2012 at 04:17:10PM -0400, Jonathan M Davis wrote:
 On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:
 Ah, I see. That makes sense. So basically it's not the source (or
 any intermediate step) that decides whether to use the optimization,
 but the final consumer.

 Though now we have the issue that all intermediate ranges must
 propagate .fast, which is troublesome if every range has to do it
 manually. Can this be handled automatically by UFCS?

It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it.

[...] I wish Andrei would give some input as to how we should proceed with this. I do consider this a major issue with ranges, because for efficiency reasons I often write ranges that have transient .front values, and they can lead to subtle bugs with the current implementation of std.algorithm. It would be good to settle on this issue one way or another. I'd be happy even if the decision is to say that transient ranges are invalid and shouldn't be considered "real" ranges. Anything is better than the current nebulous state of things which only leads to subtle bugs for the unwary.

I think at this point we should streamline and simplify ranges while addressing fundamental concepts (such as unbuffered ranges). Delving into defining increasingly niche attributes for ranges is important work, but to the extent we can make things simpler that would be a gain. In my view, transiency of .front has a lot to do with input ranges. An input range means that .front is evanescent upon calling .popBack; it has no lasting presence in memory, and invoking popBack means the next item will replace the current one. Fetching lines should be solved by using types; trafficking in char[] does not offer guarantees about the preservation of the content. In contrast, trafficking in string formalizes a binding agreement between the range and its user that the content is immutable. Andrei
Nov 04 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Nov 04, 2012 at 08:24:55AM -0500, Andrei Alexandrescu wrote:
 On 11/3/12 7:21 PM, H. S. Teoh wrote:

[...]
I wish Andrei would give some input as to how we should proceed with
this. I do consider this a major issue with ranges, because for
efficiency reasons I often write ranges that have transient .front
values, and they can lead to subtle bugs with the current
implementation of std.algorithm. It would be good to settle on this
issue one way or another. I'd be happy even if the decision is to say
that transient ranges are invalid and shouldn't be considered "real"
ranges. Anything is better than the current nebulous state of things
which only leads to subtle bugs for the unwary.

I think at this point we should streamline and simplify ranges while addressing fundamental concepts (such as unbuffered ranges). Delving into defining increasingly niche attributes for ranges is important work, but to the extent we can make things simpler that would be a gain. In my view, transiency of .front has a lot to do with input ranges. An input range means that .front is evanescent upon calling .popBack; it has no lasting presence in memory, and invoking popBack means the next item will replace the current one. Fetching lines should be solved by using types; trafficking in char[] does not offer guarantees about the preservation of the content. In contrast, trafficking in string formalizes a binding agreement between the range and its user that the content is immutable.

[...] I think that this may be oversimplifying the issue. What about the example of a range of in-place permutations over an array? That range can be made a forward range, or even bidirectional (by reversing the sense of the comparison used by nextPermutation). But it is inherently a transient range. And this is merely one example. I have some non-trivial code that evaluates an expression tree, each node of which may generate multiple elements; one may take a range over all possible values that the tree may have. For efficiency reasons, each output value (which is vector-valued) reuses the same buffer. It is a forward range with a transient .front (it has to be a forward range, because otherwise one can't range over all combinations of multiple values that subtrees may produce). While I'm perfectly fine with the decision that such ranges shouldn't be considered proper ranges at all, it would greatly limit the applicability of std.algorithm in my code. Either I would have to artificially limit the range to an input range (which is a bit silly, as it amounts to renaming .save to something else just so std.algorithm wouldn't recognize it as a forward range), or I wouldn't be able to use std.algorithm to manipulate these ranges, but would have to roll my own. Of course, D makes it so that rolling my own isn't all that hard, but it would be a bit disappointing that the supposedly-generic algorithms in the standard library aren't that generic after all. T -- I see that you JS got Bach.
Nov 04 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/4/12 11:16 AM, H. S. Teoh wrote:
 Fetching lines should be solved by using types; trafficking in
 char[] does not offer guarantees about the preservation of the
 content. In contrast, trafficking in string formalizes a binding
 agreement between the range and its user that the content is
 immutable.

[...] I think that this may be oversimplifying the issue.

The express purpose here is to simplify instead of offering highly granular notions with many odd corner cases and combinations. So the risk of oversimplifying is indeed material.
 What about the
 example of a range of in-place permutations over an array? That range
 can be made a forward range, or even bidirectional (by reversing the
 sense of the comparison used by nextPermutation). But it is inherently a
 transient range.

That's an interesting example indeed. It would keep a T[] as its state and each popFront() would mutate that array to its next permutation. Implementing .save would be achieved by duplicating the array. I would argue this is a tenuous range to characterize as a forward range. Invoking .save on a "usual" forward range means saving the iteration state, with an understanding that the saved range will go through the same iteration steps as the initial one. But nonono, actually the saved range goes through DUPLICATES of the states that the original goes through. Clearly there is an isomorphism of the two sequences of states, but they're not the same states - for example, someone looking at the original array will see the original changing the array, but not the .save. So what you're looking at is a range that can define .dup, not one that can define .save.
 And this is merely one example. I have some non-trivial code that
 evaluates an expression tree, each node of which may generate multiple
 elements; one may take a range over all possible values that the tree
 may have. For efficiency reasons, each output value (which is
 vector-valued) reuses the same buffer. It is a forward range with a
 transient .front (it has to be a forward range, because otherwise one
 can't range over all combinations of multiple values that subtrees may
 produce).

I'm not sure about this example. Are we looking again at a .dup kind of thing?
 While I'm perfectly fine with the decision that such ranges shouldn't be
 considered proper ranges at all, it would greatly limit the
 applicability of std.algorithm in my code. Either I would have to
 artificially limit the range to an input range (which is a bit silly, as
 it amounts to renaming .save to something else just so std.algorithm
 wouldn't recognize it as a forward range), or I wouldn't be able to use
 std.algorithm to manipulate these ranges, but would have to roll my own.
 Of course, D makes it so that rolling my own isn't all that hard, but it
 would be a bit disappointing that the supposedly-generic algorithms in
 the standard library aren't that generic after all.

I understand. At the same time, there is a point where an abstraction can't comprehend all possible instances. There's already significant aggravation moveFront and hasLvalueElements etc., which I'd want to simplify given a chance. Continuing down that path seems risky to me. Andrei
Nov 04 2012