www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Proposal: takeFront and takeBack

reply Jonathan M Davis <jmdavisProg gmx.com> writes:
This seems like it probably merits a bit of discussion, so I'm bringing it up 
here rather than simply opening a pull request.

At present, for some ranges (variably-lengthed ranges such as strings in 
particular), calling front incurs a cost which popFront at least partially 
duplicates. So, the range primitives are inherently inefficient in that they 
force you to incur that extra cost as you iterate over the range. Ideally, 
there would be a way to get front and pop it off at the same time so that you 
incur the cost only once (either that or have front cache its result in a way 
that lets it avoid the extra cost in popFront when popFront is called - though 
that wouldn't work with strings, since for them, the range primitives are free 
functions). So, I'm proposing takeFront and takeBack:

https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b

For most ranges, takeFront does this:

auto takeFront(R)(ref R range)
    if(isInputRange!R && !isNarrowString!R)
{
    assert(!range.empty);
    auto retval = range.front;
    range.popFront();
    return retval;
}

So, it's pretty much the same cost as using front and popFront separately 
(whether it costs more or less probably depends on the exact code and 
optimizations, but it should be comparable). But for strings, it looks like 
this

auto takeFront(R)(ref R range)
    if(isNarrowString!R)
{
    import std.utf;
    assert(!range.empty);
    size_t index = 0;
    auto retval = decode(range, index);
    range = range[index .. $];
    return retval;
}

So, for strings, it'll be more efficient to use takeFront than calling front
and 
popFront separately. The idea then is that any user-defined range which can 
implement takeFront more efficiently than the default will define it. Then
range-
based functions use takeFront - e.g. range.takeFront() - and if the user-
defined range implements a more efficient version, that one is used and they
gain 
the extra efficiency, or if they don't, then the free function is used with 
UFCS, and they incur the same cost that they'd incur calling front and 
popFront separately. So, it's invisible to range-based functions whether a 
range actually implements takeFront itself. takeBack does the same thing as 
takeFront but it does it with back and popBack for bidirectional ranges.

I _think_ that this is a fairly clean solution to the problem, but someone 
else might be able to point out why this is a bad idea, or they might have a 
better idea. And this will have a definite impact on how ranges are normally 
used if we add this, so I'm bringing it up here for discussion. Opinions? 
Thoughts? Insights?

Oh, and if we go with this, ideally, the compiler would be updated to use 
takeFront for foreach instead of front and popFront if a range implements it 
(but still do it the current way if it doesn't). So, if typeof(range) 
implements takeFront,

foreach(e; range) {...}

would then become

for(auto _range = range; !_range.empty;)
{
    auto e = _range.takeFront();
    ...
}

instead of

for(auto _range = range; !_range.empty; _range.popFront())
{
    auto e = _range.front();
    ...
}

but that's an optimization which could be added later.

- Jonathan M Davis
Jul 03 2012
next sibling parent reply "Roman D. Boiko" <rb d-coding.com> writes:
The name takeFront is too similar to take, which has very 
different semantics.
Jul 03 2012
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, July 03, 2012 19:17:40 Roman D. Boiko wrote:
 The name takeFront is too similar to take, which has very
 different semantics.
I have similar concerns, but takeFront and takeBack were the best that I could come up with. frontPopFront or retPopFront and the like would be horrible. So, suggestions are welcome. The concept is more important than the name, and if someone can come up with a better name, then I'm all for it. - Jonathan M Davis
Jul 03 2012
prev sibling next sibling parent reply Wouter Verhelst <wouter grep.be> writes:
Jonathan M Davis <jmdavisProg gmx.com> writes:

 This seems like it probably merits a bit of discussion, so I'm bringing it up 
 here rather than simply opening a pull request.

 At present, for some ranges (variably-lengthed ranges such as strings in 
 particular), calling front incurs a cost which popFront at least partially 
 duplicates. So, the range primitives are inherently inefficient in that they 
 force you to incur that extra cost as you iterate over the range. Ideally, 
 there would be a way to get front and pop it off at the same time so that you 
 incur the cost only once (either that or have front cache its result in a way 
 that lets it avoid the extra cost in popFront when popFront is called - though 
 that wouldn't work with strings, since for them, the range primitives are free 
 functions). So, I'm proposing takeFront and takeBack:

 https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b

 For most ranges, takeFront does this:

 auto takeFront(R)(ref R range)
     if(isInputRange!R && !isNarrowString!R)
 {
     assert(!range.empty);
     auto retval = range.front;
     range.popFront();
     return retval;
 }
Couldn't you just overload popFront? have a void popFront which throws off the first element without returning anything, and an auto popFront which does return data. I'd always been taught that "pop" means "read a bit and chop it off", which means that having to first read the front and then pop it off (i.e., in two separate methods) feels rather counterintuitive to me. That's in addition to the fact that yes, there's a performance issue. But hey, I've only been doing this D thing for a few weeks, so feel free to ignore me if I'm not making any sense :-) -- The volume of a pizza of thickness a and radius z can be described by the following formula: pi zz a
Jul 03 2012
parent reply "monarch_dodra" <monarch_dodra gmail.com> writes:
On Tuesday, 3 July 2012 at 17:22:17 UTC, Wouter Verhelst wrote:
 Jonathan M Davis <jmdavisProg gmx.com> writes:

 Couldn't you just overload popFront?

 have a void popFront which throws off the first element without
 returning anything, and an auto popFront which does return data.

 I'd always been taught that "pop" means "read a bit and chop it 
 off",
 which means that having to first read the front and then pop it 
 off
 (i.e., in two separate methods) feels rather counterintuitive to
 me. That's in addition to the fact that yes, there's a 
 performance
 issue.

 But hey, I've only been doing this D thing for a few weeks, so 
 feel free
 to ignore me if I'm not making any sense :-)
You can't overload by return value, so that is not possible. As far as I can recall, I've always been taught that pop does NOT (should not) return a value. Rationale being it makes you pay for a read/copy you may not have asked for. That's the way C++ does it, and is what I've come to expect from any language.
Jul 03 2012
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
monarch_dodra:

 As far as I can recall, I've always been taught that pop does 
 NOT (should not) return a value. Rationale being it makes you 
 pay for a read/copy you may not have asked for.
I think it's also a matter of exception safety.
 That's the way C++ does it, and is what I've come to expect 
 from any language.
I expect a language to have a pop() in its collections, that returns the first item and removes it from the collection, as in Python. In D sometimes I put the front and popfront on the same line of code, because I think of them almost as a single operation :-) Bye, bearophile
Jul 03 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 03/07/2012 19:27, monarch_dodra a écrit :
 On Tuesday, 3 July 2012 at 17:22:17 UTC, Wouter Verhelst wrote:
 Jonathan M Davis <jmdavisProg gmx.com> writes:

 Couldn't you just overload popFront?

 have a void popFront which throws off the first element without
 returning anything, and an auto popFront which does return data.

 I'd always been taught that "pop" means "read a bit and chop it off",
 which means that having to first read the front and then pop it off
 (i.e., in two separate methods) feels rather counterintuitive to
 me. That's in addition to the fact that yes, there's a performance
 issue.

 But hey, I've only been doing this D thing for a few weeks, so feel free
 to ignore me if I'm not making any sense :-)
You can't overload by return value, so that is not possible. As far as I can recall, I've always been taught that pop does NOT (should not) return a value. Rationale being it makes you pay for a read/copy you may not have asked for. That's the way C++ does it, and is what I've come to expect from any language.
Many languages does this (it doesn't mean it is the right thing to do). Do you know why this shouldn't be done ?
Jul 04 2012
parent reply "Tobias Pankrath" <tobias pankrath.net> writes:
 Many languages does this (it doesn't mean it is the right thing 
 to do). Do you know why this shouldn't be done ?
In C++ it was exception safety, wasn't it?
Jul 04 2012
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right thing
 to do). Do you know why this shouldn't be done ?
In C++ it was exception safety, wasn't it?
I believe that it was purely a question of speed. If popFront returns an element, then that element gets copied, and if you didn't need to access the element, then that's wasted cycles. You have to worry about exceptions in either case, depending on the what popFront is doing. - Jonathan M Davis
Jul 04 2012
next sibling parent reply "Tobias Pankrath" <tobias pankrath.net> writes:
On Wednesday, 4 July 2012 at 18:40:50 UTC, Jonathan M Davis wrote:
 On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right 
 thing
 to do). Do you know why this shouldn't be done ?
In C++ it was exception safety, wasn't it?
I believe that it was purely a question of speed. If popFront returns an element, then that element gets copied, and if you didn't need to access the element, then that's wasted cycles. You have to worry about exceptions in either case, depending on the what popFront is doing. - Jonathan M Davis
If you pop from a container and return this the copy constructor / postblit will run. If in this moment a exception is thrown, than this value is lost.
Jul 04 2012
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, July 04, 2012 22:10:36 Tobias Pankrath wrote:
 On Wednesday, 4 July 2012 at 18:40:50 UTC, Jonathan M Davis wrote:
 On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right
 thing
 to do). Do you know why this shouldn't be done ?
In C++ it was exception safety, wasn't it?
I believe that it was purely a question of speed. If popFront returns an element, then that element gets copied, and if you didn't need to access the element, then that's wasted cycles. You have to worry about exceptions in either case, depending on the what popFront is doing. - Jonathan M Davis
If you pop from a container and return this the copy constructor / postblit will run. If in this moment a exception is thrown, than this value is lost.
Yes. But the cost of copying the value and the cost of the exception are separate. Simply popping off the element could throw an exception (in the case of strings in D, you'll get a UTFException if the string is malformed). So, _regardless_ of whether you return an element, you potentially risk an exception being thrown (depending on the implementation of the container or range). And so I think that exceptions are pretty much irrelevant to the discussion of whether returning an element from popFront is a good idea or not. - Jonathan M Davis
Jul 04 2012
parent reply "Tobias Pankrath" <tobias pankrath.net> writes:
 Yes. But the cost of copying the value and the cost of the 
 exception are
 separate.
The argument is not about performance, it's about loosing values.
 Simply popping off the element could throw an exception (in the 
 case
 of strings in D, you'll get a UTFException if the string is 
 malformed). So,
 _regardless_ of whether you return an element, you potentially 
 risk an
 exception being thrown (depending on the implementation of the 
 container or
 range).
It's not the same.
 And so I think that exceptions are pretty much irrelevant to the
 discussion of whether returning an element from popFront is a 
 good idea or
 not.
How would you implement an returning popFront for a generic container that does not leak values? It's not possible (at least in C++, not sure in D), so it is relevant. Maybe it was not the prime reason, but it is a good reason non the less.
Jul 04 2012
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, July 04, 2012 22:34:46 Tobias Pankrath wrote:
 Yes. But the cost of copying the value and the cost of the
 exception are
 separate.
The argument is not about performance, it's about loosing values.
 Simply popping off the element could throw an exception (in the
 case
 of strings in D, you'll get a UTFException if the string is
 malformed). So,
 _regardless_ of whether you return an element, you potentially
 risk an
 exception being thrown (depending on the implementation of the
 container or
 range).
It's not the same.
 And so I think that exceptions are pretty much irrelevant to the
 discussion of whether returning an element from popFront is a
 good idea or
 not.
How would you implement an returning popFront for a generic container that does not leak values? It's not possible (at least in C++, not sure in D), so it is relevant. Maybe it was not the prime reason, but it is a good reason non the less.
Okay. I see what you mean now. I thought that the idea was that you would somehow avoid exceptions when popping such that the fact returning the element could generate an exception made it then possible to have exceptions be thrown from popFront, which would be a problem. Well, in most cases, I honestly don't think that that would be a problem, since if an exception is thrown from the copy constructor / postblit of the object being returned, it's probably foobared anyway, and having it on the front of the range (or container) would really buy you anything. And if you didn't actually want the element, then it doesn't really matter anyway - beyond the fact that you now have an extra way to have an exception be thrown. But it is true that you avoid that whole issue if you separate returning the element from popping it. Regardless, performance alone makes it worthwhile to not have popFront return anything IMHO, and that's the argument that I've always heard. - Jonathan M Davis
Jul 04 2012
prev sibling parent deadalnix <deadalnix gmail.com> writes:
Le 04/07/2012 22:34, Tobias Pankrath a écrit :
 Yes. But the cost of copying the value and the cost of the exception are
 separate.
The argument is not about performance, it's about loosing values.
If you implement popFront as { popFront(); // current popFront return front; } Then no value is lost with Exception. If an Exception is thrown, this is dubious anyway, because the value is likely to be irrelevant anyway.
Jul 04 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 04/07/2012 20:40, Jonathan M Davis a écrit :
 On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right thing
 to do). Do you know why this shouldn't be done ?
In C++ it was exception safety, wasn't it?
I believe that it was purely a question of speed. If popFront returns an element, then that element gets copied, and if you didn't need to access the element, then that's wasted cycles. You have to worry about exceptions in either case, depending on the what popFront is doing. - Jonathan M Davis
If you return by reference, you get an overhead of 1 MOV instruction. This is ridiculous. You win nothing else, because the register things are returned in is a trash register, so you have the returned thing, or garbage in it, you can assume nothing else. The cost is neglectible. And any compiler is able to reduce that cost to 0 when inlining if the value is not read. If the compiler don't inline, an extra MOV instruction is not will slow you down.
Jul 04 2012
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 07/04/2012 11:53 PM, deadalnix wrote:
 Le 04/07/2012 20:40, Jonathan M Davis a écrit :
 On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right thing
 to do). Do you know why this shouldn't be done ?
In C++ it was exception safety, wasn't it?
I believe that it was purely a question of speed. If popFront returns an element, then that element gets copied, and if you didn't need to access the element, then that's wasted cycles. You have to worry about exceptions in either case, depending on the what popFront is doing. - Jonathan M Davis
If you return by reference, you get an overhead of 1 MOV instruction. This is ridiculous.
You get an 'overhead' of calling front, which is potentially unbounded. struct Map(...){ ... property auto ref front() { return f(otherRange.front); } }
Jul 04 2012
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, July 05, 2012 00:03:02 Timon Gehr wrote:
 On 07/04/2012 11:53 PM, deadalnix wrote:
 Le 04/07/2012 20:40, Jonathan M Davis a =C3=A9crit :
 On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right thing
 to do). Do you know why this shouldn't be done ?
=20 In C++ it was exception safety, wasn't it?
=20 I believe that it was purely a question of speed. If popFront retu=
rns an
 element, then that element gets copied, and if you didn't need to
 access the
 element, then that's wasted cycles. You have to worry about except=
ions in
 either case, depending on the what popFront is doing.
=20
 - Jonathan M Davis
=20 If you return by reference, you get an overhead of 1 MOV instructio=
n.
 This is ridiculous.
=20 You get an 'overhead' of calling front, which is potentially unbounde=
d.
=20
 struct Map(...){
      ...
       property auto ref front() { return f(otherRange.front); }
 }
Not to mention, in many cases, you _can't_ return a ref like deadalnix=20= suggests, because that would require that front be returning an lvalue,= and it=20 often isn't. And in the case of strings - which is the prime reason for= this=20 discussion in the first place - it definitely can't return an lvalue, b= ecause=20 front is calculated (that and returning a value from popFront would mak= e=20 popFront much more expensive in the case where you _don't_ actually nee= d to=20 access front, so making popFront return an element doesn't help the str= ing=20 case at all). - Jonathan M Davis
Jul 04 2012
next sibling parent travert phare.normalesup.org (Christophe Travert) writes:
If you really don't need the value, you could devise a "justPop" method 
that does not return (by the way, overloading by return type would be an 
amazing feature here). The idea is not "we should return a value 
everytime we pop", but "we should pop when we return a value".

-- 
Christophe
Jul 05 2012
prev sibling parent reply "David Piepgrass" <qwertie256 gmail.com> writes:
(grain of salt, I'm new to D.)

I'd vote for consumeFront being always available, because it's 
distinctly more convenient to call one function instead of two, 
especially when you expect that making a copy of front is cheap 
(e.g. a collection of pointers, numbers or slices). Ranges where 
'front' returns a pointer to a buffer that popFront destroys 
(overwrites) are surely in the minority, right? So I hope they 
would be retrofitted to support consumeFront.

But, given that popFront is allowed to be destructive to the 
value of front, by re-using the same buffer (and that the 
proposed consumeFront might also be implemented with 'delayed 
destruction' to front), I am wondering how one is supposed to 
implement generic code correctly when this is unacceptable, e.g.:

void append(Range1,Range2)(Range1 input, ref Range2 output)
{
	// Usually works, unless input.popFront happens to be 
destructive?
	foreach(e; input) output ~= e;
}
Jul 05 2012
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, July 05, 2012 23:59:47 David Piepgrass wrote:
 (grain of salt, I'm new to D.)
 
 I'd vote for consumeFront being always available, because it's
 distinctly more convenient to call one function instead of two,
 especially when you expect that making a copy of front is cheap
 (e.g. a collection of pointers, numbers or slices). Ranges where
 'front' returns a pointer to a buffer that popFront destroys
 (overwrites) are surely in the minority, right? So I hope they
 would be retrofitted to support consumeFront.
 
 But, given that popFront is allowed to be destructive to the
 value of front, by re-using the same buffer (and that the
 proposed consumeFront might also be implemented with 'delayed
 destruction' to front), I am wondering how one is supposed to
 implement generic code correctly when this is unacceptable, e.g.:
 
 void append(Range1,Range2)(Range1 input, ref Range2 output)
 {
 	// Usually works, unless input.popFront happens to be
 destructive?
 	foreach(e; input) output ~= e;
 }
That's generally just fine. The only ranges in Phobos which would have any problem with that would be the ones in std.stdio which reuse a buffer for front, and I wouldn't really expect other ranges to have that sort of problem unless they were doing the same thing. It mostly becomes an issue of knowing which ranges you have to be careful of that way, and being careful what you pass them to. So, I wouldn't worry about it all that much in general, but something like consumeFront would break them entirely, since if it were introduced, then presumably, it would be used fairly heavily by many range- based functions, and it's guaranteed not to work with them, whereas a situation like your code above would be fairly rare, and it would generally be clear to the user of a range like ByLine that the function that they're passing it to would be doing something like appending each element in it into a new range, which wouldn't work, so they'd take precautions, like wrapping it in a call to map which duped the elements before passing it to append. - Jonathan M Davis
Jul 05 2012
prev sibling next sibling parent reply "monarch_dodra" <monarch_dodra gmail.com> writes:
On Tuesday, 3 July 2012 at 16:37:20 UTC, Jonathan M Davis wrote:
 ...
 Oh, and if we go with this, ideally, the compiler would be 
 updated to use
 takeFront for foreach instead of front and popFront if a range 
 implements it
 (but still do it the current way if it doesn't). So, if 
 typeof(range)
 implements takeFront,

 foreach(e; range) {...}

 would then become

 for(auto _range = range; !_range.empty;)
 {
     auto e = _range.takeFront();
     ...
 }

 instead of

 for(auto _range = range; !_range.empty; _range.popFront())
 {
     auto e = _range.front();
     ...
 }

 but that's an optimization which could be added later.

 - Jonathan M Davis
Great job! I think takeFront, is a really good idea. It would also finally fix the whole "should pop return a value" debate. I'd only be afraid the name might clash (in spirit) with the existing "take", "takeOne" and "takeNone". Not so long ago, I asked for a way to get the last elements of a subrange, and proposed a method called "takeBack"... Maybe "popMoveFront" would be more addequate/Accurate? This would make it more in line with the existing "moveFront" defined inside range, As well as the "popFront" which would be inside the range implementation: It would make the trio "front, popFront, popMoveFront". I think wiring it into foreach is also a good idea, but would require compiler support of course. Personally though, regarding foreach, I'd appreciate it more if we could first get foreach with ref to work with "front assign" ( http://forum.dlang.org/thread/ceftaiklanejfhodbpix forum.dlang.org ) That said, your proposal is probably less costly to implement.
Jul 03 2012
parent reply "Tobias Pankrath" <tobias pankrath.net> writes:
consumeFront?
Jul 03 2012
next sibling parent "Roman D. Boiko" <rb d-coding.com> writes:
On Tuesday, 3 July 2012 at 17:29:45 UTC, Tobias Pankrath wrote:
 consumeFront?
+1
Jul 03 2012
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, July 03, 2012 19:29:44 Tobias Pankrath wrote:
 consumeFront?
That seems like a good suggestion. - Jonathan M Davis
Jul 03 2012
prev sibling next sibling parent reply travert phare.normalesup.org (Christophe Travert) writes:
takeFront implementation is dangerous for ranges which invalidates their 
front value when popFront is called, for instance, File.byLine. Thus 
takeFront will have to be used with care: any range implement takeFront 
(because of the template and USFC), but it may not be valid. That makes 
the range interface more complicated: There is a takeFront property, but 
you have to check it is safe to use... how do you check that by the way?

Are there so many ranges that duplicate efforts in front and popFront, 
where there is no easy workarround? Would it be such an improvement in 
performance to add takeFront? strings being immutable, my guess would be 
that it is not that hard for the compiler to optimize a foreach loop (it 
should be easy to profile this...). Even if there is an efficiency 
issue, you could easily solve it by implementing a range wrapper or an 
opApply method to make foreach faster.

-- 
Christophe
Jul 03 2012
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
 takeFront implementation is dangerous for ranges which invalidates their
 front value when popFront is called, for instance, File.byLine. Thus
 takeFront will have to be used with care: any range implement takeFront
 (because of the template and USFC), but it may not be valid. That makes
 the range interface more complicated: There is a takeFront property, but
 you have to check it is safe to use... how do you check that by the way?
Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it. I don't know. It's certainly something to think about. We may need a different solution.
 Are there so many ranges that duplicate efforts in front and popFront,
 where there is no easy workarround? Would it be such an improvement in
 performance to add takeFront? strings being immutable, my guess would be
 that it is not that hard for the compiler to optimize a foreach loop (it
 should be easy to profile this...). Even if there is an efficiency
 issue, you could easily solve it by implementing a range wrapper or an
 opApply method to make foreach faster. 
foreach isn't the problem. strings don't even use the range API for foreach. It's operating on them outside of foreach that's the problem. - Jonathan M Davis
Jul 03 2012
next sibling parent reply "Roman D. Boiko" <rb d-coding.com> writes:
On Tuesday, 3 July 2012 at 17:40:24 UTC, Jonathan M Davis wrote:
 On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
 takeFront implementation is dangerous for ranges which 
 invalidates their
 front value when popFront is called, for instance, 
 File.byLine. Thus
 takeFront will have to be used with care: any range implement 
 takeFront
 (because of the template and USFC), but it may not be valid. 
 That makes
 the range interface more complicated: There is a takeFront 
 property, but
 you have to check it is safe to use... how do you check that 
 by the way?
Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it.
Just an idea (feels dirty, but still): such range could redefine takeFront, and maintain state. takeFront would defer a call to popFront until the next operation. All of front, popFront, takeFront and empty would check state (whether popFront has been deferred), and execute popFront first. Side effect would be that a call to empty would invalidate result of takeFront, but that could be changed also: empty instead of calling popFront would check whether there is anything else after the front element. This is much more complicated, but would only be necessary for such containers.
Jul 03 2012
parent reply travert phare.normalesup.org (Christophe Travert) writes:
Range have been designed with the idea that front is valid until next 
call to popFront. If popFront was to be called right after front, then 
it would just be a popFront that returns a value, and maybe a justPop or 
something if you don't want to copy the value.

It's delicate to come now and decide that if a previously returned front 
value is no longer valid after a call to popFront, it should be 
documented by an enum. Who is going to review all libraries (written and 
to come) to make sure the enum is placed when it should be? The reverse 
should be done instead: if you want you range to be optimized by calling 
takeFront, define something (for example... takeFront). Then use 
algorithms that are specialized for takeFront.
Jul 03 2012
parent reply "Roman D. Boiko" <rb d-coding.com> writes:
On Tuesday, 3 July 2012 at 18:12:54 UTC, 
travert phare.normalesup.org (Christophe Travert) wrote:
 Range have been designed with the idea that front is valid 
 until next
 call to popFront. If popFront was to be called right after 
 front, then
 it would just be a popFront that returns a value, and maybe a 
 justPop or
 something if you don't want to copy the value.

 It's delicate to come now and decide that if a previously 
 returned front
 value is no longer valid after a call to popFront, it should be
 documented by an enum. Who is going to review all libraries 
 (written and
 to come) to make sure the enum is placed when it should be? The 
 reverse
 should be done instead: if you want you range to be optimized 
 by calling
 takeFront, define something (for example... takeFront). Then use
 algorithms that are specialized for takeFront.
takeFront -> consumeFront hasConsume has been proposed by Jonathan already. But not having to modify all existing ranges which invalidate value returned by front might be a good argument against my hack also. However I'm not sure there are really many of such ranges. Those must return something with reference semantics in order to be able to invalidate it. Most common case would be when front value is always stored in the same memory (a buffer). The benefit of my hacking idea is that clients would need no special casing.
Jul 03 2012
parent "Roman D. Boiko" <rb d-coding.com> writes:
Still, ranges which invalidate front on popFront could defer that 
(store a flag that popFront or consumeFront has been called and 
invalidate value previously returned from front in the next call 
to front). That would be intrusive with respect to such ranges, 
and a breaking change. But it would simplify client code 
significantly.

(This doesn't mean that I'm insisting, or strongly prefer some of 
my two ideas, just wanted to clearly restate the second one.)
Jul 03 2012
prev sibling parent "Roman D. Boiko" <rb d-coding.com> writes:
On Tuesday, 3 July 2012 at 17:40:24 UTC, Jonathan M Davis wrote:
 On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
 takeFront implementation is dangerous for ranges which 
 invalidates their
 front value when popFront is called, for instance, 
 File.byLine. Thus
 takeFront will have to be used with care: any range implement 
 takeFront
 (because of the template and USFC), but it may not be valid. 
 That makes
 the range interface more complicated: There is a takeFront 
 property, but
 you have to check it is safe to use... how do you check that 
 by the way?
Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it. I don't know. It's certainly something to think about. We may need a different solution.
An alternative is not to define a free function. So if consumeFront is defined, it may be used by the algorithm, otherwise simply call to front and popFront in the algorithm. If branching is necessary, then the is no value in a free function (it wouldn't optimize for speed any way).
Jul 03 2012
prev sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, July 03, 2012 10:40:07 Jonathan M Davis wrote:
 On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
 takeFront implementation is dangerous for ranges which invalidates their
 front value when popFront is called, for instance, File.byLine. Thus
 takeFront will have to be used with care: any range implement takeFront
 (because of the template and USFC), but it may not be valid. That makes
 the range interface more complicated: There is a takeFront property, but
 you have to check it is safe to use... how do you check that by the way?
Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it. I don't know. It's certainly something to think about. We may need a different solution.
An alternative would be to make it so that takeFront (or consumeFront, as someone suggested, since that's a better name) would be defined only for ranges which it helps. So, similar to hasSlicing, you get something like hasConsume, and then a range-based function which wants to use consumeFront or consumeBack checks hasConsume!R and uses it on that particular static if branch if it does and uses another branch with front and popFront if it doesn't. std.range.consumeFront then works only with strings (and maybe arrays). However, it seemed to me that a major upside of consumeFront as I proposed it was that you could always just use it, and it would do the most efficient thing, so you _didn't_ have to special case for it, whereas with hasConsume, you would. But if you can't use consumeFront safely on all ranges, then that doesn't really work. It's still worth something but not as much. You can already special case strings (and in many cases need to). What consumeFront does is make it so that functions operating on wrappers around strings (i.e. the results of functions like filter or map) can operate on them more efficiently as well. It also makes it so that user-defined ranges can get that benefit from algorithms that know nothing about those ranges and therefore don't special case them (e.g. std.algorithm functions can then be more efficient for user-defined ranges which can define a more efficient consumeFront). So, even if we had hasConsume and functions had to special case in order to use consumeFront or consumeBack, it would still be beneficial. However, it would be nice to have a solution which _didn't_ require special casing. - Jonathan M Davis
Jul 03 2012
parent "Roman D. Boiko" <rb d-coding.com> writes:
On Tuesday, 3 July 2012 at 18:00:55 UTC, Jonathan M Davis wrote:
 On Tuesday, July 03, 2012 10:40:07 Jonathan M Davis wrote:
 On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
 takeFront implementation is dangerous for ranges which 
 invalidates their
 front value when popFront is called, for instance, 
 File.byLine. Thus
 takeFront will have to be used with care: any range 
 implement takeFront
 (because of the template and USFC), but it may not be valid. 
 That makes
 the range interface more complicated: There is a takeFront 
 property, but
 you have to check it is safe to use... how do you check that 
 by the way?
Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it. I don't know. It's certainly something to think about. We may need a different solution.
An alternative would be to make it so that takeFront (or consumeFront, as someone suggested, since that's a better name) would be defined only for ranges which it helps. So, similar to hasSlicing, you get something like hasConsume, and then a range-based function which wants to use consumeFront or consumeBack checks hasConsume!R and uses it on that particular static if branch if it does and uses another branch with front and popFront if it doesn't. std.range.consumeFront then works only with strings (and maybe arrays). However, it seemed to me that a major upside of consumeFront as I proposed it was that you could always just use it, and it would do the most efficient thing, so you _didn't_ have to special case for it, whereas with hasConsume, you would. But if you can't use consumeFront safely on all ranges, then that doesn't really work. It's still worth something but not as much. You can already special case strings (and in many cases need to). What consumeFront does is make it so that functions operating on wrappers around strings (i.e. the results of functions like filter or map) can operate on them more efficiently as well. It also makes it so that user-defined ranges can get that benefit from algorithms that know nothing about those ranges and therefore don't special case them (e.g. std.algorithm functions can then be more efficient for user-defined ranges which can define a more efficient consumeFront). So, even if we had hasConsume and functions had to special case in order to use consumeFront or consumeBack, it would still be beneficial. However, it would be nice to have a solution which _didn't_ require special casing. - Jonathan M Davis
You bet me by 19 seconds and a better explanation :) So far we have two options: cleaner code because there would be no need to hack containers which invalidate their front value on a call to popFront, or a hack which I proposed above, but localized in such containers. By the way, some of such containers could potentially defer invalidating front till the next call to front.
Jul 03 2012
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 03-Jul-12 20:37, Jonathan M Davis wrote:
 This seems like it probably merits a bit of discussion, so I'm bringing it up
 here rather than simply opening a pull request.

 At present, for some ranges (variably-lengthed ranges such as strings in
 particular), calling front incurs a cost which popFront at least partially
 duplicates. So, the range primitives are inherently inefficient in that they
 force you to incur that extra cost as you iterate over the range. Ideally,
 there would be a way to get front and pop it off at the same time so that you
 incur the cost only once (either that or have front cache its result in a way
 that lets it avoid the extra cost in popFront when popFront is called - though
 that wouldn't work with strings, since for them, the range primitives are free
 functions). So, I'm proposing takeFront and takeBack:
I was about to propose fetchFront/fetchBack with similar semantics. Thanks for pushing this proposal it looks good. My initial intent however was to add Variable Length Range as a concept. ... Now I think binary predicate hasFetch a-la hasSlicing is better.
 https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b

 For most ranges, takeFront does this:

 auto takeFront(R)(ref R range)
      if(isInputRange!R && !isNarrowString!R)
 {
      assert(!range.empty);
      auto retval = range.front;
      range.popFront();
      return retval;
 }
Aye. Yet there is indeed a problem with pseudo-ranges that reuse 'slot' for front on each popFront. Well they always been brittle.
 So, for strings, it'll be more efficient to use takeFront than calling front
and
 popFront separately. The idea then is that any user-defined range which can
 implement takeFront more efficiently than the default will define it. Then
range-
 based functions use takeFront - e.g. range.takeFront() - and if the user-
 defined range implements a more efficient version, that one is used and they
gain
 the extra efficiency, or if they don't, then the free function is used with
 UFCS, and they incur the same cost that they'd incur calling front and
 popFront separately. So, it's invisible to range-based functions whether a
 range actually implements takeFront itself. takeBack does the same thing as
 takeFront but it does it with back and popBack for bidirectional ranges.

 I _think_ that this is a fairly clean solution to the problem, but someone
 else might be able to point out why this is a bad idea, or they might have a
 better idea. And this will have a definite impact on how ranges are normally
 used if we add this, so I'm bringing it up here for discussion. Opinions?
 Thoughts? Insights?

 Oh, and if we go with this, ideally, the compiler would be updated to use
 takeFront for foreach instead of front and popFront if a range implements it
 (but still do it the current way if it doesn't). So, if typeof(range)
 implements takeFront,
Right. In fact as we've seen above with e.g. stdin.byLine not every range can do fetchFront correctly so it's a strict subset. That was the reason for current trio of basic operations BTW. -- Dmitry Olshansky
Jul 03 2012