digitalmars.D - Proposal: takeFront and takeBack

Jonathan M Davis (69/69) Jul 03 2012 This seems like it probably merits a bit of discussion, so I'm bringing ...

Roman D. Boiko (2/2) Jul 03 2012 The name takeFront is too similar to take, which has very

Jonathan M Davis (6/8) Jul 03 2012 I have similar concerns, but takeFront and takeBack were the best that I...

Wouter Verhelst (15/36) Jul 03 2012 Couldn't you just overload popFront?

monarch_dodra (6/21) Jul 03 2012 You can't overload by return value, so that is not possible.

bearophile (9/14) Jul 03 2012 I expect a language to have a pop() in its collections, that
deadalnix (3/24) Jul 04 2012 Many languages does this (it doesn't mean it is the right thing to do).

Tobias Pankrath (1/3) Jul 04 2012 In C++ it was exception safety, wasn't it?

Jonathan M Davis (6/10) Jul 04 2012 I believe that it was purely a question of speed. If popFront returns an...

Tobias Pankrath (4/18) Jul 04 2012 If you pop from a container and return this the copy constructor

Jonathan M Davis (10/31) Jul 04 2012 Yes. But the cost of copying the value and the cost of the exception are...

Tobias Pankrath (7/23) Jul 04 2012 How would you implement an returning popFront for a generic

Jonathan M Davis (16/44) Jul 04 2012 Okay. I see what you mean now. I thought that the idea was that you woul...
deadalnix (8/11) Jul 04 2012 If you implement popFront as

deadalnix (9/19) Jul 04 2012 If you return by reference, you get an overhead of 1 MOV instruction.

Timon Gehr (6/22) Jul 04 2012 You get an 'overhead' of calling front, which is potentially unbounded.

Jonathan M Davis (20/45) Jul 04 2012 ions in

travert phare.normalesup.org (Christophe Travert) (6/6) Jul 05 2012 If you really don't need the value, you could devise a "justPop" method
David Piepgrass (19/19) Jul 05 2012 (grain of salt, I'm new to D.)

Jonathan M Davis (16/38) Jul 05 2012 That's generally just fine. The only ranges in Phobos which would have a...

monarch_dodra (18/41) Jul 03 2012 Great job! I think takeFront, is a really good idea. It would

Tobias Pankrath (1/1) Jul 03 2012 consumeFront?

Roman D. Boiko (2/3) Jul 03 2012 +1
Jonathan M Davis (3/4) Jul 03 2012 That seems like a good suggestion.

travert phare.normalesup.org (Christophe Travert) (15/15) Jul 03 2012 takeFront implementation is dangerous for ranges which invalidates their...

Jonathan M Davis (13/26) Jul 03 2012 Hmm. I hadn't thought of that. That could be a good reason not to do thi...

Roman D. Boiko (11/38) Jul 03 2012 Just an idea (feels dirty, but still): such range could redefine

travert phare.normalesup.org (Christophe Travert) (11/11) Jul 03 2012 Range have been designed with the idea that front is valid until next

Roman D. Boiko (13/31) Jul 03 2012 takeFront -> consumeFront

Roman D. Boiko (8/8) Jul 03 2012 Still, ranges which invalidate front on popFront could defer that

Roman D. Boiko (6/36) Jul 03 2012 An alternative is not to define a free function. So if

Jonathan M Davis (24/43) Jul 03 2012 An alternative would be to make it so that takeFront (or consumeFront, a...

Roman D. Boiko (8/84) Jul 03 2012 You bet me by 19 seconds and a better explanation :)

Dmitry Olshansky (12/52) Jul 03 2012 I was about to propose fetchFront/fetchBack with similar semantics.

Jonathan M Davis <jmdavisProg gmx.com> writes:

This seems like it probably merits a bit of discussion, so I'm bringing it up 
here rather than simply opening a pull request.

At present, for some ranges (variably-lengthed ranges such as strings in 
particular), calling front incurs a cost which popFront at least partially 
duplicates. So, the range primitives are inherently inefficient in that they 
force you to incur that extra cost as you iterate over the range. Ideally, 
there would be a way to get front and pop it off at the same time so that you 
incur the cost only once (either that or have front cache its result in a way 
that lets it avoid the extra cost in popFront when popFront is called - though 
that wouldn't work with strings, since for them, the range primitives are free 
functions). So, I'm proposing takeFront and takeBack:

https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b

For most ranges, takeFront does this:

auto takeFront(R)(ref R range)
    if(isInputRange!R && !isNarrowString!R)
{
    assert(!range.empty);
    auto retval = range.front;
    range.popFront();
    return retval;
}

So, it's pretty much the same cost as using front and popFront separately 
(whether it costs more or less probably depends on the exact code and 
optimizations, but it should be comparable). But for strings, it looks like 
this

auto takeFront(R)(ref R range)
    if(isNarrowString!R)
{
    import std.utf;
    assert(!range.empty);
    size_t index = 0;
    auto retval = decode(range, index);
    range = range[index .. $];
    return retval;
}

So, for strings, it'll be more efficient to use takeFront than calling front
and 
popFront separately. The idea then is that any user-defined range which can 
implement takeFront more efficiently than the default will define it. Then
range-
based functions use takeFront - e.g. range.takeFront() - and if the user-
defined range implements a more efficient version, that one is used and they
gain 
the extra efficiency, or if they don't, then the free function is used with 
UFCS, and they incur the same cost that they'd incur calling front and 
popFront separately. So, it's invisible to range-based functions whether a 
range actually implements takeFront itself. takeBack does the same thing as 
takeFront but it does it with back and popBack for bidirectional ranges.

I _think_ that this is a fairly clean solution to the problem, but someone 
else might be able to point out why this is a bad idea, or they might have a 
better idea. And this will have a definite impact on how ranges are normally 
used if we add this, so I'm bringing it up here for discussion. Opinions? 
Thoughts? Insights?

Oh, and if we go with this, ideally, the compiler would be updated to use 
takeFront for foreach instead of front and popFront if a range implements it 
(but still do it the current way if it doesn't). So, if typeof(range) 
implements takeFront,

foreach(e; range) {...}

would then become

for(auto _range = range; !_range.empty;)
{
    auto e = _range.takeFront();
    ...
}

instead of

for(auto _range = range; !_range.empty; _range.popFront())
{
    auto e = _range.front();
    ...
}

but that's an optimization which could be added later.

- Jonathan M Davis

Jul 03 2012

"Roman D. Boiko" <rb d-coding.com> writes:

The name takeFront is too similar to take, which has very 
different semantics.

Jul 03 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, July 03, 2012 19:17:40 Roman D. Boiko wrote:
 The name takeFront is too similar to take, which has very
 different semantics.

I have similar concerns, but takeFront and takeBack were the best that I could 
come up with. frontPopFront or retPopFront and the like would be horrible. So, 
suggestions are welcome. The concept is more important than the name, and if 
someone can come up with a better name, then I'm all for it.

- Jonathan M Davis

Jul 03 2012

Wouter Verhelst <wouter grep.be> writes:

Jonathan M Davis <jmdavisProg gmx.com> writes:

 This seems like it probably merits a bit of discussion, so I'm bringing it up 
 here rather than simply opening a pull request.

 At present, for some ranges (variably-lengthed ranges such as strings in 
 particular), calling front incurs a cost which popFront at least partially 
 duplicates. So, the range primitives are inherently inefficient in that they 
 force you to incur that extra cost as you iterate over the range. Ideally, 
 there would be a way to get front and pop it off at the same time so that you 
 incur the cost only once (either that or have front cache its result in a way 
 that lets it avoid the extra cost in popFront when popFront is called - though 
 that wouldn't work with strings, since for them, the range primitives are free 
 functions). So, I'm proposing takeFront and takeBack:

 https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b

 For most ranges, takeFront does this:

 auto takeFront(R)(ref R range)
     if(isInputRange!R && !isNarrowString!R)
 {
     assert(!range.empty);
     auto retval = range.front;
     range.popFront();
     return retval;
 }

Couldn't you just overload popFront?

have a void popFront which throws off the first element without
returning anything, and an auto popFront which does return data.

I'd always been taught that "pop" means "read a bit and chop it off",
which means that having to first read the front and then pop it off
(i.e., in two separate methods) feels rather counterintuitive to
me. That's in addition to the fact that yes, there's a performance
issue.

But hey, I've only been doing this D thing for a few weeks, so feel free
to ignore me if I'm not making any sense :-)

-- 
The volume of a pizza of thickness a and radius z can be described by
the following formula:

pi zz a

Jul 03 2012

"monarch_dodra" <monarch_dodra gmail.com> writes:

On Tuesday, 3 July 2012 at 17:22:17 UTC, Wouter Verhelst wrote:
 Jonathan M Davis <jmdavisProg gmx.com> writes:

 Couldn't you just overload popFront?

 have a void popFront which throws off the first element without
 returning anything, and an auto popFront which does return data.

 I'd always been taught that "pop" means "read a bit and chop it 
 off",
 which means that having to first read the front and then pop it 
 off
 (i.e., in two separate methods) feels rather counterintuitive to
 me. That's in addition to the fact that yes, there's a 
 performance
 issue.

 But hey, I've only been doing this D thing for a few weeks, so 
 feel free
 to ignore me if I'm not making any sense :-)

You can't overload by return value, so that is not possible.

As far as I can recall, I've always been taught that pop does NOT 
(should not) return a value. Rationale being it makes you pay for 
a read/copy you may not have asked for. That's the way C++ does 
it, and is what I've come to expect from any language.

Jul 03 2012

"bearophile" <bearophileHUGS lycos.com> writes:

monarch_dodra:

 As far as I can recall, I've always been taught that pop does 
 NOT (should not) return a value. Rationale being it makes you 
 pay for a read/copy you may not have asked for.

I think it's also a matter of exception safety.


 That's the way C++ does it, and is what I've come to expect 
 from any language.

I expect a language to have a pop() in its collections, that 
returns the first item and removes it from the collection, as in 
Python. In D sometimes I put the front and popfront on the same 
line of code, because I think of them almost as a single 
operation :-)

Bye,
bearophile

Jul 03 2012

deadalnix <deadalnix gmail.com> writes:

Le 03/07/2012 19:27, monarch_dodra a écrit :
 On Tuesday, 3 July 2012 at 17:22:17 UTC, Wouter Verhelst wrote:
 Jonathan M Davis <jmdavisProg gmx.com> writes:

 Couldn't you just overload popFront?

 have a void popFront which throws off the first element without
 returning anything, and an auto popFront which does return data.

 I'd always been taught that "pop" means "read a bit and chop it off",
 which means that having to first read the front and then pop it off
 (i.e., in two separate methods) feels rather counterintuitive to
 me. That's in addition to the fact that yes, there's a performance
 issue.

 But hey, I've only been doing this D thing for a few weeks, so feel free
 to ignore me if I'm not making any sense :-)

 You can't overload by return value, so that is not possible.

 As far as I can recall, I've always been taught that pop does NOT
 (should not) return a value. Rationale being it makes you pay for a
 read/copy you may not have asked for. That's the way C++ does it, and is
 what I've come to expect from any language.

Many languages does this (it doesn't mean it is the right thing to do). 
Do you know why this shouldn't be done ?

Jul 04 2012

"Tobias Pankrath" <tobias pankrath.net> writes:

 Many languages does this (it doesn't mean it is the right thing 
 to do). Do you know why this shouldn't be done ?

In C++ it was exception safety, wasn't it?

Jul 04 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right thing
 to do). Do you know why this shouldn't be done ?

 
 In C++ it was exception safety, wasn't it?

I believe that it was purely a question of speed. If popFront returns an 
element, then that element gets copied, and if you didn't need to access the 
element, then that's wasted cycles. You have to worry about exceptions in 
either case, depending on the what popFront is doing.

- Jonathan M Davis

Jul 04 2012

"Tobias Pankrath" <tobias pankrath.net> writes:

On Wednesday, 4 July 2012 at 18:40:50 UTC, Jonathan M Davis wrote:
 On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right 
 thing
 to do). Do you know why this shouldn't be done ?

 
 In C++ it was exception safety, wasn't it?

 I believe that it was purely a question of speed. If popFront 
 returns an
 element, then that element gets copied, and if you didn't need 
 to access the
 element, then that's wasted cycles. You have to worry about 
 exceptions in
 either case, depending on the what popFront is doing.

 - Jonathan M Davis

If you pop from a container and return this the copy constructor 
/ postblit will run. If in this moment a exception is thrown, 
than this value is lost.

Jul 04 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, July 04, 2012 22:10:36 Tobias Pankrath wrote:
 On Wednesday, 4 July 2012 at 18:40:50 UTC, Jonathan M Davis wrote:
 On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right
 thing
 to do). Do you know why this shouldn't be done ?

 
 In C++ it was exception safety, wasn't it?

 
 I believe that it was purely a question of speed. If popFront
 returns an
 element, then that element gets copied, and if you didn't need
 to access the
 element, then that's wasted cycles. You have to worry about
 exceptions in
 either case, depending on the what popFront is doing.
 
 - Jonathan M Davis

 
 If you pop from a container and return this the copy constructor
 / postblit will run. If in this moment a exception is thrown,
 than this value is lost.

Yes. But the cost of copying the value and the cost of the exception are 
separate. Simply popping off the element could throw an exception (in the case 
of strings in D, you'll get a UTFException if the string is malformed). So, 
_regardless_ of whether you return an element, you potentially risk an 
exception being thrown (depending on the implementation of the container or 
range). And so I think that exceptions are pretty much irrelevant to the 
discussion of whether returning an element from popFront is a good idea or 
not.

- Jonathan M Davis

Jul 04 2012

"Tobias Pankrath" <tobias pankrath.net> writes:

 Yes. But the cost of copying the value and the cost of the 
 exception are
 separate.

The argument is not about performance, it's about loosing values.

 Simply popping off the element could throw an exception (in the 
 case
 of strings in D, you'll get a UTFException if the string is 
 malformed). So,
 _regardless_ of whether you return an element, you potentially 
 risk an
 exception being thrown (depending on the implementation of the 
 container or
 range).

It's not the same.

 And so I think that exceptions are pretty much irrelevant to the
 discussion of whether returning an element from popFront is a 
 good idea or
 not.

How would you implement an returning popFront for a generic 
container that
does not leak values? It's not possible (at least in C++, not 
sure in D), so it is relevant. Maybe it was not the prime reason, 
but it is a good reason non the less.

Jul 04 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, July 04, 2012 22:34:46 Tobias Pankrath wrote:
 Yes. But the cost of copying the value and the cost of the
 exception are
 separate.

 
 The argument is not about performance, it's about loosing values.
 
 Simply popping off the element could throw an exception (in the
 case
 of strings in D, you'll get a UTFException if the string is
 malformed). So,
 _regardless_ of whether you return an element, you potentially
 risk an
 exception being thrown (depending on the implementation of the
 container or
 range).

 
 It's not the same.
 
 And so I think that exceptions are pretty much irrelevant to the
 discussion of whether returning an element from popFront is a
 good idea or
 not.

 
 How would you implement an returning popFront for a generic
 container that
 does not leak values? It's not possible (at least in C++, not
 sure in D), so it is relevant. Maybe it was not the prime reason,
 but it is a good reason non the less.

Okay. I see what you mean now. I thought that the idea was that you would 
somehow avoid exceptions when popping such that the fact returning the element 
could generate an exception made it then possible to have exceptions be thrown 
from popFront, which would be a problem.

Well, in most cases, I honestly don't think that that would be a problem, 
since if an exception is thrown from the copy constructor / postblit of the 
object being returned, it's probably foobared anyway, and having it on the 
front of the range (or container) would really buy you anything. And if you 
didn't actually want the element, then it doesn't really matter anyway - 
beyond the fact that you now have an extra way to have an exception be thrown. 
But it is true that you avoid that whole issue if you separate returning the 
element from popping it. Regardless, performance alone makes it worthwhile to 
not have popFront return anything IMHO, and that's the argument that I've 
always heard.

- Jonathan M Davis

Jul 04 2012

deadalnix <deadalnix gmail.com> writes:

Le 04/07/2012 22:34, Tobias Pankrath a écrit :
 Yes. But the cost of copying the value and the cost of the exception are
 separate.

 The argument is not about performance, it's about loosing values.

If you implement popFront as
{
	popFront();	// current popFront
	return front;
}

Then no value is lost with Exception. If an Exception is thrown, this is 
dubious anyway, because the value is likely to be irrelevant anyway.

Jul 04 2012

deadalnix <deadalnix gmail.com> writes:

Le 04/07/2012 20:40, Jonathan M Davis a écrit :
 On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right thing
 to do). Do you know why this shouldn't be done ?

 In C++ it was exception safety, wasn't it?

 I believe that it was purely a question of speed. If popFront returns an
 element, then that element gets copied, and if you didn't need to access the
 element, then that's wasted cycles. You have to worry about exceptions in
 either case, depending on the what popFront is doing.

 - Jonathan M Davis

If you return by reference, you get an overhead of 1 MOV instruction. 
This is ridiculous.

You win nothing else, because the register things are returned in is a 
trash register, so you have the returned thing, or garbage in it, you 
can assume nothing else.

The cost is neglectible. And any compiler is able to reduce that cost to 
0 when inlining if the value is not read. If the compiler don't inline, 
an extra MOV instruction is not will slow you down.

Jul 04 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 07/04/2012 11:53 PM, deadalnix wrote:
 Le 04/07/2012 20:40, Jonathan M Davis a écrit :
 On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right thing
 to do). Do you know why this shouldn't be done ?

 In C++ it was exception safety, wasn't it?

 I believe that it was purely a question of speed. If popFront returns an
 element, then that element gets copied, and if you didn't need to
 access the
 element, then that's wasted cycles. You have to worry about exceptions in
 either case, depending on the what popFront is doing.

 - Jonathan M Davis

 If you return by reference, you get an overhead of 1 MOV instruction.
 This is ridiculous.

You get an 'overhead' of calling front, which is potentially unbounded.

struct Map(...){
     ...
      property auto ref front() { return f(otherRange.front); }
}

Jul 04 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, July 05, 2012 00:03:02 Timon Gehr wrote:
 On 07/04/2012 11:53 PM, deadalnix wrote:
 Le 04/07/2012 20:40, Jonathan M Davis a =C3=A9crit :
 On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:
 Many languages does this (it doesn't mean it is the right thing
 to do). Do you know why this shouldn't be done ?

=20
 In C++ it was exception safety, wasn't it?

=20
 I believe that it was purely a question of speed. If popFront retu=



rns an
 element, then that element gets copied, and if you didn't need to
 access the
 element, then that's wasted cycles. You have to worry about except=



ions in
 either case, depending on the what popFront is doing.
=20
 - Jonathan M Davis

=20
 If you return by reference, you get an overhead of 1 MOV instructio=


n.
 This is ridiculous.

=20
 You get an 'overhead' of calling front, which is potentially unbounde=

d.
=20
 struct Map(...){
      ...
       property auto ref front() { return f(otherRange.front); }
 }

Not to mention, in many cases, you _can't_ return a ref like deadalnix=20=

suggests, because that would require that front be returning an lvalue,=
 and it=20
often isn't. And in the case of strings - which is the prime reason for=
 this=20
discussion in the first place - it definitely can't return an lvalue, b=
ecause=20
front is calculated (that and returning a value from popFront would mak=
e=20
popFront much more expensive in the case where you _don't_ actually nee=
d to=20
access front, so making popFront return an element doesn't help the str=
ing=20
case at all).

- Jonathan M  Davis

Jul 04 2012

travert phare.normalesup.org (Christophe Travert) writes:

If you really don't need the value, you could devise a "justPop" method 
that does not return (by the way, overloading by return type would be an 
amazing feature here). The idea is not "we should return a value 
everytime we pop", but "we should pop when we return a value".

-- 
Christophe

Jul 05 2012

"David Piepgrass" <qwertie256 gmail.com> writes:

(grain of salt, I'm new to D.)

I'd vote for consumeFront being always available, because it's 
distinctly more convenient to call one function instead of two, 
especially when you expect that making a copy of front is cheap 
(e.g. a collection of pointers, numbers or slices). Ranges where 
'front' returns a pointer to a buffer that popFront destroys 
(overwrites) are surely in the minority, right? So I hope they 
would be retrofitted to support consumeFront.

But, given that popFront is allowed to be destructive to the 
value of front, by re-using the same buffer (and that the 
proposed consumeFront might also be implemented with 'delayed 
destruction' to front), I am wondering how one is supposed to 
implement generic code correctly when this is unacceptable, e.g.:

void append(Range1,Range2)(Range1 input, ref Range2 output)
{
	// Usually works, unless input.popFront happens to be 
destructive?
	foreach(e; input) output ~= e;
}

Jul 05 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, July 05, 2012 23:59:47 David Piepgrass wrote:
 (grain of salt, I'm new to D.)
 
 I'd vote for consumeFront being always available, because it's
 distinctly more convenient to call one function instead of two,
 especially when you expect that making a copy of front is cheap
 (e.g. a collection of pointers, numbers or slices). Ranges where
 'front' returns a pointer to a buffer that popFront destroys
 (overwrites) are surely in the minority, right? So I hope they
 would be retrofitted to support consumeFront.
 
 But, given that popFront is allowed to be destructive to the
 value of front, by re-using the same buffer (and that the
 proposed consumeFront might also be implemented with 'delayed
 destruction' to front), I am wondering how one is supposed to
 implement generic code correctly when this is unacceptable, e.g.:
 
 void append(Range1,Range2)(Range1 input, ref Range2 output)
 {
 	// Usually works, unless input.popFront happens to be
 destructive?
 	foreach(e; input) output ~= e;
 }

That's generally just fine. The only ranges in Phobos which would have any 
problem with that would be the ones in std.stdio which reuse a buffer for 
front, and I wouldn't really expect other ranges to have that sort of problem 
unless they were doing the same thing. It mostly becomes an issue of knowing 
which ranges you have to be careful of that way, and being careful what you 
pass them to. So, I wouldn't worry about it all that much in general, but 
something like consumeFront would break them entirely, since if it were 
introduced, then presumably, it would be used fairly heavily by many range-
based functions, and it's guaranteed not to work with them, whereas a 
situation like your code above would be fairly rare, and it would generally be 
clear to the user of a range like ByLine that the function that they're 
passing it to would be doing something like appending each element in it into 
a new range, which wouldn't work, so they'd take precautions, like wrapping it 
in a call to map which duped the elements before passing it to append.

- Jonathan M Davis

Jul 05 2012

"monarch_dodra" <monarch_dodra gmail.com> writes:

On Tuesday, 3 July 2012 at 16:37:20 UTC, Jonathan M Davis wrote:
 ...
 Oh, and if we go with this, ideally, the compiler would be 
 updated to use
 takeFront for foreach instead of front and popFront if a range 
 implements it
 (but still do it the current way if it doesn't). So, if 
 typeof(range)
 implements takeFront,

 foreach(e; range) {...}

 would then become

 for(auto _range = range; !_range.empty;)
 {
     auto e = _range.takeFront();
     ...
 }

 instead of

 for(auto _range = range; !_range.empty; _range.popFront())
 {
     auto e = _range.front();
     ...
 }

 but that's an optimization which could be added later.

 - Jonathan M Davis

Great job! I think takeFront, is a really good idea. It would 
also finally fix the whole "should pop return a value" debate.

I'd only be afraid the name might clash (in spirit) with the 
existing "take", "takeOne" and "takeNone". Not so long ago, I 
asked for a way to get the last elements of a subrange, and 
proposed a method called "takeBack"...

Maybe "popMoveFront" would be more addequate/Accurate? This would 
make it more in line with the existing "moveFront" defined inside 
range, As well as the "popFront" which would be inside the range 
implementation: It would make the trio "front, popFront, 
popMoveFront".

I think wiring it into foreach is also a good idea, but would 
require compiler support of course. Personally though, regarding 
foreach, I'd appreciate it more if we could first get foreach 
with ref to work with "front assign" ( 
http://forum.dlang.org/thread/ceftaiklanejfhodbpix forum.dlang.org 
) That said, your proposal is probably less costly to implement.

Jul 03 2012

"Tobias Pankrath" <tobias pankrath.net> writes:

consumeFront?

Jul 03 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Tuesday, 3 July 2012 at 17:29:45 UTC, Tobias Pankrath wrote:
 consumeFront?

+1

Jul 03 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, July 03, 2012 19:29:44 Tobias Pankrath wrote:
 consumeFront?

That seems like a good suggestion.

- Jonathan M Davis

Jul 03 2012

travert phare.normalesup.org (Christophe Travert) writes:

takeFront implementation is dangerous for ranges which invalidates their 
front value when popFront is called, for instance, File.byLine. Thus 
takeFront will have to be used with care: any range implement takeFront 
(because of the template and USFC), but it may not be valid. That makes 
the range interface more complicated: There is a takeFront property, but 
you have to check it is safe to use... how do you check that by the way?

Are there so many ranges that duplicate efforts in front and popFront, 
where there is no easy workarround? Would it be such an improvement in 
performance to add takeFront? strings being immutable, my guess would be 
that it is not that hard for the compiler to optimize a foreach loop (it 
should be easy to profile this...). Even if there is an efficiency 
issue, you could easily solve it by implementing a range wrapper or an 
opApply method to make foreach faster.

-- 
Christophe

Jul 03 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
 takeFront implementation is dangerous for ranges which invalidates their
 front value when popFront is called, for instance, File.byLine. Thus
 takeFront will have to be used with care: any range implement takeFront
 (because of the template and USFC), but it may not be valid. That makes
 the range interface more complicated: There is a takeFront property, but
 you have to check it is safe to use... how do you check that by the way?

Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm 
not quite sure how to get around that. Hmmm... It would arguably be a bit 
ugly, but a range which couldn't safely be used with takeFront could have an 
enum on it which indicated that, and takeFront would fail to compile if used 
with such a range. Of course, that would mean that you'd have to special case 
such ranges in any range-based function which used takeFront so that there was 
a branch which didn't use takeFront for ranges which couldn't use it.

I don't know. It's certainly something to think about. We may need a different 
solution.

 Are there so many ranges that duplicate efforts in front and popFront,
 where there is no easy workarround? Would it be such an improvement in
 performance to add takeFront? strings being immutable, my guess would be
 that it is not that hard for the compiler to optimize a foreach loop (it
 should be easy to profile this...). Even if there is an efficiency
 issue, you could easily solve it by implementing a range wrapper or an
 opApply method to make foreach faster. 

foreach isn't the problem. strings don't even use the range API for foreach. 
It's operating on them outside of foreach that's the problem.

- Jonathan M Davis

Jul 03 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Tuesday, 3 July 2012 at 17:40:24 UTC, Jonathan M Davis wrote:
 On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
 takeFront implementation is dangerous for ranges which 
 invalidates their
 front value when popFront is called, for instance, 
 File.byLine. Thus
 takeFront will have to be used with care: any range implement 
 takeFront
 (because of the template and USFC), but it may not be valid. 
 That makes
 the range interface more complicated: There is a takeFront 
 property, but
 you have to check it is safe to use... how do you check that 
 by the way?

 Hmm. I hadn't thought of that. That could be a good reason not 
 to do this. I'm
 not quite sure how to get around that. Hmmm... It would 
 arguably be a bit
 ugly, but a range which couldn't safely be used with takeFront 
 could have an
 enum on it which indicated that, and takeFront would fail to 
 compile if used
 with such a range. Of course, that would mean that you'd have 
 to special case
 such ranges in any range-based function which used takeFront so 
 that there was
 a branch which didn't use takeFront for ranges which couldn't 
 use it.

Just an idea (feels dirty, but still): such range could redefine 
takeFront, and maintain state. takeFront would defer a call to 
popFront until the next operation. All of front, popFront, 
takeFront and empty would check state (whether popFront has been 
deferred), and execute popFront first. Side effect would be that 
a call to empty would invalidate result of takeFront, but that 
could be changed also: empty instead of calling popFront would 
check whether there is anything else after the front element.

This is much more complicated, but would only be necessary for 
such containers.

Jul 03 2012

travert phare.normalesup.org (Christophe Travert) writes:

Range have been designed with the idea that front is valid until next 
call to popFront. If popFront was to be called right after front, then 
it would just be a popFront that returns a value, and maybe a justPop or 
something if you don't want to copy the value.

It's delicate to come now and decide that if a previously returned front 
value is no longer valid after a call to popFront, it should be 
documented by an enum. Who is going to review all libraries (written and 
to come) to make sure the enum is placed when it should be? The reverse 
should be done instead: if you want you range to be optimized by calling 
takeFront, define something (for example... takeFront). Then use 
algorithms that are specialized for takeFront.

Jul 03 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Tuesday, 3 July 2012 at 18:12:54 UTC, 
travert phare.normalesup.org (Christophe Travert) wrote:
 Range have been designed with the idea that front is valid 
 until next
 call to popFront. If popFront was to be called right after 
 front, then
 it would just be a popFront that returns a value, and maybe a 
 justPop or
 something if you don't want to copy the value.

 It's delicate to come now and decide that if a previously 
 returned front
 value is no longer valid after a call to popFront, it should be
 documented by an enum. Who is going to review all libraries 
 (written and
 to come) to make sure the enum is placed when it should be? The 
 reverse
 should be done instead: if you want you range to be optimized 
 by calling
 takeFront, define something (for example... takeFront). Then use
 algorithms that are specialized for takeFront.

takeFront -> consumeFront

hasConsume has been proposed by Jonathan already.

But not having to modify all existing ranges which invalidate 
value returned by front might be a good argument against my hack 
also.

However I'm not sure there are really many of such ranges. Those 
must return something with reference semantics in order to be 
able to invalidate it. Most common case would be when front value 
is always stored in the same memory (a buffer).

The benefit of my hacking idea is that clients would need no 
special casing.

Jul 03 2012

"Roman D. Boiko" <rb d-coding.com> writes:

Still, ranges which invalidate front on popFront could defer that 
(store a flag that popFront or consumeFront has been called and 
invalidate value previously returned from front in the next call 
to front). That would be intrusive with respect to such ranges, 
and a breaking change. But it would simplify client code 
significantly.

(This doesn't mean that I'm insisting, or strongly prefer some of 
my two ideas, just wanted to clearly restate the second one.)

Jul 03 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Tuesday, 3 July 2012 at 17:40:24 UTC, Jonathan M Davis wrote:
 On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
 takeFront implementation is dangerous for ranges which 
 invalidates their
 front value when popFront is called, for instance, 
 File.byLine. Thus
 takeFront will have to be used with care: any range implement 
 takeFront
 (because of the template and USFC), but it may not be valid. 
 That makes
 the range interface more complicated: There is a takeFront 
 property, but
 you have to check it is safe to use... how do you check that 
 by the way?

 Hmm. I hadn't thought of that. That could be a good reason not 
 to do this. I'm
 not quite sure how to get around that. Hmmm... It would 
 arguably be a bit
 ugly, but a range which couldn't safely be used with takeFront 
 could have an
 enum on it which indicated that, and takeFront would fail to 
 compile if used
 with such a range. Of course, that would mean that you'd have 
 to special case
 such ranges in any range-based function which used takeFront so 
 that there was
 a branch which didn't use takeFront for ranges which couldn't 
 use it.

 I don't know. It's certainly something to think about. We may 
 need a different
 solution.

An alternative is not to define a free function. So if 
consumeFront is defined, it may be used by the algorithm, 
otherwise simply call to front and popFront in the algorithm. If 
branching is necessary, then the is no value in a free function 
(it wouldn't optimize for speed any way).

Jul 03 2012

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Tuesday, July 03, 2012 10:40:07 Jonathan M Davis wrote:
 On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
 takeFront implementation is dangerous for ranges which invalidates their
 front value when popFront is called, for instance, File.byLine. Thus
 takeFront will have to be used with care: any range implement takeFront
 (because of the template and USFC), but it may not be valid. That makes
 the range interface more complicated: There is a takeFront property, but
 you have to check it is safe to use... how do you check that by the way?

 
 Hmm. I hadn't thought of that. That could be a good reason not to do this.
 I'm not quite sure how to get around that. Hmmm... It would arguably be a
 bit ugly, but a range which couldn't safely be used with takeFront could
 have an enum on it which indicated that, and takeFront would fail to
 compile if used with such a range. Of course, that would mean that you'd
 have to special case such ranges in any range-based function which used
 takeFront so that there was a branch which didn't use takeFront for ranges
 which couldn't use it.
 
 I don't know. It's certainly something to think about. We may need a
 different solution.

An alternative would be to make it so that takeFront (or consumeFront, as 
someone suggested, since that's a better name) would be defined only for ranges 
which it helps. So, similar to hasSlicing, you get something like hasConsume, 
and then a range-based function which wants to use consumeFront or consumeBack 
checks hasConsume!R and uses it on that particular static if branch if it does 
and uses another branch with front and popFront if it doesn't. 
std.range.consumeFront then works only with strings (and maybe arrays).

However, it seemed to me that a major upside of consumeFront as I proposed it 
was that you could always just use it, and it would do the most efficient
thing, 
so you _didn't_ have to special case for it, whereas with hasConsume, you 
would. But if you can't use consumeFront safely on all ranges, then that 
doesn't really work. It's still worth something but not as much.

You can already special case strings (and in many cases need to). What 
consumeFront does is make it so that functions operating on wrappers around 
strings (i.e. the results of functions like filter or map) can operate on them 
more efficiently as well. It also makes it so that user-defined ranges can get 
that benefit from algorithms that know nothing about those ranges and therefore 
don't special case them (e.g. std.algorithm functions can then be more
efficient 
for user-defined ranges which can define a more efficient consumeFront). So,
even 
if we had hasConsume and functions had to special case in order to use 
consumeFront or consumeBack, it would still be beneficial. However, it would be 
nice to have a solution which _didn't_ require special casing.

- Jonathan M Davis

Jul 03 2012

"Roman D. Boiko" <rb d-coding.com> writes:

On Tuesday, 3 July 2012 at 18:00:55 UTC, Jonathan M Davis wrote:
 On Tuesday, July 03, 2012 10:40:07 Jonathan M Davis wrote:
 On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:
 takeFront implementation is dangerous for ranges which 
 invalidates their
 front value when popFront is called, for instance, 
 File.byLine. Thus
 takeFront will have to be used with care: any range 
 implement takeFront
 (because of the template and USFC), but it may not be valid. 
 That makes
 the range interface more complicated: There is a takeFront 
 property, but
 you have to check it is safe to use... how do you check that 
 by the way?

 
 Hmm. I hadn't thought of that. That could be a good reason not 
 to do this.
 I'm not quite sure how to get around that. Hmmm... It would 
 arguably be a
 bit ugly, but a range which couldn't safely be used with 
 takeFront could
 have an enum on it which indicated that, and takeFront would 
 fail to
 compile if used with such a range. Of course, that would mean 
 that you'd
 have to special case such ranges in any range-based function 
 which used
 takeFront so that there was a branch which didn't use 
 takeFront for ranges
 which couldn't use it.
 
 I don't know. It's certainly something to think about. We may 
 need a
 different solution.

 An alternative would be to make it so that takeFront (or 
 consumeFront, as
 someone suggested, since that's a better name) would be defined 
 only for ranges
 which it helps. So, similar to hasSlicing, you get something 
 like hasConsume,
 and then a range-based function which wants to use consumeFront 
 or consumeBack
 checks hasConsume!R and uses it on that particular static if 
 branch if it does
 and uses another branch with front and popFront if it doesn't.
 std.range.consumeFront then works only with strings (and maybe 
 arrays).

 However, it seemed to me that a major upside of consumeFront as 
 I proposed it
 was that you could always just use it, and it would do the most 
 efficient thing,
 so you _didn't_ have to special case for it, whereas with 
 hasConsume, you
 would. But if you can't use consumeFront safely on all ranges, 
 then that
 doesn't really work. It's still worth something but not as much.

 You can already special case strings (and in many cases need 
 to). What
 consumeFront does is make it so that functions operating on 
 wrappers around
 strings (i.e. the results of functions like filter or map) can 
 operate on them
 more efficiently as well. It also makes it so that user-defined 
 ranges can get
 that benefit from algorithms that know nothing about those 
 ranges and therefore
 don't special case them (e.g. std.algorithm functions can then 
 be more efficient
 for user-defined ranges which can define a more efficient 
 consumeFront). So, even
 if we had hasConsume and functions had to special case in order 
 to use
 consumeFront or consumeBack, it would still be beneficial. 
 However, it would be
 nice to have a solution which _didn't_ require special casing.

 - Jonathan M Davis

You bet me by 19 seconds and a better explanation :)

So far we have two options: cleaner code because there would be 
no need to hack containers which invalidate their front value on 
a call to popFront, or a hack which I proposed above, but 
localized in such containers. By the way, some of such containers 
could potentially defer invalidating front till the next call to 
front.

Jul 03 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On 03-Jul-12 20:37, Jonathan M Davis wrote:
 This seems like it probably merits a bit of discussion, so I'm bringing it up
 here rather than simply opening a pull request.

 At present, for some ranges (variably-lengthed ranges such as strings in
 particular), calling front incurs a cost which popFront at least partially
 duplicates. So, the range primitives are inherently inefficient in that they
 force you to incur that extra cost as you iterate over the range. Ideally,
 there would be a way to get front and pop it off at the same time so that you
 incur the cost only once (either that or have front cache its result in a way
 that lets it avoid the extra cost in popFront when popFront is called - though
 that wouldn't work with strings, since for them, the range primitives are free
 functions). So, I'm proposing takeFront and takeBack:

I was about to propose fetchFront/fetchBack with similar semantics. 
Thanks for pushing this proposal it looks good.

My initial intent however was to add Variable Length Range as a concept. 
... Now I think binary predicate hasFetch a-la hasSlicing is better.

 https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b

 For most ranges, takeFront does this:

 auto takeFront(R)(ref R range)
      if(isInputRange!R && !isNarrowString!R)
 {
      assert(!range.empty);
      auto retval = range.front;
      range.popFront();
      return retval;
 }

Aye. Yet there is indeed a problem with pseudo-ranges that reuse 'slot' 
for front on each popFront. Well they always been brittle.

 So, for strings, it'll be more efficient to use takeFront than calling front
and
 popFront separately. The idea then is that any user-defined range which can
 implement takeFront more efficiently than the default will define it. Then
range-
 based functions use takeFront - e.g. range.takeFront() - and if the user-
 defined range implements a more efficient version, that one is used and they
gain
 the extra efficiency, or if they don't, then the free function is used with
 UFCS, and they incur the same cost that they'd incur calling front and
 popFront separately. So, it's invisible to range-based functions whether a
 range actually implements takeFront itself. takeBack does the same thing as
 takeFront but it does it with back and popBack for bidirectional ranges.

 I _think_ that this is a fairly clean solution to the problem, but someone
 else might be able to point out why this is a bad idea, or they might have a
 better idea. And this will have a definite impact on how ranges are normally
 used if we add this, so I'm bringing it up here for discussion. Opinions?
 Thoughts? Insights?

 Oh, and if we go with this, ideally, the compiler would be updated to use
 takeFront for foreach instead of front and popFront if a range implements it
 (but still do it the current way if it doesn't). So, if typeof(range)
 implements takeFront,

Right. In fact as we've seen above with e.g. stdin.byLine not every 
range can do fetchFront correctly so it's a strict subset. That was the 
reason for current trio of basic operations BTW.


-- 
Dmitry Olshansky

Jul 03 2012

D Programming

C/C++ Programming

Other

digitalmars.D - Proposal: takeFront and takeBack