
digitalmars.D.learn - In general, who should do more work: popFront or front?

reply surlymoor <surlymoor cock.li> writes:
All my custom range types perform all their meaningful work in 
their respective popFront methods, in addition to its expected 
source data iteration duties. The reason I do this is because I 
swear I read in a github discussion that front is expected to be 
O(1), and the only way I can think to achieve this is to stash 
the front element of a range in a private field; popFront would 
thus also set this field to a new value upon every call, and 
front would forward to it. (Or front would be the cache itself.)
At the moment, I feel that as long as the stashed front element 
isn't too "big" (For some definition of big, I guess.), that 
built-in caching should be fine. But is this acceptable? What's 
the best practice for determining which range member should 
perform what work? (Other than iterating, of course.)
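For concreteness, here's a minimal sketch of the caching pattern I mean (the `Squares` type and the squaring work are made up purely for illustration):

```d
import std.range.primitives : isInputRange;

/// Hypothetical example: squares each element of its source.
/// All the work happens in the constructor and in popFront;
/// front only returns the stashed value.
struct Squares
{
    private int[] source;
    private int cached; // the stashed front element

    this(int[] source)
    {
        this.source = source;
        if (source.length) cached = source[0] * source[0];
    }

    bool empty() const { return source.length == 0; }

    int front() const { return cached; } // O(1): just hand back the cache

    void popFront()
    {
        source = source[1 .. $];
        if (source.length) cached = source[0] * source[0];
    }
}

static assert(isInputRange!Squares);
```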
Jun 14 2021
next sibling parent reply jfondren <julian.fondren gmail.com> writes:
On Tuesday, 15 June 2021 at 04:24:09 UTC, surlymoor wrote:
 All my custom range types perform all their meaningful work in 
 their respective popFront methods, in addition to its expected 
 source data iteration duties. The reason I do this is because I 
 swear I read in a github discussion that front is expected to 
 be O(1), and the only way I can think to achieve this is to 
 stash the front element of a range in a private field; popFront 
 would thus also set this field to a new value upon every call, 
 and front would forward to it. (Or front would be the cache 
 itself.)
 At the moment, I feel that as long as the stashed front element 
 isn't too "big" (For some definition of big, I guess.), that 
 built-in caching should be fine. But is this acceptable? What's 
 the best practice for determining which range member should 
 perform what work? (Other than iterating, of course.)
Well, consider this program:

```d
import std;

struct Noisy {
    int[] source;
    int pops, fronts;
    bool empty() { return source.empty; }

    void popFront() { writeln("popFront #", ++pops); source.popFront; }

    int front() { writeln("front #", ++fronts); return source.front; }
}

void main() {
    iota(5).array
        .Noisy
        .filter!"a%2"
        .each!writeln;
}
```

Out of [0,1,2,3,4], only 1,3 pass the filter.
Noisy's front is called seven times, 2x for each filter success.
Noisy's popFront is called five times, 1x for each source member.

But if you slap a .cache (from std.algorithm.cache) before the .filter, then these counts are the same.
Jun 14 2021
parent surlymoor <surlymoor cock.li> writes:
On Tuesday, 15 June 2021 at 04:43:38 UTC, jfondren wrote:
 Well, consider this program:

 ```d
 import std;

 struct Noisy {
     int[] source;
     int pops, fronts;
     bool empty() { return source.empty; }

     void popFront() { writeln("popFront #", ++pops); source.popFront; }

     int front() { writeln("front #", ++fronts); return source.front; }
 }

 void main() {
     iota(5).array
         .Noisy
         .filter!"a%2"
         .each!writeln;
 }
 ```

 Out of [0,1,2,3,4], only 1,3 pass the filter.
 Noisy's front is called seven times, 2x for each filter success.
 Noisy's popFront is called five times, 1x for each source 
 member.

 But if you slap a .cache (from std.algorithm.cache) before the
 .filter then these counts are the same.
Appreciate the response. From your example, one takeaway is that it's reasonable to document a custom range type with a disclaimer that its front method is expensive to call; the consumer of the range, given this information, can then determine whether using cache is appropriate for them. I'm still left wondering whether one should strive to ensure that front is O(1), and what the various ways of achieving that are. With that said, I now realize my question is kind of ridiculous, since the answer might be predicated upon other aspects: the type of data being consumed, and the common operations that are to be performed upon a custom range.
Jun 14 2021
prev sibling next sibling parent reply mw <mingwu gmail.com> writes:
On Tuesday, 15 June 2021 at 04:24:09 UTC, surlymoor wrote:
 All my custom range types perform all their meaningful work in 
 their respective popFront methods, in addition to its expected 
 source data iteration duties. The reason I do this is because I 
 swear I read in a github discussion that front is expected to 
 be O(1), and the only way I can think to achieve this is to 
 stash the front element of a range in a private field; popFront 
 would thus also set this field to a new value upon every call, 
 and front would forward to it. (Or front would be the cache 
 itself.)
 At the moment, I feel that as long as the stashed front element 
 isn't too "big" (For some definition of big, I guess.), that 
 built-in caching should be fine. But is this acceptable? What's 
 the best practice for determining which range member should 
 perform what work? (Other than iterating, of course.)
In English, `front` is a noun and `popFront` has a verb in it, so you know who should do more work :-)
Jun 14 2021
parent reply surlymoor <surlymoor cock.li> writes:
On Tuesday, 15 June 2021 at 04:57:45 UTC, mw wrote:
 In English, `front` is a noun and `popFront` has a verb in it, so 
 you know who should do more work :-)
Sure, but, for example, the front method of Phobos' MapResult is the one applying the predicate to the source's front. There's a clear separation of responsibilities between popFront and front: the former iterates over the source, and the latter performs an operation that produces the range's elements.
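Roughly the shape I have in mind (not Phobos' actual MapResult, just a sketch of the split):

```d
import std.range.primitives : empty, front, popFront;
import std.algorithm.comparison : equal;

/// Sketch of a map-style range: popFront only advances the source,
/// while front applies the function every time it is called.
struct MyMap(alias fun, Range)
{
    Range source;

    bool empty() { return source.empty; }
    auto front() { return fun(source.front); } // the "work" happens here
    void popFront() { source.popFront(); }     // this one only iterates
}

auto myMap(alias fun, Range)(Range r)
{
    return MyMap!(fun, Range)(r);
}

unittest
{
    assert([1, 2, 3].myMap!(a => a * 10).equal([10, 20, 30]));
}
```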
Jun 14 2021
parent reply mw <mingwu gmail.com> writes:
On Tuesday, 15 June 2021 at 05:03:45 UTC, surlymoor wrote:
 On Tuesday, 15 June 2021 at 04:57:45 UTC, mw wrote:
 In English, `front` is a noun and `popFront` has a verb in it, 
 so you know who should do more work :-)
Sure, but, for example, the front method of Phobos' MapResult is the one applying the predicate to the source's front. There's a clear separation of responsibilities between popFront and front: the former iterates over the source, and the latter performs an operation that produces the range's elements.
I think there is another convention (although it's not formally enforced, but should be):

-- `obj.front() [should be] const`, i.e. it shouldn't modify `obj`, so it can be called multiple times at any given state and produce the same result

-- `obj.popFront()` performs the actual mutation of the internal state of `obj`; each invocation will change the object's state once.

e.g. https://dlang.org/library/std/range/primitives/front.html
the 2nd decl:

dchar front(T) (
   scope const(T)[] a
) pure @property @safe
if (isAutodecodableString!(T[]));

you can see `const`

but

https://dlang.org/library/std/range/primitives/pop_front.html

void popFront(C) (
   scope ref inout(C)[] str
) pure nothrow @trusted
if (isAutodecodableString!(C[]));

it's `ref inout`, which means the passed-in object is intended to be modified inside the method in-place.
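A tiny illustration of the convention with a plain slice (made-up values, just to show that repeated front calls are harmless):

```d
import std.range.primitives : empty, front, popFront;

void main()
{
    int[] r = [10, 20, 30];

    assert(r.front == 10);
    assert(r.front == 10); // calling front again does not change r

    r.popFront();          // only popFront mutates the range's state
    assert(r.front == 20);
}
```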
Jun 14 2021
next sibling parent surlymoor <surlymoor cock.li> writes:
On Tuesday, 15 June 2021 at 05:17:06 UTC, mw wrote:
 I think there is another convention (although it's not formally 
 enforced, but should be):

 -- `obj.front() [should be] const`, i.e. it shouldn't modify 
 `obj`, so it can be called multiple times at any given state and 
 produce the same result
 -- `obj.popFront()` performs the actual mutation of the internal 
 state of `obj`; each invocation will change the object's state 
 once.
That's a separate matter. To use MapResult as an example once more, its front method produces the range's elements, and calling it yields the identical element as long as popFront isn't called; it doesn't affect the range's source. It satisfies those expectations, but that still leaves me wanting to know whether one should strive for front being inexpensive to call. My original question is stupid, and I think I'm just going to have to experiment.
Jun 14 2021
prev sibling next sibling parent ag0aep6g <anonymous example.com> writes:
On 15.06.21 07:17, mw wrote:
 https://dlang.org/library/std/range/primitives/front.html
 the 2nd decl:
 dchar front(T) (
    scope const(T)[] a
 ) pure @property @safe
 if (isAutodecodableString!(T[]));
 
 you can see `const`
 
 
 but
 
 https://dlang.org/library/std/range/primitives/pop_front.html
 void popFront(C) (
    scope ref inout(C)[] str
 ) pure nothrow @trusted
 if (isAutodecodableString!(C[]));
 
 it's `ref inout`, which means the passed-in object is intended to be 
 modified inside the method in-place.
`const` and `inout` mean the same thing here: neither function can modify the elements of the array. The difference lies in `ref` alone. `front` receives a copy of the slice, so it can't change the length of the array you pass in. `popFront` receives it by reference, so it can (and does).
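A quick sketch of the difference in practice, using the free functions from std.range.primitives (the string value is arbitrary):

```d
import std.range.primitives : front, popFront;

void main()
{
    string s = "héllo";

    // front gets a copy of the slice: the caller's slice is untouched.
    dchar c = s.front;
    assert(c == 'h');
    assert(s.length == 6); // still 6 code units ("é" takes two in UTF-8)

    // popFront gets the slice by ref: it shortens the caller's slice.
    s.popFront();
    assert(s == "éllo");
}
```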
Jun 14 2021
prev sibling parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 6/14/21 10:17 PM, mw wrote:

 I think there is another convention (although it's not formally
 enforced, but should be):

 -- `obj.front() [should be] const`, i.e. it shouldn't modify `obj`, so
 it can be called multiple times at any given state and produce the same
 result
In other words, front() should be "idempotent".

To the OP, there is the following presentation that is related and touches on similar concerns:

https://forum.dlang.org/thread/diexjstekiyzgxlicnts@forum.dlang.org

Ali
Jun 15 2021
prev sibling next sibling parent Mike Parker <aldacron gmail.com> writes:
On Tuesday, 15 June 2021 at 04:24:09 UTC, surlymoor wrote:
 
 At the moment, I feel that as long as the stashed front element 
 isn't too "big" (For some definition of big, I guess.), that 
 built-in caching should be fine. But is this acceptable? What's 
 the best practice for determining which range member should 
 perform what work? (Other than iterating, of course.)
The "general rule" is that `front` should return the same result on multiple calls until after the next call to `popFront`. I don't know of any requirement that such an operation be O(1), but the expectation is certainly there. The implication then is that any necessary work should be carried out in `popFront` to advance the range (and additionally in a constructor if things need to be teed up first) and that `front` should just return the current element.
Jun 14 2021
prev sibling next sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Tuesday, 15 June 2021 at 04:24:09 UTC, surlymoor wrote:
 All my custom range types perform all their meaningful work in 
 their respective popFront methods, in addition to its expected 
 source data iteration duties. The reason I do this is because I 
 swear I read in a github discussion that front is expected to 
 be O(1), and the only way I can think to achieve this is to 
 stash the front element of a range in a private field; popFront 
 would thus also set this field to a new value upon every call, 
 and front would forward to it. (Or front would be the cache 
 itself.)
 At the moment, I feel that as long as the stashed front element 
 isn't too "big" (For some definition of big, I guess.), that 
 built-in caching should be fine. But is this acceptable? What's 
 the best practice for determining which range member should 
 perform what work? (Other than iterating, of course.)
It's a time-space tradeoff. As you say, caching requires additional space to store the cached element. On the other hand, *not* caching means that you spend unnecessary time computing the next element in cases where the range is only partially consumed. For example:

```d
import std.range: generate, take;
import std.algorithm: each;
import std.stdio: writeln;

generate!someExpensiveFunction.take(3).each!writeln;
```

Naively, you'd expect that `someExpensiveFunction` would be called 3 times--but it is actually called 4 times, because `generate` does its work in its constructor and `popFront` instead of in `front`.
Jun 15 2021
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Jun 15, 2021 at 02:20:11PM +0000, Paul Backus via Digitalmars-d-learn
wrote:
[...]
 It's a time-space tradeoff. As you say, caching requires additional
 space to store the cached element. On the other hand, *not* caching
 means that you spend unnecessary time computing the next element in
 cases where the range is only partially consumed. For example:
 
 ```d
 import std.range: generate, take;
 import std.algorithm: each;
 import std.stdio: writeln;
 
 generate!someExpensiveFunction.take(3).each!writeln;
 ```
 
 Naively, you'd expect that `someExpensiveFunction` would be called 3
 times--but it is actually called 4 times, because `generate` does its
 work in its constructor and `popFront` instead of in `front`.
One way to address this is to make the computation lazy: the element is only computed once on demand, and once computed it's cached. But of course, this is probably overkill on many ranges. So the long answer is, it depends. :-)


T

-- 
It is not the employer who pays the wages. Employers only handle the
money. It is the customer who pays the wages. -- Henry Ford
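A rough sketch of the lazy approach described above (the `LazyFront` wrapper and `slowSquare` function are made up for the example):

```d
import std.range.primitives : empty, front, popFront;
import std.range : take;
import std.algorithm.comparison : equal;
import std.typecons : Nullable;

int calls; // counts how often the expensive computation actually runs

int slowSquare(int x) { ++calls; return x * x; }

/// Sketch: front computes its value at most once per element,
/// and only when somebody actually asks for it.
struct LazyFront(alias compute, Range)
{
    Range source;
    private Nullable!int cached;

    bool empty() { return source.empty; }

    int front()
    {
        if (cached.isNull)
            cached = compute(source.front); // pay the cost on first demand
        return cached.get;
    }

    void popFront()
    {
        source.popFront();
        cached.nullify();                   // forget the previous element's value
    }
}

auto lazyFront(alias compute, Range)(Range r)
{
    return LazyFront!(compute, Range)(r);
}

unittest
{
    auto r = [1, 2, 3, 4].lazyFront!slowSquare;

    // Only the three elements we actually consume get computed, once each.
    assert(r.take(3).equal([1, 4, 9]));
    assert(calls == 3);
}
```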
Jun 15 2021
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/15/21 12:24 AM, surlymoor wrote:
 All my custom range types perform all their meaningful work in their 
 respective popFront methods, in addition to its expected source data 
 iteration duties. The reason I do this is because I swear I read in a 
 github discussion that front is expected to be O(1), and the only way I 
 can think to achieve this is to stash the front element of a range in a 
 private field; popFront would thus also set this field to a new value 
 upon every call, and front would forward to it. (Or front would be the 
 cache itself.)
 At the moment, I feel that as long as the stashed front element isn't 
 too "big" (For some definition of big, I guess.), that built-in caching 
 should be fine. But is this acceptable? What's the best practice for 
 determining which range member should perform what work? (Other than 
 iterating, of course.)
IMO, `front` should not be figuring out which element is next. That is the job of `popFront`. But that doesn't mean `front` cannot do work. map is a prime example. If you use `map` though, you know what you are signing up for. `front` is expected to be called multiple times to get the same data.

One thing to think about is that there is a `cache` wrapper range which can store it for you. This way, you give your users the option of caching or not.

Each situation is different, and depends on the underlying mechanisms needed to make the range operate.

-Steve
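For reference, a minimal usage sketch of that `cache` wrapper (it lives in std.algorithm.iteration; the squaring map stands in for an expensive front):

```d
import std.algorithm.iteration : cache, each, filter, map;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    // Without .cache, filter evaluates the mapped front twice per element
    // it lets through (once to test it, once when the consumer reads it).
    iota(5)
        .map!(x => x * x)         // pretend this is expensive
        .cache                    // evaluate each front only once
        .filter!(x => x % 2 == 1)
        .each!writeln;            // prints 1 and 9
}
```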
Jun 15 2021