digitalmars.D.learn - Idiomatic way to write a range that tracks how much it consumes

Jon Degenhardt (27/27) Apr 26 2020 I have a string that contains a sequence of elements, then a

drug (6/10) Apr 26 2020 I don't know a better way, I think you enlist all possible ways - get a

Jon Degenhardt (2/12) Apr 26 2020 Thanks. Of two, I like the second better as well.

Steven Schveighoffer (19/50) Apr 26 2020 I had exactly the same problems. I created this to solve the problem,

Jon Degenhardt (3/10) Apr 26 2020 Thanks Steve, I'll definitely take a look at this. --Jon

anon (11/11) Apr 26 2020 To implement your option A you could simply use

Jon Degenhardt (20/31) Apr 26 2020 That's interesting. Wouldn't work quite like, but something

Jon Degenhardt <jond noreply.com> writes:

I have a string that contains a sequence of elements, then a 
terminator character, followed by a different sequence of 
elements (of a different type).

I want to create an input range that traverses the initial 
sequence. This is easy enough. But after the initial sequence has 
been traversed, the caller will need to know where the next 
sequence starts. That is, the caller needs to know the index in 
the input string where the initial sequence ends and the next 
sequence begins.

The values returned by the range are a transformation of the 
input, so the values by themselves are insufficient for the 
caller to determined how much of the string has been consumed. 
And, the caller cannot simply search for the terminator character.

Tracking the number of bytes consumed is easy enough. I like to 
do in a way that is consistent with D's normal range paradigm.

Two candidate approaches:
a) Instead of having the range return the individual values, it 
could return a tuple containing the value and the number of bytes 
consumed.

b) Give the input range an extra member function which returns 
the number of bytes consumed. The caller could call this after 
'empty()' returns true to find the amount of data consumed.

Both will work, but I'm not especially satisfied with either. 
Approach (a) seems more consistent with the typical range 
paradigms, but also more of a hassle for callers.

Is there a better way to write this?

--Jon

Apr 26 2020

drug <drug2004 bk.ru> writes:

27.04.2020 06:38, Jon Degenhardt пишет:
 
 Is there a better way to write this?
 
 --Jon

I don't know a better way, I think you enlist all possible ways - get a 
value using either `front` or special range member. I prefer the second 
variant, I don't think it is less consistent with range paradigms. 
Considering you need amount of consumed bytes only when range is empty 
the second way is more effective.

Apr 26 2020

Jon Degenhardt <jond noreply.com> writes:

On Monday, 27 April 2020 at 04:41:58 UTC, drug wrote:
 27.04.2020 06:38, Jon Degenhardt пишет:
 
 Is there a better way to write this?
 
 --Jon

 I don't know a better way, I think you enlist all possible ways 
 - get a value using either `front` or special range member. I 
 prefer the second variant, I don't think it is less consistent 
 with range paradigms. Considering you need amount of consumed 
 bytes only when range is empty the second way is more effective.

Thanks. Of two, I like the second better as well.

Apr 26 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 4/26/20 11:38 PM, Jon Degenhardt wrote:
 I have a string that contains a sequence of elements, then a terminator 
 character, followed by a different sequence of elements (of a different 
 type).
 
 I want to create an input range that traverses the initial sequence. 
 This is easy enough. But after the initial sequence has been traversed, 
 the caller will need to know where the next sequence starts. That is, 
 the caller needs to know the index in the input string where the initial 
 sequence ends and the next sequence begins.
 
 The values returned by the range are a transformation of the input, so 
 the values by themselves are insufficient for the caller to determined 
 how much of the string has been consumed. And, the caller cannot simply 
 search for the terminator character.
 
 Tracking the number of bytes consumed is easy enough. I like to do in a 
 way that is consistent with D's normal range paradigm.
 
 Two candidate approaches:
 a) Instead of having the range return the individual values, it could 
 return a tuple containing the value and the number of bytes consumed.
 
 b) Give the input range an extra member function which returns the 
 number of bytes consumed. The caller could call this after 'empty()' 
 returns true to find the amount of data consumed.
 
 Both will work, but I'm not especially satisfied with either. Approach 
 (a) seems more consistent with the typical range paradigms, but also 
 more of a hassle for callers.
 
 Is there a better way to write this?

I had exactly the same problems. I created this to solve the problem, 
I've barely tested it, but I plan to use it with all my parsing 
utilities on iopipe:

https://code.dlang.org/packages/bufref
https://github.com/schveiguy/bufref/blob/master/source/bufref.d

It's not documented fully. But partway down, you will see something 
called `bwin`. This is a range that tries to wrap exactly another range 
that is considered a buffer (random access, also includes narrow 
strings), but keeps track of the location of the data inside the buffer 
(the BufRef). Then, you can access the actual data from the original 
buffer using the BufRef and concrete function.

My plan was to store the BufRef items instead of the actual slices, 
because then they are easy to update if for instance you release the 
front of the buffer, or move the data to the front of the buffer (common 
occurrences in iopipe).

If you have ideas or questions, let me know, possibly on the github 
project for it.

-Steve

Apr 26 2020

Jon Degenhardt <jond noreply.com> writes:

On Monday, 27 April 2020 at 04:51:54 UTC, Steven Schveighoffer 
wrote:
 On 4/26/20 11:38 PM, Jon Degenhardt wrote:
 Is there a better way to write this?

 I had exactly the same problems. I created this to solve the 
 problem, I've barely tested it, but I plan to use it with all 
 my parsing utilities on iopipe:

 https://code.dlang.org/packages/bufref
 https://github.com/schveiguy/bufref/blob/master/source/bufref.d

Thanks Steve, I'll definitely take a look at this.  --Jon

Apr 26 2020

anon <anon noreply.com> writes:

To implement your option A you could simply use 
std.range.enumerate.

Would something like this work?

import std.algorithm.iteration : map;
import std.algorithm.searching : until;
import std.range : tee;

size_t bytesConsumed;
auto result = input.map!(a => a.yourTransformation )
                    .until!(stringTerminator)
                    .tee!(a => bytesConsumed++);
// bytesConsumed is automatically updated as result is consumed

Apr 26 2020

Jon Degenhardt <jond noreply.com> writes:

On Monday, 27 April 2020 at 05:06:21 UTC, anon wrote:
 To implement your option A you could simply use 
 std.range.enumerate.

 Would something like this work?

 import std.algorithm.iteration : map;
 import std.algorithm.searching : until;
 import std.range : tee;

 size_t bytesConsumed;
 auto result = input.map!(a => a.yourTransformation )
                    .until!(stringTerminator)
                    .tee!(a => bytesConsumed++);
 // bytesConsumed is automatically updated as result is consumed

That's interesting. Wouldn't work quite like, but something 
similar would, but I don't think it quite achieves what I want.

One thing that's missing is that the initial input is simply a 
string, there's nothing to map over at that point. There is 
however a transformation step that transforms the string into a 
sequence of slices. Then there's a transformation on those 
slices. That would be a step prior to the 'map' step. Also, in my 
case 'map' cannot be used, because each slice may produce 
multiple outputs.

The specifics are minor details, not really so important. The 
implementation can take a form along the lines described. 
However, structuring like this exposes the details of these steps 
to all callers. That is, all callers would have to write the code 
above.

My goal is encapsulate the steps into a single range all callers 
can use. That is, encapsulate something like the steps you have 
above in a standalone range that takes the input string as an 
argument, produces all the output elements, and preserves the 
bytesConsumed in a way the caller can access it.

Apr 26 2020

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Idiomatic way to write a range that tracks how much it consumes