digitalmars.D - buffered input

Andrei Alexandrescu (66/66) Feb 04 2011 I've had the opportunity today to put some solid hours of thinking into

dsimcha (14/80) Feb 04 2011 Interesting. I was just writing LazyMap and AsyncBuf in
Ellery Newcomer (6/44) Feb 04 2011 Does shiftFront literally move element n to index 0 and so on? It seems

Nick Sabalausky (13/18) Feb 05 2011 That's what I was wondering too. Would it be forced to internally do a b...

spir (13/20) Feb 05 2011 That's exactly what I'm expecting.

Andrei Alexandrescu (17/36) Feb 05 2011 The buffered range interface as I defined it supports infinite

Nick Sabalausky (17/55) Feb 05 2011 I think I can see how it might be worthwhile to discourage the tradition...

spir (7/66) Feb 05 2011 Becomes too complicated, doesnt it?

Don (4/82) Feb 05 2011 Circular buffers don't seem like an 'optional' use case to me. Most real...

spir (9/79) Feb 05 2011 Sorry, I meant the way we start to draw the picture; not circular buffer...
Andrei Alexandrescu (18/104) Feb 05 2011 The abstraction does handle it implicitly, except that it doesn't fix

Nick Sabalausky (8/18) Feb 05 2011 But what about when the window straddles the border? Ex: The circular

Andrei Alexandrescu (19/39) Feb 06 2011 Say the buffer has 1000 elements of which the last 100 contain data (and...

spir (12/29) Feb 06 2011 Isn't absolute size also relevant, if not more? I mean, as long as buffe...

Andrei Alexandrescu (6/45) Feb 06 2011 Of course. But mathematically you want to find bounds as a fraction of

Tomek =?ISO-8859-2?B?U293afFza2k=?= (6/11) Feb 06 2011 Truely circular, probably not, but a wrap-around slice (circular view of...

spir (5/12) Feb 06 2011 Right.

Andrei Alexandrescu (6/19) Feb 06 2011 With fixed lookahead you can't do a lot of things - such as line

Tomek =?ISO-8859-2?B?U293afFza2k=?= (7/17) Feb 06 2011 Amen. But I don't see how wrap-around slices stand in the way. When you ...

Nick Sabalausky (4/9) Feb 06 2011 Agreed, but not everything always needs infinite lookahead. And sometime...

Andrei Alexandrescu (11/22) Feb 06 2011 If you don't use infinite lookahead you won't consume infinite memory.

Steven Schveighoffer (5/9) Feb 07 2011 I think also, you should test to see if appending will copy the data

spir (9/17) Feb 05 2011 Is this really what it means? I naively understood "discards" as meaning
so (1/6) Feb 05 2011 I think it is basically popFrontN(), and appendToFront() a just append.
Andrei Alexandrescu (3/8) Feb 05 2011 No, it's a mere internal operation bufpos += n or so.

Michel Fortin (18/97) Feb 04 2011 One thing I'm wondering is whether it'd be more efficient if we could

spir (9/15) Feb 05 2011 Does this also makes sense when one needs to iterate over a whole set of...

Michel Fortin (14/25) Feb 05 2011 As I said in my post, whether a temporary buffer or a user-supplied

Andrei Alexandrescu (7/14) Feb 05 2011 Generally when one says "I want the stream to copy data straight into my...

Michel Fortin (28/33) Feb 05 2011 You're right, this is a different thing.

Michel Fortin (9/14) Feb 05 2011 Oops, please change

Steven Schveighoffer (8/21) Feb 07 2011 I may want to store 1% of a very large file. You are saying I must eith...

spir (8/29) Feb 07 2011 This looks very similar to a possible current use case of mine. (lookahe...

Jonathan M Davis (22/101) Feb 05 2011 Hmm. I think that I'd have to have an actual implementation to mess arou...

Andrei Alexandrescu (7/17) Feb 05 2011 Transparent buffering sounds sensible but in fact it robs you of

Tomek =?UTF-8?B?U293acWEc2tp?= (2/7) Feb 05 2011 Broken sentence?

Andrei Alexandrescu (3/10) Feb 05 2011 Sorry. Well it was nothing interesting anyway.

Jonathan M Davis (19/38) Feb 05 2011 The thing is though that if I want to be iterating over a string which i...
Nick Sabalausky (8/33) Feb 05 2011 That shouldn't be a problem for the cases where a lookahead of 1 is all

spir (7/19) Feb 05 2011 And what about backtracking (eg for parsing the source)?

Nick Sabalausky (9/30) Feb 05 2011 Like I said, there are certainly cases where a lookahead of 1 isn't

Jonathan M Davis (41/83) Feb 06 2011 Okay. I think that I've been misunderstanding some stuff here. I forgot ...

Michel Fortin (33/39) Feb 06 2011 It's true that forward ranges are much easier to deal with. That said,
Andrei Alexandrescu (30/49) Feb 06 2011 APIs predicated on the notion that I/O is very expensive and extra

bearophile (13/16) Feb 05 2011 This is an important part of the range design. This range is useful for ...
spir (11/28) Feb 05 2011 Exactly. I would love something like:

Andrei Alexandrescu (4/42) Feb 05 2011 My design would allow you to build a buffered input range from an input

spir (7/20) Feb 05 2011 All right, point taken.

Tomek =?ISO-8859-2?Q?Sowi=F1ski?= (13/46) Feb 05 2011 =20

Tomek =?ISO-8859-2?Q?Sowi=F1ski?= (5/7) Feb 05 2011 I meant: when n + front.length > buf.length.
Jean Crystof (3/47) Feb 05 2011 I find this discussion interesting. There's one idea for an application ...

Tomek =?UTF-8?B?U293acWEc2tp?= (24/27) Feb 05 2011 'd like to try at some point. Basically a facebook chat thingie, but wit...
Nick Sabalausky (7/65) Feb 05 2011 I don'r mean to derail the topic, but if I were aiming for that many

Andrei Alexandrescu (7/51) Feb 05 2011 I think combining the two into one hurts usability as often you want to

Tomek =?UTF-8?B?U293acWEc2tp?= (10/27) Feb 05 2011 Right.

Andrei Alexandrescu (7/30) Feb 05 2011 Discard everything in the current buffer and fill a new buffer. The new

Tomek =?ISO-8859-2?B?U293afFza2k=?= (7/15) Feb 05 2011 hout moving data? =20

Jesse Phillips (16/16) Feb 05 2011 Dang, you beat me to my post on what I have run into trying to provide a...
Heywood Floyd (89/89) Feb 05 2011 Nice!

Nick Sabalausky (10/18) Feb 05 2011 The problem with that is that in many many cases it forces unnessisary

Robert Jacques (7/31) Feb 05 2011 See point of Andrei's post:
spir (13/32) Feb 05 2011 Same here; thought: "maybe he meant shiftBuf() & appendToBuf(), or such?...
Andrei Alexandrescu (3/22) Feb 05 2011 Better names are always welcome!

Adam D. Ruppe (3/3) Feb 05 2011 This sounds similar to how my network code works. I called
Nick Sabalausky (4/34) Feb 05 2011 Borrowing slightly from Adam:

Tomek =?ISO-8859-2?B?U293afFza2k=?= (2/3) Feb 06 2011 I like that.

Andrei Alexandrescu (7/10) Feb 06 2011 What's missing is the part that they refer to front. Maybe

Robert Jacques (6/18) Feb 06 2011 Actually, I don't think these functions would be relatively rarely used....

Torarin (8/32) Feb 06 2011 I

Andrei Alexandrescu (6/38) Feb 06 2011 Exactly. Consider line continuations. Most of the time you read one line...

Torarin (9/54) Feb 06 2011 er

Robert Jacques (8/41) Feb 06 2011 Because even reading UTF-8 requires more than 1-byte of information.

Andrei Alexandrescu (6/48) Feb 06 2011 They aren't being neglected. If you need to get more stuff in the

Robert Jacques (11/61) Feb 06 2011 Nothing. I like the API and it makes things like byChunk actually usable...

Torarin (10/52) Feb 06 2011 er

Andrei Alexandrescu (4/27) Feb 06 2011 Both byLine and byChunk are buffered inputs, are often used, and will

Robert Jacques (8/37) Feb 06 2011 Yes, but byLine is a higher order meta-range designed primarily for end ...

Andrei Alexandrescu (7/46) Feb 06 2011 I understand. Ultimately such widely used parsers would be themselves

spir (12/37) Feb 06 2011 If append actually appends into buffer (what I understand as of now), th...

Torarin (5/19) Feb 05 2011 This is really cool. I realise now that appendToFront fills the gap in
spir (22/78) Feb 05 2011 pleased to see there is at least one other programmer still considering ...
Steven Schveighoffer (13/21) Feb 07 2011 [snip]

Andrei Alexandrescu (4/22) Feb 07 2011 One good thing about the current proposal is that it integrates

Kagamin (6/11) Feb 07 2011 This surely satisfies most needs for buffered input.

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I've had the opportunity today to put some solid hours of thinking into 
the relationship (better said the relatedness) of what would be called 
"buffered streams" and ranges. They have some commonalities and some 
differences, but it's been difficult to capture them. I think now I have 
a clear view, caused by a few recent discussions. One was the CSV reader 
discussed on the Phobos list; another was the discussion on defining the 
"right" std.xml.

First, let's start with the humblest abstraction of all - an input 
range, which only defines the troika empty/front/popFront with the known 
semantics.

An input range consumes input destructively and has a one-element 
horizon. It may as well considered a buffered stream with the buffer 
length exactly one.

Going from there, we may say that certain streaming can be done by using 
an input range of ubyte (or dchar for text). That would be the 
UTFpowered equivalent of getchar(). The readf function operates that way 
- it only needs to look one character ahead. Incidentally, the CSV 
format also requires lookahead of 1, so it also can operate on a range 
of dchar.

At this point we need to ask ourselves an essential question. Since we 
have this "input range" abstraction for a 1-element buffer, what would 
its n-elements buffer representation look like? How do we go from "input 
range of T" (which really is "unbuffered input range of T" to "buffered 
input range of T"?

Honestly, the answer was extremely unclear to me for the longest time. I 
thought that such a range would be an extension of the unbuffered one, 
e.g. a range that still offers T from front() but also offers some 
additional functions - e.g. a lookahead in the form of a random-access 
operator. I still think something can be defined along those lines, but 
today I came together with a design that is considerably simpler both 
for the client and the designer of the range.

I hereby suggest we define "buffered input range of T" any range R that 
satisfies the following conditions:

1. R is an input range of T[]

2. R defines a primitive shiftFront(size_t n). The semantics of the 
primitive is that, if r.front.length >= n, then shiftFront(n) discards 
the first n elements in r.front. Subsequently r.front will return a 
slice of the remaining elements.

3. R defines a primitive appendToFront(size_t n). Semantics: adds at 
most n more elements from the underlying stream and makes them available 
in addition to whatever was in front. For example if r.front.length was 
1024, after the call r.appendToFront(512) will have r.front have length 
1536 of which the first 1024 will be the old front and the rest will be 
newly-read elements (assuming that the stream had enough data). If n = 
0, this instructs the stream to add any number of elements at its own 
discretion.

This is it. I like many things about this design, although I still fear 
some fatal flaw may be found with it.

With these primitives a lot of good operating operating on buffered 
streams can be written efficiently. The range is allowed to reuse data 
in its buffers (unless that would contradict language invariants, e.g. 
if T is invariant), so if client code wants to stash away parts of the 
input, it needs to make a copy.

One great thing is that buffered ranges as defined above play very well 
with both ranges and built-in arrays - two quintessential parts of D. I 
look at this and say, "this all makes sense". For example the design 
could be generalized to operate on some random-access range other than 
the built-in array, but then I'm thinking, unless some advantage comes 
about, why not giving T[] a little special status? Probably everyone 
thinks of contiguous memory when thinking "buffers", so here 
generalization may be excessive (albeit meaningful).

Finally, this design is very easy to experiment with and causes no 
disruption to ranges. I can readily add the primitives to byLine and 
byChunk so we can see what streaming we can do that way.

What do you all think?


Andrei

Feb 04 2011

dsimcha <dsimcha yahoo.com> writes:

Interesting.  I was just writing LazyMap and AsyncBuf in 
std.parallelism, and I ran into these exact issues.  (A LazyMap computes 
a map lazily and in parallel and stores the result in a buffer.  An 
AsyncBuf reads from an unbuffered input range in a background thread and 
buffers the results for when you need them.  I wanted to optimize 
chaining LazyMaps and AsyncBufs for pipelining parallelism.)

I solved them with very ad-hoc way with lots of static if statements and 
encapsulation violation within the module.  One thing that would make a 
more principled approach in std.parallelism possible is a swapBuffer() 
primitive.  You provide the range with a new buffer to fill, and it 
gives you complete ownership of the old buffer.  This is basically how 
LazyMap/AsyncBuf pipelining works under the hood, though I never gave 
any serious consideration to the more general case.

On 2/5/2011 12:46 AM, Andrei Alexandrescu wrote:
 I've had the opportunity today to put some solid hours of thinking into
 the relationship (better said the relatedness) of what would be called
 "buffered streams" and ranges. They have some commonalities and some
 differences, but it's been difficult to capture them. I think now I have
 a clear view, caused by a few recent discussions. One was the CSV reader
 discussed on the Phobos list; another was the discussion on defining the
 "right" std.xml.

 First, let's start with the humblest abstraction of all - an input
 range, which only defines the troika empty/front/popFront with the known
 semantics.

 An input range consumes input destructively and has a one-element
 horizon. It may as well considered a buffered stream with the buffer
 length exactly one.

 Going from there, we may say that certain streaming can be done by using
 an input range of ubyte (or dchar for text). That would be the
 UTFpowered equivalent of getchar(). The readf function operates that way
 - it only needs to look one character ahead. Incidentally, the CSV
 format also requires lookahead of 1, so it also can operate on a range
 of dchar.

 At this point we need to ask ourselves an essential question. Since we
 have this "input range" abstraction for a 1-element buffer, what would
 its n-elements buffer representation look like? How do we go from "input
 range of T" (which really is "unbuffered input range of T" to "buffered
 input range of T"?

 Honestly, the answer was extremely unclear to me for the longest time. I
 thought that such a range would be an extension of the unbuffered one,
 e.g. a range that still offers T from front() but also offers some
 additional functions - e.g. a lookahead in the form of a random-access
 operator. I still think something can be defined along those lines, but
 today I came together with a design that is considerably simpler both
 for the client and the designer of the range.

 I hereby suggest we define "buffered input range of T" any range R that
 satisfies the following conditions:

 1. R is an input range of T[]

 2. R defines a primitive shiftFront(size_t n). The semantics of the
 primitive is that, if r.front.length >= n, then shiftFront(n) discards
 the first n elements in r.front. Subsequently r.front will return a
 slice of the remaining elements.

 3. R defines a primitive appendToFront(size_t n). Semantics: adds at
 most n more elements from the underlying stream and makes them available
 in addition to whatever was in front. For example if r.front.length was
 1024, after the call r.appendToFront(512) will have r.front have length
 1536 of which the first 1024 will be the old front and the rest will be
 newly-read elements (assuming that the stream had enough data). If n =
 0, this instructs the stream to add any number of elements at its own
 discretion.

 This is it. I like many things about this design, although I still fear
 some fatal flaw may be found with it.

 With these primitives a lot of good operating operating on buffered
 streams can be written efficiently. The range is allowed to reuse data
 in its buffers (unless that would contradict language invariants, e.g.
 if T is invariant), so if client code wants to stash away parts of the
 input, it needs to make a copy.

 One great thing is that buffered ranges as defined above play very well
 with both ranges and built-in arrays - two quintessential parts of D. I
 look at this and say, "this all makes sense". For example the design
 could be generalized to operate on some random-access range other than
 the built-in array, but then I'm thinking, unless some advantage comes
 about, why not giving T[] a little special status? Probably everyone
 thinks of contiguous memory when thinking "buffers", so here
 generalization may be excessive (albeit meaningful).

 Finally, this design is very easy to experiment with and causes no
 disruption to ranges. I can readily add the primitives to byLine and
 byChunk so we can see what streaming we can do that way.

 What do you all think?


 Andrei

Feb 04 2011

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 02/04/2011 11:46 PM, Andrei Alexandrescu wrote:
 I've had the opportunity today to put some solid hours of thinking into
 the relationship (better said the relatedness) of what would be called
 "buffered streams" and ranges. They have some commonalities and some
 differences, but it's been difficult to capture them. I think now I have
 a clear view, caused by a few recent discussions. One was the CSV reader
 discussed on the Phobos list; another was the discussion on defining the
 "right" std.xml.

 First, let's start with the humblest abstraction of all - an input
 range, which only defines the troika empty/front/popFront with the known
 semantics.

 An input range consumes input destructively and has a one-element
 horizon. It may as well considered a buffered stream with the buffer
 length exactly one.

 Going from there, we may say that certain streaming can be done by using
 an input range of ubyte (or dchar for text). That would be the
 UTFpowered equivalent of getchar(). The readf function operates that way
 - it only needs to look one character ahead. Incidentally, the CSV
 format also requires lookahead of 1, so it also can operate on a range
 of dchar.

 At this point we need to ask ourselves an essential question. Since we
 have this "input range" abstraction for a 1-element buffer, what would
 its n-elements buffer representation look like? How do we go from "input
 range of T" (which really is "unbuffered input range of T" to "buffered
 input range of T"?

 Honestly, the answer was extremely unclear to me for the longest time. I
 thought that such a range would be an extension of the unbuffered one,
 e.g. a range that still offers T from front() but also offers some
 additional functions - e.g. a lookahead in the form of a random-access
 operator. I still think something can be defined along those lines, but
 today I came together with a design that is considerably simpler both
 for the client and the designer of the range.

 I hereby suggest we define "buffered input range of T" any range R that
 satisfies the following conditions:

 1. R is an input range of T[]

 2. R defines a primitive shiftFront(size_t n). The semantics of the
 primitive is that, if r.front.length >= n, then shiftFront(n) discards
 the first n elements in r.front. Subsequently r.front will return a
 slice of the remaining elements.

Does shiftFront literally move element n to index 0 and so on? It seems 
to me that if you do, its going to have horrid performance, and if you 
don't, then you will eventually run into situations where appendToFront 
will require a wrap around, which loses you your contiguity, or a 
reallocation of the buffer.

Feb 04 2011

"Nick Sabalausky" <a a.a> writes:

"Ellery Newcomer" <ellery-newcomer utulsa.edu> wrote in message 
news:iiiu0g$2kmp$1 digitalmars.com...
 Does shiftFront literally move element n to index 0 and so on? It seems to 
 me that if you do, its going to have horrid performance, and if you don't, 
 then you will eventually run into situations where appendToFront will 
 require a wrap around, which loses you your contiguity, or a reallocation 
 of the buffer.

That's what I was wondering too. Would it be forced to internally do a bunch 
of "buf[x..$]~newStuff" and march itself through memory? Would it be 
sensible to use that interface for a circular buffer? If so, how would you 
even do it?

On a separate note, I think a good way of testing the design (and end up 
getting something useful anyway) would be to try to use it to create a range 
that's automatically-buffered in a more traditional way. Ie, Given any input 
range 'myRange', "buffered(myRange, 2048)" (or something like that) would 
wrap it in a new input range that automatically buffers the next 2048 
elements (asynchronously?) whenever its internal buffer is exhausted. Or 
something like that. It's late and I'm tired and I can't think anymore ;)

Feb 05 2011

spir <denis.spir gmail.com> writes:

On 02/05/2011 10:36 AM, Nick Sabalausky wrote:
 On a separate note, I think a good way of testing the design (and end up
 getting something useful anyway) would be to try to use it to create a range
 that's automatically-buffered in a more traditional way. Ie, Given any input
 range 'myRange', "buffered(myRange, 2048)" (or something like that) would
 wrap it in a new input range that automatically buffers the next 2048
 elements (asynchronously?) whenever its internal buffer is exhausted. Or
 something like that. It's late and I'm tired and I can't think anymore ;)

That's exactly what I'm expecting.
Funnily enough, I was about to start a thread on the topic after reading 
related posts. My point was:
"I'm not a specialist in efficiency (rather the opposite), I just know there is 
--theoretically-- relevant performance loss to expect from unbuffered input in 
various cases. Could we define a generic input-buffering primitive allowing 
people to benefit from others' competence? Just like Appender."

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 05 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/5/11 6:46 AM, spir wrote:
 On 02/05/2011 10:36 AM, Nick Sabalausky wrote:
 On a separate note, I think a good way of testing the design (and end up
 getting something useful anyway) would be to try to use it to create a
 range
 that's automatically-buffered in a more traditional way. Ie, Given any
 input
 range 'myRange', "buffered(myRange, 2048)" (or something like that) would
 wrap it in a new input range that automatically buffers the next 2048
 elements (asynchronously?) whenever its internal buffer is exhausted. Or
 something like that. It's late and I'm tired and I can't think anymore ;)

 That's exactly what I'm expecting.
 Funnily enough, I was about to start a thread on the topic after reading
 related posts. My point was:
 "I'm not a specialist in efficiency (rather the opposite), I just know
 there is --theoretically-- relevant performance loss to expect from
 unbuffered input in various cases. Could we define a generic
 input-buffering primitive allowing people to benefit from others'
 competence? Just like Appender."

 Denis

The buffered range interface as I defined it supports infinite 
lookahead. The interface mentioned by Nick has lookahead between 1 and 
2048. So I don't think my interface is appropriate for that.

Infinite lookahead is a wonderful thing. Consider reading lines from a 
file. Essentially what you need to do is to keep on reading blocks of 
data until you see \n (possibly followed by some more stuff). Then you 
offer the client the line up to the \n. When the client wants a new 
line, you combine the leftover data you already have with new stuff you 
read. On occasion you need to move over leftovers, but if your internal 
buffers are large enough that is negligible (I actually tested this 
recently).

Another example: consider dealing with line continuations in reading CSV 
files. Under certain conditions, you need to read one more line and 
stitch it with the existing one. This is easy with infinite lookahead, 
but quite elaborate with lookahead 1.


Andrei

Feb 05 2011

"Nick Sabalausky" <a a.a> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:iijq99$1a5o$1 digitalmars.com...
 On 2/5/11 6:46 AM, spir wrote:
 On 02/05/2011 10:36 AM, Nick Sabalausky wrote:
 On a separate note, I think a good way of testing the design (and end up
 getting something useful anyway) would be to try to use it to create a
 range
 that's automatically-buffered in a more traditional way. Ie, Given any
 input
 range 'myRange', "buffered(myRange, 2048)" (or something like that) 
 would
 wrap it in a new input range that automatically buffers the next 2048
 elements (asynchronously?) whenever its internal buffer is exhausted. Or
 something like that. It's late and I'm tired and I can't think anymore 
 ;)

 That's exactly what I'm expecting.
 Funnily enough, I was about to start a thread on the topic after reading
 related posts. My point was:
 "I'm not a specialist in efficiency (rather the opposite), I just know
 there is --theoretically-- relevant performance loss to expect from
 unbuffered input in various cases. Could we define a generic
 input-buffering primitive allowing people to benefit from others'
 competence? Just like Appender."

 Denis

 The buffered range interface as I defined it supports infinite lookahead. 
 The interface mentioned by Nick has lookahead between 1 and 2048. So I 
 don't think my interface is appropriate for that.

 Infinite lookahead is a wonderful thing. Consider reading lines from a 
 file. Essentially what you need to do is to keep on reading blocks of data 
 until you see \n (possibly followed by some more stuff). Then you offer 
 the client the line up to the \n. When the client wants a new line, you 
 combine the leftover data you already have with new stuff you read. On 
 occasion you need to move over leftovers, but if your internal buffers are 
 large enough that is negligible (I actually tested this recently).

 Another example: consider dealing with line continuations in reading CSV 
 files. Under certain conditions, you need to read one more line and stitch 
 it with the existing one. This is easy with infinite lookahead, but quite 
 elaborate with lookahead 1.

I think I can see how it might be worthwhile to discourage the traditional 
buffer interface I described in favor of the above. It wouldn't be as 
trivial to use as what people are used to, but I can see that it could avoid 
a lot of unnessisary copying, especially with other people's suggestion of 
allowing the user to provide their own buffer to be filled (and it seems 
easy enough to learn).

But what about when you want a circular buffer? Ie, When you know a certain 
maximum lookahead is fine and you want to minimize memory usage and 
buffer-appends. Circular buffers don't do infinite lookahead so the 
interface maybe doesn't work as well. Plus you probably wouldn't want to 
provide an interface for slicing into the buffer, since the slice could 
straddle the wrap-around point which would require a new allocation (ie 
"return buffer[indexOfFront+sliceStart..$] ~ 
buffer[0..sliceLength-($-(frontIndex+sliceStart))]"). I guess maybe that 
would just call for another type of range.

Feb 05 2011

spir <denis.spir gmail.com> writes:

On 02/05/2011 10:44 PM, Nick Sabalausky wrote:
 "Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org>  wrote in message
 news:iijq99$1a5o$1 digitalmars.com...
 On 2/5/11 6:46 AM, spir wrote:
 On 02/05/2011 10:36 AM, Nick Sabalausky wrote:
 On a separate note, I think a good way of testing the design (and end up
 getting something useful anyway) would be to try to use it to create a
 range
 that's automatically-buffered in a more traditional way. Ie, Given any
 input
 range 'myRange', "buffered(myRange, 2048)" (or something like that)
 would
 wrap it in a new input range that automatically buffers the next 2048
 elements (asynchronously?) whenever its internal buffer is exhausted. Or
 something like that. It's late and I'm tired and I can't think anymore
 ;)

 That's exactly what I'm expecting.
 Funnily enough, I was about to start a thread on the topic after reading
 related posts. My point was:
 "I'm not a specialist in efficiency (rather the opposite), I just know
 there is --theoretically-- relevant performance loss to expect from
 unbuffered input in various cases. Could we define a generic
 input-buffering primitive allowing people to benefit from others'
 competence? Just like Appender."

 Denis

 The buffered range interface as I defined it supports infinite lookahead.
 The interface mentioned by Nick has lookahead between 1 and 2048. So I
 don't think my interface is appropriate for that.

 Infinite lookahead is a wonderful thing. Consider reading lines from a
 file. Essentially what you need to do is to keep on reading blocks of data
 until you see \n (possibly followed by some more stuff). Then you offer
 the client the line up to the \n. When the client wants a new line, you
 combine the leftover data you already have with new stuff you read. On
 occasion you need to move over leftovers, but if your internal buffers are
 large enough that is negligible (I actually tested this recently).

 Another example: consider dealing with line continuations in reading CSV
 files. Under certain conditions, you need to read one more line and stitch
 it with the existing one. This is easy with infinite lookahead, but quite
 elaborate with lookahead 1.

 I think I can see how it might be worthwhile to discourage the traditional
 buffer interface I described in favor of the above. It wouldn't be as
 trivial to use as what people are used to, but I can see that it could avoid
 a lot of unnessisary copying, especially with other people's suggestion of
 allowing the user to provide their own buffer to be filled (and it seems
 easy enough to learn).

 But what about when you want a circular buffer? Ie, When you know a certain
 maximum lookahead is fine and you want to minimize memory usage and
 buffer-appends. Circular buffers don't do infinite lookahead so the
 interface maybe doesn't work as well. Plus you probably wouldn't want to
 provide an interface for slicing into the buffer, since the slice could
 straddle the wrap-around point which would require a new allocation (ie
 "return buffer[indexOfFront+sliceStart..$] ~
 buffer[0..sliceLength-($-(frontIndex+sliceStart))]"). I guess maybe that
 would just call for another type of range.

Becomes too complicated, doesnt it?

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 05 2011

Don <nospam nospam.com> writes:

spir wrote:
 On 02/05/2011 10:44 PM, Nick Sabalausky wrote:
 "Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org>  wrote in message
 news:iijq99$1a5o$1 digitalmars.com...
 On 2/5/11 6:46 AM, spir wrote:
 On 02/05/2011 10:36 AM, Nick Sabalausky wrote:
 On a separate note, I think a good way of testing the design (and 
 end up
 getting something useful anyway) would be to try to use it to create a
 range
 that's automatically-buffered in a more traditional way. Ie, Given any
 input
 range 'myRange', "buffered(myRange, 2048)" (or something like that)
 would
 wrap it in a new input range that automatically buffers the next 2048
 elements (asynchronously?) whenever its internal buffer is 
 exhausted. Or
 something like that. It's late and I'm tired and I can't think anymore
 ;)

 That's exactly what I'm expecting.
 Funnily enough, I was about to start a thread on the topic after 
 reading
 related posts. My point was:
 "I'm not a specialist in efficiency (rather the opposite), I just know
 there is --theoretically-- relevant performance loss to expect from
 unbuffered input in various cases. Could we define a generic
 input-buffering primitive allowing people to benefit from others'
 competence? Just like Appender."

 Denis

 The buffered range interface as I defined it supports infinite 
 lookahead.
 The interface mentioned by Nick has lookahead between 1 and 2048. So I
 don't think my interface is appropriate for that.

 Infinite lookahead is a wonderful thing. Consider reading lines from a
 file. Essentially what you need to do is to keep on reading blocks of 
 data
 until you see \n (possibly followed by some more stuff). Then you offer
 the client the line up to the \n. When the client wants a new line, you
 combine the leftover data you already have with new stuff you read. On
 occasion you need to move over leftovers, but if your internal 
 buffers are
 large enough that is negligible (I actually tested this recently).

 Another example: consider dealing with line continuations in reading CSV
 files. Under certain conditions, you need to read one more line and 
 stitch
 it with the existing one. This is easy with infinite lookahead, but 
 quite
 elaborate with lookahead 1.

 I think I can see how it might be worthwhile to discourage the 
 traditional
 buffer interface I described in favor of the above. It wouldn't be as
 trivial to use as what people are used to, but I can see that it could 
 avoid
 a lot of unnessisary copying, especially with other people's 
 suggestion of
 allowing the user to provide their own buffer to be filled (and it seems
 easy enough to learn).

 But what about when you want a circular buffer? Ie, When you know a 
 certain
 maximum lookahead is fine and you want to minimize memory usage and
 buffer-appends. Circular buffers don't do infinite lookahead so the
 interface maybe doesn't work as well. Plus you probably wouldn't want to
 provide an interface for slicing into the buffer, since the slice could
 straddle the wrap-around point which would require a new allocation (ie
 "return buffer[indexOfFront+sliceStart..$] ~
 buffer[0..sliceLength-($-(frontIndex+sliceStart))]"). I guess maybe that
 would just call for another type of range.

 
 Becomes too complicated, doesnt it?
 
 Denis

Circular buffers don't seem like an 'optional' use case to me. Most real 
I/O works that way.
I think if the abstraction can't handle it, the abstraction is a failure.

Feb 05 2011

spir <denis.spir gmail.com> writes:

On 02/06/2011 01:28 AM, Don wrote:
 spir wrote:
 On 02/05/2011 10:44 PM, Nick Sabalausky wrote:
 "Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org> wrote in message
 news:iijq99$1a5o$1 digitalmars.com...
 On 2/5/11 6:46 AM, spir wrote:
 On 02/05/2011 10:36 AM, Nick Sabalausky wrote:
 On a separate note, I think a good way of testing the design (and end up
 getting something useful anyway) would be to try to use it to create a
 range
 that's automatically-buffered in a more traditional way. Ie, Given any
 input
 range 'myRange', "buffered(myRange, 2048)" (or something like that)
 would
 wrap it in a new input range that automatically buffers the next 2048
 elements (asynchronously?) whenever its internal buffer is exhausted. Or
 something like that. It's late and I'm tired and I can't think anymore
 ;)

 That's exactly what I'm expecting.
 Funnily enough, I was about to start a thread on the topic after reading
 related posts. My point was:
 "I'm not a specialist in efficiency (rather the opposite), I just know
 there is --theoretically-- relevant performance loss to expect from
 unbuffered input in various cases. Could we define a generic
 input-buffering primitive allowing people to benefit from others'
 competence? Just like Appender."

 Denis

 The buffered range interface as I defined it supports infinite lookahead.
 The interface mentioned by Nick has lookahead between 1 and 2048. So I
 don't think my interface is appropriate for that.

 Infinite lookahead is a wonderful thing. Consider reading lines from a
 file. Essentially what you need to do is to keep on reading blocks of data
 until you see \n (possibly followed by some more stuff). Then you offer
 the client the line up to the \n. When the client wants a new line, you
 combine the leftover data you already have with new stuff you read. On
 occasion you need to move over leftovers, but if your internal buffers are
 large enough that is negligible (I actually tested this recently).

 Another example: consider dealing with line continuations in reading CSV
 files. Under certain conditions, you need to read one more line and stitch
 it with the existing one. This is easy with infinite lookahead, but quite
 elaborate with lookahead 1.

 I think I can see how it might be worthwhile to discourage the traditional
 buffer interface I described in favor of the above. It wouldn't be as
 trivial to use as what people are used to, but I can see that it could avoid
 a lot of unnessisary copying, especially with other people's suggestion of
 allowing the user to provide their own buffer to be filled (and it seems
 easy enough to learn).

 But what about when you want a circular buffer? Ie, When you know a certain
 maximum lookahead is fine and you want to minimize memory usage and
 buffer-appends. Circular buffers don't do infinite lookahead so the
 interface maybe doesn't work as well. Plus you probably wouldn't want to
 provide an interface for slicing into the buffer, since the slice could
 straddle the wrap-around point which would require a new allocation (ie
 "return buffer[indexOfFront+sliceStart..$] ~
 buffer[0..sliceLength-($-(frontIndex+sliceStart))]"). I guess maybe that
 would just call for another type of range.

 Becomes too complicated, doesnt it?

 Denis

 Circular buffers don't seem like an 'optional' use case to me. Most real I/O
 works that way.
 I think if the abstraction can't handle it, the abstraction is a failure.

Sorry, I meant the way we start to draw the picture; not circular buffers, can 
see the point about them. Think Heywood's "view window" is a helpful image and 
a good modelling starting point. (maybe it's only me)

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 05 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/5/11 7:28 PM, Don wrote:
 spir wrote:
 On 02/05/2011 10:44 PM, Nick Sabalausky wrote:
 "Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org> wrote in message
 news:iijq99$1a5o$1 digitalmars.com...
 On 2/5/11 6:46 AM, spir wrote:
 On 02/05/2011 10:36 AM, Nick Sabalausky wrote:
 On a separate note, I think a good way of testing the design (and
 end up
 getting something useful anyway) would be to try to use it to
 create a
 range
 that's automatically-buffered in a more traditional way. Ie, Given
 any
 input
 range 'myRange', "buffered(myRange, 2048)" (or something like that)
 would
 wrap it in a new input range that automatically buffers the next 2048
 elements (asynchronously?) whenever its internal buffer is
 exhausted. Or
 something like that. It's late and I'm tired and I can't think
 anymore
 ;)

 That's exactly what I'm expecting.
 Funnily enough, I was about to start a thread on the topic after
 reading
 related posts. My point was:
 "I'm not a specialist in efficiency (rather the opposite), I just know
 there is --theoretically-- relevant performance loss to expect from
 unbuffered input in various cases. Could we define a generic
 input-buffering primitive allowing people to benefit from others'
 competence? Just like Appender."

 Denis

 The buffered range interface as I defined it supports infinite
 lookahead.
 The interface mentioned by Nick has lookahead between 1 and 2048. So I
 don't think my interface is appropriate for that.

 Infinite lookahead is a wonderful thing. Consider reading lines from a
 file. Essentially what you need to do is to keep on reading blocks
 of data
 until you see \n (possibly followed by some more stuff). Then you offer
 the client the line up to the \n. When the client wants a new line, you
 combine the leftover data you already have with new stuff you read. On
 occasion you need to move over leftovers, but if your internal
 buffers are
 large enough that is negligible (I actually tested this recently).

 Another example: consider dealing with line continuations in reading
 CSV
 files. Under certain conditions, you need to read one more line and
 stitch
 it with the existing one. This is easy with infinite lookahead, but
 quite
 elaborate with lookahead 1.

 I think I can see how it might be worthwhile to discourage the
 traditional
 buffer interface I described in favor of the above. It wouldn't be as
 trivial to use as what people are used to, but I can see that it
 could avoid
 a lot of unnessisary copying, especially with other people's
 suggestion of
 allowing the user to provide their own buffer to be filled (and it seems
 easy enough to learn).

 But what about when you want a circular buffer? Ie, When you know a
 certain
 maximum lookahead is fine and you want to minimize memory usage and
 buffer-appends. Circular buffers don't do infinite lookahead so the
 interface maybe doesn't work as well. Plus you probably wouldn't want to
 provide an interface for slicing into the buffer, since the slice could
 straddle the wrap-around point which would require a new allocation (ie
 "return buffer[indexOfFront+sliceStart..$] ~
 buffer[0..sliceLength-($-(frontIndex+sliceStart))]"). I guess maybe that
 would just call for another type of range.

 Becomes too complicated, doesnt it?

 Denis

 Circular buffers don't seem like an 'optional' use case to me. Most real
 I/O works that way.
 I think if the abstraction can't handle it, the abstraction is a failure.

The abstraction does handle it implicitly, except that it doesn't fix 
the buffer size. If you ask for appendToFront() with large numbers 
without calling shiftFront() too, the size of the buffer will ultimately 
increase to accommodate the entire input. That's the infinite in 
infinite lookahead.

Most uses, however, stick with front/popFront (i.e. let the range choose 
the buffer size and handle circularity transparently) and on occasion 
call appendToFront()/shiftFront(). Whenever a part of the buffer has 
been released by calling shiftFront(), the implementation may use it as 
a circular buffer.

Circularity is all transparent, as I think it should be. But the power 
is there. Consider FIE* for contrast. The C API provides setbuf and 
setvbuf, but no way to otherwise take advantage of buffering - at all. 
This really hurts; you can only call fungetc() once and be guaranteed 
success - i.e. all FILE* offer L(1) capabilities no matter how big a 
buffer you set!


Andrei

Feb 05 2011

"Nick Sabalausky" <a a.a> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:iil64l$1f6s$1 digitalmars.com...
 On 2/5/11 7:28 PM, Don wrote:
 Circular buffers don't seem like an 'optional' use case to me. Most real
 I/O works that way.
 I think if the abstraction can't handle it, the abstraction is a failure.

 The abstraction does handle it implicitly, except that it doesn't fix the 
 buffer size. If you ask for appendToFront() with large numbers without 
 calling shiftFront() too, the size of the buffer will ultimately increase 
 to accommodate the entire input. That's the infinite in infinite 
 lookahead.

But what about when the window straddles the border? Ex: The circular 
buffer's internal size is 1000, the current starting point is 900 and the 
window (ie, front()) is 200. I guess that could work fine if front() is a 
random-access range, but if it's an array (which I think is what you 
proposed unless I misunderstood), then front() would have to return a new 
allocation: buf[900..$]~buf[0..100].

Feb 05 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/11 0:01 EST, Nick Sabalausky wrote:
 "Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org>  wrote in message
 news:iil64l$1f6s$1 digitalmars.com...
 On 2/5/11 7:28 PM, Don wrote:
 Circular buffers don't seem like an 'optional' use case to me. Most real
 I/O works that way.
 I think if the abstraction can't handle it, the abstraction is a failure.

 The abstraction does handle it implicitly, except that it doesn't fix the
 buffer size. If you ask for appendToFront() with large numbers without
 calling shiftFront() too, the size of the buffer will ultimately increase
 to accommodate the entire input. That's the infinite in infinite
 lookahead.

 But what about when the window straddles the border? Ex: The circular
 buffer's internal size is 1000, the current starting point is 900 and the
 window (ie, front()) is 200. I guess that could work fine if front() is a
 random-access range, but if it's an array (which I think is what you
 proposed unless I misunderstood), then front() would have to return a new
 allocation: buf[900..$]~buf[0..100].

Say the buffer has 1000 elements of which the last 100 contain data (and 
the other 900 are past data that's not used). Then say this request comes:

stream.appendToFront(150);

At this point the stream may go two ways:

1. Append to the internal in-memory buffer, making it larger:

_buf.length += 150;
... read into _buf[$ - 150 .. $] ...

Now we have a buffer that has 1100 elements, of which the last 250 are used.

2. Move the data to the beginning of the buffer and then read 150 more 
elements starting at position 100:

_buf[0 .. 100] = _buf[$ - 100 .. $];
... read into _buf[100 .. 250] ...

Now _buf has still 1000 elements, of which the first 250 contain data.

How does the stream decide between 1 and 2? Clearly it's undesirable to 
grow the buffer too much and it's also undesirable to copy too much 
data. A simple approach is to establish a bound on losses, for example 
copy data only if size to copy is < 5% of the entire buffer.


Andrei

Feb 06 2011

spir <denis.spir gmail.com> writes:

On 02/06/2011 04:25 PM, Andrei Alexandrescu wrote:
 Say the buffer has 1000 elements of which the last 100 contain data (and the
 other 900 are past data that's not used). Then say this request comes:

 stream.appendToFront(150);

 At this point the stream may go two ways:

 1. Append to the internal in-memory buffer, making it larger:

 _buf.length += 150;
 ... read into _buf[$ - 150 .. $] ...

 Now we have a buffer that has 1100 elements, of which the last 250 are used.

 2. Move the data to the beginning of the buffer and then read 150 more elements
 starting at position 100:

 _buf[0 .. 100] = _buf[$ - 100 .. $];
 ... read into _buf[100 .. 250] ...

 Now _buf has still 1000 elements, of which the first 250 contain data.

 How does the stream decide between 1 and 2? Clearly it's undesirable to grow
 the buffer too much and it's also undesirable to copy too much data. A simple
 approach is to establish a bound on losses, for example copy data only if size
 to copy is < 5% of the entire buffer.

Isn't absolute size also relevant, if not more? I mean, as long as buffer size 
is small (if not insignificant) in absolute value, compared to eg CPU cache or 
available RAM, may it be good strategy to grow it whatever the relative growth 
in proportion of current size?

Also: could a (truely) circular buffer help & solve the above copy problem, 
concretely?


Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 06 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/11 12:42 PM, spir wrote:
 On 02/06/2011 04:25 PM, Andrei Alexandrescu wrote:
 Say the buffer has 1000 elements of which the last 100 contain data
 (and the
 other 900 are past data that's not used). Then say this request comes:

 stream.appendToFront(150);

 At this point the stream may go two ways:

 1. Append to the internal in-memory buffer, making it larger:

 _buf.length += 150;
 ... read into _buf[$ - 150 .. $] ...

 Now we have a buffer that has 1100 elements, of which the last 250 are
 used.

 2. Move the data to the beginning of the buffer and then read 150 more
 elements
 starting at position 100:

 _buf[0 .. 100] = _buf[$ - 100 .. $];
 ... read into _buf[100 .. 250] ...

 Now _buf has still 1000 elements, of which the first 250 contain data.

 How does the stream decide between 1 and 2? Clearly it's undesirable
 to grow
 the buffer too much and it's also undesirable to copy too much data. A
 simple
 approach is to establish a bound on losses, for example copy data only
 if size
 to copy is < 5% of the entire buffer.

 Isn't absolute size also relevant, if not more? I mean, as long as
 buffer size is small (if not insignificant) in absolute value, compared
 to eg CPU cache or available RAM, may it be good strategy to grow it
 whatever the relative growth in proportion of current size?

Of course. But mathematically you want to find bounds as a fraction of 
the input size.

 Also: could a (truely) circular buffer help & solve the above copy
 problem, concretely?

Not if you want infinite lookahead, which I think is what any modern 
buffering system should offer.


Andrei

Feb 06 2011

Tomek =?ISO-8859-2?B?U293afFza2k=?= <just ask.me> writes:

Andrei Alexandrescu napisa=B3:

 Also: could a (truely) circular buffer help & solve the above copy
 problem, concretely? =20

=20
 Not if you want infinite lookahead, which I think is what any modern=20
 buffering system should offer.

Truely circular, probably not, but a wrap-around slice (circular view of le=
ngth at most underlying.length) does offer that and solves the copy problem=
 with style.

--=20
Tomek

Feb 06 2011

spir <denis.spir gmail.com> writes:

On 02/06/2011 08:49 PM, Tomek Sowiński wrote:
 Andrei Alexandrescu napisał:

 Also: could a (truely) circular buffer help&  solve the above copy
 problem, concretely?

 Not if you want infinite lookahead, which I think is what any modern
 buffering system should offer.

 Truely circular, probably not, but a wrap-around slice (circular view of
length at most underlying.length) does offer that and solves the copy problem
with style.

Right.

_________________
vita es estrany
spir.wikidot.com

Feb 06 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/11 4:51 PM, spir wrote:
 On 02/06/2011 08:49 PM, Tomek Sowiński wrote:
 Andrei Alexandrescu napisał:

 Also: could a (truely) circular buffer help& solve the above copy
 problem, concretely?

 Not if you want infinite lookahead, which I think is what any modern
 buffering system should offer.

 Truely circular, probably not, but a wrap-around slice (circular view
 of length at most underlying.length) does offer that and solves the
 copy problem with style.

 Right.

With fixed lookahead you can't do a lot of things - such as line 
continuation in C programs or CSV files. There are plenty of other 
examples. Generally I believe k-lookahead is a thing of the past and 
infinite-lookahead is where the future is.

Andrei

Feb 06 2011

Tomek =?ISO-8859-2?B?U293afFza2k=?= <just ask.me> writes:

Andrei Alexandrescu napisa=B3:

 Truely circular, probably not, but a wrap-around slice (circular view
 of length at most underlying.length) does offer that and solves the
 copy problem with style. =20

 Right. =20

=20
 With fixed lookahead you can't do a lot of things - such as line=20
 continuation in C programs or CSV files. There are plenty of other=20
 examples. Generally I believe k-lookahead is a thing of the past and=20
 infinite-lookahead is where the future is.

Amen. But I don't see how wrap-around slices stand in the way. When you nee=
d more capacity, resize the internal buffer array and let front() expose a =
new wrap-around slice. Just like you'd do with T[]. The only difference is =
you don't care if a fetch goes over the edge.

--=20
Tomek

Feb 06 2011

"Nick Sabalausky" <a a.a> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:iimnm6$1m4a$2 digitalmars.com...
 On 2/6/11 12:42 PM, spir wrote:
 Also: could a (truely) circular buffer help & solve the above copy
 problem, concretely?

 Not if you want infinite lookahead, which I think is what any modern 
 buffering system should offer.

Agreed, but not everything always needs infinite lookahead. And sometimes 
space guarantees are important.

Feb 06 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/11 4:13 PM, Nick Sabalausky wrote:
 "Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org>  wrote in message
 news:iimnm6$1m4a$2 digitalmars.com...
 On 2/6/11 12:42 PM, spir wrote:
 Also: could a (truely) circular buffer help&  solve the above copy
 problem, concretely?

 Not if you want infinite lookahead, which I think is what any modern
 buffering system should offer.

 Agreed, but not everything always needs infinite lookahead. And sometimes
 space guarantees are important.

If you don't use infinite lookahead you won't consume infinite memory. 
You stand to consume more memory though but I believe we're not supposed 
to optimize for that.

To implement O(n) buffering for n elements of lookahead you can use the 
circular buffer already present in std.range as backend, coupled with 
routines to replenish it. The disadvantage is that the client doesn't 
see true T[] buffers, it sees lookalike random-access ranges that use 
modulo operations for indexing. All in all this is an abstraction worth 
providing but not as general as discardFromFront() and appendToFront().


Andrei

Feb 06 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sun, 06 Feb 2011 10:25:15 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 How does the stream decide between 1 and 2? Clearly it's undesirable to  
 grow the buffer too much and it's also undesirable to copy too much  
 data. A simple approach is to establish a bound on losses, for example  
 copy data only if size to copy is < 5% of the entire buffer.

I think also, you should test to see if appending will copy the data  
anyways (i.e. reallocate).  We may need to add a runtime function for this.

-Steve

Feb 07 2011

spir <denis.spir gmail.com> writes:

On 02/05/2011 08:22 AM, Ellery Newcomer wrote:
 2. R defines a primitive shiftFront(size_t n). The semantics of the
 primitive is that, if r.front.length >= n, then shiftFront(n) discards
 the first n elements in r.front. Subsequently r.front will return a
 slice of the remaining elements.

 Does shiftFront literally move element n to index 0 and so on? It seems to me
 that if you do, its going to have horrid performance, and if you don't, then
 you will eventually run into situations where appendToFront will require a wrap
 around, which loses you your contiguity, or a reallocation of the buffer.

Is this really what it means? I naively understood "discards" as meaning
     buf = buf[n..$];
or similar.

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 05 2011

so <so so.do> writes:

 Does shiftFront literally move element n to index 0 and so on? It seems  
 to me that if you do, its going to have horrid performance, and if you  
 don't, then you will eventually run into situations where appendToFront  
 will require a wrap around, which loses you your contiguity, or a  
 reallocation of the buffer.

I think it is basically popFrontN(), and appendToFront() a just append.

Feb 05 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/5/11 2:22 AM, Ellery Newcomer wrote:
 Does shiftFront literally move element n to index 0 and so on? It seems
 to me that if you do, its going to have horrid performance, and if you
 don't, then you will eventually run into situations where appendToFront
 will require a wrap around, which loses you your contiguity, or a
 reallocation of the buffer.

No, it's a mere internal operation bufpos += n or so.

Andrei

Feb 05 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-02-05 00:46:40 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 I've had the opportunity today to put some solid hours of thinking into 
 the relationship (better said the relatedness) of what would be called 
 "buffered streams" and ranges. They have some commonalities and some 
 differences, but it's been difficult to capture them. I think now I 
 have a clear view, caused by a few recent discussions. One was the CSV 
 reader discussed on the Phobos list; another was the discussion on 
 defining the "right" std.xml.
 
 First, let's start with the humblest abstraction of all - an input 
 range, which only defines the troika empty/front/popFront with the 
 known semantics.
 
 An input range consumes input destructively and has a one-element 
 horizon. It may as well considered a buffered stream with the buffer 
 length exactly one.
 
 Going from there, we may say that certain streaming can be done by 
 using an input range of ubyte (or dchar for text). That would be the 
 UTFpowered equivalent of getchar(). The readf function operates that 
 way - it only needs to look one character ahead. Incidentally, the CSV 
 format also requires lookahead of 1, so it also can operate on a range 
 of dchar.
 
 At this point we need to ask ourselves an essential question. Since we 
 have this "input range" abstraction for a 1-element buffer, what would 
 its n-elements buffer representation look like? How do we go from 
 "input range of T" (which really is "unbuffered input range of T" to 
 "buffered input range of T"?
 
 Honestly, the answer was extremely unclear to me for the longest time. 
 I thought that such a range would be an extension of the unbuffered 
 one, e.g. a range that still offers T from front() but also offers some 
 additional functions - e.g. a lookahead in the form of a random-access 
 operator. I still think something can be defined along those lines, but 
 today I came together with a design that is considerably simpler both 
 for the client and the designer of the range.
 
 I hereby suggest we define "buffered input range of T" any range R that 
 satisfies the following conditions:
 
 1. R is an input range of T[]
 
 2. R defines a primitive shiftFront(size_t n). The semantics of the 
 primitive is that, if r.front.length >= n, then shiftFront(n) discards 
 the first n elements in r.front. Subsequently r.front will return a 
 slice of the remaining elements.
 
 3. R defines a primitive appendToFront(size_t n). Semantics: adds at 
 most n more elements from the underlying stream and makes them 
 available in addition to whatever was in front. For example if 
 r.front.length was 1024, after the call r.appendToFront(512) will have 
 r.front have length 1536 of which the first 1024 will be the old front 
 and the rest will be newly-read elements (assuming that the stream had 
 enough data). If n = 0, this instructs the stream to add any number of 
 elements at its own discretion.
 
 This is it. I like many things about this design, although I still fear 
 some fatal flaw may be found with it.
 
 With these primitives a lot of good operating operating on buffered 
 streams can be written efficiently. The range is allowed to reuse data 
 in its buffers (unless that would contradict language invariants, e.g. 
 if T is invariant), so if client code wants to stash away parts of the 
 input, it needs to make a copy.
 
 One great thing is that buffered ranges as defined above play very well 
 with both ranges and built-in arrays - two quintessential parts of D. I 
 look at this and say, "this all makes sense". For example the design 
 could be generalized to operate on some random-access range other than 
 the built-in array, but then I'm thinking, unless some advantage comes 
 about, why not giving T[] a little special status? Probably everyone 
 thinks of contiguous memory when thinking "buffers", so here 
 generalization may be excessive (albeit meaningful).
 
 Finally, this design is very easy to experiment with and causes no 
 disruption to ranges. I can readily add the primitives to byLine and 
 byChunk so we can see what streaming we can do that way.
 
 What do you all think?

One thing I'm wondering is whether it'd be more efficient if we could 
provide our own buffer to be filled. In cases where you want to 
preserve the data, this could let you avoid double-copying: first copy 
in the temporary buffer and then at the permanent storage location. If 
you need the data only temporarily however providing your buffer to be 
filled might be less efficient for a range that can't avoid copying to 
the temporary buffer for some reason..

Overall, it looks like a good design. It's quite low-level, but that's 
not a bad thing. I'll have to think a little to see how I could 
integrate it into my XML parser (which only deal with complete files in 
memory at this time). Being able to say "fill this buffer" would 
certainly make things easier for me.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Feb 04 2011

spir <denis.spir gmail.com> writes:

On 02/05/2011 08:45 AM, Michel Fortin wrote:
 One thing I'm wondering is whether it'd be more efficient if we could provide
 our own buffer to be filled. In cases where you want to preserve the data, this
 could let you avoid double-copying: first copy in the temporary buffer and then
 at the permanent storage location. If you need the data only temporarily
 however providing your buffer to be filled might be less efficient for a range
 that can't avoid copying to the temporary buffer for some reason..

Does this also makes sense when one needs to iterate over a whole set of source 
data via buferred input rangeS? I mean the same buffer can be reused, avoiding 
repeted allocation (or is this wrong or irrelevant?).

Deins
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 05 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-02-05 07:01:24 -0500, spir <denis.spir gmail.com> said:

 On 02/05/2011 08:45 AM, Michel Fortin wrote:
 One thing I'm wondering is whether it'd be more efficient if we could provide
 our own buffer to be filled. In cases where you want to preserve the data, this
 could let you avoid double-copying: first copy in the temporary buffer and then
 at the permanent storage location. If you need the data only temporarily
 however providing your buffer to be filled might be less efficient for a range
 that can't avoid copying to the temporary buffer for some reason..

 
 Does this also makes sense when one needs to iterate over a whole set 
 of source data via buferred input rangeS? I mean the same buffer can be 
 reused, avoiding repeted allocation (or is this wrong or irrelevant?).

As I said in my post, whether a temporary buffer or a user-supplied 
buffer is better depends on whether you plan to store the data beyond 
the temporary buffer's lifetime or not. If you just iterate to 
calculate the SHA1 hash, the temporary buffer is fine (and possibly 
better depending on the range's implementation). If you iterate to 
calculate the SHA1 hash *and* also want to store the file in memory, 
then it's better if you can provide your own buffer which can point 
directly to the permanent storage location and bypass copying to the 
temporary buffer (if the range's implementation allows it).

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Feb 05 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/5/11 2:45 AM, Michel Fortin wrote:
 One thing I'm wondering is whether it'd be more efficient if we could
 provide our own buffer to be filled. In cases where you want to preserve
 the data, this could let you avoid double-copying: first copy in the
 temporary buffer and then at the permanent storage location. If you need
 the data only temporarily however providing your buffer to be filled
 might be less efficient for a range that can't avoid copying to the
 temporary buffer for some reason..

Generally when one says "I want the stream to copy data straight into my 
buffers" this is the same as "I want the stream to be unbuffered". So if 
you want to provide your own buffers to be filled, we need to discuss 
refining the design of unbuffered input - for example by adding an 
optional routine for bulk transfer to input ranges.

Andrei

Feb 05 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-02-05 10:02:47 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Generally when one says "I want the stream to copy data straight into 
 my buffers" this is the same as "I want the stream to be unbuffered". 
 So if you want to provide your own buffers to be filled, we need to 
 discuss refining the design of unbuffered input - for example by adding 
 an optional routine for bulk transfer to input ranges.

You're right, this is a different thing.

My major gripe with ranges at this time is that it's almost impossible 
to design an algorithm that can take slices *or* make copies depending 
on whether the range supports slicing or not, and whether the slices 
are stable (not going to be mutated when popping elements from the 
range). At least not without writing two implementations of it.

I reread your initial post to get a clearer idea of what it meant. It 
seems to me that your buffered range design could be made to fix that 
hole. If the data you want to parse is all in memory, the buffered 
range could simply use the original array as its buffer; shiftFront 
would simply just the whole array to remove the first n elements while 
appendToFront would do nothing (as the buffer already contains all of 
the content). And if the data is immutable, then it's safe to just take 
a slice of it to preserve it instead of doing a copy. So you can't 
really be more efficient than that, it's just great.

As for getting the data in bulk directly so you can avoid needless 
copies... I think the same optimization is possible with a buffered 
range. All you need is a buffered range that doesn't reuse the buffer, 
presumably one of immutable(T)[]. With it, you can slice at will 
without fear of the data being overwritten at a later time.

So my rereading of your proposal convinced me. Go ahead, I can't wait 
to use it. :-)

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Feb 05 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-02-05 11:41:18 -0500, Michel Fortin <michel.fortin michelf.com> said:

 If the data you want to parse is all in memory, the buffered range 
 could simply use the original array as its buffer; shiftFront would 
 simply just the whole array to remove the first n elements while 
 appendToFront would do nothing (as the buffer already contains all of 
 the content).

Oops, please change
   "shiftFront would simply just the whole array"
to
   "shiftFront would simply slice the whole array"


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Feb 05 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 05 Feb 2011 10:02:47 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 2/5/11 2:45 AM, Michel Fortin wrote:
 One thing I'm wondering is whether it'd be more efficient if we could
 provide our own buffer to be filled. In cases where you want to preserve
 the data, this could let you avoid double-copying: first copy in the
 temporary buffer and then at the permanent storage location. If you need
 the data only temporarily however providing your buffer to be filled
 might be less efficient for a range that can't avoid copying to the
 temporary buffer for some reason..

 Generally when one says "I want the stream to copy data straight into my  
 buffers" this is the same as "I want the stream to be unbuffered". So if  
 you want to provide your own buffers to be filled, we need to discuss  
 refining the design of unbuffered input - for example by adding an  
 optional routine for bulk transfer to input ranges.

I may want to store 1% of a very large file.  You are saying I must either  
a) unbuffer the entire file (handling the buffering on my own) or b) take  
the penalty and double copy the data.

I want c) temporarily use my buffer for buffering until I say to stop.

The range interface doesn't make this easy...

-Steve

Feb 07 2011

spir <denis.spir gmail.com> writes:

On 02/07/2011 02:01 PM, Steven Schveighoffer wrote:
 On Sat, 05 Feb 2011 10:02:47 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 2/5/11 2:45 AM, Michel Fortin wrote:
 One thing I'm wondering is whether it'd be more efficient if we could
 provide our own buffer to be filled. In cases where you want to preserve
 the data, this could let you avoid double-copying: first copy in the
 temporary buffer and then at the permanent storage location. If you need
 the data only temporarily however providing your buffer to be filled
 might be less efficient for a range that can't avoid copying to the
 temporary buffer for some reason..

 Generally when one says "I want the stream to copy data straight into my
 buffers" this is the same as "I want the stream to be unbuffered". So if you
 want to provide your own buffers to be filled, we need to discuss refining
 the design of unbuffered input - for example by adding an optional routine
 for bulk transfer to input ranges.

 I may want to store 1% of a very large file. You are saying I must either a)
 unbuffer the entire file (handling the buffering on my own) or b) take the
 penalty and double copy the data.

 I want c) temporarily use my buffer for buffering until I say to stop.

 The range interface doesn't make this easy...

This looks very similar to a possible current use case of mine. (lookahead on 
demand --> possible backtracking)

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 07 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday 04 February 2011 21:46:40 Andrei Alexandrescu wrote:
 I've had the opportunity today to put some solid hours of thinking into
 the relationship (better said the relatedness) of what would be called
 "buffered streams" and ranges. They have some commonalities and some
 differences, but it's been difficult to capture them. I think now I have
 a clear view, caused by a few recent discussions. One was the CSV reader
 discussed on the Phobos list; another was the discussion on defining the
 "right" std.xml.
 
 First, let's start with the humblest abstraction of all - an input
 range, which only defines the troika empty/front/popFront with the known
 semantics.
 
 An input range consumes input destructively and has a one-element
 horizon. It may as well considered a buffered stream with the buffer
 length exactly one.
 
 Going from there, we may say that certain streaming can be done by using
 an input range of ubyte (or dchar for text). That would be the
 UTFpowered equivalent of getchar(). The readf function operates that way
 - it only needs to look one character ahead. Incidentally, the CSV
 format also requires lookahead of 1, so it also can operate on a range
 of dchar.
 
 At this point we need to ask ourselves an essential question. Since we
 have this "input range" abstraction for a 1-element buffer, what would
 its n-elements buffer representation look like? How do we go from "input
 range of T" (which really is "unbuffered input range of T" to "buffered
 input range of T"?
 
 Honestly, the answer was extremely unclear to me for the longest time. I
 thought that such a range would be an extension of the unbuffered one,
 e.g. a range that still offers T from front() but also offers some
 additional functions - e.g. a lookahead in the form of a random-access
 operator. I still think something can be defined along those lines, but
 today I came together with a design that is considerably simpler both
 for the client and the designer of the range.
 
 I hereby suggest we define "buffered input range of T" any range R that
 satisfies the following conditions:
 
 1. R is an input range of T[]
 
 2. R defines a primitive shiftFront(size_t n). The semantics of the
 primitive is that, if r.front.length >= n, then shiftFront(n) discards
 the first n elements in r.front. Subsequently r.front will return a
 slice of the remaining elements.
 
 3. R defines a primitive appendToFront(size_t n). Semantics: adds at
 most n more elements from the underlying stream and makes them available
 in addition to whatever was in front. For example if r.front.length was
 1024, after the call r.appendToFront(512) will have r.front have length
 1536 of which the first 1024 will be the old front and the rest will be
 newly-read elements (assuming that the stream had enough data). If n =
 0, this instructs the stream to add any number of elements at its own
 discretion.
 
 This is it. I like many things about this design, although I still fear
 some fatal flaw may be found with it.
 
 With these primitives a lot of good operating operating on buffered
 streams can be written efficiently. The range is allowed to reuse data
 in its buffers (unless that would contradict language invariants, e.g.
 if T is invariant), so if client code wants to stash away parts of the
 input, it needs to make a copy.
 
 One great thing is that buffered ranges as defined above play very well
 with both ranges and built-in arrays - two quintessential parts of D. I
 look at this and say, "this all makes sense". For example the design
 could be generalized to operate on some random-access range other than
 the built-in array, but then I'm thinking, unless some advantage comes
 about, why not giving T[] a little special status? Probably everyone
 thinks of contiguous memory when thinking "buffers", so here
 generalization may be excessive (albeit meaningful).
 
 Finally, this design is very easy to experiment with and causes no
 disruption to ranges. I can readily add the primitives to byLine and
 byChunk so we can see what streaming we can do that way.
 
 What do you all think?

Hmm. I think that I'd have to have an actual implementation to mess around with 
to say much. My general take on buffered input is that I don't want to worry 
about it. I want it to be buffered so that it's more efficient, but I don't
want to 
have to care about it in how I use it. I would have expected a buffered input 
range to be exactly the same as an input range except that it doesn't really 
just pull in one character behind the scenes. It pulls in 1024 or whatever when 
popFront() would result in the end of the buffer being reached, and you just
get 
the first one with front. The API doesn't reflect the fact that it's buffered
at 
all except perhaps in how you initialize it (by telling how big the buffer is, 
though generally I don't want to have to care about that either).

Now, there may be some sort of use case where you actually need to care about 
the buffering, so using buffered data in the way that I was thinking wouldn't 
really work. But if so, that's not the sort of use case that I normally run 
into.

Regardless, a more normal range could be built on top of what you're
suggesting, 
and it could do essentially what I was thinking buffered ranges would do. So, 
perhaps doing what you're suggesting and building what I was thinking of on top 
of that would be the way to go. That way, if you actually care about messing 
with the buffer, you can, but if not, you just use it normally and the
buffering 
is dealt with underneath.

- Jonathan M Davis

Feb 05 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/5/11 5:09 AM, Jonathan M Davis wrote:
 Hmm. I think that I'd have to have an actual implementation to mess around with
 to say much. My general take on buffered input is that I don't want to worry
 about it. I want it to be buffered so that it's more efficient, but I don't
want to
 have to care about it in how I use it. I would have expected a buffered input
 range to be exactly the same as an input range except that it doesn't really
 just pull in one character behind the scenes. It pulls in 1024 or whatever when
 popFront() would result in the end of the buffer being reached, and you just
get
 the first one with front. The API doesn't reflect the fact that it's buffered
at
 all except perhaps in how you initialize it (by telling how big the buffer is,
 though generally I don't want to have to care about that either).

Transparent buffering sounds sensible but in fact it robs you of 
important capabilities. It essentially forces you to use grammars with 
lookahead 1 for all input operations. Being able to peek forward into 
the stream without committing to read from it allows you to e.g. do 
operations like "does this stream start with a specific word" etc. As soon

Andrei

Feb 05 2011

Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:

Andrei Alexandrescu napisa=C5=82:

 Transparent buffering sounds sensible but in fact it robs you of=20
 important capabilities. It essentially forces you to use grammars with=20
 lookahead 1 for all input operations. Being able to peek forward into=20
 the stream without committing to read from it allows you to e.g. do=20
 operations like "does this stream start with a specific word" etc. As soon

Broken sentence?

Feb 05 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/5/11 12:59 PM, Tomek Sowiński wrote:
 Andrei Alexandrescu napisał:

 Transparent buffering sounds sensible but in fact it robs you of
 important capabilities. It essentially forces you to use grammars with
 lookahead 1 for all input operations. Being able to peek forward into
 the stream without committing to read from it allows you to e.g. do
 operations like "does this stream start with a specific word" etc. As soon

 Broken sentence?

Sorry. Well it was nothing interesting anyway.

Andrei

Feb 05 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday 05 February 2011 07:16:45 Andrei Alexandrescu wrote:
 On 2/5/11 5:09 AM, Jonathan M Davis wrote:
 Hmm. I think that I'd have to have an actual implementation to mess
 around with to say much. My general take on buffered input is that I
 don't want to worry about it. I want it to be buffered so that it's more
 efficient, but I don't want to have to care about it in how I use it. I
 would have expected a buffered input range to be exactly the same as an
 input range except that it doesn't really just pull in one character
 behind the scenes. It pulls in 1024 or whatever when popFront() would
 result in the end of the buffer being reached, and you just get the
 first one with front. The API doesn't reflect the fact that it's
 buffered at all except perhaps in how you initialize it (by telling how
 big the buffer is, though generally I don't want to have to care about
 that either).

 
 Transparent buffering sounds sensible but in fact it robs you of
 important capabilities. It essentially forces you to use grammars with
 lookahead 1 for all input operations. Being able to peek forward into
 the stream without committing to read from it allows you to e.g. do
 operations like "does this stream start with a specific word" etc. As soon

The thing is though that if I want to be iterating over a string which is 
buffered (from a file or stream or whatever), I want front to be
immutable(char) 
or char, not immutable(char)[] or char[]. I can see how having an interface 
which allows startsWith to efficiently check whether the buffered string starts 
with a particular string makes good sense, but generally, as far as I'm 
concerned, that's startsWith's problem. How would I even begin to use a
buffered 
range of string[] as a string?

Normally, when I've used buffered anything, it's been purely for efficiency 
reasons. All I've cared about is having a stream or file or whatever. The fact 
that reading it from the file (or wherever it came from) in a buffered manner
is 
more efficient means that I want it buffered, but that hasn't had any effect on
how 
I've used it. If I want x characters from the file, I ask for x characters.
It's 
the buffered object's problem how many reads that does or doesn't do.

You must be thinking of a use case which I don't normal think of or am not
aware 
of. In my experience, buffering has always been an implementation detail that
you 
use because it's more efficient, but you don't worry about it beyond creating a 
buffered stream rather than an unbuffered one.

- Jonathan M Davis

Feb 05 2011

"Nick Sabalausky" <a a.a> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:iijpp7$197f$1 digitalmars.com...
 On 2/5/11 5:09 AM, Jonathan M Davis wrote:
 Hmm. I think that I'd have to have an actual implementation to mess 
 around with
 to say much. My general take on buffered input is that I don't want to 
 worry
 about it. I want it to be buffered so that it's more efficient, but I 
 don't want to
 have to care about it in how I use it. I would have expected a buffered 
 input
 range to be exactly the same as an input range except that it doesn't 
 really
 just pull in one character behind the scenes. It pulls in 1024 or 
 whatever when
 popFront() would result in the end of the buffer being reached, and you 
 just get
 the first one with front. The API doesn't reflect the fact that it's 
 buffered at
 all except perhaps in how you initialize it (by telling how big the 
 buffer is,
 though generally I don't want to have to care about that either).

 Transparent buffering sounds sensible but in fact it robs you of important 
 capabilities. It essentially forces you to use grammars with lookahead 1 
 for all input operations. Being able to peek forward into the stream 
 without committing to read from it allows you to e.g. do operations like 
 "does this stream start with a specific word" etc. As soon

That shouldn't be a problem for the cases where a lookahead of 1 is all 
that's needed. So both types can exist (with the traditional/automatic type 
most likely built on top of Andrei's type). Thus, I think the only question 
is "Are the appropriate use-cases for the traditional/automatic type minor 
enough and infrequent enough to actively discourage it by not providing it?" 
That I can't answer.

Feb 05 2011

spir <denis.spir gmail.com> writes:

On 02/05/2011 11:00 PM, Nick Sabalausky wrote:
 Transparent buffering sounds sensible but in fact it robs you of important
  capabilities. It essentially forces you to use grammars with lookahead 1
  for all input operations. Being able to peek forward into the stream
  without committing to read from it allows you to e.g. do operations like
  "does this stream start with a specific word" etc. As soon


 That shouldn't be a problem for the cases where a lookahead of 1 is all
 that's needed. So both types can exist (with the traditional/automatic type
 most likely built on top of Andrei's type). Thus, I think the only question
 is "Are the appropriate use-cases for the traditional/automatic type minor
 enough and infrequent enough to actively discourage it by not providing it?"
 That I can't answer.

And what about backtracking (eg for parsing the source)?

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 05 2011

"Nick Sabalausky" <a a.a> writes:

"spir" <denis.spir gmail.com> wrote in message 
news:mailman.1321.1296950957.4748.digitalmars-d puremagic.com...
 On 02/05/2011 11:00 PM, Nick Sabalausky wrote:
 Transparent buffering sounds sensible but in fact it robs you of 
 important
  capabilities. It essentially forces you to use grammars with 
 lookahead 1
  for all input operations. Being able to peek forward into the stream
  without committing to read from it allows you to e.g. do operations 
 like
  "does this stream start with a specific word" etc. As soon


 That shouldn't be a problem for the cases where a lookahead of 1 is all
 that's needed. So both types can exist (with the traditional/automatic 
 type
 most likely built on top of Andrei's type). Thus, I think the only 
 question
 is "Are the appropriate use-cases for the traditional/automatic type 
 minor
 enough and infrequent enough to actively discourage it by not providing 
 it?"
 That I can't answer.

 And what about backtracking (eg for parsing the source)?

Like I said, there are certainly cases where a lookahead of 1 isn't 
sufficient, and for those, something more like Andrei's proposal can be 
used.

(FWIW, LR doesn't usually need backtracking. That's more typically an LL 
thing. Not that LL is any less important, though. Of course, if the lexical 
grammer supports non-consuming lookahead, then you'd still need lookahead >1 
no matter what parsing algorithm is used.)

Feb 05 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday 05 February 2011 12:57:21 Jonathan M Davis wrote:
 On Saturday 05 February 2011 07:16:45 Andrei Alexandrescu wrote:
 On 2/5/11 5:09 AM, Jonathan M Davis wrote:
 Hmm. I think that I'd have to have an actual implementation to mess
 around with to say much. My general take on buffered input is that I
 don't want to worry about it. I want it to be buffered so that it's
 more efficient, but I don't want to have to care about it in how I use
 it. I would have expected a buffered input range to be exactly the
 same as an input range except that it doesn't really just pull in one
 character behind the scenes. It pulls in 1024 or whatever when
 popFront() would result in the end of the buffer being reached, and
 you just get the first one with front. The API doesn't reflect the
 fact that it's buffered at all except perhaps in how you initialize it
 (by telling how big the buffer is, though generally I don't want to
 have to care about that either).

 
 Transparent buffering sounds sensible but in fact it robs you of
 important capabilities. It essentially forces you to use grammars with
 lookahead 1 for all input operations. Being able to peek forward into
 the stream without committing to read from it allows you to e.g. do
 operations like "does this stream start with a specific word" etc. As
 soon

 
 The thing is though that if I want to be iterating over a string which is
 buffered (from a file or stream or whatever), I want front to be
 immutable(char) or char, not immutable(char)[] or char[]. I can see how
 having an interface which allows startsWith to efficiently check whether
 the buffered string starts with a particular string makes good sense, but
 generally, as far as I'm concerned, that's startsWith's problem. How would
 I even begin to use a buffered range of string[] as a string?
 
 Normally, when I've used buffered anything, it's been purely for efficiency
 reasons. All I've cared about is having a stream or file or whatever. The
 fact that reading it from the file (or wherever it came from) in a
 buffered manner is more efficient means that I want it buffered, but that
 hasn't had any effect on how I've used it. If I want x characters from the
 file, I ask for x characters. It's the buffered object's problem how many
 reads that does or doesn't do.
 
 You must be thinking of a use case which I don't normal think of or am not
 aware of. In my experience, buffering has always been an implementation
 detail that you use because it's more efficient, but you don't worry about
 it beyond creating a buffered stream rather than an unbuffered one.

Okay. I think that I've been misunderstanding some stuff here. I forgot that we 
were dealing with input ranges rather than forward ranges, and many range 
functions just don't work with input ranges, since they lack save(). Bleh.

Okay. Honestly, what I'd normally want to be dealing with when reading a stream 
or file is a buffered forward range which is implemented in a manner which 
minimized copies. Having to deal with a input range, let alone what Andrei is 
suggesting here would definitely be annoying to say the least.

Couldn't we do something which created a new buffer each time that it read in 
data from a file, and then it could be a forward range with infinite
look-ahead. 
The cost of creating a new buffer would likely be minimal, if not outright 
neglible, in comparison to reading in the data from a file, and having multiple 
buffers would allow it to be a forward range. Perhaps, the creation of a new 
buffer could even be skipped if save had never been called and therefore no 
external references to the buffer would exist - at least as long as we're
talking 
about bytes or characters or other value types.

Maybe there's some major flaw in that basic idea. I don't know. But Andrei's 
suggestion sounds like a royal pain for basic I/O. If that's all I had to deal 
with when trying to lazily read in a file and process it, I'd just use
readText() 
instead, since it would just be way easier to use. But that's not exactly
ideal, 
because it doesn't work well with large files. Maybe Andrei's idea is great to 
have and maybe it _should_ be in Phobos, but I really think that we need a 
higher level abstraction that makes a stream into a forward range so that it's 
actually simple to use buffered I/O. As efficient as Andrei's suggestion may
be, it 
sure sounds like a royal pain to use - especially in comparison to readText().

So, maybe I'm still misunderstanding or missing something here, but what _I_ 
want to see for I/O streams is a _forward_ range which is buffered and which 
reads in the file or whatever the data comes from in a lazy manner. The more I 
think about it, the less I like input ranges. They're just so painfully 
restrictive. They may be necessary at times, but I'd _much_ prefer to deal with 
forward ranges.

On a related note, perhaps we should add a function to some subset of ranges 
which is called something like frontN() and returns a T[] (or perhaps a range 
over type T) when the range is a range over type T. That way, you could grab a 
whole chunk at once without taking it out of the range or having to process the 
range element by element. It wouldn't even need to be a random access range. It 
would just need enough lookahead to grab the first n elements.

So, I don't know what the best solution to this problem is, but I'd _really_ 
like one which makes buffered I/O _simple_, and while Andrei's solution may be
a 
great building block, it is _not_ simple.

- Jonathan M Davis

Feb 06 2011

Michel Fortin <michel.fortin michelf.com> writes:

On 2011-02-06 03:22:08 -0500, Jonathan M Davis <jmdavisProg gmx.com> said:

 So, maybe I'm still misunderstanding or missing something here, but what _I_
 want to see for I/O streams is a _forward_ range which is buffered and which
 reads in the file or whatever the data comes from in a lazy manner. The more I
 think about it, the less I like input ranges. They're just so painfully
 restrictive. They may be necessary at times, but I'd _much_ prefer to deal with
 forward ranges.

It's true that forward ranges are much easier to deal with. That said, 
if the underlying abstraction does not support saving a position to 
return to it later (think of a network stream), implementing them is 
rather impractical, even though it might be possible with buffering.

But still, let's let's see how we could implement a forward range on 
top of the buffered range abstraction could be done:

1. Wrap the buffered range in a data structure that keeps track of the 
absolute offset for the first element in the range and contains a 
sorted list of offsets representing all the forward ranges iterating on 
it.

2. Make is so that each time the smallest offset in the list is 
advanced you call shiftFront to to remove from the buffer the no longer 
referenced elements; and each time an offset goes beyond what's 
available in the buffer you call appendToFront to make elements up to 
that index available in the buffer. This ensures the buffer always 
cover all of the offsets in the list. To make this efficient, the list 
must be kept sorted at all times.

3. Then you can create forward ranges as reference to that data 
structure: all they have to do is maintain their offset in the list. 
Creating a new range would just create another offset in the list and 
ensures the buffer is preserved. When the range advances or gets 
destroyed it updates or destroy its offset in the list accordingly. 
This way the buffer always covers all forward ranges.

While it is indeed possible to do what you want, it's hardly a good 
abstraction to build an efficient parsing algorithm. There's just too 
much bookkeeping to do. And you must be sure saved ranges get destroyed 
as soon as possible to avoid the buffer from growing too much, 
something you usually don't have to care about with forward ranges.


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Feb 06 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/11 3:22 EST, Jonathan M Davis wrote:
 Okay. I think that I've been misunderstanding some stuff here. I forgot that we
 were dealing with input ranges rather than forward ranges, and many range
 functions just don't work with input ranges, since they lack save(). Bleh.

 Okay. Honestly, what I'd normally want to be dealing with when reading a stream
 or file is a buffered forward range which is implemented in a manner which
 minimized copies. Having to deal with a input range, let alone what Andrei is
 suggesting here would definitely be annoying to say the least.

 Couldn't we do something which created a new buffer each time that it read in
 data from a file, and then it could be a forward range with infinite
look-ahead.
 The cost of creating a new buffer would likely be minimal, if not outright
 neglible, in comparison to reading in the data from a file, and having multiple
 buffers would allow it to be a forward range. Perhaps, the creation of a new
 buffer could even be skipped if save had never been called and therefore no
 external references to the buffer would exist - at least as long as we're
talking
 about bytes or characters or other value types.

APIs predicated on the notion that I/O is very expensive and extra 
overheads are not measurable have paid dearly for it (e.g. C++'s iostreams).

 Maybe there's some major flaw in that basic idea. I don't know. But Andrei's
 suggestion sounds like a royal pain for basic I/O. If that's all I had to deal
 with when trying to lazily read in a file and process it, I'd just use
readText()
 instead, since it would just be way easier to use.

Clearly reading the entire file in an in-memory structure simplifies 
things. But the proposed streaming interface is extremely convenient as 
it always was; the two added APIs help people who need extra flexibility 
without hurting efficiency.

If you want to read a file in Java: 
http://www.java-tips.org/java-se-tips/java.io/how-to-read-file-in-java.html

In C (with many caveats): http://www.phanderson.com/files/file_read.html

In D:

foreach (line; File("name").byLine()) {
    ...
}

I plan to add a simpler API:

foreach (line; File.byLine("name")) {
    ...
}

To read fixed-sized chunks, use byChunk. This covers the vast majority 
of file I/O needs.

There are two limitations of the current APIs:

1. You can't add a new line to the existing line (or a buffer to the 
existing buffer) if you sometimes want to process multiple lines as a 
logical unit (some programs and file formats need that, as well as 
composing streams).

2. You can't comfortably read data of user-specified size if that size 
varies. This is the case for e.g. binary formats where you need to read 
"doped chunks", i.e. chunks prefixed by their lengths.

My proposal addresses 1 and makes 2 possible.


Andrei

Feb 06 2011

bearophile <bearophileHUGS lycos.com> writes:

Andrei:

 I've had the opportunity today to put some solid hours of thinking into 
 the relationship (better said the relatedness) of what would be called 
 "buffered streams" and ranges.

This is an important part of the range design. This range is useful for other
things too, like:
- increasing efficiency of some lazy operations, as already done in Clojure. A
buffer is meant to be CPU cache friendly, increasing performance of numeric
code too.
- Buffered I/O
- The chunked lazy parallel map dsimcha is working on
- Creating a chunked interface in Phobos for DBMSs

See some of my posts about it:
http://www.digitalmars.com/d/archives/digitalmars/D/Vectorized_Laziness_100525.html
http://www.digitalmars.com/d/archives/digitalmars/D/Re_Vectorized_Laziness_100676.html
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D&artnum=103882
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=125876

Bye,
bearophile

Feb 05 2011

spir <denis.spir gmail.com> writes:

On 02/05/2011 11:09 AM, Jonathan M Davis wrote:
 Hmm. I think that I'd have to have an actual implementation to mess around with
 to say much. My general take on buffered input is that I don't want to worry
 about it. I want it to be buffered so that it's more efficient, but I don't
want to
 have to care about it in how I use it. I would have expected a buffered input
 range to be exactly the same as an input range except that it doesn't really
 just pull in one character behind the scenes. It pulls in 1024 or whatever when
 popFront() would result in the end of the buffer being reached, and you just
get
 the first one with front. The API doesn't reflect the fact that it's buffered
at
 all except perhaps in how you initialize it (by telling how big the buffer is,
 though generally I don't want to have to care about that either).
 [...]
 Regardless, a more normal range could be built on top of what you're
suggesting,
 and it could do essentially what I was thinking buffered ranges would do. So,
 perhaps doing what you're suggesting and building what I was thinking of on top
 of that would be the way to go. That way, if you actually care about messing
 with the buffer, you can, but if not, you just use it normally and the
buffering
 is dealt with underneath.

Exactly. I would love something like:
     auto bufInputRange (R) (R inputRange, size_t capacity=0) if (...)
Meaning one can specify (max) buffering capacity; else there is a standard 
(re)sizing scheme. Just like dyn array (re)sizing.

Side-question to specialists: What should actual buf capacity depend on?


Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 05 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/5/11 6:57 AM, spir wrote:
 On 02/05/2011 11:09 AM, Jonathan M Davis wrote:
 Hmm. I think that I'd have to have an actual implementation to mess
 around with
 to say much. My general take on buffered input is that I don't want to
 worry
 about it. I want it to be buffered so that it's more efficient, but I
 don't want to
 have to care about it in how I use it. I would have expected a
 buffered input
 range to be exactly the same as an input range except that it doesn't
 really
 just pull in one character behind the scenes. It pulls in 1024 or
 whatever when
 popFront() would result in the end of the buffer being reached, and
 you just get
 the first one with front. The API doesn't reflect the fact that it's
 buffered at
 all except perhaps in how you initialize it (by telling how big the
 buffer is,
 though generally I don't want to have to care about that either).
 [...]
 Regardless, a more normal range could be built on top of what you're
 suggesting,
 and it could do essentially what I was thinking buffered ranges would
 do. So,
 perhaps doing what you're suggesting and building what I was thinking
 of on top
 of that would be the way to go. That way, if you actually care about
 messing
 with the buffer, you can, but if not, you just use it normally and the
 buffering
 is dealt with underneath.

 Exactly. I would love something like:
 auto bufInputRange (R) (R inputRange, size_t capacity=0) if (...)
 Meaning one can specify (max) buffering capacity; else there is a
 standard (re)sizing scheme. Just like dyn array (re)sizing.

 Side-question to specialists: What should actual buf capacity depend on?


 Denis

My design would allow you to build a buffered input range from an input 
range, but only with infinite capacity.

Andrei

Feb 05 2011

spir <denis.spir gmail.com> writes:

On 02/05/2011 04:27 PM, Andrei Alexandrescu wrote:
 On 2/5/11 6:57 AM, spir wrote:

 Exactly. I would love something like:
 auto bufInputRange (R) (R inputRange, size_t capacity=0) if (...)
 Meaning one can specify (max) buffering capacity; else there is a
 standard (re)sizing scheme. Just like dyn array (re)sizing.

 Side-question to specialists: What should actual buf capacity depend on?


 Denis

 My design would allow you to build a buffered input range from an input range,
 but only with infinite capacity.

 Andrei

All right, point taken.

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 05 2011

Tomek =?ISO-8859-2?Q?Sowi=F1ski?= <just ask.me> writes:

Andrei Alexandrescu napisa=B3:

 I hereby suggest we define "buffered input range of T" any range R that=20
 satisfies the following conditions:
=20
 1. R is an input range of T[]
=20
 2. R defines a primitive shiftFront(size_t n). The semantics of the=20
 primitive is that, if r.front.length >=3D n, then shiftFront(n) discards=

=20
 the first n elements in r.front. Subsequently r.front will return a=20
 slice of the remaining elements.
=20
 3. R defines a primitive appendToFront(size_t n). Semantics: adds at=20
 most n more elements from the underlying stream and makes them available=

=20
 in addition to whatever was in front. For example if r.front.length was=20
 1024, after the call r.appendToFront(512) will have r.front have length=20
 1536 of which the first 1024 will be the old front and the rest will be=20
 newly-read elements (assuming that the stream had enough data). If n =3D=

=20
 0, this instructs the stream to add any number of elements at its own=20
 discretion.

I don't see a clear need for the two to be separate. Could they fold into p=
opFront(n, m) meaning shiftFront(n); appendToFront(m) ? Nullary popFront() =
discards all and loads any number it pleases.

 This is it. I like many things about this design, although I still fear=20
 some fatal flaw may be found with it.

 With these primitives a lot of good operating operating on buffered=20
 streams can be written efficiently. The range is allowed to reuse data=20
 in its buffers (unless that would contradict language invariants, e.g.=20
 if T is invariant), so if client code wants to stash away parts of the=20
 input, it needs to make a copy.

Some users would benefit if they could just pass in a buffer and say "fill'=
er up".

 One great thing is that buffered ranges as defined above play very well=20
 with both ranges and built-in arrays - two quintessential parts of D. I=20
 look at this and say, "this all makes sense". For example the design=20
 could be generalized to operate on some random-access range other than=20
 the built-in array, but then I'm thinking, unless some advantage comes=20
 about, why not giving T[] a little special status? Probably everyone=20
 thinks of contiguous memory when thinking "buffers", so here=20
 generalization may be excessive (albeit meaningful).

Contiguous, yes. But I'd rather see front() exposing, say, a circular buffe=
r so that appendToFront(n) reallocates only when n > buf.length.

--=20
Tomek

Feb 05 2011

Tomek =?ISO-8859-2?Q?Sowi=F1ski?= <just ask.me> writes:

Tomek Sowi=F1ski napisa=B3:

 Contiguous, yes. But I'd rather see front() exposing, say, a circular buf=

fer so that appendToFront(n)=20
 reallocates only when n > buf.length.

I meant: when n + front.length > buf.length.

--=20
Tomek

Feb 05 2011

Jean Crystof <a a.a> writes:

Tomek Sowi�ski Wrote:

 Andrei Alexandrescu napisa�:
 
 I hereby suggest we define "buffered input range of T" any range R that 
 satisfies the following conditions:
 
 1. R is an input range of T[]
 
 2. R defines a primitive shiftFront(size_t n). The semantics of the 
 primitive is that, if r.front.length >= n, then shiftFront(n) discards 
 the first n elements in r.front. Subsequently r.front will return a 
 slice of the remaining elements.
 
 3. R defines a primitive appendToFront(size_t n). Semantics: adds at 
 most n more elements from the underlying stream and makes them available 
 in addition to whatever was in front. For example if r.front.length was 
 1024, after the call r.appendToFront(512) will have r.front have length 
 1536 of which the first 1024 will be the old front and the rest will be 
 newly-read elements (assuming that the stream had enough data). If n = 
 0, this instructs the stream to add any number of elements at its own 
 discretion.

 
 I don't see a clear need for the two to be separate. Could they fold into
popFront(n, m) meaning shiftFront(n); appendToFront(m) ? Nullary popFront()
discards all and loads any number it pleases.
 
 This is it. I like many things about this design, although I still fear 
 some fatal flaw may be found with it.

 With these primitives a lot of good operating operating on buffered 
 streams can be written efficiently. The range is allowed to reuse data 
 in its buffers (unless that would contradict language invariants, e.g. 
 if T is invariant), so if client code wants to stash away parts of the 
 input, it needs to make a copy.

 
 Some users would benefit if they could just pass in a buffer and say "fill'er
up".
 
 One great thing is that buffered ranges as defined above play very well 
 with both ranges and built-in arrays - two quintessential parts of D. I 
 look at this and say, "this all makes sense". For example the design 
 could be generalized to operate on some random-access range other than 
 the built-in array, but then I'm thinking, unless some advantage comes 
 about, why not giving T[] a little special status? Probably everyone 
 thinks of contiguous memory when thinking "buffers", so here 
 generalization may be excessive (albeit meaningful).

 
 Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer
so that appendToFront(n) reallocates only when n > buf.length.

I find this discussion interesting. There's one idea for an application I'd
like to try at some point. Basically a facebook chat thingie, but with richer
gaming features. The expected audience will be 10 - 100K simultaneous clients
connecting to a single server. Not sure if DOM or SAX will be better. After
seeing the Tango's XML benchmarks I was convinced that the implementation
platform will be D1/Tango, but now it looks like Phobos is also getting there,
propably even outperforming Tango by a clear margin.

Since even looking at Tango's documentation has intellectual property problems
and likely causes taint, I could make an independent benchmark comparing the
two and their interfaces later. But I propaply need to avoid going into too
much details, otherwise the Phobos developers wouldn't be able to read it
without changing their license. From what I've read so far, the proposed design
looks very much like what Tango has now in their I/O framework. But probably
Phobos's TLS default and immutable strings improve multithreaded performance
even more.

Feb 05 2011

Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:

Jean Crystof napisa=C5=82:

 I find this discussion interesting. There's one idea for an application I=

'd like to try at some point. Basically a facebook chat thingie, but with r=
icher gaming features. The expected audience will be 10 - 100K simultaneous=
 clients connecting to a single server. Not sure if DOM or SAX will be bett=
er. After seeing the Tango's XML benchmarks I was convinced that the implem=
entation platform will be D1/Tango, but now it looks like Phobos is also ge=
tting there, propably even outperforming Tango by a clear margin.

Thanks for having faith ;-)

 Since even looking at Tango's documentation has intellectual property pro=

blems and likely causes taint, I could make an independent benchmark compar=
ing the two and their interfaces later. But I propaply need to avoid going =
into too much details, otherwise the Phobos developers wouldn't be able to =
read it without changing their license.=20

That would be helpful.

 From what I've read so far, the proposed design looks very much like what=

 Tango has now in their I/O framework. But probably Phobos's TLS default an=
d immutable strings improve multithreaded performance even more.

Well, immutability doesn't help much because a buffer must be written to.

Speaking of multithreading, I was thinking of an implementation where an in=
ternal thread is doing I/O. It loads data in front of the current front() s=
lice, as much as the internal buffer can hold. The motivation is to overlap=
 content processing and I/O operations so that less time is spent in total.=
 Although there is some interaction overhead: locking, syncing caches so th=
at cores see the same buffer.

--=20
Tomek

Feb 05 2011

"Nick Sabalausky" <a a.a> writes:

"Jean Crystof" <a a.a> wrote in message 
news:iijl2t$10np$1 digitalmars.com...
 Tomek Sowi�ski Wrote:

 Andrei Alexandrescu napisa�:

 I hereby suggest we define "buffered input range of T" any range R that
 satisfies the following conditions:

 1. R is an input range of T[]

 2. R defines a primitive shiftFront(size_t n). The semantics of the
 primitive is that, if r.front.length >= n, then shiftFront(n) discards
 the first n elements in r.front. Subsequently r.front will return a
 slice of the remaining elements.

 3. R defines a primitive appendToFront(size_t n). Semantics: adds at
 most n more elements from the underlying stream and makes them 
 available
 in addition to whatever was in front. For example if r.front.length was
 1024, after the call r.appendToFront(512) will have r.front have length
 1536 of which the first 1024 will be the old front and the rest will be
 newly-read elements (assuming that the stream had enough data). If n =
 0, this instructs the stream to add any number of elements at its own
 discretion.

 I don't see a clear need for the two to be separate. Could they fold into 
 popFront(n, m) meaning shiftFront(n); appendToFront(m) ? Nullary 
 popFront() discards all and loads any number it pleases.

 This is it. I like many things about this design, although I still fear
 some fatal flaw may be found with it.

 With these primitives a lot of good operating operating on buffered
 streams can be written efficiently. The range is allowed to reuse data
 in its buffers (unless that would contradict language invariants, e.g.
 if T is invariant), so if client code wants to stash away parts of the
 input, it needs to make a copy.

 Some users would benefit if they could just pass in a buffer and say 
 "fill'er up".

 One great thing is that buffered ranges as defined above play very well
 with both ranges and built-in arrays - two quintessential parts of D. I
 look at this and say, "this all makes sense". For example the design
 could be generalized to operate on some random-access range other than
 the built-in array, but then I'm thinking, unless some advantage comes
 about, why not giving T[] a little special status? Probably everyone
 thinks of contiguous memory when thinking "buffers", so here
 generalization may be excessive (albeit meaningful).

 Contiguous, yes. But I'd rather see front() exposing, say, a circular 
 buffer so that appendToFront(n) reallocates only when n > buf.length.

 I find this discussion interesting. There's one idea for an application 
 I'd like to try at some point. Basically a facebook chat thingie, but with 
 richer gaming features. The expected audience will be 10 - 100K 
 simultaneous clients connecting to a single server. Not sure if DOM or SAX 
 will be better. After seeing the Tango's XML benchmarks I was convinced 
 that the implementation platform will be D1/Tango, but now it looks like 
 Phobos is also getting there, propably even outperforming Tango by a clear 
 margin.

I don'r mean to derail the topic, but if I were aiming for that many 
simultaneous users I wouldn't even consider using XML at all. Despite MS's, 
Java's and AJAX's infatuation with it, XML is really only appropriate in two 
situations: 1. When memory/bandwidth/speed/etc don't matter and 2. When you 
don't have a choice in the matter.

Feb 05 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/5/11 8:41 AM, Tomek Sowi�ski wrote:
 Andrei Alexandrescu napisa�:

 I hereby suggest we define "buffered input range of T" any range R
 that satisfies the following conditions:

 1. R is an input range of T[]

 2. R defines a primitive shiftFront(size_t n). The semantics of
 the primitive is that, if r.front.length>= n, then shiftFront(n)
 discards the first n elements in r.front. Subsequently r.front will
 return a slice of the remaining elements.

 3. R defines a primitive appendToFront(size_t n). Semantics: adds
 at most n more elements from the underlying stream and makes them
 available in addition to whatever was in front. For example if
 r.front.length was 1024, after the call r.appendToFront(512) will
 have r.front have length 1536 of which the first 1024 will be the
 old front and the rest will be newly-read elements (assuming that
 the stream had enough data). If n = 0, this instructs the stream to
 add any number of elements at its own discretion.

 I don't see a clear need for the two to be separate. Could they fold
 into popFront(n, m) meaning shiftFront(n); appendToFront(m) ? Nullary
 popFront() discards all and loads any number it pleases.

I think combining the two into one hurts usability as often you want to 
do one without the other.

 This is it. I like many things about this design, although I still
 fear some fatal flaw may be found with it.

 With these primitives a lot of good operating operating on
 buffered streams can be written efficiently. The range is allowed
 to reuse data in its buffers (unless that would contradict language
 invariants, e.g. if T is invariant), so if client code wants to
 stash away parts of the input, it needs to make a copy.

 Some users would benefit if they could just pass in a buffer and say
 "fill'er up".

Correct. That observation applies to unbuffered input as well.

 One great thing is that buffered ranges as defined above play very
 well with both ranges and built-in arrays - two quintessential
 parts of D. I look at this and say, "this all makes sense". For
 example the design could be generalized to operate on some
 random-access range other than the built-in array, but then I'm
 thinking, unless some advantage comes about, why not giving T[] a
 little special status? Probably everyone thinks of contiguous
 memory when thinking "buffers", so here generalization may be
 excessive (albeit meaningful).

 Contiguous, yes. But I'd rather see front() exposing, say, a circular
 buffer so that appendToFront(n) reallocates only when n>
 buf.length.

I think circularity is an implementation detail that is poor as a 
client-side abstraction.


Andrei

Feb 05 2011

Tomek =?UTF-8?B?U293acWEc2tp?= <just ask.me> writes:

Andrei Alexandrescu napisa=C5=82:

 I don't see a clear need for the two to be separate. Could they fold
 into popFront(n, m) meaning shiftFront(n); appendToFront(m) ? Nullary
 popFront() discards all and loads any number it pleases. =20

=20
 I think combining the two into one hurts usability as often you want to=20
 do one without the other.

OK, but if you go this way, what would popFront() do?

 Some users would benefit if they could just pass in a buffer and say
 "fill'er up". =20

=20
 Correct. That observation applies to unbuffered input as well.

Right.

 Contiguous, yes. But I'd rather see front() exposing, say, a circular
 buffer so that appendToFront(n) reallocates only when n>
 buf.length.
 =20

=20
 I think circularity is an implementation detail that is poor as a=20
 client-side abstraction.

I fear efficiency will get abstracted out. Say this is my internal buffer (=
pipes indicate front() slice):

[ooo|oooooo|oo]

Now I do appendToFront(3) -- how do you expose the expected front() without=
 moving data?

--=20
Tomek

Feb 05 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/5/11 1:18 PM, Tomek Sowiński wrote:
 Andrei Alexandrescu napisał:

 I don't see a clear need for the two to be separate. Could they fold
 into popFront(n, m) meaning shiftFront(n); appendToFront(m) ? Nullary
 popFront() discards all and loads any number it pleases.

 I think combining the two into one hurts usability as often you want to
 do one without the other.

 OK, but if you go this way, what would popFront() do?

Discard everything in the current buffer and fill a new buffer. The new 
size depends on the stream; for byLine a new line would be read, for 
byChunk(4096) 4096 more bytes would be read.

 Some users would benefit if they could just pass in a buffer and say
 "fill'er up".

 Correct. That observation applies to unbuffered input as well.

 Right.

 Contiguous, yes. But I'd rather see front() exposing, say, a circular
 buffer so that appendToFront(n) reallocates only when n>
 buf.length.

 I think circularity is an implementation detail that is poor as a
 client-side abstraction.

 I fear efficiency will get abstracted out. Say this is my internal buffer
(pipes indicate front() slice):

 [ooo|oooooo|oo]

 Now I do appendToFront(3) -- how do you expose the expected front() without
moving data?

You do end up moving data, but proportionally little if the buffer is 
large enough.


Andrei

Feb 05 2011

Tomek =?ISO-8859-2?B?U293afFza2k=?= <just ask.me> writes:

Andrei Alexandrescu napisa=B3:

 I fear efficiency will get abstracted out. Say this is my internal buff=


er (pipes indicate front() slice):
 [ooo|oooooo|oo]

 Now I do appendToFront(3) -- how do you expose the expected front() wit=


hout moving data? =20
=20
 You do end up moving data, but proportionally little if the buffer is=20
 large enough.

It still matters for frequent big munches. I'd like a minimum memory option=
 if that's neccessary.

--=20
Tomek

Feb 05 2011

Jesse Phillips <jessekphillips+D gmail.com> writes:

Dang, you beat me to my post on what I have run into trying to provide a
slice-able, assignable, buffered Forward Range.

I was doing some work on a CSV parser. It is rather simple to build a proper
parser from an input range. But providing the ability to use custom separators
which could be of any length did not work well with a forward range. It was no
longer a look-ahead of one. So I started examining how Splitter[1] work with
slice-able ranges. Ok enough of the background.

So basically I tried to make a range that would provide everything I needed for
the new CSV design[2] and the result[3] didn't work. It actually works better
with my CSV parser than it does splitter.

The main issue I was having is, if you save the range and move forward, how do
you keep the buffer of all instances in sync. Can we turn an input range into a
forward range? If not, how would you get splitter working on an input range? (I
probably need to file a bug, but my InputFileRange[3, bottom] didn't work with
splitter either)

The next issue is with slicing. If we can't get an input range to become a
forward range then we can't have slicing either. A slice of [0..$] should give
me a copy of the range. But even if you could do this, how would you know that
the slice should be made of the entire range, or of just what is available in
the buffer?

So I guess the question is, with the proposal. Can a hasSlicing!R be created
from an InputRange!R such that 

    auto range = "Hello, World";
    auto len = countUntil(range, ",");
    assert(range[0..len] == "Hello");

where range is replaced by a buffered Input Range. And as an added bonus:

    range = range[len..$];
    assert(range == ",World");

You can of course use the Range for equality, instead of strings like "Hello".

1. https://github.com/D-Programming-Language/phobos/blob/master/std/algorithm.d#L1317

2. https://github.com/he-the-great/JPDLibs/blob/csvoptimize/csv/csv.d

3. https://gist.github.com/812681

Feb 05 2011

Heywood Floyd <soul8o8 gmail.com> writes:

Nice!
And evenin'!



Layman's view:
- - - - - - - - - - -
(I'm serious, please don't take my post too seriously. I'm not a heavy =
user of D and I don't want to pollute. I know in NGs exposure means =
influence and I babble a lot. Still, my thoughts, or rather reactions, =
could be of interest, I assume, or I wouldn't be writing this : )


I'm not sure how these buffered input ranges are supposed to be used =
(some mockup sample code would be very cool!), but it seems to me, and =
please correct me if I'm wrong, that it's very desirable for these =
ranges to be interchangeable?

As in, you've built some library that passes around ranges, but then =
some part of it is slow and needs buffered ones. Isn't the point that =
you can then swap out your ranges for buffered ones here and there, but =
continue to have a functioning library?

Reusability, generics, bend the spoon neo and all that?

If not, then ok.

But if yes, then I think these buffered ranges look very troublesome! =
Naughty even!



* * *



Then there's the sneaky break of consistency of the D semantics.

Even if these ranges are not intended to be interchangeable, still, =
changing the (human langauge) semantics that the input ranges already =
define is not good! This makes D a difficult language to get an =
intuitive feel for, I think.

By the definition of input ranges, the word "front" symbolizes the first =
_element_ in a forwards facing queue of elements.

| 1:st |     	<-- front()
| 2:nd |   v-- hidden --v
| 3:rd |
| .....  |
| n:th | <-- "back"

..as front() returns a T.

Then follows that popFront() means "discard the first _element_, so that =
the element that was second now becomes first." And if we can agree to =
that, then shiftFront() is possibly very confusing, and so is =
appendToFront(). They sound like they operate on the first element!=20

So it seems these buffered ranges have redefined the semantics for the =
word "front", as meaning "the view window into the front part of the =
queue". Sneaky!

I mean, imagine being new with D and skimming through the API docs for =
ranges, and picking up these function names at a glance. You'd be =
setting yourself up for one of those =
"aaahaa-now-I-get-why-I-didn't-get-it"-moments for sure. Hmmm.

Still, front() could very well refer to the "front part"=97to a list of =
elements (or the "view window"), and first() could refer to the first =
element. Actually, that would make the most sense! Then an input range =
would be first()/popFirst()/empty(), and a buffered one would have all =
those, but also amend something like front(n)/widenFront(n)/popFront(n), =
but yeah, erhm.

I call for stricter and more consistent semantics!
Decide what "front" means when talking about ranges, and stick to it!
(And I'm talking about human language semantics, not what a function (or =
"primitive"?) does.)

Erh, I tried to sound resolute there. Not my thing really.



* * *



Besides that, "shiftFront" got me thinking about sliding windows, and =
that would actually be cool! As in

| 1st |   '\	     <-- first()
| 2nd |   |-- front() // "view window"
| 3rd |  ./
| 4th | v-- hidden --v
| 5th |
| ...... |
| n:th |

and then calling shiftFront(2) would shift the view window 2 elements =
forward (thus fetching 2 and discarding 2). Seems like a useful feature =
when parsing some encoding with variable point width and known distance =
to the "event horizon", no? As in

code.viewDistance =3D 8;
do{
	auto p =3D code.front()
	if(isLongPoint(p)){
		processLong(p)
		code.shiftFront(8);
	}else if(isPoint(p)){
		process(p)
		code.shiftFront(4);
	}else break;
}while(p);

or something like that.

But the semantic that "shiftFront" would mean the same as popFront(), =
but on a list of elements? Confusing! Surely, at least popFront(n)...

Hm, yeah
Ok I'm all out of coffee!!!
Thanks for your time!
BR
/HF

Feb 05 2011

"Nick Sabalausky" <a a.a> writes:

"Heywood Floyd" <soul8o8 gmail.com> wrote in message 
news:mailman.1318.1296941395.4748.digitalmars-d puremagic.com...
As in, you've built some library that passes around ranges, but then some 
part of it is slow and needs buffered ones. Isn't the point that you can 
then swap out your ranges for buffered ones here and there, but continue to 
have a functioning library?

The problem with that is that in many many cases it forces unnessisary 
copying. We can get much better performance with this slightly more 
"hands-on" version. But that said, if the traditional "hands-free automatic 
buffering" really is all you need, then such a thing [should] be easily to 
construct out of the Andrei's style of buffered range.


Then follows that popFront() means "discard the first _element_, so that 
the element that was second now becomes first." And if we can agree to 
that, then shiftFront() is possibly very confusing, and so is 
appendToFront(). They sound like they operate on the first element!

I completely agree. The names of those functions confused the hell out of me 
until I read Andrei's descriptions of them. Now I understand what they 
do...but I still don't understand their names at all.

Feb 05 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Sat, 05 Feb 2011 17:22:01 -0500, Nick Sabalausky <a a.a> wrote:

 "Heywood Floyd" <soul8o8 gmail.com> wrote in message
 news:mailman.1318.1296941395.4748.digitalmars-d puremagic.com...
 As in, you've built some library that passes around ranges, but then  
 some
 part of it is slow and needs buffered ones. Isn't the point that you can
 then swap out your ranges for buffered ones here and there, but  
 continue to
 have a functioning library?

 The problem with that is that in many many cases it forces unnessisary
 copying. We can get much better performance with this slightly more
 "hands-on" version. But that said, if the traditional "hands-free  
 automatic
 buffering" really is all you need, then such a thing [should] be easily  
 to
 construct out of the Andrei's style of buffered range.


 Then follows that popFront() means "discard the first _element_, so that
 the element that was second now becomes first." And if we can agree to
 that, then shiftFront() is possibly very confusing, and so is
 appendToFront(). They sound like they operate on the first element!

 I completely agree. The names of those functions confused the hell out  
 of me
 until I read Andrei's descriptions of them. Now I understand what they
 do...but I still don't understand their names at all.

See point of Andrei's post:
1. R is an input range of T[]
Which means that front returns an array, not a single element. So they  
sound like they operate on the first element, because that's exactly what  
they do. Conceptually, you need to think of buffered inputs as range of  
ranges, not a range of elements.

Feb 05 2011

spir <denis.spir gmail.com> writes:

On 02/05/2011 11:22 PM, Nick Sabalausky wrote:
 "Heywood Floyd"<soul8o8 gmail.com>  wrote in message
 news:mailman.1318.1296941395.4748.digitalmars-d puremagic.com...
 As in, you've built some library that passes around ranges, but then some
 part of it is slow and needs buffered ones. Isn't the point that you can
 then swap out your ranges for buffered ones here and there, but continue to
 have a functioning library?

 The problem with that is that in many many cases it forces unnessisary
 copying. We can get much better performance with this slightly more
 "hands-on" version. But that said, if the traditional "hands-free automatic
 buffering" really is all you need, then such a thing [should] be easily to
 construct out of the Andrei's style of buffered range.


 Then follows that popFront() means "discard the first _element_, so that
 the element that was second now becomes first." And if we can agree to
 that, then shiftFront() is possibly very confusing, and so is
 appendToFront(). They sound like they operate on the first element!

 I completely agree. The names of those functions confused the hell out of me
 until I read Andrei's descriptions of them. Now I understand what they
 do...but I still don't understand their names at all.

Same here; thought: "maybe he meant shiftBuf() & appendToBuf(), or such?". 
(Then, as nobody reacted about that point, thought: "You're the stupid one; 
shut your mouth!")

I also agree with Heywood about first() / popFirst(). Then, shiftFront() / 
appendToFront() would be less confusing --but still hard to guess (for me).
I wonder if his "view window" is the whole or part of the buffer. Well...
(Else, I actually share most of Heywood's views, I guess, at least at first
read.)

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 05 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/5/11 5:22 PM, Nick Sabalausky wrote:
 "Heywood Floyd"<soul8o8 gmail.com>  wrote in message
 news:mailman.1318.1296941395.4748.digitalmars-d puremagic.com...
 As in, you've built some library that passes around ranges, but then some
 part of it is slow and needs buffered ones. Isn't the point that you can
 then swap out your ranges for buffered ones here and there, but continue to
 have a functioning library?

 The problem with that is that in many many cases it forces unnessisary
 copying. We can get much better performance with this slightly more
 "hands-on" version. But that said, if the traditional "hands-free automatic
 buffering" really is all you need, then such a thing [should] be easily to
 construct out of the Andrei's style of buffered range.


 Then follows that popFront() means "discard the first _element_, so that
 the element that was second now becomes first." And if we can agree to
 that, then shiftFront() is possibly very confusing, and so is
 appendToFront(). They sound like they operate on the first element!

 I completely agree. The names of those functions confused the hell out of me
 until I read Andrei's descriptions of them. Now I understand what they
 do...but I still don't understand their names at all.

Better names are always welcome!

Andrei

Feb 05 2011

Adam D. Ruppe <destructionator gmail.com> writes:

This sounds similar to how my network code works. I called
the functions fetchMore() to append to the buffer and eat(int n)
to advance the front position.

Feb 05 2011

"Nick Sabalausky" <a a.a> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:iil3lv$1bb1$1 digitalmars.com...
 On 2/5/11 5:22 PM, Nick Sabalausky wrote:
 "Heywood Floyd"<soul8o8 gmail.com>  wrote in message
 news:mailman.1318.1296941395.4748.digitalmars-d puremagic.com...
 As in, you've built some library that passes around ranges, but then 
 some
 part of it is slow and needs buffered ones. Isn't the point that you can
 then swap out your ranges for buffered ones here and there, but continue 
 to
 have a functioning library?

 The problem with that is that in many many cases it forces unnessisary
 copying. We can get much better performance with this slightly more
 "hands-on" version. But that said, if the traditional "hands-free 
 automatic
 buffering" really is all you need, then such a thing [should] be easily 
 to
 construct out of the Andrei's style of buffered range.


 Then follows that popFront() means "discard the first _element_, so that
 the element that was second now becomes first." And if we can agree to
 that, then shiftFront() is possibly very confusing, and so is
 appendToFront(). They sound like they operate on the first element!

 I completely agree. The names of those functions confused the hell out of 
 me
 until I read Andrei's descriptions of them. Now I understand what they
 do...but I still don't understand their names at all.

 Better names are always welcome!

Borrowing slightly from Adam:

discard and fetch?

Feb 05 2011

Tomek =?ISO-8859-2?B?U293afFza2k=?= <just ask.me> writes:

Nick Sabalausky napisa=B3:

 discard and fetch?

I like that.

Feb 06 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/11 6:01 EST, Tomek Sowi�ski wrote:
 Nick Sabalausky napisa�:

 discard and fetch?

 I like that.

What's missing is the part that they refer to front. Maybe 
discardFromFront() and fetchToFront()? But then I like 
discardFromFront() and appendToFront() better - the latter is about as 
long and more informative.

Don't forget that these are relatively rarely used.


Andrei

Feb 06 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Sun, 06 Feb 2011 10:43:47 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 6:01 EST, Tomek Sowiński wrote:
 Nick Sabalausky napisał:

 discard and fetch?

 I like that.

 What's missing is the part that they refer to front. Maybe  
 discardFromFront() and fetchToFront()? But then I like  
 discardFromFront() and appendToFront() better - the latter is about as  
 long and more informative.

 Don't forget that these are relatively rarely used.


 Andrei

Actually, I don't think these functions would be relatively rarely used. I  
don't see that many people using a buffered input's popFront. Instead I  
see shiftFront in its place and an appendToFront call has to be made  
whenever buffer.front.empty.

Feb 06 2011

Torarin <torarind gmail.com> writes:

2011/2/7 Robert Jacques <sandford jhu.edu>:
 On Sun, 06 Feb 2011 10:43:47 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 6:01 EST, Tomek Sowi=C5=84ski wrote:
 Nick Sabalausky napisa=C5=82:

 discard and fetch?

 I like that.

 What's missing is the part that they refer to front. Maybe
 discardFromFront() and fetchToFront()? But then I like discardFromFront(=


)
 and appendToFront() better - the latter is about as long and more
 informative.

 Don't forget that these are relatively rarely used.


 Andrei

 Actually, I don't think these functions would be relatively rarely used. =

I
 don't see that many people using a buffered input's popFront. Instead I s=

ee
 shiftFront in its place and an appendToFront call has to be made whenever
 buffer.front.empty.

Why not popFront if empty? Maybe you could use appendToFront if you
knew that you needed to expand the stream's buffer, but that doesn't
sound like a common case.

Torarin

Feb 06 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/11 9:47 PM, Torarin wrote:
 2011/2/7 Robert Jacques<sandford jhu.edu>:
 On Sun, 06 Feb 2011 10:43:47 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>  wrote:

 On 2/6/11 6:01 EST, Tomek Sowiński wrote:
 Nick Sabalausky napisał:

 discard and fetch?

 I like that.

 What's missing is the part that they refer to front. Maybe
 discardFromFront() and fetchToFront()? But then I like discardFromFront()
 and appendToFront() better - the latter is about as long and more
 informative.

 Don't forget that these are relatively rarely used.


 Andrei

 Actually, I don't think these functions would be relatively rarely used. I
 don't see that many people using a buffered input's popFront. Instead I see
 shiftFront in its place and an appendToFront call has to be made whenever
 buffer.front.empty.

 Why not popFront if empty? Maybe you could use appendToFront if you
 knew that you needed to expand the stream's buffer, but that doesn't
 sound like a common case.

 Torarin

Exactly. Consider line continuations. Most of the time you read one line 
at a time and everything goes swell. On occasion you'll have a line 
continuation convention (line ends with a backslash, unmatched quote 
etc.) and you need to expand the current buffer to gobble the next line.


Andrei

Feb 06 2011

Torarin <torarind gmail.com> writes:

2011/2/7 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 On 2/6/11 9:47 PM, Torarin wrote:
 2011/2/7 Robert Jacques<sandford jhu.edu>:
 On Sun, 06 Feb 2011 10:43:47 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> =A0wrote:

 On 2/6/11 6:01 EST, Tomek Sowi=F1ski wrote:
 Nick Sabalausky napisa=B3:

 discard and fetch?

 I like that.

 What's missing is the part that they refer to front. Maybe
 discardFromFront() and fetchToFront()? But then I like
 discardFromFront()
 and appendToFront() better - the latter is about as long and more
 informative.

 Don't forget that these are relatively rarely used.


 Andrei

 Actually, I don't think these functions would be relatively rarely used=



.
 I
 don't see that many people using a buffered input's popFront. Instead I
 see
 shiftFront in its place and an appendToFront call has to be made whenev=



er
 buffer.front.empty.

 Why not popFront if empty? Maybe you could use appendToFront if you
 knew that you needed to expand the stream's buffer, but that doesn't
 sound like a common case.

 Torarin

 Exactly. Consider line continuations. Most of the time you read one line =

at
 a time and everything goes swell. On occasion you'll have a line
 continuation convention (line ends with a backslash, unmatched quote etc.=

)
 and you need to expand the current buffer to gobble the next line.


 Andrei

Yes, it's really convenient. You don't have to start messing with your
own buffer, and you avoid the temptation of doing one-element reads.
What seems more unlikely is using it on an empty front, though.

Torarin

Feb 06 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Sun, 06 Feb 2011 21:47:48 -0500, Torarin <torarind gmail.com> wrote:

 2011/2/7 Robert Jacques <sandford jhu.edu>:
 On Sun, 06 Feb 2011 10:43:47 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 6:01 EST, Tomek Sowiński wrote:
 Nick Sabalausky napisał:

 discard and fetch?

 I like that.

 What's missing is the part that they refer to front. Maybe
 discardFromFront() and fetchToFront()? But then I like  
 discardFromFront()
 and appendToFront() better - the latter is about as long and more
 informative.

 Don't forget that these are relatively rarely used.


 Andrei

 Actually, I don't think these functions would be relatively rarely  
 used. I
 don't see that many people using a buffered input's popFront. Instead I  
 see
 shiftFront in its place and an appendToFront call has to be made  
 whenever
 buffer.front.empty.

 Why not popFront if empty?

Because even reading UTF-8 requires more than 1-byte of information.  
Basically, any routine that processes a raw stream is going to have to  
handle the case where what they're processing straddles the buffer  
boundary. Now, if the routine wraps the raw stream in some higher-order  
range, such as byLine, which guarantees them that none of their inputs  
straddle, great. But it would be negligent to neglect those coders writing  
the higher-level ranges.

Feb 06 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/11 10:59 PM, Robert Jacques wrote:
 On Sun, 06 Feb 2011 21:47:48 -0500, Torarin <torarind gmail.com> wrote:

 2011/2/7 Robert Jacques <sandford jhu.edu>:
 On Sun, 06 Feb 2011 10:43:47 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 6:01 EST, Tomek Sowiński wrote:
 Nick Sabalausky napisał:

 discard and fetch?

 I like that.

 What's missing is the part that they refer to front. Maybe
 discardFromFront() and fetchToFront()? But then I like
 discardFromFront()
 and appendToFront() better - the latter is about as long and more
 informative.

 Don't forget that these are relatively rarely used.


 Andrei

 Actually, I don't think these functions would be relatively rarely
 used. I
 don't see that many people using a buffered input's popFront. Instead
 I see
 shiftFront in its place and an appendToFront call has to be made
 whenever
 buffer.front.empty.

 Why not popFront if empty?

 Because even reading UTF-8 requires more than 1-byte of information.
 Basically, any routine that processes a raw stream is going to have to
 handle the case where what they're processing straddles the buffer
 boundary. Now, if the routine wraps the raw stream in some higher-order
 range, such as byLine, which guarantees them that none of their inputs
 straddle, great. But it would be negligent to neglect those coders
 writing the higher-level ranges.

They aren't being neglected. If you need to get more stuff in the 
current unit of work, use appendToFront().

Let's restart this. What do you want to do that you can't do with the 
proposed API?

Andrei

Feb 06 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Sun, 06 Feb 2011 23:01:12 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 10:59 PM, Robert Jacques wrote:
 On Sun, 06 Feb 2011 21:47:48 -0500, Torarin <torarind gmail.com> wrote:

 2011/2/7 Robert Jacques <sandford jhu.edu>:
 On Sun, 06 Feb 2011 10:43:47 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 6:01 EST, Tomek Sowiński wrote:
 Nick Sabalausky napisał:

 discard and fetch?

 I like that.

 What's missing is the part that they refer to front. Maybe
 discardFromFront() and fetchToFront()? But then I like
 discardFromFront()
 and appendToFront() better - the latter is about as long and more
 informative.

 Don't forget that these are relatively rarely used.


 Andrei

 Actually, I don't think these functions would be relatively rarely
 used. I
 don't see that many people using a buffered input's popFront. Instead
 I see
 shiftFront in its place and an appendToFront call has to be made
 whenever
 buffer.front.empty.

 Why not popFront if empty?

 Because even reading UTF-8 requires more than 1-byte of information.
 Basically, any routine that processes a raw stream is going to have to
 handle the case where what they're processing straddles the buffer
 boundary. Now, if the routine wraps the raw stream in some higher-order
 range, such as byLine, which guarantees them that none of their inputs
 straddle, great. But it would be negligent to neglect those coders
 writing the higher-level ranges.

 They aren't being neglected. If you need to get more stuff in the  
 current unit of work, use appendToFront().

 Let's restart this. What do you want to do that you can't do with the  
 proposed API?

 Andrei

Nothing. I like the API and it makes things like byChunk actually usable  
in my mind. My objection was to your 'relatively rarely used' comment and  
the implications of that with regard to API design. I have also just  
realized that made a typo when talking about appendToFront; I meant a call  
to appendToFront would be common when your working slice of .front has run  
out, but you're not ready to call shiftFront (because you're in the middle  
of parsing something, etc). The way I stated it, a call to popFront and a  
call to appendToFront would be (mostly) equivalent. Which may has confused  
people. Sorry.

Feb 06 2011

Torarin <torarind gmail.com> writes:

2011/2/7 Robert Jacques <sandford jhu.edu>:
 On Sun, 06 Feb 2011 21:47:48 -0500, Torarin <torarind gmail.com> wrote:

 2011/2/7 Robert Jacques <sandford jhu.edu>:
 On Sun, 06 Feb 2011 10:43:47 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 6:01 EST, Tomek Sowi=C5=84ski wrote:
 Nick Sabalausky napisa=C5=82:

 discard and fetch?

 I like that.

 What's missing is the part that they refer to front. Maybe
 discardFromFront() and fetchToFront()? But then I like
 discardFromFront()
 and appendToFront() better - the latter is about as long and more
 informative.

 Don't forget that these are relatively rarely used.


 Andrei

 Actually, I don't think these functions would be relatively rarely used=



.
 I
 don't see that many people using a buffered input's popFront. Instead I
 see
 shiftFront in its place and an appendToFront call has to be made whenev=



er
 buffer.front.empty.

 Why not popFront if empty?

 Because even reading UTF-8 requires more than 1-byte of information.
 Basically, any routine that processes a raw stream is going to have to
 handle the case where what they're processing straddles the buffer bounda=

ry.
 Now, if the routine wraps the raw stream in some higher-order range, such=

 as
 byLine, which guarantees them that none of their inputs straddle, great. =

But
 it would be negligent to neglect those coders writing the higher-level
 ranges.

But popFront also reads more than 1 byte. Unless you call
appendToFront with a larger value than the current buffer size, unless
I have misunderstood, they should end up doing the same thing.

Torarin

Feb 06 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/11 8:57 PM, Robert Jacques wrote:
 On Sun, 06 Feb 2011 10:43:47 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 6:01 EST, Tomek Sowiński wrote:
 Nick Sabalausky napisał:

 discard and fetch?

 I like that.

 What's missing is the part that they refer to front. Maybe
 discardFromFront() and fetchToFront()? But then I like
 discardFromFront() and appendToFront() better - the latter is about as
 long and more informative.

 Don't forget that these are relatively rarely used.


 Andrei

 Actually, I don't think these functions would be relatively rarely used.
 I don't see that many people using a buffered input's popFront. Instead
 I see shiftFront in its place and an appendToFront call has to be made
 whenever buffer.front.empty.

Both byLine and byChunk are buffered inputs, are often used, and will 
have seldom use for the added functions.

Andrei

Feb 06 2011

"Robert Jacques" <sandford jhu.edu> writes:

On Sun, 06 Feb 2011 21:53:01 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 8:57 PM, Robert Jacques wrote:
 On Sun, 06 Feb 2011 10:43:47 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 6:01 EST, Tomek Sowiński wrote:
 Nick Sabalausky napisał:

 discard and fetch?

 I like that.

 What's missing is the part that they refer to front. Maybe
 discardFromFront() and fetchToFront()? But then I like
 discardFromFront() and appendToFront() better - the latter is about as
 long and more informative.

 Don't forget that these are relatively rarely used.


 Andrei

 Actually, I don't think these functions would be relatively rarely used.
 I don't see that many people using a buffered input's popFront. Instead
 I see shiftFront in its place and an appendToFront call has to be made
 whenever buffer.front.empty.

 Both byLine and byChunk are buffered inputs, are often used, and will  
 have seldom use for the added functions.

 Andrei

Yes, but byLine is a higher order meta-range designed primarily for end  
users. I believe that internally, byLine would use  
shiftFront/appendToFront, as would virtually all library code. I do  
apologize for thinking only as a library writer, but you need to remember  
those of us writing the JSON, XML, separated value, etc parsers too, when  
designing the api.

Feb 06 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/6/11 10:37 PM, Robert Jacques wrote:
 On Sun, 06 Feb 2011 21:53:01 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 8:57 PM, Robert Jacques wrote:
 On Sun, 06 Feb 2011 10:43:47 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 2/6/11 6:01 EST, Tomek Sowiński wrote:
 Nick Sabalausky napisał:

 discard and fetch?

 I like that.

 What's missing is the part that they refer to front. Maybe
 discardFromFront() and fetchToFront()? But then I like
 discardFromFront() and appendToFront() better - the latter is about as
 long and more informative.

 Don't forget that these are relatively rarely used.


 Andrei

 Actually, I don't think these functions would be relatively rarely used.
 I don't see that many people using a buffered input's popFront. Instead
 I see shiftFront in its place and an appendToFront call has to be made
 whenever buffer.front.empty.

 Both byLine and byChunk are buffered inputs, are often used, and will
 have seldom use for the added functions.

 Andrei

 Yes, but byLine is a higher order meta-range designed primarily for end
 users. I believe that internally, byLine would use
 shiftFront/appendToFront, as would virtually all library code. I do
 apologize for thinking only as a library writer, but you need to
 remember those of us writing the JSON, XML, separated value, etc parsers
 too, when designing the api.

I understand. Ultimately such widely used parsers would be themselves 
encapsulated in library (Phobos?) modules. The general idea is that 
straight reading tasks only need very simple code, and more 
sophisticated reading tasks would need to use a couple more primitives. 
That's not that bad, is it?


Andrei

Feb 06 2011

spir <denis.spir gmail.com> writes:

On 02/06/2011 04:11 AM, Andrei Alexandrescu wrote:
 On 2/5/11 5:22 PM, Nick Sabalausky wrote:
 "Heywood Floyd"<soul8o8 gmail.com> wrote in message
 news:mailman.1318.1296941395.4748.digitalmars-d puremagic.com...
 As in, you've built some library that passes around ranges, but then some
 part of it is slow and needs buffered ones. Isn't the point that you can
 then swap out your ranges for buffered ones here and there, but continue to
 have a functioning library?

 The problem with that is that in many many cases it forces unnessisary
 copying. We can get much better performance with this slightly more
 "hands-on" version. But that said, if the traditional "hands-free automatic
 buffering" really is all you need, then such a thing [should] be easily to
 construct out of the Andrei's style of buffered range.


 Then follows that popFront() means "discard the first _element_, so that
 the element that was second now becomes first." And if we can agree to
 that, then shiftFront() is possibly very confusing, and so is
 appendToFront(). They sound like they operate on the first element!

 I completely agree. The names of those functions confused the hell out of me
 until I read Andrei's descriptions of them. Now I understand what they
 do...but I still don't understand their names at all.

 Better names are always welcome!

If append actually appends into buffer (what I understand as of now), then 
appendToBuf(er). For shiftFront, maybe eatSlice? popFrontSlice would be very 
good (esp as opposed to a hypothetical poBackSlice), to hint the operation 
happens at head (because "pop" alone means at back). But as others have said 
"front" in the context of ranges has now an established sense of "first" (just 
like "head" or "car" for lists, by the way), or maybe more exactly "current".

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 06 2011

Torarin <torarind gmail.com> writes:

2011/2/5 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 I hereby suggest we define "buffered input range of T" any range R that
 satisfies the following conditions:

 1. R is an input range of T[]

 2. R defines a primitive shiftFront(size_t n). The semantics of the
 primitive is that, if r.front.length >= n, then shiftFront(n) discards the
 first n elements in r.front. Subsequently r.front will return a slice of the
 remaining elements.

 3. R defines a primitive appendToFront(size_t n). Semantics: adds at most n
 more elements from the underlying stream and makes them available in
 addition to whatever was in front. For example if r.front.length was 1024,
 after the call r.appendToFront(512) will have r.front have length 1536 of
 which the first 1024 will be the old front and the rest will be newly-read
 elements (assuming that the stream had enough data). If n = 0, this
 instructs the stream to add any number of elements at its own discretion.

This is really cool. I realise now that appendToFront fills the gap in
the design providing only shiftFront/advance. I also thought their
names were well-chosen.

Torarin

Feb 05 2011

spir <denis.spir gmail.com> writes:

On 02/05/2011 06:42 PM, Heywood Floyd wrote:
 Nice!
 And evenin'!



 Layman's view:
 - - - - - - - - - - -
 (I'm serious, please don't take my post too seriously. I'm not a heavy user of
D and I don't want to pollute. I know in NGs exposure means influence and I
babble a lot. Still, my thoughts, or rather reactions, could be of interest, I
assume, or I wouldn't be writing this : )


 I'm not sure how these buffered input ranges are supposed to be used (some
mockup sample code would be very cool!), but it seems to me, and please correct
me if I'm wrong, that it's very desirable for these ranges to be
interchangeable?

 As in, you've built some library that passes around ranges, but then some part
of it is slow and needs buffered ones. Isn't the point that you can then swap
out your ranges for buffered ones here and there, but continue to have a
functioning library?

 Reusability, generics, bend the spoon neo and all that?

 If not, then ok.

 But if yes, then I think these buffered ranges look very troublesome! Naughty
even!



 * * *



 Then there's the sneaky break of consistency of the D semantics.

 Even if these ranges are not intended to be interchangeable, still, changing
the (human langauge) semantics that the input ranges already define is not
good! This makes D a difficult language to get an intuitive feel for, I think.

 By the definition of input ranges, the word "front" symbolizes the first
_element_ in a forwards facing queue of elements.

 | 1:st |     	<-- front()
 | 2:nd |   v-- hidden --v
 | 3:rd |
 | .....  |
 | n:th |<-- "back"

 ..as front() returns a T.

 Then follows that popFront() means "discard the first _element_, so that the
element that was second now becomes first." And if we can agree to that, then
shiftFront() is possibly very confusing, and so is appendToFront(). They sound
like they operate on the first element!

 So it seems these buffered ranges have redefined the semantics for the word
"front", as meaning "the view window into the front part of the queue". Sneaky!

 I mean, imagine being new with D and skimming through the API docs for ranges,
and picking up these function names at a glance. You'd be setting yourself up
for one of those "aaahaa-now-I-get-why-I-didn't-get-it"-moments for sure. Hmmm.

 Still, front() could very well refer to the "front part"—to a list of
elements (or the "view window"), and first() could refer to the first element.
Actually, that would make the most sense! Then an input range would be
first()/popFirst()/empty(), and a buffered one would have all those, but also
amend something like front(n)/widenFront(n)/popFront(n), but yeah, erhm.

+++ (everything above)

 I call for stricter and more consistent semantics!
 Decide what "front" means when talking about ranges, and stick to it!
 (And I'm talking about human language semantics, not what a function (or
"primitive"?) does.)

 Erh, I tried to sound resolute there. Not my thing really.

pleased to see there is at least one other programmer still considering that 
"semantic" applies to human thoughts, rather than machine process...

 * * *



 Besides that, "shiftFront" got me thinking about sliding windows, and that
would actually be cool! As in

 | 1st |   '\	<-- first()
 | 2nd |   |-- front() // "view window"
 | 3rd |  ./
 | 4th | v-- hidden --v
 | 5th |
 | ...... |
 | n:th |

There is an off-by-one error between 1st & first, I guess ;-)
What's your "view window"? Is it buffer, or the needed amount of lookahead, or 
what else?
How would you draw the buffer, on the first picture or the one above?

"Sliding window" is, for me, the mental picture my brain intuitively forms when 
thinking at buffered input. But the sliding move may not be smooth (element per 
element), instead could happen as is most practical or efficient; as long as it 
remains a point not exposed on the interface (or only on request by client 
code). Meaning there would be an independant index pointing to 
current/first/front element, in the buffer or the window, & automagically 
maintained when sliding happens (index -= offset).

 and then calling shiftFront(2) would shift the view window 2 elements forward
(thus fetching 2 and discarding 2). Seems like a useful feature when parsing
some encoding with variable point width and known distance to the "event
horizon", no? As in

 code.viewDistance = 8;
 do{
 	auto p = code.front()
 	if(isLongPoint(p)){
 		processLong(p)
 		code.shiftFront(8);
 	}else if(isPoint(p)){
 		process(p)
 		code.shiftFront(4);
 	}else break;
 }while(p);

 or something like that.

 But the semantic that "shiftFront" would mean the same as popFront(), but on a
list of elements? Confusing! Surely, at least popFront(n)...

 Hm, yeah
 Ok I'm all out of coffee!!!
 Thanks for your time!
 BR
 /HF

Veeery interesting message, thank you. I share your care for correct naming. 
And the rest, actually. Wish you would post regularly.

Denis
-- 
_________________
vita es estrany
spir.wikidot.com

Feb 05 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Sat, 05 Feb 2011 00:46:40 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 I've had the opportunity today to put some solid hours of thinking into  
 the relationship (better said the relatedness) of what would be called  
 "buffered streams" and ranges. They have some commonalities and some  
 differences, but it's been difficult to capture them. I think now I have  
 a clear view, caused by a few recent discussions. One was the CSV reader  
 discussed on the Phobos list; another was the discussion on defining the  
 "right" std.xml.

[snip]

 What do you all think?

I haven't read many of the responses, but I'll say again what I've always  
said.  The range concept does not fit streams very well.  I think a range  
can be built on a stream, but I think a buffered stream should be it's own  
type (similar to how you laid it out a few weeks ago).

IMO, a stream should be a simple class hierarchy that defines input/output  
and buffering.  Then ranges can be built on top of the stream to interface  
with other parts of phobos.

Now, I think as an *input parameter* for algorithms that wish to work with  
streams (or other ranges), a range of T[] is most appropriate.

-Steve

Feb 07 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 2/7/11 7:53 AM, Steven Schveighoffer wrote:
 On Sat, 05 Feb 2011 00:46:40 -0500, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 I've had the opportunity today to put some solid hours of thinking
 into the relationship (better said the relatedness) of what would be
 called "buffered streams" and ranges. They have some commonalities and
 some differences, but it's been difficult to capture them. I think now
 I have a clear view, caused by a few recent discussions. One was the
 CSV reader discussed on the Phobos list; another was the discussion on
 defining the "right" std.xml.

 [snip]

 What do you all think?

 I haven't read many of the responses, but I'll say again what I've
 always said. The range concept does not fit streams very well. I think a
 range can be built on a stream, but I think a buffered stream should be
 it's own type (similar to how you laid it out a few weeks ago).

 IMO, a stream should be a simple class hierarchy that defines
 input/output and buffering. Then ranges can be built on top of the
 stream to interface with other parts of phobos.

One good thing about the current proposal is that it integrates 
seamlessly with that one.


Andrei

Feb 07 2011

Kagamin <spam here.lot> writes:

Andrei Alexandrescu Wrote:

 With these primitives a lot of good operating operating on buffered 
 streams can be written efficiently. The range is allowed to reuse data 
 in its buffers (unless that would contradict language invariants, e.g. 
 if T is invariant), so if client code wants to stash away parts of the 
 input, it needs to make a copy.

This surely satisfies most needs for buffered input.

I'm thinking about adaptive caller which can do tricks based on the stream
content. Say, strings are usually serialized as |length|data| so the caller can
preallocate buffer of exact length and fill it directly.

Will unbuffered file stream bypass system cache?
If yes, you'll have poor performance if the file is already in cache.
If no, you'll have double copy if the caller wants to save the data: from
system cache to the range's buffer and from the range's buffer to the caller's
buffer.

Feb 07 2011

D Programming

C/C++ Programming

Other

digitalmars.D - buffered input