
digitalmars.D.announce - iopipe alpha 0.0.1 version

reply Steven Schveighoffer <schveiguy yahoo.com> writes:
I added a tag for iopipe and added it to the dub registry so people can 
try it out.

I didn't want to add it until I had fully documented and unittested it.

http://code.dlang.org/packages/iopipe
https://github.com/schveiguy/iopipe

If you plan on using it, expect some API changes in the near future. I 
think the next step is really to add Windows support for the IODev type.

Suggestions and ideas welcome and appreciated.

I also want to add generated documentation. Does anyone know of a good 
way to generate the ddoc (or ddox or whatever) and put it directly into 
the repository for github to serve? Would be an awesome tip for people 
making projects for code.dlang.org.

-Steve
Oct 11
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Thursday, 12 October 2017 at 04:22:01 UTC, Steven 
Schveighoffer wrote:
 I added a tag for iopipe and added it to the dub registry so 
 people can try it out.

 I didn't want to add it until I had fully documented and 
 unittested it.

 http://code.dlang.org/packages/iopipe
 https://github.com/schveiguy/iopipe

 If you plan on using it, expect some API changes in the near 
 future. I think the next step is really to add Windows support 
 for the IODev type.
Might be able to help you on that using WinAPI for I/O. (I assume bypassing libc is one of the goals.)
Oct 11
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/12/17 1:48 AM, Dmitry Olshansky wrote:
 On Thursday, 12 October 2017 at 04:22:01 UTC, Steven Schveighoffer wrote:
 I added a tag for iopipe and added it to the dub registry so people 
 can try it out.

 I didn't want to add it until I had fully documented and unittested it.

 http://code.dlang.org/packages/iopipe
 https://github.com/schveiguy/iopipe

 If you plan on using it, expect some API changes in the near future. I 
 think the next step is really to add Windows support for the IODev type.
 Might be able to help you on that using WinAPI for I/O. (I assume bypassing libc is one of the goals.)
That would be awesome! Yes, the idea is to avoid any "extra" buffering, so it would use CreateFile, ReadFile, etc.

-Steve
Oct 12
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/12/17 8:41 AM, Steven Schveighoffer wrote:
 On 10/12/17 1:48 AM, Dmitry Olshansky wrote:
 On Thursday, 12 October 2017 at 04:22:01 UTC, Steven Schveighoffer wrote:
 I added a tag for iopipe and added it to the dub registry so people 
 can try it out.

 I didn't want to add it until I had fully documented and unittested it.

 http://code.dlang.org/packages/iopipe
 https://github.com/schveiguy/iopipe

 If you plan on using it, expect some API changes in the near future. 
 I think the next step is really to add Windows support for the IODev 
 type.
 Might be able to help you on that using WinAPI for I/O. (I assume bypassing libc is one of the goals.)
 That would be awesome! Yes, the idea is to avoid any "extra" buffering. So using CreateFile, ReadFile, etc.
Dmitry, hold off on this if you were going to do it. I have been looking at Jason White's io library, and I think I'm going to extract all the low-level types he has there as a basic io library, as they are fairly complete, and start from there. His library includes Windows support.

-Steve
Oct 16
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Monday, 16 October 2017 at 14:45:21 UTC, Steven Schveighoffer 
wrote:
 On 10/12/17 8:41 AM, Steven Schveighoffer wrote:
 On 10/12/17 1:48 AM, Dmitry Olshansky wrote:
 On Thursday, 12 October 2017 at 04:22:01 UTC, Steven 
 Schveighoffer wrote:
 [...]
 Might be able to help you on that using WinAPI for I/O. (I assume bypassing libc is one of the goals.)
 That would be awesome! Yes, the idea is to avoid any "extra" buffering. So using CreateFile, ReadFile, etc.
 Dmitry, hold off on this if you were going to do it. I have been looking at Jason White's io library, and think I'm going to just extract all the low-level types he has there as a basic io library, as they are fairly complete, and start from there. His library includes the ability to use Windows.
 -Steve
Meh, not that I had much spare time to actually do anything ;) I might help by reviewing what you have there.
Oct 16
parent reply Martin Nowak <code dawg.eu> writes:
On Monday, 16 October 2017 at 19:36:20 UTC, Dmitry Olshansky 
wrote:
 Dmitry hold off on this if you were going to do it. I have 
 been looking at Jason White's io library, and think I'm going 
 to just extract all the low-level types he has there as a 
 basic io library, as they are fairly complete, and start from 
 there. His library includes the ability to use Windows.
 Meh, not that I had much spare time to actually do anything ;) I might help by reviewing what you have there.
Started to work on unbuffered I/O, had it in mind for quite a while already. http://github.com/MartinNowak/io
Oct 16
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/16/17 4:56 PM, Martin Nowak wrote:
 On Monday, 16 October 2017 at 19:36:20 UTC, Dmitry Olshansky wrote:
 Dmitry hold off on this if you were going to do it. I have been 
 looking at Jason White's io library, and think I'm going to just 
 extract all the low-level types he has there as a basic io library, 
 as they are fairly complete, and start from there. His library 
 includes the ability to use Windows.
 Meh, not that I had much spare time to actually do anything ;) I might help by reviewing what you have there.
 Started to work on unbuffered I/O, had it in mind for quite a while already. http://github.com/MartinNowak/io
Awesome! Is the plan to put this into Phobos? If so, I would put it under std/experimental/io. However, if not, it should not be std/io.

Looks like it has all the stuff I had for my basic io type (and I see you have scatter read/write, which will help), so I will migrate iopipe to depend on it.

I was thinking about using Jason White's io library, but I haven't seen him around in a while. Plus, if this is going into Phobos, it would be the best thing for me to use.

Will pitch in when I can. Thanks!

-Steve
Oct 17
next sibling parent reply Suliman <evermind live.ru> writes:
I was thinking about using Jason White's io library, but I 
haven't seen him around in a while
Yes, it would be interesting if you could take some parts from his lib. He has a very good API.
Oct 17
parent Martin Nowak <code dawg.eu> writes:
On Tuesday, 17 October 2017 at 13:45:02 UTC, Suliman wrote:
I was thinking about using Jason White's io library, but I 
haven't seen him around in a while
 Yes, it would be interesting if you could take some parts from his lib. He has a very good API.
I previously collaborated a bit on that library, as it was very close to the design I have had in mind for years. Unfortunately the lib seems unmaintained now and also went somewhat off-track with a std.socket wrapper (https://github.com/jasonwhite/io/commit/3bbe43954d9c11cc892da10f656f31fff863875a#diff-a5ef7b1ce67d62 95f9bdf019adc4784). But indeed I'll try to contact him.

Furthermore, I want this to be very focused (no stat/fs functionality beyond what's necessary), but also to add hooks for Fiber-based async event loops. It's really not that much work to write an unbuffered I/O library, so let's see where that goes.
Oct 17
prev sibling parent Martin Nowak <code dawg.eu> writes:
On Tuesday, 17 October 2017 at 12:28:28 UTC, Steven Schveighoffer 
wrote:
 Is the plan to put this into Phobos? If so, I would put it 
 under std/experimental/io. However, if not, it should not be 
 std/io.
I don't know yet how it will turn out, but Phobos is very much in need of better Files and Sockets. Certainly the ambition is to write a standard-worthy library.

Honestly, it seems to me that the std.experimental experiment didn't succeed. It's still too much overhead to develop in Phobos (and get it reviewed/merged), there is no clear path from std.experimental -> std, and if something is well proven outside of Phobos there is no point in putting it into std.experimental in the first place.

Developing std.io-v0.1.0 on dub until it reaches v1.0.0 seems like a straightforward and obvious approach. Also, at our current community size, I'm hardly worried about namespace clashes.

Plus I'm already using std.internal.cstring as a workhorse to support any string-like ranges (including nogc std.path ranges) and core.internal.string : unsignedToTempString to avoid the fat and exception-throwing formattedWrite (even the templated variant isn't nothrow).
Oct 17
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2017-10-12 06:22, Steven Schveighoffer wrote:

 I also want to add generated documentation. Does anyone know of a good 
 way to generate the ddoc (or ddox or whatever) and put it directly into 
 the repository for github to serve? Would be an awesome tip for people 
 making projects for code.dlang.org.
I would suggest using GitHub Pages [1] for storing it.

[1] https://pages.github.com

-- 
/Jacob Carlborg
Oct 12
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/12/17 3:05 AM, Jacob Carlborg wrote:
 On 2017-10-12 06:22, Steven Schveighoffer wrote:
 
 I also want to add generated documentation. Does anyone know of a good 
 way to generate the ddoc (or ddox or whatever) and put it directly 
 into the repository for github to serve? Would be an awesome tip for 
 people making projects for code.dlang.org.
 I would suggest using GitHub Pages [1] for storing it. [1] https://pages.github.com
Thanks, I used dub -b ddox to generate the documentation, then committed the result. Looks great!

http://schveiguy.github.io/iopipe

Release 0.0.2 has fixes for the ddoc that I didn't notice before; there are no actual changes in the code.

-Steve
Oct 12
parent reply Martin Nowak <code dawg.eu> writes:
On Thursday, 12 October 2017 at 18:08:11 UTC, Steven 
Schveighoffer wrote:
 Release 0.0.2 has fixes for the ddoc that I didn't notice 
 before, there are no actual changes in the code.
May I recommend scod? It's just a ddox theme.
https://github.com/MartinNowak/scod

I also keep https://github.com/MartinNowak/bloom as an example/scaffold repo; it uses an automated docs setup with gh-pages branches.

Just create a doc deployment token (https://github.com/settings/tokens) with public_repo access and store it encrypted in your .travis.yml.
Oct 13
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/13/17 12:49 PM, Martin Nowak wrote:
 On Thursday, 12 October 2017 at 18:08:11 UTC, Steven Schveighoffer wrote:
 Release 0.0.2 has fixes for the ddoc that I didn't notice before, 
 there are no actual changes in the code.
 May I recommend scod? It's just a ddox theme. https://github.com/MartinNowak/scod I keep https://github.com/MartinNowak/bloom also as example/scaffold repo, it's using an automated docs setup with gh-pages branches. Just create a doc deployment token (https://github.com/settings/tokens) with public_repo access and store that encrypted in your .travis.yml.
Martin, I (and I think many people) would appreciate a blog post or tutorial on how to do this.

I'll look into your suggestions on the docs, thanks!

-Steve
Oct 13
parent Martin Nowak <code dawg.eu> writes:
On Friday, 13 October 2017 at 17:08:18 UTC, Steven Schveighoffer 
wrote:
 I keep https://github.com/MartinNowak/bloom also as 
 example/scaffold repo, it's using an automated docs setup with 
 gh-pages branches.
 
 Just create a doc deployment token 
 (https://github.com/settings/tokens) with public_repo access 
 and store that encrypted in your .travis.yml.
 Martin, I (and I think many people) would appreciate a blog post or tutorial on how to do this.
Indeed, that already crossed my mind a couple of times ;).
Oct 16
prev sibling parent reply Martin Nowak <code dawg.eu> writes:
On Thursday, 12 October 2017 at 04:22:01 UTC, Steven 
Schveighoffer wrote:
 I added a tag for iopipe and added it to the dub registry so 
 people can try it out.

 I didn't want to add it until I had fully documented and 
 unittested it.

 http://code.dlang.org/packages/iopipe
 https://github.com/schveiguy/iopipe
Great news to see continued work on this. I'll just use this thread to get started on design discussions. If there is a better place for that, let me know ;).

Questions/Ideas

- You can move docs out of the repo to fix search, e.g. by pushing them to a `gh-pages` branch of your repo.
  See https://github.com/MartinNowak/bloom/blob/736dc7a7ffcd2bbca7997f273a09e272e0 84596/travis.sh#L13 for an automated setup using Travis-CI and ddox/scod.

- Standard device implementation?

  Your library already has the notion of devices as thin abstractions over file/socket handles.
  Should we start with such an unbuffered IO library as a foundation, including support hooks for Fiber-based event loops? Something along the lines of https://code.dlang.org/packages/io? Without a standard device lib, IOPipe could not be used in APIs.
  Easy enough to write, could be written over a weekend.

- What's the plan for @safe buffer/window invalidation? Right now you're handing out raw access to internal buffers with an inherent memory safety problem.

  ```d
  auto w = f.window();
  f.extend(random());
  w[0]; // ⚡ dangling pointer ⚡
  ```

  I can see how the compiler could catch that if we'd go with compile-time enforced safety for RC and friends. But that's still unclear atm. and we might end up with a runtime RC/weak ptr mechanism instead, which wouldn't be too good a fit for that window mechanism.

- What about the principle that the caller should choose allocation/ownership?
  Having an extend method means the IOPipe is responsible for growing/allocating buffers, so you'll end up with IOPipeMalloc, IOPipeGC, IOPipeAllocatorGrowExp (or their template alternatives), not very nice for APIs.

- Why contiguous memory? The current implementation reallocs and, even weirder, memmoves data in extend.
  https://github.com/schveiguy/iopipe/blob/3589a4c9fc72b844eb4efd3ae718773faf9ab9ed/source/iopipe/buffer.d#L171

  Shouldn't a modern IO library be as zero-copy as possible?
  The docs say random access; that should be supported by ring buffers or lists/arrays of buffers. Any plans in that direction?
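
To make the invalidation hazard from the `window`/`extend` snippet concrete, here is a minimal sketch. `ToyPipe` and `isStale` are hypothetical names made up for illustration and are not iopipe's API. The toy always moves to a fresh malloc block in `extend` so the effect is deterministic; a real buffer using realloc might or might not move.

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.string : memcpy;

/// Hypothetical toy pipe for illustration; not iopipe's BufferManager.
struct ToyPipe
{
    private ubyte* data;
    private size_t len;

    @disable this(this); // sole owner of the malloc'd block

    ~this() { free(data); }

    /// Raw view of the current buffer, like the window discussed above.
    ubyte[] window() { return data[0 .. len]; }

    /// Grow the buffer by `extra` bytes (assumed to be > 0).
    void extend(size_t extra)
    {
        auto fresh = cast(ubyte*) malloc(len + extra);
        assert(fresh !is null, "out of memory");
        if (len) memcpy(fresh, data, len);
        fresh[len .. len + extra] = 0; // zero the newly added part
        free(data); // every window handed out earlier now dangles
        data = fresh;
        len += extra;
    }
}

/// True if `w` no longer points at the pipe's live buffer.
/// Only the pointer value is compared; the stale memory is never read.
bool isStale(ref ToyPipe p, const(ubyte)[] w)
{
    return w.ptr !is p.window.ptr;
}
```

A window taken before `extend` reports as stale afterwards, which is the kind of check a debug build could run on access; reading through the stale slice instead would be undefined behaviour.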
Oct 13
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/13/17 11:59 AM, Martin Nowak wrote:
 On Thursday, 12 October 2017 at 04:22:01 UTC, Steven Schveighoffer wrote:
 I added a tag for iopipe and added it to the dub registry so people 
 can try it out.

 I didn't want to add it until I had fully documented and unittested it.

 http://code.dlang.org/packages/iopipe
 https://github.com/schveiguy/iopipe
 Great news to see continued work on this. I'll just use this thread to get started on design discussions. If there is a better place for that, let me know ;).
This is as good a place as any :) I may create some issue reports on github to track things better.
 Questions/Ideas
 
 - You can move docs out of the repo to fix search, e.g. by pushing them 
 to a `gh-pages` branch of your repo.
When I tried the search it seemed to work...
 See 
 https://github.com/MartinNowak/bloom/blob/736dc7a7ffcd2bbca7997f273a09e272e0
84596/travis.sh#L13 
 for an automated setup using Travis-CI and ddox/scod.
I admit complete ignorance on this; I need to look into it. At the moment, I'm OK with committing the generated docs directly as an ugly extra step. When I looked at the options for adding a "pages" site to the project, I saw that if I put things under a "docs" directory, it could use that, so that's what I went with.
 - Standard device implementation?
 
    Your library already has the notion of devices as thin abstractions 
 over file/socket handles.
    Should we start with such an unbuffered IO library as foundation 
 including support hooks for Fiber based event loops. Something along the 
 lines of https://code.dlang.org/packages/io? Without a standard device 
 lib, IOPipe could not be used in APIs.
I absolutely think this would be a great idea. In fact, you could use Jason White's io package with iopipes directly, as his low-level types have the necessary read function:

https://github.com/jasonwhite/io/blob/master/source/io/file/stream.d#L335

Perhaps we could coax the basic types out of that library to provide a base for both iopipe and his high-level stuff.

The stream portion of my library is really just a throwaway piece that is not a focus of the library. Indeed, I created it because unbuffered stream types didn't exist anywhere (the IODev type predates iopipe, as it was part of my original attempt to rewrite Phobos io).
 - What's the plan for @safe buffer/window invalidation, right now you're 
 handing out raw access to internal buffers with an inherent memory 
 safety problem.
I don't plan to put any restrictions on this. In fact, the core purpose of iopipe is to give raw buffer access to aid in writing higher-level routines around it.

As I said here: https://github.com/schveiguy/iopipe/blob/master/source/iopipe/buffer.d#L217

If the Allocator supports deallocation I call it, but it may not be the correct thing to do. There is a sticky point in std.experimental.allocator: the GC allocator defines deallocate, because it's available, but the *presence* of that member may be taken to mean you have to call it to deallocate. There is no member saying whether deallocation is optional. In my wrapper GCNoPointerAllocator (which I needed to support allocating ubyte buffers without having to scan them), I leave out the deallocate function, so technically it's safe with that allocator.

I will say though, at some point, I'm going to focus on making as much as possible @safe in iopipe. That may require using the GC for buffering.
 
    ```d
    auto w = f.window();
    f.extend(random());
    w[0]; // ⚡ dangling pointer ⚡
    ```
 
    I can see how the compiler could catch that if we'd go with 
 compile-time enforced safety for RC and friends. But that's still 
 unclear atm. and we might end up with a runtime RC/weak ptr mechanism 
 instead, which wouldn't be too good a fit for that window mechanism.
What would be nice is a mechanism to detect this situation, since the above is both un-@safe and incorrect code.

Possibly you could instrument a window with a mechanism to check to see if it's still correct on every access, to be used when compiled in non-release mode for checking program correctness.

But in terms of @safe code in release mode, I think the only option is really to rely on the GC or reference counting to allow the window to still exist.
 
 - What about the principle that the caller should choose 
 allocation/ownership?
It can, BufferManager takes an Allocator compile-time option. It's also possible to create your own ownership or allocation scheme as long as you implement the required iopipe methods.
    Having an extend method means the IOPipe is responsible for 
 growing/allocating buffers, so you'll end up with IOPipeMalloc, 
 IOPipeGC, IOPipeAllocatorGrowExp (or their template alternatives), not 
 very nice for APIs.
extend is a core part of the iopipe system. The point of the library is that your higher-level code doesn't have to manage the buffering, in terms of memory ownership or allocation.

I've used so many buffered streams where I still have to create my own buffer because a quirk in the way I have to process the data doesn't fit the API of the stream. This mitigates that by giving you direct control over how much data should be buffered, but not burdening you with the details of managing that memory.

The mechanism was clear to me in Dmitry Olshansky's simple back-reference toy library that he made a while back (and actually was the inspiration for making iopipe instead of what I was doing before). I can't find his library any more, but here is the post he made: https://forum.dlang.org/post/l9q66g$2he3$1@digitalmars.com
 
 - Why contiguous memory? The current implementation reallocs and, even 
 weirder, memmoves data in extend.
 https://github.com/schveiguy/iopipe/blob/3589a4c9fc72b844eb4efd3ae718773faf9ab9ed/source/iopipe/buffer.d#L171
 
    Shouldn't a modern IO library be as zero-copy as possible?
    The docs say random access, that should be supported by ringbuffers 
 or lists/arrays of buffers. Any plans towards that direction?
Yes and no :)

My original idea was that once I got simple array buffers working, I would move on to circular buffers, and linked lists of buffers, etc., with all the details hidden by the range itself. I still might implement this. Windows and Posix support the notion of scatter read, so you can easily implement a way for streams to fit perfectly on top of these things.

But what I realized is that in practice (and especially when battling to beat Phobos byLine and libc's getline), avoiding copying may not be as important as I thought.

For one thing, the focused data (the data you care about currently) is generally much smaller than the real buffer size. So when it is calling memmove, you are generally only moving a tiny piece of the buffer.

Second, the CPU is really good at dealing with arrays (and searching through arrays), especially when dereferencing data.

Third, every single access to a non-array is going to have to go through some mechanism to check which actual array the index falls into. When implementing iopipe's byline, I got a SIGNIFICANT speedup by copying members of the ByLine struct (e.g. the dchar being searched for) into a local variable. If you have a custom range for a circular buffer whose division point has to be read on every element index, the penalties are going to add up.

The trade-offs might still be worth it. For instance, if your focused data is a larger percentage of the total buffer (like 70%), moving it to the front of the buffer is going to hurt performance. I don't know whether it would overcome slower access per element. The good news is, I can implement it and see how it fares, since the higher-level code is abstracted from the buffer type.

And of course, any existing (non-infinite) random-access range can be hooked as a non-extendable iopipe (see how arrays are hooked).

Thanks for all your thoughts on this, Martin!

-Steve
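
The point about memmove only touching the focused data can be sketched as follows. This is a simplified illustration under stated assumptions, not iopipe's actual buffer code: `FlatBuffer`, its fields, and the `bytesMoved` counter are hypothetical, and the sketch returns `false` where a real buffer would reallocate.

```d
import core.stdc.string : memmove;

/// Hypothetical flat buffer: the window is storage[start .. end];
/// bytes before `start` have already been released.
struct FlatBuffer
{
    ubyte[] storage;
    size_t start, end;
    size_t bytesMoved; // bookkeeping for the demo only

    ubyte[] window() { return storage[start .. end]; }

    /// Drop `n` bytes from the front of the window.
    void release(size_t n)
    {
        assert(n <= end - start);
        start += n;
    }

    /// Add `n` more bytes to the end of the window, compacting first if
    /// there is not enough room behind it. Returns false if the request
    /// cannot fit even after compaction.
    bool extend(size_t n)
    {
        if (storage.length - end < n)
        {
            immutable live = end - start;
            if (storage.length - live < n)
                return false;
            // Slide only the live window to the front of the storage.
            memmove(storage.ptr, storage.ptr + start, live);
            bytesMoved += live;
            start = 0;
            end = live;
        }
        end += n;
        return true;
    }
}
```

With a 64-byte buffer where 56 of 60 bytes have been released, extending by 16 moves only the 4 live bytes. If most of the buffer were still live, the same call would copy most of it, which is the 70% case mentioned above.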
Oct 13
parent reply Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 10/13/2017 08:39 PM, Steven Schveighoffer wrote:
 What would be nice is a mechanism to detect this situation, since the
 above is both un-@safe and incorrect code.
 
 Possibly you could instrument a window with a mechanism to check to see
 if it's still correct on every access, to be used when compiled in
 non-release mode for checking program correctness.
 
 But in terms of @safe code in release mode, I think the only option is
 really to rely on the GC or reference counting to allow the window to
 still exist.
We should definitely find a @nogc solution to this, but it's a good litmus test for the RC compiler support I'll work on.

Why does IOPipe have to hand over the window to the caller? It could just implement the RandomAccessRange interface itself.

Instead of

```d
auto w = f.window();
f.extend(random());
w[0];
```

you could only do

```d
f[0];
f.extend(random());
f[0]; // bug, but no memory corruption
```

This problem seems to be very similar to the Range vs. Iterators difference: the former can perform bounds checks on indexing, the latter are inherently unsafe (with expensive runtime debug checks e.g. in VC++).
Similarly, always accessing the buffer through IOPipe would allow cheap bounds checking, and sure, you could still offer IOPipe.ptr for unsafe code.

-Martin
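
A rough sketch of what "always accessing the buffer through the pipe" could look like. `IndexedPipe` is a hypothetical type for illustration only, not a proposed iopipe API, and it uses a GC-backed array as a stand-in for a real buffer; the bounds check is simply D's built-in slice check, which stays enabled in `@safe` code.

```d
/// Hypothetical wrapper: element access always goes through the pipe,
/// so no slice of the internal buffer is ever handed out.
struct IndexedPipe
{
@safe:
    private ubyte[] buf; // current window, GC-backed in this sketch

    size_t length() const { return buf.length; }

    /// Indexes the *current* window; out-of-range access fails the
    /// built-in bounds check instead of touching stale memory.
    ubyte opIndex(size_t i) const { return buf[i]; }

    /// Toy extend: appends `more` to the window. A real pipe would read
    /// from its source instead.
    void extend(const(ubyte)[] more) { buf ~= more; }

    /// Drop `n` elements from the front of the window.
    void release(size_t n) { buf = buf[n .. $]; }
}
```

After a `release` or `extend`, `f[0]` refers to whatever is now first in the window. Code that expected the old element has a logic bug, but it cannot read freed memory this way.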
Oct 19
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/19/17 7:13 AM, Martin Nowak wrote:
 On 10/13/2017 08:39 PM, Steven Schveighoffer wrote:
 What would be nice is a mechanism to detect this situation, since the
 above is both un-@safe and incorrect code.

 Possibly you could instrument a window with a mechanism to check to see
 if it's still correct on every access, to be used when compiled in
 non-release mode for checking program correctness.

 But in terms of @safe code in release mode, I think the only option is
 really to rely on the GC or reference counting to allow the window to
 still exist.
 We should definitely find a @nogc solution to this, but it's a good litmus test for the RC compiler support I'll work on. Why does IOPipe have to hand over the window to the caller? It could just implement the RandomAccessRange interface itself. Instead of
 ```d
 auto w = f.window();
 f.extend(random());
 w[0];
 ```
 you could only do
 ```d
 f[0];
 f.extend(random());
 f[0]; // bug, but no memory corruption
 ```
So the idea here (If I understand correctly) is to encapsulate the window into the pipe, such that you don't need to access the buffer separately? I'm not quite sure because of that last comment. If f[0] is equivalent to previous code f.window[0], then the second f[0] is not a bug, it's valid, and accessing the first element of the window (which may have moved). But let me assume that was just a misunderstanding...
 
 This problem seems to be very similar to the Range vs. Iterators
 difference, the former can perform bounds checks on indexing, the latter
 are inherently unsafe (with expensive runtime debug checks e.g. in VC++).
But ranges have this same problem. For instance:

    const(char[])[] lines = stdin.byLine.array;

Here, since byLine uses GC buffering, it's safe (but wrong). If non-GC buffers are used, then it's not safe.

I think as long as the windows are backed by GC data, it should be safe. In this sense, your choice of buffering scheme can make something safe or not safe. I'm OK with that, as long as iopipes can be safe in some way (and that happens to be the default).
 Similarly always accessing the buffer through IOPipe would allow cheap
 bounds checking, and sure you could still offer IOPipe.ptr for unsafe code.
It's an interesting idea to simply make the iopipe the window, not just for safety reasons:

1. This means the iopipe itself *is* a random access range, allowing it to automatically fit into existing algorithms.
2. Existing random-access ranges can be easily shoehorned into being iopipes (I already did it with arrays, and it's not much harder with popFrontN). Alternatively, code that uses iopipes can simply check for the existence of iopipe-like methods, and use them if they are present.
3. Less verbose usage, and more uniform access. For instance, if an iopipe defines opIndex, then iopipe.window[0] and iopipe[0] are possibly different things, which would be confusing.

Some downsides however:

1. iopipes can be complex and windows are not. They were a fixed view of the current buffer. The idea that I can fetch a window of data from an iopipe and then deal simply with that part of the data was attractive.
2. The iopipe is generally not copyable once usage begins. In other words, the feature of ranges that you can copy them and they just work would be difficult to replicate in iopipe.

A possible way forward could be:

* iopipe is a random-access range (not necessarily a forward range).
* iopipe.window returns a non-extendable window of the buffer itself, which is a forward/random-access range. If backed by the GC or some form of RC, everything is @safe.
* Functions which now take iopipes could be adjusted to take random-access ranges, and if they are also iopipes, could use the extend features to get more data.
* iopipe.release(size_t) could be hooked by popFrontN. I don't like the idea of supporting slicing on iopipes, for the non-forward aspect of iopipe. Much better to have an internal hook that modifies the range in-place.

This would make iopipes fit right into the range hierarchy, and therefore they could be integrated easily into Phobos.

In fact, I can accomplish most of this by simply adding the appropriate range operations to iopipes. I have resisted this in the past but I can't see how it hurts.

For Phobos inclusion, however, I don't know how to reconcile auto-decoding. I absolutely need to treat buffers of char, wchar, and dchar data as normal buffers, and not something else. This one thing may keep it from getting accepted.

-Steve
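
The "possible way forward" can be sketched roughly like this. `RangePipe` is a hypothetical type, and an in-memory array stands in for the underlying stream; it is not how iopipe is implemented. It uses `ubyte` elements to sidestep the auto-decoding question. Note that Phobos' `isRandomAccessRange` also requires `save`, so a non-copyable pipe like this only formally satisfies `isInputRange`, with indexing and `length` as extras.

```d
import std.range.primitives : isInputRange, isForwardRange, popFrontN;

/// Hypothetical pipe that exposes range primitives over its window.
struct RangePipe
{
    private const(ubyte)[] source; // stand-in for an underlying stream
    private size_t lo, hi;         // the window is source[lo .. hi]

    this(const(ubyte)[] source) { this.source = source; }

    // Range primitives, all operating on the current window.
    @property bool empty() const { return lo == hi; }
    @property ubyte front() const { return source[lo .. hi][0]; }
    void popFront() { release(1); }
    @property size_t length() const { return hi - lo; }
    ubyte opIndex(size_t i) const { return source[lo .. hi][i]; }

    // iopipe-style primitives.
    void release(size_t n)
    {
        assert(n <= hi - lo);
        lo += n;
    }

    /// Tries to add `n` elements to the window; returns how many were added.
    size_t extend(size_t n)
    {
        immutable avail = source.length - hi;
        immutable got = n < avail ? n : avail;
        hi += got;
        return got;
    }
}

static assert(isInputRange!RangePipe);
static assert(!isForwardRange!RangePipe); // no save, as discussed above
```

Because the type has no slicing, `popFrontN` falls back to calling `popFront` repeatedly, which here forwards to `release(1)`; a dedicated hook could call `release(n)` once instead.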
Oct 19
parent reply Martin Nowak <code+news.digitalmars dawg.eu> writes:
On 10/19/2017 03:12 PM, Steven Schveighoffer wrote:
 On 10/19/17 7:13 AM, Martin Nowak wrote:
 On 10/13/2017 08:39 PM, Steven Schveighoffer wrote:
 What would be nice is a mechanism to detect this situation, since the
 above is both un-@safe and incorrect code.

 Possibly you could instrument a window with a mechanism to check to see
 if it's still correct on every access, to be used when compiled in
 non-release mode for checking program correctness.

 But in terms of @safe code in release mode, I think the only option is
 really to rely on the GC or reference counting to allow the window to
 still exist.
 We should definitely find a @nogc solution to this, but it's a good litmus test for the RC compiler support I'll work on. Why does IOPipe have to hand over the window to the caller? It could just implement the RandomAccessRange interface itself. Instead of
 ```d
 auto w = f.window();
 f.extend(random());
 w[0];
 ```
 you could only do
 ```d
 f[0];
 f.extend(random());
 f[0]; // bug, but no memory corruption
 ```
So the idea here (If I understand correctly) is to encapsulate the window into the pipe, such that you don't need to access the buffer separately? I'm not quite sure because of that last comment. If f[0] is equivalent to previous code f.window[0], then the second f[0] is not a bug, it's valid, and accessing the first element of the window (which may have moved).
The above sample with the window is a bug and a memory corruption because of iterator/window invalidation by extend. If you didn't think of the invalidation, then the latter example would still be a bug to you, but not a memory corruption.
 This problem seems to be very similar to the Range vs. Iterators
 difference, the former can perform bounds checks on indexing, the latter
 are inherently unsafe (with expensive runtime debug checks e.g. in VC++).
 But ranges have this same problem. For instance:
     const(char[])[] lines = stdin.byLine.array;
 Here, since byLine uses GC buffering, it's safe (but wrong). If non-GC buffers are used, then it's not safe.
 I think as long as the windows are backed by GC data, it should be safe. In this sense, your choice of buffering scheme can make something safe or not safe. I'm OK with that, as long as iopipes can be safe in some way (and that happens to be the default).
 Similarly always accessing the buffer through IOPipe would allow cheap
 bounds checking, and sure you could still offer IOPipe.ptr for unsafe
 code.
 It's an interesting idea to simply make the iopipe the window, not just for safety reasons:
 1. this means the iopipe itself *is* a random access range, allowing it to automatically fit into existing algorithms.
 2. Existing random-access ranges can be easily shoehorned into being ranges (I already did it with arrays, and it's not much harder with popFrontN). Alternatively, code that uses iopipes can simply check for the existence of iopipe-like methods, and use them if they are present.
 3. Less verbose usage, and more uniform access. For instance if an iopipe defines opIndex, then iopipe.window[0] and iopipe[0] are possibly different things, which would be confusing.
 Some downsides however:
 1. iopipes can be complex and windows are not. They were a fixed view of the current buffer. The idea that I can fetch a window of data from an iopipe and then deal simply with that part of the data was attractive.
You could still have a window internally and just forward to that.
 2. The iopipe is generally not copyable once usage begins. In other
 words, the feature of ranges that you can copy them and they just work,
 would be difficult to replicate in iopipe.
That's a general problem. Unique ownership is really useful, but most phobos range methods don't care, and assume copying is implicit saving. Not too nice and I guess this will bite us again with RC/Unique/Weak. The current workaround for this is `refRange`.
 A possible way forward could be:
 
 * iopipe is a random-access range (not necessarily a forward range).
 * iopipe.window returns a non-extendable window of the buffer itself,
 which is a forward/random-access range. If backed by the GC or some form
 of RC, everything is @safe.
 * Functions which now take iopipes could be adjusted to take
 random-access ranges, and if they are also iopipes, could use the extend
 features to get more data.
 * iopipe.release(size_t) could be hooked by popFrontN. I don't like the
 idea of supporting slicing on iopipes, for the non-forward aspect of
 iopipe. Much better to have an internal hook that modifies the range
 in-place.
 
 This would make iopipes fit right into the range hierarchy, and
 therefore could be integrated easily into Phobos.
I made an interesting experiment with buffered input ranges quite a while ago. https://gist.github.com/MartinNowak/1257196

This would use popFront to fetch new data and ref-count a list of buffers depending on older saved ranges still using earlier buffers. With a bit of creative use, the existing Range primitives could be used to implement infinite look-ahead.

    auto beg = rng.save;
    auto end = rng.find("bla");
    auto window = beg[0 .. end]; // get a random access window

The main problem with this has been that the many implicit copies (e.g. in foreach) bump the reference count, so the RC buffer release would often not work. This could be avoided by making them non-copyable, but again phobos and foreach currently don't support this hybrid of input (consuming) and forward (saveable) range.

-Martin
Oct 21
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/21/17 6:33 AM, Martin Nowak wrote:
 On 10/19/2017 03:12 PM, Steven Schveighoffer wrote:
 On 10/19/17 7:13 AM, Martin Nowak wrote:
 On 10/13/2017 08:39 PM, Steven Schveighoffer wrote:
 What would be nice is a mechanism to detect this situation, since the
 above is both un-@safe and incorrect code.

 Possibly you could instrument a window with a mechanism to check to see
 if it's still correct on every access, to be used when compiled in
 non-release mode for checking program correctness.

 But in terms of @safe code in release mode, I think the only option is
 really to rely on the GC or reference counting to allow the window to
 still exist.
We should definitely find a nogc solution to this, but it's a good litmus test for the RC compiler support I'll work on. Why do IOPipes have to hand over the window to the caller? They could just implement the RandomAccessRange interface themselves.

Instead of

```d
auto w = f.window();
f.extend(random());
w[0];
```

you could only do

```d
f[0];
f.extend(random());
f[0]; // bug, but no memory corruption
```
So the idea here (If I understand correctly) is to encapsulate the window into the pipe, such that you don't need to access the buffer separately? I'm not quite sure because of that last comment. If f[0] is equivalent to previous code f.window[0], then the second f[0] is not a bug, it's valid, and accessing the first element of the window (which may have moved).
The above sample with the window is a bug and memory corruption because of iterator/window invalidation by extend. If you hadn't thought of the invalidation, then the latter example would still be a bug to you, but not a memory corruption.
The issue with the original code is that the window may move *within the buffer*. That is, if your current window is looking at the last 1k of a 2M buffer, and you extend, the buffer manager may move the data from the end of the buffer to the beginning, and re-fill the rest of the buffer with new data from the source. In this case, the old window reference that you saved is pointing at completely different data. That is, f.window[0] may not be the same as w[0]. Still safe, but not correct. Whereas in your new code, you are looking at the correct window data every time.
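The window-moving behaviour described here can be shown with a toy model. This is only a sketch under stated assumptions: `ToyPipe` is a hypothetical type invented for illustration, not the iopipe API, and its compaction policy (move live data to the front whenever the requested extension does not fit) is just one plausible buffer-manager strategy.

```d
import std.algorithm.comparison : min;

/// Hypothetical toy pipe over an in-memory source (NOT the iopipe API).
struct ToyPipe
{
    private ubyte[] buffer;        // fixed, GC-allocated backing storage
    private size_t start, end;     // the window is buffer[start .. end]
    private const(ubyte)[] source; // input that has not been read yet

    this(const(ubyte)[] source, size_t capacity)
    {
        this.source = source;
        buffer = new ubyte[capacity];
    }

    /// Slice of the current window; a saved copy can go stale after extend().
    ubyte[] window() { return buffer[start .. end]; }

    /// Indexing through the pipe always resolves against the current window.
    ubyte opIndex(size_t i) { return window()[i]; }

    /// Drop n elements from the front of the window.
    void release(size_t n)
    {
        assert(n <= end - start);
        start += n;
    }

    /// Try to append up to n elements; returns how many were appended.
    size_t extend(size_t n)
    {
        if (end + n > buffer.length && start > 0)
        {
            // Not enough room at the back: move the live data to the front.
            immutable live = end - start;
            foreach (i; 0 .. live)
                buffer[i] = buffer[start + i];
            start = 0;
            end = live;
        }
        immutable count = min(n, buffer.length - end, source.length);
        buffer[end .. end + count] = source[0 .. count];
        source = source[count .. $];
        end += count;
        return count;
    }
}
```

With a capacity of 4 over the input 1 through 8, extending by 4, releasing 3, saving `w = pipe.window()`, and extending by 3 leaves `pipe[0]` at 4 while the saved `w[0]` now reads 7: still memory-safe because the buffer is GC-backed, but no longer the data the caller thought it was looking at.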
 Some downsides however:

 1. iopipes can be complex and windows are not. They were a fixed view of
 the current buffer. The idea that I can fetch a window of data from an
 iopipe and then deal simply with that part of the data was attractive.
You could still have a window internally and just forward to that.
My attention is really on algorithms that may use the range interface. It may be less efficient and maybe not even correct to use the whole iopipe as a range. At first look, I wanted to create an abstraction on the data itself, and then build a range on top of it. It's a different way to look at it.
 2. The iopipe is generally not copyable once usage begins. In other
 words, the feature of ranges that you can copy them and they just work,
 would be difficult to replicate in iopipe.
That's a general problem. Unique ownership is really useful, but most phobos range methods don't care, and assume copying is implicit saving. Not too nice and I guess this will bite us again with RC/Unique/Weak. The current workaround for this is `refRange`.
There is actually quite a bit of this problem in Phobos. Most range wrapper functions do not take ranges by reference, but by value, making copies everywhere. However, most of the time, this is only during construction, where the copy is a move. But many of the functions do not actually move the parameters into the wrapper, so disabling postblit would be horrific. iopipe, unfortunately, follows that precedent. I should probably correct it.
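A minimal sketch of what moving the parameter into the wrapper could look like, assuming a source type with postblit disabled. `UniqueSource`, `Wrapper`, and `wrap` are hypothetical names for illustration; none of them come from Phobos or iopipe.

```d
import std.algorithm.mutation : move;

/// Hypothetical non-copyable source, standing in for an iopipe-like type.
struct UniqueSource
{
    int[] data;
    @disable this(this); // copying is a compile-time error
}

/// Hypothetical wrapper that takes ownership of its source.
struct Wrapper
{
    UniqueSource src;

    this(UniqueSource s)
    {
        // Move rather than copy; a plain `src = s;` would need the postblit.
        src = move(s);
    }
}

/// Takes the source by value, so callers must pass an rvalue or move() it.
Wrapper wrap(UniqueSource s)
{
    return Wrapper(move(s));
}
```

A caller then writes `auto w = wrap(move(u));` for an lvalue `u`. Passing `u` directly does not compile, which is the failure mode many existing range wrappers would hit if postblit were disabled on their inputs.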
 A possible way forward could be:

 * iopipe is a random-access range (not necessarily a forward range).
 * iopipe.window returns a non-extendable window of the buffer itself,
 which is a forward/random-access range. If backed by the GC or some form
 of RC, everything is @safe.
 * Functions which now take iopipes could be adjusted to take
 random-access ranges, and if they are also iopipes, could use the extend
 features to get more data.
 * iopipe.release(size_t) could be hooked by popFrontN. I don't like the
 idea of supporting slicing on iopipes, for the non-forward aspect of
 iopipe. Much better to have an internal hook that modifies the range
 in-place.

 This would make iopipes fit right into the range hierarchy, and
 therefore could be integrated easily into Phobos.
I made an interesting experiment with buffered input ranges quite a while ago. https://gist.github.com/MartinNowak/1257196

This would use popFront to fetch new data and ref-count a list of buffers depending on older saved ranges still using earlier buffers. With a bit of creative use, the existing Range primitives could be used to implement infinite look-ahead.

    auto beg = rng.save;
    auto end = rng.find("bla");
    auto window = beg[0 .. end]; // get a random access window
This is similar to Dmitry's attempt as well (which unfortunately is no longer available that I can see), but his did not use the range primitives I think. It's solving a different problem than iopipe is solving. I plan on adding iopipe-on-range capability soon as well, since many times, all you have is a range. -Steve
Oct 23
parent reply Martin Nowak <code dawg.eu> writes:
On Monday, 23 October 2017 at 16:34:19 UTC, Steven Schveighoffer 
wrote:
 On 10/21/17 6:33 AM, Martin Nowak wrote:
 On 10/19/2017 03:12 PM, Steven Schveighoffer wrote:
 On 10/19/17 7:13 AM, Martin Nowak wrote:
 On 10/13/2017 08:39 PM, Steven Schveighoffer wrote:
It's solving a different problem than iopipe is solving. I plan on adding iopipe-on-range capability soon as well, since many times, all you have is a range.
You mean chunk based processing vs. infinite lookahead for parsing? They both provide a similar API, sth. to extend the current window and sth. to release data. The example input here was an input range, but it's read in page sizes and could as well be a socket.
Oct 24
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 10/24/17 5:32 AM, Martin Nowak wrote:
 On Monday, 23 October 2017 at 16:34:19 UTC, Steven Schveighoffer wrote:
 On 10/21/17 6:33 AM, Martin Nowak wrote:
 On 10/19/2017 03:12 PM, Steven Schveighoffer wrote:
 On 10/19/17 7:13 AM, Martin Nowak wrote:
 On 10/13/2017 08:39 PM, Steven Schveighoffer wrote:
It's solving a different problem than iopipe is solving. I plan on adding iopipe-on-range capability soon as well, since many times, all you have is a range.
You mean chunk based processing vs. infinite lookahead for parsing? They both provide a similar API, sth. to extend the current window and sth. to release data.
Yes, definitely.
 The example input here was an input range, but it's read in page sizes 
 and could as well be a socket.
iopipe provides "infinite" lookahead, which is central to its purpose. The trouble with bolting that on top of ranges, as you said, is that we have to copy everything out of the range, which necessarily buffers somehow (if it's efficient i/o), so you are double buffering. iopipe's purpose is to get rid of this unnecessary buffering. This is why it's a great fit for being the *base* of a range. In other words, if you want something to have optional lookahead and range support, it's better to start out with an extendable buffering type like an iopipe, and bolt ranges on top, vs. the other way around. -Steve
Oct 24
parent reply Martin Nowak <code dawg.eu> writes:
On Tuesday, 24 October 2017 at 14:47:02 UTC, Steven Schveighoffer 
wrote:
 iopipe provides "infinite" lookahead, which is central to its 
 purpose. The trouble with bolting that on top of ranges, as you 
 said, is that we have to copy everything out of the range, 
 which necessarily buffers somehow (if it's efficient i/o), so 
 you are double buffering. iopipe's purpose is to get rid of 
 this unnecessary buffering. This is why it's a great fit for 
 being the *base* of a range.

 In other words, if you want something to have optional 
 lookahead and range support, it's better to start out with an 
 extendable buffering type like an iopipe, and bolt ranges on 
 top, vs. the other way around.
Arguably it is somewhat hacky to use a range as an end marker for slicing sth., but you'd get the same benefit: access to the random buffer with zero-copying.

    auto beg = rng.save; // save current position
    auto end = rng.find("bla"); // lookahead using popFront
    auto window = beg[0 .. end]; // get a random access window to underlying buffer

So basically forward ranges with slicing. At least that would require extending all algorithms with `extend` support, though likely you could have a small extender proxy range for IOPipes. Note that rng could be a wrapper around unbuffered IO reads.
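For a plain array or string the save/find/slice idea can already be written today. The sketch below assumes an in-memory string stands in for the buffered range, and `windowBefore` is a hypothetical helper name. Because slicing with a range as the end marker is not valid for built-in arrays, the window is computed from the length difference instead:

```d
import std.algorithm.searching : find;
import std.range.primitives : save;

/// Hypothetical helper: the part of `rng` that precedes `needle`,
/// returned as a zero-copy slice of the original data.
string windowBefore(string rng, string needle)
{
    auto beg = rng.save;         // remember the current position
    auto end = rng.find(needle); // look ahead; empty if needle is absent
    return beg[0 .. beg.length - end.length];
}
```

Here `windowBefore("foo bla bar", "bla")` yields `"foo "`, and a missing needle yields the whole input. A buffered I/O range would additionally have to keep the data between `beg` and `end` alive, which is the part this in-memory sketch sidesteps.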
Oct 24
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On Tuesday, 24 October 2017 at 19:05:02 UTC, Martin Nowak wrote:
 On Tuesday, 24 October 2017 at 14:47:02 UTC, Steven 
 Schveighoffer wrote:
 iopipe provides "infinite" lookahead, which is central to its 
 purpose. The trouble with bolting that on top of ranges, as 
 you said, is that we have to copy everything out of the range, 
 which necessarily buffers somehow (if it's efficient i/o), so 
 you are double buffering. iopipe's purpose is to get rid of 
 this unnecessary buffering. This is why it's a great fit for 
 being the *base* of a range.

 In other words, if you want something to have optional 
 lookahead and range support, it's better to start out with an 
 extendable buffering type like an iopipe, and bolt ranges on 
 top, vs. the other way around.
Arguably it is somewhat hacky to use a range as an end marker for slicing sth., but you'd get the same benefit: access to the random buffer with zero-copying.

    auto beg = rng.save; // save current position
    auto end = rng.find("bla"); // lookahead using popFront
    auto window = beg[0 .. end]; // get a random access window to underlying buffer
I had a design like that except save returned a “mark” (not full range) and there was a slice primitive. It even worked with patched std.regex, but at a non-zero performance penalty. I think that maintaining the illusion of a full copy of range when you do “save” for buffered I/O stream is too costly. Because a user can now legally advance both - you need to RC buffers behind the scenes with separate “pointers” for each range that effectively pin them.
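A rough sketch of the "mark plus slice primitive" shape described above, assuming an in-memory buffer in place of a real I/O stream. `MarkedBuffer`, `Mark`, `mark`, and `slice` are hypothetical names chosen for illustration and are not taken from the design mentioned in the post.

```d
/// Hypothetical input range whose mark() returns a cheap position token
/// instead of a full saved copy of the range.
struct MarkedBuffer
{
    private const(char)[] data;
    private size_t pos;

    /// Opaque position token; it cannot be advanced on its own.
    struct Mark
    {
        private size_t offset;
    }

    this(const(char)[] data) { this.data = data; }

    // Input range primitives.
    bool empty() const { return pos >= data.length; }
    char front() const { return data[pos]; }
    void popFront() { ++pos; }

    /// Remember the current position.
    Mark mark() const { return Mark(pos); }

    /// Everything consumed since `from`, as a zero-copy slice.
    const(char)[] slice(Mark from) const
    {
        return data[from.offset .. pos];
    }
}
```

Since a mark cannot be advanced independently, only one read position exists. A real implementation would still need to pin the buffer from the oldest live mark onward, which this sketch does not model.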
 So basically forward ranges with slicing.
 At least that would require extending all algorithms with 
 `extend` support, though likely you could have a small extender 
 proxy range for IOPipes.

 Note that rng could be a wrapper around unbuffered IO reads.
Oct 24