
digitalmars.D - D and i/o

reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
Dear all,

In my field we are I/O bound, so I would like our tools to be as 
fast as the file can be read.

Thus I started a dummy benchmark which counts the number of lines.
The result is compared to the wc -l command. The line counting is 
only a pretext to evaluate the I/O; it could be replaced by any 
other I/O processing. Thus we use the raw buffer as much as 
possible instead of the byLine range. Moreover, such a range 
implies that the buffer has already been read once before it is 
ready to process.


https://github.com/bioinfornatics/test_io

Ideally I would like to process a shared buffer across multiple 
cores and run a SIMD computation, but that is not done yet.
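
For illustration, a minimal chunk-based counter looks roughly like 
this (just a sketch of the idea, not the exact code from the 
repository; the 1 MiB buffer size is arbitrary):

import std.algorithm.searching : count;
import std.stdio : File, writeln;

void main(string[] args)
{
    size_t lines = 0;
    auto file = File(args[1], "rb");

    // Read fixed-size chunks and count newline bytes directly in the
    // raw buffer, without ever materialising a line.
    foreach (chunk; file.byChunk(1024 * 1024))
        lines += chunk.count('\n');

    writeln(lines);
}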
Nov 09 2019
next sibling parent reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
wrote:
 Dear all,

 In my field we are I/O bound, so I would like our tools to be as 
 fast as the file can be read.

 Thus I started a dummy benchmark which counts the number of lines.
 The result is compared to the wc -l command. The line counting is 
 only a pretext to evaluate the I/O; it could be replaced by any 
 other I/O processing. Thus we use the raw buffer as much as 
 possible instead of the byLine range. Moreover, such a range 
 implies that the buffer has already been read once before it is 
 ready to process.


 https://github.com/bioinfornatics/test_io

 Ideally I would like to process a shared buffer across multiple 
 cores and run a SIMD computation, but that is not done yet.
If you have scripts or enhancements to contribute, you are welcome. Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.
Nov 09 2019
next sibling parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
 wrote:
 [...]
If you have scripts or enhancements to contribute, you are welcome. Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.
I haven't really looked at your code but in general I find mmap to be much faster than reading a file when searching for things.
Nov 09 2019
parent reply bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler 
wrote:
 On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics 
 wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
 wrote:
 [...]
If you have scripts or enhancements to contribute, you are welcome. Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.
I haven't really looked at your code but in general I find mmap to be much faster than reading a file when searching for things.
a) Thanks Jonathan, I plan to add a script using mmap. It is definitely on my todo list.

b) On Linux it seems the kernel can handle parallel (//) reads through asynchronous reads, as described here: https://oxnz.github.io/2016/10/13/linux-aio/
Nov 09 2019
next sibling parent bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Sunday, 10 November 2019 at 07:43:31 UTC, bioinfornatics wrote:
 On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler 
 wrote:
 On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics 
 wrote:
 [...]
I haven't really looked at your code but in general I find mmap to be much faster than reading a file when searching for things.
a) Thanks Jonathan, I plan to add a script using mmap. It is definitely on my todo list.

b) On Linux it seems the kernel can handle parallel (//) reads through asynchronous reads, as described here: https://oxnz.github.io/2016/10/13/linux-aio/
https://cc.davelozinski.com/c-sharp/fastest-way-to-read-text-files
Nov 10 2019
prev sibling next sibling parent Daniel Kozak <kozzi11 gmail.com> writes:
On Sun, Nov 10, 2019 at 8:45 AM bioinfornatics via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
On Sunday, 10 November 2019 at 07:43:31 UTC, bioinfornatics wrote:
 On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler
 wrote:

 b) On Linux it seems the kernel can handle parallel (//) reads 
 through asynchronous reads, as described here: 
 https://oxnz.github.io/2016/10/13/linux-aio/
Do not use that. If you want AIO on Linux you should use io_uring: https://www.phoronix.com/scan.php?page=news_item&px=Linux-io_uring-Fast-Efficient

I have been using it for some time and it is really fast. The only issue is that you need a recent kernel.
Nov 12 2019
prev sibling parent Daniel Kozak <kozzi11 gmail.com> writes:
On Sun, Nov 10, 2019 at 8:45 AM bioinfornatics via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Sunday, 10 November 2019 at 07:33:41 UTC, Jonathan Marler
 wrote:

 b) On Linux it seems the kernel can handle parallel (//) reads 
 through asynchronous reads, as described here: 
 https://oxnz.github.io/2016/10/13/linux-aio/
Do not use that. If you want AIO on Linux you should use io_uring: https://www.phoronix.com/scan.php?page=news_item&px=Linux-io_uring-Fast-Efficient

I have been using it for some time and it is really fast. The only issue is that you need a recent kernel.
Nov 11 2019
prev sibling next sibling parent Jonathan Marler <johnnymarler gmail.com> writes:
On Sunday, 10 November 2019 at 07:16:31 UTC, bioinfornatics wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
 wrote:
 Dear all,

 In my field we are I/O bound, so I would like our tools to be as 
 fast as the file can be read.

 Thus I started a dummy benchmark which counts the number of lines.
 The result is compared to the wc -l command. The line counting is 
 only a pretext to evaluate the I/O; it could be replaced by any 
 other I/O processing. Thus we use the raw buffer as much as 
 possible instead of the byLine range. Moreover, such a range 
 implies that the buffer has already been read once before it is 
 ready to process.


 https://github.com/bioinfornatics/test_io

 Ideally I would like to process a shared buffer across multiple 
 cores and run a SIMD computation, but that is not done yet.
If you have scripts or enhancements to contribute, you are welcome. Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.
Here's an example implementation of wc using mmap:

import std.stdio, std.algorithm, std.mmfile;

void main(string[] args)
{
    foreach (arg; args[1..$])
    {
        auto file = new MmFile(arg, MmFile.Mode.read, 0, null);
        auto content = cast(char[]) file.opSlice;
        writefln("%s", content.count('\n'));
    }
}
Nov 09 2019
prev sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/10/19 2:16 AM, bioinfornatics wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics wrote:
 Dear all,

 In my field we are I/O bound, so I would like our tools to be as 
 fast as the file can be read.

 Thus I started a dummy benchmark which counts the number of lines.
 The result is compared to the wc -l command. The line counting is 
 only a pretext to evaluate the I/O; it could be replaced by any 
 other I/O processing. Thus we use the raw buffer as much as 
 possible instead of the byLine range. Moreover, such a range 
 implies that the buffer has already been read once before it is 
 ready to process.


 https://github.com/bioinfornatics/test_io

 Ideally I would like to process a shared buffer across multiple 
 cores and run a SIMD computation, but that is not done yet.
If you have scripts or enhancements to contribute, you are welcome. Current results show that the naïve implementation is at least twice as slow as wc, and up to 5 times slower for the parallel (//) scripts.
I will say from my experience with iopipe, the secret to counting lines is memchr. After switching to memchr to find single bytes as an optimization, I was beating Linux getline. Both use memchr, but getline does extra processing to ensure the FILE * state is maintained.

See https://github.com/schveiguy/iopipe/blob/6fa58b67bc9cadeb5ccded0d686f0fd116aed1ed/examples/byline/byline.d

If you run that like:

iopipe_byline -nooutput < filetocheck.txt

that's about as fast as I can get without using mmap, and should be comparable to wc -l. It should also work fine with all encodings (though only UTF-8 is optimized with memchr).

-Steve
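
P.S. For reference, the memchr trick on a raw buffer looks roughly 
like this (a standalone sketch, not the iopipe code itself):

import core.stdc.string : memchr;
import std.file : read;
import std.stdio : writeln;

// Count '\n' bytes by jumping from newline to newline with memchr
// instead of scanning byte by byte in D code.
size_t countLines(const(ubyte)[] buf)
{
    size_t n = 0;
    auto p = buf.ptr;
    auto remaining = buf.length;
    while (remaining)
    {
        auto hit = cast(const(ubyte)*) memchr(p, '\n', remaining);
        if (hit is null)
            break;
        ++n;
        remaining -= (hit - p) + 1;
        p = hit + 1;
    }
    return n;
}

void main(string[] args)
{
    auto data = cast(const(ubyte)[]) read(args[1]);
    writeln(countLines(data));
}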
Nov 11 2019
prev sibling parent reply Jon Degenhardt <jond noreply.com> writes:
On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
wrote:
 Dear all,

 In my field we are I/O bound, so I would like our tools to be as 
 fast as the file can be read.

 Thus I started a dummy benchmark which counts the number of lines.
 The result is compared to the wc -l command. The line counting is 
 only a pretext to evaluate the I/O; it could be replaced by any 
 other I/O processing. Thus we use the raw buffer as much as 
 possible instead of the byLine range. Moreover, such a range 
 implies that the buffer has already been read once before it is 
 ready to process.


 https://github.com/bioinfornatics/test_io

 Ideally I would like to process a shared buffer across multiple 
 cores and run a SIMD computation, but that is not done yet.
You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native phobos facilities to those in iopipe and some phobos covers in tsv-utils. Most tests are by-line based, as I'm interested in record oriented operations, but chunk-based copying is included.

A general observation is that if lines are involved, it's important to measure performance of both short and long lines. This may even affect 'wc' when reading by chunk or memory mapped files, see H. S. Teoh's observations on 'wc' performance: https://forum.dlang.org/post/mailman.664.1571878411.8294.digitalmars-d puremagic.com.

As an aside - My preliminary conclusion is that phobos facilities are overall quite good (based on tsv-utils comparative performance benchmarks), but are non-optimal when short lines are involved. This is the case for both input and output. Both the tsv-utils covers and iopipe are better, with iopipe being the best for input, but appears to need some further work on the output side (or I don't know iopipe well enough).

By "preliminary", I mean just that. There could certainly be mistakes or incomplete analysis in the tests.

--Jon
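
P.S. For reference, the simplest phobos by-line copy looks roughly 
like this (an illustrative sketch, not the actual dcat-perf code):

import std.stdio : stdin, stdout, KeepTerminator;

void main()
{
    // Copy standard input to standard output line by line, keeping the
    // newline so the output stays byte-identical to the input.
    foreach (line; stdin.byLine(KeepTerminator.yes))
        stdout.write(line);
}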
Nov 10 2019
parent reply Jonathan Marler <johnnymarler gmail.com> writes:
On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
 wrote:
 [...]
You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native phobos facilities to those in iopipe and some phobos covers in tsv-utils. Most tests are by-line based, as I'm interested in record oriented operations, but chunk-based copying is included. A general observation is that if lines are involved, it's important to measure performance of both short and long lines. This may even affect 'wc' when reading by chunk or memory mapped files, see H. S. Teoh's observations on 'wc' performance: https://forum.dlang.org/post/mailman.664.1571878411.8294.digitalmars-d puremagic.com. As an aside - My preliminary conclusion is that phobos facilities are overall quite good (based on tsv-utils comparative performance benchmarks), but are non-optimal when short lines are involved. This is the case for both input and output. Both the tsv-utils covers and iopipe are better, with iopipe being the best for input, but appears to need some further work on the output side (or I don't know iopipe well enough). By "preliminary", I mean just that. There could certainly be mistakes or incomplete analysis in the tests. --Jon
For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.
Nov 10 2019
next sibling parent Jon Degenhardt <jond noreply.com> writes:
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler 
wrote:
 On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt 
 wrote:
 On Saturday, 9 November 2019 at 23:39:09 UTC, bioinfornatics 
 wrote:
 [...]
You might also be interested in a similar I/O performance test I created: https://github.com/jondegenhardt/dcat-perf. This one is based on 'cat' (copy to standard output) rather than 'wc', as I'm interested in both input and output, but the general motivation is similar. I specifically wanted to compare native phobos facilities to those in iopipe and some phobos covers in tsv-utils. Most tests are by-line based, as I'm interested in record oriented operations, but chunk-based copying is included. [...]
For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.
Thanks, I wasn't aware of this. But perhaps I should describe the motivation in more detail.

I'm not actually interested in 'cat' per se, it is just a stand-in for the more general processing I'm typically interested in. In every case I'm operating on the records in some form (lines or something else), making a transformation, and depending on application, writing something out. This is the case in tsv-utils as well as many scenarios of the systems I work on (search engines). These applications sometimes operate on data streams, sometimes on complete files. Hence my interest in line-oriented I/O performance.

Obviously there is a lot more ground in the general set of applications I'm interested in than is covered in the simple performance tests in dcat-perf, but it's a starting point. It's also why I didn't make comparisons to existing versions of 'cat'.
Nov 10 2019
prev sibling next sibling parent reply sarn <sarn theartofmachinery.com> writes:
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler 
wrote:
 For "cat" I believe there is a system call to tell the kernel 
 to forward data from one file descriptor to the other, meaning 
 you could implement cat without ever mapping the data into 
 user-space at all. I'm sure this would be the fastest mechanism 
 to implement cat, and I've seen this system call used by a 
 version of cat somewhere out there.
FTR, that sounds like Linux's sendfile and splice syscalls. They're not portable, though.
Nov 10 2019
parent reply Jacob Carlborg <doob me.com> writes:
On 2019-11-11 02:04, sarn wrote:

 FTR, that sounds like Linux's sendfile and splice syscalls. They're not 
 portable, though.
"sendfile" is intended to send a file over a socket? -- /Jacob Carlborg
Nov 11 2019
next sibling parent Jonathan Marler <johnnymarler gmail.com> writes:
On Monday, 11 November 2019 at 19:36:22 UTC, Jacob Carlborg wrote:
 On 2019-11-11 02:04, sarn wrote:

 FTR, that sounds like Linux's sendfile and splice syscalls. 
 They're not portable, though.
"sendfile" is intended to send a file over a socket?
You could use it to send a file over a socket. However, it should be usable to forward data between any two file descriptors. I believe that `cat` uses it to forward a file handle to stdio, for example. Or you could use it to implement `cp` to copy content from one file to another.
Nov 11 2019
prev sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Monday, 11 November 2019 at 19:36:22 UTC, Jacob Carlborg wrote:
 On 2019-11-11 02:04, sarn wrote:

 FTR, that sounds like Linux's sendfile and splice syscalls. 
 They're not portable, though.
"sendfile" is intended to send a file over a socket?
It works with any file handle. I used it to implement cp and I have used it with pipes. Its only limitation is the 0x7ffff000 byte limit per call, but a 3-line loop takes care of that easily (see the sketch below).
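
Something along these lines (a sketch assuming 64-bit Linux, with the 
prototype declared by hand and error handling kept minimal):

import std.exception : errnoEnforce;
import std.stdio : File;

// Declared by hand: on 64-bit Linux both off_t and ssize_t are long.
extern (C) long sendfile(int out_fd, int in_fd, long* offset, size_t count);

void copyFile(string src, string dst)
{
    auto fin  = File(src, "rb");
    auto fout = File(dst, "wb"); // out_fd may be a regular file since Linux 2.6.33
    immutable total = cast(long) fin.size;
    long offset = 0;             // advanced by the kernel on each call
    while (offset < total)
    {
        // One call moves at most 0x7ffff000 bytes, hence the loop.
        immutable sent = sendfile(fout.fileno, fin.fileno, &offset,
                                  cast(size_t)(total - offset));
        errnoEnforce(sent > 0, "sendfile failed");
    }
}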
Nov 12 2019
prev sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler 
wrote:
 On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt 
 wrote:
 [...]
For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.
Looks like sendfile(), which, as said, is not portable. It exists on different Unixes but with different semantics. It also requires a bit of a workaround because of its limitations; on Linux it can only send at most 0x7ffff000 (2,147,479,552) bytes per call, for example.

I used it to implement a cp and it is indeed quite fast, and definitely easier to use than mmap, which is often very difficult to get right (I'm talking C here).
Nov 11 2019
parent ikod <geller.garry gmail.com> writes:
On Monday, 11 November 2019 at 10:14:51 UTC, Patrick Schluter 
wrote:
 On Sunday, 10 November 2019 at 20:33:35 UTC, Jonathan Marler 
 wrote:
 On Sunday, 10 November 2019 at 19:41:52 UTC, Jon Degenhardt 
 wrote:
 [...]
For "cat" I believe there is a system call to tell the kernel to forward data from one file descriptor to the other, meaning you could implement cat without ever mapping the data into user-space at all. I'm sure this would be the fastest mechanism to implement cat, and I've seen this system call used by a version of cat somewhere out there.
Looks like sendfile(), which, as said, is not portable. It exists on different Unixes but with different semantics. It also requires
There are more non-portable options for fast disk io - the O_DIRECT flag for open() [1] and readahead() [2].

1. http://man7.org/linux/man-pages/man2/open.2.html
2. http://man7.org/linux/man-pages/man2/readahead.2.html
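
For example, readahead() can be called from D roughly like this (a 
Linux-only sketch, prototype declared by hand, 64-bit assumed):

import std.stdio : File;

// readahead(2) is Linux-specific and not in the POSIX bindings,
// so declare it by hand (off64_t and ssize_t are long on 64-bit).
extern (C) long readahead(int fd, long offset, size_t count);

void main(string[] args)
{
    auto f = File(args[1], "rb");

    // Ask the kernel to populate the page cache with the whole file
    // so the reads that follow are more likely to be cache hits.
    readahead(f.fileno, 0, cast(size_t) f.size);

    // ... normal reads of f would follow here ...
}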
Nov 11 2019