digitalmars.D.learn - Why GNU coreutils/dd is creating a dummy file more efficiently than
- BoQsc (16/16) May 23 2019 This code of D creates a dummy 47,6 MB text file filled with Nul
- Cym13 (16/32) May 23 2019 If you're talking about benchmarking it's important to provide
- kdevel (25/33) May 23 2019 His code doesn't write 1 byte at a time either. strace on my
- H. S. Teoh (11/18) May 23 2019 If you're on Linux, writing a bunch of zeroes just to create a large
- Daniel Kozák (4/23) May 23 2019 Yes using sparse files is good, but only for this case. If you
- Daniel Kozak (3/19) May 23 2019 https://matthias-endler.de/2017/yes/
- Daniel Kozak (17/17) May 23 2019 On Thu, May 23, 2019 at 11:19 PM Daniel Kozak wrote:
- Daniel Kozak (15/18) May 23 2019 So this should do it
This D code creates a dummy 47.6 MB text file filled with NUL characters in about 9 seconds:

    import std.stdio, std.process;

    void main()
    {
        writeln("Creating a dummy file");
        File file = File("test.txt", "w");
        for (int i = 0; i < 50000000; i++)
        {
            file.write("\x00");
        }
        file.close();
    }

Meanwhile, GNU coreutils dd can create a 500 MB dummy NUL file in a second.
https://github.com/coreutils/coreutils/blob/master/src/dd.c

What explains this?
May 23 2019
On Thursday, 23 May 2019 at 09:09:05 UTC, BoQsc wrote:
> This D code creates a dummy 47.6 MB text file filled with NUL characters in about 9 seconds:
>
>     import std.stdio, std.process;
>
>     void main()
>     {
>         writeln("Creating a dummy file");
>         File file = File("test.txt", "w");
>         for (int i = 0; i < 50000000; i++)
>         {
>             file.write("\x00");
>         }
>         file.close();
>     }
>
> Meanwhile, GNU coreutils dd can create a 500 MB dummy NUL file in a second.
> https://github.com/coreutils/coreutils/blob/master/src/dd.c
>
> What explains this?

If you're talking about benchmarking, it's important to provide both the source code and how you use/compile it. However, in this case I think I can already point you in the right direction.

I'll suppose that you used something like this:

    dd if=/dev/zero of=testfile bs=1M count=500

Note in particular the blocksize argument. I set it to 1M, but by default it's 512 bytes. If you use strace with the command above you'll see a series of write() calls, each writing 1M of null bytes to testfile. That's the main difference between your code and what dd does: it doesn't write 1 byte at a time. This results in far fewer system calls, and system calls are very expensive.

To go fast, read/write bigger chunks.

I may be wrong though, maybe you tested with a bs of 1 byte, so test for yourself and if necessary provide all the information and not just pieces, so that we are able to reproduce your test :)
May 23 2019
On Thursday, 23 May 2019 at 09:44:15 UTC, Cym13 wrote:
[...]
> Note in particular the blocksize argument. I set it to 1M, but by default it's 512 bytes. If you use strace with the command above you'll see a series of write() calls, each writing 1M of null bytes to testfile. That's the main difference between your code and what dd does: it doesn't write 1 byte at a time.

His code doesn't write 1 byte at a time either. strace on my machine reports a blocksize of 4096. If I use this blocksize with dd, it still takes only a fraction of a second to complete.

> This results in far fewer system calls, and system calls are very expensive.

His program and dd with bs=4K both have the same number of syscalls.

> To go fast, read/write bigger chunks.

Or use rawWrite instead of write (reduces the runtime to about 1.6 s). When using write, the time is IMHO spent in Unicode processing and/or locking. Or write more characters at a time. The code below takes 60 ms to complete.

y.d
```
import std.stdio, std.process;

void main()
{
    writeln("Creating a dummy file");
    File file = File("test.txt", "w");
    ubyte[4096] nuls; // static arrays of ubyte are zero-initialized
    // Note: the integer division drops the last 50_000_000 % 4096 = 128 bytes.
    for (int i = 0; i < 50_000_000 / nuls.sizeof; ++i)
        file.write(cast(char[nuls.sizeof]) nuls); // one 4096-character write per iteration
    file.close();
}
```
May 23 2019
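To make the comparison in the post above easy to reproduce, here is a minimal sketch that times three of the strategies mentioned: `write` one character at a time, `rawWrite` one byte at a time, and `rawWrite` in 4096-byte chunks. The helper names `writeZeros` and `writeZerosFormatted`, the scratch file name, and the reduced total of 5_000_000 bytes are assumptions made for illustration, not code from the thread. Timings will differ by machine, compiler, and flags, so treat the output as indicative only.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : getSize, remove;
import std.stdio : File, writefln;

/// Writes `total` NUL bytes to `path`, at most `chunk` bytes per rawWrite call.
void writeZeros(string path, size_t total, size_t chunk)
{
    assert(chunk > 0);
    auto buf = new ubyte[](chunk);      // newly allocated ubyte arrays are all zeros
    auto file = File(path, "wb");
    size_t remaining = total;
    while (remaining > 0)
    {
        immutable n = remaining < chunk ? remaining : chunk;
        file.rawWrite(buf[0 .. n]);     // the last call writes the shorter tail
        remaining -= n;
    }
    file.close();
}

/// Writes `total` NUL characters one at a time through the formatting `write`.
void writeZerosFormatted(string path, size_t total)
{
    auto file = File(path, "w");
    foreach (i; 0 .. total)
        file.write("\x00");
    file.close();
}

void main()
{
    enum total = 5_000_000;             // assumption: a tenth of the thread's size
    enum path = "zeros_demo.bin";       // hypothetical scratch file

    auto sw = StopWatch(AutoStart.yes);
    writeZerosFormatted(path, total);
    writefln("write, 1 char per call:     %s ms", sw.peek.total!"msecs");

    sw.reset();
    writeZeros(path, total, 1);
    writefln("rawWrite, 1 byte per call:  %s ms", sw.peek.total!"msecs");

    sw.reset();
    writeZeros(path, total, 4096);
    writefln("rawWrite, 4096 bytes/call:  %s ms", sw.peek.total!"msecs");

    assert(getSize(path) == total);     // unlike a truncating loop, the tail is kept
    remove(path);
}
```

Unlike a loop that runs `total / chunk` times, `writeZeros` also writes the remainder, so the file ends up with exactly `total` bytes.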
On Thu, May 23, 2019 at 06:20:23PM +0000, kdevel via Digitalmars-d-learn wrote:
> On Thursday, 23 May 2019 at 09:44:15 UTC, Cym13 wrote:
[...]
> > To go fast, read/write bigger chunks.
>
> Or use rawWrite instead of write (reduces the runtime to about 1.6 s). When using write time is IMHO spent in unicode processing and/or locking. Or write more characters at a time. The code below takes 60 ms to complete.

If you're on Linux, writing a bunch of zeroes just to create a large file is a waste of time. Just use the kernel's sparse file feature:

https://www.systutorials.com/136652/handling-sparse-files-on-linux/

The blocks won't actually get allocated until you write something to them, so this beats any write-based method of creating a file filled with zeroes -- probably by several orders of magnitude. :-P

T

--
It is not the employer who pays the wages. Employers only handle the money. It is the customer who pays the wages. -- Henry Ford
May 23 2019
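For readers who want to try the sparse-file idea from D, here is a minimal sketch that seeks to the last byte offset and writes a single byte there. This is one common way to create a file with a hole and is not necessarily the method the linked article uses. The helper name `createWithHole` and the scratch file name are hypothetical. Whether the skipped range really stays unallocated depends on the operating system and filesystem, so the sketch only checks the logical size.

```d
import std.file : getSize, remove;
import std.stdio : File, writeln;

/// Creates `path` with a logical size of `size` bytes by seeking past the
/// start and writing one byte at the final offset. On filesystems that
/// support sparse files, the skipped range is typically left unallocated.
void createWithHole(string path, long size)
{
    assert(size > 0);
    auto file = File(path, "wb");
    ubyte[1] last;                  // a single zero byte
    file.seek(size - 1);            // move to the last byte offset
    file.rawWrite(last[]);          // the only real write; it extends the file
    file.close();
}

void main()
{
    enum path = "sparse_demo.bin";  // hypothetical scratch file
    createWithHole(path, 50_000_000);
    writeln("logical size: ", getSize(path), " bytes");
    remove(path);
}
```

On Linux, comparing `ls -l` with `du` on such a file is a quick way to see whether the filesystem actually left the hole unallocated.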
On Thursday, 23 May 2019 at 18:37:17 UTC, H. S. Teoh wrote:
> On Thu, May 23, 2019 at 06:20:23PM +0000, kdevel via Digitalmars-d-learn wrote:
> > On Thursday, 23 May 2019 at 09:44:15 UTC, Cym13 wrote:
> [...]
> > > To go fast, read/write bigger chunks.
> >
> > Or use rawWrite instead of write (reduces the runtime to about 1.6 s). When using write time is IMHO spent in unicode processing and/or locking. Or write more characters at a time. The code below takes 60 ms to complete.
>
> If you're on Linux, writing a bunch of zeroes just to create a large file is a waste of time. Just use the kernel's sparse file feature:
>
> https://www.systutorials.com/136652/handling-sparse-files-on-linux/
>
> The blocks won't actually get allocated until you write something to them, so this beats any write-based method of creating a file filled with zeroes -- probably by several orders of magnitude. :-P
>
> T

Yes, using sparse files is good, but only for this case. If you need to write something other than null bytes, it is not so usable. And AFAIK not all filesystems support this anyway.
May 23 2019
On Thu, May 23, 2019 at 11:10 AM BoQsc via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
> This code of D creates a dummy 47,6 MB text file filled with Nul characters in about 9 seconds
>
>     import std.stdio, std.process;
>
>     void main()
>     {
>         writeln("Creating a dummy file");
>         File file = File("test.txt", "w");
>         for (int i = 0; i < 50000000; i++)
>         {
>             file.write("\x00");
>         }
>         file.close();
>     }
>
> While GNU coreutils dd can create 500mb dummy Nul file in a second.
> https://github.com/coreutils/coreutils/blob/master/src/dd.c
>
> What are the explanations for this?

https://matthias-endler.de/2017/yes/
May 23 2019
On Thu, May 23, 2019 at 11:19 PM Daniel Kozak <kozzi11 gmail.com> wrote:

Fixed version without decoding to dchar:

    void main()
    {
        import std.range : array, cycle, take;
        import std.stdio;
        import std.utf;

        immutable buf_size = 8192;
        // byCodeUnit iterates the string as chars instead of decoded dchars
        immutable buf = "\x00".byCodeUnit.cycle.take(buf_size).array;
        auto cnt = 50_000_000 / buf_size;
        // remainder that does not fill a whole buffer
        immutable tail = "\x00".byCodeUnit.cycle.take(50_000_000 % buf_size).array;
        File file = File("test.txt", "w");
        while (cnt--)
            file.rawWrite(buf);
        file.rawWrite(tail);
    }
May 23 2019
On Thu, May 23, 2019 at 11:06 PM Daniel Kozak <kozzi11 gmail.com> wrote:
> On Thu, May 23, 2019 at 11:10 AM BoQsc via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
>
> https://matthias-endler.de/2017/yes/

So this should do it:

    void main()
    {
        import std.range : array, cycle, take;
        import std.stdio;

        immutable buf_size = 8192;
        // Caveat: iterating a string as a range decodes to dchar, so this
        // array holds 4-byte dchar elements; the "Fixed version" post in this
        // thread uses byCodeUnit to avoid that.
        immutable buf = "\x00".cycle.take(buf_size).array;
        auto cnt = 50_000_000 / buf_size;
        immutable tail = "\x00".cycle.take(50_000_000 % buf_size).array;
        File file = File("test.txt", "w");
        while (cnt--)
            file.rawWrite(buf);
        file.rawWrite(tail);
    }
May 23 2019