digitalmars.D.bugs - [Issue 3763] New: std.stdio.readlnImpl absurdly inefficient
- d-bugmail puremagic.com (43/43) Feb 01 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (12/12) Feb 01 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (20/20) Feb 01 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (11/29) Feb 01 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (6/6) Feb 03 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (11/11) Feb 21 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
- d-bugmail puremagic.com (10/10) Mar 08 2010 http://d.puremagic.com/issues/show_bug.cgi?id=3763
http://d.puremagic.com/issues/show_bug.cgi?id=3763
Summary: std.stdio.readlnImpl absurdly inefficient
Product: D
Version: 2.040
Platform: Other
OS/Version: Windows
Status: NEW
Keywords: performance
Severity: normal
Priority: P2
Component: Phobos
AssignedTo: nobody puremagic.com
ReportedBy: dsimcha yahoo.com
Apparently stdio.readln() copies the data it reads a ridiculous number of times
and performs a ridiculous amount of heap allocations. The following program
uses **gigabytes** of RAM. This issue came to my attention while reading in a
huge file in the presence of large associative arrays that contained lots of
false pointers.
import std.stdio, core.memory;
void main() {
// Write 512 kilobytes out to a file on a single line.
auto writeHandle = File("foo.txt", "wb");
auto contents = new char[512 * 1024];
contents[] = 'a';
writeHandle.writeln(contents);
writeHandle.close;
contents = null;
GC.collect();
// Read it back with the GC disabled.
GC.disable;
auto readHandle = File("foo.txt");
auto firstLine = readHandle.readln();
stderr.writeln("Check task manager for memory usage, then press enter.");
stdin.readln();
}
Under a sane allocation scheme (geometric growth of the buffer), this program
would only use about 1 MB of RAM plus overhead for stack space, etc., even with
the GC disabled. (Derivation: 512 * 1024 == 2^19. 2^0 + 2^1 + ... + 2^19 ==
2^20 - 1.)
--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763
Andrei Alexandrescu <andrei metalanguage.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |andrei metalanguage.com
17:04:56 PST ---
Since you are on the dev team, could you please look into this? I take it it
must be the array appends that are the culprit. If you're testing on Windows, I
think the stuff is under the DIGITAL_MARS_STDIO version.
--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763 Well, the problem is pretty clear. It's in std.stdio.readlnImpl(). In the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block, we basically do the following in pseudocode: buf = readNext64Bytes(); buf ~= readRest(); // Recurse. We're effectively prepending to our result in 64-byte increments. Therefore, buf is reallocated once for every 64 bytes once we hit unbuffered I/O. Also note that the use of O(N) stack space causes stack overflows for very long lines (around 700 KB+). The problem is that, from looking at the rest of the code, all the other routines for different OS's and I/O libs are implemented the obvious way, using plain old array appending, which makes me believe that this one is different for a (unknown to me and probably relatively obscure) reason. Why was the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block coded in such an odd way in the first place? -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 01 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763 22:30:44 PST ---Well, the problem is pretty clear. It's in std.stdio.readlnImpl(). In the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block, we basically do the following in pseudocode: buf = readNext64Bytes(); buf ~= readRest(); // Recurse.That's Walter's code :o).We're effectively prepending to our result in 64-byte increments.Ouch.Therefore, buf is reallocated once for every 64 bytes once we hit unbuffered I/O. Also note that the use of O(N) stack space causes stack overflows for very long lines (around 700 KB+). The problem is that, from looking at the rest of the code, all the other routines for different OS's and I/O libs are implemented the obvious way, using plain old array appending, which makes me believe that this one is different for a (unknown to me and probably relatively obscure) reason. Why was the unbuffered I/O routine in the DIGITAL_MARS_STDIO version block coded in such an odd way in the first place?I recall Walter wrote that code, and he wrote it in a hurry. We were under deadline pressure. I personally find that code extremely bulky and difficult to follow, and would like to see it simplified. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 01 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763 17:37:42 PST --- Love the summary, hate the code. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 03 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763
David Simcha <dsimcha yahoo.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
Changeset 1429.
--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 21 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3763
Walter Bright <bugzilla digitalmars.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bugzilla digitalmars.com
22:28:15 PST ---
Fixed dmd 2.041
--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Mar 08 2010









d-bugmail puremagic.com 