digitalmars.D.learn - Reading a file eats whole memory

reply "Emil Wojak" <emil wojak.eu> writes:
Hi!

Could someone please explain why this code tries to eat my 1 GB memory and
gets killed by the kernel afterwards? Eventually it prints "Error: Out of
memory" when I set ulimit on memory prior to launching the program.

The code:
import std.stream;

int main(char [][] args) {
	Stream input=new File(args[0]);

	char[] data;
	input.read(data);
	input.close();
	return 0;
}

My intention was to read the executable itself, which is about 444 kB.
I'm running Linux, compiling with Digital Mars D Compiler v1.022
Oct 21 2007
next sibling parent reply div0 <div0 users.sourceforge.net> writes:
You are trying to read a string in, so I guess the routine is using the
first four bytes as a string length count. That's how tango works anyway,
IIRC.

-- 
My enormous talent is exceeded only by my outrageous laziness.
Oct 21 2007
parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"div0" <div0 users.sourceforge.net> wrote in message 
news:fffqid$1csk$1 digitalmars.com...
> You are trying to read a string in, so I guess the routine is using the
> 1st four bytes as a string length count. That's how tango works anyway
> IIRC.

You are precisely right. If you just want to get all the data in a file,
just do:

import std.file;

int main(char[][] args) {
    ubyte[] data = cast(ubyte[])std.file.read(args[0]);
    return 0;
}

Two things:

One, std.file.read returns a void[], which is a bit like D's equivalent of
a void* -- it can point to anything, but you can't modify its data, and it
also has a length which indicates the number of bytes in the data.

Two, I'm casting to ubyte[] instead of char[]. Do NOT use char[] for "plain
old data" as in C. char is a UTF-8 datatype, not a "one byte" datatype.
You'll most likely get errors unless your input file is all plain ASCII or
UTF-8 text. D provides the byte and ubyte types for raw byte data.
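
The length-prefix explanation also fits the "1 GB" symptom. A minimal sketch, written for a current D2 compiler (an assumption; the thread itself uses D1) and assuming the executable is a Linux ELF file read on a little-endian machine with a 4-byte length prefix: every ELF file begins with the bytes 0x7F 'E' 'L' 'F', and reading those four bytes as a length gives a value just over a billion.

```d
import std.stdio;

void main()
{
    // The first four bytes of any ELF file: 0x7F 'E' 'L' 'F'.
    immutable ubyte[4] magic = [0x7F, 0x45, 0x4C, 0x46];

    // Combine them as a little-endian 32-bit integer, least significant
    // byte first, which is how a 4-byte length prefix would be read on x86.
    uint len = cast(uint) magic[0]
             | (cast(uint) magic[1] << 8)
             | (cast(uint) magic[2] << 16)
             | (cast(uint) magic[3] << 24);

    // 0x464C457F = 1179403647, i.e. a request for roughly 1.1 GiB.
    writefln("length prefix = %s bytes", len);
}
```

So a reader that trusts those four bytes as a string length would try to allocate about 1.1 GiB before reading any data.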
Oct 21 2007
prev sibling parent reply Frank Benoit <keinfarbton googlemail.com> writes:
Others have already commented on the file reading.

Using args[0] to access the program's binary is not safe, because if the
program is called via the PATH variable, args[0] does not contain the path.
/proc/self/exe is a link to your executable.
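
A short sketch of that suggestion, written for a current D2 compiler (an assumption; the thread uses D1) and valid only on Linux, where /proc/self/exe exists. It reads the running executable as raw bytes with std.file.read, without relying on the program's arguments.

```d
import std.file;
import std.stdio;

void main()
{
    // Linux-specific: /proc/self/exe is a symlink to the running executable,
    // so this works no matter how the program was found on the PATH.
    auto data = cast(ubyte[]) std.file.read("/proc/self/exe");

    writefln("read %s bytes", data.length);
}
```

On D2, std.file.thisExePath() is a more portable way to obtain the executable's path.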
Oct 21 2007
parent "Emil Wojak" <emil wojak.eu> writes:
On 21-10-2007 at 17:45:54, Frank Benoit <keinfarbton googlemail.com> wrote:

Thank you, everyone, for your explanations. The test below confirms what you
wrote:

$ echo -en '\x03\x00\x00\x00abcdefgh' > string.dat

A test code:
-----------------
import std.stdio;
import std.stream;

int main(char [][] args) {
	Stream input=new File(args[1], FileMode.In);
	char[] data;
	input.read(data);
	writefln("data.length=", data.length, " data=", data);
	input.close();
	return 0;
}
-----------------
$ dmd test.d
$ ./test ./string.dat
data.length=3 data=abc

So the program reads 7 bytes: the array length (4 bytes) plus 3 bytes of data.
Switching the type of data to ubyte[5] makes the program read exactly 5 bytes
("\x03\x00\x00\x00a").
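
The layout observed in this test can be decoded by hand. The following is only an illustrative sketch for a current D2 compiler (an assumption; the test above was built with D1), with the same 12 bytes embedded as a literal instead of being read from string.dat. It assumes a 4-byte little-endian length prefix, as seen above.

```d
import std.stdio;
import std.string : representation;

void main()
{
    // The same 12 bytes that the echo command wrote to string.dat.
    immutable(ubyte)[] raw = "\x03\x00\x00\x00abcdefgh".representation;

    // First four bytes: the string length, least significant byte first.
    uint len = cast(uint) raw[0]
             | (cast(uint) raw[1] << 8)
             | (cast(uint) raw[2] << 16)
             | (cast(uint) raw[3] << 24);

    // The next len bytes are the string payload; the rest is left unread.
    auto text = cast(string) raw[4 .. 4 + len];

    writefln("data.length=%s data=%s", text.length, text); // data.length=3 data=abc
    writefln("bytes consumed=%s", 4 + len);                // bytes consumed=7
}
```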

> Using args[0] to access the program's binary is not safe, because if it is
> called via the PATH variable it does not contain the path.
> /proc/self/exe is a link to your executable.

Well, args[0] was just a quick and dirty test file; nevertheless, thanks for your hint :)
Oct 21 2007