digitalmars.D - Re: (Phobos - SocketStream) Am I doing something wrong or is this a


Zane <zane.sims gmail.com> writes:

Jesse Phillips Wrote:

 On Tue, 03 Nov 2009 20:05:17 -0500, Zane wrote:
 
 If I am to receive
 these in arbitrarily sized chunks for concatenation, I don't see a
 sensible way of constructing a loop.  Example?
 
 Zane

 
 You can use the number of bytes read to determine the slice, and use
 array slicing to concatenate into the final array.

Thanks Jesse,

Can you (or someone) confirm that this program should work?  I added a loop
with array slicing, but it does not seem to work for me.  The final output of
"num" is 17593, and a file of that size is created, but it is not a valid GIF
image.  The code is below (note that this assumes Google still has their
'big-bird' logo up :-P)

import std.stream;
import std.stdio;
import std.socket;
import std.socketstream;

import std.c.time;

int main()
{
	char[] line;
	ubyte[] data = new ubyte[17593];
	uint num = 0;

	TcpSocket socket = new TcpSocket(new InternetAddress("www.google.com", 80));

	socket.send("GET /logos/bigbird-hp.gif HTTP/1.0\r\n\r\n");

	SocketStream socketStream = new SocketStream(socket);
	
	while(!socketStream.eof)
	{
		line = socketStream.readLine();

		if (line=="")
			break;

		writef("%s\n", line);
	}
	
	num = socketStream.readBlock(data.ptr, 17593);
	writef("\n\nNum: %d\n", num);

	while(num < 17593)
	{
		num += socketStream.readBlock(data[(num-1)..length].ptr, data.length-num);
		writef("\n\nNum: %d\n", num);
	}

	socketStream.close;
	socket.close;

	File file = new File("logo.gif", FileMode.Out);
	file.write(data);
	file.close;

	return 0;
}

Thanks for everyone's help so far!

Nov 04 2009

BCS <none anon.com> writes:

Hello Zane,

 while(num < 17593)
 {
 num += socketStream.readBlock(data[(num-1)..length].ptr,
 data.length-num);
 writef("\n\nNum: %d\n", num);
 }

That probably will work but I wouldn't use readBlock. If you are going to 
go with a low level 2 arg function, look at readExact.
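
For readers following along, the quoted loop can be sketched in isolation. This is a minimal illustration in present-day D, not the 2009 std.stream API: `ChunkySource` and `fill` are hypothetical names, and the source simply hands out a few bytes per call the way a socket read may return early. The point is the slice offset: the next chunk belongs at index `num`, not `num - 1`.

```d
import std.algorithm.comparison : min;

/// Hypothetical stand-in for a stream: hands out at most `maxChunk`
/// bytes per call, like a socket read that returns early.
struct ChunkySource
{
    const(ubyte)[] remaining;
    size_t maxChunk;

    /// Copies up to dest.length bytes and returns the count (0 at end of data).
    size_t readBlock(ubyte[] dest)
    {
        immutable n = min(dest.length, maxChunk, remaining.length);
        dest[0 .. n] = remaining[0 .. n];
        remaining = remaining[n .. $];
        return n;
    }
}

/// Keeps reading until `data` is full or the source runs dry.
/// Returns the number of bytes actually stored.
size_t fill(ref ChunkySource src, ubyte[] data)
{
    size_t num = 0;
    while (num < data.length)
    {
        // The next write starts at index num, not num - 1.
        immutable got = src.readBlock(data[num .. $]);
        if (got == 0)
            break; // end of stream: stop instead of looping forever
        num += got;
    }
    return num;
}

void main()
{
    ubyte[] payload = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    auto src = ChunkySource(payload, 3);
    auto data = new ubyte[payload.length];
    assert(fill(src, data) == payload.length);
    assert(data == payload);
}
```

With `num - 1` as the start of the slice, every chunk after the first begins one byte early and overwrites the last byte of the previous chunk, so the byte count can still add up to 17593 while the contents are shifted.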

Nov 04 2009

Travis Boucher <boucher.travis gmail.com> writes:

There are a few issues with your implementation.

First, parse the headers properly.  See my trivial implementation below.
You want to parse them properly so you can find the correct
end-of-headers, and check the size of the content from the headers.

readLine() looks to be designed for a text-based protocol.  The biggest
issue is with the end-of-line detection.  "\r", "\n" and "\r\n" are all
valid end-of-line combinations and it doesn't seem to do the detection
in a greedy manner.  This leaves us with a trailing '\n' at the end of
the headers.
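
One way to sidestep the line-ending ambiguity, sketched here under assumptions, is to look for the blank line ("\r\n\r\n") in the raw bytes and split there, so that nothing from the body is consumed by a line reader. The sketch uses present-day D and an in-memory sample rather than the std.stream API; `splitResponse` and the sample bytes are hypothetical.

```d
import std.algorithm.searching : findSplit;
import std.string : representation;

/// Splits a raw HTTP response into header text and payload bytes at the
/// first blank line. Returns false if the blank line has not arrived yet.
bool splitResponse(immutable(ubyte)[] raw, out string head,
                   out immutable(ubyte)[] payload)
{
    auto parts = raw.findSplit("\r\n\r\n".representation);
    if (parts[1].length == 0)
        return false; // separator not found: keep reading from the socket
    head = cast(string) parts[0];
    payload = parts[2];
    return true;
}

void main()
{
    // Hypothetical response; real bytes would come off the socket.
    auto raw = "HTTP/1.0 200 OK\r\nContent-Length: 3\r\n\r\nGIF".representation;

    string head;
    immutable(ubyte)[] payload;
    assert(splitResponse(raw, head, payload));
    assert(head == "HTTP/1.0 200 OK\r\nContent-Length: 3");
    assert(payload == "GIF".representation);
}
```

Because the split works on bytes, a payload that happens to begin with '\n' is left untouched, which a "drop the next byte if it is a newline" check cannot guarantee.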

The implementation of readBlock() doesn't seem to really wait to fill
the buffer.  It fills the buffer if it can, which is pretty standard
for a read on a socket.  So wrap it in a loop and read chunks.  You want
to do it this way anyway for many reasons.  The implementation below
double-buffers, which does result in an extra copy.  Although logically
this seems like a pointless copy, in a real application it is very
useful for many reasons.

Below is a working version (but still has its own issues).



import std.stream;
import std.stdio;
import std.socket;
import std.socketstream;

import std.string;		// for header parsing
import std.conv;		// for toInt

import std.c.time;

int main()
{
	char[] line;
	ubyte[] data;
	uint num = 0;

	TcpSocket socket = new TcpSocket(new InternetAddress("www.google.com", 80));

	socket.send("GET /logos/bigbird-hp.gif HTTP/1.0\r\n\r\n");

	SocketStream socketStream = new SocketStream(socket);
	
	string[] response;	// Holds the lines in the response
	while(!socketStream.eof)
	{
		line = socketStream.readLine();

		if (line=="")
			break;

		// Append this line to array of response lines
		response ~= line;
	}

	// Due to how readLine() works, we might end up with a
	// trailing \n, so get rid of it if we do.
	ubyte ncr;
	socketStream.read(ncr);
	if (ncr != '\n')
		data ~= ncr;


	// D's builtin associative arrays (safe & easy hashtables!)
	string[char[]] headers;	
	
	// Parse the HTTP response.  NOTE: This is a REALLY bad HTTP
	// parser. a real parser would handle header parsing properly.
	// See RFC2616 for proper rules.
	foreach (v; response) {
		// There is likely a better way to do this than
		// a join(split()).  Rejoin with the same ": " separator
		// so values that contain it are restored unchanged.
		string[] kv_pair = split(v, ": ");
		headers[tolower(kv_pair[0])] = join(kv_pair[1 .. $], ": ");
	}

	foreach (k, v; headers)
		writefln("[%s] [%s]", k, v);

	uint size;
	if (isNumeric(headers["content-length"])) {
		size = toInt(headers["content-length"]);
	} else {
		writefln("Unable to parse content length of '%s' to a number.",
			headers["content-length"]);
		return 0;
	}
	// This fully buffers the data; if you are fetching large files you
	// should process them in chunks rather than in one big buffer.  Also,
	// this does not handle chunked encoding, see RFC2616 for details.
	while (data.length < size && !socketStream.eof) {
		ubyte[4096] buffer;
		num = socketStream.readBlock(buffer.ptr, 4096); // read 4k at a time
		writefln("Read %s bytes [%s/%s] (%s%%)",
			num, data.length, size, (cast(float)data.length/cast(float)size)*100);

		// Process the buffer, in this case just copy it to the data
		// buffer.  This double buffering process may seem bad, but
		// has the advantage of allowing you to thread around data,
		// process the buffer in chunks, etc.
		data ~= buffer[0..num];
	}

	socketStream.close;
	socket.close;

	// It might be worthwhile to chunk this as well in some cases.
	File file = new File("logo.gif", FileMode.Out);
	file.write(data);
	file.close;

	return 0;
}
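
As a possible alternative to the join(split()) step in the parser above, the header lines can be cut at the first colon. This is only a sketch in present-day D with made-up sample headers; `parseHeaders` is a hypothetical helper, and it is still far from a complete RFC2616 parser (no folded lines, no repeated headers).

```d
import std.conv : to;
import std.string : indexOf, strip, toLower;

/// Parses header lines of the form "Name: value" into a map keyed by
/// lower-cased name. Lines without a colon (such as the status line)
/// are skipped.
string[string] parseHeaders(string[] lines)
{
    string[string] headers;
    foreach (line; lines)
    {
        immutable colon = line.indexOf(':');
        if (colon < 0)
            continue;
        // Only the first colon separates name from value, so values
        // that contain colons (dates, URLs) stay intact.
        headers[line[0 .. colon].strip.toLower] = line[colon + 1 .. $].strip;
    }
    return headers;
}

void main()
{
    auto headers = parseHeaders([
        "HTTP/1.0 200 OK",
        "Content-Type: image/gif",
        "Content-Length: 17593",
        "Date: Wed, 04 Nov 2009 12:00:00 GMT",
    ]);
    assert(headers["content-type"] == "image/gif");
    assert(headers["date"] == "Wed, 04 Nov 2009 12:00:00 GMT");

    // `in` yields a pointer, so a missing header does not throw.
    if (auto len = "content-length" in headers)
        assert((*len).to!size_t == 17593);
}
```

Looking the key up with `in` before converting also covers responses that carry no Content-Length at all, in which case the body simply runs until the server closes the connection.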

Nov 04 2009

Zane <zane.sims gmail.com> writes:

Hey Travis,

I cannot begin to thank you enough for taking the time to explain this in such
an exhaustive example!  One reason I like the D community...I can ask
ridiculously simple questions and still people humor me with answers and take
the time to care.  I was able to put together a working app based on your
example.  The only part of your example I don't understand, and that piques my
curiosity, is the buffer.  Why the 4096 size?  I guess what I don't understand
is how I know what buffer size I should use, and if that is the best buffer
size, why?  I hope that makes sense.

Thanks again,
Zane

Nov 05 2009

Travis Boucher <boucher.travis gmail.com> writes:

4k is the standard page size on x86 (well, on most machines actually), so
I got in the habit of using it.  Normally I'd make sure it was page
aligned as well (especially for output buffers), since under certain
conditions and configurations FreeBSD can turn a socket write of a 4k
page-aligned buffer into a zero-copy operation.

Normally when you do a write(), the kernel will copy the data into its
own buffers for processing (so you can't mess with it before it is
actually sent to the network).

FreeBSD (and possibly others) can do some page table trickery to mark
the page read-only (well, copy-on-write), and when the time comes to
send the data to the network card, if the page hasn't been modified,
it'll send it straight from your (user space) buffer.

Anyway, it doesn't have to be 4k, or a multiple of 4k; in fact, under
certain configurations you may want it to be smaller (or larger).  In
this case it doesn't really matter, since using a socket stream is going
to result in a mess of copies anyway (network card -> kernel buffer ->
socket stream buffer -> your buffer).

Nov 04 2009
