digitalmars.D - SocketStream

Bob <Bob_member pathlink.com> writes:
Tried to convert one of my scriptlets to D:

I was trying to use SocketStream to capture HTML pages, because I am certain
that D is much faster at parsing the contents than JavaScript.

Unfortunately, I have not found any way to detect when the HTML page has been
fully received.

"eof()" returns TRUE only when the connection to the server is terminated, which
usually takes many times longer than receiving the contents.

"available()" always returns 0 bytes, which is also no help.

Some HTTP headers do mention the content size, but this is not always the case.
I could look for [/HTML] tags but some documents contain none and others have
multiple [/HTML] tags.

Is there any solution for the program to know when the document has been fully
loaded other than waiting for eof()?

Thanks.
Jan 18 2005
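
For reference, HTTP/1.1 itself defines where a response body ends, so the
client need not wait for eof() or search for closing tags: a
"Transfer-Encoding: chunked" body ends at its zero-size chunk, otherwise a
Content-Length header gives the exact byte count, and only when neither is
present does the body run until the server closes the connection. The sketch
below is in Python rather than D because the logic is protocol-level; all
function names are hypothetical, error handling is omitted, and body-less
responses (HEAD, 1xx, 204, 304) are not handled.

```python
def split_head(buf: bytes):
    """Return (header_lines, body_so_far), or None if the header block is incomplete."""
    head, sep, body = buf.partition(b"\r\n\r\n")
    if not sep:
        return None
    return head.split(b"\r\n"), body


def parse_headers(lines):
    """Map lower-cased header names to values, skipping the status line."""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers


def chunked_complete(body: bytes) -> bool:
    """True once the zero-size chunk and the blank line after it have arrived."""
    pos = 0
    while True:
        eol = body.find(b"\r\n", pos)
        if eol < 0:
            return False                      # chunk-size line not complete yet
        # Chunk size is hexadecimal; any ";extension" after it is ignored.
        size = int(body[pos:eol].split(b";")[0].strip().decode("ascii"), 16)
        if size == 0:
            # Last chunk: optional trailer fields, then an empty line.
            return body.find(b"\r\n\r\n", eol) >= 0
        pos = eol + 2 + size + 2              # skip chunk data and its CRLF
        if pos > len(body):
            return False                      # chunk data still arriving


def response_complete(buf: bytes, connection_closed: bool = False) -> bool:
    """Decide whether buf (everything received so far) holds a full response."""
    parts = split_head(buf)
    if parts is None:
        return False
    lines, body = parts
    headers = parse_headers(lines)
    # Chunked transfer coding takes precedence over Content-Length.
    if b"chunked" in headers.get(b"transfer-encoding", b"").lower():
        return chunked_complete(body)
    if b"content-length" in headers:
        return len(body) >= int(headers[b"content-length"])
    # No framing information: the body ends when the server closes.
    return connection_closed
```

A client would call response_complete() after each read and stop as soon as it
returns True. The long wait for eof() described above is consistent with a
keep-alive connection; sending "Connection: close" with the request is one
common way to ask the server to close promptly, which also covers responses
that carry neither header.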
Kris <Kris_member pathlink.com> writes:
Yes, there is a solution: you might consider using Mango (over at dsource.org)
instead, since it has a fully operational HTTP server and Servlet wrapper. You
can grab whatever headers you need at either level (mango.http.server, or
mango.http.servlet). Take a look at some of the examples to get started.

If you're building a client rather than a server, then you might consider
mango.http.client instead -- it also gives you access to all the headers.

Oh, and the server is rather fast: once operational, it doesn't allocate any
memory at all -- so the GC is never active (except for allocations made within
your own code).

There was a test done on Gentoo Linux, with a 1.4 GHz Pentium-M running both the
server and a single client feeding it with requests: I recall it was completing
~3500 requests per second, and half of the CPU was eaten by the client portion.


Jan 18 2005
Bob <Bob_member pathlink.com> writes:
Quite interesting project. Thanks for your info.
Doing some tests now ...


Jan 18 2005
Ben Hinkle <Ben_member pathlink.com> writes:
I don't know the answer to your question, but if you have ideas to improve
std.socketstream, don't hesitate to try them out, post and/or email them to
Walter.

One thing I see glancing over the code is that it doesn't take advantage of the
API to readLine that accepts an input buffer. That would improve performance if
that turns out to be a problem. Also, using a BufferedStream might help. It
could probably use a fresh look to see what needs updating. On the other hand,
Mango is also an option, as Kris mentioned.

In terms of knowing when the content ends, I think you've answered your own
question: either wait for eof or bail at /html. But that's my naive guess.

-Ben
Jan 18 2005
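
The buffering idea from the last post (read into a buffer supplied once, rather
than allocating on every read) can be illustrated outside D. This is a minimal
Python sketch of the general technique, not the std.socketstream or
BufferedStream API: iter_lines is a hypothetical helper, and recv_into stands in
for a read call that fills a caller-supplied buffer. Each yielded line is still
a fresh bytes object, so only the receive buffer is reused.

```python
import socket


def iter_lines(sock, bufsize=4096):
    """Yield lines (line ending removed) from sock, reusing one receive buffer."""
    buf = bytearray(bufsize)          # allocated once, refilled on every read
    view = memoryview(buf)
    pending = bytearray()             # bytes of a line that is not complete yet
    while True:
        n = sock.recv_into(view)      # fills buf in place, returns byte count
        if n == 0:                    # peer closed the connection
            break
        pending += view[:n]
        while True:
            i = pending.find(b"\n")
            if i < 0:
                break
            yield bytes(pending[:i]).rstrip(b"\r")
            del pending[:i + 1]
    if pending:                       # trailing data without a final newline
        yield bytes(pending)
```

For example, a header block could be consumed with
"for line in iter_lines(sock): ..." and the loop stopped at the first empty
line, which marks the end of the HTTP headers.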