digitalmars.D.learn - Reading web pages

Xan xan <xancorreu gmail.com> writes:
Hi,

I want to write a simple script that fetches the contents of a URL as a string in D 2.0.
I have this code:

//D 2.0
//gdmd-4.6
import std.stdio, std.string, std.conv, std.stream;
import std.socket, std.socketstream;

int main(string[] args)
{
    if (args.length < 2) {
        writeln("Usage:");
        writeln("   ./aranya {<url1>, <url2>, ...}");
        return 0;
    }
    else {
        foreach (a; args[1..$]) {
            Socket sock = new TcpSocket(new InternetAddress(a, 80));
            scope(exit) sock.close();
            Stream ss = new SocketStream(sock);
            ss.writeString("GET" ~ a ~ " HTTP/1.1\r\n");
            writeln(ss);
        }
        return 0;
    }
}


But when I run it, I get:
$ ./aranya http://www.google.com
std.socket.AddressException ../../../src/libphobos/std/socket.d(697):
Unable to resolve host 'http://www.google.com'

What is going wrong?

Thanks in advance,
Xan.
Jan 19 2012
Timon Gehr <timon.gehr gmx.ch> writes:
The protocol prefix (http://) belongs in the GET request, not in the host name that is passed to InternetAddress. Running ./aranya www.google.com does actually connect to Google. (It still does not work fully: I get back 400 Bad Request, but maybe you can figure it out.)
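
For reference, here is a minimal sketch of a request that a server is more likely to accept. It assumes the 400 comes from the request text itself: "GET" ~ a leaves no space after GET, HTTP/1.1 expects a Host header, and the header block has to end with an empty line. The helpers splitUrl and buildRequest are hypothetical names, not something from the original program, and the sketch uses only std.socket (plain HTTP on port 80, no redirects, no HTTPS).

```d
import std.socket : InternetAddress, TcpSocket;
import std.stdio : stdout;
import std.string : indexOf;

/// Host and path of an HTTP URL (hypothetical helper type).
struct UrlParts { string host; string path; }

/// Strips an optional "http://" prefix and splits the rest at the first '/'.
UrlParts splitUrl(string url)
{
    enum prefix = "http://";
    if (url.length >= prefix.length && url[0 .. prefix.length] == prefix)
        url = url[prefix.length .. $];
    auto slash = url.indexOf('/');
    if (slash < 0)
        return UrlParts(url, "/");
    return UrlParts(url[0 .. slash], url[slash .. $]);
}

/// Builds a minimal HTTP/1.1 GET request, terminated by an empty line.
string buildRequest(UrlParts u)
{
    return "GET " ~ u.path ~ " HTTP/1.1\r\n"
         ~ "Host: " ~ u.host ~ "\r\n"
         ~ "Connection: close\r\n"
         ~ "\r\n";
}

void main(string[] args)
{
    foreach (url; args[1 .. $])
    {
        auto parts = splitUrl(url);
        // Only the host name is resolved; the path goes into the request line.
        auto sock = new TcpSocket(new InternetAddress(parts.host, 80));
        scope (exit) sock.close();

        sock.send(buildRequest(parts));

        // Read until the server closes the connection and echo the raw bytes.
        ubyte[4096] buf;
        for (;;)
        {
            auto n = sock.receive(buf[]);
            if (n <= 0)
                break;
            stdout.rawWrite(buf[0 .. n]);
        }
    }
}
```

The output includes the status line and response headers as well as the body, since nothing here parses the response.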
Jan 19 2012
Bystroushaak <bystrousak kitakitsune.org> writes:
You can always use my module:
   https://github.com/Bystroushaak/DHTTPClient

Jan 19 2012
Xan xan <xancorreu gmail.com> writes:
Nope:

xan gerret:~/yottium/ codi/aranya-d2.0$ gdmd-4.6 aranya.d
xan gerret:~/yottium/ codi/aranya-d2.0$ ./aranya www.google.com
std.socket.TcpSocket


What is going wrong?

Jan 20 2012
Xan xan <xancorreu gmail.com> writes:
Thanks for that. The standard library should include something like it; a high-level API would make things much easier.

On the other hand, how do I specify the protocol? http://foo is not the same as ftp://foo

Thanks,
Xan.

Jan 20 2012
Xan xan <xancorreu gmail.com> writes:
I get errors:

xan gerret:~/yottium/ codi/aranya-d2.0$ gdmd-4.6 spider.d
spider.o: In function `_Dmain':
spider.d:(.text+0x4d): undefined reference to `_D11dhttpclient10HTTPClient7__ClassZ'
spider.d:(.text+0x5a): undefined reference to `_D11dhttpclient10HTTPClient6__ctorMFZC11dhttpclient10HTTPClient'
spider.o:(.data+0x24): undefined reference to `_D11dhttpclient12__ModuleInfoZ'
collect2: ld returned 1 exit status


with the file spider.d:

//D 2.0
//gdmd-4.6 <file> => produces a file with the same name and a .o extension
//Uses https://github.com/Bystroushaak/DHTTPClient
import std.stdio, std.string, std.conv, std.stream;
import std.socket, std.socketstream;
import dhttpclient;

int main(string [] args)
{
    if (args.length < 2) {
		writeln("Usage:");
		writeln("   ./spider {<url1>, <url2>, ...}");
		return 0;
	}
	else {
		try {
			HTTPClient navegador = new HTTPClient();
			foreach (a; args[1..$]) {
				writeln("[Contingut: ", navegador.get(a), "]");
			}
		}
		catch (Exception e) {
			writeln("[Excepció: ", e, "]");
		}
		return 0;
	}
}



What is wrong now?

Thanks a lot,
Xan.

2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
 You can always use my module:
  https://github.com/Bystroushaak/DHTTPClient
Jan 20 2012
Bystroushaak <bystrousak kitakitsune.org> writes:
It works with dmd 2.057 on my Linux machine when both source files are passed to the compiler:

bystrousak:DHTTPClient,0$ dmd spider.d dhttpclient.d
bystrousak:DHTTPClient,0$ ./spider http://kitakitsune.org
[Contingut: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 
Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<HTML>
.....


Jan 20 2012
Xan xan <xancorreu gmail.com> writes:
Yes. I did not know that I had to compile both .d files, although it makes sense ;-)

Perfect.

On the other hand, I see that dhttpclient identifies itself as
"Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13"

How can I change that?




Jan 20 2012
Bystroushaak <bystrousak kitakitsune.org> writes:
This module is very simple and supports only the HTTP protocol, but there is a way to add HTTPS:

public void setTcpSocketCreator(TcpSocket function(string domain, ushort port) fn)

You can pass it a lambda function that returns an SSL socket; it will be called for every connection.

FTP is not supported - it is DHTTPClient, not DFTPClient :)
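
A rough sketch of the shape such a creator function takes, based only on the signature quoted above. plainCreator is a hypothetical stand-in that returns an ordinary TcpSocket; for HTTPS it would have to return an instance of a TcpSocket subclass that performs the TLS handshake, which is not shown here and would have to come from a separate SSL binding.

```d
import std.socket : InternetAddress, TcpSocket;

// Same shape as the parameter of setTcpSocketCreator.
alias SocketCreator = TcpSocket function(string domain, ushort port);

// Hypothetical stand-in: connects a plain TCP socket to domain:port.
// An HTTPS version would return a TLS-capable TcpSocket subclass instead.
TcpSocket plainCreator(string domain, ushort port)
{
    return new TcpSocket(new InternetAddress(domain, port));
}

void main()
{
    // Taking the address does not open a connection; the function is only
    // called later, once per connection, by whoever holds the pointer.
    SocketCreator creator = &plainCreator;
    assert(creator !is null);

    // With the module available, registration would presumably look like:
    //     navegador.setTcpSocketCreator(creator);
}
```

Because the parameter is a function rather than a delegate, the function passed in cannot capture local variables.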

On 20.1.2012 15:24, Xan xan wrote:
 On the other hand, how do I specify the protocol? http://foo is not the same as ftp://foo
Jan 20 2012
Bystroushaak <bystrousak kitakitsune.org> writes:
There are two ways:

Change the module's global variable:

dhttpclient.DefaultHeaders = dhttpclient.IEHeaders; // or your own

This changes the headers for all clients.

---

Change the headers of a single instance:

// there are more headers than just User-Agent, so you have to copy them all
string[string] my_headers = dhttpclient.FFHeaders;
my_headers["User-Agent"] = "My own spider!";

HTTPClient navegador = new HTTPClient();
navegador.setClientHeaders(my_headers);

---

Headers are defined as:

public enum string[string] FFHeaders = [
   "User-Agent" : "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
   "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
   "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
   "Accept-Charset" : "utf-8",
   "Keep-Alive" : "300",
   "Connection" : "keep-alive"
];

/// Headers from firefox 3.6.13 on Linux
public enum string[string] LFFHeaders = [
   "User-Agent" : "Mozilla/5.0 (X11; U; Linux i686; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
   "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain",
   "Accept-Language" : "cs,en-us;q=0.7,en;q=0.3",
   "Accept-Charset" : "utf-8",
   "Keep-Alive" : "300",
   "Connection" : "keep-alive"
];

Accept, Accept-Charset, Keep-Alive and Connection are important; if you redefine them, the module may stop working with some servers.
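
A small self-contained sketch of the per-instance approach, runnable without the module. BaseHeaders is a shortened local stand-in for dhttpclient.FFHeaders and withUserAgent is a hypothetical helper. Because the table is an enum, each use builds a fresh associative array, so changing the copy leaves the defaults untouched.

```d
import std.stdio : writeln;

// Shortened local stand-in for dhttpclient.FFHeaders (illustration only).
enum string[string] BaseHeaders = [
    "User-Agent" : "Mozilla/5.0 (X11; U; Linux i686; cs; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.13",
    "Accept-Charset" : "utf-8",
    "Connection" : "keep-alive"
];

/// Returns a new header table in which only User-Agent is replaced.
string[string] withUserAgent(string agent)
{
    string[string] headers = BaseHeaders; // enum: a new table on every use
    headers["User-Agent"] = agent;
    return headers;
}

void main()
{
    auto mine = withUserAgent("My own spider!");
    writeln(mine["User-Agent"]);  // the replaced value
    writeln(mine["Connection"]);  // carried over from the defaults
}
```

The resulting table is what would be handed to setClientHeaders.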

On 20.1.2012 15:56, Xan xan wrote:
 On the other hand, I see dhttpclient  identifies as
   "Mozilla/5.0 (Windows; U; Windows NT 5.1; cs; rv:1.9.2.3)
 Gecko/20100401 Firefox/3.6.13"

 How can I change that?
Jan 20 2012
Bystroushaak <bystrousak kitakitsune.org> writes:
The first version was buggy. I've updated the code on GitHub, so if you want to try it, pull the new version (git pull). I've also added a new example in examples/user_agent_change.d

Jan 20 2012
Xan xan <xancorreu gmail.com> writes:
Thank you very much, Bystroushaak.
I see that you limit HTTPClient to XML/HTML documents. Is it possible to download any kind of file (not only HTML or XML)? Something like this:

HTTPClient navegador = new HTTPClient();
auto file = navegador.download("http://www.google.com/myfile.pdf");

Thanks a lot,



Jan 20 2012
Bystroushaak <bystrousak kitakitsune.org> writes:
It is not limited; you just have to cast the output to ubyte[]:

std.file.write("logo3w.png", cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
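
If you would rather avoid the cast, std.string.representation gives a read-only byte view of a string. This is a minimal sketch, assuming get() returns a string (the thread suggests that but does not show the signature); content is a stand-in value and sample.bin is a hypothetical file name.

```d
import std.file : read, write;
import std.string : representation;

void main()
{
    // Stand-in for the value returned by cl.get(...).
    string content = "%PDF-1.4\n";

    // Views the characters as bytes without copying and without
    // casting away immutability.
    immutable(ubyte)[] bytes = content.representation;

    // std.file.write stores the bytes exactly as they are.
    write("sample.bin", bytes);
    assert(cast(const(ubyte)[]) read("sample.bin") == bytes);
}
```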

Jan 20 2012
Bystroushaak <bystrousak kitakitsune.org> writes:
If you want to know what type of file you just downloaded, look at .getResponseHeaders():

   std.file.write("logo3w.png", cast(ubyte[]) cl.get("http://www.google.cz/images/srpr/logo3w.png"));
   writeln(cl.getResponseHeaders()["Content-Type"]);

In this case it prints: image/png

Here is a full example:
https://github.com/Bystroushaak/DHTTPClient/blob/master/examples/download_binary_file.d

Jan 20 2012
Xan xan <xancorreu gmail.com> writes:
Both before and after the update, I get this error:

$ ./spider http://static.arxiv.org/pdf/1109.4897.pdf
[Excepció: std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640):
Can't convert value `HTT' of type string to type uint]

The code:

//D 2.0
//gdmd-4.6 <file> => produces a file with the same name and a .o extension
//Uses https://github.com/Bystroushaak/DHTTPClient
import std.stdio, std.string, std.conv, std.stream;
import std.socket, std.socketstream;
import dhttpclient;

int main(string [] args)
{
    if (args.length < 2) {
		writeln("Usage:");
		writeln("   ./spider {<url1>, <url2>, ...}");
		return 0;
	}
	else {
		try {
			string[string] capcalera = dhttpclient.FFHeaders;
			//capcalera["User-Agent"] = "arachnida yottiuma";
			HTTPClient navegador = new HTTPClient();
			navegador.setClientHeaders(capcalera);

			foreach (a; args[1..$]) {
				writeln("[Contingut: ", cast(ubyte[]) navegador.get(a), "]");
			}
		}
		catch (Exception e) {
			writeln("[Excepció: ", e, "]");
		}
		return 0;
	}
}



What is going wrong?


2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
 It is unlimited, you just have to cast output to ubyte[]:

 std.file.write("logo3w.png", cast(ubyte[])
 cl.get("http://www.google.cz/images/srpr/logo3w.png"));
Jan 20 2012
Xan xan <xancorreu gmail.com> writes:
Thanks, but then what is failing? I download the content as a collection of bytes, so it should not matter whether the file is a PDF, a PNG or anything else, should it?

Thanks,


Jan 20 2012
Bystroushaak <bystrousak kitakitsune.org> writes:
That's because you are trying to writeln binary data, and that is impossible because writeln, IMHO, checks UTF-8 validity.

Jan 20 2012
Xan xan <xancorreu gmail.com> writes:
Mmmm... I understand. But is there any way to work around it?
Perhaps I could write to a file instead?



Jan 20 2012
prev sibling next sibling parent Bystroushaak <bystrousak kitakitsune.org> writes:
Use rawWrite():

stdout.rawWrite(cast(ubyte[]) navegador.get(a));
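
A minimal sketch of the two options mentioned in this thread (raw output and writing to a file), assuming the response is held as a ubyte[]. The byte values and the file name out.bin are made up for illustration; in the real program the bytes would come from navegador.get(a).

```d
import std.file : read, remove, write;
import std.stdio : stdout;

void main()
{
    // Hypothetical stand-in for cast(ubyte[]) navegador.get(a):
    // a few bytes that are not valid UTF-8 text.
    ubyte[] payload = [0x89, 0x50, 0x4E, 0x47, 0xFF, 0xFE];

    // Option 1: send the bytes to standard output untouched.
    stdout.rawWrite(payload);

    // Option 2: save them to a file (the file name is an assumption).
    write("out.bin", payload);

    // Reading the file back yields exactly the same bytes.
    assert(cast(ubyte[]) read("out.bin") == payload);
    remove("out.bin");
}
```

Redirecting the program's standard output to a file (./spider URL > file) would combine both approaches, as long as nothing else is printed around the raw bytes.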

On 20.1.2012 18:18, Xan xan wrote:
 Mmmm... I understand it. But is there any way of circumvent it?
 Perhaps I could write to one file, isn't?



 2012/1/20 Bystroushaak<bystrousak kitakitsune.org>:
 Thats because you are trying writeln binary data, and that is impossible,
 because writeln IMHO checks UTF8 validity.


 On 20.1.2012 18:08, Xan xan wrote:
 Before and now, I get this error:

 $ ./spider http://static.arxiv.org/pdf/1109.4897.pdf
 [Excepció: std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640):
 Can't convert value `HTT' of type string to type uint]

 The code:

 //D 2.0
 //gdmd-4.6<fitxer>    =>    surt el fitxer amb el mateix nom i .o
 //Usa https://github.com/Bystroushaak/DHTTPClient
 import std.stdio, std.string, std.conv, std.stream;
 import std.socket, std.socketstream;
 import dhttpclient;

 int main(string [] args)
 {
      if (args.length<    2) {
                 writeln("Usage:");
                 writeln("   ./spider {<url1>,<url2>, ...}");
                 return 0;
         }
         else {
                 try {
                         string[string] capcalera = dhttpclient.FFHeaders;
                         //capcalera["User-Agent"] = "arachnida yottiuma";
                         HTTPClient navegador = new HTTPClient();
                         navegador.setClientHeaders(capcalera);

                         foreach (a; args[1..$]) {
                                 writeln("[Contingut: ", cast(ubyte[])
 navegador.get(a), "]");
                         }
                 }
                 catch (Exception e) {
                         writeln("[Excepció: ", e, "]");
                 }
                 return 0;
         }
 }



 What happens?


 2012/1/20 Bystroushaak<bystrousak kitakitsune.org>:
 It is unlimited, you just have to cast output to ubyte[]:

 std.file.write("logo3w.png", cast(ubyte[])
 cl.get("http://www.google.cz/images/srpr/logo3w.png"));
Jan 20 2012
prev sibling next sibling parent Xan xan <xancorreu gmail.com> writes:
Thank you very much. I should invite you to a beer ;-)

On the other hand,

I get this error:

[Excepció: std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640):
Can't convert value `HTT' of type string to type uint]


if I only want the length:

//D 2.0
//gdmd-4.6 <file> dhttpclient => produces an output file with the same name plus .o
//Uses https://github.com/Bystroushaak/DHTTPClient
//version 0.0.2
import std.stdio, std.string, std.conv, std.stream;
import std.socket, std.socketstream;
import dhttpclient;

int main(string [] args)
{
    if (args.length < 2) {
        writeln("Usage:");
        writeln("   ./spider {<url1>, <url2>, ...}");
        return 0;
    }
    else {
        try {
            string[string] capcalera = dhttpclient.FFHeaders;
            HTTPClient navegador = new HTTPClient();
            navegador.setClientHeaders(capcalera);

            foreach (a; args[1..$]) {
                auto tamany = cast(ubyte[]) navegador.get(a);
                writeln("[Contingut: ", tamany.length, "]");
            }
        }
        catch (Exception e) {
            writeln("[Excepció: ", e, "]");
        }
        return 0;
    }
}


In theory, tamany.length is completely defined.

Xan.

2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
 rawWrite():

 stdout.rawWrite(cast(ubyte[]) navegador.get(a));


Jan 20 2012
prev sibling next sibling parent Xan xan <xancorreu gmail.com> writes:
The same error with:
[...]
foreach (a; args[1..$]) {
    write("[Longitud: ");
    stdout.rawWrite(cast(ubyte[]) navegador.get(a));
    writeln("]");
}
[...]

2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
 rawWrite():

 stdout.rawWrite(cast(ubyte[]) navegador.get(a));


Jan 20 2012
prev sibling next sibling parent Bystroushaak <bystrousak kitakitsune.org> writes:
On 20.1.2012 18:42, Xan xan wrote:
 Thank you very much. I should invite you to a beer ;-)
Write to me if you are ever in Prague/Czech Republic :)
 For the other hand,

 I get this error:

 [Excepció: std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640):
 Can't convert value `HTT' of type string to type uint]
This is a very strange error, because it works fine on my computer. Can you remove the try..catch and post the full error output and the program arguments?
Jan 20 2012
prev sibling next sibling parent Xan xan <xancorreu gmail.com> writes:
The full code is:

//D 2.0
//gdmd-4.6 <file> dhttpclient => produces an output file with the same name plus .o
//Uses https://github.com/Bystroushaak/DHTTPClient
//version 0.0.3
import std.stdio, std.string, std.conv, std.stream;
import std.socket, std.socketstream;
import dhttpclient;

int main(string [] args)
{
    if (args.length < 2) {
        writeln("Usage:");
        writeln("   ./spider {<url1>, <url2>, ...}");
        return 0;
    }
    else {
        try {
            string[string] capcalera = dhttpclient.FFHeaders;
            HTTPClient navegador = new HTTPClient();
            navegador.setClientHeaders(capcalera);

            foreach (a; args[1..$]) {
                write("[Longitud: ");
                stdout.rawWrite(cast(ubyte[]) navegador.get(a));
                writeln("]");
            }
        }
        catch (Exception e) {
            writeln("[Excepció: ", e, "]");
        }
        return 0;
    }
}


I don't know what happens!!!

And no, I don't live in the Czech Republic, so we'll have to postpone the invitation ;-)




2012/1/20 Bystroushaak <bystrousak kitakitsune.org>:
 On 20.1.2012 18:42, Xan xan wrote:
 Thank you very much. I should invite you to a beer ;-)
Write me if you will be in prag/czech republic :)
 For the other hand,

 I get this error:

 [Excepció: std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640):
 Can't convert value `HTT' of type string to type uint]
This is very strange error, because on my computer it works well. Can you remove try..catch and post full error list and program parameters?
Jan 21 2012
prev sibling next sibling parent Xan xan <xancorreu gmail.com> writes:
It works with the PNG, but not with the PDF:

$ ./spider2 http://www.google.com/intl/ca/images/logos/mail_logo.png
[a lot of output]

$ ./spider2 http://static.arxiv.org/pdf/1109.4897.pdf
[Longitud: [Excepció:
std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640): Can't
convert value `HTT' of type string to type uint]


Jan 21 2012
prev sibling next sibling parent Bystroushaak <bystrousak kitakitsune.org> writes:
That is really strange: for me, it works with both files. Are you sure
that you can download that PDF file manually? Maybe your provider is
blocking your connection, or something like that.

Which compiler did you use?

On 21.1.2012 13:14, Xan xan wrote:
 With png works, with pdf not:

   ./spider2 http://www.google.com/intl/ca/images/logos/mail_logo.png
 [a lot of output]

 $ ./spider2 http://static.arxiv.org/pdf/1109.4897.pdf
 [Longitud: [Excepció:
 std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640): Can't
 convert value `HTT' of type string to type uint]


Jan 21 2012
prev sibling next sibling parent xancorreu <xancorreu gmail.com> writes:
On 21/01/12 14:28, Bystroushaak wrote:
 That is really strange - for me, it works with both files. Are you 
 sure, that you can manually download that pdf file? Maybe your 
 provider blocking your connection, or something like that.
I don't think so. It's an arXiv PDF.
 What type of compiler did you used?
I use gdmd-4.6 on Ubuntu. You probably use dmd, don't you? Perhaps it's a bug in gdc. Can you help me isolate it? Thanks, Xan.
 On 21.1.2012 13:14, Xan xan wrote:
 With png works, with pdf not:

   ./spider2 http://www.google.com/intl/ca/images/logos/mail_logo.png
 [a lot of output]

 $ ./spider2 http://static.arxiv.org/pdf/1109.4897.pdf
 [Longitud: [Excepció:
 std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640): Can't
 convert value `HTT' of type string to type uint]


Jan 21 2012
prev sibling parent Bystroushaak <bystrousak kitakitsune.org> writes:
Fixed. The bug was caused by an HTTP 1.0 reply ('HTTP 1.0 200 OK').
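
The thread does not show the actual change made in DHTTPClient. As a rough illustration only: the exception text suggests that a piece of the status line ended up in a string-to-uint conversion. The sketch below shows one way to read the status code that does not depend on the HTTP version, assuming the standard status-line form "HTTP/1.0 200 OK". The function name parseStatusCode is hypothetical and is not part of DHTTPClient.

```d
import std.algorithm.searching : startsWith;
import std.array : split;
import std.conv : to;
import std.stdio : writeln;

/// Extracts the numeric status code from an HTTP status line.
/// Hypothetical helper, not taken from DHTTPClient.
uint parseStatusCode(string statusLine)
{
    // Split on whitespace, e.g. ["HTTP/1.0", "200", "OK"].
    auto parts = statusLine.split();
    if (parts.length < 2 || !parts[0].startsWith("HTTP"))
        throw new Exception("Malformed status line: " ~ statusLine);

    // Only the second token is converted, so the version token
    // never reaches to!uint, whichever HTTP version the server sends.
    return parts[1].to!uint;
}

void main()
{
    writeln(parseStatusCode("HTTP/1.0 200 OK"));
    writeln(parseStatusCode("HTTP/1.1 404 Not Found"));
}
```

Splitting on whitespace rather than slicing at fixed offsets is the design choice here: it keeps working when the version token or the reason phrase changes length.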

On 21.1.2012 13:14, Xan xan wrote:
 With png works, with pdf not:

   ./spider2 http://www.google.com/intl/ca/images/logos/mail_logo.png
 [a lot of output]

 $ ./spider2 http://static.arxiv.org/pdf/1109.4897.pdf
 [Longitud: [Excepció:
 std.conv.ConvException /usr/include/d2/4.6/std/conv.d(1640): Can't
 convert value `HTT' of type string to type uint]


Jan 22 2012
prev sibling parent Kapps <Kapps NotValidEmail.com> writes:
The host is www.google.com; http:// is only the protocol prefix. The DNS
lookup is independent of HTTP, so the name you resolve should not include
it. Note that you're also missing a space after the GET. Also, in terms of
the example given, some servers won't like you not sending the Host header,
and some won't like the GET target being an absolute URL instead of a
relative path (fixing both should make most servers accept the request).
A CURL wrapper has been added, and a higher-level version should be
available within the next release or two; you may want to look into that.
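
The points above can be sketched as follows: strip the scheme, resolve only the host, and send a relative path plus a Host header. This is a simplified illustration, not a full URL parser. The names Target, splitUrl and buildRequest are made up for this example, and ports, https and query strings are deliberately ignored.

```d
import std.algorithm.searching : startsWith;
import std.stdio : write;
import std.string : indexOf;

/// Host and path pieces of a plain http URL (hypothetical helper type).
struct Target
{
    string host;
    string path;
}

/// Splits "http://host/path" into host and path; the path defaults to "/".
Target splitUrl(string url)
{
    enum scheme = "http://";
    if (url.startsWith(scheme))
        url = url[scheme.length .. $];

    auto slash = url.indexOf('/');
    if (slash < 0)
        return Target(url, "/");
    return Target(url[0 .. slash], url[slash .. $]);
}

/// Builds a minimal GET request: note the space after GET, the relative
/// path, the Host header, and the blank line that ends the headers.
string buildRequest(Target t)
{
    return "GET " ~ t.path ~ " HTTP/1.1\r\n"
         ~ "Host: " ~ t.host ~ "\r\n"
         ~ "Connection: close\r\n"
         ~ "\r\n";
}

void main()
{
    auto t = splitUrl("http://www.google.com");
    // Only t.host would be passed to new InternetAddress(t.host, 80).
    write(buildRequest(t));
}
```

With this split, the original program would resolve www.google.com instead of the whole URL, which is what the AddressException was complaining about.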

On 19/01/2012 9:30 AM, Xan xan wrote:
 Hi,

 I want to simply code a script to get the url as string in D 2.0.
 I have this code:

 //D 2.0
 //gdmd-4.6
 import std.stdio, std.string, std.conv, std.stream;
 import std.socket, std.socketstream;

 int main(string [] args)
 {
      if (args.length<  2) {
 		writeln("Usage:");
 		writeln("   ./aranya {<url1>,<url2>, ...}");
 		return 0;
 	}
 	else {
 		foreach (a; args[1..$]) {
 			Socket sock = new TcpSocket(new InternetAddress(a, 80));
 			scope(exit) sock.close();
 			Stream ss = new SocketStream(sock);
 			ss.writeString("GET" ~ a ~ " HTTP/1.1\r\n");
 			writeln(ss);
 		}
 		return 0;
 	}
 }


 but when I use it, I receive:
 $ ./aranya http://www.google.com
 std.socket.AddressException ../../../src/libphobos/std/socket.d(697):
 Unable to resolve host 'http://www.google.com'

 What fails?

 Thanks in advance,
 Xan.
Jan 19 2012