digitalmars.D - Curl support RFC

Jonas Drewsen <jdrewsen nospam.com> writes:

Hi,

So I've spent some time trying to wrap libcurl for D. There are a lot 
of things you can do with libcurl that I did not know about, so I'm 
starting out small.

For now I've created all the declarations for the latest public curl C 
API. I have put that in the etc.c.curl module.

On top of that I've created a more D-like API, as seen below. This is 
located in the 'etc.curl' module. What you can see below currently works, 
but before proceeding further down this road I would like to get your 
comments on it.

//
// Simple HTTP GET with sane defaults
// provides the .content, .headers and .status
//
writeln( Http.get("http://www.google.com").content );

//
// GET with custom data receiver delegates
//
Http http = new Http("http://www.google.dk");
http.setReceiveHeaderCallback( (string key, string value) {
	writeln(key ~ ":" ~ value);
} );
http.setReceiveCallback( (string data) { /* drop */ } );
http.perform;

//
// POST with some timeouts
//
http.setUrl("http://www.testing.com/test.cgi");
http.setReceiveCallback( (string data) { writeln(data); } );
http.setConnectTimeout(1000);
http.setDataTimeout(1000);
http.setDnsTimeout(1000);
http.setPostData("The quick....");
http.perform;

//
// PUT with data sender delegate
//
string msg = "Hello world";
size_t len = msg.length; /* using chunked transfer if omitted */

http.setSendCallback( delegate size_t(char[] data) {
     if (msg.empty) return 0;
     auto l = msg.length;
     data[0..l] = msg[0..$];
     msg.length = 0;
     return l;
     },
     HttpMethod.put, len );
http.perform;

//
// HTTPS
//
writeln(Http.get("https://mail.google.com").content);

//
// FTP
//
writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
                 "./downloaded-file"));


// ... authentication, cookies, interface selection, progress callbacks,
// etc. are also implemented this way.
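
A note on the PUT example: it copies the whole message into data in one step, which only works if the buffer handed to the callback is at least msg.length long. Below is a minimal sketch of a sender that copies in chunks instead, assuming the callback shape shown above. makeSender is a made-up helper for illustration and is not part of the wrapper.

```d
import std.algorithm.comparison : min;

// Hypothetical helper (not part of etc.curl): builds a send callback with
// the same shape as the PUT example, delegate size_t(char[]).
size_t delegate(char[]) makeSender(string msg)
{
    return delegate size_t(char[] buf) {
        // Never copy more than the buffer can hold.
        size_t n = min(buf.length, msg.length);
        buf[0 .. n] = msg[0 .. n];
        msg = msg[n .. $];   // remember what is still unsent
        return n;            // 0 signals that everything has been sent
    };
}

unittest
{
    auto send = makeSender("Hello world");
    char[4] buf;             // deliberately smaller than the message
    string got;
    for (size_t n = send(buf[]); n != 0; n = send(buf[]))
        got ~= buf[0 .. n].idup;
    assert(got == "Hello world");
}
```

The delegate could then be passed where the PUT example passes its literal; the unittest can be run with dmd -main -unittest.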


/Jonas

Mar 11 2011

dsimcha <dsimcha yahoo.com> writes:

I don't know much about this kind of stuff except that I use it for very simple
use cases occasionally.  One thing I'll definitely give your design credit for,
based on your examples, is making simple things simple.  I don't know how it
scales to more complex use cases (not saying it doesn't, just that I'm not
qualified to evaluate that), but I definitely would use this.  Nice work.

BTW, what is the license status of libcurl?  According to Wikipedia it's MIT
licensed.  Where does that leave us with regard to the binary attribution issue?


Mar 11 2011

Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:

dsimcha wrote:

 I don't know much about this kind of stuff except that I use it for very
 simple use cases occasionally.  One thing I'll definitely give your design
 credit for, based on your examples, is making simple things simple.  I
 don't know how it scales to more complex use cases (not saying it doesn't,
 just that I'm not qualified to evaluate that), but I definitely would use
 this.  Nice work.

 BTW, what is the license status of libcurl?  According to Wikipedia it's
 MIT licensed.  Where does that leave us with regard to the binary
 attribution issue?
 

Walter contacted the author, it's not a problem:

http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D&artnum=112832

Mar 11 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

Thank you.

Regarding scalability: In my experience the fastest network handling for 
multiple concurrent requests is done asynchronously using select or epoll. 
The current wrapper would probably use threading and messages to handle 
multiple concurrent requests, which is not as efficient.

Usually you only need this kind of scalability for server-side 
networking and not for the client side that libcurl provides, so I do not 
see this as a major issue for an initial version.

I do know how to support select/epoll-based curl for better scalability, 
and fortunately that would just be an extension to the API I've shown. 
For now I will focus on getting the common things finished and rock solid.

/Jonas



Mar 12 2011

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Fri, 11 Mar 2011 17:20:38 +0200, Jonas Drewsen <jdrewsen nospam.com>  
wrote:

 writeln( Http.get("http://www.google.com").content );

Does this return a string? What if the page's encoding isn't UTF-8?

Data should probably be returned as void[], similar to std.file.read.

-- 
Best regards,
  Vladimir                            mailto:vladimir thecybershadow.net

Mar 11 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 11/03/11 17.33, Vladimir Panteleev wrote:
 On Fri, 11 Mar 2011 17:20:38 +0200, Jonas Drewsen <jdrewsen nospam.com>
 wrote:

 writeln( Http.get("http://www.google.com").content );

 Does this return a string? What if the page's encoding isn't UTF-8?

 Data should probably be returned as void[], similar to std.file.read.

Currently it returns a string, but it should probably return void[] as you 
suggest.

Maybe the interface should be something like this to support various 
encodings (like std.file.readText does):

class Http {
	struct Result(S) {
		S content;
		...
	}
	static Result!S get(S = void[])(in string url);
}

Actually I just took a look at Andrei's std.stream2 suggestion, and 
Http/Ftp... transports would be pretty neat to have as well for reading 
formatted data.

I'll follow the newly spawned "Stream proposal" thread on this one :)
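
To make the Result!S idea a bit more concrete, here is a rough sketch of how the content type could be selected at compile time. It is only an illustration: toResult is a made-up helper, immutable(ubyte)[] stands in for void[] to keep the example short, the int type of status is a guess, and no network access is involved.

```d
import std.utf : validate;

// Illustrative only: mirrors the Result(S) idea, with a status field added.
struct Result(S)
{
    S content;
    int status;
}

// Hypothetical helper standing in for what get() might do once the raw
// bytes have arrived; it only converts, it does not fetch anything.
Result!S toResult(S = immutable(ubyte)[])(immutable(ubyte)[] raw, int status)
{
    static if (is(S == string))
    {
        auto text = cast(string) raw;
        validate(text);          // throws UTFException on malformed UTF-8
        return Result!S(text, status);
    }
    else
    {
        static assert(is(S == immutable(ubyte)[]), "unsupported content type");
        return Result!S(raw, status);
    }
}

unittest
{
    immutable(ubyte)[] raw = [0x68, 0x69];            // "hi" in UTF-8
    assert(toResult(raw, 200).content == raw);        // default: raw bytes
    assert(toResult!string(raw, 200).content == "hi");
}
```

With this shape the caller who wants text asks for it explicitly and gets a UTF check, while the default leaves the bytes untouched.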

/Jonas

Mar 12 2011

Jacob Carlborg <doob me.com> writes:


Is there support for other HTTP methods/verbs in the D wrapper, like delete?

-- 
/Jacob Carlborg

Mar 11 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 11/03/11 19.31, Jacob Carlborg wrote:

 Is there support for other HTTP methods/verbs in the D wrapper, like
 delete?

Yes, all methods in libcurl are supported.

/Jonas

Mar 12 2011

Jesse Phillips <jessekphillips+D gmail.com> writes:

I'll make some comments on the API. Do we have to choose Http/Ftp...? The URI
already contains this. I could see being able to specifically request one or
the other for performance, or so that a bare www.google.com works.

And what about properties? They tend to be much nicer than set methods.
Examples below.

Jonas Drewsen Wrote:

 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get("http://www.google.com").content );
 
 //
 // GET with custom data receiver delegates
 //
 Http http = new Http("http://www.google.dk");
 http.setReceiveHeaderCallback( (string key, string value) {
 	writeln(key ~ ":" ~ value);
 } );
 http.setReceiveCallback( (string data) { /* drop */ } );
 http.perform;

http.onHeader = (string key, string value) {...};
http.onContent = (string data) { ... };
http.perform();

Mar 11 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 11/03/11 22.21, Jesse Phillips wrote:
 I'll make some comments on the API. Do we have to choose Http/Ftp...? The URI
already contains this, I could see being able to specifically request one or
the other for performance or so www.google.com works.

That is a good question.

The problem with creating a grand unified Curl class that does it all is 
that each protocol supports different things, for example http supports 
cookie handling and http redirection, ftp supports passive/active mode and 
dir listings, and so on.

I think it would confuse the user of the API if, e.g., he were allowed to 
set cookies on his ftp request.

The protocols supported (Http, Ftp,... classes) do have a base class 
Protocol that implements common things like timeouts etc.
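
Roughly, the split could be pictured like this. The class names follow the description above, but every member shown (url, connectTimeout, cookies, passive) is an assumption made for the sketch and not the actual etc.curl layout.

```d
import core.time : Duration, msecs;

// Class names mirror the post; all members are assumptions for illustration.
abstract class Protocol
{
    string url;
    Duration connectTimeout;   // shared by all protocols

    this(string url) { this.url = url; }
}

class Http : Protocol
{
    string[string] cookies;    // only meaningful for HTTP
    this(string url) { super(url); }
}

class Ftp : Protocol
{
    bool passive = true;       // only meaningful for FTP
    this(string url) { super(url); }
}

unittest
{
    auto http = new Http("http://www.google.com");
    http.connectTimeout = 1000.msecs;
    http.cookies["session"] = "abc";

    auto ftp = new Ftp("ftp://ftp.digitalmars.com/sieve.ds");
    ftp.connectTimeout = 1000.msecs;   // the common setting still works
    // Setting cookies on an FTP request is rejected at compile time.
    static assert(!__traits(compiles, ftp.cookies));
}
```

The point of the separate classes is that a protocol-specific mistake becomes a compile error instead of a silently ignored option.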


 And what about properties? They tend to be very nice instead of set methods.
examples below.

Actually I thought of this and went the usual C++ way of _not_ using 
public properties but accessor methods instead. Are public properties 
accepted as "the D way", and if so, what about the usual reasons why 
you should use accessor methods (like encapsulation and tolerance to 
future changes to the API)?

I do like the shorter onHeader/onContent much better though :)

/Jonas


Mar 12 2011

Lutger Blijdestijn <lutger.blijdestijn gmail.com> writes:

Jonas Drewsen wrote:

 
 And what about properties? They tend to be very nice instead of set
 methods. examples below.

 
 Actually I thought of this and went the usual C++ way of _not_ using
 public properties but accessor methods instead. Are public properties
 accepted as "the D way", and if so, what about the usual reasons why
 you should use accessor methods (like encapsulation and tolerance to
 future changes to the API)?
 
 I do like the shorter onHeader/onContent much better though :)
 
 /Jonas

Properties *are* accessor methods, with some sugar. In fact you already have 
used them, try it:

http.setReceiveHeaderCallback =  (string key, string value) {
    writeln(key ~ ":" ~ value);
};

Marking a function with @property just signals its intended use, in which 
case it's nicer to drop the get/set prefixes. Supposedly using parentheses 
with such declarations will be outlawed in the future, but I don't think 
that's the case currently.
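
A small sketch of what the prefix-free version might look like. Client and simulateHeader are invented for the example; only the onHeader name comes from the suggestion earlier in the thread.

```d
// Hypothetical class; it only illustrates @property-style callbacks and
// performs no network I/O.
class Client
{
    private void delegate(string, string) headerHandler;

    // Setter and getter share one name, so no set/get prefix is needed.
    @property void onHeader(void delegate(string, string) dg) { headerHandler = dg; }
    @property void delegate(string, string) onHeader() { return headerHandler; }

    // Pretends a header arrived and dispatches it to the handler, if any.
    void simulateHeader(string key, string value)
    {
        if (headerHandler !is null)
            headerHandler(key, value);
    }
}

unittest
{
    string[] seen;
    auto c = new Client;
    c.onHeader = (string key, string value) { seen ~= key ~ ":" ~ value; };
    c.simulateHeader("Content-Type", "text/html");
    assert(seen == ["Content-Type:text/html"]);
}
```

The encapsulation argument still holds: onHeader is a pair of functions, so validation or bookkeeping can be added later without changing call sites.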


Mar 12 2011

Jesse Phillips <jessekphillips+D gmail.com> writes:

Jonas Drewsen Wrote:

 On 11/03/11 22.21, Jesse Phillips wrote:
 I'll make some comments on the API. Do we have to choose Http/Ftp...? The URI
already contains this, I could see being able to specifically request one or
the other for performance or so www.google.com works.

 
 That is a good question.
 
 The problem with creating a grand unified Curl class that does it all is
 that each protocol supports different things, for example http supports
 cookie handling and http redirection, ftp supports passive/active mode and
 dir listings, and so on.
 
 I think it would confuse the user of the API if, e.g., he were allowed to
 set cookies on his ftp request.
 
 The protocols supported (Http, Ftp,... classes) do have a base class
 Protocol that implements common things like timeouts etc.

Ah. I guess I was just thinking about when you want to download some file: you
don't really care where you are getting it from, you just have the URL and are
ready to go.

 And what about properties? They tend to be very nice instead of set methods.
examples below.

 
 Actually I thought of this and went the usual C++ way of _not_ using
 public properties but accessor methods instead. Are public properties
 accepted as "the D way", and if so, what about the usual reasons why
 you should use accessor methods (like encapsulation and tolerance to
 future changes to the API)?
 
 I do like the shorter onHeader/onContent much better though :)

D was originally very friendly with properties. Your code can at this moment
be written:

http.setReceiveHeaderCallback = (string key, string value) {
        writeln(key ~ ":" ~ value);
};

But that is going to be deprecated in favor of the @property attribute. You are

functions that look like public fields.

Otherwise this looks really good and I do hope to see it in Phobos.

Mar 12 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 12/03/11 20.44, Jesse Phillips wrote:
 Jonas Drewsen Wrote:


 Ah. I guess I was just thinking about when you want to download some file: you
don't really care where you are getting it from, you just have the URL and are
ready to go.

There should definitely be a simple method based only on a URL. I'll 
put that in.
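
Something along these lines, as a sketch only: schemeOf and handlerFor are made-up names, and a real version would forward to Http.get / Ftp.get instead of returning a label.

```d
import std.algorithm.searching : startsWith;

enum Scheme { http, ftp, unknown }

// Hypothetical helper: picks a protocol from the URL prefix alone.
Scheme schemeOf(string url)
{
    if (url.startsWith("http://", "https://")) return Scheme.http;
    if (url.startsWith("ftp://")) return Scheme.ftp;
    return Scheme.unknown;
}

// A real implementation would call the matching class here; this sketch
// only reports which one would be chosen.
string handlerFor(string url)
{
    switch (schemeOf(url))
    {
        case Scheme.http: return "Http";
        case Scheme.ftp:  return "Ftp";
        default: throw new Exception("unsupported URL: " ~ url);
    }
}

unittest
{
    assert(handlerFor("http://www.google.com") == "Http");
    assert(handlerFor("ftp://ftp.digitalmars.com/sieve.ds") == "Ftp");
    assert(schemeOf("www.google.com") == Scheme.unknown);  // no scheme given
}
```

Whether a bare host such as www.google.com should default to HTTP is left open here; the sketch simply treats it as unknown.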



 D was originally very friendly with properties. Your code can at this moment
be written:

 http.setReceiveHeaderCallback = (string key, string value) {
          writeln(key ~ ":" ~ value);
 };

 But is going to be deprecated for the use of the @property attribute. You are

functions that look like public fields.

Just tried the property stuff out but it seems a bit inconsistent. Maybe 
someone can enlighten me:

import std.stdio;

alias void delegate() deleg;

class T {
   private deleg tvalue;
   @property void prop(deleg dg) {
     tvalue = dg;
   }
   @property deleg prop() {
     return tvalue;
   }
}

void main(string[] args) {
   T t = new T;
   t.prop = { writeln("fda"); };

   // Seems a bit odd that assigning to a temporary (tvalue) suddenly
   // changes the behaviour.
   auto tvalue = t.prop;
   tvalue();     // Works as expected by printing fda
   t.prop();     // Just returns the delegate!

   // Shouldn't the @property attribute ensure that no () is needed
   // when using the property
   t.prop()(); // Works
}

/Jonas




 Otherwise this looks really good and I do hope to see it in Phobos.

Mar 12 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday 12 March 2011 13:51:37 Jonas Drewsen wrote:
 Just tried the property stuff out but it seems a bit inconsistent. Maybe
 someone can enlighten me:
 
 import std.stdio;
 
 alias void delegate() deleg;
 
 class T {
    private deleg tvalue;
    @property void prop(deleg dg) {
      tvalue = dg;
    }
    @property deleg prop() {
      return tvalue;
    }
 }
 
 void main(string[] args) {
    T t = new T;
    t.prop = { writeln("fda"); };
 
    // Seems a bit odd that assigning to a temporary (tvalue) suddenly
    // changes the behaviour.
    auto tvalue = t.prop;
    tvalue();     // Works as expected by printing fda
    t.prop();     // Just returns the delegate!
 
    // Shouldn't the @property attribute ensure that no () is needed
    // when using the property
    t.prop()(); // Works
 }

@property doesn't currently enforce much of anything. Things are in a
transitory state with regards to property. Originally, there was no such
thing as @property, and any function which had no parameters and returned a
value could be used as a getter, and any function which returned nothing and
took a single argument could be used as a setter. It was decided to make it
more restrictive, so @property was added. Eventually, you will _only_ be able
to use such functions as property functions if they are marked with
@property, and you will _have_ to call them with the property syntax and will
_not_ be able to call non-property functions with the property syntax.
However, at the moment, the compiler doesn't enforce that. It will
eventually, but there are several bugs with regards to property functions
(they mostly work, but you found one of the cases where they don't), and it
probably wouldn't be a good idea to enforce it until more of those bugs have
been fixed.
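
To illustrate the non-enforcing behaviour, here is a small sketch with a made-up Counter class whose methods are deliberately not marked @property. It assumes a compiler that behaves as described above, i.e. one that accepts property syntax on plain functions.

```d
// Hypothetical class: neither method carries @property.
class Counter
{
    private int n;
    int value() { return n; }          // no parameters, returns a value
    void setValue(int v) { n = v; }    // one argument, returns nothing
}

unittest
{
    auto c = new Counter;
    c.setValue = 41;          // setter-style call on a plain method
    assert(c.value == 41);    // getter-style call without parentheses
    c.setValue(42);           // the ordinary call syntax works as well
    assert(c.value() == 42);
}
```

Once enforcement arrives as described, only the forms that match each function's @property marking would be expected to compile.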

- Jonathan M Davis

Mar 12 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 13/03/11 00.28, Jonathan M Davis wrote:

 Just tried the property stuff out but it seems a bit inconsistent. Maybe
 someone can enlighten me:

 import std.stdio;

 alias void delegate() deleg;

 class T {
     private deleg tvalue;
      property void prop(deleg dg) {
       tvalue = dg;
     }
      property deleg prop() {
       return tvalue;
     }
 }

 void main(string[] args) {
     T t = new T;
     t.prop = { writeln("fda"); };

     // Seems a bit odd that assigning to a temporary (tvalue) suddently
     // changes the behaviour.
     auto tvalue = t.prop;
     tvalue();     // Works as expected by printing fda
     t.prop();     // Just returns the delegate!

     // Shouldn't the  property attribute ensure that no () is needed
     // when using the property
     t.prop()(); // Works
 }

 @property doesn't currently enforce much of anything. Things are in a
 transitory state with regards to @property. Originally, there was no such
 thing as @property, and any function which had no parameters and returned a
 value could be used as a getter, and any function which returned nothing and
 took a single argument could be used as a setter. It was decided to make it
 more restrictive, so @property was added. Eventually, you will _only_ be able
 to use such functions as property functions if they are marked with @property,
 and you will _have_ to call them with the property syntax and will _not_ be
 able to call non-property functions with the property syntax. However, at the
 moment, the compiler doesn't enforce that. It will eventually, but there are
 several bugs with regards to property functions (they mostly work, but you
 found one of the cases where they don't), and it probably wouldn't be a good
 idea to enforce it until more of those bugs have been fixed.

 - Jonathan M Davis

Okay... nice to hear that this is coming up.

Thanks again!
/Jonas

Mar 12 2011

Jesse Phillips <jessekphillips+D gmail.com> writes:

Jonas Drewsen Wrote:

 Just tried the property stuff out but it seems a bit inconsistent. Maybe 
 someone can enlighten me:
 
 import std.stdio;
 
 alias void delegate() deleg;
 
 class T {
    private deleg tvalue;
    @property void prop(deleg dg) {
      tvalue = dg;
    }
    @property deleg prop() {
      return tvalue;
    }
 }
 
 void main(string[] args) {
    T t = new T;
    t.prop = { writeln("fda"); };
 
    // Seems a bit odd that assigning to a temporary (tvalue) suddenly
    // changes the behaviour.
    auto tvalue = t.prop;
    tvalue();     // Works as expected by printing fda
    t.prop();     // Just returns the delegate!
 
    // Shouldn't the @property attribute ensure that no () is needed
    // when using the property
    t.prop()(); // Works
 }
 
 /Jonas

Ah, yes. One of the big reasons for introducing @property was that returning
delegates could be very confusing in terms of whether the delegate is called
or returned from the function. Since the old system has not yet been ripped
out, @property basically does nothing except under some conditions where it
will complain that you have added a ().

So the situation should improve, but I really don't know how or when things
will change.
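
To make the delegate case concrete, here is a minimal sketch of a getter/setter pair that stores a delegate. The names Holder and callback are made up for illustration. The sketch deliberately avoids writing h.callback() because that is exactly the ambiguous form discussed above; it copies the delegate out first and then calls it explicitly, which should behave the same whether or not the compiler enforces @property.

```d
alias Callback = void delegate();

class Holder
{
    private Callback stored;

    @property void callback(Callback dg) { stored = dg; }  // setter
    @property Callback callback() { return stored; }       // getter
}

unittest
{
    int calls;
    auto h = new Holder;
    h.callback = () { ++calls; };   // setter through assignment syntax

    auto dg = h.callback;           // getter: hands back the delegate, does not run it
    assert(calls == 0);
    dg();                           // the explicit call runs it
    assert(calls == 1);
}
```

Build with -unittest -main to run the check.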

Mar 13 2011

Ary Manzana <ary esperanto.org.ar> writes:

On 3/11/11 12:20 PM, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot of
 things that you can do with libcurl which I did not know so I'm starting
 out small.

 For now I've created all the declarations for the latest public curl C
 api. I have put that in the etc.c.curl module.

 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently works
 but before proceeding further down this road I would like to get your
 comments on it.

I *love* it.

All APIs should be like yours. One-liners for what you want right now. 
If it's a little more complex, some more lines. This is perfect.

Congratulations!

Mar 11 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 12/03/11 05.30, Ary Manzana wrote:
 On 3/11/11 12:20 PM, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot of
 things that you can do with libcurl which I did not know so I'm starting
 out small.

 For now I've created all the declarations for the latest public curl C
 api. I have put that in the etc.c.curl module.

 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently works
 but before proceeding further down this road I would like to get your
 comments on it.

 I *love* it.

 All APIs should be like yours. One-liners for what you want right now.
 If it's a little more complex, some more lines. This is perfect.

 Congratulations!

Thank you! Words like these keep up the motivation.

/Jonas

Mar 12 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

Hi,

   So I've been working a bit on the etc.curl module. Currently most of 
the HTTP functionality is done and some very simple Ftp.

I would very much like to know if this has a chance of getting in phobos 
if I finish it with the current design. If not then it will be for my 
own project only and doesn't need as much documentation or all the features.

https://github.com/jcd/phobos/tree/curl

I do know that the error handling is currently not good enough... WIP.

/Jonas


On 11/03/11 16.20, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot of
 things that you can do with libcurl which I did not know so I'm starting
 out small.

 For now I've created all the declarations for the latest public curl C
 api. I have put that in the etc.c.curl module.

 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently works
 but before proceeding further down this road I would like to get your
 comments on it.

 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get("http://www.google.com").content );

 //
 // GET with custom data receiver delegates
 //
 Http http = new Http("http://www.google.dk");
 http.setReceiveHeaderCallback( (string key, string value) {
 writeln(key ~ ":" ~ value);
 } );
 http.setReceiveCallback( (string data) { /* drop */ } );
 http.perform;

 //
 // POST with some timeouts
 //
 http.setUrl("http://www.testing.com/test.cgi");
 http.setReceiveCallback( (string data) { writeln(data); } );
 http.setConnectTimeout(1000);
 http.setDataTimeout(1000);
 http.setDnsTimeout(1000);
 http.setPostData("The quick....");
 http.perform;

 //
 // PUT with data sender delegate
 //
 string msg = "Hello world";
 size_t len = msg.length; /* using chunked transfer if omitted */

 http.setSendCallback( delegate size_t(char[] data) {
 if (msg.empty) return 0;
 auto l = msg.length;
 data[0..l] = msg[0..$];
 msg.length = 0;
 return l;
 },
 HttpMethod.put, len );
 http.perform;

 //
 // HTTPS
 //
 writeln(Http.get("https://mail.google.com").content);

 //
 // FTP
 //
 writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
 "./downloaded-file"));


 // ... authentication, cookies, interface select, progress callback
 // etc. is also implemented this way.


 /Jonas

Mar 13 2011

Johannes Pfau <spam example.com> writes:

Hi,
I really like the API. A few comments:

You use the internal curl progress meter. According to the
documentation (it's a little hidden, look at CURLOPT_NOPROGRESS) the
progress meter is likely to be removed in future curl versions. The
download progress should be easy to reimplement, although you'd have to
parse the Content-Length header. Upload shouldn't be too difficult either
(one problem: what does curl pass as ultotal/dltotal when chunked
encoding is used or the total size is not known?). Then we could also
use different delegates for upload/download.
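
A rough sketch of what reimplementing the download side could look like, assuming the wrapper hands each header to a callback as a key/value pair and each body chunk to a data callback. DownloadProgress, onHeader and onData are hypothetical names, not part of etc.curl. The total stays unknown (0) when no Content-Length arrives, for example with chunked encoding.

```d
import std.conv : to;
import std.string : strip;
import std.uni : icmp;

struct DownloadProgress
{
    size_t total;      // 0 while unknown, e.g. no Content-Length was sent
    size_t received;

    // Feed every response header through here.
    void onHeader(string key, string value)
    {
        // Header names are case-insensitive, so compare with icmp.
        if (icmp(key, "Content-Length") == 0)
            total = value.strip.to!size_t;   // throws ConvException on garbage
    }

    // Feed every body chunk through here.
    void onData(const(ubyte)[] chunk) { received += chunk.length; }

    bool known() const { return total != 0; }

    // Fraction in [0, 1]; NaN while the total is unknown.
    double fraction() const
    {
        return known ? cast(double) received / total : double.nan;
    }
}

unittest
{
    DownloadProgress p;
    p.onHeader("content-length", " 1000 ");
    p.onData(new ubyte[250]);
    assert(p.known);
    assert(p.fraction == 0.25);
}
```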

The callback interface suits curl best and I actually like it, but how
will it interact with streams? As an example: if someone wrote a
stream/filter that decoded gzip for files, it should be usable with
the http streams as well. But files/filestreams have a pull
interface (no callbacks, stream.read() in a loop). So how could a gzip
stream be written without too much code duplication, supporting both files
and the http stuff?

Do you plan to add some kind of support for header parsing? I think
something like what the .net webclient uses
( http://msdn.microsoft.com/en-us/library/system.net.webclient(v=VS.100).aspx )
would be great. Especially the HeaderCollection supporting headers as
strings and as data types (for both parsing and formatting), but
without a class hierarchy for the headers, using templates instead.

I've written D parsers/formatters for almost all headers in
rfc2616 (1 or 2 might be missing) and for a few additional commonly
used headers (Content-Disposition, cookie headers). The parsers are
written with ragel and are to be used with curl (continuations must be
removed and the parsers always take 1 line of input, just as you get it
from curl). Right now only the client side is implemented (no parsers
for headers which can only be sent from client-->server ). However, I
need to add some more documentation to the parsers, need to do
some refactoring and I've got absolutely no time for that in the next 2
weeks ('abitur' final exams). But if you could wait 2 weeks or if
you wanted to do the refactoring yourself, I would be happy to
contribute that code.


-- 
Johannes Pfau

Mar 14 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 14/03/11 12.10, Johannes Pfau wrote:

 Hi,
 I really like the API. A few comments:

 You use the internal curl progress meter. According to the
 documentation (it's a little hidden, look at CURLOPT_NOPROGRESS) the
 progress meter is likely to be removed in future curl versions. The
 download progress should be easy to reimplement, although you'd have to
 parse the Content-Length header. Upload shouldn't be too difficult either
 (one problem: what does curl pass as ultotal/dltotal when chunked
 encoding is used or the total size is not known?). Then we could also
 use different delegates for upload/download.

I did see the notice about the future of NOPROGRESS's removal but 
decided to wrap it anyway. Maybe I should just remove it in an initial 
version. As you say it is pretty simple to implement ourselves.

 The callback interface suits curl best and I actually like it, but how
 will it interact with streams? As an example: if someone wrote a
 stream/filter that decoded gzip for files, it should be usable with
 the http streams as well. But files/filestreams have a pull
 interface (no callbacks, stream.read() in a loop). So how could a gzip
 stream be written without too much code duplication, supporting both files
 and the http stuff?

If we take Andrei's stream proposal as the base of a new streaming 
design then the http would just be another Transport. Files have a pull 
interface that blocks until data is read. The same could be done for
the http class.

What I would really like is for the stream design to support 
non-blocking as mentioned in the stream proposal. Just have to figure 
out how the streaming API should behave in such cases I guess.
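
As a small illustration of bridging the two styles, here is a sketch of a buffer that is filled from a push-style receive callback and drained through a pull-style read(). PullBuffer is a hypothetical helper, not something from etc.curl or the stream proposal, and it does not block: read() just returns 0 when nothing is pending, so a real Transport would still need to drive the transfer (or wait) at that point.

```d
import std.algorithm.comparison : min;

struct PullBuffer
{
    private ubyte[] pending;

    // Push side: what a receive callback would call for each chunk.
    void push(const(ubyte)[] chunk) { pending ~= chunk; }

    // Pull side: copy up to dst.length bytes and return how many were copied.
    size_t read(ubyte[] dst)
    {
        immutable n = min(dst.length, pending.length);
        dst[0 .. n] = pending[0 .. n];
        pending = pending[n .. $];
        return n;
    }
}

unittest
{
    PullBuffer buf;
    ubyte[] first = [1, 2, 3];
    ubyte[] second = [4, 5];
    buf.push(first);
    buf.push(second);

    ubyte[4] dst;
    assert(buf.read(dst[]) == 4 && dst == [1, 2, 3, 4]);
    assert(buf.read(dst[]) == 1 && dst[0] == 5);
    assert(buf.read(dst[]) == 0);   // nothing pending: non-blocking "no data"
}
```

A gzip filter written against read() could then sit on top of either a file or a buffer like this one.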


 Do you plan to add some kind of support for header parsing? I think
 something like what the .net webclient uses
 ( http://msdn.microsoft.com/en-us/library/system.net.webclient(v=VS.100).aspx )
 would be great. Especially the HeaderCollection supporting headers as
 strings and as data types (for both parsing and formatting), but
 without a class hierarchy for the headers, using templates instead.

It would be nice to be able to get/set headers by string and enums 
(http://msdn.microsoft.com/en-us/library/system.net.httprequestheader.aspx). 
But I cannot see that .net is using datatypes or templates for it. Could 
you give me a pointer please?


 I've written D parsers/formatters for almost all headers in
 rfc2616 (1 or 2 might be missing) and for a few additional commonly
 used headers (Content-Disposition, cookie headers). The parsers are
 written with ragel and are to be used with curl (continuations must be
 removed and the parsers always take 1 line of input, just as you get it
 from curl). Right now only the client side is implemented (no parsers
 for headers which can only be sent from client-->server ). However, I
 need to add some more documentation to the parsers, need to do
 some refactoring and I've got absolutely no time for that in the next 2
 weeks ('abitur' final exams). But if you could wait 2 weeks or if
 you wanted to do the refactoring yourself, I would be happy to
 contribute that code.

That sounds very interesting. I would very much like to see the code and 
see if it fits in.

Mar 14 2011

Johannes Pfau <spam example.com> writes:

Jonas Drewsen wrote:
 Do you plan to add some kind of support for header parsing? I think
 something like what the .net webclient uses
 ( http://msdn.microsoft.com/en-us/library/system.net.webclient(v=VS.100).aspx )
 would be great. Especially the HeaderCollection supporting headers as
 strings and as data types (for both parsing and formatting), but
 without a class hierarchy for the headers, using templates instead.

It would be nice to be able to get/set headers by string and enums
(http://msdn.microsoft.com/en-us/library/system.net.httprequestheader.aspx).
But I cannot see that .net is using datatypes or templates for it.
Could you give me a pointer please?

You're right, I didn't look closely enough at the .net documentation. I
thought HttpRequestHeader was a class. What I meant for D was something
like this:

struct ETagHeader
{
    // Data members
    bool Weak = false;
    string Value;

    // All header structs provide these
    static string Key = "ETag";

    static ETagHeader parse(string value)
    {
        // parser logic here
    }

    void format(T)(T writer)
        if (isOutputRange!(T, string))
    {
        if (Weak)
            writer.put("W/");
        assert(Value != "");
        writer.put(quote(Value));
    }
}

Then we can offer methods like these:

void setHeader(T)(T header)
    if (isHeader!T)
{
    headers[T.Key] = formatHeader(header);
}

T getHeader(T)()
    if (isHeader!T)
{
    if (T.Key !in headers)
        throw new Exception("header not found: " ~ T.Key);
    return T.parse(headers[T.Key]);
}

So user code wouldn't have to deal with header parsing / formatting:
auto etag = client.getHeader!ETagHeader();
assert(etag.Weak);
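
For anyone who wants to try the idea, here is a self-contained version of the same sketch that should compile as-is. The parse logic is a deliberately naive stand-in (not the ragel parser), and HeaderMap, isHeader, set and get are names chosen for this example only.

```d
import std.algorithm.searching : startsWith;
import std.array : appender;
import std.exception : enforce;
import std.range.primitives : isOutputRange, put;

struct ETagHeader
{
    enum string key = "ETag";
    bool weak;
    string value;

    // Naive parser: optional W/ prefix followed by a quoted string.
    static ETagHeader parse(string raw)
    {
        ETagHeader h;
        if (raw.startsWith("W/"))
        {
            h.weak = true;
            raw = raw[2 .. $];
        }
        enforce(raw.length >= 2 && raw[0] == '"' && raw[$ - 1] == '"',
                "ETag value must be a quoted string");
        h.value = raw[1 .. $ - 1];
        return h;
    }

    void format(W)(ref W writer) const
        if (isOutputRange!(W, string))
    {
        if (weak) put(writer, "W/");
        put(writer, `"`);
        put(writer, value);
        put(writer, `"`);
    }
}

// A type counts as a header if it has a string key and a matching parse().
enum isHeader(T) = is(typeof(T.key) : string) && is(typeof(T.parse("")) == T);

struct HeaderMap
{
    string[string] raw;

    void set(T)(T header) if (isHeader!T)
    {
        auto buf = appender!string();
        header.format(buf);
        raw[T.key] = buf.data;
    }

    T get(T)() if (isHeader!T)
    {
        auto p = T.key in raw;
        enforce(p !is null, "missing header " ~ T.key);
        return T.parse(*p);
    }
}

unittest
{
    HeaderMap m;
    m.set(ETagHeader(true, "abc"));
    assert(m.raw["ETag"] == `W/"abc"`);

    auto etag = m.get!ETagHeader();
    assert(etag.weak && etag.value == "abc");
}
```

The template constraint keeps plain strings out of set/get while avoiding a class hierarchy: any struct with a key and a parse() qualifies.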

 I've written D parsers/formatters for almost all headers in
 rfc2616 (1 or 2 might be missing) and for a few additional commonly
 used headers (Content-Disposition, cookie headers). The parsers are
 written with ragel and are to be used with curl (continuations must
 be removed and the parsers always take 1 line of input, just as you
 get it from curl). Right now only the client side is implemented (no
 parsers for headers which can only be sent from client-->server ).
 However, I need to add some more documentation to the parsers, need
 to do some refactoring and I've got absolutely no time for that in
 the next 2 weeks ('abitur' final exams). But if you could wait 2
 weeks or if you wanted to do the refactoring yourself, I would be
 happy to contribute that code.

That sounds very interesting. I would very much like to see the code
and see if it fits in.

Ok, here it is, but it seriously needs to be refactored and documented:
https://gist.github.com/869324

-- 
Johannes Pfau

Mar 14 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

Seems like a very nice addition. I will have a look at your github and 
probably wait until you have made it ready for consumption before adding 
it :)

Mar 14 2011

Jacob Carlborg <doob me.com> writes:

I thought that the "etc" package was for C bindings and would expect the 
"curl" module to be placed in std.curl or std.net.curl.

-- 
/Jacob Carlborg

Mar 14 2011

Johannes Pfau <spam example.com> writes:

I looked at the code again and I have two more suggestions:

1.) Would it be useful to have a headersReceived callback which would be
called when all headers have been received (when the data callback is
called the first time)? I think of a situation where you don't know
what data the server will return: a few KB html which you can easily
keep in memory or a huge file which you'd have to save to disk. You
can only know that if the headers have been received. It would also be
possible to do that by just overwriting the headerCallback and looking
out for the ContentLength/ContentType header, but I think it should
also work with the default headerCallback.

2.) As far as I can see you store the http headers in a case-sensitive way
(res.headers[key] ~= value;). This means "Content-Length" vs
"content-length" would produce two entries in the array, which makes
it difficult to get the header from the associative array. It may be
useful to keep the original casing, but probably not in the array key.

BTW: According to RFC2616 the only headers which are allowed
to be included multiple times in the response must consist of comma
separated lists. So in theory we could keep a simple string[string]
list and if we see a header twice we can just merge it with a ','.

http://tools.ietf.org/html/rfc2616#section-4.2
Relevant part from the RFC:
----------------------
   Multiple message-header fields with the same field-name MAY be
   present in a message if and only if the entire field-value for that
   header field is defined as a comma-separated list [i.e., #(values)].

   It MUST be possible to combine the multiple header fields into one
   "field-name: field-value" pair, without changing the semantics of the
   message, by appending each subsequent field-value to the first, each
   separated by a comma. The order in which header fields with the same
   field-name are received is therefore significant to the
   interpretation of the combined field value, and thus a proxy MUST NOT
   change the order of these field values when a message is forwarded.
----------------------

I'm also done with the first pass through the http parsers.
Documentation is here:
http://dl.dropbox.com/u/24218791/std.protocol.http/http/http.html

Code here:
https://gist.github.com/886612
The http.d file is generated from the http.d.rl file.

--
Johannes Pfau

Mar 25 2011

Johannes Pfau <spam example.com> writes:

I added some code to show how I think this could be used in the HTTP
client:
https://gist.github.com/886612#file_gistfile1.d

Like in the .net webclient we'd need two of these collections: one for
received headers and one for headers to be sent.
--
Johannes Pfau

Mar 25 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 25/03/11 12.07, Johannes Pfau wrote:
 I added some code to show how I think this could be used in the HTTP
 client:
 https://gist.github.com/886612#file_gistfile1.d

 Like in the .net webclient we'd need two of these collections: one for
 received headers and one for headers to be sent.

Thanks!

It would be very nice to have std.protocol.http in phobos so that 
the curl wrapper could use it. If that happened, then 
std.protocol.{smtp,imap,....} could probably also be built on your 
framework and be supported by the curl wrappers.

/Jonas

Mar 27 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 25/03/11 10.54, Johannes Pfau wrote:
 I looked at the code again and I got 2 more suggestions:

 1.) Would it be useful to have a headersReceived callback which would be
 called when all headers have been received (when the data callback is
 called the first time)? I think of a situation where you don't know
 what data the server will return: a few KB html which you can easily
 keep in memory or a huge file which you'd have to save to disk. You
 can only know that if the headers have been received. It would also be
 possible to do that by just overwriting the headerCallback and looking
 out for the ContentLength/ContentType header, but I think it should
 also work with the default headerCallback.

I'm a little confused as to what a headersReceived(string[string] 
headers) would give you compared to the onReceiveHeader(const(char)[], 
const(char)[]) callback that exists today in the example.

The headersReceived callback would probably lookup the content-length 
header and set a flag about whether to save content to file or memory.

The existing onReceiveHeader could do the same by setting the flag when 
it receives the content-length field.
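
To make that second option concrete, here is a minimal sketch. BodySink, memoryLimit and saveToFile are hypothetical names made up for this example and are not part of etc.curl; only the callback shape onReceiveHeader(const(char)[], const(char)[]) is taken from the discussion above.

```d
import std.conv : to, ConvException;
import std.string : strip;
import std.uni : icmp;

/// Hypothetical helper (not part of etc.curl): remembers whether the body
/// looks too large to be buffered in memory.
struct BodySink
{
    size_t memoryLimit = 64 * 1024; // assumed threshold, pick your own
    bool saveToFile;                // set once Content-Length has been seen

    /// Same shape as the per-header callback discussed in this thread.
    void onReceiveHeader(const(char)[] key, const(char)[] value)
    {
        // Header field names are case insensitive, so compare with icmp.
        if (icmp(key, "content-length") != 0)
            return;
        try
        {
            saveToFile = to!size_t(value.strip) > memoryLimit;
        }
        catch (ConvException)
        {
            saveToFile = true; // unparsable length: be conservative
        }
    }
}

unittest
{
    BodySink sink;
    sink.onReceiveHeader("Content-Type", "text/html");
    assert(!sink.saveToFile);
    sink.onReceiveHeader("content-LENGTH", " 1048576 ");
    assert(sink.saveToFile);
}
```

The flag would then be checked when the first chunk of body data arrives, which is the same moment a separate headersReceived callback would fire.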

Or maybe I'm misunderstanding you?


 2.)
 As far as I can see you store the http headers in a case sensitive way.
 (res.headers[key] ~= value;). This means "Content-Length" vs
 "content-length" would produce two entries in the array and it makes
 it difficult to get the header from the associative array. It is maybe
 useful to keep the original casing, but probably not in the array key.

 BTW: According to RFC2616 the only headers which are allowed
 to be included multiple times in the response must consist of comma
 separated lists. So in theory we could keep a simple string[string]
 list and if we see a header twice we can just merge it with a ','.

 http://tools.ietf.org/html/rfc2616#section-4.2
 Relevant part from the RFC:
 ----------------------
     Multiple message-header fields with the same field-name MAY be
     present in a message if and only if the entire field-value for that
     header field is defined as a comma-separated list [i.e., #(values)].

     It MUST be possible to combine the multiple header fields into one
     "field-name: field-value" pair, without changing the semantics of the
     message, by appending each subsequent field-value to the first, each
     separated by a comma. The order in which header fields with the same
     field-name are received is therefore significant to the
     interpretation of the combined field value, and thus a proxy MUST NOT
     change the order of these field values when a message is forwarded.
 ----------------------

I will surely implement this combined value functionality. I also noted 
that header field names are case insensitive. This means that they could 
just be stored internally in lower case, and the documentation could 
specify lower case for looking up by field name.
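
Roughly, the storage could look like the sketch below. HeaderMap, add and get are hypothetical names for illustration only, not the actual etc.curl code; the merge rule is the one from RFC 2616 section 4.2 quoted above.

```d
import std.uni : toLower;

/// Hypothetical container (not the etc.curl implementation).
struct HeaderMap
{
    private string[string] fields; // keys are stored lower-cased

    /// Adds a header; a repeated field name is merged with a comma.
    void add(const(char)[] name, const(char)[] value)
    {
        string key = toLower(name).idup;
        if (auto existing = key in fields)
            *existing ~= "," ~ value.idup;
        else
            fields[key] = value.idup;
    }

    /// Case-insensitive lookup; returns fallback if the field is absent.
    string get(const(char)[] name, string fallback = null) const
    {
        string key = toLower(name).idup;
        auto p = key in fields;
        return p ? *p : fallback;
    }
}

unittest
{
    HeaderMap h;
    h.add("Content-Length", "42");
    h.add("Accept", "text/html");
    h.add("ACCEPT", "text/plain");
    assert(h.get("content-length") == "42");
    assert(h.get("Accept") == "text/html,text/plain");
    assert(h.get("X-Missing") is null);
}
```

One caveat with blind merging: Set-Cookie is commonly sent several times and its values can contain commas, so a real implementation would probably have to special-case it.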


 I'm also done with the first pass through the http parsers.
 Documentation is here:
 http://dl.dropbox.com/u/24218791/std.protocol.http/http/http.html

 Code here:
 https://gist.github.com/886612
 The http.d file is generated from the http.d.rl file.

This is a nice protocol parser. I would very much like it to be used 
with the curl API but without it being a dependency. This is already 
possible now using the onReceiveHeader callback and this would decouple 
the two. At least until std.protocol.http is in phobos as well - at that 
point convenience methods could be added :)

/Jonas

Mar 27 2011

Johannes Pfau <spam example.com> writes:

Jonas Drewsen wrote:
This is a nice protocol parser. I would very much like it to be used
with the curl API but without it being a dependency. This is already
possible now using the onReceiveHeader callback and this would
decouple the two. At least until std.protocol.http is in phobos as
well - at that point convenience methods could be added :)

/Jonas

Thanks, I think I'll propose the parser for the new experimental
namespace when it's available.

About the headersReceived callback: You're totally right, it can be
done with the onReceiveHeader callback right now. But I think in the
common case the user wants the headers in a key/value array. So if the
user doesn't want to use the onReceiveHeader api, a headersReceived
callback would probably be convenient. But, as I said, it's not necessary.

Reading the curl documentation showed another small trap:
CURLOPT_HEADERFUNCTION
------------------------------------------------------------
It's important to note that the callback will be invoked for the
headers of all responses received after initiating a request and not
just the final response. This includes all responses which occur during
authentication negotiation. If you need to operate on only the headers
from the final response, you will need to collect headers in the
callback yourself and use HTTP status lines, for example, to delimit
response boundaries.
------------------------------------------------------------

I think if we store the headers into an array, we should only store the
headers of the final response. Another question is should all headers
or only final headers trigger the onReceiveHeader callback? Passing
only the final headers would require extra work, passing all headers
should at least be documented.

Thinking of this more, this also means the _receiveHeaderCallback is
not 100% correct, as it expects all lines after the first line to be
header or empty lines, but it's possible that we get multiple statuslines.
It still works, the regex doesn't match anything and the code
ignores that line. But this way, the stored statusline will always be
the first statusline, which isn't optimal. We'd also need to detect if a
line is a statusline to reset the headers array if it's used. Seems
like we have to think about this some more.
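
As a starting point, a minimal sketch of the "reset on status line" idea. FinalHeaders and onHeaderLine are hypothetical names, and the struct is assumed to be fed the raw lines that libcurl hands to CURLOPT_HEADERFUNCTION; it is not the code from the curl branch.

```d
import std.algorithm.searching : startsWith;
import std.string : indexOf, strip;

/// Hypothetical collector (not from etc.curl): keeps only the status line
/// and headers of the most recent response.
struct FinalHeaders
{
    string statusLine;
    string[string] headers;

    void onHeaderLine(const(char)[] line)
    {
        line = line.strip;
        if (line.startsWith("HTTP/"))
        {
            // A new status line starts a new response: drop the old one.
            statusLine = line.idup;
            headers = null;
            return;
        }
        auto colon = line.indexOf(':');
        if (colon < 0)
            return; // blank separator line or something that is no header
        headers[line[0 .. colon].strip.idup] = line[colon + 1 .. $].strip.idup;
    }
}

unittest
{
    FinalHeaders fh;
    foreach (line; ["HTTP/1.1 401 Unauthorized\r\n",
                    "WWW-Authenticate: Basic realm=\"x\"\r\n",
                    "\r\n",
                    "HTTP/1.1 200 OK\r\n",
                    "Content-Length: 12\r\n",
                    "\r\n"])
        fh.onHeaderLine(line);
    assert(fh.statusLine == "HTTP/1.1 200 OK");
    assert(fh.headers.length == 1);
    assert(fh.headers["Content-Length"] == "12");
}
```

This keeps the last status line instead of the first one, and the array never mixes headers from the authentication round trips with those of the final response.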

--
Johannes Pfau

Mar 29 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 29/03/11 17.31, Johannes Pfau wrote:
 Jonas Drewsen wrote:
 This is a nice protocol parser. I would very much like it to be used
 with the curl API but without it being a dependency. This is already
 possible now using the onReceiveHeader callback and this would
 decouple the two. At least until std.protocol.http is in phobos as
 well - at that point convenience methods could be added :)

 /Jonas

 Thanks, I think I'll propose the parser for the new experimental
 namespace when it's available.

I'm looking forward to that.

 About the headersReceived callback: You're totally right, it can be
 done with the onReceiveHeader callback right now. But I think in the
 common case the user wants the headers in an key/value array. So if the
 user doesn't want to use the onReceiveHeader api, a headersReceived
 callback would probably be convenient. But, as said it's not necessary.

I'll put it on my todo and reconsider when I get to it :)

 Reading the curl documentation showed another small trap:
 CURLOPT_HEADERFUNCTION
 ------------------------------------------------------------
 It's important to note that the callback will be invoked for the
 headers of all responses received after initiating a request and not
 just the final response. This includes all responses which occur during
 authentication negotiation. If you need to operate on only the headers
 from the final response, you will need to collect headers in the
 callback yourself and use HTTP status lines, for example, to delimit
 response boundaries.
 ------------------------------------------------------------

 I think if we store the headers into an array, we should only store the
 headers of the final response. Another question is should all headers
 or only final headers trigger the onReceiveHeader callback? Passing
 only the final headers would require extra work, passing all headers
 should at least be documented.

Yeah... I've discovered this myself as well.

The current implementation does what libcurl does and passes on all 
headers, not just those for the final subrequest.

 Thinking of this more, this also means the _receiveHeaderCallback is
 not 100% correct, as it expects all lines after the first line to be
 header or empty lines, but it's possible that we get multiple statuslines.
 It still works, the regex doesn't match anything and the code
 ignores that line. But this way, the stored statusline will always be
 the first statusline, which isn't optimal. We'd also need to detect if a
 line is a statusline to reset the headers array if it's used. Seems
 like we have to think about this some more.

My local version already takes care of this. It was the wrong place for 
parsing status lines and headers anyway. It is now moved to the Http 
class where it should have been all the time.

I have implemented almost all of the features/changes suggested now. The 
last one I'm currently fighting is the support for "foreach" and async 
.byLine/.byChunk. I may have to make some changes in the current design 
to support this with the calling API that I would like to expose.

I wonder who could take the step and open a std.experimental package for 
submissions?

Thank you for the feedback!

Mar 30 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/11/11 9:20 AM, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot of
 things that you can do with libcurl which I did not know so I'm starting
 out small.

 For now I've created all the declarations for the latest public curl C
 api. I have put that in the etc.c.curl module.

Great! Could you please create a pull request for that?

 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently works
 but before proceeding further down this road I would like to get your
 comments on it.

 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get("http://www.google.com").content );

Sweet. As has been discussed, often the content is not text so you may 
want to have content return ubyte[] and add a new property such as 
"textContent" or "text".

 //
 // GET with custom data receiver delegates
 //
 Http http = new Http("http://www.google.dk");

You'll probably need to justify the existence of a class hierarchy and 
what overridable methods there are. In particular, since you seem to 
offer hooks via delegates, probably classes wouldn't be needed at all. 
(FWIW I would've done the same; I wouldn't want to inherit just to 
intercept the headers etc.)

 http.setReceiveHeaderCallback( (string key, string value) {
 writeln(key ~ ":" ~ value);
 } );
 http.setReceiveCallback( (string data) { /* drop */ } );
 http.perform;

As discussed, properties may be better here than setXxx and getXxx. The 
setReceiveCallback hook should take a ubyte[]. The 
setReceiveHeaderCallback should take a const(char)[]. That way you won't 
need to copy all headers, and that option is safely left to the client.
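
A minimal sketch of what that buys, under the assumption that the library reuses one buffer for all headers. HeaderHandler and feedHeaders are hypothetical stand-ins for the library side, not the real etc.curl API.

```d
import std.string : indexOf;

alias HeaderHandler = void delegate(const(char)[] key, const(char)[] value);

/// Hypothetical stand-in for the library side: one buffer is reused for
/// every header, so the slices passed out are only valid during the call.
void feedHeaders(HeaderHandler handler)
{
    char[64] buffer;
    foreach (raw; ["Server: demo", "Content-Type: text/html"])
    {
        buffer[0 .. raw.length] = raw[];
        auto colon = buffer[0 .. raw.length].indexOf(':');
        // Skip the colon and the single space that follows it.
        handler(buffer[0 .. colon], buffer[colon + 2 .. raw.length]);
    }
}

unittest
{
    string contentType; // the only header this client wants to keep
    feedHeaders((key, value) {
        if (key == "Content-Type")
            contentType = value.idup; // explicit copy, made by the client
    });
    assert(contentType == "text/html");
}
```

With string parameters the library would have to allocate an immutable copy of every header; with const(char)[] only the client that wants to keep a value pays for the .idup.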

 //
 // POST with some timouts
 //
 http.setUrl("http://www.testing.com/test.cgi");
 http.setReceiveCallback( (string data) { writeln(data); } );
 http.setConnectTimeout(1000);
 http.setDataTimeout(1000);
 http.setDnsTimeout(1000);
 http.setPostData("The quick....");
 http.perform;

setPostData -> setTextPostData, and then changing everything to 
properties would make it something like textPostData. Or wait, there 
could be some overloading going on... Anyway, the basic idea is that 
generally get and post data could be raw bytes, and the user could elect 
to transfer strings instead.

 //
 // PUT with data sender delegate
 //
 string msg = "Hello world";
 size_t len = msg.length; /* using chuncked transfer if omitted */

 http.setSendCallback( delegate size_t(char[] data) {
 if (msg.empty) return 0;
 auto l = msg.length;
 data[0..l] = msg[0..$];
 msg.length = 0;
 return l;
 },
 HttpMethod.put, len );
 http.perform;

The callback would take ubyte[].

 //
 // HTTPS
 //
 writeln(Http.get("https://mail.google.com").content);

 //
 // FTP
 //
 writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
 "./downloaded-file"));


 // ... authenication, cookies, interface select, progress callback
 // etc. is also implemented this way.


 /Jonas

This is all very encouraging. I think this API covers nicely a variety 
of needs. We need to make sure everything interacts well with threads, 
in particular that one can shut down a transfer (or the entire library) 
from a thread or callback and have the existing transfer(s) throw an 
exception immediately.

Regarding a range interface, it would be great if you allowed e.g.

foreach (line; Http.get("https://mail.google.com").byLine()) {
    ...
}

The data transfer should happen concurrently with the foreach code. The 
type of line is char[] or const(char)[]. Similarly, there would be a 
byChunk interface that transfers in ubyte[] chunks.
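
To illustrate what "concurrently" could mean here, a minimal sketch using std.concurrency. AsyncLines and fakeTransfer are hypothetical names; the worker only sends a few fixed lines where a real wrapper would call into libcurl, and the lines are strings so that they can be passed between threads safely.

```d
import std.concurrency : Tid, receive, send, spawn, thisTid;

/// Stand-in for the network side: sends a few lines to the owner thread,
/// followed by an end-of-transfer marker.
void fakeTransfer(Tid owner)
{
    foreach (line; ["first line", "second line", "third line"])
        owner.send(line);
    owner.send(true); // end-of-transfer marker
}

/// Input range whose elements arrive from a worker thread while the
/// consumer is busy with the previous element.
struct AsyncLines
{
    private string current;
    private bool done;

    this(void function(Tid) worker)
    {
        spawn(worker, thisTid);
        popFront(); // block until the first line (or the end) arrives
    }

    @property bool empty() const { return done; }
    @property string front() const { return current; }

    void popFront()
    {
        receive(
            (string line) { current = line; },
            (bool _) { done = true; }
        );
    }
}

unittest
{
    string[] seen;
    foreach (line; AsyncLines(&fakeTransfer))
        seen ~= line;
    assert(seen == ["first line", "second line", "third line"]);
}
```

The transfer keeps running in the worker while the foreach body executes; popFront only blocks when the consumer is faster than the network.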

Also we need a head() method for the corresponding command.


Andrei

Mar 13 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 13/03/11 23.44, Andrei Alexandrescu wrote:
 On 3/11/11 9:20 AM, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot of
 things that you can do with libcurl which I did not know so I'm starting
 out small.

 For now I've created all the declarations for the latest public curl C
 api. I have put that in the etc.c.curl module.

 Great! Could you please create a pull request for that?

Will do as soon as I've figured out how to create a pull request for a 
single file in a branch. Does anyone know how to do that on github? Or 
should I just create a pull request including the etc.curl wrapper as well?

 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently works
 but before proceeding further down this road I would like to get your
 comments on it.

 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get("http://www.google.com").content );

 Sweet. As has been discussed, often the content is not text so you may
 want to have content return ubyte[] and add a new property such as
 "textContent" or "text".

I've already changed it to void[] as done in the std.file module. Is 
ubyte[] better suited?

I'll add a text property as well.


 //
 // GET with custom data receiver delegates
 //
 Http http = new Http("http://www.google.dk");

 You'll probably need to justify the existence of a class hierarchy and
 what overridable methods there are. In particular, since you seem to
 offer hooks via delegates, probably classes wouldn't be needed at all.
 (FWIW I would've done the same; I wouldn't want to inherit just to
 intercept the headers etc.)

 http.setReceiveHeaderCallback( (string key, string value) {
 writeln(key ~ ":" ~ value);
 } );
 http.setReceiveCallback( (string data) { /* drop */ } );
 http.perform;

 As discussed, properties may be better here than setXxx and getXxx. The
 setReceiveCallback hook should take a ubyte[]. The
 setReceiveHeaderCallback should take a const(char)[]. That way you won't
 need to copy all headers, leaving safely that option to the client.

I've already replaced the set/get methods with properties and renamed 
them. Hadn't thought of using const(char)[].. thanks for the hint.


 //
 // POST with some timouts
 //
 http.setUrl("http://www.testing.com/test.cgi");
 http.setReceiveCallback( (string data) { writeln(data); } );
 http.setConnectTimeout(1000);
 http.setDataTimeout(1000);
 http.setDnsTimeout(1000);
 http.setPostData("The quick....");
 http.perform;

 setPostData -> setTextPostData, and then changing everything to
 properties would make it something like textPostData. Or wait, there
 could be some overloading going on... Anyway, the basic idea is that
 generally get and post data could be raw bytes, and the user could elect
 to transfer strings instead.

I'll make sure both text and byte[]/void[] versions will be available.

 //
 // PUT with data sender delegate
 //
 string msg = "Hello world";
 size_t len = msg.length; /* using chuncked transfer if omitted */

 http.setSendCallback( delegate size_t(char[] data) {
 if (msg.empty) return 0;
 auto l = msg.length;
 data[0..l] = msg[0..$];
 msg.length = 0;
 return l;
 },
 HttpMethod.put, len );
 http.perform;

 The callback would take ubyte[].

Already fixed.


 //
 // HTTPS
 //
 writeln(Http.get("https://mail.google.com").content);

 //
 // FTP
 //
 writeln(Ftp.get("ftp://ftp.digitalmars.com/sieve.ds",
 "./downloaded-file"));


 // ... authenication, cookies, interface select, progress callback
 // etc. is also implemented this way.


 /Jonas

 This is all very encouraging. I think this API covers nicely a variety
 of needs. We need to make sure everything interacts well with threads,
 in particular that one can shut down a transfer (or the entire library)
 from a thread or callback and have the existing transfer(s) throw an
 exception immediately.

I'll have a look at it.


 Regarding a range interface, it would be great if you allowed e.g.

 foreach (line; Http.get("https://mail.google.com").byLine()) {
 ...
 }

 The data transfer should happen concurrently with the foreach code. The
 type of line is char[] or const(char)[]. Similarly, there would be a
 byChunk interface that transfers in ubyte[] chunks.

 Also we need a head() method for the corresponding command.

 Andrei

That would be neat. What do you mean by concurrent data transfers 
with foreach?


/Jonas

Mar 14 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday 14 March 2011 02:16:12 Jonas Drewsen wrote:
 On 13/03/11 23.44, Andrei Alexandrescu wrote:
 On 3/11/11 9:20 AM, Jonas Drewsen wrote:
 Hi,
 
 So I've spent some time trying to wrap libcurl for D. There is a lot of
 things that you can do with libcurl which I did not know so I'm starting
 out small.
 
 For now I've created all the declarations for the latest public curl C
 api. I have put that in the etc.c.curl module.

 
 Great! Could you please create a pull request for that?

 
 Will do as soon as I've figured out howto create a pull request for a
 single file in a branch. Anyone knows how to do that on github? Or
 should I just create a pull request including the etc.curl wrapper as well?

You can't. A pull request is for an entire branch. It pulls _everything_ from 
that branch which differs from the one being merged with. git cares about 
commits, not files. And pulling from another repository pulls all of the 
commits which you don't have. So, if you want to do a pull request, you create 
a branch with exactly the commits that you wanted merged in on it. No more, no 
less.

 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently works
 but before proceeding further down this road I would like to get your
 comments on it.
 
 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get("http://www.google.com").content );

 
 Sweet. As has been discussed, often the content is not text so you may
 want to have content return ubyte[] and add a new property such as
 "textContent" or "text".

 
 I've already changed it to void[] as done in the std.file module. Is
 ubyte[] better suited?

That's debatable. Some would argue one way, some another. Personally, I'd argue
ubyte[]. I don't like void[] one bit. Others would agree with me, and yet
others would disagree. I don't think that there's really a general agreement on
whether void[] or ubyte[] is better when it comes to reading binary data like
that.

- Jonathan M Davis

Mar 14 2011

"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:

On Mon, 14 Mar 2011 02:36:07 -0700, Jonathan M Davis wrote:

 On Monday 14 March 2011 02:16:12 Jonas Drewsen wrote:
 On 13/03/11 23.44, Andrei Alexandrescu wrote:
 On 3/11/11 9:20 AM, Jonas Drewsen wrote:
 Hi,
 
 So I've spent some time trying to wrap libcurl for D. There is a lot
 of things that you can do with libcurl which I did not know so I'm
 starting out small.
 
 For now I've created all the declarations for the latest public curl
 C api. I have put that in the etc.c.curl module.

 
 Great! Could you please create a pull request for that?

 
 Will do as soon as I've figured out how to create a pull request for a
 single file in a branch. Does anyone know how to do that on GitHub? Or
 should I just create a pull request including the etc.curl wrapper as
 well?

 
 You can't. A pull request is for an entire branch. It pulls _everything_
 from that branch which differs from the one being merged with. git cares
 about commits, not files. And pulling from another repository pulls all
 of the commits which you don't have. So, if you want to do a pull
 request, you create a branch with exactly the commits that you wanted
 merged in on it. No more, no less.
 
 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently
 works but before proceeding further down this road I would like to
 get your comments on it.
 
 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get("http://www.google.com").content );

 
 Sweet. As has been discussed, often the content is not text so you
 may want to have content return ubyte[] and add a new property such
 as "textContent" or "text".

 
 I've already changed it to void[] as done in the std.file module. Is
 ubyte[] better suited?

 
 That's debatable. Some would argue one way, some another. Personally,
 I'd argue ubyte[]. I don't like void[] one bit. Others would agree with
 me, and yet others would disagree. I don't think that there's really a
 general agreement on whether void[] or ubyte[] is better when it comes
 to reading binary data like that.

I also think ubyte[] is best, because:

1. It can be used directly.  (You can't get an element from a void[] 
array without casting it to something else first.)

2. There are no assumptions about the type of data contained in the 
array.  (char[] arrays are assumed to be UTF-8 encoded.)

3. ubyte[] arrays are (AFAIK) not scanned by the GC.  (void[] arrays may 
contain pointers and must therefore be scanned.)

I think the rule of thumb should be:  If the array contains raw data of 
unspecified type, but no pointers or references, use ubyte[].  

void[] is very useful for input parameters, however, since all arrays are 
implicitly castable to void[]:

  void writeData(void[] data) { ... }

  writeData("Hello World!");
  writeData([1, 2, 3, 4]);
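
To make point 1 concrete, here is a minimal sketch (not part of the proposed wrapper; `sumBytes` is a hypothetical helper). It shows that elements of a ubyte[] can be read directly, while a void[] has to be cast back to a concrete element type first:

```d
import std.stdio;

// Hypothetical helper: adds up the bytes of a raw buffer.
uint sumBytes(const(ubyte)[] data)
{
    uint total;
    foreach (b; data)
        total += b;
    return total;
}

void main()
{
    ubyte[] raw = [1, 2, 3, 4];
    writeln(raw[0] + raw[1]);          // elements of a ubyte[] are usable as-is

    void[] opaque = raw;               // a mutable array converts implicitly
    // Reading an element through the void[] is rejected at compile time.
    static assert(!__traits(compiles, { ubyte b = opaque[0]; }));

    auto back = cast(ubyte[]) opaque;  // a cast is needed before any use
    writeln(sumBytes(back));
}
```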

-Lars

Mar 14 2011

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 14 Mar 2011 07:20:26 -0400, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 On Mon, 14 Mar 2011 02:36:07 -0700, Jonathan M Davis wrote:

 On Monday 14 March 2011 02:16:12 Jonas Drewsen wrote:
 On 13/03/11 23.44, Andrei Alexandrescu wrote:
 On 3/11/11 9:20 AM, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot
 of things that you can do with libcurl which I did not know so I'm
 starting out small.

 For now I've created all the declarations for the latest public curl
 C api. I have put that in the etc.c.curl module.

 Great! Could you please create a pull request for that?

 Will do as soon as I've figured out how to create a pull request for a
 single file in a branch. Does anyone know how to do that on GitHub? Or
 should I just create a pull request including the etc.curl wrapper as
 well?

 You can't. A pull request is for an entire branch. It pulls _everything_
 from that branch which differs from the one being merged with. git cares
 about commits, not files. And pulling from another repository pulls all
 of the commits which you don't have. So, if you want to do a pull
 request, you create a branch with exactly the commits that you wanted
 merged in on it. No more, no less.

 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently
 works but before proceeding further down this road I would like to
 get your comments on it.

 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get("http://www.google.com").content );

 Sweet. As has been discussed, often the content is not text so you
 may want to have content return ubyte[] and add a new property such
 as "textContent" or "text".

 I've already changed it to void[] as done in the std.file module. Is
 ubyte[] better suited?

 That's debatable. Some would argue one way, some another. Personally,
 I'd argue ubyte[]. I don't like void[] one bit. Others would agree with
 me, and yet others would disagree. I don't think that there's really a
 general agreement on whether void[] or ubyte[] is better when it comes
 to reading binary data like that.

 I also think ubyte[] is best, because:

 1. It can be used directly.  (You can't get an element from a void[]
 array without casting it to something else first.)

 2. There are no assumptions about the type of data contained in the
 array.  (char[] arrays are assumed to be UTF-8 encoded.)

 3. ubyte[] arrays are (AFAIK) not scanned by the GC.  (void[] arrays may
 contain pointers and must therefore be scanned.)

This isn't exactly true.  arrays *created* as void[] will be scanned.   
Arrays created as ubyte[] and then cast to void[] will not be scanned.

However, it is far too easy while dealing with a void[] array to have it  
mysteriously flip its bit to scan-able.

 I think the rule of thumb should be:  If the array contains raw data of
 unspecified type, but no pointers or references, use ubyte[].

 void[] is very useful for input parameters, however, since all arrays are
 implicitly castable to void[]:

   void writeData(void[] data) { ... }

   writeData("Hello World!");
   writeData([1, 2, 3, 4]);

I think (and this differs from  my previous opinion) const(void)[] should  
be used for input parameters where any array type could be passed in.   
However, ubyte[] should be used for output parameters and for internal  
storage.  void[] just has too many pitfalls to be used anywhere but where  
its implicit casting ability is useful.
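
A small sketch of that split, under the assumption that inputs are only read and outputs are plain bytes. The names `byteLength`, `copyBytes` and `mutableOnly` are made up for illustration. Note that a string literal is immutable, so it converts to const(void)[] but not to a mutable void[] parameter like the one in the quoted writeData example:

```d
import std.stdio;

// Input parameter: const(void)[] accepts any array, string literals included.
size_t byteLength(const(void)[] data)
{
    return data.length; // the length of a void[] is counted in bytes
}

// Output: hand back plain bytes that the caller can index directly.
// (Hypothetical function; it simply copies its input.)
ubyte[] copyBytes(const(void)[] data)
{
    return (cast(const(ubyte)[]) data).dup;
}

// A mutable void[] parameter, for comparison.
void mutableOnly(void[] data) {}

void main()
{
    int[] numbers = [1, 2, 3, 4];
    writeln(byteLength("Hello World!")); // one byte per char
    writeln(byteLength(numbers));        // four ints, int.sizeof bytes each

    // An immutable string cannot be passed where mutable void[] is expected.
    static assert(!__traits(compiles, mutableOnly("Hello World!")));

    auto bytes = copyBytes("abc");
    writeln(bytes[0]);                   // no cast needed on the output side
}
```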

-Steve

Mar 14 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 14/03/11 13.28, Steven Schveighoffer wrote:
 On Mon, 14 Mar 2011 07:20:26 -0400, Lars T. Kyllingstad
 <public kyllingen.nospamnet> wrote:

 On Mon, 14 Mar 2011 02:36:07 -0700, Jonathan M Davis wrote:

 On Monday 14 March 2011 02:16:12 Jonas Drewsen wrote:
 On 13/03/11 23.44, Andrei Alexandrescu wrote:
 On 3/11/11 9:20 AM, Jonas Drewsen wrote:
 Hi,

 So I've spent some time trying to wrap libcurl for D. There is a lot
 of things that you can do with libcurl which I did not know so I'm
 starting out small.

 For now I've created all the declarations for the latest public curl
 C api. I have put that in the etc.c.curl module.

 Great! Could you please create a pull request for that?

 Will do as soon as I've figured out how to create a pull request for a
 single file in a branch. Does anyone know how to do that on GitHub? Or
 should I just create a pull request including the etc.curl wrapper as
 well?

 You can't. A pull request is for an entire branch. It pulls _everything_
 from that branch which differs from the one being merged with. git cares
 about commits, not files. And pulling from another repository pulls all
 of the commits which you don't have. So, if you want to do a pull
 request, you create a branch with exactly the commits that you wanted
 merged in on it. No more, no less.

 On top of that I've created a more D like api as seen below. This is
 located in the 'etc.curl' module. What you can see below currently
 works but before proceeding further down this road I would like to
 get your comments on it.

 //
 // Simple HTTP GET with sane defaults
 // provides the .content, .headers and .status
 //
 writeln( Http.get("http://www.google.com").content );

 Sweet. As has been discussed, often the content is not text so you
 may want to have content return ubyte[] and add a new property such
 as "textContent" or "text".

 I've already changed it to void[] as done in the std.file module. Is
 ubyte[] better suited?

 That's debatable. Some would argue one way, some another. Personally,
 I'd argue ubyte[]. I don't like void[] one bit. Others would agree with
 me, and yet others would disagree. I don't think that there's really a
 general agreement on whether void[] or ubyte[] is better when it comes
 to reading binary data like that.

 I also think ubyte[] is best, because:

 1. It can be used directly. (You can't get an element from a void[]
 array without casting it to something else first.)

 2. There are no assumptions about the type of data contained in the
 array. (char[] arrays are assumed to be UTF-8 encoded.)

 3. ubyte[] arrays are (AFAIK) not scanned by the GC. (void[] arrays may
 contain pointers and must therefore be scanned.)

 This isn't exactly true. arrays *created* as void[] will be scanned.
 Arrays created as ubyte[] and then cast to void[] will not be scanned.

 However, it is far too easy while dealing with a void[] array to have it
 mysteriously flip its bit to scan-able.

 I think the rule of thumb should be: If the array contains raw data of
 unspecified type, but no pointers or references, use ubyte[].

 void[] is very useful for input parameters, however, since all arrays are
 implicitly castable to void[]:

 void writeData(void[] data) { ... }

 writeData("Hello World!");
 writeData([1, 2, 3, 4]);

 I think (and this differs from my previous opinion) const(void)[] should
 be used for input parameters where any array type could be passed in.
 However, ubyte[] should be used for output parameters and for internal
 storage. void[] just has too many pitfalls to be used anywhere but where
 its implicit casting ability is useful.

 -Steve

const(ubyte)[] for input
void[] for output

That sounds reasonable. I guess that if everybody can agree on this, then
all of Phobos (e.g. std.file) should use the same types?

/Jonas

Mar 14 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/14/11 10:06 AM, Jonas Drewsen wrote:
 const(ubyte)[] for input
 void[] for output

 That sounds reasonable. I guess that if everybody can agree on this, then
 all of Phobos (e.g. std.file) should use the same types?

Move the const from the first to the second line :o). I see no reason 
why user code can't mess with the buffer once read.

Yes, I agree std.file et al should switch to ubyte[].

Andrei

Mar 14 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 14/03/11 18.46, Andrei Alexandrescu wrote:
 On 3/14/11 10:06 AM, Jonas Drewsen wrote:
 const(ubyte)[] for input
 void[] for output

 That sounds reasonable. I guess that if everybody can agree on this, then
 all of Phobos (e.g. std.file) should use the same types?

 Move the const from the first to the second line :o). I see no reason
 why user code can't mess with the buffer once read.

You are right of course. bummer.

 Yes, I agree std.file et al should switch to ubyte[].

 Andrei

Then let's hope someone makes a patch for it. Maybe I'll make it when I'm
done with the curl stuff if no one beats me to it.

/Jonas

Mar 14 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/14/11 4:36 AM, Jonathan M Davis wrote:
 That's debatable. Some would argue one way, some another. Personally, I'd argue
 ubyte[]. I don't like void[] one bit. Others would agree with me, and yet others
 would disagree. I don't think that there's really a general agreement on whether
 void[] or ubyte[] is better when it comes to reading binary data like that.

void[]: "There is a typed array underneath, but I forgot its exact type".

Evidence: all array types convert to void[] automatically.

ubyte[]: "We're dealing with an array of octets here."

Evidence: ubyte[] has no special properties over T[].

All raw data reads should yield ubyte[], not void[]. This is because the 
user may or may not know that underneath really there's a different 
type, but the compiler and runtime have no such idea. So the burden of 
the assumption is on the user.

Raw data writes that take arrays could be allowed to accept void[] if 
implicit conversion from T[] is desirable.
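
As an illustration of the burden being on the user, here is a hedged sketch of how client code might turn a raw ubyte[] read into text. `asText` is a hypothetical helper in the spirit of the "text" property suggested earlier, not an existing API; it only reinterprets the bytes after checking that they are valid UTF-8:

```d
import std.stdio;
import std.utf : validate, UTFException;

// Hypothetical helper: views raw bytes as UTF-8 text after validating them.
const(char)[] asText(const(ubyte)[] raw)
{
    auto text = cast(const(char)[]) raw; // same memory, different element type
    validate(text);                      // throws UTFException if malformed
    return text;
}

void main()
{
    ubyte[] ok = [0x44, 0x20, 0x72, 0x6F, 0x63, 0x6B, 0x73]; // "D rocks"
    writeln(asText(ok));

    ubyte[] bad = [0xFF, 0xFE]; // 0xFF can never start a UTF-8 sequence
    try
    {
        asText(bad);
    }
    catch (UTFException e)
    {
        writeln("not valid UTF-8");
    }
}
```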


Andrei

Mar 14 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/14/11 4:16 AM, Jonas Drewsen wrote:
 On 13/03/11 23.44, Andrei Alexandrescu wrote:
 Sweet. As has been discussed, often the content is not text so you may
 want to have content return ubyte[] and add a new property such as
 "textContent" or "text".

 I've already changed it to void[] as done in the std.file module. Is
 ubyte[] better suited?

Yah, as per the ensuing discussion.

 As discussed, properties may be better here than setXxx and getXxx. The
 setReceiveCallback hook should take a ubyte[]. The
 setReceiveHeaderCallback should take a const(char)[]. That way you won't
 need to copy all headers, leaving safely that option to the client.

 I've already replaced the set/get methods with properties and renamed
 them. Hadn't thought of using const(char)[].. thanks for the hint.

A good general guideline: make sure that the user could easily and 
safely use a loop that reads a large http stream (with hooks and all) 
without allocating one item each pass through the loop.
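
A rough sketch of what that could look like from the hook's point of view. `transfer` is a stand-in for the real network loop and is purely hypothetical: it pushes an in-memory source through one reusable buffer and passes each filled slice to the callback, so nothing is allocated per pass and the client decides what to copy:

```d
import std.stdio;

// Stand-in for a transfer loop: copies `source` through one reusable buffer
// and hands each filled slice to the hook. No allocation happens per pass.
void transfer(const(ubyte)[] source, ubyte[] buffer,
              scope void delegate(const(ubyte)[] chunk) onReceive)
{
    assert(buffer.length > 0, "buffer must not be empty");
    while (source.length)
    {
        immutable n = source.length < buffer.length ? source.length
                                                    : buffer.length;
        buffer[0 .. n] = source[0 .. n];
        onReceive(buffer[0 .. n]); // the slice is only valid during the call
        source = source[n .. $];
    }
}

void main()
{
    ubyte[] data = [1, 2, 3, 4, 5, 6, 7];
    auto buffer = new ubyte[3]; // the only buffer allocation
    size_t total;
    ubyte[] firstChunk;

    transfer(data, buffer, (const(ubyte)[] chunk) {
        total += chunk.length;
        if (firstChunk is null)
            firstChunk = chunk.dup; // the client copies only what it keeps
    });

    writeln(total, " bytes, first chunk ", firstChunk);
}
```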

 Regarding a range interface, it would be great if you allowed e.g.

 foreach (line; Http.get("https://mail.google.com").byLine()) {
 ...
 }

 The data transfer should happen concurrently with the foreach code. The
 type of line is char[] or const(char)[]. Similarly, there would be a
 byChunk interface that transfers in ubyte[] chunks.

 Also we need a head() method for the corresponding command.

 Andrei

 That would be neat. What do you mean about concurrent data transfers
 with foreach?

Assume the body of the loop does some time-consuming processing - like 
e.g. writing to another HTTP stream. Then your network reads should not 
wait for that processing. While the user code does something, you should 
already have the next transfer in flight.

Example: a utility that efficiently uses GET from one http source and 
uses the data to POST it to an http target should be an efficient 
few-liner. (FTP versions and mixed ones too.)


Andrei

Mar 14 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 14/03/11 18.55, Andrei Alexandrescu wrote:
 On 3/14/11 4:16 AM, Jonas Drewsen wrote:
 On 13/03/11 23.44, Andrei Alexandrescu wrote:
 Sweet. As has been discussed, often the content is not text so you may
 want to have content return ubyte[] and add a new property such as
 "textContent" or "text".

 I've already changed it to void[] as done in the std.file module. Is
 ubyte[] better suited?

 Yah, as per the ensuing discussion.

 As discussed, properties may be better here than setXxx and getXxx. The
 setReceiveCallback hook should take a ubyte[]. The
 setReceiveHeaderCallback should take a const(char)[]. That way you won't
 need to copy all headers, leaving safely that option to the client.

 I've already replaced the set/get methods with properties and renamed
 them. Hadn't thought of using const(char)[].. thanks for the hint.

 A good general guideline: make sure that the user could easily and
 safely use a loop that reads a large http stream (with hooks and all)
 without allocating one item each pass through the loop.

Makes sense. I'll keep that in mind.

 Regarding a range interface, it would be great if you allowed e.g.

 foreach (line; Http.get("https://mail.google.com").byLine()) {
 ...
 }

 The data transfer should happen concurrently with the foreach code. The
 type of line is char[] or const(char)[]. Similarly, there would be a
 byChunk interface that transfers in ubyte[] chunks.

 Also we need a head() method for the corresponding command.

 Andrei

 That would be neat. What do you mean about concurrent data transfers
 with foreach?

 Assume the body of the loop does some time-consuming processing - like
 e.g. writing to another HTTP stream. Then your network reads should not
 wait for that processing. While the user code does something, you should
 already have the next transfer in flight.

 Example: a utility that efficiently uses GET from one http source and
 uses the data to POST it to an http target should be an efficient
 few-liner. (FTP versions and mixed ones too.)


 Andrei

I get it. Any existing implementation that does this I can have a look at?

/Jonas

Mar 14 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/14/11 4:11 PM, Jonas Drewsen wrote:
 On 14/03/11 18.55, Andrei Alexandrescu wrote:
 Assume the body of the loop does some time-consuming processing - like
 e.g. writing to another HTTP stream. Then your network reads should not
 wait for that processing. While the user code does something, you should
 already have the next transfer in flight.

 Example: a utility that efficiently uses GET from one http source and
 uses the data to POST it to an http target should be an efficient
 few-liner. (FTP versions and mixed ones too.)


 Andrei

 I get it. Any existing implementation that does this I can have a look at?

Unfortunately not at the moment. I wanted to define such a thing for 
std.stdio called byLineAsync and byChunkAsync but never got to it.

The basic idea is:

1. Define a new range type, e.g. AsyncHttpInputRange

2. Inside that range start a secondary thread that does the actual 
transfer and passes read buffers to the main thread by means of messages

3. See std.concurrency and the free chapter 
http://www.informit.com/articles/printerfriendly.aspx?p=1609144 for details

4. Control congestion (too many buffers in flight) with setMaxMailboxSize.

5. Make sure you have a little protocol that stops the secondary thread 
when the range is destroyed.
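
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the "transfer" is a fake producer that slices an in-memory payload, `AsyncChunkRange`, `producer` and `Done` are made-up names, and the stop protocol from step 5 is left out, so the consumer is expected to drain the range completely.

```d
import std.concurrency;
import std.stdio;

// Sentinel message marking the end of the transfer (hypothetical protocol).
struct Done {}

// Secondary thread: stands in for the network transfer. It slices a fixed
// payload into chunks and sends each one to the owner thread.
void producer(Tid owner, immutable(ubyte)[] payload, size_t chunkSize)
{
    while (payload.length)
    {
        immutable n = payload.length < chunkSize ? payload.length : chunkSize;
        send(owner, payload[0 .. n]);
        payload = payload[n .. $];
    }
    send(owner, Done());
}

// Input range whose elements are produced by the secondary thread.
struct AsyncChunkRange
{
    private immutable(ubyte)[] current;
    private bool finished;

    this(immutable(ubyte)[] payload, size_t chunkSize)
    {
        // Step 4: allow at most 4 pending messages; the sender blocks beyond.
        setMaxMailboxSize(thisTid, 4, OnCrowding.block);
        spawn(&producer, thisTid, payload, chunkSize);
        popFront(); // fetch the first chunk
    }

    @property bool empty() { return finished; }
    @property immutable(ubyte)[] front() { return current; }

    void popFront()
    {
        receive(
            (immutable(ubyte)[] chunk) { current = chunk; },
            (Done _) { finished = true; }
        );
    }
}

void main()
{
    immutable(ubyte)[] payload = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    foreach (chunk; AsyncChunkRange(payload, 4))
        writeln(chunk); // the producer keeps sending while this runs
}
```

The chunks are immutable so that they can be passed between threads without sharing mutable state; a real implementation would have to weigh that against the cost of one allocation per chunk.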


Andrei

Mar 14 2011

Jonas Drewsen <jdrewsen nospam.com> writes:

On 13/03/11 23.44, Andrei Alexandrescu wrote:
 You'll probably need to justify the existence of a class hierarchy and
 what overridable methods there are. In particular, since you seem to
 offer hooks via delegates, probably classes wouldn't be needed at all.
 (FWIW I would've done the same; I wouldn't want to inherit just to
 intercept the headers etc.)

Missed this one in my last reply.

Ftp, Http etc. all inherit from a Protocol class. The Protocol class
defines the settings (properties) common to all protocols, e.g.
dnsTimeout, connectTimeout, networkInterface, url and port selection.

I could make these into a mixin and thereby get rid of the inheritance 
of course.
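
For reference, a minimal sketch of the mixin variant. The setting names come from the list above; the default values, the struct layout and the `describe` helper are assumptions made only for this example:

```d
import core.time : Duration, seconds;
import std.stdio;

// Settings shared by every protocol. Default values are assumptions.
mixin template ProtocolSettings()
{
    string url;
    string networkInterface;
    Duration dnsTimeout = 60.seconds;
    Duration connectTimeout = 30.seconds;
}

struct Http
{
    mixin ProtocolSettings;
    string method = "GET";   // HTTP-only setting (hypothetical)
}

struct Ftp
{
    mixin ProtocolSettings;
    bool passive = true;     // FTP-only setting (hypothetical)
}

// Generic code can still treat both alike without a common base class.
string describe(P)(in P p)
{
    import std.conv : text;
    return text(P.stringof, " ", p.url, " connectTimeout=", p.connectTimeout);
}

void main()
{
    Http http;
    http.url = "http://www.google.com";
    Ftp ftp;
    ftp.url = "ftp://example.com";
    ftp.connectTimeout = 5.seconds;

    writeln(describe(http));
    writeln(describe(ftp));
}
```

The trade-off is that there is no longer a single Protocol type to hold a reference to, so something like CurlTransport would have to be a template or use a small interface.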

I think that keeping Protocol as an abstract base class would
benefit e.g. the integration with streams. In that case we could simply
create a CurlTransport that contains a reference to a Protocol-derived
object (Http, Ftp, ...).

Or would it be better to have specific HttpTransport, FtpTransport?


/Jonas

Mar 14 2011

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/14/11 10:38 AM, Jonas Drewsen wrote:
 On 13/03/11 23.44, Andrei Alexandrescu wrote:
 You'll probably need to justify the existence of a class hierarchy and
 what overridable methods there are. In particular, since you seem to
 offer hooks via delegates, probably classes wouldn't be needed at all.
 (FWIW I would've done the same; I wouldn't want to inherit just to
 intercept the headers etc.)

 Missed this one in my last reply.

 Ftp, Http etc. all inherit from a Protocol class. The Protocol class
 defines the settings (properties) common to all protocols, e.g.
 dnsTimeout, connectTimeout, networkInterface, url and port selection.

 I could make these into a mixin and thereby get rid of the inheritance
 of course.

Use Occam's razor and the path of least resistance to get the most
natural interface.

 I think that keeping Protocol as an abstract base class would
 benefit e.g. the integration with streams. In that case we could simply
 create a CurlTransport that contains a reference to a Protocol-derived
 object (Http, Ftp, ...).

 Or would it be better to have specific HttpTransport, FtpTransport?

Count the commonalities and the differences and then make an executive 
decision.


Andrei

Mar 14 2011

Kagamin <spam here.lot> writes:

Lars T. Kyllingstad Wrote:

 2. There are no assumptions about the type of data contained in the 
 array.  (char[] arrays are assumed to be UTF-8 encoded.)

HTTP has a Content-Type header, so it is known what the array contains.

Mar 14 2011
