www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - etc.curl: Formal review begin

reply David Nadlinger <see klickverbot.at> writes:
Now that Lars Kyllingstad's new and improved std.path has passed the 
vote  – congratulations, Lars! –, and Jose Armando Garcia, the author of 
the proposed logging module, is currently not available, the etc.curl 
module by Jonas Drewsen is at the front of the review queue. I have 
volunteered to run the formal process for inclusion with Phobos.

etc.curl provides a high-level interface to client-side networking 
functionality for the HTTP, FTP and SMTP protocols by wrapping libcurl 
[1], accessing the C interface of which is already possible using Phobos 
in the form of etc.c.curl. Prior stages of the work have already been 
discussed in March [2], May [3], and June [4]. Thanks to everybody who 
took the time to evaluate the library back then; as far as I am aware, 
all raised concerns have been addressed since then.

Based on the amount of discussion this module already underwent, I 
hereby propose a two-week review period. Barring any objections, the 
review starts today (Aug 18) and ends on Aug 31, followed by a one week 
vote (Sep 1 - Sep 7).

Code:
   https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d

API documentation:
   http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

A discussion on the wrapper has already been started on digitalmars.D 
[5]. To keep everything in one place, please post your comments and 
reviews there.

At this point, I would like to invite everyone to spend some time 
testing the module and reading its documentation and code. It is 
essential for a working open source code review process that many people 
participate in it, and regardless of whether you are new to D or a 
seasoned contributor, your opinion is very valuable.

Jonas also asked us to have a look at the list of known issues (which 
can also be found at the top of the source file), and, if possible, 
suggest a solution:

---
Known issues:
  * DDoc is not generated where the mixins ByLineAsync, ByLineSync, 
ByChunkAsync and ByLineSync are used. This seems to be a limitation of 
ddoc - suggestions on how to circumvent this appreciated.

Possible improvements:
  * Progress may be deprecated in the future. Maybe implement a replacement.
  * Support typed http headers - (Johannes Pfau)
---

To make sure it doesn't get lost, let me also repeat Johannes Pfau's 
request to add proxy support from another thread ([6]) here.

Thanks a lot for your reviewing efforts,
David



[1] http://curl.haxx.se/libcurl/
[2] 
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_support_RFC_131753.html
[3] 
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_136372.html
[4] 
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_round_two_138945.html
[5] 
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142661
[6] 
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142175
Aug 17 2011
next sibling parent reply Jesse Phillips <jessekphillips+d gmail.com> writes:
On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:

 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote  – congratulations, Lars! –, and Jose Armando Garcia, the author of
 the proposed logging module, is currently not available, the etc.curl
 module by Jonas Drewsen is at the front of the review queue. I have
 volunteered to run the formal process for inclusion with Phobos.

 
 Code:
    https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
 
 API documentation:
    http://freeze.steamwinter.com/D/web/phobos/etc_curl.html
 
 ---
 Known issues:
   * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation of
 ddoc - suggestions on how to circumvent this appreciated.
 
 Possible improvements:
   * Progress may be deprecated in the future. Maybe implement a
   replacement. 
   * Support typed http headers - (Johannes Pfau)
 ---

I'll try my best to actually make use of this for testing it. My first thought here is that I don't like the idea of including something that is already being considered for deprecation. If it isn't good enough lets hold off putting it in.
Aug 17 2011
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Aug 17, 2011, at 6:19 PM, Jesse Phillips wrote:

 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
=20
 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote  =E2=80=93 congratulations, Lars! =E2=80=93, and Jose Armando =


 the proposed logging module, is currently not available, the etc.curl
 module by Jonas Drewsen is at the front of the review queue. I have
 volunteered to run the formal process for inclusion with Phobos.

=20
 Code:
   https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
=20
 API documentation:
   http://freeze.steamwinter.com/D/web/phobos/etc_curl.html
=20
 ---
 Known issues:
  * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation =


 ddoc - suggestions on how to circumvent this appreciated.
=20
 Possible improvements:
  * Progress may be deprecated in the future. Maybe implement a
  replacement.=20
  * Support typed http headers - (Johannes Pfau)
 ---


Doe libcurl provide any facility for URL encoding/decoding? Or perhaps = there should be an accompanying submission for doing so? Any nontrivial = request will need properly encoded URL parameters.=
Aug 30 2011
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
   Or perhaps there should be an accompanying submission for doing so?

Phobos does have std.uri.
Aug 30 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 6:35 PM, Adam D. Ruppe wrote:
    Or perhaps there should be an accompanying submission for doing so?

Phobos does have std.uri.

I'm still longing for that cgi library of yours... Andrei
Aug 30 2011
prev sibling parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 31-08-2011 01:28, Sean Kelly skrev:
 On Aug 17, 2011, at 6:19 PM, Jesse Phillips wrote:

 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:

 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote  – congratulations, Lars! –, and Jose Armando Garcia, the author of
 the proposed logging module, is currently not available, the etc.curl
 module by Jonas Drewsen is at the front of the review queue. I have
 volunteered to run the formal process for inclusion with Phobos.

 Code:
    https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d

 API documentation:
    http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 ---
 Known issues:
   * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation of
 ddoc - suggestions on how to circumvent this appreciated.

 Possible improvements:
   * Progress may be deprecated in the future. Maybe implement a
   replacement.
   * Support typed http headers - (Johannes Pfau)
 ---


Doe libcurl provide any facility for URL encoding/decoding? Or perhaps there should be an accompanying submission for doing so? Any nontrivial request will need properly encoded URL parameters.

etc.curl.Protocol has the escape/unescape methods. /Jonas
Sep 01 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 9/1/11 6:42 PM, jdrewsen wrote:
 Den 31-08-2011 01:28, Sean Kelly skrev:
 On Aug 17, 2011, at 6:19 PM, Jesse Phillips wrote:
 Doe libcurl provide any facility for URL encoding/decoding? Or perhaps
 there should be an accompanying submission for doing so? Any
 nontrivial request will need properly encoded URL parameters.


Oh, sorry, I got that wrong in the review wrap-up then. But due to the already discussed documentation issues, and the not very descriptive »(Un)Escape a string« comments, these two are really easy to miss. Are the functions from std.uri sufficient (or could they easily be improved to be so)? If so, I'd propose to simply remove the two and cross reference std.uri, otherwise you might want to add a more detailed description, including a link to http://www.ietf.org/rfc/rfc3986.txt. David
Sep 01 2011
parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 01-09-2011 19:23, David Nadlinger skrev:
 On 9/1/11 6:42 PM, jdrewsen wrote:
 Den 31-08-2011 01:28, Sean Kelly skrev:
 On Aug 17, 2011, at 6:19 PM, Jesse Phillips wrote:
 Doe libcurl provide any facility for URL encoding/decoding? Or perhaps
 there should be an accompanying submission for doing so? Any
 nontrivial request will need properly encoded URL parameters.


Oh, sorry, I got that wrong in the review wrap-up then. But due to the already discussed documentation issues, and the not very descriptive »(Un)Escape a string« comments, these two are really easy to miss. Are the functions from std.uri sufficient (or could they easily be improved to be so)? If so, I'd propose to simply remove the two and cross reference std.uri, otherwise you might want to add a more detailed description, including a link to http://www.ietf.org/rfc/rfc3986.txt. David

The (un)escape methods are located in the Protocol template and is used for escaping strings for all of the protocols supported by libcurl. That should be in the documentation of course. /Jonas
Sep 03 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 9/3/11 10:19 PM, jdrewsen wrote:
 The (un)escape methods are located in the Protocol template and is used
 for escaping strings for all of the protocols supported by libcurl. That
 should be in the documentation of course.

Yes, sure – but do they add any value over the functions from std.uri? Otherwise, I would argue to drop the libcurl ones in favor of the native D ones, having the same functionality in Phobos twice doesn't seem desirable to me. David
Sep 03 2011
parent jdrewsen <jdrewsen nospam.com> writes:
Den 04-09-2011 00:47, David Nadlinger skrev:
 On 9/3/11 10:19 PM, jdrewsen wrote:
 The (un)escape methods are located in the Protocol template and is used
 for escaping strings for all of the protocols supported by libcurl. That
 should be in the documentation of course.

Yes, sure – but do they add any value over the functions from std.uri? Otherwise, I would argue to drop the libcurl ones in favor of the native D ones, having the same functionality in Phobos twice doesn't seem desirable to me.

I noticed that libcurl changed their escape/unescape function from not needing a curl handle to requiring one. This is probably because the supported protocols escapes string differently and this would be supported automatically by using libcurls escape functions in std.curl. On the other hand I also think they fit better in the std.uri package for the http protocol at least. I do not know much about escapes in the two other std.curl supported protocols Ftp and Smtp. All in all this probably suggests that I should remove the std.curl escape functions and std.uri should be used instead. Anyone who disagrees with this choice? /Jonas
Sep 09 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 30, 2011, at 4:35 PM, Adam D. Ruppe wrote:

  Or perhaps there should be an accompanying submission for doing so?

Phobos does have std.uri.

And here is where I show how much web programming I've been doing in D.
Aug 30 2011
prev sibling parent David Nadlinger <see klickverbot.at> writes:
On 8/18/11 3:19 AM, Jesse Phillips wrote:
 My first thought here is that I don't like the idea of including
 something that is already being considered for deprecation. If it isn't
 good enough lets hold off putting it in.

As Johannes already mentioned, only removing the default progress callback *implementation* is on the libcurl TODO list (the thing that prints the progress information for the curl command line tool). The D API would be completely unaffected by this, since it is all about providing custom callback implementations. The TODO list also mentions the possibility that libcurl will switch to 64 bit integer parameters instead of doubles (which was also suggested somewhere else during the review of the D wrapper), but this seems to be still a long way off. David
Aug 31 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Jesse Phillips wrote:
On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:

 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote  =E2=80=93 congratulations, Lars! =E2=80=93, and Jose Armando Garci=


 author of the proposed logging module, is currently not available,
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue. I have volunteered to run the formal process for inclusion
 with Phobos.

=20
 Code:
    https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
=20
 API documentation:
    http://freeze.steamwinter.com/D/web/phobos/etc_curl.html
=20
 ---
 Known issues:
   * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation
 of ddoc - suggestions on how to circumvent this appreciated.
=20
 Possible improvements:
   * Progress may be deprecated in the future. Maybe implement a
   replacement.=20
   * Support typed http headers - (Johannes Pfau)
 ---

I'll try my best to actually make use of this for testing it. My first thought here is that I don't like the idea of including=20 something that is already being considered for deprecation. If it isn't good enough lets hold off putting it in.

I agree, but I think I should point out, that only curls _builtin_ progress callback might be removed in the future. The progress functionality could still be implemented in the wrapper without changing the public API, so this change would be transparent to users. So we won't gain much from implementing this right now. --=20 Johannes Pfau
Aug 18 2011
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-08-18 01:12, David Nadlinger wrote:
 Known issues:
 * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation of
 ddoc - suggestions on how to circumvent this appreciated.

Not sure if it's the same issue but I think Jonas mentioned it: http://d.puremagic.com/issues/show_bug.cgi?id=648 -- /Jacob Carlborg
Aug 18 2011
prev sibling next sibling parent reply Jesse Phillips <jessekphillips+d gmail.com> writes:
On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:

 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many people
 participate in it, and regardless of whether you are new to D or a
 seasoned contributor, your opinion is very valuable.

Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.
Aug 20 2011
next sibling parent reply Jesse Phillips <jessekphillips+d gmail.com> writes:
On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:

 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 
 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D or
 a seasoned contributor, your opinion is very valuable.

Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.

And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width. I don't have much else on comments, once it complied my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.
Aug 21 2011
next sibling parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 21/08/11 23.44, Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:

 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:

 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D or
 a seasoned contributor, your opinion is very valuable.

Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.

And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width. I don't have much else on comments, once it complied my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.

Thank you for the feedback. There is no mentioning of brackets in the style doc though: http://www.d-programming-language.org/dstyle.html /Jonas
Aug 21 2011
parent reply simendsjo <simendsjo gmail.com> writes:
On 22.08.2011 08:25, Jonathan M Davis wrote:
 On Monday, August 22, 2011 08:09:54 Jonas Drewsen wrote:
 On 21/08/11 23.44, Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:
 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D
 or
 a seasoned contributor, your opinion is very valuable.

Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.

And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width. I don't have much else on comments, once it complied my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.

Thank you for the feedback. There is no mentioning of brackets in the style doc though: http://www.d-programming-language.org/dstyle.html

That's only because it hasn't been updated yet. The 2 main formatting guidelines that aren't in the style guide are 1. Braces must be on their own line (save perhaps for lambdas which are one liners).

So I've heard, but even new code doesn't follow this. Davids RegionAllocater brand new (in the review queue?), and have braces on the same line.
 2. There is a soft limit of 80 characters per line and a hard limit of 120
 characters (so, generally lines should be within 80 characters but they can go
 longer just long; however, they can never exceed 120 characters).

 - Jonathan M Davis

Aug 22 2011
parent jdrewsen <jdrewsen nospam.com> writes:
Den 22-08-2011 17:20, Jonathan M Davis skrev:
 On Monday, August 22, 2011 15:23:54 simendsjo wrote:
 On 22.08.2011 08:25, Jonathan M Davis wrote:
 On Monday, August 22, 2011 08:09:54 Jonas Drewsen wrote:
 On 21/08/11 23.44, Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:
 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 At this point, I would like to invite everyone to spend some
 time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that
 many
 people participate in it, and regardless of whether you are new
 to D
 or
 a seasoned contributor, your opinion is very valuable.

Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.

And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width. I don't have much else on comments, once it complied my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.

Thank you for the feedback. There is no mentioning of brackets in the style doc though: http://www.d-programming-language.org/dstyle.html

That's only because it hasn't been updated yet. The 2 main formatting guidelines that aren't in the style guide are 1. Braces must be on their own line (save perhaps for lambdas which are one liners).

So I've heard, but even new code doesn't follow this. Davids RegionAllocater brand new (in the review queue?), and have braces on the same line.

Just because something is in the review queue doesn't mean that it follows the style guide correctly. And in a few cases, code which doesn't follow the style guide may have managed to get into Phobos because it wasn't noticed or it wsa before we decided on what style to use. I believe that David normally puts braces on the same line in his own code, so unless he's writing specifically for Phobos and remembers to put the braces on their own line, he's likely to put them on the same line. It's going to need to be fixed before getting merged into Phobos. - Jonathan M Davis

Aug 24 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, August 22, 2011 08:09:54 Jonas Drewsen wrote:
 On 21/08/11 23.44, Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:
 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D
 or
 a seasoned contributor, your opinion is very valuable.

Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.

And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width. I don't have much else on comments, once it complied my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.

Thank you for the feedback. There is no mentioning of brackets in the style doc though: http://www.d-programming-language.org/dstyle.html

That's only because it hasn't been updated yet. The 2 main formatting guidelines that aren't in the style guide are 1. Braces must be on their own line (save perhaps for lambdas which are one liners). 2. There is a soft limit of 80 characters per line and a hard limit of 120 characters (so, generally lines should be within 80 characters but they can go longer just long; however, they can never exceed 120 characters). - Jonathan M Davis
Aug 21 2011
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/21/2011 2:56 PM, Jonathan M Davis wrote:
 80 characters should be enough. Most of the documentation in Phobos restricts
 itself to 80 characters, including the examples.

80 characters is way too long for magazine/book examples, and will cause problems when converting the Phobos docs to an ebook format. I recommend examples should be ruthlessly narrowed. At least less than 40.
Aug 23 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2011 3:09 PM, Jonathan M Davis wrote:
 40 characters is disgustingly narrow. We may need to cut examples to that sort
 of length for the sake of e-books, but I'm willing to bet that something like
 3/4 of the examples would have to be wrapped because of that. It's a serious
 downgrade for anyone reading them on a computer.

It's not that bad. I did it for the Digital Mars C examples. I just want to make the point that the Phobos implementation code will necessarily follow a different formatting style than the example code. It's just a fact of life, and it isn't just for ebooks. Take a look at any programming book, magazine article, etc. Vertical space is also at a premium for example code.
Aug 23 2011
parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2011 4:38 PM, Jonathan M Davis wrote:
 I understand if examples have to have a shorter line limit due to how they're
 presented, but the shorter the limit, the more restrictive and annoying it
 gets.

I know, but having it wrap is worse (or worst of all, have the right edge just chopped off).
Aug 23 2011
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, August 21, 2011 21:44:33 Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:
 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D
 or
 a seasoned contributor, your opinion is very valuable.

Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.

And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width.

80 characters should be enough. Most of the documentation in Phobos restricts itself to 80 characters, including the examples. - Jonathan M Davis
Aug 21 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, August 22, 2011 15:23:54 simendsjo wrote:
 On 22.08.2011 08:25, Jonathan M Davis wrote:
 On Monday, August 22, 2011 08:09:54 Jonas Drewsen wrote:
 On 21/08/11 23.44, Jesse Phillips wrote:
 On Sat, 20 Aug 2011 23:57:54 +0000, Jesse Phillips wrote:
 On Thu, 18 Aug 2011 01:12:20 +0200, David Nadlinger wrote:
 At this point, I would like to invite everyone to spend some
 time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that
 many
 people participate in it, and regardless of whether you are new
 to D
 or
 a seasoned contributor, your opinion is very valuable.

Just tried building a 64 bit Phobos and got: etc/curl.d(500): Error: function etc.c.curl.curl_easy_escape (void* handle, char* string, int length) is not callable using argument types (void*,char*,ulong) etc/curl.d(510): Error: function etc.c.curl.curl_easy_unescape (void* handle, char* string, int length, int* outlength) is not callable using argument types (void*,char*,ulong,int*) A couple casts let it build, moving on to using.

And now for my miniature review. The coding standard for Phobos was decided to have brackets on their own line. I think the examples should also be limited to about 50-60 characters in width. I don't have much else on comments, once it complied my basic testing worked. I'd need to have a real objective to actually learn most of what it provides, but it seems simple enough. Good work.

Thank you for the feedback. There is no mentioning of brackets in the style doc though: http://www.d-programming-language.org/dstyle.html

That's only because it hasn't been updated yet. The 2 main formatting guidelines that aren't in the style guide are 1. Braces must be on their own line (save perhaps for lambdas which are one liners).

So I've heard, but even new code doesn't follow this. Davids RegionAllocater brand new (in the review queue?), and have braces on the same line.

Just because something is in the review queue doesn't mean that it follows the style guide correctly. And in a few cases, code which doesn't follow the style guide may have managed to get into Phobos because it wasn't noticed or it wsa before we decided on what style to use. I believe that David normally puts braces on the same line in his own code, so unless he's writing specifically for Phobos and remembers to put the braces on their own line, he's likely to put them on the same line. It's going to need to be fixed before getting merged into Phobos. - Jonathan M Davis
Aug 22 2011
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review queue.

Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it. 1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page: string contents = getURL("http://www.digitalmars.com"); and contents[] should be the web page; what I'd see if I opened it in a browser and selected "view source". From perusing the etc.curl doc I have absolutely no idea how to do this. 2. I'd like to get the contents as an input range, which I can then iterate like any other range. 3. A lot of web sites like twitter.com have an API. How do I access that API from D? 4. How do I download a file from ftp? ubyte[] file = downloadFile("ftp://www.digitalmars.com/whatever.zip"); 5. I'd like an NNTP interface so I can get and post newsgroup messages from D, though I suspect that is beyond the scope of etc.curl.
Aug 23 2011
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/23/2011 2:33 PM, Walter Bright wrote:
 1. The first, most obvious thing I'd like to do is present a URL and get a
 string back that is the contents of that web page:

 string contents = getURL("http://www.digitalmars.com");

 and contents[] should be the web page; what I'd see if I opened it in a browser
 and selected "view source". From perusing the etc.curl doc I have absolutely no
 idea how to do this.

I should add that this should be multithreadable, as I expect calls to curl from multiple threads and they should not step on each other.
 3. A lot of web sites like twitter.com have an API. How do I access that API
 from D?

For example, I'd like a program that I can write and run like: tweet WalterBright -pw=1234 "I had eggs for breakfast" This could be in the D samples directory.
Aug 23 2011
prev sibling next sibling parent Jimmy Cao <jcao219 gmail.com> writes:
--001517478c8a8c413104ab36419f
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Aug 23, 2011 at 4:33 PM, Walter Bright
<newshound2 digitalmars.com>wrote:
 5. I'd like an NNTP interface so I can get and post newsgroup messages from
 D, though I suspect that is beyond the scope of etc.curl.

libcurl doesn't have NNTP support (yet). If someone can write it and contribute to libcurl, that would be welcomed, I think. --001517478c8a8c413104ab36419f Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On Tue, Aug 23, 2011 at 4:33 PM, Walter Bright <= span dir=3D"ltr">&lt;<a href=3D"mailto:newshound2 digitalmars.com">newshoun= d2 digitalmars.com</a>&gt;</span> wrote:<blockquote class=3D"gmail_quote" s= tyle=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> 5. I&#39;d like an NNTP interface so I can get and post newsgroup messages = from D, though I suspect that is beyond the scope of etc.curl.<br> </blockquote></div><br><div>libcurl doesn&#39;t have NNTP support (yet). = =A0If someone can write it and contribute to libcurl, that would be welcome= d, I think.</div> --001517478c8a8c413104ab36419f--
Aug 23 2011
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-08-23 23:33, Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review queue.

Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it. 1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page: string contents = getURL("http://www.digitalmars.com"); and contents[] should be the web page; what I'd see if I opened it in a browser and selected "view source". From perusing the etc.curl doc I have absolutely no idea how to do this. 2. I'd like to get the contents as an input range, which I can then iterate like any other range. 3. A lot of web sites like twitter.com have an API. How do I access that API from D?

APIs like this, assuming they are RESTful APIs, are accessed via regular HTTP methods/verbs, like POST, GET, PUT, DELETE and so on. Your first example would be a GET request.
 4. How do I download a file from ftp?

 ubyte[] file = downloadFile("ftp://www.digitalmars.com/whatever.zip");

 5. I'd like an NNTP interface so I can get and post newsgroup messages
 from D, though I suspect that is beyond the scope of etc.curl.

-- /Jacob Carlborg
Aug 23 2011
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/24/2011 2:18 AM, Johannes Pfau wrote:
 Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.

Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it. 1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page: string contents = getURL("http://www.digitalmars.com");

That's the first example in the API docs: // Simple GET with connect timeout of 10 seconds Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();

Ok, but 1. there's no indication in the doc that that's what it does because 1. it is an incomplete fragment and 2. a simpler interface (like what I proposed) would likely be better.
 or, if you don't care about timeouts:
 string contents = Http.get("http://www.google.com").toString();

This example needs to be first and needs an explanation. Timeouts, error recovery, progress indicators, etc., should all be brought up later.
 and contents[] should be the web page; what I'd see if I opened it in
 a browser and selected "view source". From perusing the etc.curl doc I
 have absolutely no idea how to do this.

 2. I'd like to get the contents as an input range, which I can then
 iterate like any other range.

auto res = Http.getAsync("http://www.digitalmars.com"); auto range = res.byChunk(100); writeln(range.front); range.popFront(); writeln(range.front); There's only a byChunk for synchronous requests, but I think this first reads the complete Result into memory and then returns slices. (This is a limitation caused by curl's API, as curl uses callbacks)

Ok.
 3. A lot of web sites like twitter.com have an API. How do I access
 that API from D?

The twitter API is a REST API, so if you want to get information from twitter, you use a GET request with a special url: string xmlTimeLine = Http.get("http://twitter.com/statuses/user_timeline.xml?id=walterbright").toString(); Posting is a little more complicated, as it requires authentication: ubyte[] result; Http twitter = Http("http://api.twitter.com/1/statuses/update.format?status=Hello%20World"); twitter.setAuthentication("user", "password"); twitter.onReceive = (ubyte[] data) { result ~= data; return data.length; }; twitter.method = Http.Method.post; twitter.perform(); (Status should be url encoded, Http.escape can be used for that) Not sure, if this still works though. Twitter is switching from HTTP basic authentication to OAuth. Also, this code is not tested ;-)

I'm glad to see this can be done. A complete example in the documentation would be very informative and useful.
 4. How do I download a file from ftp?

     ubyte[] file =
 downloadFile("ftp://www.digitalmars.com/whatever.zip");

Ftp.get("ftp://www.digitalmars.com/whatever.zip","/tmp/downloaded-file"); or ubyte[] file = Ftp.get("ftp://www.digitalmars.com/whatever.zip").bytes;

Great! Please add to the documentation. I also think that etc.curl would benefit greatly from an article written about it showing how to use it to accomplish various fun and simple tasks.
Aug 24 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/24/11 9:35 AM, Walter Bright wrote:
 On 8/24/2011 2:18 AM, Johannes Pfau wrote:
 Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.

Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it. 1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page: string contents = getURL("http://www.digitalmars.com");

That's the first example in the API docs: // Simple GET with connect timeout of 10 seconds Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();


I find this flow counterintuitive. It suggests that first a GET is issued and then the timeout is set. It takes a bit of getting used to things to realize that get() is actually preparing things and that the real work is done in toString.
 Ok, but 1. there's no indication in the doc that that's what it does
 because 1. it is an incomplete fragment and 2. a simpler interface (like
 what I proposed) would likely be better.


 or, if you don't care about timeouts:
 string contents = Http.get("http://www.google.com").toString();

This example needs to be first and needs an explanation. Timeouts, error recovery, progress indicators, etc., should all be brought up later.

I'll note there's an annoying redundancy: the protocol appears twice, once in the URL and again in the type Http. That means if one needs to fetch from an URL transparently they'd have to go through e.g.: string url = readln(); string contents; if (url.startsWith("http://")) { contents = Http.get(url).toString(); } else if (url.startsWith("ftp://")) { contents = Ftp.get(url).toString(); } else { enforce(false, "Unrecognized protocol."); } Probably this simple level of abstraction should be provided by the library (with the added advantage that client code will automatically work with new protocols as the library includes them). I'll be back with a thorough review. Andrei
Aug 24 2011
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/24/2011 10:33 AM, Andrei Alexandrescu wrote:
 Probably this simple level of abstraction should be provided by the library
 (with the added advantage that client code will automatically work with new
 protocols as the library includes them).

Yes to both.
Aug 24 2011
prev sibling parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 24/08/11 19.33, Andrei Alexandrescu wrote:
 On 8/24/11 9:35 AM, Walter Bright wrote:
 On 8/24/2011 2:18 AM, Johannes Pfau wrote:
 Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.

Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it. 1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page: string contents = getURL("http://www.digitalmars.com");

That's the first example in the API docs: // Simple GET with connect timeout of 10 seconds Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();


I find this flow counterintuitive. It suggests that first a GET is issued and then the timeout is set. It takes a bit of getting used to things to realize that get() is actually preparing things and that the real work is done in toString.

What I was aiming at was something like: Http.get("http://google.com", connectTimeout = dur!"seconds"(10)); But D does not support named parameters and that is why it ended up this way. I guess renaming to Http.prepareGet("http://google.com") or something would better reflect its usage.
 Ok, but 1. there's no indication in the doc that that's what it does
 because 1. it is an incomplete fragment and 2. a simpler interface (like
 what I proposed) would likely be better.


 or, if you don't care about timeouts:
 string contents = Http.get("http://www.google.com").toString();

This example needs to be first and needs an explanation. Timeouts, error recovery, progress indicators, etc., should all be brought up later.

I'll note there's an annoying redundancy: the protocol appears twice, once in the URL and again in the type Http. That means if one needs to fetch from an URL transparently they'd have to go through e.g.: string url = readln(); string contents; if (url.startsWith("http://")) { contents = Http.get(url).toString(); } else if (url.startsWith("ftp://")) { contents = Ftp.get(url).toString(); } else { enforce(false, "Unrecognized protocol."); } Probably this simple level of abstraction should be provided by the library (with the added advantage that client code will automatically work with new protocols as the library includes them).

Someone also got confused about the "get()" static method could be called on an instance of Http. This is of course not how it is intended to be used but never the less possible. I guess forcing the user to prepend the class name when calling a static class method would be nice to catch such errors. Maybe this should be change in the D language? Since this is currently not the case I think moving all the Http.get(),Http.post()...Ftp.get()... static methods to module scope would solve the issue. I also think it would be a good idea to provide a function that parses the protocol and just performs the curl request as a convenience. This cannot replace the Http.get() style functions because they return a Result that know what can be done on a specific protocol. For example maxRedirects is supported for the Http protocol but not for Ftp/Smtp. So to summarize: * Move static convenience methods (Http.get/post/...) to module level functions and rename to reflect that they only prepare a request. One of toString(),bytes(),byLine(),byChunk(),byLineAsync(),byChunkAsync() will have to be called to actually get something. * Create a new convenience function that parses protocol from url and performs a sync version of the request and then returns the result (ubyte[] or string). This is not as flexible but probably the most common case. For example: string content = curl("http://mysite.com/form.cgi", "hello world data", "username:password"); Actually would adding opCast!string() and opCast!(ubyte[])() to AsyncResult and Result allow for this?: string content = Http.get("http://mysite.com/form.cgi"); or ubyte[] content = Http.get("http://mysite.com/form.cgi"); How does this sound?
 I'll be back with a thorough review.


 Andrei

Aug 25 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:

 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");

It wouldn't, opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions. David
Aug 31 2011
parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 01-09-2011 05:19, David Nadlinger skrev:
 On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:

 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");

It wouldn't, opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions. David

Ok. You can do it in C++. I wonder what this has not been allowed in D? /Jonas
Sep 01 2011
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 09/01/2011 06:39 PM, jdrewsen wrote:
 Den 01-09-2011 05:19, David Nadlinger skrev:
 On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:

 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");

It wouldn't, opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions. David

Ok. You can do it in C++. I wonder what this has not been allowed in D? /Jonas

Because in D there is/should be multiple alias this, which is a better concept to solve the same set of problems.
Sep 01 2011
next sibling parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 01-09-2011 18:52, Timon Gehr skrev:
 On 09/01/2011 06:39 PM, jdrewsen wrote:
 Den 01-09-2011 05:19, David Nadlinger skrev:
 On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:

 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");

It wouldn't, opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions. David

Ok. You can do it in C++. I wonder what this has not been allowed in D? /Jonas

Because in D there is/should be multiple alias this, which is a better concept to solve the same set of problems.

Will multiple alias this is currently not supported and AFAIK not planned. Furthermore I don't how doing alias this on both string and ubyte[] is possible since they share a lot of interface. For example: If the class Result has alias this for ubyte[] and for string then what does this mean? Result res = .....; writeln("The length is ", res.length); Therefore I still think implicit opCast would be nice. Maybe the opCast could have an implicit attribute or something to make i safer. /Jonas
Sep 03 2011
parent jdrewsen <jdrewsen nospam.com> writes:
Den 03-09-2011 22:19, Jonathan M Davis skrev:
 On Saturday, September 03, 2011 22:13:44 jdrewsen wrote:
 Den 01-09-2011 18:52, Timon Gehr skrev:
 On 09/01/2011 06:39 PM, jdrewsen wrote:
 Den 01-09-2011 05:19, David Nadlinger skrev:
 On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:

 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");

It wouldn't, opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions. David

Ok. You can do it in C++. I wonder what this has not been allowed in D? /Jonas

Because in D there is/should be multiple alias this, which is a better concept to solve the same set of problems.

Will multiple alias this is currently not supported and AFAIK not planned. Furthermore I don't how doing alias this on both string and ubyte[] is possible since they share a lot of interface. For example: If the class Result has alias this for ubyte[] and for string then what does this mean? Result res = .....; writeln("The length is ", res.length); Therefore I still think implicit opCast would be nice. Maybe the opCast could have an implicit attribute or something to make i safer.

TDPL explicitly states that D supports multiple alias this. It just hasn't been implemented in dmd yet. It is certainly true though that there may be issues with having multiple alias this of types which are too similar, and those issues have not necessarily been ironed out yet.

Ok. Must have got it wrong then.
 opImplicitCast has been requested on a number of occasions, but there are not
 currently any plans for such a feature. alias this is intended to fulfill that
 role, and I don't see how opImplicitCast would really be any different from
 alias this, since you can give a function to alias this already (and then the
 type implicitly converts to the function's return type).

Didn't know the return type trick. Nice! /Jonas
Sep 09 2011
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, September 03, 2011 22:13:44 jdrewsen wrote:
 Den 01-09-2011 18:52, Timon Gehr skrev:
 On 09/01/2011 06:39 PM, jdrewsen wrote:
 Den 01-09-2011 05:19, David Nadlinger skrev:
 On 8/25/11 12:05 PM, Jonas Drewsen wrote:
 Actually would adding opCast!string() and opCast!(ubyte[])() to
 AsyncResult and Result allow for this?:
 
 string content = Http.get("http://mysite.com/form.cgi");
 or
 ubyte[] content = Http.get("http://mysite.com/form.cgi");

It wouldn't, opCast is only taken into account for explicit casts (if(something) is considered an explicit cast to bool). For this to work, you would have to use alias this, but currently you could only provide one of the two conversions. David

Ok. You can do it in C++. I wonder what this has not been allowed in D? /Jonas

Because in D there is/should be multiple alias this, which is a better concept to solve the same set of problems.

Will multiple alias this is currently not supported and AFAIK not planned. Furthermore I don't how doing alias this on both string and ubyte[] is possible since they share a lot of interface. For example: If the class Result has alias this for ubyte[] and for string then what does this mean? Result res = .....; writeln("The length is ", res.length); Therefore I still think implicit opCast would be nice. Maybe the opCast could have an implicit attribute or something to make i safer.

TDPL explicitly states that D supports multiple alias this. It just hasn't been implemented in dmd yet. It is certainly true though that there may be issues with having multiple alias this of types which are too similar, and those issues have not necessarily been ironed out yet. opImplicitCast has been requested on a number of occasions, but there are not currently any plans for such a feature. alias this is intended to fulfill that role, and I don't see how opImplicitCast would really be any different from alias this, since you can give a function to alias this already (and then the type implicitly converts to the function's return type). - Jonathan M Davis
Sep 03 2011
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-08-24 18:35, Walter Bright wrote:
 On 8/24/2011 2:18 AM, Johannes Pfau wrote:

 Not sure, if this still works though. Twitter is switching from HTTP
 basic authentication to OAuth. Also, this code is not tested ;-)

I'm glad to see this can be done. A complete example in the documentation would be very informative and useful.

I think this is a little beyond the documentation to explain who to use a specific third party API. This would fit better in an article or similar. -- /Jacob Carlborg
Aug 24 2011
prev sibling parent Jonas Drewsen <jdrewsen nospam.com> writes:
On 24/08/11 18.35, Walter Bright wrote:
 On 8/24/2011 2:18 AM, Johannes Pfau wrote:
 Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.

Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it. 1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page: string contents = getURL("http://www.digitalmars.com");

That's the first example in the API docs: // Simple GET with connect timeout of 10 seconds Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();

Ok, but 1. there's no indication in the doc that that's what it does because 1. it is an incomplete fragment and 2. a simpler interface (like what I proposed) would likely be better.
 or, if you don't care about timeouts:
 string contents = Http.get("http://www.google.com").toString();

This example needs to be first and needs an explanation. Timeouts, error recovery, progress indicators, etc., should all be brought up later.
 and contents[] should be the web page; what I'd see if I opened it in
 a browser and selected "view source". From perusing the etc.curl doc I
 have absolutely no idea how to do this.

 2. I'd like to get the contents as an input range, which I can then
 iterate like any other range.

auto res = Http.getAsync("http://www.digitalmars.com"); auto range = res.byChunk(100); writeln(range.front); range.popFront(); writeln(range.front); There's only a byChunk for synchronous requests, but I think this first reads the complete Result into memory and then returns slices. (This is a limitation caused by curl's API, as curl uses callbacks)

Ok.
 3. A lot of web sites like twitter.com have an API. How do I access
 that API from D?

The twitter API is a REST API, so if you want to get information from twitter, you use a GET request with a special url: string xmlTimeLine = Http.get("http://twitter.com/statuses/user_timeline.xml?id=walterbright").toString(); Posting is a little more complicated, as it requires authentication: ubyte[] result; Http twitter = Http("http://api.twitter.com/1/statuses/update.format?status=Hello%20World"); twitter.setAuthentication("user", "password"); twitter.onReceive = (ubyte[] data) { result ~= data; return data.length; }; twitter.method = Http.Method.post; twitter.perform(); (Status should be url encoded, Http.escape can be used for that) Not sure, if this still works though. Twitter is switching from HTTP basic authentication to OAuth. Also, this code is not tested ;-)

I'm glad to see this can be done. A complete example in the documentation would be very informative and useful.
 4. How do I download a file from ftp?

 ubyte[] file =
 downloadFile("ftp://www.digitalmars.com/whatever.zip");

Ftp.get("ftp://www.digitalmars.com/whatever.zip","/tmp/downloaded-file"); or ubyte[] file = Ftp.get("ftp://www.digitalmars.com/whatever.zip").bytes;

Great! Please add to the documentation. I also think that etc.curl would benefit greatly from an article written about it showing how to use it to accomplish various fun and simple tasks.

Actually I truely believe that much of the documentation you're missing is actually in the source code. As I've previously mentioned ddoc does not handle mixins very well. In the etc.curl case it results in the Result and AsyncResult structs documentation is nearly missing. They are pretty important and especially the byChunk()/byLine() methods (mixed in) are not visible in the docs because ddoc cannot handle it. I would really like to know how to handle this situation because it understandably confuses people and makes it impossible to figure out how to do stuff as you have experienced yourself. I will see if I can find the time to put an article together though. Thanks. /Jonas
Aug 25 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, August 23, 2011 14:35 Walter Bright wrote:
 On 8/21/2011 2:56 PM, Jonathan M Davis wrote:
 80 characters should be enough. Most of the documentation in Phobos
 restricts itself to 80 characters, including the examples.

80 characters is way too long for magazine/book examples, and will cause problems when converting the Phobos docs to an ebook format. I recommend examples should be ruthlessly narrowed. At least less than 40.

40 characters is disgustingly narrow. We may need to cut examples to that sort of length for the sake of e-books, but I'm willing to bet that something like 3/4 of the examples would have to be wrapped because of that. It's a serious downgrade for anyone reading them on a computer. - Jonathan M Davis
Aug 23 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, August 23, 2011 15:17 Walter Bright wrote:
 On 8/23/2011 3:09 PM, Jonathan M Davis wrote:
 40 characters is disgustingly narrow. We may need to cut examples to that
 sort of length for the sake of e-books, but I'm willing to bet that
 something like 3/4 of the examples would have to be wrapped because of
 that. It's a serious downgrade for anyone reading them on a computer.

It's not that bad. I did it for the Digital Mars C examples. I just want to make the point that the Phobos implementation code will necessarily follow a different formatting style than the example code. It's just a fact of life, and it isn't just for ebooks. Take a look at any programming book, magazine article, etc. Vertical space is also at a premium for example code.

Well, I've been keeping documentation (including examples) to a hard limit of 80 characters, whereas the actual code only has a soft limit of 80 characters. I understand if examples have to have a shorter line limit due to how they're presented, but the shorter the limit, the more restrictive and annoying it gets. - Jonathan M Davis
Aug 23 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

Walter Bright wrote:
On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.

Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it. 1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page: string contents =3D getURL("http://www.digitalmars.com");

That's the first example in the API docs: // Simple GET with connect timeout of 10 seconds Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toStrin= g(); or, if you don't care about timeouts: string contents =3D Http.get("http://www.google.com").toString();
and contents[] should be the web page; what I'd see if I opened it in
a browser and selected "view source". From perusing the etc.curl doc I
have absolutely no idea how to do this.

2. I'd like to get the contents as an input range, which I can then
iterate like any other range.

auto res =3D Http.getAsync("http://www.digitalmars.com"); auto range =3D res.byChunk(100); writeln(range.front); range.popFront(); writeln(range.front); There's only a byChunk for synchronous requests, but I think this first reads the complete Result into memory and then returns slices. (This is a limitation caused by curl's API, as curl uses callbacks)=20
3. A lot of web sites like twitter.com have an API. How do I access
that API from D?

The twitter API is a REST API, so if you want to get information from twitter, you use a GET request with a special url: string xmlTimeLine =3D Http.get("http://twitter.com/statuses/user_timeline.xml?id=3Dwalterbright")= .toString(); Posting is a little more complicated, as it requires authentication: ubyte[] result; Http twitter =3D Http("http://api.twitter.com/1/statuses/update.format?status=3DHello%20Worl= d"); twitter.setAuthentication("user", "password"); twitter.onReceive =3D (ubyte[] data) { result ~=3D data; return data.length; }; twitter.method =3D Http.Method.post; twitter.perform(); (Status should be url encoded, Http.escape can be used for that) Not sure, if this still works though. Twitter is switching from HTTP basic authentication to OAuth. Also, this code is not tested ;-)
4. How do I download a file from ftp?

    ubyte[] file =3D
 downloadFile("ftp://www.digitalmars.com/whatever.zip");

Ftp.get("ftp://www.digitalmars.com/whatever.zip","/tmp/downloaded-file"); or ubyte[] file =3D Ftp.get("ftp://www.digitalmars.com/whatever.zip").bytes;
5. I'd like an NNTP interface so I can get and post newsgroup messages
from D, though I suspect that is beyond the scope of etc.curl.

BTW: There seems to be a small problem with the API docs: Some members are not properly indented. For example, "statusLine();" which is a member of Result, is not indented. --=20 Johannes Pfau
Aug 24 2011
prev sibling next sibling parent reply Johannes Pfau <spam example.com> writes:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

David Nadlinger wrote:
Now that Lars Kyllingstad's new and improved std.path has passed the=20
vote  =E2=80=93 congratulations, Lars! =E2=80=93, and Jose Armando Garcia,=

of the proposed logging module, is currently not available, the
etc.curl module by Jonas Drewsen is at the front of the review queue.
I have volunteered to run the formal process for inclusion with Phobos.

etc.curl provides a high-level interface to client-side networking=20
functionality for the HTTP, FTP and SMTP protocols by wrapping libcurl=20
[1], accessing the C interface of which is already possible using
Phobos in the form of etc.c.curl. Prior stages of the work have
already been discussed in March [2], May [3], and June [4]. Thanks to
everybody who took the time to evaluate the library back then; as far
as I am aware, all raised concerns have been addressed since then.

Based on the amount of discussion this module already underwent, I=20
hereby propose a two-week review period. Barring any objections, the=20
review starts today (Aug 18) and ends on Aug 31, followed by a one
week vote (Sep 1=E2=80=89-=E2=80=89Sep 7).

Code:
   https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d

API documentation:
   http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

A discussion on the wrapper has already been started on digitalmars.D=20
[5]. To keep everything in one place, please post your comments and=20
reviews there.

At this point, I would like to invite everyone to spend some time=20
testing the module and reading its documentation and code. It is=20
essential for a working open source code review process that many
people participate in it, and regardless of whether you are new to D
or a seasoned contributor, your opinion is very valuable.

Jonas also asked us to have a look at the list of known issues (which=20
can also be found at the top of the source file), and, if possible,=20
suggest a solution:

---
Known issues:
  * DDoc is not generated where the mixins ByLineAsync, ByLineSync,=20
ByChunkAsync and ByLineSync are used. This seems to be a limitation of=20
ddoc - suggestions on how to circumvent this appreciated.

Possible improvements:
  * Progress may be deprecated in the future. Maybe implement a
 replacement.
  * Support typed http headers - (Johannes Pfau)
---

To make sure it doesn't get lost, let me also repeat Johannes Pfau's=20
request to add proxy support from another thread ([6]) here.

Thanks a lot for your reviewing efforts,
David



[1] http://curl.haxx.se/libcurl/
[2]=20
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_support_RFC_13175=

[3]=20
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_136372.ht=

[4]=20
http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_round_two=

[5]=20
http://www.digitalmars.com/webnews/newsgroups.php?art_group=3Ddigitalmars.=

[6]=20
http://www.digitalmars.com/webnews/newsgroups.php?art_group=3Ddigitalmars.=

+++++++++++++++++++++++++++ Code Review +++++++++++++++++++++++++ Comments and examples: /** Example: ---- curl.onReceive =3D (ubyte[] data) { writeln("Got data", cast(char[]) data); return data.length;}; ---- */ Maybe rewrite comments like this: /** * Example: * ---- * curl.onReceive =3D (ubyte[] data) { writeln("Got data", * cast(char[]) data); return data.length;}; * ---- */ This way, the examples don't break the indentation. I think this "misaligned" example blocks make the source harder to read. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L152 bu --> by; thread --> threads https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L148 What does the phobos style guide say for alias names? I'd argue outdata/indata is a type (or at least, looks like one), so --> OutData/InData ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L198 Do not call this from... --> Warning: Do not call this from... https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L289 CurlOption.file --> CurlOption.writeData ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L326 Content of the delegate in the example should be idented. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L345 CurlOption.infile --> CurlOption.readData ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L362 indent https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L414 indent https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L420 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L693 dltotal, dlnow, ultotal, ulnow --> dlTotal, dlNow, ulTotal, ulNow ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L432 The C callbacks take char* pointers to the data buffers. As these could contain binary data, shouldn't ubyte* be used here? I know that the data is later cast to ubyte[], but I'm a little concerned about this pointer slicing: str[0..size*nmemb] . This creates a temporary char[] which could be invalid. Maybe not a big deal, but I think we should avoid that. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L540 It's a little strange that such properties are write-only, but I know that's caused by a curl limitation, so nothing we can do about that. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L576 CurlOption.intrface --> CurlOption.interface, likely also wrong in etc.c.curl https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L595 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L584 int port --> ushort port? I see no need for negative values? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L645 indent https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L662 aften --> after https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L739 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L779 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L817 shouldn't those be documented, so that the user knows which methods can be used on Http.Result/AsyncResult? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1176 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2464 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2899 Maybe some mixin documentation should be added manually, something like "Mixes in Protocol" https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1431 I think I've seen REST APIs which use POST without data or with parameters embedded in the URL. Is there no simple way to tell curl to _not add_ any data and _not set_ the Content-Type header? But I'm not even sure if a POST without a BODY is valid in HTTP. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1432 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1500 Shouldn't delim be "&"? Also in the foreach loop, delim should be ignored for the first pair? foreach (i, key; params.byKey()) { if(i =3D=3D 0) data ~=3D key ~ "=3D" ~ params[key]; else data ~=3D delim ~ key ~ "=3D" ~ params[key]; } Also, is there a reason why the user must escape the parameters manually? As the user has no Curl handle, that could be annoying. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1537 const(char)[] k =3D "Content-Type"; res.header(k, contentType); --> res.header("Content-Type", contentType); ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1658 if ( _true_ ||!netAllowed) return; ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1770 Add "; charset=3Dutf-8" ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1789 IIRC it's possible to add parameter names to delegate types in D to document their use, so void delegate(const(char)[],const(char)[]) callback) --> void delegate(const(char)[] key,const(char)[] value) callback) https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1798 if (header[0..5] =3D=3D "HTTP/") { --> if (header.length >=3D 5 && header[0..5] =3D=3D "HTTP/") { https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1829 if --> of https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1959 add a toString() method for debugging purposes? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2221 document which functions call execute() ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2283 why private? (*v) ~=3D value; --> (*v) ~=3D "," ~ value; ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2327 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2308 Document that this still transfers all data into memory first. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2643 Document that Result mixes in ProtocolRequestParamsSetters and RequestParamsSetters https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2652 Add this bug and all other similar bugs to a ddoc BUG: section in the module comment, so this doesn't get lost. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2932 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2943 Bounds checking: At least a assert((recipient|sender).length > 0); should be added. (I've skipped most of the async stuff, maybe someone else should have a look at that.) ++++++++++++++++++++++ API docs Review ++++++++++++++++++++++ First, huge example: Split this into many smaller code blocks, add some high-level overview, here's some inspiration in pseudo-ddoc, feel free to use this: etc.curl provides wrappers for $(LINK curls) HTTP, FTP and SMTP functions. There are convenience functions which can be used to make simple requests with a single line of code: ---------------------------------------- // Simple HTTP GET auto result =3D Http.get("http://www.google.com").toString(); ---------------------------------------- This makes a HTTP GET request to "http://www.google.com". These convenience functions are always _static_ functions of the HTTP / FTP / SMTP structs. They return a Result/AsyncResult struct which can be used to set options of the request and get the result as a string or ubyte[] array. ---------------------------------------- // Simple HTTP GET with timeout auto request =3D Http.get("http://www.google.com"); request.connectTimeout(dur!"seconds"(10)); auto result =3D request.toString(); ---------------------------------------- As connectTimeout and similar methods return the Result/AsyncResult instance, it's possible to chain those calls into a single line: ---------------------------------------- auto result =3D Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toStrin= g(); ---------------------------------------- Note that the request to the remote webserver is done when you call toString() or similar methods that internally call execute (see $(LINK )). If you don't call toString() or a similar method, no request is made! etc.curl additionally provides asynchronous versions of these convenience functions: Http.getAsync() is the asynchronous version that will spawn a thread in the background and return a Http.AsyncResult immediately. You can read data from the result at later point in time. This allows you to start processing data before all data has been received by using byChunk() or byLine() on the Http.AsyncResult instance. ---------------------------------------- // GET using an asynchronous range foreach (line; Http.getAsync("http://www.google.com").byLine()) { // Do some expensive processing on the line while more lines are // received asynchronously in the background. writeln("asyncLine: ", line); } ---------------------------------------- If you need more control than the convenience functions provide, you'll have to use the full API: ---------------------------------------- // GET with custom data receivers Http http =3D Http("http://www.google.com"); http.onReceiveHeader =3D (const(char)[] key, const(char)[] value) { writeln(key ~ ": " ~ value); }; http.onReceive =3D (ubyte[] data) { /+ drop +/ return data.length; }; http.perform(); ---------------------------------------- First, we create an instance of the reference-counted Http struct. We then set custom delegates which are called whenever the Http instance receives a header or a data buffer. In this simple example, we simply print the headers and ignore the data. See $(LINK onReceiveHeader/onReceive) for more information. We finally start the HTTP request by calling perform(). Note that you cannot combine the convenience get/post/put/... functions with this extended API: Although you could call http.get("http://some.url") instead of perform, this would completely ignore the settings made to the Http instance! Remember that get/post/put/etc are _static_ methods. They cannot access this instances data. If you want to make a different request than GET with the extended API, you have to set the method property: ---------------------------------------- // PUT with data senders string msg =3D "Hello world"; http.onSend =3D (void[] data) { if (msg.empty) return 0; auto m =3D cast(void[])msg; size_t length =3D m.length; //We assume msg fits into the data buffer, real code shouldn't do //that assert(data.length > msg.length); data[0..length] =3D m[0..$]; return length; }; http.method =3D Http.Method.put; http.contentLength =3D 11; // defaults to chunked transfer if not specified=20 http.perform(); ---------------------------------------- If you data doesn't fit into the buffer provided by onSend, onSend will be called multiple times. See $(LINK) for more information. The extended API has also functions to help you track the transfer's progress: ---------------------------------------- // Track progress http.method =3D Http.Method.get; http.url =3D "http://upload.wikimedia.org/wikipedia/commons/" "5/53/Wikipedia-logo-en-big.png"; http.onReceive =3D (ubyte[] data) { return data.length; }; //Discard data http.onProgress =3D (double dltotal, double dlnow, double ultotal, double ulnow) { writeln("Progress ", dltotal, ", ", dlnow, ", ", ultotal, ", ", ulnow); return 0; }; http.perform(); ---------------------------------------- As a last example, here's how to send an email with etc.curl: ---------------------------------------- // Send an email with SMTPS Smtp smtp =3D Smtp("smtps://smtp.gmail.com"); smtp.setAuthentication("from.addr gmail.com", "password"); smtp.mailTo =3D ["<to.addr gmail.com>"]; smtp.mailFrom =3D "<from.addr gmail.com>"; smtp.message =3D "Example Message"; smtp.perform(); ---------------------------------------- Some complete examples like these above could also be added to the Http/Ftp/Smtp struct documentation. You could use the examples I used in a reply to Walter, if you wanted to. I'd also say, at least in the examples, use braces on their own lines. And I'd insert a few empty lines in certain places, for example: ---------------------------------------- http.onSend =3D (void[] data) { if (msg.empty) return 0; auto m =3D cast(void[])msg; size_t length =3D m.length; data[0..length] =3D m[0..$]; msg.length =3D 0; return length; }; ---------------------------------------- doesn't look very good, I'd write it like this: ---------------------------------------- http.onSend =3D (void[] data) { if (msg.empty) return 0; auto m =3D cast(void[])msg; size_t length =3D m.length; //We assume msg fits into the data buffer, real code shouldn't do //that assert(data.length > msg.length); data[0..length] =3D m[0..$]; return length; }; ---------------------------------------- ++++++++++++++++++++++ Feature request ++++++++++++++++++++++ Although David already quoted my request in the initial post (thx David), I'll repeat it here: =20 Could you add proxy support to the wrapper? All of CURLOPT_PROXY CURLOPT_PROXYPORT and CURLOPT_PROXYTYPE should be available. (The 'combined' CURLOPT_PROXY since 7.21.7 is too new, that's not even in the current ubuntu) --=20 Johannes Pfau
Aug 24 2011
next sibling parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 24/08/11 14.58, Johannes Pfau wrote:
 David Nadlinger wrote:
 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote  – congratulations, Lars! –, and Jose Armando Garcia, the author
 of the proposed logging module, is currently not available, the
 etc.curl module by Jonas Drewsen is at the front of the review queue.
 I have volunteered to run the formal process for inclusion with Phobos.

 etc.curl provides a high-level interface to client-side networking
 functionality for the HTTP, FTP and SMTP protocols by wrapping libcurl
 [1], accessing the C interface of which is already possible using
 Phobos in the form of etc.c.curl. Prior stages of the work have
 already been discussed in March [2], May [3], and June [4]. Thanks to
 everybody who took the time to evaluate the library back then; as far
 as I am aware, all raised concerns have been addressed since then.

 Based on the amount of discussion this module already underwent, I
 hereby propose a two-week review period. Barring any objections, the
 review starts today (Aug 18) and ends on Aug 31, followed by a one
 week vote (Sep 1 - Sep 7).

 Code:
    https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d

 API documentation:
    http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 A discussion on the wrapper has already been started on digitalmars.D
 [5]. To keep everything in one place, please post your comments and
 reviews there.

 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D
 or a seasoned contributor, your opinion is very valuable.

 Jonas also asked us to have a look at the list of known issues (which
 can also be found at the top of the source file), and, if possible,
 suggest a solution:

 ---
 Known issues:
   * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation of
 ddoc - suggestions on how to circumvent this appreciated.

 Possible improvements:
   * Progress may be deprecated in the future. Maybe implement a
 replacement.
   * Support typed http headers - (Johannes Pfau)
 ---

 To make sure it doesn't get lost, let me also repeat Johannes Pfau's
 request to add proxy support from another thread ([6]) here.

 Thanks a lot for your reviewing efforts,
 David



 [1] http://curl.haxx.se/libcurl/
 [2]
 http://www.digitalmars.com/d/archives/digitalmars/D/Curl_support_RFC_131753.html
 [3]
 http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_136372.html
 [4]
 http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_round_two_138945.html
 [5]
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142661
 [6]
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=142175

+++++++++++++++++++++++++++ Code Review +++++++++++++++++++++++++ Comments and examples: /** Example: ---- curl.onReceive = (ubyte[] data) { writeln("Got data", cast(char[]) data); return data.length;}; ---- */ Maybe rewrite comments like this: /** * Example: * ---- * curl.onReceive = (ubyte[] data) { writeln("Got data", * cast(char[]) data); return data.length;}; * ---- */ This way, the examples don't break the indentation. I think this "misaligned" example blocks make the source harder to read. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L152 bu --> by; thread --> threads https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L148 What does the phobos style guide say for alias names? I'd argue outdata/indata is a type (or at least, looks like one), so --> OutData/InData ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L198 Do not call this from... --> Warning: Do not call this from... https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L289 CurlOption.file --> CurlOption.writeData ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L326 Content of the delegate in the example should be idented. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L345 CurlOption.infile --> CurlOption.readData ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L362 indent https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L414 indent https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L420 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L693 dltotal, dlnow, ultotal, ulnow --> dlTotal, dlNow, ulTotal, ulNow ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L432 The C callbacks take char* pointers to the data buffers. As these could contain binary data, shouldn't ubyte* be used here? I know that the data is later cast to ubyte[], but I'm a little concerned about this pointer slicing: str[0..size*nmemb] . This creates a temporary char[] which could be invalid. Maybe not a big deal, but I think we should avoid that. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L540 It's a little strange that such properties are write-only, but I know that's caused by a curl limitation, so nothing we can do about that. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L576 CurlOption.intrface --> CurlOption.interface, likely also wrong in etc.c.curl https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L595 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L584 int port --> ushort port? I see no need for negative values? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L645 indent https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L662 aften --> after https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L739 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L779 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L817 shouldn't those be documented, so that the user knows which methods can be used on Http.Result/AsyncResult? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1176 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2464 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2899 Maybe some mixin documentation should be added manually, something like "Mixes in Protocol" https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1431 I think I've seen REST APIs which use POST without data or with parameters embedded in the URL. Is there no simple way to tell curl to _not add_ any data and _not set_ the Content-Type header? But I'm not even sure if a POST without a BODY is valid in HTTP. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1432 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1500 Shouldn't delim be "&"? Also in the foreach loop, delim should be ignored for the first pair? foreach (i, key; params.byKey()) { if(i == 0) data ~= key ~ "=" ~ params[key]; else data ~= delim ~ key ~ "=" ~ params[key]; } Also, is there a reason why the user must escape the parameters manually? As the user has no Curl handle, that could be annoying. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1537 const(char)[] k = "Content-Type"; res.header(k, contentType); --> res.header("Content-Type", contentType); ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1658 if ( _true_ ||!netAllowed) return; ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1770 Add "; charset=utf-8" ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1789 IIRC it's possible to add parameter names to delegate types in D to document their use, so void delegate(const(char)[],const(char)[]) callback) --> void delegate(const(char)[] key,const(char)[] value) callback) https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1798 if (header[0..5] == "HTTP/") { --> if (header.length>= 5&& header[0..5] == "HTTP/") { https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1829 if --> of https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1959 add a toString() method for debugging purposes? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2221 document which functions call execute() ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2283 why private? (*v) ~= value; --> (*v) ~= "," ~ value; ? https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2327 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2308 Document that this still transfers all data into memory first. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2643 Document that Result mixes in ProtocolRequestParamsSetters and RequestParamsSetters https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2652 Add this bug and all other similar bugs to a ddoc BUG: section in the module comment, so this doesn't get lost. https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2932 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2943 Bounds checking: At least a assert((recipient|sender).length> 0); should be added. (I've skipped most of the async stuff, maybe someone else should have a look at that.) ++++++++++++++++++++++ API docs Review ++++++++++++++++++++++ First, huge example: Split this into many smaller code blocks, add some high-level overview, here's some inspiration in pseudo-ddoc, feel free to use this: etc.curl provides wrappers for $(LINK curls) HTTP, FTP and SMTP functions. There are convenience functions which can be used to make simple requests with a single line of code: ---------------------------------------- // Simple HTTP GET auto result = Http.get("http://www.google.com").toString(); ---------------------------------------- This makes a HTTP GET request to "http://www.google.com". These convenience functions are always _static_ functions of the HTTP / FTP / SMTP structs. They return a Result/AsyncResult struct which can be used to set options of the request and get the result as a string or ubyte[] array. ---------------------------------------- // Simple HTTP GET with timeout auto request = Http.get("http://www.google.com"); request.connectTimeout(dur!"seconds"(10)); auto result = request.toString(); ---------------------------------------- As connectTimeout and similar methods return the Result/AsyncResult instance, it's possible to chain those calls into a single line: ---------------------------------------- auto result = Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString(); ---------------------------------------- Note that the request to the remote webserver is done when you call toString() or similar methods that internally call execute (see $(LINK )). If you don't call toString() or a similar method, no request is made! etc.curl additionally provides asynchronous versions of these convenience functions: Http.getAsync() is the asynchronous version that will spawn a thread in the background and return a Http.AsyncResult immediately. You can read data from the result at later point in time. This allows you to start processing data before all data has been received by using byChunk() or byLine() on the Http.AsyncResult instance. ---------------------------------------- // GET using an asynchronous range foreach (line; Http.getAsync("http://www.google.com").byLine()) { // Do some expensive processing on the line while more lines are // received asynchronously in the background. writeln("asyncLine: ", line); } ---------------------------------------- If you need more control than the convenience functions provide, you'll have to use the full API: ---------------------------------------- // GET with custom data receivers Http http = Http("http://www.google.com"); http.onReceiveHeader = (const(char)[] key, const(char)[] value) { writeln(key ~ ": " ~ value); }; http.onReceive = (ubyte[] data) { /+ drop +/ return data.length; }; http.perform(); ---------------------------------------- First, we create an instance of the reference-counted Http struct. We then set custom delegates which are called whenever the Http instance receives a header or a data buffer. In this simple example, we simply print the headers and ignore the data. See $(LINK onReceiveHeader/onReceive) for more information. We finally start the HTTP request by calling perform(). Note that you cannot combine the convenience get/post/put/... functions with this extended API: Although you could call http.get("http://some.url") instead of perform, this would completely ignore the settings made to the Http instance! Remember that get/post/put/etc are _static_ methods. They cannot access this instances data. If you want to make a different request than GET with the extended API, you have to set the method property: ---------------------------------------- // PUT with data senders string msg = "Hello world"; http.onSend = (void[] data) { if (msg.empty) return 0; auto m = cast(void[])msg; size_t length = m.length; //We assume msg fits into the data buffer, real code shouldn't do //that assert(data.length> msg.length); data[0..length] = m[0..$]; return length; }; http.method = Http.Method.put; http.contentLength = 11; // defaults to chunked transfer if not specified http.perform(); ---------------------------------------- If you data doesn't fit into the buffer provided by onSend, onSend will be called multiple times. See $(LINK) for more information. The extended API has also functions to help you track the transfer's progress: ---------------------------------------- // Track progress http.method = Http.Method.get; http.url = "http://upload.wikimedia.org/wikipedia/commons/" "5/53/Wikipedia-logo-en-big.png"; http.onReceive = (ubyte[] data) { return data.length; }; //Discard data http.onProgress = (double dltotal, double dlnow, double ultotal, double ulnow) { writeln("Progress ", dltotal, ", ", dlnow, ", ", ultotal, ", ", ulnow); return 0; }; http.perform(); ---------------------------------------- As a last example, here's how to send an email with etc.curl: ---------------------------------------- // Send an email with SMTPS Smtp smtp = Smtp("smtps://smtp.gmail.com"); smtp.setAuthentication("from.addr gmail.com", "password"); smtp.mailTo = ["<to.addr gmail.com>"]; smtp.mailFrom = "<from.addr gmail.com>"; smtp.message = "Example Message"; smtp.perform(); ---------------------------------------- Some complete examples like these above could also be added to the Http/Ftp/Smtp struct documentation. You could use the examples I used in a reply to Walter, if you wanted to. I'd also say, at least in the examples, use braces on their own lines. And I'd insert a few empty lines in certain places, for example: ---------------------------------------- http.onSend = (void[] data) { if (msg.empty) return 0; auto m = cast(void[])msg; size_t length = m.length; data[0..length] = m[0..$]; msg.length = 0; return length; }; ---------------------------------------- doesn't look very good, I'd write it like this: ---------------------------------------- http.onSend = (void[] data) { if (msg.empty) return 0; auto m = cast(void[])msg; size_t length = m.length; //We assume msg fits into the data buffer, real code shouldn't do //that assert(data.length> msg.length); data[0..length] = m[0..$]; return length; }; ---------------------------------------- ++++++++++++++++++++++ Feature request ++++++++++++++++++++++ Although David already quoted my request in the initial post (thx David), I'll repeat it here: Could you add proxy support to the wrapper? All of CURLOPT_PROXY CURLOPT_PROXYPORT and CURLOPT_PROXYTYPE should be available. (The 'combined' CURLOPT_PROXY since 7.21.7 is too new, that's not even in the current ubuntu)

Thank you. Will chew through it slowly. /Jonas
Aug 25 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/25/2011 3:07 AM, Jonas Drewsen wrote:
 Thank you. Will chew through it slowly.

This is a meta comment for everyone - when replying to a long post, please quote only what is necessary. It makes navigating these threads easier.
Aug 25 2011
parent jdrewsen <jdrewsen nospam.com> writes:
Den 25-08-2011 18:49, Walter Bright skrev:
 quote only what is necessary

Aug 25 2011
prev sibling parent Jonas Drewsen <jdrewsen nospam.com> writes:
On 24/08/11 16.34, Johannes Pfau wrote:
 Damn, I forgot the most important part:

 etc.curl currently uses none of D's "advanced" function attributes.
 This means that etc.curl can't be used in safeD for example. IMHO
 that's a big problem: New phobos code should be usable in safeD it's
 bad enough that lots of old code isn't.

 So:
  safe or  trusted needs to be added to almost all functions in etc.curl.

 'const' (as in const functions, not const arguments) probably doesn't
 make sense for a wrapper, so that's not needed.

 There are some functions which could be 'pure': escape&  unescape for
 example

Actually libcurl just recently changed their escape API to require a curl handle. This means we cannot make it pure I guess.
 Seems like most functions can throw, but maybe some are 'nothrow'.

Will check.
Aug 25 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Damn, I forgot the most important part:

etc.curl currently uses none of D's "advanced" function attributes.
This means that etc.curl can't be used in safeD for example. IMHO
that's a big problem: New phobos code should be usable in safeD it's
bad enough that lots of old code isn't.

So:
 safe or  trusted needs to be added to almost all functions in etc.curl.

'const' (as in const functions, not const arguments) probably doesn't
make sense for a wrapper, so that's not needed.

There are some functions which could be 'pure': escape & unescape for
example

Seems like most functions can throw, but maybe some are 'nothrow'.

-- 
Johannes Pfau
Aug 24 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Walter Bright wrote:
On 8/24/2011 2:18 AM, Johannes Pfau wrote:
 Walter Bright wrote:
 On 8/17/2011 4:12 PM, David Nadlinger wrote:
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue.

Thanks for doing this. I preface my remarks with saying I have never done http: programming and know little about it. 1. The first, most obvious thing I'd like to do is present a URL and get a string back that is the contents of that web page: string contents = getURL("http://www.digitalmars.com");

That's the first example in the API docs: // Simple GET with connect timeout of 10 seconds Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();

Ok, but 1. there's no indication in the doc that that's what it does because 1. it is an incomplete fragment and 2. a simpler interface (like what I proposed) would likely be better.

I agree, I've included some suggestions on the API docs in my review comment. I think the big code example should be split into many smaller examples with better explanations.
 or, if you don't care about timeouts:
 string contents = Http.get("http://www.google.com").toString();

This example needs to be first and needs an explanation. Timeouts, error recovery, progress indicators, etc., should all be brought up later.

I completely agree.
 and contents[] should be the web page; what I'd see if I opened it
 in a browser and selected "view source". From perusing the etc.curl
 doc I have absolutely no idea how to do this.

 2. I'd like to get the contents as an input range, which I can then
 iterate like any other range.

auto res = Http.getAsync("http://www.digitalmars.com"); auto range = res.byChunk(100); writeln(range.front); range.popFront(); writeln(range.front); There's only a byChunk for synchronous requests, but I think this first reads the complete Result into memory and then returns slices. (This is a limitation caused by curl's API, as curl uses callbacks)

Ok.

I guess you already figured that out)
 3. A lot of web sites like twitter.com have an API. How do I access
 that API from D?

The twitter API is a REST API, so if you want to get information from twitter, you use a GET request with a special url: string xmlTimeLine = Http.get("http://twitter.com/statuses/user_timeline.xml?id=walterbright").toString(); Posting is a little more complicated, as it requires authentication: ubyte[] result; Http twitter = Http("http://api.twitter.com/1/statuses/update.format?status=Hello%20World"); twitter.setAuthentication("user", "password"); twitter.onReceive = (ubyte[] data) { result ~= data; return data.length; }; twitter.method = Http.Method.post; twitter.perform(); (Status should be url encoded, Http.escape can be used for that) Not sure, if this still works though. Twitter is switching from HTTP basic authentication to OAuth. Also, this code is not tested ;-)

I'm glad to see this can be done. A complete example in the documentation would be very informative and useful.

Someone with a twitter account should test it and then it could be added as an example. BTW: Thinking about it this should also be possible with the 'simple' API: auto result = Http.post(http://api.twitter.com/1/statuses/update.format, ["status" : "Hello%20World"]).authentication("user", "password").toString();
 4. How do I download a file from ftp?

     ubyte[] file =
 downloadFile("ftp://www.digitalmars.com/whatever.zip");

Ftp.get("ftp://www.digitalmars.com/whatever.zip","/tmp/downloaded-file"); or ubyte[] file = Ftp.get("ftp://www.digitalmars.com/whatever.zip").bytes;

Great! Please add to the documentation. I also think that etc.curl would benefit greatly from an article written about it showing how to use it to accomplish various fun and simple tasks.

-- Johannes Pfau
Aug 24 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Jonas Drewsen wrote:
On 24/08/11 16.34, Johannes Pfau wrote:
 Damn, I forgot the most important part:

 etc.curl currently uses none of D's "advanced" function attributes.
 This means that etc.curl can't be used in safeD for example. IMHO
 that's a big problem: New phobos code should be usable in safeD it's
 bad enough that lots of old code isn't.

 So:
  safe or  trusted needs to be added to almost all functions in
 etc.curl.

 'const' (as in const functions, not const arguments) probably doesn't
 make sense for a wrapper, so that's not needed.

 There are some functions which could be 'pure': escape&  unescape for
 example

Actually libcurl just recently changed their escape API to require a curl handle. This means we cannot make it pure I guess.

I have to admit I'm not very familiar with the 'pure' feature. I think it's possible to have pure member functions, but I'm not sure about the details.
 Seems like most functions can throw, but maybe some are 'nothrow'.

Will check.

-- Johannes Pfau
Aug 25 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Jonas Drewsen wrote:
On 24/08/11 14.58, Johannes Pfau wrote:
 David Nadlinger wrote:
 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote  =E2=80=93 congratulations, Lars! =E2=80=93, and Jose Armando Garc=



 author of the proposed logging module, is currently not available,
 the etc.curl module by Jonas Drewsen is at the front of the review
 queue. I have volunteered to run the formal process for inclusion
 with Phobos.

 etc.curl provides a high-level interface to client-side networking
 functionality for the HTTP, FTP and SMTP protocols by wrapping
 libcurl [1], accessing the C interface of which is already possible
 using Phobos in the form of etc.c.curl. Prior stages of the work
 have already been discussed in March [2], May [3], and June [4].
 Thanks to everybody who took the time to evaluate the library back
 then; as far as I am aware, all raised concerns have been addressed
 since then.

 Based on the amount of discussion this module already underwent, I
 hereby propose a two-week review period. Barring any objections, the
 review starts today (Aug 18) and ends on Aug 31, followed by a one
 week vote (Sep 1=E2=80=89-=E2=80=89Sep 7).

 Code:
    https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d

 API documentation:
    http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 A discussion on the wrapper has already been started on
 digitalmars.D [5]. To keep everything in one place, please post
 your comments and reviews there.

 At this point, I would like to invite everyone to spend some time
 testing the module and reading its documentation and code. It is
 essential for a working open source code review process that many
 people participate in it, and regardless of whether you are new to D
 or a seasoned contributor, your opinion is very valuable.

 Jonas also asked us to have a look at the list of known issues
 (which can also be found at the top of the source file), and, if
 possible, suggest a solution:

 ---
 Known issues:
   * DDoc is not generated where the mixins ByLineAsync, ByLineSync,
 ByChunkAsync and ByLineSync are used. This seems to be a limitation
 of ddoc - suggestions on how to circumvent this appreciated.

 Possible improvements:
   * Progress may be deprecated in the future. Maybe implement a
 replacement.
   * Support typed http headers - (Johannes Pfau)
 ---

 To make sure it doesn't get lost, let me also repeat Johannes Pfau's
 request to add proxy support from another thread ([6]) here.

 Thanks a lot for your reviewing efforts,
 David



 [1] http://curl.haxx.se/libcurl/
 [2]
 http://www.digitalmars.com/d/archives/digitalmars/D/Curl_support_RFC_13=



 [3]
 http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_136372=



 [4]
 http://www.digitalmars.com/d/archives/digitalmars/D/Curl_wrapper_round_=



 [5]
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=3Ddigitalma=



 [6]
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=3Ddigitalma=



 +++++++++++++++++++++++++++   Code Review
 +++++++++++++++++++++++++

 Comments and examples:
      /**
         Example:
         ----
 curl.onReceive =3D (ubyte[] data) { writeln("Got data", cast(char[])
         data); return data.length;};
         ----
      */
 Maybe rewrite comments like this:
      /**
       * Example:
       * ----
       * curl.onReceive =3D (ubyte[] data) { writeln("Got data",
       * cast(char[]) data); return data.length;};
       * ----
       */
 This way, the examples don't break the indentation. I think this
 "misaligned" example blocks make the source harder to read.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L152
 bu -->  by; thread -->  threads

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L148
 What does the phobos style guide say for alias names? I'd argue
 outdata/indata is a type (or at least, looks like one), so -->
 OutData/InData ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L198
 Do not call this from... -->  Warning: Do not call this from...

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L289
 CurlOption.file -->  CurlOption.writeData ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L326
 Content of the delegate in the example should be idented.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L345
 CurlOption.infile -->  CurlOption.readData ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L362
 indent

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L414
 indent

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L420
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L693
 dltotal, dlnow, ultotal, ulnow -->  dlTotal, dlNow, ulTotal, ulNow ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L432
 The C callbacks take char* pointers to the data buffers. As these
 could contain binary data, shouldn't ubyte* be used here? I know that
 the data is later cast to ubyte[], but I'm a little concerned about
 this pointer slicing: str[0..size*nmemb] . This creates a temporary
 char[] which could be invalid. Maybe not a big deal, but I think we
 should avoid that.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L540
 It's a little strange that such properties are write-only, but I know
 that's caused by a curl limitation, so nothing we can do about that.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L576
 CurlOption.intrface -->  CurlOption.interface, likely also wrong in
 etc.c.curl

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L595
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L584
 int port -->  ushort port? I see no need for negative values?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L645
 indent

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L662
 aften -->  after

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L739
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L779
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L817
 shouldn't those be documented, so that the user knows which methods
 can be used on Http.Result/AsyncResult?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1176
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2464
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2899
 Maybe some mixin documentation should be added manually, something
 like "Mixes in Protocol"

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1431
 I think I've seen REST APIs which use POST without data or with
 parameters embedded in the URL. Is there no simple way to tell curl
 to _not add_ any data and _not set_ the Content-Type header? But I'm
 not even sure if a POST without a BODY is valid in HTTP.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1432
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1500
 Shouldn't delim be "&"?
 Also in the foreach loop, delim should be ignored for the first pair?
        foreach (i, key; params.byKey()) {
              if(i =3D=3D 0)
                  data ~=3D key ~ "=3D" ~ params[key];
              else
                  data ~=3D delim ~ key ~ "=3D" ~ params[key];
          }

 Also, is there a reason why the user must escape the parameters
 manually? As the user has no Curl handle, that could be annoying.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1537
        const(char)[] k =3D "Content-Type";
        res.header(k, contentType);
 -->  res.header("Content-Type", contentType); ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1658
 if ( _true_ ||!netAllowed) return;  ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1770
 Add "; charset=3Dutf-8" ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1789
 IIRC it's possible to add parameter names to delegate types in D
 to document their use, so
 void delegate(const(char)[],const(char)[]) callback)
 -->  void delegate(const(char)[] key,const(char)[] value) callback)

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1798
 if (header[0..5] =3D=3D "HTTP/") {
 -->
 if (header.length>=3D 5&&  header[0..5] =3D=3D "HTTP/") {

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1829
 if -->  of

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L1959
 add a toString() method for debugging purposes?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2221
 document which functions call execute() ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2283
 why private?
 (*v) ~=3D value; -->  (*v) ~=3D "," ~ value; ?

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2327
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2308
 Document that this still transfers all data into memory first.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2643
 Document that Result mixes in ProtocolRequestParamsSetters and
 RequestParamsSetters

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2652
 Add this bug and all other similar bugs to a ddoc BUG: section in the
 module comment, so this doesn't get lost.

 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2932
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d#L2943
 Bounds checking: At least a assert((recipient|sender).length>  0);
 should be added.

 (I've skipped most of the async stuff, maybe someone else should
 have a look at that.)

 ++++++++++++++++++++++   API docs Review   ++++++++++++++++++++++

 First, huge example:
 Split this into many smaller code blocks, add some high-level
 overview, here's some inspiration in pseudo-ddoc, feel free to use
 this:

 etc.curl provides wrappers for $(LINK curls) HTTP, FTP and SMTP
 functions.

 There are convenience functions which can be used to make simple
 requests with a single line of code:
 ----------------------------------------
 // Simple HTTP GET
 auto result =3D Http.get("http://www.google.com").toString();
 ----------------------------------------
 This makes a HTTP GET request to "http://www.google.com". These
 convenience functions are always _static_ functions of the HTTP /
 FTP / SMTP structs. They return a Result/AsyncResult struct which can
 be used to set options of the request and get the result as a string
 or ubyte[] array.
 ----------------------------------------
 // Simple HTTP GET with timeout
 auto request =3D Http.get("http://www.google.com");
 request.connectTimeout(dur!"seconds"(10));
 auto result =3D request.toString();
 ----------------------------------------
 As connectTimeout and similar methods return the Result/AsyncResult
 instance, it's possible to chain those calls into a single line:
 ----------------------------------------
 auto result =3D
 Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toSt=


 ----------------------------------------
 Note that the request to the remote webserver is done when you call
 toString() or similar methods that internally call execute (see
 $(LINK )). If you don't call toString() or a similar method, no
 request is made!

 etc.curl additionally provides asynchronous versions of these
 convenience functions: Http.getAsync() is the asynchronous version
 that will spawn a thread in the background and return a
 Http.AsyncResult immediately. You can read data from the result at
 later point in time. This allows you to start processing data before
 all data has been received by using byChunk() or byLine() on the
 Http.AsyncResult instance.
 ----------------------------------------
 // GET using an asynchronous range
 foreach (line; Http.getAsync("http://www.google.com").byLine())
 {
      // Do some expensive processing on the line while more lines are
      // received asynchronously in the background.
      writeln("asyncLine: ", line);
 }
 ----------------------------------------

 If you need more control than the convenience functions provide,
 you'll have to use the full API:
 ----------------------------------------
 // GET with custom data receivers
 Http http =3D Http("http://www.google.com");
 http.onReceiveHeader =3D
      (const(char)[] key, const(char)[] value) { writeln(key ~ ": " ~
 value); };
 http.onReceive =3D (ubyte[] data) { /+ drop +/ return
 data.length; };
 http.perform();
 ----------------------------------------
 First, we create an instance of the reference-counted Http struct. We
 then set custom delegates which are called whenever the Http instance
 receives a header or a data buffer. In this simple example, we simply
 print the headers and ignore the data. See $(LINK
 onReceiveHeader/onReceive) for more information.
 We finally start the HTTP request by calling perform(). Note that you
 cannot combine the convenience get/post/put/... functions with this
 extended API: Although you could call http.get("http://some.url")
 instead of perform, this would completely ignore the settings made to
 the Http instance! Remember that get/post/put/etc are _static_
 methods. They cannot access this instances data.

 If you want to make a different request than GET with the extended
 API, you have to set the method property:
 ----------------------------------------
 // PUT with data senders
 string msg =3D "Hello world";
 http.onSend =3D (void[] data)
 {
      if (msg.empty)
          return 0;

      auto m =3D cast(void[])msg;
      size_t length =3D m.length;

      //We assume msg fits into the data buffer, real code shouldn't
 do //that
      assert(data.length>  msg.length);
      data[0..length] =3D m[0..$];
      return length;
 };
 http.method =3D Http.Method.put;
 http.contentLength =3D 11; // defaults to chunked transfer if not
 specified
 http.perform();
 ----------------------------------------
 If you data doesn't fit into the buffer provided by onSend, onSend
 will be called multiple times. See $(LINK) for more information.

 The extended API has also functions to help you track the transfer's
 progress:
 ----------------------------------------
 // Track progress
 http.method =3D Http.Method.get;
 http.url =3D "http://upload.wikimedia.org/wikipedia/commons/"
             "5/53/Wikipedia-logo-en-big.png";
 http.onReceive =3D (ubyte[] data) { return data.length; }; //Discard
 data http.onProgress =3D (double dltotal, double dlnow,
                     double ultotal, double ulnow) {
      writeln("Progress ", dltotal, ", ", dlnow, ", ", ultotal, ", ",
 ulnow); return 0;
 };
 http.perform();
 ----------------------------------------

 As a last example, here's how to send an email with etc.curl:
 ----------------------------------------
 // Send an email with SMTPS
 Smtp smtp =3D Smtp("smtps://smtp.gmail.com");
 smtp.setAuthentication("from.addr gmail.com", "password");
 smtp.mailTo =3D ["<to.addr gmail.com>"];
 smtp.mailFrom =3D "<from.addr gmail.com>";
 smtp.message =3D "Example Message";
 smtp.perform();
 ----------------------------------------

 Some complete examples like these above could also be added to the
 Http/Ftp/Smtp struct documentation. You could use the examples I used
 in a reply to Walter, if you wanted to.

 I'd also say, at least in the examples, use braces on their own
 lines. And I'd insert a few empty lines in certain places, for
 example: ----------------------------------------
 http.onSend =3D (void[] data) {
      if (msg.empty) return 0;
      auto m =3D cast(void[])msg;
      size_t length =3D m.length;
      data[0..length] =3D m[0..$];
      msg.length =3D 0;
      return length;
 };
 ----------------------------------------

 doesn't look very good, I'd write it like this:
 ----------------------------------------
 http.onSend =3D (void[] data)
 {
      if (msg.empty)
          return 0;

      auto m =3D cast(void[])msg;
      size_t length =3D m.length;

      //We assume msg fits into the data buffer, real code shouldn't
 do //that
      assert(data.length>  msg.length);
      data[0..length] =3D m[0..$];
      return length;
 };
 ----------------------------------------

 ++++++++++++++++++++++   Feature request   ++++++++++++++++++++++

 Although David already quoted my request in the initial post (thx
 David), I'll repeat it here:

 Could you add proxy support to the
 wrapper? All of CURLOPT_PROXY CURLOPT_PROXYPORT and CURLOPT_PROXYTYPE
 should be available. (The 'combined' CURLOPT_PROXY since 7.21.7 is
 too new, that's not even in the current ubuntu)

Thank you. Will chew through it slowly. /Jonas

Thanks, I know it's a lot, but most things are small cosmetic changes or typos or similar. --=20 Johannes Pfau
Aug 25 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/17/11 6:12 PM, David Nadlinger wrote:
 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote – congratulations, Lars! –, and Jose Armando Garcia, the author of
 the proposed logging module, is currently not available, the etc.curl
 module by Jonas Drewsen is at the front of the review queue. I have
 volunteered to run the formal process for inclusion with Phobos.

My review of etc.curl follows. ================ Overall ================ The library is comprehensive but does not cater to frequent use cases. It could be metaphorically compared with a remote control with equally-sized and spaced buttons, as opposed to one with enlarged frequent use buttons. There are two prominent use cases: 1. Given a URL string gimme the content (binary or string). The URL could be even "file://". 2. Given a URL as in (1) gimme a method to go through its content one chunk or one line at a time. The method should be efficient (i.e. support asynchrony transparently such that client code and downloading code don't block each other). I see these cases together covering a vast majority of use cases; (1) would be for quick and dirty scripts, (2) would be for applications that do casual networking. The only apps that need advanced APIs would be those for which networking is a central feature. Neither of these use cases is covered adequately by the proposed API. Furthermore, the documentation is also inadequate. Description is scarce, cross-references are missing, and examples are poor and scarce. A major overhaul is needed for acceptance in Phobos. Finally, I found a couple of egregious bugs related to buffer overflow, both in the examples and the implementation. There should be significant scrutiny of such. Ideally we'd frame these issues as an API design problem that we may be able to design around. My vote is contingent upon fixing these large issues. Below are more details along with a variety of smaller comments. ================ Documentation ================ * Should be std.curl. * "Curl client functionality as provided by libcurl." -> "Networking client functionality as provided by libcurl." * Code inlined within text should use $(D ...). * The first example is meaningless: Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString(); It gets a string but throws it away. At best it's a test that google.com is up (it would throw otherwise). * "Http http = Http("http://www.google.com");" -> "auto http = Http("http://www.google.com");" No stuttering please. Also Http's constructor should imply "http://" if not present, so "auto http = Http("www.google.com");" * "foreach (line; Http.getAsync("http://www.google.com").byLine())" is rather verbose considering it may be the most frequent use pattern. I was hoping for e.g. "foreach (line; Http.byLine("google.com"))" * This comment is inaccurate: // Do some expensive processing on the line while more lines are // received asynchronously in the background. The idea of async in here is seldom to do expensive computation, but rather to offer incremental processing and output (and sometimes to stop early). * In the "put with data senders" it's unclear what the length of data is and what happens if that length is shorter than msg.length. Setting contentLength to 11 is awkward. Surely there must be an easier API for sending some string up the wire. * "Smtp smtp = Smtp("smtps://smtp.gmail.com");" -> "auto ..." * "This documentation should really be in the Http..." -> we need to find a solution to this prior to approval. Perhaps version(StdDdoc) may be of help. * Documentation for individual symbol lacks examples. For example, one has no idea what requestPause is about, and there's no cross-reference sending to the function that uses or returns it. Each symbol or symbol group should have examples. * Documentation is extremely scarce and uses terms without defining them. It's reasonable to assume what escape() and unescape() do in such a module, but for example cleanup() says "Stop and invalidate this instance." which gives no indication of the meaning of invalidation. Also, isStopped() checks whether the object is "stopped and invalidated" but effecting stopping and invalidation is not done by stop(), but by cleanup(). * verbose, dataTimeout etc. don't have corresponding properties for reading. * netInterface takes a string but not found ubytes (or an IP struct). this means someone who has the IP in another format must format it as a string first. * localPort() should take a ushort, not an int. * Phrases such as "tcp nodelay socket option" should link to relevant off-site documentation or a glossary entry. * onSend example is unindented, uses "l" as a symbol, and is liable to buffer overflow, which is, to be brutally honest, appalling. * The example for onReceive should use to!(const(char)[]) instead of the cast, such that the UTF validity is checked. * Since neither value in onProgress can be fractional, the type should be size_t or ulong. * Due to probably a bug in ddoc, the members of struct Http are not indented. * The informal examples given with addHeader and setHeader should also be supported by code examples. * Example for head() misses a semicolon. * Documentation for post() makes it easy to overlook the important distinction between the two overloads. The documentation should explain how one is meant for text upload and the other for binary upload. * optionsAsync has an extra paren. * The various HTTP primitives wrapped (del, connect, ...) should include links to their definition in the documentation. * The dox for the two postData() overloads should be merged. * Until the documentation for perform() comes about towards the end of the dox, it's unclear that a lot of work is done by it. So someone who wants to understand the library without having first learned curl would need to infer this fact from the scarce examples. * Again, many properties have setters but not getters (e.g. maxRedirects). * The word "simply" is overused. ================ Implementation ================ * We usually import modules in alphabetical order. * The frequent pattern if (!expr) throw new CurlException(msg); should be replaced by enforce(expr, new CurlException(msg)); * throwOnStopped should accept a defaulted string. There are several instances of the pattern if (stopped) throw new CurlException("..."); * I think we can in the future replace curl_easy_(un)escape with more efficient native functions. * This kind of stuff is unsafe: cast(char*)(username ~ ":" ~ password) because the result of the concatenation is not necessarily zero-terminated. Please make a pass through all cast instances making sure the code appends a \0. * Some lines are long enough to not be visible in git's code viewer. * More buffer overflow opportunities, e.g. in line 1798: "if (header[0..5] == "HTTP/")" should be "if (header.startsWith("HTTP/"))" Andrei
Aug 29 2011
next sibling parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 29-08-2011 20:55, Andrei Alexandrescu skrev:
 On 8/17/11 6:12 PM, David Nadlinger wrote:
 Now that Lars Kyllingstad's new and improved std.path has passed the
 vote – congratulations, Lars! –, and Jose Armando Garcia, the author of
 the proposed logging module, is currently not available, the etc.curl
 module by Jonas Drewsen is at the front of the review queue. I have
 volunteered to run the formal process for inclusion with Phobos.

My review of etc.curl follows. ================ Overall ================ The library is comprehensive but does not cater to frequent use cases. It could be metaphorically compared with a remote control with equally-sized and spaced buttons, as opposed to one with enlarged frequent use buttons.

Others have mentioned this as well. I'll try to improve.
 Neither of these use cases is covered adequately by the proposed API.

 Furthermore, the documentation is also inadequate. Description is
 scarce, cross-references are missing, and examples are poor and scarce.
 A major overhaul is needed for acceptance in Phobos.

I guess I'm assuming that the user knows something about curl. Will have a look at it.
 ================
 Documentation
 ================

 * Should be std.curl.

 * "Curl client functionality as provided by libcurl." -> "Networking
 client functionality as provided by libcurl."

 * Code inlined within text should use $(D ...).

 * The first example is meaningless:

 Http.get("http://www.google.com").connectTimeout(dur!"seconds"(10)).toString();

 * "Http http = Http("http://www.google.com");" -> "auto http =
 Http("http://www.google.com");" No stuttering please. Also Http's
 constructor should imply "http://" if not present, so "auto http =
 Http("www.google.com");"

 * "foreach (line; Http.getAsync("http://www.google.com").byLine())" is
 rather verbose considering it may be the most frequent use pattern. I
 was hoping for e.g. "foreach (line; Http.byLine("google.com"))"

I could easily create a convenience Http.byLine("google.com") method that simply wraps Http.get("google.com").byLine(). I think that keeping the Http.get("google.com") method is still important since it allows you to change the request settings before calling byLine(). For example: Http.get("google.com").connectTimeout(dur!"seconds"(10)).byLine();
 * This comment is inaccurate:

 // Do some expensive processing on the line while more lines are
 // received asynchronously in the background.

 The idea of async in here is seldom to do expensive computation, but
 rather to offer incremental processing and output (and sometimes to stop
 early).

 * In the "put with data senders" it's unclear what the length of data is
 and what happens if that length is shorter than msg.length. Setting
 contentLength to 11 is awkward. Surely there must be an easier API for
 sending some string up the wire.

curl want to be filled with data to send. The length of data is how large that buffer is of course. As specified it is only necessary to set the content length if you do not want to used chunked transfer. Most often you wouldn't care. Maybe I should just remove the contentLength from the example.
 * "Smtp smtp = Smtp("smtps://smtp.gmail.com");" -> "auto ..."

 * "This documentation should really be in the Http..." -> we need to
 find a solution to this prior to approval. Perhaps version(StdDdoc) may
 be of help.

 * Documentation for individual symbol lacks examples. For example, one
 has no idea what requestPause is about, and there's no cross-reference
 sending to the function that uses or returns it. Each symbol or symbol
 group should have examples.

 * Documentation is extremely scarce and uses terms without defining
 them. It's reasonable to assume what escape() and unescape() do in such
 a module, but for example cleanup() says "Stop and invalidate this
 instance." which gives no indication of the meaning of invalidation.
 Also, isStopped() checks whether the object is "stopped and invalidated"
 but effecting stopping and invalidation is not done by stop(), but by
 cleanup().

 * verbose, dataTimeout etc. don't have corresponding properties for
 reading.

Libcurl only has a function for setting options and not one for getting them. This means that I would have to keep a copy of the setting myself just to make it readable which i judged was not the way to go. This is why they are read only. This is one of the reasons why I see a wrapper for libcurl as just temporary solution to a native networking library i D.
 * netInterface takes a string but not found ubytes (or an IP struct).
 this means someone who has the IP in another format must format it as a
 string first.

the netInterface to accept ubyte or an IP struct as well.
 * localPort() should take a ushort, not an int.

 * Phrases such as "tcp nodelay socket option" should link to relevant
 off-site documentation or a glossary entry.

 * onSend example is unindented, uses "l" as a symbol, and is liable to
 buffer overflow, which is, to be brutally honest, appalling.

 * The example for onReceive should use to!(const(char)[]) instead of the
 cast, such that the UTF validity is checked.

 * Since neither value in onProgress can be fractional, the type should
 be size_t or ulong.

provides. But I will fix it.
 * Due to probably a bug in ddoc, the members of struct Http are not
 indented.

 * The informal examples given with addHeader and setHeader should also
 be supported by code examples.

 * Example for head() misses a semicolon.

 * Documentation for post() makes it easy to overlook the important
 distinction between the two overloads. The documentation should explain
 how one is meant for text upload and the other for binary upload.

 * optionsAsync has an extra paren.

 * The various HTTP primitives wrapped (del, connect, ...) should include
 links to their definition in the documentation.

 * The dox for the two postData() overloads should be merged.

 * Until the documentation for perform() comes about towards the end of
 the dox, it's unclear that a lot of work is done by it. So someone who
 wants to understand the library without having first learned curl would
 need to infer this fact from the scarce examples.

 * Again, many properties have setters but not getters (e.g. maxRedirects).

 * The word "simply" is overused.

 ================
 Implementation
 ================

 * We usually import modules in alphabetical order.

 * The frequent pattern if (!expr) throw new CurlException(msg); should
 be replaced by enforce(expr, new CurlException(msg));

 * throwOnStopped should accept a defaulted string. There are several
 instances of the pattern if (stopped) throw new CurlException("...");

 * I think we can in the future replace curl_easy_(un)escape with more
 efficient native functions.

of the source.
 * This kind of stuff is unsafe: cast(char*)(username ~ ":" ~ password)
 because the result of the concatenation is not necessarily
 zero-terminated. Please make a pass through all cast instances making
 sure the code appends a \0.

 * Some lines are long enough to not be visible in git's code viewer.

from other review comment. One if these is limiting lines to max 80-120 char. When this first review ends on wednesday I'll commit that and begin handling the rest of the review comments.
 * More buffer overflow opportunities, e.g. in line 1798: "if
 (header[0..5] == "HTTP/")" should be "if (header.startsWith("HTTP/"))"

 Andrei

Thank you for you comments.
Aug 29 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/29/11 3:46 PM, jdrewsen wrote:
[sni]

Thanks for taking my comments in stride.

 * verbose, dataTimeout etc. don't have corresponding properties for
 reading.

Libcurl only has a function for setting options and not one for getting them. This means that I would have to keep a copy of the setting myself just to make it readable which i judged was not the way to go. This is why they are read only. This is one of the reasons why I see a wrapper for libcurl as just temporary solution to a native networking library i D.

I looked around and it seems getopt is a moderately commonly asked feature for libcurl. However, since it hasn't been implemented for years, you probably made the right call to not provide shadow members. Regarding a native library vs. libcurl, honest I think your implementation is here to stay, so it's great that you've spent (and still are spending) time to make it compelling. Libcurl is a powerful, mature library that leaves little incentive for a native implementation written from scratch. Also, the author is cooperative and willing to help distribution matters. Regarding the post with callback examples, it would be great to show how to transmit some data one chunk at a time. The change is minimal and there's no more potential overflow. Looking forward to the revised version! Thanks, Andrei
Aug 29 2011
parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 29-08-2011 23:08, Andrei Alexandrescu skrev:
 On 8/29/11 3:46 PM, jdrewsen wrote:

 I looked around and it seems getopt is a moderately commonly asked
 feature for libcurl. However, since it hasn't been implemented for
 years, you probably made the right call to not provide shadow members.

I'll keep it that way then.
 Regarding a native library vs. libcurl, honest I think your
 implementation is here to stay, so it's great that you've spent (and
 still are spending) time to make it compelling. Libcurl is a powerful,
 mature library that leaves little incentive for a native implementation
 written from scratch. Also, the author is cooperative and willing to
 help distribution matters.

The curl wrappers usefullness will not go away any time soon. I do think that phobos would benefit from a pre/reactor based design with native http/ftp etc. support though. A step in that direction could render the curl wrapper less attractive in the future.
 Regarding the post with callback examples, it would be great to show how
 to transmit some data one chunk at a time. The change is minimal and
 there's no more potential overflow.

Walter suggested that I should write an article about using the wrapper. I've now taken the first steps on writing such an article. I will have to get the library API rock stable before I can finish it though.
 Looking forward to the revised version!

/Jonas
Aug 30 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 12:22 PM, jdrewsen wrote:
 Walter suggested that I should write an article about using the wrapper.
 I've now taken the first steps on writing such an article. I will have
 to get the library API rock stable before I can finish it though.

I have a suggestion for you - write and test an asynchronous copy program. It is a continuous source of surprise to me that even seasoned programmers don't realize that this is an inefficient copy routine: while (read(source, buffer)) write(target, buffer); If the methods are synchronous and the speeds of source and target are independent, the net transfer rate of the routine is R1*R1/(R1+R2), where R1 and R2 are the transfer rates of the source and destination respectively. In the worst case R1=R2 and the net transfer rate is half that. This is an equation very easy to derive from first principles but many people are very incredulous about it. Consequently, many classic file copying programs (including cp; I don't know about wget or curl) use the inefficient method. As the variety of data sources increases (SSD, magnetic, networked etc) I predict async I/O will become increasingly prevalent. In an async approach with a queue, transfer proceeds at the optimal speed min(R1, R2). That's why I'm insisting the async range should be super easy to use, encapsulated, and robust: if people reach for the async range by default for their dealings with networked data, they'll write optimal code, sometimes even without knowing it. If your article discusses this and shows e.g. how to copy data optimally from one server to another using HTTP, or from one server to a file etc, and if furthermore you show how your API makes all that a trivial five-liner, that would be a very instructive piece. Andrei
Aug 30 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s
 It is a continuous source of surprise to me that even seasoned
 programmers don't realize that this is an inefficient copy routine:
 while (read(source, buffer))
    write(target, buffer);
 If the methods are synchronous and the speeds of source and target are
 independent, the net transfer rate of the routine is R1*R1/(R1+R2),
 where R1 and R2 are the transfer rates of the source and destination
 respectively. In the worst case R1=R2 and the net transfer rate is half
 that.
 This is an equation very easy to derive from first principles but many
 people are very incredulous about it. Consequently, many classic file
 copying programs (including cp; I don't know about wget or curl) use the
 inefficient method. As the variety of data sources increases (SSD,
 magnetic, networked etc) I predict async I/O will become increasingly
 prevalent. In an async approach with a queue, transfer proceeds at the
 optimal speed min(R1, R2). That's why I'm insisting the async range
 should be super easy to use, encapsulated, and robust: if people reach
 for the async range by default for their dealings with networked data,
 they'll write optimal code, sometimes even without knowing it.
 If your article discusses this and shows e.g. how to copy data optimally
 from one server to another using HTTP, or from one server to a file etc,
 and if furthermore you show how your API makes all that a trivial
 five-liner, that would be a very instructive piece.
 Andrei

A similar situation applies when reading data and performing CPU-intensive processing on it line-by-line, chunk-by-chunk, etc. If the processing takes as long as the reading, you'll take a 50% speed hit for no good reason. This was the motivation behind std.parallelism.asyncBuf. Of course, asyncBuf is unsafe. I/O-related Phobos modules should provide safe, encapsulated solutions rather than requiring the user to use std.concurrency, std.parallelism or core.thread manually. Any of these represents poor encapsulation, the latter two are inherently unsafe and std.concurrency is often not flexible enough to implement efficient async I/O without breaking its safety guarantees with casts. Speaking of which, I should probably get busy with the std.stdio.File.byLineAsync/byChunkAsync pull request I've been meaning to make. What's good for HTTP is probably good for file I/O as well, and all the low-level concurrency code to make this happen is already in std.parallelism. In general, I think you hit the nail on the head: Safe, encapsulated, easy-to-use async I/O should be a standard feature for all I/O-related Phobos modules.
Aug 30 2011
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Jonathan M Davis (jmdavisProg gmx.com)'s article
 std.file.copy is synchronous. Would the suggestion then be to change it to be
 asynchronous or to create a second function (e.g. copyAsync) which does an
 asynchrous copy?
 - Jonathan M Davis

Probably just make it asynchronous because I can't figure out why anyone would want a synchronous file copy and we don't want to make the API gargantuan and confusing. At any rate, fixing std.file.copy isn't a high priority because I don't imagine it's a performance bottleneck nearly as often as most other Phobos I/O.
Aug 30 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change it to be
 asynchronous or to create a second function (e.g. copyAsync) which does an
 asynchrous copy?

I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei
Aug 30 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change it to be
 asynchronous or to create a second function (e.g. copyAsync) which does an
 asynchrous copy?

file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei

What about std.file.byLine and byChunk? IMHO these should be augmented, not replaced, by byLineAsync and byChunkAsync. There's some extra cost associated with doing this asynchronously (you need multiple buffers and you need to fire up a thread unless you have the user provide a TaskPool) and probably some tuning options that should be exposed. Furthermore, byLine and byChunk are inherently lower level than copy and therefore control is more important and simplicity less important.
Aug 30 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 2:35 PM, dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change it to be
 asynchronous or to create a second function (e.g. copyAsync) which does an
 asynchrous copy?

file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei

What about std.file.byLine and byChunk? IMHO these should be augmented, not replaced, by byLineAsync and byChunkAsync. There's some extra cost associated with doing this asynchronously (you need multiple buffers and you need to fire up a thread unless you have the user provide a TaskPool) and probably some tuning options that should be exposed. Furthermore, byLine and byChunk are inherently lower level than copy and therefore control is more important and simplicity less important.

I agree. Andrei
Aug 30 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 3:34 PM, Marco Leise wrote:
 Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be
 asynchronous or to create a second function (e.g. copyAsync) which
 does an
 asynchrous copy?

I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei

I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved.

Why would more seeking be involved? Andrei
Aug 30 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/31/11 11:39 AM, Marco Leise wrote:
 Am 31.08.2011, 00:07 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 3:34 PM, Marco Leise wrote:
 Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be
 asynchronous or to create a second function (e.g. copyAsync) which
 does an
 asynchrous copy?

I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei

I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved.

Why would more seeking be involved? Andrei

Usually files are laid out on the disk in more or less contiguous chunks. The naive algorithm would read a large block first and then write it. The disk head only has to move twice for every of these blocks. Once it has to seek the source sector and once the destination sector. If on the other hand you have both operations in parallel then - unless the OS does some heavy heuristic optimization - you have the disk head move constantly as the operating system interleaves the long series of reads and writes in order to serve data to both threads.

If I understand the above correctly, the same amount of seeking goes around in both cases. The only difference is that requests come at a faster pace, as they should. Andrei
Aug 31 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 08/31/2011 02:15 PM, Marco Leise wrote:
[snip]
 If I understand the above correctly, the same amount of seeking goes
 around in both cases. The only difference is that requests come at a
 faster pace, as they should.

 Andrei

That's not what I wanted to say. Let me put it like this: If you read the file in one call to read and the write the whole thing from your buffer you have only 2 of these 'long' seeks.

Yes. It is fair to assume that the asynchronous case uses buffers of the same length as the synchronous case. Then the grand total number of seeks will be the same or better for the asynchronous version (better if the OS handles concurrent seek requests cleverly with some elevator algorithm). It's really simple. The synchronous version is the worst case because it needs a seek for read and a seek for write in each epoch. (One epoch = one buffer is read and written.) The asynchronous version still needs to do seeks for reading and writing, but in the same amount, and issues them at a faster rate.
 Practically you wont use a
 2 GB buffer for a 2 GB file though, but I assume that it would be the
 fastest copy mode from and to the same HDD, whereas the multi-threaded
 approach would make the OS switch between writing some data for the
 writer thread and reading some data for the reader thread, probably
 several times per second. And each time a seek is involved.

That is the case whether or not the requests are coming from the same thread or not.
 (Due to IO scheduler optimizations it wont end up like this though. The
 OS will detect your read pattern (linear) and read a sane amount of data
 ahead and disk access will generally be optimized to reduce seek times.)

This will happen all the more if you have multiple threads. You clearly have a good expertise on OS, I/O, and related matters, and I am honoured by your presence and by the value you're adding to this group. However, in this particular argument you're only firing blanks. Are you sure you have a case? Andrei
Aug 31 2011
next sibling parent =?UTF-8?B?IkrDqXLDtG1lIE0uIEJlcmdlciI=?= <jeberger free.fr> writes:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Andrei Alexandrescu wrote:
 On 08/31/2011 02:15 PM, Marco Leise wrote:
 [snip]
 If I understand the above correctly, the same amount of seeking goes
 around in both cases. The only difference is that requests come at a
 faster pace, as they should.

 Andrei

That's not what I wanted to say. Let me put it like this: If you read the file in one call to read and the write the whole thing from your buffer you have only 2 of these 'long' seeks.

Yes. It is fair to assume that the asynchronous case uses buffers of th=

 same length as the synchronous case. Then the grand total number of
 seeks will be the same or better for the asynchronous version (better i=

 the OS handles concurrent seek requests cleverly with some elevator
 algorithm).
=20

same in both cases, which means that the asynchronous buffers will have size 1/n times the synchronous buffer size (where n is the number of threads). Which means that the asynchronous version will issue n times as many seeks. Now you are right that those seeks will be issued in a more efficient manner (especially since they can probably be sent to the disk before the ongoing transfer is completed). Whether the efficiency gain will offset the increased number of seeks, I do not know but I doubt it... Jerome --=20 mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Aug 31 2011
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Johannes Pfau (spam example.com)'s article
 Andrei Alexandrescu wrote:
This will happen all the more if you have multiple threads.

You clearly have a good expertise on OS, I/O, and related matters, and
I am honoured by your presence and by the value you're adding to this
group. However, in this particular argument you're only firing blanks.
Are you sure you have a case?


Andrei

Here's a first, naive attempt: https://gist.github.com/1184671 There's not much to do for the sync implementation, but the async one should maybe keep a pool of buffers and reuse those. However, the async implementation was already ~10% faster in simple tests (copying a 700mb ubuntu .iso; source and target on the same harddisk). I wouldn't have expected this, but it seems the async copy is actually be faster.

Wow, that is naive. You're creating a new buffer for every chunk. I'm guessing that GC overhead eats you alive. The next Phobos release will have a new overload of std.parallelism.asyncBuf that will make it easy to recycle buffers in these cases. If DMD Bug 6582 gets fixed or I find a workaround, I'll also make a better-encapsulated std.stdio.File.ByChunkAsync on top of this. This would make asynchronous file copying without an absurd number of allocations trivial. It would be something like: void copy(in char[] source, in char[] dest) { // Pray that all the d'tor bugs in DMD are fixed so these files // get closed automatically. auto destHandle = File(dest, "wb"); foreach(chunk; File(source).byChunkAsync(chunkSize)) { destHandle.rawWrite(chunk); } }
Aug 31 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/31/11 4:05 PM, dsimcha wrote:
 void copy(in char[] source, in char[] dest) {
      // Pray that all the d'tor bugs in DMD are fixed so these files
      // get closed automatically.

      auto destHandle = File(dest, "wb");
      foreach(chunk; File(source).byChunkAsync(chunkSize)) {
          destHandle.rawWrite(chunk);
      }
 }

That's the spirit! We *must* get this rock-solid and working to a tee. Andrei
Aug 31 2011
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/31/11 4:24 PM, Sean Kelly wrote:
 On Aug 31, 2011, at 1:53 PM, Johannes Pfau wrote:

 Andrei Alexandrescu wrote:
 This will happen all the more if you have multiple threads.

 You clearly have a good expertise on OS, I/O, and related matters, and
 I am honoured by your presence and by the value you're adding to this
 group. However, in this particular argument you're only firing blanks.
 Are you sure you have a case?

So why don't we benchmark this? Here's a first, naive attempt: https://gist.github.com/1184671 There's not much to do for the sync implementation, but the async one should maybe keep a pool of buffers and reuse those. However, the async implementation was already ~10% faster in simple tests (copying a 700mb ubuntu .iso; source and target on the same harddisk). I wouldn't have expected this, but it seems the async copy is actually be faster.

Interesting. I ran similar tests a while back using socket IO where one side of the operation was on a machine with ipfw limiting bandwidth to 1.5Mbps and couldn't produce a meaningful difference between the synchronous and asynchronous copy schemes. I've included a snippet of my results below. copy_std is synchronous copy and copy_msg is asynchronous using message-passing. I believe the code is modeled on samples from TDPL. Tests were run using a 1.6MB file, which should take roughly 8 seconds to transfer over the wire at 1.5Mbs. I ran the tests using "time copy_xxx" and the times below are the "real" time reported. Numbers are ballpark averages from 3 tests.

Ah, I remember. IMHO such a short file is subject to a lot of surreptitious caching. The test should involve much larger files. Andrei
Aug 31 2011
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 08/31/2011 04:44 PM, Marco Leise wrote:
 Am 31.08.2011, 21:52 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 08/31/2011 02:15 PM, Marco Leise wrote:
 [snip]
 If I understand the above correctly, the same amount of seeking goes
 around in both cases. The only difference is that requests come at a
 faster pace, as they should.

 Andrei

That's not what I wanted to say. Let me put it like this: If you read the file in one call to read and the write the whole thing from your buffer you have only 2 of these 'long' seeks.

Yes. It is fair to assume that the asynchronous case uses buffers of the same length as the synchronous case. Then the grand total number of seeks will be the same or better for the asynchronous version (better if the OS handles concurrent seek requests cleverly with some elevator algorithm). It's really simple. The synchronous version is the worst case because it needs a seek for read and a seek for write in each epoch. (One epoch = one buffer is read and written.) The asynchronous version still needs to do seeks for reading and writing, but in the same amount, and issues them at a faster rate.
 Practically you wont use a
 2 GB buffer for a 2 GB file though, but I assume that it would be the
 fastest copy mode from and to the same HDD, whereas the multi-threaded
 approach would make the OS switch between writing some data for the
 writer thread and reading some data for the reader thread, probably
 several times per second. And each time a seek is involved.

That is the case whether or not the requests are coming from the same thread or not.
 (Due to IO scheduler optimizations it wont end up like this though. The
 OS will detect your read pattern (linear) and read a sane amount of data
 ahead and disk access will generally be optimized to reduce seek times.)

This will happen all the more if you have multiple threads. You clearly have a good expertise on OS, I/O, and related matters, and I am honoured by your presence and by the value you're adding to this group. However, in this particular argument you're only firing blanks. Are you sure you have a case? Andrei

I guess that happens when you use Gentoo. I'm not an OS expert, I find it interesting what ideas get implemented into an OS, chip or language to make it faster, easier or safer. So these things tend to stick with me. I think you are an expert in matters of parallel execution. So I'm interested in a cp program written by you, that employs these asynchronous reads/writes in two threads. In your first post you claimed your algorithm will be faster because the OS performs synchronous flushes, now you say it will be faster because the OS does delay the flush and reorders commands. If you stayed at your point it would be easier for me to refute you. ^^

The goal is not to refute one another as much as figuring out what the best method to copy stuff around is.
 Your idea that the OS will optimize your multi-threaded calls in a way
 it could not do in a single-threaded application sounds interesting. It
 would turn the copy operation into the single-threaded version, but this
 time inside the kernel where small reads and writes are merged to bigger
 ones, hopefully of optimal size. So it is like saying "multi-threading
 results in the better single-threading" and we both made a point. This
 is all theory though until we have a test program. I think it is more
 likely that you 1. add thread-switch overhead and 2. these thread
 switches cut your running read/write into smaller pieces, adding more
 seeks. In any case, a single-threaded program with large enough buffers
 to mask the seeking delay (not Gigabytes) is faster than a
 multi-threaded version. I just can't say for sure anymore if there is
 some minimal buffer size at which the multi-threaded approach is faster,
 because of the OS merging things into larger buffers behind the scenes.

The phenomena involved are rather complex. Probably the only way to move this forward is a comprehensive benchmark involving various I/O scenarios (combinations of HTTP, FTP, NAS mount, magnetic disk, SSD, USB stick). Andrei
Sep 01 2011
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, August 30, 2011 10:57 dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s
 
 It is a continuous source of surprise to me that even seasoned
 programmers don't realize that this is an inefficient copy routine:
 while (read(source, buffer))
 
 write(target, buffer);
 
 If the methods are synchronous and the speeds of source and target are
 independent, the net transfer rate of the routine is R1*R1/(R1+R2),
 where R1 and R2 are the transfer rates of the source and destination
 respectively. In the worst case R1=R2 and the net transfer rate is half
 that.
 This is an equation very easy to derive from first principles but many
 people are very incredulous about it. Consequently, many classic file
 copying programs (including cp; I don't know about wget or curl) use the
 inefficient method. As the variety of data sources increases (SSD,
 magnetic, networked etc) I predict async I/O will become increasingly
 prevalent. In an async approach with a queue, transfer proceeds at the
 optimal speed min(R1, R2). That's why I'm insisting the async range
 should be super easy to use, encapsulated, and robust: if people reach
 for the async range by default for their dealings with networked data,
 they'll write optimal code, sometimes even without knowing it.
 If your article discusses this and shows e.g. how to copy data optimally
 from one server to another using HTTP, or from one server to a file etc,
 and if furthermore you show how your API makes all that a trivial
 five-liner, that would be a very instructive piece.
 Andrei

A similar situation applies when reading data and performing CPU-intensive processing on it line-by-line, chunk-by-chunk, etc. If the processing takes as long as the reading, you'll take a 50% speed hit for no good reason. This was the motivation behind std.parallelism.asyncBuf. Of course, asyncBuf is unsafe. I/O-related Phobos modules should provide safe, encapsulated solutions rather than requiring the user to use std.concurrency, std.parallelism or core.thread manually. Any of these represents poor encapsulation, the latter two are inherently unsafe and std.concurrency is often not flexible enough to implement efficient async I/O without breaking its safety guarantees with casts. Speaking of which, I should probably get busy with the std.stdio.File.byLineAsync/byChunkAsync pull request I've been meaning to make. What's good for HTTP is probably good for file I/O as well, and all the low-level concurrency code to make this happen is already in std.parallelism. In general, I think you hit the nail on the head: Safe, encapsulated, easy-to-use async I/O should be a standard feature for all I/O-related Phobos modules.

std.file.copy is synchronous. Would the suggestion then be to change it to be asynchronous or to create a second function (e.g. copyAsync) which does an asynchrous copy? - Jonathan M Davis
Aug 30 2011
prev sibling next sibling parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 30-08-2011 19:38, Andrei Alexandrescu skrev:
 On 8/30/11 12:22 PM, jdrewsen wrote:
 Walter suggested that I should write an article about using the wrapper.
 I've now taken the first steps on writing such an article. I will have
 to get the library API rock stable before I can finish it though.

I have a suggestion for you - write and test an asynchronous copy program. It is a continuous source of surprise to me that even seasoned programmers don't realize that this is an inefficient copy routine: while (read(source, buffer)) write(target, buffer); If the methods are synchronous and the speeds of source and target are independent, the net transfer rate of the routine is R1*R1/(R1+R2), where R1 and R2 are the transfer rates of the source and destination respectively. In the worst case R1=R2 and the net transfer rate is half that.

I guess that e.g. incoming network buffers in the OS often makes the shown copy routine faster than you would think i most cases. These buffers stores the incoming network data asynchronously by the OS and makes the next read() instantanous. The same can be said about writing to disk. Calling sync() is the real enemy here. This is only true as long as the buffers are not full of course.
 If your article discusses this and shows e.g. how to copy data optimally
 from one server to another using HTTP, or from one server to a file etc,
 and if furthermore you show how your API makes all that a trivial
 five-liner, that would be a very instructive piece.

That could be an interesting example. /Jonas
Aug 30 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 1:21 PM, jdrewsen wrote:
 Den 30-08-2011 19:38, Andrei Alexandrescu skrev:
 On 8/30/11 12:22 PM, jdrewsen wrote:
 Walter suggested that I should write an article about using the wrapper.
 I've now taken the first steps on writing such an article. I will have
 to get the library API rock stable before I can finish it though.

I have a suggestion for you - write and test an asynchronous copy program. It is a continuous source of surprise to me that even seasoned programmers don't realize that this is an inefficient copy routine: while (read(source, buffer)) write(target, buffer); If the methods are synchronous and the speeds of source and target are independent, the net transfer rate of the routine is R1*R1/(R1+R2), where R1 and R2 are the transfer rates of the source and destination respectively. In the worst case R1=R2 and the net transfer rate is half that.

I guess that e.g. incoming network buffers in the OS often makes the shown copy routine faster than you would think i most cases. These buffers stores the incoming network data asynchronously by the OS and makes the next read() instantanous. The same can be said about writing to disk. Calling sync() is the real enemy here. This is only true as long as the buffers are not full of course.

Unless the OS issues speculative reads (which I don't think it does for either files or sockets), any time spent in write() is a net loss for reading speed. Now, if write is buffered and the buffers are flushed asynchronously, calls to write() would be instantaneous. I'm not sure to what extent the major OSs do that, and for what types of files. Andrei
Aug 30 2011
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/30/2011 11:59 AM, Andrei Alexandrescu wrote:
 Unless the OS issues speculative reads (which I don't think it does for either
 files or sockets), any time spent in write() is a net loss for reading speed.
 Now, if write is buffered and the buffers are flushed asynchronously, calls to
 write() would be instantaneous. I'm not sure to what extent the major OSs do
 that, and for what types of files.

The way to test this is to read file by 4096 byte chunks forwards, and compare that with the speed of reading it backwards. The OS will try to cache the file in memory between runs, so to defeat this means the file has to be on a removable drive (I use an SD card) and pull the drive out and reinsert it between each test.
Aug 30 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 4:11 PM, Marco Leise wrote:
 Am 30.08.2011, 21:56 Uhr, schrieb Walter Bright
 <newshound2 digitalmars.com>:

 On 8/30/2011 11:59 AM, Andrei Alexandrescu wrote:
 Unless the OS issues speculative reads (which I don't think it does
 for either
 files or sockets), any time spent in write() is a net loss for
 reading speed.


Now *that* was speculative, if you know what I mean. Take a look at this article on on-demand readahead: http://lwn.net/Articles/235164/ I suspect the situation on other operating-systems is similar. The buffer for Linux pipes is currently maxed at 64 KB, so there is some room for multiprocessing as well. Sockets are different, but as someone else already pointed out, the OS has to handle the incoming traffic somehow, and it does so by filling a receive buffer. The socket implementation in current operating systems allows you to set the buffer size to large values. The network protocol practically enforces asynchronous operations in the background.
 Now, if write is buffered and the buffers are flushed asynchronously,
 calls to
 write() would be instantaneous. I'm not sure to what extent the major
 OSs do
 that, and for what types of files.


Both OSs and disk drives have write caches.

That's not the question. The question is whether the caches are flushed asynchronously and under what regime. What I can say as an end user of fwrite() et al is that such calls do take time - it's not near-instantaneous. Andrei
Aug 30 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org>:

 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change it  
 to be
 asynchronous or to create a second function (e.g. copyAsync) which does  
 an
 asynchrous copy?

I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei

I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved. If that is true, then only an identification of the source and destination media through operating system APIs can yield optimal performance.
Aug 30 2011
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/30/11 6:17 PM, Sean Kelly wrote:
 On Aug 30, 2011, at 4:14 PM, Sean Kelly wrote:
 HTTP really isn't the ideal protocol for this sort of thing.  By default, the
entire file would be in the message body, which isn't really asynchronous. 
Chunking is an alternative, but there's no way to do this from the client-side,
it's purely a server thing.

I stand corrected--it looks like libcurl lets you enable chunking. I will say, however, that I've never actually see a chunked upload from a web browser, regardless of size. It's kind of interesting that libcurl will do this.

Far as I can tell Google Chrome does it. That allows it to report percent complete for uploads in the status bar. Andrei
Aug 30 2011
prev sibling next sibling parent Jonas Drewsen <jdrewsen nospam.com> writes:
On 31/08/11 01.17, Sean Kelly wrote:
 On Aug 30, 2011, at 4:14 PM, Sean Kelly wrote:
 HTTP really isn't the ideal protocol for this sort of thing.  By default, the
entire file would be in the message body, which isn't really asynchronous. 
Chunking is an alternative, but there's no way to do this from the client-side,
it's purely a server thing.

I stand corrected--it looks like libcurl lets you enable chunking. I will say, however, that I've never actually see a chunked upload from a web browser, regardless of size. It's kind of interesting that libcurl will do this.

Chunked transfer is only really necessary when you do not know the size in advance. I doesn't mean that you cannot use asynchronous IO when doing unchunked transfers. When you know the file size you can just set the file size in the http header and send the file asynchronously piece by piece. /Jonas
Aug 31 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 31.08.2011, 00:07 Uhr, schrieb Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org>:

 On 8/30/11 3:34 PM, Marco Leise wrote:
 Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be
 asynchronous or to create a second function (e.g. copyAsync) which
 does an
 asynchrous copy?

I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei

I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved.

Why would more seeking be involved? Andrei

Usually files are laid out on the disk in more or less contiguous chunks. The naive algorithm would read a large block first and then write it. The disk head only has to move twice for every of these blocks. Once it has to seek the source sector and once the destination sector. If on the other hand you have both operations in parallel then - unless the OS does some heavy heuristic optimization - you have the disk head move constantly as the operating system interleaves the long series of reads and writes in order to serve data to both threads. I don't know the internals of any kernel enough to tell you exactly how this is done, but it makes sense that the disk can operate faster if it doesn't waste time moving the head around. And that is what would happen in a multi-threaded copy algorithm on a single HDD. If you want to learn more about IO schedulers, take a look here: http://wlug.org.nz/LinuxIoScheduler and here http://www.cs.ccu.edu.tw/~lhr89/linux-kernel/Linux%20IO%20Schedulers.pdf If you want to test your algorithm without an IO scheduler optimizing your reads and writes on Linux you can to the following. First use cat /sys/block/<your disk here (hda|sda|other...)>/queue/scheduler to find out which schedulers are supported and which is active. For me it prints: noop [cfq] Then, to select 'noop' the "no operation"-scheduler type: echo noop > /sys/block/<your disk here>/queue/scheduler You can use the first command again to verify that the scheduler is set. - Marco
Aug 31 2011
prev sibling next sibling parent Andrew Wiley <wiley.andrew.j gmail.com> writes:
--00151774187a4bd7fe04abd1eaf6
Content-Type: text/plain; charset=ISO-8859-1

On Wed, Aug 31, 2011 at 11:39 AM, Marco Leise <Marco.Leise gmx.de> wrote:

 Am 31.08.2011, 00:07 Uhr, schrieb Andrei Alexandrescu <
 SeeWebsiteForEmail erdani.org**>:


  On 8/30/11 3:34 PM, Marco Leise wrote:
 Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org**>:

  On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be
 asynchronous or to create a second function (e.g. copyAsync) which
 does an
 asynchrous copy?

I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei

I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved.

Why would more seeking be involved? Andrei

Usually files are laid out on the disk in more or less contiguous chunks. The naive algorithm would read a large block first and then write it. The disk head only has to move twice for every of these blocks. Once it has to seek the source sector and once the destination sector. If on the other hand you have both operations in parallel then - unless the OS does some heavy heuristic optimization - you have the disk head move constantly as the operating system interleaves the long series of reads and writes in order to serve data to both threads. I don't know the internals of any kernel enough to tell you exactly how this is done, but it makes sense that the disk can operate faster if it doesn't waste time moving the head around. And that is what would happen in a multi-threaded copy algorithm on a single HDD. If you want to learn more about IO schedulers, take a look here: http://wlug.org.nz/**LinuxIoScheduler<http://wlug.org.nz/LinuxIoScheduler> and here http://www.cs.ccu.edu.tw/~**lhr89/linux-kernel/Linux%20IO%** 20Schedulers.pdf<http://www.cs.ccu.edu.tw/~lhr89/linux-kernel/Linux%20IO%20Schedulers.pdf> If you want to test your algorithm without an IO scheduler optimizing your reads and writes on Linux you can to the following. First use cat /sys/block/<your disk here (hda|sda|other...)>/queue/**scheduler to find out which schedulers are supported and which is active. For me it prints: noop [cfq] Then, to select 'noop' the "no operation"-scheduler type: echo noop > /sys/block/<your disk here>/queue/scheduler You can use the first command again to verify that the scheduler is set.

write requests (which SCSI was designed to allow), which should make this never be an issue. My understanding is that most SCSI implementations (and it's in just about every storage protocol out there) allow this, with the exception of Bulk Only Transport, which is used with USB hard drives and flash drives. That will soon be replaced by USB Attached SCSI ( http://en.wikipedia.org/wiki/USB_Attached_SCSI), which specifically allows out of order execution. The end idea behind much of SCSI is that the disk designers are much better at knowing what's fast and what's not than the OS developers, and if we need to worry about it on the application level, something has seriously gone wrong. --00151774187a4bd7fe04abd1eaf6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable <div class=3D"gmail_quote">On Wed, Aug 31, 2011 at 11:39 AM, Marco Leise <s= pan dir=3D"ltr">&lt;<a href=3D"mailto:Marco.Leise gmx.de">Marco.Leise gmx.d= e</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote" style=3D"margi= n:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> Am 31.08.2011, 00:07 Uhr, schrieb Andrei Alexandrescu &lt;<a href=3D"mailto= :SeeWebsiteForEmail erdani.org" target=3D"_blank">SeeWebsiteForEmail erdani= .org</a><u></u>&gt;:<div><div></div><div class=3D"h5"><br> <br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> On 8/30/11 3:34 PM, Marco Leise wrote:<br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu<br> &lt;<a href=3D"mailto:SeeWebsiteForEmail erdani.org" target=3D"_blank">SeeW= ebsiteForEmail erdani.org</a><u></u>&gt;:<br> <br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> On 8/30/11 1:10 PM, Jonathan M Davis wrote:<br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> std.file.copy is synchronous. Would the suggestion then be to change<br> it to be<br> asynchronous or to create a second function (e.g. copyAsync) which<br> does an<br> asynchrous copy?<br> </blockquote> <br> I think std.file.copy should do whatever the heck is best to copy a<br> file. As such, it should be transparently asynchronous. It would be<br> great if such a task caught your fancy - and don&#39;t forget to test<br> speed under a few circumstances.<br> <br> Andrei<br> </blockquote> <br> I expect the performance to degrade if you copy asynchronously on a<br> single HDD, since more seeking is involved.<br> </blockquote> <br> Why would more seeking be involved?<br> <br> Andrei<br> </blockquote> <br></div></div> Usually files are laid out on the disk in more or less contiguous chunks. T= he naive algorithm would read a large block first and then write it. The di= sk head only has to move twice for every of these blocks. Once it has to se= ek the source sector and once the destination sector. If on the other hand = you have both operations in parallel then - unless the OS does some heavy h= euristic optimization - you have the disk head move constantly as the opera= ting system interleaves the long series of reads and writes in order to ser= ve data to both threads. I don&#39;t know the internals of any kernel enoug= h to tell you exactly how this is done, but it makes sense that the disk ca= n operate faster if it doesn&#39;t waste time moving the head around. And t= hat is what would happen in a multi-threaded copy algorithm on a single HDD= .<br> <br> If you want to learn more about IO schedulers, take a look here:<br> <a href=3D"http://wlug.org.nz/LinuxIoScheduler" target=3D"_blank">http://wl= ug.org.nz/<u></u>LinuxIoScheduler</a><br> and here<br> <a href=3D"http://www.cs.ccu.edu.tw/~lhr89/linux-kernel/Linux%20IO%20Schedu= lers.pdf" target=3D"_blank">http://www.cs.ccu.edu.tw/~<u></u>lhr89/linux-ke= rnel/Linux%20IO%<u></u>20Schedulers.pdf</a><br> <br> If you want to test your algorithm without an IO scheduler optimizing your = reads and writes on Linux you can to the following. First use<br> <br> =A0 =A0cat /sys/block/&lt;your disk here (hda|sda|other...)&gt;/queue/<u><= /u>scheduler<br> <br> to find out which schedulers are supported and which is active. For me it p= rints:<br> <br> =A0 =A0noop [cfq]<br> <br> Then, to select &#39;noop&#39; the &quot;no operation&quot;-scheduler type:= <br> <br> =A0 =A0echo noop &gt; /sys/block/&lt;your disk here&gt;/queue/scheduler<br=

You can use the first command again to verify that the scheduler is set.<br=
<font color=3D"#888888">

e to do out-of-order execution of read and write requests (which SCSI was d= esigned to allow), which should make this never be an issue. My understandi= ng is that most SCSI implementations (and it&#39;s in just about every stor= age protocol out there) allow this, with the exception of Bulk Only Transpo= rt, which is used with USB hard drives and flash drives. That will soon be = replaced by USB Attached SCSI (<a href=3D"http://en.wikipedia.org/wiki/USB_= Attached_SCSI">http://en.wikipedia.org/wiki/USB_Attached_SCSI</a>), which s= pecifically allows out of order execution.</div> <div>The end idea behind much of SCSI is that the disk designers are much b= etter at knowing what&#39;s fast and what&#39;s not than the OS developers,= and if we need to worry about it on the application level, something has s= eriously gone wrong.=A0</div> </div><br> --00151774187a4bd7fe04abd1eaf6--
Aug 31 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 31.08.2011, 20:08 Uhr, schrieb Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org>:

 On 8/31/11 11:39 AM, Marco Leise wrote:
 Am 31.08.2011, 00:07 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 3:34 PM, Marco Leise wrote:
 Am 30.08.2011, 20:48 Uhr, schrieb Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org>:

 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be
 asynchronous or to create a second function (e.g. copyAsync) which
 does an
 asynchrous copy?

I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei

I expect the performance to degrade if you copy asynchronously on a single HDD, since more seeking is involved.

Why would more seeking be involved? Andrei

Usually files are laid out on the disk in more or less contiguous chunks. The naive algorithm would read a large block first and then write it. The disk head only has to move twice for every of these blocks. Once it has to seek the source sector and once the destination sector. If on the other hand you have both operations in parallel then - unless the OS does some heavy heuristic optimization - you have the disk head move constantly as the operating system interleaves the long series of reads and writes in order to serve data to both threads.

If I understand the above correctly, the same amount of seeking goes around in both cases. The only difference is that requests come at a faster pace, as they should. Andrei

That's not what I wanted to say. Let me put it like this: If you read the file in one call to read and the write the whole thing from your buffer you have only 2 of these 'long' seeks. Practically you wont use a 2 GB buffer for a 2 GB file though, but I assume that it would be the fastest copy mode from and to the same HDD, whereas the multi-threaded approach would make the OS switch between writing some data for the writer thread and reading some data for the reader thread, probably several times per second. And each time a seek is involved. (Due to IO scheduler optimizations it wont end up like this though. The OS will detect your read pattern (linear) and read a sane amount of data ahead and disk access will generally be optimized to reduce seek times.)
Aug 31 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 31.08.2011, 21:12 Uhr, schrieb Andrew Wiley <wiley.andrew.j gmail.com>:

 Yes, but the disk should be able to do out-of-order execution of read and
 write requests (which SCSI was designed to allow), which should make this
 never be an issue. My understanding is that most SCSI implementations  
 (and
 it's in just about every storage protocol out there) allow this, with the
 exception of Bulk Only Transport, which is used with USB hard drives and
 flash drives. That will soon be replaced by USB Attached SCSI (
 http://en.wikipedia.org/wiki/USB_Attached_SCSI), which specifically  
 allows
 out of order execution.
 The end idea behind much of SCSI is that the disk designers are much  
 better
 at knowing what's fast and what's not than the OS developers, and if we  
 need
 to worry about it on the application level, something has seriously gone
 wrong.

SCSI has internal COPY commands. Under optimal conditions the data is never copied to main memory. That's what would be used ideally. But here we have our application that wants to read a chunk from the file, say 64 KB and the OS detects a linear file read pattern quickly and reads a larger block of say 512 KB. The chunks are written piece-wise and if all went well the kernel will merge the writes so they become a 512 KB chunk on their own. These are sent to the disk that has an internal cache of 8 MB and is in the trouble of deciding at this low level if it should write 8 MB at once or allow for some read access in-between which makes applications more responsive, but increases the seek times. Depending on the scenario either case is preferable. In the file copy scenario it would be better to write the whole cache and then switch back to reading. If on the other hand you were saving a video and click on the 'start' menu that has to load a lot of small icons, then the long write operation should have low priority over the reads. The disk cannot make an informed decision. I think the basic conflict is that we have desktop and server environments that are heavily multi-threaded and HDDs that are inherently single-threaded. They work best if you perform one continuous operation, because every context-switch requires seeking. If the issue was so easy to solve at the low level there wouldn't be three official full blown IO schedulers in the Linux kernel now, IMO. I don't worry about it on the application level, because I know the kernel and disk will handle most use cases well without real worst-case scenarios. But if using a HDD in a multi-threaded way actually yielded a higher throughput than the naive read-first-write-later approach I would be seriously surprised. If I was to bet I'd say it is at least 5% slower (with enabled buffers and IO scheduler) and a real disaster without them. :D - Marco
Aug 31 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 31, 2011, at 1:53 PM, Johannes Pfau wrote:

 Andrei Alexandrescu wrote:
=20
 This will happen all the more if you have multiple threads.
=20
 You clearly have a good expertise on OS, I/O, and related matters, =


 I am honoured by your presence and by the value you're adding to this=20=


 group. However, in this particular argument you're only firing =


 Are you sure you have a case?

So why don't we benchmark this? Here's a first, naive attempt: https://gist.github.com/1184671 =20 There's not much to do for the sync implementation, but the async one should maybe keep a pool of buffers and reuse those. However, the =

 implementation was already ~10% faster in simple tests (copying a =

 ubuntu .iso; source and target on the same harddisk). I wouldn't have
 expected this, but it seems the async copy is actually be faster.

Interesting. I ran similar tests a while back using socket IO where one = side of the operation was on a machine with ipfw limiting bandwidth to = 1.5Mbps and couldn't produce a meaningful difference between the = synchronous and asynchronous copy schemes. I've included a snippet of = my results below. copy_std is synchronous copy and copy_msg is = asynchronous using message-passing. I believe the code is modeled on = samples from TDPL. Tests were run using a 1.6MB file, which should take roughly 8 seconds = to transfer over the wire at 1.5Mbs. I ran the tests using "time = copy_xxx" and the times below are the "real" time reported. Numbers are = ballpark averages from 3 tests. local/local remote/local local/remote copy_std 0.018s 8.677s 0.085s copy_msg 0.022s 8.710s 0.088s The results weren't what I expected, so I tried again with a 4.8MB file = for comparison: local/local remote/local local/remote copy_std 0.035s 26.522s 0.210s copy_msg 0.040s 26.382s 0.232s=
Aug 31 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 31.08.2011, 21:52 Uhr, schrieb Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org>:

 On 08/31/2011 02:15 PM, Marco Leise wrote:
 [snip]
 If I understand the above correctly, the same amount of seeking goes
 around in both cases. The only difference is that requests come at a
 faster pace, as they should.

 Andrei

That's not what I wanted to say. Let me put it like this: If you read the file in one call to read and the write the whole thing from your buffer you have only 2 of these 'long' seeks.

Yes. It is fair to assume that the asynchronous case uses buffers of the same length as the synchronous case. Then the grand total number of seeks will be the same or better for the asynchronous version (better if the OS handles concurrent seek requests cleverly with some elevator algorithm). It's really simple. The synchronous version is the worst case because it needs a seek for read and a seek for write in each epoch. (One epoch = one buffer is read and written.) The asynchronous version still needs to do seeks for reading and writing, but in the same amount, and issues them at a faster rate.
 Practically you wont use a
 2 GB buffer for a 2 GB file though, but I assume that it would be the
 fastest copy mode from and to the same HDD, whereas the multi-threaded
 approach would make the OS switch between writing some data for the
 writer thread and reading some data for the reader thread, probably
 several times per second. And each time a seek is involved.

That is the case whether or not the requests are coming from the same thread or not.
 (Due to IO scheduler optimizations it wont end up like this though. The
 OS will detect your read pattern (linear) and read a sane amount of data
 ahead and disk access will generally be optimized to reduce seek times.)

This will happen all the more if you have multiple threads. You clearly have a good expertise on OS, I/O, and related matters, and I am honoured by your presence and by the value you're adding to this group. However, in this particular argument you're only firing blanks. Are you sure you have a case? Andrei

I guess that happens when you use Gentoo. I'm not an OS expert, I find it interesting what ideas get implemented into an OS, chip or language to make it faster, easier or safer. So these things tend to stick with me. I think you are an expert in matters of parallel execution. So I'm interested in a cp program written by you, that employs these asynchronous reads/writes in two threads. In your first post you claimed your algorithm will be faster because the OS performs synchronous flushes, now you say it will be faster because the OS does delay the flush and reorders commands. If you stayed at your point it would be easier for me to refute you. ^^ Your idea that the OS will optimize your multi-threaded calls in a way it could not do in a single-threaded application sounds interesting. It would turn the copy operation into the single-threaded version, but this time inside the kernel where small reads and writes are merged to bigger ones, hopefully of optimal size. So it is like saying "multi-threading results in the better single-threading" and we both made a point. This is all theory though until we have a test program. I think it is more likely that you 1. add thread-switch overhead and 2. these thread switches cut your running read/write into smaller pieces, adding more seeks. In any case, a single-threaded program with large enough buffers to mask the seeking delay (not Gigabytes) is faster than a multi-threaded version. I just can't say for sure anymore if there is some minimal buffer size at which the multi-threaded approach is faster, because of the OS merging things into larger buffers behind the scenes. - Marco
Aug 31 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 31.08.2011, 22:53 Uhr, schrieb Johannes Pfau <spam example.com>:

 Andrei Alexandrescu wrote:
 This will happen all the more if you have multiple threads.

 You clearly have a good expertise on OS, I/O, and related matters, and
 I am honoured by your presence and by the value you're adding to this
 group. However, in this particular argument you're only firing blanks.
 Are you sure you have a case?


 Andrei

So why don't we benchmark this? Here's a first, naive attempt: https://gist.github.com/1184671 There's not much to do for the sync implementation, but the async one should maybe keep a pool of buffers and reuse those. However, the async implementation was already ~10% faster in simple tests (copying a 700mb ubuntu .iso; source and target on the same harddisk). I wouldn't have expected this, but it seems the async copy is actually be faster.

I cannot verify your numbers at all. I drop the caches before every run and make two runs with --async first and --sync second and two runs the other way round. On a 1GB file the async version adds an average of 0.22%. With a larger buffer size that margin increases even more. Did you actually clear the disk cache before runs and no other application used the disk much? Browsers tend to save the session in intervals. At the bottom of your heart you know that your numbers must be wrong. :p Verify it once more!
Aug 31 2011
prev sibling next sibling parent Josh Simmons <simmons.44 gmail.com> writes:
On Thu, Sep 1, 2011 at 10:00 AM, Marco Leise <Marco.Leise gmx.de> wrote:
 I cannot verify your numbers at all. I drop the caches before every run and
 make two runs with --async first and --sync second and two runs the other
 way round. On a 1GB file the async version adds an average of 0.22%. With a
 larger buffer size that margin increases even more. Did you actually clear
 the disk cache before runs and no other application used the disk much?
 Browsers tend to save the session in intervals. At the bottom of your heart
 you know that your numbers must be wrong. :p Verify it once more!

Not sure why we're evaluating disk access performance patterns when dealing with network throughput which is only copying the data from userspace to a kernel buffer provided you're using a proper async socket access pattern. Implemented properly there is absolutely no need to complicate the whole show with threading (unless you're on windows where there's only IOCP). At any rate curl provides an async api which one would assume is relatively sane so once again I'm not sure how this speculation is appropriate. I'll also note that I doubt very much that this cost would be significant in any way when compared to http overhead. As well as this, as an API issue I don't think a shortcut method for avoiding the HTTP verb is a good idea. Being explicit as to what you're doing is very much appropriate and hiding the verb because you don't have any understanding of HTTP is not a fantastic decision.
Aug 31 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
The trick is sending from a fast local source to a slow remote destination w=
ithout blocking when the write buffer fills anyway, IMO. The rest of this is=
 mostly an issue of academic interest.=20

Sent from my iPhone

On Aug 31, 2011, at 6:32 PM, Josh Simmons <simmons.44 gmail.com> wrote:

 On Thu, Sep 1, 2011 at 10:00 AM, Marco Leise <Marco.Leise gmx.de> wrote:
 I cannot verify your numbers at all. I drop the caches before every run a=


 make two runs with --async first and --sync second and two runs the other=


 way round. On a 1GB file the async version adds an average of 0.22%. With=


 larger buffer size that margin increases even more. Did you actually clea=


 the disk cache before runs and no other application used the disk much?
 Browsers tend to save the session in intervals. At the bottom of your hea=


 you know that your numbers must be wrong. :p Verify it once more!
=20

Not sure why we're evaluating disk access performance patterns when dealing with network throughput which is only copying the data from userspace to a kernel buffer provided you're using a proper async socket access pattern. Implemented properly there is absolutely no need to complicate the whole show with threading (unless you're on windows where there's only IOCP). At any rate curl provides an async api which one would assume is relatively sane so once again I'm not sure how this speculation is appropriate. =20 I'll also note that I doubt very much that this cost would be significant in any way when compared to http overhead. =20 As well as this, as an API issue I don't think a shortcut method for avoiding the HTTP verb is a good idea. Being explicit as to what you're doing is very much appropriate and hiding the verb because you don't have any understanding of HTTP is not a fantastic decision.

Sep 01 2011
prev sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 01.09.2011, 16:56 Uhr, schrieb Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org>:

 The phenomena involved are rather complex. Probably the only way to move  
 this forward is a comprehensive benchmark involving various I/O  
 scenarios (combinations of HTTP, FTP, NAS mount, magnetic disk, SSD, USB  
 stick).

 Andrei

I split this into a new thread: Fast file copy (threaded or not?)
Sep 01 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 30.08.2011, 21:56 Uhr, schrieb Walter Bright  
<newshound2 digitalmars.com>:

 On 8/30/2011 11:59 AM, Andrei Alexandrescu wrote:
 Unless the OS issues speculative reads (which I don't think it does for  
 either
 files or sockets), any time spent in write() is a net loss for reading  
 speed.


Now *that* was speculative, if you know what I mean. Take a look at this article on on-demand readahead: http://lwn.net/Articles/235164/ I suspect the situation on other operating-systems is similar. The buffer for Linux pipes is currently maxed at 64 KB, so there is some room for multiprocessing as well. Sockets are different, but as someone else already pointed out, the OS has to handle the incoming traffic somehow, and it does so by filling a receive buffer. The socket implementation in current operating systems allows you to set the buffer size to large values. The network protocol practically enforces asynchronous operations in the background.
 Now, if write is buffered and the buffers are flushed asynchronously,  
 calls to
 write() would be instantaneous. I'm not sure to what extent the major  
 OSs do
 that, and for what types of files.


Both OSs and disk drives have write caches. For removable media the OS cache is often disabled to reduce data loss. The on-disk cache is usually several MB in size so small files can be transferred at full bus speed. On SATA systems the disk can then decide to reorder read and write commands for optimal performance (http://www.sata-io.org/technology/ncq.asp). The filesystem driver in the kernel may want to stop the drive from messing around with the command order, because it wants to perform some atomic operation so it is also still possible to wait for the drive flush its write buffer. This is a trade-off between performance and security.
 The way to test this is to read file by 4096 byte chunks forwards, and  
 compare that with the speed of reading it backwards.

 The OS will try to cache the file in memory between runs, so to defeat  
 this means the file has to be on a removable drive (I use an SD card)  
 and pull the drive out and reinsert it between each test.

On Linux you can clear the caches with "echo 3 > /proc/sys/vm/drop_caches" (kernel 2.6.16) For Windows I could only find this side-effect of opening a file in unbuffered mode on WinXP: http://stackoverflow.com/questions/478340/clear-file-cache-to-repeat-performance-testing/7113153#7113153 - Marco
Aug 30 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 30, 2011, at 11:59 AM, Andrei Alexandrescu wrote:

 On 8/30/11 1:21 PM, jdrewsen wrote:
=20
=20
 I guess that e.g. incoming network buffers in the OS often makes the
 shown copy routine faster than you would think i most cases. These
 buffers stores the incoming network data asynchronously by the OS and
 makes the next read() instantanous. The same can be said about =


 to disk. Calling sync() is the real enemy here. This is only true as
 long as the buffers are not full of course.

Unless the OS issues speculative reads (which I don't think it does =

for reading speed. Now, if write is buffered and the buffers are flushed = asynchronously, calls to write() would be instantaneous. I'm not sure to = what extent the major OSs do that, and for what types of files. They do this for writing to sockets, so I'd guess it applies to most = types of files. However, the write buffer is finite in size, and = writing more than this amount will cause write to block. The really = great part is that I'm not aware of a way to determine how much space is = available in the write buffer=85 only that a nonzero amount is = available. Moving data efficiently in either direction between = locations with different transfer rates is a pretty tricky problem, and = I don't know that the facilities exist to do this as well as one would = like.=
Aug 30 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 30, 2011, at 10:38 AM, Andrei Alexandrescu wrote:
=20
 This is an equation very easy to derive from first principles but many =

copying programs (including cp; I don't know about wget or curl) use the = inefficient method. As the variety of data sources increases (SSD, = magnetic, networked etc) I predict async I/O will become increasingly = prevalent. In an async approach with a queue, transfer proceeds at the = optimal speed min(R1, R2). That's why I'm insisting the async range = should be super easy to use, encapsulated, and robust: if people reach = for the async range by default for their dealings with networked data, = they'll write optimal code, sometimes even without knowing it.
=20
 If your article discusses this and shows e.g. how to copy data =

file etc, and if furthermore you show how your API makes all that a = trivial five-liner, that would be a very instructive piece. HTTP really isn't the ideal protocol for this sort of thing. By = default, the entire file would be in the message body, which isn't = really asynchronous. Chunking is an alternative, but there's no way to = do this from the client-side, it's purely a server thing. In short, = HTTP kind of stinks for the client uploading large amounts of data to = the server. On a more general note though, I absolutely agree about = AIO.=
Aug 30 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 30, 2011, at 4:14 PM, Sean Kelly wrote:
=20
 HTTP really isn't the ideal protocol for this sort of thing.  By =

really asynchronous. Chunking is an alternative, but there's no way to = do this from the client-side, it's purely a server thing. I stand corrected--it looks like libcurl lets you enable chunking. I = will say, however, that I've never actually see a chunked upload from a = web browser, regardless of size. It's kind of interesting that libcurl = will do this.=
Aug 30 2011
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
You're right of course. I even do this pretty regularly. Dunno what I was th=
inking.=20

Sent from my iPhone

On Aug 31, 2011, at 12:48 AM, Jonas Drewsen <jdrewsen nospam.com> wrote:

 On 31/08/11 01.17, Sean Kelly wrote:
 On Aug 30, 2011, at 4:14 PM, Sean Kelly wrote:
=20
 HTTP really isn't the ideal protocol for this sort of thing.  By default=



ous. Chunking is an alternative, but there's no way to do this from the cli= ent-side, it's purely a server thing.
=20
 I stand corrected--it looks like libcurl lets you enable chunking.  I wil=


wser, regardless of size. It's kind of interesting that libcurl will do thi= s.
=20
 Chunked transfer is only really necessary when you do not know the size in=

unked transfers. When you know the file size you can just set the file size i= n the http header and send the file asynchronously piece by piece.
=20
 /Jonas

Aug 31 2011
prev sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
------------ukxhTxijXpvYyUYkNStW71
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
Content-Transfer-Encoding: Quoted-Printable

Am 31.08.2011, 00:11 Uhr, schrieb Andrei Alexandrescu  =

<SeeWebsiteForEmail erdani.org>:

 On 8/30/11 4:11 PM, Marco Leise wrote:
 Am 30.08.2011, 21:56 Uhr, schrieb Walter Bright
 <newshound2 digitalmars.com>:

 On 8/30/2011 11:59 AM, Andrei Alexandrescu wrote:
 Now, if write is buffered and the buffers are flushed asynchronousl=




 calls to
 write() would be instantaneous. I'm not sure to what extent the maj=




 OSs do
 that, and for what types of files.


Both OSs and disk drives have write caches.

That's not the question. The question is whether the caches are flushe=

 asynchronously and under what regime. What I can say as an end user of=

 fwrite() et al is that such calls do take time - it's not  =

 near-instantaneous.

 Andrei

Ok, at least you added an anonymous 'regime' that may influence the buff= er = flushes ;) To be honest, I think we can short-circuit the discussion. Your = multi-threaded solution will perfectly hide away the delays between = consecutive reads() and writes() in a file copy operation. Period. In = other words, the slowest operation will be the mark. What I want to show you is that your assumption, that an operating syste= m = doesn't flush asynchronously is wrong under the regime that you use the = = default settings when writing data to a local disk with the system specs= I = mention below. It does not mean that write() has to be instantaneous. We= = don't know what write() actually has to care for internally. I could = imagine amongst other things: query and lock enough free disk space from= = the fs driver, check auxiliary modules (disk quota, encryption, = compression), write a duplicate of the data to the disk/file cache and = remove other data if it is full. As a test case I wrote a small benchmark to get some numbers. It compare= s = the default write() operation with two ways to get the synchronous = behavior you predicted. The test runs the following steps: - create an empty "test.file" - flush pending I/O buffers to disk using the sync() system call, in ord= er = to ensure equal conditions. - open the file write-only - start timing - write 1 MB of repeated 0..255 patterns - close the file - stop timing The two synchronous test cases work like this: A) open the file to be written in O_SYNC mode, making the write operatio= n = itself synchronous B) use fsync() after the write operation My test system is: CPU: Core 2 Duo 2.0 Ghz disk: WDC WD2500BEVS-75UST0 (SATA, UDMA-6, 5400 RPM, 8 MB cache) filesystem: ext4 kernel: 2.6.39 The results show that caches make the write ~30 times faster: default: 3_184 =CE=BCs O_SYNC : 104_572 =CE=BCs fsync(): 104_296 =CE=BCs To check back I remounted my root partition with the 'sync' option ("mou= nt = -o remount,rw,sync /"), effectively bypassing caches and ran the test = again: default: 109_361 =CE=BCs O_SYNC : 111_370 =CE=BCs fsync(): 102_295 =CE=BCs Actually "mount -o remount,rw,sync /" works without reboot and gives a = nice impression of how the OS would feel without caches. Please try it i= f = you have a Linux. The benchmark code is in the attachment. - Marco ------------ukxhTxijXpvYyUYkNStW71 Content-Disposition: attachment; filename=cache_bench.d Content-Type: application/octet-stream; name=cache_bench.d Content-Transfer-Encoding: Base64 bW9kdWxlIGNhY2hlX2JlbmNoOwoKaW1wb3J0IHN0ZC5zdGRpbzsKaW1wb3J0IHN0 ZC5hbGdvcml0aG07CmltcG9ydCBzdGQucmFuZ2U7CmltcG9ydCBjb3JlLnN5cy5w b3NpeC5mY250bDsKaW1wb3J0IGNvcmUuc3lzLnBvc2l4LnVuaXN0ZDsKaW1wb3J0 IGNvcmUudGltZTsKCmVudW0gZmlsZU5hbWUgPSAidGVzdC5maWxlIjsKCnZvaWQg cmVjcmVhdGVGaWxlKCkgewoJLy8gKHJlKWNyZWF0ZSBhbiBlbXB0eSBmaWxlCgly ZW1vdmUoZmlsZU5hbWUpOwoJYXV0byBmID0gY3JlYXQoZmlsZU5hbWUsIFNfSVJV U1IgfCBTX0lXVVNSIHwgU19JUkdSUCB8IFNfSVdHUlApOwoJY2xvc2UoZik7Cglz eW5jKCk7CglzbGVlcCgzKTsKfQoKdm9pZCBtYWluKCkgIAp7CglieXRlW10gZGF0 YTsKCWRhdGEubGVuZ3RoID0gMTAyNCAqIDEwMjQ7Cgljb3B5KG1hcCEiY2FzdChi eXRlKWEiKGlvdGEoMCwgZGF0YS5sZW5ndGgsIDEpKSwgZGF0YSk7CgoJcmVjcmVh dGVGaWxlKCk7CgoJYXV0byBmID0gb3BlbihmaWxlTmFtZSwgT19XUk9OTFkpOwoJ c3Rkb3V0LndyaXRlKCJUZXN0aW5nIGRlZmF1bHQgYmVoYXZpb3VyLi4uICIpOwoJ c3Rkb3V0LmZsdXNoOwoJYXV0byBzdGFydCA9IFRpY2tEdXJhdGlvbi5jdXJyU3lz dGVtVGljaygpOwoJY29yZS5zeXMucG9zaXgudW5pc3RkLndyaXRlKGYsIGRhdGEu cHRyLCBkYXRhLmxlbmd0aCk7CgljbG9zZShmKTsKCXN0ZG91dC53cml0ZWxuKChU aWNrRHVyYXRpb24uY3VyclN5c3RlbVRpY2soKSAtIHN0YXJ0KS51c2VjcywgIiDO vHMiKTsKCglyZWNyZWF0ZUZpbGUoKTsKCQoJZiA9IG9wZW4oZmlsZU5hbWUsIE9f V1JPTkxZIHwgT19TWU5DKTsKCXN0ZG91dC53cml0ZSgiVGVzdGluZyBPX1NZTkMg YmVoYXZpb3VyLi4uICIpOwoJc3Rkb3V0LmZsdXNoOwoJc3RhcnQgPSBUaWNrRHVy YXRpb24uY3VyclN5c3RlbVRpY2soKTsKCWNvcmUuc3lzLnBvc2l4LnVuaXN0ZC53 cml0ZShmLCBkYXRhLnB0ciwgZGF0YS5sZW5ndGgpOwoJY2xvc2UoZik7CglzdGRv dXQud3JpdGVsbigoVGlja0R1cmF0aW9uLmN1cnJTeXN0ZW1UaWNrKCkgLSBzdGFy dCkudXNlY3MsICIgzrxzIik7CgkKCXJlY3JlYXRlRmlsZSgpOwoKCWYgPSBvcGVu KGZpbGVOYW1lLCBPX1dST05MWSk7CglzdGRvdXQud3JpdGUoIlRlc3RpbmcgZnN5 bmMoKSBiZWhhdmlvdXIuLi4gIik7CglzdGRvdXQuZmx1c2g7CglzdGFydCA9IFRp Y2tEdXJhdGlvbi5jdXJyU3lzdGVtVGljaygpOwoJY29yZS5zeXMucG9zaXgudW5p c3RkLndyaXRlKGYsIGRhdGEucHRyLCBkYXRhLmxlbmd0aCk7Cglmc3luYyhmKTsK CWNsb3NlKGYpOwoJc3Rkb3V0LndyaXRlbG4oKFRpY2tEdXJhdGlvbi5jdXJyU3lz dGVtVGljaygpIC0gc3RhcnQpLnVzZWNzLCAiIM68cyIpOwp9Cg== ------------ukxhTxijXpvYyUYkNStW71--
Aug 31 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, August 30, 2011 13:48:10 Andrei Alexandrescu wrote:
 On 8/30/11 1:10 PM, Jonathan M Davis wrote:
 std.file.copy is synchronous. Would the suggestion then be to change it
 to be asynchronous or to create a second function (e.g. copyAsync)
 which does an asynchrous copy?

I think std.file.copy should do whatever the heck is best to copy a file. As such, it should be transparently asynchronous. It would be great if such a task caught your fancy - and don't forget to test speed under a few circumstances.

I'm sure that I'll get around to it eventually if no one else does, but I don't know how quickly. And if std.file.copy is being messed with, http://d.puremagic.com/issues/show_bug.cgi?id=3862 should probably be sorted out too. So, there's definitely work that needs to be done on std.file.copy. - Jonathan M Davis
Aug 30 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Andrei Alexandrescu wrote:
On 8/30/11 1:21 PM, jdrewsen wrote:
 Den 30-08-2011 19:38, Andrei Alexandrescu skrev:
 On 8/30/11 12:22 PM, jdrewsen wrote:
 Walter suggested that I should write an article about using the
 wrapper. I've now taken the first steps on writing such an
 article. I will have to get the library API rock stable before I
 can finish it though.

I have a suggestion for you - write and test an asynchronous copy program. It is a continuous source of surprise to me that even seasoned programmers don't realize that this is an inefficient copy routine: while (read(source, buffer)) write(target, buffer); If the methods are synchronous and the speeds of source and target are independent, the net transfer rate of the routine is R1*R1/(R1+R2), where R1 and R2 are the transfer rates of the source and destination respectively. In the worst case R1=R2 and the net transfer rate is half that.

I guess that e.g. incoming network buffers in the OS often makes the shown copy routine faster than you would think i most cases. These buffers stores the incoming network data asynchronously by the OS and makes the next read() instantanous. The same can be said about writing to disk. Calling sync() is the real enemy here. This is only true as long as the buffers are not full of course.

Unless the OS issues speculative reads (which I don't think it does for either files or sockets), any time spent in write() is a net loss for reading speed. Now, if write is buffered and the buffers are flushed asynchronously, calls to write() would be instantaneous. I'm not sure to what extent the major OSs do that, and for what types of files. Andrei

Write buffering / asynchronous buffer flushing happens at least in Linux and Windows. This is the main reason for the windows "safe-eject" function for usb-sticks. And linux/gnome does a similar thing when ejecting a usb drive; it even shows a "writing data to disk" progressbar. It's also the reason why there's a O_SYNC flag for the posix "open" method. Here's a snipped that proves that sockets also do asynchronous flushing: https://gist.github.com/1182041 Note that you can even exit the client before the server ever makes it's first receive call, and it'll still get the data. To really test if the data is buffered at client or server-side, you'd need two machines and cut the connection at the right moment. -- Johannes Pfau
Aug 30 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Andrei Alexandrescu wrote:
This will happen all the more if you have multiple threads.

You clearly have a good expertise on OS, I/O, and related matters, and
I am honoured by your presence and by the value you're adding to this 
group. However, in this particular argument you're only firing blanks. 
Are you sure you have a case?


Andrei

So why don't we benchmark this? Here's a first, naive attempt: https://gist.github.com/1184671 There's not much to do for the sync implementation, but the async one should maybe keep a pool of buffers and reuse those. However, the async implementation was already ~10% faster in simple tests (copying a 700mb ubuntu .iso; source and target on the same harddisk). I wouldn't have expected this, but it seems the async copy is actually be faster. -- Johannes Pfau
Aug 31 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Marco Leise wrote:
Am 31.08.2011, 22:53 Uhr, schrieb Johannes Pfau <spam example.com>:

 Andrei Alexandrescu wrote:
 This will happen all the more if you have multiple threads.

 You clearly have a good expertise on OS, I/O, and related matters,
 and I am honoured by your presence and by the value you're adding
 to this group. However, in this particular argument you're only
 firing blanks. Are you sure you have a case?


 Andrei

So why don't we benchmark this? Here's a first, naive attempt: https://gist.github.com/1184671 There's not much to do for the sync implementation, but the async one should maybe keep a pool of buffers and reuse those. However, the async implementation was already ~10% faster in simple tests (copying a 700mb ubuntu .iso; source and target on the same harddisk). I wouldn't have expected this, but it seems the async copy is actually be faster.

I cannot verify your numbers at all. I drop the caches before every run and make two runs with --async first and --sync second and two runs the other way round. On a 1GB file the async version adds an average of 0.22%. With a larger buffer size that margin increases even more. Did you actually clear the disk cache before runs and no other application used the disk much? Browsers tend to save the session in intervals. At the bottom of your heart you know that your numbers must be wrong. :p Verify it once more!

Yeah I only did 2-3 quick tests on my desktop machine. I tested that on a file-server machine now (I know for sure that no other application accesses the disk there) and dropped the caches as you said. And you're right, there's no big difference anymore. Well, I guess I'll just wait till phobos has an async copy function and let someone else do the benchmarks ;-) -- Johannes Pfau
Sep 01 2011
prev sibling parent Johannes Pfau <spam example.com> writes:
Josh Simmons wrote:
Not sure why we're evaluating disk access performance patterns when
dealing with network throughput which is only copying the data from
userspace to a kernel buffer provided you're using a proper async
socket access pattern.

The discussion drifted away from the original topic [as it's often the case in this newsgroup ;-)] =46rom a previous post:
 On 8/30/11 1:10 PM, Jonathan M Davis wrote: =20
 std.file.copy is synchronous. Would the suggestion then be to change
 it to be
 asynchronous or to create a second function (e.g. copyAsync) which
 does an
 asynchrous copy? =20

I think std.file.copy should do whatever the heck is best to copy a =20 file. As such, it should be transparently asynchronous. It would be =20 great if such a task caught your fancy - and don't forget to test speed under a few circumstances. Andrei =20

Implemented properly there is absolutely no
need to complicate the whole show with threading (unless you're on
windows where there's only IOCP). At any rate curl provides an async
api which one would assume is relatively sane so once again I'm not
sure how this speculation is appropriate.

I guess that an "async" copy as discussed here won't bring any performance gain. But the async* methods in the curl-wrapper still have an important use case: It's the only way to treat curl http streams as a range: Curl's callback based 'push' api doesn't work well with 'pull' based ranges. This could also be done with curl's multi interface, but that has it's own issues: http://curl.haxx.se/mail/lib-2000-12/0013.html http://curl.haxx.se/mail/lib-2007-06/0239.html http://curl.haxx.se/mail/lib-2005-10/0044.html
I'll also note that I doubt very much that this cost would be
significant in any way when compared to http overhead.

As well as this, as an API issue I don't think a shortcut method for
avoiding the HTTP verb is a good idea. Being explicit as to what
you're doing is very much appropriate and hiding the verb because you
don't have any understanding of HTTP is not a fantastic decision.

--=20 Johannes Pfau
Sep 01 2011