www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Second Round CURL Wrapper Review

reply dsimcha <dsimcha yahoo.com> writes:
I volunteered ages ago to manage the review for the second round of 
Jonas Drewsen's CURL wrapper.  After the first round it was decided 
that, after a large number of minor issues were fixed, a second round 
would be necessary.

Significant open issues:

1.  Should libcurl be bundled with DMD on Windows?

2.  etc.curl, std.curl, or std.net.curl?  (We had a vote a while back 
but it was buried deep in a thread and a lot of people may have missed 
it:  http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

Code: https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
Docs: http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

For those of you on Windows, a libcurl binary built by DMC is available 
at http://gool.googlecode.com/files/libcurl_7.21.7.zip.

Review starts now and ends on December 16, followed by one week of voting.
Dec 02 2011
next sibling parent reply mta`chrono <chrono mta-international.net> writes:
 2.  etc.curl, std.curl, or std.net.curl?  (We had a vote a while back
 but it was buried deep in a thread and a lot of people may have missed
 it:  http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

Low Level Part --> deimos High Level Part --> std.curl
Dec 02 2011
next sibling parent Jonas Drewsen <jdrewsen nospam.com> writes:
On 03/12/11 06.12, mta`chrono wrote:
 2.  etc.curl, std.curl, or std.net.curl?  (We had a vote a while back
 but it was buried deep in a thread and a lot of people may have missed
 it:  http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

Low Level Part --> deimos High Level Part --> std.curl

I would really hate if phobos would be depending on deimos. As I see it either both levels go in or none at all. /Jonas
Dec 03 2011
prev sibling parent Kapps <Kapps NotValidEmail.com> writes:
On 02/12/2011 11:12 PM, mta`chrono wrote:
 2.  etc.curl, std.curl, or std.net.curl?  (We had a vote a while back
 but it was buried deep in a thread and a lot of people may have missed
 it:  http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

Low Level Part --> deimos High Level Part --> std.curl

Deimos is meant to be something you grab parts of as necessary; as such, it would not (IMO) make too much sense to force something in Phobos to be dependent on it, as now the user has to go and get something else just to use Phobos.
Dec 03 2011
prev sibling next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sat, 03 Dec 2011 06:26:10 +0200, dsimcha <dsimcha yahoo.com> wrote:

 I volunteered ages ago to manage the review for the second round of  
 Jonas Drewsen's CURL wrapper.  After the first round it was decided  
 that, after a large number of minor issues were fixed, a second round  
 would be necessary.

 Significant open issues:

 1.  Should libcurl be bundled with DMD on Windows?

Yes, please! Finding/building a library compatible with OPTLINK is a needless hassle.
 Code: https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
 Docs: http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

I think the docs for the asynchronous functions need a bit more elaboration. For example, I had these questions after reading the documentation: Does the working thread block or buffer when data is available but the main thread isn't reading it? The examples for byLine and byLineAsync are almost identical. Is there any difference in functionality? Is it possible to check if data is available without blocking? If not, what's the point of byLineAsync (why not use byLine)? -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 02 2011
parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 03/12/11 08.56, Vladimir Panteleev wrote:
 On Sat, 03 Dec 2011 06:26:10 +0200, dsimcha <dsimcha yahoo.com> wrote:

 I think the docs for the asynchronous functions need a bit more
 elaboration. For example, I had these questions after reading the
 documentation:

 Does the working thread block or buffer when data is available but the
 main thread isn't reading it?

When the all buffers have been filled yes. I'll state that explicitely in the docs.
 The examples for byLine and byLineAsync are almost identical. Is there
 any difference in functionality?

As mentioned the async version performs the request in another thread leaving the main thread available for you to do something else. I'll clarify in the docs that it is when you actually call empty/front on the returned range you will get data from the other thread and may be blocking.
 Is it possible to check if data is available without blocking? If not,
 what's the point of byLineAsync (why not use byLine)?

No. The .empty property needs to wait in order to have the proper range behavior. But is does seem like a good idea to add a method for this. Just need a nice name for it since I guess this will be needed by future async ranges as well. Maybe range.wait(duration) returning true if data is present after waiting? Going down this road makes me think that selecting on async ranges would be a very nice feature. But that is definitely not something that will get included in the version getting reviewed now. /Jonas
Dec 03 2011
parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
Den 03-12-2011 13:10, Vladimir Panteleev skrev:
 On Sat, 03 Dec 2011 13:53:16 +0200, Jonas Drewsen <jdrewsen nospam.com>
 wrote:

 As mentioned the async version performs the request in another thread
 leaving the main thread available for you to do something else. I'll
 clarify in the docs that it is when you actually call empty/front on
 the returned range you will get data from the other thread and may be
 blocking.

I can't think of a realistic use case for the current asynchronous API. Basically, all you can do is start a request in the background, but you're neither notified of the request nor can you poll it to check its status? So the only thing you CAN do is ask for a request that you will need in the future, and when that future moment comes, block if necessary to get the result?

The standard example is downloading some content and saving it at the same time. While your main thread saves a chunk to disk (or uploads to another server) the async thread is busy buffering incoming chunks of data. This means that you effectively only wait for the slowest of the two IO operations. If you did it synchronously would worst case have to wait for all everything to be downloaded and the wait for everything to be saved or uploaded. foreach(chunk; byChunkAsync("www.abc.com/hugefile.bin")) { // While writing to file in this thrad // new chunks are downloaded // in the background thread file.write(chunk); } I hope this makes sense. /Jonas
Dec 03 2011
parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
Den 03-12-2011 21:58, Vladimir Panteleev skrev:
 On Sat, 03 Dec 2011 21:17:25 +0200, Jonas Drewsen <jdrewsen nospam.com>
 wrote:

 Den 03-12-2011 13:10, Vladimir Panteleev skrev:
 On Sat, 03 Dec 2011 13:53:16 +0200, Jonas Drewsen <jdrewsen nospam.com>
 wrote:

 As mentioned the async version performs the request in another thread
 leaving the main thread available for you to do something else. I'll
 clarify in the docs that it is when you actually call empty/front on
 the returned range you will get data from the other thread and may be
 blocking.

I can't think of a realistic use case for the current asynchronous API. Basically, all you can do is start a request in the background, but you're neither notified of the request nor can you poll it to check its status? So the only thing you CAN do is ask for a request that you will need in the future, and when that future moment comes, block if necessary to get the result?

The standard example is downloading some content and saving it at the same time. While your main thread saves a chunk to disk (or uploads to another server) the async thread is busy buffering incoming chunks of data. This means that you effectively only wait for the slowest of the two IO operations. If you did it synchronously would worst case have to wait for all everything to be downloaded and the wait for everything to be saved or uploaded. foreach(chunk; byChunkAsync("www.abc.com/hugefile.bin")) { // While writing to file in this thrad // new chunks are downloaded // in the background thread file.write(chunk); } I hope this makes sense.

Well, this makes sense from a theoretical / high-level perspective, but OS write buffers greatly reduce the practicality of this. In common use cases the speed difference between disk and wire will differ by orders of magnitude as well.

Read/write buffers does indeed help a lot and there have been quite some discussions on this topic earlier in this newsgroups regarding tradeoffs etc. Please have a look at these threads (i dont have any links at hand unfortunately). /Jonas
Dec 04 2011
parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
Den 04-12-2011 12:05, Vladimir Panteleev skrev:
 On Sun, 04 Dec 2011 12:59:15 +0200, Jonas Drewsen <jdrewsen nospam.com>
 wrote:

 Den 03-12-2011 21:58, Vladimir Panteleev skrev:
 On Sat, 03 Dec 2011 21:17:25 +0200, Jonas Drewsen <jdrewsen nospam.com>
 wrote:
 The standard example is downloading some content and saving it at the
 same time.

 While your main thread saves a chunk to disk (or uploads to another
 server) the async thread is busy buffering incoming chunks of data.
 This means that you effectively only wait for the slowest of the two
 IO operations. If you did it synchronously would worst case have to
 wait for all everything to be downloaded and the wait for everything
 to be saved or uploaded.

 foreach(chunk; byChunkAsync("www.abc.com/hugefile.bin"))
 {
 // While writing to file in this thrad
 // new chunks are downloaded
 // in the background thread
 file.write(chunk);
 }

 I hope this makes sense.

Well, this makes sense from a theoretical / high-level perspective, but OS write buffers greatly reduce the practicality of this. In common use cases the speed difference between disk and wire will differ by orders of magnitude as well.

Read/write buffers does indeed help a lot and there have been quite some discussions on this topic earlier in this newsgroups regarding tradeoffs etc. Please have a look at these threads (i dont have any links at hand unfortunately).

If you're referring to the discussion which was in the context of copying files, then I have read it. However, it does not apply to typical use cases of curl. The question here is if this example makes any practical sense, and by the looks of everything it does not. Or do you disagree?

The same applies here because all it comes down to in the end is the sizes of buffers. The async ranges simply allows you to fill a specified number buffers in another thread async. Most OSes also have socket buffers that serves the same purpose but async ranges allows you to specify the buffer size without being privileged. The standard max read buffer size for a tcp connection on the ubuntu I have is set to 112640 which is doubled by the kernel to 225280 bytes. A developer may very well like buffers larger that this without needing to set privileged kernel variables. /Jonas
Dec 04 2011
parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
Den 05-12-2011 10:26, Vladimir Panteleev skrev:
 On Sun, 04 Dec 2011 22:52:13 +0200, Jonas Drewsen <jdrewsen nospam.com>
 wrote:

 The same applies here because all it comes down to in the end is the
 sizes of buffers.

 The async ranges simply allows you to fill a specified number buffers
 in another thread async. Most OSes also have socket buffers that
 serves the same purpose but async ranges allows you to specify the
 buffer size without being privileged.

 The standard max read buffer size for a tcp connection on the ubuntu I
 have is set to 112640 which is doubled by the kernel to 225280 bytes.
 A developer may very well like buffers larger that this without
 needing to set privileged kernel variables.

I still don't see how this could ever be useful in practice without the ability to poll for whether data is available. Is it possible to write two short programs that use the synchronous and asynchronous APIs in a way that makes a difference?

It is - but as stated it all depends on buffer sizes and IO speed. It is for the same reason that java recommends using BufferedReader around sockets instead of reading directly from the socket stream. Another example that might show another async range advantage and is clearer: auto f1 = byChunkAsync("www.foo.com/file1.txt", 2^^19); auto f2 = byChunkAsync("www.foo.com/file2.txt", 2^^19); auto f3 = byChunkAsync("www.foo.com/file3.txt", 2^^19); // While this iteration goes is done the file2 and file3 are downloaded // in the background threads foreach (l; f1) { writeln(to!string(l)); } // f2 and f3 are probably downloaded now and no waiting is necessary foreach (l; f2) { writeln(to!string(l)); } foreach (l; f3) { writeln(to!string(l)); } Anyway - I _have_ said that I will add a data availability poll functionality so I guess your initial concern is also covered. /Jonas
Dec 05 2011
parent reply Somedude <lovelydear mailmetrash.com> writes:
Le 05/12/2011 22:36, Jonas Drewsen a écrit :
 
 Anyway - I _have_ said that I will add a data availability poll
 functionality so I guess your initial concern is also covered.
 
 /Jonas

Just an idea: would it be possible/useful to use the signals/slots mechanism for this kind of synch ?
Dec 12 2011
parent David Nadlinger <see klickverbot.at> writes:
On 12/12/11 1:18 PM, jdrewsen wrote:
 On Monday, 12 December 2011 at 10:26:52 UTC, Somedude wrote:
 Just an idea: would it be possible/useful to use the signals/slots
 mechanism for this kind of synch ?

This would be most useful if there were some kind of main loop. This is needed because the request is done in another thread and the response should be handled in the main thread. Without an main loop in the main thread or a destinct call to e.g. a poll() method the signal/slot s would not work.

I don't know if you already have a solution in the works, but maybe the future interface I did for Thrift is similar to what you are looking for: http://klickverbot.at/code/gsoc/thrift/docs/thrift.util.future.html David
Dec 12 2011
prev sibling next sibling parent Jonas Drewsen <jdrewsen nospam.com> writes:
On 03/12/11 05.26, dsimcha wrote:
 I volunteered ages ago to manage the review for the second round of
 Jonas Drewsen's CURL wrapper. After the first round it was decided that,
 after a large number of minor issues were fixed, a second round would be
 necessary.

 Significant open issues:

 1. Should libcurl be bundled with DMD on Windows?

 2. etc.curl, std.curl, or std.net.curl? (We had a vote a while back but
 it was buried deep in a thread and a lot of people may have missed it:
 http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

 Code: https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
 Docs: http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 For those of you on Windows, a libcurl binary built by DMC is available
 at http://gool.googlecode.com/files/libcurl_7.21.7.zip.

 Review starts now and ends on December 16, followed by one week of voting.

Thank you for stepping up again even though I know you have plenty of other things to do. /Jonas
Dec 03 2011
prev sibling next sibling parent Jonas Drewsen <jdrewsen nospam.com> writes:
On 03/12/11 05.26, dsimcha wrote:
 I volunteered ages ago to manage the review for the second round of
 Jonas Drewsen's CURL wrapper. After the first round it was decided that,
 after a large number of minor issues were fixed, a second round would be
 necessary.

 Significant open issues:

 1. Should libcurl be bundled with DMD on Windows?

If there are no licensing issues I think it should. At least until we have a proper package management tool in place that could download the library automatically when needed on windows. /Jonas
Dec 03 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sat, 03 Dec 2011 13:53:16 +0200, Jonas Drewsen <jdrewsen nospam.com>  
wrote:

 As mentioned the async version performs the request in another thread  
 leaving the main thread available for you to do something else. I'll  
 clarify in the docs that it is when you actually call empty/front on the  
 returned range you will get data from the other thread and may be  
 blocking.

I can't think of a realistic use case for the current asynchronous API. Basically, all you can do is start a request in the background, but you're neither notified of the request nor can you poll it to check its status? So the only thing you CAN do is ask for a request that you will need in the future, and when that future moment comes, block if necessary to get the result? Adding a method to query the request's status would certainly be useful. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 03 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sat, 03 Dec 2011 21:17:25 +0200, Jonas Drewsen <jdrewsen nospam.com>  
wrote:

 Den 03-12-2011 13:10, Vladimir Panteleev skrev:
 On Sat, 03 Dec 2011 13:53:16 +0200, Jonas Drewsen <jdrewsen nospam.com>
 wrote:

 As mentioned the async version performs the request in another thread
 leaving the main thread available for you to do something else. I'll
 clarify in the docs that it is when you actually call empty/front on
 the returned range you will get data from the other thread and may be
 blocking.

I can't think of a realistic use case for the current asynchronous API. Basically, all you can do is start a request in the background, but you're neither notified of the request nor can you poll it to check its status? So the only thing you CAN do is ask for a request that you will need in the future, and when that future moment comes, block if necessary to get the result?

The standard example is downloading some content and saving it at the same time. While your main thread saves a chunk to disk (or uploads to another server) the async thread is busy buffering incoming chunks of data. This means that you effectively only wait for the slowest of the two IO operations. If you did it synchronously would worst case have to wait for all everything to be downloaded and the wait for everything to be saved or uploaded. foreach(chunk; byChunkAsync("www.abc.com/hugefile.bin")) { // While writing to file in this thrad // new chunks are downloaded // in the background thread file.write(chunk); } I hope this makes sense.

Well, this makes sense from a theoretical / high-level perspective, but OS write buffers greatly reduce the practicality of this. In common use cases the speed difference between disk and wire will differ by orders of magnitude as well. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 03 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sun, 04 Dec 2011 12:59:15 +0200, Jonas Drewsen <jdrewsen nospam.com>  
wrote:

 Den 03-12-2011 21:58, Vladimir Panteleev skrev:
 On Sat, 03 Dec 2011 21:17:25 +0200, Jonas Drewsen <jdrewsen nospam.com>
 wrote:
 The standard example is downloading some content and saving it at the
 same time.

 While your main thread saves a chunk to disk (or uploads to another
 server) the async thread is busy buffering incoming chunks of data.
 This means that you effectively only wait for the slowest of the two
 IO operations. If you did it synchronously would worst case have to
 wait for all everything to be downloaded and the wait for everything
 to be saved or uploaded.

 foreach(chunk; byChunkAsync("www.abc.com/hugefile.bin"))
 {
 // While writing to file in this thrad
 // new chunks are downloaded
 // in the background thread
 file.write(chunk);
 }

 I hope this makes sense.

Well, this makes sense from a theoretical / high-level perspective, but OS write buffers greatly reduce the practicality of this. In common use cases the speed difference between disk and wire will differ by orders of magnitude as well.

Read/write buffers does indeed help a lot and there have been quite some discussions on this topic earlier in this newsgroups regarding tradeoffs etc. Please have a look at these threads (i dont have any links at hand unfortunately).

If you're referring to the discussion which was in the context of copying files, then I have read it. However, it does not apply to typical use cases of curl. The question here is if this example makes any practical sense, and by the looks of everything it does not. Or do you disagree? -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 04 2011
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2011-12-03 05:26, dsimcha wrote:
 I volunteered ages ago to manage the review for the second round of
 Jonas Drewsen's CURL wrapper. After the first round it was decided that,
 after a large number of minor issues were fixed, a second round would be
 necessary.

 Significant open issues:

 1. Should libcurl be bundled with DMD on Windows?

 2. etc.curl, std.curl, or std.net.curl? (We had a vote a while back but
 it was buried deep in a thread and a lot of people may have missed it:
 http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

I vote for std.net.curl. -- /Jacob Carlborg
Dec 04 2011
next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 04.12.2011, 14:02 Uhr, schrieb Jacob Carlborg <doob me.com>:

 On 2011-12-03 05:26, dsimcha wrote:
 I volunteered ages ago to manage the review for the second round of
 Jonas Drewsen's CURL wrapper. After the first round it was decided that,
 after a large number of minor issues were fixed, a second round would be
 necessary.

 Significant open issues:

 1. Should libcurl be bundled with DMD on Windows?

 2. etc.curl, std.curl, or std.net.curl? (We had a vote a while back but
 it was buried deep in a thread and a lot of people may have missed it:
 http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

I vote for std.net.curl.

Hey you std.net.curl people must be cheating. You were 50% of the etc.curl people in the past, now you are at 90% of them. Or did some change of mind happen in the community recently?
Dec 04 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sun, 04 Dec 2011 16:33:52 +0200, Marco Leise <Marco.Leise gmx.de> wrote:

 Am 04.12.2011, 14:02 Uhr, schrieb Jacob Carlborg <doob me.com>:

 On 2011-12-03 05:26, dsimcha wrote:
 I volunteered ages ago to manage the review for the second round of
 Jonas Drewsen's CURL wrapper. After the first round it was decided  
 that,
 after a large number of minor issues were fixed, a second round would  
 be
 necessary.

 Significant open issues:

 1. Should libcurl be bundled with DMD on Windows?

 2. etc.curl, std.curl, or std.net.curl? (We had a vote a while back but
 it was buried deep in a thread and a lot of people may have missed it:
 http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

I vote for std.net.curl.

Hey you std.net.curl people must be cheating. You were 50% of the etc.curl people in the past, now you are at 90% of them. Or did some change of mind happen in the community recently?

That's internet polls for you. I think we should keep voting as NG replies, the same thing we do when voting on Phobos contributions. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 04 2011
prev sibling next sibling parent "Marco Leise" <Marco.Leise gmx.de> writes:
Am 04.12.2011, 15:36 Uhr, schrieb Vladimir Panteleev  
<vladimir thecybershadow.net>:

 On Sun, 04 Dec 2011 16:33:52 +0200, Marco Leise <Marco.Leise gmx.de>  
 wrote:

 Am 04.12.2011, 14:02 Uhr, schrieb Jacob Carlborg <doob me.com>:

 On 2011-12-03 05:26, dsimcha wrote:
 I volunteered ages ago to manage the review for the second round of
 Jonas Drewsen's CURL wrapper. After the first round it was decided  
 that,
 after a large number of minor issues were fixed, a second round would  
 be
 necessary.

 Significant open issues:

 1. Should libcurl be bundled with DMD on Windows?

 2. etc.curl, std.curl, or std.net.curl? (We had a vote a while back  
 but
 it was buried deep in a thread and a lot of people may have missed it:
 http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

I vote for std.net.curl.

Hey you std.net.curl people must be cheating. You were 50% of the etc.curl people in the past, now you are at 90% of them. Or did some change of mind happen in the community recently?

That's internet polls for you. I think we should keep voting as NG replies, the same thing we do when voting on Phobos contributions.

Polling on the NG has two problems: 1. Polls generate a lot of noise here (single word posts). 2. Counting the results must be done manually. Although voting is not anonymous, you can cheat here as well, by using two names and email addresses. I'll have future polls allow only one vote per IP address in any case.
Dec 05 2011
prev sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Mon, 05 Dec 2011 10:12:00 +0200, Marco Leise <Marco.Leise gmx.de> wrote:

 Polling on the NG has two problems:
 1. Polls generate a lot of noise here (single word posts).
 2. Counting the results must be done manually.

These are rather minor problems.
 Although voting is not anonymous, you can cheat here as well, by using  
 two names and email addresses.

One fake vote is unlikely to swerve the result in either direction, and abundant abuse is obvious. Web form polls are considerably easier to cheat, even with IP limitations. -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 05 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Sun, 04 Dec 2011 22:52:13 +0200, Jonas Drewsen <jdrewsen nospam.com>  
wrote:

 The same applies here because all it comes down to in the end is the  
 sizes of buffers.

 The async ranges simply allows you to fill a specified number buffers in  
 another thread async. Most OSes also have socket buffers that serves the  
 same purpose but async ranges allows you to specify the buffer size  
 without being privileged.

 The standard max read buffer size for a tcp connection on the ubuntu I  
 have is set to 112640 which is doubled by the kernel to 225280 bytes. A  
 developer may very well like buffers larger that this without needing to  
 set privileged kernel variables.

I still don't see how this could ever be useful in practice without the ability to poll for whether data is available. Is it possible to write two short programs that use the synchronous and asynchronous APIs in a way that makes a difference? -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 05 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Mon, 05 Dec 2011 23:36:27 +0200, Jonas Drewsen <jdrewsen nospam.com>  
wrote:

 It is - but as stated it all depends on buffer sizes and IO speed. It is  
 for the same reason that java recommends using BufferedReader around  
 sockets instead of reading directly from the socket stream.

 Another example that might show another async range advantage and is  
 clearer:

 auto f1 = byChunkAsync("www.foo.com/file1.txt", 2^^19);
 auto f2 = byChunkAsync("www.foo.com/file2.txt", 2^^19);
 auto f3 = byChunkAsync("www.foo.com/file3.txt", 2^^19);

 // While this iteration goes is done the file2 and file3 are downloaded
 // in the background threads
 foreach (l; f1) { writeln(to!string(l)); }

 // f2 and f3 are probably downloaded now and no waiting is necessary
 foreach (l; f2) { writeln(to!string(l)); }
 foreach (l; f3) { writeln(to!string(l)); }

Thanks. Perhaps include this example in the documentation? -- Best regards, Vladimir mailto:vladimir thecybershadow.net
Dec 05 2011
prev sibling next sibling parent reply "dsimcha" <dsimcha yahoo.com> writes:
Here's my review.  Remember, review ends on December 16.

Overall, this library has massively improved due to the rounds of 
review it's been put through.  I only found a few minor nitpicks. 
  However, a recurring pattern is minor grammar mistakes in the 
documentation.  Please proofread all documentation again.

Docs:

"The high level API is build" -> "The high level API is built"

"LibCurl is licensed under a MIT/X derivate license" -> "LibCurl 
is licensed under an MIT/X derivative license"

AutoConnect:  "Connection type used when the url should be used 
to auto detect protocol."  ->  "auto detect THE protocol"

Why is there a link to curl_easy_set_opt in the byLineAsync and 
byChunkAsync docs?

In onSend:  "The length of the void[] specifies the maximum 
number of bytes that can be send." -> "can be SENT"

What is the use case for exposing struct Curl?  I prefer if this 
were unexposed because we'll obviously be unable to provide a 
replacement if/when the backend to this library is rewritten in 
pure D.

Actually, that leads to another question:  Should this module 
really be named etc.curl/std.curl/std.net.curl, or should the 
fact that it currently uses Curl as a backend be relegated to an 
implementation detail?

Code:

pragma(lib) basically doesn't work on Linux because the object 
format doesn't support it.  Don't rely on it.

Should the protocol detection be case-insensitive, i.e. "ftp://" 
== "FTP://"?
Dec 11 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 12/11/2011 7:53 PM, dsimcha wrote:
 Should the protocol detection be case-insensitive, i.e. "ftp://" ==
 "FTP://"?

Oh, one more thing: Factor the protocol detection out into a function. You have the same expression cut and pasted everywhere: if(url.startsWith("ftp://") || url.startsWith("ftps://") ...
Dec 11 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/12/11 6:25 AM, jdrewsen wrote:
 On Monday, 12 December 2011 at 01:05:25 UTC, Jonathan M Davis wrote:
 On Sunday, December 11, 2011 19:55:28 dsimcha wrote:
 On 12/11/2011 7:53 PM, dsimcha wrote:
 Should the protocol detection be case-insensitive, i.e. > "ftp://" ==
 "FTP://"?

Oh, one more thing: Factor the protocol detection out into a function. You have the same expression cut and pasted everywhere: if(url.startsWith("ftp://") || url.startsWith("ftps://") ...

And startsWith will take multiple arguments, so the code definitely shouldn't have x.startsWith(y) || x.startsWith(z) || ... It's more verbose and less efficient that way. - Jonathan M Davis

I hadn't noticed that multiple arguments is accepted. Thanks. /Jonas

You can also pass a predicate for case insensitive comparison. Andrei
Dec 12 2011
prev sibling next sibling parent Somedude <lovelydear mailmetrash.com> writes:
Le 12/12/2011 01:53, dsimcha a écrit :
 Actually, that leads to another question:  Should this module really be
 named etc.curl/std.curl/std.net.curl, or should the fact that it
 currently uses Curl as a backend be relegated to an implementation detail?
 

I think it's good to remind that this is a wrapper and not a native D implementation. At least if I was a user, I would like to know upfront. So I would keep curl in the name.
Dec 12 2011
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-12-12 01:53, dsimcha wrote:
 Here's my review. Remember, review ends on December 16.

 What is the use case for exposing struct Curl? I prefer if this were
 unexposed because we'll obviously be unable to provide a replacement
 if/when the backend to this library is rewritten in pure D.

 Actually, that leads to another question: Should this module really be
 named etc.curl/std.curl/std.net.curl, or should the fact that it
 currently uses Curl as a backend be relegated to an implementation detail?

That's a good question. When I think about it, I think it should be considered an implementation detail.
 Code:

 pragma(lib) basically doesn't work on Linux because the object format
 doesn't support it. Don't rely on it.

 Should the protocol detection be case-insensitive, i.e. "ftp://" ==
 "FTP://"?

I think so. -- /Jacob Carlborg
Dec 12 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, December 11, 2011 19:55:28 dsimcha wrote:
 On 12/11/2011 7:53 PM, dsimcha wrote:
 Should the protocol detection be case-insensitive, i.e. "ftp://" ==
 "FTP://"?

Oh, one more thing: Factor the protocol detection out into a function. You have the same expression cut and pasted everywhere: if(url.startsWith("ftp://") || url.startsWith("ftps://") ...

And startsWith will take multiple arguments, so the code definitely shouldn't have x.startsWith(y) || x.startsWith(z) || ... It's more verbose and less efficient that way. - Jonathan M Davis
Dec 11 2011
prev sibling next sibling parent reply Somedude <lovelydear mailmetrash.com> writes:
Le 03/12/2011 05:26, dsimcha a crit :
 I volunteered ages ago to manage the review for the second round of
 Jonas Drewsen's CURL wrapper.  After the first round it was decided
 that, after a large number of minor issues were fixed, a second round
 would be necessary.
 
 Significant open issues:
 
 1.  Should libcurl be bundled with DMD on Windows?

Yes. This is a must.
Dec 12 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Somedude:

 1.  Should libcurl be bundled with DMD on Windows?

Yes. This is a must.

I agree it's better to have this battery included. Bye, bearophile
Dec 12 2011
prev sibling next sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Monday, 12 December 2011 at 10:26:52 UTC, Somedude wrote:
 Le 05/12/2011 22:36, Jonas Drewsen a écrit :
 
 Anyway - I _have_ said that I will add a data availability poll
 functionality so I guess your initial concern is also covered.
 
 /Jonas

Just an idea: would it be possible/useful to use the signals/slots mechanism for this kind of synch ?

This would be most useful if there were some kind of main loop. This is needed because the request is done in another thread and the response should be handled in the main thread. Without an main loop in the main thread or a destinct call to e.g. a poll() method the signal/slot s would not work. As a side note I would very much like D to enter a main loop with a task scheduler as part of the runtime. That way co-routines and event handling would be much nicer to use out of the box. For example: // The main function is running a fiber in the main loop. // the main loop terminates when no more fibers are active void main(string[] args) { fiber( { auto content= download("google.com"); write(content); } ); fiber( { sleep(2.0); writeln("Hello in 2 secs"); } ); writeln("first line written"); } The download() yields until data is downloaded and the sleep() yield until the time has passed. You wouldn't notice this when not using fibers because you would only have the main() fiber which terminates the main loop (and the program) when it is done. Optimally the runtime should support replacing the standard mainloop scheduler with your own. /Jonas
Dec 12 2011
prev sibling next sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Monday, 12 December 2011 at 00:53:14 UTC, dsimcha wrote:
 Here's my review.  Remember, review ends on December 16.

 Overall, this library has massively improved due to the rounds 
 of review it's been put through.  I only found a few minor 
 nitpicks.
 However, a recurring pattern is minor grammar mistakes in the 
 documentation.  Please proofread all documentation again.

 Docs:

 "The high level API is build" -> "The high level API is built"

 "LibCurl is licensed under a MIT/X derivate license" -> 
 "LibCurl is licensed under an MIT/X derivative license"

 AutoConnect:  "Connection type used when the url should be used 
 to auto detect protocol."  ->  "auto detect THE protocol"

ok
 Why is there a link to curl_easy_set_opt in the byLineAsync and 
 byChunkAsync docs?

will fix
 In onSend:  "The length of the void[] specifies the maximum 
 number of bytes that can be send." -> "can be SENT"

ok
 What is the use case for exposing struct Curl?  I prefer if 
 this were unexposed because we'll obviously be unable to 
 provide a replacement if/when the backend to this library is 
 rewritten in pure D.

Initially it was not exposed but there were a couple of requests to expose it. Thats why.
 Actually, that leads to another question:  Should this module 
 really be named etc.curl/std.curl/std.net.curl, or should the 
 fact that it currently uses Curl as a backend be relegated to 
 an implementation detail?

I think it should have curl in the name. I do not believe that a native D network library should have the same API as this module. It is limited by the design of libcurl and should be improved by a native D net library.
 Code:

 pragma(lib) basically doesn't work on Linux because the object 
 format doesn't support it.  Don't rely on it.

ok
 Should the protocol detection be case-insensitive, i.e. 
 "ftp://" == "FTP://"?

yes Thanks for the feedback /Jonas
Dec 12 2011
prev sibling next sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Monday, 12 December 2011 at 01:05:25 UTC, Jonathan M Davis 
wrote:
 On Sunday, December 11, 2011 19:55:28 dsimcha wrote:
 On 12/11/2011 7:53 PM, dsimcha wrote:
 Should the protocol detection be case-insensitive, i.e. 
 "ftp://" ==
 "FTP://"?

Oh, one more thing: Factor the protocol detection out into a function. You have the same expression cut and pasted everywhere: if(url.startsWith("ftp://") || url.startsWith("ftps://") ...

And startsWith will take multiple arguments, so the code definitely shouldn't have x.startsWith(y) || x.startsWith(z) || ... It's more verbose and less efficient that way. - Jonathan M Davis

I hadn't noticed that multiple arguments is accepted. Thanks. /Jonas
Dec 12 2011
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Monday, 12 December 2011 at 12:24:26 UTC, jdrewsen wrote:
 I think it should have curl in the name. I do not believe that 
 a native D network library should have the same API as this 
 module. It is limited by the design of libcurl and should be 
 improved by a native D net library.

I agree with this, it's pointless to limit ourselves to an interface that was designed with the limitations of curl in mind. Using a generic name for this module gains us absolutely nothing.
Dec 12 2011
prev sibling next sibling parent "dsimcha" <dsimcha yahoo.com> writes:
On Tuesday, 13 December 2011 at 00:47:26 UTC, David Nadlinger 
wrote:
 I don't know if you already have a solution in the works, but 
 maybe the future interface I did for Thrift is similar to what 
 you are looking for: 
 http://klickverbot.at/code/gsoc/thrift/docs/thrift.util.future.html

 David

Doesn't std.parallelism's task parallelism API work for this? (Roughly speaking a task in std.parallelism == a future in your Thrift API.) If not, what can I do to fix it so that it can? Looking briefly at your API, one thing I notice is the ability to cancel a future. This would be trivial to implement in std.parallelism for tasks that haven't yet started executing, but difficult if not impossible for tasks that are already executing. Does your Thrift API allow cancelling futures that are already executing? If so, how is that accomplished? The TFutureAggregatorRange could be handled by a parallel foreach loop if I understand correctly, though it would look a little different.
Dec 12 2011
prev sibling next sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Monday, 12 December 2011 at 14:42:17 UTC, Andrei Alexandrescu 
wrote:
 On 12/12/11 6:25 AM, jdrewsen wrote:
 On Monday, 12 December 2011 at 01:05:25 UTC, Jonathan M Davis 
 wrote:
 On Sunday, December 11, 2011 19:55:28 dsimcha wrote:
 On 12/11/2011 7:53 PM, dsimcha wrote:
 Should the protocol detection be case-insensitive, i.e. > 
 "ftp://" ==
 "FTP://"?

Oh, one more thing: Factor the protocol detection out into a function. You have the same expression cut and pasted everywhere: if(url.startsWith("ftp://") || url.startsWith("ftps://") ...

And startsWith will take multiple arguments, so the code definitely shouldn't have x.startsWith(y) || x.startsWith(z) || ... It's more verbose and less efficient that way. - Jonathan M Davis

I hadn't noticed that multiple arguments is accepted. Thanks. /Jonas

You can also pass a predicate for case insensitive comparison. Andrei

Perfect
Dec 13 2011
prev sibling next sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Tuesday, 13 December 2011 at 00:47:26 UTC, David Nadlinger 
wrote:
 On 12/12/11 1:18 PM, jdrewsen wrote:
 On Monday, 12 December 2011 at 10:26:52 UTC, Somedude wrote:
 Just an idea: would it be possible/useful to use the 
 signals/slots
 mechanism for this kind of synch ?

This would be most useful if there were some kind of main loop. This is needed because the request is done in another thread and the response should be handled in the main thread. Without an main loop in the main thread or a destinct call to e.g. a poll() method the signal/slot s would not work.

I don't know if you already have a solution in the works, but maybe the future interface I did for Thrift is similar to what you are looking for: http://klickverbot.at/code/gsoc/thrift/docs/thrift.util.future.html David

I don't have any plans for this in the curl stuff. But it is something like your future implementation and a builtin mainloop I think would make async programming much easier in D. /Jonas
Dec 13 2011
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, December 02, 2011 23:26:10 dsimcha wrote:
 I volunteered ages ago to manage the review for the second round of
 Jonas Drewsen's CURL wrapper.  After the first round it was decided
 that, after a large number of minor issues were fixed, a second round
 would be necessary.
 
 Significant open issues:
 
 1.  Should libcurl be bundled with DMD on Windows?

I'd argue that it should be a separate download but that it should definitely be provided.
 2.  etc.curl, std.curl, or std.net.curl?  (We had a vote a while back
 but it was buried deep in a thread and a lot of people may have missed
 it:  http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

If we're going to follow a policy that wrappers around 3rd party libraries go in etc, then it should be etc.curl. Otherwise, I think that it should be std.net.curl. I'd argue that AutoConnection should be AutoConnect, because it's shorter and just as informative. I'd argue that the acronyms should be in all caps if camelcasing would require that the first letter of the acronym be capitalized and all lower case if camelcasing would require that the first letter of the acronym be lower case. We're not completely consistent in how we using acronyms in names in Phobos, but I believe that we primarily follow that rule when putting them in symbol names. So, for instance, it would be HTTP, FTP, and SMTP rather than Http, Ftp, and Smtp. I'd rename T to C or Char in the template parameters, since it's a character type rather than a generic type. Not a huge deal, but I think that it makes the code quicker to understand. Why are some parameters const(char)[] and others string? Why not make them all const(char)[] or even actually templatize them on character type (one per parameter which is a string)? Shorten the URL in your examples. Its long length increases the risk of making the examples too wide. It's a made up URL, so there's no need for it to be that long. Something like foo.bar.com is probably more than enough. Why does byLine use '\x0a' instead of '\n'? '\n' is clear, whereas most people will have to look up what '\x0a' is, so I really think that it should be '\n'. Also, if byLine is going to return an internal range type instead of having an externally defined one, its documentation needs to explain what Char is used for. As it stands, it looks like a pointless template parameter. To do that, the return section should probably say something like "A range of Char[]..." All of this goes for byLineAsync as well. In general, the functions should do a better job of documenting their parameters. Many of them don't document their parameters at all. For instance, while it's fairly easy to guess what keepTerminator and terminator are used for, there is no explanation about them whatsoever in either byLine or byLineAsync's documentation. If any of the range return types of these functions have non-standard functions on them, they need to be properly documented. e.g. wait(Duration) is mentioned in passing in byLineAsync's documentation, but no explanation for its purpose is given. The documentation needs to explain everything the programmer needs to know about the return types. For most ranges, it's enough to say that the return type is a range of X, but if the return type has anything other than the standard range functions, then just saying that the function returns a range of X is not enough. The extra functions must be listed and explained. I'd warn you that the fact that the documentation for Protocol is showing up in the documentation is likely a bug, since Protocol is private - probably the bug that all templates are currently public even if they're marked private. So, either Protocol needs to be public (or maybe protected), or it's going to disappear from the documentation eventually. Another option would be to do something similar to std.container.TotalContainer and declare a struct which is intended only for documentation (probably in a version(StdDdoc) block), and put all of the protocol documenation on _it_ instead of the mixin. Also, Protocol needs a better explanation as to what it's actually for. "Mixin template for all supported curl protocols" doesn't really tell me anything other than the fact that it relates to the supported protocols somehow. Looking at the source, it seems clear enough, but not from the documentation - especially when the documentation seems to imply that it's specific to the Http struct. onReceiveHeader has const(char)[] parameters but mentions char[] in its documentation. The meaning is clear enough, but it would be more correct to say const(char)[] or to just mention the variable names explicitly. It would be nice if Smtp.mailTo was variadic - i.e. mailTo(string[] recipients...). All of the current use cases for it would be exactly the same with the addition of being able to do something like mailTo(str1, str2, str3). The exception type's constructors should look like this: nothrow this(string msg, string file = __FILE__, size_t line = __LINE__, Throwable next = null) { super(msg, file, line, next); } As it stands, the line number will be completely wrong. You can just copy std.string.StringException and rename it. And please use that exact signature. Don't make it take a const(char)[]. Exception requires a string, and with the constructor taking const(char)[], it's going to needlessly idup every string that gets passed into it. The code which calls it can idup if it has a char[] instead of an immutable(int)[]. Having it take a const(char)[] adds no value and is less efficient. The same goes for any other function which is always going to idup its const(char)[] argument. Just make it take a string. Why do CURLoption and CURLcode have CURL in uppercase? That doesn't match the rest of the module. I'd argue that they should be CurlOption and CurlCode. To make matters even weirder, your examples appear to use CurlOption but the declarations use CURLoption. They should be consistent. - Jonathan M Davis
Dec 14 2011
next sibling parent Justin C Calvarese <jccalvarese gmail.com> writes:
== Quote from jdrewsen (jdrewsen nospam.com)'s article
 On Thursday, 15 December 2011 at 07:46:56 UTC, Jonathan M Davis
 wrote:

 I'd argue that the acronyms should be in all caps if
 camelcasing would require that the first letter of the acronym
 be capitalized and all lower case if camelcasing would require
 that the first letter of the acronym be lower case. We're not
 completely consistent in how we using acronyms in names in
 Phobos, but I believe that we primarily follow that rule when
 putting them in symbol names. So, for instance, it would be
 HTTP, FTP, and SMTP rather than Http, Ftp, and Smtp.

then I'll do that. Actually I think I will make a pull request to extend the dlang.org/dstyle.html doc with and example that we can point to when someone asks about styling. I've spend too much time restyling because there is no such style doc already and people are complaining about style in reviews anyway.

(I'm sorry if this comes across as a repost, but I guess I don't know how to post using the mailing list.) I absolutely agree that the "D Style" page should be revised and extended to reflect the current standards that new Phobos modules are expected to adhere to. I found an open pull request that indicated a past effort in this regard: https://github.com/D-Programming-Language/d-programming-language.org/pull/16 (I thought that another document was also floating around that was a proposed new "Style Guide for Phobos", but I don't recall who authored it and I don't remember where it was located.) jcc7
Dec 17 2011
prev sibling parent Justin C Calvarese <jccalvarese gmail.com> writes:
== Quote from jdrewsen (jdrewsen nospam.com)'s article
 On Thursday, 15 December 2011 at 07:46:56 UTC, Jonathan M Davis
 wrote:

 I'd argue that the acronyms should be in all caps if
 camelcasing would require that the first letter of the acronym
 be capitalized and all lower case if camelcasing would require
 that the first letter of the acronym be lower case. We're not
 completely consistent in how we using acronyms in names in
 Phobos, but I believe that we primarily follow that rule when
 putting them in symbol names. So, for instance, it would be
 HTTP, FTP, and SMTP rather than Http, Ftp, and Smtp.

then I'll do that. Actually I think I will make a pull request to extend the dlang.org/dstyle.html doc with and example that we can point to when someone asks about styling. I've spend too much time restyling because there is no such style doc already and people are complaining about style in reviews anyway.

(I'm sorry if this comes across as a repost, but I guess I don't know how to post using the mailing list.) I absolutely agree that the "D Style" page should be revised and extended to reflect the current standards that new Phobos modules are expected to adhere to. I found an open pull request that indicated a past effort in this regard: https://github.com/D-Programming-Language/d-programming-language.org/pull/16 (I thought that another document was also floating around that was a proposed new "Style Guide for Phobos", but I don't recall who authored it and I don't remember where it was located.) jcc7
Dec 17 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Monday, December 12, 2011 01:53:13 dsimcha wrote:
 Actually, that leads to another question:  Should this module
 really be named etc.curl/std.curl/std.net.curl, or should the
 fact that it currently uses Curl as a backend be relegated to an
 implementation detail?

It's too specific to curl for curl to be an implementation detail. I expect that if we were designing a general API or one of our own which didn't use curl at all, I wouldn't expect the API to be the same. - Jonathan M Davis
Dec 14 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
Line# 235 is identical to line# 239. Shouldn't line# 235 be creating an Http 
object, not an Ftp object? That mistake definitely makes it look like download 
hasn't been properly tested.

You should create template similar to

template isCurlConn(Conn)
{
    auto isCurlConn = is(Conn : Http) || is(Conn : Ftp) || is(Conn : 
AutoConnection);
}

It would reduce code duplication.

The functions which have a template parameter T = char really need to have 
that parameter properly explained in the documentation. From the 
documentation, I would have thought that it was any character type, but it's 
char or ubyte. There is no hint whatsoever in the documentation that ubyte 
would work. All of the parameters need to be made clear. That goes for all 
functions.

Is there a reason that the functions which have a template argument T which 
can be either a char or a ubyte can't work with immutable char? Glancing at 
_basicHttp, I don't see any reason why T couldn't be immutable char. Yes, it 
would require casting to immutable(char)[], but you're already casting to 
char[], and the data being returned appears to be unique such that it could be 
safely cast to immutable. That being the case, I'd encourage you to not only 
make it work with immutable char but to make immutable char the default 
instead of char.

Once you've fixed the exception types like I requested in my first review, you 
can and should use enforceEx!CurlException(cond, msg) instead of enforce(cond, 
new CurlException(msg)).

it may not matter, since we're dealing with strings here, but I'd argue that 
!empty should be used rather than length > 0 (e.g. line# 475). In stuff other 
than strings it definitely can be more efficient, and even with strings, it may 
be, since I think that checking whether the length is 0 (as empty does) is 
slightly more efficient than checking whether it's greater than 0.

Please make sure that opening braces on on their own line. That's the way that 
the rest of Phobos does it and it's one of the few formatting rules that we've 
been trying to use consistently throughout Phobos. For the most part, you get 
it right, but not everywhere - nested functions in particular seem to have 
braces on the same line for some reason.

It's not a big deal, but you should use auto more. For instance, lines #626, 
#638, and #640 don't use auto when they could.

I have no idea why you keep putting parens around uses of the in operator, but 
it's not necessary and makes the code harder to read IMHO. It's certainly not 
required that you change that, but I'd appreciate it if you did.

I see that regexes are used in the module. Please make sure that they still 
work correctly with the new std.regex. They probably do, but it's not 100% 
backwards compatible.

byLine's template constraint needs to verify that the types of Terminator and 
Char are valid. As it is, I could try and pass it something like an Http 
struct if I wanted to. In general, _all_ templates need to verify that their 
arguments are of the appropriate type for _all_ of their arguments.

It's odd that popFront (line #780) is not  safe when empty and front are. Does 
findSplit or findSplitAfter prevent it?

With a name like Char on byLine, I'd expect it to take any char type, but not 
only do you not verify the type (as previously mentioned), but you instantiate 
_basicHttp with it, which works with char and ubyte, not any char. If Char is 
going to take char and ubyte specifically, then Char is a bad name for it.

I'd suggest changing line# 823 to

auto result = get(url, isFtpUrl(url) ? FTP() : Http());

It would save several lines of code.

Line# 883 should be using +=.

This would be easier if I could comment on the code directly, but I don't seem 
to be able to with the link that dsimcha provided (probably because it's blob; 
there's probably another link which would allow direct commenting, but oh 
well). I'll comment more on the implementation later, but bed beckons...

- Jonathan M Davis
Dec 15 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
Please make sure that you remove trailing whitespace from the file. A lot of 
the lines have trailing whitespace. Also, make sure that you don't have any 
tabs in the file. There are a few places where you used tabs.

Line# 916 claims that the code there won't work and that it needs to be fixed, 
so please fix it before it goes into Phobos (assuming that it's voted in).

On line# 1051, just do

auto arr = new Char[](bufferSize);

There's no reason to create it and then assign to its length property.

The braces in the block follow line# 1131 need to be fixed. The indentation is 
wrong on some of them, and not all of them are on their own line.

You should see if the cast(void[])'s can be removed from byLineAsync. They 
might still be necessary, but null was given a type with the last release, so 
it might work without the cast now. The same goes with byChunkAsync.

Same with line# 1215 as 1051. There's no point in wasting a line of code by 
declaring an array and the assigning to its length variable. The code is 
probably slightly less efficient that way too. Just allocate a new array of the 
correct length.

On line#1297, the braces need to be fixed. Same with line# 1308 and line# 1334. 
Just check over your braces in general to make sure that they're consistent 
and are on their own line unless you're dealing with something like a one line 
lambda.

And as I mentioned before, all of these enforces which take a new 
CurlException, should be changed to use enforceEx after you've fixed 
CurlException's constructor.

The documentation on Protocol's isValid is wrong. It claims that isValid is 
true if the instance is stoppend and invalid. Wouldn't that mean that the 
object _wasn't_ valid?

There are quite a few places where you're concatenating several strings and 
format would make the code much cleaner (e.g. Protocol's netInterface 
function). Unless you're avoiding format in an attempt to allow functions to 
be pure (since unfortunately, format can't currently be pure), you really 
should be using format.

I'd argue that it's better to use empty than check whether a string is equal 
to "". e.g. domain.empty instead of domain != "" on line# 1538.

There bodies of Http's constructor, static opCall, and dup functions are all 
very similar - especially the constructor and static opCall. You should look 
at refactoring those so that they don't have to duplicate so much code. Maybe 
it can't be reasonably done easily, but it would certainly be better if that 
code duplication could be reduced.

Change setTimeCondition to take a SysTime. DateTime is not intended for 
timestamps. It's intended for calendar-based time, not the system time. Also, 
SysTime makes it easier to convert to time_t. In general, the fact that you're 
looking to deal with a time_t is a sign that you should be using SysTime. This 
is especially true, since DateTime is going to give the wrong time_t in 
general, since it has no time zone. As it stands, the timestamp is assumed to 
be in UTC. That's just begging for bugs. Users won't expect that. Definitely 
make it a SysTime. And once that's done, line# 2016 would disappear, and line# 
2017 would become

p.curl.set(CurlOption.timevalue, timestamp.toUnixTime());

On line# 2247, why do you use format with no format string? There should be no 
~ or to!string in a call to format. It's just creating unnecessary memory 
allocations. It should be something more like

format("%s%s(%s.", code, reason, majorVersion, minorVersion);

Though I find that '(' to be rather odd, since it has no matching closing 
paren, so one may need to be added. Regardless, that line shouldn't be 
concatenating strings or use conv.to. format takes care of all of that and 
should do it with fewer memory allocations.

Pointers really should have their * next to the type not floating in space like 
on line #2274. As it is, the code looks like a multiplication. Unlike in 
C/C++, the * is clearly associated with the type. e.g.

int* a, b;

creates an int* and an int in C, but it creates int* and int* in D. Separating 
the * from the type has a tendancy to make the code harder to understand.

Ftp's consturctor and static opCall have a code duplication problem similar to 
that of Http.

Make encoding take a string. Any time that you are _always_ going to idup a 
parameter which is a character array, make it a string, _not_ const. That way, 
if it's a string, no idup is necessary, and if it's char[], then it can be 
iduped when it's passed in, and iduping only occurs when it's actually 
necessary.

In Smtp's constructor, yo ushould probably create a variable for the toLowered 
url. e.g.

auto lowered = url.toLower();

That way, you avoid having to lower it twice if the else branch of the if-else 
is taken.

Is p.curl.perform a property? If so, line# 2455 does nothing. If not, then it 
needs parens. The curl module should be able to be built with -property. I 
believe that Phobos as a whole is currently being built with tha flag. If not, 
it will be soon, and this module will need to do that if it's merged into 
Phobos.

If you can, please use a static foreach with EnumMembers!CurlOption in Curl's 
dup. And if you can't, because you're not clearing all of them, then put all 
of the ones that you _are_ clearing in an array (or TypeTuple if you want to 
avoid the allocation for the array) and iterate over them with a foreach. e.g.

foreach(option; TypeTuple!(CurlOption.file, CurlOptions.writefunction))
    copy.clear(option);

You should be able to cut down on the number of lines by doing that 
(_especially_ if you're actually clearing all of them and can use 
EnumMembers!CurlOption).

Line#2639 is another case where format should be used. In general, if you're 
doing more than one ~, consider using format unless you're trying to make a 
function pure.

I'd suggest renaming Message to something like CurlMessage. The odds of a name 
clash with Message are likely high. And while they won't be able to use 
Message, since it's private, it will still affect whether the full path for 
Message must be given (e.g. my.module.Message instead of Message). CurlMessage 
is far less likely to clash.

Rename DATA to Data. DATA does not follow Phobos' naming conventions. Granted, 
it's private, but it's completely off. Type names should be pascal cased.

Why isn't empty a property on #3056? Sure, it's not a range, but it would be 
more consistent with everything else if it were  a property.

In general, you should use when converting between TickDuration and Duration, 
but in some cases, no conversion should be necessary. For instance, line# 985 
shouldn't need to do any converting. Duration's opOpAssign should be able to 
handle a TickDuration.

Overall, the design looks fairly solid, and the code looks pretty good, but 
there are definitely a number of minor issues which should be addressed before 
this code makes it into Phobos. The biggest involve the functions' parameters 
(such as using SysTime, not DateTime, and using string in some places where 
you're using const(char)[] in order to avoid unnecessary idups). And those 
definitely need to be sorted out sooner rather than later, since they affect
the 
API and will break code if they're changed later.

- Jonathan M Davis
Dec 16 2011
prev sibling next sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Thursday, 15 December 2011 at 07:46:56 UTC, Jonathan M Davis 
wrote:
 On Friday, December 02, 2011 23:26:10 dsimcha wrote:
 I volunteered ages ago to manage the review for the second 
 round of
 Jonas Drewsen's CURL wrapper.  After the first round it was 
 decided
 that, after a large number of minor issues were fixed, a 
 second round
 would be necessary.
 
 Significant open issues:
 
 1.  Should libcurl be bundled with DMD on Windows?

I'd argue that it should be a separate download but that it should definitely be provided.
 2.  etc.curl, std.curl, or std.net.curl?  (We had a vote a 
 while back
 but it was buried deep in a thread and a lot of people may 
 have missed
 it:  
 http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

If we're going to follow a policy that wrappers around 3rd party libraries go in etc, then it should be etc.curl. Otherwise, I think that it should be std.net.curl. I'd argue that AutoConnection should be AutoConnect, because it's shorter and just as informative.

AutoConnect sounds like a command to connection automatically which might be confusing since that is not what it does. Therefore I went with AutoConnection which I still believe is better.
 I'd argue that the acronyms should be in all caps if 
 camelcasing would require that the first letter of the acronym 
 be capitalized and all lower case if camelcasing would require 
 that the first letter of the acronym be lower case. We're not 
 completely consistent in how we using acronyms in names in 
 Phobos, but I believe that we primarily follow that rule when 
 putting them in symbol names. So, for instance, it would be 
 HTTP, FTP, and SMTP rather than Http, Ftp, and Smtp.

I like Http better as personal taste. But if HTTP is preferred then I'll do that. Actually I think I will make a pull request to extend the dlang.org/dstyle.html doc with and example that we can point to when someone asks about styling. I've spend too much time restyling because there is no such style doc already and people are complaining about style in reviews anyway.
 I'd rename T to C or Char in the template parameters, since 
 it's a character type rather than a generic type. Not a huge 
 deal, but I think that it makes the code quicker to understand.

I've named it T because it can also be ubyte. In that case C for Char is confusing.
 Why are some parameters const(char)[] and others string? Why 
 not make them all const(char)[] or even actually templatize 
 them on character type (one per parameter which is a string)?

The parameters that are const(char)[] will accept both string and char[] which is what I want because libcurl make an internal copy of it anyway. If I typed it as string then you would have to idup a char[] for the parameter without any reason. The parameters that are string is internally passed to functions (in other std modules) that only accept strings. I could accept const(char)[] and idup it in order to have consistant parameter types but that gives an unnecessary overhead.
 Shorten the URL in your examples. Its long length increases the 
 risk of making the examples too wide. It's a made up URL, so 
 there's no need for it to be that long. Something like 
 foo.bar.com is probably more than enough.

Actually the long urls point to a google appspot app that fits the example. I've done it this way so that people can just copy/paste the example and have something working. Not too many people have a test HTTP POST server setup for a quick test AFAIK.
 Why does byLine use '\x0a' instead of '\n'? '\n' is clear, 
 whereas most people will have to look up what '\x0a' is, so I 
 really think that it should be '\n'.

I looked at how it was done elsewhere in phobos and did like that: http://dlang.org/phobos/std_stdio.html#byLine But I agree with you that \n is better and will change it.
 Also, if byLine is going to return an internal range type 
 instead of having an externally defined one, its documentation 
 needs to explain what Char is used for. As it stands, it looks 
 like a pointless template parameter. To do that, the return 
 section should probably say something like "A range of 
 Char[]..." All of this goes for byLineAsync as well.

Ok.
 In general, the functions should do a better job of documenting 
 their parameters. Many of them don't document their parameters 
 at all. For instance, while it's fairly easy to guess what 
 keepTerminator and terminator are used for, there is no 
 explanation about them whatsoever in either byLine or 
 byLineAsync's documentation.

Again I looked at phobos to see what was current the style: http://dlang.org/phobos/std_stdio.html#byLine I'll include some more detail though.
 If any of the range return types of these functions have 
 non-standard functions on them, they need to be properly 
 documented. e.g. wait(Duration) is mentioned in passing in 
 byLineAsync's documentation, but no explanation for its purpose 
 is given. The documentation needs to explain everything the 
 programmer needs to know about the return types. For most 
 ranges, it's enough to say that the return type is a range of 
 X, but if the return type has anything other than the standard 
 range functions, then just saying that the function returns a 
 range of X is not enough. The extra functions must be listed 
 and explained.

This is what is currently documented: "If no data is available and the main thread access the range it will block until data becomes available. An exception to this is the $(D wait(Duration)) method on the range. This method will wait at maximum for the specified duration and return true if data is available." I guess that explains it?
 I'd warn you that the fact that the documentation for Protocol 
 is showing up in the documentation is likely a bug, since 
 Protocol is private - probably the bug that all templates are 
 currently public even if they're marked private. So, either 
 Protocol needs to be public (or maybe protected), or it's going 
 to disappear from the documentation eventually. Another option 
 would be to do something similar to 
 std.container.TotalContainer and declare a struct which is 
 intended only for documentation (probably in a version(StdDdoc) 
 block), and put all of the protocol documenation on _it_ 
 instead of the mixin.

Ok. didn't know about the private template bug. I really think it is a bug in ddoc that it should be able to insert mixed in docs ie. the Protocol documentation should be mixed into the Http documentation. Anyway - I don't see the TotalContainer docs showing in the generated html? Another workaround is to manually copy the Protocol docs into the HTTP/FTP/SMTP structs inside a version(StdDdoc).
 Also, Protocol needs a better explanation as to what it's 
 actually for. "Mixin template for all supported curl protocols" 
 doesn't really tell me anything other than the fact that it 
 relates to the supported protocols somehow. Looking at the 
 source, it seems clear enough, but not from the documentation - 
 especially when the documentation seems to imply that it's 
 specific to the Http struct.

I'll remove the HTTP hint and improve the doc.
 onReceiveHeader has const(char)[] parameters but mentions 
 char[] in its documentation. The meaning is clear enough, but 
 it would be more correct to say const(char)[] or to just 
 mention the variable names explicitly.

ok.
 It would be nice if Smtp.mailTo was variadic  - i.e. 
 mailTo(string[] recipients...). All of the current use cases 
 for it would be exactly the same with the addition of being 
 able to do something like mailTo(str1, str2, str3).

ok
 The exception type's constructors should look like this:

   nothrow this(string msg, string file = __FILE__, size_t line 
 = __LINE__, Throwable next = null)
   {
       super(msg, file, line, next);
   }

 As it stands, the line number will be completely wrong. You can 
 just copy std.string.StringException and rename it. And please 
 use that exact signature. Don't make it take a const(char)[]. 
 Exception requires a string, and with the constructor taking 
 const(char)[], it's going to needlessly idup every string that 
 gets passed into it. The code which calls it can idup if it has 
 a char[] instead of an immutable(int)[]. Having it take a 
 const(char)[] adds no value and is less efficient. The same 
 goes for any other function which is always going to idup its 
 const(char)[] argument. Just make it take a string.

ok
 Why do CURLoption and CURLcode  have CURL in uppercase? That 
 doesn't match the rest of the module. I'd argue that they 
 should be CurlOption and CurlCode. To make matters even 
 weirder, your examples appear to use CurlOption but the 
 declarations use CURLoption. They should be consistent.

ok Thanks for the comments. -Jonas
Dec 17 2011
prev sibling next sibling parent Justin C Calvarese <technocrat7+d gmail.com> writes:
--f46d04083875582ee704b44b4685
Content-Type: text/plain; charset=ISO-8859-1

On Sat, Dec 17, 2011 at 5:56 AM, jdrewsen <jdrewsen nospam.com> wrote:

 On Thursday, 15 December 2011 at 07:46:56 UTC, Jonathan M Davis wrote:

 ...



  I'd argue that the acronyms should be in all caps if camelcasing would
 require that the first letter of the acronym be capitalized and all lower
 case if camelcasing would require that the first letter of the acronym be
 lower case. We're not completely consistent in how we using acronyms in
 names in Phobos, but I believe that we primarily follow that rule when
 putting them in symbol names. So, for instance, it would be HTTP, FTP, and
 SMTP rather than Http, Ftp, and Smtp.

I like Http better as personal taste. But if HTTP is preferred then I'll do that. Actually I think I will make a pull request to extend the dlang.org/dstyle.html doc with and example that we can point to when someone asks about styling. I've spend too much time restyling because there is no such style doc already and people are complaining about style in reviews anyway.

I absolutely agree that the "D Style" page should be revised and extended to reflect the current standards that new Phobos modules are expected to adhere to. I found an open pull request that indicates a past effort in this regard: https://github.com/D-Programming-Language/d-programming-language.org/pull/16 (I thought that another document was also floating around that was a proposed new "Style Guide for Phobos", but I don't recall who authored it and I don't remember where it was located.) jcc7 --f46d04083875582ee704b44b4685 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On Sat, Dec 17, 2011 at 5:56 AM, jdrewsen <span dir=3D"ltr">&lt;<a href=3D"= mailto:jdrewsen nospam.com">jdrewsen nospam.com</a>&gt;</span> wrote:<br><d= iv class=3D"gmail_quote"><blockquote class=3D"gmail_quote" style=3D"margin:= 0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div class=3D"im">On Thursday, 15 December 2011 at 07:46:56 UTC, Jonathan M= Davis wrote:<br> </div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-l= eft:1px #ccc solid;padding-left:1ex"><div class=3D"im"><blockquote class=3D= "gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding= -left:1ex"> <br></blockquote></div></blockquote></blockquote><div>...=A0</div><blockquo= te class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc so= lid;padding-left:1ex"><div class=3D"im"> <br> <blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p= x #ccc solid;padding-left:1ex"> I&#39;d argue that the acronyms should be in all caps if camelcasing would = require that the first letter of the acronym be capitalized and all lower c= ase if camelcasing would require that the first letter of the acronym be lo= wer case. We&#39;re not completely consistent in how we using acronyms in n= ames in Phobos, but I believe that we primarily follow that rule when putti= ng them in symbol names. So, for instance, it would be HTTP, FTP, and SMTP = rather than Http, Ftp, and Smtp.<br> </blockquote> <br></div> I like Http better as personal taste. But if HTTP is preferred then I&#39;l= l do that. Actually I think I will make a pull request to extend the <a hre= f=3D"http://dlang.org/dstyle.html" target=3D"_blank">dlang.org/dstyle.html<= /a> doc with and example that we can point to when someone asks about styli= ng. I&#39;ve spend too much time restyling because there is no such style d= oc already and people are complaining about style in reviews anyway.</block= quote> <div><br></div><div>I absolutely agree that the &quot;D Style&quot; page sh= ould be revised and extended to reflect the current standards that new Phob= os modules are expected to adhere to.=A0</div><div><br></div><div>I found a= n open pull request that indicates a past effort in this regard:</div> <div><a href=3D"https://github.com/D-Programming-Language/d-programming-lan= guage.org/pull/16">https://github.com/D-Programming-Language/d-programming-= language.org/pull/16</a> </div><div><br></div><div>(I thought that another document was also floatin= g around that was a proposed new &quot;Style Guide for Phobos&quot;, but I = don&#39;t recall who authored it and I don&#39;t remember where it was loca= ted.)</div> <div><br></div><div>jcc7</div></div> --f46d04083875582ee704b44b4685--
Dec 17 2011
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
The docs for onReceiveHeader,

http://freeze.steamwinter.com/D/web/phobos/etc_curl.html#onReceiveHeader

explicitly says that the const string parameters are not valid 
after the function returns. This is all well and good; but a 
minor improvement would be to change the signature to:

 property void onReceiveHeader(void delegate(in char[] key, in 
char[] value) callback);

to add `scope` to the parameters. This documents the fact that 
the constant strings shouldn't be escaped as-is in both code and 
documentation. I suspect many other instances of `const(char)[]` 
could be changed to `in char[]` to better document how they use 
the parameters (I also personally think it looks better as it is 
more concise).
Dec 17 2011
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 17 December 2011 at 11:56:04 UTC, jdrewsen wrote:
 AutoConnect sounds like a command to connection automatically 
 which might be confusing since that is not what it does. 
 Therefore I went with AutoConnection which I still believe is 
 better.

How about something completely different then, like URLInfer? Also, some instances in the docs of "url" in lowercase should probably be changed to uppercase.
Dec 17 2011
prev sibling next sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Thursday, 15 December 2011 at 08:51:29 UTC, Jonathan M Davis 
wrote:
 Line# 235 is identical to line# 239. Shouldn't line# 235 be 
 creating an Http object, not an Ftp object? That mistake 
 definitely makes it look like download hasn't been properly 
 tested.

It has been tested as you can see by the unittest just below the function. It just did not fail because libcurl requires the same setup for this download situation for both ftp and http. Will fix it of course.
 You should create template similar to

 template isCurlConn(Conn)
 {
   auto isCurlConn = is(Conn : Http) || is(Conn : Ftp) || 
 is(Conn : AutoConnection);
 }

 It would reduce code duplication.

ok
 The functions which have a template parameter T = char really 
 need to have that parameter properly explained in the 
 documentation. From the documentation, I would have thought 
 that it was any character type, but it's char or ubyte. There 
 is no hint whatsoever in the documentation that ubyte would 
 work. All of the parameters need to be made clear. That goes 
 for all functions.

ok
 Is there a reason that the functions which have a template 
 argument T which can be either a char or a ubyte can't work 
 with immutable char? Glancing at _basicHttp, I don't see any 
 reason why T couldn't be immutable char. Yes, it would require 
 casting to immutable(char)[], but you're already casting to 
 char[], and the data being returned appears to be unique such 
 that it could be safely cast to immutable. That being the case, 
 I'd encourage you to not only make it work with immutable char 
 but to make immutable char the default instead of char.

Is is indeed unique and can be cast to immutable. I'll add that option. Why do you think immutable char as the default is better than char? I know that the return type in that case would be string and not char[] - but why is that better?
 Once you've fixed the exception types like I requested in my 
 first review, you can and should use 
 enforceEx!CurlException(cond, msg) instead of enforce(cond, new 
 CurlException(msg)).

ok
 it may not matter, since we're dealing with strings here, but 
 I'd argue that !empty should be used rather than length > 0 
 (e.g. line# 475). In stuff other than strings it definitely can 
 be more efficient, and even with strings, it may be, since I 
 think that checking whether the length is 0 (as empty does) is 
 slightly more efficient than checking whether it's greater than 
 0.

ok
 Please make sure that opening braces on on their own line. 
 That's the way that the rest of Phobos does it and it's one of 
 the few formatting rules that we've been trying to use 
 consistently throughout Phobos. For the most part, you get it 
 right, but not everywhere - nested functions in particular seem 
 to have braces on the same line for some reason.

When I look through the code it seems ok with regards to braces on own line. Maybe by nested functions you mean delegates? The delegates I use do indeed have braces on same line which is ok afaik.
 It's not a big deal, but you should use auto more. For 
 instance, lines #626, #638, and #640 don't use auto when they 
 could.

I'm a bit in doubt. On one hand it is great to make everything auto in order to make it easy to change type and to remove redundancy. One the other hand it is very convenient to be able to see the type when you read the code. I know that a clever editor would be able to figure out the type for me. But I also read code in normal editors for diffs etc. or on github. Anyway - I've changed as much as possible to auto now.
 I have no idea why you keep putting parens around uses of the 
 in operator, but it's not necessary and makes the code harder 
 to read IMHO. It's certainly not required that you change that, 
 but I'd appreciate it if you did.

Bad habit. Will remove.
 I see that regexes are used in the module. Please make sure 
 that they still work correctly with the new std.regex. They 
 probably do, but it's not 100% backwards compatible.

ok
 byLine's template constraint needs to verify that the types of 
 Terminator and Char are valid. As it is, I could try and pass 
 it something like an Http struct if I wanted to. In general, 
 _all_ templates need to verify that their arguments are of the 
 appropriate type for _all_ of their arguments.

ok
 It's odd that popFront (line #780) is not  safe when empty and 
 front are. Does findSplit or findSplitAfter prevent it?

exactly
 With a name like Char on byLine, I'd expect it to take any char 
 type, but not only do you not verify the type (as previously 
 mentioned), but you instantiate _basicHttp with it, which works 
 with char and ubyte, not any char. If Char is going to take 
 char and ubyte specifically, then Char is a bad name for it.

I'll verify the type. For byLine it will only accept char types.
 I'd suggest changing line# 823 to

 auto result = get(url, isFtpUrl(url) ? FTP() : Http());

Not possible. get() is a template function where the template parameter is connection type ie. one of FTP or HTTP. isFtpUrl(url) is evaluated at runtime and not compile time.
 It would save several lines of code.

 Line# 883 should be using +=.

ok
 This would be easier if I could comment on the code directly, 
 but I don't seem to be able to with the link that dsimcha 
 provided (probably because it's blob; there's probably another 
 link which would allow direct commenting, but oh well). I'll 
 comment more on the implementation later, but bed beckons...

 - Jonathan M Davis

Thanks for this second round of comments /Jonas
Dec 17 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/2/11 10:26 PM, dsimcha wrote:
 I volunteered ages ago to manage the review for the second round of
 Jonas Drewsen's CURL wrapper. After the first round it was decided that,
 after a large number of minor issues were fixed, a second round would be
 necessary.

 Significant open issues:

 1. Should libcurl be bundled with DMD on Windows?

Yes.
 2. etc.curl, std.curl, or std.net.curl? (We had a vote a while back but
 it was buried deep in a thread and a lot of people may have missed it:
 http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

std.net.curl
 Code: https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
 Docs: http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

I have a few comments. They are not critical for library's admission (apologies I didn't make the deadline) and could be integrated before the commit. 1. Tables must be centered (this is a note to self) 2. Replace d-p-l.org in examples with dlang.org 3. "Compared to using libcurl directly this module provides a simpler API for performing common tasks." -> "Compared to using libcurl directly this module allows simpler client code for common uses, requires no unsafe operations, and integrates better with the rest of the language." 4. From the high level ops it seems there's no way to issue a PUT/POST and then direct the result to a file, or read the result with a foreach. 5. Also there seems to be no way to issue a PUT/POST from a range. 6. s/Examples:/Example:/ 7. s/HTTP http = HTTP("www.d-p-l.org");/auto http = HTTP("www.d-p-l.org");/ 8. "receives a header or a data buffer" -> "receives a header and a data buffer, respectively" 9. "In this simple example, the headers are writting to stdout and the data is ignored." How can we make it to stop the request? A sentence about that would be great. 10. "Finally the HTTP request is started by calling perform()." -> "Finally the HTTP request is effected by calling perform(), which is synchronous." 11. Use InferredProtocol instead of AutoConnect(ion). The type is exactly what it claims. If "inferred" is too long, "AutoProtocol" would be fine too. But it's not the connection that is automated. The word must not be a verb, i.e. InferProtocol/DeduceProtocol would not be good. 12. The example for InferredProtocol nee AutoConnection does not show any interesting example, e.g. ftp. 13. I think download() and upload() are good convenience functions but are very limited. Such operations can last arbitrarily long and it's impossible to stop them. Therefore, they will foster poor program design (e.g. program hangs until you kill it etc). We should keep these for convenience in scripts, and we already have byXxx() for downloading, but we don't have anything for uploading. We should offer output ranges for uploading sync/async. Maybe in the next minor release. 14. Parameter doc for put(), post(), del() etc. is messed up. 15. I don't think "connection" is the right term. For example we have T[] get(Conn = AutoConnection, T = char)(const(char)[] url, Conn conn = Conn()); The connection is established only during the call to get(), so conn is not really a connection - more like a sort of a bundle of connection parameters. But then it does clean up the connection once established, so I'm unsure... 16. For someone who'd not versed in HTTP options, the example string content = options("d-programming-language.appspot.com/testUrl2", "something"); isn't terribly informative. 17. In byLine: "A range of lines is returned when the request is complete." This suggests the last byte has been read as the client gets to the first, which is not the case. "Data will be read on demand as the foreach loop makes progress." Similar for byChunk. 18. In byXxxAsync wait(Duration) should be cross-referenced to its definition. ========================= Overall: I think this is a valuable addition to Phobos, but I have the feeling we don't have a good scenario for interrupting connections. For example, if someone wants to offer a "Cancel" button, the current API does not give them a robust option to do so: all functions that transfer data are potentially blocking, and there's no thread-shared way to interrupt a connection in a thread from another thread. Such functionality may be a pure addition later, so I think we can commit this as is. Please use std.net.curl. Thanks, Andrei
Dec 17 2011
parent "jdrewsen" <jdrewsen nospam.com> writes:
On Saturday, 17 December 2011 at 22:25:07 UTC, Andrei 
Alexandrescu wrote:
 On 12/2/11 10:26 PM, dsimcha wrote:
 I volunteered ages ago to manage the review for the second 
 round of
 Jonas Drewsen's CURL wrapper. After the first round it was 
 decided that,
 after a large number of minor issues were fixed, a second 
 round would be
 necessary.

 Significant open issues:

 1. Should libcurl be bundled with DMD on Windows?

Yes.
 2. etc.curl, std.curl, or std.net.curl? (We had a vote a while 
 back but
 it was buried deep in a thread and a lot of people may have 
 missed it:
 http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

std.net.curl

ok
 Code: 
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
 Docs: http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

I have a few comments. They are not critical for library's admission (apologies I didn't make the deadline) and could be integrated before the commit. 1. Tables must be centered (this is a note to self) 2. Replace d-p-l.org in examples with dlang.org

ok
 3. "Compared to using libcurl directly this module provides a 
 simpler API for performing common tasks." -> "Compared to using 
 libcurl directly this module allows simpler client code for 
 common uses, requires no unsafe operations, and integrates 
 better with the rest of the language."

ok
 4. From the high level ops it seems there's no way to issue a 
 PUT/POST and then direct the result to a file, or read the 
 result with a foreach.

Use byLine/Chunk with a connection initialized with post/put data: auto http = HTTP(); http.postData = "hello world"; foreach (line; byLine("dlang.org/there", KeepTerminator.no, '\n', http)) writeln(line); Same deal for directing to file using download()
 5. Also there seems to be no way to issue a PUT/POST from a 
 range.

Correct. It will be a later extension.
 6. s/Examples:/Example:/

ok
 7. s/HTTP http = HTTP("www.d-p-l.org");/auto http = 
 HTTP("www.d-p-l.org");/

ok
 8. "receives a header or a data buffer" -> "receives a header 
 and a data buffer, respectively"

ok
 9. "In this simple example, the headers are writting to stdout 
 and the data is ignored." How can we make it to stop the 
 request? A sentence about that would be great.

ok
 10. "Finally the HTTP request is started by calling perform()." 
 -> "Finally the HTTP request is effected by calling perform(), 
 which is synchronous."

ok
 11. Use InferredProtocol instead of AutoConnect(ion). The type 
 is exactly what it claims. If "inferred" is too long, 
 "AutoProtocol" would be fine too. But it's not the connection 
 that is automated. The word must not be a verb, i.e. 
 InferProtocol/DeduceProtocol would not be good.

ok
 12. The example for InferredProtocol nee AutoConnection does 
 not show any interesting example, e.g. ftp.

improved
 13. I think download() and upload() are good convenience 
 functions but are very limited. Such operations can last 
 arbitrarily long and it's impossible to stop them. Therefore, 
 they will foster poor program design (e.g. program hangs until 
 you kill it etc). We should keep these for convenience in 
 scripts, and we already have byXxx() for downloading, but we 
 don't have anything for uploading. We should offer output 
 ranges for uploading sync/async. Maybe in the next minor 
 release.

Using the same method as for post it is possible to set timeouts. auto http = HTTP(); http.dataTimeout = dur!"minutes"(2); foreach (line; byLine("dlang.org/there", KeepTerminator.no, '\n', http)) writeln(line); Regarding uploading ranges this will be handled by the same extension as for POST/PUT output ranges.
 14. Parameter doc for put(), post(), del() etc. is messed up.

ok
 15. I don't think "connection" is the right term. For example 
 we have

 T[] get(Conn = AutoConnection, T = char)(const(char)[] url, 
 Conn conn = Conn());

 The connection is established only during the call to get(), so 
 conn is not really a connection - more like a sort of a bundle 
 of connection parameters. But then it does clean up the 
 connection once established, so I'm unsure...

But you can provide an existing connection to use has that is already connected to the server. Hence the name connection.
 16. For someone who'd not versed in HTTP options, the example

 string content = 
 options("d-programming-language.appspot.com/testUrl2", 
 "something");

 isn't terribly informative.

ok - improved
 17. In byLine: "A range of lines is returned when the request 
 is complete." This suggests the last byte has been read as the 
 client gets to the first, which is not the case. "Data will be 
 read on demand as the foreach loop makes progress." Similar for 
 byChunk.

This will change when multi API is supported as well.
 18. In byXxxAsync wait(Duration) should be cross-referenced to 
 its definition.

ok
 =========================

 Overall: I think this is a valuable addition to Phobos, but I 
 have the feeling we don't have a good scenario for interrupting 
 connections. For example, if someone wants to offer a "Cancel" 
 button, the current API does not give them a robust option to 
 do so: all functions that transfer data are potentially 
 blocking, and there's no thread-shared way to interrupt a 
 connection in a thread from another thread.

 Such functionality may be a pure addition later, so I think we 
 can commit this as is. Please use std.net.curl.

They can abort from the callback handlers - but otherwise it is a limitation of the libcurl easy API. To actually do this the multi extension would have to be added. Thank you for the comments and sorry for the late reply. /Jonas
Dec 30 2011
prev sibling next sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Saturday, 17 December 2011 at 20:38:46 UTC, Jakob Ovrum wrote:
 On Saturday, 17 December 2011 at 11:56:04 UTC, jdrewsen wrote:
 AutoConnect sounds like a command to connection automatically 
 which might be confusing since that is not what it does. 
 Therefore I went with AutoConnection which I still believe is 
 better.

How about something completely different then, like URLInfer? Also, some instances in the docs of "url" in lowercase should probably be changed to uppercase.

Will change case. Don't know if the URLInfer name is better though. /Jonas
Dec 17 2011
prev sibling next sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 17 December 2011 at 22:25:10 UTC, jdrewsen wrote:
 On Saturday, 17 December 2011 at 20:38:46 UTC, Jakob Ovrum 
 wrote:
 On Saturday, 17 December 2011 at 11:56:04 UTC, jdrewsen wrote:
 AutoConnect sounds like a command to connection automatically 
 which might be confusing since that is not what it does. 
 Therefore I went with AutoConnection which I still believe is 
 better.

How about something completely different then, like URLInfer? Also, some instances in the docs of "url" in lowercase should probably be changed to uppercase.

Will change case. Don't know if the URLInfer name is better though. /Jonas

Andrei is right, a noun is better. I'm with InferredProtocol.
Dec 17 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 17, 2011 12:56:02 jdrewsen wrote:
 On Thursday, 15 December 2011 at 07:46:56 UTC, Jonathan M Davis
 
 I'd argue that the acronyms should be in all caps if
 camelcasing would require that the first letter of the acronym
 be capitalized and all lower case if camelcasing would require
 that the first letter of the acronym be lower case. We're not
 completely consistent in how we using acronyms in names in
 Phobos, but I believe that we primarily follow that rule when
 putting them in symbol names. So, for instance, it would be
 HTTP, FTP, and SMTP rather than Http, Ftp, and Smtp.

I like Http better as personal taste. But if HTTP is preferred then I'll do that. Actually I think I will make a pull request to extend the dlang.org/dstyle.html doc with and example that we can point to when someone asks about styling. I've spend too much time restyling because there is no such style doc already and people are complaining about style in reviews anyway.

I actually have such a pull request, but it's been languishing because there are some items in it that need to be discussed. I keep forgetting to bring them up in the Phobos news group so that they can be decided and we can move on with that. https://github.com/D-Programming-Language/d-programming-language.org/pull/16
 I'd rename T to C or Char in the template parameters, since
 it's a character type rather than a generic type. Not a huge
 deal, but I think that it makes the code quicker to understand.

I've named it T because it can also be ubyte. In that case C for Char is confusing.

Yeah. I got that after reading the source, but the needs to be clearer in the documentation. Normally, when something is templated the way that that is, it's going to be templated on character type. And something which can be either ubyte or char but nothing else is rather abnormal. It needs to be clearer. And it's possible that at some point, it really should be changed to be any character type and ubyte rather than just char or ubyte (depending on how that affect efficiency - basically if it's more efficient to make it work with dchar from the get-go rather then get a string and convert it, then it's more desirable to make it work with wchar and dchar, but if you're just going to have to duplicate it or convert it anyway, then you might as well just restrict it to char and ubyte).
 Why are some parameters const(char)[] and others string? Why
 not make them all const(char)[] or even actually templatize
 them on character type (one per parameter which is a string)?

The parameters that are const(char)[] will accept both string and char[] which is what I want because libcurl make an internal copy of it anyway. If I typed it as string then you would have to idup a char[] for the parameter without any reason. The parameters that are string is internally passed to functions (in other std modules) that only accept strings. I could accept const(char)[] and idup it in order to have consistant parameter types but that gives an unnecessary overhead.

Okay. In general, functions should either take string or take a range of dchar. Taking a range of dchar is the most flexible, so it's generally the best (though this often results in specializations within the template for narrow strings - std.algorithm does that a lot). If you need a string spefically (e.g. all of the std.string functions specifically operate on strings), then taking an array that's templated on character type is generally better, because it's more flexible. If you need a string internally, then do _not_ use const(char)[]. or const(C) []. When you do that, you're forced to idup the string (or to!string is forced to idup it). It's far better to either take string (in which case the caller will idup if it's necessary) or to not use const. That way, the string is copied only when it actually needs to be copied. That does sometimes result in functions which take string for some parameters and const(char)[] for others, which looks odd to the programmer calling it, but that's life I guess. It's less of an issue when ranges are used, however, because then they're all ranges and it's just how theyre treated internally, I guess. I don't know how well you've done with this in general without looking over the code again. I know that you've generally taken const(char)[] instead of string, and sometimes you've iduped that. And those parameters needs to be fixed to take string and use to!string so that the iduping isn't always necessary. Also, since in most cases, you're ultimately passing the strings to the C curl API, the benefit of operating on ranges of dchar instead of strings is debatable, so it's not necessarily a problem that you're not taking ranges of dchar. You do, however, need to make sure that you use string rather than const(char)[] in places where you're iduping so that we can avoid such unnecessary allocations.
 Shorten the URL in your examples. Its long length increases the
 risk of making the examples too wide. It's a made up URL, so
 there's no need for it to be that long. Something like
 foo.bar.com is probably more than enough.

Actually the long urls point to a google appspot app that fits the example. I've done it this way so that people can just copy/paste the example and have something working. Not too many people have a test HTTP POST server setup for a quick test AFAIK.

I thought that it was fake. Well, if you can, use a shorter URL. Like dlang instead of d-programming-language now that we havethat domain.
 Why does byLine use '\x0a' instead of '\n'? '\n' is clear,
 whereas most people will have to look up what '\x0a' is, so I
 really think that it should be '\n'.

I looked at how it was done elsewhere in phobos and did like that: http://dlang.org/phobos/std_stdio.html#byLine But I agree with you that \n is better and will change it.

std.stdio should be changed then IMHO.
 In general, the functions should do a better job of documenting
 their parameters. Many of them don't document their parameters
 at all. For instance, while it's fairly easy to guess what
 keepTerminator and terminator are used for, there is no
 explanation about them whatsoever in either byLine or
 byLineAsync's documentation.

Again I looked at phobos to see what was current the style: http://dlang.org/phobos/std_stdio.html#byLine I'll include some more detail though.

Phobos' documentation can definitely be improved in some cases, and I think that this is one of them.
 If any of the range return types of these functions have
 non-standard functions on them, they need to be properly
 documented. e.g. wait(Duration) is mentioned in passing in
 byLineAsync's documentation, but no explanation for its purpose
 is given. The documentation needs to explain everything the
 programmer needs to know about the return types. For most
 ranges, it's enough to say that the return type is a range of
 X, but if the return type has anything other than the standard
 range functions, then just saying that the function returns a
 range of X is not enough. The extra functions must be listed
 and explained.

This is what is currently documented: "If no data is available and the main thread access the range it will block until data becomes available. An exception to this is the $(D wait(Duration)) method on the range. This method will wait at maximum for the specified duration and return true if data is available." I guess that explains it?

I really think that you need to state specifically that the returned range has an the additional wait(Duration) function on it and explain directly what it's for and how it's used. Also, put it in the example. The way it states it by saying "An exception to this is..." makes the fact that wait(Duration) is mentioned at all seem like an afterthought, and while it does give some explanation for it, I really don't think that it's clear enough how it's used and what it does. It's enough to start to get the idea, but not enough to actually use it. An example would probably be the biggest help in that regard, but I do think that the function needs to be explicitly documented with a description of what it does rather than seemingly mentioning it as an afterthought - especially since it also raises the question of what other non- standard functions the range has that aren't mentioned.
 I'd warn you that the fact that the documentation for Protocol
 is showing up in the documentation is likely a bug, since
 Protocol is private - probably the bug that all templates are
 currently public even if they're marked private. So, either
 Protocol needs to be public (or maybe protected), or it's going
 to disappear from the documentation eventually. Another option
 would be to do something similar to
 std.container.TotalContainer and declare a struct which is
 intended only for documentation (probably in a version(StdDdoc)
 block), and put all of the protocol documenation on _it_
 instead of the mixin.

Ok. didn't know about the private template bug. I really think it is a bug in ddoc that it should be able to insert mixed in docs ie. the Protocol documentation should be mixed into the Http documentation. Anyway - I don't see the TotalContainer docs showing in the generated html? Another workaround is to manually copy the Protocol docs into the HTTP/FTP/SMTP structs inside a version(StdDdoc).

We have at least 2 bugs here. One, that templates are always public, and two, that the documentation isn't mixed in. As for TotalContainer, you appear to be right. I'm in such a habit of reading the code rather than the docs, that I missed that. It's using /* rather than /**, so it doesn't end up in the ddoc. The best would be if the functions were on their approriate types with StdDoc, but at present, that means duplicating all of that documentation, which would kind of suck. So, it's up to you. From the perspective of those reading the documentation, it would definitely be better if it were on the appropriate types though, so I would be inclined to argue that they should go on the types and that we can remove them when the mixin bug is fixed, but I don't know what others would think about that. At minimum, it needs to be more clearly explained that Protocol is a template mixin that is being mixed into Http, Ftp, and Smtp, and those have all of Protocol's functions. - Jonathan M Davis
Dec 17 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 17, 2011 23:10:00 jdrewsen wrote:
 On Thursday, 15 December 2011 at 08:51:29 UTC, Jonathan M Davis
 
 wrote:
 Line# 235 is identical to line# 239. Shouldn't line# 235 be
 creating an Http object, not an Ftp object? That mistake
 definitely makes it look like download hasn't been properly
 tested.

It has been tested as you can see by the unittest just below the function. It just did not fail because libcurl requires the same setup for this download situation for both ftp and http. Will fix it of course.

Well, when there's code like that that is clearly wrong, it makes it look like it's either not being tested by the unit tests or that the unit tests haven't been run. I guess that this is just a case where the unit tests are unable to catch the problem.
 Is there a reason that the functions which have a template
 argument T which can be either a char or a ubyte can't work
 with immutable char? Glancing at _basicHttp, I don't see any
 reason why T couldn't be immutable char. Yes, it would require
 casting to immutable(char)[], but you're already casting to
 char[], and the data being returned appears to be unique such
 that it could be safely cast to immutable. That being the case,
 I'd encourage you to not only make it work with immutable char
 but to make immutable char the default instead of char.

Is is indeed unique and can be cast to immutable. I'll add that option. Why do you think immutable char as the default is better than char? I know that the return type in that case would be string and not char[] - but why is that better?

strings can't be sliced with impunity, because their elements are immutable. With char[] and const(char)[], you have to worry about the elements changing, so you're often forced to duplicate the string rather than simply slice it. As such, string is far preferrable to char[] or const(char)[] in the general case. It's even better when there's a choice between them, but if you have to choose, you should almost always choose string (there are exceptions though - e.g. buffers like in std.stdio.ByLine where it reuses its char[] buffer rather than allocating a new one).
 Please make sure that opening braces on on their own line.
 That's the way that the rest of Phobos does it and it's one of
 the few formatting rules that we've been trying to use
 consistently throughout Phobos. For the most part, you get it
 right, but not everywhere - nested functions in particular seem
 to have braces on the same line for some reason.

When I look through the code it seems ok with regards to braces on own line. Maybe by nested functions you mean delegates? The delegates I use do indeed have braces on same line which is ok afaik.

In general, just make sure that braces are on their own line unless you're dealing with a one line lambda or something similar. You do have nested functions or delegates which don't do that.
 It's not a big deal, but you should use auto more. For
 instance, lines #626, #638, and #640 don't use auto when they
 could.

I'm a bit in doubt. On one hand it is great to make everything auto in order to make it easy to change type and to remove redundancy. One the other hand it is very convenient to be able to see the type when you read the code. I know that a clever editor would be able to figure out the type for me. But I also read code in normal editors for diffs etc. or on github. Anyway - I've changed as much as possible to auto now.

Just in general, it's best practice to use auto unless you can't. There _are_ exceptions to that rule, but that's almost always the way that it should be.
 I'd suggest changing line# 823 to
 
 auto result = get(url, isFtpUrl(url) ? FTP() : Http());

Not possible. get() is a template function where the template parameter is connection type ie. one of FTP or HTTP. isFtpUrl(url) is evaluated at runtime and not compile time.

Yeah. I missed that. Stuff like that makes me wish that we had some kind of static ternary operator, but it would probably complicate the language too much. In some cases though, you might be able to use alias to fix the problem (static if the alias and then use the alias in the function call so that the entire function call isn't duplicated). - Jonathan M Davis
Dec 17 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 17, 2011 23:51:08 Jakob Ovrum wrote:
 On Saturday, 17 December 2011 at 22:25:10 UTC, jdrewsen wrote:
 On Saturday, 17 December 2011 at 20:38:46 UTC, Jakob Ovrum
 
 wrote:
 On Saturday, 17 December 2011 at 11:56:04 UTC, jdrewsen wrote:
 AutoConnect sounds like a command to connection automatically
 which might be confusing since that is not what it does.
 Therefore I went with AutoConnection which I still believe is
 better.

How about something completely different then, like URLInfer? Also, some instances in the docs of "url" in lowercase should probably be changed to uppercase.

Will change case. Don't know if the URLInfer name is better though. /Jonas

Andrei is right, a noun is better. I'm with InferredProtocol.

You inferred protocol is better. Still annoyingly long, but it's a clearer name. - JonathnM m Davis
Dec 17 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, December 17, 2011 09:21:42 Justin C Calvarese wrote:
 On Sat, Dec 17, 2011 at 5:56 AM, jdrewsen <jdrewsen nospam.com> wrote:
 On Thursday, 15 December 2011 at 07:46:56 UTC, Jonathan M Davis wrote:
 ...


I'd argue that the acronyms should be in all caps if camelcasing would
 require that the first letter of the acronym be capitalized and all
 lower case if camelcasing would require that the first letter of the
 acronym be lower case. We're not completely consistent in how we
 using acronyms in names in Phobos, but I believe that we primarily
 follow that rule when putting them in symbol names. So, for instance,
 it would be HTTP, FTP, and SMTP rather than Http, Ftp, and Smtp.

I like Http better as personal taste. But if HTTP is preferred then I'll do that. Actually I think I will make a pull request to extend the dlang.org/dstyle.html doc with and example that we can point to when someone asks about styling. I've spend too much time restyling because there is no such style doc already and people are complaining about style in reviews anyway.

I absolutely agree that the "D Style" page should be revised and extended to reflect the current standards that new Phobos modules are expected to adhere to. I found an open pull request that indicates a past effort in this regard: https://github.com/D-Programming-Language/d-programming-language.org/pull/16 (I thought that another document was also floating around that was a proposed new "Style Guide for Phobos", but I don't recall who authored it and I don't remember where it was located.)

Yeah. I need to get back to that. Portions of it were completely agreed upon and others need further discussion, and I keep forgetting to get those discussed among the Phobos devs and get those questions settled. - Jonathan m Davis
Dec 17 2011
prev sibling next sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Sunday, 18 December 2011 at 01:30:10 UTC, Jonathan M Davis 
wrote:
 On Saturday, December 17, 2011 09:21:42 Justin C Calvarese 
 wrote:
 On Sat, Dec 17, 2011 at 5:56 AM, jdrewsen 
 <jdrewsen nospam.com> wrote:
 On Thursday, 15 December 2011 at 07:46:56 UTC, Jonathan M 
 Davis wrote:
 ...


I'd argue that the acronyms should be in all caps if camelcasing would
 require that the first letter of the acronym be capitalized 
 and all
 lower case if camelcasing would require that the first 
 letter of the
 acronym be lower case. We're not completely consistent in 
 how we
 using acronyms in names in Phobos, but I believe that we 
 primarily
 follow that rule when putting them in symbol names. So, for 
 instance,
 it would be HTTP, FTP, and SMTP rather than Http, Ftp, and 
 Smtp.

I like Http better as personal taste. But if HTTP is preferred then I'll do that. Actually I think I will make a pull request to extend the dlang.org/dstyle.html doc with and example that we can point to when someone asks about styling. I've spend too much time restyling because there is no such style doc already and people are complaining about style in reviews anyway.

I absolutely agree that the "D Style" page should be revised and extended to reflect the current standards that new Phobos modules are expected to adhere to. I found an open pull request that indicates a past effort in this regard: https://github.com/D-Programming-Language/d-programming-language.org/pull/16 (I thought that another document was also floating around that was a proposed new "Style Guide for Phobos", but I don't recall who authored it and I don't remember where it was located.)

Yeah. I need to get back to that. Portions of it were completely agreed upon and others need further discussion, and I keep forgetting to get those discussed among the Phobos devs and get those questions settled. - Jonathan m Davis

Nice to see that something is in progress. Please also add a note about acronyms in names then. /Jonas
Dec 18 2011
prev sibling next sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Sunday, 18 December 2011 at 01:27:45 UTC, Jonathan M Davis 
wrote:
 On Saturday, December 17, 2011 23:10:00 jdrewsen wrote:
 On Thursday, 15 December 2011 at 08:51:29 UTC, Jonathan M Davis
 
 wrote:
 Line# 235 is identical to line# 239. Shouldn't line# 235 be
 creating an Http object, not an Ftp object? That mistake
 definitely makes it look like download hasn't been properly
 tested.

It has been tested as you can see by the unittest just below the function. It just did not fail because libcurl requires the same setup for this download situation for both ftp and http. Will fix it of course.

Well, when there's code like that that is clearly wrong, it makes it look like it's either not being tested by the unit tests or that the unit tests haven't been run. I guess that this is just a case where the unit tests are unable to catch the problem.
 Is there a reason that the functions which have a template
 argument T which can be either a char or a ubyte can't work
 with immutable char? Glancing at _basicHttp, I don't see any
 reason why T couldn't be immutable char. Yes, it would 
 require
 casting to immutable(char)[], but you're already casting to
 char[], and the data being returned appears to be unique such
 that it could be safely cast to immutable. That being the 
 case,
 I'd encourage you to not only make it work with immutable 
 char
 but to make immutable char the default instead of char.

Is is indeed unique and can be cast to immutable. I'll add that option. Why do you think immutable char as the default is better than char? I know that the return type in that case would be string and not char[] - but why is that better?

strings can't be sliced with impunity, because their elements are immutable. With char[] and const(char)[], you have to worry about the elements changing, so you're often forced to duplicate the string rather than simply slice it. As such, string is far preferrable to char[] or const(char)[] in the general case. It's even better when there's a choice between them, but if you have to choose, you should almost always choose string (there are exceptions though - e.g. buffers like in std.stdio.ByLine where it reuses its char[] buffer rather than allocating a new one).
 Please make sure that opening braces on on their own line.
 That's the way that the rest of Phobos does it and it's one 
 of
 the few formatting rules that we've been trying to use
 consistently throughout Phobos. For the most part, you get it
 right, but not everywhere - nested functions in particular 
 seem
 to have braces on the same line for some reason.

When I look through the code it seems ok with regards to braces on own line. Maybe by nested functions you mean delegates? The delegates I use do indeed have braces on same line which is ok afaik.

In general, just make sure that braces are on their own line unless you're dealing with a one line lambda or something similar. You do have nested functions or delegates which don't do that.
 It's not a big deal, but you should use auto more. For
 instance, lines #626, #638, and #640 don't use auto when they
 could.

I'm a bit in doubt. On one hand it is great to make everything auto in order to make it easy to change type and to remove redundancy. One the other hand it is very convenient to be able to see the type when you read the code. I know that a clever editor would be able to figure out the type for me. But I also read code in normal editors for diffs etc. or on github. Anyway - I've changed as much as possible to auto now.

Just in general, it's best practice to use auto unless you can't. There _are_ exceptions to that rule, but that's almost always the way that it should be.
 I'd suggest changing line# 823 to
 
 auto result = get(url, isFtpUrl(url) ? FTP() : Http());

Not possible. get() is a template function where the template parameter is connection type ie. one of FTP or HTTP. isFtpUrl(url) is evaluated at runtime and not compile time.

Yeah. I missed that. Stuff like that makes me wish that we had some kind of static ternary operator, but it would probably complicate the language too much. In some cases though, you might be able to use alias to fix the problem (static if the alias and then use the alias in the function call so that the entire function call isn't duplicated).

A static ternary operator wouldn't work in this case since isFtpUrl(url) cannot be evaluated at compile time. /Jonas
Dec 18 2011
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, December 18, 2011 21:49:16 jdrewsen wrote:
 A static ternary operator wouldn't work in this case since
 isFtpUrl(url) cannot be evaluated at compile time.

Ah. That would be true. Still, it's the sort of thing which begs for a ternary operator but which just can't use it due to technical limitations in the language. It may be completely impractical to try and overcome those technical limitations, but it's still frustrating sometimes. Still, at least we _have_ static if, unlike C++. - Jonathan M Davis
Dec 18 2011
prev sibling next sibling parent reply Somedude <lovelydear mailmetrash.com> writes:
Le 03/12/2011 05:26, dsimcha a crit :
 I volunteered ages ago to manage the review for the second round of
 Jonas Drewsen's CURL wrapper.  After the first round it was decided
 that, after a large number of minor issues were fixed, a second round
 would be necessary.
 
 Significant open issues:
 
 1.  Should libcurl be bundled with DMD on Windows?
 
 2.  etc.curl, std.curl, or std.net.curl?  (We had a vote a while back
 but it was buried deep in a thread and a lot of people may have missed
 it:  http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )
 
 Code: https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
 Docs: http://freeze.steamwinter.com/D/web/phobos/etc_curl.html
 
 For those of you on Windows, a libcurl binary built by DMC is available
 at http://gool.googlecode.com/files/libcurl_7.21.7.zip.
 
 Review starts now and ends on December 16, followed by one week of voting.

A bit late, but following Andrei's important remark that some operations are blocking, I also noticed that curl-multi API is not covered. http://curl.haxx.se/libcurl/c/libcurl-multi.html Is there a reason why ? Would it break the current API to add it later ?
Dec 19 2011
next sibling parent Somedude <lovelydear mailmetrash.com> writes:
Le 19/12/2011 19:05, Somedude a crit :
 Le 03/12/2011 05:26, dsimcha a crit :
 I volunteered ages ago to manage the review for the second round of
 Jonas Drewsen's CURL wrapper.  After the first round it was decided
 that, after a large number of minor issues were fixed, a second round
 would be necessary.

 Significant open issues:

 1.  Should libcurl be bundled with DMD on Windows?

 2.  etc.curl, std.curl, or std.net.curl?  (We had a vote a while back
 but it was buried deep in a thread and a lot of people may have missed
 it:  http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )

 Code: https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
 Docs: http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 For those of you on Windows, a libcurl binary built by DMC is available
 at http://gool.googlecode.com/files/libcurl_7.21.7.zip.

 Review starts now and ends on December 16, followed by one week of voting.

A bit late, but following Andrei's important remark that some operations are blocking, I also noticed that curl-multi API is not covered. http://curl.haxx.se/libcurl/c/libcurl-multi.html Is there a reason why ? Would it break the current API to add it later ?

It looks like curl_multi_perform's behavior is more interesting than curl_easy_perform. * From the tutorial: http://curl.haxx.se/libcurl/c/libcurl-tutorial.html "curl_multi_perform(3) is asynchronous. It will only execute as little as possible and then return back control to your program. It is designed to never block. If it returns CURLM_CALL_MULTI_PERFORM you better call it again soon, as that is a signal that it still has local data to send or remote data to receive. "
Dec 19 2011
prev sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Monday, 19 December 2011 at 18:14:29 UTC, Somedude wrote:
 Le 19/12/2011 19:05, Somedude a écrit :
 Le 03/12/2011 05:26, dsimcha a écrit :
 I volunteered ages ago to manage the review for the second 
 round of
 Jonas Drewsen's CURL wrapper.  After the first round it was 
 decided
 that, after a large number of minor issues were fixed, a 
 second round
 would be necessary.

 Significant open issues:

 1.  Should libcurl be bundled with DMD on Windows?

 2.  etc.curl, std.curl, or std.net.curl?  (We had a vote a 
 while back
 but it was buried deep in a thread and a lot of people may 
 have missed
 it:  
 http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab 
 )

 Code: 
 https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
 Docs: http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 For those of you on Windows, a libcurl binary built by DMC is 
 available
 at http://gool.googlecode.com/files/libcurl_7.21.7.zip.

 Review starts now and ends on December 16, followed by one 
 week of voting.

A bit late, but following Andrei's important remark that some operations are blocking, I also noticed that curl-multi API is not covered. http://curl.haxx.se/libcurl/c/libcurl-multi.html Is there a reason why ? Would it break the current API to add it later ?

It looks like curl_multi_perform's behavior is more interesting than curl_easy_perform. * From the tutorial: http://curl.haxx.se/libcurl/c/libcurl-tutorial.html "curl_multi_perform(3) is asynchronous. It will only execute as little as possible and then return back control to your program. It is designed to never block. If it returns CURLM_CALL_MULTI_PERFORM you better call it again soon, as that is a signal that it still has local data to send or remote data to receive. "

I am aware of the multi_perform API. The multi API actually uses easy handles and therefore it would be natural to extend the wrapper with multi support later on. It will allow for read() and write() methods on the HTTP/FTP structs as well. For now I want to limit the scope and stick to the easy handles. /Jonas
Dec 30 2011
prev sibling next sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Friday, 16 December 2011 at 09:42:50 UTC, Jonathan M Davis 
wrote:
 Please make sure that you remove trailing whitespace from the 
 file. A lot of the lines have trailing whitespace. Also, make 
 sure that you don't have any tabs in the file. There are a few 
 places where you used tabs.

ok
 Line# 916 claims that the code there won't work and that it 
 needs to be fixed, so please fix it before it goes into Phobos 
 (assuming that it's voted in).

ok
 On line# 1051, just do

 auto arr = new Char[](bufferSize);

 There's no reason to create it and then assign to its length 
 property.

ok
 The braces in the block follow line# 1131 need to be fixed. The 
 indentation is wrong on some of them, and not all of them are 
 on their own line.

ok
 You should see if the cast(void[])'s can be removed from 
 byLineAsync. They might still be necessary, but null was given 
 a type with the last release, so it might work without the cast 
 now. The same goes with byChunkAsync.

Still doesn't work.
 Same with line# 1215 as 1051. There's no point in wasting a 
 line of code by declaring an array and the assigning to its 
 length variable. The code is probably slightly less efficient 
 that way too. Just allocate a new array of the correct length.

ok
 On line#1297, the braces need to be fixed. Same with line# 1308 
 and line# 1334. Just check over your braces in general to make 
 sure that they're consistent and are on their own line unless 
 you're dealing with something like a one line lambda.

ok
 And as I mentioned before, all of these enforces which take a 
 new CurlException, should be changed to use enforceEx after 
 you've fixed CurlException's constructor.

ok
 The documentation on Protocol's isValid is wrong. It claims 
 that isValid is true if the instance is stoppend and invalid. 
 Wouldn't that mean that the object _wasn't_ valid?

ok
 There are quite a few places where you're concatenating several 
 strings and format would make the code much cleaner (e.g. 
 Protocol's netInterface function). Unless you're avoiding 
 format in an attempt to allow functions to be pure (since 
 unfortunately, format can't currently be pure), you really 
 should be using format.

ok
 I'd argue that it's better to use empty than check whether a 
 string is equal to "". e.g. domain.empty instead of domain != 
 "" on line# 1538.

ok
 There bodies of Http's constructor, static opCall, and dup 
 functions are all very similar - especially the constructor and 
 static opCall. You should look at refactoring those so that 
 they don't have to duplicate so much code. Maybe it can't be 
 reasonably done easily, but it would certainly be better if 
 that code duplication could be reduced.

ok
 Change setTimeCondition to take a SysTime. DateTime is not 
 intended for timestamps. It's intended for calendar-based time, 
 not the system time. Also, SysTime makes it easier to convert 
 to time_t. In general, the fact that you're looking to deal 
 with a time_t is a sign that you should be using SysTime. This 
 is especially true, since DateTime is going to give the wrong 
 time_t in general, since it has no time zone. As it stands, the 
 timestamp is assumed to be in UTC. That's just begging for 
 bugs. Users won't expect that. Definitely make it a SysTime. 
 And once that's done, line# 2016 would disappear, and line# 
 2017 would become

 p.curl.set(CurlOption.timevalue, timestamp.toUnixTime());

ok
 On line# 2247, why do you use format with no format string? 
 There should be no ~ or to!string in a call to format. It's 
 just creating unnecessary memory allocations. It should be 
 something more like

 format("%s%s(%s.", code, reason, majorVersion, minorVersion);

 Though I find that '(' to be rather odd, since it has no 
 matching closing paren, so one may need to be added. 
 Regardless, that line shouldn't be concatenating strings or use 
 conv.to. format takes care of all of that and should do it with 
 fewer memory allocations.

ok - ')' was missing
 Pointers really should have their * next to the type not 
 floating in space like on line #2274. As it is, the code looks 
 like a multiplication. Unlike in C/C++, the * is clearly 
 associated with the type. e.g.

 int* a, b;

 creates an int* and an int in C, but it creates int* and int* 
 in D. Separating the * from the type has a tendancy to make the 
 code harder to understand.

If that's the D style I'll do that.
 Ftp's consturctor and static opCall have a code duplication 
 problem similar to that of Http.

ok
 Make encoding take a string. Any time that you are _always_ 
 going to idup a parameter which is a character array, make it a 
 string, _not_ const. That way, if it's a string, no idup is 
 necessary, and if it's char[], then it can be iduped when it's 
 passed in, and iduping only occurs when it's actually necessary.

ok
 In Smtp's constructor, yo ushould probably create a variable 
 for the toLowered url. e.g.

 auto lowered = url.toLower();

 That way, you avoid having to lower it twice if the else branch 
 of the if-else is taken.

ok
 Is p.curl.perform a property? If so, line# 2455 does nothing. 
 If not, then it needs parens. The curl module should be able to 
 be built with -property. I believe that Phobos as a whole is 
 currently being built with tha flag. If not, it will be soon, 
 and this module will need to do that if it's merged into Phobos.

ok
 If you can, please use a static foreach with 
 EnumMembers!CurlOption in Curl's dup. And if you can't, because 
 you're not clearing all of them, then put all of the ones that 
 you _are_ clearing in an array (or TypeTuple if you want to 
 avoid the allocation for the array) and iterate over them with 
 a foreach. e.g.

 foreach(option; TypeTuple!(CurlOption.file, 
 CurlOptions.writefunction))
   copy.clear(option);

 You should be able to cut down on the number of lines by doing 
 that (_especially_ if you're actually clearing all of them and 
 can use EnumMembers!CurlOption).

Nice. I actually found out that you can do: with (CurlOption) foreach(option; TypeTuple!(file, writefunction)) copy.clear(option); Which is pretty neat when you have to list as many options as in the curl wrapper.
 Line#2639 is another case where format should be used. In 
 general, if you're doing more than one ~, consider using format 
 unless you're trying to make a function pure.

ok
 I'd suggest renaming Message to something like CurlMessage. The 
 odds of a name clash with Message are likely high. And while 
 they won't be able to use Message, since it's private, it will 
 still affect whether the full path for Message must be given 
 (e.g. my.module.Message instead of Message). CurlMessage is far 
 less likely to clash.

ok
 Rename DATA to Data. DATA does not follow Phobos' naming 
 conventions. Granted, it's private, but it's completely off. 
 Type names should be pascal cased.

ok
 Why isn't empty a property on #3056? Sure, it's not a range, 
 but it would be more consistent with everything else if it were 
 a property.

ok
 In general, you should use when converting between TickDuration 
 and Duration, but in some cases, no conversion should be 
 necessary. For instance, line# 985 shouldn't need to do any 
 converting. Duration's opOpAssign should be able to handle a 
 TickDuration.

ok
 Overall, the design looks fairly solid, and the code looks 
 pretty good, but there are definitely a number of minor issues 
 which should be addressed before this code makes it into 
 Phobos. The biggest involve the functions' parameters (such as 
 using SysTime, not DateTime, and using string in some places 
 where you're using const(char)[] in order to avoid unnecessary 
 idups). And those definitely need to be sorted out sooner 
 rather than later, since they affect the API and will break 
 code if they're changed later.

 - Jonathan M Davis

Thanks for the comments /Jonas
Dec 30 2011
prev sibling parent "jdrewsen" <jdrewsen nospam.com> writes:
On Saturday, 17 December 2011 at 20:01:04 UTC, Jakob Ovrum wrote:
 The docs for onReceiveHeader,

 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html#onReceiveHeader

 explicitly says that the const string parameters are not valid 
 after the function returns. This is all well and good; but a 
 minor improvement would be to change the signature to:

  property void onReceiveHeader(void delegate(in char[] key, in 
 char[] value) callback);

 to add `scope` to the parameters. This documents the fact that 
 the constant strings shouldn't be escaped as-is in both code 
 and documentation. I suspect many other instances of 
 `const(char)[]` could be changed to `in char[]` to better 
 document how they use the parameters (I also personally think 
 it looks better as it is more concise).

I agree. Actually I thought that 'in' was just an alias of 'const' but now I see that it is actually an alias of 'const scope'.
Dec 30 2011