www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Curl wrapper

reply jdrewsen <jdrewsen nospam.com> writes:
Hi,

    I've been working on a wrapper for the etc.c.curl module. It is now 
pretty stable and I would very much like some feedback on the API.

http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

BTW I use template mixins which doesn't seem to get included in the 
generated docs. Is there any way I can make this work?

/Jonas
May 16 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from jdrewsen (jdrewsen nospam.com)'s article
 Hi,
     I've been working on a wrapper for the etc.c.curl module. It is now
 pretty stable and I would very much like some feedback on the API.
 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html
 BTW I use template mixins which doesn't seem to get included in the
 generated docs. Is there any way I can make this work?
 /Jonas

This looks very good and I'd definitely be in favor of including it in the next release. However, there are a few small issues: 1. In the docs for Http, I don't think you need to say "Do not use the same instance in two threads simultaneously." I think this is pretty much common sense. Same with the Ftp class. 2. A short overview of the asynch stuff at the top of the file would be nice. I'm a little confused about what it's for and when to use it instead of the regular synchronous functions. Does this allow you to process the beginning of a request while the end is still being received? 3. I don't think you need to say how all the convenience functions are implemented, i.e. "Internally this is implemented using an instance of the Http class." This is both likely to be assumed and an irrelevant implementation detail. Overall, nice work.
May 16 2011
next sibling parent Jimmy Cao <jcao219 gmail.com> writes:
--0016e6408cae3fd2df04a36e2b75
Content-Type: text/plain; charset=ISO-8859-1

It looks nice.  I plan to use this soon.
I hope it can be in phobos soon.

--0016e6408cae3fd2df04a36e2b75
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

It looks nice. =A0I plan to use this soon.<div>I hope it can be in phobos s=
oon.</div>

--0016e6408cae3fd2df04a36e2b75--
May 16 2011
prev sibling next sibling parent Daniel Gibson <metalcaedes gmail.com> writes:
Am 17.05.2011 02:43, schrieb dsimcha:
 == Quote from jdrewsen (jdrewsen nospam.com)'s article
 Hi,
      I've been working on a wrapper for the etc.c.curl module. It is now
 pretty stable and I would very much like some feedback on the API.
 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html
 BTW I use template mixins which doesn't seem to get included in the
 generated docs. Is there any way I can make this work?
 /Jonas

This looks very good and I'd definitely be in favor of including it in the next release. However, there are a few small issues: 1. In the docs for Http, I don't think you need to say "Do not use the same instance in two threads simultaneously." I think this is pretty much common sense. Same with the Ftp class.

Maybe it's common sense, but it doesn't hurt to have that information. Someone will certainly try that and ask why it gives strange results ;)
 2.  A short overview of the asynch stuff at the top of the file would be nice.
 I'm a little confused about what it's for and when to use it instead of the
 regular synchronous functions.  Does this allow you to process the beginning
of a
 request while the end is still being received?

 3.  I don't think you need to say how all the convenience functions are
 implemented, i.e. "Internally this is implemented using an instance of the Http
 class."  This is both likely to be assumed and an irrelevant implementation
detail.

 Overall, nice work.

May 16 2011
prev sibling parent Jonas Drewsen <jdrewsen nospam.com> writes:
On 17/05/11 02.43, dsimcha wrote:
 == Quote from jdrewsen (jdrewsen nospam.com)'s article
 Hi,
      I've been working on a wrapper for the etc.c.curl module. It is now
 pretty stable and I would very much like some feedback on the API.
 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html
 BTW I use template mixins which doesn't seem to get included in the
 generated docs. Is there any way I can make this work?
 /Jonas

This looks very good and I'd definitely be in favor of including it in the next release. However, there are a few small issues: 1. In the docs for Http, I don't think you need to say "Do not use the same instance in two threads simultaneously." I think this is pretty much common sense. Same with the Ftp class.

I included it because the libcurl handle used in a HTTP instance is not thread safe itself. What if I instantiated a shared instance of this class. Wouldn't that make the comment relevant in that access to the internal libcurl handle is not protected by mutexes?
 2.  A short overview of the asynch stuff at the top of the file would be nice.
 I'm a little confused about what it's for and when to use it instead of the
 regular synchronous functions.  Does this allow you to process the beginning
of a
 request while the end is still being received?

Exactly. I'll add the missing overview.
 3.  I don't think you need to say how all the convenience functions are
 implemented, i.e. "Internally this is implemented using an instance of the Http
 class."  This is both likely to be assumed and an irrelevant implementation
detail.

You're right.
 Overall, nice work.

Thank you for the feedback!
May 17 2011
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/16/2011 04:07 PM, jdrewsen wrote:
 Hi,

 I've been working on a wrapper for the etc.c.curl module. It is now
 pretty stable and I would very much like some feedback on the API.

 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 BTW I use template mixins which doesn't seem to get included in the
 generated docs. Is there any way I can make this work?

 /Jonas

Now that's one sight for sore eyes. Nice work. A few comments and nits: 1. Http can't be a class. Network resources are the poster child of expensive resources that are NOT memory and don't work well with garbage collection. RAII is the poster child of deterministic resource management. So... 2. Furthermore I see customization via delegates (onReceive etc). That should be a good proxy for customization instead of using inheritance. 3. Data on wire is ubyte[], not void[]. 4. Shouldn't onSend take a ref to its argument in case it wants to expand it? 5. Why HttpMethod and not Http.Method? 6. "Do not use the same instance of this class in two threads simultanously." -> remove. A D program isn't supposed to do that unless you offer shared methods. 7. HttpStatusLine -> Http.StatusLine. One up for encapsulation. 8. HttpMethod -> Http.Method. Ditto. 9. "this(in const(char)[] url);" -> "this(string url);" I assume you store a copy of the URL inside by calling e.g. to!string. If the client has a string, you refuse to tap into that nice information and share the string safely and efficiently, and make a copy of it for no good reason. If the client doesn't have a string, no worry, the compiler will tell her she needs to do a copy. 10. addHeader etc. Same comment. Use string whenever you want to keep a copy anyway. 11. setTimeCondition -> use core.Duration for time representation throughout. 12. AsyncHttpResult -> Http.AsyncResult :o) 13. HttpResult -> Http.Result 14. Isn't the max redirect configurable via a parameter? 15. Typo: "Callbacks is not supported..." 16. Documentation should point to descriptions of the HTTP methods wrapped (e.g. "post", "del" etc). 17. There should be an example with login/password. 18. See onReceiveHeader">std.curl.Curl.onReceiveHeader and others look like doc macros gone wrong. 19. "chuncked" -> "chunked" 20. "Max allowed redirs. -1 for infinite." -> use uint and uint.max for infinite. 21. "short isRunning()" -> what's the semantics of the short? 22. byLine/byChunk should not expose a string or generally data that can be shared safely by the client. That's because it would need to create a new string with each iteration, which is very inefficient. Instead, they should expose char[]/ubyte[] just like File.byLine does, and reuse the buffer with each call. Essentially byLine/byChunk should be near-zero-overhead means to transfer and inspect arbitrarily large streams. 23. Ftp should have get in-memory and streaming byLine/byChunk, not only get to file. 24. The shutdown mechanism should be handled properly. Shutting down libcurl would have all pending transfers instantly throw a special exception. Without a shutdown API, applications won't be able to implement e.g. a Quit button. Again, great work! Andrei
May 16 2011
next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 17 May 2011 07:43:06 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 3. Data on wire is ubyte[], not void[].

I was the one who suggested using void[]. If data on wire is ubyte[], why is data on disk void[] (according to std.file.read/write)? -- Best regards, Vladimir mailto:vladimir thecybershadow.net
May 16 2011
next sibling parent Jonas Drewsen <jdrewsen nospam.com> writes:
On 17/05/11 07.23, Vladimir Panteleev wrote:
 On Tue, 17 May 2011 07:43:06 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 3. Data on wire is ubyte[], not void[].

I was the one who suggested using void[]. If data on wire is ubyte[], why is data on disk void[] (according to std.file.read/write)?

I believe we decided on ubyte[] for reading and void[] for write in an earlier thread. And std.file should be fixed.
May 17 2011
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/17/11 12:23 AM, Vladimir Panteleev wrote:
 On Tue, 17 May 2011 07:43:06 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 3. Data on wire is ubyte[], not void[].

I was the one who suggested using void[]. If data on wire is ubyte[], why is data on disk void[] (according to std.file.read/write)?

Bug in std.file. Andrei
May 17 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/17/11 12:39 PM, Vladimir Panteleev wrote:
 On Tue, 17 May 2011 17:45:00 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 5/17/11 12:23 AM, Vladimir Panteleev wrote:
 On Tue, 17 May 2011 07:43:06 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 3. Data on wire is ubyte[], not void[].

I was the one who suggested using void[]. If data on wire is ubyte[], why is data on disk void[] (according to std.file.read/write)?

Bug in std.file.

I remember suggesting a long time ago that all types that do not have pointers be implicitly convertible to ubyte[]. That way, write functions can safely use ubyte[] without casts (and even provide a bit of safety against accidentally writing pointers to disk or network). Are you still against the idea?

Yes. The problem is that converting a type implicitly to ubyte[] allows anyone to write raw bytes into an object, making a hash of any type system guarantees without a cast in sight. That is, granted, memory-safe (because there are no indirections), but is not recommendable. Andrei
May 17 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/17/11 1:00 PM, Vladimir Panteleev wrote:
 On Tue, 17 May 2011 20:43:11 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 5/17/11 12:39 PM, Vladimir Panteleev wrote:
 On Tue, 17 May 2011 17:45:00 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 5/17/11 12:23 AM, Vladimir Panteleev wrote:
 On Tue, 17 May 2011 07:43:06 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 3. Data on wire is ubyte[], not void[].

I was the one who suggested using void[]. If data on wire is ubyte[], why is data on disk void[] (according to std.file.read/write)?

Bug in std.file.

I remember suggesting a long time ago that all types that do not have pointers be implicitly convertible to ubyte[]. That way, write functions can safely use ubyte[] without casts (and even provide a bit of safety against accidentally writing pointers to disk or network). Are you still against the idea?

Yes. The problem is that converting a type implicitly to ubyte[] allows anyone to write raw bytes into an object, making a hash of any type system guarantees without a cast in sight. That is, granted, memory-safe (because there are no indirections), but is not recommendable.

Ah, of course. const(ubyte)[] ? :)

Yah, const(ubyte)[] is much more reasonable. We may as well add a function representation() that returns it for any object. Andrei
May 17 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/17/11 1:35 PM, Andrej Mitrovic wrote:
 On 5/17/11, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:
 Yah, const(ubyte)[] is much more reasonable. We may as well add a
 function representation() that returns it for any object.

Would this mean that e.g. you could save the state of an object to disk via repr(), and then retrieve the state at a later point?

If it embeds no pointers, sure. Otherwise, a more involved serialization mechanism must be used. Andrei
May 17 2011
prev sibling parent KennyTM~ <kennytm gmail.com> writes:
On May 17, 11 22:45, Andrei Alexandrescu wrote:
 On 5/17/11 12:23 AM, Vladimir Panteleev wrote:
 On Tue, 17 May 2011 07:43:06 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 3. Data on wire is ubyte[], not void[].

I was the one who suggested using void[]. If data on wire is ubyte[], why is data on disk void[] (according to std.file.read/write)?

Bug in std.file. Andrei

I suppose that applies to std.mmfile too? MmFile.opSlice() currently returns a void[].
May 17 2011
prev sibling next sibling parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 17/05/11 06.43, Andrei Alexandrescu wrote:
 On 05/16/2011 04:07 PM, jdrewsen wrote:
 Hi,

 I've been working on a wrapper for the etc.c.curl module. It is now
 pretty stable and I would very much like some feedback on the API.

 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 BTW I use template mixins which doesn't seem to get included in the
 generated docs. Is there any way I can make this work?

 /Jonas

Now that's one sight for sore eyes. Nice work. A few comments and nits: 1. Http can't be a class. Network resources are the poster child of expensive resources that are NOT memory and don't work well with garbage collection. RAII is the poster child of deterministic resource management. So...

I see what you mean. Will fix.
 2. Furthermore I see customization via delegates (onReceive etc). That
 should be a good proxy for customization instead of using inheritance.

The code actually defined the Http/Ftp classes as final. But somehow it is not reflected in the generated docs. Anyway, I will change it to struct and inheritance is not an issue anyway.
 3. Data on wire is ubyte[], not void[].

Data received is ubyte[] and a convenience method for conversion to string is provided. Data send is void[] since all arrays implicitely converts to void[]. This makes e.g [1,2,3,4,5] works as input. Convenience methods are provided for chars in order to set HTTP headers correctly. For onSend callback it is simply void[] in order to use it as an out parameter. For Http.post/put it is const(void)[] For Http.postAsync/putAsync it is immutable(void)[] because internally I need to send the data to another thread using message passing. Hopes this makes sense?
 4. Shouldn't onSend take a ref to its argument in case it wants to
 expand it?

It is not allowed to expand it. The callback originates from libcurl which determines the size of the array.
 5. Why HttpMethod and not Http.Method?

I had the thought myself but decided on the former. By looking at the replies to my post it seems that it should be changed. I'll do that.
 6. "Do not use the same instance of this class in two threads
 simultanously." -> remove. A D program isn't supposed to do that unless
 you offer shared methods.

Ok. So even though I declare: shared Http http = new Http(); I cannot have multiple threads call http.perform without that method being marked as shared as well? If thats the case I'll remove the comment.
 7. HttpStatusLine -> Http.StatusLine. One up for encapsulation.

 8. HttpMethod -> Http.Method. Ditto.

 9. "this(in const(char)[] url);" -> "this(string url);" I assume you
 store a copy of the URL inside by calling e.g. to!string. If the client
 has a string, you refuse to tap into that nice information and share the
 string safely and efficiently, and make a copy of it for no good reason.
 If the client doesn't have a string, no worry, the compiler will tell
 her she needs to do a copy.

I do not store a copy but calls libcurl which in turn does a copy by itself. So I believe the current signature is appropriate?
 10. addHeader etc. Same comment. Use string whenever you want to keep a
 copy anyway.

At 9. the string is copied by libcurl itself.
 11. setTimeCondition -> use core.Duration for time representation
 throughout.

Will fix.
 12. AsyncHttpResult -> Http.AsyncResult :o)

 13. HttpResult -> Http.Result

 14. Isn't the max redirect configurable via a parameter?

Yes. It is called Http.followLocation from libcurls followLocation option. I will rename it to maxRedirections for clearity.
 15. Typo: "Callbacks is not supported..."

ok.
 16. Documentation should point to descriptions of the HTTP methods
 wrapped (e.g. "post", "del" etc).

Do you mean point to the relevant RFC?
 17. There should be an example with login/password.

ok.
 18. See onReceiveHeader">std.curl.Curl.onReceiveHeader and others look
 like doc macros gone wrong.

 19. "chuncked" -> "chunked"

 20. "Max allowed redirs. -1 for infinite." -> use uint and uint.max for
 infinite.

Will fix.
 21. "short isRunning()" -> what's the semantics of the short?

This is a bug. It returns an internal bool value.
 22. byLine/byChunk should not expose a string or generally data that can
 be shared safely by the client. That's because it would need to create a
 new string with each iteration, which is very inefficient. Instead, they
 should expose char[]/ubyte[] just like File.byLine does, and reuse the
 buffer with each call. Essentially byLine/byChunk should be
 near-zero-overhead means to transfer and inspect arbitrarily large streams.

byLine/byChunk is currenly only available for the async methods which is implemented using message passing. This means that for passing the message a copy has to be made because a message is by-value or immutable data. Is there another way for me to pass the message without doing a copy... some kind of move semantic?
 23. Ftp should have get in-memory and streaming byLine/byChunk, not only
 get to file.

Definitely. I wanted to get feedback on HTTP API before doing the same to FTP.
 24. The shutdown mechanism should be handled properly. Shutting down
 libcurl would have all pending transfers instantly throw a special
 exception. Without a shutdown API, applications won't be able to
 implement e.g. a Quit button.

If you're only using this API this should be handled. But I see that it is not documented.
 Again, great work!

Thank you for the feedback.
May 17 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Thanks for the response! A few more answers and comments within 
(everything deleted counts as "sounds great").

On 5/17/11 3:50 AM, Jonas Drewsen wrote:
 3. Data on wire is ubyte[], not void[].

Data received is ubyte[] and a convenience method for conversion to string is provided. Data send is void[] since all arrays implicitely converts to void[]. This makes e.g [1,2,3,4,5] works as input. Convenience methods are provided for chars in order to set HTTP headers correctly. For onSend callback it is simply void[] in order to use it as an out parameter. For Http.post/put it is const(void)[] For Http.postAsync/putAsync it is immutable(void)[] because internally I need to send the data to another thread using message passing. Hopes this makes sense?

I think it's best to systematically use void[] when sending data out (i.e. you start with structured data in memory and you let it implicitly "decay" as you pass it on the wire), and ubyte[] when receiving data (i.e. you start with unstructured data and force the user to explicitly build structure on it). As noted the current File API is not to be considered an example to follow.
 6. "Do not use the same instance of this class in two threads
 simultanously." -> remove. A D program isn't supposed to do that unless
 you offer shared methods.

Ok. So even though I declare: shared Http http = new Http(); I cannot have multiple threads call http.perform without that method being marked as shared as well? If thats the case I'll remove the comment.

Yes, shared limits access to only shared fields and methods.
 9. "this(in const(char)[] url);" -> "this(string url);" I assume you
 store a copy of the URL inside by calling e.g. to!string. If the client
 has a string, you refuse to tap into that nice information and share the
 string safely and efficiently, and make a copy of it for no good reason.
 If the client doesn't have a string, no worry, the compiler will tell
 her she needs to do a copy.

I do not store a copy but calls libcurl which in turn does a copy by itself. So I believe the current signature is appropriate?

Got it, thanks. I sense a little inefficiency here that we can't quite get rid of - the user passes a string, we must convert it to a zero-terminated char* (one allocation) and then libcurl copies it again (two allocations). Oh well.
 14. Isn't the max redirect configurable via a parameter?

Yes. It is called Http.followLocation from libcurls followLocation option. I will rename it to maxRedirections for clearity.

You mention this about many higher-level functions: "Maximum redirections are set to 10." Then I'm thinking, if it's an important parameter, make it a defaulted parameter for the function.
 16. Documentation should point to descriptions of the HTTP methods
 wrapped (e.g. "post", "del" etc).

Do you mean point to the relevant RFC?

Yah, or a good tutorial. (I didn't know DEL existed!)
 22. byLine/byChunk should not expose a string or generally data that can
 be shared safely by the client. That's because it would need to create a
 new string with each iteration, which is very inefficient. Instead, they
 should expose char[]/ubyte[] just like File.byLine does, and reuse the
 buffer with each call. Essentially byLine/byChunk should be
 near-zero-overhead means to transfer and inspect arbitrarily large
 streams.

byLine/byChunk is currenly only available for the async methods which is implemented using message passing. This means that for passing the message a copy has to be made because a message is by-value or immutable data. Is there another way for me to pass the message without doing a copy... some kind of move semantic?

A library element is planned but not on the short list yet. You can work around that by internally doing a cast. As long as you observe the commitment that buffers are not accessed simultaneously by more than one thread you're well within defined behavior. One more suggestion - you may want to also provide classic boring synchronous read/write methods that take a user-provided buffer. I'm sure people could use such e.g. when they want to stream data inside their own threads, or even in the main thread if they don't mind blocking.
 24. The shutdown mechanism should be handled properly. Shutting down
 libcurl would have all pending transfers instantly throw a special
 exception. Without a shutdown API, applications won't be able to
 implement e.g. a Quit button.

If you're only using this API this should be handled. But I see that it is not documented.

Yah, in the future we need to put an onShutdown protocol inside core, similar to atexit() in C. Each library that may block would plant a hook. That way an application will be able to exit quickly and gracefully even if it has blocked threads. Andrei
May 17 2011
parent reply jdrewsen <jdrewsen nospam.com> writes:
Please see comments below.

Den 17-05-2011 16:42, Andrei Alexandrescu skrev:
 Thanks for the response! A few more answers and comments within
 (everything deleted counts as "sounds great").

 On 5/17/11 3:50 AM, Jonas Drewsen wrote:
 14. Isn't the max redirect configurable via a parameter?

Yes. It is called Http.followLocation from libcurls followLocation option. I will rename it to maxRedirections for clearity.

You mention this about many higher-level functions: "Maximum redirections are set to 10." Then I'm thinking, if it's an important parameter, make it a defaulted parameter for the function.

I guess I should just remove that comment because other defaults has been selected as well e.g. timeouts.
 16. Documentation should point to descriptions of the HTTP methods
 wrapped (e.g. "post", "del" etc).

Do you mean point to the relevant RFC?

Yah, or a good tutorial. (I didn't know DEL existed!)

Well DEL is actually called DELETE in the HTTP RFC. But delete is a reserved word i D so I used del() instead. delete() wouldn't work in following code in think: with(auto http = ...) { delete(...); }
 22. byLine/byChunk should not expose a string or generally data that can
 be shared safely by the client. That's because it would need to create a
 new string with each iteration, which is very inefficient. Instead, they
 should expose char[]/ubyte[] just like File.byLine does, and reuse the
 buffer with each call. Essentially byLine/byChunk should be
 near-zero-overhead means to transfer and inspect arbitrarily large
 streams.

byLine/byChunk is currenly only available for the async methods which is implemented using message passing. This means that for passing the message a copy has to be made because a message is by-value or immutable data. Is there another way for me to pass the message without doing a copy... some kind of move semantic?

A library element is planned but not on the short list yet. You can work around that by internally doing a cast. As long as you observe the commitment that buffers are not accessed simultaneously by more than one thread you're well within defined behavior.

ok nice to know.
 One more suggestion - you may want to also provide classic boring
 synchronous read/write methods that take a user-provided buffer. I'm
 sure people could use such e.g. when they want to stream data inside
 their own threads, or even in the main thread if they don't mind blocking.

This would mean hooking into libcurl and selecting on its sockets. Totally doable. But I would rather stop here feature wise and wrap this thing up. I would like to focus my efforts on a candidate for std.net.event/reactor/proactor module. Anyone who wants to implement this read/write in the curl wrapper are ofcourse very welcome.
 24. The shutdown mechanism should be handled properly. Shutting down
 libcurl would have all pending transfers instantly throw a special
 exception. Without a shutdown API, applications won't be able to
 implement e.g. a Quit button.

If you're only using this API this should be handled. But I see that it is not documented.

Yah, in the future we need to put an onShutdown protocol inside core, similar to atexit() in C. Each library that may block would plant a hook. That way an application will be able to exit quickly and gracefully even if it has blocked threads.

That would be neat. /Jonas
May 17 2011
parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 18/05/11 10.09, Johannes Pfau wrote:
 jdrewsen wrote:
 Please see comments below.

 Den 17-05-2011 16:42, Andrei Alexandrescu skrev:
 Thanks for the response! A few more answers and comments within
 (everything deleted counts as "sounds great").

 On 5/17/11 3:50 AM, Jonas Drewsen wrote:
 14. Isn't the max redirect configurable via a parameter?

Yes. It is called Http.followLocation from libcurls followLocation option. I will rename it to maxRedirections for clearity.

You mention this about many higher-level functions: "Maximum redirections are set to 10." Then I'm thinking, if it's an important parameter, make it a defaulted parameter for the function.

I guess I should just remove that comment because other defaults has been selected as well e.g. timeouts.
 16. Documentation should point to descriptions of the HTTP methods
 wrapped (e.g. "post", "del" etc).

Do you mean point to the relevant RFC?

Yah, or a good tutorial. (I didn't know DEL existed!)

Well DEL is actually called DELETE in the HTTP RFC. But delete is a reserved word i D so I used del() instead. delete() wouldn't work in following code in think: with(auto http = ...) { delete(...); }
 22. byLine/byChunk should not expose a string or generally data
 that can be shared safely by the client. That's because it would
 need to create a new string with each iteration, which is very
 inefficient. Instead, they should expose char[]/ubyte[] just like
 File.byLine does, and reuse the buffer with each call. Essentially
 byLine/byChunk should be near-zero-overhead means to transfer and
 inspect arbitrarily large streams.

byLine/byChunk is currenly only available for the async methods which is implemented using message passing. This means that for passing the message a copy has to be made because a message is by-value or immutable data. Is there another way for me to pass the message without doing a copy... some kind of move semantic?

A library element is planned but not on the short list yet. You can work around that by internally doing a cast. As long as you observe the commitment that buffers are not accessed simultaneously by more than one thread you're well within defined behavior.

ok nice to know.
 One more suggestion - you may want to also provide classic boring
 synchronous read/write methods that take a user-provided buffer. I'm
 sure people could use such e.g. when they want to stream data inside
 their own threads, or even in the main thread if they don't mind
 blocking.

This would mean hooking into libcurl and selecting on its sockets. Totally doable.

api is possible, but select is only used to wait until data is ready, reading and writing the data is still done by curl. And as curl does not allow you to supply the data buffer, having user-provided buffers is afaik impossible.

_Doable_ - not easy :) Select will wait for data to be ready and ask curl to handle the data chunk. Curl in turn calls back to a registered callback handler with the data read. That handler fills the buffer provided by the user. If not enough data has been receive an new select is performed until the requested amount of data is read. Then the blocking method can return.
 But I would rather stop here feature wise and wrap
 this thing up. I would like to focus my efforts on a candidate for
 std.net.event/reactor/proactor module.

Have you already started working on std.net.event? Do you plan to base that on libev / libevent?

I'll finish the curl wrapper before starting on it. I have had a look at libev/event before and they are very nice libs. But I think I'll have to implement it from scratch for two reasons: 1, AFAIK we cannot include code in std phobos that is not boost licensed or a license as liberal. libev for example requires you to distribute the license with your software. 2, It introduces a dependency to an external project in phobos. Currently no dependencies are present. This makes phobos very easy to install and use out of the box. The optimal solution to these problems from my point of view would be to get that "much discussed" package tool in place soon (CPAN,apt alike). Heck, now I think about it maybe I should do that before any std.net stuff. Let's see how the wind blows - many interesting projects :)
 I've been trying to get familiar with libev lately, so I have bindings
 for livev 3.9 / 4.04 (combined, version can be selected with version
 statements): http://dl.dropbox.com/u/24218791/d/src/libev.html
 (comments in the c header have been turned into doc comments. The
 binding is partially based on Leandro Lucarella bindings:
 http://git.llucax.com.ar/?p=software/ev.d.git)

Very nice. I've also stumbled upon some other async D libs out there that could be a good starting point. It seems that Leandro has both bindings and a wrapper in place. And a wrapper is really nice to have to make it more D'ish. /Jonas
May 18 2011
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/18/11 6:07 AM, Jonas Drewsen wrote:
 Select will wait for data to be ready and ask curl to handle the data
 chunk. Curl in turn calls back to a registered callback handler with the
 data read. That handler fills the buffer provided by the user. If not
 enough data has been receive an new select is performed until the
 requested amount of data is read. Then the blocking method can return.

Perhaps this would be too complicated. In any case the core functionality must be paid top attention. And the core functionality is streaming. Currently there are two proposed ways to stream data from an HTTP address: (a) by using the onReceive callback, and (b) by using byLine/byChunk. If either of these perform slower than the best-of-the-breed streaming using libcurl, we have failed. The onReceive method is not particularly appealing because the client and libcurl block each other: the client is blocked while libcurl is waiting for data, and the client blocks libcurl while inside the callback. (Please correct me if I'm wrong.) To make byLine/byChunk fast, the basic setup should include a hidden thread that does the download in separation from the client's thread. There should be K buffers allocated (K = 2 to e.g. 10), and a simple protocol for passing the buffers back and forth between the client thread and the hidden thread. That way, in the quiescent state, there is no memory allocation and either both client and libcurl are busy doing work, or one is much slower than the other, which waits. The same mechanism should be used in byChunkAsync or byFileAsync. Thanks, Andrei
May 18 2011
parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 18-05-2011 16:53, Andrei Alexandrescu skrev:
 On 5/18/11 6:07 AM, Jonas Drewsen wrote:
 Select will wait for data to be ready and ask curl to handle the data
 chunk. Curl in turn calls back to a registered callback handler with the
 data read. That handler fills the buffer provided by the user. If not
 enough data has been receive an new select is performed until the
 requested amount of data is read. Then the blocking method can return.

Perhaps this would be too complicated. In any case the core functionality must be paid top attention. And the core functionality is streaming. Currently there are two proposed ways to stream data from an HTTP address: (a) by using the onReceive callback, and (b) by using byLine/byChunk. If either of these perform slower than the best-of-the-breed streaming using libcurl, we have failed. The onReceive method is not particularly appealing because the client and libcurl block each other: the client is blocked while libcurl is waiting for data, and the client blocks libcurl while inside the callback. (Please correct me if I'm wrong.) To make byLine/byChunk fast, the basic setup should include a hidden thread that does the download in separation from the client's thread. There should be K buffers allocated (K = 2 to e.g. 10), and a simple protocol for passing the buffers back and forth between the client thread and the hidden thread. That way, in the quiescent state, there is no memory allocation and either both client and libcurl are busy doing work, or one is much slower than the other, which waits. The same mechanism should be used in byChunkAsync or byFileAsync.

If byChunk is using a hidden thread to download into buffers, then how does it differ from the byChunkAsync that you mention? The current curl wrapper actually does the hidden thread trick (based on a hint you gave me a while ago). It does not reuse buffers because I thought that all data had to be immutable or by value to go through the message passing system. I'll fix this since it is a good place to do some type casting to allow passing the buffers for reuse. I think that we have to consider the context of the streaming before we can tell the best solution. I do not have any number to back the following up, but this is how I see it: If data that is read is going to be processed (e.g. compressed) in some way it is most likely a benefit to spawn a thread to handle the data buffering. If no processing is done (e.g. a simple copy from net to disk) I believe keeping things in the same thread and simply select on sockets (disk or net) is fastest. This way no message passing and context switching is taking place and does cause any overhead. libcurl can give you access to the file descriptors for this exact purpose but it does have some drawbacks: you are not in control of the buffers used by libcurl. This means that reading from one curl connection and sending on another you would have to copy the data. libcurl does in fact provide even simpler methods where you can provide your own buffers for read/writes. Unfortunately this is only supported for HTTP and a lot of the convenience features such as redirections are lost. The more you want to control to get the last drop of performance, the more you have to manually handle yourself. In my opinion I think that providing the performance of the standard libcurl API in the D wrapper is the way to go (as done in the current curl wrapper). Generic and efficient streaming across protocols is best done in std.net where buffers can be handled entirely in D. I know this is not a small task which is why I started out with wrapping libcurl. Thanks Jonas
May 18 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/18/11 5:29 PM, jdrewsen wrote:
 Den 18-05-2011 16:53, Andrei Alexandrescu skrev:
 On 5/18/11 6:07 AM, Jonas Drewsen wrote:
 Select will wait for data to be ready and ask curl to handle the data
 chunk. Curl in turn calls back to a registered callback handler with the
 data read. That handler fills the buffer provided by the user. If not
 enough data has been receive an new select is performed until the
 requested amount of data is read. Then the blocking method can return.

Perhaps this would be too complicated. In any case the core functionality must be paid top attention. And the core functionality is streaming. Currently there are two proposed ways to stream data from an HTTP address: (a) by using the onReceive callback, and (b) by using byLine/byChunk. If either of these perform slower than the best-of-the-breed streaming using libcurl, we have failed. The onReceive method is not particularly appealing because the client and libcurl block each other: the client is blocked while libcurl is waiting for data, and the client blocks libcurl while inside the callback. (Please correct me if I'm wrong.) To make byLine/byChunk fast, the basic setup should include a hidden thread that does the download in separation from the client's thread. There should be K buffers allocated (K = 2 to e.g. 10), and a simple protocol for passing the buffers back and forth between the client thread and the hidden thread. That way, in the quiescent state, there is no memory allocation and either both client and libcurl are busy doing work, or one is much slower than the other, which waits. The same mechanism should be used in byChunkAsync or byFileAsync.

If byChunk is using a hidden thread to download into buffers, then how does it differ from the byChunkAsync that you mention?

Sorry, byChunkAsync and byLineAsync (which I wrongly denoted as byFileAsync) would be methods File.
 The current curl wrapper actually does the hidden thread trick (based on
 a hint you gave me a while ago). It does not reuse buffers because I
 thought that all data had to be immutable or by value to go through the
 message passing system. I'll fix this since it is a good place to do
 some type casting to allow passing the buffers for reuse.

Great, thanks. Don't forget there's great motivation to do so.
 I think that we have to consider the context of the streaming before we
 can tell the best solution. I do not have any number to back the
 following up, but this is how I see it:

 If data that is read is going to be processed (e.g. compressed) in some
 way it is most likely a benefit to spawn a thread to handle the data
 buffering.

 If no processing is done (e.g. a simple copy from net to disk) I believe
 keeping things in the same thread and simply select on sockets (disk or
 net) is fastest.

Not at all. If operating with the network and operating with the disk block each other, you're guaranteed to be slower than the slowest of them. Consider that disk speed is V1 MB/s and network speed is V2 MB/s, and that they're independent of each other. If you do one thing at a time, you need to take 1/V1 + 1/V2 seconds to transfer one MB. The speed of the process is therefore 1/(1/V1 + 1/V2) = V1 * V2 / (V1 + V2). If the two devices have comparable speeds, you're halving the speed. As soon as you do buffering with two threads you can easily reach close to the minimum of the two speeds, which is the theoretical best.
 In my opinion I think that providing the performance of the standard
 libcurl API in the D wrapper is the way to go (as done in the current
 curl wrapper). Generic and efficient streaming across protocols is best
 done in std.net where buffers can be handled entirely in D. I know this
 is not a small task which is why I started out with wrapping libcurl.

Sounds reasonable, although if you take care of recycling the buffers in the implementation you have, your wrapper may as well be the best of breed. Andrei
May 18 2011
parent jdrewsen <jdrewsen nospam.com> writes:
Den 19-05-2011 00:54, Andrei Alexandrescu skrev:
 On 5/18/11 5:29 PM, jdrewsen wrote:
 Den 18-05-2011 16:53, Andrei Alexandrescu skrev:
 On 5/18/11 6:07 AM, Jonas Drewsen wrote:
 Select will wait for data to be ready and ask curl to handle the data
 chunk. Curl in turn calls back to a registered callback handler with
 the
 data read. That handler fills the buffer provided by the user. If not
 enough data has been receive an new select is performed until the
 requested amount of data is read. Then the blocking method can return.

Perhaps this would be too complicated. In any case the core functionality must be paid top attention. And the core functionality is streaming. Currently there are two proposed ways to stream data from an HTTP address: (a) by using the onReceive callback, and (b) by using byLine/byChunk. If either of these perform slower than the best-of-the-breed streaming using libcurl, we have failed. The onReceive method is not particularly appealing because the client and libcurl block each other: the client is blocked while libcurl is waiting for data, and the client blocks libcurl while inside the callback. (Please correct me if I'm wrong.) To make byLine/byChunk fast, the basic setup should include a hidden thread that does the download in separation from the client's thread. There should be K buffers allocated (K = 2 to e.g. 10), and a simple protocol for passing the buffers back and forth between the client thread and the hidden thread. That way, in the quiescent state, there is no memory allocation and either both client and libcurl are busy doing work, or one is much slower than the other, which waits. The same mechanism should be used in byChunkAsync or byFileAsync.

If byChunk is using a hidden thread to download into buffers, then how does it differ from the byChunkAsync that you mention?

Sorry, byChunkAsync and byLineAsync (which I wrongly denoted as byFileAsync) would be methods File.
 The current curl wrapper actually does the hidden thread trick (based on
 a hint you gave me a while ago). It does not reuse buffers because I
 thought that all data had to be immutable or by value to go through the
 message passing system. I'll fix this since it is a good place to do
 some type casting to allow passing the buffers for reuse.

Great, thanks. Don't forget there's great motivation to do so.
 I think that we have to consider the context of the streaming before we
 can tell the best solution. I do not have any number to back the
 following up, but this is how I see it:

 If data that is read is going to be processed (e.g. compressed) in some
 way it is most likely a benefit to spawn a thread to handle the data
 buffering.

 If no processing is done (e.g. a simple copy from net to disk) I believe
 keeping things in the same thread and simply select on sockets (disk or
 net) is fastest.

Not at all. If operating with the network and operating with the disk block each other, you're guaranteed to be slower than the slowest of them. Consider that disk speed is V1 MB/s and network speed is V2 MB/s, and that they're independent of each other. If you do one thing at a time, you need to take 1/V1 + 1/V2 seconds to transfer one MB. The speed of the process is therefore 1/(1/V1 + 1/V2) = V1 * V2 / (V1 + V2). If the two devices have comparable speeds, you're halving the speed. As soon as you do buffering with two threads you can easily reach close to the minimum of the two speeds, which is the theoretical best.

It see your point. By buffering data asynchronously the reads and writes don't block each other and this increases performance. The thing is that the OS already does buffering for us. So while we're writing data to disk the OS is buffering incoming data from the network asynchronously.
 In my opinion I think that providing the performance of the standard
 libcurl API in the D wrapper is the way to go (as done in the current
 curl wrapper). Generic and efficient streaming across protocols is best
 done in std.net where buffers can be handled entirely in D. I know this
 is not a small task which is why I started out with wrapping libcurl.

Sounds reasonable, although if you take care of recycling the buffers in the implementation you have, your wrapper may as well be the best of breed. Andrei

May 19 2011
prev sibling next sibling parent jdrewsen <jdrewsen nospam.com> writes:
Den 18-05-2011 14:52, Johannes Pfau skrev:
 Jonas Drewsen wrote:
 On 18/05/11 10.09, Johannes Pfau wrote:
 jdrewsen wrote:
 Please see comments below.

 Den 17-05-2011 16:42, Andrei Alexandrescu skrev:
 Thanks for the response! A few more answers and comments within
 (everything deleted counts as "sounds great").

 On 5/17/11 3:50 AM, Jonas Drewsen wrote:
 14. Isn't the max redirect configurable via a parameter?

Yes. It is called Http.followLocation from libcurls followLocation option. I will rename it to maxRedirections for clearity.

You mention this about many higher-level functions: "Maximum redirections are set to 10." Then I'm thinking, if it's an important parameter, make it a defaulted parameter for the function.

I guess I should just remove that comment because other defaults has been selected as well e.g. timeouts.
 16. Documentation should point to descriptions of the HTTP
 methods wrapped (e.g. "post", "del" etc).

Do you mean point to the relevant RFC?

Yah, or a good tutorial. (I didn't know DEL existed!)

Well DEL is actually called DELETE in the HTTP RFC. But delete is a reserved word i D so I used del() instead. delete() wouldn't work in following code in think: with(auto http = ...) { delete(...); }
 22. byLine/byChunk should not expose a string or generally data
 that can be shared safely by the client. That's because it would
 need to create a new string with each iteration, which is very
 inefficient. Instead, they should expose char[]/ubyte[] just like
 File.byLine does, and reuse the buffer with each call.
 Essentially byLine/byChunk should be near-zero-overhead means to
 transfer and inspect arbitrarily large streams.

byLine/byChunk is currenly only available for the async methods which is implemented using message passing. This means that for passing the message a copy has to be made because a message is by-value or immutable data. Is there another way for me to pass the message without doing a copy... some kind of move semantic?

A library element is planned but not on the short list yet. You can work around that by internally doing a cast. As long as you observe the commitment that buffers are not accessed simultaneously by more than one thread you're well within defined behavior.

ok nice to know.
 One more suggestion - you may want to also provide classic boring
 synchronous read/write methods that take a user-provided buffer.
 I'm sure people could use such e.g. when they want to stream data
 inside their own threads, or even in the main thread if they don't
 mind blocking.

This would mean hooking into libcurl and selecting on its sockets. Totally doable.

blocking api is possible, but select is only used to wait until data is ready, reading and writing the data is still done by curl. And as curl does not allow you to supply the data buffer, having user-provided buffers is afaik impossible.

_Doable_ - not easy :) Select will wait for data to be ready and ask curl to handle the data chunk. Curl in turn calls back to a registered callback handler with the data read. That handler fills the buffer provided by the user. If not enough data has been receive an new select is performed until the requested amount of data is read. Then the blocking method can return.

I think it's not necessary to do multiple calls if the first didn't return enough data. Most read calls are only guaranteed to return a maximum of data, less is always possible. It's a bigger problem if the user supplied buffer is smaller than the curl buffer. And you always end up copying data. So it might be better to just return a reference to the buffer allocated by curl, or just forget about this api, curl just doesn't work well this way.

Yeah. That's part of the reason why I'm not interested in implementing it.
 But I would rather stop here feature wise and wrap
 this thing up. I would like to focus my efforts on a candidate for
 std.net.event/reactor/proactor module.

Have you already started working on std.net.event? Do you plan to base that on libev / libevent?

I'll finish the curl wrapper before starting on it. I have had a look at libev/event before and they are very nice libs. But I think I'll have to implement it from scratch for two reasons: 1, AFAIK we cannot include code in std phobos that is not boost licensed or a license as liberal. libev for example requires you to distribute the license with your software.

That could indeed be a problem. I think libev/libevent even have the same license. Probably it's really best to implement it from scratch, but it will be quite some work. I'd still have a look at the libev documentation though, it contains some details about OS quirks. And try to avoid Linux AIO ;-)
 2, It introduces a dependency to an external project in phobos.
 Currently no dependencies are present. This makes phobos very easy to
 install and use out of the box.

 The optimal solution to these problems from my point of view would be
 to get that "much discussed" package tool in place soon (CPAN,apt
 alike).

 Heck, now I think about it maybe I should do that before any std.net
 stuff. Let's see how the wind blows - many interesting projects :)

 I've been trying to get familiar with libev lately, so I have
 bindings for livev 3.9 / 4.04 (combined, version can be selected
 with version statements):
 http://dl.dropbox.com/u/24218791/d/src/libev.html (comments in the c
 header have been turned into doc comments. The binding is partially
 based on Leandro Lucarella bindings:
 http://git.llucax.com.ar/?p=software/ev.d.git)

Very nice. I've also stumbled upon some other async D libs out there that could be a good starting point. It seems that Leandro has both bindings and a wrapper in place. And a wrapper is really nice to have to make it more D'ish.

Yes, I have a higher level wrapper for libev as well, but the new 'shared' stuff made it a lot more complicated to get that right.
 /Jonas


May 18 2011
prev sibling parent reply jdrewsen <jdrewsen nospam.com> writes:
Den 23-05-2011 21:02, Masahiro Nakagawa skrev:
 On Wed, 18 May 2011 20:07:01 +0900, Jonas Drewsen <jdrewsen nospam.com> 
 wrote:
 
 On 18/05/11 10.09, Johannes Pfau wrote:
 jdrewsen wrote:
 But I would rather stop here feature wise and wrap
 this thing up. I would like to focus my efforts on a candidate for
 std.net.event/reactor/proactor module.

Have you already started working on std.net.event? Do you plan to base that on libev / libevent?

I'll finish the curl wrapper before starting on it. I have had a look at libev/event before and they are very nice libs. But I think I'll have to implement it from scratch for two reasons: 1, AFAIK we cannot include code in std phobos that is not boost licensed or a license as liberal. libev for example requires you to distribute the license with your software. 2, It introduces a dependency to an external project in phobos. Currently no dependencies are present. This makes phobos very easy to install and use out of the box. The optimal solution to these problems from my point of view would be to get that "much discussed" package tool in place soon (CPAN,apt alike). Heck, now I think about it maybe I should do that before any std.net stuff. Let's see how the wind blows - many interesting projects :)

Good! Last year, I tried to implement std.event. In Windows, using IOCP with Overlapped-IO for proactor. Others, IO multiplexing with worker-thread for proactor emulation. But Johannes already started implementing event module using libev, so I deleted this task from my tasks.

If you have any code for inspiration or that could be used as a base I would very much like to see/use it. Especially if it is under Boost license. I would like to create reactor initially and proceed to build the proactor on top of that (not on top for windows and IOCP of course). Reactor and proactor designs each have their forces and therefore I do not think targeting a proactor only design is the way to go. /Jonas
May 23 2011
parent jdrewsen <jdrewsen nospam.com> writes:
Den 30-05-2011 01:16, Jimmy Cao skrev:
 Is there any reason why this code (using SMTP + Curl):
 https://gist.github.com/998214 segfaults on Windows but not on Fedora 15?

 I found an easy fix, but it looks like a bug somewhere that I can't
 figure out.

etc.curl is currently a moving target. The last version I pushed to github is outdated. Currently the focus is on getting http and ftp working and the other protocol hasn't been tested. Out of curiosity I would very much like to known what your easy fix was? /Jonas
May 30 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 17 May 2011 17:45:00 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 5/17/11 12:23 AM, Vladimir Panteleev wrote:
 On Tue, 17 May 2011 07:43:06 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 3. Data on wire is ubyte[], not void[].

I was the one who suggested using void[]. If data on wire is ubyte[], why is data on disk void[] (according to std.file.read/write)?

Bug in std.file.

I remember suggesting a long time ago that all types that do not have pointers be implicitly convertible to ubyte[]. That way, write functions can safely use ubyte[] without casts (and even provide a bit of safety against accidentally writing pointers to disk or network). Are you still against the idea? -- Best regards, Vladimir mailto:vladimir thecybershadow.net
May 17 2011
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tue, 17 May 2011 20:43:11 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 5/17/11 12:39 PM, Vladimir Panteleev wrote:
 On Tue, 17 May 2011 17:45:00 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 5/17/11 12:23 AM, Vladimir Panteleev wrote:
 On Tue, 17 May 2011 07:43:06 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 3. Data on wire is ubyte[], not void[].

I was the one who suggested using void[]. If data on wire is ubyte[], why is data on disk void[] (according to std.file.read/write)?

Bug in std.file.

I remember suggesting a long time ago that all types that do not have pointers be implicitly convertible to ubyte[]. That way, write functions can safely use ubyte[] without casts (and even provide a bit of safety against accidentally writing pointers to disk or network). Are you still against the idea?

Yes. The problem is that converting a type implicitly to ubyte[] allows anyone to write raw bytes into an object, making a hash of any type system guarantees without a cast in sight. That is, granted, memory-safe (because there are no indirections), but is not recommendable.

Ah, of course. const(ubyte)[] ? :) -- Best regards, Vladimir mailto:vladimir thecybershadow.net
May 17 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 5/17/11, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 Yah, const(ubyte)[] is much more reasonable. We may as well add a
 function representation() that returns it for any object.

Would this mean that e.g. you could save the state of an object to disk via repr(), and then retrieve the state at a later point?
May 17 2011
prev sibling next sibling parent "Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Wed, 18 May 2011 20:07:01 +0900, Jonas Drewsen <jdrewsen nospam.com>  
wrote:

 On 18/05/11 10.09, Johannes Pfau wrote:
 jdrewsen wrote:
 But I would rather stop here feature wise and wrap
 this thing up. I would like to focus my efforts on a candidate for
 std.net.event/reactor/proactor module.

Have you already started working on std.net.event? Do you plan to base that on libev / libevent?

I'll finish the curl wrapper before starting on it. I have had a look at libev/event before and they are very nice libs. But I think I'll have to implement it from scratch for two reasons: 1, AFAIK we cannot include code in std phobos that is not boost licensed or a license as liberal. libev for example requires you to distribute the license with your software. 2, It introduces a dependency to an external project in phobos. Currently no dependencies are present. This makes phobos very easy to install and use out of the box. The optimal solution to these problems from my point of view would be to get that "much discussed" package tool in place soon (CPAN,apt alike). Heck, now I think about it maybe I should do that before any std.net stuff. Let's see how the wind blows - many interesting projects :)

Good! Last year, I tried to implement std.event. In Windows, using IOCP with Overlapped-IO for proactor. Others, IO multiplexing with worker-thread for proactor emulation. But Johannes already started implementing event module using libev, so I deleted this task from my tasks. My pending projects need event module. I expect your efforts! Masahiro P.S. cURL wrapper is nice. I wait this :)
May 23 2011
prev sibling next sibling parent "Masahiro Nakagawa" <repeatedly gmail.com> writes:
On Tue, 24 May 2011 04:50:10 +0900, jdrewsen <jdrewsen nospam.com> wrote:

 Den 23-05-2011 21:02, Masahiro Nakagawa skrev:
 On Wed, 18 May 2011 20:07:01 +0900, Jonas Drewsen <jdrewsen nospam.com>
 wrote:

 On 18/05/11 10.09, Johannes Pfau wrote:
 jdrewsen wrote:
 But I would rather stop here feature wise and wrap
 this thing up. I would like to focus my efforts on a candidate for
 std.net.event/reactor/proactor module.

Have you already started working on std.net.event? Do you plan to base that on libev / libevent?

I'll finish the curl wrapper before starting on it. I have had a look at libev/event before and they are very nice libs. But I think I'll have to implement it from scratch for two reasons: 1, AFAIK we cannot include code in std phobos that is not boost licensed or a license as liberal. libev for example requires you to distribute the license with your software. 2, It introduces a dependency to an external project in phobos. Currently no dependencies are present. This makes phobos very easy to install and use out of the box. The optimal solution to these problems from my point of view would be to get that "much discussed" package tool in place soon (CPAN,apt alike). Heck, now I think about it maybe I should do that before any std.net stuff. Let's see how the wind blows - many interesting projects :)

Good! Last year, I tried to implement std.event. In Windows, using IOCP with Overlapped-IO for proactor. Others, IO multiplexing with worker-thread for proactor emulation. But Johannes already started implementing event module using libev, so I deleted this task from my tasks.

If you have any code for inspiration or that could be used as a base I would very much like to see/use it. Especially if it is under Boost license.

My old design is based on Boost/Asio. After I rewrote std.socket(including other utilities such as DNS, IPAddresses), I stopped implementing task.
 I would like to create reactor initially and proceed to build the
 proactor on top of that (not on top for windows and IOCP of course).
 Reactor and proactor designs each have their forces and therefore I do
 not think targeting a proactor only design is the way to go.

I agree. General library should provides two approaches. My main purpose is high-performance RPC. In such cases, Proactor is better(Java's Netty provides good abstraction APIs).
May 29 2011
prev sibling next sibling parent Jimmy Cao <jcao219 gmail.com> writes:
--001485f9ede4c6082a04a4725eca
Content-Type: text/plain; charset=ISO-8859-1

Is there any reason why this code (using SMTP + Curl):
https://gist.github.com/998214 segfaults on Windows but not on Fedora 15?

I found an easy fix, but it looks like a bug somewhere that I can't figure
out.

--001485f9ede4c6082a04a4725eca
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Is there any reason why this code (using SMTP + Curl): =A0<meta http-equiv=
=3D"content-type" content=3D"text/html; charset=3Dutf-8"><a href=3D"https:/=
/gist.github.com/998214">https://gist.github.com/998214</a>=A0segfaults on =
Windows but not on Fedora 15?<div>

<br><div>I found an easy fix, but it looks like a bug somewhere that I can&=
#39;t figure out.</div></div>

--001485f9ede4c6082a04a4725eca--
May 29 2011
prev sibling parent Jimmy Cao <jcao219 gmail.com> writes:
--0016e640d17431e4fe04a4821522
Content-Type: text/plain; charset=ISO-8859-1

On Mon, May 30, 2011 at 12:38 PM, jdrewsen <jdrewsen nospam.com> wrote:

 Den 30-05-2011 01:16, Jimmy Cao skrev:

  Is there any reason why this code (using SMTP + Curl):
 https://gist.github.com/998214 segfaults on Windows but not on Fedora 15?

 I found an easy fix, but it looks like a bug somewhere that I can't
 figure out.

etc.curl is currently a moving target. The last version I pushed to github is outdated. Currently the focus is on getting http and ftp working and the other protocol hasn't been tested. Out of curiosity I would very much like to known what your easy fix was? /Jonas

Uncomment line 19 and it works. --0016e640d17431e4fe04a4821522 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On Mon, May 30, 2011 at 12:38 PM, jdrewsen <span dir=3D"ltr">&lt;<a href=3D= "mailto:jdrewsen nospam.com">jdrewsen nospam.com</a>&gt;</span> wrote:<br><= div class=3D"gmail_quote"><blockquote class=3D"gmail_quote" style=3D"margin= :0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"> Den 30-05-2011 01:16, Jimmy Cao skrev:<div><div></div><div class=3D"h5"><br=

x #ccc solid;padding-left:1ex"> Is there any reason why this code (using SMTP + Curl):<br> <a href=3D"https://gist.github.com/998214" target=3D"_blank">https://gist.g= ithub.com/998214</a> segfaults on Windows but not on Fedora 15?<br> <br> I found an easy fix, but it looks like a bug somewhere that I can&#39;t<br> figure out.<br> </blockquote> <br></div></div> etc.curl is currently a moving target. The last version I pushed to github = is outdated. Currently the focus is on getting http and ftp working and the= other protocol hasn&#39;t been tested.<br> <br> Out of curiosity I would very much like to known what your easy fix was?<br=

/Jonas<br> </blockquote></div><br><div><br></div><div>Uncomment line 19 and it works.<= /div> --0016e640d17431e4fe04a4821522--
May 30 2011
prev sibling next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
On Mon, 16 May 2011 17:07:57 -0400, jdrewsen <jdrewsen nospam.com> wrote:

 Hi,

     I've been working on a wrapper for the etc.c.curl module. It is now  
 pretty stable and I would very much like some feedback on the API.

 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 BTW I use template mixins which doesn't seem to get included in the  
 generated docs. Is there any way I can make this work?

 /Jonas

I think the API looks good with regard to the core functionality it provides, but the documentation is sub-par, an certain structs should be defined inside the Http class. I've listed a bunch of comments I had while reading the documentation below. One minor question regarding the actual implementation: do you define a pragma(lib, ...) to ease setup/tool chain pains, or do you intend something different? Regarding the First example: * Use string instead of const(char)[] * Don't waste vertical space with empty comment lines like this // // This comment style // * Comment lines should be grammatically correct, i.e. '// GET using async range' should be '// GET using an asynchronous range' or '// Perform a GET using an asynchronous range', etc * Don't use 'l' as a variable. It looks like a 1. Also, using 'line' would make the meaning clearer * Example code should not exceed 80 chars in width. People's browser setups vary, and wrapped lines are bad. * You can drop the 'delegate size_t' from the PUT example by making the return 0u, 0U or size_t.init. * The last example is missing a description. (Also, it goes over 80 chars) The license section should include a mention of libcurl's license, etc in addition to the wrapper's license. In general, for non-Boost libraries like this, should we include the license/copyright as a string/tuple enum? (i.e. for use in about boxes / command line options) Should said license/copyright be register to a global repository upon module import? (i.e. for auto-magical about boxes, etc) 'Do not use the same instance of this class in two threads simultanously.' First, you need to spell check all your documentation (i.e. simultaneously vs simultanously). Second, this should be assumed by most programmers to be true of all non-shared classes. Third, if you do want to leave this line in, use something short and sweet, like 'Http is not thread safe', instead. You need to look at/think about the order in which things are declared. Having things like HttpMethod declared long after Http doesn't make any sense. Also, it looks like you're not using '///ditto' for HttpMethod. On that note, why is HttpMethod a free-standing enum and not declared inside http? Http.Method, Http.Result, etc. make a lot more sense. The setTimeCondition documentation seems a bit confusing. I'd recommend stream-lining the convenience function documentation. For example, http.head only requires a /// Returns: An HttpResult containing the url's headers. Alternatively, you could use ditto and merge the documentation and examples for head/get/etc into a single block. Also, headAsync seems to document a data argument, but not take one, while post has an undocumented parameter, etc. In general, you might want to consider removing some of the redundant/self-obvious parameter documentation blocks. On second thought, use ditto, slim down the docs and move this set of functions either to the beginning or end of Http. It would make sense to put these right after the definition of Http.Method/method. You also might want to consider grouping all the xxxAsync functions together, instead of interleaving them. Check the grammar in http.postData http.onReceiveHeader has issues with it's 'See' link. It's example is too wide. You don't need really need the parameter documentation, particularly if you improve the example, i.e.: string[string] headerInformation; with(auto http = new Http("http://www.google.com")) { onReceiveHeader = (string key, string value) { if ( value.length <= 10 ) headerInformation[key.idup] = value.idup; }; onReceive = (ubyte[] data) { }; perform; } Also, an tab/indent should be 4 spaces not 5 and no example should start indented. Remember to comment like this: /** My regular comment * Example: --- // Example code --- * Other Stuff: */ You need to fix/complete onSend's documentation. contentLength(size_t len); => contentLength(size_t length); 'Perform http request' => 'Perform an http request' AuthMethod/CurlAuth need to be part of docs. Why is setAuthenticationMethod not authenticationMethod or authentication? (And why is AuthMethod not AuthenticationMethod?) Similarly, why is setTimeCondition not timeCondition and CurlTimeCond not Http.TimeCondition? Why is followLocation(int maxRedirs) not named maxRedirections? HttpResult.text should have huge warning flags with regard to use and/or have its design re-thought. Is there some reason you're not at least caching the result and throwing errors on invalid uses of content, etc. Calling text multiple times should be okay. And calling content after text should be either logically valid, or throw. AsyncHttpResult.byLine/byChunk examples is improperly indented and too wide. Why is HttpStatusLine not Http.Status?
May 16 2011
parent reply Jonas Drewsen <jdrewsen nospam.com> writes:
On 17/05/11 08.03, Robert Jacques wrote:
 On Mon, 16 May 2011 17:07:57 -0400, jdrewsen <jdrewsen nospam.com> wrote:

 Hi,

 I've been working on a wrapper for the etc.c.curl module. It is now
 pretty stable and I would very much like some feedback on the API.

 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 BTW I use template mixins which doesn't seem to get included in the
 generated docs. Is there any way I can make this work?

 /Jonas

I think the API looks good with regard to the core functionality it provides, but the documentation is sub-par, an certain structs should be defined inside the Http class. I've listed a bunch of comments I had while reading the documentation below. One minor question regarding the actual implementation: do you define a pragma(lib, ...) to ease setup/tool chain pains, or do you intend something different?

I define a pragma yes.
 Regarding the First example:
 * Use string instead of const(char)[]

 * Don't waste vertical space with empty comment lines like this

 //
 // This comment style
 //
 * Comment lines should be grammatically correct, i.e. '// GET using
 async range' should be '// GET using an asynchronous range' or '//
 Perform a GET using an asynchronous range', etc

 * Don't use 'l' as a variable. It looks like a 1. Also, using 'line'
 would make the meaning clearer

 * Example code should not exceed 80 chars in width. People's browser
 setups vary, and wrapped lines are bad.

 * You can drop the 'delegate size_t' from the PUT example by making the
 return 0u, 0U or size_t.init.

 * The last example is missing a description. (Also, it goes over 80 chars)

 The license section should include a mention of libcurl's license, etc
 in addition to the wrapper's license. In general, for non-Boost
 libraries like this, should we include the license/copyright as a
 string/tuple enum? (i.e. for use in about boxes / command line options)
 Should said license/copyright be register to a global repository upon
 module import? (i.e. for auto-magical about boxes, etc)

special string/tuple has been decided upon I'll leave this out.
 'Do not use the same instance of this class in two threads
 simultanously.' First, you need to spell check all your documentation
 (i.e. simultaneously vs simultanously). Second, this should be assumed
 by most programmers to be true of all non-shared classes. Third, if you
 do want to leave this line in, use something short and sweet, like 'Http
 is not thread safe', instead.

 You need to look at/think about the order in which things are declared.
 Having things like HttpMethod declared long after Http doesn't make any
 sense. Also, it looks like you're not using '///ditto' for HttpMethod.

I don't know what ditto does. I'll try to look at the ddoc documentation.
 On that note, why is HttpMethod a free-standing enum and not declared
 inside http? Http.Method, Http.Result, etc. make a lot more sense.

 The setTimeCondition documentation seems a bit confusing.

 I'd recommend stream-lining the convenience function documentation. For
 example, http.head only requires a /// Returns: An HttpResult containing
 the url's headers. Alternatively, you could use ditto and merge the
 documentation and examples for head/get/etc into a single block. Also,
 headAsync seems to document a data argument, but not take one, while
 post has an undocumented parameter, etc. In general, you might want to
 consider removing some of the redundant/self-obvious parameter
 documentation blocks. On second thought, use ditto, slim down the docs
 and move this set of functions either to the beginning or end of Http.
 It would make sense to put these right after the definition of
 Http.Method/method. You also might want to consider grouping all the
 xxxAsync functions together, instead of interleaving them.

As mentioned I need to figure out what ditto does. headAsync comment does not match its signature. I'll fix. I've noticed that trying out examples live in the docs may be activated at some point. If that is the case I think it is nice to have running examples of each method.
 Check the grammar in http.postData

 http.onReceiveHeader has issues with it's 'See' link. It's example is
 too wide. You don't need really need the parameter documentation,
 particularly if you improve the example, i.e.:
 string[string] headerInformation;
 with(auto http = new Http("http://www.google.com")) {
 onReceiveHeader = (string key, string value) {
 if ( value.length <= 10 )
 headerInformation[key.idup] = value.idup;
 };
 onReceive = (ubyte[] data) { };
 perform;
 }

I see your point. The question is how we would like phobos docs to be: 1, consistent with all parameters documented for each method and each method explained and exemplified 2, minimal with examples and explanations where absolutely needed. or somewhere in between. The current style if the curl wrapper is closer to 1 than 2. Anything official about this?
 Also, an tab/indent should be 4 spaces not 5 and no example should start
 indented. Remember to comment like this:
 /** My regular comment
 * Example:
 ---
 // Example code
 ---
 * Other Stuff:
 */

 You need to fix/complete onSend's documentation.

 contentLength(size_t len); => contentLength(size_t length);

 'Perform http request' => 'Perform an http request'

 AuthMethod/CurlAuth need to be part of docs.

 Why is setAuthenticationMethod not authenticationMethod or
 authentication? (And why is AuthMethod not AuthenticationMethod?)
 Similarly, why is setTimeCondition not timeCondition and CurlTimeCond
 not Http.TimeCondition?

AuthMethod because that's what it is called in libcurl. I like the short form better and have done the same for AsyncHttpResult (not AsynchronousHttpResult). setTimeCondition cannot be made into a property since it takes two arguments. CurlTimeCond is not calld Http.TimeCondition because it comes from a etc.c.curl. I could alias it in Http ofcourse.
 Why is followLocation(int maxRedirs) not named maxRedirections?

This is what it is called in libcurl. I think I'll rename it to maxRedirections anyway since it seems to confuse.
 HttpResult.text should have huge warning flags with regard to use and/or
 have its design re-thought. Is there some reason you're not at least
 caching the result and throwing errors on invalid uses of content, etc.
 Calling text multiple times should be okay. And calling content after
 text should be either logically valid, or throw.

It is mostly based on storage concerns. Initially the data is stored as ubyte[] and can be returned by content(). When calling text() to get a string it will be decoded and returned. I now have a coule of options: 1, to store the string in the result for future calls to text(). I can do that and keep the original data for future calls to content(). This will double the amount of memory needed. 2, generated the string on each call to text(). This way no copy of the data is kept in HttpResult. 3, generated and store the string and then throw away the original data. Again no copy is kept in HttpResult. I've chosen the 2. version since I guessed that once you have called text() it is more likely that you'll call text again and not content().
 AsyncHttpResult.byLine/byChunk examples is improperly indented and too
 wide.

 Why is HttpStatusLine not Http.Status?

Thank you for your feedback!
May 17 2011
parent jdrewsen <jdrewsen nospam.com> writes:
Den 17-05-2011 16:57, Robert Jacques skrev:
 On Tue, 17 May 2011 07:24:16 -0400, Jonas Drewsen <jdrewsen nospam.com>
 wrote:
 On 17/05/11 08.03, Robert Jacques wrote:
 On Mon, 16 May 2011 17:07:57 -0400, jdrewsen <jdrewsen nospam.com>
 wrote:

 Hi,

 I've been working on a wrapper for the etc.c.curl module. It is now
 pretty stable and I would very much like some feedback on the API.

 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 BTW I use template mixins which doesn't seem to get included in the
 generated docs. Is there any way I can make this work?

 /Jonas



 I don't know what ditto does. I'll try to look at the ddoc documentation.

"If a documentation comment for a declaration consists only of the identifier ditto then the documentation comment for the previous declaration at the same declaration scope is applied to this declaration as well." [snip]
 http.onReceiveHeader has issues with it's 'See' link. It's example is
 too wide. You don't need really need the parameter documentation,
 particularly if you improve the example, i.e.:
 string[string] headerInformation;
 with(auto http = new Http("http://www.google.com")) {
 onReceiveHeader = (string key, string value) {
 if ( value.length <= 10 )
 headerInformation[key.idup] = value.idup;
 };
 onReceive = (ubyte[] data) { };
 perform;
 }

I see your point. The question is how we would like phobos docs to be: 1, consistent with all parameters documented for each method and each method explained and exemplified 2, minimal with examples and explanations where absolutely needed. or somewhere in between. The current style if the curl wrapper is closer to 1 than 2. Anything official about this?

Official? I don't know. But looking over std.algorithm, Param: is only used 3 times, for each of the find variants, in order to clarify what haystack and needle are and in one case make the limitations/requirements on haystack/needle explicit. Looking over the rest of phobos, I see a mix of styles, depending on the author, and some things (like datetime) seem to do both. [snip]
 CurlTimeCond is not calld Http.TimeCondition because it comes from a
 etc.c.curl. I could alias it in Http ofcourse.

I think where it's defined is less of an issue than where the documentation is. [snip]
 HttpResult.text should have huge warning flags with regard to use and/or
 have its design re-thought. Is there some reason you're not at least
 caching the result and throwing errors on invalid uses of content, etc.
 Calling text multiple times should be okay. And calling content after
 text should be either logically valid, or throw.

It is mostly based on storage concerns. Initially the data is stored as ubyte[] and can be returned by content(). When calling text() to get a string it will be decoded and returned. I now have a coule of options: 1, to store the string in the result for future calls to text(). I can do that and keep the original data for future calls to content(). This will double the amount of memory needed. 2, generated the string on each call to text(). This way no copy of the data is kept in HttpResult. 3, generated and store the string and then throw away the original data. Again no copy is kept in HttpResult. I've chosen the 2. version since I guessed that once you have called text() it is more likely that you'll call text again and not content().

Hmm... I take it that content and text are going to be large enough, and result long lived enough to warrant not having the array cached inside result. In that case you might want to rename text to toString or decode. 'text' feels like it should be cheaply reusable, while 'toString'/'decode' doesn't. (this is a verb/noun thing) Also, 'toString' tends to compose better with the rest of D while 'decode' is similar to the std.utf functions. Also, if memory re-use is a concern, a 'toStringInPlace' or 'decodeInPlace' with the old behavior might be appropriate, although I might recommend setting content to the utf-8 string and updating the encodingSchemeName as appropriate.

I think I'll go for the toString() version. /Jonas
May 17 2011
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2011-05-16 23:07, jdrewsen wrote:
 Hi,

 I've been working on a wrapper for the etc.c.curl module. It is now
 pretty stable and I would very much like some feedback on the API.

 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 BTW I use template mixins which doesn't seem to get included in the
 generated docs. Is there any way I can make this work?

 /Jonas

I like the API, seems simple. -- /Jacob Carlborg
May 17 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Tue, 17 May 2011 07:24:16 -0400, Jonas Drewsen <jdrewsen nospam.com>  
wrote:
 On 17/05/11 08.03, Robert Jacques wrote:
 On Mon, 16 May 2011 17:07:57 -0400, jdrewsen <jdrewsen nospam.com>  
 wrote:

 Hi,

 I've been working on a wrapper for the etc.c.curl module. It is now
 pretty stable and I would very much like some feedback on the API.

 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 BTW I use template mixins which doesn't seem to get included in the
 generated docs. Is there any way I can make this work?

 /Jonas



 I don't know what ditto does. I'll try to look at the ddoc documentation.

"If a documentation comment for a declaration consists only of the identifier ditto then the documentation comment for the previous declaration at the same declaration scope is applied to this declaration as well." [snip]
 http.onReceiveHeader has issues with it's 'See' link. It's example is
 too wide. You don't need really need the parameter documentation,
 particularly if you improve the example, i.e.:
 string[string] headerInformation;
 with(auto http = new Http("http://www.google.com")) {
 onReceiveHeader = (string key, string value) {
 if ( value.length <= 10 )
 headerInformation[key.idup] = value.idup;
 };
 onReceive = (ubyte[] data) { };
 perform;
 }

I see your point. The question is how we would like phobos docs to be: 1, consistent with all parameters documented for each method and each method explained and exemplified 2, minimal with examples and explanations where absolutely needed. or somewhere in between. The current style if the curl wrapper is closer to 1 than 2. Anything official about this?

Official? I don't know. But looking over std.algorithm, Param: is only used 3 times, for each of the find variants, in order to clarify what haystack and needle are and in one case make the limitations/requirements on haystack/needle explicit. Looking over the rest of phobos, I see a mix of styles, depending on the author, and some things (like datetime) seem to do both. [snip]
 CurlTimeCond is not calld Http.TimeCondition because it comes from a  
 etc.c.curl. I could alias it in Http ofcourse.

I think where it's defined is less of an issue than where the documentation is. [snip]
 HttpResult.text should have huge warning flags with regard to use and/or
 have its design re-thought. Is there some reason you're not at least
 caching the result and throwing errors on invalid uses of content, etc.
 Calling text multiple times should be okay. And calling content after
 text should be either logically valid, or throw.

It is mostly based on storage concerns. Initially the data is stored as ubyte[] and can be returned by content(). When calling text() to get a string it will be decoded and returned. I now have a coule of options: 1, to store the string in the result for future calls to text(). I can do that and keep the original data for future calls to content(). This will double the amount of memory needed. 2, generated the string on each call to text(). This way no copy of the data is kept in HttpResult. 3, generated and store the string and then throw away the original data. Again no copy is kept in HttpResult. I've chosen the 2. version since I guessed that once you have called text() it is more likely that you'll call text again and not content().

Hmm... I take it that content and text are going to be large enough, and result long lived enough to warrant not having the array cached inside result. In that case you might want to rename text to toString or decode. 'text' feels like it should be cheaply reusable, while 'toString'/'decode' doesn't. (this is a verb/noun thing) Also, 'toString' tends to compose better with the rest of D while 'decode' is similar to the std.utf functions. Also, if memory re-use is a concern, a 'toStringInPlace' or 'decodeInPlace' with the old behavior might be appropriate, although I might recommend setting content to the utf-8 string and updating the encodingSchemeName as appropriate.
May 17 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

jdrewsen wrote:
Please see comments below.

Den 17-05-2011 16:42, Andrei Alexandrescu skrev:
 Thanks for the response! A few more answers and comments within
 (everything deleted counts as "sounds great").

 On 5/17/11 3:50 AM, Jonas Drewsen wrote:
 14. Isn't the max redirect configurable via a parameter?

Yes. It is called Http.followLocation from libcurls followLocation option. I will rename it to maxRedirections for clearity.

You mention this about many higher-level functions: "Maximum redirections are set to 10." Then I'm thinking, if it's an important parameter, make it a defaulted parameter for the function.

I guess I should just remove that comment because other defaults has=20 been selected as well e.g. timeouts.
 16. Documentation should point to descriptions of the HTTP methods
 wrapped (e.g. "post", "del" etc).

Do you mean point to the relevant RFC?

Yah, or a good tutorial. (I didn't know DEL existed!)

Well DEL is actually called DELETE in the HTTP RFC. But delete is a=20 reserved word i D so I used del() instead. delete() wouldn't work in=20 following code in think: with(auto http =3D ...) { delete(...); }
 22. byLine/byChunk should not expose a string or generally data
 that can be shared safely by the client. That's because it would
 need to create a new string with each iteration, which is very
 inefficient. Instead, they should expose char[]/ubyte[] just like
 File.byLine does, and reuse the buffer with each call. Essentially
 byLine/byChunk should be near-zero-overhead means to transfer and
 inspect arbitrarily large streams.

byLine/byChunk is currenly only available for the async methods which is implemented using message passing. This means that for passing the message a copy has to be made because a message is by-value or immutable data. Is there another way for me to pass the message without doing a copy... some kind of move semantic?

A library element is planned but not on the short list yet. You can work around that by internally doing a cast. As long as you observe the commitment that buffers are not accessed simultaneously by more than one thread you're well within defined behavior.

ok nice to know.
 One more suggestion - you may want to also provide classic boring
 synchronous read/write methods that take a user-provided buffer. I'm
 sure people could use such e.g. when they want to stream data inside
 their own threads, or even in the main thread if they don't mind
 blocking.

This would mean hooking into libcurl and selecting on its sockets.=20 Totally doable.

api is possible, but select is only used to wait until data is ready, reading and writing the data is still done by curl. And as curl does not allow you to supply the data buffer, having user-provided buffers is afaik impossible.
But I would rather stop here feature wise and wrap
this thing up. I would like to focus my efforts on a candidate for=20
std.net.event/reactor/proactor module.

Have you already started working on std.net.event? Do you plan to base that on libev / libevent? I've been trying to get familiar with libev lately, so I have bindings for livev 3.9 / 4.04 (combined, version can be selected with version statements): http://dl.dropbox.com/u/24218791/d/src/libev.html (comments in the c header have been turned into doc comments. The binding is partially based on Leandro Lucarella bindings: http://git.llucax.com.ar/?p=3Dsoftware/ev.d.git) --=20 Johannes Pfau
May 18 2011
prev sibling next sibling parent Johannes Pfau <spam example.com> writes:
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

Jonas Drewsen wrote:
On 18/05/11 10.09, Johannes Pfau wrote:
 jdrewsen wrote:
 Please see comments below.

 Den 17-05-2011 16:42, Andrei Alexandrescu skrev:
 Thanks for the response! A few more answers and comments within
 (everything deleted counts as "sounds great").

 On 5/17/11 3:50 AM, Jonas Drewsen wrote:
 14. Isn't the max redirect configurable via a parameter?

Yes. It is called Http.followLocation from libcurls followLocation option. I will rename it to maxRedirections for clearity.

You mention this about many higher-level functions: "Maximum redirections are set to 10." Then I'm thinking, if it's an important parameter, make it a defaulted parameter for the function.

I guess I should just remove that comment because other defaults has been selected as well e.g. timeouts.
 16. Documentation should point to descriptions of the HTTP
 methods wrapped (e.g. "post", "del" etc).

Do you mean point to the relevant RFC?

Yah, or a good tutorial. (I didn't know DEL existed!)

Well DEL is actually called DELETE in the HTTP RFC. But delete is a reserved word i D so I used del() instead. delete() wouldn't work in following code in think: with(auto http =3D ...) { delete(...); }
 22. byLine/byChunk should not expose a string or generally data
 that can be shared safely by the client. That's because it would
 need to create a new string with each iteration, which is very
 inefficient. Instead, they should expose char[]/ubyte[] just like
 File.byLine does, and reuse the buffer with each call.
 Essentially byLine/byChunk should be near-zero-overhead means to
 transfer and inspect arbitrarily large streams.

byLine/byChunk is currenly only available for the async methods which is implemented using message passing. This means that for passing the message a copy has to be made because a message is by-value or immutable data. Is there another way for me to pass the message without doing a copy... some kind of move semantic?

A library element is planned but not on the short list yet. You can work around that by internally doing a cast. As long as you observe the commitment that buffers are not accessed simultaneously by more than one thread you're well within defined behavior.

ok nice to know.
 One more suggestion - you may want to also provide classic boring
 synchronous read/write methods that take a user-provided buffer.
 I'm sure people could use such e.g. when they want to stream data
 inside their own threads, or even in the main thread if they don't
 mind blocking.

This would mean hooking into libcurl and selecting on its sockets. Totally doable.

blocking api is possible, but select is only used to wait until data is ready, reading and writing the data is still done by curl. And as curl does not allow you to supply the data buffer, having user-provided buffers is afaik impossible.

_Doable_ - not easy :) Select will wait for data to be ready and ask curl to handle the data=20 chunk. Curl in turn calls back to a registered callback handler with the data read. That handler fills the buffer provided by the user. If not enough data has been receive an new select is performed until the=20 requested amount of data is read. Then the blocking method can return.

I think it's not necessary to do multiple calls if the first didn't return enough data. Most read calls are only guaranteed to return a maximum of data, less is always possible. It's a bigger problem if the user supplied buffer is smaller than the curl buffer. And you always end up copying data. So it might be better to just return a reference to the buffer allocated by curl, or just forget about this api, curl just doesn't work well this way.
 But I would rather stop here feature wise and wrap
 this thing up. I would like to focus my efforts on a candidate for
 std.net.event/reactor/proactor module.

Have you already started working on std.net.event? Do you plan to base that on libev / libevent?

I'll finish the curl wrapper before starting on it. I have had a look at libev/event before and they are very nice libs. But I think I'll have to implement it from scratch for two reasons: 1, AFAIK we cannot include code in std phobos that is not boost licensed or a license as liberal. libev for example requires you to distribute the license with your software.

That could indeed be a problem. I think libev/libevent even have the same license. Probably it's really best to implement it from scratch, but it will be quite some work. I'd still have a look at the libev documentation though, it contains some details about OS quirks. And try to avoid Linux AIO ;-)
2, It introduces a dependency to an external project in phobos.=20
Currently no dependencies are present. This makes phobos very easy to=20
install and use out of the box.

The optimal solution to these problems from my point of view would be
to get that "much discussed" package tool in place soon (CPAN,apt
alike).

Heck, now I think about it maybe I should do that before any std.net=20
stuff. Let's see how the wind blows - many interesting projects :)

 I've been trying to get familiar with libev lately, so I have
 bindings for livev 3.9 / 4.04 (combined, version can be selected
 with version statements):
 http://dl.dropbox.com/u/24218791/d/src/libev.html (comments in the c
 header have been turned into doc comments. The binding is partially
 based on Leandro Lucarella bindings:
 http://git.llucax.com.ar/?p=3Dsoftware/ev.d.git)

Very nice. I've also stumbled upon some other async D libs out there=20 that could be a good starting point. It seems that Leandro has both bindings and a wrapper in place. And a=20 wrapper is really nice to have to make it more D'ish.

Yes, I have a higher level wrapper for libev as well, but the new 'shared' stuff made it a lot more complicated to get that right.
/Jonas

Johannes Pfau
May 18 2011
prev sibling parent bls <bizprac orange.fr> writes:
Am 16.05.2011 23:07, schrieb jdrewsen:
 Hi,

 I've been working on a wrapper for the etc.c.curl module. It is now
 pretty stable and I would very much like some feedback on the API.

 http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

 BTW I use template mixins which doesn't seem to get included in the
 generated docs. Is there any way I can make this work?

 /Jonas

You really think this is good for your mental health ? // PUT with data senders string msg = "Hello world"; -------------------------------- http.onSend = (void[] data) { if (msg.empty) return 0; auto m = cast(void[])msg; typeof(size_t) len = m.length; data[0..len] = m[0..$]; msg.length = 0; return len; }; Remarkable stuff, indeed ! I am not even sure that this thingy compiles , but fuc da duc, at least it shows what you CAN do in D. :)
May 23 2011