digitalmars.D - Asynchronicity in D
- Max Klyga <max.klyga gmail.com> Mar 31 2011
- Piotr Szturmaj <bncrbme jadamspam.pl> Mar 31 2011
- dsimcha <dsimcha yahoo.com> Mar 31 2011
- Piotr Szturmaj <bncrbme jadamspam.pl> Mar 31 2011
- Aleksandar Ružičić <ruzicic.aleksandar gmail.com> Mar 31 2011
- Aleksandar Ružičić <ruzicic.aleksandar gmail.com> Mar 31 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Mar 31 2011
- dsimcha <dsimcha yahoo.com> Mar 31 2011
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Mar 31 2011
- dsimcha <dsimcha yahoo.com> Mar 31 2011
- Jonas Drewsen <jdrewsen nospam.com> Apr 01 2011
- Robert Clipsham <robert octarineparrot.com> Mar 31 2011
- Robert Clipsham <robert octarineparrot.com> Mar 31 2011
- dsimcha <dsimcha yahoo.com> Mar 31 2011
- Jonas Drewsen <jdrewsen nospam.com> Mar 31 2011
- Jonas Drewsen <jdrewsen nospam.com> Mar 31 2011
- dsimcha <dsimcha yahoo.com> Mar 31 2011
- dsimcha <dsimcha yahoo.com> Apr 01 2011
- Robert Clipsham <robert octarineparrot.com> Apr 05 2011
- Andrej Mitrovic <andrej.mitrovich gmail.com> Mar 31 2011
- "Steven Schveighoffer" <schveiguy yahoo.com> Mar 31 2011
- Torarin <torarind gmail.com> Mar 31 2011
- Jonas Drewsen <jdrewsen nospam.com> Mar 31 2011
- Max Klyga <max.klyga gmail.com> Mar 31 2011
- Jonas Drewsen <jdrewsen nospam.com> Mar 31 2011
- dsimcha <dsimcha yahoo.com> Mar 31 2011
- Jonas Drewsen <jdrewsen nospam.com> Apr 01 2011
- dsimcha <dsimcha yahoo.com> Apr 01 2011
- Jonas Drewsen <jdrewsen nospam.com> Apr 01 2011
- Jonas Drewsen <jdrewsen nospam.com> Apr 01 2011
- Piotr Szturmaj <bncrbme jadamspam.pl> Apr 01 2011
- dsimcha <dsimcha yahoo.com> Apr 01 2011
- dsimcha <dsimcha yahoo.com> Apr 01 2011
- Sean Kelly <sean invisibleduck.org> Apr 01 2011
- Max Klyga <max.klyga gmail.com> Apr 04 2011
- Jonas Drewsen <jdrewsen nospam.com> Apr 05 2011
- Brad Roberts <braddr slice-2.puremagic.com> Apr 01 2011
- Max Klyga <max.klyga gmail.com> Mar 31 2011
- Max Klyga <max.klyga gmail.com> Mar 31 2011
- Sean Kelly <sean invisibleduck.org> Mar 31 2011
- Torarin <torarind gmail.com> Apr 01 2011
- Sean Kelly <sean invisibleduck.org> Apr 01 2011
- Sean Kelly <sean invisibleduck.org> Apr 01 2011
- Sean Kelly <sean invisibleduck.org> Apr 01 2011
- Brad Roberts <braddr puremagic.com> Apr 01 2011
- Sean Kelly <sean invisibleduck.org> Apr 01 2011
- Fawzi Mohamed <fawzi gmx.ch> Apr 02 2011
- Sean Kelly <sean invisibleduck.org> Apr 04 2011
- Jose Armando Garcia <jsancio gmail.com> Apr 04 2011
- Sean Kelly <sean invisibleduck.org> Apr 01 2011
I've been thinking about what I can change in my GSoC proposal to make it stronger, and noticed that Phobos currently does not address asynchronous I/O of any kind. A number of threads on this newsgroup have mentioned this problem or shown how other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
Mar 31 2011
Max Klyga wrote: I've been thinking about what I can change in my GSoC proposal to make it stronger, and noticed that Phobos currently does not address asynchronous I/O of any kind. A number of threads on this newsgroup have mentioned this problem or shown how other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
Yes, an asynchronous networking API would be more scalable. If you're collecting information about async IO, please take a look at libevent and libev, and also at NT's completion ports, FreeBSD's kqueue and Linux's epoll. Protocols implemented using event-driven APIs should scale to thousands of connections using a few worker threads. Moreover, async protocols can be wrapped to be synchronous (but not the other way around) and used just like the well-known blocking APIs. Basically, with async IO you request some data to be written and then wait for a completion event (usually delivered through a callback function). Then you request that some allocated buffer be read into and wait until the network stack fills it up. You do not block as you would with send() or recv(); instead, you can do some useful processing between events.
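A minimal sketch of the request-then-completion-callback flow described above. It is a toy: `ToyLoop`, `requestRead` and `eventLoopDemo` are hypothetical names, and the "completion" is just a queued delegate, where a real loop would wait on epoll, kqueue or a completion port.

```d
import std.stdio : writeln;

/// Toy single-threaded event loop (hypothetical). Pending operations are
/// queued delegates; a real loop would block on epoll/kqueue/IOCP instead.
struct ToyLoop
{
    private void delegate()[] completions;

    /// Pretend to start a read; onDone fires later with the "result".
    void requestRead(string fakeData, void delegate(string) onDone)
    {
        completions ~= () { onDone(fakeData); };
    }

    /// Dispatch completion events until none are left.
    void run()
    {
        while (completions.length)
        {
            auto next = completions[0];
            completions = completions[1 .. $];
            next();
        }
    }
}

/// Issues one request, does "other work", then processes the completion.
string[] eventLoopDemo()
{
    string[] log;
    ToyLoop loop;
    loop.requestRead("hello", (string data) { log ~= "got " ~ data; });
    log ~= "request issued, doing other work";
    loop.run();
    return log;
}

void main()
{
    foreach (entry; eventLoopDemo())
        writeln(entry);
}
```

The point of the sketch is only the ordering: the caller keeps running after issuing the request, and the callback runs when the loop dispatches the completion.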
Mar 31 2011
== Quote from Piotr Szturmaj (bncrbme jadamspam.pl)'s article
Max Klyga wrote: I've been thinking about what I can change in my GSoC proposal to make it stronger, and noticed that Phobos currently does not address asynchronous I/O of any kind. A number of threads on this newsgroup have mentioned this problem or shown how other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
Yes, an asynchronous networking API would be more scalable. If you're collecting information about async IO, please take a look at libevent and libev, and also at NT's completion ports, FreeBSD's kqueue and Linux's epoll. Protocols implemented using event-driven APIs should scale to thousands of connections using a few worker threads. Moreover, async protocols can be wrapped to be synchronous (but not the other way around) and used just like the well-known blocking APIs. Basically, with async IO you request some data to be written and then wait for a completion event (usually delivered through a callback function). Then you request that some allocated buffer be read into and wait until the network stack fills it up. You do not block as you would with send() or recv(); instead, you can do some useful processing between events.
Forgive any naiveness here, but isn't this just a special case of future/promise parallelism? Using my proposed std.parallelism module:
    auto myTask = task(&someNetworkClass.recv);
    // Use a new thread, but this could also be executed on a task
    // queue to keep the number of threads down.
    myTask.executeInNewThread();
    // Do other stuff.
    auto recvResults = myTask.yieldWait();
    // Do stuff with recvResults
If I understand correctly (though it's very likely I don't, since I've never written any serious networking code before), such a thing can and should be implemented on top of more general parallelism primitives rather than being baked directly into the networking design.
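For reference, a runnable sketch of the same future/promise idea against std.parallelism as it later shipped in Phobos, where the waiting call is spelled `yieldForce` rather than the `yieldWait` of the proposal. `fakeRecv` and `fetchInBackground` are hypothetical stand-ins; no real networking happens.

```d
import std.parallelism : task;
import std.stdio : writeln;

// Stand-in for a blocking network receive (hypothetical, no real I/O).
string fakeRecv(string host)
{
    return "response from " ~ host;
}

// Runs the blocking call in its own thread and waits for the result.
string fetchInBackground(string host)
{
    auto t = task!fakeRecv(host);
    t.executeInNewThread();
    // ... do other stuff here while the task runs ...
    return t.yieldForce; // blocks until the task has finished
}

void main()
{
    writeln(fetchInBackground("example.org"));
}
```

Note that this still dedicates a thread to each blocking call, which is the difference from OS-level async I/O that the reply below goes on to describe.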
Mar 31 2011
dsimcha wrote:
== Quote from Piotr Szturmaj (bncrbme jadamspam.pl)'s article
Max Klyga wrote: I've been thinking about what I can change in my GSoC proposal to make it stronger, and noticed that Phobos currently does not address asynchronous I/O of any kind. A number of threads on this newsgroup have mentioned this problem or shown how other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
Yes, an asynchronous networking API would be more scalable. If you're collecting information about async IO, please take a look at libevent and libev, and also at NT's completion ports, FreeBSD's kqueue and Linux's epoll. Protocols implemented using event-driven APIs should scale to thousands of connections using a few worker threads. Moreover, async protocols can be wrapped to be synchronous (but not the other way around) and used just like the well-known blocking APIs. Basically, with async IO you request some data to be written and then wait for a completion event (usually delivered through a callback function). Then you request that some allocated buffer be read into and wait until the network stack fills it up. You do not block as you would with send() or recv(); instead, you can do some useful processing between events.
Forgive any naiveness here, but isn't this just a special case of future/promise parallelism? Using my proposed std.parallelism module:
    auto myTask = task(&someNetworkClass.recv);
    // Use a new thread, but this could also be executed on a task
    // queue to keep the number of threads down.
    myTask.executeInNewThread();
    // Do other stuff.
    auto recvResults = myTask.yieldWait();
    // Do stuff with recvResults
If I understand correctly (though it's very likely I don't, since I've never written any serious networking code before), such a thing can and should be implemented on top of more general parallelism primitives rather than being baked directly into the networking design.
Asynchronous tasks are a great thing, but async networking IO, a.k.a. overlapped IO, is something different. Its efficiency comes from direct interaction with the operating system. With tasks you need one thread for each task, whereas with overlapped IO you just request some well-known IO operation, which is completed by the OS in the background. You don't need any threads besides those which handle completion events. Here is a good explanation of how it works in WinNT: http://en.wikipedia.org/wiki/Overlapped_I/O
Mar 31 2011
I really like the design of node.js (http://nodejs.org). It's internally based on libev, and everything runs in a single-threaded event loop. It has proven to be highly concurrent and memory efficient. Maybe a wrapper around libev(ent) for D, a la node.js, would be a good solution for an asynchronous API, as an alternative to the threaded approach (I always like to have more than one option and choose the one that best suits the concrete task I'm dealing with). Whatever solution is chosen, I'd like to have an API like this:
    readTextAsync(filename, (string contents) {
        // do something with contents
    });
On Thu, Mar 31, 2011 at 2:04 PM, Piotr Szturmaj <bncrbme jadamspam.pl> wrote:
Max Klyga wrote: I've been thinking about what I can change in my GSoC proposal to make it stronger, and noticed that Phobos currently does not address asynchronous I/O of any kind. A number of threads on this newsgroup have mentioned this problem or shown how other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
Yes, an asynchronous networking API would be more scalable. If you're collecting information about async IO, please take a look at libevent and libev, and also at NT's completion ports, FreeBSD's kqueue and Linux's epoll. Protocols implemented using event-driven APIs should scale to thousands of connections using a few worker threads. Moreover, async protocols can be wrapped to be synchronous (but not the other way around) and used just like the well-known blocking APIs. Basically, with async IO you request some data to be written and then wait for a completion event (usually delivered through a callback function). Then you request that some allocated buffer be read into and wait until the network stack fills it up. You do not block as you would with send() or recv(); instead, you can do some useful processing between events.
Mar 31 2011
I really like the design of node.js (http://nodejs.org). It's internally based on libev, and everything runs in a single-threaded event loop. It has proven to be highly concurrent and memory efficient. Maybe a wrapper around libev(ent) for D, a la node.js, would be a good solution for an asynchronous API, as an alternative to the threaded approach (I always like to have more than one option and choose the one that best suits the concrete task I'm dealing with). Whatever solution is chosen, I'd like to have an API like this:
    readTextAsync(filename, (string contents) {
        // do something with contents
    });
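A rough sketch of what a callback-style `readTextAsync` could look like. This is an assumption, not an existing Phobos function: it uses a plain worker thread instead of a libev-style event loop, so unlike node.js the callback runs on the worker thread rather than on the caller's loop. `readTextDemo` and the temporary file name are also hypothetical.

```d
import core.thread : Thread;
import std.file : readText, remove, tempDir, write;
import std.path : buildPath;
import std.stdio : writeln;

/// Hypothetical API: read a file in the background and hand the contents
/// to onDone. The callback executes on the worker thread.
Thread readTextAsync(string filename, void delegate(string) onDone)
{
    auto worker = new Thread({ onDone(readText(filename)); });
    worker.start();
    return worker;
}

/// Writes a small temporary file, reads it back asynchronously.
string readTextDemo()
{
    auto path = buildPath(tempDir(), "readTextAsync_demo.txt");
    write(path, "hello");
    scope (exit) remove(path);

    string result;
    auto worker = readTextAsync(path, (string contents) { result = contents; });
    // ... the caller is free to do other work here ...
    worker.join(); // wait so that result is safe to read
    return result;
}

void main()
{
    writeln(readTextDemo());
}
```

An event-loop version would keep the same call shape but deliver the callback from the loop, avoiding one thread per outstanding read.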
Mar 31 2011
On 3/31/11 6:35 AM, Max Klyga wrote: I've been thinking about what I can change in my GSoC proposal to make it stronger, and noticed that Phobos currently does not address asynchronous I/O of any kind. A number of threads on this newsgroup have mentioned this problem or shown how other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him, and at best Jonas would agree to become a mentor. I posted a couple of weeks ago how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface:
    foreach (line; byLineAsync("http://d-programming-language.org")) {
        ... use line ...
    }
except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, Andrei
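One way to picture the range described above, as a sketch under loud assumptions: `AsyncLines`, `producer`, `Done` and `collectLines` are hypothetical, the producer thread merely splits an in-memory string where the real thing would talk to libcurl, and std.concurrency's per-thread mailbox plays the role of the queue of available buffers.

```d
import std.concurrency : Tid, receive, send, spawn, thisTid;
import std.stdio : writeln;
import std.string : splitLines;

// Marker message: the producer has no more lines.
struct Done {}

// Producer thread: stands in for the thread doing the network traffic.
void producer(Tid consumer, string text)
{
    foreach (line; text.splitLines)
        consumer.send(line);
    consumer.send(Done());
}

/// Input range with a synchronous-looking interface; the lines are
/// produced in another thread and arrive through the mailbox.
struct AsyncLines
{
    private string current;
    private bool finished;

    this(string text)
    {
        spawn(&producer, thisTid, text);
        popFront(); // fetch the first line, or learn that there is none
    }

    @property bool empty() const { return finished; }
    @property string front() const { return current; }

    void popFront()
    {
        // Blocks only if the producer has not delivered anything yet.
        receive(
            (string line) { current = line; },
            (Done _) { finished = true; }
        );
    }
}

string[] collectLines(string text)
{
    string[] result;
    foreach (line; AsyncLines(text))
        result ~= line;
    return result;
}

void main()
{
    writeln(collectLines("one\ntwo\nthree"));
}
```

Because only immutable strings cross the thread boundary, user code never sees shared mutable state, which is the kind of encapsulation the design asks for.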
Mar 31 2011
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
On 3/31/11 6:35 AM, Max Klyga wrote: I've been thinking about what I can change in my GSoC proposal to make it stronger, and noticed that Phobos currently does not address asynchronous I/O of any kind. A number of threads on this newsgroup have mentioned this problem or shown how other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him, and at best Jonas would agree to become a mentor. I posted a couple of weeks ago how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface:
    foreach (line; byLineAsync("http://d-programming-language.org")) {
        ... use line ...
    }
except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, Andrei
Is this basically std.parallelism.asyncBuf (http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html#asyncBuf) or something different?
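For readers who have not used it, a small sketch of `asyncBuf` as it exists in the std.parallelism that later shipped with Phobos. `expensive` and `squaresViaAsyncBuf` are hypothetical; the source range stands in for anything slow to produce, such as lines coming off a socket.

```d
import std.algorithm : map;
import std.array : array;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Pretend each element is expensive to produce (disk or network read).
int expensive(int x)
{
    return x * x;
}

int[] squaresViaAsyncBuf(int n)
{
    auto source = iota(n).map!expensive;
    // The pool fills the next buffer (4 elements here) in the background
    // while the caller consumes the current one.
    auto buffered = taskPool.asyncBuf(source, 4);
    return buffered.array;
}

void main()
{
    writeln(squaresViaAsyncBuf(5));
}
```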
Mar 31 2011
On 3/31/11 11:43 AM, dsimcha wrote:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
On 3/31/11 6:35 AM, Max Klyga wrote: I've been thinking about what I can change in my GSoC proposal to make it stronger, and noticed that Phobos currently does not address asynchronous I/O of any kind. A number of threads on this newsgroup have mentioned this problem or shown how other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him, and at best Jonas would agree to become a mentor. I posted a couple of weeks ago how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface:
    foreach (line; byLineAsync("http://d-programming-language.org")) {
        ... use line ...
    }
except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, Andrei
Is this basically std.parallelism.asyncBuf (http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html#asyncBuf) or something different?
asyncBuf would be an excellent backend for that, but the entire thing needs encapsulation so as to not expose user code to the risks of undue sharing. Andrei
Mar 31 2011
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
On 3/31/11 11:43 AM, dsimcha wrote:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
On 3/31/11 6:35 AM, Max Klyga wrote: I've been thinking about what I can change in my GSoC proposal to make it stronger, and noticed that Phobos currently does not address asynchronous I/O of any kind. A number of threads on this newsgroup have mentioned this problem or shown how other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him, and at best Jonas would agree to become a mentor. I posted a couple of weeks ago how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface:
    foreach (line; byLineAsync("http://d-programming-language.org")) {
        ... use line ...
    }
except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, Andrei
Is this basically std.parallelism.asyncBuf (http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html#asyncBuf) or something different?
asyncBuf would be an excellent backend for that, but the entire thing needs encapsulation so as to not expose user code to the risks of undue sharing. Andrei
Ok. If there are any enhancements that would make asyncBuf work better for this, let me know.
Mar 31 2011
On 31/03/11 18.43, dsimcha wrote:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
On 3/31/11 6:35 AM, Max Klyga wrote: I've been thinking about what I can change in my GSoC proposal to make it stronger, and noticed that Phobos currently does not address asynchronous I/O of any kind. A number of threads on this newsgroup have mentioned this problem or shown how other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him, and at best Jonas would agree to become a mentor. I posted a couple of weeks ago how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface:
    foreach (line; byLineAsync("http://d-programming-language.org")) {
        ... use line ...
    }
except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, Andrei
Is this basically std.parallelism.asyncBuf (http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html#asyncBuf) or something different?
Cool! I've been thinking about creating such a class myself. I definitely think that asyncBuf fits in with the 'foreach' support in the curl wrapper.
Apr 01 2011
On 31/03/2011 17:26, Andrei Alexandrescu wrote:
    foreach (line; byLineAsync("http://d-programming-language.org")) {
        ... use line ...
    }
What would be awesome is if this was backed by fibers; then you'd have a really simple and easy wrapper for doing async IO, handling lots of connections in one thread as the data comes in. Of course a non-by-line version would also be excellent, given that a lot of IO doesn't care about newlines. -- Robert http://octarineparrot.com/
Mar 31 2011
On 31/03/2011 17:53, Robert Clipsham wrote:
On 31/03/2011 17:26, Andrei Alexandrescu wrote:
    foreach (line; byLineAsync("http://d-programming-language.org")) {
        ... use line ...
    }
What would be awesome is if this was backed by fibers; then you'd have a really simple and easy wrapper for doing async IO, handling lots of connections in one thread as the data comes in. Of course a non-by-line version would also be excellent, given that a lot of IO doesn't care about newlines. -- Robert http://octarineparrot.com/
To clarify, this isn't much use for clients, but for servers it could be useful, or if you're wanting to act as multiple clients. -- Robert http://octarineparrot.com/
Mar 31 2011
== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s article
Are fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?
Here are some key differences between fibers (as currently implemented in core.thread; I have no idea how this applies to the general case in other languages) and threads:
1. Fibers can't be used to implement parallelism. If you have N > 1 fibers running on one hardware thread, your code will only use a single core.
2. Fibers use cooperative concurrency, threads use preemptive concurrency. This means three things:
   a. It's the programmer's responsibility to determine how execution time is split between a group of fibers, not the OS's.
   b. If one fiber goes into an endless loop, all fibers executing on that thread will hang.
   c. Getting concurrency issues right is easier, since fibers can't be implicitly pre-empted by other fibers in the middle of some operation. All context switches are explicit, and as mentioned there is no true parallelism.
3. Fibers are implemented in userland, and context switches are a lot cheaper (IIRC an order of magnitude or more, on the order of 100 clock cycles for fibers vs. 1000 for OS threads).
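A small sketch of point 2 using core.thread's `Fiber`: every context switch is an explicit `Fiber.yield()` or `call()`, so the interleaving is fully determined by the code. The round-robin loop and the name `interleave` are just for illustration.

```d
import core.thread : Fiber;
import std.stdio : writeln;

/// Runs two fibers round-robin on the calling thread and records the
/// order in which their steps execute.
string[] interleave()
{
    string[] log;

    auto a = new Fiber({
        log ~= "a1";
        Fiber.yield(); // explicit switch back to the caller
        log ~= "a2";
    });
    auto b = new Fiber({
        log ~= "b1";
        Fiber.yield();
        log ~= "b2";
    });

    // The "scheduler": the programmer, not the OS, decides who runs next.
    while (a.state != Fiber.State.TERM || b.state != Fiber.State.TERM)
    {
        if (a.state != Fiber.State.TERM) a.call();
        if (b.state != Fiber.State.TERM) b.call();
    }
    return log;
}

void main()
{
    writeln(interleave());
}
```

If either fiber looped forever without yielding, the `while` loop would never get control back, which is point 2b in practice.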
Mar 31 2011
On 31/03/11 20.48, dsimcha wrote:
== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s article
Are fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?
Here are some key differences between fibers (as currently implemented in core.thread; I have no idea how this applies to the general case in other languages) and threads:
1. Fibers can't be used to implement parallelism. If you have N > 1 fibers running on one hardware thread, your code will only use a single core.
The fastest web servers out there (e.g. Zeus, nginx, lighttpd) also use some kind of fibers, and they solve this problem by simply forking the process and sharing the listening socket between processes. That way you get the best of both worlds. /Jonas
2. Fibers use cooperative concurrency, threads use preemptive concurrency. This means three things:
   a. It's the programmer's responsibility to determine how execution time is split between a group of fibers, not the OS's.
   b. If one fiber goes into an endless loop, all fibers executing on that thread will hang.
   c. Getting concurrency issues right is easier, since fibers can't be implicitly pre-empted by other fibers in the middle of some operation. All context switches are explicit, and as mentioned there is no true parallelism.
3. Fibers are implemented in userland, and context switches are a lot cheaper (IIRC an order of magnitude or more, on the order of 100 clock cycles for fibers vs. 1000 for OS threads).
Mar 31 2011
On 31/03/11 21.19, Torarin wrote: I'm currently working on an http and networking library that uses asynchronous sockets running in fibers and an event loop a la libev. These async sockets have the same interface as regular Berkeley sockets, so clients can choose whether to be synchronous, asynchronous or threaded with template arguments. For instance, it has HttpClient!AsyncSocket and HttpClient!Socket. Torarin
Very interesting! Do you have a GitHub repo we can see? /Jonas
Mar 31 2011
== Quote from Sean Kelly (sean invisibleduck.org)'s article
On Mar 31, 2011, at 11:48 AM, dsimcha wrote:
== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s article
Are fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?
Here are some key differences between fibers (as currently implemented in core.thread; I have no idea how this applies to the general case in other languages) and threads: 1. Fibers can't be used to implement parallelism. If you have N > 1 fibers running on one hardware thread, your code will only use a single core.
It bears mentioning that this has interesting implications for the default thread-local storage of statics. All fibers running on a thread will currently share the thread's static data. This could be worked around by doing TLS manually at the fiber level, but it's a non-trivial change.
Let's assume for the sake of argument that we are otherwise ready to make said change. What would the performance implications of this be for programs using TLS heavily but not using fibers? My gut feeling is that, if this has considerable performance implications for non-fiber-using programs, it should be left alone long-term, or fiber-local storage should be added as a separate entity.
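To make the behaviour under discussion concrete, here is a sketch assuming nothing beyond core.thread: a module-level variable is thread-local by default in D, so a second thread gets its own copy, while two fibers multiplexed on one thread update the same copy. `counter`, `countFromTwoFibers` and `counterSeenAfterThread` are hypothetical names.

```d
import core.thread : Fiber, Thread;
import std.stdio : writeln;

// Module-level variables are thread-local by default in D.
int counter;

/// Two fibers on the calling thread both update the same thread-local.
int countFromTwoFibers()
{
    counter = 0;
    auto a = new Fiber({ counter += 1; Fiber.yield(); counter += 1; });
    auto b = new Fiber({ counter += 10; Fiber.yield(); counter += 10; });
    a.call(); b.call(); // first half of each fiber
    a.call(); b.call(); // second half of each fiber
    return counter;
}

/// A separate thread increments its own copy; ours is untouched.
int counterSeenAfterThread()
{
    counter = 0;
    auto t = new Thread({ counter += 100; });
    t.start();
    t.join();
    return counter;
}

void main()
{
    writeln(countFromTwoFibers());     // both fibers shared one counter
    writeln(counterSeenAfterThread()); // the thread had its own counter
}
```

So code that relies on statics behaves differently depending on whether it got a dedicated thread or was multiplexed as a fiber, which is the programming-model concern raised in this subthread.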
Mar 31 2011
== Quote from Sean Kelly (sean invisibleduck.org)'s article
On Mar 31, 2011, at 4:03 PM, dsimcha wrote:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
On Mar 31, 2011, at 11:48 AM, dsimcha wrote:
== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s article
Are fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?
Here are some key differences between fibers (as currently implemented in core.thread; I have no idea how this applies to the general case in other languages) and threads: 1. Fibers can't be used to implement parallelism. If you have N > 1 fibers running on one hardware thread, your code will only use a single core.
It bears mentioning that this has interesting implications for the default thread-local storage of statics. All fibers running on a thread will currently share the thread's static data. This could be worked around by doing TLS manually at the fiber level, but it's a non-trivial change.
Let's assume for the sake of argument that we are otherwise ready to make said change. What would the performance implications of this be for programs using TLS heavily but not using fibers? My gut feeling is that, if this has considerable performance implications for non-fiber-using programs, it should be left alone long-term, or fiber-local storage should be added as a separate entity.
It's more an issue of creating an understandable programming model. If someone is using statics, the result should be the same regardless of whether the code gets a dedicated thread or is multiplexed with other code on one thread. i.e. fibers are ideally an implementation detail.
Yes, but what would be the likely performance cost of doing so?
Apr 01 2011
On 31/03/2011 19:34, Andrej Mitrovic wrote: Are fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?
I've written up a first draft of an article about this at: http://octarineparrot.com/article/view/getting-more-fiber-in-your-diet I'd be grateful if the people replying to this thread could take a look over it. -- Robert http://octarineparrot.com/
Apr 05 2011
Are fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?
Mar 31 2011
On Thu, 31 Mar 2011 14:48:13 -0400, dsimcha <dsimcha yahoo.com> wrote:
== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s article
Are fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?
Here are some key differences between fibers (as currently implemented in core.thread; I have no idea how this applies to the general case in other languages) and threads:
1. Fibers can't be used to implement parallelism. If you have N > 1 fibers running on one hardware thread, your code will only use a single core.
2. Fibers use cooperative concurrency, threads use preemptive concurrency. This means three things:
   a. It's the programmer's responsibility to determine how execution time is split between a group of fibers, not the OS's.
   b. If one fiber goes into an endless loop, all fibers executing on that thread will hang.
   c. Getting concurrency issues right is easier, since fibers can't be implicitly pre-empted by other fibers in the middle of some operation. All context switches are explicit, and as mentioned there is no true parallelism.
3. Fibers are implemented in userland, and context switches are a lot cheaper (IIRC an order of magnitude or more, on the order of 100 clock cycles for fibers vs. 1000 for OS threads).
4. Often there is an OS limit on how many threads a process can create. There is no such limit on fibers (only memory). Using fibers can increase the number of simultaneous tasks that can be run by quite a bit. -Steve
Mar 31 2011
I'm currently working on an http and networking library that uses asynchronous sockets running in fibers and an event loop a la libev. These async sockets have the same interface as regular Berkeley sockets, so clients can choose whether to be synchronous, asynchronous or threaded with template arguments. For instance, it has HttpClient!AsyncSocket and HttpClient!Socket. Torarin
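A guess at the shape of such a design, not Torarin's actual code: the client is a template over a socket type, and only the socket decides whether a receive blocks or yields to an event loop. `BlockingSocket`, `FiberSocket`, this `HttpClient` and the fetch functions are all hypothetical, and no real networking is done.

```d
import core.thread : Fiber;
import std.stdio : writeln;

// Hypothetical synchronous socket: returns the data immediately.
struct BlockingSocket
{
    string receive() { return "data"; }
}

// Hypothetical fiber-aware socket with the same interface.
struct FiberSocket
{
    string receive()
    {
        // Pretend the data is not ready yet: hand control back to the
        // event loop, which resumes this fiber once the data is there.
        if (Fiber.getThis() !is null)
            Fiber.yield();
        return "data";
    }
}

// The protocol code is written once, against the shared interface.
struct HttpClient(SocketT)
{
    SocketT socket;

    string get(string path)
    {
        return path ~ " -> " ~ socket.receive();
    }
}

string fetchBlocking()
{
    HttpClient!BlockingSocket client;
    return client.get("/index");
}

string fetchInFiber()
{
    string result;
    auto fiber = new Fiber({
        HttpClient!FiberSocket client;
        result = client.get("/index");
    });
    // Stand-in event loop: resume the fiber until the request completes.
    while (fiber.state != Fiber.State.TERM)
        fiber.call();
    return result;
}

void main()
{
    writeln(fetchBlocking());
    writeln(fetchInFiber());
}
```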
Mar 31 2011
On 31/03/11 18.26, Andrei Alexandrescu wrote:
On 3/31/11 6:35 AM, Max Klyga wrote: I've been thinking about what I can change in my GSoC proposal to make it stronger, and noticed that Phobos currently does not address asynchronous I/O of any kind. A number of threads on this newsgroup have mentioned this problem or shown how other languages address asynchronicity. I want to ask the D community about plans for asynchronicity in Phobos. Has someone on the Phobos team thought about a possible design? How does asynchronicity stack with ranges? What model should D adopt? etc.
I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him, and at best Jonas would agree to become a mentor. I posted a couple of weeks ago how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface:
    foreach (line; byLineAsync("http://d-programming-language.org")) {
        ... use line ...
    }
except that the processing and the fetching of data occur in distinct threads. Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent). Thanks, Andrei
I believe that we would need both the threaded async IO that you describe and a select-based one. The thread-based one is important e.g. in order to keep buffering incoming data while processing elements in the range (the OS will only buffer the number of bytes allowed by the sysadmin). The select-based one is important in order to handle _many_ connections at the same time (think D as the killer app for websockets). As Robert mentions, fibers would be nice to take into consideration as well. What I also see as an unresolved issue is non-blocking handling in http://erdani.com/d/phobos/std_stream2.html, which fits in naturally with this topic, I think. I may very well agree to mentor if we get a solid proposal out of this. /Jonas
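As a sketch of the select-based style, here is readiness polling with std.socket's `SocketSet` and `Socket.select`. It assumes `socketPair` is available on the platform and uses a connected local pair instead of a real network peer; `readWhenReady` is a hypothetical name. A server would add many sockets to the set and loop.

```d
import core.time : msecs;
import std.socket : Socket, SocketSet, socketPair;
import std.stdio : writeln;

/// Sends four bytes over a local socket pair and reads them only after
/// select() reports the receiving end as readable.
size_t readWhenReady()
{
    Socket[2] pair = socketPair();
    scope (exit)
    {
        pair[0].close();
        pair[1].close();
    }

    pair[0].send("ping");

    auto readSet = new SocketSet();
    readSet.add(pair[1]);

    // Blocks until a socket in the set is readable or the timeout expires.
    int ready = Socket.select(readSet, null, null, 500.msecs);
    if (ready <= 0 || !readSet.isSet(pair[1]))
        return 0;

    ubyte[16] buf;
    auto n = pair[1].receive(buf[]);
    return n > 0 ? cast(size_t) n : 0;
}

void main()
{
    writeln(readWhenReady(), " bytes were ready");
}
```

One thread can watch many connections this way, which is the scalability argument made above; the per-connection state then lives in callbacks or fibers rather than in thread stacks.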
Mar 31 2011
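The range design Andrei describes above (a producer thread filling a queue that a synchronous-looking range drains) could be sketched roughly as follows. This is only an illustration under assumptions: `byLineAsync` is not an existing Phobos function, so the sketch uses a hypothetical `AsyncLines` range, an in-memory array stands in for libcurl, and the consumer thread's `std.concurrency` mailbox plays the role of the buffer queue.

```d
import std.concurrency : spawn, send, receive, ownerTid;
import std.stdio : writeln;

// Sentinel message meaning "the producer has no more data".
struct Done {}

// Producer: runs in its own thread and stands in for the network fetch.
// A real version would read from a socket or libcurl instead of an array.
void producer(immutable(string)[] lines)
{
    foreach (string line; lines)
        send(ownerTid, line);
    send(ownerTid, Done());
}

// Hypothetical input range with a synchronous-looking interface. Data arrives
// from the producer thread through the consumer thread's mailbox.
struct AsyncLines
{
    private string current;
    private bool finished;

    this(immutable(string)[] lines)
    {
        spawn(&producer, lines);
        popFront(); // fetch the first element
    }

    @property bool empty() const { return finished; }
    @property string front() const { return current; }

    void popFront()
    {
        // Blocks only if the producer has not delivered anything yet.
        receive(
            (string line) { current = line; },
            (Done _) { finished = true; }
        );
    }
}

void main()
{
    immutable(string)[] page = ["first line", "second line"];
    foreach (line; AsyncLines(page))
        writeln(line);
}
```

The mailbox is shared by the whole consumer thread, so a real design would need its own queue per range; that simplification is an assumption of the sketch, not a recommendation.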
On 2011-03-31 22:35:43 +0300, Jonas Drewsen said:
On 31/03/11 18.26, Andrei Alexandrescu wrote:
[snip]
I believe that we would need both the threaded async IO that you describe but also a select based one. The thread based is important e.g. in order to keep buffering incoming data while processing elements in the range (the OS will only buffer the number of bytes allowed by sysadmin). The select based is important in order to handle _many_ connections at the same time (think D as the killer app for websockets). As Robert mentions fibers would be nice to take into consideration as well. What I also see as an unresolved issue is non-blocking handling in http://erdani.com/d/phobos/std_stream2.html which fits in naturally with this topic I think. I may very well agree mentoring if we get a solid proposal out of this. /Jonas
I'm very glad to hear this. Now my motivation has doubled!
Any comments on whether this proposal should be more focused on asynchronous networking, or should address asynchronicity in Phobos in general? I researched a little about libev and libevent. Both seem to have some limitations on the Windows platform. libev can only be used to deal with sockets on Windows and uses select, which limits libev to 64 file handles per thread. libevent uses Windows overlapped I/O, but this thread[1] shows that the current implementation has performance limitations. So one option may be to use either libev or libevent, and implement things on top of them. Another is to make a new implementation (from scratch, or reusing some code from Boost.ASIO[2]) using threads or fibers, or maybe both.
1. http://www.mail-archive.com/libevent-users monkey.org/msg01730.html
2. http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio.html
Mar 31 2011
On 31/03/11 23.20, Max Klyga wrote:
[snip]
I researched a little about libev and libevent. Both seem to have some limitations on the Windows platform. libev can only be used to deal with sockets on Windows and uses select, which limits libev to 64 file handles per thread.
Actually it seems the limit is OS version dependent and for NT it is 32767 per process: http://support.microsoft.com/kb/111855
libevent uses Windows overlapped I/O, but this thread[1] shows that the current implementation has performance limitations. So one option may be to use either libev or libevent, and implement things on top of them. Another is to make a new implementation (from scratch, or reusing some code from Boost.ASIO[2]) using threads or fibers, or maybe both.
1. http://www.mail-archive.com/libevent-users monkey.org/msg01730.html
2. http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio.html
Mar 31 2011
== Quote from Jonas Drewsen (jdrewsen nospam.com)'s article
On 31/03/11 23.20, Max Klyga wrote:
[snip]
libev can only be used to deal with sockets on Windows and uses select, which limits libev to 64 file handles per thread.
Actually it seems the limit is OS version dependent and for NT it is 32767 per process: http://support.microsoft.com/kb/111855
Again forgive my naiveness, as most of my experience with concurrency is concurrency to implement parallelism, not concurrency for its own sake. Shouldn't 32,000 threads be more than enough for anything? I can't imagine what kinds of programs would really need this level of concurrency, or how bad performance on any specific thread would be when you have this many. Right now in my Task Manager the program with the most threads is explorer.exe, with 28.
Mar 31 2011
On 01/04/11 01.07, dsimcha wrote:
== Quote from Jonas Drewsen (jdrewsen nospam.com)'s article
[snip]
Actually it seems the limit is OS version dependent and for NT it is 32767 per process: http://support.microsoft.com/kb/111855
Again forgive my naiveness, as most of my experience with concurrency is concurrency to implement parallelism, not concurrency for its own sake. Shouldn't 32,000 threads be more than enough for anything? I can't imagine what kinds of programs would really need this level of concurrency, or how bad performance on any specific thread would be when you have this many. Right now in my Task Manager the program with the most threads is explorer.exe, with 28.
There doesn't have to be a thread for each socket. Actually many servers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common. /Jonas
Apr 01 2011
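Jonas's point above, that one thread can serve many sockets, can be illustrated with `std.socket`'s select wrapper. The sketch below is a minimal illustration under assumptions: connected socket pairs stand in for real client connections, and the helper names `readyIndices` and `demo` are made up for the example. It shows the bare select pattern only, not a server design.

```d
import core.time : seconds;
import std.socket : Socket, SocketSet, socketPair;
import std.stdio : writeln;

// Waits once on all `watched` sockets and returns the indices of those
// that have data ready to read.
size_t[] readyIndices(Socket[] watched, SocketSet set)
{
    set.reset();
    foreach (s; watched)
        set.add(s);

    // One call covers every socket; no thread per connection is needed.
    Socket.select(set, null, null, 1.seconds);

    size_t[] ready;
    foreach (i, s; watched)
        if (set.isSet(s))
            ready ~= i;
    return ready;
}

size_t[] demo()
{
    // Three connected pairs stand in for three client connections.
    Socket[] serverSide, clientSide;
    foreach (_; 0 .. 3)
    {
        auto pair = socketPair();
        serverSide ~= pair[0];
        clientSide ~= pair[1];
    }

    clientSide[1].send("hello"); // only the middle client talks

    auto ready = readyIndices(serverSide, new SocketSet());
    foreach (s; serverSide ~ clientSide)
        s.close();
    return ready;
}

void main()
{
    writeln(demo());
}
```

With idle connections the single thread simply sleeps inside `select`, which is the situation Jonas describes for messenger-like clients.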
== Quote from Sean Kelly (sean invisibleduck.org)'s article
On Apr 1, 2011, at 7:49 AM, Jonas Drewsen wrote:
[snip]
There doesn't have to be a thread for each socket. Actually many servers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common.
Hopefully not at all common. With that level of concurrency the process will spend more time context switching than executing code.
...or use such huge timeslices that the illusion of simultaneous execution breaks down.
Apr 01 2011
On 01/04/11 18.12, dsimcha wrote:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
[snip]
Hopefully not at all common. With that level of concurrency the process will spend more time context switching than executing code.
...or use such huge timeslices that the illusion of simultaneous execution breaks down.
I guess multiple cores will help out there.
Apr 01 2011
On 01/04/11 17.21, Sean Kelly wrote:
On Apr 1, 2011, at 7:49 AM, Jonas Drewsen wrote:
On 01/04/11 01.07, dsimcha wrote:
Again forgive my naiveness, as most of my experience with concurrency is concurrency to implement parallelism, not concurrency for its own sake. Shouldn't 32,000 threads be more than enough for anything? I can't imagine what kinds of programs would really need this level of concurrency, or how bad performance on any specific thread would be when you have this many. Right now in my Task Manager the program with the most threads is explorer.exe, with 28.
There doesn't have to be a thread for each socket. Actually many servers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common.
Hopefully not at all common. With that level of concurrency the process will spend more time context switching than executing code.
For services where clients spend most time inactive this works. An example could be a server for messenger like clients. Most of the time the clients are just connected waiting for messages. As long as nothing is transmitted no context switching is done. Or maybe I've misunderstood the reason for the context switching? /Jonas
Apr 01 2011
Sean Kelly wrote:
Fair enough. Though I'd still say it's a terrible use of resources, given available asynchronous socket APIs. And as an aside, I think 32K sockets per process is not at all surprising. I've seen apps that use orders of magnitude more than that, though breaking the 64K barrier does get a bit weird.
Breaking that barrier requires more than one IP address :)
Apr 01 2011
== Quote from Brad Roberts (braddr puremagic.com)'s article
I've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunately, not yet in D.
Later, Brad
Why/how do you have all these connections open concurrently with only a few threads? Fibers? A huge asynchronous message queue to deal with new requests from connections that aren't idle?
Apr 01 2011
On 4/1/2011 7:27 PM, Sean Kelly wrote:
On Apr 1, 2011, at 2:24 PM, dsimcha wrote:
== Quote from Brad Roberts (braddr puremagic.com)'s article
I've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunately, not yet in D.
Later, Brad
Why/how do you have all these connections open concurrently with only a few threads? Fibers? A huge asynchronous message queue to deal with new requests from connections that aren't idle?
A huge asynchronous message queue. State is handled either explicitly or implicitly via fibers. After reading Brad's statement, I'd be interested in seeing a comparison of the memory and performance differences of a thread per socket vs. asynchronous model though (assuming that sockets don't need to interact, so no need for synchronization).
From the discussions lately I'm thoroughly surprised just how specialized a field massively concurrent server programming apparently is. Since it's so far from the type of programming I do my naive opinion was that it wouldn't take a Real Programmer from another specialty (though I emphasize Real Programmer, not code monkey) long to get up to speed.
Apr 01 2011
On Apr 1, 2011, at 1:43 PM, Piotr Szturmaj wrote:
Sean Kelly wrote:
Fair enough. Though I'd still say it's a terrible use of resources, given available asynchronous socket APIs. And as an aside, I think 32K sockets per process is not at all surprising. I've seen apps that use orders of magnitude more than that, though breaking the 64K barrier does get a bit weird.
Breaking that barrier requires more than one IP address :)
That's why it gets weird :-)
Apr 01 2011
Jonas, thanks for your valuable feedback. You've expressed interest in mentoring a networking project, and since I couldn't find any other way to contact you directly, I'll post my message here. As was discussed later, your work on curl supersedes my future effort on network clients. You stated that a foundation for implementing web servers would be a good project. Web servers/clients would benefit from a framework similar to Boost.ASIO or libev. So I would like to ask you to contact me directly or write a message here about what I need to do to interest you in mentoring such a project. I plan to post my updated proposal tomorrow and gather some more feedback while I still have time until the deadline. Comments are welcome.
Apr 04 2011
On 04/04/11 22.23, Max Klyga wrote:
Jonas, thanks for your valuable feedback. You've expressed interest in mentoring a networking project, and since I couldn't find any other way to contact you directly, I'll post my message here. As was discussed later, your work on curl supersedes my future effort on network clients. You stated that a foundation for implementing web servers would be a good project. Web servers/clients would benefit from a framework similar to Boost.ASIO or libev. So I would like to ask you to contact me directly or write a message here about what I need to do to interest you in mentoring such a project. I plan to post my updated proposal tomorrow and gather some more feedback while I still have time until the deadline. Comments are welcome.
Both are excellent frameworks to get inspired from and would definitely catch my interest. And as you can see from the news threads about networking and asynchronicity there are a lot of people who have experience on that topic and can provide help/feedback. I have signed up to be a mentor but I still need to be accepted. Looking forward to the updated proposal. /Jonas
Apr 05 2011
On Fri, 1 Apr 2011, dsimcha wrote:
On 4/1/2011 7:27 PM, Sean Kelly wrote:
On Apr 1, 2011, at 2:24 PM, dsimcha wrote:
== Quote from Brad Roberts (braddr puremagic.com)'s article
I've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunately, not yet in D.
Later, Brad
Why/how do you have all these connections open concurrently with only a few threads? Fibers? A huge asynchronous message queue to deal with new requests from connections that aren't idle?
A huge asynchronous message queue. State is handled either explicitly or implicitly via fibers. After reading Brad's statement, I'd be interested in seeing a comparison of the memory and performance differences of a thread per socket vs. asynchronous model though (assuming that sockets don't need to interact, so no need for synchronization).
From the discussions lately I'm thoroughly surprised just how specialized a field massively concurrent server programming apparently is. Since it's so far from the type of programming I do my naive opinion was that it wouldn't take a Real Programmer from another specialty (though I emphasize Real Programmer, not code monkey) long to get up to speed.
I won't go into the why part, it's not interesting here, and I probably can't talk about it anyway. The simplified view of how: No fibers, just a small number of kernel threads (via pthread). An epoll thread that queues tasks that are pulled by the 1 per cpu worker threads. The queue is only as big as the outstanding work to do. Assuming that the rate of socket events is less than the time it takes to deal with the data, the queue stays empty. It's actually quite a simple architecture at the 50k foot view. Having recently hired some new people, I've got recent evidence... it doesn't take a lot of time to fully 'get' the network layer of the system. There's other parts that are more complicated, but they're not part of this discussion. A thread per socket would never handle this load. Even with a 4k stack (which you'd have to be _super_ careful with since C/C++/D does nothing to help you track), you'd be spending 4G of ram on just the stacks. And that's before you get near the data structures for all the sockets, etc. Later, Brad
Apr 01 2011
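Brad's stack-memory estimate above is simple arithmetic, and can be written out as a small sketch. The connection count of one million is an assumption taken from his "haven't seen it break 1M yet" remark; the 4k stack size is the figure from his post, and `stackBytes` is a made-up helper name.

```d
import std.stdio : writefln;

// Memory needed for thread stacks alone in a thread-per-socket design.
ulong stackBytes(ulong connections, ulong stackSizeBytes)
{
    return connections * stackSizeBytes;
}

void main()
{
    enum ulong connections = 1_000_000; // assumed: approaching the 1M mark
    enum ulong stackSize = 4 * 1024;    // the 4k stack from the post

    immutable total = stackBytes(connections, stackSize);
    writefln("%s bytes (about %.1f GiB)", total,
             total / (1024.0 * 1024 * 1024));
}
```

That is roughly the 4G Brad mentions, before any per-socket data structures are counted.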
On 2011-04-01 01:45:54 +0300, Jonas Drewsen said:
On 31/03/11 23.20, Max Klyga wrote:
[snip]
libev can only be used to deal with sockets on Windows and uses select, which limits libev to 64 file handles per thread.
Actually it seems the limit is OS version dependent and for NT it is 32767 per process: http://support.microsoft.com/kb/111855
That page also mentions that the actual limit is 64 by default and is adjustable, but requires recompilation, because it is defined in a macro (FD_SETSIZE).
libevent uses Windows overlapped I/O, but this thread[1] shows that the current implementation has performance limitations. So one option may be to use either libev or libevent, and implement things on top of them. Another is to make a new implementation (from scratch, or reusing some code from Boost.ASIO[2]) using threads or fibers, or maybe both.
1. http://www.mail-archive.com/libevent-users monkey.org/msg01730.html
2. http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio.html
Mar 31 2011
On 2011-03-31 19:26:45 +0300, Andrei Alexandrescu said:
On 3/31/11 6:35 AM, Max Klyga wrote:
[snip]
I think that would be a good contribution that would complement Jonas'. You'll need to discuss cooperation with him and at best Jonas would agree to become a mentor.
Jonas agreed to become a mentor if I make this proposal strong/interesting enough.
I've posted a couple of weeks earlier how I think that could work with ranges: the range maintains the asynchronous state and has a queue of already-available buffers received. The network traffic occurs in a different thread; the range throws requests over the fence to libcurl and libcurl throws buffers over the fence back to the range. The range offers a seemingly synchronous interface: foreach (line; byLineAsync("http://d-programming-language.org")) { ... use line ... } except that the processing and the fetching of data occur in distinct threads.
I thought about the same.
Server-side code such as network servers etc. would also be an interesting topic. Let me know if you're versed in the likes of libev(ent).
I have no experience with libev/libevent, but I see no problem working with either after reading some examples and/or documentation.
Mar 31 2011
On Mar 31, 2011, at 11:48 AM, dsimcha wrote:
== Quote from Andrej Mitrovic (andrej.mitrovich gmail.com)'s article
Are fibers really better/faster than threads? I've heard rumors that they perform exactly the same, and that there's no benefit of using fibers over threads. Is that true?
Here are some key differences between fibers (as currently implemented in core.thread; I have no idea how this applies to the general case in other languages) and threads:
1. Fibers can't be used to implement parallelism. If you have N > 1 fibers running on one hardware thread, your code will only use a single core.
It bears mentioning that this has interesting implications for the default thread-local storage of statics. All fibers running on a thread will currently share the thread's static data. This could be worked around by doing TLS manually at the fiber level, but it's a non-trivial change.
Mar 31 2011
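Sean's remark above, that all fibers on a thread share that thread's static data, can be seen in a few lines with `core.thread.Fiber`. This is a minimal sketch; the names `hits`, `worker` and `runTwoFibers` are made up for the illustration.

```d
import core.thread : Fiber;
import std.stdio : writeln;

// Module-level variables are thread-local by default in D2,
// but they are not fiber-local.
int hits;

void worker()
{
    ++hits;
    Fiber.yield(); // hand control back to the caller
    ++hits;
}

int runTwoFibers()
{
    hits = 0;
    auto a = new Fiber(&worker);
    auto b = new Fiber(&worker);

    a.call(); // a runs up to its yield
    b.call(); // b runs up to its yield
    a.call(); // a finishes
    b.call(); // b finishes

    // Both fibers incremented the same thread-local variable.
    return hits; // returns 4
}

void main()
{
    writeln(runTwoFibers());
}
```

Code that expects `hits` to be private to one logical task would get 2 if each task had its own thread, which is the programming-model mismatch being discussed.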
2011/3/31 Jonas Drewsen <jdrewsen nospam.com>:
On 31/03/11 21.19, Torarin wrote:
I'm currently working on an http and networking library that uses asynchronous sockets running in fibers and an event loop a la libev. These async sockets have the same interface as regular Berkeley sockets, so clients can choose whether to be synchronous, asynchronous or threaded with template arguments. For instance, it has HttpClient!AsyncSocket and HttpClient!Socket.
Torarin
Very interesting! Do you have a github repos we can see?
/Jonas
I just put one up: https://github.com/torarin/net
Here's an example: https://github.com/torarin/net/blob/master/example.d
Torarin
Apr 01 2011
On Mar 31, 2011, at 4:03 PM, dsimcha wrote:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
On Mar 31, 2011, at 11:48 AM, dsimcha wrote:
[snip]
It bears mentioning that this has interesting implications for the default thread-local storage of statics. All fibers running on a thread will currently share the thread's static data. This could be worked around by doing TLS manually at the fiber level, but it's a non-trivial change.
Let's assume for the sake of argument that we are otherwise ready to [...] change. What would the performance implications of this be for [...] heavily but not using fibers? My gut feeling is that, if this has [...] performance implications for non-fiber-using programs, it should be [...] long-term, or fiber-local storage should be added as a separate [...]
It's more an issue of creating an understandable programming model. If someone is using statics, the result should be the same regardless of whether the code gets a dedicated thread or is multiplexed with other code on one thread. ie. fibers are ideally an implementation detail.
Apr 01 2011
On Apr 1, 2011, at 7:49 AM, Jonas Drewsen wrote:
On 01/04/11 01.07, dsimcha wrote:
[snip]
There doesn't have to be a thread for each socket. Actually many servers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common.
Hopefully not at all common. With that level of concurrency the process will spend more time context switching than executing code.
Apr 01 2011
On Apr 1, 2011, at 8:47 AM, dsimcha wrote:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
It's more an issue of creating an understandable programming model. If someone is using statics, the result should be the same regardless of whether the code gets a dedicated thread or is multiplexed with other code on one thread. ie. fibers are ideally an implementation detail.
Yes, but what would be the likely performance cost of doing so?
The cost of context-switching with fibers is significantly smaller than with kernel threads; I think Mikola Lysenko's talk at the Tango conference a few years back may have some numbers. The performance of using TLS in general isn't great regardless of whether fibers are involved. Manually implemented TLS maybe requires one additional lookup? TLS is done manually on OSX right now, so the code for how it would work is already in place.
Apr 01 2011
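The "TLS done manually at the fiber level" idea Sean mentions can be illustrated, very loosely, by storing per-fiber state on a `Fiber` subclass and looking it up through `Fiber.getThis()`. This is only a sketch of the concept under assumptions: `LocalFiber`, `counter` and `runTwoLocalFibers` are hypothetical names, and nothing here reflects how druntime implements or would implement fiber-local storage.

```d
import core.thread : Fiber;
import std.stdio : writeln;

// Hypothetical fiber subclass carrying its own "fiber-local" slot.
class LocalFiber : Fiber
{
    int counter;
    this(void function() fn) { super(fn); }
}

void worker()
{
    // The extra lookup: find the currently running fiber, then its slot.
    auto self = cast(LocalFiber) Fiber.getThis();
    ++self.counter;
    Fiber.yield();
    ++self.counter;
}

int[2] runTwoLocalFibers()
{
    auto a = new LocalFiber(&worker);
    auto b = new LocalFiber(&worker);

    a.call(); // a increments its own counter, then yields
    b.call(); // b increments its own counter, then yields
    b.call(); // b finishes; a is still parked at its yield

    // Each fiber sees only its own state.
    return [a.counter, b.counter]; // returns [1, 2]
}

void main()
{
    writeln(runTwoLocalFibers());
}
```

The cast and the `getThis()` call correspond to the "one additional lookup" Sean speculates about; whether that cost matters in practice is not something this sketch measures.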
On Fri, 1 Apr 2011, Sean Kelly wrote:
On Apr 1, 2011, at 12:59 PM, Jonas Drewsen wrote:
On 01/04/11 17.21, Sean Kelly wrote:
On Apr 1, 2011, at 7:49 AM, Jonas Drewsen wrote:
There doesn't have to be a thread for each socket. Actually many servers have very few threads with many sockets each. 32000 sockets is not unimaginable for certain server loads e.g. websockets or game servers. But I know it is not that common.
Hopefully not at all common. With that level of concurrency the process will spend more time context switching than executing code.
For services where clients spend most time inactive this works. An example could be a server for messenger like clients. Most of the time the clients are just connected waiting for messages. As long as nothing is transmitted no context switching is done.
Fair enough. Though I'd still say it's a terrible use of resources, given available asynchronous socket APIs. And as an aside, I think 32K sockets per process is not at all surprising. I've seen apps that use orders of magnitude more than that, though breaking the 64K barrier does get a bit weird.
I've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunately, not yet in D.
Later, Brad
Apr 01 2011
On Apr 1, 2011, at 2:24 PM, dsimcha wrote:
== Quote from Brad Roberts (braddr puremagic.com)'s article
I've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunately, not yet in D.
Later, Brad
Why/how do you have all these connections open concurrently with only a few threads? Fibers? A huge asynchronous message queue to deal with new requests from connections that aren't idle?
A huge asynchronous message queue. State is handled either explicitly or implicitly via fibers. After reading Brad's statement, I'd be interested in seeing a comparison of the memory and performance differences of a thread per socket vs. asynchronous model though (assuming that sockets don't need to interact, so no need for synchronization).
Apr 01 2011
There are several difficult issues connected with asynchronicity, high performance networking and related things. I had to deal with them while developing blip ( http://fawzi.github.com/blip ). My goal with it was to have a good basis for my program dchem, and as a consequence it is not so optimized, in particular for non-recursive tasks, and it is D1, but I think that the issues are generally relevant.

I/O and asynchronicity is a very important aspect, and one that will tend to "pollute" many parts of the library and introduce dependencies that are difficult to remove, thus those choices have to be made carefully.

Overview
========

Threads vs fibers
-----------------

* An issue not yet brought up is that threads wire some memory, and so have an extra cost that fibers don't.
* The evaluation strategy of fibers can be chosen by the user. This is relevant for recursive tasks where each task spawns other tasks: breadth-first evaluation (as threads do) uses a *lot* more resources than depth-first, by having many more tasks concurrently in evaluation.

Otherwise the relevant points already brought forth by others are:

- A context switch between fibers (assuming that memory is active) is much faster.
- Context switches are chosen by the user in fibers (cooperative multitasking). This allows one to choose the optimal point to switch, but a "bad" fiber can ruin the response time of the others.
- D is not stackless (like Go for example), so each fiber needs to have enough space for the stack (something that often is not so easy to predict). This makes fibers still a bit costly if one really needs a lot of them. 64 bit can help here, because hopefully the active part is small, and it can be kept in RAM, even using a rather large virtual space.

Still, as correctly said by Brad, for heavily uniform handling of many tasks, manual management (and using stateless functions as much as possible) can be much more efficient.

Closures
--------

When possible, and for the low level (often used) operations, delegates and function calls are a better solution than [...]. Structs and manual memory handling for "closures" are a good choice for low level operations, because one can avoid the heap allocation connected with the automatic closure. This approach cannot be avoided in D1, whereas D2 has the very useful closures, but at low level their cost should be avoided when possible.

About using structs, there are subtle issues that I think are connected with optimization by the compiler (I never really investigated them; I always changed the code or resorted to heap allocation). The main issue is that one would like to optimize as much as possible, and to do so the compiler normally assumes that the current thread is the only user of the stack. If you pass stack-stored structures to other threads these assumptions aren't true anymore, so the memory of a stack-allocated struct might be reused even before the function returns (unless I am mistaken and the ABI forbids it; in that case tell me).

Async i/o
---------

* Almost always i/o is much slower than the CPU, so an i/o operation is bound to make the CPU wait, and one wants to use the wait efficiently.
- A very simple way is to just use blocking i/o, and have other threads do other things.
- Async i/o allows overlap of several operations in a single thread.
- For files, an even more efficient way is sharing the buffer with the kernel (aio_*).
- An important issue is avoiding waste of CPU cycles while waiting. To achieve this one can collect several waiting operations and use a single thread to wait on several of them; select, poll and epoll allow this, and increase the efficiency of several kinds of programs.
- libev and libevent are cross-platform libraries that can help with an event-based approach, taking care of checking a large number of events and calling a user-defined callback when they happen, in a robust cross-platform way.

Locks, semaphores
-----------------

Locks and semaphores are a standard way to synchronize between threads. One has to be careful when mixing them with fiber scheduling, as one can easily deadlock.

Hardware information
--------------------

Efficient usage of computational resources also depends on being able to identify the available hardware. Don did quite some hacking to get useful information out of cpuinfo, but if one is interested in more complex computers, more info would be nice. I use hwloc for this purpose; it is cross-platform and can be embedded.

Possible solutions
==================

Async i/o can be presented as normal synchronous (blocking) i/o, but this makes sense only if one has several objects waiting, or uses fibers and executes other fibers while waiting.

How acceptable is it to rely on (and thus introduce a dependency on) things like libev or hwloc? For my purposes using them was ok, and they are cross-platform and embeddable, but is that also true for Phobos?

Asynchronicity means being able to have work executed concurrently and then resynchronize at a later point. One can use processes (which also give memory protection), threads, or fibers to achieve this. If one uses just threads, then asynchronous i/o makes sense only with fully manual (explicit) handling of it; hiding it away will be equivalent to blocking i/o.
Fibers allow one to hide async i/o and make it look like blocking i/o, but as Sean said there are issues with using fibers with D2 TLS. I kind of dislike the use of TLS for anything that is not low-level infrastructure, but it seems that is just me around here.

In blip I chose to go with fiber-based switching. I wrapped libev both at a low level and at a higher level, in such a way that one can use it directly (for maximum performance). For sockets I use non-blocking calls and a single "waiting" (i/o) thread, but hide them so that they are used just like blocking calls.

An important design decision when using fibers is whether one should be able to have a "naked" thread, or hide the fiber scheduling in each thread. In blip I allow naked threads, because it is entirely realized as a normal library, but that gives some ugly corner cases when one uses a method that wants to suspend a thread that doesn't have scheduling in place. Building the scheduling into all threads is probably cleaner if one goes with fibers.

The problem of TLS and fibers remains though, especially if one allows the migration of fibers from one thread to another (as I do in blip).

An important design choice in blip was being able to cope with recursive parallelism (typical of computational tasks), not just with the concurrent parallelism that is typical of servers. I feel that it is important, but it might not be seen as such by others.

To do
====
Now, about async i/o: the first step is for sure to expose an asynchronous API. This doesn't influence or depend on other parts of the library much. An important decision is if, and on which, external libraries one can rely.

Making the async API nicer to use, or even using it "behind the scenes" as I do in blip, needs more complex choices about the basic handling of suspension and synchronization. Something like that is bound to be used in several parts of phobos, so a careful choice is needed. These parts are also partially connected with high-performance networking (another GSoC project).

Fawzi
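The idea of hiding non-blocking calls so that they read like blocking ones can be sketched roughly as follows. This is only an illustration under loud assumptions: Python generators stand in for fibers (they are not stackful, unlike D's core.thread.Fiber), everything runs in one thread, and read_line, handler, run and demo are hypothetical names, not blip's API.

```python
import selectors
import socket


def read_line(sock):
    """Reads like a blocking call, but suspends the fiber instead of the thread."""
    data = b""
    while not data.endswith(b"\n"):
        yield sock                # suspend until the scheduler sees sock readable
        chunk = sock.recv(1024)
        if not chunk:             # peer closed the connection
            break
        data += chunk
    return data


def handler(name, sock, log):
    # The body is written as plain sequential code.
    line = yield from read_line(sock)
    log.append((name, line.strip().decode()))


def run(fibers):
    """Single-threaded scheduler: resume a fiber when its socket is readable."""
    sel = selectors.DefaultSelector()

    def step(fiber):
        try:
            sock = next(fiber)    # run the fiber until it wants to wait
        except StopIteration:
            return                # fiber finished
        sel.register(sock, selectors.EVENT_READ, fiber)

    for f in fibers:
        step(f)
    while sel.get_map():          # some fiber is still waiting
        for key, _events in sel.select():
            sel.unregister(key.fileobj)
            step(key.data)
    sel.close()


def demo():
    log = []
    a_r, a_w = socket.socketpair()
    b_r, b_w = socket.socketpair()
    try:
        b_w.sendall(b"second\n")
        a_w.sendall(b"first\n")
        run([handler("a", a_r, log), handler("b", b_r, log)])
    finally:
        for s in (a_r, a_w, b_r, b_w):
            s.close()
    return sorted(log)


if __name__ == "__main__":
    print(demo())
```

The sketch sidesteps the hard parts raised in the post: it has no thread-local state to get wrong and never migrates a fiber between threads.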
Apr 02 2011
On Apr 1, 2011, at 6:08 PM, Brad Roberts wrote:
On Fri, 1 Apr 2011, dsimcha wrote:
On 4/1/2011 7:27 PM, Sean Kelly wrote:
On Apr 1, 2011, at 2:24 PM, dsimcha wrote:
== Quote from Brad Roberts (braddr puremagic.com)'s article

I've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunately, not yet in D.

I won't go into the why part, it's not interesting here, and I probably can't talk about it anyway.

The simplified view of how: No fibers, just a small number of kernel threads (via pthread). An epoll thread that queues tasks that are pulled by the 1 per cpu worker threads. The queue is only as big as the outstanding work to do. Assuming that the rate of socket events is less than the time it takes to deal with the data, the queue stays empty.

It's actually quite a simple architecture at the 50k foot view. Having recently hired some new people, I've got recent evidence... it doesn't take a lot of time to fully 'get' the network layer of the system. There's other parts that are more complicated, but they're not part of this discussion.

A thread per socket would never handle this load. Even with a 4k stack (which you'd have to be _super_ careful with since C/C++/D does nothing to help you track), you'd be spending 4G of ram on just the stacks. And that's before you get near the data structures for all the sockets, etc.

I misread your prior post as one thread per socket and was a bit baffled. Makes a lot more sense now. Potentially one read event per socket still means a pretty long queue though.

Regarding the stack size... is that much of an issue with 64-bit processes? Figure a few pages of committed memory per thread plus a large reserved range that shouldn't impact things at all. Definitely more than the event model, but maybe not tremendously so?
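The quoted architecture (one epoll thread queueing tasks for a fixed pool of worker threads) can be sketched roughly as below. This is a toy illustration in Python, not the application being described: serve and demo are hypothetical names, the selectors module stands in for raw epoll, each socket is handled exactly once, and upper-casing the payload is a placeholder for real work.

```python
import os
import queue
import selectors
import socket
import threading


def serve(conns, expected, n_workers=None):
    """One poller thread feeds a queue; a small fixed pool of workers drains it.

    `expected` is how many ready sockets to hand out before shutting down;
    the caller must make sure that many sockets really become readable.
    """
    n_workers = n_workers or os.cpu_count() or 1
    tasks = queue.Queue()      # ready sockets waiting for a worker
    results = queue.Queue()    # what the workers produced
    sel = selectors.DefaultSelector()
    for c in conns:
        sel.register(c, selectors.EVENT_READ)

    def poller():
        queued = 0
        while queued < expected:
            for key, _events in sel.select(timeout=1.0):
                # Stop watching the socket while a worker owns it.
                sel.unregister(key.fileobj)
                tasks.put(key.fileobj)
                queued += 1
        for _ in range(n_workers):
            tasks.put(None)    # one shutdown sentinel per worker

    def worker():
        while True:
            sock = tasks.get()
            if sock is None:
                return
            results.put(sock.recv(1024).upper())  # stand-in for real work

    threads = [threading.Thread(target=poller)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    sel.close()

    out = []
    while not results.empty():
        out.append(results.get_nowait())
    return sorted(out)


def demo():
    pairs = [socket.socketpair() for _ in range(4)]
    try:
        for i, (_r, w) in enumerate(pairs):
            w.sendall(b"msg%d" % i)
        return serve([r for r, _w in pairs], expected=len(pairs), n_workers=2)
    finally:
        for r, w in pairs:
            r.close()
            w.close()


if __name__ == "__main__":
    print(demo())
```

A long-running server would put the socket back under the poller's watch once a worker is done with it. The memory argument is separate from the code: a million threads with 4 KB stacks each come to roughly 4 GB for stacks alone, which is the figure given in the quote.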
Apr 04 2011
The problem with threads is the context switch, not really the stack size. Threads are not the solution to increase performance. In high-performance systems threads are used for fairness in the request-response pipeline, not for performance. Obviously, this is not disputed when talking about uniprocessors.

With the availability of cheap multi-processor, multi-core and hyper-threading hardware, multiple threads are needed to keep all logical processors busy. In other words, multiple threads are needed to get the most out of the hardware even if you don't care about fairness.

Now, the argument above doesn't take implementability into account. Most people write sequential multithreaded code because it is "easier" (I personally think it is harder not to violate invariants in the presence of concurrency/sharing). Many people feel it is easier to extend the programmer's understanding of a sequential shared model than to make a paradigm switch to an event-based model.

On Mon, Apr 4, 2011 at 6:49 PM, Sean Kelly <sean invisibleduck.org> wrote:
On Apr 1, 2011, at 6:08 PM, Brad Roberts wrote:
On Fri, 1 Apr 2011, dsimcha wrote:
On 4/1/2011 7:27 PM, Sean Kelly wrote:
On Apr 1, 2011, at 2:24 PM, dsimcha wrote:
== Quote from Brad Roberts (braddr puremagic.com)'s article

I've got an app that regularly runs with hundreds of thousands of connections (though most of them are mostly idle). I haven't seen it break 1M yet, but the only thing stopping it is file descriptor limits and memory. It runs a very standard 1 thread per cpu model. Unfortunately, not yet in D.

I won't go into the why part, it's not interesting here, and I probably can't talk about it anyway.

The simplified view of how: No fibers, just a small number of kernel threads (via pthread). An epoll thread that queues tasks that are pulled by the 1 per cpu worker threads. The queue is only as big as the outstanding work to do. Assuming that the rate of socket events is less than the time it takes to deal with the data, the queue stays empty.

It's actually quite a simple architecture at the 50k foot view. Having recently hired some new people, I've got recent evidence... it doesn't take a lot of time to fully 'get' the network layer of the system. There's other parts that are more complicated, but they're not part of this discussion.

A thread per socket would never handle this load. Even with a 4k stack (which you'd have to be _super_ careful with since C/C++/D does nothing to help you track), you'd be spending 4G of ram on just the stacks. And that's before you get near the data structures for all the sockets, etc.

I misread your prior post as one thread per socket and was a bit baffled. Makes a lot more sense now. Potentially one read event per socket still means a pretty long queue though.

Regarding the stack size... is that much of an issue with 64-bit processes? Figure a few pages of committed memory per thread plus a large reserved range that shouldn't impact things at all. Definitely more than the event model, but maybe not tremendously so?
Apr 04 2011
On Apr 1, 2011, at 12:59 PM, Jonas Drewsen wrote:
On 01/04/11 17.21, Sean Kelly wrote:
On Apr 1, 2011, at 7:49 AM, Jonas Drewsen wrote:

There doesn't have to be a thread for each socket. Actually many [...] not unimaginable for certain server loads, e.g. websockets or game servers. But I know it is not that common.

Hopefully not at all common. With that level of concurrency the [...]

For services where clients spend most time inactive this works. An [...] the clients are just connected waiting for messages. As long as nothing is transmitted no context switching is done.

Fair enough. Though I'd still say it's a terrible use of resources, given available asynchronous socket APIs. And as an aside, I think 32K sockets per process is not at all surprising. I've seen apps that use orders of magnitude more than that, though breaking the 64K barrier does get a bit weird.
Apr 01 2011









Piotr Szturmaj <bncrbme jadamspam.pl> 