
digitalmars.D - Network server design question

reply Marek Janukowicz <marek janukowicz.net> writes:
I'm writing a network server with some specific requirements:
- 5-50 clients connected (almost) permanently (maybe a bit more, but 
definitely not hundreds of them)
- possibly thousands of requests per second
- responses need to be returned within 5 seconds or the client will 
disconnect and complain

Currently I have a Master thread (which is basically the main thread) which 
handles connections/disconnections and socket operations, sends parsed 
requests to a single Worker thread for processing, and sends responses to clients. 
Interaction with the Worker is done via message passing.

The problem with my approach is that I read as much data as possible from 
each ready client in order. As there are many requests this read phase might 
take a few seconds making the clients disconnect. Now I see 2 possible 
solutions:

1. Stay with the design I have, but change the workflow somewhat - instead 
of reading all the data from clients just read some requests and then send 
responses that are ready and repeat; the downside is that it's more 
complicated than current design, might be slower (more loop iterations with 
less work done in each iteration) and might require quite a lot of tweaking 
when it comes to how many requests/responses to handle each time etc.

2. Create a separate thread per client connection. I think this could 
result in a nice, clean setup, but I see some problems:
- I'm not sure how ~50 threads will do resource-wise (although they will 
probably be mostly waiting on Socket.select)
- I can't initialize threads created via std.concurrency.spawn with a Socket 
object ("Aliases to mutable thread-local data not allowed.")
- I already have problems with "interrupted system call" on Socket.select 
due to GC kicking in; I'm restarting the call manually, but TBH it sucks I 
have to do anything about that and would suck even more to do that with 50 
or so threads

If anyone has any idea how to handle the problems I mentioned or has any 
idea for a more suitable design, I would be happy to hear it. It's also 
possible I'm approaching the issue from a completely wrong direction, so you 
can correct me on that as well.

-- 
Marek Janukowicz
Aug 04 2013
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 4 August 2013 at 19:37:40 UTC, Marek Janukowicz wrote:
 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit 
 more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client 
 will
 disconnect and complain

 Currently I have a Master thread (which is basically the main 
 thread) which
 is handling connections/disconnections, socket operations, 
 sends parsed
 requests for processing to single Worker thread, sends 
 responses to clients.
 Interaction with Worker is done via message passing.

 The problem with my approach is that I read as much data as 
 possible from
 each ready client in order. As there are many requests this 
 read phase might
 take a few seconds making the clients disconnect. Now I see 2 
 possible
 solutions:

 1. Stay with the design I have, but change the workflow 
 somewhat - instead
 of reading all the data from clients just read some requests 
 and then send
 responses that are ready and repeat; the downside is that it's 
 more
 complicated than current design, might be slower (more loop 
 iterations with
 less work done in each iteration) and might require quite a lot 
 of tweaking
 when it comes to how many requests/responses handle each time 
 etc.

 2. Create separate thread per each client connection. I think 
 this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although 
 they will
 probably be mostly waiting on Socket.select)
 - I can't initialize threads created via std.concurrency.spawn 
 with a Socket
 object ("Aliases to mutable thread-local data not allowed.")
 - I already have problems with "interrupted system call" on 
 Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH 
 it sucks I
 have to do anything about that and would suck even more to do 
 that with 50
 or so threads

 If anyone has any idea how to handle the problems I mentioned 
 or has any
 idea for more suitable design I would be happy to hear it. It's 
 also
 possible I'm approaching the issue from completely wrong 
 direction, so you
 can correct me on that as well.
Take a look at how vibe.d approaches the problem: http://vibed.org/
Aug 04 2013
parent reply Marek Janukowicz <marek janukowicz.net> writes:
John Colvin wrote:
 Take a look at how vibe.d approaches the problem:
 http://vibed.org/
Vibe.d uses fibers, which I don't find feasible for my particular application for a number of reasons:
- I have a constant number of ever-connected clients, not an ever-changing number of random clients
- after I read and parse a request there is not much room for yielding during processing (I don't do I/O or database calls, I have an in-memory "database" for performance reasons)
- event-based programming generally looks complicated to me and (for the reason mentioned above) I don't see much point in utilizing it in this case

-- 
Marek Janukowicz
Aug 04 2013
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 4 August 2013 at 20:37:43 UTC, Marek Janukowicz wrote:
 John Colvin wrote:
 Take a look at how vibe.d approaches the problem:
 http://vibed.org/
 Vibe.d uses fibers, which I don't find feasible for my particular application for a number of reasons:
 - I have constant number of ever-connected clients, not an ever-changing number of random clients
 - after I read and parse a request there is not much room for yielding during processing (I don't do I/O or database calls, I have an in-memory "database" for performance reasons)
 - event-based programming generally looks complicated to me and (for the reason mentioned above) I don't see much point in utilizing it in this case
You'd be surprised how easy it can be with vibe and D.

Nonetheless, this isn't my area of expertise; I just thought it might be interesting, if you hadn't already seen it.
Aug 04 2013
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 04-Aug-2013 23:38, Marek Janukowicz wrote:
 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain

 Currently I have a Master thread (which is basically the main thread) which
 is handling connections/disconnections, socket operations, sends parsed
 requests for processing to single Worker thread, sends responses to clients.
 Interaction with Worker is done via message passing.
A typical approach would be to separate responsibilities even more and make a pool of threads for each stage. You may want to make the Master thread only handle new connections, selecting over an "accept socket" (or a few, if there are multiple end-points). It would then distribute connected clients over I/O worker threads. A pool of I/O workers would then only send/receive data, passing parsed requests to the "real" workers and responses back; they handle disconnects and closing though. The real workers could again be pooled to be more responsive (or e.g. just one per I/O thread).
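For illustration, a minimal sketch of that acceptor stage in D. Names like acceptorLoop and ioWorkers, the port number and the round-robin policy are made up for the example, and the shared void* cast is the workaround discussed further down the thread:

import std.socket;
import std.concurrency;

// Acceptor stage (run in the Master/main thread): only accepts new clients,
// then hands each connection off to one of the already-spawned I/O workers.
void acceptorLoop(Tid[] ioWorkers)
{
    auto listener = new TcpSocket();
    listener.setOption(SocketOptionLevel.SOCKET, SocketOption.REUSEADDR, true);
    listener.bind(new InternetAddress(4000));   // port chosen arbitrarily
    listener.listen(64);

    size_t next;
    while (true)
    {
        Socket client = listener.accept();      // blocks until a new connection arrives
        // hand the socket to the next I/O worker (cast hack, see below)
        ioWorkers[next % ioWorkers.length].send(cast(shared void*)cast(void*)client);
        ++next;
    }
}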
 The problem with my approach is that I read as much data as possible from
 each ready client in order. As there are many requests this read phase might
 take a few seconds making the clients disconnect. Now I see 2 possible
 solutions:

 1. Stay with the design I have, but change the workflow somewhat - instead
 of reading all the data from clients just read some requests and then send
 responses that are ready and repeat; the downside is that it's more
 complicated than current design, might be slower (more loop iterations with
 less work done in each iteration) and might require quite a lot of tweaking
 when it comes to how many requests/responses handle each time etc.
Or split the clients across a group of threads to reduce maximum latency. See above - just determine the number of clients per thread your system can sustain in time. A better way would be to dynamically load-balance clients between threads, but that's far more complicated.
 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
50 threads is not that big a problem. Around 100+ could be, 1000+ is a killer. The benefit of thread per client is that you don't even need Socket.select: just use blocking I/O and do the work for each parsed request in the same thread.
 - I can't initialize threads created via std.concurrency.spawn with a Socket
 object ("Aliases to mutable thread-local data not allowed.")
This can be hacked with casts to shared void* and back. Not pretty but workable.
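A minimal sketch of that hack, with a hypothetical handleClient function; the round-trip through shared void* exists only to satisfy spawn's check against thread-local aliasing:

import std.socket;
import std.concurrency;

// Runs in its own thread; p is really a Socket smuggled through spawn.
void handleClient(shared void* p)
{
    auto sock = cast(Socket)cast(void*)p;   // cast back to a normal, thread-local reference
    ubyte[4096] buf;
    for (;;)
    {
        auto got = sock.receive(buf[]);
        if (got <= 0)                       // 0 = peer closed, Socket.ERROR = failure
            break;
        // ... parse the request and send the response here ...
    }
    sock.close();
}

// In the master thread, after accept():
//     spawn(&handleClient, cast(shared void*)cast(void*)clientSock);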
 - I already have problems with "interrupted system call" on Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH it sucks I
 have to do anything about that and would suck even more to do that with 50
 or so threads
I'm not sure if that problem will surface with blocking reads.
 If anyone has any idea how to handle the problems I mentioned or has any
 idea for more suitable design I would be happy to hear it. It's also
 possible I'm approaching the issue from completely wrong direction, so you
 can correct me on that as well.
-- Dmitry Olshansky
Aug 04 2013
parent reply Marek Janukowicz <marek janukowicz.net> writes:
Dmitry Olshansky wrote:

 On 04-Aug-2013 23:38, Marek Janukowicz wrote:
 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain

 Currently I have a Master thread (which is basically the main thread)
 which is handling connections/disconnections, socket operations, sends
 parsed requests for processing to single Worker thread, sends responses
 to clients. Interaction with Worker is done via message passing.
Typical approach would be to separate responsibilities even more and make a pool of threads per each stage. You may want to make a Master thread only handle new connections selecting over an "accept socket" (or a few if multiple end-points). Then it may distribute connected clients over I/O worker threads. A pool of I/O workers would then only send/receive data passing parsed request to "real" workers and responses back. They handle disconnects and closing though.
This is basically approach "2." I mentioned in my original post, I'm glad you agree it makes sense :)
 The real workers could be again pooled to be more responsive (or e.g.
 just one per each I/O thread).
There are more things specific to this particular application that would play a role here. One is that such "real workers" would operate on a common data structure and I would have to introduce some synchronization. A single worker thread was not my first approach, but after some woes with other solutions I decided to take it, because the problem is really not in processing (where a single thread does just fine so far), but in socket read/write operations.
 The problem with my approach is that I read as much data as possible from
 each ready client in order. As there are many requests this read phase
 might take a few seconds making the clients disconnect. Now I see 2
 possible solutions:

 1. Stay with the design I have, but change the workflow somewhat -
 instead of reading all the data from clients just read some requests and
 then send responses that are ready and repeat; the downside is that it's
 more complicated than current design, might be slower (more loop
 iterations with less work done in each iteration) and might require quite
 a lot of tweaking when it comes to how many requests/responses handle
 each time etc.
Or split the clients across a group of threads to reduce maximum latency. See above, just determine the amount of clients per thread your system can sustain in time. A better way would be to dynamically load-balance clients between threads but it's far more complicated.
Yeah, both approaches seem to be somewhat more complicated and I'd like to avoid this if possible. So one client per thread makes sense to me.
 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
50 threads is not that big a problem. Around 100+ could be, 1000+ is a killer.
Thanks for those numbers, it's great to know at least the ranges here.
 The benefit with thread per client is that you don't even need
 Socket.select, just use blocking I/O and do the work per each parsed
 request in the same thread.
Not really. This is something that Go (the language I also originally considered for the project) has solved in a much better way - you can "select" on a number of "channels" and have both I/O and message passing covered by those. In D I must react both to network data and to incoming messages from the worker, which means either the self-pipe trick (which leads to Socket.select again) or some quirky stuff with timeouts on socket read and message receive (but this is basically a busy loop).
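For what it's worth, a rough sketch of the self-pipe variant in D, using std.socket's socketPair() as the wakeup channel; waitForWork and the one-byte wakeup protocol are invented for the example:

import std.socket;

// One iteration of a reader loop that waits on both its client socket and a
// wakeup socket written to by the worker thread (self-pipe trick).
void waitForWork(Socket clientSock, Socket wakeSock)
{
    auto readSet = new SocketSet();
    readSet.add(clientSock);
    readSet.add(wakeSock);
    Socket.select(readSet, null, null);     // blocks until either becomes readable

    if (readSet.isSet(wakeSock))
    {
        ubyte[1] b;
        wakeSock.receive(b[]);              // drain the wakeup byte...
        // ...then pull the pending message via std.concurrency.receive
    }
    if (readSet.isSet(clientSock))
    {
        // read and parse client data as usual
    }
}

// Setup:  auto pair = socketPair();   // pair[0] -> reader thread, pair[1] -> worker
// Worker, after posting a message:    ubyte[1] one = [1]; pair[1].send(one[]);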
 - I can't initialize threads created via std.concurrency.spawn with a
 Socket object ("Aliases to mutable thread-local data not allowed.")
This can be hacked with casts to shared void* and back. Not pretty but workable.
I'm using this trick elsewhere, but was a bit reluctant to try it here. Btw, would it work if I passed a socket to 2 threads - a reader and a writer (by "work" I mean not running into race conditions and other scary concurrent stuff)?

Also I'm really puzzled by the fact that this common idiom doesn't work in some elegant way in D. I tried to Google a solution, but only found some weird tricks. Can anyone really experienced in D tell me why there is no nice solution for this (or correct me if I'm mistaken)?
 - I already have problems with "interrupted system call" on Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH it sucks
 I have to do anything about that and would suck even more to do that with
 50 or so threads
I'm not sure if that problem will surface with blocking reads.
Unfortunately it will (it happens precisely with blocking calls).

Thanks for your input, which shed some more light for me and also allowed me to explain the whole thing a bit more.

-- 
Marek Janukowicz
Aug 04 2013
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Sun, 04 Aug 2013 22:59:04 +0200, Marek Janukowicz <marek janukowicz.net> wrote:

 - I already have problems with "interrupted system call" on
 Socket.select due to GC kicking in; I'm restarting the call
 manually, but TBH it sucks I have to do anything about that and
 would suck even more to do that with 50 or so threads
I'm not sure if that problem will surface with blocking reads.
Unfortunately it will (it precisely happens with blocking calls). Thanks for your input, which shed some more light for me and also allowed me to explain the whole thing a bit more.
This is a bug in std.socket BTW. Blocking calls will get interrupted by the GC - there's no way to avoid that - but std.socket should handle this internally and just retry the interrupted operation. Please file a bug report about this.

(Partial writes are another issue that could/should be handled in std.socket so the user doesn't have to care about it.)
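Until something like that lands in Phobos, the manual retry can at least be factored out once. A sketch relying on the documented behaviour that Socket.select returns a negative value when interrupted; selectRetry and the repopulate callback are illustrative names only:

import std.socket;
import core.time : Duration;

// Retries Socket.select when it is interrupted (e.g. by the GC suspending threads).
// The caller supplies a delegate that rebuilds the SocketSets, since select may
// have modified them before being interrupted. Note the timeout restarts in full.
int selectRetry(SocketSet reads, SocketSet writes, SocketSet errs,
                Duration timeout, scope void delegate() repopulate)
{
    for (;;)
    {
        auto n = Socket.select(reads, writes, errs, timeout);
        if (n >= 0)
            return n;       // number of ready sockets, or 0 on timeout
        repopulate();       // interrupted: rebuild the sets and try again
    }
}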
Aug 04 2013
parent reply "David Nadlinger" <code klickverbot.at> writes:
On Monday, 5 August 2013 at 06:36:15 UTC, Johannes Pfau wrote:
 This is a bug in std.socket BTW. Blocking calls will get 
 interrupted by
 the GC - there's no way to avoid that - but std.socket should 
 handle
 this internally and just retry the interrupted operation. 
 Please file a
 bug report about this.
I'm not sure whether we can do anything about Socket.select itself at this point, as it would be a breaking API change – interrupted calls returning a negative value is even mentioned explicitly in the docs.

There should, however, be a way to implement this in a platform-independent manner in client code, or even a second version that handles signal interruptions internally.
 (Partial writes is another issue that could/should be handled in
 std.socket so the user doesn't have to care about it)
I don't think that would be possible – std.socket by design is a thin wrapper around BSD sockets (whether that's a good idea or not is another question), and how to handle partial writes depends entirely on the environment the socket is used in (think event-based architecture using fibers vs. other designs).

In general, I wonder what the best way for going forward with std.socket is. Sure, we could try to slowly morph it into a "modern" networking implementation, but the current state also has its merits, as it allows people to use the familiar BSD sockets API without having to worry about all the trivial differences between the platforms (e.g. in symbol names).

We should definitely add a note to std.socket though that it is a low-level API and that there might be a better choice for most applications (e.g. vibe.d, Thrift, …).

David
Aug 05 2013
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Mon, 05 Aug 2013 16:07:40 +0200, "David Nadlinger" <code klickverbot.at> wrote:

 On Monday, 5 August 2013 at 06:36:15 UTC, Johannes Pfau wrote:
 This is a bug in std.socket BTW. Blocking calls will get interrupted by
 the GC - there's no way to avoid that - but std.socket should handle
 this internally and just retry the interrupted operation. Please file a
 bug report about this.
 I'm not sure whether we can do anything about Socket.select itself at this point, as it would be a breaking API change – interrupted calls returning a negative value is even mentioned explicitly in the docs.

 There should, however, be a way to implement this in a platform-independent manner in client code, or even a second version that handles signal interruptions internally.

 (Partial writes is another issue that could/should be handled in
 std.socket so the user doesn't have to care about it)
 I don't think that would be possible – std.socket by design is a thin wrapper around BSD sockets (whether that's a good idea or not is another question), and how to handle partial writes depends entirely on the environment the socket is used in (think event-based architecture using fibers vs. other designs).

 In general, I wonder what the best way for going forward with std.socket is. Sure, we could try to slowly morph it into a "modern" networking implementation, but the current state also has its merits, as it allows people to use the familiar BSD sockets API without having to worry about all the trivial differences between the platforms (e.g. in symbol names).

 We should definitely add a note to std.socket though that it is a low-level API and that there might be a better choice for most applications (e.g. vibe.d, Thrift, …).

 David
You're right, I somehow thought std.socket was supposed to offer a high level API. But as it was designed as a low level wrapper we probably can't do much without breaking API compatibility.
Aug 05 2013
parent Marek Janukowicz <marek janukowicz.net> writes:
Johannes Pfau wrote:

 This is a bug in std.socket BTW. Blocking calls will get
 interrupted by
 the GC - there's no way to avoid that - but std.socket should
 handle
 this internally and just retry the interrupted operation.
 Please file a
 bug report about this.
 I'm not sure whether we can do anything about Socket.select itself at this point, as it would be a breaking API change – interrupted calls returning a negative value is even mentioned explicitly in the docs.

 There should, however, be a way to implement this in a platform-independent manner in client code, or even a second version that handles signal interruptions internally.

 (Partial writes is another issue that could/should be handled in
 std.socket so the user doesn't have to care about it)
 I don't think that would be possible – std.socket by design is a thin wrapper around BSD sockets (whether that's a good idea or not is another question), and how to handle partial writes depends entirely on the environment the socket is used in (think event-based architecture using fibers vs. other designs).

 In general, I wonder what the best way for going forward with std.socket is. Sure, we could try to slowly morph it into a "modern" networking implementation, but the current state also has its merits, as it allows people to use the familiar BSD sockets API without having to worry about all the trivial differences between the platforms (e.g. in symbol names).

 We should definitely add a note to std.socket though that it is a low-level API and that there might be a better choice for most applications (e.g. vibe.d, Thrift, …).

 David
 You're right, I somehow thought std.socket was supposed to offer a high level API. But as it was designed as a low level wrapper we probably can't do much without breaking API compatibility.
But - as I mentioned in another post - it looks like the "interrupted system call" problem happens only with select and not e.g. with a blocking read. This means the current behaviour is inconsistent between std.socket functions. Also, it was possible to make this work for read (I believe this bug & fix address that: http://d.puremagic.com/issues/show_bug.cgi?id=2242) and I don't think anyone considered that "compatibility breaking", so why not take the same route for select?

-- 
Marek Janukowicz
Aug 06 2013
prev sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, August 05, 2013 16:07:40 David Nadlinger wrote:
 I don't think that would be possible – std.socket by design is a
 thin wrapper around BSD sockets (whether that's a good idea or
 not is another question), and how to handle partial writes
 depends entirely on the environment the socket is used in (think
 event-based architecture using fibers vs. other designs).
 
 In general, I wonder what the best way for going forward with
 std.socket is. Sure, we could try to slowly morph it into a
 "modern" networking implementation, but the current state also
 has its merits, as it allows people to use the familiar BSD
 sockets API without having to worry about all the trivial
 differences between the platforms (e.g. in symbol names).
I'm all for std.socket being completely rewritten. I think that how it's tied to BSD sockets is a major liability. Where I work, we have a platform-independent socket class (in C++) which is generic enough that we have a derived class which uses OpenSSL, so that you can swap between normal sockets and SSL sockets seamlessly. You can't do anything of the sort with std.socket.

Unfortunately, I have neither the time nor the expertise at this point to rewrite std.socket, but if no one else does it, I'm sure that I'll write something eventually (whether it makes it into Phobos or not), because I really, really don't like how std.socket is put together. Having used a socket class which enables you to seamlessly pass around SSL sockets in the place of normal sockets, and having seen how fantastic and wonderful that is, I'm likely to have a very low opinion of a socket class whose design does not allow that.

- Jonathan M Davis
Aug 05 2013
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 05-Aug-2013 00:59, Marek Janukowicz wrote:
 Dmitry Olshansky wrote:
 There are more things specific to this particular application that would
 play a role here. One is that such "real workers" would operate on a common
 data structure and I would have to introduce some synchronization. Single
 worker thread was not my first approach, but after some woes with other
 solutions I decided to take it, because the problem is really not in
 processing (where a single thread does just fine so far), but in socket
 read/write operations.
Then what will make it simple is the following scenario: X input threads feed 1 worker thread by putting requests into one shared queue. You would have to use a lock around it or get some decent concurrent queue code (but better to start with a simple lock + queue)... Got carried away... you can just as easily use std.concurrency message passing (as *it is* an implicit message queue). Then just throw in another writer thread that receives pairs of responses + sockets (or shared void*, e-hm) from the "real worker".

The pipeline is then roughly:

Acceptor --CREATES--> InputWorkers (xN) --SEND REQ--> Real Worker --SOCK/RESP--> Writer
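A rough sketch of that pipeline with std.concurrency as the implicit queue; the Request/Response types, the buffer size and the "processing" step are placeholders, and the shared void* casts are the same hack mentioned earlier in the thread:

import std.socket;
import std.concurrency;

struct Request  { shared void* sock; immutable(ubyte)[] data; }
struct Response { shared void* sock; immutable(ubyte)[] data; }

// One receiver per connected client: blocking reads only.
void receiverLoop(shared void* p, Tid worker)
{
    auto sock = cast(Socket)cast(void*)p;
    ubyte[4096] buf;
    for (;;)
    {
        auto got = sock.receive(buf[]);
        if (got <= 0)
            return;                               // peer closed or error
        worker.send(Request(p, buf[0 .. got].idup));
    }
}

// The single "real" worker: all processing on the in-memory data, no socket I/O.
void workerLoop(Tid writer)
{
    for (;;)
    {
        auto req = receiveOnly!Request();
        auto answer = req.data;                   // placeholder for real processing
        writer.send(Response(req.sock, answer));
    }
}

// The writer: only pushes responses back out on the right socket.
void writerLoop()
{
    for (;;)
    {
        auto resp = receiveOnly!Response();
        auto sock = cast(Socket)cast(void*)resp.sock;
        sock.send(resp.data);
    }
}

// Wiring: auto writer = spawn(&writerLoop);
//         auto worker = spawn(&workerLoop, writer);
//         // per accepted client:
//         spawn(&receiverLoop, cast(shared void*)cast(void*)clientSock, worker);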
 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
50 threads is not that big a problem. Around 100+ could be, 1000+ is a killer.
Thanks for those numbers, it's great to know at least the ranges here.
 The benefit with thread per client is that you don't even need
 Socket.select, just use blocking I/O and do the work per each parsed
 request in the same thread.
Not really. This is something that Go (the language I also originally considered for the project) has solved in much better way - you can "select" on a number of "channels" and have both I/O and message passing covered by those.
They multiplex stuff in their runtime. In fact AFAIK they don't even have clean-cut native threads. It would be interesting to see how they handle it but I guess either self-pipe or event-driven + async I/O to begin with.
 In D I must react both to network data or message from worker
 incoming, which means either self-pipe trick (which leads to Socket.select
 again) or some quirky stuff with timeouts on socket read and message receive
 (but this is basically a busy loop).
Sadly, like others said, with std.socket you get to witness the gory glory of a BSD sockets API that shows its age. Regardless, it's what all major OSes directly provide.
  Btw.
 would it work if I pass a socket to 2 threads - reader and writer (by
 working I mean - not running into race conditions and other scary concurrent
 stuff)?
Should be just fine. See also http://stackoverflow.com/questions/1981372/are-parallel-calls-to-send-recv-on-the-same-socket-valid
 Also I'm really puzzled by the fact this common idiom doesn't work in some
 elegant way in D. I tried to Google a solution, but only found some weird
 tricks. Can anyone really experienced in D tell me why there is no nice
 solution for this (or correct me if I'm mistaken)?
The trick is that Socket/std.socket was designed way back, before std.concurrency. It's a class, as everything back then liked to be. The catch is that classes by default are mutable and thread-local and thus can't be automatically _safely_ transferred across threads. There were/are talks about adding some kind of Unique helper to facilitate such a move in a clean way. So at the moment - nope.

-- 
Dmitry Olshansky
Aug 05 2013
prev sibling next sibling parent Robert M. Münch <robert.muench saphirion.com> writes:
On 2013-08-04 19:38:49 +0000, Marek Janukowicz said:

 ...
 If anyone has any idea how to handle the problems I mentioned or has any
 idea for more suitable design I would be happy to hear it. It's also
 possible I'm approaching the issue from completely wrong direction, so you
 can correct me on that as well.
Hi, I would take a look at the BEEP protocol idea and there at the Vortex library [1]; it deals with everything you need. The idea of BEEP is that you don't have to care about all the network pitfalls, since these are always the same. Instead you can concentrate on your application-level design, where the time is spent in a much more valuable way. The lib is written in C and works very well. It's mature and multi-threaded to allow for maximum transfer rates.

[1] http://www.aspl.es/vortex/

-- 
Robert M. Münch 
Saphirion AG 
http://www.saphirion.com 
smarter | better | faster
Aug 05 2013
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Sun, 04 Aug 2013 20:38:49 +0100, Marek Janukowicz  
<marek janukowicz.net> wrote:
 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain

 Currently I have a Master thread (which is basically the main thread)  
 which
 is handling connections/disconnections, socket operations, sends parsed
 requests for processing to single Worker thread, sends responses to  
 clients.
 Interaction with Worker is done via message passing.

 The problem with my approach is that I read as much data as possible from
 each ready client in order. As there are many requests this read phase  
 might
 take a few seconds making the clients disconnect. Now I see 2 possible
 solutions:

 1. Stay with the design I have, but change the workflow somewhat -  
 instead
 of reading all the data from clients just read some requests and then  
 send
 responses that are ready and repeat; the downside is that it's more
 complicated than current design, might be slower (more loop iterations  
 with
 less work done in each iteration) and might require quite a lot of  
 tweaking
 when it comes to how many requests/responses handle each time etc.

 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
 - I can't initialize threads created via std.concurrency.spawn with a  
 Socket
 object ("Aliases to mutable thread-local data not allowed.")
 - I already have problems with "interrupted system call" on Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH it sucks  
 I
 have to do anything about that and would suck even more to do that with  
 50
 or so threads

 If anyone has any idea how to handle the problems I mentioned or has any
 idea for more suitable design I would be happy to hear it. It's also
 possible I'm approaching the issue from completely wrong direction, so  
 you
 can correct me on that as well.
number of clients. I have had loads of experience with server applications on Windows and a little less on the various flavours of UNIXen, and 50 connected clients serviced by 50 threads should be perfectly manageable for the OS.

It sounds like only blocking sockets have the GC interrupt issue; if so, use non-blocking sockets instead. However, it occurs to me that the issue may rear its head again on the call to select() on non-blocking sockets, so it is worth testing this first. If there is no way around the GC interrupt issue then code up your own recv function and re-use it in all your threads; not ideal, but definitely workable.

In the case of non-blocking sockets your read operation needs to account for the /this would block/ error code, and should go something like this.. (using low-level socket function call names because I have not used the D socket library recently)

1. Attempt recv(), expect either DATA or ERROR.
1a. If DATA, process data and handle possible partial request(s).
1c. If ERROR and not would-block, fail/exit/disconnect.
2. Perform select() (**this may be interruptible by GC**) for a finite, shortish timeout - if you want your client handlers to react quickly to the signal to shut down then you want a shorter time.
2b. If select returns an error, fail/exit/disconnect.

Do you have control of the connecting client code as well? If so, think about disabling the Nagle algorithm:
http://en.wikipedia.org/wiki/Nagle's_algorithm

You will want to ensure the client writes its requests in a single send() call, but in this way you reduce the delay in receiving requests at the server side. If the client writes multiple requests rapidly then with Nagle enabled it may buffer them on the client end and will delay the server seeing the first, but with it disabled the server will see the first as soon as it is written and can start processing it while the client writes. So depending on how your clients send requests, you may see a performance improvement here.

I don't know how best to solve the "Aliases to mutable thread-local data not allowed." issue. You will need to ensure the socket is allocated globally (not thread-local), and because you know it's unique and not shared you can cast it as such to get it into the thread; once there you can cast it back to unshared/local/mutable. Not ideal, but not illegal or invalid AFAICS.

FYI: for a better, more scalable solution you would use async IO with a pool of worker threads; I am not sure if D has good support for this (and library support for it).

Regan

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
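To map the recv/select outline above onto std.socket, here is a sketch that assumes the socket has already been put into non-blocking mode (sock.blocking = false); drainSocket is an invented name, and wouldHaveBlocked()/lastSocketError() are the std.socket helpers for inspecting the last error. The Nagle option is shown at the end:

import std.socket;

// One pass of the read step for a non-blocking socket: drain everything that is
// currently available into the per-connection buffer, then return to select().
void drainSocket(Socket sock, ref ubyte[] pending)
{
    ubyte[4096] chunk;
    for (;;)
    {
        auto got = sock.receive(chunk[]);
        if (got > 0)
        {
            pending ~= chunk[0 .. got];    // accumulate; parse complete requests elsewhere
            continue;
        }
        if (got == 0)
            throw new SocketException("peer closed the connection");
        if (wouldHaveBlocked())
            return;                        // nothing more to read right now
        throw new SocketException("receive failed: " ~ lastSocketError());
    }
}

// Client side, to disable Nagle and cut request latency:
//     clientSock.setOption(SocketOptionLevel.TCP, SocketOption.TCP_NODELAY, true);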
Aug 05 2013
prev sibling next sibling parent Justin Whear <justin economicmodeling.com> writes:
On Sun, 04 Aug 2013 21:38:49 +0200, Marek Janukowicz wrote: 
 If anyone has any idea how to handle the problems I mentioned or has any
 idea for more suitable design I would be happy to hear it. It's also
 possible I'm approaching the issue from completely wrong direction, so
 you can correct me on that as well.
Are you familiar with ZeroMQ? I write network infrastructure on a fairly regular basis and wouldn't dream of doing it without ZeroMQ: http://zeromq.org/

There are D bindings in Deimos: https://github.com/D-Programming-Deimos/ZeroMQ
Aug 05 2013
prev sibling next sibling parent Marek Janukowicz <marek janukowicz.net> writes:
Marek Janukowicz wrote:

 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain
 
 Currently I have a Master thread (which is basically the main thread)
 which is handling connections/disconnections, socket operations, sends
 parsed requests for processing to single Worker thread, sends responses to
 clients. Interaction with Worker is done via message passing.
I'd like to thank everyone for the valuable input. For now I chose Dmitry's suggestion (which was an extension of my idea to go with a thread per client), so I have multiple receivers, a single worker and multiple senders. That works quite well, although I didn't really test it with many clients.

One nice thing is that the "interrupted system call" problem magically went away - it looks like it occurred only with Socket.select (which I don't use anymore after the architectural changes), and socket.send/receive is apparently not affected.

-- 
Marek Janukowicz
Aug 05 2013
prev sibling next sibling parent Brad Roberts <braddr puremagic.com> writes:
A reasonably common way to handle this is that the event loop thread only detects events (readable, writable, etc) and passes them off to worker threads to process (do the reading and parsing, do the writing, etc). In general, I wouldn't recommend one thread per active connection, but if you're _sure_ that you're constrained to those low sorts of numbers, then it might well be the easiest path to go for your app.

You definitely want to move the actual I/O out of your event loop thread, to let those other cores take on that job, freeing up your single-threaded part to do as little work as possible. It's your bottleneck and that resource needs to be protected.

On 8/4/13 12:38 PM, Marek Janukowicz wrote:
 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain

 Currently I have a Master thread (which is basically the main thread) which
 is handling connections/disconnections, socket operations, sends parsed
 requests for processing to single Worker thread, sends responses to clients.
 Interaction with Worker is done via message passing.

 The problem with my approach is that I read as much data as possible from
 each ready client in order. As there are many requests this read phase might
 take a few seconds making the clients disconnect. Now I see 2 possible
 solutions:

 1. Stay with the design I have, but change the workflow somewhat - instead
 of reading all the data from clients just read some requests and then send
 responses that are ready and repeat; the downside is that it's more
 complicated than current design, might be slower (more loop iterations with
 less work done in each iteration) and might require quite a lot of tweaking
 when it comes to how many requests/responses handle each time etc.

 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
 - I can't initialize threads created via std.concurrency.spawn with a Socket
 object ("Aliases to mutable thread-local data not allowed.")
 - I already have problems with "interrupted system call" on Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH it sucks I
 have to do anything about that and would suck even more to do that with 50
 or so threads

 If anyone has any idea how to handle the problems I mentioned or has any
 idea for more suitable design I would be happy to hear it. It's also
 possible I'm approaching the issue from completely wrong direction, so you
 can correct me on that as well.
Aug 04 2013
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 4, 2013, at 12:38 PM, Marek Janukowicz <marek janukowicz.net> wrote:

 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain
Given the relatively small number of concurrent connections, you may be best off just spawning a thread per connection. The cost of context switching at that level of concurrency is reasonably low, and the code will be a heck of a lot simpler than an event loop dispatching jobs to a thread pool (which is the direction you might head with a larger number of connections).

 Currently I have a Master thread (which is basically the main thread) which
 is handling connections/disconnections, socket operations, sends parsed
 requests for processing to single Worker thread, sends responses to clients.
 Interaction with Worker is done via message passing.

 The problem with my approach is that I read as much data as possible from
 each ready client in order. As there are many requests this read phase might
 take a few seconds making the clients disconnect.
This seems weird to me. Are those reads blocking for some length of time? I would expect them to return pretty much instantly. How much data is in each request?

 Now I see 2 possible solutions:

 1. Stay with the design I have, but change the workflow somewhat - instead
 of reading all the data from clients just read some requests and then send
 responses that are ready and repeat; the downside is that it's more
 complicated than current design, might be slower (more loop iterations with
 less work done in each iteration) and might require quite a lot of tweaking
 when it comes to how many requests/responses handle each time etc.
There are a bunch of different approaches along these lines, but the crux of it is that you'll be multiplexing N connections across an M-sized thread pool. Each connection carries a buffer with it, and whenever data is available you stick that connection in a work queue, and let a pooled thread accumulate the new data into that connection's buffer and potentially process the complete request.
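A compact sketch of that shape using std.parallelism's task pool as the worker pool; Conn, onReadable and processRequests are placeholder names, and real code would also have to prevent two tasks from reading the same connection at once:

import std.socket;
import std.parallelism : task, taskPool;

// Each connection carries its own accumulation buffer.
class Conn
{
    Socket sock;
    ubyte[] buffer;
    this(Socket s) { sock = s; }
}

// Called by the event loop when select() reports the connection readable:
// hand the read + parse work to a pooled thread.
void onReadable(Conn c)
{
    void doRead()
    {
        ubyte[4096] chunk;
        auto got = c.sock.receive(chunk[]);
        if (got > 0)
        {
            c.buffer ~= chunk[0 .. got];
            // processRequests(c);   // placeholder: consume complete requests from c.buffer
        }
    }
    taskPool.put(task(&doRead));
}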
 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
With a thread per connection you can probably just do blocking reads in each thread and not bother with select at all. And with only 50 threads I don't think you'll see a performance problem. I've been reading up on Java NIO recently (their approach for supporting epoll within Java), and some people have actually said that the old thread-per-connection approach was actually faster in their tests. Of course, no one seems to test beyond a few thousand concurrent connections, but that's still well above what you're doing. In short, I'd consider benchmarking it and see if performance is up to snuff.

 - I can't initialize threads created via std.concurrency.spawn with a Socket
 object ("Aliases to mutable thread-local data not allowed.")
You can cast the Socket to shared and cast away shared upon receipt. I'd like a more formal means of moving uniquely referenced data via std.concurrency, but that will do the trick for now.

 - I already have problems with "interrupted system call" on Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH it sucks I
 have to do anything about that and would suck even more to do that with 50
 or so threads
Just wrap it in a function that tests the return value and loops if necessary. Plenty of system calls need to deal with the EINTR error. It may not just be GC that's causing it. There's a decent chance you'll have to deal with SIGPIPE as well.
Aug 05 2013
prev sibling next sibling parent Brad Roberts <braddr puremagic.com> writes:
On 8/5/13 4:33 PM, Sean Kelly wrote:
 On Aug 4, 2013, at 12:38 PM, Marek Janukowicz <marek janukowicz.net> wrote:

 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will disconnect and complain
Given the relatively small number of concurrent connections, you may be best off just spawning a thread per connection. The cost of context switching at that level of concurrency is reasonably low, and the code will be a heck of a lot simpler than an event loop dispatching jobs to a thread pool (which is the direction you might head with a larger number of connections).
I agree, with one important caveat: converting from a blocking thread per connection model to a non-blocking pool of threads model is often essentially starting over. Even at the 50 threads point I tend to think you've passed the point of just throwing threads at the problem. But I'm also much more used to dealing with 10's of thousands of sockets, so my view is a tad biased.
Aug 05 2013
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 5, 2013, at 4:49 PM, Brad Roberts <braddr puremagic.com> wrote:

 On 8/5/13 4:33 PM, Sean Kelly wrote:
 Given the relatively small number of concurrent connections, you may be best off just spawning a
 thread per connection. The cost of context switching at that level of concurrency is reasonably
 low, and the code will be a heck of a lot simpler than an event loop dispatching jobs to a thread
 pool (which is the direction you might head with a larger number of connections).

 I agree, with one important caveat: converting from a blocking thread per connection model to a
 non-blocking pool of threads model is often essentially starting over. Even at the 50 threads
 point I tend to think you've passed the point of just throwing threads at the problem. But I'm
 also much more used to dealing with 10's of thousands of sockets, so my view is a tad biased.

I'm in the same boat in terms of experience, so I'm trying to resist my inclination to do things the scalable way in favor of the simplest approach that meets the stated requirements. You're right that switching would mean a total rewrite though, except possibly if you switched to Vibe, which uses fibers to make things look like the one thread per connection approach when it's actually multiplexing.

The real tricky bit about multiplexing, however, is how to deal with situations when you need to perform IO to handle client requests. If that IO isn't event-based as well then you're once again spawning threads to keep that IO from holding up request processing. I'm actually kind of surprised that more current-gen APIs don't expose the file descriptor they use for their work or provide some other means of integrating into an event loop. In a lot of cases it seems like I end up having to write my own version of whatever library just to get the scalability characteristics I require, which is a horrible use of time.
Aug 06 2013