
digitalmars.D - Network server design question

reply Marek Janukowicz <marek janukowicz.net> writes:
I'm writing a network server with some specific requirements:
- 5-50 clients connected (almost) permanently (maybe a bit more, but 
definitely not hundreds of them)
- possibly thousands of requests per second
- responses need to be returned within 5 seconds or the client will 
disconnect and complain

Currently I have a Master thread (which is basically the main thread) which 
handles connections/disconnections and socket operations, sends parsed 
requests to a single Worker thread for processing, and sends responses to clients. 
Interaction with the Worker is done via message passing.

The problem with my approach is that I read as much data as possible from 
each ready client in order. As there are many requests this read phase might 
take a few seconds making the clients disconnect. Now I see 2 possible 
solutions:

1. Stay with the design I have, but change the workflow somewhat - instead 
of reading all the data from clients just read some requests and then send 
responses that are ready and repeat; the downside is that it's more 
complicated than current design, might be slower (more loop iterations with 
less work done in each iteration) and might require quite a lot of tweaking 
when it comes to how many requests/responses to handle each time etc.

2. Create a separate thread per client connection. I think this could 
result in a nice, clean setup, but I see some problems:
- I'm not sure how ~50 threads will do resource-wise (although they will 
probably be mostly waiting on Socket.select)
- I can't initialize threads created via std.concurrency.spawn with a Socket 
object ("Aliases to mutable thread-local data not allowed.")
- I already have problems with "interrupted system call" on Socket.select 
due to GC kicking in; I'm restarting the call manually, but TBH it sucks I 
have to do anything about that and would suck even more to do that with 50 
or so threads

If anyone has any idea how to handle the problems I mentioned or has any 
idea for a more suitable design, I would be happy to hear it. It's also 
possible I'm approaching the issue from a completely wrong direction, so you 
can correct me on that as well.

-- 
Marek Janukowicz
Aug 04 2013
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 4 August 2013 at 19:37:40 UTC, Marek Janukowicz wrote:
 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit 
 more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client 
 will
 disconnect and complain

 Currently I have a Master thread (which is basically the main 
 thread) which
 is handling connections/disconnections, socket operations, 
 sends parsed
 requests for processing to single Worker thread, sends 
 responses to clients.
 Interaction with Worker is done via message passing.

 The problem with my approach is that I read as much data as 
 possible from
 each ready client in order. As there are many requests this 
 read phase might
 take a few seconds making the clients disconnect. Now I see 2 
 possible
 solutions:

 1. Stay with the design I have, but change the workflow 
 somewhat - instead
 of reading all the data from clients just read some requests 
 and then send
 responses that are ready and repeat; the downside is that it's 
 more
 complicated than current design, might be slower (more loop 
 iterations with
 less work done in each iteration) and might require quite a lot 
 of tweaking
 when it comes to how many requests/responses handle each time 
 etc.

 2. Create separate thread per each client connection. I think 
 this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although 
 they will
 probably be mostly waiting on Socket.select)
 - I can't initialize threads created via std.concurrency.spawn 
 with a Socket
 object ("Aliases to mutable thread-local data not allowed.")
 - I already have problems with "interrupted system call" on 
 Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH 
 it sucks I
 have to do anything about that and would suck even more to do 
 that with 50
 or so threads

 If anyone has any idea how to handle the problems I mentioned 
 or has any
 idea for more suitable design I would be happy to hear it. It's 
 also
 possible I'm approaching the issue from completely wrong 
 direction, so you
 can correct me on that as well.
Take a look at how vibe.d approaches the problem: http://vibed.org/
Aug 04 2013
parent reply Marek Janukowicz <marek janukowicz.net> writes:
John Colvin wrote:
 Take a look at how vibe.d approaches the problem:
 http://vibed.org/
Vibe.d uses fibers, which I don't find feasible for my particular application for a number of reasons:
- I have a constant number of ever-connected clients, not an ever-changing number of random clients
- after I read and parse a request there is not much room for yielding during processing (I don't do I/O or database calls, I have an in-memory "database" for performance reasons)
- event-based programming generally looks complicated to me and (for the reason mentioned above) I don't see much point in utilizing it in this case

-- 
Marek Janukowicz
Aug 04 2013
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 4 August 2013 at 20:37:43 UTC, Marek Janukowicz wrote:
 John Colvin wrote:
 Take a look at how vibe.d approaches the problem:
 http://vibed.org/
 Vibe.d uses fibers, which I don't find feasible for my particular application for a number of reasons:
 - I have constant number of ever-connected clients, not an ever-changing number of random clients
 - after I read and parse a request there is not much room for yielding during processing (I don't do I/O or database calls, I have an in-memory "database" for performance reasons)
 - event-based programming generally looks complicated to me and (for the reason mentioned above) I don't see much point in utilizing it in this case
You'd be surprised how easy it can be with vibe and D.

Nonetheless, this isn't my area of expertise; I just thought it might be interesting, if you hadn't already seen it.
Aug 04 2013
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 04-Aug-2013 23:38, Marek Janukowicz wrote:
 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain

 Currently I have a Master thread (which is basically the main thread) which
 is handling connections/disconnections, socket operations, sends parsed
 requests for processing to single Worker thread, sends responses to clients.
 Interaction with Worker is done via message passing.
A typical approach would be to separate responsibilities even more and make a pool of threads for each stage. You may want to make the Master thread only handle new connections, selecting over an "accept socket" (or a few, if there are multiple end-points). It would then distribute connected clients over I/O worker threads. A pool of I/O workers would then only send/receive data, passing parsed requests to the "real" workers and responses back; they handle disconnects and closing though. The real workers could again be pooled to be more responsive (or e.g. just one per I/O thread).
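For illustration, a minimal sketch of that acceptor stage in D. Names like acceptorLoop and ioWorkers, the port number and the round-robin policy are made up for the example, and the shared void* cast is the workaround discussed further down the thread:

import std.socket;
import std.concurrency;

// Acceptor stage (run in the Master/main thread): only accepts new clients,
// then hands each connection off to one of the already-spawned I/O workers.
void acceptorLoop(Tid[] ioWorkers)
{
    auto listener = new TcpSocket();
    listener.setOption(SocketOptionLevel.SOCKET, SocketOption.REUSEADDR, true);
    listener.bind(new InternetAddress(4000));   // port chosen arbitrarily
    listener.listen(64);

    size_t next;
    while (true)
    {
        Socket client = listener.accept();      // blocks until a new connection arrives
        // hand the socket to the next I/O worker (cast hack, see below)
        ioWorkers[next % ioWorkers.length].send(cast(shared void*)cast(void*)client);
        ++next;
    }
}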
 The problem with my approach is that I read as much data as possible from
 each ready client in order. As there are many requests this read phase might
 take a few seconds making the clients disconnect. Now I see 2 possible
 solutions:

 1. Stay with the design I have, but change the workflow somewhat - instead
 of reading all the data from clients just read some requests and then send
 responses that are ready and repeat; the downside is that it's more
 complicated than current design, might be slower (more loop iterations with
 less work done in each iteration) and might require quite a lot of tweaking
 when it comes to how many requests/responses handle each time etc.
Or split the clients across a group of threads to reduce maximum latency. See above - just determine the number of clients per thread your system can sustain in time. A better way would be to dynamically load-balance clients between threads, but that's far more complicated.
 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
50 threads is not that big a problem. Around 100+ could be, 1000+ is a killer. The benefit of thread per client is that you don't even need Socket.select: just use blocking I/O and do the work for each parsed request in the same thread.
 - I can't initialize threads created via std.concurrency.spawn with a Socket
 object ("Aliases to mutable thread-local data not allowed.")
This can be hacked with casts to shared void* and back. Not pretty but workable.
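A minimal sketch of that hack, with a hypothetical handleClient function; the round-trip through shared void* exists only to satisfy spawn's check against thread-local aliasing:

import std.socket;
import std.concurrency;

// Runs in its own thread; p is really a Socket smuggled through spawn.
void handleClient(shared void* p)
{
    auto sock = cast(Socket)cast(void*)p;   // cast back to a normal, thread-local reference
    ubyte[4096] buf;
    for (;;)
    {
        auto got = sock.receive(buf[]);
        if (got <= 0)                       // 0 = peer closed, Socket.ERROR = failure
            break;
        // ... parse the request and send the response here ...
    }
    sock.close();
}

// In the master thread, after accept():
//     spawn(&handleClient, cast(shared void*)cast(void*)clientSock);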
 - I already have problems with "interrupted system call" on Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH it sucks I
 have to do anything about that and would suck even more to do that with 50
 or so threads
I'm not sure if that problem will surface with blocking reads.
 If anyone has any idea how to handle the problems I mentioned or has any
 idea for more suitable design I would be happy to hear it. It's also
 possible I'm approaching the issue from completely wrong direction, so you
 can correct me on that as well.
-- Dmitry Olshansky
Aug 04 2013
parent reply Marek Janukowicz <marek janukowicz.net> writes:
Dmitry Olshansky wrote:

 On 04-Aug-2013 23:38, Marek Janukowicz wrote:
 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain

 Currently I have a Master thread (which is basically the main thread)
 which is handling connections/disconnections, socket operations, sends
 parsed requests for processing to single Worker thread, sends responses
 to clients. Interaction with Worker is done via message passing.
Typical approach would be to separate responsibilities even more and make a pool of threads per each stage. You may want to make a Master thread only handle new connections selecting over an "accept socket" (or a few if multiple end-points). Then it may distribute connected clients over I/O worker threads. A pool of I/O workers would then only send/receive data passing parsed request to "real" workers and responses back. They handle disconnects and closing though.
This is basically approach "2." I mentioned in my original post, I'm glad you agree it makes sense :)
 The real workers could be again pooled to be more responsive (or e.g.
 just one per each I/O thread).
There are more things specific to this particular application that would play a role here. One is that such "real workers" would operate on a common data structure and I would have to introduce some synchronization. A single worker thread was not my first approach, but after some woes with other solutions I decided to take it, because the problem is really not in processing (where a single thread does just fine so far), but in socket read/write operations.
 The problem with my approach is that I read as much data as possible from
 each ready client in order. As there are many requests this read phase
 might take a few seconds making the clients disconnect. Now I see 2
 possible solutions:

 1. Stay with the design I have, but change the workflow somewhat -
 instead of reading all the data from clients just read some requests and
 then send responses that are ready and repeat; the downside is that it's
 more complicated than current design, might be slower (more loop
 iterations with less work done in each iteration) and might require quite
 a lot of tweaking when it comes to how many requests/responses handle
 each time etc.
Or split the clients across a group of threads to reduce maximum latency. See above, just determine the amount of clients per thread your system can sustain in time. A better way would be to dynamically load-balance clients between threads but it's far more complicated.
Yeah, both approaches seem to be somewhat more complicated and I'd like to avoid this if possible. So one client per thread makes sense to me.
 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
50 threads is not that big a problem. Around 100+ could be, 1000+ is a killer.
Thanks for those numbers, it's great to know at least the ranges here.
 The benefit with thread per client is that you don't even need
 Socket.select, just use blocking I/O and do the work per each parsed
 request in the same thread.
Not really. This is something that Go (the language I also originally considered for the project) has solved in a much better way - you can "select" on a number of "channels" and have both I/O and message passing covered by those. In D I must react both to network data and to incoming messages from the worker, which means either the self-pipe trick (which leads to Socket.select again) or some quirky stuff with timeouts on socket read and message receive (but this is basically a busy loop).
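For what it's worth, a rough sketch of the self-pipe variant in D, using std.socket's socketPair() as the wakeup channel; waitForWork and the one-byte wakeup protocol are invented for the example:

import std.socket;

// One iteration of a reader loop that waits on both its client socket and a
// wakeup socket written to by the worker thread (self-pipe trick).
void waitForWork(Socket clientSock, Socket wakeSock)
{
    auto readSet = new SocketSet();
    readSet.add(clientSock);
    readSet.add(wakeSock);
    Socket.select(readSet, null, null);     // blocks until either becomes readable

    if (readSet.isSet(wakeSock))
    {
        ubyte[1] b;
        wakeSock.receive(b[]);              // drain the wakeup byte...
        // ...then pull the pending message via std.concurrency.receive
    }
    if (readSet.isSet(clientSock))
    {
        // read and parse client data as usual
    }
}

// Setup:  auto pair = socketPair();   // pair[0] -> reader thread, pair[1] -> worker
// Worker, after posting a message:    ubyte[1] one = [1]; pair[1].send(one[]);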
 - I can't initialize threads created via std.concurrency.spawn with a
 Socket object ("Aliases to mutable thread-local data not allowed.")
This can be hacked with casts to shared void* and back. Not pretty but workable.
I'm using this trick elsewhere, but was a bit reluctant to try it here. Btw, would it work if I passed a socket to 2 threads - a reader and a writer (by "work" I mean not running into race conditions and other scary concurrent stuff)?

Also I'm really puzzled by the fact that this common idiom doesn't work in some elegant way in D. I tried to Google a solution, but only found some weird tricks. Can anyone really experienced in D tell me why there is no nice solution for this (or correct me if I'm mistaken)?
 - I already have problems with "interrupted system call" on Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH it sucks
 I have to do anything about that and would suck even more to do that with
 50 or so threads
I'm not sure if that problem will surface with blocking reads.
Unfortunately it will (it happens precisely with blocking calls).

Thanks for your input, which shed some more light for me and also allowed me to explain the whole thing a bit more.

-- 
Marek Janukowicz
Aug 04 2013
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Sun, 04 Aug 2013 22:59:04 +0200, Marek Janukowicz <marek janukowicz.net> wrote:

 - I already have problems with "interrupted system call" on
 Socket.select due to GC kicking in; I'm restarting the call
 manually, but TBH it sucks I have to do anything about that and
 would suck even more to do that with 50 or so threads
I'm not sure if that problem will surface with blocking reads.
Unfortunately it will (it precisely happens with blocking calls). Thanks for your input, which shed some more light for me and also allowed me to explain the whole thing a bit more.
This is a bug in std.socket BTW. Blocking calls will get interrupted by the GC - there's no way to avoid that - but std.socket should handle this internally and just retry the interrupted operation. Please file a bug report about this.

(Partial writes are another issue that could/should be handled in std.socket so the user doesn't have to care about it.)
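Until something like that lands in Phobos, the manual retry can at least be factored out once. A sketch relying on the documented behaviour that Socket.select returns a negative value when interrupted; selectRetry and the repopulate callback are illustrative names only:

import std.socket;
import core.time : Duration;

// Retries Socket.select when it is interrupted (e.g. by the GC suspending threads).
// The caller supplies a delegate that rebuilds the SocketSets, since select may
// have modified them before being interrupted. Note the timeout restarts in full.
int selectRetry(SocketSet reads, SocketSet writes, SocketSet errs,
                Duration timeout, scope void delegate() repopulate)
{
    for (;;)
    {
        auto n = Socket.select(reads, writes, errs, timeout);
        if (n >= 0)
            return n;       // number of ready sockets, or 0 on timeout
        repopulate();       // interrupted: rebuild the sets and try again
    }
}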
Aug 04 2013
parent reply "David Nadlinger" <code klickverbot.at> writes:
On Monday, 5 August 2013 at 06:36:15 UTC, Johannes Pfau wrote:
 This is a bug in std.socket BTW. Blocking calls will get 
 interrupted by
 the GC - there's no way to avoid that - but std.socket should 
 handle
 this internally and just retry the interrupted operation. 
 Please file a
 bug report about this.
I'm not sure whether we can do anything about Socket.select itself at this point, as it would be a breaking API change – interrupted calls returning a negative value is even mentioned explicitly in the docs.

There should, however, be a way to implement this in a platform-independent manner in client code, or even a second version that handles signal interruptions internally.
 (Partial writes is another issue that could/should be handled in
 std.socket so the user doesn't have to care about it)
I don't think that would be possible – std.socket by design is a thin wrapper around BSD sockets (whether that's a good idea or not is another question), and how to handle partial writes depends entirely on the environment the socket is used in (think event-based architecture using fibers vs. other designs).

In general, I wonder what the best way for going forward with std.socket is. Sure, we could try to slowly morph it into a "modern" networking implementation, but the current state also has its merits, as it allows people to use the familiar BSD sockets API without having to worry about all the trivial differences between the platforms (e.g. in symbol names).

We should definitely add a note to std.socket though that it is a low-level API and that there might be a better choice for most applications (e.g. vibe.d, Thrift, …).

David
Aug 05 2013
next sibling parent reply Johannes Pfau <nospam example.com> writes:
On Mon, 05 Aug 2013 16:07:40 +0200, "David Nadlinger" <code klickverbot.at> wrote:

 On Monday, 5 August 2013 at 06:36:15 UTC, Johannes Pfau wrote:
 This is a bug in std.socket BTW. Blocking calls will get interrupted by
 the GC - there's no way to avoid that - but std.socket should handle
 this internally and just retry the interrupted operation. Please file a
 bug report about this.
 I'm not sure whether we can do anything about Socket.select itself at this point, as it would be a breaking API change – interrupted calls returning a negative value is even mentioned explicitly in the docs.

 There should, however, be a way to implement this in a platform-independent manner in client code, or even a second version that handles signal interruptions internally.

 (Partial writes is another issue that could/should be handled in
 std.socket so the user doesn't have to care about it)
 I don't think that would be possible – std.socket by design is a thin wrapper around BSD sockets (whether that's a good idea or not is another question), and how to handle partial writes depends entirely on the environment the socket is used in (think event-based architecture using fibers vs. other designs).

 In general, I wonder what the best way for going forward with std.socket is. Sure, we could try to slowly morph it into a "modern" networking implementation, but the current state also has its merits, as it allows people to use the familiar BSD sockets API without having to worry about all the trivial differences between the platforms (e.g. in symbol names).

 We should definitely add a note to std.socket though that it is a low-level API and that there might be a better choice for most applications (e.g. vibe.d, Thrift, …).

 David
You're right, I somehow thought std.socket was supposed to offer a high level API. But as it was designed as a low level wrapper we probably can't do much without breaking API compatibility.
Aug 05 2013
parent Marek Janukowicz <marek janukowicz.net> writes:
Johannes Pfau wrote:

 This is a bug in std.socket BTW. Blocking calls will get
 interrupted by
 the GC - there's no way to avoid that - but std.socket should
 handle
 this internally and just retry the interrupted operation.
 Please file a
 bug report about this.
 I'm not sure whether we can do anything about Socket.select itself at this point, as it would be a breaking API change – interrupted calls returning a negative value is even mentioned explicitly in the docs.

 There should, however, be a way to implement this in a platform-independent manner in client code, or even a second version that handles signal interruptions internally.

 (Partial writes is another issue that could/should be handled in
 std.socket so the user doesn't have to care about it)
 I don't think that would be possible – std.socket by design is a thin wrapper around BSD sockets (whether that's a good idea or not is another question), and how to handle partial writes depends entirely on the environment the socket is used in (think event-based architecture using fibers vs. other designs).

 In general, I wonder what the best way for going forward with std.socket is. Sure, we could try to slowly morph it into a "modern" networking implementation, but the current state also has its merits, as it allows people to use the familiar BSD sockets API without having to worry about all the trivial differences between the platforms (e.g. in symbol names).

 We should definitely add a note to std.socket though that it is a low-level API and that there might be a better choice for most applications (e.g. vibe.d, Thrift, …).

 David
 You're right, I somehow thought std.socket was supposed to offer a high level API. But as it was designed as a low level wrapper we probably can't do much without breaking API compatibility.
But - as I mentioned in another post - it looks like the "interrupted system call" problem happens only with select and not e.g. with a blocking read. This means the current behaviour is inconsistent between std.socket functions. Also, it was possible to make this work for read (I believe this bug & fix address that: http://d.puremagic.com/issues/show_bug.cgi?id=2242) and I don't think anyone considered that "compatibility breaking", so why not take the same route for select?

-- 
Marek Janukowicz
Aug 06 2013
prev sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, August 05, 2013 16:07:40 David Nadlinger wrote:
 I don't think that would be possible – std.socket by design is a
 thin wrapper around BSD sockets (whether that's a good idea or
 not is another question), and how to handle partial writes
 depends entirely on the environment the socket is used in (think
 event-based architecture using fibers vs. other designs).
 
 In general, I wonder what the best way for going forward with
 std.socket is. Sure, we could try to slowly morph it into a
 "modern" networking implementation, but the current state also
 has its merits, as it allows people to use the familiar BSD
 sockets API without having to worry about all the trivial
 differences between the platforms (e.g. in symbol names).
I'm all for std.socket being completely rewritten. I think that how it's tied to BSD sockets is a major liability. Where I work, we have a platform-independent socket class (in C++) which is generic enough that we have a derived class which uses OpenSSL, so that you can swap between normal sockets and SSL sockets seamlessly. You can't do anything of the sort with std.socket.

Unfortunately, I have neither the time nor the expertise at this point to rewrite std.socket, but if no one else does it, I'm sure that I'll write something eventually (whether it makes it into Phobos or not), because I really, really don't like how std.socket is put together. Having used a socket class which enables you to seamlessly pass around SSL sockets in the place of normal sockets, and having seen how fantastic and wonderful that is, I'm likely to have a very low opinion of a socket class whose design does not allow that.

- Jonathan M Davis
Aug 05 2013
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 05-Aug-2013 00:59, Marek Janukowicz wrote:
 Dmitry Olshansky wrote:
 There are more things specific to this particular application that would
 play a role here. One is that such "real workers" would operate on a common
 data structure and I would have to introduce some synchronization. Single
 worker thread was not my first approach, but after some woes with other
 solutions I decided to take it, because the problem is really not in
 processing (where a single thread does just fine so far), but in socket
 read/write operations.
Then what will make it simple is the following scenario: X input threads feed 1 worker thread by putting requests into one shared queue. You would have to use a lock around it or get some decent concurrent queue code (but better to start with a simple lock + queue)... Got carried away... you can just as easily use std.concurrency message passing (as *it is* an implicit message queue). Then just throw in another writer thread that receives pairs of responses + sockets (or shared void*, e-hm) from the "real worker".

The pipeline is then roughly:

Acceptor --CREATES--> InputWorkers (xN) --SEND REQ--> Real Worker --SOCK/RESP--> Writer
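A rough sketch of that pipeline with std.concurrency as the implicit queue; the Request/Response types, the buffer size and the "processing" step are placeholders, and the shared void* casts are the same hack mentioned earlier in the thread:

import std.socket;
import std.concurrency;

struct Request  { shared void* sock; immutable(ubyte)[] data; }
struct Response { shared void* sock; immutable(ubyte)[] data; }

// One receiver per connected client: blocking reads only.
void receiverLoop(shared void* p, Tid worker)
{
    auto sock = cast(Socket)cast(void*)p;
    ubyte[4096] buf;
    for (;;)
    {
        auto got = sock.receive(buf[]);
        if (got <= 0)
            return;                               // peer closed or error
        worker.send(Request(p, buf[0 .. got].idup));
    }
}

// The single "real" worker: all processing on the in-memory data, no socket I/O.
void workerLoop(Tid writer)
{
    for (;;)
    {
        auto req = receiveOnly!Request();
        auto answer = req.data;                   // placeholder for real processing
        writer.send(Response(req.sock, answer));
    }
}

// The writer: only pushes responses back out on the right socket.
void writerLoop()
{
    for (;;)
    {
        auto resp = receiveOnly!Response();
        auto sock = cast(Socket)cast(void*)resp.sock;
        sock.send(resp.data);
    }
}

// Wiring: auto writer = spawn(&writerLoop);
//         auto worker = spawn(&workerLoop, writer);
//         // per accepted client:
//         spawn(&receiverLoop, cast(shared void*)cast(void*)clientSock, worker);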
 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
50 threads is not that big a problem. Around 100+ could be, 1000+ is a killer.
Thanks for those numbers, it's great to know at least the ranges here.
 The benefit with thread per client is that you don't even need
 Socket.select, just use blocking I/O and do the work per each parsed
 request in the same thread.
Not really. This is something that Go (the language I also originally considered for the project) has solved in much better way - you can "select" on a number of "channels" and have both I/O and message passing covered by those.
They multiplex stuff in their runtime. In fact AFAIK they don't even have clean-cut native threads. It would be interesting to see how they handle it but I guess either self-pipe or event-driven + async I/O to begin with.
 In D I must react both to network data or message from worker
 incoming, which means either self-pipe trick (which leads to Socket.select
 again) or some quirky stuff with timeouts on socket read and message receive
 (but this is basically a busy loop).
Sadly, like others said, with std.socket you get to witness the gory glory of a BSD sockets API that shows its age. Regardless, it's what all major OSes directly provide.
  Btw.
 would it work if I pass a socket to 2 threads - reader and writer (by
 working I mean - not running into race conditions and other scary concurrent
 stuff)?
Should be just fine. See also http://stackoverflow.com/questions/1981372/are-parallel-calls-to-send-recv-on-the-same-socket-valid
 Also I'm really puzzled by the fact this common idiom doesn't work in some
 elegant way in D. I tried to Google a solution, but only found some weird
 tricks. Can anyone really experienced in D tell me why there is no nice
 solution for this (or correct me if I'm mistaken)?
The trick is that Socket/std.socket was designed way back, before std.concurrency. It's a class, as everything back then liked to be. The catch is that classes by default are mutable and thread-local and thus can't be automatically _safely_ transferred across threads. There were/are talks about adding some kind of Unique helper to facilitate such a move in a clean way. So at the moment - nope.

-- 
Dmitry Olshansky
Aug 05 2013
prev sibling next sibling parent Robert M. Münch <robert.muench saphirion.com> writes:
On 2013-08-04 19:38:49 +0000, Marek Janukowicz said:

 ...
 If anyone has any idea how to handle the problems I mentioned or has any
 idea for more suitable design I would be happy to hear it. It's also
 possible I'm approaching the issue from completely wrong direction, so you
 can correct me on that as well.
Hi, I would take a look at the BEEP protocol idea and there at the Vortex library [1]; it deals with everything you need. The idea of BEEP is that you don't have to care about all the network pitfalls, since these are always the same. Instead you can concentrate on your application-level design, where the time is spent in a much more valuable way. The lib is written in C and works very well. It's mature and multi-threaded to allow for maximum transfer rates.

[1] http://www.aspl.es/vortex/

-- 
Robert M. Münch 
Saphirion AG 
http://www.saphirion.com 
smarter | better | faster
Aug 05 2013
prev sibling next sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Sun, 04 Aug 2013 20:38:49 +0100, Marek Janukowicz  
<marek janukowicz.net> wrote:
 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain

 Currently I have a Master thread (which is basically the main thread)  
 which
 is handling connections/disconnections, socket operations, sends parsed
 requests for processing to single Worker thread, sends responses to  
 clients.
 Interaction with Worker is done via message passing.

 The problem with my approach is that I read as much data as possible from
 each ready client in order. As there are many requests this read phase  
 might
 take a few seconds making the clients disconnect. Now I see 2 possible
 solutions:

 1. Stay with the design I have, but change the workflow somewhat -  
 instead
 of reading all the data from clients just read some requests and then  
 send
 responses that are ready and repeat; the downside is that it's more
 complicated than current design, might be slower (more loop iterations  
 with
 less work done in each iteration) and might require quite a lot of  
 tweaking
 when it comes to how many requests/responses handle each time etc.

 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
 - I can't initialize threads created via std.concurrency.spawn with a  
 Socket
 object ("Aliases to mutable thread-local data not allowed.")
 - I already have problems with "interrupted system call" on Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH it sucks  
 I
 have to do anything about that and would suck even more to do that with  
 50
 or so threads

 If anyone has any idea how to handle the problems I mentioned or has any
 idea for more suitable design I would be happy to hear it. It's also
 possible I'm approaching the issue from completely wrong direction, so  
 you
 can correct me on that as well.
number of clients. I have had loads of experience with server applications on Windows and a little less on the various flavours of UNIXen, and 50 connected clients serviced by 50 threads should be perfectly manageable for the OS.

It sounds like only blocking sockets have the GC interrupt issue; if so, use non-blocking sockets instead. However, it occurs to me that the issue may rear its head again on the call to select() on non-blocking sockets, so it is worth testing this first. If there is no way around the GC interrupt issue then code up your own recv function and re-use it in all your threads; not ideal, but definitely workable.

In the case of non-blocking sockets your read operation needs to account for the /this would block/ error code, and should go something like this.. (using low-level socket function call names because I have not used the D socket library recently)

1. Attempt recv(), expect either DATA or ERROR.
1a. If DATA, process data and handle possible partial request(s).
1c. If ERROR and not would-block, fail/exit/disconnect.
2. Perform select() (**this may be interruptible by GC**) for a finite, shortish timeout - if you want your client handlers to react quickly to the signal to shut down then you want a shorter time.
2b. If select returns an error, fail/exit/disconnect.

Do you have control of the connecting client code as well? If so, think about disabling the Nagle algorithm:
http://en.wikipedia.org/wiki/Nagle's_algorithm

You will want to ensure the client writes its requests in a single send() call, but in this way you reduce the delay in receiving requests at the server side. If the client writes multiple requests rapidly then with Nagle enabled it may buffer them on the client end and will delay the server seeing the first, but with it disabled the server will see the first as soon as it is written and can start processing it while the client writes. So depending on how your clients send requests, you may see a performance improvement here.

I don't know how best to solve the "Aliases to mutable thread-local data not allowed." issue. You will need to ensure the socket is allocated globally (not thread-local), and because you know it's unique and not shared you can cast it as such to get it into the thread; once there you can cast it back to unshared/local/mutable. Not ideal, but not illegal or invalid AFAICS.

FYI: for a better, more scalable solution you would use async IO with a pool of worker threads; I am not sure if D has good support for this (and library support for it).

Regan

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
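To map the recv/select outline above onto std.socket, here is a sketch that assumes the socket has already been put into non-blocking mode (sock.blocking = false); drainSocket is an invented name, and wouldHaveBlocked()/lastSocketError() are the std.socket helpers for inspecting the last error. The Nagle option is shown at the end:

import std.socket;

// One pass of the read step for a non-blocking socket: drain everything that is
// currently available into the per-connection buffer, then return to select().
void drainSocket(Socket sock, ref ubyte[] pending)
{
    ubyte[4096] chunk;
    for (;;)
    {
        auto got = sock.receive(chunk[]);
        if (got > 0)
        {
            pending ~= chunk[0 .. got];    // accumulate; parse complete requests elsewhere
            continue;
        }
        if (got == 0)
            throw new SocketException("peer closed the connection");
        if (wouldHaveBlocked())
            return;                        // nothing more to read right now
        throw new SocketException("receive failed: " ~ lastSocketError());
    }
}

// Client side, to disable Nagle and cut request latency:
//     clientSock.setOption(SocketOptionLevel.TCP, SocketOption.TCP_NODELAY, true);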
Aug 05 2013
prev sibling next sibling parent Justin Whear <justin economicmodeling.com> writes:
On Sun, 04 Aug 2013 21:38:49 +0200, Marek Janukowicz wrote: 
 If anyone has any idea how to handle the problems I mentioned or has any
 idea for more suitable design I would be happy to hear it. It's also
 possible I'm approaching the issue from completely wrong direction, so
 you can correct me on that as well.
Are you familiar with ZeroMQ? I write network infrastructure on a fairly regular basis and wouldn't dream of doing it without ZeroMQ: http://zeromq.org/

There are D bindings in Deimos: https://github.com/D-Programming-Deimos/ZeroMQ
Aug 05 2013
prev sibling next sibling parent Marek Janukowicz <marek janukowicz.net> writes:
Marek Janukowicz wrote:

 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain
 
 Currently I have a Master thread (which is basically the main thread)
 which is handling connections/disconnections, socket operations, sends
 parsed requests for processing to single Worker thread, sends responses to
 clients. Interaction with Worker is done via message passing.
I'd like to thank everyone for the valuable input. For now I chose Dmitry's suggestion (which was an extension of my idea to go with a thread per client), so I have multiple receivers, a single worker and multiple senders. That works quite well, although I didn't really test it with many clients.

One nice thing is that the "interrupted system call" problem magically went away - it looks like it occurred only with Socket.select (which I don't use anymore after the architectural changes), and socket.send/receive is apparently not affected.

-- 
Marek Janukowicz
Aug 05 2013
prev sibling next sibling parent Brad Roberts <braddr puremagic.com> writes:
A reasonably common way to handle this is that the event loop thread only detects events (readable, writable, etc) and passes them off to worker threads to process (do the reading and parsing, do the writing, etc). In general, I wouldn't recommend one thread per active connection, but if you're _sure_ that you're constrained to those low sorts of numbers, then it might well be the easiest path to go for your app.

You definitely want to move the actual I/O out of your event loop thread, to let those other cores take on that job, freeing up your single-threaded part to do as little work as possible. It's your bottleneck and that resource needs to be protected.

On 8/4/13 12:38 PM, Marek Janukowicz wrote:
 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain

 Currently I have a Master thread (which is basically the main thread) which
 is handling connections/disconnections, socket operations, sends parsed
 requests for processing to single Worker thread, sends responses to clients.
 Interaction with Worker is done via message passing.

 The problem with my approach is that I read as much data as possible from
 each ready client in order. As there are many requests this read phase might
 take a few seconds making the clients disconnect. Now I see 2 possible
 solutions:

 1. Stay with the design I have, but change the workflow somewhat - instead
 of reading all the data from clients just read some requests and then send
 responses that are ready and repeat; the downside is that it's more
 complicated than current design, might be slower (more loop iterations with
 less work done in each iteration) and might require quite a lot of tweaking
 when it comes to how many requests/responses handle each time etc.

 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
 - I can't initialize threads created via std.concurrency.spawn with a Socket
 object ("Aliases to mutable thread-local data not allowed.")
 - I already have problems with "interrupted system call" on Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH it sucks I
 have to do anything about that and would suck even more to do that with 50
 or so threads

 If anyone has any idea how to handle the problems I mentioned or has any
 idea for more suitable design I would be happy to hear it. It's also
 possible I'm approaching the issue from completely wrong direction, so you
 can correct me on that as well.
Aug 04 2013
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 4, 2013, at 12:38 PM, Marek Janukowicz <marek janukowicz.net> wrote:

 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but
 definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will
 disconnect and complain
Given the relatively small number of concurrent connections, you may be best off just spawning a thread per connection. The cost of context switching at that level of concurrency is reasonably low, and the code will be a heck of a lot simpler than an event loop dispatching jobs to a thread pool (which is the direction you might head with a larger number of connections).

 Currently I have a Master thread (which is basically the main thread) which
 is handling connections/disconnections, socket operations, sends parsed
 requests for processing to single Worker thread, sends responses to clients.
 Interaction with Worker is done via message passing.

 The problem with my approach is that I read as much data as possible from
 each ready client in order. As there are many requests this read phase might
 take a few seconds making the clients disconnect.
This seems weird to me. Are those reads blocking for some length of time? I would expect them to return pretty much instantly. How much data is in each request?

 Now I see 2 possible solutions:

 1. Stay with the design I have, but change the workflow somewhat - instead
 of reading all the data from clients just read some requests and then send
 responses that are ready and repeat; the downside is that it's more
 complicated than current design, might be slower (more loop iterations with
 less work done in each iteration) and might require quite a lot of tweaking
 when it comes to how many requests/responses handle each time etc.
There are a bunch of different approaches along these lines, but the crux of it is that you'll be multiplexing N connections across an M-sized thread pool. Each connection carries a buffer with it, and whenever data is available you stick that connection in a work queue, and let a pooled thread accumulate the new data into that connection's buffer and potentially process the complete request.
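A compact sketch of that shape using std.parallelism's task pool as the worker pool; Conn, onReadable and processRequests are placeholder names, and real code would also have to prevent two tasks from reading the same connection at once:

import std.socket;
import std.parallelism : task, taskPool;

// Each connection carries its own accumulation buffer.
class Conn
{
    Socket sock;
    ubyte[] buffer;
    this(Socket s) { sock = s; }
}

// Called by the event loop when select() reports the connection readable:
// hand the read + parse work to a pooled thread.
void onReadable(Conn c)
{
    void doRead()
    {
        ubyte[4096] chunk;
        auto got = c.sock.receive(chunk[]);
        if (got > 0)
        {
            c.buffer ~= chunk[0 .. got];
            // processRequests(c);   // placeholder: consume complete requests from c.buffer
        }
    }
    taskPool.put(task(&doRead));
}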
 2. Create separate thread per each client connection. I think this could
 result in a nice, clean setup, but I see some problems:
 - I'm not sure how ~50 threads will do resource-wise (although they will
 probably be mostly waiting on Socket.select)
With a thread per connection you can probably just do blocking reads in each thread and not bother with select at all. And with only 50 threads I don't think you'll see a performance problem. I've been reading up on Java NIO recently (their approach for supporting epoll within Java), and some people have actually said that the old thread-per-connection approach was actually faster in their tests. Of course, no one seems to test beyond a few thousand concurrent connections, but that's still well above what you're doing. In short, I'd consider benchmarking it and see if performance is up to snuff.

 - I can't initialize threads created via std.concurrency.spawn with a Socket
 object ("Aliases to mutable thread-local data not allowed.")
You can cast the Socket to shared and cast away shared upon receipt. I'd like a more formal means of moving uniquely referenced data via std.concurrency, but that will do the trick for now.

 - I already have problems with "interrupted system call" on Socket.select
 due to GC kicking in; I'm restarting the call manually, but TBH it sucks I
 have to do anything about that and would suck even more to do that with 50
 or so threads
Just wrap it in a function that tests the return value and loops if necessary. Plenty of system calls need to deal with the EINTR error. It may not just be GC that's causing it. There's a decent chance you'll have to deal with SIGPIPE as well.
Aug 05 2013
prev sibling next sibling parent Brad Roberts <braddr puremagic.com> writes:
On 8/5/13 4:33 PM, Sean Kelly wrote:
 On Aug 4, 2013, at 12:38 PM, Marek Janukowicz <marek janukowicz.net> wrote:

 I'm writing a network server with some specific requirements:
 - 5-50 clients connected (almost) permanently (maybe a bit more, but definitely not hundreds of them)
 - possibly thousands of requests per seconds
 - responses need to be returned within 5 seconds or the client will disconnect and complain
Given the relatively small number of concurrent connections, you may be best off just spawning a thread per connection. The cost of context switching at that level of concurrency is reasonably low, and the code will be a heck of a lot simpler than an event loop dispatching jobs to a thread pool (which is the direction you might head with a larger number of connections).
I agree, with one important caveat: converting from a blocking thread per connection model to a non-blocking pool of threads model is often essentially starting over. Even at the 50 threads point I tend to think you've passed the point of just throwing threads at the problem. But I'm also much more used to dealing with 10's of thousands of sockets, so my view is a tad biased.
Aug 05 2013
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
On Aug 5, 2013, at 4:49 PM, Brad Roberts <braddr puremagic.com> wrote:

 On 8/5/13 4:33 PM, Sean Kelly wrote:
 Given the relatively small number of concurrent connections, you may be best off just spawning a
 thread per connection. The cost of context switching at that level of concurrency is reasonably
 low, and the code will be a heck of a lot simpler than an event loop dispatching jobs to a thread
 pool (which is the direction you might head with a larger number of connections).

 I agree, with one important caveat: converting from a blocking thread per connection model to a
 non-blocking pool of threads model is often essentially starting over. Even at the 50 threads
 point I tend to think you've passed the point of just throwing threads at the problem. But I'm
 also much more used to dealing with 10's of thousands of sockets, so my view is a tad biased.

I'm in the same boat in terms of experience, so I'm trying to resist my inclination to do things the scalable way in favor of the simplest approach that meets the stated requirements. You're right that switching would mean a total rewrite though, except possibly if you switched to Vibe, which uses fibers to make things look like the one thread per connection approach when it's actually multiplexing.

The real tricky bit about multiplexing, however, is how to deal with situations when you need to perform IO to handle client requests. If that IO isn't event-based as well then you're once again spawning threads to keep that IO from holding up request processing. I'm actually kind of surprised that more current-gen APIs don't expose the file descriptor they use for their work or provide some other means of integrating into an event loop. In a lot of cases it seems like I end up having to write my own version of whatever library just to get the scalability characteristics I require, which is a horrible use of time.
Aug 06 2013