
digitalmars.D - eventcore vs boost.asio performance?

reply zoujiaqing <zoujiaqing gmail.com> writes:
eventcore is a very good and stable network library based on the 
Proactor model.
The high-performing C++ frameworks in the TechEmpower benchmarks 
linked below all use asio.
What can we achieve if we use eventcore for network I/O?

  * https://github.com/vibe-d/eventcore
  * https://www.techempower.com/benchmarks/#section=data-r21&hw=cl&test=plaintext
Feb 19 2023
parent reply Sönke Ludwig <sludwig outerproduct.org> writes:
On 19.02.2023 at 20:53, zoujiaqing wrote:
 eventcore is a very good and stable network library based on the 
 Proactor model. The high-performing C++ frameworks in the TechEmpower 
 benchmarks linked below all use asio. What can we achieve if we use 
 eventcore for network I/O?
 
   * https://github.com/vibe-d/eventcore
   * https://www.techempower.com/benchmarks/#section=data-r21&hw=cl&test=plaintext
I'm not sure where it would be today on that list, but I got pretty competitive results for local tests on Linux a few years back. However, there are at least two performance-related issues still present:

- The API uses internal allocation of memory for I/O operations and per socket descriptor. This works around the lack of a way to disable struct moves (which makes it unsafe to store any kind of pointer to stack values). The allocations are pretty well optimized, but they do lead to some additional memory copies that impede performance.

- On both Linux and Windows, there are newer, faster I/O APIs: io_uring and RIO. A PR by Tobias Pankrath (https://github.com/vibe-d/eventcore/pull/175) for io_uring exists, but it still needs to be finished.

Another thing that needs to be tackled is better error propagation. Right now there is often just a generic "error" status, with no way to get a more detailed error code or message.

By the way, although the vibe.d HTTP implementation naturally adds some overhead over the raw network I/O, the vibe.d results in that list, judging by their poor performance on many-core machines, appear to be affected by GC runs or possibly some other lock contention, whereas the basic HTTP request handling should be more or less GC-free. So those results shouldn't be used for comparison.
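To make the struct-move point concrete, here is a minimal editorial sketch (not eventcore code) of why pointers into stack structs are unsafe in D: the compiler may move a struct with a plain bitwise copy, and there is no hook to fix pointers up afterwards.

// Minimal sketch (editorial illustration, not eventcore code) of the
// struct-move hazard: D may move structs with a bitwise copy, so an
// interior pointer can end up pointing at a dead stack location.
struct SelfRef
{
    int value;
    int* p; // intended to point at this.value

    void setup() { value = 42; p = &value; }
}

SelfRef make()
{
    SelfRef s;
    s.setup();
    return s; // the struct may be blitted (moved) into the caller...
}

void main()
{
    import std.stdio : writeln;
    auto s = make();
    // ...in which case s.p still holds the old, now-invalid address.
    // (With NRVO the pointer may happen to survive; nothing guarantees it.)
    writeln(s.p == &s.value ? "pointer survived (NRVO)"
                            : "dangling interior pointer!");
}

This is why the API allocates per-operation state internally instead of pointing at caller-owned stack slots.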
Feb 20 2023
next sibling parent reply Daniel Kozak <kozzi11 gmail.com> writes:
On Mon, Feb 20, 2023 at 9:30 AM Sönke Ludwig via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 ...

 By the way, although the vibe.d HTTP implementation naturally adds some
 overhead over the raw network I/O, the vibe.d results in that list,
 judging by their poor performance on many-core machines, appear to be
 affected by GC runs, or possibly some other lock contention, whereas the
 basic HTTP request handling should be more or less GC-free. So those
 shouldn't be used for comparison.
Last time I checked, the main reason vibe.d was slower was HTTP parsing. vibe-core with manual HTTP parsing was as fast as the other fastest alternatives.
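As a rough illustration of what "manual HTTP parsing" can mean for a plaintext benchmark, here is a hypothetical minimal request-line parser over a raw buffer (not Daniel's actual code):

// Hypothetical minimal sketch of "manual" HTTP/1.1 request parsing
// over a raw buffer, the kind of shortcut a benchmark server can take
// instead of a general-purpose parser.
import std.algorithm.searching : findSplit;
import std.string : indexOf;

struct RequestLine
{
    const(char)[] method;
    const(char)[] path;
    const(char)[] version_;
}

// Returns true and fills `req` if `buf` starts with a complete
// request line terminated by CRLF.
bool parseRequestLine(const(char)[] buf, out RequestLine req)
{
    auto eol = buf.indexOf("\r\n");
    if (eol < 0) return false;
    auto line = buf[0 .. eol];

    auto m = line.findSplit(" ");
    if (!m) return false;
    auto p = m[2].findSplit(" ");
    if (!p) return false;

    req = RequestLine(m[0], p[0], p[2]);
    return true;
}

unittest
{
    RequestLine r;
    assert(parseRequestLine("GET /plaintext HTTP/1.1\r\nHost: x\r\n\r\n", r));
    assert(r.method == "GET" && r.path == "/plaintext");
}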
Feb 20 2023
parent reply tchaloupka <chalucha gmail.com> writes:
On Monday, 20 February 2023 at 09:12:37 UTC, Daniel Kozak wrote:
 Last time I checked, the main reason vibe.d was slower was HTTP 
 parsing. vibe-core with manual HTTP parsing was as fast as the 
 other fastest alternatives.
I've compared the syscalls that various frameworks generate, and by far the biggest difference is that in vibe.d the response header and body are written in two separate syscalls (tested on Linux with epoll). That makes a pretty huge difference, about 30% if I remember correctly. Eventcore itself is not slow and is comparable with the top ones.

Tom
Feb 21 2023
parent Daniel Kozak <kozzi11 gmail.com> writes:
On Tue, Feb 21, 2023 at 10:45 AM tchaloupka via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Monday, 20 February 2023 at 09:12:37 UTC, Daniel Kozak wrote:
 Last time I checked, the main reason vibe.d was slower was HTTP 
 parsing. vibe-core with manual HTTP parsing was as fast as the 
 other fastest alternatives.
 I've compared the syscalls that various frameworks generate, and by far the biggest difference is that in vibe.d the response header and body are written in two separate syscalls (tested on Linux with epoll). That makes a pretty huge difference, about 30% if I remember correctly. Eventcore itself is not slow and is comparable with the top ones.

 Tom
Yes, you are right. I changed that too when I was trying to make vibe.d as fast as possible.
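For illustration, the kind of change being described, writing header and body with one vectored syscall instead of two, might look like this (an assumed sketch using POSIX writev, not the actual vibe.d patch):

// Hedged sketch: coalesce the HTTP response header and body into a
// single writev(2) call instead of two separate write(2) calls.
// Illustrative only; a real implementation must also handle short
// writes and EINTR/EAGAIN.
version (Posix)
{
    import core.sys.posix.sys.uio : iovec, writev;

    ptrdiff_t sendResponse(int fd, const(ubyte)[] header, const(ubyte)[] body_)
    {
        iovec[2] iov;
        iov[0].iov_base = cast(void*) header.ptr;
        iov[0].iov_len  = header.length;
        iov[1].iov_base = cast(void*) body_.ptr;
        iov[1].iov_len  = body_.length;
        return writev(fd, iov.ptr, 2); // one syscall for both buffers
    }
}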
Feb 21 2023
prev sibling parent reply zoujiaqing <zoujiaqing gmail.com> writes:
On Monday, 20 February 2023 at 08:26:23 UTC, Sönke Ludwig wrote:
 I'm not sure where it would be today on that list, but I got 
 pretty competitive results for local tests on Linux a few years 
 back. However, there are at least two performance related 
 issues still present:

 - The API uses internal allocation of memory for I/O operations 
 and per socket descriptor. This is to work around the lack of a 
 way to disable struct moves (and thus making it unsafe to store 
 any kind of pointer to stack values). The allocations are 
 pretty well optimized, but it does lead to some additional 
 memory copies that impede performance.

 - On both Linux and Windows, there are newer, faster I/O APIs: 
 io_uring and RIO. A PR by Tobias Pankrath 
 (https://github.com/vibe-d/eventcore/pull/175) for io_uring 
 exists, but it still needs to be finished.

 Another thing that needs to be tackled is better error 
 propagation. Right now, there is often just a generic "error" 
 status, without the possibility to get a more detailed error 
 code or message.

 By the way, although the vibe.d HTTP implementation naturally 
 adds some overhead over the raw network I/O, the vibe.d results 
 in that list, judging by their poor performance on many-core 
 machines, appear to be affected by GC runs, or possibly some 
 other lock contention, whereas the basic HTTP request handling 
 should be more or less GC-free. So those shouldn't be used for 
 comparison.
First, io_uring is very much something to look forward to! When can you merge this PR?

Second, how can memory allocation and release be optimized under high concurrency? Nbuff is a great library, and I've used it before.
Mar 05 2023
parent reply Sönke Ludwig <sludwig outerproduct.org> writes:
On 05.03.2023 at 16:14, zoujiaqing wrote:
 On Monday, 20 February 2023 at 08:26:23 UTC, Sönke Ludwig wrote:
 I'm not sure where it would be today on that list, but I got pretty 
 competitive results for local tests on Linux a few years back. 
 However, there are at least two performance related issues still present:

 - The API uses internal allocation of memory for I/O operations and 
 per socket descriptor. This is to work around the lack of a way to 
 disable struct moves (and thus making it unsafe to store any kind of 
 pointer to stack values). The allocations are pretty well optimized, 
 but it does lead to some additional memory copies that impede 
 performance.

 - On both Linux and Windows, there are newer, faster I/O APIs: io_uring 
 and RIO. A PR by Tobias Pankrath 
 (https://github.com/vibe-d/eventcore/pull/175) for io_uring exists, 
 but it still needs to be finished.

 Another thing that needs to be tackled is better error propagation. 
 Right now, there is often just a generic "error" status, without the 
 possibility to get a more detailed error code or message.

 By the way, although the vibe.d HTTP implementation naturally adds 
 some overhead over the raw network I/O, the vibe.d results in that 
 list, judging by their poor performance on many-core machines, appear 
 to be affected by GC runs, or possibly some other lock contention, 
 whereas the basic HTTP request handling should be more or less 
 GC-free. So those shouldn't be used for comparison.
First, io_uring is very much something to look forward to! When can you merge this PR?
It doesn't pass the tests and has conflicts, so it needs some work. I could look into that, too, but I don't have much time available.
 Second, how can memory allocation and release be optimized under high 
 concurrency? Nbuff is a great library, and I've used it before.
Apart from accidental allocations in the timer code, there are very few allocations in eventcore itself. The allocation scheme exploits the small-integer nature of POSIX handles and keeps a fixed-size slot per file descriptor in an array of arrays. Buffer allocations for read/write operations are the responsibility of the library user.

In that regard, nbuff should be usable for high-level data buffers just fine, and although I haven't used it, it sounds like a very interesting concept in terms of using range interfaces with network data.

For vibe.d's HTTP server module, I'm using a free-list-based allocation scheme, where each request gets a pre-allocated buffer that is later reused by another request. This means that after a warmup phase there will be few to no allocations per request, at least in the core HTTP handling code.
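A rough sketch of the two schemes described above; names and sizes are illustrative, not the actual eventcore/vibe.d code:

// Fd-indexed slot table: POSIX fds are small integers, so fd can
// address a fixed slot in an array of fixed-size arrays. Slots never
// move once their chunk is allocated, so pointers to them stay valid.
struct SlotTable(Slot, size_t chunkSize = 1024)
{
    private Slot[][] chunks;

    ref Slot opIndex(int fd)
    {
        auto chunk = cast(size_t) fd / chunkSize;
        if (chunks.length <= chunk)
            chunks.length = chunk + 1;
        if (chunks[chunk] is null)
            chunks[chunk] = new Slot[chunkSize];
        return chunks[chunk][fd % chunkSize];
    }
}

// Free-list buffer pool in the spirit of the vibe.d HTTP server scheme
// described above (illustrative; assumes uniform buffer sizes): after
// a warmup phase, requests reuse released buffers instead of allocating.
struct BufferPool
{
    private ubyte[][] freeList;

    ubyte[] acquire(size_t size = 16 * 1024)
    {
        if (freeList.length)
        {
            auto buf = freeList[$ - 1];
            freeList = freeList[0 .. $ - 1];
            freeList.assumeSafeAppend(); // allow in-place re-append
            return buf;
        }
        return new ubyte[size]; // cold path: pool not warmed up yet
    }

    void release(ubyte[] buf) { freeList ~= buf; }
}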
Mar 06 2023
parent ikod <igor.khasilev gmail.com> writes:
On Tuesday, 7 March 2023 at 07:55:04 UTC, Sönke Ludwig wrote:
 Am 05.03.2023 um 16:14 schrieb zoujiaqing:
 On Monday, 20 February 2023 at 08:26:23 UTC, Sönke Ludwig 
 wrote:
 In that regard, nbuff should be usable for high-level data 
 buffers just fine and although I haven't used it, it sounds 
 like a very interesting concept in terms of using range 
 interfaces with network data.
There are a few simple ideas behind nbuff:

1) All received network data is immutable; this allows buffers to be shared safely.

2) We usually get data from the network as chunks of a "contiguous and endless" stream, but we are actually interested only in a small, forward-moving window of that data. So it is nice to automate receiving new chunks, processing them in the current "window", and safely throwing them away as soon as they are processed.

Nbuff manages a list of smart pointers to immutable byte buffers to implement this view of the problem.
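A simplified sketch of that moving-window idea (hypothetical code, not nbuff's actual API):

// Hypothetical sketch, not nbuff's actual API: a forward-moving
// window over immutable network chunks. Immutability makes sharing
// safe; fully consumed chunks are dropped so they can be reclaimed.
struct ChunkWindow
{
    private immutable(ubyte)[][] chunks; // nbuff uses smart pointers /
                                         // refcounting instead of the GC
    private size_t offset; // bytes already consumed in chunks[0]

    // Append a freshly received chunk.
    void push(immutable(ubyte)[] chunk) { chunks ~= chunk; }

    // Consume n bytes from the front, releasing exhausted chunks.
    void popFrontN(size_t n)
    {
        while (n > 0 && chunks.length)
        {
            auto avail = chunks[0].length - offset;
            if (n < avail) { offset += n; return; }
            n -= avail;
            chunks = chunks[1 .. $]; // drop the consumed chunk
            offset = 0;
        }
    }
}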
Mar 08 2023