digitalmars.D - vibe.d benchmarks

Ola Fosheim Grøstad (3/3) Dec 28 2015 https://www.techempower.com/benchmarks/

Charles (4/7) Dec 28 2015 Sönke is already on it.

Nick B (6/14) Dec 29 2015 Correct me if I am wrong here, but as far I can tell there is no

Charles (3/19) Dec 29 2015 The last time the official benchmark was run was over a month

yawniek (24/27) Dec 30 2015 i guess its not enough, there are still things that make vibe.d

Daniel Kozak via Digitalmars-d (5/44) Dec 30 2015 Which async library you use for vibed? libevent? libev? or libasync?

yawniek (6/11) Dec 30 2015 the numbers above are libevent in release mode, as per original

Daniel Kozak via Digitalmars-d (4/20) Dec 30 2015 Thanks, it is wierd I use libasync and have quite good performance,

Laeeth Isharc (7/34) Dec 31 2015 Isn't there a decent chance the bottleneck is vibe.d's JSON

yawniek (15/21) Dec 31 2015 this is not the same benchmark discussed elsewhere, this one is a

Ola Fosheim Grøstad (15/21) Dec 31 2015 Go scores 0.5ms latency, vibe.d scores 14.7ms latency. That's a
Etienne Cimon (6/27) Dec 31 2015 That would be the other way around. TCP_NODELAY is not enabled in

yawniek (8/16) Dec 31 2015 obvious typo and thanks for investigating etienne.

Daniel Kozak via Digitalmars-d (12/32) Dec 31 2015 One thing I forgot to mention I have to modify few things

Nick B (4/22) Jan 03 2016 can someone tell me what changes need to be commited, so that we

Etienne Cimon (5/8) Jan 03 2016 Considering that the best benchmarks are from tools that have all

Sönke Ludwig (8/14) Jan 03 2016 Fiber context changes are not a significant influence. I've created a

Sönke Ludwig (10/19) Jan 03 2016 For me, threadsPerCPU correctly yields the number of logical cores (i.e....

Daniel Kozak via Digitalmars-d (4/32) Jan 04 2016 On my AMD FX4100 (4 cores) and my AMD AMD A10-7850K(4 core) it is

Daniel Kozak (5/10) Dec 31 2015 When I use HTTPServerOption.distribute with libevent I get better

Etienne Cimon (5/16) Dec 31 2015 I launch libasync programs as multiple processes, a bit like

Daniel Kozak (3/17) Jan 01 2016 ?

Etienne Cimon (4/22) Jan 01 2016 With libasync, you can run multiple instances of your vibe.d

Sebastiaan Koppe (6/9) Jan 02 2016 That is nice. Didn't know that. That would enable

Etienne Cimon (13/18) Jan 02 2016 Yes, although you might still break existing connections unless

Daniel Kozak via Digitalmars-d (4/28) Jan 04 2016 Yes, but I speak about one instance of vibe.d with multiple

Etienne Cimon (2/17) Jan 04 2016 Yes, I will investigate this.

Ola Fosheim Grøstad (6/11) Dec 31 2015 I don't know how the benchmarks are set up, but I would assume

yawniek (5/9) Dec 31 2015 its actually pretty realistic, one point of having a fast

Ola Fosheim Grøstad (3/6) Dec 31 2015 It does not scale. If you can do it, then you don't really have a

Atila Neves (13/58) Jan 05 2016 vibe.d _was_ faster than Go. I redid the measurements recently

Etienne Cimon (3/17) Jan 05 2016 The Rust mio library doesn't seem to be doing any black magic. I

rsw0x (11/32) Jan 05 2016 Have you used perf(or similar) to attempt to find bottlenecks yet?

Nikolay (5/7) Jan 05 2016 I used perf and wrote my result here:

Etienne (2/11) Jan 05 2016 libasync is the result of an attempt to use epoll directly

Atila Neves (10/38) Jan 06 2016 Extensively. I optimised my D code as much as I know how to. And

Etienne Cimon (5/27) Jan 07 2016 It's possible that those cache misses will be irrelevant when the

Nikolay (4/8) Jan 08 2016 I believe cache-misses problem is related to old vibed version.

Atila Neves (4/8) Jan 06 2016 No black magic, it's a thin wrapper over epoll. But it was faster

Etienne Cimon (3/12) Jan 07 2016 You tested D+mio, but the equivalent would probably be D+libasync

Daniel Kozak (83/110) Dec 31 2015 My results from siege(just return page with Hello World same as
Sönke Ludwig (4/17) Jan 04 2016 Can you try with the latest GIT master? There are some important

Ola Fosheim Grøstad writes:

https://www.techempower.com/benchmarks/

The entries for vibe.d are either doing very poorly or fail to 
complete. Maybe someone should look into this?

Dec 28 2015

Charles <csmith.ku2013 gmail.com> writes:

On Monday, 28 December 2015 at 12:24:17 UTC, Ola Fosheim Grøstad 
wrote:
 https://www.techempower.com/benchmarks/

 The entries for vibe.d are either doing very poorly or fail to 
 complete. Maybe someone should look into this?

Sönke is already on it.

http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110

Dec 28 2015

Nick B <nick.barbalich gmail.com> writes:

On Monday, 28 December 2015 at 13:10:59 UTC, Charles wrote:
 On Monday, 28 December 2015 at 12:24:17 UTC, Ola Fosheim 
 Grøstad wrote:
 https://www.techempower.com/benchmarks/

 The entries for vibe.d are either doing very poorly or fail to 
 complete. Maybe someone should look into this?

 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110

Correct me if I am wrong here, but as far as I can tell there are 
no independent benchmarks showing the performance (superior or 
good enough) of D versus Go, or versus just about any other 
language either?

https://www.techempower.com/benchmarks/#section=data-r11&hw=peak&test=json&l=cnc&f=zik0vz-zik0zj-zik0zj-zik0zj-hra0hr

Dec 29 2015

Charles <csmith.ku2013 gmail.com> writes:

On Tuesday, 29 December 2015 at 22:49:36 UTC, Nick B wrote:
 On Monday, 28 December 2015 at 13:10:59 UTC, Charles wrote:
 On Monday, 28 December 2015 at 12:24:17 UTC, Ola Fosheim 
 Grøstad wrote:
 https://www.techempower.com/benchmarks/

 The entries for vibe.d are either doing very poorly or fail 
 to complete. Maybe someone should look into this?

 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110

 Correct me if I am wrong here, but as far I can tell there is 
 no independent benchmarks showing performance (superior or good 
 enough) of D verses Go, or against just about any other 
 language, as well  ?

 https://www.techempower.com/benchmarks/#section=data-r11&hw=peak&test=json&l=cnc&f=zik0vz-zik0zj-zik0zj-zik0zj-hra0hr

The last time the official benchmark was run was over a month 
before Sönke's PR.

Dec 29 2015

yawniek <dlang srtnwz.com> writes:

 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110



i guess it's not enough, there are still things that make vibe.d 
slow.

i quickly tried
https://github.com/nanoant/WebFrameworkBenchmark.git
which is really a very simple benchmark, but it gives a rough idea 
of the general overhead.

single core results against go-fasthttp with GOMAXPROCS=1 and 
vibe distribution disabled on a c4.2xlarge ec2 instance 
(archlinux):

vibe.d 0.7.23 with ldc
Requests/sec:  52102.06

vibe.d 0.7.26 with dmd
Requests/sec:  44438.47

vibe.d 0.7.26 with ldc
Requests/sec:  53996.62

go-fasthttp:
Requests/sec: 152573.32

go:
Requests/sec:  62310.04

it's sad.

i am aware that go-fasthttp is a very simplistic, stripped-down 
webserver and vibe is almost a full-blown framework. still, it 
should be D and vibe.d's USP to be faster than the fastest in the 
world and not limping around at the end of the charts.
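
For a rough sense of scale, the single-core figures quoted above can be turned into ratios. This is only a small sketch in D: the helper names relativeThroughput and printRatios are made up for illustration, and the numbers are copied from the post, not re-measured. By these figures go-fasthttp handles roughly 2.8 times the requests of vibe.d 0.7.26 built with ldc.

```d
import std.stdio : writefln;

/// Throughput of `candidate` relative to `baseline` (1.0 means equal).
double relativeThroughput(double candidate, double baseline)
{
    return candidate / baseline;
}

/// Prints ratios for the requests/sec figures quoted in the post.
void printRatios()
{
    // Requests/sec as reported above (single core, c4.2xlarge).
    enum double vibeLdc = 53_996.62;     // vibe.d 0.7.26 with ldc
    enum double vibeDmd = 44_438.47;     // vibe.d 0.7.26 with dmd
    enum double goPlain = 62_310.04;     // go
    enum double goFastHttp = 152_573.32; // go-fasthttp

    writefln("go-fasthttp / vibe.d (ldc): %.2f",
             relativeThroughput(goFastHttp, vibeLdc));
    writefln("go / vibe.d (ldc):          %.2f",
             relativeThroughput(goPlain, vibeLdc));
    writefln("vibe.d (dmd) / vibe.d (ldc): %.2f",
             relativeThroughput(vibeDmd, vibeLdc));
}

unittest
{
    printRatios();
    // go-fasthttp is a bit under three times vibe.d/ldc in these numbers.
    assert(relativeThroughput(152_573.32, 53_996.62) > 2.8);
    assert(relativeThroughput(152_573.32, 53_996.62) < 2.9);
}
```

Saved under any file name, this can be run with `dmd -unittest -main -run <file>.d`, which executes the unittest block.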

Dec 30 2015

Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Wed, 30 Dec 2015 20:32:08 +0000
yawniek via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110  



 
 i guess its not enough, there are still things that make vibe.d 
 slow.
 
 i quickly tried
 https://github.com/nanoant/WebFrameworkBenchmark.git
 which is really a very simple benchmark but it shows about the 
 general overhead.
 
 single core results against go-fasthttp with GOMAXPROCS=1 and 
 vibe distribution disabled on a c4.2xlarge ec2 instance 
 (archlinux):
 
 vibe.d 0.7.23 with ldc
 Requests/sec:  52102.06
 
 vibe.d 0.7.26 with dmd
 Requests/sec:  44438.47
 
 vibe.d 0.7.26 with ldc
 Requests/sec:  53996.62
 
 go-fasthttp:
 Requests/sec: 152573.32
 
 go:
 Requests/sec:  62310.04
 
 its sad.
 
 i am aware that go-fasthttp is a very simplistic, stripped down 
 webserver and vibe is almost a full blown framework. still it 
 should be D and vibe.d's USP to be faster than the fastest in the 
 world and not limping around at the end of the charts.
 
 

Which async library do you use for vibe.d: libevent, libev, or libasync?
Which compilation switches did you use?

Without this info it says nothing about vibe.d's performance :)

Dec 30 2015

yawniek <dlang srtnwz.com> writes:

On Wednesday, 30 December 2015 at 20:38:58 UTC, Daniel Kozak 
wrote:
 On Wed, 30 Dec 2015 20:32:08 +0000
 yawniek via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Which async library you use for vibed? libevent? libev? or 
 libasync? Which compilation switches you used?

 Without this info it says nothing about vibe.d's performance :)

the numbers above are libevent in release mode, as per original 
configuration.

for libasync there is a problem, so it's stuck at 2.4 rps. etcimon 
is currently investigating that.

Dec 30 2015

Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Wed, 30 Dec 2015 21:09:37 +0000
yawniek via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Wednesday, 30 December 2015 at 20:38:58 UTC, Daniel Kozak 
 wrote:
 On Wed, 30 Dec 2015 20:32:08 +0000
 yawniek via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Which async library you use for vibed? libevent? libev? or 
 libasync? Which compilation switches you used?

 Without this info it says nothing about vibe.d's performance :)  

 
 the numbers above are libevent in release mode, as per original 
 configuration.
 
 for libasync there is a problem so its stuck at 2.4 rps. etcimon 
 is currently investigating there.
 

Thanks. It is weird: I use libasync and get quite good performance,
so this is probably some regression (which version of libasync?)

Dec 30 2015

Laeeth Isharc <laeethnospam nospam.laeeth.com> writes:

On Wednesday, 30 December 2015 at 20:32:08 UTC, yawniek wrote:
 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110



 i guess its not enough, there are still things that make vibe.d 
 slow.

 i quickly tried
 https://github.com/nanoant/WebFrameworkBenchmark.git
 which is really a very simple benchmark but it shows about the 
 general overhead.

 single core results against go-fasthttp with GOMAXPROCS=1 and 
 vibe distribution disabled on a c4.2xlarge ec2 instance 
 (archlinux):

 vibe.d 0.7.23 with ldc
 Requests/sec:  52102.06

 vibe.d 0.7.26 with dmd
 Requests/sec:  44438.47

 vibe.d 0.7.26 with ldc
 Requests/sec:  53996.62

 go-fasthttp:
 Requests/sec: 152573.32

 go:
 Requests/sec:  62310.04

 its sad.

 i am aware that go-fasthttp is a very simplistic, stripped down 
 webserver and vibe is almost a full blown framework. still it 
 should be D and vibe.d's USP to be faster than the fastest in 
 the world and not limping around at the end of the charts.

Isn't there a decent chance the bottleneck is vibe.d's JSON 
implementation rather than the framework as such? We know from 
Atila's MQTT project that vibe.d can be significantly faster than 
Go, and we also know that its JSON implementation isn't that 
fast. Replacing it with FastJSON might be interesting. Sadly I 
don't have time to do that myself.

Dec 31 2015

yawniek <dlang srtnwz.com> writes:

On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
wrote:
 Isn't there a decent chance the bottleneck is vibe.d's JSON 
 implementation rather than the framework as such ?  We know 
 from Atila's MQTT project that vibe.D can be significantly 
 faster than Go, and we also know that its JSON implementation 
 isn't that fast.  Replacing with FastJSON might be interesting.
  Sadly I don't have time to do that myself.

this is not the same benchmark discussed elsewhere, this one is a 
simple echo thing.
no json. it just shows that there is some overhead in various 
layers.
so its significance is very limited.

from a slightly more distant view you can thus argue that 50k rps 
vs 150k rps basically just means that the framework will most 
probably not be your bottleneck.
nonetheless, getting ahead in the benchmarks would help to 
attract people who are then pleasantly surprised how easy it is 
to make full-blown services with vibe.

the libasync problem seems to be caused by TCP_NODELAY not 
being deactivated for local connections.

Dec 31 2015

Ola Fosheim Grøstad writes:

On Thursday, 31 December 2015 at 08:51:31 UTC, yawniek wrote:
 from a slightly more distant view you can thus argue that 50k 
 rps vs 150k rps basically just means that the framework will 
 most probably not be your bottle neck.

Go scores 0.5ms latency, vibe.d scores 14.7ms latency. That's a 
big difference that actually matters.

Dart + MongoDB also does very well in the multiple request tests. 
17300 requests versus Python + MySQL at 8800.

 none the less, getting ahead in the benchmarks would help to 
 attract people who are then pleasantly surprised how easy it is 
 to make full blown services with vibe.

It also matters for people who pick a framework. Although the 
benchmark isn't great as a general benchmark, it says something 
about:

1. Whether you can stick to the framework even when you need 
better performance, which is why the overhead versus raw platform 
speed is interesting.

2. That the framework has been engineered using performance 
measurements.

It is more useful for writing dynamic web services with simple 
requests rather than regular web servers though.

Dec 31 2015

Etienne Cimon <etcimon gmail.com> writes:

On Thursday, 31 December 2015 at 08:51:31 UTC, yawniek wrote:
 On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
 wrote:
 Isn't there a decent chance the bottleneck is vibe.d's JSON 
 implementation rather than the framework as such ?  We know 
 from Atila's MQTT project that vibe.D can be significantly 
 faster than Go, and we also know that its JSON implementation 
 isn't that fast.  Replacing with FastJSON might be interesting.
  Sadly I don't have time to do that myself.

 this is not the same benchmark discussed elsewhere, this one is 
 a simple echo thing.
 no json. it just states that there is some overhead around on 
 various layers.
 so its testimony is very limited.

 from a slightly more distant view you can thus argue that 50k 
 rps vs 150k rps basically just means that the framework will 
 most probably not be your bottle neck.
 none the less, getting ahead in the benchmarks would help to 
 attract people who are then pleasantly surprised how easy it is 
 to make full blown services with vibe.

 the libasync problem seem seems to be because of TCP_NODELAY 
 not being deactivated for local connection.

That would be the other way around. TCP_NODELAY is not enabled in 
the local connection, which makes a ~20-30ms difference per 
request on keep-alive connections and is the bottleneck in this 
case. Enabling it makes the library competitive in these 
benchmarks.
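
As an illustration of the option being discussed, here is a minimal sketch using Phobos' std.socket. It is not taken from libasync or vibe.d and does not show how either library sets the flag; the helper names enableNoDelay and hasNoDelay are hypothetical. Setting TCP_NODELAY turns off Nagle's algorithm, so small writes are sent immediately instead of being held back for coalescing.

```d
import std.socket;

/// Hypothetical helper: disables Nagle's algorithm by setting TCP_NODELAY.
void enableNoDelay(Socket sock)
{
    sock.setOption(SocketOptionLevel.TCP, SocketOption.TCP_NODELAY, 1);
}

/// Hypothetical helper: reads the option back; non-zero means it is set.
bool hasNoDelay(Socket sock)
{
    int value;
    sock.getOption(SocketOptionLevel.TCP, SocketOption.TCP_NODELAY, value);
    return value != 0;
}

unittest
{
    // An unconnected TCP socket is enough to set and query the option.
    auto sock = new TcpSocket(AddressFamily.INET);
    scope (exit) sock.close();

    enableNoDelay(sock);
    assert(hasNoDelay(sock));
}
```

In a server, the option would be applied to each accepted connection; the sketch only demonstrates the call itself.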

Dec 31 2015

yawniek <dlang srtnwz.com> writes:

On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
wrote:
 On Thursday, 31 December 2015 at 08:51:31 UTC, yawniek wrote:
 the libasync problem seem seems to be because of TCP_NODELAY 
 not being deactivated for local connection.

 That would be the other way around. TCP_NODELAY is not enabled 
 in the local connection, which makes a ~20-30ms difference per 
 request on keep-alive connections and is the bottleneck in this 
 case. Enabling it makes the library competitive in these 
 benchmarks.

obvious typo, and thanks for investigating, etienne.

daniel: i got similar results over the network.
i want to redo them with a more optimized setup though. my wrk 
server was too weak.

the local results are still relevant as it's a common setup to 
have nginx distribute to a few vibe instances locally.

Dec 31 2015

Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Thu, 31 Dec 2015 12:26:12 +0000
yawniek via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
 wrote:
 On Thursday, 31 December 2015 at 08:51:31 UTC, yawniek wrote:  
 the libasync problem seem seems to be because of TCP_NODELAY 
 not being deactivated for local connection.  

 That would be the other way around. TCP_NODELAY is not enabled 
 in the local connection, which makes a ~20-30ms difference per 
 request on keep-alive connections and is the bottleneck in this 
 case. Enabling it makes the library competitive in these 
 benchmarks.  

 
 obvious typo and thanks for investigating etienne.
 
  daniel: i made similar results over the network.
 i want to redo them with a more optimized setup though. my wrk 
 server was too weak.
 
 the local results are still relevant as its a common setup to 
 have nginx distribute to a few vibe instances locally.

One thing I forgot to mention: I had to modify a few things.

vibe.d (probably) has a bug: it uses threadsPerCPU instead of coresPerCPU
in setupWorkerThreads. Here is a commit which makes it possible to set the
number of worker threads by hand.

https://github.com/rejectedsoftware/vibe.d/commit/f946c3a840eab4ef5f7b98906a6eb143509e1447

(I just modified the vibe.d code to use all 4 of my cores, and it helps a lot.)

To use more threads, the server must be set up with the distribute option:

settings.options |= HTTPServerOption.distribute;
//setupWorkerThreads(4); // works with master
listenHTTP(settings, &hello);

Dec 31 2015

Nick B <nick.barbalich gmail.com> writes:

On Thursday, 31 December 2015 at 12:44:37 UTC, Daniel Kozak wrote:
 On Thu, 31 Dec 2015 12:26:12 +0000
 yawniek via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 
 obvious typo and thanks for investigating etienne.
 
  daniel: i made similar results over the network.
 i want to redo them with a more optimized setup though. my wrk
 server was too weak.
 
 the local results are still relevant as its a common setup to 
 have nginx distribute to a few vibe instances locally.

 One thing I forgot to mention I have to modify few things

 vibe.d has (probably) bug it use threadPerCPU instead of 
 corePerCPU in setupWorkerThreads, here is a commit which make 
 possible to setup it by hand.

 https://github.com/rejectedsoftware/vibe.d/commit/f946c3a840eab4ef5f7b98906a6eb143509e1447

 (I just modify vibe.d code to use all my 4 cores and it helps a 
 lot)

Can someone tell me what changes need to be committed so that we 
have a chance of getting some decent (or even average) benchmark 
numbers?

Jan 03 2016

Etienne Cimon <etcimon gmail.com> writes:

On Sunday, 3 January 2016 at 22:16:08 UTC, Nick B wrote:
 can someone tell me what changes need to be commited, so that 
 we have a chance at getting some decent (or even average) 
 benchmark numbers ?

Considering that the best benchmarks are from tools that have all 
the C calls inlined, I think the best optimizations would be in 
pragma(inline, true), even doing inlining for fiber context 
changes.

Jan 03 2016

Sönke Ludwig <sludwig outerproduct.org> writes:

On 04.01.2016 at 04:27, Etienne Cimon wrote:
 On Sunday, 3 January 2016 at 22:16:08 UTC, Nick B wrote:
 can someone tell me what changes need to be commited, so that we have
 a chance at getting some decent (or even average) benchmark numbers ?

 Considering that the best benchmarks are from tools that have all the C
 calls inlined, I think the best optimizations would be in pragma(inline,
 true), even doing inlining for fiber context changes.

Fiber context changes are not a significant influence. I created a 
proof-of-concept HTTP server based on vanilla OS calls a while ago and 
got almost no slowdown compared to using only callbacks. The performance 
level was around 200% of current vibe.d.

Having said that, the latest version (0.7.27-alpha.3) contains some 
important performance optimizations over 0.7.26 and should be used for 
comparisons. 0.7.26 also had a performance regression related to allocators.

Jan 03 2016

Sönke Ludwig <sludwig outerproduct.org> writes:

On 31.12.2015 at 13:44, Daniel Kozak via Digitalmars-d wrote:
 vibe.d has (probably) bug it use threadPerCPU instead of corePerCPU in
 setupWorkerThreads, here is a commit which make possible to setup it by
 hand.

 https://github.com/rejectedsoftware/vibe.d/commit/f946c3a840eab4ef5f7b98906a6eb143509e1447

 (I just modify vibe.d code to use all my 4 cores and it helps a lot)

 To use more threads it must be setup with distribute option:

 settings.options |= HTTPServerOption.distribute;
 //setupWorkerThreads(4); // works with master
 listenHTTP(settings, &hello);

For me, threadsPerCPU correctly yields the number of logical cores (i.e. 
coresPerCPU * 2 for hyper threading enabled CPUs), which is usually the 
optimal number of threads*. What numbers did you get/expect?

One actual issue could be that, judging by the name, these functions 
would yield the wrong numbers for multi-processor systems. I didn't try 
that so far. Do we have a function in Phobos/Druntime to get the number 
of processors?

* Granted, HT won't help for pure I/O payloads, but worker threads are 
primarily meant for computational tasks.
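
On the question of a Phobos/Druntime function: std.parallelism exposes totalCPUs, which is documented as the number of CPU cores available on the machine as reported by the operating system, while core.cpuid's coresPerCPU and threadsPerCPU are derived from CPUID. The sketch below simply prints the three values side by side so they can be compared on a given machine. The helpers suggestedWorkerCount and reportCpuCounts are hypothetical and do not reflect what vibe.d's setupWorkerThreads does.

```d
import core.cpuid : coresPerCPU, threadsPerCPU;
import std.parallelism : totalCPUs;
import std.stdio : writefln;

/// Hypothetical helper: worker count based on the OS-reported CPU count.
uint suggestedWorkerCount()
{
    immutable uint logical = totalCPUs; // as reported by the operating system
    return logical > 0 ? logical : 1;   // never return zero workers
}

/// Prints the three values so discrepancies between them are visible.
void reportCpuCounts()
{
    immutable uint cores = coresPerCPU;     // derived from CPUID
    immutable uint threads = threadsPerCPU; // derived from CPUID
    immutable uint logical = totalCPUs;     // asked from the OS
    writefln("coresPerCPU=%s threadsPerCPU=%s totalCPUs=%s",
             cores, threads, logical);
}

unittest
{
    reportCpuCounts();
    assert(suggestedWorkerCount() >= 1);
}
```

If the CPUID-based values and totalCPUs disagree on a machine, printing them like this makes the difference easy to report.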

Jan 03 2016

Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Mon, 4 Jan 2016 08:37:10 +0100
Sönke Ludwig via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On 31.12.2015 at 13:44, Daniel Kozak via Digitalmars-d wrote:
 vibe.d has (probably) bug it use threadPerCPU instead of corePerCPU
 in setupWorkerThreads, here is a commit which make possible to
 setup it by hand.

 https://github.com/rejectedsoftware/vibe.d/commit/f946c3a840eab4ef5f7b98906a6eb143509e1447

 (I just modify vibe.d code to use all my 4 cores and it helps a lot)

 To use more threads it must be setup with distribute option:

 settings.options |= HTTPServerOption.distribute;
 //setupWorkerThreads(4); // works with master
 listenHTTP(settings, &hello);  

 
 For me, threadsPerCPU correctly yields the number of logical cores
 (i.e. coresPerCPU * 2 for hyper threading enabled CPUs), which is
 usually the optimal number of threads*. What numbers did you
 get/expect?
 

On my AMD FX4100 (4 cores) and my AMD A10-7850K (4 cores) it
returns 1.

 One actual issue could be that, judging by the name, these functions 
 would yield the wrong numbers for multi-processor systems. I didn't
 try that so far. Do we have a function in Phobos/Druntime to get the
 number of processors?
 
 * Granted, HT won't help for pure I/O payloads, but worker threads
 are primarily meant for computational tasks.

Jan 04 2016

Daniel Kozak <kozzi11 gmail.com> writes:

On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
wrote:
 That would be the other way around. TCP_NODELAY is not enabled 
 in the local connection, which makes a ~20-30ms difference per 
 request on keep-alive connections and is the bottleneck in this 
 case. Enabling it makes the library competitive in these 
 benchmarks.

When I use HTTPServerOption.distribute with libevent I get better 
performance, but with libasync it drops from 20000 req/s to 80 
req/s. So maybe there is another performance problem.

Dec 31 2015

Etienne Cimon <etcimon gmail.com> writes:

On Thursday, 31 December 2015 at 13:29:49 UTC, Daniel Kozak wrote:
 On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
 wrote:
 That would be the other way around. TCP_NODELAY is not enabled 
 in the local connection, which makes a ~20-30ms difference per 
 request on keep-alive connections and is the bottleneck in 
 this case. Enabling it makes the library competitive in these 
 benchmarks.

 When I use HTTPServerOption.distribute with libevent I get 
 better performance but with libasync it drops from 20000 req/s 
 to 80 req/s. So maybe some another performance problem

I launch libasync programs as multiple processes, a bit like 
postgresql. The TCP listening is done with REUSEADDR, so the 
kernel can distribute it and it scales linearly without any fear 
of contention on the GC. My globals go in redis or databases

Dec 31 2015

Daniel Kozak <kozzi11 gmail.com> writes:

On Thursday, 31 December 2015 at 18:23:17 UTC, Etienne Cimon 
wrote:
 On Thursday, 31 December 2015 at 13:29:49 UTC, Daniel Kozak 
 wrote:
 On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
 wrote:
 [...]

 When I use HTTPServerOption.distribute with libevent I get 
 better performance but with libasync it drops from 20000 req/s 
 to 80 req/s. So maybe some another performance problem

 I launch libasync programs as multiple processes, a bit like 
 postgresql. The TCP listening is done with REUSEADDR, so the 
 kernel can distribute it and it scales linearly without any 
 fear of contention on the GC. My globals go in redis or 
 databases

?

Jan 01 2016

Etienne Cimon <etcimon gmail.com> writes:

On Friday, 1 January 2016 at 11:38:53 UTC, Daniel Kozak wrote:
 On Thursday, 31 December 2015 at 18:23:17 UTC, Etienne Cimon 
 wrote:
 On Thursday, 31 December 2015 at 13:29:49 UTC, Daniel Kozak 
 wrote:
 On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
 wrote:
 [...]

 When I use HTTPServerOption.distribute with libevent I get 
 better performance but with libasync it drops from 20000 
 req/s to 80 req/s. So maybe some another performance problem

 I launch libasync programs as multiple processes, a bit like 
 postgresql. The TCP listening is done with REUSEADDR, so the 
 kernel can distribute it and it scales linearly without any 
 fear of contention on the GC. My globals go in redis or 
 databases

 ?

With libasync, you can run multiple instances of your vibe.d 
server and the linux kernel will round robin the incoming 
connections.

Jan 01 2016

Sebastiaan Koppe <mail skoppe.eu> writes:

On Saturday, 2 January 2016 at 03:00:19 UTC, Etienne Cimon wrote:
 With libasync, you can run multiple instances of your vibe.d 
 server and the linux kernel will round robin the incoming 
 connections.

That is nice. Didn't know that. That would enable 
zero-downtime-updates right?

I use docker a lot so normally I run a proxy container in front 
of the app containers and have it handle ssl and virtual hosts 
routing.

Jan 02 2016

Etienne Cimon <etcimon gmail.com> writes:

On Saturday, 2 January 2016 at 10:05:56 UTC, Sebastiaan Koppe 
wrote:
 That is nice. Didn't know that. That would enable 
 zero-downtime-updates right?

Yes, although you might still break existing connections unless 
you can make the previous process wait for the existing 
connections to close after killing it.

 I use docker a lot so normally I run a proxy container in front 
 of the app containers and have it handle ssl and virtual hosts 
 routing.

I haven't needed to migrate out of my linux server yet (12c/24t 
128gb) but when I do, I'll just add another one and go for DNS 
round robin. I use cloudflare currently and in practice you can 
add/remove A records and it'll round robin through them.

If your server application is capable of running as multiple 
instances, it's only a matter of having the database/cache 
servers accessible from another server and you've got a very 
efficient load balancing that doesn't require any proxies.

Jan 02 2016

Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Sat, 02 Jan 2016 03:00:19 +0000
Etienne Cimon via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On Friday, 1 January 2016 at 11:38:53 UTC, Daniel Kozak wrote:
 On Thursday, 31 December 2015 at 18:23:17 UTC, Etienne Cimon 
 wrote:  
 On Thursday, 31 December 2015 at 13:29:49 UTC, Daniel Kozak 
 wrote:  
 On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
 wrote:  
 [...]  

 When I use HTTPServerOption.distribute with libevent I get 
 better performance but with libasync it drops from 20000 
 req/s to 80 req/s. So maybe some another performance problem  

 I launch libasync programs as multiple processes, a bit like 
 postgresql. The TCP listening is done with REUSEADDR, so the 
 kernel can distribute it and it scales linearly without any 
 fear of contention on the GC. My globals go in redis or 
 databases  

 ?  

 
 With libasync, you can run multiple instances of your vibe.d 
 server and the linux kernel will round robin the incoming 
 connections.

Yes, but I am talking about one instance of vibe.d with multiple
worker threads, which performs really badly with libasync.

Jan 04 2016

Etienne Cimon <etcimon gmail.com> writes:

On Monday, 4 January 2016 at 10:32:41 UTC, Daniel Kozak wrote:
 On Sat, 02 Jan 2016 03:00:19 +0000
 Etienne Cimon via Digitalmars-d <digitalmars-d puremagic.com> 
 wrote:

 On Friday, 1 January 2016 at 11:38:53 UTC, Daniel Kozak wrote:
 On Thursday, 31 December 2015 at 18:23:17 UTC, Etienne Cimon 
 wrote:
 [...]

 ?

 
 With libasync, you can run multiple instances of your vibe.d 
 server and the linux kernel will round robin the incoming 
 connections.

 Yes, but I speak about one instance of vibe.d with multiple 
 workerThreads witch perform really bad with libasync

Yes, I will investigate this.

Jan 04 2016

Ola Fosheim Grøstad writes:

On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
wrote:
 That would be the other way around. TCP_NODELAY is not enabled 
 in the local connection, which makes a ~20-30ms difference per 
 request on keep-alive connections and is the bottleneck in this 
 case. Enabling it makes the library competitive in these 
 benchmarks.

I don't know how the benchmarks are set up, but I would assume 
that they don't use a local socket. I wonder if they run the 
database on the same machine, maybe they do, but that's not 
realistic, so they really should not.

Dec 31 2015

yawniek <dlang srtnwz.com> writes:

On Thursday, 31 December 2015 at 15:35:45 UTC, Ola Fosheim 
Grøstad wrote:
 I don't know how the benchmarks are set up, but I would assume 
 that they don't use a local socket. I wonder if they run the 
 database on the same machine, maybe they do, but that's not 
 realistic, so they really should not.

it's actually pretty realistic. one point of having a fast 
webserver is that you can save on resources.
you get a cheap box and have everything there. very common.

Dec 31 2015

Ola Fosheim Grøstad writes:

On Thursday, 31 December 2015 at 15:51:50 UTC, yawniek wrote:
 its actually pretty realistic, one point of having a fast 
 webserver is that you can save on ressources.
 you get a cheap box and have everything there. very common.

It does not scale. If you can do it, then you don't really have a 
real need for the throughput in the first place...

Dec 31 2015

Atila Neves <atila.neves gmail.com> writes:

On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
wrote:
 On Wednesday, 30 December 2015 at 20:32:08 UTC, yawniek wrote:
 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110



 i guess its not enough, there are still things that make 
 vibe.d slow.

 i quickly tried
 https://github.com/nanoant/WebFrameworkBenchmark.git
 which is really a very simple benchmark but it shows about the 
 general overhead.

 single core results against go-fasthttp with GOMAXPROCS=1 and 
 vibe distribution disabled on a c4.2xlarge ec2 instance 
 (archlinux):

 vibe.d 0.7.23 with ldc
 Requests/sec:  52102.06

 vibe.d 0.7.26 with dmd
 Requests/sec:  44438.47

 vibe.d 0.7.26 with ldc
 Requests/sec:  53996.62

 go-fasthttp:
 Requests/sec: 152573.32

 go:
 Requests/sec:  62310.04

 its sad.

 i am aware that go-fasthttp is a very simplistic, stripped 
 down webserver and vibe is almost a full blown framework. 
 still it should be D and vibe.d's USP to be faster than the 
 fastest in the world and not limping around at the end of the 
 charts.

 Isn't there a decent chance the bottleneck is vibe.d's JSON 
 implementation rather than the framework as such ?  We know 
 from Atila's MQTT project that vibe.D can be significantly 
 faster than Go, and we also know that its JSON implementation 
 isn't that fast.  Replacing with FastJSON might be interesting.
  Sadly I don't have time to do that myself.

vibe.d _was_ faster than Go. I redid the measurements recently 
once I wrote an MQTT broker in Rust, and it was losing to 
boost::asio, Rust's mio, Go, and Java. I told Soenke about it.

I know it's vibe.d and not my code because after I got the 
disappointing results I wrote bindings from both boost::asio and 
mio to my D code and the winner of the benchmarks shifted to the 
D/mio combo (previously it was Rust - I figured the library was 
the cause and not the language and I was right).

I'd've put up new benchmarks already, I'm only waiting so I can 
show vibe.d in a good light.

Atila

Jan 05 2016

Etienne Cimon <etcimon gmail.com> writes:

On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
 wrote:
  [...]

 vibe.d _was_ faster than Go. I redid the measurements recently 
 once I wrote an MQTT broker in Rust, and it was losing to 
 boost::asio, Rust's mio, Go, and Java. I told Soenke about it.

 I know it's vibe.d and not my code because after I got the 
 disappointing results I wrote bindings from both boost::asio 
 and mio to my D code and the winner of the benchmarks shifted 
 to the D/mio combo (previously it was Rust - I figured the 
 library was the cause and not the language and I was right).

 I'd've put up new benchmarks already, I'm only waiting so I can 
 show vibe.d in a good light.

 Atila

The Rust mio library doesn't seem to be doing any black magic. I 
wonder how libasync could be optimized to match it.

Jan 05 2016

rsw0x <anonymous anonymous.com> writes:

On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon wrote:
 On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
 wrote:
  [...]

 vibe.d _was_ faster than Go. I redid the measurements recently 
 once I wrote an MQTT broker in Rust, and it was losing to 
 boost::asio, Rust's mio, Go, and Java. I told Soenke about it.

 I know it's vibe.d and not my code because after I got the 
 disappointing results I wrote bindings from both boost::asio 
 and mio to my D code and the winner of the benchmarks shifted 
 to the D/mio combo (previously it was Rust - I figured the 
 library was the cause and not the language and I was right).

 I'd've put up new benchmarks already, I'm only waiting so I 
 can show vibe.d in a good light.

 Atila

 The Rust mio library doesn't seem to be doing any black magic. 
 I wonder how libasync could be optimized to match it.

Have you used perf (or similar) to attempt to find bottlenecks yet?

If you use Linux and LDC or GDC, I found it worked fine for my 
needs. Just compile it with optimizations and either frame 
pointers (-fno-omit-frame-pointer for GDC and -disable-fp-elim 
for LDC) or DWARF debug symbols. I can't remember which generates 
a better callstack right now, actually, so it's probably worth 
playing around with both under the --call-graph flag (fp or dwarf).

Perf is a bit hard to understand if you've never used it before, 
but it's also very powerful.

Bye.

Jan 05 2016

Nikolay <sibnick gmail.com> writes:

On Tuesday, 5 January 2016 at 14:15:18 UTC, rsw0x wrote:
 Have you used perf(or similar) to attempt to find bottlenecks 
 yet?

I used perf and wrote my result here: 
http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/thread/1670/?page=2

As Sönke Ludwig said, direct epoll usage can give more than 200% 
improvement over libevent.

Jan 05 2016

Etienne <etcimon gmail.com> writes:

On Tuesday, 5 January 2016 at 14:45:18 UTC, Nikolay wrote:
 On Tuesday, 5 January 2016 at 14:15:18 UTC, rsw0x wrote:
 Have you used perf(or similar) to attempt to find bottlenecks 
 yet?

 I used perf and wrote my result here: 
 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/thread/1670/?page=2

 As Sönke Ludwig said direct epoll usage can give more then 200% 
 improvements over libevent.

libasync is the result of an attempt to use epoll directly.

Jan 05 2016

Atila Neves <atila.neves gmail.com> writes:

On Tuesday, 5 January 2016 at 14:15:18 UTC, rsw0x wrote:
 On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon wrote:
 On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
 wrote:
  [...]

 vibe.d _was_ faster than Go. I redid the measurements 
 recently once I wrote an MQTT broker in Rust, and it was 
 losing to boost::asio, Rust's mio, Go, and Java. I told 
 Soenke about it.

 I know it's vibe.d and not my code because after I got the 
 disappointing results I wrote bindings from both boost::asio 
 and mio to my D code and the winner of the benchmarks shifted 
 to the D/mio combo (previously it was Rust - I figured the 
 library was the cause and not the language and I was right).

 I'd've put up new benchmarks already, I'm only waiting so I 
 can show vibe.d in a good light.

 Atila

 The Rust mio library doesn't seem to be doing any black magic. 
 I wonder how libasync could be optimized to match it.

 Have you used perf(or similar) to attempt to find bottlenecks 
 yet?

Extensively. I optimised my D code as much as I know how to. And 
that's the same code that gets driven by vibe.d, boost::asio and 
mio.

Nothing stands out anymore in perf. The only main difference I 
can see is that the vibe.d version has far more cache misses. I 
used perf to try and figure out where those came from and 
included them in the email I sent to Soenke.

 Perf is a bit hard to understand if you've never used it 
 before, but it's also very powerful.

Oh, I know. :)

Atila

Jan 06 2016

Etienne Cimon <etcimon gmail.com> writes:

On Wednesday, 6 January 2016 at 08:24:10 UTC, Atila Neves wrote:
 On Tuesday, 5 January 2016 at 14:15:18 UTC, rsw0x wrote:
 On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon 
 wrote:
 On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 [...]

 The Rust mio library doesn't seem to be doing any black 
 magic. I wonder how libasync could be optimized to match it.

 Have you used perf(or similar) to attempt to find bottlenecks 
 yet?

 Extensively. I optimised my D code as much as I know how to. 
 And that's the same code that gets driven by vibe.d, 
 boost::asio and mio.

 Nothing stands out anymore in perf. The only main difference I 
 can see is that the vibe.d version has far more cache misses. I 
 used perf to try and figure out where those came from and 
 included them in the email I sent to Soenke.

 Perf is a bit hard to understand if you've never used it 
 before, but it's also very powerful.

 Oh, I know. :)

 Atila

It's possible that those cache misses will be irrelevant when the 
requests actually do something, is it not? When a lot of 
different requests are competing for cache lines, I'd assume they 
shuffle the cache enough to change these readings.

Jan 07 2016

Nikolay <sibnick gmail.com> writes:

On Friday, 8 January 2016 at 04:02:39 UTC, Etienne Cimon wrote:
 It's possible that those cache misses will be irrelevant when 
 the requests actually do something, is it not? When a lot of 
 different requests are competing for cache lines, I'd assume 
 it's shuffling it enough to change these readings

I believe the cache-miss problem is related to an old vibe.d version. 
There were too many context switches. Now vibe.d uses the SO_REUSEPORT 
socket option, which reduces the context switch count radically.
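
For readers unfamiliar with the option: SO_REUSEPORT lets several sockets bind the same address and port, so each worker can own its own listening socket and the kernel spreads incoming connections between them. The following is a minimal Python sketch of that idea on Linux, not vibe.d's actual code; the helper name make_reuseport_listener and the backlog value are made up for illustration.

```python
import socket

def make_reuseport_listener(host, port, backlog=128):
    """Create a TCP listening socket with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Every socket sharing the port must set the option before binding.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock

# Port 0 asks the kernel for any free port; the second listener reuses it.
first = make_reuseport_listener("127.0.0.1", 0)
port = first.getsockname()[1]
second = make_reuseport_listener("127.0.0.1", port)

# Both sockets are now listening on the same address and port.
same_address = first.getsockname() == second.getsockname()
print("both listeners share port", port, "->", same_address)

first.close()
second.close()
```

Without the option the second bind() would fail with "address already in use"; with it, each worker thread or process can accept on its own socket instead of contending for a shared one.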

Jan 08 2016

Atila Neves <atila.neves gmail.com> writes:

On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon wrote:
 On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 [...]

 The Rust mio library doesn't seem to be doing any black magic. 
 I wonder how libasync could be optimized to match it.

No black magic, it's a thin wrapper over epoll. But it was faster 
than boost::asio and vibe.d the last time I measured.

Atila

Jan 06 2016

Etienne Cimon <etcimon gmail.com> writes:

On Wednesday, 6 January 2016 at 08:21:00 UTC, Atila Neves wrote:
 On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon wrote:
 On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 [...]

 The Rust mio library doesn't seem to be doing any black magic. 
 I wonder how libasync could be optimized to match it.

 No black magic, it's a thin wrapper over epoll. But it was 
 faster than boost::asio and vibe.d the last time I measured.

 Atila

You tested D+mio, but the equivalent would probably be D+libasync, 
since it is a standalone library and a thin wrapper around epoll.
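
To illustrate what "a thin wrapper around epoll" amounts to, here is a minimal Python sketch (Linux only) using the standard select.epoll interface. It is not code from libasync or mio; the helper name wait_readable and the socketpair setup are assumptions made for the example.

```python
import select
import socket

def wait_readable(sock, timeout=1.0):
    """Return the file descriptors reported readable by epoll within timeout."""
    ep = select.epoll()
    try:
        # Register interest in "data available to read" on this socket.
        ep.register(sock.fileno(), select.EPOLLIN)
        events = ep.poll(timeout)
        return [fd for fd, mask in events if mask & select.EPOLLIN]
    finally:
        ep.close()

# A connected pair of sockets stands in for a client and a server.
a, b = socket.socketpair()
b.send(b"ping")

ready = wait_readable(a)
got = a.recv(4) if a.fileno() in ready else b""
print("readable fds:", ready, "received:", got)

a.close()
b.close()
```

An event library built this way adds little more than registration bookkeeping and callback dispatch on top of the poll call, which is why the per-event overhead of such wrappers can be small.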

Jan 07 2016

Daniel Kozak <kozzi11 gmail.com> writes:

On Wednesday, 30 December 2015 at 20:32:08 UTC, yawniek wrote:
 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110



 i guess its not enough, there are still things that make vibe.d 
 slow.

 i quickly tried
 https://github.com/nanoant/WebFrameworkBenchmark.git
 which is really a very simple benchmark but it shows about the 
 general overhead.

 single core results against go-fasthttp with GOMAXPROCS=1 and 
 vibe distribution disabled on a c4.2xlarge ec2 instance 
 (archlinux):

 vibe.d 0.7.23 with ldc
 Requests/sec:  52102.06

 vibe.d 0.7.26 with dmd
 Requests/sec:  44438.47

 vibe.d 0.7.26 with ldc
 Requests/sec:  53996.62

 go-fasthttp:
 Requests/sec: 152573.32

 go:
 Requests/sec:  62310.04

 its sad.

 i am aware that go-fasthttp is a very simplistic, stripped down 
 webserver and vibe is almost a full blown framework. still it 
 should be D and vibe.d's USP to be faster than the fastest in 
 the world and not limping around at the end of the charts.

My results from siege (just returning a page with Hello World, same 
as WebFrameworkBenchmark):

siege -c 20 -q -b -t30S http://127.0.0.1:8080

vibed: --combined -b release-nobounds --compiler=ldmd

Transactions:		      968269 hits
Availability:		      100.00 %
Elapsed time:		       29.10 secs
Data transferred:	       12.00 MB
Response time:		        0.00 secs
Transaction rate:	    33273.85 trans/sec
Throughput:		        0.41 MB/sec
Concurrency:		       19.62
Successful transactions:      968269
Failed transactions:	           0
Longest transaction:	        0.04
Shortest transaction:	        0.00

vibed(one thread):

Transactions:		      767815 hits
Availability:		      100.00 %
Elapsed time:		       29.94 secs
Data transferred:	        9.52 MB
Response time:		        0.00 secs
Transaction rate:	    25645.12 trans/sec
Throughput:		        0.32 MB/sec
Concurrency:		       19.66
Successful transactions:      767815
Failed transactions:	           0
Longest transaction:	        0.02
Shortest transaction:	        0.00


GOMAXPROCS=4 go run hello.go

Transactions:		      765301 hits
Availability:		      100.00 %
Elapsed time:		       29.52 secs
Data transferred:	        8.03 MB
Response time:		        0.00 secs
Transaction rate:	    25924.83 trans/sec
Throughput:		        0.27 MB/sec
Concurrency:		       19.68
Successful transactions:      765301
Failed transactions:	           0
Longest transaction:	        0.02
Shortest transaction:	        0.00

GOMAXPROCS=1 go run hello.go

Transactions:		      478991 hits
Availability:		      100.00 %
Elapsed time:		       29.47 secs
Data transferred:	        5.02 MB
Response time:		        0.00 secs
Transaction rate:	    16253.51 trans/sec
Throughput:		        0.17 MB/sec
Concurrency:		       19.75
Successful transactions:      478992
Failed transactions:	           0
Longest transaction:	        0.02
Shortest transaction:	        0.00

UnderTow (4 cores):

Transactions:		      965835 hits
Availability:		      100.00 %
Elapsed time:		       29.41 secs
Data transferred:	       10.13 MB
Response time:		        0.00 secs
Transaction rate:	    32840.36 trans/sec
Throughput:		        0.34 MB/sec
Concurrency:		       19.57
Successful transactions:      965836
Failed transactions:	           0
Longest transaction:	        0.01
Shortest transaction:	        0.00

Kore.io (4 workers)

Transactions:		        2043 hits
Availability:		      100.00 %
Elapsed time:		       29.61 secs
Data transferred:	        0.02 MB
Response time:		        0.29 secs
Transaction rate:	       69.00 trans/sec
Throughput:		        0.00 MB/sec
Concurrency:		       19.96
Successful transactions:        2043
Failed transactions:	           0
Longest transaction:	        0.55
Shortest transaction:	        0.00


So it seems vibed has the best results :)
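
As a sanity check on the figures above: siege's "Transaction rate" is simply hits divided by elapsed time, so the runs can be compared with a few lines of Python. This is only a sketch using the numbers copied from this post; the dictionary labels and the helper name transaction_rate are made up for the example.

```python
# (hits, elapsed seconds) copied from the siege output above
runs = {
    "vibed": (968269, 29.10),
    "vibed (one thread)": (767815, 29.94),
    "go GOMAXPROCS=4": (765301, 29.52),
    "go GOMAXPROCS=1": (478991, 29.47),
    "UnderTow (4 cores)": (965835, 29.41),
    "Kore.io (4 workers)": (2043, 29.61),
}

def transaction_rate(hits, elapsed_secs):
    """Transactions per second, as siege reports it."""
    return hits / elapsed_secs

rates = {name: transaction_rate(*run) for name, run in runs.items()}

# Print the runs from fastest to slowest.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:22s} {rate:10.2f} trans/sec")

# Ratio between the first vibed run and the one-thread vibed run.
vibed_vs_one_thread = rates["vibed"] / rates["vibed (one thread)"]
print(f"vibed vs one thread: {vibed_vs_one_thread:.2f}x")
```

The first vibed run comes out roughly 1.3x faster than the one-thread run, and only slightly ahead of UnderTow, so the ranking at the top is close.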

Dec 31 2015

Sönke Ludwig <sludwig rejectedsoftware.com> writes:

Am 30.12.2015 um 21:32 schrieb yawniek:
 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110



 i guess its not enough, there are still things that make vibe.d slow.

 i quickly tried
 https://github.com/nanoant/WebFrameworkBenchmark.git
 which is really a very simple benchmark but it shows about the general
 overhead.

 single core results against go-fasthttp with GOMAXPROCS=1 and vibe
 distribution disabled on a c4.2xlarge ec2 instance (archlinux):

 (...)
 its sad.

Can you try with the latest GIT master? There are some important 
optimizations which are not in 0.7.26 (which has at least one 
performance regression).

Jan 04 2016
