digitalmars.D - vibe.d benchmarks

reply Ola Fosheim Grøstad writes:
https://www.techempower.com/benchmarks/

The entries for vibe.d are either doing very poorly or fail to 
complete. Maybe someone should look into this?
Dec 28 2015
parent reply Charles <csmith.ku2013 gmail.com> writes:
On Monday, 28 December 2015 at 12:24:17 UTC, Ola Fosheim Grøstad 
wrote:
 https://www.techempower.com/benchmarks/

 The entries for vibe.d are either doing very poorly or fail to 
 complete. Maybe someone should look into this?
Sönke is already on it. http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110
Dec 28 2015
parent reply Nick B <nick.barbalich gmail.com> writes:
On Monday, 28 December 2015 at 13:10:59 UTC, Charles wrote:
 On Monday, 28 December 2015 at 12:24:17 UTC, Ola Fosheim 
 Grøstad wrote:
 https://www.techempower.com/benchmarks/

 The entries for vibe.d are either doing very poorly or fail to 
 complete. Maybe someone should look into this?
Sönke is already on it. http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110
Correct me if I am wrong here, but as far as I can tell there are no independent benchmarks showing the performance (superior or good enough) of D versus Go, or against just about any other language either? https://www.techempower.com/benchmarks/#section=data-r11&hw=peak&test=json&l=cnc&f=zik0vz-zik0zj-zik0zj-zik0zj-hra0hr
Dec 29 2015
parent reply Charles <csmith.ku2013 gmail.com> writes:
On Tuesday, 29 December 2015 at 22:49:36 UTC, Nick B wrote:
 On Monday, 28 December 2015 at 13:10:59 UTC, Charles wrote:
 On Monday, 28 December 2015 at 12:24:17 UTC, Ola Fosheim 
 Grøstad wrote:
 https://www.techempower.com/benchmarks/

 The entries for vibe.d are either doing very poorly or fail 
 to complete. Maybe someone should look into this?
Sönke is already on it. http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110
Correct me if I am wrong here, but as far as I can tell there are no independent benchmarks showing the performance (superior or good enough) of D versus Go, or against just about any other language either? https://www.techempower.com/benchmarks/#section=data-r11&hw=peak&test=json&l=cnc&f=zik0vz-zik0zj-zik0zj-zik0zj-hra0hr
The last time the official benchmark was run was over a month before Sönke's PR.
Dec 29 2015
parent reply yawniek <dlang srtnwz.com> writes:
 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110
i guess it's not enough, there are still things that make vibe.d slow.

i quickly tried https://github.com/nanoant/WebFrameworkBenchmark.git which is really a very simple benchmark, but it shows the general overhead.

single core results against go-fasthttp with GOMAXPROCS=1 and vibe distribution disabled on a c4.2xlarge ec2 instance (archlinux):

vibe.d 0.7.23 with ldc  Requests/sec: 52102.06
vibe.d 0.7.26 with dmd  Requests/sec: 44438.47
vibe.d 0.7.26 with ldc  Requests/sec: 53996.62
go-fasthttp:            Requests/sec: 152573.32
go:                     Requests/sec: 62310.04

it's sad. i am aware that go-fasthttp is a very simplistic, stripped down webserver and vibe is almost a full blown framework. still, it should be D and vibe.d's USP to be faster than the fastest in the world and not limping around at the end of the charts.
Dec 30 2015
next sibling parent reply Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:
V Wed, 30 Dec 2015 20:32:08 +0000
yawniek via Digitalmars-d <digitalmars-d puremagic.com> napsáno:

 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110  
i guess it's not enough, there are still things that make vibe.d slow.

i quickly tried https://github.com/nanoant/WebFrameworkBenchmark.git which is really a very simple benchmark, but it shows the general overhead.

single core results against go-fasthttp with GOMAXPROCS=1 and vibe distribution disabled on a c4.2xlarge ec2 instance (archlinux):

vibe.d 0.7.23 with ldc  Requests/sec: 52102.06
vibe.d 0.7.26 with dmd  Requests/sec: 44438.47
vibe.d 0.7.26 with ldc  Requests/sec: 53996.62
go-fasthttp:            Requests/sec: 152573.32
go:                     Requests/sec: 62310.04

it's sad. i am aware that go-fasthttp is a very simplistic, stripped down webserver and vibe is almost a full blown framework. still, it should be D and vibe.d's USP to be faster than the fastest in the world and not limping around at the end of the charts.
Which async library do you use for vibe.d? libevent? libev? or libasync? Which compilation switches did you use? Without this info it says nothing about vibe.d's performance :)
Dec 30 2015
parent reply yawniek <dlang srtnwz.com> writes:
On Wednesday, 30 December 2015 at 20:38:58 UTC, Daniel Kozak 
wrote:
 V Wed, 30 Dec 2015 20:32:08 +0000
 yawniek via Digitalmars-d <digitalmars-d puremagic.com> napsáno:

 Which async library do you use for vibe.d? libevent? libev? or 
 libasync? Which compilation switches did you use?

 Without this info it says nothing about vibe.d's performance :)
the numbers above are libevent in release mode, as per the original configuration. for libasync there is a problem, so it's stuck at 2.4 rps. etcimon is currently investigating that.
Dec 30 2015
parent Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:
V Wed, 30 Dec 2015 21:09:37 +0000
yawniek via Digitalmars-d <digitalmars-d puremagic.com> napsáno:

 On Wednesday, 30 December 2015 at 20:38:58 UTC, Daniel Kozak 
 wrote:
 V Wed, 30 Dec 2015 20:32:08 +0000
 yawniek via Digitalmars-d <digitalmars-d puremagic.com> napsáno:

 Which async library do you use for vibe.d? libevent? libev? or 
 libasync? Which compilation switches did you use?

 Without this info it says nothing about vibe.d's performance :)  
the numbers above are libevent in release mode, as per the original configuration. for libasync there is a problem, so it's stuck at 2.4 rps. etcimon is currently investigating that.
Thanks, that is weird. I use libasync and get quite good performance, so it is probably some regression (which version of libasync?)
Dec 30 2015
prev sibling next sibling parent reply Laeeth Isharc <laeethnospam nospam.laeeth.com> writes:
On Wednesday, 30 December 2015 at 20:32:08 UTC, yawniek wrote:
 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110
i guess it's not enough, there are still things that make vibe.d slow.

i quickly tried https://github.com/nanoant/WebFrameworkBenchmark.git which is really a very simple benchmark, but it shows the general overhead.

single core results against go-fasthttp with GOMAXPROCS=1 and vibe distribution disabled on a c4.2xlarge ec2 instance (archlinux):

vibe.d 0.7.23 with ldc  Requests/sec: 52102.06
vibe.d 0.7.26 with dmd  Requests/sec: 44438.47
vibe.d 0.7.26 with ldc  Requests/sec: 53996.62
go-fasthttp:            Requests/sec: 152573.32
go:                     Requests/sec: 62310.04

it's sad. i am aware that go-fasthttp is a very simplistic, stripped down webserver and vibe is almost a full blown framework. still, it should be D and vibe.d's USP to be faster than the fastest in the world and not limping around at the end of the charts.
Isn't there a decent chance the bottleneck is vibe.d's JSON implementation rather than the framework as such ? We know from Atila's MQTT project that vibe.D can be significantly faster than Go, and we also know that its JSON implementation isn't that fast. Replacing with FastJSON might be interesting. Sadly I don't have time to do that myself.
Dec 31 2015
next sibling parent reply yawniek <dlang srtnwz.com> writes:
On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
wrote:
 Isn't there a decent chance the bottleneck is vibe.d's JSON 
 implementation rather than the framework as such ?  We know 
 from Atila's MQTT project that vibe.D can be significantly 
 faster than Go, and we also know that its JSON implementation 
 isn't that fast.  Replacing with FastJSON might be interesting.
  Sadly I don't have time to do that myself.
this is not the same benchmark discussed elsewhere, this one is a simple echo thing. no json. it just states that there is some overhead on various layers. so its testimony is very limited. from a slightly more distant view you can thus argue that 50k rps vs 150k rps basically just means that the framework will most probably not be your bottleneck. nonetheless, getting ahead in the benchmarks would help to attract people who are then pleasantly surprised how easy it is to make full blown services with vibe. the libasync problem seems to be because of TCP_NODELAY not being deactivated for local connection.
Dec 31 2015
next sibling parent Ola Fosheim Grøstad writes:
On Thursday, 31 December 2015 at 08:51:31 UTC, yawniek wrote:
 from a slightly more distant view you can thus argue that 50k 
 rps vs 150k rps basically just means that the framework will 
 most probably not be your bottleneck.
Go scores 0.5ms latency, vibe.d scores 14.7ms latency. That's a big difference that actually matters. Dart + MongoDB also does very well in the multiple request tests. 17300 requests versus Python + MySQL at 8800.
 none the less, getting ahead in the benchmarks would help to 
 attract people who are then pleasantly surprised how easy it is 
 to make full blown services with vibe.
It also matters for people who pick a framework. Although the benchmark isn't great as a general benchmark, it says something about:

1. Whether you can stick to the framework even when you need better performance, which is why the overhead versus raw platform speed is interesting.
2. That the framework has been engineered using performance measurements.

It is more useful for writing dynamic web services with simple requests than for regular web servers, though.
Dec 31 2015
prev sibling parent reply Etienne Cimon <etcimon gmail.com> writes:
On Thursday, 31 December 2015 at 08:51:31 UTC, yawniek wrote:
 On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
 wrote:
 Isn't there a decent chance the bottleneck is vibe.d's JSON 
 implementation rather than the framework as such ?  We know 
 from Atila's MQTT project that vibe.D can be significantly 
 faster than Go, and we also know that its JSON implementation 
 isn't that fast.  Replacing with FastJSON might be interesting.
  Sadly I don't have time to do that myself.
this is not the same benchmark discussed elsewhere, this one is a simple echo thing. no json. it just states that there is some overhead on various layers. so its testimony is very limited. from a slightly more distant view you can thus argue that 50k rps vs 150k rps basically just means that the framework will most probably not be your bottleneck. nonetheless, getting ahead in the benchmarks would help to attract people who are then pleasantly surprised how easy it is to make full blown services with vibe. the libasync problem seems to be because of TCP_NODELAY not being deactivated for local connection.
That would be the other way around. TCP_NODELAY is not enabled in the local connection, which makes a ~20-30ms difference per request on keep-alive connections and is the bottleneck in this case. Enabling it makes the library competitive in these benchmarks.
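For readers unfamiliar with the option: TCP_NODELAY turns off Nagle's algorithm, so small writes are sent immediately instead of being held back to be coalesced with later data. Below is a minimal sketch of enabling it on a socket with Phobos' std.socket. It is not libasync's or vibe.d's actual code, and the helper names enableNoDelay and noDelayEnabled are made up for illustration.

```d
import std.socket;
import std.stdio;

/// Hypothetical helper: asks the OS to disable Nagle's algorithm on `sock`.
void enableNoDelay(Socket sock)
{
    sock.setOption(SocketOptionLevel.TCP, SocketOption.TCP_NODELAY, 1);
}

/// Hypothetical helper: reads the TCP_NODELAY flag back from the OS.
bool noDelayEnabled(Socket sock)
{
    int flag;
    sock.getOption(SocketOptionLevel.TCP, SocketOption.TCP_NODELAY, flag);
    return flag != 0;
}

void main()
{
    auto sock = new TcpSocket(AddressFamily.INET);
    scope (exit) sock.close();

    writeln("TCP_NODELAY before: ", noDelayEnabled(sock));
    enableNoDelay(sock);
    writeln("TCP_NODELAY after:  ", noDelayEnabled(sock));
}
```

In a server the same call would be applied to each accepted connection; where exactly libasync does this is not shown in this thread.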
Dec 31 2015
next sibling parent reply yawniek <dlang srtnwz.com> writes:
On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
wrote:
 On Thursday, 31 December 2015 at 08:51:31 UTC, yawniek wrote:
 the libasync problem seems to be because of TCP_NODELAY 
 not being deactivated for local connection.
That would be the other way around. TCP_NODELAY is not enabled in the local connection, which makes a ~20-30ms difference per request on keep-alive connections and is the bottleneck in this case. Enabling it makes the library competitive in these benchmarks.
obvious typo and thanks for investigating etienne. daniel: i got similar results over the network. i want to redo them with a more optimized setup though. my wrk server was too weak. the local results are still relevant as it's a common setup to have nginx distribute to a few vibe instances locally.
Dec 31 2015
parent reply Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:
V Thu, 31 Dec 2015 12:26:12 +0000
yawniek via Digitalmars-d <digitalmars-d puremagic.com> napsáno:

 On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
 wrote:
 On Thursday, 31 December 2015 at 08:51:31 UTC, yawniek wrote:  
 the libasync problem seems to be because of TCP_NODELAY 
 not being deactivated for local connection.  
That would be the other way around. TCP_NODELAY is not enabled in the local connection, which makes a ~20-30ms difference per request on keep-alive connections and is the bottleneck in this case. Enabling it makes the library competitive in these benchmarks.
obvious typo and thanks for investigating etienne. daniel: i got similar results over the network. i want to redo them with a more optimized setup though. my wrk server was too weak. the local results are still relevant as it's a common setup to have nginx distribute to a few vibe instances locally.
One thing I forgot to mention: I had to modify a few things. vibe.d (probably) has a bug: it uses threadsPerCPU instead of coresPerCPU in setupWorkerThreads. Here is a commit which makes it possible to set it up by hand.

https://github.com/rejectedsoftware/vibe.d/commit/f946c3a840eab4ef5f7b98906a6eb143509e1447

(I just modified the vibe.d code to use all 4 of my cores and it helps a lot.)

To use more threads, it must be set up with the distribute option:

settings.options |= HTTPServerOption.distribute;
//setupWorkerThreads(4); // works with master
listenHTTP(settings, &hello);
Dec 31 2015
next sibling parent reply Nick B <nick.barbalich gmail.com> writes:
On Thursday, 31 December 2015 at 12:44:37 UTC, Daniel Kozak wrote:
 V Thu, 31 Dec 2015 12:26:12 +0000
 yawniek via Digitalmars-d <digitalmars-d puremagic.com> napsáno:
 
 obvious typo and thanks for investigating etienne.
 
  daniel: i got similar results over the network.
 i want to redo them with a more optimized setup though. my wrk
 server was too weak.
 
 the local results are still relevant as it's a common setup to 
 have nginx distribute to a few vibe instances locally.
One thing I forgot to mention: I had to modify a few things. vibe.d (probably) has a bug: it uses threadsPerCPU instead of coresPerCPU in setupWorkerThreads. Here is a commit which makes it possible to set it up by hand. https://github.com/rejectedsoftware/vibe.d/commit/f946c3a840eab4ef5f7b98906a6eb143509e1447 (I just modified the vibe.d code to use all 4 of my cores and it helps a lot.)
can someone tell me what changes need to be committed, so that we have a chance at getting some decent (or even average) benchmark numbers?
Jan 03
parent reply Etienne Cimon <etcimon gmail.com> writes:
On Sunday, 3 January 2016 at 22:16:08 UTC, Nick B wrote:
 can someone tell me what changes need to be committed, so that 
 we have a chance at getting some decent (or even average) 
 benchmark numbers ?
Considering that the best benchmarks are from tools that have all the C calls inlined, I think the best optimizations would be in pragma(inline, true), even doing inlining for fiber context changes.
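As a rough illustration of what pragma(inline, true) looks like in D, here is a sketch with a tiny hot-path helper of the kind an HTTP header parser might call for every byte. The function names are hypothetical and are not taken from vibe.d or libasync; whether forcing inlining gives a measurable speedup there would have to be measured.

```d
import std.stdio;

// Ask the compiler to always inline this small helper at its call sites.
pragma(inline, true)
char toLowerAscii(char c) pure nothrow @nogc @safe
{
    return (c >= 'A' && c <= 'Z') ? cast(char)(c + ('a' - 'A')) : c;
}

/// Case-insensitive comparison for ASCII strings such as HTTP header names.
bool equalsIgnoreCaseAscii(const(char)[] a, const(char)[] b) pure nothrow @nogc @safe
{
    if (a.length != b.length)
        return false;
    foreach (i; 0 .. a.length)
    {
        if (toLowerAscii(a[i]) != toLowerAscii(b[i]))
            return false;
    }
    return true;
}

void main()
{
    assert(equalsIgnoreCaseAscii("Keep-Alive", "keep-alive"));
    assert(!equalsIgnoreCaseAscii("Keep-Alive", "close"));
    writeln("ok");
}
```

The helper is kept to a single return expression on purpose, since simple functions are the ones compilers can inline most reliably.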
Jan 03
parent Sönke Ludwig <sludwig outerproduct.org> writes:
Am 04.01.2016 um 04:27 schrieb Etienne Cimon:
 On Sunday, 3 January 2016 at 22:16:08 UTC, Nick B wrote:
 can someone tell me what changes need to be committed, so that we have
 a chance at getting some decent (or even average) benchmark numbers ?
Considering that the best benchmarks are from tools that have all the C calls inlined, I think the best optimizations would be in pragma(inline, true), even doing inlining for fiber context changes.
Fiber context changes are not a significant influence. I created a proof of concept HTTP server based on vanilla OS calls a while ago and got almost no slowdown compared to using only callbacks. The performance level was around 200% of current vibe.d. Having said that, the latest version (0.7.27-alpha.3) contains some important performance optimizations over 0.7.26 and should be used for comparisons. 0.7.26 also had a performance regression related to allocators.
Jan 03
prev sibling parent reply Sönke Ludwig <sludwig outerproduct.org> writes:
Am 31.12.2015 um 13:44 schrieb Daniel Kozak via Digitalmars-d:
 vibe.d (probably) has a bug: it uses threadsPerCPU instead of coresPerCPU in
 setupWorkerThreads. Here is a commit which makes it possible to set it up by
 hand.

 https://github.com/rejectedsoftware/vibe.d/commit/f946c3a840eab4ef5f7b98906a6eb143509e1447

 (I just modified the vibe.d code to use all 4 of my cores and it helps a lot.)

 To use more threads, it must be set up with the distribute option:

 settings.options |= HTTPServerOption.distribute;
 //setupWorkerThreads(4); // works with master
 listenHTTP(settings, &hello);
For me, threadsPerCPU correctly yields the number of logical cores (i.e. coresPerCPU * 2 for hyper-threading-enabled CPUs), which is usually the optimal number of threads*. What numbers did you get/expect?

One actual issue could be that, judging by the name, these functions would yield the wrong numbers for multi-processor systems. I didn't try that so far. Do we have a function in Phobos/Druntime to get the number of processors?

* Granted, HT won't help for pure I/O payloads, but worker threads are primarily meant for computational tasks.
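For anyone who wants to compare values on their own machine: core.cpuid exposes coresPerCPU and threadsPerCPU, and std.parallelism exposes totalCPUs, which reports the CPU count as seen by the operating system (this is assumed to mean logical CPUs across the whole machine, not the number of physical processor packages). A small sketch that prints all three side by side:

```d
import core.cpuid : coresPerCPU, threadsPerCPU;
import std.parallelism : totalCPUs;
import std.stdio;

void main()
{
    // Values derived from the CPUID instruction by druntime.
    writeln("core.cpuid.coresPerCPU:    ", coresPerCPU);
    writeln("core.cpuid.threadsPerCPU:  ", threadsPerCPU);

    // Value obtained from the operating system by Phobos.
    writeln("std.parallelism.totalCPUs: ", totalCPUs);
}
```

If the CPUID-derived numbers disagree with the OS-reported one, the OS-reported count is probably the safer basis for sizing a worker thread pool, though that is a judgment call rather than something established in this thread.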
Jan 03
parent Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:
V Mon, 4 Jan 2016 08:37:10 +0100
Sönke Ludwig via Digitalmars-d <digitalmars-d puremagic.com> napsáno:

 Am 31.12.2015 um 13:44 schrieb Daniel Kozak via Digitalmars-d:
 vibe.d (probably) has a bug: it uses threadsPerCPU instead of coresPerCPU
 in setupWorkerThreads. Here is a commit which makes it possible to
 set it up by hand.

 https://github.com/rejectedsoftware/vibe.d/commit/f946c3a840eab4ef5f7b98906a6eb143509e1447

 (I just modified the vibe.d code to use all 4 of my cores and it helps a lot.)

 To use more threads, it must be set up with the distribute option:

 settings.options |= HTTPServerOption.distribute;
 //setupWorkerThreads(4); // works with master
 listenHTTP(settings, &hello);  
For me, threadsPerCPU correctly yields the number of logical cores (i.e. coresPerCPU * 2 for hyper threading enabled CPUs), which is usually the optimal number of threads*. What numbers did you get/expect?
On my AMD FX4100 (4 cores) and my AMD A10-7850K (4 cores) it returns 1.
 One actual issue could be that, judging by the name, these functions 
 would yield the wrong numbers for multi-processor systems. I didn't
 try that so far. Do we have a function in Phobos/Druntime to get the
 number of processors?
 
 * Granted, HT won't help for pure I/O payloads, but worker threads
 are primarily meant for computational tasks.
Jan 04
prev sibling next sibling parent reply Daniel Kozak <kozzi11 gmail.com> writes:
On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
wrote:
 That would be the other way around. TCP_NODELAY is not enabled 
 in the local connection, which makes a ~20-30ms difference per 
 request on keep-alive connections and is the bottleneck in this 
 case. Enabling it makes the library competitive in these 
 benchmarks.
When I use HTTPServerOption.distribute with libevent I get better performance, but with libasync it drops from 20000 req/s to 80 req/s. So maybe there is another performance problem.
Dec 31 2015
parent reply Etienne Cimon <etcimon gmail.com> writes:
On Thursday, 31 December 2015 at 13:29:49 UTC, Daniel Kozak wrote:
 On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
 wrote:
 That would be the other way around. TCP_NODELAY is not enabled 
 in the local connection, which makes a ~20-30ms difference per 
 request on keep-alive connections and is the bottleneck in 
 this case. Enabling it makes the library competitive in these 
 benchmarks.
When I use HTTPServerOption.distribute with libevent I get better performance, but with libasync it drops from 20000 req/s to 80 req/s. So maybe there is another performance problem.
I launch libasync programs as multiple processes, a bit like postgresql. The TCP listening is done with REUSEADDR, so the kernel can distribute it and it scales linearly without any fear of contention on the GC. My globals go in redis or databases
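A sketch of the socket setup behind this kind of multi-listener arrangement, using std.socket. Several assumptions are baked in and should be checked before relying on it: on Linux the option that allows several listening sockets on the same port, with the kernel spreading incoming connections across them, is SO_REUSEPORT (available since kernel 3.9), while SO_REUSEADDR alone does not permit two simultaneous listeners; the numeric value 15 is the Linux value on common architectures such as x86 and ARM; and listenShared is a hypothetical helper, not libasync's API. For brevity both listeners live in one process here, whereas the post describes separate processes.

```d
import std.socket;
import std.stdio;

version (linux)
{
    // Assumption: SO_REUSEPORT == 15 on common Linux targets (x86, ARM).
    // The cast avoids depending on std.socket naming this option.
    enum reusePort = cast(SocketOption) 15;
}
else
{
    static assert(false, "this sketch assumes Linux");
}

/// Hypothetical helper: opens a listening socket that may share `port`
/// with other sockets that were set up the same way.
Socket listenShared(ushort port)
{
    auto s = new TcpSocket(AddressFamily.INET);
    s.setOption(SocketOptionLevel.SOCKET, SocketOption.REUSEADDR, 1);
    s.setOption(SocketOptionLevel.SOCKET, reusePort, 1);
    s.bind(new InternetAddress("127.0.0.1", port));
    s.listen(128);
    return s;
}

void main()
{
    // Port 0 lets the OS pick a free port for the first listener.
    auto first = listenShared(0);
    scope (exit) first.close();

    auto port = (cast(InternetAddress) first.localAddress).port;

    // A second listener on the same port succeeds only because both
    // sockets set SO_REUSEPORT before bind().
    auto second = listenShared(port);
    scope (exit) second.close();

    writeln("two listeners share port ", port);
}
```

Each process in a real deployment would call something like listenShared with the same fixed port and then run its own accept loop.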
Dec 31 2015
parent reply Daniel Kozak <kozzi11 gmail.com> writes:
On Thursday, 31 December 2015 at 18:23:17 UTC, Etienne Cimon 
wrote:
 On Thursday, 31 December 2015 at 13:29:49 UTC, Daniel Kozak 
 wrote:
 On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
 wrote:
 [...]
When I use HTTPServerOption.distribute with libevent I get better performance, but with libasync it drops from 20000 req/s to 80 req/s. So maybe there is another performance problem.
I launch libasync programs as multiple processes, a bit like postgresql. The TCP listening is done with REUSEADDR, so the kernel can distribute it and it scales linearly without any fear of contention on the GC. My globals go in redis or databases
?
Jan 01
parent reply Etienne Cimon <etcimon gmail.com> writes:
On Friday, 1 January 2016 at 11:38:53 UTC, Daniel Kozak wrote:
 On Thursday, 31 December 2015 at 18:23:17 UTC, Etienne Cimon 
 wrote:
 On Thursday, 31 December 2015 at 13:29:49 UTC, Daniel Kozak 
 wrote:
 On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
 wrote:
 [...]
When I use HTTPServerOption.distribute with libevent I get better performance, but with libasync it drops from 20000 req/s to 80 req/s. So maybe there is another performance problem.
I launch libasync programs as multiple processes, a bit like postgresql. The TCP listening is done with REUSEADDR, so the kernel can distribute it and it scales linearly without any fear of contention on the GC. My globals go in redis or databases
?
With libasync, you can run multiple instances of your vibe.d server and the linux kernel will round robin the incoming connections.
Jan 01
next sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Saturday, 2 January 2016 at 03:00:19 UTC, Etienne Cimon wrote:
 With libasync, you can run multiple instances of your vibe.d 
 server and the linux kernel will round robin the incoming 
 connections.
That is nice. Didn't know that. That would enable zero-downtime-updates right? I use docker a lot so normally I run a proxy container in front of the app containers and have it handle ssl and virtual hosts routing.
Jan 02
parent Etienne Cimon <etcimon gmail.com> writes:
On Saturday, 2 January 2016 at 10:05:56 UTC, Sebastiaan Koppe 
wrote:
 That is nice. Didn't know that. That would enable 
 zero-downtime-updates right?
Yes, although you might still break existing connections unless you can make the previous process wait for the existing connections to close after killing it.
 I use docker a lot so normally I run a proxy container in front 
 of the app containers and have it handle ssl and virtual hosts 
 routing.
I haven't needed to migrate out of my linux server yet (12c/24t 128gb) but when I do, I'll just add another one and go for DNS round robin. I use cloudflare currently and in practice you can add/remove A records and it'll round robin through them. If your server application is capable of running as multiple instances, it's only a matter of having the database/cache servers accessible from another server and you've got a very efficient load balancing that doesn't require any proxies.
Jan 02
prev sibling parent reply Daniel Kozak via Digitalmars-d <digitalmars-d puremagic.com> writes:
V Sat, 02 Jan 2016 03:00:19 +0000
Etienne Cimon via Digitalmars-d <digitalmars-d puremagic.com> napsáno:

 On Friday, 1 January 2016 at 11:38:53 UTC, Daniel Kozak wrote:
 On Thursday, 31 December 2015 at 18:23:17 UTC, Etienne Cimon 
 wrote:  
 On Thursday, 31 December 2015 at 13:29:49 UTC, Daniel Kozak 
 wrote:  
 On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
 wrote:  
 [...]  
When I use HTTPServerOption.distribute with libevent I get better performance, but with libasync it drops from 20000 req/s to 80 req/s. So maybe there is another performance problem.
I launch libasync programs as multiple processes, a bit like postgresql. The TCP listening is done with REUSEADDR, so the kernel can distribute it and it scales linearly without any fear of contention on the GC. My globals go in redis or databases
?
With libasync, you can run multiple instances of your vibe.d server and the linux kernel will round robin the incoming connections.
Yes, but I am talking about one instance of vibe.d with multiple worker threads, which performs really badly with libasync.
Jan 04
parent Etienne Cimon <etcimon gmail.com> writes:
On Monday, 4 January 2016 at 10:32:41 UTC, Daniel Kozak wrote:
 V Sat, 02 Jan 2016 03:00:19 +0000
 Etienne Cimon via Digitalmars-d <digitalmars-d puremagic.com> 
 napsáno:

 On Friday, 1 January 2016 at 11:38:53 UTC, Daniel Kozak wrote:
 On Thursday, 31 December 2015 at 18:23:17 UTC, Etienne Cimon 
 wrote:
 [...]
?
With libasync, you can run multiple instances of your vibe.d server and the linux kernel will round robin the incoming connections.
Yes, but I am talking about one instance of vibe.d with multiple worker threads, which performs really badly with libasync.
Yes, I will investigate this.
Jan 04
prev sibling parent reply Ola Fosheim Grøstad writes:
On Thursday, 31 December 2015 at 12:09:30 UTC, Etienne Cimon 
wrote:
 That would be the other way around. TCP_NODELAY is not enabled 
 in the local connection, which makes a ~20-30ms difference per 
 request on keep-alive connections and is the bottleneck in this 
 case. Enabling it makes the library competitive in these 
 benchmarks.
I don't know how the benchmarks are set up, but I would assume that they don't use a local socket. I wonder if they run the database on the same machine, maybe they do, but that's not realistic, so they really should not.
Dec 31 2015
parent reply yawniek <dlang srtnwz.com> writes:
On Thursday, 31 December 2015 at 15:35:45 UTC, Ola Fosheim 
Grøstad wrote:
 I don't know how the benchmarks are set up, but I would assume 
 that they don't use a local socket. I wonder if they run the 
 database on the same machine, maybe they do, but that's not 
 realistic, so they really should not.
it's actually pretty realistic. one point of having a fast webserver is that you can save on resources. you get a cheap box and have everything there. very common.
Dec 31 2015
parent Ola Fosheim Grøstad writes:
On Thursday, 31 December 2015 at 15:51:50 UTC, yawniek wrote:
 it's actually pretty realistic. one point of having a fast 
 webserver is that you can save on resources.
 you get a cheap box and have everything there. very common.
It does not scale. If you can do it, then you don't really have a real need for the throughput in the first place...
Dec 31 2015
prev sibling parent reply Atila Neves <atila.neves gmail.com> writes:
On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
wrote:
 On Wednesday, 30 December 2015 at 20:32:08 UTC, yawniek wrote:
 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110
i guess it's not enough, there are still things that make vibe.d slow.

i quickly tried https://github.com/nanoant/WebFrameworkBenchmark.git which is really a very simple benchmark, but it shows the general overhead.

single core results against go-fasthttp with GOMAXPROCS=1 and vibe distribution disabled on a c4.2xlarge ec2 instance (archlinux):

vibe.d 0.7.23 with ldc  Requests/sec: 52102.06
vibe.d 0.7.26 with dmd  Requests/sec: 44438.47
vibe.d 0.7.26 with ldc  Requests/sec: 53996.62
go-fasthttp:            Requests/sec: 152573.32
go:                     Requests/sec: 62310.04

it's sad. i am aware that go-fasthttp is a very simplistic, stripped down webserver and vibe is almost a full blown framework. still, it should be D and vibe.d's USP to be faster than the fastest in the world and not limping around at the end of the charts.
Isn't there a decent chance the bottleneck is vibe.d's JSON implementation rather than the framework as such ? We know from Atila's MQTT project that vibe.D can be significantly faster than Go, and we also know that its JSON implementation isn't that fast. Replacing with FastJSON might be interesting. Sadly I don't have time to do that myself.
vibe.d _was_ faster than Go. I redid the measurements recently once I wrote an MQTT broker in Rust, and it was losing to boost::asio, Rust's mio, Go, and Java. I told Soenke about it. I know it's vibe.d and not my code because after I got the disappointing results I wrote bindings from both boost::asio and mio to my D code and the winner of the benchmarks shifted to the D/mio combo (previously it was Rust - I figured the library was the cause and not the language and I was right). I'd've put up new benchmarks already, I'm only waiting so I can show vibe.d in a good light. Atila
Jan 05
parent reply Etienne Cimon <etcimon gmail.com> writes:
On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
 wrote:
  [...]
vibe.d _was_ faster than Go. I redid the measurements recently once I wrote an MQTT broker in Rust, and it was losing to boost::asio, Rust's mio, Go, and Java. I told Soenke about it. I know it's vibe.d and not my code because after I got the disappointing results I wrote bindings from both boost::asio and mio to my D code and the winner of the benchmarks shifted to the D/mio combo (previously it was Rust - I figured the library was the cause and not the language and I was right). I'd've put up new benchmarks already, I'm only waiting so I can show vibe.d in a good light. Atila
The Rust mio library doesn't seem to be doing any black magic. I wonder how libasync could be optimized to match it.
Jan 05
next sibling parent reply rsw0x <anonymous anonymous.com> writes:
On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon wrote:
 On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
 wrote:
  [...]
vibe.d _was_ faster than Go. I redid the measurements recently once I wrote an MQTT broker in Rust, and it was losing to boost::asio, Rust's mio, Go, and Java. I told Soenke about it. I know it's vibe.d and not my code because after I got the disappointing results I wrote bindings from both boost::asio and mio to my D code and the winner of the benchmarks shifted to the D/mio combo (previously it was Rust - I figured the library was the cause and not the language and I was right). I'd've put up new benchmarks already, I'm only waiting so I can show vibe.d in a good light. Atila
The Rust mio library doesn't seem to be doing any black magic. I wonder how libasync could be optimized to match it.
Have you used perf (or similar) to attempt to find bottlenecks yet? If you use linux and LDC or GDC, I found it worked fine for my needs. Just compile it with optimizations & frame pointers (-fno-omit-frame-pointer for GDC and -disable-fp-elim for LDC) or dwarf debug symbols. I can't remember which generates a better callstack right now, actually, so it's probably worth playing around with under the --call-graph flag (fp or dwarf). Perf is a bit hard to understand if you've never used it before, but it's also very powerful. Bye.
Jan 05
next sibling parent reply Nikolay <sibnick gmail.com> writes:
On Tuesday, 5 January 2016 at 14:15:18 UTC, rsw0x wrote:
 Have you used perf(or similar) to attempt to find bottlenecks 
 yet?
I used perf and wrote up my results here: http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/thread/1670/?page=2 As Sönke Ludwig said, direct epoll usage can give more than 200% improvement over libevent.
Jan 05
parent Etienne <etcimon gmail.com> writes:
On Tuesday, 5 January 2016 at 14:45:18 UTC, Nikolay wrote:
 On Tuesday, 5 January 2016 at 14:15:18 UTC, rsw0x wrote:
 Have you used perf(or similar) to attempt to find bottlenecks 
 yet?
I used perf and wrote up my results here: http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/thread/1670/?page=2 As Sönke Ludwig said, direct epoll usage can give more than 200% improvement over libevent.
libasync is the result of an attempt to use epoll directly
Jan 05
prev sibling parent reply Atila Neves <atila.neves gmail.com> writes:
On Tuesday, 5 January 2016 at 14:15:18 UTC, rsw0x wrote:
 On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon wrote:
 On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 On Thursday, 31 December 2015 at 08:23:26 UTC, Laeeth Isharc 
 wrote:
  [...]
vibe.d _was_ faster than Go. I redid the measurements recently once I wrote an MQTT broker in Rust, and it was losing to boost::asio, Rust's mio, Go, and Java. I told Soenke about it. I know it's vibe.d and not my code because after I got the disappointing results I wrote bindings from both boost::asio and mio to my D code and the winner of the benchmarks shifted to the D/mio combo (previously it was Rust - I figured the library was the cause and not the language and I was right). I'd've put up new benchmarks already, I'm only waiting so I can show vibe.d in a good light. Atila
The Rust mio library doesn't seem to be doing any black magic. I wonder how libasync could be optimized to match it.
Have you used perf(or similar) to attempt to find bottlenecks yet?
Extensively. I optimised my D code as much as I know how to. And that's the same code that gets driven by vibe.d, boost::asio and mio. Nothing stands out anymore in perf. The only main difference I can see is that the vibe.d version has far more cache misses. I used perf to try and figure out where those came from and included them in the email I sent to Soenke.
 Perf is a bit hard to understand if you've never used it 
 before, but it's also very powerful.
Oh, I know. :) Atila
Jan 06
parent reply Etienne Cimon <etcimon gmail.com> writes:
On Wednesday, 6 January 2016 at 08:24:10 UTC, Atila Neves wrote:
 On Tuesday, 5 January 2016 at 14:15:18 UTC, rsw0x wrote:
 On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon 
 wrote:
 On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 [...]
The Rust mio library doesn't seem to be doing any black magic. I wonder how libasync could be optimized to match it.
Have you used perf(or similar) to attempt to find bottlenecks yet?
Extensively. I optimised my D code as much as I know how to. And that's the same code that gets driven by vibe.d, boost::asio and mio. Nothing stands out anymore in perf. The only main difference I can see is that the vibe.d version has far more cache misses. I used perf to try and figure out where those came from and included them in the email I sent to Soenke.
 Perf is a bit hard to understand if you've never used it 
 before, but it's also very powerful.
Oh, I know. :) Atila
It's possible that those cache misses will be irrelevant when the requests actually do something, is it not? When a lot of different requests are competing for cache lines, I'd assume it's shuffling it enough to change these readings
Jan 07
parent Nikolay <sibnick gmail.com> writes:
On Friday, 8 January 2016 at 04:02:39 UTC, Etienne Cimon wrote:
 It's possible that those cache misses will be irrelevant when 
 the requests actually do something, is it not? When a lot of 
 different requests are competing for cache lines, I'd assume 
 it's shuffling it enough to change these readings
I believe the cache-miss problem is related to an old vibed version. There were too many context switches. Now vibed uses the SO_REUSEPORT socket option, which reduces the context switch count radically.
Jan 08
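For readers unfamiliar with SO_REUSEPORT, here is a minimal sketch of the idea, written in Python for brevity rather than D. It is not vibe.d's code: the function names `reuseport_listener` and `make_listeners` are made up for illustration, and it assumes Linux 3.9 or later. Each worker gets its own listening socket bound to the same port, and the kernel spreads incoming connections across them, so a worker can accept on its own socket instead of sharing one accept queue with the others.

```python
import socket


def reuseport_listener(host: str, port: int) -> socket.socket:
    """Create a listening TCP socket with SO_REUSEPORT set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind() on every socket sharing the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def make_listeners(count: int, host: str = "127.0.0.1", port: int = 0):
    """Open `count` listeners on one port (port 0 lets the OS choose it)."""
    first = reuseport_listener(host, port)
    bound_port = first.getsockname()[1]
    rest = [reuseport_listener(host, bound_port) for _ in range(count - 1)]
    return [first] + rest
```

In a real server each listener would be handed to its own worker thread or process with its own event loop; the sketch only shows that several sockets can be bound to the same port.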
prev sibling parent reply Atila Neves <atila.neves gmail.com> writes:
On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon wrote:
 On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 [...]
The Rust mio library doesn't seem to be doing any black magic. I wonder how libasync could be optimized to match it.
No black magic, it's a thin wrapper over epoll. But it was faster than boost::asio and vibe.d the last time I measured. Atila
Jan 06
parent Etienne Cimon <etcimon gmail.com> writes:
On Wednesday, 6 January 2016 at 08:21:00 UTC, Atila Neves wrote:
 On Tuesday, 5 January 2016 at 13:09:55 UTC, Etienne Cimon wrote:
 On Tuesday, 5 January 2016 at 10:11:36 UTC, Atila Neves wrote:
 [...]
The Rust mio library doesn't seem to be doing any black magic. I wonder how libasync could be optimized to match it.
No black magic, it's a thin wrapper over epoll. But it was faster than boost::asio and vibe.d the last time I measured. Atila
You tested D+mio, but the equivalent would probably be D+libasync, as it is a standalone library and a thin wrapper around epoll.
Jan 07
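To make "thin wrapper over epoll" concrete, here is a rough sketch of what such a layer boils down to, using Python's `select.epoll` (Linux only) instead of D or Rust. The `Reactor` class and its method names are hypothetical and are not the API of mio, libasync, or vibe.d; the sketch only handles read readiness.

```python
import select
import socket


class Reactor:
    """Maps file descriptors to callbacks and dispatches epoll events."""

    def __init__(self):
        self._epoll = select.epoll()
        self._handlers = {}

    def register(self, sock: socket.socket, on_readable) -> None:
        # Remember the callback, then ask the kernel for read-readiness events.
        self._handlers[sock.fileno()] = (sock, on_readable)
        self._epoll.register(sock.fileno(), select.EPOLLIN)

    def unregister(self, sock: socket.socket) -> None:
        self._epoll.unregister(sock.fileno())
        del self._handlers[sock.fileno()]

    def run_once(self, timeout: float = 1.0) -> int:
        """Wait up to `timeout` seconds, dispatch ready sockets, return event count."""
        events = self._epoll.poll(timeout)
        for fd, mask in events:
            if mask & select.EPOLLIN:
                sock, callback = self._handlers[fd]
                callback(sock)
        return len(events)

    def close(self) -> None:
        self._epoll.close()
```

A caller registers a socket with a callback and then calls `run_once` in a loop. Everything beyond that (buffering, timers, fibers, HTTP parsing) is what a larger framework adds on top, which is one plausible place for the extra overhead discussed in this thread.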
prev sibling next sibling parent Daniel Kozak <kozzi11 gmail.com> writes:
On Wednesday, 30 December 2015 at 20:32:08 UTC, yawniek wrote:
 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110
i guess it's not enough, there are still things that make vibe.d slow. i quickly tried https://github.com/nanoant/WebFrameworkBenchmark.git which is really a very simple benchmark but it shows the general overhead. single core results against go-fasthttp with GOMAXPROCS=1 and vibe distribution disabled on a c4.2xlarge ec2 instance (archlinux):

vibe.d 0.7.23 with ldc  Requests/sec: 52102.06
vibe.d 0.7.26 with dmd  Requests/sec: 44438.47
vibe.d 0.7.26 with ldc  Requests/sec: 53996.62
go-fasthttp:            Requests/sec: 152573.32
go:                     Requests/sec: 62310.04

it's sad. i am aware that go-fasthttp is a very simplistic, stripped down webserver and vibe is almost a full blown framework. still it should be D and vibe.d's USP to be faster than the fastest in the world and not limping around at the end of the charts.
My results from siege (just return a page with Hello World, same as WebFrameworkBenchmark):

siege -c 20 -q -b -t30S http://127.0.0.1:8080

vibed: --combined -b release-nobounds --compiler=ldmd
Transactions:             968269 hits
Availability:             100.00 %
Elapsed time:             29.10 secs
Data transferred:         12.00 MB
Response time:            0.00 secs
Transaction rate:         33273.85 trans/sec
Throughput:               0.41 MB/sec
Concurrency:              19.62
Successful transactions:  968269
Failed transactions:      0
Longest transaction:      0.04
Shortest transaction:     0.00

vibed (one thread):
Transactions:             767815 hits
Availability:             100.00 %
Elapsed time:             29.94 secs
Data transferred:         9.52 MB
Response time:            0.00 secs
Transaction rate:         25645.12 trans/sec
Throughput:               0.32 MB/sec
Concurrency:              19.66
Successful transactions:  767815
Failed transactions:      0
Longest transaction:      0.02
Shortest transaction:     0.00

GOMAXPROCS=4 go run hello.go
Transactions:             765301 hits
Availability:             100.00 %
Elapsed time:             29.52 secs
Data transferred:         8.03 MB
Response time:            0.00 secs
Transaction rate:         25924.83 trans/sec
Throughput:               0.27 MB/sec
Concurrency:              19.68
Successful transactions:  765301
Failed transactions:      0
Longest transaction:      0.02
Shortest transaction:     0.00

GOMAXPROCS=1 go run hello.go
Transactions:             478991 hits
Availability:             100.00 %
Elapsed time:             29.47 secs
Data transferred:         5.02 MB
Response time:            0.00 secs
Transaction rate:         16253.51 trans/sec
Throughput:               0.17 MB/sec
Concurrency:              19.75
Successful transactions:  478992
Failed transactions:      0
Longest transaction:      0.02
Shortest transaction:     0.00

UnderTow (4 cores):
Transactions:             965835 hits
Availability:             100.00 %
Elapsed time:             29.41 secs
Data transferred:         10.13 MB
Response time:            0.00 secs
Transaction rate:         32840.36 trans/sec
Throughput:               0.34 MB/sec
Concurrency:              19.57
Successful transactions:  965836
Failed transactions:      0
Longest transaction:      0.01
Shortest transaction:     0.00

Kore.io (4 workers):
Transactions:             2043 hits
Availability:             100.00 %
Elapsed time:             29.61 secs
Data transferred:         0.02 MB
Response time:            0.29 secs
Transaction rate:         69.00 trans/sec
Throughput:               0.00 MB/sec
Concurrency:              19.96
Successful transactions:  2043
Failed transactions:      0
Longest transaction:      0.55
Shortest transaction:     0.00

So it seems vibed has the best results :)
Dec 31 2015
prev sibling parent =?UTF-8?Q?S=c3=b6nke_Ludwig?= <sludwig rejectedsoftware.com> writes:
Am 30.12.2015 um 21:32 schrieb yawniek:
 Sönke is already on it.

 http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/post/29110
i guess it's not enough, there are still things that make vibe.d slow. i quickly tried https://github.com/nanoant/WebFrameworkBenchmark.git which is really a very simple benchmark but it shows the general overhead. single core results against go-fasthttp with GOMAXPROCS=1 and vibe distribution disabled on a c4.2xlarge ec2 instance (archlinux): (...) it's sad.
Can you try with the latest GIT master? There are some important optimizations which are not in 0.7.26 (which has at least one performance regression).
Jan 04