
digitalmars.D - Why can't a D framework perform the best?

zoujiaqing <zoujiaqing gmail.com> writes:
We have been continuously optimizing the performance of the hunt 
library's IO module, but it has not reached the level of ASP.NET 
Core, Rust, and Java.

https://www.techempower.com/benchmarks/#section=test&runid=9e7a6863-b92e-4079-a2a9-324426369751&hw=ph&test=plaintext

What good ideas do you have to offer?
May 08
Mathias LANG <geod24 gmail.com> writes:
On Friday, 8 May 2020 at 08:13:35 UTC, zoujiaqing wrote:
 We have been continuously optimizing the performance of the hunt 
 library's IO module, but it has not reached the level of ASP.NET 
 Core, Rust, and Java.

 https://www.techempower.com/benchmarks/#section=test&runid=9e7a6863-b92e-4079-a2a9-324426369751&hw=ph&test=plaintext

 What good ideas do you have to offer?
Looking at the data table, one thing stands out: performance for the "256" row (is it a size? Concurrent requests?) is lower by an order of magnitude. Assuming it's a size, that might just be a badly sized buffer, or a wrong "small data" optimization, but only profiling can tell.
May 08
lili <akozhao tencent.com> writes:
On Friday, 8 May 2020 at 08:13:35 UTC, zoujiaqing wrote:
 We have been continuously optimizing the performance of the hunt 
 library's IO module, but it has not reached the level of ASP.NET 
 Core, Rust, and Java.

 https://www.techempower.com/benchmarks/#section=test&runid=9e7a6863-b92e-4079-a2a9-324426369751&hw=ph&test=plaintext

 What good ideas do you have to offer?
It's about 10% behind the top three, which is a fairly big gap. In principle the IO shouldn't depend on D itself; could the threading model be the cause?
May 08
welkam <wwwelkam gmail.com> writes:
On Friday, 8 May 2020 at 08:13:35 UTC, zoujiaqing wrote:
 We have been continuously optimizing the performance of the hunt 
 library's IO module, but it has not reached the level of ASP.NET 
 Core, Rust, and Java.

 https://www.techempower.com/benchmarks/#section=test&runid=9e7a6863-b92e-4079-a2a9-324426369751&hw=ph&test=plaintext

 What good ideas do you have to offer?
If you could create a benchmark that anyone could easily set up and run, then I could provide concrete ideas or even patches.
May 08
Jacob Carlborg <doob me.com> writes:
On 2020-05-08 10:13, zoujiaqing wrote:
 We have been continuously optimizing the performance of the hunt library's IO 
 module, but it has not reached the level of ASP.NET Core, Rust, and Java.
 
 https://www.techempower.com/benchmarks/#section=test&runid=9e7a6863-b92e-4079-a2a9-324426369751&hw=ph&test=plaintext
 
 
 What good ideas do you have to offer?
I think 18 out of 404 (plaintext) and 8 out of 409 (JSON serialization) is pretty good. Compare that with vibe.d, which is down at 139, or something like that.

I haven't used Hunt, but I did have a brief look at the code base. It seems very class-centric. That means heap allocations (which are slow) and access through indirection, which is at least slower than direct access. Keep in mind that D's GC is stop-the-world. It would be interesting to see a benchmark with the GC turned off, or one using multiple processes (assuming that's not already the case) instead of multiple threads.

I think as much as possible should be based on structs. It's always simpler to turn a value type into a reference type (by embedding it in a class) than to do the opposite.

I haven't done any benchmarks, but when it comes to allocations it sounds like a request-local region allocator, possibly backed by a free list, would be efficient. The region allocator could first use a static array as its buffer, then fall back to allocating on the heap when the static buffer is full.

-- 
/Jacob Carlborg
May 11
welkam <wwwelkam gmail.com> writes:
On Monday, 11 May 2020 at 08:09:24 UTC, Jacob Carlborg wrote:
 It seems very class centric.
The problem with classes is that they introduce indirections. First, they are reference types, so any access to them goes through a pointer. Second, unless you mark methods as final, method calls go through the vtable, meaning that executing code on a piece of data can require two pointer dereferences.

I don't know how the GC allocates memory, but malloc gives 16-byte-aligned pointers, and if the GC does the same, then whenever your class size is not a multiple of 16 you get a lot of "padding" between your classes. Since the processor loads 64 bytes at a time, you are guaranteed that the padding will be loaded too, wasting cache space and bandwidth. It's better to use containers that put data in a contiguous piece of memory.

You can get most of what class inheritance gives you by using alias this on structs:

struct base { /* data */ }
struct example {
    base foo;
    alias foo this;
}

Jacob is correct to point out that memory management should not be overlooked. DMD got a big improvement when it switched to a bump-the-pointer allocator: https://www.drdobbs.com/cpp/increasing-compiler-speed-by-over-75/240158941

After that article I read another blog post inspired by it, where the author preallocated a bunch of memory and saw an over 100% speed improvement. Because we don't have profiling information, our advice can only be vague, not specific.
May 11
IGotD- <nise nise.com> writes:
On Monday, 11 May 2020 at 11:03:26 UTC, welkam wrote:
 On Monday, 11 May 2020 at 08:09:24 UTC, Jacob Carlborg wrote:
 It seems very class centric.
 [...]
This is one of my biggest discontents with the D design: classes are forced to be reference types (it is actually a value type where the pointer is wrapped in the base Object struct). The only way to have a class expanded inside the host class is to use a struct, but structs are limited in many ways. Classes expanded in host classes have worked well in C++, so I don't really understand the design decision here.

Heap-allocated member classes can sometimes be beneficial in terms of resource handling (when shared by several classes, for example), but most of the time it is better to expand them inside the class. So if a class has several member classes, all of them need to be allocated, which takes its toll on performance. It's a Java-ish approach, and it certainly has its performance drawbacks.

There have been discussions about increasing the capabilities of structs so that they can match classes better and people can use structs more, but those changes have been rejected.
May 11