digitalmars.D.learn - problem with gc?
- zhmt (19/19) May 26 2015 I am writing an echo client, as below:
- zhmt (2/2) May 26 2015 I noticed that the cpu% falls from 99% down to 4% as well when
- zhmt (2/4) May 26 2015 I tried again several times; the cpu is still busy, 98.9%.
- thedeemon (6/8) May 27 2015 Yes, on the first sight it looks like your allocations in the
- jklp (33/52) May 27 2015 Have you tried to declare str in the upper scope ?
- zhmt (3/21) May 27 2015 I have tried this, and the throughput falls as well. Does
- zhmt (58/58) May 27 2015 When I enable -profile, I get something like this; it doesn't
- zhmt (2/2) May 27 2015 If I pass a timeout with 1ms to epoll_wait,the cpu will not be
- Anonymouse (6/9) May 27 2015 Tried callgrind and kcachegrind? If nothing else it's better at
- zhmt (57/66) May 27 2015 @Anonymouse
- zhmt (1/1) May 27 2015 It seems that the dlang library is not so efficient?
- Adam D. Ruppe (4/5) May 27 2015 The gamelibd could be doing a lot more than just echoing... it
- Anonymouse (6/14) May 27 2015 Without touching on the actual profile (which I'll leave to those
- Steven Schveighoffer (6/8) May 27 2015 I don't see any GC function here, I don't think you are are focusing on
- Kagamin (3/14) May 27 2015 You can use sformat
- Daniel Kozák via Digitalmars-d-learn (14/42) May 27 2015 what happens if you use sformat instead?
- zhmt (4/19) May 27 2015 @Daniel Kozák
- "Márcio Martins" (4/5) May 27 2015 The code you posted is the client code, but the issue seems to be
- zhmt (31/36) May 27 2015 @Márcio Martins
- zhmt (7/7) May 27 2015 I think it is not problem of gc, it is my fault:
- zhmt (9/16) May 28 2015 I found the root of the problem :
- zhmt (1/1) May 28 2015 Thanks for all help.
- Ali Çehreli (4/5) May 28 2015 Thank you for debugging and reporting. :) I am sure this will be helpful...
- yawniek (2/6) May 28 2015 checkout msgpack-rpc-d, i remember handling around 300k req/s
I am writing an echo client, as below:

    Ptr!Conn conn = connect("127.0.0.1",8881);
    ubyte[100] buf;
    for(int i=0; i<N; i++)
    {
        scope string str = format("%s",i);
        conn.write((cast(ubyte*)str.ptr)[0..str.length]);
        conn.read(buf[0..str.length]);
        n++;
    }
    conn.close();

When it loops more than about 100000 times, the throughput falls from 60000 requests/second to 26 requests/second.

If the code changes a little, as below:

    scope string str = format("%s",10000); //changed to a constant

it runs normally at 60000 requests/second. I tested for 13 minutes and everything seemed fine.

What happens when the code changes this little? Could someone give an explanation? Thanks a lot.
May 26 2015
I noticed that the CPU usage falls from 99% down to 4% as well when the throughput falls.
May 26 2015
On Wednesday, 27 May 2015 at 05:51:21 UTC, zhmt wrote:
> I noticed that the CPU usage falls from 99% down to 4% as well when the throughput falls.

I tried again several times; the CPU is still busy, 98.9%.
May 26 2015
On Wednesday, 27 May 2015 at 05:48:13 UTC, zhmt wrote:
> What happens when the code changes this little? Could someone give an explanation? Thanks a lot.

Yes, at first sight it looks like your allocations in the loop make the GC spend too much time. I don't think "scope" does anything here. Try adding "delete str;" at the end of the loop; it should do what your "scope" meant. I know it's deprecated, but sometimes it's just right.
May 27 2015
On Wednesday, 27 May 2015 at 05:48:13 UTC, zhmt wrote:
> [...] When it loops more than about 100000 times, the throughput falls from 60000 requests/second to 26 requests/second. [...]

Have you tried declaring str in the upper scope?

---
Ptr!Conn conn = connect("127.0.0.1",8881);
ubyte[100] buf;
string str;
for(int i=0; i<N; i++)
{
    str = format("%s",i);
    conn.write(cast(ubyte[]) str);
    conn.read(buf[0..str.length]);
    n++;
}
conn.close();
---

You could also surround the whole block with `GC.disable` and `GC.enable`. This would help determine whether the GC is involved:

---
Ptr!Conn conn = connect("127.0.0.1",8881);
GC.disable;
ubyte[100] buf;
string str;
for(int i=0; i<N; i++)
{
    str = format("%s",i);
    conn.write(cast(ubyte[]) str);
    conn.read(buf[0..str.length]);
    n++;
}
GC.enable;
conn.close();
---
May 27 2015
jklp wrote:
> And also you could try to surround the whole block with `GC.disable` and `GC.enable`. This would help determine whether the GC is involved: [...]

I have tried this, and the throughput falls just the same. Does this mean the GC is not involved? I am confused.
May 27 2015
When I enable -profile, I get something like this; it doesn't give me much help:

======== Timer Is 3579545 Ticks/Sec, Times are in Microsecs ========

  Num Calls     Tree Time    Func Time   Per Call
   1298756     4996649885   4987577377       3840  int gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.writeSome(ubyte[])
  30358909    14353678347   4433441425        146  void gamelibd.net.linux.linuxconn.selectAndProcessNetEvents(ulong)
  30359249     1670090296   1582462941         52  long gamelibd.util.utcNow()
   1298744      874889280    858583339        661  _D8gamelibd3net5linux12TcpLinuxConn12TcpLinuxConn8readSomeMFAhZ9__lambda2MFZv
   1300549          25997    803742026        618  int gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.readSome(ubyte[])
   2597506      756203332    702092233        270  void gamelibd.net.linux.epollapi.changeEv(int, int, void*, int)
  30358909     9920075665    285890739          9  void gamelibd.net.linux.linuxconn.handleEvent(gamelibd.net.linux.epollapi.epoll_event[])
   1298744     5131800498    244989946        188  void gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.doRead(gamelibd.net.linux.epollapi.epoll_event*)
   1298768              0    223952199        172  void gamelibd.net.linux.IoEventHandler.IoEventHandler.autoReaderFiberSetting(void delegate())
   1298756     5207337081    210684924        162  int gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.write(ubyte[])
  30358953      162443804    162443804          5  property bool gamelibd.util.LinkedList!(gamelibd.net.exceptionsafefiber.ExceptionSafeFiber).LinkedList.isEmpty()
    648485      989705094    139972636        215  int gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.read(ubyte[])
   1298919     1008643391    124807374         96  void gamelibd.net.exceptionsafefiber.ExceptionSafeFiber.resume()
  30359072       88431872     88431872          2  pure nothrow nogc safe gamelibd.net.exceptionsafefiber.TimerTask std.container.rbtree.__T12RedBlackTreeTC8gamelibd3net18exceptionsafefiber9TimerTaskVAyaa25_612e737461727454696d65203c20622e737461727454696d65Vbi1Z.RedBlackTree.front()
   1298750     1574897328     87384898         67  void gamelibd.net.linux.IoEventHandler.IoEventHandler.tryResumeReaderFiber()
  30359072       77325915     77325915          2  pure nothrow property nogc safe ulong std.container.rbtree.__T12RedBlackTreeTC8gamelibd3net18exceptionsafefiber9TimerTaskVAyaa25_612e737461727454696d65203c20622e737461727454696d65Vbi1Z.RedBlackTree.length()
   4544831       61560347     61560347         13  pure safe bool std.exception.enforce!(bool).enforce(bool, lazy object.Throwable)
   2597500       51905642     51905642         19  void gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.throwExceptionIfErrAfterOp(long)
   1298951       49392056     49392056         38  gamelibd.net.exceptionsafefiber.ExceptionSafeFiber gamelibd.net.exceptionsafefiber.ExceptionSafeFiber.getThis()
   2597518       80877181     46297298         17  gamelibd.mem.Ptr!(gamelibd.net.provider.Conn).Ptr.getinout(ref property inout(gamelibd.net.provider.Conn) function())
   2597693       51725730     37629038         14  gamelibd.mem.Ptr!(gamelibd.net.exceptionsafefiber.ExceptionSafeFiber).Ptr.getinout(ref property inout(gamelibd.net.exceptionsafefiber.ExceptionSafeFiber) function())
   1298756      479941655     34717468         26  void gamelibd.net.linux.IoEventHandler.IoEventHandler.enableRead()
May 27 2015
If I pass a timeout of 1 ms to epoll_wait, the CPU will not be busy when the throughput falls.
May 27 2015
On Wednesday, 27 May 2015 at 08:42:01 UTC, zhmt wrote:
> When I enable -profile, I get something like this; it doesn't give me much help: [...]

Tried callgrind and kcachegrind? If nothing else it's better at illustrating the same output, assuming you have graphviz/dot installed. Given --dump-instr you can also get instruction-level profiling if you're really wondering what's happening.
May 27 2015
On Wednesday, 27 May 2015 at 09:39:42 UTC, Anonymouse wrote:
> On Wednesday, 27 May 2015 at 08:42:01 UTC, zhmt wrote:
> > When I enable -profile, I get something like this; it doesn't give me much help: [...]
>
> Tried callgrind and kcachegrind? If nothing else it's better at illustrating the same output, assuming you have graphviz/dot installed. Given --dump-instr you can also get instruction-level profiling if you're really wondering what's happening.

@Anonymouse Thank you very much. I have tried this:

valgrind --tool=callgrind ./gamelibdtest
callgrind_annotate callgrind.out.29234

--------------------------------------------------------------------------------
         Ir  file:function
--------------------------------------------------------------------------------
124,918,002  ???:_D9invariant12_d_invariantFC6ObjectZv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 27,235,809  ???:_D2gc2gc2GC6mallocMFNbmkPmxC8TypeInfoZPv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 22,484,758  /mnt/hgfs/mydoc/dworkspace/gamelibdtest//usr/include/dmd/phobos/std/format.d:_D3std6format74__T14formatUnsignedTS3std6format16__T7sformatTaTiZ7sformatFAaxAaiZ4SinkTaZ14formatUnsignedFNaNfS3std6format16__T7sformatTaTiZ7sformatFAaxAaiZ4SinkmKS3std6format18__T10FormatSpecTaZ10FormatSpeckbZv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 19,171,341  ???:_D2gc2gc3Gcx10smallAllocMFNbhKmkZPv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 16,334,730  ???:pthread_mutex_lock [/usr/lib64/libpthread-2.17.so]
 14,519,760  ???:pthread_mutex_unlock [/usr/lib64/libpthread-2.17.so]
 14,503,272  /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/epollapi.d:_D8gamelibd3net5linux8epollapi8bitExistFiiZb [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 13,521,278  /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/TcpLinuxConn.d:_D8gamelibd3net5linux12TcpLinuxConn12TcpLinuxConn8readSomeMFAhZi'2 [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 12,695,373  ???:_D2gc2gc3Gcx10smallAllocMFNbhKmkZ8tryAllocMFNbZb [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 12,558,147  /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/linuxconn.d:_D8gamelibd3net5linux9linuxconn11handleEventFAS8gamelibd3net5linux8epollapi11epoll_eventZv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 12,091,173  ???:_D4core6thread5Fiber39__T4callVE4core6thread5Fiber7Rethrowi1Z4callMFZC6object9Throwable [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 11,153,058  /mnt/hgfs/mydoc/dworkspace/gamelibdtest//usr/include/dmd/phobos/std/format.d:_D3std6format18__T10FormatSpecTaZ10FormatSpec75__T17writeUpToNextSpecTS3std6format16__T7sformatTaTiZ7sformatFAaxAaiZ4SinkZ17writeUpToNextSpecMFNaNfS3std6format16__T7sformatTaTiZ7sformatFAaxAaiZ4SinkZb [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 10,877,409  /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/epollapi.d:_D8gamelibd3net5linux8epollapi8changeEvFiiPviZv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 10,676,006  /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/TcpLinuxConn.d:_D8gamelibd3net5linux12TcpLinuxConn12TcpLinuxConn9writeSomeMFAhZi [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
 10,279,096  ???:pthread_getspecific [/usr/lib64/libpthread-2.17.so]
  9,741,852  ???:__GI_memset [/usr/lib64/libc-2.17.so]
  9,645,888  ???:_d_arraycopy [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
  9,430,244  ???:_D2gc2gc3Gcx5sweepMFNbZm [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
  8,866,869  ???:_D4core6atomic50__T11atomicStoreVE4core6atomic11MemoryOrderi0TbTbZ11atomicStoreFNaNbNiKObbZv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
  8,661,662  /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/TcpLinuxConn.d:_D8gamelibd3net5linux12TcpLinuxConn12TcpLinuxConn5writeMFAhZi [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
May 27 2015
On Wednesday, 27 May 2015 at 10:27:08 UTC, zhmt wrote:
> It seems that the dlang library is not so efficient?

gamelibd could be doing a lot more than just echoing... it sounds to me like your socket might be blocking, and epoll is busy-looping waiting for it to become available again.
May 27 2015
On Wednesday, 27 May 2015 at 10:24:59 UTC, zhmt wrote:
> @Anonymouse Thank you very much. I have tried this:
>
> valgrind --tool=callgrind ./gamelibdtest
> callgrind_annotate callgrind.out.29234
> [...]

Without touching on the actual profile (which I'll leave to those more knowledgeable than I), open those callgrind.out.pid.* files in KCachegrind to get call graphs. See https://imgur.com/AZEutCE,XXIDBvB,qehWaHg,bpTxS,3YfcJdh for some examples.
May 27 2015
On 5/27/15 2:42 AM, zhmt wrote:
> When I enable -profile, I get something like this; it doesn't give me much help:

I don't see any GC function here; I don't think you are focusing on the right portion of the code. It seems the gamelibd library is consuming all the time. You may want to check with the author of that code for help.

-Steve
May 27 2015
On Wednesday, 27 May 2015 at 05:48:13 UTC, zhmt wrote:
> [...]
>     scope string str = format("%s",i);
> [...]

You can use sformat.
May 27 2015
On Wed, 27 May 2015 05:48:11 +0000, zhmt via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
> [...] What happens when the code changes this little? Could someone give an explanation? Thanks a lot.

What happens if you use sformat instead?

Ptr!Conn conn = connect("127.0.0.1",8881);
ubyte[100] buf;
char[100] buf2;
for(int i=0; i<N; i++)
{
    auto str = sformat(buf2, "%s",i);
    conn.write((cast(ubyte*)str.ptr)[0..str.length]);
    conn.read(buf[0..str.length]);
    n++;
}
conn.close();
May 27 2015
@Daniel Kozák Thanks for the reply. I have tried sformat, but it doesn't work: the throughput still falls, and the CPU stays busy.

> What happens if you use sformat instead?
> [...]
May 27 2015
On Wednesday, 27 May 2015 at 05:48:13 UTC, zhmt wrote:
> [...]

The code you posted is the client code, but the issue seems to be on the server side. Can you post the server code and also the timing code?
May 27 2015
On Wednesday, 27 May 2015 at 19:04:53 UTC, Márcio Martins wrote:
> The code you posted is the client code, but the issue seems to be on the server side. Can you post the server code and also the timing code?

@Márcio Martins Here is the server code:

void echoServer()
{
    Acceptor acc2 = new Acceptor();
    acc2.listen("0.0.0.0",8881,100);
    acc2.accept((Ptr!Conn c){
        scope(exit) c.close();
        ubyte[5] buf;
        while(true)
        {
            int n = c.readSome(buf);
            if(n<=0){
                break;
            }
            c.write(buf[0..n]);
        }
        writeFlush("close server sock");
    });
}

And I found the reason: the buffer size on the server is too small, just five bytes, so if the data is more than 5 bytes ("10000", for example), it needs two read operations. The length of "10000" is exactly 5, which perfectly matches the point where the throughput falls. If I change the buffer size to 100, it runs well. But why it falls so sharply, down to about 24, I will continue to study.
May 27 2015
I think it is not a problem of the GC; it is my fault:

The operations are serialized: client send -> server recv -> server send -> client recv, so if one operation takes too long, the throughput will definitely fall.

I can't explain why it is so fast when the buffer is big enough, and so slow when the buffer is smaller than the data size.
May 27 2015
On Thursday, 28 May 2015 at 02:00:57 UTC, zhmt wrote:
> I think it is not a problem of the GC; it is my fault: [...] I can't explain why it is so fast when the buffer is big enough, and so slow when the buffer is smaller than the data size.

I found the root of the problem: TCP_NODELAY. It is false by default; after I set it to true, the program runs exactly as I expect. Sorry, it was a low-level mistake.

The throughput is steady now: if the buffer size is set to 1, throughput is about 20K responses/second; when the buffer size is big enough, throughput is about 60K responses/second. That looks reasonable.
May 28 2015
On 05/28/2015 03:13 AM, zhmt wrote:
> Thanks for all the help.

Thank you for debugging and reporting. :) I am sure this will be helpful to many others.

Ali
May 28 2015
On Thursday, 28 May 2015 at 10:11:38 UTC, zhmt wrote:
> The throughput is steady now: if the buffer size is set to 1, throughput is about 20K responses/second; when the buffer size is big enough, throughput is about 60K responses/second.

Check out msgpack-rpc-d; I remember it handling around 300k req/s.
May 28 2015