www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - problem with gc?

reply "zhmt" <zhmtzhmt qq.com> writes:
I am writing a echoclient, as below:

Ptr!Conn conn = connect("127.0.0.1",8881);
ubyte[100] buf;
for(int i=0; i<N; i++)
{
	scope string str = format("%s",i);
	conn.write((cast(ubyte*)str.ptr)[0..str.length]);
	conn.read(buf[0..str.length]);
	n++;
}
conn.close();


When it loops about more than 100000 times,the throughput will 
fall from 60000request/second to 26request/second.

If the code changes a little as below:

  scope string str = format("%s",10000); //changes to a constant

It will runs normally with speed of 60000request/second. I test 
for 13minutes, everything seems well.



What happened when the code changes a little? Who will give an 
explaination,Thanks a lot?
May 26 2015
next sibling parent reply "zhmt" <zhmtzhmt qq.com> writes:
I noticed that the cpu% falls from 99% down to 4% as well when 
the throughput falls down.
May 26 2015
parent "zhmt" <zhmtzhmt qq.com> writes:
On Wednesday, 27 May 2015 at 05:51:21 UTC, zhmt wrote:
 I noticed that the cpu% falls from 99% down to 4% as well when 
 the throughput falls down.
I tried again for several times, the cpu is still busy, 98.9%.
May 26 2015
prev sibling next sibling parent "thedeemon" <dlang thedeemon.com> writes:
On Wednesday, 27 May 2015 at 05:48:13 UTC, zhmt wrote:

 What happened when the code changes a little? Who will give an 
 explaination,Thanks a lot?
Yes, on the first sight it looks like your allocations in the loop make GC spend too much time. I don't think "scope" does anything here. Try adding "delete str;" to the end of the loop, it should do what your "scope" meant. I know it's deprecated but sometimes it's just right.
May 27 2015
prev sibling next sibling parent reply "jklp" <jklp nowhere.fr> writes:
On Wednesday, 27 May 2015 at 05:48:13 UTC, zhmt wrote:
 I am writing a echoclient, as below:

 Ptr!Conn conn = connect("127.0.0.1",8881);
 ubyte[100] buf;
 for(int i=0; i<N; i++)
 {
 	scope string str = format("%s",i);
 	conn.write((cast(ubyte*)str.ptr)[0..str.length]);
 	conn.read(buf[0..str.length]);
 	n++;
 }
 conn.close();


 When it loops about more than 100000 times,the throughput will 
 fall from 60000request/second to 26request/second.

 If the code changes a little as below:

  scope string str = format("%s",10000); //changes to a constant

 It will runs normally with speed of 60000request/second. I test 
 for 13minutes, everything seems well.



 What happened when the code changes a little? Who will give an 
 explaination,Thanks a lot?
Have you tried to declare str in the upper scope ? --- Ptr!Conn conn = connect("127.0.0.1",8881); ubyte[100] buf; string str; for(int i=0; i<N; i++) { str = format("%s",i); conn.write(cast(ubyte[]) str); conn.read(buf[0..str.length]); n++; } conn.close(); --- And also you could try to surround the whole block with `GC.disable` and `GC.enable`. This would help to determine if the GC is involved: --- Ptr!Conn conn = connect("127.0.0.1",8881); GC.disable; ubyte[100] buf; string str; for(int i=0; i<N; i++) { str = format("%s",i); conn.write(cast(ubyte[]) str); conn.read(buf[0..str.length]); n++; } GC.enable; conn.close(); ---
May 27 2015
parent reply "zhmt" <zhmtzhmt qq.com> writes:
 jklp

 And also you could try to surround the whole block with 
 `GC.disable` and `GC.enable`. This would help to determine if 
 the GC is involved:

 ---
 Ptr!Conn conn = connect("127.0.0.1",8881);
 GC.disable;
 ubyte[100] buf;
 string str;
 for(int i=0; i<N; i++)
 {
     str = format("%s",i);
     conn.write(cast(ubyte[]) str);
     conn.read(buf[0..str.length]);
     n++;
 }
 GC.enable;
 conn.close();
 ---
I have tried this, and throughput falls down as well too. Does this mean the gc is not involved? I am confused.
May 27 2015
parent reply "zhmt" <zhmtzhmt qq.com> writes:
When I enable the --profle, get something like this, it doesnt 
give me too much help:

======== Timer Is 3579545 Ticks/Sec, Times are in Microsecs 
========

   Num          Tree        Func        Per
   Calls        Time        Time        Call

1298756  4996649885  4987577377        3840     int 
gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.writeSome(ubyte[])
30358909 14353678347  4433441425         146     void 
gamelibd.net.linux.linuxconn.selectAndProcessNetEvents(ulong)
30359249  1670090296  1582462941          52     long 
gamelibd.util.utcNow()
1298744   874889280   858583339         661     
_D8gamelibd3net5linux12TcpLinuxConn12TcpLinuxConn8readSomeMFAhZ9__lambda2MFZv
1300549       25997   803742026         618     int 
gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.readSome(ubyte[])
2597506   756203332   702092233         270     void 
gamelibd.net.linux.epollapi.changeEv(int, int, void*, int)
30358909  9920075665   285890739           9     void 
gamelibd.net.linux.linuxconn.handleEvent(gamelibd.net.linux.epollapi.epoll_event[])
1298744  5131800498   244989946         188     void 
gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.doRead(gamelibd.net.linux.epollapi.epoll_event*)
1298768           0   223952199         172     void 
gamelibd.net.linux.IoEventHandler.IoEventHandler.autoReaderFiberSetting(void 
delegate())
1298756  5207337081   210684924         162     int 
gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.write(ubyte[])
30358953   162443804   162443804           5      property bool 
gamelibd.util.LinkedList!(gamelibd.net.exceptionsafefiber.ExceptionSafeFiber).LinkedList.isEmpty()
  648485   989705094   139972636         215     int 
gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.read(ubyte[])
1298919  1008643391   124807374          96     void 
gamelibd.net.exceptionsafefiber.ExceptionSafeFiber.resume()
30359072    88431872    88431872           2     pure nothrow 
 nogc  safe gamelibd.net.exceptionsafefiber.TimerTask 
std.container.rbtree.__T12RedBlackTreeTC8gamelibd3net18exceptionsafefiber9TimerTaskVAyaa25_612e737461727454696d65203c20622e737461727454696d65Vbi1Z.RedBlackTree.front()
1298750  1574897328    87384898          67     void 
gamelibd.net.linux.IoEventHandler.IoEventHandler.tryResumeReaderFiber()
30359072    77325915    77325915           2     pure nothrow 
 property  nogc  safe ulong 
std.container.rbtree.__T12RedBlackTreeTC8gamelibd3net18exceptionsafefiber9TimerTaskVAyaa25_612e737461727454696d65203c20622e737461727454696d65Vbi1Z.RedBlackTree.length()
4544831    61560347    61560347          13     pure  safe bool 
std.exception.enforce!(bool).enforce(bool, lazy object.Throwable)
2597500    51905642    51905642          19     void 
gamelibd.net.linux.TcpLinuxConn.TcpLinuxConn.throwExceptionIfErrAfterOp(long)
1298951    49392056    49392056          38     
gamelibd.net.exceptionsafefiber.ExceptionSafeFiber 
gamelibd.net.exceptionsafefiber.ExceptionSafeFiber.getThis()
2597518    80877181    46297298          17     
gamelibd.mem.Ptr!(gamelibd.net.provider.Conn).Ptr.getinout(ref 
 property inout(gamelibd.net.provider.Conn) function())
2597693    51725730    37629038          14     
gamelibd.mem.Ptr!(gamelibd.net.exceptionsafefiber.ExceptionSafeFib
r).Ptr.getinout(ref 
 property 
inout(gamelibd.net.exceptionsafefiber.ExceptionSafeFiber) 
function())
1298756   479941655    34717468          26     void 
gamelibd.net.linux.IoEventHandler.IoEventHandler.enableRead()
May 27 2015
next sibling parent "zhmt" <zhmtzhmt qq.com> writes:
If I pass a timeout with 1ms to epoll_wait,the cpu will not be 
busy when throughput falls down.
May 27 2015
prev sibling next sibling parent reply "Anonymouse" <herp derp.nl> writes:
On Wednesday, 27 May 2015 at 08:42:01 UTC, zhmt wrote:
 When I enable the --profle, get something like this, it doesnt 
 give me too much help:
 [...]
Tried callgrind and kcachegrind? If nothing else it's better at illustrating the same output, assuming you have graphviz/dot installed. Given --dump-instr you can also get instruction-level profiling if you're really wondering what's happening.
May 27 2015
parent reply "zhmt" <zhmtzhmt qq.com> writes:
On Wednesday, 27 May 2015 at 09:39:42 UTC, Anonymouse wrote:
 On Wednesday, 27 May 2015 at 08:42:01 UTC, zhmt wrote:
 When I enable the --profle, get something like this, it doesnt 
 give me too much help:
 [...]
Tried callgrind and kcachegrind? If nothing else it's better at illustrating the same output, assuming you have graphviz/dot installed. Given --dump-instr you can also get instruction-level profiling if you're really wondering what's happening.
Anonymouse Thank u very much, I have tried this: valgrind --tool=callgrind ./gamelibdtest callgrind_annotate callgrind.out.29234 -------------------------------------------------------------------------------- Ir file:function -------------------------------------------------------------------------------- 124,918,002 ???:_D9invariant12_d_invariantFC6ObjectZv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 27,235,809 ???:_D2gc2gc2GC6mallocMFNbmkPmxC8TypeInfoZPv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 22,484,758 /mnt/hgfs/mydoc/dworkspace/gamelibdtest//usr/include/dmd/phobos/std/format.d:_D3std6format74__T14formatUnsignedTS3std6format16__T7sformatTaTiZ7sformatFAaxAaiZ4SinkTaZ14formatUnsignedFNaNfS3std6format16__T7sformatTaTiZ7sformatFAaxAaiZ4SinkmKS3std6format18__T10FormatSpe TaZ10FormatSpeckbZv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 19,171,341 ???:_D2gc2gc3Gcx10smallAllocMFNbhKmkZPv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 16,334,730 ???:pthread_mutex_lock [/usr/lib64/libpthread-2.17.so] 14,519,760 ???:pthread_mutex_unlock [/usr/lib64/libpthread-2.17.so] 14,503,272 /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/epollapi.d:_D8gamelibd3net5linux8ep llapi8bitExistFiiZb [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 13,521,278 /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/TcpLinuxConn.d:_D8gamelibd3net5linux12TcpLinuxConn12TcpLinuxC nn8readSomeMFAhZi'2 [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 12,695,373 ???:_D2gc2gc3Gcx10smallAllocMFNbhKmkZ8tryAllocMFNbZb [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 12,558,147 /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/linuxconn.d:_D8gamelibd3net5linux9linuxconn11handleEventFAS8gamelibd3net5linux8epo lapi11epoll_eventZv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 12,091,173 ???:_D4core6thread5Fiber39__T4callVE4core6thread5Fiber7Rethrowi1Z4callM ZC6object9Throwable [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 11,153,058 /mnt/hgfs/mydoc/dworkspace/gamelibdtest//usr/include/dmd/phobos/std/format.d:_D3std6format18__T10FormatSpecTaZ10FormatSpec75__T17writeUpToNextSpecTS3std6format16__T7sformatTaTiZ7sformatFAaxAaiZ4SinkZ17writeUpToNextSpecMFNaNfS3std6format16__T7sformatTaTiZ7sf rmatFAaxAaiZ4SinkZb [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 10,877,409 /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/epollapi.d:_D8gamelibd3net5linux8epoll pi8changeEvFiiPviZv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 10,676,006 /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/TcpLinuxConn.d:_D8gamelibd3net5linux12TcpLinuxConn12TcpLinux onn9writeSomeMFAhZi [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 10,279,096 ???:pthread_getspecific [/usr/lib64/libpthread-2.17.so] 9,741,852 ???:__GI_memset [/usr/lib64/libc-2.17.so] 9,645,888 ???:_d_arraycopy [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 9,430,244 ???:_D2gc2gc3Gcx5sweepMFNbZm [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 8,866,869 ???:_D4core6atomic50__T11atomicStoreVE4core6atomic11MemoryOrderi0TbTbZ11atom cStoreFNaNbNiKObbZv [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest] 8,661,662 /mnt/hgfs/mydoc/dworkspace/gamelibdtest/../gamelibd/source/gamelibd/net/linux/TcpLinuxConn.d:_D8gamelibd3net5linux12TcpLinuxConn12TcpL nuxConn5writeMFAhZi [/mnt/hgfs/mydoc/dworkspace/bin/gamelibdtest]
May 27 2015
next sibling parent reply "zhmt" <zhmtzhmt qq.com> writes:
It seems that  dlang library is not so effient?
May 27 2015
parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 27 May 2015 at 10:27:08 UTC, zhmt wrote:
 It seems that  dlang library is not so effient?
The gamelibd could be doing a lot more than just echoing... it sounds to me that your socket might be blocking and epoll is busy looping waiting for it to become available again.
May 27 2015
prev sibling parent "Anonymouse" <herp derp.nl> writes:
On Wednesday, 27 May 2015 at 10:24:59 UTC, zhmt wrote:
  Anonymouse

 Thank u very much, I have tried this:

  valgrind  --tool=callgrind ./gamelibdtest

 callgrind_annotate callgrind.out.29234

 --------------------------------------------------------------------------------
          Ir  file:function
 --------------------------------------------------------------------------------
 [...]
Without touching on the actual profile (which I'll leave to those better knowledgeable than I), open those callgrind.out.pid.* files in KCachegrind to get callgraphs. See https://imgur.com/AZEutCE,XXIDBvB,qehWaHg,bpTxS,3YfcJdh for some examples.
May 27 2015
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/27/15 2:42 AM, zhmt wrote:
 When I enable the --profle, get something like this, it doesnt give me
 too much help:
I don't see any GC function here, I don't think you are are focusing on the right portion of the code. Seems like the gamelib library is consuming all the time. You may want to check with the author of that code for help. -Steve
May 27 2015
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Wednesday, 27 May 2015 at 05:48:13 UTC, zhmt wrote:
 I am writing a echoclient, as below:

 Ptr!Conn conn = connect("127.0.0.1",8881);
 ubyte[100] buf;
 for(int i=0; i<N; i++)
 {
 	scope string str = format("%s",i);
 	conn.write((cast(ubyte*)str.ptr)[0..str.length]);
 	conn.read(buf[0..str.length]);
 	n++;
 }
 conn.close();
You can use sformat http://dlang.org/phobos/std_format.html#.sformat with your buffer.
May 27 2015
prev sibling next sibling parent reply Daniel =?UTF-8?B?S296w6Fr?= via Digitalmars-d-learn writes:
On Wed, 27 May 2015 05:48:11 +0000
zhmt via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:

 I am writing a echoclient, as below:
 
 Ptr!Conn conn = connect("127.0.0.1",8881);
 ubyte[100] buf;
 for(int i=0; i<N; i++)
 {
 	scope string str = format("%s",i);
 	conn.write((cast(ubyte*)str.ptr)[0..str.length]);
 	conn.read(buf[0..str.length]);
 	n++;
 }
 conn.close();
 
 
 When it loops about more than 100000 times,the throughput will 
 fall from 60000request/second to 26request/second.
 
 If the code changes a little as below:
 
   scope string str = format("%s",10000); //changes to a constant
 
 It will runs normally with speed of 60000request/second. I test 
 for 13minutes, everything seems well.
 
 
 
 What happened when the code changes a little? Who will give an 
 explaination,Thanks a lot?
what happend if you use sformat instead? Ptr!Conn conn = connect("127.0.0.1",8881); ubyte[100] buf; char[100] buf2; for(int i=0; i<N; i++) { auto str = sformat(buf2, "%s",i); conn.write((cast(ubyte*)str.ptr)[0..str.length]); conn.read(buf[0..str.length]); n++; } conn.close();
May 27 2015
parent "zhmt" <zhmtzhmt qq.com> writes:
 
 What happened when the code changes a little? Who will give an 
 explaination,Thanks a lot?
what happend if you use sformat instead? Ptr!Conn conn = connect("127.0.0.1",8881); ubyte[100] buf; char[100] buf2; for(int i=0; i<N; i++) { auto str = sformat(buf2, "%s",i); conn.write((cast(ubyte*)str.ptr)[0..str.length]); conn.read(buf[0..str.length]); n++; } conn.close();
Daniel Kozák Thanks for reply. I have tried sformat, but it doesnt work: throughput falls down, and the cpu keeps busy.
May 27 2015
prev sibling parent reply =?UTF-8?B?Ik3DoXJjaW8=?= Martins" <marcioapm gmail.com> writes:
On Wednesday, 27 May 2015 at 05:48:13 UTC, zhmt wrote:
 
The code you posted is the client code, but the issue seems to be on the server side. Can you post the server code and also the timing code?
May 27 2015
parent reply "zhmt" <zhmtzhmt qq.com> writes:
On Wednesday, 27 May 2015 at 19:04:53 UTC, Márcio Martins wrote:
 On Wednesday, 27 May 2015 at 05:48:13 UTC, zhmt wrote:
 
The code you posted is the client code, but the issue seems to be on the server side. Can you post the server code and also the timing code?
Márcio Martins here is the server code: void echoServer() { Acceptor acc2 = new Acceptor(); acc2.listen("0.0.0.0",8881,100); acc2.accept((Ptr!Conn c){ scope(exit) c.close(); ubyte[5] buf; while(true) { int n = c.readSome(buf); if(n<=0){ break; } c.write(buf[0..n]); } writeFlush("close server sock"); }); } and I found the reason: the buffer size on server is too small, just five,so if the data size is more than 5(10000 for example),it will need two read operation。 The length of “10000” is 5,it perfectly matched the point of falling down。 If change the buffer size to 100, it runs well。 But why it falls sharply down to about 24,I will continue to study。
May 27 2015
parent reply "zhmt" <zhmtzhmt qq.com> writes:
I think it is not problem of gc, it is my fault:
The operations is serialized:
clent send  -> server recv  -> server send  -> client recv,
so if one operation takes too long time, the throughput will 
definitely fall down.

I cant explain why it so fast when buffer is big enough, and so 
low when buffer is smaller than data size.
May 27 2015
parent reply "zhmt" <zhmtzhmt qq.com> writes:
On Thursday, 28 May 2015 at 02:00:57 UTC, zhmt wrote:
 I think it is not problem of gc, it is my fault:
 The operations is serialized:
 clent send  -> server recv  -> server send  -> client recv,
 so if one operation takes too long time, the throughput will 
 definitely fall down.

 I cant explain why it so fast when buffer is big enough, and so 
 low when buffer is smaller than data size.
I found the root of the problem : tcp_nodelay, it is default false. After i modify it to true, the program runs exactly as what I expect. sorry, it is a low level error. the throughput is steady now: if buffer size is set to 1, throughput is about 20K response/second; when buffer size is big enough ,the throughput is about 60K response/second. It looks like resonable.
May 28 2015
next sibling parent reply "zhmt" <zhmtzhmt qq.com> writes:
Thanks for all help.
May 28 2015
parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 05/28/2015 03:13 AM, zhmt wrote:
 Thanks for all help.
Thank you for debugging and reporting. :) I am sure this will be helpful to many others. Ali
May 28 2015
prev sibling parent "yawniek" <yawniek srtnwz.com> writes:
On Thursday, 28 May 2015 at 10:11:38 UTC, zhmt wrote:
 On Thursday, 28 May 2015 at 02:00:57 UTC, zhmt wrote:
 the throughput is steady now: if buffer size is set to 1, 
 throughput is about 20K response/second; when buffer size is 
 big enough ,the throughput is about 60K response/second.
checkout msgpack-rpc-d, i remember handling around 300k req/s
May 28 2015