
digitalmars.D - Re: Significant GC performance penalty

On Fri, Dec 14, 2012 at 07:27:26PM +0100, Rob T wrote:
 I created a D library wrapper for sqlite3 that uses a dynamically
 constructed result list for returned records from a SELECT
 statement. It works in a similar way to a C++ version that I wrote a
 while back.

Hmm, I seem to have heard wind of an existing D sqlite3 wrapper somewhere, but that may have been D1/Tango, I'm not sure. [...]
 I remembered reading about people having performance problems with
 the GC, so I tried a quick fix, which was to disable the GC before
 the SELECT is run and re-enable afterwards. The result of doing that
 was a 3x performance boost, making the DMD compiled version run
 almost as fast as the C++ version. The DMD compiled version is now
 only 2 seconds slower on my stress test runs of a SELECT that
 returns 200,000+ records with 14 fields. Not too bad! I may get
 identical performance if I compile using gdc, but that will have to
 wait until it is updated to 2.061.
 
 Fixing this was a major relief since the code is expected to be used
 in a commercial setting. I'm wondering though, why the GC causes
 such a large penalty, and what negative effect, if any, there will
 be when disabling the GC temporarily. I know that memory won't be
 reclaimed until the GC is re-enabled, but is there anything else to
 worry about?

AFAIK, it should be safe to disable the GC during that time, as long as you're aware of the possibility of running out of memory in the interim.

There's also the issue that a good, enterprise-quality GC is very, VERY hard to write, and especially hard for a language like D, which allows you to do system-level stuff like pointer casting and unions (though thanks to its advanced features you rarely need to do such things). This forces the GC to be conservative, which complicates it and hurts its performance. Given the difficulty of the task, our current GC leaves much to be desired. However, there has been talk of a (semi-)precise GC in the works, so hopefully we'll start getting a better GC in the near future.
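For reference, the disable/enable pattern itself is trivial; a minimal sketch (runBigSelect is just a stand-in for your fetch loop):

    import core.memory : GC;

    void runBigSelect()
    {
        GC.disable();             // no collection cycles from here on
        scope(exit) GC.enable();  // re-enabled even if we throw

        // ... fetch the 200,000+ rows here.  Allocations still go
        // through the GC heap; they just won't trigger a collection
        // until GC.enable() runs.
    }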
[...]
 Coming from C++ I *really* did not like having the GC; it made me
 very nervous. But now that I'm used to having it, I've come to like
 it, up to a point. It really does change the way you think and code.
 However, as I've discovered, you still always have to think about
 memory management issues, because the GC can incur a huge performance
 penalty in certain situations. I also NEED to know that I can always
 go full manual where necessary. There's no way I would want to give
 up that kind of control.

Totally understand what you mean. I also came from C/C++, and the fact that D relies on a GC actually put me off trying D for some time. It took a while before I was convinced to at least give it a try. There was a particular article that convinced me, but I can't recall which one right now. Basically, the point was that having a GC frees your mind from constantly worrying about memory management, letting you think about the actual algorithm you're working on. It also eliminates the memory leaks that come from careless coding -- which happens all too often in C/C++, as shown in my day job, where we're constantly chasing down memory-leak bugs. We're all human, after all, and prone to slip-ups now and then. All it takes is a single slip, and your app will eventually eat up all memory and bring down the system -- usually on the customer's live environment, which is really the only place where your code runs long enough for the bug to show up. (QA is theoretically supposed to test for this, but doesn't have that luxury due to release deadlines.)

After having gotten used to D and its GC, I have to say that my coding is much more efficient. I tend to use string operations quite often, and it's a big relief not to have to manage memory for the strings manually. (String manipulation is a royal pain in C/C++, so much so that sometimes I resort to Perl to get the job done.)

Having said that, though, I agree that there *are* times when you want to, and *need* to, manage memory manually. A GC relieves you of manual memory management in the general case, but when optimizing the hotspots in your code, nothing beats a hand-crafted memory management scheme designed specifically for what you're doing. For that, D does let you call the C library's malloc() and free() yourself and manage the pointers manually. You can then use Phobos' emplace function to construct D objects in your manually-allocated memory blocks, and thus still enjoy some of D's advanced features. You can, of course, also temporarily turn off the GC during time-sensitive code where you don't want a collection cycle to start on you.
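To illustrate, the malloc+emplace combination goes roughly like this (an untested sketch; Row is a made-up class):

    import core.stdc.stdlib : malloc, free;
    import std.conv : emplace;

    class Row
    {
        int id;
        this(int id) { this.id = id; }
    }

    void main()
    {
        enum size = __traits(classInstanceSize, Row);

        // Grab a raw, GC-free block and construct the object in it.
        auto mem = malloc(size)[0 .. size];
        auto row = emplace!Row(mem, 42);

        // ... use row like any other D object ...

        destroy(row);   // run the destructor by hand
        free(mem.ptr);  // hand the memory back to the C heap
    }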
[...]
 What I have not yet had the opportunity to explore is using D in
 full manual memory management mode. My understanding is that if I
 take that route, then I cannot use certain parts of the std lib, and
 will also lose a few of the nice features of D that make it fun to
 work with. I'm not fully clear, though, on what to expect, so if
 there's any detailed information to look at, it would be a big help.

You could ask Manu, who IIRC uses full manual memory management in D. Or search the forums for using D without the GC -- I think somebody has posted the details before.
 I wonder what can be done to allow a programmer to go fully manual,
 while not losing any of the nice features of D?

I don't think you'll be able to get 100% of D's features without a GC. Some features are simply too complicated to implement otherwise, such as array slicing + free appending. Hopefully more of D will be usable once Andrei (or whoever it was) works out the custom allocators design for Phobos.

I think as of right now, the functions in std.range and std.algorithm should all be GC-free, as long as you don't use things like delegates. (I believe Jonathan has said that if any std.range or std.algorithm function allocates memory implicitly, it should be considered a bug.)
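For instance, a lazy pipeline like the following shouldn't touch the GC heap at all (a quick sketch, untested):

    import std.algorithm : filter, map;
    import std.range : iota;

    void main()
    {
        // Lazy ranges: no intermediate arrays are allocated; each
        // element is computed on demand as the loop pulls it.
        auto squares = iota(0, 1_000_000)
                       .filter!(n => (n & 1) == 0)
                       .map!(n => cast(long) n * n);

        long total = 0;
        foreach (x; squares)
            total += x;
    }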
On Fri, Dec 14, 2012 at 08:24:38PM +0100, Rob T wrote:
 On Friday, 14 December 2012 at 18:46:52 UTC, Peter Alexander wrote:

I avoid using the GC when using D and I feel like I still have a
lot of freedom of expression, but maybe I'm just used to it.

 I'd like to do that too and wish I had your experience with how you
 are going about it. My fear is that I'll end up allocating without
 knowing it and having my apps silently eat up memory over time. At
 the end of the day, there's just no point in having a GC if you don't
 want to use it, so the big question is whether a GC can be made to
 work much better than what we have. Supposedly yes, but will the
 improvements really matter? I somehow doubt they will.

For me, one big plus of the GC is that I can actually concentrate on improving my algorithms instead of being bogged down constantly by memory management issues. I think that has led me to write much better code than when I was coding in C/C++. It has also eliminated those annoying pointer bugs and memory leaks, and the countless hours spent debugging them.

But, as with anything worthwhile in programming, the GC comes at a cost, so sometimes you will suffer performance degradation. Like I said in another thread, though, your program's hotspots are often not where you think they are; you need actual profiling to find them. Once you locate them, you can apply workarounds: temporarily disable the GC, switch to manual memory management, etc.

I don't think GCs will ever get to the point where they are both maximally performant *and* require no effort from the programmer. There are only two ways to implement that, and only the third one works. :-P
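dmd even ships with a built-in profiler: compile with -profile, run the program, and per-function timings land in trace.log. Something like this (myapp.d standing in for your program):

    dmd -profile myapp.d
    ./myapp
    # per-function timings are now in trace.log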
 When I look at GC-based apps, what they all seem to have in common
 is that they tend to eat up vast amounts of RAM for nothing and
 perform poorly. I'm speaking mostly about Java apps, which are
 terrible with performance and memory footprint in general. But C++
 apps that use a built-in GC tend to have similar issues, too.

If you're worried about performance, you might want to consider using GDC or LDC. IME, GDC consistently produces D executables that are at least 20-30% faster than what DMD produces, simply because GCC has a far more advanced optimization framework in its backend. Sometimes it can be 40-50% faster, though YMMV.
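For reference, the invocations are along these lines (exact flag spellings may vary between compiler versions):

    dmd  -O -release -inline myapp.d
    gdc  -O3 -frelease -o myapp myapp.d
    ldc2 -O3 -release myapp.d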
 It may be that the GC concept works far better in theory than in
 practice. Ironically, the workarounds for its performance penalty
 may lead you to write better-performing apps, but that's NOT the
 intention of having a GC!

Well, everything comes at a cost. :-) The GC lets you develop programs faster with less pain (and virtually no memory-related bugs), but you have to pay in performance. Manual memory management lets you maximize performance, but then you have to pay in countless headaches over finding pointer bugs and memory leaks. You can't have both. :) (Unless it's both bad performance *and* pointer-bug headaches. :-P)

A good middle ground is to use the GC for the common case where performance isn't important, and to optimize with manual memory management in the hotspots where it matters. (And make sure you profile before you do anything, 'cos like I said, your hotspots often aren't where you think they are. I learnt that the hard way.)


T

--
Why do conspiracy theories always come from the same people??
Dec 14 2012