
digitalmars.D - Significant GC performance penalty

reply "Rob T" <rob ucora.com> writes:
I created a D library wrapper for sqlite3 that uses a dynamically 
constructed result list for returned records from a SELECT 
statement. It works in a similar way to a C++ version that I 
wrote a while back.

The D code is idiomatic D, not a clone of my earlier C++ code; it 
makes use of many of D's features, and one of them is the garbage 
collector.

When running comparison tests between the C++ version and the D 
version, both compiled using performance optimization flags, the 
C++ version ran 3x faster than the D version, which was very 
unexpected. If anything, I was hoping for a performance boost from D, 
or at least the same performance.

I remembered reading about people having performance problems 
with the GC, so I tried a quick fix, which was to disable the GC 
before the SELECT is run and re-enable afterwards. The result of 
doing that was a 3x performance boost, making the DMD compiled 
version run almost as fast as the C++ version. The DMD compiled 
version is now only 2 seconds slower on my stress test runs of a 
SELECT that returns 200,000+ records with 14 fields. Not too bad! 
I may get identical performance if I compile using gdc, but that 
will have to wait until it is updated to 2.061.
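
For reference, the workaround amounts to something like this 
(Database, Row, and the function body are placeholders for my 
wrapper, not its real API; GC.disable and GC.enable from core.memory 
are the actual calls):

import core.memory : GC;

struct Database { void* handle; }  // placeholder for the sqlite3 wrapper
struct Row { string[14] fields; }  // placeholder record type

Row[] fetchAll(ref Database db, string sql)
{
    GC.disable();              // no collection cycles mid-query
    scope (exit) GC.enable();  // guaranteed re-enable, even on exception
    Row[] rows;
    // ... step through the sqlite3 result set, appending to rows ...
    return rows;
}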

Fixing this was a major relief since the code is expected to be 
used in a commercial setting. I'm wondering though, why the GC 
causes such a large penalty, and what negative effects, if any, there 
will be when disabling the GC temporarily. I know that 
memory won't be reclaimed until the GC is re-enabled, but is 
there anything else to worry about?

I feel it's worth commenting on my experience as feedback for 
the D developers and anyone else starting off with D.

Coming from C++ I *really* did not like having the GC, it made me 
very nervous, but now that I'm used to having it, I've come to 
like having it up to a point. It really does change the way you 
think and code. However, as I've discovered, you still have to 
always be thinking about memory management issues, because the GC 
can impose a huge performance penalty in certain situations. I 
also NEED to know that I can always go full manual where 
necessary. There's no way I would want to give up that kind of 
control.

The trade off with having a GC seems to be that by default, C++ 
apps will perform considerably faster than equivalent D apps 
out-of-the-box, simply because the manual memory management is 
fine tuned by the programmer as development proceeds. With D, when 
you simply let the GC take care of business, you are not necessarily 
fine tuning as you go along, and if you do not take the resulting 
performance hit into consideration, your apps will likely perform 
poorly compared to a C++ equivalent. However, building the equivalent 
app in D is a much 
more pleasant experience in terms of the programming productivity 
gain. The code is simpler to deal with, and there's less to worry 
about with pointers and other memory management issues.

What I have not yet had the opportunity to explore, is using D in 
full manual memory management mode. My understanding is that if I 
take that route, then I cannot use certain parts of the std lib, 
and will also lose a few of the nice features of D that make it 
fun to work with. I'm not fully clear though on what to expect, 
so if there's any detailed information to look at, it would be a 
big help.

I wonder what can be done to allow a programmer to go fully 
manual, while not losing any of the nice features of D?

Also, I think everyone agrees we really need a better GC, and I 
wonder once we do get a better GC, what kind of overall 
improvements we can expect to see?

Thanks for listening.

--rt
Dec 14 2012
next sibling parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
Allocating memory is simply slow. The same is true in C++ where 
you will see performance hits if you allocate memory too often. 
The GC makes things worse, but if you really care about 
performance then you'll avoid allocating memory so often.

Try to pre-allocate as much as possible, and use the stack 
instead of the heap where possible. Fixed size arrays and structs 
are your friend.
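
As a rough sketch of what I mean (both functions are made up for 
illustration):

// Heap version: a GC allocation on every call.
int sumHeap(size_t n)
{
    auto buf = new int[n];  // GC heap allocation
    foreach (i, ref x; buf) x = cast(int) i;
    int s = 0;
    foreach (x; buf) s += x;
    return s;
}

// Stack version: fixed-size array, no GC involvement at all.
int sumStack()
{
    int[14] buf;            // lives on the stack, freed on return
    foreach (i, ref x; buf) x = cast(int) i;
    int s = 0;
    foreach (x; buf) s += x;
    return s;
}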

I avoid using the GC when using D and I feel like I still have a 
lot of freedom of expression, but maybe I'm just used to it.
Dec 14 2012
parent reply "Rob T" <rob ucora.com> writes:
On Friday, 14 December 2012 at 18:46:52 UTC, Peter Alexander 
wrote:
 Allocating memory is simply slow. The same is true in C++ where 
 you will see performance hits if you allocate memory too often. 
 The GC makes things worse, but if you really care about 
 performance then you'll avoid allocating memory so often.

 Try to pre-allocate as much as possible, and use the stack 
 instead of the heap where possible. Fixed size arrays and 
 structs are your friend.
In my situation, I can think of some ways to mitigate the memory 
allocation problem, however it's a bit tricky when SELECT statement 
results have to be dynamically generated, since the number of rows 
returned and the size and type of the rows are always different 
depending on the query and the data stored in the database. It's just 
not practical to custom-fit each SELECT to a pre-allocated array or 
list; it would be far too much manual effort.

I could consider generating a free list of pre-allocated record 
components that is re-used rather than destroyed and reallocated. 
However, knowing how many records to pre-allocate is tricky, and I 
could run out, or waste tons of RAM for nothing most of the time.

End of day, I may be better off digging into the GC source code itself 
and looking for solutions.
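
Roughly, that free list might look something like this (Field and the 
pool layout are illustrative stand-ins, not my actual wrapper types):

alias Field = string;  // stand-in for whatever a record field holds

struct RecordPool
{
    private Field[][] bufs;  // recycled record buffers
    private size_t live;     // number of usable entries in bufs

    Field[] acquire(size_t fieldsPerRecord)
    {
        if (live > 0)
            return bufs[--live];            // reuse instead of reallocating
        return new Field[fieldsPerRecord];  // allocate only on a miss
    }

    void release(Field[] buf)
    {
        if (live == bufs.length)
            bufs ~= buf;                    // grow the recycle stack
        else
            bufs[live] = buf;
        ++live;
    }
}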
 I avoid using the GC when using D and I feel like I still have 
 a lot of freedom of expression, but maybe I'm just used to it.
I'd like to do that too and wish I had your experience with how you 
are going about it. My fear is that I'll end up allocating without 
knowing it and having my apps silently eat up memory over time.

At the end of the day, there's just no point in having a GC if you 
don't want to use it, so the big question is whether a GC can be made 
to work much better than what we have. Supposedly yes, but will the 
improvements really matter? I somehow doubt it will.

When I look at GC-based apps, what they all seem to have in common is 
that they tend to eat up vast amounts of RAM for nothing and perform 
poorly. I'm speaking mostly about Java apps; they are terrible with 
performance and memory footprint in general. But C++ apps that use a 
built-in GC tend to have similar issues.

It may be that the GC concept works far better in theory than in 
practice, although due to the performance penalty workarounds, you may 
end up writing better performing apps because of it. However, that's 
NOT the intention of having a GC!

--rt
Dec 14 2012
next sibling parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
 In my situation, I can think of some ways to mitigate the 
 memory allocation  problem, however it's a bit tricky when 
 SELECT statement results have to be dynamically generated, 
 since the number of rows returned and size and type of the rows 
 are always different depending on the query and the data stored 
 in the database. It's just not at all practical to custom fit 
 for each SELECT to a pre-allocated array or list, it'll just be 
 far too much manual effort.
Maybe I have misunderstood, but it sounds to me like you could get away with a single allocation there. Just reducing the number of allocations will improve things a lot.
 I avoid using the GC when using D and I feel like I still have 
 a lot of freedom of expression, but maybe I'm just used to it.
 I'd like to do that too and wish I had your experience with how you 
 are going about it. My fear is that I'll end up allocating without 
 knowing it and having my apps silently eat up memory over time.
This shouldn't be a problem. I occasionally recompile druntime with a printf inside the allocation function just to make sure, but normally I can tell if memory allocations are going on because of the sudden GC pauses.
 At the end of the day, there's just no point in having a GC if 
 you don't want to use it, so the big question is if a GC can be 
 made to work much better than what we have? Supposedly yes, but 
 will the improvements really matter? I somehow doubt it will.
D's GC has a lot of headroom for improvement. A generational GC will likely improve things a lot.
 When I look at GC based apps, what they all seem to have in 
 common, is that they tend to eat up vast amounts of RAM for 
 nothing and perform poorly. I'm speaking mostly about Java 
 apps, they are terrible with performance and memory foot print 
 in general. But also C++ apps that use built in GC tend to have 
 similar issues.
The problem with Java is not just the GC. Java eats up huge amounts of 
memory because it has no value types, so *everything* has to be 
allocated on the heap, and every object has 16 bytes of overhead (on 
64-bit systems) in addition to memory-manager overhead.

This is a great presentation on the subject of Java memory efficiency: 
http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf
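
D sidesteps part of this because it has value types. A minimal 
illustration (the types are made up for the example):

struct PointS { double x, y; }  // value type: 16 bytes, no object header
class  PointC { double x, y; }  // reference type: heap block plus header

void demo()
{
    PointS[1000] a;               // one contiguous, GC-free block
    auto b = new PointC[](1000);  // 1000 references...
    foreach (ref p; b)
        p = new PointC;           // ...each a separate headered heap object
}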
 It may be that the GC concept works far better in theory than 
 in practice, although due to the performance penalty 
 workarounds, you may end up writing better performing apps 
 because of it, however that's NOT the intention of having a GC!
When it comes to performance, there is always a compromise with usability. Even malloc performs poorly compared to more manual memory management. Even automatic register allocation by the compiler can lead to poor performance. The only question is where you want to draw the line between usability and performance.
Dec 14 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 14, 2012 at 09:08:16PM +0100, Peter Alexander wrote:
 On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
[...]
It may be that the GC concept works far better in theory than in
 practice, although due to the performance penalty workarounds,
you may end up writing better performing apps because of it,
however that's NOT the intention of having a GC!
 When it comes to performance, there is always a compromise with 
 usability. Even malloc performs poorly compared to more manual memory 
 management. Even automatic register allocation by the compiler can 
 lead to poor performance. The only question is where you want to draw 
 the line between usability and performance.
Yeah. If you want to squeeze out every last drop of juice your CPU's 
got to offer you, you could code directly in assembler, and no 
optimizing compiler, GC or no GC, will be able to beat that. But 
people stopped writing entire apps in assembler a long time ago. :-)

(I actually did that once, many years ago, for a real app that 
actually made a sale or two. It was a good learning experience, and 
helped me improve my coding skills just from knowing how the machine 
actually works under the hood, as well as learning why it's so 
important to write code in a well-structured way -- you have no choice 
when doing large-scale coding in assembler, 'cos otherwise your 
assembly code quickly devolves into a spaghetti paste soup that no 
human can possibly comprehend. So I'd say it was a profitable, even 
rewarding experience. But I wouldn't do it again today, given the 
choice.)

T

-- 
Ruby is essentially Perl minus Wall.
Dec 14 2012
next sibling parent "Rob T" <rob ucora.com> writes:
On Friday, 14 December 2012 at 20:33:33 UTC, H. S. Teoh wrote:
 (I actually did that once, many years ago, for a real app that 
 actually
 made a sale or two. It was a good learning experience, and 
 helped me
 improve my coding skills just from knowing how the machine 
 actually
 works under the hood, as well as learning why it's so important 
 to write
 code in a well-structured way -- you have no choice when doing
 large-scale coding in assembler, 'cos otherwise your assembly 
 code
 quickly devolves into a spaghetti paste soup that no human can 
 possibly
 comprehend. So I'd say it was a profitable, even rewarding 
 experience.
 But I wouldn't do it again today, given the choice.)


 T
Yeah, I did that too long ago, and I'm happy to have learned the 
skills, because it's the ultimate coding experience imaginable. If you 
don't do it very carefully, it goes all to hell just like you say. 
Best to let the machines do it these days; even if I could do it 10x 
better, it would take me hundreds of years to do what I can do now in 
a day.

Everyone, thanks for the responses. I got some great ideas already to 
try out. I think at the end of the day, my code will perform better 
than my old C++ version, simply because I will be considering the 
costs of memory allocations, which is something I never really thought 
about much before. I guess that's the positive side effect of the 
negative side effect of using a GC.

I agree, as many of you have commented, that having a GC is a pro-con 
trade off, positive in some ways, but not all. Optimize only where you 
need to, and let the GC deal with the rest.

--rt
Dec 14 2012
prev sibling parent reply "Paulo Pinto" <pjmlp progtools.org> writes:
On Friday, 14 December 2012 at 20:33:33 UTC, H. S. Teoh wrote:
 On Fri, Dec 14, 2012 at 09:08:16PM +0100, Peter Alexander wrote:
 On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
[...]
It may be that the GC concept works far better in theory than 
in
 practice, although due to the performance penalty workarounds,
you may end up writing better performing apps because of it,
however that's NOT the intention of having a GC!
 When it comes to performance, there is always a compromise with 
 usability. Even malloc performs poorly compared to more manual memory 
 management. Even automatic register allocation by the compiler can 
 lead to poor performance. The only question is where you want to draw 
 the line between usability and performance.

 Yeah. If you want to squeeze out every last drop of juice your CPU's 
 got to offer you, you could code directly in assembler, and no 
 optimizing compiler, GC or no GC, will be able to beat that.
I think it depends on what you're trying to achieve.

If coding for resource-constrained processors, or taking advantage of 
special SIMD instructions, then I agree.

On the other hand, if you're targeting processors with multiple 
execution units, instruction re-ordering, multiple cache levels, NUMA, 
..., then it is a whole other game trying to beat the compiler. And 
when you win, it will be for a specific combination of processor + 
motherboard + memory.

Usually the compiler is way better at keeping track of all possible 
instruction combinations for certain scenarios.

Well, this is just my opinion with my compiler-design aficionado hat 
on; some guys here might prove me wrong.

--
Paulo
Dec 14 2012
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 14, 2012 at 10:27:30PM +0100, Paulo Pinto wrote:
 On Friday, 14 December 2012 at 20:33:33 UTC, H. S. Teoh wrote:
[...]
Yeah. If you want to squeeze out every last drop of juice your CPU's
got to offer you, you could code directly in assembler, and no
optimizing compiler, GC or no GC, will be able to beat that.
 I think it depends on what you're trying to achieve. If coding for 
 resource-constrained processors, or taking advantage of special SIMD 
 instructions, then I agree.

 On the other hand, if you're targeting processors with multiple 
 execution units, instruction re-ordering, multiple cache levels, 
 NUMA, ..., then it is a whole other game trying to beat the compiler. 
 And when you win, it will be for a specific combination of processor 
 + motherboard + memory.
Yeah, that too. Coding in assembler also comes at the price of being 
tied to a specific version of a specific model of a specific brand of 
a specific vendor's CPU & motherboard. Like the OP stated, it may take 
you a few hundred years to produce your superior code, when what you 
write in 1 day with an optimizing compiler will probably perform close 
to or even match the handcrafted version, plus it has the advantage of 
being cross-platform.

Not to mention, the CPUs of the old days were designed with assembly 
language or low-level languages in mind. The modern CPUs of today were 
designed with optimizing compilers in mind. The ease (or rather, 
difficulty) of hand-coding for modern CPUs is not mere happenstance. 
:-)
 Usually the compiler is way better keeping track of all possible
 instruction combinations for certain scenarios.
 
 Well this is just my opinion with my compiler design aficionado on,
 some guys here might prove me wrong.
[...]

Well, I'm pretty sure that the difficulty (or rather, impossibility) 
of solving the halting problem, which is equivalent to the difficulty 
of global optimization (cf. Kolmogorov complexity), means that there 
will always be cases where the compiler won't generate optimal code. 
However, it's an open question whether humans can beat the compiler at 
its own game. Just because we can *sometimes* solve specific instances 
of the halting problem (or come close) by special insight doesn't mean 
that we'll always do better than the compiler in the general case.

T

-- 
Political correctness: socially-sanctioned hypocrisy.
Dec 14 2012
prev sibling parent reply "SomeDude" <lovelydear mailmetrash.com> writes:
On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
 On Friday, 14 December 2012 at 18:46:52 UTC, Peter Alexander 
 wrote:
 Allocating memory is simply slow. The same is true in C++ 
 where you will see performance hits if you allocate memory too 
 often. The GC makes things worse, but if you really care about 
 performance then you'll avoid allocating memory so often.

 Try to pre-allocate as much as possible, and use the stack 
 instead of the heap where possible. Fixed size arrays and 
 structs are your friend.
 In my situation, I can think of some ways to mitigate the memory 
 allocation problem, however it's a bit tricky when SELECT statement 
 results have to be dynamically generated, since the number of rows 
 returned and the size and type of the rows are always different 
 depending on the query and the data stored in the database. It's 
 just not practical to custom-fit each SELECT to a pre-allocated 
 array or list; it would be far too much manual effort.
Isn't the memory management completely negligible when compared to the database access here ?
Dec 15 2012
parent reply "Rob T" <rob ucora.com> writes:
On Sunday, 16 December 2012 at 05:37:57 UTC, SomeDude wrote:
 Isn't the memory management completely negligible when compared 
 to the database access here ?
Here are the details ...

My test run selects and returns 206,085 records with 14 fields per 
record.

With all dynamic memory allocations disabled that are used to create 
the data structure containing the returned rows, a run takes 5 
seconds. This does not return any data, but it runs through all the 
records in exactly the same way, returning into a temporary 
stack-allocated value of the appropriate type. If I disable the GC 
before the run and re-enable it immediately after, it takes 7 seconds. 
I presume a full 2 seconds are used to disable and re-enable the GC, 
which seems like a lot of time.

With all dynamic memory allocations enabled that are used to create 
the data structure containing the returned rows, a run takes 28 
seconds. In this case, all 206K records are returned in a dynamically 
generated list. If I disable the GC before the run and re-enable it 
immediately after, it takes 11 seconds. Since a full 2 seconds are 
used to disable and re-enable the GC, that leaves 9 seconds, and since 
5 seconds are used without memory allocations, the allocations account 
for 4 seconds; but then, I'm doing a lot of allocations.

In my case, the structure is dynamically generated by allocating each 
individual field of each record returned, so there's 206,085 records 
x 14 fields = 2,885,190 allocations being performed. I can cut the 
individual allocations down to about 206,000 by allocating the full 
record in one shot; however, this is a stress test designed to work D 
as hard as possible and compare it with an identically stressed C++ 
version.

Both the D and C++ versions perform identically with the GC disabled, 
after subtracting the 2 seconds the D version spends disabling and 
re-enabling the GC during the run.

I wonder why 2 seconds are used to disable and enable the GC? That 
seems like a very large amount of time. If I select only 5,000 
records, the time to disable and enable the GC drops to negligible 
levels, and a run takes the same amount of time whether the GC is 
disabled and re-enabled or left enabled all the time.

During all tests, I do not run out of free RAM, and at no point does 
the memory go to swap.

--rt
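
P.S. For what it's worth, the two allocation shapes look roughly like 
this (Value and the loop bodies are illustrative stand-ins, not my 
actual wrapper code):

alias Value = string;  // stand-in for the field type
enum nFields = 14;

// Per-field: rows x 14 separate GC allocations (the stress-test shape).
Value[][] perField(size_t rows)
{
    auto result = new Value[][](rows);
    foreach (ref rec; result)
        foreach (f; 0 .. nFields)
            rec ~= Value.init;       // each append can allocate
    return result;
}

// Per-record: one allocation per row, ~206K instead of ~2.9M.
Value[][] perRecord(size_t rows)
{
    auto result = new Value[][](rows);
    foreach (ref rec; result)
        rec = new Value[](nFields);  // all 14 fields in one shot
    return result;
}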
Dec 15 2012
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Rob T:

 I wonder why 2 seconds are used to disable and enable the GC?
If you want one more test, try putting an "exit(0);" at the end of 
your program (the C exit is in core.stdc.stdlib).

Bye,
bearophile
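
P.S. In code, the suggestion is simply this (runStressTest is a 
placeholder for the benchmark body; calling the C exit skips 
druntime's normal shutdown, including its final collection pass):

import core.stdc.stdlib : exit;

void runStressTest() { /* ... the benchmark body ... */ }

void main()
{
    runStressTest();
    exit(0);  // terminate here, before druntime shutdown runs
}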
Dec 16 2012
prev sibling next sibling parent "jerro" <a a.com> writes:
On Sunday, 16 December 2012 at 07:47:48 UTC, Rob T wrote:
 On Sunday, 16 December 2012 at 05:37:57 UTC, SomeDude wrote:
 Isn't the memory management completely negligible when 
 compared to the database access here ?
 Here are the details ...

[...]

 During all tests, I do not run out of free RAM, and at no point does 
 the memory go to swap.

 --rt
Adding and subtracting times like this doesn't give very reliable results. If you want to know how much time is taken by different parts of code, I suggest you use a profiler.
Dec 16 2012
prev sibling next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 16 December 2012 at 07:47:48 UTC, Rob T wrote:
 On Sunday, 16 December 2012 at 05:37:57 UTC, SomeDude wrote:
 Isn't the memory management completely negligible when 
 compared to the database access here ?
 Here are the details ...

[...]

 During all tests, I do not run out of free RAM, and at no point does 
 the memory go to swap.

 --rt
Use the StopWatch class from std.datetime to get a proper idea of 
where time is being spent. All this subtracting-2-secs business 
stinks.

Or just fire up a profiler.
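
For example, a minimal timing harness (this uses the std.datetime API 
current at the time; StopWatch later moved to std.datetime.stopwatch):

import std.datetime : StopWatch;
import std.stdio : writefln;

void main()
{
    StopWatch sw;
    sw.start();
    // ... the code under measurement ...
    sw.stop();
    writefln("elapsed: %s ms", sw.peek().msecs);
}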
Dec 16 2012
parent "Rob T" <rob ucora.com> writes:
On Sunday, 16 December 2012 at 11:43:20 UTC, John Colvin wrote:
 Use the stopwatch class from std.datetime to get a proper idea 
 of where time is being spent. All this subtracting 2 secs 
 business stinks.

 or just fire up a profiler.
I am using the stopwatch, but had not gotten around to wrapping 
things for the extra detail. The subtractions and so forth were 
roughly calculated on the fly while I was posting and noticing new 
things I hadn't noticed before. The fact is that disabling and 
re-enabling the GC added an extra 2 secs for some reason, so it's of 
interest to know why.

I'll do proper timing later and post the results here.

--rt
Dec 16 2012
prev sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Sunday, 16 December 2012 at 07:47:48 UTC, Rob T wrote:
 On Sunday, 16 December 2012 at 05:37:57 UTC, SomeDude wrote:
 Isn't the memory management completely negligible when 
 compared to the database access here ?
 Here are the details ...

[...]

 In my case, the structure is dynamically generated by allocating 
 each individual field of each record returned, so there's 206,085 
 records x 14 fields = 2,885,190 allocations being performed. I can 
 cut the individual allocations down to about 206,000 by allocating 
 the full record in one shot; however, this is a stress test designed 
 to work D as hard as possible and compare it with an identically 
 stressed C++ version.
You cannot expect the GC to perform like manual memory management. 
It's a completely unrealistic microbenchmark to allocate each 
individual field, even for manual MM. The least you can do to be a 
little bit realistic is indeed to allocate one row at a time. I hope 
that's what you intend to do.

But usually, database drivers allow the user to tweak the queries and 
decide how many rows can be fetched at a time, and it's pretty common 
to fetch 50 or 100 rows at a time, meaning only one allocation each 
time. It would be interesting to compare the performance of the two 
languages in these situations, i.e. one row at a time, and 50 rows at 
a time.
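
A batched fetch would look roughly like this (Row, Statement, and 
step() are stand-ins for whatever the driver exposes, not a real API):

struct Row { string[14] fields; }  // stand-in record type

// Stand-in statement wrapper: step() fills a row and returns false
// when the result set is exhausted.
interface Statement { bool step(ref Row r); }

Row[][] fetchBatched(Statement stmt, size_t batchSize = 100)
{
    Row[][] batches;
    for (;;)
    {
        auto batch = new Row[](batchSize);  // one allocation per batch
        size_t n = 0;
        while (n < batchSize && stmt.step(batch[n]))
            ++n;
        if (n > 0)
            batches ~= batch[0 .. n];
        if (n < batchSize)
            break;                          // result set exhausted
    }
    return batches;
}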
Dec 16 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Rob T:

 I wonder what can be done to allow a programmer to go fully 
 manual, while not losing any of the nice features of D?
Even the Rust language, which has a more powerful type system than D, 
with region analysis and more, sometimes needs localized reference 
counting (or a localized per-thread GC) to allow the usage of its full 
features. So I don't think you can have all the nice features of D 
without its GC.

I believe the D design has bet too much on its (not precise) GC. The 
design of Phobos & D now needs to show more love for stack allocations 
(see variable-length arrays, array literals, etc.), for some 
alternative allocators like reaps 
(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.7.6505), and 
so on. Someone implemented a stack-like data manager for D, but the 
voting didn't allow it into Phobos.

Bye,
bearophile
Dec 14 2012
prev sibling next sibling parent reply "Paulo Pinto" <pjmlp progtools.org> writes:
On Friday, 14 December 2012 at 18:27:29 UTC, Rob T wrote:
 I created a D library wrapper for sqlite3 that uses a 
 dynamically constructed result list for returned records from a 
 SELECT statement. It works in a similar way to a C++ version 
 that I wrote a while back.

[...]

 Thanks for listening.

 --rt
Having lots of experience in GC-enabled languages, even for systems 
programming (Oberon & Active Oberon), I think there are a few issues 
to consider:

- D's GC still has a lot of room to improve, so some of the issues you 
have found might eventually get fixed;

- having GC support does not mean calling new like crazy; one still 
needs to think about how to code in a GC-friendly way;

- make proper use of weak references in case they are available;

- GC-enabled language runtimes usually offer ways to peek into the 
runtime, somehow, and allow the developer to understand how the GC is 
working and what might be improved.

The goodness of having a GC is having a safer way to manage memory 
across multiple modules, especially when ownership is not clear. Even 
in C++ I seldom do manual memory management nowadays, if working on 
new codebases. Of course, others will have a different experience.

Other than that, thanks for sharing your experience.

--
Paulo
Dec 14 2012
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 14, 2012 at 08:27:46PM +0100, Paulo Pinto wrote:
[...]
 - Having GC support, does not mean to do call new like crazy, one
 still needs to think how to code in a GC friendly way;
It makes me think, though, that perhaps there is some way of 
optimizing the GC for recursive data structures where you only ever 
keep a reference to the head node, so that they can be managed in a 
much more efficient way than a structure where there may be an 
arbitrary number of references to anything inside. I think this is a 
pretty common case, at least in the kind of code I encounter 
frequently.

Also, coming from C/C++, I have to say that my coding style has been 
honed over the years to think in terms of single-ownership 
structures, so even when coding in D I tend to write code that way. 
However, having the GC available means that there are some cases 
where using multiple references to stuff will actually improve GC 
(and overall) performance by eliminating the need to deep-copy stuff 
everywhere.
 - GC enabled languages runtimes usually offer ways to peak into the
 runtime, somehow, and allow the developer to understand how GC is
 working and what might be improved;
[...]

Yeah, I think for most applications, it's probably good enough to use 
the functions in core.memory (esp. enable, disable, collect, and 
minimize) to exercise some control over the GC, so that you can use 
manual memory management in the important hotspots and just let the 
GC do its thing in less important parts of the program. I think 
core.memory's minimize will solve the OP's concern about GC'd apps 
having bad memory footprints.

T

-- 
Computers aren't intelligent; they only think they are.
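
P.S. For reference, those calls in context (GC.collect and 
GC.minimize are the real core.memory API; reclaimNow is just an 
example name):

import core.memory : GC;

void reclaimNow()
{
    GC.collect();   // run a collection at a point of our choosing
    GC.minimize();  // return unused pools to the OS, shrinking the footprint
}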
Dec 14 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 14, 2012 at 07:27:26PM +0100, Rob T wrote:
 I created a D library wrapper for sqlite3 that uses a dynamically
 constructed result list for returned records from a SELECT
 statement. It works in a similar way to a C++ version that I wrote a
 while back.
Hmm, I seem to have heard wind of an existing D sqlite3 wrapper somewhere, but that may have been D1/Tango, I'm not sure. [...]
 I remembered reading about people having performance problems with
 the GC, so I tried a quick fix, which was to disable the GC before
 the SELECT is run and re-enable afterwards. The result of doing that
 was a 3x performance boost, making the DMD compiled version run
 almost as fast as the C++ version. The DMD compiled version is now
 only 2 seconds slower on my stress test runs of a SELECT that
 returns 200,000+ records with 14 fields. Not too bad! I may get
 identical performance if I compile using gdc, but that will have to
 wait until it is updated to 2.061.
 
 Fixing this was a major relief since the code is expected to be used
 in a commercial setting. I'm wondering though, why the GC causes
 such a large penalty, and what negative effect if any if there will
 be when disabling the GC temporarily. I know that memory won't be
 reclaimed until the GC is re-enabled, but is there anything else to
 worry about?
AFAIK, it should be safe to disable the GC during that time, as long 
as you're aware of the possibility of running out of memory in the 
interim.

But there's also the issue that a good, enterprise-quality GC is very 
VERY hard to write, and especially hard for a language like D which 
allows you to do system-level stuff like pointer casting and unions 
(though thanks to its advanced features you rarely need to do such 
things). This forces the GC to be conservative, which complicates it 
and also affects its performance. The difficulty of the task means 
that our current GC leaves much to be desired. However, there's been 
talk of a (semi-)precise GC in the works, so hopefully we'll start 
getting a better GC in the near future.

[...]
 Coming from C++ I *really* did not like having the GC, it made me
 very nervous, but now that I'm used to having it, I've come to like
 having it up to a point. It really does change the way you think and
 code. However as I've discovered, you still have to always be
 thinking about memory management issues because the GC can eat up a
 huge performance penalty under certain situations. I also NEED to
 know that I can always go full manual where necessary. There's no
 way I would want to give up that kind of control.
Totally understand what you mean. I also came from C/C++, and the 
fact that D relies on a GC actually put me off trying out D for some 
time. It took me a while before being convinced to at least give it a 
try. There was a particular article that convinced me, but I can't 
recall which one it was right now. Basically, the point was that 
having a GC frees up your mind from having to constantly worry about 
memory management issues, and actually think about the actual 
algorithm you're working on. It also eliminates memory leakage that 
comes from careless coding -- which happens all too often in C/C++, 
as shown in my day job where we're constantly chasing down memory 
leak bugs.

We're all human, after all, and prone to slip-ups every now and then. 
All it takes is a single slip, and your app will eventually eat up 
all memory and bring down the system. Usually on the customer's live 
environment, which is really the only place where your code actually 
runs for sufficiently long periods of time for the bug to show up (QA 
theoretically is supposed to test this, but doesn't have that luxury 
due to release deadlines).

After having gotten used to D and its GC, I have to say that my 
coding is much more efficient. I tend to use string operations quite 
often, and it's quite a big relief to not have to constantly worry 
about managing memory for the strings manually. (String manipulation 
is a royal pain in C/C++, so much so that sometimes I resort to Perl 
to get the job done.)

Having said that, though, I agree that there *are* times when you 
want to, and *need* to, manage memory manually. A GC relieves you of 
manual memory management for the general case, but when optimizing 
the hotspots in your code, nothing beats a hand-crafted manual memory 
management scheme designed specifically for what you're doing. For 
that, D does let you call the C library's malloc() and free() 
yourself, and manage the pointers manually. You can then use Phobos' 
emplace function to create D objects in your manually-allocated 
memory blocks, and thus still enjoy some of D's advanced features to 
an extent. You can, of course, also temporarily turn off the GC 
during time-sensitive points where you don't want a collection cycle 
to start on you.
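
A minimal sketch of that malloc-plus-emplace route (Point is just an 
example class; malloc/free, emplace, and destroy are the real calls):

import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

class Point
{
    double x, y;
    this(double x, double y) { this.x = x; this.y = y; }
}

void main()
{
    enum size = __traits(classInstanceSize, Point);
    void* mem = malloc(size);
    assert(mem !is null);
    scope (exit) free(mem);

    auto p = emplace!Point(mem[0 .. size], 1.0, 2.0);  // construct in place
    assert(p.x == 1.0);
    destroy(p);  // run the destructor before the memory is freed
}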
 What I have not yet had the opportunity to explore, is using D in
 full manual memory management mode. My understanding is that if I
 take that route, then I cannot use certain parts of the std lib, and
 will also lose a few of the nice features of D that make it fun to
 work with. I'm not fully clear though on what to expect, so if
 there's any detailed information to look at, it would be a big help.
You could ask Manu, who IIRC uses full manual memory management in D. Or search the forums for using D without the GC -- I think somebody has posted the details before.
 I wonder what can be done to allow a programmer to go fully manual,
 while not losing any of the nice features of D?
[...]

I don't think you'll be able to get 100% of D's features without a 
GC. Some features are simply too complicated to implement otherwise, 
such as array slicing + free appending. Hopefully more of D will be 
usable once Andrei (or whoever it was) works out the custom 
allocators design for Phobos.

I think as of right now, the functions in std.range and std.algorithm 
should all be GC-free, as long as you don't use things like 
delegates. (I believe Jonathan has said that if any std.range or 
std.algorithm functions have implicit memory allocation, it should be 
considered a bug.)

On Fri, Dec 14, 2012 at 08:24:38PM +0100, Rob T wrote:
 On Friday, 14 December 2012 at 18:46:52 UTC, Peter Alexander wrote:
[...]
I avoid using the GC when using D and I feel like I still have a
lot of freedom of expression, but maybe I'm just used to it.
 I'd like to do that too and wish I had your experience with how you 
 are going about it. My fear is that I'll end up allocating without 
 knowing it and having my apps silently eat up memory over time.

 At the end of the day, there's just no point in having a GC if you 
 don't want to use it, so the big question is whether a GC can be 
 made to work much better than what we have. Supposedly yes, but will 
 the improvements really matter? I somehow doubt it will.
For me, one big plus with the GC is that I can actually concentrate 
on improving my algorithms instead of being bogged down constantly by 
memory management issues. I think that has led me to write much 
better code than when I was coding in C/C++. It has also eliminated 
those annoying pointer bugs and memory leaks, and the countless hours 
spent debugging them.

But, as with anything worthwhile in programming, the GC comes with a 
cost, so sometimes you will suffer from performance degradation. But 
like I said in another thread, your program's hotspots are often not 
where you think they are; you need actual profiling to figure out 
where the performance problems are. Once you locate those, you can 
apply some workarounds like temporarily disabling the GC, or 
switching to manual memory management, etc.

I don't think GCs will ever get to the point where they will be both 
maximally-performant *and* not require any effort from the 
programmer. There are only two ways to implement that, and only the 
third one works. :-P
 When I look at GC based apps, what they all seem to have in common,
 is that they tend to eat up vast amounts of RAM for nothing and
 perform poorly. I'm speaking mostly about Java apps, they are
 terrible with performance and memory foot print in general. But also
 C++ apps that use built in GC tend to have similar issues.
If you're worried about performance, you might want to consider using GDC or LDC. IME, GDC consistently produces D executables that are at least 20-30% faster than what DMD produces, simply because GCC has a far more advanced optimization framework in its backend. Sometimes it can be 40-50% faster, though YMMV.
 It may be that the GC concept works far better in theory than in
 practice, although due to the performance penalty work-a-rounds, you
 may end up writing better performing apps because of it, however
 that's NOT the intention of having a GC!
[...]

Well, everything comes at a cost. :-) The GC lets you develop 
programs faster with less pain (and virtually no memory-related 
bugs), but you have to pay in performance. Manual memory management 
lets you maximize performance, but then you have to pay in countless 
headaches over finding pointer bugs and memory leaks. You can't have 
both. :) (Unless it's both bad performance *and* pointer bug 
headaches. :-P)

A good middle ground is to use the GC for the common cases where 
performance isn't important, and optimize with manual memory 
management in your hotspots where performance matters. (And make sure 
you profile before you do anything, 'cos like I said, your hotspots 
often aren't where you think they are. I learnt that the hard way.)

T

-- 
Why do conspiracy theories always come from the same people??
Dec 14 2012
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-12-14 19:27, Rob T wrote:

 I wonder what can be done to allow a programmer to go fully manual,
 while not losing any of the nice features of D?
Someone has created a GC-free version of druntime and Phobos. 
Unfortunately I can't find the post in the newsgroup right now.

-- 
/Jacob Carlborg
Dec 15 2012
parent reply "Mike Parker" <aldacron gmail.com> writes:
On Saturday, 15 December 2012 at 11:35:18 UTC, Jacob Carlborg 
wrote:
 On 2012-12-14 19:27, Rob T wrote:

 I wonder what can be done to allow a programmer to go fully 
 manual,
 while not losing any of the nice features of D?
 Someone has created a GC-free version of druntime and Phobos. 
 Unfortunately I can't find the post in the newsgroup right now.
http://3d.benjamin-thaut.de/?p=20
Dec 15 2012
next sibling parent "Rob T" <rob ucora.com> writes:
On Saturday, 15 December 2012 at 13:04:41 UTC, Mike Parker wrote:
 On Saturday, 15 December 2012 at 11:35:18 UTC, Jacob Carlborg 
 wrote:
 On 2012-12-14 19:27, Rob T wrote:

 I wonder what can be done to allow a programmer to go fully 
 manual,
 while not losing any of the nice features of D?
 Someone has created a GC-free version of druntime and Phobos. 
 Unfortunately I can't find the post in the newsgroup right now.
http://3d.benjamin-thaut.de/?p=20
Thanks for the link. Windows only and I'm using Linux, but still 
worth a look.

Note this comment below; a 3x difference, the same as what I 
experienced:

 Update: I found a piece of code that did manually slow down the 
 simulation in case it got too fast. This code never kicked in with 
 the GC version, because it never reached the margin. The manual 
 memory managed version however did reach the margin and was slowed 
 down. With this piece of code removed the manual memory managed 
 version runs at 5 ms which is 200 FPS and thus nearly 3 times as 
 fast as the GC collected version.
Dec 15 2012
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-12-15 14:04, Mike Parker wrote:

 http://3d.benjamin-thaut.de/?p=20
That's it, thanks.

-- 
/Jacob Carlborg
Dec 16 2012