
digitalmars.D - Significant GC performance penalty

reply "Rob T" <rob ucora.com> writes:
I created a D library wrapper for sqlite3 that uses a dynamically 
constructed result list for returned records from a SELECT 
statement. It works in a similar way to a C++ version that I 
wrote a while back.

The D code is idiomatic D, not a clone of my earlier C++ code; it 
makes use of many of D's features, and one of them is the garbage 
collector.

When running comparison tests between the C++ version and the D 
version, both compiled using performance optimization flags, the 
C++ version ran 3x faster than the D version, which was very 
unexpected. If anything, I was hoping for a performance boost from D, 
or at least the same performance.

I remembered reading about people having performance problems 
with the GC, so I tried a quick fix, which was to disable the GC 
before the SELECT is run and re-enable afterwards. The result of 
doing that was a 3x performance boost, making the DMD compiled 
version run almost as fast as the C++ version. The DMD compiled 
version is now only 2 seconds slower on my stress test runs of a 
SELECT that returns 200,000+ records with 14 fields. Not too bad! 
I may get identical performance if I compile using gdc, but that 
will have to wait until it is updated to 2.061.
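
For reference, the workaround amounts to something like this 
(Database, Row, and the function body are placeholders for my 
wrapper, not its real API; GC.disable and GC.enable from core.memory 
are the actual calls):

import core.memory : GC;

struct Database { void* handle; }  // placeholder for the sqlite3 wrapper
struct Row { string[14] fields; }  // placeholder record type

Row[] fetchAll(ref Database db, string sql)
{
    GC.disable();              // no collection cycles mid-query
    scope (exit) GC.enable();  // guaranteed re-enable, even on exception
    Row[] rows;
    // ... step through the sqlite3 result set, appending to rows ...
    return rows;
}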

Fixing this was a major relief since the code is expected to be 
used in a commercial setting. I'm wondering though, why the GC 
causes such a large penalty, and what negative effects, if any, there 
will be when disabling the GC temporarily. I know that 
memory won't be reclaimed until the GC is re-enabled, but is 
there anything else to worry about?

I feel it's worth commenting on my experience as feedback for 
the D developers and anyone else starting off with D.

Coming from C++ I *really* did not like having the GC, it made me 
very nervous, but now that I'm used to having it, I've come to 
like having it up to a point. It really does change the way you 
think and code. However, as I've discovered, you still have to 
always be thinking about memory management issues, because the GC 
can impose a huge performance penalty in certain situations. I 
also NEED to know that I can always go full manual where 
necessary. There's no way I would want to give up that kind of 
control.

The trade off with having a GC seems to be that by default, C++ 
apps will perform considerably faster than equivalent D apps 
out-of-the-box, simply because the manual memory management is 
fine tuned by the programmer as development proceeds. With D, when 
you simply let the GC take care of business, you are not necessarily 
fine tuning as you go along, and if you do not take the resulting 
performance hit into consideration, your apps will likely perform 
poorly compared to a C++ equivalent. However, building the equivalent 
app in D is a much 
more pleasant experience in terms of the programming productivity 
gain. The code is simpler to deal with, and there's less to worry 
about with pointers and other memory management issues.

What I have not yet had the opportunity to explore, is using D in 
full manual memory management mode. My understanding is that if I 
take that route, then I cannot use certain parts of the std lib, 
and will also lose a few of the nice features of D that make it 
fun to work with. I'm not fully clear though on what to expect, 
so if there's any detailed information to look at, it would be a 
big help.

I wonder what can be done to allow a programmer to go fully 
manual, while not losing any of the nice features of D?

Also, I think everyone agrees we really need a better GC, and I 
wonder once we do get a better GC, what kind of overall 
improvements we can expect to see?

Thanks for listening.

--rt
Dec 14 2012
next sibling parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
Allocating memory is simply slow. The same is true in C++ where 
you will see performance hits if you allocate memory too often. 
The GC makes things worse, but if you really care about 
performance then you'll avoid allocating memory so often.

Try to pre-allocate as much as possible, and use the stack 
instead of the heap where possible. Fixed size arrays and structs 
are your friend.
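
As a rough sketch of what I mean (both functions are made up for 
illustration):

// Heap version: a GC allocation on every call.
int sumHeap(size_t n)
{
    auto buf = new int[n];  // GC heap allocation
    foreach (i, ref x; buf) x = cast(int) i;
    int s = 0;
    foreach (x; buf) s += x;
    return s;
}

// Stack version: fixed-size array, no GC involvement at all.
int sumStack()
{
    int[14] buf;            // lives on the stack, freed on return
    foreach (i, ref x; buf) x = cast(int) i;
    int s = 0;
    foreach (x; buf) s += x;
    return s;
}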

I avoid using the GC when using D and I feel like I still have a 
lot of freedom of expression, but maybe I'm just used to it.
Dec 14 2012
parent reply "Rob T" <rob ucora.com> writes:
On Friday, 14 December 2012 at 18:46:52 UTC, Peter Alexander 
wrote:
 Allocating memory is simply slow. The same is true in C++ where 
 you will see performance hits if you allocate memory too often. 
 The GC makes things worse, but if you really care about 
 performance then you'll avoid allocating memory so often.

 Try to pre-allocate as much as possible, and use the stack 
 instead of the heap where possible. Fixed size arrays and 
 structs are your friend.
In my situation, I can think of some ways to mitigate the memory 
allocation problem, however it's a bit tricky when SELECT statement 
results have to be dynamically generated, since the number of rows 
returned and the size and type of the rows are always different 
depending on the query and the data stored in the database. It's just 
not practical to custom-fit each SELECT to a pre-allocated array or 
list; it would be far too much manual effort.

I could consider generating a free list of pre-allocated record 
components that is re-used rather than destroyed and reallocated. 
However, knowing how many records to pre-allocate is tricky, and I 
could run out, or waste tons of RAM for nothing most of the time.

End of day, I may be better off digging into the GC source code itself 
and looking for solutions.
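
Roughly, that free list might look something like this (Field and the 
pool layout are illustrative stand-ins, not my actual wrapper types):

alias Field = string;  // stand-in for whatever a record field holds

struct RecordPool
{
    private Field[][] bufs;  // recycled record buffers
    private size_t live;     // number of usable entries in bufs

    Field[] acquire(size_t fieldsPerRecord)
    {
        if (live > 0)
            return bufs[--live];            // reuse instead of reallocating
        return new Field[fieldsPerRecord];  // allocate only on a miss
    }

    void release(Field[] buf)
    {
        if (live == bufs.length)
            bufs ~= buf;                    // grow the recycle stack
        else
            bufs[live] = buf;
        ++live;
    }
}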
 I avoid using the GC when using D and I feel like I still have 
 a lot of freedom of expression, but maybe I'm just used to it.
I'd like to do that too and wish I had your experience with how you 
are going about it. My fear is that I'll end up allocating without 
knowing it and having my apps silently eat up memory over time.

At the end of the day, there's just no point in having a GC if you 
don't want to use it, so the big question is whether a GC can be made 
to work much better than what we have. Supposedly yes, but will the 
improvements really matter? I somehow doubt it will.

When I look at GC-based apps, what they all seem to have in common is 
that they tend to eat up vast amounts of RAM for nothing and perform 
poorly. I'm speaking mostly about Java apps; they are terrible with 
performance and memory footprint in general. But C++ apps that use a 
built-in GC tend to have similar issues.

It may be that the GC concept works far better in theory than in 
practice, although due to the performance penalty workarounds, you may 
end up writing better performing apps because of it. However, that's 
NOT the intention of having a GC!

--rt
Dec 14 2012
next sibling parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
 In my situation, I can think of some ways to mitigate the 
 memory allocation  problem, however it's a bit tricky when 
 SELECT statement results have to be dynamically generated, 
 since the number of rows returned and size and type of the rows 
 are always different depending on the query and the data stored 
 in the database. It's just not at all practical to custom fit 
 for each SELECT to a pre-allocated array or list, it'll just be 
 far too much manual effort.
Maybe I have misunderstood, but it sounds to me like you could get away with a single allocation there. Just reducing the number of allocations will improve things a lot.
 I avoid using the GC when using D and I feel like I still have 
 a lot of freedom of expression, but maybe I'm just used to it.
 I'd like to do that too and wish I had your experience with how you 
 are going about it. My fear is that I'll end up allocating without 
 knowing it and having my apps silently eat up memory over time.
This shouldn't be a problem. I occasionally recompile druntime with a printf inside the allocation function just to make sure, but normally I can tell if memory allocations are going on because of the sudden GC pauses.
 At the end of the day, there's just no point in having a GC if 
 you don't want to use it, so the big question is if a GC can be 
 made to work much better than what we have? Supposedly yes, but 
 will the improvements really matter? I somehow doubt it will.
D's GC has a lot of headroom for improvement. A generational GC will likely improve things a lot.
 When I look at GC based apps, what they all seem to have in 
 common, is that they tend to eat up vast amounts of RAM for 
 nothing and perform poorly. I'm speaking mostly about Java 
 apps, they are terrible with performance and memory foot print 
 in general. But also C++ apps that use built in GC tend to have 
 similar issues.
The problem with Java is not just the GC. Java eats up huge amounts of 
memory because it has no value types, so *everything* has to be 
allocated on the heap, and every object has 16 bytes of overhead (on 
64-bit systems) in addition to memory-manager overhead.

This is a great presentation on the subject of Java memory efficiency: 
http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf
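
D sidesteps part of this because it has value types. A minimal 
illustration (the types are made up for the example):

struct PointS { double x, y; }  // value type: 16 bytes, no object header
class  PointC { double x, y; }  // reference type: heap block plus header

void demo()
{
    PointS[1000] a;               // one contiguous, GC-free block
    auto b = new PointC[](1000);  // 1000 references...
    foreach (ref p; b)
        p = new PointC;           // ...each a separate headered heap object
}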
 It may be that the GC concept works far better in theory than 
 in practice, although due to the performance penalty 
 workarounds, you may end up writing better performing apps 
 because of it, however that's NOT the intention of having a GC!
When it comes to performance, there is always a compromise with usability. Even malloc performs poorly compared to more manual memory management. Even automatic register allocation by the compiler can lead to poor performance. The only question is where you want to draw the line between usability and performance.
Dec 14 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 14, 2012 at 09:08:16PM +0100, Peter Alexander wrote:
 On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
[...]
It may be that the GC concept works far better in theory than in
 practice, although due to the performance penalty workarounds,
you may end up writing better performing apps because of it,
however that's NOT the intention of having a GC!
 When it comes to performance, there is always a compromise with 
 usability. Even malloc performs poorly compared to more manual memory 
 management. Even automatic register allocation by the compiler can 
 lead to poor performance. The only question is where you want to draw 
 the line between usability and performance.
Yeah. If you want to squeeze out every last drop of juice your CPU's 
got to offer you, you could code directly in assembler, and no 
optimizing compiler, GC or no GC, will be able to beat that. But 
people stopped writing entire apps in assembler a long time ago. :-)

(I actually did that once, many years ago, for a real app that 
actually made a sale or two. It was a good learning experience, and 
helped me improve my coding skills just from knowing how the machine 
actually works under the hood, as well as learning why it's so 
important to write code in a well-structured way -- you have no choice 
when doing large-scale coding in assembler, 'cos otherwise your 
assembly code quickly devolves into a spaghetti paste soup that no 
human can possibly comprehend. So I'd say it was a profitable, even 
rewarding experience. But I wouldn't do it again today, given the 
choice.)

T

-- 
Ruby is essentially Perl minus Wall.
Dec 14 2012
next sibling parent "Rob T" <rob ucora.com> writes:
On Friday, 14 December 2012 at 20:33:33 UTC, H. S. Teoh wrote:
 (I actually did that once, many years ago, for a real app that 
 actually
 made a sale or two. It was a good learning experience, and 
 helped me
 improve my coding skills just from knowing how the machine 
 actually
 works under the hood, as well as learning why it's so important 
 to write
 code in a well-structured way -- you have no choice when doing
 large-scale coding in assembler, 'cos otherwise your assembly 
 code
 quickly devolves into a spaghetti paste soup that no human can 
 possibly
 comprehend. So I'd say it was a profitable, even rewarding 
 experience.
 But I wouldn't do it again today, given the choice.)


 T
Yeah, I did that too long ago, and I'm happy to have learned the 
skills, because it's the ultimate coding experience imaginable. If you 
don't do it very carefully, it goes all to hell just like you say. 
Best to let the machines do it these days; even if I could do it 10x 
better, it would take me hundreds of years to do what I can do now in 
a day.

Everyone, thanks for the responses. I got some great ideas already to 
try out. I think at the end of the day, my code will perform better 
than my old C++ version, simply because I will be considering the 
costs of memory allocations, which is something I never really thought 
about much before. I guess that's the positive side effect of the 
negative side effect of using a GC.

I agree, as many of you have commented, that having a GC is a pro-con 
trade off, positive in some ways, but not all. Optimize only where you 
need to, and let the GC deal with the rest.

--rt
Dec 14 2012
prev sibling parent reply "Paulo Pinto" <pjmlp progtools.org> writes:
On Friday, 14 December 2012 at 20:33:33 UTC, H. S. Teoh wrote:
 On Fri, Dec 14, 2012 at 09:08:16PM +0100, Peter Alexander wrote:
 On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
[...]
It may be that the GC concept works far better in theory than 
in
 practice, although due to the performance penalty workarounds,
you may end up writing better performing apps because of it,
however that's NOT the intention of having a GC!
 When it comes to performance, there is always a compromise with 
 usability. Even malloc performs poorly compared to more manual memory 
 management. Even automatic register allocation by the compiler can 
 lead to poor performance. The only question is where you want to draw 
 the line between usability and performance.

 Yeah. If you want to squeeze out every last drop of juice your CPU's 
 got to offer you, you could code directly in assembler, and no 
 optimizing compiler, GC or no GC, will be able to beat that.
I think it depends on what you're trying to achieve.

If coding for resource-constrained processors, or taking advantage of 
special SIMD instructions, then I agree.

On the other hand, if you're targeting processors with multiple 
execution units, instruction re-ordering, multiple cache levels, NUMA, 
..., then it is a whole other game trying to beat the compiler. And 
when you win, it will be for a specific combination of processor + 
motherboard + memory.

Usually the compiler is way better at keeping track of all possible 
instruction combinations for certain scenarios.

Well, this is just my opinion with my compiler-design aficionado hat 
on; some guys here might prove me wrong.

--
Paulo
Dec 14 2012
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 14, 2012 at 10:27:30PM +0100, Paulo Pinto wrote:
 On Friday, 14 December 2012 at 20:33:33 UTC, H. S. Teoh wrote:
[...]
Yeah. If you want to squeeze out every last drop of juice your CPU's
got to offer you, you could code directly in assembler, and no
optimizing compiler, GC or no GC, will be able to beat that.
 I think it depends on what you're trying to achieve. If coding for 
 resource-constrained processors, or taking advantage of special SIMD 
 instructions, then I agree.

 On the other hand, if you're targeting processors with multiple 
 execution units, instruction re-ordering, multiple cache levels, 
 NUMA, ..., then it is a whole other game trying to beat the compiler. 
 And when you win, it will be for a specific combination of processor 
 + motherboard + memory.
Yeah, that too. Coding in assembler also comes at the price of being 
tied to a specific version of a specific model of a specific brand of 
a specific vendor's CPU & motherboard. Like the OP stated, it may take 
you a few hundred years to produce your superior code, when what you 
write in 1 day with an optimizing compiler will probably perform close 
to or even match the handcrafted version, plus it has the advantage of 
being cross-platform.

Not to mention, the CPUs of the old days were designed with assembly 
language or low-level languages in mind. The modern CPUs of today were 
designed with optimizing compilers in mind. The ease (or rather, 
difficulty) of hand-coding for modern CPUs is not mere happenstance. 
:-)
 Usually the compiler is way better keeping track of all possible
 instruction combinations for certain scenarios.
 
 Well this is just my opinion with my compiler design aficionado on,
 some guys here might prove me wrong.
[...]

Well, I'm pretty sure that the difficulty (or rather, impossibility) 
of solving the halting problem, which is equivalent to the difficulty 
of global optimization (cf. Kolmogorov complexity), means that there 
will always be cases where the compiler won't generate optimal code. 
However, it's an open question whether humans can beat the compiler at 
its own game. Just because we can *sometimes* solve specific instances 
of the halting problem (or come close) by special insight doesn't mean 
that we'll always do better than the compiler in the general case.

T

-- 
Political correctness: socially-sanctioned hypocrisy.
Dec 14 2012
prev sibling parent reply "SomeDude" <lovelydear mailmetrash.com> writes:
On Friday, 14 December 2012 at 19:24:39 UTC, Rob T wrote:
 On Friday, 14 December 2012 at 18:46:52 UTC, Peter Alexander 
 wrote:
 Allocating memory is simply slow. The same is true in C++ 
 where you will see performance hits if you allocate memory too 
 often. The GC makes things worse, but if you really care about 
 performance then you'll avoid allocating memory so often.

 Try to pre-allocate as much as possible, and use the stack 
 instead of the heap where possible. Fixed size arrays and 
 structs are your friend.
 In my situation, I can think of some ways to mitigate the memory 
 allocation problem, however it's a bit tricky when SELECT statement 
 results have to be dynamically generated, since the number of rows 
 returned and the size and type of the rows are always different 
 depending on the query and the data stored in the database. It's 
 just not practical to custom-fit each SELECT to a pre-allocated 
 array or list; it would be far too much manual effort.
Isn't the memory management completely negligible when compared to the database access here ?
Dec 15 2012
parent reply "Rob T" <rob ucora.com> writes:
On Sunday, 16 December 2012 at 05:37:57 UTC, SomeDude wrote:
 Isn't the memory management completely negligible when compared 
 to the database access here ?
Here are the details ...

My test run selects and returns 206,085 records with 14 fields per 
record.

With all dynamic memory allocations disabled that are used to create 
the data structure containing the returned rows, a run takes 5 
seconds. This does not return any data, but it runs through all the 
records in exactly the same way, returning into a temporary 
stack-allocated value of the appropriate type. If I disable the GC 
before the run and re-enable it immediately after, it takes 7 seconds. 
I presume a full 2 seconds are used to disable and re-enable the GC, 
which seems like a lot of time.

With all dynamic memory allocations enabled that are used to create 
the data structure containing the returned rows, a run takes 28 
seconds. In this case, all 206K records are returned in a dynamically 
generated list. If I disable the GC before the run and re-enable it 
immediately after, it takes 11 seconds. Since a full 2 seconds are 
used to disable and re-enable the GC, that leaves 9 seconds, and since 
5 seconds are used without memory allocations, the allocations account 
for 4 seconds; but then, I'm doing a lot of allocations.

In my case, the structure is dynamically generated by allocating each 
individual field of each record returned, so there's 206,085 records 
x 14 fields = 2,885,190 allocations being performed. I can cut the 
individual allocations down to about 206,000 by allocating the full 
record in one shot; however, this is a stress test designed to work D 
as hard as possible and compare it with an identically stressed C++ 
version.

Both the D and C++ versions perform identically with the GC disabled, 
after subtracting the 2 seconds the D version spends disabling and 
re-enabling the GC during the run.

I wonder why 2 seconds are used to disable and enable the GC? That 
seems like a very large amount of time. If I select only 5,000 
records, the time to disable and enable the GC drops to negligible 
levels, and a run takes the same amount of time whether the GC is 
disabled and re-enabled or left enabled all the time.

During all tests, I do not run out of free RAM, and at no point does 
the memory go to swap.

--rt
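
P.S. For what it's worth, the two allocation shapes look roughly like 
this (Value and the loop bodies are illustrative stand-ins, not my 
actual wrapper code):

alias Value = string;  // stand-in for the field type
enum nFields = 14;

// Per-field: rows x 14 separate GC allocations (the stress-test shape).
Value[][] perField(size_t rows)
{
    auto result = new Value[][](rows);
    foreach (ref rec; result)
        foreach (f; 0 .. nFields)
            rec ~= Value.init;       // each append can allocate
    return result;
}

// Per-record: one allocation per row, ~206K instead of ~2.9M.
Value[][] perRecord(size_t rows)
{
    auto result = new Value[][](rows);
    foreach (ref rec; result)
        rec = new Value[](nFields);  // all 14 fields in one shot
    return result;
}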
Dec 15 2012
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Rob T:

 I wonder why 2 seconds are used to disable and enable the GC?
If you want one more test, try putting an "exit(0);" at the end of 
your program (the C exit is in core.stdc.stdlib).

Bye,
bearophile
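
P.S. In code, the suggestion is simply this (runStressTest is a 
placeholder for the benchmark body; calling the C exit skips 
druntime's normal shutdown, including its final collection pass):

import core.stdc.stdlib : exit;

void runStressTest() { /* ... the benchmark body ... */ }

void main()
{
    runStressTest();
    exit(0);  // terminate here, before druntime shutdown runs
}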
Dec 16 2012
prev sibling next sibling parent "jerro" <a a.com> writes:
On Sunday, 16 December 2012 at 07:47:48 UTC, Rob T wrote:
 On Sunday, 16 December 2012 at 05:37:57 UTC, SomeDude wrote:
 Isn't the memory management completely negligible when 
 compared to the database access here ?
 Here are the details ...

[...]

 During all tests, I do not run out of free RAM, and at no point does 
 the memory go to swap.

 --rt
Adding and subtracting times like this doesn't give very reliable results. If you want to know how much time is taken by different parts of code, I suggest you use a profiler.
Dec 16 2012
prev sibling next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 16 December 2012 at 07:47:48 UTC, Rob T wrote:
 On Sunday, 16 December 2012 at 05:37:57 UTC, SomeDude wrote:
 Isn't the memory management completely negligible when 
 compared to the database access here ?
 Here are the details ...

[...]

 During all tests, I do not run out of free RAM, and at no point does 
 the memory go to swap.

 --rt
Use the StopWatch class from std.datetime to get a proper idea of 
where time is being spent. All this subtracting-2-secs business 
stinks.

Or just fire up a profiler.
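
For example, a minimal timing harness (this uses the std.datetime API 
current at the time; StopWatch later moved to std.datetime.stopwatch):

import std.datetime : StopWatch;
import std.stdio : writefln;

void main()
{
    StopWatch sw;
    sw.start();
    // ... the code under measurement ...
    sw.stop();
    writefln("elapsed: %s ms", sw.peek().msecs);
}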
Dec 16 2012
parent "Rob T" <rob ucora.com> writes:
On Sunday, 16 December 2012 at 11:43:20 UTC, John Colvin wrote:
 Use the stopwatch class from std.datetime to get a proper idea 
 of where time is being spent. All this subtracting 2 secs 
 business stinks.

 or just fire up a profiler.
I am using the stopwatch, but had not gotten around to wrapping 
things for the extra detail. The subtractions and so forth were 
roughly calculated on the fly while I was posting and noticing new 
things I hadn't noticed before. The fact is that disabling and 
re-enabling the GC added an extra 2 secs for some reason, so it's of 
interest to know why.

I'll do proper timing later and post the results here.

--rt
Dec 16 2012
prev sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Sunday, 16 December 2012 at 07:47:48 UTC, Rob T wrote:
 On Sunday, 16 December 2012 at 05:37:57 UTC, SomeDude wrote:
 Isn't the memory management completely negligible when 
 compared to the database access here ?
 Here are the details ...

[...]

 In my case, the structure is dynamically generated by allocating 
 each individual field of each record returned, so there's 206,085 
 records x 14 fields = 2,885,190 allocations being performed. I can 
 cut the individual allocations down to about 206,000 by allocating 
 the full record in one shot; however, this is a stress test designed 
 to work D as hard as possible and compare it with an identically 
 stressed C++ version.
You cannot expect the GC to perform like manual memory management. 
It's a completely unrealistic microbenchmark to allocate each 
individual field, even for manual MM. The least you can do to be a 
little bit realistic is indeed to allocate one row at a time. I hope 
that's what you intend to do.

But usually, database drivers allow the user to tweak the queries and 
decide how many rows can be fetched at a time, and it's pretty common 
to fetch 50 or 100 rows at a time, meaning only one allocation each 
time. It would be interesting to compare the performance of the two 
languages in these situations, i.e. one row at a time, and 50 rows at 
a time.
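
A batched fetch would look roughly like this (Row, Statement, and 
step() are stand-ins for whatever the driver exposes, not a real API):

struct Row { string[14] fields; }  // stand-in record type

// Stand-in statement wrapper: step() fills a row and returns false
// when the result set is exhausted.
interface Statement { bool step(ref Row r); }

Row[][] fetchBatched(Statement stmt, size_t batchSize = 100)
{
    Row[][] batches;
    for (;;)
    {
        auto batch = new Row[](batchSize);  // one allocation per batch
        size_t n = 0;
        while (n < batchSize && stmt.step(batch[n]))
            ++n;
        if (n > 0)
            batches ~= batch[0 .. n];
        if (n < batchSize)
            break;                          // result set exhausted
    }
    return batches;
}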
Dec 16 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Rob T:

 I wonder what can be done to allow a programmer to go fully 
 manual, while not losing any of the nice features of D?
Even the Rust language, which has a more powerful type system than D, 
with region analysis and more, sometimes needs localized reference 
counting (or a localized per-thread GC) to allow the usage of its full 
features. So I don't think you can have all the nice features of D 
without its GC.

I believe the D design has bet too much on its (not precise) GC. The 
design of Phobos & D now needs to show more love for stack allocations 
(see variable-length arrays, array literals, etc.), for some 
alternative allocators like reaps 
(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.7.6505), and 
so on. Someone implemented a stack-like data manager for D, but the 
voting didn't allow it into Phobos.

Bye,
bearophile
Dec 14 2012
prev sibling next sibling parent reply "Paulo Pinto" <pjmlp progtools.org> writes:
On Friday, 14 December 2012 at 18:27:29 UTC, Rob T wrote:
 I created a D library wrapper for sqlite3 that uses a 
 dynamically constructed result list for returned records from a 
 SELECT statement. It works in a similar way to a C++ version 
 that I wrote a while back.

[...]

 Thanks for listening.

 --rt
Having lots of experience in GC-enabled languages, even for systems 
programming (Oberon & Active Oberon), I think there are a few issues 
to consider:

- D's GC still has a lot of room to improve, so some of the issues you 
have found might eventually get fixed;

- having GC support does not mean calling new like crazy; one still 
needs to think about how to code in a GC-friendly way;

- make proper use of weak references in case they are available;

- GC-enabled language runtimes usually offer ways to peek into the 
runtime, somehow, and allow the developer to understand how the GC is 
working and what might be improved.

The goodness of having a GC is having a safer way to manage memory 
across multiple modules, especially when ownership is not clear. Even 
in C++ I seldom do manual memory management nowadays, if working on 
new codebases. Of course, others will have a different experience.

Other than that, thanks for sharing your experience.

--
Paulo
Dec 14 2012
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 14, 2012 at 08:27:46PM +0100, Paulo Pinto wrote:
[...]
 - Having GC support, does not mean to do call new like crazy, one
 still needs to think how to code in a GC friendly way;
It makes me think, though, that perhaps there is some way of 
optimizing the GC for recursive data structures where you only ever 
keep a reference to the head node, so that they can be managed in a 
much more efficient way than a structure where there may be an 
arbitrary number of references to anything inside. I think this is a 
pretty common case, at least in the kind of code I encounter 
frequently.

Also, coming from C/C++, I have to say that my coding style has been 
honed over the years to think in terms of single-ownership 
structures, so even when coding in D I tend to write code that way. 
However, having the GC available means that there are some cases 
where using multiple references to stuff will actually improve GC 
(and overall) performance by eliminating the need to deep-copy stuff 
everywhere.
 - GC enabled languages runtimes usually offer ways to peak into the
 runtime, somehow, and allow the developer to understand how GC is
 working and what might be improved;
[...]

Yeah, I think for most applications, it's probably good enough to use 
the functions in core.memory (esp. enable, disable, collect, and 
minimize) to exercise some control over the GC, so that you can use 
manual memory management in the important hotspots and just let the 
GC do its thing in less important parts of the program. I think 
core.memory's minimize will solve the OP's concern about GC'd apps 
having bad memory footprints.

T

-- 
Computers aren't intelligent; they only think they are.
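
P.S. For reference, those calls in context (GC.collect and 
GC.minimize are the real core.memory API; reclaimNow is just an 
example name):

import core.memory : GC;

void reclaimNow()
{
    GC.collect();   // run a collection at a point of our choosing
    GC.minimize();  // return unused pools to the OS, shrinking the footprint
}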
Dec 14 2012
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Dec 14, 2012 at 07:27:26PM +0100, Rob T wrote:
 I created a D library wrapper for sqlite3 that uses a dynamically
 constructed result list for returned records from a SELECT
 statement. It works in a similar way to a C++ version that I wrote a
 while back.
Hmm, I seem to have heard wind of an existing D sqlite3 wrapper somewhere, but that may have been D1/Tango, I'm not sure. [...]
 I remembered reading about people having performance problems with
 the GC, so I tried a quick fix, which was to disable the GC before
 the SELECT is run and re-enable afterwards. The result of doing that
 was a 3x performance boost, making the DMD compiled version run
 almost as fast as the C++ version. The DMD compiled version is now
 only 2 seconds slower on my stress test runs of a SELECT that
 returns 200,000+ records with 14 fields. Not too bad! I may get
 identical performance if I compile using gdc, but that will have to
 wait until it is updated to 2.061.
 
 Fixing this was a major relief since the code is expected to be used
 in a commercial setting. I'm wondering though, why the GC causes
 such a large penalty, and what negative effect if any if there will
 be when disabling the GC temporarily. I know that memory won't be
 reclaimed until the GC is re-enabled, but is there anything else to
 worry about?
AFAIK, it should be safe to disable the GC during that time, as long 
as you're aware of the possibility of running out of memory in the 
interim.

But there's also the issue that a good, enterprise-quality GC is very 
VERY hard to write, and especially hard for a language like D which 
allows you to do system-level stuff like pointer casting and unions 
(though thanks to its advanced features you rarely need to do such 
things). This forces the GC to be conservative, which complicates it 
and also affects its performance. The difficulty of the task means 
that our current GC leaves much to be desired. However, there's been 
talk of a (semi-)precise GC in the works, so hopefully we'll start 
getting a better GC in the near future.

[...]
 Coming from C++ I *really* did not like having the GC, it made me
 very nervous, but now that I'm used to having it, I've come to like
 having it up to a point. It really does change the way you think and
 code. However as I've discovered, you still have to always be
 thinking about memory management issues because the GC can eat up a
 huge performance penalty under certain situations. I also NEED to
 know that I can always go full manual where necessary. There's no
 way I would want to give up that kind of control.
Totally understand what you mean. I also came from C/C++, and the 
fact that D relies on a GC actually put me off trying out D for some 
time. It took me a while before being convinced to at least give it a 
try. There was a particular article that convinced me, but I can't 
recall which one it was right now. Basically, the point was that 
having a GC frees up your mind from having to constantly worry about 
memory management issues, and actually think about the actual 
algorithm you're working on. It also eliminates memory leakage that 
comes from careless coding -- which happens all too often in C/C++, 
as shown in my day job where we're constantly chasing down memory 
leak bugs.

We're all human, after all, and prone to slip-ups every now and then. 
All it takes is a single slip, and your app will eventually eat up 
all memory and bring down the system. Usually on the customer's live 
environment, which is really the only place where your code actually 
runs for sufficiently long periods of time for the bug to show up (QA 
theoretically is supposed to test this, but doesn't have that luxury 
due to release deadlines).

After having gotten used to D and its GC, I have to say that my 
coding is much more efficient. I tend to use string operations quite 
often, and it's quite a big relief to not have to constantly worry 
about managing memory for the strings manually. (String manipulation 
is a royal pain in C/C++, so much so that sometimes I resort to Perl 
to get the job done.)

Having said that, though, I agree that there *are* times when you 
want to, and *need* to, manage memory manually. A GC relieves you of 
manual memory management for the general case, but when optimizing 
the hotspots in your code, nothing beats a hand-crafted manual memory 
management scheme designed specifically for what you're doing. For 
that, D does let you call the C library's malloc() and free() 
yourself, and manage the pointers manually. You can then use Phobos' 
emplace function to create D objects in your manually-allocated 
memory blocks, and thus still enjoy some of D's advanced features to 
an extent. You can, of course, also temporarily turn off the GC 
during time-sensitive points where you don't want a collection cycle 
to start on you.
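
A minimal sketch of that malloc-plus-emplace route (Point is just an 
example class; malloc/free, emplace, and destroy are the real calls):

import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

class Point
{
    double x, y;
    this(double x, double y) { this.x = x; this.y = y; }
}

void main()
{
    enum size = __traits(classInstanceSize, Point);
    void* mem = malloc(size);
    assert(mem !is null);
    scope (exit) free(mem);

    auto p = emplace!Point(mem[0 .. size], 1.0, 2.0);  // construct in place
    assert(p.x == 1.0);
    destroy(p);  // run the destructor before the memory is freed
}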
 What I have not yet had the opportunity to explore, is using D in
 full manual memory management mode. My understanding is that if I
 take that route, then I cannot use certain parts of the std lib, and
 will also lose a few of the nice features of D that make it fun to
 work with. I'm not fully clear though on what to expect, so if
 there's any detailed information to look at, it would be a big help.
You could ask Manu, who IIRC uses full manual memory management in D. Or search the forums for using D without the GC -- I think somebody has posted the details before.
 I wonder what can be done to allow a programmer to go fully manual,
 while not losing any of the nice features of D?
[...]

I don't think you'll be able to get 100% of D's features without a 
GC. Some features are simply too complicated to implement otherwise, 
such as array slicing + free appending. Hopefully more of D will be 
usable once Andrei (or whoever it was) works out the custom 
allocators design for Phobos.

I think as of right now, the functions in std.range and std.algorithm 
should all be GC-free, as long as you don't use things like 
delegates. (I believe Jonathan has said that if any std.range or 
std.algorithm functions have implicit memory allocation, it should be 
considered a bug.)

On Fri, Dec 14, 2012 at 08:24:38PM +0100, Rob T wrote:
 On Friday, 14 December 2012 at 18:46:52 UTC, Peter Alexander wrote:
[...]
I avoid using the GC when using D and I feel like I still have a
lot of freedom of expression, but maybe I'm just used to it.
 I'd like to do that too and wish I had your experience with how you 
 are going about it. My fear is that I'll end up allocating without 
 knowing it and having my apps silently eat up memory over time.

 At the end of the day, there's just no point in having a GC if you 
 don't want to use it, so the big question is whether a GC can be 
 made to work much better than what we have. Supposedly yes, but will 
 the improvements really matter? I somehow doubt it will.
For me, one big plus with the GC is that I can actually concentrate 
on improving my algorithms instead of being bogged down constantly by 
memory management issues. I think that has led me to write much 
better code than when I was coding in C/C++. It has also eliminated 
those annoying pointer bugs and memory leaks, and the countless hours 
spent debugging them.

But, as with anything worthwhile in programming, the GC comes with a 
cost, so sometimes you will suffer from performance degradation. But 
like I said in another thread, your program's hotspots are often not 
where you think they are; you need actual profiling to figure out 
where the performance problems are. Once you locate those, you can 
apply some workarounds like temporarily disabling the GC, or 
switching to manual memory management, etc.

I don't think GCs will ever get to the point where they will be both 
maximally-performant *and* not require any effort from the 
programmer. There are only two ways to implement that, and only the 
third one works. :-P
 When I look at GC based apps, what they all seem to have in common,
 is that they tend to eat up vast amounts of RAM for nothing and
 perform poorly. I'm speaking mostly about Java apps, they are
 terrible with performance and memory foot print in general. But also
 C++ apps that use built in GC tend to have similar issues.
If you're worried about performance, you might want to consider using GDC or LDC. IME, GDC consistently produces D executables that are at least 20-30% faster than what DMD produces, simply because GCC has a far more advanced optimization framework in its backend. Sometimes it can be 40-50% faster, though YMMV.
 It may be that the GC concept works far better in theory than in
 practice, although due to the performance penalty work-a-rounds, you
 may end up writing better performing apps because of it, however
 that's NOT the intention of having a GC!
[...]

Well, everything comes at a cost. :-) The GC lets you develop 
programs faster with less pain (and virtually no memory-related 
bugs), but you have to pay in performance. Manual memory management 
lets you maximize performance, but then you have to pay in countless 
headaches over finding pointer bugs and memory leaks. You can't have 
both. :) (Unless it's both bad performance *and* pointer bug 
headaches. :-P)

A good middle ground is to use the GC for the common cases where 
performance isn't important, and optimize with manual memory 
management in your hotspots where performance matters. (And make sure 
you profile before you do anything, 'cos like I said, your hotspots 
often aren't where you think they are. I learnt that the hard way.)

T

-- 
Why do conspiracy theories always come from the same people??
Dec 14 2012
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-12-14 19:27, Rob T wrote:

 I wonder what can be done to allow a programmer to go fully manual,
 while not losing any of the nice features of D?
Someone has created a GC-free version of druntime and Phobos. 
Unfortunately I can't find the post in the newsgroup right now.

-- 
/Jacob Carlborg
Dec 15 2012
parent reply "Mike Parker" <aldacron gmail.com> writes:
On Saturday, 15 December 2012 at 11:35:18 UTC, Jacob Carlborg 
wrote:
 On 2012-12-14 19:27, Rob T wrote:

 I wonder what can be done to allow a programmer to go fully 
 manual,
 while not losing any of the nice features of D?
 Someone has created a GC-free version of druntime and Phobos. 
 Unfortunately I can't find the post in the newsgroup right now.
http://3d.benjamin-thaut.de/?p=20
Dec 15 2012
next sibling parent "Rob T" <rob ucora.com> writes:
On Saturday, 15 December 2012 at 13:04:41 UTC, Mike Parker wrote:
 On Saturday, 15 December 2012 at 11:35:18 UTC, Jacob Carlborg 
 wrote:
 On 2012-12-14 19:27, Rob T wrote:

 I wonder what can be done to allow a programmer to go fully 
 manual,
 while not losing any of the nice features of D?
 Someone has created a GC-free version of druntime and Phobos. 
 Unfortunately I can't find the post in the newsgroup right now.
http://3d.benjamin-thaut.de/?p=20
Thanks for the link. Windows only and I'm using Linux, but still 
worth a look.

Note this comment below; a 3x difference, the same as what I 
experienced:

 Update: I found a piece of code that did manually slow down the 
 simulation in case it got too fast. This code never kicked in with 
 the GC version, because it never reached the margin. The manual 
 memory managed version however did reach the margin and was slowed 
 down. With this piece of code removed the manual memory managed 
 version runs at 5 ms which is 200 FPS and thus nearly 3 times as 
 fast as the GC collected version.
Dec 15 2012
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-12-15 14:04, Mike Parker wrote:

 http://3d.benjamin-thaut.de/?p=20
That's it, thanks.

-- 
/Jacob Carlborg
Dec 16 2012