
digitalmars.D.learn - General performance tip about possibly using the GC or not

reply Cecil Ward <d cecilward.com> writes:
I am vacillating - considering breaking a lifetime's C habits and 
letting the D garbage collector make life wonderful by just 
cleaning up after me, and ruining my future C discipline by not 
deleting stuff myself.

I don't know when the GC actually gets a chance to run.

I am wondering if deleting the usual bothersome 
immediately-executed hand-written cleanup code could actually 
improve performance in a sense in some situations. If the cleanup 
is done later by the GC, then this might be done when the 
processor would otherwise be waiting for io, in the top loop of 
an app, say? And if so this would amount to moving the code to be 
run effectively like 'low priority' app-scheduled activities, 
when the process would be waiting anyway, so moving cpu cycles to 
a later time when it doesn't matter. Is this a reasonable picture?

If I carry on deleting objects / freeing / cleaning up as I'm 
used to, without disabling the GC, am I just slowing my code 
down? Plus (for all I know) the GC will use at least some battery 
or possibly actually important cpu cycles in scanning and finding 
nothing to do all the time because I've fully cleaned up.

I suppose there might also be a difference in cache-friendliness 
as cleaning up immediately by hand might be working on hot 
memory, but the GC scanner coming along much later might have to 
deal with cold memory, but it may not matter if the activity is 
app-scheduled like low priority work or is within time periods 
that are merely eating into io-bound wait periods anyway.

I definitely need to read up on this. Have never used a GC 
language, just decades of C and mountains of asm.

Any general guidance on how to optimise CPU usage, and 
particularly responsiveness, would be welcome.

One pattern I used to use when writing service processes (server 
apps) is that of deferring compute tasks by using a kind of 'post 
this action' which adds an entry into a queue, the entry is a 
function address plus arg list and represents work to be done 
later. In the top loop, the app then executes these 'posted' jobs 
later at app-scheduled low priority relative to other activities 
and all handling of io and timer events, when it has nothing else 
to do, by simply calling through the function pointer in a post 
queue entry. So it's a bit like setting a timer for 0 ms, passing 
a callback function. Terminology - a DFC, or lazy/late execution, 
might be other terms for it. I'm wondering if using the garbage 
collector well might fit into this familiar pattern? Is that fair? 
And might it actually even help performance for me if I'm lucky?
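For readers unfamiliar with the pattern, a minimal sketch of such a 'post this action' queue in D might look like the following, using delegates to capture the function plus its argument list (the names here are illustrative, not from any library):

```d
import std.stdio : writeln;

// A "posted" job: a delegate captures both the function and its arg list.
alias Job = void delegate();

void main()
{
    Job[] postQueue;

    // Post work for later: the closure captures its arguments.
    int arg = 41;
    postQueue ~= () { writeln("deferred result: ", arg + 1); };

    // Top loop, when there is nothing else to do: drain the queue,
    // calling through each stored delegate -- like a 0 ms timer callback.
    foreach (job; postQueue)
        job();
    postQueue.length = 0;
}
```

Note that each append to postQueue, and each closure itself, is a GC allocation, which is part of why the question of when the GC runs arises here at all.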
Aug 28
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
D's GC is stop-the-world (it pauses all threads) and does not run on its own 
(it has to be asked to collect).

It is only given the opportunity to collect when you allocate (new/more) 
memory. It can decide not to collect, or to do so at any such allocation 
point, which makes it very unpredictable.

This is why we keep saying that it is not a magic bullet.
It isn't. It just does a simple set of logic and nothing more.
Aug 28
prev sibling next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
I don't like the current format of the page (all articles are expanded 
as opposed to being an index page) but there are currently four D blog 
articles on GC and memory management:

   https://dlang.org/blog/category/gc/

Ali
Aug 28
parent Ali Çehreli <acehreli yahoo.com> writes:
On 08/28/2017 06:25 PM, Ali Çehreli wrote:
 I don't like the current format of the page
Apparently, I was looking for this one: https://dlang.org/blog/the-gc-series/

Ali
Aug 28
prev sibling next sibling parent Jonathan M Davis via Digitalmars-d-learn writes:
On Tuesday, August 29, 2017 00:52:11 Cecil Ward via Digitalmars-d-learn 
wrote:
 I am vacillating - considering breaking a lifetime's C habits and
 letting the D garbage collector make life wonderful by just
 cleaning up after me and ruining my future C discipline by not
 deleting stuff myself.

 I don't know when the GC actually gets a chance to run.
Normally, it's only run when you call new. When you call new, if it thinks 
that it needs to do a collection to free up some space, then it will. 
Otherwise, it won't normally ever run, because it's not sitting in its own 
thread like happens with Java or C#. However, if you need it to run at a 
particular time, you can call core.memory.GC.collect to explicitly tell it 
to run a collection.

Similarly, you can call GC.disable to make it so that a section of code 
won't cause any collections (e.g. in a performance-critical loop that can't 
afford for the GC to kick in), and then you can call GC.enable to turn it 
back on again.
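A sketch of those calls in use (core.memory is part of druntime; the surrounding code is illustrative):

```d
import core.memory : GC;

void main()
{
    GC.disable();   // no implicit collections from here on

    // ... performance-critical section; allocations still succeed,
    // but the GC will not stop the world to collect ...

    GC.enable();    // implicit collections are allowed again

    // Explicitly run a collection at a moment of our choosing,
    // e.g. while the program would otherwise be idle.
    GC.collect();
}
```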
 I am wondering if deleting the usual bothersome
 immediately-executed hand-written cleanup code could actually
 improve performance in a sense in some situations. If the cleanup
 is done later by the GC, then this might be done when the
 processor would otherwise be waiting for io, in the top loop of
 an app, say? And if so this would amount to moving the code to be
 run effectively like 'low priority' app-scheduled activities,
 when the process would be waiting anyway, so moving cpu cycles to
 a later time when it doesn't matter. Is this a reasonable picture?

 If I carry on deleting objects / freeing / cleaning up as I'm
 used to, without disabling the GC, am I just slowing my code
 down? Plus (for all I know) the GC will use at least some battery
 or possibly actually important cpu cycles in scanning and finding
 nothing to do all the time because I've fully cleaned up.

 I suppose there might also be a difference in cache-friendliness
 as cleaning up immediately by hand might be working on hot
 memory, but the GC scanner coming along much later might have to
 deal with cold memory, but it may not matter if the activity is
 app-scheduled like low priority work or is within time periods
 that are merely eating into io-bound wait periods anyway.

 I definitely need to read up on this. Have never used a GC
 language, just decades of C and mountains of asm.
For a lot of stuff, GCs will actually be faster. It really depends on what 
your code is doing. One aspect of this is that when you're doing manual 
memory management or reference counting, you're basically spreading out the 
collection across the program. It's costing you all over the place but isn't 
necessarily costing a lot in any particular place. The GC on the other hand 
avoids a lot of that cost as you're running, because your program isn't 
constantly doing all of that work to free stuff up - but when the GC does 
kick in to do a collection, then it costs a lot more for that moment than 
any particular freeing of memory would have cost with manual memory 
management. It's doing all of that work at once rather than spreading it out.

Whether that results in a more performant program or a less performant 
program depends a lot on what you're doing and what your use case can 
tolerate. For most programs, having the GC stop stuff temporarily really 
doesn't matter at all, whereas for something like a real-time program, it 
would be fatal. So, it really depends on what you're doing.

Ultimately, for most programs, it makes the most sense to just use the GC 
and optimize your program where it turns out to be necessary. That could 
mean disabling the GC in certain sections of code, or it could mean managing 
certain memory manually, because it's more efficient to do so in that case. 
Doing stuff like allocating a lot of small objects and throwing them away 
will definitely be a performance problem for the GC, but it's not all that 
great for manual memory management either. A lot of the performance gains 
come from doing stuff on the stack where possible, which is one area where 
ranges tend to shine.

Another thing to consider is that some programs will need to have specific 
threads not managed by the GC so that they can't be stopped during a 
collection (e.g. a program with an audio pipeline will probably not want 
that on a thread that's GC-managed), and that's one way to avoid a 
performance hit from the GC. That's a fairly atypical need though, much as 
it's critical for certain types of programs.

All in all, switching to using the GC primarily will probably take a bit of 
a shift in thinking, but typical D idioms do tend to reduce the need for 
memory management in general and reduce the negative impacts that can come 
with garbage collection. And ultimately, some workloads will be more 
efficient with the GC. It's my understanding that relatively few programs 
end up needing to play games where they do things like disable the GC 
temporarily, but the tools are there if you need them. And profiling should 
help show you where bottlenecks are.

Ultimately, I think that using the GC is a lot better in most cases. It's 
memory safe in a way that manual memory management can't be, and it frees 
you up from a lot of tedious stuff that often comes with manual memory 
management. But it's not a panacea either, and the fact that D provides 
ways to work around it when it does become a problem is a real boon.
 Any general guidance on how to optimise cpu usage particularly
 responsiveness.

 One pattern I used to use when writing service processes (server
 apps) is that of deferring compute tasks by using a kind of 'post
 this action' which adds an entry into a queue, the entry is a
 function address plus arg list and represents work to be done
 later. In the top loop, the app then executes these 'posted' jobs
 later at app-scheduled low priority relative to other activities
 and all handling of io and timer events, when it has nothing else
 to do, by simply calling through the function pointer in a post
 queue entry. So it's a bit like setting a timer for 0 ms, passing
 a callback function. Terminology - A DFC or lazy, late execution
 might be other terms. I'm wondering if using the garbage
 collector well might fit into this familiar pattern? That fair?
 And actually even help performance for me if I'm lucky?
I don't know. You'd probably have to try it and see. Predicting the 
performance characteristics of programs is generally difficult, and most 
programmers get it wrong a surprisingly large part of the time. That's part 
of why profiling is so important, much as most of us tend to forget about it 
until we run into a problem that absolutely requires it.

A big part of the question of whether the GC helps performance has to do 
with how much garbage you're producing and how quickly you churn through it. 
Allocating a bunch of stuff that you don't need to free for a while can 
definitely work better with a GC, but if you're constantly allocating and 
deallocating, then you can run into serious problems with both the GC and 
manual memory management, and which is worse is going to depend on a number 
of factors.

- Jonathan M Davis
Aug 28
prev sibling next sibling parent Mike Parker <aldacron gmail.com> writes:
On Tuesday, 29 August 2017 at 00:52:11 UTC, Cecil Ward wrote:
 I am vacillating - considering breaking a lifetime's C habits 
 and letting the D garbage collector make life wonderful by just 
 cleaning up after me and ruining my future C discipline by not 
 deleting stuff myself.
It's not a panacea, but it's also not the bogeyman some people make it out to be. You can let the GC do its thing most of the time and not worry about it. For the times when you do need to worry about it, there are tools available to mitigate its impact.
 I don't know when the GC actually gets a chance to run.
Only when memory is allocated from the GC, such as when you allocate via new, or use a built-in language feature that implicitly allocates (like array concatenation). And then, it only runs if it needs to.
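For illustration, a few common operations that allocate from the GC heap, and so give the GC an opportunity to run (a small sketch, nothing library-specific):

```d
void main()
{
    int[] a = [1, 2, 3];    // array literal: GC allocation
    auto  b = a ~ [4, 5];   // concatenation: allocates a new GC array
    a ~= 6;                 // appending may reallocate from the GC
    auto  c = new int[](8); // explicit allocation via new

    assert(b == [1, 2, 3, 4, 5]);  // b is its own copy
    assert(a == [1, 2, 3, 6]);     // appending left b untouched
}
```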
 I am wondering if deleting the usual bothersome 
 immediately-executed hand-written cleanup code could actually 
 improve performance in a sense in some situations. If the 
 cleanup is done later by the GC, then this might be done when 
 the processor would otherwise be waiting for io, in the top 
 loop of an app, say? And if so this would amount to moving the 
 code to be run effectively like 'low priority' app-scheduled 
 activities, when the process would be waiting anyway, so moving 
 cpu cycles to a later time when it doesn't matter. Is this a 
 reasonable picture?
When programming to D's GC, some of the same allocation strategies you use 
in C still apply. For example, in C you generally wouldn't allocate multiple 
objects in a critical loop because allocations are not cheap -- you'd 
preallocate them, possibly on the stack, before entering the loop. That same 
strategy is a win in D, but for a different reason -- if you don't allocate 
anything from the GC heap in the loop, then the GC won't run in the loop.

Multiple threads complicate the picture a bit. A background thread might 
trigger a GC collection when you don't want it to, but it's still possible 
to mitigate the impact. This is the sort of thing that isn't necessary to 
concern yourself with in the general case, but that you need to be aware of 
so you can recognize it when it happens. An example that I found interesting 
was the one Funkwerk encountered when the GC was causing their server to 
drop connections [1].
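The preallocation strategy might look like this in D (a sketch; the buffer size and loop count are arbitrary):

```d
void main()
{
    // One GC allocation, outside the hot loop.
    auto buf = new char[](4096);

    foreach (i; 0 .. 1_000)
    {
        // Fill and use buf here. With no `new`, no `~`, and no closures
        // in the loop body, the GC has no opportunity to run inside it.
        buf[0] = 'x';
    }
    assert(buf[0] == 'x');
}
```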
 If I carry on deleting objects / freeing / cleaning up as I'm 
 used to, without disabling the GC, am I just slowing my code 
 down? Plus (for all I know) the GC will use at least some 
 battery or possibly actually important cpu cycles in scanning 
 and finding nothing to do all the time because I've fully 
 cleaned up.
You generally don't delete or free GC-allocated memory. You can call destroy 
on GC-allocated objects, but that just calls the destructor and doesn't 
trigger a collection. And whatever you do with the C heap isn't going to 
negatively impact GC performance.

You can trigger a collection by calling GC.collect. That's a useful tool in 
certain circumstances, but it can also hurt performance by forcing 
collections when they aren't needed.

There are two fundamental mitigation strategies that you can follow in the 
general case: 1) minimize the number of allocations, and 2) keep the size of 
allocations as small as possible. The first decreases the number of 
opportunities for a collection to occur; the second helps keep collection 
times shorter. That doesn't mean you should always work to avoid the GC -- 
just be smart about how and when you allocate, as you would in C and C++.
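To illustrate the point about destroy: it runs the destructor immediately, but it neither frees the memory nor triggers a collection (a sketch; the Resource class is hypothetical):

```d
bool dtorRan;   // module-level flag so we can observe the destructor

class Resource
{
    ~this() { dtorRan = true; }
}

void main()
{
    auto r = new Resource;
    destroy(r);        // destructor runs now; the memory is NOT freed
    assert(dtorRan);   // the block is only reclaimed by a later collection
}
```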
 I suppose there might also be a difference in 
 cache-friendliness as cleaning up immediately by hand might be 
 working on hot memory, but the GC scanner coming along much 
 later might have to deal with cold memory, but it may not 
 matter if the activity is app-scheduled like low priority work 
 or is within time periods that are merely eating into io-bound 
 wait periods anyway.

 I definitely need to read up on this. Have never used a GC 
 language, just decades of C and mountains of asm.
You might start with the GC series on the D Blog [2]. The next post (Go Your Own Way Part Two: The Heap) is coming some time in the next couple of weeks.
 Any general guidance on how to optimise cpu usage particularly 
 responsiveness.
If it works for C, it works for D. Yes, the GC can throw you into a world of 
cache misses, but again, smart allocation strategies can minimize the impact.

Having worked quite a bit with C, Java, and D, my sense is that it's best to 
treat D more like C than Java. Java programmers have traditionally had 
little support for optimizing cache usage (there are libraries out there now 
that can help, and I hear there's movement to finally bring value-type 
aggregates to the language), and with the modern GC implementations as good 
as they are, it's recommended to avoid the strategies of the past (such as 
pooling and reusing objects) in favor of allocating as needed. In D, you 
have the tools to optimize cache usage (such as choosing contiguous arrays 
of efficiently laid out structs over GC-allocated classes), and the GC 
implementation isn't nearly as shiny as those available for Java. So I think 
it's more natural for a C programmer with little Java experience to write 
efficient code in D than the converse. Don't overthink it.
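As a small sketch of that cache-usage point: a contiguous array of structs keeps the data inline in a single GC block, whereas an array of class references is a block of pointers to separately allocated objects (the Point type here is illustrative):

```d
struct Point { double x = 0, y = 0; }   // value type, laid out inline

void main()
{
    // One contiguous GC allocation; iteration is sequential and
    // prefetch-friendly.
    auto pts = new Point[](1024);

    double sum = 0;
    foreach (ref p; pts)
        sum += p.x + p.y;
    assert(sum == 0);   // fields are default-initialised to 0 above

    // By contrast, `new PointClass[](1024)` (with `class PointClass`)
    // would give an array of references, each object being a separate
    // GC allocation scattered across the heap.
}
```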
 One pattern I used to use when writing service processes 
 (server apps) is that of deferring compute tasks by using a 
 kind of 'post this action' which adds an entry into a queue, 
 the entry is a function address plus arg list and represents 
 work to be done later. In the top loop, the app then executes 
 these 'posted' jobs later at app-scheduled low priority 
 relative to other activities and all handling of io and timer 
 events, when it has nothing else to do, by simply calling 
 through the function pointer in a post queue entry. So it's a 
 bit like setting a timer for 0 ms, passing a callback function. 
 Terminology - A DFC or lazy, late execution might be other 
 terms. I'm wondering if using the garbage collector well might 
 fit into this familiar pattern? That fair? And actually even 
 help performance for me if I'm lucky?
The GC will certainly simplify the implementation in that you can allocate 
your arg list and not worry about freeing it, but how it affects performance 
is anyone's guess. That largely depends on the points I raised above: how 
often you allocate and how much.

[1] https://dlang.org/blog/2017/07/28/project-highlight-funkwerk/
[2] https://dlang.org/blog/the-gc-series/
Aug 28
prev sibling next sibling parent Elronnd <elronnd em.slashem.me> writes:
On Tuesday, 29 August 2017 at 00:52:11 UTC, Cecil Ward wrote:
 I don't know when the GC actually gets a chance to run.
Another alternative that I *think* would work (maybe someone who knows a bit more about the GC can chime in?) is to manually stop the GC, then run collections when profiling shows that your memory usage is high. To get the GC functions, "import core.memory". To stop the GC, put "GC.disable()" at the top of main(). To trigger a collection, call "GC.collect()". That way you don't have to manually free everything -- it's just a line or two of code.
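A sketch of that approach (the loop, the work, and the every-1000-iterations trigger are arbitrary placeholders; in practice you would choose the trigger based on profiling, as suggested above):

```d
import core.memory : GC;

void main()
{
    GC.disable();   // stop automatic collections for the whole run

    foreach (i; 0 .. 10_000)
    {
        // ... handle one unit of work, allocating freely ...

        // Collect only at points of our own choosing -- here,
        // periodically, as a stand-in for "memory usage is high".
        if (i % 1_000 == 999)
            GC.collect();
    }
}
```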
Aug 28
prev sibling parent Jon Degenhardt <jond noreply.com> writes:
On Tuesday, 29 August 2017 at 00:52:11 UTC, Cecil Ward wrote:
 I am vacillating - considering breaking a lifetime's C habits 
 and letting the D garbage collector make life wonderful by just 
 cleaning up after me and ruining my future C discipline by not 
 deleting stuff myself.
The tsv command line tools I open-sourced haven't had any problems with GC. 
They are only one type of app, perhaps better suited to GC than other apps, 
but still, it is a reasonable data point. I've done rather extensive 
benchmarking against similar tools written in native languages, mostly C. 
The D tools were faster, often by significant margins. The important part is 
not that they were faster on any particular benchmark, but that they did 
well against a fair variety of tools written by a fair number of different 
programmers, including several standard unix tools. The tools were 
programmed using the standard library where possible, without resorting to 
low-level optimizations.

I don't know if the exercise says anything about GC vs manual memory 
management from the perspective of maximum possible code optimization. But I 
do think it is suggestive of benefits that may occur in more regular 
programming, in that GC allows you to spend more time on other aspects of 
your program, and less time on memory management details. That said, all the 
caveats, suggestions, etc. given by others in this thread apply to my 
programs too. GC is hardly a free lunch.

Benchmarks on the tsv utilities: https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md
Blog post describing some of the techniques used: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/

--Jon
Aug 28