
digitalmars.D - Garbage collection, and practical strategies to avoid allocation

reply Manu <turkeyman gmail.com> writes:

So let's talk about garbage collection, and practical strategies to avoid
allocation.

GC related discussions come up basically every day, perhaps multiple times
a day on IRC, and the recent reddit 2.063 release thread is dominated by
C++ programmers who are keenly interested in D, but are scared by the GC.
I can say with confidence, as someone who has gone out on a limb and
actually invested a lot of time and energy in D, I'm as nervous (or more
so) as they are, and feel their turmoil deeply!

So where are we?

I can only speak from my industry's perspective. As I see it, we are here:
 - Stopping the world is unacceptable
 - Scanning the whole heap is costly
 - It also seems extremely wasteful to scan the whole heap when the overall
temporal stability of a realtime heap is extremely high (just a few temps
allocated from frame to frame on average, and mostly on the stack!)
 - GC runs at unpredictable moments
 - GC collection cycles become more and more frequent the less unallocated
memory overhead you have. Hint: video games usually run within kb of the
system's available memory. How often will full heap-scanning collections be
issued to collect a couple of transient/temporary allocations when there are
only a few kb of free memory? Conceivably, more than once per frame...
 - In a low-free-memory environment, what is the cumulative effect of
fragmentation? Can this be measured, or will it be a nasty surprise 2
months from shipping a 20-million dollar project? (Hint: a discovery of
this sort could very well ruin a project and destroy a company)

Basically nobody will have experienced these issues to their fullest extent
on a PC with plentiful memory. But they must be considered if the audience
still stuck in C++ (which I predict is D's greatest potential user-base) is
to take D seriously.

While I do think a sufficiently advanced GC might satisfy the realtime
environment, the more I think about it, the more I am thinking a GC is not
applicable to the embedded (or memory limited) environment.

So what options exist?

I'm thinking more and more that I like the idea of a ref-counting GC.
People argue that managing ref-counts may be slower. Perhaps true; it
requires a small amount of localised overhead, but if allocation frequency
is low, it shouldn't be much.
I think the key advantages though are:
 - determinism, memory will be immediately freed (or perhaps deferred by a
small but predictable amount of time, let's say, 1 frame)
 - elimination of full-heap scans, which take the most time
 - the refcount table is self-contained, won't destroy the dcache like a
heap scan
 - less tendency to fragment, since transient allocations can come into and
leave existence before something else allocates beside it
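To make the shape of this concrete, here is a minimal, hypothetical
ref-counting wrapper (names and layout are purely illustrative, not a
proposal for the actual GC design): copying a handle bumps a count, and the
last owner frees the payload immediately.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical minimal ref-counted box; illustrative only.
struct RC(T)
{
    private static struct Payload { T value; size_t count; }
    private Payload* p;

    this(T value)
    {
        p = cast(Payload*) malloc(Payload.sizeof);
        p.value = value;
        p.count = 1;
    }

    this(this) { if (p) ++p.count; }        // copying a handle bumps the count

    ~this()
    {
        if (p && --p.count == 0) free(p);   // deterministic: freed immediately
    }

    ref T get() { return p.value; }
}
```

Destruction is deterministic: the payload is released the moment the last
handle goes out of scope, with no heap scan involved. (Cycles, of course,
are exactly the part this alone cannot handle.)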

But this is only part of the problem.
Naturally, the fastest allocation is the one that never happened.
So an equally, or even more important aspect of the puzzle is offering
clearly documented advice, and convenient syntax/patterns on how to avoid
allocation in general.
It should be made EASY to avoid allocation. This way people will tend to do
it by habit, further alleviating the problem.
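As one concrete habit in that direction, here is a sketch using Phobos'
sformat, which formats into a caller-supplied buffer instead of allocating
(the function name frameLabel is just illustrative):

```d
import std.format : sformat;

// Format into a caller-supplied buffer; no GC allocation happens here.
const(char)[] frameLabel(char[] buf, int frame, int temps)
{
    // sformat returns the slice of buf that was actually written
    return sformat(buf, "frame %s: %s temps", frame, temps);
}
```

The caller can pass a stack buffer (e.g. `char[32] buf;` then
`frameLabel(buf[], 60, 3)`), so the whole operation stays off the GC heap.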

I've made the case that libraries should avoid surprise allocations at all
costs. Maybe this leads back to @nogc conversations (a concept I'm not
personally sold on), but something needs to be done, and
best-practices/conventions need to be defined.

So I'd go so far as to say, perhaps these 2 points should be considered as
key goals for 2.064?
  * find a solution for deterministic embedded garbage collection
  * decide some realistic best-practices/conventions for reliably (and
conveniently!) avoiding allocations

Showing progress on these will show real progress on D for ex-C++ users. In
my time using D, there has been exactly ZERO progress on the GC issue,
which is discouraging to say the least, perhaps even kinda scary.
(Fortunately, people are thinking about it, and there were 2 great talks at
DConf, but I don't think either addresses the specific issues I raise.)

Discuss... (or perhaps, "destroooy")

May 31 2013
next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
just to toss in my quick thoughts, I wrote a couple comments on 
the recent reddit thread about using D with a minimal runtime and 
some of the talk may be relevant here too:

http://www.reddit.com/r/programming/comments/1fc9jt/dmd_2063_the_d_programming_language_reference/ca94mek


Some little things we could do is add overloads to some functions 
that return string to be able to take a buffer argument too.

string to(T : string)(int a) { char[] buf = new char[](16); return 
assumeUnique(to(a, buf)); }

char[] to(int a, char[] buffer) { /* deposit it straight into 
buffer, return the slice of buffer that is actually used */ }


and so on for all kinds of functions. Kind of a pain in the butt, 
but when you need it, you have the second variant, and if you 
don't care, the convenient first one is still available (this 
also avoids breaking code!)


I also mentioned on irc yesterday that I think a good, low cost 
idea to help find allocations is to add a global flag to druntime 
that makes gc_malloc throw an Error. Then you can set this flag 
in a critical section of code and at least get a runtime notice 
of a missed allocation in testing while still having the gc for 
the rest of the app.
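Until such a flag exists, a rough user-level approximation is possible with
GC.stats from core.memory (note: this is a coarse after-the-fact check, not
the proposed throw-at-allocation behaviour, and the allocatedInCurrentThread
field is from present-day druntime):

```d
import core.memory : GC;

// Approximation of the proposed flag: snapshot GC stats around a critical
// section and fail if any GC allocation happened inside it.
void assertNoGCAlloc(scope void delegate() section)
{
    immutable before = GC.stats().allocatedInCurrentThread;
    section();
    assert(GC.stats().allocatedInCurrentThread == before,
           "GC allocation inside a supposedly allocation-free section");
}
```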

Another member also suggested you can do that easily enough by 
running the program in a debugger and setting a breakpoint at 
gc_malloc, but I think the flag would be simpler yet.
May 31 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-06-01 04:16, Adam D. Ruppe wrote:
 Some little things we could do is add overloads to some functions that
 return string to be able to take a buffer argument too.

 string to(T:string)(int a) { char[] buf = new char[](16); return
 assumeUnique(to(a, buffer));

 char[] to(int a, char[] buffer) { deposit it straight into buffer,
 return the slice into buffer that is actually used; }

This approach is used all over the place in Tango. -- /Jacob Carlborg
Jun 01 2013
prev sibling next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Saturday, 1 June 2013 at 02:16:02 UTC, Adam D. Ruppe wrote:
 just to toss in my quick thoughts, I wrote a couple comments on 
 the recent reddit thread about using D with a minimal runtime 
 and some of the talk may be relevant here too:

 http://www.reddit.com/r/programming/comments/1fc9jt/dmd_2063_the_d_programming_language_reference/ca94mek


 Some little things we could do is add overloads to some 
 functions that return string to be able to take a buffer 
 argument too.

 string to(T:string)(int a) { char[] buf = new char[](16); 
 return assumeUnique(to(a, buffer));

 char[] to(int a, char[] buffer) { deposit it straight into 
 buffer, return the slice into buffer that is actually used; }

I played around with adding an overload that accepted an output range to some of the std.string functions identified in my run of -vgc over phobos[1] (after Jonathan pointed out this is probably the best approach and is already what formattedWrite does). It worked fine, but it did make me realize there aren't a lot of output ranges available to plug in at the moment (appender and lockingTextWriter are the only two that come to mind, though there may be others). Appender isn't useful if your goal is to avoid the GC. Array!char et al aren't output ranges (whether they should be or not I have no idea). Static arrays would need some sort of wrapper to make them output ranges, I believe, unless it was decided that put() should work by replacing the front and calling popFront for them (which I kind of doubt is the desired behavior).

(feel free to correct me on any of this, range experts)

1. http://goo.gl/HP78r
May 31 2013
prev sibling next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Saturday, 1 June 2013 at 02:47:40 UTC, Brad Anderson wrote:
static
 arrays would need some sort of wrapper to make them output 
 ranges I believe unless it was decided that put() should work 
 by replacing the front and calling popFront for them (which I 
 kind of doubt is the desired behavior).

 (feel free to correct me on any of this, range experts)

Actually I'm completely wrong about this. Slices already work this way and you can use a slice of a static array just fine as an output range.
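Concretely, put advances the slice as it writes, so the written prefix is
whatever the slice no longer covers (a small sketch; writeGreeting is just
an illustrative name):

```d
import std.range : put;

// Write through a slice (e.g. of a static array); put pops elements off
// the front of the sink as it fills them.
char[] writeGreeting(char[] buf)
{
    char[] sink = buf;
    put(sink, "hi");
    put(sink, '!');
    return buf[0 .. buf.length - sink.length];   // the written prefix
}
```

Called with a static array, e.g. `char[8] b; writeGreeting(b[]);`, nothing
here touches the GC heap.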
May 31 2013
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, June 01, 2013 04:47:39 Brad Anderson wrote:
 I played around with adding an overload that accepted an output
 range to some of the std.string functions identified in my run of
 -vgc over phobos[1] (after Jonathan pointed out this is probably
 the best approach and is already what formattedWrite does).  It
 worked fine but it did make me realize there aren't a lot of
 output ranges available to plug in at the moment (appender and
 lockingTextWriter are the only two that come to mind though there
 may be others).  Appender isn't useful if your goal is to avoid
 the GC.  Array!char et al aren't output ranges (whether they
 should be or not I have no idea).  static arrays would need some
 sort of wrapper to make them output ranges I believe unless it
 was decided that put() should work by replacing the front and
 calling popFront for them (which I kind of doubt is the desired
 behavior).
 
 (feel free to correct me on any of this, range experts)
 
 1. http://goo.gl/HP78r

Dynamic arrays are output ranges. The one potential hitch there, though, relates to the fact that they get written to rather than appended to. This is actually exactly what you want in a situation like Manu's. However, that means that you have to worry about an output range running out of space and how you deal with that. If it's known how much will need to be written, presumably you can check length if hasLength!R is true. Otherwise, I guess that the right thing to do is to check empty (arrays get shrunk as they're written to, so they'll be empty when you can't call put on them anymore). Unfortunately, put doesn't seem to worry about the case where the output range is full/empty, so the result when calling put on an empty range is undefined.

The situation is even worse with narrow strings (assuming that put works with them - I'm not sure that it does at the moment), given that even if you knew their length (which you wouldn't if you were going by hasLength), you wouldn't know whether a put would succeed when the string was nearly empty, as the actual number of elements that the dchar would take up would depend on its value.

In general, I don't think that output ranges have really been sorted out on quite the level that input ranges have been, and I think that some discussion is in order with regards to how to handle things like when the range can't be put into anymore. Given that one reason to use output ranges is for performance-critical code that doesn't want to allocate, throwing when the range is empty is probably a bad idea, and it's unclear that we can reasonably determine in the general case whether you can put to an output range before you actually try it. One solution would be to make put return a bool indicating whether it succeeded or not, but it's an issue that definitely needs to be explored a bit.

So, I very much think that the correct thing is to use output ranges, but how to use them needs to be better defined, and we probably need to better sort out some of their details (like the finish function which the hash stuff uses).

- Jonathan M Davis
May 31 2013
prev sibling next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Saturday, 1 June 2013 at 04:16:44 UTC, Jonathan M Davis wrote:
 [snip]
 The situation is even
 worse with narrow strings (assuming that put works with them - 
 I'm not sure
 that it does at the moment) given that even if you knew their 
 length (which
 you wouldn't if you were going by hasLength), you wouldn't know 
 whether a put
 would succeed when the string was nearly empty, as the actual 
 number of
 elements that the dchar would take up would depend on its value.

Furthermore (you probably know this because of the conversation on GitHub, but for anyone else reading): narrow strings are also hard to work with because you can't put a dchar to them currently. Kenji had a pull request (since closed) to help with narrow string issues:

https://github.com/D-Programming-Language/phobos/pull/1000

That, and all the issues and unknowns you describe, would need to be fixed if output ranges are going to be the approach. I'm reminded of a forum post by monarch_dodra that talked about some of the issues you brought up:

http://forum.dlang.org/thread/xyvnifnetythvrhtcexm forum.dlang.org
May 31 2013
prev sibling next sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Saturday, 1 June 2013 at 02:03:07 UTC, Manu wrote:
 So let's talk about garbage collection, and practical 
 strategies to avoid
 allocation.

 Discuss... (or perhaps, "destroooy")

Here is my take: http://forum.dlang.org/post/tftjtzmfuauxwcgcolct forum.dlang.org Sorry, I didn't see your new discussion.
May 31 2013
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Manu:

  - Scanning the whole heap is costly

Rust avoids this by giving a different heap to each thread, plus a common heap for shared data, managed with unique references.
  - GC runs at unpredictable moments

Is this true? I think the D GC runs only when you allocate something. Bye, bearophile
Jun 01 2013
prev sibling next sibling parent "js.mdnq" <js_adddot+mdng gmail.com> writes:
Nothing will get done until someone decides to put in the effort 
to fix the problem. D's biggest drawback at this point is the GC 
and one would think with all the smart people around here someone 
would have solved this problem by now.

We need a solution that allows one to "plug and play" different 
allocation strategies. One shouldn't have to rely on any 
particularly bad GC implementation if they don't want to (but as 
of now are forced to). Stop-the-world GC is just plain crap 
except for the most basic apps. Any app that requires performance 
is going to have issues with this, the reason being simply that 
there are much better ways.


The ability to avoid the GC is of utmost importance for real time 
apps, regardless of what some want people to think. This can't 
currently be done, because D internally uses the GC, hence it 
must be worked around. @nogc/@gc has been brought up several 
times... it seems like a start and can't hurt.

IMO the best way to deal with the situation is to use manual 
memory management with a GC backend.

e.g., pretend the GC is not there just like the good ol' days. If 
you forget to deallocate something, maybe get a warning from a 
sufficiently intelligent compiler, if the GC is turned on then it 
will clean up for you behind the scenes. If you need performance, 
turn it off or disable it temporarily.

This is the best of both worlds and allows one to easily go from 
one extreme to the other. One can actually implement a pattern to 
allow the user to select one extreme or the other.

One can already do this in D in user code but since the D core 
does not follow this approach it doesn't do one much good unless 
they want to avoid slices, phobos, etc...

I think it will go a long way to get a standard for D regarding 
memory allocation that follows these lines. All new code will be 
written with using the standard and old code will be updated.
Jun 01 2013
prev sibling next sibling parent Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-06-01 02:02:53 +0000, Manu <turkeyman gmail.com> said:

   * find a solution for deterministic embedded garbage collection

I think reference counting, while still continuing to use the current GC to release cycles, is the way to go. It wouldn't be too hard to implement.

This could make it realistic to disable the GC entirely in a program if you need everything to be deterministic. The GC could still be of assistance as a debug tool, helping find those rogue cycles that shouldn't be there, by configuring it to emit logs about what it deallocates.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/
Jun 01 2013
prev sibling next sibling parent Benjamin Thaut <code benjamin-thaut.de> writes:
In my eyes there is just one reason there is no better GC yet: it 
requires compiler support.

And that's a huge problem. The number of people in this community 
who would actually be able to make the modifications to the 
compiler is very small, and they tend to not have much time to do 
it. A really good concurrent GC which properly deals with short-lived 
allocations would need the following compiler support.

1) Callback on pointer write (if the pointer is on the heap)
2) Support for precisely scanning the stack
3) Callback on assignment of __shared or static shared variables
4) Callback on cast to shared
5) Callback on new of a shared object

The only thing the compiler already supports is the RTInfo template, 
which can be used to precisely scan the heap, but that's not enough 
for a really good GC.

Kind Regards
Benjamin Thaut
Jun 01 2013
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday, June 01, 2013 09:43:49 bearophile wrote:
  - GC runs at unpredictable moments

Is this true? I think the D GC runs only when you allocate something.

Sure, but which of these calls to new is going to cause the GC to run?

auto a = new Foo;
...
auto b = new Bar;
...
auto c = new Baz;

It could be that all or none of them do. It all depends on what memory is available at the time and what exactly they're trying to allocate. So, while you may be able to know that the GC will only run when new is called, you don't know which new will trigger it, and it can (and probably will) vary on each run of the program.

Pretty much the only way to control it is to disable the GC and then explicitly do a collection when you can afford a collection to run (though in a program like those Manu typically writes, where you're generating something like 60 frames per second, the GC is pretty much never fast enough to be run - not with anything even vaguely like its current implementation).

- Jonathan M Davis
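A sketch of that disable-then-collect pattern (note: per the core.memory
documentation, GC.disable suppresses automatic collections, but the
implementation may still collect in an out-of-memory situation):

```d
import core.memory : GC;

// Run a frame with collections deferred, then collect at a chosen moment.
void frame(ref int[] scratch)
{
    GC.disable();
    scope (exit)
    {
        GC.enable();
        GC.collect();   // the collection runs when we decide, not mid-frame
    }
    scratch ~= 1;       // allocations still succeed while disabled
}
```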
Jun 01 2013
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Sat, 01 Jun 2013 07:10:07 -0400, Michel Fortin  
<michel.fortin michelf.ca> wrote:

 On 2013-06-01 02:02:53 +0000, Manu <turkeyman gmail.com> said:

   * find a solution for deterministic embedded garbage collection

I think reference counting while still continuing to use the current GC to release cycles is the way to go. It wouldn't be too hard to implement.

+1 I was going to write this exact post, but you already did :) -Steve
Jun 03 2013