
digitalmars.D - Easy & huge GC optimizations

reply Etienne <etcimon gmail.com> writes:
I was thinking of how the GC could be optimized further and came across 
some sweet flags that are barely used throughout Phobos, in favor of a 
witch-hunt against the GC:

GC.BlkAttr.NO_SCAN
GC.BlkAttr.NO_INTERIOR

When using the NO_SCAN attribute with GC.setAttr(p, NO_SCAN), you're 
basically doing removeRange on a GC allocation. It's never going to scan 
the memory in it, but the memory will stay alive if pointers are found 
pointing to anything inside it. This is very useful for strings! But I 
can't even find an example of a string with this flag. I'm totally baffled.

When using NO_INTERIOR attribute, you're telling the GC that nothing can 
point to the inside of the allocation if it's bigger than 4096 bytes, 
and to completely ignore scanning its contents in such case.

With these 2 attributes, one could write a simple recursive function in 
phobos that adjusts the flags on an object's allocations based on the 
type info.

Tuple!(int, int, int, string)[] bigTupleArray;
bigTupleArray.optimizeGC(); // sets NO_SCAN on ints, and on the pointee of string
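A minimal sketch of what such a helper could look like (the name optimizeGC and the flat, non-recursive handling are hypothetical; a real version would have to walk the TypeInfo recursively to reach the string pointee):

```d
import core.memory : GC;

// Hypothetical helper: mark an array's block NO_SCAN when its element
// type carries no GC pointers. TypeInfo.flags bit 0 is set when the
// type may contain pointers into GC memory.
void optimizeGC(T)(T[] arr)
{
    if ((typeid(T).flags & 1) != 0)
        return; // element type may hold pointers; must stay scanned
    auto base = GC.addrOf(arr.ptr); // attributes apply to the block base
    if (base !is null)
        GC.setAttr(base, GC.BlkAttr.NO_SCAN);
}
```

Note this only handles the outer array; recursing into aggregate members, as proposed above, is the harder part.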

Thoughts?
May 22 2014
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 22.05.2014 18:42, Etienne wrote:
 I was thinking of how the GC could be optimized further and came across
 some sweet flags that are barely used throughout Phobos in favor of a
 witch-hunting against the GC:

 GC.BlkAttr.NO_SCAN
 GC.BlkAttr.NO_INTERIOR

 When using the NO_SCAN attribute with GC.setAttr(p, NO_SCAN), you're
 basically doing removeRange on a GC allocation. It's never going to scan
 the memory in it, but the memory will stay alive if pointers are found
 pointing to anything inside it. This is very useful for strings! But I
 can't even find an example of a string with this flag. I'm totally baffled.

 When using NO_INTERIOR attribute, you're telling the GC that nothing can
 point to the inside of the allocation if it's bigger than 4096 bytes,
 and to completely ignore scanning its contents in such case.

 With these 2 attributes, one could write a simple recursive function in
 phobos that adjusts the flags on an object's allocations based on the
 type info.

 Tuple!(int, int, int, string)[] bigTupleArray;
 bigTupleArray.optimizeGC(); // sets NO_SCAN on ints, and on the pointee
 of string

 Thoughts?

rt.lifetime makes heavy use of "NO_SCAN" for all array handling (including strings). It uses the compiler-generated flag inside the TypeInfo. Classes also deal with this flag (see _d_newclass). I'm not sure where allocating a struct ends up, so maybe there is some potential there.

"NO_INTERIOR" is currently only used for the hash array used by associative arrays. It is a bit dangerous to use, as any pointer, slice or register still operating on the interior of the array is ignored, so collecting it might corrupt your memory.
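For reference, the attribute behavior can be observed directly through core.memory (a small sketch; a plain GC.malloc block is scanned conservatively by default):

```d
import core.memory : GC;

void main()
{
    // Request a non-scanned block explicitly, as rt.lifetime does for
    // arrays whose element type holds no pointers (e.g. char[]).
    auto p = GC.malloc(4096, GC.BlkAttr.NO_SCAN);
    assert(GC.getAttr(p) & GC.BlkAttr.NO_SCAN);

    // A plain allocation carries no NO_SCAN bit, so its contents are
    // scanned for pointers during marking.
    auto q = GC.malloc(4096);
    assert(!(GC.getAttr(q) & GC.BlkAttr.NO_SCAN));
}
```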
May 22 2014
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 22 May 2014 14:12:38 -0400, Rainer Schuetze <r.sagitario gmx.de>  
wrote:

 I'm not sure where allocating a struct ends up, so maybe there is some  
 potential there.

It's done similarly to arrays. The function is here: https://github.com/D-Programming-Language/druntime/blob/master/src/rt/lifetime.d#L991 -Steve
May 22 2014
prev sibling next sibling parent reply Etienne <etcimon gmail.com> writes:
On 2014-05-22 2:12 PM, Rainer Schuetze wrote:
 "NO_INTERIOR" is currently only used for the hash array used by
 associative arrays. It is a bit dangerous to use as any pointer,slice or
 register still operating on the array is ignored, so collecting it might
 corrupt your memory.

That's quite a relief, I was afraid of having to do it ;)

I'm currently exploring the possibility of sampling the pointers during marking to check if they're gone, and using Bayesian probabilities to decide whether or not to skip the pool. I explained it all here:

https://github.com/D-Programming-Language/druntime/pull/797#issuecomment-43896016

-- paste --
Basically, when marking, you take 1 in X of the references and send them to a specific array that represents the pool they refer to. Then, the next time you're going to collect, you test them individually, and if they're mostly still there, you skip marking/freeing for that particular pool during collection. You can force collection on certain pools every 1 in X collections to even out the average lifetime of the references.

You're going to want a lower certainty of failure for big allocations, but basically you're using probabilities to avoid putting a lot of useless load on the processor, especially when you're in a part of an application that's just allocating a lot (sampling will determine that the software is not in a state of data removal).

http://en.wikipedia.org/wiki/Bayes_factor
-- end paste --

The Bayes factor is merely there to choose the appropriate model that fits the program. Bayesian inference would take care of deciding if a pool should end up being marked. In other words, machine learning.

Would you think it'd be a good optimization opportunity?
May 22 2014
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 22.05.2014 21:04, Etienne wrote:
 On 2014-05-22 2:12 PM, Rainer Schuetze wrote:
 "NO_INTERIOR" is currently only used for the hash array used by
 associative arrays. It is a bit dangerous to use as any pointer,slice or
 register still operating on the array is ignored, so collecting it might
 corrupt your memory.

[...]

Hmm, I guess I don't get the idea. You cannot skip a pool based on some statistics; you might have references in there to anything. As a result, you cannot collect anything.
May 22 2014
next sibling parent reply Etienne <etcimon gmail.com> writes:
On 2014-05-23 2:17 AM, Rainer Schuetze wrote:
 On 22.05.2014 21:04, Etienne wrote:
 On 2014-05-22 2:12 PM, Rainer Schuetze wrote:
 "NO_INTERIOR" is currently only used for the hash array used by
 associative arrays. It is a bit dangerous to use as any pointer,slice or
 register still operating on the array is ignored, so collecting it might
 corrupt your memory.

[...]

Hmm, I guess I don't get the idea. You cannot skip a pool based on some statistics, you might have references in there to anything. As a result you cannot collect anything.

It only skips the inner search of the pool, like marking it NO_SCAN, if a sample of the pointers that pointed into it are still alive. I mean, why would you want to check the pointers and mark every page in a memory zone when you know they're probably all still there anyway?

The idea is that you could manage to avoid collection altogether during periods of high allocation. There's no other way to guess it, and specifying GC.disable() before making allocations is a little too verbose to consider it an optimization of the GC.
May 23 2014
parent reply Etienne <etcimon gmail.com> writes:
On 2014-05-23 12:33 PM, Etienne wrote:
 It only skips the inner search of the pool, like marking it NO_SCAN if a
 sample of the pointers that pointed to it are still alive.

Sorry, that's not true. It's like marking it NO_INTERIOR while it is still scanned. By default, all the pages would be marked if the sample pointers into the pool are still alive. And so the objective is to be able to skip collections. How many collections are executed only to recover 1-2% of the memory?
May 23 2014
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 23.05.2014 18:41, Etienne wrote:
 On 2014-05-23 12:33 PM, Etienne wrote:
 It only skips the inner search of the pool, like marking it NO_SCAN if a
 sample of the pointers that pointed to it are still alive.

Sorry that's not true. It's like marking it NO_INTERIOR while it being still SCAN. By default, all the pages would be marked if the sample pointers to the pool are still alive.

Ok, so based on the samples, you skip fine-grained marking of the objects inside the pool. The memory still needs to be scanned for references, though. I don't think this buys you a lot: you will have to scan more memory than when you can skip allocations that are no longer referenced.

BTW: How do you detect that the sample pointers are alive? Or do you mean just the roots?
 And so the objective is to be able to skip collections. How many
 collections are executed only to recover only 1-2% of the memory?

I agree it would be good to have a measure to detect if it makes sense to run the full collection at all. I don't see how the sample pointers help here.
May 23 2014
parent reply Etienne <etcimon gmail.com> writes:
On 2014-05-23 2:14 PM, Rainer Schuetze wrote:
 BTW: How do you detect the sample pointers are alive? Or do you mean
 just the roots?

You store a void** and the original value, then dereference it to see if it still holds the same value as the original. Loop through 20 of those if you have 500, and you update them during marking by taking 1 in 20. There's mathematical proof that if you don't have 5 of those dead, you're good to go and don't need to collect. Being flexible enough, you can skip 9 in 10 collections altogether, imo.

See this:
https://github.com/D-Programming-Language/druntime/pull/803

The mathematical proof is at pages 154-183 of:
http://books.google.ca/books?id=b-XFrpBQ7d0C&lpg=PA119&pg=PA154#v=onepage&q&f=false
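A rough sketch of that bookkeeping (all names hypothetical; it only checks whether each sampled slot still holds the value seen at sampling time, nothing more):

```d
// Sketch of the proposed sampling bookkeeping (names hypothetical).
struct Sample
{
    void** slot;     // address where a reference was observed
    void*  original; // the value it held when sampled
}

// True when few enough sampled slots have changed that marking of the
// corresponding pool could plausibly be skipped this cycle.
bool poolLooksAlive(Sample[] samples, size_t maxDead)
{
    size_t dead;
    foreach (s; samples)
        if (*s.slot !is s.original)
            ++dead;
    return dead <= maxDead;
}
```

Note that a slot still holding its old value says nothing about whether the object containing that slot is itself still reachable, which is the weakness discussed in the replies.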
May 23 2014
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 23.05.2014 20:52, Etienne wrote:
 On 2014-05-23 2:14 PM, Rainer Schuetze wrote:
 BTW: How do you detect the sample pointers are alive? Or do you mean
 just the roots?

You store a void** and the original value, dereference it to see if it's the same value as the original. Loop through 20 of those if you have 500, and you update them during marking by taking 1 in 20.

Looking at the address where the reference is stored gives you a hint about modifications, but does not say anything about whether the object that contains this reference is still alive.
 There's mathematical proof that if you don't have 5 of those dead,
 you're good to go and don't need to collect.

You still have to scan/mark the whole memory. Collecting unmarked memory is the easy part that doesn't cost too much time.
 Being flexible enough, you can skip 9 in 10 collections altogether imo

 see this:
 https://github.com/D-Programming-Language/druntime/pull/803

 also mathematical proof is at page 154-183 of:
 http://books.google.ca/books?id=b-XFrpBQ7d0C&lpg=PA119&pg=PA154#v=onepage&q&f=false

AFAICT your test case does not measure garbage collection, but manual memory management using the GC as the memory manager. delete/free are not meant to be called by user code as these are unsafe operations.
May 23 2014
parent Etienne <etcimon gmail.com> writes:
On 2014-05-23 3:08 PM, Rainer Schuetze wrote:
 AFAICT your test case does not measure garbage collection, but manual
 memory management using the GC as the memory manager. delete/free are
 not meant to be called by user code as these are unsafe operations.

Yes, you're right. The test without the explicit calls to delete unveils 0.68% freed memory per collection cycle. It's even worse than I had imagined, but it fits with how badly the GC appeared to slow down my applications.
May 23 2014
prev sibling next sibling parent Etienne <etcimon gmail.com> writes:
On 2014-05-23 11:41 AM, John Colvin wrote:
 Bear in mind here that most code goes though a whole bunch of machine
 learning algorithms in the CPU itself. Like it or not, it has proved
 extremely successful.

I'm happy you're here to say this. Machine learning is the future of algorithms & heuristics, and it has to work its way up from the bottom; otherwise there's no real advantage to doing it only at a high level, e.g. for voice recognition.
May 23 2014
prev sibling parent reply Etienne <etcimon gmail.com> writes:
On 2014-05-23 1:29 PM, Chris wrote:
 I know that CPU's do a good bit of guessing. But that's not the same
 thing. If they err, they make up for it ("Ooops, it's not in the cache!
 Will get it from HD, just a nanosec!"). If the GC errs, how do you make
 up for it? Please educate me.

If the GC errs, worst case you lose a few bytes of free space (a type 1 error: it skips collection for an object). If it weren't already known, the worst case would be that destructors are not guaranteed to be called... but that's taken for granted now. The worst case will never be an access violation, even with machine learning.
May 23 2014
parent Etienne <etcimon gmail.com> writes:
On 2014-05-23 2:08 PM, Chris wrote:
 Fair enough. But what about programs that allocate a lot and run for
 ages (a server app for example)?

A server app? You couldn't have asked me for a better example. You can see my native events fork here (I'm working on replacing libevent):

https://github.com/globecsys/vibe.d/tree/native-events/source/vibe/core/events

I actually need to improve the GC because of this and a cache library I'm making for vibe.d:

https://github.com/globecsys/cache.d

If you have 10,000 connections, they each create a 64KB buffer in the GC, and if you don't want to risk a collection every one in a few times a new connection comes in, you need to use some sampling. Using free lists and manual memory management is a bad idea, because you're going to need to copy segments into the GC anyway if you want to extend the lifetime of data received over the network. I have to create a serialized copy of that data because of manual management. I could store just a pointer if I didn't need to go around the GC, which loses 95% of its time collecting absolutely nothing.
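Since such a buffer holds raw network bytes and no pointers, it could be taken out of scanning entirely (a sketch; the function name is hypothetical):

```d
import core.memory : GC;

// Hypothetical per-connection buffer: 64 KiB of raw bytes the GC
// never needs to scan for pointers.
ubyte[] makeConnectionBuffer()
{
    enum size = 64 * 1024;
    auto p = cast(ubyte*) GC.malloc(size, GC.BlkAttr.NO_SCAN);
    return p[0 .. size];
}
```

With 10,000 connections, that keeps 640 MB of buffer space out of every mark phase, though the blocks themselves are still collected normally.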
May 23 2014
prev sibling next sibling parent "Chris" <wendlec tcd.ie> writes:
On Friday, 23 May 2014 at 06:17:43 UTC, Rainer Schuetze wrote:
 On 22.05.2014 21:04, Etienne wrote:
 On 2014-05-22 2:12 PM, Rainer Schuetze wrote:
 "NO_INTERIOR" is currently only used for the hash array used 
 by
 associative arrays. It is a bit dangerous to use as any 
 pointer,slice or
 register still operating on the array is ignored, so 
 collecting it might
 corrupt your memory.

[...]

Hmm, I guess I don't get the idea. You cannot skip a pool based on some statistics, you might have references in there to anything. As a result you cannot collect anything.

I'm not a fan of machine learning, especially not in cases you _can_ control, like memory allocation/deallocation. Guessing is not a good strategy if you can have control over something. Machine learning is only good for vast and unpredictable data (voice recognition, for example); then it makes sense to apply probability. But if you can actually control what you are doing, why would you want to rely on a stupid and blind machine that decides things for you based on a probability of n%? Things can go wrong and you don't even know why.

Mind you, we should rule the machines, not the other way around.
May 23 2014
prev sibling next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 23 May 2014 at 13:43:53 UTC, Chris wrote:
 On Friday, 23 May 2014 at 06:17:43 UTC, Rainer Schuetze wrote:
 On 22.05.2014 21:04, Etienne wrote:
 On 2014-05-22 2:12 PM, Rainer Schuetze wrote:
 "NO_INTERIOR" is currently only used for the hash array used 
 by
 associative arrays. It is a bit dangerous to use as any 
 pointer,slice or
 register still operating on the array is ignored, so 
 collecting it might
 corrupt your memory.

[...]

Hmm, I guess I don't get the idea. You cannot skip a pool based on some statistics, you might have references in there to anything. As a result you cannot collect anything.

[...]

Bear in mind here that most code goes through a whole bunch of machine learning algorithms in the CPU itself. Like it or not, it has proved extremely successful.
May 23 2014
prev sibling next sibling parent "Chris" <wendlec tcd.ie> writes:
On Friday, 23 May 2014 at 15:41:39 UTC, John Colvin wrote:
 On Friday, 23 May 2014 at 13:43:53 UTC, Chris wrote:
 On Friday, 23 May 2014 at 06:17:43 UTC, Rainer Schuetze wrote:
 On 22.05.2014 21:04, Etienne wrote:
 On 2014-05-22 2:12 PM, Rainer Schuetze wrote:
 "NO_INTERIOR" is currently only used for the hash array 
 used by
 associative arrays. It is a bit dangerous to use as any 
 pointer,slice or
 register still operating on the array is ignored, so 
 collecting it might
 corrupt your memory.

[...]

Hmm, I guess I don't get the idea. You cannot skip a pool based on some statistics, you might have references in there to anything. As a result you cannot collect anything.

[...]

Bear in mind here that most code goes though a whole bunch of machine learning algorithms in the CPU itself. Like it or not, it has proved extremely successful.

What I'm saying is that in cases where you do have control, you should not transfer it to the machine. Either you free memory yourself with free(), or the GC mechanism is exact and does not "assume" things. This could cause inexplicable random bugs.

I remember that the manual for the GC introduced in Objective-C said something like: "Some objects may never be collected." I'm not an expert on GCs, far from it, but I didn't like the sound of it.

I know that CPUs do a good bit of guessing. But that's not the same thing. If they err, they make up for it ("Oops, it's not in the cache! Will get it from the HD, just a nanosec!"). If the GC errs, how do you make up for it? Please educate me.
May 23 2014
prev sibling next sibling parent "Chris" <wendlec tcd.ie> writes:
On Friday, 23 May 2014 at 17:38:27 UTC, Etienne wrote:
 On 2014-05-23 1:29 PM, Chris wrote:
 I know that CPU's do a good bit of guessing. But that's not 
 the same
 thing. If they err, they make up for it ("Ooops, it's not in 
 the cache!
 Will get it from HD, just a nanosec!"). If the GC errs, how do 
 you make
 up for it? Please educate me.

If the GC errs, worst case you lose a few bytes of free space (Type 1 Error: skips collection for an object). If it weren't already known, the worst case would be that destructors are not guaranteed to be called.. but that's taken for granted now. Worst case will never be access violation even through machine learning

Fair enough. But what about programs that allocate a lot and run for ages (a server app for example)?
May 23 2014
prev sibling parent "Ola Fosheim Grøstad" writes:
On Friday, 23 May 2014 at 15:41:39 UTC, John Colvin wrote:
 Bear in mind here that most code goes though a whole bunch of 
 machine learning algorithms in the CPU itself. Like it or not, 
 it has proved extremely successful.

What kind of machine learning? Branch prediction?
May 23 2014