www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Possible quick win in GC?

reply "Abdulhaq" <alynch4047 gmail.com> writes:
Perhaps I've too had much caffeine today but I've had an idea 
which might give a fairly quick win on the GC speed situation. 
It's a simple idea at heart so it's very possible/likely that 
this is a well known idea that has already been discarded but 
anyway here goes.

I got the idea after thinking that it should be fairly simple for 
the compiler to detect straightforward cases of when a variable 
can be declared as going on the stack - i.e. no references to it 
are retained after its enclosing function returns. At the moment 
AIUI it is necessary for a class instance to be declared by the 
programmer as 'scoped' for this to take place.

Further, I was considering the type of ownership and boundary 
considerations that could be used to improve memory management - 
e.g. using the notion of an owner instance which, upon 
destruction, destroys all owned objects.

Afer some consideration it seems to me that by using only static 
analysis a tree of references could be constructed of references 
from a root 'scoped' object to all referred to objects that are 
allocated after the allocation of the root object. When the root 
object goes out of scope it is destroyed and all the descendent 
objects from the root object (as identified by the static 
analysis) could also be destroyed in one simple shot. The static 
analysis of course constructs the tree by analysing the capturing 
of references from one object to another. It could be the case 
that even a simple static analysis at first (e.g. discard the 
technique in difficult situations) could cover a lot of use cases 
(statistically).

Of course, if one of the descendent objects is referred to by an 
object which is not in the object tree, then this technique 
cannot be used. However, I envisage that there are many 
situations where upon the destruction of a root object all 
related post-allocated objects can also be destroyed.

In terms of implementation I see this being done by what I am 
calling 'bands' within the GC. With the allocation of any 
identified root object, a new band (heap) is created in the GC. 
Child objects of the root object (i.e. only referred to by the 
root object and other child objects in its tree) are placed in 
the same band. When the root object goes out of scope the entire 
band is freed. This by definition is safe because the static 
analysis has ensured that there are no 'out-of-tree' references 
between child objects in the tree and out-of-tree (out-of-band) 
objects. This property also means that normal GC runs do not need 
to add the scoped root object as a GC root object - this memory 
will normally only be freed when the scoped root object at the 
top of the tree goes out of scope. If memory becomes constrained 
then the bands can be added as root objects to the GC and memory 
incrementally freed just as with regularly allocated objects.


Sorry if this idea is daft and I've wasted your time!
Sep 28 2014
next sibling parent reply "Abdulhaq" <alynch4047 gmail.com> writes:
Here's a code snippet which mopefully makes things a bit clearer:


/**
* In this example the variable foo can be statically analysed as 
safe to go on the stack.
* The new instance of Bar allocated in funcLevelB is only 
referred to by foo. foo can
* be considered a root 'scoped' variable and the GC can delete 
both foo and the new Bar()
* when foo goes out of scope. There is no need (except when under 
memory pressure) for
* the GC to scan the band created for foo and it's related child 
allocations.
*/

import std.stdio;

class Bar {
	public:
	int x;
	
	this(int x) {
		this.x = x;
	}
}

class Foo {
	public:
	Bar bar;
}

void funcLevelA() {
	Foo foo = new Foo(); // static analysis could detect this as 
able to go on the stack
	funcLevelB(foo);
	writeln(foo.bar.x);
}

void funcLevelB(Foo foo) {
	foo.bar = new Bar(12); // this allocated memory is only referred 
to by foo, which
						   // static analysis has established can go on the stack	
}

void main() {
	funcLevelA();
}
Sep 28 2014
parent reply "Freddy" <Hexagonalstar64 gmail.com> writes:
On Sunday, 28 September 2014 at 17:47:42 UTC, Abdulhaq wrote:
 Here's a code snippet which mopefully makes things a bit 
 clearer:


 /**
 * In this example the variable foo can be statically analysed 
 as safe to go on the stack.
 * The new instance of Bar allocated in funcLevelB is only 
 referred to by foo. foo can
 * be considered a root 'scoped' variable and the GC can delete 
 both foo and the new Bar()
 * when foo goes out of scope. There is no need (except when 
 under memory pressure) for
 * the GC to scan the band created for foo and it's related 
 child allocations.
 */

 import std.stdio;

 class Bar {
 	public:
 	int x;
 	
 	this(int x) {
 		this.x = x;
 	}
 }

 class Foo {
 	public:
 	Bar bar;
 }

 void funcLevelA() {
 	Foo foo = new Foo(); // static analysis could detect this as 
 able to go on the stack
 	funcLevelB(foo);
 	writeln(foo.bar.x);
 }

 void funcLevelB(Foo foo) {
 	foo.bar = new Bar(12); // this allocated memory is only 
 referred to by foo, which
 						   // static analysis has established can go on the stack	
 }

 void main() {
 	funcLevelA();
 }
You mean this? https://en.wikipedia.org/wiki/Escape_analysis
Sep 28 2014
parent "Abdulhaq" <alynch4047 gmail.com> writes:
 You mean this?
 https://en.wikipedia.org/wiki/Escape_analysis
Of course my proposal uses the techique of escape analysis as part of its methodology, but the essence of the idea is to greatly cut down on the work that the GC has to do on each sweep when dealing with objects that have been found to belong to a particular set. The objects in each set are in an object graph that has no incoming references from objects external to the set and which can therefore be allocated in their own heap that is destroyed when the root object goes out of scope. The saving takes place because the GC does not need to scan the default heap for pointers found in the new heaps (bands). For certain type of programs such as compilers / lexers / parsers where many temporary objects are allocated and shortly after deallocated this can result in a substantial time saving in execution. In terms of memory usage we would see multiple potentially large but short-lived spikes.
Sep 28 2014
prev sibling next sibling parent reply "David Nadlinger" <code klickverbot.at> writes:
On Sunday, 28 September 2014 at 16:29:45 UTC, Abdulhaq wrote:
 I got the idea after thinking that it should be fairly simple 
 for the compiler to detect straightforward cases of when a 
 variable can be declared as going on the stack - i.e. no 
 references to it are retained after its enclosing function 
 returns.
LDC does the "fairly simple" part of this already in a custom LLVM optimizer pass. The issue is that escape analysis is fairly hard in general, and currently even more limited because we only do it on the LLVM IR level (i.e. don't leverage any additional attributes like scope, pure, … that might be present in the D source code). David
Sep 28 2014
parent reply "Abdulhaq" <alynch4047 gmail.com> writes:
On Sunday, 28 September 2014 at 20:20:29 UTC, David Nadlinger 
wrote:
 On Sunday, 28 September 2014 at 16:29:45 UTC, Abdulhaq wrote:
 I got the idea after thinking that it should be fairly simple 
 for the compiler to detect straightforward cases of when a 
 variable can be declared as going on the stack - i.e. no 
 references to it are retained after its enclosing function 
 returns.
LDC does the "fairly simple" part of this already in a custom LLVM optimizer pass. The issue is that escape analysis is fairly hard in general, and currently even more limited because we only do it on the LLVM IR level (i.e. don't leverage any additional attributes like scope, pure, … that might be present in the D source code). David
That's interesting, yes I guessed that the escape analysis would present the harder part, but I'm hoping that the algorithm can be built up incrementally, identifying the easy wins first and then over time extending it to cover harder cases. One way that I see it working it is to conduct a form of lowering where the new operator has some information added to it to indicate the 'band' that the GC should place the non-root objects into (root objects go on the stack). Using the syntax of C++'s placement new (but totally different semantics) code could be lowered to e.g. External externalObj = new(0) External(); // 0 means use the default heap Foo foo = new(0x1234) Foo(); // 0x1234 is the heap/band id for this set of objects ... Bar bar = new (0x1234) Bar(); When the GC allocates memory it does so in the indicated band/heap, and then when foo (the root object of the object graph) goes out of scope the relevant band/heap is destroyed en bloc. The benefit of the idea is that when scanning for objects that can be deleted the GC does not need to consider those objects in the non default bands/heaps. For some classes of programs such as compilers (it was Higgs that gave me the stimulus), and with good static analysis (aye there's the rub cap'n) this could represent a very substantial time saving on ech GC sweep.
Sep 29 2014
parent reply "Mike" <none none.com> writes:
On Monday, 29 September 2014 at 07:03:29 UTC, Abdulhaq wrote:
 On Sunday, 28 September 2014 at 20:20:29 UTC, David Nadlinger 
 wrote:
 On Sunday, 28 September 2014 at 16:29:45 UTC, Abdulhaq wrote:
 I got the idea after thinking that it should be fairly simple 
 for the compiler to detect straightforward cases of when a 
 variable can be declared as going on the stack - i.e. no 
 references to it are retained after its enclosing function 
 returns.
LDC does the "fairly simple" part of this already in a custom LLVM optimizer pass. The issue is that escape analysis is fairly hard in general, and currently even more limited because we only do it on the LLVM IR level (i.e. don't leverage any additional attributes like scope, pure, … that might be present in the D source code). David
That's interesting, yes I guessed that the escape analysis would present the harder part, but I'm hoping that the algorithm can be built up incrementally, identifying the easy wins first and then over time extending it to cover harder cases. One way that I see it working it is to conduct a form of lowering where the new operator has some information added to it to indicate the 'band' that the GC should place the non-root objects into (root objects go on the stack). Using the syntax of C++'s placement new (but totally different semantics) code could be lowered to e.g. External externalObj = new(0) External(); // 0 means use the default heap Foo foo = new(0x1234) Foo(); // 0x1234 is the heap/band id for this set of objects ... Bar bar = new (0x1234) Bar(); When the GC allocates memory it does so in the indicated band/heap, and then when foo (the root object of the object graph) goes out of scope the relevant band/heap is destroyed en bloc. The benefit of the idea is that when scanning for objects that can be deleted the GC does not need to consider those objects in the non default bands/heaps. For some classes of programs such as compilers (it was Higgs that gave me the stimulus), and with good static analysis (aye there's the rub cap'n) this could represent a very substantial time saving on ech GC sweep.
Sounds a little like http://wiki.dlang.org/DIP46 Mike
Sep 29 2014
parent "Abdulhaq" <alynch4047 gmail.com> writes:
On Monday, 29 September 2014 at 09:45:38 UTC, Mike wrote:
 On Monday, 29 September 2014 at 07:03:29 UTC, Abdulhaq wrote:
 On Sunday, 28 September 2014 at 20:20:29 UTC, David Nadlinger 
 wrote:
 On Sunday, 28 September 2014 at 16:29:45 UTC, Abdulhaq wrote:
 I got the idea after thinking that it should be fairly 
 simple for the compiler to detect straightforward cases of 
 when a variable can be declared as going on the stack - i.e. 
 no references to it are retained after its enclosing 
 function returns.
LDC does the "fairly simple" part of this already in a custom LLVM optimizer pass. The issue is that escape analysis is fairly hard in general, and currently even more limited because we only do it on the LLVM IR level (i.e. don't leverage any additional attributes like scope, pure, … that might be present in the D source code). David
That's interesting, yes I guessed that the escape analysis would present the harder part, but I'm hoping that the algorithm can be built up incrementally, identifying the easy wins first and then over time extending it to cover harder cases. One way that I see it working it is to conduct a form of lowering where the new operator has some information added to it to indicate the 'band' that the GC should place the non-root objects into (root objects go on the stack). Using the syntax of C++'s placement new (but totally different semantics) code could be lowered to e.g. External externalObj = new(0) External(); // 0 means use the default heap Foo foo = new(0x1234) Foo(); // 0x1234 is the heap/band id for this set of objects ... Bar bar = new (0x1234) Bar(); When the GC allocates memory it does so in the indicated band/heap, and then when foo (the root object of the object graph) goes out of scope the relevant band/heap is destroyed en bloc. The benefit of the idea is that when scanning for objects that can be deleted the GC does not need to consider those objects in the non default bands/heaps. For some classes of programs such as compilers (it was Higgs that gave me the stimulus), and with good static analysis (aye there's the rub cap'n) this could represent a very substantial time saving on ech GC sweep.
Sounds a little like http://wiki.dlang.org/DIP46 Mike
Ah thanks for the link yes there are definite similarities, Walter identifies sets of objects to go in his region through having a boundary on the pure function. My set is determined by static analysis and does not require the function to be pure. However, thedeemon has pointed out that this technique is in fact well known - I'm left to wonder if I should have a go at an implementation, but my wife would not be too chuffed about that.
Sep 29 2014
prev sibling parent reply "thedeemon" <dlang thedeemon.com> writes:
On Sunday, 28 September 2014 at 16:29:45 UTC, Abdulhaq wrote...

Congratulations on inventing
http://en.wikipedia.org/wiki/Region-based_memory_management
and "Region inference" in particular.
Sep 29 2014
parent reply "Abdulhaq" <alynch4047 gmail.com> writes:
On Monday, 29 September 2014 at 11:02:12 UTC, thedeemon wrote:
 On Sunday, 28 September 2014 at 16:29:45 UTC, Abdulhaq wrote...

 Congratulations on inventing
 http://en.wikipedia.org/wiki/Region-based_memory_management
 and "Region inference" in particular.
Ah yes - it's identical isn't it - I don't know whether to be happy or sad about that :). They explain it much better too... So, (does anyone know) has this technique been discarded for D or is it 'just' a matter of the resources to do it?
Sep 29 2014
parent reply "thedeemon" <dlang thedeemon.com> writes:
On Monday, 29 September 2014 at 11:36:51 UTC, Abdulhaq wrote:

 So, (does anyone know) has this technique been discarded for D 
 or is it 'just' a matter of the resources to do it?
From the literature on this topic I remember attempts for automatic region inference mostly failed: it lead to some small regions here and there that didn't affect anything much and then one huge region where most of the data landed, requiring a full-blown GC inside that big region. Making it work in D, where everything can be mutated by anything and any type system wall is a cast() away from breaking, seems virtually impossible.
Sep 29 2014
parent "Abdulhaq" <alynch4047 gmail.com> writes:
On Monday, 29 September 2014 at 12:02:40 UTC, thedeemon wrote:
 On Monday, 29 September 2014 at 11:36:51 UTC, Abdulhaq wrote:

 So, (does anyone know) has this technique been discarded for D 
 or is it 'just' a matter of the resources to do it?
From the literature on this topic I remember attempts for automatic region inference mostly failed: it lead to some small regions here and there that didn't affect anything much and then one huge region where most of the data landed, requiring a full-blown GC inside that big region. Making it work in D, where everything can be mutated by anything and any type system wall is a cast() away from breaking, seems virtually impossible.
Mmm yes I can imagine that being the case... oh well!
Sep 30 2014