www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Threading and the Garbage handler mess.

reply Alan Knowles <alan akbkhome.com> writes:
If anyone noticed this week I posted a few questions, and then a few bis 
of code relating to threading and the GC.

Between tearing my hair out and generally verging on total madness, this 
week I'm beginning to get to grips with the GC in a threaded environment.

Basically this statement is true for both phobos and tango (although 
I've examined the phobos GC far more than tango)

-------------------------------------------------
"THE GC AS DELIVERED IS COMPLETELY BROKEN FOR REAL THREADED APPLICATONS, 
and you should not try and write threaded applicaitons without expecting 
to do a large amount of hacking to the GC"
-------------------------------------------------

Background  - HOW THE GC WORKS:
-------------------------------------------------
let's first explain the basic ideas behind the Garbage collector:

Since most things D land are malloc'd via the GC - the GC knows all 
about the memory you have allocated.. - It can also look at the stack 
(currently running memory, eg. int/pointers that you have created in 
your methods)..

Basically what it does is this:

- loop through all the memory it knows about, see's if any of it points 
to any of the memory it has allocated..

- if the memory it has allocated has not been reference by the memory it 
knows about.. then it's dead meat, and can be returned to the  'stack' 
so it can be made available to any 'new' malloc() call.

There is quite a bit more in there, like a really nice design for 
buckets/ and linked lists that default fill it... enabling quick 
allocation depending on what size of memory is requested...

Anyway I hope you get the point..




The problems..
-------------------------------------------------
A) CROSS THREAD FREE's ARE EXTREMELY DANGEROUS

  The current GC implementations free across threads both explicity with 
delete, and implicitly with genCollect() which is just downright dangerous.

Consider this example - const char Gotcha

.......................................
extern (C) {
	void * x = mylib_new();
	void mylib_dosomething(void x* const char* z);
	
}
... d code ...
function risky_function(char[] a) {
	mylib_dosomething(mylib, a.toStringz());
}
.......................................


In the above code I've left const char* in there, as that's what's in 
the C headers, but is not actually in the D code.

What is critical about this code is that while mylib* x is active, and 
has been passed 'char* z' in D, as in the risky_function() examples. 
For the purpose of the garbage collecter which can only look at memory 
it knows about to find pointers to 'char *z' z is ready to be 
collected... This problem can easily crop up with the libc functions 
that expect const char*'s and store data which another method may 
queried later, and find has been overwritten with something completely 
different (imagine function pointers!!)

the only workaround for this is to addRoot() and store the return value 
to any 'toStringz()' call which is delivered into a const char* C call.. 
- even this may not help as you can accidentally delete these objects..

Normally all of this is not a common problem in a non-threaded 
application, as you would probably just run the genCollect() after you 
have finished dealing with mylib() - and everything will get cleaned up 
nicely.

Unfortunatly with a multi-threaded application - if you run genCollect() 
at the wrong time (eg. in another thread), you will destroy the char* 
that was supposed to be constant...


-------------------------------------------------
B) NOT RUNNING genCollect() JUST WILL KILL MEMORY
as in the example above, you may think that turning off the GC and not 
running getCollect(), they trying to free memory as you get it might be 
a better idea.

Unfortunatly generally D is dependant and expects you to be running the 
GC! a really good example is the toStringz() code.

what toStrings does is this
char* toStringz(char[] s) {
	char[] copy = new char[s.length+1];
	copy[0..s.length] = s;
	copy[s.length] =0; // pad end with \0
	return copy.ptr;
}

what char[] looks like under gdb is
struct { length = 123; ptr = "xxxxxxxxxx" }

the above code is probably doing a malloc() for the struct and a 
malloc() for the ptr data, but leaves the struct hanging and expected to 
be garbage collected, returning the ptr which is then sent on to a 
external method, and might be expected to be constant. ***Well not 100% 
sure about that behaviour, the struct may not be malloced, but just 
allocated on the stack..***

There are many other examples in D of code where trying to manage and 
clear memory explicitly is dificult / or near impossible (assoative 
arrays for example) - and the expectation is that the garbage collecter 
be used to sort it out..

The reality is that it's extremely difficult to be sure that you are not 
leaking memory, there are no built in tools currently for testing this 
(there is code in dsource = i've not tested it yet) , and it's kind of 
goes against the principle of D, where you can for the most part throw 
away all that memory allocation code, and concentrate on getting things 
done..

-------------------------------------------------
POSSIBLE SOLUTION?
-------------------------------------------------

What I've hacked into the GC so far:

- getIdx() for std.thread.Thread exposing the private thread.idx ???< 
how reliable is this??? - should be OK as long as you never delete a 
thread....

- A GC Log like Array for associating each bit of malloc()'d memory is 
owned by which thread.

- Checks in genCollect code to ensure that it does not try and free 
other thread's memory.

- Warnings in free() when the program does exactly that..

- Information in LOGGING so that malloc can be traced to a line of code 
  (class/method etc.- backtrace_symbols does not appear to return line 
numbers...) - see recent post


At this point running a threaded server under heavy load is alot more 
stable than before. however as you can understand by looking at the 
descriptions above. it still has some serious issues.

- Searching all the threads memory each time it wants to tidy memory a 
little is horribly inefficient..

- There are still issues with the const char* that should be checked for..

ONGOING IDEAS
-------------------------------------------------
- One gcx pool for each thread.

This seems like the most sensible solution.. - it would mean that the 
genCollect() run on a specific thread would only look at memory related 
to that thread.
** Implications
  + memory allocated in one thread and used in another could not be 
freed by that  thread. -- need to think more about this.
  ++ if you pass a complex structure when starting a thread, you will 
have to flag the  root in the parent thread to ensure that it's not 
freed by the starting thread.
  ++ the child thread will then need a way to flag so the starting 
thread knows it can  free it... -
  ++ making changes to a object in the child thread may make memory 
allocation  problematic.... -> the child thread does not know about the 
parent?!?!



- rooting the results from getStringz() - and warning users!!!!!
- using strncpy and removing the need for the struct?? ( not sure if it 
makes much difference - it's based on struct assumtion previously)

Suggested   code for toStringz()  = something like:

char* toStringz(char[] s) {
	char *c = malloc(s.length+1;
	strncpy(c, s.ptr, s.length);
	c[s.length]=0;
	gc.addRoot(c);
	return c;
}

free should check Root() to see if it's registed there, to prevent it 
being free'd by accident. (genCollect already does) an additional method 
rootFree() - which basically does removeRoot() + free()...


Anyway it'd be interested in feedback.
Regards
Alan
Sep 06 2008
next sibling parent reply downs <default_357-line yahoo.de> writes:
There's three trivial steps to solve ALL of this:

a) don't manually delete things, at all

b) store pointers that have been passed to C functions where the GC can see
them.

c) For memory that's explicitly supposed to _belong_ to the C functions, use
cstdlib malloc.

If you keep to these three points, things become very simple.
Sep 06 2008
parent Alan Knowles <alan akbkhome.com> writes:
downs wrote:
 There's three trivial steps to solve ALL of this:
 
 a) don't manually delete things, at all

 
 b) store pointers that have been passed to C functions where the GC can see
them. 

 
 c) For memory that's explicitly supposed to _belong_ to the C functions, use
cstdlib malloc.

d) avoid using ~this() as it's called from the gc, and you can not guarantee the state of memory of the object..
 If you keep to these three points, things become very simple.

Yes, very good advice. Regards Alan
Sep 09 2008
prev sibling next sibling parent Sean Kelly <sean invisibleduck.org> writes:
Alan Knowles wrote:
...
 
 The problems..
 -------------------------------------------------
 A) CROSS THREAD FREE's ARE EXTREMELY DANGEROUS
 
  The current GC implementations free across threads both explicity with 
 delete, and implicitly with genCollect() which is just downright dangerous.
 
 Consider this example - const char Gotcha
 
 .......................................
 extern (C) {
     void * x = mylib_new();
     void mylib_dosomething(void x* const char* z);
     
 }
 ... d code ...
 function risky_function(char[] a) {
     mylib_dosomething(mylib, a.toStringz());
 }
 .......................................
 
 
 In the above code I've left const char* in there, as that's what's in 
 the C headers, but is not actually in the D code.
 
 What is critical about this code is that while mylib* x is active, and 
 has been passed 'char* z' in D, as in the risky_function() examples. For 
 the purpose of the garbage collecter which can only look at memory it 
 knows about to find pointers to 'char *z' z is ready to be collected... 
 This problem can easily crop up with the libc functions that expect 
 const char*'s and store data which another method may queried later, and 
 find has been overwritten with something completely different (imagine 
 function pointers!!)

There are very few C standard library routines which store data that other functions reference, largely because they aren't thread-safe. So I don't think this is actually an issue in practice. But more generally, you're right. If you're using an external library that stores a reference to GCed data and you intend to discard your own reference to that data then you have to tell the GC not to collect it.
 -------------------------------------------------
 B) NOT RUNNING genCollect() JUST WILL KILL MEMORY
 as in the example above, you may think that turning off the GC and not 
 running getCollect(), they trying to free memory as you get it might be 
 a better idea.
 
 Unfortunatly generally D is dependant and expects you to be running the 
 GC! a really good example is the toStringz() code.
 
 what toStrings does is this
 char* toStringz(char[] s) {
     char[] copy = new char[s.length+1];
     copy[0..s.length] = s;
     copy[s.length] =0; // pad end with \0
     return copy.ptr;
 }
 
 what char[] looks like under gdb is
 struct { length = 123; ptr = "xxxxxxxxxx" }
 
 the above code is probably doing a malloc() for the struct and a 
 malloc() for the ptr data, but leaves the struct hanging and expected to 
 be garbage collected, returning the ptr which is then sent on to a 
 external method, and might be expected to be constant. ***Well not 100% 
 sure about that behaviour, the struct may not be malloced, but just 
 allocated on the stack..***

You're right. String operations in D necessarily assume the presence of a GC. Concatenation, appending, etc, all allocate new memory and leave the old memory untouched... assuming that it may still be referenced by something.
 -------------------------------------------------
 POSSIBLE SOLUTION?
 -------------------------------------------------
 
 What I've hacked into the GC so far:
 
 - getIdx() for std.thread.Thread exposing the private thread.idx ???< 
 how reliable is this??? - should be OK as long as you never delete a 
 thread....
 
 - A GC Log like Array for associating each bit of malloc()'d memory is 
 owned by which thread.
 
 - Checks in genCollect code to ensure that it does not try and free 
 other thread's memory.
 
 - Warnings in free() when the program does exactly that..

What about transferring data between threads? That aside, I think the new "shared" semantics in D2 address this fairly well.
 ONGOING IDEAS
 -------------------------------------------------
 - One gcx pool for each thread.

This needs language support to work well, since some restrictions must be placed on the use of static data if GC scanning of all threads is to be avoided. Sean
Sep 06 2008
prev sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Alan Knowles <alan akbkhome.com> wrote:
 "THE GC AS DELIVERED IS COMPLETELY BROKEN FOR REAL THREADED APPLICATONS

Isn't it a bit overstated? You basically say that memory could be prematurely freed if an external non-D function stores a pointer to GC-allocated memory internally. This is quite different from "COMPLETELY BROKEN" to me. You don't even mention that for both extern(C) and extern(Windows) pointers are pushed on stack so that memory is protected at least until the external function returns. Well it *can* overwrite its arguments with an unrelated data, but in all other scenarios your memory is protected. The solutions are: 1. Don't bother. This is the correct approach for most of CRT and Win32 functions because they do not store pointers. 2. Write wrappers. If you must use a non-pure, non-reentrant C library which stores pointers, or returns malloc'd memory,---then yes, you must understand how GC works, how to protect GC-allocated memory, and how to make GC handle externally-allocated memory. 3. Write D-aware external functions. If you implement them yourself you can always copy memory instead of storing raw pointers, so that you can follow solution 1 when writing D code.
Sep 07 2008
parent reply Alan Knowles <alan akbkhome.com> writes:
Sergey Gromov wrote:
 Alan Knowles <alan akbkhome.com> wrote:
 "THE GC AS DELIVERED IS COMPLETELY BROKEN FOR REAL THREADED APPLICATONS

Isn't it a bit overstated?

through trying (and still am) random segfaults and large memory usage. - even by writing it down helped a lot in understanding the whole issue. (and I hope helps others a bit..) I'm alot closer now to solving the issues than I think I was... In part by writing the email and explaining the issues to myself.. - But I still think that a large multi-threaded application having to rely on the GC is not a pretty prospect, and that being able to manage memory in a tighter clearer, and cleaner way would be preferable.. - imagine 500 threads being paused and scanned every few seconds to ensure memory doesn't get out of hand - this doesn't sound efficient... (althought I'm not sure I've got any great answers yet) It's also a wake-up call to the authors of things like DBI - which really should read this and understand that you can not send connect() the username/password etc. without adding root to them.... - which I doubt currently many libraries in dsource address..
 
 You basically say that memory could be prematurely freed if an external 
 non-D function stores a pointer to GC-allocated memory internally.  This 
 is quite different from "COMPLETELY BROKEN" to me.  You don't even 
 mention that for both extern(C) and extern(Windows) pointers are pushed 
 on stack so that memory is protected at least until the external 
 function returns.  Well it *can* overwrite its arguments with an 
 unrelated data, but in all other scenarios your memory is protected.
 
 The solutions are:
 
 1. Don't bother.  This is the correct approach for most of CRT and Win32 
 functions because they do not store pointers.
 
 2. Write wrappers.  If you must use a non-pure, non-reentrant C library 
 which stores pointers, or returns malloc'd memory,---then yes, you must 
 understand how GC works, how to protect GC-allocated memory, and how to 
 make GC handle externally-allocated memory.
 
 3. Write D-aware external functions.  If you implement them yourself you 
 can always copy memory instead of storing raw pointers, so that you can 
 follow solution 1 when writing D code.

Sep 07 2008
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Alan Knowles <alan akbkhome.com> wrote:
 - But I still think that a large multi-threaded application having to 
 rely on the GC is not a pretty prospect, and that being able to manage 
 memory in a tighter clearer, and cleaner way would be preferable.. - 
 imagine 500 threads being paused and scanned every few seconds to ensure 
 memory doesn't get out of hand - this doesn't sound efficient... 
 (althought I'm not sure I've got any great answers yet)

Unless you've got 500 cores, you're actually pausing 2, 4, okay 8 threads, in case of 100% efficiency of your code. There *should* be a benefit from collecting garbage in parallel, especially after number of cores raises to dozens. Also, overhead of scanning 500 threads is to scan 500 stacks instead of one. Doesn't sound like a big deal.
 It's also a wake-up call to the authors of things like DBI - which 
 really should read this and understand that you can not send connect() 
 the username/password etc. without adding root to them.... - which I 
 doubt currently many  libraries in dsource address..

Well, yes, connect() is an asynchronous call which requires specifically that buffers passed must stay valid while request is in progress. This causes troubles in C either.
Sep 07 2008
parent reply "Craig Black" <cblack ara.com> writes:
 There *should* be a
 benefit from collecting garbage in parallel, especially after number of
 cores raises to dozens.

Do you know something I don't? According to my research, a separate thread touching everyone else's memory causes huge amounts of cache misses and page faults. These far outweigh any benefit gleaned from running the GC in parallel. -Craig
Sep 08 2008
parent Sergey Gromov <snake.scaly gmail.com> writes:
Craig Black <cblack ara.com> wrote:
 
 There *should* be a
 benefit from collecting garbage in parallel, especially after number of
 cores raises to dozens.

Do you know something I don't? According to my research, a separate thread touching everyone else's memory causes huge amounts of cache misses and page faults. These far outweigh any benefit gleaned from running the GC in parallel.

A 'separate thread touching everyone else's memory' would inevitably require to pause 'everyone' so that wasn't what I was talking about. I was talking about a hypothetical situation when every thread were collecting its own garbage leaving other threads' garbage alone. This approach, assuming large amounts of thread-local garbage, would definitely benefit from parallelization. But, even being hypothetical, it couldn't be free, and inevitable overhead may outweight benefits if number of parallel units is small.
Sep 08 2008