digitalmars.D.learn - GC allocation issue

Etienne <etcimon gmail.com> writes:
I'm running some tests on a cache store where I planned to use only
malloc for the values being stored. I'm hoping to eliminate the GC in
95% of the program and keep it only for actively used items.

My problem is: when the program reaches 40MB it suddenly goes down to 
0.9MB and blocks.

From every resource I've read, my understanding is that the GC stops
all threads and scans for pointers only into data that the GC itself
allocated (with addRange or addRoot used to extend the memory it scans).
This means that in the worst-case scenario there could be leaks, but
I'm seeing the data being deleted by the GC, so I'm a little stumped
here. What am I missing?

I'm using FreeListAlloc from here:
https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/utils/memory.d#L152
in here:
https://github.com/globecsys/cache.d/blob/master/chd/table.d#L1087
and this is how I made it crash:
https://github.com/globecsys/cache.d/blob/master/chd/connection.d#L550

I know it's the GC because using GC.disable() fixes it, so I'm really
only asking whether the GC has a habit of deleting malloc'ed data like this.
Mar 20 2014
"Rene Zwanenburg" <renezwanenburg gmail.com> writes:
The strings returned by to!string are owned by the GC. Since the GC doesn't scan malloc'ed memory, it thinks all your 'values' are unreferenced. The keys are still referenced from the keys array. So what is being freed is GC-owned memory that is referenced only through malloc'ed memory, not the malloc'ed memory itself.

Also, not relevant to your current problem, but keep in mind that structs defined inside a function have access to the function's locals. To keep those locals valid after the function returns, the stack frame is allocated on the GC heap when an instance is created. Make your 'A' struct static to avoid this.
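
One way to avoid depending on GC-owned characters is to copy them into malloc'ed memory before storing them. The following is only a minimal sketch under that assumption: `mallocDup` is a hypothetical helper, not something from cache.d or vibe.d, and the caller is responsible for calling free() on the result.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;

/// Hypothetical helper: copies the characters of `s` into a malloc'ed
/// buffer so the result no longer points into the GC heap.
/// Returns null for an empty input or when malloc fails.
char[] mallocDup(const(char)[] s)
{
    if (s.length == 0)
        return null;
    auto p = cast(char*) malloc(s.length);
    if (p is null)
        return null;
    memcpy(p, s.ptr, s.length);
    return p[0 .. s.length];
}

unittest
{
    import core.memory : GC;
    import std.conv : to;

    string gcOwned = to!string(12345);  // characters live on the GC heap
    char[] copy = mallocDup(gcOwned);   // characters now live in malloc'ed memory
    scope (exit) free(copy.ptr);

    gcOwned = null;   // drop the only GC-visible reference
    GC.collect();     // the original string may be reclaimed here
    assert(copy == "12345");  // the malloc'ed copy does not depend on it
}
```

The alternative is to leave the strings on the GC heap and register the malloc'ed block that holds the slices with GC.addRange, so that the GC scans it.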
Mar 20 2014
"bearophile" <bearophileHUGS lycos.com> writes:
Etienne:

 I'm running some tests on a cache store where I planned to use 
 only Malloc for the values being stored, I'm hoping to 
 eliminate the GC in 95% of the program, but to keep it only for 
 actively used items..
Usually 95%-100% of a D program uses the GC and 0%-5% uses malloc :-)

Bye,
bearophile
Mar 20 2014
Etienne <etcimon gmail.com> writes:
I'm trying to store a copy of strings for long-running processes with malloc. I tried using emplace but the copy gets deleted by the GC. Any idea why?
Mar 20 2014
"Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 21 March 2014 at 00:56:22 UTC, Etienne wrote:
 I tried using emplace but the copy gets deleted by the GC. Any 
 idea why?
That's extremely unlikely; the GC doesn't know how to free manually allocated things. Are you sure that's where the crash happens?

Taking a really quick look at your code, this line raises a red flag:
https://github.com/globecsys/cache.d/blob/master/chd/table.d#L55

Class destructors in D aren't allowed to reference GC-allocated memory through their members. Accessing that string in the dtor could be a problem that goes away with GC.disable() too.
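
To illustrate the destructor rule, here is a small sketch. The `Entry` class and its fields are made up for the example and are not the code in table.d. The destructor touches only the malloc'ed buffer, because the GC-owned `label` may already have been reclaimed when the GC runs the destructor during a collection.

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical cache entry, for illustration only.
final class Entry
{
    private void* buf;     // malloc'ed: valid until we free it ourselves
    private string label;  // GC-owned: may already be gone inside ~this

    static size_t destroyed;  // demo-only counter so the effect is observable

    this(size_t size, string label)
    {
        buf = malloc(size);
        this.label = label;
    }

    ~this()
    {
        // OK: buf is not managed by the GC, so it is still valid here.
        free(buf);
        buf = null;
        ++destroyed;
        // Not OK: reading `label` (or any other GC-allocated member) here,
        // since finalization order is unspecified during a collection.
    }
}

unittest
{
    auto before = Entry.destroyed;
    auto e = new Entry(64, "example");
    destroy(e);  // runs the destructor deterministically
    assert(Entry.destroyed == before + 1);
}
```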
Mar 20 2014
Etienne Cimon <etcimon gmail.com> writes:
Yes, you're right, I may lack understanding about destructors; I'll review this.

I managed to generate a VisualD project, and the debugger confirms the program crashes in the GC, because it shows a random call stack for everything under fullcollect():

    cache-d_d.exe!gc gc Gcx mark() C++
    cache-d_d.exe!gc gc Gcx fullcollect() C++
    cache-d_d.exe!std array Appender!string Appender ensureAddable(unsigned int this) Line 2389 C++
    [External Code]
    cache-d_d.exe!std array Appender!string Appender ensureAddable(unsigned int this) Line 2383 C++
    ....

I have no methodology for debugging under these circumstances. Do you know of anything I can do other than manually reviewing the code paths in the source?
Mar 20 2014
Etienne Cimon <etcimon gmail.com> writes:
It seems to be crashing somewhere here in druntime's gc.d:

    void mark(void *pbot, void *ptop, int nRecurse)
    {
        //import core.stdc.stdio;printf("nRecurse = %d\n", nRecurse);
        void **p1 = cast(void **)pbot;
        void **p2 = cast(void **)ptop;

Considering it's an access violation on a root that was probably added by Phobos, could this be an issue with these libraries?
Mar 20 2014
"monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 21 March 2014 at 00:56:22 UTC, Etienne wrote:
 I'm trying to store a copy of strings for long-running 
 processes with malloc. I tried using emplace but the copy gets 
 deleted by the GC. Any idea why?
Could you show the snippet where you used "emplace"? I'd like to know how you are using it. In particular, where you are emplacing, and *what*: the slice, or the slice contents?
Mar 20 2014
Etienne <etcimon gmail.com> writes:
https://github.com/globecsys/cache.d/blob/master/chd/table.d#L1089

This line does the copying. I don't think it's the memory-copying algorithm anymore, however. The GC crashes altogether during fullcollect(); the logs give me this:

    cache-d_d.exe!gc gc Gcx mark(void * this, void * nRecurse, int ptop) Line 2266 C++
    cache-d_d.exe!gc gc Gcx mark(void * this, void * ptop) Line 2249 C++
    cache-d_d.exe!gc gc Gcx fullcollect() Line 2454 C++
    cache-d_d.exe!gc gc GC mallocNoSync(unsigned int this, unsigned int alloc_size, unsigned int * alloc_size) Line 458 C++
    cache-d_d.exe!gc gc GC malloc(unsigned int this, unsigned int alloc_size, unsigned int * bits) Line 413 C++
    ...

With ptop= 03D8F030, pbot= 03E4F030

They both point to invalid memory. It looks like a really wide range too; the usual would be 037CCB80 -> 037CCBA0 or such. I don't know how to find out where they come from... Maybe I could do an assert on that specific value in druntime.
Mar 21 2014
Etienne <etcimon gmail.com> writes:
On 2014-03-21 9:36 AM, Etienne wrote:
 With ptop= 03D8F030, pbot= 03E4F030

 They both point invalid memory. It looks like a really wide range too,
 the usual would be 037CCB80 -> 037CCBA0 or such. I don't know how to
 find out where they come from... Maybe I could do an assert on that
 specific value in druntime
Looks like the range of the string[] keys array; it gets pretty big after adding tens of thousands of strings.

    +GC.addRange(p = 03EA0AB0, sz = 0x38), p + sz = 03EA0AE8
    set: 209499732595 => ¨98303126
    +GC.addRange(p = 03EA0B40, sz = 0x38), p + sz = 03EA0B78
    set: 6491851329 => ¨50107378
    +GC.addRange(p = 03EA0BD0, sz = 0x38), p + sz = 03EA0C08
    set: 262797465895 => ¨14438090
    +GC.addRange(p = 03EA0C60, sz = 0x38), p + sz = 03EA0C98
    set: 95992076217 => ¨65000864
    +GC.addRange(p = 03EA0CF0, sz = 0x38), p + sz = 03EA0D28
    +GC.addRange(p = 03EA0D50, sz = 0x30000), p + sz = 03ED0D50

It crashes when sz approaches 0x180000. My best guess is that the resized array doesn't get allocated but the GC still tries to scan it.
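
Log lines in this shape can also be produced from application code by routing registrations through a small wrapper around GC.addRange. This is only a sketch: `tracedAddRange`, `looksSuspicious` and the `suspiciousSize` threshold are hypothetical names and values chosen for illustration, and the wrapper only sees calls that go through it, not ranges registered directly by other libraries.

```d
import core.memory : GC;
import core.stdc.stdio : printf;

/// Assumed threshold for illustration; pick whatever "too wide" means for you.
enum size_t suspiciousSize = 0x100000;

/// True when a range is wide enough to deserve a closer look.
bool looksSuspicious(size_t sz)
{
    return sz >= suspiciousSize;
}

/// Hypothetical wrapper: logs each registration, then forwards to the GC.
void tracedAddRange(void* p, size_t sz)
{
    auto end = cast(ubyte*) p + sz;
    printf("+GC.addRange(p = %p, sz = 0x%llx), p + sz = %p\n",
           p, cast(ulong) sz, cast(void*) end);

    // Failing here gives a call stack that points at whoever registered
    // the range, instead of a crash later inside the collector.
    assert(p !is null, "null range registered");
    assert(!looksSuspicious(sz), "unexpectedly large range registered");

    GC.addRange(p, sz);
}

unittest
{
    import core.stdc.stdlib : calloc, free;

    auto p = calloc(1, 0x38);
    assert(p !is null);
    tracedAddRange(p, 0x38);
    GC.removeRange(p);  // unregister before the memory goes away
    free(p);
}
```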
Mar 21 2014
Etienne <etcimon gmail.com> writes:
On 2014-03-21 10:34 AM, Etienne wrote:
 It crashes when sz approaches 0x180000, it looks like (my best guess)
 the resized array doesn't get allocated but the GC still tries to scan it.
Ok, I found it: the problem is in the manual implementation of a malloc-based HashMap.

The right way to debug this was, sadly, to add a lot of printf calls and a few asserts in druntime, and to redirect stdout to a file from the shell (./exe > logoutput.txt). The druntime win32.mak doesn't have a debug build, so I had to add -debug -g in there to get symbols and make the sources show up instead of the disassembly in VisualD.

In this case, the logs showed that the GC's mark() was failing on wide ranges, so I added an assert in addRange to make it throw when that range was added, and it finally gave me the call stack of the culprit. The issue was that a malloc range was (maybe) not being properly initialized before being added to the GC.

https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/utils/hashmap.d#L221
https://github.com/rejectedsoftware/vibe.d/blob/master/source/vibe/utils/memory.d#L153

In this case, ptr isn't null and the range existed, but there's still an access violation from the GC for some reason. I'll keep searching for the root cause, but it doesn't seem to be a GC issue anymore, though the debugging procedure could use some documentation.

Thanks
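
For reference, a defensive pattern for malloc'ed memory that holds GC pointers is to zero it before registering it and to unregister it before freeing it. The sketch below assumes that pattern. `StringSlots` is a made-up type, not the vibe.d HashMap, and nothing in this thread confirms that a missing step of this kind was the root cause.

```d
import core.memory : GC;
import core.stdc.stdlib : calloc, free;

/// Hypothetical malloc-backed array of string slices. The slices may point
/// into the GC heap, so the block is registered with the GC for scanning.
struct StringSlots
{
    private string* ptr;
    private size_t count;

    @disable this(this);  // no copies: exactly one owner frees the block

    this(size_t n)
    {
        if (n == 0)
            return;
        // calloc zero-fills, so the GC never scans uninitialized words.
        ptr = cast(string*) calloc(n, string.sizeof);
        assert(ptr !is null, "out of memory");
        count = n;
        GC.addRange(ptr, n * string.sizeof);
    }

    ~this()
    {
        if (ptr is null)
            return;
        // Unregister first, so the GC never scans freed memory.
        GC.removeRange(ptr);
        free(ptr);
        ptr = null;
    }

    ref string opIndex(size_t i)
    {
        assert(i < count, "index out of range");
        return ptr[i];
    }
}

unittest
{
    import std.conv : to;

    auto slots = StringSlots(4);
    foreach (i; 0 .. 4)
        slots[i] = to!string(i * 1000);

    GC.collect();  // the strings stay reachable through the registered range
    assert(slots[3] == "3000");
}
```

If such a block is ever resized, the old range has to be removed and the new one added, because GC.addRange records the address and size given at registration time.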
Mar 21 2014