www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Hard-to-reproduce GC bug

reply dsimcha <dsimcha yahoo.com> writes:
I'm having issues where it appears that the D2 druntime GC ignores roots
located in thread-local storage when determining what objects are alive.  This
leads to spurious deletion of objects in certain cases.  This is clearly a GC
issue, since disabling the GC solves it, but I can't figure out how to
reproduce it in a canonical way.  Has anyone else run into anything like this?
Dec 04 2008
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
dsimcha wrote:
 I'm having issues where it appears that the D2 druntime GC ignores roots
 located in thread-local storage when determining what objects are alive.  This
 leads to spurious deletion of objects in certain cases.  This is clearly a GC
 issue, since disabling the GC solves it, but I can't figure out how to
 reproduce it in a canonical way.  Has anyone else run into anything like this?

Weird. The actual storage for TLS in druntime is an array of void* within the Thread class. I can't imagine that it wouldn't be scanned by the GC. Do you have a reproducible test case? Sean
Dec 04 2008
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
 Weird.  The actual storage for TLS in druntime is an array of void*
 within the Thread class.  I can't imagine that it wouldn't be scanned by
 the GC.  Do you have a reproducible test case?
 Sean

I just now managed to play around with this some more and come up with a small test case, as opposed to a much larger real-world case, that reproduces this. I still haven't the slightest clue *why* my latest test case reproduces the bug and some others that I had tried didn't, but I've filed a bug report. See: http://d.puremagic.com/issues/show_bug.cgi?id=2491
Dec 05 2008
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"dsimcha" wrote
 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 Weird.  The actual storage for TLS in druntime is an array of void*
 within the Thread class.  I can't imagine that it wouldn't be scanned by
 the GC.  Do you have a reproducible test case?
 Sean

I just now managed to play around with this some more and come up with a small test case, as opposed to a much larger real-world case, that reproduces this. I still haven't the slightest clue *why* my latest test case reproduces the bug and some others that I had tried didn't, but I've filed a bug report. See: http://d.puremagic.com/issues/show_bug.cgi?id=2491

I don't know if that __thread keyword is fleshed out yet. There is no documentation on it in the spec. The only place it is referenced is in the changelog, and there it says: "This is for testing purposes only to check out the machinery in the back end." I'd say most likely that the GC doesn't see anything declared as __thread, so when you use that pointer as the only reference to GC allocated data, it doesn't see that it's still in use, and will collect. -Steve
Dec 05 2008
next sibling parent dsimcha <dsimcha yahoo.com> writes:
== Quote from Steven Schveighoffer (schveiguy yahoo.com)'s article
 "dsimcha" wrote
 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 Weird.  The actual storage for TLS in druntime is an array of void*
 within the Thread class.  I can't imagine that it wouldn't be scanned by
 the GC.  Do you have a reproducible test case?
 Sean

I just now managed to play around with this some more and come up with a small test case, as opposed to a much larger real-world case, that reproduces this. I still haven't the slightest clue *why* my latest test case reproduces the bug and some others that I had tried didn't, but I've filed a bug report. See: http://d.puremagic.com/issues/show_bug.cgi?id=2491

documentation on it in the spec. The only place it is referenced is in the changelog, and there it says: "This is for testing purposes only to check out the machinery in the back end." I'd say most likely that the GC doesn't see anything declared as __thread, so when you use that pointer as the only reference to GC allocated data, it doesn't see that it's still in use, and will collect. -Steve

Ok, well now that I'm aware of it, I'll just use the Tango/druntime TLS implementation to do what I want to do.
Dec 05 2008
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Steven Schveighoffer wrote:
 I'd say most likely that the GC doesn't see anything declared as __thread, 
 so when you use that pointer as the only reference to GC allocated data, it 
 doesn't see that it's still in use, and will collect.

Looks like I need to do some research to see how the gc can discover the extent of tls data.
Dec 05 2008
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Walter Bright wrote:
 Steven Schveighoffer wrote:
 I'd say most likely that the GC doesn't see anything declared as 
 __thread, so when you use that pointer as the only reference to GC 
 allocated data, it doesn't see that it's still in use, and will collect.

Looks like I need to do some research to see how the gc can discover the extent of tls data.

I've got this working now for Windows and Linux for the main program (not for dll's or shared libraries).
Dec 07 2008
parent reply Leandro Lucarella <llucax gmail.com> writes:
Walter Bright, el  7 de diciembre a las 16:04 me escribiste:
 Walter Bright wrote:
Steven Schveighoffer wrote:
I'd say most likely that the GC doesn't see anything declared as __thread, so
when you use that pointer as the only reference to GC allocated data, it 
doesn't see that it's still in use, and will collect.


I've got this working now for Windows and Linux for the main program (not for dll's or shared libraries).

I saw the change[1] and I wonder why there are mentions to the DMD implementation. Shouldn't that be implementation agnostic, being in the "common" part of the runtime? I guess _tlsstart and _tlsend should be added to the runtime specification[2] too, right? BTW, the change broke the indentation style of druntime :S Thank you. [1] http://www.dsource.org/projects/druntime/changeset/57 [2] http://www.dsource.org/projects/druntime/wiki/RuntimeSpec -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- La máquina de la moneda, mirá como te queda! -- Sidharta Kiwi
Dec 08 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Leandro Lucarella (llucax gmail.com)'s article
 Walter Bright, el  7 de diciembre a las 16:04 me escribiste:
 Walter Bright wrote:
Steven Schveighoffer wrote:
I'd say most likely that the GC doesn't see anything declared as __thread, so
when you use that pointer as the only reference to GC




doesn't see that it's still in use, and will collect.


I've got this working now for Windows and Linux for the main program (not for dll's or shared libraries).

implementation. Shouldn't that be implementation agnostic, being in the "common" part of the runtime? I guess _tlsstart and _tlsend should be added to the runtime specification[2] too, right?

That or runtime functions for the equivalent. Either way, the compiler runtime will have to define something.
 BTW, the change broke the indentation style of druntime :S

I'll take care of it :p Sean
Dec 08 2008
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Leandro Lucarella wrote:
 Walter Bright, el  7 de diciembre a las 16:04 me escribiste:
 Walter Bright wrote:
 Steven Schveighoffer wrote:
 I'd say most likely that the GC doesn't see anything declared as __thread, so
when you use that pointer as the only reference to GC allocated data, it 
 doesn't see that it's still in use, and will collect.



I saw the change[1] and I wonder why there are mentions to the DMD implementation. Shouldn't that be implementation agnostic, being in the "common" part of the runtime? I guess _tlsstart and _tlsend should be added to the runtime specification[2] too, right?

I was more concerned about getting it to work right. <g>
Dec 08 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Walter Bright (newshound1 digitalmars.com)'s article
 Leandro Lucarella wrote:
 Walter Bright, el  7 de diciembre a las 16:04 me escribiste:
 Walter Bright wrote:
 Steven Schveighoffer wrote:
 I'd say most likely that the GC doesn't see anything declared as __thread, so
when you use that





 doesn't see that it's still in use, and will collect.





 I saw the change[1] and I wonder why there are mentions to the DMD
 implementation. Shouldn't that be implementation agnostic, being in the
 "common" part of the runtime? I guess _tlsstart and _tlsend should be
 added to the runtime specification[2] too, right?


Since there isn't yet a solution for shared libraries I may just wait on formalizing how this works. I've simply moved _tlsstart and _tlsend into the compiler runtime for now, and placed the related stuff in thread inside a version(DigitalMars) block. Sean
Dec 08 2008
prev sibling parent Leandro Lucarella <llucax gmail.com> writes:
Walter Bright, el  8 de diciembre a las 11:39 me escribiste:
 Leandro Lucarella wrote:
Walter Bright, el  7 de diciembre a las 16:04 me escribiste:
Walter Bright wrote:
Steven Schveighoffer wrote:
I'd say most likely that the GC doesn't see anything declared as __thread, so
when you use that pointer as the only reference to GC allocated data, it 
doesn't see that it's still in use, and will collect.



implementation. Shouldn't that be implementation agnostic, being in the "common" part of the runtime? I guess _tlsstart and _tlsend should be added to the runtime specification[2] too, right?

I was more concerned about getting it to work right. <g>

Sure, I mentioned it to understand where this is going to =) Anyway, maybe there should be a ticket or something on this so this issue don't get lost. Where one should put this kind of things? DMD's bugzilla? Druntime bugtracker? -- Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/ ---------------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------------- De tan fina la condesa, por no cagarse, reza. -- Ricardo Vaporeso
Dec 08 2008
prev sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from dsimcha (dsimcha yahoo.com)'s article
 == Quote from Sean Kelly (sean invisibleduck.org)'s article
 Weird.  The actual storage for TLS in druntime is an array of void*
 within the Thread class.  I can't imagine that it wouldn't be scanned by
 the GC.  Do you have a reproducible test case?
 Sean

test case, as opposed to a much larger real-world case, that reproduces this. I still haven't the slightest clue *why* my latest test case reproduces the bug and some others that I had tried didn't, but I've filed a bug report. See: http://d.puremagic.com/issues/show_bug.cgi?id=2491

Oh! You're using the built-in thread-local storage. I don't think that's fully implemented yet (Walter, please correct me if I'm wrong). You might want to use the thread-local storage feature in the Thread class for now. Depending on your memory / performance requirements: import core.thread; void main() { auto t = new ThreadLocal!(int); t.val = 5; writefln( t.val ); } -or- import core.thread; void main() { auto key = Thread.createLocal(); Thread.setLocal( key, cast(void*) 5 ); writefln( cast(int) Thread.getLocal( key ) ); } the second approach is closer to how C/C++ TLS works and saves the allocation of a wrapper struct, but is clearly more complicated in exchange. your best bet is probably to simply use ThreadLocal for now, since it will be easier to change later when built-in TLS works properly. Sean
Dec 05 2008
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Sean Kelly (sean invisibleduck.org)'s article
 Oh!  You're using the built-in thread-local storage.  I don't think that's
fully
 implemented yet (Walter, please correct me if I'm wrong).  You might want to
 use the thread-local storage feature in the Thread class for now.  Depending
 on your memory / performance requirements:
 import core.thread;
 void main()
 {
     auto t = new ThreadLocal!(int);
     t.val = 5;
     writefln( t.val );
 }
 -or-
 import core.thread;
 void main()
 {
     auto key = Thread.createLocal();
     Thread.setLocal( key, cast(void*) 5 );
     writefln( cast(int) Thread.getLocal( key ) );
 }
 the second approach is closer to how C/C++ TLS works and saves
 the allocation of a wrapper struct, but is clearly more complicated
 in exchange.  your best bet is probably to simply use ThreadLocal
 for now, since it will be easier to change later when built-in TLS
 works properly.
 Sean

Thanks, though I'm way ahead of you in that I already did this. Works great, except it's a little bit slow. I'm actually working on an implementation of the SuperStack proposed by Andrei about a month ago, which was why I needed good TLS. It seems like with the current implementation (using the faster explicit key solution instead of the slower class-based solution), about 1/3 of my time is being spent on retrieving TLS. I got this number by caching the stuff from TLS on the stack of the calling function and passing it in as a parameter. This may become a semi-hidden feature for wringing out that last bit of performance from SuperStack. Is TLS inherently slow, or is the druntime implementation relatively quick and dirty and likely to improve in the future?
Dec 05 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from dsimcha (dsimcha yahoo.com)'s article
 Thanks, though I'm way ahead of you in that I already did this.  Works great,
 except it's a little bit slow.
 I'm actually working on an implementation of the SuperStack proposed by Andrei
 about a month ago, which was why I needed good TLS.  It seems like with the
 current implementation (using the faster explicit key solution instead of the
 slower class-based solution), about 1/3 of my time is being spent on retrieving
 TLS.  I got this number by caching the stuff from TLS on the stack of the
calling
 function and passing it in as a parameter.  This may become a semi-hidden
feature
 for wringing out that last bit of performance from SuperStack.  Is TLS
inherently
 slow, or is the druntime implementation relatively quick and dirty and likely
to
 improve in the future?

The druntime implementation is about as fast as user-level TLS can get, I'm afraid. If you look at the implementation: class ThreadLocal { T val() { Wrap* wrap = cast(Wrap*) Thread.getLocal( m_key ); return wrap ? wrap.val : m_def; } } class Thread { static void* getLocal( uint key ) { return getThis().m_local[key]; } static Thread getThis() { version( Posix ) return cast(Thread) pthread_getspecific( sm_this ); } void*[LOCAL_MAX] m_local; } The OS-level TLS call is typically implemented as an array indexing operation, so to get a TLS value you're looking at indexing into two arrays, a cast, and then an additional cast and conditional jump if you use ThreadLocal. Error checking is even omitted for performance reasons. If I knew of a way to make it faster then I would :-) Sean
Dec 05 2008
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
dsimcha wrote:
 Thanks, though I'm way ahead of you in that I already did this.  Works great,
 except it's a little bit slow.

TLS is always going to be slow. Beating the old drum about how freakin' useful a tool obj2asm is and why doesn't anyone use it, here's what it looks like: -------------------- __thread int foo; void main() { foo = 3; } --------------------- __Dmain comdat assume CS:__Dmain mov EAX,__tls_index mov ECX,FS:__tls_array mov EDX,[EAX*4][ECX] mov dword ptr _D5test63fooi[EDX],3 xor EAX,EAX ret __Dmain ends --------------------- So you see, it takes 4 instructions to reference a TLS global vs 1 instruction for regular static data. The lesson is to minimize directly referencing such globals. Instead, take a pointer to them, or cache the value into a local. As for whether __thread is completely implemented, yes it is completely implemented in the compiler. I obviously forgot about the gc, though, and I'm glad you found the problem so I can fix it. In the meantime, you can call the gc directly to register your __thread variable as a 'root', then the gc will recognize it properly. If you want to read about how TLS works under Windows, see: http://www.nynaeve.net/?p=180 It works an equivalent, but completely differently under the hood, way in Linux: http://people.redhat.com/drepper/tls.pdf
Dec 05 2008
next sibling parent dsimcha <dsimcha yahoo.com> writes:
Thanks, guys.  I've found ways to speed things up a decent amount, and put an
alpha of my SuperStack up on Scrapple, though I renamed it to TempAlloc because
I
don't like the name SuperStack.  See D.announce and
http://dsource.org/projects/scrapple/browser/trunk/tempAlloc
Dec 05 2008
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Jarrett Billingsley wrote:
 On Fri, Dec 5, 2008 at 5:38 PM, Walter Bright
 <newshound1 digitalmars.com> wrote:
 TLS is always going to be slow. Beating the old drum about how freakin'
 useful a tool obj2asm is and why doesn't anyone use it, here's what it looks
 like:

Er, Walter, you realize it's not free, right? Meaning that even if the EUP is only $15 there's still going to be a lot of people who don't have it just because they don't feel bothered to buy it.

That's like saying one works as an auto mechanic but prefers to use a rock rather than a hammer because a hammer costs $15 !! It's just far too useful to not buy at such a reasonable price. Even so, obj2asm is free on the linux version.
Dec 05 2008
parent Don <nospam nospam.com> writes:
Jarrett Billingsley wrote:
 On Fri, Dec 5, 2008 at 7:29 PM, Walter Bright
 <newshound1 digitalmars.com> wrote:
 That's like saying one works as an auto mechanic but prefers to use a rock
 rather than a hammer because a hammer costs $15 !! It's just far too useful
 to not buy at such a reasonable price.

 Even so, obj2asm is free on the linux version.

That doesn't really help Windows DMD users who are stuck using an outdated object format that almost nothing else seems to understand. Or on Linux, for that matter, since there are - and always have been - free disassemblers for ELF.

I updated Agner Fog's objconv so that the -fasm option now works with DMD .obj's on Windows. He still hasn't released it yet on his site, but I can give it to anyone who's interested. It disassembles all instructions, even the newly defined ones that don't exist on any current processors. But it still has a few problems, and it won't give you D source code interleaved with the asm output.
Dec 06 2008
prev sibling next sibling parent "Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Fri, Dec 5, 2008 at 5:38 PM, Walter Bright
<newshound1 digitalmars.com> wrote:
 TLS is always going to be slow. Beating the old drum about how freakin'
 useful a tool obj2asm is and why doesn't anyone use it, here's what it looks
 like:

Er, Walter, you realize it's not free, right? Meaning that even if the EUP is only $15 there's still going to be a lot of people who don't have it just because they don't feel bothered to buy it.
Dec 05 2008
prev sibling next sibling parent "Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Fri, Dec 5, 2008 at 7:29 PM, Walter Bright
<newshound1 digitalmars.com> wrote:
 That's like saying one works as an auto mechanic but prefers to use a rock
 rather than a hammer because a hammer costs $15 !! It's just far too useful
 to not buy at such a reasonable price.

 Even so, obj2asm is free on the linux version.

That doesn't really help Windows DMD users who are stuck using an outdated object format that almost nothing else seems to understand. Or on Linux, for that matter, since there are - and always have been - free disassemblers for ELF.
Dec 05 2008
prev sibling parent "Vladimir Panteleev" <thecybershadow gmail.com> writes:
On Sat, 06 Dec 2008 02:57:20 +0200, Jarrett Billingsley  
<jarrett.billingsley gmail.com> wrote:

 On Fri, Dec 5, 2008 at 7:29 PM, Walter Bright
 <newshound1 digitalmars.com> wrote:
 That's like saying one works as an auto mechanic but prefers to use a  
 rock
 rather than a hammer because a hammer costs $15 !! It's just far too  
 useful
 to not buy at such a reasonable price.

 Even so, obj2asm is free on the linux version.

That doesn't really help Windows DMD users who are stuck using an outdated object format that almost nothing else seems to understand. Or on Linux, for that matter, since there are - and always have been - free disassemblers for ELF.

IDA 4.9 is now free for non-commercial purposes, and it understands DMD's .obj files. http://www.hex-rays.com/idapro/idadownfreeware.htm -- Best regards, Vladimir mailto:thecybershadow gmail.com
Dec 07 2008
prev sibling parent reply Kagamin <spam here.lot> writes:
there is no simple way to scan TSL from another thread, TSL was designed to be
thread-local after all :)
Dec 05 2008
parent dsimcha <dsimcha yahoo.com> writes:
One thing I forgot to mention in the orig. post:  This happens in
single-threaded
apps.  I discovered this when writing a library struct, and haven't even tried
to
actually use multithreading yet.
Dec 05 2008