
digitalmars.D - DLL crash inside removethreadtableentry - where's the source code

Ben Davis <entheh cantab.net> writes:
Hi,

The user-mode driver I'm working on (a 32-bit DLL) is crashing Windows 
Media Player on exit. (Two other host apps exit fine.) I can catch it in 
the Visual Studio debugger, but only see assembly language. Initially 
I'm just after tips on where to find source for the bits of D that are 
involved, but maybe someone will recognise the problem already...

I've gone through the assembly in some detail, and established that the 
crash is inside some removethreadtableentry() code which is called 
shortly before DllMain(DLL_THREAD_DETACH), and must look something like:

//tid is the Windows numeric thread ID for the current thread
removethreadtableentry(tid) {
   foreach (i, obj in someObjArray1024EntriesLong) {
     if (obj.someField == tid) goto foundIt;
   }
   return;

   //When we get here, i is 1 (pretend it's in scope)
   foundIt:
   free(obj.something);	//Does nothing, already 0
   if (obj.somethingElse) {  //Does nothing, already 0
     CloseHandle(obj.somethingElse);
   }
   free(obj);	//Crash inside this free()
}

Furthermore, I've established that:

- removethreadtableentry() doesn't get to foundIt for most threads.

- (almost certain) removethreadtableentry() isn't called at all for one 
of the two host apps that work fine; and is called but doesn't get to 
foundIt for the other app.

- (almost certain) removethreadtableentry() crashes the first time it 
gets to foundIt.

(These are almost certain in the sense that I only set the breakpoint 
after catching the first on-shutdown DLL_THREAD_DETACH, which means I 
may have missed one; but it's unlikely.)

So basically this seems to point to some buggy code that hardly ever 
runs, but does in my case. (Or it's designed for a slightly different 
use of DLLs or something like that.)

For reference, the assembly language I analysed is below, but I think 
the next step is if someone either wants to fix 
removethreadtableentry(), or direct me to the source so I can 
investigate further. (It is a D function, is it? It looks like D naming 
as opposed to Microsoft naming.)

I'm off to bed, but will pick this up again tomorrow.

Full detail follows (but probably isn't worth reading).

The call stack looks like this:

    myproject.dll!RTLMultiPool::SelectFree()  + 0x17 bytes    C++
    myproject.dll!__removethreadtableentry()  + 0x69 bytes    C++
    myproject.dll!__DllMainCRTStartup@12()  + 0x10c bytes    C++
    ntdll.dll!_LdrpCallInitRoutine@16()  + 0x14 bytes
    ntdll.dll!_LdrShutdownThread@0()  + 0xe2 bytes
    ntdll.dll!_RtlExitUserThread@4()  + 0x2a bytes
    kernel32.dll!@BaseThreadInitThunk@12()  + 0x19 bytes
    ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes
    ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes

When I view the assembly for __DllMainCRTStartup, I can see that this is 
the function directly responsible for calling my DllMain function. There 
seems to be only one place where it calls removethreadtableentry, and it 
seems to be before a call to DllMain.

When I look at the assembly for removethreadtableentry, it's trying to 
make the last call to 'free' before returning, as follows:

__removethreadtableentry:
05A88F64  push        eax
05A88F65  mov         ecx,dword ptr [esp+8]
05A88F69  xor         edx,edx
05A88F6B  push        ebx
05A88F6C  push        esi
05A88F6D  jmp         __removethreadtableentry+0Fh (5A88F73h)
05A88F6F  pop         esi
05A88F70  pop         ebx
05A88F71  pop         eax
05A88F72  ret
05A88F73  mov         eax,dword ptr [___thdtbl (5AADFBCh)]
05A88F78  mov         ebx,dword ptr [eax+edx*4]
05A88F7B  test        ebx,ebx
05A88F7D  je          __removethreadtableentry+20h (5A88F84h)
05A88F7F  cmp         dword ptr [ebx+18h],ecx
05A88F82  je          __removethreadtableentry+2Bh (5A88F8Fh)
05A88F84  inc         edx
05A88F85  cmp         edx,400h
05A88F8B  je          __removethreadtableentry+0Bh (5A88F6Fh)
05A88F8D  jmp         __removethreadtableentry+0Fh (5A88F73h)
05A88F8F  mov         dword ptr [esp+8],edx        *
05A88F93  mov         ecx,dword ptr [esp+8]
05A88F97  mov         edx,dword ptr [___thdtbl (5AADFBCh)]
05A88F9D  mov         esi,dword ptr [___thdtbl (5AADFBCh)]
05A88FA3  mov         ebx,dword ptr [edx+ecx*4]
05A88FA6  mov         dword ptr [esi+ecx*4],0
05A88FAD  push        dword ptr [ebx+20h]
05A88FB0  call        _free (5A87118h)
05A88FB5  add         esp,4
05A88FB8  cmp         dword ptr [ebx+1Ch],0
05A88FBC  je          __removethreadtableentry+63h (5A88FC7h)
05A88FBE  push        dword ptr [ebx+1Ch]
05A88FC1  call        dword ptr [__imp__CloseHandle@4 (5A42B28h)]
05A88FC7  push        ebx
05A88FC8  call        _free (5A87118h)         <--------------------
05A88FCD  add         esp,4
05A88FD0  pop         esi
05A88FD1  pop         ebx
05A88FD2  pop         eax
05A88FD3  ret

The crash is then somewhere deep inside free().

Further debugging shows that removethreadtableentry is searching through 
a 1024-entry array of pointers, looking for a non-null pointer to an 
object for which the field at offset 0x18 is the current thread ID 
(which is in ecx). If it finds it, then it jumps to the point where I 
put the *. The crash seems to happen the very first time this line is 
hit (at least since I put the breakpoint there, which was after the 
first call into my DllMain).

So in summary: a number of threads (7 to 10) get successfully detached 
first, but weren't in the table that removethreadtableentry is 
searching. For the first thread to be found in that table, it crashed.
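
To make the control flow easier to follow, here is a rough Python model of what the disassembly above appears to do. It is only a sketch of my reading of the assembly, not the runtime's real source: the `ThreadEntry` field names are hypothetical (only the offsets 0x18, 0x1C and 0x20 come from the listing), and `free` and `close_handle` are passed in as plain callables so the order of calls can be observed.

```python
from dataclasses import dataclass

TABLE_SIZE = 0x400  # 1024 slots, from `cmp edx,400h`


@dataclass
class ThreadEntry:
    # Field names are hypothetical; only the offsets come from the disassembly.
    thread_id: int   # [ebx+18h], compared against the thread ID in ecx
    handle: int = 0  # [ebx+1Ch], passed to CloseHandle when non-zero
    buffer: int = 0  # [ebx+20h], passed to the first free()


def remove_thread_table_entry(table, tid, free, close_handle):
    """Return the index of the removed slot, or None if tid is not in the table."""
    for index in range(TABLE_SIZE):
        entry = table[index]
        # `test ebx,ebx` skips empty slots before the thread ID is compared.
        if entry is not None and entry.thread_id == tid:
            break
    else:
        return None  # no match after 1024 slots: the early `ret` path

    table[index] = None         # mov dword ptr [esi+ecx*4],0
    free(entry.buffer)          # first free(); the argument was 0 in the crash
    if entry.handle != 0:       # cmp dword ptr [ebx+1Ch],0
        close_handle(entry.handle)
    free(entry)                 # second free(); this is the call that crashes
    return index
```

Calling it with a table whose slot 1 holds the current thread's ID reproduces the sequence seen in the debugger: the slot is cleared, `free` is called with 0, `close_handle` is skipped, and `free` is then called on the entry itself.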

Finally, here's everything from the * to the call to free() (on a 
different run, so different addresses), with some values annotated:

//edx is 1, so it's the second entry in the table.
05C08F8F  mov         dword ptr [esp+8],edx
05C08F93  mov         ecx,dword ptr [esp+8]

//These set edx and esi to 0x05c2cd40.
05C08F97  mov         edx,dword ptr [___thdtbl (5C2DFBCh)]
05C08F9D  mov         esi,dword ptr [___thdtbl (5C2DFBCh)]

//ecx is 1, and ebx becomes 0x05c29b9b.
05C08FA3  mov         ebx,dword ptr [edx+ecx*4]
05C08FA6  mov         dword ptr [esi+ecx*4],0

//This pushes 0, and the call to free() does nothing.
05C08FAD  push        dword ptr [ebx+20h]
05C08FB0  call        _free (5C07118h)
05C08FB5  add         esp,4

//This is 0 and the CloseHandle call is skipped.
05C08FB8  cmp         dword ptr [ebx+1Ch],0
05C08FBC  je          __removethreadtableentry+63h (5C08FC7h)
05C08FBE  push        dword ptr [ebx+1Ch]
05C08FC1  call        dword ptr [__imp__CloseHandle@4 (5BC2B28h)]

//ebx is unchanged from above, and this call crashes.
05C08FC7  push        ebx
05C08FC8  call        _free (5C07118h)

I also stepped inside free(), and the next interesting stuff happens 
here (note I skipped free() itself and went straight to RTLMultiPool):

RTLMultiPool::Free:
05C0AC68  push        ecx
05C0AC69  cmp         dword ptr [esp+8],0
05C0AC6E  je          RTLMultiPool::Free+15h (5C0AC7Dh)
05C0AC70  mov         eax,dword ptr [esp+8]
//eax is now 0x05c29b9b, the pointer we're trying to free
05C0AC74  lea         edx,[eax-4]
//edx is now eax-4 = 0x05c29b97
05C0AC77  push        edx
05C0AC78  call        RTLMultiPool::SelectFree (5C0AC34h)
...

RTLMultiPool::SelectFree:
05C0AC34  push        ecx
//This reads 0x05c29b97 into eax
05C0AC35  mov         eax,dword ptr [esp+8]
//This reads an address from where eax points, and edx is 0
05C0AC39  mov         edx,dword ptr [eax]
05C0AC3B  push        ebx
05C0AC3C  push        esi
//Looking at ecx+4 revealed the value 0x00000080 (128)
05C0AC3D  cmp         edx,dword ptr [ecx+4]
05C0AC40  ja          RTLMultiPool::SelectFree+21h (5C0AC55h)
//So we get here
05C0AC42  lea         ebx,[edx-1]  	//ebx = 0xffffffff
05C0AC45  shr         ebx,3  		//ebx = 0x1fffffff
05C0AC48  push        eax
05C0AC49  mov         esi,dword ptr [ecx]  //esi = 0x0516000c
05C0AC4B  mov         ecx,dword ptr [esi+ebx*4]  //crash!

I suppose esi + 0x1fffffff*4 is basically esi-4. But then we get:

Unhandled exception at 0x05c0ac4b (myproject.dll) in wmplayer.exe: 
0xC0000005: Access violation reading location 0x85160008.

//Here's the rest of SelectFree FWIW.
05C0AC4E  call        RTLPool::Free (5C0D460h)
05C0AC53  jmp         RTLMultiPool::SelectFree+2Dh (5C0AC61h)
05C0AC55  mov         ecx,dword ptr [RTLHeap::pMainHeap (5C2B4FCh)]
05C0AC5B  push        eax
05C0AC5C  call        RTLHeap::Free (5C0D6B4h)
05C0AC61  pop         esi
05C0AC62  pop         ebx
05C0AC63  pop         eax
05C0AC64  ret         4
05C0AC67  int         3
Feb 16 2013
Ben Davis <entheh cantab.net> writes:
Correction to my hideous analysis inside free :P

On 17/02/2013 03:07, Ben Davis wrote:
 RTLMultiPool::SelectFree:
 05C0AC34  push        ecx
 //This reads 0x05c29b97 into eax
 05C0AC35  mov         eax,dword ptr [esp+8]
 //This reads an address from where eax points, and edx is 0
 05C0AC39  mov         edx,dword ptr [eax]
 05C0AC3B  push        ebx
 05C0AC3C  push        esi
 //Looking at ecx+4 revealed the value 0x00000080 (128)
 05C0AC3D  cmp         edx,dword ptr [ecx+4]
 05C0AC40  ja          RTLMultiPool::SelectFree+21h (5C0AC55h)
 //So we get here
 05C0AC42  lea         ebx,[edx-1]      //ebx = 0xffffffff
 05C0AC45  shr         ebx,3          //ebx = 0x1fffffff
 05C0AC48  push        eax
 05C0AC49  mov         esi,dword ptr [ecx]  //esi = 0x0516000c
 05C0AC4B  mov         ecx,dword ptr [esi+ebx*4]  //crash!

 I suppose esi + 0x1fffffff*4 is basically esi-4. But then we get:
No, I got confused here - the shift right by 3 is equivalent to division by 8, not by 4. So the address [esi + 0x1fffffff*4] is not esi-4 and is very likely to be very wrong. This implies that edx being 0 is bad. I'd be inclined to guess at a double free, or maybe freeing an address that isn't even a heap address. It's also very interesting that the address we're trying to free is completely unaligned (an odd number).
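
The arithmetic can be checked with a few lines of Python that mimic the 32-bit register operations in SelectFree. This is a sketch using only the register values quoted above; the names `block_size` and `pool_table` are my guesses at what edx and esi represent, not names from the runtime.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide, so every result wraps


def select_free_address(block_size, pool_table):
    """Address read by `mov ecx,dword ptr [esi+ebx*4]` in SelectFree."""
    ebx = (block_size - 1) & MASK32  # lea ebx,[edx-1]
    ebx >>= 3                        # shr ebx,3 (logical shift: divide by 8)
    return (pool_table + ebx * 4) & MASK32


ptr = 0x05C29B9B     # pointer passed to free() (ebx in the caller)
header = ptr - 4     # lea edx,[eax-4]
esi = 0x0516000C     # value loaded from [ecx]

faulting = select_free_address(0, esi)  # edx was 0 in the crashing run
print(hex(header))    # address whose dword is read into edx
print(hex(faulting))  # address the crashing instruction tries to read
print(ptr % 4)        # non-zero means the pointer is not dword-aligned
```

With edx equal to 0 the computed address is 0x85160008, the same location reported in the access violation, and `ptr % 4` is 3, which confirms the pointer is not even dword-aligned.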
Feb 16 2013
Rainer Schuetze <r.sagitario gmx.de> writes:
On 17.02.2013 04:07, Ben Davis wrote:
 Hi,

 The user-mode driver I'm working on (a 32-bit DLL) is crashing Windows
 Media Player on exit. (Two other host apps exit fine.) I can catch it in
 the Visual Studio debugger, but only see assembly language. Initially
 I'm just after tips on where to find source for the bits of D that are
 involved, but maybe someone will recognise the problem already...

 I've gone through the assembly in some detail, and established that the
 crash is inside some removethreadtableentry() code which is called
 shortly before DllMain(DLL_THREAD_DETACH), and must look something like:

 //tid is the Windows numeric thread ID for the current thread
 removethreadtableentry(tid) {
    foreach (i, obj in someObjArray1024EntriesLong) {
      if (obj.someField == tid) goto foundIt;
    }
    return;

    //When we get here, i is 1 (pretend it's in scope)
    foundIt:
    free(obj.something);    //Does nothing, already 0
    if (obj.somethingElse) {  //Does nothing, already 0
      CloseHandle(obj.somethingElse);
    }
    free(obj);    //Crash inside this free()
 }

 Furthermore, I've established that:

 - removethreadtableentry() doesn't get to foundIt for most threads.
_removethreadtableentry is a function in the DM C runtime library. It has the bug that it tries to free a data record that has never been allocated if the thread that loaded the DLL is terminated. This is the entry at index 1.
Feb 16 2013
Ben Davis <entheh cantab.net> writes:
On 17/02/2013 07:56, Rainer Schuetze wrote:
 _removethreadtableentry is a function in the DM C runtime library. It
 has the bug that it tries to free a data record that has never been
 allocated if the thread that loaded the DLL is terminated. This is the
 entry at index 1.
That's a good start :) Can it be fixed? Who would be able to do it? Or is there some code I can put in my project that will successfully work around the issue?

I get the impression the source is available for money. I found this page http://www.digitalmars.com/download/freecompiler.html which mentions complete library source under a link to the shop. I *could* buy it and see if I can fix it myself, but it seems a bit risky.

By the way, thanks for Visual D :)
Feb 17 2013
Rainer Schuetze <r.sagitario gmx.de> writes:
On 17.02.2013 12:31, Ben Davis wrote:
 On 17/02/2013 07:56, Rainer Schuetze wrote:
 _removethreadtableentry is a function in the DM C runtime library. It
 has the bug that it tries to free a data record that has never been
 allocated if the thread that loaded the DLL is terminated. This is the
 entry at index 1.
 That's a good start :) Can it be fixed? Who would be able to do it?
Sure it can be fixed. It's up to Walter to build a new lib for distribution, though.
 Or is there some code I can put in my project that will successfully
 work around the issue?
Without recompiling the lib, I guess the best that can be done is patch snn.lib to not execute the last call to free().
 I get the impression the source is available for money. I found this
 page http://www.digitalmars.com/download/freecompiler.html which
 mentions complete library source under a link to the shop. I *could* buy
 it and see if I can fix it myself, but it seems a bit risky.
Yes, you get library source and a lot more. The risk is pretty limited; it is not very expensive.
 By the way, thanks for Visual D :)
Thanks :-)
Feb 17 2013
"Trey Brisbane" <tbrisbane hotmail.com> writes:
On Sunday, 17 February 2013 at 11:32:02 UTC, Ben Davis wrote:
 On 17/02/2013 07:56, Rainer Schuetze wrote:
 _removethreadtableentry is a function in the DM C runtime library. It
 has the bug that it tries to free a data record that has never been
 allocated if the thread that loaded the DLL is terminated. This is the
 entry at index 1.
 That's a good start :) Can it be fixed? Who would be able to do it? Or
 is there some code I can put in my project that will successfully work
 around the issue? I get the impression the source is available for
 money. I found this page
 http://www.digitalmars.com/download/freecompiler.html which mentions
 complete library source under a link to the shop. I *could* buy it and
 see if I can fix it myself, but it seems a bit risky. By the way,
 thanks for Visual D :)
Sorry to necro this thread, but I'm currently experiencing the exact same issue. Was this ever fixed? If not, was there a bug filed?
May 11 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 5/11/2013 12:10 AM, Trey Brisbane wrote:
 On Sunday, 17 February 2013 at 11:32:02 UTC, Ben Davis wrote:
 On 17/02/2013 07:56, Rainer Schuetze wrote:
 _removethreadtableentry is a function in the DM C runtime library. It
 has the bug that it tries to free a data record that has never been
 allocated if the thread that loaded the DLL is terminated. This is the
 entry at index 1.
 That's a good start :) Can it be fixed? Who would be able to do it? Or
 is there some code I can put in my project that will successfully work
 around the issue? I get the impression the source is available for
 money. I found this page
 http://www.digitalmars.com/download/freecompiler.html which mentions
 complete library source under a link to the shop. I *could* buy it and
 see if I can fix it myself, but it seems a bit risky. By the way,
 thanks for Visual D :)
 Sorry to necro this thread, but I'm currently experiencing the exact
 same issue. Was this ever fixed? If not, was there a bug filed?
I thought this was already fixed. What's the date/size on your snn.lib? The latest is:

02/25/2013  06:19 PM           573,952 snn.lib
May 11 2013
"Trey Brisbane" <tbrisbane hotmail.com> writes:
On Saturday, 11 May 2013 at 07:38:53 UTC, Walter Bright wrote:
 I thought this was already fixed. What's the date/size on your 
 snn.lib? The latest is:

 02/25/2013  06:19 PM           573,952 snn.lib
In dmd.2.062.zip (the one I'm using):
574,464  2012-12-11  7:30 AM

In dmc.zip:
573,952  2013-02-26 11:19 AM  <-- the one I should be using?

In dmc856.zip (from the Digital Mars site):
574,464  2012-12-11  7:30 AM

Shouldn't these be in sync? :P

Anyway, thanks for the tip. I'll give it a shot and post back.
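
For anyone else comparing copies of snn.lib, a small Python helper along these lines reports the same size and date information. It is only a convenience sketch: the function names are made up for illustration, and matching the 573,952-byte size quoted above is a rough hint that a copy is the newer build, not proof that it contains the fix.

```python
import os
from datetime import datetime

# Size quoted in this thread for the 02/25/2013 build of snn.lib.
FIXED_SNN_LIB_SIZE = 573_952


def describe_library(path):
    """Return (size in bytes, last-modified time as 'YYYY-MM-DD HH:MM')."""
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime)
    return info.st_size, modified.strftime("%Y-%m-%d %H:%M")


def matches_fixed_build(path):
    """True if the file has the size quoted for the newer snn.lib."""
    return os.stat(path).st_size == FIXED_SNN_LIB_SIZE


# Example call (the path is hypothetical; use wherever your snn.lib lives):
#   print(describe_library(r"C:\dmd2\windows\lib\snn.lib"))
```

The modification time is shown in local time, so it can differ by some hours from the timestamps listed inside a zip archive.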
May 11 2013
"Trey Brisbane" <tbrisbane hotmail.com> writes:
Yep, problem solved.

Thanks very much for your help! :)
May 11 2013