digitalmars.D - Segmentation fault in runTlsDtors

reply Ali Çehreli <acehreli yahoo.com> writes:
I need your help with sporadic segfaults.

Players:

* dmd 2.096 (but I've seen similar issues in the past with earlier 
versions as well)

* A D library with extern(C) functions that calls rt_init() and 
rt_term(), which I think are needed for the library's use with Python

* A D program that uses said library (would calling rt_init() and 
rt_term() cause harm in this case?) (Using the library with Python works 
fine.)


The segfault happens when the program is shutting down. Here is a stack 
trace from a core dump:

[Current thread is 1 (Thread 0x7fb1ef95e700 (LWP 20010))]
(gdb) bt

_D2rt5minfo__T17runModuleFuncsRevSQBgQBg11ModuleGroup11runTlsDtorsMFZ9__lambda1ZQCoMFAxPyS6object10ModuleInfoZv 
() from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

/usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

_D2rt5minfo16rt_moduleTlsDtorUZ14__foreachbody1MFKSQBx19sections_elf_shared3DSOZi 
() from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

_D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi () 
from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

/usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

/usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

pthread_create.c:463

../sysdeps/unix/sysv/linux/x86_64/clone.S:95


If related, here are the library initialization and deinitialization 
functions, which I think are needed e.g. for using from Python:

// The initialization function of the library
pragma (crt_constructor)
extern (C)
void lib_init() {
   const err = rt_init();
   enum success = 1;  // Yes, backwards.
   if (err != success) {
     fprintf(core.stdc.stdio.stderr, "Failed to initialize D runtime.");
     abort();
   }
}

// The deinitialization function of the library
pragma (crt_destructor)
extern (C)
void lib_deinit() {
   const err = rt_term();
   enum success = 1;  // Yes, backwards.
   if (err != success) {
     fprintf(core.stdc.stdio.stderr, "Failed to deinitialize D runtime.");
     // Intentionally not aborting in a destructor.
   }
}


The segmentation fault is sporadic; likely due to a race condition. Is 
it related to my code? Can I work around this? Can I reduce the 
likelihood of this happening?

The couple of places where I define any '~this' function are not used in 
this program. So, I rule out my allocating memory in a destructor.

Thank you,
Ali
Jun 25
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
This may not help but try with ldc's address sanitizer.

That might give you more information about the life time for the memory 
causing the segfault itself with stack traces.
Jun 25
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/25/21 10:55 AM, Ali Çehreli wrote:
 I need your help with sporadic segfaults.
 
 Players:
 
 * dmd 2.096 (but I've seen similar issues in the past with earlier 
 versions as well)
 
 * A D library with extern(C) functions that calls rt_init() and 
 rt_term(), which I think are needed for the library's use with Python
 
 * A D program that uses said library (would calling rt_init() and 
 rt_term() cause harm in this case?) (Using the library with Python works 
 fine.)
rt_init and rt_term are reentrant, you can call rt_term and rt_init as many times as you like, as long as you call rt_init first, and rt_term as many times as you called rt_init.
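
The counting behaviour described here can be sketched in a few lines of D. This is only an illustrative sketch, assuming a druntime in which rt_init() and rt_term() return 1 on success (the "Yes, backwards" convention noted in the original post). The helper name balancedInitTerm is hypothetical and not part of druntime.

```d
// C entry points of druntime, declared the same way a library would see them.
extern (C) int rt_init();
extern (C) int rt_term();

// Hypothetical helper: makes `n` extra rt_init() calls followed by `n`
// matching rt_term() calls, and reports whether every call returned 1.
bool balancedInitTerm(int n)
{
    bool ok = true;
    foreach (_; 0 .. n)
        if (rt_init() != 1)
            ok = false;
    foreach (_; 0 .. n)
        if (rt_term() != 1)
            ok = false;
    return ok;
}

void main()
{
    // druntime is already initialized before D's main runs, so the calls
    // above only raise and lower the runtime's init count.
    assert(balancedInitTerm(3));
}
```

As long as every rt_init() is matched by exactly one rt_term(), the runtime is only torn down by the last rt_term() call.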
 
 
 The segfault happens when the program is shutting down. Here is a stack 
 trace from a core dump:
 
 [Current thread is 1 (Thread 0x7fb1ef95e700 (LWP 20010))]
 (gdb) bt

 _D2rt5minfo__T17runModuleFuncsRevSQBgQBg11ModuleGroup11runTlsDtorsMFZ9__lambda1ZQCoMFAxPyS6object10ModuleInfoZv 
 () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 _D2rt5minfo16rt_moduleTlsDtorUZ14__foreachbody1MFKSQBx19sections_elf_shared3DSOZi 
 () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 _D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi () 
 from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 pthread_create.c:463

 ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Hm... maybe try compiling Phobos/druntime in debug mode. Line numbers would be helpful. It's interesting though, the segfault is not happening in a static destructor, but rather the function that runs the destructors (seems like a nested function). Have you tried running demangle on these to see what they really are?
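
For reference, demangling can be done from D itself with Phobos' std.demangle. The following is a minimal sketch that uses one of the frames from the backtrace; the constant name mangledFrame is made up for illustration. If the installed Phobos cannot decode a symbol, demangle() returns the input unchanged.

```d
import std.demangle : demangle;
import std.stdio : writeln;

// One of the mangled frames from the backtrace in this thread.
enum mangledFrame =
    "_D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi";

void main()
{
    // Prints the human-readable signature, or the input itself if the
    // symbol cannot be demangled.
    writeln(demangle(mangledFrame));
}
```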
 
 
 If related, here are the library initialization and deinitialization 
 functions, which I think are needed e.g. for using from Python:
 
 // The initialization function of the library
 pragma (crt_constructor)
 extern (C)
 void lib_init() {
    const err = rt_init();
    enum success = 1;  // Yes, backwards.
    if (err != success) {
      fprintf(core.stdc.stdio.stderr, "Failed to initialize D runtime.");
      abort();
    }
 }
 
 // The deinitialization function of the library
 pragma (crt_destructor)
 extern (C)
 void lib_deinit() {
    const err = rt_term();
    enum success = 1;  // Yes, backwards.
    if (err != success) {
      fprintf(core.stdc.stdio.stderr, "Failed to deinitialize D runtime.");
      // Intentionally not aborting in a destructor.
    }
 }
 
 
 The segmentation fault is sporadic; likely due to a race condition. Is 
 it related to my code? Can I work around this? Can I reduce the 
 likelihood of this happening?
Are you running any other CRT destructors that might use D constructs? Note that CRT destructors and constructors do *not* run in any specific order, unlike D constructors and destructors.
 The couple of places where I define any '~this' function are not used in 
 this program. So, I rule out my allocating memory in a destructor.
Allocating memory in a destructor would not cause this problem.

-Steve
Jun 25
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 6/25/21 11:21 AM, Steven Schveighoffer wrote:

 rt_init and rt_term are reentrant, you can call rt_term and rt_init as
 many times as you like, as long as you call rt_init first, and rt_term
 as many times as you called rt_init.
Cool. That's what I know.
 The segfault happens when the program is shutting down. Here is a
 stack trace from a core dump:

 [Current thread is 1 (Thread 0x7fb1ef95e700 (LWP 20010))]
 (gdb) bt

 
_D2rt5minfo__T17runModuleFuncsRevSQBgQBg11ModuleGroup11runTlsDtorsMFZ9__lambda1ZQCoMFAxPyS6object10ModuleInfoZv
 () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 
_D2rt5minfo16rt_moduleTlsDtorUZ14__foreachbody1MFKSQBx19sections_elf_shared3DSOZi
 () from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 _D2rt19sections_elf_shared3DSO14opApplyReverseFMDFKSQByQByQBgZiZi ()
 from /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 /usr/lib/x86_64-linux-gnu/libphobos2.so.0.96

 pthread_create.c:463

 ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Hm... maybe try compiling Phobos/druntime in debug mode. Line numbers would be helpful. It's interesting though, the segfault is not happening in a static destructor, but rather the function that runs the destructors (seems like a nested function). Have you tried running demangle on these to see what they really are?
I can see runTlsDtors() in frame 0. Assuming it runs the destructors of my TLS objects, then the culprit may be me. (See below.) And why are we inside starting a thread? Is that a GC thread? I can't imagine my program starting a thread when the program is shutting down. (?)
 Are you running any other CRT destructors that might use D constructs?
No. There is only one pair to initialize the library. Again, the library is used by a D program but the program does not load the library explicitly. This is built by cmake and the library is specified as a dependency and I assume it's linked and loaded automatically. I just had a worry: I am not even sure whether a function is used from the library or whether it's compiled and used from the module that the program inevitably imports. For example, if the library has a c_api.d module, the D program imports it anyway and it imports other modules that it depends on anyway. :) So, perhaps my D program does not even use the library, in which case perhaps rt_term may be a problem. (?)
 The couple of places where I define any '~this' function are not used
 in this program. So, I rule out my allocating memory in a destructor.
Allocating memory in a destructor would not cause this problem.
I am reminded of ~this() functions (any kind: struct, class, static, and shared static) because the segfault happens during runTlsDtors(). Does that execute my code? Am I doing things in destructors that I should not be doing? But again, the only destructors I defined are not in this program. (The only one that's in this program is in a unittest, which is excluded by 'version(unittest)'.)
 -Steve
Thank you, Ali
Jun 25
parent reply Max Samukha <maxsamukha gmail.com> writes:
On Saturday, 26 June 2021 at 02:14:50 UTC, Ali Çehreli wrote:

 And why are we inside starting a thread? Is that a GC thread? I 
 can't imagine my program starting a thread when the program is 
 shutting down. (?)
We just haven't exited the process's main thread yet, which was created with this call at line 95: https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/x86_64/clone.S.html
Jul 01
parent Ali Çehreli <acehreli yahoo.com> writes:
On 7/1/21 12:51 PM, Max Samukha wrote:
 On Saturday, 26 June 2021 at 02:14:50 UTC, Ali Çehreli wrote:

 And why are we inside starting a thread? Is that a GC thread? I can't
 imagine my program starting a thread when the program is shutting
 down. (?)
 We just haven't exited the process's main thread yet, which was created
 with this call at line 95:
 https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/x86_64/clone.S.html

Thanks.

I came here to report that I've worked around this issue by not linking with the library but including its modules in the program that segfaulted. The main difference in this case is the lack of the library's c_api.d file, which did automatic library initialization and deinitialization. Of course, I'm not sure whether that was the cause but I am happy that it was a fairly simple workaround which involved just the build configuration file.

Ali
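
One possible alternative to automatic initialization, not something proposed in this thread, is to export explicit entry points and let each host decide whether to call them. The sketch below assumes that design. The names mylib_init and mylib_term are hypothetical, and whether this would have avoided the segfault is not known.

```d
import core.runtime : Runtime;

// Hypothetical explicit entry points replacing the crt_constructor /
// crt_destructor pair. A non-D host (e.g. Python) would call them itself;
// a D host, whose runtime is already running, could simply skip them.
extern (C) int mylib_init()
{
    // Runtime.initialize() wraps rt_init() and returns a bool.
    return Runtime.initialize() ? 1 : 0;
}

extern (C) int mylib_term()
{
    // Runtime.terminate() wraps rt_term() and returns a bool.
    return Runtime.terminate() ? 1 : 0;
}

void main()
{
    // Inside a D program the runtime is already up, so this pair only
    // raises and lowers the init count.
    assert(mylib_init() == 1);
    assert(mylib_term() == 1);
}
```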
Jul 01