digitalmars.D.learn - druntime thread (from foreach parallel?) cleanup bug

reply mw <mingwu gmail.com> writes:
My program received signal SIGSEGV, Segmentation fault.

Its simplified structure looks like this:

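```
import std.stdio : writeln;

void foo() {
   // ... (body elided in the original post)
   writeln("done");  // this line was printed before the crash
}

int main() {
   foo();
   return 0;  // the SIGSEGV arrives after this point
}
```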

So it failed just before the program exited. I suspect druntime 
has a thread cleanup bug somewhere (maybe related to foreach 
parallel), which is unrelated to my own code. This kind of bug is 
hard to reproduce, so I'm not sure if I should file an issue.

I'm using: LDC - the LLVM D compiler (1.30.0) on x86_64.


Under gdb, here is the threads info (for the record):

Thread 11 "xxx" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x1555553df700 (LWP 36258)]
__GI___res_iclose (free_addr=true, statp=0x1555553dfdb8) at 
res-close.c:103
103     res-close.c: No such file or directory.


(gdb) info threads
   Id   Target Id         Frame
   1    Thread 0x155555515000 (LWP 36244) "lt" 0x0000155550af1d2d 
in __GI___pthread_timedjoin_ex (threadid=23456246527744, 
thread_return=0x0, abstime=0x0, block=<optimized out>) at 
pthread_join_common.c:89
* 11   Thread 0x1555553df700 (LWP 36258) "lt" __GI___res_iclose 
(free_addr=true, statp=0x1555553dfdb8) at res-close.c:103
   17   Thread 0x155544817700 (LWP 36264) "lt" 0x0000155550afac70 
in __GI___nanosleep (requested_time=0x155544810e90, 
remaining=0x155544810ea8) at 
../sysdeps/unix/sysv/linux/nanosleep.c:28


(gdb) thread 1
[Switching to thread 1 (Thread 0x155555515000 (LWP 36244))]

(threadid=23456246527744, thread_return=0x0, abstime=0x0, 
block=<optimized out>) at pthread_join_common.c:89
89      pthread_join_common.c: No such file or directory.
(gdb) where

(threadid=23456246527744, thread_return=0x0, abstime=0x0, 
block=<optimized out>) at pthread_join_common.c:89

core.thread.osthread.joinLowLevelThread(ulong) ()

_D4core8internal2gc4impl12conservativeQw3Gcx15stopScanThreadsMFNbZv ()

_D4core8internal2gc4impl12conservativeQw3Gcx4DtorMFZv ()

_D4core8internal2gc4impl12conservativeQw14ConservativeGC6__dtorMFZv ()



_D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv ()



//home/zhou/project/ldc2-1.30.0-linux-x86_64/bin/../import/core/internal/entrypoint.d:42

<main>, argc=2, argv=0x7fffffffe188, init=<optimized out>, 
fini=<optimized out>, rtld_fini=<optimized out>, 
stack_end=0x7fffffffe178)
     at ../csu/libc-start.c:310



(gdb) thread 11
[Switching to thread 11 (Thread 0x1555553df700 (LWP 36258))]

res-close.c:103
103     res-close.c: No such file or directory.
(gdb) where

res-close.c:103


thread-freeres.c:29

pthread_create.c:476

../sysdeps/unix/sysv/linux/x86_64/clone.S:95


(gdb) thread 17
[Switching to thread 17 (Thread 0x155544817700 (LWP 36264))]

(requested_time=0x155544810e90, remaining=0x155544810ea8) at 
../sysdeps/unix/sysv/linux/nanosleep.c:28
28      ../sysdeps/unix/sysv/linux/nanosleep.c: No such file or 
directory.
(gdb) where

(requested_time=0x155544810e90, remaining=0x155544810ea8) at 
../sysdeps/unix/sysv/linux/nanosleep.c:28

_D4core6thread8osthread6Thread5sleepFNbNiSQBo4time8DurationZv ()

_D4hunt4util8DateTimeQj25_sharedStaticCtor_L406_C5FZ9__lambda4MFZv () at
home/zhou/.dub/packages/hunt-1.7.16/hunt/source/hunt/util/DateTime.d:430


pthread_create.c:463

../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Nov 01 2022
parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Nov 01, 2022 at 05:19:56PM +0000, mw via Digitalmars-d-learn wrote:
 My program received signal SIGSEGV, Segmentation fault.
 
 Its simplified structure looks like this:
 
 ```
 void foo() {
   ...
   writeln("done");  // saw this got printed!
 }
 
 int main() {
   foo();
   return 0;
 }
 
 ```
Can you show a code snippet that includes the parallel foreach? Because the above code snippet is over-simplified to the point it's impossible to tell what the original problem might be, since obviously calling a function that calls writeln would not crash the program.

Maybe try running Digger to reduce the code for you?

T

-- 
Never step over a puddle, always step around it. Chances are that whatever made it is still dripping.
Nov 01 2022
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 11/1/22 10:27, H. S. Teoh wrote:

 Maybe try running Digger to reduce the code for you?
Did you mean dustmite? It is accessible as 'dub dustmite <destination-path>', but I haven't used it.

My guess for the segmentation fault is that the OP is executing destructor code that assumes some members are alive. If so, the code should be moved from destructors to functions to be called like obj.close(). But it's just a guess...

Ali
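
To make the obj.close() suggestion concrete, here is a minimal sketch. The `Connection` class and its members are hypothetical and exist only for illustration; they are not taken from the OP's code or from any library mentioned in this thread.

```d
import std.stdio : writeln;

// Hypothetical resource type, for illustration only.
class Connection
{
    private bool opened = true;

    // Explicit, deterministic cleanup: it runs at a point the caller
    // chooses, while everything this object refers to is still alive.
    void close()
    {
        if (!opened)
            return; // calling close() twice is harmless
        opened = false;
        writeln("connection closed");
    }

    bool isOpen() const
    {
        return opened;
    }

    // Deliberately no ~this(): a destructor run by the GC must not assume
    // that other GC-managed members are still valid.
}

void main()
{
    auto conn = new Connection;
    {
        scope (exit) conn.close(); // runs when this block is left
        assert(conn.isOpen());
    }
    assert(!conn.isOpen());
}
```

The `scope (exit)` statement ties the cleanup to a block rather than to garbage collection, so the order of teardown is under the caller's control.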
Nov 01 2022
parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Nov 01, 2022 at 10:37:57AM -0700, Ali Çehreli via Digitalmars-d-learn
wrote:
 On 11/1/22 10:27, H. S. Teoh wrote:
 
 Maybe try running Digger to reduce the code for you?
 Did you mean dustmite, which is accessible as 'dub dustmite <destination-path>' but I haven't used it.
Oh yes, sorry, I meant dustmite, not digger. :-P
 My guess for the segmentation fault is that the OP is executing
 destructor code that assumes some members are alive. If so, the code
 should be moved from destructors to functions to be called like
 obj.close(). But it's just a guess...
[...]

Yes, that's a common gotcha.

T

-- 
We are in class, we are supposed to be learning, we have a teacher... Is it too much that I expect him to teach me??? -- RL
Nov 01 2022
prev sibling parent reply mw <mingwu gmail.com> writes:
 Can you show a code snippet that includes the parallel foreach?
(It's just a very straightforward foreach on an array; as I said, it may not be relevant.)

And I just noticed that one of the thread traces points to here:

https://github.com/huntlabs/hunt/blob/master/source/hunt/util/DateTime.d#L430

```
class DateTime {
  shared static this() {
    ...
    dateThread.isDaemon = true;  // not sure if this is related
  }
}
```

In the comments, it says: "BUG: ... crashed". Looks like someone ran into this (cleanup) issue already, but was unable to fix it.

Anyway, I logged an issue there:

https://github.com/huntlabs/hunt/issues/96
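
For readers who have not seen one, a "straightforward foreach on an array" with std.parallelism might look like the sketch below. This is not the OP's code, which was never posted; the function name `squaresInParallel` and the data are made up for illustration.

```d
import std.parallelism : parallel;
import std.stdio : writeln;

// Hypothetical example: fill an array with squares, one element per task.
int[] squaresInParallel(size_t n)
{
    auto result = new int[](n);
    // parallel() hands the loop body to the default taskPool's workers.
    foreach (i, ref x; parallel(result))
        x = cast(int)(i * i);
    return result;
}

void main()
{
    writeln(squaresInParallel(8));
    writeln("done");
}
```

According to the std.parallelism documentation, the worker threads of the default `taskPool` are daemon threads, so they are still around when `main` returns. That makes them a natural suspect, but nothing in the backtrace above actually points at them.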
Nov 01 2022
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 11/1/22 1:47 PM, mw wrote:
 Can you show a code snippet that includes the parallel foreach?
 (It's just a very straight forward foreach on an array; as I said it may not be relevant.)
 
 And I just noticed, one of the thread trace points to here:
 
 https://github.com/huntlabs/hunt/blob/master/source/hunt/util/DateTime.d#L430
 
 ```
 class DateTime {
   shared static this() {
     ...
     dateThread.isDaemon = true;  // not sure if this is related
   }
 }
 ```
 
 in the comments, it said: "BUG: ... crashed".  Looks like someone run into this (cleanup) issue already, but unable to fix it.
 
 Anyway I logged an issue there:
 
 https://github.com/huntlabs/hunt/issues/96
Oh yeah, isDaemon detaches the thread from the GC. Don't do that unless you know what you are doing. -Steve
Nov 01 2022
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On Tuesday, 1 November 2022 at 18:18:45 UTC, Steven Schveighoffer 
wrote:
 Oh yeah, isDaemon detaches the thread from the GC. Don't do 
 that unless you know what you are doing.
As discussed on discord, this isn't true actually. All it does is prevent the thread from being joined before exiting the runtime.

What is *likely* happening is, the runtime shuts down. That thread is still running, but the D runtime is gone. So it eventually starts trying to do something (like let's say, access thread local storage), and it's gone. Hence the segfault.

-Steve
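
One way to avoid that race, sketched here under assumptions, is to leave the thread non-daemon, ask it to stop, and join it before the runtime shuts down. The names `stopRequested`, `startTicker` and `stopTicker` are hypothetical; this is not how hunt is written and not a patch for it.

```d
import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread;
import core.time : msecs;
import std.stdio : writeln;

shared bool stopRequested; // hypothetical stop flag, shared across threads

// Start a background thread that polls the flag. isDaemon is left at its
// default (false), so the runtime would also wait for it on exit.
Thread startTicker()
{
    auto t = new Thread({
        while (!atomicLoad(stopRequested))
            Thread.sleep(10.msecs);
    });
    t.start();
    return t;
}

// Ask the thread to finish and wait until it has really exited.
void stopTicker(Thread t)
{
    atomicStore(stopRequested, true);
    t.join();
}

void main()
{
    auto ticker = startTicker();
    writeln("done");
    stopTicker(ticker); // the thread is gone before the runtime terminates
    assert(!ticker.isRunning);
}
```

Because the thread is joined while the runtime is still fully alive, it can never wake up from `Thread.sleep` into a half-torn-down process.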
Nov 01 2022
prev sibling parent reply mw <mingwu gmail.com> writes:
On Tuesday, 1 November 2022 at 18:18:45 UTC, Steven Schveighoffer 
wrote:

 
 And I just noticed, one of the thread trace points to here:
 
 https://github.com/huntlabs/hunt/blob/master/source/hunt/util/DateTime.d#L430
 
 ```
 class DateTime {
    shared static this() {
      ...
      dateThread.isDaemon = true;  // not sure if this is 
 related
    }
 }
 ```
 
 in the comments, it said: "BUG: ... crashed".  Looks like 
 someone run into this (cleanup) issue already, but unable to 
 fix it.
 
 Anyway I logged an issue there:
 
 https://github.com/huntlabs/hunt/issues/96
 
 
 Oh yeah, isDaemon detaches the thread from the GC. Don't do that unless you know what you are doing.

Maybe the hunt library author doesn't know. (My code does not directly use this library; it got pulled in by some other dependencies.)

Currently, the `isDaemon` doc does not mention this:

https://dlang.org/library/core/thread/threadbase/thread_base.is_daemon.html

 Sets the daemon status for this thread. While the runtime will wait for all normal threads to complete before tearing down the process, daemon threads are effectively ignored and thus will not prevent the process from terminating. In effect, daemon threads will be terminated automatically by the OS when the process exits.

Maybe we should add to the doc?

BTW, what exactly is going wrong with their code?

I saw that the tick() method call inside the anonymous `dateThread` is accessing these two stack variables of shared static this():

https://github.com/huntlabs/hunt/blob/master/source/hunt/util/DateTime.d#L409

    Appender!(char[])[2] bufs;
    const(char)[][2] targets;

Why does this tick() call work after the static this() has finished in a normal run? Why does the problem only show up when the program finishes?
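
The thread never answers the first of those questions directly. One piece of it is standard D behavior: when a delegate that captures local variables outlives the enclosing function, the compiler places those locals in a GC-allocated closure instead of on the stack. Whether that fully explains the hunt code is an assumption; the `makeCounter` function below is a hypothetical illustration, not code from hunt.

```d
import std.stdio : writeln;

// Returns a delegate that keeps using a local after this function returns.
int delegate() makeCounter()
{
    int count; // captured below, so it lives in a heap closure, not on the stack
    return () { return ++count; };
}

void main()
{
    auto next = makeCounter();
    // makeCounter() has returned, yet its captured local is still valid.
    assert(next() == 1);
    assert(next() == 2);
    writeln("captured state survived the enclosing call");
}
```

By the same mechanism, locals captured by a thread's delegate would stay valid while the program runs normally, which is consistent with the crash showing up only during shutdown.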
Nov 01 2022
parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Tuesday, 1 November 2022 at 19:49:47 UTC, mw wrote:
 On Tuesday, 1 November 2022 at 18:18:45 UTC, Steven 
 Schveighoffer wrote:

[...]
 Maybe the hunt library author doesn't know. (My code does not directly use this library, it got pulled in by some other dependencies.) [...]
Please, if you see anything in the docs that needs to be updated, make a PR right away <3 Documentation saves lives! The times I have thought "I'll do it later" have been too many.
Nov 01 2022