digitalmars.D.learn - Initializing D runtime and executing module and TLS ctors for D


Ali Çehreli <acehreli yahoo.com> writes:

tl;dr I know enough to sense that there is important stuff that I don't know.

Even though I sometimes act[1] like someone who knows stuff, there are 
many fuzzy areas for me especially in the runtime.

Things work great when D code is inside a D program. The runtime and
module states are magically initialized and everything works. Things are
less clear when it comes to writing a D library, especially when that
library may be used by other language runtimes, necessarily on foreign
threads.

Here are the essential points that I do and don't understand.

- Initialize the runtime: This is automatically done for a D program as 
described on the wiki[2]. This must be done by calling rt_init[3] for a 
D shared library. I handle this by calling rt_init from a 
pragma(crt_constructor) function[4]. Luckily, this is easy and works for 
all cases that I have.

- Execute module constructors ("ctor" for short, i.e. 'shared static 
this' blocks). This is done automatically for a D program and when the D 
library is loaded by other language code like C++ and Python. However, 
I've encountered a case[5] where module ctors were not being called. 
This could be due to runtime bugs or something that I don't understand 
with loading shared libraries. (My workaround is very involved: I grep 
the output of 'nm' to determine the symbol for the module ctor, call it 
after dlsym'ing, and because 'nm | grep' is a slow process, I cache this 
information in a file along with the ~2K libraries that I may load 
conditionally.)

- Loading D libraries from D code: I call loadLibrary[6] to load a D 
library so that "[its] D runtime [...] will be integrated with the 
current runtime". Sounds promising; assuming that rt_init is already 
called for the calling library, I assume loadLibrary will handle 
everything, and all code will use a single runtime and things will work 
fine. This works flawlessly for my D and C++ programs that load my D 
library that loads the other D libraries.

- Attaching foreign threads: The D runtime needs to know about all threads
that are running D code so that it knows which threads make up
"the world" when it has to "stop the world" for garbage
collection. The function to do this is thread_attachThis[7].
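
The first bullet above (calling rt_init from a pragma(crt_constructor)
function) might look roughly like the following. This is only a minimal
sketch: the names mylib_load and mylib_unload are made up for
illustration, and pairing rt_init with rt_term in a crt_destructor is an
assumption about what a library would want at unload time, not something
stated in this thread.

```d
import core.runtime : rt_init, rt_term;

// Hypothetical name. Runs from the C runtime when the shared library is
// loaded, before the host program can call into the library.
pragma(crt_constructor)
extern (C) void mylib_load()
{
    rt_init(); // reference counted, so an already running runtime is fine
}

// Hypothetical name. Balances the rt_init above when the library goes away.
pragma(crt_destructor)
extern (C) void mylib_unload()
{
    rt_term();
}
```

Both functions are extern (C), take no parameters, and return void, which
is what the pragma expects.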

One question I have is, does rt_init already do thread_attachThis? I ask 
because I have a library that is loaded by Python and things work even 
*without* calling thread_attachThis.

- Execute thread local storage (TLS) ctors: Again, this happens 
automatically for most cases. However, thread_attachThis says "[if] full 
functionality as a D thread is desired, [rt_moduleTlsCtor] must be 
called after thread_attachThis". Ok. When would I not want "full 
functionality" anyway?

Another question: Are TLS ctors executed when I do loadLibrary?

And when they are executed, which modules are involved? The module that 
is calling rt_moduleTlsCtor or all modules? What are "all modules"?

- Detaching foreign threads: Probably even more important than 
thread_attachThis is thread_detachThis[8]. As its documentation says, 
one should call rt_moduleTlsDtor as well for "full functionality".

This is very important because when a GC collection kicks in, it will
stop all threads that make up its world. If one of those threads has
already terminated, we will crash. (Relatedly, I have an abandoned
PR[9] that tried to fix issues with thread_detachThis; it stalled due
to failing unit tests on the 32-bit Apple operating system, which D
has since stopped supporting.) (And I stopped working on that issue
mostly because the company I used to work for stopped using D and
rewrote their library in C++.)

I have questions regarding thread_attachThis and thread_detachThis: When 
should they be called? Should the library expose a function that the 
users must call from *each thread* that they will be using? This may not 
be easy because a user may not know what thread they are running on. For 
example, the user of our library may be on a framework where threads may 
come and go, where the user may not have an opportunity to call 
thread_detachThis when a thread goes away. For example, the user may 
provide callback functions (which call us) to a framework that is 
running on a thread pool.

For that reason, my approach has been to call thread_attachThis upon
entering an API function and thread_detachThis upon leaving it,
because I may not know whether this thread will survive or die soon.
(thread_detachThis is so important because the next GC cycle will try to
stop this thread and may crash.)
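
A minimal sketch of that attach-on-entry, detach-on-exit pattern follows.
The function name mylib_twice is hypothetical. rt_moduleTlsCtor and
rt_moduleTlsDtor are declared by hand here on the assumption that
druntime does not expose them through a public import. The
Thread.getThis() check is an added precaution so that a thread druntime
already knows about (for example the one that called rt_init) is neither
re-initialized nor detached.

```d
import core.thread : Thread, thread_attachThis, thread_detachThis;

// Defined by druntime; declared here so that user code can call them.
extern (C) void rt_moduleTlsCtor();
extern (C) void rt_moduleTlsDtor();

// Hypothetical API function exported to the foreign host.
extern (C) int mylib_twice(int x)
{
    // Thread.getThis() returns null for a thread unknown to druntime.
    immutable foreign = Thread.getThis() is null;

    if (foreign)
    {
        thread_attachThis(); // make the GC aware of this thread
        rt_moduleTlsCtor();  // run the 'static this()' blocks for it
    }
    scope (exit)
    {
        if (foreign)
        {
            rt_moduleTlsDtor();
            thread_detachThis(); // the thread may die before we see it again
        }
    }

    return 2 * x; // stand-in for the real D implementation
}
```

The cost is an attach/detach pair on every call from a foreign thread,
which is the price of not knowing whether the thread will ever come back.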

More questions: Can I thread_detachThis the thread that called rt_init?
Can I call rt_moduleTlsCtor more than once? I guess it depends on each
module. It will be troubling if a TLS ctor reinitializes a module's state. :/

While trying to sort all of these out, I am facing a bug[10], which will
force me to move away from std.parallelism and perhaps use
std.concurrency. Even though that bug is reported for OS X, I think both
that case and my "called from Python" case are related to undefined
behavior in the runtime's thread management, which is exposed by
std.parallelism. (?)

As you can see, even though I can list many references to act like I 
know stuff, I really don't and have many questions. :) The trouble is, 
when there are so many dimensions to test to be sure, it is extremely 
difficult to learn when a seg-fault bug is intermixed with all this, 
which hits sporadically. :(

I want to learn.

Thank you,
Ali

[1] https://www.youtube.com/watch?v=FNL-CPX4EuM

[2] https://wiki.dlang.org/Runtime_internals

[3] https://dlang.org/library/core/runtime/rt_init.html

[4] https://dlang.org/spec/pragma.html#crtctor

[5] https://forum.dlang.org/thread/rucm30$1lgk$1 digitalmars.com

[6] https://dlang.org/library/core/runtime/runtime.load_library.html

[7] https://dlang.org/library/core/thread/osthread/thread_attach_this.html

[8] https://dlang.org/library/core/thread/threadbase/thread_detach_this.html

[9] https://github.com/dlang/druntime/pull/1989

[10] https://issues.dlang.org/show_bug.cgi?id=11736

Jan 23 2021

IGotD- <nise nise.com> writes:

On Sunday, 24 January 2021 at 00:24:55 UTC, Ali Çehreli wrote:
 One question I have is, does rt_init already do 
 thread_attachThis? I ask because I have a library that is 
 loaded by Python and things work even *without* calling 
 thread_attachThis.

From what I have seen, thread_attachThis is performed during rt_init
in the main thread.

 Another question: Are TLS ctors executed when I do loadLibrary?

 And when they are executed, which modules are involved? The 
 module that is calling rt_moduleTlsCtor or all modules? What 
 are "all modules"?

The TLS standard (at least the ELF standard) does not have ctors.
Only simple initialization is allowed, meaning the initial data
is stored in .tdata, which is copied to the specific memory area
for each thread. There is also a .tbss, which is zeroed memory just
like the .bss section. Actual ctor code that runs for each
thread is language specific and not part of the ELF standard,
therefore no such TLS ctor code is run by the lower level
API. The initialization (only copying and zeroing) of TLS data is
done when each thread starts. This can even be done in a
lazy manner when the first TLS variable is accessed.
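
To make the distinction concrete, here is a small sketch in D. The
variable and function names are made up for illustration. It contrasts a
thread-local variable with a constant initializer (the kind of data the
loader can copy on its own) with one that is only set by a per-thread
module constructor, which is D-specific code that druntime has to run.

```d
import core.thread : Thread;

// Module-level variables are thread-local by default in D.
int fromInitializer = 7; // constant initializer: plain copy per thread
int fromCtor;            // stays zero until a D TLS ctor runs

// Per-thread module constructor. druntime runs it for threads that it
// starts; nothing at the ELF level would run it on a foreign thread.
static this()
{
    fromCtor = fromInitializer * 6;
}

// Reads fromCtor from inside a thread created by druntime.
int valueSeenInNewThread()
{
    int seen;
    auto t = new Thread({ seen = fromCtor; });
    t.start();
    t.join();
    return seen;
}
```

valueSeenInNewThread() returns 42 because the thread was created by
druntime. On a foreign thread that has not had rt_moduleTlsCtor called
for it, fromCtor would be expected to still hold its zero value.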

 I have questions regarding thread_attachThis and 
 thread_detachThis: When should they be called? Should the 
 library expose a function that the users must call from *each 
 thread* that they will be using? This may not be easy because a 
 user may not know what thread they are running on. For example, 
 the user of our library may be on a framework where threads may 
 come and go, where the user may not have an opportunity to call 
 thread_detachThis when a thread goes away. For example, the 
 user may provide callback functions (which call us) to a 
 framework that is running on a thread pool.

I call thread_attachThis as soon as the thread is about to call a
D function, for example in a callback from a thread in a thread
pool. This usually happens where a function or delegate is
involved, as any jump to D code would use one. I have to make a
generic API and then a D API on top of that. In practice this
means there is a trampoline function involved, where
thread_attachThis and thread_detachThis are called. This is
also where I call the TLS ctors/dtors. It is a consequence of
delegates being language specific, and it falls out naturally that way.
Avoid extern(C) calls directly into D code.

In practice you can do this for any thread, even if several
delegates are called during the thread's lifetime. You can simply have
a TLS bool variable telling whether thread_attachThis and
rt_moduleTlsCtor have already been run.
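
A minimal sketch of such a flag, under a few assumptions: the names
dReady and ensureDThread are hypothetical, rt_moduleTlsCtor is declared
by hand, and threads that druntime already knows about are only marked
as ready rather than initialized a second time.

```d
import core.thread : Thread, thread_attachThis;

extern (C) void rt_moduleTlsCtor(); // defined by druntime

// Thread-local by default, so every thread has its own copy, initially
// false. It needs no TLS ctor, only the zeroing done at thread start.
bool dReady;

// Hypothetical helper: call at the top of every trampoline.
void ensureDThread()
{
    if (dReady)
        return; // this thread has been through here before

    if (Thread.getThis() is null) // unknown to druntime: a foreign thread
    {
        thread_attachThis();
        rt_moduleTlsCtor();
    }
    dReady = true;
}
```

This sketch never detaches, so it only fits hosts whose threads outlive
every garbage collection or can call a matching cleanup function before
they exit.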

 More questions: Can I thread_detachThis the thread that called
 rt_init? Can I call rt_moduleTlsCtor more than once? I guess it
 depends on each module. It will be troubling if a TLS ctor
 reinitializes a module's state. :/

I have brought up this question before because, as things stand right
now, I haven't seen any "rt_uninit" or "rt_close" function. This
is a bit limiting for me, as the main thread can exit while the
process lives on. In general, the main thread that enters main
must also be the last one to return through the entire chain of
functions that were called during process entry. What will happen is
that you possibly do a thread_detachThis twice.

Short answer: just park the main thread while the bulk of the work is
done by other threads. Unfortunately that's how many libraries
work today.

Jan 23 2021

Ali Çehreli <acehreli yahoo.com> writes:

Thank you very much for your answers. I think I've been on the right
track and the following bug that I've mentioned has been messing things
up by hitting me randomly:

   https://issues.dlang.org/show_bug.cgi?id=11736

On 1/23/21 5:18 PM, IGotD- wrote:

 From what I have seen, thread_attachThis is performed during rt_init
 in the main thread.

That explains why everything just works in most cases.

 Actual ctor code that runs for each thread is language specific and
 not part of the ELF standard, therefore no such TLS ctor code is
 run by the lower level API. The initialization (only copying and zeroing)
 of TLS data is done when each thread starts.

That must be the case for threads started by D runtime, right? It sounds 
like I must call rt_moduleTlsCtor explicitly for foreign threads. It's 
still not clear to me which modules' TLS variables are initialized 
(copied over). Only this module's or all modules that are in the 
program? I don't know whether it's possible to initialize one module; 
rt_moduleTlsCtor does not take any parameter.

 This can even be done
 in a lazy manner when the first TLS variable is being accessed.

I hope that's the case.

 I have to make a generic API and then a D
 API on top of that.

Did you mean a generic API, which makes calls to D? That's how I have 
it: an extern(C) API function calling proper D code.

 In practice this means there is a trampoline
 function involved, where thread_attachThis and thread_detachThis are
 called. This is also where I call the TLS ctors/dtors.

That's what I will be doing.

 It is a consequence of
 delegates being language specific, and it falls out naturally that way.
 Avoid extern(C) calls directly into D code.

I hope I am misunderstanding you there. All I have are extern(C)
functions in the library API.

 In practice you can do this for any thread even if there are several
 delegates during the thread lifetime. You can simply have a TLS bool
 variable telling if the thread_attachThis and rt_moduleTlsCtor have
 already been run.

I've already experimented with it but it didn't work likely because of 
the bug mentioned above.

 In general the main thread that goes into main must also be the last one
 returning the entire line of functions that was called during entry of
 the process.

Main entry belongs to another language, so I have to document that this 
library can only work in such "well behaved" cases.

 What will happen is that you possibly do a
 thread_detachThis twice.

Sounds like I can track that with a bool variable as well, no?

 Short answer is just park the main thread while the bulk is being done
 by other threads. Unfortunately that's how many libraries work today.

Agreed. That's for me to specify in the library documentation.

I should revive my old PR and see whether it is needed at all:

   https://github.com/dlang/druntime/pull/1989

I am surprised how much I had learned at that time and how much I've 
already forgotten. :/ For example, my PR involves thread_setThis, which 
seems to be history now:

 
https://docarchives.dlang.io/v2.068.0/phobos/core_thread.html#.thread_setThis

And thread_detachThis seems to be missing now:

   https://dlang.org/phobos/core_thread.html

   https://dlang.org/phobos/core_thread_osthread.html

Documentation issue or is it not needed anymore?

Ali

Jan 23 2021

IGotD- <nise nise.com> writes:

On Sunday, 24 January 2021 at 03:59:26 UTC, Ali Çehreli wrote:
 That must be the case for threads started by D runtime, right? 
 It sounds like I must call rt_moduleTlsCtor explicitly for 
 foreign threads. It's still not clear to me which modules' TLS 
 variables are initialized (copied over). Only this module's or 
 all modules that are in the program? I don't know whether it's 
 possible to initialize one module; rt_moduleTlsCtor does not 
 take any parameter.

Any thread started by druntime has proper initialization, of
course. A thread started by a module written in another
language will not do the D thread initialization.

All TLS variables in all loaded modules are initialized
(only copying and zeroing) by the OS code for each thread
that the OS knows about. After that it is up to each
library for each language to do further initialization. The next time
__tls_get_addr is called after loading a library, the TLS
variables of any new module will be found and initialized.

It is a mystery to me why the TLS standard never included a
ctor/dtor vector for TLS variables. It is in practice possible
but they didn't do it. The whole TLS design is like Swiss
cheese.

 Did you mean a generic API, which makes calls to D? That's how 
 I have it: an extern(C) API function calling proper D code.

I have a lot of system code written in C++, which also includes
callbacks from that code. In order to support D, a layer is
necessary to catch all callbacks in a trampoline and invoke D
delegates. Calling D code directly with extern(C) should be
avoided because 1. D delegates are so much more versatile. 2. You
must use a trampoline in order to do D specific thread
initialization anyway. Since std::function cannot be used in a
generic interface I actually use something like this,
http://blog.coldflake.com/posts/C++-delegates-on-steroids/, which
is more versatile than plain extern(C) but simple enough that
it can be used by any language. In the case of D, the "this
pointer" can be used as a pointer to a D delegate.

Creating language agnostic interfaces requires more attention than
usual, in my experience. Strings, for example, complicate
things further as they are different for every language.

Jan 24 2021

Ali Çehreli <acehreli yahoo.com> writes:

On 1/24/21 2:28 AM, IGotD- wrote:

 Any thread started by druntime has proper initialization, of course. A
 thread started by a module written in another language will not do
 the D thread initialization.

And that of course has been what I've been trying to deal with. Bugs in 
the uses of thread_attachThis and thread_detachThis, and most 
importantly, not having a guaranteed opportunity to call 
thread_detachThis (think a foreign thread dies on its own without 
calling us and the runtime crashes attempting to stop a nonexisting 
thread during a GC cycle) finally made me realize that D shared library 
functions cannot be called on foreign threads. At least not today... Or, 
they can only be used under unusual conventions like the rule below.

So, that's the golden rule: If you want to call functions of a D shared
library (I guess static library as well) you must create your thread with
our library's create_thread() function and join that thread with our
library's join_thread() function. Works like a charm!
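
A pair of functions along those lines could be sketched as follows. The
names, the opaque void* handle, and the callback signature are all
assumptions made for illustration, not the actual library's API. Because
the thread is created through core.thread.Thread, druntime attaches it,
runs its TLS ctors, and cleans up after it without any help from the
foreign caller.

```d
import core.memory : GC;
import core.thread : Thread;

// Hypothetical callback type that a C or C++ host would pass in.
alias ThreadFn = extern (C) void function(void*);

extern (C) void* mylib_create_thread(ThreadFn fn, void* arg)
{
    auto t = new Thread({ fn(arg); });
    // The foreign caller will hold the only long-lived reference, so pin
    // the Thread object until it has been joined.
    GC.addRoot(cast(void*) t);
    t.start();
    return cast(void*) t;
}

extern (C) void mylib_join_thread(void* handle)
{
    auto t = cast(Thread) handle;
    t.join();
    GC.removeRoot(handle);
}

// Example callback, for illustration only.
extern (C) void storeAnswer(void* p)
{
    *cast(int*) p = 42;
}
```

A host would call mylib_create_thread(&storeAnswer, &value), later pass
the returned handle to mylib_join_thread, and only then read value.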

Luckily, it is trivial to determine whether we are being called on a 
foreign thread or a D thread through a module scoped bool variable...

 Since std::function cannot be
 used in a generic interface I actually use something like this,
 http://blog.coldflake.com/posts/C++-delegates-on-steroids/.

If I understand that article correctly, and by pure coincidence, the
very shared libraries that are the subject of this discussion, which I
load at run time, happen to register themselves by providing function
pointers. Like in the article, those function pointers are of template
instances, each of which knows exactly what to do for its particular
type, but the registry keeps opaque functions. Pseudo code:

```
// Shared library:

struct A {
   // ...
}

struct B {
   // ...
}

shared static this() {
   register("some key",
            &serializer!A,    // <-- Takes e.g. void* but knows about A
            &deserializer!B); // ditto for B
}
```
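
A runnable reduction of the same idea, with heavy simplifications: the
registry here is a plain associative array, only a serializer is
registered, and "serializing" just copies the raw bytes of the object.
All names are hypothetical; the real libraries presumably do much more.

```d
// Opaque signature kept by the registry: no mention of any concrete type.
alias Serializer = ubyte[] function(const(void)* obj);

// Each instance knows the concrete type hiding behind the void pointer.
ubyte[] serializer(T)(const(void)* obj)
{
    return (cast(const(ubyte)*) obj)[0 .. T.sizeof].dup;
}

struct A
{
    int x;
}

// One registry for the whole process, not one per thread.
__gshared Serializer[string] registry;

void register(string key, Serializer fn)
{
    registry[key] = fn;
}

// Runs once when the module (or the shared library holding it) is loaded.
shared static this()
{
    register("A", &serializer!A);
}
```

A caller holding only a key and a void* can then write
registry["A"](&someA) without ever seeing the definition of A.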

And one of my issues has been module constructors not being called
when the library is loaded as a dependency of a C++ library, which is
loaded by a Python module, which is imported by another Python module. :)

As I said earlier, I solved that issue by parsing and persisting the 
output of 'nm mylib.so' to identify the module ctor and to call it after 
dlsym'ing. Pretty hacky but works...

Getting back to my main issue: I am about to write a mixin template 
where any library's interface .d file will do the following and get the 
create_thread and join_thread functions automatically:

```
// mylib's extern(C) functions:

// This one provides mylib_create_thread() and mylib_join_thread():
mixin LibAPI!"mylib";

// Other extern(C) functions of the library:
extern(C)
nothrow
int foo(int) {
   // ...
}
```

The .h file must still be maintained by hand.

Ali

Jan 29 2021

Imperatorn <johan_forsberg_86 hotmail.com> writes:

On Saturday, 30 January 2021 at 05:44:37 UTC, Ali Çehreli wrote:
 On 1/24/21 2:28 AM, IGotD- wrote:

 [...]

Hmm, interesting, or what you should call it 😅

With this knowledge we have now, what changes could and/or should 
be made to make this process easier? 🤔

(Btw, I just "forced" my boss to buy your and Adam's book for me.
I'm trying to sneak D into the company)

Jan 30 2021

Ali Çehreli <acehreli yahoo.com> writes:

On 1/30/21 1:34 AM, Imperatorn wrote:

 With this knowledge we have now, what changes could and/or should be
 made to make this process easier? 🤔

I wonder whether doing something in the runtime is possible. For
example, it may be more resilient and not crash when suspending a thread
fails because the thread may be dead already.

However, studying the runtime code around thread_detachThis three years
ago, I had realized that like many things in computing, the whole
stop-the-world is wishful thinking because there is no guarantee that
your "please suspend this thread" request to the OS has succeeded. You
get a success return code back but it means your request succeeded, not
that the thread was or will be suspended. (I may be misremembering this
point but I know that the runtime requests things for which the OS does
not give a full guarantee.)

(Going off-topic, even clicking on a user interface is wishful thinking
because a few times a year I attempt to click on something but another
window element pops under my mouse pointer and I unintentionally click
something else, commonly on web pages as they are being rendered by a
browser: links move around on the page. This used to bother me but not
anymore. Life is not perfect and I appreciate it. :) )

 (Btw, I just "forced" my boss to buy your and Adam's book for me

Cool! :) It makes me a little sad that my online version is ahead of the
paper version by a couple of years now. I want to update the paper as
well but I want to work on work stuff like the topic of this discussion.
:) (Related note: the ebook versions on the web page are more up-to-date
than ones that you can buy especially because the versions on my web
site include a table of contents section. Consider updating your ebook
here: http://ddili.org/ders/d.en/index.html )

 I'm trying to sneak D into the company)

I still think D is a great tool but some use cases can be tough and
sometimes embarrassing. :/

Ali

Jan 30 2021

IGotD- <nise nise.com> writes:

On Saturday, 30 January 2021 at 12:28:16 UTC, Ali Çehreli wrote:
 I wonder whether doing something in the runtime is possible. 
 For example, it may be more resilient and not crash when 
 suspending a thread fails because the thread may be dead 
 already.

 However, studying the runtime code around thread_detachThis 
 three years ago, I had realized that like many things in 
 computing, the whole stop-the-world is wishful thinking because 
 there is no guarantee that your "please suspend this thread" 
 request to the OS has succeeded. You get a success return code 
 back but it means your request succeeded not that the thread 
 was or will be suspended. (I may be misremembering this point
 but I know that the runtime requests things for which the OS does
 not give a full guarantee.)

OT. A thread suspending itself will always succeed (not taking
fall-through cases into account); if not, throw the OS away. If a
thread suspends another thread, then you don't really know when
that thread will be suspended. I would discourage having threads
suspend other threads because that opens up a new world of
race conditions. Some systems don't even allow it and its
benefits are very limited.

Back to topic. I think that the generic solution, even if it
doesn't help you with your current implementation, is to ban TLS
altogether. I think there have already been requests to remove
TLS from druntime/phobos entirely and I think this should
definitely be done sooner rather than later. Also, if you write a shared
library in D, simply don't use TLS at all. This way it will not
matter whether a thread is registered with druntime or not. TLS is in my
opinion a wart in computer science.
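
In D terms, avoiding TLS in library code mostly means never declaring
plain module-level or static variables, since those are thread-local by
default, and using shared (or __gshared) state instead. A minimal sketch
with hypothetical names:

```d
import core.atomic : atomicOp;

// 'shared' gives one instance per process instead of one per thread, so
// no per-thread initialization is needed before a thread may touch it.
shared int callCount;

// Hypothetical API function: returns how many times it has been called.
extern (C) int mylib_count_call()
{
    return atomicOp!"+="(callCount, 1);
}
```

This covers only the library's own state. Threads that allocate from
the GC still have to be known to druntime, which is a separate concern
from TLS constructors.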

Jan 30 2021

Imperatorn <johan_forsberg_86 hotmail.com> writes:

On Saturday, 30 January 2021 at 12:28:16 UTC, Ali Çehreli wrote:
 On 1/30/21 1:34 AM, Imperatorn wrote:

 [...]

 I wonder whether doing something in the runtime is possible. 
 For example, it may be more resilient and not crash when 
 suspending a thread fails because the thread may be dead 
 already.

 [...]

Will take a look at the e-book also. Didn't know there was a 
difference 👍

Jan 30 2021

tsbockman <thomas.bockman gmail.com> writes:

On Sunday, 24 January 2021 at 03:59:26 UTC, Ali Çehreli wrote:
 I am surprised how much I had learned at that time and how much 
 I've already forgotten. :/ For example, my PR involves 
 thread_setThis, which seems to be history now:


 https://docarchives.dlang.io/v2.068.0/phobos/core_thread.html#.thread_setThis

 And thread_detachThis seems to be missing now:

   https://dlang.org/phobos/core_thread.html

   https://dlang.org/phobos/core_thread_osthread.html

 Documentation issue or is it not needed anymore?

The documentation build on dlang.org is broken. Check the source 
code or Adam D. Ruppe's dpldocs.info for the complete 
documentation:
     
http://dpldocs.info/experimental-docs/core.thread.osthread.html

You'll find thread_setThis and thread_detachThis are still there.

Jan 27 2021

rikki cattermole <rikki cattermole.co.nz> writes:

On 28/01/2021 1:16 PM, tsbockman wrote:
 The documentation build on dlang.org is broken. Check the source code or 
 Adam D. Ruppe's dpldocs.info for the complete documentation:
 http://dpldocs.info/experimental-docs/core.thread.osthread.html

Fixed: https://issues.dlang.org/show_bug.cgi?id=21309

Jan 27 2021

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 28 January 2021 at 00:58:17 UTC, rikki cattermole 
wrote:
 On 28/01/2021 1:16 PM, tsbockman wrote:
 The documentation build on dlang.org is broken. Check the 
 source code or Adam D. Ruppe's dpldocs.info for the complete 
 documentation:
 http://dpldocs.info/experimental-docs/core.thread.osthread.html

 Fixed: https://issues.dlang.org/show_bug.cgi?id=21309

I still don't see thread_setThis and thread_detachThis anywhere 
on the dlang.org copy.

Jan 27 2021
