
digitalmars.D.learn - Initializing D runtime and executing module and TLS ctors for D

Ali Çehreli <acehreli yahoo.com> writes:
tl;dr I know enough to sense that there are important things I don't know.

Even though I sometimes act[1] like someone who knows stuff, there are 
many fuzzy areas for me especially in the runtime.

Things work great when D code is inside a D program. The runtime and 
module states are magically initialized and everything works. It is not 
clear when it comes to writing a D library and especially when that 
library may be used by other language runtimes, necessarily on foreign 
threads.

Here are the essential points that I do and don't understand.

- Initialize the runtime: This is automatically done for a D program as 
described on the wiki[2]. This must be done by calling rt_init[3] for a 
D shared library. I handle this by calling rt_init from a 
pragma(crt_constructor) function[4]. Luckily, this is easy and works for 
all cases that I have.
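A minimal sketch of that setup, assuming a Linux/POSIX shared library; `mylib_init`, `mylib_term` and `mylib_sum` are hypothetical names. `rt_init` and `rt_term` are the C entry points declared in `core.runtime`. As far as I can tell from druntime's source, they keep a reference count, so an extra `rt_init` inside an already running D process only bumps the count. The spec leaves the order of crt constructors unspecified, which is worth keeping in mind when several libraries do this.

```d
// Hypothetical D shared library, e.g. built with: dmd -shared -fPIC mylib.d
import core.runtime : rt_init, rt_term;

// Called by the C runtime when the library is loaded, before any API call.
pragma(crt_constructor)
extern (C) void mylib_init()
{
    // Returns non-zero on success; little can be done about a failure
    // this early, so a real library might record it for later calls.
    rt_init();
}

// Called by the C runtime when the library is unloaded or the process exits.
pragma(crt_destructor)
extern (C) void mylib_term()
{
    rt_term();
}

// Example exported function that allocates from the GC, which is only
// valid once the runtime has been initialized.
extern (C) int mylib_sum(int n)
{
    auto values = new int[n];
    foreach (i, ref v; values)
        v = cast(int) i;

    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}
```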

- Execute module constructors ("ctor" for short, i.e. 'shared static 
this' blocks). This is done automatically for a D program and when the D 
library is loaded by other language code like C++ and Python. However, 
I've encountered a case[5] where module ctors were not being called. 
This could be due to runtime bugs or something that I don't understand 
with loading shared libraries. (My workaround is very involved: I grep 
the output of 'nm' to determine the symbol for the module ctor, call it 
after dlsym'ing, and because 'nm | grep' is a slow process, I cache this 
information in a file along with the ~2K libraries that I may load 
conditionally.)

- Loading D libraries from D code: I call loadLibrary[6] to load a D 
library so that "[its] D runtime [...] will be integrated with the 
current runtime". Sounds promising; assuming that rt_init has already been 
called for the calling library, I assume loadLibrary will handle 
everything, and all code will use a single runtime and things will work 
fine. This works flawlessly for my D and C++ programs that load my D 
library that loads the other D libraries.
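A sketch of that call sequence, assuming a POSIX system where the loaded library exports a hypothetical `extern(C) int mylib_sum(int)`; the path and the symbol name are placeholders. `Runtime.loadLibrary` returns `null` on failure, and the entry point is looked up with plain `dlsym` from `core.sys.posix.dlfcn`.

```d
import core.runtime : Runtime;
import core.sys.posix.dlfcn : dlsym;

// C-linkage signature of the function the library is assumed to export.
alias SumFn = extern (C) int function(int);

/// Loads `path`, calls its mylib_sum(n) and unloads the library again.
/// Returns -1 if loading fails and -2 if the symbol is missing.
int callLibrarySum(string path, int n)
{
    // Per its documentation, a D runtime inside the loaded library is
    // integrated with the runtime that is already running.
    void* handle = Runtime.loadLibrary(path);
    if (handle is null)
        return -1;
    scope (exit) Runtime.unloadLibrary(handle);

    auto fn = cast(SumFn) dlsym(handle, "mylib_sum");
    if (fn is null)
        return -2;
    return fn(n);
}
```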

- Attaching foreign threads: D runtime needs to know about all threads 
that are running D code so that it knows which threads make up 
"the world" when it needs to "stop the world" for garbage
collection. The function to do this is thread_attachThis[7].

One question I have is, does rt_init already do thread_attachThis? I ask 
because I have a library that is loaded by Python and things work even 
*without* calling thread_attachThis.

- Execute thread local storage (TLS) ctors: Again, this happens 
automatically for most cases. However, thread_attachThis says "[if] full 
functionality as a D thread is desired, [rt_moduleTlsCtor] must be 
called after thread_attachThis". Ok. When would I not want "full 
functionality" anyway?

Another question: Are TLS ctors executed when I do loadLibrary?

And when they are executed, which modules are involved? The module that 
is calling rt_moduleTlsCtor or all modules? What are "all modules"?

- Detaching foreign threads: Probably even more important than 
thread_attachThis is thread_detachThis[8]. As its documentation says, 
one should call rt_moduleTlsDtor as well for "full functionality".

This is very important because when a GC collection kicks in, it will 
stop all threads that make up its world. If one of those threads has 
already been terminated, we will crash. (Related, I have an abandoned 
PR[9] that tried to fix issues with thread_detachThis, which stalled due 
to failing unit tests for the 32-bit Apple operating system, which D has 
since stopped supporting.) (And I stopped working on that issue 
mostly because the company I used to work for stopped using D and 
rewrote their library in C++.)

I have questions regarding thread_attachThis and thread_detachThis: When 
should they be called? Should the library expose a function that the 
users must call from *each thread* that they will be using? This may not 
be easy because a user may not know what thread they are running on. For 
example, the user of our library may be on a framework where threads may 
come and go, where the user may not have an opportunity to call 
thread_detachThis when a thread goes away. For example, the user may 
provide callback functions (which call us) to a framework that is 
running on a thread pool.

For that reason, my belief has been to call thread_attachThis upon 
entering an API function and to call thread_detachThis upon leaving it 
because I may not know whether this thread will survive or die soon. 
(thread_detachThis is so important because the next GC cycle will try to 
stop this thread and may crash.)
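A sketch of that per-call convention, assuming POSIX threads; `mylib_compute`, `foreignEntry` and `callFromForeignThread` are hypothetical names. It relies on `Thread.getThis()` returning `null` on a thread that druntime does not know about, so calls arriving on D threads (including the main thread) are left alone. `rt_moduleTlsCtor` and `rt_moduleTlsDtor` are declared by hand, and the order (attach, TLS ctors, work, TLS dtors, detach) follows the `thread_attachThis`/`thread_detachThis` documentation.

```d
import core.sys.posix.pthread : pthread_create, pthread_join, pthread_t;
import core.thread : Thread, thread_attachThis, thread_detachThis;

// Declared by user code, as the thread_attachThis documentation shows.
extern (C) void rt_moduleTlsCtor();
extern (C) void rt_moduleTlsDtor();

int tlsCounter;                      // module-level, therefore thread-local
static this() { tlsCounter = 100; }  // TLS module constructor

/// Hypothetical API function: registers a foreign calling thread for the
/// duration of the call and unregisters it before returning.
extern (C) int mylib_compute(int x)
{
    const foreign = Thread.getThis() is null;  // null: unknown to druntime
    if (foreign)
    {
        thread_attachThis();
        rt_moduleTlsCtor();
    }
    scope (exit)
    {
        if (foreign)
        {
            rt_moduleTlsDtor();
            thread_detachThis();
        }
    }

    auto buffer = new int[x];  // GC use is fine: this thread is registered
    return tlsCounter + cast(int) buffer.length;
}

// Entry point of a thread created with plain pthreads, i.e. a thread
// that druntime has never seen.
extern (C) void* foreignEntry(void* result) nothrow
{
    try
    {
        *cast(int*) result = mylib_compute(5);
    }
    catch (Exception)
    {
        *cast(int*) result = -1;
    }
    return null;
}

/// Runs mylib_compute(5) on a new foreign thread and returns its result.
int callFromForeignThread()
{
    int result;
    pthread_t tid;
    pthread_create(&tid, null, &foreignEntry, &result);
    pthread_join(tid, null);
    return result;
}
```

If the TLS constructors did not run on the foreign thread, `tlsCounter` would still be 0 there and the call would return 5 instead of 105. The price of this design is an attach/detach pair on every call.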

More questions: Can I thread_detachThis the thread that called rt_init? 
Can I call rt_moduleTlsCtor more than once? I guess it depends on each 
module. It will be troubling if a TLS ctor reinitializes a module's state. :/

While trying to sort all of these out, I am facing a bug[10], which will 
force me to move away from std.parallelism and perhaps use 
std.concurrency. Even though that bug is reported for OS X, I think both 
that case and my "called from Python" case are related to undefined 
behavior in the runtime's thread management, which is exposed by 
std.parallelism. (?)

As you can see, even though I can list many references to act like I 
know stuff, I really don't and have many questions. :) The trouble is, 
when there are so many dimensions to test to be sure, it is extremely 
difficult to learn when a segfault bug that hits sporadically is 
intermixed with all this. :(

I want to learn.

Thank you,
Ali

[1] https://www.youtube.com/watch?v=FNL-CPX4EuM

[2] https://wiki.dlang.org/Runtime_internals

[3] https://dlang.org/library/core/runtime/rt_init.html

[4] https://dlang.org/spec/pragma.html#crtctor

[5] https://forum.dlang.org/thread/rucm30$1lgk$1@digitalmars.com

[6] https://dlang.org/library/core/runtime/runtime.load_library.html

[7] https://dlang.org/library/core/thread/osthread/thread_attach_this.html

[8] https://dlang.org/library/core/thread/threadbase/thread_detach_this.html

[9] https://github.com/dlang/druntime/pull/1989

[10] https://issues.dlang.org/show_bug.cgi?id=11736
Jan 23
IGotD- <nise nise.com> writes:
On Sunday, 24 January 2021 at 00:24:55 UTC, Ali Çehreli wrote:
 One question I have is, does rt_init already do 
 thread_attachThis? I ask because I have a library that is 
 loaded by Python and things work even *without* calling 
 thread_attachThis.
From what I have seen, thread_attachThis is performed during rt_init in the main thread.
 Another question: Are TLS ctors executed when I do loadLibrary?

 And when they are executed, which modules are involved? The 
 module that is calling rt_moduleTlsCtor or all modules? What 
 are "all modules"?
The TLS standard (at least the ELF one) does not have ctors. Only simple initialization is allowed, meaning that the initial data is stored in .tdata, which is copied to the specific memory area of each thread. There is also .tbss, which is zero-initialized memory just like the .bss section. Actual ctor code that runs for each thread is language specific and not part of the ELF standard; therefore no such TLS ctor code is run by the lower-level API. The initialization of TLS data (only copying and zeroing) is done when each thread starts. This can even be done lazily, when a TLS variable is first accessed.
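In D terms (a sketch with made-up names): a module-level variable is thread-local unless marked otherwise, so every thread starts with its own copy taken from that initial image, while `static this()` is the language-level TLS constructor that druntime runs for each thread it initializes, and `shared static this()` runs once per process.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

int perThread = 7;        // thread-local: each thread starts from this image
shared int tlsCtorRuns;   // one process-wide copy
shared int sharedCtorRuns;

// Runs once for every thread druntime initializes (main thread included).
static this() { atomicOp!"+="(tlsCtorRuns, 1); }

// Runs once per process, before main.
shared static this() { atomicOp!"+="(sharedCtorRuns, 1); }

/// Changes perThread on a new D thread and returns what that thread saw.
int valueSeenInNewThread()
{
    int seen;
    auto t = new Thread({
        perThread += 1;   // modifies only the new thread's copy
        seen = perThread;
    });
    t.start();
    t.join();
    return seen;
}
```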
 I have questions regarding thread_attachThis and 
 thread_detachThis: When should they be called? Should the 
 library expose a function that the users must call from *each 
 thread* that they will be using? This may not be easy because a 
 user may not know what thread they are running on. For example, 
 the user of our library may be on a framework where threads may 
 come and go, where the user may not have an opportunity to call 
 thread_detachThis when a thread goes away. For example, the 
 user may provide callback functions (which call us) to a 
 framework that is running on a thread pool.
I call thread_attachThis as soon as the thread is about to call a D function, for example a callback from a thread in a thread pool. This usually happens when a function or delegate is involved, as any jump into D code would go through one. I have to make a generic API and then a D API on top of that. In practice this means there is a trampoline function involved, where thread_attachThis and thread_detachThis are called. This is also where I call the TLS ctors/dtors. It is a consequence of delegates being language specific, and it falls out naturally that way. Avoid extern(C) calls directly into D code. In practice you can do this for any thread, even if there are several delegate calls during the thread's lifetime. You can simply have a TLS bool variable telling whether thread_attachThis and rt_moduleTlsCtor have already been run.
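A sketch of the attach-once variant, assuming POSIX threads; `ensureAttached`, `invokeCallback`, `detachIfAttachedByUs` and `poolWorker` are hypothetical names. A thread-local `bool` starts out `false` on every new thread (it lives in `.tbss`), and a `static this()` sets it to `true`, so D threads, whose TLS constructors have already run, are never re-attached. The detach call still has to happen before the foreign thread exits, which is the hard part discussed in this thread.

```d
import core.sys.posix.pthread : pthread_create, pthread_join, pthread_t;
import core.thread : thread_attachThis, thread_detachThis;

extern (C) void rt_moduleTlsCtor();
extern (C) void rt_moduleTlsDtor();

alias Callback = int delegate(int);

bool attached;      // thread-local: false on any new thread (.tbss)
bool attachedByUs;  // true only on foreign threads that we registered
int attachCount;    // thread-local: how often this thread was attached

// TLS constructors have already run on every D thread, so mark those
// threads as attached; foreign threads keep the .tbss value false.
static this() { attached = true; }

/// Registers the calling thread with druntime the first time it is seen.
void ensureAttached()
{
    if (attached)
        return;
    thread_attachThis();
    rt_moduleTlsCtor();   // also runs the static this() above
    attached = attachedByUs = true;
    ++attachCount;
}

/// Undoes ensureAttached; must run before a foreign thread exits.
void detachIfAttachedByUs()
{
    if (!attachedByUs)
        return;
    rt_moduleTlsDtor();
    thread_detachThis();
    attached = attachedByUs = false;
}

/// Hypothetical trampoline: the single point where foreign code calls D.
int invokeCallback(Callback cb, int arg)
{
    ensureAttached();
    return cb(arg);
}

// A "thread pool" thread created with plain pthreads that runs
// several callbacks and detaches once at the end.
extern (C) void* poolWorker(void* result) nothrow
{
    auto outValues = cast(int*) result;
    try
    {
        int total = 0;
        foreach (i; 0 .. 3)
            total += invokeCallback((int x) => x * 2, i);
        outValues[0] = total;
        outValues[1] = attachCount;
        detachIfAttachedByUs();
    }
    catch (Exception)
    {
        outValues[0] = outValues[1] = -1;
    }
    return null;
}

int[2] runPoolWorker()
{
    int[2] result;
    pthread_t tid;
    pthread_create(&tid, null, &poolWorker, result.ptr);
    pthread_join(tid, null);
    return result;
}
```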
 More questions: Can I thread_detachThis the thread that called 
 rt_init? Can I call rt_moduleTlsCtor more than once? I guess it 
 depends on each module. It will be troubling if a TLS ctor 
 reinitializes an module state. :/
I have brought up this question before because, as things stand, I haven't seen any "rt_uninit" or "rt_close" function. This is a bit limiting for me, as the main thread can exit while the process lives on. In general, the main thread that goes into main must also be the last one to return through the entire chain of functions that were called at process entry. Otherwise, what can happen is that you do a thread_detachThis twice. The short answer is: just park the main thread while the bulk of the work is done by other threads. Unfortunately, that's how many libraries work today.
Jan 23
Ali Çehreli <acehreli yahoo.com> writes:
Thank you very much for your answers. I think I've been on the right 
track and that the following bug I've mentioned has been messing things 
up by hitting me randomly:

   https://issues.dlang.org/show_bug.cgi?id=11736

On 1/23/21 5:18 PM, IGotD- wrote:

 During rt_init in the main thread, thread_attachThis is performed what I
 have seen.
That explains why everything just works in most cases.
 Actual ctor code that runs for each TLS thread is language specific and
 not part of the ELF standard therefore no such TLS ctor code are being
 run in the lower level API. The initialization (only copy and zeroing)
 of TLS data is being done when each thread starts.
That must be the case for threads started by the D runtime, right? It sounds like I must call rt_moduleTlsCtor explicitly for foreign threads. It's still not clear to me which modules' TLS variables are initialized (copied over): only this module's, or those of all modules in the program? I don't know whether it's possible to initialize a single module; rt_moduleTlsCtor does not take any parameters.
 This can even be done
 in a lazy manner when the first TLS variable is being accessed.
I hope that's the case.
 I have to make a generic API and then a D
 API on top of that.
Did you mean a generic API, which makes calls to D? That's how I have it: an extern(C) API function calling proper D code.
 In practice this means there is a trampoline
 function involved where and thread_attachThis and thread_detachThis is
 being called. Also this is where I call TLS ctors/dtors.
That's what I will be doing.
 It is an effect
 that delegates is language specific and it falls natural that way. Avoid
 extern(C) calls directly into D code.
I hope I am misunderstanding you there. All I have are extern(C) functions in the library API.
 In practice you can do this for any thread even if there are several
 delegates during the thread lifetime. You can simply have a TLS bool
 variable telling if the thread_attachThis and rt_moduleTlsCtor have
 already been run.
I've already experimented with it, but it didn't work, likely because of the bug mentioned above.
 In general the main thread that goes into main must also be the last one
 returning the entire line of functions that was called during entry of
 the process.
The main entry point belongs to another language, so I have to document that this library can only work in such "well behaved" cases.
 What will happen is that you possibly do a
 thread_detachThis twice.
Sounds like I can track that with a bool variable as well, no?
 Short answer is just park the main thread while the bulk is being done
 by other threads. Unfortunately that's how many libraries work today.
Agreed. That's for me to specify in the library documentation.

I should revive my old PR and see whether it is needed at all:

   https://github.com/dlang/druntime/pull/1989

I am surprised how much I had learned at that time and how much I've already forgotten. :/ For example, my PR involves thread_setThis, which seems to be history now:

   https://docarchives.dlang.io/v2.068.0/phobos/core_thread.html#.thread_setThis

And thread_detachThis seems to be missing now:

   https://dlang.org/phobos/core_thread.html
   https://dlang.org/phobos/core_thread_osthread.html

Documentation issue or is it not needed anymore?

Ali
Jan 23
IGotD- <nise nise.com> writes:
On Sunday, 24 January 2021 at 03:59:26 UTC, Ali Çehreli wrote:
 That must be the case for threads started by D runtime, right? 
 It sounds like I must call rt_moduleTlsCtor explicitly for 
 foreign threads. It's still not clear to me which modules' TLS 
 variables are initialized (copied over). Only this module's or 
 all modules that are in the program? I don't know whether it's 
 possible to initialize one module; rt_moduleTlsCtor does not 
 take any parameter.
Any thread started by druntime gets proper initialization, of course. A thread started by a module written in another language will not get the D thread initialization. All TLS variables in all loaded modules are initialized (only copying and zeroing) by the OS for each thread that the OS knows about. After that, it is up to each language's library to do any further initialization. The next time __tls_get_addr is called after a library has been loaded, the TLS variables of any new module will be found and initialized. It is a mystery to me why the TLS standard never included a ctor/dtor vector for TLS variables. It is possible in practice, but they didn't do it. The whole TLS design is like Swiss cheese.
 Did you mean a generic API, which makes calls to D? That's how 
 I have it: an extern(C) API function calling proper D code.
I have a lot of system code written in C++, which also includes callbacks from that code. In order to support D, a layer is necessary to catch all callbacks in a trampoline and invoke D delegates. Calling D code directly with extern(C) should be avoided because:

1. D delegates are so much more versatile.
2. You must use a trampoline in order to do D-specific thread initialization anyway.

Since std::function cannot be used in a generic interface, I actually use something like this: http://blog.coldflake.com/posts/C++-delegates-on-steroids/. It is more versatile than plain extern(C) but simple enough that it can be used by any language. In the case of D, the "this pointer" can be used as a pointer to a D delegate. Creating language-agnostic interfaces requires more attention than usual, as I have experienced. Strings, for example, complicate things further as they are different in every language.
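A sketch of the trampoline idea, assuming the foreign API accepts only a function pointer plus an opaque `void*`; `CCallback`, `c_library_run`, `trampoline` and `runWithDelegate` are hypothetical names, and the "C" side is simulated in D so the example is self-contained. The D delegate travels through the context pointer, and the single `extern(C)` trampoline is where thread attachment would go in a real library.

```d
// Signature a C library might accept: a function pointer plus user data.
alias CCallback = extern (C) int function(void* context, int value);

// Stand-in for the foreign library: it calls back once per value.
extern (C) int c_library_run(CCallback cb, void* context, int n)
{
    int sum = 0;
    foreach (i; 0 .. n)
        sum += cb(context, i);
    return sum;
}

// The only extern(C) entry point into D: it recovers the delegate
// from the opaque context pointer and forwards the call.
extern (C) int trampoline(void* context, int value)
{
    auto dg = cast(int delegate(int)*) context;
    return (*dg)(value);
}

/// D-side API: accepts any delegate, including closures over local state.
int runWithDelegate(int delegate(int) dg, int n)
{
    // dg lives in this stack frame for the whole synchronous call, so
    // passing its address is safe here. An asynchronous API would have
    // to keep the delegate alive elsewhere (e.g. on the GC heap).
    return c_library_run(&trampoline, &dg, n);
}
```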
Jan 24
Ali Çehreli <acehreli yahoo.com> writes:
On 1/24/21 2:28 AM, IGotD- wrote:

 Any threads started by druntime has proper initialization of course. Any
 thread started by any module written in another language will not do D
 the thread initialization.
And that of course has been what I've been trying to deal with. Bugs in the uses of thread_attachThis and thread_detachThis, and most importantly, not having a guaranteed opportunity to call thread_detachThis (think of a foreign thread that dies on its own without calling us, and the runtime crashing when it attempts to stop a nonexistent thread during a GC cycle) finally made me realize that D shared library functions cannot be called on foreign threads. At least not today... Or, they can only be used under unusual conventions like the rule below.

So, that's the golden rule: If you want to call functions of a D shared library (I guess static library as well), you must create your thread with our library's create_thread() function and join that thread with our library's join_thread() function. Works like a charm!

Luckily, it is trivial to determine whether we are being called on a foreign thread or a D thread through a module scoped bool variable...
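A sketch of that rule as a C-callable API, assuming `core.thread.Thread` does the actual work; `mylib_create_thread`, `mylib_join_thread`, `ThreadFn` and `addTen` are hypothetical names. Because druntime creates the thread, it is registered, runs the TLS constructors and is cleaned up by druntime itself. The `Thread` object is pinned with `GC.addRoot` while the only reference to it is the handle held by foreign code.

```d
import core.memory : GC;
import core.thread : Thread;

// Thread entry type that C callers pass in.
alias ThreadFn = extern (C) void function(void* arg);

/// Starts fn(arg) on a thread created and registered by druntime and
/// returns an opaque handle for mylib_join_thread.
extern (C) void* mylib_create_thread(ThreadFn fn, void* arg)
{
    auto t = new Thread({ fn(arg); });
    void* handle = cast(void*) t;
    GC.addRoot(handle);  // only foreign code will hold a reference now
    t.start();
    return handle;
}

/// Waits for the thread to finish and releases the handle.
extern (C) void mylib_join_thread(void* handle)
{
    auto t = cast(Thread) handle;
    t.join(false);       // do not rethrow D exceptions into C callers
    GC.removeRoot(handle);
}

// Example entry function, as a C caller might supply it.
extern (C) void addTen(void* arg)
{
    *cast(int*) arg += 10;
}
```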
 Since std::function cannot be
 used in a generic interface I actually use something like this,
 http://blog.coldflake.com/posts/C++-delegates-on-steroids/.
If I understand that article correctly, then by pure coincidence the very shared libraries that are the subject of this discussion, which I load at run time, happen to register themselves by providing function pointers. Like in the article, those function pointers are of template instances, each of which knows exactly what to do for its particular type, but the registry keeps opaque functions. Pseudo code:

```
// Shared library:

struct A {
  // ...
}

struct B {
  // ...
}

shared static this() {
  register("some key",
           &serializer!A,    // <-- Takes e.g. void* but knows about A
           &deserializer!B); // ditto for B
}
```

And one of my issues has been that module constructors were not being called when the library is loaded as a dependency of a C++ library, which is loaded by a Python module, which is imported by another Python module. :) As I said earlier, I solved that issue by parsing and persisting the output of 'nm mylib.so' to identify the module ctor and to call it after dlsym'ing. Pretty hacky but works...

Getting back to my main issue: I am about to write a mixin template where any library's interface .d file will do the following and get the create_thread and join_thread functions automatically:

```
// mylib's extern(C) functions:

// This one provides mylib_create_thread() and mylib_join_thread():
mixin LibAPI!"mylib";

// Other extern(C) functions of the library:
extern(C) nothrow int foo(int) {
  // ...
}
```

The .h file must still be maintained by hand.

Ali
Jan 29
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Saturday, 30 January 2021 at 05:44:37 UTC, Ali Çehreli wrote:
 On 1/24/21 2:28 AM, IGotD- wrote:

 [...]
Hmm, interesting, or whatever you should call it 😅

With the knowledge we have now, what changes could and/or should be made to make this process easier? 🤔

(Btw, I just "forced" my boss to buy your and Adam's book for me. I'm trying to sneak D into the company)
Jan 30
Ali Çehreli <acehreli yahoo.com> writes:
On 1/30/21 1:34 AM, Imperatorn wrote:

 With this knowledge we have now, what changes could and/or should be
 made to make this process easier? 🤔
I wonder whether doing something in the runtime is possible. For example, it may be more resilient and not crash when suspending a thread fails because the thread may be dead already.

However, studying the runtime code around thread_detachThis three years ago, I had realized that like many things in computing, the whole stop-the-world is wishful thinking because there is no guarantee that your "please suspend this thread" request to the OS has succeeded. You get a success return code back, but it means your request succeeded, not that the thread was or will be suspended. (I may be misremembering this point, but I know that the runtime requests things for which the OS does not give a full guarantee.)

(Going off-topic, even clicking on a user interface is wishful thinking because a few times a year I attempt to click on something but another window element pops under my mouse pointer and I unintentionally click something else, commonly on web pages as they are being rendered by a browser: links move around on the page. This used to bother me but not anymore. Life is not perfect and I appreciate it. :) )
 (Btw, I just "forced" my boss to buy your and Adam's book for me
Cool! :) It makes me a little sad that my online version is ahead of the paper version by a couple of years now. I want to update the paper version as well, but I also want to work on work stuff like the topic of this discussion. :)

(Related note: the ebook versions on the web page are more up-to-date than the ones that you can buy, especially because the versions on my web site include a table of contents section. Consider updating your ebook here:

   http://ddili.org/ders/d.en/index.html )
 I'm trying to sneak in D  thecompany)
I still think D is a great tool, but some use cases can be tough and sometimes embarrassing. :/

Ali
Jan 30
IGotD- <nise nise.com> writes:
On Saturday, 30 January 2021 at 12:28:16 UTC, Ali Çehreli wrote:
 I wonder whether doing something in the runtime is possible. 
 For example, it may be more resilient and not crash when 
 suspending a thread fails because the thread may be dead 
 already.

 However, studying the runtime code around thread_detachThis 
 three years ago, I had realized that like many things in 
 computing, the whole stop-the-world is wishful thinking because 
 there is no guarantee that your "please suspend this thread" 
 request to the OS has succeeded. You get a success return code 
 back but it means your request succeeded not that the thread 
 was or will be suspended. (I may be misremembering this point 
 but I know that the runtime requests things where OS does not 
 give full guarantee for.)
OT. A thread suspending itself will always succeed (not taking fall-through cases into account); if not, throw the OS away. If a thread suspends another thread, then you don't really know when that thread will be suspended. I would discourage threads from suspending other threads because that opens up a new world of race conditions. Some systems don't even allow it, and its benefits are very limited.

Back to topic. I think that the generic solution, even if it doesn't help you with your current implementation, is to ban TLS altogether. I think there have already been requests to remove TLS from druntime/Phobos entirely, and I think this should definitely be done sooner rather than later. Also, if you write a shared library in D, simply don't use TLS at all. This way it will not matter whether a thread is registered by druntime or not. TLS is, in my opinion, a wart in computer science.
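In D, avoiding TLS means opting out explicitly, because module-level and `static` variables are thread-local by default. A minimal sketch with made-up names contrasting the default with `shared` storage, which all threads see and which is accessed atomically here; `__gshared` would also give a single copy, but without the type-system checks.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

int perThreadHits;      // default: one copy per thread (TLS)
shared int globalHits;  // one copy for the whole process

/// Increments both counters from `threads` new D threads.
void hitFromThreads(int threads)
{
    Thread[] started;
    foreach (i; 0 .. threads)
    {
        auto t = new Thread({
            ++perThreadHits;               // touches only that thread's copy
            atomicOp!"+="(globalHits, 1);  // visible to every thread
        });
        t.start();
        started ~= t;
    }
    foreach (t; started)
        t.join();
}
```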
Jan 30
Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Saturday, 30 January 2021 at 12:28:16 UTC, Ali Çehreli wrote:
 On 1/30/21 1:34 AM, Imperatorn wrote:

 [...]
 I wonder whether doing something in the runtime is possible. For example, it may be more resilient and not crash when suspending a thread fails because the thread may be dead already. [...]
Will take a look at the e-book also. Didn't know there was a difference 👍
Jan 30
tsbockman <thomas.bockman gmail.com> writes:
On Sunday, 24 January 2021 at 03:59:26 UTC, Ali Çehreli wrote:
 I am surprised how much I had learned at that time and how much 
 I've already forgotten. :/ For example, my PR involves 
 thread_setThis, which seems to be history now:


 https://docarchives.dlang.io/v2.068.0/phobos/core_thread.html#.thread_setThis

 And thread_detachThis seems to be missing now:

   https://dlang.org/phobos/core_thread.html

   https://dlang.org/phobos/core_thread_osthread.html

 Documentation issue or is it not needed anymore?
The documentation build on dlang.org is broken. Check the source code or Adam D. Ruppe's dpldocs.info for the complete documentation: http://dpldocs.info/experimental-docs/core.thread.osthread.html You'll find thread_setThis and thread_detachThis are still there.
Jan 27
rikki cattermole <rikki cattermole.co.nz> writes:
On 28/01/2021 1:16 PM, tsbockman wrote:
 The documentation build on dlang.org is broken. Check the source code or 
 Adam D. Ruppe's dpldocs.info for the complete documentation:
 http://dpldocs.info/experimental-docs/core.thread.osthread.html
Fixed: https://issues.dlang.org/show_bug.cgi?id=21309
Jan 27
tsbockman <thomas.bockman gmail.com> writes:
On Thursday, 28 January 2021 at 00:58:17 UTC, rikki cattermole 
wrote:
 On 28/01/2021 1:16 PM, tsbockman wrote:
 The documentation build on dlang.org is broken. Check the 
 source code or Adam D. Ruppe's dpldocs.info for the complete 
 documentation:
 http://dpldocs.info/experimental-docs/core.thread.osthread.html
Fixed: https://issues.dlang.org/show_bug.cgi?id=21309
I still don't see thread_setThis and thread_detachThis anywhere on the dlang.org copy.
Jan 27