
digitalmars.D.ldc - --emulated-tls explanation?

reply Denis Feklushkin <feklushkin.denis gmail.com> writes:
Hi!

Can anyone explain what "--emulated-tls" actually does?

It solves my problem with static variables being placed incorrectly on 
ARM Cortex M3, but I don't know why.
Oct 13 2020
next sibling parent reply Denis Feklushkin <feklushkin.denis gmail.com> writes:
On Tuesday, 13 October 2020 at 09:32:07 UTC, Denis Feklushkin 
wrote:
 Hi!

 Can anyone explain what "--emulated-tls" actually do?

 It solves my problem with correct static variables placement on 
 ARM Cortex M3, but I don't know why.
The problem was: without --emulated-tls, static member variables were sometimes(?) placed at the same address. This does not affect ordinary (TLS) variables or shared/__gshared ones.
Oct 13 2020
parent Denis Feklushkin <feklushkin.denis gmail.com> writes:
On Tuesday, 13 October 2020 at 09:59:40 UTC, Denis Feklushkin 
wrote:
 On Tuesday, 13 October 2020 at 09:32:07 UTC, Denis Feklushkin 
 wrote:
 Hi!

 Can anyone explain what "--emulated-tls" actually do?

 It solves my problem with correct static variables placement 
 on ARM Cortex M3, but I don't know why.
Problem was: Without --emulated-tls static member variables sometimes(?) was placed on same place.
I.e., compiler presumes that.
Oct 13 2020
prev sibling parent reply IGotD- <nise nise.com> writes:
On Tuesday, 13 October 2020 at 09:32:07 UTC, Denis Feklushkin 
wrote:
 Hi!

 Can anyone explain what "--emulated-tls" actually do?

 It solves my problem with correct static variables placement on 
 ARM Cortex M3, but I don't know why.
I think it is a compatibility layer that GCC provides. Instead of implementing everything yourself from scratch, GCC provides a framework and a set of hooks that you should implement.

http://www.chiark.greenend.org.uk/doc/gcc-4.9-doc/gccint.html#Emulated-TLS

It seems that GCC provides default hooks, so if threading is not enabled, for example, this TLS emulation layer is probably quite simple and does not know what a thread is. The variables are dynamically allocated using the C library memory allocation functions.

In practice you should read the Runtime ABI for the ARM architecture to see how TLS is implemented for ARM. For a custom system you have every degree of freedom and can do what you want, which is usually better. ARM and GCC also offer several options for implementing it: a static version, a dynamic version, a mix, using the thread pointer register, using a function for retrieval, etc.
Oct 13 2020
parent reply Denis Feklushkin <feklushkin.denis gmail.com> writes:
On Tuesday, 13 October 2020 at 10:25:35 UTC, IGotD- wrote:
 On Tuesday, 13 October 2020 at 09:32:07 UTC, Denis Feklushkin 
 wrote:
 Hi!

 Can anyone explain what "--emulated-tls" actually do?

 It solves my problem with correct static variables placement 
 on ARM Cortex M3, but I don't know why.
I think it is a compatibility layer that GCC provides. Instead of implementing everything yourself from scratch, GCC provide a framework and a set of hooks you should implement. http://www.chiark.greenend.org.uk/doc/gcc-4.9-doc/gccint.html#Emulated-TLS It seems like GCC provides default hooks, so for example if threading is not enabled this TLS emulation layer is probably pretty stupid and do not know what a thread is.
So the compiler knows that this platform does not support multithreading, and does something wrong with thread-local static variables if "--emulated-tls" is omitted?
 The variables are dynamically allocated using the C library 
 memory allocation functions.
As I understand it, the variables are allocated by the compiler, but it uses an internal implicit call to __tls_get_addr to provide access to them.
Oct 13 2020
parent reply IGotD- <nise nise.com> writes:
On Tuesday, 13 October 2020 at 10:35:57 UTC, Denis Feklushkin 
wrote:
 So, compiler knows what this platform is not supports 
 multithreading and does some things wrong with thread static 
 variables if "--emulated-tls" is ommited?
You can see the implementation yourself. https://github.com/gcc-mirror/gcc/blob/master/libgcc/emutls.c I have used TLS emulation myself and it just works even though the library has no definition of threads or mutexes, so I guess these are just stubs, or it uses stubs in the C library in that case. Basically single-threaded TLS.
 As I understand, variables allocated by compiler, but it uses 
 internal implict call to __tls_get_addr to provide access to 
 them.
If the software call is chosen, then __tls_get_addr is the function used to obtain the address of a TLS variable. If emulation is used, this function is just forwarded to the emulation function. It is almost easier to implement __tls_get_addr yourself and skip the emulation. If you look at the emulation layer, it is filled with mutexes and mallocs, and on simple systems this can be avoided entirely if you use your own solution. In real-time systems especially, the emulation layer should not be used, for obvious reasons.
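
To illustrate the "mallocs" part, here is a minimal single-threaded sketch of the idea behind an emulated-TLS lookup. It is not the code from emutls.c: the names emu_object and sketch_emutls_get_address are made up for this example, and the real layer also handles alignment, locking and per-thread tables. The point it shows is that each variable has its own control object and gets its own lazily allocated storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-variable control object (a simplification, not GCC's layout). */
typedef struct emu_object {
    size_t size;        /* size of the variable in bytes */
    const void *templ;  /* initial value, or NULL for zero-initialized */
    void *ptr;          /* storage for the only thread, allocated on first use */
} emu_object;

/* Hypothetical lookup: returns the address of the variable described by obj. */
void *sketch_emutls_get_address(emu_object *obj)
{
    assert(obj != NULL && obj->size > 0);
    if (obj->ptr == NULL) {
        void *p = malloc(obj->size);
        if (p == NULL)
            abort();
        if (obj->templ != NULL)
            memcpy(p, obj->templ, obj->size);   /* copy the initializer */
        else
            memset(p, 0, obj->size);            /* zero-initialize */
        obj->ptr = p;
    }
    return obj->ptr;
}
```

Because the lookup is keyed by the control object rather than by an offset into a shared block, two variables cannot end up at the same address in this scheme, at the cost of a heap allocation on first access.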
Oct 13 2020
parent reply Denis Feklushkin <feklushkin.denis gmail.com> writes:
On Tuesday, 13 October 2020 at 11:02:56 UTC, IGotD- wrote:
 On Tuesday, 13 October 2020 at 10:35:57 UTC, Denis Feklushkin 
 wrote:
 So, compiler knows what this platform is not supports 
 multithreading and does some things wrong with thread static 
 variables if "--emulated-tls" is ommited?
You can see the implementation yourself. https://github.com/gcc-mirror/gcc/blob/master/libgcc/emutls.c I have used TLS emulation myself and it just works despite the library has no definition of threads or mutexes so I guess these are just stubs or use stubs in the C library in that case. Basically single threaded TLS.
 As I understand, variables allocated by compiler, but it uses 
 internal implict call to __tls_get_addr to provide access to 
 them.
If SW call is chosen then __tls_get_addr is the function that is used in order to obtain the address of a TLS variable. If emulation is used this function is just forwarded to the emulation function. It is almost easier to implement __tls_get_addr yourself and skip the emulation.
OK, I see in my binary that if I use "--emulated-tls", the third-party function __tls_get_address (provided by picolibc) is replaced by __emutls_get_address. But it is still not clear why static variables are no longer "superimposed" on one another at the same addresses.
 If you look at the emulation layer it is filled with mutexes 
 and mallocs and on simple systems this can be totally avoided 
 if you use your own solution. Especially in real-time systems, 
 the emulation layer should not be used for obvious reasons.
Yes, we already have fibers for this. However, at least one TLS must be created that belongs to the main thread.
Oct 13 2020
next sibling parent reply IGotD- <nise nise.com> writes:
On Tuesday, 13 October 2020 at 11:13:16 UTC, Denis Feklushkin 
wrote:
 But it is still not clear why static variables are now not 
 "superimposed" on one another at the same addresses.
If you don't have emulated TLS, your build shouldn't even succeed unless you have implemented __tls_get_address. So if you have a build that you can test, where does this __tls_get_address come from? Another possibility is that you use the thread pointer register directly and do not use __aeabi_read_tp. __aeabi_read_tp is basically the function used to obtain the thread pointer in software for the initial exec model.
Oct 13 2020
parent reply Denis Feklushkin <feklushkin.denis gmail.com> writes:
On Tuesday, 13 October 2020 at 11:23:23 UTC, IGotD- wrote:
 On Tuesday, 13 October 2020 at 11:13:16 UTC, Denis Feklushkin 
 wrote:
 But it is still not clear why static variables are now not 
 "superimposed" on one another at the same addresses.
If you don't have emulated TLS, your build shouldn't even succeed if you haven't implemented __tls_get_address. So if you have a build that you can test, where does this __tls_get_address come from?
It is provided by the "picolibc" library. Actually, it provides __aeabi_read_tp, but I wrap it: https://github.com/denizzzka/d_c_arm_test/blob/master/d/freertos_druntime_backend/external/rt/sections.d#L45
Oct 13 2020
parent reply IGotD- <nise nise.com> writes:
On Tuesday, 13 October 2020 at 11:27:45 UTC, Denis Feklushkin 
wrote:
 On Tuesday, 13 October 2020 at 11:23:23 UTC, IGotD- wrote:
 On Tuesday, 13 October 2020 at 11:13:16 UTC, Denis Feklushkin 
 wrote:
 But it is still not clear why static variables are now not 
 "superimposed" on one another at the same addresses.
If you don't have emulated TLS, your build shouldn't even succeed if you haven't implemented __tls_get_address. So if you have a build that you can test, where does this __tls_get_address come from?
It is provided by "picolibc" library. Actually it provides __aeabi_read_tp but I wrap it: https://github.com/denizzzka/d_c_arm_test/blob/master/d/freertos_druntime_backend/external/rt/sections.d#L45
The function prototype of __tls_get_address is wrong. It should be

struct tls_index
{
	size_t ti_module;
	size_t ti_offset;
};

void* __tls_get_addr(tls_index* ti)

You perhaps don't use modules but you certainly need an offset.

It should rather be something like

void* __tls_get_addr(tls_index* ti)
{
    return getThreadTlsArea(ti->ti_module) + ti->ti_offset;
}
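
As a rough, self-contained sketch of why the offset matters: the version below uses one statically reserved block for the single thread. The names sketch_tls_get_addr, broken_tls_get_addr, get_thread_tls_area and TLS_AREA_SIZE are invented for the example, and the block size is arbitrary. The "broken" variant assumes a wrapper that returns only the base of the TLS area, in which case every variable would resolve to the same address.

```c
#include <assert.h>
#include <stddef.h>

/* Argument layout as quoted above. */
typedef struct tls_index {
    size_t ti_module;
    size_t ti_offset;
} tls_index;

/* Hypothetical: one statically reserved TLS block for the only thread. */
#define TLS_AREA_SIZE 64
static unsigned char main_thread_tls[TLS_AREA_SIZE];

/* Hypothetical stand-in for getThreadTlsArea(); the module is ignored. */
static unsigned char *get_thread_tls_area(size_t module)
{
    (void)module;
    return main_thread_tls;
}

/* Named sketch_* so it cannot clash with a real __tls_get_addr. */
void *sketch_tls_get_addr(const tls_index *ti)
{
    assert(ti->ti_offset < TLS_AREA_SIZE);
    return get_thread_tls_area(ti->ti_module) + ti->ti_offset;
}

/* Assumed faulty variant: drops the offset, so all variables alias. */
void *broken_tls_get_addr(const tls_index *ti)
{
    (void)ti;
    return main_thread_tls;
}
```

With two descriptors at offsets 0 and 8, the first function returns addresses 8 bytes apart, while the second returns the same address for both.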
Oct 13 2020
next sibling parent IGotD- <nise nise.com> writes:
On Tuesday, 13 October 2020 at 11:38:33 UTC, IGotD- wrote:
 The function prototype of __tls_get_address is wrong. It should 
 be

 struct tls_index
 {
 	size_t ti_module;
 	size_t ti_offset;
 };

 void* __tls_get_addr(tls_index* ti)


 You perhaps don't use modules but you certainly need an offset.

 It should rather be something like

 void* __tls_get_addr(tls_index* ti)
 {
     return getThreadTlsArea(ti->ti_module) + ti->ti_offset;
 }
Just to add to the confusion: if you compile everything statically into one binary, __tls_get_addr should never really be called, at least with C/C++. The compiler should then optimize and call __aeabi_read_tp directly. The compiler inserts TP + offset itself instead, as it assumes that all statically and dynamically linked modules loaded at program start have already had their TLS area allocated and that TP is valid. However, I've seen that D seems to insert calls to __tls_get_addr anyway, as if the initial exec model optimization didn't exist. So the question is whether that model is implemented in D.
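
A small sketch of the TP + offset idea, written as ordinary C so it can be tried on a host: sketch_read_tp stands in for __aeabi_read_tp, and the offsets OFF_COUNTER and OFF_FLAG are hypothetical values of the kind a linker would assign. A real ARM layout also places a small control block in front of the data, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical TLS block for the single (main) thread. */
static int tls_words[8];

/* Stand-in for __aeabi_read_tp: returns the thread pointer. */
static unsigned char *sketch_read_tp(void)
{
    return (unsigned char *)tls_words;
}

/* Offsets that the linker would normally assign to each TLS variable. */
enum { OFF_COUNTER = 0, OFF_FLAG = sizeof(int) };

/* What the compiler conceptually emits for an access: TP + offset. */
static void *tls_at(size_t offset)
{
    assert(offset < sizeof tls_words);
    return sketch_read_tp() + offset;
}

int *sketch_counter(void) { return (int *)tls_at(OFF_COUNTER); }
int *sketch_flag(void)    { return (int *)tls_at(OFF_FLAG); }
```

No lookup function is involved here: each access is just one thread-pointer read plus a constant, which is why this model suits a statically linked image.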
Oct 13 2020
prev sibling parent reply Denis Feklushkin <feklushkin.denis gmail.com> writes:
On Tuesday, 13 October 2020 at 11:38:33 UTC, IGotD- wrote:

 It is provided by "picolibc" library.

 Actually it provides __aeabi_read_tp but I wrap it:
 https://github.com/denizzzka/d_c_arm_test/blob/master/d/freertos_druntime_backend/external/rt/sections.d#L45
The function prototype of __tls_get_address is wrong.
Is it implemented inside LLVM? Can you provide a link to the right declaration? Google is full of the "__tls_get_addr()" form.
 It should be

 struct tls_index
 {
 	size_t ti_module;
 	size_t ti_offset;
 };

 void* __tls_get_addr(tls_index* ti)


 You perhaps don't use modules but you certainly need an offset.

 It should rather be something like

 void* __tls_get_addr(tls_index* ti)
 {
     return getThreadTlsArea(ti->ti_module) + ti->ti_offset;
 }
Yep, sounds like the correct explanation of my issue! Thanks!
Oct 13 2020
next sibling parent Denis Feklushkin <feklushkin.denis gmail.com> writes:
On Tuesday, 13 October 2020 at 12:00:34 UTC, Denis Feklushkin 
wrote:
 On Tuesday, 13 October 2020 at 11:38:33 UTC, IGotD- wrote:

 It is provided by "picolibc" library.

 Actually it provides __aeabi_read_tp but I wrap it:
 https://github.com/denizzzka/d_c_arm_test/blob/master/d/freertos_druntime_backend/external/rt/sections.d#L45
The function prototype of __tls_get_address is wrong.
It is implemented inside of LLVM? Can you provide link to right declaration?
Don't worry, found it. Thanks again!
Oct 13 2020
prev sibling parent reply IGotD- <nise nise.com> writes:
On Tuesday, 13 October 2020 at 12:00:34 UTC, Denis Feklushkin 
wrote:
 It is implemented inside of LLVM?
 Can you provide link to right declaration?

 Google full of "__tls_get_addr()" form
Sorry, I can't, because it is a mess. I think that __tls_get_addr is connected to an ELF standard for thread-local storage.

https://akkadia.org/drepper/tls.pdf

Note that the document doesn't cover ARM, but it does cover Itanium and other unusual CPUs. The declaration of __tls_get_addr is stated there for those other CPUs. I've seen other versions of __tls_get_addr out there as well, and it can be architecture dependent. Also keep in mind that I use ARMv7-ar as the reference point here; I cannot say with 100% certainty that it is the same for Cortex M3.

The RunTime ABI for ARM

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=C3E4EA7E008BA776DF7E0F54C8F7CCF1?doi=10.1.1.352.5218&rep=rep1&type=pdf

isn't really that helpful either.
Oct 13 2020
parent reply Denis Feklushkin <feklushkin.denis gmail.com> writes:
On Tuesday, 13 October 2020 at 12:33:57 UTC, IGotD- wrote:

 Sorry, I can't because it is a mess. I think that 
 __tls_get_addr is connected to an ELF standard for thread-local 
 storage.
It looks like this problem is solvable: it is possible to push through your own implementation of (emu)TLS by replacing it in the linker (--wrap=). The emutls implementation is not tied to anything.
Oct 14 2020
parent reply IGotD- <nise nise.com> writes:
On Thursday, 15 October 2020 at 01:23:53 UTC, Denis Feklushkin 
wrote:
 Looks like this problem is solvable: it is possible to push 
 through own implementation of (emu)TLS by replacing it in the 
 linker (--wrap=). Emutls implementation not tied to anything.
That's the same as implementing your own version so you don't need the TLS emulation at all. TLS in a static system isn't that difficult. https://wiki.osdev.org/Thread_Local_Storage The only thing you need to implement is __tls_get_addr and possibly __aeabi_read_tp.
Oct 15 2020
parent Denis Feklushkin <feklushkin.denis gmail.com> writes:
On Thursday, 15 October 2020 at 15:09:45 UTC, IGotD- wrote:
 On Thursday, 15 October 2020 at 01:23:53 UTC, Denis Feklushkin 
 wrote:
 Looks like this problem is solvable: it is possible to push 
 through own implementation of (emu)TLS by replacing it in the 
 linker (--wrap=). Emutls implementation not tied to anything.
That's the same as implementing your own version so you don't need the TLS emulation at all.
I am not sure, but it looks like different arguments are passed to __tls_get_addr depending on whether emulation is enabled or disabled? If emulation is used, the arguments are the same as for GCC, and these arguments allow you to avoid ELF-related things.
Oct 16 2020
prev sibling parent Denis Feklushkin <feklushkin.denis gmail.com> writes:
On Tuesday, 13 October 2020 at 11:13:16 UTC, Denis Feklushkin 
wrote:

 If SW call is chosen then __tls_get_addr is the function that 
 is used in order to obtain the address of a TLS variable. If 
 emulation is used this function is just forwarded to the 
 emulation function. It is almost easier to implement 
 __tls_get_addr yourself and skip the emulation.
Ok, I see in my binary what if I use "--emulated-tls" 3-rd party function __tls_get_address (provided by picolibc) replaced by __emutls_get_address. But it is still not clear why static variables are now not "superimposed" on one another at the same addresses.
(I don't spawn threads)
Oct 13 2020