digitalmars.D.ldc - TLS and x86-64 fs:

reply Cecil Ward <cecil cecilward.com> writes:
It seems that on x86-64 Linux (which I think is what godbolt.org 
uses; is that correct?), LDC emits a function call to get the 
base address of your TLS statics. GDC seems to be vastly more 
efficient, as it simply uses an fs: segment override and 
otherwise accesses the RAM directly. I don’t know what happens 
if someone takes the address of a TLS static and uses it in a 
comparison or in pointer arithmetic with, say, a heap-allocated 
cell or an address in one of the stacks. How that works is a GDC 
question.

I wonder why the two compilers differ in method. Could LDC adopt 
the GDC approach here in order to get much more speed?
Jun 28 2023
next sibling parent Cecil Ward <cecil cecilward.com> writes:
Perhaps I should try taking the address of a TLS static with GDC and taking a look at its value?
Jun 28 2023
prev sibling parent reply IGotD- <nise nise.com> writes:
LDC doesn't need to steal anything, because TLS access is standardized: on Linux it is defined by the ELF format and the runtime ABI for the CPU architecture. The difference arises because there are several different ways of accessing TLS. Typically, if you access TLS from the main executable, the compiler can optimize TLS accesses to use the fs segment directly on x86-64. I don't know why LDC chooses to use a function call, but this is likely to be a setting in LLVM, as it should support all types of access. Function calls to obtain TLS addresses are typically used when the code is in a dynamically loaded library that your code loads at run time (as opposed to a shared library that the linker can resolve at link time; I know this distinction is a bit messy). A system standard (the operating system's ABI) can actually choose which types of TLS access are supposed to be used.
Jun 29 2023
parent reply Cecil Ward <cecil cecilward.com> writes:
Thanks IGotD. This is building an executable, not a shared library. I can see the function call for x86-64 on godbolt.org in a minimal function that simply returns the content of a TLS static. Weird. Do you know how to control this behaviour then? I would like to see what is in that routine, to judge its performance cost as is.

I’d also like to understand whether it is an error to take the difference of, or compare, these addresses:

static ubyte TLS_static;
__gshared ubyte _g_shared_static;
const diff = &TLS_static - &_g_shared_static;
const comparison = &TLS_static < &_g_shared_static;
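
One way to probe the address question at run time is sketched below. It is a minimal illustration, not a statement of what the language guarantees. The names `tlsStatic`, `gSharedStatic` and the helper functions are hypothetical. It assumes the address of a TLS variable is a per-thread run-time value, so the difference and the comparison are computed inside functions rather than in module-level initializers.

```d
import core.thread : Thread;

ubyte tlsStatic;                 // module-level variables are thread-local by default in D
__gshared ubyte gSharedStatic;   // a single copy shared by all threads

// Address of the calling thread's copy of tlsStatic.
ubyte* tlsAddress() { return &tlsStatic; }

// Run-time distance between the two variables, as seen by the calling thread.
// The value is only an observation; it can differ per thread and per run.
ptrdiff_t distanceToGShared() { return &tlsStatic - &gSharedStatic; }

// Run-time ordering of the two addresses, as seen by the calling thread.
bool tlsIsBelowGShared() { return &tlsStatic < &gSharedStatic; }

// Starts a second thread and checks that it sees a different address for
// tlsStatic than the calling thread, which is still alive at that point.
bool tlsAddressDiffersAcrossThreads()
{
    ubyte* mine = tlsAddress();
    ubyte* theirs;
    auto t = new Thread({ theirs = tlsAddress(); });
    t.start();
    t.join();
    return mine !is theirs;
}
```

Calling `tlsAddressDiffersAcrossThreads()` from `main` should show that each thread has its own copy, which is why a difference or comparison against a `__gshared` address only makes sense as a value observed by one particular thread.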
Jun 29 2023
parent kinke <noone nowhere.com> writes:
On Thursday, 29 June 2023 at 17:03:35 UTC, Cecil Ward wrote:
 Do you know how to control this behaviour then?
With `-fthread-model`. E.g., `-fthread-model=local-exec` seems to be what GDC defaults to.
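
A minimal sketch for comparing the generated code under different settings follows. The file name `tls.d` and the names `tlsCounter` and `readTls` are hypothetical, and the assembly described in the comments is only the rough shape one would expect on x86-64 Linux, not verified output.

```d
// tls.d: a minimal function that simply returns the content of a TLS static.
//
// Assumed build commands for inspecting the assembly with LDC:
//   ldc2 -O -c -output-s -fthread-model=local-exec     tls.d
//   ldc2 -O -c -output-s -fthread-model=global-dynamic tls.d
//
// Rough expectation (an assumption, check the emitted .s file):
//   local-exec      a single load using an %fs:-relative address
//   global-dynamic  a call to __tls_get_addr to obtain the address first
//
// GCC-based compilers such as GDC expose the corresponding choice
// through the -ftls-model= option.

ubyte tlsCounter = 42;   // thread-local by default in D

// Returns the calling thread's copy of tlsCounter.
ubyte readTls()
{
    return tlsCounter;
}
```

The local-exec model is only valid for code linked into the main executable, which matches the case discussed in this thread; code destined for a shared library needs one of the more general models.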
Jul 04 2023