D.gnu - Getting tls symbols out of the compiler.

"Iain Buclaw" <ibuclaw ubuntu.com> writes:

OK, from today till the end of the week I'm going to be sitting 
down and thinking through this, as I'd really like the emitting 
of _tlsstart and _tlsend out of the compiler by hook or by crook.


Phase one - start:

We have rt/tls.S.  Have added this to the libdruntime builds, and 
I see no immediate problems running this through the testsuite  
(though - that is never really a good indicator of anything).  
Drawbacks, this is only available for linux at the moment. But 
that's fine if we keep this temporary for the week.


Phase two - plan:

The GC has a hook gc_addRoot used for the purpose of tracking GC 
allocated memory in C-land.  The idea I've got turning over in my 
head at the moment is that any thread local decls that are 
'new-able' (classes, pointers, D arrays) are added as a root upon 
'new' declaration,  this is safe-guarded by a thread-local static 
to prevent multiple calls to gc_addRoot.

eg:
---
var = new Object();
---
would be lowered to:
---
if (!var.guard)
{
   ++var.guard;
   gc_addRoot (&var);
}
var = _d_new_class (&Object_Class);
---

Then in the destruction of the thread (eg: atexit, or through a 
destructor called in .fini section)

---
if (var.guard)
   gc_removeRoot (&var);
---

Though this should not be required if we have a proper TLS GC in 
place.
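
A minimal sketch of the guard scheme above, written as plain C rather than compiler-generated code. It only models the once-per-thread registration logic: `gc_addRoot` and `gc_removeRoot` here are stand-ins that record addresses in a small table, not druntime's implementations, and `assign_var`, `thread_cleanup` and `MAX_ROOTS` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in root table (hypothetical); a real one would be shared and locked. */
#define MAX_ROOTS 16
static void *roots[MAX_ROOTS];
static size_t nroots;

static void gc_addRoot(void *p)
{
    assert(nroots < MAX_ROOTS);
    roots[nroots++] = p;
}

static void gc_removeRoot(void *p)
{
    for (size_t i = 0; i < nroots; i++) {
        if (roots[i] == p) {
            roots[i] = roots[--nroots];   /* swap-remove, order not kept */
            return;
        }
    }
}

/* The thread-local variable and its thread-local guard. */
static _Thread_local void *var;
static _Thread_local int var_guard;

/* What "var = new Object();" would expand to: register once, then assign. */
void assign_var(void *fresh)
{
    if (!var_guard) {
        ++var_guard;
        gc_addRoot(&var);
    }
    var = fresh;
}

/* What would run when the thread is torn down. */
void thread_cleanup(void)
{
    if (var_guard) {
        gc_removeRoot(&var);
        var_guard = 0;
    }
}
```

After the first assignment in a thread, every later assignment costs only the guard test, which is the point of keeping the guard thread-local.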

Does this seem like a reasonable job?  Or have I completely lost 
the plot at half past 2 in the morning?  :o)

Regards
Iain.

May 29 2013

Johannes Pfau <nospam example.com> writes:

On Thu, 30 May 2013 01:19:10 +0200,
"Iain Buclaw" <ibuclaw ubuntu.com> wrote:

 OK, from today till the end of the week I'm going to be sitting 
 down and thinking through this, as I'd really like the emitting 
 of _tlsstart and _tlsend out of the compiler by hook or by crook.
 
 
 Phase one - start:
 
 We have rt/tls.S.  Have added this to the libdruntime builds, and 
 I see no immediate problems running this through the testsuite  
 (though - that is never really a good indicator of anything).  
 Drawbacks, this is only available for linux at the moment. But 
 that's fine if we keep this temporary for the week.

Do you know what that comment in that file "Sadly, this does not work
because ld orders [...]" is about?

 Phase two - plan:
 
 The GC has a hook gc_addRoot used for the purpose of tracking GC 
 allocated memory in C-land.  The idea I've got turning over in my 
 head at the moment is that any thread local decls that are 
 'new-able' (classes, pointers, D arrays) are added as a root upon 
 'new' declaration,  this is safe-guarded by a thread-local static 
 to prevent multiple calls to gc_addRoot.
 

The good part is that this will work with non-contiguous TLS, i.e.
GCC emulated TLS. The bad part is that it'll be very slow so it should
really only be a fallback. And what about non-newable types?

Why don't we register all thread local variables in a module
constructor instead (in the C one or in the D one)?

static int a;
static void* b; 

static this()
{
    gc_addRoot(&a);
    gc_addRoot(&b);
}

Then unload in the module destructor. We could try to optimize that,
although we have to be careful:

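static this() // pseudo code
{
    void*[2] roots;
    roots[0] = &a;
    roots[1] = &b;
    // Sort first; the contiguity check needs to allow for alignment padding.
    if (isContiguous(roots[]))
        // Probably need to keep a size array as well, to know where the
        // last variable ends. gc_addRange takes a start and a length.
        gc_addRange(roots[0], roots[$ - 1] + size - roots[0]);
    else
        foreach (ptr; roots)
            gc_addRoot(ptr);
}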

This can detect if the TLS memory is contiguous and then only add one
range per module. Maybe it's not worth the effort as long as it's only
a slow fallback (and even with the optimization it'll still be slow)
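
The contiguity check in the pseudo code above could look roughly like this in C. This is only a sketch under stated assumptions: `struct tls_var`, `register_module_tls` and the counters are hypothetical, the padding tolerance (the strictest fundamental alignment) is a guess at what "allow alignment" should mean, and `gc_addRange`/`gc_addRoot` are stand-ins that merely record calls rather than druntime's functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One thread-local variable: where it lives and how big it is. */
struct tls_var { void *addr; size_t size; };

/* Stand-ins for the GC hooks (hypothetical): they only record what was asked. */
static size_t n_ranges, n_roots, last_range_size;
static void gc_addRange(void *p, size_t sz) { (void)p; n_ranges++; last_range_size = sz; }
static void gc_addRoot(void *p) { (void)p; n_roots++; }

/* qsort comparator: order variables by address. */
static int by_addr(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)((const struct tls_var *)a)->addr;
    uintptr_t y = (uintptr_t)((const struct tls_var *)b)->addr;
    return (x > y) - (x < y);
}

/* Sorted variables count as contiguous if none overlap and every gap
   between neighbours is small enough to be alignment padding. */
static int is_contiguous(const struct tls_var *v, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        uintptr_t end  = (uintptr_t)v[i].addr + v[i].size;
        uintptr_t next = (uintptr_t)v[i + 1].addr;
        if (next < end || next - end >= _Alignof(max_align_t))
            return 0;
    }
    return 1;
}

/* Register one range if the module's variables are contiguous,
   otherwise fall back to one root per variable. */
void register_module_tls(struct tls_var *v, size_t n)
{
    assert(v != NULL || n == 0);
    if (n == 0)
        return;

    qsort(v, n, sizeof *v, by_addr);
    if (is_contiguous(v, n)) {
        uintptr_t start = (uintptr_t)v[0].addr;
        uintptr_t end   = (uintptr_t)v[n - 1].addr + v[n - 1].size;
        gc_addRange(v[0].addr, (size_t)(end - start));
    } else {
        for (size_t i = 0; i < n; i++)
            gc_addRoot(v[i].addr);
    }
}
```

Keeping the sizes next to the addresses is what makes the end of the range computable, which is the "size array" the pseudo code asks for.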

 Though this should not be required if we have a proper TLS GC in 
 place.

Do you mean Martin Nowak's shared library/TLS work? That indeed sounds
like the proper solution.

(And maybe the D community should develop a sane standard interface to
the runtime linker to access sections. Then go lobbying all major libcs
out there...)

May 30 2013

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 30 May 2013 09:25, Johannes Pfau <nospam example.com> wrote:
 On Thu, 30 May 2013 01:19:10 +0200,
 "Iain Buclaw" <ibuclaw ubuntu.com> wrote:

 OK, from today till the end of the week I'm going to be sitting
 down and thinking through this, as I'd really like the emitting
 of _tlsstart and _tlsend out of the compiler by hook or by crook.


 Phase one - start:

 We have rt/tls.S.  Have added this to the libdruntime builds, and
 I see no immediate problems running this through the testsuite
 (though - that is never really a good indicator of anything).
 Drawbacks, this is only available for linux at the moment. But
 that's fine if we keep this temporary for the week.

 Do you know what that comment in that file "Sadly, this does not work
 because ld orders [...]" is about?

It's the same problem as with what we currently have.  When ld gathers
all the referenced symbols that are to go into the TLS section (on Linux,
at least), it has the right to reorder them.  As such, emitting
_tlsstart as the first symbol in the compiler gives no guarantee that
it will be the first symbol in the object file.


 Phase two - plan:

 The GC has a hook gc_addRoot used for the purpose of tracking GC
 allocated memory in C-land.  The idea I've got turning over in my
 head at the moment is that any thread local decls that are
 'new-able' (classes, pointers, D arrays) are added as a root upon
 'new' declaration,  this is safe-guarded by a thread-local static
 to prevent multiple calls to gc_addRoot.

 The good part is that this will work with non-contiguous TLS, i.e.
 GCC emulated TLS. The bad part is that it'll be very slow so it should
 really only be a fallback. And what about non-newable types?

Non-newable types aren't collected because they never reference new
memory...   This is true for basic types (int, float, complex,
vectors).  The same is also true for static arrays (although array[] =
[1,2,3,4] calls _d_arrayliteral, the allocated memory is copied so can
be freed immediately afterwards).   For structures we can check
((TypeStruct *) t)->hasPointers() at compile time - if false then we
can be safely assured that this will never be referencing allocated
memory.  Classes, pointers and D arrays I've already mentioned.
Forgot to mention associative arrays, which would also be added as a
root upon initialisation.


 Why don't we register all thread local variables in a module
 constructor instead (in the C one or in the D one)?

 static int a;
 static void* b;

 static this()
 {
     gc_addRoot(&a);
     gc_addRoot(&b);
 }

This is more of a lazy init that won't affect start-up speed.  Note:
this idea is based off what C++ (g++) does for say - static A a = new
A();


 Then unload in the module destructor. We could try to optimize that,
 although we have to be careful:

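 static this() // pseudo code
 {
     void*[2] roots;
     roots[0] = &a;
     roots[1] = &b;
     // Sort first; the contiguity check needs to allow for alignment padding.
     if (isContiguous(roots[]))
         // Probably need to keep a size array as well, to know where the
         // last variable ends. gc_addRange takes a start and a length.
         gc_addRange(roots[0], roots[$ - 1] + size - roots[0]);
     else
         foreach (ptr; roots)
             gc_addRoot(ptr);
 }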

This is a lot more work to describe in the compiler.  :)



 This can detect if the TLS memory is contiguous and then only add one
 range per module. Maybe it's not worth the effort as long as it's only
 a slow fallback (and even with the optimization it'll still be slow)

I don't think it would cause much of a slowdown.  Granted, the first
initialisation would jump through the druntime library twice, but
thereafter it's a single/two instruction test.  Pretty negligible -
but I'm not a speeeeeeeeed demon or freak who wants everything
compiled with -fOMG-fast. =)


 Though this should not be required if we have a proper TLS GC in
 place.

 Do you mean Martin Nowak's shared library/TLS work? That indeed sounds
 like the proper solution.

It should certainly mean that we don't have to worry about removing
roots once they've been added.


 (And maybe the D community should develop a sane standard interface to
 the runtime linker to access sections. Then go lobbying all major libcs
 out there...)

You mean binutils?  :-)

See binutils/ld/scripttempl  for the ldscripts used to lay out the tls
data sections.  (Note, only a few actually have a TLS section - and
only winpe defines a _tls_start__ and _tls_end__ symbol).


Regards
--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

May 30 2013

Johannes Pfau <nospam example.com> writes:

On Thu, 30 May 2013 10:42:17 +0100,
Iain Buclaw <ibuclaw ubuntu.com> wrote:

 
 This is more of a lazy init that won't affect start-up speed.  Note:
 this idea is based off what C++ (g++) does for say - static A a = new
 A();
 

I see.

 
 I don't think it would cause much of a slowdown.  Granted, the first
 initialisation would jump through the druntime library twice, but
 thereafter it's a single/two instruction test.  Pretty negligible -
 but I'm not a speeeeeeeeed demon or freak who wants everything
 compiled with -fOMG-fast. =)

I'm more worried about adding many roots to the GC. The code which
checks the guard variable is probably negligible.

 
 
 Though this should not be required if we have a proper TLS GC in
 place.

 Do you mean Martin Nowak's shared library/TLS work? That indeed
 sounds like the proper solution.

 
 It should certainly mean that we don't have to worry about removing
 roots once they've been added.

AFAIK he also changed how TLS sections are looked up for the main
executable. _tlsstart and _tlsend are not used anymore. Instead some
mainly undocumented obscure glibc interface is used to ask the runtime
linker for the start and end of the TLS section.

https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_linux.d#L122
https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_linux.d#L164

https://github.com/D-Programming-Language/dmd/commit/313132b6e20c119fa64c6164574818421cb522ce

 
 (And maybe the D community should develop a sane standard interface
 to the runtime linker to access sections. Then go lobbying all
 major libcs out there...)

 
 You mean binutils?  :-)
 
 See binutils/ld/scripttempl  for the ldscripts used to lay out the tls
 data sections.  (Note, only a few actually have a TLS section - and
 only winpe defines a _tls_start__ and _tls_end__ symbol).

Well, the start/end symbols could be introduced in binutils linker
scripts. But the ELF format already has all the information about
section size and the runtime linker (ld.so) knows where every section
starts and it also knows its size. There's just no standard interface to
retrieve that information. (I had a quick look at the FreeBSD runtime
linker some time ago. With some C libraries you can even get the
ELF header, but it's all non-standard and undocumented)

May 30 2013

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 30 May 2013 11:13, Johannes Pfau <nospam example.com> wrote:
 On Thu, 30 May 2013 10:42:17 +0100,
 Iain Buclaw <ibuclaw ubuntu.com> wrote:

 This is more of a lazy init that won't affect start-up speed.  Note:
 this idea is based off what C++ (g++) does for say - static A a = new
 A();

 I see.

 I don't think it would cause much of a slowdown.  Granted, the first
 initialisation would jump through the druntime library twice, but
 thereafter it's a single/two instruction test.  Pretty negligible -
 but I'm not a speeeeeeeeed demon or freak who wants everything
 compiled with -fOMG-fast. =)

 I'm more worried about adding many roots to the GC. The code which
 checks the guard variable is probably negligible.

 Though this should not be required if we have a proper TLS GC in
 place.

 Do you mean Martin Nowak's shared library/TLS work? That indeed
 sounds like the proper solution.

 It should certainly mean that we don't have to worry about removing
 roots once they've been added.

 AFAIK he also changed how TLS sections are looked up for the main
 executable. _tlsstart and _tlsend are not used anymore. Instead some
 mainly undocumented obscure glibc interface is used to ask the runtime
 linker for the start and end of the TLS section.

 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_linux.d#L122
 https://github.com/D-Programming-Language/druntime/blob/master/src/rt/sections_linux.d#L164

 https://github.com/D-Programming-Language/dmd/commit/313132b6e20c119fa64c6164574818421cb522ce

 (And maybe the D community should develop a sane standard interface
 to the runtime linker to access sections. Then go lobbying all
 major libcs out there...)

 You mean binutils?  :-)

 See binutils/ld/scripttempl  for the ldscripts used to lay out the tls
 data sections.  (Note, only a few actually have a TLS section - and
 only winpe defines a _tls_start__ and _tls_end__ symbol).

 Well, the start/end symbols could be introduced in binutils linker
 scripts. But the ELF format already has all the information about
 section size and the runtime linker (ld.so) knows where every section
 starts and it also knows its size. There's just no standard interface to
 retrieve that information. (I had a quick look at the FreeBSD runtime
 linker some time ago. With some C libraries you can even get the
 ELF header, but it's all non-standard and undocumented)


Yes, I've noticed  (and even gave libdruntime a test on gdc shortly
after the conference)  - it's broken everything ... literally ...


--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

May 30 2013

Iain Buclaw <ibuclaw ubuntu.com> writes:

On 30 May 2013 11:13, Johannes Pfau <nospam example.com> wrote:
 On Thu, 30 May 2013 10:42:17 +0100,
 Iain Buclaw <ibuclaw ubuntu.com> wrote:

 This is more of a lazy init that won't affect start-up speed.  Note:
 this idea is based off what C++ (g++) does for say - static A a = new
 A();

 I see.

 I don't think it would cause much of a slowdown.  Granted, the first
 initialisation would jump through the druntime library twice, but
 thereafter it's a single/two instruction test.  Pretty negligible -
 but I'm not a speeeeeeeeed demon or freak who wants everything
 compiled with -fOMG-fast. =)

 I'm more worried about adding many roots to the GC. The code which
 checks the guard variable is probably negligible.

Speed won't at all be a problem in that aspect.  All roots are kept in
an array, so during collection, rather than the following:
---
mark (&_tlsstart,  &_tlsend - &_tlsstart);
---

It will be doing:
---
mark (this.roots, this.roots + this.nroots);
---


Regards
--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';

May 30 2013

Jacob Carlborg <doob me.com> writes:

On 2013-05-30 12:13, Johannes Pfau wrote:

 AFAIK he also changed how TLS sections are looked up for the main
 executable. _tlsstart and _tlsend are not used anymore. Instead some
 mainly undocumented obscure glibc interface is used to ask the runtime
 linker for the start and end of the TLS section.

He uses "dl_iterate_phdr" to iterate over all shared libraries, then iterates 
over all segments/sections to find the ones used for TLS.

Documented here, no secrets :)

http://linux.die.net/man/3/dl_iterate_phdr
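
As a rough illustration of that lookup, here is a small C sketch for Linux/glibc that walks every loaded object with `dl_iterate_phdr` and reports its `PT_TLS` program header. It is not taken from druntime; `on_object`, `count_tls_segments` and `tls_probe` are hypothetical names, and it only reports the size of each TLS initialisation image, not where a given thread's copy lives.

```c
#define _GNU_SOURCE   /* dl_iterate_phdr is a GNU extension in <link.h> */
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* A thread-local with external linkage, so this program has its own PT_TLS. */
_Thread_local int tls_probe = 42;

/* Called once per loaded object; the main program is visited first. */
static int on_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *found = data;
    (void)size;
    assert(found != NULL);

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_TLS)
            continue;
        /* p_memsz is the full TLS block size, p_filesz the initialised part. */
        printf("%s: TLS image of %llu bytes (%llu initialised)\n",
               info->dlpi_name[0] ? info->dlpi_name : "(main program)",
               (unsigned long long)ph->p_memsz,
               (unsigned long long)ph->p_filesz);
        ++*found;
    }
    return 0;   /* a non-zero return would stop the iteration */
}

/* Returns how many loaded objects carry a TLS segment. */
int count_tls_segments(void)
{
    int found = 0;
    dl_iterate_phdr(on_object, &found);
    return found;
}
```

Turning a segment into the address range of the calling thread's TLS block is a further step that this sketch leaves out.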

-- 
/Jacob Carlborg

May 31 2013

Johannes Pfau <nospam example.com> writes:

On Fri, 31 May 2013 09:42:10 +0200,
Jacob Carlborg <doob me.com> wrote:

 On 2013-05-30 12:13, Johannes Pfau wrote:
 
 AFAIK he also changed how TLS sections are looked up for the main
 executable. _tlsstart and _tlsend are not used anymore. Instead some
 mainly undocumented obscure glibc interface is used to ask the
 runtime linker for the start and end of the TLS section.

 
 He uses "dl_iterate_phdr" to iterate over all shared libraries, then
 iterates over all segments/sections to find the ones used for TLS.
 
 Documented here, no secrets :)
 
 http://linux.die.net/man/3/dl_iterate_phdr
 

OK, then I was wrong about that :-)

May 31 2013

Jacob Carlborg <doob me.com> writes:

On 2013-05-30 11:42, Iain Buclaw wrote:

 It's the same problem as with what we currently have.  When ld gathers
 all the referenced symbols that are to go into the TLS section (on Linux,
 at least), it has the right to reorder them.  As such, emitting
 _tlsstart as the first symbol in the compiler gives no guarantee that
 it will be the first symbol in the object file.

For Mac OS X there's a function to get a section from an image:

getsectbynamefromheader
getsectbynamefromheader_64

Documented here:

https://developer.apple.com/library/mac/#documentation/developertools/Reference/MachOReference/Reference/reference.html

You can get the mach_header (image) using this callback function:

_dyld_register_func_for_add_image

Or using the same approach Martin Nowak used for Linux.

The problem with the above function is that you cannot unregister the 
callback. If a dynamic library that registers a callback, e.g. druntime, 
gets unloaded, you will get a crash the next time "dlopen" is used.

A solution to that problem would be to either use the same approach 
Martin Nowak used or any of these undocumented functions from the 
dynamic linker:

void dyld_register_image_state_change_handler(dyld_image_states state, 
bool batch, dyld_image_state_change_handler handler)

Works like "_dyld_register_func_for_add_image" but will force the image 
using it to not be unloaded, or something like that.

Or:

void dyld_enumerate_tlv_storage(dyld_tlv_state_change_handler handler)

If I recall correctly, LDC is using "dyld_enumerate_tlv_storage".

-- 
/Jacob Carlborg

May 31 2013

"David Nadlinger" <see klickverbot.at> writes:

On Friday, 31 May 2013 at 07:58:25 UTC, Jacob Carlborg wrote:
 void dyld_enumerate_tlv_storage(dyld_tlv_state_change_handler 
 handler)

 If I recall correctly, LDC is using 
 "dyld_enumerate_tlv_storage".

Yep. It's a bit cumbersome to handle, though, as it relies on 
Blocks support:

https://github.com/ldc-developers/druntime/blob/2c78290fff65f2fce763da4b077856a2fc7596fb/src/ldc/osx_tls.c#L36

David

Jun 01 2013

Jacob Carlborg <doob me.com> writes:

On 2013-06-01 11:53, David Nadlinger wrote:

 Yep. It's a bit cumbersome to handle, though, as it relies on Blocks
 support:

 https://github.com/ldc-developers/druntime/blob/2c78290fff65f2fce763da4b077856a2fc7596fb/src/ldc/osx_tls.c#L36

Oh, I didn't notice it used blocks.

-- 
/Jacob Carlborg

Jun 02 2013
