
digitalmars.D.ldc - Implementing native TLS on OS X in DMD

reply Jacob Carlborg <doob me.com> writes:
It might be a bit odd to ask this question in the LDC newsgroup, but 
since LDC already supports native TLS on OS X I was hoping to get some 
help here.

I've implemented native TLS on OS X in DMD to the best of my knowledge. 
The data in the sections looks correct, the assembly looks correct, and 
I've updated druntime to use the same code, in this regard, as LDC does. 
Everything seems to work correctly in the simple cases I've tried.

But, I have an issue when the garbage collector is run. In particular 
when running the DMD test suite. The failing test is this one [1]. I get 
a segmentation fault (in the debugger, range error) here [2], after 
executing the outer loop once. I highly suspect that it's the garbage 
collector that collects "_chars" [3] (or its content) too early, since 
the destructor of SomeClass [4] is executed. If I make "_chars" 
__gshared it doesn't crash. If I remove the call to the GC [5], it 
doesn't crash.

I've been trying to debug this but I don't have much knowledge in this 
area. What I have found out is that "_chars" is included in the range 
returned by _d_dyld_getTLSRange [6]. I've been trying to debug the GC, 
and it looks like "_chars" is marked twice, before crashing. Or at least 
a range where "_chars" is included.

One thing that worries me, though, is that the range returned by 
_d_dyld_getTLSRange for LDC is quite a lot larger (around 3500) than 
for DMD (around 650). But I noticed that LDC has a couple of additional 
TLS symbols that DMD doesn't have. If I recall correctly, they looked 
like they were related to exception handling.

Any ideas about what could be wrong, or suggestions for how to debug this further?

[1] 
https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L401

[2] 
https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L410

[3] 
https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L388

[4] 
https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L372

[5] 
https://github.com/D-Programming-Language/dmd/blob/7a7687e6e5b46ab9629bcdddb3061478c504ae49/test/runnable/testaa.d#L413

[6] 
https://github.com/ldc-developers/druntime/blob/ldc/src/rt/sections_ldc.d#L432

-- 
/Jacob Carlborg
Jan 07
parent reply David Nadlinger via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> writes:
On 8 Jan 2016, at 8:37, Jacob Carlborg via digitalmars-d-ldc wrote:
 I've been trying to debug this but I don't have much knowledge in this 
 area. What I have found out is that "_chars" is included in the range 
 returned by _d_dyld_getTLSRange [6]. I've been trying to debug the GC, 
 and it looks like "_chars" is marked twice, before crashing. Or at 
 least a range where "_chars" is included.
It's been a while since I initially looked into getting the TLS to work, but did you check that _chars is properly aligned (i.e. to 8 bytes on x86_64)? This would be one way the GC could miss the pointer even though the global is contained in a root range. If that's not it, I'd just continue trying to figure out exactly which objects are collected (not marked) and why.
 If I recall correctly, they looked like they were related to exception 
 handling.
There is currently a per-thread cache for exception handling metadata, yes. It contains a subtle bug, though (related to moving fibers between threads), and will probably go away. — David
Jan 08
parent reply Jacob Carlborg <doob me.com> writes:
On 2016-01-08 16:32, David Nadlinger via digitalmars-d-ldc wrote:

 It's been a while since I initially looked into getting the TLS to work,
 but did you check that _chars is properly aligned (i.e. to 8 bytes on
 x86_64)? This would be one way how the GC could miss the pointer even
 though the global is contained in a root range.
That seemed to be the issue, it works now. Awesome :) thanks.

A follow-up question:

* I'm looking at the assembly output of LDC. It looks like LDC aligns to the size of the type, i.e. "int" to 4 and "long" to 8 and so on. Is that the case?

* It looks like it only uses the above form of alignment if the symbol is placed in the __thread_bss section, i.e. doesn't have an initializer. Does that make sense? If it has an initializer and is placed in the __thread_data section, it will have an alignment of 3 or 4, depending on the size of the variable.

-- 
/Jacob Carlborg
Jan 08
parent reply Jacob Carlborg <doob me.com> writes:
On 2016-01-08 17:40, Jacob Carlborg wrote:

Adding the assembly for convenience

 * I'm looking at the assembly output of LDC, it looks like LDC aligns
 to the size of the type, i.e. "int" to 4 and "long" to 8 and so on, is
 that the case?
Without initializer:

    .tbss __D4main1ai$tlv$init, 4, 3

BTW, do you know what the above 3 is?
 * It looks like it only uses the above form of alignment if the symbol
 is placed in the __thread_bss section, i.e. doesn't have an initializer.
 Does that make sense? If it has an initializer and is placed in the
 __thread_data section it will have an alignment of 3 or 4, depending on
 the size of the variable.
With initializer:

        .section __DATA,__thread_data,thread_local_regular
        .align 3
    __D4main1ai$tlv$init:
        .long 4

-- 
/Jacob Carlborg
Jan 08
parent reply Dan Olson <gorox comcast.net> writes:
Jacob Carlborg <doob me.com> writes:

 On 2016-01-08 17:40, Jacob Carlborg wrote:

 Adding the assembly for convenience

 * I'm looking at the assembly output of LDC, it looks like LDC aligns
 to the size of the type, i.e. "int" to 4 and "long" to 8 and so on, is
 that the case?
 Without initializer:

 .tbss __D4main1ai$tlv$init, 4, 3

 BTW, do you know what the above 3 is?

The 3 is the alignment, as with .p2align (power-of-2 alignment): 2^3 in this case, i.e. 8 bytes.
 * It looks like it only uses the above form of alignment if the symbol
 is placed in the __thread_bss section, i.e. doesn't have an initializer.
 Does that make sense? If it has an initializer and is placed in the
 __thread_data section it will have an alignment of 3 or 4, depending on
 the size of the variable.
 With initializer:

 .section __DATA,__thread_data,thread_local_regular
 .align 3
 __D4main1ai$tlv$init:
 .long 4

Same 8-byte alignment (on OS X, .align is a synonym for .p2align). The tbss and tdata declarations match.
Jan 09
next sibling parent reply Dan Olson <gorox comcast.net> writes:
Just re-reading and it looks like alignments in your example are too big for a 4-byte type, assuming var is an int. .align only needs to be 2 here.

$ cat tls.c
__thread int x;
__thread int y = 42;

$ clang -S tls.c
$ cat tls.s
        .section __TEXT,__text,regular,pure_instructions
        .macosx_version_min 10, 10
        .section __DATA,__thread_data,thread_local_regular
        .align 2 ## y
_y$tlv$init:
        .long 42 ## 0x2a

        .section __DATA,__thread_vars,thread_local_variables
        .globl _y
_y:
        .quad __tlv_bootstrap
        .quad 0
        .quad _y$tlv$init

.tbss _x$tlv$init, 4, 2 ## x
        .globl _x
_x:
        .quad __tlv_bootstrap
        .quad 0
        .quad _x$tlv$init

.subsections_via_symbols
Jan 09
next sibling parent reply kinke <noone nowhere.com> writes:
On Saturday, 9 January 2016 at 20:07:34 UTC, Dan Olson wrote:
 Just re-reading and it looks like alignments in your example 
 are too big for a 4-byte type, assuming var is an int.  .align 
 only needs to be 2 here.
This is probably due to https://github.com/kinke/ldc/commit/a39997d326f0d3da353d8b9f27ffd559e6fcc5d7.
Jan 09
parent reply Dan Olson <gorox comcast.net> writes:
kinke <noone nowhere.com> writes:

 On Saturday, 9 January 2016 at 20:07:34 UTC, Dan Olson wrote:
 Just re-reading and it looks like alignments in your example are too
 big for a 4-byte type, assuming var is an int.  .align only needs to
 be 2 here.
This is probably due to https://github.com/kinke/ldc/commit/a39997d326f0d3da353d8b9f27ffd559e6fcc5d7.
I haven't carefully read the commit yet. Is the extra alignment intended for all variable declarations? It probably is not a big issue, but the following:

    ubyte a,b,c,d,e,f,g,h;

uses 64 bytes versus the 8 bytes from before.

-- 
Dan
Jan 11
parent reply kink <noone nowhere.com> writes:
On Tuesday, 12 January 2016 at 05:44:56 UTC, Dan Olson wrote:
 kinke <noone nowhere.com> writes:

 On Saturday, 9 January 2016 at 20:07:34 UTC, Dan Olson wrote:
 Just re-reading and it looks like alignments in your example 
 are too big for a 4-byte type, assuming var is an int.  
 .align only needs to be 2 here.
This is probably due to https://github.com/kinke/ldc/commit/a39997d326f0d3da353d8b9f27ffd559e6fcc5d7.
I haven't carefully read the commit yet. Is the extra alignment intended for all vars declarations?
For all globals, yes. There's a std.conv unittest casting a global (I don't remember the original type) to an object (class) reference iirc, leading to an error or crash if the chunk isn't aligned. I just assumed DMD assumes such an alignment for globals...
Jan 12
parent Dan Olson <gorox comcast.net> writes:
kink <noone nowhere.com> writes:

 On Tuesday, 12 January 2016 at 05:44:56 UTC, Dan Olson wrote:
 kinke <noone nowhere.com> writes:

 On Saturday, 9 January 2016 at 20:07:34 UTC, Dan Olson wrote:
 Just re-reading and it looks like alignments in your example are
 too big for a 4-byte type, assuming var is an int.  .align only
 needs to be 2 here.
This is probably due to https://github.com/kinke/ldc/commit/a39997d326f0d3da353d8b9f27ffd559e6fcc5d7.
I haven't carefully read the commit yet. Is the extra alignment intended for all vars declarations?
For all globals, yes. There's a std.conv unittest casting a global (I don't remember the original type) to an object (class) reference iirc, leading to an error or crash if the chunk isn't aligned. I just assumed DMD assumes such an alignment for globals...
I think LDC is over-aligning. It is OK functionally, but my gut says it should be fixed eventually to match DMD. I did a test on OS X x86_64 and DMD seems to align global vars based on type size, maybe rounding up to the next power of 2. I haven't looked at the code yet. DMD is more aligned than C or C++ but less than LDC.

$ cat tls.d
extern(C):
__gshared byte a;
__gshared byte[1] x1_1;
__gshared byte[1] x1_2;
__gshared byte[2] x2;
__gshared byte[1] x1_3;
__gshared byte[1] x1_4;
__gshared byte[1] x1_5;
__gshared byte[4] x4;
__gshared byte[1] x1_6;
__gshared byte[7] x7;
__gshared byte[7] x7_1;
void main() {}

$ dmd tls.d
$ nm -n tls
(snip)
0000000100001010 B _a
0000000100001011 B _x1_1
0000000100001012 B _x1_2
0000000100001014 B _x2
0000000100001016 B _x1_3
0000000100001017 B _x1_4
0000000100001018 B _x1_5
000000010000101c B _x4
0000000100001020 B _x1_6
0000000100001028 B _x7
0000000100001030 B _x7_1

compared to LDC

000000010004c1c0 S _a
000000010004c1c8 S _x1_1
000000010004c1d0 S _x1_2
000000010004c1d8 S _x2
000000010004c1e0 S _x1_3
000000010004c1e8 S _x1_4
000000010004c1f0 S _x1_5
000000010004c1f8 S _x4
000000010004c200 S _x1_6
000000010004c208 S _x7
000000010004c210 S _x7_1

C just puts all these bytes together without any special alignment.
Jan 12
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2016-01-09 21:07, Dan Olson wrote:

 Just re-reading and it looks like alignments in your example are too big
 for a 4-byte type, assuming var is an int.  .align only needs to be 2 here.
The output was from LDC. I noticed that Clang and LDC behave differently.

-- 
/Jacob Carlborg
Jan 10
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2016-01-09 20:48, Dan Olson wrote:

 .tbss __D4main1ai$tlv$init, 4, 3

 BTW, do you know that the above 3 is?
 3 is alignment like .p2align (power of 2 alignment). 2^3 in this case (8-byte)

I thought the four was the alignment. If the three is the alignment, then what is the four? The size of the variable?
 Same 8-byte alignment (OSX .align is synonym for .p2align).

 The tbss and tdata declarations match.
Ah, ok. If the second number (3) above is the alignment then it makes sense.

-- 
/Jacob Carlborg
Jan 10