D.gnu - -fsection-anchors and -fdata-sections in ARM

"Mike" <none none.com> writes:

I'm trying to track down an optimization bug.

I have written a D program and a C program that are essentially 
identical, and compiled them to assembly code with my 
arm-none-eabi-gdc/gcc build.

C Code: https://bpaste.net/show/6f420de3a892
C ASM : https://bpaste.net/show/42c75f6763bf
arm-none-eabi-gcc -O1 -S -ffunction-sections test.c -o test.s


D Code: https://bpaste.net/show/20faec5c4bb6
D ASM : https://bpaste.net/show/b2c200705d5d
arm-none-eabi-gdc -O1 -S -ffunction-sections test.d -o test.d.s

The problem is with the string literals at the very end of the 
ASM files:

C
****
.L12:
	.align	2
.L11:
	.word	.LC1
	.word	.LC2
	.word	.LC3
	.size	_start, .-_start
	.section	.rodata.str1.4,"aMS",%progbits,1
	.align	2
.LC0:
	.ascii	"\015\012\000"
	.space	1
.LC1:
	.ascii	"a\000"
	.space	2
.LC2:
	.ascii	"another string\000"
	.space	1
.LC3:
	.ascii	"do it again\000"
	.ident	"GCC: (GNU) 4.9.2"

D
****
.L6:
	.word	.LANCHOR0

... { blah blah }

.L13:
	.align	2
.L12:
	.word	.LANCHOR0
	.cantunwind
	.fnend
	.size	_D5start6_startFZv, .-_D5start6_startFZv
	.section	.rodata
	.align	2
	.set	.LANCHOR0,. + 0
.LC0:
	.ascii	"\015\012"
	.space	2
.LC1:
	.ascii	"a"
	.space	3
.LC2:
	.ascii	"another string"
	.space	2
.LC3:
	.ascii	"do it again"
	.ident	"GCC: (GNU) 4.9.2"

Notice how the D code adds the section anchor .LANCHOR0, but the 
C code does not.

The D code then refers to the string literals with something like 
this...
ldr	r3, .L6

... essentially referring to the string literal as an offset from 
.LANCHOR0.

This is causing a problem for me because if I put my string 
literals in separate sections (i.e. -fdata-sections), compile 
with -O1 (which turns on -fsection-anchors here), and link with 
--gc-sections, then the linker strips out my string literals 
because it doesn't see the reference to them. (At least that's 
my theory.)

The C code, on the other hand, does not use the section anchor 
and instead seems to put them in some kind of array of strings 
like this:
.L11:
	.word	.LC1
	.word	.LC2
	.word	.LC3

Therefore, the linker can see the references to the string literals 
and doesn't strip them out. (Again, that's my theory.)
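
To make the two addressing schemes concrete, here is a minimal C sketch that models them as plain data. It is only an illustration under stated assumptions: `Slice`, `anchor0`, `from_anchor`, `pointer_table` and `same_bytes` are hypothetical names, the offsets (0, 4, 8, 24) are read off the D listing above, and nothing here reproduces what the linker actually does with --gc-sections.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a D string: a length plus a pointer. */
typedef struct {
    size_t      length;
    const char *ptr;
} Slice;

/* Models the D output: all literals in one block, reached from one anchor.
   The "\0" bytes stand for the .space padding in the listing. */
static const char anchor0[] =
    "\r\n\0\0"            /* .LC0, offset 0  */
    "a\0\0\0"             /* .LC1, offset 4  */
    "another string\0\0"  /* .LC2, offset 8  */
    "do it again";        /* .LC3, offset 24 */

/* Anchor-style access: one base address plus a constant offset. */
static Slice from_anchor(size_t offset, size_t length)
{
    Slice s = { length, anchor0 + offset };
    return s;
}

/* Models the C output: one pointer (one .word) per literal. */
static const char *const pointer_table[] = {
    "a", "another string", "do it again"
};

/* Returns 1 when both schemes yield the same bytes for entry i (0..2). */
static int same_bytes(size_t i)
{
    static const size_t offsets[] = { 4, 8, 24 };
    size_t len = strlen(pointer_table[i]);
    Slice s = from_anchor(offsets[i], len);
    return memcmp(s.ptr, pointer_table[i], len) == 0;
}
```

In the anchor scheme the code only ever names `anchor0`, so a tool looking for references to an individual literal finds none. In the table scheme every literal is named explicitly.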

Can anything be done about this in GDC?

Thanks always for the help,
Mike

Jan 22 2015

Johannes Pfau <nospam example.com> writes:

On Thu, 22 Jan 2015 11:34:27 +0000,
"Mike" <none none.com> wrote:


It's probably because we put the literals directly into .rodata but C
puts them into .rodata.str1.4. I guess GCC would prefer section anchors
in both cases for optimization purposes but can't use anchors
for .rodata.str1.4. Maybe the linker performs merging or
something similar on .rodata.str1.4 which makes anchor-based addressing
invalid.

We can try to put our string literals into .rodata.str1.4 and see if
this helps. However, I think you can't rely on the fact that section
anchors are not used; it might just be a limitation in GCC's optimizer.



The main place to look is gcc/varasm.c: default_elf_select_section
selects the section. mergeable_string_section is the .rodata.str1.4
section, and decls are put into this section if
categorize_decl_for_section returns SECCAT_RODATA_MERGE_STR. We pass
var_decls (the string structs) to this function, but it only
categorizes a decl as a string if it's a STRING_CST.
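
As a rough illustration of that decision, here is a toy model in C. It is not GCC's code: every name in it (`toy_node`, `toy_seccat`, `toy_categorize`, `toy_section_name`) is made up, and it only mirrors the behaviour described in the paragraph above, with the section names taken from the two listings.

```c
/* Toy model only: these names are hypothetical and are not GCC's. */
enum toy_node { TOY_STRING_CST, TOY_VAR_DECL };

enum toy_seccat { TOY_RODATA, TOY_RODATA_MERGE_STR };

/* Mirrors the behaviour described in the post: only a bare string
   constant is categorized as a mergeable string. */
static enum toy_seccat toy_categorize(enum toy_node node)
{
    return node == TOY_STRING_CST ? TOY_RODATA_MERGE_STR : TOY_RODATA;
}

/* Maps the category to the section names seen in the listings. */
static const char *toy_section_name(enum toy_seccat cat)
{
    return cat == TOY_RODATA_MERGE_STR ? ".rodata.str1.4" : ".rodata";
}
```

In this model a var_decl always ends up in .rodata, which matches the D listing, while a bare string constant ends up in .rodata.str1.4, which matches the C listing.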

Unfortunately GCC is pretty dumb in this regard. Even in C:
const char mesg[] = "Hello World!";  // => .rodata
const char *mesg = "Hello World!";   // => .rodata.str...

Our code is always equal to
const String s = {length, &"Hello World!"};
which is in the same category as the first C example.

Maybe we could do something like this:
const char* value = "Hello World";
const String s = {length, &value};
But I'm not sure if we can have char* declarations without a name.
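
For reference, the three shapes discussed above can be written out as compilable C. This is a sketch under assumptions, not what GDC emits: `String`, `mesg_array`, `mesg_ptr` and `make_string` are hypothetical names, the section comments repeat the claims made in this post rather than measured output, and the last form uses `value` instead of `&value` so that the types match in C.

```c
#include <stddef.h>

/* Hypothetical stand-in for the struct emitted for a D string. */
typedef struct {
    size_t      length;
    const char *ptr;
} String;

/* Array form: the bytes live in the variable itself
   (.rodata according to the post). */
static const char mesg_array[] = "Hello World!";

/* Pointer form: the literal is a separate anonymous constant
   (.rodata.str... according to the post). */
static const char *const mesg_ptr = "Hello World!";

/* Proposed shape: a named pointer to the literal, and a String
   that reuses that pointer. */
static const char *const value = "Hello World!";

/* Built in a function because `value` is not a constant expression
   in C, so it cannot appear in a file-scope initializer. */
static String make_string(void)
{
    String s = { sizeof("Hello World!") - 1, value };
    return s;
}
```

Where each object really lands depends on the compiler, the target and the flags, so the assembly output (-S) is the thing to check.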


Iain, any idea how we could make this work? There might be other
benefits like string constant merging, etc.

Jan 22 2015
