www.digitalmars.com

D.gnu - Compiler-generated implicit symbols and --gc-sections

"Mike" <none none.com> writes:
I ran into a problem recently that resulted in a segmentation 
fault in my program whenever I called a member function of one of 
my classes.  Sometimes it occurred and sometimes it didn't 
depending on the order of certain things in my code.


I eventually tracked it down to the fact that I was compiling 
with -ffunction-sections and -fdata-sections and linking with 
--gc-sections and symbols like...

.data._D38TypeInfo_E14TypeInfo_Class10ClassFlags6__initZ
.data._D40TypeInfo_E15TypeInfo_Struct11StructFlags6__initZ

... were being discarded.  I'm assuming this is the mangled .init 
values of these types, yes?



My linker script contained...

.data : AT (__data_rom_begin)
     {
	. = ALIGN(4);
	__data_ram_begin = .;
	
	. = ALIGN(4);
	*(.data)
	*(.data*)

	. = ALIGN(4);
	__data_ram_end = .;
     } >SRAM

... so I was forced to conclude that the reason they were being 
discarded was because it couldn't find any code that was reaching 
these symbols.



After adding...

KEEP(*(.data.*init*))

... to my linker script, the problem was resolved.
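
For reference, here is a sketch of how the amended output section might look. The placement of the KEEP line is an assumption (the post does not say where it was added); it is listed ahead of the broader .data patterns so the matching input sections are claimed by the KEEP statement first.

```ld
.data : AT (__data_rom_begin)
    {
	. = ALIGN(4);
	__data_ram_begin = .;

	/* Assumed placement: before the wildcard patterns below.
	   KEEP marks matching input sections as roots, so --gc-sections
	   retains them even when nothing appears to reference them. */
	KEEP(*(.data.*init*))

	*(.data)
	*(.data*)

	. = ALIGN(4);
	__data_ram_end = .;
    } >SRAM
```

Note that the pattern is broad: it retains every .data.* input section with "init" anywhere in its name, not only the two TypeInfo symbols listed above. Linking with --print-gc-sections makes ld list the sections it removes, which helps when checking what a narrower pattern would still need to cover.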

I'm guessing these are generated implicitly by the GDC compiler, 
but it does appear that my code never reaches these symbols, so 
discarding them should be OK.  However, it seems discarding them 
causes dislocation in memory.

I'm still a novice with GCC-based toolchains, so forgive the 
ignorance of this question, but is this to be expected, or is 
this an indication of a problem with the compiler?

Mike

Compiler:
Latest GDC 4.8 compiled for arm-none-eabi (ARM Cortex-M)
Jan 03 2014
"Timo Sintonen" <t.sintonen luukku.com> writes:
Again, I am guessing a little, but...

In dmd and IDEs it is common to compile and link everything at once. The compiler has all the information available and may remove unused code and data.

The gcc system is made for separate compilation. When compiling a file, the compiler has no idea how other files use the functions and objects in this file, so it has to emit at least the default set of resources. Whether they are actually used is known only at link time. I do not know if the linker is able to remove unused code or data and what flags are needed.

Because the data is referenced from other files, there has to be a common naming system. Maybe it would be possible to use named variables, but for some reason they have decided to create a separately named section for every piece of info. Every class, struct etc. will have its own sections and there will be lots of them. I have just included all of them without thinking.

It may also be possible that the code or data is in use. In the asm file there is a table of data after each function. The code may load a word from the table. This may be a pointer to another table in another function in another module. There may be an offset to another place in the table and there may be a pointer to this strange section. Without looking at the whole program in a debugger it is impossible to say whether the code and data are actually used or not.
Jan 03 2014
"Dicebot" <public dicebot.lv> writes:
On Saturday, 4 January 2014 at 07:59:55 UTC, Timo Sintonen wrote:
 In dmd and ides it is common to compile and link everything at 
 once. the compiler has all information available and may remove 
 unused code and data.
Actually no D compiler does it out of the box as far as I am aware. It is a big long-standing problem.
Jan 06 2014
"Dicebot" <public dicebot.lv> writes:
On Friday, 3 January 2014 at 18:14:58 UTC, Mike wrote:
 I eventually tracked it down to the fact that I was compiling 
 with -ffunction-sections and -fdata-sections and linking with 
 --gc-sections and symbols like...
I never got --gc-sections to work reliably with D without going dirty; crashes were somewhat common for any non-trivial program. I don't think this particular use case is tested by anyone at all, so you are on your own once you get here.
Jan 06 2014
Iain Buclaw <ibuclaw gdcproject.org> writes:
On 6 Jan 2014 13:45, "Dicebot" <public dicebot.lv> wrote:
 On Friday, 3 January 2014 at 18:14:58 UTC, Mike wrote:
 I eventually tracked it down to the fact that I was compiling with
 -ffunction-sections and -fdata-sections and linking with
 --gc-sections and symbols like...
 I never got --gc-sections to work reliably with D without going dirty,
 crashes were somewhat common for any non-trivial program. Don't think
 this particular use case is tested by anyone at all, you are on your
 own once you get here.
Of course! --gc-sections is just a dirty hack. If you want smaller binaries, then you are better off aiding the shared library support. :)

I don't ever recall any of the core maintainers ever endorsing that switch anyway....
Jan 06 2014
"Mike" <none none.com> writes:
On Monday, 6 January 2014 at 18:59:00 UTC, Iain Buclaw wrote:
 Of course! --gc-sections is just a dirty hack. If you want smaller
 binaries, then you are better off aiding the shared library
 support. :)

 I don't ever recall any of the core maintainers ever endorsing that
 switch anyway....
I agree that the --gc-sections method is hackish, but I wouldn't say it's dirty. And, in the absence of a better method, it is *essential* in the embedded world, and was likely added specifically to make the GNU toolchain a feasible alternative for the embedded market. I doubt the Arduino, with its 32KB of flash memory, would have even been created without it.

The STM32 processors that I use have 16 to 1024KB of flash on them, and --gc-sections is essential to get some programs to fit. Furthermore, it saves my employer tens of thousands of dollars in hardware costs for mass-produced devices.

With --gc-sections, these devices can be built with C/C++, libsupc++, newlib, and libc++ quite effectively. Without it, this would be impossible. Shared library support just doesn't apply in this world. Most of the devices I build are single-threaded, much of the code in the libraries is just never called, and hacking the library's source code with #defines to strip out stuff is a non-solution.

I'm interested in knowing why --gc-sections works well for C/C++ programs but not D, and I hope the compilers will eventually emit code that can support it. It would be sad if D fragmented into D and embedded-D. I don't think that would serve the D language well.

I'm liking D so far, and I'm very interested in seeing D become an alternative for the embedded world. I'm willing to help in any way I can.
Jan 06 2014
"Joakim" <joakim airpost.net> writes:
I ran into this recently when compiling for Android/x86, as the Android NDK linker uses --gc-sections by default. I was able to reproduce the segfault with dmd by compiling sieve.d from the samples as a linux/x86 executable with the --gc-sections flag added to the linker command.

I think sieve.d was working fine when I removed the recent patches for shared library support on linux, in sections_linux.d, so this incompatibility might be related to the shared library work. I'm not sure if you're even using that work though, so maybe that's just one of the ways that --gc-sections trips up.
Jan 07 2014
"Mike" <none none.com> writes:
On Tuesday, 7 January 2014 at 11:04:45 UTC, Joakim wrote:
 I ran into this recently when compiling for Android/x86, as the 
 Android NDK linker calls --gc-sections by default.  I was able 
 to reproduce the segfault with dmd compiling a linux/x86 
 executable with the --gc-sections flag added to the linker 
 command, when compiling sieve.d from the samples.  I think 
 sieve.d was working fine when I removed the recent patches for 
 shared library support on linux, in sections_linux.d, so this 
 incompatibility might be related to the shared library work.  
 I'm not sure if you're even using that work though, so maybe 
 that's just one of the ways that gc-sections trips up.
Interesting! I'd like to take the current 4.8 backport and compile it without the shared library stuff to test this out. But I don't know how. Would you mind giving me a quick explanation on how to remove these patches using git? I'm really quite new to some of these tools.
Jan 08 2014
"Mike" <none none.com> writes:
On Thursday, 9 January 2014 at 07:51:48 UTC, Mike wrote:
 Interesting! I'd like to take the current 4.8 backport and compile
 it without the shared library stuff to test this out. But I don't
 know how. Would you mind giving me a quick explanation on how to
 remove these patches using git? I'm really quite new to some of
 these tools.
Never mind that last post. I thought you were talking about code in GDC, not the runtime. My runtime is only about 400 lines total, and I'm not anywhere near sections.d.
Jan 09 2014
"Joakim" <joakim airpost.net> writes:
On Thursday, 9 January 2014 at 10:15:46 UTC, Mike wrote:
 Never mind that last post. I thought you were talking about code in
 GDC, not the runtime. My runtime is only about 400 lines total, and
 I'm not anywhere near sections.d.
Yeah, that's in druntime, which is what I'm porting to Android/x86. Interestingly, the segfault would go away if I removed --gc-sections, so that alone seemed to be causing it, on both linux/x86 and Android/x86.
Jan 09 2014
"Dicebot" <public dicebot.lv> writes:
On Monday, 6 January 2014 at 18:59:00 UTC, Iain Buclaw wrote:
 Of course! --gc-sections is just a dirty hack. If you want smaller
 binaries, then you are better off aiding the shared library
 support. :)

 I don't ever recall any of the core maintainers ever endorsing that
 switch anyway....
Hack or not, it is pretty much the only existing remedy for binary bloat, which is very pronounced in D. Shared library support is completely irrelevant here: it does not fix the problem of compilers generating a lot of code that is never actually used.
Jan 06 2014