
digitalmars.D.ldc - Openwrt Linux Uclibc ARM GC issue

Radu <void null.pt> writes:
Trying to run some D code on Openwrt with Uclibc and got stuck on a
broken GC.

Using LDC 1.6
====================================
LDC - the LLVM D compiler (1.6.0):
   based on DMD v2.076.1 and LLVM 5.0.0
   built with LDC - the LLVM D compiler (1.6.0)
   Default target: x86_64-unknown-linux-gnu
   Host CPU: broadwell
   http://dlang.org - http://wiki.dlang.org/LDC

   Registered Targets:
     aarch64    - AArch64 (little endian)
     aarch64_be - AArch64 (big endian)
     arm        - ARM
     arm64      - ARM64 (little endian)
     armeb      - ARM (big endian)
     nvptx      - NVIDIA PTX 32-bit
     nvptx64    - NVIDIA PTX 64-bit
     ppc32      - PowerPC 32
     ppc64      - PowerPC 64
     ppc64le    - PowerPC 64 LE
     thumb      - Thumb
     thumbeb    - Thumb (big endian)
     x86        - 32-bit X86: Pentium-Pro and above
     x86-64     - 64-bit X86: EM64T and AMD64
====================================

The runtime libraries were compiled with:

====================================
ldc-build-runtime --dFlags="-w;-mtriple=armv7-linux-gnueabihf 
-mcpu=cortex-a7 -L-lstdc++" --cFlags="-mcpu=cortex-a7 
-mfloat-abi=hard -D__UCLIBC_HAS_BACKTRACE__ -D__UCLIBC_HAS_TLS__" 
--targetSystem="Linux;UNIX" BUILD_SHARED_LIBS=OFF
====================================


The minimal program is:

++++++++++++++++++++
import core.memory;

void main()
{
   GC.collect();
}
++++++++++++++++++++

Compiled with `ldc2 -mtriple=armv7-linux-gnueabihf 
-mcpu=cortex-a7 -gcc=arm-openwrt-linux-gcc`

When run, I get this error spuriously:

====================================
core.exception.AssertError rt/sections_elf_shared.d(116): 
Assertion failure
Fatal error in EH code: _Unwind_RaiseException failed with reason 
code: 9
Aborted (core dumped)
====================================


GDB on the coredump:
====================================
(gdb) bt
#0  _dl_setup_progname (argv0=<optimized out>) at 
ldso/ldso/ldso.c:418
#1  0xb6f55e34 in map_writeable (libaddr=<optimized out>, 
flags=<optimized out>, piclib=-1225360472, ppnt=0x21, 
infile=<optimized out>) at ldso/ldso/dl-elf.c:442
#2  _dl_load_elf_shared_library (rflags=<optimized out>, 
rpnt=0xbeea5d9c, libname=0x0) at ldso/ldso/dl-elf.c:703
#3  0x0001c718 in _d_dso_registry ()
#4  0x00016b14 in ldc.register_dso ()
#5  0x00016b4c in ldc.dso_ctor.4test ()
#6  0xb6f54548 in __GI__dl_tls_setup () at ldso/ldso/dl-tls.c:451
#7  0xb6ea79e4 in _pthread_cleanup_pop_restore (buffer=<optimized 
out>, execute=<optimized out>) at libpthread/nptl/forward.c:152
Backtrace stopped: previous frame identical to this frame 
(corrupt stack?)
====================================

Any idea what might be wrong?
Dec 15 2017
"David Nadlinger" <code klickverbot.at> writes:
On 15 Dec 2017, at 14:06, Radu via digitalmars-d-ldc wrote:
 When run, I get this error spuriously:

 ====================================
 core.exception.AssertError rt/sections_elf_shared.d(116): Assertion 
 failure
 Fatal error in EH code: _Unwind_RaiseException failed with reason 
 code: 9
 Aborted (core dumped)
 ====================================
The assert is inside an invariant which checks that the TLS information has been extracted successfully. Perhaps uclibc uses a TLS implementation that is not ABI-compatible with glibc? (druntime needs to determine the TLS ranges to register them with the GC, for the main thread as well as newly spawned ones.)

Where in the program lifecycle does the error occur? From the backtrace, it looks like during C runtime startup, in which case I am not quite seeing the connection to the GC.

Why unwinding fails is another question, but not one I would be terribly worried about – it is possible that the error e.g. just occurs too early for the EH machinery to be properly set up yet. Other low-level parts of druntime have been converted to directly abort (e.g. using assert(0)) instead. In fact, I am about to overhaul sections_elf_shared in that respect anyway to improve error reporting when mixing shared and non-shared builds.

David
Dec 15 2017
Radu <void null.pt> writes:
On Friday, 15 December 2017 at 14:24:08 UTC, David Nadlinger 
wrote:
 [...]

My various attempts at getting it to run behaved very erratically. So I changed the parameters for the cross compile: basically I removed all architecture specifics, leaving only `-mtriple=arm-linux-gnueabihf`, and `-mfloat-abi=hard` on the C side.

My testing hardware is an ARM Cortex-A7, http://linux-sunxi.org/A33

With the compiler switches changed I could run my test program and try the druntime test runner (albeit with some changes to math and stdio to get it linking):

./druntime-test-runner
0.000s PASS release32 core.atomic
0.000s PASS release32 core.bitop
0.000s PASS release32 core.checkedint
0.005s PASS release32 core.demangle
0.000s PASS release32 core.exception
0.002s PASS release32 core.internal.arrayop
0.000s PASS release32 core.internal.convert
0.000s PASS release32 core.internal.hash
0.000s PASS release32 core.internal.string
0.000s PASS release32 core.math
0.000s PASS release32 core.memory
0.002s PASS release32 core.sync.barrier
0.015s PASS release32 core.sync.condition
0.000s PASS release32 core.sync.config
0.016s PASS release32 core.sync.mutex
0.016s PASS release32 core.sync.rwmutex
0.002s PASS release32 core.sync.semaphore
Segmentation fault (core dumped)

The seg fault is from core.thread:1351

unittest
{
    auto t1 = new Thread({
        foreach (_; 0 .. 20)
            Thread.getAll;
    }).start;
    auto t2 = new Thread({
        foreach (_; 0 .. 20)
            GC.collect; // this seg faults
    }).start;
    t1.join();
    t2.join();
}

Calling GC.collect from the main thread doesn't seg fault.

The core dump is not very helpful - the stack is garbage, but running a minimal program with the unit test under gdbserver I can see this:

Thread 1 "test" received signal SIGUSR1, User defined signal 1.
pthread_getattr_np (thread_id=0, attr=0xb6b302bc) at libpthread/nptl/pthread_getattr_np.c:47
47        iattr->schedpolicy = thread->schedpolicy;
(gdb) step

Thread 1 "test" received signal SIGUSR2, User defined signal 2.
0xb6e50d80 in epoll_wait (epfd=-1090521272, events=0x8, maxevents=2, timeout=-1224756080) at libc/sysdeps/linux/common/epoll.c:58
58      CANCELLABLE_SYSCALL(int, epoll_wait, (int epfd, struct epoll_event *events, int maxevents, int timeout),
(gdb) step

Thread 1 "test" received signal SIGSEGV, Segmentation fault.
0xfffffffc in ?? ()
(gdb)
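For debugging outside the test runner, the failing unittest can be lifted into a small standalone function along the lines of the minimal program run under gdbserver above. This is only a sketch: `stressCollectFromThread` and its `rounds` parameter are hypothetical names, not part of druntime. It runs the same two threads and counts the collections the second thread completes, so on a working port it should simply return `rounds`; on the broken uClibc build it presumably crashes the same way the unittest does.

```d
import core.memory : GC;
import core.thread : Thread;

/// Hypothetical standalone version of the core.thread unittest that
/// segfaulted: one thread repeatedly snapshots the thread list while a
/// second thread forces collections, each of which suspends and resumes
/// every other registered thread.
size_t stressCollectFromThread(size_t rounds = 20)
{
    size_t collections; // written only by t2, read only after t2.join()

    auto t1 = new Thread({
        foreach (_; 0 .. rounds)
            Thread.getAll();    // returns all threads druntime knows about
    }).start;

    auto t2 = new Thread({
        foreach (_; 0 .. rounds)
        {
            GC.collect();       // stop-the-world started from a non-main thread
            ++collections;
        }
    }).start;

    t1.join();
    t2.join();                  // join() makes t2's writes visible here
    return collections;
}
```

Returning a count gives a test something to check beyond "did not crash"; joining the second thread before reading the counter is what makes that read safe.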
Dec 17 2017
Joakim <dlang joakim.fea.st> writes:
On Sunday, 17 December 2017 at 17:12:41 UTC, Radu wrote:
 On Friday, 15 December 2017 at 14:24:08 UTC, David Nadlinger
 wrote:
  [...]
 My various attempts on getting it to run behaved very erratic.
 So I changed the parameters for cross compile, basically I
 removed all architecture specifics leaving only
 `-mtriple=arm-linux-gnueabihf`, and `-mfloat-abi=hard` on C
 side.

 My testing hardware is a ARM Cortex-A7,
 http://linux-sunxi.org/A33

I believe that triple defaults to ARMv5; are you sure your Openwrt kernel is built for ARMv7? Try running uname -m on the device to check. For example, most low- to mid-level smartphones these days ship with ARMv8 chips but their kernels are only built for 32-bit ARMv7, so they can only run 32-bit apps.
 With the compiler switches changed I could run my test program 
 and try the druntime test runner (albeit with some changes on 
 math and stdio to get it linking):

 ./druntime-test-runner
 0.000s PASS release32 core.atomic
 0.000s PASS release32 core.bitop
 0.000s PASS release32 core.checkedint
 0.005s PASS release32 core.demangle
 0.000s PASS release32 core.exception
 0.002s PASS release32 core.internal.arrayop
 0.000s PASS release32 core.internal.convert
 0.000s PASS release32 core.internal.hash
 0.000s PASS release32 core.internal.string
 0.000s PASS release32 core.math
 0.000s PASS release32 core.memory
 0.002s PASS release32 core.sync.barrier
 0.015s PASS release32 core.sync.condition
 0.000s PASS release32 core.sync.config
 0.016s PASS release32 core.sync.mutex
 0.016s PASS release32 core.sync.rwmutex
 0.002s PASS release32 core.sync.semaphore
 Segmentation fault (core dumped)

 The seg fault is from core.thread:1351

 unittest
 {
     auto t1 = new Thread({
         foreach (_; 0 .. 20)
             Thread.getAll;
     }).start;
     auto t2 = new Thread({
         foreach (_; 0 .. 20)
             GC.collect; // this seg faults
     }).start;
     t1.join();
     t2.join();
 }

 Calling GC.collect from the main thread doesn't seg fault.
Try running core.thread alone (./druntime-test-runner core.thread) and see if it makes a difference, as I've sometimes seen tested modules interfere with each other.

I see that there are a few places where Glibc is assumed in core.thread; make sure those are right on Uclibc too:

https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3301
https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3410

You can also try skipping the tests that segfault for now and make sure everything else works, by adding something like version(skip) before that failing unittest block, so you know the extent of the test problems.
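A minimal sketch of the `version(skip)` approach suggested above. It relies on `skip` not being one of D's predefined version identifiers, so anything guarded by it is compiled out unless the build defines it explicitly (with `ldc2 -d-version=skip` or `dmd -version=skip`). The enum `crashingTestsEnabled` is a hypothetical helper that just makes the current setting visible.

```d
import core.memory : GC;
import core.thread : Thread;

// "skip" is not a predefined version identifier, so every version (skip)
// block is compiled out unless the build defines it explicitly.
version (skip)
    enum crashingTestsEnabled = true;
else
    enum crashingTestsEnabled = false;

// The core.thread test that segfaults, parked until the port is fixed.
version (skip) unittest
{
    auto t1 = new Thread({
        foreach (_; 0 .. 20)
            Thread.getAll();
    }).start;
    auto t2 = new Thread({
        foreach (_; 0 .. 20)
            GC.collect(); // segfaults on the uClibc build
    }).start;
    t1.join();
    t2.join();
}
```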
 Core dump is not very helpful - stack is garbage, but running 
 with gdbserver a minimal program with the unit test I can see 
 this:

 Thread 1 "test" received signal SIGUSR1, User defined signal 1.
 pthread_getattr_np (thread_id=0, attr=0xb6b302bc) at 
 libpthread/nptl/pthread_getattr_np.c:47
 47        iattr->schedpolicy = thread->schedpolicy;
 (gdb) step

 Thread 1 "test" received signal SIGUSR2, User defined signal 2.
 0xb6e50d80 in epoll_wait (epfd=-1090521272, events=0x8, 
 maxevents=2, timeout=-1224756080) at 
 libc/sysdeps/linux/common/epoll.c:58
 58      CANCELLABLE_SYSCALL(int, epoll_wait, (int epfd, struct 
 epoll_event *events, int maxevents, int timeout),
 (gdb) step

 Thread 1 "test" received signal SIGSEGV, Segmentation fault.
 0xfffffffc in ?? ()
 (gdb)
The SIGUSR1/SIGUSR2 signals mean the GC ran fine. You'd need to delve more into the code and the implementation details mentioned above to track this down.

On Sunday, 17 December 2017 at 17:20:32 UTC, Radu wrote:
 Yes - latest LDC versions make cross compiling a breeze so 
 kudos to you guys for making this happening. I'm using Linux 
 subsystem for Window btw. so for me this is even more fun as I 
 can work on both environments natively :)
Yeah, you could just use the Windows ldc too, assuming you have a cross-compiler from that OS, as shown on the wiki for Windows with the Android NDK.
 The modifications need it surface deep are very few - some math 
 and memory streams functions are missing.
I don't know how much it differs from Glibc, but we'd always be interested in a port, assuming you have the time to submit a pull like this recent one for Musl: https://github.com/dlang/druntime/pull/1997
 The road block looks to be somewhere in the GC and TLS, or the 
 interaction of them (at least this is my feeling ATM)
Not being able to do an explicit collect there isn't that big a deal: I'd skip that test for now and run everything else, then come back to that one once you have an idea of the bigger picture.
Dec 17 2017
Suliman <evermind live.ru> writes:
Off-topic: there is another interesting lib: https://uclibc-ng.org/
Dec 18 2017
Radu <void null.pt> writes:
On Sunday, 17 December 2017 at 19:05:04 UTC, Joakim wrote:
 On Sunday, 17 December 2017 at 17:12:41 UTC, Radu wrote:
  [...]
 I believe that triple defaults to ARMv5, are you sure your
 Openwrt kernel is built for ARMv7? Try running uname -m on the
 device to check.
 [...]
 Not being able to do an explicit collect there isn't that big a
 deal: I'd skip that test for now and run everything else, then
 come back to that one once you have an idea of the bigger
 picture.

Got some time to work on this. Just to clarify, I'm developing against uClibc-ng 1.0.9; I noticed others suggesting it and wanted to make that clear.

Re. the architecture: it is an armv7a, as 'uname -a' says:

'Linux fs 3.4.39 #249 SMP PREEMPT Wed Oct 4 12:07:05 MYT 2017 armv7l GNU/Linux'

I could not produce any working binary by specifying the armv7a architecture to ldc, so I used the generic arm architecture for gnueabihf, as previously stated.

I managed to get the druntime tester running (minus some math functions and memstream) except for one specific blocking issue: Thread.suspend does not work, it produces a segfault. To test this I commented out all suspendAll/resumeAll unittests from core.thread and stubbed out GC.collect().

This issue is not linked to the GC: the segfault happens even with the GC.collect function disabled and the suspendAll/resumeAll unittests enabled. The GC just happens to use the suspend mechanics and so exposes the core issue.

From what I can see in gdb, 'thread_resumeHandler' is to blame: it looks like 'sem_post( &suspendCount )' immediately triggers the resumeSignal and the call to 'sigsuspend( &sigres )' is never made. Like:

464         status = sem_post( &suspendCount );
(gdb) n

Thread 2 "druntime-test-r" received signal SIGUSR2, User defined signal 2.
0x001b46d0 in core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:464
464         status = sem_post( &suspendCount );
(gdb) info threads
  Id   Target Id                               Frame
  1    Thread 16005.16005 "druntime-test-r" 0x001ba7a0 in _D4core6thread5Fiber5stateMxFNaNbNdNiNfZEQBnQBlQBh5State (this=0xb6d34980) at thread.d:4533
* 2    Thread 16005.16273 "druntime-test-r" 0x001b46d0 in core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:464
(gdb) bt
#0  0x001b46d0 in core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:464
#1  0x001b483c in core.thread.callWithStackShell(scope void(void*) nothrow delegate) (fn=...) at thread.d:2600
#2  0x001b45f8 in thread_suspendHandler (sig=10) at thread.d:487
#3  0xfffffffe in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) n

Thread 2 "druntime-test-r" received signal SIGSEGV, Segmentation fault.
0xfffffffc in ?? ()
(gdb) bt
#0  0xfffffffc in ?? ()
#1  0xfffffffe in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Jan 09
"David Nadlinger" <code klickverbot.at> writes:
On 10 Jan 2018, at 0:27, Radu via digitalmars-d-ldc wrote:
 From what I can see in gdb 'thread_resumeHandler' is to blame, it 
 looks like 'sem_post( &suspendCount )' will immediately trigger the 
 resumeSignal and the call for 'sigsuspend( &sigres )' is never made.
You mean thread_suspendHandler? Perhaps single-stepping through the code and having a look where the stack is corrupted would yield some insight? Is there possibly some ABI incompatibility caused by callWithStackShell? sem_post shouldn't cause anything to happen on the calling thread itself; and it is explicitly documented to be re-entrant w.r.t. signals. —David
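To make the suspend/resume discussion concrete, here is a heavily simplified, hypothetical model of the signal handshake behind a stop-the-world suspend: the target thread's suspend handler posts a semaphore (sem_post is async-signal-safe), parks in `sigsuspend` until the resume signal arrives, and posts again once it is running. It is not druntime's code (the real `thread_suspendHandler` first goes through `callWithStackShell`, as the backtraces show), and the names `onSuspend`, `onResume`, `worker` and `suspendResumeOnce` are invented for illustration. It assumes Linux with POSIX threads and uses SIGUSR1/SIGUSR2, the signals numbered 10 and 12 in the backtraces in this thread.

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.sys.posix.pthread : pthread_create, pthread_join;
import core.sys.posix.semaphore : sem_destroy, sem_init, sem_post, sem_t, sem_wait;
import core.sys.posix.signal;
import core.sys.posix.sys.types : pthread_t;

// Hypothetical model only; not druntime's implementation.
enum suspendSig = SIGUSR1;   // signal 10 in the backtraces
enum resumeSig = SIGUSR2;    // signal 12 in the backtraces

__gshared sem_t ack;         // posted by the target thread at each step
shared bool stopWorker;
shared int resumedCount;

extern (C) void onSuspend(int) nothrow
{
    sem_post(&ack);                  // "I am parked"
    sigset_t waitMask;
    sigfillset(&waitMask);
    sigdelset(&waitMask, resumeSig); // only the resume signal may wake us
    sigsuspend(&waitMask);           // returns after onResume has run
    atomicOp!"+="(resumedCount, 1);
    sem_post(&ack);                  // "I am running again"
}

// Only needs to exist so the resume signal interrupts sigsuspend.
extern (C) void onResume(int) nothrow {}

extern (C) void* worker(void*) nothrow
{
    while (!atomicLoad(stopWorker)) {} // busy thread to be suspended
    return null;
}

/// Suspends and resumes one thread once; returns the number of resumes seen.
int suspendResumeOnce()
{
    atomicStore(stopWorker, false);
    atomicStore(resumedCount, 0);

    sigaction_t sa, oldSuspend, oldResume;
    sigfillset(&sa.sa_mask);  // block all signals while a handler runs, so an
                              // early resume signal stays pending, not lost
    sa.sa_handler = &onSuspend;
    sigaction(suspendSig, &sa, &oldSuspend);
    sa.sa_handler = &onResume;
    sigaction(resumeSig, &sa, &oldResume);

    sem_init(&ack, 0, 0);
    pthread_t tid;
    pthread_create(&tid, null, &worker, null);

    pthread_kill(tid, suspendSig);
    sem_wait(&ack);           // worker is now inside onSuspend
    pthread_kill(tid, resumeSig);
    sem_wait(&ack);           // worker has returned from sigsuspend

    atomicStore(stopWorker, true);
    pthread_join(tid, null);
    sem_destroy(&ack);

    // Restore the previous handlers; the D runtime may use these signals.
    sigaction(suspendSig, &oldSuspend, null);
    sigaction(resumeSig, &oldResume, null);
    return atomicLoad(resumedCount);
}
```

Blocking every signal while the handlers run is the important design choice here: if the resume signal arrives before the target reaches `sigsuspend`, it stays pending rather than being lost. Restoring the old handlers matters because, in the build discussed here, the runtime's own GC handlers use these same two signals.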
Jan 10
Radu <void null.pt> writes:
On Wednesday, 10 January 2018 at 11:13:17 UTC, David Nadlinger 
wrote:
 On 10 Jan 2018, at 0:27, Radu via digitalmars-d-ldc wrote:
 From what I can see in gdb 'thread_resumeHandler' is to blame, 
 it looks like 'sem_post( &suspendCount )' will immediately 
 trigger the resumeSignal and the call for 'sigsuspend( &sigres 
 )' is never made.
 You mean thread_suspendHandler? Perhaps single-stepping through
 the code and having a look where the stack is corrupted would
 yield some insight? Is there possibly some ABI incompatibility
 caused by callWithStackShell? sem_post shouldn't cause anything
 to happen on the calling thread itself; and it is explicitly
 documented to be re-entrant w.r.t. signals.

David, indeed sem_post works correctly; I guess gdb interpreted the sequence in the wrong order. Moving the breakpoint to thread_resumeHandler, I can see that the handler gets called, but I think you are right about the ABI. Observe:

Thread 2 "druntime-test-r" received signal SIGUSR2, User defined signal 2.
0xb6e88648 in ?? () from target:/lib/libc.so.1
(gdb) bt
#0  0xb6e88648 in ?? () from target:/lib/libc.so.1
#1  0xb6e50dd0 in sigsuspend () from target:/lib/libc.so.1
#2  0x001b46e8 in core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:467
#3  0x001b483c in core.thread.callWithStackShell(scope void(void*) nothrow delegate) (fn=...) at thread.d:2600
#4  0x001b45f8 in thread_suspendHandler (sig=10) at thread.d:487
#5  0xfffffffe in ?? ()
(gdb) c

Thread 2 "druntime-test-r" hit Breakpoint 1, thread_resumeHandler (sig=12) at thread.d:494
warning: Source file is more recent than executable.
494         assert( sig == resumeSignalNumber );
(gdb) i f
Stack level 0, frame at 0xb572f4d8:
 pc = 0x1b487c in thread_resumeHandler (thread.d:494); saved pc = 0xfffffffe
 called by frame at 0xb572f4d8
 source language d.
 Arglist at 0xb572f4c8, args: sig=12
 Locals at 0xb572f4c8, Previous frame's sp is 0xb572f4d8
 Saved registers:
  r11 at 0xb572f4d0, lr at 0xb572f4d4
.......
(gdb) disas
Dump of assembler code for function thread_resumeHandler:
   0x001b4864 <+0>:     push    {r11, lr}
   0x001b4868 <+4>:     mov     r11, sp
   0x001b486c <+8>:     sub     sp, sp, #8
   0x001b4870 <+12>:    ldr     r1, [pc, #52]   ; 0x1b48ac <thread_resumeHandler+72>
   0x001b4874 <+16>:    ldr     r1, [pc, r1]
   0x001b4878 <+20>:    str     r0, [sp, #4]
   0x001b487c <+24>:    ldr     r0, [sp, #4]
   0x001b4880 <+28>:    ldr     r1, [r1]
   0x001b4884 <+32>:    cmp     r0, r1
   0x001b4888 <+36>:    bne     0x1b4894 <thread_resumeHandler+48>
   0x001b488c <+40>:    mov     sp, r11
=> 0x001b4890 <+44>:    pop     {r11, pc}
   0x001b4894 <+48>:    ldr     r0, [pc, #20]   ; 0x1b48b0 <thread_resumeHandler+76>
   0x001b4898 <+52>:    add     r1, pc, r0
   0x001b489c <+56>:    mov     r0, #13
   0x001b48a0 <+60>:    mov     r2, #238        ; 0xee
   0x001b48a4 <+64>:    orr     r2, r2, #256    ; 0x100
   0x001b48a8 <+68>:    bl      0xf00c8 <_d_assert>
   0x001b48ac <+72>:    mulseq  r4, r8, r5
   0x001b48b0 <+76>:                    ; <UNDEFINED> instruction: 0x00117bd1
(gdb) ni
0x001b4890 in thread_resumeHandler (sig=-2) at thread.d:499
499     }
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0xfffffffe

It looks like the PC is invalid, causing the segmentation fault.
Jan 10
Joakim <dlang joakim.fea.st> writes:
On Wednesday, 10 January 2018 at 14:17:53 UTC, Radu wrote:
 On Wednesday, 10 January 2018 at 11:13:17 UTC, David Nadlinger 
 wrote:
  [...]
 David, indeed sem_post works correctly, I guess gdb interpreted
 the sequence in the wrong order. [...]

Have you ported much of druntime to Uclibc? It currently assumes Glibc on linux by default, so if there are differences between the way the two handle such signals, it can cause problems. For example, the Android Java Runtime intercepts SIGUSR1/SIGUSR2 and doesn't run their signal handlers, so I had to work around that issue: https://github.com/dlang/druntime/pull/1851#discussion_r123886260 You may be running across a similar incompatibility, so I suggest you port all the version-dependent blocks of that module and its dependent modules first.
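One quick sanity check while porting: druntime selects many of its C declarations through D's predefined `CRuntime_*` version identifiers, so it helps to confirm which branch a build actually takes. A minimal sketch with a hypothetical `cRuntimeName` helper; note that `CRuntime_UClibc` is a predefined identifier in newer D releases and is an assumption for the LDC 1.6 toolchain used here. If a uClibc build reports glibc (plausibly because the `gnueabihf` environment in the target triple implies glibc), the Glibc-specific struct layouts are being used as-is.

```d
/// Hypothetical helper: reports which C runtime the compiler configured
/// this build for, using D's predefined CRuntime_* version identifiers.
string cRuntimeName()
{
    version (CRuntime_Glibc)
        return "glibc";
    else version (CRuntime_Musl)
        return "musl";
    else version (CRuntime_UClibc)
        return "uClibc";
    else version (CRuntime_Bionic)
        return "Bionic";
    else
        return "other";
}
```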
Jan 10
Radu <void null.pt> writes:
On Wednesday, 10 January 2018 at 15:56:52 UTC, Joakim wrote:
 On Wednesday, 10 January 2018 at 14:17:53 UTC, Radu wrote:
 On Wednesday, 10 January 2018 at 11:13:17 UTC, David Nadlinger 
 wrote:
  [...]
 David, indeed sem_post works correctly, I guess gdb interpreted
 the sequence in the wrong order. [...]
 Have you ported much of druntime to Uclibc? It currently assumes
 Glibc on linux by default, so if there are differences between
 the way the two handle such signals, it can cause problems. For
 example, the Android Java Runtime intercepts SIGUSR1/SIGUSR2 and
 doesn't run their signal handlers, so I had to work around that
 issue:
 https://github.com/dlang/druntime/pull/1851#discussion_r123886260
 You may be running across a similar incompatibility, so I suggest
 you port all the version-dependent blocks of that module and its
 dependent modules first.

I missed a bunch of details that were killing the signal handling (various size differences in structs); thanks for the guidance! Fixed now.

druntime tests are passing in release mode now. The debug build fails with:

core.exception.AssertError rt/sections_elf_shared.d(116): Assertion failure

Code looks like:

invariant()
{
    assert(_moduleGroup.modules.length);

    static if (SharedELF)
    {
        assert(_tlsMod || !_tlsSize); // <-- fails
    }
}

Stack trace:
#0  rt.sections_elf_shared.DSO.__invariant1() const (this=...) at sections_elf_shared.d:116
#1  0x0029e490 in rt.sections_elf_shared.DSO.__invariant() const (this=<error reading variable: Cannot access memory at address 0xe9>) at sections_elf_shared.d:67
#2  0x0029e4d8 in rt.sections_elf_shared.DSO.gcRanges() inout (this=...) at sections_elf_shared.d:104
#3  0x00293e14 in _D2rt6memory16initStaticDataGCFZ14__foreachbody1MFKSQBy19sections_elf_shared3DSOZi (sg=...) at memory.d:23
#4  0x0029e350 in _D2rt19sections_elf_shared3DSO7opApplyFMDFKSQBqQBqQyZiZi (dg=...) at sections_elf_shared.d:73
#5  0x00293de4 in rt.memory.initStaticDataGC() () at memory.d:21
#6  0x00284984 in rt_init () at dmain2.d:184
#7  0x002851d4 in rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) function).runAll() () at dmain2.d:478
#8  0x00285138 in rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) function).tryExec(scope void() delegate) (dg=...) at dmain2.d:454
#9  0x00285030 in _d_run_main (argc=1, argv=0xbefffd54, mainFunc=0xc5210 <D main>) at dmain2.d:487
#10 0x000c5394 in main (argc=1, argv=0xbefffd54) at __entrypoint.d:8
#11 0xb6e88a84 in __uClibc_main () from target:/lib/libc.so.1
#12 0x00000000 in ?? ()

I don't really understand that invariant. I see that those vars are initialized way before, in the init part, and have values, for example:

_tlsMod = 0 and _tlsSize = 388

Stack trace:
#0  _D2rt19sections_elf_shared12scanSegmentsFNbNiKxS4core3sys5linux4link12dl_phdr_infoPSQDeQDe3DSOZv (info=..., pdso=0x307150) at sections_elf_shared.d:871
#1  0x0029ef18 in _d_dso_registry (arg=0xbefffbc8 "\001") at sections_elf_shared.d:455
#2  0x000c530c in ldc.register_dso ()
#3  0x000c5344 in ldc.dso_ctor.11test_runner ()
#4  0xb6fea548 in _dl_run_init_array () from target:/lib/ld-uClibc.so.0
#5  0xb6e889e4 in __uClibc_main () from target:/lib/libc.so.1
#6  0x00000000 in ?? ()

Any idea why this fails and how to fix it?
Jan 14
Joakim <dlang joakim.fea.st> writes:
On Sunday, 14 January 2018 at 21:33:28 UTC, Radu wrote:
 On Wednesday, 10 January 2018 at 15:56:52 UTC, Joakim wrote:
 [...]
 I missed a bunch of details that where killing the signal
 handling, thanks for the guidance!, various size differences on
 structs. Fixed now.

Figured that was it; that's why I asked you a couple of times how much of druntime you had ported.
 druntime tests are passing in release mode now.

 [...]
_tlsMod and _tlsSize are extracted from shared libraries and then passed to __tls_get_addr to initialize thread-local storage for each library. That invariant makes sure the TLS index _tlsMod isn't 0 along with a non-zero size, not sure why David checks for that. It could be he doesn't expect the index 0 for a shared library whereas uClibc is okay with that? I don't use this module or arbitrary shared libraries on Android/ARM, so I haven't had to mess with it.
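This is likely also why release mode passed while the debug build failed: struct invariants, like asserts, are compiled out with `-release`. A minimal sketch that mirrors only the failing condition from `sections_elf_shared.d(116)`; `TlsInfo` and `invariantFails` are hypothetical, and the real `DSO` struct has many more fields. With the values observed on uClibc (`_tlsMod = 0`, `_tlsSize = 388`), the first member-function call trips the invariant in a non-release build.

```d
import core.exception : AssertError;

/// Hypothetical stand-in for the two TLS fields of druntime's DSO struct
/// (rt.sections_elf_shared); the real struct has many more members.
struct TlsInfo
{
    size_t tlsMod;  // TLS module index reported by the dynamic loader
    size_t tlsSize; // size of the module's TLS block in bytes

    // Same condition as the failing check: a non-zero TLS size must
    // come with a non-zero module index.
    invariant()
    {
        assert(tlsMod || !tlsSize);
    }

    // Calling a public member function runs the invariant on entry and
    // exit, unless invariants are compiled out (e.g. with -release).
    size_t size() const { return tlsSize; }
}

/// Returns true if using a TlsInfo with these values trips the invariant.
bool invariantFails(size_t mod, size_t bytes)
{
    try
    {
        auto info = TlsInfo(mod, bytes);
        cast(void) info.size();
    }
    catch (AssertError)
    {
        return true;
    }
    return false;
}
```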
Jan 15
"David Nadlinger" <code klickverbot.at> writes:
On 15 Jan 2018, at 10:05, Joakim via digitalmars-d-ldc wrote:
 _tlsMod and _tlsSize are extracted from shared libraries and then 
 passed to __tls_get_addr to initialize thread-local storage for each 
 library.  That invariant makes sure the TLS index _tlsMod isn't 0 
 along with a non-zero size, not sure why David checks for that.  It 
 could be he doesn't expect the index 0 for a shared library whereas 
 uClibc is okay with that?
We inherited that from Martin's code – presumably, it's just never the case on glibc. If all the tests work with shared libraries (DMD test suite and runtime unit tests, plus druntime/test, as run by ctest), there is nothing to worry about. — David
Jan 15
prev sibling parent reply Joakim <dlang joakim.fea.st> writes:
On Friday, 15 December 2017 at 14:06:37 UTC, Radu wrote:
 Trying to run some D code on Openwrt with Uclibc and got stuck 
 by broken GC.

 [...]

 Runtime libs were compiled with:

 ====================================
 ldc-build-runtime --dFlags="-w;-mtriple=armv7-linux-gnueabihf 
 -mcpu=cortex-a7 -L-lstdc++" --cFlags="-mcpu=cortex-a7 
 -mfloat-abi=hard -D__UCLIBC_HAS_BACKTRACE__ 
 -D__UCLIBC_HAS_TLS__" --targetSystem="Linux;UNIX" 
 BUILD_SHARED_LIBS=OFF
 ====================================
The first thing I'd do is build and run the test runners, then make sure no tests are failing, particularly in druntime.

Another thing I notice is that you don't separate many of those C and D flags with semicolons: not sure how that worked for you, as I get errors if I try something similar. Also, you need to specify the C cross-compiler with CC=arm-openwrt-linux-gcc before running ldc-build-runtime; maybe you did that but forgot to mention it.

It is fairly easy to cross-compile the test runners too if you pass the --testrunners flag; see the instructions for the RPi and Android for examples:

https://wiki.dlang.org/Building_LDC_runtime_libraries
https://wiki.dlang.org/Build_D_for_Android

You may need to make some modifications to druntime or Phobos to get everything to compile, and you may have to specify some linker flags too, to get the test runners to link. Let us know how it works out.

While you could reuse most of the glibc declarations for now, you may eventually need to patch druntime for Uclibc, as was done before for Bionic and the NetBSD libc, for example:

https://github.com/dlang/druntime/pull/734
https://github.com/dlang/druntime/pull/1494
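The semicolon point can be checked mechanically. Below is a small sketch that splits a flags value on ';' and reports every item that still contains a space, i.e. several flags that would be passed as one argument. It assumes, as the advice above implies, that ldc-build-runtime treats each ';'-separated item as a single compiler flag; `itemsWithSpaces` is a hypothetical helper, not part of LDC.

```d
import std.algorithm.searching : canFind;
import std.array : split;

/// Hypothetical helper: splits a --dFlags/--cFlags value on ';' and returns
/// every item that still contains a space, i.e. several flags that would
/// travel to the compiler as a single argument.
string[] itemsWithSpaces(string flags)
{
    string[] suspicious;
    foreach (item; flags.split(';'))
    {
        if (item.canFind(' '))
            suspicious ~= item;
    }
    return suspicious;
}
```

Applied to the --dFlags value quoted above, it flags "-mtriple=armv7-linux-gnueabihf -mcpu=cortex-a7 -L-lstdc++" as one item; the fully separated form "-w;-mtriple=armv7-linux-gnueabihf;-mcpu=cortex-a7;-L-lstdc++" yields no suspicious items. The same applies to the --cFlags value.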
Dec 16 2017
parent Radu <void null.pt> writes:
On Saturday, 16 December 2017 at 14:14:40 UTC, Joakim wrote:
 [...]
Test runners were out of the question, as no program started; see my reply to David.

Yeah, I set up CC correctly, but curiously, specifying a more fitting platform triple and -march for GCC produced non-working binaries, so I had to revert to the defaults.

Yes, the latest LDC versions make cross-compiling a breeze, so kudos to you guys for making this happen. I'm using the Windows Subsystem for Linux, btw, so for me this is even more fun, as I can work in both environments natively :)

The modifications needed are very few and only surface-deep: some math and memory stream functions are missing. The roadblock looks to be somewhere in the GC and TLS, or in the interaction between them (at least that's my feeling ATM).
Dec 17 2017