digitalmars.D.ldc - Openwrt Linux Uclibc ARM GC issue

Radu (71/71) Dec 15 2017 Trying to run some D code on Openwrt with Uclibc and got stuck by

David Nadlinger (17/25) Dec 15 2017 The assert is inside an invariant which checks that the TLS information

Radu (62/90) Dec 17 2017 My various attempts at getting it to run behaved very erratically.

Joakim (31/135) Dec 17 2017 I believe that triple defaults to ARMv5, are you sure your

Suliman (1/1) Dec 18 2017 Off-topic: there is another interesting lib: https://uclibc-ng.org/
Radu (58/203) Jan 09 2018 Got some time to work on this - just to clarify I'm developing

David Nadlinger (7/10) Jan 10 2018 You mean thread_suspendHandler? Perhaps single-stepping through the code...

Radu (69/82) Jan 10 2018 David, indeed sem_post works correctly, I guess gdb interpreted

Joakim (11/17) Jan 10 2018 Have you ported much of druntime to Uclibc? It currently assumes

Radu (59/78) Jan 14 2018 I missed a bunch of details that were killing the signal

Joakim (11/18) Jan 15 2018 Figured that was it, that's why I asked you a couple times how

David Nadlinger (6/12) Jan 15 2018 We inherited that from Martin's code – presumably, it's just never the...

Joakim (22/56) Dec 16 2017 First thing I'd do is build and run the test runners, then make

Radu (14/75) Dec 17 2017 Test runners were out of the question as no program started. See

Radu <void null.pt> writes:

I'm trying to run some D code on OpenWrt with uClibc and got stuck 
on what looks like a broken GC.

Using LDC 1.6
====================================
LDC - the LLVM D compiler (1.6.0):
   based on DMD v2.076.1 and LLVM 5.0.0
   built with LDC - the LLVM D compiler (1.6.0)
   Default target: x86_64-unknown-linux-gnu
   Host CPU: broadwell
   http://dlang.org - http://wiki.dlang.org/LDC

   Registered Targets:
     aarch64    - AArch64 (little endian)
     aarch64_be - AArch64 (big endian)
     arm        - ARM
     arm64      - ARM64 (little endian)
     armeb      - ARM (big endian)
     nvptx      - NVIDIA PTX 32-bit
     nvptx64    - NVIDIA PTX 64-bit
     ppc32      - PowerPC 32
     ppc64      - PowerPC 64
     ppc64le    - PowerPC 64 LE
     thumb      - Thumb
     thumbeb    - Thumb (big endian)
     x86        - 32-bit X86: Pentium-Pro and above
     x86-64     - 64-bit X86: EM64T and AMD64
====================================

The runtime libraries were compiled with:

====================================
ldc-build-runtime --dFlags="-w;-mtriple=armv7-linux-gnueabihf 
-mcpu=cortex-a7 -L-lstdc++" --cFlags="-mcpu=cortex-a7 
-mfloat-abi=hard -D__UCLIBC_HAS_BACKTRACE__ -D__UCLIBC_HAS_TLS__" 
--targetSystem="Linux;UNIX" BUILD_SHARED_LIBS=OFF
====================================


The minimal program is:

++++++++++++++++++++
import core.memory;

void main()
{
   GC.collect();
}
++++++++++++++++++++

Compiled with `ldc2 -mtriple=armv7-linux-gnueabihf 
-mcpu=cortex-a7 -gcc=arm-openwrt-linux-gcc`

When run, the program intermittently fails with this error:

====================================
core.exception.AssertError rt/sections_elf_shared.d(116): 
Assertion failure
Fatal error in EH code: _Unwind_RaiseException failed with reason 
code: 9
Aborted (core dumped)
====================================


GDB on the coredump:
====================================
(gdb) bt

ldso/ldso/ldso.c:418

flags=<optimized out>, piclib=-1225360472, ppnt=0x21, 
infile=<optimized out>) at ldso/ldso/dl-elf.c:442

rpnt=0xbeea5d9c, libname=0x0) at ldso/ldso/dl-elf.c:703





out>, execute=<optimized out>) at libpthread/nptl/forward.c:152
Backtrace stopped: previous frame identical to this frame 
(corrupt stack?)
====================================

Any idea what might be wrong?

Dec 15 2017

"David Nadlinger" <code klickverbot.at> writes:

On 15 Dec 2017, at 14:06, Radu via digitalmars-d-ldc wrote:
 When run, I get this error spuriously:

 ====================================
 core.exception.AssertError rt/sections_elf_shared.d(116): Assertion 
 failure
 Fatal error in EH code: _Unwind_RaiseException failed with reason 
 code: 9
 Aborted (core dumped)
 ====================================

The assert is inside an invariant which checks that the TLS information 
has been extracted successfully. Perhaps uclibc uses a TLS 
implementation that is not ABI-compatible with glibc? (druntime needs to 
determine the TLS ranges to register them with the GC, for the main 
thread as well as newly spawned ones.)
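
To make the TLS point concrete, here is a minimal sketch of how a 
program can ask the dynamic loader for each loaded object's TLS module 
ID and TLS segment size through dl_iterate_phdr. It is only an 
illustration, not the druntime code: it assumes the glibc-oriented 
bindings in core.sys.linux.link and core.sys.linux.elf also match the 
target libc, and the names visit and countTls are made up for this 
example.

```d
import core.stdc.stdio : printf;
import core.sys.linux.elf;   // PT_TLS
import core.sys.linux.link;  // dl_iterate_phdr, dl_phdr_info

// Hypothetical callback: the loader invokes it once per loaded ELF object.
extern (C) int visit(dl_phdr_info* info, size_t size, void* data)
{
    auto count = cast(int*) data;

    // dlpi_tls_modid is only valid if the loader passed a large enough struct.
    immutable hasModId =
        size >= dl_phdr_info.dlpi_tls_modid.offsetof + size_t.sizeof;
    size_t modId = hasModId ? info.dlpi_tls_modid : 0;

    const(char)* name = (info.dlpi_name && info.dlpi_name[0])
        ? info.dlpi_name : "(main program)".ptr;

    foreach (ref phdr; info.dlpi_phdr[0 .. info.dlpi_phnum])
    {
        if (phdr.p_type != PT_TLS)
            continue;
        ++*count;
        printf("%s: tlsMod=%zu tlsSize=%zu\n",
               name, modId, cast(size_t) phdr.p_memsz);
    }
    return 0; // zero means "keep iterating"
}

// Hypothetical helper: returns how many PT_TLS segments were found.
int countTls()
{
    int n;
    dl_iterate_phdr(&visit, &n);
    return n;
}

void main()
{
    printf("objects with a TLS segment: %d\n", countTls());
}
```

Comparing the printed module IDs and sizes on a glibc system and on 
the uClibc target should show whether the two loaders report TLS 
information differently.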

Where in the program lifecycle does the error occur? From the backtrace, 
it looks like during C runtime startup, in which case I am not quite 
seeing the connection to the GC.

Why unwinding fails is another question, but not one I would be terribly 
worried about – it is possible that the error e.g. just occurs too 
early for the EH machinery to be properly set up yet. Other low-level 
parts of druntime have been converted to directly abort (e.g. using 
assert(0)) instead. In fact, I am about to overhaul sections_elf_shared 
in that respect anyway to improve error reporting when mixing shared and 
non-shared builds.

  — David

Dec 15 2017

Radu <void null.pt> writes:

On Friday, 15 December 2017 at 14:24:08 UTC, David Nadlinger 
wrote:
 On 15 Dec 2017, at 14:06, Radu via digitalmars-d-ldc wrote:
 When run, I get this error spuriously:

 ====================================
 core.exception.AssertError rt/sections_elf_shared.d(116): 
 Assertion failure
 Fatal error in EH code: _Unwind_RaiseException failed with 
 reason code: 9
 Aborted (core dumped)
 ====================================

 The assert is inside an invariant which checks that the TLS 
 information has been extracted successfully. Perhaps uclibc 
 uses a TLS implementation that is not ABI-compatible with 
 glibc? (druntime needs to determine the TLS ranges to register 
 them with the GC, for the main thread as well as newly spawned 
 ones.)

 Where in the program lifecycle does the error occur? From the 
 backtrace, it looks like during C runtime startup, in which 
 case I am not quite seeing the connection to the GC.

 Why unwinding fails is another question, but not one I would be 
 terribly worried about – it is possible that the error e.g. 
 just occurs too early for the EH machinery to be properly set 
 up yet. Other low-level parts of druntime have been converted 
 to directly abort (e.g. using assert(0)) instead. In fact, I am 
 about to overhaul sections_elf_shared in that respect anyway to 
 improve error reporting when mixing shared and non-shared 
 builds.

  — David

My various attempts at getting it to run behaved very erratically, 
so I changed the cross-compile parameters: I removed all 
architecture specifics, leaving only 
`-mtriple=arm-linux-gnueabihf`, and `-mfloat-abi=hard` on the C side.

My testing hardware is an ARM Cortex-A7, http://linux-sunxi.org/A33

With the compiler switches changed I could run my test program 
and try the druntime test runner (albeit with some changes to 
math and stdio to get it to link):

./druntime-test-runner
0.000s PASS release32 core.atomic
0.000s PASS release32 core.bitop
0.000s PASS release32 core.checkedint
0.005s PASS release32 core.demangle
0.000s PASS release32 core.exception
0.002s PASS release32 core.internal.arrayop
0.000s PASS release32 core.internal.convert
0.000s PASS release32 core.internal.hash
0.000s PASS release32 core.internal.string
0.000s PASS release32 core.math
0.000s PASS release32 core.memory
0.002s PASS release32 core.sync.barrier
0.015s PASS release32 core.sync.condition
0.000s PASS release32 core.sync.config
0.016s PASS release32 core.sync.mutex
0.016s PASS release32 core.sync.rwmutex
0.002s PASS release32 core.sync.semaphore
Segmentation fault (core dumped)

The seg fault is from core.thread:1351

unittest
{
     auto t1 = new Thread({
         foreach (_; 0 .. 20)
             Thread.getAll;
     }).start;
     auto t2 = new Thread({
         foreach (_; 0 .. 20)
             GC.collect; // this seg faults
     }).start;
     t1.join();
     t2.join();
}

Calling GC.collect from the main thread doesn't seg fault.
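
For anyone who wants to reproduce this outside the test runner, the 
unittest above can be wrapped into a standalone program roughly like 
this. It is a sketch: the function name runCollectFromThreads is made 
up, and on a working runtime it should simply finish and print the 
final message.

```d
import core.memory : GC;
import core.stdc.stdio : puts;
import core.thread : Thread;

// Hypothetical wrapper around the body of the failing unittest.
bool runCollectFromThreads()
{
    auto t1 = new Thread({
        foreach (_; 0 .. 20)
            Thread.getAll;
    }).start;
    auto t2 = new Thread({
        foreach (_; 0 .. 20)
            GC.collect; // collection triggered from a non-main thread
    }).start;
    t1.join();
    t2.join();
    return true;
}

void main()
{
    if (runCollectFromThreads())
        puts("both threads joined");
}
```

Cross-compiled with the same ldc2 switches as the minimal program, 
this is small enough to single-step under gdbserver.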

The core dump is not very helpful because the stack is garbage, 
but running a minimal program containing that unit test under 
gdbserver shows this:

Thread 1 "test" received signal SIGUSR1, User defined signal 1.
pthread_getattr_np (thread_id=0, attr=0xb6b302bc) at 
libpthread/nptl/pthread_getattr_np.c:47
47        iattr->schedpolicy = thread->schedpolicy;
(gdb) step

Thread 1 "test" received signal SIGUSR2, User defined signal 2.
0xb6e50d80 in epoll_wait (epfd=-1090521272, events=0x8, 
maxevents=2, timeout=-1224756080) at 
libc/sysdeps/linux/common/epoll.c:58
58      CANCELLABLE_SYSCALL(int, epoll_wait, (int epfd, struct 
epoll_event *events, int maxevents, int timeout),
(gdb) step

Thread 1 "test" received signal SIGSEGV, Segmentation fault.
0xfffffffc in ?? ()
(gdb)

Dec 17 2017

Joakim <dlang joakim.fea.st> writes:

On Sunday, 17 December 2017 at 17:12:41 UTC, Radu wrote:
 On Friday, 15 December 2017 at 14:24:08 UTC, David Nadlinger 
 wrote:
 On 15 Dec 2017, at 14:06, Radu via digitalmars-d-ldc wrote:
 When run, I get this error spuriously:

 ====================================
 core.exception.AssertError rt/sections_elf_shared.d(116): 
 Assertion failure
 Fatal error in EH code: _Unwind_RaiseException failed with 
 reason code: 9
 Aborted (core dumped)
 ====================================

 The assert is inside an invariant which checks that the TLS 
 information has been extracted successfully. Perhaps uclibc 
 uses a TLS implementation that is not ABI-compatible with 
 glibc? (druntime needs to determine the TLS ranges to register 
 them with the GC, for the main thread as well as newly spawned 
 ones.)

 Where in the program lifecycle does the error occur? From the 
 backtrace, it looks like during C runtime startup, in which 
 case I am not quite seeing the connection to the GC.

 Why unwinding fails is another question, but not one I would 
 be terribly worried about – it is possible that the error e.g. 
 just occurs too early for the EH machinery to be properly set 
 up yet. Other low-level parts of druntime have been converted 
 to directly abort (e.g. using assert(0)) instead. In fact, I 
 am about to overhaul sections_elf_shared in that respect 
 anyway to improve error reporting when mixing shared and 
 non-shared builds.

  — David

 My various attempts on getting it to run behaved very erratic.
 So I changed the parameters for cross compile, basically I 
 removed all architecture specifics leaving only 
 `-mtriple=arm-linux-gnueabihf`, and `-mfloat-abi=hard` on C 
 side.

 My testing hardware is a ARM Cortex-A7, 
 http://linux-sunxi.org/A33

I believe that triple defaults to ARMv5. Are you sure your 
OpenWrt kernel is built for ARMv7?  Try running uname -m on the 
device to check.  For example, most low- to mid-level smartphones 
these days ship with ARMv8 chips but the kernel is only built for 
32-bit ARMv7, so they can only run 32-bit apps.

 With the compiler switches changed I could run my test program 
 and try the druntime test runner (albeit with some changes on 
 math and stdio to get it linking):

 ./druntime-test-runner
 0.000s PASS release32 core.atomic
 0.000s PASS release32 core.bitop
 0.000s PASS release32 core.checkedint
 0.005s PASS release32 core.demangle
 0.000s PASS release32 core.exception
 0.002s PASS release32 core.internal.arrayop
 0.000s PASS release32 core.internal.convert
 0.000s PASS release32 core.internal.hash
 0.000s PASS release32 core.internal.string
 0.000s PASS release32 core.math
 0.000s PASS release32 core.memory
 0.002s PASS release32 core.sync.barrier
 0.015s PASS release32 core.sync.condition
 0.000s PASS release32 core.sync.config
 0.016s PASS release32 core.sync.mutex
 0.016s PASS release32 core.sync.rwmutex
 0.002s PASS release32 core.sync.semaphore
 Segmentation fault (core dumped)

 The seg fault is from core.thread:1351

 unittest
 {
     auto t1 = new Thread({
         foreach (_; 0 .. 20)
             Thread.getAll;
     }).start;
     auto t2 = new Thread({
         foreach (_; 0 .. 20)
             GC.collect; // this seg faults
     }).start;
     t1.join();
     t2.join();
 }

 Calling GC.collect from the main thread doesn't seg fault.

Try running core.thread alone with ./druntime-test-runner 
core.thread and see if it makes a difference, as I've sometimes 
seen tested modules interfere with each other.  There are a few 
places in core.thread where Glibc is assumed; make sure those are 
right for uClibc too:

https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3301
https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3410

You can also try skipping those tests that segfault for now and 
make sure everything else works, by adding something like 
version(skip) before that failing unittest block, so you know the 
extent of the test problems.
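
As a sketch of that suggestion: a unittest guarded by a version 
identifier that is never defined is simply not compiled. The 
identifier skip is arbitrary, and the enum 
threadedCollectTestEnabled is only there for illustration.

```d
import core.memory : GC;
import core.stdc.stdio : puts;
import core.thread : Thread;

// Compiled only if the arbitrary identifier "skip" is defined
// (for example with -d-version=skip on LDC), so it is normally left out.
version (skip) unittest
{
    auto t = new Thread({
        foreach (_; 0 .. 20)
            GC.collect;
    }).start;
    t.join();
}

// Illustrative flag recording whether the guarded test was compiled in.
version (skip) enum bool threadedCollectTestEnabled = true;
else           enum bool threadedCollectTestEnabled = false;

// Unguarded tests still run as usual when built with -unittest.
unittest
{
    assert(1 + 1 == 2);
}

void main()
{
    puts(threadedCollectTestEnabled ? "threaded collect test enabled"
                                    : "threaded collect test skipped");
}
```

Removing the version(skip) prefix later re-enables the test without 
touching its body.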

 Core dump is not very helpful - stack is garbage, but running 
 with gdbserver a minimal program with the unit test I can see 
 this:

 Thread 1 "test" received signal SIGUSR1, User defined signal 1.
 pthread_getattr_np (thread_id=0, attr=0xb6b302bc) at 
 libpthread/nptl/pthread_getattr_np.c:47
 47        iattr->schedpolicy = thread->schedpolicy;
 (gdb) step

 Thread 1 "test" received signal SIGUSR2, User defined signal 2.
 0xb6e50d80 in epoll_wait (epfd=-1090521272, events=0x8, 
 maxevents=2, timeout=-1224756080) at 
 libc/sysdeps/linux/common/epoll.c:58
 58      CANCELLABLE_SYSCALL(int, epoll_wait, (int epfd, struct 
 epoll_event *events, int maxevents, int timeout),
 (gdb) step

 Thread 1 "test" received signal SIGSEGV, Segmentation fault.
 0xfffffffc in ?? ()
 (gdb)

The SIGUSR1/SIGUSR2 signals mean the GC ran fine.  You'd need to 
delve more into the code and the implementation details mentioned 
above to track this down.

On Sunday, 17 December 2017 at 17:20:32 UTC, Radu wrote:
 Yes - latest LDC versions make cross compiling a breeze so 
 kudos to you guys for making this happening. I'm using Linux 
 subsystem for Window btw. so for me this is even more fun as I 
 can work on both environments natively :)

Yeah, you could just use the Windows ldc too, assuming you have a 
cross-compiler from that OS, as shown on the wiki for Windows 
with the Android NDK.

 The modifications needed, at least on the surface, are very few - 
 some math and memory stream functions are missing.

I don't know how much it differs from Glibc, but we'd always be 
interested in a port, assuming you have the time to submit a pull 
like this recent one for Musl:

https://github.com/dlang/druntime/pull/1997

 The road block looks to be somewhere in the GC and TLS, or the 
 interaction of them (at least this is my feeling ATM)

Not being able to do an explicit collect there isn't that big a 
deal: I'd skip that test for now and run everything else, then 
come back to that one once you have an idea of the bigger picture.

Dec 17 2017

Suliman <evermind live.ru> writes:

Off-topic: there is another interesting lib: https://uclibc-ng.org/

Dec 18 2017

Radu <void null.pt> writes:

On Sunday, 17 December 2017 at 19:05:04 UTC, Joakim wrote:
 On Sunday, 17 December 2017 at 17:12:41 UTC, Radu wrote:
 On Friday, 15 December 2017 at 14:24:08 UTC, David Nadlinger 
 wrote:
 On 15 Dec 2017, at 14:06, Radu via digitalmars-d-ldc wrote:
 When run, I get this error spuriously:

 ====================================
 core.exception.AssertError rt/sections_elf_shared.d(116): 
 Assertion failure
 Fatal error in EH code: _Unwind_RaiseException failed with 
 reason code: 9
 Aborted (core dumped)
 ====================================

 The assert is inside an invariant which checks that the TLS 
 information has been extracted successfully. Perhaps uclibc 
 uses a TLS implementation that is not ABI-compatible with 
 glibc? (druntime needs to determine the TLS ranges to 
 register them with the GC, for the main thread as well as 
 newly spawned ones.)

 Where in the program lifecycle does the error occur? From the 
 backtrace, it looks like during C runtime startup, in which 
 case I am not quite seeing the connection to the GC.

 Why unwinding fails is another question, but not one I would 
 be terribly worried about – it is possible that the error 
 e.g. just occurs too early for the EH machinery to be 
 properly set up yet. Other low-level parts of druntime have 
 been converted to directly abort (e.g. using assert(0)) 
 instead. In fact, I am about to overhaul sections_elf_shared 
 in that respect anyway to improve error reporting when mixing 
 shared and non-shared builds.

  — David

 My various attempts on getting it to run behaved very erratic.
 So I changed the parameters for cross compile, basically I 
 removed all architecture specifics leaving only 
 `-mtriple=arm-linux-gnueabihf`, and `-mfloat-abi=hard` on C 
 side.

 My testing hardware is a ARM Cortex-A7, 
 http://linux-sunxi.org/A33

 I believe that triple defaults to ARMv5, are you sure your 
 Openwrt kernel is built for ARMv7?  Try running uname -m on the 
 device to check.  For example, most low- to mid-level 
 smartphones these days ship with ARMv8 chips but the kernel is 
 only built for 32-bit ARMv7, so they can only run 32-bit apps.

 With the compiler switches changed I could run my test program 
 and try the druntime test runner (albeit with some changes on 
 math and stdio to get it linking):

 ./druntime-test-runner
 0.000s PASS release32 core.atomic
 0.000s PASS release32 core.bitop
 0.000s PASS release32 core.checkedint
 0.005s PASS release32 core.demangle
 0.000s PASS release32 core.exception
 0.002s PASS release32 core.internal.arrayop
 0.000s PASS release32 core.internal.convert
 0.000s PASS release32 core.internal.hash
 0.000s PASS release32 core.internal.string
 0.000s PASS release32 core.math
 0.000s PASS release32 core.memory
 0.002s PASS release32 core.sync.barrier
 0.015s PASS release32 core.sync.condition
 0.000s PASS release32 core.sync.config
 0.016s PASS release32 core.sync.mutex
 0.016s PASS release32 core.sync.rwmutex
 0.002s PASS release32 core.sync.semaphore
 Segmentation fault (core dumped)

 The seg fault is from core.thread:1351

 unittest
 {
     auto t1 = new Thread({
         foreach (_; 0 .. 20)
             Thread.getAll;
     }).start;
     auto t2 = new Thread({
         foreach (_; 0 .. 20)
             GC.collect; // this seg faults
     }).start;
     t1.join();
     t2.join();
 }

 Calling GC.collect from the main thread doesn't seg fault.

 Try running core.thread alone and see if it makes a difference, 
 ./druntime-test-runner core.thread, as I've sometimes seen 
 tested modules interfere with each other.  I see that there are 
 a few places where Glibc is assumed in core.thread, make sure 
 those are right on Uclibc too:

 https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3301
 https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3410

 You can also try skipping those tests that segfault for now and 
 make sure everything else works, by adding something like 
 version(skip) before that failing unittest block, so you know 
 the extent of the test problems.

 Core dump is not very helpful - stack is garbage, but running 
 with gdbserver a minimal program with the unit test I can see 
 this:

 Thread 1 "test" received signal SIGUSR1, User defined signal 1.
 pthread_getattr_np (thread_id=0, attr=0xb6b302bc) at 
 libpthread/nptl/pthread_getattr_np.c:47
 47        iattr->schedpolicy = thread->schedpolicy;
 (gdb) step

 Thread 1 "test" received signal SIGUSR2, User defined signal 2.
 0xb6e50d80 in epoll_wait (epfd=-1090521272, events=0x8, 
 maxevents=2, timeout=-1224756080) at 
 libc/sysdeps/linux/common/epoll.c:58
 58      CANCELLABLE_SYSCALL(int, epoll_wait, (int epfd, struct 
 epoll_event *events, int maxevents, int timeout),
 (gdb) step

 Thread 1 "test" received signal SIGSEGV, Segmentation fault.
 0xfffffffc in ?? ()
 (gdb)

 The SIGUSR1/SIGUSR2 signals mean the GC ran fine.  You'd need 
 to delve more into the code and the implementation details 
 mentioned above to track this down.

 On Sunday, 17 December 2017 at 17:20:32 UTC, Radu wrote:
 Yes - latest LDC versions make cross compiling a breeze so 
 kudos to you guys for making this happening. I'm using Linux 
 subsystem for Window btw. so for me this is even more fun as I 
 can work on both environments natively :)

 Yeah, you could just use the Windows ldc too, assuming you have 
 a cross-compiler from that OS, as shown on the wiki for Windows 
 with the Android NDK.

 The modifications need it surface deep are very few - some 
 math and memory streams functions are missing.

 I don't know how much it differs from Glibc, but we'd always be 
 interested in a port, assuming you have the time to submit a 
 pull like this recent one for Musl:

 https://github.com/dlang/druntime/pull/1997

 The road block looks to be somewhere in the GC and TLS, or the 
 interaction of them (at least this is my feeling ATM)

 Not being able to do an explicit collect there isn't that big a 
 deal: I'd skip that test for now and run everything else, then 
 come back to that one once you have an idea of the bigger 
 picture.

I got some time to work on this. Just to clarify, since others 
suggested it: I'm developing against uClibc-ng 1.0.9.

Regarding the architecture, it is an armv7a, as 'uname -a' says:

armv7l GNU/Linux

I could not produce any working binary by specifying the armv7a 
architecture to ldc, so I used the generic arm architecture for 
gnueabihf, as previously stated.

I managed to get the druntime tester running (minus some math 
functions and memstream) except for one specific blocking issue: 
Thread.suspend does not work and produces a segfault.
To test this I commented out all suspendAll/resumeAll unittests 
from core.thread and stubbed out GC.collect().

This issue is not linked to the GC: the segfault still happens 
when GC.collect is stubbed out and the suspendAll/resumeAll 
unittests are enabled. The GC just happens to use the suspend 
mechanism and so exposes the underlying issue.
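
A minimal sketch of exercising the suspend/resume path without any 
GC collection, in case it helps others reproduce this. The function 
name suspendResumeOnce and the stop flag are made up for the 
example; thread_suspendAll and thread_resumeAll are the druntime 
entry points from core.thread.

```d
import core.atomic : atomicLoad, atomicStore;
import core.stdc.stdio : puts;
import core.thread : Thread, thread_resumeAll, thread_suspendAll;

shared bool stop; // tells the worker when to exit

// Hypothetical helper: suspends and resumes all other threads once.
bool suspendResumeOnce()
{
    atomicStore(stop, false);
    auto worker = new Thread({
        while (!atomicLoad(stop))
            Thread.yield();
    }).start;

    thread_suspendAll();
    // While the world is stopped, avoid GC allocations and locks
    // that a suspended thread might be holding.
    thread_resumeAll();

    atomicStore(stop, true);
    worker.join();
    return true;
}

void main()
{
    if (suspendResumeOnce())
        puts("suspend/resume round trip completed");
}
```

If this crashes the same way as the unittest, the GC can be ruled 
out and only the signal-based suspend mechanism is left.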

From what I can see in gdb, 'thread_resumeHandler' is to blame: it 
looks like 'sem_post( &suspendCount )' immediately triggers 
the resumeSignal, and the call to 'sigsuspend( &sigres )' is 
never made.

Like:

464                     status = sem_post( &suspendCount );
(gdb) n

Thread 2 "druntime-test-r" received signal SIGUSR2, User defined 
signal 2.
0x001b46d0 in core.thread.thread_suspendHandler(int).op(void*) 
(sp=0xb572f900 "$F\033") at thread.d:464
464                     status = sem_post( &suspendCount );
(gdb) info threads
   Id   Target Id         Frame
   1    Thread 16005.16005 "druntime-test-r" 0x001ba7a0 in 
_D4core6thread5Fiber5stateMxFNaNbNdNiNfZEQBnQBlQBh5State 
(this=0xb6d34980) at thread.d:4533
* 2    Thread 16005.16273 "druntime-test-r" 0x001b46d0 in 
core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 
"$F\033") at thread.d:464
(gdb) bt

core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 
"$F\033") at thread.d:464

void(void*) nothrow delegate) (fn=...) at thread.d:2600


Backtrace stopped: previous frame identical to this frame 
(corrupt stack?)
(gdb) n

Thread 2 "druntime-test-r" received signal SIGSEGV, Segmentation 
fault.
0xfffffffc in ?? ()
(gdb) bt


Backtrace stopped: previous frame identical to this frame 
(corrupt stack?)

Jan 09 2018

"David Nadlinger" <code klickverbot.at> writes:

On 10 Jan 2018, at 0:27, Radu via digitalmars-d-ldc wrote:
 From what I can see in gdb 'thread_resumeHandler' is to blame, it 
 looks like 'sem_post( &suspendCount )' will immediately trigger the 
 resumeSignal and the call for 'sigsuspend( &sigres )' is never made.

You mean thread_suspendHandler? Perhaps single-stepping through the code 
and having a look where the stack is corrupted would yield some insight? 
Is there possibly some ABI incompatibility caused by callWithStackShell?

sem_post shouldn't cause anything to happen on the calling thread 
itself; and it is explicitly documented to be re-entrant w.r.t. signals.
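
For reference, the handshake under discussion can be sketched in 
isolation with plain POSIX calls. This is a simplified illustration, 
not the druntime implementation: the handler and function names are 
made up, it uses an extern(C) main so druntime is never initialized 
and its own handlers are not involved, and it assumes the D bindings 
in core.sys.posix match the target libc.

```d
import core.atomic : atomicLoad, atomicStore;
import core.stdc.errno;
import core.stdc.stdio : puts;
import core.sys.posix.pthread;
import core.sys.posix.sched;
import core.sys.posix.semaphore;
import core.sys.posix.signal;

__gshared sem_t suspendCount; // posted by the worker once it is parked
shared bool done;             // tells the worker loop to exit

// Empty on purpose: its only job is to interrupt sigsuspend().
extern (C) void resumeHandler(int sig) {}

extern (C) void suspendHandler(int sig)
{
    sigset_t waitMask;
    sigfillset(&waitMask);
    sigdelset(&waitMask, SIGUSR2); // only the resume signal may wake us

    sem_post(&suspendCount); // async-signal-safe: report "I am parked"
    sigsuspend(&waitMask);   // sleep here until SIGUSR2 is delivered
}

extern (C) void* worker(void* arg)
{
    while (!atomicLoad(done))
        sched_yield();
    return null;
}

// Hypothetical helper: one suspend/resume round trip on a raw pthread.
bool suspendResumeRoundTrip()
{
    atomicStore(done, false);
    sem_init(&suspendCount, 0, 0);

    // Block all signals while a handler runs, so SIGUSR2 cannot arrive
    // between sem_post() and sigsuspend().
    sigaction_t suspendAction;
    suspendAction.sa_handler = &suspendHandler;
    sigfillset(&suspendAction.sa_mask);
    sigaction(SIGUSR1, &suspendAction, null);

    sigaction_t resumeAction;
    resumeAction.sa_handler = &resumeHandler;
    sigfillset(&resumeAction.sa_mask);
    sigaction(SIGUSR2, &resumeAction, null);

    pthread_t tid;
    if (pthread_create(&tid, null, &worker, null) != 0)
        return false;

    pthread_kill(tid, SIGUSR1);                              // suspend
    while (sem_wait(&suspendCount) != 0 && errno == EINTR) {} // wait for ack
    pthread_kill(tid, SIGUSR2);                              // resume

    atomicStore(done, true);
    pthread_join(tid, null);
    sem_destroy(&suspendCount);
    return true;
}

extern (C) int main()
{
    puts(suspendResumeRoundTrip() ? "round trip ok" : "round trip failed");
    return 0;
}
```

If this standalone version works on the target while the druntime 
one does not, the difference is more likely in druntime's struct 
definitions or stack handling than in the libc signal machinery.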

  —David

Jan 10 2018

Radu <void null.pt> writes:

On Wednesday, 10 January 2018 at 11:13:17 UTC, David Nadlinger 
wrote:
 On 10 Jan 2018, at 0:27, Radu via digitalmars-d-ldc wrote:
 From what I can see in gdb 'thread_resumeHandler' is to blame, 
 it looks like 'sem_post( &suspendCount )' will immediately 
 trigger the resumeSignal and the call for 'sigsuspend( &sigres 
 )' is never made.

 You mean thread_suspendHandler? Perhaps single-stepping through 
 the code and having a look where the stack is corrupted would 
 yield some insight? Is there possibly some ABI incompatibility 
 caused by callWithStackShell?

 sem_post shouldn't cause anything to happen on the calling 
 thread itself; and it is explicitly documented to be re-entrant 
 w.r.t. signals.

  —David

David, indeed sem_post works correctly; I guess gdb presented 
the sequence in the wrong order.

After moving the breakpoint to thread_resumeHandler I can see 
that the handler gets called, but I think you are right about the 
ABI. Observe:

Thread 2 "druntime-test-r" received signal SIGUSR2, User defined 
signal 2.
0xb6e88648 in ?? () from target:/lib/libc.so.1
(gdb) bt



core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 
"$F\033") at thread.d:467

void(void*) nothrow delegate) (fn=...) at thread.d:2600


(gdb) c
Thread 2 "druntime-test-r" hit Breakpoint 1, thread_resumeHandler 
(sig=12) at thread.d:494
warning: Source file is more recent than executable.
494                 assert( sig == resumeSignalNumber );
(gdb) i f
Stack level 0, frame at 0xb572f4d8:
  pc = 0x1b487c in thread_resumeHandler (thread.d:494); saved pc = 
0xfffffffe
  called by frame at 0xb572f4d8
  source language d.
  Arglist at 0xb572f4c8, args: sig=12
  Locals at 0xb572f4c8, Previous frame's sp is 0xb572f4d8
  Saved registers:
   r11 at 0xb572f4d0, lr at 0xb572f4d4
.......
(gdb) disas
Dump of assembler code for function thread_resumeHandler:
    0x001b4864 <+0>:     push    {r11, lr}
    0x001b4868 <+4>:     mov     r11, sp


<thread_resumeHandler+72>
    0x001b4874 <+16>:    ldr     r1, [pc, r1]


    0x001b4880 <+28>:    ldr     r1, [r1]
    0x001b4884 <+32>:    cmp     r0, r1
    0x001b4888 <+36>:    bne     0x1b4894 <thread_resumeHandler+48>
    0x001b488c <+40>:    mov     sp, r11
=> 0x001b4890 <+44>:    pop     {r11, pc}

<thread_resumeHandler+76>
    0x001b4898 <+52>:    add     r1, pc, r0



    0x001b48a8 <+68>:    bl      0xf00c8 <_d_assert>
    0x001b48ac <+72>:    mulseq  r4, r8, r5
    0x001b48b0 <+76>:                    ; <UNDEFINED> 
instruction: 0x00117bd1

(gdb) ni
0x001b4890 in thread_resumeHandler (sig=-2) at thread.d:499
499             }
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0xfffffffe

It looks like the saved PC is invalid (0xfffffffe), which causes 
the segmentation fault.

Jan 10 2018

Joakim <dlang joakim.fea.st> writes:

On Wednesday, 10 January 2018 at 14:17:53 UTC, Radu wrote:
 On Wednesday, 10 January 2018 at 11:13:17 UTC, David Nadlinger 
 wrote:
  [...]

 David, indeed sem_post works correctly, I guess gdb interpreted 
 the sequence in the wrong order.

 [...]

Have you ported much of druntime to Uclibc?  It currently assumes 
Glibc on linux by default, so if there are differences between 
the way the two handle such signals, it can cause problems.  For 
example, the Android Java Runtime intercepts SIGUSR1/SIGUSR2 and 
doesn't run their signal handlers, so I had to work around that 
issue:

https://github.com/dlang/druntime/pull/1851#discussion_r123886260

You may be running across a similar incompatibility, so I suggest 
you port all the version-dependent blocks of that module and its 
dependent modules first.
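
As an illustration of what such version-dependent blocks look like, 
here is a small sketch that reports which C runtime the compiler 
thinks it is targeting. The CRuntime_* identifiers are predefined by 
the D language for the respective targets; whether the compiler used 
in this thread sets CRuntime_UClibc for this target is an assumption 
I have not checked. The name libcName is made up.

```d
import core.stdc.stdio : printf;

// Pick a label based on the predefined C runtime version identifier.
version (CRuntime_Glibc)       enum libcName = "glibc";
else version (CRuntime_UClibc) enum libcName = "uClibc";
else version (CRuntime_Musl)   enum libcName = "musl";
else version (CRuntime_Bionic) enum libcName = "Bionic";
else                           enum libcName = "unknown";

void main()
{
    // String literals are zero-terminated, so .ptr is safe for printf.
    printf("bindings selected for: %s\n", libcName.ptr);
}
```

If this prints glibc for a uClibc target, every 
version (CRuntime_Glibc) block in druntime is being compiled with 
glibc's layouts and constants.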

Jan 10 2018

Radu <void null.pt> writes:

On Wednesday, 10 January 2018 at 15:56:52 UTC, Joakim wrote:
 On Wednesday, 10 January 2018 at 14:17:53 UTC, Radu wrote:
 On Wednesday, 10 January 2018 at 11:13:17 UTC, David Nadlinger 
 wrote:
  [...]

 David, indeed sem_post works correctly, I guess gdb 
 interpreted the sequence in the wrong order.

 [...]

 Have you ported much of druntime to Uclibc?  It currently 
 assumes Glibc on linux by default, so if there are differences 
 between the way the two handle such signals, it can cause 
 problems.  For example, the Android Java Runtime intercepts 
 SIGUSR1/SIGUSR2 and doesn't run their signal handlers, so I had 
 to work around that issue:

 https://github.com/dlang/druntime/pull/1851#discussion_r123886260

 You may be running across a similar incompatibility, so I 
 suggest you port all the version-dependent blocks of that 
 module and its dependent modules first.

I had missed a bunch of details that were killing the signal 
handling, namely various size differences in structs. Thanks for 
the guidance! Fixed now.
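
For anyone hitting the same thing, a quick way to spot such 
mismatches is to print the sizes druntime assumes and compare them 
with sizeof from a small C program built with the target toolchain. 
A sketch follows; the list of types is only my guess at what matters 
for the suspend/resume path, and report is a made-up helper.

```d
import core.stdc.stdio : printf;
import core.sys.posix.semaphore;  // sem_t
import core.sys.posix.signal;     // sigset_t, sigaction_t
import core.sys.posix.sys.types;  // pthread_t, pthread_attr_t, pthread_mutex_t

// Hypothetical helper: print the size the D bindings assume for T.
void report(T)(const(char)* name)
{
    printf("%-16s %zu\n", name, T.sizeof);
}

void main()
{
    report!sem_t("sem_t");
    report!sigset_t("sigset_t");
    report!sigaction_t("sigaction_t");
    report!pthread_t("pthread_t");
    report!pthread_attr_t("pthread_attr_t");
    report!pthread_mutex_t("pthread_mutex_t");
}
```

Any line that differs from the C side points at a binding that 
needs a uClibc-specific definition.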

druntime tests are passing in release mode now.

The debug build fails with:

core.exception.AssertError rt/sections_elf_shared.d(116): 
Assertion failure

The code looks like this:

     invariant()
     {
         assert(_moduleGroup.modules.length);
         static if (SharedELF)
         {
             assert(_tlsMod || !_tlsSize); // <-- fails
         }
     }

Stack trace:


sections_elf_shared.d:116

(this=<error reading variable: Cannot access memory at address 
0xe9>) at sections_elf_shared.d:67

(this=...) at sections_elf_shared.d:104

_D2rt6memory16initStaticDataGCFZ14__foreachbody1MFKSQBy19sections_elf_shared3DSOZi 
(sg=...) at memory.d:23

_D2rt19sections_elf_shared3DSO7opApplyFMDFKSQBqQBqQyZiZi (dg=...) 
at sections_elf_shared.d:73



int(char[][]) function).runAll() () at dmain2.d:478

int(char[][]) function).tryExec(scope void() delegate) (dg=...) 
at dmain2.d:454

mainFunc=0xc5210 <D main>) at dmain2.d:487

__entrypoint.d:8



I don't really understand that invariant. I can see that those 
variables are initialized much earlier, in the init part, and have 
values, for example:

_tlsMod = 0 and _tlsSize = 388
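
Spelled out, the failing condition behaves like this. The helper 
tlsInvariantHolds is made up for illustration and just mirrors the 
expression in the invariant.

```d
import core.stdc.stdio : printf;

// Hypothetical helper mirroring `assert(_tlsMod || !_tlsSize)`:
// a non-zero TLS size is only accepted together with a non-zero module ID.
bool tlsInvariantHolds(size_t tlsMod, size_t tlsSize)
{
    return tlsMod || !tlsSize;
}

void main()
{
    // The values seen in the debugger: module ID 0 with 388 bytes of TLS.
    printf("mod=0 size=388: %s\n",
           tlsInvariantHolds(0, 388) ? "holds".ptr : "fails".ptr);
    // A non-zero module ID with the same size passes.
    printf("mod=1 size=388: %s\n",
           tlsInvariantHolds(1, 388) ? "holds".ptr : "fails".ptr);
    // No TLS at all also passes.
    printf("mod=0 size=0:   %s\n",
           tlsInvariantHolds(0, 0) ? "holds".ptr : "fails".ptr);
}
```

So the invariant rejects exactly the combination I am seeing: TLS 
data present, but reported with module ID 0.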


Stack trace:


_D2rt19sections_elf_shared12scanSegmentsFNbNiKxS4core3sys5linux4link12dl_phdr_infoPSQDeQDe3DSOZv 
(info=..., pdso=0x307150) at sections_elf_shared.d:871

sections_elf_shared.d:455



target:/lib/ld-uClibc.so.0



Any idea why this fails and how to fix it?

Jan 14 2018

Joakim <dlang joakim.fea.st> writes:

On Sunday, 14 January 2018 at 21:33:28 UTC, Radu wrote:
 On Wednesday, 10 January 2018 at 15:56:52 UTC, Joakim wrote:
 [...]

 I missed a bunch of details that where killing the signal 
 handling, thanks for the guidance!, various size differences on 
 structs. Fixed now.

I figured that was it; that's why I asked you a couple of times 
how much of druntime you had ported.

 druntime tests are passing in release mode now.

 [...]

_tlsMod and _tlsSize are extracted from shared libraries and then 
passed to __tls_get_addr to initialize thread-local storage for 
each library.  That invariant makes sure the TLS index _tlsMod 
isn't 0 when the size is non-zero; I'm not sure why David checks 
for that.  It could be that he doesn't expect index 0 for a shared 
library, whereas uClibc is okay with that?

I don't use this module or arbitrary shared libraries on 
Android/ARM, so I haven't had to mess with it.

Jan 15 2018

"David Nadlinger" <code klickverbot.at> writes:

On 15 Jan 2018, at 10:05, Joakim via digitalmars-d-ldc wrote:
 _tlsMod and _tlsSize are extracted from shared libraries and then 
 passed to __tls_get_addr to initialize thread-local storage for each 
 library.  That invariant makes sure the TLS index _tlsMod isn't 0 
 along with a non-zero size, not sure why David checks for that.  It 
 could be he doesn't expect the index 0 for a shared library whereas 
 uClibc is okay with that?

We inherited that from Martin's code – presumably, it's just never the 
case on glibc. If all the tests work with shared libraries (DMD test 
suite and runtime unit tests, plus druntime/test, as run by ctest), 
there is nothing to worry about.

  — David

Jan 15 2018

Joakim <dlang joakim.fea.st> writes:

On Friday, 15 December 2017 at 14:06:37 UTC, Radu wrote:
 Trying to run some D code on Openwrt with Uclibc and got stuck 
 by broken GC.

 [...]

 Run time libs were compiled with:

 ====================================
 ldc-build-runtime --dFlags="-w;-mtriple=armv7-linux-gnueabihf 
 -mcpu=cortex-a7 -L-lstdc++" --cFlags="-mcpu=cortex-a7 
 -mfloat-abi=hard -D__UCLIBC_HAS_BACKTRACE__ 
 -D__UCLIBC_HAS_TLS__" --targetSystem="Linux;UNIX" 
 BUILD_SHARED_LIBS=OFF
 ====================================

First thing I'd do is build and run the test runners, then make 
sure no tests are failing, particularly in druntime.  Another 
thing I notice is that you don't separate many of those C and D 
flags with semi-colons: not sure how that worked for you, as I 
get errors if I try something similar.  Also, you need to specify 
the C cross-compiler with CC=arm-openwrt-linux-gcc before running 
ldc-build-runtime: maybe you did that but forgot to mention it.

It is fairly easy to cross-compile the test runners too if you 
pass the --testrunners flag, see the instructions for the RPi and 
Android for examples:

https://wiki.dlang.org/Building_LDC_runtime_libraries
https://wiki.dlang.org/Build_D_for_Android

You may need to make some modifications to druntime or Phobos to 
get everything to compile, and you may have to specify some 
linker flags too, to get the test runners to link.  Let us know 
how it works out.

While you could reuse most of the glibc declarations for now, you 
may eventually need to patch druntime for Uclibc, as was done 
before for Bionic and the NetBSD libc for example:

https://github.com/dlang/druntime/pull/734
https://github.com/dlang/druntime/pull/1494
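
As a rough sketch of what such a patch involves: druntime selects libc-specific declarations with version blocks. The snippet below only shows the selection pattern. `libcName` is a hypothetical constant used for illustration, and `CRuntime_UClibc` is assumed to be defined by the compiler for uClibc targets (the LDC 1.6 discussed in this thread may not define it, in which case the last branch is taken).

```d
// Illustrative only: pick a label per C runtime using D's predefined
// version identifiers. Real druntime patches put C declarations
// (struct layouts, constants, function prototypes) in such blocks.
version (CRuntime_Glibc)
    enum libcName = "glibc";
else version (CRuntime_UClibc)
    enum libcName = "uClibc";
else version (CRuntime_Bionic)
    enum libcName = "Bionic";
else
    enum libcName = "other";

// Compile-time check that some branch defined a non-empty libcName.
static assert(libcName.length > 0);
```

Reusing the glibc declarations amounts to letting uClibc fall into the glibc branch; a dedicated branch becomes necessary once struct sizes or constants differ between the two C libraries.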

Dec 16 2017

Radu <void null.pt> writes:

On Saturday, 16 December 2017 at 14:14:40 UTC, Joakim wrote:
 [...]

Test runners were out of the question as no program started. See
my reply to David.
Yeah, I set up CC correctly, but curiously, specifying a more
fitting platform triple and -march on GCC produced non-working
binaries; I had to revert to the defaults.

Yes, the latest LDC versions make cross-compiling a breeze, so
kudos to you guys for making this happen. I'm using the Windows
Subsystem for Linux, by the way, so for me this is even more fun
as I can work in both environments natively :)

The modifications needed at the surface level are very few: some
math and memory stream functions are missing.

The roadblock looks to be somewhere in the GC and TLS, or in
their interaction (at least that is my feeling at the moment).

Dec 17 2017
