
digitalmars.D.ldc - Remaining Travis merge-2.064 failure

David Nadlinger via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> writes:
Hi all,

so there currently seems to be one remaining Travis build failure on
the merge-2.064 branch, namely a linking issue in the release tests,
and only on LLVM 3.4:
https://travis-ci.org/ldc-developers/ldc/builds/25057239

The issue looks like a symbol emission problem where std.net.curl is
getting pulled in randomly because it contained a template instance
that is missing from whatever other module by chance. However, I can
neither replicate the issue on x86_64 Linux locally nor on OS X.

Any ideas? We should really try to get the merge-* branches integrated
into master ASAP.

Best,
David
May 17 2014
"Kai Nacke" <kai redstar.de> writes:
On Sunday, 18 May 2014 at 02:04:52 UTC, David Nadlinger via digitalmars-d-ldc wrote:
 Hi all,

 so there currently seems to be one remaining Travis build failure on
 the merge-2.064 branch, namely a linking issue in the release tests,
 and only on LLVM 3.4:
 https://travis-ci.org/ldc-developers/ldc/builds/25057239

 The issue looks like a symbol emission problem where std.net.curl is
 getting pulled in randomly because it contained a template instance
 that is missing from whatever other module by chance. However, I can
 neither replicate the issue on x86_64 Linux locally nor on OS X.

 Any ideas? We should really try to get the merge-* branches integrated
 into master ASAP.

 Best,
 David

Hi David,

I think this is a problem with the multilib setup. This Travis build
passes: https://travis-ci.org/ldc-developers/ldc/builds/25474830 - the
only difference is that the 32bit libraries are not built (and some
applications are not installed, e.g. gcc-multilib).

Regards,
Kai
May 18 2014
"Kai Nacke" <kai redstar.de> writes:
It is also not reproducible on Ubuntu 13.10 with multilib creation.

Regards,
Kai
May 19 2014
David Nadlinger via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> writes:
On Mon 19 May 2014 05:46:44 PM CEST, Kai Nacke via digitalmars-d-ldc wrote:
 It is also not reproducible on Ubuntu 13.10 with multilib creation.
May 20 2014
David Nadlinger via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> writes:
Hi Kai,

On Mon, May 19, 2014 at 7:36 AM, Kai Nacke via digitalmars-d-ldc
<digitalmars-d-ldc puremagic.com> wrote:
 I think this is a problem with the multilib setup. This Travis build passes:
 https://travis-ci.org/ldc-developers/ldc/builds/25474830 - the only
 difference is that the 32bit libraries are not built (and some applications
 are not installed, e.g. gcc-multilib).

There is still the possibility that this is a bug in LDC. If it's a
case of template symbols not being emitted correctly, then the order in
which the other object files in libphobos.a are tried might differ
between different toolchain versions, and we might just be unlucky
enough to hit curl.o (std.net) in the multilib case.

If you have a setup where you can actually reproduce this, it might be
worth having a short look at what actually causes curl.o to be pulled
in. Maybe GNU ld has an option similar to the OS X one to do this, but
if, like me, you can't find it, you could adapt that hacky script I
sent you once to track down which unresolved symbol causes the issue.

I agree that it is not worth postponing the release due to this,
though, if we don't have concrete evidence that this also affects user
code.

Best,
David
May 20 2014
"Kai Nacke" <kai redstar.de> writes:
Hi David!

On Tuesday, 20 May 2014 at 10:16:46 UTC, David Nadlinger via digitalmars-d-ldc wrote:
 I agree that it is not worth postponing the release due to this,
 though, if we don't have concrete evidence that this also affects user
 code.

I changed my mind here. The problem is reproducible with the beta1
binaries if used with a different Linux distro. I have some time this
weekend and will analyze this further.

Regards,
Kai
May 23 2014
"Kai Nacke" <kai redstar.de> writes:
Hi David!

On Tuesday, 20 May 2014 at 10:16:46 UTC, David Nadlinger via digitalmars-d-ldc wrote:
 Hi Kai,

 On Mon, May 19, 2014 at 7:36 AM, Kai Nacke via digitalmars-d-ldc
 <digitalmars-d-ldc puremagic.com> wrote:
 I think this is a problem with the multilib setup. This Travis build passes:
 https://travis-ci.org/ldc-developers/ldc/builds/25474830 - the only
 difference is that the 32bit libraries are not built (and some applications
 are not installed, e.g. gcc-multilib).

I am a bit stuck here. I compiled stdiobase.d (one of the failing
tests) with -unittest -main. Then I extracted libphobos-ldc.a and
resolved all dependencies by hand. This results in

    gcc -o stdiobase ../stdiobase.o ../__main.o src_rt_dmain2.o \
        std_stdio.o src_object_.o src_ldc_eh.o src_rt_monitor_.o \
        src_rt_critical_.o src_rt_lifetime.o src_rt_tlsgc.o \
        src_rt_aaA.o src_rt_cast_.o src_core_memory.o src_gc_gc.o \
        src_gc_bits.o src_core_sync_mutex.o src_core_sync_exception.o \
        src_core_exception.o src_core_thread.o src_gc_proxy.o \
        src_core_time.o src_rt_adi.o src_rt_typeinfo_ti_*.o \
        src_rt_util_console.o src_rt_sections_ldc.o \
        src_rt_sections_linux.o std_string.o std_exception.o \
        std_format.o src_core_sys_posix_netdb.o std_utf.o std_array.o \
        std_conv.o std_typecons.o std_algorithm.o std_range.o \
        std_typetuple.o std_traits.o std_ascii.o std_functional.o \
        std_uni.o src_rt_memory.o src_core_stdc_errno.o \
        src_rt_util_hash.o src_gc_os.o src_rt_minfo.o src_core_bitop.o \
        src_rt_util_string.o src_rt_util_utf.o src_rt_util_container.o \
        src_rt_aApply.o src_core_runtime.o src_ldc_arrayinit.o \
        src_rt_qsort.o std_math.o std_random.o std_bitmanip.o \
        std_container.o std_internal_unicode_comp.o \
        std_internal_unicode_tables.o src_core_demangle.o \
        std_numeric.o std_complex.o src_rt_switch_.o errno.c.o \
        -lc -lpthread -lm -ldl -lrt

This creates the executable stdiobase without a link error and without
using -lcurl. I don't understand this. Any ideas? (All on Ubuntu 12
64bit.)

Regards,
Kai
May 25 2014
David Nadlinger via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> writes:
Hi Kai,

On 05/26/2014 07:56 AM, Kai Nacke via digitalmars-d-ldc wrote:
 This creates the executable stdiobase without a link error and without
 using -lcurl. I don't understand this. Any ideas? (All on Ubuntu 12 64bit.)

As I mentioned, I suspect that the issue is dependent on the order the
linker searches the object files. Several object files might have the
missing template symbol, curl.o just being one of them.

If you run the failing compile with -L-M, you should see a mention of
why curl.o is pulled in near the top of the output. IIRC the output
also includes information about which specific module requested the
symbol. Then, you'd need to debug into LDC to see why the symbol in
question is not emitted to the module that needs it.

Best,
David
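
To make the suspected order dependence concrete, here is a minimal Python sketch. It is a deliberately simplified archive search, not GNU ld's actual algorithm: members are scanned in order, and a member is pulled in as soon as it defines a symbol that is currently undefined. The member contents, the symbol names (`writeln`, `tmpl_instance`, `curl_get`, `curl_easy_init`) and both orderings are made up for illustration.

```python
def resolve(needed, archive):
    """Simplified static-archive search (an assumption, not GNU ld's real logic).

    needed:  symbols the already-linked objects leave undefined
    archive: list of (member, defined_symbols, undefined_symbols) in search order
    Returns the members pulled in and the symbols still unresolved.
    """
    undefined, defined, pulled = set(needed), set(), []
    progress = True
    while progress:  # rescan until a full pass pulls in nothing new
        progress = False
        for member, defs, undefs in archive:
            if member in pulled or not (defs & undefined):
                continue
            pulled.append(member)
            defined |= defs
            undefined = (undefined | undefs) - defined
            progress = True
    return pulled, undefined


# Hypothetical members: string.o and curl.o both carry the same weak template
# instance, but only curl.o drags in a libcurl dependency.
stdio = ("stdio.o", {"writeln"}, {"tmpl_instance"})
string = ("string.o", {"tmpl_instance"}, set())
curl = ("curl.o", {"tmpl_instance", "curl_get"}, {"curl_easy_init"})

lucky = resolve({"writeln"}, [stdio, string, curl])
unlucky = resolve({"writeln"}, [stdio, curl, string])

print("string.o searched first:", lucky)
print("curl.o searched first:  ", unlucky)
```

In the second ordering the model pulls in curl.o instead of string.o, and `curl_easy_init` stays unresolved unless -lcurl is passed, which is the shape of the failure described in this thread.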
May 26 2014
"Kai Nacke" <kai redstar.de> writes:
On Monday, 26 May 2014 at 09:03:29 UTC, David Nadlinger via digitalmars-d-ldc wrote:
 Hi Kai,

 On 05/26/2014 07:56 AM, Kai Nacke via digitalmars-d-ldc wrote:
 This creates the executable stdiobase without a link error and without
 using -lcurl. I don't understand this. Any ideas? (All on Ubuntu 12 64bit.)

 As I mentioned, I suspect that the issue is dependent on the order the
 linker searches the object files. Several object files might have the
 missing template symbol, curl.o just being one of them.

 If you run the failing compile with -L-M, you should see a mention of
 why curl.o is pulled in near the top of the output. IIRC the output
 also includes information about which specific module requested the
 symbol. Then, you'd need to debug into LDC to see why the symbol in
 question is not emitted to the module that needs it.

 Best,
 David

This reveals:

/build/work/ldc/runtime/../lib/libphobos-ldc.a(std_net_curl.o)
/build/work/ldc/runtime/../lib/libphobos-ldc.a(std_stdio.o)
(_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm)

The mentioned weak symbol is defined in several files:

_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_uni.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_math.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_exception.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_conv.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_functional.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_string.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_typetuple.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_container.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_numeric.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_complex.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_array.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_algorithm.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_bitmanip.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_range.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_format.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_utf.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_traits.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_typecons.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_random.o
_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_net_curl.o

but std.stdio is missing. This at least explains it... Thanks.

Regards,
Kai
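
A list like the one above can also be produced mechanically. The following Python sketch assumes input lines shaped like GNU `nm -A` output on an archive (`archive:member:address TYPE name`, with the address left blank for undefined symbols). The helper name `who_defines` and the sample lines, including their addresses, are made up for illustration; only the symbol and member names come from this thread.

```python
def who_defines(nm_lines, symbol):
    """Split archive members into those defining `symbol` and those only using it.

    Assumes lines shaped like GNU `nm -A` output on an archive:
        archive:member:address TYPE name     (defined)
        archive:member:        U    name     (undefined reference)
    """
    defining, referencing = [], []
    for line in nm_lines:
        fields = line.split()
        if len(fields) < 3 or fields[-1] != symbol:
            continue
        parts = fields[0].split(":")
        if len(parts) < 2:
            continue  # not an archive-member line
        member, kind = parts[1], fields[-2]
        (referencing if kind == "U" else defining).append(member)
    return defining, referencing


SYMBOL = "_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm"

# Hypothetical sample input; the addresses are invented.
sample = [
    "libphobos-ldc.a:std_string.o:0000000000000000 W " + SYMBOL,
    "libphobos-ldc.a:std_net_curl.o:0000000000000000 W " + SYMBOL,
    "libphobos-ldc.a:std_stdio.o:                 U " + SYMBOL,
    "libphobos-ldc.a:std_stdio.o:0000000000000000 T some_other_symbol",
]

defining, referencing = who_defines(sample, SYMBOL)
print("defined in:", defining)
print("only referenced in:", referencing)
```

A member that shows up only in the second list, as std_stdio.o does here, depends on some other object file to provide the instance.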
May 26 2014
Christian Kamm <kamm incasoftware.de> writes:
 /build/work/ldc/runtime/../lib/libphobos-ldc.a(std_net_curl.o)
 /build/work/ldc/runtime/../lib/libphobos-ldc.a(std_stdio.o)
 (_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm)

You are saying std_stdio.o uses that symbol - and the bug is that the
instantiation was not emitted into that object file? The symptom is
that it then uses the instantiation from std_net_curl.o, leading to a
linker error because -lcurl wasn't passed?

My runtime/std/stdio.o also only has a
U _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm
so I think I can reproduce it locally.

Regards,
Christian
Jun 09 2014
Christian Kamm <kamm incasoftware.de> writes:
On 09.06.2014 09:29, Christian Kamm wrote:
 My runtime/std/stdio.o also only has a
 U _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm
 so I think I can reproduce it locally.

The symbol isn't emitted because its instantiatingModule is
std.bitmanip - which is not a root module - and thus the function is
ignored by DtoDefineFunction.

I think the comment in there (functions.cpp:922) is wrong. The frontend
seems to try hard to make sure instantiatingModule is a non-root module
if possible. That should mean LDC is not emitting templates that have a
non-root module instantiating them somewhere.

The idea probably is that you shouldn't need to emit functions again if
they were already emitted into a library you import and link. (if
that's desired, the correct fix is probably to require -lcurl when
linking phobos...)

I wonder if you can break that behavior with a cycle of
imports-that-instantiate. But I couldn't make a failing test case.

Cheers,
Christian
Jun 09 2014
David Nadlinger via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> writes:
On 9 Jun 2014, at 10:44, Christian Kamm via digitalmars-d-ldc wrote:
 I think the comment in there (functions.cpp:922) is wrong. The frontend
 seems to try hard to make sure instantiatingModule is a non-root module
 if possible. That should mean LDC is not emitting templates that have a
 non-root module instantiating them somewhere.

This logic was adapted from what I gathered from discussions when 2.064
(I think) came out. If you look at FuncDeclaration::toObjFile in DMD
2.064.2, you'll see that it uses the same logic to determine whether to
emit a certain symbol. In more recent versions, Kenji has moved the
check into the frontend at our (GDC/LDC) request, but it is still
fundamentally the same (FuncDeclaration::needsCodegen,
https://github.com/D-Programming-Language/dmd/pull/3107).

 The idea probably is that you shouldn't need to emit functions again if
 they were already emitted into a library you import and link. (if that's
 desired, the correct fix is probably to require -lcurl when linking
 phobos...)

The idea is instead that functions that are already part of an *object
file* you need to link anyway should not be emitted again. This is a
sound design, as long as you only omit template instances that you know
are already required by somebody else in your dependency graph
(ignoring cycles for the moment).

Now, obviously std.net.curl isn't in the import graph of std.stdio.
What seems to happen here is that std.net.curl only contains the symbol
by accident, even though we thought it was going to be provided by
somebody else. And as we don't build with symbol-per-section and
--gc-sections yet, this of course causes us to also pull in the libcurl
dependencies.

Thinking about this a bit, it seems very plausible that the compiler
actually works as intended here. This suggests that a possible fix
would be to split off everything that depends on curl into a separate
static library, as this would guarantee that the linker looks up the
object files from the non-curl modules first (but then, of course, we'd
either have to specify libphobos-ldc twice, or use the GNU ld grouping
options, to get std.net.curl to link with its Phobos dependencies).

Best,
David
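
A rough Python sketch of why the split could help, and why libphobos-ldc would then have to be listed twice (or grouped). It assumes a simplified linker that searches each archive on the command line once, in order, and never goes back; this is a model, not GNU ld's actual implementation. The archive contents and the names `std_other.o`, `writeln`, `tmpl_instance`, `curl_get`, `other_helper` and `curl_easy_init` are hypothetical.

```python
def link_archives(needed, archives):
    """Search each archive once, in command-line order (simplified model)."""
    undefined, defined, pulled = set(needed), set(), []
    for archive in archives:
        progress = True
        while progress:  # rescan this archive until nothing new is pulled
            progress = False
            for member, defs, undefs in archive:
                if member in pulled or not (defs & undefined):
                    continue
                pulled.append(member)
                defined |= defs
                undefined = (undefined | undefs) - defined
                progress = True
    return pulled, undefined


# Hypothetical split: everything curl-related lives in its own archive.
core = [
    ("std_stdio.o", {"writeln"}, {"tmpl_instance"}),
    ("std_string.o", {"tmpl_instance"}, set()),
    ("std_other.o", {"other_helper"}, set()),
]
curl_lib = [
    ("std_net_curl.o", {"curl_get", "tmpl_instance"},
     {"other_helper", "curl_easy_init"}),
]

# A program that only uses std.stdio never touches the curl archive.
no_curl = link_archives({"writeln"}, [core, curl_lib])

# A program that does use curl needs the core archive a second time.
once = link_archives({"writeln", "curl_get"}, [core, curl_lib])
twice = link_archives({"writeln", "curl_get"}, [core, curl_lib, core])

print(no_curl)
print(once)
print(twice)
```

In this model the template instance is always resolved from the core archive first. Listing the core archive only once leaves `other_helper` unresolved for curl users; listing it again resolves it, and the remaining `curl_easy_init` is what -lcurl would provide.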
Jun 09 2014
Christian Kamm <kamm incasoftware.de> writes:
On 09.06.2014 13:09, David Nadlinger via digitalmars-d-ldc wrote:
 On 9 Jun 2014, at 10:44, Christian Kamm via digitalmars-d-ldc wrote:
 I think the comment in there (functions.cpp:922) is wrong. The frontend
 seems to try hard to make sure instantiatingModule is a non-root module
 if possible. That should mean LDC is not emitting templates that have a
 non-root module instantiating them somewhere.

 This logic was adapted from what I gathered from discussions when 2.064
 (I think) came out. If you look at FuncDeclaration::toObjFile in DMD
 2.064.2, you'll see that it uses the same logic to determine whether to
 emit a certain symbol.

Yes, I saw. I just think the comment is misleading: it says
"Skip generating code if this part of a TemplateInstance that is
instantiated only by non-root modules"
but actually it seems to skip instances that have any non-root module
instantiating them. I'll make a pull request to fix it.

 The idea probably is that you shouldn't need to emit functions again if
 they were already emitted into a library you import and link. (if that's
 desired, the correct fix is probably to require -lcurl when linking
 phobos...)

 The idea is instead that functions that are already part of an *object
 file* you need to link anyway should not be emitted again. This is a
 sound design, as long as you only omit template instances that you know
 are already required by somebody else in your dependency graph
 (ignoring cycles for the moment).

Okay. Aside: how does it deal with cycles? Wouldn't no instance be
emitted if two modules both instantiate the same function and include
each other? (in practice both were emitted for me)

 Thinking about this a bit, it seems very plausible that the compiler
 actually works as intended here.

Agreed. In dmd's libphobos2, array_10f_5e7.o and array_187_86f.o use
the symbol while only object_5_50d.o defines it. Why doesn't dmd's
stdio use it?

Regards,
Christian
Jun 09 2014
David Nadlinger via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> writes:
Hi Christian,

On 9 Jun 2014, at 18:28, Christian Kamm via digitalmars-d-ldc wrote:
 I'll make a pull request to fix it.

Yes, that would be great.

 Okay. Aside: how does it deal with cycles? Wouldn't no instance be
 emitted if two modules both instantiate the same function and include
 each other? (in practice both were emitted for me)

If ti->instantiatingModule (the module you would like to pull the
symbol in from) itself also imports at least one of the root modules,
then importsRoot will be true, and the symbol will still be defined.
Exactly the same situation will occur (with m/mi swapped) when building
what currently is ti->instantiatingModule, so you end up emitting that
template into both modules, as you observed.

On a somewhat unrelated note, the use of "insearch" in that piece of
code is a beautiful example of DMD's … uhm … pasta-inspired design.

Best,
David
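
The rule as described here can be modelled in a few lines of Python. This is a simplified sketch of the decision, not the actual DMD or LDC code: the function names, the import graphs and the module names `a` and `b` are made up, and treating the import check as transitive with a `seen` set (which I take to be the job of `insearch`) is an assumption.

```python
def imports(module, target, import_graph, seen=None):
    """True if `module` reaches `target` through its imports (transitively).

    `seen` guards against import cycles; that it plays the role of DMD's
    `insearch` flag is an assumption of this sketch.
    """
    seen = set() if seen is None else seen
    if module in seen:
        return False
    seen.add(module)
    for dep in import_graph.get(module, ()):
        if dep == target or imports(dep, target, import_graph, seen):
            return True
    return False


def needs_codegen(instantiating_module, roots, import_graph):
    """Simplified model of the emission rule discussed in this thread."""
    if instantiating_module in roots:
        return True  # instantiated by a module that is being compiled now
    # Non-root instantiator: skip the instance unless that module imports
    # one of the roots (the importsRoot case).
    return any(imports(instantiating_module, root, import_graph)
               for root in roots)


# Hypothetical import graphs, for illustration only.
acyclic = {"std.stdio": ["std.bitmanip"], "std.bitmanip": []}
cyclic = {"a": ["b"], "b": ["a"]}

# Compiling std.stdio alone: the instance is attributed to std.bitmanip,
# which does not import std.stdio, so nothing is emitted into stdio.o.
print(needs_codegen("std.bitmanip", {"std.stdio"}, acyclic))

# Two modules importing each other: each compile still emits the instance.
print(needs_codegen("b", {"a"}, cyclic))
print(needs_codegen("a", {"b"}, cyclic))
```

With mutually importing modules, the check succeeds from both sides, which matches the observation that both object files ended up with the instance.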
Jun 10 2014
Christian Kamm <kamm incasoftware.de> writes:
On 10.06.2014 14:09, David Nadlinger via digitalmars-d-ldc wrote:
 Okay. Aside: how does it deal with cycles? Wouldn't no instance be
 emitted if two modules both instantiate the same function and include
 each other? (in practice both were emitted for me)

 If ti->instantiatingModule (the module you would like to pull the
 symbol in from) itself also imports at least one of the root modules,
 then importsRoot will be true, and the symbol will still be defined.
 Exactly the same situation will occur (with m/mi swapped) when building
 what currently is ti->instantiatingModule, so you end up emitting that
 template into both modules, as you observed.

Oh, right! Thanks for clearing that up for me.

Cheers,
Christian
Jun 10 2014
"Kai Nacke" <kai redstar.de> writes:
On Monday, 9 June 2014 at 16:28:11 UTC, Christian Kamm wrote:
 On 09.06.2014 13:09, David Nadlinger via digitalmars-d-ldc wrote:
 Yes, I saw. I just think the comment is misleading: it says
 "Skip generating code if this part of a TemplateInstance that is
 instantiated only by non-root modules"
 but actually it seems to skip instances that have any non-root module
 instantiating them. I'll make a pull request to fix it.

That's a nice summary of what the code does. A pull request would be
great!

Regards,
Kai
Jun 10 2014
Christian Kamm <kamm incasoftware.de> writes:
 Thinking about this a bit, it seems very plausible that the compiler
 actually works as intended here.

For what it's worth, compiling stdio.d with dmd 2.064 also does not
emit object.capacity(). So if dmd's phobos was built the same way as
ldc's, I expect it'd have the same issue.

It seems like we could imitate what dmd -lib does or use --gc-sections
like David suggested to fix this for real.

Could we, as a workaround, reorder the object files in the phobos
library to make std.net.curl come last? Does the static linker work
that way?

It's annoying that this blocks a new ldc version from being released.

Regards,
Christian
Jun 14 2014
Christian Kamm <kamm incasoftware.de> writes:
On 14.06.2014 14:18, Christian Kamm wrote:
 Thinking about this a bit, it seems very plausible that the compiler
 actually works as intended here.

 For what it's worth, compiling stdio.d with dmd 2.064 also does not
 emit object.capacity(). So if dmd's phobos was built the same way as
 ldc's, I expect it'd have the same issue.

 It seems like we could imitate what dmd -lib does or use --gc-sections
 like David suggested to fix this for real.

 Could we, as a workaround, reorder the object files in the phobos
 library to make std.net.curl come last? Does the static linker work
 that way?

 It's annoying that this blocks a new ldc version from being released.

Maybe removing the call to ranlib or using ranlib -D could help? The
order in the archive looks fine (string.o before curl.o).
Jun 14 2014
David Nadlinger via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> writes:
On Sat, Jun 14, 2014 at 2:18 PM, Christian Kamm via digitalmars-d-ldc
<digitalmars-d-ldc puremagic.com> wrote:
 It's annoying that this blocks a new ldc version from being released.

At this point, I think _any_ fix for the issue would be fine. We really
need to get merge-2.064 and merge-2.065 out there.

Unfortunately, I still can't reproduce the issue. The Travis CI docs
say that they run Ubuntu 12.04 LTS, but I couldn't get the linker error
to appear on an EC2 instance I set up from the Canonical AMI.

Best,
David
Jun 14 2014
Christian Kamm <kamm incasoftware.de> writes:
On 14.06.2014 20:28, David Nadlinger via digitalmars-d-ldc wrote:
 Unfortunately, I still can't reproduce the issue. The Travis CI docs
 say that they run Ubuntu 12.04 LTS, but I couldn't get the linker
 error to appear on a EC2 instance I set up from the Canonical AMI.

Is it possible to log into the Travis instances? If it is, looking at
the output and playing around with the static linker flags could help
find a workaround.

I'd be interested in nm -s on that libphobos2.a as well as the order of
files in the archive and the symbols in the object files.

I don't think ranlib -D would change anything - it seems to only force
some meta information to 0, not change the lookup order.

Cheers,
Christian
Jun 14 2014
David Nadlinger via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> writes:
On Sat, Jun 14, 2014 at 8:45 PM, Christian Kamm via digitalmars-d-ldc
<digitalmars-d-ldc puremagic.com> wrote:
 I'd be interested in nm -s on that libphobos2.a as well as the order of
 files in the archive and the symbols in the object files.

You can get at the output by simply submitting a pull request that adds
the "nm -s" command to .travis.yml. Directly logging in is not
possible, as far as I'm aware.

David
Jun 14 2014
"Kai Nacke" <kai redstar.de> writes:
On Saturday, 14 June 2014 at 12:18:54 UTC, Christian Kamm wrote:
 Thinking about this a bit, it seems very plausible that the compiler
 actually works as intended here.

 For what it's worth, compiling stdio.d with dmd 2.064 also does not
 emit object.capacity(). So if dmd's phobos was built the same way as
 ldc's, I expect it'd have the same issue.

Yes, but the std.array.Appender() ctor which calls object.capacity() is
also not emitted by dmd 2.064. This ctor is called nowhere, so it looks
like we are emitting too much code. The ctor is:

pure nothrow ref @safe std.array.Appender!(immutable(char)[]).Appender std.array.Appender!(immutable(char)[]).Appender.__ctor(char[])

Regards,
Kai
Jun 15 2014