digitalmars.D - linker wrapper

"Steven Schveighoffer" <schveiguy yahoo.com> writes:
Just saw another linker error in d.learn, and it got me thinking.

dmd just calls the linker, and the linker spits out link errors.  But what
if we had a 'linker wrapper' program which translated mangled names into
demangled names?  It would at least help people understand the problem
better.  How many times does a newbie come back and say "I have this
problem, and dmd spits out some weird message I don't understand" and it
takes a person who half-speaks mangled D names to understand what the name
is?

Given that D already includes a demangler, wouldn't it be rather trivial  
to write this program in D?
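
For illustration, here is a minimal sketch of the core of such a wrapper, written in Python rather than D only to keep it short. It is not a full demangler: it recovers the dotted name from the length-prefixed identifiers at the start of a symbol and ignores the type information that follows. The names `qualified_name` and `translate_line`, and the regular expression used to spot symbols, are assumptions made for this sketch.

```python
import re

# Assumption: a D symbol in linker output is "_D" followed by a digit
# and then identifier characters.
MANGLED = re.compile(r"_D[0-9][A-Za-z0-9_]*")
DIGITS = "0123456789"


def qualified_name(symbol):
    """Return the dotted name at the start of a D mangled symbol, or None."""
    if not symbol.startswith("_D"):
        return None
    pos, parts = 2, []
    # Each identifier is stored as <decimal length><characters>.
    while pos < len(symbol) and symbol[pos] in DIGITS:
        end = pos
        while end < len(symbol) and symbol[end] in DIGITS:
            end += 1
        length = int(symbol[pos:end])
        name = symbol[end:end + length]
        if length == 0 or len(name) < length:
            return None  # malformed, truncated or compressed symbol
        parts.append(name)
        pos = end + length
    return ".".join(parts) if parts else None


def translate_line(line):
    """Annotate every mangled D symbol found in one line of linker output."""
    def annotate(match):
        name = qualified_name(match.group(0))
        # Leave the symbol untouched when it cannot be decoded.
        if name is None:
            return match.group(0)
        return f"{name} [{match.group(0)}]"
    return MANGLED.sub(annotate, line)
```

A real wrapper would run the linker as a child process, pass each line of its output through something like `translate_line`, and exit with the linker's status.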

I know it would make my life a bit better.

Thoughts?  Takers?

-Steve
Nov 11 2010
"Denis Koroskin" <2korden gmail.com> writes:
On Thu, 11 Nov 2010 15:54:50 +0300, Steven Schveighoffer
<schveiguy yahoo.com> wrote:

 Just saw another linker error in d.learn, and it got me thinking

 dmd just calls the linker, and the linker spits out link errors.  But
 what if we had a 'linker wrapper' program which translated mangled names
 into demangled names?  It would at least help people understand the
 problem better.  How many times does a newbie come back and say "I have
 this problem, and dmd spits out some weird message I don't understand"
 and it takes a person who half-speaks mangled d names to understand what
 the name is.

 Given that D already includes a demangler, wouldn't it be rather trivial
 to write this program in D?

 I know it would make my life a bit better.

 Thoughts?  Takers?

 -Steve

I suggested that previously [1], and I think that needs to be a part of DMD for a simple reason: some of the names can't be demangled because they are somewhat "hashed" to avoid limitations such as long literal names. E.g. "_D4math6Mбrix9х44F32__T3addTCЂ–ЎZЂ„ќFЂ—ќЂ˜ґЂ—˜".

Given that dmd calls a linker internally, it could also retrieve linker errors (if any are present), translate them and then show them, with a list of suggestions to fix the problem if possible. Here is an example:

    module test1;
    void foo() {}

    module test2;
    import test1;
    void main() { foo(); }

    #dmd test2.d

Desired output: Error: No implementation found for method void foo() defined in module test1. Try linking with test1.d
Actual output: Error 42: Symbol Undefined _D5test13fooFZv

[1] http://d.puremagic.com/issues/show_bug.cgi?id=2238
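
The translation from the actual output to something close to the desired output can be sketched as follows, in Python for brevity. This is only an illustration under several assumptions: that the linker message always has the "Error 42: Symbol Undefined" form shown above, that every identifier except the last one names the module (not true for members of structs or classes), and that the module maps to a source file of the same name. The function names `split_identifiers` and `explain` are made up for this sketch, and the function type ("void foo()") is not recovered.

```python
import re

# Assumption: undefined symbols are reported exactly as in the
# "Actual output" line of the example above.
UNDEFINED = re.compile(r"Error 42: Symbol Undefined (_D\S+)")
# The decimal length that precedes each identifier.
LENGTH = re.compile(r"[0-9]+")


def split_identifiers(symbol):
    """Split the leading <length><name> pairs of a D symbol into names."""
    parts, pos = [], 2  # skip the "_D" prefix
    while True:
        m = LENGTH.match(symbol, pos)
        if m is None:
            return parts
        length = int(m.group(0))
        name = symbol[m.end():m.end() + length]
        if length == 0 or len(name) < length:
            return parts  # stop at anything that does not parse
        parts.append(name)
        pos = m.end() + length


def explain(line):
    """Turn an undefined-symbol line into a hint; pass other lines through."""
    m = UNDEFINED.search(line)
    if m is None:
        return line
    parts = split_identifiers(m.group(1))
    if len(parts) < 2:
        return line
    # Guess: everything before the last identifier is the module name.
    module = ".".join(parts[:-1])
    source = "/".join(parts[:-1]) + ".d"
    return (f"Error: No implementation found for {parts[-1]} "
            f"defined in module {module}. Try linking with {source}")
```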
Nov 11 2010
bearophile <bearophileHUGS lycos.com> writes:
Denis Koroskin:

 Given that dmd calls a linker internally, it could also retrieve linker  
 errors (if any present), translate and then show them, with a list of  
 suggestions to fix the problem if possible. Here are an example:
 
 module test1;
 void foo() {}
 
 module test2;
 import test1;
 void main() { foo(); }
 
 #dmd test2.d
 
 Desired output: Error: No implementation found for method void foo()  
 defined in module test1. Try linking with test1.d
 Actual output: Error 42: Symbol Undefined _D5test13fooFZv
 
 [1] http://d.puremagic.com/issues/show_bug.cgi?id=2238

I have just suggested a similar error message in the D.learn newsgroup. Of course, the compiler could also do the damn thing by itself and find the module it needs (this feature could be disabled with a compiler switch, for larger compilations, do-it-yourself people, etc.).

Bye,
bearophile
Nov 11 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/11/10 5:32 AM, Steven Schveighoffer wrote:
 On Thu, 11 Nov 2010 08:20:46 -0500, Denis Koroskin <2korden gmail.com>
 wrote:

 On Thu, 11 Nov 2010 15:54:50 +0300, Steven Schveighoffer
 <schveiguy yahoo.com> wrote:

 Just saw another linker error in d.learn, and it got me thinking

 dmd just calls the linker, and the linker spits out link errors. But
 what if we had a 'linker wrapper' program which translated mangled
 names into demangled names? It would at least help people understand
 the problem better. How many times does a newbie come back and say "I
 have this problem, and dmd spits out some weird message I don't
 understand" and it takes a person who half-speaks mangled d names to
 understand what the name is.

 Given that D already includes a demangler, wouldn't it be rather
 trivial to write this program in D?

 I know it would make my life a bit better.

 Thoughts? Takers?

 -Steve

 I suggested that previously [1], and I think that need to be a part of
 DMD for a simple reason: some of the names can't be demangled because
 their are somewhat "hashed" to avoid limitations such as long literal
 name. E.g. "_D4math6Mбrix9х44F32__T3addTCЂ–ЎZЂ„ќFЂ—ќЂ˜ґЂ—˜".

 Not sure we can do anything about that, if we're only giving dmd object
 files to work with. I think we need to implement something different in
 terms of 'hashing'. It's another idea I've had, but not sure if I've
 expressed it.

 Typically, you have things like this in a module:

 struct Y(T) {}
 struct X(T) { void foo(Y!T y) {...} }

 where this is in std.amodule. The seemingly small foo symbol gets
 exploded to the equivalent of:

 std.amodule.X!int.foo(std.amodule.Y!int)

 I see a lot of repetition in there. I think we can do some kind of
 lossless compression with name mangling so you represent repetitive
 symbols such as module names and type parameters as back references.
 These would also be demangleable.

 Given that dmd calls a linker internally, it could also retrieve
 linker errors (if any present), translate and then show them, with a
 list of suggestions to fix the problem if possible. Here are an example:

 module test1;
 void foo() {}

 module test2;
 import test1;
 void main() { foo(); }

 #dmd test2.d

 Desired output: Error: No implementation found for method void foo()
 defined in module test1. Try linking with test1.d
 Actual output: Error 42: Symbol Undefined _D5test13fooFZv

 This would be cool too. BTW, does anyone know if the linkers used by DMD
 allow options to output easily-parsable errors?

 -Steve

Apparently gnu's ld demangles by default:

http://linux.die.net/man/1/ld

(Search for "demangle".) But it recognizes C++ mangling, not D mangling.

Andrei
Nov 11 2010
Walter Bright <newshound2 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Apparently gnu's ld demangles by default:
 
 http://linux.die.net/man/1/ld
 
 (Search for "demangle".) But it recognizes C++ mangling, not D mangling.

The same for optlink.
Nov 11 2010
Rainer Schuetze <r.sagitario gmx.de> writes:
Denis Koroskin wrote:
 I suggested that previously [1], and I think that need to be a part of 
 DMD for a simple reason: some of the names can't be demangled because 
 their are somewhat "hashed" to avoid limitations such as long literal 
 name. E.g. "_D4math6Mбrix9х44F32__T3addTCЂ–ЎZЂ„ќFЂ—ќЂ˜ґЂ—˜".

This is not a "hashed" symbol, it is compressed and can be converted back to the mangled string. The latest svn version of core.demangle has a function to decompress it.

An example of a "hashed" symbol is

_D3dmd19TemplateDeclaration19TemplateDeclaration27deduceFunctionTemplateMatchMFS3dmd3Loc3LocC3dmECA03C220132A92B4B4768C252AA61AC

It cannot be fully demangled (though just demangling the symbol name without type information should work most of the time and could already be helpful).

I've seen dmd using a third way of dealing with strings longer than 255 characters: instead of a single byte, the length is encoded as 0xff 0x00, LSB, MSB before the symbol characters.

Walter, is optlink able to deal with this 3rd representation in all places, so we can get rid of those other two encodings?

I was also considering adding the demangling to the build output parsing of Visual D, but somehow didn't yet get to it. Please note that sometimes mangled names are also used in error messages produced by dmd.

Rainer
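
The extended-length form mentioned above (0xff 0x00, LSB, MSB before the symbol characters) can be sketched as a pair of helpers, in Python for brevity. This is a guess at the layout based only on that description, not code taken from dmd or optlink. The function names are hypothetical, and switching to the long form at exactly 255 bytes is an assumption made so that a leading 0xff is never ambiguous.

```python
def encode_name(name: bytes) -> bytes:
    """Prefix a symbol name with its length, short or extended form."""
    n = len(name)
    if n < 0xFF:
        # Short form: a single length byte.
        return bytes([n]) + name
    if n > 0xFFFF:
        raise ValueError("name too long for a 16-bit length")
    # Extended form: 0xff 0x00 marker, then the length as LSB, MSB.
    return bytes([0xFF, 0x00, n & 0xFF, n >> 8]) + name


def decode_name(record: bytes) -> bytes:
    """Read back a name written by encode_name."""
    if record[0] != 0xFF:
        length, start = record[0], 1
    else:
        if record[1] != 0x00:
            raise ValueError("unexpected byte after 0xff marker")
        length, start = record[2] | (record[3] << 8), 4
    return record[start:start + length]
```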
Nov 11 2010
Walter Bright <newshound2 digitalmars.com> writes:
Rainer Schuetze wrote:
 
 Denis Koroskin wrote:
 I suggested that previously [1], and I think that need to be a part of 
 DMD for a simple reason: some of the names can't be demangled because 
 their are somewhat "hashed" to avoid limitations such as long literal 
 name. E.g. "_D4math6Mбrix9х44F32__T3addTCЂ–ЎZЂ„ќFЂ—ќЂ˜ґЂ—˜".

 This is not a "hashed" symbol, it is compressed and can be converted
 back to the mangled string. The latest svn version of core.demangle has
 a function to decompress it.

 An example of a "hashed" symbol is

 _D3dmd19TemplateDeclaration19TemplateDeclaration27deduceFunctionTemplateMatchMFS3dmd3Loc3LocC3dmECA03C220132A92B4B4768C252AA61AC

 It cannot be fully demangled (though just demangling the symbol name
 without type information should work most of the time and could already
 be helpful).

 I've seen dmd using a third way dealing with strings longer than 255
 characters: instead of a single byte the length is encoded as 0xff 0x00,
 LSB, MSB before the symbol characters.

 Walter, is optlink able to deal with this 3rd representation in all
 places, so we can get rid of those other two encodings?

The way long symbols on Windows are dealt with:

1. if they fit, they fit
2. try using the extended length method
3. try compressing the string
4. use a hash for the string

1..3 are reversible, 4 is not.
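
The order of the four steps can be sketched as a simple fallback chain, in Python for brevity. Everything concrete in this sketch is an assumption made for illustration: the two limits are made-up parameters, zlib stands in for whatever compression the compiler really uses, and a truncated SHA-256 digest stands in for the real hash. Only the ordering, and the fact that the last step loses information, come from the list above.

```python
import hashlib
import zlib


def choose_encoding(name, plain_limit=254, extended_limit=900):
    """Return (method, payload) for a symbol name given as bytes."""
    # 1. If it fits, it fits.
    if len(name) <= plain_limit:
        return "plain", name
    # 2. Extended length method (hypothetical upper bound).
    if len(name) <= extended_limit:
        return "extended", name
    # 3. Compression; zlib is only a stand-in here.
    packed = zlib.compress(name)
    if len(packed) <= plain_limit:
        return "compressed", packed
    # 4. Hash: keep a readable prefix and append a digest.
    #    This step cannot be undone.
    digest = hashlib.sha256(name).hexdigest()[:32].upper().encode("ascii")
    return "hashed", name[:plain_limit - len(digest)] + digest


def is_reversible(method):
    """Only the hashed form loses the original name."""
    return method != "hashed"
```

A demangling tool can fully recover the first three forms, but for the fourth it can show at most the readable prefix.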
 
 I was also considering adding the demangling to the build output parsing 
 of Visual D, but somehow didn't yet get to it. Please note that 
 sometimes mangled names are also used in error messages produced by dmd.
 
 Rainer
 

Nov 11 2010
Rainer Schuetze <r.sagitario gmx.de> writes:
Walter Bright wrote:
 Rainer Schuetze wrote:
 Walter, is optlink able to deal with this 3rd representation in all 
 places, so we can get rid of those other two encodings?

 The way long symbols on Windows are dealt with:

 1. if they fit, they fit
 2. try using the extended length method
 3. try compressing the string
 4. use a hash for the string

 1..3 are reversible, 4 is not.

Doing it always in this order would be fine, because 2. is bound to never fail. Checking the source (but probably missing some places), I see that

- obj_namestring uses 1,2
- obj_mangle uses 1,3,4,2 with a length limit of 128
- Library::FillDict uses 1,2, but might crash due to limited buffer size (for symbols longer than 468 characters - is this a limitation of the library format?)
- cv_namestring uses 1,2

So I tried compiling ddmd skipping 3 and 4 in obj_mangle, but that caused crashes for symbols longer than IDMAX (=900). Increasing that limit (max length was >5600) caused a crash in optlink.

But as far as I can see, limiting the usage of 3 and 4 to symbols longer than IDMAX seems to work (though I have not done any more testing). Actually, this version is already in the code, but deliberately replaced by the limit of 128 characters, so there might be a reason...

Rainer
Nov 12 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 11 Nov 2010 08:20:46 -0500, Denis Koroskin <2korden gmail.com>  
wrote:

 On Thu, 11 Nov 2010 15:54:50 +0300, Steven Schveighoffer  
 <schveiguy yahoo.com> wrote:

 Just saw another linker error in d.learn, and it got me thinking

 dmd just calls the linker, and the linker spits out link errors.  But  
 what if we had a 'linker wrapper' program which translated mangled  
 names into demangled names?  It would at least help people understand  
 the problem better.  How many times does a newbie come back and say "I  
 have this problem, and dmd spits out some weird message I don't  
 understand" and it takes a person who half-speaks mangled d names to  
 understand what the name is.

 Given that D already includes a demangler, wouldn't it be rather  
 trivial to write this program in D?

 I know it would make my life a bit better.

 Thoughts?  Takers?

 -Steve

 I suggested that previously [1], and I think that need to be a part of
 DMD for a simple reason: some of the names can't be demangled because
 their are somewhat "hashed" to avoid limitations such as long literal
 name. E.g. "_D4math6Mбrix9х44F32__T3addTCЂ–ЎZЂ„ќFЂ—ќЂ˜ґЂ—˜".

Not sure we can do anything about that, if we're only giving dmd object files to work with.

I think we need to implement something different in terms of 'hashing'. It's another idea I've had, but I'm not sure if I've expressed it. Typically, you have things like this in a module:

    struct Y(T) {}
    struct X(T) { void foo(Y!T y) {...} }

where this is in std.amodule. The seemingly small foo symbol gets exploded to the equivalent of:

    std.amodule.X!int.foo(std.amodule.Y!int)

I see a lot of repetition in there. I think we can do some kind of lossless compression with name mangling so you represent repetitive symbols such as module names and type parameters as back references. These would also be demangleable.

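To make the back-reference idea concrete, here is a minimal sketch in Python. The token format is purely hypothetical and is not an actual or proposed D mangling scheme: the first occurrence of an identifier is kept, and each later occurrence is replaced by the index of the first one. Because the table can be rebuilt while reading, the encoding is lossless.

```python
def compress(parts):
    """Replace repeated identifiers with ('ref', index) back references."""
    seen, tokens = {}, []
    for part in parts:
        if part in seen:
            tokens.append(("ref", seen[part]))
        else:
            # Index = order of first appearance.
            seen[part] = len(seen)
            tokens.append(("id", part))
    return tokens


def expand(tokens):
    """Rebuild the original identifier list from compress() output."""
    table, parts = [], []
    for kind, value in tokens:
        if kind == "id":
            table.append(value)
            parts.append(value)
        else:
            parts.append(table[value])
    return parts


# The identifiers of std.amodule.X!int.foo(std.amodule.Y!int), in order.
EXAMPLE = ["std", "amodule", "X", "int", "foo", "std", "amodule", "Y", "int"]
```

In the example, the second `std`, `amodule` and `int` each shrink to a small index, and `expand(compress(EXAMPLE))` gives back the original list.
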
 Given that dmd calls a linker internally, it could also retrieve linker  
 errors (if any present), translate and then show them, with a list of  
 suggestions to fix the problem if possible. Here are an example:

 module test1;
 void foo() {}

 module test2;
 import test1;
 void main() { foo(); }

 #dmd test2.d

 Desired output: Error: No implementation found for method void foo()  
 defined in module test1. Try linking with test1.d
 Actual output: Error 42: Symbol Undefined _D5test13fooFZv

This would be cool too. BTW, does anyone know if the linkers used by DMD allow options to output easily-parsable errors?

-Steve
Nov 11 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 11/11/10 4:54 AM, Steven Schveighoffer wrote:
 Just saw another linker error in d.learn, and it got me thinking

 dmd just calls the linker, and the linker spits out link errors. But
 what if we had a 'linker wrapper' program which translated mangled names
 into demangled names? It would at least help people understand the
 problem better. How many times does a newbie come back and say "I have
 this problem, and dmd spits out some weird message I don't understand"
 and it takes a person who half-speaks mangled d names to understand what
 the name is.

 Given that D already includes a demangler, wouldn't it be rather trivial
 to write this program in D?

 I know it would make my life a bit better.

 Thoughts? Takers?

 -Steve

Would love to see this implemented. Mangled symbols are the main reason for which linker messages are considered incomprehensible.

Andrei
Nov 11 2010
Walter Bright <newshound2 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 Mangled symbols are the main reason 
 for which linker messages are considered incomprehensible.

I beg to differ. The same confusion appears when linking C programs, where the names are not mangled. Top 3 linker questions:

1. What does it mean when it says "foo is referenced but not defined"?
2. What does it mean when it says that "foo is defined in more than one module"?
3. Why is my executable file so large?

While it's nice to demangle the names, and optlink does so for C++ names, it doesn't reduce the confusion about what the linker is doing. Surprisingly, I see these questions not just from newbies, but regularly from people with 10+ years of experience.
Nov 11 2010