
digitalmars.D - DMD's codeview types for arrays

reply Jascha Wetzel <jascha mainia.de> writes:
Hi,

examining the codeview symbols that DMD for windows generates, i've
found that it sets the symbol type of arrays to 0x23 which is UQUAD
(ulong in D). I can't find information on what the array's element type is.
Can it be determined from the CV data?
If not, shouldn't that be possible?
It would be kind of cumbersome to have to parse the source for that.

/jascha
Jan 16 2007
parent reply Walter Bright <newshound digitalmars.com> writes:
Jascha Wetzel wrote:
 Hi,
 
 examining the codeview symbols that DMD for windows generates, i've
 found that it sets the symbol type of arrays to 0x23 which is UQUAD
 (ulong in D). I can't find information on what the array's element type is.
 Can it be determined from the CV data?
 If not, shouldn't that be possible?
 It would be kind of cumbersome to have to parse the source for that.
I tried telling CV that arrays were really: struct Array { size_t length; void *ptr; } but it would get hopelessly lost with functions that returned arrays.
Jan 17 2007
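Editorial sketch (not from the original posters): on 32-bit x86, the 8 bytes that CV labels UQUAD are the length/ptr pair Walter describes above. The Python below only illustrates how a D-aware debugger could split such a value. The function name split_dynamic_array is made up for this example, and the sample value is the ulong shown for _TMP2 in the ddbg session posted later in this thread, which looks like such a pair.

```python
import struct

def split_dynamic_array(raw):
    """Split the 8 raw bytes of a 32-bit D dynamic array into (length, ptr)."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    # Little-endian layout: the low dword is the length, the high dword the
    # pointer, matching struct Array { size_t length; void *ptr; } on x86.
    length, ptr = struct.unpack("<II", raw)
    return length, ptr

# The same 8 bytes, as a debugger would show them when the type is UQUAD.
raw = struct.pack("<Q", 0x0012FF5400000001)
length, ptr = split_dynamic_array(raw)
print("length=%d ptr=0x%08x" % (length, ptr))
```

Read this way, the value is an array of length 1 whose data lives at 0x0012ff54. What is still missing is the element type, which is the question this thread is about.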
parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
How about using type mangling to add the complete array type to the
symbol name then? The mangled type could be separated from the actual
symbol name by an escape character invalid in D identifiers.
That wouldn't hurt non-D debuggers and would still enable D debuggers to
deal with arrays properly.

Jan 17 2007
parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
thinking about it: why not simply mangle all data symbols just like
function symbols?

Jan 17 2007
parent reply Thomas Kuehne <thomas-dloop kuehne.cn> writes:
Jascha Wetzel wrote on 2007-01-17:
 thinking about it: why not simply mangle all data symbols just like
 function symbols?
Some "data symbols" never have a proper address apart from a register.
"data symbols" in functions don't have a constant address (though they
usually have a constant offset).

Thomas
Jan 17 2007
parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
Codeview doesn't support register symbols, afaik. Therefore local
symbols always have offsets into the stack frame. Debugging optimized
code is problematic either way...
To be precise, i should have written "stack data symbols". DMD mangles
types for global and local functions as well as global data, but doesn't
do so for local (stack-) data.

Thomas Kuehne wrote:
 Jascha Wetzel wrote on 2007-01-17:
 thinking about it: why not simply mangle all data symbols just like
 function symbols?
Some "data symbols" never have a proper address apart from a register. "data symbols" in functions don't have a constant address (though they usually have a constant offset). Thomas
Jan 17 2007
parent reply Walter Bright <newshound digitalmars.com> writes:
Jascha Wetzel wrote:
 Codeview doesn't support register symbols, afaik.
It does.
 Therefore local
 symbols always have offsets into the stack frame. Debugging optimized
 code is problematic either way...
 To be precise, i should have written "stack data symbols". DMD mangles
 types for global and local functions as well as global data, but doesn't
 do so for local (stack-) data.
The problem really is for function return types.
Jan 17 2007
parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
Walter Bright wrote:
 The problem really is for function return types.
Does that affect the naming of local symbols?
The CV types could be left the way they are and yet the complete array
type could be derived from the mangled name.
In fact, i'm manually emulating this ATM by using variable names with
mangled type information in my debuggee's source:

void debugeeFunc()
{
    // char[] test;
    char[] _D4testAa;
    // ...
}

the generated CV symbol looks like this:

S_BPREL32 UQUAD [ebp+FFFFFFF8] _D4testAa

It's fine to have it declared UQUAD, since with the mangled type info in
the symbol name i can interpret the data at runtime properly.
Jan 17 2007
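Editorial sketch (not from the original posters): the idea above is to recover the local's name and type from a symbol name such as _D4testAa, and to read the frame offset FFFFFFF8 as a signed value. The Python below is a rough illustration under stated assumptions. It covers only a small subset of D's type codes ('A' for dynamic array, 'P' for pointer, a few basic types), and the helper names demangle_type, split_symbol and signed32 are invented for this example. It is not ddbg's actual code.

```python
import re

# Small subset of D's basic type codes; anything else is rejected.
BASIC_TYPES = {"v": "void", "a": "char", "i": "int", "k": "uint",
               "l": "long", "m": "ulong", "f": "float", "d": "double"}

def demangle_type(code):
    """Decode a type made of 'A' (dynamic array), 'P' (pointer) and basic codes."""
    if not code:
        raise ValueError("empty type code")
    head, rest = code[0], code[1:]
    if head == "A":
        return demangle_type(rest) + "[]"
    if head == "P":
        return demangle_type(rest) + "*"
    if head in BASIC_TYPES and not rest:
        return BASIC_TYPES[head]
    raise ValueError("unsupported type code: " + code)

def split_symbol(name):
    """Split '_D<len><ident><type>' into (ident, type); None if not in that form."""
    m = re.match(r"_D(\d+)", name)
    if m is None:
        return None
    count = int(m.group(1))
    start = m.end()
    ident, code = name[start:start + count], name[start + count:]
    if len(ident) != count or not code:
        return None
    return ident, demangle_type(code)

def signed32(value):
    """Reinterpret an unsigned 32-bit frame offset as a signed one."""
    return value - (1 << 32) if value & 0x80000000 else value

print(split_symbol("_D4testAa"))    # name and type of the local from the post
print(split_symbol("_D4argsAAa"))
print(signed32(0xFFFFFFF8))         # offset from [ebp+FFFFFFF8]
```

A plain name such as "index" yields None, so a debugger using this approach could fall back to the CV type for symbols that carry no mangled type.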
parent reply Walter Bright <newshound digitalmars.com> writes:
Jascha Wetzel wrote:
 Walter Bright wrote:
 The problem really is for function return types.
 Does that affect the naming of local symbols?
 The CV types could be left the way they are and yet the complete array
 type could be derived from the mangled name.
 In fact, i'm manually emulating this ATM by using variable names with
 mangled type information in my debuggee's source:

 void debugeeFunc()
 {
     // char[] test;
     char[] _D4testAa;
     // ...
 }

 the generated CV symbol looks like this:

 S_BPREL32 UQUAD [ebp+FFFFFFF8] _D4testAa

 It's fine to have it declared UQUAD, since with the mangled type info in
 the symbol name i can interpret the data at runtime properly.
That's possible. What debugger are you using?
Jan 17 2007
parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
Walter Bright wrote:
 That's possible. What debugger are you using?
I'm writing it ;)
Jan 17 2007
parent reply Walter Bright <newshound digitalmars.com> writes:
Jascha Wetzel wrote:
 Walter Bright wrote:
 That's possible. What debugger are you using?
I'm writing it ;)
Ahh, I see now.
Jan 17 2007
parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
Walter Bright wrote:
 Jascha Wetzel wrote:
 Walter Bright wrote:
 That's possible. What debugger are you using?
I'm writing it ;)
Ahh, I see now.
I've put together an alpha version of my debugger, that uses mangled
types, if available, in all symbol names to interpret the data at runtime:

http://mainia.de/ddbg-0.1-alpha.zip

Included is a test program that declares some of its variables with
mangled typenames, as i explained before.
Some array expression evaluation already works if enough type information
is available. Slicing only works for char arrays so far.

Here is an example debug session with the files from the above archive:

C:>ddbg.exe debuggee.exe
Process started
ntdll.dll loaded
KERNEL32.dll loaded
USER32.dll loaded
GDI32.dll loaded
Unknown breakpoint hit at 0x7C901230
->lsm
src\debuggee.d
->bp deb:45
Breakpoint set: src\debuggee.d:45 0x40217c
->r
IMM32.dll loaded
ADVAPI32.dll loaded
RPCRT4.dll loaded
LPK.dll loaded
USP10.dll loaded
msvcrt.dll loaded
Breakpoint 0 hit
src\debuggee.d:45 0x40217c if ( _D4argsAAa.length > 1 )
->ov
src\debuggee.d:48 0x40219d writefln("No arguments today, boring...");
->
No arguments today, boring...
src\debuggee.d:50 0x4021b6 uint[] _D4testGk = [0xdeadbeef, 0xbaadf00d, 0xf00baaa];
->
src\debuggee.d:51 0x4021db int index = 1;
->
src\debuggee.d:52 0x4021e2 uint num_chars = printArgs(_D4argsAAa);
->in
src\debuggee.d:24 0x402080 uint printArgs(char[][] _D4argsAAa)
->bp deb:38
Breakpoint set: src\debuggee.d:38 0x402157
->r
Documents\working\ddbg\debuggee.exe
Breakpoint 1 hit
src\debuggee.d:38 0x402157 qwer q = new asdf;
->lsv
Scope: uint debuggee.printArgs(char[][])
char[][] args [ebp+8] = ["C:\Documents and Settings\jascha\My Documents\working\ddbg\debuggee.exe"]
uint numchars [ebp-72] = 0x0000004f
char[] test [ebp-64] = "asdfqwer1234"
float[] ztui [ebp-56] = [234.657806]
void* _TMP1 [ebp-48] = 0x0012ff54
ulong _TMP2 [ebp-40] = 0x0012ff5400000001
int i [ebp-32] = 1
char[] a [ebp-24] = "C:\Documents and Settings\jascha\My Documents\working\ddbg\debuggee.exe"
char[] numstr [ebp-16] = "0"
[ebp-8] = Symbol q has unknown type (ddl says: [custom: 0x1003] q [bp-8])
->= ztui[0]
234.657806
->= args[0][0..12]
"C:\Documents"
->da
src\debuggee.d:38 qwer q = new asdf;
00402157: 6808414100 push dword 0x414108
0040215c: e8e3010000 call 0x402344
00402161: 8945f8 mov [ebp-0x8], eax
src\debuggee.d:39 q.test2();
00402164: 8b18 mov ebx, [eax]
00402166: ff5318 call dword near [ebx+0x18]
src\debuggee.d:40 return numchars;
00402169: 8b45b8 mov eax, [ebp-0x48]
0040216c: 83c404 add esp, 0x4
src\debuggee.d:41 }
0040216f: 5f pop edi
00402170: 5e pop esi
00402171: 5b pop ebx
00402172: c9 leave
00402173: c20800 ret 0x8
->q
Jan 21 2007
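Editorial sketch (not from the original posters): once the element type is known, printing or slicing an array as in the session above comes down to reading the length/ptr pair from the stack slot and then reading the elements. The Python below simulates this with a fake block of memory. The class FakeMemory, the function read_array, the base address and the memory layout are all invented for illustration; only the sample values ("asdfqwer1234", 234.657806) are taken from the session. How ddbg actually does this is not shown in the thread.

```python
import struct

# struct format characters for a few element types (32-bit x86, little-endian).
ELEMENT_FORMATS = {"char": "c", "int": "i", "uint": "I", "float": "f"}

class FakeMemory:
    """Stand-in for the debuggee's address space: one block at a base address."""
    def __init__(self, base, data):
        self.base, self.data = base, data

    def read(self, address, size):
        start = address - self.base
        if start < 0 or start + size > len(self.data):
            raise ValueError("read outside of mapped block")
        return self.data[start:start + size]

def read_array(mem, slot_address, elem_type, lo=0, hi=None):
    """Read elements [lo, hi) of the array whose length/ptr pair is at slot_address."""
    length, ptr = struct.unpack("<II", mem.read(slot_address, 8))
    hi = length if hi is None else hi
    if not 0 <= lo <= hi <= length:
        raise IndexError("slice out of bounds")
    elem_fmt = ELEMENT_FORMATS[elem_type]
    elem_size = struct.calcsize("<" + elem_fmt)
    raw = mem.read(ptr + lo * elem_size, (hi - lo) * elem_size)
    if elem_type == "char":
        return raw.decode("utf-8")
    return list(struct.unpack("<%d%s" % (hi - lo, elem_fmt), raw))

# Hypothetical layout: two 8-byte array slots followed by the array data.
BASE = 0x0012FF00
mem = FakeMemory(BASE,
                 struct.pack("<II", 12, BASE + 16)     # slot for a char[]
                 + struct.pack("<II", 1, BASE + 28)    # slot for a float[]
                 + b"asdfqwer1234"
                 + struct.pack("<f", 234.657806))

print(read_array(mem, BASE, "char"))         # whole char[]
print(read_array(mem, BASE, "char", 0, 4))   # slice [0..4]
print(read_array(mem, BASE + 8, "float"))    # float[] with one element
```

Bounds are checked against the stored length before any element is read, so a bad slice expression fails cleanly instead of reading unrelated memory.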
next sibling parent reply Bill Baxter <dnewsgroup billbaxter.com> writes:
Excellent!  Keep up the good work!
So how far are you from having a fully usable debugger that works with 
plain, unmangled D source?

--bb

Jan 21 2007
parent Jascha Wetzel <"[firstname]" mainia.de> writes:
Bill Baxter wrote:
 Excellent!  Keep up the good work!
 So how far are you from having a fully usable debugger that works with
 plain, unmangled D source?
thx! that depends on what you think is usable ;)
you can use the alpha version with unmangled code, it will not
pretty-print arrays (shows them as ulong and you'll have to use memory
dumps to see the content), and thus array (debugger-)expressions can't be
evaluated either, but it should work otherwise.

besides stability, missing features are:
- CV custom types (i.e. struct/classes cannot be interpreted)
- debugging multiple threads isn't tested, could work though
- debugging child processes isn't tested and will probably not work
- attaching to running processes
- considering nested scopes (when unwinding the stack)
- more important stuff, that i don't think of right now
- conditional breakpoints
- module names for addresses outside the debug info
- lots of usability features (watchlists, etc.)

i'm also writing codeblocks integration, which isn't working quite right,
yet.
Jan 22 2007
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Jascha Wetzel wrote:
 Walter Bright wrote:
 Jascha Wetzel wrote:
 Walter Bright wrote:
 That's possible. What debugger are you using?
I'm writing it ;)
Ahh, I see now.
I've put together an alpha version of my debugger, that uses mangled types, if available, in all symbol names to interpret the data at runtime: http://mainia.de/ddbg-0.1-alpha.zip
Can you please put together a web page for this? It's too important to be just a link to a zip file buried in a thread!
Jan 22 2007
parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
Walter Bright wrote:
 Can you please put together a web page for this? It's too important to
 be just a link to a zip file buried in a thread!
done: http://ddbg.mainia.de/

shall i open a bugzilla issue for the type mangling of local symbols?
Jan 22 2007
parent reply Walter Bright <newshound digitalmars.com> writes:
Jascha Wetzel wrote:
 Walter Bright wrote:
 Can you please put together a web page for this? It's too important to
 be just a link to a zip file buried in a thread!
done: http://ddbg.mainia.de/ shall i open a bugzilla issue for the type mangling of local symbols?
Sure, but I suggest a better way - add our own extension to CV data.
Jan 22 2007
parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
Walter Bright wrote:
 Sure, but I suggest a better way - add our own extension to CV data.
ok, i can think of type leafs similar to this (notation according to CV
specs):

size  content
2     LF_DYN_ARRAY (define as 0x0017)
2     elemtype

and

size  content
2     LF_ASSOC_ARRAY (define as 0x0018)
2     elemtype
2     keytype

couldn't that also trigger the problem with functions returning such
types?
Jan 23 2007
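Editorial sketch (not from the original posters): the two proposed leaves are small fixed-size records, so writing and reading them is straightforward. The Python below packs and parses them as little-endian 16-bit fields. The leaf ids 0x0017 and 0x0018 are the ones proposed in the post above and are not taken from any published CV specification. The function names are invented, and the type indices in the usage lines (0x0023 for UQUAD, 0x1003 for a custom type) are simply values that appear elsewhere in this thread.

```python
import struct

# Leaf ids as proposed in the post above (a proposal, not a published standard).
LF_DYN_ARRAY = 0x0017
LF_ASSOC_ARRAY = 0x0018

def pack_dyn_array(elemtype):
    """2 bytes leaf id, 2 bytes element type index."""
    return struct.pack("<HH", LF_DYN_ARRAY, elemtype)

def pack_assoc_array(elemtype, keytype):
    """2 bytes leaf id, 2 bytes element type index, 2 bytes key type index."""
    return struct.pack("<HHH", LF_ASSOC_ARRAY, elemtype, keytype)

def parse_leaf(data):
    """Return a description of a D array leaf, or None for any other leaf id."""
    (leaf,) = struct.unpack_from("<H", data, 0)
    if leaf == LF_DYN_ARRAY:
        (elemtype,) = struct.unpack_from("<H", data, 2)
        return {"kind": "dynamic", "elemtype": elemtype}
    if leaf == LF_ASSOC_ARRAY:
        elemtype, keytype = struct.unpack_from("<HH", data, 2)
        return {"kind": "assoc", "elemtype": elemtype, "keytype": keytype}
    return None

print(parse_leaf(pack_dyn_array(0x0023)))
print(parse_leaf(pack_assoc_array(0x0023, 0x1003)))
```

A debugger that does not know these leaf ids would get None from parse_leaf and could treat the type as unknown.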
parent reply Walter Bright <newshound digitalmars.com> writes:
Jascha Wetzel wrote:
 Walter Bright wrote:
 Sure, but I suggest a better way - add our own extension to CV data.
 ok, i can think of type leafs similar to this (notation according to CV
 specs):

 size  content
 2     LF_DYN_ARRAY (define as 0x0017)
 2     elemtype

 and

 size  content
 2     LF_ASSOC_ARRAY (define as 0x0018)
 2     elemtype
 2     keytype

 couldn't that also trigger the problem with functions returning such
 types?
No, because you can fix your debugger. What I'd do is the same thing I do with gdb, -g generates D debug info and needs a debugger aware of it, -gc generates C debug info for debuggers that aren't. In this case, the C debug info will not change.
Jan 23 2007
next sibling parent reply Lars Ivar Igesund <larsivar igesund.net> writes:
Walter Bright wrote:

 Jascha Wetzel wrote:
 Walter Bright wrote:
 Sure, but I suggest a better way - add our own extension to CV data.
 ok, i can think of type leafs similar to this (notation according to CV
 specs):

 size  content
 2     LF_DYN_ARRAY (define as 0x0017)
 2     elemtype

 and

 size  content
 2     LF_ASSOC_ARRAY (define as 0x0018)
 2     elemtype
 2     keytype

 couldn't that also trigger the problem with functions returning such
 types?
No, because you can fix your debugger. What I'd do is the same thing I do with gdb, -g generates D debug info and needs a debugger aware of it,
Except that the stack backtrace is broken using -g (although it is possible
to find _some_ linenumbers). Dropping -g works much better, although no
line numbers are present.

For reference, http://d.puremagic.com/issues/show_bug.cgi?id=136

--
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
Dancing the Tango
Jan 24 2007
parent reply "[firstname]" <"[firstname]" mainia.de> writes:
Lars Ivar Igesund wrote:
 Walter Bright wrote:
 Except that the stack backtrace is broken using -g (although it is possible
 to find _some_ linenumbers). Dropping -g works much better, although no
 line numbers are present.
this problem is probably elf/dwarf related. the codeview stuff is win32 only.
Jan 24 2007
parent Lars Ivar Igesund <larsivar igesund.net> writes:
[firstname] wrote:

 Lars Ivar Igesund wrote:
 Walter Bright wrote:
 Except that the stack backtrace is broken using -g (although it is
 possible to find _some_ linenumbers). Dropping -g works much better,
 although no line numbers are present.
this problem is probably elf/dwarf related. the codeview stuff is win32 only.
True, this is a DMD for Linux problem. I thought it was suggested earlier
in the thread, but possibly not :)

--
Lars Ivar Igesund
blog at http://larsivi.net
DSource & #D: larsivi
Dancing the Tango
Jan 24 2007
prev sibling parent Jascha Wetzel <jascha mainia.de> writes:
Walter Bright wrote:
 No, because you can fix your debugger. What I'd do is the same thing I 
 do with gdb, -g generates D debug info and needs a debugger aware of it, 
 -gc generates C debug info for debuggers that aren't. In this case, the 
 C debug info will not change.
ok, great! so, are the custom type leafs the kind of extension you have in mind? i'd prepare my code for that, then.
Jan 24 2007