
digitalmars.D - SAOC LLDB D integration: 2nd Weekly Update

Luís Ferreira <contact lsferreira.net> writes:
Hi D community!

Sorry for being late this week.

I'm here again to describe what I've done during the second week of
Symmetry Autumn of Code.



Some pieces left over from last week, on the test suite and on the LLDB
side, were finished at the beginning of this week, and I put everything
together.



After successfully integrating the `libiberty` D demangler into LLVM, and
before sending the patches to LLVM, the code style needed to be brought in
line with LLVM's `clang-format` style, so I decided to transform the code
to be more C++-like:

- Move the functions that take a `struct dlang_info` context into a struct,
  making them member functions so the context is passed implicitly
- Make string handling in the demangler a bit more C++-ish (class
  `OutputString`)
- Fix the code style to conform with clang formatting, such as variable
  names, spaces between identifiers, etc.
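
The first two points can be illustrated with a minimal sketch. It is not the code from the patches: `Demangler`, `parseIdentifier` and `parseQualified` are hypothetical names, this `OutputString` is a simplified stand-in, and only plain length-prefixed names (such as `4test`) are handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Simplified, hypothetical stand-in for the OutputString class.
class OutputString {
  std::string Buffer;

public:
  void append(const char *Str, std::size_t Len) { Buffer.append(Str, Len); }
  void appendChar(char C) { Buffer.push_back(C); }
  const std::string &str() const { return Buffer; }
};

// C style (before): the context is an explicit parameter of every function:
//   const char *parse_identifier(string *decl, const char *mangled,
//                                struct dlang_info *info);
// C++ style (after): the context lives in the struct and is passed implicitly.
struct Demangler {
  const char *Str; // the whole mangled symbol, formerly a context field

  explicit Demangler(const char *Mangled) : Str(Mangled) {
    assert(Mangled != nullptr && "mangled symbol must not be null");
  }

  // Parses one length-prefixed name such as "4test" and appends "test".
  // Returns the position after the name, or nullptr on malformed input.
  const char *parseIdentifier(OutputString &Decl, const char *Mangled) const {
    if (!std::isdigit(static_cast<unsigned char>(*Mangled)))
      return nullptr;
    std::size_t Len = 0;
    while (std::isdigit(static_cast<unsigned char>(*Mangled))) {
      Len = Len * 10 + static_cast<std::size_t>(*Mangled - '0');
      ++Mangled;
    }
    if (std::strlen(Mangled) < Len) // the name must fit in what is left
      return nullptr;
    Decl.append(Mangled, Len);
    return Mangled + Len;
  }

  // Demangles the qualified-name part of a symbol, stopping at the first
  // character that does not start a length-prefixed name.
  bool parseQualified(OutputString &Decl) const {
    if (std::strncmp(Str, "_D", 2) != 0)
      return false;
    const char *Cursor = Str + 2;
    bool First = true;
    while (std::isdigit(static_cast<unsigned char>(*Cursor))) {
      if (!First)
        Decl.appendChar('.');
      Cursor = parseIdentifier(Decl, Cursor);
      if (Cursor == nullptr)
        return false;
      First = false;
    }
    return !First; // at least one name is required
  }
};
```

With this sketch, `Demangler("_D4test3foo").parseQualified(Out)` succeeds and leaves `test.foo` in `Out`, without any function taking the context as an argument.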

I also ended up writing documentation for everything inside the string and
demangler structs, for future reference.



Right after finishing the code style work, I submitted the patches to the
LLVM review platform. In the meantime, I'm working towards acceptance and
proactively changing the patches to comply with the LLVM maintainers'
requests.

The first patch introduces the demangler codebase with the ported code,
available [here](https://reviews.llvm.org/D110578). The second patch
enables support for the `llvm-cxxfilt` tool, similar to `c++filt` from GNU
binutils, available [here](https://reviews.llvm.org/D110576). Finally, the
last patch enables the most important part for the users, the LLDB part.
The patch is available [here](https://reviews.llvm.org/D110577).



Meanwhile, I found some things to improve on the `libiberty` side that I
had changed in my patches to LLVM:

- Use appendc for single-character appends:
  [patch](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580512.html).
- Remove parentheses where they are not needed:
  [patch](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580525.html).
- Rename function symbols to be more consistent:
  [patch](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580542.html).
- Use switch instead of if-else:
  [patch](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580545.html).
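
As a rough illustration of two of these cleanups, here is a hedged C++ sketch. It does not reproduce the linked patches, and the functions they touch are not stated here: `appendCallConvention` and `appendArrayType` are hypothetical helpers, and the calling-convention letters are assumed to follow the D ABI name-mangling rules (`F` for D, `U` for C, `W` for Windows, `R` for C++, `Y` for Objective-C).

```cpp
#include <string>

// A switch on the code character replaces a chain of if-else comparisons.
// Returns false when the character is not a known calling-convention code.
bool appendCallConvention(std::string &Decl, char Code) {
  switch (Code) {
  case 'F': // D linkage: nothing is printed
    return true;
  case 'U':
    Decl += "extern(C) ";
    return true;
  case 'W':
    Decl += "extern(Windows) ";
    return true;
  case 'R':
    Decl += "extern(C++) ";
    return true;
  case 'Y':
    Decl += "extern(Objective-C) ";
    return true;
  default:
    return false;
  }
}

// Single characters are appended with push_back, the std::string analogue of
// an appendc-style helper, instead of going through one-character strings.
void appendArrayType(std::string &Decl, const std::string &Element) {
  Decl += Element;
  Decl.push_back('[');
  Decl.push_back(']');
}
```

The switch makes the set of accepted codes visible at a glance, and the `default` branch gives a single place to reject malformed input.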

I also made this patch, which fixes the test suite that I previously broke
with the security patches.

- Add missing format on d-demangle-expected:
  [patch](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580544.html).



I started a thread on the GCC mailing list to encourage more fuzzing.
Currently the demangler is being fuzzed without any heuristics, which makes
the search for real security vulnerabilities inefficient. Instead, AFL and
libFuzzer should be taken into consideration. My idea is to also add
GCC/libiberty to OSS-Fuzz. You can check the thread
[here](https://gcc.gnu.org/pipermail/gcc/2021-September/237442.html) and
participate if you have any questions or suggestions on that topic.
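
For readers unfamiliar with coverage-guided fuzzing, a libFuzzer harness is a single entry point that receives the bytes chosen by the fuzzer. The sketch below is only an assumption of what such a harness could look like: `demangleUnderTest` is a hypothetical stand-in, and a real harness would call the demangler entry point instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the function under test. It only checks for the
// "_D" prefix; a real harness would call the demangler here.
static bool demangleUnderTest(const char *Mangled) {
  return Mangled[0] == '_' && Mangled[1] == 'D';
}

// libFuzzer entry point. One possible build line:
//   clang++ -g -fsanitize=fuzzer,address harness.cpp
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t *Data,
                                      std::size_t Size) {
  // The input is raw bytes, while a demangler expects a NUL-terminated
  // string, so the bytes are copied into a std::string first.
  std::string Input;
  if (Size != 0)
    Input.assign(reinterpret_cast<const char *>(Data), Size);
  demangleUnderTest(Input.c_str());
  return 0; // libFuzzer expects 0 from this function
}
```

The fuzzer then uses coverage feedback to mutate inputs towards unexplored branches, which is the kind of heuristic that the current setup lacks.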

About the exponential time complexity issue, I don't have any news, since I
still don't have the full picture of it. I'm probably not going to dedicate
much time to it, since it's kinda out of the scope of this project.
However, if anyone wants to have a look and discuss hints and suggestions
to improve the current demangler, I would appreciate it.
[Here](http://ipfs.io/ipfs/bafybeihw6bk46r7gnkp6estkwk7ucilxb2swlwzzi2izpytaclypxeu2wq/)
are the blobs generated by the fuzzer for timeout and slow-unit triggers.



For now, I'm going to proactively address the changes requested on the LLVM
patches. The reviewers seem to require smaller patches, so next week will
probably be dedicated to that.

You can also read this on my blog, since my email client doesn't like text
split at 80 columns: https://lsferreira.net/posts/d-saoc-2021-02

-- 
Sincerely,
Luís Ferreira   lsferreira.net
Sep 30 2021
James Blachly <james.blachly gmail.com> writes:
On 9/30/21 7:05 PM, Luís Ferreira wrote:
 The first patch introduces the demangler codebase with the ported code,
 available [here](https://reviews.llvm.org/D110578). The second patch
 enables support for `llvm-cxxfilt` tool, similar to `c++filt` from GNU
 binutils, available [here](https://reviews.llvm.org/D110576). Finally,
 the last patch enables the most important part for the users, the LLDB
 part. The patch is available [here](https://reviews.llvm.org/D110577).
 
Congratulations on getting patches accepted to LLVM! Your LLDB work is incredibly important and exciting -- thank you and keep going!
Sep 30 2021
Luís Ferreira <contact lsferreira.net> writes:
On Thu, 2021-09-30 at 20:38 -0400, James Blachly via Digitalmars-d wrote:
 On 9/30/21 7:05 PM, Luís Ferreira wrote:
 The first patch introduces the demangler codebase with the ported code,
 available [here](https://reviews.llvm.org/D110578). The second patch
 enables support for `llvm-cxxfilt` tool, similar to `c++filt` from GNU
 binutils, available [here](https://reviews.llvm.org/D110576). Finally,
 the last patch enables the most important part for the users, the LLDB
 part. The patch is available [here](https://reviews.llvm.org/D110577).

 Congratulations on getting patches accepted to LLVM!
 Your LLDB work is incredibly important and exciting -- thank you and
 keep going!

Thanks for your inspiring words!

-- 
Sincerely,
Luís Ferreira   lsferreira.net
Oct 03 2021
WebFreak001 <d.forum webfreak.org> writes:
On Thursday, 30 September 2021 at 23:05:21 UTC, Luís Ferreira wrote:
 Hi D community!

 Sorry for being late this week.

 I'm here again, to describe what I've done during the second week of
 Symmetry Autumn of Code.

 [...]
Awesome! I had looked at trying to implement this before, but haven't really gotten further than seeing where to add the enum entry. Great to see you tackle this, I think this is already making LLDB the best D debugger on Linux.

BTW I have made pretty printers for LLDB in the past (https://github.com/Pure-D/dlang-debug/) to print objects, arrays, strings, etc. much better. If you want to implement something like that, might be worth looking at that.
Oct 01 2021
Luís Ferreira <contact lsferreira.net> writes:
On Fri, 2021-10-01 at 07:15 +0000, WebFreak001 via Digitalmars-d wrote:
 On Thursday, 30 September 2021 at 23:05:21 UTC, Luís Ferreira wrote:
 Hi D community!

 Sorry for being late this week.

 I'm here again, to describe what I've done during the second week of
 Symmetry Autumn of Code.

 [...]

 Awesome! I had looked at trying to implement this before, but haven't
 really gotten further than seeing where to add the enum entry. Great to
 see you tackle this, I think this is already making LLDB the best D
 debugger on Linux.

 BTW I have made pretty printers for LLDB in the past
 (https://github.com/Pure-D/dlang-debug/) to print objects, arrays,
 strings, etc. much better. If you want to implement something like that,
 might be worth looking at that.

Thanks for your words and valuable resources! I already took a quick look at it. This is what I plan to implement in the second milestone. What I'm kinda skeptical about are some ABI assumptions that are currently not standardized, such as the associative arrays and the symbol names exported to DWARF by DMD. Currently LDC uses fully qualified names for array types (in fact, for every symbol), but DMD uses _Array_<primitive_type>, or _Array_struct for custom aggregate types, and so on. I'm studying this and other similar things, and I intend to push for more standardization in that regard or, at least, consistency among the existing compilers.

My idea is to at least support pretty printing of structs and strings. I can start a discussion about standardizing the AA's ABI. I'm also looking into whether anything can be done to read vtables correctly, but I don't know how that is handled by DWARF and I don't have a full picture of the ABI structure for that. Anyway, that is definitely something to tackle next.

-- 
Sincerely,
Luís Ferreira   lsferreira.net
Oct 03 2021
James Blachly <james.blachly gmail.com> writes:
On 9/30/21 7:05 PM, Luís Ferreira wrote:

 
 For now, I'm going to proactively fix the requested changes in the LLVM
 patches. They seem to require smaller patches and probably the next
 week will be dedicated to that.

Luís:

I think it's a little bit surprising and disappointing that they want such granular breakdown, but they required the same of the recent Rust demangler [0] as well, so at least they are applying their rule fairly consistently.

There is also the licensing issue which raised their hackles and you'll have to deal with.

One potential strategy to sidestep the licensing issue AND to make the breakdown task much easier, is to abandon the libiberty code and literally take each of the Rust patches [1] and straight port (adding and subtracting cases as needed) for D demangling.

Rust Demangler:
[0] https://github.com/llvm/llvm-project/blob/main/llvm/lib/Demangle/RustDemangle.cpp

Consecutive patch history:
[1] https://github.com/llvm/llvm-project/commits/main/llvm/lib/Demangle/RustDemangle.cpp
Oct 01 2021
Luís Ferreira <contact lsferreira.net> writes:
On Fri, 2021-10-01 at 14:02 -0400, James Blachly via Digitalmars-d wrote:
 On 9/30/21 7:05 PM, Luís Ferreira wrote:
 For now, I'm going to proactively fix the requested changes in the LLVM
 patches. They seem to require smaller patches and probably the next
 week will be dedicated to that.

 Luís:

 I think it's a little bit surprising and disappointing that they want
 such granular breakdown, but they required the same of the recent Rust
 demangler [0] as well, so at least they are applying their rule fairly
 consistently.

I kinda understand their point, since huge changes can be difficult to review. On my side this can also be time consuming, but I guess I have no choice. It is at least a reasonable rationale, since LLVM is a big project and code introduced to it should be taken with a bit of caution.

 There is also the licensing issue which raised their hackles and you'll
 have to deal with.

 One potential strategy to sidestep the licensing issue AND to make the
 breakdown task much easier, is to abandon the libiberty code and
 literally take each of the Rust patches [1] and straight port (adding
 and subtracting cases as needed) for D demangling.

 Rust Demangler:
 [0] https://github.com/llvm/llvm-project/blob/main/llvm/lib/Demangle/RustDemangle.cpp

 Consecutive patch history:
 [1] https://github.com/llvm/llvm-project/commits/main/llvm/lib/Demangle/RustDemangle.cpp

They are not against relicensing, which is good, and since Iain is also on our side, I hope this is no longer a problem.

I also have half of a demangler written in D, kinda from scratch, to replace the current core.demangle, which I can implement as an ultimate plan Z, but that choice is also very time consuming and probably not feasible in the proposed time range.

-- 
Sincerely,
Luís Ferreira   lsferreira.net
Oct 03 2021