
digitalmars.D - SAOC LLDB D integration: 1st Weekly Update

Luís Ferreira <contact lsferreira.net> writes:
Hi D community!

I'm here to describe what I've done during the first week of the Symmetry
Autumn of Code.



While discussing the milestone plan with my mentor, I decided to get ahead
on some work and wrote a simple C API around the D runtime demangler,
exposing it through a C interface. In the future, this would allow
implementing an LLDB language plugin inside LLVM. The source code is
available on GitHub: [liblldbd](https://github.com/ljmf00/liblldbd).



In the meantime, we decided to focus on porting the libiberty demangler
codebase to the LLVM upstream repository, since that would bring more
benefits and is more likely to be accepted upstream. So `liblldbd` is
plan B if the libiberty port is not accepted by the LLVM team.



Right after we finished the plan, which you can follow
[here](https://pad.riseup.net/p/r.05c919765a66f89368a3fc28c98432db), I
started porting `libiberty` and integrating the code into the LLVM core.
Similarly to the Rust demangler, I tried to follow some patches on the
[LLVM review platform](https://reviews.llvm.org/) and the awesome
documentation that LLVM provides.

This ended up being relatively easy to plug into the LLVM codebase, since
most of the demangler logic is isolated in one file, thanks to Iain
(ibuclaw) for the excellent code. Because I didn't expect this to be so
plug and play, I decided to test the code extensively using the robust
test suite that LLVM provides.



First, I ported the `libiberty` test suite for D demangling, and right
after that I wrote some `libfuzzer` tests and ran them with the address
sanitizer and the UB sanitizer.
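
For readers unfamiliar with `libfuzzer`, a harness of this kind can be
sketched roughly as below. This is only a minimal illustration, not the
harness used for this work: `demangle_under_test` is a hypothetical
stand-in that would be replaced by the real demangler entry point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for the demangler under test (an assumption, not
// the libiberty or LLVM API). It accepts names starting with "_D" and
// returns a malloc'ed copy of the remainder, or nullptr otherwise.
static char *demangle_under_test(const char *mangled) {
  if (std::strncmp(mangled, "_D", 2) != 0)
    return nullptr;
  const char *rest = mangled + 2;
  std::size_t len = std::strlen(rest);
  char *out = static_cast<char *>(std::malloc(len + 1));
  if (out == nullptr)
    return nullptr;
  std::memcpy(out, rest, len + 1); // copies the terminating NUL as well
  return out;
}

// libFuzzer entry point: called once for every generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  assert(data != nullptr || size == 0);
  // Fuzzer input is not NUL-terminated, so copy it into a std::string
  // before handing it to a C-string based API.
  std::string mangled(reinterpret_cast<const char *>(data), size);
  char *demangled = demangle_under_test(mangled.c_str());
  std::free(demangled); // free(nullptr) is a no-op
  return 0;             // other return values are reserved by libFuzzer
}
```

Assuming the file is called `harness.cpp`, it can be built with something
like `clang++ -g -fsanitize=fuzzer,address,undefined harness.cpp`, which
links libFuzzer's own `main` and enables both sanitizers.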



The `libfuzzer` results took some time to show up, but I got some
interesting output from it. The most interesting finding was a heap/stack
buffer overflow. I also managed to find a null dereference. With a
maliciously crafted mangled name, both can trigger a segmentation fault or
undefined behaviour by reading from or writing to protected memory.

I wrote a patch to fix both issues and contacted MITRE to follow the
standard vulnerability reporting procedure, since GCC is widely used and
this could potentially cause some issues. I sent the patches to the GCC
mailing list and I'm currently waiting for review. You can check the two
patches
[here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579985.html)
and
[here](https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579987.html).

After patching the code, I ran the fuzzer again, and after some hours it
reported a timeout with a huge number of recursive calls. I carefully
analyzed the mangled name that the fuzzer generated and found that it is
very repetitive. From a superficial analysis, those recursive calls lead
to exponential time complexity and can make the demangler run for hours or
even days. I believe this can also be used maliciously to cause a denial
of service, although I haven't had much time to profile it yet.
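
To illustrate the kind of blow-up described above, here is a toy sketch.
It is not the libiberty algorithm: the grammar, `toy_parse`, `count_calls`
and the call budget are all made up for the example. Each `B` character
makes the parser revisit the rest of the name twice, so the number of
calls doubles with every extra character, and a simple call budget is one
possible way to bail out early.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Toy model: a 'B' at `pos` forces the remainder to be parsed twice.
// Returns false as soon as the number of calls exceeds `budget`.
static bool toy_parse(const std::string &name, std::size_t pos,
                      std::uint64_t &calls, std::uint64_t budget) {
  if (++calls > budget)
    return false; // give up instead of running for hours
  if (pos >= name.size() || name[pos] != 'B')
    return true;  // end of input or a plain character: nothing to revisit
  return toy_parse(name, pos + 1, calls, budget) &&
         toy_parse(name, pos + 1, calls, budget);
}

// Number of calls needed for a name made of n 'B's, or 0 if the budget
// was exhausted before parsing finished.
static std::uint64_t count_calls(std::size_t n, std::uint64_t budget) {
  assert(budget > 0);
  std::string name(n, 'B');
  std::uint64_t calls = 0;
  return toy_parse(name, 0, calls, budget) ? calls : 0;
}
```

In this model a name with n repeated characters needs 2^(n+1) - 1 calls,
so 40 characters would already mean about two trillion calls, while the
budget makes the parser fail fast instead.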

To start a discussion about this, I'm going to create a thread on the GCC
security mailing list and propose some ways to mitigate these problems,
such as integrating part of the codebase into OSS-Fuzz.

Before that, I'm waiting for a reply to the message I sent to MITRE, which
was forwarded to the Red Hat security team for further review.

I don't really know if this is crucial to share now, but I saved the
fuzzer results, in case anyone is interested in researching more ideas for
crafted mangled names to feed to the address/UB sanitizers.



The last task I worked on (today) was finalizing the LLDB integration. I
still need to write some tests, but the most important thing is that it is
already working! My LLDB tree can successfully pretty-print the mangled
names. My fork is available on GitHub,
[here](https://github.com/ljmf00/llvm-project/tree/add-d-demangler).



From the first time I built LLVM, I found that compiling it with debug
information is extremely costly in terms of memory usage, since linking
all those symbols at once can consume a lot of RAM. I recommend building
it with the `Release` build type.

Here is my `cmake` config so far, if someone wants to test my work at any
point.
```
cmake -S llvm -B build -G Ninja \
       -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;lldb" \
       -DCMAKE_BUILD_TYPE=Release \
       -DLLDB_EXPORT_ALL_SYMBOLS=0 \
       -DLLVM_ENABLE_ASSERTIONS=ON \
       -DLLVM_CCACHE_BUILD=ON \
       -DLLVM_LINK_LLVM_DYLIB=ON \
       -DCLANG_LINK_CLANG_DYLIB=ON
```

To build LLDB, you can do something like:

```
cmake --build build -- lldb -j$(nproc --all)
```



Next week, I'm going to look into the time complexity problem and try to
solve it, restructure the code to look a bit more C++ish, and finish the
LLDB test suite so I can finally start upstreaming my changes. This can
take a while, though, since there is a challenge described in the plan:
dual-licensing the GCC code with the LLVM codebase. This is being handled
cooperatively by Mathias (my mentor), Iain and the GCC team.


-- 
Sincerely,
Luís Ferreira   lsferreira.net
Sep 22 2021
user1234 <user1234 12.de> writes:
On Wednesday, 22 September 2021 at 20:11:56 UTC, Luís Ferreira wrote:
 Hi D community!

 I'm here to describe what I've done during the first week on the
 Symmetry Autumn of Code.

 [...]
Nice project, I'll follow. So, in theory, LLDB has the same "machine interface" as GDB?
Sep 23 2021
Luís Ferreira <contact lsferreira.net> writes:
What do you mean by "machine interface"? Can you elaborate a bit more,
please?

On Thu, 2021-09-23 at 18:32 +0000, user1234 via Digitalmars-d wrote:
 On Wednesday, 22 September 2021 at 20:11:56 UTC, Luís Ferreira wrote:
 Hi D community!

 I'm here to describe what I've done during the first week on the
 Symmetry Autumn of Code.

 [...]
 Nice project, I'll follow. So, in theory, LLDB has the same "machine interface" as GDB?

-- 
Sincerely,
Luís Ferreira   lsferreira.net
Sep 23 2021