www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 23903] New: Demangling produces exponentially long output

https://issues.dlang.org/show_bug.cgi?id=23903

          Issue ID: 23903
           Summary: Demangling produces exponentially long output
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Keywords: mangling
          Severity: major
          Priority: P1
         Component: druntime
          Assignee: nobody puremagic.com
          Reporter: dlang-bugzilla thecybershadow.net

Consider the following D program:

///////////////// test.d ////////////////
auto fun(T)(T t = T.init)
{
    struct S { T t; }
    return S(t);
}

import core.demangle;

alias f1 = fun!int;
pragma(msg, f1.mangleof.demangle.length);
alias f2 = fun!(typeof(f1()));
pragma(msg, f2.mangleof.demangle.length);
alias f3 = fun!(typeof(f2()));
pragma(msg, f3.mangleof.demangle.length);
alias f4 = fun!(typeof(f3()));
pragma(msg, f4.mangleof.demangle.length);
alias f5 = fun!(typeof(f4()));
pragma(msg, f5.mangleof.demangle.length);
/////////////////////////////////////////

Note that although the amount of code grows linearly, the length of the
demangled string grows exponentially.

Some problems which results from this:

1. When the program crashes, it is very slow to print its stack trace (easily
in the order of minutes). This makes debugging difficult.

2. Debuggers (at least, GDB) will run out of memory and crash when attempting
to debug a program using such constructs. This also makes debugging difficult.

Chaining IFTI functions and voldemort types, which I think is idiomatic D code,
is one way for this problem to be manifested. One practical use case where I
have constructed such a program by accident is a functor-based text formatting
package (which encapsulates values to be formatted into self-contained objects
and allows building simple expression trees of them without memory
allocations).

However, it is not difficult to run into it in other ways. The underlying
problem is that the relationship between language constructs is a DAG; however,
the demangled string version is essentially a walk of the DAG through every
possible path, which is exponential in complexity. Here is a more explicit
example which illustrates this:

//////////////// test.d ////////////////
struct T(alias A, alias B) {}

struct T0 {}

alias T1 = T!(T0, T0);
alias T2 = T!(T1, T1);
alias T3 = T!(T2, T2);
alias T4 = T!(T3, T3);
alias T5 = T!(T4, T4);
alias T6 = T!(T5, T5);
alias T7 = T!(T6, T6);
alias T8 = T!(T7, T7);
alias T9 = T!(T8, T8);

pragma(msg, T9.mangleof.length); // 174
pragma(msg, T9.stringof.length); // 4090
////////////////////////////////////////

The mangled version of the symbols does not have this problem, and grows
linearly; it seems to be using backreferences to avoid this problem there.

Ignoring the DAG nature of the relationships of the mangled objects is a
mis-design of the demangling algorithm, and is fundamentally incorrect! Not
only does it cause issues such as the above, it makes debugging non-trivial
template code difficult due to the verbosity of the output, and it allows the
equivalent of ZIP-bombs (https://en.wikipedia.org/wiki/Zip_bomb).

To fix this problem, the demangling algorithm (in Druntime and in debuggers)
should not expand long back-references, emitting the DAG as it is. For example,
instead of:

pure nothrow  nogc  safe
test.fun!(test.fun!(int).fun(int).S).fun(test.fun!(int).fun(int).S).S
test.fun!(test.fun!(int).fun(int).S).fun(test.fun!(int).fun(int).S)

it could write e.g.:

pure nothrow  nogc  safe test.fun!(T1).fun(__1).S test.fun!(__1).fun(T1);
__1=test.fun!(int).fun(int).S

--
May 07 2023