digitalmars.D.bugs - [Issue 859] New: Improve compiler inlining

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=859

           Summary: Improve compiler inlining
           Product: D
           Version: 1.00
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: digitalmars-com baysmith.com


Functions inlined by the compiler perform much worse than manually inlined
code (at least in some cases). In the attached example, the compiler-inlined
version is about 6 times slower.

C:\>dmd -O -inline -release -g testinline.d
C:\>testinline.exe
compiler inlined time: 374058
manually inlined time: 61362

C:\>obj2asm testinline.obj -ctestinline.asm

See line 486 for the compiler inlined code
See line 544 for the manually inlined code

The compiler-inlined code contains extra instructions like the following:
        lea     ESI,-080h[EBP]
        lea     EDI,-048h[EBP]
        movsd
        movsd
        movsd
        lea     ESI,-074h[EBP]
        lea     EDI,-03Ch[EBP]
        movsd
        movsd
        movsd

These instructions are absent in the manually inlined code, and may be the
cause of the poor performance.


-- 
Jan 19 2007
d-bugmail puremagic.com writes:

Created an attachment (id=92)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=92&action=view)
Example to test inlining of a simple function


-- 
Jan 19 2007
d-bugmail puremagic.com writes:

Created an attachment (id=93)
 --> (http://d.puremagic.com/issues/attachment.cgi?id=93&action=view)
Assembly code from the example


-- 
Jan 19 2007
d-bugmail puremagic.com writes:

digitalmars-com baysmith.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------

               type|                            |




-- 
Jan 19 2007
d-bugmail puremagic.com writes:

Leandro Lucarella <llucax gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |performance
                 CC|                            |llucax gmail.com
           Platform|x86                         |All
            Version|1.00                        |D1 & D2
         OS/Version|Windows                     |All



To avoid opening a new bug, I'll reuse this ancient bug report, since its
summary is pretty much what I would write for this one.

I'm having some performance problems moving some stuff from a lower-level
C-style to a higher-level D-style. Here is an example:

---
int find_if(bool delegate(ref int) predicate)
{
        for (int i = 0; i < 100; i++)
                if (predicate(i))
                        return i;
        return -1;
}

int main()
{
//      for (int i = 0; i < 100; i++)
//              if (i == 99)
//                      return i;
//      return -1;
        return find_if((ref int i) { return i == 99; });
}
---

The program produced by this source executes 4 times more instructions than the
more direct (lower-level) version commented out. I would expect DMD to inline
all functions/delegates and produce the same asm for both, but that's not the
case.
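
For comparison, here is a minimal sketch of the same search with the predicate
passed as a template alias parameter instead of a runtime delegate. It assumes
D2; the name `findIf` and the lambda syntax are illustrative and not part of
this report. Because the predicate is known at compile time, the call is a
direct one, which should give the inliner an easier job, though whether DMD
actually inlines it still depends on the compiler version and flags.

```d
// Hypothetical variant of find_if: the predicate is a compile-time
// alias parameter rather than a delegate called through a pointer.
int findIf(alias predicate)()
{
    for (int i = 0; i < 100; i++)
        if (predicate(i))
            return i;
    return -1;
}

void main()
{
    // Same search as in the reduced test case above.
    int found = findIf!((int i) => i == 99)();
    assert(found == 99);
}
```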

This is a reduced test-case, but I'm working on improving the GC and I'm really
hitting this problem. If I use this higher-level style in the GC, a Dil run for
generating the Tango docs is 3.33 times slower than the C-ish style used by the
current GC.

So I think this is a real problem for D; it's really important to be able to
encourage people to use the higher-level D constructs.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 27 2010
d-bugmail puremagic.com writes:

nfxjfg gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfxjfg gmail.com



 Leandro Lucarella: ldc seems to inline the predicate just fine, although the
generated code is still slightly different.

-- 
Jun 27 2010
d-bugmail puremagic.com writes:

> Leandro Lucarella: ldc seems to inline the predicate just fine, although the
> generated code is still slightly different.

Yes, LDC is better at inlining because it doesn't use the front-end inlining
code; it lets the LLVM optimizer do the job instead (I think they disabled the
DMDFE inliner precisely because of these issues). This bug report is about the
DMD implementation.

-- 
Jun 27 2010
d-bugmail puremagic.com writes:

bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc



An improved version of the test program that allows comparing dmd and ldc on
this inlining problem:


version (Tango) {
    import tango.stdc.stdio: printf;
    import tango.stdc.stdlib: atof;
} else {
    import std.c.stdio: printf;
    import std.c.stdlib: atof;
}

struct Vec3 {
    float x, y, z;
}

float dot(Vec3 A, Vec3 B) {
    return A.x * B.x + A.y * B.y + A.z * B.z;
}

struct Timer {
    long starttime;

    static long getTime() {
        asm {
            naked;
            rdtsc;
            ret;
        }
    }

    void start() {
        starttime = getTime();
    }

    void stop() {
        long endTime = getTime();
        printf("time: %lld\n", endTime - starttime);
    }
}

void main() {
    int n = 30_000;
    Vec3 a = Vec3(atof("1.0"), atof("2.0"), atof("3.0"));
    Vec3 b = Vec3(atof("4.0"), atof("5.0"), atof("6.0"));
    Timer t;
    float sum;

    printf("    Auto inlined ");
    sum = 0.0;
    t.start();
    for (int i; i < n; i++) {
        a.x++;
        a.y++;
        a.z++;
        sum += dot(a, b);
    }
    t.stop();
    printf("sum: %f\n", sum);

    printf("Manually inlined ");
    sum = 0.0;
    t.start();
    for (int i; i < n; i++) {
        a.x++;
        a.y++;
        a.z++;
        sum += a.x * b.x + a.y * b.y + a.z * b.z;
    }
    t.stop();
    printf("sum: %f\n", sum);
}

-- 
Jul 08 2010
d-bugmail puremagic.com writes:

Brad Roberts <braddr puremagic.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |braddr puremagic.com
            Summary|Improve compiler inlining   |float vector codegen after
                   |                            |inlining very different
                   |                            |from manual inlined code



Guys, piling more stuff into a bug report isn't a good idea.  In fact, I need
to re-classify this bug since it's not a problem with inlining at all.  The call
to DOT in the original code _is_ being inlined.  The resulting code is
different from the manually inlined version, but the code IS inlined.

While they might be the same, they're different enough right now to call them
different bugs.  I just split the new report into bug 4440.

-- 
Jul 08 2010
d-bugmail puremagic.com writes:

This was fixed by the changes that fixed bug 2008.  This report passes static
arrays as a parameter, which was one of the things that caused the inliner to
reject a function.

I'm going to close this bug.

I've opened bug 4447 to track a remaining issue regarding oddities involving
the first function taking significantly longer to execute, regardless of which
it is.

-- 
Jul 11 2010
d-bugmail puremagic.com writes:

Brad Roberts <braddr puremagic.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


-- 
Jul 11 2010