digitalmars.D.announce - Increasing D Compiler Speed by Over 75%

reply Walter Bright <newshound2 digitalmars.com> writes:
http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
Jul 25 2013
next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 25 July 2013 at 18:03:22 UTC, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
I propose we always refer to compiling as "doing the nasty" from this moment forward.
Jul 25 2013
parent Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On Thu, 25 Jul 2013 20:04:10 +0200
"Brad Anderson" <eco gnuk.net> wrote:

 On Thursday, 25 July 2013 at 18:03:22 UTC, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
I propose we always refer to compiling as "doing the nasty" from this moment forward.
Yea, that's just absolutely classic :)
Jul 25 2013
prev sibling next sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
On 25.07.2013 20:03, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
Did you compare DMC-based and Visual C-based dmd builds? The VC dmd build always seems to be about two times faster. How does that look with your optimization?
Jul 26 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/26/2013 1:25 AM, dennis luehring wrote:
 do you compare dmc based and visualc based dmd builds?
 the vc dmd build seems to be always two times faster - how does that look with
 your optimization?
It would be most interesting to see just what it was that made the vc build faster. But that won't help on Linux/FreeBSD/OSX.
Jul 26 2013
parent reply "Temtaime" <temtaime gmail.com> writes:
DMC is an ugly compiler.
It would be much nicer if you used mingw for that purpose on 
Windows. GCC usually generates faster code than VC does.
http://sourceforge.net/projects/mingwbuilds/
Jul 30 2013
next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Tuesday, 30 July 2013 at 09:04:10 UTC, Temtaime wrote:
 DMC is ugly compiler.
 It will be much nicer if you'll use mingw for that purpose on 
 Windows. GCC usually generates more faster code that VC does.
 http://sourceforge.net/projects/mingwbuilds/
I'm willing to bet Walter would accept pull requests to add support for mingw like he did with VC. Be sure to document the build process when you make the changes. Sidenote: Insulting Walter's work isn't a great way to get him to do you a favor.
Jul 30 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 7/30/2013 11:16 AM, Brad Anderson wrote:
 Sidenote: Insulting Walter's work isn't a great way to get him to do your a
favor.
I'm sad that I never got the opportunity to be insulted by Jobs.
Jul 30 2013
prev sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
On 30.07.2013 11:04, Temtaime wrote:
 DMC is ugly compiler.
 It will be much nicer if you'll use mingw for that purpose on
 Windows. GCC usually generates more faster code that VC does.
 http://sourceforge.net/projects/mingwbuilds/
 DMC is ugly compiler.
Ugly means bad or mis-designed, but please show me a better 16/32(64)-bit full C/C++ compiler out there.
 GCC usually generates more faster code that VC does.
Currently the VC-built dmd is about 2 times faster at compiling. Do you think that a mingw build will even top this?
Jul 30 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/30/2013 11:40 PM, dennis luehring wrote:
 currently the vc builded dmd is about 2 times faster in compiling
That's an old number now. Someone want to try it with the current HEAD?
Jul 31 2013
next sibling parent dennis luehring <dl.soluz gmx.net> writes:
On 31.07.2013 09:00, Walter Bright wrote:
 On 7/30/2013 11:40 PM, dennis luehring wrote:
 currently the vc builded dmd is about 2 times faster in compiling
That's an old number now. Someone want to try it with the current HEAD?
Tried to, but failed:

downloaded dmd-master.zip (from github)
downloaded dmd.2.063.2.zip
built dmd-master with vs2010
copied the produced dmd_msc.exe to dmd.2.063.2\dmd2\windows\bin

dmd.2.063.2\dmd2\src\phobos>..\..\windows\bin\dmd.exe std\algorithm -unittest -main

gives

Error: cannot read file main.d (what is this "" in front of main.d?)

dmd.2.063.2\dmd2\src\phobos>..\..\windows\bin\dmd_msc.exe std\algorithm -unittest -main

gives

std\datetime.d(31979): Error: pure function 'std.datetime.enforceValid!"hours".enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(13556): Error: template instance std.datetime.enforceValid!"hours" error instantiating
std\datetime.d(31984): Error: pure function 'std.datetime.enforceValid!"minutes".enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(13557): Error: template instance std.datetime.enforceValid!"minutes" error instantiating
std\datetime.d(31989): Error: pure function 'std.datetime.enforceValid!"seconds".enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(13558): Error: template instance std.datetime.enforceValid!"seconds" error instantiating
std\datetime.d(33284): called from here: (TimeOfDay __ctmp1990; , __ctmp1990).this(0, 0, 0)
std\datetime.d(33293): Error: CTFE failed because of previous errors in this
std\datetime.d(31974): Error: pure function 'std.datetime.enforceValid!"months".enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(8994): Error: template instance std.datetime.enforceValid!"months" error instantiating
std\datetime.d(32012): Error: pure function 'std.datetime.enforceValid!"days".enforceValid' cannot call impure function 'core.time.TimeException.this'
std\datetime.d(8995): Error: template instance std.datetime.enforceValid!"days" error instantiating
std\datetime.d(33389): called from here: (Date __ctmp1999; , __ctmp1999).this(-3760, 9, 7)
std\datetime.d(33458): Error: CTFE failed because of previous errors in this
Error: undefined identifier '_xopCmp'

and a compiler crash.

My former benchmarks were done the same way and worked without any problems - this master seems to have problems.
Jul 31 2013
prev sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 31.07.2013 09:00, Walter Bright wrote:
 On 7/30/2013 11:40 PM, dennis luehring wrote:
 currently the vc builded dmd is about 2 times faster in compiling
That's an old number now. Someone want to try it with the current HEAD?
I have just tried yesterday's dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):

Debug build dmd_dmc: 23 sec, std new 43 sec
Debug build dmd_msc: 19 sec, std new 20 sec

"std new" is the version without the "block allocator".

Release build dmd_dmc: 3 min 30, std new 5 min 25
Release build dmd_msc: 1 min 32, std new 1 min 40

The release builds use "-release -O -inline" and need a bit more than 1 GB memory for two of the libraries (I still had to patch dmd_dmc to be large-address-aware).

This shows that removing most of the allocations was a good optimization for the dmc runtime, but it has a smaller, though still noticeable, impact with a faster heap implementation (the VS runtime usually maps directly to the Windows API for non-Debug builds). I suspect the backend and the optimizer do not use "new" a lot, but plain "malloc" calls, so they still suffer from the slow runtime.
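
For readers unfamiliar with the "block allocator" being compared against "std new" here, the following is a minimal sketch of the general idea, assuming it means a bump-pointer scheme that carves small requests out of large chunks and never frees them individually. The class and member names are hypothetical and this is not the code in dmd's rmem.c.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch, not dmd's actual allocator.
// Requests are served by advancing a cursor inside a large chunk obtained
// from malloc; individual allocations are never returned to the runtime.
class BumpAllocator {
public:
    explicit BumpAllocator(std::size_t chunkSize = 1 << 20)
        : chunkSize_(chunkSize) {
        assert(chunkSize > 0);
    }

    void* allocate(std::size_t size) {
        // Round up to a multiple of 16 so slices sit at 16-byte offsets
        // from the start of the chunk.
        size = (size + 15) & ~static_cast<std::size_t>(15);
        if (size > remaining_) {
            // Current chunk exhausted: fetch a new one. The unused tail of
            // the old chunk is simply abandoned.
            std::size_t n = size > chunkSize_ ? size : chunkSize_;
            cursor_ = static_cast<char*>(std::malloc(n));
            if (!cursor_) {
                remaining_ = 0;
                return nullptr;
            }
            remaining_ = n;
            ++chunks_;
        }
        void* p = cursor_;
        cursor_ += size;
        remaining_ -= size;
        return p;
    }

    // Number of calls that actually reached malloc.
    std::size_t chunksUsed() const { return chunks_; }

private:
    std::size_t chunkSize_;
    char* cursor_ = nullptr;
    std::size_t remaining_ = 0;
    std::size_t chunks_ = 0;
};
```

With a scheme like this the runtime's malloc is hit once per chunk instead of once per object, which would explain why the gain is much larger on a slow malloc (dmd_dmc) than on a fast one (dmd_msc).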
Jul 31 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Thanks for doing this, this is good information.

On 7/31/2013 2:24 PM, Rainer Schuetze wrote:
 I have just tried yesterdays dmd to build Visual D (it builds some libraries
and
 contains a few short non-compiling tasks in between):

 Debug build dmd_dmc: 23 sec, std new 43 sec
 Debug build dmd_msc: 19 sec, std new 20 sec
That makes it clear that the dmc malloc() was the dominator, not code gen.
 "std new" is the version without the "block allocator".

 Release build dmd_dmc: 3 min 30, std new 5 min 25
 Release build dmd_msc: 1 min 32, std new 1 min 40

 The release builds use "-release -O -inline" and need a bit more than 1 GB
 memory for two of the libraries (I still had to patch dmd_dmc to be
 large-address-aware).

 This shows that removing most of the allocations was a good optimization for
the
 dmc-Runtime, but does not have a large, but still notable impact on a faster
 heap implementation (the VS runtime usually maps directly to the Windows API
for
 non-Debug builds). I suspect the backend and the optimizer do not use "new" a
 lot, but plain "malloc" calls, so they still suffer from the slow runtime.
Actually, dmc still should give a better showing. All the optimizations I've put into dmd also went into dmc, and do result in significantly better code speed. For example, the hash modulus optimization has a significant impact, but I haven't released that dmc yet. Optimized builds have an entirely different profile than debug builds, and I haven't investigated that.
Jul 31 2013
parent reply Richard Webb <richard.webb boldonjames.com> writes:
On 01/08/2013 00:32, Walter Bright wrote:
 Thanks for doing this, this is good information.

 On 7/31/2013 2:24 PM, Rainer Schuetze wrote:
 I have just tried yesterdays dmd to build Visual D (it builds some
 libraries and
 contains a few short non-compiling tasks in between):

 Debug build dmd_dmc: 23 sec, std new 43 sec
 Debug build dmd_msc: 19 sec, std new 20 sec
That makes it clear that the dmc malloc() was the dominator, not code gen.
It still appears that the DMC malloc is a big reason for the difference between DMC and MSVC builds when compiling the algorithm unit tests. (a very quick test suggests that changing the global new in rmem.c to call HeapAlloc instead of malloc gives a large speedup).
Aug 02 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/2/2013 4:18 AM, Richard Webb wrote:
 It still appears that the DMC malloc is a big reason for the difference between
 DMC and MSVC builds when compiling the algorithm unit tests. (a very quick test
 suggests that changing the global new in rmem.c to call HeapAlloc instead of
 malloc gives a large speedup).
Yes, I agree, the DMC malloc is clearly a large performance problem. I had not realized this.
Aug 02 2013
prev sibling next sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
On 31.07.2013 23:24, Rainer Schuetze wrote:
 On 31.07.2013 09:00, Walter Bright wrote:
 On 7/30/2013 11:40 PM, dennis luehring wrote:
 currently the vc builded dmd is about 2 times faster in compiling
That's an old number now. Someone want to try it with the current HEAD?
I have just tried yesterdays dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
Can you also give us timings for (dmd_dmc|dmd_msc) std\algorithm -unittest -main?
Jul 31 2013
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 01.08.2013 07:33, dennis luehring wrote:
 On 31.07.2013 23:24, Rainer Schuetze wrote:
 On 31.07.2013 09:00, Walter Bright wrote:
 On 7/30/2013 11:40 PM, dennis luehring wrote:
 currently the vc builded dmd is about 2 times faster in compiling
That's an old number now. Someone want to try it with the current HEAD?
I have just tried yesterdays dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
can you also give us also timings for (dmd_dmc|dmd_msc) std\algorithm -unittest -main
std.algorithm -unittest -main:
dmd_dmc 20 sec, std new 61 sec
dmd_msc 11 sec, std new 13 sec

std.algorithm -unittest -main -O:
dmd_dmc 27 sec, std new 68 sec
dmd_msc 16 sec, std new 18 sec
Jul 31 2013
next sibling parent dennis luehring <dl.soluz gmx.net> writes:
On 01.08.2013 08:16, Rainer Schuetze wrote:
 On 01.08.2013 07:33, dennis luehring wrote:
 On 31.07.2013 23:24, Rainer Schuetze wrote:
 On 31.07.2013 09:00, Walter Bright wrote:
 On 7/30/2013 11:40 PM, dennis luehring wrote:
 currently the vc builded dmd is about 2 times faster in compiling
That's an old number now. Someone want to try it with the current HEAD?
I have just tried yesterdays dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
can you also give us also timings for (dmd_dmc|dmd_msc) std\algorithm -unittest -main
std.algorithm -unittest -main: dmd_dmc 20 sec, std new 61 sec dmd_msc 11 sec, std new 13 sec std.algorithm -unittest -main -O: dmd_dmc 27 sec, std new 68 sec dmd_msc 16 sec, std new 18 sec
So we can "still" say that the msc builds are around two times faster, or even more.
Jul 31 2013
prev sibling next sibling parent dennis luehring <dl.soluz gmx.net> writes:
On 01.08.2013 08:16, Rainer Schuetze wrote:
 On 01.08.2013 07:33, dennis luehring wrote:
 On 31.07.2013 23:24, Rainer Schuetze wrote:
 On 31.07.2013 09:00, Walter Bright wrote:
 On 7/30/2013 11:40 PM, dennis luehring wrote:
 currently the vc builded dmd is about 2 times faster in compiling
That's an old number now. Someone want to try it with the current HEAD?
I have just tried yesterdays dmd to build Visual D (it builds some libraries and contains a few short non-compiling tasks in between):
can you also give us also timings for (dmd_dmc|dmd_msc) std\algorithm -unittest -main
std.algorithm -unittest -main: dmd_dmc 20 sec, std new 61 sec dmd_msc 11 sec, std new 13 sec std.algorithm -unittest -main -O: dmd_dmc 27 sec, std new 68 sec dmd_msc 16 sec, std new 18 sec
Results from mingw, vs2012(13) and llvm-clang builds would also be very interesting, but I don't know if dmd can be built with mingw or clang out of the box under Windows.
Aug 01 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
I've now upgraded dmc so dmd builds can take advantage of improved code
generation.

http://www.digitalmars.com/download/freecompiler.html
Aug 01 2013
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 02.08.2013 00:36, Walter Bright wrote:
 I've now upgraded dmc so dmd builds can take advantage of improved code
 generation.

 http://www.digitalmars.com/download/freecompiler.html
Although my laptop got quite a bit faster overnight (I guess it was throttled for some reason yesterday), relative results don't change:

std.algorithm -main -unittest

dmc85?: 12.5 sec
dmc857: 12.5 sec
msc: 7 sec

BTW: I usually use VS2008, but now also tried VS2010 - no difference.
Aug 02 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/2/2013 12:57 AM, Rainer Schuetze wrote:
 http://www.digitalmars.com/download/freecompiler.html
Although my laptop got quite a bit faster overnight (I guess it was throttled for some reason yesterday), relative results don't change: std.algorithm -main -unittest dmc85?: 12.5 sec dmc857: 12.5 sec msc: 7 sec BTW: I usually use VS2008, but now also tried VS2010 - no difference.
The two dmc times shouldn't be the same. I see a definite improvement. Disassemble aav.obj, and look at the function aaGetRvalue. It should look like this:

?_aaGetRvalue YAPAXPAUAA PAX Z:
        push    EBX
        mov     EBX,0Ch[ESP]
        push    ESI
        cmp     dword ptr 0Ch[ESP],0
        je      L184
        mov     EAX,0Ch[ESP]
        mov     ECX,4[EAX]
        cmp     ECX,4
        jne     L139
        mov     ESI,EBX
        and     ESI,3
        jmp short L166
L139:   cmp     ECX,01Fh
        jne     L15E
======== note this section does not have a div instruction in it ==============
        mov     EAX,EBX
        mov     EDX,08421085h
        mov     ECX,EBX
        mul     EDX
        mov     EAX,ECX
        sub     EAX,EDX
        shr     EAX,1
        lea     EDX,[EAX][EDX]
        shr     EDX,4
        imul    EAX,EDX,01Fh
        sub     ECX,EAX
        mov     ESI,ECX
==========================================================================
        jmp short L166
L15E:   mov     EAX,EBX
        xor     EDX,EDX
        div     ECX
        mov     ESI,EDX
L166:   mov     ECX,0Ch[ESP]
        mov     ECX,[ECX]
        mov     EDX,[ESI*4][ECX]
        test    EDX,EDX
        je      L184
L173:   cmp     4[EDX],EBX
        jne     L17E
        mov     EAX,8[EDX]
        pop     ESI
        pop     EBX
        ret
L17E:   mov     EDX,[EDX]
        test    EDX,EDX
        jne     L173
L184:   pop     ESI
        xor     EAX,EAX
        pop     EBX
        ret
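
For anyone who finds the div-free section hard to read, here is a rough C++ transcription of those twelve instructions, written as a sketch rather than as the compiler's own source. The function names are made up for illustration. As far as I can tell, the constant 08421085h is the low 32 bits of ceil(2^37 / 31), with the implicit 2^32 part handled by the sub/shr/lea sequence.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name. Computes x % 31 without a div instruction,
// mirroring the marked section of the listing step by step.
inline std::uint32_t mod31(std::uint32_t x) {
    // mul EDX: keep only the high 32 bits of x * 0x08421085
    std::uint32_t hi = static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(x) * 0x08421085u) >> 32);
    // sub / shr 1 / lea / shr 4: q is the quotient x / 31
    std::uint32_t q = (((x - hi) >> 1) + hi) >> 4;
    // imul / sub: remainder = x - q * 31
    return x - q * 31u;
}

// Usage: spot-check a few values against the built-in % operator.
inline void checkMod31() {
    const std::uint32_t samples[] = {0u, 30u, 31u, 32u, 12345u,
                                     0x80000000u, 0xFFFFFFFFu};
    for (std::uint32_t x : samples) {
        assert(mod31(x) == x % 31u);
    }
}
```

The cmp ECX,4 and cmp ECX,01Fh tests in the listing pick this path only when the table size is exactly 31; any other size still falls through to the plain div at L15E.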
Aug 02 2013
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 02.08.2013 10:24, Walter Bright wrote:
 On 8/2/2013 12:57 AM, Rainer Schuetze wrote:
 http://www.digitalmars.com/download/freecompiler.html
Although my laptop got quite a bit faster overnight (I guess it was throttled for some reason yesterday), relative results don't change: std.algorithm -main -unittest dmc85?: 12.5 sec dmc857: 12.5 sec msc: 7 sec BTW: I usually use VS2008, but now also tried VS2010 - no difference.
The two dmc times shouldn't be the same. I see a definite improvement. Disassemble aav.obj, and look at the function aaGetRvalue. It should look like this:
My disassembly looks exactly the same. I don't think that a single div operation in a rather long function has a lot of impact on modern processors. I'm running an i7, according to the instruction tables by Agner Fog, the div has latency of 17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the latency of the asm snippet, I also get 16 cycles. And that doesn't take the additional tests and jumps into consideration.

======== note this section does not have a div instruction in it ==============
        mov     EAX,EBX
        mov     EDX,08421085h   ; latency 3
        mov     ECX,EBX
        mul     EDX             ; latency 5
        mov     EAX,ECX
        sub     EAX,EDX         ; latency 1
        shr     EAX,1           ; latency 1
        lea     EDX,[EAX][EDX]  ; latency 1
        shr     EDX,4           ; latency 1
        imul    EAX,EDX,01Fh    ; latency 3
        sub     ECX,EAX         ; latency 1
        mov     ESI,ECX
==========================================================================
Aug 02 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/2/2013 2:47 AM, Rainer Schuetze wrote:
 My disassembly looks exactly the same. I don't think that a single div
operation
 in a rather long function has a lot of impact on modern processors. I'm running
 an i7, according to the instruction tables by Agner Fog, the div has latency of
 17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate the
 latency of the asm snippet, I also get 16 cycles. And that doesn't take the
 additional tests and jumps into consideration.
I'm using an AMD FX-6100.
Aug 02 2013
parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 02.08.2013 18:37, Walter Bright wrote:
 On 8/2/2013 2:47 AM, Rainer Schuetze wrote:
 My disassembly looks exactly the same. I don't think that a single div
 operation
 in a rather long function has a lot of impact on modern processors.
 I'm running
 an i7, according to the instruction tables by Agner Fog, the div has
 latency of
 17-28 cycles and a reciprocal throughput of 7-17 cycles. If I estimate
 the
 latency of the asm snippet, I also get 16 cycles. And that doesn't
 take the
 additional tests and jumps into consideration.
I'm using an AMD FX-6100.
This processor seems to do a little better with the mov reg,imm operation but otherwise is similar. The DIV operation has larger worst-case latency, though (16-48 cycles). Better to just use a power of 2 for the array sizes anyway...
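
The power-of-2 suggestion can be sketched like this, assuming a hash table whose bucket count is always a power of two. The function names are hypothetical and not taken from dmd's aav.c; the listing quoted earlier already does the same thing for a size of 4 with "and ESI,3".

```cpp
#include <cassert>
#include <cstddef>

// True when n is a non-zero power of two.
inline bool isPowerOfTwo(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: bucket index for a power-of-two table size.
// For such sizes, hash % tableSize equals hash & (tableSize - 1),
// so neither a div nor a multiply-by-reciprocal sequence is needed.
inline std::size_t bucketIndex(std::size_t hash, std::size_t tableSize) {
    assert(isPowerOfTwo(tableSize));
    return hash & (tableSize - 1);
}
```

The trade-off is that only the low bits of the hash select the bucket, so this relies on the hash function distributing those bits well.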
Aug 02 2013
prev sibling parent reply "Daniel Murphy" <yebblies nospamgmail.com> writes:
"Rainer Schuetze" <r.sagitario gmx.de> wrote in message 
news:ktbvam$dvf$1 digitalmars.com...
large-address-aware).
 This shows that removing most of the allocations was a good optimization 
 for the dmc-Runtime, but does not have a large, but still notable impact 
 on a faster heap implementation (the VS runtime usually maps directly to 
 the Windows API for non-Debug builds). I suspect the backend and the 
 optimizer do not use "new" a lot, but plain "malloc" calls, so they still 
 suffer from the slow runtime.
On a related note, I just tried replacing the two ::malloc calls in rmem's operator new with VirtualAlloc and I get a reduction from 13 seconds to 9 seconds (compiling "dmd std\range -unittest -main") with a release build of dmd.
Aug 02 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 8/2/2013 8:18 AM, Daniel Murphy wrote:
 On a related note, I just tried replacing the two ::malloc calls in rmem's
 operator new with VirtualAlloc and I get a reduction from 13 seconds to 9
 seconds (compiling "dmd std\range -unittest -main") with a release build of
 dmd.
Hmm, very interesting!
Aug 02 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
02-Aug-2013 20:40, Walter Bright wrote:
 On 8/2/2013 8:18 AM, Daniel Murphy wrote:
 On a related note, I just tried replacing the two ::malloc calls in
 rmem's
 operator new with VirtualAlloc and I get a reduction from 13 seconds to 9
 seconds (compiling "dmd std\range -unittest -main") with a release
 build of
 dmd.
Hmm, very interesting!
Made a pull to provide an implementation of rmem.c on top of Win32 Heap API.
https://github.com/D-Programming-Language/dmd/pull/2445

Also noting that global new/delete are not reentrant already, added NO_SERIALIZE flag to save on locking/unlocking of heap. For me this gets from 13 to 8 seconds.

-- 
Dmitry Olshansky
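
A minimal sketch of the approach described here, assuming a private heap created with HEAP_NO_SERIALIZE. This is not the code from the pull request; compilerHeap, compilerAlloc and compilerFree are hypothetical names, and the malloc branch exists only so the sketch also builds on non-Windows systems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#ifdef _WIN32
#include <windows.h>
#endif

#ifdef _WIN32
// Private growable heap. HEAP_NO_SERIALIZE skips the heap lock on every
// call, which is only safe if a single thread ever touches this heap.
static HANDLE compilerHeap() {
    static HANDLE heap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);
    assert(heap != NULL);
    return heap;
}
#endif

// Hypothetical allocation entry point that a global operator new
// could forward to.
void* compilerAlloc(std::size_t size) {
#ifdef _WIN32
    return HeapAlloc(compilerHeap(), 0, size);
#else
    return std::malloc(size);  // portable fallback, not part of the technique
#endif
}

void compilerFree(void* p) {
#ifdef _WIN32
    if (p) HeapFree(compilerHeap(), 0, p);
#else
    std::free(p);
#endif
}
```

Skipping serialization is reasonable only under the assumption stated in the post, namely that the allocator is never entered from more than one thread at a time.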
Aug 02 2013
prev sibling parent "Don" <prosthetictelevisions teletubby.medical.com> writes:
On Thursday, 25 July 2013 at 18:03:22 UTC, Walter Bright wrote:
 http://www.reddit.com/r/programming/comments/1j1i30/increasing_the_d_compiler_speed_by_over_75/
I just reported this compile speed killer: http://d.puremagic.com/issues/show_bug.cgi?id=10716 It has a big impact on some of the tests in the DMD test suite. It might also be responsible for a significant part of the compilation time of Phobos, since array literals tend to be widely used inside unittest functions.
Jul 26 2013