www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - top time wasters in DMD, as reported by gprof

reply Walter Bright <newshound2 digitalmars.com> writes:
Flat profile:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
   5.19      0.08     0.08   945494     0.00     0.00  _aaGetRvalue(AA*, void*)
   3.90      0.14     0.06   926415     0.00     0.00  Lexer::scan(Token*)
   3.25      0.19     0.05   397097     0.00     0.00  calcHash(char const*, 
unsigned long)
   2.60      0.23     0.04   373646     0.00     0.00  StringTable::search(char 
const*, unsigned lon
   1.95      0.26     0.03    33255     0.00     0.00 
TemplateInstance::semantic(Scope*, Array<Expr
   1.30      0.28     0.02   968985     0.00     0.00 
ScopeDsymbol::search(Loc, 
Identifier*, int)
   1.30      0.30     0.02   424079     0.00     0.00  calccodsize
   1.30      0.32     0.02   234547     0.00     0.00  Parser::parseShiftExp()
   1.30      0.34     0.02   168933     0.00     0.00 
AliasDeclaration::toAlias()
   1.30      0.36     0.02   136371     0.00     0.00  _aaGet(AA**, void*)
   1.30      0.38     0.02   132102     0.00     0.00  TypeStruct::hasPointers()
   1.30      0.40     0.02    32490     0.00     0.00 
TemplateDeclaration::matchWithInstance(Templa
   1.30      0.42     0.02    21666     0.00     0.00 
FuncDeclaration::functionSemantic()
   1.30      0.44     0.02    17983     0.00     0.00 
VarDeclaration::syntaxCopy(Dsymbol*)
   1.30      0.46     0.02    12908     0.00     0.00 
TemplateDeclaration::deduceFunctionTemplateMa
   1.30      0.48     0.02     6890     0.00     0.00  assignaddrc
   1.30      0.50     0.02     6563     0.00     0.00  ecom(elem**)
   1.30      0.52     0.02     5675     0.00     0.00 
TypeInfo_Idxstr::getHash(void*)
   1.30      0.54     0.02     3780     0.01     0.01 
TemplateDeclaration::semantic(Scope*)
   1.30      0.56     0.02     2648     0.01     0.01  operator delete(void*)
   0.97      0.58     0.02    42592     0.00     0.00 
IdentifierExp::semantic(Scope*)
   0.97      0.59     0.02    36403     0.00     0.00  BinExp::syntaxCopy()
   0.65      0.60     0.01  2668529     0.00     0.00  operator new(unsigned
long)
   0.65      0.61     0.01  2575503     0.00     0.00 
Dsymbol::isTemplateMixin()
   0.65      0.62     0.01  2310757     0.00     0.00  Type::toBasetype()
   0.65      0.63     0.01  2096689     0.00     0.00 
OutBuffer::reserve(unsigned long)
   0.65      0.64     0.01  1030635     0.00     0.00 
Dsymbol::isTemplateInstance()
   0.65      0.65     0.01   708615     0.00     0.00  Module::search(Loc, 
Identifier*, int)
   0.65      0.66     0.01   683509     0.00     0.00 
VarDeclaration::isDataseg()
   0.65      0.67     0.01   662180     0.00     0.00  Module::isModule()
   0.65      0.68     0.01   567921     0.00     0.00  isType(RootObject*)
   0.65      0.69     0.01   536331     0.00     0.00  Mem::realloc(void*, 
unsigned long)
   0.65      0.70     0.01   454836     0.00     0.00 
Parameter::foreach(Array<Parameter>*, int (*)
   0.65      0.71     0.01   375732     0.00     0.00  Type::merge()
   0.65      0.72     0.01   353615     0.00     0.00  Mem::free(void*)
   0.65      0.73     0.01   348546     0.00     0.00  isDsymbol(RootObject*)
   0.65      0.74     0.01   339336     0.00     0.00 
exceptionOrCantInterpret(Expression*)
   0.65      0.75     0.01   311493     0.00     0.00  isTuple(RootObject*)
   0.65      0.76     0.01   300069     0.00     0.00  TypeBasic::isscalar()
   0.65      0.77     0.01   270438     0.00     0.00 
VarDeclaration::hasPointers()
   0.65      0.78     0.01   248163     0.00     0.00 
Identifier::equals(RootObject*)
   0.65      0.79     0.01   234806     0.00     0.00  Parser::parseAddExp()
Jun 23 2013
next sibling parent reply Timothee Cour <thelastmammoth gmail.com> writes:
can you give more context ? what was the command line to get this?
I thought lexing was not a bottleneck, yet it seems to be in second place?

On Sun, Jun 23, 2013 at 2:28 PM, Walter Bright
<newshound2 digitalmars.com>wrote:

 Flat profile:

 Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
   5.19      0.08     0.08   945494     0.00     0.00  _aaGetRvalue(AA*,
 void*)
   3.90      0.14     0.06   926415     0.00     0.00  Lexer::scan(Token*)
   3.25      0.19     0.05   397097     0.00     0.00  calcHash(char
 const*, unsigned long)
   2.60      0.23     0.04   373646     0.00     0.00
  StringTable::search(char const*, unsigned lon
   1.95      0.26     0.03    33255     0.00     0.00
 TemplateInstance::semantic(**Scope*, Array<Expr
   1.30      0.28     0.02   968985     0.00     0.00
  ScopeDsymbol::search(Loc, Identifier*, int)
   1.30      0.30     0.02   424079     0.00     0.00  calccodsize
   1.30      0.32     0.02   234547     0.00     0.00
  Parser::parseShiftExp()
   1.30      0.34     0.02   168933     0.00     0.00
  AliasDeclaration::toAlias()
   1.30      0.36     0.02   136371     0.00     0.00  _aaGet(AA**, void*)
   1.30      0.38     0.02   132102     0.00     0.00
  TypeStruct::hasPointers()
   1.30      0.40     0.02    32490     0.00     0.00 TemplateDeclaration::
 **matchWithInstance(Templa
   1.30      0.42     0.02    21666     0.00     0.00 FuncDeclaration::**
 functionSemantic()
   1.30      0.44     0.02    17983     0.00     0.00
 VarDeclaration::syntaxCopy(**Dsymbol*)
   1.30      0.46     0.02    12908     0.00     0.00 TemplateDeclaration::
 **deduceFunctionTemplateMa
   1.30      0.48     0.02     6890     0.00     0.00  assignaddrc
   1.30      0.50     0.02     6563     0.00     0.00  ecom(elem**)
   1.30      0.52     0.02     5675     0.00     0.00
 TypeInfo_Idxstr::getHash(void***)
   1.30      0.54     0.02     3780     0.01     0.01
 TemplateDeclaration::semantic(**Scope*)
   1.30      0.56     0.02     2648     0.01     0.01  operator
 delete(void*)
   0.97      0.58     0.02    42592     0.00     0.00
 IdentifierExp::semantic(Scope***)
   0.97      0.59     0.02    36403     0.00     0.00  BinExp::syntaxCopy()
   0.65      0.60     0.01  2668529     0.00     0.00  operator
 new(unsigned long)
   0.65      0.61     0.01  2575503     0.00     0.00
  Dsymbol::isTemplateMixin()
   0.65      0.62     0.01  2310757     0.00     0.00  Type::toBasetype()
   0.65      0.63     0.01  2096689     0.00     0.00
 OutBuffer::reserve(unsigned long)
   0.65      0.64     0.01  1030635     0.00     0.00
  Dsymbol::isTemplateInstance()
   0.65      0.65     0.01   708615     0.00     0.00  Module::search(Loc,
 Identifier*, int)
   0.65      0.66     0.01   683509     0.00     0.00
  VarDeclaration::isDataseg()
   0.65      0.67     0.01   662180     0.00     0.00  Module::isModule()
   0.65      0.68     0.01   567921     0.00     0.00  isType(RootObject*)
   0.65      0.69     0.01   536331     0.00     0.00  Mem::realloc(void*,
 unsigned long)
   0.65      0.70     0.01   454836     0.00     0.00
 Parameter::foreach(Array<**Parameter>*, int (*)
   0.65      0.71     0.01   375732     0.00     0.00  Type::merge()
   0.65      0.72     0.01   353615     0.00     0.00  Mem::free(void*)
   0.65      0.73     0.01   348546     0.00     0.00
  isDsymbol(RootObject*)
   0.65      0.74     0.01   339336     0.00     0.00
 exceptionOrCantInterpret(**Expression*)
   0.65      0.75     0.01   311493     0.00     0.00  isTuple(RootObject*)
   0.65      0.76     0.01   300069     0.00     0.00  TypeBasic::isscalar()
   0.65      0.77     0.01   270438     0.00     0.00
  VarDeclaration::hasPointers()
   0.65      0.78     0.01   248163     0.00     0.00
 Identifier::equals(RootObject***)
   0.65      0.79     0.01   234806     0.00     0.00  Parser::parseAddExp()
Jun 23 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/23/2013 8:39 PM, Timothee Cour wrote:
 can you give more context ? what was the command line to get this?
Compile on Linux with -pg. Run with: dmd -main -unittest std/algorithm
 I thought lexing was not a bottleneck, yet it seems to be in second place?
The fun with profilers is they tell you things you didn't think were true.
Jun 23 2013
next sibling parent Timothee Cour <thelastmammoth gmail.com> writes:
On Sun, Jun 23, 2013 at 8:55 PM, Walter Bright
<newshound2 digitalmars.com>wrote:

 On 6/23/2013 8:39 PM, Timothee Cour wrote:

 can you give more context ? what was the command line to get this?
Compile on Linux with -pg. Run with: dmd -main -unittest std/algorithm
 I thought lexing was not a bottleneck, yet it seems to be in second
place? The fun with profilers is they tell you things you didn't think were true.
more profiliing numbers from Ian Buclaw, for gdc: http://forum.dlang.org/post/mailman.1469.1339580395.24740.digitalmars-d puremagic.com http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf That cam up in an old proposal i made a year ago: "AST files instead of DI interface files for faster compilation and easier distribution".
Jun 23 2013
prev sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 24 June 2013 05:53, Timothee Cour <thelastmammoth gmail.com> wrote:
 On Sun, Jun 23, 2013 at 8:55 PM, Walter Bright <newshound2 digitalmars.com>
 wrote:
 On 6/23/2013 8:39 PM, Timothee Cour wrote:
 can you give more context ? what was the command line to get this?
Compile on Linux with -pg. Run with: dmd -main -unittest std/algorithm
 I thought lexing was not a bottleneck, yet it seems to be in second
 place?
The fun with profilers is they tell you things you didn't think were true.
more profiliing numbers from Ian Buclaw, for gdc: http://forum.dlang.org/post/mailman.1469.1339580395.24740.digitalmars-d puremagic.com http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf That cam up in an old proposal i made a year ago: "AST files instead of DI interface files for faster compilation and easier distribution".
With that report, all modules are compiled with roughly the equivalent of dmd's -O -g -release. So for std.algorithm, it is show to take 0.52 seconds to compile. However with -unittest turned on, I know it takes at least a minute for gdc to compile that module (up there with std.datetime) - and consumes about 2GBs of memory. ;-) -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 23 2013
prev sibling next sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
how does that look using msvc compiling the dmd compiler
as it turns out that msvc make dmd much faster

btw: the memory profile tool i mentioned is also available as 
performance profiler: http://www.softwareverify.com/cpp-profiler.php

Am 23.06.2013 23:28, schrieb Walter Bright:
 Flat profile:

 Each sample counts as 0.01 seconds.
     %   cumulative   self              self     total
    time   seconds   seconds    calls  ms/call  ms/call  name
     5.19      0.08     0.08   945494     0.00     0.00  _aaGetRvalue(AA*,
void*)
     3.90      0.14     0.06   926415     0.00     0.00  Lexer::scan(Token*)
     3.25      0.19     0.05   397097     0.00     0.00  calcHash(char const*,
 unsigned long)
     2.60      0.23     0.04   373646     0.00     0.00 
StringTable::search(char
 const*, unsigned lon
     1.95      0.26     0.03    33255     0.00     0.00
 TemplateInstance::semantic(Scope*, Array<Expr
     1.30      0.28     0.02   968985     0.00     0.00 
ScopeDsymbol::search(Loc,
 Identifier*, int)
     1.30      0.30     0.02   424079     0.00     0.00  calccodsize
     1.30      0.32     0.02   234547     0.00     0.00  Parser::parseShiftExp()
     1.30      0.34     0.02   168933     0.00     0.00 
AliasDeclaration::toAlias()
     1.30      0.36     0.02   136371     0.00     0.00  _aaGet(AA**, void*)
     1.30      0.38     0.02   132102     0.00     0.00 
TypeStruct::hasPointers()
     1.30      0.40     0.02    32490     0.00     0.00
 TemplateDeclaration::matchWithInstance(Templa
     1.30      0.42     0.02    21666     0.00     0.00
 FuncDeclaration::functionSemantic()
     1.30      0.44     0.02    17983     0.00     0.00
 VarDeclaration::syntaxCopy(Dsymbol*)
     1.30      0.46     0.02    12908     0.00     0.00
 TemplateDeclaration::deduceFunctionTemplateMa
     1.30      0.48     0.02     6890     0.00     0.00  assignaddrc
     1.30      0.50     0.02     6563     0.00     0.00  ecom(elem**)
     1.30      0.52     0.02     5675     0.00     0.00
 TypeInfo_Idxstr::getHash(void*)
     1.30      0.54     0.02     3780     0.01     0.01
 TemplateDeclaration::semantic(Scope*)
     1.30      0.56     0.02     2648     0.01     0.01  operator delete(void*)
     0.97      0.58     0.02    42592     0.00     0.00
 IdentifierExp::semantic(Scope*)
     0.97      0.59     0.02    36403     0.00     0.00  BinExp::syntaxCopy()
     0.65      0.60     0.01  2668529     0.00     0.00  operator new(unsigned
long)
     0.65      0.61     0.01  2575503     0.00     0.00 
Dsymbol::isTemplateMixin()
     0.65      0.62     0.01  2310757     0.00     0.00  Type::toBasetype()
     0.65      0.63     0.01  2096689     0.00     0.00
 OutBuffer::reserve(unsigned long)
     0.65      0.64     0.01  1030635     0.00     0.00 
Dsymbol::isTemplateInstance()
     0.65      0.65     0.01   708615     0.00     0.00  Module::search(Loc,
 Identifier*, int)
     0.65      0.66     0.01   683509     0.00     0.00 
VarDeclaration::isDataseg()
     0.65      0.67     0.01   662180     0.00     0.00  Module::isModule()
     0.65      0.68     0.01   567921     0.00     0.00  isType(RootObject*)
     0.65      0.69     0.01   536331     0.00     0.00  Mem::realloc(void*,
 unsigned long)
     0.65      0.70     0.01   454836     0.00     0.00
 Parameter::foreach(Array<Parameter>*, int (*)
     0.65      0.71     0.01   375732     0.00     0.00  Type::merge()
     0.65      0.72     0.01   353615     0.00     0.00  Mem::free(void*)
     0.65      0.73     0.01   348546     0.00     0.00  isDsymbol(RootObject*)
     0.65      0.74     0.01   339336     0.00     0.00
 exceptionOrCantInterpret(Expression*)
     0.65      0.75     0.01   311493     0.00     0.00  isTuple(RootObject*)
     0.65      0.76     0.01   300069     0.00     0.00  TypeBasic::isscalar()
     0.65      0.77     0.01   270438     0.00     0.00 
VarDeclaration::hasPointers()
     0.65      0.78     0.01   248163     0.00     0.00
 Identifier::equals(RootObject*)
     0.65      0.79     0.01   234806     0.00     0.00  Parser::parseAddExp()
Jun 24 2013
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Monday, 24 June 2013 at 13:21:23 UTC, dennis luehring wrote:
 how does that look using msvc compiling the dmd compiler
 as it turns out that msvc make dmd much faster
Cosnidering keyword "gprof", chances are extremely high it was run with gcc on some not-so-MS platform.
Jun 24 2013
parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 24.06.2013 16:09, schrieb Dicebot:
 On Monday, 24 June 2013 at 13:21:23 UTC, dennis luehring wrote:
 how does that look using msvc compiling the dmd compiler
 as it turns out that msvc make dmd much faster
Cosnidering keyword "gprof", chances are extremely high it was run with gcc on some not-so-MS platform.
i know - my question was - how does that look using msvc... because the msvc build tends to be faster - would be a nice compare if an windows (dmc and msvc) build will show similar hotspots under (because of msvc) an windows based profiler
Jun 24 2013
next sibling parent reply "Richard Webb" <webby beardmouse.org.uk> writes:
On Monday, 24 June 2013 at 15:04:37 UTC, dennis luehring wrote:
 i know - my question was - how does that look using msvc...
I just did a very quick test using the latest DMD source: Using the command line -release -unittest -c D:\DTesting\dmd.2.063\src\phobos\std\algorithm.d DMD built with DMC takes ~49 seconds to complete, but DMD build with VC2008 only takes ~12 seconds. (Need to get a proper VC build done to test it properly). Looks like the DMC build spends far more time allocating memory, even though the peak memory usage is only slightly lower in the VS version?
Jun 24 2013
next sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 24.06.2013 18:15, schrieb Richard Webb:
 On Monday, 24 June 2013 at 15:04:37 UTC, dennis luehring wrote:
 i know - my question was - how does that look using msvc...
I just did a very quick test using the latest DMD source: Using the command line -release -unittest -c D:\DTesting\dmd.2.063\src\phobos\std\algorithm.d DMD built with DMC takes ~49 seconds to complete, but DMD build with VC2008 only takes ~12 seconds. (Need to get a proper VC build done to test it properly). Looks like the DMC build spends far more time allocating memory, even though the peak memory usage is only slightly lower in the VS version?
so it could be std library implementation related - can DMC use the msvc libs? (just for the compare) and you should also try 2010 - or better 2012 msvc (it still gets speedier code out)
Jun 24 2013
parent reply "SomeDude" <lovelydear mailmetrash.com> writes:
On Monday, 24 June 2013 at 16:46:51 UTC, dennis luehring wrote:

 so it could be std library implementation related - can DMC use 
 the msvc libs? (just for the compare)

 and you should also try 2010 - or better 2012 msvc (it still 
 gets speedier code out)
Is there still a free version of the VS compiler ?
Jun 25 2013
next sibling parent "SomeDude" <lovelydear mailmetrash.com> writes:
On Wednesday, 26 June 2013 at 05:39:00 UTC, SomeDude wrote:
 On Monday, 24 June 2013 at 16:46:51 UTC, dennis luehring wrote:

 so it could be std library implementation related - can DMC 
 use the msvc libs? (just for the compare)

 and you should also try 2010 - or better 2012 msvc (it still 
 gets speedier code out)
Is there still a free version of the VS compiler ?
Ah yes, Visual Studio Express 2012 is free.
Jun 25 2013
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, June 26, 2013 07:38:51 SomeDude wrote:
 On Monday, 24 June 2013 at 16:46:51 UTC, dennis luehring wrote:
 so it could be std library implementation related - can DMC use
 the msvc libs? (just for the compare)
 
 and you should also try 2010 - or better 2012 msvc (it still
 gets speedier code out)
Is there still a free version of the VS compiler ?
Yes. http://www.microsoft.com/visualstudio/eng/downloads Though it's obviously going to be a stripped down version. - Jonathan M Davis
Jun 25 2013
prev sibling parent dennis luehring <dl.soluz gmx.net> writes:
Am 26.06.2013 07:38, schrieb SomeDude:
 On Monday, 24 June 2013 at 16:46:51 UTC, dennis luehring wrote:

 so it could be std library implementation related - can DMC use
 the msvc libs? (just for the compare)

 and you should also try 2010 - or better 2012 msvc (it still
 gets speedier code out)
Is there still a free version of the VS compiler ?
always the latest (currently) vs2012 as express edition (only the MFC library is missing)
Jun 26 2013
prev sibling next sibling parent dennis luehring <dl.soluz gmx.net> writes:
Am 24.06.2013 18:15, schrieb Richard Webb:
 On Monday, 24 June 2013 at 15:04:37 UTC, dennis luehring wrote:
 i know - my question was - how does that look using msvc...
I just did a very quick test using the latest DMD source: Using the command line -release -unittest -c D:\DTesting\dmd.2.063\src\phobos\std\algorithm.d DMD built with DMC takes ~49 seconds to complete, but DMD build with VC2008 only takes ~12 seconds. (Need to get a proper VC build done to test it properly). Looks like the DMC build spends far more time allocating memory, even though the peak memory usage is only slightly lower in the VS version?
i got similar speed improvemnts using vs2010
Jun 24 2013
prev sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 24.06.2013 18:15, schrieb Richard Webb:
 DMD built with DMC takes ~49 seconds to complete, but DMD build
 with VC2008 only takes ~12 seconds. (Need to get a proper VC
 build done to test it properly).
 Looks like the DMC build spends far more time allocating memory,
 even though the peak memory usage is only slightly lower in the
 VS version?
i've done VS2012 + Intel VTune Amp XE 2013 profiling - see the attached zipped csv file
Jun 24 2013
next sibling parent dennis luehring <dl.soluz gmx.net> writes:
Am 25.06.2013 07:51, schrieb dennis luehring:
 Am 24.06.2013 18:15, schrieb Richard Webb:
 DMD built with DMC takes ~49 seconds to complete, but DMD build
 with VC2008 only takes ~12 seconds. (Need to get a proper VC
 build done to test it properly).
 Looks like the DMC build spends far more time allocating memory,
 even though the peak memory usage is only slightly lower in the
 VS version?
i've done VS2012 + Intel VTune Amp XE 2013 profiling - see the attached zipped csv file
sorry it was VS2010 (and VTune Trial) the results of VTune seems to be different to Walters
Jun 24 2013
prev sibling parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 25.06.2013 07:51, schrieb dennis luehring:
 Am 24.06.2013 18:15, schrieb Richard Webb:
 DMD built with DMC takes ~49 seconds to complete, but DMD build
 with VC2008 only takes ~12 seconds. (Need to get a proper VC
 build done to test it properly).
 Looks like the DMC build spends far more time allocating memory,
 even though the peak memory usage is only slightly lower in the
 VS version?
i've done VS2012 + Intel VTune Amp XE 2013 profiling - see the attached zipped csv file
the AMD CodeXL results are also different - both VTune and CodeXL fully integrated into VS2010 and using "same" settings btw nice to read: http://www.yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html
Jun 24 2013
parent reply "Kiith-Sa" <kiithsacmp gmail.com> writes:
On Tuesday, 25 June 2013 at 06:21:09 UTC, dennis luehring wrote:
 Am 25.06.2013 07:51, schrieb dennis luehring:
 Am 24.06.2013 18:15, schrieb Richard Webb:
 DMD built with DMC takes ~49 seconds to complete, but DMD 
 build
 with VC2008 only takes ~12 seconds. (Need to get a proper VC
 build done to test it properly).
 Looks like the DMC build spends far more time allocating 
 memory,
 even though the peak memory usage is only slightly lower in 
 the
 VS version?
i've done VS2012 + Intel VTune Amp XE 2013 profiling - see the attached zipped csv file
the AMD CodeXL results are also different - both VTune and CodeXL fully integrated into VS2010 and using "same" settings btw nice to read: http://www.yosefk.com/blog/how-profilers-lie-the-cases-of-gprof-and-kcachegrind.html
GProf tends to be pretty useless for actual profiling in my experience. I think the best way is to use a sampling profiler such as 'perf' (a part of the linux project on a recent debian/ubuntu/mint type 'perf' into console to get info about what package to install, docs at https://perf.wiki.kernel.org/index.php/Tutorial‎, 'oprofile' (pretty much the same featureset as perf, sometimes hard to set up) or VTune mentioned here. Never expect gprof to give you reliable data as to how much time which function takes. Callgrind/kcachegrind is also pretty good if your code doesn't spend a lot of time on i/o, system calls, etc (as the main code is running in a slow VM - anything not running in that VM will seem to run much faster). Furthermore, _neither_ of these requires compiling with special flags. As for debug symbols, it's best to enable optimizations together with enabling debug symbols. Optimizations are not a big issue - even if some functions were inlined, these tools give you per-line and per-instruction results. Not to mention cache hits/misses, branches, branch mispredictions, and if you use CPU specific event IDs whatever else your CPU can record. AND it doesn't affect performance of profiled code measurably, unless you set an insanely high sample rate. And if this sounds difficult to configure, most of these tools (perf at the very least) have very sane defaults that give way more useful results than gprof. TLDR: gprof is horrible. Never use it for profiling. There are approximaly 5 billion better tools that give more detailed results _and_ are easier to use. I seriously need to write a blog post/article about this.
Jun 25 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/25/2013 2:27 AM, Kiith-Sa wrote:
 TLDR: gprof is horrible. Never use it for profiling. There are approximaly 5
 billion better tools that give more detailed results _and_ are easier to use.
One thing gprof does give is accurate counts, and accurate fan in / fan out data. The fan data is very useful in finding out who is calling who and how many times.
 I seriously need to write a blog post/article about this.
Please do so!
Jun 25 2013
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 6/24/13, dennis luehring <dl.soluz gmx.net> wrote:
 i know - my question was - how does that look using msvc...
The recent speed improvements in git-head have improve both the DMC-based and MSVC-based DMD executables. It shaved off from 700msecs on small projects up to 4 seconds on large projects for clean builds (for me). That being said the MSVC built compiler is usually twice as fast as the DMC built one. It goes to show that whatever is in that DMC backend is long outdated.
Jun 24 2013
parent reply "Adam Wilson" <flyboynw gmail.com> writes:
On Mon, 24 Jun 2013 11:44:24 -0700, Andrej Mitrovic  
<andrej.mitrovich gmail.com> wrote:

 On 6/24/13, dennis luehring <dl.soluz gmx.net> wrote:
 i know - my question was - how does that look using msvc...
The recent speed improvements in git-head have improve both the DMC-based and MSVC-based DMD executables. It shaved off from 700msecs on small projects up to 4 seconds on large projects for clean builds (for me). That being said the MSVC built compiler is usually twice as fast as the DMC built one. It goes to show that whatever is in that DMC backend is long outdated.
I asked Walter about that recently, he says that with all the focus on D there hasn't been enough time to seriously develop the backend in a LONG time. IIRC he said it's been something like 15 years since it's seen any serious development work... NOTE: It came up because I was asking about getting things like auto-vectorization and Haswell's new Transactional Synchronization Extensions and the related instructions into the backend. -- Adam Wilson IRC: LightBender Project Coordinator The Horizon Project http://www.thehorizonproject.org/
Jun 24 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/24/2013 12:27 PM, Adam Wilson wrote:
 I asked Walter about that recently, he says that with all the focus on D there
 hasn't been enough time to seriously develop the backend in a LONG time. IIRC
he
 said it's been something like 15 years since it's seen any serious development
 work...
It's also possible that a faster malloc() is responsible for much of it.
Jun 24 2013
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/24/2013 6:19 AM, dennis luehring wrote:
 how does that look using msvc compiling the dmd compiler
 as it turns out that msvc make dmd much faster
The profile report was done by gcc/gprof. And besides, better compilers shouldn't change profile results.
Jun 24 2013
next sibling parent dennis luehring <dl.soluz gmx.net> writes:
Am 24.06.2013 20:01, schrieb Walter Bright:
 On 6/24/2013 6:19 AM, dennis luehring wrote:
 how does that look using msvc compiling the dmd compiler
 as it turns out that msvc make dmd much faster
The profile report was done by gcc/gprof. And besides, better compilers shouldn't change profile results.
different optimization strategies can shuffle the results totaly - or do we talk about different things
Jun 24 2013
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Monday, 24 June 2013 at 18:01:11 UTC, Walter Bright wrote:
 On 6/24/2013 6:19 AM, dennis luehring wrote:
 how does that look using msvc compiling the dmd compiler
 as it turns out that msvc make dmd much faster
The profile report was done by gcc/gprof. And besides, better compilers shouldn't change profile results.
I'm confused.... Different optimisers often produce radically different profile results. Or are am I misunderstanding you?
Jun 25 2013
parent dennis luehring <dl.soluz gmx.net> writes:
Am 26.06.2013 08:47, schrieb John Colvin:
 On Monday, 24 June 2013 at 18:01:11 UTC, Walter Bright wrote:
 On 6/24/2013 6:19 AM, dennis luehring wrote:
 how does that look using msvc compiling the dmd compiler
 as it turns out that msvc make dmd much faster
The profile report was done by gcc/gprof. And besides, better compilers shouldn't change profile results.
I'm confused.... Different optimisers often produce radically different profile results. Or are am I misunderstanding you?
maybe he talk about the dmd -release parameter but even that should produce different results (or is the same code use the same amount?)
Jun 26 2013
prev sibling parent reply Martin Nowak <code dawg.eu> writes:
For some reason gprof nerver terminates on my system. So here is the 
same data recorded with oprofile's operf. It also includes kernel, libc 
and ld figures.

http://codepad.org/zL3C6eQd

whole phobos as shared library with unittests and BUILD=debug
http://codepad.org/lITSMbhD

BUILD=release didn't compile with some error about .offsetof
Jun 24 2013
parent reply dennis luehring <dl.soluz gmx.net> writes:
Am 24.06.2013 16:52, schrieb Martin Nowak:
 For some reason gprof nerver terminates on my system. So here is the
 same data recorded with oprofile's operf. It also includes kernel, libc
 and ld figures.

 http://codepad.org/zL3C6eQd

 whole phobos as shared library with unittests and BUILD=debug
 http://codepad.org/lITSMbhD

 BUILD=release didn't compile with some error about .offsetof
what is the sense of doing profiling in BUILD=debug? or is the optimizer still active?
Jun 24 2013
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 6/24/2013 8:57 AM, dennis luehring wrote:
 what is the sense of doing profiling in BUILD=debug?
Non-debug builds may strip out the symbols, rendering the profiling report useless.
Jun 24 2013
prev sibling parent reply Martin Nowak <code dawg.eu> writes:
On 06/24/2013 05:57 PM, dennis luehring wrote:
 what is the sense of doing profiling in BUILD=debug? or is the optimizer
 still active?
That is compiling phobos in debug (non-optimized) mode not dmd. As I said building phobos in release didn't succeed for whatever reason. I can try to install kernel debuginfo that 12% might contain some useful information.
Jun 24 2013
parent reply Martin Nowak <code dawg.eu> writes:
On 06/24/2013 08:43 PM, Martin Nowak wrote:
 I can try to install kernel debuginfo that 12% might contain some useful
 information.
http://codepad.org/gWrGvm40
Jun 24 2013
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Mon, 24 Jun 2013 21:01:36 +0200
schrieb Martin Nowak <code dawg.eu>:

 On 06/24/2013 08:43 PM, Martin Nowak wrote:
 I can try to install kernel debuginfo that 12% might contain some useful
 information.
http://codepad.org/gWrGvm40
Interesting. So to troll a bit, do I see it right, that dmd is mostly a Unicode conversion and memory allocation tool ? -- Marco
Jun 24 2013
next sibling parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 25 June 2013 07:46, Marco Leise <Marco.Leise gmx.de> wrote:
 Am Mon, 24 Jun 2013 21:01:36 +0200
 schrieb Martin Nowak <code dawg.eu>:

 On 06/24/2013 08:43 PM, Martin Nowak wrote:
 I can try to install kernel debuginfo that 12% might contain some useful
 information.
http://codepad.org/gWrGvm40
Interesting. So to troll a bit, do I see it right, that dmd is mostly a Unicode conversion and memory allocation tool ?
The D front end does nothing *but* allocate memory... and sometimes from all this allocation (if your computer doesn't die) a compiled program is produced. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 25 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/25/13 2:13 AM, Iain Buclaw wrote:
 On 25 June 2013 07:46, Marco Leise<Marco.Leise gmx.de>  wrote:
 Am Mon, 24 Jun 2013 21:01:36 +0200
 schrieb Martin Nowak<code dawg.eu>:

 On 06/24/2013 08:43 PM, Martin Nowak wrote:
 I can try to install kernel debuginfo that 12% might contain some useful
 information.
http://codepad.org/gWrGvm40
Interesting. So to troll a bit, do I see it right, that dmd is mostly a Unicode conversion and memory allocation tool ?
The D front end does nothing *but* allocate memory... and sometimes from all this allocation (if your computer doesn't die) a compiled program is produced.
Then maybe it should use its own malloc that uses the bump-the-pointer approach from large chunks allocated with malloc. Andrei
Jun 25 2013
next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 25 June 2013 17:56, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 On 6/25/13 2:13 AM, Iain Buclaw wrote:
 On 25 June 2013 07:46, Marco Leise<Marco.Leise gmx.de>  wrote:
 Am Mon, 24 Jun 2013 21:01:36 +0200

 schrieb Martin Nowak<code dawg.eu>:

 On 06/24/2013 08:43 PM, Martin Nowak wrote:
 I can try to install kernel debuginfo that 12% might contain some
 useful
 information.
http://codepad.org/gWrGvm40
Interesting. So to troll a bit, do I see it right, that dmd is mostly a Unicode conversion and memory allocation tool ?
The D front end does nothing *but* allocate memory... and sometimes from all this allocation (if your computer doesn't die) a compiled program is produced.
Then maybe it should use its own malloc that uses the bump-the-pointer approach from large chunks allocated with malloc.
I meant it in the most light hearted way possible. Though it's no secret that heavily templated code, coupled with string mixins (eg: Neat is a good example which consumes at least 3GB memory to compile one module IIRC) can see memory allocations sore. I do believe feep preloads the boehm-gc when compiling to mitigate this. :o) -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 25 2013
prev sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 25 June 2013 at 16:56:55 UTC, Andrei Alexandrescu 
wrote:
 On 6/25/13 2:13 AM, Iain Buclaw wrote:
 On 25 June 2013 07:46, Marco Leise<Marco.Leise gmx.de>  wrote:
 Am Mon, 24 Jun 2013 21:01:36 +0200
 schrieb Martin Nowak<code dawg.eu>:

 On 06/24/2013 08:43 PM, Martin Nowak wrote:
 I can try to install kernel debuginfo that 12% might 
 contain some useful
 information.
http://codepad.org/gWrGvm40
Interesting. So to troll a bit, do I see it right, that dmd is mostly a Unicode conversion and memory allocation tool ?
The D front end does nothing *but* allocate memory... and sometimes from all this allocation (if your computer doesn't die) a compiled program is produced.
Then maybe it should use its own malloc that uses the bump-the-pointer approach from large chunks allocated with malloc.
The problem is not that malloc is slow, but malloc is being called too much. A hierarchical profile would be more useful here, to see exactly what's allocating all that memory. I'm also intrigued as to why so much string activity is happening (vfprintf!? num_to_str!?)
Jun 26 2013
prev sibling parent "Wyatt" <wyatt.epp gmail.com> writes:
On Tuesday, 25 June 2013 at 06:46:54 UTC, Marco Leise wrote:
 Am Mon, 24 Jun 2013 21:01:36 +0200
 schrieb Martin Nowak <code dawg.eu>:

 On 06/24/2013 08:43 PM, Martin Nowak wrote:
 I can try to install kernel debuginfo that 12% might contain 
 some useful
 information.
http://codepad.org/gWrGvm40
Interesting. So to troll a bit, do I see it right, that dmd is mostly a Unicode conversion and memory allocation tool ?
Maybe not DMD, so much as GNU ld? Whatever the case, I'm not surprised to see their iconv that high up in the list; in my experience, it's horrifically slow. Ah, and a fun question: does that number change significantly when you modify your locale variables? (I think it should be enough to just export LC_CTYPE="C".) -Wyatt
Jun 25 2013