digitalmars.D.learn - profiling issues

"Vlad Levenfeld" <vlevenfeld gmail.com> writes:
I've got a library I've been building up over a few projects, and 
I've only ever run it under "debug", "unittest" and "release" 
(with dub "buildOptions").
Lately I've needed to control the performance more carefully, but 
unfortunately trying to compile with dub --profile gives me some 
strange errors:

1) A few lines in one of my modules are reported as "unreachable" 
by dmd. The data they operate on are defined entirely in code 
(i.e. not read as external input) so maybe they're getting CTFE'd 
into oblivion?
All I know is they're apparently reachable in non-profiled code 
(and very essential to the business logic... but they're just 
math functions, nothing crazy, one of the unreachable lines 
computes the areas of some polygons, another sums the areas up).

2) The linker complains about undefined references to 
std.exception.enforce being called from std.stdio.rawRead.

3) If I try to compile with "buildOptions":["profile"] instead of 
dub --profile, then it compiles and links but then I segfault on 
launch at gc_malloc.

I also recall (but can't seem to find) something about profiling 
not working with multithreaded code? Because almost every 
encapsulated service in this library runs on its own thread.

And the code base (>15k LOC) isn't easily reduced, as any 
remotely interesting main method I write pretty much pulls from 
the entire library. I don't want to have to turn this whole thing 
inside out. It's like 95% templates, and inlining wreaks havoc on 
the logic as well, but that's another problem for another day...

Does anyone else have these kinds of issues? Are there any 
alternative methods of coarse-grained profiling (i.e., not 
manually peppering timer calls into my code)? What's with the 
unreachable statements? Any hints on what I can try next to get 
closer to a performance profile of my code?
Sep 11 2014
"Kiith-Sa" <kiithsacmp gmail.com> writes:
On Friday, 12 September 2014 at 03:23:55 UTC, Vlad Levenfeld 
wrote:
 Does anyone else have these kinds of issues? Are there any 
 alternative methods of coarse-grained profiling (i.e., not 
 manually peppering timer calls into my code)? What's with the 
 unreachable statements? Any hints on what I can try next to get 
 closer to a performance profile of my code?

Instrumenting ('conventional') profilers such as DMD's built-in profiler or gprof are pretty useless for getting reliable data, because they distort the results. I recommend using a sampling profiler instead. With a sampling profiler you usually get results down to the source line or even the instruction level, and you don't need to recompile your binary (you do need debug symbols to get source lines, though). Sampling profilers also tend to be able to measure more than just time (e.g. cache misses for individual caches, branches _and_ branch mispredictions, FPU usage, etc.).

If you're on Linux, 'perf' is good. On Ubuntu/Mint, and possibly other distros, just type 'perf' into the console and it will tell you what package to install; usually it's 'linux-tools-common'.

https://perf.wiki.kernel.org/index.php/Tutorial

It also has the awesome 'perf top' utility, which lets you profile in real time, like 'top' but with functions instead of processes.

OProfile is good *if you can get it to run*. It is very similar in usage to perf, but I almost always run into some issue with it.

AMD CodeXL is also decent and runs on both Linux and Windows, although on non-AMD CPUs it can only measure execution time (still very useful, down to instruction level).

RotateRight Zoom and Intel VTune should also be good, but both are commercial.

If you're writing a game or any other real-time interactive application and need to profile occasional lags, you might need a different approach. In that case you won't avoid manual instrumentation, although it's rather easy to use:

http://defenestrate.eu/2014/09/05/frame_based_game_profiling.html
https://github.com/kiith-sa/tharsis.prof
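
For reference, a typical perf session on Linux might look roughly like the sketch below. This is only an illustration, not part of the original advice: the helper name `profile_app` and the binary path `./myapp` are made up, and it assumes the binary was built with optimizations plus debug symbols (with dub, the `release-debug` build type is one way to get that).

```shell
#!/bin/sh
# Sketch of a sampling-profiler run with perf on Linux.
# profile_app is a hypothetical helper; pass it the path to your own executable,
# e.g. one produced by `dub build --build=release-debug`.

profile_app() {
    app="$1"

    # Bail out early if perf or the binary is missing.
    if ! command -v perf >/dev/null 2>&1; then
        echo "perf is not installed" >&2
        return 1
    fi
    if [ ! -x "$app" ]; then
        echo "no executable at $app" >&2
        return 1
    fi

    # Count a few hardware events over one full run of the program.
    perf stat -e cache-misses,branch-misses "$app" || return 1

    # Run the program again, sampling it with call graphs.
    # perf writes the samples to perf.data in the current directory.
    perf record -g "$app" || return 1

    # Print the start of the plain-text report (hottest functions first).
    perf report --stdio | head -n 40
}
```

Usage would be something like `profile_app ./myapp`. For a program that is already running, `perf top -p <pid>` shows the live view described above. Note that the sketch runs the program twice (once for `perf stat`, once for `perf record`), which may not suit long-running or interactive applications.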
Sep 11 2014
"Vlad Levenfeld" <vlevenfeld gmail.com> writes:
Awesome! These are exactly what I was looking for. Thanks!
Sep 11 2014