digitalmars.D - How accurate is dmd profile? (and do I need GDC/LDC to use gprof?)

Chris Katko <ckatko gmail.com> writes:
Does it break down on multi-threaded scenarios?

I'm running the newest dmd + Allegro (a C game programming library) 
with DAllegro (a nice templated binding). My executable is 
multi-threaded (mostly just helper functions / glue logic from 
libraries/D/etc.) and uses OpenGL on 64-bit Linux.

The function that dmd's -profile reported as the biggest user of 
CPU time was a tiny little function that draws a couple of 
background tiles (<30 in the worst case), as opposed to everything 
else being drawn: tons of OpenGL primitives, graphical text, text 
being converted with tons of writelns to console, etc.

It didn't make any sense, so I loaded it up with valgrind and 
kcachegrind and it said, "no, this function takes 0.00 of total 
time."

Should I be using DMD's -profile? Does it have known failure 
modes? Is this failure mode new to people? Is there any way to 
get normal profiling with gprof or whatever with DMD, or do I 
need to compile with LDC or GDC?

I'm getting back into D and I recall having both toolchains (LDC 
and DMD) running. This might have been the reason I kept LDC 
around and maintained two sets of libraries compiled for both LDC 
and DMD.

Also, the -profile instrumentation functions themselves used over 
7% of all CPU time. Is that the nature of the profiling, or is D 
using way more than comparable languages/compilers?

Lastly, is there any way to demangle D function names in 
Valgrind/kcachegrind?

Thanks! Have a great weekend!
Oct 03 2021
max haughton <maxhaton gmail.com> writes:
Unless your profiler does call stack sampling (which I don't think gprof or dmd -profile does), don't use it. Profilers without it are not reliable unless you are doing very targeted profiling. For profiling code on an Intel CPU, VTune is top dog. Nothing else is as good.
Oct 03 2021
drug <drug2004 bk.ru> writes:
Both sampling and instrumenting profilers can be unreliable. Sampling profiler results depend on the sampling rate, for example. An instrumenting profiler can change timings too much. In fact, sampling and instrumentation complement each other.
Oct 03 2021
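
As a rough illustration of how instrumentation can change timings, here is a minimal sketch in C. It is not how dmd -profile is implemented; `prof_entry`, `now_ns`, `draw_tile` and `draw_tile_profiled` are hypothetical names invented for this example, and it assumes a POSIX system with `clock_gettime`.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical per-function record kept by an instrumenting profiler. */
struct prof_entry {
    uint64_t calls;     /* number of times the function was entered */
    uint64_t total_ns;  /* accumulated wall-clock time in nanoseconds */
};

/* Read a monotonic clock and return the value in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Tiny function under test: cheap enough that the two clock reads
   in the wrapper can cost more than the work itself. */
static int draw_tile(int x)
{
    return x * 2 + 1;
}

/* Instrumented wrapper: the entry and exit hooks run on every call. */
int draw_tile_profiled(struct prof_entry *e, int x)
{
    uint64_t start = now_ns();
    int result = draw_tile(x);
    e->total_ns += now_ns() - start;
    e->calls += 1;
    return result;
}
```

For a function this small, the time recorded in `total_ns` can consist largely of the clock reads themselves, which is one way a tiny, frequently called function can end up looking expensive under instrumentation.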
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
Be aware that dmd -profile uses *16-bit counters* for tracking function call counts; if your program is CPU-intensive and calls the same function(s) in inner loops more than 65535 times, the counters will wrap around and cause the profile output to be garbled.

I ran into this a couple of years ago when trying to profile some CPU-intensive code with some non-trivial testcases, and found dmd -profile output completely unusable because of this limitation.

T

-- 
Recently, our IT department hired a bug-fix engineer. He used to work for Volkswagen.
Oct 03 2021
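
The wraparound described in the post above can be sketched in C. This only illustrates the arithmetic and is not taken from druntime; it assumes the counter really is 16 bits wide as the post states, and `count_calls_16` and `count_calls_64` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for a profiler's per-function call counter,
   assuming it is 16 bits wide as described in the post above. */
uint16_t count_calls_16(uint32_t calls)
{
    uint16_t counter = 0;
    for (uint32_t i = 0; i < calls; ++i)
        ++counter;  /* stored back into 16 bits, so it wraps modulo 65536 */
    return counter;
}

/* The same loop with a 64-bit counter, for comparison. */
uint64_t count_calls_64(uint32_t calls)
{
    uint64_t counter = 0;
    for (uint32_t i = 0; i < calls; ++i)
        ++counter;
    return counter;
}
```

With 70000 calls the 16-bit version returns 4464 (70000 - 65536), while the 64-bit version returns 70000, so a hot function would appear to have been called far less often than it actually was.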
Chris Katko <ckatko gmail.com> writes:
WHAT?! Wouldn't 32/64 bit (architecture native) values be faster for memory accesses anyway??
Oct 03 2021
bauss <jj_1337 live.dk> writes:
What the... I had no idea, but that alone renders it useless to me.
Oct 03 2021
Imperatorn <johan_forsberg_86 hotmail.com> writes:
Maybe we can change it
Oct 05 2021
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
That would be very nice. I believe the code is somewhere in druntime. It would save a lot of grief in the future. :-)

T

-- 
An elephant: A mouse built to government specifications. -- Robert Heinlein
Oct 06 2021