digitalmars.D - An easy way to profile apps visually

reply Guillaume Piolat <first.last spam.org> writes:
tl;dr: Simply generate a JSON file that follows the Trace Event Format.

Trace Event Format is a simple JSON format that is then read by 
web apps like:
  - chrome://tracing
  - https://ui.perfetto.dev/
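
As a rough illustration of the format (this is not the dplug code), here is a minimal Python sketch that builds a few "complete" events (`"ph": "X"`) with timestamps and durations in microseconds and wraps them in a `traceEvents` object. The helper names `complete_event` and `to_trace_json`, and the event names and timings, are made up for this example.

```python
import json

def complete_event(name, start_us, dur_us, pid=1, tid=1, cat="app"):
    """Build one 'complete' event (ph = "X") in Trace Event Format."""
    return {
        "name": name,    # label shown on the timeline slice
        "cat": cat,      # free-form category
        "ph": "X",       # phase: complete event (start + duration)
        "ts": start_us,  # start timestamp, in microseconds
        "dur": dur_us,   # duration, in microseconds
        "pid": pid,      # process id (groups tracks in the viewer)
        "tid": tid,      # thread id (one track per thread)
    }

def to_trace_json(events):
    """Serialize events using the JSON object form of the format."""
    return json.dumps({"traceEvents": events}, indent=1)

# Hypothetical timings, for illustration only.
events = [
    complete_event("load image A", 0, 1500, tid=1),
    complete_event("load image B", 200, 1300, tid=2),
    complete_event("draw background", 1600, 400, tid=1),
]
trace_text = to_trace_json(events)
```

Writing `trace_text` to a `.json` file should give something that either viewer listed above can open.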

You can very easily get images of your instrumented program like 
this:
https://imgur.com/a/q4Uwosz

Surprisingly, thread-local storage (TLS) really shines here, since 
you can collect the JSON trace in a thread-local manner and 
concatenate the output at the end. The reallocs, though, get more 
and more expensive as the trace grows, and the profile size 
balloons easily.

All in all, I think explicit frame profiling like this is a 
valuable alternative to either a sampling or an instrumenting 
profiler. At least you can finally visualize parallelism and how 
much of the time goes to synchronization.

Profiler implementation in dplug:gui => 
https://github.com/AuburnSounds/Dplug/blob/master/gui/dplug/gui/profiler.d
(I haven't tested it outside Windows yet; I was surprised that the 
synchronization was relatively lightweight). It would take only a 
little work to extract it from the library.
Dec 15 2022
parent reply Hipreme <msnmancini hotmail.com> writes:
On Thursday, 15 December 2022 at 14:14:01 UTC, Guillaume Piolat 
wrote:
 tl;dr: Simply generate a JSON file that follows the Trace Event 
 Format.

 [...]
What are your thoughts about that? Do you think it is worth it? Or is the proposal totally different? I have been using AMD uProf and I have been getting good results with it.
Dec 17 2022
parent reply Guillaume Piolat <first.last spam.org> writes:
On Saturday, 17 December 2022 at 10:24:29 UTC, Hipreme wrote:
 What are your thoughts about that? Do you think it is worth it? 
 Or is the proposal totally different? I have been using AMD 
 uProf and I have been getting good results with it.
I think sampling profilers are good for finding places where you spend CPU time, and "frame" profiling is good for finding parallelization opportunities and latency improvements.
Dec 18 2022
parent reply Hipreme <msnmancini hotmail.com> writes:
On Sunday, 18 December 2022 at 14:25:45 UTC, Guillaume Piolat 
wrote:
 On Saturday, 17 December 2022 at 10:24:29 UTC, Hipreme wrote:
 What are your thoughts about that? Do you think it is worth it? 
 Or is the proposal totally different? I have been using AMD 
 uProf and I have been getting good results with it.
 I think sampling profilers are good for finding places where you 
 spend CPU time, and "frame" profiling is good for finding 
 parallelization opportunities and latency improvements.
When could "frame profiling" show you a parallelization opportunity? I'm thinking about how I could apply that in my context.
Dec 18 2022
parent Guillaume Piolat <first.last spam.org> writes:
On Sunday, 18 December 2022 at 14:59:12 UTC, Hipreme wrote:
 When could "frame profiling" show you a parallelization 
 opportunity? I'm thinking about how I could apply that in my 
 context.
For example, in the image I posted, it had never occurred to me before that the first draw of the background widget could load two images at the same time to reduce first-open time. Programming in CUDA is very similar with the NVIDIA profiler: it becomes much easier to optimize for the bottleneck.
Dec 18 2022