digitalmars.D - Improve performance of -profile by factor of 10

reply Walter Bright <newshound2 digitalmars.com> writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9787

This can be a fun little project, with a nice payoff. Any takers?
Mar 22 2013
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
On Mar 22, 2013, at 3:39 PM, Walter Bright <newshound2 digitalmars.com> wrote:

 http://d.puremagic.com/issues/show_bug.cgi?id=9787

 This can be a fun little project, with a nice payoff. Any takers?

Bonus points if the code is made multithread capable. When I was thinking about this before, the correct approach seemed to be tracking profile data on a per-thread basis and then merging results into final on thread termination.
Mar 25 2013
parent Martin Nowak <code dawg.eu> writes:
On 03/25/2013 11:37 PM, Sean Kelly wrote:
 On Mar 22, 2013, at 3:39 PM, Walter Bright <newshound2 digitalmars.com> wrote:

 http://d.puremagic.com/issues/show_bug.cgi?id=9787

 This can be a fun little project, with a nice payoff. Any takers?

Bonus points if the code is made multithread capable. When I was thinking about this before, the correct approach seemed to be tracking profile data on a per-thread basis and then merging results into final on thread termination.

Mar 26 2013
prev sibling next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Friday, 22 March 2013 at 22:39:56 UTC, Walter Bright wrote:
 http://d.puremagic.com/issues/show_bug.cgi?id=9787

 This can be a fun little project, with a nice payoff. Any 
 takers?

This is a bit off-topic, but:

A polling profiler would be more precise and efficient than an instrumenting profiler. A polling profiler simply periodically pauses the program thread, records its state, and resumes it. The advantage is that execution times of small functions are not skewed by the overhead added by instrumentation. A polling profiler runs mostly in its own thread, so it has a smaller impact on the main program thread. A polling profiler is also capable of measuring performance down to a CPU-instruction level.

The disadvantages of a polling profiler are:

1. The program must run for a considerable amount of time, so the profiler gathers enough samples to build a good picture of the program's performance;
2. As a consequence of the above, functions that execute quickly / are called relatively rarely may not appear in the profiler's output at all;
3. Stack frames must be enabled, to be able to collect call stacks.

On Windows, I've had good success with compiling D programs with -g, converting their debug information using cv2pdb[1], then profiling them using Very Sleepy - I have a fork of it with some enhancements[2].

[1]: http://dsource.org/projects/cv2pdb
[2]: http://blog.thecybershadow.net/2013/01/11/very-sleepy-fork/
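
The sample/record/sleep cycle described in this post can be approximated in a few lines of Python. This is a rough in-process sketch, not how Very Sleepy works: CPython's `sys._current_frames()` takes a snapshot of another thread's stack without actually suspending it, and `sample_thread`, `spin` and `profile_spin` are hypothetical names made up for the example.

```python
import sys
import threading
import time
from collections import Counter


def sample_thread(thread_id, samples=50, interval=0.002):
    """Periodically inspect another thread's stack and count what is on it.

    Returns a Counter mapping function name -> number of samples in which
    that function appeared anywhere on the sampled call stack.
    """
    seen = Counter()
    for _ in range(samples):
        frame = sys._current_frames().get(thread_id)
        names = set()
        while frame is not None:          # walk from innermost to outermost
            names.add(frame.f_code.co_name)
            frame = frame.f_back
        seen.update(names)                # each function counted once per sample
        time.sleep(interval)              # sleep before the next sample
    return seen


def spin(stop):
    """Hypothetical hot function: burns CPU until asked to stop."""
    x = 0
    while not stop.is_set():
        x += 1
    return x


def profile_spin():
    stop = threading.Event()
    worker = threading.Thread(target=spin, args=(stop,))
    worker.start()
    try:
        return sample_thread(worker.ident)
    finally:
        stop.set()
        worker.join()
```

The profiled function contains no profiling code at all; everything happens in the sampling thread, which is the property being argued for above.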
Mar 25 2013
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/25/2013 4:22 PM, Vladimir Panteleev wrote:
 The disadvantages of a polling profiler are:

4. not getting the fan in / fan out data.
5. requires non-trivial effort in getting it to work on each platform.
Mar 25 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/25/2013 7:26 PM, Vladimir Panteleev wrote:
 On Monday, 25 March 2013 at 23:52:26 UTC, Walter Bright wrote:
 On 3/25/2013 4:22 PM, Vladimir Panteleev wrote:
 The disadvantages of a polling profiler are:

4. not getting the fan in / fan out data.

It is assembled from collected stack frames (assuming I understood the term correctly).

While you can get the caller (after all, debuggers do it), it can be arbitrarily costly (in terms of execution speed) to do so, which can negate many of the advantages of a probing profiler. The ones I've seen didn't bother to do it.

Fan in/out is very useful because the most effective optimization is to not call the time consuming functions, and this path information enables you to figure out where you don't really need to call it.
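
For readers unfamiliar with the term, here is a small sketch of how fan in / fan out tables could be assembled from sampled call stacks, as suggested earlier in the thread. The function name `fan_in_out` and the sample data are invented for illustration. Note that the numbers are sample hits, not the exact call counts an instrumenting profiler would report.

```python
from collections import defaultdict


def fan_in_out(stacks):
    """Build caller/callee tables from sampled call stacks.

    Each stack is a list of function names ordered outermost first,
    e.g. ["main", "parse", "lex"].
    """
    fan_out = defaultdict(lambda: defaultdict(int))  # caller -> callee -> hits
    fan_in = defaultdict(lambda: defaultdict(int))   # callee -> caller -> hits
    for stack in stacks:
        for caller, callee in zip(stack, stack[1:]):
            fan_out[caller][callee] += 1
            fan_in[callee][caller] += 1
    return fan_in, fan_out


# Hypothetical samples, invented for illustration.
SAMPLES = [
    ["main", "parse", "lex"],
    ["main", "parse", "lex"],
    ["main", "emit", "lex"],
    ["main", "emit"],
]
```

With these samples, `fan_in["lex"]` shows that `lex` was reached through `parse` twice and through `emit` once, which is the kind of path information that tells you which call site is worth eliminating.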
Mar 25 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/25/2013 8:01 PM, Vladimir Panteleev wrote:
 On Tuesday, 26 March 2013 at 02:57:07 UTC, Walter Bright wrote:
 While you can get the caller (after all, debuggers do it), it can be
 arbitrarily costly (in terms of execution speed) to do so, which can negate
 many of the advantages of a probing profiler.

What? You just read the value EBP is pointing at, or something like that. Walking the call stack is basically walking a linked list.

If only it were that simple.

1. many stack frames do not have an EBP
2. the stack frames on Win64 require doing a bunch of table searches to figure out - they don't use EBP
3. even when you find the return address, then it's a costly process to figure out what function that address belongs in
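
To make the third point concrete: mapping a return address back to a function means searching a symbol table. A minimal sketch, assuming the symbols have already been extracted and sorted by start address (the table, the addresses and the name `function_for` are all invented here), looks like this. Real debug information is much more involved to parse, which is part of the cost being described.

```python
import bisect

# Hypothetical symbol table: (start address, function name), invented here.
SYMBOLS = sorted([
    (0x1000, "main"),
    (0x1040, "parse"),
    (0x10C0, "lex"),
    (0x1200, "emit"),
])
STARTS = [start for start, _ in SYMBOLS]
IMAGE_END = 0x1300  # first address past the last function (assumed)


def function_for(address):
    """Return the name of the function containing `address`, or None."""
    if address < STARTS[0] or address >= IMAGE_END:
        return None
    # Find the last symbol whose start address is <= address.
    index = bisect.bisect_right(STARTS, address) - 1
    return SYMBOLS[index][1]
```

Each lookup is a binary search, so the per-sample cost grows with the logarithm of the number of symbols. A sampling profiler can also store raw addresses and postpone this step until after the run.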
 The ones I've seen didn't bother to do it.

Maybe they just weren't very good profilers ;) I've tried a few before I found Very Sleepy.
 Fan in/out is very useful because the most effective optimization is to not
 call the time consuming functions, and this path information enables you to
 figure out where you don't really need to call it.

Who's arguing that?

Just wanted to point out how useful it is!
Mar 25 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/25/2013 10:16 PM, deadalnix wrote:
 You can still stop the thread, gather the data you are interested in, and do
 the processing once the application has been resumed, which leverages concurrency.

The obvious difficulty with that is when the app is posting data to the profiling thread faster than the latter can process it.

At some point, I'm going to say feel free to write a better profiler! I only suggest that it be as easy to use as the existing one.
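
The producer-outruns-consumer concern can be illustrated with a bounded hand-off. This is only a sketch of one possible way to cope (drop and count), not something proposed in this thread; `SampleChannel` is a hypothetical name, and the drop counter assumes a single posting thread.

```python
import queue


class SampleChannel:
    """Bounded hand-off from the profiled thread to a profiling thread."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def post(self, sample):
        """Called by the application; never blocks it."""
        try:
            self._queue.put_nowait(sample)
        except queue.Full:
            self.dropped += 1   # consumer is behind: discard and count

    def drain(self):
        """Called by the profiling thread; returns everything queued so far."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items
```

With a capacity of 4 and ten samples posted before the consumer runs, six are lost. The alternative, an unbounded queue, trades lost samples for unbounded memory growth.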
Mar 26 2013
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 3/26/2013 4:52 PM, Vladimir Panteleev wrote:
 Even when the overhead of a polling profiler is higher, it has the qualifying
 difference in that the performance overhead does not skew the execution times of
 particular parts of a program as much as an instrumenting profiler. That is,
 while an instrumenting profiler makes some parts of the program much slower than
 others, a polling profiler should make the whole program about equally slower.

I've done a lot of very successful optimization jobs using -profile. Sure, it isn't terribly accurate, but it's plenty accurate enough where it matters.
Mar 26 2013
parent Walter Bright <newshound2 digitalmars.com> writes:
On 3/26/2013 6:30 PM, Jonathan M Davis wrote:
 On Tuesday, March 26, 2013 17:17:51 Walter Bright wrote:
 I've done a lot of very successful optimization jobs using -profile. Sure,
 it isn't terribly accurate, but it's plenty accurate enough where it
 matters.

I'd say that for the most part, our approach to stuff like -profile or unit testing is to provide a simple-to-use feature that solves the problem 90% of the time while leaving more powerful (and therefore more complicated) stuff to 3rd party solutions. That way, we have a tool that everyone can use effectively even if it doesn't have all of the bells and whistles, and those that really care about the bells and whistles can do what they normally would have done anyway if we didn't provide a solution (i.e. use or write a 3rd party solution).

Yup. My experience with such things, including profiling, coverage testing, unit testing, documentation, lambdas, etc., is that if it ain't easy, it just ain't happening. I remember in the 80's getting a profiler, and with it came a manual that was an inch thick. It never got used.
Mar 26 2013
prev sibling parent Martin Nowak <code dawg.eu> writes:
On 03/26/2013 12:22 AM, Vladimir Panteleev wrote:
 On Friday, 22 March 2013 at 22:39:56 UTC, Walter Bright wrote:
 A polling profiler would be more precise and efficient than an
 instrumenting profiler.

I don't see why we should switch the profiling method. Using sampling profilers has a lot of benefits but also some drawbacks and there are already a lot of them out there, e.g. http://en.wikipedia.org/wiki/Profiling_(computer_programming)#Statistical_profilers. On Unix you can use gprof, which is as simple as passing "-pg" to the gcc link command and calling gprof on the output data.
Mar 26 2013
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Monday, 25 March 2013 at 23:52:26 UTC, Walter Bright wrote:
 On 3/25/2013 4:22 PM, Vladimir Panteleev wrote:
 The disadvantages of a polling profiler are:

4. not getting the fan in / fan out data.

It is assembled from collected stack frames (assuming I understood the term correctly).
Mar 25 2013
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tuesday, 26 March 2013 at 02:57:07 UTC, Walter Bright wrote:
 While you can get the caller (after all, debuggers do it), it 
 can be arbitrarily costly (in terms of execution speed) to do 
 so, which can negate many of the advantages of a probing 
 profiler.

What? You just read the value EBP is pointing at, or something like that. Walking the call stack is basically walking a linked list.
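
The linked-list view of the stack can be sketched with simulated memory. This assumes the conventional 32-bit x86 layout with frame pointers enabled (saved EBP at `[ebp]`, return address at `[ebp + 4]`); the addresses and the name `walk_frames` are invented for illustration, and the walk only works if every function on the stack maintains that layout.

```python
def walk_frames(memory, ebp, max_depth=64):
    """Follow saved-EBP links and collect return addresses, innermost first.

    `memory` maps an address to the 32-bit word stored there. Each frame is
    assumed to hold the caller's EBP at [ebp] and the return address at
    [ebp + 4].
    """
    return_addresses = []
    while ebp != 0 and len(return_addresses) < max_depth:
        return_addresses.append(memory[ebp + 4])
        ebp = memory[ebp]  # step to the caller's frame
    return return_addresses


# Hypothetical stack contents, invented for illustration.
MEMORY = {
    0xFF00: 0xFF20, 0xFF04: 0x10D4,   # innermost frame
    0xFF20: 0xFF40, 0xFF24: 0x1058,
    0xFF40: 0x0000, 0xFF44: 0x1010,   # outermost frame: saved EBP is 0
}
```

Calling `walk_frames(MEMORY, 0xFF00)` yields the three return addresses from innermost to outermost. The result is still a list of raw addresses; turning them into function names is a separate step.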
 The ones I've seen didn't bother to do it.

Maybe they just weren't very good profilers ;) I've tried a few before I found Very Sleepy.
 Fan in/out is very useful because the most effective 
 optimization is to not call the time consuming functions, and 
 this path information enables you to figure out where you don't 
 really need to call it.

Who's arguing that?
Mar 25 2013
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 26 March 2013 at 05:01:03 UTC, Walter Bright wrote:
 On 3/25/2013 8:01 PM, Vladimir Panteleev wrote:
 On Tuesday, 26 March 2013 at 02:57:07 UTC, Walter Bright wrote:
 While you can get the caller (after all, debuggers do it), it can be
 arbitrarily costly (in terms of execution speed) to do so, which can negate
 many of the advantages of a probing profiler.

What? You just read the value EBP is pointing at, or something like that. Walking the call stack is basically walking a linked list.

If only it were that simple.

1. many stack frames do not have an EBP
2. the stack frames on Win64 require doing a bunch of table searches to figure out - they don't use EBP
3. even when you find the return address, then it's a costly process to figure out what function that address belongs in

You can still stop the thread, gather the data you are interested in, and do the processing once the application has been resumed, which leverages concurrency. The obvious advantage is that you don't measure the profiler's performance in addition to your app's.

BTW, I want to raise an issue with fibers. We should report 2 stacks: the stack of function calls, and the stack of fiber calls.
Mar 25 2013
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tuesday, 26 March 2013 at 16:33:19 UTC, Martin Nowak wrote:
 On 03/26/2013 12:22 AM, Vladimir Panteleev wrote:
 On Friday, 22 March 2013 at 22:39:56 UTC, Walter Bright wrote:
 A polling profiler would be more precise and efficient than an
 instrumenting profiler.

I don't see why we should switch the profiling method.

I'm not saying we should. Just letting D users know that other solutions exist, which would work better in some situations.
Mar 26 2013
prev sibling next sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Tuesday, 26 March 2013 at 07:45:34 UTC, Walter Bright wrote:
 On 3/25/2013 10:16 PM, deadalnix wrote:
 You can still stop the thread, gather the data you are interested in, and do
 the processing once the application has been resumed, which leverages concurrency.

The obvious difficulty with that is when the app is posting data to the profiling thread faster than the latter can process it.

That's not how polling profilers work. Polling profilers do not run alien code in the program's thread. Thus, the program thread is not posting anything to the profiling thread. I'm repeating myself, but: they work by having the profiler thread/process periodically pause the program thread, record its state, resume it, then analyze/store the collected data, and sleep a bit before taking another sample.

Even when the overhead of a polling profiler is higher, it has the qualifying difference in that the performance overhead does not skew the execution times of particular parts of a program as much as an instrumenting profiler. That is, while an instrumenting profiler makes some parts of the program much slower than others, a polling profiler should make the whole program about equally slower.
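
The skew argument can be checked with some simple arithmetic. All numbers below are invented for illustration: a hypothetical `tiny` function called a million times, a hypothetical `big` function called ten times, and an assumed fixed instrumentation cost per call.

```python
def shares(costs):
    """Convert absolute costs into fractions of the total."""
    total = sum(costs.values())
    return {name: cost / total for name, cost in costs.items()}


# Hypothetical workload, numbers invented for illustration (nanoseconds).
CALLS = {"tiny": 1_000_000, "big": 10}
BODY_NS = {"tiny": 20, "big": 2_000_000}
HOOK_NS = 80  # assumed fixed cost of the entry/exit instrumentation per call

# Uninstrumented: both functions account for the same total time.
true_cost = {f: CALLS[f] * BODY_NS[f] for f in CALLS}

# Instrumented: every call pays HOOK_NS, so frequently called code is inflated.
instrumented_cost = {f: CALLS[f] * (BODY_NS[f] + HOOK_NS) for f in CALLS}

# Uniform slowdown (the idealized sampling case): everything scales alike.
uniform_cost = {f: cost * 1.1 for f, cost in true_cost.items()}

true_share = shares(true_cost)
measured_share = shares(instrumented_cost)
uniform_share = shares(uniform_cost)
```

Here the two functions really split the run time evenly, but under per-call instrumentation `tiny` appears to take roughly five sixths of it. Under a uniform 10% slowdown the split stays even, which is the "about equally slower" case.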
Mar 26 2013
prev sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, March 26, 2013 17:17:51 Walter Bright wrote:
 I've done a lot of very successful optimization jobs using -profile. Sure,
 it isn't terribly accurate, but it's plenty accurate enough where it
 matters.

I'd say that for the most part, our approach to stuff like -profile or unit testing is to provide a simple-to-use feature that solves the problem 90% of the time while leaving more powerful (and therefore more complicated) stuff to 3rd party solutions. That way, we have a tool that everyone can use effectively even if it doesn't have all of the bells and whistles, and those that really care about the bells and whistles can do what they normally would have done anyway if we didn't provide a solution (i.e. use or write a 3rd party solution).

- Jonathan M Davis
Mar 26 2013