www.digitalmars.com Home | Search | CTG | RTL | IDDE | STL
Last update Sat Apr 8 23:54:06 2006
Compiler & Tools Guide

Compiling
Compiling Code
C Implementation
C++ Implementation
Language Extensions
Mixing Languages
Assembly Language
Inline Assembler
Optimizing Code
Numerics Programming
Regular Expressions
Acrtused
Pragmas
Precompiled Headers
Predefined Macros
Warning Messages
Error Messages
Runtime Messages


Linking
Optlink
Switches
Module Definition Files
Operation and Design
Error Messages


Win32 Programming
Win32 Programming


DOS and Win16
Programming

Memory Models
16 Bit Pointer Types
and Type Modifiers

Handle Pointers
DOS
DOS 32 (DOSX)
Win16
Win16 DLLs
Win16 Prolog/Epilog


C/C++ Extensions
Contract Programming
__debug statement
__debug declaration
Dynamic Profiling
Embedding C in HTML


Tools
BCC
CHMOD
CL
COFF2OMF
COFFIMPLIB
DMC
DIFF
DIFFDIR
DUMP
DUMPOBJ
DUMPEXE
EXE2BIN
FLPYIMG
GREP
HC
IMPLIB
LIB
LIBUNRES
MAKE
MAKEDEP
ME
OBJ2ASM
PATCHOBJ
RC
RCC
SC
SHELL
SMAKE
TOUCH
UNMANGLE
WHEREIS


Porting to DMC++
Switching to DMC++
from Microsoft
from Borland
Porting Guide


Trace Dynamic Profiling

Page swapping can cripple an otherwise fast program. Normally, code is written in such a manner as to group together functions that are conceptually related, not those that call each other. To minimize page swapping, functions must be grouped together that call each other a lot at runtime.

Grouping functions that call each other has other benefits, even on machines that have sufficient memory to run the application without swapping:

  • Called functions will be more likely to be in the memory cache already.
  • When Windows loads a program or DLL, it doesn't actually read it into memory. It simply maps an address range onto the file. Then, when the program executes, pages of the program are swapped into memory as they are referenced. Grouping functions that call each other will minimize the amount of the program that actually has to be read in from disk, thus minimizing load time.
A compiler can determine the static calling relationships between functions in a program, but this has severe limitations. The compiler has no way to determine at compile time which connections are called relatively often. There is also no way to determine function calling relationships via virtual function dispatch, or other dispatch via function pointers.

What is necessary is to gather this information at runtime, preferably while running a typical application of the program. Such dynamic profiling information can then be processed to yield an order in which functions should be placed by the linker.

Dynamic Profiling With DMC++

This capability is now built in to Digital Mars C++. It is currently only implemented for Win32 systems, but the results should be directly applicable to a 16 bit version of your program.

Setting Up

It's enabled by two switches working together: -gt and -Nc.

-gt by itself inserts profiling code into every function, and when the program terminates, it prints out a log showing for each function:

  • What other functions call it and how often (called the "fan in")
  • What functions it calls and how often (the "fan out")
Throwing -gt will also predefine the macro
	#define _DEBUG_TRACE 1
If -Nc (make all functions COMDATs) is thrown as well, profiling code is only inserted into global COMDAT functions, since it is only global COMDAT functions that the linker can adjust the link order on.

If there are many static functions, and those are to be profiled and grouped as well, throw the -gg (make static functions global) also. This assumes, of course, that all static function names in the -gg modules are distinct from one another and all global names.

The program is linked normally.

Gathering the Data

Run the program in the manner it is typically used. The trace code will accumulate the information, and when the program is terminated, the information will be written to files 'trace.log' and 'trace.def'. trace.log is a readable text file with a list of all the traced functions, along with who calls them (fan in) and who they call (fan out), and each time this occurs.

Following this is listed timing data. The data are:

	Num Calls:      Number of times the function was called.
	Tree Time:      Time used by function plus the tree times of all the
			functions it calls.
	Func Time:      Time used by a function excluding the time
			used by the functions it calls (times for functions 
			it calls that are not compiled with -gt, like
			RTL functions, are included in Func Time).
	Per Call:       (Func Time) / (Num Calls)
If multiple runs are done with the program, the profiling data is merged and summed with any previously existing trace.log file.

The final output is the 'trace.def' file, which is meant to be included in the linker module definition file. It contains the list of the functions in the order the linker should link them.

Modifying Dynamic Profiling Behavior

The behavior of the profiler can be modified with several functions that can be explicitly called in the program. They are prototyped in trace.h. They are implemented aa macros that expand to nothing if -gt is not thrown.
	void trace_term()
Normally, this function is called when the program exits as part of the static destructor list. If this list is not called, or you wish to stop gathering data before the end of the program (for instance, to optimize for fast load), then insert a call to trace_term() at the point where it is to stop.
	int trace_setdeffilename(const char *name)
Set file name to use for .def file instead of trace.def. Returns: 0 for success, non-zero for failure
	int trace_setlogfilename(const char *name)
Set file name to use for .def file instead of trace.log. Returns: 0 for success, non-zero for failure

Optimizing Program Load Time

To optimize the program for fast loading, rather than fast running, gather the profile data only as far as the program is loaded. Then, call trace_term() to terminate the data gathering and write the output files.

Notes

  • The library does not have profiling instrumentation in it, so the library functions are not grouped by the linker, nor is any profiling information collected for library functions. The extra code linked in to do the profiling makes use of the library functions, so if the library functions are instrumented, some infinite loops can occur. So don't recompile the library with -gt.
  • Functions declared with __declspec(naked) never get profiling instrumentation added to them.
  • The profile data gathering code will get confused if non-local jumps occur when things like setjmp/longjmp, Windows NT structured exceptions, or C++ exceptions are used. If this profiling tool is well used by customers, we'll address these shortcomings.
  • The timing is based on the Win32 performance timing API, which is accurate to about one microsecond. Unfortunately, any paging or any other tasks running can erratically change the timings. There is no real way around this besides just running the tests several times and averaging the results.
  • The overhead of the instrumentation code can distort the measurements when the function times are very short. The profiler tries to compensate for this, but the overhead times are not always consistent.
  • Profiling of multithreaded apps is not supported.
  • Multiple EXEs and DLLs can be profiled simultaneously, but the log files and def files must all be set to different names, or else they will step on each other.