Reports processor usage.
The tprof command reports processor usage for individual programs and the system as a whole. This command is a useful tool for anyone with a Java, C, C++, or FORTRAN program that might be processor-bound and who wants to know which sections of the program are most heavily using the processor.
The tprof command can charge processor time to object files, processes, threads, subroutines (user mode, kernel mode and shared library) and even to source lines of programs or individual instructions. Charging processor time to subroutines is called profiling and charging processor time to source program lines is called micro-profiling.
For subroutine-level profiling, the tprof command can be run without modifying executable programs; that is, no recompilation with special compiler flags is necessary. This is still true if the executables have been stripped, unless the traceback tables have also been removed. However, recompilation is required to get a micro-profile, unless a listing file is already available. To perform micro-profiling on a program, do one of the following: compile the program with the -g flag and make the source files accessible to the tprof command, or compile the program with the -qlist flag and make either both the object listing files and the source files, or just the object listing files, accessible to the tprof command. To take full advantage of tprof micro-profiling capabilities, it is best to provide both the .lst and the source file.
All of the input and report files used by the tprof command are named rootstring.suffix, where rootstring is either specified with the -r flag, or is the program name specified with the -x flag.
In realtime mode and automated offline mode, the ulimit value of the data area for the program that is being profiled is set to unlimited.
In automated offline mode, you can specify the -N flag to collect source line information into the generated RootString.syms file. You can also specify the -I flag to collect binary instructions into the generated RootString.syms file.
The tprof command can re-process these files any time to generate profiling reports. This is called manual offline mode. The rootstring.syms file contains symbolic name information similar to the output of the gensyms command. The rootstring.trc[-cpuid] files are trace log files. The -cpuid is added to the names when per-processor tracing is on. In that case, each file contains trace data from one processor only.
The cooked files, rootstring.csyms and rootstring.ctrc[-cpuid], are generated when the tprof command runs (in any mode except post-processing mode) with the -c flag.
To generate the rootstring.syms and rootstring.trc[-cpuid] files, you need to manually run the gensyms command and the AIX trace facility, or run the tprof command in automated offline mode without the -c flag.
The tprof command always looks first for the rootstring.csyms and rootstring.ctrc[-cpuid] files. Only if these files are not available does it look for the rootstring.syms and rootstring.trc[-cpuid] files. To prevent the tprof command from looking for the rootstring.csyms and rootstring.ctrc[-cpuid] files, that is, to force the manual offline mode, use the -F flag.
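
The lookup order can be summarized in a short sketch. The Python function below is illustrative only and is not part of tprof: the name pick_input_files and the existing set (standing in for the file system) are assumptions, the pair is treated as available only when both files exist, and per-processor -cpuid suffixes are ignored for brevity.

```python
def pick_input_files(rootstring, existing, force_manual=False):
    """Return the (symbols, trace) pair that would be read, or None.

    existing     -- set of file names present (stands in for the file system)
    force_manual -- True models the -F flag: the cooked files are skipped
    """
    cooked = (rootstring + ".csyms", rootstring + ".ctrc")
    raw = (rootstring + ".syms", rootstring + ".trc")
    # Cooked files are tried first unless manual offline mode is forced.
    candidates = [raw] if force_manual else [cooked, raw]
    for pair in candidates:
        # Assumption: a pair counts as available only if both files exist.
        if all(name in existing for name in pair):
            return pair
    return None
```

For example, with both cooked and uncooked files present, the sketch returns the cooked pair by default and the .syms/.trc pair when force_manual is set.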
If the input symbols file contains demangled names, you cannot use the -Z flag.
If you specify the -p, -P, and -t flags, the process and thread level profile sections are created for processes and threads. The subsections present within each of the per-process or per-thread sections are identical to the subsections present in the global section; they are selected using the profiling flags (-u, -s, -k, -e, -j).
Optionally, if you run the tprof command with the -C flag, the command also generates per-processor profiling reports, which contain one profiling report per processor. The generated tprof reports have the same structure and are named using the convention: rootstring.prof[-cpuid].
If a source file is not present, but a .lst file is present, tprof only shows the processor usage based on the source lines and the instructions from the .lst file.
If you specify the -m flag, the -N flag is automatically specified to gather the source line info into a symbols file in automated offline mode.
If you specify the -Z flag with the -m flag, one report file is generated per subroutine. The following naming convention is used: RootString.source.routine.mprof, where routine is the name of one of the subroutines listed in the source file. In addition, a file named RootString.source.HOT_LINES.mprof containing the hot line profiling information described above is also created.
If you specify the -L flag, the tprof command generates annotated listing files. The files use the following naming convention: RootString.source.alst, where source is the base name of a source file. If more than one source file has the same base name, a number to uniquely identify them is appended to the report file name. For example, RootString.Filename.c.alst-1. If you specify the -Z flag with the -L flag, one report file is generated per subroutine. The following naming convention is then used: RootString.source.routine.alst, where routine is the name of one of the subroutines listed in the source file.
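
The naming conventions for micro-profile and annotated listing files can be sketched as a small helper. This Python function is a hypothetical illustration, not a tprof interface; the name report_name and its parameters are assumptions. The numeric suffix for duplicate base names is described above only for .alst files, so applying it to .mprof files is also an assumption.

```python
def report_name(rootstring, source, kind, routine=None, duplicate=None):
    """Build a micro-profile (.mprof) or annotated listing (.alst) file name.

    kind      -- "mprof" or "alst"
    routine   -- subroutine name when -Z splits the report per subroutine
    duplicate -- number appended when several sources share a base name
    """
    if kind not in ("mprof", "alst"):
        raise ValueError("kind must be 'mprof' or 'alst'")
    parts = [rootstring, source]
    if routine is not None:
        parts.append(routine)
    name = ".".join(parts) + "." + kind
    if duplicate is not None:
        # Disambiguating suffix, as in RootString.Filename.c.alst-1
        name += "-" + str(duplicate)
    return name
```

With rootstring tcalc and source tcalc.c, the sketch yields tcalc.tcalc.c.mprof and tcalc.tcalc.c.alst.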
Time-Based versus Event-Based Profiling
By default, tprof is time-based and is driven by the decrementer interrupt. Another mode of profiling is event-based profiling, in which the interrupt is driven by either software-based events or by Performance Monitor events. With event-based profiling, both the sampling frequency and the profiling event can be varied on the command line.
The -E flag enables event-based profiling. Its argument is one of the four software-based events (EMULATION, ALIGNMENT, ISLBMISS, DSLBMISS) or a Performance Monitor event (PM_*). By default, the profiling event is processor cycles. All Performance Monitor events are prefixed with PM_, such as PM_CYC for processor cycles or PM_INST_CMPL for instructions completed. The pmlist command lists all Performance Monitor events that are supported on a processor. The chosen Performance Monitor event must belong to a group that also contains the PM_INST_CMPL Performance Monitor event. On POWER4 and later processors, profiling on marked events results in better accuracy. Marked events have the PM_MRK_ prefix.
If you specify the -y flag, only the specified program and its descendants are profiled. Use the -y flag only with the -E or -a flag.
The -f flag varies the sampling frequency for event-based profiling. For software-based events and processor cycles, supported frequencies range from 1 to 500 milliseconds, with a default of 10 milliseconds. For all other Performance Monitor events, the range is from 10000 to MAXINT occurrences of the event, with a default of 10000 events. If you specify the -f flag with the -y flag, the sampling frequency for other Performance Monitor events can range from 1 to MAXINT occurrences of the event, with a default of 10000 events.
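
The ranges above can be written down as a small check. The following Python sketch is illustrative only: the function names are hypothetical, and the value used for MAXINT (the 32-bit signed maximum) is an assumption, because the text does not state it.

```python
MAXINT = 2**31 - 1  # assumption: the text only says MAXINT

SOFTWARE_EVENTS = {"EMULATION", "ALIGNMENT", "ISLBMISS", "DSLBMISS"}

def frequency_range(event="PM_CYC", with_y=False):
    """Return (low, high, default, unit) for the -f argument."""
    if event in SOFTWARE_EVENTS or event == "PM_CYC":
        # Software-based events and processor cycles are timed in milliseconds.
        return (1, 500, 10, "milliseconds")
    # Other Performance Monitor events are counted in occurrences;
    # the lower bound drops to 1 when -y is also specified.
    low = 1 if with_y else 10000
    return (low, MAXINT, 10000, "events")

def check_frequency(value, event="PM_CYC", with_y=False):
    """Return True if value is an acceptable -f argument for the event."""
    low, high, _default, _unit = frequency_range(event, with_y)
    return low <= value <= high
```

Under these rules, -f 100 is accepted for processor cycles and -f 20000 is accepted for PM_INST_CMPL, while -f 500 for PM_INST_CMPL is accepted only together with -y.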
Additional information is added to the .prof file to reflect the processor name, profiling event, and sampling frequency.
Java Applications Profiling
To profile Java applications, you must specify the -j flag and start the applications with the -Xrunjpa option (for Java 5 and earlier JVMs) or the -agentlib:jpa option (for Java 6 JVMs) on the java command line. When you specify this option, the JVM automatically calls the jpa library whenever new classes and methods are loaded into memory. The library in turn collects address-to-name mapping information for methods and classes in files named /tmp/JavaPID.syms, where PID is the process ID of a process running a Java Virtual Machine. The tprof command automatically looks in that directory for such files.
When running in automated offline mode, or selecting the cooking flags, the tprof command will copy the information contained in JavaPID.syms files into the RootString.syms or RootString.csyms file. The corresponding files in /tmp can then be deleted. The directory content should be kept up to date by tprof command users. Whenever the JVM corresponding to a particular JavaPID.syms is stopped, the file should be deleted.
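
One way to keep the directory up to date is to list the JavaPID.syms files whose JVM is no longer running. The Python sketch below is a hypothetical helper, not part of tprof; the function name and the idea of passing in the set of live process IDs are assumptions. It only reports candidates and deletes nothing.

```python
import re

# Matches names of the form JavaPID.syms and captures the PID.
JAVA_SYMS = re.compile(r"^Java(\d+)\.syms$")

def stale_java_syms(file_names, live_pids):
    """Return the JavaPID.syms names whose PID is not in live_pids."""
    stale = []
    for name in file_names:
        match = JAVA_SYMS.match(name)
        if match and int(match.group(1)) not in live_pids:
            stale.append(name)
    return stale
```

In practice, file_names could come from a listing of /tmp and live_pids from the process table; how those are obtained is left to the reader.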
Profile Accuracy
The degree to which processor activity can be resolved is determined by the number of samples captured and the degree to which hot spots dominate. While a program with a few hot spots can be profiled with relatively few samples, less-frequently executed sections of the program are not visible in the profiling reports unless more samples are captured. In cases where user programs run less than a minute, there may be insufficient resolution to have a high degree of confidence in the estimates.
A simple solution is to repeatedly execute the user program or script until you achieve the degree of resolution you need. The longer a program is run, the finer the degree of resolution of the profile. If you doubt the accuracy of a profile, run the tprof command several times and compare the resulting profiles.
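
As a rough illustration of why short runs give coarse profiles, the following Python sketch estimates how many samples a run yields and how uncertain the share of a routine is. It is not part of tprof: the function names, the 10-millisecond sampling interval, and the binomial error model are assumptions made for the example.

```python
import math

def expected_samples(run_seconds, interval_ms=10.0, cpus=1):
    """Approximate number of samples for a run (assumed fixed interval)."""
    return int(run_seconds * 1000.0 / interval_ms) * cpus

def relative_error(fraction, samples):
    """Approximate relative standard error of a routine's share.

    fraction -- true share of samples landing in the routine (0 < fraction <= 1)
    samples  -- total number of samples captured
    Assumption: samples are independent (binomial model).
    """
    if samples <= 0 or not 0 < fraction <= 1:
        raise ValueError("need samples > 0 and 0 < fraction <= 1")
    return math.sqrt((1.0 - fraction) / (fraction * samples))
```

Under these assumptions, a one-minute run on one processor yields about 6000 samples, so a routine that accounts for 1% of the time is estimated with a relative error of roughly 13%, while a routine that accounts for 50% is estimated with a relative error of about 1.3%.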
XML Report Generating
The -X flag is used in automated offline mode to generate an XML report directly.
The -X flag is also used in manual offline mode to generate an XML report from the RootString.syms and RootString.trc files.
If -X timedata is specified, the generated XML report includes the time data. By default, time data generation is turned off.
To specify the bucket number for the time data, use the buckets=N argument. The default bucket number is 1800.
Large Page Analysis
Data Profiling
The tprof -b command turns on basic data profiling and collects data access information. The summary section reports access information across the kernel data, library data, user global data, and the stack heap sections for each process.
If you specify the -b flag with the -s, -u, -k, and -e flags, tprof data profiling reports the most used data structures (exported data symbols) in shared libraries, binaries, the kernel, and kernel extensions. The -b flag also reports the functions that use those data structures.
Comparison of tprof Versus prof and gprof
The most significant difference among these three commands is that tprof collects data with no impact on the execution time of the programs being profiled, and works on optimized and stripped binaries without any need for recompilation, except to generate micro-profiling reports. Neither gprof nor prof has micro-profiling capabilities or works on optimized binaries; both require special compilation flags and induce a slowdown in execution time that can be significant. prof does not work on stripped binaries.
The prof and gprof tools are standard, supported profiling tools on many UNIX systems, including this operating system. Both prof and gprof provide subprogram profiling and exact counts of the number of times every subprogram is called. The gprof command also provides a very useful call graph showing the number of times each subprogram was called by a specific parent and the number of times each subprogram called a child. The tprof command provides neither subprogram call counts nor call graph information.
Like the tprof command, both the prof and gprof commands obtain their processor consumption estimates for each subprogram by sampling the program counter of the user program.
tprof collects processor usage information for the whole system, while prof and gprof collect profiling information only for a single program and only for the time spent in user mode. tprof also provides a summary for all processes active during the execution of the profiled user program and fully supports library and kernel mode profiling.
tprof supports the profiling of Java applications, which prof and gprof do not.
Item | Description |
---|---|
-@ { ALL \| wparlist } | Includes the WPAR information in the generated reports. The ALL option includes summaries for all of the WPARs. When this option is set, the report contains a 'SYSTEM' report and a report per WPAR traced. The wparlist option specifies a comma-separated list of WPARs. When the wparlist option is set, the tprof command produces a report for each WPAR specified. |
-a | Turns on the large page analysis. |
-A { all \| cpulist } | Turns on automated offline mode. No argument turns off per-processor tracing. all enables tracing of all processors. cpulist is a comma-separated list of processor IDs to be traced. |
-b | Turns on basic data profiling. |
-B | Turns on basic data profiling with the information about the instruction address mapped function. |
-c | Turns on generation of cooked files. |
-C all \| cpulist | Turns on the per-processor profiling. Specify all to generate profile reports for all processors. Processor numbers should be separated with a comma if you give a cpulist (for example, 0,1,2). Note: per-processor profiling is possible only if per-processor trace is either on (in automated offline mode), or has been used (in manual offline mode). It is not possible at all in online mode. This option is not supported if the number of CPUs traced is greater than 128. |
-d | Turns on deferred tracing mode, that is, defers data collection until trcon is called. |
-D | Turns on detailed profiling which displays processor usage by instruction offset under each subroutine. |
-e | Turns on kernel extension profiling. |
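-E [ mode ] | Enables event-based profiling. The mode is one of the software-based events (EMULATION, ALIGNMENT, ISLBMISS, DSLBMISS) or a Performance Monitor event (PM_*). By default, the profiling event is processor cycles. |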
-f frequency | Specifies the sampling frequency. The sampling frequency can be from 1 to 500 milliseconds for processor cycles and EMULATION, ALIGNMENT, ISLBMISS, and DSLBMISS events, and from 10000 to MAXINT event occurrences for other Performance Monitor events. If you specify the -f flag with the -y flag, the sampling frequency for other Performance Monitor events ranges from 1 to MAXINT occurrences, with a default value of 10000 events. |
-F | Overwrites cooked files if they exist. If used without the -x flag, this forces the manual offline mode. |
-g | Does not translate symbol names into human-readable names. |
-I | Turns on the collection of binary instructions. Note: The -I flag causes binary instructions to be gathered when symbol files or cooked symbol files are generated in automated offline mode. However, in manual offline mode, the -I flag does not affect the report files. |
-j | Turns on Java classes and methods profiling. |
-k | Enables kernel profiling. |
-l | Enables long names reporting. By default tprof truncates the subroutine, program and source file names if they do not fit into the available space in the profiling report. This flag disables truncation. |
-L objectlist | Enables listing annotation for objects specified by the comma-separated list, objectlist. Executables and shared libraries can have their listing files annotated. Specify the archive name for libraries. |
-m objectlist | Enables micro-profiling of objects specified by the comma-separated list, objectlist. Executables, shared libraries, and kernel extensions can be micro-profiled. Specify the archive name for libraries and kernel extensions. |
-M PathList | Specifies the source path list. The PathList is a colon-separated list of paths that are searched for source files and .lst files that are required for micro-profiling and listing annotation. By default, the source path list is the object search path list. |
-n | Turns off postprocessing. If the -n flag is specified, the -u, -s, -k, -e, and -j flags are ignored. The data is collected, the .trc file and the gensyms files are generated, but the .prof file is not generated. This helps avoid overloading the system during a benchmark, for example. The -A flag must be used if the -n option is used. |
-N | Turns on the collection of source line number information. The -N flag causes source line information to be gathered when symbol files or cooked symbol files are generated in automated offline mode. However, in manual offline mode, the -N flag does not affect the report files. |
-p processlist | Enables process level profiling of the process names specified in the processlist. processlist is a comma-separated list of process names. Process level profiling is enabled only if at least one of the profiling modes (-u, -s, -k, -e, or -j) is turned on. |
-P { all \| PIDList } | Enables process level profiling of all processes encountered or for processes specified with PIDList. The PIDList is a comma-separated list of process IDs. Process level profiling is enabled only if at least one of the profiling modes (-u, -s, -k, -e, or -j) is turned on. |
-r rootstring | Specifies the rootstring. The tprof input and report files all have names in the form of rootstring.suffix. If you do not specify the -r flag, the rootstring defaults to the program name that the -x flag specifies. |
-R | Specifies that the tprof command should use samples weighted by PURR increment values to calculate percentages. This is the preferred mode when running in either simultaneous multithreading or Micro-Partitioning® environments. The -R flag cannot be used with either the -z flag or the -Z flag. |
-s | Enables shared library profiling. |
-S PathList | Specifies the object search PathList. The PathList is a colon-separated list of paths that are searched for executables, shared libraries, and kernel extensions. The default object search PathList is the environment path list ($PATH). |
-t | Enables thread level profiling. If -p or -P are not specified with the -t flag, -t is equivalent to -P all -t. Otherwise, it enables thread level reporting for the selected processes. Thread level profiling is enabled only if at least one of the profiling modes (-u,-s,-k,-e, -j) is enabled. |
-T buffersize | Specifies the trace buffersize. This flag has meaning only in real time or automated offline modes. |
-u | Enables user mode profiling. |
-v | Enables verbose mode. |
-V File | Stores the verbose output in the specified File. |
-x program | Specifies the program to be executed by tprof. Data collection stops when program completes or trace is manually stopped with either trcoff or trcstop. The -x flag must be the last flag in the list of flags specified in tprof. |
-X | Specifies that the tprof command calls the XML Generator when the tprof profiling is finished, and generates the XML report directly from the tprof trace and symlib data. The -X option needs Java. Install Java first, and make sure that Java is in PATH. |
-y | Turns on the event-based profiling for only the specified command and its descendants. |
-z | Turns on ticks report. Enables compatibility mode with the previous version of tprof. By default, processor usage is reported only in percentages. When -z is used, tprof also reports ticks. This flag also adds the Address and Bytes columns in subroutine reports. If you specify the -z flag with the -a flag, the process summary section in the report displays numbers rather than percentages. |
-Z | Switches reports to use ticks instead of percentages (same as the -z flag), and splits annotated listing (when used with the -L flag) and annotated source files (when used with the -m flag) into multiple files, one per subroutine. This option turns on the -g flag. |
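
Several flag descriptions in this reference state constraints between flags. The Python sketch below collects a few of them into one check. It is a hypothetical helper, not a tprof interface: the function name is an assumption, flags are passed as a set of single letters, and only the rules quoted in the comments are covered.

```python
def flag_conflicts(flags):
    """Return a list of rule violations for a set of single-letter flags."""
    flags = set(flags)
    problems = []
    # -R cannot be used with either the -z flag or the -Z flag.
    if "R" in flags and flags & {"z", "Z"}:
        problems.append("-R cannot be combined with -z or -Z")
    # The -A flag must be used if the -n option is used.
    if "n" in flags and "A" not in flags:
        problems.append("-n requires -A")
    # Use the -y flag only with the -E or -a flag.
    if "y" in flags and not flags & {"E", "a"}:
        problems.append("-y requires -E or -a")
    # Process and thread level profiling need a profiling mode.
    if flags & {"p", "P", "t"} and not flags & {"u", "s", "k", "e", "j"}:
        problems.append("-p, -P, and -t need one of -u, -s, -k, -e, -j")
    return problems
```

An empty list means that none of the listed rules is violated; it does not mean that the command line is otherwise valid.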
$ tprof -x sleep 10
An output that is similar to the following is displayed:
Mon May 20 00:39:26 2002
System: AIX 5.2 Node: dreaming Machine: 000671894C00
Starting Command sleep 10
stopping trace collection.
Generating sleep.prof
The sleep.prof file that is generated only contains the summary report section.
$ tprof -skeuj -x sleep 10
An output that is similar to the following is displayed:
Mon May 20 00:39:26 2002
System: AIX 5.2 Node: dreaming Machine: 000671894C00
Starting Command sleep 10
stopping trace collection.
Generating sleep.prof
The sleep.prof file that is generated contains the summary report and global profile sections.
$ tprof -u -p workload -x workload
An output that is similar to the following is displayed:
Mon May 20 00:39:26 2002
System: AIX 5.2 Node: dreaming Machine: 000671894C00
Starting Command workload
stopping trace collection.
Generating workload.prof
The workload.prof file that is generated contains the summary report, the global user mode profile sections, and one process level profile section for the process 'workload' that contains only a user mode profile subsection.
$ tprof -se -p send,receive -x startall
An output that is similar to the following is displayed:
Mon May 20 00:39:26 2002
System: AIX 5.2 Node: dreaming Machine: 000671894C00
Starting Command startall
stopping trace collection.
Generating startall.prof
The startall.prof file that is generated contains the summary report, the global shared library mode profile, the global kernel extension profile sections, and two process level profile sections: one for the process 'send', and one for the process 'receive'. The process level sections each contain two subsections: one with shared library profiling information and one with kernel extensions profiling information.
$ tprof -m ./tcalc -L ./tcalc -u -x ./tcalc
An output that is similar to the following is displayed:
Mon May 20 00:47:09 2002
System: AIX 5.2 Node: dreaming Machine: 000671894C00
Starting Command ./tcalc
stopping trace collection.
Generating tcalc.prof
Generating tcalc.tcalc.c.mprof
Generating tcalc.tcalc.c.alst
The tcalc.prof file that is generated contains the summary report and the global user mode profile sections. The resulting tcalc.tcalc.c.mprof and tcalc.tcalc.c.alst files contain the micro-profiling report and the annotated listing.
$ tprof -E -f 100 -Askex sleep 10
The output is similar to the following display:
Starting Command sleep 10
stopping trace collection.
Tue Apr 26 14:44:02 2005
System: AIX 5.3 Node: bigdomino Machine: 00C0046A4C00
Generating sleep.trc
Generating sleep.prof
Generating sleep.syms
$ tprof -E PM_INST_CMPL -f 20000 -Askex sleep 10
The output is similar to the following display:
Starting Command sleep 10
stopping trace collection.
Tue Apr 26 14:42:44 2005
System: AIX 5.3 Node: bigdomino Machine: 00C0046A4C00
Generating sleep.trc
Generating sleep.prof
Generating sleep.syms
$ tprof -E EMULATION -Askex sleep 10
The output is similar to the following display:
Starting Command sleep 10
stopping trace collection.
Tue Apr 26 14:41:44 2005
System: AIX 5.3 Node: bigdomino Machine: 00C0046A4C00
Generating sleep.trc
Generating sleep.prof
Generating sleep.syms
$ tprof -c -A all -x sleep 10
The output is similar to the following display:
Starting Command sleep 10
stopping trace collection.
Mon May 20 00:52:52 2002
System: AIX 5.2 Node: dreaming Machine: 000671894C00
Generating sleep.ctrc
Generating sleep.csyms
Generating sleep.prof
The sleep.prof file that is generated only has a summary report section, while the two cooked files are ready to be re-postprocessed.
$ tprof -A -N -x sleep 10
The output is similar to the following display:
Starting Command sleep 10
stopping trace collection.
Wed Feb 8 15:12:41 2006
System: AIX 5.3 Node: aixperformance Machine: 000F9F3D4C00
Generating sleep.trc
Generating sleep.prof
Generating sleep.syms
The sleep.prof file that is generated only contains the summary report section, while sleep.syms contains the source line information.
$ tprof -A -N -I -r RootString -x sleep 10
The output is similar to the following display:
Starting Command sleep 10
stopping trace collection.
Wed Feb 8 15:16:37 2006
System: AIX 5.3 Node: aixperformance Machine: 000F9F3D4C00
Generating RootString.trc
Generating RootString.prof
Generating RootString.syms
The RootString.prof file is generated. The RootString.syms file contains the source line information and binary instructions.
$ tprof -N -I -x java -Xrunjpa:source=1,instructions=1 Hello AIX
The output is similar to the following display:
Thu Feb 9 13:30:38 2006
System: AIX 5.3 Node: perftdev Machine: 00CEBB4A4C00
Starting Command java -Xrunjpa:source=1,instructions=1 Hello AIX
Hello AIX!
stopping trace collection.
Generating java.prof
The java.prof file is generated. It contains the JIT source line information and the JIT instructions.
$ tprof -A -n -s -t -r test -x vloop_lib_32 5
The output is similar to the following display:
Starting Command vloop_lib_32 5
stopping trace collection.
Generating test.trc
Generating test.syms
$ tprof -A -X -x sleep 10
Starting Command sleep 10
stopping trace collection.
Tue Apr 17 22:00:24 2007
System: AIX 5.3 Node: test105 Machine: 00CEBB4A4C00
Generating sleep.trc
Generating sleep.syms
Calling tprof2xml to generate XML report.
tprof2xml TraceReader Version 1.2.0
Tue Apr 17 22:00:24 2007
System: AIX 6.1 Node: test105 Machine: 00CEBB4A4C00
------------------0------------------
Record 0
Post-processing counters
Retrieving Disassembly
writing the XML
Writing symbol list
.
Writing process hierarchy
Finished writing sleep.etm
$ tprof -A -N -I -X -x sleep 10
Starting Command sleep 10
stopping trace collection.
Tue Apr 17 22:00:24 2007
System: AIX 5.3 Node: test105 Machine: 00CEBB4A4C00
Generating sleep.trc
Generating sleep.syms
Calling tprof2xml to generate XML report.
tprof2xml TraceReader Version 1.2.0
Tue Apr 17 22:00:24 2007
System: AIX 6.1 Node: test105 Machine: 00CEBB4A4C00
------------------0------------------
Record 0
Post-processing counters
Retrieving Disassembly
writing the XML
Writing symbol list
.
Writing process hierarchy
Finished writing sleep.etm
The symbol data elements in the XML report will have both bytes and LineNumberList child elements.
$ tprof -A -X timedata,buckets=100 -r RootString -x sleep 10
Starting Command sleep 10
stopping trace collection.
Tue Apr 17 22:18:06 2007
System: AIX 5.3 Node: test105 Machine: 00CEBB4A4C00
Generating RootString.trc
Generating RootString.syms
Calling tprof2xml to generate XML report.
tprof2xml TraceReader Version 1.2.0
Tue Apr 17 22:18:06 2007
System: AIX 5.3 Node: test105 Machine: 00CEBB4A4C00
Tue Apr 17 22:18:06 2007
System: AIX 5.3 Node: test105 Machine: 00CEBB4A4C00
------------------0------------------
Record 0
Post-processing counters
Retrieving Disassembly
writing the XML
Writing symbol list
.
Writing process hierarchy
Finished writing RootString.etm
The RootString.etm file will have bucket elements in each object of the profile hierarchy.
$ tprof -A -x sleep 10
Starting Command sleep 10
stopping trace collection.
Tue Apr 17 22:28:01 2007
System: AIX 5.3 Node: test105 Machine: 00CEBB4A4C00
Generating sleep.trc
Generating sleep.prof
Generating sleep.syms
$ tprof -X -r sleep
Calling tprof2xml to generate XML report.
tprof2xml TraceReader Version 1.2.0
Tue Apr 17 22:28:01 2007
System: AIX 6.1 Node: test105 Machine: 00CEBB4A4C00
------------------0------------------
Record 0
Post-processing counters
Retrieving Disassembly
writing the XML
Writing symbol list
.
Writing process hierarchy
Finished writing sleep.etm
$ tprof -a -y workload
Starting Command workload
stopping trace collection.
Tue Apr 26 14:42:44 2005
System: AIX 5.3 Node: bigdomino Machine: 00C0046A4C00
Generating workload.trc
Generating workload.prof
Generating workload.syms
$ tprof -E PM_MRK_LSU_FIN -f 20000 -Aske -y workload
Starting Command workload
stopping trace collection.
Tue Apr 26 16:42:44 2005
System: AIX 5.3 Node: bigdomino Machine: 00C0046A4C00
Generating workload.trc
Generating workload.prof
Generating workload.syms
$ tprof -N -I -x java -agentlib:jpa=source=1,instructions=1 Hello AIX
$ tprof -N -I -x java -agentlib:jpa64=source=1,instructions=1 Hello AIX
Fri May 30 04:16:27 2008
System: AIX 6.1 Node: toolbox2 Machine: 00CBA6FE4C00
Starting Command java -agentlib:jpa=source=1,instructions=1 Hello AIX
Hello AIX!
stopping trace collection.
Generating java.prof
The java.prof file is generated. It contains the JIT source line information and JIT instructions.
If your system displays one of the following messages:
/dev/systrace: device busy
trcon: TRCON:no such device
the trace facility is already in use. Stop your program, type trcstop to stop the trace, and then try again.