nvprof --help

Usage: nvprof [options] [application] [application-arguments]

Options:

--aggregate-mode <on|off>

Turn on/off aggregate mode for events and metrics specified

by subsequent "--events" and "--metrics" options. Those

event/metric values will be collected for each domain instance,

instead of the whole device. Allowed values:

on - turn on aggregate mode (default)

off - turn off aggregate mode



--analysis-metrics

Collect profiling data that can be imported to Visual Profiler's

"analysis" mode. Note: Use "--export-profile" to specify

an export file.



--concurrent-kernels <on|off>

Turn on/off concurrent kernel execution. If concurrent kernel

execution is off, all kernels running on one device will

be serialized. Allowed values:

on - turn on concurrent kernel execution (default)

off - turn off concurrent kernel execution



--continuous-sampling-interval <interval>

Set the continuous mode sampling interval in milliseconds.

Minimum is 1 ms. Default is 2 ms.



--dependency-analysis

Generate event dependency graph for host and device activities

and run dependency analysis.



--device-buffer-size <size in MBs>

Set the device memory size (in MBs) reserved for storing

profiling data for non-CDP operations, especially for concurrent

kernel tracing, for each buffer on a context. The default

value is 8MB. The size should be a positive integer.



--device-cdp-buffer-size <size in MBs>

Set the device memory size (in MBs) reserved for storing

profiling data for CDP operations for each buffer on a context.

The default value is 8MB. The size should be a positive

integer.



--devices <device ids>

Change the scope of subsequent "--events", "--metrics", "--query-events"

and "--query-metrics" options.

Allowed values:

all - change scope to all valid devices

comma-separated device IDs - change scope to specified

devices



--event-collection-mode <mode>

Choose event collection mode for all events/metrics. Allowed

values:

kernel - events/metrics are collected only for durations

of kernel executions (default)

continuous - events/metrics are collected for duration

of application. This is not applicable for non-Tesla devices.

This mode is compatible only with NVLink events/metrics.

This mode is incompatible with "--profile-all-processes"

or "--profile-child-processes" or "--replay-mode kernel"

or "--replay-mode application".



-e, --events <event names>

Specify the events to be profiled on certain device(s). Multiple

event names separated by comma can be specified. Which device(s)

are profiled is controlled by the "--devices" option. Otherwise

events will be collected on all devices.

For a list of available events, use "--query-events".

Use "--events all" to profile all events available for each

device.

Use "--devices" and "--kernels" to select a specific kernel

invocation.



--kernel-latency-timestamps <on|off>

Turn on/off collection of kernel latency timestamps, namely

queued and submitted. The queued timestamp is captured when

a kernel launch command was queued into the CPU command

buffer. The submitted timestamp denotes when the CPU command

buffer containing this kernel launch was submitted to the

GPU. Turning this option on may incur an overhead during

profiling. Allowed values:

on - turn on collection of kernel latency timestamps

off - turn off collection of kernel latency timestamps

(default)



--kernels <kernel path syntax>

Change the scope of subsequent "--events", "--metrics" options.

The syntax is as follows:

<kernel name>

Limit scope to given kernel name.

or

<context id/name>:<stream id/name>:<kernel name>:<invocation>

The context/stream IDs, names, kernel name and invocation

can be regular expressions. Empty string matches any number

or characters. If <context id/name> or <stream id/name>

is a positive number, it's strictly matched against the

CUDA context/stream ID. Otherwise it's treated as a regular

expression and matched against the context/stream name specified

by the NVTX library. If the invocation count is a positive

number, it's strictly matched against the invocation of

the kernel. Otherwise it's treated as a regular expression.

Example: --kernels "1:foo:bar:2" will profile any kernel

whose name contains "bar" and is the 2nd instance on context

1 and on stream named "foo".
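
The matching rules above can be sketched in Python. This is only an illustration of the documented syntax, not nvprof's actual implementation: the names `KernelScope`, `parse_kernel_scope` and `scope_matches` are hypothetical, the sketch assumes no field contains a literal ":", and it assumes a non-numeric invocation pattern is matched against the invocation number written as a string.

```python
import re
from typing import NamedTuple


class KernelScope(NamedTuple):
    """The four fields of a --kernels specification (hypothetical helper type)."""
    context: str
    stream: str
    kernel: str
    invocation: str


def parse_kernel_scope(spec: str) -> KernelScope:
    """Split either '<kernel name>' or '<context>:<stream>:<kernel>:<invocation>'."""
    parts = spec.split(":")
    if len(parts) == 1:
        # Short form: only the kernel name is constrained.
        return KernelScope("", "", spec, "")
    if len(parts) == 4:
        return KernelScope(*parts)
    raise ValueError(f"expected 1 or 4 colon-separated fields, got {len(parts)}")


def _field_matches(pattern: str, ident: int, name: str) -> bool:
    if pattern == "":
        return True  # an empty field matches anything
    if re.fullmatch(r"[0-9]+", pattern) and int(pattern) > 0:
        return int(pattern) == ident  # positive number: strict ID match
    return re.search(pattern, name) is not None  # otherwise: regular expression


def scope_matches(scope: KernelScope, *, context_id: int, context_name: str,
                  stream_id: int, stream_name: str,
                  kernel_name: str, invocation: int) -> bool:
    """Return True if one kernel launch falls inside the given scope."""
    kernel_ok = scope.kernel == "" or re.search(scope.kernel, kernel_name) is not None
    return (_field_matches(scope.context, context_id, context_name)
            and _field_matches(scope.stream, stream_id, stream_name)
            and kernel_ok
            and _field_matches(scope.invocation, invocation, str(invocation)))


scope = parse_kernel_scope("1:foo:bar:2")
print(scope)
```

With the example from the help text, a launch of a kernel named "foobar" as the 2nd invocation on context 1 and a stream named "foo" falls inside the scope, while the 3rd invocation of the same kernel does not.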



-m, --metrics <metric names>

Specify the metrics to be profiled on certain device(s).

Multiple metric names separated by comma can be specified.

Which device(s) are profiled is controlled by the "--devices"

option. Otherwise metrics will be collected on all devices.

For a list of available metrics, use "--query-metrics".

Use "--metrics all" to profile all metrics available for

each device.

Use "--devices" and "--kernels" to select a specific kernel

invocation.

Note: "--metrics all" does not include some metrics which

are needed for Visual Profiler's source level analysis.

For that, use "--analysis-metrics".
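
Because "--devices" and "--kernels" change the scope of subsequent options, their position on the command line matters. The following Python sketch assembles an argument list with the scope options placed before "--metrics". The helper name `build_metric_command`, the application path `./my_app` and the chosen metric names are assumptions made for illustration; run "--query-metrics" to see which metrics your device really offers.

```python
import shlex
from typing import List, Optional, Sequence


def build_metric_command(app: Sequence[str], metrics: Sequence[str],
                         devices: Optional[str] = None,
                         kernels: Optional[str] = None) -> List[str]:
    """Build an nvprof argv list with scope options ahead of --metrics."""
    argv = ["nvprof"]
    if devices is not None:
        argv += ["--devices", devices]   # scope: which devices
    if kernels is not None:
        argv += ["--kernels", kernels]   # scope: which kernel invocations
    # Multiple metric names are passed as one comma-separated value.
    argv += ["--metrics", ",".join(metrics)]
    argv += list(app)                    # application and its arguments go last
    return argv


cmd = build_metric_command(["./my_app"], ["achieved_occupancy", "ipc"],
                           devices="0", kernels="bar")
print(shlex.join(cmd))  # shell-quoted form, suitable for copy and paste
```

The returned list can be passed directly to `subprocess.run` on a machine where nvprof is installed.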



--pc-sampling-period <period>

Specify PC Sampling period in cycles, at which the sampling

records will be dumped. Allowed values for the period are

integers from 5 to 31, inclusive.

This will set the sampling period to (2^period) cycles.

Default value is a number between 5 and 12 based on the setup. Note:

Only available for GM20X+.
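
Since the option value is an exponent rather than a cycle count, a small Python sketch may help when choosing it. The function name `pc_sampling_cycles` is hypothetical; the range check and the power of two follow the description above.

```python
def pc_sampling_cycles(period: int) -> int:
    """Convert a --pc-sampling-period value into the sampling period in cycles."""
    if not 5 <= period <= 31:
        raise ValueError("period must be an integer from 5 to 31, inclusive")
    return 2 ** period


# Print the cycle counts at both ends of the documented default range.
for p in (5, 12):
    print(p, pc_sampling_cycles(p))
```

So "--pc-sampling-period 5" samples every 32 cycles and "--pc-sampling-period 12" every 4096 cycles.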





--profile-all-processes

Profile all processes launched by the same user who launched

this nvprof instance. Note: Only one instance of nvprof

can run with this option at the same time. Under this mode,

there's no need to specify an application to run.



--profile-api-trace <none|runtime|driver|all>

Turn on/off CUDA runtime/driver API tracing. Allowed values:

none - turn off API tracing

runtime - only turn on CUDA runtime API tracing

driver - only turn on CUDA driver API tracing

all - turn on all API tracing (default)



--profile-child-processes

Profile the application and all child processes launched

by it.



--profile-from-start <on|off>

Enable/disable profiling from the start of the application.

If it's disabled, the application can use {cu,cuda}Profiler{Start,Stop}

to turn on/off profiling. Allowed values:

on - enable profiling from start (default)

off - disable profiling from start



--profiling-semaphore-pool-size <count>

Set the profiling semaphore pool size reserved for storing

profiling data for serialized kernels and memory operations

for each context. The default value is 65536. The size should

be a positive integer.



--query-events

List all the events available on the device(s). Device(s)

queried can be controlled by the "--devices" option.



--query-metrics

List all the metrics available on the device(s). Device(s)

queried can be controlled by the "--devices" option.



--replay-mode <mode>

Choose replay mode used when not all events/metrics can be

collected in a single run. Allowed values:

disabled - replay is disabled, events/metrics that couldn't

be profiled will be dropped

kernel - each kernel invocation is replayed (default)

application - the entire application is replayed.

This mode is incompatible with "--profile-all-processes"

or "--profile-child-processes".



-a, --source-level-analysis <source level analysis names>

Specify the source level metrics to be profiled on a certain

kernel invocation. Use "--devices" and "--kernels" to select

a specific kernel invocation. Allowed values: one or more

of the following, separated by commas

global_access: global access

shared_access: shared access

branch: divergent branch

instruction_execution: instruction execution

pc_sampling: pc sampling, available only for GM20X+

Note: Use "--export-profile" to specify an export file.



--system-profiling <on|off>

Turn on/off power, clock, and thermal profiling. Allowed

values:

on - turn on system profiling

off - turn off system profiling (default)



-t, --timeout <seconds>

Set an execution timeout (in seconds) for the CUDA application.

Note: Timeout starts counting from the moment the CUDA driver

is initialized. If the application doesn't call any CUDA

APIs, timeout won't be triggered.



--track-memory-allocations <on|off>

Turn on/off tracking of memory operations, which involves

recording timestamps, memory size, memory type and program

counters of the memory allocations and frees. Turning this

option on may incur an overhead during profiling. Allowed

values:

on - turn on tracking of memory allocations and

free

off - turn off tracking of memory allocations and

free (default)



--unified-memory-profiling <per-process-device|off>

Configure unified memory profiling. Allowed values:

per-process-device - collect counts for each process

and each device (default)

off - turn off unified memory profiling



--cpu-profiling <on|off>

Turn on CPU profiling. Note: CPU profiling is not supported

in multi-process mode.



--cpu-profiling-explain-ccff <filename>

Path to a PGI pgexplain.xml file that should be used to interpret

Common Compiler Feedback Format (CCFF) messages.



--cpu-profiling-frequency <frequency>

Set the CPU profiling frequency in samples per second. Default

is 25Hz. Maximum is 500Hz.



--cpu-profiling-max-depth <depth>

Set the maximum depth of each call stack. Zero means no limit.

Default is zero.



--cpu-profiling-mode <flat|top-down|bottom-up>

Set the output mode of CPU profiling. Allowed values:

flat - Show flat profile

top-down - Show parent functions at the top

bottom-up - Show parent functions at the bottom

(default)



--cpu-profiling-percentage-threshold <threshold>

Filter out the entries that are below the set percentage

threshold. The limit should be an integer between 0 and

100, inclusive. Zero means no limit. Default is zero.



--cpu-profiling-scope <function|instruction>

Choose the profiling scope. Allowed values:

function - Each level in the stack trace represents

a distinct function (default)

instruction - Each level in the stack trace represents

a distinct instruction address



--cpu-profiling-show-ccff <on|off>

Choose whether to print Common Compiler Feedback Format (CCFF)

messages embedded in the binary. Note: this option implies

"--cpu-profiling-scope instruction". Default is off.



--cpu-profiling-show-library <on|off>

Choose whether to print the library name for each sample.



--cpu-profiling-thread-mode <separated|aggregated>

Set the thread mode of CPU profiling. Allowed values:

separated - Show separate profile for each thread

aggregated - Aggregate data from all threads (default)



--cpu-profiling-unwind-stack <on|off>

Choose whether to unwind the CPU call-stack at each sample

point. Default is on.



--openacc-profiling <on|off>

Enable/disable recording information from the OpenACC profiling

interface. Note: whether the OpenACC profiling interface is available

depends on the OpenACC runtime. Default is on.



--context-name <name>

Name of the CUDA context.

"%i" in the context name string is replaced with

the ID of the context.

"%p" in the context name string is replaced with

the process ID of the application being profiled.

"%q{<ENV>}" in the context name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the context name string is replaced with

the hostname of the system.

"%%" in the context name string is replaced with

"%". Any other character following "%" is illegal.



--csv

Use comma-separated values in the output.



--demangling <on|off>

Turn on/off C++ name demangling of function names. Allowed

values:

on - turn on demangling (default)

off - turn off demangling



-u, --normalized-time-unit <s|ms|us|ns|col|auto>

Specify the unit of time that will be used in the output.

Allowed values:

s - second, ms - millisecond, us - microsecond,

ns - nanosecond

col - a fixed unit for each column

auto (default) - the scale is chosen for each value

based on its length.



--openacc-summary-mode <mode>

Set how durations are computed in the OpenACC summary. Allowed

values:

exclusive: show exclusive times (default)

inclusive: show inclusive times



--print-api-summary

Print a summary of CUDA runtime/driver API calls.



--print-api-trace

Print CUDA runtime/driver API trace.



--print-dependency-analysis-trace

Print dependency analysis trace.



--print-gpu-summary

Print a summary of the activities on the GPU (including CUDA

kernels and memcpy's/memset's).



--print-gpu-trace

Print individual kernel invocations (including CUDA memcpy's/memset's)

and sort them in chronological order. In event/metric profiling

mode, show events/metrics for each kernel invocation.



--print-openacc-constructs

Include parent construct names in OpenACC profile.



--print-openacc-summary

Print a summary of the OpenACC profile.



--print-openacc-trace

Print a trace of the OpenACC profile.



-s, --print-summary

Print a summary of the profiling result on screen. Note:

This is the default unless "--export-profile" or other print

options are used.



--print-summary-per-gpu

Print a summary of the profiling result for each GPU.



--process-name <name>

Name of the process.

"%p" in the process name string is replaced with

the process ID of the application being profiled.

"%q{<ENV>}" in the process name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the process name string is replaced with

the hostname of the system.

"%%" in the process name string is replaced with

"%". Any other character following "%" is illegal.



--quiet

Suppress all nvprof output.



--stream-name <name>

Name of the CUDA stream.

"%i" in the stream name string is replaced with the

ID of the stream.

"%p" in the stream name string is replaced with

the process ID of the application being profiled.

"%q{<ENV>}" in the stream name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the stream name string is replaced with

the hostname of the system.

"%%" in the stream name string is replaced with

"%". Any other character following "%" is illegal.



-o, --export-profile <filename>

Export the result file which can be imported later or opened

by the NVIDIA Visual Profiler.

"%p" in the file name string is replaced with the

process ID of the application being profiled.

"%q{<ENV>}" in the file name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the file name string is replaced with the

hostname of the system.

"%%" in the file name string is replaced with "%".

Any other character following "%" is illegal.

By default, this option disables the summary output. Note:

If the application being profiled creates child processes,

or if '--profile-all-processes' is used, the "%p" format

is needed to get correct export files for each process.
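
The "%" substitutions described above can be sketched in Python as follows. This imitates the documented rules only and is not nvprof's own code: the function name `expand_name` and the sample values (process ID 1234, hostname "node1", environment variable RANK) are made up for illustration.

```python
from typing import Mapping


def expand_name(template: str, pid: int, hostname: str,
                env: Mapping[str, str]) -> str:
    """Expand %p, %h, %q{ENV} and %% in a file name template."""
    out = []
    i = 0
    while i < len(template):
        if template[i] != "%":
            out.append(template[i])
            i += 1
            continue
        rest = template[i + 1:]
        if rest.startswith("%"):
            out.append("%")              # "%%" becomes a literal "%"
            i += 2
        elif rest.startswith("p"):
            out.append(str(pid))         # process ID of the profiled application
            i += 2
        elif rest.startswith("h"):
            out.append(hostname)         # hostname of the system
            i += 2
        elif rest.startswith("q{") and "}" in rest:
            end = rest.index("}")
            var = rest[2:end]
            if var not in env:
                # The help text says an unset variable is an error.
                raise KeyError(f"environment variable {var!r} is not set")
            out.append(env[var])
            i += end + 2                 # skip "%", "q{VAR" and "}"
        else:
            raise ValueError(f"illegal character after '%' at position {i}")
    return "".join(out)


name = expand_name("out.%p.%h.%q{RANK}.%%.nvvp", 1234, "node1", {"RANK": "3"})
print(name)
```

Including "%p" in the template gives each profiled process its own export file, which is why the note above requires it when child processes are profiled.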



-f, --force-overwrite

Force overwriting all output files (any existing files will

be overwritten).



-i, --import-profile <filename>

Import a result profile from a previous run.



--log-file <filename>

Make nvprof send all its output to the specified file, or

one of the standard channels. The file will be overwritten.

If the file doesn't exist, a new one will be created.

"%1" as the whole file name indicates standard output

channel (stdout).

"%2" as the whole file name indicates standard error

channel (stderr). Note: This is the default.

"%p" in the file name string is replaced with the

process ID of the application being profiled.

"%q{<ENV>}" in the file name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the file name string is replaced with the

hostname of the system.

"%%" in the file name is replaced with "%".

Any other character following "%" is illegal.



--print-nvlink-topology

Print NVLink topology.



-h, --help

Print this help information.



-V, --version

Print version information of this tool.

 

Published by 全栈程序员-站长. When reposting, please credit the source: https://javaforall.net/135846.html Original link: https://javaforall.net
