nvprof --help

Usage: nvprof [options] [application] [application-arguments]

Options:

--aggregate-mode <on|off>

Turn on/off aggregate mode for events and metrics specified

by subsequent "--events" and "--metrics" options. Those

event/metric values will be collected for each domain instance,

instead of the whole device. Allowed values:

on - turn on aggregate mode (default)

off - turn off aggregate mode



--analysis-metrics

Collect profiling data that can be imported to Visual Profiler's

"analysis" mode. Note: Use "--export-profile" to specify

an export file.



--concurrent-kernels <on|off>

Turn on/off concurrent kernel execution. If concurrent kernel

execution is off, all kernels running on one device will

be serialized. Allowed values:

on - turn on concurrent kernel execution (default)

off - turn off concurrent kernel execution



--continuous-sampling-interval <interval>

Set the continuous mode sampling interval in milliseconds.

Minimum is 1 ms. Default is 2 ms.



--dependency-analysis

Generate event dependency graph for host and device activities

and run dependency analysis.



--device-buffer-size <size in MBs>

Set the device memory size (in MBs) reserved for storing

profiling data for non-CDP operations, especially for concurrent

kernel tracing, for each buffer on a context. The default

value is 8MB. The size should be a positive integer.



--device-cdp-buffer-size <size in MBs>

Set the device memory size (in MBs) reserved for storing

profiling data for CDP operations for each buffer on a context.

The default value is 8MB. The size should be a positive

integer.



--devices <device ids>

Change the scope of subsequent "--events", "--metrics", "--query-events"

and "--query-metrics" options.

Allowed values:

all - change scope to all valid devices

comma-separated device IDs - change scope to specified

devices



--event-collection-mode <mode>

Choose event collection mode for all events/metrics. Allowed

values:

kernel - events/metrics are collected only for durations

of kernel executions (default)

continuous - events/metrics are collected for the duration

of the application. This is not applicable for non-Tesla devices.

This mode is compatible only with NVLink events/metrics.

This mode is incompatible with "--profile-all-processes"

or "--profile-child-processes" or "--replay-mode kernel"

or "--replay-mode application".



-e, --events <event names>

Specify the events to be profiled on certain device(s). Multiple

event names separated by comma can be specified. Which device(s)

are profiled is controlled by the "--devices" option. Otherwise

events will be collected on all devices.

For a list of available events, use "--query-events".

Use "--events all" to profile all events available for each

device.

Use "--devices" and "--kernels" to select a specific kernel

invocation.



--kernel-latency-timestamps <on|off>

Turn on/off collection of kernel latency timestamps, namely

queued and submitted. The queued timestamp is captured when

a kernel launch command was queued into the CPU command

buffer. The submitted timestamp denotes when the CPU command

buffer containing this kernel launch was submitted to the

GPU. Turning this option on may incur an overhead during

profiling. Allowed values:

on - turn on collection of kernel latency timestamps

off - turn off collection of kernel latency timestamps

(default)



--kernels <kernel path syntax>

Change the scope of subsequent "--events", "--metrics" options.

The syntax is as follows:

<kernel name>

Limit scope to given kernel name.

or

<context id/name>:<stream id/name>:<kernel name>:<invocation>

The context/stream IDs, names, kernel name and invocation

can be regular expressions. Empty string matches any number

of characters. If <context id/name> or <stream id/name>

is a positive number, it's strictly matched against the

CUDA context/stream ID. Otherwise it's treated as a regular

expression and matched against the context/stream name specified

by the NVTX library. If the invocation count is a positive

number, it's strictly matched against the invocation of

the kernel. Otherwise it's treated as a regular expression.

Example: --kernels "1:foo:bar:2" will profile any kernel

whose name contains "bar" and is the 2nd instance on context

1 and on stream named "foo".
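
The kernel path syntax described above can be illustrated with a small Python sketch. This is not nvprof's own parser: the names `KernelScope`, `parse_kernel_scope` and `field_matches` are hypothetical, the sketch assumes that no field contains a ":" character, and it assumes "contains" semantics (`re.search`) for regular-expression fields, as the "bar" example suggests.

```python
import re
from dataclasses import dataclass


@dataclass
class KernelScope:
    """Hypothetical container for the four fields of a --kernels argument."""
    context: str = ""
    stream: str = ""
    kernel: str = ""
    invocation: str = ""


def parse_kernel_scope(spec: str) -> KernelScope:
    """Split a --kernels argument into its fields.

    Simplification: fields are assumed not to contain ':' themselves.
    """
    parts = spec.split(":")
    if len(parts) == 1:
        # Short form: only a kernel name was given.
        return KernelScope(kernel=parts[0])
    if len(parts) == 4:
        return KernelScope(*parts)
    raise ValueError("expected <kernel> or <context>:<stream>:<kernel>:<invocation>")


def field_matches(pattern: str, ident: int, name: str) -> bool:
    """Apply the matching rule described for context/stream fields."""
    if pattern == "":
        return True  # an empty string matches anything
    if pattern.isascii() and pattern.isdigit() and int(pattern) > 0:
        return int(pattern) == ident  # positive number: strict ID match
    return re.search(pattern, name) is not None  # otherwise: regex on the name


scope = parse_kernel_scope("1:foo:bar:2")
print(scope)
```

With the example from the help text, `"1:foo:bar:2"` splits into context `1`, stream `foo`, kernel `bar` and invocation `2`; the context field is then compared against the numeric context ID, while the stream field is matched against the NVTX stream name.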



-m, --metrics <metric names>

Specify the metrics to be profiled on certain device(s).

Multiple metric names separated by comma can be specified.

Which device(s) are profiled is controlled by the "--devices"

option. Otherwise metrics will be collected on all devices.

For a list of available metrics, use "--query-metrics".

Use "--metrics all" to profile all metrics available for

each device.

Use "--devices" and "--kernels" to select a specific kernel

invocation.

Note: "--metrics all" does not include some metrics which

are needed for Visual Profiler's source level analysis.

For that, use "--analysis-metrics".
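
Because "--devices" and "--kernels" change the scope of the "--events" and "--metrics" options that come after them, the order of options on the command line matters. The following Python sketch only assembles an argument list in that order; it does not run nvprof. The helper name `build_nvprof_args` and the application path `./my_app` are assumptions made for illustration.

```python
import shlex


def build_nvprof_args(app, devices=None, kernels=None, events=None, metrics=None):
    """Assemble an nvprof argument list with scope options placed first."""
    args = ["nvprof"]
    # Scope options must precede the --events/--metrics options they apply to.
    if devices is not None:
        args += ["--devices", ",".join(str(d) for d in devices)]
    if kernels is not None:
        args += ["--kernels", kernels]
    # Multiple event or metric names are passed as one comma-separated value.
    if events:
        args += ["--events", ",".join(events)]
    if metrics:
        args += ["--metrics", ",".join(metrics)]
    args.append(app)
    return args


args = build_nvprof_args(
    "./my_app",            # hypothetical application
    devices=[0],
    kernels="1:foo:bar:2",  # example taken from the --kernels description
    metrics=["all"],
)
print(shlex.join(args))
```

The resulting list could be handed to `subprocess.run` on a machine where nvprof is installed.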



--pc-sampling-period <period>

Specify PC Sampling period in cycles, at which the sampling

records will be dumped. Allowed values for the period are

integers between 5 and 31, both inclusive.

This will set the sampling period to (2^period) cycles.

Default value is a number between 5 and 12 based on the setup.

Note: Only available for GM20X+.
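
Since the option value is an exponent rather than a cycle count, a short Python sketch may help make the relationship concrete. The function name `pc_sampling_cycles` is hypothetical; the range check simply mirrors the allowed values stated above.

```python
def pc_sampling_cycles(period: int) -> int:
    """Return the sampling period in cycles for a --pc-sampling-period value."""
    if not 5 <= period <= 31:
        raise ValueError("period must be an integer between 5 and 31, inclusive")
    return 2 ** period


# Smallest allowed value, upper end of the default range, largest allowed value.
for period in (5, 12, 31):
    print(f"--pc-sampling-period {period} -> {pc_sampling_cycles(period)} cycles")
```

For instance, a period of 5 corresponds to 32 cycles and a period of 12 to 4096 cycles.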





--profile-all-processes

Profile all processes launched by the same user who launched

this nvprof instance. Note: Only one instance of nvprof

can run with this option at the same time. Under this mode,

there's no need to specify an application to run.



--profile-api-trace <none|runtime|driver|all>

Turn on/off CUDA runtime/driver API tracing. Allowed values:

none - turn off API tracing

runtime - only turn on CUDA runtime API tracing

driver - only turn on CUDA driver API tracing

all - turn on all API tracing (default)



--profile-child-processes

Profile the application and all child processes launched

by it.



--profile-from-start <on|off>

Enable/disable profiling from the start of the application.

If it's disabled, the application can use {cu,cuda}Profiler{Start,Stop}

to turn on/off profiling. Allowed values:

on - enable profiling from start (default)

off - disable profiling from start



--profiling-semaphore-pool-size <count>

Set the profiling semaphore pool size reserved for storing

profiling data for serialized kernels and memory operations

for each context. The default value is 65536. The size should

be a positive integer.



--query-events

List all the events available on the device(s). Device(s)

queried can be controlled by the "--devices" option.



--query-metrics

List all the metrics available on the device(s). Device(s)

queried can be controlled by the "--devices" option.



--replay-mode <mode>

Choose replay mode used when not all events/metrics can be

collected in a single run. Allowed values:

disabled - replay is disabled; events/metrics that couldn't

be profiled will be dropped

kernel - each kernel invocation is replayed (default)

application - the entire application is replayed.

This mode is incompatible with "--profile-all-processes"

or "--profile-child-processes".



-a, --source-level-analysis <source level analysis names>

Specify the source level metrics to be profiled on a certain

kernel invocation. Use "--devices" and "--kernels" to select

a specific kernel invocation. Allowed values: one or more

of the following, separated by commas

global_access: global access

shared_access: shared access

branch: divergent branch

instruction_execution: instruction execution

pc_sampling: pc sampling, available only for GM20X+

Note: Use "--export-profile" to specify an export file.



--system-profiling <on|off>

Turn on/off power, clock, and thermal profiling. Allowed

values:

on - turn on system profiling

off - turn off system profiling (default)



-t, --timeout <seconds>

Set an execution timeout (in seconds) for the CUDA application.

Note: Timeout starts counting from the moment the CUDA driver

is initialized. If the application doesn't call any CUDA

APIs, timeout won't be triggered.



--track-memory-allocations <on|off>

Turn on/off tracking of memory operations, which involves

recording timestamps, memory size, memory type and program

counters of the memory allocations and frees. Turning this

option on may incur an overhead during profiling. Allowed

values:

on - turn on tracking of memory allocations and

free

off - turn off tracking of memory allocations and

free (default)



--unified-memory-profiling <per-process-device|off>

Configure unified memory profiling. Allowed values:

per-process-device - collect counts for each process

and each device (default)

off - turn off unified memory profiling



--cpu-profiling <on|off>

Turn on CPU profiling. Note: CPU profiling is not supported

in multi-process mode.



--cpu-profiling-explain-ccff <filename>

Path to a PGI pgexplain.xml file that should be used to interpret

Common Compiler Feedback Format (CCFF) messages.



--cpu-profiling-frequency <frequency>

Set the CPU profiling frequency in samples per second. Default

is 25Hz. Maximum is 500Hz.



--cpu-profiling-max-depth <depth>

Set the maximum depth of each call stack. Zero means no limit.

Default is zero.



--cpu-profiling-mode <flat|top-down|bottom-up>

Set the output mode of CPU profiling. Allowed values:

flat - Show flat profile

top-down - Show parent functions at the top

bottom-up - Show parent functions at the bottom

(default)



--cpu-profiling-percentage-threshold <threshold>

Filter out the entries that are below the set percentage

threshold. The limit should be an integer between 0 and

100, inclusive. Zero means no limit. Default is zero.



--cpu-profiling-scope <function|instruction>

Choose the profiling scope. Allowed values:

function - Each level in the stack trace represents

a distinct function (default)

instruction - Each level in the stack trace represents

a distinct instruction address



--cpu-profiling-show-ccff <on|off>

Choose whether to print Common Compiler Feedback Format (CCFF)

messages embedded in the binary. Note: this option implies

"--cpu-profiling-scope instruction". Default is off.



--cpu-profiling-show-library <on|off>

Choose whether to print the library name for each sample.



--cpu-profiling-thread-mode <separated|aggregated>

Set the thread mode of CPU profiling. Allowed values:

separated - Show separate profile for each thread

aggregated - Aggregate data from all threads (default)



--cpu-profiling-unwind-stack <on|off>

Choose whether to unwind the CPU call-stack at each sample

point. Default is on.



--openacc-profiling <on|off>

Enable/disable recording information from the OpenACC profiling

interface. Note: whether the OpenACC profiling interface is available

depends on the OpenACC runtime. Default is on.



--context-name <name>

Name of the CUDA context.

"%i" in the context name string is replaced with

the ID of the context.

"%p" in the context name string is replaced with

the process ID of the application being profiled.

"%q{<ENV>}" in the context name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the context name string is replaced with

the hostname of the system.

"%%" in the context name string is replaced with

"%". Any other character following "%" is illegal.



--csv

Use comma-separated values in the output.



--demangling <on|off>

Turn on/off C++ name demangling of function names. Allowed

values:

on - turn on demangling (default)

off - turn off demangling



-u, --normalized-time-unit <s|ms|us|ns|col|auto>

Specify the unit of time that will be used in the output.

Allowed values:

s - second, ms - millisecond, us - microsecond,

ns - nanosecond

col - a fixed unit for each column

auto (default) - the scale is chosen for each value

based on its length.



--openacc-summary-mode <mode>

Set how durations are computed in the OpenACC summary. Allowed

values:

exclusive: show exclusive times (default)

inclusive: show inclusive times



--print-api-summary

Print a summary of CUDA runtime/driver API calls.



--print-api-trace

Print CUDA runtime/driver API trace.



--print-dependency-analysis-trace

Print dependency analysis trace.



--print-gpu-summary

Print a summary of the activities on the GPU (including CUDA

kernels and memcpy's/memset's).



--print-gpu-trace

Print individual kernel invocations (including CUDA memcpy's/memset's)

and sort them in chronological order. In event/metric profiling

mode, show events/metrics for each kernel invocation.



--print-openacc-constructs

Include parent construct names in OpenACC profile.



--print-openacc-summary

Print a summary of the OpenACC profile.



--print-openacc-trace

Print a trace of the OpenACC profile.



-s, --print-summary

Print a summary of the profiling result on screen. Note:

This is the default unless "--export-profile" or other print

options are used.



--print-summary-per-gpu

Print a summary of the profiling result for each GPU.



--process-name <name>

Name of the process.

"%p" in the process name string is replaced with

the process ID of the application being profiled.

"%q{<ENV>}" in the process name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the process name string is replaced with

the hostname of the system.

"%%" in the process name string is replaced with

"%". Any other character following "%" is illegal.



--quiet

Suppress all nvprof output.



--stream-name <name>

Name of the CUDA stream.

"%i" in the stream name string is replaced with the

ID of the stream.

"%p" in the stream name string is replaced with

the process ID of the application being profiled.

"%q{<ENV>}" in the stream name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the stream name string is replaced with

the hostname of the system.

"%%" in the stream name string is replaced with

"%". Any other character following "%" is illegal.



-o, --export-profile <filename>

Export the result file which can be imported later or opened

by the NVIDIA Visual Profiler.

"%p" in the file name string is replaced with the

process ID of the application being profiled.

"%q{<ENV>}" in the file name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the file name string is replaced with the

hostname of the system.

"%%" in the file name string is replaced with "%".

Any other character following "%" is illegal.

By default, this option disables the summary output. Note:

If the application being profiled creates child processes,

or if '--profile-all-processes' is used, the "%p" format

is needed to get correct export files for each process.
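
The substitution rules for file name strings can be mimicked with a few lines of Python. This is only a sketch of the documented behaviour, not nvprof's implementation: the function name `expand_name` is hypothetical, and the process ID, environment and hostname are passed in explicitly so the example stays deterministic.

```python
import os
import re
import socket

# Matches "%q{NAME}" or "%" followed by at most one character.
_TOKEN = re.compile(r"%(?:q\{([^}]*)\}|(.?))", re.S)


def expand_name(template, pid, env=None, hostname=None):
    """Expand %p, %q{ENV}, %h and %% in a file name template."""
    env = os.environ if env is None else env
    hostname = socket.gethostname() if hostname is None else hostname

    def replace(match):
        env_name, char = match.group(1), match.group(2)
        if env_name is not None:
            if env_name not in env:
                # The help text states that an unset variable is an error.
                raise KeyError(f"environment variable {env_name!r} is not set")
            return env[env_name]
        if char == "p":
            return str(pid)
        if char == "h":
            return hostname
        if char == "%":
            return "%"
        # Any other character following "%" is illegal.
        raise ValueError(f"illegal sequence '%{char}' in {template!r}")

    return _TOKEN.sub(replace, template)


print(expand_name("profile.%p.%h.nvprof", pid=1234, hostname="node1"))
print(expand_name("rank%q{RANK}.nvprof", pid=1234, env={"RANK": "3"}))
```

Including "%p" in the template is what keeps the export files of several processes from overwriting each other, as the note above points out.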



-f, --force-overwrite

Force overwriting all output files (any existing files will

be overwritten).



-i, --import-profile <filename>

Import a result profile from a previous run.



--log-file <filename>

Make nvprof send all its output to the specified file, or

one of the standard channels. The file will be overwritten.

If the file doesn't exist, a new one will be created.

"%1" as the whole file name indicates standard output

channel (stdout).

"%2" as the whole file name indicates standard error

channel (stderr). Note: This is the default.

"%p" in the file name string is replaced with the

process ID of the application being profiled.

"%q{<ENV>}" in the file name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the file name string is replaced with the

hostname of the system.

"%%" in the file name is replaced with "%".

Any other character following "%" is illegal.



--print-nvlink-topology

Print NVLink topology.



-h, --help

Print this help information.



-V, --version

Print version information of this tool.

 

Publisher: 全栈程序员-站长. When reposting, please credit the source: https://javaforall.net/135846.html. Original link: https://javaforall.net
