nvprof --help


Hello everyone, good to see you again. I'm your friend 全栈君.

Usage: nvprof [options] [application] [application-arguments]

Options:

--aggregate-mode <on|off>

Turn on/off aggregate mode for events and metrics specified

by subsequent "--events" and "--metrics" options. Those

event/metric values will be collected for each domain instance,

instead of the whole device. Allowed values:

on - turn on aggregate mode (default)

off - turn off aggregate mode



--analysis-metrics

Collect profiling data that can be imported to Visual Profiler's

"analysis" mode. Note: Use "--export-profile" to specify

an export file.
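
For example, to collect the analysis data into a file that can later be imported into Visual Profiler, an invocation could look like the following ("./myapp" and the output file name are placeholders):

nvprof --analysis-metrics -o analysis.nvprof ./myapp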



--concurrent-kernels <on|off>

Turn on/off concurrent kernel execution. If concurrent kernel

execution is off, all kernels running on one device will

be serialized. Allowed values:

on - turn on concurrent kernel execution (default)

off - turn off concurrent kernel execution



--continuous-sampling-interval <interval>

Set the continuous mode sampling interval in milliseconds.

Minimum is 1 ms. Default is 2 ms.



--dependency-analysis

Generate event dependency graph for host and device activities

and run dependency analysis.
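
As a sketch, dependency analysis is usually paired with one of the trace or export options described below, for example (the application name is a placeholder):

nvprof --dependency-analysis --print-dependency-analysis-trace ./myapp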



--device-buffer-size <size in MBs>

Set the device memory size (in MBs) reserved for storing

profiling data for non-CDP operations, especially for concurrent

kernel tracing, for each buffer on a context. The default

value is 8MB. The size should be a positive integer.



--device-cdp-buffer-size <size in MBs>

Set the device memory size (in MBs) reserved for storing

profiling data for CDP operations for each buffer on a context.

The default value is 8MB. The size should be a positive

integer.



--devices <device ids>

Change the scope of subsequent "--events", "--metrics", "--query-events"

and "--query-metrics" options.

Allowed values:

all - change scope to all valid devices

comma-separated device IDs - change scope to specified

devices



--event-collection-mode <mode>

Choose event collection mode for all events/metrics. Allowed

values:

kernel - events/metrics are collected only for durations

of kernel executions (default)

continuous - events/metrics are collected for duration

of the application. This is not applicable for non-Tesla devices.

This mode is compatible only with NVLink events/metrics.

This mode is incompatible with "--profile-all-processes"

or "--profile-child-processes" or "--replay-mode kernel"

or "--replay-mode application".



-e, --events <event names>

Specify the events to be profiled on certain device(s). Multiple

event names separated by comma can be specified. Which device(s)

are profiled is controlled by the "--devices" option. Otherwise

events will be collected on all devices.

For a list of available events, use "--query-events".

Use "--events all" to profile all events available for each

device.

Use "--devices" and "--kernels" to select a specific kernel

invocation.
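
For instance, to collect two events on device 0 only (the event names are illustrative; use "--query-events" to see what your GPU actually exposes, and "./myapp" is a placeholder):

nvprof --devices 0 --events warps_launched,branch ./myapp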



--kernel-latency-timestamps <on|off>

Turn on/off collection of kernel latency timestamps, namely

queued and submitted. The queued timestamp is captured when

a kernel launch command was queued into the CPU command

buffer. The submitted timestamp denotes when the CPU command

buffer containing this kernel launch was submitted to the

GPU. Turning this option on may incur an overhead during

profiling. Allowed values:

on - turn on collection of kernel latency timestamps

off - turn off collection of kernel latency timestamps

(default)



--kernels <kernel path syntax>

Change the scope of subsequent "--events", "--metrics" options.

The syntax is as follows:

<kernel name>

Limit scope to given kernel name.

or

<context id/name>:<stream id/name>:<kernel name>:<invocation>

The context/stream IDs, names, kernel name and invocation

can be regular expressions. Empty string matches any number

of characters. If <context id/name> or <stream id/name>

is a positive number, it's strictly matched against the

CUDA context/stream ID. Otherwise it's treated as a regular

expression and matched against the context/stream name specified

by the NVTX library. If the invocation count is a positive

number, it's strictly matched against the invocation of

the kernel. Otherwise it's treated as a regular expression.

Example: --kernels "1:foo:bar:2" will profile any kernel

whose name contains "bar" and is the 2nd instance on context

1 and on stream named "foo".
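
Combining this with "--metrics", a full command line following the example above might be (kernel/stream names and "./myapp" are placeholders):

nvprof --kernels "1:foo:bar:2" --metrics achieved_occupancy ./myapp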



-m, --metrics <metric names>

Specify the metrics to be profiled on certain device(s).

Multiple metric names separated by comma can be specified.

Which device(s) are profiled is controlled by the "--devices"

option. Otherwise metrics will be collected on all devices.

For a list of available metrics, use "--query-metrics".

Use "--metrics all" to profile all metrics available for

each device.

Use "--devices" and "--kernels" to select a specific kernel

invocation.

Note: "--metrics all" does not include some metrics which

are needed for Visual Profiler's source level analysis.

For that, use "--analysis-metrics".
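
For example (the metric names are illustrative; check "--query-metrics" on your device, and "./myapp" is a placeholder):

nvprof --metrics achieved_occupancy,ipc ./myapp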



--pc-sampling-period <period>

Specify PC Sampling period in cycles, at which the sampling

records will be dumped. Allowed values for the period are

integers between 5 to 31 both inclusive.

This will set the sampling period to (2^period) cycles

Default value is a number between 5 and 12 based on the setup. Note:

Only available for GM20X+.
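
For example, a period of 11 means one sample roughly every 2^11 = 2048 cycles; a hedged invocation combining it with PC sampling collection might be ("./myapp" and the file name are placeholders):

nvprof --pc-sampling-period 11 --source-level-analysis pc_sampling -o pcsampling.nvprof ./myapp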





--profile-all-processes

Profile all processes launched by the same user who launched

this nvprof instance. Note: Only one instance of nvprof

can run with this option at the same time. Under this mode,

there's no need to specify an application to run.
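
A common pattern is to start the collector in one shell and then launch the CUDA application(s) normally from another shell; the output file name here is a placeholder, with "%p" keeping each process in its own file:

nvprof --profile-all-processes -o all_procs.%p.nvprof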



--profile-api-trace <none|runtime|driver|all>

Turn on/off CUDA runtime/driver API tracing. Allowed values:

none - turn off API tracing

runtime - only turn on CUDA runtime API tracing

driver - only turn on CUDA driver API tracing

all - turn on all API tracing (default)



--profile-child-processes

Profile the application and all child processes launched

by it.



--profile-from-start <on|off>

Enable/disable profiling from the start of the application.

If it's disabled, the application can use {cu,cuda}Profiler{Start,Stop}

to turn on/off profiling. Allowed values:

on - enable profiling from start (default)

off - disable profiling from start
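
With profiling disabled at start, only the region the application brackets with cudaProfilerStart()/cudaProfilerStop() (or the driver API equivalents) is captured; a minimal sketch, with "./myapp" as a placeholder:

nvprof --profile-from-start off ./myapp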



--profiling-semaphore-pool-size <count>

Set the profiling semaphore pool size reserved for storing

profiling data for serialized kernels and memory operations

for each context. The default value is 65536. The size should

be a positive integer.



--query-events

List all the events available on the device(s). Device(s)

queried can be controlled by the "--devices" option.



--query-metrics

List all the metrics available on the device(s). Device(s)

queried can be controlled by the "--devices" option.



--replay-mode <mode>

Choose replay mode used when not all events/metrics can be

collected in a single run. Allowed values:

disabled - replay is disabled; events/metrics that couldn't

be profiled will be dropped

kernel - each kernel invocation is replayed (default)

application - the entire application is replayed.

This mode is incompatible with "--profile-all-processes"

or "--profile-child-processes".



-a, --source-level-analysis <source level analysis names>

Specify the source level metrics to be profiled on a certain

kernel invocation. Use "--devices" and "--kernels" to select

a specific kernel invocation. Allowed values: one or more

of the following, separated by commas

global_access: global access

shared_access: shared access

branch: divergent branch

instruction_execution: instruction execution

pc_sampling: pc sampling, available only for GM20X+

Note: Use "--export-profile" to specify an export file.
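
For instance, collecting global-access and divergent-branch source-level data into an export file (the file and application names are placeholders):

nvprof -a global_access,branch -o sourcelevel.nvprof ./myapp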



--system-profiling <on|off>

Turn on/off power, clock, and thermal profiling. Allowed

values:

on - turn on system profiling

off - turn off system profiling (default)



-t, --timeout <seconds>

Set an execution timeout (in seconds) for the CUDA application.

Note: Timeout starts counting from the moment the CUDA driver

is initialized. If the application doesn't call any CUDA

APIs, timeout won't be triggered.



--track-memory-allocations <on|off>

Turn on/off tracking of memory operations, which involves

recording timestamps, memory size, memory type and program

counters of the memory allocations and frees. Turning this

option on may incur an overhead during profiling. Allowed

values:

on - turn on tracking of memory allocations and

frees

off - turn off tracking of memory allocations and

frees (default)



--unified-memory-profiling <per-process-device|off>

Configure unified memory profiling. Allowed values:

per-process-device - collect counts for each process

and each device (default)

off - turn off unified memory profiling



--cpu-profiling <on|off>

Turn on/off CPU profiling. Note: CPU profiling is not supported

in multi-process mode.
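
A minimal CPU-profiling run that combines a few of the options below might look like this (the sampling frequency and "./myapp" are illustrative):

nvprof --cpu-profiling on --cpu-profiling-mode top-down --cpu-profiling-frequency 100 ./myapp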



--cpu-profiling-explain-ccff <filename>

Path to a PGI pgexplain.xml file that should be used to interpret

Common Compiler Feedback Format (CCFF) messages.



--cpu-profiling-frequency <frequency>

Set the CPU profiling frequency in samples per second. Default

is 25Hz. Maximum is 500Hz.



--cpu-profiling-max-depth <depth>

Set the maximum depth of each call stack. Zero means no limit.

Default is zero.



--cpu-profiling-mode <flat|top-down|bottom-up>

Set the output mode of CPU profiling. Allowed values:

flat - Show flat profile

top-down - Show parent functions at the top

bottom-up - Show parent functions at the bottom

(default)



--cpu-profiling-percentage-threshold <threshold>

Filter out the entries that are below the set percentage

threshold. The limit should be an integer between 0 and

100, inclusive. Zero means no limit. Default is zero.



--cpu-profiling-scope <function|instruction>

Choose the profiling scope. Allowed values:

function - Each level in the stack trace represents

a distinct function (default)

instruction - Each level in the stack trace represents

a distinct instruction address



--cpu-profiling-show-ccff <on|off>

Choose whether to print Common Compiler Feedback Format (CCFF)

messages embedded in the binary. Note: this option implies

"--cpu-profiling-scope instruction".Default is off.



--cpu-profiling-show-library <on|off>

Choose whether to print the library name for each sample.



--cpu-profiling-thread-mode <separated|aggregated>

Set the thread mode of CPU profiling. Allowed values:

separated - Show separate profile for each thread

aggregated - Aggregate data from all threads (default)



--cpu-profiling-unwind-stack <on|off>

Choose whether to unwind the CPU call-stack at each sample

point. Default is on.



--openacc-profiling <on|off>

Enable/disable recording information from the OpenACC profiling

interface. Note: whether the OpenACC profiling interface is available

depends on the OpenACC runtime. Default is on.



--context-name <name>

Name of the CUDA context.

"%i" in the context name string is replaced with

the ID of the context.

"%p" in the context name string is replaced with

the process ID of the application being profiled.

"%q{<ENV>}" in the context name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the context name string is replaced with

the hostname of the system.

"%%" in the context name string is replaced with

"%". Any other character following "%" is illegal.



--csv

Use comma-separated values in the output.



--demangling <on|off>

Turn on/off C++ name demangling of function names. Allowed

values:

on - turn on demangling (default)

off - turn off demangling



-u, --normalized-time-unit <s|ms|us|ns|col|auto>

Specify the unit of time that will be used in the output.

Allowed values:

s - second, ms - millisecond, us - microsecond,

ns - nanosecond

col - a fixed unit for each column

auto (default) - the scale is chosen for each value

based on its length.



--openacc-summary-mode <mode>

Set how durations are computed in the OpenACC summary. Allowed

values:

exclusive: show exclusive times (default)

inclusive: show inclusive times



--print-api-summary

Print a summary of CUDA runtime/driver API calls.



--print-api-trace

Print CUDA runtime/driver API trace.



--print-dependency-analysis-trace

Print dependency analysis trace.



--print-gpu-summary

Print a summary of the activities on the GPU (including CUDA

kernels and memcpy's/memset's).



--print-gpu-trace

Print individual kernel invocations (including CUDA memcpy's/memset's)

and sort them in chronological order. In event/metric profiling

mode, show events/metrics for each kernel invocation.



--print-openacc-constructs

Include parent construct names in OpenACC profile.



--print-openacc-summary

Print a summary of the OpenACC profile.



--print-openacc-trace

Print a trace of the OpenACC profile.



-s, --print-summary

Print a summary of the profiling result on screen. Note:

This is the default unless "--export-profile" or other print

options are used.



--print-summary-per-gpu

Print a summary of the profiling result for each GPU.



--process-name <name>

Name of the process.

"%p" in the process name string is replaced with

the process ID of the application being profiled.

"%q{<ENV>}" in the process name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the process name string is replaced with

the hostname of the system.

"%%" in the process name string is replaced with

"%". Any other character following "%" is illegal.



--quiet

Suppress all nvprof output.



--stream-name <name>

Name of the CUDA stream.

"%i" in the stream name string is replaced with the

ID of the stream.

"%p" in the stream name string is replaced with

the process ID of the application being profiled.

"%q{<ENV>}" in the stream name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the stream name string is replaced with

the hostname of the system.

"%%" in the stream name string is replaced with

"%". Any other character following "%" is illegal.



-o, --export-profile <filename>

Export the result file which can be imported later or opened

by the NVIDIA Visual Profiler.

"%p" in the file name string is replaced with the

process ID of the application being profiled.

"%q{<ENV>}" in the file name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the file name string is replaced with the

hostname of the system.

"%%" in the file name string is replaced with "%".

Any other character following "%" is illegal.

By default, this option disables the summary output. Note:

If the application being profiled creates child processes,

or if '--profile-all-processes' is used, the "%p" format

is needed to get correct export files for each process.
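
For example, when the profiled application forks workers, "%p" keeps each process in its own file (the file and application names are placeholders):

nvprof --profile-child-processes -o worker.%p.nvprof ./myapp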



-f, --force-overwrite

Force overwriting all output files (any existing files will

be overwritten).



-i, --import-profile <filename>

Import a result profile from a previous run.



--log-file <filename>

Make nvprof send all its output to the specified file, or

one of the standard channels. The file will be overwritten.

If the file doesn't exist, a new one will be created.

"%1" as the whole file name indicates standard output

channel (stdout).

"%2" as the whole file name indicates standard error

channel (stderr). Note: This is the default.

"%p" in the file name string is replaced with the

process ID of the application being profiled.

"%q{<ENV>}" in the file name string is replaced

with the value of the environment variable "<ENV>". If the

environment variable is not set it's an error.

"%h" in the file name string is replaced with the

hostname of the system.

"%%" in the file name is replaced with "%".

Any other character following "%" is illegal.
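
For example, sending nvprof's own messages to a per-host log file while exporting the profile separately (file names and "./myapp" are placeholders):

nvprof --log-file nvprof.%h.log -o trace.nvprof ./myapp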



--print-nvlink-topology

Print NVLink topology.



-h, --help

Print this help information.



-V, --version

Print version information of this tool.

 
