Updates in 2025.2

General

  • Added support for collecting C2C link information on Blackwell GPUs.

  • CPU call stack filtering now supports Python call stacks.

  • Instruction statistics now show warp- and thread-level instruction counts per opcode category. Added new metrics sass__inst_executed_per_opcode_category and sass__thread_inst_executed_per_opcode_category. See the Metrics Reference for details.

  • Enhanced several rules to produce tables pointing to the source location of interest.

  • Improved the NvRules API to support generic tables for the UI and CLI.

  • Improved the NvRules and Python Report Interface documentation to be more Pythonic.

  • Added APIs to the Python Report Interface for querying rules and source markers in the report.

  • Added Occupancy Calculator Python Interface, which provides a Python-based interface for performing occupancy calculations and analysis of kernels on NVIDIA GPUs.
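
For readers unfamiliar with what an occupancy calculation involves, the following is a minimal sketch of the underlying idea. It does not use the Occupancy Calculator Python Interface itself. The `SmLimits` class, the `theoretical_occupancy` function, and every default limit value are hypothetical names and numbers chosen for illustration. The model is deliberately simplified: it ignores register and shared memory allocation granularity, so real tools may report different numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmLimits:
    """Per-SM resource limits.

    All defaults are illustrative assumptions, not the limits of any
    specific GPU architecture.
    """
    max_warps: int = 64
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 102400
    warp_size: int = 32


def theoretical_occupancy(threads_per_block, regs_per_thread,
                          smem_per_block, limits=SmLimits()):
    """Return (resident blocks per SM, occupancy as a fraction of max warps).

    Simplified model: each resource (warp slots, registers, shared memory,
    block slots) independently caps the number of resident blocks, and the
    tightest cap wins. Allocation granularity is ignored.
    """
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")

    # Round the block size up to whole warps.
    warps_per_block = -(-threads_per_block // limits.warp_size)

    # Cap imposed by the number of warp slots on the SM.
    by_warps = limits.max_warps // warps_per_block

    # Cap imposed by the register file (registers are consumed per warp).
    regs_per_block = regs_per_thread * limits.warp_size * warps_per_block
    by_regs = (limits.registers // regs_per_block
               if regs_per_block else limits.max_blocks)

    # Cap imposed by shared memory; a kernel using none is not limited by it.
    by_smem = (limits.shared_mem_bytes // smem_per_block
               if smem_per_block else limits.max_blocks)

    blocks = min(limits.max_blocks, by_warps, by_regs, by_smem)
    active_warps = blocks * warps_per_block
    return blocks, active_warps / limits.max_warps
```

Under these assumed limits, doubling the registers per thread from 32 to 64 for a 256-thread block halves the number of resident blocks, which is the kind of trade-off an occupancy calculator is meant to expose.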

NVIDIA Nsight Compute

  • Added product-wide search functionality via a new search bar and tool window.

  • The Source page now shows scoreboard dependencies in SASS.

  • Converted more tooltips into interactive tooltips. Interactive tooltips can now be pinned and dragged.

  • Added source correlation navigation controls which allow navigation to the previous or next block of correlated lines.

NVIDIA Nsight Compute CLI

Resolved Issues

  • CUDA Graphs in the Resources View use the current UI theme.

  • Resolved several issues when interacting with timelines on the Details page.

  • Resolved issues with Python syntax highlighting on the Source page.

  • Disabled deprecated columns in the API Stream tool window.

  • Fixed an issue where the Source page could show incorrect correlation when some source files were not resolved.

  • Reduced the number of replay passes required for collecting the PmSampling.section on GH100 with applicable drivers.

  • Resolved an issue where --native-include did not work properly when using range replay and cu(da)ProfilerStop.

  • Fixed an Invalid or unsupported charset:ANSI_X3.4-1968 error when using the CLI on some systems.

  • Fixed an issue where the memory available for saving context state during replay could be computed incorrectly when the application was using managed memory.

  • Fixed an issue where some metrics were not listed for collection in section files for GB20x GPUs.