Arm introduces the first parallel debugger for Python

Patrick Wohlschlegel
July 22, 2020

Introduction

Arm announces the release of Arm Forge 20.1, featuring various enhancements and bug fixes across all products. In particular, this release includes:

  • Native parallel debugging of Python applications in DDT
  • Improvements to performance analysis of NVIDIA GPUs in both MAP and Performance Reports
  • Simplified packaging, through the integration of Performance Reports into the Forge installation files
  • Support for the latest development environments for Arm-based servers

For more in-depth information, including a full breakdown of the latest features and bug fixes, please see the release notes history.

Python Debugging

To debug mpi4py and serial Python applications, insert %allinea_python_debug% into the command line, between the Python interpreter and the script:

bin/ddt mpirun -np 2 python3 %allinea_python_debug% python-blog.py

Or

bin/ddt python3 %allinea_python_debug% python-blog.py

The MPI version drops you into a stack starting at "import mpi4py". Step-in, step-out and step-over work in Python code in the same way as in C, C++ and Fortran.

Starting an MPI code written in Python with DDT
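The blog does not show python-blog.py itself, so the sketch below is a hypothetical stand-in that works with both launch lines above: under mpirun it uses mpi4py, and when mpi4py is not installed it falls back to a serial run. The function names local_sum and main, and the workload (summing integers), are illustrative assumptions, not part of Forge.

```python
"""Hypothetical stand-in for python-blog.py: runs under mpirun with mpi4py,
or serially when mpi4py is not installed."""

try:
    # The mpi4py import; the text above says the MPI launch starts from here.
    from mpi4py import MPI
except ImportError:
    MPI = None  # serial fallback, e.g. for "bin/ddt python3 ... python-blog.py"


def local_sum(rank: int, size: int, n: int) -> int:
    """Sum of the integers in [0, n) assigned to this rank, round-robin."""
    return sum(range(rank, n, size))


def main(n: int = 1000) -> int:
    if MPI is not None:
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        # Each rank contributes its partial sum; every rank receives the total.
        total = comm.allreduce(local_sum(rank, size, n), op=MPI.SUM)
    else:
        rank, size = 0, 1
        total = local_sum(rank, size, n)
    if rank == 0:
        print(f"sum of 0..{n - 1} over {size} rank(s): {total}")
    return total


if __name__ == "__main__":
    main()
```

With two ranks, rank 0 sums the even numbers and rank 1 the odd ones, which gives step-over and step-in something to do on each process before the allreduce combines the results.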

All the advanced breakpoint features available for C and C++ are supported, including function-name breakpoints, conditional breakpoints and breakpoints that trigger every N hits.
For example, a breakpoint set in a loop with the condition i == 10 stops only when i equals 10, as the local variables show:

Conditional Breakpoints
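To try a conditional breakpoint on your own code, a loop like the hypothetical running_squares function below is enough; the function and its variable acc are illustrative assumptions. With a breakpoint on the accumulation line and the condition i == 10, the locals should show i = 10 and acc = 285 (the sum of squares 0 to 9), because a breakpoint stops before its line executes.

```python
"""Hypothetical loop for experimenting with a conditional breakpoint (i == 10)."""
from typing import List


def running_squares(n: int) -> List[int]:
    """Return the running totals of i*i for i in 0..n-1."""
    totals = []
    acc = 0
    for i in range(n):
        acc += i * i  # set a breakpoint here with the condition: i == 10
        totals.append(acc)
    return totals


if __name__ == "__main__":
    result = running_squares(20)
    print(result[10])  # 385, the sum of squares 0..10
```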

The evaluation window can be used to inspect globals and locals, or even to execute Python expressions in the selected frame:

Evaluate Python Variables

In addition to Python frames, the stack shows a merged view of Python and native code, so the steps that led to a piece of native code being executed are visible. For example, here is what the stack looks like when pausing in a NumPy dot product, which uses BLAS under the hood. Registers can be inspected and instructions can be stepped over using Forge's assembly debugging mode:

Python Merged Stack
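A hypothetical script for reproducing a merged stack like this is sketched below: pausing while np.dot is running should show native frames beneath the Python frame. Whether a BLAS library is used, and which one, depends on how NumPy was built; the function name matmul_demo and the default matrix size are assumptions, chosen so the product runs long enough to pause in.

```python
"""Hypothetical script for exploring the merged Python/native stack:
pause while the matrix product runs to see the native frames below it."""
import numpy as np


def matmul_demo(n: int = 2000, seed: int = 0) -> np.ndarray:
    """Multiply two random n x n float64 matrices."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    # For float64 matrices, np.dot typically calls the BLAS library
    # NumPy was built against (for example OpenBLAS in the PyPI wheels).
    return np.dot(a, b)


if __name__ == "__main__":
    c = matmul_demo()
    print(c.shape)
```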

Performance Reports and Forge Integration

Performance Reports is now distributed with Forge as a single combined installation, launched via bin/perf-report in the Forge installation directory.

We have also given some lesser-used or behind-the-scenes binaries and scripts more appropriate names. This will not affect most users, but users of manual launch should now use forge-client instead of ddt-client or allinea-client, and users of .qtf scripts should use forge-mpirun in place of ddt-mpirun.

Revamp of GPU Metrics 

CUDA 10.2 and GPU Metrics are now supported on x86_64 and PowerPC. We have removed the "GPU Temperature" and "Time Spent in Global Memory Accesses" metrics to provide a more stable metric collection mechanism that is consistent across supported platforms.

GPU Utilization, GPU Memory Usage and GPU Power Usage are collected once the NVIDIA Management Library (NVML) is installed (https://developer.nvidia.com/nvidia-management-library-nvml). Warp Stall Reasons and Line metrics are collected using MAP's CUDA Kernel Analysis feature, which is based on CUPTI, CUDA's profiling interface.

MAP supports profiling compiler-optimized code, but the code must be compiled with the -lineinfo flag to use CUDA Kernel Analysis. CUDA Kernel Analysis can be enabled in the GUI's Run dialog or on the command line with --cuda-kernel-analysis. An example workflow is:

$ nvcc -O3 -g -lineinfo cuda_app.cu -o cuda_app
$ map --profile --cuda-kernel-analysis cuda_app

The following is a MAP profile of CloverLeaf_CUDA on Oak Ridge National Laboratory's Summit supercomputer. It demonstrates both GPU metric collection and CUDA Kernel Analysis on PowerPC.

New GPU Profiling Metrics

Graphical Interface Refresh

Forge has been updated to Qt 5, which brings a crisper, better-performing GUI as well as bug fixes and stability improvements. In particular, macOS support in dark appearance mode is improved.

Documentation

Developer and reference guides

  • Arm Performance Reports is a merged component of the Arm Forge product from version 20.1 onwards.
    The Arm Performance Reports user guide is now combined with the Arm Forge user guide, and is available from https://developer.arm.com/docs/101136/latest.
  • The Arm License Server user guide is available from https://developer.arm.com/docs/101169.

Conclusion

Despite the very unusual times we are all experiencing, the team has been able to deliver new, innovative features. With this release, Arm is the first company to release a parallel debugger for Python that includes all the features one would expect. We look forward to hearing what you think: just click the button below!

Give us feedback

The team joins me in wishing all of you and your families the very best. Stay healthy, stay safe.
