Arm introduces the first parallel debugger for Python

Patrick Wohlschlegel
July 22, 2020

Introduction

Arm announces the release of Arm Forge 20.1 featuring various enhancements and bug fixes across all products. In particular, this new release includes:

  • Native parallel debugging of Python applications in DDT
  • Improvements to performance analysis of NVIDIA GPUs in both MAP and Performance Reports
  • Simplifications to our packaging, through the integration of Performance Reports into the Forge installation files
  • Support for the latest development environments for Arm-based servers.

For a full breakdown of the latest features and bug fixes, see the release notes history.

Python Debugging

Attach to mpi4py and serial applications by inserting %allinea_python_debug% into the command line:

bin/ddt mpirun -np 2 python3 %allinea_python_debug% python-blog.py

Or, for a serial application:

bin/ddt python3 %allinea_python_debug% python-blog.py

The MPI version drops you into a stack starting at "import mpi4py". Step-in, step-out and step-over all work in Python code the same way as they do in C, C++ and Fortran.

Starting an MPI code written in Python with DDT
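
The post does not list python-blog.py itself. As a stand-in, here is a minimal sketch of a script that could be launched with either command above. Everything in it (the function names and the computation) is hypothetical and not taken from the release. It uses mpi4py when available and falls back to a single-process run, an assumption made so that the same file works in both the MPI and the serial case.

```python
"""Hypothetical stand-in for python-blog.py (not the script used in the post)."""

try:
    from mpi4py import MPI  # the MPI run starts its stack at this import
except ImportError:         # assumption: allow a plain serial run without mpi4py
    MPI = None


def local_range(rank, size, n):
    """Return the half-open slice [start, stop) of range(n) owned by `rank`."""
    chunk, extra = divmod(n, size)
    start = rank * chunk + min(rank, extra)
    stop = start + chunk + (1 if rank < extra else 0)
    return start, stop


def partial_sum(start, stop):
    """Sum the squares of the integers in [start, stop)."""
    total = 0
    for i in range(start, stop):
        total += i * i  # a convenient line for a breakpoint
    return total


def main(n=100):
    if MPI is not None:
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    else:
        comm, rank, size = None, 0, 1
    start, stop = local_range(rank, size, n)
    mine = partial_sum(start, stop)
    # Combine the per-rank results on every rank; a serial run already holds the total.
    total = comm.allreduce(mine, op=MPI.SUM) if comm is not None else mine
    if rank == 0:
        print(f"sum of squares below {n}: {total}")
    return total


if __name__ == "__main__":
    main()
```

Each rank works on its own slice of the range, so pausing in partial_sum on different ranks should show different values of start and stop.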

The same advanced breakpoint features available for C and C++ are supported, including function-name breakpoints, conditional breakpoints, and breakpoints that trigger every N hits.
Setting a breakpoint in a loop with the condition i == 10 stops at that point, as can be seen from the local variables:

Conditional Breakpoints
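
To make the conditional breakpoint concrete, here is a small sketch of the kind of loop it applies to. The code does not use DDT. The helper locals_when is a hypothetical function that only emulates, in plain Python, the local variables a debugger would show when a breakpoint on the loop body fires with the condition i == 10.

```python
def accumulate(limit):
    """Sum i*i for i in range(limit)."""
    total = 0
    for i in range(limit):
        total += i * i  # set the breakpoint on this line with condition: i == 10
    return total


def locals_when(limit, condition):
    """Replay the loop and return the locals at the first iteration where
    condition(i) holds, before the loop body runs; None if it never holds."""
    total = 0
    for i in range(limit):
        if condition(i):
            return {"i": i, "total": total}
        total += i * i
    return None


print(locals_when(20, lambda i: i == 10))  # prints {'i': 10, 'total': 285}
```

At the stop, total still holds the sum for i = 0..9 because the breakpoint fires before the line executes.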

The evaluation window can be used to inspect globals, locals or even execute Python expressions in the selected frame:

Evaluate Python Variables

In addition to Python frames, the stack shows a merged view of Python and native code, so the steps that led up to native code being executed are visible. For example, here is what the stack looks like when pausing in a numpy dot product, which uses BLAS under the hood. Registers can be inspected and instructions can be stepped over using Forge's assembly debugging mode:

Python Merged Stack
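
A script along the following lines could be used to reach such a merged stack. This is a sketch under assumptions: it assumes NumPy is installed and linked against a BLAS library, and the function name and matrix size are arbitrary choices, not taken from the post. If NumPy is missing, it falls back to pure Python, in which case there would be no native frames to inspect.

```python
try:
    import numpy as np  # assumption: NumPy is installed and linked against BLAS
except ImportError:
    np = None


def dot_product(n=256):
    """Multiply two n x n float64 matrices of ones and return the top-left entry."""
    if np is None:
        # Pure-Python fallback: no native frames would appear in the stack here.
        return float(sum(1.0 * 1.0 for _ in range(n)))
    a = np.ones((n, n), dtype=np.float64)
    b = np.ones((n, n), dtype=np.float64)
    c = np.dot(a, b)  # pausing inside this call is where native frames would show up
    return float(c[0, 0])


if __name__ == "__main__":
    print(dot_product())
```

Each entry of the product of two all-ones matrices equals n, which gives a simple value to check in the evaluation window.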

Performance Reports and Forge Integration

Performance Reports is now distributed with Forge as a single combined installation, launched via bin/perf-report in the Forge installation directory.

We have also given some lesser-used or behind-the-scenes binaries and scripts more appropriate names. This will not affect most users, but users of manual launch should use forge-client instead of ddt-client or allinea-client, and users of .qtf scripts should use forge-mpirun in place of ddt-mpirun.
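
For sites with many launch scripts, the renames can be applied mechanically. The following is a minimal sketch, not an Arm-provided tool: the function name update_names and the example path are hypothetical, and only the three old and new names come from the paragraph above.

```python
import re

# Old name -> new name, as listed in the paragraph above.
RENAMES = {
    "ddt-client": "forge-client",
    "allinea-client": "forge-client",
    "ddt-mpirun": "forge-mpirun",
}

# Match an old name only when it is not part of a longer word or hyphenated name.
_PATTERN = re.compile(
    r"(?<![\w-])(" + "|".join(map(re.escape, RENAMES)) + r")(?![\w-])"
)


def update_names(text):
    """Return `text` with the pre-20.1 binary names replaced by their new names."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], text)


# Hypothetical script line; the installation path is a placeholder.
print(update_names('exec /path/to/forge/bin/ddt-client "$@"'))
```

Review the result before overwriting any script, since a purely textual replacement cannot tell a command name from, say, a comment that mentions the old name on purpose.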

Revamp of GPU Metrics 

CUDA 10.2 and GPU Metrics are now supported on x86_64 and PowerPC. We have removed the "GPU Temperature" and "Time Spent in Global Memory Accesses" metrics to provide a more stable metric collection mechanism that is consistent across supported platforms.

GPU Utilization, GPU Memory Usage and GPU Power Usage are collected when the NVIDIA Management Library (NVML) is installed (https://developer.nvidia.com/nvidia-management-library-nvml). Warp Stall Reasons and Line metrics are collected using MAP's CUDA Kernel Analysis feature, which is based on CUPTI, CUDA's profiling interface. MAP supports profiling compiler-optimized code, but you must compile with the -lineinfo flag to use CUDA Kernel Analysis. Enable CUDA Kernel Analysis in the GUI's Run dialog or on the command line with --cuda-kernel-analysis. An example workflow is:

$ nvcc -O3 -g -lineinfo cuda_app.cu -o cuda_app
$ map --profile --cuda-kernel-analysis cuda_app
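
When the same two steps are repeated for several kernels, they can be scripted. The sketch below only assembles the two command lines shown above and prints them. The helper name build_commands is hypothetical, and it uses only the flags that appear in the workflow above.

```python
import shlex


def build_commands(source, binary):
    """Return the compile and profile command lines as argument lists."""
    compile_cmd = ["nvcc", "-O3", "-g", "-lineinfo", source, "-o", binary]
    profile_cmd = ["map", "--profile", "--cuda-kernel-analysis", binary]
    return compile_cmd, profile_cmd


for cmd in build_commands("cuda_app.cu", "cuda_app"):
    # Dry run: print the command; use subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell quoting problems if a file name contains spaces.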

The following is a MAP profile of CloverLeaf_CUDA on Oak Ridge National Laboratory's Summit. It demonstrates both GPU Metric collection and CUDA Kernel Analysis on PowerPC. 

New GPU Profiling Metrics

Graphical Interface Refresh

Forge has been updated to Qt 5, which means a crisper and more performant GUI, as well as bug fixes and stability improvements. In particular, macOS is better supported when in dark appearance mode.

Documentation

Developer and reference guides

  • Arm Performance Reports is a merged component of the Arm Forge product from version 20.1 onwards.
    The Arm Performance Reports user guide is now combined with the Arm Forge user guide, and is available from https://developer.arm.com/docs/101136/latest.
  • Arm License Server user guide is available from https://developer.arm.com/docs/101169.

Conclusion

Despite the very unusual times we are all experiencing, the team has been able to deliver new, innovative features. With this release, Arm is the first company to release a parallel debugger for Python that includes all the features one would expect. We look forward to hearing what you think: just click the button below!

Give us feedback

The team joins me in wishing all of you and your families the very best. Stay healthy, stay safe.

  • Kelly Peters
    1 month ago

    I actually love how this thing works. The features of the parallel debugger for Python are impressive.

    Do you have a dedicated server hosting for games?

  • a_unique_name
    over 1 year ago

    ARM DDT debugger for Python is awesome! Can you tell me where I can enter the command?

    $ bin/ddt mpirun -np 2 python3 %allinea_python_debug% python-blog.py

    After connecting to the remote machine with the ARM Forge client, DDT does not give me a terminal.
