Arm announces the release of Arm Forge 20.1, featuring enhancements and bug fixes across all products. The highlights of this release are described below.
For more in-depth information, including a full breakdown of the latest features and bug fixes, please see the release notes history.
Attach to mpi4py and serial applications by inserting %allinea_python_debug% into the command line:
bin/ddt mpirun -np 2 python3 %allinea_python_debug% python-blog.py
bin/ddt python3 %allinea_python_debug% python-blog.py
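The two launch lines above differ only in the `mpirun -np N` prefix. As a minimal sketch, a small helper can assemble either form. The function name `ddt_python_command` and its parameters are hypothetical and not part of Forge; only the command tokens themselves come from the examples above.

```python
# Hypothetical helper: builds the DDT launch line for a Python script.
# Only the tokens (bin/ddt, mpirun -np, python3, %allinea_python_debug%)
# are taken from the commands shown above.
PYTHON_DEBUG_TOKEN = "%allinea_python_debug%"


def ddt_python_command(script, nprocs=None, ddt="bin/ddt", python="python3"):
    """Return the argument list for debugging `script` under DDT.

    With nprocs=None the serial form is produced; otherwise the
    mpirun form with the given number of processes.
    """
    cmd = [ddt]
    if nprocs is not None:
        cmd += ["mpirun", "-np", str(nprocs)]
    cmd += [python, PYTHON_DEBUG_TOKEN, script]
    return cmd


if __name__ == "__main__":
    print(" ".join(ddt_python_command("python-blog.py", nprocs=2)))
    print(" ".join(ddt_python_command("python-blog.py")))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.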
The MPI version drops you into a stack starting at "import mpi4py". Step-in, step-out and step-over all work in Python code the same way as they do in C, C++ and Fortran.
The same advanced breakpoint features available for C and C++ are supported, including function-name breakpoints, conditional breakpoints and breakpoints that trigger every N hits. Setting a breakpoint in a loop with the condition i == 10 stops at that point, as can be seen from the local variables:
i == 10
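The contents of python-blog.py are not shown here, so the following is only a hypothetical stand-in: a short script with a loop on which such a conditional breakpoint could be set. The function name `accumulate` and its variables are assumptions made for illustration.

```python
# Hypothetical stand-in for a script to debug; not the original python-blog.py.
def accumulate(limit):
    """Sum the squares of 0 .. limit-1, one iteration at a time."""
    total = 0
    for i in range(limit):
        # A breakpoint on the next line with the condition i == 10
        # would stop on the eleventh iteration, before the line runs.
        square = i * i
        total += square
    return total


if __name__ == "__main__":
    print(accumulate(20))
```

When the debugger stops on the marked line with i == 10, the locals at that point would be i = 10 and total = 285 (the sum of the squares of 0 through 9).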
The evaluation window can be used to inspect globals, locals or even execute Python expressions in the selected frame:
In addition to Python frames, the stack shows a merged view of Python and native code, so the steps that led to native code being executed are visible. For example, here is what the stack looks like when pausing in a numpy dot product, which uses BLAS under the hood. Registers can be inspected and instructions can be stepped over using Forge's assembly debugging mode:
Performance Reports is now distributed with Forge as a single combined installation, launched via bin/perf-report in the Forge installation directory.
We have also renamed some lesser-used or behind-the-scenes binaries and scripts to give them more appropriate names. This will not affect most users. However, users of manual launch should use forge-client instead of ddt-client or allinea-client, and users of .qtf scripts should use forge-mpirun in place of ddt-mpirun.
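For anyone with existing launch scripts, the renames can be applied mechanically. The following is a minimal sketch of that idea; the helper `migrate_launch_line` is hypothetical and not a tool shipped with Forge, and only the three old-to-new name pairs come from the text above.

```python
import re

# Old name -> new name, as listed in the paragraph above.
RENAMES = {
    "ddt-client": "forge-client",
    "allinea-client": "forge-client",
    "ddt-mpirun": "forge-mpirun",
}

# Match an old name only when it is not part of a longer word or
# hyphenated name (for example, "my-ddt-client" is left untouched).
_PATTERN = re.compile(
    r"(?<![\w-])(" + "|".join(re.escape(name) for name in RENAMES) + r")(?![\w-])"
)


def migrate_launch_line(line):
    """Return `line` with the pre-20.1 binary names replaced by the new ones."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], line)


if __name__ == "__main__":
    # The path and arguments here are placeholders for illustration.
    print(migrate_launch_line("mpirun -np 4 /path/to/forge/bin/ddt-client ./a.out"))
```

The lookaround guards are a deliberate choice: a plain string replace would also rewrite unrelated names that merely contain one of the old names.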
CUDA 10.2 and GPU Metrics are now supported on x86_64 and PowerPC. We have removed the "GPU Temperature" and "Time Spent in Global Memory Accesses" metrics to provide a more stable metric collection mechanism that is consistent across supported platforms.
GPU Utilization, GPU Memory Usage and GPU Power Usage are collected provided the NVIDIA Management Library (NVML) is installed (https://developer.nvidia.com/nvidia-management-library-nvml). Warp Stall Reasons and Line metrics are collected using MAP's CUDA Kernel Analysis feature, which is based on CUPTI, CUDA's profiling interface.
MAP supports profiling compiler-optimized code, but you must compile with the -lineinfo flag to use CUDA Kernel Analysis. CUDA Kernel Analysis can be enabled in the GUI's Run Dialog or on the command line with --cuda-kernel-analysis. An example workflow is:
$ nvcc -O3 -g -lineinfo cuda_app.cu -o cuda_app
$ map --profile --cuda-kernel-analysis cuda_app
The following is a MAP profile of CloverLeaf_CUDA on Oak Ridge National Laboratory's Summit. It demonstrates both GPU Metric collection and CUDA Kernel Analysis on PowerPC.
Forge has been updated to Qt 5, which means a crisper and more performant GUI, as well as bug fixes and stability improvements. In particular, macOS is better supported when in dark appearance mode.
Despite the very unusual times we are all experiencing, the team has been able to deliver new, innovative features. With this release, Arm is the first company to release a parallel debugger for Python that includes all the features one would expect. We look forward to hearing what you think; just click on the button below!
Give us feedback
The team joins me in wishing all of you and your families the very best. Stay healthy, stay safe.