Profiling Python and compiled code with Arm Forge – and a performance surprise

Patrick Wohlschlegel
March 28, 2018

If you are developing HPC applications, there is a good chance that you have been in contact with Python these days. Whether you use Python to orchestrate large workflows, to quickly put together small prototypes, to visualize data or even to create actual simulations, you’ve likely either used or written Python code at some point in your day job.

Python brings many advantages, chiefly developer productivity, but it is often described as slow. Developers typically assume that most of the execution time is spent in compiled, optimized C/C++ or Fortran libraries (e.g. NumPy) that are called from Python. But is that truly the case? How confident are you that your application is not wasting your precious computing resources for the wrong reasons?

In Arm Forge and Arm Performance Reports 19.0, we have added the Python profiling capabilities you need to hunt down and resolve bottlenecks for your Python codes in the blink of an eye and at scale. Too good to be true? Let’s get to it using our profiling tool, Arm MAP!

First off, profile your application like you always have, using the following command:

map --profile mpirun -n 2 python ./demo.py

This command generates the profile information you need. Let’s open it up with the command:

map ./profile.map

[Figure: Profiling Python before Arm MAP]

If your code spends time in the Python interpreter, that time is plotted in pink in the graphical user interface. In this particular example, it quickly becomes clear that the vast majority of the execution time is spent in the Python interpreter. That’s not what we expected! In fact, an innocuous multiplication in a loop is taking most of our time. We can do better!

By simply replacing this line of code with a call to numpy.multiply(), we swap operations performed by the interpreter for a compiled library call. How does this impact the efficiency of our code? Quickly profiling the new application with Arm MAP gives the following:

[Figure: Profiling Python after Arm MAP]

What a change! We now spend only 1% of the time in the Python interpreter (down from 80.2%) and the small loop runs in a fraction of the time. Within just five minutes, we have been able to run the same code more than 10 times faster (from 41.2 seconds down to 3.6 seconds).
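
The demo source is not reproduced in this post, so the following is only a minimal sketch of the kind of change described above, not the actual demo.py. The function names, array sizes and timing helper are all hypothetical. It contrasts an element-wise multiplication done in a Python loop with the same operation delegated to numpy.multiply(), checks that both give the same result, and uses a rough wall-clock timer as a sanity check alongside a real profile.

```python
import time

import numpy as np


def multiply_in_loop(a, b):
    """Element-wise product, one element at a time in the Python interpreter."""
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = a[i] * b[i]  # each iteration is executed by the interpreter
    return out


def multiply_with_numpy(a, b):
    """Same product, delegated to NumPy's compiled implementation."""
    return np.multiply(a, b)


def best_time(func, *args, repeats=3):
    """Smallest wall-clock time in seconds over a few runs of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(before_seconds, after_seconds):
    """How many times faster the 'after' run is than the 'before' run."""
    return before_seconds / after_seconds


if __name__ == "__main__":
    # Hypothetical input data; the sizes are arbitrary.
    rng = np.random.default_rng(seed=0)
    a = rng.random(200_000)
    b = rng.random(200_000)

    # Check that both versions compute the same result before comparing speed.
    assert np.array_equal(multiply_in_loop(a, b), multiply_with_numpy(a, b))

    loop_s = best_time(multiply_in_loop, a, b)
    numpy_s = best_time(multiply_with_numpy, a, b)
    print(f"loop: {loop_s:.4f} s, numpy.multiply: {numpy_s:.4f} s")

    # Run times quoted in the post: 41.2 s before, 3.6 s after.
    print(f"speedup quoted in the post: {speedup(41.2, 3.6):.1f}x")
```

Checking that the two versions agree before timing them matters: a speedup only means something if both versions do the same work. The timings this sketch prints depend on your machine and will not match the figures from the profiled demo.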

And this is just one of the problems you can resolve. Better load balancing of large workflows orchestrated by Python frameworks, more intelligent data accesses… The pitfalls Arm Forge 19.0 can help you avoid are countless.

As usual, this feature is available on any hardware architecture. If you are interested, simply download the latest Forge and Performance Reports builds and install them on your cluster, using your existing licence. If you are not yet part of the Forge tools family, do feel free to request a temporary trial licence or give us a shout at Sales-hpc-sw@arm.com. The whole team looks forward to hearing from you.

Visit our Python tutorial for the latest hints and tips. To view our Python profiling webinar, please visit our YouTube channel.

  • KC. (over 3 years ago)

    Cool tool. I'm confused by the example though. The first program computes 5**100000 while the second one just computes 25 over and over. Why wouldn't we expect the second one to run faster?

  • Patrick Wohlschlegel (over 3 years ago, in reply to KC.)

    Very good point, the codes are not doing the same thing. I'll have a closer look. :)

  • Patrick Wohlschlegel (over 3 years ago, in reply to KC.)

    Quick heads-up: the blog post was updated with a better and (hopefully!) factually correct demo. Thanks a lot for bringing this to my attention!
