
Boosting OpenFOAM behavior with Arm Performance Reports

Florent Lebeau
November 3, 2014

OpenFOAM, developed by ESI-OpenCFD, is one of the most popular tools for developing CFD (Computational Fluid Dynamics) applications, along with ANSYS Fluent and CD-Adapco Star-CCM+.

Most modules of OpenFOAM are heavily optimized and offer little room for improvement at the code level – but surprisingly there are still many rewards that can be had by making sure that OpenFOAM makes the best of your system.

Arm Performance Reports looks inside applications and diagnoses how well they are performing and where issues might be.

In this article, we focus on OpenFOAM’s interFoam solver at small scale, on a single server.

We show how Arm Performance Reports helps to increase the efficiency of our usage and reach the highest level of performance that our machine can offer.

We'll assume you have OpenFOAM already up and running, and will use an example from the OpenFOAM tutorials: damBreak.

Analyzing our performance

This example solves a two-dimensional dam-break problem using the interFoam solver for two incompressible, isothermal and immiscible fluids.

Dam break in two dimensions using interFoam

Starting with Arm Performance Reports is very easy. Just add "perf-report" in front of the mpirun command - and you are good to go.

$ perf-report mpirun -n 8 \
       /home/allinea/OpenFOAM/OpenFOAM-2.3.0/applications/linuxGccx86_64/interFoam -parallel

NB: If you try this for yourself and see an error, you may first need to compile OpenFOAM for profiling.

This command runs the application and generates the scientific results you would normally expect. In addition, Arm Performance Reports creates an HTML file and a plain-text file containing profiling information about the run.

Here is the information displayed by Arm Performance Reports - on 8 processes from that example.

Openfoam profiling performance reports

At a glance, we have an overview of the application's behaviour in terms of communication, computation and disk access, along with more specific profiling information and hints to help us understand what could be improved.

Although the application is CPU-bound, most of the CPU time is spent on memory accesses. The report also indicates that the code is poorly vectorized. MPI communication does not look very efficient either.

There may be room for improvement here, and Arm Performance Reports prompts several questions:

  • Do we have a workload imbalance and did OpenFOAM not split the workload correctly?
  • Has the current build been compiled with the appropriate optimization options?
  • Could OpenFOAM solvers and loops be better optimized?
  • Is there a better way to start OpenFOAM?

Is my workload unbalanced?

Communication takes over 14% of the run time, which sounds high for an application running on a single server. Perhaps we should explore the workload distribution, that is, the domain decomposition. Can we get some hints as to how the mesh is split across the processes?

A good proxy for data distribution is the quantity of memory usage. Arm Performance Reports suggests a reasonable balance has been achieved:

Openfoam profiling performance reports 2

So, there's very little we can do there this time. Let's try another optimization.
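One way to put a number on "reasonable balance" is the ratio of peak to mean per-process memory usage. The sketch below is only an illustration: the function name `memory_imbalance` and the sample values are hypothetical and are not taken from the report shown here.

```python
from statistics import mean


def memory_imbalance(per_process_mib):
    """Return the peak/mean memory ratio; 1.0 means perfectly even usage."""
    if not per_process_mib:
        raise ValueError("need at least one process")
    return max(per_process_mib) / mean(per_process_mib)


# Hypothetical per-process memory for an 8-process run, in MiB.
# These numbers are made up for illustration, not read from the report.
sample = [101.0, 99.0, 100.0, 102.0, 98.0, 100.0, 101.0, 99.0]

ratio = memory_imbalance(sample)
print(f"peak/mean = {ratio:.3f}")
```

A ratio close to 1.0 suggests the mesh is split fairly evenly; a ratio well above 1.0 would point to one process holding noticeably more data than the others.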

Can I improve the processor usage?

At almost 85% of the time, processor usage is up high where we want it to be, but is it good usage?

We can see from the CPU section of the report that a lot of CPU time - 59% - is spent in memory accesses. This is very high - it's a sign that we don't have a great memory access pattern - we'd rather be spending time in floating point operations. We're suffering from poor cache usage.

We probably cannot change the vectorization achieved (that usually requires source code changes or compiler magic). However, we may be able to improve cache usage by improving spatial and temporal locality. Let's try that by increasing the number of MPI processes from 8 to 12, so that each process works on a smaller share of the mesh.
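The post does not show how the process count was changed. In OpenFOAM, a parallel case normally has to be decomposed again to match the new count: the `numberOfSubdomains` entry in `system/decomposeParDict` is updated, `decomposePar` is rerun, and the `mpirun -n` value is changed accordingly. As a hedged sketch, the helper below (hypothetical name `set_subdomains`) performs only the first of those steps on the dictionary text. The dictionary fragment is illustrative, and decomposition methods such as `simple` also need their per-direction counts adjusted by hand.

```python
import re


def set_subdomains(dict_text, n):
    """Return dict_text with its numberOfSubdomains entry set to n."""
    pattern = re.compile(r"^([ \t]*numberOfSubdomains[ \t]+)\d+([ \t]*;)",
                         re.MULTILINE)
    # A function replacement avoids ambiguity between group 1 and the digits of n.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{n}{m.group(2)}", dict_text)
    if count != 1:
        raise ValueError("expected exactly one numberOfSubdomains entry")
    return new_text


# Hypothetical decomposeParDict fragment, for illustration only.
example = """numberOfSubdomains 8;

method          scotch;
"""

updated = set_subdomains(example, 12)
print(updated)
```

The rest of the dictionary is left untouched, so the same helper can be applied to the full file contents before rerunning the decomposition.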

Let’s have a look at the memory access again by profiling the application with Arm Performance Reports.

Openfoam profiling performance reports 3

Memory accesses have dropped to 38% of CPU time. This is still high, but it brings a noticeable increase in performance: the execution time has fallen to approximately 40 seconds from 45. However, communication now represents 28% of the run time, and overall MPI communication is worse, with less bandwidth and more synchronization. In other words, some of the time saved through better cache usage is lost to longer MPI calls and poorer communication!
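To make that trade-off concrete, the sketch below splits each run into MPI and non-MPI time using the rounded figures quoted in the text (about 45 s with 14% communication on 8 processes, about 40 s with 28% on 12). Because those inputs are approximate, the results are rough estimates; the function name `breakdown` is just an illustrative choice.

```python
def breakdown(total_s, mpi_fraction):
    """Split a run time into (MPI seconds, non-MPI seconds)."""
    mpi_s = total_s * mpi_fraction
    return mpi_s, total_s - mpi_s


# Approximate figures quoted in the text.
mpi_8, rest_8 = breakdown(45.0, 0.14)    # 8 processes
mpi_12, rest_12 = breakdown(40.0, 0.28)  # 12 processes

speedup = 45.0 / 40.0
# Efficiency relative to the ideal 12/8 speedup from the extra processes.
efficiency = speedup / (12 / 8)

print(f"8 processes : MPI {mpi_8:.1f} s, other {rest_8:.1f} s")
print(f"12 processes: MPI {mpi_12:.1f} s, other {rest_12:.1f} s")
print(f"speedup {speedup:.3f}, efficiency {efficiency:.0%}")
```

On these numbers, non-MPI time falls by roughly 10 seconds while MPI time grows by roughly 5 seconds, which is why the net gain is only about 5 seconds.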

Even though we were working on one node and on a small scale, we already have a good understanding of OpenFOAM. For this data set, the limiting factors for OpenFOAM are the memory bandwidth and the communication.

What's next for my simulation?

With only two runs of OpenFOAM through Arm Performance Reports, we have been able to understand this key behavior.

In a future article, we will validate those findings in a multi-node environment. As we scale up, new questions will arise.

Exploring bottlenecks and finding improvements without touching the source code is really easy with Arm Performance Reports. With this tool, you can answer:

  • what scale is best for a given domain size and mesh resolution?
  • how should the meshes be dimensioned and spread across processes?
  • what can get the best efficiency and increase the system productivity without touching the code or the hardware?

The Arm report also forms a reference you can rely on. Hardware faults and software upgrade issues can all affect the profile and the efficiency of your applications. With Arm reports, you can track those problems down and get the best from your cluster in production. Why not take a trial of Arm Performance Reports on your CFD simulations today?
