MPI containers execution models

Geoffroy Vallee
March 18, 2020

In this article, I discuss the use of containers with the Message Passing Interface (MPI). Whatever your opinion of it, MPI still covers most High Performance Computing (HPC) workloads. Even the rise of Machine Learning does not really change that fact, since a significant fraction of machine learning frameworks rely on MPI for multi-node execution.

So, what does it mean to run MPI containers? What do I need to know, as a developer, before trying to create and run MPI containers? As a user, how can I manage my MPI containers if I am not an MPI expert? Using MPI can already be a daunting task, and adding a container runtime may seem to only make everything more complicated. And to some extent, it is true.

First, containers bring the same advantages as they do for non-MPI workloads, including help with portability, application packaging, data packaging, reproducibility, and software sharing between users.

But MPI does not understand what a container is. When I get on an HPC system, I usually get an mpirun command and most certainly a job manager. This is where you, as a developer, need to make a choice based on your goals and the configuration of the target execution platforms. To help make these choices, I describe in this article the three execution models for MPI containers that are commonly accepted by the community: hybrid (the most popular model), embedded, and host-only. For each model, I give an overview of the positive and negative implications.

Hybrid Model

This is by far the most popular model for the execution of MPI applications in containers. With the hybrid model, the mpirun command on the host, or the job manager, is used to start the container that will ultimately execute the MPI ranks. This model is called hybrid because it requires both an MPI implementation on the host and another implementation in the container image. The host MPI provides the mpirun command and, with most MPI implementations, a runtime capability on compute nodes to start ranks or containers. The MPI in the container image is used to actually run the application. Of course, this means that both MPI implementations need to be “compatible” since they need to tightly interact with each other.

For example, assuming Singularity containers and 2 MPI ranks, the command to start an MPI application looks like:

$ mpirun -np 2 singularity run ./my_hybrid_container.sif /path/to/app/in/container

This example explicitly shows that the mpirun on the host is used to start the two ranks. Also, for each rank, the command is in fact a Singularity command that first starts the container and then the rank within it.
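
To make the hybrid requirement more concrete, here is a minimal sketch of a Singularity definition file that installs its own MPI inside the image and builds the application against it. The Open MPI version, file names, and paths are illustrative assumptions only:

Bootstrap: docker
From: ubuntu:18.04

%files
    app.c /opt/src/app.c

%post
    # Hypothetical recipe: install an Open MPI inside the image and compile
    # the application against it. The version picked here must be "compatible"
    # with the MPI available on the execution hosts.
    apt-get update && apt-get install -y build-essential wget bzip2
    wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.3.tar.bz2
    tar xjf openmpi-4.0.3.tar.bz2 && cd openmpi-4.0.3
    ./configure --prefix=/usr/local && make -j 4 && make install && ldconfig
    mpicc -o /opt/app /opt/src/app.c

%runscript
    # Run whatever command is passed to "singularity run", for example /opt/app
    exec "$@"

Because the application is linked against the in-image MPI, the host side only needs to provide a compatible mpirun and runtime, which is exactly where the compatibility question below comes in.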

A good question is what “compatibility” exactly means. Ultimately, what users want to know is two-fold: if my target execution platform has MPI implementation X version Y, can I run my container that is based on MPI implementation X version Z? And if not, which version do I need to install on the host, and how am I supposed to install that specific version of MPI (meaning the configuration details)?

Given a specific implementation of MPI, for example Open MPI, there is therefore a need for compatibility matrices. For a given container runtime, the matrix shows which versions of Open MPI in a container work with which versions of Open MPI on the host. This is also where you quickly notice whether a container runtime is HPC-friendly or not. For instance, to the best of my knowledge, there is no such official compatibility matrix for Docker. On the other hand, the Singularity ecosystem includes a specific tool to automatically create such compatibility matrices. More details about this will follow in a later blog article.

Lastly, note that very few container ecosystems provide the required tools to assist developers and users. In theory, an MPI-friendly ecosystem would provide a set of tools with the following capabilities:

  • Capture how a container was created, especially how MPI is set up in the image. This ensures that all the details required to set up the host are available.
  • At runtime, use this information to validate the configuration of the host and potentially update it. For example, the tool could install a configuration of MPI that is compatible with the container to execute.

Without such a tool, it is the responsibility of the developer to track all relevant information and to assist users when they try to run their containers on various HPC platforms. To the best of my knowledge, only Singularity provides such a tool: https://sylabs.io/articles/2019/11/create-run-and-manage-your-mpi-containers-in-a-few-steps.
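
As a rough illustration of this capture-and-validate idea (this is only a sketch, not the Sylabs tool; it assumes the image records its MPI flavor and version in a hypothetical org.example.mpi label at build time), a small helper could compare the recorded MPI against what the host provides:

#!/bin/bash
# check_mpi_compat.sh (hypothetical): warn when the MPI recorded in a
# Singularity image does not match the Open MPI installed on the host.
IMAGE="$1"

# MPI captured at build time, for example "openmpi-4.0.3"
image_mpi=$(singularity inspect --labels "$IMAGE" | awk -F': ' '/org.example.mpi/ {print $2}')

# MPI visible on the host, taken from "mpirun (Open MPI) x.y.z"
host_mpi="openmpi-$(mpirun --version 2>/dev/null | awk '/\(Open MPI\)/ {print $4}')"

echo "image MPI: $image_mpi"
echo "host  MPI: $host_mpi"

if [ "$image_mpi" != "$host_mpi" ]; then
    echo "WARNING: host and image MPI differ; check the compatibility matrix" >&2
    exit 1
fi

The real tooling goes further than this, but the principle is the same: the image carries enough information about its own MPI for the host configuration to be validated, or fixed, before launch.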

Embedded Model 

The second model is called embedded. With this model, only the MPI implementation in the container is used; no MPI implementation is required on the host. This approach has the benefit of being extremely portable: the containers are self-contained and can be executed pretty much anywhere (at least from an MPI point of view). Unfortunately, this model requires a more advanced understanding of the MPI implementation to make sure that, when starting the first container, mpirun can be executed from within that container. From there, the MPI implementation starts all other containers on all target compute nodes. In other words, it is the responsibility of the developer to ensure that the MPI implementation is correctly set up for all target execution platforms. This is usually a non-trivial task, especially when problems arise and require debugging.

Assuming Singularity containers and 2 MPI ranks, the command to start an MPI application looks like:

$ singularity exec ./my_embedded_container.sif mpirun -np 2 /path/to/app/in/container

This example illustrates that a Singularity container is first started and that mpirun within the container is then executed to start the MPI ranks. It assumes that the MPI implementation is set up to guarantee that, when an MPI rank is started, a container is first started and then the rank within it. The user is responsible for ensuring this. I do not go into the technical details since they are implementation-specific. Overall, this solution is very portable but technically challenging because it requires a precise and detailed understanding of the MPI and container runtime configurations.
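
To give a flavor of what such a setup can involve, here is one possible sketch under the assumption that Open MPI is used and that it launches its remote daemons over ssh; the wrapper name and container path are hypothetical, and real deployments will differ:

#!/bin/bash
# ssh_wrapper.sh (hypothetical): Open MPI invokes this instead of plain ssh,
# so the daemon it starts on each remote node runs inside the container,
# and therefore so do the ranks that daemon spawns.
host="$1"; shift
exec ssh "$host" singularity exec /path/to/my_embedded_container.sif "$@"

The container is then started by hand on the first node, and the mpirun inside it is pointed at the wrapper:

$ singularity exec ./my_embedded_container.sif \
    mpirun --mca plm_rsh_agent ./ssh_wrapper.sh -np 2 /path/to/app/in/container

The plm_rsh_agent parameter tells Open MPI which command to use for remote launches; other MPI implementations have their own knobs, which is precisely why this model requires implementation-specific knowledge.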

Host-only Model

The last model is the host-only model. With this model, only the MPI implementation from the host is used to start and execute the MPI application in containers. This means that the application in the container image has been compiled with an MPI implementation that is “compatible” with the MPI available on the host. The term “compatible” has the same meaning as in the hybrid model. As a consequence, the container image is not as portable as with the other models. The advantage is the small size of the container, which does not need to include any MPI implementation. Instead, the MPI implementation from the host is mounted into the container and used by the application.

The following example illustrates how a host-only MPI container can be executed, assuming Singularity and 2 MPI ranks are used:

$ mpirun -np 2 singularity exec \
 -B /host/directory/where/mpi/is:/container/directory/where/mpi/is/assumed/to/be \
 ./my_hostonly_container.sif \
 /path/to/app/in/container

This example shows that the user of the container is responsible for figuring out in which directory the MPI implementation is installed on the host, and for mounting that directory into the container. As a result, this solution is potentially less portable: the container must be “prepared” for the MPI implementation available on the host. On the other hand, the image does not have to include any MPI and is therefore smaller.
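
One practical detail worth a sketch (an assumption for dynamically linked applications, not a general rule): the loader inside the container also has to find the host MPI libraries under the bind target, which Singularity lets you arrange with SINGULARITYENV_-prefixed environment variables:

$ export SINGULARITYENV_LD_LIBRARY_PATH=/container/directory/where/mpi/is/assumed/to/be/lib
$ mpirun -np 2 singularity exec \
 -B /host/directory/where/mpi/is:/container/directory/where/mpi/is/assumed/to/be \
 ./my_hostonly_container.sif \
 /path/to/app/in/container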
