“With the new generation of Arm CPUs, we discovered a chance to significantly cut processing time and increase the efficiency of our sparse solution, as well as prepare to utilize previously unexplored parallelization potential. We therefore decided to make the solver Pardiso Arm compatible.” (Prof. Olaf Schenk, Director, Panua Technologies, Lugano, Switzerland).
In the realm of computational mathematics, optimizing the performance of sparse linear solvers is essential for tackling a wide range of large-scale scientific, engineering, and data analytics problems efficiently. Pardiso, a renowned sparse direct linear solver developed by Prof. Olaf Schenk from the Università della Svizzera italiana (USI), and now distributed by the USI spin-off Panua Technologies, excels in this domain. This blog outlines the strategies utilized to enhance Pardiso's performance by leveraging the Arm architecture and presents a comparative study with Intel MKL Pardiso.
We benchmarked Pardiso on the Arm Neoverse N1-based Ampere® Altra® Max M128-30 and on Neoverse V1 against the Intel Math Kernel Library Pardiso on Intel Xeon Platinum 8360Y (“Ice Lake”) processors. In these runs, the Arm Neoverse V1 shows significant performance gains and parallelization capabilities.
Figure 1: Execution time for MKL-Pardiso and Panua-Pardiso on 64 Intel and Arm cores for the 500x500 finite uniform grid example
Figure 1 shows the execution times for solving linear systems with a matrix arising from a 500x500 finite uniform grid with 20 degrees of freedom and 10 Lagrange multipliers. We observe an improvement of up to 5.5x on Arm processors compared to Intel MKL.
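To give a feel for the kind of matrix a uniform grid produces, here is a minimal Python sketch that assembles a 5-point finite-difference stencil on an n x n grid in compressed sparse row (CSR) form, a common input layout for sparse direct solvers. It is a deliberately simplified stand-in that assumes one scalar unknown per grid node. It is not the benchmark matrix, which carries 20 degrees of freedom and 10 Lagrange multipliers, and the function name `grid_laplacian_csr` is hypothetical, not part of Pardiso.

```python
def grid_laplacian_csr(n):
    """Assemble a 5-point stencil on an n x n uniform grid in CSR form.

    Illustrative assumption: one scalar unknown per grid node.
    Returns (row_ptr, col_idx, values) using 0-based indices.
    """
    row_ptr = [0]
    col_idx = []
    values = []
    for i in range(n):              # grid row
        for j in range(n):          # grid column
            # Entries are appended in ascending column order.
            if i > 0:               # neighbour above
                col_idx.append((i - 1) * n + j)
                values.append(-1.0)
            if j > 0:               # neighbour to the left
                col_idx.append(i * n + j - 1)
                values.append(-1.0)
            col_idx.append(i * n + j)   # diagonal entry
            values.append(4.0)
            if j < n - 1:           # neighbour to the right
                col_idx.append(i * n + j + 1)
                values.append(-1.0)
            if i < n - 1:           # neighbour below
                col_idx.append((i + 1) * n + j)
                values.append(-1.0)
            row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


# Small usage example: a 4 x 4 grid gives 16 rows.
row_ptr, col_idx, values = grid_laplacian_csr(4)
print(len(row_ptr) - 1, "rows,", len(values), "non-zeros")
```

For n = 500, this simplified stencil already has 250,000 rows and 5n^2 - 4n = 1,248,000 non-zeros, so only a tiny fraction of the matrix entries is non-zero and sparse storage is the natural choice.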
Figure 2: Execution time for MKL-Pardiso and Panua-Pardiso – Automobile Sheet Forming
The same set of experiments is conducted for a matrix arising from a finite-element automobile sheet metal forming simulation, with 6 degrees of freedom, approximately 1 million rows and columns, and 23 million non-zeros. Figure 2 demonstrates a reduction of more than 5x in the execution time on Arm architectures, compared to Intel MKL.
Figure 3: Scalable components of Panua-Pardiso on Arm Neoverse V1
Finally, in Figure 3 we observe a speedup of 7x for the numerical factorization and solution process when utilizing all the available cores of a Neoverse V1 processor.
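Speedup figures like this one are typically computed as the ratio of single-core to multi-core run time, and parallel efficiency divides that ratio by the number of cores used. The following is a minimal Python sketch of that arithmetic. The timings and the core count are placeholder values chosen for illustration, not measurements from this benchmark, and the function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Return the speedup T_serial / T_parallel for two run times in seconds."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, cores):
    """Return speedup divided by the number of cores used."""
    if cores <= 0:
        raise ValueError("core count must be positive")
    return speedup(t_serial, t_parallel) / cores


# Placeholder values (assumptions, not benchmark data).
t_one_core = 14.0    # seconds on a single core
t_all_cores = 2.0    # seconds on all cores
n_cores = 64         # assumed core count

print("speedup:", speedup(t_one_core, t_all_cores))
print("efficiency:", parallel_efficiency(t_one_core, t_all_cores, n_cores))
```

Reporting efficiency alongside speedup makes it easier to compare scaling across machines with different core counts.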
By leveraging Arm architectures, Pardiso is consolidating its position as the premier solver for sparse linear systems. Exploiting advanced hardware architectures and cutting-edge mathematical software, Panua Technologies drives innovation in the field of computational mathematics. Visit www.panua.ch/ for additional information on Pardiso, including more performance comparisons and usage examples, as well as instructions on how to obtain a Panua license.
“Pardiso is an AutoForm core technology for the efficient simulation of sheet metal forming and assembly processes. With the bitwise parallel reproducible system in Pardiso, we map our important stages of our parallel numerical solution processes, from one-step simulation to more advanced finite-element calculation to higher effective areas” (Dr. Mike Selig, Development Manager, AutoForm Engineering Zurich, Switzerland).
“NXP Semiconductors have been using Pardiso solver software successfully for many years inside the circuit simulator Mica. Time is the enemy of the circuit designer engineer and priorities are speed, accuracy, ease of use, cost savings and overall efficiency as the Panua-Pardiso solver is up to 10 times faster than the competitors for NXP circuit matrices” (Prof. Matthias Bollhoefer, TU Braunschweig, Germany).