What's the difference in linking and usage between libarmpl.a and libarmpl_mp.a?
Is anything special required to link or use libarmpl_mp.a in a user's engine? In an almost identical environment, is it enough to replace Intel MKL functions with the corresponding Arm ones to achieve high performance?
Same question here, but with a different, linked issue: "performance: single thread looks OK while multi-thread poor" (Arm Community HPC forum).
It's never enough to just replace a function call. To achieve the highest performance on a particular machine, the code must be carefully tailored to that machine.
The version of the library with "_mp" in its name lets the library use OpenMP multithreading internally, whereas the version without it is a serial implementation. Which one you choose will depend on your overall application and runtime system.
* If you have a serial application but want to run cases with big matrices, and you want the library to parallelise internally across all the cores on your system, then the libarmpl_mp version would be best.
* If you have an application that is already parallelised to use all your available cores, or an MPI case where you run enough ranks to fill your system, then the serial version would be the right choice to avoid oversubscription of the cores.
* If you have a case which allows a small degree of parallelism at the application level, but not enough to fill all the cores on your system, you may choose to use nested parallelism by setting the usual OpenMP environment variables.
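The three cases above come down to simple arithmetic on cores and application-level workers. The following is only a rough sketch of that reasoning in C: `armpl_plan` and `plan_threads` are hypothetical names invented for this illustration, not part of Arm Performance Libraries, and the rule "cores divided by workers" is an assumption about how you might split the machine.

```c
/* Hypothetical helper, NOT part of Arm Performance Libraries.
 * It only illustrates the thread-budget arithmetic behind the
 * choice between the serial and the "_mp" library. */
typedef struct {
    int use_mp_library;   /* 1: the "_mp" (OpenMP) library fits, 0: the serial one */
    int threads_per_call; /* threads each library call could use without oversubscription */
} armpl_plan;

/* total_cores: cores available on the node (assumed known by the caller).
 * app_workers: threads or MPI ranks the application itself already runs. */
armpl_plan plan_threads(int total_cores, int app_workers)
{
    armpl_plan plan = {0, 1}; /* default: serial library, one thread per call */

    /* Treat nonsensical inputs as "one core" / "one worker". */
    if (total_cores < 1) total_cores = 1;
    if (app_workers < 1) app_workers = 1;

    /* Cores left for the library inside each application worker. */
    int spare = total_cores / app_workers;

    if (spare > 1) {
        /* The application does not fill the machine: let the library
         * use the remaining cores inside each worker. */
        plan.use_mp_library = 1;
        plan.threads_per_call = spare;
    }
    /* Otherwise the application already fills (or overfills) the cores,
     * so the serial library avoids oversubscription. */
    return plan;
}
```

With 64 cores, `plan_threads(64, 1)` picks the multithreaded library with 64 threads, `plan_threads(64, 64)` picks the serial one, and `plan_threads(64, 4)` gives 16 library threads per worker. That last case is the nested one: with standard OpenMP you would typically express it as something like `OMP_NUM_THREADS=4,16` together with `OMP_MAX_ACTIVE_LEVELS=2`, though the exact settings depend on your runtime.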
Yes, switching out MKL should not require any changes to your code, assuming you are sticking to the BLAS, LAPACK and FFTW interfaces. We try to ensure that the correct tuning is hidden away inside the library, so the user never needs to do anything different.
Hope this helps.