Poor DGESVD performance in ArmPL

Hello,

I discovered this issue on macOS, but it applies to Linux too. One of my workloads runs a lot slower with ArmPL than with vecLib, OpenBLAS, or even vanilla LAPACK. I did some profiling and the culprit seems to be DGESVD. My application makes a large number of calls to DGESVD: some are issued from a single thread, while others are issued in parallel from several threads, each call itself being single-threaded. I investigated one of the most common single-thread calls: it takes 1320 ns with ArmPL, 680 ns with OpenBLAS and 400 ns with LAPACK.
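To put the per-call numbers in perspective, here is a rough back-of-the-envelope sketch of the cumulative time spent in that one call. It assumes that the timed call is the 3x3 `S S` case and that its 61,891,934 compute calls (the count in the single-thread summary in this post) each cost the quoted time; the workspace queries are ignored. The names `NS_PER_CALL`, `CALLS` and `total_seconds` are made up for this illustration.

```python
# Per-call timings quoted in the post, in nanoseconds.
NS_PER_CALL = {"ArmPL": 1320, "OpenBLAS": 680, "LAPACK": 400}

# Assumption: the timed call is the 3x3 JOBU=S, JOBVT=S case, whose
# compute call appears 61,891,934 times in the single-thread summary.
CALLS = 61_891_934


def total_seconds(ns_per_call: int, calls: int = CALLS) -> float:
    """Cumulative wall time in seconds for `calls` calls of `ns_per_call` ns each."""
    return ns_per_call * calls * 1e-9


if __name__ == "__main__":
    baseline = NS_PER_CALL["LAPACK"]
    for lib, ns in NS_PER_CALL.items():
        ratio = ns / baseline
        print(f"{lib:9s} {total_seconds(ns):6.1f} s  ({ratio:.2f}x reference LAPACK)")
```

Under these assumptions the totals come out to roughly 82 s for ArmPL, 42 s for OpenBLAS and 25 s for reference LAPACK, so the difference is clearly visible at the application level.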

Below is a summary of the calls. The first column is the number of calls; the remaining columns are the input arguments, excluding the input arrays:

Single-thread
----------------------------------
61891934 DGESVD S S 3 3 3 3 3 -1 0
61891934 DGESVD S S 3 3 3 3 3 6 0
     420 DGESVD N N 1 1 1 1 1 -1 0
     420 DGESVD N N 1 1 1 1 1 5 0
     840 DGESVD N N 2 2 2 1 1 -1 0
     840 DGESVD N N 2 2 2 1 1 10 0
     392 DGESVD N N 3 3 3 1 1 -1 0
     392 DGESVD N N 3 3 3 1 1 15 0
      84 DGESVD N N 3 6 3 1 1 -1 0
      84 DGESVD N N 3 6 3 1 1 15 0
      56 DGESVD N N 3 9 3 1 1 -1 0
      56 DGESVD N N 3 9 3 1 1 18 0
      84 DGESVD N N 4 4 4 1 1 -1 0
      84 DGESVD N N 4 4 4 1 1 20 0
      56 DGESVD N N 8 8 8 1 1 -1 0
      56 DGESVD N N 8 8 8 1 1 40 0

Multi-thread
----------------------------------
   1204 DGESVD N N 3 3 3 1 1 -1 0
   1204 DGESVD N N 3 3 3 1 1 15 0
    252 DGESVD N N 2 2 2 1 1 -1 0
    252 DGESVD N N 2 2 2 1 1 10 0
    119 DGESVD N S 3 3 3 1 3 -1 0
    119 DGESVD N S 3 3 3 1 3 6 0
     84 DGESVD N N 3 2 3 1 1 -1 0
     84 DGESVD N N 3 2 3 1 1 10 0
     84 DGESVD N N 3 6 3 1 1 -1 0
     84 DGESVD N N 3 6 3 1 1 15 0
     56 DGESVD N N 3 9 3 1 1 -1 0
     56 DGESVD N N 3 9 3 1 1 18 0
     35 DGESVD N N 1 1 1 1 1 -1 0
     35 DGESVD N N 1 1 1 1 1 5 0
     49 DGESVD N S 2 2 2 1 2 -1 0
     49 DGESVD N S 2 2 2 1 2 4 0
     42 DGESVD N N 8 8 8 1 1 -1 0
     42 DGESVD N N 8 8 8 1 1 40 0
      7 DGESVD S S 9 10 9 9 9 -1 0
      7 DGESVD S S 9 10 9 9 9 45 0
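For anyone who wants to reproduce these calls, the rows above can be decoded with a small helper like the one below. This is only a sketch: it assumes the columns follow the standard DGESVD argument order with the arrays removed (JOBU, JOBVT, M, N, LDA, LDU, LDVT, LWORK, INFO), which is consistent with the values shown but is an inference about the log format. `GesvdCall` and `parse_row` are hypothetical names used for illustration only.

```python
from typing import NamedTuple


class GesvdCall(NamedTuple):
    """One summary row: call count plus the non-array DGESVD arguments."""
    count: int
    jobu: str
    jobvt: str
    m: int
    n: int
    lda: int
    ldu: int
    ldvt: int
    lwork: int
    info: int

    @property
    def is_workspace_query(self) -> bool:
        # In LAPACK, LWORK = -1 requests the optimal workspace size
        # instead of computing the decomposition.
        return self.lwork == -1


def parse_row(row: str) -> GesvdCall:
    """Parse a row such as '61891934 DGESVD S S 3 3 3 3 3 -1 0'."""
    count, routine, jobu, jobvt, *ints = row.split()
    if routine != "DGESVD" or len(ints) != 7:
        raise ValueError(f"unexpected row format: {row!r}")
    return GesvdCall(int(count), jobu, jobvt, *map(int, ints))


if __name__ == "__main__":
    call = parse_row("61891934 DGESVD S S 3 3 3 3 3 -1 0")
    print(call, call.is_workspace_query)
```

Each shape appears twice with the same count because every compute call is preceded by a workspace query (LWORK = -1).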

It would be nice to have this fixed. Thanks!

  • Hi,

    Thanks for the feedback. These are *very* small problems, and I think what you are seeing is the overhead of setting up and dispatching to the more optimized code paths intended for larger problems, which is why ArmPL and OpenBLAS are actually slower than reference LAPACK for these problems. We'll take a look, thanks!

    Chris.