I recently compared the sparse matrix-vector multiplication (SpMV) performance of the ArmPL library against a native implementation.
For a sparse matrix in CSR format with dimensions 16M x 16M, the armpl_spmv_exec_d function performs the same as the native implementation.
Is that expected?
The code is compiled with Arm Compiler for Linux (ACfL) using the flags '-Ofast -mcpu=native' to enable SVE and fast math.
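For readers following along, a native CSR SpMV kernel of the kind described above might look like the minimal sketch below. It is not the poster's actual code: the function name csr_spmv_d, its argument order, and the use of size_t indices are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical native kernel: y = alpha * A * x + beta * y.
 * A has n_rows rows and is stored in CSR format:
 *   row_ptr has n_rows + 1 entries,
 *   col_idx and vals each have row_ptr[n_rows] entries. */
void csr_spmv_d(size_t n_rows, const size_t *row_ptr, const size_t *col_idx,
                const double *vals, double alpha, const double *x,
                double beta, double *y)
{
    assert(row_ptr && col_idx && vals && x && y);

    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        /* Dot product of row i with x, visiting only the stored nonzeros. */
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += vals[k] * x[col_idx[k]];
        y[i] = alpha * sum + beta * y[i];
    }
}
```

A loop like this reads x through col_idx, so its speed usually depends on memory traffic and on the sparsity pattern of the matrix rather than on arithmetic throughput alone.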
Thanks for the suggestions.
That's what I have tried. Depending on the characteristics of the sparse matrix, I saw a 10% - 25% improvement with the armpl function.
Thanks for letting us know. Glad to hear you get a benefit.