I recently compared Sparse matrix vector multiplication (SpMV) performance of armpl library and native implementation.
The performance of armpl_spmv_exec_d function is the same as native implementation for a sparse matrix in CSR format with dimension of 16M x 16M.
Is that expected?
The compilation is with arm compiler for linux (acfl) with flag '-Ofast -mcpu=native' to enable sve and fast math.
-Ofast -mcpu=native' to enable sve and fast math.
Thanks for letting us know. Glad to hear you get a benefit.