I recently compared the sparse matrix-vector multiplication (SpMV) performance of the Arm Performance Libraries (ArmPL) against a native implementation.

For a sparse matrix in CSR format with dimensions 16M x 16M, the `armpl_spmv_exec_d` function performs the same as the native implementation.
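For context, a native CSR SpMV of the kind compared here is typically a plain row-by-row loop. The following is only a minimal sketch, not the exact code that was benchmarked: the function name `csr_spmv`, the array names `row_ptr`, `col_idx`, and `vals`, the 0-based indexing, and the `size_t` index type are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y = A * x for a matrix A with n_rows rows stored in CSR format.
   Hypothetical helper for illustration; assumes 0-based indices. */
void csr_spmv(size_t n_rows,
              const size_t *row_ptr,  /* n_rows + 1 entries */
              const size_t *col_idx,  /* one entry per nonzero */
              const double *vals,     /* one entry per nonzero */
              const double *x,
              double *y)
{
    assert(row_ptr && col_idx && vals && x && y);
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        /* The nonzeros of row i live in [row_ptr[i], row_ptr[i + 1]). */
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            sum += vals[k] * x[col_idx[k]];  /* indirect (gather) load of x */
        }
        y[i] = sum;
    }
}
```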

Is that expected?

The code is compiled with Arm Compiler for Linux (ACfL) using the flags `-Ofast -mcpu=native` to enable SVE and fast math.