Sparse matrix vector multiplication performance

I recently compared the sparse matrix-vector multiplication (SpMV) performance of the Arm Performance Libraries (armpl) against a native implementation.

For a sparse matrix in CSR format with dimensions of 16M x 16M, the armpl_spmv_exec_d function performs the same as the native implementation.

Is that expected?

Both were compiled with Arm Compiler for Linux (ACfL) using the flags '-Ofast -mcpu=native' to enable SVE and fast math.

 
  • Hi.

    The Arm Performance Libraries SpMV implementation is specifically optimized for cases where the matrix is reused many times, rather than for a single invocation.  This means that more time may be spent in the initial "armpl_spmv_optimize()" phase.  The key is to use the 'hints' system to tell the library that this optimization is worthwhile.  This is demonstrated in the example at:

            https://developer.arm.com/documentation/101004/2202/Sparse-Linear-Algebra/Example-of-SpMV-usage

    with the line 

            info = armpl_spmat_hint(armpl_mat, ARMPL_SPARSE_HINT_SPMV_INVOCATIONS, ARMPL_SPARSE_INVOCATIONS_MANY);
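
    For context, that hint sits inside a create / hint / optimize / execute / destroy sequence. The following is a minimal sketch of that sequence, not the library's reference code: the helper name spmv_repeated, the idx_t typedef and the USE_ARMPL macro are made up for illustration, and the ArmPL calls follow my reading of the linked example, so check them against your installed version. Built without -DUSE_ARMPL it falls back to a plain CSR loop so that it still compiles and runs anywhere.

```c
#include <assert.h>
#ifdef USE_ARMPL
#include <armpl.h>
typedef armpl_int_t idx_t;
#else
typedef int idx_t;

/* Plain CSR kernel computing y = A*x; only used when USE_ARMPL is not defined. */
static void csr_spmv(idx_t m, const idx_t *row_ptr, const idx_t *col_idx,
                     const double *vals, const double *x, double *y)
{
    for (idx_t i = 0; i < m; ++i) {
        double sum = 0.0;
        for (idx_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += vals[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
#endif

/* Hypothetical helper: compute y = A*x `reps` times for an m x n CSR matrix.
   Returns 0 on success and 1 if any library call reports an error. */
static int spmv_repeated(idx_t m, idx_t n, const idx_t *row_ptr,
                         const idx_t *col_idx, const double *vals,
                         const double *x, double *y, int reps)
{
    assert(m > 0 && n > 0 && reps > 0);
#ifdef USE_ARMPL
    armpl_spmat_t A;
    /* Wrap the user's CSR arrays in a library matrix handle (flags = 0). */
    armpl_status_t info =
        armpl_spmat_create_csr_d(&A, m, n, row_ptr, col_idx, vals, 0);
    if (info != ARMPL_STATUS_SUCCESS)
        return 1;

    /* Tell the library which operation will run, and that it will run many times. */
    info = armpl_spmat_hint(A, ARMPL_SPARSE_HINT_SPMV_OPERATION,
                            ARMPL_SPARSE_OPERATION_NOTRANS);
    if (info == ARMPL_STATUS_SUCCESS)
        info = armpl_spmat_hint(A, ARMPL_SPARSE_HINT_SPMV_INVOCATIONS,
                                ARMPL_SPARSE_INVOCATIONS_MANY);

    /* One-off optimization step; its cost is amortized over the loop below. */
    if (info == ARMPL_STATUS_SUCCESS)
        info = armpl_spmv_optimize(A);

    /* y = 1.0 * A * x + 0.0 * y, repeated `reps` times. */
    for (int r = 0; r < reps && info == ARMPL_STATUS_SUCCESS; ++r)
        info = armpl_spmv_exec_d(ARMPL_SPARSE_OPERATION_NOTRANS,
                                 1.0, A, x, 0.0, y);

    armpl_spmat_destroy(A);
    return info == ARMPL_STATUS_SUCCESS ? 0 : 1;
#else
    (void)n; /* column count is only needed by the library path */
    for (int r = 0; r < reps; ++r)
        csr_spmv(m, row_ptr, col_idx, vals, x, y);
    return 0;
#endif
}
```

    When benchmarking, it is probably worth timing the optimize call separately from the exec loop, since the first is paid once and the second is what gets repeated.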

    If you make just a single call to "armpl_spmv_exec_d()" without first calling "armpl_spmv_optimize()", the library assumes that only one call will be made, and hence a standard CSR method will be used.  With a matrix of the size you describe you will then be limited by the memory bandwidth available on your system, so similar performance is to be expected.  After an optimize call we tend to see performance improvements of between 10% and 2x, but this depends on the sparsity structure of your matrix.  There are occasional cases where CSR performance is as good as is achievable, but those cases have very unfriendly sparsity structures.
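
    To make the memory bandwidth point concrete, here is a rough back-of-the-envelope sketch. It assumes 8-byte values, 4-byte indices, and that x and y are each moved through memory exactly once, which is optimistic for x. The function names are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Lower bound on the bytes one CSR y = A*x has to move through memory.
   Assumptions: 8-byte doubles, 4-byte indices, x read once, y written once. */
static uint64_t csr_min_bytes(uint64_t m, uint64_t n, uint64_t nnz)
{
    uint64_t matrix  = nnz * (8 + 4) + (m + 1) * 4; /* vals + col_idx + row_ptr */
    uint64_t vectors = n * 8 + m * 8;               /* read x, write y */
    return matrix + vectors;
}

/* Best-case time in seconds for one SpMV at a sustained bandwidth in bytes/s. */
static double csr_min_seconds(uint64_t m, uint64_t n, uint64_t nnz,
                              double bytes_per_second)
{
    assert(bytes_per_second > 0.0);
    return (double)csr_min_bytes(m, n, nnz) / bytes_per_second;
}
```

    The thread gives neither the number of nonzeros nor the machine's bandwidth, so purely as an illustration: 16,000,000 rows and columns with 10 nonzeros per row comes to about 2.24 GB per SpMV, which at an assumed 100 GB/s is roughly 22 ms. Any kernel that streams the matrix in plain CSR form runs into the same floor, which is consistent with the two implementations timing about the same.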

    Hope this helps.

    Chris
