Hi,
I'm researching how performance differs between sparse storage formats. I'm using the Arm PL sparse API to benchmark SpMV (sparse matrix/dense vector multiplication) across formats. I see that COO, CSR, CSC, and BSR are supported. Ideally I would benchmark all of them, but I've had trouble getting some of them to work, and it's hard to debug because the library is closed source.
Note that I am not calling the armpl_spmv_optimize API function. If I understand correctly, it automatically converts the matrix to an optimal format based on the sparse matrix structure or previous runs. Since my research focuses on comparing formats, I skip this call so that it does not skew my observations.
Unfortunately, I get segmentation faults in armpl_spmv_exec_s for the CSC and BSR formats. COO and CSR work flawlessly, but when I convert the same code to CSC or BSR, the call to armpl_spmv_exec_s crashes. I can't see what is going on inside the library, so I wanted to ask whether any developer has insight into this error. Could it be that BSR or CSC is not fully supported for SpMV?
I benchmarked a subset of matrices from the SuiteSparse Matrix Collection in single precision, making sure that each matrix fits in memory. The subset runs successfully in CSR and COO but fails when I switch to CSC or BSR. I will note, however, that the same matrices work in all four formats (CSR, CSC, COO, BSR) with NVIDIA's cuSPARSE. The platform I am benchmarking on is an NVIDIA Jetson AGX Xavier, which has an 8-core ARMv8 CPU.
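When a closed-source library crashes on CSC input, one thing that can be ruled out on the caller's side is a malformed set of arrays. Below is a minimal sketch in plain C, not using Arm PL at all: check_csc and spmv_csc_ref are hypothetical helper names, and the sketch assumes zero-based indexing and 32-bit indices. It checks that the column pointers and row indices are consistent, and computes a reference y = A*x that library results can be compared against.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Arm PL): sanity-check the CSC arrays of an
 * m x n matrix. Assumes zero-based indexing. Returns 0 if the arrays are
 * consistent, or a negative code naming the first problem found. */
int check_csc(int32_t m, int32_t n, const int32_t *col_ptr,
              const int32_t *row_idx)
{
    if (m < 0 || n < 0 || col_ptr == NULL) return -1;
    if (col_ptr[0] != 0) return -2;                 /* must start at 0 */
    for (int32_t j = 0; j < n; ++j) {
        if (col_ptr[j + 1] < col_ptr[j]) return -3; /* must not decrease */
    }
    int32_t nnz = col_ptr[n];
    if (nnz > 0 && row_idx == NULL) return -1;
    for (int32_t k = 0; k < nnz; ++k) {
        if (row_idx[k] < 0 || row_idx[k] >= m) return -4; /* row out of range */
    }
    return 0;
}

/* Reference single-precision y = A*x for a CSC matrix, written for clarity
 * rather than speed. y must have room for m floats, x must hold n floats. */
void spmv_csc_ref(int32_t m, int32_t n, const int32_t *col_ptr,
                  const int32_t *row_idx, const float *vals,
                  const float *x, float *y)
{
    for (int32_t i = 0; i < m; ++i) y[i] = 0.0f;
    for (int32_t j = 0; j < n; ++j) {
        /* Scatter column j, scaled by x[j], into the output vector. */
        for (int32_t k = col_ptr[j]; k < col_ptr[j + 1]; ++k) {
            y[row_idx[k]] += vals[k] * x[j];
        }
    }
}
```

If check_csc returns 0 and the reference result is sane, the input arrays are at least structurally valid, which points the investigation toward the library or the way it is called.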
Hi Brian,
Sorry to hear you've run into trouble. Could you let me know which build of Arm PL you're using: the version, the compiler (and compiler version), whether you're linking to the int32 or int64 library, and whether you're linking to the serial or OpenMP build? If you could, please also share your link line.
Thanks, Chris.
Hi Chris,
Thanks for the prompt response. Here is the info you requested.
Thanks, that all looks sensible to me. We released a new version of the libraries a few weeks ago (23.04) so you could try that. However, we didn't have any bug reports to fix in this area between 22.1 and 23.04, so it's an outside chance that you'd see a different result.
https://developer.arm.com/downloads/-/arm-performance-libraries
Assuming you see the same...
Do all of the CSC, BSR tests fail, or is it just a subset of your selection?
And is there any chance you could send a reproducer to support-hpc-sw@arm.com so that we can investigate further?
Hi, from what I've seen so far, it seems to be a subset of my selection. I'm currently blocked from debugging further because I'm running some other benchmarks, but I'll give the new version a try. I'll also try to send a reproducer to the email address you gave. It will take some time to refactor my current code since it's a large project.
Thanks Brian. Received, and we'll be in touch.
This was found to be a problem with integer overflow in the versions of Arm PL which take 32-bit integers, i.e. "lp64" or "int32" libs. A workaround is to switch to using 64-bit integers and link to the "ilp64" or "int64" libs. We have fixed the issue in the int32 libs in the ACfL/Arm PL 23.04.1 release.
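The reply above does not say which quantity overflowed inside the library. Purely as an illustration of this kind of failure, here is a sketch in plain C that assumes the problem value is a product of two 32-bit sizes (for example, two matrix dimensions). product_fits_int32 is a hypothetical helper, not an Arm PL function. It does the multiplication in 64 bits so that the check itself cannot overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not an Arm PL function: returns true when a * b can be
 * represented in a 32-bit signed integer. The product is formed in 64 bits,
 * which is always wide enough for two 32-bit operands. */
bool product_fits_int32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return wide >= INT32_MIN && wide <= INT32_MAX;
}
```

For example, product_fits_int32(50000, 50000) is false, because 2,500,000,000 exceeds INT32_MAX (2,147,483,647). A benchmark harness could run a check like this over its matrix sizes to see which inputs are candidates for the int64 workaround.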