Hi everyone,
I'm currently implementing a sparse daxpyi operation on an aarch64 platform, using the inspector-executor pattern.
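For context, daxpyi here means the Sparse BLAS level-1 update y[indx[i]] += alpha * x[i] over the n stored entries of x. A minimal reference sketch, assuming zero-based indices that are all valid positions in y (the name `daxpyi_ref` is only for illustration), which can also serve as a correctness baseline:

```cpp
// Reference daxpyi: y[indx[i]] += alpha * x[i] for i in [0, n).
// Assumption: indices are zero-based and every indx[i] is a valid position in y.
void daxpyi_ref(int n, double alpha, const double *x, const int *indx, double *y)
{
    // Nothing to update for an empty vector or a zero scale factor
    if (n <= 0 || alpha == 0.0) {
        return;
    }

    // Scatter-add each stored entry of x into the dense vector y
    for (int i = 0; i < n; ++i) {
        y[indx[i]] += alpha * x[i];
    }
}
```

Entries of y whose positions do not appear in indx are left untouched.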
I compared three implementations:
Results:
I'm wondering:
Thanks in advance for any insights or suggestions! If helpful, I can share more details.
#include <algorithm>  // std::max_element
#include <armpl.h>

void arm_daxpyi2(const int n, const double alpha, const double *x, const int *indx, double *y)
{
    // Early return if there is nothing to do: alpha is zero, or the index
    // array is empty (std::max_element would return the end iterator)
    if (n <= 0 || alpha == 0.0) {
        return;
    }

    // Dimension of the sparse vector: largest index in indx, plus one
    const int full_size = *std::max_element(indx, indx + n) + 1;

    // Create sparse vector descriptor for x
    armpl_spvec_t spvec_x;
    armpl_status_t status = armpl_spvec_create_d(&spvec_x,   // Pointer to sparse vector object to create
                                                 0,          // Index base (0 for C-style indexing)
                                                 full_size,  // Dimension of the sparse vector
                                                 n,          // Number of non-zero elements
                                                 indx,       // Array of indices
                                                 x,          // Array of non-zero values
                                                 0           // Flags (currently unused)
    );
    if (status != ARMPL_STATUS_SUCCESS) {
        // Handle error
        return;
    }

    // Execute the sparse vector operation: y = alpha*x + beta*y
    // Use beta = 1.0 to keep the existing values in y
    const double beta = 1.0;
    status = armpl_spaxpby_exec_d(alpha,    // alpha coefficient
                                  spvec_x,  // sparse vector x
                                  beta,     // beta coefficient
                                  y         // dense vector y (input/output)
    );
    if (status != ARMPL_STATUS_SUCCESS) {
        // Handle error
        armpl_spvec_destroy(spvec_x);
        return;
    }

    // Clean up
    armpl_spvec_destroy(spvec_x);
}
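Note that this wrapper builds and destroys the descriptor on every call, so the inspection work (including the max_element scan) is paid each time. A library-independent sketch of keeping the two phases separate, assuming the index pattern stays fixed across calls; `AxpyPlan`, `inspect`, and `execute` are hypothetical names for illustration, not ArmPL API:

```cpp
#include <algorithm>
#include <stdexcept>

// Hypothetical plan object: holds what the inspection phase learned.
struct AxpyPlan {
    int n;              // number of stored (non-zero) entries
    int required_size;  // minimum length of y: max index + 1 (0 if n == 0)
    const int *indx;    // borrowed pointer; must outlive the plan
};

// Inspector: one O(n) pass over the index array, done once per pattern.
AxpyPlan inspect(int n, const int *indx)
{
    if (n < 0) {
        throw std::invalid_argument("n must be non-negative");
    }
    if (n == 0) {
        return AxpyPlan{0, 0, indx};
    }
    auto mm = std::minmax_element(indx, indx + n);
    if (*mm.first < 0) {
        throw std::invalid_argument("negative index");
    }
    return AxpyPlan{n, *mm.second + 1, indx};
}

// Executor: y[indx[i]] += alpha * x[i], reusing the plan across calls.
void execute(const AxpyPlan &plan, double alpha, const double *x, double *y)
{
    if (alpha == 0.0) {
        return;
    }
    for (int i = 0; i < plan.n; ++i) {
        y[plan.indx[i]] += alpha * x[i];
    }
}
```

With this split, `inspect` runs once and `execute` can be called repeatedly with different alpha or x values, which is the usage the inspector-executor pattern is meant to reward.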