I am trying to solve a batch of linear systems using QR factorization.The steps I follow are: 1) Assemble matrix and right-hand sides, 2) interleave with dge_interleave, 3) A P = QR with dgeqrff_interleave_batch, 4) B := Q^T B with dormqr_interleave_batch, 5) solve R X = B with dtrsm_interleave_batch. Now I need to apply the row (?) permutation to get the true X. I have tried the following process
// The solution X is now sitting in the B_p buffer. // The interleaved pivot vectors P**i are in jvpt_p. std::vector<double> col_B(m*nrhs); // regular column-major array for (int i = 0; i < ninter; i++) { // Deinterleave ARMPL_CHECK( armpl_dge_deinterleave(ninter, i, m, nrhs, col_B.data(), 1, m, B_p, istrd_B, jstrd_B)); // Permute LAPACKE_dlaswp(LAPACK_COL_MAJOR, nrhs, col_B.data(), m, 0, m-1, jpvt_p, istrd_jpvt); // Print the result vector (first right-hand side only) for (int row = 0; row < m; row++) { std::cout << col_B[row] << '\n'; } }
It looks like I have resolved the issue - jpvt is a simple permutation array.(It is not series of swap as used by the DGETRF/DGETRS routines, see LAPACK discussion, so DLASWP can not be used here). The following row swapping code worked for me:
std::vector<double> final_X(m*nrhs); for (int j = 0; j < m; j++) { // Retrieve k int k = jpvt[j * ninter + i]; // Copy all RHS for this row for (int rhs = 0; rhs < nrhs; rhs++) { final_X[k + rhs * m] = col_B[j + rhs * m]; } }