I am trying to solve a batch of linear systems using QR factorization. The steps I follow are: 1) assemble the matrices and right-hand sides, 2) interleave with dge_interleave, 3) factorize A P = QR with dgeqrff_interleave_batch, 4) form B := Q^T B with dormqr_interleave_batch, 5) solve R X = B with dtrsm_interleave_batch. Now I need to apply the row (?) permutation to get the true X. I have tried the following process:
    // The solution X is now sitting in the B_p buffer.
    // The interleaved pivot vectors P**i are in jpvt_p.

    std::vector<double> col_B(m*nrhs); // regular column-major array

    for (int i = 0; i < ninter; i++) {

        // Deinterleave
        ARMPL_CHECK(
            armpl_dge_deinterleave(ninter, i,
                                   m, nrhs, col_B.data(), 1, m,
                                   B_p, istrd_B, jstrd_B));

        // Permute
        LAPACKE_dlaswp(LAPACK_COL_MAJOR, nrhs, col_B.data(), m,
                       0, m-1, jpvt_p, istrd_jpvt);

        // Print the result vector (first right-hand side only)
        for (int row = 0; row < m; row++) {
            std::cout << col_B[row] << '\n';
        }
    }
Hi,
Many thanks for your questions about our interleaved-batch functions in ArmPL.
It seems that you've managed to fix all your immediate problems. We'll add a note to the documentation for `jpvt` to point out that it is not compatible with LAPACK's `LASWP`. Your assumption about 0-based indexing is correct as well. We do not provide Fortran interfaces for the interleaved-batch functions so there is no support for 1-based indexing. We will also document that, if these functions are called from Fortran (via the C interface), then the indexing is still 0-based.
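For anyone reading later, here is a rough sketch of what the permutation step might look like once `LASWP` is out of the picture. Since A P = Q R, solving R Y = Q^T B gives Y = P^T X, so X = P Y. The sketch assumes that `jpvt` follows the `DGEQP3` convention (column j of A P is column `jpvt[j]` of A) with 0-based indices, and that the pivots for one matrix have already been copied out of the interleaved buffer into a plain array. The helper name `apply_column_pivots` is made up for illustration and is not part of ArmPL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an ArmPL function.
// y    : n x nrhs column-major solution of R Y = Q^T B for a single matrix.
// jpvt : 0-based pivots for that matrix, ASSUMED to mean
//        (A P)(:, j) == A(:, jpvt[j]), as in LAPACK's DGEQP3.
// Returns X = P Y, i.e. row jpvt[j] of X is row j of Y.
std::vector<double> apply_column_pivots(const std::vector<double>& y,
                                        int n, int nrhs,
                                        const std::vector<int>& jpvt)
{
    assert(n >= 0 && nrhs >= 0);
    assert(y.size() == static_cast<std::size_t>(n) * nrhs);
    assert(jpvt.size() == static_cast<std::size_t>(n));

    std::vector<double> x(y.size());
    for (int rhs = 0; rhs < nrhs; ++rhs) {
        const std::size_t col = static_cast<std::size_t>(rhs) * n;
        for (int j = 0; j < n; ++j) {
            const int dest = jpvt[j];
            assert(dest >= 0 && dest < n); // guards against 1-based input
            x[col + dest] = y[col + j];    // scatter, not a sequence of swaps
        }
    }
    return x;
}
```

In the square case from the question, `n` equals `m`, and the function would be called once per deinterleaved matrix in place of the `LAPACKE_dlaswp` call. The result goes into a separate array because an index permutation like this cannot safely be applied in place with a simple loop.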
We'll also add an example in the future. We're about to release a new version of the library, so the example won't make that release, but we can add one for the following release later in the year.
We do recommend running some benchmarks to determine the best interleaving factor for your application on different target systems. Multiples of the vector width make sense, and for the two machines you mentioned this is 128 bits (i.e. 2 doubles). The blog I wrote when we released these functions shows some of the types of graphs you might want to produce to determine the best factor: "Introducing interleave-batched linear algebra functions in Arm PL".
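A minimal sketch of such a benchmark harness, under the assumption that you wrap your own interleave, factorize and solve pipeline in a callable for each candidate factor. Both function names here are hypothetical and nothing in the block calls ArmPL itself.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Candidate interleaving factors: multiples of the number of doubles
// that fit in one vector register (2 for a 128-bit vector).
std::vector<int> candidate_interleave_factors(int vector_bits, int max_factor)
{
    const int doubles_per_vector = vector_bits / 64;
    std::vector<int> factors;
    if (doubles_per_vector <= 0) return factors;
    for (int f = doubles_per_vector; f <= max_factor; f += doubles_per_vector) {
        factors.push_back(f);
    }
    return factors;
}

// Best-of-N wall-clock time in seconds for one run of the callable.
// Taking the minimum reduces noise from other activity on the machine.
template <class F>
double best_time_seconds(F&& run, int repeats)
{
    assert(repeats > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        run();
        const auto t1 = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(t1 - t0).count();
        if (best < 0.0 || s < best) best = s;
    }
    return best;
}
```

One way to use it is to loop over `candidate_interleave_factors(128, 32)`, time the full batch solve for each value of `ninter` with `best_time_seconds`, and plot time per matrix against the factor for each target system.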
Remember that these functions are intended for use when solving many very small problems, so as you approach individual problem sizes of 100 you might find it better to use the LAPACK function `DGEQRF` at some point.
Hope that helps, please let us know if you have any more feedback or need any help.
Regards,
Chris.