ARM Zynq Cortex-A53: implementing complex matrix inversion

Hello,

I am developing embedded software for image processing on a Zynq MPSoC Cortex-A53 (Armv8-A), and I need some help developing a specific algorithm.

The algorithm involves many FFT and matrix computations. Our highest priority is to implement the inversion of a complex matrix, with dimensions up to 30x30. (By complex matrix, I mean a matrix of floating-point numbers with real and imaginary parts.)

The most significant constraint is obviously timing: we usually develop our algorithms with ARM NEON SIMD to make them faster.

Consequently, I am looking for an ARM-compatible library to help me develop this complex matrix inversion using ARM NEON.

I have not found any library that satisfies both of these constraints:

- Matrix inversion with complex numbers.

- Use of ARM NEON.

For example, I have looked at the Ne10 library, but it provides matrix inversion only for real numbers, not for complex ones.

Do you know of a library (using ARM NEON) that I should look at to help me develop this complex matrix inversion?

Thank you in advance,

Laurent BOUCHOT.

Chris Armstrong over 6 years ago

Hi Laurent,

In ArmPL, the functions cgetri/zgetri return the inverse of a complex matrix (single/double precision). They work from an LU factorization (computed by cgetrf/zgetrf), which in turn uses gemm (matrix multiplication), which is heavily optimized (including with NEON instructions), so the performance should be good. Let us know if not!

www.netlib.org/.../group__complex16_g_ecomputational_gab490cfc4b92edec5345479f19a9a72ca.html
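
To make the "LU factorization, then inverse" approach concrete, here is a minimal plain C++ sketch of the same idea. It is only a reference for checking results, not ArmPL's implementation and not NEON-optimized; the helper name `invert_complex` and the row-major storage are assumptions made for illustration. With a LAPACK library, the equivalent would be a call to zgetrf followed by zgetri.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper (not part of ArmPL or LAPACK): inverts the n x n
// complex matrix stored row-major in `a`, in place.
// Returns false if a zero pivot is found (matrix is singular).
bool invert_complex(std::vector<cplx>& a, std::size_t n) {
    std::vector<std::size_t> perm(n);
    for (std::size_t i = 0; i < n; ++i) perm[i] = i;

    // Step 1: LU factorization with partial pivoting, P*A = L*U.
    // L (unit diagonal) and U overwrite `a`.
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::abs(a[i * n + k]) > std::abs(a[p * n + k])) p = i;
        if (std::abs(a[p * n + k]) == 0.0) return false;
        if (p != k) {
            for (std::size_t j = 0; j < n; ++j)
                std::swap(a[k * n + j], a[p * n + j]);
            std::swap(perm[k], perm[p]);
        }
        for (std::size_t i = k + 1; i < n; ++i) {
            a[i * n + k] /= a[k * n + k];
            for (std::size_t j = k + 1; j < n; ++j)
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
        }
    }

    // Step 2: solve L*U*x = P*e_c for every unit vector e_c.
    // Each solution x is column c of the inverse.
    std::vector<cplx> inv(n * n);
    std::vector<cplx> x(n);
    for (std::size_t c = 0; c < n; ++c) {
        // Forward substitution with L (unit diagonal).
        for (std::size_t i = 0; i < n; ++i) {
            cplx s = (perm[i] == c) ? cplx(1.0) : cplx(0.0);
            for (std::size_t j = 0; j < i; ++j) s -= a[i * n + j] * x[j];
            x[i] = s;
        }
        // Back substitution with U.
        for (std::size_t i = n; i-- > 0;) {
            cplx s = x[i];
            for (std::size_t j = i + 1; j < n; ++j) s -= a[i * n + j] * x[j];
            x[i] = s / a[i * n + i];
        }
        for (std::size_t i = 0; i < n; ++i) inv[i * n + c] = x[i];
    }
    a = inv;
    return true;
}
```

Multiplying the original matrix by the result and comparing against the identity is a simple way to validate the output of an optimized library against this reference.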

Chris.

Laurent38 over 6 years ago in reply to Chris Armstrong
Hi Chris,

I ran some tests over the last few days.

I compared the Eigen and ArmPL libraries in terms of execution time for the inversion of a 24x24 double-precision complex matrix.

First, I tested Eigen (very easy to use) and got two measurements:

- 0.80 ms without the -ffast-math option.

- 0.35 ms with the -ffast-math option.

I have not yet checked the impact or drawbacks of the -ffast-math build option; I found it on an online forum.

Second, I tested ArmPL through the LAPACKE API (very hard to get working) and got an average duration of 0.37 ms for the same matrix inversion.
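
As an illustration of how such average durations can be measured, here is a minimal timing sketch. It assumes `std::chrono::steady_clock` and a hypothetical helper named `average_ms`; it is not the actual harness used for the figures above.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: returns the average wall-clock time of one call to
// `work()` in milliseconds, measured over `runs` calls.
// `warmup` untimed calls are made first so that caches are warm.
template <typename F>
double average_ms(F&& work, std::size_t runs, std::size_t warmup = 10) {
    for (std::size_t i = 0; i < warmup; ++i) work();

    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < runs; ++i) work();
    const auto t1 = std::chrono::steady_clock::now();

    // Convert the total elapsed time to milliseconds as a double.
    const std::chrono::duration<double, std::milli> total = t1 - t0;
    return total.count() / static_cast<double>(runs);
}
```

The callable passed as `work` should perform one complete inversion and store its result somewhere observable, otherwise an optimizing compiler may remove the computation and the measurement becomes meaningless.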

I had hoped for better results with ArmPL, but that was not really the case. Our goal was to reach around 0.05 ms, but I am not sure that is possible!

What do you think?

Do you have other libraries in mind that I could test and compare?

Any remarks and advice are welcome!

Thanks.

Regards,

Laurent.

Chris Armstrong over 6 years ago in reply to Laurent38

Hi Laurent,

Thanks for trying out ArmPL. It is no surprise that there is not much of an advantage in this case: 24x24 is a small problem (ArmPL is targeted at HPC), and the A53 is not a core we optimise for any more. However, we did use to target that core, so if you are interested, contact support-hpc-sw@arm.com and we can see whether we can point you to one of the old versions with A53-specific optimisations.

Chris.