
Cortex-A53: R/W data using interleaved intrinsics

Zvi Vered over 2 years ago

Hello,

I developed a few signal-processing routines (e.g. FIR filters) using the interleaved NEON load/store intrinsics, such as:

vld2q_f32: load a float32x4x2_t (de-interleaving on load)

vst2q_f32: store a float32x4x2_t (re-interleaving on store)

Using these intrinsics is simple, and the code is clean.
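For readers unfamiliar with the interleaved intrinsics, here is a minimal sketch of what such a routine might look like. It is not the poster's code: it assumes an FIR with real-valued taps applied to complex samples stored interleaved (re, im, re, im, ...), and the function name fir_interleaved and its parameters are hypothetical. vld2q_f32 splits four complex samples into a vector of real parts (val[0]) and a vector of imaginary parts (val[1]), and vst2q_f32 re-interleaves the results on store. A scalar loop handles the tail and keeps the sketch compilable on targets without NEON.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical sketch: FIR with real taps h[0..ntaps-1] on interleaved
 * complex data x = {re0, im0, re1, im1, ...}.
 * y[n] = sum_k h[k] * x[n + ntaps - 1 - k] for n = 0..nout-1, so x must
 * hold nout + ntaps - 1 complex samples and y must hold nout complex
 * samples (2 * nout floats). */
void fir_interleaved(const float *x, float *y, const float *h,
                     size_t ntaps, size_t nout)
{
    size_t n = 0;
#if defined(__ARM_NEON)
    /* Four complex outputs per iteration. */
    for (; n + 4 <= nout; n += 4) {
        float32x4x2_t acc;
        acc.val[0] = vdupq_n_f32(0.0f); /* real accumulators */
        acc.val[1] = vdupq_n_f32(0.0f); /* imaginary accumulators */
        for (size_t k = 0; k < ntaps; ++k) {
            /* LD2: de-interleave 4 complex samples into re / im vectors. */
            float32x4x2_t xv = vld2q_f32(x + 2 * (n + ntaps - 1 - k));
            acc.val[0] = vmlaq_n_f32(acc.val[0], xv.val[0], h[k]);
            acc.val[1] = vmlaq_n_f32(acc.val[1], xv.val[1], h[k]);
        }
        /* ST2: re-interleave and store 4 complex outputs. */
        vst2q_f32(y + 2 * n, acc);
    }
#endif
    /* Scalar tail (or the whole loop when NEON is unavailable). */
    for (; n < nout; ++n) {
        float re = 0.0f, im = 0.0f;
        for (size_t k = 0; k < ntaps; ++k) {
            const float *s = x + 2 * (n + ntaps - 1 - k);
            re += h[k] * s[0];
            im += h[k] * s[1];
        }
        y[2 * n] = re;
        y[2 * n + 1] = im;
    }
}
```

Because the taps are real, each multiply-accumulate touches the real and imaginary lanes independently; the interleaved intrinsics only change how the data is brought into and out of registers.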

Then I did the same with:

vld1q_f32: load a float32x4_t

vst1q_f32: store a float32x4_t

In this case, the input was two consecutive float32x4_t vectors: one holding the real parts and one holding the imaginary parts.

Of course, I had to run the calculation twice: once for the real parts and once for the imaginary parts.
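A matching sketch of the two-pass approach, again not the poster's code. It assumes the real and imaginary parts live in two separate arrays (split or planar layout) and that the taps are real, so the complex filter separates into two independent real filters. The names fir_real and fir_planar are hypothetical. Each tap now needs only a plain contiguous vld1q_f32, and the same real-valued routine is called once per component.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical sketch: real FIR on one component (real OR imaginary).
 * y[n] = sum_k h[k] * x[n + ntaps - 1 - k], n = 0..nout-1;
 * x must hold nout + ntaps - 1 floats, y must hold nout floats. */
static void fir_real(const float *x, float *y, const float *h,
                     size_t ntaps, size_t nout)
{
    size_t n = 0;
#if defined(__ARM_NEON)
    for (; n + 4 <= nout; n += 4) {
        float32x4_t acc = vdupq_n_f32(0.0f);
        for (size_t k = 0; k < ntaps; ++k) {
            /* LD1: plain contiguous load of 4 floats, no de-interleaving. */
            float32x4_t xv = vld1q_f32(x + n + ntaps - 1 - k);
            acc = vmlaq_n_f32(acc, xv, h[k]);
        }
        vst1q_f32(y + n, acc); /* ST1: contiguous store */
    }
#endif
    for (; n < nout; ++n) { /* scalar tail / non-NEON fallback */
        float s = 0.0f;
        for (size_t k = 0; k < ntaps; ++k)
            s += h[k] * x[n + ntaps - 1 - k];
        y[n] = s;
    }
}

/* Complex input in split (planar) layout: run the real FIR twice. */
void fir_planar(const float *x_re, const float *x_im,
                float *y_re, float *y_im, const float *h,
                size_t ntaps, size_t nout)
{
    fir_real(x_re, y_re, h, ntaps, nout); /* pass 1: real parts */
    fir_real(x_im, y_im, h, ntaps, nout); /* pass 2: imaginary parts */
}
```

Both sketches perform the same arithmetic; they differ only in the load/store instructions used (LD2/ST2 versus LD1/ST1) and in how the complex data is laid out in memory.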

It seems that using load/store of float32x4_t was faster, even though I had to run the calculation twice.

Does that make sense?

If relevant, I can share the full source code.

Thank you,

Zvika