
Cortex-M55 Sharing values between MVE and FPU for unsupported FP32 operations.

I need to optimize code that will run on an Arm Cortex-M55 with Helium (MVE) and floating-point support. The algorithm is quite recursive in nature, so the compiler struggles to infer any vector operations. The code is parallelized across 4 channels, and the math operations are mostly multiply, add/sub, divide and sqrt. The bottleneck is that I can run half of the operations on MVE, but the others (e.g. divide and sqrt) have to be performed on the scalar FPU. My question: if my values are stored in, for example, Q0, can I subsequently run FPU operations on the corresponding S registers (S0, S1, S2, S3)?

Take the following function (similar to what I am optimizing):

I rewrote it in straight asm as follows:

Notice that I am *assuming* that, after the MVE operations on the Q registers, the corresponding S registers hold the same content. The idea is to reduce the number of loads/stores/moves between the FPU and MVE, but I am not sure this assumption is correct.
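
To make the assumption concrete, here is a minimal host-side C model of the register overlay I am relying on. It is not MVE code and does not run on the vector unit. As far as I understand the Armv8.1-M register model, Qn shares storage with S(4n) to S(4n+3), so Q0 would overlay S0 to S3. The type `fp_regfile_t`, the `model_*` helpers and `kernel4` are hypothetical names made up for this sketch, and the kernel (`a * b / c` per channel) is only a stand-in for my real function, with divide representing the scalar-only operations.

```c
#include <assert.h>

/* Host-side model only (assumption): Qn overlays S(4n)..S(4n+3),
 * so the same storage can be viewed as scalars or as 4-lane vectors. */
typedef union {
    float s[32];    /* scalar view: S0..S31 */
    float q[8][4];  /* vector view: Q0..Q7, four FP32 lanes each */
} fp_regfile_t;

/* Models a lane-wise vector multiply (like VMUL.F32 Qd, Qn, Qm). */
static void model_vmul_f32(fp_regfile_t *rf, int qd, int qn, int qm)
{
    assert(qd >= 0 && qd < 8 && qn >= 0 && qn < 8 && qm >= 0 && qm < 8);
    for (int lane = 0; lane < 4; ++lane) {
        rf->q[qd][lane] = rf->q[qn][lane] * rf->q[qm][lane];
    }
}

/* Models a scalar divide (like VDIV.F32 Sd, Sn, Sm). */
static void model_vdiv_f32(fp_regfile_t *rf, int sd, int sn, int sm)
{
    assert(sd >= 0 && sd < 32 && sn >= 0 && sn < 32 && sm >= 0 && sm < 32);
    rf->s[sd] = rf->s[sn] / rf->s[sm];
}

/* Hypothetical 4-channel kernel: out[i] = (a[i] * b[i]) / c[i].
 * The multiply uses the vector view; the divide then reads and writes
 * the same data through the scalar view, with no move in between. */
static void kernel4(const float a[4], const float b[4],
                    const float c[4], float out[4])
{
    fp_regfile_t rf = { { 0.0f } };

    for (int lane = 0; lane < 4; ++lane) {
        rf.q[0][lane] = a[lane];  /* Q0 <- a */
        rf.q[1][lane] = b[lane];  /* Q1 <- b */
        rf.q[2][lane] = c[lane];  /* Q2 <- c */
    }

    model_vmul_f32(&rf, 0, 0, 1);           /* Q0 = Q0 * Q1 */

    for (int lane = 0; lane < 4; ++lane) {
        /* S0..S3 alias Q0 and S8..S11 alias Q2 in this model. */
        model_vdiv_f32(&rf, lane, lane, 8 + lane);
    }

    for (int lane = 0; lane < 4; ++lane) {
        out[lane] = rf.q[0][lane];          /* read the result back via Q0 */
    }
}
```

If the overlay holds on the real core, the equivalent sequence would be one VMUL.F32 on Q0 followed by four VDIV.F32 instructions on S0 to S3, without any VMOV or load/store between the two. Whether the hardware actually behaves like this model is exactly what I am asking.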

Thanks
