I cannot find any information on how many CPU cycles a 1024-point complex FFT on 32-bit floating-point data takes on a Cortex-R52+ using Neon. Assume that the code executes from TCM and that all data is in TCM.
I also see examples of a 4x4 matrix multiply, but no information on how many CPU cycles it takes.
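For the 4x4 case the work is small and fixed at 64 multiply-accumulates. Below is a scalar reference sketch that could be timed the same way. The name `mat4_mul` and the column-major layout are assumptions. A typical Neon formulation builds each output column from the four columns of A, each scaled by one element of B. That is one 4-lane multiply plus three 4-lane multiply-accumulates per column, or 16 vector operations in total. How many cycles those take on the R52+ depends on instruction latency and scheduling, so it would still need to be measured.

```c
/* Hypothetical scalar reference: C = A * B for 4x4 single-precision
 * matrices stored column-major (element (row, col) at index col*4 + row).
 * c must not alias a or b. Performs 64 multiplies and 64 adds. */
static void mat4_mul(float c[16], const float a[16], const float b[16])
{
    for (int col = 0; col < 4; col++) {
        for (int row = 0; row < 4; row++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a[k * 4 + row] * b[col * 4 + k];
            c[col * 4 + row] = sum;
        }
    }
}
```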
Is there an answer to this question?