Hi team,
In an educational setting, I tested a 128-tap FIR filter on a random input signal of 1000 samples. Both the coefficients and the input samples are 32-bit integers.
int32_t Sb[128];   // FIR coefficients
int32_t x[1000];   // input signal

// ---- 1000 discrete convolution OP's
int32_t test_fir_int() {
  int64_t y = 0;
  for (int n = 0; n < 1000; n++) {
    for (int i = 0; i < 128; i++) {
      if (n >= i)
        y += (int64_t)x[n-i] * Sb[i];   // use of SMLAL: Compiler's choice
    }
  }
  return y >> 20;
}
...
This code was built and run with the Arduino IDE 1.8.19 on two different boards:
My observations:
I would have expected a speedup of about 8 from the clock-speed ratio between the M7 and the M3, and another factor of 2 from the M7's dual-issue capability, for a total factor of about 16.
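To make the expectation easy to compare against a measurement, the estimate can be written as a small helper. This is only a sketch of the simple model described here (clock ratio times an issue-width factor). The function names are hypothetical, and any clock values or run times passed in are placeholders to be filled in from the actual boards.

```c
#include <assert.h>

/* Expected speedup under the simple model: ratio of core clocks
   multiplied by an issue-width factor (2 for an ideal dual-issue core). */
double expected_speedup(double clock_fast_hz, double clock_slow_hz,
                        double issue_factor)
{
    assert(clock_slow_hz > 0.0);
    return (clock_fast_hz / clock_slow_hz) * issue_factor;
}

/* Measured speedup from two run times of the same workload,
   e.g. taken with micros() before and after test_fir_int(). */
double measured_speedup(double t_slow_us, double t_fast_us)
{
    assert(t_fast_us > 0.0);
    return t_slow_us / t_fast_us;
}
```

With a clock ratio of 8 and an issue factor of 2, expected_speedup(8.0, 1.0, 2.0) gives 16. Dividing the measured speedup by the clock ratio alone shows how much of the gain comes from the core rather than from the clock.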
My questions:
Thanks a lot.
Best regards,
Wolfgang
There are 2 LDR operations within the inner loop, right before the SMLAL MAC operation. I cannot say whether that is a lot. From your answers I understand that my original question, what the main reason for the M7 performance increase is, has no simple answer. I think it would be helpful to have a simulator tool that gives more detailed information about the overall performance to be expected.
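Short of a simulator, one rough check is to convert a measured run time into average cycles per MAC and compare that with the 2 LDR plus 1 SMLAL seen in the inner loop. The sketch below assumes the loop bounds from my test (1000 samples, 128 taps). The function names are hypothetical, and the run time and clock frequency are placeholders for whatever each board reports. Because of the `if (n >= i)` test, fewer than 128000 MACs are executed, so the count is computed explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Number of multiply-accumulates the benchmark loop actually executes:
   for output index n, only taps with i <= n pass the `if (n >= i)` test,
   so that output contributes min(n + 1, n_taps) MACs. */
uint32_t count_macs(uint32_t n_samples, uint32_t n_taps)
{
    uint32_t macs = 0;
    for (uint32_t n = 0; n < n_samples; n++) {
        macs += (n + 1 < n_taps) ? (n + 1) : n_taps;
    }
    return macs;
}

/* Average core cycles per MAC for a measured run time.
   The result includes loop overhead (compare, branch, index updates),
   not only the LDR and SMLAL instructions themselves. */
double cycles_per_mac(double elapsed_us, double clock_hz, uint32_t macs)
{
    assert(macs > 0);
    return elapsed_us * (clock_hz / 1e6) / (double)macs;
}
```

For 1000 samples and 128 taps, count_macs returns 119872. A result far above 3 to 4 cycles per MAC would suggest that something other than the load and multiply instructions dominates, although this average cannot say what.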
Thank you very much.