This is my first post; sorry if it's in the wrong place or if I've unwittingly violated any accepted norms or conventions around here...
My embedded application uses a Cortex-M4F with 128 kB of SRAM. I need to do a 4096-point real FFT, and I'm trying to use arm_rfft_fast_f32(), which seems ideally suited to my needs. The FFT size is fixed, meaning I only ever need the 4096-point transform. I've done everything correctly, including instantiating an arm_rfft_fast_instance_f32. When I build my project (which, at the moment, does little more than the FFT in question), it turns out to be more than 130 kB in size. When I look in the linker map file generated by my IDE (Keil), I can see that, along with my arm_rfft_fast_instance_f32, quite a few tables of twiddle factors for the FFT get linked in. Given that a) I only need the tables for the largest size (the 4096-point transform) and b) the smaller tables are just "decimated by 2" versions of the larger tables, is there a way to get rid of the unneeded tables? If I could do that, I could cut the memory footprint of this thing by more than 25% and it would fit into SRAM, which is what I really need it to do.
Has anyone else had to deal with this issue? I find it odd that this particular FFT implementation is so incredibly storage-intensive. Is there a "smaller" FFT implementation that doesn't have such a big memory footprint?
Thanks in advance.
Solved. It was pretty simple, thanks to Paul Beckmann at DSP Concepts. I simply replaced arm_rfft_fast_init_f32() with my own function, which looks like this:
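#include "arm_math.h"
#include "arm_common_tables.h"

arm_status gz_rfft_fast_init_f32(arm_rfft_fast_instance_f32 * S)
{
    /* The 4096-point real FFT runs on a 2048-point complex FFT internally. */
    arm_cfft_instance_f32 * Sint = &(S->Sint);

    Sint->fftLen  = 2048u;
    S->fftLenRFFT = 4096u;   /* was "fftLen", which no longer exists as a parameter */

    /* Reference only the tables needed for this one length. */
    Sint->bitRevLength = ARMBITREVINDEXTABLE2048_TABLE_LENGTH;
    Sint->pBitRevTable = (uint16_t *)armBitRevIndexTable2048;
    Sint->pTwiddle     = (float32_t *) twiddleCoef_2048;
    S->pTwiddleRFFT    = (float32_t *) twiddleCoef_rfft_4096;

    return (ARM_MATH_SUCCESS);
}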
The native init function had a switch/case tree that included ALL of the tables for lengths 4096, 2048, 1024, etc., all of which get linked in. Because I'm always using a 4096-point FFT, I can hard-wire the init.
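For anyone wondering why the switch matters, here is a small stand-alone sketch of the mechanism. Everything in it is hypothetical (the demo_ names, the tiny placeholder tables, the struct); it is not the CMSIS code, only the same shape. The general init names every table, so every table is referenced from reachable code. The fixed init names exactly one, which typically lets the linker drop the rest, depending on toolchain settings for removing unused sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder tables. The real tables are far larger; these sizes are
 * arbitrary and only keep the sketch small. */
const float demo_table_1024[8] = {0};
const float demo_table_2048[8] = {0};
const float demo_table_4096[8] = {0};

/* Hypothetical instance type, loosely shaped like an FFT instance. */
typedef struct {
    uint16_t     fftLen;
    const float *pTwiddle;
} demo_instance;

/* General init: the switch names every table, so all of them are
 * referenced as soon as this function is linked in. */
static int demo_init_any(demo_instance *S, uint16_t fftLen)
{
    assert(S != NULL);
    S->fftLen = fftLen;
    switch (fftLen) {
    case 4096u: S->pTwiddle = demo_table_4096; break;
    case 2048u: S->pTwiddle = demo_table_2048; break;
    case 1024u: S->pTwiddle = demo_table_1024; break;
    default:    S->pTwiddle = NULL; return -1;
    }
    return 0;
}

/* Fixed init: only one table is named, so the others have no references
 * coming from this function. */
static int demo_init_4096(demo_instance *S)
{
    assert(S != NULL);
    S->fftLen   = 4096u;
    S->pTwiddle = demo_table_4096;
    return 0;
}
```

Both functions leave the instance in the same state for the 4096 case; the difference is only in which tables the code refers to. The linker map file is the place to confirm which tables actually remain in a given build.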
That switch/case block is a natural place to start looking for a solution to the problem. Unfortunately, we are not working with arm_rfft_fast_init_f32() at the source-code level. I'll refer to the function you posted if I ever run into this situation in the future.
So, by how much have you reduced the size of your code?