This is my first post; sorry if it's in the wrong place or if I've unwittingly violated any accepted norms or conventions around here...
My embedded application runs on a Cortex-M4F with 128 kB of SRAM. I need to do a 4096-point real FFT, and I'm trying to use arm_rfft_fast_f32(), which seems ideally suited to my needs. The FFT size is fixed, so I only ever need the 4096-point transform. I've done everything correctly, including instantiating an arm_rfft_fast_instance_f32. When I build my project (which, at the moment, does little more than the FFT in question), the image turns out to be more than 130 kB.
When I look in the linker map file generated by my IDE (Keil), I can see that using the arm_rfft_fast_instance_f32 pulls in quite a few tables of twiddle factors for the FFT. Given that a) I only need the largest one (the 4096-point table) and b) the smaller tables are just "decimated by 2" versions of the larger ones, is there a way to get rid of the unneeded tables? If I could, I would cut the memory footprint by more than 25% and it would fit into SRAM, which is what I really need.
Has anyone else had to deal with this issue? I find it odd that this particular FFT implementation is so incredibly storage-intensive. Is there a "smaller" FFT implementation that doesn't have such a big memory footprint?
Thanks in advance.
That switch/case block is a natural place to start looking for a solution; unfortunately, we are not working with rfft_fast_init_f32() at the source-code level. I'll refer to the function you posted if I ever run into this situation myself.
So, by how much have you reduced the size of your code?