This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

"pruning" arm_rfft_fast_f32_instance?

This is my first post; sorry if it's in the wrong place or if I've unwittingly violated any accepted norms or conventions around here...

My embedded application uses a Cortex-M4f with 128kB of SRAM. I need to do a 4096-point real FFT and I'm trying to use arm_rfft_fast_f32(), which seems to be ideally suited for my needs. My FFT will be a fixed-size, meaning I only need to do a 4096-point transform. I've done everything correctly, including instantiating an arm_rfft_fast_f32_instance. When I build my project (which, at the moment, does little more than the FFT in question), it turns out to be more than 130kB in size. When I look in the linker map file generated by my IDE (Keil), I can see that my arm_rfft_fast_f32_instance comprises quite a few tables of twiddle factors for the FFT. Given that a) I only need the largest one (the 4096-point table) and b) the smaller tables are just "decimated by 2" versions of the larger tables, is there a way to get rid of the unneeded tables? If I could do that, I could cut the memory footprint of this thing by more than 25% and it would fit into SRAM, which is what I really need it to do.

Has anyone else had to deal with this issue? I find it odd that this particular FFT implementation is so incredibly storage-intensive. Is there a "smaller" FFT implementation that doesn't have such a big memory footprint?

Thanks in advance.

Parents
  • Hi glennz,

    I have not viewed a code generated by arm_rfft_fast_f32() but if it builds separate twiddle factors for all FFT lengths it supports while your application is fixed at 4096-point, you can delete those tables for other lengths. The table for all supported lengths are there so that the program can dynamically change the FFT length, you don't need this flexibility.

    I have not tried this however so I can't provide more details. Anyway, I don't perceive a catastrophic consequence if you try commenting out those unnecessary tables (assuming your application does not detonate explosives of any kind based on the FFT output, and if it does you can replace the trigerring device with an LED or small buzzer temporarily ). Stefano Cadario's reply may also be helpful to you.

    It's possible to create a twiddle factor table for the largest FFT length and increase the step size when indexing the table with shorter FFT length but, based on the information from your second paragraph, that's not the option chosen in arm_rfft_fast_f32() for some reason.

    There are some factors that makes FFT "incredibly storage-intensive", these may include:

    • The lack of bit-reversed addressing capability in hardware can result in increased memory requirement if the software uses look-up table to reorder the samples. Look-up table trades off memory size for execution speed. I have not investigated yet how CMSIS-DSP emulates bit-reversed addressing.
    • Your application is memory-intensive because of the FFT length you need, 4096-point. For the data buffer you already need

    4096 samples * 4 bytes/sample = 16384 bytes

    Regards,

    Goodwin

Reply
  • Hi glennz,

    I have not viewed a code generated by arm_rfft_fast_f32() but if it builds separate twiddle factors for all FFT lengths it supports while your application is fixed at 4096-point, you can delete those tables for other lengths. The table for all supported lengths are there so that the program can dynamically change the FFT length, you don't need this flexibility.

    I have not tried this however so I can't provide more details. Anyway, I don't perceive a catastrophic consequence if you try commenting out those unnecessary tables (assuming your application does not detonate explosives of any kind based on the FFT output, and if it does you can replace the trigerring device with an LED or small buzzer temporarily ). Stefano Cadario's reply may also be helpful to you.

    It's possible to create a twiddle factor table for the largest FFT length and increase the step size when indexing the table with shorter FFT length but, based on the information from your second paragraph, that's not the option chosen in arm_rfft_fast_f32() for some reason.

    There are some factors that makes FFT "incredibly storage-intensive", these may include:

    • The lack of bit-reversed addressing capability in hardware can result in increased memory requirement if the software uses look-up table to reorder the samples. Look-up table trades off memory size for execution speed. I have not investigated yet how CMSIS-DSP emulates bit-reversed addressing.
    • Your application is memory-intensive because of the FFT length you need, 4096-point. For the data buffer you already need

    4096 samples * 4 bytes/sample = 16384 bytes

    Regards,

    Goodwin

Children
No data