"Pruning" arm_rfft_fast_instance_f32?

This is my first post; sorry if it's in the wrong place or if I've unwittingly violated any accepted norms or conventions around here...

My embedded application uses a Cortex-M4F with 128kB of SRAM. I need to do a 4096-point real FFT and I'm trying to use arm_rfft_fast_f32(), which seems ideally suited to my needs. The FFT size is fixed: I only ever need the 4096-point transform. I've set everything up, including instantiating an arm_rfft_fast_instance_f32.

When I build my project (which, at the moment, does little more than the FFT in question), it comes out at more than 130kB. The linker map file generated by my IDE (Keil) shows that the FFT pulls in quite a few tables of twiddle factors, one set per supported length. Given that a) I only need the largest one (the 4096-point table) and b) the smaller tables are just "decimated by 2" versions of the larger tables, is there a way to get rid of the unneeded tables? If I could do that, I could cut the memory footprint by more than 25% and it would fit into SRAM, which is what I really need it to do.
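To make the "decimated by 2" relationship in point b) concrete, here is a minimal sketch. It assumes a table of n complex twiddles stored as interleaved {cos, sin} pairs; that layout and the function name decimate_twiddles are assumptions for illustration, not taken from the library source.

```c
#include <stddef.h>

/*
 * Build the n/2-point twiddle table by keeping every second complex entry
 * of the n-point table.
 *
 * Assumed layout (an assumption of this sketch):
 *   full[2*k]     = cos(2*pi*k/n)
 *   full[2*k + 1] = sin(2*pi*k/n)
 * "n" is the number of complex entries in "full"; "half" must have room
 * for n/2 complex entries, i.e. n floats.
 */
void decimate_twiddles(const float *full, float *half, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        /* Complex entry 2*i of the large table starts at float index 4*i. */
        half[2 * i]     = full[4 * i];
        half[2 * i + 1] = full[4 * i + 1];
    }
}
```

With a 4-point table {1,0, 0,1, -1,0, 0,-1}, this produces the 2-point table {1,0, -1,0}. It only illustrates why the smaller tables look redundant; it does not by itself change which tables the library links in.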

Has anyone else had to deal with this issue? I find it odd that this particular FFT implementation is so incredibly storage-intensive. Is there a "smaller" FFT implementation that doesn't have such a big memory footprint?

Thanks in advance.

  • Solved. It was pretty simple, thanks to Paul Beckmann at DSP Concepts. I simply "overrode" arm_rfft_fast_init_f32() with my own function that looks like this:

    /* Hard-wired init for a 4096-point real FFT: references only the tables that length needs. */
    arm_status gz_rfft_fast_init_f32(arm_rfft_fast_instance_f32 * S)
    {
      arm_cfft_instance_f32 * Sint;

      Sint = &(S->Sint);

      /* The real FFT is computed with an internal complex FFT of half the length. */
      Sint->fftLen = 2048u;
      S->fftLenRFFT = 4096u;

      Sint->bitRevLength = ARMBITREVINDEXTABLE2048_TABLE_LENGTH;
      Sint->pBitRevTable = (uint16_t *)armBitRevIndexTable2048;
      Sint->pTwiddle     = (float32_t *) twiddleCoef_2048;
      S->pTwiddleRFFT    = (float32_t *) twiddleCoef_rfft_4096;

      return (ARM_MATH_SUCCESS);
    }

    The native init function has a switch/case tree that references ALL of the tables for lengths 4096, 2048, 1024, etc., so all of them get linked in. Because I'm always using a 4096-point FFT, I can hard-wire the init and only the tables it names are pulled in.
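    The pattern behind this fix can be sketched without the library. Everything named demo_* below is a hypothetical stand-in (tiny tables, a simplified instance struct), not the CMSIS-DSP API; the point is only to show why a length switch keeps every table referenced while a fixed-length init names just one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature tables standing in for the per-length constant tables. */
static const float demo_table_16[2] = {16.0f, 16.5f};
static const float demo_table_32[2] = {32.0f, 32.5f};
static const float demo_table_64[2] = {64.0f, 64.5f};

/* Simplified instance: just a length and the table chosen for it. */
typedef struct {
    uint16_t     len;
    const float *table;
} demo_instance;

/* General init: the switch names every table, so all of them stay
   referenced and have to be kept in the image. */
int demo_init_any(demo_instance *s, uint16_t len)
{
    s->len = len;
    switch (len) {
    case 16u: s->table = demo_table_16; break;
    case 32u: s->table = demo_table_32; break;
    case 64u: s->table = demo_table_64; break;
    default:  s->table = NULL; return -1;   /* unsupported length */
    }
    return 0;
}

/* Fixed-length init: only one table is named. If nothing else refers to
   the other tables, the toolchain is free to discard them. */
int demo_init_64(demo_instance *s)
{
    s->len   = 64u;
    s->table = demo_table_64;
    return 0;
}
```

    Both inits leave the instance in the same state for the supported length, so the transform code does not need to change. Whether the unused tables actually disappear depends on the general init no longer being referenced and on the linker removing unused sections, so the map file is worth rechecking after the change.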
