
I am using armpl (22.0) to compute FFTs on a Kunpeng920 Arm server, but fftwh (fp16) is no faster than fftwf (fp32), and at some sizes it is slower. I expected fftwh to be about 2x faster than fftwf.

code:

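/* Each function plans, executes, and destroys the plan in a single call,
   so timing a call to either one includes the planning time. */
static void fftwf_armpl_fp32(fftwf_complex* signal, int row, int col) {
    /* FFTW takes 2-d sizes as (n0, n1) in row-major order; (col, row)
       matches (row, col) only for square inputs such as those timed here. */
    fftwf_plan plan_f = fftwf_plan_dft_2d(col, row, signal, signal, FFTW_FORWARD, FFTW_ESTIMATE);
    fftwf_execute(plan_f);
    fftwf_destroy_plan(plan_f);
}

/* Half-precision variant (fftwh interface), in-place like the fp32 version. */
static void fftwf_armpl_fp16(fftwh_complex* signal, int row, int col) {
    fftwh_plan plan_h = fftwh_plan_dft_2d(col, row, signal, signal, FFTW_FORWARD, FFTW_ESTIMATE);
    fftwh_execute(plan_h);
    fftwh_destroy_plan(plan_h);
}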

Size         FP32 (ms)   FP16 (ms)
256*256      4.45        3.09
512*512      16.4        12.7
1024*1024    35.7        36.0
2048*2048    180.1       169.1
4096*4096    761.5       861.4

  • Hi.

    Thanks for getting in contact.

    Planning time for an FFT call is typically far greater than the execution time.  If doing a benchmark it is therefore sensible to time the two parts separately.  I don't have any comparison figures to hand on a Kunpeng920, but I'd imagine that planning costs are comparable between the precisions, which may well be what your results show.  I'd recommend calling out the two costs independently in your table.  The usage model of the FFTW interface is that you plan once and use the resulting plan many times.

    For a 1-d case I'd recommend averaging over (many) calls. In 2-d that's less important, but it may be worth a go.

    Hope this helps.

    Chris
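
The advice above (time planning once, then average execution over many calls) could be sketched roughly as follows. This is only an illustration, not code from Arm Performance Libraries or FFTW: the callback types, `fft_timing`, `time_plan_and_exec`, and the `stub_*` functions are hypothetical names made up for this sketch, and the stubs stand in for the real library calls so the harness compiles without any FFT library. It uses the C11 `timespec_get` wall clock.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback types for this sketch; not an FFTW or armpl API. */
typedef void *(*plan_fn)(void *ctx);      /* build a plan from user context */
typedef void (*exec_fn)(void *plan);      /* execute an existing plan once  */
typedef void (*destroy_fn)(void *plan);   /* release the plan               */

typedef struct {
    double plan_ms;  /* one-off planning cost                  */
    double exec_ms;  /* mean cost of one execution of the plan */
} fft_timing;

/* Wall-clock time in milliseconds (C11 timespec_get). */
static double now_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Time planning once, then average `reps` executions of the same plan. */
static fft_timing time_plan_and_exec(plan_fn make_plan, exec_fn run,
                                     destroy_fn destroy, void *ctx, int reps) {
    fft_timing t;
    assert(reps > 0);

    double t0 = now_ms();
    void *plan = make_plan(ctx);
    double t1 = now_ms();

    run(plan);                       /* untimed warm-up execution */

    double t2 = now_ms();
    for (int i = 0; i < reps; i++) {
        run(plan);
    }
    double t3 = now_ms();

    destroy(plan);

    t.plan_ms = t1 - t0;
    t.exec_ms = (t3 - t2) / reps;
    return t;
}

/* Stub callbacks standing in for the real plan/execute/destroy calls.
   Here the "plan" is just a pointer to an int that counts executions. */
static void *stub_plan(void *ctx) { return ctx; }
static void stub_exec(void *plan) { ++*(int *)plan; }
static void stub_destroy(void *plan) { (void)plan; }
```

To apply this to the code in the question, the stubs would be replaced by thin wrappers around `fftwf_plan_dft_2d`, `fftwf_execute`, and `fftwf_destroy_plan` (and the `fftwh_` equivalents), with the signal pointer and sizes passed through `ctx`. Reporting `plan_ms` and `exec_ms` in separate table columns for each precision would then show whether the execution time alone differs between fp16 and fp32.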
