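I am using Arm Performance Libraries (armpl 22.0) to compute FFTs on a Kunpeng 920 Arm server. fftwh (FP16) is not consistently faster than fftwf (FP32), and at 4096*4096 it is actually slower. I expected fftwh to be about 2x faster than fftwf.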

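Code:

/* FP32: in-place forward 2D complex FFT. */
/* Note: FFTW's signature is plan_dft_2d(n0, n1, ...) with n1 the fastest-varying */
/* dimension; for the square sizes benchmarked here the argument order does not matter. */
static void fftwf_armpl_fp32(fftwf_complex* signal, int row, int col) {
    fftwf_plan plan_f = fftwf_plan_dft_2d(col, row, signal, signal, FFTW_FORWARD, FFTW_ESTIMATE);
    fftwf_execute(plan_f);
    fftwf_destroy_plan(plan_f);
}

/* FP16: same transform through the fftwh_ interface. */
static void fftwf_armpl_fp16(fftwh_complex* signal, int row, int col) {
    fftwh_plan plan_h = fftwh_plan_dft_2d(col, row, signal, signal, FFTW_FORWARD, FFTW_ESTIMATE);
    fftwh_execute(plan_h);
    fftwh_destroy_plan(plan_h);
}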
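
Both functions create the plan, execute it, and destroy it in one call, so if they are timed as a whole the figures may include planning cost as well as the transform itself. A minimal sketch of a timing helper that could be used to time only the execute step is given here. The names time_ms and count_call are hypothetical and are not part of FFTW or Arm Performance Libraries; count_call is only a stand-in workload so the sketch is self-contained.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std= modes */
#endif
#include <time.h>

/* Hypothetical helper: average wall-clock time in milliseconds
   of `reps` consecutive calls to fn(arg). */
static double time_ms(void (*fn)(void *), void *arg, int reps) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; ++i) {
        fn(arg);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double total_ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3
                    + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
    return total_ms / reps;
}

/* Stand-in workload: increments the int that arg points to. */
static void count_call(void *arg) {
    ++*(int *)arg;
}
```

To time the transform alone, the plan would be created once beforehand and a small wrapper that calls fftwf_execute (or fftwh_execute) on that plan would be passed in place of count_call.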

Size         FP32 (ms)   FP16 (ms)
256*256      4.45        3.09
512*512      16.4        12.7
1024*1024    35.7        36.0
2048*2048    180.1       169.1
4096*4096    761.5       861.4
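
To compare these figures with the expected 2x, the speedup of FP16 over FP32 can be computed per size as FP32 time divided by FP16 time. The sketch below uses only the timings reported in this post; the function names are made up for illustration.

```c
#include <stdio.h>

/* Timings in milliseconds, copied from the results reported in this post. */
static const int    sizes[]   = {256, 512, 1024, 2048, 4096};
static const double fp32_ms[] = {4.45, 16.4, 35.7, 180.1, 761.5};
static const double fp16_ms[] = {3.09, 12.7, 36.0, 169.1, 861.4};

/* Speedup of FP16 over FP32; a value above 1.0 means FP16 is faster. */
static double speedup(double fp32, double fp16) {
    return fp32 / fp16;
}

/* Print one line per problem size: "N*N  speedup". */
static void print_speedups(void) {
    for (int i = 0; i < 5; ++i) {
        printf("%d*%d  %.2fx\n", sizes[i], sizes[i],
               speedup(fp32_ms[i], fp16_ms[i]));
    }
}
```

The ratio is about 1.44x at 256*256 and falls to about 0.88x at 4096*4096, so none of the measured sizes reaches the expected 2x.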