Noticeable Performance Decrease Problem For Short-Length FFT Samples

meidus · 01-09-2006

Dear Support at Intel,

We are benchmarking the performance of specific signal processing functions implemented with the IPP libraries. In particular, we measure the FFT and FIR filtering operations on real and complex data, in both single- and double-precision floating-point arithmetic. We have run these tests against the 4.0, 4.1.2, and 5.0 libraries, using the statically linked a6, w7, t7, and m7 code.

The benchmark uses the well-known dual-loop algorithm to measure the average number of clock ticks per data sample required to perform an FFT for power-of-2 FFT lengths (e.g., 2, 4, 8, 16, 32, 64, …, 2^17). For each FFT length we collect a large set of average elapsed clock-tick measurements and take the median of each set to minimize the effect of outliers. Plotting the median for each FFT length gives a trend line showing the computational cost of processing data as a function of FFT length.
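For readers unfamiliar with the method, here is a minimal sketch of a dual-loop, median-of-trials measurement like the one described above. It is not Matt's actual harness and not an IPP API: the names `bench_per_sample`, `dual_loop_once`, and `median` are hypothetical, and it assumes POSIX `clock_gettime` with results in nanoseconds per sample rather than CPU clock ticks (a cycle counter such as RDTSC would be needed for true ticks).

```c
/* Dual-loop timing sketch. Hypothetical helper names; not an IPP API.
   Assumption: time comes from POSIX clock_gettime in nanoseconds,
   not from a CPU clock-tick counter. */
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <time.h>

typedef void (*bench_fn)(void *ctx);

/* Empty routine called through a volatile pointer so the compiler
   cannot inline it or delete the overhead loop. */
static void noop(void *ctx) { (void)ctx; }
static void (*volatile overhead_fn)(void *) = noop;

static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of v[0..n-1]; sorts v in place. */
double median(double *v, size_t n) {
    qsort(v, n, sizeof *v, cmp_double);
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

/* One dual-loop measurement: loop 1 calls fn `reps` times, loop 2 calls
   the empty routine the same number of times. Subtracting the two
   removes loop and call overhead; the result is ns per data sample. */
static double dual_loop_once(bench_fn fn, void *ctx,
                             size_t n_samples, size_t reps) {
    double t0 = now_ns();
    for (size_t i = 0; i < reps; ++i) fn(ctx);
    double t1 = now_ns();
    for (size_t i = 0; i < reps; ++i) overhead_fn(ctx);
    double t2 = now_ns();
    double net = (t1 - t0) - (t2 - t1);
    if (net < 0.0) net = 0.0; /* timer noise can make this negative */
    return net / ((double)reps * (double)n_samples);
}

/* Repeat the dual-loop measurement `trials` times and return the median,
   which limits the influence of outliers such as interrupts.
   Returns -1.0 if trials is 0 or the buffer cannot be allocated. */
double bench_per_sample(bench_fn fn, void *ctx, size_t n_samples,
                        size_t reps, size_t trials) {
    double *t = malloc(trials * sizeof *t);
    if (t == NULL || trials == 0) { free(t); return -1.0; }
    for (size_t k = 0; k < trials; ++k)
        t[k] = dual_loop_once(fn, ctx, n_samples, reps);
    double m = median(t, trials);
    free(t);
    return m;
}
```

In a setup like the one above, `fn` would be a small wrapper that runs one FFT on a buffer of length n, and `bench_per_sample` would be called once per length n = 2, 4, …, 2^17 to produce the points of the trend line.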

Repeated tests on P3 and Prescott P4 based systems show anomalously poor results for 16-point FFTs. Please see the attached figures. When a Hamming window is applied, the same behavior is observed for 4-point FFTs. We would appreciate any insight into why this behavior occurs (is it an artifact of the FFT implementation?).

Best Regards,

Matt


Ying_S_Intel · 01-11-2006

Dear Customer,
We will look into it. I've documented this report as issue 346494 through Intel Premier Support under your account. Please provide any further information via that channel, and our support engineer will assist you with this issue.

Thanks,
Ying S
Intel Corp.