FFT and AVX cpu - different results on 64-bit platform

Kamil_Z_ · ‎03-10-2014

Hello,

I am dealing with a problem of different results for the same input data. The differences depend on the PC type. After analyzing the problem, it turned out that the differences come from the ippiFFTFwd_CToC_32fc_C1R function.

For example, for the same source data and FFT spec (IPP_FFT_NODIV_BY_ANY, ippAlgHintAccurate):
- on a PC with an Intel Core 2 Quad Q8400 CPU (ippGetCpuType returns ippCpuPenryn, ippi version y8), one element of the destination array is: re=244466.937500 im=-545828.250000
- on a PC with an Intel Xeon E5-1660 CPU (ippGetCpuType returns ippCpuAVX, ippi version e9), the same element of the destination array is: re=244466.812500 im=-545828.250000

This gives a difference in the real part of > 0.1. Is this normal?

I prepared a sample program (proof of concept) that works as follows:
1. ippInitCpu(ippCpuPenryn);
2. do FFTFwd: source -> destPenryn
3. ippInitCpu(ippCpuAVX);
4. do FFTFwd: source -> destAVX
5. compare elements in destPenryn and destAVX and print out differences where abs > 0.001
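
Step 5 of the list above could be sketched roughly as follows. This is not the attached program: the FFT calls are left out, and the type `cplx32` and the function `count_abs_diffs` are hypothetical names that only mirror the re/im layout of the values quoted in this post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 32f complex element (re/im pair). */
typedef struct { float re, im; } cplx32;

/* Absolute value without pulling in the math library. */
static float absf(float x) { return x < 0.0f ? -x : x; }

/* Count elements whose real or imaginary parts differ by more than abs_tol. */
static size_t count_abs_diffs(const cplx32 *a, const cplx32 *b,
                              size_t n, float abs_tol)
{
    size_t count = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (absf(a[i].re - b[i].re) > abs_tol ||
            absf(a[i].im - b[i].im) > abs_tol)
            ++count;
    }
    return count;
}
```

With a fixed absolute threshold such as 0.001, any element whose magnitude is in the hundreds of thousands is flagged as soon as it differs in its last few float bits, which is why such a check reports many differences on large outputs.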

Then I ran it on the Intel Xeon E5-1660 and got a lot of differences.

One more note: this problem occurs in the x64 build. When I built the program for the 32-bit platform, destPenryn and destAVX were exactly the same (ippi versions p8 for Penryn and g9 for AVX, respectively).

I have put this sample program in attachments just in case you'd find it useful (VS2012).

My IPP version is 7.1 (ippi 7.1.0 r36264).

Could you confirm whether this is a bug or normal behavior? Is it related to floating-point operations? If so, what accuracy is guaranteed, and why does it depend on the platform type?

Thank you in advance.

Best regards,

Kamil Żukowski

Thomas_Jensen1 · ‎03-11-2014

Just a quick thought: since you use 32f (single precision), you'll get approximately 6.5 significant digits: 123456.7

Your value 244466.937500 is thus only significant to 244466.9 where the .9 is only significant to 50%, so it could easily be .8, so the error of >0.1 is within tolerance if I'm correct.
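
To put numbers on this, here is a minimal sketch that measures how far apart two floats are, both in steps of representable values and in relative terms. The helper names `ulp_distance` and `rel_diff` are hypothetical, and the code assumes IEEE 754 single precision with 32-bit floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Number of representable floats between a and b.
   Assumes finite IEEE 754 values of the same sign. */
static int32_t ulp_distance(float a, float b)
{
    int32_t ia, ib;
    assert(sizeof(float) == sizeof(int32_t));
    memcpy(&ia, &a, sizeof ia);   /* reinterpret the bits without aliasing issues */
    memcpy(&ib, &b, sizeof ib);
    assert((ia < 0) == (ib < 0)); /* same sign only */
    return ia > ib ? ia - ib : ib - ia;
}

/* Relative difference |a - b| / max(|a|, |b|), computed in double. */
static double rel_diff(float a, float b)
{
    double da = a < 0 ? -(double)a : (double)a;
    double db = b < 0 ? -(double)b : (double)b;
    double diff = da > db ? da - db : db - da;
    double scale = da > db ? da : db;
    return scale == 0.0 ? 0.0 : diff / scale;
}
```

For the two reported real parts, 244466.937500 and 244466.812500, the spacing between neighboring floats at this magnitude is 2^-6 = 0.015625, so the 0.125 gap is 8 float steps, a relative difference of about 5e-7.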

Using two different methods (SSE vs. AVX) to compute the FFT is the reason for the different unstable decimals.

You could test this by computing with 64f and seeing whether the decimals become stable (become significant).

Thomas

Kamil_Z_ · ‎03-11-2014

You may be right.

Unfortunately, I did not find any two-dimensional FFT for the 64f type, so I can't test it on the same source data and order.

Anyway, I did some experiments with the one-dimensional FFT (ippsFFT): for 32f the results always agreed to about 6.5 significant digits (as you said), and for 64f the differences were below the 32f epsilon (I didn't check exact values). This confirms your guess.

Best regards,
Kamil Żukowski

Igor_A_Intel · ‎03-12-2014

Hi Kamil,

The AVX code uses FMA instructions, which perform a multiply and an add without intermediate rounding, so the AVX (FMA-based) code will always be a bit more precise than the SSE4.2 and lower code paths (SSE-based, of course; we don't consider the FPU here). The difference you see must be considered in relative terms only, not absolute ones. The answer from Thomas is correct: the difference is on the order of the weight of the least significant bits of the 32-bit FP mantissa.
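
The effect of skipping the intermediate rounding can be seen in a small sketch. It does not use IPP or real FMA hardware: `fused_emulated` is a hypothetical helper that imitates a fused multiply-add by forming the product in double, where the product of two floats is exact. It can differ from a true fused operation (such as the C99 `fmaf`) in rare double-rounding cases, so treat it as an illustration only.

```c
#include <assert.h>
#include <float.h>

/* a*b + c with the product rounded to float first (two roundings). */
static float mul_then_add(float a, float b, float c)
{
    volatile float p = a * b;  /* volatile: keep the rounded float product */
    return p + c;
}

/* Emulated fused multiply-add: the product is kept exact in double,
   so rounding to float happens only at the very end. */
static float fused_emulated(float a, float b, float c)
{
    /* Assumption: IEEE-style float (24-bit) and double (>= 48-bit) mantissas. */
    assert(FLT_MANT_DIG == 24 && DBL_MANT_DIG >= 48);
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}
```

For example, with a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. Rounding it to float drops the 2^-24 term, so the two-step version returns 0, while the fused version returns 2^-24. An FFT performs many such operations, so small discrepancies of this kind can accumulate into the last few bits of the result.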

regards, Igor

BTW, you can emulate a 2D 64f FFT by simply applying the 1D 64f FFT to all rows and then to all columns. The 32f one uses the same approach internally and is based on the 1D functions.
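
The row-then-column approach could be sketched like this. No IPP calls are used here: `cplx64`, `fft1d_fn`, `fft2d_rowcol` and `dft4` are hypothetical names, and `dft4` is only a tiny exact 4-point DFT standing in for a real 1D 64f FFT so that the sketch stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 64f complex element. */
typedef struct { double re, im; } cplx64;

/* Any in-place 1D transform over n elements spaced `stride` apart. */
typedef void (*fft1d_fn)(cplx64 *data, size_t n, size_t stride);

/* Stand-in 1D transform: exact 4-point forward DFT
   (the twiddle factors are just 1, -i, -1, i). */
static void dft4(cplx64 *d, size_t n, size_t stride)
{
    assert(n == 4);
    const cplx64 x0 = d[0], x1 = d[stride], x2 = d[2 * stride], x3 = d[3 * stride];
    d[0].re          = x0.re + x1.re + x2.re + x3.re;
    d[0].im          = x0.im + x1.im + x2.im + x3.im;
    d[stride].re     = x0.re + x1.im - x2.re - x3.im;
    d[stride].im     = x0.im - x1.re - x2.im + x3.re;
    d[2 * stride].re = x0.re - x1.re + x2.re - x3.re;
    d[2 * stride].im = x0.im - x1.im + x2.im - x3.im;
    d[3 * stride].re = x0.re - x1.im - x2.re + x3.im;
    d[3 * stride].im = x0.im + x1.re - x2.im - x3.re;
}

/* 2D transform of a row-major rows x cols image:
   transform every row, then every column. */
static void fft2d_rowcol(cplx64 *img, size_t rows, size_t cols, fft1d_fn fft1d)
{
    for (size_t r = 0; r < rows; ++r)
        fft1d(img + r * cols, cols, 1);   /* row r: contiguous elements */
    for (size_t c = 0; c < cols; ++c)
        fft1d(img + c, rows, cols);       /* column c: stride of one row */
}
```

In practice the function pointer would wrap the library's 1D 64f transform. A column can be passed with a stride as above only if that transform supports strided data; otherwise each column has to be copied into a contiguous buffer first.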

Kamil_Z_ · ‎03-12-2014

Thank you very much for both explanations.

Best regards,
Kamil

Stephan1 · ‎04-02-2014

Hi Igor and Thomas

I had a similar experience as Kamil and I wonder:

In case a program wants to achieve identical results independent of the FFT library or the hardware running it:
Should the lowest (two) significant bits in the mantissa of the (F32) results then simply be masked to zero after the FFT?
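
As a minimal sketch of what such masking would do, assuming IEEE 754 single precision, the hypothetical helper `mask_low_bits` below clears the lowest mantissa bits of a float, and `float_from_bits` is a small utility for building test values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clear the lowest `bits` mantissa bits of x (truncation toward zero). */
static float mask_low_bits(float x, unsigned bits)
{
    uint32_t u;
    assert(bits < 23);                      /* stay inside the 23-bit mantissa field */
    memcpy(&u, &x, sizeof u);
    u &= ~((UINT32_C(1) << bits) - 1u);     /* zero the low bits */
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Build a float from its raw bit pattern. */
static float float_from_bits(uint32_t u)
{
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}
```

Applied to the two values from the first post (244466.937500 and 244466.812500), masking two bits leaves them different, because they are 8 float steps apart. More generally, masking cannot guarantee identical results: two values that are only one step apart but sit on opposite sides of a truncation boundary, such as 1.0f and the float just below it, still differ after masking.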

Thanks in advance for your thoughts / comments.

Best regards, Stephan