Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

ippStaticInit and ippGetCpuType fail to detect the correct CPU

Jan_Meyer
Beginner
Hi,
I have spent some time tracking down a floating-point underflow problem in our code and finally traced it back to whether or not we call ippStaticInit.
If I call ippStaticInit, I get the underflow; if I don't, all is good. I went a step further and played with ippGetCpuType and ippInitCpu, and found that IPP says my CPU is ippCpuPenryn, which it is, but if I manually tell ippInitCpu to set ippCpuX8664, then I don't get any underflow.
I double-checked my CPU specs with CPU-Z and have attached a screenshot of the info.
Here is the simple code example that fails for me in both 32- and 64-bit compilation modes:
--------------------------------------------------------------------------------------------------------------------
IppCpuType cpu = ippGetCpuType();

// This call to ippInitCpu succeeds, but then ippsMagnitude_32f returns infinity:
//if (ippInitCpu(cpu) != ippStsNoErr)
// This call causes correct dispatching (on my x64 machine) and ippsMagnitude_32f
// returns the correct result:
//if (ippInitCpu(ippCpuX8664) != ippStsNoErr)
//    return 1;

// This causes incorrect dispatching, probably because of the incorrect CPU
// detection seen above...
ippStaticInit();

float pfTemp = 0.0f;
float pfFFTOutReal = 8.9981476e-020f;
float pfFFTOutIm = -5.9999533e-020f;

// This is the formula inside ippsMagnitude_32f:
// dst = sqrt( src.re^2 + src.im^2 )
// This is not using IPP and works every time; fanswer is 1.0815087e-019
//float fanswer = sqrtf( powf(pfFFTOutReal, 2) + powf(pfFFTOutIm, 2) );

// Depending on whether correct dispatching is done, this returns either the
// correct result (1.0815087e-019) in pfTemp or infinity.
ippsMagnitude_32f(&pfFFTOutReal, &pfFFTOutIm, &pfTemp, 1);
---------------------------------------------------------------------------------------------------------------
So why does the underflow happen? I am assuming the dispatcher is calling the wrong function for my CPU.
This was reported by our testing team, so I know it is not unique to my CPU or machine.
We prefer to use static linking with dispatching, and I have double- and triple-checked the order in which we include the IPP libs, as per the spec.
I have also updated to 6.1.5 to check whether this issue has been resolved, but it hasn't.
Cheers
Jan
1 Solution
Ying_H_Intel
Employee
Hi Jan,

Thanks for raising the issue here. I haven't run the test myself, but according to your description, it should be related to floating-point calculation precision and its implementation on the CPU. Your input and output are very small numbers:
pfFFTOutReal is 0.000000000000000000089981476
and the expected answer is 0.00000000000000000010815087
These may produce variable results when different calculation instructions are used on the CPU.

From the CPUID image, your CPU is a Penryn CPU, so ippStaticInit() and ippGetCpuType() should be working correctly.

I believe the problem is this: under x64, when you call ippStaticInit(), IPP dispatches processor-specific optimized code. If your CPU is Penryn, the "y8" code is used. If you don't call it, or if you call ippInitCpu(ippCpuX8664), then the C-optimized "mx" code is used. Thus the mx code returns the "right" result, matching what you compute from sqrt(pow(real,2)+pow(im,2)), while "y8", which uses different optimized instructions, returns an "INF" value.

Please see more in http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-understanding-cpu-optimized-code-used-in-intel-ipp/

The input and output values are at the edge of single-precision range, so you may try ippsMagnitude_64f and see whether it gives the same result.

Regards,
Ying

P.S some information from Intel compiler documentation
Understanding Floating-point Operations

Programming Tradeoffs in Floating-point Applications

In general, the programming objectives for floating-point applications fall into the following categories:

  • Accuracy: The application produces results that are close to the correct result.

  • Reproducibility and portability: The application produces consistent results across different runs, different sets of build options, different compilers, different platforms, and different architectures.

  • Performance: The application produces fast, efficient code.

Based on the goal of an application, you will need to make tradeoffs among these objectives. For example, if you are developing a 3D graphics engine, then performance may be the most important factor to consider, and reproducibility and accuracy may be your secondary concerns.

The Intel Compiler provides several compiler options that allow you to tune your applications based on specific objectives. Broadly speaking there are the floating-point specific options, such as the -fp-model (Linux* and Mac OS* X operating systems) or /fp (Windows* operating system) option, and the fast-but-low-accuracy options, such as the -fimf-max-error (Linux* and MacOS* X) or /Qimf-max-error (Windows*) option.The compiler optimizes and generates code differently when you specify these different compiler options. You should select appropriate compiler options by carefully balancing your programming objectives and making tradeoffs among these objectives. ...

2 Replies
Jan_Meyer
Beginner
I tried the 64f versions of the functions and they produce the correct result while still using ippStaticInit (which selects Penryn).
Thanks for the help. I didn't realize the x8664 option selects the C-optimized code.