I have a C++ project, called A, built with the Intel Compiler.
Project A has two functions (X and Y). Its output is a DLL that exports both functions.
I have another project, called B, written in C#, which imports both functions (X and Y) from project A.
Project B calls function X but never calls function Y.
While executing project B, a bug occurred: the value of float.Epsilon changed from 1.401298E-45 to zero. It seems the last mantissa bit of float.Epsilon is lost.
When I remove function Y from project A, project B executes normally.
Function Y of project A contains only one line of code, which allocates memory with the ippsMalloc_64f function.
I don't know the root cause of this bug. Could you please help me?
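To illustrate the "last mantissa bit" observation, here is a minimal sketch, assuming IEEE 754 32-bit floats. The value 1.401298E-45 is the smallest positive subnormal float, which C++ exposes as std::numeric_limits<float>::denorm_min(). The helper names float_bits and is_subnormal are made up for this example.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Returns the raw IEEE 754 bit pattern of a 32-bit float.
// (Hypothetical helper, not part of any library.)
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// True when the value is subnormal (denormal):
// exponent field is all zeros and the mantissa field is non-zero.
bool is_subnormal(float value) {
    const std::uint32_t bits = float_bits(value);
    return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}
```

With these helpers, float_bits(std::numeric_limits<float>::denorm_min()) is 0x00000001: only the lowest mantissa bit is set. Because the value is subnormal, it is exactly the kind of number that flush-to-zero style modes replace with zero.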
ippsMalloc_64f can't cause the changes you've described: it's just a wrapper around the run-time malloc() that aligns the allocated buffer, and it doesn't execute a single FP instruction. Could you provide a reproducer for this issue?
The cause is that when the program calls a function in the IPP DLL, the process of loading the IPP DLL sets FZ (bit 15) in the MXCSR register to 1.
Briefly, the main steps in the sample are as follows.
- Call a method of the IPP project for the first time (at this point, FZ (bit 15) is set to 1 in the MXCSR register).
- Use C code to reset FZ (bit 15) to 0 with _mm_setcsr().
- When I call a method of the IPP project for the second time, FZ (bit 15) is still 0.
For more details, please refer to the attached file.
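The check and reset steps listed above could look roughly like the sketch below. It is not the attached sample, only an illustration, and it assumes an x86 or x86-64 build where the SSE intrinsics from <xmmintrin.h> are available. The names kMxcsrFlushToZero, flush_to_zero_enabled and set_flush_to_zero are hypothetical.

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr

// FZ (flush-to-zero) is bit 15 of the MXCSR register.
constexpr unsigned int kMxcsrFlushToZero = 1u << 15;

// Reports whether FZ is currently set for the calling thread.
bool flush_to_zero_enabled() {
    return (_mm_getcsr() & kMxcsrFlushToZero) != 0;
}

// Sets or clears FZ and leaves every other MXCSR bit unchanged.
void set_flush_to_zero(bool enable) {
    unsigned int csr = _mm_getcsr();
    csr = enable ? (csr | kMxcsrFlushToZero) : (csr & ~kMxcsrFlushToZero);
    _mm_setcsr(csr);
}
```

Calling set_flush_to_zero(false) after the first IPP call corresponds to the second step. Note that MXCSR is per-thread state, so the change only applies to the thread that makes the call.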
FTZ mode is set by the Intel compiler in the init section that is called before DllMain(); this is the default compiler behavior if the "-fp:precise" switch is not used. (IPP is built with the Intel compiler.) In your case you can set this bit to the state you need at the very beginning of your application with a call to
IPPAPI( IppStatus, ippSetFlushToZero, ( int value, unsigned int* pUMask ))
// value - !0 or 0 - set or clear the corresponding bit of MXCSR
// pUMask - pointer to user store current underflow exception mask
// ( may be NULL if don't want to store )
// ippStsNoErr - Ok
// ippStsCpuNotSupportedErr - the mode is not supported
which is declared in ippcore.h
1. About setting "-fp:precise": I tried setting it in Configuration Properties > C/C++ > Code Generation > Floating Point Model.
- When I set Floating Point Model to "-fp:precise", FZ (bit 15) is set to 1 after calling a method of IPP.
- When I set Floating Point Model to "fp:strict" or "fp:fast", FZ (bit 15) is still set to 1 after calling a method of IPP.
(Environment: Windows 7, 64-bit)
2. If I use ippSetFlushToZero to reset FZ (bit 15) to 0, are there any problems compared with the default case (FZ (bit 15) = 1)?
For example, decreased performance, wrong results from the IPP methods' calculations, etc.
3. Is there any better solution than those mentioned above?
For example, I tried the following steps.
- Before calling IPP methods, I set FZ (bit 15) to 1.
- After calling IPP methods, I reset FZ (bit 15) to 0.
This way, processing is the same as in the default case: the IPP methods run with "flush to zero" mode enabled (FZ (bit 15) is 1).
If you have any suggestions for this case, please contact me. Thank you for your support.
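The set-before / reset-after idea from point 3 can be sketched as a small scope guard. This is only an illustration under the assumption of an x86 or x86-64 build with <xmmintrin.h>; ScopedFlushToZero is a hypothetical class name, not an IPP or compiler API.

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr

// Forces FZ (MXCSR bit 15) on or off for the lifetime of the object and
// restores the previously saved MXCSR value when the scope ends.
// Restoring the whole register also discards any exception status flags
// that were raised inside the scope.
class ScopedFlushToZero {
public:
    explicit ScopedFlushToZero(bool enable) : saved_csr_(_mm_getcsr()) {
        const unsigned int fz = 1u << 15;
        _mm_setcsr(enable ? (saved_csr_ | fz) : (saved_csr_ & ~fz));
    }
    ~ScopedFlushToZero() { _mm_setcsr(saved_csr_); }

    // Non-copyable: each guard owns exactly one saved register value.
    ScopedFlushToZero(const ScopedFlushToZero&) = delete;
    ScopedFlushToZero& operator=(const ScopedFlushToZero&) = delete;

private:
    unsigned int saved_csr_;
};
```

A block such as `{ ScopedFlushToZero guard(true); /* call IPP methods here */ }` would run the calls with FZ = 1 and put the caller's setting back afterwards, on the calling thread only.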
You don't need to set fp:precise. As I said, IPP is built without this switch and the FTZ bit is set to 1 in the IPP DllMain() function; this behavior can't be changed in your app. You should not switch this bit off and on before and after each IPP call; you should do this only once, after the IPP DLL has been loaded and initialized.
This bit doesn't affect the correctness of IPP functions; in some rare cases it can affect performance only. For example, the behavior of IIR functions strongly depends on coefficients and input data because of the feedback dependency and limited FP accuracy (24-bit significand for 32f and 53-bit for 64f, so rounding is performed after each add or mul operation), and therefore it is very easy to leave the normalized FP number representation through over/underflow. In this case, if FTZ == 1, the HW treats such numbers as zeroes (normal-speed execution), while if FTZ is not set the HW keeps trying to obtain the most correct result, and instead of executing a simple add or mul instruction the CPU invokes a special subroutine from its ROM, which can lead to significant performance degradation.
Such cases are very rare if correct algorithms and data are used. If you see significant unexpected performance degradation in some piece of your FP code because of under/overflows of the FP numbers you use, you should reconsider the algorithm, coefficients, data range, etc., or set the FTZ and DAZ bits to 1. The Intel compiler without the fp:precise switch sometimes generates speculative but fast code and sets FTZ to 1 in order to avoid any slowdown for the reasons described above.
In order to set FTZ = 0, I tried to set the FTZ flag as instructed on the page below: https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-1659EAE1-583E-44EE-BDEA-7C68C46061C7.htm
I did the following steps:
- Open Configuration Properties > C/C++ > Command Line.
- Add "/Qftz-" to the "Additional Options" field and click OK.
But when I rebuild the project, the output is
"1>icl: command line remark #10148: option '/Qftz-' not supported" and I cannot set FTZ = 0 (*)
Could you please answer the questions below:
- Is the way to set FTZ = 0 described on that page OK?
- Is my setting method correct?
- Please tell me the reason for the error (*) when rebuilding the project and how to fix it. Is the reason the version of IPP?
(I use: Visual Studio 2008 SP1, Intel Parallel Studio 2011)
Thank you for your support.