I have a client that is migrating a large C++ software base from 32- to 64-bit code in MS Visual Studio. One of the problems they are having is that the 32- and 64-bit versions of the C library exp() function produce results that differ by 1ulp for some operands, and this is causing regression tests to fail. One potential solution I am considering is to use Intel MKL instead of the Microsoft library. So I have a few questions:
1. Do the 32-bit and 64-bit builds of MKL produce identical results for exp() and other transcendental functions, for all operands, assuming that SSE2 is enabled for our 32-bit code?
2. Although the client has mostly Intel hardware, I believe they have a few AMD Opteron-based server farms. Does MKL work on Opterons? If so, are there any performance penalties if MKL is used in place of the Microsoft library?
3. Is there any way of getting the Microsoft .NET framework to use MKL? I assume it may have the same 32/64-bit differences, although I haven't tested that yet.
4. What other benefits might my client gain by switching to MKL?
Thanks in advance - dc42
If I understand the question you have posed, you would have to call an MKL exp function explicitly (possibly by macro substitution) to use MKL as a solution. MKL doesn't provide scalar math functions, although you could treat a scalar call as a vector of length 1. You would choose the highest-accuracy version; fast vector exponentials normally permit more than 1 ULP of variation.
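For illustration, a scalar wrapper along these lines should work (a minimal sketch using the VML functions vdExp and vmlSetMode; check the MKL documentation for the exact header and link line for your version):

    #include <mkl.h>
    #include <stdio.h>

    /* Scalar wrapper around MKL's vector exponential, treating the
       operand as a vector of length 1 as suggested above. */
    double mkl_exp(double x)
    {
        double y;
        vdExp(1, &x, &y);    /* vector exp, n = 1 */
        return y;
    }

    int main(void)
    {
        /* Request the high-accuracy (HA) VML mode; the faster LA and
           EP modes permit larger ULP errors. */
        vmlSetMode(VML_HA);
        printf("%.17g\n", mkl_exp(0.0066998698389698335));
        return 0;
    }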
You might consider whether a trial of Intel C++ (which includes MKL) would determine whether there is a better solution. Intel C++ has its own library of math functions and supports the option /Qimf-arch-consistency:true, which requests math-function versions that give the same result on AMD and Intel platforms (optionally restricted to a list of functions). You would also want to set an /arch option, such as /arch:SSE3, that works on all your platforms. You may also need the option /fp:source (the Windows spelling of -fp-model source), which disables vectorization optimizations where data alignment could produce differences of 1 ULP.
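By way of illustration, a hypothetical command line combining those options might be (option spellings vary between compiler versions, so check the documentation for yours):

    icl /arch:SSE3 /Qimf-arch-consistency:true /fp:source mysource.cpp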
MKL libraries (like compiled C++ code) work as unmanaged code under .NET.
Tim, thanks for your reply; that was very helpful. Compatibility between results on AMD and Intel processors is not a big issue for us; we've found that such differences are rare and we have worked around them. Compatibility between 32- and 64-bit results matters much more. Based on your reply, the Intel compiler might help us, but MKL itself would not.
Sergey, we haven't tried the evaluation version of MKL yet. It would take at least two days to rebuild the code base using MKL and run the tests, and the purpose of my question was to determine whether that would be a worthwhile exercise. I didn't call _set_SSE2_enable because, according to Microsoft's documentation, I don't need to: it is the default on SSE2-enabled processors. I did check with the debugger that execution follows the SSE2 code path inside the exp() function. Disassembling the exp() functions revealed that the Microsoft 64-bit and 32-bit SSE2 implementations use different algorithms.
Hi Sergey, in case you still want to verify this: I am using Visual Studio 2010 under Windows 7 64-bit. Some of the operand values for which the Microsoft implementations of exp() return different results in the 32-bit SSE2 and 64-bit builds are:
-0.083296271058701077
0.0066998698389698335
0.0066998698389698344
0.0066998698389698352
Hi illyapolak, is _Clexp the one that gets called if you have SSE2 code generation and intrinsic functions enabled?
Sergey touched on a good point. With the Microsoft compiler, you need to set /arch:SSE2 for the 32-bit compiler to generate SSE2 code, as the x64 compiler does by default. In VS2012 you also have the option to generate AVX code with /arch:AVX. I don't know the details of how these options affect the Microsoft math libraries, but you could expect numerical differences, particularly with float data types, between default 32-bit non-SSE code and 64-bit SSE2 code, even if the math functions were identical.
The Intel compilers changed the IA-32 default to /arch:SSE2 a few years ago, so that the IA-32 and Intel 64 defaults match. Only the 32-bit compilers offer the choice of x87 code, which is the default for Microsoft CL and is invoked by /arch:IA32 for Intel ICL. The latter option may give you the same math functions that CL uses by default.
I do not know when the compiler chooses to call the CI version of the exp function. By looking at the msvcrt DLL I can see that the two versions have different entry points and different ordinals. The two implementations have different prologues, but the main calculation block is the same. Looking at the code, I can see that some kind of Horner scheme is used.
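For anyone unfamiliar with the term: a Horner-scheme exp() typically reduces the argument by a multiple of ln 2 and then evaluates a polynomial with nested multiply-adds. The following is only a generic sketch of that structure; the reduction and the (Taylor, not minimax) coefficients are illustrative, not what msvcrt actually uses.

    #include <math.h>
    #include <stdio.h>

    /* Illustrative range reduction + Horner polynomial for exp(x).
       Real library code uses carefully derived minimax coefficients
       and a two-part (high/low) reduction; this only shows the shape. */
    double exp_sketch(double x)
    {
        const double LN2 = 0.69314718055994530942;
        int k = (int)floor(x / LN2 + 0.5);   /* x = k*ln2 + r */
        double r = x - k * LN2;              /* |r| <= ~ln2/2 */

        /* Degree-5 Taylor polynomial for exp(r) in Horner form */
        double p = 1.0 + r * (1.0 + r * (1.0/2 + r * (1.0/6 +
                       r * (1.0/24 + r * (1.0/120)))));

        return ldexp(p, k);                  /* scale by 2^k */
    }

    int main(void)
    {
        printf("%.17g\n%.17g\n", exp_sketch(1.0), exp(1.0));
        return 0;
    }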
Hi Tim, we're using VS2010 and we're already setting /arch:SSE2 in an attempt to get the 32- and 64-bit results to match as closely as possible. I have verified with the debugger that the SSE2 code path for the 32-bit exp() is executed.
Hi illyapolak, yes, each of those 3 operands differs from the next in the series by only 1 ulp. All of them produce the same value for exp() within a given build, but the 32-bit SSE2 and 64-bit results differ by 1 ulp. The problem we have is not that some of the results are more than 0.5 ulp out; it's the lack of reproducibility between the 32- and 64-bit builds of the applications, which means that test results have to be inspected manually.
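If it helps to automate that inspection, a small helper along these lines (my own sketch, assuming both results are finite doubles of the same sign) can report the ULP distance between the two builds' results:

    #include <stdint.h>
    #include <string.h>

    /* ULP distance between two finite doubles of the same sign.
       For positive IEEE-754 doubles the integer interpretation of
       the bit pattern is monotonic, so adjacent doubles differ by 1. */
    int64_t ulp_distance(double a, double b)
    {
        int64_t ia, ib;
        memcpy(&ia, &a, sizeof ia);
        memcpy(&ib, &b, sizeof ib);
        return ia > ib ? ia - ib : ib - ia;
    }

A test harness could then flag any case where ulp_distance(result32, result64) != 0 instead of inspecting the output by hand.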
>>>I assume it may have the same 32/64-bit differences, although I haven't tested that yet.>>>
I think you can expect the same differences, because the .NET runtime will probably call the C runtime functions.
>>>The problem we have is not that some of the results are more than 0.5 ulp out; it's the lack of reproducibility between the 32- and 64-bit builds of the applications, which means that test results have to be inspected manually>>>
I think this may be a silly question, but can you identify the input arguments for which your regression tests fail and provide precalculated results to the ported application?
>>Can you post the code mentioned above?<<
You mean the disassembly? That will have to wait until I am back in the office.
>>I think this may be a silly question, but can you identify the input arguments for which your regression tests fail and provide precalculated results to the ported application?<<
That would mean using different regression test data for the 32- and 64-bit builds. We want to use the same data for both builds, partly because it saves time, partly as a check that we haven't done anything wrong in porting the app to 64 bits.
>>Here are binary representations of these numbers:
0.0066998698389698335 -> Binary = 0x3BDB8A95 = 00111011 11011011 10001010 10010101
0.0066998698389698344 -> Binary = 0x3BDB8A95 = 00111011 11011011 10001010 10010101
0.0066998698389698352 -> Binary = 0x3BDB8A95 = 00111011 11011011 10001010 10010101<<
That may be true in single-precision (float) maths, but we're using double precision. Expressed as doubles, those 3 numbers are different. More precisely, they are decimal approximations of 3 adjacent binary double-precision (64-bit) IEEE floating-point numbers in a series generated using the _nextafter function (see http://msdn.microsoft.com/en-us/library/h0dff77w(v=vs.100).aspx). I'll post the test code when I am back in the office.
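In the meantime, the generation is essentially this (a minimal reconstruction, not the actual test code):

    #include <math.h>
    #include <float.h>
    #include <stdio.h>

    int main(void)
    {
        /* Walk through adjacent doubles starting from one of the
           failing operands quoted earlier and print exp() of each;
           comparing this output between the 32- and 64-bit builds
           shows the 1-ulp discrepancies. */
        double x = 0.0066998698389698335;
        int i;
        for (i = 0; i < 3; ++i)
        {
            printf("x = %.17g  exp(x) = %.17g\n", x, exp(x));
            x = _nextafter(x, DBL_MAX);  /* next representable double up */
        }
        return 0;
    }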
>>>0.0066998698389698335 -> Binary = 0x3BDB8A95 = 00111011 11011011 10001010 10010101>>>
A decimal value with 17 significant digits exceeds the precision of a single-precision float, so it should be represented by a double-precision floating-point value.
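A quick way to confirm this is to print the raw bit patterns both ways (my own throwaway sketch): as floats all three literals should collapse to the quoted 0x3BDB8A95, while as doubles they should occupy three adjacent bit patterns.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    static void show(double d)
    {
        float f = (float)d;              /* rounds to single precision */
        uint32_t fb;
        uint64_t db;
        memcpy(&fb, &f, sizeof fb);
        memcpy(&db, &d, sizeof db);
        printf("float: 0x%08X   double: 0x%016llX\n",
               fb, (unsigned long long)db);
    }

    int main(void)
    {
        show(0.0066998698389698335);
        show(0.0066998698389698344);
        show(0.0066998698389698352);
        return 0;
    }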