I have a client that is migrating a large C++ software base from 32- to 64-bit code in MS Visual Studio. One of the problems they are having is that the 32- and 64-bit versions of the C library exp() function produce results that differ by 1ulp for some operands, and this is causing regression tests to fail. One potential solution I am considering is to use Intel MKL instead of the Microsoft library. So I have a few questions:

1. Do the 32-bit and 64-bit builds of MKL produce identical results for exp() and other transcendental functions, for all operands, assuming that SSE2 is enabled for our 32-bit code?

2. Although the client has mostly Intel hardware, I believe they have a few AMD Opteron-based server farms. Does MKL work on Opterons? If so, are there any performance penalties if MKL is used in place of the Microsoft library?

3. Is there any way of getting the Microsoft .NET framework to use MKL? I assume it may have the same 32/64-bit differences, although I haven't tested that yet.

4. What other benefits might my client gain by switching to MKL?

Thanks in advance - dc42

You would have to call an MKL exp function explicitly (possibly by macro substitution) to use MKL as a solution, if I understand the question you have posed. MKL doesn't provide scalar math functions, although you could treat a scalar as a vector of length 1. You would choose the highest-accuracy version; fast vector exponentials normally permit more than 1 ulp of variation.
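A minimal sketch of that vector-of-length-1 approach, assuming MKL's standard VML entry points `vdExp` and `vmlSetMode` (the wrapper name `exp_consistent` and the `HAVE_MKL` guard are illustrative, not part of MKL):

```cpp
#include <cmath>
#ifdef HAVE_MKL
#include <mkl_vml.h>   // vdExp, vmlSetMode, VML_HA
#endif

// Route a scalar exp() call through MKL's vector exponential by treating
// the argument as a vector of length 1, using the high-accuracy (HA) mode.
double exp_consistent(double x)
{
#ifdef HAVE_MKL
    vmlSetMode(VML_HA);   // request the highest-accuracy VML version
    double y;
    vdExp(1, &x, &y);     // vector of length 1
    return y;
#else
    return std::exp(x);   // fallback when building without MKL
#endif
}
```

In real code you would set the VML mode once at startup rather than on every call, and redirect call sites to the wrapper by macro substitution or a project-wide rename.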

You might consider whether a trial of Intel C++ (which includes MKL) would determine whether there is a better solution. Intel C++ has its own library of math functions and supports the option /Qimf-arch-consistency:true, with which you can give a list of functions for which you want a version that gives the same result on AMD and Intel platforms. You would also want to set an /arch option, such as /arch:SSE3, which works on all your platforms. You may also need to consider the option /fp:source, which disables vectorization optimizations where data alignment could produce differences of 1 ulp.

MKL libraries (like C++ compiled code) work as unmanaged code under .NET. See this:

Have you tried the **_set_SSE2_enable** CRT function? (Take a look at **MSDN** for a description of the function and what it does.) Could you provide more technical details for '...**results that differ by 1 ulp**...'? (a test case? outputs? etc.)

Tim, thanks for your reply, that was very helpful. Compatibility between results from AMD and Intel processors is not a big issue for us; we've found that differences are rare and we have worked around them. Compatibility between 32- and 64-bit results matters much more. Based on your reply, the Intel compiler might help us, but not MKL.

Sergei, we haven't tried the evaluation version of MKL yet. It would take at least two days to rebuild the code base using MKL and run the tests, and the purpose of my question was to determine whether it would be a worthwhile exercise. I didn't call _set_SSE2_enable because, based on Microsoft's documentation, I don't need to: it is the default on SSE2-enabled processors. I did check with the debugger that it was following the SSE2 code path inside the exp() function. Disassembling the exp() functions revealed that the Microsoft 64-bit and 32-bit SSE2 exp functions use different algorithms.

There are actually two versions of the exp() function: one is exp() and the other is _CIexp; both of them call into the same SSE2 code implementation.

Hi Sergey, in case you still want to do verification, I am using Visual Studio 2010 under Windows 7 64-bit. Some of the operand values for which the Microsoft implementations of exp() return different values in 32-bit SSE2 and 64-bit builds are:

-0.083296271058701077
0.0066998698389698335
0.0066998698389698344
0.0066998698389698352

Hi illyapolak, is _CIexp the one that gets called if you have SSE2 code generation and intrinsic functions enabled?

Sergey touched on a good point. With the Microsoft compiler, you need to set /arch:SSE2 for the 32-bit compiler to generate SSE2 code, as the X64 one does by default. In VS2012 you have the option to generate AVX code with /arch:AVX. I don't know the details of how these options impact the Microsoft math libraries, but you could expect numerical differences, particularly with float data types, between default 32-bit non-SSE code and 64-bit SSE2, even if the math functions were identical.

The Intel compilers changed the IA32 default to /arch:SSE2 a few years ago, so that the IA32 and Intel64 defaults match. Only the 32-bit compilers offer the choice of x87 code, which is the default for Microsoft CL and is invoked by /arch:IA32 for Intel ICL. The latter option may give you the same math functions that CL uses by default.

Hi David,

I do not know when the compiler chooses to call the _CIexp version of the exp function. By looking at the msvcrt DLL I can see that the two versions have different entry points and different ordinals. The two implementations have different prologs, but the main calculation block is the same. From the code I can see that some kind of Horner scheme is used.

>>>0.0066998698389698335 0.0066998698389698344 0.0066998698389698352 >>>

These values differ from each other only by about one part in 10^16, i.e. in the last significant decimal digits.

Hi Tim, we're using VS2010 and we're already setting /arch:sse2 in an attempt to get the 32- and 64-bit results to match as closely as possible. I have verified with the debugger that the SSE2 code for 32-bit exp() is executed.

Hi illyapolak, yes, those 3 operands each differ from the next in the series by only 1 ulp. All of them produce the same value for exp(), with the 32-bit SSE2 and 64-bit results differing by 1 ulp. The problem we have is not that some of the results are more than 0.5 ulp out; it's the lack of reproducibility between 32- and 64-bit builds of the applications, which means that test results have to be inspected manually.
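One way to keep such tests automated despite a 1-ulp discrepancy is to compare results by ULP distance rather than bit-for-bit equality. A minimal sketch (the function names are illustrative, not from any particular library):

```cpp
#include <cstdint>
#include <cstring>
#include <cmath>    // std::nextafter, handy for testing the comparator

// Map a double's bit pattern onto a monotonically increasing integer scale,
// so that adjacent doubles map to adjacent integers (+0.0 and -0.0 both map to 0).
static int64_t ordered_bits(double d)
{
    int64_t i;
    std::memcpy(&i, &d, sizeof i);
    return i < 0 ? INT64_MIN - i : i;
}

// Number of representable doubles between a and b (0 means identical).
// Assumes a and b are finite and not so far apart that the difference overflows.
int64_t ulp_distance(double a, double b)
{
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

A regression harness could then accept `ulp_distance(got, expected) <= 1` instead of exact equality, removing the need to inspect 1-ulp mismatches by hand.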

>>>I have verified with the debugger that the SSE2 code for 32-bit exp() is executed>>>

Can you post the code mentioned above?

>>>I assume it may have the same 32/64-bit differences, although I haven't tested that yet.>>>

I think that you can expect the same differences, because the .NET runtime will probably call the C runtime functions.

>>>The problem we have is not that some of the results are more than 0.5 ulp out, it's the lack of reproducibility between 32- and 64-bit builds of the applications, which means that test results have to be inspected manually>>>

This may be a silly question, but can you identify the input arguments for which your regression tests fail and provide precalculated results to the ported application?

**whether it would be a worthwhile exercise**... I don't think some projects could allow proceeding with a change to the main code, that is, integration of a very big library like MKL, without having a clear picture (a **complete** test case!) of how the MKL **exp()** function could change matters. You could simply spend even more days on integration of MKL and get a negative result. Even if a negative result is also a result, a couple of days would be lost.

Hi illyapolak,

>>Can you post mentioned above code?<<

You mean the disassembly? That will have to wait until I am back in the office.

>>I think that this is silly question, but can you indentify those input arguments when your regression test fail and provide precalculated results to ported application?<<

That would mean using different regression test data for the 32- and 64-bit builds. We want to use the same data for both builds, partly because it saves time, partly as a check that we haven't done anything wrong in porting the app to 64 bits.

Hi Sergey,

>>Here are binary representations of these numbers:

0.0066998698389698335 -> Binary = 0x3BDB8A95 = 00111011 11011011 10001010 10010101

0.0066998698389698344 -> Binary = 0x3BDB8A95 = 00111011 11011011 10001010 10010101

0.0066998698389698352 -> Binary = 0x3BDB8A95 = 00111011 11011011 10001010 10010101<<

That may be true in single-precision (float) maths, but we're using double precision. Expressed as doubles, those 3 numbers are different. More precisely, they are decimal approximations of 3 adjacent binary double-precision (64-bit) IEEE floating-point numbers in a series generated using the _nextafter function (see http://msdn.microsoft.com/en-us/library/h0dff77w(v=vs.100).aspx). I'll post the test code when I am back in the office.

>>Can you post the code mentioned above?<<

>>You mean the disassembly? That will have to wait until I am back in the office.<<

Yes. Thank you.

>>>0.0066998698389698335 -> Binary = 0x3BDB8A95 = 00111011 11011011 10001010 10010101>>>

The decimal value has a precision of 17 significant digits, so it should be represented by a double-precision floating-point value, not a single-precision one.

>>but we're using double precision. Expressed as doubles, those 3 numbers are different<<

David, that really makes a difference (please don't skip such details next time!).

**[ Float ]**
0.0066998698389698335 -> Binary = 0x3BDB8A95 = 00111011 11011011 10001010 10010101
0.0066998698389698344 -> Binary = 0x3BDB8A95 = 00111011 11011011 10001010 10010101
0.0066998698389698352 -> Binary = 0x3BDB8A95 = 00111011 11011011 10001010 10010101

**[ Double ]**
0.0066998698389698335 -> Binary = 0x3F7B71529D88875E = Binary A
0.0066998698389698344 -> Binary = 0x3F7B71529D88875F = Binary B
0.0066998698389698352 -> Binary = 0x3F7B71529D888760 = Binary C

Binary A = 00111111 01111011 01110001 01010010 10011101 10001000 10000111 01011110
Binary B = 00111111 01111011 01110001 01010010 10011101 10001000 10000111 01011111
Binary C = 00111111 01111011 01110001 01010010 10011101 10001000 10000111 01100000

David, which binary, that is **Binary A**, **Binary B**, or **Binary C**, gives the right result(s) in your regression tests?

**Note:** Data for **Long Double** will be posted later.

**[ Long Double ]**
0.0066998698389698335 -> Binary = 0x... = not ready yet
0.0066998698389698344 -> Binary = 0x... = not ready yet
0.0066998698389698352 -> Binary = 0x... = not ready yet

Have you tried the **_control87** CRT function, which controls the precision (of calculations) and other parameters of the FP unit?
