Intel® oneAPI Math Kernel Library

MKL vs Microsoft exp() function

David_C_12
Beginner

I have a client that is migrating a large C++ software base from 32- to 64-bit code in MS Visual Studio. One of the problems they are having is that the 32- and 64-bit versions of the C library exp() function produce results that differ by 1 ulp for some operands, and this is causing regression tests to fail. One potential solution I am considering is to use Intel MKL instead of the Microsoft library. So I have a few questions:
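For concreteness, here is a minimal sketch of the kind of check that exposes such differences (the sweep range and step are illustrative, and this is not the full test program referred to later in the thread): it prints each exp() result together with its exact 64-bit pattern, so the output of a 32-bit run and a 64-bit run can simply be diffed.

// Sketch only: dump exp() results as raw bit patterns for later comparison
// between a 32-bit build and a 64-bit build of the same program.
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    for (double x = -10.0; x <= 10.0; x += 0.0001) {
        const double r = std::exp(x);
        std::uint64_t bits;
        std::memcpy(&bits, &r, sizeof bits);    // exact bit pattern of the result
        std::printf("%.17g %016llx\n", x, static_cast<unsigned long long>(bits));
    }
    return 0;
}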

1. Do the 32-bit and 64-bit builds of MKL produce identical results for exp() and other transcendental functions, for all operands, assuming that SSE2 is enabled for our 32-bit code?

2. Although the client has mostly Intel hardware, I believe they have a few AMD Opteron-based server farms. Does MKL work on Opterons? If so, are there any performance penalties if MKL is used in place of the Microsoft library?

3. Is there any way of getting the Microsoft .NET framework to use MKL? I assume it may have the same 32/64-bit differences, although I haven't tested that yet.

4. What other benefits might my client gain by switching to MKL?

Thanks in advance - dc42

Bernard
Valued Contributor I

Can you post disassembly of exp()?

David_C_12
Beginner

iliyapolak wrote:

Can you post disassembly of exp()?

I'm not sure I should do that; it might incur the wrath of Microsoft, as it's their copyright. You can disassemble it yourself quite easily if you run the test program I gave under the debugger and step into exp().

SergeyKostrov
Valued Contributor II
>>...When this program is run, the results for 32- and 64-bit runs differ for a few values in the range...

As I've already mentioned, if these differences are less than epsilon for the double-precision floating-point (DP FP) type, then all these values could be considered the same. If you need consistent results across all platforms, try masking the last 6 (or better, 7) bits of the binary value that represents the DP FP number and you will get the same results. Of course, you could proceed your own way, but I think that a change from the CRT function exp() to the MKL vector exp functions doesn't make sense.

>>...
>>d = _nextafter(d, dir);
>>...

Why do you need _nextafter?
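A minimal sketch of how that masking could be coded (an illustration of the suggestion above, not Sergey's actual code): clear the lowest 7 significand bits before comparing, so values that differ only in those bits become bit-identical.

#include <cstdint>
#include <cstring>

// Zero the last 7 significand bits of a double so that results which differ
// only in those low-order bits compare equal.
double mask_low_bits(double v)
{
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    bits &= ~static_cast<std::uint64_t>(0x7F);
    std::memcpy(&v, &bits, sizeof bits);
    return v;
}

// Usage in a regression check: compare mask_low_bits(a) == mask_low_bits(b).

One caveat: two results that land on either side of a masking boundary (a multiple of 128 in the low bits) can still compare unequal, so this reduces rather than eliminates spurious mismatches.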
Shane_S_Intel
Employee

Hi David, this is definitely becoming a lengthy thread …

You are correct that we do NOT want disassembled sources from other libraries posted here. Doing so may violate a licensing agreement.

Now onto your original questions:

  • Do the 32-bit and 64-bit builds of MKL produce identical results for exp() and other transcendental functions, for all operands, assuming that SSE2 is enabled for our 32-bit code?

Intel MKL does not currently ensure that the results from our 32-bit and 64-bit libraries match across all MKL domains. I'll check with the team to confirm this statement for the vector math domain specifically. We do have a new conditional numerical reproducibility feature, but it does not apply across 32-bit and 64-bit OSs; see http://software.intel.com/en-us/articles/conditional-numerical-reproducibility-cnr-in-intel-mkl-110

  • Although the client has mostly Intel hardware, I believe they have a few AMD Opteron-based server farms. Does MKL work on Opterons?

Yes.

  •  If so, are there any performance penalties if MKL is used in place of the Microsoft library?

It is difficult to say; it depends on the application and the functions called. Intel MKL contains optimized vector implementations, while you appear to be interested in scalar or simd-style math functions, so there may be additional call overheads associated with the switch. You may want to create a timing program that compares the functions of interest against the latest version of Intel MKL (a minimal timing sketch along these lines follows this list of answers). Alternatively, if possible, try your application with the Intel compiler; its scalar math library and simd-style short vector math library are also highly optimized. This information may also be of use: http://software.intel.com/sites/products/documentation/hpc/mkl/vml/vmldata.htm

  • Is there any way of getting the Microsoft .NET framework to use MKL?

Here is a write-up on a related subject – maybe it will help: http://software.intel.com/en-us/articles/some-more-additional-tips-how-to-call-mkl-from-your-c-code

  • I assume it may have the same 32/64-bit differences, although I haven't tested that yet.

The .NET aspect should not change how Intel MKL behaves internally. So yes, expect differences.

  • What other benefits might my client gain by switching to MKL?

The key benefit of Intel MKL is performance, though overall application gains are very much application specific. I recommend that you try the Intel compiler first and see how that works for you. If you can identify where the code calls elementary functions on lengthy double/single precision vectors, Intel MKL's vector math functions may help boost performance. The conditional numerical reproducibility feature in Intel MKL (and the Intel compiler) should also give you consistent results across a variety of hardware platforms, assuming the application executes on the same OS.
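For reference, here is a minimal timing sketch of the comparison suggested above (illustrative only: the vector length and inputs are arbitrary, the program must be linked against MKL, and real measurements should use the client's data and build options).

// Compare a scalar exp() loop against MKL's vector vdExp() on the same data.
#include <mkl.h>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

int main()
{
    const MKL_INT n = 1000000;
    std::vector<double> a(n), r1(n), r2(n);
    for (MKL_INT i = 0; i < n; ++i)
        a[i] = -5.0 + 10.0 * i / n;             // arbitrary inputs in [-5, 5)

    auto t0 = std::chrono::steady_clock::now();
    for (MKL_INT i = 0; i < n; ++i)
        r1[i] = std::exp(a[i]);                 // scalar CRT/compiler exp()
    auto t1 = std::chrono::steady_clock::now();
    vdExp(n, a.data(), r2.data());              // MKL vector exp
    auto t2 = std::chrono::steady_clock::now();

    std::printf("scalar exp(): %.3f ms, vdExp(): %.3f ms\n",
                std::chrono::duration<double, std::milli>(t1 - t0).count(),
                std::chrono::duration<double, std::milli>(t2 - t1).count());
    return 0;
}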

Thanks, Shane

SergeyKostrov
Valued Contributor II
This is a short follow up:

[ David wrote ] >>...this is causing regression tests to fail...

David, could you explain how these differences (once again, less than epsilon for the DP FP type) affect some real processing, please?

Note: Sorry, this is generic and not related to MKL... There is a possibility that your client will always have some differences in numbers calculated on different platforms, that is, 32-bit and 64-bit. In mission-critical software systems, for example in healthcare, finance, aerospace, geomatics or defense, a concept of tolerances is used, and there are many simple ways it can be implemented. On a very complex X-ray image processing system, our mathematician with an MS degree in Mathematics simply informed us that 6 digits of precision would satisfy all the requirements the company is obliged to follow (related to the ISO 8001 standard), and everything is good if the relative error is not greater than 2% from some reference data set.
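As an illustration of that kind of tolerance check (a generic sketch only; the 2% threshold and the guard value are placeholders, not the system described above):

#include <algorithm>
#include <cmath>

// Accept a computed value if its relative error against the reference value
// is within the chosen threshold (here 2%, as in the example above).
bool within_tolerance(double computed, double reference, double rel_tol = 0.02)
{
    const double scale = std::max(std::fabs(reference), 1e-300);  // avoid dividing by zero
    return std::fabs(computed - reference) / scale <= rel_tol;
}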
SergeyKostrov
Valued Contributor II
>>...Intel MKL does not currently ensure that the results between our 32-bit and 64-bit libraries match across all MKL domains...

I provided lots of test cases and results in the thread, like:

>>Here are results on Windows 7 Professional 64-bit with a 64-bit test application:
>>
>>[ Microsoft C++ compiler - 64-bit / VS 2008 Professional Edition / Rounding of Test Values ]
>>
>>exp
>>
>>V11 = 1.006722364175213700
>>V21 = 1.006722364175213700
>>V31 = 1.006722364175213700
>>
>>IPP
>>
>>r50[0] = 1.006722364175213700
>>r50[1] = 1.006722364175213700
>>r50[2] = 1.006722364175213700
>>
>>r53[0] = 1.006722364175213700
>>r53[1] = 1.006722364175213700
>>r53[2] = 1.006722364175213700
>>
>>MKL - vdExp
>>
>>dR[0] = 1.006722364175213700
>>dR[1] = 1.006722364175213700
>>dR[2] = 1.006722364175213700
>>
>>MKL - vmdExp
>>
>>dR[0] = 1.006722364175213700
>>dR[1] = 1.006722364175213700
>>dR[2] = 1.006722364175213700

Please let me know if you get different numbers (with any of these 5 exp functions) and I'll be glad to take another look at what is wrong.
Sergey_M_Intel2
Employee

Hi David,

I represent the Intel team that develops the math libraries for Intel software products such as the compilers, MKL and IPP. A couple of notes on this thread.

There is no guarantee that any of the math libraries considered in this thread (the MSFT or Intel C++ compiler math library, MKL, or IPP) will pass the client's regression test after the migration. Formally, the 32- and 64-bit compilers are different, and they may generate slightly different code sequences, which can affect the numeric reproducibility of results and cause the regression test to fail. That may or may not be the case in your situation; it's just a general note.

As indicated above, there typically has to be some tolerance threshold that strikes the right balance between the particular application's requirements and the reality that numerical results may slightly diverge between slightly different binaries. As I mentioned before, a slightly different binary may be due to the use of different compiler versions (e.g., a newer compiler adds more optimizations and as a result changes the generated instruction sequences, even on the same OS and in the same environment), different compilers (32- vs. 64-bit, or MSFT vs. Intel), or different math libraries (assuming you transition from the standard libm shipped with the compiler to another math library such as MKL or IPP).

So my first recommendation is to reconsider whether the existing regression test matches your needs: is it really a hard requirement that the application work identically in different environments, such as a 32-bit vs. a 64-bit OS?

Please remember that the math library (the exp() function in your case) is potentially only one source of numeric deviation during the migration.

Second, as you indicated, 1 ulp differences in exp() results are rare, and these rare events cause the test results to diverge. This raises another question about the numerical stability of the algorithm used in the test. In other words, is the tolerance threshold too small, or is the numerical algorithm too sensitive to such small perturbations? Maybe the right path is to reconsider the numerical algorithm and make it more robust?

Assuming that you have done such an evaluation and come to the conclusion that the thresholds are sized correctly, the numerical algorithm is robust, and the only source of your pain is exp(): the only guaranteed recipe in this case is to use a correctly rounded math library such as the open-source CRLIBM. In that case you will get guaranteed identical results from the math library in all environments. All other "quick-and-dirty" approaches provide no guarantee that in some rare case you won't get a slightly different result.
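As a minimal sketch of what switching the sensitive call to a correctly rounded exp might look like, assuming CRLIBM's crlibm_init() and exp_rn() entry points (the exp_reproducible wrapper name is just for illustration; check the library's documentation for the exact initialization protocol and C++ linkage):

// Wrap the correctly rounded exp so it can replace the CRT call site by site.
extern "C" {
#include <crlibm.h>   // wrapped in extern "C" in case the header lacks C++ guards
}

double exp_reproducible(double x)
{
    return exp_rn(x);   // correctly rounded to nearest: one well-defined answer on every platform
}

int main()
{
    crlibm_init();      // CRLIBM requires its FPU setup before the first call on x87 targets
    const double r = exp_reproducible(0.25);
    (void)r;
    return 0;
}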

Hope it was useful,

Sergey

David_C_12
Beginner

Quite a lot of things to reply to since my last post.

Sergey K, yes, the differences in 32- and 64-bit results are very small, only 1 ulp. However, under some conditions they get multiplied, and in one case this actually caused the result of a computation to change sign. I have established that in the cases we looked at, the differences in results are not significant in the context of the application, so either value returned by exp() is acceptable. The problem is the lack of reproducibility between 32- and 64-bit builds. Every time we get a difference, someone is going to have to look at it and decide whether it is an insignificant difference attributable to 32/64-bit maths differences, or a significant difference caused by a bug introduced in migrating the code to 64 bits.

We already have a tolerance factor in our regression test results to allow for minor differences. This has been adequate in the past. Unfortunately, a single tolerance factor is not suitable for all the results we compute. We're looking at allowing different tolerances for different types of result, but I suspect that this won't prove sufficient in all cases, because the allowable tolerance may depend on the input data in some cases.

Sergey M, it's not a hard requirement that 32- and 64-bit results match to within the tolerance factor we already allow; however, it's going to make releasing software more expensive if they don't, due to the need to take a close look at test results that differ and sign them off. We can accept a one-off change in test results if we migrate to a different implementation of exp(), because we would know that our code hasn't changed and any differences in results are caused by the change of library; we can then save new regression data for use in future tests. I appreciate that there may be other reasons why 32- and 64-bit results might differ; however, we have established that in the sample we looked at, exp() was responsible for at least 90% of the test failures. So changing to a library that uses the same algorithm for computing exp() in both builds will greatly improve the situation even if it doesn't totally fix it.

SergeyKostrov
Valued Contributor II
>>...Sergey K, yes, the differences in 32- and 64-bit results are very small, only 1 ulp. However, under some conditions they get
>>multiplied, and in one case this actually caused the result of a computation to change sign...

1. We have been discussing this matter for more than 6 days, and a problem with the sign is a really new development; please try to explain what these conditions are.

2. In 2012, two problems with the Intel C++ compiler were detected related to an incorrect sign of values returned by the 'sin', 'cos' and 'mod' functions. Even if you're using the Microsoft C++ compiler, it makes sense to verify all the math CRT functions used on the project.

3. Based on my real experience with getting very consistent results on many platforms and many C++ compilers, it is possible that something else is wrong with the regression tests, or with the processing in the real application.

4. I could do a verification for Windows XP and Windows 7 with the following set of C++ compilers: Visual C++ 4.2, Visual C++ 6.x, VS 2005, VS 2008, VS 2010, VS 2012, Intel versions 8, 11 and 13, MinGW, Borland C++ and Turbo C++, if you provide a test case that simulates that change of sign.
Bernard
Valued Contributor I

David C. wrote:

Quote:

iliyapolak wrote:

Can you post disassembly of exp()?

I'm not sure I should do that; it might incur the wrath of Microsoft, as it's their copyright. You can disassemble it yourself quite easily if you run the test program I gave under the debugger and step into exp().

Sorry for asking this.

Bernard
Valued Contributor I

>>>However, under some conditions they get multiplied,>>>

Is the multiplication of the 32/64-bit difference in results a mandatory part of your code, or is it done only for testing purposes?

David_C_12
Beginner

iliyapolak wrote:

>>>However, under some conditions they get multiplied,>>>

Is the multiplication of the 32/64-bit difference in results a mandatory part of your code, or is it done only for testing purposes?

It's a necessary part of the code. It happens when we take the difference between two similar computations that use exp(), for example when calculating the derivative of a function by bumping. We know there will be a loss of precision when we do this, but as I explained before, the problem for us is the lack of consistency between the two builds, not a lack of precision.
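To illustrate the amplification being described (a made-up example, not the client's code): when two nearly equal exp() results are subtracted and divided by a small bump h, a 1-ulp change in one operand shifts the finite-difference derivative by roughly machine-epsilon/h in relative terms.

#include <cmath>
#include <cstdio>

int main()
{
    const double x = 0.25, h = 1e-9;

    // Both exp() calls rounded the same way, as in one build.
    const double dA = (std::exp(x + h) - std::exp(x)) / h;

    // Simulate the other build returning a result 1 ulp away for one operand.
    const double fxh = std::nextafter(std::exp(x + h), 0.0);
    const double dB = (fxh - std::exp(x)) / h;

    std::printf("dA = %.17g\ndB = %.17g\nrelative gap = %.3g\n",
                dA, dB, std::fabs(dA - dB) / std::fabs(dA));
    return 0;
}

With h = 1e-9 and machine epsilon about 2.2e-16, the relative gap comes out near 2e-7, which is how a "harmless" 1-ulp difference becomes visible in regression output.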

Bernard
Valued Contributor I

What about single-precision results: did they differ between the 32- and 64-bit libraries? Do you need double precision in your code?

David_C_12
Beginner

We use double precision only, so I haven't tried the single-precision versions.

SergeyKostrov
Valued Contributor II
>>...as I explained before, the problem for us is lack of consistency between the two builds, not lack of precision...

Could you post results for comparison from both builds? Unfortunately, these generic explanations are not helping. Several IDZ members responded to your problem, and some of them, including me, are trying to help you absolutely for free because the problem looks very interesting. I provided you with lots of examples and results and I didn't see follow ups. Also, if you think some details cannot be released to a public thread, then you need to proceed with a request to Intel Premier Support.
Bernard
Valued Contributor I

Does your application by design require double-precision computation?
