I have a client that is migrating a large C++ software base from 32- to 64-bit code in MS Visual Studio. One of the problems they are having is that the 32- and 64-bit versions of the C library exp() function produce results that differ by 1ulp for some operands, and this is causing regression tests to fail. One potential solution I am considering is to use Intel MKL instead of the Microsoft library. So I have a few questions:
1. Do the 32-bit and 64-bit builds of MKL produce identical results for exp() and other transcendental functions, for all operands, assuming that SSE2 is enabled for our 32-bit code?
2. Although the client has mostly Intel hardware, I believe they have a few AMD Opteron-based server farms. Does MKL work on Opterons? If so, are there any performance penalties if MKL is used in place of the Microsoft library?
3. Is there any way of getting the Microsoft .NET framework to use MKL? I assume it may have the same 32/64-bit differences, although I haven't tested that yet.
4. What other benefits might my client gain by switching to MKL?
Thanks in advance - dc42
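A 1 ulp disagreement is invisible in ordinary decimal output, so when comparing the two builds it can help to log the exact bit pattern of each operand and result and then diff the logs. The following is only a sketch: the helper names `double_bits_hex` and `exp_record` are made up for illustration, and it assumes `double` is a 64-bit IEEE 754 value, as it is with Visual C++ on x86 and x64.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Return the raw bit pattern of x as 16 lowercase hex digits.
inline std::string double_bits_hex(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bits
    char buf[17];
    const int n = std::snprintf(buf, sizeof buf, "%016llx",
                                static_cast<unsigned long long>(bits));
    assert(n == 16);
    (void)n;  // silence unused-variable warnings when NDEBUG is defined
    return std::string(buf);
}

// One log line: bit pattern of the operand, then of exp(operand).
inline std::string exp_record(double x) {
    return double_bits_hex(x) + " " + double_bits_hex(std::exp(x));
}
```

Writing one `exp_record` line per operand from both the 32-bit and the 64-bit build, and diffing the two files, would show exactly which operands disagree.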
Can you post disassembly of exp()?
iliyapolak wrote:
Can you post disassembly of exp()?
I'm not sure I should do that, it might incur the wrath of Microsoft as it's their copyright. You can disassemble it yourself quite easily if you run the test program I gave under the debugger and step into exp().
Hi David, this is definitely becoming a lengthy thread …
You are correct that we do NOT want disassembled sources from other libraries posted here. Doing so may violate a licensing agreement.
Now onto your original questions:
- Do the 32-bit and 64-bit builds of MKL produce identical results for exp() and other transcendental functions, for all operands, assuming that SSE2 is enabled for our 32-bit code?
Intel MKL does not currently ensure that the results between our 32-bit and 64-bit libraries match across all MKL domains. I’ll check with the team to confirm this statement for the vector math domain, specifically. We do have a new conditional numerical reproducibility feature, but it does not apply across 32-bit and 64-bit OSs, see http://software.intel.com/en-us/articles/conditional-numerical-reproducibility-cnr-in-intel-mkl-110
- Although the client has mostly Intel hardware, I believe they have a few AMD Opteron-based server farms. Does MKL work on Opterons?
Yes.
- If so, are there any performance penalties if MKL is used in place of the Microsoft library?
It is difficult to say, because it depends on the application and the functions called. Intel MKL contains optimized vector implementations, while you appear to be interested in scalar or SIMD-style math functions, so the switch may add call overhead. You may want to write a timing program that compares the functions of interest in your current library against the latest version of Intel MKL. Alternatively, if possible, try your application with the Intel compiler, whose scalar math library and SIMD-style short vector math library are also highly optimized. This information may also be of use: http://software.intel.com/sites/products/documentation/hpc/mkl/vml/vmldata.htm
- Is there any way of getting the Microsoft .NET framework to use MKL?
Here is a write-up on a related subject – maybe it will help: http://software.intel.com/en-us/articles/some-more-additional-tips-how-to-call-mkl-from-your-c-code
- I assume it may have the same 32/64-bit differences, although I haven't tested that yet.
The .NET aspect should not change how Intel MKL behaves internally. So yes, expect differences.
- What other benefits might my client gain by switching to MKL?
The key benefit of Intel MKL is performance, although overall application performance is very much application specific. I recommend that you try the Intel compiler first and see how that works for you. If you can identify where the code calls elementary functions on lengthy vectors of double- or single-precision values, Intel MKL’s vector math functions may help boost performance. The conditional numerical reproducibility feature in Intel MKL (and the Intel compiler) should also allow you to get consistent results across a variety of hardware platforms, assuming the application executes on the same OS.
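The timing comparison suggested above could start from something like the following sketch. It is an illustration under assumptions, not an Intel-provided benchmark: `time_scalar` and `make_inputs` are hypothetical helper names, only the standard library `std::exp` is exercised, and a real measurement would need warm-up runs, repetitions, and operands representative of the application.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Time one pass of a scalar function f over the inputs; returns seconds.
// The accumulated sum is written to *sink so the optimizer cannot
// discard the loop as dead code.
template <typename F>
double time_scalar(F f, const std::vector<double>& in, double* sink) {
    assert(!in.empty() && sink != nullptr);
    const auto t0 = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (double x : in) sum += f(x);
    const auto t1 = std::chrono::steady_clock::now();
    *sink = sum;
    return std::chrono::duration<double>(t1 - t0).count();
}

// n evenly spaced operands covering [lo, hi].
inline std::vector<double> make_inputs(std::size_t n, double lo, double hi) {
    assert(n > 1);
    std::vector<double> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = lo + (hi - lo) * static_cast<double>(i)
                        / static_cast<double>(n - 1);
    }
    return v;
}
```

Calling `time_scalar([](double x) { return std::exp(x); }, inputs, &sink)` gives a baseline for the current library. A candidate replacement would be wrapped in a callable and timed the same way, keeping in mind that a vector-oriented library is meant to be called on whole arrays rather than one element at a time.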
Thanks, Shane
Hi David,
I represent the Intel team that develops math libraries for Intel software products such as the compilers, MKL and IPP. A couple of notes on this thread.
There is no guarantee that any of the math libraries considered in this thread (the MSFT or Intel C++ compiler math library, MKL or IPP) will pass the client's regression tests after the migration. Formally, 32- and 64-bit compilers are different, and they may generate slightly different code sequences, which may affect the numerical reproducibility of results and cause the regression tests to fail. This may or may not be the case in your situation; it is just a general note.
As indicated above, there typically has to be a tolerance threshold that strikes the right balance between the requirements of the particular application and the reality that numerical results may diverge slightly on slightly different binaries. As I mentioned before, a slightly different binary may result from a different compiler version (e.g. a newer compiler adds more optimizations and so changes the generated instruction sequences, even on the same OS and in the same environment), a different compiler (32- vs. 64-bit, or MSFT vs. Intel), or a different math library (assuming you move from the standard libm shipped with the compiler to another math library such as MKL or IPP).
So my first recommendation is to reconsider whether the existing regression test matches your needs, e.g. is it really a hard requirement that the code works identically in different environments such as 32- vs. 64-bit OSs?
Please remember that the math library (the exp() function in your case) is potentially only one of the sources of numerical deviation during migration.
Second, as you indicated, 1 ulp differences in exp() results are rare, and these rare events cause the test results to diverge. This raises another question about the numerical stability of the algorithm used in the test. In other words, is the tolerance threshold too small, or is the numerical algorithm too sensitive to such small perturbations? Maybe the right path is to rework the numerical algorithm to make it more robust.
Suppose you have done such an evaluation and concluded that the thresholds are the right size, the numerical algorithm is robust, and the only source of your pain is exp(). In that case, the only guaranteed recipe is to use a correctly rounded math library such as the open-source CRLIBM, which gives guaranteed identical results in all environments. All other "quick-and-dirty" tests provide no guarantee that in some rare case you won't get a slightly different result.
Hope it was useful,
Sergey
Quite a lot of things to reply to since my last post.
Sergey K, yes, the differences in 32- and 64-bit results are very small, only 1 ulp. However, under some conditions they get multiplied, and in one case this actually caused the result of a computation to change sign. I have established that in the cases we looked at, the differences in results are not significant in the context of the application, so either value returned by exp() is acceptable. The problem is lack of reproducibility between 32- and 64-bit builds. Every time we get a difference, someone is going to have to look at it and decide whether it is an insignificant difference attributable to 32/64-bit maths differences, or a potentially significant difference caused by a bug introduced in migrating the code to 64 bits.
We already have a tolerance factor in our regression test results to allow for minor differences. This has been adequate in the past. Unfortunately, a single tolerance factor is not suitable for all the results we compute. We're looking at allowing different tolerances for different types of result, but I suspect that this won't prove sufficient in all cases, because the allowable tolerance may depend on the input data in some cases.
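One common way to make a tolerance independent of the magnitude of the result is to measure it in ulps rather than as a fixed absolute or relative number. The sketch below illustrates that idea only; it is not the tolerance scheme used in our regression tests, the function names are hypothetical, and it assumes IEEE 754 64-bit doubles. It also does nothing for results where a 1 ulp input difference has already been amplified by later arithmetic.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Map a double onto a signed integer line on which adjacent doubles
// differ by exactly 1 and +0.0 and -0.0 coincide.
inline std::int64_t ordered_bits(double x) {
    std::int64_t i;
    std::memcpy(&i, &x, sizeof i);
    // Negative doubles are stored sign-magnitude; reflect them so the
    // integer ordering matches the numeric ordering.
    return i < 0 ? std::numeric_limits<std::int64_t>::min() - i : i;
}

// Number of representable doubles between a and b (0 if equal).
inline std::uint64_t ulp_distance(double a, double b) {
    assert(!std::isnan(a) && !std::isnan(b));
    const std::int64_t ia = ordered_bits(a);
    const std::int64_t ib = ordered_bits(b);
    // Subtract as unsigned so widely separated values cannot overflow.
    return ia >= ib
        ? static_cast<std::uint64_t>(ia) - static_cast<std::uint64_t>(ib)
        : static_cast<std::uint64_t>(ib) - static_cast<std::uint64_t>(ia);
}

// True if a and b are at most max_ulps representable values apart.
inline bool within_ulps(double a, double b, std::uint64_t max_ulps) {
    return ulp_distance(a, b) <= max_ulps;
}
```

A check such as `within_ulps(result32, result64, 1)` would accept the exp() discrepancy itself, while still flagging downstream values that have drifted further apart.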
Sergey M, it's not a hard requirement that 32- and 64-bit results match to within the tolerance factor we already allow. However, it's going to make releasing software more expensive if they don't, because someone has to take a close look at test results that differ and sign them off. We can accept a one-off change in test results if we migrate to a different implementation of exp(), because we would know that our code hasn't changed and any differences in results are caused by the change in library; then we can save down new regression data for use in future tests. I appreciate that there may be other reasons why 32- and 64-bit results might differ, but we have established that in the sample we looked at, exp() was responsible for at least 90% of the test failures. So changing to a library that uses the same algorithm for computing exp() in both builds will greatly improve the situation even if it doesn't totally fix it.
David C. wrote:
Quote:
iliyapolak wrote: Can you post disassembly of exp()?
I'm not sure I should do that, it might incur the wrath of Microsoft as it's their copyright. You can disassemble it yourself quite easily if you run the test program I gave under the debugger and step into exp().
Sorry for asking this.
>>>However, under some conditions they get multiplied,>>>
Is the multiplication of the 32/64-bit difference in results a mandatory part of your code, or is it done only for testing purposes?
iliyapolak wrote:
>>>However, under some conditions they get multiplied,>>>
Is the multiplication of the 32/64-bit difference in results a mandatory part of your code, or is it done only for testing purposes?
It's a necessary part of the code. It happens when we take the difference between two similar computations that use exp(), for example when calculating the derivative of a function by bumping. We know there will be a loss of precision when we do this, but as I explained before, the problem for us is lack of consistency between the two builds, not lack of precision.
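To illustrate the amplification, here is a minimal sketch of a bumped (forward-difference) derivative of exp() in which the bumped evaluation is nudged by exactly one ulp using `std::nextafter`, imitating the kind of disagreement seen between the two builds. The function names and the choice of bump size are assumptions made for the example, not taken from the real code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Forward-difference ("bumped") derivative from two function values.
inline double bumped_derivative(double f_bumped, double f_base, double h) {
    assert(h > 0.0);
    return (f_bumped - f_base) / h;
}

// Relative change in the bumped derivative of exp at x when the value
// of exp(x + h) moves up by exactly one ulp.
inline double derivative_shift_from_one_ulp(double x, double h) {
    const double base = std::exp(x);
    const double up = std::exp(x + h);
    const double up_plus_ulp =
        std::nextafter(up, std::numeric_limits<double>::infinity());
    const double d0 = bumped_derivative(up, base, h);
    const double d1 = bumped_derivative(up_plus_ulp, base, h);
    return std::fabs(d1 - d0) / std::fabs(d0);
}
```

With `x = 1.0` and `h = 1e-8`, a single ulp in exp() is a relative change of roughly 1e-16, but the shift returned for the derivative is on the order of 1e-8, because the subtraction cancels most of the leading digits and the division by `h` scales up what is left.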
What about single-precision results: did they differ between the 32- and 64-bit libraries? Do you need double precision in your code?
We use double precision only, so I haven't tried the single-precision versions.
Does your application require double-precision computation by design?