Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library
- BoxMuller Random Deviate differences between processors?

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

mattwatt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-23-2011
10:22 AM

99 Views

BoxMuller Random Deviate differences between processors?

I'm generating random deviates using the Intel MKL and have observed differences in the results when using a Xeon when compared to a Core i7. Has anyone else observed this? Is the level of error observed expected? I don't have any differences when using the uniform or ICDF transformations, only the BoxMuller.

With a Xeon, I get1.682149939433206

With an i7, I get 1.682149939381822

I'm using Windows Server 2008 with Visual Studio 2010.

The code I'm using:

#include

#include

#include "mkl_vsl.h"

int main (){

double s;

double r[2];

VSLStreamStatePtr stream;

/* Initializing */

s = 0.0;

vslNewStream( &stream, VSL_BRNG_MT2203, 0x345);

/* Generating */

vdRngGaussian( VSL_RNG_METHOD_GAUSSIAN_BOXMULLER, stream, 2, r, 0.0, 1.0 );

s = r[0];

/* Deleting the stream */

vslDeleteStream( &stream );

/* Printing results */

printf( "Random deviate 1: %1.15f\\n", s );

system("pause");

return 0;

}

Thanks,

Matt

Link Copied

10 Replies

mecej4

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-23-2011
11:48 AM

99 Views

mattwatt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-23-2011
02:43 PM

99 Views

Based on your comment, I used the vsRngGaussian function to again compare results. What I found is that the results are identical between the two processors, even beyond the mantissa (ie: both show a result of 1.682149887084961). Does this suggest that the distribution function is generating numbers consistently, but the conversion from single to double via the vdRngGaussian function is introducing error?

mecej4

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-23-2011
07:07 PM

99 Views

I don't know that there is such a conversion, nor do I know the internals of MKL. Here is an example code fragment from one Box-Muller code. Note that both

[cpp] do { x1 = 2.0 * ranf() - 1.0; x2 = 2.0 * ranf() - 1.0; w = x1 * x1 + x2 * x2; } while ( w >= 1.0 ); w = sqrt( (-2.0 * log( w ) ) / w ); [/cpp]

Andrey_N_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-24-2011
12:57 AM

99 Views

Intel MKL Random Number Generators are developed using the calls to the optimized Basic Random Number Generators (BRNGs) and Vector Math Functionswhich are usedfor transformation of BRNG outputs.

We optimize Vector Math functionstoget the good performance of the routinesfor each CPU generation.Vector Math functionswork in 3 accuracy modes: High Accuracy, Low Accuracy, and Enhanced Performance. The accuracy is CPU specific, and you can see slightly different results returned by the same function between different CPUs. Nevertheless, the differences remain within specific error bounds, and each function meets accuracy requirements oneach CPU generation.

You can find more details about accuracy modes in Intel MKL Manual, Chapter "Vector Math Functions" and the document "VML Accuracy and Performance Data" available at http://software.intel.com/sites/products/documentation/hpc/mkl/vml/vmldata.htm

Hope this explains your observations.

Wouldyou please clarify why you care about the slight difference in RNG outputs on different CPUs?

Thanks,

Andrey

Sergey_M_Intel2

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-24-2011
02:39 AM

99 Views

Yes, this is expected behavior. MKL functions, including RNG, are optimized for different processor generations. At run time MKL detects the processor it is running on and selects the best implementation for that processor. These implementations might be algorithmically different but still the accuracy is controlled in such a way to have negligible effect on the application.

In your case the results are identical to 10 decimal digits. This is more than sufficient in Monte Carlo where statistical error dominates. Also such deviations do not statistically worsen the generated numbers. All VSL RNGs are extensively tested by the battery of statistical tests.

As to why you don't observe this on ICDF - this is just your luck :-). For certain inputs it is possible that ICDF also shows minor deviations in results on different CPUs.

Feel free to ask more questions and I will gladly answer those,

Sergey

mattwatt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-24-2011
12:11 PM

99 Views

Hi Andrey, Sergey,

I'm interested in this problem because I work for a firm in the financial services sector and we're planning on using the Azure cloud to perform most of our calculations. The discrepancy I observed was simply between two sets of developer workstations (mine, a Xeon, and everyone else's, the i7's). This came up after a unit test failed. I'm also exploring a more significant difference between processors (1e-5) with an optimization problem.

We will be using Monte Carlo simulations to drive many of our calculations. Given that we are a vendor, we would like to ensure that our clients will receive the same result when the same computation is performed. While the discrepancy may be small, any lack of consistency is problematic for us. A portfolio worth $500 million would be worth 40 billion yen, where the current level of error could show a difference in the final digit or two, depending on the processor. Yes, it's small, but it could raise eyebrows.

I looked at the document you referenced, and the default behavior is High Accuracy, meaning there should beerror less than 1.0 ulp (unit-in-the-last-place). This tolerance is met, given the use of a 32-bit random stream of integers, which presumably defines the number of significant digits for the rest of the problem...

Initially, I was under the impression that the generation method,VSL_RNG_METHOD_GAUSSIAN_BOXMULLER, was single precision, but the generator was double precision. This didn't make sense. I assume there is confusion over the naming convention--VSL in this case meaning Vector Statistics Library, while vslNewStream (for example) does in fact denote single precision. The only 32-bit operation comes from the stream itself, no?

Ultimately, it seems that when working with variants of processors, it would be best to use the single precision function (vsRngGaussian, etc...). Given the differences observed, and the certainty that an input to the double precision function is itself single precision (there are no double precision streams, are there?), is the double precision function handicapped by typecasting? Is there any point to having the double precision functions?

Again, thanks for your responses,

Matt

mattwatt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-24-2011
03:39 PM

99 Views

One further note...

When I generate a set of 2 uniform random numbers and compute the Box Muller transformation myself, the results agree between the two processors.

Sample code...

#include

#define _USE_MATH_DEFINES

#include

#include

#include "mkl_vsl.h"

#include "mkl_vml_functions.h"

int main ()

{

double n[2];

VSLStreamStatePtr stream;

vslNewStream( &stream, VSL_BRNG_MT2203, 0x345);

vdRngUniform( VSL_RNG_METHOD_UNIFORM_STD_ACCURATE, stream, 2, n, 0.0, 1.0 );

vslDeleteStream( &stream );

double u1 = n[0];

double u2 = n[1];

double x = sqrt(-2*log(u1))*sin(2*M_PI*u2);

printf( "Box Muller Manual 1: %1.15f\n", x );

system("pause");

return 0;

}

< 1.682149949310261="">

Sergey_M_Intel2

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-27-2011
02:56 AM

99 Views

Thank you for additional detail.

If you call vdRngGaussian then all underlying floating-point arithmetic is double precision, not single precision. There is no confusion here. vslNewStream (and many other service routines) is precision independent - it initializes underlying BRNG (state and a few other parameters) which is integer sequence generator. All precision-dependent RNGs in VSL start with prefix "vs" for single precision and "vd" for double precision. So you have full control over VSL precision. Double precision does not map to single precision in any of VSL RNGs.

Unlike VML you do not have control over VSL accuracy. Yes, some VSL RNGs use VML kernels underneath but those accuracyis notcontrolled by VML mode. There isno <1 ulp guarantee for VSL RNGs. Perhaps that was confusing partin Andrey's response. VSL RNG accuracy is relaxed but still more than enough for Monte Carlo simulations.

To summarize, the differences you see likely comes from different code paths for Xeon and i7. Unerlying integer BRNG sequence is identical for Xeon and i7, but the floating-point Box-Muller transformation (double precision if you call vdRngGaussian)implementations may be slightly different for Xeon and i7 to gain maximum performance on each of these processors.

Changing vdRngGaussian to vsRngGaussian will not solve thebit-to-bit difference. Sorry. None of VSLRNGguarantees bitwise identical result across CPU generations. The only resolution with guaranteed result is running the same code path on both Xeon and i7. Unfortunately there are no user controls to fix the code path in current MKL product.

I will bring your issue to our technical support team so that it is properly tracked. We will consider options how we can address your needs.

Thank you for you time,

Sergey

Sergey_M_Intel2

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-27-2011
05:46 AM

99 Views

Additional comment on your code example illustrating Box-Mullerimplementation in C (avoiding call to vdRngGaussian).

Unfortunately there is still no guarantee that the result is bit-to-bit identical across CPUs. Modern optimizing compilers (such as Intel C++/Fortran compilers) can do code versioning for different data layouts (aligned/not aligned) or CPUs. In other words extra action is required on the user side to ensure guaranteed cross-CPU consistency. Compiler math library functions may have different code paths for different CPUs, so there is no guarantee that this line provides bitwise identical output:

double x = sqrt(-2*log(u1))*sin(2*M_PI*u2);

To guarantee cross-CPU consistency of math library functions with Intel compilers you should compile your application with the switch -fimf-arch-consistency=value[:funclist]. You need also control data layouts in memory so that the same code path is used for such layout.

You can find additional details on this topic by following this link http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/cpp/lin/copts/common_opt...

Please be noted that setting -fimf-arch-consistency=true will have performance impact.

I hope that this helps. We'll see how we can address your needs within MKL functions also.

Regards,

Sergey

mattwatt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-28-2011
11:10 AM

99 Views

--Matt

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

For more complete information about compiler optimizations, see our Optimization Notice.