- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
True model:
Model1: y_1 = x_1 + 0.9 * x_2 + 0.7 * x_3 + e_1
Model2: y_2 = x_1 + 0.9 * x_2 + 0.7 * x_3 + e_2
where x_1, x_2, x_3 following Uniform distribution in (0, 1), e_1 and e_2 are both Gaussian(0, 1).
I simulate e_1 and e_2 with different methods namely VSL_METHOD_DGAUSSIAN_BOXMULLER and VSL_METHOD_DGAUSSIAN_ICDF.
vdRngGaussian ( VSL_METHOD_DGAUSSIAN_BOXMULLER, stream, 1000, e_1, 0.0, 1.0 );
vdRngGaussian ( VSL_METHOD_DGAUSSIAN_ICDF, stream, 1000, e_2, 0.0, 1.0 );
And then I run two regressions. The first model gives a good estimate but the second one is very poor. Just a few trails:
Model1: 0.911859 0.995013 0.712112
Model2: 3.31478 -0.116816 - 0.232942
Model1: 0.945507 0.89906 0.746299
Model2: 3.40879 -0.149013 -0.306843
Model1: 1.10576 0.825391 0.694707
Model2: 3.32135 -0.122911 -0.276134
At the very beginning I use ICDF but the bias is very serious. It costs me a lot of time to figure out the source of the problem. But I still don't understand why it is so sensitive to the generating method. Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Stanley,
You initialize and remove the random stream immediately before and after the generation of random numbers. Output of the function time() is used to set the seed for initialization of the random stream. This could result in the same seed in the generation of both uniform and Guassian variates, the same uniform variates underneath Gaussian numbers and biased results.
May I suggest you to make initialization of the VSL random stream once, in the constructor of class sim? Deallocation of VSL RNG related resources can be done once in the destructor of the class. Thus, methods gaussian_1(), gaussian_2() and uniform() of the class sim would be just wrappers around MKL random number generators. Alternatively, you could use different seeds for generation of Gaussian and uniform numbers.This should help you to avoid potentially biased results in your simulations.
Please, let me know if this answers your question.
Thanks,
Andrey
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Can you please provide more details for your linear regression simulations:
1. Which Intel MKL version do you use? Which OS and mode, IA-32 or Intel 64?
2. How do you link against MKL?
3. Which basic random number generator(s) is(are) used for the simulations?
4. Do you provide the same sequence of uniform random numbers into both models, so the models are different in just source of Gaussian numbers?
5. Whatare the sample mean/other sample estimates in both models?Are theyclose to the theoretical?
Ideally, a short test case (the extract from your simulation application)that demonstrates the issue would be sufficient.
Thanks,
Andrey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am using MKL10.2 with Intel64, Intel C++ compiler 11.1, and my OS is OpenSUSE 11.2. I compile with the following link
-L$MKLPATH $MKLPATH/libmkl_solver_lp64.a -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -openmp -lpthread
and the following random number generator
Uniform: vdRngUniform ( VSL_METHOD_DUNIFORM_STD_ACCURATE, stream, 1000, x, 0.0, 1.0 );
Gaussian: vdRngGaussian ( VSL_METHOD_DGAUSSIAN_BOXMULLER, stream, 1000, e_1, 0.0, 1.0 );
vdRngGaussian ( VSL_METHOD_DGAUSSIAN_ICDF, stream, 1000, e_2, 0.0, 1.0 );
I use the same sequence of x's in two equations. all the sample mean and sd are consistent with the setting.
I just come up with the idea that ICDF is the inverse transformation form the uniform distribution. So my x's would be correlated with the e generating by ICDF. That's why the regression will be bias.
Thanks,
Stanley KO
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Stanley,
Thank you much for the additional details. I still have a few questions below to clarify.
To my understanding, your computational scheme is as follows (please feel free to correct or/and add more details):
1. Create random stream
errcode = vslNewStream( &stream, BRNG, SEED );
Can you please specify value for BRNG and SEED parameters?
2. Generate random numbers
vdRngUniform ( VSL_METHOD_DUNIFORM_STD_ACCURATE, stream, 1000, x, 0.0, 1.0 );
vdRngGaussian ( method, stream, 1000, e, 0.0, 1.0 );
where method is VSL_METHOD_DGAUSSIAN_BOXMULLER or VSL_METHOD_DGAUSSIAN_ICDF and e is e_1 or e_2.
Both BoxMuller and ICDF methods transform uniform numbers to produce Gaussian sequence.
If you use pseudo-random number generator like MT19937 you would not expect that uniform numbers x_i and uniforms underneath Gaussian generator are correlated, and thus, no bias should be in your simulations. On the other hand, if you use SOBOL quasi-random number generator as the source of uniform random numbers generally you can run into correlated variates.
For those reasons it could be important to understand which BRNG and seed are used in the simulations.
3. Do regression computations.
According to the equations you provided you need 3 uniform numbers and one normal variate per one output. Does it make sense to generate 3000 uniforms and 1000 Gaussian numbers to obtain 1000 model outputs?
4. Delete stream
errcode = vslDeleteStream( &stream );
Thanks,
Andrey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I may post my source code. I have two files. One defines the class for random number generation and the other is the main file.
/******************CLASS FOR RANDOM NUMBER GENERATION*******************/
#include
#include "mkl.h"
#define BRNG VSL_BRNG_MT19937
using namespace std;
class sim
{
private:
VSLStreamStatePtr stream;
public:
int error_code;
void gaussian_1 ( int, double *, double, double );
void gaussian_2 ( int, double *, double, double );
void uniform ( int, double *, double, double );
};
void sim::gaussian_1 ( int N, double * r, double mean, double sigma ) // gaussian with BOXMULLER
{
vslNewStream( &stream, BRNG, time(0) );
error_code = vdRngGaussian ( VSL_METHOD_DGAUSSIAN_BOXMULLER, stream, N, r, mean, sigma );
vslDeleteStream ( &stream );
}
void sim::gaussian_2 ( int N, double * r, double mean, double sigma ) // gaussian with ICDF
{
vslNewStream( &stream, BRNG, time(0) );
error_code = vdRngGaussian ( VSL_METHOD_DGAUSSIAN_ICDF, stream, N, r, mean, sigma );
vslDeleteStream ( &stream );
}
void sim::uniform ( int N, double * r, double a, double b)
{
vslNewStream( &stream, BRNG, time(0) );
error_code = vdRngUniform ( VSL_METHOD_DUNIFORM_STD_ACCURATE, stream, N, r, a, b );
vslDeleteStream( &stream );
}
/*************************THE MAIN FILE***************************/
#include "./sim.h"
#define N 1000
int main()
{
double mean = 0.0;
double sigma = 1.0;
double a = 0.0;
double b = 1.0;
double e_1
double e_2
sim eps;
eps.gaussian_1 ( N, e_1, mean, sigma
); // generate e_1
eps.gaussian_2 ( N, e_2, mean, sigma
); // generate e_2
double x_1
double x_2
double x_3
double * tem_x;
tem_x = new double[3*N];
eps.uniform ( 3*N, tem_x, a, b ); // generate x_1, x_2, x_3
for ( int i = 0; i < N; i++)
{
x_1 = tem_x;
x_2 = tem_x[N+i];
x_3 = tem_x[(2*N)+i];
}
delete[] tem_x;
double y_1
double y_2
for( int i = 0; i < N; i++ )
{
y_1 = x_1 + 0.9 * x_2 +
0.7 * x_3 + e_1; // model 1
y_2 = x_1 + 0.9 * x_2 +
0.7 * x_3 + e_2; // model 2
}
return 0;
}
Stanley KO
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Stanley,
You initialize and remove the random stream immediately before and after the generation of random numbers. Output of the function time() is used to set the seed for initialization of the random stream. This could result in the same seed in the generation of both uniform and Guassian variates, the same uniform variates underneath Gaussian numbers and biased results.
May I suggest you to make initialization of the VSL random stream once, in the constructor of class sim? Deallocation of VSL RNG related resources can be done once in the destructor of the class. Thus, methods gaussian_1(), gaussian_2() and uniform() of the class sim would be just wrappers around MKL random number generators. Alternatively, you could use different seeds for generation of Gaussian and uniform numbers.This should help you to avoid potentially biased results in your simulations.
Please, let me know if this answers your question.
Thanks,
Andrey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks so much for your correction. I define a constructor and a destructor and everything looks fine. But I still don't understand why BOXMULLER will work but ICDF doesn't.
Thanks so much.
Regards,
Stanley KO
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Stanley,
The output of the uniform generator is u= (u(i)), i=1,...,3*n. If the random stream is initialized with the same seed the same unifrom variates, n in case of ICDF and 2*n in case of Box-Muller would be used in Gaussian generator.
If ICDF method is used for generation of normal variates then the output of the Gaussian generator x=(x(i)), i=1,...,n, is just a one to one transformation of the same sequence u: x(i) = sqrt(2) erfinv( u(i) ) (section 9.3.4 of VSLNotes). In this case the simulations would result in biased results.
Box-Muller transformation is more complex and requires a pair of uniform variates to produce one Gaussian number:
x(i) = sqrt( -2 ln( u(2*i) ) ) sin (2 * pi * u(2*i+1) ) (section 9.3.2 of VSLNotes).
Let us consider your regression scheme in more details. To produce regression output you need 3 uniforms and 1 normal number that is, according to your testcase
y(0) needs (u(0), u(N+0), u(2N+0), x(0) ), x(0) uses ( u(0), u(1) )
y(1) needs (u(1), u(N+1), u(2N+1), x(1) ), x(1) uses ( u(2), u(3) )
y(2) needs (u(2), u(N+2), u(2N+2), x(2) ), x(2) uses ( u(4), u(5) )
y(3) needs (u(3), u(N+3), u(2N+3), x(3) ), x(3) uses ( u(6), u(7) )
etc
Thus, in case of Box-Muller transformation you would expect that the results are biased in "less degree", but the dependence between y(i)looks to be possible. So, it really makes sense to avoid the usage model of the test case that potentially can results in biased simulation outputs.
Please, let me know if this helps.
Thanks,
Andrey
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page