Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Mersenne Twister

Petros_Mamales
Beginner
1,410 Views

Hi,

I have two questions:

a) In the Mersenne Twister sample code on Matsumoto's page (http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c), there are 3 variants for creating a double (64-bit), which he calls genrand64_real{1,2,3}.

I need to make a comparison with another implementation and would like to know: which one is used in MKL?

Is there a difference between the different Mersenne BRNGs (SFMT, MT2203)? Although maybe SFMT is an entirely different thing (I would not know).

b) In the seeding (initialization), some people skip a number of outcomes to avoid bad behavior. Does MKL do something along these lines?

("bad behavior" meaning the numbers getting "trapped" in certain values - again, not an expert by far)

 

TIA for your help.

Petros

PS: I am interested in both Windows and Linux (although I would not expect them to differ; just making sure).

0 Kudos
13 Replies
Petros_Mamales
Beginner
1,410 Views

Update:

Also, the code for generating 32-bit and 64-bit sequences seems to differ between the original MT implementations.

Is this the case in MKL?

 

0 Kudos
Andrey_N_Intel
Employee
1,413 Views

Hi Petros,

Answering your questions:

- The double/single precision versions of the Intel MKL MT19937 BRNG use a u(k)/2^32 type of transformation, which is different from the genrand64_real() transformations you see in the author's implementation; please see the VSL Notes, section https://software.intel.com/en-us/node/590405

- The Intel MKL implementation of MT2203 follows the same algorithmic scheme as MT19937; however, it relies on a different set of parameters, and it has a shorter state and a shorter period. Its intention is to support parallel Monte Carlo simulations: indeed, MT2203 is a family of ~6000 BRNGs which produce independent random number sequences. SFMT19937 is a SIMD-friendly version of the generator; its algorithmic scheme is adapted for use with vector registers

- The initialization of MT19937 follows the original author's scheme. Using the skip-ahead technique supported by the generator, you can skip a predefined number of generator outcomes

- Intel MKL generators are OS independent

Please let me know if this answers your questions.

Thanks,

Andrey

 

 

0 Kudos
Petros_Mamales
Beginner
1,412 Views

Hi Andrey,

Thank you for getting back to me.

So, from what I read in the VSL Notes, MKL uses the 32-bit algorithm.

If you look at Matsumoto's homepage:

32 bit:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c

   y = mt[mti++];

    /* Tempering */
    y ^= (y >> 11);
    y ^= (y << 7) & 0x9d2c5680UL;
    y ^= (y << 15) & 0xefc60000UL;
    y ^= (y >> 18);

    return y;

and 64 bit:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c

   x = mt[mti++];

    x ^= (x >> 29) & 0x5555555555555555ULL;
    x ^= (x << 17) & 0x71D67FFFEDA60000ULL;
    x ^= (x << 37) & 0xFFF7EEE000000000ULL;
    x ^= (x >> 43);

    return x;

i.e. the constants with which the output is "tempered" are different!

However, I do not know the materiality of the difference.

On the same sites you will see that the double is generated by the second variant of these functions:

/* generates a random number on [0,1)-real-interval */
double genrand_real2(void)
{
    return genrand_int32()*(1.0/4294967296.0); 
    /* divided by 2^32 */
}

and in the 64-bit version this is:

/* generates a random number on [0,1)-real-interval */
double genrand64_real2(void)
{
    return (genrand64_int64() >> 11) * (1.0/9007199254740992.0);
}

(again different, and again I do not know how material this is).

Also,

the skip-ahead technique here happens outside the creation of the random samples, rather than as a stub inside the generation routine that would let outcomes be skipped according to some criterion - unless one generates the numbers one by one.

Thank you for all your help, really appreciate it

Petros

PS: Currently, in MKL, random numbers are produced in a row-major way, more akin to pathwise Monte Carlo.

This creates some issues with extending the sample to more time steps, if one cares about reproducibility.

Maybe MKL would like to consider an interface that deposits the output into matrices (i.e. using the LAPACK convention of stride and leading dimension) to avoid extra copying and code clutter.

0 Kudos
Andrey_N_Intel
Employee
1,412 Views

Hi Petros,

Answering your question about the row-major layout used by the Intel MKL RNGs: I wonder if the leapfrogging technique would help you produce the random numbers in the necessary layout. This technique, as well as the skip-ahead approach, also supports parallel Monte Carlo simulations. Please have a look at the VSL Notes, https://software.intel.com/en-us/node/590372, and the Intel MKL Manual, https://software.intel.com/en-us/node/521866. You may also want to have a look at the respective training materials, which provide a high-level summary of the capabilities available in the Intel MKL Statistical Functions: https://software.intel.com/en-us/articles/intel-mkl-vsl-training-material

Please, let me know if this answers your question.

Andrey

 

 

0 Kudos
Petros_Mamales
Beginner
1,413 Views

Hi Andrey,

Thank you for the proposal.

Alas, with MT there is no real way to skip ahead or leapfrog (other than advancing the state internally while skipping the creation of the output variable), which is not efficient. In Sobol, for example, there is a very efficient way to obtain different dimensions by using a key (XOR-ing with a dimension vector, if I remember correctly).

Having said this, it is quite understandable why MKL would choose to provide the numbers in a contiguous array, so that the user is aware of the ordering, which is important for quasi-random BRNGs.

However, I think that adding a stride to the random number generation would be helpful - no big deal though, I can still do the copying ;-))

Thank you for all your help,

Petros

 

 

0 Kudos
Andrey_N_Intel
Employee
1,413 Views

Hi Petros,

The Intel MKL version of MT19937 supports the skip-ahead feature, while leapfrogging is not supported. Please have a look at the properties of the generator and the reference to the skip-ahead algorithm, https://software.intel.com/en-us/node/590405

Can you please provide a (pseudo)code sample that demonstrates the need for strides, if possible?

Thanks,

Andrey

 

0 Kudos
Petros_Mamales
Beginner
1,411 Views

Hi Andrey,

It does support skip-ahead, though not efficiently (internally it advances the state sequentially), simply because doing better is an open mathematical problem, as far as I know.

Here is an example:

Say I want to have a 3-factor stochastic model with 100 time steps and a sample size of 1e6.

Then at each step I want to create 3*1e6 numbers, which I want to place in the range (for subsequent SIMD use) of a row-major matrix (so that I can iterate over one sample point at a time) of dimensions 1e6 x 3.

In order for this to be extensible, I need to deposit the numbers in a column-wise manner.

Presently MKL does not give me the ability to deposit the numbers directly into a matrix range, so I have to put them somewhere else and copy them on my own, which is inefficient, a bit clumsy, and clutters the code.

As I said, not a biggie, but since you asked ;-))

 

0 Kudos
Andrey_N_Intel
Employee
1,413 Views

Hi Petros,

According to our measurements, the skip-ahead algorithm we support in the library for MT19937 is faster than naïve sequential iteration over the internal state of the generator when the number of outcomes to skip is sufficiently big. This helps speed up simulations which require a significant amount of random numbers to be produced in threads. Additionally, the library provides other basic random number generators, such as MCG59 or the counter-based generators, where the cost of the block-splitting initialization routines is quite small.

Thanks for providing the example, it helps. To make sure I understand it correctly, let me restate your description. Let s(t) be the price of the (single, for simplicity) asset to simulate: s(t) = F(s(t-1), W1(t), W2(t), W3(t)), where the Wi are the Wiener processes used to represent the factors; they are simulated by a Gaussian RNG. I assume you need to correlate them, thus a multivariate Gaussian RNG is necessary. You simulate, for a given time t, all s(t, i), where i is the index of the sample/path. Is this a correct understanding?

If so, you generate a matrix of random numbers that follows a multivariate Gaussian distribution. The matrix returned by the Intel MKL multivariate Gaussian RNG in this case has 1M rows and 3 columns. You want to apply some transformation to each column of the matrix in a SIMD-friendly way. For this reason, instead of a 1M x 3 matrix it would be better to have a 3 x 1M matrix. Is my understanding correct?

Thanks,

Andrey

0 Kudos
Petros_Mamales
Beginner
1,412 Views

Hi Andrey,

I appreciate your going to the trouble of working through an example of what I do ;-))

So, since you asked ;-))

Advancing a stochastic process will involve path-by-path integration (I refer here to one time step), very likely over TBB threads.

As we said, MC-size = 1e6 and nFactors = 3.

Now, when I process these random numbers, it is best to use BLAS/SIMD-ready calls.

For this, I would need the dW (Wiener increments) to be SIMD-aligned and contiguous. So if I use doubles and want 4 doubles at a time (AVX), I would like to embed the dW sample in a 1e6 x 4 matrix, which had better be row-major (for contiguous dW per path).

Furthermore, I want to be able to create the random numbers per time step (extensibility). So I need to populate the columns of the matrix.

Therefore, ideally, MKL would populate the matrix range (1e6 x 3) of the row-major matrix (1e6 x 4) by column.

In LAPACK (the C version), this would be accommodated by setting the leading dimension to 4 and the sizes to 1e6 and 3. (I am actually more familiar with the Fortran layout for LAPACK matrices, but this follows by extrapolation.)

If the BRNGs were smart enough to do this in one call, it would be ideal.

What I have to do instead is:

dump the BRNG results into a (column-major) Fortran matrix of size 1e6 x 3, and

use the MKL BLAS extensions to copy it into the row-major 1e6 x 4 matrix - which is a bit tricky and a bit error-prone, but, more importantly, costly in allocating memory and performing copies.

HTH,

Petros

PS: At any rate, synchronicity and common logic over containers require that I end up with a 1e6 x 3 logical container - there is much more than a single stock price that I need to propagate.

PS2: Other BRNGs would be a big leap for me right now.

0 Kudos
Petros_Mamales
Beginner
1,412 Views

Note: Actually, I prefer to correlate the sample myself when propagating, so I use the single (univariate) BRNG.

(Some users, for small samples of ~1e3, would like to "normalize" the sample to unit stdev and mean 0, which has to be done after the samples are created, and which could actually be another option when creating samples in matrix form.)

The problem comes from the fact that the BRNGs are imagined as row-major, while LAPACK (the real ;-) one, and the more efficient, AFAIK) is column-major.

0 Kudos
Andrey_N_Intel
Employee
1,413 Views

Hi Petros,

Thanks for the detailed explanation, this helps. To double-check that I understand you correctly: the Gaussian RNG produces a 1d contiguous array of, say, 3M independent normal numbers g0,g1,g2,g3,g4,g5,... By specifying, say, a matrix layout and a leading dimension, you would like to get the result from the generator in the following (for example) format:

r[0,0] = g0, r[0,1]=g1, r[0,2]=g2,r[0,3]=xxx

r[1,0] = g3, r[1,1]=g4, r[1,2]=g5,r[1,3]=xxx

...

or, alternatively

r[0,0] = g0, r[0,1]=g(1M), r[0,2]=g(2M), r[0,3]=xxx

r[1,0] = g1, r[1,1]=g(1M+1), r[1,2]=g(2M+1), r[1,3]=xxx

...

and the library does not touch the last column of the matrix (I use 2d notation for the explanation, but assume that the numbers occupy a 1d contiguous array). Is this the correct interpretation?

Regarding your other comment on the normalization of the sample matrix: for the sample matrix generated earlier, a transformation relying on z-score normalization can be done at later stages of your analysis/computations, correct?

As an aside comment/question that does not directly relate to the topic of the discussion: I assume that you do not generate the 1M x 3 random numbers at once; instead, you generate the numbers in blocks of a smaller size that fits, say, the first-level cache, and then immediately apply the repacking and other necessary transformations. Is this a correct understanding?

Thanks,

Andrey

0 Kudos
Petros_Mamales
Beginner
1,413 Views

Hi Andrey,

The layout is the one after "alternatively".

The reason is extensibility. The strategy then is to break up the 1e6 samples into blocks of 1e4 and "throw" them to TBB.

This is why the random numbers have to be created ahead of time and cached (for the step).

Otherwise, different (TBB) threads would be calling into the BRNGs, but there is no guarantee of the sequence in which they would be called, and therefore jumping ahead is not a good way to do things - unless you have a copy of the BRNG per thread, which is clearly impractical (just from the size of the state and the allocations alone - btw, MKL should provide for different allocation (and deallocation) inputs for the functions that create objects under the hood; at the end of the day, a FORTRAN library will always be inherently limited).

A final question, created by your answer earlier:

If I want to use the SFMT with states from MT2203, your earlier statement suggested that this would not work - because the latter has a smaller state. Logically, though, there is no reason for this to be the case. The interface does not welcome putting a state from one BRNG into another - which makes sense for different generators, but not for what is mathematically the same one.

Can you please confirm? Is there any workaround?

Thanks for all the help,

Petros

 

0 Kudos
Andrey_N_Intel
Employee
1,412 Views

Hi Petros,

Thanks - your RNG use model is clear; it requires analysis on our side. The state of MT19937 is relatively big; to support multi-scale applications the library provides RNGs with a small state and fast block-splitting initialization, such as ARS5 or Philox (however, if I understand correctly, you need to rely on Mersenne Twister in your code).

Answering your question on SFMT and MT2203: it will not work, as the generators are algorithmically different. I'm also not aware of any workaround.

Please let me know if any additional clarifications/details on the generators are necessary.

Thanks,

Andrey

0 Kudos
Reply