MT2203 generates a repeated (duplicated) numbers even within a single workerID thread

sylwester_dygas · ‎10-18-2011

Hi,
Before I start a detailed problem description please forgive me a lack of deep scientific knowledge in this area.
I encountered a problem with MT2203 (the same problem is related to MT19937) generator. It generates a repeated (duplicated) numbers even within a single workerID thread. Currently in our solution we are using MKL version 10.2.x (ILP_64) . But this behavior is exactly the same for:

32bit/64bit (ILP_64) 10.2.7.041
64bit (ilp_64) - 10.3.6.233

In the below example, I'm using a single workerID. I tried with different seed numbers but this doesn't help.

Our solution is focus on providing general purpose random number generator (based internally on MKL) which also can be used for Monte Carlo simulation. We want using a MT2203 generator because it allows as to use up to 6024 possible concurrent worker threads. Duplicated values appears at random index on the output array, so there is no simple solution like drop XXX first values :(

I know that WH generator is also a Monte Carlo method (I do not observe a repeated values) but the limit of 273 different generators (thus, as I understand, a possible maximum 273 concurrent working threads) is rather not safe for us.

Any advice would be appreciated.

Best regards.
Sylwester Dygas

Below you can find my source code example:

RndTest.cpp

[cpp]/**
 * RndTest.cpp
 */
#include 
#include 
#include 
#include "math.h"
#include "limits"
#include "vector"
#include "mpi.h"
#include "mkl_scalapack.h"
#include "mkl.h"

using namespace std;


bool isEqualDouble(double A, double B) {
	return fabs(A - B) < numeric_limits::epsilon();
}

void testRNGSeed(int matrixSize) {
	cout << "test";
	int workerId = 0;
	if (matrixSize <= 0)
		matrixSize = 1024;
	unsigned long numberOfInst = matrixSize ;
	numberOfInst *=1000000;
	double * target = NULL;
	target = new double[numberOfInst];
	cout<< "ArraySize = "<<(numberOfInst/1000000)<<" M"<" << seed << std::endl;
	cout<< "Randomizing..."<= 0; --var) {
		cout << "[" << var << "]=" << target[var] << ", " << endl;
		i--;
		if (i <= 0)
			break;
	};

	vector * v = new vector(numberOfInst);
	vector * vDupIdx = new vector();
	for (int i = 0; i < numberOfInst; ++i) {
			(*v) = target;
	}


	cout << "sorting..." << endl;
	sort(v->begin(), v->end());
	cout << "sorted." << endl;

	cout << "Looking for duplicates" << endl;
	i = 0;
	int nod = 0;
	double prev = 0.0;

	for (i = 0; i < v->size(); ++i) {
		if (i > 0) {
			if (isEqualDouble(prev, (*v)) == true) {
				nod++;
				//cout << (*v)[i-1] << " ==" << (*v) << endl;
				/*vDupIdx->push_back(getDuplFirstIdx(target, numberOfInst, (*v)));*/
			}
		}//if
		prev = (*v);
	}//for
	cout<< "Number of duplicates = "<<clear();
	vDupIdx->clear();
}

	int main(int argc, char** argv) {
		testRNGSeed(10); //Millions
	return 0;
}
[/cpp]

MakeFile

[bash]MKLROOT= /opt/intel/mkl/10.2.7.041
LIBROOT  = $(MKLROOT)/lib/32
CC  = /home/sdygas/Work/mpi/bin/mpiCC
MPI_BIND_LIB = openmpi
MPI_INCLUDE=/home/sdygas/Work/mpi/include

TDIR=.
LIBFILES := \
 $(wildcard ../*.cpp)\

#10.2.x x32 bit
LNARGS_DYN= -L$(LIBROOT) -lmkl_scalapack_core -lmkl_solver_sequential -Wl,--start-group -lmkl_intel -lmkl_sequential -lmkl_core -lmkl_blacs_$(MPI_BIND_LIB) -Wl,--end-group -lpthread


#10.2.x x32bit
CCARGS = -I$(MKLROOT)/include \
  -I ./include \
  -I $(MPI_INCLUDE) \
  -I ../ \
  -g


$(TDIR)/RndTest: RndTest.cpp $(LIBFILES)
 $(CC) $< -o $@ $(CCARGS) $(LIBFILES) $(LNARGS_DYN) 
  [/bash]

test.sh launch script

[bash]#!/bin/bash
#x32bit 10.2.x
/home/sdygas/Work/mpi/bin/mpirun -x LD_LIBRARY_PATH=/opt/intel/mkl/10.2.7.041/lib/32 ./RndTest[/bash]

example output:

[plain]sdygas@ubuntu:~/Work/workspace1/RndTest$ make
	/home/sdygas/Work/mpi/bin/mpiCC RndTest.cpp -o RndTest -I/opt/intel/mkl/10.2.7.041/include -I ./include -I /home/sdygas/Work/mpi/include -I ../ -g   -L/opt/intel/mkl/10.2.7.041/lib/32 -lmkl_scalapack_core -lmkl_solver_sequential -Wl,--start-group -lmkl_intel -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi -Wl,--end-group -lpthread 
sdygas@ubuntu:~/Work/workspace1/RndTest$ 
sdygas@ubuntu:~/Work/workspace1/RndTest$ ./test.sh 
testArraySize = 10 M
Generating seed from time: ->1316182621
Randomizing...
[9999999]=-23.9141, 
[9999998]=1.48806, 
[9999997]=-0.145264, 
[9999996]=-1.54288, 
[9999995]=-1.01094, 
[9999994]=-0.942214, 
[9999993]=-0.916466, 
[9999992]=1.11978, 
[9999991]=2.30275, 
[9999990]=-0.910964, 
sorting...
sorted.
Looking for duplicates
Number of duplicates = 11733

sdygas@ubuntu:~/Work/workspace1/RndTest$ [/plain]

mecej4 · ‎10-18-2011

If the underlying uniform RNG is a LCG, and the period were 10 million/11733, which is less than 1000, that would be indicate that the RNG was unaccceptable. However, I do not believe that to be the case, and that can be established by comparing v[2:N] to v[1]. Thus, we do not have a cycle of period less than 1000 that repeats 11733 times.

Rather, I suspect that there is insufficient precision in the numerical inversion of the CDF, which is the process used to convert a uniform real random number sequence to another with a specified non-uniform distribution.

A related question, which is not involved yet, is whether your application with multiple RN streams can tolerate the Marsaglia effect (see, e.g., this Wikipedia article).

Andrey_N_Intel · ‎10-18-2011

Hello Sylwester,

Thanks for your question, the test caseand detailed description of the environment in which you get the results.
As a quick note Intel MKL provides set of basic random number generators whose output should be the same between operation systems, library versions, LP64/ILP64 mode. This helps to provide Intel MKL RNG based SW solution which is platform independent.

Let us return to Mersenne Twister MT19937 BRNG.
Thisgenerator like othergenerator is definedby its state S(i) which is array of 624 unsigned integers, transition function T:S(i)->S(i+1) which isdescribed by the MT algorithm, set of of output symbols O,unsigned32-bit integers, and output function, g:S(i)->O(i) which allows converting currentstate of the generator S into output number you use in your computations. To start computations, initial stateS(0) should be provided.
Period of the generator (that is, the small positive number p such that S(i+p)=S(i) for some i) is huge, 2^19937-1. Thus, to obtain subsequence of generator outputs which repeat the subsequence generated earlier you need to run your simulations really for the long time.
In itsturn, the generator is 32-bit, and you have just 2^32numbers as the output of the algorithm.
It should mean that for relatively big vectors yougetfrom thegenerator you willsee the same numbers from time to time. However, please, make the difference between this case when the same random numbers are sporadically repeated in the sequence, and the case when the subsequence of random numbers absolutely reproduces the previous part of the sequence.

I believe that the same things are applicable to MT2203 BRNG which is a parametric modification of MT19937forparallel computations.

If you run your experiment with other VSL BRNG, say MCG31 whose state is presented with a scalar you will likely see zero number of repeations. In case of this generator, appearence of the same number in the sequence should indicate repeating the same state and, thus, period related problem. For Intel MKL VSL MCG31 generator you would need to generatesay ~2^31 (BRNG period)numbers or so to see this effect.

It's worth noting, thatMKL generators are tested on several levels.
First, DieHard testsare applied to Basic Random Number Generators; then the distribution generators are tested using statistical tests like Chi2 or Moments. We also apply additional tests to the generators to make sure that they properly run in various envirornments, like in multi-threaded.

Please, have a look at theadditional descriptionof the tests we use to test the quaility of MKL generators andtest results for eachBRNG in VSLNotes,
http://software.intel.com/sites/products/documentation/hpc/mkl/vslnotes/vslnotes.pdf

Please, let me know if this answers your question. Feel free to ask more questions related to the generators orStatistical Component of Intel MKL at a whole, and we would be ready to help.

Best,
Andrey

Sergey_M_Intel2 · ‎10-19-2011

Hello Sylwester,

Let me provideupdate on your issue and add a few notes to what Andrey said.

First of all, Andrey reproduced the issue you have reported on several basic random number generators. Further, the issue is reproduced not only with the Cauchy distribution transformation, it is reproduced with the uniform distribution also. Based on that our conclusion is that the duplicates come from BRNG rather than from the distribution transformation.

Second, there is nothing wrong if duplicates occur in random sequence if it is large enough. Underlying BRNG sequence can be always viewed as a random sampling from a lot with repetitions. In other words there is always a probability to get repeated numbers in true random sample. But...

The number of duplicates that you observe ~11,000 per 10M numbers is too large - it does not match withthe theory (under assumption of truly random sequence).

An interesting fact is that Andrey reproduced this behavior for several BRNGs - MT19937, MT2203, R250, MRG32k3a. The most curious part is that all these BRNG give the number of duplicates ~11,000. This suggests that the issue you're facing is unlikely due to some deficiency of particular BRNG (or family of BRNGs).

Andrey also conducted experiments with MCG31m1, MCG59, and WH. The number of duplicates are zero. This is a closer match to theory, i.e. probability to get at least one duplicate in sample of size 10M is fairly small. But...

MCG31m1, MCG59, and WH are essentially the mutliplicative congruential generators (WH is a combination of several MCG).The important factabout MCG is thatits state is determined by 1 number, and the duplicate is just impossible unless you exchaust the entire BRNG period. The smallest period is ~2 billion for MCG31m1. So it is not a surprise thatwe don't see any duplicates for MCG31m1, MCG59, and WHin 10M sample.

In contrast, other VSL BRNGs state is more than 1 number, so as Andrey indicated repetitions are possible in theory. Andrey will try to find out what underlying features ofthese BRNGs result in excessive number of duplicates. This is a research problem and the result will unlikely come immediately. Sorry about that.

It may come up that MT and R250 are just not suitable for modeling accurately the probabilities of duplicates. If you reallywant to avoid duplicatesthen my recommendation would be simply to use MCG59 or WH. Both BRNGs support skeap-ahead and leapfrog parallelization techniques - by using it you can get arbitrary number of worker threads. In other words, WH limitationof 273 sets should not be a showstopper for you.

Best wishes,
Sergey Maidanov

sylwester_dygas · ‎10-23-2011

I would like to thank everyone for your help and insightful remarks. We will use WH generator in conjunction with stream skipping-ahead method.
Without your help we could get stuck for days or weeks....
As far as problem is concerned - I'm thinking about one reason behind the RNG quality degradation could be the initialization procedure: apparently, the state of generator (624* 8 B) is set using only one integer (8 B), somehow restricting the entropy of whole sequence. This problem does not appear in R implementation, so you could treat it as a reference.

Best regards

Sergey_M_Intel2 · ‎10-23-2011

Hi Sylwester,

Glad to see that you're no longer stuck with this issue. Please feel free to report any other roadblock on the way of incorporation of the WH.

Good point about initialization procedure for MT19937. Andrey and I tried to check this hypothesis a few days ago. Our assumption was, since MT19937 has a period 2^263*32 and its state is 263*32 bits long then it implies that over time it covers all bit combinations of the state irregardless of the initial state. So if the initialization is indeed poor then warming up the MT19937 state would fix the excessive number of duplicates over time.

We ran MT onthe slightly modifiedtestcase by warming up MT by skipping up to 1000*10M elements. In every such case we still received ~11K duplicates consistently.

With your inputs about R we probably need to run additional experiments.

Regards,
Sergey

Andrey_N_Intel · ‎10-28-2011

Hello Sylwester,

Idid the computations to understand the behavior of MT BRNG you observe in your test case.
First, I appliedthe Birthday Paradox approach to estimate the probability of the event that at least one dublicate is possible in the random sequence of 32 bit numbers.
My computations show thatfor random sequence ofsize 77002 this probability is 0.5, forsequence of size 117002 this probability is 0.8. The sequence which consists of at least 214K elements will have at least one duplicate with probability 1. In the sequence of 10M numbers (as in your test case) you would see at least one duplicate. So, from this perspective, behavior of BRNG is aligned with theory.

Second, I computed expected number of duplicates in sequence of size n=10M:
1.Number of possible pairs in the sequence is N = C(n,2) = n! / (2! (n-2)! ) = n * (n-1)/2 = 10M * (10M-1)/2= 49999995000000

2. Probability p0 to get a duplicate in any pair is 1/2^32

3.S(N) is number of the pairs-duplicates, and the expectation E S(N) = N * p0 = 11642 given the probability of each outcome is the same.

So, the expected number of repeatings in sequence of 10M elements is 11,642.

Yousee number of dublicates 11733. Using double-sided testI checkedif observed frequency of duplicates p=number of duplicates / N and theoretical p0, that is11733/N=2.34660023466002E-10 and 2.3283064365387E-10,are the same (null hypothesis).Informally, they are close, and the test confirms that they arestatistically insignificant:p-value returned by the test is ~0.39 for 0.05 significance level. This meansthat the observed frequency of dublicates in sample of size 10M is aligned with expected one.

In other words, behavior of MT BRNG you see in your test case is aligned with the theory. My recommendation is to useMT in your application.
Please, let me know if this addresses your concern/answers the question.

Thanks,
Andrey

sylwester_dygas · ‎12-05-2011

Hi Gentlemen, below is the response from my colleague from the Analytics team:

"Thank you for detailed explanation.
It seems that all is clear now: the resolution of RNG is 2^32 in the (0,1), so it does not use full range of IEEE double precision representation (allowing 2^52 uniformly distributed values in range [0,1) ), in fact creating rather a discrete RNG instead of approximating the continuous distribution in numerical arithmetic.
Is there any simple (and preferably fast :) ) way to increase the density (eg. to the level of drand48)?"

Best regards

Sergey_M_Intel2 · ‎12-05-2011

Hello Sylwester,

Good question! Your colleague is right about the MT2203 resolution. In fact most of VSL BRNGs have resolution limited to 32 bits with a few exceptions. Those are

1) MCG59 - 59-bits of resolution (though a few least significant bits are known to have deficiencies due to nature of the linear congruential generators)
2) WH - it doesn't provide fixed resolution. Due to nature of the algorithm it varies in range between 16 bits and 48 bits.

32 bit resolution is typically enough though I can imagine the cases when higher resolution may be desired. Can you please check with your colleague at Analytics team whether at least 48+ bit resolution is really needed?

So the simplest way to get higher resolution is to use either MCG59 or WH. If it doesn't work for you then you can still rely on VSL (and MT2203 in particular) to get higher resolution random numbers. But it will require a bit of effort on your side plus the BRNG performance will be at least twice less than using 32-bit MT2203.

The idea is to use viRngUniformBits64 to get 64-bit integers based on MT2203. Next you convert 64-bit integer to double (0,1) interval to get full-resolution doubles.

To further apply VSL distribution transformations you will need to use either VSL abstract streams or to register your "new" BRNG in VSL. VSL documentation provides examples how to use abstract streams and how to register own BRNG.

Please let me know if it works. In the meantime we will discuss internally other possible ways to address your needs.

Regards,
Sergey