topic Hi Steve, the answers to your in Intel® oneAPI Math Kernel Library
https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Gibbs-sampling-solution/m-p/1027340#M19967
<P>Hi Steve, the answers to your questions are below. Please let me know if this helps. Andrey</P>
<P>1. No, Intel MKL does not provide such a solution.</P>
<P>2. The answer to the second question has several parts:</P>
<P> - As you mention above, calling the Intel MKL RNGs with a vector size of 1 is inefficient. It therefore makes sense to apply blocking, that is, to generate many numbers per call.</P>
<P> - The parameters of the generators depend on values produced inside each iteration of the loop. To resolve this dependence, we can try the following approach, which relies on the location-scale properties of the distribution generators. Let a and beta be the displacement and scaling parameters of the Gamma generator; then, for a fixed shape alpha, Gamma(a,beta)=beta*Gamma(0,1)+a. Let a and sigma be the mean and standard deviation of the Gaussian generator; then N(a,sigma)=sigma*N(0,1)+a.</P>
<P>Generate an array of Gamma numbers with one call to vdrnggamma( VSL_RNG_METHOD_GAMMA_GNORM, stream, n, x, alpha, 0.0d0, 1.0d0 ), and an array of Gaussian numbers with one call to vdrnggaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream2, n, y, 0.0d0, 1.0d0 ). Both arrays have size n.</P>
<P>Then apply the following recurrence:</P>
<P>beta = 0.0;</P>
<P>for all i = 1,...,n</P>
<P> x[i] = a + 1/(4+beta*beta) * x[i];</P>
<P> y[i] = a + 1/sqrt(2*x[i]+2) * y[i];</P>
<P> beta = 1/(x[i]+1) + y[i];</P>
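<P>A minimal sketch of the blocked scheme described above, assuming the same seeds, alpha, and a as the program in the question. The block size n = 1000 and the program name are hypothetical choices for illustration, and the factor 2 inside the square root is taken from the question's code. It draws standard Gamma and Gaussian arrays with one call each and then applies the recurrence in a scalar loop. Note that y(i) keeps the value before the 1/(x+1) shift; the shifted value is carried in beta.</P>

```fortran
include 'mkl_vsl.f90'
program gibbs_blocked
  use mkl_vsl_type
  use mkl_vsl
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000            ! block size (assumed for illustration)
  real(dp), parameter :: alpha = 3.0_dp     ! Gamma shape, as in the question
  real(dp), parameter :: a = 1.0_dp         ! displacement / mean, as in the question
  real(dp) :: x(n), y(n), beta
  type(vsl_stream_state) :: stream, stream2
  integer :: status, i

  status = vslnewstream(stream, VSL_BRNG_SFMT19937, 1777)
  status = vslnewstream(stream2, VSL_BRNG_SFMT19937, 1877)

  ! One vector call per distribution: standard Gamma(alpha) and N(0,1) draws.
  status = vdrnggamma(VSL_RNG_METHOD_GAMMA_GNORM, stream, n, x, alpha, 0.0_dp, 1.0_dp)
  status = vdrnggaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF, stream2, n, y, 0.0_dp, 1.0_dp)

  ! Rescale and shift each standard draw with the parameters that depend on
  ! the previous state of the chain (carried in beta).
  beta = 0.0_dp
  do i = 1, n
    x(i) = a + x(i) / (4.0_dp + beta*beta)
    y(i) = a + y(i) / sqrt(2.0_dp*x(i) + 2.0_dp)
    beta = 1.0_dp / (x(i) + 1.0_dp) + y(i)
  end do

  print *, 'last x:', x(n)
  print *, 'last chain value (beta):', beta

  status = vsldeletestream(stream)
  status = vsldeletestream(stream2)
end program gibbs_blocked
```

<P>To produce more than n chain steps, the two generator calls and the loop can be repeated in an outer loop while beta is kept between blocks.</P>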
<P> - Instead of using two streams of the same basic random number generator initialized with different seeds, you might want to use a generator such as MCG59, which supports the leapfrog method and helps split the original random number sequence into non-overlapping subsequences. Additional details are available in the Intel MKL manual and the VSL notes. The code looks something like this:</P>
<P>status = vslnewstream( stream, VSL_BRNG_MCG59, seed )<BR />
status = vslcopystream( stream2, stream )<BR />
status = vslleapfrogstream( stream, 0, 2 )<BR />
status = vslleapfrogstream( stream2, 1, 2 )</P>
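<P>For completeness, a hedged sketch of how the leapfrog setup might look as a full program. The seed 1777, the sample count n = 4, and the program name are assumptions for illustration. The status checks compare against VSL_STATUS_OK from mkl_vsl.f90, which matters because not every basic generator supports leapfrogging.</P>

```fortran
include 'mkl_vsl.f90'
program leapfrog_streams
  use mkl_vsl_type
  use mkl_vsl
  implicit none
  integer, parameter :: n = 4               ! number of draws (assumed for illustration)
  real(kind=8) :: g(n), z(n)
  type(vsl_stream_state) :: stream, stream2
  integer :: status

  ! Create one MCG59 stream and copy it, so both start from the same state.
  status = vslnewstream(stream, VSL_BRNG_MCG59, 1777)
  status = vslcopystream(stream2, stream)

  ! stream takes elements 0, 2, 4, ... of the base sequence,
  ! stream2 takes elements 1, 3, 5, ...
  status = vslleapfrogstream(stream, 0, 2)
  if (status /= VSL_STATUS_OK) stop 'leapfrog failed for stream'
  status = vslleapfrogstream(stream2, 1, 2)
  if (status /= VSL_STATUS_OK) stop 'leapfrog failed for stream2'

  ! Use one subsequence per distribution, as in the thread.
  status = vdrnggamma(VSL_RNG_METHOD_GAMMA_GNORM, stream, n, g, 3.0d0, 0.0d0, 1.0d0)
  status = vdrnggaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF, stream2, n, z, 0.0d0, 1.0d0)

  print *, 'Gamma draws:   ', g
  print *, 'Gaussian draws:', z

  status = vsldeletestream(stream)
  status = vsldeletestream(stream2)
end program leapfrog_streams
```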
Mon, 20 Oct 2014 08:14:33 GMT Andrey_N_Intel 2014-10-20T08:14:33Z
Gibbs sampling solution?
https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Gibbs-sampling-solution/m-p/1027339#M19966
<P><SPAN style="font-size: 1em; line-height: 1.5;">Hi</SPAN></P>
<P>I'm attempting to write a restricted Boltzmann machine using Gibbs sampling for a deep learning neural net. I had a look in MKL and didn't find a specific routine, so I searched the internet and found a C/Java/Python/R/Scala implementation: <A href="http://www.r-bloggers.com/mcmc-and-faster-gibbs-sampling-using-rcpp/">http://www.r-bloggers.com/mcmc-and-faster-gibbs-sampling-using-rcpp/</A></P>
<P>I created my own implementation using ifort and MKL, based on C code I found there and on the referenced pages. I'm not a mathematician, but I did physics at university 30 years ago and have written neural nets before, so I can follow a formula and I get the rough gist of Gibbs sampling. For now I'm treating Gibbs sampling as a black-box solution.</P>
<P>Two questions:</P>
<P>1. Is there a ready-made MKL solution?</P>
<P>2. The C code from the web runs in just under 8 seconds on my computer, whereas the Fortran version using the gamma and Gaussian distributions takes 55 seconds, which is slower than Python. I assume this is because the other programs use distributions that return scalars rather than a vector of size 1 as mine does; also, nothing is said about how correctly the C/Java/Python etc. libraries implement the distributions. Indeed, when I changed the return vector size in Fortran to a large value and reduced the loop count proportionally, the MKL implementation came in under 2 seconds, so I'm obviously not making a like-for-like comparison. BUT my simplistic understanding of Gibbs sampling is that x and y need to be cross-related across the two distributions, and I can't think how to do this with a vector of size > 1 to take advantage of the MKL implementation. Any ideas? (I'm using a Mersenne Twister as a direct comparison; I can cut the time in half with a simpler method.)</P>
<P>thanks</P>
<P>Steve</P>
<P>include 'mkl_vsl.f90'<BR />
PROGRAM Gibbs</P>
<P> USE IFPORT<BR />
USE MKL_VSL_TYPE<BR />
USE MKL_VSL<BR />
IMPLICIT NONE<BR />
REAL(8) START_CLOCK, STOP_CLOCK<BR />
INTEGER status,n,i,j, M, thin<BR />
REAL(8), DIMENSION(1) :: x,y<BR />
TYPE (VSL_STREAM_STATE) :: stream, stream2<BR />
REAL(8) alpha, a</P>
<P><BR />
!VSL_RNG_METHOD_GAMMA_GNORM_ACCURATE<BR />
!VSL_RNG_METHOD_GAMMA_GNORM<BR />
!VSL_RNG_METHOD_EXPONENTIAL_ICDF_ACCURATE</P>
<P>START_CLOCK = DCLOCK()</P>
<P>n=1<BR />
alpha = 3.0<BR />
a=1.0<BR />
x(1) = 0.0<BR />
y(1) = 0.0<BR />
M=50000<BR />
thin=1000</P>
<P>status = vslnewstream( stream, VSL_BRNG_SFMT19937, 1777 )<BR />
status = vslnewstream( stream2, VSL_BRNG_SFMT19937, 1877 )</P>
<P>! f(x|y) = (x^2)*exp(-x*(4+y*y)) ## a Gamma density kernel<BR />
! f(y|x) = exp(-0.5*2*(x+1)*(y^2 - 2*y/(x+1))) ## a Gaussian kernel</P>
<P><BR />
do j=1,M<BR />
do i=1,thin<BR />
status = vdrnggamma(VSL_RNG_METHOD_GAMMA_GNORM, stream, n, x, alpha, a, (1.0/(4.0 + y(1)**2) ) )<BR />
status = vdrnggaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream2, n, y, a, 1.0/sqrt(2*x(1)+2) )<BR />
y(1) = 1.0/(x(1)+1) + y(1)<BR />
enddo<BR />
enddo</P>
<P>print*, "X" , x<BR />
print*, "Y" , y<BR />
STOP_CLOCK = DCLOCK()<BR />
print *, 'Gibbs Sampler took:', STOP_CLOCK - START_CLOCK, 'seconds.'</P>
<P>end PROGRAM Gibbs</P>
Fri, 17 Oct 2014 16:18:36 GMT steve_o_ 2014-10-17T16:18:36Z