Hi Steve, the answers to your

steve_o_ · ‎10-17-2014

Hi

I'm attempting to write a restricted boltzman machine using Gibbs Sampling for a deep learning neural net . I had a look in MKL and didn't find a specific routine so I had a search on the internet and found a C/Java/Python/R/Scala implementation http://www.r-bloggers.com/mcmc-and-faster-gibbs-sampling-using-rcpp/

I created my own implementation using ifort and MKL based on C code I found there and on referenced pages, I'm not a mathematician but I did physics at university 30yrs ago and have written neural nets before so I can follow a formula and I get the rough gist of gibbs sampling but I'm looking at GS as a black box solution

2 questions -

1 is there a ready made MKL solution?

2 The C code from the web runs in just under 8 seconds on my computer, however the Fortran version using gamma and gaussian distribution takes 55 sec which is slower than python. Now I assume this is because the other web progs are using distributions returning scalars rather than a vector of size 1 like me, plus there is no statement as to correctness of implementation of the C/Java/Python etc libs. Indeed , I changed the return vector size in fortran to a large size and proportionally reduced the loop size and the the MKL implementation comes in under 2 seconds, so I'm obviously not doing a like by like comparison. BUT, my simplistic understanding of Gibbs sampling is that x and y need to be cross related across the 2 distributions and I can't think how to do this with a vector of size > 1 to take advantage of the MKL implementation, any ideas?? (I'm using a Mersenne Twister as a direct comparison - I can cut time in half with a simpler method)

thanks

Steve

include 'mkl_vsl.f90'
PROGRAM Gibbs

USE IFPORT
USE MKL_VSL_TYPE
USE MKL_VSL
IMPLICIT NONE
REAL(8) START_CLOCK, STOP_CLOCK
INTEGER status,n,i,j, M, thin
REAL(8), DIMENSION(1) :: x,y
TYPE (VSL_STREAM_STATE) :: stream, stream2
REAL(8) alpha, a

!VSL_RNG_METHOD_GAMMA_GNORM_ACCURATE
!VSL_RNG_METHOD_GAMMA_GNORM
!VSL_RNG_METHOD_EXPONENTIAL_ICDF_ACCURATE

START_CLOCK = DCLOCK()

n=1
alpha = 3.0
a=1.0
x(1) = 0.0
y(1) = 0.0
M=50000
thin=1000

status = vslnewstream( stream, VSL_BRNG_SFMT19937, 1777 )
status = vslnewstream( stream2, VSL_BRNG_SFMT19937, 1877 )

! f(x|y) = (x^2)*exp(-x*(4+y*y)) ## a Gamma density kernel
! f(y|x) = exp(-0.5*2*(x+1)*(y^2 - 2*y/(x+1)) ## a Gaussian kernel

do j=1,M
   do i=1,thin
       status = vdrnggamma(VSL_RNG_METHOD_GAMMA_GNORM, stream, n, x, alpha, a, (1.0/(4.0 + y(1)**2) ) )
       status = vdrnggaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream2, n, y, a, 1.0/sqrt(2*x(1)+2) )
       y(1) = 1.0/(x(1)+1) + y(1)
   enddo
enddo

print*, "X" , x
print*, "Y" , y
STOP_CLOCK = DCLOCK()
print *, 'Gibbs Sampler took:', STOP_CLOCK - START_CLOCK, 'seconds.'

end PROGRAM Gibbs

Andrey_N_Intel · ‎10-20-2014

Hi Steve, the answers to your questions are below. Please, le me know, if this helps. Andrey

1. No, Intel MKL does not provide such a solution.

2. The answer to the second question is complex and contains several items:

- as you mention above, the call to Intel MKL RNGs on the vector size 1 is not effective from perspective of the performance. Thus, it makes sense to apply blocking in the computations

- you have the dependence between parameters of the generators on and inside of each iteration of the loop. To resolve it, we might want to try the following approach which relies on the properties of the distribution generators. Let a and beta be displacement and scaling parameters of the Gamma generator, then Gamma(a,beta)=beta*Gamma(0,1)+a. Let a and sigma be mean and standard deviation of the Gaussian generator, than N(a,sigma)=sigma*N(0,1)+a.

Generate array of Gamma numbers via call to vdrnggamma(VSL_RNG_METHOD_GAMMA_GNORM, stream, n, x, alpha, 0.0, 1.0). Generate array of Gaussian numbers via call to vdrnggaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream2, n, y, 0.0, 1.0 ). Size of both arrays is n.

Use the following recurrent dependence

beta = 0.0;

for all i = 1,...,n

x =a + 1/(4+beta*beta) * x;

y = a + 1/sqrt(x+2)*y;

beta = 1 / (x + 1) + y;

- instead of using the same basic random number generator initialized with different streams you might want to use the generator such as MCG59 which supports LeapFrog feature and help to split the original random number sequence into non-intersecting subsequences. The additional details are available in Intel MKL Manual and VSL RNG notes. The code looks something like this

status = vslnewstream( stream, VSL_BRNG_MCG59, seed )

status=vslcopystream( stream2,stream )
status=vslleapfrogstream( stream, 0, 2 )

status=vslleapfrogstream( stream2, 1, 2 )

Gibbs sampling solution ?