<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Gibbs sampling solution ? in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Gibbs-sampling-solution/m-p/1027339#M19966</link>
    <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Hi&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I'm attempting to write a restricted boltzman machine using Gibbs Sampling for a deep learning neural net . I had a look in MKL and didn't find a specific routine so I had a search on the internet and found a C/Java/Python/R/Scala implementation&amp;nbsp;&lt;A href="http://www.r-bloggers.com/mcmc-and-faster-gibbs-sampling-using-rcpp/"&gt;http://www.r-bloggers.com/mcmc-and-faster-gibbs-sampling-using-rcpp/&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;I created my own implementation using ifort and MKL based on C code I found there and on referenced pages, I'm not a mathematician but I did physics at university 30yrs ago and have written neural nets before so I can follow a formula and I get the rough gist of gibbs sampling but I'm looking at GS as a black box solution&lt;/P&gt;

&lt;P&gt;2 questions -&lt;/P&gt;

&lt;P&gt;1 is there a ready made MKL solution?&lt;/P&gt;

&lt;P&gt;2 The C code from the web runs in just under 8 seconds on my computer, however the Fortran version using gamma and gaussian distribution takes 55 sec which is slower than python. Now I assume this is because the other web progs are using distributions returning scalars rather than a vector of size 1 like me, plus there is no statement as to correctness of implementation of the C/Java/Python etc libs. Indeed , I changed the return vector size in fortran to a large size and proportionally reduced the loop size and the the MKL implementation comes in under 2 seconds, so I'm obviously not doing a like by like comparison. BUT, my simplistic understanding of Gibbs sampling is that x and y need to be cross related across the 2 distributions and I can't think how to do this with a vector of size &amp;gt; 1 to take advantage of the MKL implementation, any ideas?? (I'm using a Mersenne Twister as a direct comparison - I can cut time in half with a simpler method)&lt;/P&gt;

&lt;P&gt;thanks&lt;/P&gt;

&lt;P&gt;Steve&lt;/P&gt;

&lt;P&gt;include 'mkl_vsl.f90'&lt;BR /&gt;
	PROGRAM Gibbs&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;USE IFPORT&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; USE MKL_VSL_TYPE&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; USE MKL_VSL&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;IMPLICIT NONE&lt;BR /&gt;
	REAL(8) START_CLOCK, STOP_CLOCK&lt;BR /&gt;
	INTEGER status,n,i,j, M, thin&lt;BR /&gt;
	REAL(8), DIMENSION(1) :: x,y&lt;BR /&gt;
	TYPE (VSL_STREAM_STATE) :: stream, stream2&lt;BR /&gt;
	REAL(8) alpha, a&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;
	!VSL_RNG_METHOD_GAMMA_GNORM_ACCURATE&lt;BR /&gt;
	!VSL_RNG_METHOD_GAMMA_GNORM&lt;BR /&gt;
	!VSL_RNG_METHOD_EXPONENTIAL_ICDF_ACCURATE&lt;/P&gt;

&lt;P&gt;START_CLOCK = DCLOCK()&lt;/P&gt;

&lt;P&gt;n=1&lt;BR /&gt;
	alpha = 3.0&lt;BR /&gt;
	a=1.0&lt;BR /&gt;
	x(1) = 0.0&lt;BR /&gt;
	y(1) = 0.0&lt;BR /&gt;
	M=50000&lt;BR /&gt;
	thin=1000&lt;/P&gt;

&lt;P&gt;status = vslnewstream( stream, VSL_BRNG_SFMT19937, 1777 )&lt;BR /&gt;
	status = vslnewstream( stream2, VSL_BRNG_SFMT19937, 1877 )&lt;/P&gt;

&lt;P&gt;! f(x|y) = (x^2)*exp(-x*(4+y*y)) &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ## a Gamma density kernel&lt;BR /&gt;
	! f(y|x) = exp(-0.5*2*(x+1)*(y^2 - 2*y/(x+1)) &amp;nbsp;## a Gaussian kernel&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;
	do j=1,M&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;do i=1,thin&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;status = vdrnggamma(VSL_RNG_METHOD_GAMMA_GNORM, stream, n, x, alpha, a, (1.0/(4.0 + y(1)**2) ) )&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;status = vdrnggaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream2, n, y, a, 1.0/sqrt(2*x(1)+2) )&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;y(1) = 1.0/(x(1)+1) + y(1)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;enddo&lt;BR /&gt;
	enddo&lt;/P&gt;

&lt;P&gt;print*, "X" , x&lt;BR /&gt;
	print*, "Y" , y&lt;BR /&gt;
	STOP_CLOCK = DCLOCK()&lt;BR /&gt;
	print *, 'Gibbs Sampler took:', STOP_CLOCK - START_CLOCK, 'seconds.'&lt;/P&gt;

&lt;P&gt;end PROGRAM Gibbs&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 17 Oct 2014 16:18:36 GMT</pubDate>
    <dc:creator>steve_o_</dc:creator>
    <dc:date>2014-10-17T16:18:36Z</dc:date>
    <item>
      <title>Gibbs sampling solution ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Gibbs-sampling-solution/m-p/1027339#M19966</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Hi&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I'm attempting to write a restricted boltzman machine using Gibbs Sampling for a deep learning neural net . I had a look in MKL and didn't find a specific routine so I had a search on the internet and found a C/Java/Python/R/Scala implementation&amp;nbsp;&lt;A href="http://www.r-bloggers.com/mcmc-and-faster-gibbs-sampling-using-rcpp/"&gt;http://www.r-bloggers.com/mcmc-and-faster-gibbs-sampling-using-rcpp/&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;I created my own implementation using ifort and MKL based on C code I found there and on referenced pages, I'm not a mathematician but I did physics at university 30yrs ago and have written neural nets before so I can follow a formula and I get the rough gist of gibbs sampling but I'm looking at GS as a black box solution&lt;/P&gt;

&lt;P&gt;2 questions -&lt;/P&gt;

&lt;P&gt;1 is there a ready made MKL solution?&lt;/P&gt;

&lt;P&gt;2 The C code from the web runs in just under 8 seconds on my computer, however the Fortran version using gamma and gaussian distribution takes 55 sec which is slower than python. Now I assume this is because the other web progs are using distributions returning scalars rather than a vector of size 1 like me, plus there is no statement as to correctness of implementation of the C/Java/Python etc libs. Indeed , I changed the return vector size in fortran to a large size and proportionally reduced the loop size and the the MKL implementation comes in under 2 seconds, so I'm obviously not doing a like by like comparison. BUT, my simplistic understanding of Gibbs sampling is that x and y need to be cross related across the 2 distributions and I can't think how to do this with a vector of size &amp;gt; 1 to take advantage of the MKL implementation, any ideas?? (I'm using a Mersenne Twister as a direct comparison - I can cut time in half with a simpler method)&lt;/P&gt;

&lt;P&gt;thanks&lt;/P&gt;

&lt;P&gt;Steve&lt;/P&gt;

&lt;P&gt;include 'mkl_vsl.f90'&lt;BR /&gt;
	PROGRAM Gibbs&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;USE IFPORT&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; USE MKL_VSL_TYPE&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; USE MKL_VSL&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;IMPLICIT NONE&lt;BR /&gt;
	REAL(8) START_CLOCK, STOP_CLOCK&lt;BR /&gt;
	INTEGER status,n,i,j, M, thin&lt;BR /&gt;
	REAL(8), DIMENSION(1) :: x,y&lt;BR /&gt;
	TYPE (VSL_STREAM_STATE) :: stream, stream2&lt;BR /&gt;
	REAL(8) alpha, a&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;
	!VSL_RNG_METHOD_GAMMA_GNORM_ACCURATE&lt;BR /&gt;
	!VSL_RNG_METHOD_GAMMA_GNORM&lt;BR /&gt;
	!VSL_RNG_METHOD_EXPONENTIAL_ICDF_ACCURATE&lt;/P&gt;

&lt;P&gt;START_CLOCK = DCLOCK()&lt;/P&gt;

&lt;P&gt;n=1&lt;BR /&gt;
	alpha = 3.0&lt;BR /&gt;
	a=1.0&lt;BR /&gt;
	x(1) = 0.0&lt;BR /&gt;
	y(1) = 0.0&lt;BR /&gt;
	M=50000&lt;BR /&gt;
	thin=1000&lt;/P&gt;

&lt;P&gt;status = vslnewstream( stream, VSL_BRNG_SFMT19937, 1777 )&lt;BR /&gt;
	status = vslnewstream( stream2, VSL_BRNG_SFMT19937, 1877 )&lt;/P&gt;

&lt;P&gt;! f(x|y) = (x^2)*exp(-x*(4+y*y)) &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ## a Gamma density kernel&lt;BR /&gt;
	! f(y|x) = exp(-0.5*2*(x+1)*(y^2 - 2*y/(x+1)) &amp;nbsp;## a Gaussian kernel&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;
	do j=1,M&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;do i=1,thin&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;status = vdrnggamma(VSL_RNG_METHOD_GAMMA_GNORM, stream, n, x, alpha, a, (1.0/(4.0 + y(1)**2) ) )&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;status = vdrnggaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream2, n, y, a, 1.0/sqrt(2*x(1)+2) )&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;y(1) = 1.0/(x(1)+1) + y(1)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;enddo&lt;BR /&gt;
	enddo&lt;/P&gt;

&lt;P&gt;print*, "X" , x&lt;BR /&gt;
	print*, "Y" , y&lt;BR /&gt;
	STOP_CLOCK = DCLOCK()&lt;BR /&gt;
	print *, 'Gibbs Sampler took:', STOP_CLOCK - START_CLOCK, 'seconds.'&lt;/P&gt;

&lt;P&gt;end PROGRAM Gibbs&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 17 Oct 2014 16:18:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Gibbs-sampling-solution/m-p/1027339#M19966</guid>
      <dc:creator>steve_o_</dc:creator>
      <dc:date>2014-10-17T16:18:36Z</dc:date>
    </item>
    <item>
      <title>Hi Steve, the answers to your</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Gibbs-sampling-solution/m-p/1027340#M19967</link>
      <description>&lt;P&gt;Hi Steve, the answers to your questions are below. Please, le me know, if this helps. Andrey&lt;/P&gt;

&lt;P&gt;1. No, Intel MKL does not provide such a solution.&lt;/P&gt;

&lt;P&gt;2. The answer to the second question is complex and contains several items:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; - as you mention above, the&amp;nbsp;call to Intel MKL RNGs on the vector size 1 is not effective from perspective of the performance. Thus, it makes sense to apply blocking in the computations&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; - you have the dependence between parameters of the generators on and inside of each iteration of the loop.&amp;nbsp;&amp;nbsp;To resolve it, we might want to try the following approach which relies on the properties of the distribution generators. Let a and beta be displacement and scaling parameters of the Gamma generator, then Gamma(a,beta)=beta*Gamma(0,1)+a. Let a and sigma be mean and standard deviation of the Gaussian generator, than N(a,sigma)=sigma*N(0,1)+a.&lt;/P&gt;

&lt;P&gt;Generate array of Gamma numbers&amp;nbsp;via call to&amp;nbsp;vdrnggamma(VSL_RNG_METHOD_GAMMA_GNORM, stream, n, x, alpha, 0.0, 1.0). Generate array of Gaussian numbers via call to vdrnggaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream2, n, y, 0.0,&amp;nbsp;1.0 ). Size of both arrays is n.&lt;/P&gt;

&lt;P&gt;Use the following recurrent dependence&lt;/P&gt;

&lt;P&gt;beta = 0.0;&lt;/P&gt;

&lt;P&gt;for all i = 1,...,n&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; x&lt;I&gt; =a + 1/(4+beta*beta) * x&lt;I&gt;;&lt;/I&gt;&lt;/I&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; y&lt;I&gt; = a + 1/sqrt(x&lt;I&gt;+2)*y&lt;I&gt;;&lt;/I&gt;&lt;/I&gt;&lt;/I&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; beta = 1 / (x&lt;I&gt; + 1) + y&lt;I&gt;;&lt;/I&gt;&lt;/I&gt;&lt;/P&gt;

&lt;P&gt;- instead of using&amp;nbsp;the same basic random number generator initialized with different streams you might want to use the generator&amp;nbsp;such as MCG59 which supports LeapFrog feature and help to split the original random number sequence into non-intersecting subsequences. The additional details are available in Intel MKL Manual and VSL RNG&amp;nbsp;notes.&amp;nbsp;The code&amp;nbsp;looks something like this&lt;/P&gt;

&lt;P&gt;status = vslnewstream( stream, VSL_BRNG_MCG59,&amp;nbsp;seed )&lt;/P&gt;

&lt;P&gt;status=vslcopystream( stream2,stream )&lt;BR /&gt;
	status=vslleapfrogstream( stream, 0,&amp;nbsp;2 )&lt;/P&gt;

&lt;P&gt;status=vslleapfrogstream( stream2, 1,&amp;nbsp;2 )&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;BR /&gt;
	&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Oct 2014 08:14:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Gibbs-sampling-solution/m-p/1027340#M19967</guid>
      <dc:creator>Andrey_N_Intel</dc:creator>
      <dc:date>2014-10-20T08:14:33Z</dc:date>
    </item>
  </channel>
</rss>

