<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Increase chemistry calculation performances, EXPONENTIAL in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784747#M1749</link>
    <description>Sorry for the delay; I was not able to get online this weekend.&lt;BR /&gt;&lt;BR /&gt;OK, I understand how it works now: it just replaces the standard exponential routine. But even with SVML, I still get better performance with the vdExp subroutine, so I think I will use that one.&lt;BR /&gt;&lt;BR /&gt;I just have one more question: when should I use -mkl=sequential, -mkl=cluster, and -mkl=parallel? I couldn't find out how to choose the best one...</description>
    <pubDate>Mon, 12 Jul 2010 11:49:10 GMT</pubDate>
    <dc:creator>benoit_leveugle</dc:creator>
    <dc:date>2010-07-12T11:49:10Z</dc:date>
    <item>
      <title>Increase chemistry calculation performances, EXPONENTIAL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784739#M1741</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I am trying to improve the performance of our calculation code.&lt;BR /&gt;One of the most CPU-costly subroutines is the chemistry calculation (Arrhenius law). It is a point-to-point exponential calculation, so each point is fully independent, and a simple generic test can show what performance we could expect.&lt;BR /&gt;&lt;BR /&gt;I tried the MKL library, but its performance is clearly worse than the simple mathematical function.&lt;BR /&gt;&lt;BR /&gt;Here is the source code (in Fortran 90, but someone used to C/C++ can easily understand it):&lt;BR /&gt;&lt;BR /&gt;&lt;I&gt;&lt;BR /&gt;program test&lt;BR /&gt;&lt;BR /&gt;implicit none&lt;BR /&gt;&lt;BR /&gt;integer, parameter :: Nx1=100000000&lt;BR /&gt;real(8), dimension(1:Nx1) :: x,y&lt;BR /&gt;integer :: n&lt;BR /&gt;real(8) :: time1,time2,totaltime&lt;BR /&gt;&lt;BR /&gt;y(:) = 0.0d0&lt;BR /&gt;do n=1,Nx1&lt;BR /&gt;x(n) = dcos(n*3.14d0/13.89567d0)&lt;BR /&gt;end do&lt;BR /&gt;&lt;BR /&gt;call cpu_time(time1) !! CPU time; may be summed over all threads if MKL runs threaded&lt;BR /&gt;CALL vdExp(Nx1,x,y) !! Intel MKL subroutine&lt;BR /&gt;call cpu_time(time2)&lt;BR /&gt;totaltime=time2-time1&lt;BR /&gt;print *,"test 1 :",totaltime,sum(y)&lt;BR /&gt;&lt;BR /&gt;y(:) = 0.0d0&lt;BR /&gt;do n=1,Nx1&lt;BR /&gt;x(n) = dcos(n*3.14d0/13.89567d0)&lt;BR /&gt;end do&lt;BR /&gt;&lt;BR /&gt;call cpu_time(time1)&lt;BR /&gt;do n=1,Nx1&lt;BR /&gt;y(n) = dexp(x(n)) !! Standard exponential function&lt;BR /&gt;end do&lt;BR /&gt;call cpu_time(time2)&lt;BR /&gt;totaltime=time2-time1&lt;BR /&gt;print *,"test 2 :",totaltime,sum(y)&lt;BR /&gt;&lt;BR /&gt;end program&lt;/I&gt;
&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;In fact, it computes Nx1=&lt;I&gt;100000000&lt;/I&gt; pseudo-random exponentials.&lt;BR /&gt;&lt;BR /&gt;I compiled with the following arguments (on a Xeon Nehalem):&lt;BR /&gt;ifort -O4 -xsse4.2 -mkl test.f90 &lt;BR /&gt;&lt;BR /&gt;And the results are the following:&lt;BR /&gt;test 1 : 1.35000000000000 126606589.225275 &lt;BR /&gt;test 2 : 0.420000000000000 126606589.225275 &lt;BR /&gt;&lt;BR /&gt;The non-MKL calculation is clearly far better (about 3 times faster).&lt;BR /&gt;&lt;BR /&gt;Did I make a mistake?&lt;BR /&gt;&lt;BR /&gt;Ben</description>
      <pubDate>Tue, 06 Jul 2010 17:51:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784739#M1741</guid>
      <dc:creator>benoit_leveugle</dc:creator>
      <dc:date>2010-07-06T17:51:54Z</dc:date>
    </item>
    <item>
      <title>Increase chemistry calculation performances, EXPONENTIAL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784740#M1742</link>
      <description>Hi Ben,&lt;DIV&gt;it's not clear how you linked MKL.&lt;/DIV&gt;&lt;DIV&gt;--Gennady&lt;/DIV&gt;</description>
      <pubDate>Wed, 07 Jul 2010 04:43:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784740#M1742</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2010-07-07T04:43:54Z</dc:date>
    </item>
    <item>
      <title>Increase chemistry calculation performances, EXPONENTIAL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784741#M1743</link>
      <description>&lt;P&gt;Hi Ben,&lt;/P&gt;

&lt;P&gt;Please forget my first message; I missed one important point. The input vector is very large, so the behavior you observe is expected.&lt;/P&gt;&lt;P&gt;Please look at the VML performance chart &lt;A href="http://software.intel.com/sites/products/documentation/hpc/mkl/vml/functions/exp.html"&gt;here&lt;/A&gt;.&lt;/P&gt;
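&lt;P&gt;A rough size check makes the vector size concrete. It assumes 8-byte doubles (real(8)) and the 8 MB last-level cache quoted for the Core i7 chart below. 8 MB is taken as about 8×10^6 bytes; using 2^23 bytes gives nearly the same count:&lt;/P&gt;

```latex
\[
N_{x1} \cdot 8\,\mathrm{B} = 10^{8} \cdot 8\,\mathrm{B} = 8 \times 10^{8}\,\mathrm{B} \approx 800\,\mathrm{MB}\ \text{per array}
\]
\[
\frac{8\,\mathrm{MB}}{8\,\mathrm{B}/\text{double}} \approx 10^{6}\ \text{doubles}, \qquad \frac{800\,\mathrm{MB}}{8\,\mathrm{MB}} = 100
\]
```

&lt;P&gt;Each of x and y is therefore about 100 times larger than the last-level cache, so vdExp has to stream x from main memory and write y back. The crossover of roughly 10^6 doubles lines up with the vector length at which the chart's CPE starts to rise.&lt;/P&gt;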

&lt;P&gt;See the middle chart, "Performance vs. Vector Length, Exp function" for Intel Core i7 (LLC size 8 MB). When the vector length reaches about 10^6, the CPE (clocks per element) stops decreasing and begins to increase. This is because the vector size (&lt;I&gt;Nx1&lt;/I&gt; * sizeof(double)) then exceeds the size of the last-level cache (8 MB in this case).&lt;/P&gt;&lt;P&gt;--Gennady&lt;/P&gt;</description>
      <pubDate>Wed, 07 Jul 2010 06:23:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784741#M1743</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2010-07-07T06:23:19Z</dc:date>
    </item>
    <item>
      <title>Increase chemistry calculation performances, EXPONENTIAL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784742#M1744</link>
      <description>This is one of the reasons for preferring compiler auto-vectorization (as ifort does, using the SVML library). When you split loops explicitly and don't allow your intermediate results to stay in L1 cache, any gain from an optimized library is quite likely to be lost in cache misses. If you have a reason for not allowing compiler optimization and prefer the VML calls, you could try blocking your loops for cache locality.&lt;BR /&gt;If you require only 3 digits of precision, why are you using double precision?</description>
      <pubDate>Wed, 07 Jul 2010 15:03:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784742#M1744</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2010-07-07T15:03:22Z</dc:date>
    </item>
    <item>
      <title>Increase chemistry calculation performances, EXPONENTIAL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784743#M1745</link>
      <description>Thank you for your answers.&lt;BR /&gt;&lt;BR /&gt;I have found that the poor performance was coming from the compile options: I had to use -mkl=sequential instead of just -mkl. I am using the serial version of the code. Do I need to do the same when I run the MPI code (or should I use -mkl=parallel)?&lt;BR /&gt;&lt;BR /&gt;Now the LA (low accuracy) test is 34% faster than the normal exponential, for a cumulative error of 5e-10.&lt;BR /&gt;&lt;BR /&gt;&lt;B&gt;Gennady&lt;/B&gt;&lt;BR /&gt;Regarding the vector size: in our calculations the size per processor (MPI calculations) is something like 20 000 to 100 000 points. According to the performance graphs, that is clearly far too much, so I think I will try to split the calculation into parts to stay within vectors of 1000-2000 points.&lt;BR /&gt;See below.&lt;BR /&gt;&lt;BR /&gt;&lt;B&gt;tim18&lt;/B&gt;&lt;BR /&gt;What do you mean by "If you have a reason for not allowing compiler optimization"?&lt;BR /&gt;In our code, the calculation is done as follows (this exponential operation costs a lot):&lt;BR /&gt;&lt;BR /&gt;For each species (from 2 to ~250):&lt;BR /&gt;&lt;BR /&gt;&lt;I&gt;do j=1,Nx2&lt;BR /&gt;do i=1,Nx1&lt;BR /&gt;W(i,j) = (A(i,j)**n) * (B(i,j)**m) * ... * dexp(T(i,j)*Cste)&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;/I&gt;&lt;BR /&gt;&lt;BR /&gt;So I was thinking about splitting it like this, with one vdexp call on the whole Nx1*Nx2 array:&lt;BR /&gt;&lt;BR /&gt;&lt;I&gt;call vdexp(Nx1*Nx2,T(:,:)*Cste,W(:,:))&lt;BR /&gt;&lt;BR /&gt;do j=1,Nx2&lt;BR /&gt;do i=1,Nx1&lt;BR /&gt;W(i,j) = (A(i,j)**n) * (B(i,j)**m) * ... * W(i,j)&lt;BR /&gt;end do&lt;BR /&gt;end do&lt;/I&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;B&gt;Gennady&lt;BR /&gt;&lt;/B&gt;And splitting it further to match the performance graphs, with one vdexp call per column of length Nx1 (but I am not sure about that):&lt;BR /&gt;If the size of i (Nx1) is between 128 and 512, I think this is better.&lt;BR /&gt;&lt;I&gt;&lt;BR /&gt;
do j=1,Nx2&lt;BR /&gt;call vdexp(Nx1,T(:,j)*Cste,W(:,j))&lt;BR /&gt;
do i=1,Nx1&lt;BR /&gt;
W(i,j) = (A(i,j)**n) * (B(i,j)**m) * ... * W(i,j)&lt;BR /&gt;
end do&lt;BR /&gt;
end do&lt;/I&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;B&gt;tim18&lt;/B&gt;&lt;BR /&gt;In fact, I need something like 1e-10 precision to prevent the CFD code from blowing up; that is why I use double precision. I did not intentionally use inaccurate values in the test. :)&lt;BR /&gt;</description>
      <pubDate>Thu, 08 Jul 2010 11:27:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784743#M1745</guid>
      <dc:creator>benoit_leveugle</dc:creator>
      <dc:date>2010-07-08T11:27:48Z</dc:date>
    </item>
    <item>
      <title>Increase chemistry calculation performances, EXPONENTIAL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784744#M1746</link>
      <description>The mkl_sequential library is likely to be appropriate under MPI if you don't see a benefit from MKL threading when running on a single node. This decision depends on many factors which haven't been discussed here.&lt;BR /&gt;I wanted to point out that what you call the "normal exponential" might well be optimized into SVML vector calls by ifort, giving performance competitive with VML without incurring the cache locality problem.</description>
      <pubDate>Thu, 08 Jul 2010 15:20:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784744#M1746</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2010-07-08T15:20:19Z</dc:date>
    </item>
    <item>
      <title>Increase chemistry calculation performances, EXPONENTIAL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784745#M1747</link>
      <description>I am sorry, but I can't find out how to use SVML :(&lt;BR /&gt;&lt;BR /&gt;Do I need to install an additional library (as I did for MKL), or do I need to add something specific at compile time?&lt;BR /&gt;I have found the page about C/C++:&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/articles/how-to-implement-the-short-vector-math-library/" target="_blank"&gt;http://software.intel.com/en-us/articles/how-to-implement-the-short-vector-math-library/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;I am really interested in SVML if the function prototype is the same as the "normal exponential" function, but I cannot find anything useful about Fortran, except the name of the library: libsvml.a&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 09 Jul 2010 07:37:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784745#M1747</guid>
      <dc:creator>benoit_leveugle</dc:creator>
      <dc:date>2010-07-09T07:37:12Z</dc:date>
    </item>
    <item>
      <title>Increase chemistry calculation performances, EXPONENTIAL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784746#M1748</link>
      <description>&lt;DIV id="_mcePaste"&gt;SVML (Short Vector Math Library) is not part of MKL; it is part of the Intel Compiler.&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;Look into ia32intrin.h, where you can find the whole API. You can also find more details in the compiler documentation.&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;As an example, the double-precision SVML counterpart of vdExp() is _mm_exp_pd(__m128d v1), which computes two exponentials at once.&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;But, as Tim18 already said, if you use the Intel compiler you don't need to care about the SVML routines, because in your particular case the Intel compiler will call them itself. You can check this by looking at the assembly code.&lt;/DIV&gt;&lt;DIV&gt;--Gennady&lt;/DIV&gt;</description>
      <pubDate>Fri, 09 Jul 2010 13:11:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784746#M1748</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2010-07-09T13:11:58Z</dc:date>
    </item>
    <item>
      <title>Increase chemistry calculation performances, EXPONENTIAL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784747#M1749</link>
      <description>Sorry for the delay; I was not able to get online this weekend.&lt;BR /&gt;&lt;BR /&gt;OK, I understand how it works now: it just replaces the standard exponential routine. But even with SVML, I still get better performance with the vdExp subroutine, so I think I will use that one.&lt;BR /&gt;&lt;BR /&gt;I just have one more question: when should I use -mkl=sequential, -mkl=cluster, and -mkl=parallel? I couldn't find out how to choose the best one...</description>
      <pubDate>Mon, 12 Jul 2010 11:49:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Increase-chemistry-calculation-performances-EXPONENTIAL/m-p/784747#M1749</guid>
      <dc:creator>benoit_leveugle</dc:creator>
      <dc:date>2010-07-12T11:49:10Z</dc:date>
    </item>
  </channel>
</rss>

