<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: dgetri performance issues in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895422#M10854</link>
    <description>In case correcting your assignment of LWORK doesn't help:&lt;BR /&gt;It looks as if you are hitting a cache capacity limit. Did you check cache events? Once you find which function is taking up the time, it may be interesting to compile that one from source so that you can analyze it with VTune or PTU.&lt;BR /&gt;</description>
    <pubDate>Wed, 04 Nov 2009 13:57:40 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2009-11-04T13:57:40Z</dc:date>
    <item>
      <title>dgetri performance issues</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895419#M10851</link>
      <description>I have been using the dgetri and dgetrf functions on my machine, but the performance I have been getting on my random matrices has been extremely poor (5 GFLOPS running on all 4 cores).&lt;BR /&gt;&lt;BR /&gt;My question is: is there something I have set up wrong by relying on the default settings of ifort and icc, or am I perhaps calling the routines incorrectly?&lt;BR /&gt;&lt;BR /&gt;Specs:&lt;BR /&gt;Intel Q6600 Core 2 Quad&lt;BR /&gt;4 GB DDR2 RAM&lt;BR /&gt;Ubuntu 9.04 x86_64&lt;BR /&gt;&lt;BR /&gt;Code (C):&lt;BR /&gt;&lt;BR /&gt;for(x=0;x&amp;lt;NUMTRIALS;x++) {&lt;BR /&gt; //Variables Needed to be reset for&lt;BR /&gt; M=j;&lt;BR /&gt; N=j;&lt;BR /&gt; LDA=M;&lt;BR /&gt; LWORK=N;&lt;BR /&gt; INFO=0;&lt;BR /&gt; createMatrix(&amp;amp;M, &amp;amp;N, &amp;amp;A);&lt;BR /&gt; IPIV=(MKL_INT *)malloc(M*sizeof(MKL_INT));&lt;BR /&gt; WORK=(double *)malloc(M*sizeof(double));&lt;BR /&gt;&lt;BR /&gt; DGETRF( &amp;amp;M, &amp;amp;N, A, &amp;amp;LDA, IPIV, &amp;amp;INFO );&lt;BR /&gt; gettimeofday(&amp;amp;time_s, NULL);&lt;BR /&gt; DGETRI( &amp;amp;N, A, &amp;amp;LDA, IPIV, WORK, &amp;amp;LWORK, &amp;amp;INFO );&lt;BR /&gt; gettimeofday(&amp;amp;time_e, NULL);&lt;BR /&gt;&lt;BR /&gt; cpuTime=0;&lt;BR /&gt; CPU_gflops=0;&lt;BR /&gt; temp=0;&lt;BR /&gt; cpuTime=1e3*(time_e.tv_sec -time_s.tv_sec) + (time_e.tv_usec -time_s.tv_usec)*1e-3;&lt;BR /&gt; //Found in lawn41 lapack manual for greatest term in O(n) notation, p121&lt;BR /&gt; temp = (1.0f*M*N*N);//O(2mn^2)&lt;BR /&gt;&lt;BR /&gt; CPU_gflops = (temp/cpuTime) * 1e-6;&lt;BR /&gt; avg_flops[x]=CPU_gflops;&lt;BR /&gt;&lt;BR /&gt; free(A);&lt;BR /&gt; free(IPIV);&lt;BR /&gt; free(WORK);&lt;BR /&gt;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Makefile:&lt;BR /&gt;&lt;BR /&gt;FC = ifort&lt;BR /&gt;CC = icc&lt;BR /&gt;FCFLAGS = -O3 -cm -w&lt;BR /&gt;CCFLAGS = -O3&lt;BR /&gt;CXXDIR = /opt/intel/Compiler/11.1/038&lt;BR /&gt;LIBDIR:= $(CXXDIR)/mkl/lib/em64t&lt;BR /&gt;LIBS:=  $(LIBDIR)/libmkl_intel_lp64.a&lt;BR /&gt;LIBS += -Wl,--start-group -L$(LIBDIR) $(LIBDIR)/libmkl_intel_thread.a $(LIBDIR)/libmkl_core.a -Wl,--end-group -L$(LIBDIR) -liomp5 -lpthread&lt;BR /&gt;&lt;BR /&gt;OBJECTS =     makematrix.o \&lt;BR /&gt; MatrixMath.o&lt;BR /&gt;&lt;BR /&gt;DGETRI :  $(OBJECTS) DGETRIDriver.o&lt;BR /&gt; $(CC) -o $@ $(OBJECTS) DGETRIDriver.o -L$(LIBDIR) $(LIBS)&lt;BR /&gt;&lt;BR /&gt;This is my first time working with MKL, so any help is appreciated. Thanks!&lt;BR /&gt;&lt;BR /&gt;Matt&lt;BR /&gt;</description>
      <pubDate>Tue, 03 Nov 2009 01:18:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895419#M10851</guid>
      <dc:creator>hpc-matt</dc:creator>
      <dc:date>2009-11-03T01:18:45Z</dc:date>
    </item>
    <item>
      <title>Re: dgetri performance issues</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895420#M10852</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/450346"&gt;hpc-matt&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;I have been using the dgetri and dgetrf functions on my machine, but the performance I have been getting on my random matrices has been extremely poor (5 GFLOPS running on all 4 cores).&lt;BR /&gt;&lt;BR /&gt;My question is: is there something I have set up wrong by relying on the default settings of ifort and icc, or am I perhaps calling the routines incorrectly?&lt;BR /&gt;&lt;BR /&gt;Specs:&lt;BR /&gt;Intel Q6600 Core 2 Quad&lt;BR /&gt;4 GB DDR2 RAM&lt;BR /&gt;Ubuntu 9.04 x86_64&lt;BR /&gt;&lt;BR /&gt;Code (C):&lt;BR /&gt;&lt;BR /&gt;for(x=0;x&amp;lt;NUMTRIALS;x++) {&lt;BR /&gt;//Variables Needed to be reset for&lt;BR /&gt;M=j;&lt;BR /&gt;N=j;&lt;BR /&gt;LDA=M;&lt;BR /&gt;LWORK=N;&lt;BR /&gt;INFO=0;&lt;BR /&gt;createMatrix(&amp;amp;M, &amp;amp;N, &amp;amp;A);&lt;BR /&gt;IPIV=(MKL_INT *)malloc(M*sizeof(MKL_INT));&lt;BR /&gt;WORK=(double *)malloc(M*sizeof(double));&lt;BR /&gt;&lt;BR /&gt;DGETRF( &amp;amp;M, &amp;amp;N, A, &amp;amp;LDA, IPIV, &amp;amp;INFO );&lt;BR /&gt;gettimeofday(&amp;amp;time_s, NULL);&lt;BR /&gt;DGETRI( &amp;amp;N, A, &amp;amp;LDA, IPIV, WORK, &amp;amp;LWORK, &amp;amp;INFO );&lt;BR /&gt;gettimeofday(&amp;amp;time_e, NULL);&lt;BR /&gt;&lt;BR /&gt;cpuTime=0;&lt;BR /&gt;CPU_gflops=0;&lt;BR /&gt;temp=0;&lt;BR /&gt;cpuTime=1e3*(time_e.tv_sec -time_s.tv_sec) + (time_e.tv_usec -time_s.tv_usec)*1e-3;&lt;BR /&gt;//Found in lawn41 lapack manual for greatest term in O(n) notation, p121&lt;BR /&gt;temp = (1.0f*M*N*N);//O(2mn^2)&lt;BR /&gt;&lt;BR /&gt;CPU_gflops = (temp/cpuTime) * 1e-6;&lt;BR /&gt;avg_flops[x]=CPU_gflops;&lt;BR /&gt;&lt;BR /&gt;free(A);&lt;BR /&gt;free(IPIV);&lt;BR /&gt;free(WORK);&lt;BR /&gt;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Makefile:&lt;BR /&gt;&lt;BR /&gt;FC = ifort&lt;BR /&gt;CC = icc&lt;BR /&gt;FCFLAGS = -O3 -cm -w&lt;BR /&gt;CCFLAGS = -O3&lt;BR /&gt;CXXDIR = /opt/intel/Compiler/11.1/038&lt;BR /&gt;LIBDIR:= $(CXXDIR)/mkl/lib/em64t&lt;BR /&gt;LIBS:= $(LIBDIR)/libmkl_intel_lp64.a&lt;BR /&gt;LIBS += -Wl,--start-group -L$(LIBDIR) $(LIBDIR)/libmkl_intel_thread.a $(LIBDIR)/libmkl_core.a -Wl,--end-group -L$(LIBDIR) -liomp5 -lpthread&lt;BR /&gt;&lt;BR /&gt;OBJECTS = makematrix.o \&lt;BR /&gt;MatrixMath.o&lt;BR /&gt;&lt;BR /&gt;DGETRI : $(OBJECTS) DGETRIDriver.o&lt;BR /&gt;$(CC) -o $@ $(OBJECTS) DGETRIDriver.o -L$(LIBDIR) $(LIBS)&lt;BR /&gt;&lt;BR /&gt;This is my first time working with MKL, so any help is appreciated. Thanks!&lt;BR /&gt;&lt;BR /&gt;Matt&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Matt,&lt;BR /&gt;it depends on the size of the problem you are running on these 4 cores.&lt;BR /&gt;Intel Math Kernel Library (Intel MKL) offers highly optimized routines for medium and large input sizes.&lt;BR /&gt;For your reference, please see&lt;BR /&gt;&lt;A href="http://software.intel.com/sites/products/collateral/hpc/mkl/mkl_indepth.pdf" target="_blank"&gt;http://software.intel.com/sites/products/collateral/hpc/mkl/mkl_indepth.pdf&lt;/A&gt;&lt;BR /&gt;where you can find some performance data comparing dgetrf in MKL with ATLAS.&lt;BR /&gt;--Gennady&lt;BR /&gt;</description>
      <pubDate>Tue, 03 Nov 2009 04:49:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895420#M10852</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2009-11-03T04:49:26Z</dc:date>
    </item>
    <item>
      <title>Re: dgetri performance issues</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895421#M10853</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/334681"&gt;Gennady Fedorov (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;BR /&gt;Matt,&lt;BR /&gt;it depends on the size of the problem you are running on these 4 cores.&lt;BR /&gt;Intel Math Kernel Library (Intel MKL) offers highly optimized routines for medium and large input sizes.&lt;BR /&gt;For your reference, please see&lt;BR /&gt;&lt;A href="http://software.intel.com/sites/products/collateral/hpc/mkl/mkl_indepth.pdf" target="_blank"&gt;http://software.intel.com/sites/products/collateral/hpc/mkl/mkl_indepth.pdf&lt;/A&gt;&lt;BR /&gt;where you can find some performance data comparing dgetrf in MKL with ATLAS.&lt;BR /&gt;--Gennady&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I am using matrices of dimension 2k to 12k. I have been benchmarking my machine, and the dgetrf routine performs about the same as the standard benchmarks; however, the DGETRI function is underperforming substantially. I realize the runtime complexity is on the order of O(n*m^2), but still, if I can get 30+ GFLOPS for dgetrf, I should be able to get half of that with dgetri. I am currently getting around 3 GFLOPS, and performance decreases as the size increases. It also does not matter whether I use Fortran or C. Thanks!&lt;BR /&gt;</description>
      <pubDate>Tue, 03 Nov 2009 17:32:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895421#M10853</guid>
      <dc:creator>hpc-matt</dc:creator>
      <dc:date>2009-11-03T17:32:45Z</dc:date>
    </item>
    <item>
      <title>Re: dgetri performance issues</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895422#M10854</link>
      <description>In case correcting your assignment of LWORK doesn't help:&lt;BR /&gt;It looks as if you are hitting a cache capacity limit. Did you check cache events? Once you find which function is taking up the time, it may be interesting to compile that one from source so that you can analyze it with VTune or PTU.&lt;BR /&gt;</description>
      <pubDate>Wed, 04 Nov 2009 13:57:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895422#M10854</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-11-04T13:57:40Z</dc:date>
    </item>
    <item>
      <title>Re: dgetri performance issues</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895423#M10855</link>
      <description>&lt;DIV style="margin:0px;"&gt;Matt,&lt;BR /&gt;&lt;BR /&gt;You have not allocated enough workspace for DGETRI to achieve high performance. You should use LWORK=N*NB, where NB is the block size (NB=64 in particular). You can also request the optimal workspace size from DGETRI itself:&lt;BR /&gt;&lt;BR /&gt;MKL_INT MONE=-1;&lt;BR /&gt;double LWKOPT;&lt;BR /&gt;DGETRI( &amp;amp;N, A, &amp;amp;LDA, IPIV, &amp;amp;LWKOPT, &amp;amp;MONE, &amp;amp;INFO );&lt;BR /&gt;LWORK=(MKL_INT)LWKOPT;&lt;BR /&gt;&lt;BR /&gt;Please also note that in your example, instead of&lt;BR /&gt;WORK=(double *)malloc(M*sizeof(double));&lt;BR /&gt;it should be:&lt;BR /&gt;WORK=(double *)malloc(LWORK*sizeof(double));&lt;BR /&gt;&lt;BR /&gt;--Alexander&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;
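A minimal sketch of the same workspace query, wrapped in a reusable helper. It assumes the LP64 interface (MKL_INT is a plain int) and a Fortran-style DGETRI prototype with every argument passed by pointer, written here as the hypothetical typedef dgetri_fn. The routine is passed in as a function pointer so that the sketch compiles without MKL. The helper names are illustrative and are not part of MKL.&lt;BR /&gt;&lt;PRE&gt;
```c
/* Hypothetical typedef for the Fortran-style DGETRI interface used in this
   thread (LP64 assumed: MKL_INT is int; all arguments passed by pointer). */
typedef void dgetri_fn(const int *n, double *a, const int *lda,
                       const int *ipiv, double *work, const int *lwork,
                       int *info);

/* Convert the optimal size reported in WORK(1) by a workspace query into an
   LWORK to allocate, never going below the documented minimum max(1, N). */
int lwork_from_query(double wkopt, int n)
{
    int lwork = (int)wkopt;
    int min_lwork = n > 1 ? n : 1;
    return lwork > min_lwork ? lwork : min_lwork;
}

/* Workspace query: with LWORK = -1, DGETRI does not invert anything; it only
   writes the optimal workspace size to work[0]. Scalars are passed as
   one-element arrays, which decay to the pointers DGETRI expects.
   Returns 0 if DGETRI reports an argument error (info != 0). */
int dgetri_query_lwork(dgetri_fn *dgetri, const int *n, double *a,
                       const int *lda, const int *ipiv)
{
    const int query[1] = { -1 };  /* LWORK = -1 requests a size query */
    double wkopt[1] = { 0.0 };    /* receives the optimal LWORK */
    int info[1] = { 0 };

    dgetri(n, a, lda, ipiv, wkopt, query, info);
    if (info[0] != 0)
        return 0;
    return lwork_from_query(wkopt[0], n[0]);
}
```
&lt;/PRE&gt;The caller would then allocate WORK with lwork*sizeof(double) bytes (not M) before the real DGETRI call, and would treat a return value of 0 as a failed query. Whether MKL's own dgetri declaration matches dgetri_fn exactly (const qualifiers, MKL_INT width) is an assumption that should be checked against the MKL headers.&lt;BR /&gt;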
&lt;BR /&gt;</description>
      <pubDate>Thu, 05 Nov 2009 06:33:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895423#M10855</guid>
      <dc:creator>Alexander_K_Intel3</dc:creator>
      <dc:date>2009-11-05T06:33:53Z</dc:date>
    </item>
    <item>
      <title>Re: dgetri performance issues</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895424#M10856</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/93635"&gt;Alexander Kobotov (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;Matt,&lt;BR /&gt;&lt;BR /&gt;You have not allocated enough workspace for DGETRI to achieve high performance. You should use LWORK=N*NB, where NB is the block size (NB=64 in particular). You can also request the optimal workspace size from DGETRI itself:&lt;BR /&gt;&lt;BR /&gt;MKL_INT MONE=-1;&lt;BR /&gt;double LWKOPT;&lt;BR /&gt;DGETRI( &amp;amp;N, A, &amp;amp;LDA, IPIV, &amp;amp;LWKOPT, &amp;amp;MONE, &amp;amp;INFO );&lt;BR /&gt;LWORK=(MKL_INT)LWKOPT;&lt;BR /&gt;&lt;BR /&gt;Please also note that in your example, instead of&lt;BR /&gt;WORK=(double *)malloc(M*sizeof(double));&lt;BR /&gt;it should be:&lt;BR /&gt;WORK=(double *)malloc(LWORK*sizeof(double));&lt;BR /&gt;&lt;BR /&gt;--Alexander&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;
&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
Thanks, that improved my results substantially. I am now working on getting VTune set up. Thanks, all, for your help.&lt;BR /&gt;&lt;BR /&gt;Matt&lt;BR /&gt;</description>
      <pubDate>Tue, 10 Nov 2009 23:39:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/dgetri-performance-issues/m-p/895424#M10856</guid>
      <dc:creator>hpc-matt</dc:creator>
      <dc:date>2009-11-10T23:39:53Z</dc:date>
    </item>
  </channel>
</rss>

