<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic LU factorisation with OpenMP threaded functions(dgetrf) de MKL in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LU-factorisation-with-OpenMP-threaded-functions-dgetrf-de-MKL/m-p/775863#M1015</link>
    <description>As I understand your setup, you are asking each thread to execute dgetrf, which can be an excellent strategy provided that each thread passes its own distinct argument arrays. Unless you enable OpenMP nested parallelism, which is off by default, MKL will not start new threads inside dgetrf.&lt;BR /&gt;If you want dgetrf to work on a single data set, using additional threads internal to itself, call it outside a parallel region, as Todd said.&lt;BR /&gt;If you don't have enough parallel cases to use all your cores, you could enable nested parallelism; you would then want to give each problem its own contiguous group of cores, using both the OpenMP and MKL thread-number settings suggested by Todd. Get it working without nested parallelism first, so that you have a basis for comparison.</description>
    <pubDate>Fri, 05 Aug 2011 15:57:04 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2011-08-05T15:57:04Z</dc:date>
    <item>
      <title>LU factorisation with OpenMP threaded functions(dgetrf) de MKL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LU-factorisation-with-OpenMP-threaded-functions-dgetrf-de-MKL/m-p/775862#M1014</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I have a small problem and cannot find a solution after hours of searching the Internet. Maybe I have misunderstood some concepts.&lt;BR /&gt;&lt;BR /&gt;I want to speed up the LU factorisation of the matrix A of a system A*x=b. If I understand correctly, MKL includes an OpenMP-threaded version of DGETRF, but I don't know how to use it in my OpenMP code. This is what I do at the moment:&lt;BR /&gt;&lt;BR /&gt;!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&lt;BR /&gt; dim_tot = (M+1)*(N+1)&lt;BR /&gt; &lt;BR /&gt; !$OMP PARALLEL PRIVATE(rang)&lt;BR /&gt;&lt;BR /&gt; !$OMP SINGLE&lt;BR /&gt; nthr = OMP_GET_NUM_THREADS()&lt;BR /&gt; print*, "******"&lt;BR /&gt; print*, "Number of threads being used = ", nthr&lt;BR /&gt; !$OMP END SINGLE&lt;BR /&gt;&lt;BR /&gt; rang = OMP_GET_THREAD_NUM()&lt;BR /&gt; print *,"Thread No.",rang," is working !"&lt;BR /&gt;&lt;BR /&gt; !! LU decomposition&lt;BR /&gt; call DGETRF( dim_tot, dim_tot, G, dim_tot, IPIVG, INFOG )&lt;BR /&gt;&lt;BR /&gt; !$OMP END PARALLEL &lt;BR /&gt;!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&lt;BR /&gt;&lt;BR /&gt;The Makefile looks like this:&lt;BR /&gt;&lt;BR /&gt;!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&lt;BR /&gt;&lt;BR /&gt;MKLPATH = /opt/intel/composerxe-2011.2.137/mkl/lib/intel64&lt;BR /&gt;MKLINCLUDE = /opt/intel/composerxe-2011.2.137/mkl/include&lt;BR /&gt;&lt;BR /&gt;# Compiler&lt;BR /&gt;IFORT = ifort&lt;BR /&gt;&lt;BR /&gt;IOPT = -c -openmp&lt;BR /&gt;&lt;BR /&gt;# Linkers&lt;BR /&gt;ILINK = -L$(MKLPATH) -I$(MKLINCLUDE) -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread&lt;BR /&gt;&lt;BR /&gt;# linking&lt;BR /&gt;code_omp : code_omp.o&lt;BR /&gt; $(IFORT) -o code_omp code_omp.o $(ILINK)&lt;BR /&gt;&lt;BR /&gt;# compilation&lt;BR /&gt;code_omp.o : code_omp.f90&lt;BR /&gt; $(IFORT) $(IOPT) code_omp.f90&lt;BR /&gt;!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&lt;BR /&gt;&lt;BR /&gt;I think the LU factorisation ("dgetrf") takes more
time than the solve step ("dgetrs"), doesn't it?&lt;BR /&gt;That's why I want to thread "dgetrf" rather than "dgetrs".&lt;BR /&gt;Can someone give me some pointers?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;-- Xin</description>
      <pubDate>Fri, 05 Aug 2011 14:42:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LU-factorisation-with-OpenMP-threaded-functions-dgetrf-de-MKL/m-p/775862#M1014</guid>
      <dc:creator>cfd_jinx</dc:creator>
      <dc:date>2011-08-05T14:42:21Z</dc:date>
    </item>
    <item>
      <title>LU factorisation with OpenMP threaded functions(dgetrf) de MKL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LU-factorisation-with-OpenMP-threaded-functions-dgetrf-de-MKL/m-p/775863#M1015</link>
      <description>As I understand your setup, you are asking each thread to execute dgetrf, which can be an excellent strategy provided that each thread passes its own distinct argument arrays. Unless you enable OpenMP nested parallelism, which is off by default, MKL will not start new threads inside dgetrf.&lt;BR /&gt;If you want dgetrf to work on a single data set, using additional threads internal to itself, call it outside a parallel region, as Todd said.&lt;BR /&gt;If you don't have enough parallel cases to use all your cores, you could enable nested parallelism; you would then want to give each problem its own contiguous group of cores, using both the OpenMP and MKL thread-number settings suggested by Todd. Get it working without nested parallelism first, so that you have a basis for comparison.</description>
      <pubDate>Fri, 05 Aug 2011 15:57:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LU-factorisation-with-OpenMP-threaded-functions-dgetrf-de-MKL/m-p/775863#M1015</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2011-08-05T15:57:04Z</dc:date>
    </item>
    <item>
      <title>LU factorisation with OpenMP threaded functions(dgetrf) de MKL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LU-factorisation-with-OpenMP-threaded-functions-dgetrf-de-MKL/m-p/775864#M1016</link>
      <description>Hello Xin,&lt;BR /&gt;&lt;BR /&gt;The function DGETRF is part of Intel MKL and is already threaded, so you can call it from your program and get parallelism without any OpenMP* directives in your code. All you need to do is call DGETRF; since threading is turned on to use as many cores as are available, you should see MKL parallelism at work!&lt;BR /&gt;&lt;BR /&gt;Note: Threading may not be used if the matrix is too small to be divided efficiently among threads. You can use the omp_set_num_threads() or &lt;A href="http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/mklxe/mkl_manual_win_mac/hh_goto.htm#support/functn_mkl_set_num_threads.htm"&gt;mkl_set_num_threads()&lt;/A&gt; functions to change the number of threads, then check how the performance changes or watch how the load changes in a performance monitoring tool.&lt;BR /&gt;&lt;BR /&gt;Todd</description>
      <pubDate>Fri, 05 Aug 2011 16:00:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LU-factorisation-with-OpenMP-threaded-functions-dgetrf-de-MKL/m-p/775864#M1016</guid>
      <dc:creator>Todd_R_Intel</dc:creator>
      <dc:date>2011-08-05T16:00:14Z</dc:date>
    </item>
    <item>
      <title>LU factorisation with OpenMP threaded functions(dgetrf) de MKL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LU-factorisation-with-OpenMP-threaded-functions-dgetrf-de-MKL/m-p/775865#M1017</link>
      <description>Thanks a lot to both of you for the quick responses.&lt;BR /&gt;&lt;BR /&gt;I tried the library again today. It seems that nothing has to be changed in the code, only the linker line used at compilation.&lt;BR /&gt;&lt;BR /&gt;After some comparisons, I found that MKL improves the calculation performance &lt;SPAN style="text-decoration: underline;"&gt;A LOT&lt;/SPAN&gt;. Great!&lt;BR /&gt;&lt;BR /&gt;More details: with 8 processors, a 16200*16200 matrix and one right-hand side, the total calculation time of (dgetrf+dgetrs) has been reduced by a factor of 10. Is that normal?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Xin</description>
      <pubDate>Thu, 11 Aug 2011 09:50:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LU-factorisation-with-OpenMP-threaded-functions-dgetrf-de-MKL/m-p/775865#M1017</guid>
      <dc:creator>cfd_jinx</dc:creator>
      <dc:date>2011-08-11T09:50:18Z</dc:date>
    </item>
    <item>
      <title>LU factorisation with OpenMP threaded functions(dgetrf) de MKL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LU-factorisation-with-OpenMP-threaded-functions-dgetrf-de-MKL/m-p/775866#M1018</link>
      <description>10 times faster than what? Are you comparing with the sequential MKL? Do you have 8 physical processors, or are you using Hyper-Threading? Either way, a 10x speedup on 8 processors does not seem reasonable. In the best case you will get linear speedup. What you report is super-linear speedup, which happens very rarely, and usually only when you are processing a large amount of data. Your matrix is not that large. You should also validate the results to make sure that you have called the routines correctly.&lt;BR /&gt;&lt;BR /&gt;D.&lt;BR /&gt;</description>
      <pubDate>Thu, 11 Aug 2011 11:07:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LU-factorisation-with-OpenMP-threaded-functions-dgetrf-de-MKL/m-p/775866#M1018</guid>
      <dc:creator>Dan4</dc:creator>
      <dc:date>2011-08-11T11:07:02Z</dc:date>
    </item>
  </channel>
</rss>

