<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: MKL: Significantly reduced efficiency in parallel regime in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1652984#M36784</link>
    <description>&lt;P&gt;Thank you for posting your question.&lt;/P&gt;
&lt;P&gt;Please provide a simple reproducer, tell us about your test environment (OS, CPU platform, oneAPI version, etc.), and list the steps to reproduce the issue. It would also help if you could test again with the latest oneAPI version (2025.0.1).&lt;/P&gt;</description>
    <pubDate>Mon, 30 Dec 2024 01:36:25 GMT</pubDate>
    <dc:creator>Ruqiu_C_Intel</dc:creator>
    <dc:date>2024-12-30T01:36:25Z</dc:date>
    <item>
      <title>MKL: Significantly reduced efficiency in parallel regime</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1652867#M36778</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using Intel-oneapi-mkl/2024 to compute the inverse of a double-complex matrix with the getrf/getri routines.&lt;/P&gt;&lt;P&gt;On one core, the calculation takes 0.10 s.&lt;/P&gt;&lt;P&gt;However, with OpenMP parallelization the calculation time increases significantly:&lt;/P&gt;&lt;P&gt;OMP=2: calc. time = 0.16 s&lt;/P&gt;&lt;P&gt;OMP=32: calc. time = 10.1 s&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The complex(8) matrix is 484x484.&lt;/P&gt;&lt;P&gt;The program was compiled in hybrid MPI/OpenMP mode and run with a single MPI process.&lt;/P&gt;</description>
      <pubDate>Sun, 29 Dec 2024 04:25:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1652867#M36778</guid>
      <dc:creator>DmitrySkachkov</dc:creator>
      <dc:date>2024-12-29T04:25:11Z</dc:date>
    </item>
    <item>
      <title>Re: MKL: Significantly reduced efficiency in parallel regime</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1652984#M36784</link>
      <description>&lt;P&gt;Thank you for posting your question.&lt;/P&gt;
&lt;P&gt;Please provide a simple reproducer, tell us about your test environment (OS, CPU platform, oneAPI version, etc.), and list the steps to reproduce the issue. It would also help if you could test again with the latest oneAPI version (2025.0.1).&lt;/P&gt;</description>
      <pubDate>Mon, 30 Dec 2024 01:36:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1652984#M36784</guid>
      <dc:creator>Ruqiu_C_Intel</dc:creator>
      <dc:date>2024-12-30T01:36:25Z</dc:date>
    </item>
    <item>
      <title>Re: MKL: Significantly reduced efficiency in parallel regime</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1653173#M36788</link>
      <description>&lt;P&gt;The following code inverts a complex matrix using the MKL getrf/getri routines. The Intel compiler version is 2024.0.2 20231213.&lt;/P&gt;&lt;P&gt;Calculation time:&lt;/P&gt;&lt;P&gt;OMP_NUM_THREADS=1: &lt;STRONG&gt;22.29 s&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;OMP_NUM_THREADS=2: &lt;STRONG&gt;74.69 s&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;OMP_NUM_THREADS=4: &lt;STRONG&gt;314.15 s&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;The program was linked with the following command:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;ifx -o test.x program.o -qmkl-ilp64=parallel /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_blas95_ilp64.a /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_lapack95_ilp64.a -Wl,--start-group /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_intel_ilp64.a /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_intel_thread.a /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm -ldl&lt;/LI-CODE&gt;&lt;LI-CODE lang="markup"&gt;  Module MKL95
   include '/opt/shared/intel-oneapi/2024.0.1.46/mkl/2024.0/include/mkl_lapack.fi'
   include '/opt/shared/intel-oneapi/2024.0.1.46/mkl/2024.0/include/mkl_blas.fi'
  end module MKL95


MODULE F95_PRECISION
    INTEGER, PARAMETER :: SP = KIND(1.0E0)
    INTEGER, PARAMETER :: DP = KIND(1.0D0)
END MODULE F95_PRECISION


INCLUDE 'mkl_lapack.f90'
INCLUDE 'mkl_blas.f90'


 module data
   integer, parameter         :: N = 400
   complex(8)                 :: AA(N,N)
   integer                    :: ipvt(N)
 contains
  subroutine set_A
   integer     :: i,j
   real        :: r1(N)
   real        :: r2(N)
   do i=1,N
    call random_number(r1) 
    call random_number(r2) 
    do j=1,N
     AA(i,j)%re = r1(i)
     AA(i,j)%im = r2(j)
    enddo 
   enddo    
   do i=1,N
    AA(i,i) = (1.d0,0.1d0)
   enddo
   do i=1,N
    do j=i+1,N
     AA(i,j) = AA(j,i)
    enddo
   enddo
  end subroutine set_A
 end module data
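
 ! Suggested addition, not part of the original reproducer: a minimal
 ! wall-clock timer sketch for the scaling test. CPU_TIME sums the CPU
 ! time of all active threads, so it can grow with OMP_NUM_THREADS even
 ! when the elapsed time does not. SYSTEM_CLOCK reads a single
 ! elapsed-time clock instead. omp_get_wtime() from omp_lib is another
 ! option. The names wall_clock and wall_seconds are hypothetical.
 ! To use it, add "use wall_clock" to program A and replace the two
 ! CPU_TIME calls with t0 = wall_seconds() and tf = wall_seconds().
 module wall_clock
  implicit none
 contains
  ! Returns the current wall-clock time in seconds from SYSTEM_CLOCK.
  ! 64-bit arguments avoid early counter wrap-around and typically
  ! give a finer count rate than default integers.
  function wall_seconds() result(t)
   real(8)    :: t
   integer(8) :: cnt, rate
   call system_clock(cnt, rate)
   t = real(cnt, 8) / real(rate, 8)
  end function wall_seconds
 end module wall_clock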



  Program A
   use F95_PRECISION
   use MKL95
   use LAPACK95
   use BLAS95
   use data
   use omp_lib
   integer                    :: info   
   real(8)                    :: t0,tf

   call cpu_time(t0)
   do i=1,1000
    call set_A
    call getrf(AA,ipvt,info)
    if(info/=0) print *,' getrf: info=',info
    call getri(AA,ipvt,info)
    if(info/=0) print *,' getri: info=',info
   enddo
   call cpu_time(tf)
   print *,'  t_calc=',tf-t0
   
  end program A&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 30 Dec 2024 21:23:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1653173#M36788</guid>
      <dc:creator>DmitrySkachkov</dc:creator>
      <dc:date>2024-12-30T21:23:43Z</dc:date>
    </item>
    <item>
      <title>Re: MKL: Significantly reduced efficiency in parallel regime</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1655425#M36814</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;There is an issue with the way the timing is being measured. CPU_TIME returns the sum of the CPU times of all active threads, not the elapsed wall-clock time, so the reported time can grow with the number of threads even when the elapsed time does not.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;To estimate the performance or scaling of multithreaded applications, use the intrinsic subroutine SYSTEM_CLOCK or the portability function DCLOCK. Both routines return the elapsed time from a single clock.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Also, please consider upgrading to oneAPI version 2025.0, as I won't be able to provide advice for earlier versions.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&amp;nbsp;&lt;/P&gt;&lt;P&gt;Aleksandra&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 08 Jan 2025 12:54:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1655425#M36814</guid>
      <dc:creator>Aleksandra_K</dc:creator>
      <dc:date>2025-01-08T12:54:12Z</dc:date>
    </item>
    <item>
      <title>Re: MKL: Significantly reduced efficiency in parallel regime</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1657221#M36838</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Did you have a chance to try any of the suggested methods for measuring wall time?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 15 Jan 2025 12:30:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1657221#M36838</guid>
      <dc:creator>Aleksandra_K</dc:creator>
      <dc:date>2025-01-15T12:30:15Z</dc:date>
    </item>
    <item>
      <title>Re: MKL: Significantly reduced efficiency in parallel regime</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1658666#M36848</link>
      <description>&lt;P&gt;I am closing this issue due to lack of response and will no longer monitor this thread. If you need further assistance, please start a new thread.&amp;nbsp;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 20 Jan 2025 12:20:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Significantly-reduced-efficiency-in-parallel-regime/m-p/1658666#M36848</guid>
      <dc:creator>Aleksandra_K</dc:creator>
      <dc:date>2025-01-20T12:20:45Z</dc:date>
    </item>
  </channel>
</rss>

