<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Mathieu! in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149090#M26947</link>
    <description>&lt;P style="margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;Hi Mathieu!&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;&amp;gt;Is mkl fftw3 wrapper completely thread safe ?&lt;BR /&gt;
	Generally speaking - NO. But there are cases when plan creation will work correctly.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;First of all, a pointer-to-the-plan should be defined for each thread personally. In other words, fftw_paln should be defined INSIDE a custom OMP loop, otherwise the behavior is undefined. This is a requirement.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;The only shared object during fftw plan creation in Intel(R) MKL FFTW3 wrappers is a special structure defined in fftw_version.c file. The critical variable there is nthreads - the number of threads used during plan computation. Both thread-safety and functional correctness depend on the value of this variable.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;First case: user wants to run each plan with one thread only (sequential) case. This should work. We recommend to link an application with mkl_sequential library to avoid possible side effects.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;Second case: user wants to run each plan with a constant number of several threads (OMP nested) case. This case may work under limitations. To activate several threads for a compute section in a custom OMP loop, user needs to link an application with the mkl_xxx_thread library and do a call to the "mkl_set_num_threads_local(numThreads)" function. Otherwise, the behavior should be the same as in first case. numThreads should be the same for all threads. It's desired to avoid an over-subscription that's why:&lt;BR /&gt;
	(user's # of threads for OMP loop) * (# of Intel(R) MKL threads) &amp;lt;= (# of available machine threads);&lt;BR /&gt;
	If this configuration is setup, user is also required to set environment variable named "OMP_NESTED=true". Otherwise, functional correctness is not guaranteed.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;Third case: user wants to run each plan with different number of several threads (complicated OMP nested) case. This case doesn't work because the actual number of threads will be overwritten after each thread creates a new plan. The behavior of this configuration is undefined.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;Intel(R) MKL FFT team may provide examples that describe both cases and how to write them in one of the next releases by request.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;Thank you.&lt;/P&gt;</description>
    <pubDate>Tue, 24 Apr 2018 04:00:46 GMT</pubDate>
    <dc:creator>Dmitry_Z_Intel1</dc:creator>
    <dc:date>2018-04-24T04:00:46Z</dc:date>
    <item>
      <title>MKL fftw3 thread safety</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149085#M26942</link>
      <description>&lt;P&gt;Is the MKL FFTW3 wrapper completely thread safe?&lt;BR /&gt;
	I suppose that it at least respects FFTW3's thread-safety rules, which means that basically everything is thread safe except plan creation.&lt;/P&gt;

&lt;P&gt;Does the MKL interface make plan creation thread safe? If we need to create a plan for each thread of KNL with a lock around plan creation, it will take ages!&lt;/P&gt;</description>
      <pubDate>Mon, 05 Mar 2018 09:05:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149085#M26942</guid>
      <dc:creator>MGRAV</dc:creator>
      <dc:date>2018-03-05T09:05:28Z</dc:date>
    </item>
    <item>
      <title>Hi Mathieu,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149086#M26943</link>
      <description>&lt;P&gt;Hi Mathieu,&lt;/P&gt;

&lt;P&gt;The answer is a little complex. Let's analyse the situation:&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;First factor&lt;/STRONG&gt;: the FFTW website lists some considerations about the &lt;A href="http://www.fftw.org/doc/Thread-safety.html"&gt;thread safety of the fftw_execute function&lt;/A&gt;:&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;All other routines (e.g. the planner) should only be called from one thread at a time. So, for example, you can wrap a semaphore lock around any calls to the planner; even more simply, you can just create all of your plans from one thread. We do not think this should be an important restriction (FFTW is designed for the situation where the only performance-sensitive code is the actual execution of the transform), and the benefits of shared data between plans are great.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;The FFTW planner is intended to be called from a single thread&lt;/STRONG&gt;. If you really must call it from multiple threads, you are expected to grab whatever lock makes sense for your application, with the understanding that you may be holding that lock for a long time, which is undesirable.&lt;/P&gt;

&lt;P&gt;In other words, as long as the FFTW planner is only called from one thread at a time, plan creation is thread safe.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Second factor&lt;/STRONG&gt;: the MKL user guide states that Intel MKL is thread safe (except for the deprecated LAPACK routine ?lacon), meaning its functions work correctly during simultaneous execution by multiple threads. In particular, any chunk of threaded Intel MKL code provides access for multiple threads to the same shared data, while permitting only one thread at any given time to access a shared piece of data. Therefore, you can call Intel MKL from multiple threads and not worry about the function instances interfering with each other.&lt;/P&gt;

&lt;P&gt;For the FFTW wrapper, we haven't changed the functionality of the planner part: if the FFTW plans are created sequentially, then there is no thread-safety issue.&lt;/P&gt;

&lt;P&gt;So the question is how you implement your multithreading. Could you please describe your FFTW usage scenario?&lt;/P&gt;

&lt;P&gt;For example, if it is MPI, then there is no thread-safety problem. You also mentioned "If we need to create a plan for each thread of KNL": how many threads compute at the same time, and how do you link MKL?&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 12 Mar 2018 01:42:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149086#M26943</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2018-03-12T01:42:00Z</dc:date>
    </item>
    <item>
      <title>Hi Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149087#M26944</link>
      <description>&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;Hi Ying,&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica; min-height: 14px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;thanks for your answer and all this information, with a special thanks to point out to me that "destroy_plan" is neither thread-safe&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica; min-height: 14px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;Currently, the multithreading is implemented with OpenMP, under some specific condition I can use fftw_execute_dft for each thread, with a single plan (when all the data have the same size). But in the general approach, each thread has his own plan where the size of the data can be different for each thread. Currently, I use an OMP critical section, so basically a lock, over 3 plans that I need to have for each thread.&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica; min-height: 14px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;The creation of each plan is linked to his memory allocation and initialization - with first touch policy-, and done with the thread that will use it later.&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica; min-height: 14px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;The number of threads depends on the size of the problem, that can be nD and data can start from 5k, up to 30M, or even more. I have many parallelization levels, bigger is the data, more I parallelize the FFT to don’t exceed MCDRAM capacity.&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;Test shows that for small 2D image like 256*256 128 threads is slightly more efficient than 64. However, the increase in numbers of plans (in critical section) destroy all benefit.&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica; min-height: 14px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;I am not sure to well understanding:&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;"For FFTW wrapper,&amp;nbsp; we haven't&amp;nbsp; changed the functionality of&amp;nbsp; FFTW wrapper planner&amp;nbsp; part,&amp;nbsp; if the FFTW plan was implemented in sequential,then there is no thread-safe issue."&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica; min-height: 14px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;I suppose that the FFTW wrapper use in background DftiCreateDescriptor, no ? I suppose that DftiCreateDescriptor is thread-safe, no ?&lt;/P&gt;</description>
      <pubDate>Mon, 12 Mar 2018 14:46:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149087#M26944</guid>
      <dc:creator>MGRAV</dc:creator>
      <dc:date>2018-03-12T14:46:32Z</dc:date>
    </item>
    <item>
      <title>Hi Mathieu,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149088#M26945</link>
      <description>&lt;P&gt;Hi Mathieu,&lt;/P&gt;

&lt;P&gt;Right, the FFTW wrapper uses DftiCreateDescriptor in the background, and that part should be thread safe.&lt;/P&gt;

&lt;P&gt;You mentioned that under some specific conditions you can use fftw_execute_dft for each thread with a single plan (when all the data have the same size), but that in the general approach each thread has its own plan and the size of the data can differ between threads. If so, why is the lock needed? It should be OK to use them in parallel.&lt;/P&gt;

&lt;P&gt;Moreover, how do you link MKL (sequential or parallel), and which OpenMP runtime do you use (for example, Intel's OpenMP implementation or another one)?&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying &amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Mar 2018 08:53:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149088#M26945</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2018-03-13T08:53:32Z</dc:date>
    </item>
    <item>
      <title>Hi Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149089#M26946</link>
      <description>&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;Hi Ying,&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica; min-height: 14px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;The execution is always thread-safe regarding FFTW documentation.&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica; min-height: 14px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;My unique thread-safety-issues is just the creation of plans.&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica; min-height: 14px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;I link with the default mkl, so parallel version, and I use parallelized FFT too.&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica; min-height: 14px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;Best,&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Helvetica;"&gt;Mathieu&lt;/P&gt;</description>
      <pubDate>Tue, 13 Mar 2018 14:27:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149089#M26946</guid>
      <dc:creator>MGRAV</dc:creator>
      <dc:date>2018-03-13T14:27:18Z</dc:date>
    </item>
    <item>
      <title>Hi Mathieu!</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149090#M26947</link>
      <description>&lt;P style="margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;Hi Mathieu!&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;&amp;gt;Is mkl fftw3 wrapper completely thread safe ?&lt;BR /&gt;
	Generally speaking: NO. But there are cases in which plan creation will work correctly.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;First of all, a pointer-to-the-plan should be defined for each thread personally. In other words, fftw_paln should be defined INSIDE a custom OMP loop, otherwise the behavior is undefined. This is a requirement.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;The only shared object during fftw plan creation in Intel(R) MKL FFTW3 wrappers is a special structure defined in fftw_version.c file. The critical variable there is nthreads - the number of threads used during plan computation. Both thread-safety and functional correctness depend on the value of this variable.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;First case: user wants to run each plan with one thread only (sequential) case. This should work. We recommend to link an application with mkl_sequential library to avoid possible side effects.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;Second case: user wants to run each plan with a constant number of several threads (OMP nested) case. This case may work under limitations. To activate several threads for a compute section in a custom OMP loop, user needs to link an application with the mkl_xxx_thread library and do a call to the "mkl_set_num_threads_local(numThreads)" function. Otherwise, the behavior should be the same as in first case. numThreads should be the same for all threads. It's desired to avoid an over-subscription that's why:&lt;BR /&gt;
	(user's # of threads for OMP loop) * (# of Intel(R) MKL threads) &amp;lt;= (# of available machine threads);&lt;BR /&gt;
	If this configuration is setup, user is also required to set environment variable named "OMP_NESTED=true". Otherwise, functional correctness is not guaranteed.&lt;/P&gt;
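&lt;P&gt;The over-subscription rule above is plain arithmetic. A minimal C sketch, with hypothetical function names, checks it and derives the largest number of MKL threads per outer OMP thread; in the second case, that value would be the numThreads passed to mkl_set_num_threads_local in every outer thread, with OMP_NESTED=true set in the environment.&lt;/P&gt;

```c
/* Returns 1 if outer_threads * mkl_threads fits within hw_threads, i.e. the
 * over-subscription rule holds, else 0. The product is computed in
 * long long so it cannot overflow an int. */
int fits_without_oversubscription(int outer_threads, int mkl_threads, int hw_threads)
{
    return (long long)hw_threads >= (long long)outer_threads * mkl_threads;
}

/* Largest per-thread MKL thread count that still satisfies the rule for a
 * given number of outer OMP threads (integer division rounds down).
 * A result of 0 means even one MKL thread per outer thread would
 * over-subscribe; 0 is also returned if outer_threads is not positive. */
int max_mkl_threads(int outer_threads, int hw_threads)
{
    if (1 > outer_threads)
        return 0;
    return hw_threads / outer_threads;
}
```

&lt;P&gt;For example, on a hypothetical machine with 256 hardware threads, 64 outer threads allow up to 4 MKL threads each, whereas 128 outer threads with 4 MKL threads each would over-subscribe.&lt;/P&gt;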

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;Third case: user wants to run each plan with different number of several threads (complicated OMP nested) case. This case doesn't work because the actual number of threads will be overwritten after each thread creates a new plan. The behavior of this configuration is undefined.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;Intel(R) MKL FFT team may provide examples that describe both cases and how to write them in one of the next releases by request.&lt;/P&gt;

&lt;P style="margin-top: 10px; margin-bottom: 0px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; font-size: 14px; background-color: rgb(245, 245, 245);"&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Apr 2018 04:00:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149090#M26947</guid>
      <dc:creator>Dmitry_Z_Intel1</dc:creator>
      <dc:date>2018-04-24T04:00:46Z</dc:date>
    </item>
    <item>
      <title>Hi Dimitry,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149091#M26948</link>
      <description>&lt;P&gt;Hi Dimitry,&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;&amp;gt;Generally speaking - NO. But there are cases when plan creation will work correctly.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;I like this type of answer !&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;&amp;gt;First of all, a pointer-to-the-plan should be defined for each thread personally. In other words, fftw_paln should be defined INSIDE a custom OMP loop, otherwise the behavior is undefined. This is a requirement.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;What does this mean exactly? That I cannot create a plan from another thread? Memory is memory, no? Or just that I need a different plan for each thread, but I can create all of them in the main thread?&lt;BR /&gt;
	Currently, I create the plans and use them in two different OMP sections, and I don't see why that would be an issue.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;&amp;gt;First case: user wants to run each plan with one thread only (sequential) case. This should work. We recommend to link an application with mkl_sequential library to avoid possible side effects.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Isn't the performance similar between mkl_sequential and mkl_parallel when using a single thread?&lt;/P&gt;

&lt;P&gt;Should I call mkl_set_num_threads_local() in each thread, or can I call it in the main thread before the OMP section and have the value apply to every thread?&lt;BR /&gt;
	Interestingly, mkl_set_num_threads() works fine for me!&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Thanks a lot for your help,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Mathieu&lt;/P&gt;</description>
      <pubDate>Mon, 30 Apr 2018 10:56:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149091#M26948</guid>
      <dc:creator>MGRAV</dc:creator>
      <dc:date>2018-04-30T10:56:01Z</dc:date>
    </item>
    <item>
      <title> </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149092#M26949</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Hi Mathieu,&lt;BR /&gt;
	&lt;BR /&gt;
	&amp;gt;What does mean exactly, I cannot create a plan with another thread ? Memory is memory, no ? Or just I need to have a different plan for each thread, but I can create all of them in the main thread.&lt;BR /&gt;
	&lt;BR /&gt;
	[Ying] Yes, it should be OK for you to create a plan from another thread. Dmitry just wants to emphasize that each thread needs its own plan.&lt;BR /&gt;
	&lt;BR /&gt;
	&amp;gt;Performance aren't similar between mkl_sequential and mkl_parallel using a single thread ?&lt;/P&gt;

&lt;P&gt;Should I use mkl_set_num_threads_local() in each thread, or can I set it in the main thread before the OMP section, and have the value affecting each thread ?&amp;nbsp;&lt;BR /&gt;
	&lt;BR /&gt;
	[Ying] The performance should be similar with 1 thread.&lt;/P&gt;

&lt;P&gt;mkl_set_num_threads_local does the trick, so it should be OK. I'm not sure what number you are setting, but please refer to the MKL developer guide:&lt;BR /&gt;
	&lt;BR /&gt;
	&lt;SPAN class="fontstyle0"&gt;&lt;B&gt;&lt;FONT size="2"&gt;CAUTION:&lt;/FONT&gt;&lt;/B&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle1"&gt;&lt;FONT face="Verdana" size="2"&gt;If your application is threaded with OpenMP* and parallelization of Intel MKL is based on nested&lt;BR /&gt;
	OpenMP parallelism, different OpenMP parallel regions reuse OpenMP threads. Therefore a thread-local&lt;BR /&gt;
	setting in one OpenMP parallel region may continue to affect not only the master thread after the&lt;BR /&gt;
	parallel region ends, but also subsequent parallel regions. To avoid performance implications of this&lt;BR /&gt;
	side effect, reset the thread-local number of threads before leaving the OpenMP parallel region (see&lt;/FONT&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle1" style="color:rgb(8,96,168);"&gt;&lt;FONT face="Verdana" size="2"&gt;Examples &lt;/FONT&gt;&lt;/SPAN&gt;&lt;SPAN class="fontstyle1"&gt;&lt;FONT face="Verdana" size="2"&gt;for how to do it).&lt;/FONT&gt;&lt;/SPAN&gt;&lt;BR style=" font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; " /&gt;
	&lt;BR /&gt;
	&lt;BR /&gt;
	&lt;SPAN class="fontstyle0"&gt;&lt;FONT face="Verdana" size="2"&gt;This example shows how to avoid the side effect of a thread-local number of threads by reverting to the&lt;BR /&gt;
	global setting:&lt;/FONT&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle2"&gt;&lt;FONT size="2"&gt;#include "omp.h"&lt;BR /&gt;
	#include "mkl.h"&lt;BR /&gt;
	…&lt;BR /&gt;
	mkl_set_num_threads(16);&lt;BR /&gt;
	my_compute_using_mkl(); // Intel MKL functions use up to 16 threads&lt;BR /&gt;
	#pragma omp parallel num_threads(2)&lt;BR /&gt;
	{&lt;BR /&gt;
	if (0 == omp_get_thread_num())&lt;/FONT&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle3"&gt;&lt;B&gt;&lt;FONT size="2"&gt;mkl_set_num_threads_local(4);&lt;/FONT&gt;&lt;/B&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle2"&gt;&lt;FONT size="2"&gt;else&lt;/FONT&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle3"&gt;&lt;B&gt;&lt;FONT size="2"&gt;mkl_set_num_threads_local(12);&lt;/FONT&gt;&lt;/B&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle2"&gt;&lt;FONT size="2"&gt;my_compute_using_mkl(); // Intel MKL functions use up to 4 threads on thread 0&lt;BR /&gt;
	// and up to 12 threads on thread 1&lt;BR /&gt;
	}&lt;BR /&gt;
	my_compute_using_mkl(); // Intel MKL functions use up to 4 threads (!)&lt;/FONT&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle3"&gt;&lt;B&gt;&lt;FONT size="2"&gt;mkl_set_num_threads_local( 0 ); // make master thread use global setting&lt;/FONT&gt;&lt;/B&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle2"&gt;&lt;FONT size="2"&gt;my_compute_using_mkl(); // Intel MKL functions use up to 16 threads&lt;/FONT&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle0"&gt;&lt;FONT face="Verdana" size="2"&gt;This example shows how to avoid the side effect of a thread-local number of threads by saving and restoring&lt;BR /&gt;
	the existing setting:&lt;/FONT&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle2"&gt;&lt;FONT size="2"&gt;#include "mkl.h"&lt;BR /&gt;
	void my_compute( int nt )&lt;BR /&gt;
	{&lt;/FONT&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle3"&gt;&lt;B&gt;&lt;FONT size="2"&gt;int save = mkl_set_num_threads_local( nt ); // save the Intel MKL number of threads&lt;/FONT&gt;&lt;/B&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle2"&gt;&lt;FONT size="2"&gt;my_compute_using_mkl(); // Intel MKL functions use up to nt threads on this thread&lt;/FONT&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle3"&gt;&lt;B&gt;&lt;FONT size="2"&gt;mkl_set_num_threads_local( save ); // restore the Intel MKL number of threads&lt;/FONT&gt;&lt;/B&gt;&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;SPAN class="fontstyle2"&gt;&lt;FONT size="2"&gt;}&lt;/FONT&gt;&lt;/SPAN&gt;&lt;BR style=" font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; " /&gt;
	​Best Regards,&lt;BR /&gt;
	​Ying&lt;BR /&gt;
	&lt;BR /&gt;
	&lt;BR /&gt;
	&lt;BR /&gt;
	&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 May 2018 08:12:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-fftw3-thread-safety/m-p/1149092#M26949</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2018-05-03T08:12:21Z</dc:date>
    </item>
  </channel>
</rss>

