<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Richie, in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-library-and-thread-assumptions/m-p/976743#M17125</link>
    <description>Hi Richie,

DftiCompute functions need thread-local read-write memory. For performance reasons, that memory has been associated with the DFTI descriptor. The DFTI_NUMBER_OF_USER_THREADS parameter was provided to duplicate the memory for each calling thread, so that the descriptor can be shared by the calling threads. This behavior will be changed in a future release, and you will no longer need to specify the number of calling threads.

I also wonder which version of MKL you use and whether your application checks the status returned by the DftiCompute functions. They may return an error if an (N+1)th thread uses a descriptor committed with the configuration parameter DFTI_NUMBER_OF_USER_THREADS set to N. The default value of that parameter is one.

Thanks
Dima</description>
    <pubDate>Thu, 20 Dec 2012 06:53:21 GMT</pubDate>
    <dc:creator>Dmitry_B_Intel</dc:creator>
    <dc:date>2012-12-20T06:53:21Z</dc:date>
    <item>
      <title>MKL FFT library and thread assumptions</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-library-and-thread-assumptions/m-p/976742#M17124</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;We are curious if we are using DFTI_NUMBER_OF_USER_THREADS correctly.&lt;/P&gt;
&lt;P&gt;We use the MKL FFT library in our application: the application is thread rich, but we don't use OpenMP. &amp;nbsp;We simply create all the POSIX (system level) threads ourselves. &amp;nbsp;Among all these threads, we want to share the MKL DFTI descriptors. &amp;nbsp;The descriptor, if we assume the model of FFTW or any other FFT library, typically computes a twiddle table based on the length of the FFT. &amp;nbsp;This knowledge is "encapsulated" inside the descriptor.&lt;/P&gt;
&lt;P&gt;Our hope is that by sharing descriptors among threads we will reduce memory size (i.e., share the twiddle tables). &amp;nbsp;We seemed to be using MKL successfully until recently. At one point, we do a very large FFT (16Meg) and everything fails. &amp;nbsp;We believe (after some looking in the forums here) that setting DFTI_NUMBER_OF_USER_THREADS to some reasonable value like 16 (it was 1 before) is the right thing to do, but we're not sure. &amp;nbsp;Setting it to 16 seems to fix the problem, but we wanted to verify: given the scenario described above (we create our own threads and want to share the descriptors among several unrelated threads), is this correct?&lt;/P&gt;
&lt;P&gt;Now, our applications tend to be very FFT heavy: some threads on the front-end use an FFT, the main processing uses threads in a work-crew/map-reduce paradigm, and the back-end processing uses FFTs. &amp;nbsp;In other words, all sorts of threads from all over the application can be sharing the descriptors, and there is no known "a priori" limit. &amp;nbsp;We don't have any insight into how setting DFTI_NUMBER_OF_USER_THREADS to 16 allows multiple threads to reuse the descriptor (in FFTW, there's no notion of this). &amp;nbsp;Does each thread "register" with the descriptor? &amp;nbsp;Is there thread-local data with the descriptor? &amp;nbsp;Once a thread has used the descriptor, can only that thread reuse it in that way? Or can I keep re-using the descriptor in multiple threads? (I.e., after setting DFTI_NUMBER_OF_USER_THREADS to 16, can some 16 threads use it, then another 16 threads, then a different 16 threads, or do the same threads have to reuse it?)&lt;/P&gt;
&lt;P&gt;If anyone knows how DFTI_NUMBER_OF_USER_THREADS works with the descriptor, it would be very helpful. &amp;nbsp;We think this fixes our problem, but we'd like to know if we have the right solution: once a thread has used a "sharing" slot, can no other thread use it?&lt;/P&gt;
&lt;P&gt;Thanks in advance. &amp;nbsp;I am happy to supply some code showing how we use it. &amp;nbsp;I also want to thank the Intel Forums for helping us find the DFTI_NUMBER_OF_USER_THREADS in the first place!&lt;/P&gt;
&lt;P&gt;&amp;nbsp; Gooday,&lt;/P&gt;
&lt;P&gt;&amp;nbsp; Richie&lt;/P&gt;</description>
      <pubDate>Wed, 19 Dec 2012 19:40:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-library-and-thread-assumptions/m-p/976742#M17124</guid>
      <dc:creator>Richard_S_3</dc:creator>
      <dc:date>2012-12-19T19:40:05Z</dc:date>
    </item>
    <item>
      <title>Hi Richie,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-library-and-thread-assumptions/m-p/976743#M17125</link>
      <description>Hi Richie,

DftiCompute functions need thread-local read-write memory. For performance reasons, that memory has been associated with the DFTI descriptor. The DFTI_NUMBER_OF_USER_THREADS parameter was provided to duplicate the memory for each calling thread, so that the descriptor can be shared by the calling threads. This behavior will be changed in a future release, and you will no longer need to specify the number of calling threads.

I also wonder which version of MKL you use and whether your application checks the status returned by the DftiCompute functions. They may return an error if an (N+1)th thread uses a descriptor committed with the configuration parameter DFTI_NUMBER_OF_USER_THREADS set to N. The default value of that parameter is one.
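
<![CDATA[For illustration, here is a minimal sketch of the two points above: setting DFTI_NUMBER_OF_USER_THREADS before committing a descriptor that several caller-created threads will share, and checking the status of every DFTI call, including DftiComputeForward. The helper names (dfti_check, create_shared_descriptor, checked_forward) and the HAVE_MKL_DFTI guard are made up for this example, and the 1-D single-precision complex in-place transform is an assumption, not taken from Richie's application.

```c
#include <assert.h>
#include <stdio.h>

/* Build the MKL-specific part only when the DFTI header can be found. */
#if defined(__has_include)
#  if __has_include(<mkl_dfti.h>)
#    include <mkl_dfti.h>
#    define HAVE_MKL_DFTI 1
#  endif
#endif

#ifdef HAVE_MKL_DFTI
/* Hypothetical helper: report a failed DFTI call; returns 0 on success. */
static int dfti_check(MKL_LONG status, const char *what)
{
    if (status == DFTI_NO_ERROR)
        return 0;
    fprintf(stderr, "%s failed: %s\n", what, DftiErrorMessage(status));
    return -1;
}

/* Create and commit a 1-D single-precision complex descriptor (assumed
 * transform type) that up to `user_threads` calling threads may share. */
static int create_shared_descriptor(DFTI_DESCRIPTOR_HANDLE *out,
                                    MKL_LONG length, MKL_LONG user_threads)
{
    MKL_LONG status;

    assert(out != NULL);
    status = DftiCreateDescriptor(out, DFTI_SINGLE, DFTI_COMPLEX, 1, length);
    if (dfti_check(status, "DftiCreateDescriptor") != 0)
        return -1;

    /* Must be set before the descriptor is committed. */
    status = DftiSetValue(*out, DFTI_NUMBER_OF_USER_THREADS, user_threads);
    if (dfti_check(status, "DftiSetValue") == 0) {
        status = DftiCommitDescriptor(*out);
        if (dfti_check(status, "DftiCommitDescriptor") == 0)
            return 0;
    }
    DftiFreeDescriptor(out);
    return -1;
}

/* Meant to be called from each worker thread on that thread's own buffer:
 * in-place forward FFT with the returned status checked. */
static int checked_forward(DFTI_DESCRIPTOR_HANDLE shared, void *data)
{
    return dfti_check(DftiComputeForward(shared, data), "DftiComputeForward");
}
#endif /* HAVE_MKL_DFTI */
```

Each POSIX thread would call checked_forward(shared, its_own_buffer) and treat a non-zero result as a sign that the descriptor may be in use by more threads than it was committed for. The sketch does not settle whether a later thread can take over the memory of a thread that has finished.
]]>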

Thanks
Dima</description>
      <pubDate>Thu, 20 Dec 2012 06:53:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-library-and-thread-assumptions/m-p/976743#M17125</guid>
      <dc:creator>Dmitry_B_Intel</dc:creator>
      <dc:date>2012-12-20T06:53:21Z</dc:date>
    </item>
    <item>
      <title>Hi Dima,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-library-and-thread-assumptions/m-p/976744#M17126</link>
      <description>Hi Dima,

Thanks for the reply.

Re: checking error status:  Before we had the 16M error (described above), we had error checking in most places (not all).  I hadn't had error checking for the compute or mkl_malloc.  I went back and made sure I checked the return status of ALL of my MKL calls: I never did see an error. It's possible I missed checking the status of a call, but I don't think so.  I never did see MKL tell me I had too many threads connected to the descriptor.

Re: version: We are using the MKL that comes bundled with the Intel 12 compiler (the version according to my build paths is composer_xe_2011_sp1.8.273, so I think that means Intel 12.273?  'icc --version' returns 12.1.2 20111128).  I am sorry, I don't know how to separate the MKL version from the Intel tools/compiler suite bundle.  All the MKL stuff is under the composer_xe_2011_sp1.8.273 directory above.

I figured that thread-local storage was used.  Do you know whether, after one thread has "finished" with its FFT, a different thread can go in and "reuse" the thread-local storage?

Thanks again for the quick reply.  It's good to know we won't necessarily have to worry about this in a future release: is there a particular macro and/or version macro we can check for this?

  Gooday,

  Richie</description>
      <pubDate>Thu, 20 Dec 2012 16:00:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-library-and-thread-assumptions/m-p/976744#M17126</guid>
      <dc:creator>Richard_S_3</dc:creator>
      <dc:date>2012-12-20T16:00:53Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-library-and-thread-assumptions/m-p/976745#M17127</link>
      <description>Hi,

As to MKL version:
    Please look at and run the example in $MKLROOT/examples/versionqueryc for C/C++ and $MKLROOT/examples/versionqueryf for Fortran.

As to the macro to check:
    Please look at $MKLROOT/include/mkl.h for C/C++ and $MKLROOT/include/mkl.fi for Fortran,
    where the __INTEL_MKL_* macros are defined.
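
<![CDATA[As a rough sketch of how such a version check might be used, the helper below packs a (major, minor, update) triple with the same arithmetic as the INTEL_MKL_VERSION definition quoted further down in this post, so that versions compare as plain integers. Both function names are hypothetical. The thread does not say which release removes the need for DFTI_NUMBER_OF_USER_THREADS, so that threshold is left as a parameter for the caller to supply.

```c
#include <assert.h>

/* Pack a version triple as major * 10000 + minor * 100 + update,
 * mirroring the INTEL_MKL_VERSION macro from mkl.h. */
static int mkl_version_code(int major, int minor, int update)
{
    /* The packing is only unambiguous while minor and update stay below 100. */
    assert(minor >= 0 && minor < 100);
    assert(update >= 0 && update < 100);
    return major * 10000 + minor * 100 + update;
}

/* Hypothetical helper: returns 1 when `version_code` is older than
 * `first_fixed_version`, meaning DFTI_NUMBER_OF_USER_THREADS should still
 * be set. The threshold is an assumption supplied by the caller. */
static int needs_user_thread_count(int version_code, int first_fixed_version)
{
    return version_code < first_fixed_version;
}
```

With mkl.h included, needs_user_thread_count(INTEL_MKL_VERSION, threshold) would give the same answer at run time, and the equivalent compile-time test is an #if on INTEL_MKL_VERSION against the same packed number.
]]>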

E.g. for MKL 11.0.1:

#define __INTEL_MKL_BUILD_DATE 20121009

#define __INTEL_MKL__ 11
#define __INTEL_MKL_MINOR__ 0
#define __INTEL_MKL_UPDATE__ 1

#define INTEL_MKL_VERSION (__INTEL_MKL__ * 10000 + \
        __INTEL_MKL_MINOR__ * 100 + __INTEL_MKL_UPDATE__)</description>
      <pubDate>Mon, 24 Dec 2012 07:32:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-library-and-thread-assumptions/m-p/976745#M17127</guid>
      <dc:creator>barragan_villanueva_</dc:creator>
      <dc:date>2012-12-24T07:32:06Z</dc:date>
    </item>
  </channel>
</rss>

