<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic It looks like threading in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MultiThreading-with-MKL-library-nonlinear-least-square-solver/m-p/935014#M5013</link>
    <description></description>
    <pubDate>Sat, 08 Feb 2014 17:14:00 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2014-02-08T17:14:00Z</dc:date>
    <item>
      <title>MultiThreading with MKL library nonlinear least square solver</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MultiThreading-with-MKL-library-nonlinear-least-square-solver/m-p/935011#M5010</link>
      <description>&lt;P&gt;Hello everybody,&lt;BR /&gt;
	I am using the Intel MKL solver for the Nonlinear Least Squares Problem with Linear (Bound) Constraints:&lt;BR /&gt;
	&lt;A href="http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-B6BADF1C-F90C-4D30-8B84-CF9A5F970E08.htm#GUID-B6BADF1C-F90C-4D30-8B84-CF9A5F970E08"&gt;http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-B6BADF1C-F90C-4D30-8B84-CF9A5F970E08.htm#GUID-B6BADF1C-F90C-4D30-8B84-CF9A5F970E08&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Question: what do I need to do to run the optimizer in parallel?&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;A.&lt;/STRONG&gt; Consider the Intel example &lt;EM&gt;&lt;STRONG&gt;&lt;A href="http://software.intel.com/en-us/node/471534"&gt;ex_nlsqp_bc_c.c&lt;/A&gt;&lt;/STRONG&gt;&lt;/EM&gt;. Suppose I just call omp_set_num_threads(n) before starting the minimization loop:&lt;/P&gt;

&lt;P&gt;omp_set_num_threads(n); // No pragmas! I just want to make sure I don't have to put any pragmas in the loop.&lt;/P&gt;

&lt;P&gt;while (not_converged)&lt;/P&gt;

&lt;P&gt;{&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;dtrnlspbc_solve&lt;/EM&gt;&lt;/STRONG&gt;(OPTION); // Intel MKL minimizer&lt;/P&gt;

&lt;P&gt;if (OPTION == 1) { my_function(); } // user-supplied function&lt;/P&gt;

&lt;P&gt;else if (OPTION == 2) { &lt;EM&gt;&lt;STRONG&gt;djacobi&lt;/STRONG&gt;&lt;/EM&gt;(my_function); } // Intel MKL function (numerical Jacobian). &lt;STRONG&gt;Does it call my_function from different threads?&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;}&lt;/P&gt;

&lt;P&gt;In multithreaded mode, what is done in parallel: the Jacobian construction, or just the manipulations with the Jacobian? I hope that the calls to the user-supplied function are made with different X by multiple threads...&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;B.&lt;/STRONG&gt; To check this, I inserted omp_get_thread_num() in my function:&lt;/P&gt;

&lt;P&gt;void my_function() {&lt;/P&gt;

&lt;P&gt;int i = omp_get_thread_num();&lt;/P&gt;

&lt;P&gt;printf("%i\n", i); // &amp;lt;- &lt;STRONG&gt;It prints different thread numbers. Does that mean it is executed from different threads?&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;}&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;And thus all I need is a thread-safe function, plus setting OMP_NUM_THREADS and linking the correct libraries?&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;I wish there were better documentation on this issue.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Dec 2013 04:58:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MultiThreading-with-MKL-library-nonlinear-least-square-solver/m-p/935011#M5010</guid>
      <dc:creator>Nikolay_P_1</dc:creator>
      <dc:date>2013-12-11T04:58:16Z</dc:date>
    </item>
    <item>
      <title>MKL has two versions of a</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MultiThreading-with-MKL-library-nonlinear-least-square-solver/m-p/935012#M5011</link>
      <description>&lt;P&gt;MKL has two versions of a library. Both versions are multi-thread safe.&lt;/P&gt;

&lt;P&gt;One version creates its own OpenMP thread pool. This version (&lt;EM&gt;&lt;STRONG&gt;OpenMP multi-threaded&lt;/STRONG&gt;&lt;/EM&gt;)&amp;nbsp;is intended for use with a &lt;EM&gt;&lt;STRONG&gt;single threaded application&lt;/STRONG&gt;&lt;/EM&gt;.&lt;/P&gt;

&lt;P&gt;The second version does not create its own OpenMP thread pool. This is the version you would typically use with an OpenMP application.&lt;/P&gt;

&lt;P&gt;This may seem counterintuitive until you realize that using the OpenMP version of MKL with an OpenMP application results in omp_num_threads() * kmp_num_threads() threads. With the defaults, that is the number of logical processors squared, i.e. oversubscription.&lt;/P&gt;

&lt;P&gt;This said, there are some cases where you might want to use both with their own OpenMP pool (two pools). But in doing so you may have to use&lt;/P&gt;

&lt;P&gt;omp_set_num_threads(o);&lt;BR /&gt;
	kmp_set_num_threads(k);&lt;/P&gt;

&lt;P&gt;// o*k == number of logical processors&lt;/P&gt;

&lt;P&gt;And/or only call MKL from outside parallel regions .AND. set the environment variable KMP_BLOCKTIME=0&lt;BR /&gt;
	And/or use a first-level parallel region with a greatly reduced number of threads for both pools.&lt;/P&gt;
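
&lt;P&gt;As a rough illustration of the o*k budget above, here is a minimal sketch in plain C. The helper name inner_threads_for is hypothetical (it is not part of MKL or OpenMP); it only does the arithmetic of picking k for a given o so that o*k stays within the number of logical processors whenever that is possible.&lt;/P&gt;

```c
/* Hypothetical helper, not an MKL or OpenMP API.
   Given the number of logical processors and the number of outer
   (application) threads o, return an inner (MKL) thread count k
   such that o*k does not exceed the processor count whenever o
   itself does not exceed it. */
int inner_threads_for(int logical_cpus, int outer_threads)
{
    int k;

    /* Invalid input: fall back to a single inner thread. */
    if (0 >= logical_cpus || 0 >= outer_threads)
        return 1;

    /* Integer division rounds down, so o*k never exceeds logical_cpus
       as long as k comes out at 1 or more. */
    k = logical_cpus / outer_threads;

    /* Always return at least one thread, even when the outer level
       alone already oversubscribes the machine. */
    return k > 0 ? k : 1;
}
```

&lt;P&gt;With 16 logical processors and o = 4 application threads this gives k = 4; the result would then be passed to whichever call sets the MKL thread count in your setup.&lt;/P&gt;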

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jan 2014 23:06:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MultiThreading-with-MKL-library-nonlinear-least-square-solver/m-p/935012#M5011</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2014-01-21T23:06:16Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MultiThreading-with-MKL-library-nonlinear-least-square-solver/m-p/935013#M5012</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I have a similar question regarding Fortran.&lt;/P&gt;

&lt;P&gt;I'm linking with&lt;/P&gt;

&lt;P&gt;LIBS=-L$(CFITSIO)lib64/ -lcfitsio $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a $(MKLROOT)/lib/intel64/libmkl_core.a $(MKLROOT)/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -lpthread -lm&lt;/P&gt;

&lt;P&gt;I use OpenMP explicitly in several regions of the code, and this is working properly.&lt;/P&gt;

&lt;P&gt;(1) How do I ensure that a LAPACK call will use the available threads? For example:&lt;/P&gt;

&lt;P&gt;CALL SYEVR(COVARIANCE,EIGVAL,UPLO,Z=EIGVEC,ABSTOL=ABSTOL,INFO=INFO)&lt;/P&gt;

&lt;P&gt;(2) Is there a way to determine the MKL version that the code is linked to at run-time?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;-- Pete Schuck&lt;/P&gt;</description>
      <pubDate>Sat, 08 Feb 2014 15:11:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MultiThreading-with-MKL-library-nonlinear-least-square-solver/m-p/935013#M5012</guid>
      <dc:creator>pwschuck</dc:creator>
      <dc:date>2014-02-08T15:11:40Z</dc:date>
    </item>
    <item>
      <title>It looks like threading</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MultiThreading-with-MKL-library-nonlinear-least-square-solver/m-p/935014#M5013</link>
      <description>&lt;P&gt;It looks like threading inside syevr depends on there being significant work done by gemv et al. at a lower level, or better, if ?latrd could be parallelized to use multiple copies of gemv. &amp;nbsp;See&lt;/P&gt;

&lt;P&gt;&lt;A href="http://software.intel.com/en-us/articles/intel-mkl-threaded-functions" target="_blank"&gt;http://software.intel.com/en-us/articles/intel-mkl-threaded-functions&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;&lt;A href="http://software.intel.com/en-us/forums/topic/292428" target="_blank"&gt;http://software.intel.com/en-us/forums/topic/292428&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;(where it is suggested that threading should become useful from size 128)&lt;/P&gt;

&lt;P&gt;I don't see any clear indication that threading at a higher level than gemv has been considered. It's probably difficult on account of the varying gemv sizes.&lt;/P&gt;

&lt;P&gt;You would either call MKL threaded functions from outside parallel regions, or use OMP_NESTED and OMP_NUM_THREADS to control how many MKL threads are in use and try to increase parallelism by calling LAPACK from multiple threads. There aren't well-developed facilities for placing adjacent gemv threads on a single cache if you are trying to run multiple copies.&lt;/P&gt;

&lt;P&gt;I suppose it would be interesting to get a report on which MKL version is active, other than by checking shared object search paths, but I don't see such a thing in the docs.&lt;/P&gt;</description>
      <pubDate>Sat, 08 Feb 2014 17:14:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/MultiThreading-with-MKL-library-nonlinear-least-square-solver/m-p/935014#M5013</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2014-02-08T17:14:00Z</dc:date>
    </item>
  </channel>
</rss>

