<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Sorry, I misunderstood your in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008230#M19107</link>
    <description>&lt;P&gt;Sorry, I misunderstood your original question. Can you provide more information, such as MKL version, matrix size, etc.? Ideally, share a test case if you can.&lt;/P&gt;

&lt;P&gt;I've also taken a quick look at the MKL User's Guide. It looks like the Jacobian matrix calculation routines are not among those that have been threaded: &lt;A href="https://software.intel.com/en-us/node/528370" target="_blank"&gt;https://software.intel.com/en-us/node/528370&lt;/A&gt;.&lt;/P&gt;</description>
    <pubDate>Tue, 30 Sep 2014 18:02:07 GMT</pubDate>
    <dc:creator>Zhang_Z_Intel</dc:creator>
    <dc:date>2014-09-30T18:02:07Z</dc:date>
    <item>
      <title>djacobix only uses 4 threads on a 16 CPUs virtual machine</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008227#M19104</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;I'm using djacobix in Intel MKL. My test machine is a virtual Windows Server 2012 with 16 CPUs. I use the following statements in my code:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; mkl_set_dynamic(0);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;mkl_set_num_threads(12);&lt;/P&gt;

&lt;P&gt;But when it runs, djacobix only uses 4 threads at a time. I found the topic "Why the MKL can only call 4 threads?" (https://software.intel.com/en-us/forums/topic/288645). It mentions that "MKL uses just 1 thread per core". I set the environment variable "KMP_AFFINITY=verbose" as suggested, and it gave me the following output:&lt;/P&gt;

&lt;P&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;BR /&gt;
	OMP: Info #205: KMP_AFFINITY: cpuid leaf 11 not supported - decoding legacy APIC ids.&lt;BR /&gt;
	OMP: Info #149: KMP_AFFINITY: Affinity capable, using global cpuid info&lt;BR /&gt;
	OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #156: KMP_AFFINITY: 16 available OS procs&lt;BR /&gt;
	OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;BR /&gt;
	&lt;STRONG&gt;OMP: Info #159: KMP_AFFINITY: 16 packages x 1 cores/pkg x 1 threads/core (16 total cores)&lt;/STRONG&gt;&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 0 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 1 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 5 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 3 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 4 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 6 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 7 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 2 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 8 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 9 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 10 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;
	OMP: Info #242: KMP_AFFINITY: pid 2828 thread 11 bound to OS proc set {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;/P&gt;

&lt;P&gt;Is it possible to use 12 threads in djacobix on this machine?&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Sep 2014 22:21:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008227#M19104</guid>
      <dc:creator>Maosi_C_</dc:creator>
      <dc:date>2014-09-29T22:21:05Z</dc:date>
    </item>
    <item>
      <title>The "KMP_AFFINITY=verbose"</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008228#M19105</link>
      <description>&lt;P&gt;The "KMP_AFFINITY=verbose" output clearly shows there are 12 threads (thread 0 ~ thread 11). So what is the problem?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Sep 2014 17:25:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008228#M19105</guid>
      <dc:creator>Zhang_Z_Intel</dc:creator>
      <dc:date>2014-09-30T17:25:35Z</dc:date>
    </item>
    <item>
      <title>When it runs, it actually</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008229#M19106</link>
      <description>&lt;P&gt;When it runs, it actually only uses 4 threads.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Sep 2014 17:26:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008229#M19106</guid>
      <dc:creator>Maosi_C_</dc:creator>
      <dc:date>2014-09-30T17:26:54Z</dc:date>
    </item>
    <item>
      <title>Sorry, I misunderstood your</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008230#M19107</link>
      <description>&lt;P&gt;Sorry, I misunderstood your original question. Can you provide more information, such as MKL version, matrix size, etc.? Ideally, share a test case if you can.&lt;/P&gt;

&lt;P&gt;I've also taken a quick look at the MKL User's Guide. It looks like the Jacobian matrix calculation routines are not among those that have been threaded: &lt;A href="https://software.intel.com/en-us/node/528370" target="_blank"&gt;https://software.intel.com/en-us/node/528370&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Sep 2014 18:02:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008230#M19107</guid>
      <dc:creator>Zhang_Z_Intel</dc:creator>
      <dc:date>2014-09-30T18:02:07Z</dc:date>
    </item>
    <item>
      <title>I'm not sure where is the</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008231#M19108</link>
      <description>&lt;P&gt;I'm not sure where to check the MKL version. I found "Intel(R) Math Kernel Library 11.1" under "license" in the Intel Software Manager.&lt;/P&gt;

&lt;P&gt;Matrix size: m=1362 (dimension of function value), n=4 (number of function variables), jac_eps = 0.0075&lt;/P&gt;

&lt;P&gt;The simplified code: (All undeclared variables here are global)&lt;/P&gt;

&lt;P&gt;djacobix(DC_TR_wrapper, &amp;amp;n, &amp;amp;m, fjac, x, &amp;amp;jac_eps, NULL);&lt;/P&gt;

&lt;P&gt;void DC_TR_wrapper(MKL_INT * m, MKL_INT * n, double *x, double *f, void *DC_OPT_DataRef) {&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::vector&amp;lt;double&amp;gt; x_vec_in(*n);&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;for (int iX = 0; iX &amp;lt; *n; iX++) {&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;x_vec_in[iX] = x[iX];&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::vector&amp;lt;double&amp;gt; RetObjFuncVals = DC_thread_call(x_vec_in);&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;if (RetObjFuncVals.size() != *m) {&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::unique_lock&amp;lt;std::mutex&amp;gt; uniqLk_scrnPrint(mt_scrnPrint);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;cv_scrnPrint.wait(uniqLk_scrnPrint, []{ return g_notified_scrnPrint == true; });&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;g_notified_scrnPrint = false;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::cout &amp;lt;&amp;lt; std::this_thread::get_id() &amp;lt;&amp;lt; " RetObjFuncVals.size() != *m. (DC_TR_wrapper)" &amp;lt;&amp;lt; endl;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;g_notified_scrnPrint = true;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;cv_scrnPrint.notify_one();&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; memcpy(f, &amp;amp;RetObjFuncVals[0], *m * sizeof(double));&lt;/P&gt;

&lt;P&gt;}&lt;/P&gt;

&lt;P&gt;long get_Next_available_subWS() {&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;std::unique_lock&amp;lt;std::mutex&amp;gt; uniqLk(mt_subWS);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;cv_mt_subWS.wait(uniqLk, []{ return g_notified == true; });&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;g_notified = false;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;long cur_fst_available_subWS_idx = -1;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;for (int iSWS = 0; iSWS &amp;lt; max_concurrent_threads; iSWS++) {&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;if (subWS_statuses[iSWS] &amp;gt; 0) {&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;cur_fst_available_subWS_idx = iSWS;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;subWS_statuses[iSWS] = 0; //0: now it is in use&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;break;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;return cur_fst_available_subWS_idx;&lt;/P&gt;

&lt;P&gt;}&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;
	int remove_one_subWS(long TBR_subWS_idx) {&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;std::unique_lock&amp;lt;std::mutex&amp;gt; uniqLk(mt_subWS);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;cv_mt_subWS.wait(uniqLk, []{ return g_notified == true; });&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;g_notified = false;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;if (TBR_subWS_idx &amp;gt;= max_concurrent_threads || TBR_subWS_idx &amp;lt; 0) {&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;return -1;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;subWS_statuses[TBR_subWS_idx] = 1;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;return 1;&lt;BR /&gt;
	}&lt;/P&gt;

&lt;P&gt;std::vector&amp;lt;double&amp;gt; DC_thread_call(std::vector&amp;lt;double&amp;gt; x_vec) {&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;// get the next available subWS, if all unavailable, keep waiting.&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;std::vector&amp;lt;long&amp;gt; cpy_subWS_statuses;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;long cur_subWS_idx = -1;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;while (cur_subWS_idx &amp;lt; 0) {&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;cur_subWS_idx = get_Next_available_subWS();&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;if (cur_subWS_idx &amp;gt;= 0) { cpy_subWS_statuses = subWS_statuses; }&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; g_notified = true;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;cv_mt_subWS.notify_one();&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::string cur_subWS_name = subWS_names[cur_subWS_idx];&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // print the subWS_statuses&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;{&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::unique_lock&amp;lt;std::mutex&amp;gt; uniqLk_scrnPrint(mt_scrnPrint);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;cv_scrnPrint.wait(uniqLk_scrnPrint, []{ return g_notified_scrnPrint == true; });&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;g_notified_scrnPrint = false;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::cout &amp;lt;&amp;lt; std::this_thread::get_id() &amp;lt;&amp;lt; ": " &amp;lt;&amp;lt; "get subWS_idx: " &amp;lt;&amp;lt; cur_subWS_idx &amp;lt;&amp;lt; " " &amp;lt;&amp;lt; cur_subWS_name &amp;lt;&amp;lt; endl;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;for (int iSWS = 0; iSWS &amp;lt; max_concurrent_threads; iSWS++) {&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::string str_iSWS;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;if (iSWS != cur_subWS_idx) {&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;str_iSWS = std::to_string(cpy_subWS_statuses[iSWS]);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;else {&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;str_iSWS = '*';&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::cout &amp;lt;&amp;lt; str_iSWS &amp;lt;&amp;lt; " ";&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::cout &amp;lt;&amp;lt; endl;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;g_notified_scrnPrint = true;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;cv_scrnPrint.notify_one();&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Launch Model &amp;amp; Calc ObjFunc values&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;DC_ObjFunc * DC_OF_obj1 = new DC_ObjFunc();&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; DC_OF_obj1-&amp;gt;Set_sub_WS_name(cur_subWS_name);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; int upd_status2 = DC_OF_obj1-&amp;gt;Update_VarVals(x_vec);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; DC_OF_obj1-&amp;gt;Launch_Daycent();&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;DC_OF_obj1-&amp;gt;Collect_Comparison_Report();&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::vector&amp;lt;double&amp;gt; RetObjFuncVals = DC_OF_obj1-&amp;gt;ObjFuncVals_vec;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; delete DC_OF_obj1;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;// remove the current subWS&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;int RmStatus = remove_one_subWS(cur_subWS_idx);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;cpy_subWS_statuses = subWS_statuses;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;g_notified = true;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;cv_mt_subWS.notify_one();&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return RetObjFuncVals;&lt;/P&gt;

&lt;P&gt;}&lt;/P&gt;</description>
      <pubDate>Tue, 30 Sep 2014 19:17:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008231#M19108</guid>
      <dc:creator>Maosi_C_</dc:creator>
      <dc:date>2014-09-30T19:17:00Z</dc:date>
    </item>
    <item>
      <title>Correct me if I'm wrong, but</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008232#M19109</link>
      <description>&lt;P&gt;Correct me if I'm wrong, but it looks like the function you supplied, DC_TR_wrapper, spawns threads. If so, you should link with sequential MKL instead of parallel MKL. Parallel MKL relies on OpenMP threading, which may not be compatible with the threads spawned by DC_TR_wrapper.&lt;/P&gt;

&lt;P&gt;Can you try to make DC_TR_wrapper a single-threaded routine and try again? I'd expect there will be only one thread used. I still believe the djacobix routine in MKL is not threaded. So the 4 threads you saw before may actually come from DC_TR_wrapper.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Sep 2014 23:27:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008232#M19109</guid>
      <dc:creator>Zhang_Z_Intel</dc:creator>
      <dc:date>2014-09-30T23:27:12Z</dc:date>
    </item>
    <item>
      <title>DC_TR_wrapper does not spawn</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008233#M19110</link>
      <description>&lt;P&gt;DC_TR_wrapper does not spawn threads; it is just a wrapper around DC_thread_call for djacobix. I have another wrapper around DC_thread_call, called "DC_thread_call_wrapper2", which I use to spawn threads myself. With it I can use all 12 threads via the following statements:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; std::vector&amp;lt;std::thread&amp;gt; DC_Stage1_threads;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (int iThr = 0; iThr &amp;lt; 12; iThr++) {&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; DC_Stage1_threads.push_back(std::thread(DC_thread_call_wrapper2, std::ref(x_OF_iT)));&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/P&gt;

&lt;P&gt;x_OF_iT is a vector iterator that is used to pass some data.&lt;/P&gt;

&lt;P&gt;But I cannot control whether djacobix uses threads and how many threads it actually uses.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Oct 2014 01:08:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/djacobix-only-uses-4-threads-on-a-16-CPUs-virtual-machine/m-p/1008233#M19110</guid>
      <dc:creator>Maosi_C_</dc:creator>
      <dc:date>2014-10-01T01:08:35Z</dc:date>
    </item>
  </channel>
</rss>

