<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Concurrency Issues with DSS in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121403#M24976</link>
    <description>&lt;P&gt;I have a very large program that does numerical modeling. Up until recently I've been debugging, and everything has been fine. I'm now switching over to optimizing the areas that need improvement. In my code I need to solve many linear algebra problems (50k+ cases), each of which takes about a tenth of a second in sequential mode. Each case is completely independent of the others; no variables are shared. So it seemed to make sense to put the entire thing in an OpenMP for loop to run my 50k+ cases and keep the cores fed that way. That's when I started noticing small errors in the data, such as 4.600009 vs 4.600011. Then I started seeing cases where the solver didn't converge at all (all of my cases converge without OpenMP).&lt;/P&gt;

</description>
    <pubDate>Mon, 09 May 2016 13:52:54 GMT</pubDate>
    <dc:creator>joseph_v_1</dc:creator>
    <dc:date>2016-05-09T13:52:54Z</dc:date>
    <item>
      <title>Concurrency Issues with DSS</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121403#M24976</link>
      <description>&lt;P&gt;I have a very large program that does numerical modeling. Up until recently I've been debugging, and everything has been fine. I'm now switching over to optimizing the areas that need improvement. In my code I need to solve many linear algebra problems (50k+ cases), each of which takes about a tenth of a second in sequential mode. Each case is completely independent of the others; no variables are shared. So it seemed to make sense to put the entire thing in an OpenMP for loop to run my 50k+ cases and keep the cores fed that way. That's when I started noticing small errors in the data, such as 4.600009 vs 4.600011. Then I started seeing cases where the solver didn't converge at all (all of my cases converge without OpenMP).&lt;/P&gt;

&lt;P&gt;What clued me in that DSS might be involved was running a very large set of cases (~1M) under a profiler. After about 100k cases or so, the concurrency drops to 1 core (of 16), and the profiler shows all of the cores waiting on dss_reorder to complete.&lt;/P&gt;

&lt;P&gt;I cannot post all of the code (for proprietary and legal reasons), but here's the bulk of the function that runs the DSS code. It's fairly straightforward DSS code. Each OpenMP loop iteration gets its own DSS handle.&lt;/P&gt;

&lt;P&gt;SOLVER_SETUP&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;MKL_INT Error;&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;_MKL_DSS_HANDLE_t Handle;&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;MKL_INT opt = MKL_DSS_DEFAULTS | MKL_DSS_MSG_LVL_WARNING | MKL_DSS_TERM_LVL_ERROR | MKL_DSS_ZERO_BASED_INDEXING;&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;MKL_INT Sym = MKL_DSS_NON_SYMMETRIC;&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;MKL_INT Typ = MKL_DSS_INDEFINITE;&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;const MKL_INT RowCount = NodeCount;&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;const MKL_INT ColCount = NodeCount;&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;const MKL_INT NonZeros = mMatrixA.NonZeros.size();&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;const MKL_INT One = 1;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Symbolic phase: create the handle, describe the sparsity pattern, reorder.&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Error = dss_create(Handle, opt);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;CME_Assert(Error != MKL_DSS_SUCCESS);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Error = dss_define_structure(Handle, Sym, mMatrixA.RowStart, RowCount, ColCount, mMatrixA.Columns, NonZeros);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;CME_Assert(Error != MKL_DSS_SUCCESS);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Error = dss_reorder(Handle, opt, 0);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;CME_Assert(Error != MKL_DSS_SUCCESS);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Numeric phase, repeated inside the nonlinear iteration: factor and solve.&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;SOLVER_LOOP_INIT&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Error = dss_factor_real(Handle, Typ, mMatrixA.Values);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;CME_Assert(Error != MKL_DSS_SUCCESS);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Error = dss_solve_real(Handle, opt, mVectorB, One, mCurrentValue);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;CME_Assert(Error != MKL_DSS_SUCCESS);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;SOLVER_LOOP_END&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Error = dss_delete(Handle, opt);&lt;/P&gt;

&lt;P&gt;The SOLVER_* names are macros.&lt;/P&gt;

&lt;P&gt;The solver loop shown here is not the OpenMP loop discussed above. It is there because the equations are highly nonlinear and the matrix mMatrixA depends on mCurrentValue.&lt;/P&gt;

&lt;P&gt;None of the asserts indicate any issue with the solver at any time.&lt;/P&gt;

&lt;P&gt;I'm currently running MKL 11.3 Update 2 with Visual Studio 2013. I've tried this on 3 different machines, all with the same result.&lt;/P&gt;

&lt;P&gt;From what I've seen, it seems like dss_reorder has concurrency issues, but the documentation says otherwise. &lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;I have tried both the sequential and parallel versions of MKL, with the same results.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;I am hoping someone has seen this before and knows of a workaround (although I did not see this issue on any forums).&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Thanks for any help&lt;/P&gt;

</description>
      <pubDate>Mon, 09 May 2016 13:52:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121403#M24976</guid>
      <dc:creator>joseph_v_1</dc:creator>
      <dc:date>2016-05-09T13:52:54Z</dc:date>
    </item>
    <item>
      <title>Hi Joseph,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121404#M24977</link>
      <description>&lt;P&gt;Hi Joseph,&lt;/P&gt;

&lt;P&gt;Right, if each case is completely independent, it is reasonable to use an OpenMP for loop to run the 50k+ cases.&lt;/P&gt;

&lt;P&gt;How do you link MKL into your code? If you try mkl:sequential (or, following &lt;A href="https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/"&gt;https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/&lt;/A&gt;, link mkl_intel_lp64.lib mkl_core.lib mkl_sequential.lib), what is the result?&lt;/P&gt;

&lt;P&gt;MKL 11.3.3 was released recently. It fixes one PARDISO issue (DSS is another interface to PARDISO), so you may want to try it:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/intel-mkl-113-bug-fixes-list"&gt;https://software.intel.com/en-us/articles/intel-mkl-113-bug-fixes-list&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;You may also create a new issue at premier.intel.com (under Intel MKL for Windows), where all private code is protected.&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&lt;/P&gt;

&lt;P&gt;MKL 11.3.3&amp;nbsp;can be downloaded&amp;nbsp;from Registration Center &lt;A href="https://registrationcenter.intel.com/en/products/"&gt;https://registrationcenter.intel.com/en/products/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2016 07:16:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121404#M24977</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2016-05-10T07:16:00Z</dc:date>
    </item>
    <item>
      <title>Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121405#M24978</link>
      <description>&lt;P&gt;Ying,&lt;/P&gt;

&lt;P&gt;They are completely independent: each case generates a completely new instance of everything and writes its results to a file. I have compiled and linked using the sequential MKL libraries. I see a nearly linear speedup in my code, which is fantastic, but &lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;I get those odd errors in the results and the ridiculous concurrency issue in the dss_reorder function.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;My code cannot be posted on the Premier site either; a significant portion of it is ITAR controlled.&lt;/P&gt;

&lt;P&gt;I will try 11.3.3, but going by the descriptions in the bug-fix list, nothing seems applicable: the PARDISO bugs listed seem quite fatal, whereas I'm not getting any crashes or error messages at all.&lt;/P&gt;

&lt;P&gt;When I get a chance I will be looking for the root of the problem by paring down the code to see if I can completely contain the bug inside of non-ITAR controlled code and be able to post something more.&lt;/P&gt;

&lt;P&gt;I was just hoping that the issue was seen by someone before.&lt;/P&gt;

&lt;P&gt;As a side note the code runs completely fine when I run 1 case even when compiling/linking the MKL for parallel use. &amp;nbsp;Unfortunately, this doesn't help a whole lot as the cases are so small they can't keep the processors fed and I only see about a 1.5x speedup.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;thanks,&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2016 13:08:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121405#M24978</guid>
      <dc:creator>joseph_v_1</dc:creator>
      <dc:date>2016-05-10T13:08:50Z</dc:date>
    </item>
    <item>
      <title>As suggested I've tried</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121406#M24979</link>
      <description>&lt;P&gt;As suggested, I've tried updating MKL to 11.3.3. This did not change any of the problems I've been experiencing.&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2016 19:53:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121406#M24979</guid>
      <dc:creator>joseph_v_1</dc:creator>
      <dc:date>2016-05-10T19:53:30Z</dc:date>
    </item>
    <item>
      <title>Hi Joseph, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121407#M24980</link>
      <description>&lt;P&gt;Hi Joseph,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Thanks for your test. &amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px; line-height: 19.512px;"&gt;Regarding the odd errors in the results: yes, this is an unknown issue.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;But as you know, a tiny change in a value can cause the solver to fail to converge, for example when the sparse matrix has a bad condition number. It is also possible for the sequential and parallel methods to produce numerically different results.&lt;/SPAN&gt;&lt;/P&gt;
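<![CDATA[
<P>To illustrate the last point: floating-point addition is not associative, so a code path that groups the same terms differently can return a slightly different value. A minimal C++ sketch follows; the function names are illustrative only and are not part of MKL.</P>

```cpp
#include <cmath>

// Sum three terms left to right, as a sequential loop would.
double sum_left_to_right(double a, double b, double c) {
    return (a + b) + c;
}

// Sum the same terms with a different grouping, as a parallel reduction might.
double sum_regrouped(double a, double b, double c) {
    return a + (b + c);
}

// True when two results agree to within a relative tolerance.
bool nearly_equal(double x, double y, double rel_tol) {
    return std::fabs(x - y) <= rel_tol * std::fmax(std::fabs(x), std::fabs(y));
}
```

<P>With IEEE 754 doubles, sum_left_to_right(0.1, 0.2, 0.3) and sum_regrouped(0.1, 0.2, 0.3) are not bit-identical, although nearly_equal reports them equal at a tolerance of 1e-15. How far such a last-bit difference grows in a real solve depends on the conditioning of the problem.</P>
]]>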

&lt;P&gt;&lt;SPAN style="line-height: 19.512px;"&gt;As there are many clues in your description, let's focus on how we can reproduce the test case.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;1. About the test case:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;1.1) How is your code structured, and how was OpenMP added? Like this?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;#pragma omp parallel for&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;for (i = 0; i &amp;lt; 50000; i++)&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;SOLVER_SETUP();&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;1.2) You mentioned that the code runs completely fine when you run 1 case, even when compiling/linking MKL for parallel use.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;So do you mean that&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;for (i = 0; i &amp;lt; 1; i++)&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;SOLVER_SETUP();&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;runs OK with both sequential and parallel MKL?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.008px; line-height: 19.512px;"&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;1.3) Do you know which data set produces a different result?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;As you may know, there is a DSS sample in the MKL install directory. If the problem has been narrowed down to DSS, you could modify the DSS sample, add OpenMP to it, and then feed it one case to see whether you can find the root cause.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;2) About the test environment (MKL 11.3 Update 2, Visual Studio 2013):&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Is it 32-bit or 64-bit code? Are you using the Intel Compiler or the Microsoft C/C++ Compiler? How do you link the OpenMP run-time library? Is it from Intel or from Microsoft?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;3) Regarding profiling at run time:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;3.1) If you are using Intel OpenMP, maybe you can try&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;&amp;gt; set KMP_AFFINITY=verbose&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;&amp;gt; your application executable&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;and let us know the output.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;3.2) Intel also provides a threading error checking tool in the Parallel Studio XE suite. If possible, please try it.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;So the key is the test case.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Best Regards,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Ying&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 May 2016 05:34:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121407#M24980</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2016-05-11T05:34:00Z</dc:date>
    </item>
    <item>
      <title>1.  about the test case,  </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121408#M24981</link>
      <description>&lt;BLOCKQUOTE&gt;
	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;1. About the test case:&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;1.1) How is your code structured, and how was OpenMP added? Like this?&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;#pragma omp parallel for&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;for (i = 0; i &amp;lt; 50000; i++)&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;SOLVER_SETUP();&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;Yes, essentially this is how it's done.&lt;/P&gt;
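<![CDATA[
<P>A minimal C++ sketch of that loop shape, under stated assumptions: run_case is a hypothetical stand-in for SOLVER_SETUP and makes no MKL calls. It only shows the structure in which every case builds its own working data and writes to its own output slot, so iterations share no mutable state.</P>

```cpp
#include <vector>

// Hypothetical stand-in for one independent case (SOLVER_SETUP in the thread).
// All working data is created locally inside the call.
double run_case(int case_index) {
    std::vector<double> local_state(8, 1.0);  // per-case working data
    double result = 0.0;
    for (double v : local_state) {
        result += v * (case_index + 1);
    }
    return result;
}

// Run all cases; each iteration writes only to its own slot of results.
std::vector<double> run_all_cases(int case_count) {
    std::vector<double> results(case_count);
    #pragma omp parallel for
    for (int i = 0; i < case_count; ++i) {
        results[i] = run_case(i);
    }
    return results;
}
```

<P>Built without OpenMP support, the pragma is ignored and the loop runs serially, which gives a reference to compare the parallel build against.</P>
]]>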

&lt;BLOCKQUOTE&gt;
	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;1.2) You mentioned that the code runs completely fine when you run 1 case, even when compiling/linking MKL for parallel use.&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;So do you mean that&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;for (i = 0; i &amp;lt; 1; i++)&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;SOLVER_SETUP();&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;runs OK with both sequential and parallel MKL?&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;Yes, as long as I don't use OpenMP to run this loop in parallel, it runs just fine.&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;1.3) Do you know which data set produces a different result?&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;As you may know, there is a DSS sample in the MKL install directory. If the problem has been narrowed down to DSS, you could modify the DSS sample, add OpenMP to it, and then feed it one case to see whether you can find the root cause.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;All of my matrices cause the problem. &lt;SPAN style="line-height: 1.5;"&gt;This is more or less what I was planning on doing, but my matrices are not very small (10k-20k rows and columns; I'm unsure of the number of non-zero entries), and it will take some doing to extract one and make the problem reproducible.&lt;/SPAN&gt;&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;2) About the test environment (MKL 11.3 Update 2, Visual Studio 2013):&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;Is it 32-bit or 64-bit code? Are you using the Intel Compiler or the Microsoft C/C++ Compiler? How do you link the OpenMP run-time library? Is it from Intel or from Microsoft?&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;I'm now running MKL 11.3 Update 3. It's 64-bit code built with the MS C/C++ compiler, and the OpenMP run-time library is the MS one.&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;SPAN style="line-height: 1.5;"&gt;3.1) If you are using Intel OpenMP, maybe you can try&lt;/SPAN&gt;&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&amp;gt; set KMP_AFFINITY=verbose&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&amp;gt; your application executable&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;and let us know the output.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;I'm not using Intel OpenMP. I will attempt to get Intel OpenMP to work (not sure if I will be successful, as I only have MKL).&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;3.2) Intel also provides a threading error checking tool in the Parallel Studio XE suite. If possible, please try it.&lt;/P&gt;

	&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;So the key is the test case.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;I will do so when I get a chance. (I have to find someone with the C++ Parallel Studio; I have the Fortran variety only.)&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;As far as the errors go: if I pass the exact same matrix in 50k times, I get different results. They are mostly the same, but not always.&lt;/P&gt;
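<![CDATA[
<P>A minimal sketch, under stated assumptions, of a harness for this kind of repeatability check. SolveFn, same_bits and count_mismatches are hypothetical names; in practice the callable would wrap the full DSS sequence on the fixed matrix and vector, which is not reproduced here.</P>

```cpp
#include <cstring>
#include <vector>

// Hypothetical: one complete solve on a fixed input, returning the solution.
using SolveFn = std::vector<double> (*)();

// Bit-for-bit comparison, so a NaN or a last-digit change counts as a mismatch.
bool same_bits(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.size() != b.size()) return false;
    if (a.empty()) return true;
    return std::memcmp(a.data(), b.data(), a.size() * sizeof(double)) == 0;
}

// Solve once serially as the reference, then repeat the solve inside an
// OpenMP loop and count how many runs differ from the reference.
int count_mismatches(SolveFn solve, int runs) {
    const std::vector<double> reference = solve();
    int mismatches = 0;
    #pragma omp parallel for reduction(+ : mismatches)
    for (int i = 0; i < runs; ++i) {
        if (!same_bits(solve(), reference)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

<P>For a solve that is deterministic and thread-safe, the count should be zero however many threads run the loop; a non-zero count on identical input points at shared state somewhere in the call chain.</P>
]]>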

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;Here is the value of one of the points of interest (identical input matrix and vector), when run with openmp on the loop.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;First notice the results are usually correct at 4.600011 and occasionally it produces very close, but wrong results (e.g. 4.59998). &amp;nbsp;Much more concerning is the&amp;nbsp;indeterminate value towards the end, which was produced by a diverging solution.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;The problem seems to be a race condition between threads. I've noticed that it comes up much less often on a machine with fewer cores (my laptop, for example, has 2 cores, whereas one test machine has 4 cores and another has 16). The severe concurrency issue does not arise on my laptop (the errors in the results still occur, though). The concurrency issue seems to occur only on the machine with 16 cores.&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;TABLE border="0" cellpadding="0" cellspacing="0" style="border-collapse:
 collapse;width:48pt" width="64"&gt;
	&lt;COLGROUP&gt;
		&lt;COL style="width:48pt" width="64" /&gt;&lt;/COLGROUP&gt;
	&lt;TBODY&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt;width:48pt" width="64"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.59998&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD height="20" style="height:15.0pt"&gt;…&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD height="20" style="height:15.0pt"&gt;-1.#IND00&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR height="20" style="height:15.0pt"&gt;
			&lt;TD align="right" height="20" style="height:15.0pt"&gt;4.600011&lt;/TD&gt;
		&lt;/TR&gt;
	&lt;/TBODY&gt;
&lt;/TABLE&gt;

</description>
      <pubDate>Wed, 11 May 2016 12:56:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121408#M24981</guid>
      <dc:creator>joseph_v_1</dc:creator>
      <dc:date>2016-05-11T12:56:45Z</dc:date>
    </item>
    <item>
      <title>I was able to link to the</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121409#M24982</link>
      <description>&lt;P&gt;I was able to link against the Intel OpenMP library and ran the test as you suggested.&lt;/P&gt;

&lt;P&gt;These are the results. I had to type them in because I cannot copy from the command window (so any typos are mine).&lt;/P&gt;

&lt;P&gt;The output doesn't seem particularly useful, other than indicating that everything seems fine. I also tried changing the number of threads to 2, and this does not change the issue.&lt;/P&gt;

&lt;P&gt;OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.&lt;/P&gt;

&lt;P&gt;OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info&lt;/P&gt;

&lt;P&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: (0,1,2,3)&lt;/P&gt;

&lt;P&gt;OMP: Info #156: KMP_AFFINITY: 4 available OS procs&lt;/P&gt;

&lt;P&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;/P&gt;

&lt;P&gt;OMP: Info #179: KMP_AFFINITY: 1 package x 2 cores/pkg x 2 threads/core (2 total cores)&lt;/P&gt;

&lt;P&gt;OMP: Info #242: KMP_AFFINITY: pid 8980 thread 0 bound to OS proc set (0,1,2,3)&lt;/P&gt;

&lt;P&gt;OMP: Info #242: KMP_AFFINITY: pid 8980 thread 2 bound to OS proc set (0,1,2,3)&lt;/P&gt;

&lt;P&gt;OMP: Info #242: KMP_AFFINITY: pid 8980 thread 1 bound to OS proc set (0,1,2,3)&lt;/P&gt;

&lt;P&gt;OMP: Info #242: KMP_AFFINITY: pid 8980 thread 3 bound to OS proc set (0,1,2,3)&lt;/P&gt;</description>
      <pubDate>Wed, 11 May 2016 14:23:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121409#M24982</guid>
      <dc:creator>joseph_v_1</dc:creator>
      <dc:date>2016-05-11T14:23:55Z</dc:date>
    </item>
    <item>
      <title>Hi Joseph, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121410#M24983</link>
      <description>&lt;P&gt;Hi Joseph,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Thanks for the detailed reply. Besides the test case, there seem to be two issues:&lt;/P&gt;

&lt;P&gt;1. Race condition.&lt;/P&gt;

&lt;P&gt;If your team has Intel Parallel Studio XE Professional Edition, it includes Intel Inspector (main page: &lt;A href="https://signin.intel.com/logout?target=https://software.intel.com/en-us/intel-parallel-studio-xe" target="_blank"&gt;https://signin.intel.com/logout?target=https://software.intel.com/en-us/intel-parallel-studio-xe&lt;/A&gt;; a trial version is available). It should be able to run your program, whether it is written in Fortran or C, and then locate the race condition.&lt;/P&gt;

&lt;P&gt;2. Variable results ("If I pass the exact same matrix in 50k times I get different results").&lt;/P&gt;

&lt;P&gt;Apart from the solver's own behavior, as you know, the results of any floating-point computation on a computer may vary from run to run.&lt;/P&gt;
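
&lt;P&gt;As a minimal illustration of the mechanism (a Python sketch, not taken from MKL): floating-point addition is not associative, so the same numbers summed in a different order, as can happen when work is split across threads, may round differently. The values are arbitrary, and this does not establish that this is what happens inside DSS.&lt;/P&gt;

&lt;PRE&gt;
```python
# Arbitrary illustrative values; not data from this thread.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # add a and b first
right_to_left = a + (b + c)   # add b and c first

# The two groupings are mathematically equal but round differently.
print(repr(left_to_right))
print(repr(right_to_left))
print(left_to_right == right_to_left)
```
&lt;/PRE&gt;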

&lt;P&gt;Here are some articles:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/getting-reproducible-results-with-intel-mkl" target="_blank"&gt;https://software.intel.com/en-us/articles/getting-reproducible-results-with-intel-mkl&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/introduction-to-the-conditional-numerical-reproducibility-cnr" target="_blank"&gt;https://software.intel.com/en-us/articles/introduction-to-the-conditional-numerical-reproducibility-cnr&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Before you create a test case, I think a few things are worth trying, such as:&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;memory alignment&lt;/STRONG&gt;, setting the environment variable &lt;SPAN style="font-family: Consolas, 'Lucida Console', Menlo, Monaco, 'DejaVu Sans Mono', monospace, sans-serif;"&gt;MKL_CBWR=COMPATIBLE&lt;/SPAN&gt;, etc., and the threading model.&lt;/P&gt;

&lt;P&gt;Regarding Intel OpenMP and Microsoft OpenMP, it should be OK to use either of them. You mentioned "I tried changing the number of threads to 2", but from the output, OpenMP still starts 4 threads (which seems like a small issue). Anyway, how about setting OMP_NUM_THREADS=1?&lt;/P&gt;
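
&lt;P&gt;A hedged sketch of how these settings could be applied when launching a run: the Python helper below builds an environment with MKL_CBWR=COMPATIBLE and OMP_NUM_THREADS=1. The helper name reproducibility_env is hypothetical and only for illustration.&lt;/P&gt;

&lt;PRE&gt;
```python
import os

def reproducibility_env(base=None, threads=1):
    """Return a copy of the environment with the settings suggested above."""
    env = dict(os.environ if base is None else base)
    env["MKL_CBWR"] = "COMPATIBLE"         # conditional numerical reproducibility mode
    env["OMP_NUM_THREADS"] = str(threads)  # limit the OpenMP thread count
    return env

env = reproducibility_env()
print(env["MKL_CBWR"], env["OMP_NUM_THREADS"])
```
&lt;/PRE&gt;

&lt;P&gt;The resulting dictionary could then be passed as the env argument of subprocess.run when starting the solver executable (the executable itself is not shown here).&lt;/P&gt;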

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Best Regards,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Ying&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 May 2016 01:49:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121410#M24983</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2016-05-12T01:49:00Z</dc:date>
    </item>
    <item>
      <title>Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121411#M24984</link>
      <description>&lt;P&gt;Ying,&lt;/P&gt;

&lt;P&gt;1) The race condition seems to be between the dss_create and dss_reorder functions. The dss_reorder function is also causing concurrency issues with very large case counts. The problematic sequence seems to be:&lt;/P&gt;

&lt;P&gt;Thread A: dss_create&lt;/P&gt;

&lt;P&gt;Thread B: dss_create&lt;/P&gt;

&lt;P&gt;Thread A: dss_define_structure&lt;/P&gt;

&lt;P&gt;Thread B: dss_define_structure&lt;/P&gt;

&lt;P&gt;Thread B: dss_reorder &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;-- this seemingly causes my issue if thread B runs dss_reorder before thread A does.&lt;/P&gt;

&lt;P&gt;Thread A: dss_reorder&lt;/P&gt;

&lt;P&gt;This never seems to happen with smaller matrices (50 x 50 or so). When the matrix is somewhat larger (12k x 12k), it seems to be possible.&lt;/P&gt;

&lt;P&gt;2) I've already been through the reproducibility documentation, and it did not help. I did try changing the thread count to 1, which solves the problem, but if the race condition is the culprit, that makes complete sense. As for the results I showed, those were from the default 4 threads. When I run the 2-thread or 1-thread case, OpenMP only starts the respective number of threads.&lt;/P&gt;</description>
      <pubDate>Fri, 13 May 2016 14:00:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121411#M24984</guid>
      <dc:creator>joseph_v_1</dc:creator>
      <dc:date>2016-05-13T14:00:10Z</dc:date>
    </item>
    <item>
      <title>So on a whim I decided to try</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121412#M24985</link>
      <description>&lt;P&gt;On a whim, I tried putting everything from dss_create through dss_reorder inside a #pragma omp critical {} section. This does not affect the results at all. Even though everything seems to point at the dss_reorder function, I am not sure it's the problem, since the critical directive should have fixed the issue. I did check how quickly the cores start having concurrency issues. Interestingly, it did cut down the run time by about 30%.&lt;/P&gt;

&lt;P&gt;After 5 minutes it's down to 90% utilization, after 10 minutes 70%, after 90 minutes 40%, and after 2 hours 30%. I stopped the test after that. I have seen it go down to 6%, which is just 1 core doing work.&lt;/P&gt;</description>
      <pubDate>Fri, 13 May 2016 17:25:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121412#M24985</guid>
      <dc:creator>joseph_v_1</dc:creator>
      <dc:date>2016-05-13T17:25:02Z</dc:date>
    </item>
    <item>
      <title>So I wanted to verify that</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121413#M24986</link>
      <description>&lt;P&gt;I wanted to verify that the concurrency issue wasn't just a problem with my 2-processor system, so I ran a large number of cases over the weekend. Initially, the cases took between 0.1 and 0.6 s each; towards the end (the run never did finish) they were taking between 30 and 50 s. From the data that was collected (it reached a limit and stopped recording), the issue is still with the dss_reorder function. Unfortunately, that function does not seem to be thread safe.&lt;/P&gt;

&lt;P&gt;Also, I was recently informed I cannot send any matrices as apparently they are vaguely considered ITAR controlled and everyone wants to err on the side of caution.&lt;/P&gt;

&lt;P&gt;I will see if I can generate a fake matrix that will cause the same issues.&lt;/P&gt;</description>
      <pubDate>Mon, 16 May 2016 12:23:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121413#M24986</guid>
      <dc:creator>joseph_v_1</dc:creator>
      <dc:date>2016-05-16T12:23:56Z</dc:date>
    </item>
    <item>
      <title>I've been using the dss</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121414#M24987</link>
      <description>&lt;P&gt;I've been using the DSS routines in a few of my codes and just tried the multithreaded version, which fails to output meaningful numbers. After reading this thread, I tried turning off multithreading before the call to dss_reorder (using mkl_set_num_threads(1)) and then turning it back on after the reorder. The good news is that the solver now works again; the bad news is that it is as slow as the sequential version, so there's no point in multithreading. For reference, I'm solving a complex symmetric system and running this on OS X with version 11.3 of the compiler and MKL.&lt;/P&gt;</description>
      <pubDate>Fri, 27 May 2016 16:34:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121414#M24987</guid>
      <dc:creator>Kerry_K_</dc:creator>
      <dc:date>2016-05-27T16:34:00Z</dc:date>
    </item>
    <item>
      <title>Hi Kerry, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121415#M24988</link>
      <description>&lt;P&gt;Hi Kerry,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;As you know, it is hard to reproduce the problem without the workload. Is it possible for you to provide us with a test case that reproduces it? If the test case is private, you can send it to me by email using "send author a message".&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;Ying&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 30 May 2016 05:53:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121415#M24988</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2016-05-30T05:53:49Z</dc:date>
    </item>
    <item>
      <title>Kerry,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121416#M24989</link>
      <description>&lt;P&gt;Kerry,&lt;/P&gt;

&lt;P&gt;Yes, there is definitely a bug of some sort within dss_reorder that seems to be related to thread safety. Funnily enough, the issue I've been having occurs when I run multiple instances of DSS in sequential mode. For me, it seems there is a race condition between dss_create and dss_reorder. For now I'm running the problem in parallel, though I take a performance hit from the overhead of running DSS in parallel. It also seems that the scheduler cannot keep all of the cores fed in parallel mode.&lt;/P&gt;

&lt;P&gt;I've been able to recreate the issue using only input matrices, but I haven't been able to construct a synthetic matrix that reproduces the bug, and I cannot send Intel my working matrices.&lt;/P&gt;

&lt;P&gt;You say it's as slow as the sequential version? dss_reorder isn't particularly intensive, though; dss_solve_... is. For me at least, dss_solve_... is about 70% of my runtime (100+ hrs), while dss_reorder is less than 10%. Although I suppose that if you have a massive matrix that requires a lot of reordering, then the reorder could take up more of the runtime.&lt;/P&gt;</description>
      <pubDate>Tue, 31 May 2016 12:45:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121416#M24989</guid>
      <dc:creator>joseph_v_1</dc:creator>
      <dc:date>2016-05-31T12:45:06Z</dc:date>
    </item>
    <item>
      <title>Joseph, I apologize for</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121417#M24990</link>
      <description>&lt;P&gt;Joseph, I apologize for chiming in here a few years later, but did you ever find any resolutions or workarounds for these issues? By the way, I appreciate the thoroughness of your investigations and explanation of your problem. You clearly spent many hours trying to understand the problem and find a solution.&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2020 19:38:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Concurrency-Issues-with-DSS/m-p/1121417#M24990</guid>
      <dc:creator>CW</dc:creator>
      <dc:date>2020-02-10T19:38:15Z</dc:date>
    </item>
  </channel>
</rss>

