<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020. in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1232029#M30391</link>
    <description>&lt;P&gt;A correction to my previous reply: the fix is already available in MKL 2020u4 (and will also be a part of oneMKL 2021.1, that part was correct).&lt;/P&gt;</description>
    <pubDate>Thu, 26 Nov 2020 18:51:12 GMT</pubDate>
    <dc:creator>Kirill_V_Intel</dc:creator>
    <dc:date>2020-11-26T18:51:12Z</dc:date>
    <item>
      <title>MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207367#M30014</link>
      <description>&lt;P&gt;Dear Gennady and Kirill,&lt;/P&gt;
&lt;P&gt;We've come across an error while using the tracer tool to debug the MPI section of our code with the -check_mpi linking flag. The error happens within the first call to cluster_sparse_solver (symbolic factorization): we get a collective SIZE_MISMATCH error in a call to MPI_Gatherv from MKLMPI_Gatherv. We also see this in our main source code (FDS) on Linux, again using IMPI and Parallel Studio XE 2020 u1.&lt;/P&gt;
&lt;P&gt;To verify the finding, I used our demonstration code, which solves a Poisson problem on 8 MPI processes with cluster_sparse_solver. Use the attached tarball and follow the instructions in the README:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. Type: $ source /opt/intel20/parallel_studio_xe_2020/psxevars.sh&lt;/P&gt;
&lt;P&gt;2. Make a test/ directory at the same level as the extracted source/ directory.&lt;/P&gt;
&lt;P&gt;3. In source/, run make_test.sh to compile.&lt;/P&gt;
&lt;P&gt;4. In test/, run the css_test program with 8 MPI processes.&lt;/P&gt;
&lt;P&gt;Any help on why this is coming up would be greatly appreciated.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for your time and attention.&lt;/P&gt;
&lt;P&gt;Marcos&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PS: Here is the std error:&lt;/P&gt;
&lt;P&gt;[~test]$ mpirun -n 8 ./css_test&lt;/P&gt;
&lt;P&gt;MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found&lt;BR /&gt;MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found&lt;BR /&gt;MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found&lt;BR /&gt;MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found&lt;BR /&gt;MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found&lt;BR /&gt;MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found&lt;BR /&gt;MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found&lt;BR /&gt;MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found&lt;/P&gt;
&lt;P&gt;[0] INFO: CHECK LOCAL:EXIT:SIGNAL ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:EXIT:BEFORE_MPI_FINALIZE ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:MPI:CALL_FAILED ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:MEMORY:OVERLAP ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:MEMORY:ILLEGAL_MODIFICATION ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:MEMORY:INACCESSIBLE ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:MEMORY:ILLEGAL_ACCESS OFF&lt;BR /&gt;[0] INFO: CHECK LOCAL:MEMORY:INITIALIZATION OFF&lt;BR /&gt;[0] INFO: CHECK LOCAL:REQUEST:ILLEGAL_CALL ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:REQUEST:NOT_FREED ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:REQUEST:PREMATURE_FREE ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:DATATYPE:NOT_FREED ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:BUFFER:INSUFFICIENT_BUFFER ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:DEADLOCK:HARD ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:DEADLOCK:POTENTIAL ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:DEADLOCK:NO_PROGRESS ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:MSG:DATATYPE:MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:MSG:DATA_TRANSMISSION_CORRUPTED ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:MSG:PENDING ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:DATATYPE:MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:DATA_TRANSMISSION_CORRUPTED ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:OPERATION_MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:SIZE_MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:REDUCTION_OPERATION_MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:ROOT_MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:INVALID_PARAMETER ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:COMM_FREE_MISMATCH ON&lt;BR /&gt;[0] INFO: maximum number of errors before aborting: CHECK-MAX-ERRORS 1&lt;BR /&gt;[0] INFO: maximum number of reports before aborting: CHECK-MAX-REPORTS 0 (= unlimited)&lt;BR /&gt;[0] INFO: maximum number of times each error is reported: CHECK-SUPPRESSION-LIMIT 10&lt;BR /&gt;[0] INFO: timeout for deadlock detection: DEADLOCK-TIMEOUT 60s&lt;BR /&gt;[0] INFO: timeout 
for deadlock warning: DEADLOCK-WARNING 300s&lt;BR /&gt;[0] INFO: maximum number of reported pending messages: CHECK-MAX-PENDING 20&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Starting Program ...&lt;/P&gt;
&lt;P&gt;MPI Process 0 started on blaze.el.nist.gov&lt;BR /&gt;MPI Process 1 started on blaze.el.nist.gov&lt;BR /&gt;MPI Process 2 started on blaze.el.nist.gov&lt;BR /&gt;MPI Process 3 started on blaze.el.nist.gov&lt;BR /&gt;MPI Process 4 started on blaze.el.nist.gov&lt;BR /&gt;MPI Process 5 started on blaze.el.nist.gov&lt;BR /&gt;MPI Process 6 started on blaze.el.nist.gov&lt;BR /&gt;MPI Process 7 started on blaze.el.nist.gov&lt;BR /&gt;Into factorization Phase..&lt;/P&gt;
&lt;P&gt;[0] &lt;STRONG&gt;ERROR: GLOBAL:COLLECTIVE:SIZE_MISMATCH&lt;/STRONG&gt;: error&lt;BR /&gt;[0] ERROR: Mismatch found in local rank [0] (global rank [0]),&lt;BR /&gt;[0] ERROR: other processes may also be affected.&lt;BR /&gt;[0] ERROR: Root expects 442368 items but 110592 sent by local rank [0] (same as global rank):&lt;BR /&gt;[0] ERROR: &lt;STRONG&gt;MPI_Gatherv&lt;/STRONG&gt;(*sendbuf=0x2b6882aac240, sendcount=110592, sendtype=MPI_INT, *recvbuf=0x2b6882f64080, *recvcounts=0xa4f5c80, *displs=0xa4f5d00, recvtype=MPI_INT, root=0, comm=0xffffffffc4000000 SPLIT COMM_WORLD [0:7])&lt;BR /&gt;[0] ERROR: &lt;STRONG&gt;MKLMPI_Gatherv&lt;/STRONG&gt; (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] ERROR: &lt;STRONG&gt;mkl_pds_lp64_cpardiso_mpi_gatherv&lt;/STRONG&gt; (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] ERROR: &lt;STRONG&gt;mkl_pds_lp64_assemble_csr_full&lt;/STRONG&gt; (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] ERROR: &lt;STRONG&gt;mkl_pds_lp64_cluster_sparse_solver&lt;/STRONG&gt; (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] ERROR: MAIN__ (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/source/main.f90:269)&lt;BR /&gt;[0] ERROR: main (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] ERROR: __libc_start_main (/usr/lib64/libc-2.17.so)&lt;BR /&gt;[0] ERROR: (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] ERROR: No problem found in the 7 processes with local ranks [1:7] (same as global ranks):&lt;BR /&gt;[0] ERROR: MPI_Gatherv(*sendbuf=..., sendcount=110592, sendtype=MPI_INT, *recvbuf=..., *recvcounts=..., *displs=..., recvtype=MPI_INT, root=0, comm=... 
SPLIT COMM_WORLD [0:7])&lt;BR /&gt;[0] ERROR: MKLMPI_Gatherv (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] ERROR: mkl_pds_lp64_cpardiso_mpi_gatherv (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] ERROR: mkl_pds_lp64_assemble_csr_full (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] ERROR: mkl_pds_lp64_cluster_sparse_solver (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] ERROR: MAIN__ (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/source/main.f90:269)&lt;BR /&gt;[0] ERROR: main (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] ERROR: __libc_start_main (/usr/lib64/libc-2.17.so)&lt;BR /&gt;[0] ERROR: (/home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST_CHECKMPI/test/css_test)&lt;BR /&gt;[0] INFO: 1 error, limit CHECK-MAX-ERRORS reached =&amp;gt; aborting&lt;BR /&gt;[0] WARNING: starting premature shutdown&lt;/P&gt;
&lt;P&gt;[0] INFO: GLOBAL:COLLECTIVE:SIZE_MISMATCH: found 1 time (1 error + 0 warnings), 0 reports were suppressed&lt;BR /&gt;[0] INFO: Found 1 problem (1 error + 0 warnings), 0 reports were suppressed.&lt;/P&gt;
&lt;P&gt;....&lt;/P&gt;
&lt;P&gt;.....&lt;/P&gt;</description>
      <pubDate>Mon, 07 Sep 2020 15:27:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207367#M30014</guid>
      <dc:creator>Marcos_V_1</dc:creator>
      <dc:date>2020-09-07T15:27:47Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207478#M30017</link>
      <description>&lt;P&gt;Hello Marcos,&lt;/P&gt;
&lt;P&gt;Just a quick question while I'm looking for the PSXE at my disposal: do you see any failures when you don't use the Trace Analyzer and Collector?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Kirill&lt;/P&gt;</description>
      <pubDate>Tue, 08 Sep 2020 02:21:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207478#M30017</guid>
      <dc:creator>Kirill_V_Intel</dc:creator>
      <dc:date>2020-09-08T02:21:58Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207610#M30026</link>
      <description>&lt;P&gt;Morning Kirill, thank you for looking into this. I actually also see the error only invoking the -check_mpi linking flag when compiling, without sourcing psxevars.sh.&lt;/P&gt;
&lt;P&gt;So, just compiling and running css_test you should be able to reproduce the error.&lt;/P&gt;
&lt;P&gt;Thank you for your time, best&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Marcos&lt;/P&gt;</description>
      <pubDate>Tue, 08 Sep 2020 13:56:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207610#M30026</guid>
      <dc:creator>Marcos_V_1</dc:creator>
      <dc:date>2020-09-08T13:56:38Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207613#M30027</link>
      <description>&lt;P&gt;Sorry, what I meant is running css_test, compiled with -check_mpi and with psxevars.sh sourced, in a terminal where psxevars.sh has not been sourced. It is probably the same situation as having sourced psxevars.sh.&lt;/P&gt;
&lt;P&gt;In order to be able to compile with -check_mpi you need to source psxevars.sh. Without the flag the code runs.&lt;/P&gt;</description>
      <pubDate>Tue, 08 Sep 2020 14:02:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207613#M30027</guid>
      <dc:creator>Marcos_V_1</dc:creator>
      <dc:date>2020-09-08T14:02:17Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207626#M30029</link>
      <description>&lt;P&gt;Compiling and running your example without&amp;nbsp;-check_mpi,&lt;/P&gt;
&lt;P&gt;I see no problems on my end:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Starting Program ...&lt;/P&gt;
&lt;P&gt;MPI Process 0 started on cerberos&lt;BR /&gt;MPI Process 1 started on cerberos&lt;BR /&gt;MPI Process 2 started on cerberos&lt;BR /&gt;MPI Process 6 started on cerberos&lt;BR /&gt;MPI Process 7 started on cerberos&lt;BR /&gt;MPI Process 3 started on cerberos&lt;BR /&gt;MPI Process 4 started on cerberos&lt;BR /&gt;MPI Process 5 started on cerberos&lt;BR /&gt;Into factorization Phase..&lt;BR /&gt;Into solve Phase..&lt;BR /&gt;NSOLVES = 100&lt;BR /&gt;NSOLVES = 200&lt;BR /&gt;NSOLVES = 300&lt;BR /&gt;NSOLVES = 400&lt;BR /&gt;NSOLVES = 500&lt;BR /&gt;NSOLVES = 600&lt;BR /&gt;NSOLVES = 700&lt;BR /&gt;NSOLVES = 800&lt;BR /&gt;NSOLVES = 900&lt;BR /&gt;NSOLVES = 1000&lt;BR /&gt;NSOLVES = 1100&lt;BR /&gt;NSOLVES = 1200&lt;BR /&gt;NSOLVES = 1300&lt;BR /&gt;NSOLVES = 1400&lt;BR /&gt;NSOLVES = 1500&lt;BR /&gt;NSOLVES = 1600&lt;BR /&gt;NSOLVES = 1700&lt;BR /&gt;NSOLVES = 1800&lt;BR /&gt;NSOLVES = 1900&lt;BR /&gt;NSOLVES = 2000&lt;BR /&gt;NSOLVES = 2100&lt;BR /&gt;NSOLVES = 2200&lt;BR /&gt;NSOLVES = 2300&lt;BR /&gt;NSOLVES = 2400&lt;/P&gt;
&lt;P&gt;......&lt;/P&gt;</description>
      <pubDate>Tue, 08 Sep 2020 15:06:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207626#M30029</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-09-08T15:06:01Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207632#M30030</link>
      <description>&lt;P&gt;Hi Gennady, correct. The error appears when compiling with the -check_mpi flag (after sourcing psxevars.sh).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 08 Sep 2020 15:56:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207632#M30030</guid>
      <dc:creator>Marcos_V_1</dc:creator>
      <dc:date>2020-09-08T15:56:37Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207647#M30031</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;I confirm the issue. The test fails when it is run with -check_mpi as Marcos described (I believe the Trace analyzer and collector forces the stop). The reported size mismatch needs to be investigated.&lt;/P&gt;
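&lt;P&gt;As an illustration of what the checker reports here (a minimal sketch, not MKL or Trace Analyzer and Collector code): for a gatherv-style collective, the count the root expects from each rank has to equal the count that rank actually sends. The pure-Python model below involves no MPI. The function name is hypothetical, the rank 0 numbers come from the report above, and the expected counts for ranks 1 to 7 are an assumption based on the log flagging no problem for them.&lt;/P&gt;
&lt;PRE&gt;
```python
# Sketch of the rule behind GLOBAL:COLLECTIVE:SIZE_MISMATCH for a gatherv-style call.
# Pure Python model, no MPI involved; all names are hypothetical.

def find_count_mismatches(recvcounts_at_root, sendcounts_by_rank):
    """Return (rank, expected, sent) for every rank whose sendcount
    differs from the count the root expects from it."""
    mismatches = []
    pairs = zip(recvcounts_at_root, sendcounts_by_rank)
    for rank, (expected, sent) in enumerate(pairs):
        if expected != sent:
            mismatches.append((rank, expected, sent))
    return mismatches


# Rank 0 numbers are taken from the report; the counts expected from
# ranks 1..7 are assumed to equal what those ranks sent.
sendcounts = [110592] * 8
recvcounts = [442368] + [110592] * 7

for rank, expected, sent in find_count_mismatches(recvcounts, sendcounts):
    print(f"rank {rank}: root expects {expected} items but {sent} sent")
```
&lt;/PRE&gt;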
&lt;P&gt;Best,&lt;BR /&gt;Kirill&lt;/P&gt;</description>
      <pubDate>Tue, 08 Sep 2020 16:57:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207647#M30031</guid>
      <dc:creator>Kirill_V_Intel</dc:creator>
      <dc:date>2020-09-08T16:57:34Z</dc:date>
    </item>
    <item>
      <title>Re:MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207832#M30033</link>
      <description>&lt;P&gt;The issue has been escalated, and this thread will be kept updated.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Sep 2020 07:19:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1207832#M30033</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-09-09T07:19:11Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1208062#M30038</link>
      <description>&lt;P&gt;Hello Marcos,&lt;/P&gt;
&lt;P&gt;The root cause is a bug in how the distributed CSR matrix is assembled inside the cluster sparse solver. We'll fix it.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Meanwhile, I have the following workaround for you to try if you have time:&lt;/P&gt;
&lt;P&gt;1) Assemble the input matrix (and also solution and rhs vector) on the root (main MPI process) so that iparm(40) = 0 can be used.&lt;/P&gt;
&lt;P&gt;2) Distribute the matrix across MPI processes with intersections (so that some processes have rows in common), meaning that the ranges [iparm(41); iparm(42)) will intersect across MPI processes.&lt;/P&gt;
&lt;P&gt;I am not 100% sure, as I haven't checked them yet, but I believe either of these two should work around the problem. I'd try the first one.&lt;/P&gt;
&lt;P&gt;I hope this helps.&lt;/P&gt;
&lt;P&gt;Best,&lt;BR /&gt;Kirill&lt;/P&gt;</description>
      <pubDate>Wed, 09 Sep 2020 21:52:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1208062#M30038</guid>
      <dc:creator>Kirill_V_Intel</dc:creator>
      <dc:date>2020-09-09T21:52:39Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1208288#M30046</link>
      <description>&lt;P&gt;Good Morning Kirill,&lt;/P&gt;
&lt;P&gt;Great to see the root cause of the error has been found. For us it doesn't make much sense to build the global Poisson matrix on process 0, as it has no information about the meshes held by other processes.&lt;/P&gt;
&lt;P&gt;We will have to wait for the fix and new release of MKL. Thank you very much for your time and attention.&lt;/P&gt;
&lt;P&gt;Best,&lt;/P&gt;
&lt;P&gt;Marcos&lt;/P&gt;</description>
      <pubDate>Thu, 10 Sep 2020 13:57:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1208288#M30046</guid>
      <dc:creator>Marcos_V_1</dc:creator>
      <dc:date>2020-09-10T13:57:35Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1208506#M30049</link>
      <description>&lt;P&gt;Hi Marcos,&lt;/P&gt;
&lt;P&gt;I totally understand that it can be unnatural from the perspective of assembling pieces of discretization. What I suggest is to write a small code which will organize MPI communications between processes to form the matrix on the MPI root process.&lt;/P&gt;
&lt;P&gt;I guess we can provide such a snippet from our side if needed (this would need communication outside of this forum). It will take the local CSR matrix on each process and assemble the global matrix on the root via MPI.&amp;nbsp;&lt;/P&gt;
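&lt;P&gt;For illustration only, the root-side bookkeeping of such an assembly might look like the sketch below. This is not the snippet mentioned above, just a pure-Python model with no MPI calls. The name assemble_global_csr is hypothetical, and the sketch assumes each process owns a contiguous, non-overlapping block of rows, supplied in rank order, with 0-based row pointers and global column indices.&lt;/P&gt;
&lt;PRE&gt;
```python
# Sketch: merge per-process CSR row blocks into one global CSR matrix.
# Pure Python, no MPI; names and the 0-based convention are assumptions.

def assemble_global_csr(local_blocks):
    """Concatenate CSR row blocks given in rank order.

    Each block is (row_ptr, col_idx, values): a 0-based CSR piece covering
    a contiguous block of rows, with col_idx holding global column numbers.
    """
    g_row_ptr, g_col_idx, g_values = [0], [], []
    for row_ptr, col_idx, values in local_blocks:
        offset = len(g_col_idx)  # nonzeros already placed by earlier blocks
        # Drop the leading 0 of the local row_ptr and shift the rest.
        g_row_ptr.extend(offset + p for p in row_ptr[1:])
        g_col_idx.extend(col_idx)
        g_values.extend(values)
    return g_row_ptr, g_col_idx, g_values


# Toy example: 4x4 tridiagonal (1D Poisson) matrix split over two processes.
block0 = ([0, 2, 5], [0, 1, 0, 1, 2], [2.0, -1.0, -1.0, 2.0, -1.0])
block1 = ([0, 3, 5], [1, 2, 3, 2, 3], [-1.0, 2.0, -1.0, -1.0, 2.0])

row_ptr, col_idx, values = assemble_global_csr([block0, block1])
print(row_ptr)
print(col_idx)
```
&lt;/PRE&gt;
&lt;P&gt;In a real code the three arrays of each block would first have to be collected on the root, for example with MPI_Gatherv using the local nonzero counts as receive counts, and the indices may need shifting to 1-based before they are passed to the solver.&lt;/P&gt;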
&lt;P&gt;The rationale of this suggestion is to make it possible for you to not wait on the next release.&lt;/P&gt;
&lt;P&gt;Let us know if you think it will help you proceed with your project faster.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Kirill&lt;/P&gt;</description>
      <pubDate>Fri, 11 Sep 2020 01:36:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1208506#M30049</guid>
      <dc:creator>Kirill_V_Intel</dc:creator>
      <dc:date>2020-09-11T01:36:27Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1208664#M30062</link>
      <description>&lt;P&gt;Hi Kirill, thank you very much for the offer. I would not worry about this, even though I would personally be interested to see how the communication is set up to send the matrices back to process 0.&lt;/P&gt;
&lt;P&gt;I think we can wait for the next MKL release, noting that when testing with -check_mpi we don't want to use the cluster solver (we have another, non-MKL Poisson solver based on Fishpack, which is the default). This is a new flag we are using as we learn the tracer tool, but it is not yet set in the targets compiled in our nightly builds/continuous integration.&lt;/P&gt;
&lt;P&gt;Again thank you, and best regards&lt;/P&gt;
&lt;P&gt;Marcos&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Sep 2020 12:32:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1208664#M30062</guid>
      <dc:creator>Marcos_V_1</dc:creator>
      <dc:date>2020-09-11T12:32:12Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1229957#M30378</link>
      <description>&lt;P&gt;Dear Kirill and Gennady, do you know if there have been any updates on this issue?&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Marcos&lt;/P&gt;</description>
      <pubDate>Thu, 19 Nov 2020 18:52:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1229957#M30378</guid>
      <dc:creator>Marcos_V_1</dc:creator>
      <dc:date>2020-11-19T18:52:11Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1229982#M30379</link>
      <description>&lt;P&gt;Hi Marcos!&lt;/P&gt;&lt;P&gt;The fix should become available in oneMKL 2021 Gold release which is going to be released soon AFAIK.&lt;/P&gt;&lt;P&gt;Best,&lt;BR /&gt;Kirill&lt;/P&gt;</description>
      <pubDate>Thu, 19 Nov 2020 19:42:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1229982#M30379</guid>
      <dc:creator>Kirill_V_Intel</dc:creator>
      <dc:date>2020-11-19T19:42:58Z</dc:date>
    </item>
    <item>
      <title>Re: MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1232029#M30391</link>
      <description>&lt;P&gt;A correction to my previous reply: the fix is already available in MKL 2020u4 (and will also be a part of oneMKL 2021.1, that part was correct).&lt;/P&gt;</description>
      <pubDate>Thu, 26 Nov 2020 18:51:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1232029#M30391</guid>
      <dc:creator>Kirill_V_Intel</dc:creator>
      <dc:date>2020-11-26T18:51:12Z</dc:date>
    </item>
    <item>
      <title>Re:MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 2020.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1233548#M30404</link>
      <description>&lt;P&gt;This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Dec 2020 15:36:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1233548#M30404</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-12-02T15:36:35Z</dc:date>
    </item>
    <item>
      <title>Re: Re:MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 202</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1233555#M30406</link>
      <description>&lt;P&gt;Thank you Gennady and Kirill.&lt;/P&gt;
&lt;P&gt;Have a great day,&lt;/P&gt;
&lt;P&gt;Marcos&lt;/P&gt;</description>
      <pubDate>Wed, 02 Dec 2020 15:46:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1233555#M30406</guid>
      <dc:creator>Marcos_V_1</dc:creator>
      <dc:date>2020-12-02T15:46:50Z</dc:date>
    </item>
    <item>
      <title>Re: Re:MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 202</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1233671#M30410</link>
      <description>&lt;P&gt;Hi Gennady, I'm seeing another issue. If you run the posted self-contained program, compiled with the -check_mpi flag and Update 4, it gets through the numerical factorization successfully, but after 1500 solves the program crashes with a PMPI_Comm_free() error; see below:&lt;/P&gt;
&lt;P&gt;[0] INFO: CHECK LOCAL:EXIT:SIGNAL ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:EXIT:BEFORE_MPI_FINALIZE ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:MPI:CALL_FAILED ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:MEMORY:OVERLAP ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:MEMORY:ILLEGAL_MODIFICATION ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:MEMORY:INACCESSIBLE ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:MEMORY:ILLEGAL_ACCESS OFF&lt;BR /&gt;[0] INFO: CHECK LOCAL:MEMORY:INITIALIZATION OFF&lt;BR /&gt;[0] INFO: CHECK LOCAL:REQUEST:ILLEGAL_CALL ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:REQUEST:NOT_FREED ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:REQUEST:PREMATURE_FREE ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:DATATYPE:NOT_FREED ON&lt;BR /&gt;[0] INFO: CHECK LOCAL:BUFFER:INSUFFICIENT_BUFFER ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:DEADLOCK:HARD ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:DEADLOCK:POTENTIAL ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:DEADLOCK:NO_PROGRESS ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:MSG:DATATYPE:MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:MSG:DATA_TRANSMISSION_CORRUPTED ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:MSG:PENDING ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:DATATYPE:MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:DATA_TRANSMISSION_CORRUPTED ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:OPERATION_MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:SIZE_MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:REDUCTION_OPERATION_MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:ROOT_MISMATCH ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:INVALID_PARAMETER ON&lt;BR /&gt;[0] INFO: CHECK GLOBAL:COLLECTIVE:COMM_FREE_MISMATCH ON&lt;BR /&gt;[0] INFO: maximum number of errors before aborting: CHECK-MAX-ERRORS 1&lt;BR /&gt;[0] INFO: maximum number of reports before aborting: CHECK-MAX-REPORTS 0 (= unlimited)&lt;BR /&gt;[0] INFO: maximum number of times each error is reported: CHECK-SUPPRESSION-LIMIT 10&lt;BR /&gt;[0] INFO: timeout for deadlock detection: DEADLOCK-TIMEOUT 60s&lt;BR /&gt;[0] INFO: timeout 
for deadlock warning: DEADLOCK-WARNING 300s&lt;BR /&gt;[0] INFO: maximum number of reported pending messages: CHECK-MAX-PENDING 20&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Starting Program ...&lt;/P&gt;
&lt;P&gt;MPI Process 0 started on blaze002.backend&lt;BR /&gt;MPI Process 1 started on blaze002.backend&lt;BR /&gt;MPI Process 2 started on blaze002.backend&lt;BR /&gt;MPI Process 3 started on blaze002.backend&lt;BR /&gt;MPI Process 4 started on blaze002.backend&lt;BR /&gt;MPI Process 5 started on blaze002.backend&lt;BR /&gt;MPI Process 6 started on blaze002.backend&lt;BR /&gt;MPI Process 7 started on blaze002.backend&lt;BR /&gt;Into factorization Phase..&lt;BR /&gt;Into solve Phase..&lt;BR /&gt;NSOLVES = 100&lt;BR /&gt;NSOLVES = 200&lt;BR /&gt;NSOLVES = 300&lt;BR /&gt;NSOLVES = 400&lt;BR /&gt;NSOLVES = 500&lt;BR /&gt;NSOLVES = 600&lt;BR /&gt;NSOLVES = 700&lt;BR /&gt;NSOLVES = 800&lt;BR /&gt;NSOLVES = 900&lt;BR /&gt;NSOLVES = 1000&lt;BR /&gt;NSOLVES = 1100&lt;BR /&gt;NSOLVES = 1200&lt;BR /&gt;NSOLVES = 1300&lt;BR /&gt;NSOLVES = 1400&lt;BR /&gt;NSOLVES = 1500&lt;BR /&gt;[6] ERROR: Unexpected MPI error, aborting:&lt;BR /&gt;[6] ERROR: Invalid communicator, error stack:&lt;BR /&gt;[6] ERROR: PMPI_Comm_free(137): MPI_Comm_free(comm=0xa343e90) failed&lt;BR /&gt;[6] ERROR: PMPI_Comm_free(85).: Null communicator&lt;BR /&gt;[7] ERROR: Unexpected MPI error, aborting:&lt;BR /&gt;[7] ERROR: Invalid communicator, error stack:&lt;BR /&gt;[7] ERROR: PMPI_Comm_free(137): MPI_Comm_free(comm=0x9dd1e20) failed&lt;BR /&gt;[7] ERROR: PMPI_Comm_free(85).: Null communicator&lt;BR /&gt;Abort(1) on node 7 (rank 7 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 7&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you see if you can reproduce this new issue on your side? This is a Linux machine (cluster), as described in the post.&lt;/P&gt;
&lt;P&gt;Thank you,&lt;/P&gt;
&lt;P&gt;Marcos&lt;/P&gt;</description>
      <pubDate>Wed, 02 Dec 2020 22:07:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1233671#M30410</guid>
      <dc:creator>Marcos_V_1</dc:creator>
      <dc:date>2020-12-02T22:07:36Z</dc:date>
    </item>
    <item>
      <title>Re: Re:MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 202</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1233767#M30411</link>
      <description>&lt;P&gt;OK, we will check ASAP.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Dec 2020 03:52:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1233767#M30411</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-12-03T03:52:09Z</dc:date>
    </item>
    <item>
      <title>Re: Re:MKL : Error running cluster_sparse_solver with -check_mpi file and tracer in Linux, PS XE 202</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1233808#M30412</link>
      <description>&lt;P&gt;I see no issues with MKL 2020 u4. More than 20000 steps completed successfully before I stopped the execution.&lt;/P&gt;
&lt;P&gt;[gfedorov@cerberos test]$ &lt;STRONG&gt;mpirun -n 8 ./css_test&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;Starting Program ...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;MPI Process&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0 started on cerberos&lt;/P&gt;
&lt;P&gt;....&lt;/P&gt;
&lt;P&gt;&amp;nbsp;MPI Process&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 7 started on cerberos&lt;/P&gt;
&lt;P&gt;&amp;nbsp;Into factorization Phase..&lt;/P&gt;
&lt;P&gt;OMP: Info #274: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.&lt;/P&gt;
&lt;P&gt;OMP: Info #274: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.&lt;/P&gt;
&lt;P&gt;OMP: Info #274: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;Into solve Phase..&lt;/P&gt;
&lt;P&gt;&amp;nbsp;NSOLVES =&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 100&lt;/P&gt;
&lt;P&gt;&amp;nbsp;NSOLVES =&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 200&lt;/P&gt;
&lt;P&gt;&amp;nbsp;NSOLVES =&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 300&lt;/P&gt;
&lt;P&gt;&amp;nbsp;NSOLVES =&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 400&lt;/P&gt;
&lt;P&gt;&amp;nbsp;NSOLVES =&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;500&lt;/P&gt;
&lt;P&gt;&amp;nbsp;……………………&lt;/P&gt;
&lt;P&gt;…………………….&lt;/P&gt;
&lt;P&gt;&amp;nbsp;NSOLVES =&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 20700&lt;/P&gt;
&lt;P&gt;&amp;nbsp;NSOLVES =&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 20800&lt;/P&gt;
&lt;P&gt;&amp;nbsp;NSOLVES =&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 20900&lt;/P&gt;
&lt;P&gt;[mpiexec@cerberos] Sending Ctrl-C to processes as requested&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Dec 2020 07:13:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Error-running-cluster-sparse-solver-with-check-mpi-file-and/m-p/1233808#M30412</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-12-03T07:13:07Z</dc:date>
    </item>
  </channel>
</rss>

