Intel® oneAPI Math Kernel Library

strange behavior of the cluster_sparse_solver

Dan_Ghiocel
Beginner

Dear experts,
    I'm trying to use the Intel cluster sparse solver to solve a large system of symmetric complex equations. I found that my code behaves strangely: it either gives the correct solution or crashes, depending on the number of cluster processes.
    My cluster consists of two nodes; each node has an Intel i7-5820K CPU with 6 cores and 128 GB of memory. The OS is openSUSE Leap 42.3, and the compiler is Intel Parallel Studio XE Cluster Edition 2018.3.222.
    Based on this hardware configuration, I am able to run the code with 1 to 12 processes.
    When running the code with 1 process, the code gives the correct solution.
    When 2 to 8 processes are used, the code crashes with the following error messages.
    
***** < mpirun -n 2 ./test_psolver_v1.a > <Error Message>*****    
Fatal error in PMPI_Allgather: Message truncated, error stack:
PMPI_Allgather(1093)....................: MPI_Allgather(sbuf=0x7ffc336490e4, scount=1, MPI_INT, rbuf=0x8afcb80, rcount=1, MPI_INT, MPI_COMM_WORLD) failed
MPIR_Allgather_impl(908)................: fail failed
MPIR_Allgather(861).....................: fail failed
MPIR_Allgather_intra(681)...............: fail failed
MPIDI_CH3_PktHandler_EagerShortSend(457): Message from rank 0 and tag 7 truncated; 8 bytes received but buffer size is 4

    
***** < mpirun -n 3 ./test_psolver_v1.a > <Error Message>*****    
Fatal error in PMPI_Allgather: Invalid count, error stack:
PMPI_Allgather(1093).....: MPI_Allgather(sbuf=0x8b53ea0, scount=1, MPI_LONG_LONG_INT, rbuf=0x8b54300, rcount=1, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
MPIR_Allgather_impl(908).: fail failed
MPIR_Allgather(861)......: fail failed
MPIR_Allgather_intra(332): fail failed
MPIC_Send(335)...........: Negative count, value is -32766
Fatal error in PMPI_Allgather: Message truncated, error stack:
PMPI_Allgather(1093)....................: MPI_Allgather(sbuf=0x7ffd59422ee4, scount=1, MPI_INT, rbuf=0x9cf6e00, rcount=1, MPI_INT, MPI_COMM_WORLD) failed
MPIR_Allgather_impl(908)................: fail failed
MPIR_Allgather(861).....................: fail failed
MPIR_Allgather_intra(267)...............: fail failed
MPIDI_CH3_PktHandler_EagerShortSend(457): Message from rank 0 and tag 7 truncated; 16 bytes received but buffer size is 12

***** < mpirun -n 7 ./test_psolver_v1.a > <Error Message>*****    
Fatal error in PMPI_Allgather: Invalid count, error stack:
PMPI_Allgather(1093).....: MPI_Allgather(sbuf=0xa4781a0, scount=1, MPI_LONG_LONG_INT, rbuf=0xa476580, rcount=1, MPI_LONG_LONG_INT, MPI_COMM_WORLD) failed
MPIR_Allgather_impl(908).: fail failed
MPIR_Allgather(861)......: fail failed
MPIR_Allgather_intra(332): fail failed
MPIC_Send(335)...........: Negative count, value is -32766
MPIR_Allgather_intra(267): fail failed
MPIC_Sendrecv(547).......: Negative count, value is -32764
Fatal error in PMPI_Allgather: Message truncated, error stack:
PMPI_Allgather(1093)....................: MPI_Allgather(sbuf=0x7fff02c0b5e4, scount=1, MPI_INT, rbuf=0x86c7600, rcount=1, MPI_INT, MPI_COMM_WORLD) failed
MPIR_Allgather_impl(908)................: fail failed
MPIR_Allgather(861).....................: fail failed
MPIR_Allgather_intra(267)...............: fail failed
MPIDI_CH3_PktHandler_EagerShortSend(457): Message from rank 4 and tag 7 truncated; 16 bytes received but buffer size is 12
MPIR_Allgather_intra(267)...............: fail failed
MPIDI_CH3U_Receive_data_found(131)......: Message from rank 2 and tag 7 truncated; 32 bytes received but buffer size is 28
    


    When running the code with 9 to 12 processes, the code gives the correct solution again.
    I searched the Internet but couldn't find any useful information to solve this. Is this an MKL problem, or did I do something wrong? Can you help me?

    The following are my code and test input files (attached); a simplified sketch of the call structure is shown below for reference.
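
    For orientation only, here is a minimal sketch of a cluster_sparse_solver driver for a double-precision complex symmetric system (mtype = 6). It is not the attached code itself: the tiny 2x2 matrix, array names, and iparm choices are placeholders.

! Minimal sketch of a cluster_sparse_solver driver for a double-precision
! complex symmetric system (mtype = 6). The 2x2 matrix, array names, and
! iparm choices are placeholders, not the actual attached test case.
program cluster_solver_sketch
    use mkl_cluster_sparse_solver
    implicit none
    include 'mpif.h'
    type(MKL_CLUSTER_SPARSE_SOLVER_HANDLE) :: pt(64)
    integer :: maxfct, mnum, mtype, phase, n, nrhs, msglvl, error, ierr, i
    integer :: iparm(64), perm(2)
    integer :: ia(3), ja(3)
    complex(kind=8) :: a(3), b(2), x(2)

    call mpi_init(ierr)

    n = 2; nrhs = 1; maxfct = 1; mnum = 1; msglvl = 1
    mtype = 6                  ! complex, symmetric matrix
    iparm = 0                  ! iparm(1) = 0: let the solver fill in defaults
    perm  = 0
    do i = 1, 64
        pt(i)%dummy = 0        ! internal solver handle must start zeroed
    end do

    ! Upper triangle of a 2x2 complex symmetric matrix in 1-based CSR format.
    ia = (/ 1, 3, 4 /)
    ja = (/ 1, 2, 2 /)
    a  = (/ (4.0d0, 1.0d0), (1.0d0, 0.0d0), (3.0d0, -1.0d0) /)
    b  = (/ (1.0d0, 0.0d0), (2.0d0, 0.0d0) /)

    ! phase 13 = analysis, factorization, and solve in one call
    phase = 13
    call cluster_sparse_solver(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, &
                               perm, nrhs, iparm, msglvl, b, x, MPI_COMM_WORLD, error)
    if (error /= 0) print *, 'cluster_sparse_solver failed, error =', error

    ! phase -1 = release all internal solver memory
    phase = -1
    call cluster_sparse_solver(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, &
                               perm, nrhs, iparm, msglvl, b, x, MPI_COMM_WORLD, error)

    call mpi_finalize(ierr)
end program cluster_solver_sketch

    Here mtype = 6 selects the complex symmetric (non-Hermitian) matrix type; the attached test reads the matrix and right-hand side from the input files instead of hard-coding them.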

    Thank you very much.

    Dan   

Gennady_F_Intel
Moderator

Thanks, we will check the problem on our side. At first glance, everything looks OK with your code.

Gennady_F_Intel
Moderator

Could you please show exactly how you linked the example?

Dan_Ghiocel
Beginner

Dear Gennady,

Thanks for your reply. I compiled the code with debug information and linked it against either the static sequential MKL library or the static threaded MKL library; both show the same behavior. The compile and link commands are listed below.

Compile commands for the code:
${MPIROOT}/bin64/mpiifort  -warn all -g -c mkl_cluster_sparse_solver.f90 -heap-arrays -traceback -I${MKLROOT}/include
${MPIROOT}/bin64/mpiifort  -warn all -g -c test_cluster_solver_v1.f90 -heap-arrays -traceback -I${MKLROOT}/include

Link command using the static sequential library:
${MPIROOT}/bin64/mpiifort test_cluster_solver_v1.o mkl_cluster_sparse_solver.o -o test_psolver_v1.exe -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -lpthread -lm -ldl

Link command using the static threaded library:
${MPIROOT}/bin64/mpiifort test_cluster_solver_v1.o mkl_cluster_sparse_solver.o -o test_psolver_v1mp.exe -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl
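
Note: the only difference between the two link lines is the threading layer (libmkl_sequential vs. libmkl_intel_thread plus -liomp5). With the threaded build, the number of OpenMP/MKL threads per MPI rank is usually set through environment variables; a sketch only, assuming one rank per 6-core node:

mpirun -n 2 -ppn 1 -genv OMP_NUM_THREADS 6 ./test_psolver_v1mp.exe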

I hope this is helpful. Thank you very much.

Dan
Gennady_F_Intel
Moderator

Dan, could you share the type of CPU you are running this case on? Something like the lscpu output...

I ran this code on an SKL (Skylake) system and see no problems so far with the latest MKL 2019.1.

Here is the output I see on my side:

[gfedorov@skl5 u799949]$ mpiexec -n 7 ./test.out

   Finish initializing iparm(:) ...

   Num. of Equ= 1000   Num. of NZ=    226509

   Num. of Equ= 1000   Num. of rhs=         1

   Data have been read in. ...
Memory allocated on phase  11 on Rank # 0       7.4155 MB
Memory allocated on phase  11 on Rank # 1       7.2276 MB
Memory allocated on phase  11 on Rank # 2       7.0090 MB
Memory allocated on phase  11 on Rank # 3       6.7920 MB
Memory allocated on phase  11 on Rank # 4       6.6012 MB
Memory allocated on phase  11 on Rank # 5       6.3842 MB
Memory allocated on phase  11 on Rank # 6       7.3950 MB

   Reordering completed ...
Number of non-zeros in L on Rank # 0    30730
Number of non-zeros in U on Rank # 0    1
Number of non-zeros in L on Rank # 1    24384
Number of non-zeros in U on Rank # 1    1
Number of non-zeros in L on Rank # 2    20288
Number of non-zeros in U on Rank # 2    1
Number of non-zeros in L on Rank # 3    16192
Number of non-zeros in U on Rank # 3    1
Number of non-zeros in L on Rank # 4    94491
Number of non-zeros in U on Rank # 4    1
Number of non-zeros in L on Rank # 5    44736
Number of non-zeros in U on Rank # 5    1
Number of non-zeros in L on Rank # 6    291558
Number of non-zeros in U on Rank # 6    1
Memory allocated on phase  22 on Rank # 0       21.4268 MB
Memory allocated on phase  22 on Rank # 1       21.1421 MB
Memory allocated on phase  22 on Rank # 2       20.8610 MB
Memory allocated on phase  22 on Rank # 3       20.5814 MB
Memory allocated on phase  22 on Rank # 4       21.5854 MB
Memory allocated on phase  22 on Rank # 5       20.6092 MB
Memory allocated on phase  22 on Rank # 6       26.1679 MB

Percentage of computed non-zeros for LL^T factorization
 7 %  100 %

   Factorization completed ...

   Solution completed ...

   The Rank=    0    Call mpi_finalize(...), error code =    0

I see similar output with 9 MPI processes, and so on.
Dan_Ghiocel
Beginner

Dear Gennady,

Thank you for your response. I also wonder what causes the code's strange behavior as the number of processes changes. Could it be caused by my platform's software configuration?

My current test platform has 2 identical nodes. Each node has an Intel i7-5820K CPU with 6 cores and 128 GB of memory. The OS is openSUSE Leap 42.3, and the compiler is Intel Parallel Studio XE Cluster Edition 2018.3.222.

The CPU information you asked about is in the attachment, which contains the lscpu output for one node along with the cluster's process information. I hope it is useful for debugging.

Thank you for your help.

Dan
Gennady_F_Intel
Moderator

Hi Dan, this helped. I see some hangs with 3 MPI processes on such a system. Could you submit this problem to the Intel Online Service Center?

Dan_Ghiocel
Beginner

Dear Gennady,

Thank you for your time investigating the problem. I will get help from the Intel Online Service Center.

Have a great Thanksgiving day.

Dan
Gennady_F_Intel
Moderator

Dan, the problem you reported has been fixed, and we are planning to release the fix in the next update. If you want to try the engineering build on your side, please open an Intel Online Service Center issue and we will share these binaries with you.

Gennady_F_Intel
Moderator

Could you please check MKL 2019 Update 4? Based on internal records, this case has been fixed in that update.
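
A quick way to confirm which MKL build a binary is actually linked against is to print the version string with the mkl_get_version_string service routine. A minimal sketch (the program name is just a placeholder):

! Minimal sketch: print the MKL version string to confirm whether the
! binary is linked against MKL 2019 Update 4 or later.
program show_mkl_version
    implicit none
    character(len=198) :: buf
    call mkl_get_version_string(buf)
    write (*, '(a)') trim(buf)
end program show_mkl_version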
