Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Cluster pardiso gave wrong results when np>2

Letian_W_
Beginner
250 Views
I'm trying to test the cluster pardiso to solver complex linear equations with Hermitian matrix. When I ran the parallel program using -np 2 option, everything looks fine. However, when I used more MPIs like -np 4, it reported errors and gave wrong results. Please see the attached code, output with np 2 and np 4 as well as error message with np 4. Please have the attached FORTRAN code. Any suggestions are greatly appreciated. Thanks. ******************************************************************************** program cluster_sparse_solver_complex_unsym implicit none include 'mkl_cluster_sparse_solver.fi' include 'mpif.h' C.. Internal solver memory pointer for 64-bit architectures TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE) pt(64) C.. All other variables INTEGER maxfct, mnum, mtype, phase, n, nrhs, error, msglvl INTEGER*4 rank, mpi_stat INTEGER iparm(64) INTEGER ia(9) INTEGER ja(13) COMPLEX*16 a(13) COMPLEX*16 b(8) COMPLEX*16 bs(8) COMPLEX*16 x(8) INTEGER i, idum(1) COMPLEX*16 ddum(1) COMPLEX*16 res, res0 complex*16 c_temp,alpha,beta character*1 matdescra(6) C.. Fill all arrays containing matrix data. DATA n /8/, nrhs /1/, maxfct /1/, mnum /1/ data ia/1,3,5,7,8,9,11,13,14/ data ja/1,2,2,3,3,4,4,5,6,7,7,8,8/ data b/3,4,5,6,7,8,1,2/ a(1)=15d0 a(2)=(3d0,1d0) a(3)=14d0 a(4)=(2d0,1.1d0) a(5)=20d0 a(6)=(1d0,0.5d0) a(7)=25d0 a(8)=20d0 a(9)=15d0 a(10)=(1d0,0.2d0) a(11)=17d0 a(12)=(2d0,0.5d0) a(13)=25d0 C.. variables for matrix multiplication matdescra(1)='H' ! Hermitian for cyclic complex matrix matdescra(2)='U' ! upper matrix matdescra(3)='N' ! non unit diagonal matdescra(4)='F' ! one based indexing alpha=1.0 beta=0.0 C.. C.. Initialize MPI. call MPI_INIT(mpi_stat) call MPI_COMM_RANK(MPI_COMM_WORLD, rank, mpi_stat) C.. C.. Set up Cluster Sparse Solver control parameter C.. do i = 1, 64 iparm(i) = 0 enddo iparm(1) = 1 ! no solver default iparm(2) = 2 ! fill-in reordering from METIS iparm(6) = 0 ! =0 solution on the first n compoments of x iparm(8) = 2 ! numbers of iterative refinement steps iparm(10) = 13 ! perturbe the pivot elements with 1E-13 iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS iparm(13) = 1 ! maximum weighted matching algorithm is switched-off iparm(40) = 0 ! Input: matrix/rhs/solution stored on master error = 0 ! initialize error flag msglvl = 1 ! print statistical information mtype = 4 ! symmetric, indefinite C.. Initiliaze the internal solver memory pointer. This is only C necessary for the FIRST call of the Cluster Sparse Solver. do i = 1, 64 pt(i)%DUMMY = 0 enddo C.. Reordering and Symbolic Factorization, This step also allocates C all memory that is necessary for the factorization phase = 11 ! only reordering and symbolic factorization call cluster_sparse_solver (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, 1 idum, nrhs, iparm, msglvl, ddum, ddum, MPI_COMM_WORLD, error) if (RANK.EQ.0) write(*,*) 'Reordering completed ... ' if (error .NE. 0) then if (RANK.EQ.0) write(*,*) 'The following ERROR was detected: ', error call MPI_FINALIZE(mpi_stat) stop 1 endif C.. Factorization. phase = 22 ! only factorization call cluster_sparse_solver (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, 1 idum, nrhs, iparm, msglvl, ddum, ddum, MPI_COMM_WORLD, error) if (RANK.EQ.0) write(*,*) 'Factorization completed ... ' if (error .NE. 0) then if (RANK.EQ.0) write(*,*) 'The following ERROR was detected: ', error call MPI_FINALIZE(mpi_stat) stop 1 endif C.. Back substitution and iterative refinement phase = 33 ! only solution if (RANK.EQ.0) then do i = 1, n b(i) = (1.d0,1.d0) enddo endif call cluster_sparse_solver (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, 1 idum, nrhs, iparm, msglvl, b, x, MPI_COMM_WORLD, error) if (error .NE. 0) then if (RANK.EQ.0) write(*,*) 'The following ERROR was detected: ', error call MPI_FINALIZE(mpi_stat) stop 1 endif if (RANK.eq.0) then write(*,*) 'Solve completed ... ' write(*,*) 'The solution of the system is ' do i = 1, n write(*,*) ' x(',i,') = ', x(i) enddo call mkl_zcsrmv('N',n,n,alpha,matdescra,a,ja,ia(1),ia(2),x,beta,bs) res = (0.d0,0.d0) res0 = (0.d0,0.d0) do i=1,n res = res + (bs(i)-b(i))*conjg((bs(i)-b(i))) res0 = res0 + b(i)*conjg(b(i)) enddo print *, 'Relative residual = ', sqrt(abs(res))/sqrt(abs(res0)) endif C.. Termination and release of memory phase = -1 ! release internal memory call cluster_sparse_solver (pt, maxfct, mnum, mtype, phase, n, ddum, idum, idum, 1 idum, nrhs, iparm, msglvl, ddum, ddum, MPI_COMM_WORLD, error) if (error .NE. 0) then if (RANK.EQ.0) write(*,*) 'The following ERROR was detected: ', error call MPI_FINALIZE(mpi_stat) stop 1 endif if(rank .eq. 0) then if ( sqrt(abs(res))/sqrt(abs(res0)) .gt. 1.d-10 ) then write(*,*) 'Error: residual is too high!' stop 1 endif endif call MPI_FINALIZE(mpi_stat) end
0 Kudos
1 Reply
Letian_W_
Beginner
250 Views

Finally, my post is here. Tried to enclose the code and error message, and my post would not appear on this forum. Here are more information:

When I run the code "mpirun -np 4 ./test.exe", here is the error message and the solution results were not correct. However, if I run -np 2, everything looks correct and no error message.

[wangl@sclinuxec4 intel_intelmpi_lp64_intel64_a]$ mpirun -np 4 ./cl_solver_CHermitian_f90.exe >outputnp4.out
1
forrtl: error (69): process interrupted (SIGINT)
Image              PC                Routine            Line        Source            
cl_solver_CHermit  0000000004FAF5D5  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004FAD1F7  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F7CAC4  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F7C8D6  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F5D9D6  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F38D0E  Unknown               Unknown  Unknown
libpthread.so.0    0000003D46A0F710  Unknown               Unknown  Unknown
libmpi.so.12       00002BA12425945F  Unknown               Unknown  Unknown
libmpi.so.12       00002BA124364E39  Unknown               Unknown  Unknown
libmpi.so.12       00002BA124364B32  Unknown               Unknown  Unknown
libmpi.so.12       00002BA12423E44A  Unknown               Unknown  Unknown
libmpi.so.12       00002BA12423DD5D  Unknown               Unknown  Unknown
libmpi.so.12       00002BA12423DBDC  Unknown               Unknown  Unknown
libmpi.so.12       00002BA12423DB0C  Unknown               Unknown  Unknown
libmpi.so.12       00002BA1243F4A1A  Unknown               Unknown  Unknown
libmpi.so.12       00002BA12434209A  Unknown               Unknown  Unknown
libmpifort.so.12   00002BA123E3349A  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000000547550  MAIN__                    177  cl_solver_CHermitian_f90.f
cl_solver_CHermit  0000000000546B1E  Unknown               Unknown  Unknown
libc.so.6          0000003D4661ED1D  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000000546A29  Unknown               Unknown  Unknown
forrtl: error (69): process interrupted (SIGINT)
Image              PC                Routine            Line        Source            
cl_solver_CHermit  0000000004FAF5D5  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004FAD1F7  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F7CAC4  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F7C8D6  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F5D9D6  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F38D0E  Unknown               Unknown  Unknown
libpthread.so.0    0000003D46A0F710  Unknown               Unknown  Unknown
libmpi.so.12       00002B9AFB05B45C  Unknown               Unknown  Unknown
libmpi.so.12       00002B9AFB166E39  Unknown               Unknown  Unknown
libmpi.so.12       00002B9AFB166B32  Unknown               Unknown  Unknown
libmpi.so.12       00002B9AFB04044A  Unknown               Unknown  Unknown
libmpi.so.12       00002B9AFB03FD5D  Unknown               Unknown  Unknown
libmpi.so.12       00002B9AFB03FBDC  Unknown               Unknown  Unknown
libmpi.so.12       00002B9AFB03FB0C  Unknown               Unknown  Unknown
libmpi.so.12       00002B9AFB1F6A1A  Unknown               Unknown  Unknown
libmpi.so.12       00002B9AFB14409A  Unknown               Unknown  Unknown
libmpifort.so.12   00002B9AFAC3549A  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000000547550  MAIN__                    177  cl_solver_CHermitian_f90.f
cl_solver_CHermit  0000000000546B1E  Unknown               Unknown  Unknown
libc.so.6          0000003D4661ED1D  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000000546A29  Unknown               Unknown  Unknown
forrtl: error (69): process interrupted (SIGINT)
Image              PC                Routine            Line        Source            
cl_solver_CHermit  0000000004FAF5D5  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004FAD1F7  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F7CAC4  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F7C8D6  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F5D9D6  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000004F38D0E  Unknown               Unknown  Unknown
libpthread.so.0    0000003D46A0F710  Unknown               Unknown  Unknown
libmpi.so.12       00002AF1ECB53862  Unknown               Unknown  Unknown
libmpi.so.12       00002AF1ECC5EE39  Unknown               Unknown  Unknown
libmpi.so.12       00002AF1ECC5EB32  Unknown               Unknown  Unknown
libmpi.so.12       00002AF1ECB3844A  Unknown               Unknown  Unknown
libmpi.so.12       00002AF1ECB37D5D  Unknown               Unknown  Unknown
libmpi.so.12       00002AF1ECB37BDC  Unknown               Unknown  Unknown
libmpi.so.12       00002AF1ECB37B0C  Unknown               Unknown  Unknown
libmpi.so.12       00002AF1ECCEEA1A  Unknown               Unknown  Unknown
libmpi.so.12       00002AF1ECC3C09A  Unknown               Unknown  Unknown
libmpifort.so.12   00002AF1EC72D49A  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000000547550  MAIN__                    177  cl_solver_CHermitian_f90.f
cl_solver_CHermit  0000000000546B1E  Unknown               Unknown  Unknown
libc.so.6          0000003D4661ED1D  Unknown               Unknown  Unknown
cl_solver_CHermit  0000000000546A29  Unknown               Unknown  Unknown

 

0 Kudos
Reply