- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Dear all,

unfortunately, I have again some troubles with the mkl_cluster_sparse_solver as in my previous topic. I have taken one of the examples intel provides in the example dir of mkl and modified it in two ways: on the one hand the code can now read an arbitrary matrix stored in the file fort.110 and on the other hand I perform a loop over the routines since I want to change the matrix within one cycle later on. The first problem arises when treating large system sizes.

In this case, you can find the matrix in fort1.zip. The program aborts with a segmentation fault after 18%: forrtl: severe (174): SIGSEGV, segmentation fault occurred. Unfortunately, this is somehow hard to track down what is the issue but it must be in the subroutine since it starts. As I said this happens for large matrices. Unfortunately I dont know how to get rid of this problem.

The next problem occurs for small matrices as found in fort.zip. The problem seems to be the loop: the first cycle everything works fine but the second cycle aborts with an error message I have already seen in one of my last topics:

Fatal error in PMPI_Reduce: Message truncated, error stack:

PMPI_Reduce(2334).................: MPI_Reduce(sbuf=0x7d7d7f8, rbuf=0x7f0b900, count=22912, MPI_DOUBLE, MPI_SUM, root=0, comm=0x84000004) failed

MPIR_Reduce_impl(1439)............: fail failed

I_MPIR_Reduce_intra(1533).........: Failure during collective

MPIR_Reduce_intra(1201)...........: fail failed

MPIR_Reduce_Shum_ring(833)........: fail failed

MPIDI_CH3U_Receive_data_found(131): Message from rank 1 and tag 11 truncated; 14000 bytes received but buffer size is 1296

I have tried what I did the last time: provide all parameters (nhrs, msglevel, iparm, ..) for all ranks again but it does not seem to fix the issue.

This is the program code (cl_solver_f90.f90):

program cluster_sparse_solver use mkl_cluster_sparse_solver implicit none include 'mpif.h' integer, parameter :: dp = kind(1.0D0) !.. Internal solver memory pointer for 64-bit architectures TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE) :: pt(64) integer maxfct, mnum, mtype, phase, nrhs, error, msglvl, i, ik, l1, k1, idum(1), DimensionL, Nsparse integer*4 mpi_stat, rank, num_procs double precision :: ddum(1) integer, allocatable :: IA( : ), JA( : ), iparm( : ) double precision, allocatable :: VAL( : ), rhodot( : ), rho( : ) integer(4) MKL_COMM MKL_COMM=MPI_COMM_WORLD call mpi_init(mpi_stat) call mpi_comm_rank(MKL_COMM, rank, mpi_stat) do l1 = 1, 64 pt(l1)%dummy = 0 end do error = 0 ! initialize error flag msglvl = 1 ! print statistical information mtype = 11 ! real, non-symmetric nrhs = 1 maxfct = 1 mnum = 1 allocate(iparm(64)) do l1 = 1, 64 iparm(l1) = 0 end do !Setup PARDISO control parameter iparm(1) = 1 ! do not use default values iparm(2) = 3 ! fill-in reordering from METIS iparm(8) = 100 ! Max. number of iterative refinement steps on entry iparm(10) = 13 ! perturb the pivot elements with 1E-13 iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS iparm(13) = 1 ! Improved accuracy using nonsymmetric weighted matching iparm(27) = 1 ! checks whether column indices are sorted in increasing order within each row read(110,*) DimensionL, Nsparse allocate(VAL(Nsparse),JA(Nsparse),IA(DimensionL)) if (rank.eq.0) then do k1=1,Nsparse read(110,*) VAL(k1) end do do k1=1,DimensionL+1 read(110,*) IA(k1) end do do k1=1,Nsparse read(110,*) JA(k1) end do end if allocate(rhodot(DimensionL), rho(DimensionL)) if (rank.eq.0) then rhodot=0.0d0 rhodot(1) = 1.0d0 rho=0.0d0 end if if (rank.eq.0) write(*,*) 'INIT PARDISO' ik = 0 Pardisoloop: do ik = ik + 1 phase = 12 call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error ) if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error phase = 33 call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, MKL_COMM, error ) if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error if (ik.ge.4) exit Pardisoloop end do Pardisoloop call MPI_BARRIER(MKL_COMM,mpi_stat) phase = -1 call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error ) if (error.ne.0.and.rank.eq.0) write(*,*) 'Release of memory: ', error call mpi_finalize(mpi_stat) end

I compile with

mpiifort -i8 -I${MKLROOT}/include -c -o mkl_cluster_sparse_solver.o ${MKLROOT}/include/mkl_cluster_sparse_solver.f90

mpiifort -i8 -I${MKLROOT}/include -c -o cl_solver_f90.o cl_solver_f90.f90

mpiifort mkl_cluster_sparse_solver.o cl_solver_f90.o -o MPI.out -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

and run the program with mpiexec -n 2 ./MPI.out. Our cluster has 16 cores per node and I request two of them. Ram should not be the problem (64gb), since it perfectly runs with the normal pardiso on just one node. I set export MKL_NUM_THREADS=16. Am I right that the slave MPI process should automatically obtain parts of the LL^T factors or do I have to use the distributed version in order to do so? The reason why I ask is that I cannot observe any process running on the second node.

The Versions are: MKL version: 2017.4.256, Ifort version: 17.0.6.256, IMPI version: 2017.4.239, but my college can also reproduce the issue on other versions/clusters.

Thanks in advance,

Horst

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Just an update since I somehow proceed with the problem

I think the problem of the segmentation fault is somehow connected to openmp, since I sometimes get this error message:

libiomp5.so 00002B71618FFD43 __kmp_invoke_micr Unknown Unknown

libiomp5.so 00002B71618CE7AD __kmp_fork_call Unknown Unknown

libiomp5.so 00002B71618A682E __kmpc_fork_call Unknown Unknown

This error is obtained when running valgrind --tool=memcheck --track-origins=yes --leak-check=yes -v --leak-check=full --show-reachable=yes. However, the rest of the output does (at least to me) not give any further information about what the error could be.

The problem of the loop can sometimes be solved by calling

call MPI_Bcast(pt,64,MPI_INTEGER, 0, MKL_COMM,mpi_stat)

sometimes it still gives an error. Moreover, I am not sure if I can update the pt like this, since it is in principle not an integer.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello Horst,

The documentation for the cluster sparse solver (https://software.intel.com/en-us/mkl-developer-reference-fortran-cluster-sparse-solver) says, that the entries of pt must be set to zero before the first call. Thus, in-between the loop iterations you need to set pt elements to 0. I quickly checked that this seems to fix the issue. Let me know if you still see them and under which conditions then.

As I understand your intention is to change the matrix between the loop iterations. How exactly do you want to re-use the previous calls to the cluster sparse solver? If you just want to call the entire functionality again, starting with reordering, then you should consider the next loop as calling cluster sparse solver "for the first time" and again set pt entries to 0. If you want to do something else, e.g., re=using the existing factorization (as approximate inverse for the modified matrix), you can use only the solve step (with iterative refinement turned on) and then you should not do anything about the pt.

Notice that you trick with broadcasting pt doesn't do the right thing.

Hope this helps!

Best,

Kirill

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Thank you very much for your response.

I also checked it and yes it fixes the issue. Yes, my idea is to use the iterative refinement or when needed the reordering and factorization. I interpreted the first call as the very first call before the loop starts. However, not setting iparm=0 for pardiso_64 works fine. Thats the reason why I didn't think about doing this again. Thank you for this information.

Concerning the segmentation fault, I have seen a strange behavior in my further investigations: the program calculates to a higher percentage of LL^T factors when I set export OMP_NUM_THREADS=1 instead of export OMP_NUM_THREADS=16. For sure, the calculations last longer but the segmentation fault occurs later. Then I think openmp is not the issue (as I thought) since in the case of one thread it does not work either.

Best,

Horst

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello Horst,

To avoid misunderstanding, am I right that the problem with the smaller matrix has been solved now? So, the segfault happens for the larger matrix, right?

As for the percentage computed before the segfault, it's normal that the numbers are different for different thread number.

Thanks,

Kirill

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello Kirill,

thank you for your response. Yes, the problem with the smaller matrix has been solved. And yes, the segfault still happens for the larger matrix. Do you have any idea how I can track down the problem of this segfault? Thanks in advance.

Best,

Horst

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello Horst,

We'll investigate the issue. As a workaround I suggest temporarily switching to the different code path by disabling matching, scaling and distributed reordering. That is, set iparm(2) = 2, iparm(11) = 0 and iparm(13) = 0.

Best,

Kirill

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello Kirill,

thank you for your suggestions. Unfortunately, I cannot report that this workaround works. The segfault still occurs. I also tried some combination of it but it changed nothing.

Thanks for investigating this problem.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello Horst,

Can you share the code, run settings and the output for your last statement? I'm kinda surprised that the segfault still exists, I don't observe it in my runs.

Thanks,

Kirill

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hey Kirill,

for sure, here is the code:

program cluster_sparse_solver use mkl_cluster_sparse_solver implicit none include 'mpif.h' integer, parameter :: dp = kind(1.0D0) !.. Internal solver memory pointer for 64-bit architectures TYPE(MKL_CLUSTER_SPARSE_SOLVER_HANDLE) :: pt(64) integer maxfct, mnum, mtype, phase, nrhs, error, msglvl, i, ik, l1, k1, idum(1), DimensionL, Nsparse integer*4 mpi_stat, rank, num_procs double precision :: ddum(1) integer, allocatable :: IA( : ), JA( : ), iparm( : ) double precision, allocatable :: VAL( : ), rhodot( : ), rho( : ) integer(4) MKL_COMM MKL_COMM=MPI_COMM_WORLD call mpi_init(mpi_stat) call mpi_comm_rank(MKL_COMM, rank, mpi_stat) do l1 = 1, 64 pt(l1)%dummy = 0 end do error = 0 ! initialize error flag msglvl = 1 ! print statistical information mtype = 11 ! real, non-symmetric nrhs = 1 maxfct = 1 mnum = 1 allocate(iparm(64)) do l1 = 1, 64 iparm(l1) = 0 end do !Setup PARDISO control parameter iparm(1) = 1 ! do not use default values iparm(2) = 2 ! fill-in reordering from METIS iparm(8) = 100 ! Max. number of iterative refinement steps on entry iparm(10) = 13 ! perturb the pivot elements with 1E-13 iparm(27) = 1 ! checks whether column indices are sorted in increasing order within each row read(110,*) DimensionL, Nsparse allocate(VAL(Nsparse),JA(Nsparse),IA(DimensionL)) if (rank.eq.0) then do k1=1,Nsparse read(110,*) VAL(k1) end do do k1=1,DimensionL+1 read(110,*) IA(k1) end do do k1=1,Nsparse read(110,*) JA(k1) end do end if allocate(rhodot(DimensionL), rho(DimensionL)) if (rank.eq.0) then rhodot=0.0d0 rhodot(1) = 1.0d0 rho=0.0d0 end if if (rank.eq.0) write(*,*) 'INIT PARDISO' phase = 12 call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error ) if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error phase = 33 call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, VAL, IA, JA, idum, nrhs, iparm, msglvl, rhodot, rho, MKL_COMM, error ) if (error.ne.0.and.rank.eq.0) write(*,*) 'ERROR: ', error call MPI_BARRIER(MKL_COMM,mpi_stat) phase = -1 call cluster_sparse_solver_64 ( pt, maxfct, mnum, mtype, phase, DimensionL, ddum, idum, idum, idum, nrhs, iparm, msglvl, ddum, ddum, MKL_COMM, error ) if (error.ne.0.and.rank.eq.0) write(*,*) 'Release of memory: ', error call mpi_finalize(mpi_stat) end

I compile the program with

mpiifort -i8 -I${MKLROOT}/include -c -o mkl_cluster_sparse_solver.o ${MKLROOT}/include/mkl_cluster_sparse_solver.f90

mpiifort -i8 -I${MKLROOT}/include -c -o cl_solver_f90.o cl_solver_f90.f90

mpiifort mkl_cluster_sparse_solver.o cl_solver_f90.o -o MPI.out -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

and run the program with mpiexec -n 2 ./MPI.out. The output is the following:

INIT PARDISO

Percentage of computed non-zeros for LL^T factorization

1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 %

and the segfault is:

forrtl: severe (174): SIGSEGV, segmentation fault occurred

forrtl: severe (174): SIGSEGV, segmentation fault occurred

forrtl: severe (174): SIGSEGV, segmentation fault occurred

forrtl: severe (174): SIGSEGV, segmentation fault occurred

If you need any further information, please let me know.

Best

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

This one, with advanced options turned off, I cannot reproduce with the latest versions of Intel MKL, though I did observe the failure for MKL 2017 Update 4. Since that release we've had a lot of updates for the cluster sparse solver.

Best,

Kirill

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello,

thank you for reply. Unfortunately, I have no access to the latest version. I have tried it up to the 2018 version. Then we need to update the compiler.

Thank you very much!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi Horst,

The fix of this issue available in MKL v.2020 ( the Initial version) and MKL 2019 u5. You may check these versions and let us know how it works on your side.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page