Intel® oneAPI Math Kernel Library

Extended Eigenvalue Solver hanging

asd__asdqwe
Beginner

Hello,

I'm trying to solve N independent generalized eigenvalue problems. The following piece of code hangs on my computer when launched with mpirun -np N ./a.out for 1 < N < 9. I don't need the MPI version of the EES (and I know you don't support it as of update 2), just an SMP version that works independently on any MPI process (I don't have such a problem with PARDISO, for example). Can you reproduce this error? Is there a way to fix this issue?

Thank you for your help.

Sridevi_A_Intel
Employee

Dear Customer,

What link line are you using? Are you running this program on any specific machine? I'll try to reproduce the issue and get back to you soon.

Thanks,

Sridevi

asd__asdqwe
Beginner

Thanks for your help. Here are my specs: Debian 3.2.35-2 x86_64 GNU/Linux, icpc version 13.1.0.146 Build 20130121, MPICH2 version 1.4.1. The compile line is icpc FEAST_hang.cpp -I/usr/include/mpich2 -lmpi -lmkl_rt -lmkl_intel_thread -lmkl_mc -lmkl_intel_lp64 -lmkl_core -liomp5 -lifcore -limf

asd__asdqwe
Beginner

Hello,

Can you reproduce the error ?

asd__asdqwe
Beginner

Could you please tell me why nobody is answering?

Sridevi_A_Intel
Employee

Hello,

I'm using Composer XE 2013 Update 2 and the latest MPICH2, version 3.0.3, and ran the following command:

 icpc FEAST_hang.cpp -I/project/sallam1/mpich2/include -L/usr/lib64/openmpi/lib/libmpi.so.0 -lmkl_rt -lmkl_intel_thread -lmkl_mc -lmkl_intel_lp64 -lmkl_core -liomp5

It gave a warning: 

<<<

FEAST_hang.cpp(18): warning #592: variable "rank" is used before its value is set

oss << (int)rank;
>>>

When I ran mpirun -np 2 ./a.out (shell prompt: -bash-4.1$), the output was:

GO ! 0
GO ! 0
Extended Eigensolvers: double precision driver
Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default
Extended Eigensolvers: double precision driver
Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default
Extended Eigensolvers: fpm(1)=1
Extended Eigensolvers: fpm(6)=1
Extended Eigensolvers: fpm(1)=1
Extended Eigensolvers: fpm(6)=1
Search interval [0.000000000000000e+00;4.000000000000000e-01]
Search interval [0.000000000000000e+00;4.000000000000000e-01]
Extended Eigensolvers: Size subspace 100
Extended Eigensolvers: Size subspace 100
#Loop | #Eig | Trace | Error-Trace | Max-Residual
#Loop | #Eig | Trace | Error-Trace | Max-Residual
0,26,5.344662201251952e+00,1.000000000000000e+00,4.611205669551193e-07
0,26,5.344662201251954e+00,1.000000000000000e+00,4.649064852068096e-07
1,26,5.344662201251753e+00,4.996003610813204e-13,1.378371549897360e-13
Extended Eigensolvers has successfully converged (to desired tolerance)
DONE ! 0
1,26,5.344662201251755e+00,4.973799150320701e-13,1.384700328307968e-13
Extended Eigensolvers has successfully converged (to desired tolerance)
DONE ! 0

I don't see a hang here. Maybe the difference between our build versions is causing this?

Thanks,

Sridevi

asd__asdqwe
Beginner

Hello,

Thanks for your answer. The value of the variable "rank" is set on line 15, so there is definitely a problem with your compiler output (icpc version 13.1.0.146 build 20130121 does not produce such a warning). Moreover, at execution, it should print "GO ! 0", "GO ! 1", ..., "GO ! size - 1", not "GO ! 0" repeated once per process. In my case, I only see "DONE ! 0" at the end; all ranks other than the root hang.

Thanks in advance for your help.

Sridevi_A_Intel
Employee

Hello. Yes, you are right. The test case did hang for me too. Here is the output:

-bash-4.1$ mpirun -n 2 ./a.out
 GO ! 1
 GO ! 0
Extended Eigensolvers: double precision driver
Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default
Extended Eigensolvers: fpm(1)=1
Extended Eigensolvers: fpm(6)=1
Search interval [0.000000000000000e+00;4.000000000000000e-01]
Extended Eigensolvers: Size subspace 100
#Loop | #Eig  |    Trace     | Error-Trace |  Max-Residual
0,26,5.344662201251952e+00,1.000000000000000e+00,4.605558650066731e-07
1,26,5.344662201251757e+00,4.884981308350689e-13,1.379730242389445e-13
Extended Eigensolvers has successfully converged (to desired tolerance)
 DONE ! 0

It hangs after "DONE ! 0".

I'm escalating this issue to our engineering team by submitting a ticket. I'll update you on the status.

Thank you,

Sridevi

Vitaly_Lukinov
Beginner

Hi,

I reproduced the hang with an SMP version of your code, so it doesn't depend on MPI.

I found that the cause of the hang is an incorrect CSR representation of the matrix B in the file B_1.txt. If we look at B_1.txt, we can see that the array ib contains repeated values:

"1, 1, 2, 4, 4, 4, 6, 6, ..."

The Extended Eigenvalue Solver uses the same 3-array variation of the CSR format as PARDISO (please see the Intel MKL manual, appendix A). According to that format, this ib array is incorrect. Could you change the format of the B matrix and report the results?
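
For readers who want to check their own row-pointer array, here is a minimal C++ sketch. It is not taken from FEAST_hang.cpp and calls nothing from MKL. It assumes a one-based array laid out like the ib excerpt quoted above, where row i owns the entries ib[i] to ib[i+1]-1, and it reports the rows that store nothing (equal consecutive values). The names RowPointerReport and inspectRowPointers are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of inspecting a CSR row-pointer array (the "ib" array).
struct RowPointerReport {
    bool monotonic = true;               // false if any ib[i+1] < ib[i]
    std::vector<std::size_t> emptyRows;  // zero-based indices of rows with no stored entries
};

// Scan a row-pointer array of length n+1 (hypothetical helper, not an MKL call).
// Row i owns the entries ib[i] .. ib[i+1]-1, so ib[i] == ib[i+1] means
// that row i stores nothing at all, not even a diagonal entry.
inline RowPointerReport inspectRowPointers(const std::vector<int>& ib) {
    assert(!ib.empty());  // a valid row-pointer array has at least one element
    RowPointerReport report;
    for (std::size_t i = 0; i + 1 < ib.size(); ++i) {
        if (ib[i + 1] < ib[i]) {
            report.monotonic = false;       // decreasing pointers: malformed array
        } else if (ib[i + 1] == ib[i]) {
            report.emptyRows.push_back(i);  // equal pointers: empty row
        }
    }
    return report;
}
```

Applied to the eight values quoted from B_1.txt, inspectRowPointers flags rows 0, 3, 4 and 6 (zero-based) as empty.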

Thanks,

Vitaly

asd__asdqwe
Beginner

Hello Vitaly,

Thanks to your remark, I just saw that MKL apparently does not support empty rows in symmetric CSR. I guess I have to add dummy zeros to the arrays, then. Just out of curiosity, why?

Vitaly_Lukinov
Beginner

Hi,

I think there is some misunderstanding. MKL does support empty rows in the symmetric CSR format. But for the generalized problem Ax = λBx, B must be a real symmetric positive definite matrix (please see the Intel MKL manual, Extended Eigensolver Functionality). A real symmetric positive definite matrix cannot have zeros on its diagonal, which is what the zero rows in your case produce. If possible, please change the input matrices B_i so that they become positive definite.
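
The diagonal condition can be tested before calling the solver. The sketch below is a hedged illustration, not part of MKL: it assumes B is stored in the one-based 3-array CSR format (row pointers ib, column indices jb, values b) and lists every row that has no strictly positive diagonal entry. The function name rowsWithoutPositiveDiagonal is hypothetical. Note that this is only a necessary condition: a matrix that passes may still fail to be positive definite.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the zero-based indices of rows that lack a strictly positive
// diagonal entry. Assumes one-based CSR arrays (ib, jb, b); hypothetical
// helper for illustration, not an MKL routine.
inline std::vector<std::size_t> rowsWithoutPositiveDiagonal(
    const std::vector<int>& ib,
    const std::vector<int>& jb,
    const std::vector<double>& b) {
    assert(!ib.empty() && jb.size() == b.size());
    std::vector<std::size_t> bad;
    const std::size_t n = ib.size() - 1;  // number of rows
    for (std::size_t row = 0; row < n; ++row) {
        bool positiveDiagonal = false;
        // Entries of this row sit at one-based positions ib[row] .. ib[row+1]-1.
        for (int k = ib[row]; k < ib[row + 1]; ++k) {
            const std::size_t pos = static_cast<std::size_t>(k - 1);
            // jb is one-based, so the diagonal of this row has column index row+1.
            if (static_cast<std::size_t>(jb[pos]) == row + 1 && b[pos] > 0.0) {
                positiveDiagonal = true;
            }
        }
        if (!positiveDiagonal) {
            bad.push_back(row);  // empty row, missing diagonal, or diagonal <= 0
        }
    }
    return bad;
}
```

A non-empty result means B cannot be positive definite, so the matrix falls outside what the manual documents for the generalized problem.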

 

Thanks,

Vitaly

asd__asdqwe
Beginner

Hello,

Thanks again for your answer. I'm almost positive I successfully solved an SP indefinite generalized eigenvalue problem with FEAST RCI, even if it is not covered by the theory, because it only needs to factorize (zB - A), which in this case is SPD. Moreover, if you look at B_0.txt, there are also numerous empty rows, so it is not SPD either, but the method still converges. Any ideas?
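
To make the point about (zB - A) concrete, here is a small sketch on a made-up 2x2 example that does not come from B_0.txt or B_1.txt. It assumes A = diag(1, 2) and B = diag(1, 0), so B has an empty second row and is singular, yet the shifted matrix zB - A stays invertible for every z except z = 1. The type Sym2 and the function shiftedDeterminant are hypothetical names, and nothing here reflects how MKL factorizes the matrix internally.

```cpp
#include <complex>

// Symmetric 2x2 matrix stored by its three distinct entries.
struct Sym2 {
    double m11, m12, m22;
};

// Determinant of the shifted matrix z*B - A for symmetric 2x2 A and B.
// Hypothetical helper for illustration only.
inline std::complex<double> shiftedDeterminant(const Sym2& a, const Sym2& b,
                                               std::complex<double> z) {
    const std::complex<double> s11 = z * b.m11 - a.m11;
    const std::complex<double> s12 = z * b.m12 - a.m12;
    const std::complex<double> s22 = z * b.m22 - a.m22;
    return s11 * s22 - s12 * s12;  // symmetric, so s21 equals s12
}
```

With these two matrices the determinant is -2(z - 1), which vanishes only at z = 1, so a factorization of zB - A can succeed at other points even though B itself is not positive definite.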

Vitaly_Lukinov
Beginner

Hi,

Thanks a lot for the good discussion about our EE solver. First of all, I should say that, theoretically, matrix B corresponds to an energy norm, and the EE solver's algorithm is built on this norm. That is the main reason this matrix needs to be positive definite; in that case MKL works correctly. We know there is an issue with the EE solver hanging for certain indefinite matrices B, and we are working to resolve it. In any case, if matrix B is positive definite (as the MKL manual requires), we do not see any hang.

With best regards,

Vitaly Lukinov.
