Solved: Re: MKL Krylov-Schur iterative routine error for large system size

kumar__aman · 04-06-2021

Dear Developers

I am using the Intel MKL Krylov-Schur algorithm through the "mkl_sparse_d_ev(&which, pm, A, descr, k0, &k, E, X, res)" routine.

It correctly gives me the k lowest or highest eigenstates for small system sizes. But it fails when I increase the system size, say to N = 745472 with n = 45846528 nonzero elements, which is exactly 61.5 * N. I am trying to compute the first 4000 eigenvalues, and it fails with the error

"mkl_sparse_d_ev output info 5".

I compile the code with the ILP64 interface. I don't think this is a memory issue. Please help me with this, and let me know if you need the code to reproduce it.
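A quick integer-range check with the sizes from the post illustrates why the ILP64 interface matters for this problem. The sketch assumes the eigenvector output X holds k0 vectors of length N. nnz fits in a signed 32-bit integer, but N * k0 does not, so any 32-bit index computed over all of X would overflow.

```python
# Integer-range check for the problem sizes quoted in this thread.
# LP64 uses a 32-bit MKL_INT; ILP64 uses a 64-bit MKL_INT.
# ASSUMPTION: X stores k0 eigenvectors of length N (N * k0 entries).
INT32_MAX = 2**31 - 1

N = 745_472          # matrix dimension from the post
NNZ = 45_846_528     # number of stored nonzeros from the post
K0 = 4_000           # number of requested eigenpairs


def fits_int32(value: int) -> bool:
    """True if value can be represented by a signed 32-bit integer."""
    return -(2**31) <= value <= INT32_MAX


sizes = {
    "N": N,
    "nnz": NNZ,
    "N * k0 (entries of X)": N * K0,
}

if __name__ == "__main__":
    for name, value in sizes.items():
        print(f"{name:>24} = {value:>13,d}  fits in int32: {fits_int32(value)}")
    print(f"average nonzeros per row = {NNZ / N}")
```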

Thanks

aman

Kirill_V_Intel · 04-06-2021

Hi!

Having the code will help us provide a quicker and more confident answer. Based on your description alone, there are several things inside the eigensolver that could lead to the output you got.
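The "info 5" from the original post can at least be mapped to a status name. This assumes it is the sparse_status_t value returned by mkl_sparse_d_ev (the printing code is not shown in the thread), and that the enum values in mkl_spblas.h run from 0 to 6 in the order listed below; please check them against the header of your MKL version.

```python
# Minimal sketch: map a sparse_status_t return code to its enumerator name.
# ASSUMPTION: values follow the sparse_status_t enum in mkl_spblas.h (0..6);
# verify against the header shipped with your MKL version.
SPARSE_STATUS_NAMES = {
    0: "SPARSE_STATUS_SUCCESS",
    1: "SPARSE_STATUS_NOT_INITIALIZED",
    2: "SPARSE_STATUS_ALLOC_FAILED",
    3: "SPARSE_STATUS_INVALID_VALUE",
    4: "SPARSE_STATUS_EXECUTION_FAILED",
    5: "SPARSE_STATUS_INTERNAL_ERROR",
    6: "SPARSE_STATUS_NOT_SUPPORTED",
}


def describe_status(code: int) -> str:
    """Return the enumerator name for a status code, or a fallback string."""
    return SPARSE_STATUS_NAMES.get(code, f"unknown status {code}")


if __name__ == "__main__":
    print(describe_status(5))  # SPARSE_STATUS_INTERNAL_ERROR
```

Under that assumption, 5 is SPARSE_STATUS_INTERNAL_ERROR, which points to a failure inside the library rather than to an invalid argument.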

Please share a reproducer (code plus compilation/linking commands) if possible. Also, please specify which version of MKL/oneMKL you are using, if known.

Thanks,
Kirill

kumar__aman · 04-06-2021

Thanks for the reply!

Here I am attaching a very small code. First, I load the row, column, and value arrays in CSR format with zero-based indexing, and then I pass them to the Krylov-Schur routine mentioned above.
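The CSR layout described here can be sanity-checked before it is handed to MKL. The sketch below is plain Python with hypothetical helper names (not part of the attached code): it builds zero-based CSR arrays from a small dense matrix and checks the invariants zero-based CSR must satisfy. In the C code, a three-array CSR like this is typically wrapped with mkl_sparse_d_create_csr, using ia as rows_start and ia + 1 as rows_end.

```python
from typing import List, Tuple


def dense_to_csr(dense: List[List[float]]) -> Tuple[List[int], List[int], List[float]]:
    """Convert a dense row-major matrix to zero-based CSR arrays (ia, ja, a)."""
    ia, ja, a = [0], [], []
    for row in dense:
        for col, value in enumerate(row):
            if value != 0.0:
                ja.append(col)     # zero-based column index
                a.append(value)
        ia.append(len(a))          # row i ends where row i + 1 starts
    return ia, ja, a


def check_csr_zero_based(n_rows, n_cols, ia, ja, a):
    """Raise ValueError if the arrays are not a consistent zero-based CSR matrix."""
    if len(ia) != n_rows + 1:
        raise ValueError("row pointer array must have n_rows + 1 entries")
    if ia[0] != 0:
        raise ValueError("zero-based CSR must start with ia[0] == 0")
    if any(ia[i] > ia[i + 1] for i in range(n_rows)):
        raise ValueError("row pointers must be non-decreasing")
    if ia[-1] != len(ja) or len(ja) != len(a):
        raise ValueError("ia[-1], len(ja) and len(a) must all equal nnz")
    if any(not 0 <= j < n_cols for j in ja):
        raise ValueError("column index out of range for zero-based indexing")


if __name__ == "__main__":
    ia, ja, a = dense_to_csr([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
    check_csr_zero_based(3, 3, ia, ja, a)
    print(ia, ja)  # [0, 2, 5, 7] [0, 1, 0, 1, 2, 1, 2]
```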

I am using the ctypes library to call the C code from Python: first I compile my C code (example.c), and then I use it in the Python script test.py (please change the extension to .py).
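How test.py hands its arrays to example.c is not shown in the thread. Assuming the integer arrays are passed directly to a C function built against ILP64 MKL, they must contain 64-bit integers, because MKL_INT is 64 bits under ILP64. A minimal ctypes sketch (helper names are hypothetical):

```python
import ctypes

# Under the ILP64 interface MKL_INT is a 64-bit integer, so every integer
# array handed to MKL (row pointers, column indices, pm, k) must hold
# 64-bit values. ctypes.c_int64 matches that; ctypes.c_int (32-bit on
# mainstream platforms) would be misread by a library expecting 64 bits.
MKL_INT = ctypes.c_int64


def as_mkl_int_array(values):
    """Copy a sequence of Python ints into a contiguous ILP64 MKL_INT array."""
    return (MKL_INT * len(values))(*values)


def as_double_array(values):
    """Copy a sequence of floats into a contiguous C double array."""
    return (ctypes.c_double * len(values))(*values)


if __name__ == "__main__":
    ia = as_mkl_int_array([0, 2, 5, 7])
    print(ctypes.sizeof(ia) // len(ia))  # 8 bytes per entry under ILP64
```

If the arrays come from a NumPy .npz file, the same applies: convert the index arrays with astype(numpy.int64) before passing their pointers.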

I am also attaching my compilation script (please change the extension to .sh).

Please download the array_file_npz file from Google Drive; it is too large (525 MB) to attach here.

First, I compile my C code using

./compile.sh

Then I run my Python code, which uses the Intel MKL code:

python test.py

Please let me know if there is any issue.

Thanks

aman

kumar__aman · 04-07-2021

Dear Developers

As you mentioned, there can be many reasons for that error. Can you list all possible ones?

Kirill_V_Intel · 04-07-2021

Hi,

I'd rather not go into the possible reasons, as they are all quite specific to the internal code; until I see the exact cause, it is hard to say whether anything can be done on your side.

A small update: I've run your reproducer and reproduced the failure. I'll need a second look to identify the cause, but with GNU threading the execution ends in a segmentation fault after consuming 30+ GB of RAM. With Intel OpenMP, the execution finishes without any errors (and also consumes about 30 GB of RAM).

So I take back my words: memory can actually be an issue. How much RAM does your system have? If it is less than or close to 30 GB, that could be the reason for the failure.
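A rough estimate of the buffers the caller allocates (sizes from the original post) helps put the 30 GB figure in context. It assumes double-precision values, ILP64 indices, three-array CSR storage, X holding k0 eigenvectors of length N, and E and res holding k0 doubles each; the solver's internal workspace is not counted.

```python
# Back-of-the-envelope memory estimate for the arrays around mkl_sparse_d_ev,
# using the sizes quoted in this thread.
# ASSUMPTIONS: double-precision values, 64-bit (ILP64) integer indices,
# three-array CSR, X holding k0 eigenvectors of length N, E and res holding
# k0 doubles each. Solver-internal workspace is NOT counted.
N = 745_472
NNZ = 45_846_528
K0 = 4_000
DOUBLE = 8   # bytes
MKL_INT = 8  # bytes under ILP64


def estimate_bytes(n, nnz, k0):
    """Return a dict of estimated buffer sizes in bytes."""
    return {
        "CSR values": nnz * DOUBLE,
        "CSR column indices": nnz * MKL_INT,
        "CSR row pointers": (n + 1) * MKL_INT,
        "eigenvectors X": n * k0 * DOUBLE,
        "eigenvalues E + residuals res": 2 * k0 * DOUBLE,
    }


if __name__ == "__main__":
    parts = estimate_bytes(N, NNZ, K0)
    for name, size in parts.items():
        print(f"{name:>30}: {size / 1e9:8.3f} GB")
    print(f"{'total':>30}: {sum(parts.values()) / 1e9:8.3f} GB")
```

Under these assumptions X alone takes about 23.9 GB and everything listed totals about 24.6 GB, so most of the ~30 GB observed would be explained by the requested output rather than by the matrix itself.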

Best,
Kirill

kumar__aman · 04-07-2021

Thank you for reproducing the result. My machine has up to 180 GB of RAM, so I don't think this error is due to a lack of memory. As you also observed, the run failed after consuming about 30 GB of RAM.

Thanks

aman

kumar__aman · 04-10-2021

Were you able to reproduce this error on your side, or is it specifically a memory issue?

Kirill_V_Intel · 04-11-2021

Hi!

We need more time to investigate. Several things you can try on your side:

1) Try forcing the Krylov-Schur method by setting pm[2] = 1. Currently, with k0 = 4000, the eigensolver decides internally to use the FEAST algorithm (so the title of your post is imprecise: the Krylov-Schur algorithm is not used in your case).

2) Is there a particular reason you prefer GNU threading and the gcc compiler? If possible, try icc with Intel OpenMP. In general, the latter combination performs better (as far as I know, GNU OpenMP in particular is known to be slower).

Best,
Kirill

kumar__aman · 04-13-2021

Hi

1.) Thanks for the suggestion. I thought it used only Krylov-Schur when computing extremal eigenvalues. This time I am forcing it to use Krylov-Schur. Is there a way to know which algorithm is being used while the code runs?

2.) Also, can you tell me how to modify my compile.sh file so that it uses Intel OpenMP instead of GNU threading? I will replace gcc with icc.

Thanks

Kirill_V_Intel · 04-19-2021

Hi!

Sorry for the delay.

1) The parameter pm[2] defines the method to use (0 is the automatic choice, 1 is Krylov-Schur, and 2 is FEAST). I believe pm[2] was set to 0 in your code.
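On the C side this amounts to calling mkl_sparse_ee_init(pm) and then setting pm[2] = 1 before calling mkl_sparse_d_ev. If pm were instead allocated from Python via ctypes (a hypothetical setup; the thread does not say where pm lives), the same selection could look like the sketch below. It only reports the method requested through pm[2]; it cannot tell which method the automatic choice (pm[2] = 0) ends up using.

```python
import ctypes

MKL_INT = ctypes.c_int64   # ILP64; a 32-bit integer type would match LP64 builds
PM_LENGTH = 128            # pm is a 128-element MKL_INT array

# Values of pm[2] as described in this thread.
PM2_METHODS = {0: "automatic choice", 1: "Krylov-Schur", 2: "FEAST"}


def set_method(pm, method):
    """Set pm[2] to select the eigensolver method. pm should already hold
    the defaults written by mkl_sparse_ee_init (called on the C side)."""
    if method not in PM2_METHODS:
        raise ValueError(f"pm[2] must be one of {sorted(PM2_METHODS)}")
    pm[2] = method
    return pm


def requested_method(pm):
    """Describe the method requested via pm[2] (not what 'automatic' picks)."""
    return PM2_METHODS.get(pm[2], f"unknown value {pm[2]}")


if __name__ == "__main__":
    pm = (MKL_INT * PM_LENGTH)()   # zero-filled stand-in for an initialized pm
    print(requested_method(pm))    # automatic choice
    set_method(pm, 1)
    print(requested_method(pm))    # Krylov-Schur
```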

2) For correct linking and to avoid possible mistakes, please refer to the oneMKL Link Line Advisor: https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl/link-line-advisor.html

The recipe from Gennady has several typos and is missing a compilation flag.

3) UPDATE: We have identified an issue in the FEAST path of the eigensolver that was likely causing the segfault on your side. The most stable workaround is to construct a generalized eigenvalue problem with the identity matrix as the second matrix and call mkl_sparse_d_gv instead of mkl_sparse_d_ev.
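The workaround relies on the fact that the generalized problem A x = λ B x with B = I has the same eigenpairs as A x = λ x. The sketch below builds zero-based CSR arrays for the identity and estimates their size for N = 745472, assuming 8-byte (ILP64) indices; wrapping them into a sparse handle and passing it as the second matrix to mkl_sparse_d_gv happens on the C side and is not shown.

```python
def identity_csr(n):
    """Zero-based CSR arrays (ia, ja, a) for the n x n identity matrix."""
    ia = list(range(n + 1))   # row i holds exactly one entry
    ja = list(range(n))       # that entry sits on the diagonal
    a = [1.0] * n
    return ia, ja, a


def csr_bytes(ia, ja, a, int_bytes=8, value_bytes=8):
    """Approximate storage of CSR arrays, assuming ILP64 (8-byte) indices."""
    return (len(ia) + len(ja)) * int_bytes + len(a) * value_bytes


if __name__ == "__main__":
    print(identity_csr(3))                         # ([0, 1, 2, 3], [0, 1, 2], [1.0, 1.0, 1.0])
    print(csr_bytes(*identity_csr(745_472)))       # 17891336
```

For N = 745472 the identity needs about 17.9 MB, which is negligible next to A.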

Note: even with the fix, the FEAST algorithm took a very long time to process your matrix (I did not wait to see whether it finishes). So I suggest switching to the Krylov-Schur method anyway.

Let us know if you have further questions.

Best,
Kirill

kumar__aman · 04-13-2021

I have another question: can I use Krylov-Schur on an NVIDIA GPU? If so, are there any basic tutorial codes available for it?

Gennady_F_Intel · 04-13-2021

Replace ${MKLROOT}/lib/intel64/libmkl_gnu_thread.a with ${MKLROOT}/lib/intel64/libmkl_intel_thread.a.

Replace -fopenmp with -liomp5.

It is not possible to run this code on an NVIDIA GPU at the moment.

Gennady_F_Intel · 07-03-2021

Hi,

The fix for this issue is available in the official MKL 2021.3 release, which is now ready for download. Please try it and let us know the results.

Thanks, Gennady

Gennady_F_Intel · 07-08-2021

The issue is closing and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.