Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Calling pardiso when using coarray Fortran

xulisdu
Beginner

Hi,

 

I wonder what the most efficient way to call pardiso from coarray Fortran is. For example, my laptop has 6 cores and 12 logical processors. I set the number of images (num_images()) to 6 and call pardiso in the following way:

 

if (this_image() == 1) then
      call pardiso                     ! call pardiso on image 1 only
      sync images(*)
else
      sync images(1)                   ! the other images wait for pardiso to finish
end if
call co_broadcast(<solution>, 1)       ! broadcast the solution from image 1 to images 2-6

 

It works. But is this the best way to use pardiso in this case? I feel that this way pardiso cannot make full use of the CPU. I mean, some processors are occupied by images 2-6, so pardiso cannot use them, and those images cannot do anything else either, because their next tasks depend on the result from pardiso. Is there a better way to call pardiso in such a case?

Using the sparse solver efficiently is important for our project.

 

Thanks and best regards, 

Xu

VidyalathaB_Intel
Moderator

Hi Xu,


Thanks for reaching out to us.


>>...pardiso can not make full use of the cpu

Could you please let us know how you are compiling the code? We ask because, by default, Intel® oneAPI Math Kernel Library uses a number of OpenMP threads equal to the number of physical cores on the system.


>>Is there a better way to call pardiso in such a case? Using sparse solver efficiently is important for our project.

Please provide us with a sample reproducer code, the steps to compile and run it, your OS details, the MKL version being used, and the timings that you are getting, so that we can try the same from our end.


Regards,

Vidya.


VidyalathaB_Intel
Moderator

Hi Xu,


As we haven't heard back from you, could you please provide us with an update regarding the issue? Please get back to us with the above-mentioned details if the issue still persists.


Regards,

Vidya.


xulisdu
Beginner

Hi Vidya,

 

I compile the code both with Visual Studio 2022 (on Windows 10) and on my workstation (Linux).

 

I made a short sample, which you can find in the attachment. On Windows 10 you can open the .sln file directly and then run it. On Linux I use the following commands:

 

mpiifort -c -O3 -coarray -coarray-num-images=9 main.f90 sparseMatrix.f90

 mpiifort -zero -coarray -coarray-num-images=9 sparseMatrix.o main.o -o main -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread

 

./main

...

 

It seems that in both cases pardiso automatically runs on 1 OpenMP thread. The ifort version is 2021.6 and the MKL version is 2022.1.0.

 

Thanks and best regards,

 

Xu

VidyalathaB_Intel
Moderator

Hi Xu,


Thanks for providing us the details.

The issue is reproducible on our end: even after setting MKL to parallel, it still runs on 1 OpenMP thread.

We are working on this issue and will get back to you soon.


Meanwhile, could you please try setting the MKL_NUM_THREADS environment variable, then run the code again and check whether the issue still persists?

Eg:

> set MKL_NUM_THREADS=4
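
The `set` form above is the Windows command-prompt syntax. For the Linux build described earlier in this thread, a minimal sketch of the same setting might look like the following. It assumes a bash shell and the `./main` executable name from the link command; the value 4 is only the example value used here, not a recommendation.

```shell
#!/usr/bin/env bash
# Sketch only: Linux (bash) counterpart of the Windows `set` command.
# Assumption: the program was linked as ./main, as in the build
# commands posted earlier in the thread.

# Set the MKL thread count for every program started from this shell.
export MKL_NUM_THREADS=4
echo "MKL_NUM_THREADS=${MKL_NUM_THREADS}"

# Run the example program only if it has been built in this directory.
if [ -x ./main ]; then
    ./main
fi
```

Because the variable is exported, it stays in effect for later commands in the same shell session until it is changed or unset.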


Regards,

Vidya.


VidyalathaB_Intel
Moderator

Hi Xu,


As we haven't heard back from you, could you please provide us with an update regarding the issue?


Regards,

Vidya.


xulisdu
Beginner

Hi Vidya,

 

I am sorry for my late reply. Setting MKL_NUM_THREADS=4 really works. However, what is weird is that setting it to 4 is slower than setting it to 2. My laptop has 6 cores. This is actually the real reason why I asked the question. I wonder what the optimal choice of MKL_NUM_THREADS is in a coarray environment. Does it depend on the number of images set for coarray Fortran?

 

Thanks and best regards,

 

Xu

VidyalathaB_Intel
Moderator

Hi Xu,


Thanks for getting back to us.

>>4 is slower than setting it to 2. My laptop has 6 cores.

Could you please let us know the timings that you are getting with 4 threads and 2 threads?
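
One way to collect such timings on Linux might be a small script like the sketch below. It runs the same command once per thread count and prints the wall-clock time of each run. The helper name `sweep_mkl_threads` and the thread counts 1, 2, 4 and 6 are illustrative assumptions, `./main` is the executable name from the build commands in this thread, and GNU `date` is assumed for the nanosecond field.

```shell
#!/usr/bin/env bash
# Sketch only: run one command at several MKL_NUM_THREADS values and
# report the wall-clock time of each run.
# sweep_mkl_threads is a hypothetical helper, not part of oneMKL.
# Assumes GNU date (for the %N nanosecond field), as found on Linux.

sweep_mkl_threads() {
    for n in 1 2 4 6; do                  # illustrative thread counts
        start=$(date +%s%N)
        # Set the variable for this one run only.
        env MKL_NUM_THREADS="$n" "$@"
        end=$(date +%s%N)
        echo "MKL_NUM_THREADS=$n: $(( (end - start) / 1000000 )) ms"
    done
}

# Time the example program if it has been built in this directory.
if [ -x ./main ]; then
    sweep_mkl_threads ./main
fi
```

The time reported covers the whole program, not only the pardiso call, so it is best compared across runs of the same input.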


Regards,

Vidya.


xulisdu
Beginner

Hi,

 

The timing with 4 threads is nearly twice as long as with 2 threads. On the other hand, I plan to close this question. Anyway, thank you very much for your kind help!

 

Best regards,

 

Xu
