Hi,
I wonder what the most efficient way to call PARDISO in Coarray Fortran is. For example, my laptop has 6 cores and 12 logical processors. I set num_images() to 6 and call PARDISO in the following way:
if (this_image() == 1) then
  call pardiso ! call pardiso on image 1 (arguments omitted)
  sync images(*)
else
  sync images(1) ! the other images wait for pardiso to finish
end if
call co_broadcast(<solution>, 1) ! broadcast the solution from image 1 to images 2-6
It works, but is this the best way to use PARDISO in this case? I feel that this way PARDISO cannot make full use of the CPU. The processors occupied by images 2-6 cannot be used by PARDISO, and they cannot do anything else either, because their next tasks depend on the result from PARDISO. Is there a better way to call PARDISO in such a case?
Using the sparse solver efficiently is important for our project.
Thanks and best regards,
Xu
Hi Xu,
Thanks for reaching out to us.
>>...pardiso can not make full use of the cpu
Could you please let us know how you are compiling the code?
By default, Intel® oneAPI Math Kernel Library uses a number of OpenMP threads equal to the number of physical cores on the system.
>>Is there a better way to call pardiso in such a case? Using sparse solver efficiently is important for our project.
Please provide a sample reproducer, the steps to compile and run it, your OS details, the MKL version you are using, and the timings you are getting, so that we can try the same on our end.
Regards,
Vidya.
Hi Xu,
As we haven't heard back from you, could you please provide us with an update regarding the issue? Please get back to us with the above-mentioned details if the issue still persists.
Regards,
Vidya.
Hi Vidya,
I compile the code both with Visual Studio 2022 (on Windows 10) and on my workstation (Linux).
I made a short sample, which you can find in the attachment. On Windows 10 you can open the .sln file directly and run it. On Linux I use the following commands:
mpiifort -c -O3 -coarray -coarray-num-images=9 main.f90 sparseMatrix.f90
mpiifort -zero -coarray -coarray-num-images=9 sparseMatrix.o main.o -o main -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread
./main
...
It seems that in both cases PARDISO automatically runs on only 1 OpenMP thread. The ifort version is 2021.6 and the MKL version is 2022.1.0.
Thanks and best regards,
Xu
Hi Xu,
Thanks for providing us the details.
We can reproduce the issue on our end: even after setting MKL to parallel, it still runs on 1 OpenMP thread.
We are working on this issue and will get back to you soon.
Meanwhile, could you please set the MKL_NUM_THREADS environment variable, run the code again, and check whether the issue still persists?
E.g.:
> set MKL_NUM_THREADS=4
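On Linux, where the sample is built with mpiifort, the bash equivalent of the Windows `set` command above would presumably be `export`. A minimal sketch, assuming a bash-like shell and that `./main` is then started from the same shell (the `echo` is only there to confirm the value):

```shell
# Linux (bash) counterpart of the Windows `set MKL_NUM_THREADS=4` line.
# The variable must be exported so that the launched executable inherits it.
export MKL_NUM_THREADS=4

# Confirm what the child process will see.
echo "MKL_NUM_THREADS=${MKL_NUM_THREADS}"

# Then run the coarray executable from this same shell:
# ./main
```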
Regards,
Vidya.
Hi Xu,
As we haven't heard back from you, could you please provide us with an update regarding the issue?
Regards,
Vidya.
Hi Vidya,
I am sorry for my late reply. Setting MKL_NUM_THREADS=4 does work. However, what is weird is that setting it to 4 is slower than setting it to 2, even though my laptop has 6 cores. This is the real reason why I asked the question. I wonder what the optimal choice of MKL_NUM_THREADS is in a coarray environment. Does it depend on the number of images set for Coarray Fortran?
Thanks and best regards,
Xu
Hi Xu,
Thanks for getting back to us.
>>4 is slower than setting it to 2. My laptop has 6 cores.
Could you please let us know the timings that you are getting with 4 threads and 2 threads?
Regards,
Vidya.
Hi,
The timing on 4 threads is nearly twice as long as the one on 2 threads. That said, I plan to close this question. Anyway, thank you very much for your kind help!
Best regards,
Xu
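On the question raised above of how MKL_NUM_THREADS should relate to the number of images, the thread gives no confirmed answer. A common rule of thumb for hybrid process-plus-thread runs, offered here only as an assumption, is to keep images × threads at or below the number of physical cores so the cores are not oversubscribed. The sketch below applies that rule; `threads_per_image` is a hypothetical helper, not part of MKL or the Intel compiler, and the values 6 and 6 are the laptop figures from the first post.

```shell
# Hypothetical helper (not from the thread): pick a thread count so that
# images * threads never exceeds the number of physical cores.
threads_per_image() {
  cores=$1
  images=$2
  t=$(( cores / images ))          # integer division
  if [ "$t" -lt 1 ]; then t=1; fi  # always allow at least one thread
  echo "$t"
}

# Figures from the thread: a 6-core laptop running 6 images.
export MKL_NUM_THREADS="$(threads_per_image 6 6)"
echo "MKL_NUM_THREADS=${MKL_NUM_THREADS}"
```

Under this rule, 6 images on 6 cores leave one thread per image, while 3 images would leave two. Whether images that are only waiting in `sync images` actually keep their cores busy depends on the coarray runtime, so the result is a starting point for timing experiments, not a guaranteed optimum.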