Hi all,
my name is Nicola Giuliani and I am a Software Engineer working in the field of numerical analysis.
Over the past few days I have been testing the PARDISO solver on a sparse matrix with 486k rows/columns and 3.3M nonzeros. I store it in CSR format and call PARDISO to solve it. The program I am compiling creates the CSR matrix and then factorizes it. I have tested three different scenarios using both the Intel and GCC (9.2 and 10.2) compilers, with 1, 2, 4, 8, 16, 32, and 64 threads to see how the solution scales. I am using Intel oneAPI version 2025.1.1.
Hi @NicolaGiuliani,
Please attach your code for us to be able to help you out.
Without the code we can only make guesses, so here is mine:
icpx kernel_only_pardiso.cpp -qmkl=parallel -o intel_intel.out
is similar to (but maybe not exactly the same as):
icpx kernel_only_pardiso.cpp -m64 -L${MKLROOT}/lib -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -I"${MKLROOT}/include" -o intel_intel.out
The difference between your (1) and (3) options (icpx compilation) is mainly the Intel versus GNU threading layer, i.e., `-lmkl_intel_thread -liomp5` versus `-lmkl_gnu_thread -lgomp`. In both cases, `kernel_only_pardiso.cpp` is also compiled with icpx. You mentioned you got similar timings for those cases, so in this particular case it appears that using `-lmkl_intel_thread -liomp5` versus `-lmkl_gnu_thread -lgomp` does not matter for performance.
Although it may initially appear that there should then also be no difference between g++ compilation (2) and icpx compilation (3), the difference there is that your `kernel_only_pardiso.cpp` is being compiled with g++ rather than icpx. The difference in timings suggests that you are doing a lot more work in that file outside the oneMKL PARDISO function calls. Is that the case? (The oneMKL library is pre-compiled with Intel compilers; you are only linking to it, not compiling it with g++ in (2) or icpx in (3). If that's not clear, I can elaborate in another reply.)
However, without looking at the code, this is still just a guess. It might be something entirely different; we can only be sure if you share the code.
Hope that helps,
Gajanan Choudhary
Intel oneMKL team
Hi @Gajanan_Choudhary ,
thank you for the swift reply. I attach the simple example I was talking about.
a.out AScaledReal5_1.txt bScaledReal5_1.txt
Thank you again,
Nicola
Hi @Gajanan_Choudhary ,
I have done some additional digging and I have some news.
g++ -Ofast kernel_only_pardiso.cpp -m64 -L${MKLROOT}/lib -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl -I"${MKLROOT}/include" -o gnu_gnu.out
has the same performance as the Intel-compiled version. A further refinement shows that
g++ -funsafe-math-optimizations kernel_only_pardiso.cpp -m64 -L${MKLROOT}/lib -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl -I"${MKLROOT}/include" -o gnu_gnu.out
shows the same performance as the Intel-compiled one. According to the gcc man page, -funsafe-math-optimizations turns on -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.
The man page also states that "When used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations". So I tried to identify whether any of the previous flags is responsible for the performance increase.
g++ -fno-signed-zeros -fno-trapping-math -fassociative-math -freciprocal-math kernel_only_pardiso.cpp -m64 -L${MKLROOT}/lib -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl -I"${MKLROOT}/include" -o gnu_gnu.out
shows the same slowdown w.r.t. the Intel-compiled one, so my conclusion is that it is not the compilation of kernel_only_pardiso.cpp that causes the performance difference but the linking of the MKL libraries, which is influenced by -funsafe-math-optimizations. Do you have any idea about what changes at link time, or how to find out?
Thank you again,
Nicola
