I am using pardiso in a large program that I have written, and I have noticed that multiple runs of the exact same input give different results in multi-threaded execution. After checking Intel's online resources, I realized that, to resolve this issue, I need to align the arrays that I pass to pardiso. I checked the MKL user guide for Windows. It includes an example with alignment (in Section 8.1), but it was not clear to me how to apply the described procedure to my own program.
Is there a way to ask the compiler to align an array on, e.g., 128-byte boundaries in a module where I define ALLOCATABLE arrays that are eventually used by MKL and pardiso?
For example, if I have the following declaration:
real*8,allocatable,dimension(:) :: KffSMv
how would I have to modify this declaration to inform the compiler that the array is aligned?
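For the declaration asked about above, here is a minimal sketch, assuming the Intel Fortran compiler. Its ATTRIBUTES ALIGN directive can be applied to an allocatable array so that the memory obtained by ALLOCATE starts on the requested boundary. The module name solver_arrays, the program check_alignment, and the array size are hypothetical. The value 64 is an assumption and can be replaced by 128 as in the question. The TARGET attribute is added only so that c_loc can be used to inspect the address.

```fortran
module solver_arrays
  implicit none
  ! Intel Fortran directive: request 64-byte alignment for KffSMv when it
  ! is allocated. Other compilers treat this line as an ordinary comment.
  !DIR$ ATTRIBUTES ALIGN: 64 :: KffSMv
  ! TARGET is only needed for the c_loc address check in the program below.
  real(kind=8), allocatable, target, dimension(:) :: KffSMv
end module solver_arrays

program check_alignment
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  use solver_arrays
  implicit none
  integer(c_intptr_t) :: addr

  allocate(KffSMv(1000))

  ! Convert the C address of the first element to an integer and
  ! print its remainder modulo 64 (0 means 64-byte aligned).
  addr = transfer(c_loc(KffSMv), addr)
  print *, 'address mod 64 =', mod(addr, 64_c_intptr_t)

  deallocate(KffSMv)
end program check_alignment
```

With a compiler that honors the directive, the printed remainder should be 0. With a compiler that ignores it, the remainder depends on the allocator, so the check is only meaningful for the build that is actually used with MKL.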
To get the same pardiso output from run to run, please check that iparm[33] equals the number of OpenMP threads and keep it the same between runs. Please refer to the reference guide for more details.
Gennady, thank you for the prompt reply. I tried your suggestion, but it still does not work.
Is there anything else I could try?
Again, thank you for your help.
Do you use OpenMP threading?
Did you set iparm[33] == 8, for example, and see that the computed results differ from run to run?
Or do you expect to see identical results when you call pardiso with different numbers of threads?
First, I must clarify that the problem I am referring to has to do with repeatedly running the SAME EXACT INPUT, on the SAME EXACT hardware, for the SAME EXACT number of threads. If I had used different numbers of threads, then I would have expected to see some changes to the obtained results.
I use OpenMP threading. I give the requested number of threads in the input file that I run. This is used to initialize a variable in my program called NCPU1.
The MKL number of threads is set equal to the requested number of OpenMP threads, by using mkl_set_num_threads(NCPU1) before I call pardiso.
I need to add that I have definitely pinpointed pardiso as the reason for my issue. That is, when I run my program multi-threaded (there are other areas of the code where I use multiple threads), but I use mkl_set_num_threads(1) before I call pardiso, then the problem vanishes: I get 100% identical results from multiple runs.
I checked the pardiso reference guide and realized that, for Fortran, what you mention as iparm[33] corresponds to iparm(34). I also noticed that the pardiso guide only says that iparm(34) must be set equal to 1 or 0. It does not mention anything about setting iparm(34) equal to the actual number of threads (in my case NCPU1). By the way, I tried setting iparm(34) equal to both 1 and NCPU1, and in both cases I still obtain discrepancies when I run the exact same input multiple times with the exact same number of threads.
I will keep checking my code to see whether, e.g., one of the optimization options causes the issue. In the meantime, please let me know whether you think there is anything else I should try.
Again, thank you for your help.
Yes, in the case of the Fortran API, please set iparm(34) > 0, as the Developer Reference states.
My recommendation: could you create a standalone example that we could build and run on our side to reproduce the issue?
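As an illustration of the iparm(34) setting discussed above, here is a minimal sketch of the setup that would precede the pardiso calls. It is not confirmed anywhere in this thread as the fix. It assumes that MKL's conditional numerical reproducibility (CNR) mode is switched on through mkl_cbwr_set, and that mkl.fi is on the include path and the program is linked against MKL. The program name cnr_setup, the choice of MKL_CBWR_AUTO, and the value 8 for NCPU1 are assumptions made for the example.

```fortran
program cnr_setup
  implicit none
  include 'mkl.fi'   ! MKL interfaces and the MKL_CBWR_* constants
  integer :: istat
  integer :: NCPU1
  integer :: iparm(64)

  NCPU1 = 8          ! assumed thread count; the real code reads it from input

  ! Request CNR mode before any other MKL call (assumption: AUTO branch).
  istat = mkl_cbwr_set(MKL_CBWR_AUTO)
  if (istat /= MKL_CBWR_SUCCESS) then
    print *, 'mkl_cbwr_set failed, status =', istat
  end if

  ! Use the same number of MKL threads in every run.
  call mkl_set_num_threads(NCPU1)

  ! Supply iparm explicitly and tell pardiso the CNR thread count.
  iparm = 0
  iparm(1) = 1        ! do not use the solver defaults
  iparm(34) = NCPU1   ! must not exceed the number of OpenMP threads

  print *, 'CNR status =', istat, ' iparm(34) =', iparm(34)
  ! ... set the remaining iparm entries and call pardiso as usual ...
end program cnr_setup
```

The remaining iparm entries are left at zero here only to keep the sketch short. A real run needs them set as in the existing code before pardiso is called.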
I can privately share my code with Intel (together with a sample input where the problem is manifested and some additional information). Would it be possible for you (or someone else from Intel) to reach me by e-mail or private message? I imagine you can see my profile information.
We see that you have already escalated the same issue via Online Service Center ticket #04726666. You may therefore attach your reproducer (code that we can build with the latest version of MKL) to this ticket.
Thank you. I myself have noticed that there are cases (particularly small problems) where the issue is not observed.
I have just uploaded (to my support request case) my source code with the build log, a sample input where the problem is manifested, and a detailed description of the observed issue.