PARDISO returns an incorrect solution when mkl_thread_num=2

lutx · 02-15-2022

Hi,

I get incorrect results when using PARDISO to solve a linear system. Example source code and data files are attached.

The example program does just two things:

1. Solve a 4x4 sparse linear system and compute |Ax-b|, which should theoretically be zero.

2. Solve a larger problem (1102x1102) read from a data file and compute |Ax-b|, which should also theoretically be zero.
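For readers without the attachment, the |Ax-b| check can be sketched as follows. This is a minimal illustration and not the attached program: it assumes the matrix is stored in zero-based CSR arrays (here called ia, ja, a, which are hypothetical names) and that |Ax-b| means the Euclidean norm. The 3x3 matrix is made up for the example.

```python
import math

def csr_matvec(n, ia, ja, a, x):
    """Multiply an n-row CSR matrix (zero-based ia/ja) by the vector x."""
    y = [0.0] * n
    for row in range(n):
        # Entries of this row live in a[ia[row] : ia[row + 1]].
        for k in range(ia[row], ia[row + 1]):
            y[row] += a[k] * x[ja[k]]
    return y

def residual_norm(n, ia, ja, a, x, b):
    """Return the Euclidean norm of A*x - b."""
    ax = csr_matvec(n, ia, ja, a, x)
    return math.sqrt(sum((ax[i] - b[i]) ** 2 for i in range(n)))

# Hypothetical 3x3 example (not the matrix from the attachment):
#     [ 4 1 0 ]
# A = [ 0 3 0 ]
#     [ 2 0 5 ]
ia = [0, 2, 3, 5]
ja = [0, 1, 1, 0, 2]
a = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
b = csr_matvec(3, ia, ja, a, x)  # build b so that x is the exact solution

print(residual_norm(3, ia, ja, a, x, b))
```

In practice the residual of a correct solve is rarely exactly zero. A value around 1e-15 for a well-scaled problem is normal round-off, while a value around 1e-3 indicates a real problem.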

The command line is: a.exe ***.dat n

The first argument is the data file name and the second argument is the number of threads.

First, run with 1 thread. The output is:

$./qt-snippet matrix1.dat 1
MKL Version: 2018.0.3 Build 20180406_Intel(R) 64 architecture
mkl_max_threads:1

----------------------analysis-------------------------
Rows of Matrix: 4, non-zero cells 10

----------------------factsolve-------------------------
Rows of Matrix: 4, non-zero cells 10

----------------------release-------------------------
Rows of Matrix: 4, non-zero cells 10
Solve linear equations difference: |Ax-b|=0

read matrix rows:1102
read matrix cells:7641
read rhs size:1102
read data finished.

----------------------analysis-------------------------
Rows of Matrix: 1102, non-zero cells 7641

----------------------factsolve-------------------------
Rows of Matrix: 1102, non-zero cells 7641

----------------------release-------------------------
Rows of Matrix: 1102, non-zero cells 7641
Solve linear equations difference: |Ax-b|=2.25537e-15

Both problems are solved correctly.

Next, run with 2 threads. The output is:

$ ./qt-snippet matrix1.dat 2
MKL Version: 2018.0.3 Build 20180406_Intel(R) 64 architecture
mkl_max_threads:2

----------------------analysis-------------------------
Rows of Matrix: 4, non-zero cells 10

----------------------factsolve-------------------------
Rows of Matrix: 4, non-zero cells 10

----------------------release-------------------------
Rows of Matrix: 4, non-zero cells 10
Solve linear equations difference: |Ax-b|=0

read matrix rows:1102
read matrix cells:7641
read rhs size:1102
read data finished.

----------------------analysis-------------------------
Rows of Matrix: 1102, non-zero cells 7641
OMP: Info #270: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.
OMP: Info #270: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
OMP: Info #270: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

----------------------factsolve-------------------------
Rows of Matrix: 1102, non-zero cells 7641

----------------------release-------------------------
Rows of Matrix: 1102, non-zero cells 7641
Solve linear equations difference: |Ax-b|=0.0028809
verify solution error.

Now the simple 4x4 problem is still correct, but the solution of the large problem is wrong. If I repeat the last step several times, |Ax-b| changes to a different value each time:

----------------------release-------------------------
Rows of Matrix: 1102, non-zero cells 7641
Solve linear equations difference: |Ax-b|=0.00179701
verify solution error.

----------------------release-------------------------
Rows of Matrix: 1102, non-zero cells 7641
Solve linear equations difference: |Ax-b|=0.00149793
verify solution error.

Is there an error in my source code, in the PARDISO control parameters, or in the solve procedure?

ShanmukhS_Intel · 02-16-2022

Hi,

Thank you for posting on Intel Communities.

Thanks for sharing the sample reproducer and steps. We are investigating the multi-threaded issue you described on our end.

Best Regards,

Shanmukh.SS

ShanmukhS_Intel · 02-23-2022

Hi,

We compiled and ran the code you shared and were able to reproduce the results you described. We will get back to you soon after trying some workarounds.

Best Regards,

Shanmukh.SS

ShanmukhS_Intel · 03-01-2022

Hi,

Could you please set the environment variable MKL_CBWR to "AUTO" and run the code again? This turns on CNR (Conditional Numerical Reproducibility) mode.

export MKL_CBWR="AUTO"

Please refer to the link below for setting the environment variable for CNR.

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/obtaining-numerically-reproducible-results/set-env-var-for-cond-numerical-reproducibility.html

Best Regards,

Shanmukh.SS

lutx · 03-02-2022

Hi Shanmukh,

Thank you for your reply.

Yes, the result is correct after setting the environment variable. I have some follow-up questions:

1. Is this the final solution?

2. Will this setting affect other aspects, such as performance?

3. How do I choose an appropriate value for this setting?

lu@lu:~$ export MKL_CBWR="AUTO"
lu@lu:~$ ./qt-snippet matrix1.dat 2
MKL Version: 2018.0.3 Build 20180406_Intel(R) 64 architecture
mkl_max_threads:2
----------------------analysis-------------------------
Rows of Matrix: 4, non-zero cells 10
----------------------factsolve-------------------------
Rows of Matrix: 4, non-zero cells 10
----------------------release-------------------------
Rows of Matrix: 4, non-zero cells 10
Solve linear equations difference: |Ax-b|=0
read matrix rows:1102
read matrix cells:7641
read rhs size:1102
read data finished.
----------------------analysis-------------------------
Rows of Matrix: 1102, non-zero cells 7641
----------------------factsolve-------------------------
Rows of Matrix: 1102, non-zero cells 7641
----------------------release-------------------------
Rows of Matrix: 1102, non-zero cells 7641
Solve linear equations difference: |Ax-b|=2.33677e-15

ShanmukhS_Intel · 03-04-2022

Hi,

>>1. Is it the final solution?

Conditional Numerical Reproducibility (CNR) is a feature that enables you to obtain reproducible results from oneMKL routines.

When CNR mode is enabled:

-> It allows you to choose a specific code branch of Intel® oneAPI Math Kernel Library that corresponds to the instruction set architecture (ISA) that you target. You can specify the code branch and other CNR options using the MKL_CBWR environment variable.

-> It uses the standard ISA-based dispatching model while ensuring fixed cache sizes, deterministic reductions, and static scheduling.
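Why reduction order matters can be shown with a small sketch. Floating-point addition is not associative, so splitting a sum across threads in different ways can change the result. The code below is only an illustration in plain Python, not MKL code; the "one thread" and "two threads" splits are simulated by summing index chunks separately, and the input values are chosen to exaggerate the effect.

```python
def ordered_sum(vals):
    """Add values strictly left to right (no compensated summation)."""
    total = 0.0
    for v in vals:
        total += v
    return total

def chunked_sum(vals, chunks):
    """Sum each index chunk on its own, then add the partial sums in order.

    Each chunk stands in for the work assigned to one thread.
    """
    partials = [ordered_sum([vals[i] for i in chunk]) for chunk in chunks]
    return ordered_sum(partials)

# The exact sum of these four numbers is 2.0.
values = [1e16, 1.0, -1e16, 1.0]

one_thread = chunked_sum(values, [[0, 1, 2, 3]])
two_threads = chunked_sum(values, [[0, 1], [2, 3]])
other_split = chunked_sum(values, [[0, 2], [1, 3]])

print(one_thread, two_threads, other_split)
```

A deterministic reduction fixes the way the work is split, so the same input always gives the same bits. It does not by itself make the result more accurate.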

For more information about CNR mode, please refer to the link below.

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/support-functions/conditional-numerical-reproducibility-control.html

>>2. Will this setting affect other aspects, such as performance, speed, etc.

Since it runs a CPU check and returns the CNR branch that is optimized for the processor on which the program is currently running, it might not affect performance.

>>3. How to choose appropriate value for this setting?

Please refer to the link below for the appropriate values and for specifying the code branches.

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/obtaining-numerically-reproducible-results/specifying-code-branches.html

Best Regards,

Shanmukh.SS

lutx · 03-06-2022

Hi,

I found that CNR mode is not effective in every case I have encountered.

Finally, I found the root cause of my error: iparam[12] should be set to 1 because the matrix is unsymmetric and some diagonal entries may be too small. All solutions are correct after setting iparam[12].
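The effect of a too-small diagonal entry can be illustrated with a tiny sketch. As far as I understand the MKL documentation, the PARDISO parameter iparm[12] (zero-based) enables weighted matching, which permutes large entries onto the diagonal before factorization. The code below is not PARDISO; it is a hypothetical 2x2 Gaussian elimination without row swaps, run once on a matrix with a tiny diagonal entry and once after a manual row permutation that moves the large entries onto the diagonal.

```python
def solve_2x2_no_pivot(A, b):
    """Gaussian elimination on a 2x2 system, never swapping rows."""
    m = A[1][0] / A[0][0]          # multiplier; huge if A[0][0] is tiny
    a22 = A[1][1] - m * A[0][1]
    b2 = b[1] - m * b[0]
    x2 = b2 / a22
    x1 = (b[0] - A[0][1] * x2) / A[0][0]
    return [x1, x2]

def residual_inf(A, x, b):
    """Largest absolute entry of A*x - b."""
    return max(abs(A[i][0] * x[0] + A[i][1] * x[1] - b[i]) for i in range(2))

# Made-up unsymmetric system with a tiny diagonal entry.
A = [[1e-20, 1.0],
     [1.0,   1.0]]
b = [1.0, 2.0]

# Eliminating on the tiny pivot loses the information in the second row.
x_bad = solve_2x2_no_pivot(A, b)

# Swap the rows so that the large entries sit on the diagonal,
# then run the same elimination.
A_perm = [A[1], A[0]]
b_perm = [b[1], b[0]]
x_good = solve_2x2_no_pivot(A_perm, b_perm)

print(residual_inf(A, x_bad, b), residual_inf(A, x_good, b))
```

The first residual is of order 1 and the second is at round-off level, even though both runs solve the same system. This would also be consistent with the 4x4 problem being unaffected while the 1102x1102 problem fails, though the thread does not confirm the exact mechanism inside PARDISO.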

Thank you for your help.

Best regards,

Tianxiong Lu

ShanmukhS_Intel · 03-07-2022

Hi Lutx,

Glad to know that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.

Best Regards,

Shanmukh.SS
