Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

gcc's thread sanitizer reports a race condition in Intel MKL Pardiso

MBDyn-User
Beginner

Dear Intel MKL developers,

I'm using Intel MKL Pardiso as a parallel direct linear solver for the multibody dynamics software MBDyn (https://public.gitlab.polimi.it/DAER/mbdyn). Pardiso provides significant performance improvements compared to all other solvers currently used with MBDyn. Unfortunately, this solver suffers from stability issues that are not observed with other solvers such as Umfpack. If MBDyn is compiled with gcc's -fsanitize=thread, several warnings about race conditions in Pardiso are reported. These warnings do not appear if Pardiso runs single-threaded (i.e. if only one OpenMP thread is used). Since only a single thread in MBDyn calls Pardiso, I suppose that the race condition is caused by Pardiso itself.

Below you can find further information on how Pardiso is used by MBDyn:

The following links show the code which provides the interface between MBDyn and Pardiso:

https://public.gitlab.polimi.it/DAER/mbdyn/-/blob/develop/libraries/libmbwrap/pardisowrap.h

https://public.gitlab.polimi.it/DAER/mbdyn/-/blob/develop/libraries/libmbwrap/pardisowrap.cc

 

To build MBDyn with Pardiso support on a Linux system (e.g. Ubuntu 20.04), use the following steps:

tar -jxvf mbdyn-pardiso.tar.bz2

./mbdyn-pardiso.sh

 

You should then get a message like the one below. If you need further information, please let me know!

==================
WARNING: ThreadSanitizer: data race (pid=257811)
Write of size 8 at 0x7b4000010000 by main thread:
#0 free ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:707 (libtsan.so.0+0x35f25)
#1 mkl_serv_free_buffers <null> (libmkl_core.so+0x200334)

Previous write of size 8 at 0x7b4000010000 by thread T6:
#0 malloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:651 (libtsan.so.0+0x30323)
#1 mm_account_ptr_by_tid..0 <null> (libmkl_core.so+0x1fe133)

Thread T6 (tid=257818, running) created by main thread at:
#0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79)
#1 <null> <null> (libgomp.so.1+0x1adea)
#2 PardisoSolutionManager<SpGradientSparseMatrixHandler, int>::Solve() mbdyn/libraries/libmbwrap/pardisowrap.cc:270 (mbdyn+0x11bfd04)
#3 LineSearchFull::Solve(NonlinearProblem const*, Solver*, int, double const&, int&, double&, double const&, double&) mbdyn/mbdyn/base/linesearch.cc:707 (mbdyn+0x76281c)
#4 DerivativeSolver::Advance(Solver*, double, double, StepIntegrator::StepChange, std::deque<MyVectorHandler*, std::allocator<MyVectorHandler*> >&, std::deque<MyVectorHandler*, std::allocator<MyVectorHandler*> >&, MyVectorHandler*, MyVectorHandler*, int&, double&, double&) mbdyn/mbdyn/base/stepsol.cc:310 (mbdyn+0x59786a)
#5 Solver::Prepare() mbdyn/mbdyn/base/solver.cc:839 (mbdyn+0x58a8bd)
#6 Solver::Run() mbdyn/mbdyn/base/solver.cc:1632 (mbdyn+0x56fab0)
#7 RunMBDyn mbdyn/mbdyn/mbdyn.cc:1498 (mbdyn+0x50c137)
#8 mbdyn_program mbdyn/mbdyn/mbdyn.cc:942 (mbdyn+0x50c137)
#9 main mbdyn/mbdyn/mbdyn.cc:1168 (mbdyn+0x4ec45b)

SUMMARY: ThreadSanitizer: data race (/lib/x86_64-linux-gnu/libmkl_core.so+0x200334) in mkl_serv_free_buffers
==================
ThreadSanitizer: reported 1 warnings

VidyalathaB_Intel
Moderator

Hi,


Thanks for reaching out to us.


>>race conditions in Pardiso are reported..warnings do not appear if Pardiso is used in single thread mode

You can avoid these race conditions by using the mkl_set_num_threads_local function, as it only affects the current execution thread of the application.

Please refer to the below link 

https://software.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/...  

You can get more details regarding the mkl_set_num_threads_local function from the below link.

https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/...

Hope the information provided above helps.


Regards,

Vidya.


MBDyn-User
Beginner

Dear Vidya,

 

Thank you for your advice, but it does not solve the issue. In this software only a single thread in the main program calls Pardiso. In addition, the race condition also appears if the main program uses only one thread of execution (i.e. no multithreading is used outside Pardiso). According to the link you shared, using mkl_set_num_threads_local should help only in situations where MKL is called from several threads simultaneously.

 

Best regards

MBDyn-User

VidyalathaB_Intel
Moderator

Hi,


We are looking into this issue and will get back to you soon.


Regards,

Vidya.


MBDyn-User
Beginner

Dear Vidya,

 

Thank you for your support! It seems that the unstable numerical factorization with Pardiso (i.e. zero pivots) is not related to those race conditions. The probable cause was that the weighted matching (i.e. phase=11) was not always re-executed when former non-zeros in the matrix pattern became numerical zeros again. In MBDyn the non-zero pattern is preserved between iterations unless new non-zeros appear, and only in the latter case was the symbolic factorization (phase=11) executed. As a result, the weighted matching was still based on an old non-zero pattern, and zero pivots were the consequence.

The new approach is to compute a checksum of the non-zero pattern, considering only numerical non-zeros. Whenever this checksum changes, the symbolic factorization is executed again; see pardisowrap.cc. However, with this approach Pardiso is no longer competitive in some cases, because the numerical non-zero pattern changes at almost every iteration. An alternative would be to compute the backward error of the solution: whenever the backward error exceeds a certain limit or a zero pivot (i.e. ierror = -4) is detected, the symbolic factorization is repeated. This approach is not fully tested yet.
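To illustrate the checksum idea, here is a minimal sketch and not the actual code from pardisowrap.cc. It assumes a zero-based CSR matrix stored in three arrays. The names fnv1a_mix, pattern_checksum and PatternTracker are hypothetical, and FNV-1a is just one possible hash. Only entries whose value is numerically non-zero contribute to the checksum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: mixes one 64-bit word into an FNV-1a hash, byte by byte.
inline std::uint64_t fnv1a_mix(std::uint64_t hash, std::uint64_t word) {
    constexpr std::uint64_t kPrime = 1099511628211ULL;  // 64-bit FNV prime
    for (int byte = 0; byte < 8; ++byte) {
        hash ^= (word >> (8 * byte)) & 0xFFu;
        hash *= kPrime;
    }
    return hash;
}

// Checksum of the numerical non-zero pattern of a zero-based CSR matrix.
// Stored entries whose value is exactly zero are skipped, so the checksum
// changes when a stored entry becomes zero or becomes non-zero again.
std::uint64_t pattern_checksum(const std::vector<int>& row_ptr,
                               const std::vector<int>& col_idx,
                               const std::vector<double>& values) {
    assert(!row_ptr.empty());
    assert(col_idx.size() == values.size());
    std::uint64_t hash = 14695981039346656037ULL;  // 64-bit FNV offset basis
    const std::size_t n_rows = row_ptr.size() - 1;
    for (std::size_t row = 0; row < n_rows; ++row) {
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
            if (values[k] != 0.0) {
                hash = fnv1a_mix(hash, row);
                hash = fnv1a_mix(hash, static_cast<std::uint64_t>(col_idx[k]));
            }
        }
    }
    return hash;
}

// Remembers the last checksum and reports whether the analysis phase
// (phase=11) would have to be repeated before the next factorization.
class PatternTracker {
public:
    bool needs_symbolic(const std::vector<int>& row_ptr,
                        const std::vector<int>& col_idx,
                        const std::vector<double>& values) {
        const std::uint64_t current = pattern_checksum(row_ptr, col_idx, values);
        const bool changed = !has_previous_ || current != previous_;
        previous_ = current;
        has_previous_ = true;
        return changed;
    }

private:
    std::uint64_t previous_ = 0;
    bool has_previous_ = false;
};
```

In such a scheme the solver wrapper would call needs_symbolic before each factorization and run phase=11 only when it returns true. The first call always returns true. A hash collision could in principle hide a pattern change, so this is a heuristic rather than a guarantee.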

Nevertheless, it would be good if those race conditions could also be fixed, because they could lead to further issues.
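The backward-error criterion mentioned above could be evaluated roughly as follows. This is only a sketch under assumptions: it uses the normwise backward error ||b - A x|| / (||A|| ||x|| + ||b||) in the infinity norm, a square zero-based CSR matrix, and the hypothetical names backward_error and should_repeat_symbolic. The tolerance is a free parameter, not a value used by MBDyn.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normwise backward error of x as a solution of A x = b (infinity norm).
// A is a square matrix in zero-based CSR storage.
double backward_error(const std::vector<int>& row_ptr,
                      const std::vector<int>& col_idx,
                      const std::vector<double>& values,
                      const std::vector<double>& x,
                      const std::vector<double>& b) {
    assert(!row_ptr.empty());
    const std::size_t n = row_ptr.size() - 1;
    assert(x.size() == n && b.size() == n);

    double norm_r = 0.0, norm_a = 0.0, norm_x = 0.0, norm_b = 0.0;
    for (double xi : x) {
        norm_x = std::max(norm_x, std::abs(xi));
    }
    for (std::size_t i = 0; i < n; ++i) {
        double ax = 0.0;       // i-th component of A x
        double row_sum = 0.0;  // sum of |a_ij| over row i
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            ax += values[k] * x[col_idx[k]];
            row_sum += std::abs(values[k]);
        }
        norm_r = std::max(norm_r, std::abs(b[i] - ax));
        norm_a = std::max(norm_a, row_sum);
        norm_b = std::max(norm_b, std::abs(b[i]));
    }
    const double denom = norm_a * norm_x + norm_b;
    // A zero denominator implies b = 0 and A x = 0, so the residual is zero.
    return denom == 0.0 ? 0.0 : norm_r / denom;
}

// Repeat the analysis phase if Pardiso reported a zero pivot (error -4)
// or if the backward error is above the tolerance (or is NaN).
bool should_repeat_symbolic(double backward_err, int pardiso_error,
                            double tolerance) {
    return pardiso_error == -4 || !(backward_err <= tolerance);
}
```

Compared with the checksum, this check costs one sparse matrix-vector product per solve but triggers a new analysis only when the solution quality actually degrades.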

Best regards

MBDyn-User

Khang_N_Intel
Employee

Hi,

Thank you for the additional information!

We are currently working on the issue and will let you know how we will proceed next.

Best regards,

Khang


Khang_N_Intel
Employee

Hi Reinhard,


I have been trying to reproduce the issue that you mentioned with no success.

I received many errors when running the script mbdyn-pardiso.sh.

I tried it on 2 different systems.


I will do some troubleshooting and will let you know how things go.


Best regards,

Khang




Khang_N_Intel
Employee

Hi Reinhard,


I am still having problems reproducing the error. Let me gather the error messages and I will send them to you.


Best,

Khang


MBDyn-User
Beginner

Dear Khang,

 

What kind of problems do you have? Were you able to compile and run the code?

 

Best regards,

MBDyn User

Khang_N_Intel
Employee

Hi Reinhard,


The issue is the same on both machines.

When I ran the shell script mbdyn-pardiso.sh, it complained that bootstrap and configure were not found. Also, permission was denied for mbdyn/mbdyn.


Khang


MBDyn-User
Beginner

Dear Khang,

I think the problem in the first submission was that the bootstrap script was called "bootstrap" instead of "bootstrap.sh". If bootstrap is not executed, "configure" and "mbdyn" will not be generated. I have attached a new version of all files.

On Ubuntu you probably need to install additional packages:

sudo apt-get build-dep octave
sudo apt-get install libmkl-full-dev

Please let me know if you still have problems running the script!

Best regards

Reinhard

Khang_N_Intel
Employee

Hi Reinhard,


Thank you for providing the updated code!

I will test the new update and will let you know the status.


Best regards,

Khang

