Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

DTRSM in threaded applications

mullervki
Beginner
700 Views

Hello,

I have a threaded application where I call the MKL library on a Windows 7 platform. Intel Inspector is reporting a read/write collision in DTRSM (it goes through many other BLAS calls with no problem; only on this one I have a problem).

The definition of DTRSM is as follows:

op( A )*X = alpha*B,   or   X*op( A ) = alpha*B

My "A" is the same on both threads, but X and B are different. Is "A" being modified in this call? Or is MKL reading/writing to some internal memory location during this operation that is shared by both threads?

I'm trying to remove this problem but I'm at a loss on how to resolve it. I compiled the NetLib source code and used this (very!) non-optimized version of the BLAS to try to get some insight into the problem. But then Inspector reports no problems.

If anybody could help me out on this I would appreciate it.

0 Kudos
14 Replies
Ying_H_Intel
Employee

Hi mullervki,

As I understand it, only B is overwritten by the function. Have you tried running the code in Intel Inspector without multithreading? And how do you link the MKL library: with mkl_intel_thread.lib or mkl_sequential?
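To illustrate the point that only B is overwritten, here is a minimal sketch of the case used in this thread (side "L", lower triangular, no transpose, unit diagonal). It is an unoptimized reference loop in the spirit of the NetLib BLAS, not MKL's implementation; the function name trsm_llnu_ref is made up for this example. A is declared const because it is only read, while B is replaced in place by the solution X.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (unoptimized) solve of A*X = alpha*B for
 * side='L', uplo='L', transa='N', diag='U', column-major storage.
 * A is m-by-m unit lower triangular and is only read (note the const);
 * B is m-by-n and is overwritten in place with the solution X.
 * trsm_llnu_ref is a hypothetical name, not an MKL or BLAS routine. */
void trsm_llnu_ref(int m, int n, double alpha,
                   const double *a, int lda,
                   double *b, int ldb)
{
    assert(lda >= m && ldb >= m);
    for (int j = 0; j < n; ++j) {
        double *bj = b + (size_t)j * (size_t)ldb;   /* column j of B */
        for (int i = 0; i < m; ++i)
            bj[i] *= alpha;
        /* Forward substitution; the diagonal of A is taken to be 1
         * and is never referenced. */
        for (int k = 0; k < m; ++k) {
            const double *ak = a + (size_t)k * (size_t)lda;  /* column k of A */
            for (int i = k + 1; i < m; ++i)
                bj[i] -= bj[k] * ak[i];
        }
    }
}
```

In this sketch, two threads that pass the same a but different b never write to a common location, which is the access pattern described in the original question.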

Could you please provide a small test case so we can investigate this?

Thanks

Ying 

 

mullervki
Beginner

Hi Ying,

My apologies for the long absence. I had other urgent matters to take care of.

Here's a summary of where I stand:

1) If I use mkl_sequential I have no problem.

2) If I use mkl_intel_thread and call mkl_set_num_threads(2), then I run into all sorts of problems, even if my code is not threading anything! The Intel Inspector error messages all seem to come from Inspector problems themselves. For example, here's a typical case for a data race condition:

Read: the line in my code where I'm calling dtrsm.

Write: _kmp_launch_monitor (???)

I understand these kmp functions are Intel's, but I don't understand why this is happening at all. Like I said, my own code is not even threading anything; I disabled all threading in it.

3) Another data race condition:

Write: the line in my code where I'm calling dtrsm

Write: _kmp_end_split_barrier

4) And yet another:

Read: _kmp_task_team_sync

Write: my call to dtrsm

5) These calls are all inside a linear equation solver, so extracting the data is not a trivial matter. But here's my best attempt. I'm calling dtrsm with the arguments

dtrsm ("L","L","N","U",njcols,nrhs,real_one,lnz,jlen,tmprhs,n)

where njcols=39, nrhs=2,real_one=1.0,jlen=156,n=5997.

The double arrays lnz and tmprhs have their own data in them. I suppose you can fake anything you want there. And, of course, the constant values are all passed by address, not by value.

Any thoughts? Could it be I'm compiling/linking my code incorrectly? That's always a possibility, but I do get the correct answers using the sequential library.

Thanks.

mullervki
Beginner

Ying,

Yet another update. I've managed to put together a small piece of code that just calls dtrsm.

#include <stdlib.h>
#include <mkl.h>

int main( int argc, char* argv[] ) {
   int njcols = 39, nrhs = 2, jlen = 156, n = 5997;
   double real_one = 1.;
   double *lnz, *tmprhs;
   int i;

   /* Right-hand sides: leading dimension n, nrhs columns */
   tmprhs = (double*)malloc(n*nrhs*sizeof(double));
   for(i = 0; i < n*nrhs; ++i) tmprhs[i] = 1.;
   /* Triangular factor, deliberately over-allocated by a factor of 100 */
   lnz = (double*)malloc(jlen*njcols*100*sizeof(double));
   for(i = 0; i < jlen*njcols*100; ++i) lnz[i] = -1.;

   mkl_set_num_threads(32);
   dtrsm("L","L","N","U",&njcols,&nrhs,&real_one,lnz,&jlen,tmprhs,&n);

   free(lnz);
   free(tmprhs);
   return 0;
}

I believe that if this is a unit triangular matrix, setting all entries in the matrix to -1. will make the matrix diagonally dominant, hence, non-singular. If I'm wrong please feel free to change the matrix entries. I also wanted to make sure the matrix size was enough to just properly define this diagonally dominant matrix - hence, the 100 factor.
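On the question of whether the arrays are large enough: for column-major storage the minimum size follows from the leading dimension, so the factor of 100 can be checked rather than guessed. The sketch below assumes the usual BLAS layout for side "L" (A is m-by-m with leading dimension lda, B is m-by-n with leading dimension ldb); the helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum number of doubles a column-major array must hold so that
 * `cols` columns of `rows` entries each, spaced `ld` apart, are
 * addressable: (cols - 1) full strides plus one last column.
 * Hypothetical helper, not part of MKL. */
size_t min_elems(int rows, int cols, int ld)
{
    assert(rows >= 0 && cols >= 0 && ld >= rows);
    if (rows == 0 || cols == 0)
        return 0;
    return (size_t)ld * (size_t)(cols - 1) + (size_t)rows;
}

/* For side = 'L': A is m-by-m with leading dimension lda,
 * B is m-by-n with leading dimension ldb. */
size_t trsm_left_min_a(int m, int lda)
{
    return min_elems(m, m, lda);
}

size_t trsm_left_min_b(int m, int n, int ldb)
{
    return min_elems(m, n, ldb);
}
```

With the values from the test program (m = 39, n = 2, lda = 156, ldb = 5997) this gives 5967 doubles for A and 6036 doubles for B, so both allocations in the test program (156*39*100 and 5997*2 doubles) are larger than required.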

If I run this with inspector I get a data race condition with

Read: _kmp_end_split_barrier

Write: _kmp_launch_monitor

Can you reproduce this problem? I'm running on Windows 7 64-bits building a 64-bit application.

Thanks.

Ying_H_Intel
Employee

Hi Mullervki

This is not a problem. One question: which MKL version are you using?

I tried the latest MKL 11.3 (Parallel Studio XE 2016) with mkl_intel_thread_dll.lib (the dynamic library):

 Searching C:\Program Files (x86)\Intel_sw_development_tools\compilers_and_libraries_2016.0.036\windows\mkl\lib\intel64_win\mkl_intel_thread_dll.lib:

The program runs fine. Inspector does report a data race, but it seems to be in the MKL serv lock. Could you please try the latest version and let me know the result?

Best Regards,

Ying

[Attachment: memory_0.jpg]

mullervki
Beginner

Hi Ying,

I'm using the MKL version that came in Composer XE 2013 SP1. When I have the Inspector window opened, the top rhs displays "Intel Inspector XE 2015".

1) How exactly can I download just the latest version of the MKL libraries, or even find out what version I'm using?

2) You mentioned that Inspector also reported a data race condition inside the MKL library and that "this is not a problem". Is there a way to have Inspector ignore problems that are really not a problem? I ask because I have spent hours on this already and it's very hard for me to tell whether the problem is happening because of something in my application or in the MKL library. In that case of the small example I posted, I was finally able to learn that this had nothing to do with my problem. But, in general, when I have multiple calls to the MKL library coming from various Windows threads in my application, I have no way of telling whether I should ignore the message or not: it looks like any other message I get if there were a real problem in my application. For example, if I had called the DTRSM function from different threads but erroneously using the same X/B arrays, wouldn't I get a similar message of a data race condition inside the MKL library?

Thanks.

mullervki
Beginner

Hi Ying,

I determined that I have been using MKL 11.2 Update 3 for Windows.  The Intel Software Manager tells me there is an update for it, so I'm installing it now. I'll keep you posted.

Thanks.

mullervki
Beginner

Sorry, I meant to say MKL 11.2 Update 3 for Windows is what I can download to update my version. I don't really know what version I'm running, but will be using this latest download.

mullervki
Beginner

Back using the sequential MKL library - and still running into race conditions.

This is what my code looks like when calling DTRSM:

				   dtrsm (
					   "L", "L", "N", "U",
					   &njcols, &int_one,
					   &real_one,
					   &lnz[jlpnt], &jlen,
					   &soln_ptr[fjcol], &njcols
				   );

I have 2 threads at this exact same location where a race condition is detected. In one thread I have the following data:

 

		njcols	3	int
		int_one	1	int
		real_one	1.0000000000000000	double
+		&lnz[jlpnt]	0x0000000040690078 {34024417536.302307}	double *
		jlen	42	int
+		&soln_ptr[fjcol]	0x000000003fcd93a8 {0.00000000000000000}	double *
		njcols	3	int

In the other thread I have the following:

		njcols	3	int
		int_one	1	int
		real_one	1.0000000000000000	double
+		&lnz[jlpnt]	0x0000000040690078 {34024417536.302307}	double *
		jlen	42	int
+		&soln_ptr[fjcol]	0x000000003fcf2cc8 {0.00000000000000000}	double *
		njcols	3	int

The data is identical except, of course, for &soln_ptr[fjcol]. I assume that "lnz[jlpnt]" is a read-only array. All other arguments are constant.
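One thing that can be ruled out from the numbers above is an accidental overlap of the two output blocks. The sketch below is a hypothetical helper (not an MKL or Inspector facility) that computes the bounding byte range of an m-by-n column-major block and tests two such ranges for overlap. It says nothing about what the library does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bounding byte range [begin, end) of an m-by-n column-major block of
 * doubles starting at address `base` with leading dimension ld.
 * When ld > m the range also covers the gaps between columns, so the
 * overlap test below is conservative. */
typedef struct { uintptr_t begin, end; } byte_range;

byte_range block_range(uintptr_t base, int m, int n, int ld)
{
    assert(m > 0 && n > 0 && ld >= m);
    size_t elems = (size_t)ld * (size_t)(n - 1) + (size_t)m;
    byte_range r = { base, base + elems * sizeof(double) };
    return r;
}

/* Nonzero if the two half-open ranges share at least one byte. */
int ranges_overlap(byte_range x, byte_range y)
{
    return x.begin < y.end && y.begin < x.end;
}
```

Applied to the two &soln_ptr[fjcol] addresses shown above (0x3fcd93a8 and 0x3fcf2cc8, with m = 3, n = 1, ldb = 3), each block spans 24 bytes and the two ranges do not overlap, so the right-hand sides themselves are not shared between the threads.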

Can anybody tell me which variable in each call reflects a data race condition?

I should also add that I downloaded the source code for the BLAS from NetLib (completely unoptimized!) and used that instead. All problems went away.

mullervki
Beginner

The puzzle continues...

I had my application compiled with /MD. I changed it all to /MT thinking that maybe, somehow, the memory allocation wouldn't be thread safe under /MD.

I also set both MKL_NUM_THREADS and OMP_NUM_THREADS to 1 and chose to use the parallel version of MKL.

To my surprise, I now have a data race condition detected within the SAME thread, Read/Write location being the call to DTRSM. However, rather than Inspector naming the 2 thread numbers where the conflict is coming from, the message now is

"Data race at data location 0x000000000135E0C0 for threads OMP Master Thread #1 and OMP Master Thread #2"

In other words, because I'm now using the parallel version of MKL, it's threading inside the library and Intel Inspector is telling me there is a problem inside this single call to DTRSM, but through different OMP threads inside this call.

I can't make any sense out of this.

The version of MKL is

Intel(R) Math Kernel Library Version 10.3.2 Product Build 20110117 for Intel(R) 64 architecture applications
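Since a good part of this thread is about working out which MKL version is actually linked, here is a small sketch that pulls the version numbers out of a banner like the one above and compares them with another version. It assumes the banner keeps the "Version X.Y.Z" pattern shown here (MKL's mkl_get_version_string produces a string of this kind, though that is an assumption about how the banner was obtained); the type and function names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: major.minor.update as printed in the banner. */
typedef struct { int major, minor, update; } version_triplet;

/* Extracts "X.Y.Z" following the word "Version " in a banner string.
 * Returns 1 on success, 0 if the pattern is not found. */
int parse_version_banner(const char *banner, version_triplet *out)
{
    assert(banner != NULL && out != NULL);
    const char *p = strstr(banner, "Version ");
    if (p == NULL)
        return 0;
    p += strlen("Version ");
    return sscanf(p, "%d.%d.%d",
                  &out->major, &out->minor, &out->update) == 3;
}

/* Negative, zero, or positive as a is older than, equal to, or newer than b. */
int compare_versions(version_triplet a, version_triplet b)
{
    if (a.major != b.major) return a.major - b.major;
    if (a.minor != b.minor) return a.minor - b.minor;
    return a.update - b.update;
}
```

For the banner above this yields 10.3.2, which compares as older than 11.3.0.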

If there is a way to get a newer version of MKL I don't really know how to do it. I used the Intel update software and it believes it has the latest version.

Any suggestions out there?

Thanks.

 

Ying_H_Intel
Employee

Hi Mullervki, 

The latest version is MKL 11.3 (some memory bugs have been fixed since MKL 10.3.2; see, for example, https://software.intel.com/en-us/articles/intel-mkl-111-bug-fixes/ ).

You can easily get the latest version under the free community license by registering at http://software.intel.com/sites/campaigns/nest/ (you will then receive an email; click the link in it and you will be able to download the install package).

Best Regards,

Ying 

 

mullervki
Beginner

Hi Ying,

Again, apologies for the delay. I had a problem with the registration of the software with Intel and it took days of back-and-forth until we could clear this up.

I downloaded and installed the latest MKL:

Intel(R) Math Kernel Library Version 11.3.0 Product Build 20150730 for Intel(R) 64 architecture applications

Using the sequential version I no longer run into problems with DTRSM. But not everything is resolved: Intel Inspector is reporting what I believe are false data race positives, and even though I'm asking it to stop at the problem, I have no stack for any of the threads because it's all happening inside ntdll.dll.

I'm not sure whether I should continue on this thread or start a new one to try to clear up this issue. Any suggestions?

Thanks.

Ying_H_Intel
Employee

Hi Mullervki

Thanks a lot for the updates. Since the problem is also related to Intel Inspector, could you please post the issue to

https://software.intel.com/en-us/forums/intel-inspector-xe

Please include, for example, the code, the build command line, the MKL version

#include <stdlib.h>
#include <mkl.h>

int main( int argc, char* argv[] ) {
   int njcols = 39, nrhs = 2, jlen = 156, n = 5997;
   double real_one = 1.;
   double *lnz, *tmprhs;
   int i;

   tmprhs = (double*)malloc(n*nrhs*sizeof(double));
   for(i = 0; i < n*nrhs; ++i) tmprhs[i] = 1.;
   lnz = (double*)malloc(jlen*njcols*100*sizeof(double));
   for(i = 0; i < jlen*njcols*100; ++i) lnz[i] = -1.;

   mkl_set_num_threads(32);
   dtrsm("L","L","N","U",&njcols,&nrhs,&real_one,lnz,&jlen,tmprhs,&n);
   return 0;
}

 

and the Inspector result.

Thanks

Ying

mullervki
Beginner

Hi Ying,

The Inspector problem can be found here:

https://software.intel.com/en-us/forums/intel-inspector-xe/topic/594242#comment-1841986

I had opened an issue earlier, but now I managed to reproduce the problem exactly as I'm having it without even using the MKL library.

Thanks.

Ying_H_Intel
Employee

Hi mullervki,

Thanks a lot for completing the message.  It seems Inspector 2016 will solve the problem.  

Thanks

Ying

 

 
