Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

A problem with MKL 11.2 Update 3 and ddot_direct

AndrewC
New Contributor III

Our software is failing QA on an 8-core (8-thread) system: it hangs in the main thread. The culprit seems to be ddot_direct. No other user threads are running. This is a new problem introduced with the new <D,Z,S,C>dot direct calls.

     ntdll.dll!NtWaitForSingleObject()  + 0xa bytes    
     KernelBase.dll!WaitForSingleObjectEx()  + 0x9c bytes    
     libiomp5md.dll!__kmp_suspend_64()  + 0x1c0 bytes    
     libiomp5md.dll!__kmp_barrier()  + 0x32d0 bytes    
     libiomp5md.dll!__kmp_join_barrier()  + 0x5fe bytes    
     libiomp5md.dll!__kmp_join_call()  + 0xf1 bytes    
     libiomp5md.dll!__kmpc_fork_call()  + 0x76 bytes    
     mkl_intel_thread.dll!000007fed54250c3()     
     [Frames below may be incorrect and/or missing, no symbols loaded for mkl_intel_thread.dll]    
     mkl_intel_thread.dll!000007fed539a8ba()     
     ddot_direct()  + 0x74 bytes    

 

Gennady_F_Intel
Moderator

Vasci, could you give more details about the parameters passed to this function?

How do you link in this case?

Is there a specific CPU on which the problem occurs?

 

AndrewC
New Contributor III

 

Parameters as below (n=2576, incx=1, incy=1):

Name   Pointer              Type             Value
n      0x000000000012ccf0   const int *      2576
x      0x000000ef58eca600   const double *   2.1437157556647435e-005
incx   0x000000000012cd00   const int *      1
y      0x0000000045184380   const double *   4.4600692790355519e-016
incy   0x000000000012cd0c   const int *      1
ret                         double           0.00000000000000000

It is linked against the parallel (threaded) MKL on 64-bit Windows in Visual Studio.

The CPU is an Intel Xeon at 3.6 GHz; the problem also happens on other Xeon machines.

SSE    :Y
SSE2   :Y
SSE3   :Y
SSSE3  :Y
SSE41  :Y
SSE42  :Y
AVX    :Y
AVX2   :N
----------

OS Enabled AVX :Y
AES            :Y
CLMUL          :Y
RDRAND         :Y
F16C           :Y
Maximum number of OpenMP threads:8
MKL Version:Intel(R) Math Kernel Library Version 11.2.3 Product Build 20150413 for Intel(R) 64 architecture applications

Failing at this call to ddot_direct:

/* {S,D}DOT_DIRECT */
static __inline double mkl_dc_ddot_convert(const MKL_INT *n, const double* x, const MKL_INT *incx, const double *y, const MKL_INT *incy) {
    double ret = 0.0;
    if (MKL_DC_DDOT_CHECKSIZE(n)) {
        ret = mkl_dc_ddot((n), (x), (incx), (y), (incy));
    } else {
        ret = ddot_direct((n), (x), (incx), (y), (incy));
    }
    return ret;
}

As I said, no other user threads are running at the time; this is being called from the main thread.

 

 

AndrewC
New Contributor III

I have removed the 'direct' calls so that the 'regular' DDOT is called. Interestingly, the problem persists.

The issue is 100% reproducible and can only be worked around by setting OMP_NUM_THREADS=1.
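For anyone who wants to apply the OMP_NUM_THREADS=1 workaround from inside the program rather than from the shell, a minimal sketch follows. The helper name force_single_omp_thread is hypothetical. It assumes the OpenMP runtime reads the variable from the process environment when it first initialises, so it must run before the first threaded MKL call; that timing has not been verified against MKL 11.2.3.

```c
/* Must precede the includes so that setenv() is declared on POSIX systems. */
#ifndef _WIN32
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdlib.h>

/* Hypothetical helper: sets OMP_NUM_THREADS=1 for the current process.
 * Returns 0 on success, non-zero on failure.
 * Assumption: the OpenMP runtime has not been initialised yet. */
static int force_single_omp_thread(void)
{
#ifdef _WIN32
    /* Microsoft CRT variant; there is no setenv() on Windows. */
    return _putenv_s("OMP_NUM_THREADS", "1");
#else
    /* Third argument 1 means: overwrite an existing value. */
    return setenv("OMP_NUM_THREADS", "1", 1);
#endif
}
```

Calling this as the first statement of main keeps the rest of the program unchanged, at the cost of running every OpenMP region in the process on a single thread.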

 

AndrewC
New Contributor III

FYI, the code is locked at:

     ntdll.dll!NtWaitForSingleObject()  + 0xa bytes    
     KernelBase.dll!WaitForSingleObjectEx()  + 0x9c bytes    
     libiomp5md.dll!__kmp_suspend_64()  + 0x1c0 bytes    
     libiomp5md.dll!__kmp_barrier()  + 0x32d0 bytes    
     libiomp5md.dll!__kmp_join_barrier()  + 0x5fe bytes    
     libiomp5md.dll!__kmp_join_call()  + 0xf1 bytes    
     libiomp5md.dll!__kmpc_fork_call()  + 0x76 bytes    
     mkl_intel_thread.dll!000007fedf2450c3()     
     [Frames below may be incorrect and/or missing, no symbols loaded for mkl_intel_thread.dll]    
     mkl_intel_thread.dll!000007fedf1ba8ba()     
     ddot()  + 0x83 bytes    

 

AndrewC
New Contributor III

Further analysis:

  • Replacing DDOT with my 'own' naive DDOT causes the issue to go away, as expected
  • A simple test program with the same input parameters does not reproduce the problem
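For reference, a naive single-threaded replacement of the kind mentioned in the first bullet might look like the sketch below. It is not the code actually used in the product. The name naive_ddot is hypothetical, and plain int stands in for MKL_INT, which is an assumption that only matches the LP64 interface. Negative increments follow the reference BLAS convention of starting from the far end of the vector.

```c
#include <stddef.h>

/* Hypothetical drop-in for ddot: plain serial loop, no OpenMP involved.
 * Arguments are passed by pointer to mirror the Fortran-style BLAS interface. */
double naive_ddot(const int *n, const double *x, const int *incx,
                  const double *y, const int *incy)
{
    double sum = 0.0;
    if (*n <= 0) {
        return sum;                 /* empty vectors: result is zero */
    }
    /* For a negative stride, start at the last logical element. */
    ptrdiff_t ix = (*incx < 0) ? (ptrdiff_t)(1 - *n) * *incx : 0;
    ptrdiff_t iy = (*incy < 0) ? (ptrdiff_t)(1 - *n) * *incy : 0;
    for (int i = 0; i < *n; i++) {
        sum += x[ix] * y[iy];
        ix += *incx;
        iy += *incy;
    }
    return sum;
}
```

Because it never enters the OpenMP runtime, a routine like this cannot block in __kmp_join_barrier, which is consistent with the hang disappearing when it is substituted.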

 

AndrewC
New Contributor III

Just FYI,

This program crashes instantly on the call to MKL_Thread_Free_Buffers(). I know it's a bit perverse, but this is new in the latest update.

 

 

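#include <stdlib.h>
#include <tchar.h>
#include <mkl.h>

int _tmain(int argc, _TCHAR* argv[])
{
    int n=2576;
    double *x=(double *)malloc(sizeof(double) * n);
    double *y=(double *)malloc(sizeof(double) * n);
    int incx=1;
    int incy=1;
    /* Fill both vectors element by element. */
    for(int i=0;i<n;i++){
        x[i]=i;
        y[i]=i*2;
    }
    /* Crash occurs here, before any other MKL routine has been called. */
    MKL_Thread_Free_Buffers();
    for(int j=0;j<10000000;j++){
        double res=ddot(&n, x, &incx, y, &incy);
    }
    free(x);
    free(y);
    return 0;
}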

 

Sarah_K_Intel
Employee

Thank you for your detailed analysis.  I could reproduce the crash with Intel MKL 11.2.3 on Windows when using dynamic linking (but not with static linking).  We are looking into the issue in more detail.

As a potential workaround, inserting a call to ddot (or likely to any MKL routine with a problem size large enough for thread initialization to occur) before the MKL_Thread_Free_Buffers() call appeared to prevent the crash. Can you please check whether this workaround works for you?

Explicitly, I modified your reproducer to be:

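#include <stdlib.h>
#include <tchar.h>
#include <mkl.h>

int _tmain(int argc, _TCHAR* argv[])
{
    int n=2576;
    double *x=(double *)malloc(sizeof(double) * n);
    double *y=(double *)malloc(sizeof(double) * n);
    int incx=1;
    int incy=1;
    /* Fill both vectors element by element. */
    for(int i=0;i<n;i++){
        x[i]=i;
        y[i]=i*2;
    }
    /* Warm-up call so that MKL initialises its threads before the buffers are freed. */
    ddot(&n, x, &incx, y, &incy);
    MKL_Thread_Free_Buffers();
    for(int j=0;j<10000000;j++){
        double res=ddot(&n, x, &incx, y, &incy);
    }
    free(x);
    free(y);
    return 0;
}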

AndrewC
New Contributor III

Hi, I had already implemented your suggested workaround - that is, do not call MKL_Thread_Free_Buffers() until some calls into MKL have been made to initialise the buffers.

I am more concerned about the ddot issue that started this thread. It's clearly a subtle issue, but it is 'new', and hopefully comparing ddot from a previous version of MKL with the latest one will show why it has arisen.

 

 

John_L_8
Beginner
The "libiomp5md.dll!__kmp_suspend_64()" problem is not caused only by ddot; it also occurs with OpenMP in general. And MKL_Thread_Free_Buffers() usually does not work either. Can anyone solve this problem completely? Thank you very much.