Unexplained crashes during first set of offloads

John_F_1 · 04-23-2014

I'm encountering a non-deterministic crash (it happens once every ~50 executions) that is making me scratch my head. The crash seems to happen near the start of the application, and the dialog box that pops up in Windows reports the following:

  Problem Event Name:    APPCRASH
  Application Name:    M21Driver.exe
  Application Version:    0.0.0.0
  Application Timestamp:    5357af5b
  Fault Module Name:    coi_host.dll
  Fault Module Version:    0.0.0.0
  Fault Module Timestamp:    533c374b
  Exception Code:    c0000005
  Exception Offset:    00000000000182a6
  OS Version:    6.1.7601.2.1.0.274.10
  Locale ID:    1033
  Additional Information 1:    4f29
  Additional Information 2:    4f294221479b9e035dee5100eb4b064e
  Additional Information 3:    666a
  Additional Information 4:    666ae0416fc870b83fed8f7dde00c38b

The offload_main process on the MIC is still alive, and I've attached GDB to it and found it in this state:

#0  0x00007f52ae4ab9d0 in sem_wait () from /lib64/libpthread.so.0
#1  0x00007f52afa392d7 in ?? () from /usr/lib64/libcoi_device.so.0
#2  0x00007f52afa3936e in ?? () from /usr/lib64/libcoi_device.so.0
#3  0x00007f52afa0b240 in COIProcessWaitForShutdown () from /usr/lib64/libcoi_device.so.0
#4  0x00007f52af391706 in __offload_target_main () from /tmp/coi_procs/1/44736/liboffload.so.5
#5  0x00000000004009e5 in main ()

On the host side, I attached a remote debugger from my desktop (I do not have Visual Studio + Intel on the machine with the MIC, so I have no way of debugging the offload regions) and found the code in this state:

Not Flagged    >    8304    0    Main Thread    Main Thread    coi_host.dll!000007fee4db82a6    Normal
Not Flagged        7420    0    Worker Thread    libiomp5md.dll!__kmp_launch_monitor()   libiomp5md.dll!__kmp_launch_monitor    Highest
Not Flagged        8620    0    Worker Thread    libiomp5md.dll!__kmp_launch_monitor()    libiomp5md.dll!__kmp_suspend    Normal
Not Flagged        8552    0    Worker Thread    libiomp5md.dll!__kmp_launch_monitor()   libiomp5md.dll!__kmp_suspend    Normal
Not Flagged        6864    0    Worker Thread    libiomp5md.dll!__kmp_launch_monitor()   libiomp5md.dll!__kmp_suspend    Normal
Not Flagged        6280    0    Worker Thread    ntdll.dll thread    ntdll.dll!0000000077c5186a    Normal
Not Flagged        9024    0    Worker Thread    coi_host.dll thread    uSCIF.dll!000007feeb452019    Normal
Not Flagged        8932    0    Worker Thread    coi_host.dll thread    uSCIF.dll!000007feeb452019    Normal
Not Flagged        7324    0    Worker Thread    coi_host.dll thread    coi_host.dll!000007fee4e12d84    Normal
Not Flagged        9064    0    Worker Thread    msvcr110.dll!_threadstartex    coi_host.dll!000007fee4e64430    Normal
Not Flagged        7548    0    Worker Thread    msvcr110.dll!_threadstartex    mswsock.dll!000007fefcf1c699    Normal
Not Flagged        8896    0    Worker Thread    coi_host.dll thread    uSCIF.dll!000007feeb452019    Normal
Not Flagged        6148    0    Worker Thread    coi_host.dll thread    coi_host.dll!000007fee4e4135f    Normal
Not Flagged        8576    0    Worker Thread    msvcr110.dll!_threadstartex    zlibwapi.dll!000007feea625543    Normal
Not Flagged        6140    0    Worker Thread    msvcr110.dll!_threadstartex    coi_host.dll!000007fee4e64430    Normal
Not Flagged        6080    0    Worker Thread    coi_host.dll thread    uSCIF.dll!000007feeb452019    Normal

You can see that I have several threads that offload, as well as some threads doing other work. I am fairly confident that the threads doing work on the CPU are not causing the crash: when I run without offloads through the Inspector, it reports nothing of interest. I have trimmed the code being executed to the point that it does almost nothing besides a few offloads. This makes me wonder: is offloading from multiple threads concurrently in the same process allowed? Is there anything else that can be gleaned from the program's state at the time coi_host.dll goes belly up? I am using MPSS 3.2.1 and Composer XE 2013 SP1 (w_ccompxe_2013_sp1.2.176). Any ideas or suggestions?

John_F_1 · 04-23-2014

We were able to extract the call stack from the thread that throws an exception:

Child-SP          RetAddr           Call Site
00000000`002bbc58 000007fe`fd3b1430 ntdll!NtWaitForMultipleObjects+0xa
00000000`002bbc60 00000000`77422ce3 KERNELBASE!GetCurrentProcess+0x40
00000000`002bbd60 00000000`77499105 kernel32!WaitForMultipleObjectsEx+0xb3
00000000`002bbdf0 00000000`77499287 kernel32!WinExec+0x3b5
00000000`002bbe90 00000000`774992df kernel32!WinExec+0x537
00000000`002bbec0 00000000`774994fc kernel32!WinExec+0x58f
00000000`002bbef0 00000000`775b3398 kernel32!UnhandledExceptionFilter+0x1fc
00000000`002bbfd0 00000000`775385c8 ntdll!MD5Final+0x1de8
00000000`002bc000 00000000`77549d2d ntdll!_C_specific_handler+0x9c
00000000`002bc070 00000000`775391cf ntdll!RtlDecodePointer+0xad
00000000`002bc0a0 00000000`77571248 ntdll!RtlUnwindEx+0xbbf
00000000`002bc780 000007fe`eb7b9d3d ntdll!KiUserExceptionDispatcher+0x2e
00000000`002bceb0 000007fe`eb7bfefc coi_host!fragcount_node::IncNumCompleted+0x11fd
00000000`002bcfc0 000007fe`eb7c0292 coi_host!TaskScheduler::Complete+0x56c
00000000`002bd010 000007fe`eb79e2be coi_host!TaskScheduler::RunReady+0x12
00000000`002bd040 000007fe`eb7b2b55 coi_host+0xe2be
00000000`002bd190 00000001`80011f40 coi_host!COIBufferCopy+0x215
00000000`002bd240 00000001`8000d80e LIBOFFLOAD!_dbg_target_so_unloaded+0x4a30
00000000`002bd300 00000001`8002ce0e LIBOFFLOAD!_dbg_target_so_unloaded+0x2fe
00000000`002bd400 00000001`3ff89d3c LIBOFFLOAD!_offload_offload1+0x5e
00000000`002bd470 00000001`3ff86f2e M21Driver!PreProcess+0xb2c

The exception appears to be raised in the frame

fragcount_node::IncNumCompleted+0x11fd

which is the last coi_host frame before ntdll!KiUserExceptionDispatcher. This makes me question further whether offloading from multiple threads is OK.

John_F_1 · 04-25-2014

Here is an alternative thread stack at time of crash:

Child-SP          RetAddr           Call Site
00000000`8bc5e718 000007fe`fd4d1430 ntdll!NtWaitForMultipleObjects+0xa
00000000`8bc5e720 00000000`77412ce3 KERNELBASE!GetCurrentProcess+0x40
00000000`8bc5e820 00000000`77489105 kernel32!WaitForMultipleObjectsEx+0xb3
00000000`8bc5e8b0 00000000`77489287 kernel32!WinExec+0x3b5
00000000`8bc5e950 00000000`774892df kernel32!WinExec+0x537
00000000`8bc5e980 00000000`774894fc kernel32!WinExec+0x58f
00000000`8bc5e9b0 00000000`775a3398 kernel32!UnhandledExceptionFilter+0x1fc
00000000`8bc5ea90 00000000`775285c8 ntdll!MD5Final+0x1de8
00000000`8bc5eac0 00000000`77539d2d ntdll!_C_specific_handler+0x9c
00000000`8bc5eb30 00000000`775291cf ntdll!RtlDecodePointer+0xad
00000000`8bc5eb60 00000000`77561248 ntdll!RtlUnwindEx+0xbbf
00000000`8bc5f240 000007fe`ed4382a6 ntdll!KiUserExceptionDispatcher+0x2e
00000000`8bc5f950 000007fe`ed4486fc coi_host+0x182a6
00000000`8bc5f980 000007fe`ed44ad55 coi_host!fragcount_node::AllFragCompleted+0x186c
00000000`8bc5fa20 000007fe`ed44fefc coi_host!fragcount_node::IncNumCompleted+0x2215
00000000`8bc5fb10 000007fe`ed450292 coi_host!TaskScheduler::Complete+0x56c
00000000`8bc5fb60 000007fe`ed4930b0 coi_host!TaskScheduler::RunReady+0x12
00000000`8bc5fb90 000007fe`ed492da8 coi_host!COIPipelineSetCPUMask+0x9dc0
00000000`8bc5fc50 000007fe`ed494ef9 coi_host!COIPipelineSetCPUMask+0x9ab8
00000000`8bc5fd10 00000000`7740652d coi_host!COIPipelineSetCPUMask+0xbc09
00000000`8bc5fd40 00000000`7753c541 kernel32!BaseThreadInitThunk+0xd
00000000`8bc5fd70 00000000`00000000 ntdll!RtlUserThreadStart+0x21

As you can see, the entire call stack is inside coi_host.dll. How can I figure out what is happening? This crash occurs once in 150 executions. Furthermore, offload_main is still executing on the MIC when the host crashes, and when I attach the debugger, my offload regions are all spinning, waiting for data from the host that will never arrive because the host has crashed. Is there an issue with coi_host!fragcount_node::IncNumCompleted+0x2215?

Kevin_D_Intel · 04-25-2014

Pardon the delayed response, John. Thank you for your extra effort to debug this and for the details you have already provided.
In consulting with our offload compiler developers, from the offload aspect, if you can collect the OFFLOAD_REPORT=3 output (or use the API mentioned below), then we can analyze the offload activity to see whether that sheds any additional light.
There is an _Offload_report() API that you can use if you feel there are certain points of offload that are suspect. That would trim down the amount of data produced compared with using the OFFLOAD_REPORT environment variable for all offload activity in the application. (More information about the Offload Report is given in the UG.)
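
A minimal sketch of scoping the report to one suspect region follows. It assumes that offload.h from the Intel compiler declares _Offload_report(int) together with the OFFLOAD_REPORT_ON and OFFLOAD_REPORT_OFF constants, and that __INTEL_OFFLOAD is defined when offload is enabled; please check both against your compiler's header. ScopedOffloadReport is a hypothetical helper name, not part of the runtime.

```cpp
#ifdef __INTEL_OFFLOAD
#include <offload.h>  // Intel compiler header; assumed to declare _Offload_report
#endif

// Hypothetical RAII helper: switches the offload report on for the lifetime
// of the object and off again when it goes out of scope.
class ScopedOffloadReport {
public:
    ScopedOffloadReport() { set(true); }
    ~ScopedOffloadReport() { set(false); }

    // True only when the code was built with Intel offload support;
    // otherwise the helper is a no-op.
    static bool compiled_with_offload() {
#ifdef __INTEL_OFFLOAD
        return true;
#else
        return false;
#endif
    }

private:
    // Declared but not defined, to prevent copying (works on pre-C++11 compilers).
    ScopedOffloadReport(const ScopedOffloadReport&);
    ScopedOffloadReport& operator=(const ScopedOffloadReport&);

    static void set(bool on) {
#ifdef __INTEL_OFFLOAD
        // Assumed signature and constants; verify against offload.h.
        _Offload_report(on ? OFFLOAD_REPORT_ON : OFFLOAD_REPORT_OFF);
#else
        (void)on;  // nothing to toggle without the Intel offload runtime
#endif
    }
};
```

Declaring `ScopedOffloadReport report;` at the top of the block that contains the suspect offload pragma would limit the output to that offload. How the API interacts with the level set through the OFFLOAD_REPORT environment variable is worth confirming in the user guide.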

John_F_1 · 04-28-2014

Thanks for your response. It would appear that the offload that fails is one that reuses a buffer in a dedicated thread. Here's what I'm doing:

#pragma offload_transfer target (mic : m_deviceId)  \
    if (m_useMic) \
    nocopy(buffer1[0:4421000] : alloc_if(1) free_if(0)) \
    nocopy(buffer2[0:4421000] : alloc_if(1) free_if(0)) \
    nocopy(buffer3[0:4421000] : alloc_if(1) free_if(0)) \
    nocopy(buffer4[0:4421000] : alloc_if(1) free_if(0))

And then later on in the same thread I re-use these buffers multiple times:

#pragma offload target (mic : m_deviceId)  \
    if (m_useMic) \
    in(vector1_data[0:vector1_size] : alloc_if(0) free_if(0) into(buffer1[0:vector1_size])) \
    in(vector2_data[0:vector2_size] : alloc_if(0) free_if(0) into(buffer2[0:vector2_size])) \
    in(vector3_data[0:vector3_size] : alloc_if(0) free_if(0) into(buffer3[0:vector3_size])) \
    in(vector4_data[0:vector4_size] : alloc_if(0) free_if(0) into(buffer4[0:vector4_size])) 
        {
            //assign vectors on the MIC
        }

However, the exception is always thrown from coi_host.dll, and it is always this function

fragcount_node::IncNumCompleted

I should mention that I am running multiple copies of this thread, and they all have their own private buffers. At the time of the exception, at least two such threads are in the "assign" offload.
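
One way to test whether concurrency among the offloading threads is a factor is to serialize the offloads temporarily and see whether the crash rate changes. The sketch below is a diagnostic experiment only, not a fix and not something the offload runtime is known to require; g_offload_mutex and run_serialized are hypothetical names, and the code uses only standard C++11.

```cpp
#include <mutex>

// Hypothetical process-wide lock: while a thread holds it, no other thread
// that also goes through run_serialized() can be inside its offload.
static std::mutex g_offload_mutex;

// Runs `body` while holding the lock and returns whatever `body` returns.
// The lock is released when `guard` goes out of scope, including on exceptions.
template <typename F>
auto run_serialized(F body) -> decltype(body()) {
    std::lock_guard<std::mutex> guard(g_offload_mutex);
    return body();
}
```

In the thread function, the same lock could instead be taken directly with `std::lock_guard<std::mutex> guard(g_offload_mutex);` in a scope that encloses the offload pragma that assigns the vectors. If the crash still occurs with every offload serialized, concurrent offloading is less likely to be the trigger.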

Kevin_D_Intel · 04-28-2014

Thank you for the additional details. I have re-sent a private reply, which did not reach you last week, requesting some additional information. I'll also ask the developers whether collecting OFFLOAD_REPORT details for at least those two offloads would really be helpful at this point.

Kevin_D_Intel · 09-26-2014

To formally close this thread: the underlying root cause was related to some internal clean-up within COI. The changes that address the issue (tracked under the internal Intel Tracking ID: HSD# 4868965) appear in the MPSS 3.3 (July 14, 2014) release.