Intel® MPI Library

Communicator creation fails with Trace Collector correctness checking on

Denis_D_3

Hi,

I am using Intel MPI Library 5.1.2.146 and Intel C++ Compiler 16.0.1.146 under Windows 7 Professional x64.

The following program:

#include <mpi.h>

int main(int nArgC, char *apszArgV[])
{
  MPI_Comm comm;
  MPI_Init(&nArgC, &apszArgV);
  /* The crash is reported while processing this call. */
  MPI_Comm_dup(MPI_COMM_WORLD, &comm);
  MPI_Finalize();
  return 0;
}

built with the following command:

mpiicc -check_mpi send_leak.c

and run with the following command:

mpiexec^
  -hosts stu003-home -n 2^
  -trace-pt2pt^
  -trace-collectives^
  -print-rank-map^
  send_leak^
  --itc-args --check-tracing ON

gives the following output:

...
[0] WARNING: EXCEPTION_ACCESS_VIOLATION occurred
[0] ERROR: Signal 3 caught in ITC code section.
[0] ERROR: Either ITC is faulty or (more likely in a release version)
[0] ERROR: the application has corrupted ITC's internal data structures.
[0] ERROR: Giving up now...
[1] ERROR: Signal 3 caught in ITC code section.
[1] ERROR: Either ITC is faulty or (more likely in a release version)
[1] ERROR: the application has corrupted ITC's internal data structures.
[1] ERROR: Giving up now...

[0] ERROR: LOCAL:EXIT:SIGNAL: fatal error
[0] ERROR:    Fatal signal 3 (???) raised.
[0] ERROR:    Stack back trace:
[0] ERROR:       (send_leak)
[0] ERROR:       BaseThreadInitThunk (kernel32)
[0] ERROR:       RtlUserThreadStart (ntdll)
[0] ERROR:       ()
[0] ERROR:    While processing:
[0] ERROR:       MPI_Comm_dup(comm=MPI_COMM_WORLD, *newcomm=0x00000000002afc60)

[1] ERROR: LOCAL:EXIT:SIGNAL: fatal error
[1] ERROR:    Fatal signal 3 (???) raised.
[1] ERROR:    Stack back trace:
[1] ERROR:       (send_leak)
[1] ERROR:       BaseThreadInitThunk (kernel32)
[1] ERROR:       RtlUserThreadStart (ntdll)
[1] ERROR:       ()
[1] ERROR:    While processing:
[1] ERROR:       MPI_Comm_dup(comm=MPI_COMM_WORLD, *newcomm=0x00000000001af780)
[0] WARNING: starting emergency trace file writing
[0] WARNING: proceeding although 2 out of 2 processes are in an unknown state
[0] INFO: Writing tracefile send_leak.stf in D:\Work\CPP\Tutorial\fast

In fact, any MPI call that creates a new communicator (not only MPI_Comm_dup()) leads to the same result.

Please help with my problem. Thanks in advance!

Dmitry_K_Intel2
Employee

Hi Denis,

Thank you for the report. This looks like a real bug in the library. It is quite strange, because the Linux version of the Message Checker does not expose the issue and the internal code is the same. Could you please use correctness checking on Linux while we fix the issue?

Thanks!
---Dmitry

 
