Intel® oneAPI HPC Toolkit

Received data sometimes contains only zeros

seongyun_k_
Beginner

I have a few megabytes of data and a fixed-size buffer (4 MB).

The sender thread loops, sending (MPI_Send) one fixed-size chunk at a time until all the data has been sent.
The receiver knows the total number of bytes before it starts and loops over MPI_Recv using the same fixed-size buffer (4 MB).

Sometimes the receiver gets all zeros, and only for the last remaining chunk (which may be smaller than 4 MB).

 - I checked that the sender sends the correct data ('CheckContents' in the code below).
 - I checked that the receiver receives the correct number of bytes.

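#include <mpi.h>
#include <algorithm>
#include <cstring>
void CheckContents(const char* buf, size_t n);  // verifies the chunk (defined elsewhere)

void Sender(const char* src, size_t bytes_to_send, int dest, int tag) {
   const size_t kChunk = 4 * 1024 * 1024;        // 4 MB staging buffer
   char* buffer = new char[kChunk];
   size_t offset = 0;
   while (offset < bytes_to_send) {
      // Send 4 MB, or whatever remains for the last chunk.
      size_t n = std::min(kChunk, bytes_to_send - offset);
      std::memcpy(buffer, src + offset, n);
      CheckContents(buffer, n);
      MPI_Send(buffer, static_cast<int>(n), MPI_CHAR, dest, tag, MPI_COMM_WORLD);
      CheckContents(buffer, n);
      offset += n;
   }
   delete [] buffer;
}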

The receiver has exactly the same loop structure, except that it calls MPI_Recv.
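Because the receiver mirrors the sender's loop, both sides have to derive the same sequence of chunk sizes from the same total, so that each MPI_Recv lines up with an MPI_Send of the same size, including the shorter final chunk. A minimal sketch of that shared computation in plain C++ (no MPI); `ChunkSizes` and `kChunkBytes` are hypothetical names, not part of the original code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Size of the staging buffer used by both sender and receiver (4 MB).
constexpr std::size_t kChunkBytes = 4u * 1024u * 1024u;

// Returns the message sizes produced by a loop of the form
// "transfer min(chunk, remaining) bytes until nothing is left".
// Sender and receiver should compute the same sequence from the same total,
// so each receive is paired with a send of the same size.
std::vector<std::size_t> ChunkSizes(std::size_t total_bytes,
                                    std::size_t chunk_bytes = kChunkBytes) {
    assert(chunk_bytes > 0);
    std::vector<std::size_t> sizes;
    std::size_t offset = 0;
    while (offset < total_bytes) {
        std::size_t n = std::min(chunk_bytes, total_bytes - offset);
        sizes.push_back(n);
        offset += n;
    }
    return sizes;
}
```

For example, a 10 MB total yields chunks of 4 MB, 4 MB and 2 MB. A receiver could post one MPI_Recv per entry and copy exactly that many bytes out of its 4 MB buffer before advancing its destination offset.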

1. Is it possible that MPI_Send() still holds a reference to 'buffer', which I delete right after the last call?
2. Is there any method I can use to debug this problem?

Can the following flags affect the program's correctness?
I_MPI_FABRICS
I_MPI_FALLBACK
MPICH_ASYNC_PROGRESS
I_MPI_PIN
I_MPI_DYNAMIC_CONNECTION

Michael_Intel
Moderator

Hello,

1. Is it possible that MPI_Send() still holds a reference to 'buffer', which I delete right after the last call?

No. MPI_Send() is a blocking call: once it returns, the MPI standard guarantees that the send buffer can be safely reused or freed. Internally, the library may handle a blocking MPI_Send() much like a non-blocking MPI_Isend(), but in this particular case (the so-called eager protocol) it copies the send buffer before returning control to the user.
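To make the eager case concrete, here is a toy, single-process model (plain C++, no MPI) in which `Send()` copies the message into storage owned by the channel before returning. `ToyEagerChannel` is hypothetical and is not how Intel MPI is implemented; it only illustrates why deleting the caller's buffer after the call returns is harmless.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <utility>
#include <vector>

// Toy model of an eager send: the payload is copied into channel-owned
// storage before Send() returns, so the caller's buffer is never touched
// again. Illustration only, not an MPI implementation.
class ToyEagerChannel {
public:
    // Copies n bytes out of buf; buf is not referenced after this returns.
    void Send(const char* buf, std::size_t n) {
        pending_.emplace_back(buf, buf + n);
    }

    // Delivers the oldest pending message into out (capacity max_n bytes)
    // and returns the number of bytes delivered.
    std::size_t Recv(char* out, std::size_t max_n) {
        assert(!pending_.empty());
        std::vector<char> msg = std::move(pending_.front());
        pending_.pop_front();
        assert(msg.size() <= max_n);  // a real MPI would report truncation
        if (!msg.empty()) {
            std::memcpy(out, msg.data(), msg.size());
        }
        return msg.size();
    }

private:
    std::deque<std::vector<char>> pending_;  // copies of sent messages
};
```

A caller can therefore do `ch.Send(buf, n); delete[] buf;` and a later `Recv()` still delivers the original bytes.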

2. Is there any method that I can use to debug this problem?

Yes. The Intel® Trace Analyzer and Collector provides correctness-checking functionality that can detect cases where your code violates the MPI standard or where data gets corrupted during transmission. See the related reference manual for further information: https://software.intel.com/en-us/node/561293
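Alongside the tool-based checks, a lightweight manual technique (not part of the Trace Analyzer) is to log a checksum and an all-zero flag for every chunk on both sides, for example right after the memcpy on the sender and right after MPI_Recv on the receiver, then compare the two logs to find the first chunk that differs. A minimal sketch; `Fnv1a64` (64-bit FNV-1a, chosen here only because it is short) and `IsAllZero` are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 64-bit FNV-1a hash of a byte range. Logging this per chunk on both ranks
// shows which chunk first differs between sender and receiver.
std::uint64_t Fnv1a64(const char* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (std::size_t i = 0; i < n; ++i) {
        h ^= static_cast<unsigned char>(data[i]);
        h *= 1099511628211ull;                  // FNV prime
    }
    return h;
}

// True if the first n bytes are all zero (the symptom in the question).
bool IsAllZero(const char* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] != 0) return false;
    }
    return true;
}
```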

Can the following flags affect the program's correctness?

No.

Best regards,

Michael
