Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Bug Report: MPI_Comm attribute deleter not called

jcrean
Beginner
344 Views

I believe I found a bug in IntelMPI 2021.3. In the code below, the call to MPI_Comm_free() does not cause the deleter function for the attribute to be called. The MPI 3.1 standard specifies that the deleter should be called from MPI_Comm_free (Section 6.4.3, page 248, lines 26-27).

 

I tested OpenMPI 4.0, MPICH 3.2, and IntelMPI 2018.4, and they all call the deleter as expected.

 

The code below is a minimal reproducible example. Be sure to compile it in debug mode (that is, without NDEBUG defined) so the assert() fires.

 

Note: the outstanding MPI_Ibarrier() appears to be important. If the MPI_Ibarrier() is not in progress, the callback is executed as expected.

 

#include "mpi.h"
#include <iostream>
#include <cassert>
#include <stdexcept>

MPI_Request request;
bool is_complete;

int comm_destructor_callback(MPI_Comm comm, int keyval, void* attribute_val, void* extra_state)
{
    std::cout << "destructor callback executing" << std::endl;
    MPI_Wait(&request, MPI_STATUS_IGNORE);
    is_complete = true;

    return MPI_SUCCESS;
}

int main(int argc, char* argv[])
{
    // initialization
    MPI_Init(&argc, &argv);
    MPI_Comm comm1 = MPI_COMM_WORLD, comm2;
    MPI_Comm_dup(comm1, &comm2);

    // set up attribute
    int keyval;
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, &comm_destructor_callback, &keyval, nullptr);
    MPI_Comm_set_attr(comm2, keyval, nullptr);

    // start Ibarrier then immediately free comm2
    is_complete = false;
    MPI_Ibarrier(comm2, &request);
    std::cout << "about to free comm2" << std::endl;
    MPI_Comm_free(&comm2);
    std::cout << "finished freeing comm2" << std::endl;
    assert(is_complete);

    MPI_Finalize();

    return 0;
}

 

PS: is there a way to attach C++ source code to a post? I tried attaching main.cxx, but it was rejected because the file contents did not match the extension.

6 Replies
SantoshY_Intel
Moderator
321 Views

Hi,

 

Thanks for reaching out to us.

 

We were able to reproduce your issue at our end using the latest Intel MPI Library 2021.5 on a Linux machine.

 

We have reported this issue to the concerned development team. They are looking into your issue.

 

>>" is there a way to attach C++ source code to a post? I tried attaching main.cxx, but it was rejected because the file contents did not match the extension."

Yes, you can share files through a .zip file or you can attach .cpp files.

 

Thanks & Regards,

Santosh

 

 

jcrean
Beginner
295 Views

Thanks.  Once the bug is fixed, can you post which versions of IntelMPI have the fix included?

SantoshY_Intel
Moderator
285 Views

Hi,

 

>>"Once the bug is fixed, can you post which versions of IntelMPI have the fix included?"

We will update you in the forum once the issue is fixed.

 

Thanks & Regards,

Santosh

 

 

jcrean
Beginner
252 Views

Great, thanks.

SantoshY_Intel
Moderator
111 Views

Hi,

 

According to the MPI 3.1 standard, Section 6.4.3 "Communicator Destructors", which describes the MPI_Comm_free call: "This collective operation marks the communication object for deallocation. The handle is set to MPI_COMM_NULL. Any pending operations that use this communicator will complete normally; the object is actually deallocated only if there are no other active references to it. This call applies to intra- and inter-communicators. The delete callback functions for all cached attributes (see Section 6.7) are called in arbitrary order."

 

MPI_Ibarrier is the nonblocking version of MPI_Barrier. By calling MPI_Ibarrier, a process signals that it has reached the barrier. The call returns immediately, regardless of whether the other processes have called MPI_Ibarrier. Since the MPI_Ibarrier is still in progress when MPI_Comm_free is called, the communicator is only marked for deallocation; the actual deallocation happens once all pending operations have completed. The delete callbacks are therefore called when the pending operations complete.

 

This does not appear to be a bug; it looks like valid behavior. We recommend waiting for the operation to complete before calling MPI_Comm_free (in that case, all delete callbacks are called from within MPI_Comm_free).

 

Also, we tried multiple MPI versions, and the example fails with all of them except OpenMPI (see the output below). However, OpenMPI does not appear to implement proper reference counting; it just cleans up all attributes, as can be seen here:

https://github.com/open-mpi/ompi/blob/main/ompi/mpi/c/comm_free.c#L61 and here: https://github.com/open-mpi/ompi/blob/main/ompi/communicator/comm.c#L1992

~/tests$ mpicxx.mpich 1.cpp -o test.out
~/tests$ mpirun.mpich -n 2 ./test.out
about to free comm2
finished freeing comm2
about to free comm2
finished freeing comm2
~/tests$ mpirun.mpich --version
HYDRA build details:
    Version:                                 3.3.2

...

~/tests$ mpicxx.openmpi 1.cpp -o test.out
~/tests$ mpirun.openmpi -n 2 ./test.out
about to free comm2
destructor callback executing
finished freeing comm2
about to free comm2
destructor callback executing
finished freeing comm2
~/tests$ mpirun.openmpi --version
mpirun.openmpi (OpenRTE) 4.0.3


~/tests$ mpicxx 1.cpp -o test.out
~/tests$ mpirun -n 2 ./test.out
about to free comm2
finished freeing comm2
about to free comm2
finished freeing comm2
~/tests$ mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2018 Update 5 Build 20190404 (id: 18839)

Copyright 2003-2019 Intel Corporation.

~/tests$ mpicxx 1.cpp -o test.out
~/tests$ mpirun -n 2 ./test.out
about to free comm2
finished freeing comm2
about to free comm2
finished freeing comm2
~/tests$ mpirun --version

Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)
Copyright 2003-2022, Intel Corporation.

 

 

Thanks & Regards,

Santosh

 

SantoshY_Intel
Moderator
95 Views

Hi,


I assume we have answered your query. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Thanks & Regards,

Santosh

