The structure of my code is:
MPI_Allgatherv(); //Replaced by MPI_Iallgatherv();
The collective communication in part 2 is the bottleneck of this program.
I replaced MPI_Allgatherv() with the nonblocking collective (NBC) MPI_Iallgatherv(). My goal was to hide the collective communication behind the computation in parts 3 and 4. However, parts 3 and 4 now take much longer than before. What could be causing this?