Hi,
I am working with Intel MPI version 4.1.0.024 and I have detected a problem with the MPI_Barrier() function (possibly a bug).
In the code below I create a new process with MPI_Comm_spawn. Then I merge the resulting intercommunicator (which connects the parent processes and the spawned process) into a single intracommunicator with MPI_Intercomm_merge, and I call MPI_Barrier() on that new communicator.
The problem is that some processes never continue execution: they remain blocked inside MPI_Barrier().
I have tested the code with other MPI implementations and it works fine.
Is there any solution?
Thanks,
Iván Cores.
The code is:
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main( int argc, char *argv[] )
{
    MPI_Comm parentcomm, intercomm;

    printf("Starting ...\n");
    MPI_Init( &argc, &argv );
    MPI_Comm_get_parent( &parentcomm );

    if (parentcomm == MPI_COMM_NULL)
    {
        /* Parent side: there is no parent communicator, so spawn one new process. */
        char *newHost;
        newHost = (char *)malloc(sizeof(char) * 255);
        // Open the file
        // Read host for new process from source file
        // For this test, copy a fixed host name (including the terminating '\0'):
        memcpy(newHost, "compute-0-0", sizeof("compute-0-0"));

        // Host for new process
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "host", newHost);

        // Create 1 more process
        int errcodes[1];
        MPI_Comm_spawn( "~/testSpawn/spawn_example", MPI_ARGV_NULL, 1, info, 0, MPI_COMM_WORLD, &intercomm, errcodes );
        MPI_Info_free(&info);
        free(newHost);

        char hostname[256] = "";   // zero-filled, so hostname[255] stays '\0'
        gethostname(hostname, 255);
        printf(" I'm the parent %s.\n", hostname);

        // Merge the intercommunicator into one intracommunicator (high = 0: parents are ordered first)
        MPI_Comm comm_new_and_old;
        MPI_Intercomm_merge(intercomm, 0, &comm_new_and_old);

        int npesNEW = -1;
        int myidNEW = -1;
        MPI_Comm_size(comm_new_and_old, &npesNEW);
        MPI_Comm_rank(comm_new_and_old, &myidNEW);
        printf(" I'm %d of %d.\n", myidNEW, npesNEW);

        // PROBLEMATIC BARRIER.
        MPI_Barrier(comm_new_and_old);
        printf(" After barrier %d\n", myidNEW);
        MPI_Comm_free(&comm_new_and_old);
    }
    else
    {
        /* Child side: started by MPI_Comm_spawn, connected to the parents via parentcomm. */
        char hostname2[256] = "";  // zero-filled, so hostname2[255] stays '\0'
        gethostname(hostname2, 255);
        printf(" I'm the spawned %s.\n", hostname2);

        // Merge the intercommunicator into one intracommunicator (high = 1: spawned process is ordered last)
        MPI_Comm comm_new_and_old;
        MPI_Intercomm_merge(parentcomm, 1, &comm_new_and_old);

        int npesNEW = -1;
        int myidNEW = -1;
        MPI_Comm_size(comm_new_and_old, &npesNEW);
        MPI_Comm_rank(comm_new_and_old, &myidNEW);
        printf(" I'm %d of %d (New).\n", myidNEW, npesNEW);

        // PROBLEMATIC BARRIER.
        MPI_Barrier(comm_new_and_old);
        printf(" After barrier (New proc.)\n");
        MPI_Comm_free(&comm_new_and_old);
    }

    MPI_Finalize();
    return 0;
}
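For reference when reading the rank/size lines this program prints: MPI_Intercomm_merge places the group whose processes pass `high = 0` before the group that passes `high = 1`. The helpers below are a hypothetical sketch (the names `expected_merged_rank` and `expected_merged_size` are made up for illustration) of what each process should report, assuming the parents pass 0 and the spawned process passes 1, as in the code above.

```c
/* Hypothetical helper: predicts the rank a process should get in the
 * intracommunicator returned by MPI_Intercomm_merge.
 *   local_rank  : rank in the process's own group (e.g. MPI_COMM_WORLD)
 *   remote_size : size of the other group of the intercommunicator
 *   high        : the `high` argument this process passed to the merge
 * Assumes the two groups pass different `high` values, so the group that
 * passed 0 is ordered first and the group that passed 1 follows it. */
int expected_merged_rank(int local_rank, int remote_size, int high)
{
    return high ? remote_size + local_rank : local_rank;
}

/* The merged communicator contains every process from both groups. */
int expected_merged_size(int local_size, int remote_size)
{
    return local_size + remote_size;
}
```

With one parent rank, the parent should report rank 0 of 2 and the spawned process rank 1 of 2; with two parent ranks, the spawned process should report rank 2 of 3.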
Hi Ivan,
I had to make some modifications to your program (memcpy needs the length argument, and I changed the host name and the name of the executable to launch), but with those modifications I was able to compile and run it with no problems using 4.1.0.024. What compiler are you using?
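memcpy(dest, src, n) copies exactly n bytes, so the terminating '\0' of the host name has to be counted in n. A minimal sketch of a bounds-checked copy, assuming the caller knows the destination buffer size; `copy_host_name` is a hypothetical helper, not part of the original program:

```c
#include <string.h>

/* Copies a NUL-terminated host name into dst (capacity dst_size bytes).
 * Returns 0 on success, or -1 if the name plus its '\0' does not fit,
 * in which case dst is left untouched. */
int copy_host_name(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len + 1 > dst_size)
        return -1;
    memcpy(dst, src, len + 1);  /* len + 1 also copies the terminating '\0' */
    return 0;
}
```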
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Hi James,
Thanks for your answer. I apologize for the problem with memcpy; it was a last-second change to simplify the code, and I didn't check it. As for the compiler, we use icc version 12.1.5.
We are using a new cluster (Intel Sandy Bridge with InfiniBand). Could it be a configuration problem?
Sincerely,
Iván Cores.
Hi Ivan,
I made a few more changes to the code so that there are fewer hardcoded values. Try running the attached example with I_MPI_DEBUG=5 and send me the output.
Sincerely,
James Tullos
Hi James,
I ran your code with I_MPI_DEBUG=5. The output file is attached.
I think it is a problem with the InfiniBand controller, but I don't know whether I should change the I_MPI_FABRICS_LIST parameter.
Sincerely,
Iván Cores.
Hi Ivan,
I see. Running the initial program with multiple ranks and using DAPL, I am able to reproduce the issue. I'm going to investigate this further and will let you know when I have more information.
Sincerely,
James Tullos
Hi James,
Is there anything new on this issue?
Sincerely,
Iván Cores.
Hi Ivan,
I apologize for not responding sooner. From my investigation, I believe there is a bug we will need to correct. Running the correctness checking library shows that the intercommunicators are invalid, even in a simple example of my own. They are "working" at small rank counts, but there is definitely a problem somewhere.
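A much cruder check than the correctness checking library, but one that can be applied to the rank/size lines the test program prints: in a valid merged communicator every process reports the same size n, and the ranks are exactly 0..n-1 with no duplicates. The function below is a hypothetical sketch (`merged_comm_consistent` is a made-up name); it only detects inconsistent rank/size reports, so a hang in MPI_Barrier can still occur when these values look correct.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical diagnostic: ranks[i] and sizes[i] are the rank and size
 * that process i reported for the same merged communicator, for n
 * processes in total. Returns true only if every process reports size n
 * and the ranks form a permutation of 0..n-1. */
bool merged_comm_consistent(const int *ranks, const int *sizes, int n)
{
    if (n <= 0)
        return false;

    bool *seen = calloc((size_t)n, sizeof *seen);  /* which ranks were reported */
    if (seen == NULL)
        return false;

    bool ok = true;
    for (int i = 0; i < n && ok; i++) {
        if (sizes[i] != n || ranks[i] < 0 || ranks[i] >= n || seen[ranks[i]])
            ok = false;  /* wrong size, out-of-range rank, or duplicate rank */
        else
            seen[ranks[i]] = true;
    }

    free(seen);
    return ok;
}
```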
Sincerely,
James Tullos
Hi Ivan,
The root problem is that some parameters were being obtained directly by the spawned processes, rather than from the spawning processes. This led to inconsistencies in the full job. The developers have corrected this and the fix should be available in the next release.
Sincerely,
James Tullos
Hi James,
Thank you so much for your response. I hope the next release will be available soon.
Sincerely,
Iván Cores.
Hi Ivan,
We are planning to release the update this summer.
Sincerely,
James Tullos