Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

My MPI program hangs when processes are launched on different nodes (hosts)

ArthurRatz

My MPI program doesn't work (it hangs) when I launch its processes on different nodes (hosts). The program uses MPI_Win_allocate_shared to allocate shared memory through an RMA window, and I can't work out why it fails. Do I actually need to implement intercommunicators for this? Here's the code:

MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, proc_rank, MPI_INFO_NULL, &comm_sm);
MPI_Comm_rank(comm_sm, &rank_sm);
MPI_Comm_size(comm_sm, &numprocs_sm);

MPI_Info info_noncontig;
MPI_Info_create(&info_noncontig);
MPI_Info_set(info_noncontig, "alloc_shared_noncontig", "true");

int disp_size = sizeof(ullong);
MPI_Aint array_size = number_of_items * disp_size;
MPI_Win_allocate_shared(array_size, disp_size, info_noncontig, comm_sm, &array, &win_sm);
MPI_Win_shared_query(win_sm, 0, &array_size, &disp_size, &array);

MPI_Barrier(comm_sm);

ullong i_start = proc_rank * number_of_items / (ullong)numprocs;
ullong i_end = (proc_rank + 1) * number_of_items / (ullong)numprocs;

MPI_Win_lock_all(MPI_MODE_NOCHECK, win_sm);

if (proc_rank == 0)
{
    ullong value = number_of_items - 1;
    srand((unsigned)time(NULL) + proc_rank * numprocs + namelen);
    for (ullong index = 0; index < number_of_items; index++, value--)
        array[index] = (rand_mode == 1) ? rand() % rand_seed + 1 : value;
}

MPI_Barrier(comm_sm);

for (ullong index = i_start; index <= i_end; index++)
    fprintf(stdout, "%llu ", array[index]);

fprintf(stdout, "\n\n");
fflush(stdout);

MPI_Barrier(comm_sm);

Output:

[COMP-PC.MYHOME.NET@mpiexec] Process 0 of 2
71 81 12 56 66 49 70 39 100 90 27 57 46 66 6 13 39 20 70 4 6 13 16 5
 56 60 90 44 97 5 87 51 44 12 7 54 70 5 29 65 95 69 70 44 45 38 87 1 9 80 54 78
67 77 68 13 16 78 79 40 98 50 74 6 52

[WIN-9MFH3O78GLQ.MYHOME.NET@mpiexec] Process 1 of 2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

As you can see, process 1 doesn't seem to receive the array buffer address: it prints only zeros.
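A side note on the print loop in the code above: because it runs while index <= i_end, two neighbouring ranks both print the item at the shared boundary, and the last rank reads array[number_of_items], one past the end. A minimal sketch of a half-open split follows; block_range is a hypothetical helper name, not something from the original program.

```c
#include <assert.h>

typedef unsigned long long ullong;

/* Hypothetical helper: computes the half-open block [start, end) of n items
   that belongs to `rank` out of `nprocs` ranks, using the same integer
   arithmetic as the original i_start / i_end expressions. */
static void block_range(ullong rank, ullong nprocs, ullong n,
                        ullong *start, ullong *end)
{
    assert(nprocs > 0 && rank < nprocs);
    *start = rank * n / nprocs;
    *end = (rank + 1) * n / nprocs;   /* exclusive upper bound */
}
```

With these bounds the loop condition would be index < end, so each item is visited by exactly one rank.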

1 Solution
Sergey_O_Intel

Hi,

As far as I can see, line 21 has this condition:

if (proc_rank == 0)

It seems proc_rank is the rank number in MPI_COMM_WORLD, right? In that case only one rank in MPI_COMM_WORLD fills the array, and it fills the array on its own node only. But as I can see from the output, there are two different hosts: COMP-PC.MYHOME.NET and WIN-9MFH3O78GLQ.MYHOME.NET (by the way, which OS is used, Windows or Linux?), so the array on the host that does not run MPI_COMM_WORLD rank 0 is never updated.

It seems you should change the condition at line 21 to:

if (rank_sm == 0)

Thank you for the bug report.

--Sergey

6 Replies
ArthurRatz

Hello, Sergey. Thank you very much for your reply. I'm going to check this.

ArthurRatz

This question is outdated. I've already solved this problem.

ArthurRatz

One more question: is there any difference between contiguous and non-contiguous memory?

Sergey_O_Intel

Hi,

Could you clarify your question?

In general: contiguous means the memory is all in one chunk, so from start to end there is nothing else in it. Non-contiguous is the opposite: the memory is fragmented into separately allocated sections rather than forming one block.

ArthurRatz

Thanks for the reply.
