Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

My MPI program doesn't work (hangs) when processes are launched on different nodes (hosts)

ArthurRatz

My MPI program doesn't work (it hangs) when I launch processes on different nodes (hosts). The program uses MPI_Win_allocate_shared to allocate shared memory through an RMA window. What could be causing this? Do I need to use intercommunicators for this purpose? Here is the code:

MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, proc_rank, MPI_INFO_NULL, &comm_sm);
MPI_Comm_rank(comm_sm, &rank_sm);
MPI_Comm_size(comm_sm, &numprocs_sm);

MPI_Info info_noncontig;
MPI_Info_create(&info_noncontig);
MPI_Info_set(info_noncontig, "alloc_shared_noncontig", "true");

int disp_size = sizeof(ullong);
MPI_Aint array_size = number_of_items * disp_size;
MPI_Win_allocate_shared(array_size, disp_size, info_noncontig, comm_sm, &array, &win_sm);
MPI_Win_shared_query(win_sm, 0, &array_size, &disp_size, &array);

MPI_Barrier(comm_sm);

ullong i_start = proc_rank * number_of_items / (ullong)numprocs;
ullong i_end = (proc_rank + 1) * number_of_items / (ullong)numprocs;

MPI_Win_lock_all(MPI_MODE_NOCHECK, win_sm);

if (proc_rank == 0)
{
    ullong value = number_of_items - 1;
    srand((unsigned)time(NULL) + proc_rank * numprocs + namelen);
    for (ullong index = 0; index < number_of_items; index++, value--)
        array[index] = (rand_mode == 1) ? rand() % rand_seed + 1 : value;
}

MPI_Barrier(comm_sm);

for (ullong index = i_start; index <= i_end; index++)
    fprintf(stdout, "%llu ", array[index]);

fprintf(stdout, "\n\n");
fflush(stdout);

MPI_Barrier(comm_sm);

Output:

[COMP-PC.MYHOME.NET@mpiexec] Process 0 of 2
71 81 12 56 66 49 70 39 100 90 27 57 46 66 6 13 39 20 70 4 6 13 16 5
 56 60 90 44 97 5 87 51 44 12 7 54 70 5 29 65 95 69 70 44 45 38 87 1 9 80 54 78
67 77 68 13 16 78 79 40 98 50 74 6 52

[WIN-9MFH3O78GLQ.MYHOME.NET@mpiexec] Process 1 of 2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

As you can see, process 1 does not seem to receive the array buffer address: it prints only zeros.

Sergey_O_Intel

Hi,

At line 21 of your snippet there is this condition:

if (proc_rank == 0)

It seems proc_rank is the rank in MPI_COMM_WORLD, right? In that case only one rank in MPI_COMM_WORLD fills the array, and it fills only the array on its own node. Your output shows two different hosts, COMP-PC.MYHOME.NET and WIN-9MFH3O78GLQ.MYHOME.NET (by the way, which OS are you using, Windows or Linux?), so the array on the host that does not run MPI_COMM_WORLD rank 0 is never updated.

It seems you should change the condition at line 21 to:

if (rank_sm == 0)

Thank you for the bug report.

--Sergey

ArthurRatz

Hello, Sergey. Thank you very much for your reply. I'm going to check this.

ArthurRatz

This question is outdated. I've already solved this problem.

ArthurRatz

One more question: is there any difference between contiguous and non-contiguous memory?

Sergey_O_Intel

Hi,

Could you clarify your question?

In general, contiguous means the memory is all in one chunk: from its start to its end there is nothing else in it. Non-contiguous is the opposite: the memory is fragmented into two or more separately allocated sections.

ArthurRatz

Thanks for the reply.
