Intel® MPI Library

MPI_Win_lock: extremely long time for waiting

foxtran
New Contributor I

Hello! 

I have been debugging MPI_Win_lock because of a high MPI load imbalance.

In the attached example, all MPI ranks end up waiting on lock3 until the root MPI rank has finished its work. For example, on my machine with 16 MPI ranks the output looks like this:

hop MPI: 0
no-hop MPI: 0
lock3, MPI: 14 delay: 8455487
lock3, MPI: 15 delay: 8469940
lock3, MPI: 0 delay: 15
Done!
lock3, MPI: 13 delay: 8427417
lock3, MPI: 6 delay: 7120527
lock3, MPI: 7 delay: 7459421
lock3, MPI: 8 delay: 7716278
lock3, MPI: 9 delay: 7959353
lock3, MPI: 10 delay: 8112826
lock3, MPI: 11 delay: 8249979
Done!
lock3, MPI: 1 delay: 1665868
Done!
lock3, MPI: 2 delay: 4028945
Done!
lock3, MPI: 3 delay: 5681485
Done!
lock3, MPI: 4 delay: 6210385
Done!
lock3, MPI: 5 delay: 6757722

The delay is reported in microseconds, so the non-root ranks wait between about 1.7 and 8.5 seconds for the lock, which is awful! To me it looks as if the other MPI ranks sleep inside MPI_Win_lock until the root MPI rank has finished its work.
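To make the unit conversion explicit, here is a minimal Python sketch that parses a few of the `lock3` lines shown above and reports the wait in seconds. The helper names (`parse_delays`, `summarize`) are made up for illustration and are not part of the attached example; the sketch assumes the output format is exactly `lockN, MPI: <rank> delay: <microseconds>`.

```python
import re

# A few lines copied from the output above (delays are in microseconds).
SAMPLE_OUTPUT = """\
lock3, MPI: 14 delay: 8455487
lock3, MPI: 15 delay: 8469940
lock3, MPI: 0 delay: 15
lock3, MPI: 1 delay: 1665868
lock3, MPI: 5 delay: 6757722
"""

# Hypothetical pattern for the "label, MPI: rank delay: micros" lines.
LINE_RE = re.compile(r"^(\w+), MPI:\s*(\d+)\s+delay:\s*(\d+)\s*$")


def parse_delays(text):
    """Return {rank: delay_in_seconds} for every matching line; skip the rest."""
    delays = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            _label, rank, micros = match.groups()
            delays[int(rank)] = int(micros) / 1_000_000
    return delays


def summarize(delays, root=0):
    """Return (shortest, longest) wait in seconds over the non-root ranks."""
    waits = [d for rank, d in delays.items() if rank != root]
    return min(waits), max(waits)


if __name__ == "__main__":
    shortest, longest = summarize(parse_delays(SAMPLE_OUTPUT))
    print(f"non-root ranks waited between {shortest:.2f} s and {longest:.2f} s")
```

Lines such as `Done!` or `hop MPI: 0` do not match the pattern and are ignored, so the full program output can be piped in unchanged.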

I tried setting I_MPI_ASYNC_PROGRESS=1, but that leads to segfaults (which is a problem too).
One of the threads gives this backtrace:

Image              PC                Routine            Line        Source
libpthread-2.28.s  00007F118DEFACF0  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00007F118E82BCE5  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00007F118E76FD37  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00007F118E74C94E  MPI_Win_lock          Unknown  Unknown
libmpifort.so.12.  00007F1198455C61  pmpi_win_lock_        Unknown  Unknown
a.out              0000000000405A12  Unknown               Unknown  Unknown
a.out              000000000040674C  Unknown               Unknown  Unknown
a.out              00000000004052ED  Unknown               Unknown  Unknown
libc-2.28.so       00007F118D3CFD85  __libc_start_main     Unknown  Unknown
a.out              000000000040520E  Unknown               Unknown  Unknown

Another one: 

Image              PC                Routine            Line        Source
libpthread-2.28.s  00007FEBA38E8CF0  Unknown               Unknown  Unknown
libuct.so.0.0.0    00007FEB9FBE6F69  Unknown               Unknown  Unknown
libucp.so.0.0.0    00007FEB9FE4C82A  ucp_worker_progre     Unknown  Unknown
libmlx-fi.so       00007FEBA00D7461  Unknown               Unknown  Unknown
libmlx-fi.so       00007FEBA00D5715  Unknown               Unknown  Unknown
libmlx-fi.so       00007FEBA00F2030  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00007FEBA4504413  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00007FEBA4219DDA  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00007FEBA415DD37  Unknown               Unknown  Unknown
libmpi.so.12.0.0   00007FEBA413A94E  MPI_Win_lock          Unknown  Unknown
libmpifort.so.12.  00007FEBADE43C61  pmpi_win_lock_        Unknown  Unknown
a.out              0000000000405A12  Unknown               Unknown  Unknown
a.out              000000000040674C  Unknown               Unknown  Unknown
a.out              00000000004052ED  Unknown               Unknown  Unknown
libc-2.28.so       00007FEBA2DBDD85  __libc_start_main     Unknown  Unknown
a.out              000000000040520E  Unknown               Unknown  Unknown

Compiling the example:

mpiifx lock.f90 -cpp -O3

Running without I_MPI_ASYNC_PROGRESS (MPI_Win_lock will be slow):

mpirun -n 16 ./a.out

Running with I_MPI_ASYNC_PROGRESS (SegFault):

I_MPI_ASYNC_PROGRESS=1 mpirun -n 16 ./a.out


The code works fine with Open MPI. The locks are fast there, so I did not try any options similar to I_MPI_ASYNC_PROGRESS.

I used Intel MPI 2021.13 and IFX 2024.2.1 (the latest available combination).

Igor 




taehunkim
Employee

Hi,

When we run the program, it uses so much memory that it spills into swap space. (MPI_Win_lock uses physical memory.) When we test it on systems without swap, the program runs out of memory. When the program is using swap, performance becomes very slow. If you reduce the amount of memory used in the main program, the issue will not occur.

 

Thanks.
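
One way to check whether the swap explanation above applies on a given node is to look at `SwapTotal` and `SwapFree` in `/proc/meminfo` while the job is running. The following is a minimal Python sketch under the assumption of a Linux system; the function names and the sample numbers are illustrative and do not come from the thread.

```python
# Illustrative /proc/meminfo excerpt (made-up numbers, not from the thread).
SAMPLE_MEMINFO = """\
MemTotal:       196608000 kB
MemAvailable:   180000000 kB
SwapTotal:        8388604 kB
SwapFree:         8388604 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; the unit suffix is dropped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap currently in use, in kB (0 if the node has no swap configured)."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:  # Linux only
            text = handle.read()
    except OSError:
        text = SAMPLE_MEMINFO  # fall back to the illustrative sample
    print(f"swap in use: {swap_used_kb(parse_meminfo(text))} kB")
```

If the reported value stays at or near zero for the whole run, swapping is unlikely to be the cause of the slow locks.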

foxtran
New Contributor I

Hi!

I've drastically reduced the memory allocated by the example workload, so it should not take more than 10 MB per MPI rank.

Please find it in the attachment.

I've also run it with I_MPI_DEBUG=10, which may help a little:

[0] MPI startup(): Intel(R) MPI Library, Version 2021.13  Build 20240701 (id: 179630a)
[0] MPI startup(): Copyright (C) 2003-2024 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric loaded: libfabric.so.1
[0] MPI startup(): libfabric version: 1.20.1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): Load tuning file: "/scratch/software/packages/intel/mpi/2021.13/opt/mpi/etc/tuning_generic_shm-ofi_mlx_hcoll.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 20 (TAG_UB value: 1048575)
[0] MPI startup(): source bits available: 21 (Maximal number of rank: 2097151)
[0] MPI startup(): Number of NICs:  1
[0] MPI startup(): ===== NIC pinning on vn01 =====
[0] MPI startup(): Rank    Thread id  Pin nic
[0] MPI startup(): 0       0          mlx
[0] MPI startup(): 1       0          mlx
[0] MPI startup(): 2       0          mlx
[0] MPI startup(): 3       0          mlx
[0] MPI startup(): 4       0          mlx
[0] MPI startup(): 5       0          mlx
[0] MPI startup(): 6       0          mlx
[0] MPI startup(): 7       0          mlx
[0] MPI startup(): 8       0          mlx
[0] MPI startup(): 9       0          mlx
[0] MPI startup(): 10      0          mlx
[0] MPI startup(): 11      0          mlx
[0] MPI startup(): 12      0          mlx
[0] MPI startup(): 13      0          mlx
[0] MPI startup(): 14      0          mlx
[0] MPI startup(): 15      0          mlx
[0] MPI startup(): ===== CPU pinning =====
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       763919   vn01       {0,1,2,48,49,50}
[0] MPI startup(): 1       763921   vn01       {3,4,5,51,52,53}
[0] MPI startup(): 2       763922   vn01       {6,7,8,54,55,56}
[0] MPI startup(): 3       763923   vn01       {9,10,11,57,58,59}
[0] MPI startup(): 4       763924   vn01       {12,13,14,60,61,62}
[0] MPI startup(): 5       763925   vn01       {15,16,17,63,64,65}
[0] MPI startup(): 6       763926   vn01       {18,19,20,66,67,68}
[0] MPI startup(): 7       763927   vn01       {21,22,23,69,70,71}
[0] MPI startup(): 8       763928   vn01       {24,25,26,72,73,74}
[0] MPI startup(): 9       763931   vn01       {27,28,29,75,76,77}
[0] MPI startup(): 10      763932   vn01       {30,31,32,78,79,80}
[0] MPI startup(): 11      763934   vn01       {33,34,35,81,82,83}
[0] MPI startup(): 12      763935   vn01       {36,37,38,84,85,86}
[0] MPI startup(): 13      763936   vn01       {39,40,41,87,88,89}
[0] MPI startup(): 14      763937   vn01       {42,43,44,90,91,92}
[0] MPI startup(): 15      763938   vn01       {45,46,47,93,94,95}
[0] MPI startup(): I_MPI_ROOT=/scratch/software/packages/intel/mpi/2021.13
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_BIND_WIN_ALLOCATE=localalloc
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_RETURN_WIN_MEM_NUMA=0
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
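
To pull the relevant settings out of a startup log like the one above (for example, to confirm that `async_progress` is 0 in this run), a small Python sketch such as the following can be used. It assumes the `[rank] MPI startup(): key: value` line format shown above, handles only simple `key: value` lines, and uses a hypothetical helper name, `parse_startup`.

```python
import re

# A few lines copied from the I_MPI_DEBUG=10 output above.
SAMPLE_LOG = """\
[0] MPI startup(): libfabric version: 1.20.1-impi
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
"""

PREFIX_RE = re.compile(r"^\[\d+\] MPI startup\(\): (.*)$")


def parse_startup(text):
    """Return {key: value} for 'key: value' startup lines; skip everything else."""
    settings = {}
    for line in text.splitlines():
        match = PREFIX_RE.match(line)
        if not match or ": " not in match.group(1):
            continue
        # Split at the last ': ' so 'threading: async_progress' stays one key.
        key, value = match.group(1).rsplit(": ", 1)
        settings[key] = value.strip()
    return settings


if __name__ == "__main__":
    settings = parse_startup(SAMPLE_LOG)
    for key in ("libfabric provider", "threading: async_progress"):
        print(f"{key} = {settings[key]}")
```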


 

taehunkim
Employee

Hi,

The debug-mode logs don't show any problem. Does the delay still happen with the reduced-memory program?

foxtran
New Contributor I

taehunkim
Employee

Hi,

Could you try setting I_MPI_PMI_LIBRARY?

For example, if you use the Slurm scheduler, you can set:

export I_MPI_PMI_LIBRARY=/<path to slurm>/lib/libpmi2.so

 

Please refer to:

https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-13/other-environment-variables.html#GUID-6B9D4E5C-8582-42E6-B7DA-72C87622357D

 

Thanks

foxtran
New Contributor I

Hi!

I found only libpmix.so on this machine, so that is the only library I tried. However, the run reports that I_MPI_PMI_LIBRARY will be ignored:

$ I_MPI_PMI_LIBRARY=/usr/lib64/libpmix.so I_MPI_PMI=pmix mpirun -n 16 ./a.out
MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found


Do you have other ideas?

 

taehunkim
Employee