Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

pthread_setaffinity_np failed when I_MPI_ASYNC_PROGRESS=1

BenWibking
Beginner
I am trying to run the Sandia MPI overlap benchmark (https://github.com/sandialabs/SMB/tree/599675fe131baca55329a530b1d001add15bdbdb/src/mpi_overhead) with Intel MPI 2021.6.0.
 
It works with the default settings, but any MPI launch with I_MPI_ASYNC_PROGRESS=1 fails.
 
The error message is:
 
pthread_setaffinity_np failed
Abort(566543) on node 2 (rank 2 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(239)......:
MPID_Init_async_thread(667): MPID_Thread_create failed
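
For reference, the failing case reduces to the following minimal launch (a sketch, assuming the benchmark was built as ./mpi_overhead with the Intel compilers):

export I_MPI_ASYNC_PROGRESS=1     # enable the asynchronous progress thread(s)
mpirun -np 2 ./mpi_overhead       # aborts in MPI_Init as shown above

The same launch completes normally with I_MPI_ASYNC_PROGRESS unset.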
VarshaS_Intel
Moderator

Hi,

 

Thanks for posting in Intel Communities.

 

On our end, we are able to get the expected results without any errors in both cases (with and without I_MPI_ASYNC_PROGRESS_THREADS enabled, i.e. set to 1 or 0). We built with the makefile after changing mpicc to mpiicc. Please find the screenshot below, where we are able to run without any error:

result1.png

Could you please provide us with the details of your system environment (OS details) and the interconnect hardware you are using? Could you please also let us know the steps you followed to run the benchmark?

 

Also, could you please provide us with the debug log output from the command below:

 

I_MPI_DEBUG=10 FI_LOG_LEVEL=debug mpirun -np 2 ./mpi_overhead

 

 

Thanks & Regards,

Varsha

 

BenWibking
Beginner

It's a Rocky Linux 8.5 cluster with Mellanox IB cards (mlx5) on 2x Intel(R) Xeon(R) Gold 6330 (Ice Lake) CPUs.

Here's the kernel cmdline:

$ cat /proc/cmdline
BOOT_IMAGE=images/rocky-8.5-compute-02/vmlinuz-4.18.0-348.20.1.el8.nci.x86_64 ro selinux=0 console=tty0 console=ttyS0,115200n8 ip=ib0:dhcp lnet_network=o2ib2:ib0 root=lustre:10.6.201.1@o2ib2,10.6.201.101@o2ib2:10.6.201.2@o2ib2,10.6.201.102@o2ib2:10.6.201.3@o2ib2,10.6.201.103@o2ib2:10.6.201.4@o2ib2,10.6.201.104@o2ib2:/gadisys/images/rocky-8.5-compute-02:localflock rd.neednet=1 rd.net.timeout.carrier=120 rd.net.timeout.ifup=120 rd.net.timeout.iflink=120 ETHMACID=4c:52:62:2b:cc:00 nohz_full=all rcu_nocbs=all rd.driver.blacklist=cdc_ether,nouveau transparent_hugepage=madvise halfroot=134217728:half-root:zstd lustreroot.nosubtree=1

ibstatus:

$ ibstatus
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:b859:9f03:0006:fdea
base lid: 0x55b
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 100 Gb/sec (2X HDR)
link_layer: InfiniBand

I'm running on a single node with this PBS script:

#!/bin/bash
#PBS -l ncpus=56
#PBS -l ngpus=4
#PBS -l mem=500GB

# --- Intel MPI
module unload openmpi
module load intel-mpi/2021.6.0
export I_MPI_ASYNC_PROGRESS=1

MPI_OPTIONS="-ppn 4 -np $PBS_NGPUS -bind-to numa"

echo "######### START #########"
echo Running on `hostname`
echo Dir is `pwd`
echo "Using MPI_OPTIONS = $MPI_OPTIONS"

mpirun=mpirun

min_msgsize=0
max_msgsize=`expr 2 \* 1024 \* 1024`
msgsize=$min_msgsize
while [ $msgsize -le $max_msgsize ]
do
    command="$mpirun $MPI_OPTIONS ./mpi_overhead -b 1.075 -t 2 --msgsize $msgsize $1"
    if [ $msgsize -gt $min_msgsize ]; then
        command="$command --nohdr"
    fi
    $command

    if [ $msgsize -eq 0 ]; then
        msgsize=2
    else
        msgsize=`expr $msgsize \* 2`
    fi
done

 

BenWibking
Beginner

Running with I_MPI_DEBUG=10 FI_LOG_LEVEL=debug mpirun -np 2 ./mpi_overhead, I get:

 

[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): Load tuning file: "/apps/intel-mpi/2021.6.0/etc/tuning_icx_shm-ofi_mlx_100.dat"
[0] MPI startup(): threading: mode: handoff
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: progress_threads: 1
[0] MPI startup(): threading: async_progress: 1
[0] MPI startup(): threading: lock_level: nolock
[0] MPI startup(): tag bits available: 20 (TAG_UB value: 1048575)
[0] MPI startup(): source bits available: 21 (Maximal number of rank: 2097151)
[1] MPI startup(): global_rank 1, local_rank 1, local_size 2, threads_per_node 2


libfabric:3932129:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:3932129:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:3932128:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:3932128:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:3932128:core:core:ze_hmem_dl_init():422<warn> Failed to dlopen libze_loader.so
libfabric:3932128:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:3932128:core:mr:ofi_default_cache_size():78<info> default cache size=2412190226
libfabric:3932129:core:core:ze_hmem_dl_init():422<warn> Failed to dlopen libze_loader.so
libfabric:3932129:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:3932129:core:mr:ofi_default_cache_size():78<info> default cache size=2412190226
libfabric:3932129:core:core:ofi_register_provider():474<info> registering provider: verbs (113.20)
libfabric:3932129:core:core:ofi_register_provider():474<info> registering provider: verbs (113.20)
libfabric:3932128:core:core:ofi_register_provider():474<info> registering provider: verbs (113.20)
libfabric:3932129:core:core:ofi_register_provider():474<info> registering provider: tcp (113.20)
libfabric:3932128:core:core:ofi_register_provider():474<info> registering provider: verbs (113.20)
libfabric:3932128:core:core:ofi_register_provider():474<info> registering provider: tcp (113.20)
libfabric:3932129:core:core:ofi_register_provider():474<info> registering provider: sockets (113.20)
libfabric:3932128:core:core:ofi_register_provider():474<info> registering provider: sockets (113.20)
libfabric:3932128:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:3932128:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:3932128:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ZE not supported
libfabric:3932128:core:core:ofi_register_provider():474<info> registering provider: shm (114.0)
libfabric:3932129:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:3932129:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:3932129:core:core:ofi_hmem_init():222<info> Hmem iface FI_HMEM_ZE not supported
libfabric:3932129:core:core:ofi_register_provider():474<info> registering provider: shm (114.0)
libfabric:3932128:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:3932128:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:3932129:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:3932129:core:core:ofi_hmem_init():209<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:3932128:core:core:ze_hmem_dl_init():422<warn> Failed to dlopen libze_loader.so
libfabric:3932128:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:3932128:core:core:ofi_register_provider():474<info> registering provider: ofi_rxm (113.20)
libfabric:3932129:core:core:ze_hmem_dl_init():422<warn> Failed to dlopen libze_loader.so
libfabric:3932129:core:core:ofi_hmem_init():214<warn> Failed to initialize hmem iface FI_HMEM_ZE: No data available
libfabric:3932129:core:core:ofi_register_provider():474<info> registering provider: ofi_rxm (113.20)
libfabric:3932128:psm3:core:fi_prov_ini():752<info> build options: VERSION=1102.0=11.2.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0
libfabric:3932128:core:core:ofi_register_provider():474<info> registering provider: psm3 (1102.0)
libfabric:3932129:psm3:core:fi_prov_ini():752<info> build options: VERSION=1102.0=11.2.0.0, HAVE_PSM3_src=1, PSM3_CUDA=0
libfabric:3932129:core:core:ofi_register_provider():474<info> registering provider: psm3 (1102.0)
libfabric:3932129:core:core:ofi_register_provider():474<info> registering provider: mlx (1.4)
libfabric:3932129:core:core:ofi_register_provider():474<info> registering provider: ofi_hook_noop (113.20)
libfabric:3932129:core:core:fi_getinfo_():1138<info> Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:3932129:mlx:core:mlx_getinfo():211<info> primary detected device: mlx5_0
libfabric:3932129:mlx:core:mlx_getinfo():254<info> used inject size = 1024
libfabric:3932129:mlx:core:mlx_getinfo():301<info> Loaded MLX version 1.12.0
libfabric:3932129:mlx:core:mlx_getinfo():348<warn> MLX: spawn support 0
libfabric:3932129:core:core:fi_getinfo_():1161<info> Since mlx can be used, psm3 has been skipped. To use psm3, please, set FI_PROVIDER=psm3
libfabric:3932129:core:core:fi_getinfo_():1161<info> Since mlx can be used, verbs has been skipped. To use verbs, please, set FI_PROVIDER=verbs
libfabric:3932129:core:core:fi_getinfo_():1161<info> Since mlx can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
libfabric:3932129:core:core:fi_getinfo_():1161<info> Since mlx can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:3932129:core:core:fi_getinfo_():1161<info> Since mlx can be used, shm has been skipped. To use shm, please, set FI_PROVIDER=shm
libfabric:3932129:core:core:fi_getinfo_():1138<info> Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:3932129:mlx:core:mlx_getinfo():211<info> primary detected device: mlx5_0
libfabric:3932129:mlx:core:mlx_getinfo():254<info> used inject size = 1024
libfabric:3932129:mlx:core:mlx_getinfo():301<info> Loaded MLX version 1.12.0
libfabric:3932129:mlx:core:mlx_getinfo():348<warn> MLX: spawn support 0
libfabric:3932129:core:core:fi_getinfo_():1161<info> Since mlx can be used, psm3 has been skipped. To use psm3, please, set FI_PROVIDER=psm3
libfabric:3932129:core:core:fi_getinfo_():1161<info> Since mlx can be used, verbs has been skipped. To use verbs, please, set FI_PROVIDER=verbs
libfabric:3932129:core:core:fi_getinfo_():1161<info> Since mlx can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
libfabric:3932129:core:core:fi_getinfo_():1161<info> Since mlx can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:3932129:core:core:fi_getinfo_():1161<info> Since mlx can be used, shm has been skipped. To use shm, please, set FI_PROVIDER=shm
libfabric:3932129:mlx:core:mlx_fabric_open():172<info>
libfabric:3932129:core:core:fi_fabric_():1423<info> Opened fabric: mlx
libfabric:3932129:mlx:core:ofi_check_rx_attr():786<info> Tx only caps ignored in Rx caps
libfabric:3932129:mlx:core:ofi_check_tx_attr():884<info> Rx only caps ignored in Tx caps
libfabric:3932128:core:core:ofi_register_provider():474<info> registering provider: mlx (1.4)
libfabric:3932128:core:core:ofi_register_provider():474<info> registering provider: ofi_hook_noop (113.20)
libfabric:3932128:core:core:fi_getinfo_():1138<info> Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:3932128:mlx:core:mlx_getinfo():211<info> primary detected device: mlx5_0
libfabric:3932128:mlx:core:mlx_getinfo():254<info> used inject size = 1024
libfabric:3932128:mlx:core:mlx_getinfo():301<info> Loaded MLX version 1.12.0
libfabric:3932128:mlx:core:mlx_getinfo():348<warn> MLX: spawn support 0
libfabric:3932128:core:core:fi_getinfo_():1161<info> Since mlx can be used, psm3 has been skipped. To use psm3, please, set FI_PROVIDER=psm3
libfabric:3932128:core:core:fi_getinfo_():1161<info> Since mlx can be used, verbs has been skipped. To use verbs, please, set FI_PROVIDER=verbs
libfabric:3932128:core:core:fi_getinfo_():1161<info> Since mlx can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
libfabric:3932128:core:core:fi_getinfo_():1161<info> Since mlx can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:3932128:core:core:fi_getinfo_():1161<info> Since mlx can be used, shm has been skipped. To use shm, please, set FI_PROVIDER=shm
libfabric:3932128:core:core:fi_getinfo_():1138<info> Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:3932128:mlx:core:mlx_getinfo():211<info> primary detected device: mlx5_0
libfabric:3932128:mlx:core:mlx_getinfo():254<info> used inject size = 1024
libfabric:3932128:mlx:core:mlx_getinfo():301<info> Loaded MLX version 1.12.0
libfabric:3932128:mlx:core:mlx_getinfo():348<warn> MLX: spawn support 0
libfabric:3932128:core:core:fi_getinfo_():1161<info> Since mlx can be used, psm3 has been skipped. To use psm3, please, set FI_PROVIDER=psm3
libfabric:3932128:core:core:fi_getinfo_():1161<info> Since mlx can be used, verbs has been skipped. To use verbs, please, set FI_PROVIDER=verbs
libfabric:3932128:core:core:fi_getinfo_():1161<info> Since mlx can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
libfabric:3932128:core:core:fi_getinfo_():1161<info> Since mlx can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:3932128:core:core:fi_getinfo_():1161<info> Since mlx can be used, shm has been skipped. To use shm, please, set FI_PROVIDER=shm
libfabric:3932128:mlx:core:mlx_fabric_open():172<info>
libfabric:3932128:core:core:fi_fabric_():1423<info> Opened fabric: mlx
libfabric:3932128:mlx:core:ofi_check_rx_attr():786<info> Tx only caps ignored in Rx caps
libfabric:3932128:mlx:core:ofi_check_tx_attr():884<info> Rx only caps ignored in Tx caps
libfabric:3932128:mlx:core:ofi_check_rx_attr():786<info> Tx only caps ignored in Rx caps
libfabric:3932128:mlx:core:ofi_check_tx_attr():884<info> Rx only caps ignored in Tx caps
libfabric:3932129:mlx:core:ofi_check_rx_attr():786<info> Tx only caps ignored in Rx caps
libfabric:3932129:mlx:core:ofi_check_tx_attr():884<info> Rx only caps ignored in Tx caps
libfabric:3932128:mlx:core:mlx_cm_getname_mlx_format():73<info> Loaded UCP address: [262]...
libfabric:3932129:mlx:core:mlx_cm_getname_mlx_format():73<info> Loaded UCP address: [262]...
libfabric:3932128:mlx:core:mlx_av_insert():179<warn> Try to insert address #0, offset=0 (size=1) fi_addr=0x163f930
libfabric:3932129:mlx:core:mlx_av_insert():179<warn> Try to insert address #0, offset=0 (size=1) fi_addr=0x17fb980
libfabric:3932128:mlx:core:mlx_av_insert():189<warn> address inserted
libfabric:3932128:mlx:core:mlx_av_insert():179<warn> Try to insert address #0, offset=0 (size=1) fi_addr=0x164a100
libfabric:3932129:mlx:core:mlx_av_insert():189<warn> address inserted
libfabric:3932129:mlx:core:mlx_av_insert():179<warn> Try to insert address #0, offset=0 (size=1) fi_addr=0x18061d0
libfabric:3932129:mlx:core:mlx_av_insert():189<warn> address inserted
libfabric:3932128:mlx:core:mlx_av_insert():189<warn> address inserted
pthread_setaffinity_np failed
Abort(566543) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(239)......:
MPID_Init_async_thread(667): MPID_Thread_create failed
BenWibking
Beginner

Hi,

 

It's a Rocky Linux 8.5 cluster with 2x Ice Lake CPUs per node and 1x Mellanox EDR InfiniBand (mlx5) card:

$ ibstatus
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:b8ce:f603:0083:bccc
base lid: 0x92d
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 100 Gb/sec (4X EDR)
link_layer: InfiniBand

I've attached the standard error and standard output logs when running with debugging enabled.

 

Thanks,

Ben

BenWibking
Beginner

Hi,

 

This error was caused by the cpuset excluding the second hardware thread (hyperthread) of each core; these appear as logical cores 56-111. On our PBS implementation, adding `-lother=hyperthread` to the job request is necessary to be able to use the hyperthreads on each core. Adding this option fixed the problem.
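
For anyone hitting the same error, a quick sketch of how to check the cpuset a PBS job actually received (standard Linux tools, nothing site-specific assumed):

# run inside the job script or an interactive job shell
grep Cpus_allowed_list /proc/self/status   # logical CPUs the job's cpuset allows
taskset -cp $$                             # same information via taskset
nproc                                      # count of usable logical CPUs

Before the fix, this list contained only the first hardware thread of each core (cores 0-55 on these nodes); with the request added to the job header (directive form of the same option, assuming your PBS accepts it there), hyperthreads 56-111 become available as well:

#PBS -l other=hyperthread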

 

This should be considered a bug in Intel MPI, since it should query the cpuset itself and not attempt to pin threads to cores the job doesn't have access to. At the very least, it should print an error message that is understandable to the user in this case.
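
A possible workaround while staying inside the original cpuset, untested here: Intel MPI documents I_MPI_ASYNC_PROGRESS_PIN for pinning the progress threads to an explicit core list, so they could be placed on cores the job does own instead of the missing hyperthreads. A sketch (the core numbers are illustrative only):

export I_MPI_ASYNC_PROGRESS=1
export I_MPI_ASYNC_PROGRESS_THREADS=1
# one pinning target per local rank (-ppn 4 here); pick cores inside the
# job's cpuset, spread across the node
export I_MPI_ASYNC_PROGRESS_PIN=13,27,41,55
mpirun -ppn 4 -np $PBS_NGPUS -bind-to numa ./mpi_overhead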

 

Regards,
Ben

VarshaS_Intel
Moderator

Hi,


We have not heard back from you, so this thread will no longer be monitored by Intel. If you need additional information, please post a new question.


Thanks & Regards,

Varsha

