Zhoulong_J_Intel
Employee
127 Views

OpenMP application performance dropped with I_MPI_ASYNC_PROGRESS=enable

Hi,

I tried MPI/OpenMP process pinning. When I use a non-blocking API (Iallreduce) and set I_MPI_ASYNC_PROGRESS as in the command below: with I_MPI_ASYNC_PROGRESS=enable, the application spends much more time in libiomp5.so (kmp_hyper_barrier_release), and vmlinux also gets a little hotter, compared with I_MPI_ASYNC_PROGRESS=disable. Is there any issue with my configuration? VTune shows that all the threads are pinned to the right cores; the only difference is that core 67 is used by the MPI communication thread.

========command=================

mpirun -n 2 -ppn 1 -genv OMP_PROC_BIND=true -genv I_MPI_ASYNC_PROGRESS=enable -genv I_MPI_ASYNC_PROGRESS_PIN=67 -genv I_MPI_PIN_PROCS=0-66 -genv OMP_NUM_THREADS=67 -genv I_MPI_PIN_DOMAIN=sock -genv I_MPI_FABRICS=ofi -f ./hostfile python train_imagenet_cpu.py --arch alex --batchsize 256 --loaderjob 68 --epoch 100 --train_root /home/jiangzho/imagenet/ILSVRC2012_img_train --val_root /home/jiangzho/imagenet/ILSVRC2012_img_val --communicator naive /home/jiangzho/train.txt /home/jiangzho/val.txt
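For anyone reproducing this without VTune, per-thread placement can also be checked from the shell by reading each task's Cpus_allowed_list under /proc. This is a minimal Linux-only sketch; it uses the current shell's PID ($$) as a stand-in so it runs standalone, and you would substitute the PID of the python training process:

```shell
# Print the allowed-core list for every thread of a process (Linux /proc).
# $$ (the current shell) is a stand-in; use the actual python PID instead.
pid=$$
for tid in /proc/"$pid"/task/*; do
  printf '%s: ' "${tid##*/}"               # thread id
  grep Cpus_allowed_list "$tid"/status     # cores this thread may run on
done
```

If the pinning works as intended, the OpenMP threads should report 0-66 and only the MPI progress thread should report 67.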

2 Replies
Zhoulong_J_Intel
Employee

Root-caused why libiomp5.so got much hotter:

I set the command as above, trying to pin the MPI communication thread on core 67 and the OpenMP threads on cores 0-66. VTune shows the MPI communication thread is indeed pinned on core 67 and OpenMP has 67 threads, but OMP thread 66 is also pinned on core 67, so it lags the whole run and gives libiomp5.so a lot of spin time. But I still haven't figured out how to make it work correctly…
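Since I_MPI_PIN_DOMAIN=sock hands the rank the whole socket, the OpenMP runtime is free to place thread 66 on core 67 alongside the progress thread. One possible workaround, sketched here but not verified on this workload, is to pin the OpenMP threads explicitly via KMP_AFFINITY (honored by the Intel OpenMP runtime) so none of them can land on the progress core:

```shell
# Sketch of an explicit-affinity workaround (assumes the Intel OpenMP runtime;
# values mirror the mpirun command above, not verified on this workload).
export OMP_NUM_THREADS=67
export KMP_AFFINITY="granularity=fine,proclist=[0-66],explicit"  # keep OpenMP threads off core 67
export I_MPI_ASYNC_PROGRESS=enable
export I_MPI_ASYNC_PROGRESS_PIN=67   # progress thread alone on core 67
# then launch as before:
# mpirun -n 2 -ppn 1 -f ./hostfile python train_imagenet_cpu.py ...
```

With an explicit proclist, the OpenMP runtime should not spill a worker onto core 67 even though the process domain includes it.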

 

Any idea? Thanks.

Zhoulong_J_Intel
Employee

Root caused, thanks.
