
hybrid application on the Xeon Phi

Miah__Wadud

I am trying to execute a hybrid application (CP2K) on the Xeon Phi using MPI + OpenMP. I have the following environment set up:

$ export OMP_NUM_THREADS=15
$ export I_MPI_PIN_PROCESSOR_LIST=$(seq -s "," 1 $OMP_NUM_THREADS 240)
$ echo $I_MPI_PIN_PROCESSOR_LIST
1,16,31,46,61,76,91,106,121,136,151,166,181,196,211,226
$ mpirun -n $(expr 240 / $OMP_NUM_THREADS) cp2k.psmp.epcc H2O-64.inp

When I run the "top" command, it shows only the 16 MPI processes and none of the threads, and it reports the Phi system as 6.2% user busy (16 / 240 * 100). It seems like the threads are not running. Any help will be greatly appreciated.

Thanks in advance,

TimP

Did you build using -mmic -openmp ?

It looks like you are pinning each rank to the same group of thread contexts. That doesn't make sense: the ranks would have to take turns on the cores you specified, thrashing the cache.
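To see what the pin list in the question actually contains, it can be regenerated and its entries counted. This is only a sketch: it assumes GNU `seq` (other implementations may handle `-s` differently), and it shows nothing about where the 15 OpenMP threads of each rank end up, which is best checked with I_MPI_DEBUG.

```shell
#!/bin/sh
# Rebuild the list from the original post: start at 1, step by OMP_NUM_THREADS, stop at 240.
OMP_NUM_THREADS=15
PIN_LIST=$(seq -s "," 1 "$OMP_NUM_THREADS" 240)

# Count the comma-separated entries; $(( ... )) strips any padding that wc may add.
N_ENTRIES=$(( $(echo "$PIN_LIST" | tr ',' '\n' | wc -l) ))

echo "$PIN_LIST"
echo "entries: $N_ENTRIES"
```

The list holds one logical processor number per MPI rank, spaced 15 apart, which is consistent with "top" showing 16 busy processes.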

By default, Intel MPI assigns a group of cores to each rank in accordance with OMP_NUM_THREADS, so it seems better to take advantage of that, at least as a starting point. By setting I_MPI_DEBUG, you can get feedback on what it does.
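A minimal sketch of that starting point follows. The binary and input file names are taken from the original post. Two things are assumptions rather than something stated in this thread: setting I_MPI_PIN_DOMAIN=omp to make the per-rank domain size explicit, and using debug level 4 to get the pinning map printed. The script only prints the launch command, so it can be inspected before anything is run.

```shell
#!/bin/sh
# Sketch: drop the hand-built processor list and let Intel MPI place the ranks.
unset I_MPI_PIN_PROCESSOR_LIST      # remove the explicit list from the original post
export OMP_NUM_THREADS=15           # threads per rank, as in the original post
export I_MPI_PIN_DOMAIN=omp         # assumption: size each rank's domain from OMP_NUM_THREADS
export I_MPI_DEBUG=4                # assumption: level 4 is enough to print the pinning map

# 240 hardware threads divided by 15 threads per rank.
NRANKS=$((240 / OMP_NUM_THREADS))
LAUNCH="mpirun -n $NRANKS cp2k.psmp.epcc H2O-64.inp"

# Dry run: print the command instead of executing it.
echo "$LAUNCH"
```

Once the printed command looks right, it can be run directly and the pinning lines in the debug output compared against the intended layout.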

On other hybrid applications, I've found at most 3 threads per core to be effective. As Intel MPI will divide the available cores as evenly as possible among ranks, in general you need to set KMP_AFFINITY=balanced or equivalent to spread those threads evenly across each rank's group of cores. You should also bear in mind that the MPSS and MPI libraries occupy one core, so you would need a 61-core Phi if you want to spread your application over 60 cores. If you have only 60 cores, your application might do best using 50 or so.
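The thread budget described above can be worked out in a few lines of shell arithmetic. This is a sketch under stated assumptions: a 61-core card with one core left for MPSS and MPI, 3 threads per core, and a hypothetical choice of 4 cores per rank (the variable names are illustrative, not Intel MPI settings, apart from OMP_NUM_THREADS and KMP_AFFINITY).

```shell
#!/bin/sh
# Sketch: derive rank and thread counts from a per-core thread limit.
CORES_TOTAL=61                        # assumption: 61-core Xeon Phi
CORES_APP=$((CORES_TOTAL - 1))        # leave one core for MPSS and MPI
THREADS_PER_CORE=3                    # "at most 3 threads per core"
CORES_PER_RANK=4                      # hypothetical choice for illustration

NRANKS=$((CORES_APP / CORES_PER_RANK))
export OMP_NUM_THREADS=$((CORES_PER_RANK * THREADS_PER_CORE))
export KMP_AFFINITY=balanced          # spread each rank's threads across its cores

echo "ranks=$NRANKS threads_per_rank=$OMP_NUM_THREADS total_threads=$((NRANKS * OMP_NUM_THREADS))"
```

With these numbers the job uses 15 ranks of 12 threads each, 180 threads in total, rather than all 240 hardware threads.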

As you didn't get a prompt response here, and if you think that's not on account of the clutter in your post, you might try the HPC/cluster forum, as there is more Intel MPI expertise there.

Miah__Wadud
Hi Tim,

Yes, I did build it with -mmic -openmp. I am not sure what you mean by "ranks to take turns on the cores". Could you let me know how you would configure the environment variables to run a hybrid application? Do you have the link for the Intel HPC/cluster forum?

Thanks in advance,
Wadud.