Use of gather and scatter when it's not supported

Sérgio_Caldas · ‎10-25-2016

Hi there!

I am an MSc student in HPC, and I am currently working with Quantum Espresso in order to improve its performance in a cluster environment.

Recently I generated the call graph of the application on a machine with the Ivy Bridge microarchitecture (Intel® Xeon® Processor E5-2670 v2), which uses AVX as its instruction set extension and supposedly does not support gather and scatter operations with MPI. However, as you can see in the call graph excerpt at the link below, the application uses these operations through the libmkl_avx.so library, more precisely the following functions: mkl_dft_avx_gather_z_z and mkl_dft_avx_scatter_z_z.

http://imgur.com/k7754x8

Can someone help me understand why this occurs?

Thanking you in advance, yours sincerely

Sérgio Caldas

Zhen_Z_Intel · ‎10-25-2016

Hi Sérgio,

I do not really understand why you mention the gather and scatter operations of MPI. AVX instructions are normally used for vectorized code, and MKL uses them for vectorized operations. For instance, mkl_dft_avx_gather_z_z is used to access non-contiguous data elements, whereas gather in MPI is mainly for collecting elements from many processes into one process. I am afraid the gather in MKL is not used for MPI. Also, even if you use MKL, not every function needs to call AVX; for example, some BLAS Level 1 functions might not use AVX instructions.
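To illustrate the distinction, here is a minimal sketch in C of what a "gather" and "scatter" of non-contiguous complex elements can look like inside a single process. This is not MKL's actual implementation, which is not public; the type `zcomplex` and the functions `gather_z` and `scatter_z` are hypothetical names chosen for illustration. The point is only that such a routine moves data within one address space using ordinary loads and stores, while `MPI_Gather` moves data between processes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a double-precision complex number. */
typedef struct { double re, im; } zcomplex;

/* Gather: copy n elements spaced `stride` apart in src
 * into the contiguous buffer dst. */
void gather_z(const zcomplex *src, size_t stride, zcomplex *dst, size_t n)
{
    assert(stride > 0);
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i * stride];
}

/* Scatter: write n contiguous elements of src back to
 * positions spaced `stride` apart in dst. */
void scatter_z(const zcomplex *src, zcomplex *dst, size_t stride, size_t n)
{
    assert(stride > 0);
    for (size_t i = 0; i < n; ++i)
        dst[i * stride] = src[i];
}
```

Nothing in these loops depends on a particular SIMD extension, so code of this shape runs on any CPU regardless of whether a hardware gather instruction exists.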

Best regards,
Fiona

Sérgio_Caldas · ‎10-27-2016

Hi Fiona

Apologies for my previous post. I was preoccupied with a distributed-memory project and I mixed up the terms "gather & scatter" in a vector computing environment with the same terms in MPI…

I’m aware that, ever since the early Cray vector machines, vector computers have been able to access vector operands scattered in memory and store the resulting vector back in disjoint memory locations. I’m also aware that the SIMD extensions in Intel processors did not support these features until AVX2 (gather only) and AVX-512 (both gather and scatter).
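To make the terms concrete, the following C sketch shows the semantics of an indexed gather and scatter written as plain scalar loops. The function names `gather_idx` and `scatter_idx` are hypothetical and used only for illustration. On AVX2 hardware the gather loop could instead be expressed with a hardware gather intrinsic such as `_mm256_i32gather_pd`, but the scalar form below needs no such instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Gather: collect elements of src at the positions listed in idx
 * into the contiguous vector dst, i.e. dst[i] = src[idx[i]]. */
void gather_idx(const double *src, const size_t *idx, double *dst, size_t n)
{
    assert(src != NULL && idx != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[idx[i]];
}

/* Scatter: store the contiguous vector src to the positions
 * listed in idx, i.e. dst[idx[i]] = src[i]. */
void scatter_idx(const double *src, const size_t *idx, double *dst, size_t n)
{
    assert(src != NULL && idx != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[idx[i]] = src[i];
}
```

A typical use is to gather operands into a contiguous buffer, operate on the buffer, and scatter the results back to the same index list.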

However, when I ran a particular code with QE on a cluster node with dual Ivy Bridge chips (which support only AVX, not AVX2), I used a call graph to analyse which functions were called, and I noticed that the code used two AVX functions (mkl_dft_avx_gather_z_z and mkl_dft_avx_scatter_z_z) whose operations are not supported by the Ivy Bridge microarchitecture hardware.

Could you please explain what is happening?

Best Regards

Sérgio Caldas