We are trying to use the non-blocking API (MPI_Iallreduce) in a computation-intensive program. We ran it on two nodes (Xeon Phi), and Intel Trace Analyzer shows that the two nodes are not balanced: one node spends more time in Iallreduce (possibly in the sum?). Can we create a thread so that the Iallreduce/sum runs on one specific core, in parallel with the user code (OpenMP)? Or is there an API or configuration option in Intel MPI that can do this? Thanks.
Si, Zhuowei (Intel) wrote:
Hi Zhoulong, for hybrid MPI/OpenMP programming, please refer to "Beginning Hybrid MPI/OpenMP Development" and "Running an MPI/OpenMP* Program". For process pinning, please refer to "Environment Variables for Process Pinning" and "Interoperability with OpenMP API". Thank you.
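
As a rough illustration of what the pinning and OpenMP interoperability settings can look like, here is a minimal sketch of an environment setup. The variable names are Intel MPI / Intel OpenMP settings, but every value (thread count, core number) is a placeholder assumed for the example, not a recommendation for your machine. Asynchronous progress threads may be one way to let the Iallreduce advance alongside the OpenMP code; whether they are available, and how they behave, depends on the Intel MPI version and library configuration, so please check the reference for your release.

```shell
# Sketch only: all values are illustrative assumptions.

# One pinning domain per MPI rank, sized to the OpenMP thread count.
export I_MPI_PIN_DOMAIN=omp
export OMP_NUM_THREADS=4          # placeholder thread count per rank

# Keep OpenMP threads close together inside the rank's domain (Intel OpenMP runtime).
export KMP_AFFINITY=compact

# Ask Intel MPI for a helper thread that progresses non-blocking operations,
# and pin it to one logical CPU (here CPU 0, a placeholder).
export I_MPI_ASYNC_PROGRESS=1
export I_MPI_ASYNC_PROGRESS_PIN=0

# Hypothetical launch, one rank per node (./app is a made-up binary name):
# mpirun -n 2 -ppn 1 ./app
```

If the helper thread shares a core with the OpenMP threads it will compete with them, so it may be worth leaving one core out of the compute set for it.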