Is it possible to use LEO directives to offload a ZGEMM call to multiple Xeon Phi coprocessors so that the work is distributed among them automatically, without my having to partition the matrices myself?
Hi, there is currently no way to target multiple Xeon Phi cards in the way you describe. With Automatic Offload, the MKL runtime may distribute the work across multiple Phi cards for you if it determines that the problem is worth running on more than one card, but there is no way to strongly suggest that the runtime do this.
Let me know if this answers your question.
Intel Developer Support
Thank you, Kenneth. I'll just partition them myself, then.
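For reference, here is a minimal sketch of what partitioning by hand could look like, assuming C = A * B with column-major storage and a split of the columns of B and C across the cards. The names `column_range`, `zgemm_panel`, and `zgemm_partitioned` are hypothetical helpers made up for this illustration, and the naive triple loop is only a stand-in for the real ZGEMM call; on actual hardware each panel would be wrapped in an offload region for its own card, as the comment indicates.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: columns [*start, *start + *count) of C that go to
   device `dev` out of `ndev`. The remainder is spread over the first devices. */
void column_range(int n, int ndev, int dev, int *start, int *count)
{
    assert(ndev > 0 && dev >= 0 && dev < ndev);
    int base = n / ndev;
    int rem = n % ndev;
    *start = dev * base + (dev < rem ? dev : rem);
    *count = base + (dev < rem ? 1 : 0);
}

/* Stand-in for ZGEMM on one column panel (column-major, no transposes):
   c(m x nj) = a(m x k) * b(k x nj). In real code this would be a call to
   MKL's zgemm executed on the coprocessor. */
void zgemm_panel(int m, int k, int nj,
                 const double complex *a, int lda,
                 const double complex *b, int ldb,
                 double complex *c, int ldc)
{
    for (int j = 0; j < nj; ++j) {
        for (int i = 0; i < m; ++i) {
            double complex sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            c[i + (size_t)j * ldc] = sum;
        }
    }
}

/* Hypothetical driver: every device needs all of A, but only its own
   column panel of B and C. */
void zgemm_partitioned(int m, int n, int k,
                       const double complex *a,
                       const double complex *b,
                       double complex *c, int ndev)
{
    for (int dev = 0; dev < ndev; ++dev) {
        int j0, nj;
        column_range(n, ndev, dev, &j0, &nj);
        if (nj == 0)
            continue;
        /* Assumption: with the Intel compiler, this is where an offload
           region for card `dev` would go, e.g.
           #pragma offload target(mic:dev) with in/out clauses for the
           panels and a signal clause so the cards run concurrently.
           Here the panel is simply computed on the host. */
        zgemm_panel(m, k, nj,
                    a, m,
                    b + (size_t)j0 * k, k,
                    c + (size_t)j0 * m, m);
    }
}
```

Calling `zgemm_partitioned(m, n, k, a, b, c, 2)` would then process two column panels, one per card. Splitting by columns keeps the panels of B and C contiguous in column-major storage, so each transfer is a single block; the cost is that A has to be sent to every card.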
One more question, though: is it possible to transfer data directly between two Xeon Phi cards (i.e., without having to copy the data to the host) while using the offload model? Alternatively, is it possible to access the memory of one Xeon Phi remotely from a different device? If so, how is that done?