Hi all, I'm using mpiifort to compile a set of Fortran source files that make a few LAPACK and BLAS calls. I've been able to use automatic offloading for the LAPACK routine ZGETRF (LU factorization) by increasing my problem size so that the matrices the ZGETRF call processes are larger than 8192 x 8192. However, the code also calls other LAPACK and BLAS routines that are not supported for automatic offloading. I'm wondering whether explicitly offloading some of those routines would be worth it, because in the past explicit offloading has only increased my computation time.
It's worth noting I'm using mpiifort to keep the MPICH2 calls inside the code intact. If that is the source of the slowdown, let me know.
When deciding whether to explicitly offload the other LAPACK and BLAS routines, the biggest factor to consider is the overhead of transferring data back and forth between the host and the MIC. It's hard to give a general answer. You would have to do some benchmarking to determine the trade-off between data-transfer cost and computation speed. But there are a few guidelines:
Hope this helps.
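To make the transfer-versus-compute trade-off a bit more concrete before benchmarking, here is a rough back-of-envelope sketch in Python. It is only a model, not a measurement: the function names are hypothetical, and the host rate, MIC rate, and PCIe bandwidth in `RATES` are assumed placeholder values that you would need to replace with numbers measured on your own system. The flop counts are the usual approximations for complex double precision (about 8/3 n^3 for an LU factorization like ZGETRF, about 8 n^2 for a matrix-vector product like ZGEMV).

```python
# Back-of-envelope model: offload time = device compute time + PCIe transfer time.
# All rates below are ASSUMED placeholders, not measured values.
RATES = {
    "host_gflops": 200.0,    # assumed sustained host rate
    "mic_gflops": 800.0,     # assumed sustained coprocessor rate
    "pcie_gb_per_s": 6.0,    # assumed effective host<->MIC bandwidth
}

BYTES_PER_COMPLEX16 = 16  # double-precision complex element


def offload_estimate(flops, bytes_moved, host_gflops, mic_gflops, pcie_gb_per_s):
    """Return (host_seconds, offload_seconds) for one routine call."""
    t_host = flops / (host_gflops * 1e9)
    t_transfer = bytes_moved / (pcie_gb_per_s * 1e9)
    t_offload = flops / (mic_gflops * 1e9) + t_transfer
    return t_host, t_offload


def zgetrf_cost(n):
    """Approximate flops and bytes moved for an n x n complex LU factorization."""
    flops = (8.0 / 3.0) * n ** 3
    # The matrix goes to the device and the factors come back.
    bytes_moved = 2 * BYTES_PER_COMPLEX16 * n * n
    return flops, bytes_moved


def zgemv_cost(n):
    """Approximate flops and bytes moved for y = alpha*A*x + beta*y, A being n x n."""
    flops = 8.0 * n * n
    # A, x and y go in; y comes back.
    bytes_moved = BYTES_PER_COMPLEX16 * (n * n + 3 * n)
    return flops, bytes_moved


def offload_pays_off(cost, rates=RATES):
    """True if the modeled offload time beats the modeled host time."""
    t_host, t_offload = offload_estimate(*cost, **rates)
    return t_offload < t_host


for name, cost_fn in (("ZGETRF", zgetrf_cost), ("ZGEMV", zgemv_cost)):
    t_host, t_offload = offload_estimate(*cost_fn(8192), **RATES)
    print(f"{name}: host {t_host:.4f} s, offload {t_offload:.4f} s")
```

With these assumed rates the model favors offloading the O(n^3) factorization but not the O(n^2) matrix-vector product, where moving the matrix costs far more than the arithmetic it enables. That pattern may be part of why explicit offloading slowed things down before, but only timing the real calls will tell.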