Forcing AO with MKL?

AndrewC · ‎05-07-2013

I have a large, numerically intensive C++ program that is a heavy user of Intel MKL 11.0.2 (it makes a lot of calls to zgemm, for example).

I am experimenting with AO without changing the source code, but I can't seem to get the MIC to "kick in" for typical, large problems. My first thought was to check that AO works at all on a test problem, so I wrote a simple program that uses zgemm with the same link options and environment as the main program.

In my simple test, the MIC clearly kicks in at N=4096. My problem is that I want to force the MIC to be used for small N. None of my attempts with the MKL_HOST_WORKDIVISION option changed the N at which the MIC starts being used. It would seem to me to be very useful to have some way to "force" AO, at least to validate that it is being used and for which functions.

export MKL_MIC_ENABLE=1
export OFFLOAD_REPORT=2

simple zgemm test running..
N= 32 Time = 0.186048
simple zgemm test running..
N= 64 Time = 0.069303
simple zgemm test running..
N= 128 Time = 0.000514007
simple zgemm test running..
N= 256 Time = 0.0207761
simple zgemm test running..
N= 512 Time = 0.0325701
simple zgemm test running..
N= 1024 Time = 0.176791
simple zgemm test running..
N= 2048 Time = 0.997894
simple zgemm test running..
[MKL] [MIC --] [AO Function]    ZGEMM
[MKL] [MIC --] [AO ZGEMM Workdivision] 0.23 0.77
[MKL] [MIC 00] [AO ZGEMM CPU Time]      2.869560 seconds
[MKL] [MIC 00] [AO ZGEMM MIC Time]      0.879049 seconds
[MKL] [MIC 00] [AO ZGEMM CPU->MIC Data] 486539264 bytes
[MKL] [MIC 00] [AO ZGEMM MIC->CPU Data] 872415232 bytes
N= 4096 Time = 3.1413
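
Until there is a way to force AO, one way to check which routines were actually offloaded is to scan the captured report for "[AO Function]" lines like the one above. The following is only a minimal sketch: the helper name offloaded_functions is hypothetical (it is not part of MKL), and it assumes the report keeps the line layout shown in this post.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not an MKL API): collect the routine names that
// appear on "[AO Function]" lines of a captured OFFLOAD_REPORT log.
// Assumes the line layout shown in the report above.
std::set<std::string> offloaded_functions(std::istream& log) {
    const std::string tag = "[AO Function]";
    std::set<std::string> names;
    std::string line;
    while (std::getline(log, line)) {
        const std::size_t pos = line.find(tag);
        if (pos == std::string::npos) {
            continue;  // timing lines and other report lines are skipped
        }
        // The routine name is the first token after the tag.
        std::istringstream rest(line.substr(pos + tag.size()));
        std::string name;
        if (rest >> name) {
            names.insert(name);
        }
    }
    return names;
}
```

Passing std::cin (or a std::ifstream opened on a saved log) to this function returns the set of offloaded routine names; an empty set means no AO call was reported in that run.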

Sumedh_N_Intel · ‎05-07-2013

You can use Compiler Assisted Offload (CAO) to enforce offloading of functions. Forcing Automatic Offload (AO) has not been enabled for sizes < 4096 because the performance gains from using the coprocessor are not enough to offset the slow data transfer. Also note that the other dimension should be at least 256 for automatic offload to kick in. Lastly, AO is not available for all MKL routines. The following MKL functions support AO: ?GEMM, ?TRSM, ?TRMM, LU, QR and Cholesky.

AndrewC · ‎05-07-2013

I understand that only a limited set of MKL functions support AO. I also understand that AO does not activate unless it is worthwhile. However, I see value in being able to force AO on for testing. For example, a target machine may be less powerful than a development machine.

Sumedh_N_Intel · ‎05-07-2013

Currently, this functionality is not available in Intel MKL. I will pass this feature request on to the developers. However, there is no guarantee that this feature will be implemented (that depends on their cost-benefit analysis). I'll keep you updated if I hear anything more about this from the developers.