Solved: MKL_DIRECT_CALL and ICX

AndrewC · ‎07-18-2022

Are there any limitations/caveats when using ICX (Intel C++ Compiler 2022.2) with MKL_DIRECT_CALL? Looking through some of the headers, I notice some __INTEL_COMPILER blocks (which are ICL-specific). It seems to me that ICX disables MKL_DIRECT_CALL. The snippet below is from mkl_direct.h in MKL 2022.1.0. When MKL_DC_USE_C is 0, many of the direct calls are skipped. I am not sure why this is restricted to __INTEL_COMPILER. Simply forcing MKL_DC_USE_C to 1 for ICX seems to work just fine. I suppose it's not clear to me whether MKL_DIRECT_CALL is abandonware now?

#ifdef __INTEL_COMPILER
#define MKL_DC_USE_C 1
#if (__INTEL_COMPILER <= 1500)
#define MKL_DC_POTRF_DISABLE 1
#else
#define MKL_DC_POTRF_DISABLE 0
#endif
#elif defined(__GNUC__)
#if defined(__STRICT_ANSI__) && !defined(__STDC_VERSION__)
#define MKL_DC_USE_C 0
#else
#define MKL_DC_USE_C 1
#endif
#define MKL_DC_POTRF_DISABLE 1
#else
#define MKL_DC_USE_C 0
#endif
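
To see which branch of the snippet above a given compiler actually takes, a small probe can mirror the same preprocessor structure. This is only a sketch: DEMO_DC_USE_C, DEMO_DC_BRANCH and the demo_* functions are hypothetical names invented for illustration and are not part of MKL. The check for __INTEL_LLVM_COMPILER assumes that ICX defines that macro instead of __INTEL_COMPILER, which should be confirmed against the compiler documentation.

```c
#include <stdio.h>

/* Mirrors the branch structure of the quoted mkl_direct.h snippet.
   DEMO_DC_USE_C and DEMO_DC_BRANCH are hypothetical names, not MKL macros. */
#ifdef __INTEL_COMPILER
#define DEMO_DC_USE_C 1
#define DEMO_DC_BRANCH "__INTEL_COMPILER"
#elif defined(__GNUC__)
#if defined(__STRICT_ANSI__) && !defined(__STDC_VERSION__)
#define DEMO_DC_USE_C 0
#else
#define DEMO_DC_USE_C 1
#endif
#define DEMO_DC_BRANCH "__GNUC__"
#else
#define DEMO_DC_USE_C 0
#define DEMO_DC_BRANCH "fallback"
#endif

/* Value the header-style logic selected for this compiler. */
int demo_dc_use_c(void) { return DEMO_DC_USE_C; }

/* Name of the preprocessor branch that was taken. */
const char *demo_dc_branch(void) { return DEMO_DC_BRANCH; }

/* Prints the selected branch, plus the ICX macro if it is defined
   (assumption: ICX defines __INTEL_LLVM_COMPILER). */
void demo_report(void)
{
#ifdef __INTEL_LLVM_COMPILER
    printf("__INTEL_LLVM_COMPILER = %d\n", (int)__INTEL_LLVM_COMPILER);
#endif
    printf("branch: %s, DEMO_DC_USE_C = %d\n",
           demo_dc_branch(), demo_dc_use_c());
}
```

Calling demo_report() from main and building the file once with each compiler shows whether that compiler falls through to the final #else branch, which is the case where the direct-call size checks are switched off.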



Gennady_F_Intel · ‎07-22-2022

Andrew,

MKL_DIRECT_CALL is not abandonware. We need to check this version.

Checking the small gemm calls with and without direct call, I see the following perf results (MKL v2022.1.0):

icx --version

Intel(R) oneAPI DPC++/C++ Compiler 2022.1.0 (2022.1.0.20220316)

[2 x 2], SGEMM Execution Time == 6.798655e-08 sec

[2 x 2], JIT_SGEMM Execution Time == 4.284084e-08 sec

....

[8 x 8], SGEMM Execution Time == 7.217750e-08 sec

[8 x 8], JIT_SGEMM Execution Time == 4.936010e-08 sec

That means direct call mode works with the JIT version of gemm as well.

-Gennady

AndrewC · ‎07-22-2022

Code compiled using ICX with MKL_DIRECT_CALL defined seems to skip the calls to the direct version (because __INTEL_COMPILER is NOT defined).

As I mentioned in my original post, the "variable" MKL_DC_USE_C is defined as 0 unless __INTEL_COMPILER is defined, so the following check

#define MKL_DC_GEMM3M_CHECKSIZE(m,n,k) (((*(m) <= 4 && *(n) <= 4 && *(k) <= 4)) && MKL_DC_USE_C)

always evaluates to FALSE, and the direct call is never made. Example below:

#define zgemm(transa,transb,m,n,k,alpha,a,lda,b,ldb,beta,c,ldc)  MKL_DC_ZGEMM_CONVERT(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
#define zgemm_(transa,transb,m,n,k,alpha,a,lda,b,ldb,beta,c,ldc) MKL_DC_ZGEMM_CONVERT(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)
#define ZGEMM(transa,transb,m,n,k,alpha,a,lda,b,ldb,beta,c,ldc)  MKL_DC_ZGEMM_CONVERT(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)

/* ZGEMM3M */
#define MKL_DC_ZGEMM3M_CONVERT(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc)  do { \
    if (MKL_DC_GEMM3M_CHECKSIZE(m,n,k)) { \
        mkl_dc_zgemm((transa), (transb), (m), (n), (k), (alpha), (a), (lda), (b), (ldb), (beta), (c), (ldc));\
    } else {  \
        MKL_DIRECT_CALL_INIT_FLAG; \
        zgemm3m_direct((transa), (transb), (m), (n), (k), (alpha), (a), (lda), (b), (ldb), (beta), (c), (ldc), &mkl_direct_call_flag); \
    }\
} while (0)
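
The effect of the trailing `&& MKL_DC_USE_C` term can be reproduced without MKL. The sketch below copies the shape of the size check but takes the switch as a parameter, so both settings can be compared in one build. DEMO_CHECKSIZE and demo_takes_direct_path are hypothetical names for illustration only; the real macros live in mkl_direct.h.

```c
#include <stdio.h>

/* Same shape as MKL_DC_GEMM3M_CHECKSIZE, but with the switch passed in
   as use_c instead of read from a fixed macro. Hypothetical, not MKL. */
#define DEMO_CHECKSIZE(m, n, k, use_c) \
    (((*(m) <= 4 && *(n) <= 4 && *(k) <= 4)) && (use_c))

/* Returns 1 if the small-size inline branch would be taken, 0 if the
   call would fall through to the regular library entry point. */
int demo_takes_direct_path(int m, int n, int k, int use_c)
{
    if (DEMO_CHECKSIZE(&m, &n, &k, use_c)) {
        return 1;
    }
    return 0;
}

/* Prints the decision for one problem size under both switch settings. */
void demo_print_decision(int m, int n, int k)
{
    printf("[%d x %d x %d] use_c=1 -> %d, use_c=0 -> %d\n", m, n, k,
           demo_takes_direct_path(m, n, k, 1),
           demo_takes_direct_path(m, n, k, 0));
}
```

With the switch at 0 the size test is irrelevant and the small-size branch is dead code for every m, n, k; with the switch at 1 only sizes up to 4 in each dimension take it.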

Am I missing something here?

Gennady_F_Intel · ‎12-21-2022

Andrew,

Please check the latest version of MKL 2023 and let us know if the problem is still there.

Thanks,

Gennady

AndrewC · ‎12-21-2022

Will do! Thanks for following up.

AndrewC · ‎12-29-2022

Hi Gennady,

Just checked 2023.0 and it's clear that this has been taken care of.

Thanks

Andrew