Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

MKL matmul with AVX-512 shows poor performance for certain matrix sizes

Wang__Shuo

Description: With Intel MKL built with AVX-512 support, matmul performs poorly for certain matrix sizes. Let C = np.matmul(A, B), where A.shape = (m, k) and B.shape = (k, n). If m < 192 and n is a multiple of 1024, performance is worse than expected.

For example, on my machine, which has the CPU "Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz", with A.shape = (191, 20000) and B.shape = (20000, 1024), np.matmul(A, B) takes 120 ms (with export OMP_NUM_THREADS=1). With A.shape = (191, 20000) and B.shape = (20000, 1023) or (20000, 1025), np.matmul(A, B) takes 80 ms. With A.shape = (192, 20000) and B.shape = (20000, 1024), np.matmul(A, B) takes 75 ms.

I ran many experiments and found that whenever m < 192 and n is 1024, 2048, 3072, ..., performance is bad; the value of k does not seem to matter. The tests above were done using numpy with the MKL backend installed by Anaconda; intel-tensorflow shows the same result.

Operating system and version : CentOS Linux release 7.4.1708

Library version: Intel Optimized TensorFlow 1.15.0 installed with "pip install intel-tensorflow==1.15.0", and numpy 1.18.1 shipped with Anaconda

Compiler version: gcc 4.8.5

Steps to reproduce the error (include makefiles, command lines, small test cases, and build instructions)
 

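import time

import numpy as np

# float32 inputs with m = 191 (< 192) and n = 1024 (a multiple of 1024): the slow case
a = np.random.random((191, 20000)).astype(np.float32)
b = np.random.random((20000, 1024)).astype(np.float32)

for i in range(20):
    # perf_counter is a monotonic, high-resolution clock suited to timing short calls
    start = time.perf_counter()
    c = np.matmul(a, b)
    elapsed = time.perf_counter() - start
    print(elapsed)  # seconds for this matmul call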
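
To make the comparison between the shapes in the description easier to repeat, the reproduction script could be extended into a small sweep over m and n. This is only a sketch: the helper name time_matmul, the warm-up call, and the choice to report the best of several runs are my own assumptions, not part of the original test. It also sets OMP_NUM_THREADS from inside Python before numpy is imported, which is assumed to be equivalent to the export used above.

```python
import os

# Assumption: pin to one thread before NumPy (and its BLAS backend) is loaded,
# mirroring `export OMP_NUM_THREADS=1` from the report.
os.environ.setdefault("OMP_NUM_THREADS", "1")

import time

import numpy as np


def time_matmul(m, k, n, repeats=10):
    """Return the best wall-clock time (seconds) of an (m, k) @ (k, n) float32 matmul.

    Hypothetical helper for illustration only.
    """
    a = np.random.random((m, k)).astype(np.float32)
    b = np.random.random((k, n)).astype(np.float32)
    np.matmul(a, b)  # warm-up call, excluded from the timing
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    k = 20000
    # Shapes on both sides of the reported thresholds (m = 192, n = 1024).
    for m in (191, 192):
        for n in (1023, 1024, 1025):
            t = time_matmul(m, k, n)
            print(f"m={m} n={n} k={k}: {t * 1e3:.1f} ms")
```

If the slowdown is present, the (191, 1024) row should stand out from the other five; adding 2048 and 3072 to the n loop would check the other multiples mentioned in the description.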


Gennady_F_Intel
Moderator

You can report this problem to the MKL team through the Intel Online Service Center.

