Wang__Shuo

Beginner

04-11-2020
11:29 PM

MKL matmul with avx 512 shows bad performance on matrix with certain input size

**Description**: For Intel MKL compiled with AVX-512 support, **matmul** performance is poor for certain matrix sizes. Let C = np.matmul(A, B), where A.shape = (**m**, **k**) and B.shape = (**k**, **n**). If **m** < 192 and **n** is a multiple of 1024, performance is worse than expected.

For example, on my machine, which has an "Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz", with A.shape = (191, 20000) and B.shape = (20000, 1024), np.matmul(A, B) takes 120 ms (with *export OMP_NUM_THREADS=1*). With A.shape = (191, 20000) and B.shape = (20000, 1023) or (20000, 1025), np.matmul(A, B) takes 80 ms. With A.shape = (192, 20000) and B.shape = (20000, 1024), np.matmul(A, B) takes 75 ms.

I ran many experiments and found that performance is poor whenever **m** < 192 and **n** is 1024, 2048, 3072, and so on; the value of **k** does not seem to matter. The tests above used numpy with the MKL backend installed by Anaconda; intel-tensorflow shows the same result.

**Operating system and version** : CentOS Linux release 7.4.1708

**Library version**: Intel Optimized tensorflow 1.15.0 installed with "pip install intel-tensorflow==1.15.0", and numpy 1.18.1 shipped with Anaconda

**Compiler version**: gcc 4.8.5

Steps to reproduce the error (include makefiles, command lines, small test cases, and build instructions)

    import numpy as np
    import time

    a = np.random.random((191,20000)).astype(np.float32)
    b = np.random.random((20000,1024)).astype(np.float32)
    for i in range(20):
        time1 = time.time()
        c = np.matmul(a,b)
        time2 = time.time()
        print(time2 - time1)
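To compare the slow and fast shapes side by side, the reproduction can be extended into a small sweep. This is only a sketch, not part of the original test: the helper names `time_matmul` and `sweep`, the warm-up runs, and the use of a median are my own assumptions. It pins `OMP_NUM_THREADS` to 1 before importing numpy, matching the single-threaded setting described above.

```python
import os

# Assumption: single-threaded, as in the report. This must be set before
# numpy (and its BLAS backend) is imported; an existing value is kept.
os.environ.setdefault("OMP_NUM_THREADS", "1")

import time

import numpy as np


def time_matmul(m, k, n, repeats=20, warmup=2):
    """Return the median wall-clock seconds of np.matmul for (m, k) x (k, n) float32 inputs."""
    a = np.random.random((m, k)).astype(np.float32)
    b = np.random.random((k, n)).astype(np.float32)
    # Untimed warm-up runs so one-time initialization does not skew the samples.
    for _ in range(warmup):
        np.matmul(a, b)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        np.matmul(a, b)
        samples.append(time.perf_counter() - start)
    return float(np.median(samples))


def sweep(ms=(191, 192), ns=(1023, 1024, 1025), k=20000, repeats=20):
    """Time every (m, n) combination and return {(m, n): median seconds}."""
    return {(m, n): time_matmul(m, k, n, repeats) for m in ms for n in ns}
```

Calling `sweep()` with its defaults times the six shapes discussed above; for example, `for (m, n), s in sweep().items(): print(m, n, s * 1e3)` prints the median time in milliseconds for each shape. Other multiples of 1024 can be checked by passing, say, `ns=(2047, 2048, 2049)`.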

Working compiler, tool, or library version, and accelerator driver version (for regressions)

1 Reply

Gennady_F_Intel

Moderator

07-07-2020
06:58 AM

You can report this problem to the MKL team through the Intel Online Service Center.
