I'm running a very simple MKL BLAS matrix-matrix and matrix-vector multiplication on a computer with two AMD EPYC 7443 24-core processors and 1007 GB of RAM.
The code, compile line, and test results are given below.
MKL appears to multithread only the mat-mat operation, not the mat-vec: as the output below shows, ZGEMM drops from 10.94 s on 1 thread to 606 ms on 32 threads, while ZGEMV stays at roughly 11.6 to 12.2 ms regardless of the thread count.
How can I make the mat-vec operation multithreaded?
What am I doing wrong?
Here's the code:
program main
  use blas95
  implicit none
  integer, parameter :: lp = kind(DBLE(1.0))   ! double-precision kind
  integer :: m, n, i
  complex(kind=lp), dimension(:), allocatable :: x, y
  complex(kind=lp), dimension(:,:), allocatable :: A, B, C

  m=2**12
  n=2**12
  allocate(A(m,n))
  allocate(B(n,m),C(m,m))
  allocate(x(n),y(m))

  ! Mat-mat (ZGEMM) with 1, 2, 4, 8, 16 and 32 threads
  do i=0,5
    call mkl_set_num_threads_local(2**i)
    call mkl_set_dynamic(0)
    call gemm(A,B,C)
  end do

  ! Mat-vec (ZGEMV) with the same thread counts
  do i=0,5
    call mkl_set_num_threads_local(2**i)
    call mkl_set_dynamic(0)
    call gemv(A,x,y)
  end do
end program main
Here's my compile line:
gfortran -Ofast -I$MKLROOT/include -I$BLASROOT/include/intel64/lp64 main.F90 -L$MKLROOT/lib/intel64 -o main -lgomp -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core $BLASROOT/lib/intel64/libmkl_blas95_lp64.a
Here's the output:
MKL_VERBOSE oneMKL 2022.0 Product build 20211112 for Intel(R) 64 architecture Intel(R) Architecture processors, Lnx 1.79GHz lp64 gnu_thread
MKL_VERBOSE ZGEMM(N,N,4096,4096,4096,0x7fff21099cf0,0x154a1f17b010,4096,0x154a0f17a010,4096,0x7fff21099ce0,0x1549ff179010,4096) 10.94s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:1
MKL_VERBOSE ZGEMM(N,N,4096,4096,4096,0x7fff21099cf0,0x154a1f17b010,4096,0x154a0f17a010,4096,0x7fff21099ce0,0x1549ff179010,4096) 5.90s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:2
MKL_VERBOSE ZGEMM(N,N,4096,4096,4096,0x7fff21099cf0,0x154a1f17b010,4096,0x154a0f17a010,4096,0x7fff21099ce0,0x1549ff179010,4096) 3.76s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:4
MKL_VERBOSE ZGEMM(N,N,4096,4096,4096,0x7fff21099cf0,0x154a1f17b010,4096,0x154a0f17a010,4096,0x7fff21099ce0,0x1549ff179010,4096) 1.59s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:8
MKL_VERBOSE ZGEMM(N,N,4096,4096,4096,0x7fff21099cf0,0x154a1f17b010,4096,0x154a0f17a010,4096,0x7fff21099ce0,0x1549ff179010,4096) 925.07ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:16
MKL_VERBOSE ZGEMM(N,N,4096,4096,4096,0x7fff21099cf0,0x154a1f17b010,4096,0x154a0f17a010,4096,0x7fff21099ce0,0x1549ff179010,4096) 606.32ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:32
MKL_VERBOSE ZGEMV(N,4096,4096,0x7fff21099d10,0x154a1f17b010,4096,0x1d59930,1,0x7fff21099d00,0x1d69940,1) 12.23ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:1
MKL_VERBOSE ZGEMV(N,4096,4096,0x7fff21099d10,0x154a1f17b010,4096,0x1d59930,1,0x7fff21099d00,0x1d69940,1) 11.68ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:2
MKL_VERBOSE ZGEMV(N,4096,4096,0x7fff21099d10,0x154a1f17b010,4096,0x1d59930,1,0x7fff21099d00,0x1d69940,1) 11.66ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:4
MKL_VERBOSE ZGEMV(N,4096,4096,0x7fff21099d10,0x154a1f17b010,4096,0x1d59930,1,0x7fff21099d00,0x1d69940,1) 11.62ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:8
MKL_VERBOSE ZGEMV(N,4096,4096,0x7fff21099d10,0x154a1f17b010,4096,0x1d59930,1,0x7fff21099d00,0x1d69940,1) 11.64ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:16
MKL_VERBOSE ZGEMV(N,4096,4096,0x7fff21099d10,0x154a1f17b010,4096,0x1d59930,1,0x7fff21099d00,0x1d69940,1) 11.60ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:32
And here's the result of running only the mat-vec with a larger matrix and vector (m = n = 65536):
MKL_VERBOSE oneMKL 2022.0 Product build 20211112 for Intel(R) 64 architecture Intel(R) Architecture processors, Lnx 1.79GHz lp64 gnu_thread
MKL_VERBOSE ZGEMV(N,65536,65536,0x7fff04973380,0x14f20a01e010,65536,0x1502125d9010,1,0x7fff04973370,0x14d209f1b010,1) 4.89s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:1
MKL_VERBOSE ZGEMV(N,65536,65536,0x7fff04973380,0x14f20a01e010,65536,0x1502125d9010,1,0x7fff04973370,0x14d209f1b010,1) 4.87s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:2
MKL_VERBOSE ZGEMV(N,65536,65536,0x7fff04973380,0x14f20a01e010,65536,0x1502125d9010,1,0x7fff04973370,0x14d209f1b010,1) 4.90s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:4
MKL_VERBOSE ZGEMV(N,65536,65536,0x7fff04973380,0x14f20a01e010,65536,0x1502125d9010,1,0x7fff04973370,0x14d209f1b010,1) 4.90s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:8
MKL_VERBOSE ZGEMV(N,65536,65536,0x7fff04973380,0x14f20a01e010,65536,0x1502125d9010,1,0x7fff04973370,0x14d209f1b010,1) 4.90s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:16
MKL_VERBOSE ZGEMV(N,65536,65536,0x7fff04973380,0x14f20a01e010,65536,0x1502125d9010,1,0x7fff04973370,0x14d209f1b010,1) 4.90s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:32
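For reading logs like the ones above, here is a minimal sketch that turns MKL_VERBOSE lines into a per-call speedup table. The function name mkl_verbose_speedup is made up for illustration, and the parsing assumes the line layout shown in this post (elapsed time in the third field, NThr: in the last one).

```shell
#!/bin/sh
# Summarise MKL_VERBOSE timings for one routine (e.g. ZGEMM or ZGEMV).
# Reads MKL_VERBOSE lines on stdin and prints "threads time_ms speedup"
# per call, with speedup measured against the first matching call.
# mkl_verbose_speedup is a hypothetical helper, not part of oneMKL.
mkl_verbose_speedup() {
    awk -v routine="$1" '
        # Keep only call lines such as "MKL_VERBOSE ZGEMV(...) 11.60ms ... NThr:32";
        # the banner line ("MKL_VERBOSE oneMKL 2022.0 ...") does not match.
        $1 == "MKL_VERBOSE" && index($2, routine "(") == 1 {
            t = $3
            # Normalise the elapsed time to milliseconds.
            if      (t ~ /ms$/) { sub(/ms$/, "", t); ms = t + 0 }
            else if (t ~ /us$/) { sub(/us$/, "", t); ms = t / 1000 }
            else if (t ~ /ns$/) { sub(/ns$/, "", t); ms = t / 1000000 }
            else                { sub(/s$/,  "", t); ms = t * 1000 }
            # Thread count comes from the trailing "NThr:<n>" field.
            n = $NF; sub(/^NThr:/, "", n)
            if (!seen) { base = ms; seen = 1 }
            printf "%d %.2f %.2f\n", n, ms, base / ms
        }'
}
```

Applied to the first log above, this reports about 18x for ZGEMM at 32 threads (10.94 s down to 606.32 ms) but only about 1.05x for ZGEMV (12.23 ms down to 11.60 ms). Assuming the log is produced by setting the MKL_VERBOSE=1 environment variable, something like `MKL_VERBOSE=1 ./main 2>&1 | mkl_verbose_speedup ZGEMV` should give the table directly.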
Hi Astor,
Thanks for posting in Intel communities.
We tried compiling your code but got the following error:
Fatal Error: Reading module ‘blas95’ at line 1 column 2: Unexpected EOF
We are using the following compile line:
gfortran -Ofast -I$MKLROOT/include -I/opt/intel/oneapi/mkl/latest/include/intel64/lp64 main.F90 -L$MKLROOT/lib/intel64 -o main -lgomp -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_blas95_lp64.a
There might be some mismatch between your mod files and ours.
Could you please give us the following so we can look into your issue further?
- The MKL version you are using.
- The blas95.mod file.
Thanks and Regards,
Praneeth Achanta
Hi Praneeth,
Thanks for your assistance.
The error you see is most likely because your blas95 interface was not built with gfortran for your architecture.
To fix it, you can build the interface yourself by following the instructions in the Developer Guide.
Unfortunately, I cannot give you the blas95.mod file, since this forum does not allow attachments of that type and throws an error.
The version of MKL is 2022.0.2.
Please let me know of any other information you need.
Again, thanks for your assistance.
Best regards,
Astor
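The build described above can be sketched roughly as follows. This assumes the oneMKL install ships the interface sources under $MKLROOT/interfaces/blas95 and that its makefile accepts the libintel64 target with the interface, FC and INSTALL_DIR options, which is how I recall the Developer Guide describing it; please check the guide for your version. The function name and the default prefix are hypothetical.

```shell
#!/bin/sh
# Build the oneMKL BLAS95 interface with gfortran into a user directory.
# Assumptions (not confirmed in this thread): MKLROOT points at a oneMKL
# install that contains interfaces/blas95, and the makefile there takes
# the target/options used below. build_blas95_gfortran and the default
# prefix are hypothetical names chosen for this sketch.
build_blas95_gfortran() {
    prefix="${1:-$HOME/blas95-gfortran}"
    src="${MKLROOT:-}/interfaces/blas95"
    if [ -z "${MKLROOT:-}" ] || [ ! -d "$src" ]; then
        echo "MKLROOT is unset or $src is missing" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    # INSTALL_DIR asks the makefile to place the .mod files and the static
    # library under the given prefix rather than inside the MKL tree.
    ( cd "$src" && make libintel64 interface=lp64 FC=gfortran INSTALL_DIR="$prefix" )
}
```

With the prefix exported as BLASROOT, the module files should end up under $BLASROOT/include/intel64/lp64 and the library at $BLASROOT/lib/intel64/libmkl_blas95_lp64.a, which are the paths used in the compile line at the top of the thread.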
Hi Astor,
Thank you for the information.
We ran your code on Intel Sapphire Rapids; the results are shown in the attached file.
We can only offer direct support for Intel hardware platforms that the Intel® oneAPI product supports. Please see this link for a list of all supported processors.
Please let us know if it works as intended on Intel processors for you.
Thanks and Regards,
Praneeth Achanta
Hi Astor,
We have not heard back from you. Could you give us an update on your issue?
Thanks and Regards,
Praneeth Achanta
Hi Astor,
We have not heard back from you. We hope the information provided helped. If you need any additional help, please post a new question, as this thread will no longer be monitored by Intel.
Thanks and Regards,
Praneeth Achanta
