I found some problems working with the cblas_daxpy and cblas_dscal routines in libMKL. I wrote a C program that works with a large number of variables (the vector size is 1300²). I use only BLAS level 1 routines, and everything works fine when I use OpenBLAS and link with -lcblas, but when I switch to -lmkl_rt it does not work. My program calls only the dcopy, ddot, dscal and daxpy routines, so I checked every call until I found that the problem lies in the daxpy and dscal routines. I tested these routines in a toy example with libMKL and they work fine, but inside my program they do not.
In addition, I ran another test: I kept -lcblas on the link line but switched between OpenBLAS and libMKL using update-alternatives. Both work fine, but libMKL was noticeably slower than OpenBLAS.
I'm working with a Lenovo W540 with Intel® Core™ i7-4700MQ CPU @ 2.40GHz × 8, using Debian buster and
GNU g++ 8.3.0
OpenBLAS 0.3.5+ds-3
libMKL 2019.2.187-1
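One way to narrow down whether daxpy or dscal misbehaves inside the larger program is to compare the library result against plain loops. The sketch below is only an illustration: the names ref_dscal, ref_daxpy and max_abs_diff are hypothetical helpers, not part of the poster's program or of any BLAS library, and unit stride is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Reference for what cblas_dscal(n, alpha, x, 1) computes: x[i] = alpha * x[i]. */
static void ref_dscal(size_t n, double alpha, double *x)
{
    assert(n == 0 || x != NULL);
    for (size_t i = 0; i < n; ++i)
        x[i] *= alpha;
}

/* Reference for what cblas_daxpy(n, alpha, x, 1, y, 1) computes:
   y[i] = alpha * x[i] + y[i]. */
static void ref_daxpy(size_t n, double alpha, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Largest absolute element-wise difference between two vectors. */
static double max_abs_diff(size_t n, const double *a, const double *b)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Running the library call and the reference loop on copies of the same input and printing max_abs_diff right after the suspect call would show whether the numbers diverge at that point or whether the failure happens elsewhere.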
Thank you again for reaching us. This issue is closing and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Hi,
Thanks for reaching out to us.
>> but when I changed to -lmkl_rt doesn't work.
Please check your compile and link options against the Link Line Advisor at the link below:
https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html
>>libMKL 2019.2.187-1....libMKL was evidently slower than OpenBLAS.
Could you please try the latest version of Intel MKL, which is oneMKL 2022?
Intel MKL is available as part of the Intel® oneAPI Base Toolkit.
Link to download oneAPI Base Toolkit:
https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html
Please let us know if you face any issues, and provide a minimal reproducer (and steps to reproduce, if any) along with the timings you get for both OpenBLAS and oneMKL so that we can work on it from our end.
Regards,
Vidya.
Thanks for your kind answer.
I will answer in parts.
1) Could you please try using the latest version of Intel MKL which is oneMKL 2022?
I'm afraid that is a little difficult right now because I'm rather busy, and changes to my OS take me some time. I plan to do it in the near future.
2) Please check with the link line advisor...us a minimal reproducer (&steps to reproduce if any) along with the...
I ran two different experiments. The first was an extensive test of the libraries; everything is included at the following link (https://www.dropbox.com/s/xd6s2eynpxbuoeo/experiment1.zip?dl=0). The link will be valid for a week.
The second experiment is the main reason for my post. I'm solving a nonlinear PDE with CG, which I implemented using BLAS level 1 routines. The algorithm consists of two loops: the outer loop updates some weights and the inner loop is the CG iteration. Below are the details of my experiments, using the compile and link options from the link you provided. I hope this is helpful (sorry for the Spanish output):
*************************************************
I have a 307200 x 307200 sparse matrix with 1533760 variables. I do not use any library for handling sparse data; I wrote that part myself.
First run; this is the top of my makefile
CC = g++ -O2
CFLAGS = -m64 -Wno-deprecated -Wall -ansi `pkg-config --cflags opencv`
LDFLAGS = -lm -lcblas -lrt `pkg-config --libs opencv`
This is the output
iteracion : 68 error : 4.70732e-05
iteraciones total : 34886
Tiempo empleado : 229922 mili-segundos
Normalized error: = 0.167415
****
Second run; this is the top of my makefile
CC = g++ -O2
CFLAGS = -DMKL_ILP64 -m64 -Wno-deprecated -Wall -ansi `pkg-config --cflags opencv`
LDFLAGS = -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl -lrt `pkg-config --libs opencv`
This is the output
iteracion : 68 error : 4.64704e-05
iteraciones total : 34784
Tiempo empleado : 219650 mili-segundos
Normalized error: = 0.167413
****
Third run; this is the top of my makefile
CC = g++ -O2
CFLAGS = -m64 -Wno-deprecated -Wall -ansi `pkg-config --cflags opencv`
LDFLAGS = -lmkl_rt -Wl,--no-as-needed -lpthread -lm -ldl -lrt `pkg-config --libs opencv`
DOESN'T WORK
***
Fourth run; this is the top of my makefile
CC = g++ -O2
CFLAGS = -m64 -Wno-deprecated -Wall -ansi `pkg-config --cflags opencv`
LDFLAGS = -lm -lcblas -lrt `pkg-config --libs opencv`
In this case, however, I switched from OpenBLAS to MKL using
update-alternatives --config libblas.so.3-x86_64-linux-gnu
update-alternatives --config libblas.so-x86_64-linux-gnu
This is the output
iteracion : 67 error : 4.77133e-05
iteraciones total : 34748
Tiempo empleado : 229701 mili-segundos
Normalized error: = 0.167421
***
Fifth run; this is the top of my makefile
CC = g++ -O2
CFLAGS = -m64 -Wno-deprecated -Wall -ansi `pkg-config --cflags opencv`
LDFLAGS = -lm -lmkl_rt -lrt `pkg-config --libs opencv`
DOESN'T WORK
*******************************************************************
Again, I'm working with a Lenovo W540 with Intel® Core™ i7-4700MQ CPU @ 2.40GHz × 8, using Debian buster and
GNU g++ 8.3.0
OpenBLAS 0.3.5+ds-3
libMKL 2019.2.187-1
Regards,
Ricardo
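For readers following along, the inner CG iteration described above can be sketched with the same four level 1 operations (copy, dot, scal, axpy). This is a minimal illustration under stated assumptions, not the poster's code: the v_* helpers are plain-C stand-ins for the cblas calls, the matvec_fn callback and dense_matvec are hypothetical placeholders for the hand-written sparse product, and the outer weight-update loop is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Plain-C stand-ins for dcopy, ddot, dscal and daxpy (unit stride). */
static void v_copy(size_t n, const double *x, double *y)
{ for (size_t i = 0; i < n; ++i) y[i] = x[i]; }
static double v_dot(size_t n, const double *x, const double *y)
{ double s = 0.0; for (size_t i = 0; i < n; ++i) s += x[i] * y[i]; return s; }
static void v_scal(size_t n, double a, double *x)
{ for (size_t i = 0; i < n; ++i) x[i] *= a; }
static void v_axpy(size_t n, double a, const double *x, double *y)
{ for (size_t i = 0; i < n; ++i) y[i] += a * x[i]; }

/* Hypothetical callback: out = A * in, with A described by ctx. */
typedef void (*matvec_fn)(size_t n, const double *in, double *out, void *ctx);

/* Illustration only: dense row-major product, ctx points to n*n doubles. */
static void dense_matvec(size_t n, const double *in, double *out, void *ctx)
{
    const double *A = ctx;
    for (size_t i = 0; i < n; ++i)
        out[i] = v_dot(n, A + i * n, in);
}

/* Conjugate gradient for A x = b, A symmetric positive definite.
   x holds the initial guess on entry and the approximate solution on exit.
   Stops when the residual 2-norm drops to tol or after max_iter iterations.
   Returns the number of iterations, or -1 if allocation fails. */
static int cg_solve(size_t n, matvec_fn matvec, void *ctx,
                    const double *b, double *x, double tol, int max_iter)
{
    assert(n > 0 && max_iter >= 0);
    double *r = malloc(n * sizeof *r);
    double *p = malloc(n * sizeof *p);
    double *Ap = malloc(n * sizeof *Ap);
    if (!r || !p || !Ap) { free(r); free(p); free(Ap); return -1; }

    matvec(n, x, Ap, ctx);              /* Ap = A * x0          */
    v_copy(n, b, r);
    v_axpy(n, -1.0, Ap, r);             /* r  = b - A * x0      */
    v_copy(n, r, p);                    /* p  = r               */
    double rr = v_dot(n, r, r);

    int it = 0;
    while (it < max_iter && rr > tol * tol) {
        matvec(n, p, Ap, ctx);
        double alpha = rr / v_dot(n, p, Ap);
        v_axpy(n, alpha, p, x);         /* x += alpha * p       */
        v_axpy(n, -alpha, Ap, r);       /* r -= alpha * A * p   */
        double rr_new = v_dot(n, r, r);
        v_scal(n, rr_new / rr, p);      /* p  = beta * p        */
        v_axpy(n, 1.0, r, p);           /* p  = r + beta * p    */
        rr = rr_new;
        ++it;
    }
    free(r); free(p); free(Ap);
    return it;
}
```

Swapping the v_* helpers for the corresponding cblas_* calls one at a time is a simple way to find which routine changes the behaviour when the link line changes.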
Hi,
We are looking into this issue and will get back to you soon.
Regards,
Vidya.
Hi,
>In addition, I made another test: I left the -lcblas in the linker but switch between OpenBLAS and libMKL using update-alternatives and both working fine, but libMKL was evidently slower than OpenBLAS.
The reason for this is that MKL runs multithreaded by default, while OpenBLAS runs single-threaded. When the threads saturate the memory bandwidth, computation becomes very slow. You can set MKL_NUM_THREADS=1 to make MKL run in a single thread.
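Setting MKL_NUM_THREADS=1 in the shell before launching the program is the simplest form of this advice. If the setting has to live in the program itself, a minimal sketch could look like the following. The helper name request_mkl_threads is hypothetical, and the sketch assumes the variable is set before the first BLAS call so that the library sees it when it initialises.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* makes setenv() visible under strict C modes */
#endif
#include <assert.h>
#include <stdlib.h>

/* Set MKL_NUM_THREADS for the current process, overwriting any existing value.
   Assumption: this runs before the first BLAS call.
   Returns 0 on success and -1 on failure, as setenv() does. */
static int request_mkl_threads(const char *count)
{
    assert(count != NULL && count[0] != '\0');
    return setenv("MKL_NUM_THREADS", count, 1);
}
```

Calling request_mkl_threads("1") at the top of main would mirror the suggestion above. MKL also provides mkl_set_num_threads() in mkl.h, which is an alternative when the program is built against the MKL headers.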
I tested your sample code with MKL in its default multithreaded mode against OpenBLAS; the results are below.
MKL library:
root@icx01-tce:/home/rcao8/mkl_test/issue_cblas_daxpy# time ./test_mkl 10000000
iteración 0, numero de datos = 10000000
Performance (MFlops/seg), iteración 0, numero de datos = 10000000
Solucion 3 : 8388.21
Error relativo, iteración 0, numero de datos = 10000000
Solucion 3 : -6.07927e-08
real 0m1.419s
user 0m17.905s
sys 0m0.682s
OpenBLAS library:
root@icx01-tce:/home/rcao8/mkl_test/issue_cblas_daxpy# time ./test_blas 10000000
iteración 0, numero de datos = 10000000
Performance (MFlops/seg), iteración 0, numero de datos = 10000000
Solucion 3 : 1922.24
Error relativo, iteración 0, numero de datos = 10000000
Solucion 3 : -6.07927e-08
real 0m6.259s
user 0m5.973s
sys 0m0.257s
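The real/user/sys figures above can be turned into a rough measure of how many cores each run kept busy. The sketch below shows only the arithmetic; the function names are hypothetical and the inputs are the times reported by the time command.

```c
#include <assert.h>

/* Average number of cores kept busy: CPU time (user + sys) over wall-clock time. */
static double avg_busy_cores(double user_s, double sys_s, double real_s)
{
    assert(real_s > 0.0);
    return (user_s + sys_s) / real_s;
}

/* Wall-clock speedup of a run relative to a baseline run. */
static double wall_speedup(double baseline_real_s, double real_s)
{
    assert(real_s > 0.0);
    return baseline_real_s / real_s;
}
```

With the numbers above, avg_busy_cores(17.905, 0.682, 1.419) is about 13.1 for MKL and avg_busy_cores(5.973, 0.257, 6.259) is about 1.0 for OpenBLAS, while wall_speedup(6.259, 1.419) is about 4.4. In this test, roughly 13 times the CPU time bought about 4.4 times the speed, which would be consistent with the threads competing for memory bandwidth.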
Regards,
Ruqiu
