Intel® oneAPI Math Kernel Library

Poor parallel performance of MKL's mkl_simatcopy

JoaoAlves95
Novice

Hello all,

I've been doing some testing with the MKL mkl_simatcopy function and noticed that its multi-threaded version is often no faster than its sequential counterpart, and in some cases slower (even for large matrices).

The following results were obtained on an Intel Core i9-10980XE CPU with the environment variables OMP_NUM_THREADS=N and OMP_DYNAMIC=false. I also tested with OMP_DYNAMIC=true, but the results don't seem to change. The file was compiled with GCC using the transposition example Makefile.

 

Single-threaded:

Number of threads:1
Major version: 2020
...
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Intel(R) Deep Learning Boost (Intel(R) DL Boost)
================================================================

Transpose took 0.046586 seconds

 

Multi-threaded:

Number of threads 2: Transpose took 0.067779 seconds

Number of threads 4: Transpose took 0.033118 seconds

Number of threads 8: Transpose took 0.046896 seconds

Number of threads 10: Transpose took 0.015994 seconds

Number of threads 18: Transpose took 0.045859 seconds
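As a rough sanity check (a back-of-the-envelope estimate, not a measurement), these timings can be converted into effective memory bandwidth using the 8000x8000 single-precision matrix size given in the follow-up post. The model assumes each element is read once and written once; the actual traffic of MKL's in-place algorithm may differ.

```latex
\begin{aligned}
\text{matrix size} &= 8000 \times 8000 \times 4\,\text{B} = 2.56 \times 10^{8}\,\text{B} = 256\,\text{MB} \\
\text{traffic} &\approx 2 \times 256\,\text{MB} = 512\,\text{MB} \quad \text{(one read + one write per element)} \\
\text{1 thread:} &\quad 0.512\,\text{GB} / 0.046586\,\text{s} \approx 11.0\,\text{GB/s} \\
\text{10 threads:} &\quad 0.512\,\text{GB} / 0.015994\,\text{s} \approx 32.0\,\text{GB/s}
\end{aligned}
```

For comparison, assuming the CPU's quad-channel DDR4-2933 configuration, the theoretical DRAM peak is about 4 × 8 B × 2.933 GT/s ≈ 94 GB/s, so even the best run sits well below it.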

 

I find these results very strange and can't find a way to explain or improve them.

 

Any insights into how to optimize the parallel version would be greatly appreciated!

JoaoAlves95
Novice

I forgot to mention that the input matrix is 8000x8000; I also tested with variable sizes.
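If each figure above comes from a single timed call, it can include one-off costs (OpenMP thread-pool start-up, first-touch page faults) as well as run-to-run noise, which would be consistent with the non-monotonic pattern across thread counts. Below is a minimal sketch of a harness that makes one untimed warm-up call and reports the best of several timed calls. It is an illustration under assumptions, not the poster's parallel_test.c: `transpose_fn`, `naive_transpose_inplace` and `best_time` are hypothetical names, and the naive transpose is only a stand-in so the sketch builds without MKL. With MKL linked, a one-line wrapper calling `mkl_simatcopy('R', 'T', n, n, 1.0f, a, n, n)` could be passed instead, with the thread count varied through OMP_NUM_THREADS or `mkl_set_num_threads`.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical signature for an in-place transpose of an n x n matrix.
   An MKL wrapper around mkl_simatcopy('R', 'T', n, n, 1.0f, a, n, n)
   would match it. */
typedef void (*transpose_fn)(float *a, size_t n);

/* Stand-in (not MKL): naive in-place transpose of a row-major n x n
   matrix, swapping each element above the diagonal with its mirror. */
void naive_transpose_inplace(float *a, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            float t = a[i * n + j];
            a[i * n + j] = a[j * n + i];
            a[j * n + i] = t;
        }
    }
}

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs fn once untimed (warm-up), then times `reps` further calls and
   returns the fastest one. Every call transposes again, so after an
   even total number of calls (1 + reps) the original layout is back. */
double best_time(transpose_fn fn, float *a, size_t n, int reps)
{
    assert(fn != NULL && a != NULL && reps > 0);
    fn(a, n); /* warm-up: absorbs thread start-up and page faults */
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        double t0 = now_seconds();
        fn(a, n);
        double dt = now_seconds() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}
```

For the 8000x8000 case the matrix occupies 256 MB, so it would be allocated on the heap (for example with malloc or mkl_malloc) rather than on the stack. Reporting the minimum, or the median, of several runs per thread count makes the sequential vs. parallel comparison less sensitive to noise.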

RahulV_intel
Moderator

Hi,


Thanks for reporting this issue. I have forwarded your query to the MKL experts. They will get back to you.


Regards,

Rahul


Khang_N_Intel
Employee

Hi Joao,


I have been waiting for your reply about which OS (Windows or Linux) you were using.

You also didn't send me instructions on how to build the app.


I went ahead and built this code on both Windows and Linux.


Windows:

icl /Qopenmp parallel_test.c /Qmkl=parallel


Linux:

gcc  -DMKL_ILP64 -m64 -I"${MKLROOT}/include" parallel_test.c -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl


I was able to build and link on both Windows and Linux. However, when I tried to run the code, it gave me a segmentation fault error.

I tested the code on the latest version of oneMKL, 2021.2.0.


Since it has been a long time, I assume you have already resolved this issue, so I will go ahead and close it.




