JoaoAlves95
Novice

Poor parallel performance of MKL's mkl_simatcopy

Hello all,

I've been testing Intel MKL's mkl_simatcopy function and noticed that its multi-threaded version is in most cases no faster, and sometimes slower, than its sequential counterpart, even for large matrices.

The following results were obtained on an Intel i9-10980XE CPU, with the environment variables OMP_NUM_THREADS=N and OMP_DYNAMIC=false. I've also tested with OMP_DYNAMIC=true, but the results don't seem to change. The file was compiled using the transposition example Makefile and GCC.

Single-threaded:

Number of threads:1
Major version: 2020
...
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Intel(R) Deep Learning Boost (Intel(R) DL Boost)
================================================================

Transpose took 0.046586 seconds

Multi-threaded:

Number of threads 2: Transpose took 0.067779 seconds

Number of threads 4: Transpose took 0.033118 seconds

Number of threads 8: Transpose took 0.046896 seconds

Number of threads 10: Transpose took 0.015994 seconds

Number of threads 18: Transpose took 0.045859 seconds

I find these results very strange and can't find a way to explain or improve them.

Any insights into how to optimize the parallel version would be deeply appreciated!

2 Replies
JoaoAlves95
Novice

Forgot to add that the input matrix is 8000x8000 and also tested with variable.
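
To put the timings above in perspective, here is a rough sketch in C that converts a measured transpose time into effective memory traffic and into a speedup over the single-threaded run. The helper names transpose_bandwidth_gbs and speedup are hypothetical, not part of MKL, and the traffic estimate assumes an in-place single-precision transpose in which every element is read once and written once. Actual traffic may be higher because of cache-line effects.

```c
#include <assert.h>
#include <stddef.h>

/* Effective memory traffic in GB/s for a single-precision transpose.
   Assumption: each element is read once and written once, so the
   traffic is 2 * rows * cols * sizeof(float) bytes. */
double transpose_bandwidth_gbs(size_t rows, size_t cols, double seconds)
{
    assert(seconds > 0.0);
    double bytes = 2.0 * (double)rows * (double)cols * (double)sizeof(float);
    return bytes / seconds / 1e9;
}

/* Speedup of a multi-threaded run relative to the single-threaded baseline.
   Values below 1.0 mean the multi-threaded run was slower. */
double speedup(double t_single, double t_multi)
{
    assert(t_multi > 0.0);
    return t_single / t_multi;
}
```

With the 8000x8000 matrix and the times reported above, transpose_bandwidth_gbs(8000, 8000, 0.046586) comes out at roughly 11 GB/s for the single-threaded run and roughly 32 GB/s for the 10-thread run, while speedup(0.046586, 0.045859) for 18 threads is only about 1.02. If these figures are close to what the memory subsystem can sustain, adding threads may not help much, but that is only a guess based on this estimate.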

RahulV_intel
Moderator

Hi,


Thanks for reporting this issue. I have forwarded your query to the MKL experts. They will get back to you.


Regards,

Rahul

