Hello all,
I've been doing some testing with Intel MKL's mkl_simatcopy function and noticed that its multi-threaded version is in most cases slower than its sequential counterpart (even for large matrices).
The following results were obtained on an Intel i9-10980XE CPU, with environment variables OMP_NUM_THREADS=N and OMP_DYNAMIC=false. I've also tested it with OMP_DYNAMIC=true, but the results don't seem to change. The file was compiled using the transposition example Makefile and GCC.
Single-threaded:
Number of threads:1
Major version: 2020
...
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Intel(R) Deep Learning Boost (Intel(R) DL Boost)
================================================================
Transpose took 0.046586 seconds
Multi-threaded:
Number of threads 2: Transpose took 0.067779 seconds
Number of threads 4: Transpose took 0.033118 seconds
Number of threads 8: Transpose took 0.046896 seconds
Number of threads 10: Transpose took 0.015994 seconds
Number of threads 18: Transpose took 0.045859 seconds
I find these results very strange and can't find a way to explain or improve them.
Any insights regarding how to optimize the parallel version will be deeply appreciated!
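The test program itself is not shown in this thread. As a minimal sketch of how such a measurement could be made less sensitive to one-off effects (for example first-touch page faults or thread start-up), the C code below runs one untimed warm-up call and then reports the fastest of several timed calls. The names transpose_inplace, now_seconds and best_time are hypothetical, and the naive loop is only a stand-in so that the sketch compiles without MKL. With MKL linked, the stand-in would be replaced by the call shown in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the routine under test: a naive in-place
 * transpose of an n x n row-major matrix. With MKL available, the call
 * would instead be: mkl_simatcopy('R', 'T', n, n, 1.0f, a, n, n); */
void transpose_inplace(float *a, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            float tmp = a[i * n + j];
            a[i * n + j] = a[j * n + i];
            a[j * n + i] = tmp;
        }
    }
}

/* Wall-clock time in seconds, using C11 timespec_get. */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* One untimed warm-up call, then the fastest of `reps` timed calls. */
double best_time(float *a, size_t n, int reps)
{
    assert(a != NULL && reps > 0);
    transpose_inplace(a, n); /* warm-up, not timed */

    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        double t0 = now_seconds();
        transpose_inplace(a, n);
        double dt = now_seconds() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}
```

Calling best_time(a, 8000, 10) for each OMP_NUM_THREADS setting would give one number per thread count that is easier to compare than a single cold run. Whether this changes the picture for the timings above is not known.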
Forgot to add that the input matrix is 8000x8000 and also tested with variable.
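To put the reported timings in context, the matrix size can be turned into an effective memory bandwidth. The sketch below assumes single precision (the "s" in the routine name), a 4-byte float, and that an in-place transpose reads and writes every element exactly once. The function name transpose_bandwidth_gbs is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Effective bandwidth in GB/s (1 GB = 1e9 bytes) of an in-place transpose
 * of an n x n single-precision matrix, assuming each element is read once
 * and written once. */
double transpose_bandwidth_gbs(size_t n, double seconds)
{
    assert(seconds > 0.0);
    double bytes = 2.0 * (double)n * (double)n * (double)sizeof(float);
    return bytes / seconds / 1e9;
}
```

Under those assumptions an 8000x8000 matrix occupies 256 MB, so one transpose moves about 512 MB. The single-threaded time of 0.046586 seconds then corresponds to roughly 11 GB/s, and the 10-thread time of 0.015994 seconds to roughly 32 GB/s.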
Hi,
Thanks for reporting this issue. I have forwarded your query to the MKL experts. They will get back to you.
Regards,
Rahul
Hi Joao,
I have been waiting for your reply about which OS (Windows or Linux) you were using.
You also didn't give me instructions on how to build the app.
I went ahead and built this code on both Windows and Linux.
Windows:
icl /Qopenmp parallel_test.c /Qmkl=parallel
Linux:
gcc -DMKL_ILP64 -m64 -I"${MKLROOT}/include" parallel_test.c -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl
I was able to build and link on both Windows and Linux. However, when I tried to run the code, it gave me a segmentation fault error.
I tested the code on the latest version of oneMKL, 2021.2.0.
Since it has been a long time, I would assume that you already got this issue resolved. I will go ahead and close this issue.
