Intel® oneAPI Math Kernel Library

why does dfeast_scsrev only use half the available computational power?

Gagan
Beginner

hi,

i asked this question ten years ago, and at the time i was told it was due to the dual-processor configuration and non-uniform memory access (NUMA).

however, that can't be the whole story: i've been abusing (lol) my 2020 wintel macbook pro 16" i9 9980HX, a single-socket machine, and it still shows 49% CPU usage when running dfeast_scsrev.

this suggests the cause is something simple or trivial on your guys' (excuse my preference for convenience in the off-chance there are ladies on the team) end.

5 Replies
Gagan
Beginner

i'm so good lol, i feel bad though because it's like i'm cracking a whip on the team's back

i wonder if i got the attention of the team's top dogg gennady--i know you watchin' fam! lol

Fengrui
Moderator

Hello!

I tried the in-package example dfeast_sparse.c with some modifications. I kept the sparsity pattern of the matrix unchanged but made it much larger, with N=10000, to get a stable run of ~10 seconds. On a dual-socket Intel(R) Xeon(R) Platinum 8480+ node (112 physical cores in total, running Ubuntu 22.04.1 LTS), I observed CPU usage holding at ~11200% (as reported by the "top" command) for those ~10 seconds. The oneMKL version used was 2024.1.

The subspace size looks relatively small. How long does the run take to complete in your case? Also, I'm not an expert on istat, which you mentioned in the old post. The difference could also come from the measuring tools.


Thanks,

Fengrui


Gagan
Beginner

fengrui, i was hoping you could confirm that you have replicated the observed behaviour (as discussed in private), and elaborate on the planned fix.

as i understand it, you will make it so that mkl_sparse_d_ev (fpm[2]=2) and dfeast_scsrev will only use NUM_REAL_CORES*2 threads IF AND ONLY IF mkl_set_dynamic(1).

is this correct?

thanks

Fengrui
Moderator

In our tests, the threading behavior of dfeast_scsrev() is as expected. By default, the number of OpenMP threads is equal to the number of real (physical) cores.

However, we found that mkl_sparse_d_ev (fpm[2]=2) tries to use NUM_REAL_CORES*2 threads by default when hyperthreading is on, and that mkl_set_num_threads() fails to change this. After further investigation, we observed that even though this function uses NUM_REAL_CORES*2 threads, only m of them are actually doing work, where m is the number passed to mkl_set_num_threads(). The rest of the threads are just spinning. We are working on this issue.

As for mkl_set_dynamic(), it is turned on by default (mkl_set_dynamic(1)). If it is turned off with mkl_set_dynamic(0), oneMKL functions in general, not only mkl_sparse_d_ev and dfeast_scsrev, can use NUM_REAL_CORES*2 threads if we call mkl_set_num_threads(NUM_REAL_CORES*2), though doing so is generally not recommended. So the statement should be "...so that mkl_sparse_d_ev (fpm[2]=2) and dfeast_scsrev will only use NUM_REAL_CORES*2 threads IF AND ONLY IF mkl_set_dynamic(0) and mkl_set_num_threads(NUM_REAL_CORES*2)."

Thank you for bringing up this issue!

Gagan
Beginner

hi fengrui,

thanks for getting back to me.

i didn't realise that the earlier issue of mkl_set_dynamic(0) and mkl_set_num_threads(NUM_REAL_CORES*2) had been fixed, as when i tried it a decade ago it did not fix the issue (i believe top dogg gennady made the recommendation at the time).

it's good to know that old issue was fixed, and that the inconsistent behaviour between fpm[2]=1 and fpm[2]=2 for mkl_sparse_d_ev is a consequence of the krylov solver using twice the number of threads it is supposed to.

looking forward to the release with this fix, as i suspect it will increase the krylov solver's performance even further.
