On a dual-core Win64 machine, I have some code that uses LAPACK and BLAS inside some numerical computations.
I experimented with MKL_NUM_THREADS as follows:
Not set (i.e. let MKL use both cores): CPU time=54.5s; wall clock time=28.0s
=1: CPU time=27.6s; wall clock time=27.7s
How come letting MKL use both cores uses more CPU time and gives a (slightly) longer wall clock time?
Note: my PC does not have HT enabled.
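Two ratios summarize these runs: CPU time divided by wall-clock time, which is roughly the average number of cores kept busy, and single-thread wall time divided by threaded wall time, which is the speedup. A minimal Python sketch using only the figures quoted above (the function names are illustrative):

```python
def busy_cores(cpu_s: float, wall_s: float) -> float:
    """Average number of cores kept busy: CPU time / wall-clock time."""
    return cpu_s / wall_s


def speedup(wall_1thread_s: float, wall_nthreads_s: float) -> float:
    """Values above 1 mean the threaded run finished sooner."""
    return wall_1thread_s / wall_nthreads_s


# Figures from the first test (seconds).
default_cpu, default_wall = 54.5, 28.0   # MKL_NUM_THREADS not set (2 cores)
single_cpu, single_wall = 27.6, 27.7     # MKL_NUM_THREADS=1

print(f"busy cores, default : {busy_cores(default_cpu, default_wall):.2f}")
print(f"busy cores, 1 thread: {busy_cores(single_cpu, single_wall):.2f}")
print(f"speedup from 2 cores: {speedup(single_wall, default_wall):.3f}")
```

With these figures the default run kept close to two cores busy (about 1.95) while its speedup was slightly below 1 (about 0.99), so the second core's extra CPU time did not shorten the run.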
Thanks Tim for your reply. So basically we are saying that this behaviour is not unexpected. Or, put another way, the algorithm inside MKL that decides how many threads to use may not always choose the optimal number?
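How MKL picks its thread count is not described in this thread, so one practical workaround is to measure: run a representative solve with each candidate thread count and keep whichever gives the lowest wall-clock time. A minimal sketch, assuming you can supply a callable (the hypothetical `run_with_threads` below) that sets the thread count, for example through MKL's `mkl_set_num_threads`, and then runs the solve:

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


def pick_thread_count(
    run_with_threads: Callable[[int], None],
    candidates: Iterable[int],
    repeats: int = 3,
) -> tuple[int, dict[int, float]]:
    """Time a workload for each candidate thread count and return the fastest.

    `run_with_threads` is a hypothetical caller-supplied function: it should set
    the thread count and then run one representative solve.
    """
    best_times: dict[int, float] = {}
    for k in candidates:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_with_threads(k)
            samples.append(time.perf_counter() - start)
        # Keep the best of several repeats to reduce timing noise.
        best_times[k] = min(samples)
    fastest = min(best_times, key=best_times.get)
    return fastest, best_times


if __name__ == "__main__":
    # Stand-in workload for demonstration only: pretends 2 threads are slower.
    def fake_solve(k: int) -> None:
        time.sleep(0.01 * k)

    choice, timings = pick_thread_count(fake_solve, [1, 2])
    print(choice, timings)
```

Taking the minimum of a few repeats reduces noise from other processes; for the first test reported above, such a measurement would have favoured one thread (27.7 s versus 28.0 s).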
Tony,
What is the typical size of your task?
--Gennady
To be more accurate, we are using a third-party sparse linear solver inside our application, and this solver makes heavy use of BLAS, so the issue is related to BLAS, not LAPACK as I first thought. Our problem size is n=163. The sparse solver uses at least Level 1 and Level 2 BLAS and possibly Level 3 (I can check if knowing this is important).
We are using Fortran 10.0.25 and MKL 10.1.1 on a Win64 machine (2 cores, no HT), but we are also seeing the same type of behaviour on Linux machines (8 cores, HT).
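Which BLAS level dominates matters because the work per call grows very differently with n. Using the standard operation counts for one representative routine per level (roughly 2n flops for DAXPY, 2n² for DGEMV, 2n³ for DGEMM on square n-by-n operands; which routines the solver actually calls is an assumption here), a quick sketch for n = 163:

```python
def blas_flops(n: int) -> dict[str, int]:
    """Approximate floating-point operations for one call on n-sized operands."""
    return {
        "level 1 (DAXPY, y += a*x)": 2 * n,
        "level 2 (DGEMV, y += A*x)": 2 * n**2,
        "level 3 (DGEMM, C += A*B)": 2 * n**3,
    }


for name, flops in blas_flops(163).items():
    print(f"{name:28s} {flops:>12,d} flops")
```

At n = 163 a level-1 call is only a few hundred flops and a level-2 call tens of thousands, so splitting a single such call across threads is unlikely to recover the cost of coordinating them; only level-3 calls do millions of flops each.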
I tried another test. I extracted a matrix from our application and set up an off-line test to solve and factorise that matrix repeatedly. Here are the results (on a dual core win64 machine):
MKL_NUM_THREADS   CPU time (s)   Wall clock (s)
Not set           91.5           47.66
1                 54.4           54.61
In this case, the CPU time is much higher when both cores are used, but the wall clock time does go down. What this tells me is that (not surprisingly) there is a cost to multi-threading, but that cost generally pays off.
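The trade-off in this second test can be put into numbers. Taking the single-thread row as 54.4 s CPU and 54.61 s wall clock, a minimal sketch (the `tradeoff` helper is illustrative):

```python
def tradeoff(cpu_1: float, wall_1: float, cpu_n: float, wall_n: float,
             n_threads: int) -> dict[str, float]:
    """Compare a single-threaded run with an n-thread run of the same workload."""
    speedup = wall_1 / wall_n
    return {
        "speedup": speedup,
        "efficiency": speedup / n_threads,  # 1.0 would be perfect scaling
        "extra_cpu_s": cpu_n - cpu_1,       # CPU time spent on top of the 1-thread run
        "wall_saved_s": wall_1 - wall_n,    # wall-clock time actually saved
    }


# Off-line test: 1 thread vs. MKL default on 2 cores (seconds).
result = tradeoff(cpu_1=54.4, wall_1=54.61, cpu_n=91.5, wall_n=47.66, n_threads=2)
for key, value in result.items():
    print(f"{key:13s} {value:.2f}")
```

This gives a speedup of about 1.15 on two cores (parallel efficiency about 0.57), bought with roughly 37 s of extra CPU time for about 7 s less wall-clock time.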
Garry,
If I understood you correctly, you are using a third-party solver and a BLAS routine (is it dgemm or another routine?) with a square matrix (163x163). Am I right?
I guess the third-party solver is not an MKL routine.
Could you send us similar performance numbers for the BLAS routine?
And one more question: what CPU type are you running on?
--Gennady
The third-party solver uses a variety of BLAS routines, and I am not sure which one of them could be the culprit. The third-party solver is not MKL's; it is a Fortran sparse linear solver. I need to dig into the third-party code and try to track it down, but it will take some time. It is likely to be DGEMM, but I am not 100% sure. The matrix that the third-party solver is solving is 163x163, but I need to make sure that N=163 in the BLAS calls, because the solver may be doing some partitioning.
So, what you would like is for me to break the problem down and try to find out which BLAS routine is the culprit?
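Narrowing down the culprit usually comes down to timing each BLAS call site separately and comparing CPU time with wall-clock time per routine: on two cores, a routine whose CPU/wall ratio is near 2 is one that is running threaded. A minimal bookkeeping sketch, written in Python for brevity; `timed`, `report`, and the `fake_dgemm`/`fake_dgemv` stand-ins are hypothetical names, and in the real Fortran solver the same accumulation would have to be done around each BLAS call:

```python
from __future__ import annotations

import time
from collections import defaultdict
from functools import wraps
from typing import Any, Callable

# routine name -> [number of calls, wall-clock seconds, process CPU seconds]
totals: defaultdict[str, list[float]] = defaultdict(lambda: [0, 0.0, 0.0])


def timed(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Accumulate call count, wall-clock time and process CPU time under `name`."""
    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            wall0, cpu0 = time.perf_counter(), time.process_time()
            try:
                return fn(*args, **kwargs)
            finally:
                entry = totals[name]
                entry[0] += 1
                entry[1] += time.perf_counter() - wall0
                # process_time() covers all threads of the process, so a routine
                # that keeps two cores busy shows CPU time close to 2x wall time.
                entry[2] += time.process_time() - cpu0
        return wrapper
    return decorate


def report() -> None:
    """Print routines sorted by total wall-clock time, largest first."""
    for name, (calls, wall, cpu) in sorted(totals.items(), key=lambda kv: -kv[1][1]):
        ratio = cpu / wall if wall > 0 else float("nan")
        print(f"{name:8s} calls={int(calls):6d} wall={wall:8.3f}s "
              f"cpu={cpu:8.3f}s cpu/wall={ratio:5.2f}")


if __name__ == "__main__":
    # Hypothetical stand-ins for two BLAS call sites; replace with the real calls.
    @timed("dgemm")
    def fake_dgemm(n: int) -> int:
        return sum(i * i for i in range(n))

    @timed("dgemv")
    def fake_dgemv(n: int) -> int:
        return sum(range(n))

    for _ in range(100):
        fake_dgemm(20_000)
        fake_dgemv(2_000)
    report()
```

Nested timed calls would be counted in both routines, so the wrappers are best placed only at the outermost BLAS call sites.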
I'm running on Win64 (chip details below), but we have also seen similar behaviour on Linux.
Intel Xeon CPU 5150 @ 2.66GHz, no HT
If you can confirm the next steps, I can work with you to diagnose this problem further...
Thank you!
Tony
Hi Gennady - any update please?
The solution time of 27 seconds is too slow for a 163-by-163 linear system, so I assume 163 is the BLAS block size? Or are you using an iterative solver?
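A rough check of that remark: a dense LU factorization of an n-by-n matrix costs about (2/3)n³ flops. Assuming, purely for illustration, a sustained rate of 1 GFLOP/s, a 163-by-163 factorization takes a few milliseconds, so 27 s corresponds to thousands of factorizations:

```python
def lu_flops(n: int) -> float:
    """Approximate flop count of a dense LU factorization (Gaussian elimination)."""
    return 2.0 * n**3 / 3.0


n = 163
assumed_rate = 1e9  # flops per second; an illustrative assumption, not a measurement
seconds_per_lu = lu_flops(n) / assumed_rate

print(f"flops per LU : {lu_flops(n):,.0f}")
print(f"time per LU  : {seconds_per_lu * 1e3:.2f} ms")
print(f"LUs in 27 s  : {27.0 / seconds_per_lu:,.0f}")
```

Actual throughput depends on the machine and the BLAS, and a sparse solver should do less work than a dense LU, so this only indicates the order of magnitude.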
Unfortunately, I cannot say which sparse solver we were using.
