Hi,
I wanted to compare the speed of the VML function vzExp with the ordinary complex exponential function from libm.
I wrote a simple timing program to compare them for different array lengths, as well as for different numbers of threads.
I ran it on our cluster, which has two dual-core Xeon processors per node, so each node has 4 cores and can run 4 threads.
For some strange reason, VML uses only 2 of the 4 threads, even for arrays of up to 1 000 000 elements. What could be the reason for this?
Could it be that the master node of our cluster has only one Xeon processor, that I installed MKL there in the /opt directory, and that I later shared this /opt directory with the nodes?
The program is below.
This is how I compiled it:
icpc -o timing -O3 -openmp timing.cpp -lvml
This is how I ran it:
export OMP_NUM_THREADS=4
./timing
The output clearly shows when VML starts to use a second thread, but it never uses the other two.
Thanks in advance
klaas
#include <complex>   // std::complex, std::exp
#include <cstdlib>   // rand, RAND_MAX
#include <iostream>  // std::cout
#include <omp.h>     // omp_get_wtime, omp_set_num_threads
#include <mkl_vml.h> // vzExp, vmlSetMode, VML_HA, VML_LA, MKL_Complex16

int main(void) {
    const long N = 1000000;
    long M;
    std::complex<double> *c = new std::complex<double>[N];
    std::complex<double> *z = new std::complex<double>[N];
    double t1, t2;
    double T;

    // fill the input with random complex numbers in [0,1] x [0,1]
    for (long k = 0; k < N; ++k)
        c[k] = std::complex<double>(rand()/double(RAND_MAX), rand()/double(RAND_MAX));

    // array lengths k = 1, 2, ..., 9, 10, 20, ..., 90, 100, 200, ..., N
    long delta = 1;
    for (long k = 1; k <= N; k += delta) {
        if (k == 10*delta) delta *= 10;
        M = N*10/k;  // repetitions: each length computes roughly 10*N exponentials

        // the libm version, serial
        t1 = omp_get_wtime();
        for (long i = 0; i < M; ++i)
            for (long l = 0; l < k; ++l)
                z[l] = std::exp(c[l]);
        t2 = omp_get_wtime();
        T = (t2 - t1)/M;
        std::cout << k << " " << T << " " << T/k << " ";

        for (int omp = 1; omp <= 4; omp *= 2) {
            omp_set_num_threads(omp);

            // the libm version, parallelized with OpenMP
            t1 = omp_get_wtime();
            for (long i = 0; i < M; ++i) {
                #pragma omp parallel for
                for (long l = 0; l < k; ++l)
                    z[l] = std::exp(c[l]);
            }
            t2 = omp_get_wtime();
            T = (t2 - t1)/M;
            std::cout << T << " " << T/k << " ";

            // the mkl version, high accuracy
            vmlSetMode(VML_HA);
            t1 = omp_get_wtime();
            for (long i = 0; i < M; ++i)
                vzExp(k, (MKL_Complex16 *) c, (MKL_Complex16 *) z);
            t2 = omp_get_wtime();
            T = (t2 - t1)/M;
            std::cout << T << " " << T/k << " ";

            // the mkl version, low accuracy
            vmlSetMode(VML_LA);
            t1 = omp_get_wtime();
            for (long i = 0; i < M; ++i)
                vzExp(k, (MKL_Complex16 *) c, (MKL_Complex16 *) z);
            t2 = omp_get_wtime();
            T = (t2 - t1)/M;
            std::cout << T << " " << T/k << " ";
        }
        std::cout << std::endl;
    }

    delete[] c;
    delete[] z;
    return 0;
}
VML threading in MKL 9 (the version you are presumably using) contained a bug; the fix for it will be included in the next release of the library. Thanks, Andrey
This fix will be available in MKL starting with versions 9.1.1 and 10.0 beta.