Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

VML does not use all available threads

kvtournh1
Beginner
589 Views
Hi,

I wanted to compare the speed of the VML function vzExp with the ordinary complex exponential function from libm.

I wrote a simple timing program to compare the two for different array lengths, as well as for different numbers of threads.

I ran this on our cluster, which has two dual-core Xeon processors per node, so each node has 4 cores and can run 4 threads.

For some strange reason, VML only uses 2 of the 4 threads, and this is with more than 1,000,000 elements in the array. What could be the reason for this?

Could the reason be that the master of our cluster has only one Xeon processor, and that I installed MKL on the master in the opt directory and later shared this opt directory with the nodes?

I have placed the program below.

This is how I compiled it:
icpc -o timing -O3 -openmp timing.cpp -lvml

This is how I ran it
export OMP_NUM_THREADS=4
./timing

You can clearly see when VML starts to use the second thread, but it never uses the other two.

Thanks in advance
klaas

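#include <iostream>   // std::cout
#include <complex>    // std::complex, exp
#include <cstdlib>    // rand, RAND_MAX
#include <cmath>
#include <omp.h>      // omp_get_wtime, omp_set_num_threads

#include <mkl_vml.h>  // vzExp, vmlSetMode, MKL_Complex16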


int main(void) {

unsigned long int N = 1000000;
unsigned long int M;

std::complex<double> *c;
std::complex<double> *z;

c = new std::complex<double>[N];
z = new std::complex<double>[N];

double t1,t2;
double T;


for (register unsigned long int k = 0; k < N; ++k)
c[k] = std::complex<double>(rand()/double(RAND_MAX),rand()/double(RAND_MAX));

long int delta = 1;

for (register long int k = 1; k <= N; k+=delta) {
if (k == 10*delta) delta *= 10;
M = N*10/k;

// the libm version
t1 = omp_get_wtime();
for (register int i = 0; i < M; ++i)
for (register int l = 0; l < k; ++l)
z[l] = exp(c[l]);
t2 = omp_get_wtime();

T = (t2 - t1)/M;

std::cout << k << " " << T << " " << T/k << " ";

for (register int omp = 1; omp <= 4; omp *= 2) {
omp_set_num_threads(omp);

t1 = omp_get_wtime();
for (register int i = 0; i < M; ++i)
#pragma omp parallel for
for (register int l = 0; l < k; ++l)
z[l] = exp(c[l]);
t2 = omp_get_wtime();
T = (t2 - t1)/M;

std::cout << T << " " << T/k << " ";


// the mkl version, high accuracy mode (VML_HA)
vmlSetMode(VML_HA);
t1 = omp_get_wtime();
for (register int i = 0; i < M; ++i)
vzExp(k,(MKL_Complex16 *) c, (MKL_Complex16 *) z);
t2 = omp_get_wtime();
T = (t2 - t1)/M;

std::cout << T << " " << T/k << " ";

// the mkl version, low accuracy mode (VML_LA)
vmlSetMode(VML_LA);
t1 = omp_get_wtime();
for (register int i = 0; i < M; ++i)
vzExp(k,(MKL_Complex16 *) c, (MKL_Complex16 *) z);
t2 = omp_get_wtime();
T = (t2 -t1)/M;

std::cout << T << " " << T/k << " ";
}

std::cout << std::endl;
}


return 0;
}
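
The outer loop of the program above steps through array lengths on a roughly logarithmic grid and scales the repetition count so that every measurement does about the same amount of work. The sketch below isolates that logic so it can be checked on its own. The function names `benchmark_sizes` and `repetitions` are hypothetical and do not appear in the original program; only the `delta` and `M = N*10/k` arithmetic is taken from it.

```cpp
#include <cassert>
#include <vector>

// Array lengths visited by the timing loop:
// 1..10 in steps of 1, then 20..100 in steps of 10, then 200..1000, and so on.
std::vector<unsigned long> benchmark_sizes(unsigned long n_max) {
    std::vector<unsigned long> sizes;
    unsigned long delta = 1;
    for (unsigned long k = 1; k <= n_max; k += delta) {
        if (k == 10 * delta) delta *= 10;  // widen the step at each power of ten
        sizes.push_back(k);
    }
    return sizes;
}

// Repetitions for length k, chosen so that each measurement
// processes about 10 * n_max elements in total.
unsigned long repetitions(unsigned long n_max, unsigned long k) {
    assert(k > 0);
    return n_max * 10 / k;
}
```

For example, `benchmark_sizes(1000)` yields 28 lengths ending in 1000, and `repetitions(1000000, 1000000)` is 10, so the largest array is evaluated ten times per measurement.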
3 Replies
Andrey_N_Intel
Employee

VML threading in MKL 9 (the version you presumably use) contained a bug. The fix for it will be included in the next release of the library. Thanks, Andrey

kvtournh1
Beginner
Thanks for the response.
When do you think we can expect this bug fix?
Andrey_G_Intel2
Employee
This fix will be available in MKL from 9.1.1 and 10.0 beta.
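
To rule out the node setup as a cause, it may help to check how many threads the OpenMP runtime itself hands out on a compute node. The following is a minimal sketch, not part of the original program. The function name `threads_in_parallel_region` is hypothetical, and the check only covers the OpenMP runtime used by the application, not the threading inside VML.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the number of threads that actually execute a parallel region.
// Falls back to 1 when the file is compiled without OpenMP support.
int threads_in_parallel_region() {
    int count = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        // One thread records the team size; the others wait at the implicit barrier.
        #pragma omp single
        count = omp_get_num_threads();
    }
#endif
    assert(count >= 1);
    return count;
}
```

If this returns 4 on a node with OMP_NUM_THREADS=4 while vzExp still keeps only two cores busy, the limit comes from the library rather than from the way the node or the shared opt directory is set up.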