VML does not use all available threads

kvtournh1 · 08-15-2007

Hi,

I wanted to compare the speed of the VML function vzExp with the standard complex exponential function from libm.

I wrote a simple timing program to compare these for different array lengths, as well as for different numbers of threads.

I ran this on our cluster, which has two dual-core Xeon processors per node, so each node has 4 cores and can run 4 threads.

For some strange reason, VML only uses 2 of the 4 threads, and this for more than 1 000 000 elements in the array. What could be the reason for this?

Could the reason be that the master node of our cluster has only one Xeon processor, and that I installed the library there in the opt directory and later shared this opt directory with the nodes?

I have placed the program below.

This is how I compiled it:
icpc -o timing -O3 -openmp timing.cpp -lvml

This is how I ran it:
export OMP_NUM_THREADS=4
./timing

You can clearly see when VML starts to use the second thread, but it never uses the other two.
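
One way to narrow this down is to check how many threads the OpenMP runtime itself starts on a node, independently of VML. The following is a minimal sketch, assuming an OpenMP-capable compiler; `active_omp_threads` is a hypothetical helper and is not part of the program in this post or of MKL. Without OpenMP enabled it simply reports 1.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical diagnostic helper (an assumption, not from the original post):
// returns the number of threads that take part in an OpenMP parallel region.
// When compiled without OpenMP support, the region is skipped and 1 is returned.
int active_omp_threads() {
    int count = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        // Only one thread writes the shared result; the implicit barrier
        // at the end of "single" makes the write visible to all threads.
        #pragma omp single
        count = omp_get_num_threads();
    }
#endif
    return count;
}
```

If this reports 4 with OMP_NUM_THREADS=4 while vzExp still keeps only 2 cores busy, the limit is more likely inside the library than in the environment of the node.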

Thanks in advance
klaas

#include <iostream>
#include <complex>
#include <cstdlib>
#include <cmath>
#include <omp.h>

#include <mkl.h>


int main(void) {

  unsigned long int N = 1000000;
  unsigned long int M;

  std::complex<double> *c;
  std::complex<double> *z;

  c = new std::complex<double>[N];
  z = new std::complex<double>[N];

  double t1,t2;
  double T;


  for (register unsigned long int k = 0; k < N; ++k)
    c[k] = std::complex<double>(rand()/double(RAND_MAX),rand()/double(RAND_MAX));

  long int delta = 1;

  for (register long int k = 1; k <= N; k+=delta) {
    if (k == 10*delta) delta *= 10;
    M = N*10/k;

    // the libm version
    t1 = omp_get_wtime();
    for (register int i = 0; i < M; ++i)
    for (register int l = 0; l < k; ++l)
      z[l] = exp(c[l]);
    t2 = omp_get_wtime();

    T = (t2 - t1)/M;

    std::cout << k << "	" << T << "	" << T/k << "	";

    for (register int omp = 1; omp <= 4; omp *= 2) {
      omp_set_num_threads(omp);

      t1 = omp_get_wtime();
      for (register int i = 0; i < M; ++i)
#pragma omp parallel for
        for (register int l = 0; l < k; ++l)
          z[l] = exp(c[l]);
      t2 = omp_get_wtime();
      T = (t2 - t1)/M;

      std::cout << T << "	" << T/k << "	";


      // the MKL version, high accuracy mode (VML_HA)
      vmlSetMode(VML_HA);
      t1 = omp_get_wtime();
      for (register int i = 0; i < M; ++i)
       vzExp(k,(MKL_Complex16 *) c, (MKL_Complex16 *) z);
      t2 = omp_get_wtime();
      T = (t2 - t1)/M;

      std::cout << T << "	" << T/k << "	";

      // the MKL version, low accuracy mode (VML_LA)
      vmlSetMode(VML_LA);
      t1 = omp_get_wtime();
      for (register int i = 0; i < M; ++i)
        vzExp(k,(MKL_Complex16 *) c, (MKL_Complex16 *) z);
      t2 = omp_get_wtime();
      T = (t2 -t1)/M;

      std::cout << T << "	" << T/k << "	";
    }

    std::cout << std::endl;
  }


  return 0;
}
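
For readers following the timing loop above, the array lengths k and repetition counts M it visits can be listed separately. This is a minimal sketch that mirrors the `delta` logic of the program; `sweep_schedule` is a hypothetical helper name introduced here for illustration only.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper (not in the original program): returns the (k, M)
// pairs visited by the timing loop for a maximum array length N.
std::vector<std::pair<unsigned long, unsigned long> > sweep_schedule(unsigned long N) {
    std::vector<std::pair<unsigned long, unsigned long> > schedule;
    unsigned long delta = 1;
    for (unsigned long k = 1; k <= N; k += delta) {
        // Step by 1 up to 10, then by 10 up to 100, then by 100, and so on.
        if (k == 10 * delta) delta *= 10;
        // Number of repetitions, chosen so that k * M is roughly 10 * N.
        unsigned long M = N * 10 / k;
        schedule.push_back(std::make_pair(k, M));
    }
    return schedule;
}
```

With N = 1000 this yields k = 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000, so each measurement processes roughly the same total number of elements regardless of the array length.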

Andrey_N_Intel · 08-15-2007

VML threading in MKL 9 (the version you are presumably using) contained a bug; the fix will be included in the next release of the library. Thanks, Andrey

kvtournh1 · 08-18-2007

Thanks for the response.
When do you think we can expect this bug fix?

Andrey_G_Intel2 · 08-20-2007

This fix will be available in MKL starting with 9.1.1 and the 10.0 beta.