VML does not use all available threads

kvtournh1 — Wed, 15 Aug 2007 14:45:22 GMT

Hi,

I wanted to compare the speed of the vml function vzExp to the normal complex exponential function of libm.

I wrote a simple timing program to compare these for different array lengths, as well as for different numbers of threads.

I ran this on our cluster, which has 2 dual-core Xeon processors per node, so each node has 4 cores and can run 4 threads.

For some strange reason, vml only uses 2 of the 4 threads, even with more than 1,000,000 elements in the array. What could be the reason for this?

Could the reason be that the master node of our cluster has only one Xeon processor, and that I installed the library there in the opt directory and later shared this opt directory with the nodes?

I have placed the program below,

This is how I compiled it
icpc -o timing -O3 -openmp timing.cpp -lvml

This is how I ran it
export OMP_NUM_THREADS=4
./timing

You clearly see when vml starts to use the second thread, but never uses the other two.

Thanks in advance
klaas

#include <iostream>
#include <complex>
#include <cstdlib>
#include <omp.h>

#include <mkl_vml.h>


int main(void) {

  unsigned long int N = 1000000;
  unsigned long int M;

  std::complex<double> *c;
  std::complex<double> *z;

  c = new std::complex<double>[N];
  z = new std::complex<double>[N];

  double t1,t2;
  double T;


  for (register unsigned long int k = 0; k < N; ++k)
    c[k] = std::complex<double>(rand()/double(RAND_MAX),rand()/double(RAND_MAX));

  long int delta = 1;

  for (register long int k = 1; k <= N; k+=delta) {
    if (k == 10*delta) delta *= 10;
    M = N*10/k;

    // the libm version
    t1 = omp_get_wtime();
    for (register int i = 0; i < M; ++i)
    for (register int l = 0; l < k; ++l)
      z[l] = exp(c[l]);
    t2 = omp_get_wtime();

    T = (t2 - t1)/M;

    std::cout << k << "	" << T << "	" << T/k << "	";

    for (register int omp = 1; omp <= 4; omp *= 2) {
      omp_set_num_threads(omp);

      t1 = omp_get_wtime();
      for (register int i = 0; i < M; ++i)
#pragma omp parallel for
        for (register int l = 0; l < k; ++l)
          z[l] = exp(c[l]);
      t2 = omp_get_wtime();
      T = (t2 - t1)/M;

      std::cout << T << "	" << T/k << "	";


      // the mkl version, high accuracy mode (VML_HA)
      vmlSetMode(VML_HA);
      t1 = omp_get_wtime();
      for (register int i = 0; i < M; ++i)
       vzExp(k,(MKL_Complex16 *) c, (MKL_Complex16 *) z);
      t2 = omp_get_wtime();
      T = (t2 - t1)/M;

      std::cout << T << "	" << T/k << "	";

      // the mkl version, low accuracy mode (VML_LA)
      vmlSetMode(VML_LA);
      t1 = omp_get_wtime();
      for (register int i = 0; i < M; ++i)
        vzExp(k,(MKL_Complex16 *) c, (MKL_Complex16 *) z);
      t2 = omp_get_wtime();
      T = (t2 -t1)/M;

      std::cout << T << "	" << T/k << "	";
    }

    std::cout << std::endl;
  }


  return 0;
}
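
For readers following the timing loop above, here is a minimal sketch of just its bookkeeping, separated from the actual exp/vzExp calls. It reproduces the sequence of array lengths k, the repetition count M = N*10/k, and the two printed figures (time per call and time per element). The names sweep_lengths, repetitions, Timing and normalise are made up for this illustration and do not appear in the program.

```cpp
#include <vector>

// Array lengths visited by the outer loop of the timing program:
// 1..10 in steps of 1, then 20..100 in steps of 10, and so on up to n.
std::vector<long> sweep_lengths(long n) {
  std::vector<long> ks;
  long delta = 1;
  for (long k = 1; k <= n; k += delta) {
    if (k == 10 * delta) delta *= 10;  // widen the step at each power of ten
    ks.push_back(k);
  }
  return ks;
}

// Repetitions for length k, chosen so that every measurement covers
// roughly 10*n element evaluations regardless of k.
long repetitions(long n, long k) { return n * 10 / k; }

// The two numbers printed per measurement.
struct Timing {
  double per_call;     // seconds per call on an array of length k
  double per_element;  // seconds per array element
};

Timing normalise(double elapsed_seconds, long reps, long k) {
  Timing t;
  t.per_call = elapsed_seconds / reps;
  t.per_element = t.per_call / k;
  return t;
}
```

With N = 1000000 the sweep has 55 lengths, and the repetition count falls from 10000000 at k = 1 to 10 at k = 1000000, so the short arrays are averaged over many more calls than the long ones.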

Re: VML does not use all available threads

Andrey_N_Intel — Thu, 16 Aug 2007 06:37:09 GMT

VML threading in MKL 9 (the version you presumably use) contained a bug; the fix for it will be included in the nearest release of the library.

Thanks, Andrey

Re: VML does not use all available threads

kvtournh1 — Sat, 18 Aug 2007 13:59:07 GMT

Thanks for the response.
When do you think we can expect this bug fix?

Re: VML does not use all available threads

Andrey_G_Intel2 — Mon, 20 Aug 2007 08:40:53 GMT

This fix will be available in MKL from 9.1.1 and 10.0 beta.
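
Independently of the MKL version, it can help to rule out the environment first. The sketch below, which is not part of the original program, reports how many threads the OpenMP runtime itself hands out on a node. It uses only standard OpenMP calls (omp_get_num_threads, omp_get_max_threads); the helper names team_size and max_threads are made up here. It says nothing about what vzExp does internally. It only checks that OMP_NUM_THREADS=4 really reaches the runtime on the compute nodes.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of threads that actually join an OpenMP parallel region.
// Returns 1 when the file is compiled without OpenMP support.
int team_size() {
  int n = 1;
#ifdef _OPENMP
#pragma omp parallel
  {
#pragma omp single
    n = omp_get_num_threads();  // one thread records the team size
  }
#endif
  return n;
}

// Upper bound the runtime would use for the next parallel region,
// as set by OMP_NUM_THREADS or omp_set_num_threads().
int max_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}
```

If both values are 4 on a compute node while the vzExp timings still stop improving at 2 threads, the limit is more likely inside the library than in the cluster setup, which would be consistent with the bug described above.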
