Why does changing the thread count make no difference in performance?

Yan__Lin · ‎10-14-2019

Hi everyone,

I'm testing MKL v2019.5 with Visual Studio 2019 on an Intel i7-9750H CPU (6 cores, 12 threads). I'm interested in the run time of the vector mathematics and FFT functions in MKL. As I understand it, for these two categories of functions the run time should decrease as the maximum thread count increases. But that did not happen for the vector mathematics functions. I have tested vcMul and vcAdd, and the run time barely differs between a thread count of 1 and a thread count of 6. This seems weird to me and I can't figure out the reason. Can anyone help me with this? The code is attached below. Thanks very much!

////////////////////////////////

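#include <stdio.h>
#include "mkl.h"

int N = 16384;   /* elements per vector (FFT length) */
int M = 2000;    /* number of vectors (repetitions) */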

//#define FFTTEST
#define CMULTEST
int main(void)
{

   /* CPU clock frequency in GHz, used to convert ticks to time */
   double clkfreq = mkl_get_clocks_frequency();

   unsigned MKL_INT64 startclk, endclk;
   double time;
   double time2[16384];
   int kk = 0;

   /* Execution status */
   MKL_LONG status = 0;

   DFTI_DESCRIPTOR_HANDLE hand = 0;

   //mkl_set_dynamic(0);

   //mkl_set_num_threads(1);
   int threadnum = mkl_get_max_threads();
   printf("Thread count set to: %d\n", threadnum);
   printf("FFT points: %d, FFT runs: %d\n", N, M);

   /* Pointer to input/output data */
   MKL_Complex8* x = 0;
   MKL_Complex8* y = 0;
   x = (MKL_Complex8*)mkl_malloc(N * M * sizeof(MKL_Complex8), 64);
   y = (MKL_Complex8*)mkl_malloc(N * M * sizeof(MKL_Complex8), 64);
   MKL_Complex8* x2 = 0;
   MKL_Complex8* y2 = 0;
   x2 = (MKL_Complex8*)mkl_malloc(N * M * sizeof(MKL_Complex8), 64);
   y2 = (MKL_Complex8*)mkl_malloc(N * M * sizeof(MKL_Complex8), 64);
   /* Bail out if any of the four allocations failed */
   if (x == NULL || y == NULL || x2 == NULL || y2 == NULL) goto failed;

   init2(x, x2);
   vmlSetMode(VML_EP);
   mkl_get_cpu_clocks(&startclk);
   for (kk = 0; kk < M; kk++)
   {
       vcAdd(N, &x[N*kk], &x2[N * kk], &y[N * kk]);
   }
   mkl_get_cpu_clocks(&endclk);
   time = (double)(endclk - startclk) / (clkfreq * 1e9) * 1e6 / M;
   /* Label kept from the original; the loop above currently calls vcAdd */
   printf("Complex multiply: %f us\n", time);

   mkl_free(x);
   mkl_free(y);
   mkl_free(x2);
   mkl_free(y2);

failed:
   return 0;
}

Pamela_H_Intel · ‎10-18-2019

Lin,

MKL internally parallelizes using OpenMP. If you are using a threading library, you need to turn off MKL threading. -- Read this article: https://software.intel.com/en-us/articles/using-threaded-intel-mkl-in-multi-thread-application

If that does not help, please tell me which compile and link lines you are using.

By the way, Intel MKL does not yet support Visual Studio 2019. Though I would not expect that to cause this kind of performance issue.

Pamela