MKL functions in C: much slower on the first iteration

Davide87

Hello everybody.

I'm using the Intel oneMKL routines in my C project in the Eclipse environment.

I need to perform some tasks using MKL and measure how the elapsed time varies with the size of the matrices involved.

To measure the elapsed time, I call the same function repeatedly in a for loop, overwriting the same memory allocations each time.

One sample of my code is as follows:

 

MKL_Complex8 *A;
float *B;

A = (MKL_Complex8*)malloc(N*sizeof(MKL_Complex8));
// B holds N floats, so the cast must be float*, not MKL_Complex8*
B = (float*)malloc(N*sizeof(float));

// fill the vector A by reading from a .bin file

double T0 = 0.0;
for(int loop=0; loop<LOOP_COUNT; loop++)
{
    T0 = dsecnd();
    vcAbs(N, A, B);
    printf("Elapsed time in milliseconds: %f\n",(dsecnd()-T0)*1000);
}

 

where N is an integer variable that indicates the size of the A and B vectors.

If LOOP_COUNT=10, then the result I obtain in the console is as follows:

 

Elapsed time in milliseconds: 3.760848
Elapsed time in milliseconds: 0.029799
Elapsed time in milliseconds: 0.027766
Elapsed time in milliseconds: 0.022673
Elapsed time in milliseconds: 0.023277
Elapsed time in milliseconds: 0.022508
Elapsed time in milliseconds: 0.022143
Elapsed time in milliseconds: 0.021755
Elapsed time in milliseconds: 0.021557
Elapsed time in milliseconds: 0.021865

 

The vcAbs function is much slower on the first iteration (about 3.76 ms) than on all the others (about 0.02 to 0.03 ms), and the same goes for other MKL functions.
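
For reference, the first-call cost can be separated from the steady-state cost by recording the two separately instead of printing every iteration. The following is only a minimal sketch in plain C under stated assumptions: `cplx8`, `sq_abs_kernel`, `call_timing` and `time_calls` are hypothetical names that are not part of oneMKL, the kernel is a simple stand-in for vcAbs (it computes the squared magnitude so that no math library is needed), and the C11 `timespec_get` wall clock is used in place of dsecnd.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for MKL_Complex8 (two floats: real, imag). */
typedef struct { float real, imag; } cplx8;

/* Hypothetical stand-in for vcAbs: writes |a[i]|^2 into b[i].
   In the real project this call would be vcAbs(N, A, B). */
static void sq_abs_kernel(size_t n, const cplx8 *a, float *b)
{
    for (size_t i = 0; i < n; i++)
        b[i] = a[i].real * a[i].real + a[i].imag * a[i].imag;
}

/* Elapsed milliseconds between two timestamps. */
static double elapsed_ms(struct timespec t0, struct timespec t1)
{
    return (double)(t1.tv_sec - t0.tv_sec) * 1.0e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1.0e6;
}

/* First-call time and mean time of the remaining calls. */
typedef struct {
    double first_ms;
    double steady_mean_ms;
} call_timing;

static call_timing time_calls(size_t n, const cplx8 *a, float *b, int loops)
{
    call_timing t = { 0.0, 0.0 };
    double sum = 0.0;

    for (int i = 0; i < loops; i++) {
        struct timespec t0, t1;
        timespec_get(&t0, TIME_UTC);   /* C11 wall clock, not monotonic */
        sq_abs_kernel(n, a, b);
        timespec_get(&t1, TIME_UTC);

        double dt = elapsed_ms(t0, t1);
        if (i == 0)
            t.first_ms = dt;           /* includes any one-time setup cost */
        else
            sum += dt;
    }
    if (loops > 1)
        t.steady_mean_ms = sum / (loops - 1);
    return t;
}
```

With the stand-in kernel swapped for vcAbs, calling `time_calls(N, A, B, LOOP_COUNT)` would give the first-iteration time and the average of the remaining iterations as two separate numbers, which makes the gap easier to compare across different values of N.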

(The C project is built in Release mode with the -O3 optimization level.)

Changing the optimization level, initializing the vector B with memset, or calling vcAbs beforehand on a smaller vector does not change the outcome.

Is there a reason why the MKL functions behave this way? Is there a way to fix this issue?

In my final project the function will run only once, so the elapsed time that matters is the 3+ milliseconds of the first call.

Thank you in advance.
