
MKL functions in C: much slower on the first iteration

Davide87

Hello everybody.

I'm using the Intel oneMKL routines in my C project in the Eclipse environment.

I need to perform some tasks using MKL and measure the elapsed times while varying the size of the matrices involved.

To estimate the elapsed times, I call the same function in a for loop, overwriting the same memory allocations over and over again.

One sample of my code is as follows:

 

MKL_Complex8 *A;   // input: complex single-precision vector
float *B;          // output: magnitudes of the elements of A

A = (MKL_Complex8*)malloc(N*sizeof(MKL_Complex8));
B = (float*)malloc(N*sizeof(float));

// fill the vector A by reading from a .bin file

double T0 = 0.0;
for(int loop=0; loop<LOOP_COUNT; loop++)
{
    T0 = dsecnd();      // MKL timer, returns seconds
    vcAbs(N, A, B);     // B[i] = |A[i]|
    printf("Elapsed time in milliseconds: %f\n",(dsecnd()-T0)*1000);
}

 

where N is an integer variable that indicates the size of the A and B vectors.

If LOOP_COUNT=10, then the result I obtain in the console is as follows:

 

Elapsed time in milliseconds: 3.760848
Elapsed time in milliseconds: 0.029799
Elapsed time in milliseconds: 0.027766
Elapsed time in milliseconds: 0.022673
Elapsed time in milliseconds: 0.023277
Elapsed time in milliseconds: 0.022508
Elapsed time in milliseconds: 0.022143
Elapsed time in milliseconds: 0.021755
Elapsed time in milliseconds: 0.021557
Elapsed time in milliseconds: 0.021865

 

The vcAbs function is much, much slower on the first iteration than on all the others, and the same goes for other MKL functions.

(The C project is built in Release mode with the -O3 optimization level.)

Changing the optimization level, initializing the vector B with a memset, or calling vcAbs beforehand on a smaller vector does not change the outcome.

Is there a reason why the MKL functions behave this way? Is there a way to fix this issue?

Obviously, in my final project the function will need to run only once, so the effective elapsed time will be the 3+ milliseconds.

Thank you in advance.
