<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MKL Functions in C: much slower the first iteration in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/MKL-Functions-in-C-much-slower-the-first-iteration/m-p/1549613#M8258</link>
    <description>&lt;P&gt;Hello everybody.&lt;/P&gt;&lt;P&gt;I'm using the Intel oneMKL routines in my C project in the Eclipse environment.&lt;/P&gt;&lt;P&gt;I need to perform some tasks using MKL and measure the elapsed times while varying the size of the matrices involved.&lt;/P&gt;&lt;P&gt;To measure the elapsed times, I call the same function in a for loop, overwriting the same memory allocations on every iteration.&lt;/P&gt;&lt;P&gt;One sample of my code is as follows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;MKL_Complex8 *A;
float *B;

A = (MKL_Complex8*)malloc(N*sizeof(MKL_Complex8));
B = (float*)malloc(N*sizeof(float));

// fill the vector A by reading from a .bin file

double T0 = 0.0;
for(int loop=0; loop&amp;lt;LOOP_COUNT; loop++)
{
    T0 = dsecnd();
    vcAbs(N, A, B);
    printf("Elapsed time in milliseconds: %f\n",(dsecnd()-T0)*1000);
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;where N is an integer variable giving the number of elements in the A and B vectors.&lt;/P&gt;&lt;P&gt;With LOOP_COUNT=10, the console output is as follows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Elapsed time in milliseconds: 3.760848
Elapsed time in milliseconds: 0.029799
Elapsed time in milliseconds: 0.027766
Elapsed time in milliseconds: 0.022673
Elapsed time in milliseconds: 0.023277
Elapsed time in milliseconds: 0.022508
Elapsed time in milliseconds: 0.022143
Elapsed time in milliseconds: 0.021755
Elapsed time in milliseconds: 0.021557
Elapsed time in milliseconds: 0.021865&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The vcAbs function is far slower on the first iteration (about 3.76 ms) than on all the others (about 0.02 ms), and the same goes for other MKL functions.&lt;/P&gt;&lt;P&gt;(The C project is built in Release mode with the -O3 optimization level.)&lt;/P&gt;&lt;P&gt;Changing the optimization level, initializing the vector B with a memset, or calling vcAbs beforehand on a smaller vector does not change the outcome.&lt;/P&gt;&lt;P&gt;Is there a reason why the MKL functions behave this way? Is there a way to fix this issue?&lt;/P&gt;&lt;P&gt;In my final project the function will need to run only once, so the effective elapsed time will be the 3+ milliseconds.&lt;/P&gt;&lt;P&gt;Thank you in advance.&lt;/P&gt;</description>
    <pubDate>Fri, 01 Dec 2023 15:02:10 GMT</pubDate>
    <dc:creator>Davide87</dc:creator>
    <dc:date>2023-12-01T15:02:10Z</dc:date>
    <item>
      <title>MKL Functions in C: much slower the first iteration</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MKL-Functions-in-C-much-slower-the-first-iteration/m-p/1549613#M8258</link>
      <description>&lt;P&gt;Hello everybody.&lt;/P&gt;&lt;P&gt;I'm using the Intel oneMKL routines in my C project in the Eclipse environment.&lt;/P&gt;&lt;P&gt;I need to perform some tasks using MKL and measure the elapsed times while varying the size of the matrices involved.&lt;/P&gt;&lt;P&gt;To measure the elapsed times, I call the same function in a for loop, overwriting the same memory allocations on every iteration.&lt;/P&gt;&lt;P&gt;One sample of my code is as follows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;MKL_Complex8 *A;
float *B;

A = (MKL_Complex8*)malloc(N*sizeof(MKL_Complex8));
B = (float*)malloc(N*sizeof(float));

// fill the vector A by reading from a .bin file

double T0 = 0.0;
for(int loop=0; loop&amp;lt;LOOP_COUNT; loop++)
{
    T0 = dsecnd();
    vcAbs(N, A, B);
    printf("Elapsed time in milliseconds: %f\n",(dsecnd()-T0)*1000);
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;where N is an integer variable giving the number of elements in the A and B vectors.&lt;/P&gt;&lt;P&gt;With LOOP_COUNT=10, the console output is as follows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Elapsed time in milliseconds: 3.760848
Elapsed time in milliseconds: 0.029799
Elapsed time in milliseconds: 0.027766
Elapsed time in milliseconds: 0.022673
Elapsed time in milliseconds: 0.023277
Elapsed time in milliseconds: 0.022508
Elapsed time in milliseconds: 0.022143
Elapsed time in milliseconds: 0.021755
Elapsed time in milliseconds: 0.021557
Elapsed time in milliseconds: 0.021865&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The vcAbs function is far slower on the first iteration (about 3.76 ms) than on all the others (about 0.02 ms), and the same goes for other MKL functions.&lt;/P&gt;&lt;P&gt;(The C project is built in Release mode with the -O3 optimization level.)&lt;/P&gt;&lt;P&gt;Changing the optimization level, initializing the vector B with a memset, or calling vcAbs beforehand on a smaller vector does not change the outcome.&lt;/P&gt;&lt;P&gt;Is there a reason why the MKL functions behave this way? Is there a way to fix this issue?&lt;/P&gt;&lt;P&gt;In my final project the function will need to run only once, so the effective elapsed time will be the 3+ milliseconds.&lt;/P&gt;&lt;P&gt;Thank you in advance.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Dec 2023 15:02:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MKL-Functions-in-C-much-slower-the-first-iteration/m-p/1549613#M8258</guid>
      <dc:creator>Davide87</dc:creator>
      <dc:date>2023-12-01T15:02:10Z</dc:date>
    </item>
  </channel>
</rss>

