Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

First test shows bad performance - what's wrong?

akerlund
Beginner
309 Views
I just installed MKL 9.1.027 and wanted to try it out with C++ in Visual Studio 2005.

First I made this simple wrapper namespace:

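#include <mkl_vml.h>  // declares vsSqrt

namespace mkl
{
    struct vec3f
    {
        vec3f(const float x, const float y, const float z)
        {
            e[0] = x;
            e[1] = y;
            e[2] = z;
        }
        float e[4];  // only e[0]..e[2] are used
    };

    // Element-wise square root of the three components via VML.
    inline void sqrt(const vec3f &in, vec3f &out)
    {
        vsSqrt(3, in.e, out.e);
    }
}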



Then I also wrote this to compare with:
void oldSqrt(mkl::vec3f &in, mkl::vec3f &out)
{
    out.e[0] = sqrtf(in.e[0]);
    out.e[1] = sqrtf(in.e[1]);
    out.e[2] = sqrtf(in.e[2]);
}


And this is the testing code (I changed the function call and timed the different runs):
mkl::vec3f v(9.0f, 0.0f, 100.0f);
mkl::vec3f v2(0.0f, 0.0f, 0.0f);

for (unsigned int i = 0; i < 10000000; ++i)
    mkl::sqrt(v, v2);

Now, the run time for the standard sqrtf was 0.0003 seconds, but for the MKL version I had to wait 1.4 seconds! Why is this? These are my additional dependencies:
mkl_c_dll.lib
mkl_ia32.lib
libguide40.lib
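
A note on the measurement itself: 0.0003 seconds for 10,000,000 iterations is faster than the hardware could plausibly compute that many square roots, so the optimizer has very likely removed or hoisted the sqrtf loop, because its input never changes and its result is never read. The sketch below is one way to keep the work alive while timing it. It assumes a C++11 compiler for std::chrono (not available in Visual Studio 2005, where another timer would be needed), and the function name time_scalar_sqrt is made up for this illustration.

```cpp
#include <chrono>
#include <cmath>

// Times `iterations` rounds of scalar square roots on a 3-element input and
// returns the elapsed seconds. The volatile arrays force the compiler to
// perform every load and store, so the loop cannot be optimized away.
// time_scalar_sqrt is an illustrative name, not part of the original post.
inline double time_scalar_sqrt(unsigned iterations, float* checksum)
{
    volatile float in[3]  = {9.0f, 0.0f, 100.0f};
    volatile float out[3] = {0.0f, 0.0f, 0.0f};

    auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < iterations; ++i)
        for (int k = 0; k < 3; ++k)
            out[k] = std::sqrt(in[k]);
    auto stop = std::chrono::steady_clock::now();

    // Hand the results back so the caller can print or check them.
    *checksum = out[0] + out[1] + out[2];
    return std::chrono::duration<double>(stop - start).count();
}
```

Timing the MKL call the same way (volatile data, result consumed afterwards) should give a fairer comparison between the two versions.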
Andrey_G_Intel2
Employee

akerlund,

You are calling vsSqrt on a very short vector. In most cases you will not see a performance gain from VML for such small vectors. Try bigger vectors, with 100 elements or more.

Andrey
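
A minimal sketch of that advice, assuming the 3-component vectors can be stored back to back in one contiguous array: a single call then covers all of them instead of one call per vector. So that it compiles without MKL, the sketch uses a hypothetical stand-in, batch_sqrt, which takes its arguments in the same order as the vsSqrt(n, in, out) call in the original post; sqrt_all is likewise an illustrative name.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in taking (n, in, out) like the vsSqrt call above,
// so the sketch builds without MKL. Replace it with vsSqrt when linking MKL.
inline void batch_sqrt(int n, const float* in, float* out)
{
    for (int i = 0; i < n; ++i)
        out[i] = std::sqrt(in[i]);
}

// `xyz` holds many 3-component vectors packed as x0,y0,z0,x1,y1,z1,...
// One call processes the whole array, so the per-call overhead is paid once
// rather than once per 3-element vector.
inline std::vector<float> sqrt_all(const std::vector<float>& xyz)
{
    std::vector<float> out(xyz.size());
    if (!xyz.empty())
        batch_sqrt(static_cast<int>(xyz.size()), xyz.data(), out.data());
    return out;
}
```

With this layout the vector length passed to the library grows with the number of vectors, which is the regime where VML is expected to help.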

akerlund
Beginner
vsSqrt only starts to run faster somewhere between 100k and 500k floats. Is this right? Here is the new code I am testing with:

int howMany;
cin >> howMany;
float *numbersIn = new float[howMany];
float *numbersOut = new float[howMany];
for (int i = 0; i < howMany; ++i)
    numbersIn[i] = (1.0f / RAND_MAX) * rand();

Timer tm;

//vsSqrt(howMany, numbersIn, numbersOut);
for (int i = 0; i < howMany; ++i)
    numbersOut[i] = sqrtf(numbersIn[i]);

// Sum the results so the compiler cannot discard the square roots.
float a = 0.0f;
for (int i = 0; i < howMany; ++i)
    a += numbersOut[i];

tm.Now();
printf("Time: %f | a: %f ", 1000.0f * tm.TimeElapsed(), a);
TimP
Honored Contributor III
Your un-optimized sum reduction will take a significant part of the time, as well as likely producing insufficient accuracy, for such long vectors. Certainly, it would take rather long vectors before VML sqrt() could compete with optimized source code. If you are interested in performance on such code, you should consider SSE parallel intrinsics, or a vectorizing compiler.
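
One possible reading of the remark about the sum reduction, sketched under my own assumptions rather than taken from the post: accumulate in double to limit rounding error over long vectors, and split the sum across several independent accumulators so consecutive additions do not wait on each other. The name sum_unrolled and the choice of four accumulators are illustrative.

```cpp
#include <cstddef>

// Sums n floats using four independent double accumulators.
// Double precision reduces the rounding error a single float accumulator
// builds up over long vectors; independent accumulators shorten the
// dependency chain between additions.
inline double sum_unrolled(const float* x, std::size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i)  // leftover elements when n is not a multiple of 4
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

In the test program above it would replace the final loop, for example as double a = sum_unrolled(numbersOut, howMany).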
levicki
Valued Contributor I

Regarding your first test case: I am not sure how MKL handles floating-point exceptions, but that may well be the cause of the slowdown. Try picking numbers that avoid denormals after repeatedly calculating the square root over many iterations.
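
A quick way to test that hypothesis is to count how many values in a buffer are denormal (subnormal) before and after the call. The sketch below assumes a C++11 standard library for std::fpclassify; the helper name count_subnormals is made up for illustration.

```cpp
#include <cmath>
#include <cstddef>

// Returns how many of the n floats in x are subnormal (denormal).
// Zero and normal values are not counted.
inline std::size_t count_subnormals(const float* x, std::size_t n)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (std::fpclassify(x[i]) == FP_SUBNORMAL)
            ++count;
    return count;
}
```

For the inputs of the first test (9, 0 and 100) and their square roots (3, 0 and 10) it reports zero, since none of those values is subnormal.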
