I just installed MKL 9.1.027 and wanted to try it out with C++ in Visual Studio 2005.
First I made this simple wrapper namespace:
namespace mkl
{
    struct vec3f
    {
        vec3f(const float x, const float y, const float z)
        {
            e[0] = x;
            e[1] = y;
            e[2] = z;
        }
        float e[4];
    };
    inline void sqrt(const vec3f &in, vec3f &out)
    {
        vsSqrt(3, in.e, out.e);
    }
}
Then I also wrote this to compare with:
void oldSqrt(mkl::vec3f &in, mkl::vec3f &out)
{
    out.e[0] = sqrtf(in.e[0]);
    out.e[1] = sqrtf(in.e[1]);
    out.e[2] = sqrtf(in.e[2]);
}
And this is the testing code (I changed the function call and timed the different runs):
mkl::vec3f v(9.0f, 0.0f, 100.0f);
mkl::vec3f v2(0.0f, 0.0f, 0.0f);
for (unsigned int i = 0; i < 10000000; ++i)
    mkl::sqrt(v, v2);
Now, the run time for the standard sqrtf was 0.0003 seconds, but for the MKL version I had to wait 1.4 seconds! Why is this? These are my additional dependencies:
mkl_c_dll.lib
mkl_ia32.lib
libguide40.lib
4 Replies
akerlund,
You are trying to calculate vsSqrt on a very short vector. In most cases you will not see a performance gain from VML for such small vectors. Try bigger vectors, with 100 elements or more.
Andrey
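One way to act on this advice is to stop calling the library once per 3-element vector and instead pack many vectors into one contiguous array, so a single call covers all of them. The following is only a sketch under assumptions: `Vec3fArray`, `sqrtAll` and `batchSqrt` are hypothetical names, and `batchSqrt` is a plain-loop stand-in so the snippet compiles without MKL. With MKL linked, its body would presumably be replaced by one `vsSqrt(n, in, out)` call.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for vsSqrt(n, in, out) so this compiles without MKL.
// With MKL linked, the loop would be replaced by a single vsSqrt call.
inline void batchSqrt(std::size_t n, const float* in, float* out)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sqrt(in[i]);
}

// Stores many 3-component vectors back to back: x0 y0 z0 x1 y1 z1 ...
struct Vec3fArray
{
    explicit Vec3fArray(std::size_t count) : e(3 * count, 0.0f) {}
    std::size_t count() const { return e.size() / 3; }
    std::vector<float> e;
};

// One call processes every component of every vector,
// so the per-call overhead is paid once instead of once per vector.
inline void sqrtAll(const Vec3fArray& in, Vec3fArray& out)
{
    batchSqrt(in.e.size(), in.e.data(), out.e.data());
}
```

The point of the layout is that the per-call cost is spread over `3 * count` elements rather than 3. Whether that is enough to beat plain `sqrtf` depends on the vector length and would need to be measured.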
vsSqrt only becomes faster somewhere between 100k and 500k floats. Is this right? Here is the new code I am testing with:
int howMany;
cin >> howMany;
float *numbersIn = new float[howMany];
float *numbersOut = new float[howMany];
for (int i = 0; i < howMany; ++i)
    numbersIn[i] = (1.0f / RAND_MAX) * rand();
Timer tm;
//vsSqrt(howMany, numbersIn, numbersOut);
for (int i = 0; i < howMany; ++i)
    numbersOut[i] = sqrtf(numbersIn[i]);
float a = 0.0f;
for (int i = 0; i < howMany; ++i)
    a += numbersOut[i];
tm.Now();
printf("Time: %f | a: %f ", 1000.0f * tm.TimeElapsed(), a);
Your un-optimized sum reduction will take a significant part of the time, as well as likely producing insufficient accuracy, for such long vectors. Certainly, it would take rather long vectors before VML sqrt() could compete with optimized source code. If you are interested in performance on such code, you should consider SSE parallel intrinsics, or a vectorizing compiler.
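As a rough illustration of the two suggestions in this reply, the sketch below computes square roots four floats at a time with SSE intrinsics and keeps the sum reduction in a separate function that accumulates in `double`. The function names `sqrtSse` and `sumDouble` are made up for this example, and the preprocessor guard is an assumption about how one might fall back to scalar code on targets without SSE. Keeping the reduction separate also makes it possible to time the square-root pass on its own.

```cpp
#include <cmath>
#include <cstddef>

#if defined(__SSE__) || defined(_M_IX86) || defined(_M_X64)
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)
#define SKETCH_HAVE_SSE 1
#endif

// Square root of n floats: four at a time where SSE is available,
// then a scalar loop for whatever is left over.
inline void sqrtSse(std::size_t n, const float* in, float* out)
{
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE
    for (; i + 4 <= n; i += 4)
    {
        __m128 v = _mm_loadu_ps(in + i);         // unaligned load of 4 floats
        _mm_storeu_ps(out + i, _mm_sqrt_ps(v));  // 4 square roots, unaligned store
    }
#endif
    for (; i < n; ++i)
        out[i] = std::sqrt(in[i]);
}

// Accumulate in double so rounding error grows more slowly
// than with a float accumulator on long vectors.
inline double sumDouble(std::size_t n, const float* x)
{
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}
```

The unaligned load and store are used because arrays from `new float[]` are not guaranteed to be 16-byte aligned. With aligned storage, the aligned variants `_mm_load_ps` and `_mm_store_ps` could be used instead.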
Regarding your first test case: I am not sure how MKL handles floating-point exceptions, but that may well be the cause of the slowdown. Try picking numbers that avoid denormals after repeatedly taking the square root over many iterations.
