I have a function that finds the turning point of a composite curve consisting of multiple sine curves. Can anyone suggest anything I can do to get this function to vectorize?
I tried turning Curves
Can anyone think of a clever way this could be re-arranged so it vectorizes?
Thanks in advance.
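#include <math.h>     // sinf
#include <strings.h>  // bzero

class SineTurns
{
public:
    unsigned int CurvesC;  // number of curves in use
    float **CurvesV;       // curve parameters
    float Value;           // value at the turning point found by Search()
    SineTurns ();
    void Search ();
};

SineTurns::SineTurns () {}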
void SineTurns::Search ()
{
    static float Frequencies[MAXCURVES];
    static float SearchSpace[MAX_TURN_DISTANCE];
    int x;
    float xx;
    unsigned int i;
    int x1; // x+1
    int x2; // x+2

    for (i = 0; i < CurvesC; i++)
        Frequencies[i] = 1.0f / CurvesV[i][0];

    bzero(SearchSpace, sizeof(SearchSpace)); // size in bytes, not in elements

    for (x = 0, xx = 0; x > -MAX_TURN_DISTANCE; x--)
    {
#pragma ivdep // Bogus flow dependence without ivdep
        // dereference too complex with ivdep :-(
        for (i = 0; i < CurvesC; i++)
        {
            SearchSpace[x] += CurvesV[i][2] * sinf (Frequencies[i] * xx + CurvesV[i][1]) + CurvesV[i][3];
        }
        xx--;
        if (x < -1)
        {
            x1 = x + 1;
            x2 = x + 2;
            // Turning point: SearchSpace[x1] is a local maximum or a local minimum
            if ( (SearchSpace[x1] > SearchSpace[x] && SearchSpace[x1] > SearchSpace[x2]) ||
                 (SearchSpace[x1] < SearchSpace[x] && SearchSpace[x1] < SearchSpace[x2]))
            {
                Value = SearchSpace[x1];
                break;
            }
        }
    }
}
Hello,
This vectorizes very well with Intel C++ 10.027 if you make the memory traversal unit-stride for the CurvesV array:
for (i = 0; i < CurvesC; i++) {
    SearchSpace[x] += CurvesV[2][i] * sinf (Frequencies[i] * xx + CurvesV[1][i]) + CurvesV[3][i];
}
Best Regards,
Lars Petter Endresen
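As a rough illustration of what "unit stride" could mean here, the sketch below regroups the curve parameters so that the loop over curves walks each parameter array contiguously. It assumes the parameters start out stored one curve per record (period, phase, amplitude, offset); `CurveTable`, `to_unit_stride` and `composite_at` are hypothetical names, not code from the thread, and whether a given compiler vectorizes the loop still has to be checked in its vectorization report.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical unit-stride layout: one contiguous array per parameter.
struct CurveTable {
    std::vector<float> frequency;  // 1 / period
    std::vector<float> phase;
    std::vector<float> amplitude;
    std::vector<float> offset;
};

// Assumed input: one {period, phase, amplitude, offset} record per curve.
CurveTable to_unit_stride(const std::vector<std::array<float, 4>>& curves) {
    CurveTable t;
    for (const std::array<float, 4>& c : curves) {
        t.frequency.push_back(1.0f / c[0]);
        t.phase.push_back(c[1]);
        t.amplitude.push_back(c[2]);
        t.offset.push_back(c[3]);
    }
    return t;
}

// Sum of all curves at position xx; every array is read with stride 1.
float composite_at(const CurveTable& t, float xx) {
    const std::size_t n = t.frequency.size();
    assert(t.phase.size() == n && t.amplitude.size() == n && t.offset.size() == n);
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += t.amplitude[i] * std::sin(t.frequency[i] * xx + t.phase[i]) + t.offset[i];
    }
    return sum;
}
```

The conversion is done once, outside the search loop, so its cost is not paid per sample.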
"if you make the memory traversal unit stride for the CurvesV array"
What exactly do you mean by that?
I tried what you suggested (turning and
I usually lean toward trusting the compiler's guess on such things, because it usually means something isn't aligned properly. Granted, CurvesV is a float**, so the compiler won't necessarily know how it's aligned, but is that the only problem? If so, I'm not bothered (I can ensure that CurvesV[][] is aligned properly at new() time), but I'm concerned that something else could be causing inefficiency there.
With 10.0.026, I get "remark: loop was not vectorized: unsupported loop structure."
:-(
Hello,
I am very sorry for the confusion. Indeed, if the loop trip count is too low, vectorizing may not always be efficient. Please read Aart Bik's excellent book about vectorization. I guessed that the loop had a trip count of 640,
#include
#define MAXCURVES 640
#define MAX_TURN_DISTANCE 640
which gave the following efficient code, calling the optimized SVML (Short Vector Math Library) math function ___svml_sinf4,
movups xmm1, XMMWORD PTR [esi+edi*4]
movaps xmm0, XMMWORD PTR Frequencies$170$0$0[0+edi*4]
mulps  xmm0, XMMWORD PTR [esp+32]
addps  xmm0, xmm1
call   ___svml_sinf4
Best Regards,
Lars Petter Endresen
Hello,
I don't know if this is still an issue since it is several months old,
but I've found that using class variables in a loop can prohibit vectorization.
This goes even for the loop count variable. For example you might try
reassigning 'CurvesC' and 'CurvesV' to local variables:
int count = CurvesC;
float *C1 = &(CurvesV[1][0]);
float *C2 = &(CurvesV[2][0]);
float *C3 = &(CurvesV[3][0]);
#pragma ivdep
for (i = 0; i < count; i++)
    SearchSpace[x] += C2[i] * sinf (Frequencies[i] * xx + C1[i]) + C3[i];
I'm guessing the two-dimensional arrays aren't a problem,
but generally you should try to simplify the code as much as possible
for easier parsing by the vectorizer.
Mike
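A self-contained sketch of the same idea, for anyone who wants to try it in isolation: the member values are copied into locals before the loop, and the sum goes into a local accumulator. `CurveSum` and `Evaluate` are hypothetical stand-ins for the class in the thread, and the sketch assumes the CurvesV[parameter][curve] layout used in the snippet above. It is not guaranteed to vectorize on any particular compiler.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the class discussed in the thread.
class CurveSum {
public:
    unsigned int CurvesC;  // number of curves
    float **CurvesV;       // assumed layout: CurvesV[parameter][curve]
    float Evaluate(const float *frequencies, float xx) const;
};

float CurveSum::Evaluate(const float *frequencies, float xx) const {
    assert(CurvesV != nullptr && frequencies != nullptr);
    // Copy the members into locals so the loop bound and the base
    // pointers are plain local values inside the loop.
    const unsigned int count = CurvesC;
    const float *c1 = CurvesV[1];  // phase
    const float *c2 = CurvesV[2];  // amplitude
    const float *c3 = CurvesV[3];  // offset
    float sum = 0.0f;              // local accumulator
    for (unsigned int i = 0; i < count; ++i) {
        sum += c2[i] * std::sin(frequencies[i] * xx + c1[i]) + c3[i];
    }
    return sum;
}
```

The caller would store the returned sum into its own sample buffer, which keeps the inner loop free of writes through class members.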
Sorry that last post came out kind of messy but hopefully you get the idea.
-Mike
"I guessed that the loop had trip count of 640,
#include
#define MAXCURVES 640
#define MAX_TURN_DISTANCE 640"
640 is an order of magnitude out, but my understanding is that
even a trip count of 4 can be sufficient for effective vectorization.
MAXCURVES is 16, but that is just the hard limit: the actual number
is CurvesC, typically between 4 and 8.
MAX_TURN_DISTANCE is 64. I guess this makes it reasonable to
just tell the compiler to vectorize anyway and be done with it.
Hello,
Strange, the code always vectorizes in my simple test project, even with a trip count as low as 4. I am using Visual Studio 2005 with Intel C++ 10.027.
Best Regards,
Lars Petter Endresen
Does your test case include the C++ class structure? As I wrote in the earlier post, class variable references can inhibit vectorization. You can also use "#pragma loop count (x)" to give the compiler a hint about loop trip counts.
-Mike
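A minimal sketch of where such a hint would sit, assuming the pragma spelling quoted in the post and a typical count of 8 (both are assumptions to check against the documentation for your compiler version). `sum_small` is a hypothetical helper, and the pragma is guarded so that other compilers simply skip it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums the first n entries of v.
float sum_small(const float *v, std::size_t n) {
    assert(v != nullptr || n == 0);
    float sum = 0.0f;
#if defined(__INTEL_COMPILER)
    // Spelling as quoted in the post; 8 is an assumed typical trip count.
#pragma loop count (8)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        sum += v[i];
    }
    return sum;
}
```

The hint only informs the compiler's cost model; it does not change what the loop computes.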