Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Need suggestions on getting a piece of code to vectorize...

gordan
Beginner
905 Views
Hi,

I have a function that finds the turning point of a composite curve consisting of multiple sine curves. Can anyone suggest anything I can do to get this function to vectorize?

I tried turning CurvesV[i][n] inside out to CurvesV[n][i], but it didn't help. The biggest difference it made was in the loop that calculates Frequencies[], and that only changed the remark from "dereference too complex" to "could vectorize but seems inefficient".

Can anyone think of a clever way this could be re-arranged so it vectorizes?

Thanks in advance.


class SineTurns
{
public:
    unsigned int CurvesC;
    float **CurvesV;

    float Value;

    SineTurns ();

    void Search ();
};

SineTurns::SineTurns () {}

void SineTurns::Search ()
{
    static float Frequencies[MAXCURVES];
    static float SearchSpace[MAX_TURN_DISTANCE];

    int x;
    float xx;

    unsigned int i;

    int x1; // x+1
    int x2; // x+2

    for (i = 0; i < CurvesC; i++)
        Frequencies[i] = 1.0f / CurvesV[i][0];

    // bzero takes a size in bytes, not a number of elements
    bzero(SearchSpace, sizeof(SearchSpace));

    for (x = 0, xx = 0; x > -MAX_TURN_DISTANCE; x--)
    {
#pragma ivdep // Bogus flow dependence without ivdep
              // dereference too complex with ivdep :-(
        for (i = 0; i < CurvesC; i++)
        {
            SearchSpace[x] += CurvesV[i][2] * sinf (Frequencies[i] * xx + CurvesV[i][1]) + CurvesV[i][3];
        }
        xx--;

        if (x < -1)
        {
            x1 = x + 1;
            x2 = x + 2;
            // turning point: x1 is a local maximum or a local minimum
            if ( (SearchSpace[x1] > SearchSpace[x] && SearchSpace[x1] > SearchSpace[x2]) ||
                 (SearchSpace[x1] < SearchSpace[x] && SearchSpace[x1] < SearchSpace[x2]))
            {
                Value = SearchSpace[x1];
                break;
            }
        }
    }
}
12 Replies
Lars_Petter_E_
Beginner

Hello,

This vectorizes very well with Intel C++ 10.027 if you make the memory traversal unit stride for the CurvesV array.

Best Regards,

Lars Petter Endresen

for (i = 0; i < CurvesC; i++)
{
    SearchSpace[x] += CurvesV[2][i] * sinf (Frequencies[i] * xx + CurvesV[1][i]) + CurvesV[3][i];
}
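For anyone unsure what "unit stride" means here, this is a minimal sketch, not code from the thread. It assumes the parameters are stored as one contiguous array per parameter, so the loop over curves reads consecutive floats. The names CurveParams and composite_value are made up for illustration, and the frequency array is assumed to already hold 1.0f / period.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout: one contiguous array per parameter, so the loop over
// curves i reads each array with unit stride (the CurvesV[n][i] arrangement).
struct CurveParams {
    std::vector<float> frequency;  // assumed to hold 1.0f / CurvesV[0][i]
    std::vector<float> phase;      // plays the role of CurvesV[1][i]
    std::vector<float> amplitude;  // plays the role of CurvesV[2][i]
    std::vector<float> offset;     // plays the role of CurvesV[3][i]
};

// Sums all sine curves at position xx.
inline float composite_value(const CurveParams& c, float xx) {
    const std::size_t n = c.frequency.size();
    assert(c.phase.size() == n && c.amplitude.size() == n && c.offset.size() == n);
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        // Every operand is indexed by i alone, so consecutive iterations
        // touch consecutive memory locations.
        sum += c.amplitude[i] * std::sin(c.frequency[i] * xx + c.phase[i]) + c.offset[i];
    }
    return sum;
}
```

With the CurvesV[i][n] arrangement, consecutive iterations would instead jump from one curve's parameter block to the next, and with a float** each jump also goes through a separate pointer.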

gordan
Beginner
if you make the memory traversal unit stride for the CurvesV array

What exactly do you mean by that?

I tried what you suggested (swapping the two indices around in CurvesV), but as I said before, that gives me a "remark: loop was not vectorized: vectorization possible but seems inefficient." with 9.1.051.

I usually lean toward trusting the compiler's guess on such things, because it usually means something isn't aligned properly. Granted, CurvesV is a float**, so the compiler won't necessarily know how it's aligned, but is that the only problem? If so, I'm not bothered (I can ensure that I align CurvesV[][] properly at new() time), but I'm concerned that there could be something else causing inefficiency there.

With 10.0.026, I get "remark: loop was not vectorized: unsupported loop structure."

:-(
Intel_C_Intel
Employee
add "#pragma vector always" after ivdep.
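A minimal sketch of where the two pragmas would sit, for illustration only. The function name accumulate_curves and the separate parameter arrays are assumptions, and the loop writes to out[i] rather than accumulating into a single SearchSpace element as the original does. The pragmas are wrapped in an __INTEL_COMPILER check so that other compilers simply skip them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Adds amplitude[i] * sin(frequency[i] * xx + phase[i]) + offset[i] into out[i].
// The caller promises that out does not overlap any of the input arrays.
inline void accumulate_curves(float* out, const float* amplitude,
                              const float* frequency, const float* phase,
                              const float* offset, std::size_t n, float xx) {
    assert(out != nullptr || n == 0);
#if defined(__INTEL_COMPILER)
#pragma ivdep          // ignore assumed (unproven) loop-carried dependences
#pragma vector always  // vectorize even if the cost model calls it inefficient
#endif
    for (std::size_t i = 0; i < n; ++i) {
        out[i] += amplitude[i] * std::sin(frequency[i] * xx + phase[i]) + offset[i];
    }
}
```

Note that "vector always" only overrides the efficiency heuristic. It does not help when the compiler rejects the loop for structural reasons.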
gordan
Beginner
So, yet again, v9.1 can at least be forced to vectorize, and v10.0 cannot... :-(
Lars_Petter_E_
Beginner

Hello,

I am very sorry for the confusion. Indeed, if the loop trip count is too low, vectorizing may not always be efficient. Please read Aart Bik's excellent book about vectorization. I guessed that the loop had a trip count of 640,

#include

#define MAXCURVES 640

#define MAX_TURN_DISTANCE 640

which gave the following efficient code, calling the optimized SVML (Short Vector Math Library) math function ___svml_sinf4,

movups xmm1, XMMWORD PTR [esi+edi*4]
movaps xmm0, XMMWORD PTR Frequencies$170$0$0[0+edi*4]
mulps xmm0, XMMWORD PTR [esp+32]
addps xmm0, xmm1
call ___svml_sinf4

Best Regards,

Lars Petter Endresen

Michael_S_Intel8
Employee

Hello,

I don't know if this is still an issue since it is several months old, but I've found that using class variables in a loop can prohibit vectorization. This goes even for the loop count variable. For example, you might try reassigning 'CurvesC' and 'CurvesV' to local variables:

unsigned int count = CurvesC;
float *C1 = &(CurvesV[1][0]);
float *C2 = &(CurvesV[2][0]);
float *C3 = &(CurvesV[3][0]);

#pragma ivdep
for (i = 0; i < count; i++)
    SearchSpace[x] += C2[i] * sinf (Frequencies[i] * xx + C1[i]) + C3[i];

I'm guessing the two-dimensional arrays aren't a problem, but generally you should try to simplify the code as much as possible for easier parsing by the vectorizer.

Mike
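To make the local-variable suggestion concrete, here is a small self-contained sketch. The class SineSum and the method ValueAt are hypothetical stand-ins, not the class from the question, and the sketch assumes CurvesV[n] points at an array of CurvesC values for parameter n.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the class in the question.
class SineSum {
public:
    unsigned int CurvesC;
    float** CurvesV;

    // Sums all curves at position xx; frequencies[i] is assumed precomputed.
    float ValueAt(const float* frequencies, float xx) const {
        assert(CurvesV != nullptr);
        // Copy the members into locals once, so the loop bound and the base
        // pointers are plainly loop-invariant and are not re-read via 'this'.
        const unsigned int count = CurvesC;
        const float* phase = CurvesV[1];
        const float* amplitude = CurvesV[2];
        const float* offset = CurvesV[3];

        float sum = 0.0f;
        for (unsigned int i = 0; i < count; ++i) {
            sum += amplitude[i] * std::sin(frequencies[i] * xx + phase[i]) + offset[i];
        }
        return sum;
    }
};
```

Accumulating into a local float instead of a memory location also gives the compiler a simpler reduction to recognise, though whether that matters depends on the compiler version.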

Michael_S_Intel8
Employee
I guess this is a new issue; I got confused by the "joined on" date in the original post. Note, you could also use the Vector Math Library (part of MKL) to do the sin() call and ensure you get the best performance from that:

__declspec(align(16)) float temp[MAXCURVES];

for (i = 0; i < count; i++)
    temp[i] = Frequencies[i] * xx + C1[i];

vsSin(count, temp, temp); // single-precision VML sine: vsSin(n, in, out)

for (i = 0; i < count; i++)
    SearchSpace[x] += C2[i] * temp[i] + C3[i];

This might be faster than letting the compiler vectorize with SVML.

Regards,
--Mike
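The same three-pass split can be tried without MKL installed. The sketch below is only an illustration of the loop structure: batch_sin is a hypothetical stand-in that mimics the "count, input, output" shape of a batched sine call, and kMaxCurves = 16 is taken from the MAXCURVES limit mentioned later in the thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a batched sine routine; it only mirrors the
// "count, input, output" calling shape and may be called in place.
inline void batch_sin(std::size_t n, const float* in, float* out) {
    for (std::size_t i = 0; i < n; ++i) out[i] = std::sin(in[i]);
}

constexpr std::size_t kMaxCurves = 16;  // assumed hard limit on the number of curves

inline float composite_value_split(const float* amplitude, const float* frequency,
                                   const float* phase, const float* offset,
                                   std::size_t n, float xx) {
    assert(n <= kMaxCurves);
    float temp[kMaxCurves];

    // Pass 1: build the sine arguments.
    for (std::size_t i = 0; i < n; ++i) temp[i] = frequency[i] * xx + phase[i];

    // Pass 2: one batched sine call over the whole array, in place.
    batch_sin(n, temp, temp);

    // Pass 3: scale, offset and sum.
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += amplitude[i] * temp[i] + offset[i];
    return sum;
}
```

Passes 1 and 3 contain only simple arithmetic, so they are easy targets for the vectorizer, and pass 2 is the one place where a library routine could be swapped in.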
Michael_S_Intel8
Employee

Sorry that last post came out kind of messy but hopefully you get the idea.

-Mike

gordan
Beginner
I guessed that the loop had trip count of 640,

#include

#define MAXCURVES 640

#define MAX_TURN_DISTANCE 640



640 is an order of magnitude out, but my understanding is that even a trip count of 4 can be sufficient for effective vectorization.

MAXCURVES is 16 (but it's typically between 4 and 8; 16 is just the hard limit, the actual number is CurvesC).

MAX_TURN_DISTANCE is 64. I guess this makes it reasonable to just tell the compiler to vectorize anyway and be done with it.

Lars_Petter_E_
Beginner

Hello,

Strange, the code always vectorizes in my simple test project, even with a trip count as low as 4. I am using Visual Studio 2005 with Intel C++ 10.027.

Best Regards,

Lars Petter Endresen

gordan
Beginner
That may be because you are actually feeding in data from an aligned array. I was compiling to a library. The real data probably convinces the compiler that the trip count is high enough to make it worthwhile. Also note that the latest v10 for Linux is 10.0.26, so it's possible that something was fixed in 10.0.27...
Michael_S_Intel8
Employee

Does your test case include the C++ class structure? As I wrote in the earlier post, class variable references can inhibit vectorization. You can also use "#pragma loop count (x)" to give the compiler a hint about loop trip counts.

-Mike
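As an illustration of the trip-count hint, here is a short sketch applied to the frequency loop from the question. The function name compute_frequencies is made up, the value 8 is just a guess based on the "typically between 4 and 8" figure given earlier, and the pragma spelling follows the one quoted in this thread. It is guarded so that non-Intel compilers skip it.

```cpp
#include <cassert>

// Fills frequencies[i] = 1 / periods[i] for count curves.
inline void compute_frequencies(float* frequencies, const float* periods,
                                unsigned int count) {
    assert(frequencies != periods);
#if defined(__INTEL_COMPILER)
#pragma loop count (8)  // hint only: expected trip count, not a hard bound
#endif
    for (unsigned int i = 0; i < count; ++i) {
        frequencies[i] = 1.0f / periods[i];
    }
}
```

The hint feeds the compiler's cost model, so a realistic value matters more than a generous one. The loop still runs correctly for any count.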
