Multiplying large float array with a scalar float

unpocoloco1 · 02-21-2008

Hi all,
I've been using the Intel Math Kernel Library for a full day now, so I apologize if this is a newbie question. Hopefully it is, actually, and there's an easy answer for it.

I have a large (1-D) float array of data. I am running a filter on the data which essentially just modifies each element of data as a sum of current & previous inputs, multiplied by scalar coefficients (filter taps), plus current and previous outputs, multiplied by filter taps. Kind of like this:

// x's are my inputs, y's are my outputs, a's and b's are the coefficients
float* pData = &myLargeFloatArray[0];
float x0=0, x1=0, y0=0, y1=0;
float a0, a1, b0, b1; // Pretend these are filled in to whatever values
for (DWORD i=0; i<numSamples; i++) { // numSamples: number of elements in the array (placeholder name)
    x0 = pData[i];
    y0 = x0*a0 + x1*a1 + y1*b1;
    x1 = x0;
    y1 = y0;
}

Okay. Now in the above for loop, this gets VERY inefficient when the array size gets large. I assume the line calculating y0 is especially slow. Is there a more efficient way of doing this, other than element by element?

My thoughts:
1. Maybe there is a function I don't know about that can quickly multiply a scalar times a vector (both floats). vsMul needed two vectors of the same length, and when I tried creating an array containing copies of a0, a1, and b1, I didn't see any performance improvement.
2. Maybe there are some functions that do IIR filtering?

Thanks for your help! I greatly appreciate it.

TimP · 02-21-2008

According to what you show, the performance bottleneck would be in the serial dependency in the calculation of y1. I've seen cases where icc would optimize this better with #pragma unroll(4), to cut the time spent on store forwarding.
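
To make the serial dependency concrete, here is a rough sketch (not a tuned implementation) that splits the filter into two passes. The function names filterSerial and filterSplit, the separate output buffer, and the size_t length parameter are all assumptions made for illustration; they are not from the original post. The first pass computes only the feed-forward terms, which do not depend on earlier outputs, so a compiler is free to vectorize it. The second pass carries the y1*b1 feedback and still has to run one element at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: the same recurrence as the loop in the question,
// writing each result to a separate output buffer (an assumption here).
void filterSerial(const float* x, float* y, std::size_t n,
                  float a0, float a1, float b1) {
    assert((x != nullptr && y != nullptr) || n == 0);
    float x1 = 0.0f, y1 = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        float x0 = x[i];
        float y0 = x0 * a0 + x1 * a1 + y1 * b1;
        y[i] = y0;
        x1 = x0;
        y1 = y0;
    }
}

// Two-pass version. x and y must not overlap, because pass 1 reads x[i-1]
// after y[i-1] has already been written.
void filterSplit(const float* x, float* y, std::size_t n,
                 float a0, float a1, float b1) {
    assert((x != nullptr && y != nullptr) || n == 0);
    if (n == 0) return;

    // Pass 1: feed-forward part. No iteration depends on another,
    // so this loop is a candidate for auto-vectorization.
    y[0] = x[0] * a0;  // the previous input is taken as 0 for the first sample
    for (std::size_t i = 1; i < n; ++i) {
        y[i] = x[i] * a0 + x[i - 1] * a1;
    }

    // Pass 2: feedback part. Each output needs the previous output,
    // so this loop stays serial.
    float y1 = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        y1 = y[i] + y1 * b1;
        y[i] = y1;
    }
}
```

Whether the split is actually faster depends on the compiler and the array size, since it makes an extra pass over memory; it is worth timing against the original loop before adopting it.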