Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Arbitrary interleaver (shuffle) using IPP

egrayver

I read somewhere that the new processors include special instructions for small lookup tables.  Is there a way to optimize the following simple operation:

float data[10] = {0, ...9}

unsigned int idx[10] = {2,3,5,0,...9} // Arbitrary permutation of 0..9

float result[10];

result = data[idx]

I have to do this operation often, and it takes quite a bit of time in a 'for' loop. Currently I use:
for (int i=0;i<10;i++) result[i]=data[idx[i]];

 

 

Chuck_De_Sylva
Have you checked whether your compiler has the auto-vectorizer turned on? That will probably help you a lot. Since there are only 10 elements in the loop, the overhead of threading the function may make the performance worse.
egrayver
The 10-element array was just an example. Actual arrays may have 1000 elements. I believe icpc will auto-vectorize when the /O3 switch is used.
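
For reference, here is a minimal sketch of the operation discussed in this thread, written so that a vectorizing compiler has a fair chance at it. The function name gather_f32 is hypothetical (it is not an IPP function), and the code assumes that the three arrays never overlap and that every index is in range. Whether the compiler actually emits hardware gather instructions (for example on AVX2-capable processors) depends on the compiler, the target flags, and its cost model, so treat this as a starting point rather than a guaranteed speedup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of IPP): result[i] = data[idx[i]].
 * `restrict` (C99) promises the compiler that the three arrays do not
 * overlap, which removes one common obstacle to auto-vectorization. */
void gather_f32(const float *restrict data,
                const unsigned int *restrict idx,
                float *restrict result,
                size_t n)
{
    assert(data != NULL && idx != NULL && result != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Indexed load; the caller guarantees that idx[i] is a valid
         * position in data. No bounds check is done here. */
        result[i] = data[idx[i]];
    }
}
```

A call would look like gather_f32(data, idx, result, 10). With the Intel compiler, an option such as -qopt-report (or -fopt-info-vec with GCC) should show whether the loop was vectorized; building for the host CPU (for example -xHost with icpc, or -march=native with GCC/Clang) is typically needed before gather instructions can be considered at all.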