Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Arbitrary interleaver (shuffle) using IPP

egrayver
Beginner

I read somewhere that the new processors include special instructions for small lookup tables.  Is there a way to optimize the following simple operation:

float data[10] = {0, ...9}

unsigned int idx[10] = {2,3,5,0,...9} // Arbitrary permutation of 0..9

float result[10];

result = data[idx]

I have to do this operation often, and it takes quite a bit of time in a 'for' loop. Currently:
for (int i=0;i<10;i++) result[i]=data[idx[i]];
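
For reference, the operation described above is usually called a gather. Below is a minimal sketch in C of one way it might be written, assuming a compiler that defines __AVX2__ when AVX2 code generation is enabled (for example with -mavx2 on GCC or Clang). The name gather_f32 is hypothetical and is not an IPP function. The AVX2 path uses the _mm256_i32gather_ps intrinsic, which treats the indices as signed 32-bit values, so each index is assumed to be below 2^31 and within the bounds of data. Without AVX2 the same function falls back to the plain loop.

```c
#include <stddef.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical helper (not part of IPP): result[i] = data[idx[i]] for i in [0, n).
   Assumptions: every idx[i] is a valid index into data and is below 2^31,
   and result does not overlap data or idx. */
static void gather_f32(float *restrict result, const float *restrict data,
                       const unsigned int *restrict idx, size_t n)
{
    size_t i = 0;
#ifdef __AVX2__
    /* AVX2 path: each gather loads 8 floats selected by 8 indices. */
    for (; i + 8 <= n; i += 8) {
        __m256i vi = _mm256_loadu_si256((const __m256i *)(idx + i));
        __m256 v = _mm256_i32gather_ps(data, vi, 4); /* scale 4 = sizeof(float) */
        _mm256_storeu_ps(result + i, v);
    }
#endif
    /* Scalar path: handles the remaining elements, or all of them without AVX2. */
    for (; i < n; i++)
        result[i] = data[idx[i]];
}
```

A call such as gather_f32(result, data, idx, 10) would replace the loop above. Whether the gather path is actually faster than the scalar loop depends on the processor and on the array size, so it is worth timing both on the target machine.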

 

 

2 Replies
Chuck_De_Sylva
Beginner
Have you checked whether your compiler has the auto-vectorizer turned on? That will probably help you a lot. Since there are only 10 elements in the loop, the overhead of threading the function may make the performance worse.
egrayver
Beginner
The 10-element array was just an example. Actual arrays may have 1000 elements. I believe icpc will auto-vectorize when the -O3 switch (/O3 on Windows) is used.