I read somewhere that the new processors include special instructions for small lookup tables. Is there a way to optimize the following simple operation:
float data[10] = {0, ..., 9};
unsigned int idx[10] = {2, 3, 5, 0, ..., 9}; // Arbitrary permutation of 0..9
float result[10];
result[i] = data[idx[i]] // for each i
I have to do this operation often and it takes quite a bit of time in a 'for' loop. Currently I use:
for (int i = 0; i < 10; i++) result[i] = data[idx[i]];
2 Replies
Have you checked whether your compiler's auto-vectorizer is turned on? That will probably help you a lot. Since there are only 10 elements in the loop, the overhead of threading the function may make the performance worse.
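To make the loop easier for an auto-vectorizer to handle, it can help to put it in its own function and tell the compiler that the arrays do not overlap. The following is only a minimal sketch: the name `gather_f32` is made up for illustration, and it assumes every index is within the bounds of `data`. Whether the compiler actually emits hardware gather instructions (for example the AVX2 ones) depends on the target flags and on the compiler's own cost model, so check the vectorization report rather than taking it for granted.

```c
#include <stddef.h>

/* Hypothetical helper: result[i] = data[idx[i]] for i in [0, n).
 * `restrict` (C99) promises the compiler that the three arrays do not
 * overlap, which removes one common obstacle to auto-vectorization.
 * Assumption: every idx[i] is a valid index into data. */
void gather_f32(float *restrict result,
                const float *restrict data,
                const unsigned int *restrict idx,
                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        result[i] = data[idx[i]];
    }
}
```

For the example in the question this would be called as `gather_f32(result, data, idx, 10);`. The same function works unchanged for longer arrays.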
The 10-element array was just an example. Actual arrays may have 1000 elements. I believe icpc will auto-vectorize when the -O3 switch (/O3 on Windows) is used.