- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

I have downloaded a 2010 paper by Fritz Gerneth, FIR Filter Algorithm Implementation Using Intel SSE Instructions, targeted for the Atom.

In page 4, there is a brief description of the sum to be implemented and vectorized. The code is as follows:

for ( j = 0; j < 640; j++ ) {

int s = 0; // s = accumulator

for ( i =0; i <= 63; i++ )

s += c* * x[i + j]; // x[] = input values, c[] = filter coefficients y = s; // y[] = output values}*

When j is at the last iteration (639) and i is in the second (1), the index in x[i + j] will overflow, as the text says it is 640 input elements that will be filtered. By the time i is in its last iteration (63), we will have x[63 + 639], which is clearly broken.

- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization

Link Copied

0 Replies

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page