I have downloaded a 2010 paper by Fritz Gerneth, "FIR Filter Algorithm Implementation Using Intel SSE Instructions", which targets the Atom.
On page 4, there is a brief description of the sum to be implemented and vectorized. The code is as follows:
for ( j = 0; j < 640; j++ ) {
    int s = 0; // s = accumulator
    for ( i = 0; i <= 63; i++ )
        s += c[i] * x[i + j]; // x[] = input values, c[] = filter coefficients
    y[j] = s; // y[] = output values
}
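When j is at its last iteration (639) and i is at its second (1), the index in x[i + j] is already 640, which is one past the end of the array if, as the text says, only 640 input elements are filtered. By the time i reaches its last iteration (63), the code reads x[63 + 639] = x[702], which is clearly out of bounds. For the loop to be valid as written, x[] would need at least 640 + 63 = 703 elements.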
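To make the index arithmetic concrete, here is a minimal C sketch of the same loop nest with the required input length checked up front. The names fir_max_input_index and fir_ref, the explicit x_len parameter, and the use of int for all arrays are my own assumptions for illustration; none of them come from the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Highest index of x[] read by the loop nest: (n_out - 1) + (n_taps - 1).
   Hypothetical helper, not from the paper. */
static size_t fir_max_input_index(size_t n_out, size_t n_taps)
{
    return (n_out - 1) + (n_taps - 1);
}

/* Scalar reference FIR with the same loop structure as the posted code.
   x_len is the number of valid elements in x[] (an assumed extra parameter). */
static void fir_ref(const int *x, size_t x_len,
                    const int *c, size_t n_taps,
                    int *y, size_t n_out)
{
    assert(n_out > 0 && n_taps > 0);
    /* The input must cover every index the inner loop touches. */
    assert(x_len > fir_max_input_index(n_out, n_taps));

    for (size_t j = 0; j < n_out; j++) {
        int s = 0;                      /* accumulator */
        for (size_t i = 0; i < n_taps; i++)
            s += c[i] * x[i + j];       /* reads up to x[n_out + n_taps - 2] */
        y[j] = s;
    }
}
```

With n_out = 640 and n_taps = 64, fir_max_input_index gives 702, so calling fir_ref with x_len = 640 trips the assertion, while x_len = 703 passes. If only 640 input samples really exist, the alternatives would be to pad the input with 63 extra values or to produce only 640 - 63 = 577 outputs.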