Consider the loop:
do i = 1, n
out(i) = out(i) + in( index(i) )
enddo
You can get it to vectorize with -xSSE4.1, but it is still very slow because of the
indirect memory reference in( index(i) ).
The loop
do i = 1, n
out(i) = out(i) + in( i )
enddo
would run 4 or 5 times faster.
What's the most efficient way to perform loops with indirect memory references?
1 Solution
Try a localized gather without read/modify/write
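[cpp]
do i = 1, n, 128
  ! number of elements in this strip (the last strip may be short)
  jmax = min(128, n-i+1)
  ! gather into a small contiguous temporary
  do j = 1, jmax
    inTemp(j) = in( index(i+j-1) )
  enddo
  ! unit-stride read/modify/write
  do j = 1, jmax
    out(i+j-1) = out(i+j-1) + inTemp(j)
  enddo
enddo
[/cpp]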
Jim Dempsey
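For anyone who wants to compile and time this, here is a minimal self-contained sketch of the strip-mined gather next to the original loop from the question. The module and subroutine names, the argument names (idx, src, dst), the default real kind, and the choice to index the temporary from 1 within each strip are my assumptions, not something stated in the thread. Whether it actually beats the plain loop will depend on the compiler and on the access pattern in idx.

```fortran
module blocked_gather_sketch
  implicit none
  ! Strip length; 128 is the value used in the reply above.
  integer, parameter :: block_size = 128
contains

  ! Reference version: the original indirect loop from the question.
  subroutine add_indirect(n, idx, src, dst)
    integer, intent(in)    :: n
    integer, intent(in)    :: idx(n)
    real,    intent(in)    :: src(:)
    real,    intent(inout) :: dst(n)
    integer :: i
    do i = 1, n
      dst(i) = dst(i) + src(idx(i))
    enddo
  end subroutine add_indirect

  ! Strip-mined version: gather one strip into a contiguous buffer,
  ! then do the read/modify/write on unit-stride data only.
  subroutine add_indirect_blocked(n, idx, src, dst)
    integer, intent(in)    :: n
    integer, intent(in)    :: idx(n)
    real,    intent(in)    :: src(:)
    real,    intent(inout) :: dst(n)
    real    :: tmp(block_size)
    integer :: i, j, jmax
    do i = 1, n, block_size
      ! The last strip may hold fewer than block_size elements.
      jmax = min(block_size, n - i + 1)
      do j = 1, jmax
        tmp(j) = src(idx(i + j - 1))
      enddo
      do j = 1, jmax
        dst(i + j - 1) = dst(i + j - 1) + tmp(j)
      enddo
    enddo
  end subroutine add_indirect_blocked

end module blocked_gather_sketch
```

Calling both routines on copies of the same data and comparing the results is a quick sanity check before timing them.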
2 Replies
The speedup from SSE4 vectorization of a gather depends a great deal on cache locality. With good locality it could as much as double the speed; with poor locality it may show no gain at all. If the loop length is on the order of 1000 and there is enough cache locality that no thread has to read all cache lines, an OpenMP parallel loop should show a significant gain.
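As a rough illustration of the OpenMP suggestion, the loop from the question could be parallelized along these lines. This is only a sketch: the module, subroutine, and argument names (idx, src, dst), the default real kind, and the static schedule are my assumptions, and any gain depends on the locality and loop length discussed above. Each iteration writes a different dst(i), so the iterations are independent. The directives are ordinary comments unless OpenMP is enabled at compile time (for example -fopenmp with gfortran or -qopenmp with ifort).

```fortran
module omp_gather_sketch
  implicit none
contains

  ! Indirect accumulate, with the loop iterations divided among threads.
  subroutine add_indirect_omp(n, idx, src, dst)
    integer, intent(in)    :: n
    integer, intent(in)    :: idx(n)
    real,    intent(in)    :: src(:)
    real,    intent(inout) :: dst(n)
    integer :: i
    ! A static schedule gives each thread one contiguous range of i,
    ! so each thread writes its own contiguous piece of dst.
    !$omp parallel do default(none) shared(n, idx, src, dst) private(i) schedule(static)
    do i = 1, n
      dst(i) = dst(i) + src(idx(i))
    enddo
    !$omp end parallel do
  end subroutine add_indirect_omp

end module omp_gather_sketch
```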