Hi all,
I am trying to use SIMD optimization on a simple vector copy kernel of the form A = B (both vectors are in global memory). What I found is that with SIMD(4)/SIMD(8), the effective global memory bandwidth increases by 4.3x/8.4x compared with the non-optimized code. I would expect the improvement to be limited to 4x/8x in the ideal case when using SIMD(4)/SIMD(8). So why does the improvement I measured exceed the theoretical ideal? Any suggestions are appreciated. Thanks.
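To make the 4x/8x expectation concrete, here is a rough host-side sketch in plain C (not the actual kernel, and not any vendor API). It assumes a simplified model in which every global memory request moves `width` consecutive floats, so SIMD(4) corresponds to `width = 4`. The function name `copy_with_width` and the request-counting model are hypothetical, used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the copy kernel: copy n floats from src to dst,
 * moving up to `width` consecutive elements per memory request.
 * Returns the number of requests issued under this simplified model. */
size_t copy_with_width(float *dst, const float *src, size_t n, size_t width)
{
    size_t requests = 0;

    if (width == 0) {
        return 0; /* nothing sensible to do with a zero-wide access */
    }

    for (size_t i = 0; i < n; i += width) {
        /* The last chunk may be shorter when n is not a multiple of width. */
        size_t chunk = (n - i < width) ? (n - i) : width;
        memcpy(dst + i, src + i, chunk * sizeof(float));
        requests++;
    }
    return requests;
}
```

Under this model, copying 16 elements takes 16 requests with `width = 1`, 4 with `width = 4`, and 2 with `width = 8`, which is where the ideal 4x/8x figures come from. The model ignores everything else that can affect measured bandwidth on real hardware, so by itself it cannot account for a measured ratio above 4 or 8.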