Hi,
I was wondering what the penalty, in clock cycles, is for doing 256-bit loads and stores when there is
a cache line split?
Thanks!
-Jeremy
1 Reply
For Sandy Bridge, the compilers avoid cache line splits on 256-bit AVX accesses by always splitting them explicitly into 128-bit AVX instructions, which are expected to be faster in that case. To measure the penalty of a split 256-bit access, you would have to write the loads and stores yourself with intrinsics. For other AVX CPUs, your guess is as good as mine.
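As a starting point for such a test, here is a minimal sketch of the two variants to compare. It assumes GCC or Clang on x86 and a 64-byte cache line. The names `crosses_cache_line`, `copy32_one_256` and `copy32_two_128` are made up for illustration. The compiler may still emit different instructions than the intrinsics suggest, so check the disassembly before trusting any timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on Sandy Bridge. */
#define CACHE_LINE 64u

/* Returns 1 if an access of n bytes starting at p touches two cache lines. */
int crosses_cache_line(const void *p, size_t n)
{
    uintptr_t a = (uintptr_t)p;
    assert(n > 0);
    return (a / CACHE_LINE) != ((a + n - 1) / CACHE_LINE);
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

/* Variant A: one unaligned 256-bit load and one unaligned 256-bit store. */
__attribute__((target("avx")))
void copy32_one_256(float *dst, const float *src)
{
    _mm256_storeu_ps(dst, _mm256_loadu_ps(src));
}

/* Variant B: the same 32 bytes moved as two 128-bit halves. */
__attribute__((target("avx")))
void copy32_two_128(float *dst, const float *src)
{
    __m128 lo = _mm_loadu_ps(src);      /* bytes 0..15  */
    __m128 hi = _mm_loadu_ps(src + 4);  /* bytes 16..31 */
    _mm_storeu_ps(dst, lo);
    _mm_storeu_ps(dst + 4, hi);
}
#endif
```

To use it, take a 64-byte-aligned buffer and point `src` or `dst` at byte offset 48, where `crosses_cache_line(p, 32)` returns 1. Call each variant in a long loop and time it, then repeat at offset 32, where there is no split. The difference between the split and non-split timings of variant A is a rough estimate of the penalty on the CPU you run it on.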