Penalty for 256-bit loads and stores with cache line splits

jeremyweek · 05-10-2011

Hi,

I was wondering what the penalty, in clock cycles, is for a 256-bit load or store that splits across a cache line?

Thanks!

-Jeremy

TimP · 05-11-2011

For Sandy Bridge, the compilers avoid a cache line split on a 256-bit AVX access by always splitting it explicitly into 128-bit AVX instructions, which are expected to be faster in that case. To measure the penalty yourself, you would have to write intrinsics. Your guess about other AVX CPUs is as good as mine.
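For anyone who wants to try the intrinsics test, here is a minimal sketch of one way to set it up in C. It is not a calibrated benchmark. The names `crosses_cache_line`, `copy_256`, `copy_2x128`, `time_copy` and the `SPLIT_BENCH_MAIN` macro are made up for illustration, the 64-byte line size is an assumption, and the timing uses `clock()`, so it reports nanoseconds per iteration (loop overhead included) rather than clock cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Returns 1 if an access of `size` bytes starting at `addr` touches two cache lines. */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}

#if defined(__AVX__)
/* One unaligned 256-bit load followed by one unaligned 256-bit store. */
static void copy_256(float *dst, const float *src)
{
    _mm256_storeu_ps(dst, _mm256_loadu_ps(src));
}

/* The same 32 bytes moved as two 128-bit halves. */
static void copy_2x128(float *dst, const float *src)
{
    _mm_storeu_ps(dst, _mm_loadu_ps(src));
    _mm_storeu_ps(dst + 4, _mm_loadu_ps(src + 4));
}

/* Average wall time per call in nanoseconds; includes loop overhead. */
static double time_copy(void (*copy)(float *, const float *),
                        float *dst, const float *src, long iters)
{
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++) {
        copy(dst, src);
        __asm__ volatile("" ::: "memory"); /* GCC/clang barrier: keep the copy in the loop */
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
#endif

#if defined(__AVX__) && defined(SPLIT_BENCH_MAIN)
int main(void)
{
    char *src = aligned_alloc(CACHE_LINE, 4 * CACHE_LINE);
    char *dst = aligned_alloc(CACHE_LINE, 4 * CACHE_LINE);
    if (!src || !dst)
        return 1;
    memset(src, 0, 4 * CACHE_LINE);

    /* Offset 32 keeps a 32-byte access inside one line; offset 48 splits it. */
    assert(!crosses_cache_line((uintptr_t)(src + 32), 32));
    assert(crosses_cache_line((uintptr_t)(src + 48), 32));

    const size_t offsets[] = { 32, 48 };
    const long iters = 100000000L;
    for (size_t i = 0; i < 2; i++) {
        float *s = (float *)(src + offsets[i]);
        float *d = (float *)(dst + offsets[i]);
        printf("offset %zu (split=%d): 256-bit %.2f ns, 2x128-bit %.2f ns\n",
               offsets[i], crosses_cache_line((uintptr_t)s, 32),
               time_copy(copy_256, d, s, iters),
               time_copy(copy_2x128, d, s, iters));
    }
    free(src);
    free(dst);
    return 0;
}
#endif
```

With GCC or clang, something like `gcc -O2 -mavx -DSPLIT_BENCH_MAIN split.c` should build the driver (the file name is arbitrary). Note that at offset 48 the two 128-bit halves cover bytes 48..63 and 64..79, so neither half crosses a line, which is the situation the explicit split into AVX-128 is meant to produce. For real cycle counts you would want a cycle counter and more care about loop overhead than this sketch takes.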