My hotspot function in VTune shows an L1 hit ratio of 99% (and L1 misses of 5%). Does that mean that almost all of the data I need is already prefetched into the caches and there is no need for manual prefetching?
The data is probably aligned and accessed in a linear pattern, for example a small array of 512 doubles. Moreover, if I am not wrong, such a loop could be a candidate for the LSD (Loop Stream Detector) when it translates to <= 28 uops.
Yes, and you should really focus on the L2 cache. The penalty for missing L1 is very small. However, if the CPU has to go to L2 and the data is NOT there, then you incur a huge penalty (10x to 20x the L1 penalty) and this is what you want to tune (unless you are on a server chip with L3...). Note: that doesn't mean you have to use manual prefetching! The processor is very good at prefetching by itself and if you start adding prefetching by hand, you could actually reduce your performance. Instead, focus on data layout and getting all the data you need to touch in the same cache line (e.g., members of a struct, touch them all, don't use multiple loops and touch one member in each loop).
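To illustrate the data-layout advice above, here is a minimal sketch in C. The struct `Particle` and the functions `sum_fused` and `sum_split` are hypothetical names made up for this example and do not come from the original code. It assumes a 64-byte cache line and 8-byte doubles, so one record occupies 32 bytes.

```c
#include <stddef.h>

/* Hypothetical record: four doubles, 32 bytes on typical platforms,
   so two records fit in one 64-byte cache line (assumed line size). */
typedef struct {
    double x, y, z, w;
} Particle;

/* One pass over the array: all four members of a record are read
   while the cache line holding that record is resident. */
double sum_fused(const Particle *p, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += p[i].x + p[i].y + p[i].z + p[i].w;
    return total;
}

/* Four passes over the array: each loop reads a single member,
   so every cache line of the array is walked four times. */
double sum_split(const Particle *p, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) total += p[i].x;
    for (size_t i = 0; i < n; ++i) total += p[i].y;
    for (size_t i = 0; i < n; ++i) total += p[i].z;
    for (size_t i = 0; i < n; ++i) total += p[i].w;
    return total;
}
```

Both functions compute the same sum. When the array is larger than the cache, the split version may have to fetch each line from memory again on every pass, while the fused version touches each line once. Whether that shows up in practice depends on the array size and the hardware, so it is worth confirming with the profiler.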
Yes, the penalty for an L2 miss is indeed quite high, but doesn't an L1 hit ratio of 99% mean that 99% of my data needs are properly prefetched and only 1% is missed, so even if this 1% is missed in L2 as well, it's still a trivial number? I'm using it on MIC.
Ah, yes. Sorry. I misread your original post. Your understanding is correct.
