09-10-2004 01:10 AM
I have some questions about the cache-blocking method for exploiting data locality. I am referring to the following article.
I know that the main factors to take into consideration are:
- the size of the processor's L2 cache.
- the number of iterations (the degree of data re-use).
- the data block size relative to the L2 cache size.
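For reference, the cache-blocking (loop tiling) technique under discussion can be sketched in C as a tiled matrix multiply. This is a minimal illustration, not the article's code; the function name `matmul_blocked` and the `block` parameter are hypothetical. The block size is the knob that the factors above govern: the three tiles touched by the inner loops are re-used many times while they stay resident in the cache.

```c
#include <stddef.h>

/* C += A * B for n x n row-major matrices of doubles, processed in
   block x block tiles. `block` is a hypothetical tuning parameter that
   would be sized against the L2 cache. */
void matmul_blocked(size_t n, size_t block,
                    const double *a, const double *b, double *c)
{
    if (block == 0)
        return; /* a zero tile size would loop forever */
    for (size_t ii = 0; ii < n; ii += block)
        for (size_t kk = 0; kk < n; kk += block)
            for (size_t jj = 0; jj < n; jj += block) {
                /* Clamp the tile edges when n is not a multiple of block. */
                size_t i_end = ii + block < n ? ii + block : n;
                size_t k_end = kk + block < n ? kk + block : n;
                size_t j_end = jj + block < n ? jj + block : n;
                /* Work only inside one tile each of A, B and C so that
                   the data re-used here stays in the cache. */
                for (size_t i = ii; i < i_end; ++i)
                    for (size_t k = kk; k < k_end; ++k) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

The edge clamping lets the same loop handle matrix sizes that are not a multiple of `block`; only the choice of `block` changes from one cache to another.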
Question # 1
Does anyone know whether the sector mapping and the set associativity of the L2 cache play any part?
Does anyone have any data on this?
Question # 2
If a system has an onboard L3 cache, would it not be better to exploit data locality in the larger L3 cache rather than in the L2 cache?
- Mithun Shanbhag
(ps: My system has an 8-way sector-mapped, 512 KB unified L2 cache with a 64-byte cache line. It does not have an L3 cache.)
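To see where set associativity comes in, here is a sketch that computes which set a 64-byte line maps to, using the geometry from the ps: 512 KB / (64 B × 8 ways) = 1024 sets. It assumes the cache is indexed directly by the given address and ignores sectoring; real L2 caches are typically indexed by physical address, so the OS's page mapping can shift which sets are actually used. The names `set_index` and `distinct_sets` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Geometry of the L2 described in the ps (sectoring ignored). */
enum { LINE_BYTES = 64, WAYS = 8, CACHE_BYTES = 512 * 1024 };
enum { NUM_SETS = CACHE_BYTES / (LINE_BYTES * WAYS) }; /* 1024 sets */

/* Set that the line containing `addr` maps to, assuming the cache is
   indexed by this address directly (real caches use physical addresses). */
static size_t set_index(uintptr_t addr)
{
    return (size_t)((addr / LINE_BYTES) % NUM_SETS);
}

/* Number of distinct sets used by `count` lines spaced `stride` bytes apart.
   If many lines share one set, only WAYS of them can be cached at once. */
static size_t distinct_sets(size_t count, size_t stride)
{
    unsigned char used[NUM_SETS] = {0};
    size_t n = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t s = set_index((uintptr_t)(i * stride));
        if (!used[s]) {
            used[s] = 1;
            ++n;
        }
    }
    return n;
}
```

Sixteen lines spaced 64 KB apart all land in a single set, so an 8-way cache can hold only eight of them at a time; padding the stride by one line (65,600 bytes) spreads the same sixteen lines over sixteen sets. Padding array rows away from power-of-two lengths is the usual way to avoid this kind of conflict.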
09-15-2004 09:01 PM
The purpose of the article was to highlight the shared nature of the cache hierarchy in processors that support Hyper-Threading Technology: with two threads sharing the same cache hierarchy, the cache effectively available to each logical processor is reduced. An application that uses cache blocking should detect processors that support Hyper-Threading Technology and reduce the block size accordingly.
As a general guideline to start from, I've recommended that cache-blocking techniques target ~50% of the cache size on processors without Hyper-Threading Technology enabled. If 50% was a reasonable block size without Hyper-Threading, then running the same application with 2 threads on a Hyper-Threading-enabled processor should target ~25-35% of the cache size. The optimal cache blocking is highly application dependent and is significantly influenced by any other processes that may be running.
Certainly, the set associativity plays a part in both L2 and L3 cache behavior and performance. There are cases where you can effectively increase (or inadvertently decrease) cache performance by using knowledge of the set associativity to fine-tune the application's access pattern. Unfortunately, I don't have any specific data on this.
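The percentage guideline can be turned into a concrete tile size. This sketch assumes a kernel, such as a tiled matrix multiply, that keeps three square tiles of `double` live at once, 8-byte doubles, and a tile edge rounded down to a whole 64-byte line (8 doubles). `tile_dim` is a hypothetical helper, and the three-tile assumption is not from the article.

```c
#include <stddef.h>

/* Largest tile edge B (in doubles) such that three B x B tiles of doubles
   fit in `percent` percent of `cache_bytes`, rounded down to a multiple of
   8 doubles (one 64-byte line). Returns 0 if even a 1 x 1 tile won't fit. */
size_t tile_dim(size_t cache_bytes, unsigned percent)
{
    size_t budget = cache_bytes * percent / 100; /* bytes allowed for tiles */
    size_t b = 0;
    /* Grow B while three (B+1) x (B+1) tiles still fit in the budget. */
    while (3 * (b + 1) * (b + 1) * sizeof(double) <= budget)
        ++b;
    return b - b % 8; /* align the tile edge to whole cache lines */
}
```

For the 512 KB L2 in the question, targeting 50% gives a 104 × 104 tile, while targeting 30% (two threads sharing the cache on a Hyper-Threading processor) gives 80 × 80.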
By extension, the cache-blocking technique can be applied to the L3 cache instead of the L2 cache, but again this is application dependent. Beware that applying cache blocking to the L3 cache can run into other performance bottlenecks. For example, the number of entries in the DTLB may also limit the effective size of the block: if the block is too large, it spans more pages than the DTLB can map, causing DTLB misses. While this isn't as likely with a 512 KB L2 cache, it can be an issue with the larger sizes typical of L3 caches.
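The DTLB concern can be checked with a little arithmetic: once the row pitch of a large row-major matrix reaches the page size, a tile taken from it touches at least one page per row. The sketch below counts the distinct pages a tile touches, assuming 4 KB pages, a page-aligned tile start, and a row pitch at least as large as the tile row. `pages_touched`, `exceeds_dtlb`, and the 64-entry DTLB in the example are hypothetical; check your processor's documentation for its real entry count.

```c
#include <stddef.h>

/* Distinct pages touched by a tile of `rows` rows, each `row_bytes` long
   (row_bytes > 0), whose starts are `stride_bytes` apart (the matrix row
   pitch). Assumes the tile starts on a page boundary and
   stride_bytes >= row_bytes, so rows never overlap. */
size_t pages_touched(size_t rows, size_t row_bytes,
                     size_t stride_bytes, size_t page_bytes)
{
    size_t count = 0;
    size_t next_new = 0; /* pages below this index are already counted */
    for (size_t r = 0; r < rows; ++r) {
        size_t start = r * stride_bytes;
        size_t first = start / page_bytes;
        size_t last = (start + row_bytes - 1) / page_bytes;
        if (first < next_new)
            first = next_new; /* skip a page shared with the previous row */
        if (last >= first) {
            count += last - first + 1;
            next_new = last + 1;
        }
    }
    return count;
}

/* Nonzero if the tile needs more pages than a DTLB with `entries` entries maps. */
int exceeds_dtlb(size_t pages, size_t entries)
{
    return pages > entries;
}
```

A 104 × 104 tile of doubles (832-byte rows) from a 2048-column matrix (16 KB pitch) touches 104 pages, more than the hypothetical 64 entries, while the same tile from a 256-column matrix (2 KB pitch) touches only 52.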
I hope this helps.