Dear All,
The Intel Manual (Intel 64 and IA-32 Architectures Optimization Reference Manual)
says in Section 2.1.4.3 that the L2 cache can only track 1 memory stream per 4K memory page.
Is this also true for the L1D cache, or can the L1D cache prefetch more than one
memory stream from the same page?
Regards
Chet
3 Replies
I don't see a section number like that in a recent version, but the current manual carries nearly the same information: the latest model supports 1 forward and 1 backward L2 prefetch stream per 4K page, while L1D supports only 1 forward stream per page.
Depending on which model you mean, I guess the question would be more topical on the AVX/instruction set forum.
From my limited working experience, it would appear that prefetching into the L1D cache is only useful when the TLB entry for the target address is not already cached.
In order for a memory location to be read (or written), the page-table entries for its virtual address must reside in the TLB caching system (which is separate from the LnD and LnI caches); the TLB caches a small portion of the page tables. I've experienced performance degradation when the TLB entry for that address is already loaded.
IOW, the prefetch seems to be effective only at loading the TLB.
On different architectures this may not hold.
Jim Dempsey
The reason for terminating these hardware prefetchers at the end of the page is to avoid page miss side effects. If you wish to initiate a prefetch to Xeon TLB cache, so as to get started early on resolution of a possible DTLB miss, you would require a software prefetch, either explicit in source code, or generated by a compiler option. I'm not reading this as part of the original question.
The question does raise interesting consequences. If an array has been modified by a different core, the prefetcher would accelerate getting it back to a core which needs the update. Only 1 forward-going array in the page could be so accelerated by hardware prefetch, and an additional backward-going array could be accelerated only into L2.