Intel cores used to fetch an aligned 16 bytes per cycle from the instruction cache; hence, Intel recommended aligning branch targets to 16-byte boundaries. However, Golden Cove increased the fetch bandwidth to 32 bytes per cycle, and I am wondering what that implies for branch target alignment. Do branch targets now need to be aligned to 32-byte boundaries, or does Golden Cove fetch “unaligned” 32 bytes per cycle?
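To make the cost of the two options concrete, here is a minimal sketch of how much padding a branch target would need under a 16-byte versus a 32-byte alignment rule. It is pure address arithmetic: the function name `padding_to_align` and the example address are hypothetical, and it says nothing about what Golden Cove actually requires.

```python
def padding_to_align(address: int, alignment: int) -> int:
    """Return the padding bytes needed so `address` lands on an `alignment` boundary."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Distance from `address` up to the next multiple of `alignment` (0 if already aligned).
    return (-address) & (alignment - 1)


target = 0x401028  # hypothetical branch target address, chosen only for illustration
for alignment in (16, 32):
    print(f"align {alignment}: pad {padding_to_align(target, alignment)} bytes")
```

For this address, a 32-byte rule would cost more padding than a 16-byte rule, which is why the answer matters for code size.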
Best,
Rakesh
Hello RakeshD,
Thank you for posting in the communities. To explain this to you in detail, may I please know the specific model of your CPU? I will be waiting for your reply. Thank you.
Ramyer M.
Intel Customer Support Technician
Hello Ramyer,
The question was not about a particular product, but rather about the microarchitecture in general.
Intel® 64 and IA-32 Architectures Optimization Reference Manual Volume 1 (https://www.intel.com/content/www/us/en/content-details/671488/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html) mentions in Section 2.3.1 that Golden Cove's fetch bandwidth is increased from 16 to 32 bytes/cycle. Further, Section 3.4.1.4 recommends aligning branch targets to 16-byte boundaries. So, I assume that the 32-byte fetch (16-byte in earlier microarchitectures) has to be aligned. Is that correct?
I was also wondering why fetches from the instruction cache must be aligned to 16-byte boundaries, especially when the data cache does not have this constraint. A downside of this constraint is that a 16-byte fetch must be split into two fetch requests if it crosses a 16-byte boundary, even within the same 64-byte cache block. For example, to fetch byte_8 through byte_23 (16 bytes) from an instruction cache block, we need two cache accesses: byte_0 to byte_15 in one cycle, then byte_16 to byte_31 in the next. If the instruction cache allowed fetches to cross 16-byte boundaries, as the data cache does, a single cycle would suffice.
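The split-fetch arithmetic in this example can be sketched as a small model that counts how many aligned fetch windows a byte range touches. This is only an illustration of the reasoning, assuming one aligned window is fetched per cycle; the function name `aligned_fetches` is hypothetical, and the model is not a description of the real front end.

```python
def aligned_fetches(start: int, length: int, window: int = 16) -> int:
    """Count the aligned `window`-byte blocks covered by bytes [start, start + length)."""
    if length <= 0:
        return 0
    first_block = start // window                 # block holding the first byte
    last_block = (start + length - 1) // window   # block holding the last byte
    return last_block - first_block + 1


# byte_8 .. byte_23 with aligned 16-byte windows: touches blocks 0 and 1
print(aligned_fetches(8, 16))             # prints 2
# the same bytes starting on a boundary need a single access
print(aligned_fetches(0, 16))             # prints 1
# byte_8 .. byte_23 with an aligned 32-byte window fits in one block
print(aligned_fetches(8, 16, window=32))  # prints 1
```

Under this model, an unaligned-capable fetch would always take one access for any 16-byte range that stays inside the 64-byte cache block, which is the saving the example describes.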
Thanks,
Rakesh
Hello RakeshD,
Thank you for sharing this information. I will coordinate internally with our team so we can answer your inquiry. Rest assured that I will update this thread as soon as the information is available. Thank you for your patience and cooperation.
Ramyer M.
Intel Customer Support Technician
Hello RakeshD,
We appreciate your patience and apologize for the extended wait.
After a comprehensive review, we must inform you that we are currently unable to provide an official response regarding your query on the Golden Cove microarchitecture. For the most up-to-date information, we invite you to check our Where to Find Intel® Product Roadmaps article.
As a result, we will be closing this inquiry. Should you require additional support in the future, please feel free to submit a new question, as we will no longer be monitoring this thread.
Best regards,
Norman S.
Intel Customer Support Engineer
