
Branch target address alignment on Golden Cove

RakeshD
Novice

Earlier Intel cores fetched one aligned 16-byte block per cycle from the instruction cache, so Intel recommended aligning branch targets to 16-byte boundaries. Golden Cove, however, increased the fetch bandwidth to 32 bytes per cycle, and I am wondering what this implies for branch target alignment. Do branch targets now need to be aligned to 32-byte boundaries, or can Golden Cove fetch 32 unaligned bytes per cycle?

 

Best,

Rakesh

RamyerM_Intel
Moderator

Hello RakeshD, 


Thank you for posting in the communities. To explain this to you in detail, may I please know the specific model of your CPU? I will be waiting for your reply. Thank you. 


Ramyer M.

Intel Customer Support Technician 



RakeshD
Novice

Hello Ramyer,

 

The question is not about a particular product but about the microarchitecture in general.

 

The Intel® 64 and IA-32 Architectures Optimization Reference Manual Volume 1 (https://www.intel.com/content/www/us/en/content-details/671488/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html) states in Section 2.3.1 that the Golden Cove fetch bandwidth is increased from 16 to 32 bytes/cycle. Section 3.4.1.4, however, still recommends aligning branch targets to 16-byte boundaries. So I assume that the 32-byte fetch (16-byte in earlier microarchitectures) has to be aligned. Is that correct?
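To make the question concrete, here is a minimal Python sketch of what an aligned fetch window would mean for a branch target. It assumes, purely for illustration, that each fetch returns one naturally aligned window of 16 or 32 bytes; whether Golden Cove actually behaves this way is exactly what I am asking. The function names and the example addresses are hypothetical.

```python
def useful_bytes_in_first_fetch(target: int, window: int) -> int:
    """Bytes from `target` to the end of the aligned window that contains it.

    Assumption: the front end fetches one naturally aligned `window`-byte
    block per cycle, so bytes before `target` in that block are wasted.
    """
    return window - (target % window)


def padding_to_align(target: int, alignment: int) -> int:
    """Padding bytes needed to move `target` up to the next aligned address."""
    return (-target) % alignment


# Hypothetical branch target addresses, chosen only for illustration.
for target in (0x1000, 0x1010, 0x1018):
    for window in (16, 32):
        print(
            f"target={target:#x} window={window:2d} "
            f"useful={useful_bytes_in_first_fetch(target, window):2d} "
            f"padding={padding_to_align(target, window):2d}"
        )
```

Under this model, a target at 0x1010 is perfectly placed for a 16-byte window but delivers only half of a 32-byte window in the first fetch, which is why the alignment of the 32-byte fetch matters for the recommendation.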

 

I am also wondering why fetches from the instruction cache must be aligned to 16-byte boundaries when the data cache has no such constraint. A downside of this constraint is that a 16-byte fetch that crosses a 16-byte boundary must be split into two fetch requests, even when it stays within the same 64-byte cache block. For example, to fetch byte_8 to byte_23 (16 bytes) from an instruction cache block, we need two cache accesses: one fetching byte_0 to byte_15 in the first cycle and one fetching byte_16 to byte_31 in the next. If the instruction cache allowed fetches to cross 16-byte boundaries, as the data cache does, these bytes could be fetched in a single cycle.
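The byte_8 to byte_23 example can be checked with a small Python sketch. It compares two simplified models that are assumptions for illustration, not a description of any real core: an aligned fetcher that must read every aligned window the range touches, and an idealized unaligned fetcher that can start a window at any byte. The function names are hypothetical.

```python
def aligned_fetch_count(first: int, last: int, window: int) -> int:
    """Accesses needed when each fetch returns one aligned `window`-byte block.

    `first` and `last` are inclusive byte offsets within a cache block.
    """
    return last // window - first // window + 1


def unaligned_fetch_count(first: int, last: int, window: int) -> int:
    """Accesses needed by an idealized fetcher that may start at any byte."""
    length = last - first + 1
    return -(-length // window)  # ceiling division


# byte_8 .. byte_23 with a 16-byte aligned window needs two accesses,
# while an unaligned 16-byte fetch would need only one.
print(aligned_fetch_count(8, 23, 16))
print(unaligned_fetch_count(8, 23, 16))
# The same range fits in one aligned 32-byte window.
print(aligned_fetch_count(8, 23, 32))
```

Note that a wider aligned window only moves the problem: under the same model, byte_24 to byte_39 would still need two accesses with a 32-byte aligned window.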

 

Thanks,

Rakesh

RamyerM_Intel
Moderator

Hello RakeshD, 


Thank you for sharing this information. I will coordinate internally with our team so that we can answer your inquiry. Rest assured that I will update this thread as soon as the information is available. Thank you for your patience and cooperation.


Ramyer M.

Intel Customer Support Technician 



NormanS_Intel
Moderator

Hello RakeshD,


We appreciate your patience and apologize for the extended wait.


After a comprehensive review, we must inform you that we are currently unable to provide an official response to your query about the Golden Cove microarchitecture. For the most up-to-date information, we invite you to check our Where to Find Intel® Product Roadmaps article.


As a result, we will be closing this inquiry. Should you require additional support in the future, please feel free to submit a new question, as we will no longer be monitoring this thread.


Best regards,

Norman S.

Intel Customer Support Engineer

