Hi,
I was wondering what the latency and throughput of the vbroadcastsd instruction are on Sandy Bridge.
I did not find that information in the Optimization Reference Manual.
Thanks!
-Jeremy
1 Reply
According to Agner Fog's instruction tables, vbroadcastsd has a throughput of one per cycle, like every other shuffle-type instruction. The latency is mainly determined by the memory access: a load from L1 cache takes 4 cycles, and the broadcast itself should take 1 cycle. There might also be a penalty of 1 or 2 cycles for crossing execution domains (I don't remember if that applies here). In any case, this is a fast instruction for what it's designed to do.
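For illustration, here is a minimal C sketch of how the instruction is usually reached from source code, plus the latency arithmetic above written out. The helper names `broadcast4` and `estimated_latency_cycles` are hypothetical, and the cycle counts are simply the figures quoted in this reply, not measurements. The `_mm256_broadcast_sd` intrinsic typically compiles to vbroadcastsd with a memory operand when AVX is enabled (for example with `-mavx`); a scalar fallback is included so the sketch still builds without it.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: fill out[0..3] with four copies of *src. */
static void broadcast4(const double *src, double out[4])
{
    assert(src != NULL && out != NULL);
#if defined(__AVX__)
    /* Compilers typically emit "vbroadcastsd ymm, m64" for this intrinsic. */
    __m256d v = _mm256_broadcast_sd(src);
    _mm256_storeu_pd(out, v); /* unaligned store, no alignment assumed */
#else
    /* Scalar fallback so the sketch builds when AVX is not enabled. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = *src;
#endif
}

/* Rough load-to-use estimate built from the figures quoted in the reply.
 * domain_penalty is the possible 0..2 cycle domain-crossing cost; whether
 * it applies to this instruction is left open, as in the reply. */
static int estimated_latency_cycles(int domain_penalty)
{
    const int l1_load = 4;   /* L1 cache hit */
    const int broadcast = 1; /* the broadcast itself */
    return l1_load + broadcast + domain_penalty;
}
```

With these assumptions, the estimate ranges from `estimated_latency_cycles(0)` to `estimated_latency_cycles(2)`, that is, from 5 to 7 cycles for data already in L1. Measuring on real hardware is the only way to confirm which end applies.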