- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
The following is the snapshot from VTune on my Haswell processor. However, I don't understand that why the CPU time and the number of instructions retired for the highlighted code (vpbroadcastq) are so significantly greater than the others in the same basic block. I thought the number of the retired instructions should be not too different, though there might be cache misses or TLB misses. Can someone explain some possible reasons for it? Thanks.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I think this topic is Hardware Event Skid relevant - please see this
You may find performance data refelcting to source line.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your link, Peter. But the question is that the highlighted instruction is in the middle of the basic block. Why the recording of the other instructions before/after it is not affected if this is due to hardware event skid?
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page