Hey, I've got a tricky performance snag on my new Intel Xeon cluster, which runs a heavily optimized HPC workload (using VNNI for INT8 gains).
It's fast most of the time, but it intermittently stalls. I suspect cache-line contention.
So I've been wondering:
If I only had Intel VTune and access to raw PMT (Platform Monitoring Technology) data, how should I approach this?
What are the top two or three PMU events to profile in VTune to confirm it's actually cache contention and not just bad instruction flow?
How can I use the PMT data (power, thermals, etc.) to show definitively that this isn't contention and the platform is just throttling the clock speed?
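To make the throttling-versus-contention question concrete, here is a minimal sketch of the decision logic in Python. Everything in it is an assumption for illustration: the `Sample` fields, the `classify` helper, and the threshold values are hypothetical and would need to be mapped to whatever a PMT decoder and a VTune export actually provide on the platform.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measurement interval. All field names are hypothetical."""
    freq_mhz: float          # observed core frequency in this interval
    throttled: bool          # True if any thermal/power limit flag was set
    hitm_per_kinst: float    # cross-core modified-line snoop hits per 1000 instructions
    throughput_ratio: float  # measured throughput / healthy baseline throughput


def classify(sample, nominal_mhz, slow_below=0.8, freq_tolerance=0.95,
             hitm_threshold=1.0):
    """Label a slow interval as throttling, contention, both, or other.

    The thresholds are placeholders, not recommended values.
    """
    if sample.throughput_ratio >= slow_below:
        return "ok"
    clock_dropped = (sample.throttled
                     or sample.freq_mhz < nominal_mhz * freq_tolerance)
    contended = sample.hitm_per_kinst > hitm_threshold
    if clock_dropped and contended:
        return "both"
    if clock_dropped:
        return "throttling"
    if contended:
        return "contention"
    return "other"


if __name__ == "__main__":
    slow = Sample(freq_mhz=1800.0, throttled=True,
                  hitm_per_kinst=0.1, throughput_ratio=0.5)
    print(classify(slow, nominal_mhz=2600.0))
```

The point of the sketch is that the two signals have to be lined up on the same time intervals: a slow interval with a dropped clock and a quiet snoop counter points at throttling, while a slow interval at full clock with a high snoop count points at contention.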
And finally, say the real fix is a vendor-pushed microcode patch for better VNNI instruction scheduling. How do I verify that this critical little binary file is legitimate before the CPU loads it? I'm thinking of code signing and Intel TXT here.
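As one small piece of that last question, a file-level integrity check can be sketched as below. This is only a sketch under stated assumptions: it presumes the vendor publishes a SHA-256 digest for the file over a trusted channel, and `sha256_of` and `matches_published_digest` are hypothetical helper names. It only confirms that the downloaded file matches the published digest; it does not replace the signature check the processor itself performs when a microcode update is loaded, and it says nothing about Intel TXT measurement.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare the file's digest with the vendor-published one.

    published_hex is assumed to come from a trusted channel.
    """
    actual = sha256_of(path)
    expected = published_hex.strip().lower()
    # Constant-time comparison; not strictly needed for public digests,
    # but a harmless habit.
    return hmac.compare_digest(actual, expected)
```

A call would look like `matches_published_digest("microcode.bin", digest_from_vendor)`, where both the file name and the digest source are placeholders.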