
Performance of YASK with different snoop configuration modes on Xeons

Michael_T_

I was wondering how the performance of the YASK stencil benchmarks varies with the snoop configuration mode on Haswell or Broadwell Xeons: Early Snoop vs. Home Snoop vs. Cluster-on-Die?

Thanks,

Michael

McCalpinJohn

I have not run these benchmarks, but most stencil operations are bandwidth-limited, so they will benefit from the higher bandwidth of "Home Snoop" compared with "Early Snoop". If the implementation is NUMA-friendly, then "Cluster-on-Die" should provide an additional benefit.
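To illustrate why a stencil is usually bandwidth-limited, here is a rough roofline-style estimate in Python. It is only a sketch under stated assumptions: the 7-point double-precision stencil, the flop and byte counts per grid point, and the peak compute and bandwidth figures are all illustrative placeholders, not measurements from any system discussed in this thread.

```python
def arithmetic_intensity(flops_per_point, bytes_per_point):
    """Flops performed per byte moved to or from memory."""
    return flops_per_point / bytes_per_point


def attainable_gflops(intensity, peak_gflops, peak_gbs):
    """Simple roofline bound: the lower of the compute and memory ceilings."""
    return min(peak_gflops, intensity * peak_gbs)


def is_bandwidth_bound(intensity, peak_gflops, peak_gbs):
    """True when the memory ceiling sits below the compute ceiling."""
    return intensity * peak_gbs < peak_gflops


# Assumed 7-point stencil in double precision: 7 multiplies + 6 adds.
FLOPS_PER_POINT = 13
# Assumed perfect cache reuse of neighbors: one 8-byte read, one 8-byte
# write, plus 8 bytes of write-allocate traffic per grid point.
BYTES_PER_POINT = 24

# Hypothetical machine numbers, chosen only for illustration.
PEAK_GFLOPS = 400.0
PEAK_GBS = 60.0

intensity = arithmetic_intensity(FLOPS_PER_POINT, BYTES_PER_POINT)
attainable = attainable_gflops(intensity, PEAK_GFLOPS, PEAK_GBS)
bound = is_bandwidth_bound(intensity, PEAK_GFLOPS, PEAK_GBS)

print(f"intensity  = {intensity:.3f} flop/byte")
print(f"attainable = {attainable:.1f} GFLOP/s")
print(f"bandwidth-bound: {bound}")
```

With these placeholder numbers the memory ceiling is far below the compute ceiling, so any change in sustained memory bandwidth moves the attainable rate almost proportionally. Substituting bandwidth values measured on your own system for each snoop mode gives a first-order estimate of the expected difference.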

The local bandwidth difference between "Home Snoop" and "Early Snoop" is not large, but there is a very big difference in remote bandwidth on the systems I have tested (mostly Xeon E5 v3 "Haswell EP"). The attached chart shows results I obtained using the Intel Memory Latency Checker on a 2-socket Xeon E5-2660 v3 system. NOTE that these are REMOTE bandwidth numbers only; the local bandwidth numbers are much, much closer.

HSW-EP_RemoteBW-vs-SnoopMode_v2.png
