
Performance of YASK with different snoop configuration modes on Xeons

Michael_T_
Beginner

I was wondering how the performance of the YASK stencil benchmarks varies with the snoop configuration mode on Haswell or Broadwell Xeons: Early Snoop vs. Home Snoop vs. Cluster-on-Die?

Thanks,

Michael

1 Reply
McCalpinJohn
Black Belt

I have not run these benchmarks, but most stencil operations are bandwidth-limited, so they will benefit from the higher bandwidth of "Home Snoop" vs "Early Snoop". If the implementation is NUMA-friendly, then "Cluster-on-Die" should provide an additional benefit.
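To illustrate why stencils tend to be bandwidth-limited, here is a rough roofline-style estimate. It is only a sketch: the 7-point double-precision stencil, the per-point flop and byte counts, and the machine peak and bandwidth figures are all illustrative assumptions, not measurements from YASK or from any particular Xeon.

```python
def arithmetic_intensity(flops_per_point, bytes_per_point):
    """Flops performed per byte of memory traffic."""
    return flops_per_point / bytes_per_point


def roofline_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s under a simple roofline model."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Assumed 7-point 3D stencil in double precision with ideal cache reuse:
# 7 multiplies + 6 adds per grid point; memory traffic per point is one
# 8-byte read, one 8-byte write, and 8 bytes of write-allocate traffic.
FLOPS_PER_POINT = 13
BYTES_PER_POINT = 24

# Hypothetical machine numbers, chosen only for illustration.
PEAK_GFLOPS = 400.0    # assumed peak compute rate per socket
BANDWIDTH_GBS = 60.0   # assumed sustained memory bandwidth per socket

intensity = arithmetic_intensity(FLOPS_PER_POINT, BYTES_PER_POINT)
attainable = roofline_gflops(intensity, PEAK_GFLOPS, BANDWIDTH_GBS)
is_bandwidth_limited = attainable < PEAK_GFLOPS

print(f"intensity  = {intensity:.3f} flops/byte")
print(f"attainable = {attainable:.1f} GFLOP/s (peak {PEAK_GFLOPS:.0f})")
print(f"bandwidth-limited: {is_bandwidth_limited}")
```

Under these assumptions the attainable rate scales directly with sustained bandwidth, which is why a snoop mode that delivers more bandwidth would be expected to help such a kernel.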

The local bandwidth difference between "Home Snoop" and "Early Snoop" is not large, but there is a very big difference in remote bandwidth on the systems I have tested (mostly Xeon E5 v3 "Haswell EP"). The attached chart shows results I obtained using the Intel Memory Latency Checker on a 2-socket Xeon E5-2660 v3 system. NOTE that these are REMOTE bandwidth numbers only; the local bandwidth numbers are much, much closer.

HSW-EP_RemoteBW-vs-SnoopMode_v2.png
