Performance of YASK with different snoop configuration modes on Xeons

Michael_T_

I was wondering how the performance of the YASK stencil benchmarks varies with the snoop configuration mode on Haswell or Broadwell Xeons: Early Snoop vs. Home Snoop vs. Cluster-on-Die?

Thanks,

Michael

McCalpinJohn

I have not run these benchmarks, but most stencil operations are bandwidth-limited, so they will benefit from the higher bandwidth of "Home Snoop" vs "Early Snoop". If the implementation is NUMA-friendly, then "Cluster-on-Die" should provide an additional benefit.
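To illustrate why stencils tend to be bandwidth-limited, here is a rough roofline-style estimate in Python. It is only a sketch under stated assumptions: the 7-point double-precision stencil, the ideal-cache-reuse traffic of 16 bytes per grid point, and the machine numbers (`PEAK_GFLOPS`, `MEM_BW_GBS`) are hypothetical placeholders, not YASK kernels and not measurements from this thread.

```python
def stencil_arithmetic_intensity(flops_per_point, bytes_per_point):
    """FLOPs performed per byte moved to or from memory."""
    return flops_per_point / bytes_per_point


def roofline_gflops(intensity, peak_gflops, mem_bw_gbs):
    """Attainable GFLOP/s under the simple roofline model."""
    return min(peak_gflops, intensity * mem_bw_gbs)


# Hypothetical 7-point, double-precision stencil with ideal cache reuse:
# 7 multiplies + 6 adds per grid point; one 8-byte read and one 8-byte write.
FLOPS_PER_POINT = 13
BYTES_PER_POINT = 16

# Placeholder machine numbers (assumptions, NOT measurements from this thread).
PEAK_GFLOPS = 400.0
MEM_BW_GBS = 60.0

ai = stencil_arithmetic_intensity(FLOPS_PER_POINT, BYTES_PER_POINT)
attainable = roofline_gflops(ai, PEAK_GFLOPS, MEM_BW_GBS)
memory_bound = ai * MEM_BW_GBS < PEAK_GFLOPS

print(f"intensity  = {ai:.4f} FLOP/byte")
print(f"attainable = {attainable:.2f} GFLOP/s, memory-bound = {memory_bound}")
```

With placeholder numbers like these, the attainable rate scales directly with memory bandwidth, which is why a snoop mode that raises sustained bandwidth would be expected to help. Substituting bandwidth measured on your own system for `MEM_BW_GBS` gives a more meaningful estimate.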

The local bandwidth difference between "Home Snoop" and "Early Snoop" is not large, but there is a very big difference in remote bandwidth on the systems I have tested (mostly Xeon E5 v3 "Haswell EP"). The attached chart shows results I obtained using the Intel Memory Latency Checker on a 2-socket Xeon E5-2660 v3 system. NOTE that these are REMOTE bandwidth numbers only; the local bandwidth numbers are much, much closer.

HSW-EP_RemoteBW-vs-SnoopMode_v2.png
