Software Tuning, Performance Optimization & Platform Monitoring

Performance of YASK with different snoop configuration modes on Xeons

Michael_T_

I was wondering how the performance of the YASK stencil benchmarks varies with the snoop configuration mode on Haswell or Broadwell Xeons: Early Snoop vs. Home Snoop vs. Cluster-on-Die?

Thanks,

Michael

McCalpinJohn

I have not run these benchmarks, but most stencil operations are bandwidth-limited, so they will benefit from the higher bandwidth of "Home Snoop" relative to "Early Snoop". If the implementation is NUMA-friendly, "Cluster-on-Die" should provide an additional benefit.

The local bandwidth difference between "Home Snoop" and "Early Snoop" is not large, but there is a very big difference in remote bandwidth on the systems I have tested (mostly Xeon E5 v3 "Haswell EP"). The attached chart shows results I obtained using the Intel Memory Latency Checker on a 2-socket Xeon E5-2660 v3 system. NOTE that these are REMOTE bandwidth numbers only; the local bandwidth numbers are much, much closer.

[Attached chart: HSW-EP_RemoteBW-vs-SnoopMode_v2.png, remote memory bandwidth vs. snoop mode on Haswell EP]
