Impact of h/w counter bugs on general exploration results
I've been testing the top-down model offered by VTune's General Exploration analysis on Haswell-EP (E5-2670), and I'm especially interested in the L3 Bound breakdown. However, I noticed that many of the counters it uses appear to be buggy. Errata HSM26 and HSM30 (http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-mobile-specification-update.pdf) are worrisome: HSM30 appears to apply only in SMT mode, but HSM26 states that counters may undercount by as much as 40%.
I didn't see any warnings about these issues in VTune. How should I interpret metrics such as "Contested Accesses", "Data Sharing", and "L3 Latency" that may be affected by these errata?
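One rough way I'd bound this, as a sketch only: assume the affected event is observed as `true * (1 - u)` with `u` anywhere in [0, 0.40], and that the metric scales linearly with that event while its denominator (e.g. unhalted cycles) is accurate. Both of those are my assumptions, not how VTune actually computes any specific metric, and the helper names below are hypothetical.

```python
from __future__ import annotations


def true_count_range(observed: float, max_undercount: float = 0.40) -> tuple[float, float]:
    """Plausible range of the true count, assuming observed = true * (1 - u), 0 <= u <= max_undercount."""
    if not 0.0 <= max_undercount < 1.0:
        raise ValueError("max_undercount must be in [0, 1)")
    # u = 0 gives the observed value itself; u = max_undercount gives the largest true value.
    return observed, observed / (1.0 - max_undercount)


def ratio_metric_range(numerator: float, denominator: float, max_undercount: float = 0.40) -> tuple[float, float]:
    """Bounds on a ratio metric when only the numerator event can undercount.

    Assumption: the denominator (e.g. a cycle count) is not affected by the erratum.
    """
    lo, hi = true_count_range(numerator, max_undercount)
    return lo / denominator, hi / denominator


if __name__ == "__main__":
    # Hypothetical numbers: a metric that reads 12% could really be as high as about 20%.
    print(ratio_metric_range(12.0, 100.0))
```

Under this model the reported value is a lower bound, and the true value can be up to 1/0.6 (about 1.67x) higher. If the affected event sits in the denominator instead, the direction flips.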
The main errata I found that apply to top-down analysis:
HSM26: Certain Local Memory Read / Load Retired PerfMon Events May Undercount (undercounts of up to 40% have been observed; this seems to be the biggest issue)
HSM30: Performance Monitor Counters May Produce Incorrect Results (seems less serious than HSM26, and can be worked around by disabling SMT in the BIOS)
HSM31: Performance Monitor UOPS_EXECUTED Event May Undercount (I'm not too concerned about this one, as it seems rare)
How much should we worry about each one? Running in single-thread (ST) mode also improves multiplexing reliability, so that seems like a smart move. But I'm not sure what to make of HSM26.
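To confirm that SMT is really off after changing the BIOS setting, a minimal Linux-only sketch, assuming the standard sysfs topology files under `/sys/devices/system/cpu`, is to check whether any CPU lists more than one thread sibling. `parse_cpu_list` and `smt_siblings_present` are hypothetical helper names; `lscpu`'s "Thread(s) per core" line reports the same information.

```python
from __future__ import annotations

from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Parse a sysfs CPU list such as "0,24" or "0-1" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def smt_siblings_present(sysfs_root: str = "/sys/devices/system/cpu") -> bool:
    """Return True if any CPU exposing topology info shares its core with another hardware thread."""
    root = Path(sysfs_root)
    for siblings in root.glob("cpu[0-9]*/topology/thread_siblings_list"):
        if len(parse_cpu_list(siblings.read_text())) > 1:
            return True
    return False


if __name__ == "__main__":
    print("SMT active" if smt_siblings_present() else "one thread per core")
```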