The following command can be used to read raw hardware performance counter registers for an application:
perf stat -e r01A2,r08A2 <Application>
Event code 0xA2 with event mask 0x01 records resource-related stall cycles,
whereas event mask 0x08 records cycles stalled because no store buffer is available.
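For readers unfamiliar with the raw syntax, here is a minimal sketch of how a specifier such as r01A2 splits into its parts. It assumes the usual Intel layout of the event-select register (event code in bits 0-7, event mask in bits 8-15) and ignores any higher bits; the helper name decode_raw_event is made up for illustration.

```python
from typing import Tuple


def decode_raw_event(spec: str) -> Tuple[int, int]:
    """Split a perf raw event such as 'r01A2' into (event_code, event_mask).

    Assumption: Intel layout, event code in bits 0-7 and mask in bits 8-15.
    """
    if not spec.lower().startswith("r"):
        raise ValueError(f"not a raw event specifier: {spec!r}")
    config = int(spec[1:], 16)          # the hex digits after the leading 'r'
    event_code = config & 0xFF          # low byte
    event_mask = (config >> 8) & 0xFF   # next byte
    return event_code, event_mask


for spec in ("r01A2", "r08A2", "r02A2", "r04A2", "r10A2"):
    code, mask = decode_raw_event(spec)
    print(f"{spec}: event code 0x{code:02X}, event mask 0x{mask:02X}")
```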
My question: earlier Intel processors (the Broadwell family) supported event masks 0x02, 0x04, and 0x10 to measure a few more kinds of stalls, but these masks are not listed as supported on the Skylake family. Nevertheless, on a Skylake-family processor the command
perf stat -e r04A2,r02A2,r10A2 <Application>
still reports some numbers.
I would like to know whether those numbers correspond to the previously supported kinds of stalls.
Unfortunately, there are a lot of possibilities here....
I could probably come up with specific examples of each of these cases from either my analysis of Intel processors or my prior experience in the design teams at SGI, IBM, and AMD, but then I would have to spend too much time thinking about confidentiality issues....
When I see cases like these, I try to develop directed tests that generate a known (or otherwise measurable) amount of activity for the event, and I test related events to see whether the counts with the various Umasks add up to the total count for this event, or to the count for the same activity measured by a different event.
Sometimes interpretation of the Umasks can be done quickly and unambiguously.
Sometimes it can be quickly proven that the undocumented Umasks are broken (at least with respect to the activity for which the Umask used to be associated).
Sometimes the results don't make any sense, and you have to decide whether to keep on looking for a plausible & provable interpretation, or whether you need to move on to more productive work. I often choose poorly in these cases.
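The "do the Umasks add up" check described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it relies on the CSV output of perf stat -x, (count in the first field, event name in the third, written to stderr), it treats r01A2 as the aggregate count purely for illustration, and the helper names and the ./directed_test binary are hypothetical. Stall categories can overlap in a single cycle, so a ratio far from 1.0 is a hint rather than proof that a Umask is broken.

```python
import subprocess
from typing import Dict, List, Optional


def parse_perf_csv(text: str) -> Dict[str, Optional[int]]:
    """Map event name -> count from `perf stat -x,` output.

    Non-numeric counts such as '<not supported>' are stored as None.
    """
    counts: Dict[str, Optional[int]] = {}
    for line in text.splitlines():
        fields = line.split(",")
        if line.startswith("#") or len(fields) < 3:
            continue  # skip comments, blank lines, and unrelated output
        value, event = fields[0], fields[2]
        counts[event] = int(value) if value.isdigit() else None
    return counts


def run_perf(events: List[str], command: List[str]) -> Dict[str, Optional[int]]:
    """Run the command under perf stat and return the parsed counts."""
    result = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", ",".join(events), "--", *command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,  # perf stat writes its counts to stderr
        text=True,
        check=False,
    )
    return parse_perf_csv(result.stderr)


def umask_coverage(counts: Dict[str, Optional[int]],
                   total_event: str,
                   part_events: List[str]) -> Optional[float]:
    """Return sum(parts) / total, or None if any count is missing or total is 0."""
    total = counts.get(total_event)
    parts = [counts.get(e) for e in part_events]
    if not total or any(p is None for p in parts):
        return None
    return sum(parts) / total


# Example use (not run here; ./directed_test is a hypothetical test binary):
#   events = ["r01A2", "r02A2", "r04A2", "r08A2", "r10A2"]
#   counts = run_perf(events, ["./directed_test"])
#   print(umask_coverage(counts, "r01A2", events[1:]))
```

Because the application's own stderr is mixed into the same stream, a more careful version would send perf's output to a file with -o and parse that instead.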