I have a question regarding the "Retire Stalls" hardware-event metric. Here it says:
This metric is defined as a ratio of the number of cycles when no micro-operations are retired to all cycles. In the absence of performance issues, long latency operations, and dependency chains, retire stalls are insignificant. Otherwise, retire stalls result in a performance penalty. On Intel microarchitecture codename Nehalem, this metric is based on precise events that do not suffer from significant skid.
From the definition, I would think the ratio should always be less than 1. However, in my Amplifier XE run, I see this number can be as large as 39 in my application.
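To make that expectation concrete, here is a minimal sketch of the ratio as I read the definition. The function name and the cycle counts are made up for illustration; they are not Amplifier XE output or its internal formula.

```python
def retire_stall_ratio(stall_cycles: int, total_cycles: int) -> float:
    """Ratio of cycles with no micro-operations retired to all cycles.

    Both arguments are hypothetical counts taken over the same interval.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if not 0 <= stall_cycles <= total_cycles:
        # Stall cycles are a subset of all cycles, so they cannot exceed the total.
        raise ValueError("stall_cycles must lie between 0 and total_cycles")
    return stall_cycles / total_cycles


# Illustrative values only: 390,000 stalled cycles out of 1,000,000 cycles.
ratio = retire_stall_ratio(stall_cycles=390_000, total_cycles=1_000_000)
print(f"retire stall ratio = {ratio:.2f}")
```

With both counts taken over the same interval, the result always falls between 0 and 1, which is why a reported value of 39 surprises me.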
Could someone shed some light on this number? Thanks a lot!
Sorry, I don't know which predefined analysis type you used, or whether you created a new analysis type for PMU event-based sampling collection. If you created your own, I think the event RESOURCE_STALLS.ROB_FULL should be used. This event counts the number of cycles in which the number of instructions in the pipeline waiting for retirement reaches the limit the processor can handle, that is, the Re-order Buffer (ROB) is full. This is a resource-stall penalty, measured in cycles.
Please describe your problem in detail, if possible. Thank you!