I have searched everywhere for what the difference is, and nobody seems to know. Some descriptions say it's PCIe-related (i.e., DMA traffic from the IIO), but there is already a PCIeItoM or PCIItoM opcode, which suggests that ItoM actually originates from the cores toward the LLC rather than from the LLC toward the cores. This is supported by OFFCORE_REQUESTS.DEMAND_RFO: 'Counts the demand RFO (read for ownership) requests including regular RFOs, locks, ItoM.' What is the difference between an RFO and an ItoM sent by a core? And if it's a matter of a partial vs. a full cache line write, why does the protocol need to distinguish them, and what is the benefit of having separate opcodes?
In the Scalable Memory Family Uncore Performance Monitoring Reference Guide, ItoM shows up in five sections -- each of which provides a small clue.
So this event is not like an RFO because it does not request a copy of the cache line.
It is not exactly like a streaming store because the requesting agent is going to retain the data in a cache (either a processor cache or the specialized IO Directory Cache).
This looks a lot like what any other protocol would use for "upgrade" requests. E.g., a data cache line was loaded in S state and now you want to write to it. You don't need to read the data again, but you do need to invalidate the line in any other caches and make sure that any directories (e.g., Snoop Filter, Memory Directory) track the line as M state.
I don't see any other transactions in Table 3-1 that look like upgrades, but it would be relatively easy to misunderstand what is being presented. Testing these hypotheses is mostly straightforward, but tedious....
Oh, I see now: it's assumed that the full line is going to be written, so the core doesn't need a copy of the data already in the line, and it already has the data if the line is in any other state (S, E, M). A theoretical StoM is the same thing as an RFO, and likewise for EtoM; the only state where ItoM and RFO differ is I, where the LLC doesn't need to send the data to the core for an ItoM. The name emphasizes only the state change. How the core knows the whole line is going to be written by stores, I don't know. Maybe the L1d can merge a run of sequential senior stores from the MOB all at once as it allocates an LFB, even though I thought the RFO is sent immediately upon allocation (with the stores all retiring once the RFO response arrives). I guess there is some further time for stores to arrive in the LFB (during the L2 lookup) before the opcode has to be generated.
There are several cases in which the processor knows that the full line is going to be overwritten, but this can depend a lot on the implementation.