I am using an Intel Xeon machine with CPUID Family 6, Model 45 (0x2D). Is there some way to distinguish between partial and complete cache line reads?
The Intel® 64 and IA-32 Architectures Software Developer’s Manual (Order Number 325384-047US, June 2013), Vol. 3B, page 18-42, states that for offcore event monitoring, if bit 0 of the offcore response MSR (address 0x1A6) is set, the counter counts all requests to partial and complete cache lines.
Does the same apply to the other ways of measuring cache misses, for example the LLC references and LLC misses events in Table 19.1 of the same document?
Currently, to measure the number of bytes fetched, I am doing something like this:
Total number of bytes fetched by the L3 cache = CR * block size
where CR is the value of counter 3 under offcore monitoring, with the request type set to DMND_DATA_RD + PF_LLC_DATA_RD, the response type set to Any, and the snoop field set to SNP_NONE.
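As a rough illustration of the setup described above, the sketch below composes the value one might write to the offcore response MSR for that request/response/snoop combination and turns a counter value into a byte estimate. The bit positions are assumptions based on my reading of the Sandy Bridge MSR_OFFCORE_RSP_x tables and should be checked against the SDM before writing the MSR; the 64-byte line size is the cache line size on this processor family. Note that the byte estimate treats every counted request as a full line, so any partial-line reads would be slightly overcounted.

```python
# Bit positions are ASSUMPTIONS taken from the Sandy Bridge MSR_OFFCORE_RSP_x
# layout; verify them against the SDM before programming MSR 0x1A6.
DMND_DATA_RD = 1 << 0     # request type: demand data reads (full and partial lines)
PF_LLC_DATA_RD = 1 << 7   # request type: L2 prefetcher data reads into the LLC
RESP_ANY = 1 << 16        # response type: any response
SNP_NONE = 1 << 31        # snoop info: no snoop details

CACHE_LINE_BYTES = 64     # cache line size on this processor family


def offcore_rsp_mask() -> int:
    """Combine the request, response and snoop bits into one MSR value."""
    return DMND_DATA_RD | PF_LLC_DATA_RD | RESP_ANY | SNP_NONE


def bytes_fetched(count: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Estimate bytes moved, assuming each counted request is a full line."""
    return count * line_bytes


if __name__ == "__main__":
    print(hex(offcore_rsp_mask()))   # value to write to the offcore response MSR
    print(bytes_fetched(1_000_000))  # bytes for one million counted requests
```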
Partial cache line reads should be extremely rare. In most system configurations they cannot even be generated as the result of user-mode instructions. They can be generated by kernel or driver code that executes loads to uncached memory-mapped IO space, but these should be infrequent.
For the Xeon E5-2600 (06_2D) processor family it would probably be more accurate to use the performance counters in the uncore to measure traffic between the memory and L3, though you do lose the connection between the traffic and the core that requested the traffic in that case.
Thanks John. I had no way to know that partial cache line reads are rare and can be generated by kernel or driver code. This information solves much of my problem now.
Now that we have installed a newer kernel, I will be using uncore events for my experiments soon.
Thanks a ton!