We noticed, in dual-socket setup with *"directory mode" disabled*, some write-backs to DRAM of clean cache lines (i.e. that were only read and never modified by anyone). This "issue" is very present in case all the HW prefetchers are enabled and very rare but still present when they are all disabled.
Does anyone have any idea what causes those clean write-backs and how to prevent them ?
I noticed in "Intel Xeon Processor Scalable Memory Family Uncore Performance Monitoring" 2 IDI opcodes that look relevant: WbEFtoI and WbEFtoE.
Of course, I can come up with more details if needed.
My first guess is that these are lines that the L2 was trying to evict to the L3, but which the L3 decided not to cache. We know the behavior of the caches is dynamic, but we don't have much idea of what controls the behavior, and (as this issue suggests), we only have a coarse idea of the options available to the caches.
In general terms, it does not make any difference whether such a clean eviction is dropped or written back to DRAM, but it is certainly possible that some detail of the implementation of the snoop filters or L3 or memory directories makes writing the line all the way back to memory the preferred option.
It is also possible that this occurs for some types of eviction operations and not for others. For example, clean lines evicted from the L2 due to Snoop Filter Evictions may behave differently than clean lines evicted from the L2 due to loads into the L2.
The processor family has the unusual characteristic that the associativity of the victim cache (11-way L3) is lower than the associativity of the Snoop Filter (12-way) or the L2 caches (16-way). This can lead to pathological conflict behavior (e.g., https://sites.utexas.edu/jdm4372/2019/01/07/sc18-paper-hpl-and-dgemm-performance-variability-on-inte...), which could easily result in unexpected cache transactions (either as a deliberate performance mitigation, or as the unexpected occurrence of a "corner case" transaction that exists for obscure reasons known only to the design team).
Thank you for your answer.
Can you expand on why you think that's a L2 cache line not accepted by the L3 ?
I struggle to understand when a useless write-back to DRAM could be the preferred option and why I observe this behavior only in dual-socket. Maybe that's a corner-case that is only visible in this configuration because of timing. And I don't see the point of having a sort of RFO that marks the line as Modified instead of Exclusive, the gain would be null right ?
"It is also possible that this occurs for some types of eviction operations and not for others. " => I agree with you since evicting those cache lines by software does not raise the issue. And disabling the prefetch only mitigates the issue.
"In general terms, it does not make any difference whether such a clean eviction is dropped or written back to DRAM" => I agree with you that it is harmless for the system coherence, but not for us where our in-DRAM processors changed its content and got overwritten by those clean write-backs
I used perf tool to monitor both WbEFtoI and WbEFtoE opcodes and, though I have seen some, they do not seem to be the cause of the problem we encounter.
I'll definitely take a look at your presentation that I came across already, and I will keep digging this issue. If anything comes to your mind, please don't hesitate