
Does clwb/clflush/clflushopt sequentially write data from cache line back to iMC?

spearNeil
Beginner
1,014 Views

After reading the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (Instruction Set Reference), I know that clwb/clflush/clflushopt write a cache line back to memory if the line is modified. But the manual does not describe the order in which the data within the cache line is written back to the iMC (integrated memory controller). On Intel 64, the cache line size is typically 64 bytes, while the data bus is only 8 bytes wide. So when clwb/clflush/clflushopt writes data back to the iMC, there must be some order. Is there any document that describes this for these instructions?

2 Replies
spearNeil
Beginner
939 Views

 

After reading the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (Instruction Set Reference), I know that clwb/clflush/clflushopt write a cache line back to memory if the line is modified. But the manual does not describe the order in which the data within the cache line is written back to the iMC (integrated memory controller). On Intel 64, the cache line size is typically 64 bytes, while the data bus is only 8 bytes wide. So when clwb/clflush/clflushopt writes data back to the iMC, there must be some order. In addition, I think this order is sequential, from low address to high address. As far as I know, only two hardware components cause out-of-order memory accesses on x86: the store buffer and the iMC. Moreover, a figure in the Intel 64 and IA-32 Architectures Software Developer's Manual (Volume 3) seems to support my idea. It shows no other hardware between the L3 cache and the iMC. So, with high probability, clwb/clflush/clflushopt write data back to the iMC sequentially.

[Attached image: 1.png]

Is my idea right? Or is there a more detailed document about these instructions?

McCalpinJohn
Honored Contributor III
915 Views

Are you asking about the order of bytes delivered on the DRAM interface within a single cache line, or the order of cache lines being delivered to the DRAM?

 

According to JEDEC Standard No. 79-4A, Table 18, 64-byte burst writes are always delivered to the DRAM in sequential order, from transfer 0 to transfer 7 of the burst (each transfer carries 8 bytes on a 64-bit data bus). This would only be visible if you had a logic analyzer on the memory interface.
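To make the write case concrete, here is a minimal Python sketch of how one 64-byte line maps onto an 8-transfer burst, assuming the 8-byte data bus mentioned in the question. The name `write_burst_beats` is made up for illustration, and the sketch models only the ordering, not any real hardware interface.

```python
LINE_BYTES = 64  # cache line size assumed in the question
BUS_BYTES = 8    # 64-bit data bus: 8 bytes move per transfer


def write_burst_beats(line: bytes) -> list:
    """Split one cache line into the 8-byte transfers of a burst write.

    Transfers are returned lowest offset first, which is the sequential
    0..7 order described above for writes.
    """
    if len(line) != LINE_BYTES:
        raise ValueError("expected exactly one 64-byte cache line")
    return [line[off:off + BUS_BYTES] for off in range(0, LINE_BYTES, BUS_BYTES)]


line = bytes(range(64))          # byte i of the line holds the value i
beats = write_burst_beats(line)
print(len(beats))                # 8
print(beats[0][0], beats[7][-1]) # 0 63
```

Concatenating the eight transfers in order gives back the original line, which is all "sequential" means here.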

 

(Aside: For reads, the order of delivery depends on the low-order bits of the column address, but that "latency optimization" has not made sense to me for a long time. If you want the last transfer (number 7), for example, receiving it first instead of last saves a maximum of 7 transfer cycles of latency: that is a whopping 2.2 ns for DDR4/3200. In many designs there will be no benefit at all because the entire cache line has to be buffered before sending it from the memory controller over the fabric.)
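The arithmetic in the aside can be checked with a short Python sketch. The function names are hypothetical. The read ordering shown is the "sequential" burst type as I understand it (wrap within the requested group of four transfers, then the other group); treat that as an assumption and check it against the burst-order table in the JEDEC standard before relying on it.

```python
def read_burst_order(start: int) -> list:
    """Order in which the 8 transfers of a read burst come back.

    `start` is the transfer selected by the three low-order column bits.
    Assumed ordering: requested transfer first, wrapping inside its group
    of four, followed by the other group of four with the same rotation.
    """
    if not 0 <= start <= 7:
        raise ValueError("start must be in 0..7")
    half = start & 4
    first = [half | ((start + i) & 3) for i in range(4)]
    second = [(half ^ 4) | ((start + i) & 3) for i in range(4)]
    return first + second


def max_saving_ns(data_rate_mts: float, transfers_skipped: int = 7) -> float:
    """Latency saved by receiving a transfer `transfers_skipped` slots earlier."""
    ns_per_transfer = 1e3 / data_rate_mts  # e.g. 3200 MT/s -> 0.3125 ns
    return transfers_skipped * ns_per_transfer


print(read_burst_order(0))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(read_burst_order(7))  # [7, 4, 5, 6, 3, 0, 1, 2]
print(max_saving_ns(3200))  # 2.1875
```

At 3200 MT/s each transfer takes 0.3125 ns, so skipping seven of them saves about 2.2 ns at best, matching the figure quoted above.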

 

For flushes on multiple cache lines there should be no ordering guarantees at the memory controller(s). (What does ordering even mean across multiple memory controllers and multiple DRAM channels per memory controller?) For flushes that operate on cache lines in the same 4KiB page that map to the same memory controller and DRAM channel, I would expect the results to show up most of the time in program order, but even that special case could easily have exceptions.
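As a rough aid for reasoning about the multi-line case, the Python sketch below lists which 64-byte lines a flush loop over a byte range would touch and whether they all fall in one 4 KiB page. The helper names are hypothetical and addresses are treated as plain integers. The sketch says nothing about which memory controller or DRAM channel a line maps to, since that mapping is platform-specific and not described in this thread.

```python
LINE = 64    # cache line size in bytes
PAGE = 4096  # 4 KiB page size in bytes


def lines_to_flush(addr: int, size: int) -> list:
    """Start addresses of every cache line overlapping [addr, addr + size)."""
    if size <= 0:
        return []
    first = addr & ~(LINE - 1)               # round down to a line boundary
    last = (addr + size - 1) & ~(LINE - 1)   # line holding the final byte
    return list(range(first, last + LINE, LINE))


def same_4k_page(addrs) -> bool:
    """True if all given addresses lie in the same 4 KiB page."""
    return len({a // PAGE for a in addrs}) <= 1


a = lines_to_flush(0x1030, 100)
print([hex(x) for x in a], same_4k_page(a))  # ['0x1000', '0x1040', '0x1080'] True

b = lines_to_flush(0x1FF0, 32)
print([hex(x) for x in b], same_4k_page(b))  # ['0x1fc0', '0x2000'] False
```

Only the first range stays within one page; the second straddles a page boundary, so even the "same page" expectation above would not apply to it.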
