After reading the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (Instruction Set Reference), I understand that clwb/clflush/clflushopt write a cache line back to memory if the line has been modified. However, the manual does not describe the order in which the data within a cache line is written back to the iMC (integrated memory controller). On Intel 64 processors the cache line is typically 64 bytes, but the data bus is only 8 bytes wide, so when clwb/clflush/clflushopt write a line back to the iMC, the data must go out in some order. Is there any documentation that covers this?
After reading the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (Instruction Set Reference), I understand that clwb/clflush/clflushopt write a cache line back to memory if the line has been modified. However, the manual does not describe the order in which the data within a cache line is written back to the iMC (integrated memory controller). On Intel 64 processors the cache line is typically 64 bytes, but the data bus is only 8 bytes wide, so when clwb/clflush/clflushopt write a line back to the iMC, the data must go out in some order. My guess is that this order is sequential, from low address to high address. As far as I know, only two hardware components cause out-of-order memory access on x86: the store buffer and the iMC. A figure in the Intel 64 and IA-32 Architectures Software Developer's Manual (Volume 3) seems to support this: it shows no other hardware between the L3 cache and the iMC. So it seems likely that clwb/clflush/clflushopt write data back to the iMC sequentially.
Is my reasoning right? If not, is there more detailed documentation about these instructions?
Are you asking about the order of bytes delivered on the DRAM interface within a single cache line, or the order of cache lines being delivered to the DRAM?
According to JEDEC Standard No. 79-4A, Table 18, 64-byte burst writes are always delivered to the DRAM in sequential order, from the first 8-byte transfer of the burst (position 0) to the last (position 7). This would only be visible if you had a logic analyzer on the memory interface.
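To make the numbers concrete, here is a minimal sketch of how a 64-byte line divides into eight transfers on an 8-byte data bus, assuming the in-order write burst described above and the simplest case where the byte offset within the line maps directly to the burst position. The names `LINE_BYTES`, `BUS_BYTES`, and `write_burst_beats` are illustrative only and do not come from any Intel or JEDEC document.

```python
LINE_BYTES = 64  # cache line size, as stated in the question
BUS_BYTES = 8    # data bus width in bytes, as stated in the question


def write_burst_beats(line_base):
    """Return (position, first byte address, last byte address) per transfer.

    Assumes the sequential write-burst order (position 0 first, 7 last)
    and a direct offset-to-position mapping, which is a simplification.
    """
    if line_base % LINE_BYTES != 0:
        raise ValueError("line_base must be aligned to the cache line size")
    beats = LINE_BYTES // BUS_BYTES  # 8 transfers per line
    return [
        (i, line_base + i * BUS_BYTES, line_base + (i + 1) * BUS_BYTES - 1)
        for i in range(beats)
    ]


if __name__ == "__main__":
    for position, first, last in write_burst_beats(0x1000):
        print(f"position {position}: bytes {first:#x}..{last:#x}")
```

None of this is observable from software; it only illustrates what a logic analyzer on the DRAM interface would be expected to show under these assumptions.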
(Aside: For reads, the order of delivery depends on the low-order bits of the column address, but that "latency optimization" has not made sense to me for a long time. If you want the last 8-byte piece of the line (position 7 in the burst), for example, receiving it first instead of last saves a maximum of 7 transfer cycles of latency, which is a whopping 2.2 ns for DDR4/3200. In many designs there will be no benefit at all, because the entire cache line has to be buffered before the memory controller sends it over the fabric.)
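The 2.2 ns figure in the aside can be checked with a short calculation. This is only a sketch of the arithmetic; the function name `max_latency_saved_ns` and its parameters are made up for illustration.

```python
def max_latency_saved_ns(transfer_rate_mt_s, beats_per_burst=8):
    """Upper bound on latency saved by delivering the wanted piece first.

    transfer_rate_mt_s is the DRAM data rate in mega-transfers per second
    (3200 for DDR4/3200). The best case skips all but one transfer.
    """
    transfers_skipped = beats_per_burst - 1
    # rate is in 1e6 transfers/s, so one transfer takes 1000 / rate nanoseconds
    ns_per_transfer = 1000.0 / transfer_rate_mt_s
    return transfers_skipped * ns_per_transfer


if __name__ == "__main__":
    print(f"{max_latency_saved_ns(3200):.4f} ns")
```

For DDR4/3200 each transfer takes 0.3125 ns, so skipping seven of them saves at most 2.1875 ns, which rounds to the 2.2 ns quoted above.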
For flushes of multiple cache lines, there should be no ordering guarantees at the memory controller(s). (What does ordering even mean across multiple memory controllers and multiple DRAM channels per memory controller?) For flushes that operate on cache lines in the same 4 KiB page that map to the same memory controller and DRAM channel, I would expect the results to show up in program order most of the time, but even that special case could easily have exceptions.