Software Tuning, Performance Optimization & Platform Monitoring

How does core get data from uncacheable memory?


Hi all,

I am writing a benchmark on my Core i7 Sandy Bridge platform, in which I want to test DRAM using uncacheable memory (by marking a range of pages uncacheable).

As far as I know, DRAM normally bursts data to the upper layers. But if the memory is uncacheable, does DRAM still work in burst mode? If so, is the burst data stored somewhere? Or, for uncacheable memory, is data sent a single word or byte at a time between the core and DRAM?

And, for uncacheable memory, the page table entries can still be buffered in the TLB, right?

Thank you!



3 Replies

Hello Zhang,

As I understand uncacheable behavior, if you do a 4-byte load, you get just those 4 bytes; the data is not cached anywhere between memory and the register. If the memory is instead defined as write-combining (WC), then stores to consecutive addresses in the same cache line are appended to a write-combining buffer, which is sent out when the buffer is full or when the next store is not to a consecutive address.
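To make the write-combining description concrete, here is a toy Python model of a single buffer. It only mirrors the rule as stated above (merge consecutive stores, flush when the line is complete or a store is not consecutive); the class name, the 64-byte size, and the single-buffer assumption are mine, and real hardware may well behave differently.

```python
LINE_SIZE = 64  # assumed buffer size in bytes, one cache line


class WriteCombineBuffer:
    """Hypothetical model of one write-combining buffer, not real hardware."""

    def __init__(self):
        self.start = None      # address of the first buffered byte, None if empty
        self.next_addr = None  # address the next store must hit to be merged
        self.flushes = []      # (start_address, byte_count) per write sent to memory

    def drain(self):
        """Send whatever is currently buffered out to memory."""
        if self.start is not None:
            self.flushes.append((self.start, self.next_addr - self.start))
            self.start = self.next_addr = None

    def store(self, addr, size):
        """Buffer one store; assumes it does not cross a line boundary."""
        line_end = addr - addr % LINE_SIZE + LINE_SIZE
        assert addr + size <= line_end, "store crosses a line boundary"
        if self.start is not None and addr != self.next_addr:
            self.drain()  # not consecutive: flush the old contents first
        if self.start is None:
            self.start = addr
        self.next_addr = addr + size
        if self.next_addr == line_end:
            self.drain()  # reached the end of the line: flush


wc = WriteCombineBuffer()
for addr in range(0, 64, 4):   # sixteen consecutive 4-byte stores
    wc.store(addr, 4)
print(wc.flushes)              # one combined 64-byte write
```

In this model sixteen 4-byte stores leave the buffer as a single 64-byte write, whereas a store to an unrelated address in between would split them into separate, smaller writes.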

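Yes, the translations for those pages can still be held in the TLB, and the page tables themselves can still be cached as long as the memory that holds them is cacheable.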


Black Belt

Most DRAM controllers are set up to generate only 64-Byte reads. This is the standard mode of operation of a DIMM, which is 64 bits (8 Bytes) wide and transfers data in an 8-element "burst", so 64 Bytes is the minimum access. In fact, it is the only access size supported (except for the rarely used "burst chop" function, which allows a read to be interrupted after 4 of the 8 transfer cycles).
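The 64-Byte figure follows directly from the bus width and burst length quoted above. A quick Python check of that arithmetic (the constant and function names are only illustrative):

```python
BUS_WIDTH_BITS = 64  # data width of a standard DIMM, as stated above
BURST_LENGTH = 8     # transfers in a normal burst
BURST_CHOP = 4       # transfers when "burst chop" cuts the burst short


def bytes_per_access(bus_width_bits, transfers):
    """Bytes moved by one DRAM access: bytes per transfer times transfers."""
    return bus_width_bits // 8 * transfers


full_burst = bytes_per_access(BUS_WIDTH_BITS, BURST_LENGTH)
chopped_burst = bytes_per_access(BUS_WIDTH_BITS, BURST_CHOP)
print(full_burst, chopped_burst)  # 64 32
```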

So if you mark a memory region as uncacheable (using either an MTRR or the PTEs), the memory controller will still read 64 Bytes from the DRAM, but will only return the specifically requested bytes to the processor. This will be 1, 2, 4, or 8 bytes in most cases. I recall one Intel document recommending against attempting to use 16-byte reads (128-bit packed SSE loads) for memory-mapped IO, but I don't know if the same restriction applies to uncached DRAM.

It is possible that the DRAM controller could service multiple uncached reads from a buffer holding the 64 Bytes from a single DRAM read, but this would be more an accident of the implementation than a "feature". Intel's PCM (v2.4 or later) supports DRAM bandwidth measurements on the Core i7. Its web page indicates that the DRAM_DATA_READS and DRAM_DATA_WRITES events count all DRAM CAS accesses (which are by definition 64 Bytes each in the absence of "burst chop"), so you should be able to tell whether multiple uncached reads (in the same cache line) are being serviced from a buffer in the memory controller or are going all the way out to the DRAM every time.
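One way to read such a measurement is to compare the counter against the two extremes. The Python sketch below assumes a benchmark that issues aligned, sequential uncached reads over one region; the helper names and the sample numbers are hypothetical, it does not call PCM itself, and a real measurement would also include background traffic.

```python
def expected_cas_reads(n_reads, bytes_per_read, line_size=64):
    """Return (CAS count if every read goes to DRAM, CAS count if each
    64-byte line is fetched once and reused from a buffer)."""
    reads_per_line = line_size // bytes_per_read
    lines_touched = -(-n_reads // reads_per_line)  # ceiling division
    return n_reads, lines_touched


def closer_to(measured, unbuffered, buffered):
    """Say which of the two expectations the measured count is nearer to."""
    if abs(measured - unbuffered) <= abs(measured - buffered):
        return "unbuffered"
    return "buffered"


# Hypothetical run: one million 8-byte uncached loads, read sequentially.
unbuffered, buffered = expected_cas_reads(1_000_000, 8)
print(unbuffered, buffered)                        # 1000000 125000
print(closer_to(990_000, unbuffered, buffered))    # unbuffered
```

If the measured count sits near the first number, only 8 of every 64 Bytes moved from the DRAM were actually used by the core.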

Black Belt

IIRC, uncacheable stores do not go through the L1 cache. Such data is typically non-temporal, like frame buffers or the MMIO regions of hardware devices.