A memory location, say X, is cached and modified. Then I want to access the original value of X directly in RAM without disturbing the data in the cache. Can I achieve this by changing the memory type of X to uncacheable (UC) by setting the MTRRs? If not, is it possible at all?
Thank you very much!
If I am able to get a memory region outside the control of the virtual memory manager, how can I set the attributes of this region? I want to load some RAM directly into registers while keeping the corresponding modified cache lines untouched.
This operation would require either changing the MTRRs without flushing the caches, or having different MTRR values on different cores. Neither is a supported operation, so it may or may not work. There is also a decent chance of hanging the system while experimenting.
If you have a programmable PCIe device, you might be able to perform a DMA load with the "snoop not required" bit set, but there is an excellent chance that the hardware will ignore you if the MTRR for the region is WB and do the snoop anyway.
It seems like you are asking if you can have the same physical region of memory defined as WB and uncacheable. Sure you can do this. And you can probably read and modify some data from the WB defined region and then directly access the uncacheable version of the same memory. But your system will probably crash due to cache coherency problems.
Maybe it would be more useful to explain what you are trying to do.
I have programmed different memory types on two cores using the MTRRs, but memory coherency is still maintained, as Mr. McCalpin says. That is, on the core with the UC type, I always get the modified data from the cache of the core with the WB type, not the copy in RAM. I guess this is due to the Self Snoop feature reported by CPUID. Below is from the Intel developer's manual.
Self Snoop. The processor supports the management of conflicting memory types by performing a snoop of its own cache structure for transactions issued to the bus.
Does this imply that the caches are always consistent?
Pat, just out of curiosity, I want to dig into the cache details of the Intel CPUs.
Mr. McCalpin, it seems that it's the CPU that enforces the cache coherency. Do you think the PCIe agent and the memory controller make any difference?
It doesn't sound right to me that an uncached read would return the in-cache value. Maybe the modified cached value had already been written back to memory? In any case, this is not an area of my expertise, so I could be wrong... it has been almost 20 years since I messed with the same physical memory defined with both uncached and writeback attributes... and it seems like it always, eventually, crashed the CPU.
Here is another experiment using only one core. Maybe this one more convincingly shows that the modified cached value has not been written back.
At first, a variable, say X, is assigned 0. Then X is flushed using clflush. Next, X is set to 1. At this point, X should be in the L1 data cache in the Modified state with value 1. Immediately afterward, I modify the current core's MTRR to mark the physical memory containing X as UC. Finally, X is read, and the result is still 1. That is to say, a UC memory access does not bypass the cache subsystem.
I'm quite confident about the MTRR setting, because I verified its correctness by memory access timing (UC accesses are much slower than cached ones).
We (way, way back) had the same physical memory set up both as uncacheable and writeback at the same time. The default was that loading X went through the writeback mapping. Then, if we added some offset to the address (like 0x4000_0000 on a 32-bit system), we'd get the uncacheable view of the same memory. But I didn't set up the MTRR registers myself; some OS guys programmed them and I just tested it.
Thanks for your information! Sounds interesting! Do you remember which CPU model was used? I could run my code on a similar CPU to see if I get a similar result.
From what you said, it seems that you did not set the MTRRs. Instead, you mapped one physical page at two virtual addresses and distinguished the two mappings by their page attributes (PAT).
I am not surprised that Intel attempts to maintain coherence in the presence of conflicting memory types, but I am pretty sure that this is clearly labelled as an unsupported configuration.
Given the "self snoop" feature described above, I would expect that PCIe transactions with the "snoop not required" bit set would still snoop if the corresponding MTRR was set to WB.
If you really want to be sneaky, you might try the following on a 2-socket system:
This configuration might be sneaky enough to inhibit the snoop from being sent from socket 0 to socket 1, but the whole topic is in the "unsupported" area, so it will likely be difficult to get support from the engineers who know how the system actually works at the lowest levels.
Hi Mr. Kostrov,
All my experiments were done in a Linux machine.
>> workaround Virtual Memory address Translation subsystem
In fact, I do not understand this either. My understanding is that paging is a global configuration (enabled when CR0.PG = 1). What you mean by "workaround" should be something like isolating a memory region from the OS and identity-mapping the virtual addresses within that range to their physical ones.
In Linux, I have full control over the virtual memory subsystem, so maybe I do not need to "work around the VM address translation system".
Hi Mr. Kostrov,
I found the origin of the quoted sentence at http://software.intel.com/en-us/forums/topic/279104. I am still not convinced by that configuration, since paging is enabled globally. Unless you disable virtual memory temporarily, the driver's addresses must be translated. Do you have any idea how it is implemented?
>> a Virtual Memory translation needs to be bypassed in order to have the direct access to RAM.
What is the reason for that?