Hi there,
A memory location, say X, is cached and modified. I then want to read the original value of X directly from RAM without disturbing the data in the cache. Can I achieve this by changing the memory type of X to uncacheable (UC) by setting an MTRR? If not, is it possible at all?
Thanks!
Hi Sergey,
Thank you very much!
If I can get a memory region outside the control of the VM manager, how can I set the memory attribute of this region? I want to load some data from RAM directly into registers while leaving the corresponding modified cache lines untouched.
Le Guan
This operation would require either changing the MTRRs without flushing the caches or having different MTRR values on different cores. Neither is a supported operation, so it might work and it might not. There is also a decent chance of hanging the system while experimenting.
If you have a programmable PCIe device, you might be able to perform a DMA load with the "snoop not required" bit set, but there is an excellent chance that the hardware will ignore you if the MTRR for the region is WB and do the snoop anyway.
Hello Le,
It seems like you are asking whether you can have the same physical region of memory defined as both WB and uncacheable. Sure, you can do this. And you can probably read and modify some data through the WB-defined region and then directly access the uncacheable version of the same memory. But your system will probably crash due to cache coherency problems.
Maybe it would be more useful to explain what you are trying to do.
Pat
Hi all,
I have programmed different memory types on two cores using MTRRs, but memory coherency is still maintained, as Mr. McCalpin said. That is, on the core with the UC type, I always get the modified data from the cache of the core with the WB type, not the copy in RAM. I guess this is due to the Self Snoop feature reported by CPUID. The following is from the Intel developer's manual:
Self Snoop. The processor supports the management of conflicting memory types by performing a snoop of its own cache structure for transactions issued to the bus.
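For anyone who wants to check this flag on their own machine, here is a minimal sketch. It assumes the standard encoding (Self Snoop is bit 27 of EDX for CPUID leaf 01H) and that Linux exposes the same flag as `ss` in the `flags` line of `/proc/cpuinfo`. The function names are made up for illustration.

```python
SS_BIT = 27  # CPUID.01H:EDX bit 27 is the Self Snoop (SS) feature flag


def self_snoop_from_edx(edx: int) -> bool:
    """Return True if the SS bit is set in a raw CPUID.01H EDX value."""
    return bool((edx >> SS_BIT) & 1)


def self_snoop_from_cpuinfo(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line of /proc/cpuinfo-style text lists 'ss'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Compare whole tokens so that 'sse' or 'ssse3' do not match.
            return "ss" in value.split()
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("self snoop reported:", self_snoop_from_cpuinfo(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

The flag only says that the processor snoops its own caches for conflicting memory types. It does not by itself turn aliased memory types into a supported configuration.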
Does this imply that caches are always consistent?
Pat, I am doing this just out of curiosity; I want to dig into the cache details of Intel CPUs.
Mr. McCalpin, it seems that it is the CPU that enforces cache coherency. Do you think a PCIe agent and the memory controller would make any difference?
Le Guan
Hello Le,
It doesn't sound right to me that an uncached read would return the in-cache value. Maybe the modified cached value has already been written back to memory? In any case, this is not an area of my expertise, so I could be wrong... it has been almost 20 years since I messed with the same physical memory defined with both uncached and writeback attributes... and it seems like it always, eventually, crashed the CPU.
Pat
Hello Pat,
Here is another experiment using only one core. Maybe this one is more convincing evidence that the modified cached value has not been written back.
First, a variable X is assigned 0 and then flushed using clflush. Next, X is set to 1. At this point, X should be in the L1 data cache in the Modified state with value 1. Immediately afterwards, I modify the current core's MTRR to set the physical memory location of X to UC. Finally, X is read, and the result is still 1. That is to say, a UC memory access does not bypass the cache subsystem.
I'm quite confident about the MTRR setting because I verified it with memory access timing.
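Besides timing, the MTRR setting can be cross-checked by decoding the raw register values. The sketch below is only an illustration: the function name, the 36-bit physical address width, and the example values are assumptions. It applies the documented variable-range rules: a range matches when `(addr & mask) == (base & mask)`, UC wins any overlap, and WT wins over WB. Fixed-range MTRRs for the first 1 MiB are ignored.

```python
MTRR_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB"}
VALID_BIT = 1 << 11  # V flag in IA32_MTRR_PHYSMASKn


def mtrr_type(addr, var_ranges, default_type=0, max_phys_bits=36):
    """Memory type the variable-range MTRRs assign to physical address addr.

    var_ranges: iterable of (physbase, physmask) raw MSR values.
    default_type: low byte of IA32_MTRR_DEF_TYPE (0 means UC).
    max_phys_bits: assumed physical address width of the CPU.
    """
    # Base and mask address fields occupy bits 12..max_phys_bits-1.
    addr_field = ((1 << max_phys_bits) - 1) & ~0xFFF
    hits = set()
    for physbase, physmask in var_ranges:
        if not physmask & VALID_BIT:
            continue  # this register pair is disabled
        mask = physmask & addr_field
        base = physbase & addr_field
        if (addr & mask) == (base & mask):
            hits.add(physbase & 0xFF)  # type field is bits 7:0
    if not hits:
        return MTRR_TYPES[default_type]
    if 0 in hits:  # UC takes precedence over every other type
        return "UC"
    if hits == {4, 6}:  # WT takes precedence over WB
        return "WT"
    if len(hits) == 1:
        return MTRR_TYPES[hits.pop()]
    return "undefined"  # other overlaps are architecturally undefined


# Hypothetical setup: 4 GiB of WB with a 1 MiB UC window at 0x8000_0000.
example_ranges = [
    (0x80000000 | 0, 0xFFFF00000 | VALID_BIT),
    (0x00000000 | 6, 0xF00000000 | VALID_BIT),
]
print(mtrr_type(0x80000040, example_ranges))
print(mtrr_type(0x80100000, example_ranges))
```

With these example values, the first address falls inside the UC window and the second one falls only in the WB range. This only tells you what the registers say; as the experiment above shows, it does not tell you whether a UC access will actually skip the cache.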
We (way, way back) had memory set up so that the same physical memory was defined as both uncacheable and writeback at the same time. The default was to load x as writeback. Then, if we added some amount to the address (like 0x4000_0000 on a 32-bit system), we'd get the uncacheable version of the memory. But I didn't set up the MTRRs myself; some OS guys programmed the registers and I just tested it.
Pat
Hello Pat,
Thanks for the information! Sounds interesting! Do you remember which CPU model you used? I can run my code on a similar CPU to see if I get a similar result.
From what you said, it seems that the MTRRs were not what distinguished the two views. Instead, one physical memory region was mapped into two virtual pages, and the two pages were distinguished by their page attributes (PAT).
Le Guan
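To make the page-attribute idea concrete, here is a rough sketch of how a 4 KiB page-table entry selects a memory type through the PAT. It assumes the architectural bit positions (PWT is bit 3, PCD is bit 4, PAT is bit 7) and the power-on value of the IA32_PAT MSR. The function name and the example entries are hypothetical, and an operating system such as Linux may reprogram IA32_PAT, so the real value should be read from the MSR.

```python
PAT_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB", 7: "UC-"}
IA32_PAT_RESET = 0x0007040600070406  # power-on value of the IA32_PAT MSR


def pat_type(pte: int, ia32_pat: int = IA32_PAT_RESET) -> str:
    """Memory type selected by the PAT/PCD/PWT bits of a 4 KiB PTE."""
    pwt = (pte >> 3) & 1
    pcd = (pte >> 4) & 1
    pat = (pte >> 7) & 1  # bit 7 is the PAT bit only in 4 KiB PTEs
    index = (pat << 2) | (pcd << 1) | pwt
    entry = (ia32_pat >> (8 * index)) & 0x7  # each PAT entry is one byte
    return PAT_TYPES.get(entry, "reserved")


# Two hypothetical PTEs that map the same physical frame 0x12345000.
pte_cached = 0x12345000 | 0x3                          # present, writable
pte_uncached = 0x12345000 | 0x3 | (1 << 4) | (1 << 3)  # plus PCD and PWT
print(pat_type(pte_cached), pat_type(pte_uncached))
```

The two entries describe exactly the kind of alias discussed here: one frame reachable through a WB page and a UC page. The effective type also depends on the MTRR covering the frame, which this sketch leaves out.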
Pentium Pro chip... I don't know how they programmed the memory.
I am not surprised that Intel attempts to maintain coherence in the presence of conflicting memory types, but I am pretty sure that this is clearly labelled as an unsupported configuration.
Given the "self snoop" feature described above, I would expect that PCIe transactions with the "snoop not required" bit set would still snoop if the corresponding MTRR was set to WB.
If you really want to be sneaky, you might try the following on a 2-socket system:
- Assume a PCIe device attached to socket 0
- Set up a memory buffer on socket 0
- On socket 0, set the MTRRs for that range to UC
- On socket 1, set the MTRRs for that range to WB
- On socket 1, write a value to a cache line in the memory buffer
- Then perform a DMA read from the PCIe device with the "snoop not required" bit set
This configuration might be sneaky enough to inhibit the snoop from being sent from socket 0 to socket 1, but the whole topic is in the "unsupported" area, so it will likely be difficult to get support from the engineers who know how the system actually works at the lowest levels.
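One prerequisite for the DMA step above can be checked from software: a PCIe function is only permitted to set the No Snoop attribute on its requests if the Enable No Snoop bit (bit 11) is set in its Device Control register, which sits at offset 08h of the PCI Express capability. The sketch below only decodes a raw register value. The function name and the sample values are assumptions, and reading the register from a real device is left out.

```python
ENABLE_RELAXED_ORDERING = 1 << 4  # Device Control bit 4
ENABLE_NO_SNOOP = 1 << 11         # Device Control bit 11


def no_snoop_permitted(device_control: int) -> bool:
    """Return True if the function may set No Snoop on requests it initiates."""
    return bool(device_control & ENABLE_NO_SNOOP)


# Hypothetical raw Device Control values for two devices.
print(no_snoop_permitted(0x2810))
print(no_snoop_permitted(0x2010))
```

Even with the bit set, this only grants the device permission to mark requests as No Snoop. Whether the root complex honors the attribute is exactly the open question in this thread.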
Hello Sergey,
I'm not sure what you mean by 'workaround VM address translation system'. The register settings are valid, but unsupported, settings.
Pat
Hi Mr. Kostrov,
All my experiments were done on a Linux machine.
>> workaround Virtual Memory address Translation subsystem
In fact, I do not understand this either. My understanding is that virtual memory is a global setting (paging is switched on or off by CR0.PG). What you mean by "workaround" is probably something like isolating a memory region from the OS and mapping the virtual addresses within this range directly to the identical physical addresses.
In Linux, I have full control over the virtual memory subsystem, so maybe I do not need to 'workaround VM address translation system'.
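On Linux, the physical address behind a virtual address can be looked up without bypassing translation at all, through `/proc/self/pagemap`. The sketch below assumes the documented entry layout (one 64-bit entry per page, bits 0-54 hold the page frame number, bit 63 is the present flag). The helper names are made up, and recent kernels report a PFN of zero to processes without CAP_SYS_ADMIN.

```python
import struct

PFN_MASK = (1 << 55) - 1  # bits 0-54: page frame number
PRESENT = 1 << 63         # bit 63: page is present in RAM


def pagemap_offset(vaddr: int, page_size: int = 4096) -> int:
    """Byte offset of the pagemap entry that describes vaddr."""
    return (vaddr // page_size) * 8


def physical_address(entry: int, vaddr: int, page_size: int = 4096):
    """Translate vaddr using a raw pagemap entry, or return None if unknown."""
    if not entry & PRESENT:
        return None
    pfn = entry & PFN_MASK
    if pfn == 0:
        # Assumption: a zero PFN means the kernel hid it (no CAP_SYS_ADMIN).
        return None
    return pfn * page_size + vaddr % page_size


def read_pagemap_entry(vaddr: int, path: str = "/proc/self/pagemap",
                       page_size: int = 4096) -> int:
    """Read the raw 64-bit pagemap entry for vaddr (Linux only)."""
    with open(path, "rb") as f:
        f.seek(pagemap_offset(vaddr, page_size))
        return struct.unpack("=Q", f.read(8))[0]
```

The resulting physical address is what would go into an MTRR base register, so a lookup like this is one way to confirm that the range being programmed really covers X.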
Hi Mr. Kostrov,
I looked up the source of the quoted sentence at http://software.intel.com/en-us/forums/topic/279104. I am still not convinced by that configuration, because paging is enabled globally. Unless you disable virtual memory temporarily, the addresses used by the driver must be translated. Do you have any idea how it is implemented?
>> a Virtual Memory translation needs to be bypassed in order to have the direct access to RAM.
What is the reason for that?
