Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.
1710 Discussions

Is it possible to access RAM directly while the memory is cached?

le_g_1
New Contributor I
1,080 Views

Hi there,

      A memory location, say X, is cached and modified. I then want to read the original value of X directly from RAM without disturbing the data in the cache. Can I achieve this by changing the memory type of X to Uncacheable (UC) by programming the MTRRs? If not, is it possible at all?

Thanks!

0 Kudos
17 Replies
SergeyKostrov
Valued Contributor II
1,080 Views
This is impossible in an Operating System ( OS ) that uses a Virtual Memory ( VM ) manager. On Windows, however, you could use a special driver to reserve some memory outside the region controlled by the regular VM manager. Take a look at the Windows DDK for an example.
0 Kudos
le_g_1
New Contributor I
1,080 Views

Hi Sergey,

   Thank you very much!

   If I am able to get a memory region outside the control of the VM manager, how can I set the attributes of this region? I want to load some RAM contents directly into registers while leaving the corresponding modified cache lines untouched.

Le Guan

0 Kudos
McCalpinJohn
Honored Contributor III
1,080 Views

This operation would require either changing the MTRRs without flushing the caches or having different MTRR values on different cores. Neither is a supported operation, so it might work and it might not. There is also a decent chance of hanging the system while experimenting.

If you have a programmable PCIe device, you might be able to perform a DMA load with the "snoop not required" bit set, but there is an excellent chance that the hardware will ignore you if the MTRR for the region is WB and do the snoop anyway.

0 Kudos
Patrick_F_Intel1
Employee
1,080 Views

Hello Le,

It seems like you are asking if you can have the same physical region of memory defined as WB and uncacheable. Sure you can do this. And you can probably read and modify some data from the WB defined region and then directly access the uncacheable version of the same memory. But your system will probably crash due to cache coherency problems.

Maybe it would be more useful to explain what you are trying to do.

Pat

0 Kudos
le_g_1
New Contributor I
1,080 Views

Hi all,

I have programmed different memory types on two cores using MTRRs, but memory coherency is still maintained, as Mr. McCalpin says. That is, on the core with the UC type, I always get the modified data from the cache of the core with the WB type, not the copy in RAM. I guess this is due to the Self Snoop feature reported by CPUID. Below is an excerpt from the Intel developer's manual.

      Self Snoop. The processor supports the management of conflicting memory types by performing a snoop of its own cache structure for transactions issued to the bus.

     Does this imply that caches are always consistent?
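
For readers who want to check the Self Snoop flag on their own machine, here is a minimal sketch. It assumes a GCC or Clang toolchain (for `<cpuid.h>` and `__get_cpuid`); the helper names `edx_has_self_snoop` and `cpu_has_self_snoop` are made up for this illustration. The flag is the SS bit, bit 27 of EDX returned by CPUID leaf 1. A set bit only says that the processor advertises the feature; it does not by itself answer the question above.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* CPUID leaf 1, EDX bit 27 is the SS (Self Snoop) feature flag. */
#define CPUID_1_EDX_SS_BIT 27u

/* Illustrative helper: decode the Self Snoop flag from a leaf-1 EDX value. */
static int edx_has_self_snoop(uint32_t edx)
{
    return (int)((edx >> CPUID_1_EDX_SS_BIT) & 1u);
}

/* Illustrative helper: returns 1 if the running CPU reports Self Snoop,
 * 0 if it does not, and -1 if CPUID leaf 1 is unavailable or this is
 * not an x86 build. */
static int cpu_has_self_snoop(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return edx_has_self_snoop(edx);
#else
    return -1;
#endif
}
```

On Linux the same flag shows up as `ss` in the flags line of /proc/cpuinfo.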

      Pat, I am asking just out of curiosity: I want to dig into the cache details of Intel CPUs.

      Mr. McCalpin, it seems that it is the CPU that enforces cache coherency. Do you think a PCIe agent and the memory controller would make any difference?

Le Guan

0 Kudos
Patrick_F_Intel1
Employee
1,080 Views

Hello Le,

It doesn't sound right to me that an uncached read would return the in-cache value. Maybe the modified cached value has already been written back to memory? In any case, this is not an area of my expertise so I could be wrong... it has been almost 20 years since I messed with the same physical memory defined with uncached and writeback attributes... and it seems like it always, eventually, crashed the CPU.

Pat

0 Kudos
le_g_1
New Contributor I
1,080 Views

Hello Pat,

Here is another experiment using only one core. Maybe this one is more convincing evidence that the modified cached value has not been written back.

First, a variable, say X, is assigned 0. Then X is flushed using clflush. Next, X is changed to 1. At this point, X should be in the L1 data cache in the Modified state with value 1. Immediately afterwards, I modify the current core's MTRR to set the physical memory location of X to UC. Finally, X is read and the result is still 1. That is to say, the UC memory access does not bypass the cache subsystem.

   I'm quite confident about the MTRR setting because I verified it with memory access timing.
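
The sequence of steps in this experiment can be sketched roughly as follows. This is only an outline under stated assumptions, not the code that was actually run: `hypothetical_set_uc` is a made-up placeholder that does nothing here, because reprogramming a single core's MTRRs needs ring-0 MSR writes that cannot be shown in a user-space example. With the placeholder in place the function trivially returns 1; the interesting observation is that the real experiment also reads back 1.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence */
#endif

/* X gets its own 64-byte-aligned slot so it sits at the start of a cache line. */
static _Alignas(64) volatile int X;

/* HYPOTHETICAL placeholder: in the real experiment this step rewrites the
 * current core's MTRRs in kernel mode so that X's physical address is UC.
 * Here it intentionally does nothing. */
static void hypothetical_set_uc(const volatile void *addr)
{
    (void)addr;
}

/* Walks through the steps of the experiment and returns the final value read. */
static int run_experiment(void)
{
    assert(((uintptr_t)&X % 64u) == 0);

    X = 0;                              /* step 1: X = 0 */
#if defined(__SSE2__)
    _mm_mfence();
    _mm_clflush((const void *)&X);      /* step 2: flush the line to RAM */
    _mm_mfence();
#endif
    X = 1;                              /* step 3: line is now modified in cache */
    hypothetical_set_uc(&X);            /* step 4: switch the memory type (stub) */
    return X;                           /* step 5: read X back */
}
```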

0 Kudos
Patrick_F_Intel1
Employee
1,080 Views

Way, way back, we had the memory set up so that the same physical memory was defined as both uncacheable and writeback at the same time. The default was to load x with writeback. Then, if we added some amount to the address (like 0x4000_0000 on a 32-bit system), we'd get the uncacheable version of the memory. But I didn't set up the MTRRs; some OS guys programmed the registers and I just tested it.

Pat

0 Kudos
le_g_1
New Contributor I
1,080 Views

Hello Pat,

Thanks for the information! Sounds interesting! Do you remember which CPU model you used? I could run my code on a similar CPU to see whether I get a similar result.

From what you said, it seems that you did not set the MTRRs. Instead, you mapped one physical memory region into two virtual pages and distinguished the two pages by setting page attributes (PAT).
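
To make the PAT idea concrete, here is a small sketch of how a 4 KB page-table entry selects a memory type through the PAT. It illustrates the mechanism guessed at above, not how the original system was configured. The helper names are made up for this example; the bit positions (PWT = bit 3, PCD = bit 4, PAT = bit 7) and the power-on value of the IA32_PAT MSR follow the architecture manual. An operating system may reprogram IA32_PAT, so the reset value is an assumption about the running system.

```c
#include <assert.h>
#include <stdint.h>

/* Memory-type encodings used in IA32_PAT entries. */
enum pat_mem_type {
    PAT_UC = 0, PAT_WC = 1, PAT_WT = 4, PAT_WP = 5, PAT_WB = 6, PAT_UC_MINUS = 7
};

/* Cache-control bits of a 4 KB page-table entry. */
#define PTE_PWT (1ull << 3)
#define PTE_PCD (1ull << 4)
#define PTE_PAT (1ull << 7)

/* Power-on value of IA32_PAT: WB, WT, UC-, UC, repeated twice.
 * ASSUMPTION: the OS has not changed it. */
#define IA32_PAT_RESET 0x0007040600070406ull

/* Illustrative helper: PAT index = PAT*4 + PCD*2 + PWT. */
static unsigned pte_pat_index(uint64_t pte)
{
    return ((pte & PTE_PAT) ? 4u : 0u) |
           ((pte & PTE_PCD) ? 2u : 0u) |
           ((pte & PTE_PWT) ? 1u : 0u);
}

/* Illustrative helper: memory type stored in entry `index` of a PAT MSR value. */
static unsigned pat_entry_type(uint64_t pat_msr, unsigned index)
{
    assert(index < 8u);
    return (unsigned)((pat_msr >> (index * 8u)) & 0x7u);
}
```

With the reset value, a PTE with none of the three bits set selects entry 0 (WB), while a second PTE for the same physical page with PCD and PWT set selects entry 3 (UC). The effective type also depends on the MTRR covering the page.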

Le Guan

0 Kudos
Patrick_F_Intel1
Employee
1,080 Views

Pentium Pro chip... I don't know how they programmed the memory.

0 Kudos
McCalpinJohn
Honored Contributor III
1,080 Views

I am not surprised that Intel attempts to maintain coherence in the presence of conflicting memory types, but I am pretty sure that this is clearly labelled as an unsupported configuration.

Given the "self snoop" feature described above, I would expect that PCIe transactions with the "snoop not required" bit set would still snoop if the corresponding MTRR was set to WB.

If you really want to be sneaky, you might try the following on a 2-socket system:

  1. Assume a PCIe device attached to socket 0
  2. Set up a memory buffer on socket 0
  3. On socket 0, set the MTRRs for that range to UC
  4. On socket 1, set the MTRRs for that range to WB
  5. On socket 1, write a value to a cache line in the memory buffer
  6. Then perform a DMA read from the PCIe device with the "snoop not required" bit set

This configuration might be sneaky enough to inhibit the snoop from being sent from socket 0 to socket 1, but the whole topic is in the "unsupported" area, so it will likely be difficult to get support from the engineers who know how the system actually works at the lowest levels.

0 Kudos
SergeyKostrov
Valued Contributor II
1,080 Views
>>... If I am able to get a memory region outside the control of VM manager, how can I set the attribute of this region?..

Which operating system do you use? Thanks in advance.

To all the rest who responded: does it mean that all these register manipulations could work around the Virtual Memory address translation subsystem?
0 Kudos
Patrick_F_Intel1
Employee
1,080 Views

Hello Sergey,

I'm not sure what you mean by 'workaround VM address translation system'. The register settings are valid but unsupported settings.

Pat

0 Kudos
le_g_1
New Contributor I
1,080 Views

Hi Mr. Kostrov,

All my experiments were done on a Linux machine.
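
For context, Linux exposes a text interface for MTRRs at /proc/mtrr, and the sketch below formats a request line in the style given in the kernel documentation. The helper names are made up for this example, and the code deliberately stops short of writing to the file. As far as I know, a request made through /proc/mtrr is applied to all CPUs and flushes the caches, so it would not reproduce the per-core experiments in this thread; those need direct MSR writes from kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative helper: checks `type` against the type names that
 * /proc/mtrr uses (note the kernel's spelling "uncachable"). */
static int is_known_mtrr_type(const char *type)
{
    static const char *const known[] = {
        "uncachable", "write-combining", "write-through",
        "write-protect", "write-back"
    };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(type, known[i]) == 0)
            return 1;
    return 0;
}

/* Illustrative helper: formats one /proc/mtrr request line into buf.
 * Returns the line length, or -1 on an unknown type or a too-small buffer. */
static int format_mtrr_request(char *buf, size_t len,
                               unsigned long long base,
                               unsigned long long size,
                               const char *type)
{
    if (!is_known_mtrr_type(type))
        return -1;
    int n = snprintf(buf, len, "base=0x%llx size=0x%llx type=%s\n",
                     base, size, type);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting line would be written to /proc/mtrr as root; the base and size must satisfy the hardware's alignment rules.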

>> workaround Virtual Memory address Translation subsystem

In fact, I do not understand this either. My understanding is that virtual memory is a global configuration (CR0.PG = 1). What you mean by "workaround" should be something like isolating a memory region from the OS and mapping the virtual addresses within this range directly to the identical physical addresses.

In Linux, I have full control over the virtual memory subsystem, so maybe I do not need to 'workaround VM address translation system'.

0 Kudos
SergeyKostrov
Valued Contributor II
1,080 Views
Hi everybody,

>>...What you mean by "workaround " should be something like isolating a memory region from OSs...

Yes.

>>...and mapping the virtual memory addresses within this range directly as their physical ones...

No. The MEMIO example in the Windows DDK could give you some additional information; please take a look ( sorry for a Windows example ). Also, some time ago I created a thread:

Forum Topic: Measuring Memory Bandwidth of Non-NT ( Non-Virtual ) memory
Web-link: http://software.intel.com/en-us/forums/topic/279104

Please take a look. I understand that these details do not help you much, but my point of view is the same: even on a Linux platform, Virtual Memory translation needs to be bypassed in order to have direct access to RAM.
0 Kudos
le_g_1
New Contributor I
1,080 Views

Hi Mr. Kostrov,

I looked up the origin of the quoted sentence in http://software.intel.com/en-us/forums/topic/279104. I am still not convinced by that configuration, as paging is enabled globally. Unless you disable Virtual Memory temporarily, the addresses used by the driver must be translated. Do you have any idea how it is implemented?

>> a Virtual Memory translation needs to be bypassed in order to have the direct access to RAM.

What is the reason for that?

0 Kudos