Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Disabling the Hardware Page Walker

Venkataraman__Madhav

Hi all,

I am researching some improvements to page table walks. Since I cannot change anything in hardware, I would like to implement my ideas in the OS (Linux) and test them out. To do this, I need to disable the hardware page table walker so the OS can receive TLB miss traps and walk the page table. Then, I can implement the page table in a different way and put in instrumentation to measure the number of walks, etc.

My question is - is there a way to disable the hardware page table walker? I googled this topic. I could only find information on disabling the page walker on Itanium.

If there is no way to disable the walker, would it work if I loaded a dummy page directory in CR3 whose entries are all marked "Not Present"? Will that cause any other side-effects?

Appreciate your time and help in answering this question.

Thanks.

Madhavan

HadiBrais

There is no publicly documented way to disable the page table walker on Intel processors.

If there is no way to disable the walker, would it work if I loaded a dummy page directory in CR3 whose entries are all marked "Not Present"? Will that cause any other side-effects?

Paging-structure entries that are marked "Not Present" are not cached in the MMU. A hardware page table walk is still performed to determine that there is no valid translation, and the page fault handler is invoked in that case. But then, when a page fault occurs, you'll have to somehow bring the target translation into the MMU caches. The only way to do this is by causing a page table walk after making the translation valid in the page fault handler. This can be achieved by performing the same memory access at the end of the page fault handler. Once a page table entry is cached in the TLBs, it may get evicted explicitly (by the OS using one of the TLB flushing methods) or implicitly (when all entries in a TLB set are occupied and a new entry needs to be cached). When this occurs, you'll have to mark that translation "Not Present" again. This is possible because the TLBs and the in-memory paging structures are allowed to be incoherent. But the resulting performance would be worse than on a processor that actually doesn't support page walking in hardware.

The bottom line is that you have to use a full system simulator instead.

Venkataraman__Madhav

I am new to the x86 architecture. I did not realize that there is no instruction for the OS to load an entry into the TLB; only the hardware walker can do that.

One alternative could be to keep the page table but mark all leaf entries not present. Then, when the OS receives a page fault for a page, it could enable just that one entry. When a fault comes in for another page, it marks the previous page not present and the current one present. This way, there is at most one present entry in the page table. I would mostly get the behavior I want.

In this experiment, I am not looking for performance. I want to instrument the kernel and measure how many page table accesses are required to translate a VA for different page table structures.
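
To make the measurement concrete, here is a minimal user-space sketch of a software walk over a 4-level radix table (the x86-64 layout for 4 KiB pages, 9 index bits per level) with a counter bumped on every table entry read. It is only a toy model, not kernel code: `table_t`, `map_page`, `walk` and `walk_accesses` are names made up for illustration, and the "tables" are ordinary heap allocations rather than physical frames.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES 512u      /* 9 index bits per level */
#define PTE_P   0x1ull    /* Present bit (bit 0) */

/* One table of the toy hierarchy. */
typedef struct { uint64_t e[ENTRIES]; } table_t;

/* Hypothetical instrumentation counter: one increment per entry read. */
unsigned long walk_accesses;

/* Index into the table at `level` (3 = top level, 0 = leaf level). */
static unsigned idx(uint64_t va, int level)
{
    return (unsigned)((va >> (12 + 9 * level)) & 0x1ff);
}

/* Install va -> frame, allocating intermediate tables on demand. */
void map_page(table_t *root, uint64_t va, uint64_t frame)
{
    table_t *t = root;
    for (int level = 3; level > 0; level--) {
        uint64_t *e = &t->e[idx(va, level)];
        if (!(*e & PTE_P)) {
            table_t *next = calloc(1, sizeof *next);
            assert(next != NULL);
            /* Toy encoding: heap pointer plus Present bit. */
            *e = (uint64_t)(uintptr_t)next | PTE_P;
        }
        t = (table_t *)(uintptr_t)(*e & ~PTE_P);
    }
    t->e[idx(va, 0)] = (frame << 12) | PTE_P;
}

/* Software walk. Returns 1 and stores the frame number on success,
 * or 0 as soon as a not-present entry is read (a "page fault"). */
int walk(const table_t *root, uint64_t va, uint64_t *frame)
{
    const table_t *t = root;
    for (int level = 3; level >= 0; level--) {
        uint64_t e = t->e[idx(va, level)];
        walk_accesses++;                 /* count this table access */
        if (!(e & PTE_P))
            return 0;
        if (level == 0) {
            *frame = e >> 12;
            return 1;
        }
        t = (const table_t *)(uintptr_t)(e & ~PTE_P);
    }
    return 0;
}
```

A successful translation of a mapped 4 KiB page costs four entry reads in this layout, while a walk that hits a not-present entry stops early. Swapping in a different structure (a hashed table, say) and keeping the same counter would give comparable numbers.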

Do you see any problems with the above idea?

Thanks for your input.

Madhavan

HadiBrais

If you don't care about modeling performance, then I think the method of marking pages as "Not Present" should work fine.

Note that you can actually capture every single demand hardware page walk for any translation when it occurs as follows:

  1. Initially, mark all entries as "Not Present." You'll have to maintain a database of entries that would have been "Present" otherwise.
  2. When a page fault occurs, check your database first to see whether the entry should have been present. If it should not, invoke the normal page fault handler of the OS. Either way, if the result is a valid translation (i.e., not an error), execute a memory access that requires that translation (such as a MOV from a virtual address in that page) in the page fault handler itself so that the translation gets cached in the TLBs. Finally, before the page fault handler returns, mark that entry as "Not Present." This makes the in-memory entry and the TLB entry incoherent, but that's OK. When the page fault handler returns to the faulting instruction, a hardware page walk will not occur because the translation can be found in the TLBs. Hence, the faulting instruction will be reexecuted successfully. Marking the entry as "Not Present" before returning from the handler takes care of implicit evictions from the TLBs: the next miss on that page faults again.
  3. When the OS wants to check for any reason whether a page table entry is valid or not, you'll have to add code to check first whether that entry is in your database. If it is, you'll have to basically make sure that the OS knows that this entry is actually valid.

In this way, you can intercept every single demand hardware page walk. It's worth noting that the following page walks cannot be intercepted using that method:

  • Those from the TLB prefetcher.
  • Those from software prefetching instructions.

In both cases, the translation operation is aborted if a page fault needs to occur.
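
The three steps above can be sketched as a small user-space model. This is a toy simulation under loud assumptions, not kernel code: the "hardware" is a 4-slot direct-mapped TLB, pages are plain array indices, and every name (`pte_present`, `shadow_valid`, `tlb`, `access_page`, `intercepted_walks`) is made up for illustration. Prefetcher-initiated walks are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES    64   /* size of the toy address space, in pages */
#define TLB_SLOTS 4    /* toy direct-mapped TLB */

bool pte_present[NPAGES];   /* in-memory Present bit seen by the "hardware" */
bool shadow_valid[NPAGES];  /* step 1: database of pages that are really valid */
int  tlb[TLB_SLOTS];        /* cached page numbers, -1 = empty slot */
unsigned long intercepted_walks;

/* Step 1: everything starts out "Not Present" and the TLB is empty. */
void reset_model(void)
{
    for (size_t i = 0; i < NPAGES; i++)
        pte_present[i] = shadow_valid[i] = false;
    for (size_t i = 0; i < TLB_SLOTS; i++)
        tlb[i] = -1;
    intercepted_walks = 0;
}

/* "Hardware": a TLB hit needs no walk; a miss walks the table and
 * either fills the TLB (Present) or reports a page fault (false). */
static bool hw_translate(int vpn)
{
    int slot = vpn % TLB_SLOTS;
    if (tlb[slot] == vpn)
        return true;            /* TLB hit */
    if (!pte_present[vpn])
        return false;           /* walk ends in a page fault */
    tlb[slot] = vpn;            /* fill implicitly evicts the old entry */
    return true;
}

/* Step 2: the instrumented page fault handler. */
static bool fault_handler(int vpn)
{
    assert(vpn >= 0 && vpn < NPAGES);
    intercepted_walks++;        /* every demand walk ends up here */
    if (!shadow_valid[vpn])
        return false;           /* a real kernel would run its normal handler */
    pte_present[vpn] = true;    /* make the translation valid ... */
    (void)hw_translate(vpn);    /* ... touch the page so the TLB caches it ... */
    pte_present[vpn] = false;   /* ... and hide it again before returning */
    return true;
}

/* One memory access by the program; false means an unrecoverable fault. */
bool access_page(int vpn)
{
    if (vpn < 0 || vpn >= NPAGES)
        return false;
    if (hw_translate(vpn))
        return true;
    if (!fault_handler(vpn))
        return false;
    return hw_translate(vpn);   /* re-executed access now hits the TLB */
}
```

In this model, a repeated access to the same page is served from the TLB without another fault, while an access to a conflicting page evicts it and the next access is intercepted again, which is the behavior the scheme relies on. Step 3 corresponds to the rest of the OS consulting `shadow_valid` instead of `pte_present`.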

Venkataraman__Madhav

Your suggestion is pretty much what I had in mind. Except that I did not think to access the VA in the handler and mark the entry as not present before returning from the handler. That is a cool suggestion.

Thanks for the help. I have what I need.

Venkataraman__Madhav

The only other question I have is this: if the VA is in kernel space, then this will just work. But if it is in user space, how do I do it? I probably have to look at the code that copies data from user space to kernel space on a system call, for instance. On x86, are user accesses permitted from the kernel? I remember reading about a security issue related to that and mitigations for it. The access is probably disabled by default. Maybe there is a config variable to enable it in Linux?

HadiBrais

I think the easiest way is to disable supervisor-mode access protection (SMAP), kernel page table isolation (KPTI), and memory protection keys (MPK). These can be disabled in Linux using the following kernel parameters: nosmap, nopti, and nopku, respectively. With these features disabled, the kernel can load from any valid user page using the same virtual address that caused the page fault.
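
Before running the experiment, it may be worth confirming that the machine was actually booted with those parameters. The sketch below checks `/proc/cmdline` for them; `cmdline_has` and `missing_params` are hypothetical helper names. Note that it only shows what was passed at boot. Whether a given kernel version still honors each parameter is something to verify in that kernel's documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `param` appears in `cmdline` as a whole space-separated token. */
bool cmdline_has(const char *cmdline, const char *param)
{
    size_t n = strlen(param);
    assert(n > 0);
    for (const char *p = cmdline; (p = strstr(p, param)) != NULL; p += n) {
        bool starts = (p == cmdline) || p[-1] == ' ';
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (starts && ends)
            return true;
    }
    return false;
}

/* Reads /proc/cmdline and reports which of the three parameters are
 * absent. Returns the number missing, or -1 if the file is unreadable. */
int missing_params(void)
{
    static const char *const wanted[] = { "nosmap", "nopti", "nopku" };
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL)
        buf[0] = '\0';
    fclose(f);

    int missing = 0;
    for (size_t i = 0; i < sizeof wanted / sizeof wanted[0]; i++) {
        if (!cmdline_has(buf, wanted[i])) {
            printf("missing: %s\n", wanted[i]);
            missing++;
        }
    }
    return missing;
}
```

Calling `missing_params()` from a small `main` and getting 0 back means all three tokens were on the boot command line.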
