
VT-d conceptual problems....

Tracy_Camp
New Contributor I
I'm working on a hypervisor that uses the EPT mechanism to demand-page guest memory in and out of physical memory (over-commit, though that's not exactly my point here). I was hoping to use VT-d so that guests can continue to directly 'own' devices. To provide general-purpose DMA remapping, though, I think I need some way to handle arbitrary guest physical addresses that are targets of a DMA operation but are not currently mapped to physical pages (i.e., I would mark both the EPT entry and the remapping-table entry as invalid for a given GPA). However, what seems to be prominently missing from VT-d is a mechanism for the CPU to handle a DMA remapping-table 'page fault' (insert a PCI wait state and interrupt the CPU?). Is my understanding off base? Is there some other solution I should be pursuing for systems like this?
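
For illustration, here is a minimal C sketch of the scheme described in the question: an EPT leaf entry and the corresponding VT-d second-level entry are kept in sync, with both cleared when a guest frame is evicted. The flat arrays, bit definitions, and helper names are simplified assumptions rather than an actual VMM implementation.

/* Hypothetical sketch: keep the EPT and the VT-d second-level page tables
 * in sync as the VMM demand-pages guest memory.  Real tables are
 * multi-level; the entry layouts below are only illustrative. */
#include <stdint.h>

#define EPT_READ    (1ULL << 0)   /* EPT: read permitted           */
#define EPT_WRITE   (1ULL << 1)   /* EPT: write permitted          */
#define EPT_EXEC    (1ULL << 2)   /* EPT: execute permitted        */
#define DMAR_READ   (1ULL << 0)   /* VT-d 2nd-level PTE: DMA read  */
#define DMAR_WRITE  (1ULL << 1)   /* VT-d 2nd-level PTE: DMA write */
#define ADDR_MASK   0x000FFFFFFFFFF000ULL

#define GUEST_PAGES 1024
static uint64_t ept_pte[GUEST_PAGES];   /* flat stand-in, indexed by GFN */
static uint64_t dmar_pte[GUEST_PAGES];  /* flat stand-in, indexed by GFN */

/* Evict one guest frame: drop the mapping from both structures.  A later
 * CPU access raises an EPT violation the VMM can service; a DMA to the
 * same GPA raises a VT-d translation fault that cannot be restarted,
 * which is exactly the problem raised in the question. */
static void page_out_gfn(uint64_t gfn)
{
    ept_pte[gfn]  &= ~(EPT_READ | EPT_WRITE | EPT_EXEC);
    dmar_pte[gfn] &= ~(DMAR_READ | DMAR_WRITE);
    /* A real VMM would also invalidate the EPT (INVEPT) and the IOTLB. */
}

/* Map a host frame back in after servicing an EPT violation. */
static void page_in_gfn(uint64_t gfn, uint64_t host_pa)
{
    ept_pte[gfn]  = (host_pa & ADDR_MASK) | EPT_READ | EPT_WRITE | EPT_EXEC;
    dmar_pte[gfn] = (host_pa & ADDR_MASK) | DMAR_READ | DMAR_WRITE;
}

int main(void)
{
    page_in_gfn(42, 0x100000);   /* GFN 42 backed by host PA 0x100000 */
    page_out_gfn(42);            /* ...and evicted again              */
    return 0;
}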
David_O_Intel1
Employee
Hi Tracy,

I received two sets of comments on this issue.

(1)

"Your understanding is not off base. However, for modern I/O buses/links (like PCI-Express) that supports split transactions, there is really no easy way to insert a PCI wait state and interrupt the CPU. Reason being that, there is no equivalent of wait states (as they are on shared-bus-based protocols) on these message/link based bus protocols, and if the root-complex/chipset held-up the DMA request waiting for CPU to resolve the page-fault and process it, the device would continue to send requests eventually run out of link-level credits. This can introduce protocol-level deadlock/dependency conditions with current PCI-E ordering rules and existing device implementations. Finally, if the CPU had to get the page from disk (via DMA), and the disk itself is behind the same PCI-E root-port, now the DMA requests from disk cannot progress until the prior DMA (including the one which faulted) is resolved, but the one with the page-fault cannot be resolved until the disk DMA/paging is completed, causing deadlocks at functional level.


Given all this, the only way (if at all) to support I/O page faults is for the page fault to be detected at the device itself, before it issues the DMA request on the link. To do this, the device has to participate in VT-d address translation by having a local TLB on the device and using the PCI-SIG-defined Address Translation Services (ATS) protocol to ask the IOMMU to service local device-TLB misses. You may look at the evolving Address Translation Services version 1.1 specification from PCI-SIG for more details on how TLB misses and page faults are being defined. But in short summary, until the spec is mature and device and VT-d implementations start supporting it, there is really no way for hardware to support paging of DMA buffers (be it in the guest or the host). Until such hardware support is available, you have to resort to software approaches to determine which guest memory cannot be a DMA target of the assigned device (and hence is pageable by the VMM), or pin all of guest memory."
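
As a rough illustration of the software fallback described at the end of comment (1), here is a hedged C sketch of the simplest policy: pin every guest frame while a device is assigned so the demand-paging path never evicts a potential DMA target. The names and the flat pin bitmap are invented for this example; a smarter VMM could pin only the ranges a paravirtualized guest advertises as DMA-able.

/* Hypothetical sketch: pin guest frames for the lifetime of a device
 * assignment so the eviction path never pages out a possible DMA target. */
#include <stdbool.h>
#include <stdint.h>

#define GUEST_PAGES 1024
static bool pinned[GUEST_PAGES];

/* Called when a device is directly assigned to the guest. */
static void pin_all_guest_memory(void)
{
    for (uint64_t gfn = 0; gfn < GUEST_PAGES; gfn++)
        pinned[gfn] = true;
}

/* The demand-paging path consults the pin state before evicting a frame. */
static bool may_evict(uint64_t gfn)
{
    return !pinned[gfn];
}

int main(void)
{
    pin_all_guest_memory();
    return may_evict(42) ? 1 : 0;   /* 0: frame 42 is pinned, as expected */
}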

(2)

"This question appears to be about theavailability of VT-d page fault. This is something that is not hard for the VT-d engine to implement, but youll need devices to hold their DMA requests. This is not possible now since mostdevicesdon't support it. Perhaps in the future, some moreadvanced devices (e.g.,GPU) may want that

In short, it's not doable today. Maybe he can allocate DMA buffers from a special guest memory range, which is surely backed with real physical memory."
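
A rough sketch of that workaround follows, assuming a hypothetical reserved GPA window that the VMM keeps permanently backed by host memory and mapped in the VT-d tables; the window base, size, and bump allocator are illustrative only.

/* Hypothetical guest-side sketch: all DMA buffers come from one reserved
 * GPA window that the VMM guarantees is always resident, so device DMA
 * never touches a paged-out guest frame. */
#include <stdint.h>
#include <stddef.h>

#define DMA_POOL_BASE  0x40000000ULL   /* assumed pinned GPA window */
#define DMA_POOL_SIZE  (16ULL << 20)   /* 16 MiB                    */

static uint64_t dma_pool_next = DMA_POOL_BASE;

/* Return the guest-physical address of a page-aligned DMA buffer, or 0 if
 * the pool is exhausted.  A real allocator would also support freeing. */
static uint64_t dma_alloc(size_t bytes)
{
    size_t rounded = (bytes + 0xFFF) & ~(size_t)0xFFF;   /* page-align */

    if (dma_pool_next + rounded > DMA_POOL_BASE + DMA_POOL_SIZE)
        return 0;

    uint64_t gpa = dma_pool_next;
    dma_pool_next += rounded;
    return gpa;
}

int main(void)
{
    uint64_t buf = dma_alloc(4096);   /* one page-sized DMA buffer */
    return buf ? 0 : 1;
}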

David Ott
