
Why does IOTLB invalidation take more than 1k cycles on average?


I have been looking into the IOMMU and its IOTLB, and noticed that a single IOTLB invalidation takes many CPU clock cycles (average = 1680, max = 56090, min = 516).

Note that this is purely the time the OS kernel spends waiting for a single IOTLB invalidation to be completed by the IOMMU.


I went through the Intel VT-d specification looking for a reason why an IOTLB invalidation would take this many clock cycles. I saw that there are paging-structure caches, which cache the intermediate memory accesses made during a page table walk and which must be invalidated when a relevant IOTLB entry is invalidated. I can understand that this may take additional clock cycles.

However, I am not fully convinced that this alone explains why an IOTLB invalidation takes this many clock cycles on average.

Are there any other operations or scenarios that contribute to the high cycle count (or high latency) of an IOTLB invalidation?


My setup:

CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz

OS: Linux 6.2.8
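
For a sense of scale, the cycle counts above can be converted to wall-clock time. The following is a minimal Python sketch. It assumes the cycles are TSC ticks counted at the CPU's nominal 2.20 GHz; that frequency and the helper name `cycles_to_ns` are assumptions made for illustration and are not confirmed by the measurements above.

```python
# Assumption: cycles were counted with the TSC, and the TSC ticks at the
# nominal 2.20 GHz of the Xeon Silver 4114. Adjust TSC_HZ if that is not so.
TSC_HZ = 2.2e9


def cycles_to_ns(cycles, hz=TSC_HZ):
    """Convert a cycle count to nanoseconds at the given tick frequency."""
    return cycles / hz * 1e9


# Cycle counts reported in the post.
measurements = {"min": 516, "average": 1680, "max": 56090}

for label, cycles in measurements.items():
    print(f"{label:>7}: {cycles:>6} cycles ~ {cycles_to_ns(cycles):9.1f} ns")
```

Under that assumption, the average of 1680 cycles is a little under a microsecond, and the maximum of 56090 cycles is roughly 25 microseconds.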

