This is a feature request (actually suggested by Jim):
Report how much time was spent waiting for I/O requests (related to the paging file) during processing.
Note: We have a discussion on the Intel C++ Compiler Forum ( http://software.intel.com/en-us/forums/topic/401303 ) about processing very large data sets that exceed the size of physical memory by 3x - 5x. Or maybe such functionality is already available in Intel PCM?
That is a long discussion of your issue in the Compiler Forum, thank you.
1. Use lightweight-hotspots (now called advanced-hotspots in U10/U11) and enable "context switches" before collecting data. Then select "Hardware Event Sample Counts" as the viewpoint in the bottom-up report; there is a column named "Wait Time". Wait time can have many causes, such as I/O wait (file operations, page walks, waiting on a message, etc.) or thread suspension.
2. If you want to identify issues caused by page walks, I suggest using hardware PMU events, for example:
DTLB_LOAD_MISSES, DTLB_STORE_MISSES; they are supported on Sandy Bridge processors.
You can use DTLB_STORE_MISSES.WALK_DURATION to measure the cycles spent busy with page walks.
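
Once you have the event counts, a rough way to judge whether page walks matter is to compare the walk-duration cycles with the total unhalted cycles. The Python sketch below only illustrates that arithmetic. The function name and the sample counts are made up for illustration, and it assumes you also collected DTLB_LOAD_MISSES.WALK_DURATION and CPU_CLK_UNHALTED.THREAD in the same run. Load and store walks can overlap in time, so treat the sum as an upper-bound estimate rather than an exact figure.

```python
def page_walk_fraction(load_walk_cycles, store_walk_cycles, unhalted_cycles):
    """Estimate the share of unhalted cycles spent in DTLB page walks.

    load_walk_cycles  -- count of DTLB_LOAD_MISSES.WALK_DURATION
    store_walk_cycles -- count of DTLB_STORE_MISSES.WALK_DURATION
    unhalted_cycles   -- count of CPU_CLK_UNHALTED.THREAD (assumed collected too)
    """
    if unhalted_cycles <= 0:
        raise ValueError("unhalted_cycles must be positive")
    # Summing load and store walk cycles ignores any overlap between them,
    # so the result is an upper-bound estimate.
    return (load_walk_cycles + store_walk_cycles) / unhalted_cycles


if __name__ == "__main__":
    # Hypothetical counts, not taken from a real measurement.
    frac = page_walk_fraction(1_200_000_000, 300_000_000, 10_000_000_000)
    print(f"{frac:.1%} of unhalted cycles spent in DTLB page walks")
```

A high fraction would point at TLB pressure rather than at paging-file I/O itself, so it complements the "Wait Time" column from item 1 instead of replacing it.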