The Intel manual says of WBINVD: "After executing this instruction, the processor does not wait for the external caches to complete their write-back and flushing operations before proceeding with instruction execution. It is the responsibility of hardware to respond to the cache write-back and flush signals. The amount of time or cycles for WBINVD to complete will vary due to size and other factors of different cache hierarchies."
Is there any way to force the program to wait until the external cache flush triggered by WBINVD has completed? That would make it possible to measure the full overhead of the WBINVD instruction.
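For context, the naive measurement I have in mind looks roughly like the sketch below. It assumes GCC/Clang inline assembly and a POSIX `clock_gettime`. The names `wbinvd_op`, `now_ns`, `time_op_ns` and the `RUN_IN_RING0` switch are hypothetical, made up for this example. WBINVD is privileged, so the real instruction is only emitted when `RUN_IN_RING0` is defined; otherwise a stand-in that flushes nothing is timed. Real ring-0 code would also need a kernel-side clock instead of `clock_gettime`.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical switch: define RUN_IN_RING0 only for code that really runs
   at privilege level 0. WBINVD faults when executed in user mode. */
static inline void wbinvd_op(void)
{
#if defined(RUN_IN_RING0) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__("wbinvd" ::: "memory");
#else
    /* User-space stand-in: a compiler barrier only, no cache flush. */
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* keep rc "used" when NDEBUG disables the assert */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Elapsed nanoseconds around a single call of op. */
static uint64_t time_op_ns(void (*op)(void))
{
    uint64_t start = now_ns();
    op();
    uint64_t end = now_ns();
    return end - start;
}
```

Calling `time_op_ns(wbinvd_op)` returns the time until execution continues past the instruction. Going by the manual text quoted above, that interval need not include the external caches finishing their write-back, which is exactly the part I do not know how to capture.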