We are working on a new research operating system. For message passing we use several mechanisms, including polling, IPIs, and monitor/mwait. To benchmark their performance, we send a ping-pong message between two processes running on two different cores and count the cycles for the round trip on the sender core. What confuses us is that monitor/mwait performance seems to vary by a few hundred cycles when we change the address of the monitored area. I should mention that we use the write-back cache policy, and the processor is an Intel(R) Xeon(R) CPU E31270 @ 3.40GHz, which is not NUMA. We used two addresses that are relatively close to each other: the first was 0x100 and the second 0xA0800.
Is monitor/mwait performance address dependent?
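For anyone trying to reproduce the polling baseline, here is a minimal user-space sketch of the ping-pong round trip. It is not the research OS code. It assumes POSIX threads and C11 atomics, times with `clock_gettime` instead of counting cycles with `rdtsc`, and does not pin the two threads to distinct cores (on Linux, `pthread_setaffinity_np` would do that). The names `struct channel` and `pingpong_polling_ns` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical channel: each flag sits on its own 64-byte line
   (assumed line size) so ping and pong do not falsely share a line. */
struct channel {
    _Alignas(64) atomic_int ping;
    _Alignas(64) atomic_int pong;
};

static struct channel chan;
static int rounds; /* written before pthread_create, so the responder sees it */

/* Responder: poll until ping carries round number i, then echo it on pong. */
static void *responder(void *arg) {
    (void)arg;
    for (int i = 1; i <= rounds; i++) {
        while (atomic_load_explicit(&chan.ping, memory_order_acquire) != i)
            ; /* busy-poll */
        atomic_store_explicit(&chan.pong, i, memory_order_release);
    }
    return NULL;
}

/* Sender: run n round trips and return the average round-trip time in
   nanoseconds, or -1.0 if the responder thread cannot be created. */
double pingpong_polling_ns(int n) {
    pthread_t t;
    struct timespec start, end;

    rounds = n;
    atomic_store(&chan.ping, 0);
    atomic_store(&chan.pong, 0);
    if (pthread_create(&t, NULL, responder, NULL) != 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 1; i <= n; i++) {
        atomic_store_explicit(&chan.ping, i, memory_order_release);
        while (atomic_load_explicit(&chan.pong, memory_order_acquire) != i)
            ; /* busy-poll for the reply */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    pthread_join(t, NULL);

    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return ns / n;
}
```

Calling `pingpong_polling_ns(100000)` with the two threads pinned to different physical cores gives a per-round-trip figure to compare against the monitor/mwait variant. The monitor/mwait version itself needs ring 0 on Intel, so it belongs in the kernel rather than in a sketch like this.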
Are these physical addresses or virtual addresses?
Do you "touch" memory in the same page as the monitored location before executing monitor in both cases? (i.e., to preload the page-table entry in case it is not already loaded)
Can you set up a test that uses addresses 0x100 and 0xA0100? (i.e., the same offset within a page).
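To see what changes between the two address pairs, this small sketch computes the page offset and the L1D set index of each address. It assumes 4 KiB pages, 64-byte lines, and a 32 KiB, 8-way L1D (64 sets, as on Sandy Bridge parts like the E31270); the constant and function names are illustrative.

```c
#include <stdint.h>

/* Assumed geometry, not taken from the thread. */
#define PAGE_BYTES 4096u
#define LINE_BYTES 64u
#define L1D_SETS   64u /* 32 KiB / (8 ways * 64 B) */

/* Offset of addr within its 4 KiB page. */
uint64_t page_offset(uint64_t addr) { return addr & (PAGE_BYTES - 1); }

/* L1D set index: bits 6..11 of the address. */
uint64_t l1d_set(uint64_t addr) { return (addr / LINE_BYTES) % L1D_SETS; }
```

With these assumptions, 0x100 and 0xA0100 both have page offset 0x100 and land in L1D set 4, while 0xA0800 has offset 0x800 and lands in set 32. Because bits 6 to 11 lie inside the 4 KiB page offset, the set index is the same whether the addresses are virtual or physical, so the suggested test isolates the in-page offset from other differences.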
Presumably one logical processor is monitoring 0x100 and a different logical processor is monitoring 0xA0800, and each has a monitor "window" within one cache line. Hamid, would you comment on this?
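The size of the monitor window can be read from CPUID leaf 05H, which reports the smallest (EAX[15:0]) and largest (EBX[15:0]) monitor-line sizes in bytes. The sketch below reads the largest size via GCC/Clang's `<cpuid.h>` and checks whether two addresses share a window. It assumes the window is aligned to its own size, falls back to a caller-supplied size (e.g. 64) on non-x86 builds or when the leaf reports 0, and uses illustrative function names.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Largest monitor-line size from CPUID leaf 05H (EBX[15:0]), or
   `fallback` if the leaf is unavailable or reports 0 (e.g. some VMs). */
unsigned monitor_line_size(unsigned fallback) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(5, &eax, &ebx, &ecx, &edx) && (ebx & 0xFFFFu) != 0)
        return ebx & 0xFFFFu;
#endif
    return fallback;
}

/* Nonzero if a and b fall in the same size-aligned monitor window
   (alignment to the window size is an assumption of this sketch). */
int same_monitor_window(uint64_t a, uint64_t b, unsigned size) {
    return (a / size) == (b / size);
}
```

With a 64-byte window, 0x100 and 0xA0800 are in different windows, so a store to one should not wake a processor monitoring the other; any timing difference would then have to come from elsewhere, such as TLB or cache state.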