What is the cost of an IPI? As far as I know, inter-processor interrupts are used to synchronize caches between cores and processors. Such synchronization can be "costly" (my current knowledge does not let me be more precise...). But what is the cost of the IPI itself? And is there anything besides cache synchronization that can trigger an IPI?
Please share some information on this topic.
I was going by what Wikipedia says:
"An inter-processor interrupt (IPI) is a special type of interrupt by which one processor may interrupt another processor in a multiprocessor system. IPIs are typically used to implement a cache coherency synchronization point.
In x86 based systems, an IPI synchronizes the cache and Memory Management Unit (MMU) between processors."
Are you sure that IPIs do not have anything to do with cache synchronization?
Thanks in advance.
Cache-coherency protocols do not use IPIs, and as a user-space developer you do not need to care about IPIs at all. What one is usually interested in is the cost of cache coherency itself.
However, the Win32 API does provide a function that issues IPIs to all processors (in the affinity mask of the current process): FlushProcessWriteBuffers(). You can use it to investigate the cost of IPIs if you are still interested. In a simple synthetic test on a dual-core machine I obtained the following numbers:
- 420 cycles: minimum cost of the function on the issuing core
- 1600 cycles: mean cost of the function on the issuing core
- 1300 cycles: mean cost of the function on the remote core
Note that, as far as I understand, the function issues an IPI to the remote core, the remote core acknowledges it with another IPI, and the issuing core waits for that ack IPI before returning.
Note the "This computer hardware-related article is a stub" notice at the bottom. It's unclear what "a cache coherency synchronization point" is or what it means "to synchronize the cache"; to the best of my knowledge there are no such terms.
Perhaps the author means that with an IPI one can enforce instruction ordering on a remote processor. That does not directly relate to cache coherency.
And I was wrong to say that a user-space developer does not care about IPIs. Because of the application mentioned above (enforcing instruction ordering on a remote processor), IPIs let one develop algorithms that draw their strength from the dark side of the Force. As an example, see the following asymmetric reader-writer mutex algorithm, which outperforms all other rw mutexes on read-mostly workloads:
The same effect may be achieved by other means, but IPIs are preferable because of their "reactivity".
I'm investigating IPIs because of the following paper:
(which is a successor of following paper: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1409136)
They state that setting process and network-card affinity yields a performance gain because of:
- better cache coherency,
- lower amount of IPIs.
IPIs have the indirect cost of flushing the processor pipeline. Until today I thought that the most common way of triggering IPIs (and thus pipeline flushes) was false sharing in the cache, but it turns out that isn't true. Good news :)
Still, are there any ways to trigger IPIs inadvertently? It would be good to know, so the resulting pipeline flushes can be avoided.
I haven't read the paper (either paper), but I have looked at the appropriate systems programming guide, which lists the common uses for IPIs: startup (SIPIs), self-interrupting, and propagating interrupts (either interrupting another processor or letting a processor forward an interrupt to another one). It seems logical that setting process and network-card affinity increases the likelihood that the processor that receives the initial NIC interrupt can handle it itself rather than deferring to another processor. The recommended uses seem very limited, though Dmitriy's observation that Win32 provides a function call for this could mean all kinds of crazies are using it out there.