Cache-coherency protocols do not use IPIs (inter-processor interrupts), and as a user-space developer you do not need to care about IPIs at all. What matters most is the cost of cache coherency itself.
However, the Win32 API provides a function, FlushProcessWriteBuffers(), that issues IPIs to all processors in the affinity mask of the current process. You can use it to investigate the cost of IPIs if you are still interested in them. When I ran a simple synthetic test on a dual-core machine, I obtained the following numbers:
420 cycles is the minimum cost of the function on the issuing core.
1600 cycles is the mean cost of the function on the issuing core.
1300 cycles is the mean cost of the function on the remote core.
Note that, as far as I understand, the function sends an IPI to the remote core, the remote core acknowledges it with another IPI, and the issuing core waits for that acknowledgment IPI before returning.
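
If you want to reproduce the issuing-core numbers, here is a minimal sketch of such a test. It is not the exact code I used. It assumes an x86 CPU and reads the time-stamp counter with the __rdtsc() intrinsic. The names CycleStats, measure_cycles and measure_flush are made up for this example. It only covers the issuing side. Measuring the remote-core cost would need a second thread spinning on the other core and timing its own interruptions, which is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_WIN32)
#include <windows.h>   // FlushProcessWriteBuffers (Windows Vista and later)
#include <intrin.h>    // __rdtsc on MSVC
#else
#include <x86intrin.h> // __rdtsc on GCC/Clang (x86 only)
#endif

// Hypothetical helper type: minimum and mean cost in TSC cycles.
struct CycleStats {
    std::uint64_t min_cycles;
    double mean_cycles;
};

// Calls fn() `iterations` times and records the minimum and mean
// number of time-stamp-counter ticks spent in each call.
// Assumption: the thread stays on one core, or the TSC is
// synchronized across cores, so the differences are meaningful.
template <typename Fn>
CycleStats measure_cycles(Fn&& fn, int iterations) {
    assert(iterations > 0);
    std::uint64_t min_c = UINT64_MAX;
    std::uint64_t total = 0;
    for (int i = 0; i < iterations; ++i) {
        const std::uint64_t start = __rdtsc();
        fn();
        const std::uint64_t elapsed = __rdtsc() - start;
        if (elapsed < min_c) min_c = elapsed;
        total += elapsed;
    }
    return CycleStats{min_c, static_cast<double>(total) / iterations};
}

#if defined(_WIN32)
// Cost of FlushProcessWriteBuffers() as seen by the calling (issuing) core.
CycleStats measure_flush(int iterations) {
    return measure_cycles([] { FlushProcessWriteBuffers(); }, iterations);
}
#endif
```

On Windows you would call something like measure_flush(100000) from main() and print min_cycles and mean_cycles. The timing loop itself is portable, so you can check it against an empty lambda first to see the overhead of the two __rdtsc() reads.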