Michael4
Beginner
132 Views

IPIs and weak memory ordering

Hi all!

I was wondering if it was possible for an IPI to "overtake" a memory write.

For example:
1. CPU A writes some global variable (and the write happens to stay in the store buffer for a long time)
2. CPU A sends an IPI to CPU B
3. CPU B's IPI ISR reads the global variable

Is it theoretically possible in this scenario that the store buffer of CPU A has not been drained to the cache/memory when CPU B takes the interrupt and thus reads an old value of the variable?
I.e. is an explicit synchronisation instruction needed?

I couldn't find any information on that in chapter 8.2 (Memory Ordering) of the Software Developer's Manual Vol. 3. And while chapter 11.10 (Store Buffer) says that the store buffer is drained whenever an "exception or interrupt is generated", I suspect this only refers to the CPU receiving the interrupt, not the one sending it.

Cheers
Michael
4 Replies
Dmitry_Vyukov
Valued Contributor I

I think you could consult the Linux kernel sources. As far as I remember, arch/x86 uses no special instructions to ensure memory visibility before sending an IPI. In any case, the instruction that waits for the store buffer to drain is MFENCE.
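
To make the fence option concrete, here is a minimal user-space sketch of the three steps from the question with a full fence placed between the write and the IPI. It assumes C11 atomics; `send_ipi`, `publish_and_notify` and `ipi_handler_read` are hypothetical names, and `send_ipi` is only a stub standing in for whatever the platform uses to raise the interrupt. On x86, compilers typically implement a sequentially consistent fence with MFENCE or a locked instruction. Whether the fence is actually required before an IPI is exactly the open question of this thread, so treat this as the conservative variant.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the platform's IPI primitive. It only counts
 * calls so that the sketch can be compiled and run in user space. */
static int ipis_sent;
static void send_ipi(int target_cpu) { (void)target_cpu; ipis_sent++; }

/* The "global variable" from the question. */
static _Atomic uint64_t shared_value;

/* Runs on CPU A. */
static void publish_and_notify(uint64_t value, int target_cpu)
{
    /* Step 1: the write that might linger in the store buffer. */
    atomic_store_explicit(&shared_value, value, memory_order_relaxed);

    /* Full fence: later operations are not started until the store
     * above is globally visible. */
    atomic_thread_fence(memory_order_seq_cst);

    /* Step 2: raise the interrupt on CPU B. */
    send_ipi(target_cpu);
}

/* Runs on CPU B, inside the IPI handler (step 3). */
static uint64_t ipi_handler_read(void)
{
    return atomic_load_explicit(&shared_value, memory_order_relaxed);
}
```

In a real kernel the stub would be replaced by the actual IPI send routine, and the handler would be registered for the corresponding vector.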


Dmitry_Vyukov
Valued Contributor I

However, that was indeed possible on some architectures in the past, so your concern is not unfounded. Here is an excerpt from the book "Is Parallel Programming Hard, And, If So, What Can You Do About It?":

C.9 Advice to Hardware Designers
There are any number of things that hardware designers
can do to make the lives of software people
difficult. Here is a list of a few such things that we
have encountered in the past, presented here in the
hope that it might help prevent future such problems:
...
3. Inter-processor interrupts (IPIs) that ignore
cache coherence.
This can be problematic if the IPI reaches its
destination before all of the cache lines in the
corresponding message buffer have been committed
to memory.

jimdempseyatthecove
Black Belt

You could use MFENCE as Dmitry suggests, or, if you set things up for single-producer/single-consumer messaging, you can use a present/taken structure. A sketch follows:

message_t* volatile messageAtoB = NULL; // volatile so the spin loops re-read it

// code on A
void SendMessageToB(message_t* message)
{
    // wait until the prior message has been taken
    // (should seldom occur)
    while(messageAtoB)
        _mm_pause(); // not taken (rework this code for failures)
    messageAtoB = message;
    IPI(signalB);
}

...

// code on B
message_t* ReadMessageFromA()
{
    while(!messageAtoB)
        _mm_pause(); // not present (rework this code for failures)
    message_t* p = messageAtoB;
    messageAtoB = NULL; // A will eventually observe we took the message
    return p;
}

Expand the sketch to use a ring buffer and to issue the IPI on first fill.
Also flesh out the error handling for a lost interrupt and for a spurious interrupt.
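
One possible way to expand this into a ring buffer that raises the IPI only when the consumer may have gone idle is sketched below. It assumes C11 atomics and one producer (A) and one consumer (B). `RING_CAPACITY`, `ring_send`, `ring_receive`, the `payload` field and the `IPI_signalB` stub are hypothetical names chosen for illustration. After publishing, the producer re-reads the consumer index behind a full fence: if the consumer had already caught up to the slot just written, it may have seen an empty ring and returned, so an IPI is sent. This can occasionally produce an extra IPI, which the handler has to tolerate by finding the ring empty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 8u /* assumed size; must be a power of two */
static_assert((RING_CAPACITY & (RING_CAPACITY - 1u)) == 0,
              "RING_CAPACITY must be a power of two");

typedef struct message { int payload; } message_t; /* placeholder type */

static message_t *ring[RING_CAPACITY];
static atomic_uint head; /* next slot to read; written only by B */
static atomic_uint tail; /* next slot to write; written only by A */

/* Hypothetical stand-in for the real IPI primitive; counts calls only. */
static unsigned ipi_count;
static void IPI_signalB(void) { ipi_count++; }

/* Code on A. Returns false if the ring is full. */
static bool ring_send(message_t *m)
{
    unsigned t = atomic_load_explicit(&tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&head, memory_order_acquire);
    if (t - h == RING_CAPACITY)
        return false;                       /* full: caller retries later */

    ring[t % RING_CAPACITY] = m;
    atomic_store_explicit(&tail, t + 1u, memory_order_release);

    /* Order the publication before re-reading the consumer index. */
    atomic_thread_fence(memory_order_seq_cst);
    h = atomic_load_explicit(&head, memory_order_relaxed);
    if (h == t)
        IPI_signalB();                      /* B had drained everything */
    return true;
}

/* Code on B. The IPI handler calls this in a loop until it returns NULL. */
static message_t *ring_receive(void)
{
    unsigned h = atomic_load_explicit(&head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&tail, memory_order_acquire);
    if (h == t)
        return NULL;                        /* empty, or a spurious IPI */

    message_t *m = ring[h % RING_CAPACITY];
    atomic_store_explicit(&head, h + 1u, memory_order_release);

    /* Pairs with the fence in ring_send: either the next call here sees
     * the producer's new tail, or the producer sees the new head. */
    atomic_thread_fence(memory_order_seq_cst);
    return m;
}
```

The sketch does not handle a lost interrupt; a periodic poll of `ring_receive` on B would be one way to cover that case.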

Note, the above is a sketch and not necessarily the code you would implement.

message_t* volatile messageAtoB = NULL;
message_t* volatile newMessageForB = NULL;

// code on A
void SendMessageToB(message_t* message)
{
    // wait until the prior message has been taken
    // (should seldom occur)
    while(messageAtoB)
    {
        if(newMessageForB == NULL)
            IPI(signalB);
        _mm_pause(); // not taken (rework this code for failures)
    }
    messageAtoB = message;
    if(newMessageForB == NULL)
        IPI(signalB);
}

...

// code on B (pseudocode for the interrupt handler)
extern message_t* volatile newMessageForB; // shared with A, defined above
IPIscan:
    push rax
    ...
    if(messageAtoB)
    {
        newMessageForB = messageAtoB;
        messageAtoB = NULL; // A will eventually observe we took the message
    }
    ...
    pop rax
    iret

Something along the above ought to work.
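
For comparison, the single-slot handoff can also be written with C11 atomics instead of plain variables. This is a sketch under assumptions, not a drop-in replacement: `TrySendMessageToB` and `TakeMessageFromA` are hypothetical non-blocking variants, the `payload` field is a placeholder, and `IPI_signalB` is a stub for the real IPI primitive. The release store and the acquire exchange order the message contents relative to the slot pointer; they say nothing about whether the interrupt itself can arrive before the store is visible. If that can happen, the handler simply finds the slot empty, and the message waits for the next IPI or poll.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct message { int payload; } message_t; /* placeholder type */

/* NULL means "slot empty"; non-NULL means "message present, not taken". */
static _Atomic(message_t *) messageAtoB;

/* Hypothetical stand-in for the real IPI primitive; counts calls only. */
static unsigned ipi_count;
static void IPI_signalB(void) { ipi_count++; }

/* Code on A (single producer). Returns false if the prior message has
 * not been taken yet, so the caller decides how to retry. */
static bool TrySendMessageToB(message_t *message)
{
    assert(message != NULL); /* NULL is reserved for "empty" */
    if (atomic_load_explicit(&messageAtoB, memory_order_acquire) != NULL)
        return false;

    /* Release: the message contents are visible before the pointer. */
    atomic_store_explicit(&messageAtoB, message, memory_order_release);
    IPI_signalB();
    return true;
}

/* Code on B, called from the IPI handler. Returns NULL on a spurious
 * interrupt or if the slot is not (yet) observed as filled. */
static message_t *TakeMessageFromA(void)
{
    /* Take the message and mark the slot empty in one atomic step. */
    return atomic_exchange_explicit(&messageAtoB, (message_t *)NULL,
                                    memory_order_acquire);
}
```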

Jim Dempsey
www.quickthreadprogramming.com
Michael4
Beginner

Thanks for your replies.
Yes, I have also seen that Linux assumes that such a behaviour is not possible.
Nevertheless, I was wondering if this assumption is justified.
In other words: which part of the Software Developer's Manual guarantees that I'm allowed to assume that?
I suspect this information is missing in the manual and therefore suggest it should be updated.

Just to clarify my interest in this topic:
I'm not just writing some code which I want to work correctly.
I'm developing a formal multiprocessor execution model for x86 CPUs in which I have to formally state whether such a behaviour is possible or not. And I have to justify such a formalisation with a reference to the Software Developer's Manual.