"Things will appear to be so between P1 and P2, even if a P3 might beg to differ. Forget about "memory": ..."
"Or maybe I should ask in another way, is it possible that we can ensure correctness with atomic
Yes, "msg=14" should reach the cache (memory subsystem) earlier than "ready=true" (release semantics for 'ready').
Accordingly, on the consumer side the request to load 'ready' should reach the cache (memory subsystem) earlier than the request to load 'msg' (acquire semantics for 'ready').
An implementation may still speculate internally, though, as long as the speculation is not visible to the user.
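
To make the ordering concrete, here is a minimal C++11 sketch of the msg/ready handoff. It is only an illustration under my own assumptions: the names run_once and seen are made up for the example, and I use std::atomic with explicit release/acquire orderings rather than whatever primitives your code actually uses.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs one producer/consumer handoff and returns
// the value of msg that the consumer observed.
int run_once() {
    int msg = 0;                     // plain, non-atomic payload
    std::atomic<bool> ready{false};  // flag that publishes the payload
    int seen = -1;                   // what the consumer read from msg

    std::thread producer([&] {
        msg = 14;  // (1) write the payload
        // (2) release store: (1) may not be reordered after this store
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // (3) acquire load: spin until the flag is observed as true
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // (4) may not be reordered before (3), so it observes msg == 14
        seen = msg;
    });

    producer.join();
    consumer.join();
    // Sanity check: the flag must have been set by the time both threads finish.
    assert(ready.load(std::memory_order_relaxed));
    return seen;
}
```

With both orderings weakened to memory_order_relaxed, the read of msg would be a data race and the program would have undefined behavior, so the consumer could not rely on seeing 14. This pairing gives guarantees only between the thread that does the release store and the thread that does the acquire load of the same flag.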
In general, yes, communication via a shared cache can be efficient or "not so efficient". The details are involved: it depends on whether the system has an exclusive, inclusive or hybrid cache, whether the coherence protocol has an Owner cache state, etc. I'm unable to go that deep; perhaps you will get more definitive answers if you ask on comp.arch.