<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic IPIs and weak memory ordering in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/810698#M991</link>
    <description>However, an IPI overtaking a memory write was indeed possible on some architectures in the past, so your concern is not unfounded. Here is an excerpt from the book "Is Parallel Programming Hard, And, If So, What Can You Do About It?":&lt;BR /&gt;&lt;BR /&gt;&lt;I&gt;C.9 Advice to Hardware Designers&lt;BR /&gt;There are any number of things that hardware designers can do to make the lives of software people difficult. Here is a list of a few such things that we have encountered in the past, presented here in the hope that it might help prevent future such problems:&lt;BR /&gt;...&lt;BR /&gt;3. Inter-processor interrupts (IPIs) that ignore cache coherence.&lt;BR /&gt;This can be problematic if the IPI reaches its destination before all of the cache lines in the corresponding message buffer have been committed to memory.&lt;BR /&gt;&lt;/I&gt;&lt;BR /&gt;</description>
    <pubDate>Thu, 27 May 2010 08:28:27 GMT</pubDate>
    <dc:creator>Dmitry_Vyukov</dc:creator>
    <dc:date>2010-05-27T08:28:27Z</dc:date>
    <item>
      <title>IPIs and weak memory ordering</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/810696#M989</link>
      <description>Hi all!&lt;BR /&gt;&lt;BR /&gt;I was wondering whether it is possible for an IPI to "overtake" a memory write.&lt;BR /&gt;&lt;BR /&gt;For example:&lt;BR /&gt;1. CPU A writes some global variable (and the write happens to stay in the store buffer for a long time).&lt;BR /&gt;2. CPU A sends an IPI to CPU B.&lt;BR /&gt;3. CPU B's IPI handler (ISR) reads the global variable.&lt;BR /&gt;&lt;BR /&gt;Is it theoretically possible in this scenario that CPU A's store buffer has not yet been drained to cache/memory when CPU B takes the interrupt, so that CPU B reads an old value of the variable?&lt;BR /&gt;In other words, is an explicit synchronisation instruction needed?&lt;BR /&gt;&lt;BR /&gt;I couldn't find any information on this in chapter 8.2 (Memory Ordering) of the Software Developer's Manual Vol. 3. And while chapter 11.10 (Store Buffer) says that the store buffer is drained whenever an "exception or interrupt is generated", I suspect this refers only to the CPU receiving the interrupt, not the one sending it.&lt;BR /&gt;&lt;BR /&gt;Cheers&lt;BR /&gt;Michael&lt;BR /&gt;</description>
      <pubDate>Thu, 27 May 2010 04:52:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/810696#M989</guid>
      <dc:creator>Michael4</dc:creator>
      <dc:date>2010-05-27T04:52:44Z</dc:date>
    </item>
    <item>
      <title>IPIs and weak memory ordering</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/810697#M990</link>
      <description>You may want to consult the Linux kernel sources. As far as I remember, arch/x86 issues no special instructions to ensure memory visibility before sending an IPI. In any case, the instruction that waits for the store buffer to drain is MFENCE.&lt;BR /&gt;</description>
      <pubDate>Thu, 27 May 2010 08:26:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/810697#M990</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2010-05-27T08:26:32Z</dc:date>
    </item>
    <item>
      <title>IPIs and weak memory ordering</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/810698#M991</link>
      <description>However, an IPI overtaking a memory write was indeed possible on some architectures in the past, so your concern is not unfounded. Here is an excerpt from the book "Is Parallel Programming Hard, And, If So, What Can You Do About It?":&lt;BR /&gt;&lt;BR /&gt;&lt;I&gt;C.9 Advice to Hardware Designers&lt;BR /&gt;There are any number of things that hardware designers can do to make the lives of software people difficult. Here is a list of a few such things that we have encountered in the past, presented here in the hope that it might help prevent future such problems:&lt;BR /&gt;...&lt;BR /&gt;3. Inter-processor interrupts (IPIs) that ignore cache coherence.&lt;BR /&gt;This can be problematic if the IPI reaches its destination before all of the cache lines in the corresponding message buffer have been committed to memory.&lt;BR /&gt;&lt;/I&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 27 May 2010 08:28:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/810698#M991</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2010-05-27T08:28:27Z</dc:date>
    </item>
    <item>
      <title>IPIs and weak memory ordering</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/810699#M992</link>
      <description>You could use MFENCE as Dmitriy suggested, or, if you set up single-producer/single-consumer messaging, you can use a present/taken structure. A sketch follows:&lt;BR /&gt;&lt;BR /&gt;message_t* volatile messageAtoB = NULL; // volatile so the spin loops below re-read it&lt;BR /&gt;&lt;BR /&gt;// code on A&lt;BR /&gt;void SendMessageToB(message_t* message)&lt;BR /&gt;{&lt;BR /&gt; // check for a prior message not yet taken&lt;BR /&gt; // (should seldom occur)&lt;BR /&gt; while(messageAtoB)&lt;BR /&gt; _mm_pause(); // not taken (rework this code for failures)&lt;BR /&gt; messageAtoB = message;&lt;BR /&gt; IPI(signalB);&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;...&lt;BR /&gt;&lt;BR /&gt;// code on B&lt;BR /&gt;message_t* ReadMessageFromA()&lt;BR /&gt;{&lt;BR /&gt; while(!messageAtoB)&lt;BR /&gt; _mm_pause(); // not present (rework this code for failures)&lt;BR /&gt; message_t* p = messageAtoB;&lt;BR /&gt; messageAtoB = NULL; // A will eventually observe that we took the message&lt;BR /&gt; return p;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Expand the sketch to use a ring buffer and to issue the IPI on first fill.&lt;BR /&gt;Also flesh out error detection for lost interrupts and/or spurious interrupts.&lt;BR /&gt;&lt;BR /&gt;Note: the above is a sketch and not necessarily the code you would implement.&lt;BR /&gt;&lt;BR /&gt;message_t* volatile messageAtoB = NULL;&lt;BR /&gt;message_t* volatile newMessageForB = NULL;&lt;BR /&gt;// code on A&lt;BR /&gt;void SendMessageToB(message_t* message)&lt;BR /&gt;{&lt;BR /&gt; // check for a prior message not yet taken&lt;BR /&gt; // (should seldom occur)&lt;BR /&gt; while(messageAtoB)&lt;BR /&gt; {&lt;BR /&gt; if(newMessageForB == NULL)&lt;BR /&gt; IPI(signalB);&lt;BR /&gt; _mm_pause(); // not taken (rework this code for failures)&lt;BR /&gt; }&lt;BR /&gt; messageAtoB = message;&lt;BR /&gt; if(newMessageForB == NULL)&lt;BR /&gt; IPI(signalB);&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;...&lt;BR /&gt;&lt;BR /&gt;// code on B (interrupt handler; uses the shared newMessageForB declared above)&lt;BR /&gt;IPIscan:&lt;BR /&gt;push rax&lt;BR /&gt;...&lt;BR /&gt;if(messageAtoB)&lt;BR /&gt;{&lt;BR /&gt; newMessageForB = messageAtoB;&lt;BR /&gt; messageAtoB = NULL; // A will eventually observe that we took the message&lt;BR /&gt;}&lt;BR /&gt;...&lt;BR /&gt;pop rax&lt;BR /&gt;iret&lt;BR /&gt;&lt;BR /&gt;Something along the lines of the above ought to work.&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;&lt;A href="http://www.quickthreadprogramming.com" target="_blank"&gt;www.quickthreadprogramming.com&lt;/A&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 27 May 2010 14:29:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/810699#M992</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2010-05-27T14:29:10Z</dc:date>
    </item>
    <item>
      <title>IPIs and weak memory ordering</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/810700#M993</link>
      <description>Thanks for your replies.&lt;BR /&gt;Yes, I have also seen that Linux assumes such behaviour is not possible.&lt;BR /&gt;Nevertheless, I was wondering whether this assumption is justified.&lt;BR /&gt;In other words: which part of the Software Developer's Manual guarantees that I'm allowed to assume this?&lt;BR /&gt;I suspect this information is missing from the manual and therefore suggest that it be updated.&lt;BR /&gt;&lt;BR /&gt;Just to clarify my interest in this topic:&lt;BR /&gt;I'm not just writing some code that I want to work correctly.&lt;BR /&gt;I'm developing a formal multiprocessor execution model for x86 CPUs in which I have to state formally whether such behaviour is possible or not. And I have to justify that formalisation with a reference to the Software Developer's Manual.</description>
      <pubDate>Tue, 08 Jun 2010 02:18:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/810700#M993</guid>
      <dc:creator>Michael4</dc:creator>
      <dc:date>2010-06-08T02:18:00Z</dc:date>
    </item>
    <item>
      <title>Re: IPIs and weak memory ordering</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/1585720#M8214</link>
      <description>&lt;P&gt;It's possible on some architectures, and those architectures require a barrier before the IPI is sent. See the smp_call implementation in the Linux kernel (kernel/smp.c):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="cpp"&gt;void __smp_call_single_queue(int cpu, struct llist_node *node)
{
        ...
        /*
         * The list addition should be visible to the target CPU when it pops
         * the head of the list to pull the entry off it in the IPI handler
         * because of normal cache coherency rules implied by the underlying
         * llist ops.
         *
         * If IPIs can go out of order to the cache coherency protocol
         * in an architecture, sufficient synchronisation should be added
         * to arch code to make it appear to obey cache coherency WRT
         * locking and barrier primitives. Generic code isn't really
         * equipped to do the right thing...
         */
        if (llist_add(node, &amp;amp;per_cpu(call_single_queue, cpu)))
                send_call_function_single_ipi(cpu);
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In addition, on x86 there is a fence (MFENCE+LFENCE) before an IPI is sent through the x2APIC, because WRMSR writes to the x2APIC registers do not have the usual serializing semantics of WRMSR:&lt;/P&gt;&lt;LI-CODE lang="cpp"&gt;static void x2apic_send_IPI(int cpu, int vector)
{
	u32 dest = per_cpu(x86_cpu_to_apicid, cpu);

	/* x2apic MSRs are special and need a special fence: */
	weak_wrmsr_fence();
	__x2apic_send_IPI_dest(dest, vector, APIC_DEST_PHYSICAL);
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Apr 2024 02:31:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/1585720#M8214</guid>
      <dc:creator>Changbin</dc:creator>
      <dc:date>2024-04-03T02:31:04Z</dc:date>
    </item>
    <item>
      <title>Re: IPIs and weak memory ordering</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/1587507#M8215</link>
      <description>&lt;P&gt;See my full answer here: &lt;A href="https://stackoverflow.com/questions/76352933/will-memory-write-be-visible-after-sending-an-ipi-on-x86/78264953#78264953" target="_blank"&gt;https://stackoverflow.com/questions/76352933/will-memory-write-be-visible-after-sending-an-ipi-on-x86/78264953#78264953&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 08 Apr 2024 22:08:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/1587507#M8215</guid>
      <dc:creator>Changbin</dc:creator>
      <dc:date>2024-04-08T22:08:57Z</dc:date>
    </item>
    <item>
      <title>Re: IPIs and weak memory ordering</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/1709402#M8222</link>
      <description>&lt;P&gt;Are barriers required in the case of the legacy xAPIC?&lt;/P&gt;&lt;P&gt;As the SDM points out: "(Note: The MMIO-based xAPIC interface is mapped by system software as an un-cached region. Consequently, read/writes to the xAPIC-MMIO interface have serializing semantics in the xAPIC mode.)"&lt;/P&gt;&lt;P&gt;This seems to imply that all stores issued by the IPI sender should be visible to the IPI receiver even without any explicit atomics or barriers. Is that correct?&lt;/P&gt;</description>
      <pubDate>Thu, 14 Aug 2025 04:02:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/IPIs-and-weak-memory-ordering/m-p/1709402#M8222</guid>
      <dc:creator>BSD4dot2</dc:creator>
      <dc:date>2025-08-14T04:02:48Z</dc:date>
    </item>
  </channel>
</rss>

