<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Problem when ported from single-core to multicore in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816506#M1174</link>
    <description>&amp;gt; Yes, the code may be not so nice, but it works on singlecore.&lt;BR /&gt;&lt;BR /&gt;It only appears to work most of the time.&lt;BR /&gt;&lt;BR /&gt;&amp;gt; I checked there is no compiler out-of-order optimizing&lt;BR /&gt;&lt;BR /&gt;So you intended to never compile it under release configuration, right?&lt;BR /&gt;&lt;BR /&gt;&amp;gt; and I declared m_uiTickIndex as 'volatile'.&lt;BR /&gt;&lt;BR /&gt;Volatile does not work that way. At the very least, you would need to declare ALL participating variables as volatile.&lt;BR /&gt;&lt;BR /&gt;&amp;gt; Then in my opinion, the Intel CPU will promise the executing order as the program order.&lt;BR /&gt;&lt;BR /&gt;Not quite. For example, &lt;A href="http://en.wikipedia.org/wiki/Dekker%27s_algorithm"&gt;Dekker's algorithm&lt;/A&gt; won't work on IA-32/Intel64 without explicit memory fences.&lt;BR /&gt;&lt;BR /&gt;&amp;gt; And TickRolling() updates values at index 1 first, then sets m_uiTickIndex to 1.&lt;BR /&gt;&lt;BR /&gt;For example, it can break in the following way.&lt;BR /&gt;The current time is 1,100 (high, low).&lt;BR /&gt;A reader reads it as &lt;B&gt;1,100&lt;/B&gt;.&lt;BR /&gt;On the next try, the reader reads the high part: 1. Then the time changes to 2,50. Then the reader reads the low part: 50. So the result is &lt;B&gt;1,50&lt;/B&gt;. Time does indeed go backwards.</description>
    <pubDate>Thu, 27 Jan 2011 12:14:15 GMT</pubDate>
    <dc:creator>Dmitry_Vyukov</dc:creator>
    <dc:date>2011-01-27T12:14:15Z</dc:date>
    <item>
      <title>Problem when ported from single-core to multicore</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816503#M1171</link>
      <description>&lt;PRE&gt;[bash]Hi all,&lt;BR /&gt;My program works fine in a single-core environment, but not in the following environment:&lt;BR /&gt;gcc version 4.1.0 (SUSE Linux)&lt;BR /&gt;8 * Intel Xeon CPU E5520 @ 2.27GHz&lt;BR /&gt;&lt;BR /&gt;Occasionally, of two sequential TickGet() calls, the later one returns a lower value than the earlier one:&lt;BR /&gt;(Before TickGet() returns, we record the values of 'uiIndex', 'm_uiRollingTick[0]' and 'm_uiRollingTick[1]'.)&lt;BR /&gt;(gdb) call TICK_DebugShow()&lt;BR /&gt;uiIdx=1, uiTick[0]=35849, uiTick[1]=35848. -------- last time:     get uiTick[1]=35848&lt;BR /&gt;uiIdx=0, uiTick[0]=35849, uiTick[1]=35848. -------- former time: get uiTick[0]=35849&lt;BR /&gt;uiIdx=0, uiTick[0]=35848, uiTick[1]=35847.&lt;BR /&gt;&lt;BR /&gt;I heard that Intel CPUs are conservative and ordered. So what causes this problem, and how?&lt;BR /&gt;&lt;BR /&gt;CODE:&lt;BR /&gt;TICK runs a thread that updates the rolling tick (using TickRolling) at regular intervals.&lt;BR /&gt;It provides an interface, TickGet(), for other threads to get the current tick.&lt;BR /&gt;We use a read buffer, m_uiRollingTick[1], to avoid using a lock.&lt;BR /&gt;&lt;BR /&gt;unsigned int m_uiRollingTickHigh[2];
unsigned int m_uiRollingTick[2];&lt;BR /&gt;volatile unsigned int m_uiTickIndex;

int TickGet(unsigned int *puiHigh, unsigned int *puiLow)
{
	unsigned int uiIndex;
	
	uiIndex = m_uiTickIndex;
	*puiHigh = m_uiRollingTickHigh[uiIndex];
	*puiLow  = m_uiRollingTick[uiIndex];

	return 0;
}

void TickRolling(unsigned int uiMillSec)
{
	unsigned int uiRollingTickAndLost;
	unsigned int uiLostTicks = uiMillSec/1000;
                                                                 
	m_uiRollingTickHigh[1]  = m_uiRollingTickHigh[0];
	m_uiRollingTick[1]      = m_uiRollingTick[0];
	m_uiTickIndex = 1;

	uiRollingTickAndLost = m_uiRollingTick[0] + uiLostTicks;
	if(m_uiRollingTick[0] &amp;gt; uiRollingTickAndLost)
	{
		m_uiRollingTickHigh[0]++;
	}
	m_uiRollingTick[0] += uiLostTicks;
	m_uiTickIndex = 0;
}&lt;BR /&gt;&lt;BR /&gt;Thanks &amp;amp; Regards&lt;BR /&gt;&lt;BR /&gt;Hyphone&lt;BR /&gt;[/bash]&lt;/PRE&gt;</description>
      <pubDate>Wed, 26 Jan 2011 05:43:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816503#M1171</guid>
      <dc:creator>hyphone</dc:creator>
      <dc:date>2011-01-26T05:43:00Z</dc:date>
    </item>
    <item>
      <title>Problem when ported from single-core to multicore</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816504#M1172</link>
      <description>The code is dead wrong; it's difficult to enumerate all the problems.&lt;BR /&gt;For example, TickGet() can read the high part from one index and then the low part from another index. Or TickRolling() can set m_uiTickIndex to 1 and only then update the values at index 1. The accesses are not atomic.&lt;BR /&gt;It's not only the CPU that reorders accesses; the compiler can do it as well.&lt;BR /&gt;The code does not work on a single-core CPU either; you were just lucky.&lt;BR /&gt;&lt;BR /&gt;The easiest thing to do is to use atomic 64-bit loads and stores. Then you do not need all that code: just atomically store the new value and atomically read the current value.&lt;BR /&gt;&lt;BR /&gt;And do read &lt;A href="http://www.1024cores.net/home/lock-free-algorithms/so-what-is-a-memory-model-and-how-to-cook-it"&gt;So what is a memory model? And how to cook it?&lt;/A&gt;</description>
      <pubDate>Wed, 26 Jan 2011 07:59:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816504#M1172</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2011-01-26T07:59:51Z</dc:date>
    </item>
    <item>
      <title>Problem when ported from single-core to multicore</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816505#M1173</link>
      <description>&lt;BR /&gt;Dmitriy, thank you for your reply.&lt;BR /&gt;&lt;BR /&gt;Yes, the code may be not so nice, but it works on singlecore.&lt;BR /&gt;I checked there is no compiler out-of-order optimizing and I declared m_uiTickIndex as 'volatile'.&lt;BR /&gt;Then in my opinion, the Intel CPU will promise the executing order as the program order.&lt;BR /&gt;&lt;BR /&gt;So TickGet() reads m_uiTickIndex first, and there is no partial read.&lt;BR /&gt;And TickRolling() updates values at index 1 first, then sets m_uiTickIndex to 1.&lt;BR /&gt;&lt;BR /&gt;PS: I run the program as a DEBUG build, with no optimization.&lt;BR /&gt;(gdb) disass TickRolling&lt;BR /&gt;Dump of assembler code for function TickRolling:&lt;BR /&gt;0x08048633 &lt;TickRolling+0&gt;: push %ebp&lt;BR /&gt;0x08048634 &lt;TickRolling+1&gt;: mov %esp,%ebp&lt;BR /&gt;0x08048636 &lt;TickRolling+3&gt;: sub $0x10,%esp&lt;BR /&gt;0x08048639 &lt;TickRolling+6&gt;: movl $0x0,0xfffffff8(%ebp)&lt;BR /&gt;0x08048640 &lt;TickRolling+13&gt;: mov 0x8049128,%edx&lt;BR /&gt;0x08048646 &lt;TickRolling+19&gt;: mov 0x8(%ebp),%eax&lt;BR /&gt;0x08048649 &lt;TickRolling+22&gt;: mov %edx,%ecx&lt;BR /&gt;0x0804864b &lt;TickRolling+24&gt;: mov $0x0,%edx&lt;BR /&gt;0x08048650 &lt;TickRolling+29&gt;: div %ecx&lt;BR /&gt;0x08048652 &lt;TickRolling+31&gt;: mov %eax,0xfffffffc(%ebp)&lt;BR /&gt;0x08048655 &lt;TickRolling+34&gt;: mov 0x8049168,%eax&lt;BR /&gt;0x0804865a &lt;TickRolling+39&gt;: mov %eax,0x804916c&lt;BR /&gt;0x0804865f &lt;TickRolling+44&gt;: mov 0x8049160,%eax&lt;BR /&gt;0x08048664 &lt;TickRolling+49&gt;: mov %eax,0x8049164&lt;BR /&gt;0x08048669 &lt;TickRolling+54&gt;: movl $0x1,0x8049170&lt;BR /&gt;0x08048673 &lt;TickRolling+64&gt;: mov 0x8049160,%eax&lt;BR /&gt;0x08048678 &lt;TickRolling+69&gt;: add 0xfffffffc(%ebp),%eax&lt;BR /&gt;0x0804867b &lt;TickRolling+72&gt;: mov %eax,0xfffffff8(%ebp)&lt;BR /&gt;0x0804867e &lt;TickRolling+75&gt;: mov 0x8049160,%eax&lt;BR /&gt;0x08048683 &lt;TickRolling+80&gt;: cmp 0xfffffff8(%ebp),%eax&lt;BR /&gt;0x08048686 &lt;TickRolling+83&gt;: jbe 0x8048695 &lt;TickRolling+98&gt;&lt;BR /&gt;0x08048688 &lt;TickRolling+85&gt;: mov 0x8049168,%eax&lt;BR /&gt;0x0804868d &lt;TickRolling+90&gt;: add $0x1,%eax&lt;BR /&gt;0x08048690 &lt;TickRolling+93&gt;: mov %eax,0x8049168&lt;BR /&gt;0x08048695 &lt;TickRolling+98&gt;: mov 0x8049160,%eax&lt;BR /&gt;0x0804869a &lt;TickRolling+103&gt;: add 0xfffffffc(%ebp),%eax&lt;BR /&gt;0x0804869d &lt;TickRolling+106&gt;: mov %eax,0x8049160&lt;BR /&gt;0x080486a2 &lt;TickRolling+111&gt;: movl $0x0,0x8049170&lt;BR /&gt;0x080486ac &lt;TickRolling+121&gt;: leave &lt;BR /&gt;0x080486ad &lt;TickRolling+122&gt;: ret &lt;BR /&gt;End of assembler dump.&lt;BR /&gt;&lt;BR /&gt;And thanks for your recommendation, I will read the article carefully.</description>
      <pubDate>Thu, 27 Jan 2011 02:05:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816505#M1173</guid>
      <dc:creator>hyphone</dc:creator>
      <dc:date>2011-01-27T02:05:33Z</dc:date>
    </item>
    <item>
      <title>Problem when ported from single-core to multicore</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816506#M1174</link>
      <description>&amp;gt; Yes, the code may be not so nice, but it works on singlecore.&lt;BR /&gt;&lt;BR /&gt;It only appears to work most of the time.&lt;BR /&gt;&lt;BR /&gt;&amp;gt; I checked there is no compiler out-of-order optimizing&lt;BR /&gt;&lt;BR /&gt;So you intended to never compile it under release configuration, right?&lt;BR /&gt;&lt;BR /&gt;&amp;gt; and I declared m_uiTickIndex as 'volatile'.&lt;BR /&gt;&lt;BR /&gt;Volatile does not work that way. At the very least, you would need to declare ALL participating variables as volatile.&lt;BR /&gt;&lt;BR /&gt;&amp;gt; Then in my opinion, the Intel CPU will promise the executing order as the program order.&lt;BR /&gt;&lt;BR /&gt;Not quite. For example, &lt;A href="http://en.wikipedia.org/wiki/Dekker%27s_algorithm"&gt;Dekker's algorithm&lt;/A&gt; won't work on IA-32/Intel64 without explicit memory fences.&lt;BR /&gt;&lt;BR /&gt;&amp;gt; And TickRolling() updates values at index 1 first, then sets m_uiTickIndex to 1.&lt;BR /&gt;&lt;BR /&gt;For example, it can break in the following way.&lt;BR /&gt;The current time is 1,100 (high, low).&lt;BR /&gt;A reader reads it as &lt;B&gt;1,100&lt;/B&gt;.&lt;BR /&gt;On the next try, the reader reads the high part: 1. Then the time changes to 2,50. Then the reader reads the low part: 50. So the result is &lt;B&gt;1,50&lt;/B&gt;. Time does indeed go backwards.</description>
      <pubDate>Thu, 27 Jan 2011 12:14:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816506#M1174</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2011-01-27T12:14:15Z</dc:date>
    </item>
    <item>
      <title>Problem when ported from single-core to multicore</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816507#M1175</link>
      <description>I'm afraid I didn't explain my point clearly.&lt;BR /&gt;&lt;BR /&gt;First, I think this code is wrong too.&lt;BR /&gt;And I see that if the current time is 1,0xffffffff (high, low), a reader can read the high part, 1, with index 0.&lt;BR /&gt;Then the time changes to 2,0, and the reader reads the low part, 0.&lt;BR /&gt;So the result is &lt;STRONG&gt;1,0 and time goes back&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;But as I ran the DEBUG version, I got a (0,35849) followed by (0,35848).&lt;BR /&gt;I just don't know how this problem comes about, that is, what the program's execution flow looks like from a global view.&lt;BR /&gt;&lt;BR /&gt;BTW, if I use spin_lock in TickGet and TickRolling, the problem is gone.&lt;BR /&gt;&lt;BR /&gt;The do-while loop below also does the job.&lt;BR /&gt;Is that the kind of lock-free protection mentioned in your article?&lt;BR /&gt;&lt;P&gt;int VOS_TickGet(unsigned int *puiHigh, unsigned int *puiLow)&lt;BR /&gt;{&lt;BR /&gt;unsigned int uiIndex;&lt;/P&gt;&lt;P&gt; do {&lt;BR /&gt; uiIndex = m_uiTickIndex;&lt;BR /&gt; *puiHigh = m_uiRollingTickHigh[uiIndex];&lt;BR /&gt; *puiLow = m_uiRollingTick[uiIndex];&lt;BR /&gt; } while (uiIndex != m_uiTickIndex);&lt;/P&gt;&lt;P&gt; return 0;&lt;BR /&gt;}&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jan 2011 02:42:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816507#M1175</guid>
      <dc:creator>hyphone</dc:creator>
      <dc:date>2011-01-28T02:42:52Z</dc:date>
    </item>
    <item>
      <title>Problem when ported from single-core to multicore</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816508#M1176</link>
      <description>&lt;P&gt;&amp;gt; But as I ran the DEBUG version, I got a (0,35849) followed by (0,35848).&lt;/P&gt;
&lt;P&gt;If you are interested in exactly how this is possible under a sequentially consistent memory model, follow me.&lt;/P&gt;&lt;P&gt;Below is the sequence of modifications to the variables during one update operation:&lt;/P&gt;&lt;P&gt;&lt;PRE&gt;[cpp]time    index     tick[0]     tick[1]
0       0         10          5
1       0         10          10
2       1         10          10
3       1         15          10
4       0         15          10[/cpp]&lt;/PRE&gt;&lt;/P&gt;&lt;BR /&gt;The first read operation starts at time=0.&lt;BR /&gt;The thread reads index=0 (time=0).&lt;BR /&gt;Then time advances to time=3.&lt;BR /&gt;Then the thread reads tick[0]=15 (time=3).&lt;BR /&gt;&lt;BR /&gt;The second read operation starts at time=3.&lt;BR /&gt;The thread reads index=1 (time=3).&lt;BR /&gt;Then the thread reads tick[1]=10 (time=3).&lt;BR /&gt;&lt;BR /&gt;So, indeed, in two consecutive reads under a sequentially consistent memory model, a thread observes a tick value of 15 and then 10. Welcome to concurrent programming!</description>
      <pubDate>Fri, 28 Jan 2011 08:37:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816508#M1176</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2011-01-28T08:37:43Z</dc:date>
    </item>
    <item>
      <title>Problem when ported from single-core to multicore</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816509#M1177</link>
      <description>By the way, to model such things you can use my &lt;A href="http://www.1024cores.net/home/relacy-race-detector/rrd-introduction"&gt;Relacy Race Detector&lt;/A&gt;. It shows the precise execution history that leads to any possible outcome you are interested in.</description>
      <pubDate>Fri, 28 Jan 2011 08:40:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816509#M1177</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2011-01-28T08:40:16Z</dc:date>
    </item>
    <item>
      <title>Problem when ported from single-core to multicore</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816510#M1178</link>
      <description>&lt;P&gt;Thank you for your replies.&lt;BR /&gt;I'll try the &lt;A href="http://www.1024cores.net/home/relacy-race-detector/rrd-introduction"&gt;Relacy Race Detector&lt;/A&gt; tool to see the precise execution flow.&lt;/P&gt;</description>
      <pubDate>Sun, 30 Jan 2011 05:46:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816510#M1178</guid>
      <dc:creator>hyphone</dc:creator>
      <dc:date>2011-01-30T05:46:39Z</dc:date>
    </item>
    <item>
      <title>Problem when ported from single-core to multicore</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816511#M1179</link>
      <description>As you can see, it can equally happen on a single-core machine. It is not related to multicore; it's just a multithreading bug. As I said, the simplest way to fix it is to use 64-bit atomic loads and stores.</description>
      <pubDate>Mon, 31 Jan 2011 08:07:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Problem-when-ported-from-single-core-to-multicore/m-p/816511#M1179</guid>
      <dc:creator>Dmitry_Vyukov</dc:creator>
      <dc:date>2011-01-31T08:07:30Z</dc:date>
    </item>
  </channel>
</rss>

