<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Lockfree_mpmc and scalability ... in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813710#M1094</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have compiled push.pas into two executables, push1.exe (one thread) and push4.exe (four threads). Here they are:&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/push1.exe"&gt;http://pages.videotron.com/aminer/push1.exe&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/push4.exe"&gt;http://pages.videotron.com/aminer/push4.exe&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If you have an x86 computer with an L3 cache and four or more cores, could you please run these two programs and send me their output?&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;&lt;P&gt;Amine Moulay Ramdane.&lt;/P&gt;</description>
    <pubDate>Mon, 28 May 2012 22:54:08 GMT</pubDate>
    <dc:creator>aminer10</dc:creator>
    <dc:date>2012-05-28T22:54:08Z</dc:date>
    <item>
      <title>Lockfree_mpmc and scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813708#M1092</link>
      <description>&lt;DIV&gt;Hello all,&lt;BR /&gt;&lt;BR /&gt;I have finally found out why lockfree_mpmc doesn't scale.&lt;BR /&gt;&lt;BR /&gt;You can get the source code of lockfree_mpmc from:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://pages.videotron.com/aminer/" target="_blank"&gt;http://pages.videotron.com/aminer/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Please follow along with me.&lt;BR /&gt;&lt;BR /&gt;If you look at the lockfree_mpmc Object Pascal source code, you will see this on the push side:&lt;BR /&gt;&lt;BR /&gt;---&lt;BR /&gt;&lt;BR /&gt;function TLockfree_MPMC.push(tm : tNodeQueue):boolean;&lt;BR /&gt;var lasttail,newtemp:longword;&lt;BR /&gt;i,j:integer;&lt;BR /&gt;begin&lt;BR /&gt;&lt;BR /&gt;if getlength &amp;gt;= fsize&lt;BR /&gt;then&lt;BR /&gt; begin&lt;BR /&gt; result:=false;&lt;BR /&gt; exit;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;result:=true;&lt;BR /&gt;&lt;BR /&gt;newTemp:=LockedIncLong(temp);&lt;BR /&gt;lastTail:=newTemp-1;&lt;BR /&gt;&lt;BR /&gt;setObject(lastTail,tm);&lt;BR /&gt;&lt;BR /&gt;repeat&lt;BR /&gt;if CAS(tail,lasttail,newtemp)&lt;BR /&gt;then&lt;BR /&gt; begin&lt;BR /&gt; exit;&lt;BR /&gt; end;&lt;BR /&gt;asm pause end;&lt;BR /&gt;until false;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;---&lt;BR /&gt;&lt;BR /&gt;When I tested the push() side with 4 threads, I noticed that lockfree_mpmc doesn't scale at all. In fact, I got retrograde throughput, meaning less throughput with 4 threads than in the single-thread test.
The reason is this: in a single-thread test, the CAS reads and updates the shared variables in the level 1 cache, which is fast, but with 4 threads it becomes much slower, because those variables have to be read and updated through the L2 cache and main memory.&lt;BR /&gt;&lt;BR /&gt;I have tried playing with the affinity mask, and I found that when I run two threads on different cores that share the same level 2 cache, it scales a little better: I got more throughput with those two threads than in the single-thread test.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;I have also modified lockfree_mpmc so that it doesn't execute the CAS, and therefore doesn't request exclusive ownership of the cache line holding tail, while tail and lasttail are not equal. I did this with the following code inside the repeat ... until loop:&lt;BR /&gt;&lt;BR /&gt;if tail &amp;lt;&amp;gt; lasttail&lt;BR /&gt;then&lt;BR /&gt;begin&lt;BR /&gt;continue;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;and this method does give better performance.&lt;BR /&gt;&lt;BR /&gt;Here is the final code of the push() side of lockfree_mpmc; I think I will modify the pop() side the same way:&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;DIV&gt;---&lt;BR /&gt;function TLockfree_MPMC.push(tm : tNodeQueue):boolean;&lt;BR /&gt;var lasttail,newtemp:longword;&lt;BR /&gt;i,j:integer;&lt;BR /&gt;begin&lt;BR /&gt;&lt;BR /&gt;if getlength &amp;gt;= fsize&lt;BR /&gt;then&lt;BR /&gt; begin&lt;BR /&gt; result:=false;&lt;BR /&gt; exit;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;result:=true;&lt;BR /&gt;&lt;BR /&gt;{ atomically increment temp; the reserved slot is newTemp-1 }&lt;BR /&gt;newTemp:=LockedIncLong(temp);&lt;BR /&gt;lastTail:=newTemp-1;&lt;BR /&gt;&lt;BR /&gt;{ store the item in the reserved slot }&lt;BR /&gt;setObject(lastTail,tm);&lt;BR /&gt;&lt;BR /&gt;repeat&lt;BR /&gt;&lt;BR /&gt;{ while earlier producers have not yet advanced tail to our slot, spin on a plain read instead of a locked CAS }&lt;BR /&gt;if tail &amp;lt;&amp;gt; lasttail&lt;BR /&gt;then&lt;BR /&gt;begin&lt;BR /&gt; continue;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;{ publish our slot by advancing tail from lastTail to newTemp }&lt;BR /&gt;if CAS(tail,lasttail,newtemp)
&lt;BR /&gt;then&lt;BR /&gt; begin&lt;BR /&gt; exit;&lt;BR /&gt; end;&lt;BR /&gt;asm pause end;&lt;BR /&gt;until false;&lt;BR /&gt;end;&lt;BR /&gt;---&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;DIV&gt;But as I said before, lockfree_mpmc doesn't scale when we use different cores that do NOT share the same cache. That means that on my Intel Core 2 Quad Q6600 it scales only when we use 2 threads on different cores that share the same level 2 cache.&lt;BR /&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;DIV&gt;Thank you.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Amine Moulay Ramdane.&lt;/DIV&gt;</description>
      <pubDate>Mon, 28 May 2012 19:47:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813708#M1092</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2012-05-28T19:47:43Z</dc:date>
    </item>
    <item>
      <title>Lockfree_mpmc and scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813709#M1093</link>
      <description>Hello,&lt;BR /&gt;&lt;BR /&gt;I have only tested lockfree_mpmc on an Intel Core 2 Quad Q6600, which doesn't have an L3 cache, but perhaps lockfree_mpmc will scale on an x86 computer that has one.&lt;BR /&gt;&lt;BR /&gt;I haven't tested it with an L3 cache, so could you please do it for me if you have an x86 computer with an L3 cache and four or more cores?&lt;BR /&gt;&lt;BR /&gt;Just download Lockfree MPMC and SPMC FIFO queues version 1.12 from &lt;A href="http://pages.videotron.com/aminer/"&gt;http://pages.videotron.com/aminer/&lt;/A&gt; and look inside the zip file: I have included push.pas and pop.pas tests. Open, for example, the push.pas test, run it with a single thread by setting the variable a to 1, then with 4 threads by setting it to 4, and email me the throughput for 1 and 4 threads.&lt;BR /&gt;&lt;BR /&gt;Thank you.&lt;BR /&gt;&lt;BR /&gt;Amine Moulay Ramdane.</description>
      <pubDate>Mon, 28 May 2012 20:39:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813709#M1093</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2012-05-28T20:39:40Z</dc:date>
    </item>
    <item>
      <title>Lockfree_mpmc and scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813710#M1094</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have compiled push.pas into two executables, push1.exe (one thread) and push4.exe (four threads). Here they are:&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/push1.exe"&gt;http://pages.videotron.com/aminer/push1.exe&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/push4.exe"&gt;http://pages.videotron.com/aminer/push4.exe&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If you have an x86 computer with an L3 cache and four or more cores, could you please run these two programs and send me their output?&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;&lt;P&gt;Amine Moulay Ramdane.&lt;/P&gt;</description>
      <pubDate>Mon, 28 May 2012 22:54:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813710#M1094</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2012-05-28T22:54:08Z</dc:date>
    </item>
    <item>
      <title>Lockfree_mpmc and scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813711#M1095</link>
      <description>Hello,&lt;BR /&gt;&lt;BR /&gt;I have received benchmarks from some people who have an L3 cache, and I have noticed that lockfree_mpmc doesn't scale with an L3 cache either.&lt;BR /&gt;Do you know why this lock-free FIFO doesn't scale? Look at the following code on the push() side:&lt;BR /&gt;&lt;BR /&gt;---&lt;BR /&gt;&lt;BR /&gt;function TLockfree_MPMC.push(tm : tNodeQueue):boolean;&lt;BR /&gt;var lasttail,newtemp:longword;&lt;BR /&gt;i,j:integer;&lt;BR /&gt;begin&lt;BR /&gt;&lt;BR /&gt;if getlength &amp;gt;= fsize&lt;BR /&gt;then&lt;BR /&gt;begin&lt;BR /&gt;result:=false;&lt;BR /&gt;exit;&lt;BR /&gt;end;&lt;BR /&gt;result:=true;&lt;BR /&gt;newTemp:=LockedIncLong(temp);&lt;BR /&gt;&lt;BR /&gt;lastTail:=newTemp-1;&lt;BR /&gt;setObject(lastTail,tm);&lt;BR /&gt;&lt;BR /&gt;repeat&lt;BR /&gt;&lt;BR /&gt;if CAS(tail,lasttail,newtemp)&lt;BR /&gt;then&lt;BR /&gt;begin&lt;BR /&gt;exit;&lt;BR /&gt;end;&lt;BR /&gt;asm pause end;&lt;BR /&gt;&lt;BR /&gt;until false;&lt;BR /&gt;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;---&lt;BR /&gt;&lt;BR /&gt;There are two things to look at:&lt;BR /&gt;&lt;BR /&gt;[1] newTemp:=LockedIncLong(temp);&lt;BR /&gt;&lt;BR /&gt;[2] CAS(tail,lasttail,newtemp)&lt;BR /&gt;&lt;BR /&gt;In the 4-thread scenario, as you can see, in [1] temp has to be loaded from the shared L3 cache on computers that have one (another core usually modified it last), but on my Intel Core 2 Quad Q6600, which doesn't have an L3 cache (just an L2 cache for every two cores), it has to be loaded from memory. So the four-thread test is a little bit slower with an L3 cache, and much slower without one, than the single-thread version, which loads the values from the L1 cache.
That's the same for [2]: tail has to be loaded the same way.&lt;BR /&gt;&lt;BR /&gt;That's why I am getting retrograde throughput on my Intel Core 2 Quad Q6600, and almost the same throughput as the single thread on a computer with an L3 cache.&lt;BR /&gt;&lt;BR /&gt;In the two-thread scenario, you have to do a load from the shared L2 cache in [1] and [2], and these loads make the S (serial) part of Amdahl's equation much bigger than the P (parallel) part; that's why the two-thread version doesn't scale either.&lt;BR /&gt;&lt;BR /&gt;So in general I think it's not possible to make lock-free FIFO queues scale on x86 when the lock-free code shares variables between the cores, because sharing variables is so expensive.&lt;BR /&gt;&lt;BR /&gt;Thank you.&lt;BR /&gt;&lt;BR /&gt;Amine Moulay Ramdane.</description>
      <pubDate>Tue, 29 May 2012 19:05:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813711#M1095</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2012-05-29T19:05:15Z</dc:date>
    </item>
    <item>
      <title>Lockfree_mpmc and scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813712#M1096</link>
      <description>&lt;DIV&gt;Hello,&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Of course, I was speaking about the x86 architecture.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Amine Moulay Ramdane.&lt;/DIV&gt;</description>
      <pubDate>Tue, 29 May 2012 19:08:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813712#M1096</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2012-05-29T19:08:07Z</dc:date>
    </item>
    <item>
      <title>Lockfree_mpmc and scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813713#M1097</link>
      <description>&lt;DIV&gt;I have corrected some typos; please read it again.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;I have received benchmarks from some people who have an L3 cache, and I have noticed that lockfree_mpmc doesn't scale with an L3 cache either.&lt;BR /&gt;Do you know why this lock-free FIFO doesn't scale? Look at the following code on the push() side:&lt;BR /&gt;&lt;BR /&gt;---&lt;BR /&gt;&lt;BR /&gt;function TLockfree_MPMC.push(tm : tNodeQueue):boolean;&lt;BR /&gt;var lasttail,newtemp:longword;&lt;BR /&gt;i,j:integer;&lt;BR /&gt;begin&lt;BR /&gt;&lt;BR /&gt;if getlength &amp;gt;= fsize&lt;BR /&gt;then&lt;BR /&gt;begin&lt;BR /&gt;result:=false;&lt;BR /&gt;exit;&lt;BR /&gt;end;&lt;BR /&gt;result:=true;&lt;BR /&gt;newTemp:=LockedIncLong(temp);&lt;BR /&gt;&lt;BR /&gt;lastTail:=newTemp-1;&lt;BR /&gt;setObject(lastTail,tm);&lt;BR /&gt;&lt;BR /&gt;repeat&lt;BR /&gt;&lt;BR /&gt;if CAS(tail,lasttail,newtemp)&lt;BR /&gt;then&lt;BR /&gt;begin&lt;BR /&gt;exit;&lt;BR /&gt;end;&lt;BR /&gt;asm pause end;&lt;BR /&gt;&lt;BR /&gt;until false;&lt;BR /&gt;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;---&lt;BR /&gt;&lt;BR /&gt;There are two things to look at:&lt;BR /&gt;&lt;BR /&gt;[1] newTemp:=LockedIncLong(temp);&lt;BR /&gt;&lt;BR /&gt;[2] CAS(tail,lasttail,newtemp)&lt;BR /&gt;&lt;BR /&gt;In the 4-thread scenario, as you can see, in [1] temp has to be loaded from the L3 cache on computers that have an L3 cache, but on my Intel Core 2 Quad Q6600, which doesn't have an L3 cache (just an L2 cache for every two cores), I think it has to be loaded from memory. So the four-thread test is a little bit slower with an L3 cache than the single-thread version, which loads the values from the L1 cache, and much slower on a computer without an L3 cache.
That's the same for [2]: tail has to be loaded the same way.&lt;BR /&gt;&lt;BR /&gt;That's why I am getting retrograde throughput with four threads on my Intel Core 2 Quad Q6600, and almost the same throughput as the single thread on a computer with an L3 cache.&lt;BR /&gt;&lt;BR /&gt;In the two-thread scenario, you have to do a load from the shared L2 cache in [1] and [2], and these loads make the S (serial) part of Amdahl's equation much bigger than the P (parallel) part; that's why the two-thread version doesn't scale either.&lt;BR /&gt;&lt;BR /&gt;So in general I think it's not possible to make lock-free FIFO queues scale when the lock-free code shares variables between the cores, because sharing variables is so expensive.&lt;BR /&gt;&lt;BR /&gt;Thank you.&lt;BR /&gt;&lt;BR /&gt;Amine Moulay Ramdane.&lt;/DIV&gt;</description>
      <pubDate>Tue, 29 May 2012 19:22:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813713#M1097</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2012-05-29T19:22:08Z</dc:date>
    </item>
    <item>
      <title>Lockfree_mpmc and scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813714#M1098</link>
      <description>&lt;DIV&gt;I wrote:&lt;/DIV&gt;&lt;DIV&gt;&amp;gt; In the two-thread scenario, you have to do a load&lt;BR /&gt;&amp;gt; from the shared L2 cache in [1] and [2], and these loads make&lt;BR /&gt;&amp;gt; the S (serial) part of Amdahl's equation much bigger than&lt;BR /&gt;&amp;gt; the P (parallel) part; that's why the two-thread version doesn't scale&lt;BR /&gt;&amp;gt; either.&lt;BR /&gt;&amp;gt;&lt;BR /&gt;&amp;gt; So in general I think it's not possible to make lock-free&lt;BR /&gt;&amp;gt; FIFO queues scale when the lock-free code shares variables&lt;BR /&gt;&amp;gt; between the cores, because sharing variables is so expensive.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;I mean that the CAS and the sharing of variables make the S part of lockfree_mpmc much bigger than the P part, and by Amdahl's equation this makes lockfree_mpmc non-scalable.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;So in general I think it's not possible to make lock-free FIFO queues scale when the lock-free code shares variables between the cores and uses CAS, because sharing variables and using CAS are so expensive.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Thank you.&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Amine Moulay Ramdane.&lt;/DIV&gt;</description>
      <pubDate>Tue, 29 May 2012 19:46:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813714#M1098</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2012-05-29T19:46:36Z</dc:date>
    </item>
    <item>
      <title>Lockfree_mpmc and scalability ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813715#M1099</link>
      <description>&lt;DIV&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;Even the following code inside the push() method contributes to the problem:&lt;BR /&gt;&lt;BR /&gt;if getlength &amp;gt;= fsize&lt;BR /&gt;then&lt;BR /&gt;begin&lt;BR /&gt;result:=false;&lt;BR /&gt;exit;&lt;BR /&gt;end;&lt;BR /&gt;&lt;BR /&gt;As you may have noticed, the getlength() method shares variables between the cores, which makes the 4-thread test much slower than the single-thread test on the Intel Core 2 Quad Q6600. I have tested it on my computer: it makes the test much slower because cache-to-cache transfers are costly on the Intel Core 2 Quad Q6600, whereas on newer architectures that have an L3 cache and HyperTransport it gives the same throughput with one thread and with four threads; in other words, it doesn't scale with four threads.&lt;BR /&gt;&lt;BR /&gt;So the following parts share variables between the cores and make the 4-thread test much slower than the single-thread test on the Intel Core 2 Quad Q6600:&lt;BR /&gt;&lt;BR /&gt;[1] getlength()&lt;BR /&gt;&lt;BR /&gt;[2] newTemp:=LockedIncLong(temp);&lt;BR /&gt;&lt;BR /&gt;[3] CAS(tail,lasttail,newtemp)&lt;BR /&gt;&lt;BR /&gt;That's why I told you that CAS and sharing variables are expensive.&lt;BR /&gt;&lt;BR /&gt;Thank you.&lt;BR /&gt;&lt;BR /&gt;Amine Moulay Ramdane.&lt;/DIV&gt;</description>
      <pubDate>Tue, 29 May 2012 22:08:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Lockfree-mpmc-and-scalability/m-p/813715#M1099</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2012-05-29T22:08:58Z</dc:date>
    </item>
  </channel>
</rss>

