<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic My Parallel Sort Library and benchmarks ... in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824701#M1351</link>
    <description>Tudor wrote:&lt;BR /&gt;&amp;gt;1 core - 35.188 sec&lt;BR /&gt;&amp;gt;2 cores - 13.382 sec&lt;BR /&gt;&amp;gt;that's almost 3x speedup for double the number of cores.&lt;BR /&gt;&lt;BR /&gt;Don't forget that I am using an Intel Core 2 Quad, so&lt;BR /&gt;CPU-Z gave me: *2* x 4096 KBytes of Level 2 cache.&lt;BR /&gt;&lt;BR /&gt;So, on four cores I am sorting four chunks of the big array&lt;BR /&gt;with just *2* Level 2 caches :)&lt;BR /&gt;&lt;BR /&gt;Amine.</description>
    <pubDate>Fri, 07 May 2010 19:16:43 GMT</pubDate>
    <dc:creator>aminer10</dc:creator>
    <dc:date>2010-05-07T19:16:43Z</dc:date>
    <item>
      <title>My Parallel Sort Library and benchmarks ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824697#M1347</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;&lt;P&gt;I have used my Parallel Sort Library to benchmark an Object Pascal&lt;BR /&gt;program that sorts a dynamic array of 10 million strings.&lt;/P&gt;&lt;P&gt;Please look at the benchmarks on the following page:&lt;/P&gt;&lt;P&gt;&lt;A href="http://pages.videotron.com/aminer/parallelsort/parallelsort.htm"&gt;http://pages.videotron.com/aminer/parallelsort/parallelsort.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;With four threads on four cores, here are the results:&lt;BR /&gt;&lt;BR /&gt;Parallel Quicksort gives a 5.31x speedup and takes 2.877 seconds&lt;BR /&gt;Parallel Heapsort gives a 4.72x speedup and takes 7.452 seconds&lt;/P&gt;&lt;P&gt;Note: Parallel Quicksort is much faster in practice than parallel heapsort&lt;BR /&gt;or parallel mergesort.&lt;/P&gt;&lt;P&gt;Sincerely,&lt;BR /&gt;Amine Moulay Ramdane.&lt;/P&gt;</description>
      <pubDate>Fri, 07 May 2010 08:09:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824697#M1347</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-05-07T08:09:55Z</dc:date>
    </item>
    <item>
      <title>My Parallel Sort Library and benchmarks ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824698#M1348</link>
      <description>Are you sure the algorithms perform normally when there is only 1 core? It seems you are getting superlinear speedup for every configuration compared to 1 core. That seems very strange.&lt;BR /&gt;For example, I extracted this from your tests:&lt;BR /&gt;Heapsort:&lt;BR /&gt;2 cores - 13.382 sec&lt;BR /&gt;4 cores - 7.453 sec&lt;BR /&gt;This is normal, a bit below 2x speedup. But:&lt;BR /&gt;1 core - 35.188 sec&lt;BR /&gt;2 cores - 13.382 sec&lt;BR /&gt;That's almost 3x speedup for double the number of cores.</description>
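      <!-- Editorial sketch: the ratios quoted in this post can be checked with a few lines of Python. The timings are the heapsort figures given above; the helper name `speedup` and the dictionary name are illustrative assumptions, not part of the Parallel Sort Library.

```python
def speedup(t_before, t_after):
    """Return how many times faster the run taking t_after is than the one taking t_before."""
    return t_before / t_after

# Heapsort timings quoted in the post, in seconds, keyed by core count.
heapsort_seconds = {1: 35.188, 2: 13.382, 4: 7.453}

one_to_two = speedup(heapsort_seconds[1], heapsort_seconds[2])
two_to_four = speedup(heapsort_seconds[2], heapsort_seconds[4])

print(f"1 to 2 cores: {one_to_two:.2f}x")
print(f"2 to 4 cores: {two_to_four:.2f}x")
```

Any ratio above 2x for a doubling of cores is superlinear, which is the point the post raises about the step from one core to two.
      -->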
      <pubDate>Fri, 07 May 2010 10:43:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824698#M1348</guid>
      <dc:creator>Tudor</dc:creator>
      <dc:date>2010-05-07T10:43:39Z</dc:date>
    </item>
    <item>
      <title>My Parallel Sort Library and benchmarks ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824699#M1349</link>
      <description>Tudor wrote:&lt;BR /&gt;&amp;gt;Are you sure the algorithms perform normally when there is only 1 core? It seems you are getting superlinear speedup for every configuration compared to 1 core. That seems very strange.&lt;BR /&gt;&lt;BR /&gt;I think it's the 'divide and conquer' that gives this speedup, due&lt;BR /&gt;to the fact that I am sorting 'parts' of the array in parallel, and in&lt;BR /&gt;every part of the array I am applying quicksort or heapsort...&lt;BR /&gt;&lt;BR /&gt;So, if we have 80 elements, single-threaded the average complexity&lt;BR /&gt;will be 80 ln(80), which gives 350.56, but multithreaded with four threads&lt;BR /&gt;on four cores it will be 4(20 ln(20)) = 239.66, and we are also running&lt;BR /&gt;the quicksort or heapsort in parallel on every part of the array, which is why&lt;BR /&gt;it gives better than 4x.&lt;BR /&gt;&lt;BR /&gt;:)&lt;BR /&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;Amine.</description>
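      <!-- Editorial sketch: the arithmetic in this post can be reproduced as below, assuming the n ln(n) average-case cost model the post uses. The function name `nlogn_cost` and the variable names are hypothetical. The sketch counts only the per-chunk sorts, as the post does, and leaves out any step that combines the sorted chunks.

```python
import math

def nlogn_cost(n):
    """Average-case cost estimate n * ln(n), the model used in the post."""
    return n * math.log(n)

n, threads = 80, 4
chunk = n // threads  # 20 elements per thread

single = nlogn_cost(n)                        # one thread sorts all 80 elements
total_chunked = threads * nlogn_cost(chunk)   # total work summed over four chunks
per_thread = nlogn_cost(chunk)                # work per thread when chunks run in parallel

print(f"single-threaded cost: {single:.2f}")
print(f"four chunks, total cost: {total_chunked:.2f}")
print(f"model speedup on four cores: {single / per_thread:.2f}x")
```

Under this model the four chunks need less total work than one full sort, so the predicted ratio on four cores comes out above 4x.
      -->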
      <pubDate>Fri, 07 May 2010 18:54:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824699#M1349</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-05-07T18:54:29Z</dc:date>
    </item>
    <item>
      <title>My Parallel Sort Library and benchmarks ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824700#M1350</link>
      <description>Hello,&lt;BR /&gt;&lt;BR /&gt;Of course I am using NO synchronization between the threads...&lt;BR /&gt;&lt;BR /&gt;Just a small interlockincrement() to know when the sort has finished...&lt;BR /&gt;&lt;BR /&gt;Amine.</description>
      <pubDate>Fri, 07 May 2010 19:02:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824700#M1350</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-05-07T19:02:42Z</dc:date>
    </item>
    <item>
      <title>My Parallel Sort Library and benchmarks ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824701#M1351</link>
      <description>Tudor wrote:&lt;BR /&gt;&amp;gt;1 core - 35.188 sec&lt;BR /&gt;&amp;gt;2 cores - 13.382 sec&lt;BR /&gt;&amp;gt;that's almost 3x speedup for double the number of cores.&lt;BR /&gt;&lt;BR /&gt;Don't forget that I am using an Intel Core 2 Quad, so&lt;BR /&gt;CPU-Z gave me: *2* x 4096 KBytes of Level 2 cache.&lt;BR /&gt;&lt;BR /&gt;So, on four cores I am sorting four chunks of the big array&lt;BR /&gt;with just *2* Level 2 caches :)&lt;BR /&gt;&lt;BR /&gt;Amine.</description>
      <pubDate>Fri, 07 May 2010 19:16:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824701#M1351</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-05-07T19:16:43Z</dc:date>
    </item>
    <item>
      <title>My Parallel Sort Library and benchmarks ...</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824702#M1352</link>
      <description>Tudor wrote:&lt;BR /&gt;&amp;gt;1 core - 35.188 sec&lt;BR /&gt;&amp;gt;2 cores - 13.382 sec&lt;BR /&gt;&amp;gt;that's almost 3x speedup for double the number of cores.&lt;BR /&gt;&lt;BR /&gt;Forget about the complexity, because with big chunks there&lt;BR /&gt;is almost no difference in the complexity from one to two cores...&lt;BR /&gt;&lt;BR /&gt;So, I think the single-threaded program is less cache friendly&lt;BR /&gt;than the two-threaded one... and since we have 2 x Level 2 caches, this&lt;BR /&gt;explains the 3x from one core to two cores...&lt;BR /&gt;&lt;BR /&gt;I don't have any other explanation...&lt;BR /&gt;&lt;BR /&gt;Amine.</description>
      <pubDate>Fri, 07 May 2010 19:41:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/My-Parallel-Sort-Library-and-benchmarks/m-p/824702#M1352</guid>
      <dc:creator>aminer10</dc:creator>
      <dc:date>2010-05-07T19:41:24Z</dc:date>
    </item>
  </channel>
</rss>

