<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to optimize this uPipe diagram? in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129913#M7677</link>
    <description>&lt;P&gt;Look at the attachment. How can I optimize this uPipe diagram?&lt;/P&gt;</description>
    <pubDate>Mon, 28 Oct 2019 06:25:31 GMT</pubDate>
    <dc:creator>sun__lei</dc:creator>
    <dc:date>2019-10-28T06:25:31Z</dc:date>
    <item>
      <title>How to optimize this uPipe diagram?</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129913#M7677</link>
      <description>&lt;P&gt;Look at the attachment. How can I optimize this uPipe diagram?&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2019 06:25:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129913#M7677</guid>
      <dc:creator>sun__lei</dc:creator>
      <dc:date>2019-10-28T06:25:31Z</dc:date>
    </item>
    <item>
      <title>It would help if you would</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129914#M7678</link>
      <description>&lt;P&gt;It would help if you would show the sections of the code that are giving you this result.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2019 11:57:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129914#M7678</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-10-28T11:57:39Z</dc:date>
    </item>
    <item>
      <title>Quote:jimdempseyatthecove</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129915#M7679</link>
      <description>&lt;BLOCKQUOTE&gt;jimdempseyatthecove (Blackbelt) wrote:&lt;BR /&gt;&lt;P&gt;It would help if you would show the sections of the code that are giving you this result.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Sorry, the code belongs to my company, and it is too long to post. In a nutshell, the code performs a lot of random memory accesses. I have tried many optimization methods, such as inserting prefetch instructions, but none of them helped. I don't know whether I placed the prefetches at the right positions or chose the right prefetch size. Maybe you have a better solution.&lt;/P&gt;&lt;P&gt;Thank you very much for your reply. :)&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2019 12:10:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129915#M7679</guid>
      <dc:creator>sun__lei</dc:creator>
      <dc:date>2019-10-28T12:10:10Z</dc:date>
    </item>
    <item>
      <title>I have found that the</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129916#M7680</link>
      <description>&lt;P&gt;I have found that, in most cases, the hardware prefetch capability of modern CPUs makes adding software prefetching counter-productive.&lt;/P&gt;&lt;P&gt;The better solution is often to analyze your algorithms and data placement and, if possible, to rearrange the data and restructure the algorithm (without adversely affecting results) so that the CPU gets more work done per memory fetch. Note that memory is fetched one cache line (or one pair of adjacent cache lines) at a time.&lt;/P&gt;&lt;P&gt;If you are unable to resolve this yourself, you can contact me and we can work something out under an NDA.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2019 15:04:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129916#M7680</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-10-28T15:04:46Z</dc:date>
    </item>
    <item>
      <title>Quote:jimdempseyatthecove</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129917#M7681</link>
      <description>&lt;BLOCKQUOTE&gt;jimdempseyatthecove (Blackbelt) wrote:&lt;BR /&gt;&lt;P&gt;I have found that, in most cases, the hardware prefetch capability of modern CPUs makes adding software prefetching counter-productive.&lt;/P&gt;&lt;P&gt;The better solution is often to analyze your algorithms and data placement and, if possible, to rearrange the data and restructure the algorithm (without adversely affecting results) so that the CPU gets more work done per memory fetch. Note that memory is fetched one cache line (or one pair of adjacent cache lines) at a time.&lt;/P&gt;&lt;P&gt;If you are unable to resolve this yourself, you can contact me and we can work something out under an NDA.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Yes, I have already optimized my algorithm and data placement as much as I can. Next, I want to try using AVX to improve what the uPipe diagram shows. Do you think AVX can help? It is a pleasure to communicate with you. How can I contact you? I would like to keep in touch and discuss problems with you.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2019 15:22:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129917#M7681</guid>
      <dc:creator>sun__lei</dc:creator>
      <dc:date>2019-10-28T15:22:37Z</dc:date>
    </item>
    <item>
      <title>Depending on your code, data</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129918#M7682</link>
      <description>&lt;P&gt;Depending on your code, its data organization, and the optimization options selected, your code may already be using multiple lanes of the AVX registers.&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Tue, 29 Oct 2019 12:41:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/How-to-optimize-this-uPipe-diagram/m-p/1129918#M7682</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2019-10-29T12:41:21Z</dc:date>
    </item>
  </channel>
</rss>

