<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Andres: in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/streaming-video-thru-Phi/m-p/1049642#M49362</link>
    <description>&lt;P&gt;Andres:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; I think PCI-Express 2.0 x16 runs at about 8 GB/s in each direction (theoretical peak; sustained transfer rates are lower).&amp;nbsp; So that might be enough for a few uncompressed video streams, or a fair number of compressed streams.&amp;nbsp; But making use of that capacity in a meaningful way requires considerable care: overlapping compute and transfer, choosing appropriate buffer sizes, making sure neither the host nor the coprocessor sits waiting on the other, etc.&amp;nbsp; In short, it's not trivial.&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; What concerns me is that you say "this is obviously not what I would call a massively parallel application".&amp;nbsp; If you can't max out the Xeon host most of the time with highly parallel work that is&amp;nbsp;well-vectorized and makes optimal use of the host's cache architecture, then it's not looking good for a coprocessor.&amp;nbsp; The Intel Xeon Phi coprocessor is a pretty slow serial machine that really needs a high degree of parallelism, vectorization, and good memory use to shine.&amp;nbsp; And by "high degree" I mean keeping all threads busy well north of 90% of the time (look up Amdahl's law to see why).&amp;nbsp; Anything you do to optimize the Intel Xeon code will benefit an Intel Xeon Phi implementation if you eventually go that way.&lt;/P&gt;

&lt;P&gt;Sorry, Charles&lt;/P&gt;</description>
    <pubDate>Fri, 27 Mar 2015 19:48:03 GMT</pubDate>
    <dc:creator>Charles_C_Intel1</dc:creator>
    <dc:date>2015-03-27T19:48:03Z</dc:date>
    <item>
      <title>streaming video thru Phi</title>
      <link>https://community.intel.com/t5/Software-Archive/streaming-video-thru-Phi/m-p/1049641#M49361</link>
      <description>&lt;P&gt;I am currently developing a real-time video processing application that runs on a dedicated 2-CPU Xeon Linux box. &amp;nbsp;The application supports multiple video inputs and multiple video outputs with standard image processing such as picture-in-picture, graphics, and language-specific text overlay. It is basically a pipeline-based architecture in which a given input video stream is overlaid with language-specific text, and each language-specific stream is then sent to a separate output.&lt;/P&gt;

&lt;P&gt;I currently know nothing about Phi or GPU programming, only that it is meant for applications that can be structured for parallel processing, such as vector processing. &amp;nbsp;I do not know if it is a good choice for my particular application, so I thought I would ask a high-level newbie question.&lt;/P&gt;

&lt;P&gt;Q: Is the memory transfer bandwidth between the host Xeon CPU memory and the Phi memory sufficient to support multiple video streams?&lt;/P&gt;

&lt;P&gt;The Phi seems appropriate for image processing once the image is in the Phi memory, but since this is obviously not what I would call a massively parallel application, I am not sure the Phi is a good choice here.&lt;/P&gt;

&lt;P&gt;I am looking for an excuse to start the learning curve for Phi/GPU programming but probably should not go down that path if this application is not a viable match.&lt;/P&gt;

&lt;P&gt;-Andres&lt;/P&gt;

</description>
      <pubDate>Fri, 13 Mar 2015 03:55:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/streaming-video-thru-Phi/m-p/1049641#M49361</guid>
      <dc:creator>Andres_G_1</dc:creator>
      <dc:date>2015-03-13T03:55:39Z</dc:date>
    </item>
    <item>
      <title>Andres:</title>
      <link>https://community.intel.com/t5/Software-Archive/streaming-video-thru-Phi/m-p/1049642#M49362</link>
      <description>&lt;P&gt;Andres:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; I think PCI-Express 2.0 x16 runs at about 8 GB/s in each direction (theoretical peak; sustained transfer rates are lower).&amp;nbsp; So that might be enough for a few uncompressed video streams, or a fair number of compressed streams.&amp;nbsp; But making use of that capacity in a meaningful way requires considerable care: overlapping compute and transfer, choosing appropriate buffer sizes, making sure neither the host nor the coprocessor sits waiting on the other, etc.&amp;nbsp; In short, it's not trivial.&lt;/P&gt;
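
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; For a rough feel for the numbers, here is a back-of-the-envelope sketch in Python.&amp;nbsp; The video format (1080p, 30 frames/s, 3 bytes per pixel) and the 6 GB/s "sustained" figure are assumptions of mine purely for illustration, not measurements, and the function names are made up for this example.&lt;/P&gt;

```python
def stream_rate_bytes_per_s(width, height, bytes_per_pixel, fps):
    """Raw data rate of one uncompressed video stream, in bytes per second."""
    return width * height * bytes_per_pixel * fps


def streams_that_fit(link_bytes_per_s, stream_bytes_per_s):
    """Whole number of such streams a link can carry in ONE direction."""
    return int(link_bytes_per_s // stream_bytes_per_s)


# Assumed format (hypothetical): 1920x1080, 24-bit RGB, 30 frames/s.
rate = stream_rate_bytes_per_s(1920, 1080, 3, 30)

PEAK_LINK = 8e9       # ~8 GB/s per direction, theoretical PCIe 2.0 x16 peak
SUSTAINED_LINK = 6e9  # assumed achievable rate, illustration only

print("one stream:", rate / 1e6, "MB/s")
print("streams at theoretical peak:", streams_that_fit(PEAK_LINK, rate))
print("streams at assumed sustained rate:", streams_that_fit(SUSTAINED_LINK, rate))
```

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Keep in mind this counts one direction only and ignores all overhead.&amp;nbsp; Higher resolutions or frame rates, and one input fanning out into several output streams coming back over the bus, cut the count down quickly.&lt;/P&gt;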

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; What concerns me is that you say "this is obviously not what I would call a massively parallel application".&amp;nbsp; If you can't max out the Xeon host most of the time with highly parallel work that is&amp;nbsp;well-vectorized and makes optimal use of the host's cache architecture, then it's not looking good for a coprocessor.&amp;nbsp; The Intel Xeon Phi coprocessor is a pretty slow serial machine that really needs a high degree of parallelism, vectorization, and good memory use to shine.&amp;nbsp; And by "high degree" I mean keeping all threads busy well north of 90% of the time (look up Amdahl's law to see why).&amp;nbsp; Anything you do to optimize the Intel Xeon code will benefit an Intel Xeon Phi implementation if you eventually go that way.&lt;/P&gt;
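
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; To see why the parallel fraction matters so much, here is a minimal sketch of Amdahl's law in Python.&amp;nbsp; The thread count of 240 is only an assumed example value, and the model ignores the fact that the serial part also runs slower on the coprocessor, so real results would be worse than this.&lt;/P&gt;

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


N_THREADS = 240  # assumed thread count, for illustration only

for p in (0.50, 0.90, 0.99, 0.999):
    speedup = amdahl_speedup(p, N_THREADS)
    print(f"parallel fraction {p:.3f}: speedup {speedup:.1f}x on {N_THREADS} threads")
```

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; With these assumed numbers, code that is 90% parallel tops out below a 10x speedup no matter how many threads are available, which is why the remaining serial work dominates so quickly.&lt;/P&gt;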

&lt;P&gt;Sorry, Charles&lt;/P&gt;</description>
      <pubDate>Fri, 27 Mar 2015 19:48:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/streaming-video-thru-Phi/m-p/1049642#M49362</guid>
      <dc:creator>Charles_C_Intel1</dc:creator>
      <dc:date>2015-03-27T19:48:03Z</dc:date>
    </item>
  </channel>
</rss>

