<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic unexpected hStreams performance in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/unexpected-hStreams-performance/m-p/1081348#M61742</link>
    <description>&lt;P&gt;Hey guys,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;I am writing microbenchmarks with hStreams and am seeing unexpected performance numbers: when I increase the number of partitions (places per domain), the execution time doubles. In theory, I would expect the execution time to stay roughly the same. Could you help me with this issue?&lt;/P&gt;

&lt;P&gt;I have attached the source code and the results. By the way, I am using hStreams v3.5.2.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Jianbin&lt;/P&gt;</description>
    <pubDate>Mon, 11 Jan 2016 14:25:30 GMT</pubDate>
    <dc:creator>Jianbin_F_</dc:creator>
    <dc:date>2016-01-11T14:25:30Z</dc:date>
    <item>
      <title>unexpected hStreams performance</title>
      <link>https://community.intel.com/t5/Software-Archive/unexpected-hStreams-performance/m-p/1081348#M61742</link>
      <description>&lt;P&gt;Hey guys,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;I am writing microbenchmarks with hStreams and am seeing unexpected performance numbers: when I increase the number of partitions (places per domain), the execution time doubles. In theory, I would expect the execution time to stay roughly the same. Could you help me with this issue?&lt;/P&gt;

&lt;P&gt;I have attached the source code and the results. By the way, I am using hStreams v3.5.2.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Jianbin&lt;/P&gt;</description>
      <pubDate>Mon, 11 Jan 2016 14:25:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/unexpected-hStreams-performance/m-p/1081348#M61742</guid>
      <dc:creator>Jianbin_F_</dc:creator>
      <dc:date>2016-01-11T14:25:30Z</dc:date>
    </item>
    <item>
      <title>I believe I have identified</title>
      <link>https://community.intel.com/t5/Software-Archive/unexpected-hStreams-performance/m-p/1081349#M61743</link>
      <description>&lt;P&gt;I believe I have identified the root cause of your problem.&lt;/P&gt;

&lt;P&gt;Your benchmark code sets custom hStreams options, but it does so by assigning only some of the fields of the struct. This instance of the C HSTR_OPTIONS struct is a static (in hStreams_helper.h) with no initializer, which means that, by the rules of C++, all of its members are initialized to zero (broadly speaking). Specifically, its openmp_policy member is zero, which corresponds to HSTR_OPENMP_ON_DEMAND (meaning that hetero-streams does not initialize OpenMP itself). So, in your example, all the streams end up clobbering the same cores and, bottom line, the actions execute sequentially.&lt;/P&gt;

&lt;P&gt;If you set hstreams_options.openmp_policy = HSTR_OPENMP_PRE_SETUP in your initialization, you'll see that the results all hover around the same value (somewhere around 0.11 on my machine).&lt;/P&gt;

&lt;P&gt;I think a better way would be to call hStreams_GetCurrentOptions(&amp;amp;hstreams_options, sizeof(hstreams_options)); before assigning your custom values, so that every field you do not set keeps its hetero-streams default.&lt;/P&gt;
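
&lt;P&gt;To make the difference concrete, here is a minimal sketch of the two initialization patterns. It does not use the real hetero-streams headers: DemoOptions, DemoOpenMPPolicy, demo_get_current_options and the "verbose" field are hypothetical stand-ins, and the default policy shown is an assumption. The real hStreams_GetCurrentOptions fills a caller-provided struct through a pointer; the stand-in returns by value only to keep the sketch short.&lt;/P&gt;

&lt;PRE&gt;
```cpp
// Hypothetical stand-ins for illustration only. These are NOT the real
// hStreams types; only the zero value meaning "on demand" comes from the thread.
enum DemoOpenMPPolicy {
    DEMO_OPENMP_ON_DEMAND = 0,  // zero value: the library does not set up OpenMP
    DEMO_OPENMP_PRE_SETUP = 1   // the library sets up OpenMP for each stream
};

struct DemoOptions {
    DemoOpenMPPolicy openmp_policy;
    int verbose;  // hypothetical field that the benchmark wants to customize
};

// Stand-in for fetching the library defaults (the default values are assumed).
DemoOptions demo_get_current_options() {
    DemoOptions defaults = { DEMO_OPENMP_PRE_SETUP, 0 };
    return defaults;
}

// Static storage duration and no initializer: every member starts at zero.
static DemoOptions partial_options;

// Pattern 1: assign only some fields of the zero-initialized static.
DemoOptions make_partial() {
    partial_options.verbose = 1;
    return partial_options;  // openmp_policy is still 0, i.e. ON_DEMAND
}

// Pattern 2: start from the defaults, then override what you need.
DemoOptions make_from_defaults() {
    DemoOptions options = demo_get_current_options();
    options.verbose = 1;
    return options;  // openmp_policy keeps the assumed default
}
```
&lt;/PRE&gt;

&lt;P&gt;With these stand-ins, make_partial() ends up with DEMO_OPENMP_ON_DEMAND because nothing ever assigned the policy, while make_from_defaults() keeps DEMO_OPENMP_PRE_SETUP and only changes the field that was set explicitly.&lt;/P&gt;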

&lt;P&gt;Also, note that this benchmark will not use the whole Xeon Phi if you set blocks = 2 and partitions = 8, since there aren't enough "divisions of work" to feed all the partitions.&lt;/P&gt;

&lt;P&gt;Not directly related to your issue, but please also note that the signatures of the APP API functions changed between the releases of hetero-streams that accompanied MPSS 3.5 and MPSS 3.6 (the order of the arguments was unified).&lt;/P&gt;

&lt;P&gt;I hope this helps!&lt;/P&gt;</description>
      <pubDate>Mon, 11 Jan 2016 16:00:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/unexpected-hStreams-performance/m-p/1081349#M61743</guid>
      <dc:creator>Wojciech_W_Intel</dc:creator>
      <dc:date>2016-01-11T16:00:19Z</dc:date>
    </item>
    <item>
      <title>Quote:WOJCIECH W. (Intel)</title>
      <link>https://community.intel.com/t5/Software-Archive/unexpected-hStreams-performance/m-p/1081350#M61744</link>
      <description>&lt;P&gt;Thanks very much for the help! It solves the problem.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Jan 2016 14:21:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/unexpected-hStreams-performance/m-p/1081350#M61744</guid>
      <dc:creator>Jianbin_F_</dc:creator>
      <dc:date>2016-01-12T14:21:14Z</dc:date>
    </item>
  </channel>
</rss>

