<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Try in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115693#M7479</link>
    <description>&lt;P&gt;Try&lt;/P&gt;

&lt;P&gt;masked load of the target (using the same mask as for the source)&lt;BR /&gt;
	masked load of the source&lt;BR /&gt;
	XOR the masked load of the target with the target (zeroing out the field of interest)&lt;BR /&gt;
	OR the masked load of the source into the target (whose field of interest is now zeroed out)&lt;/P&gt;

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
    <pubDate>Wed, 03 Feb 2016 17:33:00 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2016-02-03T17:33:00Z</dc:date>
    <item>
      <title>Single load operation</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115689#M7475</link>
      <description>&lt;P&gt;Dear Intel developers,&lt;/P&gt;

&lt;P&gt;Using the __m128 type, what is the best and fastest way to fill it one float at a time, starting from an array of floats? Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 13:36:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115689#M7475</guid>
      <dc:creator>unrue</dc:creator>
      <dc:date>2016-02-03T13:36:56Z</dc:date>
    </item>
    <item>
      <title>Did you consult intrinsics</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115690#M7476</link>
      <description>&lt;P&gt;Did you consult the Intrinsics Guide, e.g. &lt;A href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/" target="_blank"&gt;https://software.intel.com/sites/landingpage/IntrinsicsGuide/&lt;/A&gt;?&lt;/P&gt;

&lt;P&gt;If you don't want _mm_set_ps or _mm_setr_ps, you will need to explain your requirements.&amp;nbsp; Depending on what you have in mind, the C++ or possibly the ISA forum may be appropriate.&lt;/P&gt;

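&lt;P&gt;As a rough illustration of the two intrinsics named above, here is a minimal sketch of how they order their arguments. The helper names element_at, make_with_set and make_with_setr are hypothetical and exist only for this example.&lt;/P&gt;

```c
#include "xmmintrin.h"  /* SSE: __m128, _mm_set_ps, _mm_setr_ps, _mm_storeu_ps */

/* Hypothetical helper: read element `index` (0..3), counted from the lowest lane. */
static float element_at(__m128 v, int index)
{
    float out[4];
    _mm_storeu_ps(out, v);   /* unaligned store, so `out` needs no special alignment */
    return out[index];
}

/* _mm_set_ps takes its arguments from the highest lane down to the lowest. */
static __m128 make_with_set(void)
{
    return _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
}

/* _mm_setr_ps takes them in memory order, lowest lane first. */
static __m128 make_with_setr(void)
{
    return _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
}
```

&lt;P&gt;Both functions should produce the same vector, with 1.0 in the lowest lane and 4.0 in the highest.&lt;/P&gt;
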
&lt;P&gt;These intrinsics will choose appropriate instructions according to your compiler architecture switch setting.&amp;nbsp; Supposing that you do want to change just one 32-bit field, you can set the other fields to the current values, and check whether the compiler optimizes away redundant operations.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 14:36:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115690#M7476</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2016-02-03T14:36:00Z</dc:date>
    </item>
    <item>
      <title>Quote:Tim P. wrote:</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115691#M7477</link>
      <description>&lt;BLOCKQUOTE&gt;Tim P. wrote:&lt;BR /&gt;

&lt;P&gt;Did you consult the Intrinsics Guide, e.g. &lt;A href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/"&gt;https://software.intel.com/sites/landingpage/IntrinsicsGuide/&lt;/A&gt;?&lt;/P&gt;

&lt;P&gt;If you don't want _mm_set_ps or _mm_setr_ps, you will need to explain your requirements.&amp;nbsp; Depending on what you have in mind, the C++ or possibly the ISA forum may be appropriate.&lt;/P&gt;

&lt;P&gt;These intrinsics will choose appropriate instructions according to your compiler architecture switch setting.&amp;nbsp; Supposing that you do want to change just one 32-bit field, you can set the other fields to the current values, and check whether the compiler optimizes away redundant operations.&lt;/P&gt;

&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Hi Tim,&lt;/P&gt;

&lt;P&gt;Yes, I use the Intel Intrinsics Guide frequently, but so far I have not found a solution. Starting from a __m128 value holding 4 floats, e.g. [1, 2, 3, 4], I would like to set a single float at a time without modifying the others. With a maskload, for example, I can set a single element, but that instruction sets the other elements to zero. It seems, also from your reply, that a partial solution is to rewrite all the elements with their current values, except for the one to be modified.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 14:58:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115691#M7477</guid>
      <dc:creator>unrue</dc:creator>
      <dc:date>2016-02-03T14:58:00Z</dc:date>
    </item>
    <item>
      <title>It is easy enough to write</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115692#M7478</link>
      <description>&lt;P&gt;It is easy enough to write the four values to consecutive memory locations using a 4-element dummy array, then perform a 128-bit load to get them all back into a vector register. I find this more convenient than figuring out some of the more obscure intrinsic functions.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 15:49:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115692#M7478</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2016-02-03T15:49:23Z</dc:date>
    </item>
    <item>
      <title>Try</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115693#M7479</link>
      <description>&lt;P&gt;Try&lt;/P&gt;

&lt;P&gt;masked load of the target (using the same mask as for the source)&lt;BR /&gt;
	masked load of the source&lt;BR /&gt;
	XOR the masked load of the target with the target (zeroing out the field of interest)&lt;BR /&gt;
	OR the masked load of the source into the target (whose field of interest is now zeroed out)&lt;/P&gt;

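&lt;P&gt;The steps above can be sketched roughly as follows. This is a register-only variant that uses SSE bitwise operations in place of the AVX masked loads, so it is a swapped-in technique under that assumption and not necessarily the exact instruction sequence intended. The helper names lane_mask, replace_lane and lane_value are hypothetical.&lt;/P&gt;

```c
#include "xmmintrin.h"  /* SSE: __m128, _mm_and_ps, _mm_xor_ps, _mm_or_ps */
#include "emmintrin.h"  /* SSE2: _mm_setr_epi32, _mm_castsi128_ps */

/* Hypothetical helper: all bits set in lane `lane` (0..3), zero elsewhere. */
static __m128 lane_mask(int lane)
{
    return _mm_castsi128_ps(_mm_setr_epi32(lane == 0 ? -1 : 0,
                                           lane == 1 ? -1 : 0,
                                           lane == 2 ? -1 : 0,
                                           lane == 3 ? -1 : 0));
}

/* Hypothetical helper: copy lane `lane` of source into target, keep the rest. */
static __m128 replace_lane(__m128 target, __m128 source, int lane)
{
    __m128 mask    = lane_mask(lane);
    __m128 t_field = _mm_and_ps(mask, target);     /* stands in for the masked load of target */
    __m128 s_field = _mm_and_ps(mask, source);     /* stands in for the masked load of source */
    __m128 cleared = _mm_xor_ps(target, t_field);  /* zero out the field of interest */
    return _mm_or_ps(cleared, s_field);            /* merge the source field into the target */
}

/* Hypothetical helper: read one lane back for checking. */
static float lane_value(__m128 v, int lane)
{
    float out[4];
    _mm_storeu_ps(out, v);
    return out[lane];
}
```

&lt;P&gt;For example, replace_lane(_mm_setr_ps(1, 2, 3, 4), _mm_set1_ps(9), 2) should give [1, 2, 9, 4].&lt;/P&gt;
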
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 17:33:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115693#M7479</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2016-02-03T17:33:00Z</dc:date>
    </item>
    <item>
      <title>Usually I am using what @John</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115694#M7480</link>
      <description>&lt;P&gt;Usually I use what John described in his response. As additional advice, you may want to align your float array on a 16-byte boundary before loading it into an XMM register.&lt;/P&gt;

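&lt;P&gt;A minimal sketch of the scratch-array approach with 16-byte alignment, assuming a C11 compiler for the _Alignas specifier. The helper names load_four, set_element and get_element are hypothetical.&lt;/P&gt;

```c
#include "xmmintrin.h"  /* SSE: __m128, _mm_load_ps, _mm_store_ps */

/* Hypothetical helper: gather four floats one at a time into a 16-byte aligned
   scratch array, then bring them into a register with a single 128-bit load. */
static __m128 load_four(const float *src)
{
    _Alignas(16) float scratch[4];      /* C11 alignment specifier */
    int i;
    for (i = 0; i != 4; ++i) {
        scratch[i] = src[i];            /* one float per step */
    }
    return _mm_load_ps(scratch);        /* aligned load; needs 16-byte alignment */
}

/* Hypothetical helper: change one element (index 0..3) and keep the other three. */
static __m128 set_element(__m128 v, int index, float value)
{
    _Alignas(16) float scratch[4];
    _mm_store_ps(scratch, v);           /* spill all four lanes to memory */
    scratch[index] = value;             /* overwrite just one of them */
    return _mm_load_ps(scratch);        /* reload the full vector */
}

/* Hypothetical helper: read one element back for checking. */
static float get_element(__m128 v, int index)
{
    _Alignas(16) float scratch[4];
    _mm_store_ps(scratch, v);
    return scratch[index];
}
```

&lt;P&gt;If the array cannot be aligned, _mm_loadu_ps and _mm_storeu_ps are the unaligned counterparts of the load and store used here.&lt;/P&gt;
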
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 05 Mar 2016 18:54:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Single-load-operation/m-p/1115694#M7480</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2016-03-05T18:54:07Z</dc:date>
    </item>
  </channel>
</rss>

