<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Thanks for the code sample. I in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943492#M1822</link>
    <description>&lt;P&gt;Thanks for the code sample. I'll take a look and get back to you. Just to clarify, FMA3 is supported only on 4th Gen Intel Core processors. What is your CPU config?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Raghu&lt;/P&gt;</description>
    <pubDate>Fri, 12 Jul 2013 17:07:26 GMT</pubDate>
    <dc:creator>Raghupathi_M_Intel</dc:creator>
    <dc:date>2013-07-12T17:07:26Z</dc:date>
    <item>
      <title>AVX2 and FMA3 support</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943491#M1821</link>
      <description>&lt;P&gt;The FAQ states "Yes, Intel OpenCL* SDK 2013 introduces performance improvements that include full code generation on the Intel Advanced Vector Extensions (Intel AVX and Intel AVX2)."&lt;/P&gt;
&lt;P&gt;I'm trying to get it to produce code that utilises the AVX2 FMA3 instructions.&lt;/P&gt;
&lt;P&gt;I'm using the Kernel Builder (CPU - 64 bit AVX2), i.e. with the target set to the AVX2 instruction set.&lt;/P&gt;
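&lt;P&gt;-------------&lt;BR /&gt;__kernel void dofma(const global float *a, const global float *b, const global float *c, global float *out)&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;uint gid = get_global_id(0);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;float fa = a[gid];&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;float fb = b[gid];&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;float fc = c[gid];&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;fa = mad(fa, fb, fc);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;out[gid] = fa;&lt;BR /&gt;}&lt;BR /&gt;-------------&lt;/P&gt;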
&lt;P&gt;This gives code that uses vmulps and vaddps, but no VFMADD213-type instructions.&lt;/P&gt;
&lt;P&gt;Using fa = fma(fa,fb,fc); instead&lt;BR /&gt;produces a lot more code and a function call for the fma, which results in very low performance.&lt;/P&gt;
</description>
      <pubDate>Mon, 08 Jul 2013 06:51:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943491#M1821</guid>
      <dc:creator>MSimm2</dc:creator>
      <dc:date>2013-07-08T06:51:36Z</dc:date>
    </item>
    <item>
      <title>Thanks for the code sample. I</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943492#M1822</link>
      <description>&lt;P&gt;Thanks for the code sample. I'll take a look and get back to you. Just to clarify, FMA3 is supported only on 4th Gen Intel Core processors. What is your CPU config?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Raghu&lt;/P&gt;</description>
      <pubDate>Fri, 12 Jul 2013 17:07:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943492#M1822</guid>
      <dc:creator>Raghupathi_M_Intel</dc:creator>
      <dc:date>2013-07-12T17:07:26Z</dc:date>
    </item>
    <item>
      <title>Quote:Raghu Muthyalampalli</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943493#M1823</link>
      <description>&lt;BLOCKQUOTE&gt;Raghu Muthyalampalli (Intel) wrote:&lt;BR /&gt;Thanks for the code sample. I'll take a look and get back to you. Just to clarify, FMA3 is supported only on 4th Gen Intel Core processors. What is your CPU config?&lt;/BLOCKQUOTE&gt;
&lt;P&gt;i7-4770 (non-K)&lt;/P&gt;
&lt;P&gt;However, that shouldn't matter if the Kernel Builder build options are set to target the AVX2 instruction set.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Jul 2013 22:53:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943493#M1823</guid>
      <dc:creator>MSimm2</dc:creator>
      <dc:date>2013-07-12T22:53:46Z</dc:date>
    </item>
    <item>
      <title>The Intel SPMD Program Compiler</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943494#M1824</link>
      <description>&lt;P&gt;The Intel SPMD Program Compiler (ispc) &lt;STRONG&gt;does&lt;/STRONG&gt; emit FMA instructions (vfmadd213ps %ymm0, %ymm1, %ymm2).&lt;BR /&gt;It's an example of how the OpenCL asm should look.&lt;/P&gt;
&lt;P&gt;However, this isn't useful to me since I need to target both CPUs and GPUs (and GPUs have more GFLOPS), and I don't want to maintain the code in two different APIs.&lt;/P&gt;
&lt;P&gt;For example, with the file Test.ispc below and the command:&lt;/P&gt;
&lt;P&gt;ispc -O2 Test.ispc -o Test.asm -h Test_ispc.h --target=avx2 --emit-asm&lt;/P&gt;
&lt;P&gt;------------------------------------------&lt;BR /&gt;export void simple(uniform float a[], uniform float b[], uniform float c[], uniform float out[], uniform int count)&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;foreach (index = 0 ... count)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;float fa = a[index];&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;float fb = b[index];&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;float fc = c[index];&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;fa = fb * fc + fa;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;out[index] = fa;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR /&gt;}&lt;/P&gt;
</description>
      <pubDate>Thu, 01 Aug 2013 07:56:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943494#M1824</guid>
      <dc:creator>MSimm2</dc:creator>
      <dc:date>2013-08-01T07:56:00Z</dc:date>
    </item>
    <item>
      <title>Still does not use AVX2 FMA instructions</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943495#M1825</link>
      <description>&lt;P&gt;It still does not use AVX2 FMA instructions... Isn't this, like, an obvious thing to implement?&lt;/P&gt;

&lt;P&gt;I'm still getting:&lt;/P&gt;

&lt;P&gt;vmovups YMM1, YMMWORD PTR [R11 + 4*RDI]&lt;BR /&gt;
	vmulps YMM0, YMM1, YMM0&lt;BR /&gt;
	vmovups YMM1, YMMWORD PTR [R9 + 4*RDI]&lt;BR /&gt;
	vaddps YMM0, YMM0, YMM1&lt;BR /&gt;
	vmovups YMMWORD PTR [R8 + 4*RDI], YMM0&lt;/P&gt;

&lt;P&gt;Where is the VFMADD213?&lt;/P&gt;

&lt;P&gt;-------------------------------------&lt;/P&gt;

&lt;P&gt;Using build options: -cl-unsafe-math-optimizations -cl-fast-relaxed-math -cl-mad-enable&lt;/P&gt;

&lt;P style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;"&gt;Setting target instruction set architecture to: Advanced Vector Extension 2 (AVX2)&lt;BR /&gt;
	Intel OpenCL Intel CPU device was found!&lt;BR /&gt;
	Device name: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz&lt;BR /&gt;
	Device version: OpenCL 1.2 (Build 78712)&lt;BR /&gt;
	Device vendor: Intel(R) Corporation&lt;BR /&gt;
	Device profile: FULL_PROFILE&lt;BR /&gt;
	Compilation started&lt;BR /&gt;
	Compilation done&lt;BR /&gt;
	Linking started&lt;BR /&gt;
	Linking done&lt;BR /&gt;
	Kernel &amp;lt;dofma&amp;gt; was successfully vectorized&lt;BR /&gt;
	Done.&lt;BR /&gt;
	Build succeeded!&lt;/P&gt;</description>
      <pubDate>Wed, 18 Dec 2013 23:58:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943495#M1825</guid>
      <dc:creator>MSimm2</dc:creator>
      <dc:date>2013-12-18T23:58:00Z</dc:date>
    </item>
    <item>
      <title>YES!</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943496#M1826</link>
      <description>&lt;P&gt;YES!&lt;/P&gt;

&lt;P&gt;Intel OpenCL SDK 2014, 64-bit CPU runtime:&lt;/P&gt;

&lt;P&gt;FMA is working.&lt;/P&gt;

&lt;P&gt;It's generating vfmadd213ps %ymm0, %ymm1, %ymm2 instructions for both mad() and fma().&lt;/P&gt;
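
&lt;P&gt;For context, a minimal sketch (not from the thread) of why a fused multiply-add is more than a faster multiply-then-add: the fused form rounds once, while a separate vmulps/vaddps pair rounds the product before the add. Two assumptions to flag: it uses Python double precision rather than the kernel's single-precision float, and it emulates the fused operation with exact rational arithmetic. The helper names mul_then_add and fused_mul_add are hypothetical.&lt;/P&gt;

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """Unfused: the product is rounded to a double before the add."""
    return a * b + c


def fused_mul_add(a, b, c):
    """Fused (emulated): exact rational arithmetic, rounded once at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so the exact product, 1 - 2**-60, is not representable
# as a double and rounds to 1.0 when computed on its own.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = mul_then_add(a, b, c)
fused = fused_mul_add(a, b, c)
print("unfused:", unfused)
print("fused:  ", fused)
```

&lt;P&gt;With these inputs the unfused version loses the low-order term entirely. This is the distinction behind the two OpenCL built-ins: mad() is allowed to trade accuracy for speed, while fma() must return the correctly rounded result.&lt;/P&gt;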

</description>
      <pubDate>Thu, 22 May 2014 02:39:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/AVX2-and-FMA3-support/m-p/943496#M1826</guid>
      <dc:creator>MSimm2</dc:creator>
      <dc:date>2014-05-22T02:39:32Z</dc:date>
    </item>
  </channel>
</rss>

