AVX2 and FMA3 support

MSimm2
New Contributor I

The FAQ states "Yes, Intel OpenCL* SDK 2013 introduces performance improvements that include full code generation on the Intel Advanced Vector Extensions (Intel AVX and Intel AVX2)."

I'm trying to get it to produce code that uses the AVX2 FMA3 instructions.

I'm using the Kernel Builder (CPU - 64 bit AVX2), i.e. with the target set to the AVX2 instruction set.

-------------
__kernel void dofma(const global float *a, const global float *b, const global float *c, global float *out)
{
    uint gid = get_global_id(0);
    float fa = a[gid];
    float fb = b[gid];
    float fc = c[gid];
    fa = mad(fa, fb, fc);
    out[gid] = fa;
}
------------------

This gives code that uses vmulps and vaddps, but no VFMADD213-type instructions.

Using fa = fma(fa,fb,fc); instead
produces a lot more code and a function call for the fma, which results in very low performance.

5 Replies
Raghupathi_M_Intel

Thanks for the code sample. I'll take a look and get back to you. Just to clarify, FMA3 is only supported on 4th Gen Intel Core processors. What is your CPU config?

Thanks,
Raghu

MSimm2
New Contributor I

Raghu Muthyalampalli (Intel) wrote:
Thanks for the code sample. I'll take a look and get back to you. Just to clarify, FMA3 is only supported on 4th Gen Intel Core processors. What is your CPU config?

i7-4770 (non-K)

However, that shouldn't matter if the Kernel Builder build options are set to target the AVX2 instruction set.
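
For anyone who wants to confirm host support in code rather than from the model number, here is a minimal C sketch. It assumes GCC or Clang on an x86 target, since it relies on their __builtin_cpu_supports builtin; the function names are hypothetical and this is not part of the OpenCL SDK. It only reports what the host CPU can execute, which is a separate question from which instruction set an offline build is told to target.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the host CPU reports both AVX2 and FMA, 0 if it does not,
   and -1 if this build cannot tell (non-x86 target or other compiler). */
int host_has_avx2_and_fma(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) ? 1 : 0;
#else
    return -1;
#endif
}

/* Prints a one-line summary of the check above. */
void report_host_support(void)
{
    int result = host_has_avx2_and_fma();
    assert(result >= -1 && result <= 1);

    if (result == 1)
        printf("Host CPU reports AVX2 and FMA\n");
    else if (result == 0)
        printf("Host CPU does not report both AVX2 and FMA\n");
    else
        printf("Cannot query CPU features in this build\n");
}
```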

MSimm2
New Contributor I

The Intel SPMD Program Compiler (ispc) does emit FMA instructions (vfmadd213ps    %ymm0, %ymm1, %ymm2).
This is an example of how the OpenCL assembly should look.

However, ispc isn't useful to me, since I need to target both CPUs and GPUs (and GPUs have more GFLOPS), and I don't want to maintain the code in two different APIs.

For example, with the file Test.ispc below and the command

ispc -O2 Test.ispc -o Test.asm -h Test_ispc.h --target=avx2 --emit-asm

------------------------------------------
export void simple(uniform float a[],uniform float b[] ,uniform float c[] ,uniform float out[], uniform int count)
{
    foreach (index = 0 ... count)
    {
        float fa = a[index];
        float fb = b[index];
        float fc = c[index];
        fa = fb * fc + fa;
        out[index] = fa;
    }
}
------------------------------------------

MSimm2
New Contributor I

Still does not use AVX2 FMA instructions... Isn't this, like, an obvious thing to implement?

I'm still getting

vmovups YMM1, YMMWORD PTR [R11 + 4*RDI]
vmulps YMM0, YMM1, YMM0
vmovups YMM1, YMMWORD PTR [R9 + 4*RDI]
vaddps YMM0, YMM0, YMM1
vmovups YMMWORD PTR [R8 + 4*RDI], YMM0

Where is the VFMADD213?

-------------------------------------

Using build options: -cl-unsafe-math-optimizations -cl-fast-relaxed-math -cl-mad-enable

Setting target instruction set architecture to: Advanced Vector Extension 2 (AVX2)
Intel OpenCL Intel CPU device was found!
Device name: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Device version: OpenCL 1.2 (Build 78712)
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
Compilation started
Compilation done
Linking started
Linking done
Kernel <dofma> was successfully vectorized
Done.
Build succeeded!
 

MSimm2
New Contributor I

YES!

Intel OpenCL SDK 2014, 64-bit CPU runtime

FMA is working.

It's generating vfmadd213ps %ymm0, %ymm1, %ymm2 instructions for both mad() and fma()

 
