OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

Dot(*,*) does not translate to DPPS

Paul_S_
Beginner

Hi all,

I'm curious why the dot(*,*) function does not translate into the DPPS instruction for float4 data types. Instead it translates into a VMULPS followed by two VHADDPS. (Compiled with Intel(R) OpenCL(TM) Offline Compiler Command-Line Client, version 1.0.2 with AVX enabled).

Thanks,
Paul
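The mul-plus-two-hadds sequence described above can be modeled in plain C to show why it yields a dot product. This is a scalar sketch, not the OpenCL compiler's actual code. The float4_t type and the helper names are hypothetical. The sketch also assumes that each VHADDPS takes the same vector as both operands (first the products, then the partial sums). That is the usual reduction idiom, but the post does not show the exact operands.

```c
/* Hypothetical 4-lane vector type for illustration; not an OpenCL or Intel API. */
typedef struct { float v[4]; } float4_t;

/* Lane-wise multiply, like (V)MULPS on a 128-bit register. */
static float4_t mul4(float4_t a, float4_t b) {
    float4_t r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] * b.v[i];
    return r;
}

/* Horizontal add, like (V)HADDPS x, y:
   result = { x0 + x1, x2 + x3, y0 + y1, y2 + y3 }. */
static float4_t hadd4(float4_t x, float4_t y) {
    float4_t r = {{ x.v[0] + x.v[1], x.v[2] + x.v[3],
                    y.v[0] + y.v[1], y.v[2] + y.v[3] }};
    return r;
}

/* Dot product as one multiply and two horizontal adds (assumed operand order). */
static float4_t dot4_mul_hadd(float4_t a, float4_t b) {
    float4_t p = mul4(a, b);   /* { a0*b0, a1*b1, a2*b2, a3*b3 } */
    float4_t s = hadd4(p, p);  /* { p0+p1, p2+p3, p0+p1, p2+p3 } */
    return hadd4(s, s);        /* { sum, sum, sum, sum }          */
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, every lane of the result holds 70. A scalar result can therefore be read from lane 0 with no extra shuffle.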

Raghupathi_M_Intel

I am guessing it's for performance reasons: dpps has a longer latency than vmulps and vhaddps.

Thanks,
Raghu

Paul_S_
Beginner
Thanks for the reply. If dpps has a longer latency than one vmulps plus two vhaddps, I'm curious why anyone would ever use dpps.

Best,
Paul
Raghupathi_M_Intel

The recommendation is not to use it :-)

Paul_S_
Beginner
That's kind of odd. Why have such an instruction at all, then? By the way, the following snippet does get translated into a dpps instruction:

    void dot(float *a, float *__restrict__ b, float *__restrict__ c) {
        *c  = a[0] * b[0];
        *c += a[1] * b[1];
        *c += a[2] * b[2];
        *c += a[3] * b[3];
    }

Generated code:

    ::L__routine_start__Z3dotPfS_S(void):
    dot(float*, float*, float*):
            vmovups   xmm0, XMMWORD PTR [rsi]    #9.3
            vmovups   xmm1, XMMWORD PTR [rdi]    #9.3
            vdpps     xmm2, xmm0, xmm1, 241      #9.3
            vmovss    DWORD PTR [rdx], xmm2      #9.3
            ret

(icc 13.0.1 with -O3 -masm=intel -mavx.)

Thanks,
Paul
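For comparison, here is a scalar model of what the vdpps instruction in that icc output computes. The immediate 241 is 0xF1. Its high nibble (0xF) selects all four lane products for the sum. Its low nibble (0x1) writes the sum to lane 0 only and zeroes the other lanes, which is why a single vmovss is enough to store the result. The function names are hypothetical. This is a sketch of the documented instruction semantics, not icc's implementation.

```c
#include <stdint.h>

/* Scalar model of 128-bit DPPS / VDPPS xmm, xmm, imm8.
   imm8 bits 7..4: which lane products enter the sum.
   imm8 bits 3..0: which result lanes receive the sum (others are set to 0). */
static void dpps_model(const float a[4], const float b[4], uint8_t imm8, float out[4]) {
    float t[4];
    for (int i = 0; i < 4; ++i)
        t[i] = (imm8 & (0x10u << i)) ? a[i] * b[i] : 0.0f;
    /* Pairwise summation, as in the instruction's reference pseudocode. */
    float sum = (t[0] + t[1]) + (t[2] + t[3]);
    for (int i = 0; i < 4; ++i)
        out[i] = (imm8 & (1u << i)) ? sum : 0.0f;
}

/* Mirrors the quoted listing: vdpps with imm8 = 241 (0xF1), then vmovss of lane 0. */
static void dot_like_icc(const float *a, const float *b, float *c) {
    float r[4];
    dpps_model(a, b, 241, r);  /* all four products, sum written to lane 0 only */
    *c = r[0];
}
```

The model adds the products pairwise, (p0 + p1) + (p2 + p3), while the C source accumulates left to right. In floating point this reassociation can change the last bit of the result, so a compiler typically emits it only when its floating-point model permits reordering.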