Hi all,
I'm curious why the dot(*,*) function does not translate into the DPPS instruction for float4 data types. Instead it translates into a VMULPS followed by two VHADDPS. (Compiled with Intel(R) OpenCL(TM) Offline Compiler Command-Line Client, version 1.0.2 with AVX enabled).
Thanks,
Paul
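To make the question concrete, here is a minimal scalar sketch in C of the sequence described above: one lane-wise multiply followed by two horizontal adds. It is a model of the instruction semantics for a single 128-bit register, not the OpenCL compiler's actual output. The function names (`model_mulps`, `model_haddps`, `dot4_mul_hadd`) are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Scalar model of VMULPS on one 128-bit register: lane-wise product. */
static void model_mulps(const float a[4], const float b[4], float out[4]) {
    for (size_t i = 0; i < 4; ++i) {
        out[i] = a[i] * b[i];
    }
}

/* Scalar model of 128-bit VHADDPS dst, x, y:
   dst = { x0+x1, x2+x3, y0+y1, y2+y3 }.
   A temporary is used so that out may alias x or y. */
static void model_haddps(const float x[4], const float y[4], float out[4]) {
    float t[4] = { x[0] + x[1], x[2] + x[3], y[0] + y[1], y[2] + y[3] };
    for (size_t i = 0; i < 4; ++i) {
        out[i] = t[i];
    }
}

/* Hypothetical helper: float4 dot product as multiply + two horizontal adds. */
float dot4_mul_hadd(const float a[4], const float b[4]) {
    float p[4], h[4];
    model_mulps(a, b, p);   /* p = { a0*b0, a1*b1, a2*b2, a3*b3 }            */
    model_haddps(p, p, h);  /* h = { p0+p1, p2+p3, p0+p1, p2+p3 }            */
    model_haddps(h, h, h);  /* every lane now holds (p0+p1) + (p2+p3)        */
    return h[0];
}
```

The first horizontal add pairs up neighbouring products and the second adds the two pair sums, so the full dot product ends up in every lane after three instructions.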
I am guessing it's for performance reasons. dpps has longer latency than vmulps and vhaddps.
Thanks,
Raghu
Thanks for the reply. If dpps has a longer latency than doing one vmulps and two vhaddps, I'm curious why one should ever use dpps.
Best,
Paul
The recommendation is not to use it :-)
That's kind of odd. Why would you have such an instruction in that case?
By the way, the following code snippet is compiled into a vdpps instruction:
void dot(float * a, float*__restrict__ b, float *__restrict__ c){
*c = a[0] * b[0];
*c += a[1] * b[1];
*c += a[2] * b[2];
*c += a[3] * b[3];
}
::L__routine_start__Z3dotPfS_S(void):
dot(float*, float*, float*):
vmovups xmm0, XMMWORD PTR [rsi] #9.3
vmovups xmm1, XMMWORD PTR [rdi] #9.3
vdpps xmm2, xmm0, xmm1, 241 #9.3
vmovss DWORD PTR [rdx], xmm2 #9.3
ret
(Using icc 13.0.1 with flags set to: -O3 -masm=intel -mavx)
Thanks,
Paul
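For readers puzzling over the immediate in the listing above: 241 is 0xF1. In DPPS/VDPPS the high four bits of the immediate select which lane products enter the sum, and the low four bits select which destination lanes receive it (the others are zeroed). The following is a minimal scalar sketch of that behaviour in C, assuming the 128-bit form. `model_dpps` is a hypothetical name for illustration, not an intrinsic or library call.

```c
#include <stdint.h>

/* Scalar model of 128-bit DPPS/VDPPS with immediate imm8.
   Bits 7:4: lane i's product a[i]*b[i] is included if bit (4+i) is set.
   Bits 3:0: destination lane i receives the sum if bit i is set, else 0. */
void model_dpps(const float a[4], const float b[4], uint8_t imm8, float out[4]) {
    float t[4];
    for (int i = 0; i < 4; ++i) {
        t[i] = (imm8 & (1u << (4 + i))) ? a[i] * b[i] : 0.0f;
    }
    /* The products are summed pairwise: (t0 + t1) + (t2 + t3). */
    float sum = (t[0] + t[1]) + (t[2] + t[3]);
    for (int i = 0; i < 4; ++i) {
        out[i] = (imm8 & (1u << i)) ? sum : 0.0f;
    }
}
```

With imm8 = 241 (0xF1) all four products are summed and only lane 0 is written, which is why the listing can follow the vdpps with a single vmovss store of the low element.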