vsubpd %ymm14,%ymm5,%ymm10
vaddpd %ymm12,%ymm2,%ymm2
vaddpd 0x20(%rsp),%ymm12,%ymm3
vaddpd %ymm14,%ymm13,%ymm13
vmulpd 0x40(%rsp),%ymm9,%ymm1
vmovupd %ymm0,0xe0(%rsp)
vmovupd %ymm10,0xa0(%rsp)
vmulpd 0xe5b(%rip),%ymm8,%ymm5
vmulpd 0xe53(%rip),%ymm2,%ymm12
vmulpd 0xe0b(%rip),%ymm11,%ymm8
vsubpd %ymm14,%ymm7,%ymm0
vsubpd 0xe1e(%rip),%ymm4,%ymm7
vsubpd 0xe16(%rip),%ymm5,%ymm11
vsubpd %ymm2,%ymm13,%ymm13
vmulpd %ymm1,%ymm6,%ymm6
vaddpd %ymm0,%ymm11,%ymm11
vmulpd %ymm3,%ymm4,%ymm4
vsubpd %ymm2,%ymm9,%ymm9
vmulpd %ymm3,%ymm15,%ymm15
vaddpd %ymm0,%ymm7,%ymm7
vmulpd %ymm1,%ymm13,%ymm13
vsubpd %ymm2,%ymm5,%ymm5
vmulpd %ymm3,%ymm11,%ymm11
vaddpd %ymm2,%ymm14,%ymm14
vmulpd %ymm1,%ymm9,%ymm9
vsubpd %ymm0,%ymm12,%ymm12
vmulpd %ymm3,%ymm7,%ymm7
vaddpd %ymm2,%ymm10,%ymm10
vmulpd %ymm1,%ymm5,%ymm5
vsubpd %ymm0,%ymm8,%ymm8
vmulpd %ymm1,%ymm14,%ymm14
vaddpd %ymm2,%ymm6,%ymm6
vmulpd %ymm3,%ymm12,%ymm12
vsubpd %ymm0,%ymm4,%ymm4
vmulpd %ymm1,%ymm10,%ymm10
vaddpd %ymm0,%ymm15,%ymm15
I took a look at the provided example and can confirm the overhead caused by superfluous load/store operations. Since your loop uses intrinsics exclusively, there should be no register pressure forcing the compiler to save and restore registers. I've therefore filed a defect with engineering (DPD200293901) and will let you know about its status.
Thank you for your test sample!
Best regards,
Georg Zitzlsberger

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page