Intel Compiler for KNL and Skylake X Problem

Hello

I have been trying to compile code for AVX-512 with -O2 and all the other optimizations. However, I stumbled upon a small problem.

When using the zmm registers, move instructions pop up out of the blue before each arithmetic instruction such as vaddpd, vsubpd, vmulpd, or the fused multiply-add (vfmadd*pd) instructions. Each time, the data from the high zmm registers (16 - 31) is first moved to the low zmm registers, and then the operation is performed.

Is this a hardware limitation, where the instructions can only use the low registers, or is it a compiler bug?

Best,
Thom

5 Replies
Moderator

Hi Thom,

Is it possible to provide a test case for us to investigate?

Thanks,

Viet
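
A test case for this kind of report can be quite small. The following is only a sketch of what one might look like; it is not Thom's actual code, which the thread never shows. The function name `hadamard16`, the constant `LANES`, the file name `kernel.c`, and the compile commands in the comment are all assumptions chosen for illustration. The idea is to keep more than 16 vector values live at once, so the register allocator has a reason to reach into zmm16 - zmm31. Whether it reproduces the extra moves will depend on the compiler version and flags.

```c
/* Hypothetical reproducer sketch (file name and commands are assumptions):
 *   icc -std=c99 -O2 -xCORE-AVX512 -qopt-zmm-usage=high -c kernel.c   (Skylake X)
 *   icc -std=c99 -O2 -xMIC-AVX512 -c kernel.c                         (KNL)
 *   objdump -d kernel.o
 */
#include <assert.h>
#include <stddef.h>

#define LANES 16  /* number of input arrays combined per element */

/* 16-point Walsh-Hadamard butterfly applied element-wise across 16
 * arrays of length n, stored back to back in `in` and `out`.
 * Once the inner loops are unrolled and the outer loop is vectorized,
 * all 16 intermediate vectors plus temporaries are live together. */
void hadamard16(const double *restrict in, double *restrict out, size_t n)
{
    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n; ++i) {
        double v[LANES];

        for (size_t k = 0; k < LANES; ++k)
            v[k] = in[k * n + i];

        /* log2(16) = 4 stages of add/sub pairs (vaddpd / vsubpd) */
        for (size_t h = 1; h < LANES; h *= 2) {
            for (size_t j = 0; j < LANES; j += 2 * h) {
                for (size_t k = j; k < j + h; ++k) {
                    double x = v[k];
                    double y = v[k + h];
                    v[k]     = x + y;
                    v[k + h] = x - y;
                }
            }
        }

        for (size_t k = 0; k < LANES; ++k)
            out[k * n + i] = v[k];
    }
}
```

Compiling only this file and disassembling the object keeps the listing short enough to post in full, together with the exact compiler version and command line.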

Black Belt

I thought there was an option for this, but the 2018 documentation changes have stopped me from searching for it.

Moderator

Hi Tim,

Is this what you are referring to?

-qopt-zmm-usage=<keyword>
          Specifies the level of zmm registers usage.  You can specify one of
          the following:
            low  - Tells the compiler that the compiled program is unlikely to
                   benefit from zmm registers usage. It specifies that the
                   compiler should avoid using zmm registers unless it can
                   prove the gain from their usage.
            high - Tells the compiler to generate zmm code without restrictions

 


Hi Viet

Using -qopt-zmm-usage=high solves the problem. It now seems that all 32 registers are used.

Thank you,

Thom


Hi Viet

It seems it still does not work:

  4048b6:       62 51 ed 48 58 ef       vaddpd %zmm15,%zmm2,%zmm13
  4048bc:       62 b1 7c 48 28 d8       vmovaps %zmm16,%zmm3
  4048c2:       62 71 dd 48 58 f3       vaddpd %zmm3,%zmm4,%zmm14
  4048c8:       62 61 7c 48 28 c0       vmovaps %zmm0,%zmm24
  4048ce:       62 d1 7c 48 28 c7       vmovaps %zmm15,%zmm0
  4048d4:       62 31 7c 48 28 fe       vmovaps %zmm22,%zmm15
  4048da:       62 51 fd 48 5c ff       vsubpd %zmm15,%zmm0,%zmm15
  4048e0:       62 f1 e5 48 5c cc       vsubpd %zmm4,%zmm3,%zmm1
  4048e6:       62 61 7c 48 28 c9       vmovaps %zmm1,%zmm25
  4048ec:       62 b1 7c 48 28 cc       vmovaps %zmm20,%zmm1
  4048f2:       62 b1 7c 48 28 c7       vmovaps %zmm23,%zmm0
  4048f8:       62 f1 fd 48 58 d9       vaddpd %zmm1,%zmm0,%zmm3
  4048fe:       62 b1 7c 48 28 e5       vmovaps %zmm21,%zmm4
  404904:       62 91 7c 48 28 c0       vmovaps %zmm24,%zmm0
  40490a:       62 f1 fd 48 58 d4       vaddpd %zmm4,%zmm0,%zmm2
  404910:       62 41 7c 48 28 d7       vmovaps %zmm15,%zmm26
  404916:       62 31 7c 48 28 ff       vmovaps %zmm23,%zmm15
  40491c:       62 d1 f5 48 5c cf       vsubpd %zmm15,%zmm1,%zmm1
  404922:       62 61 7c 48 28 d9       vmovaps %zmm1,%zmm27
  404928:       62 31 7c 48 28 fd       vmovaps %zmm21,%zmm15
  40492e:       62 91 7c 48 28 c8       vmovaps %zmm24,%zmm1
  404934:       62 f1 85 48 5c e1       vsubpd %zmm1,%zmm15,%zmm4

This is the generated assembly. Even with the option set, there are still many vmovaps instructions before each vsubpd, vaddpd, and so on, copying values from the high registers into the low ones.

Best,

Thom
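
To put a number on "a lot of moves", one could count them mechanically instead of by eye. The helper below is a rough sketch, not an established tool: `move_stats` and `count_zmm_moves` are made-up names, and it assumes AT&T-syntax objdump text like the excerpt above. It treats any `vmovaps` line without a parenthesized memory operand as a register-to-register move.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: lines touching zmm registers, and how many
 * of those are register-to-register vmovaps. */
typedef struct {
    int total;
    int moves;
} move_stats;

/* Scan objdump-style text line by line. Lines longer than the local
 * buffer are truncated for matching, which is fine for typical
 * disassembly output. */
move_stats count_zmm_moves(const char *listing)
{
    move_stats s = {0, 0};
    assert(listing != NULL);

    const char *line = listing;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t full = end ? (size_t)(end - line) : strlen(line);

        char buf[256];
        size_t len = full < sizeof buf - 1 ? full : sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        if (strstr(buf, "%zmm") != NULL) {
            s.total++;
            const char *m = strstr(buf, "vmovaps");
            /* No '(' after the mnemonic means no memory operand. */
            if (m != NULL && strchr(m, '(') == NULL)
                s.moves++;
        }

        line += full + (end ? 1 : 0);
    }
    return s;
}
```

Applied to the 22 instructions posted above, this counts 14 register-to-register `vmovaps` against 8 arithmetic instructions, which makes a before/after comparison between compiler flags easier to state.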
