Hello
I have been trying to compile code for AVX-512 with -O2 and all the other optimizations enabled, but I stumbled upon a small problem.
When using the zmm registers, move instructions appear out of nowhere before each arithmetic instruction such as vaddpd, vsubpd, vmulpd, or vfmadd*pd. Each time, the data in the high zmm registers (zmm16-zmm31) is first moved to the low zmm registers, and only then is the operation performed.
Is this a hardware limitation, where the instructions can only use the low registers, or is it a compiler bug?
Best,
Thom
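For reference, a reproducer for this kind of report could be as small as the sketch below. It is hypothetical and not Thom's code: `butterfly` and its array names are made up, and a loop this simple may not create enough register pressure to reach zmm16-zmm31, so the real kernel would have to be substituted in. The compile lines in the comment use `-xCORE-AVX512` to target AVX-512 and `-S` to emit assembly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical minimal test case (not the original poster's code):
 * a radix-2 butterfly over split real/imaginary arrays. An AVX-512
 * build is expected to vectorize it into vaddpd/vsubpd pairs.
 * Compile to assembly, for example, with:
 *   icc -O2 -xCORE-AVX512 -S butterfly.c
 *   icc -O2 -xCORE-AVX512 -qopt-zmm-usage=high -S butterfly.c
 * and compare the vmovaps count in the two .s files.
 */
void butterfly(size_t n, double *restrict ar, double *restrict ai,
               double *restrict br, double *restrict bi)
{
    for (size_t k = 0; k < n; ++k) {
        const double sr = ar[k] + br[k];   /* sum, real part */
        const double si = ai[k] + bi[k];   /* sum, imaginary part */
        br[k] = ar[k] - br[k];             /* difference, real part */
        bi[k] = ai[k] - bi[k];             /* difference, imaginary part */
        ar[k] = sr;
        ai[k] = si;
    }
}
```

Compiling the same file once with and once without -qopt-zmm-usage=high and diffing the two .s files shows directly what the option changes for a given kernel.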
- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
Hi Thom,
Is it possible to provide a test case for us to investigate?
Thanks,
Viet
I thought there was an option for this, but the 2018 documentation changes have stopped me from searching for it.
Hi Tim,
Is this what you are referring to?
-qopt-zmm-usage=<keyword>
Specifies the level of zmm registers usage. You can specify one of
the following:
low - Tells the compiler that the compiled program is unlikely to
benefit from zmm registers usage. It specifies that the
compiler should avoid using zmm registers unless it can
prove the gain from their usage.
high - Tells the compiler to generate zmm code without restrictions
Hi Viet
Using -qopt-zmm-usage=high solves the problem. It now seems that all 32 registers are used.
Thank you,
Thom
Hi Viet
It seems it still does not work:
4048b6: 62 51 ed 48 58 ef vaddpd %zmm15,%zmm2,%zmm13
4048bc: 62 b1 7c 48 28 d8 vmovaps %zmm16,%zmm3
4048c2: 62 71 dd 48 58 f3 vaddpd %zmm3,%zmm4,%zmm14
4048c8: 62 61 7c 48 28 c0 vmovaps %zmm0,%zmm24
4048ce: 62 d1 7c 48 28 c7 vmovaps %zmm15,%zmm0
4048d4: 62 31 7c 48 28 fe vmovaps %zmm22,%zmm15
4048da: 62 51 fd 48 5c ff vsubpd %zmm15,%zmm0,%zmm15
4048e0: 62 f1 e5 48 5c cc vsubpd %zmm4,%zmm3,%zmm1
4048e6: 62 61 7c 48 28 c9 vmovaps %zmm1,%zmm25
4048ec: 62 b1 7c 48 28 cc vmovaps %zmm20,%zmm1
4048f2: 62 b1 7c 48 28 c7 vmovaps %zmm23,%zmm0
4048f8: 62 f1 fd 48 58 d9 vaddpd %zmm1,%zmm0,%zmm3
4048fe: 62 b1 7c 48 28 e5 vmovaps %zmm21,%zmm4
404904: 62 91 7c 48 28 c0 vmovaps %zmm24,%zmm0
40490a: 62 f1 fd 48 58 d4 vaddpd %zmm4,%zmm0,%zmm2
404910: 62 41 7c 48 28 d7 vmovaps %zmm15,%zmm26
404916: 62 31 7c 48 28 ff vmovaps %zmm23,%zmm15
40491c: 62 d1 f5 48 5c cf vsubpd %zmm15,%zmm1,%zmm1
404922: 62 61 7c 48 28 d9 vmovaps %zmm1,%zmm27
404928: 62 31 7c 48 28 fd vmovaps %zmm21,%zmm15
40492e: 62 91 7c 48 28 c8 vmovaps %zmm24,%zmm1
404934: 62 f1 85 48 5c e1 vsubpd %zmm1,%zmm15,%zmm4
This is the assembly code. Before each vsubpd, vaddpd, and so on, there are still a lot of moves.
Best,
Thom
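One way to put a number on "a lot of moves" is to scan the objdump output. The sketch below is a hypothetical helper (`scan_listing` and `struct zmm_stats` are made-up names) that counts vector moves and packed-double arithmetic lines and records the highest zmm index that appears in an arithmetic instruction. It matches mnemonics with plain substring tests, so it only recognizes the instructions listed in `is_move` and `is_arith`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of an objdump -d listing. */
struct zmm_stats {
    int moves;          /* lines containing vmovaps/vmovapd/vmovups/vmovupd */
    int arith;          /* lines containing vaddpd, vsubpd, vmulpd or vfmadd */
    int max_arith_reg;  /* highest N in %zmmN on an arithmetic line, -1 if none */
};

static int is_move(const char *line)
{
    return strstr(line, "vmovaps") != NULL || strstr(line, "vmovapd") != NULL ||
           strstr(line, "vmovups") != NULL || strstr(line, "vmovupd") != NULL;
}

static int is_arith(const char *line)
{
    return strstr(line, "vaddpd") != NULL || strstr(line, "vsubpd") != NULL ||
           strstr(line, "vmulpd") != NULL || strstr(line, "vfmadd") != NULL;
}

/* Return the highest N of any "%zmmN" operand on the line, or -1. */
static int max_zmm(const char *line)
{
    int best = -1;
    const char *p = line;
    while ((p = strstr(p, "%zmm")) != NULL) {
        char *end;
        long n;
        p += 4;                         /* skip "%zmm" */
        n = strtol(p, &end, 10);        /* parse the register number */
        if (end != p && n > best)
            best = (int)n;
        p = end;
    }
    return best;
}

/* Scan a newline-separated listing and summarize it. */
struct zmm_stats scan_listing(const char *text)
{
    struct zmm_stats s = {0, 0, -1};
    char line[256];

    while (*text != '\0') {
        size_t len = strcspn(text, "\n");
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;

        memcpy(line, text, copy);       /* work on one NUL-terminated line */
        line[copy] = '\0';

        if (is_move(line)) {
            s.moves++;
        } else if (is_arith(line)) {
            int r = max_zmm(line);
            s.arith++;
            if (r > s.max_arith_reg)
                s.max_arith_reg = r;
        }

        text += len;                    /* advance past the line */
        if (*text == '\n')
            text++;                     /* and past its newline */
    }
    return s;
}
```

Applied to the 22 lines of the listing above, it should report 14 moves, 8 arithmetic instructions, and 15 as the highest register index in an arithmetic instruction: in that excerpt every vaddpd and vsubpd works only on zmm0-zmm15, and zmm16-zmm31 are reached only through vmovaps.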
