Hello
Are there any updates that fix the problem of the compiler using only 16 of the 32 zmm registers available on KNL and Skylake?
I tried -qopt-zmm-usage=low/high and similar options, but nothing seems to work. Here is an assembly example:
40122e: 62 71 7c 48 28 fd vmovaps %zmm5,%zmm15
401234: 62 d1 85 48 5c ec vsubpd %zmm12,%zmm15,%zmm5
40123a: 62 51 ed 48 58 fa vaddpd %zmm10,%zmm2,%zmm15
401240: 62 c1 7c 48 28 ca vmovaps %zmm10,%zmm17
401246: 62 31 7c 48 28 e0 vmovaps %zmm16,%zmm12
40124c: 62 61 7c 48 28 e9 vmovaps %zmm1,%zmm29
401252: 62 d1 9d 48 58 c8 vaddpd %zmm8,%zmm12,%zmm1
401258: 62 61 7c 48 28 e4 vmovaps %zmm4,%zmm28
40125e: 62 b1 7c 48 28 e1 vmovaps %zmm17,%zmm4
401264: 62 61 7c 48 28 d9 vmovaps %zmm1,%zmm27
40126a: 62 f1 dd 48 5c ca vsubpd %zmm2,%zmm4,%zmm1
An alternative would be to use only 16 registers. Is there a flag that actually works and restricts the compiler to 16 registers?
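For anyone wanting to check which zmm registers a listing like the one above actually touches, here is a minimal sketch in Python. It assumes objdump-style AT&T output in which registers appear as `%zmmN`. The helper name `zmm_registers` and the three-line sample (copied from the listing) are only illustrative, and the script says nothing about what the compiler could have done differently.

```python
import re

# Matches AT&T-syntax register operands such as %zmm5 or %zmm29.
ZMM = re.compile(r"%zmm(\d+)")

def zmm_registers(listing):
    """Return the set of zmm register numbers mentioned in a disassembly listing."""
    return {int(n) for n in ZMM.findall(listing)}

# Three lines taken from the listing in this post.
sample = """\
40122e: 62 71 7c 48 28 fd vmovaps %zmm5,%zmm15
401240: 62 c1 7c 48 28 ca vmovaps %zmm10,%zmm17
40124c: 62 61 7c 48 28 e9 vmovaps %zmm1,%zmm29
"""

used = zmm_registers(sample)
# zmm16..zmm31 are the upper half of the 32-register file.
upper = sorted(r for r in used if r >= 16)

print("distinct zmm registers:", sorted(used))
print("registers in zmm16-zmm31:", upper)
```

Feeding the full output of the disassembler into `zmm_registers` instead of the sample gives the same tally for a whole function.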
Thanks
I don't think we have an option to limit the usage of zmm registers to 16.
Regards,
Viet
