When I compile code that uses AVX-512 intrinsics with the processor-specific optimization set to AVX2, the generated assembly contains AVX-512 instructions that use only zmm0-zmm15.
So only 16 registers are used instead of 32.
With the processor-specific optimization set to AVX512 core, all 32 zmm registers are used.
(I need to compile with the processor-specific optimization set to AVX2, otherwise the AVX2 code path does not run.)
Is this a known issue that can be fixed?
- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
Intrinsics are assembly-coded, so if you call AVX512 intrinsics you will see zmm registers being used.
If you target AVX2, then I would suggest using AVX2 intrinsics instead.
Regards,
Viet Hoang
Sorry if my explanation was not clear.
AVX512 intrinsics of course generate assembly using zmm registers.
My point is that the generated assembly uses only 16 of the 32 zmm registers. It should use all 32 to reduce register spilling.
I filed issue CMPLRS-45119 for this, and the compiler developers are working on it. In the meantime, try the workaround _allow_cpu_features(_FEATURE_AVX512F). More information is documented at https://software.intel.com/en-us/articles/new-intrinsic-allow-cpu-features-support
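
The workaround above is specific to the Intel compiler. For readers on GCC or Clang, a roughly comparable setup (AVX-512 intrinsics inside a build whose baseline target is lower, with the AVX2 path still usable on older CPUs) can be sketched with per-function target attributes and runtime dispatch. This is a different technique from `_allow_cpu_features`, not the Intel mechanism, and it says nothing about how the Intel compiler allocates registers. The function names `sum_avx512`, `sum_avx2`, `sum_scalar`, and `sum_dispatch` are hypothetical and used only for illustration.

```cpp
#include <immintrin.h>
#include <cstddef>

// Sketch for GCC/Clang on x86-64 only. The target attribute enables the
// named instruction set for one function, independent of the -m flags
// used for the rest of the translation unit.
__attribute__((target("avx512f")))
float sum_avx512(const float* data, std::size_t n) {
    __m512 acc = _mm512_setzero_ps();
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)            // 16 floats per zmm register
        acc = _mm512_add_ps(acc, _mm512_loadu_ps(data + i));
    float lanes[16];
    _mm512_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (float v : lanes) total += v;       // horizontal sum of the lanes
    for (; i < n; ++i) total += data[i];    // scalar tail
    return total;
}

__attribute__((target("avx2")))
float sum_avx2(const float* data, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)              // 8 floats per ymm register
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (float v : lanes) total += v;
    for (; i < n; ++i) total += data[i];
    return total;
}

float sum_scalar(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Picks the widest path the running CPU supports, so the AVX2 path
// still runs on machines without AVX-512.
float sum_dispatch(const float* data, std::size_t n) {
    if (__builtin_cpu_supports("avx512f")) return sum_avx512(data, n);
    if (__builtin_cpu_supports("avx2"))    return sum_avx2(data, n);
    return sum_scalar(data, n);
}
```

Because each kernel is compiled for its own target, inspecting the assembly of the AVX-512 function alone is a reasonable way to check which zmm registers the compiler actually uses.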