As several review sites have reported, AVX-512 performance on the 6- and 8-core Skylake-X parts is compromised.
Only on the 10-core part is the hardware that is present fully enabled.
Could Intel provide in-depth detail on what this performance difference means?
From the vague information available, it seems that one of the 2 (3?) AVX-512 ports (port 5) is disabled.
Can we get more detailed information on which ports are used for AVX-512?
Which AVX-512 instructions can each port execute, and do the ports have 512-bit data paths to the registers and cache?
How is AVX-512 gather affected on the 6- and 8-core parts versus the 10-core part?
A drawing similar to the one below for AVX2 would be appreciated.
I'm not an Intel representative, but this is how I understand the article. The 6- and 8-core models have one of the two FMA units disabled (the one connected to Port 5), so FMA instructions have only half the throughput of the 10-core model. One 512-bit register contains 8 DP FP elements, so from the article it follows that FMA instructions have a reciprocal throughput of 0.5 on the 6- and 8-core models and 0.25 on the 10-core model.
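To put the "half the throughput" statement in numbers, here is a minimal sketch of the theoretical per-core peak. It assumes each FMA counts as two floating-point operations per lane and that AVX2 FMAs issue on two 256-bit units (ports 0 and 1). The function name is made up for illustration, and the results are upper bounds, not measurements.

```python
def peak_dp_flops_per_cycle(fma_units: int, vector_bits: int = 512) -> int:
    """Theoretical peak double-precision FLOPs per cycle for one core."""
    lanes = vector_bits // 64   # number of 64-bit doubles in one vector register
    flops_per_fma = 2           # one multiply plus one add per lane
    return fma_units * lanes * flops_per_fma


avx512_one_unit = peak_dp_flops_per_cycle(1)     # only the fused port 0+1 unit
avx512_two_units = peak_dp_flops_per_cycle(2)    # fused port 0+1 plus port 5
avx2_two_units = peak_dp_flops_per_cycle(2, 256) # assumed: 256-bit FMA on ports 0 and 1

print(avx512_one_unit, avx512_two_units, avx2_two_units)
```

Under these assumptions a part with a single 512-bit FMA unit has the same theoretical FMA peak with AVX-512 as with AVX2, and only a part with two units doubles it.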
Ports 0, 1 and 5 are enabled on all Skylake-X CPU models. Ports 0 and 1 are used for most 256-bit vector instructions and can fuse together to issue a 512-bit vector instruction (i.e. to execute the same 256-bit instruction on the two 256-bit lanes). Port 5 is 512 bits wide and can also issue 512-bit vector instructions. It is additionally used for cross-lane operations, such as shuffles. On the 10-core CPU it is also used for the second FMA unit.
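Apparently, what follows from this is that most 512-bit instructions should have at most 2/3 of the instruction throughput of their 256-bit counterparts: a 256-bit instruction can issue on any of three ports (0, 1 and 5), while a 512-bit instruction can issue only on the fused 0+1 pair or on port 5. But I have not seen any numbers yet to confirm that.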
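The 2/3 figure can be checked with a small back-of-the-envelope model. It takes the port description in this post at face value (an assumption, not measured data) and applies only to instructions that can issue on all three ports. The constant names are made up for illustration.

```python
from fractions import Fraction

# Issue slots per cycle, assumed from the port description in this post:
# 256-bit ops can use ports 0, 1 and 5 independently,
# 512-bit ops use either the fused port 0+1 pair or port 5.
SLOTS_256 = 3
SLOTS_512 = 2

# Ratio of instructions issued per cycle (512-bit versus 256-bit).
instr_ratio = Fraction(SLOTS_512, SLOTS_256)

# Ratio of vector bits processed per cycle (512-bit versus 256-bit).
bits_ratio = Fraction(SLOTS_512 * 512, SLOTS_256 * 256)

print(instr_ratio, bits_ratio)
```

In this model the 512-bit form issues 2/3 as many instructions per cycle but still processes 4/3 as much data per cycle, so the lower instruction throughput does not by itself make AVX-512 slower.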
Fortunately this information is included in the Intel ARK entries for the server parts. For example, the Xeon Platinum 8160 description at https://ark.intel.com/products/120501/Intel-Xeon-Platinum-8160-Processor-33M-Cache-2_10-GHz includes:
# of AVX-512 FMA Units: 2
This is the correct answer for this processor.
In general, the Platinum series processors and the Gold 6000 series processors all have 2 FMA units, and the other processors have 1 FMA unit. I know of at least one exception -- the Gold 5122 has 2 FMA units. I don't know if there are other exceptions -- there are 58 processor models and the number of FMA units is not a field that can be used with the advanced search function.
In case you have one of those Skylake-X processors and want to find out whether it has 2 AVX-512 FMA units:
Here is a real-time AVX2 / AVX-512 / GPU Julia/Mandelbrot zoomer:
All computations are done in double precision. It is heavily optimized with FMA computations and multi-threading.
You can switch from AVX-512 to AVX2. If you notice a big difference in frames per second, you can assume your processor has 2 AVX-512 FMA units.
Computation speed is up to 60 FPS at 4K resolution on an 8-core processor running at 4 GHz using AVX-512.
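The comparison described above can be written down as a simple rule of thumb. This is only a sketch: the function name and the 1.5 threshold are assumptions, not anything the zoomer reports, and clock-speed changes between AVX2 and AVX-512 modes can blur the result.

```python
def likely_two_fma_units(fps_avx512: float, fps_avx2: float,
                         threshold: float = 1.5) -> bool:
    """Guess whether two AVX-512 FMA units are active from two FPS readings.

    The threshold is an arbitrary assumption: two units should approach a 2x
    speedup over AVX2 in FMA-bound code, one unit should stay close to 1x.
    """
    if fps_avx512 <= 0 or fps_avx2 <= 0:
        raise ValueError("frame rates must be positive")
    return fps_avx512 / fps_avx2 >= threshold


# Made-up readings for illustration only, not measurements.
print(likely_two_fma_units(58.0, 31.0))
print(likely_two_fma_units(33.0, 31.0))
```

A ratio near the threshold is inconclusive, so it is worth repeating the comparison at a fixed zoom level and thread count.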
As John already indicated, the AVX-512 unit count is provided for all of the parts enumerated on https://ark.intel.com/products/series/125191/Intel-Xeon-Scalable-Processors.
Information about Xeon is useless when the question is about Skylake-X.
As you can see, there is no information about the number of AVX-512 units for Skylake-X.