As several review sites have reported, AVX-512 performance on the 6- and 8-core Skylake-X parts is compromised.
Only on the 10-core part is the hardware that is present fully enabled.
Would Intel be so kind as to explain in detail what the performance difference means?
From the vague information available, it seems that one of the 2 (3?) AVX-512 ports (port 5) is disabled.
Can we get more detailed information about which ports are used for AVX-512?
Which AVX-512 instructions can each port execute, and do the ports have 512-bit data paths to the registers and cache?
How is AVX-512 gather affected on the 6- and 8-core versus the 10-core?
A drawing similar to the one below for AVX2 would be appreciated.
I'm not an Intel representative, but this is how I understand the article. The 6- and 8-core models have one of the two FMA units disabled (the one connected to Port 5), so FMA instructions have only half the throughput they have on the 10-core model. One 512-bit register contains 8 DP FP elements, so from the article it follows that FMA instructions have a reciprocal throughput of 0.5 on the 6- and 8-core models and 0.25 on the 10-core model.
Ports 0, 1 and 5 are all enabled on all Skylake-X CPU models. Ports 0 and 1 are used for most 256-bit vector instructions and can fuse together to issue a 512-bit vector instruction (i.e. to execute the same 256-bit instruction on the two 256-bit lanes). Port 5 is 512-bit and can also issue 512-bit vector instructions. It is additionally used for cross-lane operations, such as shuffles. On the 10-core CPU it is also used for the second FMA unit.
Apparently, it follows that most 512-bit instructions should have at most 2/3 of the throughput of their 256-bit counterparts. But I have not seen any numbers yet to confirm that.
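The port arithmetic behind the 2/3 estimate can be sketched as follows. This is only a back-of-the-envelope model under the assumptions described in this post (a 256-bit instruction can issue on port 0, 1 or 5, while a 512-bit instruction occupies either the fused port 0+1 pair or port 5). The function and variable names are made up for illustration, and nothing here is a measurement.

```python
from fractions import Fraction

# Assumption (from the description above, not measured): a 256-bit vector
# instruction can issue on port 0, 1 or 5, while a 512-bit instruction
# needs either the fused port 0+1 pair or port 5.
ISSUE_SLOTS_PER_CYCLE = {256: 3, 512: 2}


def instructions_per_cycle(width_bits):
    """Peak vector instructions issued per cycle for a given width."""
    return ISSUE_SLOTS_PER_CYCLE[width_bits]


def bits_per_cycle(width_bits):
    """Peak vector bits processed per cycle for a given width."""
    return width_bits * instructions_per_cycle(width_bits)


# Ratio of 512-bit to 256-bit instruction issue rate in this model.
ratio = Fraction(instructions_per_cycle(512), instructions_per_cycle(256))
print("512-bit vs 256-bit instruction rate:", ratio)
print("bits per cycle, 256-bit:", bits_per_cycle(256))
print("bits per cycle, 512-bit:", bits_per_cycle(512))
```

In this simplified model the instruction rate drops to 2/3, but each 512-bit instruction covers twice as much data, so the data rate would still be higher (1024 versus 768 bits per cycle). Whether real code sees that depends on factors the model ignores, such as clock speed and memory bandwidth.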
Some people who have bought the 7800X now claim, based on benchmarks, that both AVX-512 FMA units are enabled on the 6-core.
Can somebody from Intel please confirm this?
Fortunately this information is included in the Intel ARK entries for the server parts. For example, the Xeon Platinum 8160 description at https://ark.intel.com/products/120501/Intel-Xeon-Platinum-8160-Processor-33M-Cache-2_10-GHz includes
# of AVX-512 FMA Units 2
This is the correct answer for this processor.
In general, the Platinum series processors and the Gold 6000 series processors all have 2 FMA units, and the other processors have 1 FMA unit. I know of at least one exception -- the Gold 5122 has 2 FMA units. I don't know if there are other exceptions -- there are 58 processor models and the number of FMA units is not a field that can be used with the advanced search function.
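The rule of thumb above can be written down as a small lookup. This is only a sketch of the heuristic as stated in this post: the function name, the tier strings and the example inputs are hypothetical, the exception set holds just the one exception mentioned, and the ARK entry remains the authoritative source for any given part.

```python
# Rule of thumb from the post above. The exception set contains only the
# one exception named there and may well be incomplete.
KNOWN_TWO_FMA_EXCEPTIONS = {("Gold", 5122)}


def guess_fma_units(tier, model_number):
    """Guess the AVX-512 FMA unit count of a Xeon Scalable part."""
    if tier == "Platinum":
        return 2
    if tier == "Gold" and 6000 <= model_number < 7000:
        return 2
    if (tier, model_number) in KNOWN_TWO_FMA_EXCEPTIONS:
        return 2
    return 1


# Example inputs chosen for illustration; the results are the rule's
# guesses, not values read from ARK.
examples = [("Platinum", 8160), ("Gold", 6148), ("Gold", 5122), ("Silver", 4110)]
for tier, number in examples:
    print(tier, number, "->", guess_fma_units(tier, number))
```

Keeping the exceptions in a separate set makes it easy to add further models if more turn out not to follow the general rule.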
If you have one of those Skylake-X processors and want to find out whether it has 2 AVX-512 FMA units, here is a real-time AVX2 / AVX-512 / GPU Julia/Mandelbrot zoomer:
All computations are done in double precision, heavily optimized with FMA and multi-threading.
You can switch from AVX-512 to AVX2. If you notice a big difference in frames per second, you can assume the processor has 2 AVX-512 FMA units.
Computation speed is up to 60 FPS at 4K resolution on an 8-core running at 4 GHz using AVX-512.
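Why switching between AVX2 and AVX-512 can hint at the FMA unit count can be illustrated with peak-rate arithmetic. This is a rough sketch, assuming that AVX2 FMA issues on two 256-bit units (ports 0 and 1), that a part has either one or two 512-bit FMA units, and that clock frequency is the same in both modes (which may not hold in practice). The names are hypothetical.

```python
DOUBLE_BITS = 64
FLOPS_PER_FMA = 2  # one multiply plus one add per element


def peak_dp_flops_per_cycle(vector_bits, fma_units):
    """Peak double-precision FLOPs per cycle per core."""
    lanes = vector_bits // DOUBLE_BITS  # doubles per vector register
    return lanes * FLOPS_PER_FMA * fma_units


# Assumed configurations, not measured values.
avx2 = peak_dp_flops_per_cycle(256, fma_units=2)
avx512_one = peak_dp_flops_per_cycle(512, fma_units=1)
avx512_two = peak_dp_flops_per_cycle(512, fma_units=2)

print("AVX2, two 256-bit FMA units:", avx2)
print("AVX-512, one 512-bit FMA unit:", avx512_one)
print("AVX-512, two 512-bit FMA units:", avx512_two)
```

Under these assumptions a part with one 512-bit FMA unit has the same FMA peak in both modes (16 FLOPs per cycle), while a part with two units doubles it in AVX-512 mode (32). A large frame-rate gain from AVX-512 is therefore consistent with two units, though clock throttling and memory effects can blur the picture.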
As John already indicated, the AVX-512 unit count is provided for all of the parts enumerated on https://ark.intel.com/products/series/125191/Intel-Xeon-Scalable-Processors.
Information about Xeon is totally useless when the question is about Skylake-X.
As you can see, there is no information about the number of AVX-512 units for Skylake-X.