I'm trying to understand the number of ports that are available for the vector instructions being executed on my processor, an Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (Cascade Lake).
WikiChip points me to Skylake (server) for the microarchitecture. As I understand it, I should have two 512-bit FMA units: one formed by fusing Port 0 and Port 1, and a second, dedicated unit on Port 5. (https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)#Scheduler_.26_512-SIMD_additi...
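As a rough sanity check on that understanding, the sketch below counts how many elements per cycle a 512-bit pipeline could process at best. It assumes each unit can start one full-width instruction per cycle. The helper names, the `units` values tried for int16, and the `instructions_per_step` counts are hypothetical placeholders, not measured or documented figures for this CPU.

```python
VECTOR_BITS = 512  # AVX-512 register width


def lanes_per_instruction(element_bits, vector_bits=VECTOR_BITS):
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


def peak_elements_per_cycle(element_bits, units, instructions_per_step=1):
    """Upper bound on multiply-accumulate results per cycle.

    units: execution units that can each start one full-width
           instruction per cycle (assumption).
    instructions_per_step: vector instructions needed for one
           multiply-accumulate step (1 for a true FMA).
    """
    lanes = lanes_per_instruction(element_bits)
    return lanes * units / instructions_per_step


if __name__ == "__main__":
    # fp32 FMA with the two 512-bit units described above.
    print("fp32, 2 units, 1 instr/step:",
          peak_elements_per_cycle(32, units=2))

    # int16: the unit count and instruction count are the open
    # questions, so several hypothetical combinations are printed.
    for units in (1, 2):
        for instrs in (1, 2):
            print(f"int16, {units} unit(s), {instrs} instr/step:",
                  peak_elements_per_cycle(16, units, instrs))
```

Under these assumptions, the wider int16 lanes only translate into more throughput if the integer sequence needs no more instructions per step, and has access to as many units, as the FMA path.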
For full context, I'm comparing single-precision float and int16 matrix-vector multiplication. The float version edges out my int16 version, even though the smaller data type should give me more data parallelism. I'm comparing two sequences of assembly that execute repeatedly. The first uses these floating-point vector instructions, including FMA:
The second uses these integer vector instructions, which mimic the fused multiply-accumulate with int16_t:
My two questions are:
Thanks so much in advance.
Thank you for joining the Intel community.
Please allow me some time to research this, and I will get back to you as soon as I have an update.
In the meantime, you could take a look at the Xeon Processors resource site:
Intel Customer Support